Top 10 Stock Market Datasets for Machine Learning

July 23, 2021

Investors have long sought ways to accurately predict the stock market. As cryptocurrencies continue to surprise everyone with their make-or-break volatility, the financial sector is scrambling to find ways to anticipate these market fluctuations. Whether you’re an investor or just someone who likes the challenge of building an accurate predictive model, here are iMerit’s top ten picks for stock market and cryptocurrency datasets to use in machine learning.

Stock Market Datasets

  1. Historical Stock Market Dataset: Containing daily price and volume information for US stocks and ETFs trading on the NASDAQ, NYSE, and NYSE MKT, this dataset features high-quality financial data that was updated as recently as November 2017. It’s also considered to be the best stock market dataset of its kind.
  2. Istanbul Stock Exchange: Originally used in the paper “A novel Hybrid RBF Neural Networks model as a forecaster” (Statistics and Computing), which presents its model as a viable and accurate forecasting algorithm, this dataset contains columns of return data for eight indexes: the Istanbul Stock Exchange National 100 Index, the Standard & Poor’s 500 return index, the stock market return indexes of Germany, the UK, Japan, and Brazil, the MSCI European Index, and the MSCI Emerging Markets Index.
  3. News and Stock Data: Prepared by an instructor of deep learning and NLP, this dataset was initially designed for a binary classification task. It includes data from Reddit’s r/worldnews subreddit between June 8th, 2008 and July 1st, 2016, along with data from the Dow Jones Industrial Average between August 8th, 2008 and July 1st, 2016.
  4. Stock Market from a High Level: Just as the name suggests, this stock market dataset features high-level stock market data taken from the Nasdaq, Dow Jones, and S&P 500 market indexes beginning in 1977 and ending in 2017.
  5. Stock Market Turnover Ratio: This dataset features information from the Federal Reserve Bank of St. Louis that zeroes in on the total value of shares traded during specific time periods. These values are then referenced against the average market capitalization for the period being examined to determine the disparity and predict accordingly.
  6. Uniqlo Stock Price Prediction: While the previous entries on this list focus on the broader stock market, this dataset zeroes in on a single company: Uniqlo. As Uniqlo has been one of the largest clothing retailers in Japan for close to five decades, the stock data between 2012 and 2016 contained in this dataset showcase some interesting fluctuations that help when building predictive models.
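
The turnover ratio described in entry 5 can be sketched in a few lines of Python. This is a minimal illustration, assuming the ratio is the value of shares traded divided by the average market capitalization over the same period. The function name `turnover_ratio` and the sample figures are hypothetical and are not taken from the Federal Reserve data.

```python
def turnover_ratio(value_traded, market_caps):
    """Value of shares traded divided by the average market capitalization.

    Assumption: market_caps holds the capitalization observations for the
    same period as value_traded (for example, period start and period end).
    """
    if not market_caps:
        raise ValueError("market_caps must contain at least one observation")
    average_cap = sum(market_caps) / len(market_caps)
    if average_cap <= 0:
        raise ValueError("average market capitalization must be positive")
    return value_traded / average_cap


# Hypothetical figures, in the same currency unit
example_ratio = turnover_ratio(500.0, [900.0, 1100.0])
print(example_ratio)
```

A ratio computed this way can serve as one feature among several in a predictive model; on its own it only describes how actively a market traded relative to its size.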

National Currencies and Cryptocurrency Datasets

  1. CoinMarketCap Dataset: As cryptocurrency continues to be one of the most volatile assets in the world, people are continually looking for ways to predict its ups and downs. CoinMarketCap is a cryptocurrency market analysis website featuring terabytes of data on each coin’s daily price and trade volume. This dataset presents CoinMarketCap’s information in columns for date, symbol, open, high, low, close, volume, and market cap.
  2. Currency Exchange Rates: Featuring daily currency exchange rates reported by the International Monetary Fund (IMF), this dataset contains information on 51 currencies between January 1st, 1995 and November 4th, 2018.
  3. Daily Prices for All Cryptocurrencies: This massive dataset features historical price data, covering April 28th, 2013 to November 30th, 2018, for every cryptocurrency that’s currently being traded. The information is organized around each cryptocurrency’s name, date, rank, close ratio, and spread.
  4. Free Forex Data: Taken from Histdata.com, this dataset focuses on Forex data for multiple currencies and is available in General ASCII, MetaStock, MetaTrader, Microsoft Excel, and NinjaTrader formats.
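
As a starting point for working with price data laid out like the CoinMarketCap entry above (date, symbol, open, high, low, close, volume, market cap), the sketch below reads such rows and computes close-to-close daily returns, a common first feature for predictive models. The exact header spellings (such as `market_cap`), the symbol `ABC`, and the sample rows are assumptions for illustration; a real download may name its columns differently.

```python
import csv
import io

# Hypothetical sample rows; the column layout follows the CoinMarketCap entry.
SAMPLE_CSV = """date,symbol,open,high,low,close,volume,market_cap
2018-01-01,ABC,10.0,11.0,9.5,10.0,1000,50000
2018-01-02,ABC,10.0,12.5,9.8,12.0,1500,60000
2018-01-03,ABC,12.0,12.2,8.9,9.0,2000,45000
"""


def daily_returns(csv_text, symbol):
    """Return (date, close-to-close return) pairs for one symbol, oldest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for row in reader if row["symbol"] == symbol]
    rows.sort(key=lambda row: row["date"])  # ISO dates sort correctly as strings
    results = []
    for previous, current in zip(rows, rows[1:]):
        previous_close = float(previous["close"])
        current_close = float(current["close"])
        results.append((current["date"], current_close / previous_close - 1.0))
    return results


returns = daily_returns(SAMPLE_CSV, "ABC")
for date, value in returns:
    print(date, round(value, 4))
```

To use it with a downloaded file, read the file's text and pass it in place of `SAMPLE_CSV`, adjusting the column names to match the file's header.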