Online retailers are embracing machine learning with open arms. Machine learning can significantly boost sales, reduce waste, and make supply chains and new product development more efficient. On top of that, the data online retailers have collected over the past several decades can finally be put to good use by a solid ML model.
Luckily, much of this data is available to the general public. If you’re looking to train an ML model using retail datasets, then look no further than this iMerit-compiled list.
Product Datasets for Machine Learning
Women’s Shoe Prices: List of 10,000 women’s shoes along with their corresponding product information (name, brand, price).
Men’s Shoes Prices: This retail dataset features a list of about 10,000 men’s shoes and the prices at which they were sold.
Fashion Products on Amazon.com: Initially created via extraction of data from Amazon.com, this pre-crawled retail dataset features 22,000 unique fashion products.
Item Data: Featuring 500 SKUs from an outdoor-lifestyle apparel brand, this retail dataset gives real item-level data in a real-world format. It’s especially useful for recommendation systems.
Fashion-MNIST: This retail dataset is a good fit for anyone building an image classifier for fashion products. It contains 60,000 training images and 10,000 test images, each labeled with one of 10 classes.
Innerwear Data from Victoria’s Secret and Others: Containing more than 600,000 innerwear products, this retail dataset features information extracted from sites including Amazon, Victoria’s Secret, Calvin Klein, Macy’s, Nordstrom, and more.
Electronic Products and Pricing Data: This list of over 7,000 electronic products contains pricing information across 10 unique fields.
E-commerce Tagging for Clothing: This retail dataset features images from ecommerce sites. Of the 907 items it contains, 504 have already been annotated.
Ecommerce Data and Search Relevance Datasets for Machine Learning
BestBuy Search Queries NER Dataset: Best Buy’s retail dataset contains search queries that have been manually labeled to denote important entities like Brand, Model Name, and Category Name.
Ecommerce Search Relevance: This retail dataset features image URLs, detailed listings of product features, queries that produced the end search result, and more.
E-Commerce Data: Compiled by the UCI Machine Learning Repository, this ecommerce dataset features online retail transactions taken between 2010 and 2011 for a UK-based and registered non-store online retailer.
Retail Transaction Datasets for Machine Learning
Retailrocket Recommender System Database: Collected from real-world ecommerce sites, this retail dataset is built around visitor behavior and contains information on click rates, add-to-carts, and checkout data that eventually led to completed transactions.
Online Retail Dataset (UCI Machine Learning Repository): This transactional retail dataset features all transactions spanning an eight-month period for a major UK-based online retailer.
Online Auctions Dataset: Retail dataset from eBay featuring auction data on luxury items such as Cartier wristwatches and Swarovski beads. There’s also data on game consoles and other popular electronics.
Brazilian Ecommerce Public Dataset: Brazilian retail dataset containing over 100,000 orders placed on Olist between 2016 and 2018 across several marketplaces. The information tracked pertains to price, order status, and payment and freight performance, with customer reviews also included.
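Visitor-behavior datasets like the ones above record clicks, add-to-carts, and completed transactions, which is enough to estimate a simple conversion funnel. The following is a minimal sketch only: the `(visitor_id, event_type)` layout, the event names, and the sample rows are assumptions made for illustration, not the schema of any dataset in this list.

```python
from collections import defaultdict

# Hypothetical event log: (visitor_id, event_type) pairs.
# The layout and event names are assumptions, not a real dataset schema.
events = [
    (1, "view"), (1, "addtocart"), (1, "transaction"),
    (2, "view"),
    (3, "view"), (3, "addtocart"),
    (4, "view"),
]

def funnel_counts(events):
    """Count the distinct visitors who triggered each event type."""
    visitors = defaultdict(set)
    for visitor_id, event_type in events:
        visitors[event_type].add(visitor_id)
    return {event_type: len(ids) for event_type, ids in visitors.items()}

def conversion_rate(counts, from_event, to_event):
    """Ratio of distinct visitors at to_event to distinct visitors at from_event."""
    base = counts.get(from_event, 0)
    return counts.get(to_event, 0) / base if base else 0.0

counts = funnel_counts(events)
view_to_cart = conversion_rate(counts, "view", "addtocart")
cart_to_purchase = conversion_rate(counts, "addtocart", "transaction")
```

Counting distinct visitors rather than raw events keeps a single visitor who views many pages from inflating the base of the funnel.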
Customer Review Datasets for Machine Learning
Grammar and Online Product Reviews: Retail dataset featuring 71,045 reviews across 1,000 different products, provided by Datafiniti’s Product Database. It can be used for a multitude of ML use cases.
Women’s E-Commerce Clothing Reviews: Featuring anonymized commercial data, this retail dataset contains 23,000 real customer reviews and ratings.
Amazon Commerce Reviews Set: This custom-tailored retail dataset was derived by identifying 50 of Amazon’s most active users who frequently post reviews in several newsgroups. This retail dataset is perfect for pattern recognition.
Amazon and Best Buy Electronics: This list of more than 7,000 online reviews of 50 electronic products includes information on date, source, rating, title, reviewer metadata, and more.
Multi-Domain Sentiment Analysis Dataset: A retail dataset compiling product reviews rated from 1 to 5 stars. While slightly older, this dataset is still useful, and its star ratings can be converted into binary labels if needed.
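Converting star ratings into binary labels, as mentioned for the sentiment dataset above, usually comes down to picking a threshold. The sketch below assumes one common convention (4 and 5 stars are positive, 1 and 2 stars are negative, 3 stars are treated as neutral and dropped). That threshold, the function name, and the sample reviews are illustrative assumptions, not part of any dataset listed here.

```python
def to_binary_label(stars):
    """Map a 1-5 star rating to 1 (positive), 0 (negative), or None (neutral).

    The cut-offs are an assumed convention; adjust them to suit your task.
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"expected a rating from 1 to 5, got {stars}")
    if stars >= 4:
        return 1
    if stars <= 2:
        return 0
    return None  # 3-star reviews are treated as neutral

# Hypothetical (review_text, stars) pairs for illustration only.
reviews = [
    ("Great fit, would buy again", 5),
    ("Fell apart after a week", 1),
    ("It is okay", 3),
    ("Pretty good for the price", 4),
]

# Keep only reviews with a clear positive or negative label.
labeled = [
    (text, to_binary_label(stars))
    for text, stars in reviews
    if to_binary_label(stars) is not None
]
```

Dropping neutral reviews trades some data for cleaner labels. If every review must be kept, a two-way split at a single threshold works as well.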
Ecommerce Data for Machine Learning
Economic Census: Published once every five years, this retail dataset provides a detailed portrait of business activity across several industries. The information ranges from the national level down to the local level.
E-Stats: This US-government retail dataset is used to report the value of goods and services that were sold on the internet and other open networks.
Ecommerce Sales by Merchandise Category (1999-2015): Containing census data that focuses on total ecommerce sales, this retail dataset provides detailed figures for line items such as merchandise line and compound annual growth rate from 1999-2015.
Annual Retail Trade Survey (ARTS): Contains national estimates of total annual sales, expenses, and inventories that were stored outside of the US.
EU External Trade Datasets: This government retail dataset features information regarding import value, trade/export surpluses, and country of origin for specific products.
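Several of the government datasets above report sales by year, and the merchandise-category dataset mentions compound annual growth rate (CAGR). For readers who want to recompute it from yearly totals, here is a minimal sketch of the standard formula, CAGR = (end / start)^(1 / years) - 1. The function name and the sample figures are hypothetical and are not taken from any dataset in this list.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate over a number of yearly intervals."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

# Hypothetical figures: sales grow from 100 to 400 over 2 yearly intervals.
example_rate = cagr(100.0, 400.0, 2)

# A series running from 1999 to 2015 spans 16 yearly intervals.
intervals_1999_to_2015 = 2015 - 1999
```

Note that `years` counts the intervals between the first and last observation, not the number of data points.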