28 Free Image Datasets for Computer Vision

July 20, 2021

Computer vision gives computers the ability to understand, label, and interpret images. With the right image datasets, a data scientist can teach a computer to function essentially as though it had eyes of its own. This technology forms the backbone of many of tomorrow’s breakthroughs and innovations, such as facial recognition and autonomous vehicles.

We at iMerit compiled this list to empower data scientists and innovators to make these breakthroughs happen. The following image datasets contain a diverse swathe of images, including video sequences, multiple camera angles around the same subject, and even multi-dimensional medical scanner data.

Image Datasets for Computer Vision Training

VisualQA: Among image datasets, VisualQA is notable for its open-ended questions about the roughly 265,000 images it contains.

CompCars: This image dataset features 163 car makes and 1,716 car models, with each car annotated and labeled with five attributes, including number of seats, type of car, max speed, and displacement.


Oxford-IIIT Pet Images Dataset: This pet image dataset features 37 categories with roughly 200 images for each class. The images vary in scale, pose, and lighting, and each has an associated ground truth annotation of breed, head ROI, and pixel-level trimap segmentation.
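As a rough illustration of what a pixel-level trimap looks like in practice, the sketch below turns a trimap into a binary foreground mask. It assumes the label values 1 (foreground), 2 (background), and 3 (not classified), which is how the dataset's annotation notes are commonly read; verify this against your own copy of the data. The tiny trimap and the function name trimap_to_mask are hypothetical and exist only for this example.

```python
# Assumed trimap label values (check the dataset's own README before relying on them).
FOREGROUND, BACKGROUND, UNCLASSIFIED = 1, 2, 3


def trimap_to_mask(trimap, include_unclassified=False):
    """Convert a trimap (list of rows of labels) into a 0/1 foreground mask.

    Pixels labeled UNCLASSIFIED (typically the border around the animal)
    are treated as background unless include_unclassified is True.
    """
    keep = {FOREGROUND}
    if include_unclassified:
        keep.add(UNCLASSIFIED)
    return [[1 if label in keep else 0 for label in row] for row in trimap]


# Hypothetical 3x4 trimap, not taken from the dataset.
tiny_trimap = [
    [2, 2, 2, 2],
    [2, 3, 3, 2],
    [2, 3, 1, 3],
]

strict_mask = trimap_to_mask(tiny_trimap)
loose_mask = trimap_to_mask(tiny_trimap, include_unclassified=True)
print(strict_mask)
print(loose_mask)
```

Whether to count the unclassified border as foreground is a modeling choice; the flag simply makes that choice explicit.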

CIFAR-10: A widely used image dataset, CIFAR-10 features 60,000 32×32 color images divided into 10 separate classes. The dataset is split into five training batches and one test batch, each containing 10,000 images.
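The batch arithmetic and image size above can be made concrete with a short sketch. It assumes the row layout used by the Python version of the dataset, where each image is stored as 3,072 values (1,024 red, then 1,024 green, then 1,024 blue, each channel row by row). The helper names pixel_at and split_sizes and the synthetic row are hypothetical, used here only for illustration.

```python
SIDE = 32                 # images are 32x32 pixels
PLANE = SIDE * SIDE       # 1,024 values per color channel
ROW_LENGTH = 3 * PLANE    # 3,072 values per image (assumed channel-planar layout)


def pixel_at(flat_row, row, col):
    """Return the (red, green, blue) values at (row, col) of one flattened image."""
    if len(flat_row) != ROW_LENGTH:
        raise ValueError(f"expected {ROW_LENGTH} values, got {len(flat_row)}")
    offset = row * SIDE + col
    return (flat_row[offset], flat_row[PLANE + offset], flat_row[2 * PLANE + offset])


def split_sizes(batch_size=10_000, train_batches=5, test_batches=1):
    """Return (training images, test images) for the batch layout described above."""
    return batch_size * train_batches, batch_size * test_batches


# Synthetic stand-in for a real image row (hypothetical data, not from the dataset).
fake_row = [10] * PLANE + [20] * PLANE + [30] * PLANE

print(pixel_at(fake_row, 0, 0))   # (10, 20, 30)
print(split_sizes())              # (50000, 10000)
```

Five training batches and one test batch of 10,000 images each account for all 60,000 images.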

Indoor Scene Recognition: This dataset is highly specialized for anyone training a model to recognize indoor scenery. It contains 15,620 images across 67 indoor categories.

Plant Image Analysis: This is a compilation of several image datasets that together feature 1 million images of plants, covering roughly 11 species.

Home Objects: Contains commonly found objects from around the house.

Celebfaces: This image dataset features over 200,000 images of celebrities. Each image comes with 40 attribute annotations.

Stanford Dogs Dataset: 20,580 images of dogs across 120 unique breed categories with roughly 150 images for each class.

Fishnet Open Images Dataset: Suited to training fish detection and recognition algorithms, Fishnet Open Images Dataset features 35,000 fishing images that each contain 5 bounding boxes.

Google’s Open Images: With 9 million image URLs, this is among the largest image datasets on this list. It features millions of images annotated with labels across 6,000 categories.

Columbia University Image Library: Features 100 unique objects photographed from every angle across a 360-degree rotation.

MS COCO: MS COCO is among the most detailed image datasets. It is a large-scale object detection, segmentation, and captioning dataset with over 200,000 labeled images.
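For readers new to COCO-style annotations, here is a minimal sketch of how object detection labels are typically laid out and read. It assumes the common COCO JSON structure (top-level images, categories, and annotations lists, with bbox given as [x, y, width, height]). The sample record and the helper names count_by_category and bbox_to_corners are hypothetical and not taken from the real annotation files.

```python
import json
from collections import Counter

# Hypothetical, hand-written sample in the assumed COCO annotation layout.
SAMPLE = """
{
  "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
  "annotations": [
    {"id": 101, "image_id": 1, "category_id": 1, "bbox": [10.0, 20.0, 100.0, 200.0]},
    {"id": 102, "image_id": 1, "category_id": 18, "bbox": [300.0, 250.0, 120.0, 80.0]},
    {"id": 103, "image_id": 1, "category_id": 1, "bbox": [400.0, 30.0, 50.0, 150.0]}
  ]
}
"""


def count_by_category(coco):
    """Count annotated objects per category name."""
    names = {category["id"]: category["name"] for category in coco["categories"]}
    return Counter(names[ann["category_id"]] for ann in coco["annotations"])


def bbox_to_corners(bbox):
    """Convert [x, y, width, height] into (xmin, ymin, xmax, ymax)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


coco = json.loads(SAMPLE)
counts = count_by_category(coco)
print(dict(counts))                                     # {'person': 2, 'dog': 1}
print(bbox_to_corners(coco["annotations"][0]["bbox"]))  # (10.0, 20.0, 110.0, 220.0)
```

Many detection libraries expect corner coordinates rather than width and height, which is why the conversion helper is included.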

Lego Bricks: This image dataset contains 12,700 images of Lego bricks that have each been previously classified and rendered using 

Labelme: Created by MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), this image dataset features 187,240 images, 62,197 annotated images, and 658,992 labeled objects.

ImageNet: Organized according to the WordNet hierarchy, ImageNet is among the go-to image datasets for benchmarking new algorithms. Each node within the WordNet hierarchy is depicted by hundreds to thousands of images.

VisualGenome: Visual Genome was created to connect language with organized image concepts, and features a detailed visual knowledge base with 108,077 previously captioned images.

YouTube-8M: This large-scale dataset comes labeled with millions of YouTube video IDs, along with annotations of 3,800+ visual entities. Entities that aren’t localizable, such as movies or TV series, are excluded.

FERET: FERET (Facial Recognition Technology Database) is an image dataset featuring over 14,000 images of annotated human faces.

Labelled Faces in the Wild: An aptly titled image dataset, Labelled Faces in the Wild features 13,000 labeled images of human faces. It’s especially useful for facial recognition.

Places: This scene-centric image dataset contains 205 unique scene categories with 2.5 million images, each labeled with its scene category.

Flowers: Featuring flowers commonly found across the UK, this image dataset contains 102 different categories, with each flower shown in different poses and under varying light conditions.


xView: One of the largest publicly available overhead image datasets, xView features over 1 million objects across complex scenery and large images.

PascalVOC: Also known as Pascal Visual Object Classes, this dataset is aimed at improving visual object recognition. It provides a substantial dataset and tools on a specialized platform. With 20 classes, the training and validation data has 11,530 images, 27,450 ROI annotated objects, and 6,929 segmentations.
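To show what an ROI annotated object looks like when read by a program, the sketch below parses a VOC-style XML annotation. It assumes the commonly used layout of one object element per labeled item, each with a name and a bndbox holding xmin, ymin, xmax, and ymax. The sample XML and the function name parse_objects are hypothetical and written only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the assumed VOC-style XML layout.
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>370</ymax></bndbox>
  </object>
</annotation>
"""


def parse_objects(xml_text):
    """Return a list of (class name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text.strip())
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        corners = tuple(int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.findtext("name"), corners))
    return objects


parsed = parse_objects(SAMPLE_XML)
for name, corners in parsed:
    print(name, corners)
```

Each tuple corresponds to one annotated object, so counting tuples across all annotation files is one way to check an object total against the published figure.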

Cityscapes: This dataset offers a diverse collection of stereo video sequences recorded in street scenes from 50 different cities. It is notable for its high-quality, pixel-accurate annotations across 5,000 frames, and it also includes 20,000 frames with coarse annotations.

VGGFace2: Created by the Visual Geometry Group (VGG), this dataset contains about 3.31 million images across 9,131 classes, each representing a unique individual. It is used for a range of tasks including face detection, recognition, and landmark localization.

IMDB-WIKI: It includes 460,723 face images from 20,284 celebrities indexed on IMDb, complemented by an additional 62,328 images from Wikipedia. A total of 523,051 images are useful for facial recognition training.

SUN Database: The Scene UNderstanding (SUN) database, also known as the Scene Categorization Benchmark, contains over 130,000 images across 900 categories, each annotated to support precise scene recognition. It is valuable for computer vision applications such as scene layout analysis, scene classification, and object detection across different contexts.

iMerit has labeled, annotated, and segmented over 100 million images and videos, empowering computer vision algorithms. Our teams leverage a vast array of annotation techniques, including semantic segmentation and LiDAR annotations, to extract rich data at the pixel level. This enables us to create highly detailed datasets that go beyond facial recognition. Imagine training algorithms to identify objects in self-driving cars or segment medical images for accurate diagnoses.

Explore more about our computer vision solutions here.