22 Free Image Datasets for Computer Vision

July 20, 2021

Computer vision gives computers the ability to understand, label, and interpret images. With the right image datasets, a data scientist can teach a computer to essentially function as though it had eyes of its own. This technology forms the backbone for many of tomorrow’s breakthroughs and innovations, like facial recognition and autonomous vehicles.

We at iMerit compiled this list to empower data scientists and innovators to make these breakthroughs happen. The following image datasets contain a diverse swathe of images, including video sequences, multiple camera angles around the same subject, and even multi-dimensional medical scanner data.

Image Datasets for Computer Vision Training

VisualQA: Among image datasets, VisualQA is notable for the open-ended questions it pairs with its roughly 265,000 images.

CompCars: This image dataset features 163 car makes with 1,716 car models, with each car labeled with five attributes, including number of seats, type of car, max speed, and displacement.


Oxford-IIIT Pet Images Dataset: This pet image dataset features 37 categories with roughly 200 images for each class. The images vary in scale, pose, and lighting, and each has an associated ground truth annotation of breed, head ROI, and pixel-level trimap segmentation.

CIFAR-10: One of the larger image datasets, CIFAR-10 features 60,000 32×32 color images divided into 10 separate classes. The dataset is also divided into five training batches and one test batch, each containing 10,000 images.

Indoor Scene Recognition: This dataset is highly specialized for anyone training a model to recognize indoor scenery. It contains 15,620 images across 67 indoor categories.

Plant Image Analysis: This is a compilation of several image datasets that together feature 1 million images of plants, covering roughly 11 species.

Home Objects: Contains commonly found objects from around the house.

Celebfaces: This image dataset features over 200,000 images of celebrities. Each image comes with 40 attribute annotations.

Stanford Dogs Dataset: 20,580 images of dogs across 120 unique breed categories with roughly 150 images for each class.

Fishnet Open Images Dataset: Suited to training fish detection and recognition algorithms, Fishnet Open Images Dataset features 35,000 fishing images that each contain 5 bounding boxes.

Google’s Open Images: With 9 million image URLs, this is among the largest image datasets on this list. Its images are annotated with labels across 6,000 categories.

Columbia University Image Library: Features 100 unique objects, each photographed from every angle across a 360-degree rotation.

MS COCO: MS COCO is a large-scale object detection, segmentation, and captioning dataset of over 200,000 labeled images, making it among the most detailed image datasets.

Lego Bricks: This image dataset contains 12,700 images of Lego bricks that have each been previously classified and rendered using 

Labelme: Created by MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), this dataset features 187,240 images, 62,197 annotated images, and 658,992 labeled objects.

ImageNet: Organized in accordance with the WordNet hierarchy, ImageNet is among the go-to image datasets for new algorithms. Each node within the WordNet hierarchy is depicted by hundreds or thousands of images.

VisualGenome: Visual Genome was created to connect language with organized image concepts, and features a detailed visual knowledge base with 108,077 previously captioned images.

Youtube-8M: This large-scale dataset comes labeled with millions of YouTube video IDs, along with annotations of 3,800+ visual entities. Entities that aren’t localizable, like movies or TV series, are excluded.

FERET: FERET (Facial Recognition Technology Database) is an image dataset featuring over 14,000 images of annotated human faces.

Labelled Faces in the Wild: An aptly titled image dataset, Labelled Faces in the Wild features 13,000 labeled images of human faces. It’s especially useful for facial recognition.

Places: This scene-centric image dataset contains 205 unique scene categories with 2.5 million images that are labeled within a category.

Flowers: Featuring flowers commonly found across the UK, this image dataset contains 102 different categories, with each flower shown in varying poses and lighting.


xView: One of the largest publicly available overhead image datasets, xView features over 1 million objects across complex scenery and large images.
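
For readers planning a training run, it can help to sanity-check the split sizes a dataset description implies before downloading anything. The short Python sketch below does this for the CIFAR-10 figures quoted in the list above (five training batches and one test batch of 10,000 images each, 32×32 color images, 10 classes). The constant and function names are illustrative only and are not part of any CIFAR-10 tooling. The three color channels and the even split of images across classes are assumptions made for this example.

```python
# Sketch: recompute the CIFAR-10 split sizes from the figures in the list.
# All names here are hypothetical; nothing is downloaded or loaded.

IMAGE_SIDE = 32            # images are 32x32 pixels
CHANNELS = 3               # assumption: "color" means three RGB channels
NUM_CLASSES = 10
TRAIN_BATCHES = 5
TEST_BATCHES = 1
IMAGES_PER_BATCH = 10_000


def cifar10_layout():
    """Return the image counts implied by the batch layout."""
    train_images = TRAIN_BATCHES * IMAGES_PER_BATCH
    test_images = TEST_BATCHES * IMAGES_PER_BATCH
    total_images = train_images + test_images
    return {
        "train_images": train_images,
        "test_images": test_images,
        "total_images": total_images,
        # Number of raw pixel values stored per image.
        "values_per_image": IMAGE_SIDE * IMAGE_SIDE * CHANNELS,
        # Assumption: images are spread evenly across the classes.
        "images_per_class": total_images // NUM_CLASSES,
    }


layout = cifar10_layout()
for name, value in layout.items():
    print(f"{name}: {value}")
```

The same pattern (classes, images per class, and split sizes as named constants) can be reused to check the numbers quoted for any other dataset on this list before committing storage or compute to it.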