Classifying Images with Artificial Intelligence

October 20, 2022

Photo by Hitesh Choudhary on Unsplash

An image classification model is a supervised learning algorithm that falls under the category of computer vision. The ultimate goal is for a computer to identify objects in an image based on labeled training images, and then provide recommendations or take action based on the information.

For example, computer vision is used in fields like facial recognition and autonomous vehicles. However, the process of developing a reliable image classifier takes time and training. Many data scientists and ML engineers can write the code for an image classifier using Python, specifically TensorFlow.

That being said, the purpose of this article is to run through the basics of how an image classifier works, ending with an example Python script, built on TensorFlow, that you can use to build your own image classifier. But before we jump into the coding tutorial, it’s essential to understand the fundamentals.

How does an image classifier work? 

The first thing to note is that image classification requires a lot of data. Luckily, there are quite a few free image datasets available, including images of cars, plants, pets, fishing, and more.

Ultimately, there are two core technologies behind an image classifier – deep learning and a convolutional neural network (CNN).

What is deep learning?

Deep Learning is an area of machine learning that deals with artificial neural network algorithms, which were inspired by the structure and function of the brain. In order to model the human brain, deep learning employs a combination of data inputs, weights, and biases. Together, these three elements enable us to accurately detect, classify, and describe objects within unstructured data. In this example, we are trying to classify images.
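To make the roles of inputs, weights, and biases concrete, here is a minimal sketch of a single artificial neuron in plain Python. The pixel values, weights, and bias are hypothetical numbers chosen only for illustration, and the sigmoid activation is one common choice among several.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed into (0, 1) by a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values, not taken from any real model.
pixels = [0.0, 0.5, 1.0]     # three normalized pixel intensities (data inputs)
weights = [0.4, -0.2, 0.1]   # one weight per input
bias = 0.1

activation = neuron(pixels, weights, bias)
print(round(activation, 3))
```

A full network stacks many of these units into layers, and training consists of finding weights and biases that make the final outputs match the labels.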

The neural networks used in deep learning models typically contain three or more layers, and the model essentially “learns” from large amounts of data. While some effective neural networks use a single layer, adding multiple hidden layers allows data scientists to optimize the models for accuracy.

Forward Propagation and Backpropagation

Two major processes are involved in deep learning – forward propagation and backpropagation. We’ll go into how these processes work below.

In forward propagation, layers of interconnected nodes build upon each other to refine the model. The initial input layer ingests the data and the output layer contains the prediction or classification. These are both considered ‘visible layers’.

On the other hand, backpropagation focuses on moving backward through the layers for model training. By utilizing algorithms like gradient descent, the model calculates errors and adjusts weights and biases to refine predictions. Combining forward propagation and backpropagation, the neural network makes predictions and then corrects for errors as it learns, allowing it to become more and more accurate over time.
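The loop described above can be sketched with a one-neuron linear model trained by gradient descent. This is a toy illustration rather than how a deep network is trained in practice: the data, learning rate, and step count are assumptions, and with a single neuron the "backward pass" reduces to two hand-derived gradients.

```python
# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0         # arbitrary starting weight and bias
learning_rate = 0.05    # assumed step size

def mean_squared_error(w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

initial_loss = mean_squared_error(w, b)

for step in range(2000):
    # Forward propagation: compute predictions from the current parameters.
    preds = [w * x + b for x in xs]
    # Backward pass: gradients of the mean squared error w.r.t. w and b.
    grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / len(xs)
    # Gradient descent: move each parameter against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mean_squared_error(w, b)
print(round(w, 3), round(b, 3))
```

Each iteration makes a prediction, measures the error, and nudges the weight and bias to reduce it, which is the same predict-then-correct cycle a deep network repeats across all of its layers.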

What are Convolutional Neural Networks?

To start, convolutional neural networks (CNNs) are computationally demanding and are typically trained on graphics processing units (GPUs). While that aspect might deter some from trying to work with CNNs, there’s another side to this: a CNN takes the manual, time-consuming feature extraction step and makes it scalable. A CNN starts with the raw pixel data and “learns” how to extract the higher-level features on its own, using operations such as matrix multiplication to detect patterns in an image.

Photo by Hal Gatewood on Unsplash

Three types of CNN Layers

The architecture of convolutional neural networks is fairly complex, but at its core, there are three types of layers. As the image data moves through the layers, the detected features become increasingly complex, enabling the model to identify more of the image. Let’s take a look at the three layer types:

  • Convolutional layer – This is the first layer, where the majority of the computation occurs. It involves an input, a filter, and a feature map. For example, the input for a CNN might be a color image stored as a matrix of pixels with a height, a width, and a depth (the three RGB channels). Then there is the feature detector, which is known as a filter. The filter moves across the receptive fields of the image, applying its weights to check whether a specific feature is present. This process is called convolution. The final output from the input and filter is a feature map, which is essentially a set of numerical values that the neural network then interprets. After each convolution, the model applies what is known as a Rectified Linear Unit (ReLU) transformation, which introduces nonlinearity to the model.
  • Pooling layer – This layer conducts dimensionality reduction. Also known as downsampling, pooling reduces the number of input parameters. Unlike convolutional filters, pooling layers have no weights; they simply aggregate the values in each receptive field. The two main types are max pooling, which keeps the largest value, and average pooling, which keeps the mean. The goal of the pooling layer is to simplify the CNN, improve efficiency, and limit overfitting to the training data.
  • Fully-connected (FC) layer – What sets the FC layer apart from the previous partially-connected layers is that each node in the output layer connects directly to every node in the previous layer. Ultimately, the FC layer is what classifies the image based on the features extracted from the convolutional and pooling layers. The FC layer typically uses a softmax activation function for classification, versus the ReLU functions that are applied after convolutions.
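The convolution, ReLU, and pooling steps described above can be sketched in plain Python on a tiny image. The 6x6 image and the vertical-edge filter are hypothetical and hand-picked for illustration; in a real CNN the filter weights are learned during training, and a library such as Keras performs these operations far more efficiently.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) to build a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# Hypothetical 6x6 grayscale image: bright, dark, bright vertical stripes.
image = [[1, 1, 0, 0, 1, 1] for _ in range(6)]

# Hand-picked filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = relu(convolve2d(image, kernel))  # 4x4 map, negatives zeroed
pooled = max_pool(feature_map)                 # 2x2 map after downsampling
print(feature_map)
print(pooled)
```

The filter responds only where the image goes from dark to bright, ReLU discards the negative responses from the opposite edge, and pooling shrinks the 4x4 feature map to 2x2 while keeping the strongest response in each window.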

Image Classification with Python

While developing an image classification model can be daunting, installing TensorFlow with the pip package manager is a good place to start. The following sections review the basics of TensorFlow as well as a summary of the sample image classification tutorial.

What is TensorFlow?

Before we jump into the Python script, it’s important to understand what TensorFlow is and how the package will enable you to develop your image classifier.

At a high level, TensorFlow is an end-to-end platform for machine learning, which supports things like multidimensional-array-based numeric computation, GPU and distributed processing, automatic differentiation, model development, model training, and model export.

There is a ton of information in the TensorFlow guides available online, but for the purpose of this tutorial, we’re going to skip to the application for an image classifier. Therefore, the major focus of the following sections is utilizing the Keras library. However, we’ll need to import a few other libraries as well:

Import libraries

import matplotlib.pyplot as plt
import numpy as np
import PIL
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

Load the Data

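Next, you’ll want to load your dataset. This tutorial references a dataset with 3700 flower images. The Keras utility tf.keras.utils.image_dataset_from_directory can turn a directory of images on disk into a tf.data.Dataset with the following code:

# data_dir and every value except image_size are assumed placeholders
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir, validation_split=0.2, subset='training', seed=123,
    image_size=(256, 256), batch_size=32)
class_names = train_ds.class_names  # label names inferred from the subfolder names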

As you can see, there are places to set your parameters, including batch_size, image_size, and validation_split.

The next step is normalizing the data, which rescales the pixel values from [0, 255] to [0, 1]. The basic structure is demonstrated below:

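normalization_layer = layers.Rescaling(1./255)
# Apply the rescaling to every (image, label) batch; train_ds is the dataset loaded above
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]

# The pixel values should now fall in [0, 1]
print(np.min(first_image), np.max(first_image))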

Build the Model

Now that you have your data ready, it’s time to build an initial model. For this demonstration, we’ll review the Keras Sequential model:

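num_classes = len(class_names)
img_height, img_width = 256, 256  # matches the image_size used when loading the data

# The pooling, Flatten, and output layers are an assumed completion of this model
model = Sequential([
    layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),  # downsample after each convolution
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),  # turn the feature maps into a vector for the FC layers
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes),  # one raw score (logit) per class
])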

The next bit of code compiles the model with an optimizer and loss function.

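# Sparse loss with from_logits=True assumes integer labels and raw logit outputs
model.compile(optimizer='rmsprop', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))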

There are different options available for the optimizer and loss function, which can be found in the Keras library documentation.
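To give a sense of what a cross-entropy loss measures, here is a minimal plain-Python sketch that scores one prediction against its true label. The logits are hypothetical, and the helper names softmax and cross_entropy_from_logits are made up for this illustration; Keras ships its own optimized implementations.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(true_index, logits):
    """Negative log-probability the model assigns to the correct class."""
    return -math.log(softmax(logits)[true_index])

logits = [2.0, 0.5, -1.0]  # hypothetical model outputs for three classes

loss_if_correct = cross_entropy_from_logits(0, logits)  # true class scored highest
loss_if_wrong = cross_entropy_from_logits(2, logits)    # true class scored lowest
print(round(loss_if_correct, 3), round(loss_if_wrong, 3))
```

The loss is small when the model puts most of its probability on the correct class and grows quickly when it does not, which is the signal the optimizer uses to adjust the weights.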

Model Training

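Next, to fit the model to the training data, we call model.fit and define the number of epochs (the number of times the entire dataset is passed forward and backward through the neural network).

epochs = 10  # assumed example value; tune it for your data
history = model.fit(train_ds, epochs=epochs)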
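To get a feel for how much work an epoch represents, the following sketch counts the weight updates performed during training. Only the 3700-image dataset size comes from this tutorial; the validation split, batch size, and epoch count are assumed values, and the exact split rounding inside Keras may differ slightly.

```python
import math

num_images = 3700        # dataset size mentioned in this tutorial
validation_split = 0.2   # assumed fraction held out for validation
batch_size = 32          # assumed batch size
epochs = 10              # assumed number of epochs

num_train = num_images - int(num_images * validation_split)
steps_per_epoch = math.ceil(num_train / batch_size)  # the last batch may be partial
total_updates = steps_per_epoch * epochs

print(num_train, steps_per_epoch, total_updates)
```

Under these assumptions, each epoch runs one forward and backward pass per batch, so the number of updates grows linearly with the number of epochs.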

Predict on New Data

When you feel that your model is ready, load the new dataset and make predictions:

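# img_array: a batch holding one image, shaped (1, img_height, img_width, 3)
predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])  # convert logits to probabilities

print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(score)], 100 * np.max(score))
)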

While this was a basic introduction to image classification with TensorFlow, the hope is that it can help get a team started with CNNs and deep learning.

iMerit Computer Vision Solutions

iMerit collaborates to deploy AI and Machine Learning in Autonomous Technology, Geospatial Technology, Medical AI, and other industries. Our solutions have labeled, annotated, enriched, and segmented over 100 million images and videos that power Computer Vision algorithms.

If you’d like to learn how iMerit can augment your deep learning projects, please contact us to talk to an expert.