An Ultimate Guide to Understanding Semantic Segmentation

by | Jan 5, 2020 | Machine Learning


Semantic segmentation has gained prominence in recent years. Nowadays, there is a lot of discussion about self-driving cars. How on earth can a car drive on its own? We read about road accidents in the newspapers almost every day. If cars with drivers can cause accidents, how can we expect driverless cars to drive safely?

You may be astonished to learn that they can. One key reason is that they use semantic segmentation to understand the images captured by their cameras, and thereby negotiate obstacles correctly. You may still wonder whether this is possible. Remember the famous quote, “It always seems impossible until it’s done.”

In this semantic segmentation tutorial, we shall first discuss what semantic segmentation is.

What is semantic segmentation?

All of us have heard about pixels in an image. As humans, we can quickly identify the different objects in a picture. We can easily distinguish a tree from a man and a car from a bicycle, and it takes us only a fraction of a second. Machines do not have this kind of perception; they have to learn to interpret raw pixel values. Semantic segmentation is one technique that helps them understand an image by labeling its pixels.

In simple words, semantic segmentation is the process of assigning each pixel in an image to a class label. These labels could include people, cars, flowers, trees, buildings, roads, animals, and so on. The set of classes depends on the application.

Thus, it is image classification at the pixel level. Accordingly, if there are many people in an image, semantic segmentation assigns the same "person" label to all of them. However, there is a separate concept known as instance segmentation, which also distinguishes the individual instances of an object in an image. This is handy for counting footfalls at a specific location, such as a city mall.
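To make "classification at the pixel level" concrete, here is a minimal NumPy sketch. It assumes some model has already produced a score for every class at every pixel, stored as an array of shape (height, width, number of classes); the class list and the function name `logits_to_label_map` are hypothetical. Picking the highest-scoring class at each pixel yields the label map, and two separate "person" regions receive the same label.

```python
import numpy as np

# Hypothetical label set: index 0 = background, 1 = person, 2 = car.
CLASS_NAMES = ["background", "person", "car"]


def logits_to_label_map(logits):
    """Turn per-pixel class scores of shape (H, W, C) into an (H, W) label map."""
    # Each pixel is classified independently: pick the highest-scoring class.
    return logits.argmax(axis=-1)


# A tiny 2x2 "image" with made-up scores.
logits = np.zeros((2, 2, len(CLASS_NAMES)))
logits[0, 0, 1] = 5.0  # a person pixel
logits[1, 1, 1] = 5.0  # another, separate person pixel
logits[0, 1, 2] = 5.0  # a car pixel
label_map = logits_to_label_map(logits)
print(label_map)  # both person pixels get label 1; the unscored pixel is background
```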

It has applications in various fields. But before we look into those, let us first understand how semantic segmentation is performed.

Download Detailed Curriculum and Get Complimentary access to Orientation Session

Date: 13th Feb, 2021 (Saturday)
Time: 10:30 AM - 11:30 AM (IST/GMT +5:30)

A semantic segmentation tutorial

We shall explore popular methods for performing semantic segmentation, using both classical and deep learning-based approaches.

Classical Methods for performing semantic segmentation

Today, most semantic segmentation systems are based on deep learning. However, before this era, people used classical techniques to segment images into regions of interest.

Gray Level Segmentation

It is the simplest form of semantic segmentation, as it involves hard-coded rules that a region must satisfy to be assigned a specific label. You can frame such rules using pixel properties such as gray-level intensity. The Split and Merge algorithm uses this technique: it recursively splits the image into sub-regions until each sub-region satisfies a rule and can be assigned a label. It then merges adjacent sub-regions that have the same label.
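The sketch below illustrates the split-and-merge idea in NumPy on a grayscale image. The specific rules are assumptions chosen for illustration, not taken from any particular implementation: a block counts as homogeneous when its gray-level range is at most `tol`, and a homogeneous block is labeled "bright" (1) if its mean intensity exceeds `level`, otherwise "dark" (0). Writing every leaf's label into one shared label map plays the role of the merge step, since adjacent leaves with the same label end up as a single labeled region. The function names are hypothetical.

```python
import numpy as np


def split(img, r0, c0, h, w, tol, leaves):
    """Recursively split a block into quadrants until it is homogeneous."""
    block = img[r0:r0 + h, c0:c0 + w]
    # Hard-coded homogeneity rule: the gray-level range must not exceed `tol`.
    if int(block.max()) - int(block.min()) <= tol or h == 1 or w == 1:
        leaves.append((r0, c0, h, w))
        return
    h2, w2 = h // 2, w // 2
    for dr, dc, hh, ww in ((0, 0, h2, w2), (0, w2, h2, w - w2),
                           (h2, 0, h - h2, w2), (h2, w2, h - h2, w - w2)):
        split(img, r0 + dr, c0 + dc, hh, ww, tol, leaves)


def split_and_merge(img, tol=10, level=128):
    """Return the homogeneous leaf blocks and a merged binary label map."""
    leaves = []
    split(img, 0, 0, img.shape[0], img.shape[1], tol, leaves)
    labels = np.zeros(img.shape, dtype=int)
    for r0, c0, h, w in leaves:
        # Labeling rule: a leaf is class 1 ("bright") if its mean exceeds `level`.
        labels[r0:r0 + h, c0:c0 + w] = int(img[r0:r0 + h, c0:c0 + w].mean() > level)
    # Merge: adjacent leaves with the same label now form one region in `labels`.
    return leaves, labels


# Usage: an 8x8 image whose left half is dark and right half is bright.
img = np.zeros((8, 8), dtype=np.uint8)
img[:, 4:] = 200
leaves, labels = split_and_merge(img)
print(len(leaves))  # number of homogeneous blocks found by the split step
```

Because both rules depend on fixed thresholds (`tol`, `level`), they only work when gray level alone separates the classes.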

Image: Semantic Segmentation (Source – The University of Warwick)

However, this method has a drawback: it requires hard-coded rules. It is also difficult to represent complicated classes, such as humans, with gray-level information alone. Doing so requires feature extraction and optimization techniques.

Conditional random fields

Alternatively, you can try segmenting an image by training a model to assign a class to each pixel independently. This technique has issues as well: if the model is not perfect, the segmentation can be noisy. One way of rectifying this problem is to use a prior relationship among pixels. Because objects are usually spatially continuous, nearby pixels tend to have the same label. Conditional Random Fields (CRFs) are useful for modeling such relationships.

CRFs are a tool for structured prediction: they take neighboring context, such as the relationships between pixels, into account when making predictions. In the same way, a cat's pixels can be distinguished from a dog's, since the two animals form separate regions rather than one continuous object. There are two common variants, Grid CRF and Dense CRF. In a Grid CRF, only pairs of immediately neighboring pixels are connected, whereas in a Dense CRF, all pairs of pixels in the image are connected.

The Grid CRF tends to oversmooth the segmentation around object boundaries.

The Dense CRF recovers finer boundary details.
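A Grid CRF can be sketched in NumPy with mean-field inference. This is a simplified version under stated assumptions: the unary term comes from per-pixel class probabilities `probs` produced by some classifier, and the pairwise term is a Potts model with a single constant weight `w` between 4-connected neighbors. A Dense CRF would instead connect every pair of pixels and make the pairwise weight depend on pixel color and position; that version is not shown. The function name is hypothetical.

```python
import numpy as np


def grid_crf_mean_field(probs, w=1.0, n_iters=5):
    """Smooth (H, W, C) class probabilities with a 4-connected Potts grid CRF."""
    log_unary = np.log(np.clip(probs, 1e-8, 1.0))
    q = probs.copy()  # current belief about each pixel's label
    for _ in range(n_iters):
        # Sum the neighbors' beliefs (up, down, left, right); borders see zeros.
        padded = np.pad(q, ((1, 1), (1, 1), (0, 0)))
        neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                 + padded[1:-1, :-2] + padded[1:-1, 2:])
        # Potts prior: a label gains score when the neighbors believe in it too.
        logits = log_unary + w * neigh
        logits -= logits.max(axis=-1, keepdims=True)
        q = np.exp(logits)
        q /= q.sum(axis=-1, keepdims=True)
    return q.argmax(axis=-1)


# A 5x5 map that is confidently class 1 except for one noisy pixel.
probs = np.zeros((5, 5, 2))
probs[..., 0] = 0.1
probs[..., 1] = 0.9
probs[2, 2] = [0.6, 0.4]  # the per-pixel classifier prefers class 0 here
labels = grid_crf_mean_field(probs)
print(probs.argmax(axis=-1)[2, 2], labels[2, 2])  # noisy pixel before vs. after
```

The neighbors' agreement outweighs the weak unary preference, so the isolated pixel is relabeled; the same mechanism is what oversmooths genuine thin structures near boundaries.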

Image: Semantic Segmentation (Source – Carnegie Mellon University)

We have seen the classical methods for semantic segmentation. Today, they are rarely used on their own, because deep learning models learn useful features directly from data instead of relying on hand-crafted rules.

Deep Learning Methods for semantic segmentation networks

Deep Learning has made semantic segmentation much simpler to perform. Here are some model architectures commonly used for this task.

Model Architectures

We shall now look at some of the model architectures available today in this semantic segmentation tutorial.

Fully Convolutional Network

The Fully Convolutional Network (FCN) is one of the simplest and most widely used architectures for semantic segmentation. A series of convolutions first downsamples the input image to a smaller feature map; this part is known as the encoder. The encoder output is then upsampled through bilinear interpolation or transposed convolutions; this part is known as the decoder.

FCN is a capable architecture, but it has drawbacks. The outputs of the transposed convolution (deconvolution) operation overlap unevenly when the kernel size is not divisible by the stride, which produces checkerboard artifacts. The information lost during encoding also reduces the resolution of the predictions at object boundaries.
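The uneven overlap is easy to reproduce with a naive single-channel transposed convolution in NumPy (the function below is a hypothetical teaching sketch, not a library API). With an all-ones input and an all-ones kernel, every output value simply counts how many kernel "stamps" overlap at that position. A 3x3 kernel with stride 2 overlaps unevenly and produces a checkerboard, whereas a 4x4 kernel with stride 2 overlaps evenly in the interior.

```python
import numpy as np


def transposed_conv2d(x, kernel, stride):
    """Naive single-channel transposed convolution without padding."""
    k = kernel.shape[0]
    h, w = x.shape
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            # Each input pixel stamps a scaled copy of the kernel onto the output.
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out


x = np.ones((4, 4))
uneven = transposed_conv2d(x, np.ones((3, 3)), stride=2)  # 3 is not divisible by 2
even = transposed_conv2d(x, np.ones((4, 4)), stride=2)    # 4 is divisible by 2
print(uneven[2:7, 2:7])  # alternating values: a checkerboard pattern
print(even[2:8, 2:8])    # a constant block: the overlap is uniform
```

This is why choosing a kernel size divisible by the stride, or upsampling by interpolation followed by an ordinary convolution, is often used to reduce these artifacts.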

Several architectures build on the FCN model to address these drawbacks.

Image: Semantic Segmentation Tutorial (Source – Wikipedia)

U-Net

U-Net builds on the FCN architecture. It adds skip connections from the output of each convolution block to the input of the transposed convolution block at the same level. These connections let gradients flow more smoothly and give the decoder information from multiple scales of the image. Features from the deeper layers carry more semantic context, which helps classification, whereas the high-resolution features from the earlier layers help the model localize object boundaries precisely.
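At the level of array shapes, a U-Net skip connection is a concatenation along the channel axis. The NumPy sketch below assumes (C, H, W) feature maps and uses 2x2 average pooling as a stand-in for the encoder's downsampling and nearest-neighbor upsampling as a stand-in for the transposed convolution. The convolutions that a real U-Net applies around each step are omitted, and all function names are hypothetical.

```python
import numpy as np


def avg_pool2x(x):
    """2x2 average pooling of a (C, H, W) feature map (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def skip_merge(encoder_features, decoder_features):
    """Upsample the deeper decoder features and attach the skip connection."""
    up = upsample2x(decoder_features)
    # The high-resolution encoder features reach the decoder unchanged.
    return np.concatenate([encoder_features, up], axis=0)


rng = np.random.default_rng(0)
enc = rng.random((16, 64, 64))  # encoder output at full resolution
deeper = avg_pool2x(enc)        # (16, 32, 32) after downsampling
merged = skip_merge(enc, deeper)
print(merged.shape)  # (32, 64, 64)
```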

Tiramisu Model

The Tiramisu Model is broadly similar to U-Net, but it uses Dense Blocks for the convolution and transposed convolution stages. In a dense block, each convolutional layer receives the feature maps of all preceding layers as its input, which lets the network reuse features and improves the output.

However, this method has an issue as well: concatenating the feature maps of all preceding layers consumes a lot of memory, so it requires a GPU with large memory to train efficiently.
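A DenseNet-style dense block can be sketched in NumPy to show both the feature reuse and the memory cost. Each layer receives the concatenation of all earlier feature maps and adds `growth_rate` new ones, so the channel count grows linearly with depth. Assumptions: feature maps are (C, H, W) arrays, and each layer is a random 1x1 convolution followed by ReLU, a hypothetical stand-in for the batch normalization and 3x3 convolution a real layer would use. The function name is also hypothetical.

```python
import numpy as np


def dense_block(x, n_layers, growth_rate, rng):
    """Apply a DenseNet-style block to a (C, H, W) feature map."""
    features = [x]
    for _ in range(n_layers):
        # Input to this layer: every feature map produced so far, stacked on channels.
        inp = np.concatenate(features, axis=0)
        # Stand-in layer: random 1x1 convolution followed by ReLU.
        weights = rng.standard_normal((growth_rate, inp.shape[0]))
        new = np.maximum(np.einsum("oc,chw->ohw", weights, inp), 0.0)
        features.append(new)
    return np.concatenate(features, axis=0)


rng = np.random.default_rng(0)
x = rng.random((8, 16, 16))
out = dense_block(x, n_layers=4, growth_rate=12, rng=rng)
print(out.shape)  # 8 input channels + 4 layers x 12 new channels = 56 channels
```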

Multiscale Methods

Some deep learning models incorporate information from multiple scales. One such example is the Pyramid Scene Parsing Network, also known as PSPNet. It applies pooling at four different scales to the output feature map of a CNN; the PSPNet paper pools the feature map into grids of 1x1, 2x2, 3x3, and 6x6 bins.

Subsequently, it upsamples the pooled outputs to the size of the CNN output feature map, using techniques like bilinear interpolation, and concatenates them with that feature map along the channel axis. It performs a final convolution on this concatenated output to generate the prediction.
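The pyramid pooling step can be sketched in NumPy as follows. Assumptions: the CNN feature map is a (C, H, W) array, the bin sizes are 1, 2, 3, and 6 (the grids used in the PSPNet paper), nearest-neighbor upsampling replaces bilinear interpolation to keep the code short, and the 1x1 convolutions that PSPNet applies to each pooled branch to reduce channels are omitted, as is the final convolution. The function names are hypothetical.

```python
import numpy as np


def bin_bounds(i, size, bins):
    """Start/end indices of bin i when splitting `size` pixels into `bins` bins."""
    start = (i * size) // bins
    end = -((-(i + 1) * size) // bins)  # ceiling division
    return start, end


def adaptive_avg_pool(x, bins):
    """Average-pool a (C, H, W) map down to a (C, bins, bins) grid."""
    c, h, w = x.shape
    out = np.empty((c, bins, bins))
    for i in range(bins):
        r0, r1 = bin_bounds(i, h, bins)
        for j in range(bins):
            c0, c1 = bin_bounds(j, w, bins)
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def upsample_nearest(x, h, w):
    """Resize a (C, b, b) grid back to (C, h, w) by nearest-neighbor lookup."""
    rows = (np.arange(h) * x.shape[1]) // h
    cols = (np.arange(w) * x.shape[2]) // w
    return x[:, rows][:, :, cols]


def pyramid_pooling(features, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the feature map with its pooled-and-upsampled versions."""
    _, h, w = features.shape
    pooled = [upsample_nearest(adaptive_avg_pool(features, b), h, w)
              for b in bin_sizes]
    return np.concatenate([features] + pooled, axis=0)


feats = np.random.default_rng(0).random((4, 12, 12))
out = pyramid_pooling(feats)
print(out.shape)  # 4 original channels + 4 branches x 4 channels = 20 channels
```

The 1x1 branch is simply the global average of each channel broadcast over the whole map, which is how the module injects image-level context into every pixel.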

Another example is atrous (dilated) convolution, which offers an efficient way to combine features from multiple scales without increasing the number of parameters. Increasing the dilation rate spaces out the filter's weights, so the same filter covers a wider area of the input.

DeepLabv3 is one well-known use of atrous convolution. It applies atrous convolutions with different dilation rates to capture information from multiple scales without reducing the spatial resolution of the feature maps.
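A one-dimensional NumPy sketch shows why atrous convolution widens the view without adding parameters: the same three weights are applied to input samples spaced `rate` apart, so one output covers (k - 1) * rate + 1 inputs. The function name is hypothetical; like most CNN libraries, it computes cross-correlation, here with no padding. DeepLabv3 applies the same idea in 2-D, with several rates.

```python
import numpy as np


def dilated_conv1d(x, kernel, rate):
    """'Valid' 1-D atrous convolution (cross-correlation, no padding)."""
    k = len(kernel)
    span = (k - 1) * rate + 1  # receptive field of a single output
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        # Take k input samples spaced `rate` apart and apply the same k weights.
        out[i] = np.dot(x[i:i + span:rate], kernel)
    return out


x = np.arange(10.0)
kernel = np.array([1.0, 0.0, -1.0])  # still only 3 parameters at every rate
for rate in (1, 2, 4):
    print(rate, dilated_conv1d(x, kernel, rate))
```

With this difference kernel, each output compares two samples that are further apart as the rate grows, so wider context is captured with the same three weights.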

Image: Semantic Segmentation (Source – MIT)

Hybrid CNN-CRF Methods

Some semantic segmentation networks use a CNN to produce per-pixel class predictions and then use these as the unary potentials of a Dense CRF. This hybrid method is successful because the CRF refines the CNN's coarse predictions by modeling inter-pixel relationships.

Loss Functions

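We have seen the model architectures. Now, we shall look at the role of loss functions. Because semantic segmentation makes a prediction at every pixel, its loss is computed over all pixels, and some loss functions are designed specifically for the class imbalance that is common in segmentation. Here are some of them.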

Pixel-wise Softmax with Cross-Entropy

Here, the label map has the same spatial size as the original image, so it can be represented in one-hot encoded form, with one channel per class. This one-hot label serves as the target for calculating cross-entropy. Softmax must be applied pixel-wise, that is, over the classes at each pixel, before computing the cross-entropy.
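A minimal NumPy sketch of the pixel-wise softmax cross-entropy, assuming the network outputs raw scores (logits) of shape (H, W, C) and the ground truth is an (H, W) map of integer class ids; the function names are hypothetical. The softmax is taken over the class axis at every pixel, the label map is one-hot encoded to the same shape, and the per-pixel losses are averaged.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (class) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def pixelwise_cross_entropy(logits, labels):
    """Mean cross-entropy between (H, W, C) logits and an (H, W) integer label map."""
    log_probs = log_softmax(logits)             # softmax applied at every pixel
    one_hot = np.eye(logits.shape[-1])[labels]  # (H, W) -> (H, W, C)
    per_pixel = -(one_hot * log_probs).sum(axis=-1)
    return per_pixel.mean()


labels = np.array([[0, 1], [2, 1]])
uniform = np.zeros((2, 2, 3))         # the model has no preference
confident = 10.0 * np.eye(3)[labels]  # large score on the true class
print(pixelwise_cross_entropy(uniform, labels))    # log(3), about 1.0986
print(pixelwise_cross_entropy(confident, labels))  # close to 0
```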

Focal Loss

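Focal Loss modifies the standard cross-entropy loss by down-weighting pixels that are already classified well, so that training concentrates on the hard examples. This makes it especially useful in cases with extreme class imbalance.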
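Focal loss scales each pixel's cross-entropy term by (1 - p_t)^gamma, where p_t is the predicted probability of the true class, so pixels that are already classified confidently contribute little. The NumPy sketch below assumes (H, W, C) logits and an (H, W) integer label map; the optional class-balancing weight alpha from the original paper is left out, and the function names are hypothetical. With gamma = 0 it reduces to ordinary pixel-wise cross-entropy.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (class) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def focal_loss(logits, labels, gamma=2.0):
    """Mean focal loss for (H, W, C) logits and an (H, W) integer label map."""
    log_probs = log_softmax(logits)
    # log p_t: the log-probability assigned to the true class at each pixel.
    log_pt = np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    pt = np.exp(log_pt)
    # Easy pixels (p_t close to 1) are down-weighted by the modulating factor.
    return np.mean(-((1.0 - pt) ** gamma) * log_pt)


labels = np.array([[0, 1], [2, 1]])
logits = np.zeros((2, 2, 3))  # uniform prediction: p_t = 1/3 everywhere
print(focal_loss(logits, labels, gamma=0.0))  # same as cross-entropy: log(3)
print(focal_loss(logits, labels, gamma=2.0))  # scaled by (2/3) ** 2
```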

Dice Loss

It is a widely used loss function for semantic segmentation problems with extreme class imbalance. Dice loss is based on the Dice coefficient, which measures the overlap between the predicted region and the ground-truth region of a class.
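For a binary mask, the Dice coefficient between a prediction A and the ground truth B is 2|A ∩ B| / (|A| + |B|), and Dice loss is 1 minus that value. The NumPy sketch below uses the common "soft" form, in which predicted foreground probabilities stand in for the hard mask; the small `eps` smoothing term and the function name are assumptions for illustration. Because the score is normalized by the sizes of the two regions rather than by the total number of pixels, a small foreground object is not swamped by a large background.

```python
import numpy as np


def soft_dice_loss(probs, target, eps=1e-6):
    """1 - Dice coefficient between foreground probabilities and a binary mask."""
    intersection = np.sum(probs * target)
    total = np.sum(probs) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


target = np.zeros((4, 4))
target[0, :] = 1.0  # ground truth: the top row (4 pixels)
half = np.zeros((4, 4))
half[0, 2:] = 1.0   # 2 of the 4 predicted pixels are correct...
half[1, :2] = 1.0   # ...and 2 are wrong
print(soft_dice_loss(target, target))  # about 0 (perfect overlap)
print(soft_dice_loss(half, target))    # about 0.5
```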

Applications of semantic segmentation

We have seen the various deep learning methods for semantic segmentation networks. We shall now look at some of the popular real-life applications to understand the concept better.

Self-driving cars

The most popular use of semantic segmentation networks is autonomous driving. Using this technology, self-driving cars can distinguish between lanes, vehicles, people, and other obstacles, which helps guide the vehicle properly.

One challenge for autonomous vehicles is that semantic segmentation must run in real time. One way to ensure this is to integrate a GPU into the car. Neural network architectures designed for fast inference can also improve performance.

Medical Image Segmentation

Semantic segmentation has tremendous utility in the medical field to identify salient elements in medical scans. It is instrumental in detecting tumors. It is also valuable for finding the number of blockages in the cardiac arteries and veins.

Scene Understanding

Scene understanding algorithms use semantic segmentation to work out what an image contains and where. It forms the basis for complicated tasks like Visual Question Answering.

Fashion Industry

Semantic segmentation has excellent uses in the fashion industry, where it can extract clothing items from an image so that retail shops can suggest similar items. It is also used for re-dressing particular items of clothing in an image.

Satellite Image Processing

With semantic segmentation, it is possible to distinguish between land and water bodies in satellite images. It is also possible to map roads, identify traffic, find free parking spaces, and so on. In mapping applications, this can help identify busy streets and guide drivers through less congested areas.

In this semantic segmentation tutorial, we have seen various applications of semantic segmentation networks. We shall now proceed further into the topic and understand the difference between instance segmentation and semantic segmentation.

Image: Semantic Segmentation Tutorial (Source – Aero News Network)

What is semantic segmentation, and how is it different from instance segmentation?

We have seen that semantic segmentation is a technique that detects the object category for each pixel. Thus, it is a broad technique that labels all objects of the same class in the same way. For instance, if there are several cars in an image, it marks them all as car objects.

Instance segmentation goes deeper and separates the instances from one another in addition to identifying the category. Thus, it distinguishes between different objects of the same class.
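Learned instance segmentation models are beyond a short example, but a naive stand-in shows the difference in output. The NumPy sketch below (function name and class id hypothetical) takes the pixels of one class from a semantic label map and gives each 4-connected blob its own instance id. It separates cars that do not touch, but two cars whose pixels touch collapse into a single blob, which is why real instance segmentation methods learn to separate objects rather than relying on connectivity alone.

```python
import numpy as np
from collections import deque


def connected_instances(mask):
    """Assign ids 1, 2, ... to the 4-connected blobs of True pixels in `mask`."""
    h, w = mask.shape
    instances = np.zeros((h, w), dtype=int)
    count = 0
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or instances[r, c]:
                continue
            count += 1  # unvisited foreground pixel: start a new instance
            instances[r, c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of this blob
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not instances[ny, nx]:
                        instances[ny, nx] = count
                        queue.append((ny, nx))
    return instances, count


CAR = 1  # hypothetical class id for "car" in a semantic label map
semantic = np.zeros((4, 7), dtype=int)
semantic[1:3, 0:2] = CAR  # first car
semantic[1:3, 4:6] = CAR  # second car, separated by background
_, n_apart = connected_instances(semantic == CAR)

semantic[1:3, 2:4] = CAR  # fill the gap so the two cars touch
_, n_touching = connected_instances(semantic == CAR)
print(n_apart, n_touching)  # 2 1
```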

This semantic segmentation tutorial now moves towards looking at its advantages and disadvantages.

Image: Semantic Segmentation vs Instance Segmentation (Source – Analytics Vidhya)

Advantages

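1) It identifies the different objects in an image at the pixel level, using cues such as color and texture.

2) Because every pixel is assigned to a class, the image becomes easier to analyze.

3) It has tremendous utility in designing self-driving cars and in the healthcare sector.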

3) It has tremendous utility in designing self-driving cars and the healthcare sector.

Disadvantages

1) The concept is a broad one because it treats all objects of the same class in an image alike.

2) Neighboring pixels of the same class could belong to different objects, but semantic segmentation cannot make this distinction. Instance segmentation can come to your rescue in such circumstances.

Final thoughts

Deep learning has made semantic segmentation practical and paved the way for its adoption in real-life applications. For example, it helps doctors and radiologists locate tumors deep inside the body.

It also helps in weather forecasting, as it can distinguish between regular cloud activity and water-laden cloud activity. It helps weather forecasters track cyclones and predict their path better. It also plays a tremendous role in satellite imaging by identifying dense traffic areas and marking them with a distinct hue in the maps. Thus, semantic segmentation is the way forward in today’s technology-driven world.

Are you inspired by the opportunities in Deep Learning and Data Science? If you decide to learn data science, you will have ample job prospects in numerous industries. Enroll in Digital Vidya’s Data Science Course to create a strong foundation in Data Science & build a successful career as a Data Scientist.
