Semantic segmentation has gained prominence in recent times. Nowadays, there is a lot of discussion about self-driving cars. How on earth can a car drive on its own? We read about some road accident or the other in the newspapers almost every day. If cars with drivers can cause accidents, how can we expect driverless cars to drive safely?
You will be astonished to know that they can. It is because they use semantic segmentation techniques to identify the objects in an image, and thereby negotiate obstacles correctly. Now, you will wonder if it is possible. Remember the famous quote, “It always seems impossible until it’s done.”
We shall now discuss what semantic segmentation is in this semantic segmentation tutorial.
What is semantic segmentation?
All of us have heard about pixels in an image. As humans, it is not a challenge for us to identify different objects in a picture quickly. We can distinguish a tree from a man and a car from a bicycle easily. It takes a fraction of a second for us to do that. However, machines do not have this sensory perception. They follow a set of rules. One technique that helps them identify objects by labeling the pixels in an image is known as semantic segmentation.
In simple words, semantic segmentation can be defined as the process of linking each pixel in a particular image to a class label. These labels could include people, cars, flowers, trees, buildings, roads, animals, and so on. The list is endless.
Thus, it is image classification at the pixel level. Accordingly, if you have many people in an image, semantic segmentation will label all of them as people, without telling one person from another. However, there is a separate concept known as instance segmentation that can label each individual instance of an object in an image. This concept is handy for counting footfalls in a specific location such as a city mall.
Semantic segmentation has applications in various fields. But before we look into that, let us first understand semantic segmentation networks.
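To make the idea of one class label per pixel concrete, here is a minimal sketch in Python with NumPy. The class list and the score values are hypothetical and exist only for this illustration. A real model would produce the scores.

```python
import numpy as np

# Hypothetical class list, chosen only for this illustration.
CLASS_NAMES = ["background", "person", "car"]

# Hypothetical per-pixel class scores for a 2x3 image,
# shape (height, width, num_classes). A real model would produce these.
scores = np.array([
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]],
])

# Semantic segmentation output: one class index per pixel.
label_map = scores.argmax(axis=-1)
print(label_map)
# [[0 1 1]
#  [0 2 2]]
```

The output has the same height and width as the image, and every entry is an index into the class list.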
A semantic segmentation tutorial
We shall explore popular methods to perform semantic segmentation using the classical and deep learning-based approaches.
Classical Methods for performing semantic segmentation
Nowadays, most practitioners use deep learning-based methods for semantic segmentation. However, before this era, people used classical techniques to segment images into regions of interest.
Gray Level Segmentation
It is the simplest of all forms of semantic segmentation, as it involves hard-coded rules that a region should satisfy to be assigned a specific label. You can use the pixel’s properties, like gray-level intensity, to frame such rules. The Split and Merge algorithm uses this technique: it recursively splits the image into sub-regions until it can assign a label to each one. Subsequently, it merges adjacent sub-regions that have the same label.
However, this method has an issue, as it requires hard-coded rules. It is also a challenge to represent complicated classes such as humans with gray-level information alone. Representing such classes calls for feature extraction and optimization techniques.
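A hard-coded gray-level rule can be as simple as a fixed threshold. The sketch below is not the Split and Merge algorithm. It only illustrates the kind of rule such methods rely on, and the function name and the threshold of 128 are assumptions made for this example.

```python
import numpy as np

def threshold_segment(gray, threshold=128):
    """Label pixels at or above `threshold` as 1 (foreground), others as 0."""
    return (gray >= threshold).astype(np.uint8)

# A tiny hypothetical grayscale image: a bright object on a dark background.
gray = np.array([
    [ 10,  20, 200],
    [ 15, 220, 210],
    [ 12,  18,  25],
], dtype=np.uint8)

mask = threshold_segment(gray)
print(mask)
# [[0 0 1]
#  [0 1 1]
#  [0 0 0]]
```

A rule like this works for a bright object on a dark background, but it shows why complicated classes cannot be described by intensity alone.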
Conditional random fields
Consider segmenting an image by training a model to assign a class to each pixel independently. If the model is not perfect, this can result in noisy segmentation. One way of rectifying such a problem is to consider a prior relationship among pixels. If the objects are continuous, nearby pixels should have the same labels. The Conditional Random Fields (CRF) concept is useful for modeling such relationships.
Thus, it becomes possible to distinguish a cat’s pixels from those of a dog, as the two objects will not be continuous. CRF is useful for structured prediction. It can consider neighboring context, such as the relationship between pixels, before making the predictions. There are two variants, Grid CRF and Dense CRF. The Grid CRF connects only pairs of pixels that are immediate neighbors, whereas the Dense CRF connects all pairs of pixels in the image.
The Grid CRF leads to over-smoothing of the image around the boundaries.
The Dense CRF recovers the subtle boundaries.
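The effect of a neighborhood prior can be sketched with a tiny Grid CRF. The example assumes a Potts pairwise term (a fixed penalty whenever two neighboring pixels disagree) and uses iterated conditional modes, a simple greedy inference scheme chosen here for brevity. Practical CRF implementations use more sophisticated inference. The function name and all cost values are hypothetical.

```python
import numpy as np

def icm_grid_smoothing(unary_cost, pairwise_weight=1.0, n_iters=5):
    """Greedily lower unary + Potts pairwise energy on a 4-connected grid.

    unary_cost: array (H, W, C), cost of giving each pixel each class.
    """
    h, w, c = unary_cost.shape
    labels = unary_cost.argmin(axis=-1)  # start from the per-pixel best guess
    for _ in range(n_iters):
        for y in range(h):
            for x in range(w):
                cost = unary_cost[y, x].copy()
                # Potts term: pay `pairwise_weight` per disagreeing neighbour.
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        cost += pairwise_weight * (np.arange(c) != labels[ny, nx])
                labels[y, x] = cost.argmin()
    return labels

# Hypothetical costs: every pixel prefers class 0, except a noisy centre pixel.
unary = np.tile(np.array([0.2, 0.8]), (3, 3, 1))
unary[1, 1] = [0.6, 0.4]
noisy = unary.argmin(axis=-1)
smoothed = icm_grid_smoothing(unary)
print(noisy[1, 1])     # 1: the isolated pixel disagrees with its neighbours
print(smoothed[1, 1])  # 0: the neighbourhood prior outvotes it
```

Because only immediate neighbors are connected, a large pairwise weight can also erase thin structures, which is the over-smoothing mentioned above.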
We have seen the classical methods for semantic segmentation. Nowadays, these methods are rarely used on their own because Deep Learning has made things easier.
Deep Learning Methods for semantic segmentation networks
Deep Learning has made it simple to perform semantic segmentation. Here are some of the model architectures that these deep learning methods use.
Model Architectures
We shall now look at some of the model architectures available today in this semantic segmentation tutorial.
Fully Convolutional Network
The Fully Convolutional Network (FCN) is the most straightforward and accessible architecture used for semantic segmentation. In this architecture, a series of convolutions downsamples the input image to a smaller size. This part is known as the encoder. The encoder output is then upsampled through bilinear interpolation or transposed convolutions. This part is known as the decoder.
FCN is a capable architecture, but it has its drawbacks. The uneven overlap of the outputs of the transposed convolution (deconvolution) operation results in checkerboard artifacts. The loss of information from encoding also results in a reduced resolution at the boundaries.
Here are some architectures that improve on the FCN model.
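The uneven overlap behind checkerboard artifacts can be seen in one dimension. The sketch below is an illustrative helper written for this example, not code from any FCN implementation. It feeds a constant signal through a transposed convolution with all-ones kernels and compares a kernel size that the stride does not divide with one that it does.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride):
    """1-D transposed convolution: each input value stamps a scaled copy
    of the kernel into the output, `stride` positions apart."""
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += value * kernel
    return out

x = np.ones(4)  # constant input, so any pattern comes from overlap alone
uneven = transposed_conv1d(x, np.ones(3), stride=2)  # 3 is not divisible by 2
even = transposed_conv1d(x, np.ones(4), stride=2)    # 4 is divisible by 2
print(uneven)  # [1. 1. 2. 1. 2. 1. 2. 1. 1.]
print(even)    # [1. 1. 2. 2. 2. 2. 2. 2. 1. 1.]
```

With kernel size 3 and stride 2, the interior of the output alternates between 2 and 1 even though the input is constant. In two dimensions, this alternation shows up as a checkerboard pattern.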
U-Net
U-Net is an upgrade to the FCN architecture. It adds skip connections from the output of each convolution block in the encoder to the input of the transposed convolution block at the same level in the decoder. These connections allow a smoother flow of gradients and provide information from multiple scales of the image. The coarse features from the deeper layers help the model classify better, whereas the high-resolution features from the shallower layers help it localize better.
Tiramisu Model
The Tiramisu Model is similar to the U-Net model, but it uses Dense Blocks for the convolutions and transposed convolutions. A Dense Block stacks several convolution layers so that the feature maps of all preceding layers serve as input to each subsequent layer. Thus, it improves the output.
However, there is an issue with this method as well. It requires a GPU with a large amount of memory to perform efficiently.
Multiscale Methods
Some deep learning models incorporate information from multiple scales. One such example is the Pyramid Scene Parsing Network, also known as PSPNet. It applies pooling with four different kernel sizes and strides to the output feature map of a CNN.
Subsequently, it upsamples the pooling outputs and the CNN output feature map to a common size by using techniques like bilinear interpolation, and concatenates them along the channel axis. It performs the final convolution on this concatenated output to generate the prediction.
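The pool, upsample, and concatenate steps can be sketched as follows. This is a simplified illustration and not the PSPNet implementation. The bin sizes (1, 2, 3, 6) and the feature map shape are assumptions for the example, nearest-neighbor repetition stands in for bilinear interpolation, and the learned convolutions are left out.

```python
import numpy as np

def pyramid_pool(features, bins=(1, 2, 3, 6)):
    """features: (C, H, W), with H and W divisible by every bin size."""
    c, h, w = features.shape
    branches = [features]
    for b in bins:
        kh, kw = h // b, w // b
        # Average-pool into a b x b grid (kernel size == stride == (kh, kw)).
        pooled = features.reshape(c, b, kh, b, kw).mean(axis=(2, 4))
        # Upsample back to (H, W); nearest-neighbour stands in for bilinear.
        upsampled = pooled.repeat(kh, axis=1).repeat(kw, axis=2)
        branches.append(upsampled)
    return np.concatenate(branches, axis=0)  # stack along the channel axis

# Hypothetical CNN output: 8 channels on a 6x6 grid.
features = np.random.default_rng(0).normal(size=(8, 6, 6))
fused = pyramid_pool(features)
print(fused.shape)  # (40, 6, 6)
```

Each of the four pooled branches contributes 8 channels, so the fused map has 8 + 4 x 8 = 40 channels. The coarsest branch holds one global average per channel.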
Another example is atrous (dilated) convolution, which offers an efficient way to combine features from multiple scales without increasing the number of parameters. Increasing the dilation rate spreads the weight values of the same filter farther apart, so that the filter covers a larger area of the input.
DeepLabv3 is one architecture that uses atrous convolution. It applies this method with different dilation rates to capture information from multiple scales without compromising on the size of the image.
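A one-dimensional sketch shows how the dilation rate widens the area a filter covers while the number of weights stays the same. The function below is a plain NumPy illustration written for this example, not code from DeepLabv3.

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """'Valid' 1-D convolution (cross-correlation) with dilation `rate`."""
    k = len(kernel)
    span = (k - 1) * rate + 1  # input positions covered by one output value
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        # Take every `rate`-th input sample inside the span.
        out[i] = np.dot(x[i : i + span : rate], kernel)
    return out

x = np.arange(10, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])  # the same three weights in both cases
dense = atrous_conv1d(x, kernel, rate=1)   # each output sees 3 inputs in a row
spread = atrous_conv1d(x, kernel, rate=2)  # each output spans 5 input positions
print(dense)   # [ 3.  6.  9. 12. 15. 18. 21. 24.]
print(spread)  # [ 6.  9. 12. 15. 18. 21.]
```

With a rate of 2, the three weights reach across five input positions, so the filter sees more context at no extra parameter cost.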
Hybrid CNN-CRF Methods
Some semantic segmentation networks use a CNN as a feature extractor and subsequently feed the features as unary potentials into a Dense CRF. This hybrid method is successful because of the ability of CRFs to model inter-pixel relationships.
Loss Functions
We have seen the model architectures. Now, we shall look at the role of loss functions. Unlike standard classifiers, which predict one label per image, semantic segmentation predicts a label for every pixel, so its loss functions are computed over all the pixels. Here are some of them.
Pixel-wise Softmax with Cross-Entropy
Here, the label map for semantic segmentation has the same height and width as the original image, with one class label per pixel. Therefore, it can be represented in a one-hot encoded form and used as the target for calculating cross-entropy. Make sure to apply the softmax pixel-wise, across the class scores of each pixel, before applying cross-entropy.
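The pixel-wise loss can be sketched in NumPy as follows. The function name and the array layout (height, width, classes) are assumptions made for this example. Deep learning frameworks provide their own equivalents.

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels):
    """logits: (H, W, C) raw scores; labels: (H, W) integer class ids."""
    # Numerically stable log-softmax over the class axis, per pixel.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # One-hot encode the label map so it has the same shape as the logits.
    one_hot = np.eye(logits.shape[-1])[labels]
    # Cross-entropy per pixel, averaged over the image.
    return -(one_hot * log_probs).sum(axis=-1).mean()

logits = np.zeros((2, 2, 3))  # a model that is undecided between 3 classes
labels = np.array([[0, 1], [2, 0]])
loss = pixelwise_cross_entropy(logits, labels)
print(loss)  # approximately ln(3), about 1.0986
```

An undecided model gives every class a probability of 1/3, so each pixel contributes ln(3) to the loss.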
Focal Loss
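Focal Loss is a modification of the standard cross-entropy loss, intended especially for cases with extreme class imbalance. It down-weights the loss of pixels that are already classified well, so that training concentrates on the hard ones.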
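A minimal sketch of the idea, assuming the common form that multiplies the cross-entropy of each pixel by (1 - p)^gamma, where p is the probability assigned to the true class. The function name, the default gamma of 2, and the example values are assumptions for this illustration, and the optional class-weighting factor is left out.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, eps=1e-7):
    """probs: (H, W, C) softmax outputs; labels: (H, W) integer class ids."""
    h, w = labels.shape
    # Probability the model assigned to the true class of each pixel.
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    p_true = np.clip(p_true, eps, 1.0)
    # Well-classified pixels (p_true near 1) get a weight near 0.
    return (-((1.0 - p_true) ** gamma) * np.log(p_true)).mean()

# Hypothetical 1x2 image: one easy pixel (p = 0.9) and one hard pixel (p = 0.3).
probs = np.array([[[0.9, 0.1], [0.3, 0.7]]])
labels = np.array([[0, 0]])
plain_ce = focal_loss(probs, labels, gamma=0.0)  # gamma = 0 is cross-entropy
focal = focal_loss(probs, labels, gamma=2.0)
print(plain_ce > focal)  # True
```

Setting gamma to 0 recovers the ordinary cross-entropy, which makes the two losses easy to compare on the same predictions.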
Dice Loss
Dice Loss is a widely used loss function in semantic segmentation problems that have an extreme class imbalance. It measures the overlap between the predicted region and the ground-truth region of a class, and the loss falls as the overlap grows.
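The overlap measure can be sketched for a single class as follows. The smoothing constant, which avoids division by zero on empty masks, and the function name are assumptions made for this example.

```python
import numpy as np

def dice_loss(pred, target, smooth=1.0):
    """pred: predicted foreground probabilities; target: binary ground truth."""
    pred = pred.ravel().astype(float)
    target = target.ravel().astype(float)
    intersection = (pred * target).sum()
    # Dice coefficient: twice the overlap divided by the total mask sizes.
    dice = (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
    return 1.0 - dice

target = np.array([[1, 1, 0, 0]])
perfect = dice_loss(np.array([[1, 1, 0, 0]]), target)
disjoint = dice_loss(np.array([[0, 0, 1, 1]]), target)
print(perfect)   # 0.0
print(disjoint)  # 0.8
```

Because the score depends on the overlap relative to the size of the masks, a small object is not swamped by the large number of background pixels.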
Applications
We have seen the various deep learning methods for semantic segmentation networks. We shall now look at some of the popular real-life applications to understand the concept better.
Self-driving cars
The most popular use of semantic segmentation networks is autonomous driving. Using this technology, self-driving cars can distinguish between lanes, vehicles, people, and other obstacles. It helps to guide the vehicle properly.
One challenge for autonomous vehicles is that semantic segmentation must run in real time. One way to ensure this is to integrate a GPU into the car. Lighter neural networks can also be used to improve the speed.
Medical Image Segmentation
Semantic segmentation has tremendous utility in the medical field to identify salient elements in medical scans. It is instrumental in detecting tumors. It is also valuable for finding the number of blockages in the cardiac arteries and veins.
Scene Understanding
Scene understanding algorithms use semantic segmentation to describe the contents of a scene. It forms the basis for complicated tasks like Visual Question Answering.
Fashion Industry
Semantic segmentation is very useful in the fashion industry, where clothing items can be extracted from an image in order to suggest similar items from retail shops. It is also used for re-dressing particular items of clothing in an image.
Satellite Image Processing
Using semantic segmentation, it is possible to distinguish between land and water bodies in satellite images. It is also possible to map roads to identify traffic, free parking space, and so on. It plays a vital role in Google Maps to identify busy streets, thereby guiding the driver through less vehicle-populated areas.
In this semantic segmentation tutorial, we have seen various applications of semantic segmentation networks. We shall now proceed further into the topic and understand the difference between instance segmentation and semantic segmentation.
What is semantic segmentation, and how is it different from instance segmentation?
We have seen that semantic segmentation is a technique that detects the object category for each pixel. Thus, it is a broad classification technique that labels similar-looking objects in the same way. For instance, if there are several cars in an image, it marks them all as car objects.
Instance segmentation goes deeper and separates the instances from one another besides identifying the category. Thus, it distinguishes between different instances of the same class.
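The difference between the two outputs can be illustrated with a toy example. The sketch below starts from a semantic mask and gives every connected blob of car pixels its own id. This is only a crude stand-in chosen for illustration. It is not how instance segmentation models work, and it fails whenever two objects of the same class touch. The function name and the tiny image are hypothetical.

```python
from collections import deque
import numpy as np

def split_into_instances(mask):
    """Give each 4-connected blob of True pixels its own id (1, 2, ...)."""
    h, w = mask.shape
    instance_ids = np.zeros((h, w), dtype=int)
    next_id = 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or instance_ids[sy, sx]:
                continue
            next_id += 1
            instance_ids[sy, sx] = next_id
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny, nx] and not instance_ids[ny, nx]):
                        instance_ids[ny, nx] = next_id
                        queue.append((ny, nx))
    return instance_ids

# Semantic output: 1 marks "car" pixels; both cars share the same label.
semantic = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
])
instances = split_into_instances(semantic == 1)
print(instances)
# [[1 1 0 0 2]
#  [1 1 0 0 2]
#  [0 0 0 0 0]]
```

The semantic map says only that these pixels are cars, while the instance map also says which car each pixel belongs to.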
This semantic segmentation tutorial now moves towards looking at its advantages and disadvantages.
Advantages
1) It helps identify different objects in an image based on features such as color and texture.
2) By segregating the objects of different classes, it makes the image easier to analyze.
3) It has tremendous utility in designing self-driving cars and the healthcare sector.
Demerits
1) The concept is a broad one because it treats all objects of the same class in an image identically.
2) The neighboring pixels of the same class could belong to different objects. However, semantic segmentation fails to identify the distinction. Instance segmentation can come to your rescue in such circumstances.
Final thoughts
Semantic segmentation is one of the tasks where deep learning techniques have proved their value, and this has helped pave the way for their adoption in real-life applications. It makes it easier for doctors and radiologists to locate tumors deep inside the body.
It also helps in weather forecasting, as it can distinguish between regular cloud activity and water-laden cloud activity. It helps weather forecasters track cyclones and predict their path better. It also plays a tremendous role in satellite imaging by identifying dense traffic areas and marking them with a distinct hue in the maps. Thus, semantic segmentation is the way forward in today’s technology-driven world.