Real-time Instance Segmentation – 3 Proposed Solutions

Camera systems sense the real world through multi-modal devices that capture a scene at different wavelengths.

To understand scenes and provide real-time assistance, these camera systems must employ computer vision techniques.

In applications such as 3D vision and autonomous driving, they rely on techniques like instance segmentation to produce refined, object-level segmentation.

Refined segmentation can improve the quality of 3D reconstruction and the safety of autonomous driving.

Since instance segmentation is a building block of many image processing tasks, considerable research has gone into developing novel AI and machine learning methods that learn directly from data and achieve real-time segmentation.

In this article, we review 3 of the methods proposed for real-time instance segmentation and how each one achieves real-time performance.

What is Real-time Instance Segmentation?

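In digital image processing, instance segmentation labels the pixels of an image with the class they belong to, as semantic segmentation does, but it also treats each individual object in the image as distinct.

For example, two adjacent cars receive two separate masks rather than one shared “car” region.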
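To make the distinction concrete, here is a minimal sketch using a toy 4 × 6 image that contains two objects of the same class. The function and variable names are illustrative, not taken from any library. An instance segmentation output keeps one binary mask per object; collapsing those masks into a single class map (the semantic view) loses the fact that there are two separate objects.

```python
import numpy as np

# Toy 4x6 image with two non-overlapping objects of the same class ("car" = 1).
# Instance segmentation output: one binary mask per object plus a class label for each.
instance_masks = np.zeros((2, 4, 6), dtype=bool)
instance_masks[0, 1:3, 0:2] = True   # first car
instance_masks[1, 1:3, 3:6] = True   # second car
instance_classes = np.array([1, 1])

def to_semantic_map(masks: np.ndarray, classes: np.ndarray) -> np.ndarray:
    """Collapse per-instance masks into a per-pixel class map (0 = background)."""
    semantic = np.zeros(masks.shape[1:], dtype=np.int64)
    for mask, cls in zip(masks, classes):
        semantic[mask] = cls
    return semantic

def to_instance_id_map(masks: np.ndarray) -> np.ndarray:
    """Give each object its own id (1, 2, ...) so instances stay distinct."""
    ids = np.zeros(masks.shape[1:], dtype=np.int64)
    for i, mask in enumerate(masks, start=1):
        ids[mask] = i
    return ids

semantic = to_semantic_map(instance_masks, instance_classes)
instance_ids = to_instance_id_map(instance_masks)

print(np.unique(semantic))      # [0 1]   -> only "background" and "car"
print(np.unique(instance_ids))  # [0 1 2] -> background plus two separate cars
```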

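Real-time instance segmentation is the same task performed under a latency constraint: the model must segment each frame fast enough to keep up with a live video stream (often taken to mean roughly 30 frames per second or more) while maintaining acceptable accuracy.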
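Because “real-time” is ultimately a throughput target, it helps to express it as a per-frame time budget. The sketch below assumes the common 30 fps threshold, which is a convention rather than a fixed standard. `segment` is a hypothetical placeholder for any model’s inference call, and the simple timing loop ignores warm-up runs and data loading.

```python
import time
from typing import Callable, Sequence

def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, to sustain target_fps."""
    return 1000.0 / target_fps

def measure_fps(segment: Callable[[object], object], frames: Sequence[object]) -> float:
    """Average throughput of a segmentation function over a sequence of frames."""
    start = time.perf_counter()
    for frame in frames:
        segment(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

def is_real_time(measured_fps: float, target_fps: float = 30.0) -> bool:
    """True if the measured throughput meets the (assumed) real-time target."""
    return measured_fps >= target_fps

print(round(frame_budget_ms(30.0), 1))  # 33.3 ms per frame at 30 fps
```

In practice you would pass the model’s inference function as `segment` and a batch of representative frames, then compare the result against the budget.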

3 Proposed Real-time Instance Segmentation Methods

1. YOLACT – Proposed in a paper by researchers at the University of California, Davis

YOLACT (You Only Look At CoefficienTs) is a fully convolutional model for real-time instance segmentation.

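Although it is trained on only a single GPU (graphics processing unit), YOLACT runs significantly faster than previous competitive approaches. Its authors report 29.8 mAP on MS COCO at 33.5 fps, evaluated on a single Titan Xp.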

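The model breaks instance segmentation into two parallel subtasks: generating a set of image-wide prototype masks and predicting a vector of mask coefficients for each instance. Each instance mask is then assembled as a linear combination of the prototypes, weighted by that instance’s coefficients, which keeps full-image segmentation simple.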
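The assembly step amounts to a single matrix multiplication followed by a sigmoid. The minimal NumPy sketch below shows it for several instances at once. The sizes (138 × 138 prototypes, k = 32) and the tanh applied to the coefficients are illustrative assumptions modeled loosely on the YOLACT paper, and `assemble_masks` is a hypothetical helper name. The full method also crops each mask to its predicted bounding box and thresholds it; those steps are omitted here.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def assemble_masks(prototypes: np.ndarray, coefficients: np.ndarray) -> np.ndarray:
    """Linearly combine prototypes into one soft mask per instance.

    prototypes:   (h, w, k) image-wide prototype masks
    coefficients: (n, k) mask coefficients, one row per detected instance
    returns:      (h, w, n) soft masks with values in [0, 1]
    """
    h, w, k = prototypes.shape
    # (h*w, k) @ (k, n) -> (h*w, n): one matrix multiply covers every instance.
    flat = prototypes.reshape(h * w, k) @ coefficients.T
    return sigmoid(flat).reshape(h, w, -1)

rng = np.random.default_rng(0)
protos = rng.standard_normal((138, 138, 32))     # k = 32 prototypes (illustrative size)
coeffs = np.tanh(rng.standard_normal((5, 32)))   # 5 instances; tanh keeps values in [-1, 1]
masks = assemble_masks(protos, coeffs)
print(masks.shape)  # (138, 138, 5)
```

Because the prototypes are shared across all instances, adding more detections only adds rows to `coefficients`, which is why the assembly stays lightweight.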

By segmenting this way, YOLACT learns to localize instance masks on its own: instances that are “visually, spatially, and semantically” similar still appear different in the prototypes.

Practically, this approach has several advantages:

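It’s fast, thanks to its parallel structure and extremely lightweight assembly process.
It produces high-quality masks, because the masks use the full extent of the image space without any quality lost to repooling.
It’s general: the idea of generating prototypes and mask coefficients can be added to almost any modern object detector.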

2. Deep Snake – Proposed by authors from Zhejiang University

Deep Snake is a contour-based approach inspired by the classic snake (active contour) algorithm. It takes an initial contour around an object as input and uses a neural network to deform that contour until it matches the object boundary, which yields the object’s shape.

For feature learning on the contour, Deep Snake employs circular convolution, which exploits the cycle-graph structure of a contour better than generic graph convolution does.
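Circular convolution is an ordinary 1D convolution over the ordered contour vertices, with wrap-around padding so that the first and last vertices are treated as neighbors. Here is a minimal NumPy sketch under stated assumptions: per-vertex features are a plain `(N, C_in)` array, the kernel weights are random rather than learned, and the 128 vertices and kernel size 9 are illustrative choices. As in most deep learning frameworks, the “convolution” is computed as a cross-correlation.

```python
import numpy as np

def circular_conv1d(features: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Circular convolution over contour vertices.

    features: (N, C_in) one feature vector per vertex, ordered around the contour
    kernel:   (2r+1, C_in, C_out) weights (random here, learned in a real model)
    returns:  (N, C_out)
    """
    n = features.shape[0]
    size = kernel.shape[0]
    r = size // 2
    # Wrap-around padding: vertex 0's left neighbor is vertex N-1 and vice versa,
    # so the contour is treated as a closed cycle rather than a line with two ends.
    padded = np.pad(features, ((r, r), (0, 0)), mode="wrap")
    out = np.zeros((n, kernel.shape[2]))
    for i in range(n):
        window = padded[i:i + size]              # vertices i-r .. i+r (mod N)
        out[i] = np.einsum("kc,kco->o", window, kernel)
    return out

rng = np.random.default_rng(0)
feats = rng.standard_normal((128, 4))      # 128 vertices, 4 channels (illustrative)
weights = rng.standard_normal((9, 4, 8))   # kernel size 9 -> r = 4
out = circular_conv1d(feats, weights)
# Starting the contour at a different vertex only rotates the output.
shifted = circular_conv1d(np.roll(feats, 5, axis=0), weights)
print(out.shape, np.allclose(np.roll(out, 5, axis=0), shifted))  # (128, 8) True
```

The final check illustrates why this suits a contour: there is no privileged “first” vertex, so relabeling the starting point changes nothing except the order of the outputs.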

Deep Snake performs instance segmentation in a two-stage pipeline:

First, it proposes an initial contour for each object, defined by a group of vertices along the object silhouette.
Second, it deforms the contour. Because the contour is a cycle graph, it applies circular convolution to learn features at each vertex, then regresses a per-vertex offset that moves the contour toward the object boundary.
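The deformation stage boils down to adding predicted per-vertex offsets to the contour coordinates. The toy sketch below illustrates only that update rule. The initial contour is a circle, and `toy_offset_predictor` is a hypothetical stand-in for the trained network: it “knows” the true boundary is a smaller circle and returns half the distance to it. Applying the update a few times shows the contour converging on the boundary; the real model predicts offsets from image features instead.

```python
import numpy as np

def initial_contour(center: np.ndarray, radius: float, num_vertices: int = 128) -> np.ndarray:
    """Evenly spaced vertices on a circle, standing in for an initial contour proposal."""
    angles = np.linspace(0.0, 2.0 * np.pi, num_vertices, endpoint=False)
    return np.stack([center[0] + radius * np.cos(angles),
                     center[1] + radius * np.sin(angles)], axis=1)  # shape (N, 2)

def toy_offset_predictor(contour: np.ndarray, center: np.ndarray, true_radius: float) -> np.ndarray:
    """Hypothetical stand-in for the learned network: returns half of the offset
    that would move each vertex onto a circular 'object boundary'."""
    direction = contour - center
    dist = np.linalg.norm(direction, axis=1, keepdims=True)
    target = center + direction / dist * true_radius
    return 0.5 * (target - contour)

def deform(contour: np.ndarray, predict_offsets, iterations: int = 3) -> np.ndarray:
    """Repeatedly add per-vertex offsets to the contour vertices."""
    for _ in range(iterations):
        contour = contour + predict_offsets(contour)
    return contour

center = np.array([0.0, 0.0])
contour = initial_contour(center, radius=2.0)
result = deform(contour, lambda c: toy_offset_predictor(c, center, true_radius=1.0))
radii = np.linalg.norm(result - center, axis=1)
print(round(float(radii.mean()), 3))  # 1.125: the radius goes 2 -> 1.5 -> 1.25 -> 1.125
```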

The Deep Snake algorithm achieves competitive accuracy on benchmark datasets such as Cityscapes, SBD, KINS, and COCO.

It performs real-time instance segmentation at 32.3 fps (frames per second) on 512 × 512 images, which is faster than many other methods.

3. SOLACT – Proposed in a study available through PMC (PubMed Central, an archive of the U.S. National Institutes of Health’s National Library of Medicine)

SOLACT approaches instance segmentation through the concept of “instance categories”, introduced by SOLO, which assigns a category to each pixel of an instance according to the instance’s location and size.

It is a deep learning-based neural network architecture that blends the YOLACT and SOLO approaches.

Its design modifies several blocks of standard neural network architectures.

SOLACT is designed for reliable real-time instance segmentation of traffic videos on embedded devices.

Once deployed on such devices, SOLACT can be modified further to increase network speed.

Following a conventional pipeline, SOLACT performs segmentation in three steps:

A backbone architecture extracts feature maps at multiple resolutions.
Two head branches then predict the class category and the object shape, respectively.
Finally, a post-processing algorithm produces the final results.
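The source does not detail SOLACT’s post-processing algorithm, so the sketch below uses a generic stand-in: greedy non-maximum suppression (NMS) on binary masks. It drops low-scoring predictions, then discards any mask that overlaps a higher-scoring one too much. (The SOLO family uses a parallel variant called Matrix NMS; this simpler greedy form is only for illustration.) The function names and the 0.3 and 0.5 thresholds are assumptions.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def mask_nms(masks: np.ndarray, scores: np.ndarray,
             score_thresh: float = 0.3, iou_thresh: float = 0.5) -> list:
    """Greedy NMS on binary masks.

    masks:  (n, h, w) boolean masks from the shape (mask) branch
    scores: (n,) confidence scores from the category branch
    returns indices of kept masks, highest score first
    """
    order = [i for i in np.argsort(-scores) if scores[i] >= score_thresh]
    keep = []
    for i in order:
        # Keep a mask only if it does not heavily overlap any mask already kept.
        if all(mask_iou(masks[i], masks[j]) < iou_thresh for j in keep):
            keep.append(int(i))
    return keep

masks = np.zeros((3, 8, 8), dtype=bool)
masks[0, 0:4, 0:4] = True   # object A
masks[1, 0:4, 1:4] = True   # near-duplicate of A (IoU 0.75)
masks[2, 4:8, 4:8] = True   # separate object B
scores = np.array([0.9, 0.8, 0.7])
print(mask_nms(masks, scores))  # [0, 2]
```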

SOLACT reduces false detections by evaluating the quality of its output masks. According to the study, it can outperform some popular methods and performs well in real-time instance segmentation applications.
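The source does not say exactly how SOLACT measures mask quality. One well-known approach, used in SOLOv2, rescales each class score by a “maskness” term: the average foreground probability inside the binarized mask. The sketch below shows that idea as an assumed stand-in, not as SOLACT’s confirmed method. A blurry, low-confidence mask pulls its detection’s score down, so a later score threshold is more likely to filter it out as a false detection.

```python
import numpy as np

def maskness(soft_mask: np.ndarray, thresh: float = 0.5) -> float:
    """Average foreground probability inside the binarized mask (0 if the mask is empty)."""
    fg = soft_mask > thresh
    if not fg.any():
        return 0.0
    return float(soft_mask[fg].mean())

def rescore(soft_masks: np.ndarray, class_scores: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """Multiply each class score by the quality of its predicted mask."""
    quality = np.array([maskness(m, thresh) for m in soft_masks])
    return class_scores * quality

confident = np.full((4, 4), 0.95)   # crisp, high-probability mask
uncertain = np.full((4, 4), 0.55)   # barely above the threshold everywhere
scores = rescore(np.stack([confident, uncertain]), np.array([0.8, 0.8]))
print(np.round(scores, 2))  # [0.76 0.44]
```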

Conclusion

Instance segmentation is key to numerous image processing tasks, but real-time instance segmentation remains relatively underexplored, which is why comparatively few solutions exist.

The solutions listed above are research proposals validated through experiments, and each offers a different balance of accuracy and efficiency in performing instance segmentation.

With further refinement, these methods could produce considerably better results and support image processing applications that need immediate output.