Object Detection on Real-Time Video with FPN and Modified Mask RCNN Based on Inception-ResNetV2

Bibliographic Details
Published in: Wireless Personal Communications, Vol. 138, No. 4, pp. 2065-2090
Main Authors: Yadav, Anu; Kumar, Ela
Format: Journal Article
Language: English
Published: New York: Springer US, 08-10-2024
Description
Summary: Instance segmentation of real-time video is a crucial step in the object identification and classification process. Object detection is the task of finding different types of information about an object in a video by masking it and bounding its position in the image with a rectangular box. Deep learning has advanced the field of object identification through its excellent feature-learning ability, and numerous researchers have employed various deep-learning methods for object detection with the goal of improving the precision of feature extraction. Because feature extraction from video frames is poor, the higher- and lower-level features of an object are often not extracted properly. Hence, a Feature Pyramid Network (FPN) integrated Modified Mask RCNN based on Inception-ResNetV2 is employed to extract the higher- and lower-level features from the video and solve this problem. In the designed model, the video dataset is converted to frames, and the lower- and higher-level features of each frame are extracted using the FPN and the backbone (Inception-ResNetV2) of the designed model. Regions for object detection are selected automatically by the Region Proposal Network, and each selected region is aligned using Region of Interest (RoI) Align. From the aligned features, fully connected layers are used for bounding-box and class detection of the object, and convolutional layers are then used to mask the detected object. To evaluate object detection on real-time video with the Modified Mask RCNN, performance metrics such as Accuracy, Precision, and Recall were measured; the proposed model attains 0.98, 0.93, and 0.94, respectively, on the COCO dataset, which are better values than those of existing approaches including RCNN, SWINV2-G, Mask RCNN, SWINV2-L, and Fast RCNN. As a result, the developed model accurately and rapidly differentiates objects in real-time video.
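As a rough illustration of the pipeline described in the summary (video converted to frames, multi-scale features from an FPN backbone, region proposals from the RPN, RoI Align, and box/class and mask heads), the sketch below runs an off-the-shelf Mask R-CNN on frames read from a video file. It is not the authors' Modified Mask RCNN: torchvision (assumed version 0.13 or later) provides a ResNet-50-FPN backbone rather than Inception-ResNetV2, and the video path and score threshold are hypothetical placeholders.

```python
# Minimal sketch of the frame-by-frame detection pipeline described above.
# NOTE: torchvision's ResNet-50-FPN Mask R-CNN stands in here for the paper's
# Inception-ResNetV2 backbone, which is not available off the shelf.
import cv2
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = maskrcnn_resnet50_fpn(weights="DEFAULT").to(device).eval()

cap = cv2.VideoCapture("input_video.mp4")  # hypothetical video file
score_threshold = 0.5                      # assumed confidence cutoff

with torch.no_grad():
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break  # end of stream
        # Convert the BGR uint8 frame to an RGB float tensor in [0, 1].
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        image = torch.from_numpy(frame_rgb).permute(2, 0, 1).float() / 255.0
        # Backbone + FPN extract multi-scale features, the RPN proposes
        # regions, RoI Align crops them, and the heads return boxes, labels,
        # scores, and per-instance masks.
        output = model([image.to(device)])[0]
        keep = output["scores"] > score_threshold
        boxes = output["boxes"][keep]        # (N, 4) bounding boxes
        labels = output["labels"][keep]      # (N,) COCO class indices
        masks = output["masks"][keep] > 0.5  # (N, 1, H, W) binary masks
        print(f"detected {len(boxes)} objects in this frame")

cap.release()
```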
ISSN: 0929-6212, 1572-834X
DOI: 10.1007/s11277-024-11539-9