Adaptive feature learning CNN for behavior recognition in crowd scene

Bibliographic Details
Published in: 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), pp. 357-361
Main Authors: Shuaibu, Aliyu Nuhu; Malik, Aamir Saeed; Faye, Ibrahima
Format: Conference Proceeding
Language: English
Published: IEEE 01-09-2017
Description
Summary: Learning and recognizing three-dimensional (3D) adaptive features is important for crowd scene understanding in video surveillance research. Deep learning architectures such as Convolutional Neural Networks (CNNs) have recently shown much success in various computer vision applications. Existing approaches, such as hand-crafted methods and 2D-CNN architectures, are widely used for adaptive feature representation on image data. However, learning dynamic and temporal features at 3D scale in videos remains an open problem. In this study, we propose a novel technique, the 3D-scale Convolutional Neural Network (3DS-CNN), based on the decomposition of 3D feature maps into 2D spatial and 2D temporal feature representations. Extensive experiments on hundreds of video scenes were conducted on publicly available crowd datasets. Quantitative and qualitative evaluations indicate that the proposed model outperforms the baseline approaches. A mean average precision of 95.30% was recorded on the WWW crowd dataset.
DOI:10.1109/ICSIPA.2017.8120636
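
The summary mentions decomposing 3D feature maps into 2D spatial and 2D temporal representations but does not say how this is done. The sketch below is only an illustration of one possible reading, not the authors' 3DS-CNN: it takes a small video volume indexed as [t][y][x] and slices it into spatial (xy) planes and temporal (xt, yt) planes. The function name `decompose_volume` and the toy data are hypothetical.

```python
def decompose_volume(volume):
    """Slice a 3D volume indexed [t][y][x] into three sets of 2D planes.

    Illustrative assumption only; the record does not describe the
    actual decomposition used by 3DS-CNN.

    Returns (xy, xt, yt):
      xy: one spatial plane per time step t, each H x W
      xt: one temporal plane per row y, each T x W
      yt: one temporal plane per column x, each T x H
    """
    T = len(volume)
    H = len(volume[0])
    W = len(volume[0][0])

    # Spatial planes: copy each frame as it is.
    xy = [[row[:] for row in volume[t]] for t in range(T)]

    # Temporal planes along x: fix a row y, stack that row over time.
    xt = [[volume[t][y][:] for t in range(T)] for y in range(H)]

    # Temporal planes along y: fix a column x, stack that column over time.
    yt = [[[volume[t][y][x] for y in range(H)] for t in range(T)]
          for x in range(W)]

    return xy, xt, yt


# Toy volume: 2 frames, 2 rows, 3 columns; each value encodes (t, y, x).
volume = [[[100 * t + 10 * y + x for x in range(3)] for y in range(2)]
          for t in range(2)]
xy, xt, yt = decompose_volume(volume)
print(len(xy), len(xt), len(yt))
```

In a CNN setting, 2D convolutions could then be applied to each family of planes separately, which is one common way to approximate 3D filtering at lower cost; whether the paper does this is not stated in this record.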