Adaptive feature learning CNN for behavior recognition in crowd scene
Published in: | 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA) pp. 357 - 361 |
---|---|
Main Authors: | , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-09-2017 |
Subjects: | |
Summary: | Learning and recognizing three-dimensional (3D) adaptive features is important for crowd scene understanding in video surveillance research. Deep learning architectures such as Convolutional Neural Networks (CNNs) have recently shown much success in various computer vision applications. Existing approaches such as hand-crafted methods and 2D-CNN architectures are widely used for adaptive feature representation on image data. However, learning dynamic and temporal 3D-scale features in videos remains an open problem. In this study, we propose a novel technique, the 3D-scale Convolutional Neural Network (3DS-CNN), based on the decomposition of 3D feature maps into 2D spatial and 2D temporal feature representations. Extensive experiments on hundreds of video scenes were conducted on publicly available crowd datasets. Quantitative and qualitative evaluations indicate that the proposed model displays superior performance compared to baseline approaches. A mean average precision of 95.30% was recorded on the WWW crowd dataset. |
---|---|
DOI: | 10.1109/ICSIPA.2017.8120636 |