Evaluation of Traditional and Deep Learning Human Detection Techniques Applied to Surveillance: A Performance Comparison at Distinct Object Sizes

Bibliographic Details
Published in: 2021 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), pp. 1-5
Main Authors: Gonçalves, Vinicius P. M.; Silva, Lourival P.; Nunes, Fatima L. S.; Ferreira, Joao E.; Araujo, Luciano V.
Format: Conference Proceeding
Language: English
Published: IEEE 17-08-2021
Description
Summary: Making computers capable of identifying and localizing people in images and videos is a topic that has been attracting the attention of many researchers in recent years. Several applications, including surveillance systems, can benefit from this capacity. There is no study that provides an unbiased comparison between the most representative types of methods, traditional and recent ones, focusing on human detection and specifically within a context of surveillance, where the size of the objects is mostly small relative to the image size. This paper aims to compare the performance of a set of representative human detection techniques applied to surveillance systems in terms of Average Precision (AP) and speed. Two main types of human detection methods are analyzed: the traditional ones, represented by HOG and Haar Cascades, and the deep learning ones, represented by Faster R-CNN, YOLO, and Mobile SSD. The comparison was performed using the VIRAT Ground Dataset, a large-scale real-world surveillance video dataset. When humans were small relative to the overall image size, the Haar Cascades method had an AP of 9.09, a value around six times higher than the other ones. The results indicate that detecting humans when they are far from the camera in videos and images is still a challenge to be overcome, and that, in some aspects, the traditional approaches outperform the recent deep learning ones.
DOI: 10.1109/ICSPCC52875.2021.9564442