Egocentric Temporal Action Proposals

Bibliographic Details
Published in: IEEE Transactions on Image Processing, Vol. 27, no. 2, pp. 764-777
Main Authors: Shao Huang, Weiqiang Wang, Shengfeng He, Rynson W. H. Lau
Format: Journal Article
Language: English
Published: United States IEEE 01-02-2018
Description
Summary: We present an approach to localize generic actions in egocentric videos, called temporal action proposals (TAPs), for accelerating the action recognition step. An egocentric TAP refers to a sequence of frames that may contain a generic action performed by the wearer of a head-mounted camera, e.g., taking a knife, spreading jam, pouring milk, or cutting carrots. Inspired by object proposals, this paper aims to generate a small number of TAPs, thereby replacing the popular sliding window strategy, for localizing all action events in the input video. To this end, we first propose to temporally segment the input video into action atoms, which are the smallest units that may contain an action. We then apply a hierarchical clustering algorithm with several egocentric cues to generate TAPs. Finally, we propose two actionness networks to score the likelihood of each TAP containing an action. The top-ranked candidates are returned as output TAPs. Experimental results show that the proposed TAP detection framework performs significantly better than relevant approaches for egocentric action detection.
ISSN: 1057-7149, 1941-0042
DOI: 10.1109/TIP.2017.2772904
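
Illustrative sketch: the summary describes a three-stage pipeline (action atoms, hierarchical clustering into candidate TAPs, actionness scoring with top-ranked output). The Python below is only a minimal sketch of that general shape and is not the authors' implementation. It assumes each atom is described by a single hypothetical scalar feature (the paper uses several egocentric cues), merges the most similar pair of adjacent segments at each step, and takes the scoring function as a caller-supplied stand-in for the actionness networks. All function and parameter names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A segment is an inclusive (first_atom, last_atom) index pair.
Segment = Tuple[int, int]


def generate_proposals(atom_features: Sequence[float]) -> List[Segment]:
    """Agglomeratively merge temporally adjacent segments.

    Assumption: one scalar feature per atom, and similarity is the
    absolute difference of segment means. Every segment that appears
    during merging is kept as a candidate proposal.
    """
    def mean(seg: Segment) -> float:
        start, end = seg
        return sum(atom_features[start:end + 1]) / (end - start + 1)

    clusters: List[Segment] = [(i, i) for i in range(len(atom_features))]
    proposals: List[Segment] = list(clusters)
    while len(clusters) > 1:
        # Index of the most similar pair of neighbouring segments.
        k = min(
            range(len(clusters) - 1),
            key=lambda i: abs(mean(clusters[i]) - mean(clusters[i + 1])),
        )
        merged = (clusters[k][0], clusters[k + 1][1])
        clusters[k:k + 2] = [merged]
        proposals.append(merged)
    return proposals


def top_proposals(
    proposals: Sequence[Segment],
    score: Callable[[Segment], float],
    k: int,
) -> List[Segment]:
    """Rank candidates by a caller-supplied score and keep the best k.

    `score` is a placeholder for a learned actionness model.
    """
    return sorted(proposals, key=score, reverse=True)[:k]


if __name__ == "__main__":
    features = [0.0, 0.1, 5.0, 5.3]  # toy per-atom features
    candidates = generate_proposals(features)
    # Toy scorer: prefer longer segments (purely illustrative).
    print(top_proposals(candidates, lambda s: s[1] - s[0] + 1, k=2))
```

With n atoms this merging scheme yields 2n - 1 candidate segments, far fewer than the roughly n(n + 1)/2 windows an exhaustive sliding-window search over the same atoms would consider, which is the kind of reduction the summary motivates.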