A Refined Non-Driving Activity Classification Using a Two-Stream Convolutional Neural Network

Bibliographic Details
Published in: IEEE Sensors Journal, Vol. 21, No. 14, pp. 15574-15583
Main Authors: Yang, Lichao, Yang, Ting-Yu, Liu, Haochen, Shan, Xiaocai, Brighton, James, Skrypchuk, Lee, Mouzakitis, Alexandros, Zhao, Yifan
Format: Journal Article
Language:English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 15-07-2021
Summary: Monitoring the driver's status is of great importance for achieving an intelligent and safe take-over transition in Level 3 automated driving vehicles. We present a camera-based system that recognises non-driving activities (NDAs), which may lead to different cognitive capabilities for take-over, based on a fusion of spatial and temporal information. The region of interest (ROI) is automatically selected from the extracted masks of the driver and the object/device being interacted with. The RGB image of the ROI (the spatial stream) and its associated current and historical optical-flow frames (the temporal stream) are then fed into a two-stream convolutional neural network (CNN) for NDA classification. Such an approach identifies not only the object/device but also the interaction mode between the object and the driver, enabling a refined NDA classification. In this paper, we evaluate the performance of classifying 10 NDAs involving two types of device (tablet and phone) and five types of task (emailing, reading, watching videos, web browsing and gaming) for 10 participants. Results show that the proposed system improves the average classification accuracy from 61.0%, when using a single spatial stream, to 90.5%.
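The summary describes fusing a spatial (RGB) stream and a temporal (optical-flow) stream for a 10-way NDA classification. As a minimal illustrative sketch only (not the authors' code — the fusion weighting, class names and scores below are assumptions), the standard two-stream late fusion averages each stream's per-class softmax scores:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_streams(spatial_logits, temporal_logits, w_spatial=0.5):
    # Late fusion: weighted average of the two streams' class
    # probabilities (w_spatial=0.5 is an assumed equal weighting).
    p_s = softmax(spatial_logits)
    p_t = softmax(temporal_logits)
    return w_spatial * p_s + (1.0 - w_spatial) * p_t

# Illustrative 10-way NDA label set: 2 devices x 5 tasks,
# matching the categories named in the summary.
classes = [f"{d}_{t}" for d in ("tablet", "phone")
           for t in ("email", "read", "video", "browse", "game")]

# Hypothetical per-class logits from each stream's CNN head.
spatial = np.random.randn(10)
temporal = np.random.randn(10)
fused = fuse_streams(spatial, temporal)
pred = classes[int(fused.argmax())]
```

The fused vector remains a valid probability distribution, so the predicted NDA is simply its argmax over the 10 device–task classes.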
ISSN:1530-437X
1558-1748
DOI:10.1109/JSEN.2020.3005810