Accuracy Comparison of CNN, LSTM, and Transformer for Activity Recognition Using IMU and Visual Markers

Bibliographic Details
Published in:	IEEE Access, Vol. 11, pp. 106650-106669
Main Authors:	Trujillo-Guerrero, Maria Fernanda; Roman-Niemes, Stadyn; Jaen-Vargas, Milagros; Cadiz, Alfonso; Fonseca, Ricardo; Serrano-Olmedo, Jose Javier
Format:	Journal Article
Language:	English
Published:	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023
Subjects:	Accelerometers; Accuracy; arm exercises; Cameras; CNN; Convolutional neural networks; Data acquisition; Datasets; Feature extraction; Human activity recognition; Human motion; IMU; Inertial platforms; LSTM; Machine learning; Model accuracy; movement analysis system; Sensors; Transformer; Transformers; visual marker; Visualization; Waveforms; Wrist

Description
Summary:	Human activity recognition (HAR) has applications ranging from security to healthcare. Typically, these systems are composed of data acquisition and activity recognition models. In this work, we compared the accuracy of two acquisition systems: Inertial Measurement Units (IMUs) versus Movement Analysis Systems (MAS). We trained models to recognize arm exercises using state-of-the-art deep learning architectures and compared their accuracy. MAS uses a camera array and reflective markers. IMU uses accelerometers, gyroscopes, and magnetometers. Sensors of both systems were attached to different locations on the upper limb. We captured and annotated 3 datasets, each one using both systems simultaneously. For activity recognition, we trained 8 architectures, each with different operations and layer configurations. The best architectures were a combination of CNN, LSTM, and Transformer, achieving test accuracy from 89% to 99% on average. We evaluated how feature selection reduced the number of sensors required. We found that both IMU and MAS data allowed the arm exercises to be distinguished correctly. CNN layers at the beginning produced better accuracy on challenging datasets. IMU had advantages over other acquisition systems for activity recognition. We analyzed the relations between model accuracy, signal waveforms, signal correlation, sampling rate, exercise duration, and window size. Finally, we proposed the use of a single IMU located at the wrist and a variable-size window extraction.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2023.3318563
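
The summary relates model accuracy to window size and proposes a variable-size window extraction, but this record does not describe the procedure. As a rough illustration of the simpler fixed-size case only, the sketch below splits a sequence of sensor samples into overlapping windows of the kind typically fed to a CNN, LSTM, or Transformer. The function name `sliding_windows` and the parameters `window_size` and `step` are hypothetical and are not taken from the article, and the values in the usage lines are arbitrary.

```python
def sliding_windows(samples, window_size, step):
    """Split a sequence of samples into fixed-size, possibly overlapping windows.

    Illustrative sketch only: the article proposes a variable-size window,
    whose details are not given in this record.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    # Index of the last position where a full window still fits.
    last_start = len(samples) - window_size
    # A trailing partial window is dropped rather than padded.
    return [samples[s:s + window_size] for s in range(0, last_start + 1, step)]


# Usage with placeholder data: ten samples, windows of four, 50% overlap.
example = list(range(10))
windows = sliding_windows(example, window_size=4, step=2)
```

A smaller `step` gives more overlap and more training windows from the same recording; a recording shorter than `window_size` yields no windows at all.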