Lip Reading with 3D Convolutional and Bidirectional LSTM Networks on the GRID Corpus

Bibliographic Details
Published in: 2024 Second International Conference on Networks, Multimedia and Information Technology (NMITCON), pp. 1 - 8
Main Authors: Prashanth, B S; Manoj Kumar, M V; Puneetha, B H; Lohith, R; Darshan Gowda, V; Chandan, V; Sneha, H R
Format: Conference Proceeding
Language: English
Published: IEEE, 09-08-2024
Description
Summary: In recent years, the application of artificial intelligence has revolutionized the field of lip reading by enabling the development of sophisticated models capable of accurately interpreting lip movements from video data. This work presents a novel deep learning approach to lip reading, focused on decoding spoken text from video sequences of lip movements. Traditional lip reading methods involve separate stages for visual feature design and prediction. The proposed system instead uses an end-to-end deep learning model that directly maps video frames to text transcriptions, leveraging 3D convolutional neural networks and bidirectional Long Short-Term Memory (LSTM) networks. By analyzing visual cues from lip motions, the system can interpret speech, improving accessibility for individuals with hearing impairments and enabling communication in noisy environments. Compared with existing lip reading techniques, the deep learning model achieves superior performance on benchmark datasets, demonstrating its effectiveness in this challenging task. The best model achieves a character error rate of 1.54% and a word error rate of 7.96%.
DOI:10.1109/NMITCON62075.2024.10699241
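The abstract outlines an end-to-end pipeline: a 3D convolutional front-end extracts spatio-temporal features from mouth-region video, a bidirectional LSTM models temporal context, and a classifier emits per-frame character predictions. The PyTorch sketch below illustrates that general shape only; it is not the authors' implementation, and the layer sizes, vocabulary size, global spatial pooling, and input resolution (75 grayscale frames at 50x100, typical of GRID mouth crops) are all assumptions.

import torch
import torch.nn as nn

class LipReadingNet(nn.Module):
    """Hypothetical 3D-CNN + BiLSTM backbone (illustrative, not the paper's model)."""
    def __init__(self, vocab_size=28, hidden_size=256):
        super().__init__()
        # Spatio-temporal feature extractor: two 3D convolution blocks (assumed sizes).
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool spatially, preserve the time axis
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # Bidirectional LSTM over the frame sequence.
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden_size,
                            num_layers=2, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_size, vocab_size)

    def forward(self, clips):
        # clips: (batch, 1, frames, height, width)
        feats = self.frontend(clips)      # (batch, channels, frames, h, w)
        feats = feats.mean(dim=(3, 4))    # global spatial pooling -> (batch, channels, frames)
        feats = feats.transpose(1, 2)     # (batch, frames, channels) for the LSTM
        outputs, _ = self.lstm(feats)     # (batch, frames, 2 * hidden_size)
        return self.classifier(outputs)   # per-frame character logits

# Example forward pass with a dummy two-clip batch of 75 frames each.
logits = LipReadingNet()(torch.randn(2, 1, 75, 50, 100))
print(logits.shape)  # torch.Size([2, 75, 28])

Per-frame character logits of this kind are typically trained with a sequence loss such as CTC so that unaligned text transcriptions can supervise the video directly; character and word error rates are then computed on the decoded strings.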