Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language
Published in: | 2021 16th International Workshop on Semantic and Social Media Adaptation & Personalization (SMAP), pp. 1 - 5 |
Main Authors: | |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 04-11-2021 |
Summary: | Loss of voice, which may be caused by problems in various organs of the human vocal system, is a major disability that commonly results in social isolation. By utilizing multimodal information sources such as video, audio and text, a sophisticated personalized voice reconstruction system can be created. In this work we collect multimodal information from patients before loss-of-voice medical procedures and design a complete system for lip-reading in the Greek language. To preprocess the data we apply lip-segmentation and frame-level sampling techniques. Text is matched with the corresponding video frames to create a word-to-frames dataset. Deep learning methods are then applied to create a sequence-to-sequence model for word prediction from lip-area frames. Our results show that the presented model achieves 85% accuracy on the training set. Due to the limited availability of data, the model overfits the training set, resulting in worse performance on unseen data. Our model is the first deep learning model trained to recognize words from lip-area images in the Greek language. Overall, it is necessary to collect more data and address the overfitting problem. In future work we plan to train the model on larger word sets from as many as 30 patients by the end of 2021. |
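The preprocessing step named in the summary (lip segmentation followed by frame-level sampling) could look roughly like the sketch below. It is a minimal illustration assuming OpenCV and dlib's standard 68-point facial landmark model; the sampling rate, crop margin, and output size are hypothetical choices, not values taken from the paper.

```python
# Minimal sketch of lip-segmentation and frame-level sampling, assuming
# OpenCV and dlib's 68-point landmark model. every_nth, margin and size
# are illustrative assumptions, not the paper's actual parameters.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Standard 68-point model; the mouth landmarks are points 48-67.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def lip_frames(video_path, every_nth=2, size=(64, 64)):
    """Yield resized lip-region crops, keeping every n-th frame."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_nth == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector(gray)
            if faces:
                shape = predictor(gray, faces[0])
                pts = np.array([(shape.part(i).x, shape.part(i).y)
                                for i in range(48, 68)], dtype=np.int32)
                x, y, w, h = cv2.boundingRect(pts)
                margin = 10  # small context margin around the lips
                crop = frame[max(y - margin, 0):y + h + margin,
                             max(x - margin, 0):x + w + margin]
                yield cv2.resize(crop, size)
        index += 1
    cap.release()
```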
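The word-prediction model is only described at a high level, so the PyTorch sketch below shows one plausible reading: a per-frame CNN encoder whose features a GRU aggregates into logits over a word vocabulary. This is a simplified sequence classifier, not the paper's exact architecture, and all layer sizes and the vocabulary size are illustrative assumptions.

```python
# Hypothetical PyTorch sketch of word prediction from lip-area frame
# sequences. Layer sizes and vocab_size are assumptions for illustration.
import torch
import torch.nn as nn

class LipReader(nn.Module):
    def __init__(self, vocab_size=50, hidden=256):
        super().__init__()
        # Per-frame CNN encoder: grayscale 64x64 crop -> 64-dim feature.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # GRU aggregates the per-frame features over time.
        self.rnn = nn.GRU(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, frames):            # frames: (batch, time, 1, 64, 64)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, last = self.rnn(feats)         # final hidden state summarizes the clip
        return self.head(last[-1])        # logits over the word vocabulary
```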
DOI: | 10.1109/SMAP53521.2021.9610767 |