Smartphone Visual Assistance System with AI-Based Depth Estimation and Embedded/Cloud Voice Controls

Bibliographic Details
Published in: 2023 IEEE 5th International Conference on Advanced Information and Communication Technologies (AICT), pp. 87-91
Main Authors: Pastukh, Volodymyr, Beshley, Mykola, Chopyk, Pavlo, Beshley, Halyna, Ivanochko, Iryna, Gregus, Michal
Format: Conference Proceeding
Language: English
Published: IEEE 21-11-2023
Description
Summary: This paper presents an advanced smartphone visual assistance system that uses artificial intelligence (AI) for fast image depth estimation and for voice control based on embedded and cloud technologies. We developed a voice control module that combines embedded and cloud-based APIs to improve system availability and efficiency. The proposed visual assistance system leverages the MobiNet3 neural network for object classification and the MiDaS 2 Lite model for depth estimation. In addition, we implemented an iterative asynchronous algorithm for calculating the image depth matrix, which is on average 35 times faster than the recursive approach. The test results confirm that the processing time of this algorithm ranges from 600 µs to 2 ms. This is an important aspect of ensuring the system's efficiency and accuracy in real-world use, particularly for people with visual impairments. We present an approach for determining the average values of the luminance coefficients along the vertical and horizontal planes. Using grayscale values in the range 0.0 to 1.0, we estimate the depth of the image to determine the location of objects in space and provide a spatial interpretation of the scene.
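The abstract describes averaging normalized (0.0-1.0) depth values along the vertical and horizontal planes to localize objects in the scene, but does not give the exact aggregation rule or thresholds. The Python sketch below is a minimal illustration of that idea under stated assumptions: the depth map is a 2-D array of grayscale values where larger values mean nearer objects (as with MiDaS-style inverse depth), and the function names (average_depth_profiles, coarse_object_position) and the near_threshold value are hypothetical, not taken from the paper.

import numpy as np

def average_depth_profiles(depth_map: np.ndarray):
    """Average normalized depth values along each axis of the image.

    depth_map: 2-D array of grayscale depth values in [0.0, 1.0].
    Returns (column_profile, row_profile): the mean depth per image
    column (horizontal plane) and per image row (vertical plane).
    """
    column_profile = depth_map.mean(axis=0)  # one average value per column
    row_profile = depth_map.mean(axis=1)     # one average value per row
    return column_profile, row_profile

def coarse_object_position(depth_map: np.ndarray, near_threshold: float = 0.7):
    """Split the column profile into left/center/right thirds and report
    which region holds the nearest (largest) average depth value.
    near_threshold is an illustrative cutoff, not a value from the paper."""
    column_profile, _ = average_depth_profiles(depth_map)
    thirds = np.array_split(column_profile, 3)
    means = [t.mean() for t in thirds]
    region = ("left", "center", "right")[int(np.argmax(means))]
    is_near = max(means) >= near_threshold
    return region, is_near

if __name__ == "__main__":
    # Synthetic example: a near object occupying the right third of the frame.
    depth = np.full((240, 320), 0.2)
    depth[:, 220:] = 0.9
    print(coarse_object_position(depth))  # ('right', True)

In a voice-guided assistance pipeline, such a coarse left/center/right and near/far summary could be mapped directly to a short spoken cue, which is consistent with the spatial interpretation of the scene described in the abstract.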
DOI:10.1109/AICT61584.2023.10452675