Semantic-Fuzzing-Based Empirical Analysis of Voice Assistant Systems of Asian Symbol Languages

Bibliographic Details
Published in: IEEE Internet of Things Journal, Vol. 9, no. 12, pp. 9151-9166
Main Authors: Mao, Jian, Liu, Ziwen, Lin, Qixiao, Liang, Zhenkai
Format: Journal Article
Language: English
Published: Piscataway IEEE 15-06-2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: Smart voice assistants (VAs) have recently been widely deployed to provide control services via voice commands in IoT systems such as smart homes and industrial IoT systems. However, owing to the complexity of the application environment and the diversity of voice commands, a growing number of attacks against VAs cause severe security problems. Because voice development platforms allow third-party voice skills to be accessed, adversaries can obtain users' private information through squatting attacks that use confusing names. Existing work studied the exploitability of semantic misinterpretation in VA systems for phonetic languages such as English. However, because of the semantic structural differences between phonetic English and symbol-based Asian languages such as Chinese, the linguistic-model-guided fuzzing tool proposed in previous work is insufficient for semantic analysis of VAs for Asian languages. In this article, we conduct a systematic analysis to evaluate the feasibility of voice misinterpretation attacks on typical Asian-language VAs through semantic fuzzing. We develop Harmony-Fuzzer, a semantic fuzzing tool whose fuzzing process is guided by fuzzing rules abstracted from speech errors, disfluency, or semantically similar expressions in a Chinese corpus. We use Bayesian networks to formulate the fuzzing models statistically, so that the fuzzing space can be controlled by the probability of the fuzzing process. We use our results to test VAs and design malicious skills to empirically verify the feasibility of squatting attacks. We find that squatting attacks on Chinese VAs are feasible when attackers carefully exploit certain linguistic phenomena.
ISSN: 2327-4662
DOI: 10.1109/JIOT.2021.3113645