Semantic-Fuzzing-Based Empirical Analysis of Voice Assistant Systems of Asian Symbol Languages

Bibliographic Details
Published in:	IEEE Internet of Things Journal, Vol. 9, no. 12, pp. 9151-9166
Main Authors:	Mao, Jian; Liu, Ziwen; Lin, Qixiao; Liang, Zhenkai
Format:	Journal Article
Language:	English
Published:	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 15-06-2022
Subjects:	Bayesian analysis; Deep learning security; Empirical analysis; English language; Feasibility studies; Fuzzing; Internet of Things; IoT security; Languages; Linguistics; Phonetics; Security; Semantics; Skills; Smart buildings; Speech processing; Speech recognition; Voice assistant (VA); Voice control

Description
Summary:	Smart voice assistants (VAs) are now widely deployed to provide control services via voice commands in IoT systems such as smart homes and industrial IoT systems. However, because of the complexity of the application environment and the diversity of voice commands, a growing number of attacks against VAs cause severe security problems. Because voice development platforms allow third-party voice skills to be accessed, adversaries can obtain users' private information through squatting attacks that use confusing skill names. Existing work studied the exploitability of semantic misinterpretation in VA systems for phonetic languages such as English. However, because of the semantic structural differences between phonetic English and symbol-based Asian languages such as Chinese, the linguistic-model-guided fuzzing tool proposed in previous work is insufficient for semantic analysis of Asian-language VAs. In this article, we conduct a systematic analysis to evaluate the feasibility of voice misinterpretation attacks on typical Asian-language VAs through semantic fuzzing. We develop Harmony-Fuzzer, a semantic fuzzing tool whose fuzzing process is guided by fuzzing rules abstracted from speech errors, disfluency, and semantically similar expressions in a Chinese corpus. We use Bayesian networks to formulate the fuzzing models statistically so that the fuzzing space can be controlled by the probability of the fuzzing process. We use our results to test VAs and to design malicious skills that empirically verify the feasibility of squatting attacks. We find that squatting attacks on Chinese VAs are feasible when attackers carefully exploit certain linguistic phenomena.
ISSN:	2327-4662
DOI:	10.1109/JIOT.2021.3113645