Evaluation of Applying LDA to Redacted Documents in Security and Safety Analysis

Bibliographic Details
Published in:	2023 IEEE International Conference on Cyber Security and Resilience (CSR), pp. 212-218
Main Authors:	Umezawa, Katsuyuki; Wohlgemuth, Sven; Hasegawa, Keisuke; Takaragi, Kazuo
Format:	Conference Proceeding
Language:	English
Published:	IEEE 31-07-2023
Subjects:	cosine similarity; Cryptography; Cyberattack; Degradation; latent dirichlet allocation; natural language processing; Resilience; Resource management; Safety; vulnerability analysis

Description
Summary:	Cyber attacks are often carried out by imitating and combining existing attacks. We have presented a method that uses existing vulnerability databases to determine semi-automatically whether the design documents of products under development contain vulnerabilities. The method calculates the similarity between documents using Latent Dirichlet Allocation (LDA) and compares the design document of the new product with the vulnerability database. When a third party performs this comparison as a service, it may be desirable to avoid disclosing parts of the design document to that party. In this study, we experimentally verified that the similarity value calculated with LDA does not deteriorate even when a portion of the design document is encrypted or obfuscated. We found no substantial difference in similarity compared with the original document, although the numerical values change depending on which words are encrypted or obfuscated. In particular, the degradation of similarity is very small when the version number is encrypted or obfuscated.
DOI:	10.1109/CSR57506.2023.10224991
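
The comparison described in the summary can be illustrated with a minimal sketch: fit LDA over a small corpus, then compare topic distributions by cosine similarity before and after redacting version numbers. This is not the authors' implementation. The record does not state their preprocessing, LDA parameters, or redaction scheme, so the collapsed Gibbs sampler, the helper names (tokenize, redact_versions, lda_topic_mix, cosine), the version-number pattern, the fixed placeholder token, and the toy sentences below are all assumptions made for illustration.

```python
import math
import random
import re

VERSION = re.compile(r"^v?\d+(?:\.\d+)+$")  # assumed pattern for version-like tokens


def tokenize(text):
    """Lowercase the text and keep dotted numbers such as 1.0.1 as single tokens."""
    return re.findall(r"[a-z0-9]+(?:\.[0-9]+)*", text.lower())


def redact_versions(tokens, placeholder="redacted"):
    """Replace every version-like token with one fixed placeholder (assumed scheme)."""
    return [placeholder if VERSION.match(t) else t for t in tokens]


def lda_topic_mix(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # tokens per (document, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assign = []
    for d, doc in enumerate(docs):                         # random initial topic per token
        assign.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, assign[d]):
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = word_id[w], assign[d][i]
                # Remove the token, then resample its topic from the conditional.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    # Smoothed estimate of each document's topic proportions (each row sums to 1).
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def cosine(u, v):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Invented toy texts: a design document, its redacted copy, and two database entries.
design = tokenize("Web server uses OpenSSL 1.0.1 for TLS heartbeat handling")
corpus = [
    design,
    redact_versions(design),
    tokenize("OpenSSL 1.0.1 TLS heartbeat extension leaks server memory"),
    tokenize("SQL injection in login form of web application database"),
]
mixes = lda_topic_mix(corpus)
sim_original = cosine(mixes[0], mixes[2])
sim_redacted = cosine(mixes[1], mixes[2])
print(f"original: {sim_original:.3f}  redacted: {sim_redacted:.3f}")
```

With a corpus this small the printed values depend heavily on the seed and the number of topics, so they say nothing about the paper's findings; the sketch only shows where redaction enters the pipeline. A real experiment would use a full vulnerability database and an established LDA implementation.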