Evaluation of Applying LDA to Redacted Documents in Security and Safety Analysis
Published in: | 2023 IEEE International Conference on Cyber Security and Resilience (CSR), pp. 212-218 |
---|---|
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 31-07-2023 |
Summary: | Cyber attacks are often carried out by imitating and combining existing attacks. We have presented a method that uses existing vulnerability databases to semi-automatically detect vulnerabilities in the design documents of products under development. The method calculates the similarity between documents using Latent Dirichlet Allocation (LDA) and compares the design document of a new product with the vulnerability database. When a third party performs this comparison as a service, it may be desirable to avoid inadvertently disclosing parts of the design document to that third party. In this study, we experimentally verified that the similarity calculated with LDA does not deteriorate even when a portion of the design document is encrypted or obfuscated. We found no substantial difference in similarity compared with the original document; however, the numerical values change depending on which words are encrypted or obfuscated. In particular, the degradation in similarity is very small when the version number is encrypted or obfuscated. |
---|---|
DOI: | 10.1109/CSR57506.2023.10224991 |
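
The comparison described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not state the LDA inference method, the similarity measure, or the obfuscation scheme, so this sketch assumes collapsed Gibbs sampling, cosine similarity between per-document topic distributions, and replacement of sensitive tokens with a truncated SHA-256 digest. The function names (`obfuscate`, `lda_gibbs`, `cosine`, `similarities`), the hyperparameters, and the toy corpus are all hypothetical.

```python
import hashlib
import math
import random


def obfuscate(tokens, sensitive):
    """Replace each sensitive token with a short SHA-256 digest (assumed scheme)."""
    return [
        hashlib.sha256(t.encode()).hexdigest()[:8] if t in sensitive else t
        for t in tokens
    ]


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic distribution per document."""
    rng = random.Random(seed)
    n_vocab = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]       # tokens of doc d assigned to topic k
    topic_word = [dict() for _ in range(n_topics)]   # occurrences of word w in topic k
    topic_total = [0] * n_topics                     # total tokens assigned to topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


def cosine(u, v):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy stand-ins for vulnerability database entries and a design document.
vuln_db = [
    "buffer overflow in http server version 2.4 allows remote code execution".split(),
    "sql injection in login form allows database access".split(),
    "weak password hashing in login module allows credential theft".split(),
]
design = "http server version 2.4 parses remote requests into a fixed buffer".split()
masked = obfuscate(design, sensitive={"2.4"})  # hide only the version number


def similarities(design_tokens):
    """Fit LDA on the database plus one design document and compare topic mixes."""
    theta = lda_gibbs(vuln_db + [design_tokens])
    return [cosine(theta[-1], t) for t in theta[:-1]]


sim_plain = similarities(design)
sim_masked = similarities(masked)
print("original design document:  ", sim_plain)
print("obfuscated design document:", sim_masked)
```

Comparing the two printed lists shows how much the similarity to each database entry shifts when one word is hidden. With a corpus this small the values depend heavily on the random seed and the number of topics, so the output only illustrates the procedure and says nothing about the paper's results.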