Investigating Vector-Based Detection of Code Clones Using BigCloneBench
Published in: | 2018 25th Asia-Pacific Software Engineering Conference (APSEC) pp. 699 - 700 |
---|---|
Main Authors: | , , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-12-2018 |
Summary: | In a vector-based approach to detecting code clones in source code, every code fragment is mapped to a vector space, and fragments that are neighbors in that space are reported as code clones. Our research group has previously developed a vector-based approach using TF-IDF and cosine similarity. To improve this approach, we conducted a preliminary investigation of which vectorization algorithms and similarity measures are effective in terms of recall and detection time. In this paper, we present the results of this preliminary investigation using BigCloneBench, a large-scale code clone benchmark. |
ISSN: | 2640-0715 |
DOI: | 10.1109/APSEC.2018.00095 |
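The summary does not describe the group's implementation, so what follows is only a minimal, hypothetical Python sketch of the general scheme it names: map each code fragment to a TF-IDF vector, compare fragments by cosine similarity, report pairs above a threshold as clones, and measure recall against a set of reference clone pairs (the role a benchmark such as BigCloneBench plays). The regex tokenizer, the smoothed IDF variant, the 0.8 default threshold, and the names `tokenize`, `tfidf_vectors`, `cosine`, `detect_clones`, and `recall` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical tokenizer (assumption): identifiers, integer literals, and
# single non-space characters. A real detector would use a proper lexer
# for the target language.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    return TOKEN_RE.findall(code)


def tfidf_vectors(fragments: dict[str, str]) -> dict[str, dict[str, float]]:
    """Map each fragment id to a sparse TF-IDF vector (term -> weight)."""
    counts = {fid: Counter(tokenize(src)) for fid, src in fragments.items()}
    n = len(fragments)
    # Document frequency: number of fragments that contain each term.
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF, one common variant (an assumption, not the paper's formula).
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    # Raw term count times IDF.
    return {fid: {t: tf * idf[t] for t, tf in c.items()} for fid, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector for the dot product
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def detect_clones(fragments: dict[str, str], threshold: float = 0.8) -> set[tuple[str, str]]:
    """Report (id1, id2) pairs, with id1 < id2, whose similarity meets the threshold."""
    vecs = tfidf_vectors(fragments)
    return {
        (x, y)
        for x, y in combinations(sorted(vecs), 2)
        if cosine(vecs[x], vecs[y]) >= threshold
    }


def recall(detected: set[tuple[str, str]], reference: set[tuple[str, str]]) -> float:
    """Fraction of reference clone pairs (in either order) that were detected."""
    if not reference:
        return 0.0
    found = sum(1 for p in reference if tuple(sorted(p)) in detected)
    return found / len(reference)


if __name__ == "__main__":
    frags = {
        "a": "int sum(int[] xs) { int s = 0; for (int x : xs) s += x; return s; }",
        "b": "int total(int[] ys) { int t = 0; for (int y : ys) t += y; return t; }",
        "c": "void log(String msg) { System.out.println(msg); }",
    }
    found = detect_clones(frags, threshold=0.5)
    print(found)
    print(recall(found, {("a", "b")}))
```

This sketch compares every pair of fragments, which is quadratic in the number of fragments; on a benchmark the size of BigCloneBench, detection time would likely depend heavily on replacing that loop with a nearest-neighbor search. Renamed identifiers, as in fragments `a` and `b`, also lower the similarity under this raw-token scheme, so the choice of vectorization and threshold directly affects recall.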