Investigating Vector-Based Detection of Code Clones Using BigCloneBench

Bibliographic Details
Published in: 2018 25th Asia-Pacific Software Engineering Conference (APSEC), pp. 699-700
Main Authors: Yokoi, Kazuki, Choi, Eunjong, Yoshida, Norihiro, Inoue, Katsuro
Format: Conference Proceeding
Language: English
Published: IEEE 01-12-2018
Description
Summary: In a vector-based approach to code clone detection, all code fragments in the source code are mapped to a vector space, and fragments that are neighbors in that space are reported as code clones. Our research group has previously developed a vector-based approach that uses TF-IDF and cosine similarity. To improve this approach, we conducted a preliminary investigation into which vectorization algorithms and similarity measures are effective in terms of recall and detection time. In this paper, we present the results of that investigation using BigCloneBench, a large-scale code clone benchmark.
ISSN: 2640-0715
DOI: 10.1109/APSEC.2018.00095
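
The summary describes mapping code fragments to vectors with TF-IDF and reporting fragments whose vectors are close under cosine similarity. The following is a minimal sketch of that general idea, not the authors' implementation: the regex tokenizer, the smoothed IDF formula, the brute-force pairwise comparison, the 0.8 threshold, and all function names (`tokenize`, `tfidf_vectors`, `cosine`, `detect_clones`) are assumptions made for illustration, since this record does not specify them.

```python
import math
import re
from collections import Counter


def tokenize(code):
    # Assumption: keep identifiers, keywords, and integer literals; drop punctuation.
    return re.findall(r"[A-Za-z_]\w*|\d+", code)


def tfidf_vectors(fragments):
    # One sparse vector (dict: token -> weight) per code fragment.
    counts = [Counter(tokenize(f)) for f in fragments]
    n = len(counts)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    # Assumption: smoothed IDF, so tokens present in every fragment keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def detect_clones(fragments, threshold=0.8):
    # Brute-force comparison of all pairs; the threshold is a hypothetical choice.
    vecs = tfidf_vectors(fragments)
    pairs = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            sim = cosine(vecs[i], vecs[j])
            if sim >= threshold:
                pairs.append((i, j, sim))
    return pairs


if __name__ == "__main__":
    fragments = [
        "int sum(int a, int b) { return a + b; }",
        "int sum(int a, int b) { return a + b; }",
        "void log(String msg) { System.out.println(msg); }",
    ]
    for i, j, sim in detect_clones(fragments):
        print(f"fragments {i} and {j} look like clones (similarity {sim:.2f})")
```

Comparing every pair is quadratic in the number of fragments, which is one reason detection time matters on a benchmark the size of BigCloneBench; a practical detector would likely replace the nested loop with some form of nearest-neighbor search.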