MSCCD: Grammar Pluggable Clone Detection Based on ANTLR Parser Generation

For various reasons, programming languages continue to multiply and evolve. It has become necessary to have a multilingual clone detection tool that can easily expand supported programming languages and detect various code clones is needed. However, research on multilingual code clone detection has...

Full description

Saved in:
Bibliographic Details
Main Authors: Zhu, Wenqing, Yoshida, Norihiro, Kamiya, Toshihiro, Choi, Eunjong, Takada, Hiroaki
Format: Journal Article
Language:English
Published: 06-04-2022
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:For various reasons, programming languages continue to multiply and evolve. It has become necessary to have a multilingual clone detection tool that can easily expand supported programming languages and detect various code clones is needed. However, research on multilingual code clone detection has not received sufficient attention. In this study, we propose MSCCD (Multilingual Syntactic Code Clone Detector), a grammar pluggable code clone detection tool that uses a parser generator to generate a code block extractor for the target language. The extractor then extracts the semantic code blocks from a parse tree. MSCCD can detect Type-3 clones at various granularities. We evaluated MSCCD's language extensibility by applying MSCCD to 20 modern languages. Sixteen languages were perfectly supported, and the remaining four were provided with the same detection capabilities at the expense of execution time. We evaluated MSCCD's recall by using BigCloneEval and conducted a manual experiment to evaluate precision. MSCCD achieved equivalent detection performance equivalent to state-of-the-art tools.
DOI:10.48550/arxiv.2204.01028