Automatic Algorithm Recognition of Source-Code Using Machine Learning
Published in: | 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) pp. 170 - 177 |
---|---|
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-12-2017 |
Summary: | As software codebases grow to millions of lines of code, the need for computer-aided program comprehension grows with them. We define one task of program comprehension to be algorithm recognition: given a piece of source code from a file, identify the kind of algorithm the code implements, such as brute force or dynamic programming. Most research in this area relies on pattern matching, which demands considerable human effort and is of questionable accuracy when the structure and semantics of programs change. This paper therefore proposes to abandon predefined patterns and instead recognize algorithms from simpler features, such as counts of variables and counts of different constructs. We then feed these features to a classification algorithm to predict the class or type of algorithm used in the source code. We show through experimental results that our proposed method improves over the baseline. |
---|---|
DOI: | 10.1109/ICMLA.2017.00033 |
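
The approach outlined in the summary (simple count features fed to a classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the feature names, the use of Python's `ast` module to count constructs, the nearest-centroid classifier, and the three toy snippets. The record does not state which features, source language, classifier, or baseline the authors actually used.

```python
import ast
from collections import Counter

# Hypothetical feature set. The summary only mentions "counts of variables
# and counts of different constructs"; these particular names are assumed.
FEATURES = ("variables", "loops", "branches", "calls", "subscripts")


def extract_features(source):
    """Return simple construct counts for a piece of Python source code."""
    counts = Counter()
    assigned = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            assigned.add(node.id)  # distinct names that are assigned to
        elif isinstance(node, (ast.For, ast.While)):
            counts["loops"] += 1
        elif isinstance(node, ast.If):
            counts["branches"] += 1
        elif isinstance(node, ast.Call):
            counts["calls"] += 1
        elif isinstance(node, ast.Subscript):
            counts["subscripts"] += 1
    counts["variables"] = len(assigned)
    return [counts[name] for name in FEATURES]


def train_centroids(samples):
    """Average the feature vectors of each label (nearest-centroid model)."""
    grouped = {}
    for vector, label in samples:
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest in squared distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Toy training snippets (illustrative only, not data from the paper).
BRUTE_FORCE = """
def max_pair_sum(xs):
    best = None
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            s = xs[i] + xs[j]
            if best is None or s > best:
                best = s
    return best
"""

DYNAMIC_PROGRAMMING = """
def fib(n):
    table = [0, 1] + [0] * (n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
"""

# Unlabelled snippet to classify.
QUERY = """
def has_duplicate(xs):
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                return True
    return False
"""

centroids = train_centroids([
    (extract_features(BRUTE_FORCE), "brute-force"),
    (extract_features(DYNAMIC_PROGRAMMING), "dynamic programming"),
])
prediction = predict(centroids, extract_features(QUERY))
print(prediction)
```

With one training example per class this is only a toy. A realistic experiment would need a labelled corpus and a proper classifier, and count features this coarse may not separate algorithm classes well on real code.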