Automatic Algorithm Recognition of Source-Code Using Machine Learning

Bibliographic Details
Published in: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 170-177
Main Authors: Shalaby, Maged; Mehrez, Tarek; El Mougy, Amr; Abdulnasser, Khalid; Al-Safty, Aysha
Format: Conference Proceeding
Language: English
Published: IEEE 01-12-2017
Description
Summary: As codebases for software projects grow to millions of lines of code, the need for computer-aided program comprehension grows with them. We define one task of program comprehension as algorithm recognition: given a piece of source code from a file, identify the algorithm this code implements, such as brute force or dynamic programming. Most research in this area relies on pattern matching, which requires considerable human effort and is of questionable accuracy when the structure and semantics of programs change. This paper therefore proposes abandoning predefined patterns in favor of simpler features, such as counts of variables and counts of different constructs, to recognize algorithms. These features are then fed to a classification algorithm, which predicts the class or type of algorithm used in the source code. We show through experimental results that the proposed method achieves a good improvement over the baseline.
DOI: 10.1109/ICMLA.2017.00033