Learning Semantic Vector Representations of Source Code via a Siamese Neural Network

Bibliographic Details
Main Authors: Wehr, David; Fede, Halley; Pence, Eleanor; Zhang, Bo; Ferreira, Guilherme; Walczyk, John; Hughes, Joseph
Format: Journal Article
Language: English
Published: 26-04-2019
Description
Summary: The abundance of open-source code, coupled with the success of recent advances in deep learning for natural language processing, has given rise to a promising new application of machine learning to source code. In this work, we explore the use of a Siamese recurrent neural network model on Python source code to create vectors which capture the semantics of code. We evaluate the quality of embeddings by identifying which problem from a programming competition the code solves. Our model significantly outperforms a bag-of-tokens embedding, providing promising results for improving code embeddings that can be used in future software engineering tasks.
DOI: 10.48550/arxiv.1904.11968
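
Illustration: the summary compares the Siamese model against a bag-of-tokens embedding. The record does not say how that baseline was tokenized or scored, so the following is only a minimal sketch of the general idea, assuming Python's standard `tokenize` module for lexing and cosine similarity for comparison. The function names `bag_of_tokens` and `cosine_similarity` are hypothetical and do not come from the paper, and the Siamese recurrent model itself is not reproduced here.

```python
import io
import math
import tokenize
from collections import Counter

# Token types that carry layout rather than content (an assumption of this sketch).
_SKIPPED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def bag_of_tokens(source: str) -> Counter:
    """Count the lexical tokens of a Python source string, ignoring order."""
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in _SKIPPED:
            counts[tok.string] += 1
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two snippets that compute the same sum but share few tokens.
sum_loop = "total = 0\nfor x in data:\n    total += x\n"
sum_builtin = "total = sum(data)\n"

similarity = cosine_similarity(bag_of_tokens(sum_loop), bag_of_tokens(sum_builtin))
print(f"bag-of-tokens similarity: {similarity:.3f}")
```

The two snippets solve the same task yet overlap in only a few tokens, which is the kind of surface-level limitation that motivates learning semantic embeddings instead.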