Learning Semantic Vector Representations of Source Code via a Siamese Neural Network
Main Authors: | , , , , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 26-04-2019 |
Subjects: | |
Summary: | The abundance of open-source code, coupled with the success of recent
advances in deep learning for natural language processing, has given rise to a
promising new application of machine learning to source code. In this work, we
explore the use of a Siamese recurrent neural network model on Python source
code to create vectors which capture the semantics of code. We evaluate the
quality of embeddings by identifying which problem from a programming
competition the code solves. Our model significantly outperforms a
bag-of-tokens embedding, providing promising results for improving code
embeddings that can be used in future software engineering tasks. |
---|---|
DOI: | 10.48550/arxiv.1904.11968 |
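
The summary compares the learned embeddings against a bag-of-tokens baseline but does not say how that baseline is built. As a rough illustration only, the sketch below assumes a baseline that counts Python lexical tokens and compares two programs by cosine similarity. The function names `bag_of_tokens` and `cosine_similarity`, the choice of skipped token types, and the use of cosine similarity are all assumptions made for this example and are not taken from the paper.

```python
import io
import math
import tokenize
from collections import Counter

# Token types treated as carrying no content here (an assumption of this sketch).
_SKIP = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def bag_of_tokens(source: str) -> Counter:
    """Count the lexical tokens in a Python source string, ignoring their order."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return Counter(tok.string for tok in tokens if tok.type not in _SKIP)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two hypothetical solutions to the same problem, written differently.
    loop_version = "total = 0\nfor x in xs:\n    total += x\n"
    builtin_version = "total = sum(xs)\n"
    score = cosine_similarity(bag_of_tokens(loop_version), bag_of_tokens(builtin_version))
    print(f"bag-of-tokens similarity: {score:.3f}")
```

A representation like this sees only which tokens occur, so two programs that solve the same problem with different constructs can look dissimilar. That limitation is the kind of gap a learned embedding, such as the Siamese model the summary describes, is meant to close.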