Learning Semantic Vector Representations of Source Code via a Siamese Neural Network
Main Authors: | Wehr, David; Fede, Halley; Pence, Eleanor; Zhang, Bo; Ferreira, Guilherme; Walczyk, John; Hughes, Joseph |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 26-04-2019 |
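Subjects: | Computer Science - Learning; Computer Science - Programming Languages; Computer Science - Software Engineering; Statistics - Machine Learning |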
Online Access: | https://arxiv.org/abs/1904.11968 |
Abstract | The abundance of open-source code, coupled with the success of recent
advances in deep learning for natural language processing, has given rise to a
promising new application of machine learning to source code. In this work, we
explore the use of a Siamese recurrent neural network model on Python source
code to create vectors which capture the semantics of code. We evaluate the
quality of embeddings by identifying which problem from a programming
competition the code solves. Our model significantly outperforms a
bag-of-tokens embedding, providing promising results for improving code
embeddings that can be used in future software engineering tasks. |
---|---|
Author | Pence, Eleanor Walczyk, John Ferreira, Guilherme Wehr, David Zhang, Bo Fede, Halley Hughes, Joseph |
Author_xml | – sequence: 1 givenname: David surname: Wehr fullname: Wehr, David – sequence: 2 givenname: Halley surname: Fede fullname: Fede, Halley – sequence: 3 givenname: Eleanor surname: Pence fullname: Pence, Eleanor – sequence: 4 givenname: Bo surname: Zhang fullname: Zhang, Bo – sequence: 5 givenname: Guilherme surname: Ferreira fullname: Ferreira, Guilherme – sequence: 6 givenname: John surname: Walczyk fullname: Walczyk, John – sequence: 7 givenname: Joseph surname: Hughes fullname: Hughes, Joseph |
BackLink | https://doi.org/10.48550/arXiv.1904.11968$$DView paper in arXiv |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
Copyright_xml | – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DBID | AKY EPD GOX |
DOI | 10.48550/arxiv.1904.11968 |
DatabaseName | arXiv Computer Science arXiv Statistics arXiv.org |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 1904_11968 |
GroupedDBID | AKY EPD GOX |
ID | FETCH-LOGICAL-a678-43eb0d82dff83dac8a659a4a09291bd37b71f939a48e5d4fd8a03c2395cd15ae3 |
IEDL.DBID | GOX |
IngestDate | Mon Jan 08 05:43:02 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-a678-43eb0d82dff83dac8a659a4a09291bd37b71f939a48e5d4fd8a03c2395cd15ae3 |
OpenAccessLink | https://arxiv.org/abs/1904.11968 |
ParticipantIDs | arxiv_primary_1904_11968 |
PublicationCentury | 2000 |
PublicationDate | 2019-04-26 |
PublicationDateYYYYMMDD | 2019-04-26 |
PublicationDate_xml | – month: 04 year: 2019 text: 2019-04-26 day: 26 |
PublicationDecade | 2010 |
PublicationYear | 2019 |
Score | 1.7328358 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Learning Computer Science - Programming Languages Computer Science - Software Engineering Statistics - Machine Learning |
Title | Learning Semantic Vector Representations of Source Code via a Siamese Neural Network |
URI | https://arxiv.org/abs/1904.11968 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Cornell University |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Learning+Semantic+Vector+Representations+of+Source+Code+via+a+Siamese+Neural+Network&rft.au=Wehr%2C+David&rft.au=Fede%2C+Halley&rft.au=Pence%2C+Eleanor&rft.au=Zhang%2C+Bo&rft.date=2019-04-26&rft_id=info:doi/10.48550%2Farxiv.1904.11968&rft.externalDocID=1904_11968 |
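
Illustrative note: the abstract compares the Siamese model against a bag-of-tokens embedding. The sketch below shows what such a baseline could look like, assuming Python's standard `tokenize` module for lexing and cosine similarity for comparing snippets. The function names `bag_of_tokens` and `cosine` are hypothetical, and this record does not describe how the authors built their baseline, so this should not be read as their implementation.

```python
import io
import math
import tokenize
from collections import Counter

# Token types that carry layout rather than content; skipped in this sketch.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def bag_of_tokens(source: str) -> Counter:
    """Count the lexical tokens of a Python source string, discarding order."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return Counter(tok.string for tok in tokens if tok.type not in _SKIP)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two snippets with different behavior but the same multiset of tokens.
left = bag_of_tokens("a = b - c\n")
right = bag_of_tokens("a = c - b\n")
similarity = cosine(left, right)
```

Because the two snippets contain the same tokens in a different order, their bags are identical and the similarity is maximal even though the programs compute different values. This order-insensitivity is one plausible reason a sequence model such as a recurrent network might capture semantics that a bag-of-tokens embedding misses.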