A Neural Model for Generating Natural Language Summaries of Program Subroutines
Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly-growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built...
Saved in:
Main Authors: | , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: |
05-02-2019
|
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Source code summarization -- creating natural language descriptions of source
code behavior -- is a rapidly-growing research topic with applications to
automatic documentation generation, program comprehension, and software
maintenance. Traditional techniques relied on heuristics and templates built
manually by human experts. Recently, data-driven approaches based on neural
machine translation have largely overtaken template-based systems. But nearly
all of these techniques rely almost entirely on programs having good internal
documentation; without clear identifier names, the models fail to create good
summaries. In this paper, we present a neural model that combines words from
code with code structure from an AST. Unlike previous approaches, our model
processes each data source as a separate input, which allows the model to learn
code structure independent of the text in code. This process helps our approach
provide coherent summaries in many cases even when zero internal documentation
is provided. We evaluate our technique with a dataset we created from 2.1m Java
methods. We find improvement over two baseline techniques from SE literature
and one from NLP literature. |
---|---|
DOI: | 10.48550/arxiv.1902.01954 |