MultiCoder: Multi-Programming-Lingual Pre-Training for Low-Resource Code Completion
Format: Journal Article
Language: English
Published: 19-12-2022
Summary: Code completion is a valuable topic in both academia and industry. Recently, large-scale mono-programming-lingual (MonoPL) pre-training models have been proposed to boost the performance of code completion. However, code completion for low-resource programming languages (PLs) is difficult for the data-driven paradigm, even though plenty of developers use low-resource PLs. Moreover, few studies have explored the effects of multi-programming-lingual (MultiPL) pre-training on code completion, especially its impact on low-resource programming languages. To this end, we propose MultiCoder to enhance low-resource code completion via MultiPL pre-training and MultiPL Mixture-of-Experts (MoE) layers. We further propose a novel PL-level MoE routing strategy (PL-MoE) to improve code completion on all PLs. Experimental results on CodeXGLUE and MultiCC demonstrate that 1) the proposed MultiCoder significantly outperforms the MonoPL baselines on low-resource programming languages, and 2) the PL-MoE module further boosts the performance on six programming languages. In addition, we analyze the effects of the proposed method in detail and explore the effectiveness of our method in a variety of scenarios.
DOI: 10.48550/arxiv.2212.09666
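The key mechanism named in the summary is PL-level MoE routing: rather than gating each token independently, every sample is routed to an expert tied to its programming language. The record does not include the authors' implementation, so the following is a minimal, hypothetical PyTorch sketch of what such routing might look like; the class name PLMoELayer, the one-expert-per-PL layout, and all dimensions are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn


class PLMoELayer(nn.Module):
    """Hypothetical PL-level Mixture-of-Experts feed-forward layer.

    Unlike token-level MoE gating, all tokens of a sequence are routed
    to the expert associated with that sequence's programming language,
    so samples from the same PL always share the same expert.
    """

    def __init__(self, d_model: int, d_ff: int, num_pls: int):
        super().__init__()
        # Assumption: one feed-forward expert per programming language.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.GELU(),
                nn.Linear(d_ff, d_model),
            )
            for _ in range(num_pls)
        )

    def forward(self, hidden: torch.Tensor, pl_ids: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model); pl_ids: (batch,) integer PL labels.
        out = torch.zeros_like(hidden)
        for pl in pl_ids.unique():
            mask = pl_ids == pl  # select all sequences written in this PL
            out[mask] = self.experts[int(pl)](hidden[mask])
        return out


# Usage: route a mixed-PL batch (e.g. PL 0 and PL 1) through the layer.
layer = PLMoELayer(d_model=256, d_ff=1024, num_pls=6)
hidden = torch.randn(4, 128, 256)
pl_ids = torch.tensor([0, 1, 0, 1])
output = layer(hidden, pl_ids)  # shape: (4, 128, 256)
```

This deterministic routing needs no learned gate or load-balancing loss, at the cost of no parameter sharing between PLs inside the expert layers; the actual PL-MoE design may combine such PL-level routing with shared experts or token-level gating, which this sketch does not attempt to reproduce.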