Unsupervised Translation of Programming Languages

Conference on Neural Information Processing Systems (NeurIPS)


A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin.

Related Publications

All Publications

NAACL - June 6, 2021

Deep Learning on Graphs for Natural Language Processing

Lingfei Wu, Yu Chen, Heng Ji, Yunyao Li

ICASSP - June 6, 2021

On the Predictability of HRTFs from Ear Shapes Using Deep Networks

Yaxuan Zhou, Hao Jiang, Vamsi Krishna Ithapu

ACL - July 7, 2020

CraftAssist Instruction Parsing: Semantic Parsing for a Voxel-World Assistant

Kavya Srinet, Yacine Jernite, Jonathan Gray, Arthur Szlam

L4DC - June 7, 2021

On the model-based stochastic value gradient for continuous reinforcement learning

Brandon Amos, Samuel Stanton, Denis Yarats, Andrew Gordon Wilson

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy