When Deep Learning Met Code Search

The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)


Several recent proposals use deep neural networks for code search from natural language queries. Common across these proposals is the idea of embedding code and natural language queries into real-valued vectors and then using vector distance to approximate the semantic correlation between the code and the query. Multiple approaches exist for learning these embeddings, including unsupervised techniques, which rely only on a corpus of code examples, and supervised techniques, which use an aligned corpus of paired code and natural language descriptions. The goal of this supervision is to produce embeddings that are more similar for a query and the corresponding desired code snippet.
Clearly, there are choices in whether to use supervised techniques at all and, if one does, in what sort of network and training to use for supervision. This paper is the first to evaluate these choices systematically. To this end, we assembled implementations of state-of-the-art techniques to run on a common platform with shared training and evaluation corpora. To explore the design space in network complexity, we also introduced a new design point that is a minimal supervision extension to an existing unsupervised technique. Our evaluation shows that: 1. adding supervision to an existing unsupervised technique can improve performance, though not necessarily by much; 2. simple networks for supervision can be more effective than more sophisticated sequence-based networks for code search; 3. while it is common to use docstrings to carry out supervision, there is a sizable gap between the effectiveness of docstrings and a more query-appropriate supervision corpus.
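
The retrieval step shared by these proposals, ranking code snippets by the vector similarity between their embeddings and the query embedding, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-dimensional vectors are hand-written stand-ins for learned embeddings, the snippet names are hypothetical, and cosine similarity is one common choice of similarity measure rather than necessarily the one used by any particular technique evaluated in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_snippets(
    query_vec: Sequence[float],
    snippet_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k snippets, most similar to the query first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in snippet_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-written embeddings standing in for learned ones.
SNIPPET_VECS = {
    "read_file_lines": [0.9, 0.1, 0.0],
    "http_get_json": [0.1, 0.9, 0.2],
    "sort_by_key": [0.0, 0.2, 0.9],
}

# Pretend embedding of a query such as "read a file line by line".
QUERY_VEC = [0.8, 0.2, 0.1]

results = rank_snippets(QUERY_VEC, SNIPPET_VECS, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

In this sketch the unsupervised and supervised approaches would differ only in how the vectors are produced; the ranking step stays the same.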

