Neural Query Expansion for Code Search

Machine Learning and Programming Languages (MAPL) Workshop at PLDI


Searching repositories of existing source code for code snippets is a key task in software engineering. Over the years, many approaches to this problem have been proposed. One recent tool called NCS, takes in a natural language query and outputs relevant code snippets, often being able to correctly answer Stack Overflow questions. But what happens when the developer doesn’t provide a query with a clear intent?

What if shorter queries are used to demonstrate a more vague intent? We find that the performance of NCS regresses with shorter queries. Furthermore, data from developers’ code search history logs shows that shorter queries have a less successful code search session: there are more query reformulations and more time is spent browsing the results. These observations lead us to believe that using NCS alone with short queries may not be productive enough.

In this paper, we explore an additional way of using neural networks in code search: the automatic expansion of queries. We present NQE, a neural model that takes in a set of keywords and predicts a set of keywords to expand the query to NCS. NQE learns to predict keywords that co-occur with the query keywords in the underlying corpus, which helps expand the query in a productive way. Our results show that with query expansion, NQE + NCS is able to perform better than using NCS alone.

Related Publications

All Publications

Uncertainty and Robustness in Deep Learning Workshop at ICML - June 24, 2021

DAIR: Data Augmented Invariant Regularization

Tianjian Huang, Chinnadhurai Sankar, Pooyan Amini, Satwik Kottur, Alborz Geramifard, Meisam Razaviyayn, Ahmad Beirami

AutoML Workshop at NeurIPS - July 18, 2021

Neural Fixed-Point Acceleration for Convex Optimization

Shobha Venkataraman, Brandon Amos

ESEM - September 23, 2021

Measurement Challenges for Cyber Cyber Digital Twins: Experiences from the Deployment of Facebook’s WW Simulation System

Kinga Bojarczuk, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Maria Lomeli, Simon Mark Lucas, Erik Meijer, Rubmary Rojas, Silvia Sapora

Federated Learning for User Privacy and Data Confidentiality Workshop At ICML - July 24, 2021

Federated Learning with Buffered Asynchronous Aggregation

John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael Rabbat, Mani Malek, Dzmitry Huba

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy