June 11, 2015

Open-sourcing Facebook Infer: Identify bugs before you ship

By: Cristiano Calcagno, Dino Distefano, Peter O'Hearn

Originally published on the Facebook Code blog.

Today, we’re open-sourcing Facebook Infer, a static program analyzer that Facebook uses to identify bugs before mobile code is shipped. Static analyzers are automated tools that spot bugs in source code by scanning programs without running them. They complement traditional dynamic testing: Where testing allows individual runs through a piece of software to be checked for correctness, static analysis allows multiple and sometimes even all flows to be checked at once. Facebook Infer uses mathematical logic to do symbolic reasoning about program execution, approximating some of the reasoning a human might do when looking at a program. We use Facebook Infer internally to analyze the main Facebook apps for Android and iOS (used by more than a billion people), Facebook Messenger, and Instagram, among others. At present, the analyzer reports problems caused by null pointer access and resource and memory leaks, which cause a large percentage of app crashes.

Each month, hundreds of potential bugs identified by Facebook Infer are fixed by our developers before they are committed to our codebases and deployed to people’s phones. This saves our developers many hours finding and fixing bugs, and results in better products for people.

Motivation

Facebook likes to move fast by shipping software as soon as it’s ready, rather than after a lengthy manual testing process. This approach is riskier on mobile platforms than on the web. When a bug is discovered on the web, a fix can immediately be shipped to servers, but in order for a bug fix to take effect on mobile, people have to update their apps. There is thus added value in catching bugs early, before they are shipped to people’s phones. Combining static analysis with automated tests helps Facebook find crashes and memory leaks before they are shipped, which helps us keep moving fast on mobile.

To fit in with our developers’ workflow, Facebook Infer runs incrementally, analyzing code modifications to the source code of our app when they are submitted by developers for review. The analyzer automatically writes comments on source code when it finds potential problems. It is important that these reports be high quality so that we help developers without slowing them down. In recent months, the fix rate for issues reported to our developers has been hovering around 80 percent, a high rate for such an automated tool.

In the current deployment model, we run the analyzer on the Android and Objective-C code for our apps, but more is possible. Facebook Infer can analyze C projects and Java code that is not Android, and we plan to further extend its capabilities. With your help, we hope to expand the places where Facebook Infer is deployed.

Novel techniques: Separation logic and bi-abduction

Facebook Infer uses logic to do reasoning about a program’s execution, but reasoning at this scale — for large applications built from millions of lines of source code — is hard. Theoretically, the number of possibilities that need to be checked is more than the number of estimated atoms in the observable universe. Furthermore, at Facebook our code is not a fixed artifact but an evolving system, updated frequently and concurrently by many developers. It is not unusual to see more than a thousand modifications to our mobile code submitted for review in a given day. The requirements on the program analyzer then become even more challenging because we expect a tool to report quickly on these code modifications — in the region of 10 minutes — to fit in with developers’ workflow. Coping with this scale and velocity requires advanced mathematical techniques. Facebook Infer uses two such techniques: separation logic and bi-abduction.

Separation logic is a theory that allows Facebook Infer’s analysis to reason about small, independent parts of the application storage, rather than having to consider the entirety of the memory potentially at every step. That would be a daunting task on modern processors with their large addressable virtual memories.

Bi-abduction is a logical inference technique that allows Facebook Infer to discover properties about the behavior of independent parts of the application code. By storing these properties between runs, Facebook Infer needs to analyze only the parts of the software that have changed, reusing the results of its previous analysis where it can.

By combining these approaches, our analyzer is able to find complex problems in modifications to an application built from millions of lines of code, in minutes.

History: From theoretical computer science to verification for a billion-plus people

Automatic verification of software has been a long-held goal in the computer science community. Facebook Infer builds on fundamental work in the area, including theories of Hoare logic and abstract interpretation. Before joining Facebook, we participated in the development of another fundamental work, “separation logic,” as a response to problems in scaling verification techniques to real-world code.

Separation logic was a breakthrough for highly theoretical computer science — a new kind of mathematical logic used to describe mutations of computer memory (analogous to how Boolean logic describes circuits). We worked on applying and automating the theory, creating a series of research prototype tools (Smallfoot, Space Invader, Abductor) and underpinning inference techniques, culminating in the discovery of bi-abduction as a way to analyze programs modularly. (We retain university affiliations: Calcagno with Imperial College London, Distefano with Queen Mary University of London, and O’Hearn with University College London.)

Based on the success of these research advances, we formed a startup company, Monoidics, in 2009. Monoidics joined Facebook in 2013, and since then we have adopted the continuous development and deployment style practiced at Facebook and improved our analyzer based on an iterative feedback loop between our analyzer’s development team and Facebook’s mobile software engineers. We have demonstrated that verification technology can indeed thrive when applied to codebases at Facebook’s scale and pace of change.

The future

Program verification is an area with an active research community and promising technology. At Facebook, we have a saying that this journey is 1 percent finished. In program verification, certainly, a tremendous amount remains to be done. But, with continued progress, we believe there is also much more value that can be unlocked for programmers. We look forward to a future in which, with your help, program verification technology can prove more and more useful in helping programmers develop reliable code, fast.

To find out more about Facebook Infer, download it and try it out, or check out fbinfer.com for more details.

Acknowledgements: The Infer engineering team @FB includes Sam Blackshear, Jeremy Dubreil, Andrzej Kotulski, Martino Luca, Irene Papakonstantinou, Dulma Rodriguez, and Jules Villard, in addition to Calcagno, Distefano, and O’Hearn. We thank our FB colleagues Mathieu Baudet, Dominik Gabi, and Pieter Hooimeijer for their help, and Bryan O’Sullivan, David Mortenson, and Jim Purbrick for their support. Outside of Facebook, we particularly acknowledge the scientific contributions of David Pym, John Reynolds, Hongseok Yang, and Josh Berdine.