October 30, 2017

Facebook SOSP papers present real-world solutions to complex system challenges

By: Meta Research

SVE: Distributed Video Processing at Facebook Scale

Video is a growing part of the experience of the billions of people who use the Internet. On an average day, videos are viewed more than 8 billion times on Facebook, and each of these videos needs to be uploaded, processed, shared, and then downloaded. This week Facebook researchers Qi Huang, Petchean Ang, Peter Knowles, Tomasz Nykiel, Iaroslav Tverdokhlib, Amit Yajurvedi, Paul Dapolito IV, Xifan Yan, Maxim Bykov, Chuen Liang, Mohit Talwar, Abhishek Mathur, Sachin Kulkarni, Matthew Burke, and Wyatt Lloyd present their paper SVE: Distributed Video Processing at Facebook Scale at the biennial ACM Symposium on Operating Systems Principles (SOSP). SOSP is the world’s premier forum for academics and industry practitioners to share their latest research spanning theory and practice of computer systems software.

A video-first world

At Facebook, we envision a video-first world with video being used in many of our apps and services. The Streaming Video Engine (SVE) paper focuses on the requirements that Facebook scale presents, and how we handle them for the uploading and processing stages of the video pipeline. Processing uploaded videos is a necessary step before they are made available for sharing. It includes validating that the uploaded file follows a video format and then re-encoding it into a variety of bitrates and formats. Multiple bitrates enable clients to continuously stream videos at the highest sustainable quality under varying network conditions, and multiple formats enable support for diverse devices with varied client releases.
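To make the re-encoding step concrete, here is a minimal sketch that produces a small bitrate ladder with ffmpeg. The ladder values, output naming, and the encode_ladder helper are illustrative assumptions for this post, not SVE’s actual encoder or settings.

    import subprocess

    # Illustrative bitrate ladder: (label, target video bitrate, output height).
    # These values are assumptions for this example, not SVE's real settings.
    LADDER = [("360p", "800k", 360), ("720p", "2500k", 720), ("1080p", "5000k", 1080)]

    def encode_ladder(src: str) -> list:
        """Re-encode one validated upload into several bitrates/resolutions."""
        outputs = []
        for label, bitrate, height in LADDER:
            out = f"{src}.{label}.mp4"
            subprocess.run(
                ["ffmpeg", "-y", "-i", src,
                 "-c:v", "libx264", "-b:v", bitrate,   # H.264 at the target bitrate
                 "-vf", f"scale=-2:{height}",          # keep aspect ratio, even width
                 "-c:a", "aac",
                 out],
                check=True,
            )
            outputs.append(out)
        return outputs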

There are three major requirements for Facebook’s video uploading and processing pipeline: provide low latency, be flexible enough to support many applications, and be robust to faults and overload. Flexibility allows us to address the ever-increasing quantity and complexity of uploading and processing operations, including low-latency sharing and application-specific video operations such as computer vision extraction and speech recognition. Overload is inevitable because workloads are highly variable, with large peaks of activity that are unpredictable due to their viral nature, so our systems need to stay reliable under those peaks.

Initial solutions for uploading and processing videos centered around a monolithic encoding script (MES). The MES worked when video was a nascent product, but as we scaled it did not handle the requirements of low latency, flexibility, or robustness well. To meet those requirements, the team developed SVE.

SVE provides low latency by harnessing parallelism in three ways that MES did not. First, SVE overlaps the uploading and processing of videos. Second, it parallelizes processing by chunking each video into (essentially) smaller videos and processing the chunks separately on a large cluster of machines. Third, SVE performs the storing of uploaded videos, with replication for fault tolerance, in parallel with their processing. Taken together, these improvements enable SVE to reduce the time between upload completion and sharing by 2.3x–9.3x over MES.
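As a rough illustration of the chunk-level parallelism described above, the sketch below hands each chunk to a worker as soon as it arrives rather than waiting for the full upload to finish. The process_chunk and upload_and_process helpers are hypothetical stand-ins for work that SVE distributes across a large cluster of machines.

    from concurrent.futures import ThreadPoolExecutor

    def process_chunk(chunk: bytes) -> bytes:
        """Stand-in for re-encoding one chunk (each chunk is a small, standalone video)."""
        return chunk  # a real worker would re-encode the chunk here

    def upload_and_process(incoming_chunks):
        """Overlap upload with processing: hand each chunk to a worker as soon
        as it arrives instead of waiting for the whole upload to finish."""
        with ThreadPoolExecutor(max_workers=8) as pool:
            futures = [pool.submit(process_chunk, c) for c in incoming_chunks]
            return [f.result() for f in futures]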

SVE provides flexibility with a directed acyclic graph (DAG) programming model in which application programmers write tasks that operate on a stream-of-tracks abstraction, which breaks a video down into tracks of video, audio, and metadata and thereby simplifies application programming. A DAG can be generated dynamically for each video, even within the same application, making it simple to offer a customized pipeline for every upload. The DAG execution model lets programmers chain tasks sequentially and parallelize them when it matters for latency. More than 15 applications that integrate video have registered with SVE, including four well-known examples: video posts on News Feed, Messenger videos, Instagram stories, and Facebook 360. These applications ingest tens of millions of uploads per day and generate billions of tasks within SVE.
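To give a feel for the programming model, here is a hypothetical DAG for a single upload along with a standard topological ordering of its tasks. The task names and dictionary representation are illustrative assumptions, not SVE’s actual API.

    from collections import deque

    # Hypothetical DAG for one upload: each task maps to the tasks it depends on.
    # Tasks operate on the stream-of-tracks abstraction (video, audio, metadata).
    # The task names are illustrative, not SVE's actual pipeline.
    DAG = {
        "validate":        [],
        "split_tracks":    ["validate"],
        "encode_video":    ["split_tracks"],     # runs per chunk, in parallel
        "encode_audio":    ["split_tracks"],
        "generate_thumbs": ["split_tracks"],
        "publish":         ["encode_video", "encode_audio", "generate_thumbs"],
    }

    def topological_order(dag):
        """Order tasks so every task runs after its dependencies (Kahn's algorithm)."""
        indegree = {task: len(deps) for task, deps in dag.items()}
        dependents = {task: [t for t, deps in dag.items() if task in deps] for task in dag}
        ready = deque(task for task, d in indegree.items() if d == 0)
        order = []
        while ready:
            task = ready.popleft()
            order.append(task)
            for nxt in dependents[task]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        return order

    print(topological_order(DAG))
    # ['validate', 'split_tracks', 'encode_video', 'encode_audio',
    #  'generate_thumbs', 'publish']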

Faults are inevitable in SVE due to the scale of the system, our incomplete control over the pipeline, and the diversity of uploads we receive. SVE handles component failure through a variety of replication techniques and masks non-deterministic and machine-specific processing failures through retries. Processing failures that cannot be masked are simple to pinpoint in SVE thanks to automatic, fine-grained monitoring. SVE provides robustness to overload through an escalating set of reactions to increasing load: delaying non-latency-sensitive tasks, then rerouting tasks across datacenters, and finally time-shifting load by storing some uploaded videos on disk for later processing.
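The escalating overload reactions can be pictured as a simple policy like the sketch below; the load thresholds and the Task/overload_reaction names are assumptions for illustration, not SVE’s real policy.

    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        latency_sensitive: bool

    def overload_reaction(load: float, task: Task) -> str:
        """Pick an escalating reaction to increasing load.
        The thresholds below are illustrative assumptions, not SVE's real policy."""
        if load < 0.7:
            return "run now"                          # normal operation
        if not task.latency_sensitive and load < 0.9:
            return "delay"                            # defer non-latency-sensitive work
        if load < 1.0:
            return "reroute to another datacenter"    # spread load across datacenters
        return "store upload on disk, process later"  # time-shift the work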

Canopy: End-to-End Performance Tracing at Scale

Also being presented at SOSP this week is the paper Canopy: An End-to-End Performance Tracing and Analysis System by Facebook researchers Jonathan Kaldor, Michał Bejda, Edison Gao, Wiktor Kuropatwa, Joe O’Neill, KianWin Ong, Bill Schaller, Pingjia Shan, Brendan Viscomi, Vinod Venkataraman, Kaushik Veeraraghavan, and Yee Jiun Song, along with Facebook Fellow Jonathan Mace from Brown University. Canopy is Facebook’s end-to-end performance tracing infrastructure, which records causally related performance data across the end-to-end execution path of requests, including from browsers, mobile applications, and backend services.

Addressing the challenges of scaling performance analysis

Performance is an important part of the overall experience of using Facebook. Our users expect to be able to quickly do things like start our mobile application or load facebook.com in their browsers, view their News Feed, or see their notifications. Understanding this performance can be challenging, because users can perform a wide range of actions and access Facebook through heterogeneous clients, including web browsers and mobile apps, which offer varying levels of information and control over the application. Each action, such as loading a page on facebook.com, entails a complex execution spanning clients, networks, and distributed back-end services, each with their own potential issues. When performance problems arise, the cause needs to be quickly isolated so it can be resolved, but with so many possible causes, the data needs to be rapidly collected and sifted through.

If we could collect information from every system involved in a given request and correlate it, we could combine it into a global picture of that single request. This is the idea behind distributed tracing, explored in previous systems research like Dapper and X-Trace and available in public implementations like Zipkin. Gathering a distributed trace for a given request lets developers collect and correlate information for that request, usually through some shared trace_id. However, this doesn’t directly solve Facebook’s performance analysis problem, because a single request will likely not provide enough information to figure out what is going on. We need to look at millions of requests – or more – and see how performance behaves in aggregate, customized for the specific type of request or action being examined.
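The core mechanism is easy to sketch: tag every event with a shared trace_id as a request crosses systems, then group events by that id to reconstruct each request. The event format and helper names below are assumptions for illustration, not Canopy’s implementation.

    import time
    import uuid
    from collections import defaultdict

    def new_trace_id() -> str:
        """Generate an id that every system touched by a request will carry along."""
        return uuid.uuid4().hex

    def record_event(log: list, trace_id: str, service: str, name: str) -> None:
        """Append one timestamped event; a real system would send this to a collector."""
        log.append({"trace_id": trace_id, "service": service,
                    "event": name, "ts": time.time()})

    def assemble_traces(log: list) -> dict:
        """Group events emitted by many different systems into per-request traces."""
        traces = defaultdict(list)
        for event in log:
            traces[event["trace_id"]].append(event)
        return dict(traces)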

To solve this challenge, the team built Canopy, a system used at Facebook for collecting and analyzing these types of distributed traces. Canopy allows engineers to define custom datasets, each of which includes a filtering rule specifying which traces are relevant as well as any number of features, which are values extracted from a single trace. Features typically answer some question engineers have about a trace – for the pageload example above, features include:

  • Straightforward measurements like “how long did a particular phase of the request take?”
  • More complex analysis like “figure out the parts of the request which, if they were made faster, would speed up the request as a whole, and compute how much of that was on the browser/server/network”
  • Analysis over subcomponents of the request, like “for each page element that is on facebook.com, compute how much time we spent generating that element”

Fundamentally, these features map to columns in a database table, while each row corresponds to some tuple of unique keys from a trace (e.g., (trace_id) or (trace_id, page-element)). Canopy’s backend processes traces in real time, extracts the features defined in each dataset, and outputs the resulting rows to an underlying data store for later querying. Features and datasets are designed to be quickly deployable, with changes reflected in minutes. A feature can be generally applicable to a broad set of traces or specific to only a few types, and Canopy encourages engineers to build customized datasets per domain to extract the features they care about – for example, we have datasets specific to mobile app startup, browser pageloads, and so on.
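A dataset definition can be pictured as a filtering rule plus per-trace feature extractors that emit one row per key, as in the sketch below; the trace format, field names, and helpers are assumptions for illustration rather than Canopy’s actual pipeline.

    # Hypothetical "browser pageload" dataset: a filtering rule plus per-trace
    # features. The trace structure and field names are assumptions.
    def is_pageload(trace: dict) -> bool:
        return trace.get("type") == "pageload"

    def extract_row(trace: dict) -> dict:
        """One output row per trace, keyed by trace_id; each feature is a column."""
        events = trace["events"]
        start = min(e["ts"] for e in events)
        end = max(e["ts"] for e in events)
        return {
            "trace_id": trace["trace_id"],
            "total_ms": (end - start) * 1000,          # how long the whole load took
            "server_ms": sum(e.get("duration_ms", 0)   # time spent in backend services
                             for e in events if e.get("service") == "server"),
        }

    def build_dataset(traces: list) -> list:
        """Apply the filter, then extract one row of features per matching trace."""
        return [extract_row(t) for t in traces if is_pageload(t)]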

To apply to as many scenarios as possible, Canopy is highly flexible and customizable, and that flexibility does not end with custom datasets. There is a wide variety of system types, execution models, and kinds of performance data to collect, and collecting and representing this data involves instrumenting every system we want data from – or, at a minimum, the core frameworks and libraries they all depend on.

Canopy has been in use at Facebook for the past two years to diagnose performance issues through the analysis of 1 billion traces a day. These traces start from browsers, mobile applications, or backend services, and are fed through more than 100 different datasets to extract thousands of different features. Canopy has helped address several challenges in scaling performance analysis with trace data, and has surfaced numerous issues in both server and client code, as well as in the network connecting the two.

SOSP Papers

SVE: Distributed Video Processing at Facebook Scale
Qi Huang, Petchean Ang, Peter Knowles, Tomasz Nykiel, Iaroslav Tverdokhlib, Amit Yajurvedi, Paul Dapolito IV, Xifan Yan, Maxim Bykov, Chuen Liang, Mohit Talwar, Abhishek Mathur, Sachin Kulkarni, Matthew Burke, and Wyatt Lloyd

Canopy: An End-to-End Performance Tracing and Analysis System
Jonathan Kaldor, Jonathan Mace, Michał Bejda, Edison Gao, Wiktor Kuropatwa, Joe O’Neill, KianWin Ong, Bill Schaller, Pingjia Shan, Brendan Viscomi, Vinod Venkataraman, Kaushik Veeraraghavan, and Yee Jiun Song