115 Results

September 20, 2017

Characterizing Large-Scale Production Reliability for 100G Optical Interconnect in Facebook Data Centers

Frontiers in Optics / Laser Science (FiO/LS)

Facebook is deploying cost-effective 100G CWDM4 transceivers in its data centers. This paper describes the post-production performance monitoring system that is being implemented to identify early failure modes in optical interconnects.

By: Abhijit Chakravarty, Srinivasan Giridharan, Matt Kelly, Ashwin Poojary, Vincent Zeng

September 17, 2017

Passive Realtime Datacenter Fault Detection and Localization

USENIX ;login:

In this article, we present our passive hybrid approach that combines network path information with end-host-based statistics to rapidly detect and pinpoint the location of datacenter network faults inside a production Facebook datacenter.

By: Arjun Roy, James Hongyi Zeng, Jasmeet Bagga, Alex C. Snoeren

August 28, 2017

Social Hash Partitioner: A Scalable Distributed Hypergraph Partitioner

Very Large Data Bases Conference (VLDB)

We design and implement a distributed algorithm for balanced k-way hypergraph partitioning that minimizes fanout, a fundamental hypergraph quantity also known as the communication volume and (k − 1)-cut metric, by optimizing a novel objective called probabilistic fanout. This choice allows a simple local search heuristic to achieve solution quality comparable to that of the best existing hypergraph partitioners.

By: Igor Kabiljo, Brian Karrer, Mayank Pundir, Sergey Pupyrev, Alon Shalita

August 21, 2017

Engineering Egress with Edge Fabric: Steering Oceans of Content to the World


This paper presents Edge Fabric, an SDN-based system we built and deployed to tackle point-of-presence egress challenges for Facebook, which serves over two billion users from dozens of points of presence on six continents.

By: Brandon Schlinker, Hyojeong Kim, Timothy Cui, Ethan Katz-Bassett, Harsha V. Madhyastha, Italo Cunha, James Quinn, Saif Hasan, Petr Lapukhov, James Hongyi Zeng

August 21, 2017

SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs

Association for Computing Machinery's Special Interest Group on Data Communications (SIGCOMM)

In this paper, we show that up to hundreds of software load balancer (SLB) servers can be replaced by a single modern switching ASIC, potentially reducing the cost of load balancing by over two orders of magnitude. Today, large data centers typically employ hundreds or thousands of servers to load-balance incoming traffic over application servers.

By: Rui Miao, James Hongyi Zeng, Changhoon Kim, Jeongkeun Lee, Minlan Yu

May 30, 2017

Sensors for Future VR Applications

International Image Sensor Workshop (IISW)

In this paper, we provide examples of tracking and mapping functions of virtual reality sensors that illustrate the critical requirements and performance metrics. Sensor performance, form factor, power, and data bandwidth are the main challenges in battery-powered, always-on VR devices.

By: Chiao Liu, Michael Hall, Renzo De Nardi, Nicholas Trail, Richard Newcombe

April 27, 2017

Passive Realtime Datacenter Fault Detection

USENIX Symposium on Networked Systems Design and Implementation (NSDI) 2017

We describe how to expedite the process of detecting and localizing partial datacenter faults using an end-host method generalizable to most datacenter applications.

By: Arjun Roy, James Hongyi Zeng, Jasmeet Bagga, Alex C. Snoeren

April 19, 2017

Joint User-Entity Representation Learning for Event Recommendation in Social Network

2017 IEEE 33rd International Conference on Data Engineering (ICDE)

In this work, we consider the heavy sparseness in both user and event feedback history caused by short lifespans (transiency) of events and user participation patterns in a production event system. We propose to solve the resulting cold-start problems by introducing a joint representation model to project users and events into the same latent space.

By: Lijun Tang, Eric Yi Liu

April 1, 2017

Spinner: Scalable Graph Partitioning in the Cloud

IEEE International Conference on Data Engineering (ICDE)

In this paper, we present a graph partitioning algorithm to partition graphs with trillions of edges.

By: Claudio Martella, Dionysios Logothetis, Andreas Loukas, Georgos Siganos

March 27, 2017

Flexplane: An Experimentation Platform for Resource Management in Datacenters

USENIX Symposium on Networked Systems Design and Implementation (NSDI)

Flexplane enables users to program data plane algorithms and conduct experiments that run real application traffic over them at hardware line rates. Flexplane explores an intermediate point in the design space between past work on software routers and emerging work on programmable hardware chipsets. Like software routers, Flexplane enables users to express resource management schemes in a high-level language (C++), but unlike software routers, Flexplane runs at close to hardware line rates.

By: Amy Ousterhout, Jonathan Perry, Hari Balakrishnan, Petr Lapukhov