January 30, 2017

Data sharing on traffic patterns inside Facebook's datacenter network

By: James Hongyi Zeng

Today, we are happy to announce that we are releasing packet header samples from several of our datacenters to facilitate datacenter network research. The data set has been shared with a few research groups since late last year, and it is now open to everyone. To get started, please request to join the Facebook Network Analytics Data Sharing group for details and instructions.

Large cloud service providers have invested in increasingly larger datacenters to house the computing infrastructure required to support their services. Accordingly, researchers and industry practitioners alike have focused a great deal of effort on designing network fabrics to interconnect and manage the traffic within these datacenters in a performant yet efficient fashion.

Unfortunately, datacenter operators are generally reticent to share the actual requirements of their applications, making it challenging to evaluate the practicality of any particular design. Moreover, the limited large-scale workload information available in the literature has, for better or worse, heretofore largely been provided by a few operators whose use cases may not be widespread.

We first shared insights into the traffic patterns of Facebook's datacenter network at SIGCOMM 2015, working closely with a research team at UCSD on our paper, “Inside the Social Network’s (Datacenter) Network.” In the paper, we discussed several findings that had not previously been reported in the literature. Since then, numerous researchers have asked whether we could share the data we collected so that they can further the research in this area, and today we are pleased to announce that we are making the data available to everyone.

At Facebook, our mission is to make the world more open and connected, and openness is a value we take seriously. The data now available was collected during a 24-hour period from three clusters running different applications: Frontend, Database, and Hadoop. These clusters are located in our Altoona Data Center, which is built on Facebook’s datacenter fabric topology. Each sample contains packet header information as well as some metadata, such as locality and packet length. The data is released under the CC BY-NC 3.0 US license.
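
For those who want a quick feel for the data, the sketch below shows one way to start exploring the samples in Python. It assumes the samples have been exported to a CSV file with hypothetical column names (timestamp, src_ip, dst_ip, protocol, pkt_len, locality); the actual field names and file format are covered in the data sharing group's instructions.

    # Sketch: load a day's worth of packet header samples and look at basic
    # properties. The file name and column names below are assumptions for
    # illustration, not the released schema.
    import pandas as pd

    samples = pd.read_csv(
        "fb_packet_samples.csv",  # hypothetical export of the released samples
        names=["timestamp", "src_ip", "dst_ip", "protocol", "pkt_len", "locality"],
    )

    print(samples["pkt_len"].describe())       # packet length distribution
    print(samples["locality"].value_counts())  # intra-rack / intra-cluster / ...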

There are some interesting results that can be derived from the data set. For example, prior studies have observed heavy rack locality in datacenter traffic. This behavior seems in line with applications that seek to minimize network utilization by leveraging data locality, allowing for topologies with high levels of over-subscription. The table below shows that the clear majority of traffic is intra-cluster but not intra-rack (i.e., the 12.9% of traffic that stays within a rack is not counted in the 57.5% of traffic labeled as intra-cluster). Moreover, more traffic crosses between datacenters than stays within a rack.

The table further breaks down the locality of traffic generated by the top five cluster types which, together, account for 78.6% of the traffic in Facebook’s network. Hadoop clusters generate the most traffic (23.7% of all traffic) and are significantly more rack-local than the others, but even their traffic is far from the 40-80% rack locality reported in the literature. Rather, Hadoop traffic is clearly cluster-local. Frontend (FE) traffic is cluster-local by design but not very rack-local, and the locality of a given rack’s traffic depends on its constituent servers (e.g., Web server, Multifeed, or cache).
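
A rough sketch of how a breakdown like this might be reproduced from the released samples is shown below. The cluster_type, locality, and pkt_len column names are assumptions for illustration; consult the data sharing group for the actual schema.

    # Sketch: per-cluster-type locality breakdown, weighted by bytes.
    # 'cluster_type', 'locality', and 'pkt_len' are assumed column names.
    import pandas as pd

    samples = pd.read_csv("fb_packet_samples.csv")  # hypothetical export

    bytes_by_locality = samples.groupby(["cluster_type", "locality"])["pkt_len"].sum()
    share = bytes_by_locality / bytes_by_locality.groupby(level=0).transform("sum")

    # Rows: cluster types; columns: locality classes; values: fraction of bytes.
    print(share.unstack().round(3))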

This distinction is clearly visualized in the figure below. The two left portions of the figure graph the relative traffic demands between 64 racks within clusters of two different types. While we show only a subset of the total set of racks in each cluster, the pattern is representative of the cluster as a whole.

Traffic within the Hadoop cluster (left) is homogeneous, with a very strong diagonal (i.e., intra-rack locality). The cluster-wide uniformity outside the local rack accounts for intra-cluster traffic representing over 80% of Hadoop traffic, even though traffic to the local rack dominates traffic to any other single rack taken in isolation. Map tasks are placed to maximize read locality, but the large number of concurrent jobs means that any given job may not fit entirely within a rack, so some amount of traffic necessarily leaves the rack during the shuffle and output phases of a MapReduce job. In addition, the cluster serves data requests from other services that might not strive for as much read locality, which also contributes to reduced overall rack locality.

The Frontend cluster (center) exhibits three different patterns according to rack type, with none being particularly rack-local. In particular, we see a strong bipartite traffic pattern between the Web servers and the cache followers, with the Webserver racks responsible for most of the traffic, by volume, within the cluster. This pattern is a consequence of placement: Web servers talk primarily to cache servers and vice versa, and servers of different types are deployed in distinct racks, leading to low intra-rack traffic.
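
Rack-level heatmaps like these can be approximated from the samples by summing bytes over source/destination rack pairs. The sketch below assumes hypothetical src_rack, dst_rack, cluster, and pkt_len fields; depending on the released format, rack identity may instead need to be derived from the metadata accompanying each sample.

    # Sketch: rack-to-rack traffic matrix for one cluster, suitable for a heatmap.
    # 'src_rack', 'dst_rack', 'cluster', and 'pkt_len' are assumed fields, and
    # "hadoop-a" is a made-up cluster name used only for illustration.
    import pandas as pd
    import matplotlib.pyplot as plt

    samples = pd.read_csv("fb_packet_samples.csv")      # hypothetical export
    cluster = samples[samples["cluster"] == "hadoop-a"]

    matrix = cluster.pivot_table(
        index="src_rack", columns="dst_rack",
        values="pkt_len", aggfunc="sum", fill_value=0,
    )

    plt.imshow(matrix.values, cmap="viridis")  # strong diagonal = intra-rack traffic
    plt.xlabel("destination rack")
    plt.ylabel("source rack")
    plt.colorbar(label="bytes")
    plt.show()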

This striking difference in Facebook’s locality compared with previously studied Internet-facing, user-driven applications is a consequence of the realities of serving a densely connected social graph. Cache objects are replicated across clusters; however, each object typically appears only once within a cluster (though hot objects are replicated to avoid hotspots). Since each Web server needs to be able to handle any request, it might need to access data in a potentially random fashion due to load balancing.

To make this argument more concrete, consider loading the Facebook News Feed, which draws from a vast array of different objects in the social graph: different people, relationships, and events comprise a large graph interconnected in a complicated fashion. This connectedness means that the working set is unlikely to shrink even if users are partitioned; the net result is a low cache hit rate within the rack, leading to high intra-cluster traffic locality. In addition, partitioning the graph such that users and their data are co-located on racks has the potential to introduce failure modes that disproportionately target subsets of the user base, leading to a suboptimal experience.

We believe there is still much to be discovered in this dataset, and we invite you to join us in the discovery! For more information, feel free to contact us at networkanalytics@fb.com.

Thanks to Arjun Roy, Alex Snoeren, and George Porter of UCSD, and to Jasmeet Bagga and Omar Baldonado of Facebook, for the initial collaboration on mining data patterns, and to the many people at Facebook who made open-sourcing the data possible.