November 1, 2017

High-Resolution Measurement of Data Center Microbursts

ACM Internet Measurement Conference

By: Qiao Zhang, Vincent Liu, James Hongyi Zeng, Arvind Krishnamurthy

Abstract

Data centers house some of the largest, fastest networks in the world. In contrast to and as a result of their speed, these networks operate on very small timescales—a 100 Gbps port processes a single packet in at most 500 ns with end-to-end network latencies of under a millisecond. In this study, we explore the fine-grained behaviors of a large production data center using extremely highresolution measurements (10s to 100s of microsecond) of rack-level traffic. Our results show that characterizing network events like congestion and synchronized behavior in data centers does indeed require the use of such measurements. In fact, we observe that more than 70% of bursts on the racks we measured are sustained for at most tens of microseconds: a range that is orders of magnitude higher-resolution than most deployed measurement frameworks. Congestion events observed by less granular measurements are likely collections of smaller µbursts. Thus, we find that traffic at the edge is significantly less balanced than other metrics might suggest. Beyond the implications for measurement granularity, we hope these results will inform future data center load balancing and congestion control protocols.