August 23, 2016

Robotron: Top-down Network Management at Facebook Scale

SIGCOMM

By: Yu-Wei Eric Sung, Xiaozheng Tie, Starsky H.Y. Wong, James Hongyi Zeng

Abstract

Network management facilitates a healthy and sustainable network. However, its practice is not well understood outside the network engineering community. In this paper, we present Robotron, a system for managing a massive production network in a top-down fashion. The system’s goal is to reduce effort and errors on management tasks by minimizing direct human interaction with network devices. Engineers use Robotron to express high-level design intent, which is translated into low-level device configurations and deployed safely. Robotron also monitors devices’ operational state to ensure it does not deviate from the desired state. Since 2008, Robotron has been used to manage tens of thousands of network devices connecting hundreds of thousands of servers globally at Facebook.
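
The top-down flow described in the abstract (express intent, translate it into device configurations, then check that running state has not deviated) can be illustrated with a minimal sketch. This is not Robotron's actual code or data model: the `LinkIntent` class, the `render_configs` and `find_drift` functions, the device names, and the toy configuration syntax are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LinkIntent:
    """Hypothetical high-level intent: one point-to-point link between two devices."""
    device_a: str
    device_z: str
    subnet_prefix: str  # first three octets of a /31, e.g. "10.0.0"


def render_configs(intent: LinkIntent) -> Dict[str, str]:
    """Translate the intent into per-device config text (toy, vendor-neutral syntax)."""
    def stanza(peer: str, host: int) -> str:
        return (
            "interface et1\n"
            f"  description to {peer}\n"
            f"  ip address {intent.subnet_prefix}.{host}/31"
        )

    return {
        intent.device_a: stanza(intent.device_z, 0),
        intent.device_z: stanza(intent.device_a, 1),
    }


def find_drift(desired: Dict[str, str], running: Dict[str, str]) -> List[str]:
    """Return the devices whose running config differs from the desired config."""
    return sorted(dev for dev, cfg in desired.items() if running.get(dev) != cfg)


# Hypothetical device names; a manual change on one device shows up as drift.
intent = LinkIntent("rsw1", "fsw1", "10.0.0")
desired = render_configs(intent)
running = dict(desired)
running["fsw1"] = running["fsw1"].replace("10.0.0.1/31", "10.0.0.9/31")
drifted = find_drift(desired, running)
```

Here `drifted` contains only `fsw1`, the device whose running configuration was changed by hand. Treating the generated configuration as the single source of truth is what makes such a comparison possible; a production system would presumably also need safe deployment and rollback, which this sketch omits.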