Sharing a MapReduce cluster between users is attractive because it enables statistical multiplexing (lowering costs) and allows users to share a common large data set. However, we find that traditional scheduling algorithms can perform very poorly in MapReduce due to two aspects of the MapReduce setting: the need for data locality (running computation where the data is) and the dependence between map and reduce tasks.
We illustrate these problems through our experience designing a fair scheduler for MapReduce at Facebook, which runs a 600-node multi-user data warehouse on Hadoop. We developed two simple techniques, delay scheduling and copy-compute splitting, which improve throughput and response times by factors of 2 to 10. Although we focus on multi-user workloads, our techniques can also raise throughput in a single-user, FIFO workload by a factor of 2.