Modern data centers operate with hundreds of software systems, running in millions of containers, serving billions of requests per second [Hahn LISA ’18]. At such a large scale, high operational flexibility, debuggability, and high reliability are paramount.
Unfortunately, three key challenges stand in the way of these goals:
- Interdependence. The programs running in modern data centers make up larger workloads, and failures in one program can have a ripple effect, causing failures in other programs or in the entire workload.
- Distribution. The workloads in modern data centers are distributed across many servers – and even across the globe – and server failures in one location can have widespread implications across the site.
- Heterogeneous hardware. Modern data centers are composed of a diverse mix of hardware, from commodity devices to special-purpose accelerators, and servers are configured with different compute, memory, and storage profiles. In such an environment, fail-slow behavior, where a single job still makes forward progress but becomes slow, can collapse cluster performance [Gunawi+ FAST ’18].
At Facebook, we are performing forward-looking research into the area of distributed systems, applying key techniques from the field at Facebook’s scale and sharing our designs, implementations, insights, and data with the community. In recent years, we’ve released our work on Distributed Configuration Management [Tang+ SOSP ’15], Large-Scale Production Load Testing [Veeraraghavan+ OSDI ’16], End-to-End Performance Tracing [Kaldor+ SOSP ’17], and more [Huang+ SOSP ’17, Veeraraghavan+ OSDI ’18, Annamalai+ OSDI ’18].
To understand the challenges that have yet to emerge, it is important that Facebook builds a strong collaboration pipeline with key experts in academia. Therefore, Facebook is pleased to invite faculty and graduate students to respond to this year’s call for research proposals pertaining to distributed systems. We anticipate giving eight awards of around $50,000 each, made as unrestricted gifts to the proposer’s host university.
While all proposals are welcome, we are particularly interested in those that address fundamental challenges that arise in distributed systems operating at an extremely large scale. Example topics include:
- Distributed performance tracing and analysis
- Efficient shard mapping and placement (on the order of billions of shards)
- Request load balancing (e.g., for HTTP and RPC traffic)
- Fault tolerance (with a focus on power and network fault domains)
- Large-scale configuration management, monitoring, and deployment
- Efficient use of hardware resources via software codesign
Applicants should submit a proposal detailing what contribution their research is expected to make, how the broader research community will benefit from the work, a project timeline, and an overview of how the proposed funding will be used.
Proposals should include:
- A summary of the project (1-2 pages) explaining the area of focus, a description of techniques, any relevant prior work, and a timeline with milestones and expected outcomes
- A draft budget description (1 page) including an approximate cost of the award and an explanation of how funds would be spent
- Curriculum vitae for each project participant
- Organization details; this will include tax information and administrative contact details
Eligibility
- Awards must comply with applicable U.S. and international laws, regulations, and policies.
- Applicants must be current full-time faculty at an accredited academic institution that awards research degrees to PhD students.
- Applicants must be the principal investigator on any resulting award.
Winners will be invited to a Facebook event at one of our data centers for a tour and summit discussing the winning proposals and an opportunity for networking with Facebook engineers. The event will be sometime in spring 2020, but the exact date and location have yet to be determined. Facebook will pay for the winners’ travel and accommodations to attend (one representative per winning proposal).
Timing and dates
Applications are now open. The deadline to apply is December 6 at 11:59 p.m. AoE (Anywhere on Earth).
Notifications will be sent by email to selected applicants in January 2020.
Frequently Asked Questions
Do you typically limit the salary of the PI in the gift?
Should the proposal be double- or single-spaced? Is there any required/expected font?
What is the award cycle or when does the funding year begin and end?
Can award funds be used to cover a researcher's summer salary while conducting research?
Can you please explain the budget breakdown in more detail?
We are working as co-PIs and are at the same institution. Is it possible to list both of our names as PI for an RFP proposal?