Applications closed

Next-generation Data Infrastructure request for proposals

About

All around the world, businesses and organizations are becoming increasingly data-driven, products and services are built more and more around intelligence derived from data, and the need for reliable and efficient data storage and processing at a global scale is becoming even more critical. Modern data infrastructure architectures have emerged from years of evolution in analytical and transactional data systems, along with a continuous infusion of capabilities stemming from new use cases and new data processing paradigms. Tightly coupled data warehouses are being replaced by more flexible ecosystems built around low-cost globally available storage and open file formats; data science and machine learning workloads are increasingly sharing the same infrastructure as analytical workloads; transactional systems and key-value stores are exploring ways to preserve consistency, reliability, and performance while operating efficiently at global scale. Yet despite this progress, many challenges remain as the data management community seeks out the defining characteristics of next-generation data infrastructure.

Facebook has a long history of contributing to the data management space: Hive, Presto, RocksDB, and MyRocks are all examples of innovative work that started within the company. The scale at which we run and the unique constraints of our workloads make many existing solutions infeasible and provide a perspective that leads to new ideas. As we continue to build and evolve our data infrastructure, we are focused on a number of problems. These range from techniques to optimize CPU usage (and thus power consumption) during large-scale query processing to strategies to optimize physical layouts and data transfer bandwidth, and from techniques to address the challenges arising from data storage and processing across widely separated data centers to novel approaches in converging data wrangling, machine learning, and analytics. Since guaranteeing correctness is a key requirement for our data storage and processing systems, we also remain focused on systems for testing and verification. Despite the unique constraints of our workloads, many of these problems are common in the industry, and we believe that there is a lot to be gained by collaborating with academia in this area.

To foster further innovation in this area, and to deepen our collaboration with academia, Facebook is pleased to invite faculty to respond to this call for research proposals pertaining to the aforementioned topics. We anticipate awarding a total of 10 awards, each in the $50,000 range. Payment will be made to the proposer’s host university as an unrestricted gift. In addition, PIs and Co-PIs on the winning proposals will be automatically granted access to CrowdTangle, a public insights tool from Facebook that makes it easy to follow, analyze, and report on what’s happening with public content on social media. Learn more about CrowdTangle here.


Award Recipients

University of Maryland, College Park

Daniel Abadi

Swiss Federal Institute of Technology Lausanne

Anastasia Ailamaki

The Ohio State University

Spyros Blanas

University of California, Berkeley

Natacha Crooks

University of Chicago

Haryadi S. Gunawi

ETH Zurich

Ana Klimovic

University of Wisconsin–Madison

Paraschos Koutris

Massachusetts Institute of Technology

Tim Kraska

Technische Universität Dresden

Wolfgang Lehner

University of California, Irvine

Faisal Nawab

Application Timeline

Launch Date

April 19, 2021

Deadline

June 2, 2021

Winners Announced

July or August 2021

Areas of Interest

Areas of interest include, but are not limited to, the following:

1. Large-scale query processing

Data processing at scale imposes substantial CPU and power challenges on Facebook’s data centers. We are interested in techniques that can optimize CPU usage in common data processing pipelines, including, but not limited to, the following:

  • Advances in vectorized engines, vectorized operators, and fast data decoding and decompression
  • Code generation techniques to accelerate query execution
  • Novel query optimization strategies and adaptivity techniques
  • Innovations in processing of time series, semi-structured and unstructured data sets, and graph data

2. Physical layout and IO optimizations

Large-scale decoupled data systems make heavy use of IO when transferring data from storage to compute nodes, and from permanent media to main memory. We are looking for innovative strategies and techniques that can reduce the amount of data transferred during data processing pipelines, including, but not limited to, the following:

  • Micro-layout: innovative file formats, data encoding techniques, column and row reordering, new compression algorithms, efficient representations for semi-structured and unstructured data
  • Macro-layout: novel partitioning strategies and indexing structures to improve data pruning, key-value structures that offer innovative ways of balancing read, write, and memory overhead for OLTP workloads, innovations in materialized views and virtual tables
  • Caching: new local and remote caching systems for hot blocks, novel eviction strategies and hierarchical caching techniques
  • Tuning: advanced systems able to optimize data physical layout based on changing workloads

3. Data management and processing at a global scale

Data storage and processing across widely separated data centers present a different set of challenges. We are interested in techniques that address problems caused by increased latency, resource constraints such as network bottlenecks, and heterogeneous hardware. Areas include, but are not limited to, the following:

  • Global replication, transaction management, and consistency for OLTP use cases
  • Global data placement algorithms that balance cost concerns with latency requirements
  • Resource management algorithms that balance compute allocation for data processing workloads on a global scale

4. Converged architectures for data wrangling, machine learning, and analytics

Decoupling compute from storage, and using low-cost storage based on open file formats to hold everything from raw to fully curated data across a wide variety of use cases, has led to the need to rethink many areas of data management, including, but not limited to, the following:

  • Data modeling, data lineage, and data governance at scale and for complex workflows
  • Systems, languages, and APIs for expressing and efficiently executing complex business logic and data transformations
  • Systems and techniques to bring analytical, data science, and machine learning workloads closer together

5. Advances in testing and verification for storage and processing systems

Guaranteeing correctness is a key requirement for our data storage and processing systems. We are looking for advances in tools that test and verify that these systems perform correctly and within spec when change (e.g., new code, faults, new hardware) is introduced. Areas include, but are not limited to, the following:

  • Randomized and fuzz testing for storage (e.g., key-value, relational) and compute (e.g., SQL)
  • Fault injection and chaos testing
  • Formal verification techniques for distributed algorithms

Requirements

Proposals should include:

  • A summary of the project (1–2 pages), in English, explaining the area of focus, a description of techniques, any relevant prior work, and a timeline with milestones and expected outcomes
  • A draft budget description (1 page) including an approximate cost of the award and explanation of how funds would be spent
  • Curriculum Vitae for all project participants
  • Organization details; this will include tax information and administrative contact details

Eligibility

  • Proposals must comply with applicable U.S. and international laws, regulations and policies.
  • Applicants must be current full-time faculty at an accredited academic institution that awards research degrees to PhD students.
  • Applicants must be the Principal Investigator on any resulting award.
  • Facebook cannot consider proposals submitted, prepared or to be carried out by individuals residing in, or affiliated with an academic institution located in, a country or territory subject to comprehensive U.S. trade sanctions.
  • Government officials (excluding faculty and staff of public universities, to the extent they may be considered government officials), political figures, and politically affiliated businesses (all as determined by Facebook in its sole discretion) are not eligible.

Terms & Conditions

Facebook’s decisions will be final in all matters relating to Facebook RFP solicitations, including whether or not to grant an award and the interpretation of Facebook RFP Terms and Conditions. By submitting a proposal, applicants affirm that they have read and agree to these Terms and Conditions.

  • Facebook is authorized to evaluate proposals submitted under its RFPs, to consult with outside experts, as needed, in evaluating proposals, and to grant or deny awards using criteria determined by Facebook to be appropriate and at Facebook’s sole discretion. Facebook’s decisions will be final in all matters relating to its RFPs, and applicants agree not to challenge any such decisions.
  • Facebook will not be required to treat any part of a proposal as confidential or protected by copyright, and may use, edit, modify, copy, reproduce and distribute all or a portion of the proposal in any manner for the sole purposes of administering the Facebook RFP website and evaluating the contents of the proposal.
  • Personal data submitted with a proposal, including name, mailing address, phone number, and email address of the applicant and other named researchers in the proposal may be collected, processed, stored and otherwise used by Facebook for the purposes of administering Facebook’s RFP website, evaluating the contents of the proposal, and as otherwise provided under Facebook’s Privacy Policy.
  • Neither Facebook nor the applicant is obligated to enter into a business transaction as a result of the proposal submission. Facebook is under no obligation to review or consider the proposal.
  • Feedback provided in a proposal regarding Facebook products or services will not be treated as confidential or protected by copyright, and Facebook is free to use such feedback on an unrestricted basis with no compensation to the applicant. The submission of a proposal will not result in the transfer of ownership of any IP rights.
  • Applicants represent and warrant that they have authority to submit a proposal in connection with a Facebook RFP and to grant the rights set forth herein on behalf of their organization. All awards provided by Facebook in connection with this RFP shall be used only in accordance with applicable laws and shall not be used in any way, directly or indirectly, to facilitate any act that would constitute bribery or an illegal kickback, an illegal campaign contribution, or would otherwise violate any applicable anti-corruption or political activities law.
  • Awards granted in connection with RFP proposals will be subject to terms and conditions contained in the unrestricted gift agreement (or, in some cases, other mechanisms) pursuant to which the award funding will be provided. Applicants understand and acknowledge that they will need to agree to these terms and conditions to receive an award.