June 7, 2019

Preserving privacy while fostering meaningful research on elections and democracy

By: Christina DeGregorio, Bennett Hillenbrand, Da Li, Solomon Messing, Chaya Nayak

This week, Social Science One and Facebook hosted training for more than 50 independent researchers, announced last month by the Social Science Research Council, who will study the role of social media in elections and democracy. Drawn from 25 universities across eight countries and five continents, the researchers took part in hands-on sessions and discussions to learn how to use the new, highly secure tool that Facebook created to share aggregated and anonymized data for studying social media’s role in elections.

At the training, researchers were given access to this new Facebook research tool along with an initial data set of URLs shared publicly on Facebook (rather than shared privately or to specific friends or Groups) by 100 or more unique users. This aggregated and anonymized data includes the URL, the URL’s “share title,” a text summary of the content, the country where the URL was shared most often, and any ratings from Facebook’s third-party fact-checking partners. Researchers will be able to use this information in conjunction with data already available to them from CrowdTangle and Facebook’s Ads Library API to analyze topics ranging from “Measuring the Effects of Peer Sharing on Fake and Polarized News Consumption” to “False News on Facebook During the 2017 Chilean Elections: Analyzing Its Content, Diffusion, and Audience Characteristics” to “Mapping Disinformation Campaigns Across Platforms: The German General Election.”
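As a rough illustration of the inclusion rule described above (public shares only, 100 or more unique users), the following Python sketch filters a list of share events. The `Share` record, its field names, and the `urls_meeting_threshold` function are hypothetical and exist only for this example; they do not describe Facebook's actual pipeline or schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Share(NamedTuple):
    """Hypothetical share event; field names are assumptions for illustration."""
    url: str
    user_id: str
    is_public: bool


def urls_meeting_threshold(shares: Iterable[Share], min_users: int = 100) -> Dict[str, int]:
    """Return {url: unique public sharer count} for URLs at or above min_users."""
    users_by_url = defaultdict(set)
    for share in shares:
        # Private shares (to friends or Groups) are ignored entirely.
        if share.is_public:
            # A set ensures a user who shares the same URL twice counts once.
            users_by_url[share.url].add(share.user_id)
    return {
        url: len(users)
        for url, users in users_by_url.items()
        if len(users) >= min_users
    }
```

Counting unique users rather than share events means a single account posting the same link repeatedly cannot push a URL over the threshold on its own.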

An innovative approach to safeguarding data

This first-of-its-kind partnership between the academic research community and Facebook holds the potential to unlock important findings with large societal impact. At the same time, this work must be performed in a manner that protects people’s privacy. To achieve these dual goals, Facebook worked with the academic, privacy, and security communities to build a querying system that gives researchers insights from the data without revealing individual people’s identities.

This new system will help address challenges encountered in past data-sharing efforts:

  • Privacy and security: Traditional data-sharing approaches often rely on files being passed between parties and then analyzed on local machines, which can increase the risk of data leaking to an unauthorized third party. Our tool allows researchers to analyze data only through a remote system, which is one way we can minimize this risk.
  • Scalability: Past practices have also often required researchers to work in physical clean rooms under employment agreements with the organization sharing the data. These models are valuable, especially in the most sensitive circumstances, but in many cases they don’t scale to allow a diverse global community of academics to access data that could be groundbreaking for their research. By building a tool that can be accessed remotely, we hope our system can be readily used by leaders in the academic and research communities spread across continents.

A key innovation in developing the research tool has been building in mechanisms, such as differential privacy, that help provide more formal guarantees of privacy. Differential privacy is a mathematical framework for adding calibrated “noise” to data sets or query results to protect against reidentification attacks, which attempt to break conventional anonymization techniques. For this project, the tool uses differential privacy to prevent those who have access to the data from determining whether a specific individual contributed to the data set.
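To make the idea of adding noise more concrete, here is a minimal Python sketch of the Laplace mechanism, a textbook way to release a count with differential privacy. This post does not say which mechanism or parameters the research tool uses, so the choice of the Laplace mechanism, the function names, and the `epsilon` values below are all assumptions for illustration only.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with the same
    rate (1 / scale) follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    """Return true_count plus Laplace noise of scale sensitivity / epsilon.

    sensitivity is the most one person can change the count (assumed to
    be 1 here). A smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each run prints different values
    # Hypothetical example: a URL that was really shared by 1,000 users.
    for eps in (0.1, 1.0):
        print(eps, noisy_count(1000, eps, rng))
```

Because the noise is centered on zero, averages over many released values stay close to the truth, while any single released value reveals little about whether one particular person is included in the count.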

Seeking outside accountability and expertise

In designing, building, and testing this innovative approach to data sharing, Facebook sought guidance from a wide range of experts. For example, in late 2018, we worked with a third-party security vendor to conduct security penetration tests, which allowed the team to identify and fix potential vulnerabilities prior to sharing any data. We also worked with Nick Nikiforakis (Stony Brook University) to validate that we utilized best practices for the removal of personally identifiable information from the URLs in the data set. And much of our work to ensure privacy in our data-sharing efforts has also involved the help of privacy experts from the academic community, including Michael Hay (Colgate University), Daniel Kifer (Penn State University), Aaron Roth (University of Pennsylvania), Abhradeep Thakurta (UC Santa Cruz), and Danfeng Zhang (Penn State University). Along with Social Science One, they will continue to advise our team and test our systems to help ensure that the tool offers strong privacy protection while still facilitating reliable research. This work will result in two formal white papers on our system’s ability to 1) protect privacy and 2) serve the research community.

Building on this work to further aid researchers

Over the next several months, Facebook will continue to work with our privacy advisers and Social Science One to confirm that the system we’re building offers strong privacy protections while also providing valuable insights for researchers. Additionally, we are continuing to review what other types of aggregated and anonymized data could safely be released to further aid researchers, and we look forward to these ongoing discussions.