Weights and Methodology Brief for the COVID-19 Symptom Survey by University of Maryland and Carnegie Mellon University, in Partnership with Facebook

arXiv

Abstract

The Facebook company is partnering with academic institutions to support COVID-19 research and to help inform public health decisions. Currently, we are inviting Facebook app users in the United States to take a survey conducted by faculty at Carnegie Mellon University (CMU) Delphi Research Center, and we are inviting Facebook app users in more than 200 countries or territories globally to take a survey conducted by faculty at the University of Maryland (UMD) Joint Program in Survey Methodology. As part of this initiative, we are applying best practices from survey statistics to design and execute two components:

  1. Sampling Design: deciding who to invite to participate in the survey each day.
  2. Weighting Methodology: providing a weight per user so that respondents better represent the target population as a whole.

We and our partners designed this initiative with privacy in mind from the start. The survey and its privacy practices are reviewed by the Institutional Review Boards of both UMD and CMU. Facebook does not receive any survey responses to weight the data. Instead, UMD and CMU send Facebook the list of Random ID numbers for the users who complete the survey each day. We then use internal Facebook data covered by our Data Policy in conjunction with publicly available population benchmark data to calculate a single weight for each user in the survey sample. We then provide these weights only to researchers with an approved Data Use Agreement. We and our partners describe the surveys, including the privacy-preserving processes we use to weight the data, in a special issue of Survey Research Methods (Kreuter et al. 2020).

Using the total survey error framework (Groves and Lyberg 2010 [6]), our goal when calculating the weights is to minimize errors of representation, including coverage, random sampling and non-response errors. We achieve this through generating weights in two stages for the US CMU and global UMD surveys. First, we adjust for non-response error using Inverse Propensity Score Weighting (IPSW) to make the sample more representative of the sampling frame of Facebook app users. Second, we adjust for coverage error using poststratification with weights from the first stage as inputs. Intuitively, the final weights can be understood as the number of adults in the general population who are represented by a respondent in the sample for that day. A respondent who belongs to a demographic group that has a high likelihood of responding to the survey may get a weight of 100 while someone who belongs to a group that is less likely to respond may get a weight of 500.
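
To make the two-stage procedure above concrete, the following Python sketch applies it to a toy sample. It is an illustration under stated assumptions, not the production methodology: the function names, the response propensities (taken here as already estimated), the two age strata, and the population totals are all hypothetical.

```python
from collections import defaultdict


def ipsw_weights(propensities):
    """Stage 1 (sketch): weight each respondent by the inverse of an
    assumed, already-estimated probability of responding."""
    return [1.0 / p for p in propensities]


def poststratify(base_weights, strata, population_totals):
    """Stage 2 (sketch): rescale the stage-1 weights within each stratum
    so that they sum to that stratum's population benchmark total."""
    stratum_sums = defaultdict(float)
    for weight, stratum in zip(base_weights, strata):
        stratum_sums[stratum] += weight
    return [
        weight * population_totals[stratum] / stratum_sums[stratum]
        for weight, stratum in zip(base_weights, strata)
    ]


# Hypothetical toy inputs: four respondents in two age strata.
propensities = [0.5, 0.25, 0.5, 0.1]
strata = ["18-34", "18-34", "35+", "35+"]
population_totals = {"18-34": 600.0, "35+": 1200.0}

stage1 = ipsw_weights(propensities)
final = poststratify(stage1, strata, population_totals)

for stratum, weight in zip(strata, final):
    print(f"{stratum}: {weight:.1f}")
```

In this toy example the respondent with the lowest response propensity receives the largest final weight, and the final weights sum to the assumed population total of 1,800 adults, which matches the interpretation of a weight as the number of adults a respondent represents.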

The weights are available for the CMU US survey and, separately, for 114 other countries or territories in the UMD global survey. The set of non-US entities for which we provide weights was determined by our ability to generate high quality results as well as other considerations. Aggregate weighted estimates are publicly available through UMD and CMU. Academic and nonprofit researchers may request access to non-aggregated survey data in addition to the raw survey weights for their research. Once an initial request is approved by both Facebook and either UMD or CMU, the researcher’s institution must then sign Data Use Agreements before data access will be provided by UMD or CMU. More information can be found on the Facebook Data for Good website.

Below we provide a more technical overview of the methodology and our choices, along with guidelines for using the survey weights.
