After a natural disaster, humanitarian organizations need to know where affected people are located, what resources are needed, and who is safe. This information is difficult, and often impossible, to capture in a timely manner through conventional data collection methods. As more people connect and share on Facebook, our data can provide near-real-time insights that help humanitarian organizations coordinate their work and fill crucial information gaps during disasters. This morning we announced a Facebook disaster maps initiative to help organizations address the critical gap in information they often face when responding to natural disasters.
Facebook disaster maps provide information about where populations are located, how they are moving, and where they are checking in safe during a natural disaster. All data is de-identified and aggregated to a 360 square meter tile or local administrative boundaries (e.g. census boundaries). 
This blog describes the disaster maps datasets, how insights are calculated, and the steps taken to ensure that we’re preserving privacy.
Data and aggregation
When people use the Facebook app with Location Services enabled, we receive their latitude and longitude at regular intervals. Location information is used in a number of ways, such as delivering features or content that are most relevant to people. For example, it allows us to send AMBER Alerts to people’s News Feeds in targeted search areas after a child has been abducted, or Safety Check notifications to those in regions affected by natural disasters.
The same geolocation data, when aggregated and de-identified, provides valuable information to humanitarian organizations after a natural disaster. Aggregation not only helps to preserve privacy, but also makes the data more usable and interpretable to organizations by separating signal from noise, and thus reducing the intermediate processing steps required to move from data to insights to action.
The disaster maps datasets are aggregated across time and space in the following ways:
- Temporal aggregation: While timely data is needed during a disaster, feedback from our partners indicated that organizations do not process and respond to new inputs in real time. For this reason, we share data at regular intervals (e.g., every hour, every six hours, every 24 hours). 
- Spatial aggregation: We aggregate geolocated points to a 360 square meter grid or to local administrative boundaries.
- Spatial smoothing: Once we have calculated each metric (e.g., the number of people in administrative or pixel unit x during time period y), spatial smoothing is performed. For each spatial location, we compute a weighted average of the value in the tile itself with the values in neighboring tiles; closer tiles contribute more to the final result. This local averaging results in a map with a smoother, clearer signal, reducing noise due to random variation while preserving the key signal and further protecting privacy.
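To make the smoothing step concrete, here is a minimal sketch in Python. The specific kernel (a fixed center weight plus smaller weights for the four orthogonal neighbors, renormalized at edges) is our assumption for illustration; the post does not specify the actual weighting scheme.

```python
def smooth(grid, center_weight=0.5, neighbor_weight=0.125):
    """Replace each tile's value with a weighted average of itself and its
    four orthogonal neighbors. Weights are renormalized at map edges, where
    some neighbors are missing. Kernel weights here are illustrative only."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = center_weight * grid[r][c]
            weight = center_weight
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    total += neighbor_weight * grid[nr][nc]
                    weight += neighbor_weight
            out[r][c] = total / weight
    return out

# A single spike of 8 people in the center tile gets spread to its neighbors.
result = smooth([[0, 0, 0], [0, 8, 0], [0, 0, 0]])
```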
Composition of the disaster maps datasets
Using the data and aggregation techniques described above, we’re able to generate three unique but complementary datasets:
- Population: Metrics indicating the density of the Facebook population in each tile.
- Movement: Metrics related to population movements between tile pairs.
- Safety Check: Metrics indicating the density of Safety Check check-ins versus total invitations for each tile.
By aggregating geolocation data, we are able to show a smoothed representation of how many people with location services enabled are using Facebook’s app in each administrative region or map grid for each time period.
One of the limitations of providing counts is that it is not immediately obvious which values represent important deviations from normal. To help provide this context, we also include baseline counts – an approximation of how many people (measured from the same population) are in each administrative area averaged over the previous three weeks at the same time. By matching on location and time, we can be more confident that any differences we observe are due to the disaster event and thus are important to focus on. We also provide additional statistics to indicate whether the observed changes in density are statistically meaningful.
The data is structured as follows, where each metric is calculated per unique area:
- crisis_name: The name of the event.
- time_window: The hour(s) during which the data are recorded.
- area_id: The tile name. In the raster form, this represents a given raster pixel on the map which can be spatially aggregated to be interoperable with other data sets. In the administrative form, the area_id represents the administrative boundary name of an area that can be joined with other administrative datasets (e.g. census data).
- n_baseline, density_baseline: The average number of people in the same area during the same time window, but averaged over the previous three weeks. This estimates how many people we expect to be in each area during the specified time.
- n_crisis, density_crisis: The number of people observed in the tile during the time period t.
- n_diff: The difference between the population at the time of the crisis and the population during the baseline.
- percent_change: The percentage difference between the population at the time of the crisis and the population during the baseline.
- z_score: The number of standard deviations by which the crisis population differs from the baseline.
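The comparison fields above can be sketched as follows, assuming the baseline mean and standard deviation are computed over per-window counts from the previous three weeks. Since the post notes that the published formulas are simplified, treat this as an approximation rather than the exact production logic.

```python
from statistics import mean, stdev

def crisis_metrics(baseline_counts, n_crisis):
    """Derive the comparison fields from a list of baseline observations
    (same area and time window, previous three weeks) and the crisis count.
    Field names follow the dataset schema; formulas are our approximation."""
    n_baseline = mean(baseline_counts)
    n_diff = n_crisis - n_baseline
    percent_change = 100.0 * n_diff / n_baseline if n_baseline else float("nan")
    sd = stdev(baseline_counts)
    z_score = n_diff / sd if sd else float("nan")
    return {"n_baseline": n_baseline, "n_diff": n_diff,
            "percent_change": percent_change, "z_score": z_score}

# Three weekly baseline observations of 90, 100, and 110 people,
# with 130 people observed during the crisis window.
m = crisis_metrics([90, 100, 110], 130)
```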
This dataset contains information about the number of people moving between tile pairs over a given time period. We measure this during baseline (movement between tile pairs averaged across the three weeks prior to the disaster) as well, so we can understand how many more or fewer people are moving during the disaster period compared to usual. This helps us distinguish disaster-related movements from people’s normal migration patterns.
These data look like:
- area_id_start and area_id_end represent the tile pairs; in the descriptions below, s denotes the start tile and e the end tile.
- n_people_baseline is the total number of people who moved from s to e during time period t averaged over the three weeks prior to the disaster.
- n_people_crisis is the total number of people moving from s to e during time t.
- n_diff is the difference between the number of people moving from s to e during the disaster relative to the baseline.
- percent_change is the percentage difference between the number of people moving from s to e during the disaster relative to the baseline.
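The movement comparison can be sketched the same way. Representing each flow as a dict keyed by the (area_id_start, area_id_end) pair is our assumption; baseline values are taken as already averaged over the prior three weeks.

```python
def movement_changes(baseline_flows, crisis_flows):
    """Compare crisis-period flows between tile pairs to their baseline.
    Inputs map (area_id_start, area_id_end) -> people count; the pair
    representation is illustrative, not the published data format."""
    rows = []
    for pair in sorted(set(baseline_flows) | set(crisis_flows)):
        n_base = baseline_flows.get(pair, 0)
        n_crisis = crisis_flows.get(pair, 0)
        n_diff = n_crisis - n_base
        pct = 100.0 * n_diff / n_base if n_base else float("inf")
        rows.append({"area_id_start": pair[0], "area_id_end": pair[1],
                     "n_people_baseline": n_base, "n_people_crisis": n_crisis,
                     "n_diff": n_diff, "percent_change": pct})
    return rows

# 50 people usually move from tile A to tile B in this window;
# 80 moved during the crisis.
flows = movement_changes({("A", "B"): 50}, {("A", "B"): 80})
```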
Safety Check Maps
Some of the metrics we provide for each disaster derive from Facebook’s Safety Check product. Safety Check helps people connect with friends and family during a disaster. People who might be affected by the crisis are invited to check in safe. Once they have checked in, they can invite others who might be affected. In this way, invitations to check in safe spread to people on Facebook who are likely affected by a disaster.
We aggregate and share Safety Check data to show where people are indicating that they are safe.
These data look like:
- n_invited is the total number of people invited to Safety Check who are located in area a.
- n_safe is the total number of people who checked in safe during time t or any prior time in area a.
- safe_ratio is the proportion of people in area a who have checked in safe out of the number invited.
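Assembling these per-area fields is straightforward; a minimal sketch, assuming raw per-area invite and check-in counts have already been accumulated over time upstream:

```python
def safety_check_rows(invites, safes):
    """Build the Safety Check fields per area from raw counts.
    `invites` and `safes` map area_id -> count; how areas with zero
    invitations are handled here (None ratio) is our assumption."""
    rows = {}
    for area, n_invited in invites.items():
        n_safe = safes.get(area, 0)
        rows[area] = {
            "n_invited": n_invited,
            "n_safe": n_safe,
            "safe_ratio": (n_safe / n_invited) if n_invited else None,
        }
    return rows

# 200 people invited in area a1; 150 have checked in safe so far.
rows = safety_check_rows({"a1": 200}, {"a1": 150})
```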
When analyzing this information, it is important to keep in mind that the people invited to Safety Check may not represent a uniform sample across the disaster-affected area, and that the data will accrue over time. Additionally, there are many reasons why people may not check in safe, for example: they are unsafe or busy dealing with the crisis, there is a lack of connectivity where they are, or they are completely unaffected and do not feel the need to respond to the invitation. For these reasons, it’s important to consider this information in context.
The insights included in the disaster maps data are representative of people who use the Facebook app and have Location Services enabled. This subset of people is likely different from the broader population, particularly in regions where connectivity is scarce.
We encourage our humanitarian partners, who are experts in disaster response, to use our data as part of a broader set of data that help inform resource deployment. Specifically, they should take into account that our data represents this specific population and consider it in the context of the other information they receive. As a next step, we’re teaming up with data science teams from UNICEF, World Food Programme, and the Red Cross to analyze potential biases in the data so that we can correct for and report them to the community.
For example, we can evaluate the spatial coverage of our data by comparing the disaster maps population dataset to open population density datasets (such as Facebook’s Population Density Maps). By making this comparison, we can clearly communicate to our partners any areas that are likely not adequately covered by the disaster maps population dataset.
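One way such a coverage check could look, as a sketch: flag areas where the Facebook baseline count is a small fraction of an external population estimate. The 5% threshold and the dict-based inputs are illustrative assumptions, not part of the published methodology.

```python
def undercovered_areas(fb_baseline, external_pop, min_coverage=0.05):
    """Flag areas where the Facebook baseline count is below `min_coverage`
    of an external population estimate. Both inputs map area_id -> count.
    Threshold and representation are illustrative assumptions."""
    flagged = []
    for area, pop in external_pop.items():
        fb = fb_baseline.get(area, 0)
        if pop > 0 and fb / pop < min_coverage:
            flagged.append(area)
    return flagged

# Area "a" has 10 Facebook-observed people against an estimated population
# of 1,000 (1% coverage); area "b" has 500 (50% coverage).
low = undercovered_areas({"a": 10, "b": 500}, {"a": 1000, "b": 1000})
```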
Getting the disaster maps data into the right hands
Over the next few months, we will collaborate closely with our partners to further validate the disaster maps data offerings. As we validate the data, we are working in parallel to ensure that the datasets are accessible to humanitarian responders that are actively driving policy and response efforts during natural disasters.
Our Infrastructure team is building an API and visualization tool that will allow us to make the disaster maps available to organizations around the world who have the capacity to use the data for humanitarian response. This API will offer visualization and download functionality, and will be as interoperable as our disaster maps datasets, allowing our partners to access the data at high temporal and spatial resolution and at the level of aggregation that is most useful to them (e.g. raster or administrative data).
If you’re interested in working with the Disaster Maps data through a partnership or research, reach out to email@example.com.
Notes
- In some cases where the crisis impacts a large region, such as a whole country, the tile size is slightly lower resolution.
- If we receive multiple location pings from a person in the time window, we use the most frequently occurring place; if there is a tie, we then use the most recent of the frequently occurring locations within the time window.
- Formulas in the chart are adapted to increase comprehension.
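The per-person location selection described in the notes above (most frequent tile, ties broken by recency) can be sketched as follows. Representing pings as (timestamp, tile_id) tuples is our assumption about the input format.

```python
from collections import Counter

def assign_location(pings):
    """Pick one tile per person per time window: the most frequently
    occurring tile; ties broken by the most recent ping among the tied
    tiles. `pings` is a list of (timestamp, tile_id) tuples."""
    counts = Counter(tile for _, tile in pings)
    top = max(counts.values())
    tied = {tile for tile, c in counts.items() if c == top}
    # Walk backwards from the latest ping to find the most recent tied tile.
    for _, tile in sorted(pings, key=lambda p: p[0], reverse=True):
        if tile in tied:
            return tile

# A and B each occur twice; B's ping at t=4 is the most recent of the tie.
tie_case = assign_location([(1, "A"), (2, "B"), (3, "A"), (4, "B")])
# A occurs more often than B, so frequency wins outright.
mode_case = assign_location([(1, "A"), (2, "A"), (3, "B")])
```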