Originally published on the Facebook Code blog.
At Connectivity Lab, our mission is to connect the unconnected and underserved in the world. Ten percent of the world’s population lives in areas where connectivity is simply not available; connecting these often remote and rural areas will require the development of new wireless communication technologies and platforms.
Defining the specifications of the technologies we are developing first requires accurate information about how people are distributed in these areas. For example, short-range access networks such as Wi-Fi hotspots are suitable for people living close together, while cellular technologies are better for regions where people live farther apart, in isolated houses. Additionally, knowing how communities are located in relation to one another is important for planning backhaul networks, the links to the internet backbone. Villages lined up along a river or road could be connected by a string of terrestrial point-to-point links, while scattered settlements might require an aerial backhaul solution such as unmanned aerial vehicles or satellites.
Whatever technological solution will eventually be used to connect these people, accurate knowledge about the population distribution is at the core of its development. Creating a data set with high spatial resolution for some of the countries that could benefit from better internet connectivity is a large undertaking. Aggregate population counts on the spatial scale of provinces or districts are known from population censuses but alone are insufficient, as these areas vary in geographical size and do not provide insight about population distributions on a granular level.
We addressed this challenge by applying computer vision techniques to DigitalGlobe high-resolution satellite imagery. We identified human-built structures, such as buildings or other infrastructure, and used those locations as a proxy for where people live. We then combined our results with existing census counts and created a population data set with 5-meter resolution for 20 countries. While recognizing structures in aerial imagery is a popular task in computer vision, scaling it to a global level came with additional difficulty. Aside from processing billions of images, finding buildings with high fidelity in rural areas is really a needle-in-a-haystack problem: Typically, more than 99 percent of the landmass we analyze does not contain any human-made structure, so the machine learning algorithms have to learn from a highly unbalanced data set.
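To see why such an imbalance is a problem, consider a toy example. The numbers below are illustrative assumptions (10,000 tiles, exactly 1 percent of which contain a building), not figures from our data. A model that always answers "no building" looks excellent by accuracy alone, yet finds nothing:

```python
# Illustrative numbers only: 10,000 image tiles, 1 percent with a building.
n_tiles = 10_000
n_with_buildings = 100

# Ground-truth labels: 1 = contains a building, 0 = does not.
labels = [1] * n_with_buildings + [0] * (n_tiles - n_with_buildings)

# A trivial "model" that always predicts "no building".
predictions = [0] * n_tiles

# Accuracy: fraction of tiles labeled correctly.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / n_tiles

# Recall: fraction of the tiles with buildings that the model actually finds.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / n_with_buildings

print(f"accuracy = {accuracy:.2%}, recall = {recall:.2%}")
```

The trivial model reaches 99 percent accuracy with zero recall, which is why accuracy alone is a poor guide when almost every example is negative.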
For the computer vision analysis, we used a combination of three image-processing steps. First, we used a conventional image-processing procedure to preselect candidate areas that potentially contained human-made structures, discarding images with vast bodies of desert, forest, and water. Next, we invoked Facebook’s image-recognition engine, which is based on a deep convolutional neural network that provides a fixed-dimensional feature embedding for all images, and found that, with minor modifications, we could use the engine trained on normal photos to efficiently detect whether a satellite image contained a building. Finally, we developed a weakly supervised neural network with an architecture tailored for this particular problem. By using a binary labeling scheme (the image does/does not contain a building), the neural network learned “what” and “where” simultaneously. It succeeded in identifying outlines of buildings and highlighted those for which it had high confidence while suppressing areas not likely to contain human-made structures.
Typically, neural networks need to be trained on large volumes of images to obtain sufficient accuracy. Using the above approach, we were able to train our model by adding only about 8,000 binary-labeled satellite images from within one country, and we found that the accuracy was only slightly reduced when the model was applied to other countries across the world.
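How can a network trained only on yes/no labels also say where a building is? One common setup, shown here purely as an assumption since this post does not describe the actual architecture, has the network produce a score for every image location and then pools those scores into a single image-level prediction. The binary label supervises the pooled value, and the underlying score map doubles as a localization map. In the sketch below, the function name, the threshold, and the toy score maps are all hypothetical stand-ins for the output of a real convolutional network:

```python
import numpy as np


def sigmoid(x):
    """Map raw scores (logits) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def classify_and_localize(score_map, threshold=0.5):
    """Turn a per-location score map into an image label and a rough mask.

    score_map: 2-D array of raw scores, one per image location, standing in
    for the last layer of a hypothetical fully convolutional network.
    """
    probs = sigmoid(score_map)
    # "What": global max pooling collapses the map into one image-level
    # probability, the only quantity a binary label can supervise.
    image_prob = float(probs.max())
    # "Where": the same map, thresholded, marks the likely building locations.
    mask = probs >= threshold
    return image_prob, mask


# Toy score maps: low scores everywhere, one with a confident 2x2 patch.
empty = np.full((6, 6), -4.0)
with_building = empty.copy()
with_building[2:4, 3:5] = 3.0

prob_empty, mask_empty = classify_and_localize(empty)
prob_built, mask_built = classify_and_localize(with_building)

print(f"empty tile:    p = {prob_empty:.3f}, flagged locations = {mask_empty.sum()}")
print(f"building tile: p = {prob_built:.3f}, flagged locations = {mask_built.sum()}")
```

In this toy case the image-level probability is high only for the tile with the confident patch, and the mask flags exactly the four locations of that patch.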
Based on the settlement identification, we redistributed the census count of each census area equally over all the buildings found within that area. This method assumes an equal population per building within a census area, which we felt was the least error-prone way of obtaining population densities, since it requires no external assumption about the number of people per building and constrains systematic errors to within one census area. Potentially, our results can be used to validate the census.
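The redistribution step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not our production code: the function name, the area identifiers, and the representation of a building as an (area, grid cell) pair are all hypothetical.

```python
from collections import Counter, defaultdict


def redistribute_census(census_counts, buildings):
    """Spread each census area's count equally over its detected buildings.

    census_counts: dict mapping area id -> census population of that area.
    buildings: iterable of (area_id, cell) pairs, one per detected building,
        where cell identifies the grid cell the building falls in.
    Returns (grid, unallocated): estimated population per grid cell, and the
    census counts of areas in which no building was detected.
    """
    buildings = list(buildings)
    per_area = Counter(area_id for area_id, _ in buildings)

    grid = defaultdict(float)
    for area_id, cell in buildings:
        if area_id not in census_counts:
            continue  # no census figure for this area; nothing to distribute
        # Every building in an area receives the same share of its count.
        grid[cell] += census_counts[area_id] / per_area[area_id]

    unallocated = {a: n for a, n in census_counts.items() if per_area[a] == 0}
    return dict(grid), unallocated


# Hypothetical example: area "A" has 100 people and 4 detected buildings,
# two of which share a grid cell; area "B" has no detected buildings.
census = {"A": 100, "B": 30}
detected = [("A", (0, 0)), ("A", (0, 1)), ("A", (0, 1)), ("A", (2, 2))]

grid, unallocated = redistribute_census(census, detected)
print(grid)
print(unallocated)
```

Because every building receives the same share, the grid values within an area always sum back to that area's census count. A detection error can therefore move people around within a census area but cannot change the area's total.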
We analyzed 20 countries, which amounts to 21.6 million square kilometers and 350 TB of imagery. For one pass of our analysis we processed 14.6 billion images with our convolutional neural nets, typically running on thousands of servers simultaneously. Our final data set has a spatial resolution of 5 meters and thereby improves over previous countrywide data sets by multiple orders of magnitude. As an example, we show below a visualization of the results for Naivasha, Kenya.
[Figures: DigitalGlobe satellite image of Naivasha, Kenya, shown alone and then on the left alongside, on the right, (1) the results of our analysis of the same area, (2) the Gridded Population of the World v4 from CIESIN at Columbia University, and (3) the population distribution map of the same area from WorldPop (courtesy of WorldPop under a Creative Commons Attribution 4.0 International License).]
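To get a feel for the scale figures quoted above, here is a rough back-of-the-envelope calculation. It assumes that the 5-meter resolution means square 5 m by 5 m grid cells, that "TB" means decimal terabytes, and that the 350 TB and the 14.6 billion images describe the same set of image tiles. None of these assumptions is stated explicitly, so treat the second result in particular as an order-of-magnitude estimate.

```python
# Figures quoted in the text.
AREA_KM2 = 21_600_000          # 21.6 million square kilometers
IMAGERY_BYTES = 350 * 10**12   # 350 TB, assuming decimal terabytes
IMAGES = 14_600_000_000        # 14.6 billion images in one pass

# Assumption: a 5-meter resolution means square 5 m x 5 m grid cells.
CELL_SIDE_M = 5
M2_PER_KM2 = 1_000_000

grid_cells = AREA_KM2 * M2_PER_KM2 // (CELL_SIDE_M ** 2)
print(f"{grid_cells:,} grid cells")  # 864,000,000,000 grid cells

# Assumption: the imagery volume and the image count refer to the same tiles.
avg_kb = IMAGERY_BYTES / IMAGES / 1000
print(f"about {avg_kb:.0f} KB per image")  # about 24 KB per image
```

Under these assumptions, the analyzed area corresponds to roughly 864 billion grid cells.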
Many Facebook teams have been instrumental in the execution of this project. We collaborated with the Core Data Science team, who brought the expertise in handling large data sets and machine learning; the Infrastructure team, who provided the resources required to scale nearly instantaneously, enabling us to perform the analysis of all countries in less than two weeks; and the FAIR and Applied Machine Learning teams, who developed internal tools enabling us to use state-of-the-art, pretrained convolutional neural networks and to test whether this approach would work within a matter of hours.
Later this year, we will be releasing this data to the general public. We believe this data has many more impactful applications, such as socio-economic research and risk assessment for natural disasters. We will be working with the Center for International Earth Science Information Network at Columbia University to create a combined population data set to be released later this year.
Many people contributed to this project, but the core team members — all of whom were instrumental to making this project happen — are Angelica Escareno, Brian Karrer, Nan Li, Xianming Liu, Philip Yang, and Amy Zhang.