Impact evaluations are expensive (the World Bank average is $500,000 per evaluation). Results may also vary from context to context, so several studies may be needed to form a clear idea of what to do. Together, these facts lead some to argue that impact evaluation should not be the main tool we use (see, for instance, the arguments of Pritchett et al., or IPA’s “Goldilocks” initiative, which seeks to find “appropriately-sized” evaluations for different policy needs).
My guess is that in the future we will see a lot more leveraging of big data sets, especially given the heterogeneity of results.
With that in mind, I thought I would gather some types of large data sets that could help. Here are the top 6 types of data I found, as well as 2 cautionary tales. Am I missing any?
1. CDR data
Call detail records (CDR) are generated whenever people make phone calls or send SMS messages. They have been used to track people’s movements during disasters, to track migrant workers, and to estimate when transportation infrastructure might need improvement.
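To make the movement-tracking idea concrete, here is a minimal sketch of how one might count flows between cell towers. Everything in it is an assumption for illustration: the record layout (user id, timestamp, tower id), the tower names, and the `count_flows` helper are all made up, and real CDR data would need anonymisation and far more careful handling.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Made-up records: (anonymised user id, timestamp, cell tower id).
RECORDS = [
    ("u1", "2015-04-25 08:00", "tower_A"),
    ("u1", "2015-04-25 12:30", "tower_A"),
    ("u1", "2015-04-26 09:15", "tower_B"),
    ("u2", "2015-04-25 10:00", "tower_B"),
    ("u2", "2015-04-26 18:45", "tower_C"),
    ("u3", "2015-04-25 07:20", "tower_A"),
    ("u3", "2015-04-26 07:25", "tower_B"),
]


def count_flows(records):
    """Count how many times users moved from one tower to another.

    A "move" here is simply two consecutive records for the same user
    that were logged at different towers (a hypothetical definition).
    """
    by_user = defaultdict(list)
    for user, timestamp, tower in records:
        when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        by_user[user].append((when, tower))

    flows = Counter()
    for events in by_user.values():
        events.sort()  # chronological order for each user
        for (_, origin), (_, destination) in zip(events, events[1:]):
            if origin != destination:
                flows[(origin, destination)] += 1
    return flows


flows = count_flows(RECORDS)
for (origin, destination), n in flows.most_common():
    print(f"{origin} -> {destination}: {n}")
```

Aggregating to tower-to-tower counts rather than keeping individual paths is a deliberate choice in this sketch, since the aggregate is usually what a policy question needs and it exposes less about any one person.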
2. Satellite data
The night lights data have famously been used to estimate growth. Satellites can also measure vegetation or, conversely, urban expansion, among many other uses. Back in 2009 I wrote up a system to extract satellite data from NASA’s public-but-difficult-to-access database; Amazon has recently added the data to its own servers, making it much easier to access.
3. Apps
I am surprised that more people haven’t leveraged apps. ResearchKit, for example, allows researchers to collect medical information from people who sign up with an Apple device. You have to rely on people who choose to share their own information, but apparently tens of thousands of people have already voluntarily signed up to fill out surveys, take coordination tests, or share their daily activity. I am still looking for good health apps for a developing country context; let me know if you have any suggestions. I’ve toyed with the idea of making my own version of an app just to get the data that people transmit.
4. Online education data
Online education would seem to lead to many opportunities to learn how to best convey new ideas and teach different material.
5. Social media data
My understanding is that it is illegal to scrape Facebook or LinkedIn. However, many companies offer an API through which one can legally obtain some of their data. The more general rule of thumb is that anything a site’s robots.txt does not disallow is okay to scrape, but there have been challenges to this. You can also legally get some data from Facebook if you start your own group, invite people to like it, and observe their interactions within your group.
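The robots.txt rule of thumb can be checked programmatically. Below is a minimal sketch using Python’s standard `urllib.robotparser` module. The robots.txt contents, the `research-bot` user agent, and the example.org URLs are all made up for illustration, and passing this check says nothing about a site’s terms of service or the law.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed_public = is_allowed(
    ROBOTS_TXT, "research-bot", "https://example.org/public/page.html"
)
allowed_private = is_allowed(
    ROBOTS_TXT, "research-bot", "https://example.org/private/data.csv"
)
print("public page allowed:", allowed_public)
print("private file allowed:", allowed_private)
```

For a live site, one would instead call `set_url()` with the address of the site’s robots.txt and then `read()` before `can_fetch()`; the sketch parses a string so that it runs without network access.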
6. Financial and other private sector data
Evaluations of microfinance programs have long benefitted from the fact that due to the nature of that type of program, data is collected on a regular basis. Other financial data are also typically produced on a frequent basis, and collaborations with companies can be very fruitful.
What didn’t work
Finally, David McKenzie has highlighted some data collection strategies that sounded promising but didn’t quite work, and it could be just as important to hear about them. One study attempted to measure inventory by photography; the other tried to use RFID technology to track inventory.