Workshop on Causal Inference and Extrapolation

I have been helping the Global Priorities Institute at the University of Oxford organize a workshop on new approaches in causal inference and extrapolation, coming up March 16-17. It is pretty small, with a lot of breaks to facilitate good discussion (designed with SITE in mind as a model). As it is just before the annual conference of the Centre for the Study of African Economies, we are hoping others can make it, especially to the keynote, which is on the evening before the CSAE conference begins and is open to all. I’m really excited about both the institute and the workshop and hope that this becomes an annual event in some way, shape or form.

If you will be in town and would like to attend but have not signed up, please let me know. The programme is here.

Edit: site appears to be down (too much traffic?), so I am uploading a copy of the programme here.


Aidan Coville and I have been running a behavioural experiment with policymakers, practitioners and researchers to see how they update based on new evidence from impact evaluations and what biases they may have in updating. We have done some analysis of the numerical data, but we also have the audio transcripts from the one-on-one enumeration, where people were asked to describe their thought processes and why they gave the answers they gave. We are looking for someone proficient in text analysis to collaborate on a separate paper. This would be a great opportunity for grad students, but others welcome, too. Please pass on to anyone you know who might be interested.

Second, please see this post for a description of a Research Assistant position. Deadline is Jan. 31.

Finally, I am helping to organize a workshop for the Global Priorities Institute at the University of Oxford on causal inference and extrapolation. The call for papers is closed but please get in touch if you are interested in attending, as space is limited.

Clean meat is not a panacea

I am a very strong advocate of “clean” or “cultured” meat – meat that is grown in a lab rather than in an animal. But it’s because I’m a strong advocate that I think it’s better to look at what the evidence says and not let our biases get in the way.

And based on the results of two experiments, I suspect there will be more resistance to clean meat than we’d like to think.

Bobbie Macdonald and I did a set of two experiments leveraging Amazon’s Mechanical Turk, a site which allows people to take surveys for money. We focused on U.S. respondents, as the U.S. is one of the first places where clean meat is expected to be made commercially available.

First, we found that while many people were very excited about clean meat, a large group was concerned that it wouldn’t taste good, that it would be too expensive, and that it was “unnatural” and vaguely unsafe or unhealthy.

In one of our papers, we tried to overcome the “naturalistic heuristic” that people seemed to be using to judge clean meat to be unhealthy even in the absence of any evidence that this was the case (in fact, a subset of the sample was provided with evidence that clean meat may be healthier than conventional meat). We tried several messaging strategies: an approach that tried to directly debunk the idea that “natural is good” (“direct debunking”); an approach that noted that many of the food products respondents currently enjoy are “unnatural” (“embrace unnatural”); and a descriptive norms approach (“descriptive norms”). We also randomly primed half the sample with real negative statements about the “unnaturalness” of the product made by participants in another study. The negative priming turned out to have stronger effects than any of the messages intended to help overcome the naturalistic heuristic, with the “embrace unnatural” message faring the best among the latter.

The strong effect of the negative priming treatment is a bit worrisome, because we can easily imagine that if clean meat begins to threaten conventional meat producers, some of them may engage in campaigns to spread distrust via these kinds of negative social information.

In our other paper, we explored a different topic: whether merely knowing about clean meat products could change ethical beliefs. Standard models of cognitive dissonance (e.g. Rabin, 1994) would suggest that if people believe the costs of avoiding meat from factory farms are high, they will be less receptive to information about the environmental costs or the harm caused to animals by factory farming. If clean meat is regarded as a good substitute, it could lower the costs of avoiding meat from factory farms (e.g. by lowering monetary costs or being a closer substitute in taste and nutrition than vegetarian products). Thus, upon receipt of new information (e.g. a video about factory farming), people may be more receptive to it and likely to shift their ethical beliefs.

Contrary to expectations, we saw no effect on ethical views – until we restricted attention to those who viewed clean meat positively (remember: many thought that clean meat would not be a good substitute). Since it was possible that those who viewed clean meat positively were also more likely to change their views, to identify the causal effect of a positive perception of clean meat on ethical views we leveraged the randomized priming treatment previously described as an instrument. Those who were randomly selected to be shown negative statements about the “unnaturalness” of clean meat were less likely to view clean meat positively and were also less likely to exhibit changes in their ethical beliefs upon viewing the video. From a broader economics perspective, this is very interesting: it implies our ethical beliefs are a function of the products around us.
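The instrumental-variables logic in this paragraph can be illustrated with the simplest IV estimator for a binary instrument, the Wald estimator: the effect of the instrument on the outcome divided by its effect on the endogenous variable. The sketch below is only an illustration, not the estimation used in the paper. The function name, the variable names, and above all the toy data are made up for the example.

```python
from statistics import mean


def wald_iv_estimate(z, d, y):
    """Wald IV estimate with a binary instrument.

    z: 1 if the respondent was shown the negative priming, else 0 (instrument)
    d: 1 if the respondent views clean meat positively, else 0 (endogenous)
    y: outcome, e.g. 1 if ethical beliefs shifted after the video, else 0
    """
    d1 = mean(di for zi, di in zip(z, d) if zi == 1)
    d0 = mean(di for zi, di in zip(z, d) if zi == 0)
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)

    first_stage = d1 - d0    # effect of priming on positive perception
    reduced_form = y1 - y0   # effect of priming on the outcome
    if first_stage == 0:
        raise ValueError("instrument does not move the endogenous variable")
    return reduced_form / first_stage


# Hypothetical toy data, NOT results from the experiments.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [1, 1, 0, 0, 1, 0, 0, 0]
print(wald_iv_estimate(z, d, y))
```

The estimate is only interpretable as a causal effect if the priming affects ethical views solely through the perception of clean meat, which is the usual exclusion restriction.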

None of this is to say that companies developing clean meat products won’t be wildly successful. They could also help directly mitigate the effects of factory farming by providing a product that at least some proportion of the population will substitute towards. Further, while we did not observe shifts in ethical values on net in our experiments, it is possible that over time more people will perceive clean meat positively and clean meat will affect ethical values in the long run.

Nonetheless, these experiments suggest that there is still room for animal advocacy organizations to make a difference in changing people’s ethical views, and while some conventional meat companies have shown interest in clean meat, clean meat companies should prepare for negative advertising campaigns, since the effects of negative messages can be tough to overcome.

External validity and research credibility

I recently put out a revised version of my paper on how well we can generalize from impact evaluation results in development.

To summarize: imagine we have a set of impact evaluations on the effect of a given type of intervention (e.g. conditional cash transfer programs) on a given outcome (e.g. school enrollment rates) and want to say something about the true effect of a similar program in another setting. I argue for using τ², a measure of true inter-study variation used in the meta-analysis literature, to make inferences about how a similar program might fare in another setting, which I take to be the crux of what we mean when we talk about generalizability.
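For readers who have not met τ² before, here is a minimal sketch of one common way to estimate it, the DerSimonian-Laird moment estimator from the meta-analysis literature. This is an assumption made for illustration: it is not necessarily the estimator used in the paper, it is not the R code from the online appendix, and the function name and inputs are hypothetical.

```python
def dersimonian_laird(estimates, variances):
    """Moment estimate of tau^2 plus the random-effects mean and its SE.

    estimates: study-level treatment effect estimates
    variances: their sampling variances (squared standard errors)
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # truncated at zero

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    mu_re = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_mu = (1.0 / sum(w_re)) ** 0.5
    return tau2, mu_re, se_mu
```

With three made-up estimates of 0, 2 and 4, each with sampling variance 1, the observed variance across studies is 4, so the estimator attributes 3 of that to true inter-study variation.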

There is actually a very close link between research credibility and generalizability. τ² and the related I² statistic (which is the share of total variance that is not sampling variance) have a long history (see e.g. Rubin, 1981; Efron and Morris, 1975; Stein, 1955), but have perhaps most notably been used to improve how credible a research result is in its own setting through forming a shrinkage estimator. Gelman and Tuerlinckx and Gelman and Carlin’s work on Type S and Type M errors is also very relevant. Type S errors represent the probability that a hypothesized difference between two true effects has the wrong sign (in Gelman and Tuerlinckx’s parlance, making a claim that θi > θj when θi < θj, where one might be considered to make such a claim if the estimate of θi were found to be significantly greater than the estimate of θj, for example); Type M errors are similar but represent errors of magnitude.[1]
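To make the shrinkage idea concrete, here is a minimal sketch under a normal random-effects model in which the mean μ and τ² are treated as known. Each study's estimate is pulled toward μ, and the weight it keeps is the share of its total variance that is not sampling variance, which is a per-study analogue of I². The function and argument names are hypothetical, and treating μ and τ² as known is a simplifying assumption.

```python
def shrink_estimates(estimates, variances, mu, tau2):
    """Shrink each study's estimate toward the overall mean mu.

    Returns a list of (shrunk_estimate, weight_on_own_estimate) pairs, where
    the weight is tau2 / (tau2 + sampling variance) for that study.
    """
    results = []
    for y, v in zip(estimates, variances):
        own_weight = tau2 / (tau2 + v)   # share of variance that is "true"
        shrunk = mu + own_weight * (y - mu)
        results.append((shrunk, own_weight))
    return results
```

When τ² is large relative to the sampling variance, a study's own estimate is left almost untouched; when τ² is zero, every estimate collapses to μ.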

In my paper, I take it that when we are interested in generalizability, we are interested in generalizing from a set of results to make predictions about the effect of a similar program in another setting. Perhaps, as in Type S or Type M errors, we are interested in the sign or magnitude of that true effect. Just as τ² and I² can help improve a study’s own estimates or provide a measure of how likely they are to be correct, they can also help us form estimates about how likely our estimates are to be correct in other settings. How well we can predict the sign or magnitude of this true effect depends entirely on the parameters of the model, assuming the model is not misspecified, and is something we can estimate.
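As a rough illustration of how the model parameters pin down these predictions, suppose the true effect in a new setting is drawn from a normal distribution with mean μ and variance τ², and that μ is estimated with standard error se_mu. Then the probability that the new true effect is positive, and a prediction interval for it, follow directly. The sketch below rests on those assumptions (normality, τ² treated as known); the names are hypothetical and this is not the paper's code.

```python
from statistics import NormalDist


def prob_positive_in_new_setting(mu, tau2, se_mu=0.0):
    """P(true effect in a new setting > 0) under a normal predictive model."""
    sd = (tau2 + se_mu ** 2) ** 0.5
    if sd == 0:
        raise ValueError("need tau2 > 0 or se_mu > 0")
    return NormalDist().cdf(mu / sd)


def prediction_interval(mu, tau2, se_mu=0.0, level=0.95):
    """Interval expected to contain the new setting's true effect."""
    sd = (tau2 + se_mu ** 2) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mu - z * sd, mu + z * sd
```

For example, with μ equal to one standard deviation of the true effects (μ = 1, τ² = 1, se_mu = 0), the sign would be predicted correctly about 84% of the time, so even a clearly positive average effect leaves real uncertainty about the next setting.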

There may well be other things we care about, in addition to the sign and magnitude of such an effect, but I would argue the sign and magnitude are certainly among the things that policymakers attempting to form evidence-based policy would care about, and the same approach could also be leveraged to answer many other similar questions (like the likelihood that an impact evaluation will find a significant effect of an intervention on a particular outcome). Nor does this approach preclude building more complicated models of the treatment effects — in fact, in the paper I include an example leveraging a mixed model to reduce the “residual” τ², or unexplained variation.

One small new statistic reported in this version: the correlation between the standard deviation of the treatment effect and τ². Each effect has a standard deviation associated with it. If we take the mean of these standard deviations within an intervention-outcome combination, that statistic is quite strongly correlated with τ² (0.54 using standardized values). It would be nice if this relationship held at the level of the individual study, as then one could estimate the generalizability of a result simply using the data from one’s own study. However, the relationship between a single study’s standard deviation and τ² is much noisier. Still, I think one promising area for further research is looking more closely at how well one can predict across-study variation using within-study variation.

Another change in this version: R code for both the random-effects and mixed model is now available as an online appendix. I know there are existing packages to estimate these kinds of models using a variety of methods (e.g. metafor’s “empirical Bayes” approach) but this code follows the paper and hopefully can be useful to someone.

[1] Aidan Coville and I have a related SSMART grant-supported working paper estimating Type S and Type M errors and the false positive report probability and false negative report probability in development economics.
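For a sense of how Type S and Type M errors are computed in the design-analysis framing of Gelman and Carlin, here is a minimal sketch. It assumes a normally distributed estimate with known standard error and a hypothesized positive true effect, and it uses closed-form truncated-normal expressions in place of the simulation they use for the exaggeration ratio. The function name is hypothetical, and this is not the method of the working paper mentioned above.

```python
from statistics import NormalDist


def design_analysis(true_effect, se, alpha=0.05):
    """Power, Type S error rate and exaggeration ratio (Type M).

    true_effect: hypothesized true effect, assumed > 0
    se: standard error of the estimate
    """
    if true_effect <= 0 or se <= 0:
        raise ValueError("true_effect and se must be positive")
    n = NormalDist()
    z = n.inv_cdf(1 - alpha / 2)
    a = z - true_effect / se     # standardized upper significance cutoff
    b = -z - true_effect / se    # standardized lower significance cutoff

    p_hi = 1 - n.cdf(a)          # significant with the correct sign
    p_lo = n.cdf(b)              # significant with the wrong sign
    power = p_hi + p_lo
    type_s = p_lo / power        # share of significant results, wrong sign

    # E[|estimate| * 1{significant}], from truncated-normal means.
    mean_abs_sig = true_effect * (p_hi - p_lo) + se * (n.pdf(a) + n.pdf(b))
    exaggeration = mean_abs_sig / (power * true_effect)
    return power, type_s, exaggeration
```

When the true effect is small relative to the standard error, significant estimates are rare, often have the wrong sign, and overstate the magnitude many times over; when the effect is large relative to the standard error, both problems vanish.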

Is it better to build the evidence base or improve the decision-making process?

A key reason we do impact evaluations is to inform policy decisions. It is absolutely crucial to build up an evidence base. However, in a paper leveraging AidGrade data combined with experimental data, I argue that we also shouldn’t neglect the decision-making process and that improvements in the decision-making process can sometimes dominate the returns from conducting an impact evaluation.

This perhaps sounds crazy, and I’m not at all suggesting we abandon impact evaluations. They can be important tools. What I do in the paper is to build a model of the returns to impact evaluation (in terms of improved policy outcomes) assuming policymakers are Bayesian updaters and have only altruistic motives (caring only about the impact of the project on intended beneficiaries). I then gather the real priors of policymakers, practitioners, researchers, and a comparison group of MTurk workers and use these priors to estimate the returns of impact evaluations. Since most projects have fairly small impacts, the typical return to an impact evaluation is also very small.*
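A stripped-down version of this kind of calculation, using the textbook normal learning model rather than the paper's full model, may help fix ideas. Assume the policymaker has a normal prior over the project's effect, implements the project only if the expected effect is positive, and can observe one unbiased, normally distributed estimate before deciding. The zero threshold, the payoff definition and all names below are assumptions made for the illustration.

```python
from statistics import NormalDist


def value_of_evaluation(prior_mean, prior_sd, signal_sd):
    """Expected gain in outcomes from seeing one estimate before deciding.

    Payoff is the true effect if the project is implemented and 0 otherwise.
    """
    if prior_sd <= 0 or signal_sd < 0:
        raise ValueError("prior_sd must be > 0 and signal_sd >= 0")
    # Weight a Bayesian updater puts on the new estimate.
    k = prior_sd ** 2 / (prior_sd ** 2 + signal_sd ** 2)
    # Before the evaluation, the posterior mean is itself normal around the
    # prior mean with this standard deviation.
    sd_post_mean = (k ** 0.5) * prior_sd

    n = NormalDist()
    t = prior_mean / sd_post_mean
    # E[max(posterior mean, 0)]: implement only when it looks beneficial.
    expected_with = prior_mean * n.cdf(t) + sd_post_mean * n.pdf(t)
    expected_without = max(prior_mean, 0.0)
    return expected_with - expected_without
```

In this sketch the value of the evaluation grows with prior uncertainty and shrinks toward zero when the prior is already confident that the project is worthwhile, or when the estimate is very noisy.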

Thanks to the great advice of a referee, I also looked at different ways a decision could be made, comparing a single policymaker making a decision as a dictator and a group using majority voting. There is a large literature on the wisdom of the crowds, and there has also been some work to suggest people’s priors are better than meta-analysis results. There are many other ways in which decisions could be made, but even without considering more complicated decision-making rules, it is already apparent that changing the way in which decisions are made can sometimes be more valuable than conducting an impact evaluation. Of course, this depends on the quality of the decision-makers; for the relatively poorly-informed MTurk subjects, I observed something like a “folly” of the crowds when considering how they would behave if faced with a particularly noisy signal of the effects of a program.
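One simple way to see how a group can either beat or underperform a single decision-maker is the Condorcet jury calculation: if each voter independently makes the right call with probability p, the chance that a majority is right follows from the binomial distribution. This is only an illustration under a strong independence assumption, not the model in the paper, and the function name is hypothetical.

```python
from math import comb


def majority_correct_prob(p, n):
    """Probability that a majority of n independent voters is correct.

    p: probability that any single voter makes the correct decision
    n: number of voters (odd, to avoid ties)
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )
```

With three voters who are each right 60% of the time, the majority is right about 65% of the time, better than any one of them acting alone. If each is right only 40% of the time, the majority is right about 35% of the time, which is one route to a "folly" of the crowds.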

In another paper, joint with Aidan Coville, I focus on how policymakers update and find the situation may actually be worse, because policymakers (and practitioners and researchers – no one should feel superior here!) do not update as Bayesians but are subject to several behavioural biases.

In summary, we talk a lot about “evidence-based” decisions, but making an evidence-based decision takes a lot more than just evidence. There remains a lot of low-hanging fruit in this research area.

*I argue that an impact evaluation is most useful for the highly uncertain, possibly highly effective projects, a straightforward and well-known result of the normal learning model.