
Is it better to build the evidence base or improve the decision-making process?

A key reason we do impact evaluations is to inform policy decisions. Building up the evidence base is absolutely crucial. However, in a paper leveraging AidGrade data combined with experimental data, I argue that we also shouldn’t neglect the decision-making process itself: improvements in how decisions are made can sometimes dominate the returns from conducting an impact evaluation.

This perhaps sounds crazy, and I’m not at all suggesting we abandon impact evaluations; they can be important tools. In the paper, I build a model of the returns to an impact evaluation (in terms of improved policy outcomes), assuming policymakers are Bayesian updaters with purely altruistic motives (caring only about the impact of the project on intended beneficiaries). I then gather the real priors of policymakers, practitioners, researchers, and a comparison group of MTurk workers and use these priors to estimate the returns to impact evaluations. Since most projects have fairly small impacts, the typical return to an impact evaluation is also very small.*
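To make the logic concrete, here is a minimal sketch of the kind of normal learning model involved (my own illustration, not the paper’s code, and all numbers are made up): a Bayesian policymaker with a normal prior over a program’s effect sees a noisy impact-evaluation estimate, updates, and funds the program only if the updated mean effect exceeds its cost.

```python
import numpy as np

def posterior(prior_mean, prior_var, signal, signal_var):
    """Normal-normal Bayesian update: returns posterior mean and variance."""
    post_var = 1 / (1 / prior_var + 1 / signal_var)
    post_mean = post_var * (prior_mean / prior_var + signal / signal_var)
    return post_mean, post_var

def expected_return_to_evaluation(prior_mean, prior_var, signal_var, cost,
                                  n_sims=100_000, seed=0):
    """Expected policy gain from running an impact evaluation, relative to
    deciding on the prior alone, for a Bayesian policymaker who funds the
    program only if the (posterior) mean effect exceeds its cost."""
    rng = np.random.default_rng(seed)
    true_effects = rng.normal(prior_mean, np.sqrt(prior_var), n_sims)
    signals = true_effects + rng.normal(0, np.sqrt(signal_var), n_sims)

    # Decision without an evaluation: fund iff the prior mean beats the cost.
    payoff_no_eval = np.where(prior_mean > cost, true_effects - cost, 0.0)

    # Decision with an evaluation: fund iff the posterior mean beats the cost.
    post_means, _ = posterior(prior_mean, prior_var, signals, signal_var)
    payoff_eval = np.where(post_means > cost, true_effects - cost, 0.0)

    return (payoff_eval - payoff_no_eval).mean()

# A project with a small expected impact and modest uncertainty:
print(expected_return_to_evaluation(prior_mean=0.05, prior_var=0.01,
                                    signal_var=0.02, cost=0.1))
# A highly uncertain, possibly highly effective project:
print(expected_return_to_evaluation(prior_mean=0.05, prior_var=0.25,
                                    signal_var=0.02, cost=0.1))
```

The second example previews the footnote below: the more uncertain the prior, the more an evaluation is worth.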

Thanks to the great advice of a referee, I also looked at different ways a decision could be made, comparing a single policymaker deciding as a dictator with a group using majority voting. There is a large literature on the wisdom of the crowds, and there has also been some work suggesting people’s priors can be better than meta-analysis results. There are many other ways in which decisions could be made, but even without considering more complicated decision rules, it is already apparent that changing how decisions are made can sometimes be more valuable than conducting an impact evaluation. Of course, this depends on the quality of the decision-makers: for the relatively poorly-informed MTurk subjects, I observed something like a “folly” of the crowds when considering how they would behave when faced with a particularly noisy signal of a program’s effects.
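Here is a stylized simulation of that comparison (again my own illustration, not the paper’s model, with invented numbers): each decision-maker’s stated belief is the group’s typical prior plus independent noise, and we compare how often a single dictator versus a majority vote makes the right funding call.

```python
import numpy as np

def prob_correct(true_effect, cost, prior_mean, belief_sd, n_voters=5,
                 n_sims=100_000, seed=1):
    """Probability of the right funding call under two decision rules: a single
    'dictator' versus majority voting. Each person's belief is the group's
    (possibly wrong) typical prior plus independent noise."""
    rng = np.random.default_rng(seed)
    beliefs = prior_mean + rng.normal(0, belief_sd, (n_sims, n_voters))
    should_fund = true_effect > cost

    dictator_funds = beliefs[:, 0] > cost
    majority_funds = (beliefs > cost).sum(axis=1) > n_voters / 2

    return {"dictator": (dictator_funds == should_fund).mean(),
            "majority": (majority_funds == should_fund).mean()}

# A well-informed group whose typical prior is on the right side of the truth:
# voting beats the dictator (wisdom of the crowds).
print(prob_correct(true_effect=0.15, cost=0.1, prior_mean=0.15, belief_sd=0.2))
# A poorly-informed group whose typical prior is on the wrong side: voting
# entrenches the error (a "folly" of the crowds).
print(prob_correct(true_effect=0.15, cost=0.1, prior_mean=0.05, belief_sd=0.2))
```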

In another paper, joint with Aidan Coville, I focus on how policymakers update and find the situation may actually be worse, because policymakers (and practitioners and researchers – no one should feel superior here!) do not update as Bayesians but are subject to several behavioural biases.

In summary, we talk a lot about “evidence-based” decisions, but making an evidence-based decision takes a lot more than just evidence. There remains a lot of low-hanging fruit in this research area.

*I argue that an impact evaluation is most useful for the highly uncertain, possibly highly effective projects, a straightforward and well-known result of the normal learning model.


Four reasons your study should collect priors

Several of my research projects have involved collecting priors from policymakers, practitioners and researchers (e.g. this and this). I think that collecting priors is quite important and undervalued in economics.

They have several uses:

1) They can help you prioritize outcomes or tweak other features of your design

If you know that there is more disagreement as to whether an intervention will affect a certain set of outcomes, you can focus your attention on that set of outcomes. This can help maximize learning and hopefully ensure your work is widely cited.

2) They help you avoid the problem that, regardless of what results you find, people say they knew it already

Have you ever done a study and then had people say they knew the results already, when you’re pretty sure they didn’t? It would be really nice to avoid this situation and keep your research from being overly discounted.

3) They enable learning about updating

If you collect priors, you can also collect posteriors and start to say something about how people interpret evidence and what behavioural biases a group of people might have, as in my paper with Aidan Coville on how policymakers, practitioners and researchers update. (A small sketch of this kind of comparison appears after this list.)

4) They can make null results more interesting

Researchers currently aren’t given much credit for null results, a problem that can lead to specification searching. However, if we know a priori that some null results were completely unexpected, they become more interesting and informative.
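Returning to point 3, here is a minimal sketch of how elicited priors and posteriors can be compared against a Bayesian benchmark; the elicited numbers below are placeholders, not data from any study.

```python
def bayesian_benchmark(prior_mean, prior_sd, estimate, se):
    """Posterior mean a Bayesian with a normal prior would hold after seeing a
    normally distributed study estimate with the given standard error."""
    w = (1 / prior_sd**2) / (1 / prior_sd**2 + 1 / se**2)
    return w * prior_mean + (1 - w) * estimate

# Hypothetical elicited beliefs for one respondent (placeholder numbers):
prior_mean, prior_sd = 0.10, 0.05      # stated prior about a program's effect
estimate, se = 0.30, 0.10              # new impact-evaluation result shown
elicited_posterior = 0.12              # posterior the respondent reports

benchmark = bayesian_benchmark(prior_mean, prior_sd, estimate, se)
print(f"Bayesian benchmark: {benchmark:.3f}, reported: {elicited_posterior:.3f}")
# A reported posterior much closer to the prior than the benchmark suggests
# under-updating; one much closer to the estimate suggests over-updating.
```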

 
For all these reasons, I am happy to say that, due to a SSMART-funded project that gathered researchers’ and policymakers’ priors regarding the size of various interventions’ impacts, the World Bank’s Development Impact Evaluation group (DIME) is now capturing priors across their portfolio of impact evaluations through their monitoring system. This should lead to a large corpus of priors that can be very helpful in the future.

What do you think? Have you heard of any other interesting work eliciting priors?


Clear opinions, weakly held

Recently I encountered the phrase “strong opinions, weakly held” — something advocated in the rationalist community. Some backstory for it is here. I am interested in considering the first part of the phrase and will ignore the “weakly held” portion, as I trust everyone agrees on the importance of being able to change their minds in the face of new evidence.

What could “strong opinions” mean? I see four possibilities:

Definition 1) Narrow priors (or posteriors, if you will, depending on which point in time you are considering)

Definition 2) Strongly stated opinions, in the sense of making a point forcefully

Definition 3) Strongly stated opinions, in the sense of making a point with precise language that accurately conveys one’s beliefs

Definition 4) Having an opinion at all, even if one’s beliefs entertain a wide range of possible outcomes (e.g. a uniform distribution over the entire space)

I can see several possible arguments for or against “strong opinions” in the sense of each of these definitions. Nonetheless, it is wholly unclear to me which arguments are typically being made, under which definitions. If, at a bare minimum, one would like statements to be made clearly, in the sense of Definition 3, presumably there are better ways of putting that. Given the sheer number of things it could mean, it is an ironic phrase. Perhaps it is better put as “clear opinions, weakly held”.


Comments

After some spam issues, I am re-enabling comments. Comments will be automatically locked 30 days after a post is made. I am also testing out a new spam filter – apologies if your comments get caught in it (e-mail me to let me know if that is the case).


Priors matter for optimal design of experiments

Banerjee, Chassang and Snowberg have an under-appreciated paper, “Decision-Theoretic Approaches to Experiment Design and External Validity”, that anyone who designs experiments should think about.

Some highlights:

1. Bayesians do not (if making policy decisions themselves) randomize

Suppose you are a Bayesian trying to maximize expected utility. There exists some set of assignments of individuals to the treatment group that would maximize your expected utility, and randomizing can sometimes land outside that set (e.g. if you are unlucky enough to draw an assignment with imbalance between the treatment and control groups along some observable characteristics). Therefore, randomizing would not be optimal. This is in the same spirit as Kasy (2013).
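A toy illustration of this point (my own, not the paper’s), using covariate balance as a stand-in for expected utility in the spirit of the Kasy (2013) reference: among all possible assignments of a small sample to treatment, a Bayesian would pick the best-balanced one deterministically, whereas randomization sometimes draws a badly imbalanced one.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)  # a baseline covariate for 8 experimental units

def imbalance(treated_idx):
    """Absolute difference in mean covariate between treatment and control."""
    treated = np.zeros(len(x), dtype=bool)
    treated[list(treated_idx)] = True
    return abs(x[treated].mean() - x[~treated].mean())

# Every way of assigning exactly 4 of the 8 units to treatment.
assignments = list(itertools.combinations(range(8), 4))

best = min(assignments, key=imbalance)                        # deterministic optimum
random_choice = assignments[rng.integers(len(assignments))]   # what randomization might draw

print("best-balanced assignment:", best, "imbalance:", round(imbalance(best), 3))
print("a random assignment:     ", random_choice,
      "imbalance:", round(imbalance(random_choice), 3))
```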

2. Priors matter

Apart from sometimes failing to obtain balance, randomizing may also not be optimal under some priors. They provide the example of a superintendent who believes that whether a student is from a poor or privileged background is the main determinant of educational outcomes, and who believes that those who go to private schools do better because they tend to be from privileged backgrounds, but who is open to testing whether private schools are helpful in and of themselves. The superintendent has the chance to enroll a single student in a private school. Clearly, they would not learn much by enrolling a privileged student in a private school; to learn the most, they should enroll a poor student.
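A small numerical version of this example (my own construction; the probabilities are made up): two hypotheses, “only background matters” versus “private schools also help”, and a binary pass/fail outcome. The expected reduction in uncertainty about which hypothesis is true is much larger from enrolling a poor student than a privileged one.

```python
from math import log2

def entropy(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def expected_info_gain(p_pass_background_only, p_pass_school_helps,
                       prior_school_helps=0.5):
    """Expected reduction (in bits) in the superintendent's uncertainty about
    whether private schools help, from one enrolled student's pass/fail outcome."""
    p = prior_school_helps
    p_pass = p * p_pass_school_helps + (1 - p) * p_pass_background_only
    post_if_pass = p * p_pass_school_helps / p_pass
    post_if_fail = p * (1 - p_pass_school_helps) / (1 - p_pass)
    expected_posterior_entropy = (p_pass * entropy(post_if_pass)
                                  + (1 - p_pass) * entropy(post_if_fail))
    return entropy(p) - expected_posterior_entropy

# A privileged student is expected to pass under either hypothesis, so their
# outcome is barely informative; a poor student's outcome separates the two.
print("privileged student:", expected_info_gain(0.90, 0.95))
print("poor student:      ", expected_info_gain(0.40, 0.70))
```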

3. The optimal experimental design depends on how the decisions are made

The experimenter may not be making the policy decision themselves. Rather, they may be trying to convince others (or trying to convince some small part of themselves that is uncertain, in the Knightian sense). Thus, when designing the experiment they need to place some weight on how much they want to convince themselves, given their own priors, vs. how much they want to convince someone else (or themselves, under ambiguity), given the other person’s priors. This depends on how the decisions are being made, e.g., whether it is a group of people with quite varied priors making a decision.

4. Randomizing is optimal when faced with an adversarial audience (given a sufficiently large sample size and assuming a maximin objective)

Suppose you care about the worst-case scenario: the decision-maker whose priors, given the experimenter’s chosen design, leave them with a greater chance of picking the wrong policy than anyone else with different priors.

In this situation, a randomized experiment is best (so long as the sample size is sufficiently large). Because it is not targeted towards people with any particular priors, it leaves less room for error: optimizing for some priors generally means making decisions worse for people with other priors. The qualification that the sample size must be sufficiently large is needed because, with small samples, randomization loses more power relative to the optimal deterministic experiment (again, think of covariate balance).
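The maximin logic can be stated compactly. Given a purely hypothetical table of how often each candidate design leads decision-makers with different priors to pick the wrong policy, the maximin rule picks the design whose worst case is least bad:

```python
# Hypothetical error probabilities: rows are candidate designs, columns are
# decision-makers with different priors. All numbers are invented purely to
# illustrate the maximin rule.
error_prob = {
    "targeted to prior A": {"prior A": 0.02, "prior B": 0.40, "prior C": 0.35},
    "targeted to prior B": {"prior A": 0.30, "prior B": 0.03, "prior C": 0.25},
    "randomized":          {"prior A": 0.10, "prior B": 0.12, "prior C": 0.11},
}

# Maximin: pick the design whose worst case (over priors) is least bad.
worst_case = {design: max(errors.values()) for design, errors in error_prob.items()}
best_design = min(worst_case, key=worst_case.get)

print(worst_case)
print("maximin choice:", best_design)  # the randomized design, in this toy table
```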

This paper is nice because, among other things, it helps explain why academics who face skeptical audiences randomize, while firms that do not face an adversarial audience but merely wish to learn for the sake of their own decision-making will experiment in smaller, more targeted ways, especially when the costs per participant are high. A key assumption in the current framework is the maximin objective, which may not always be what we care about.