
Reinhart-Rogoff and the problem with economics research

If you haven’t read about the Reinhart and Rogoff scandal, you can read about it here, here or here, among other places. In brief, a major paper was found to contain a number of errors, ranging from an Excel mistake to the questionable exclusion of several data points.

There was a lot of public outrage, but the response of economists was, in general, much more muted. Partly, I think, this is because economics is a small world and everyone knows everyone. Partly, it’s because nobody is particularly surprised; errors and even misrepresentations happen all the time.
As a discipline, we should be focusing on better correction mechanisms. But why is there such a big problem in economics in the first place, and what can we do about it?

First, regarding data. There is a big push to get people to share their data and their code, but the devil is in the details. It’s not enough to put data or code out there – you need someone to look at it to see whether or not it’s any good. Nobody wants to closely examine data or code unless the topic is really important and they are trying to replicate the results, and the incentives to replicate papers are very low.
Solutions? Journals that require authors to share their data and code are already doing a good job of at least encouraging some sharing. What is needed is more pressure and attention to what exactly is shared, along with feedback mechanisms to correct any mistakes. AidGrade has feedback mechanisms explicitly built into its meta-analysis protocols. More radically, there is a sort of “GitHub for research” on the way that would allow all the usual features of forking along with automated posting of data and code. The nice thing about this is that it could take choice out of the picture, both eliminating the hurdle of manually posting data and serving as a commitment device for openness.

Second problem: people are biased, and this bias can permeate their methods and affect their results, even unconsciously. It is easy enough, after running a regression, to think to run it on a subgroup or with different controls and, if you obtain a result that supports your priors, to think this regression comes closer to capturing reality. Donald Green describes the problems associated with this well.
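To make the mechanics concrete, here is a small simulation (my own illustration, not anything from Green’s work, and all numbers are hypothetical): every “test” below is run on pure noise, so the true effect is always zero, yet searching across ten subgroups turns up a “significant” result far more often than the nominal 5% level suggests.

```python
import random
import statistics

random.seed(42)

def noise_test(n=100):
    """Two-sample z-test on pure noise. The true effect is zero by
    construction, so any 'significant' result is a false positive."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return abs(z) > 1.96  # nominal 5% significance level

trials = 2000
n_subgroups = 10  # e.g. slicing by region, gender, age bracket...

# One pre-specified test per dataset: false positives should be ~5%.
single = sum(noise_test() for _ in range(trials)) / trials

# 'Fishing': declare success if ANY of ten subgroup tests comes up
# significant. Expected rate is roughly 1 - 0.95**10, about 40%.
fishing = sum(any(noise_test() for _ in range(n_subgroups))
              for _ in range(trials)) / trials

print(f"one pre-specified test: {single:.1%}")
print(f"best of {n_subgroups} subgroup tests: {fishing:.1%}")
```

The point is not that subgroup analysis is illegitimate, but that without a pre-analysis plan the reader cannot tell which of these two procedures generated the headline estimate.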

One thing that would help solve this problem is a pre-analysis registry where people can share their initial hypotheses and how they plan to test them. Then, if they deviate from them, at least we know and can consider the results in light of this.

There is already an effort by the AEA/J-PAL to have such a pre-analysis registry, and while it is a fantastic endeavour, it does not go far enough. It only accepts randomized controlled trials, which make up a very small share of development economics research, let alone economics research in general.

The other day I was looking for a place to post a pre-analysis plan for work I am doing. Since it was not an RCT, there was nowhere to post it. I tried signing up for the Open Science Framework but didn’t see a single public pre-analysis plan posted there. Though some may be hidden, that really is a shame, and it points to the fact that if people don’t have the incentive to do it, they won’t. So for now I am sharing my plans with friends, with the side benefit of getting feedback, but I would of course prefer for there to be a repository for these plans. Would anyone like to set one up with me? Let me know @evavivalt – this is the kind of work best done jointly, so spread the word.

Controversy in development economics

How to know you have hit upon a very controversial subject: two titans of development economics each castigate you for diametrically opposite reasons. Next time, I should let them fight directly!

I am trying to use AidGrade’s data to say something about the generalizability of impact evaluation results. I’m not coming in with an agenda, but basing this on the belief that:

1) People want to know what works. There are a lot of grandiose claims that impact evaluation can tell us this. Economists are usually very careful not to generalize from particular cases, knowing that results are heavily context-dependent and have no external validity. But there is also a sense in which we really do want to use the results to update our priors. We want to get something generalizable out of an impact evaluation; otherwise, why do one in the first place, if it only tells us how successful something was that will never occur again?

The extent to which results are generalizable is an empirical question. So long as people are extrapolating from past results, whether explicitly or with a wink and a nudge when trying to get policy makers to agree to a new impact evaluation, we’d better at least know how generalizable the results are. You can say we know they aren’t generalizable. Fine. People still talk as though they are, and results are likely to be generalizable to some non-zero degree, so what is that degree?

2) There are undoubtedly contexts under which results are more or less generalizable in practice. For example, since a lot of people don’t want to randomize, I suspect that RCTs may be done in “weirder” situations than quasi-experimental studies. I wonder if we can see this in the results. Some causal chains between interventions and outcomes may also be more complicated than others. The theory here is quite clear – why not test it out?

Unfortunately, I’m stuck between a rock and a hard place. Some say it goes too far, others not far enough. I’m a fan of impact evaluations for what they do tell us about human behaviour. I also think they are often vastly overpriced and that many but not all of them would be more helpful were they to give more immediate, actionable feedback to the project implementer.

I’m not actually interested in participating in the War of the Randomistas. But when a war goes on, it seems that some on either side have hammers and everything looks like a nail.

If special interests kill it, so be it, but practically speaking it’s an important issue.

Big on BITSS

Briefly: BITSS is highly necessary.

As mentioned in the new transparency series on CEGA’s blog, there are a lot of problems in the discipline today. Focusing on interaction terms or particular subgroups is one way of increasing the odds of obtaining the elusive 5% significance level, so a lot of people do it, but this overstates the results’ true significance. Something can always appear significant by accident, and tests of significance are only legitimate if they are defined beforehand.
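The arithmetic behind “appear significant by accident” is worth spelling out: with k independent tests, each run at the 5% level on a true null, the chance that at least one comes up significant is 1 - (1 - 0.05)^k. A short sketch (illustrative numbers only):

```python
# Chance of at least one false positive among k independent 5%-level
# tests when every null hypothesis is actually true.
alpha = 0.05
for k in (1, 5, 10, 20):
    print(f"{k:2d} tests: {1 - (1 - alpha) ** k:.0%}")
# prints 5%, 23%, 40%, 64% respectively
```

By twenty specifications, a spurious “result” is more likely than not, which is why defining the test beforehand matters so much.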

Because of this issue, AidGrade has been forced to take a very conservative stance with regard to the values it collects from studies. In the absence of pre-analysis plans, we have avoided collecting interaction terms and have also focused on results containing few controls (another way that people can lie with statistics). This is obviously a shame, because there is a real sense in which these terms could be important. However, in the absence of pre-analysis plans, this is the best we can do in order to avoid fraud.

Here’s hoping the discipline will change with all the new attention brought to bear on this topic!

Academic proposal

This is a very old idea that I’m putting out in the public domain so that someone can use it. I’ve already shared it with several academics, but so far nobody has tried to use it, to my knowledge…

For dual-academic couples: one person publishes an academic article and puts in the acknowledgements “X, will you marry me?” The other person then has to either publish another article with their reply or perhaps write up their reply as a comment to the first article!

I was reminded of it when I saw that this is making the rounds. Yes, I know, my idea takes a lot longer!

Call for feedback

Over 100 million impressions. 50,000 tweets. Twitter dump here.

This might be the last update of the Twitter dump, as others have continued to scrape the tweets and have added in some of the data from the older tweets. Please let me know if you’d like the occasional update to continue here in .csv form (e.g. for some of the other data contained in the tweets beyond the scraped PDFs).