A note on pre-analysis plans in economics

I developed a tool, EarlyReview, which helps researchers screen their pre-analysis plans and registered reports for completeness, clarity, and consistency prior to registration.

The tool is designed to help researchers develop better plans. While anyone can upload their plan to their favourite LLM, EarlyReview was extensively tested to provide helpful comments. It also lets researchers whose plans “pass” the screening generate a public page that they can then share with funders or journals as an indicator that it cleared these third-party checks.

In testing it out on a random sample of publicly-available plans from the AEA RCT Registry, several things stood out to me. Here is a quick summary of the dimensions along which researchers seem to do well vs. poorly, with the percent of plans that passed related checks:

Excellent (100%): All plans provided basic information on what the study is about. This includes things like a description of the research question, design type (perhaps easy on the AEA RCT Registry… it’s an RCT!), unit of randomization, treatment arms, and unit of analysis.

Good (85-95%): Plans generally – but not universally – included things like a clear description of the target population, the assignment mechanism, the type of estimate (ITT/etc.), the sample size, covariates, and a regression equation. It may be a bit surprising that not every plan included all of these: the registry requires that some of them be entered into the website directly, so they may be present there but nonetheless not described in the pre-analysis plan itself.

Often problematic (55-70%): Here it gets a little dicier. Many plans lacked a good strategy for dealing with attrition or were unclear on how they would cluster standard errors, and there were also frequent issues with how the primary outcomes or main specifications were described. While a regression equation was generally present, as noted above, a plan could still be flagged on its main specification if it contained inconsistencies or was unclear about which of several approaches would serve as the main one. Similarly, primary outcomes might be described in the document, but a lack of clarity or inconsistency across the pre-analysis plan could raise a flag.

Yikes (<50%): How authors plan to deal with multiple hypothesis testing is often unspecified or unclear, and plans often lack credible power calculations or information on minimum detectable effects (MDEs). Again, a plan flagged on power calculations may well have contained them – but with errors, or with assumptions that were unrealistic given other information in the plan.
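For reference, the kind of power calculation these checks look for is easy to sanity-check. Here is a minimal sketch (the function name and defaults are mine, not EarlyReview’s) of the standard minimum detectable effect formula for a two-arm, individually randomized trial with equal allocation:

```python
import math
from statistics import NormalDist

def mde(n_per_arm, sd, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-arm trial with equal allocation.

    Standard formula: (z_{1-alpha/2} + z_{power}) * sd * sqrt(2 / n_per_arm).
    """
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * sd * math.sqrt(2 / n_per_arm)

# With 100 units per arm and an outcome SD of 1, the MDE is about 0.40 SD:
print(round(mde(100, 1.0), 2))  # 0.4
```

With clustered assignment, one would further inflate the MDE by the square root of the design effect, 1 + (m − 1)·ICC for clusters of size m – plans flagged on clustering often omit exactly this adjustment.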

The average pre-analysis plan passed about 80% of checks. Very few plans tested passed all the checks.

One could wonder whether EarlyReview’s screening tool is simply missing things in the pre-analysis plans – that they would actually do much better but for some fault of the tool. This is plausible, but I spot-checked the sample of plans. Ones that failed many checks were indeed pretty basic, such as just providing a list of hypotheses on a single page. (It is likely that as authors upload more information to the AEA RCT Registry directly, that extra information in conjunction with the posted pre-analysis plans would help more of these plans clear all of the checks.) I also ran the same tests on the Journal of Development Economics’ Registered Reports, and they performed much better, as one might expect given that registered reports tend to be more detailed than pre-analysis plans.[1] Caveats remain.[2]

In the long run, I expect that research pipelines will develop with increased formalism and checks, but hopefully other AI tools will make it less burdensome to create plans, too. The easier it is to run AI-assisted experiments, the more helpful screening becomes. I’m happy to work with journal editors and others on further AI-based tools for the new research paradigm.

Anyway, try it out for yourself – uploading a pre-analysis plan or registered report is free and confidential. I welcome feedback that can improve the tool for others. I’m also still looking for examples to highlight on the site – please get in touch if you’re willing to publicly share your report in exchange for more free credits.


[1] Important note: The approach is currently tuned to be generous to authors. In early testing, far fewer JDE RRs were passing all checks; being wary of the limitations of AI-based screening, I intentionally made the screening more conservative so that almost all JDE RRs “pass”. The above stats, in which very few AEA RCT Registry pre-analysis plans pass all checks, are based on the same approach under which almost all JDE RRs pass.

We can debate when it is appropriate to be generous. My thinking at the moment is that the point of such a tool is to provide useful feedback through comments (there are typically many more potential issues flagged in comments, which do not prevent a paper from “clearing” the core checks), while nudging the field in the right direction. “Nudging the field in the right direction” requires generosity, imo. The approach is versioned; I’m thinking of adding an option for users who want comments that flag more potential issues.

[2] In particular: 1) AI-based tools are not perfect. I think the results above are more reliable than what one would have obtained by asking humans to hand-code the same fields, but care is nonetheless warranted in interpretation; 2) the above analysis was done on the documents that could be easily processed. A small share of randomly-selected AEA RCT Registry pre-analysis plans and JDE Registered Reports could not be processed and were replaced. For the JDE RRs, this was because of file size; for the pre-analysis plans, it was because not all were PDFs. I imagine that the longer RR documents would, all else equal, potentially contain even more detail – but they could also contain more inconsistencies.


Pre-doctoral fellow position

I am looking to hire 1-2 pre-doctoral fellows at the University of Toronto in applied microeconomics. The deadline is February 24, 2026.

Some possible projects:

  • Work on guaranteed income
  • Projects relating to evidence-based decision-making
  • Forecasting research
  • AI for science

Pre-docs may work on one or more of the topics above, depending on fit, availability, and interests. I also encourage pre-docs to develop and propose their own ideas (subject to finding a good fit).

Eligibility

Applicants must, at minimum, have:

  • A bachelor’s degree (or be graduating this year);
  • Experience in R or Stata;
  • Work authorization in Canada, whether by citizenship or an open work permit. This is strictly required. In practice, citizens of several countries can often obtain an open work permit. In the pre-interview application screener, you will see some questions designed to help you figure out if you might be eligible (though you may need to do further investigation on your own).

The ideal candidate would have:

  • A strong quantitative background and potentially a master’s degree;
  • Proficiency with more than one programming language;
  • Familiarity with ML;
  • Familiarity with LLM coding tools;
  • Previous research experience, such as through past research assistantships or an independent research project;
  • An interest in pursuing a PhD in Economics or a related field.

To apply:

To apply, please fill out a pre-interview screener here, including uploading a transcript and CV. Only shortlisted applicants will be contacted for an interview.


How much do policymakers and policy practitioners weigh context and local expertise when making decisions?

One of my papers with Aidan Coville of the World Bank and Sampada KC, a former pre-doctoral student, is now out at the Journal of Development Economics.

In this piece (“Local knowledge, formal evidence, and policy decisions”), we conduct discrete choice experiments with both policymakers (staff at various LMIC government line agencies – largely those with decision-making power over programs, plus monitoring and evaluation specialists) and policy practitioners (World Bank and Inter-American Development Bank staff). We painstakingly collected these data at World Bank and IDB impact evaluation workshops. The benefit of collecting data at these workshops is that the sampling frame was demonstrably interested in evidence-based policy, and we obtained relatively high response rates (82% overall, and 91% at the World Bank workshops).

Participants were asked which of two hypothetical programs they would prefer to increase enrollment rates. The programs had impact evaluation results attached to them and varied by the following attributes:

  • Estimated impact (0, +5, or +10 pp)
  • Study design (observational, quasi-experimental, or RCT)
  • Study location (same country, same region, or other)
  • Precision (±1 pp vs. ±10 pp confidence interval)
  • Local expert recommendation (recommended by a local expert or not)

One difference between this study and earlier work is that by asking participants which programs they would prefer (rather than just which studies they would be interested in), we can estimate participants’ willingness-to-pay for different kinds of supporting evidence, measured in how much estimated impact they would be willing to give up. As described in the paper, program impacts are a nice yardstick for assessing tradeoffs since they are similar to public budgets, representing social costs that depend on the policymaker’s choices.
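In a discrete choice setting, this willingness-to-pay is just a ratio of estimated coefficients: the utility weight on an attribute divided by the weight on a percentage point of estimated impact. A toy illustration – the coefficient values below are hypothetical, chosen only so the ratio matches the 5 pp figure discussed next, and are not the paper’s estimates:

```python
def wtp_in_impact_pp(beta_attribute, beta_impact_per_pp):
    """Willingness-to-pay for an attribute, denominated in percentage
    points of estimated program impact: a ratio of choice-model coefficients."""
    return beta_attribute / beta_impact_per_pp

# Hypothetical conditional-logit coefficients (not the paper's estimates):
beta_expert = 0.50   # utility weight on "recommended by a local expert"
beta_impact = 0.10   # utility weight per percentage point of estimated impact

print(round(wtp_in_impact_pp(beta_expert, beta_impact), 2))  # 5.0
```

Because both coefficients come from the same choice model, the utility units cancel, leaving an answer in percentage points of impact – which is what makes program impact a convenient yardstick.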

We find that local expert advice matters so much – 5 pp on average – that it can often outweigh estimated differences in program impacts. Policymakers have a similarly strong preference for programs with an impact evaluation from the same country. Clearly, there is a strong desire for locally-relevant evidence.

The bottom line: If researchers want their studies to have the maximal impact, they should try to run the study in as close to the target setting as possible and stay in close communication with local experts.

Side perk: While we didn’t set up the study intending to focus on how much policymakers and policy practitioners care about statistical significance, we could crudely capture this through the data we gathered. When we generate an indicator of whether a program was associated with a statistically significant effect or not and include it in the regressions, it seems to mostly drive the preference for results with large effect sizes and small confidence intervals. To our knowledge this is the first study to consider such preferences.
