
Do randomized controlled trials engage in less specification searching?

An excerpt from ongoing work based on AidGrade’s database of impact evaluation results in development economics.

These are results from caliper tests, which compare the number of results just above a critical threshold (t = 1.96) with the number just below it. You can vary the width of the band; for example, a 5% caliper would look at the range 1.862 – 2.058, that is, 1.96 plus or minus 5%. Absent any selection, results within a narrow band should fall on either side of the threshold about equally often. If you instead see a jump at 1.96, you might suspect specification searching, in which researchers report only the results they like, biasing the results.

Caliper width     Over   Under   p-value   Significance
All studies
  2.5% Caliper      45      26      0.02   <0.05
  5% Caliper        73      51      0.03   <0.05
  10% Caliper      127     117      0.28
  15% Caliper      182     185      0.58
  20% Caliper      220     231      0.71
RCTs
  2.5% Caliper      24      14      0.07   <0.10
  5% Caliper        35      28      0.22
  10% Caliper       64      68      0.67
  15% Caliper       97     107      0.78
  20% Caliper      119     134      0.84
Non-RCTs
  2.5% Caliper      21      12      0.08   <0.10
  5% Caliper        38      23      0.04   <0.05
  10% Caliper       63      49      0.11
  15% Caliper       85      78      0.32
  20% Caliper      101      97      0.42
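
For anyone who wants to try this kind of count themselves, here is a minimal Python sketch of a caliper test as described above. It is an illustration under stated assumptions, not the code behind the table: the function names are made up for this example, and the p-value is assumed to come from a one-sided binomial test of whether more than half of the results in the band fall at or above the threshold, which the post does not specify.

```python
from math import comb


def caliper_band(threshold=1.96, width=0.05):
    """Return the (lower, upper) bounds of a caliper around the threshold.

    A 5% caliper around 1.96 spans 1.96 * 0.95 to 1.96 * 1.05.
    """
    return threshold * (1 - width), threshold * (1 + width)


def caliper_counts(t_stats, threshold=1.96, width=0.05):
    """Count results just over and just under the threshold.

    Uses absolute t-statistics; a result exactly at the threshold
    is counted as "over" (an assumption made for this sketch).
    """
    lower, upper = caliper_band(threshold, width)
    over = sum(1 for t in t_stats if threshold <= abs(t) <= upper)
    under = sum(1 for t in t_stats if lower <= abs(t) < threshold)
    return over, under


def one_sided_binomial_p(over, under):
    """P(X >= over) for X ~ Binomial(over + under, 0.5).

    Assumed test: under no specification searching, a result inside
    the band is equally likely to land on either side of the threshold.
    """
    n = over + under
    return sum(comb(n, k) for k in range(over, n + 1)) / 2 ** n


if __name__ == "__main__":
    # Hypothetical t-statistics, purely for illustration.
    sample = [1.90, 1.97, 2.00, 2.50, -2.01, 1.50]
    over, under = caliper_counts(sample, width=0.05)
    print(over, under, one_sided_binomial_p(over, under))
```

Under this assumed test, plugging in the 2.5% caliper counts for all studies (45 over, 26 under) gives a p-value of roughly 0.02. A wider caliper brings in more results but also more that were never near the threshold, which is one reason the jump fades as the band widens.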

Okay, there seems to be a jump. Possibly more among quasi-experimental studies than among RCTs.

Overall, though, this jump is actually quite small. Gerber and Malhotra did the same kinds of tests for political science and sociology. They used different selection criteria when gathering their papers, essentially maximizing the probability they would see a jump, but take a look at their numbers:

Political science:

Caliper width       Over   Under   p-value
A. APSR
  Vol. 89-101
    10% Caliper       49      15    <0.001
    15% Caliper       67      23    <0.001
    20% Caliper       83      33    <0.001
  Vol. 96-101
    10% Caliper       36      11    <0.001
    15% Caliper       46      17    <0.001
    20% Caliper       55      21    <0.001
  Vol. 89-95
    10% Caliper       13       4      0.02
    15% Caliper       21       6     0.003
    20% Caliper       28      12     0.008
B. AJPS
  Vol. 39-51
    10% Caliper       90      38    <0.001
    15% Caliper      128      66    <0.001
    20% Caliper      165      95    <0.001
  Vol. 46-51
    10% Caliper       56      25    <0.001
    15% Caliper       80      45     0.001
    20% Caliper      105      66     0.002
  Vol. 39-45
    10% Caliper       34      13     0.002
    15% Caliper       48      21    <0.001
    20% Caliper       60      29    <0.001

Sociology:

Caliper width     Over   Under   p-value
ASR (Vols. 68-70)
  5% Caliper        15       4      0.01
  10% Caliper       26      15      0.06
  15% Caliper       47      17    <0.001
  20% Caliper       54      19    <0.001
AJS (Vols. 109-111)
  5% Caliper        16       4     0.006
  10% Caliper       25      11      0.01
  15% Caliper       41      14    <0.001
  20% Caliper       48      18    <0.001
TSQ (Vols. 44-46)
  5% Caliper        13       4      0.02
  10% Caliper       22       7     0.004
  15% Caliper       26      11      0.01
  20% Caliper       30      20       0.1
Combined (recent vols.)
  5% Caliper        44      12    <0.001
  10% Caliper       73      33    <0.001
  15% Caliper      114      42    <0.001
  20% Caliper      132      57    <0.001
ASR (Vols. 58-60)
  5% Caliper        17       2    <0.001
  10% Caliper       22       5    <0.001
  15% Caliper       27      11     0.007
  20% Caliper       30      15      0.02

Wow! Economics is not doing so badly after all! (Some public health papers are also included, but results are comparable if you break them out.) To match Gerber and Malhotra, these tables count results rather than papers, and some papers report more than one result, so there are subtleties here that I get into in the longer working paper. Data are still being gathered, and there is much more to be said on this topic. If you’d like to see more of this kind of work on research credibility, please support us in the last few days of our Indiegogo campaign!

