When replication goes awry

How can it happen that, when data are combined across experiments, promising p-values vanish and treatment groups look indistinguishable? It happens often enough that PhD biology students have been warned by their supervisors against combining data from experiments. But surely experiments that show a similar trend should show a stronger trend when combined? Absolutely, say the statisticians in the Statistical Consulting Unit. Replication is an essential component of good experimental design, and the information gained from each replicate experiment should be added to the growing body of evidence.

But merging the data and running a naïve analysis such as a t-test can give misleading results. A naïve analysis fails to distinguish within-experiment variability from between-experiment variability. Between-experiment variability can be large relative to the observed treatment differences, introducing noise that obscures true treatment effects. Batch effects, different environmental conditions, different reagents and different machine runs are just some of the sources of unwanted between-experiment noise.
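As an illustration, the following Python sketch simulates this situation with entirely hypothetical data (the numbers are chosen only to make the point): the treatment effect is the same in every experiment, but each experiment also carries its own batch shift, and a pooled t-test struggles to detect the effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, n_per_group = 4, 6
true_effect = 1.0  # the same treatment effect within every experiment

control, treated = [], []
for _ in range(n_experiments):
    batch = rng.normal(0, 5)  # large between-experiment (batch) shift
    control.append(batch + rng.normal(0, 1, n_per_group))
    treated.append(batch + true_effect + rng.normal(0, 1, n_per_group))

control = np.concatenate(control)
treated = np.concatenate(treated)

# Naive pooled t-test: the batch shifts inflate the variance, so the
# consistent within-experiment effect is hard to detect.
t_stat, p_value = stats.ttest_ind(treated, control)
print(f"pooled t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
```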

An intelligent statistical analysis recognises that between-experiment variation is irrelevant to the treatment comparisons made within an experiment. Data are not just numbers; data have structure, and that structure should be incorporated into the analysis. In this case, the experiment is treated as a blocking factor, and data are collected within each block. When this structure is incorporated into the analysis, between-block variation is removed as a noise component. Consistent treatment effects become more statistically significant in the combined sample, and the standard errors around the treatment means shrink.
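A minimal sketch of such a blocked analysis, using the same kind of hypothetical simulated data as above: including experiment as a factor in a linear model absorbs the batch shifts into the block terms, so the treatment comparison is judged against within-experiment noise only. (A linear model with a fixed block effect is one common way to do this; a consultant can advise on the right model for your data.)

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_experiments, n_per_group, true_effect = 4, 6, 1.0

rows = []
for exp in range(n_experiments):
    batch = rng.normal(0, 5)  # the same between-experiment shift as before
    for group, shift in (("control", 0.0), ("treated", true_effect)):
        for y in batch + shift + rng.normal(0, 1, n_per_group):
            rows.append({"experiment": exp, "group": group, "y": float(y)})
df = pd.DataFrame(rows)

# C(experiment) enters the model as a blocking factor and absorbs the batch
# shifts; the C(group) coefficient then estimates the treatment effect
# against within-experiment noise only.
fit = smf.ols("y ~ C(experiment) + C(group)", data=df).fit()
print(fit.summary().tables[1])
```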

To find out more about how you can combine similar experiments to strengthen the evidence for treatment effects, make an appointment to see a statistical consultant at the Statistical Consulting Unit.