Unplanned analyses, data dredging and cherry picking

Posted on 15 October 2020

Dr Alicia Shaw, Innovation and Evaluation Lead at the IEE

This is a blog about data analysis, and specifically about the dangers of carrying out unplanned analyses on results from scientific research. I know this might sound like a bit of a niche topic, but whenever you’re reading about research in education (or in other fields) it’s important to be aware of how researchers planned to analyse their data, and then what they actually did, as this will influence how much trust you can place in the conclusions being drawn.

In an ideal world, researchers should plan their research thoroughly before they start their data collection and stick to this plan as closely as possible. This includes making decisions about how they will select participants, what the participants will do, what measures they will use and how the data they gather will be analysed. Doing this prevents researchers from making decisions as they go along in order to make their results look more positive. Of course, research doesn’t always go as planned – schools drop out of trials half-way through an evaluation, there is a delay in the publisher sending the test that was going to be used – but by being transparent about any changes to pre-specified plans, the research remains objective and readers can draw informed conclusions.

While the question of “what should I do with the numbers?” can feel like an issue to resolve once all of the data come in, it is important that data analysis is included in the plans written before the research begins. This prevents people from carrying out lots of unplanned analyses on all sorts of participant characteristics and outcomes (data dredging) and then only reporting the ones which support the original hypothesis or show their intervention in a positive light (cherry picking). Data dredging is a problematic technique because some positive results will be found purely by chance. Most researchers carry out statistical tests using a significance level of 0.05 – that is, they treat a result as significant only if there is less than a 5% probability of seeing it purely by chance when no real relationship exists. However, this also means that around 5% of tests on genuinely unrelated variables will come out statistically significant just by chance. So if 1,000 such analyses are carried out, around 50 are likely to appear significant despite the variables having no real relation to each other.
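
To see this arithmetic in action, here is a minimal simulation (my own sketch, not taken from the post or from any particular study): it runs 1,000 t-tests comparing two groups of “pupils” drawn from exactly the same distribution, so every “significant” result is a false positive. The group size, random seed and choice of test are arbitrary assumptions for illustration.

    # Minimal sketch: many significance tests on pure noise.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=0)
    n_tests = 1_000   # number of unplanned analyses
    alpha = 0.05      # conventional significance level

    false_positives = 0
    for _ in range(n_tests):
        # Two groups of 30 "pupils" drawn from the same distribution,
        # so any difference between them is pure chance.
        group_a = rng.normal(loc=100, scale=15, size=30)
        group_b = rng.normal(loc=100, scale=15, size=30)
        _, p_value = stats.ttest_ind(group_a, group_b)
        if p_value < alpha:
            false_positives += 1

    print(f"{false_positives} of {n_tests} tests were 'significant' by chance alone")
    # Typically prints a number close to 50 - roughly 5% of the tests.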

It is easy to see how large-scale data dredging could occur in epidemiology or nutrition research, where huge data sets are analysed to look for patterns between people’s behaviour and their health outcomes. But it can also be a problem in education research. Consider an evaluation of a reading intervention which uses four measures of reading (one each for accuracy, comprehension, fluency and phonological awareness). It analyses the data taking lots of different pupil characteristics into account: gender; Pupil Premium status; prior attainment; special educational needs status; home language; ethnicity. By analysing the outcomes of the four reading measures against all of these characteristics, and then with combinations of these characteristics, it would be easy to carry out hundreds, if not thousands, of analyses. If researchers have not stated in advance which analyses they will carry out, they could confidently report the statistically significant results (“This reading intervention has a positive impact on the reading accuracy of low-prior-attaining white British boys who do not have identified special educational needs”) while ignoring the hundreds of relationships analysed which were not positive or statistically significant.
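
To give a rough sense of how quickly the number of possible analyses grows, here is a short counting sketch; the number of categories assumed for each characteristic is my own guess for illustration, not taken from any real evaluation.

    # Rough count of subgroup-by-measure analyses (assumed category counts).
    from itertools import combinations
    from math import prod

    measures = 4  # accuracy, comprehension, fluency, phonological awareness

    # Assumed number of categories for each pupil characteristic.
    levels = {
        "gender": 2,
        "Pupil Premium": 2,
        "prior attainment": 3,   # e.g. low / middle / high
        "SEN status": 2,
        "home language": 2,      # e.g. English / other
        "ethnicity": 5,          # a simplified grouping
    }

    total_subgroups = 0
    # Every non-empty combination of characteristics defines a set of subgroups.
    for r in range(1, len(levels) + 1):
        for combo in combinations(levels.values(), r):
            total_subgroups += prod(combo)

    print(f"Subgroups: {total_subgroups}")
    print(f"Possible subgroup analyses: {total_subgroups * measures}")

With these (invented) category counts, the sketch gives nearly 2,000 subgroups and almost 8,000 subgroup-by-measure analyses – more than enough opportunities for chance findings.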

Another issue is p-hacking, where decisions are made about how the data will be analysed during the analysis process; this might include deciding whether to exclude some participants’ data from the analysis or which statistical tests to use. Making these decisions as you go along can have obvious advantages for unscrupulous researchers in search of positive findings!
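
Here is a minimal sketch of why this flexibility matters: on pure noise, trying a handful of “reasonable” analysis choices and reporting whichever gives the smallest p-value pushes the false positive rate above the nominal 5%. The particular choices below (switching tests, dropping “outliers”) are illustrative assumptions, not anyone’s actual method.

    # Minimal sketch of p-hacking: try several analyses, keep the best p-value.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=1)
    alpha, n_simulations = 0.05, 2_000
    hacked_hits = 0

    for _ in range(n_simulations):
        a = rng.normal(size=40)
        b = rng.normal(size=40)  # same distribution, so there is no real effect
        p_values = [
            stats.ttest_ind(a, b).pvalue,                                # standard t-test
            stats.mannwhitneyu(a, b, alternative="two-sided").pvalue,    # switch the test
            stats.ttest_ind(a[np.abs(a) < 2], b[np.abs(b) < 2]).pvalue,  # drop "outliers"
        ]
        if min(p_values) < alpha:  # report whichever analysis "worked"
            hacked_hits += 1

    print(f"False positive rate with flexible analysis: {hacked_hits / n_simulations:.1%}")
    # Typically comes out above the nominal 5%, even though no real effect ever exists.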

I can hear you thinking “But surely this is just a dry, methodological issue. Does it actually have anything to do with teachers?” Well, yes, there are several ways the uncritical reporting of unplanned data analyses can affect the pupils in your schools and society more broadly.

Researchers and education publishers are under pressure to make their results look as positive as possible. There are a number of reasons for this, including publication bias in academia and the fact that positive results make great marketing tools for publishers. They therefore have an incentive to cherry pick positive findings and to ignore or downplay any negative findings. When misleading results are published, schools, teachers, and pupils can be negatively affected:

  • Conclusions drawn from cherry-picked or p-hacked results are likely to be more positive than the impact you would typically see in school. This makes it difficult for school staff to make informed decisions and may lead to schools buying interventions that are ineffective or teachers making choices that are not the best option for their class. This will cost schools time and resources that could be used more effectively.
  • Not reporting data analyses that show negative findings can cause teachers to make choices without understanding the possible negative consequences. This can lead to schools wasting precious time and resources on an approach that could potentially harm the learning of some pupils.
  • Only reporting positive results can also waste the time of other researchers. Right now, researchers are probably replicating research which has already been conducted but isn’t accessible, instead of carrying out new research with the potential to have beneficial consequences for your pupils.

What’s more, unplanned data analyses can contribute to contradictory advice from different research papers and a general distrust of science. If it’s possible to play with the data until it tells researchers what they want, why should any of us believe anything that scientists or statisticians tell us? And when false positive findings find their way into systematic reviews and meta-analyses, they can distort our understanding of a topic for years to come.

So what can and should be done about this? Researchers should publish their data analysis plans before starting any research. They should acknowledge and explain any changes to these plans when writing up their findings, and draw tentative conclusions from unplanned analyses. And these are my top tips for deciding how much to trust the data analysis when you are reading research:

  • Look to see whether the researcher published a research protocol or plan before starting their research. Check whether their planned data analysis matches up with the data they report; if it does, you can be fairly confident that the researcher hasn’t engaged in data dredging or p-hacking.
  • If the analyses carried out are different from those in the plan but the researchers have acknowledged this and explained why they analysed the data as they did, that’s OK. Just remember not to put as much weight on the unplanned analyses as you would on planned analyses.
  • Researchers often don’t pre-publish a research plan. In this case you should be a little cautious, especially where results are reported for sub-groups of participants or sub-tests of measures.
  • If you can’t find a pre-published plan and there are lots of seemingly disjointed analyses in the results section, all of which are positive, be extremely cautious about accepting the results.
