Saturday 25 August 2012

Replication Alone Is Not Enough

Psychology has lately been hit by high-profile fraud scandals, and broader concerns over questionable research practices. Now the Society for Personality and Social Psychology (SPSP) has released a statement on "Responsible Conduct", and a task force has produced a report.

This is a start, and the SPSP is to be commended for facing up to these problems (which affect many other fields) relatively early. However, neither of their documents contains much meat in my view.

Point One of the task force report is that "Replication is the key to building our science", and they suggest a "web site for depositing replications and failures to replicate" - but they don't mention that various enterprising researchers have already made one. Nor do they tip their hats to the Open Science Initiative, which is addressing just this issue. This makes me worried that they're planning to reinvent the wheel.

More fundamentally, I disagree that replication is key to psychology or any field. Our goal should be replicability. Failure to replicate findings is a symptom of problems with those original findings, rather than a problem in and of itself. Good results replicate; we want better results to be published.

In other words, we should strike at the root cause of invalid research, namely, the perverse incentives towards publishing as many eye-catching positive results with p-values below 0.05 as possible, by any means necessary. P-value fishing, selective reporting, post-hoc "prior hypotheses" and other questionable practices are a large part of what makes results unreplicable.
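To make that concrete, here's a minimal simulation sketch (my own illustration, not anything from the SPSP documents; it assumes numpy and scipy are available). Imagine a lab that measures ten uncorrelated outcomes with no true effect anywhere, and reports a "finding" whenever any outcome crosses p < .05. Such findings turn up in roughly 40% of studies - and none of them should be expected to replicate.

# Sketch only: how testing many outcomes on null data inflates the chance
# of a "significant" finding that won't replicate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_outcomes, n_per_group = 1000, 10, 30

false_positive_studies = 0
for _ in range(n_studies):
    # Two groups drawn from the same population: every true effect is zero.
    a = rng.normal(size=(n_outcomes, n_per_group))
    b = rng.normal(size=(n_outcomes, n_per_group))
    pvals = stats.ttest_ind(a, b, axis=1).pvalue
    # Selective reporting: count the study as "positive" if any outcome hits p < .05.
    if (pvals < 0.05).any():
        false_positive_studies += 1

print(f"Studies with at least one p < .05: {false_positive_studies / n_studies:.0%}")
# Roughly 1 - 0.95**10, i.e. about 40%, not the nominal 5%.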

We should encourage replication, but it's no panacea.

An overemphasis on replication, without addressing the incentives, could actually harm science. It could lead to scientists spending all their time worrying about the political drama of who's replicating whom and why, and which questionable practices they can use to replicate their friends' data - rather than actually doing science.

This is why we shouldn't be satisfied with any reform effort that puts replication before replicability. If you can fudge a result, you can fudge a replication. How to fight questionable practices is another question, but I've proposed reforms that I think would work, namely pre-registration of hypotheses, methods, and statistical analyses. Others have their own ideas.

There's a lesson from clinical medicine here. Clinical trials of new drugs adopted pre-registration, but only after the field tried relying on replication and it didn't work. Pharmaceutical regulators have long required multiple demonstrations of drug efficacy - one trial was not enough. Sounds good - but the problem was that drug companies just ran lots of trials and analyses, picked the positive ones, and used them.
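To put rough numbers on that (my own back-of-the-envelope illustration, not anything from the regulators): if a completely ineffective drug is tested in k independent trials, the chance that at least one comes out "positive" at p < .05 is 1 - 0.95^k, which climbs fast.

# Sketch only: probability of at least one false-positive trial for a useless drug.
for k in (1, 2, 5, 10, 20):
    print(f"{k:2d} trials -> P(at least one 'positive') = {1 - 0.95**k:.0%}")
# 20 trials of an ineffective drug give a ~64% chance of at least one "positive"
# result to submit, unless every trial is registered in advance and counted.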

So in summary: replication is important, and we don't do enough of it, but replication alone is not enough to fix psychology.

15 comments:

Zen Faulkes said...

Excellent stuff. Not much more to add at the moment.

Science Refinery said...

An often overlooked but important distinction (between replication and replicability)!

Wintz said...

Good stuff. I'm surprised I didn't come across your pre-registration idea sooner (considering I regularly read this blog), especially as I proposed something similar here.

Anonymous said...

Completely agree with your post. Replication is not the problem - it's the crazy emphasis academia places on publishing for the sake of publishing, whether what we are doing says anything useful/applicable or not.

Eric Charles said...

This is all such a bizarre conversation. Do you know why chemists don't worry about things like this? Because the most important results, published in the bestest, best, best chemistry journals, are instantly replicated in hundreds of labs. The results are important exactly because people want to do that thing they read about in the article. There are obvious exceptions for big-big science that takes huge teams to accomplish... but in general other sciences think results are important under conditions in which people will almost instantly notice if it does not work. Under those conditions a distinction between "replicable" and "replicated" is silly.

Science Refinery said...

If only psychology was as easy/cheap/fast to do as chemistry apparently is...

Noah Motion said...

I think I get the distinction between replication and replicability, but I'm not sure I agree that the latter is more important. How do we know something is replicable (i.e., has the property of replicability) if people aren't replicating it (or trying and failing)?

There is a big cultural difference between behavioral science (psych, experimental linguistics, etc.) and "hard" science. As Eric Charles notes, this discussion sounds bizarre from the perspective of chemistry (and physics, I assume), where replication is required prior to general acceptance of any results of import.

On the other hand, as Lauren Meyer notes, straight replication in behavioral science isn't particularly easy or fast. And as is widely recognized, there really isn't any incentive to replicate in behavioral fields. And, as noted in the post, it's easy to imagine replication-publication just feeding into the same screwy system we already have.

I don't know what the answer is, but the pre-registration system seems like it could be gamed fairly easily. How is adherence to pre-registered methods enforced, for example? It's all too easy to imagine people proposing one set of methods, finding nothing of interest, and fiddling around (in much the same way it seems people do already) to get "good" results. While guaranteed publication of pre-approved hypotheses and methods helps, there would still be an incentive to publish "real" results rather than null results, I would think.

To be clear, I don't know what the answer is. And I appreciate that you're thinking and writing about these issues. They're worth discussing openly, even if they're difficult (or impossible) to solve.

Bernard Carroll said...

Would you please clarify the difference between replication and replicability? Operationally, how would we demonstrate the latter without the former?

Neuroskeptic said...

Eric: I don't think it's bizarre. Because the fundamentals of chemistry (i.e. physics) are so well understood, most chemists nowadays are essentially engineers; their discoveries are more like inventions than observations. They invent a new way of synthesizing X or adding group Y. So as you say, everyone can then try to use that invention, and if it doesn't work they'll say so. But psychology isn't like that (neither is, say, evolutionary biology).

Neuroskeptic said...

Re: Replicability, that's the overall reliability of findings in a particular field. It's not very useful to talk about the replicability of a particular paper: that would just mean, is it true or not? But at the level of fields, some fields seem to suffer many more failures to replicate than others.

The only way to measure replicability is through replication attempts, but once you've done that, you need to try and improve replicability and that will take more than just replications.

Michael Barnett-Cowan said...

I'd like to point out that in addition to attempting to replicate a large sample of psychology studies (through pre-registration), the Reproducibility Project is also investigating factors such as replication power, study design, and the original study's sample and effect sizes as predictors of reproducibility. Each of these has a distinct implication for interventions to improve reproducibility.

Michael Cohn said...

I also want to stump for the Reproducibility Project. There's good reason to be concerned that psychology is publishing unreproducible findings, but we haven't demonstrated it in a systematic, empirical fashion. We should find out the actual extent of the problem (and, if possible, which subfields and study types are most affected) before we start tossing around solutions.

The Reproducibility Project also avoids many problems with questionable research practices by 1) doing pre-trial registration and 2) requiring that contributors replicate the analysis used in the original study and not just the methods. The original author could have inflated their alpha by fiddling with multiple analyses, but the replication is required to process and analyze the data only once, using the analytic plan that was actually published.

Regardless, I strongly agree that we also need to reform the "p<.05 = publication" problem. Even if it turns out that there aren't pervasive replicability problems, we know for certain that a lot of valuable findings are getting file-drawered.

Michael Cohn said...

@Eric Psychology findings generally take a lot more time and money to replicate than chemistry ones (Human subjects protections mean that even the simplest findings require hours of work and weeks of waiting -- at best -- before we can even start a replication). More importantly, most findings in the social sciences aren't directly useful. The purpose is to demonstrate a theory on which other applications can be built.

That kind of replication does happen, but when it fails, it's impossible to tell whether it reflects on the original finding or on the way the new researchers tried to extend it. I believe the root of many of our problems is the default assumption that the new authors must have messed up, meaning that their negative finding is of no interest.

Unknown said...

I find this a bit odd - primarily due to the diverse nature of psychology as a field. If what we are studying is human behaviour, surely replicability and/or replication is neither here nor there.

For example, if one is a social psychologist looking at how individuals understand a particular phenomenon, replicability of the research is largely redundant. In the case of research investigating human experience, no two individuals are going to have had, or perceive themselves to have had, exactly the same experience as another. The experiences are contextually, historically and culturally bound, and coloured by individual experience of the social world.

Maybe the replication/replicability debate makes major paradigmatic assumptions about the nature of research, which tend to be generalised to psychology as a discipline, rather than treating it as a field made up of numerous inter-relating sub-disciplines.

I think the problem lies with the underlying implication that the 'aim' of psychology is essentially to define, measure and ultimately predict aspects of human behaviour.

Andrew Oh-Willeke said...

To echo Eric Charles a bit, I would have to agree that meaningful, consensus operational definitions with some basis in an underlying fundamental psychological reality - something many other academic disciplines have - remain an elusive goal in psychology.

An important reason that this is the case is that a lot of research is done with lazy methods - convenience samples of "WEIRD" college students, assuming that survey responses reflect objective reality, artificial laboratory methods that are incapable of handling levels of social complexity that elementary school kids can grok easily.

Replication and replicability aren't so terribly important when the whole discipline is lost in the wilderness and using methods with far too low a resolution to capture what is going on in a complex world.

Psychology needs more researchers who already really, deeply understand people, and who invest a great deal of time in richer analysis and a better conceptual framework, based on observations of an ordinary mix of people in real-life contexts, using tools more objective than paper-and-pencil (or internet) surveys.

For example, rather than striving to take a reductionist, minimal-number-of-diagnostic-symptoms approach to DSM diagnosis, more research should be devoted to identifying non-diagnostic symptoms that are part of the syndrome, which might help us get to the bottom of the fundamental etiology of a DSM condition and would improve the level of consensus between professionals in the field about a diagnosis. Clinical psychologists, who all too often try to make a diagnosis based on conversations with one or two people in a single office visit without really seeing the condition in action in the patient, are just as guilty of lazy methods as the academic researchers.

The pressure to produce replicable results, even if those results are the product of such poor measurement tools that they are like four- or sixteen-pixel photographs, can be harmful too.