An epidemic of non-reproducible
research in the life and behavioral sciences has been revealed in
recent years. Much of the spadework has been done by John Ioannidis
and collaborators, discussed earlier on DTLR. Well-known
biopharmaceutical industry reports from Bayer (Prinz et al., 2011)
and Amgen (Begley & Ellis, 2012) provide further confirmation.
Glenn Begley, one of the co-authors of
these papers, was interviewed for a story by Jennifer Couzin-Frankel
in the recent Science special issue on Communication in Science, discussed on DTLR last month. Couzin-Frankel (2013) discusses
Begley's failed attempts to reproduce the results published by a
prominent oncologist in Cancer Cell. At a 2011 conference, Begley
invited the author to breakfast and inquired about his team's
inability to reproduce the results from the paper. According to
Begley, the oncologist replied, “We did this experiment a dozen
times, got this answer once, and that's the one we decided to
publish.” Begley couldn't believe what he'd heard.
Indeed, I am at once shocked and
not surprised. Shocked, because it displays an utter lack of
critical thinking on the oncologist's part. Not surprised, because
in my experience critical thinking is rarely formally taught to
scientific researchers, and the incentive system for scientists
rewards such lax behavior. The oncologist may have forgotten why he
got into science and medicine to begin with. The pressures of a
career in academic medicine may have corrupted his integrity, but the
work of Ioannidis and others alluded to above shows that this
phenomenon is pretty common.
The rest of Couzin-Frankel's article
discusses how clinical studies often get published even when the
primary objective of the study has failed. Usually (but not always)
the authors are up front about the failure, but they try to spin the
results positively in various ways. For instance, if they make enough
unplanned post hoc statistical comparisons, they'll inevitably find
one that achieves (nominal) statistical significance, and they'll use
that to justify the publication. Evidently journals allow this to occur,
resulting in tremendous bias in what gets published. These are
examples of selective reporting (cherry-picking) and exaggeration
that result in misleading interpretations. This is not how science
ought to be done.
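To see why such "post hoc fishing" is almost guaranteed to turn up something, consider the arithmetic of multiple comparisons. The short Python simulation below is my own illustration, not anything from Couzin-Frankel's article; the sample size, number of simulations, and the 0.05 threshold are arbitrary assumptions. It estimates the probability of seeing at least one nominally significant result when every hypothesis tested is in fact null.

```python
# Probability of at least one nominally "significant" p-value (p < 0.05)
# when k true-null comparisons are made. Illustration only: sample size,
# simulation count, and threshold are arbitrary choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_per_group, n_sims = 0.05, 20, 2000

for k in (1, 5, 12, 20):          # number of unplanned post hoc comparisons
    hits = 0
    for _ in range(n_sims):
        # k independent two-sample t-tests in which the null is true
        pvals = [
            stats.ttest_ind(rng.normal(size=n_per_group),
                            rng.normal(size=n_per_group)).pvalue
            for _ in range(k)
        ]
        hits += min(pvals) < alpha
    analytic = 1 - (1 - alpha) ** k   # exact value for independent tests
    print(f"k={k:2d}: simulated {hits / n_sims:.2f}, analytic {analytic:.2f}")
```

With a dozen independent comparisons the chance of at least one false positive is about 1 − 0.95^12 ≈ 0.46, nearly a coin flip even when nothing real is going on. The same selection arithmetic applies to running an experiment a dozen times and publishing the single run that "worked."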
Couzin-Frankel's article ends with a
discussion of journals dedicated to publishing negative results, as
well as recent efforts by mainstream medical journals to accept
negative studies for publication.
Non-reproducible research has also
gotten the attention of The Economist, which ran a cover story and editorial on it
a few weeks ago. As additional evidence they cite the following
statistic: “In 2000-2010 roughly 80,000 patients took part in
clinical trials based on research that was later retracted because of
mistakes or improprieties.” Thus there are real consequences.
Patients are needlessly exposed to clinical trials that may have
negligible scientific value; their altruism is being abused. This
should be a worldwide scandal, and I congratulate The Economist for
shining a harsh light on the problem.
The Economist points out that much of
this research is publicly funded, and hence a scientific scandal
becomes a political and financial one. “When an official at
America's National Institutes of Health (NIH) reckons, despairingly,
that researchers would find it hard to reproduce at least
three-quarters of all published biomedical findings, the public part
of the process seems to have failed.” They then discuss the
journal PLoS One, which accepts papers based solely on methodological
soundness, without regard to novelty or perceived significance.
“Remarkably, almost half the submissions to PLoS One are rejected
for failing to clear that seemingly low bar.” Among the
statistical issues the article discusses are multiplicity, blinding,
and overfitting.
The Economist discusses the main
reasons for these problems: scarcity of funding for science, which
leads to hyper-competition; an incentive system that rewards
non-reproducible research and punishes those interested in
reproducibility; incompetent peer review; and statistical
malpractice. They suggest a number of solutions: raising publication
standards, particularly on statistical matters; making study
protocols publicly available prior to running a trial; making trial
data publicly available; and making funding available for attempts to
reproduce work, not just to publish new work.
The Economist's article has generated a certain amount of controversy, but I think it gets it mostly right. I would have formulated the statistical discussion differently, and I think the article misses the chance to point out more fundamental statistical problems. I also don't give much weight to the comments by Harry Collins about “tacit knowledge”. A truly robust scientific result should be reproducible under slightly varying conditions.
References
Begley, C.G., and Ellis, L.M. (2012):
Drug development: raise standards for preclinical cancer research.
Nature, 483: 531-533.
Couzin-Frankel, J. (2013): The
power of negative thinking. Science, 342: 68-69.
Prinz, F., Schlange, T., and Asadullah,
K. (2011): Believe it or not: how much can we rely on published data
on potential drug targets? Nature Reviews Drug Discovery, 10:
712.