Monday, July 29, 2013

Nature's policy on reproducible research

The Diffusion Tensor Literary Review (DTLR) applauds the Nature family of journals for its new policy on reproducible research.  They have been doing a great job covering non-reproducible research in science, illustrated by this open-access collection of previous reports.  Their new policy is embodied in a Reporting checklist for life science articles.  If adequately enforced, the new policy should be a major step forward in promoting reproducible research.  I am pretty happy with what is on the checklist, but it would be boring for me to rehash it here (please go read it yourself).  Instead, I will dwell on a few deficiencies, which I don't think are minor.  Despite these complaints, I feel that the policy and the checklist represent a major step forward and I hope other journals consider formulating reproducible research policies of their own.

As Nature acknowledges, the checklist is not exhaustive.  One item in particular should have been mentioned:  if data, signal, or image analysis software was used for either data reduction or analysis, the software version, settings, and options used need to be disclosed.  Unfortunately, scientists often do not realize that such procedural minutiae matter, let alone require disclosure.  In my experience, use of software (including commercial software) can be a minefield if users have free rein to fiddle with settings and options that could affect the data reduction and/or data analysis.  In some cases, the hardware-software interface (e.g., data acquisition) itself requires further elucidation.  I recall an occasion when I was working with a commercial biosignal device that allowed the user to choose data reporting at (say) either 500 Hz or 1 kHz.  A colleague and I wondered what the true sampling rate was, and how the reported data were being generated (downsampling?  interpolation?).  After pressing the manufacturer for an explanation, it turned out the actual sampling rate was non-uniform, due to the physics of the transducer, and interpolation was used to report an artificially equally-spaced signal.  Most users would have no need of such details, but we were processing the fine structure of these signals, and really did need to understand the data acquisition and reporting process.
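To make the anecdote concrete, here is a minimal sketch (in Python, with made-up numbers; the device's actual resampling algorithm was never published, so linear interpolation is only an assumption) of what silently mapping non-uniform samples onto a uniform reporting grid looks like:

    import numpy as np

    # Hypothetical illustration: a transducer delivers samples at slightly
    # irregular intervals (nominally 1 kHz here), but the vendor software
    # reports the signal on an artificially uniform 500 Hz grid.
    rng = np.random.default_rng(0)
    dt = rng.uniform(0.8e-3, 1.2e-3, size=2000)      # jittered inter-sample intervals
    t_actual = np.cumsum(dt)                         # non-uniform time stamps
    x_actual = np.sin(2 * np.pi * 10 * t_actual)     # underlying 10 Hz signal

    t_reported = np.arange(t_actual[0], t_actual[-1], 1 / 500.0)  # uniform 500 Hz grid
    x_reported = np.interp(t_reported, t_actual, x_actual)        # linear interpolation

    # Anyone analyzing the fine structure of x_reported is really analyzing
    # interpolated values at fabricated time points -- exactly the kind of
    # detail a reproducibility checklist should require authors to disclose.

The point is not the particular resampling method, but that a processing step invisible to most users can sit between the physical measurement and the reported numbers.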

The other deficiency that I am not happy about is the checklist's use of p-values as an example of statistical test results, in the guidance on figure legends.  The p-value is one of the least informative statistical inferences that could be reported in an analysis, since it focuses only on statistical significance.  Point and interval estimation of appropriate quantities is more likely to convey both statistical and practical/clinical significance.  In any case, attempts to boil statistical results down to a single number (e.g., a p-value or a correlation coefficient) serve to hide the richness of information contained in the data.  Statistical tests, typically formulated as null hypothesis tests, lack any meaningful measure of magnitude or precision.  I will return to this point at greater length in a future post.
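As a toy illustration (simulated data, not from any study discussed here), compare what a bare p-value conveys with what a point estimate and confidence interval convey for the same two-group comparison:

    import numpy as np
    from scipy import stats

    # Hypothetical two-group comparison with simulated measurements.
    rng = np.random.default_rng(1)
    a = rng.normal(10.0, 2.0, size=40)
    b = rng.normal(11.0, 2.0, size=40)

    t_stat, p_value = stats.ttest_ind(a, b)      # the "p < 0.05" style of summary

    diff = b.mean() - a.mean()                   # point estimate of the group difference
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    lo, hi = diff - 1.96 * se, diff + 1.96 * se  # approximate 95% confidence interval

    print(f"p = {p_value:.3f}")
    print(f"difference = {diff:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")

    # The p-value answers only "is the difference distinguishable from zero?";
    # the estimate and interval also say how large the difference is and how
    # precisely it has been measured.

A figure legend reporting the second line tells the reader far more than one reporting only the first.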

As I said in the previous post, there needs to be a major cultural and infrastructural change in the scientific community, and it can best be driven by funding agencies, journals, and employers.  The Nature policy is an excellent contribution to infrastructural change.  Ultimately, though, scientists themselves need to be upset enough about non-reproducible research in their own fields before things will change.  It is necessary but not sufficient to drive the principles of reproducible research from the top down.  Without support from the bottom up, scientists may view reproducibility policies as just additional bureaucracy, and may even seek ways to circumvent their spirit (if not their letter).
