Wednesday, November 30, 2016

"To err is human, but so often?" - David Freedman

Nature's editorial this week discusses the unleashing of the "statcheck" computer program on psychology journal articles.  Evidently it is an automated mechanism for detecting inconsistencies in published papers: it recomputes reported p-values from the accompanying test statistics and degrees of freedom, and flags any mismatch beyond rounding error.
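
As a rough illustration of the idea, the core check can be sketched in a few lines.  This is not statcheck's actual implementation (statcheck is an R package with more elaborate parsing of APA-formatted results); the function name and the rounding tolerance below are my own choices for the sketch.

```python
# A minimal sketch of a statcheck-style consistency check: recompute the
# two-sided p-value from a reported t statistic and degrees of freedom,
# and flag reports that disagree beyond rounding error.  The function name
# and tolerance are hypothetical, not statcheck's own.
from scipy import stats

def check_reported_p(t_stat, df, reported_p, tol=0.005):
    """Return True if the reported two-sided p-value is consistent with
    the p-value recomputed from the t statistic, within tolerance."""
    recomputed_p = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p from |t|
    return abs(recomputed_p - reported_p) <= tol

# Example: a paper reports t(28) = 2.10, p = .045; the recomputed p is
# about .0448, so the report is internally consistent.
print(check_reported_p(2.10, 28, 0.045))  # True
```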

While I do not object to anything said in the editorial, my concern is that there is very little penalty for carelessness in scientific research.  P-values are actually the least of my concerns; of greater concern are errors, or even suboptimal practices, in the design, execution, and reporting of research.  Statistical inference is of no value when these other problems are present, and even when they are absent, statistical inference remains of incredibly limited value compared to a descriptive presentation of the data.  There are several reasons for this:
  • Statistical inference presumes some kind of generalization, usually to a larger, stable population of which the data in the study can be regarded as representative.  This is rarely justified.
  • The statistical analysis adds information to the data in the form of an assumed probability model.  This model's assumptions may well influence the outcome more than the data do.
  • Statistical inference is an inherently confirmatory activity, while most research is exploratory.  Statistical models in this context are overfitted to the data, and the generalization implied by statistical inference is invalid; the simulation sketched after this list illustrates the problem.
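
On the last point, a small simulation makes the danger concrete.  Suppose every candidate predictor is pure noise: an analyst who tries twenty of them and reports only the most "significant" will find p < 0.05 in roughly 1 - 0.95^20 ≈ 64% of datasets, so the nominal p-value of the reported result no longer means what it claims to.  The sample sizes and counts below are arbitrary choices for illustration.

```python
# Exploratory model selection under the null: all predictors are noise,
# yet the minimum p-value across 20 candidates falls below 0.05 in the
# majority of simulated datasets (about 1 - 0.95**20, or 64%).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_obs, n_predictors = 1000, 50, 20
false_positives = 0

for _ in range(n_sims):
    y = rng.standard_normal(n_obs)        # outcome: pure noise
    best_p = 1.0
    for _ in range(n_predictors):
        x = rng.standard_normal(n_obs)    # candidate predictor: pure noise
        r, p = stats.pearsonr(x, y)       # test each candidate separately
        best_p = min(best_p, p)
    if best_p < 0.05:                     # report only the "winner"
        false_positives += 1

print(f"Fraction of null datasets with p < 0.05: {false_positives / n_sims:.2f}")
```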
Nonetheless, sloppy calculation is a sign of carelessness, and for this reason the "statcheck" episode has certainly done a service, insofar as it disincentivizes future carelessness.  On the other hand, legitimate criticisms of "statcheck's" own error rate have been raised.  I see this as the needed back-and-forth in the ongoing discussion of reproducible research, and accusations of "harassment" directed at "statcheck's" creators are over-sensitive and unwarranted.


