The Seven Pillars of Statistical Wisdom, by Stephen M. Stigler (Harvard University Press,
Cambridge, Mass., 2016).
The book presents seven principles that the author
believes support the core of statistics as a unified science of data, “the
original and still preeminent data science” (p. 195). It is intended for both professional
statisticians and the “interested layperson,” though I suspect the latter would
struggle a bit, as the author does not shy away from formulae, calculations, and even name-dropping advanced statistical methods and concepts. The author is a distinguished professor of
statistics at the University of Chicago, and a leading historian of the field. Each of the seven main chapters discusses one
of the “pillars,” illustrated with historical examples (as opposed to
contemporary ones) and often accompanied by discussions of the pitfalls
involved with each principle. The author
states that “I will try to convince you that each of these was revolutionary
when introduced, and each remains a deep and important conceptual advance” (p.
2).
The first principle is titled “Aggregation” or “the
combination of observations,” of which the arithmetic mean is the chief example
discussed. The author implies that the method
of least squares, and more general smoothing methods, also falls under
aggregation, broadly understood. The concept of aggregation was radical because it implies that individual observations can be discarded in favor of sufficient statistics. Prior to a general acceptance of averages,
scientists would often simply choose the “best” of a set of observations, or
perhaps take a midrange (the average of the highest and lowest values). The concept carries other dangers, as the
author illustrates with Quetelet’s notion of the Average Man.
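The advantage of full aggregation over those older habits is easy to demonstrate numerically. Here is a minimal sketch (my own, not from the book) comparing the arithmetic mean with the midrange as estimators of a true value under normally distributed measurement error:

```python
import random
import statistics

random.seed(1)
TRUE_VALUE = 10.0
N_TRIALS = 2000
N_OBS = 20

def rmse(estimates):
    """Root-mean-square error of a list of estimates around TRUE_VALUE."""
    return (sum((e - TRUE_VALUE) ** 2 for e in estimates) / len(estimates)) ** 0.5

means, midranges = [], []
for _ in range(N_TRIALS):
    obs = [random.gauss(TRUE_VALUE, 1.0) for _ in range(N_OBS)]
    means.append(statistics.mean(obs))
    midranges.append((min(obs) + max(obs)) / 2)

print(f"RMSE of mean:     {rmse(means):.3f}")
print(f"RMSE of midrange: {rmse(midranges):.3f}")
```

Under normal errors the mean comes out clearly more accurate; with heavier-tailed errors the midrange fares even worse, which is part of why aggregation won out.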
The second principle is titled “Information: Its Measurement and Rate of Change,” which
focuses on the Central Limit Theorem and the root-N law (which roughly states that the precision of an estimate grows only as the square root of the amount of data used to calculate it). The author acknowledges
the contrast of statisticians’ usage of the term “Information” (specifically,
Fisher Information) with its more general use in signal processing and
information theory (specifically, Shannon information). Again, pitfalls are discussed, including a
case where randomly selecting one of two data points is better than using their
average. In this example, spies report enemy cannonballs of two different calibers; a cannon whose caliber equals their average does not exist, so the average is worse than either report. “The measurement of information
clearly required attention to the goal of the investigation” (p. 59). (In my view, one could write an entire chapter on that last point, and it would be more important than most of the seven principles the author selected for this book.)
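The root-N law itself is easy to verify by simulation. A minimal sketch (mine, not Stigler's), showing that the standard error of a sample mean shrinks roughly as 1/sqrt(N):

```python
import random
import statistics

random.seed(2)

def se_of_mean(n, trials=3000):
    """Empirical standard deviation of the sample mean of n standard-normal draws."""
    means = [statistics.mean(random.gauss(0, 1) for _ in range(n))
             for _ in range(trials)]
    return statistics.pstdev(means)

for n in (25, 100, 400):
    print(f"N={n:4d}  empirical SE={se_of_mean(n):.3f}  1/sqrt(N)={n ** -0.5:.3f}")
```

Quadrupling the data only halves the standard error, which is the sobering point of the law.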
The third principle is titled “Likelihood: Calibration on a Probability Scale”. Here the concept of a statistical significance
test is introduced, along with p-values, Bayesian induction, and the theory of
maximum likelihood. The fourth principle
is titled “Intercomparison: within-sample
variation as the standard.” The author
illustrates it with Student’s distribution and t-test, and the analysis of
variance. Pitfalls are illustrated with
an example of data dredging in the hands of economist William Stanley
Jevons. The author acknowledges further
pitfalls, “for the lack of appeal to an outside standard can remove our
conclusions from all relevance” (p. 198). (In my view this concern is
understated: statisticians are fond of
standardizing data, but this prevents multiple data sets from being compared
using an external standard. Dimensional
analysis offers an alternative approach.)
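The core idea of intercomparison, judging a difference against the variability within the sample itself, fits in a few lines. A sketch (mine, with made-up numbers) of the one-sample t statistic:

```python
import statistics

# Hypothetical yields from eight plots; is the mean different from 20?
data = [19.2, 21.5, 20.1, 22.3, 18.7, 21.0, 20.8, 19.9]
mu0 = 20.0

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)          # within-sample standard deviation (n-1 divisor)
t = (xbar - mu0) / (s / n ** 0.5)   # no appeal to any outside standard of error

print(f"mean={xbar:.2f}, s={s:.2f}, t={t:.2f} on {n - 1} degrees of freedom")
```

The denominator is estimated entirely from the sample at hand, which is exactly both the power of the method and the source of the pitfall quoted above.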
The fifth principle is titled “Regression: Multivariate Analysis, Bayesian Inference,
and Causal Inference”. This principle
warrants the longest chapter of the book, and begins by focusing on regression
to the mean, a discovery made by Francis Galton. This discovery resolved a paradox Galton had
noticed in Darwin’s theory of evolution:
if each generation passed heritable variation in traits on to its offspring, why was the aggregate variation in those traits stable over time? Later in the chapter, Stein’s paradox is
discussed, and shrinkage estimation is presented as a version of regression. The correlation-causation fallacy is also
discussed, including spurious correlation and Austin Bradford Hill’s principles
for epidemiological inference. The
chapter also covers multivariate analysis, Bayesian statistics, and path
analysis, a real hodgepodge.
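Galton's resolution of the paradox is easy to reproduce in simulation. A minimal sketch (my own construction, with a hypothetical heritability value): a child's height combines a shrunken copy of the parental deviation with fresh variation, so offspring of extreme parents fall back toward the mean while the population variance stays constant across generations.

```python
import random
import statistics

random.seed(3)
POP_MEAN, POP_SD = 68.0, 2.0
N = 20000
heritability = 0.5  # fraction of variance passed on (hypothetical value)

parents = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
# Child = mean + shrunken parental deviation + fresh variation;
# the variances are chosen so the total stays POP_SD**2.
b = heritability ** 0.5
fresh_sd = POP_SD * (1 - heritability) ** 0.5
children = [POP_MEAN + b * (p - POP_MEAN) + random.gauss(0, fresh_sd)
            for p in parents]

tall_children = [c for p, c in zip(parents, children) if p > POP_MEAN + 2]
print(f"parent SD={statistics.pstdev(parents):.2f}, "
      f"child SD={statistics.pstdev(children):.2f}")
print(f"mean child height for parents above {POP_MEAN + 2}: "
      f"{statistics.mean(tall_children):.2f}")
```

The children of tall parents average closer to the population mean than their parents do, yet the spread of the new generation is undiminished: regression to the mean with stable aggregate variation.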
The sixth principle is “Design: Experimental Planning and the Role of
Randomization”. Fisher’s demolition of
one-factor-at-a-time experimentation is discussed, as is Peirce’s innovation of
using randomization in experimental psychology studies, and later Neyman’s
discussion of random sampling in social science. The chapter ends with a brief discussion of
clinical trials and a lengthier discussion of the French lottery of the 18th
and 19th centuries.
The seventh principle is titled “Residual”, by which the
author means both residual analysis, a commonly used approach for statistical
model criticism, as well as formal model comparison for nested models, using a
significance test. The author also
detours into the history of data graphics.
The chapter is marred by infelicities in its history of physics and
astronomy. At one point the author
states that “We are still looking for that [luminiferous] aether” (p.
172). Rest assured, most physicists are
not worried about that. The author then
describes Laplace’s approach to resolving an apparent discrepancy in the orbits
of Jupiter and Saturn; Laplace was able to show that the motions could be
explained using a mutual three-body interaction with the Sun. Using an exaggeration worthy of our current
Presidential candidates, the author observes that “A residual analysis had
saved the solar system.” Finally, in the
Conclusion, the author speculates about the possibility of an (as yet unknown) eighth
pillar to accommodate the era of big data.
At this point, readers should be warned that I have an
unconventional and dissident view of statistical ideology. For instance, where the author states about
statistical significance tests, “misleading uses have been paraded as if they
were evidence to damn the entire enterprise rather than the particular use” (p.
197), I would number myself among those who would damn the entire enterprise. (This is a topic of current controversy, as
evidenced by Wasserstein, 2016.) There
is some value in distilling the ideas of statistics into a set of principles;
similar exercises are commonly embarked on, and Kass et al. (2016) is another
example published in the same year. Were
I to write such an account, it would differ from both Stigler’s and others’,
and present my own statistical ideology. That will have to wait for another day. Suffice it to say that my selection of pillars would differ, and my discussion of Stigler’s would dwell far more on their pitfalls and hazards than his does.
In my view, this book’s chapter on “Design” is the best (except for the digression on the French lottery), while the topics discussed
in the other chapters are so fraught with difficulties that the concepts
described might be as potentially harmful as they are helpful to the serious
data analyst. I found the book
disappointing and less enlightening than I had hoped. While not as bad as Salsburg's The Lady Tasting Tea, I would find it difficult to recommend this book to readers of any level of statistical sophistication.
References
Kass, R. E., et al., 2016: Ten simple rules for effective statistical practice. PLoS Computational Biology, vol. 12 (6), e1004961.
Wasserstein, R. L. (ed.), 2016: ASA statement on statistical significance and P-values. The American Statistician, vol. 70, pp. 129-133.