Sunday, August 14, 2016

Book Review: Stephen Stigler's "The Seven Pillars of Statistical Wisdom"



The Seven Pillars of Statistical Wisdom, by Stephen M. Stigler (Harvard University Press, Cambridge, Mass., 2016).

The book presents seven principles that the author believes support the core of statistics as a unified science of data, “the original and still preeminent data science” (p. 195).  It is intended for both professional statisticians and the “interested layperson,” though I suspect the latter would struggle a bit, as the author does not shy away from formulae, calculations, and passing references to advanced statistical methods and concepts.  The author is a distinguished professor of statistics at the University of Chicago and a leading historian of the field.  Each of the seven main chapters discusses one of the “pillars,” illustrated with historical examples (as opposed to contemporary ones) and often accompanied by discussions of the pitfalls involved with each principle.  The author states that “I will try to convince you that each of these was revolutionary when introduced, and each remains a deep and important conceptual advance” (p. 2).

The first principle is titled “Aggregation,” or “the combination of observations,” of which the arithmetic mean is the chief example discussed.  The author implies that the method of least squares, and more general smoothing methods, also fall under aggregation, broadly understood.  The concept of aggregation is radical because it implies that individual observations can be discarded in favor of sufficient statistics.  Prior to the general acceptance of averages, scientists would often simply choose the “best” of a set of observations, or perhaps take a midrange (the average of the highest and lowest values).  The concept poses dangers of its own, as the author illustrates with Quetelet’s notion of the Average Man.
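To make the contrast concrete, here is a short simulation of my own (not from the book, with invented numbers) comparing the mean against the midrange as ways of aggregating noisy measurements of a single quantity, under the assumption of Gaussian measurement error:

    import numpy as np

    rng = np.random.default_rng(0)
    true_value = 10.0
    n_trials, n_obs = 10_000, 25

    errors_mean, errors_midrange = [], []
    for _ in range(n_trials):
        # Repeated noisy observations of one quantity
        x = true_value + rng.normal(0.0, 1.0, size=n_obs)
        errors_mean.append(x.mean() - true_value)
        # Midrange: average of the largest and smallest observations
        errors_midrange.append((x.max() + x.min()) / 2 - true_value)

    print("RMS error of mean:    ", np.sqrt(np.mean(np.square(errors_mean))))
    print("RMS error of midrange:", np.sqrt(np.mean(np.square(errors_midrange))))

Under these assumed conditions the mean is considerably more accurate; with a different error distribution the comparison could come out otherwise, which is part of the point.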

The second principle is titled “Information: its measurement and rate of change”; the chapter focuses on the Central Limit Theorem and the root-N law (which roughly states that the precision of an estimate grows only with the square root of the amount of data used to calculate it).  The author acknowledges the contrast between statisticians’ usage of the term “Information” (specifically, Fisher information) and its more general use in signal processing and information theory (specifically, Shannon information).  Again, pitfalls are discussed, including a case where randomly selecting one of two data points is better than using their average: two spies report cannonballs of two different calibers, and a cannon whose caliber equals their average would not exist.  “The measurement of information clearly required attention to the goal of the investigation” (p. 59).  (In my view, one could write an entire chapter on that last point, and it would be more important than most of the seven principles selected by the author for this book.)
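The root-N law is easy to see in a quick simulation (again my own, with arbitrary numbers): quadrupling the amount of data only halves the standard error of the mean.

    import numpy as np

    rng = np.random.default_rng(1)
    sigma = 1.0

    for n in [100, 400, 1600, 6400]:
        # Empirical standard error of the sample mean over many repetitions
        means = rng.normal(0.0, sigma, size=(1000, n)).mean(axis=1)
        print(f"n = {n:5d}   empirical std. error = {means.std():.4f}   "
              f"sigma/sqrt(n) = {sigma / np.sqrt(n):.4f}")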

The third principle is titled “Likelihood: Calibration on a Probability Scale”.  Here the concept of a statistical significance test is introduced, along with p-values, Bayesian induction, and the theory of maximum likelihood.  The fourth principle is titled “Intercomparison: within-sample variation as the standard.”  The author illustrates it with Student’s distribution and t-test, and with the analysis of variance.  Pitfalls are illustrated with an example of data dredging at the hands of the economist William Stanley Jevons.  The author acknowledges further pitfalls, “for the lack of appeal to an outside standard can remove our conclusions from all relevance” (p. 198).  (In my view this concern is understated: statisticians are fond of standardizing data, but doing so prevents multiple data sets from being compared against an external standard.  Dimensional analysis offers an alternative approach.)
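As a small illustration of intercomparison (my own sketch, not the book’s, using SciPy and made-up data): the two-sample t statistic scales the difference between group means by the variability observed within the samples themselves, so no external standard of error is needed.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    # Two small fabricated samples; the second has a slightly shifted mean
    a = rng.normal(0.0, 1.0, size=12)
    b = rng.normal(0.5, 1.0, size=12)

    # The t statistic compares the difference in means to the within-sample variation
    t, p = stats.ttest_ind(a, b)
    print(f"t = {t:.3f}, two-sided p-value = {p:.3f}")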

The fifth principle is titled “Regression: Multivariate Analysis, Bayesian Inference, and Causal Inference”.  This principle warrants the longest chapter of the book, which begins by focusing on regression to the mean, a discovery made by Francis Galton.  This discovery resolved a paradox Galton had noticed in Darwin’s theory of evolution: if each generation passed heritable variation in traits to its offspring, why was the aggregate variation in those traits stable over time?  Later in the chapter, Stein’s paradox is discussed, and shrinkage estimation is presented as a version of regression.  The correlation-causation fallacy is also discussed, including spurious correlation and Austin Bradford Hill’s principles for epidemiological inference.  The chapter also covers multivariate analysis, Bayesian statistics, and path analysis, making for a real hodgepodge.
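Galton’s resolution is easy to reproduce in a toy simulation (mine, with an assumed parent-offspring correlation of 0.5, not a figure from the book): offspring of extreme parents fall back toward the mean, yet the variance of the trait is the same in both generations.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000
    r = 0.5  # assumed parent-offspring correlation (hypothetical)

    parent = rng.normal(0.0, 1.0, size=n)  # standardized trait
    offspring = r * parent + rng.normal(0.0, np.sqrt(1 - r**2), size=n)

    tall = parent > 2.0  # parents more than two standard deviations above the mean
    print("mean trait of selected parents: ", parent[tall].mean())
    print("mean trait of their offspring:  ", offspring[tall].mean())
    print("variance, parents vs. offspring:", parent.var(), offspring.var())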

The sixth principle is “Design: Experimental Planning and the Role of Randomization”.  Fisher’s demolition of one-factor-at-a-time experimentation is discussed, as is Peirce’s innovation of using randomization in experimental psychology studies, and later Neyman’s discussion of random sampling in social science.  The chapter ends with a brief discussion of clinical trials and a lengthier discussion of the French lottery of the 18th and 19th centuries.
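The core of the randomization idea can be stated in a few lines of code (a sketch of mine, not the book’s): assigning subjects to treatment and control at random balances unmeasured differences on average, which is what licenses the subsequent inference.

    import numpy as np

    rng = np.random.default_rng(4)
    subjects = np.arange(20)

    # Randomly split the subjects into treatment and control groups
    shuffled = rng.permutation(subjects)
    treatment, control = shuffled[:10], shuffled[10:]
    print("treatment:", sorted(treatment.tolist()))
    print("control:  ", sorted(control.tolist()))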

The seventh principle is titled “Residual”, by which the author means both residual analysis, a commonly used approach to statistical model criticism, and formal comparison of nested models using a significance test.  The author also detours into the history of data graphics.  The chapter is marred by infelicities in its history of physics and astronomy.  At one point the author states that “We are still looking for that [luminiferous] aether” (p. 172).  Rest assured, most physicists are not worried about that.  The author then describes Laplace’s approach to resolving an apparent discrepancy in the orbits of Jupiter and Saturn; Laplace was able to show that the motions could be explained by treating Jupiter, Saturn, and the Sun as a mutual three-body problem.  Using an exaggeration worthy of our current Presidential candidates, the author observes that “A residual analysis had saved the solar system.”  Finally, in the Conclusion, the author speculates about the possibility of an (as yet unknown) eighth pillar to accommodate the era of big data.
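Returning to the chapter’s central idea: residual analysis itself is easy to demonstrate with a toy example (mine, with fabricated data, not an example from the book). Fit a deliberately-too-simple model and look for structure left over in the residuals.

    import numpy as np

    rng = np.random.default_rng(5)
    x = np.linspace(0, 10, 50)
    # Data generated with a mild quadratic term that the fitted line omits
    y = 1.0 + 2.0 * x + 0.3 * x**2 + rng.normal(0.0, 1.0, size=x.size)

    # Fit a straight line, then inspect what it leaves behind
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)

    # A systematic pattern (here, a U-shape) signals a missing term in the model
    thirds = np.array_split(residuals, 3)
    print("mean residual by thirds of x:", [round(float(t.mean()), 2) for t in thirds])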

At this point, readers should be warned that I hold an unconventional and dissident view of statistical ideology.  For instance, where the author states of statistical significance tests that “misleading uses have been paraded as if they were evidence to damn the entire enterprise rather than the particular use” (p. 197), I would number myself among those who would damn the entire enterprise.  (This is a topic of current controversy, as evidenced by Wasserstein, 2016.)  There is some value in distilling the ideas of statistics into a set of principles; similar exercises are commonly undertaken, and Kass et al. (2016) is another example published in the same year.  Were I to write such an account, it would differ from both Stigler’s and others’, and would present my own statistical ideology.  That will have to wait for another day.  Suffice it to say that my selection of pillars would differ, and any discussion I offered of Stigler’s would dwell far more on the pitfalls and hazards than his does.

In my view, this book’s chapter on “Design” is the best (apart from the digression on the French lottery), while the topics discussed in the other chapters are so fraught with difficulties that the concepts described may be as harmful as they are helpful to the serious data analyst.  I found the book disappointing and less enlightening than I had hoped.  While not as bad as Salsburg’s The Lady Tasting Tea, I would find it difficult to recommend this book to readers of any level of statistical sophistication.


References


Kass, R.E., et al., 2016:  Ten simple rules for effective statistical practice.  PLoS Computational Biology, vol. 12 (6), e1004961.

Wasserstein, R. L. (ed.), 2016:  ASA statement on statistical significance and P-values.  The American Statistician, vol. 70, pp. 129-133.
