Monday, January 27, 2014

NIH ponders reproducible research, redux

Last August I wrote about the NIH's consideration of measures to encourage reproducible research.  Today, the NIH's leaders, Francis Collins and Lawrence Tabak, announced what they have been up to.  Their note quite appropriately emphasizes study design and the reporting of experimental details.  They are essentially running a range of pilot projects and plan to decide which measures to adopt permanently at the end of this year.  Although this is a very cautious and gradualist approach, I applaud their intention to act on the issue.  And I agree with them that the whole scientific community needs to take ownership of reproducible research.  I look forward to reading about their findings and their decisions in 4Q14.

DTLR is delighted to endorse the NIH's actions to support reproducible research.

Reference


Francis S. Collins and Lawrence Tabak, 2014:  NIH plans to enhance reproducibility.  Nature, 505:  612-613.

Friday, January 24, 2014

Science Magazine embraces reproducibility (finally)

Last week's issue of Science carried an editorial titled simply, "Reproducibility" (McNutt, 2014).  The Editor-in-Chief, Marcia McNutt, announced that Science would adopt the recommendations of the U.S. National Institute of Neurological Disorders and Stroke (NINDS) to encourage transparency and reproducible research in preclinical studies (Landis, et al., 2012).  In addition, the journal will spend the next six months gathering examples of "excellence in transparency" in order to further develop guidelines to incentivize reproducible research.  Finally, Science plans to add more statisticians to its editorial board and to involve them in manuscript review.  Science follows its sister journal, Science Translational Medicine, which has already adopted the NINDS guidelines, as well as Nature, which published the NINDS guidelines and posted its own, discussed here.  Although Science has certainly dragged its feet on this issue in comparison to Nature, it should be commended for finally joining the bandwagon.

The NINDS paper (Landis, et al., 2012) is a small masterpiece.  Not only does it provide a concise and thoughtful list of issues that preclinical investigators should consider in the design, execution, and analysis of their studies; it also provides a useful literature review documenting both the problems and the proposed solutions and their outcomes, particularly in comparison to the clinical trials literature.  I would also like to point readers to a more specific set of guidelines for preclinical imaging studies (Stout, et al., 2013).  (This blog, after all, is named after an imaging method.)

References


S. C. Landis, et al., 2012:  A call for transparent reporting to optimize the predictive value of preclinical research.  Nature, 490:  187-191.

M. McNutt, 2014:  Reproducibility.  Science, 343:  229.

D. Stout, et al., 2013:  Guidance for methods descriptions used in preclinical imaging papers.  Molecular Imaging, 2013:  1-15.



Monday, January 20, 2014

The New York Times on nonreproducible research

Today's New York Times has a new column, Raw Data, by George Johnson.  Appropriately enough, the inaugural column is about non-reproducible research.  It covers mostly the same ground that the Economist did in its special issue, "How Science Goes Wrong" last October.  We've talked about many of these issues on this blog, DTLR, over the past half year as well.  It is gratifying to see that the venerable newspaper deems the topic fit to write about.

There is one topic in Johnson's column that we haven't examined here on DTLR yet: the opinion piece by Mina Bissell, published in Nature last November.  Her thoughtful piece will be the subject of a future post.

Hurricane Sandy, climate change, and the limits of data science

Climate science is a very touchy subject, because unfortunately nearly all discussion of it quickly becomes entwined with politics. Kerr (2013) offers a recent reminder of this. In the President's State of the Union address a year ago, Mr. Obama said, “We can choose to believe that Superstorm Sandy, and the most severe drought in decades, and the worst wildfires some states have ever seen were all just a freak coincidence. Or we can choose to believe in the overwhelming judgment of science—and act before it's too late.”

However, Kerr (2013) notes that “there is little or no evidence that global warming steered Sandy into New Jersey or made the storm any stronger. And scientists haven't even tried yet to link climate change with particular fires.” Kerr also points to a Republican Congressman's equally incorrect claim that “Extreme weather isn't linked to climate change,” noting that several heat waves have indeed been “securely linked” to global warming. Kerr argues that links between extreme weather and climate change are often scientifically suspect, and that invoking them may be a risky strategy for persuading the public to take climate change seriously. After all, climate by definition is a statistical average of weather, which is what we experience on a day-to-day basis. The President, alas, was wrong.

Last March, I attended a lecture by Dr. Richard A. Anthes, an eminent meteorologist, president emeritus of the University Corporation for Atmospheric Research, and a former president of the American Meteorological Society. He pointed out that the track of Hurricane Sandy, with its left turn towards New Jersey, had been predicted days in advance by the forecast model of the ECMWF (European Centre for Medium-Range Weather Forecasts). The U.S. forecast models were not able to give as early a warning, due to technical limitations of the computers and computer models (see my earlier post).

Let's use this example in a thought experiment on how data could be used to make a hurricane forecast. A purely empirical approach (whether by conventional statistical methods or by data mining/machine learning) would likely have failed: never before had a hurricane approached New Jersey from the east in late October. The ECMWF model uses data too, but not for statistical forecasting. It uses data to set the initial conditions for a simulation of the atmosphere governed by partial differential equations that incorporate subject matter knowledge of the physics and chemistry of the atmosphere and ocean. By doing so, it forecast the formation of the storm before it actually formed, as well as its subsequent track. The successful ECMWF forecast was interpreted correctly by political authorities and heeded by the public, and it undoubtedly saved lives. If you want an example of science at its best, here it is.
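
To make the distinction concrete, here is a minimal, purely illustrative sketch (in Python) of the physics-based approach. It is emphatically not the ECMWF model: the grid, speeds, and units are invented for illustration. The point is methodological: observations enter only as the initial condition, and the forecast comes from integrating the governing equation forward in time.

```python
import numpy as np

# Toy 1-D linear advection equation, du/dt + c * du/dx = 0, solved with a
# first-order upwind scheme and periodic boundaries. Observations enter only
# as the initial condition u(x, 0); the forecast comes from integrating the
# physics forward, not from fitting a statistical model to past storms.
# All grid sizes, speeds, and units are invented for illustration.

nx, L = 200, 1000.0            # number of grid points, domain length ("km")
dx = L / nx
c = 10.0                       # advection speed ("km/h"), assumed constant
dt = 0.5 * dx / c              # time step chosen to satisfy the CFL condition
x = np.linspace(0.0, L, nx, endpoint=False)

# "Analysis" step: the observed initial state, a localized disturbance.
u = np.exp(-((x - 200.0) / 50.0) ** 2)

def step(u):
    """Advance the state one time step with the upwind scheme."""
    return u - c * dt / dx * (u - np.roll(u, 1))

# "Forecast" step: integrate forward to predict where the disturbance goes.
for _ in range(500):
    u = step(u)

print("Forecast peak location:", x[np.argmax(u)], "km")
```

A purely empirical alternative would instead fit a statistical model to past storm tracks, which is exactly where an unprecedented track like Sandy's defeats it.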

Make no mistake: meteorology as a science does make very judicious use of statistical and Monte Carlo methods. The atmospheric sciences, however, are driven primarily by methods based on subject matter knowledge, not strictly empirical methods such as those used by statisticians and data scientists (“superficial statistics,” in the devastating words of Salby, 2012). Note that both approaches are equally data-hungry.

Let's return to climate now. The whole point of climate change is that data in the future will not be like data from the past. As Showstack (2013) reports, Kathryn Sullivan, acting administrator of the National Oceanic and Atmospheric Administration (NOAA), gave the keynote address at the National Research Council's Board on Earth Sciences and Resources meeting in November. She said, “The past is no longer prologue when it comes to the risks we bear at any given place on this planet. The statistical pattern of our past cannot be relied upon fully to tell us what our future will be.” Isn't this a conundrum for a statistician or a data scientist? What good is the training data when we know it will not be informative about data from the future?
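
The conundrum is easy to demonstrate with a toy simulation (all numbers here are invented for illustration): a purely empirical forecaster calibrated on a stationary past is systematically wrong once the future contains a trend the training data never exhibited.

```python
import numpy as np

rng = np.random.default_rng(2013)

# "Past": 50 years of roughly stationary observations around a fixed mean.
past = 20.0 + rng.normal(0.0, 1.0, size=50)

# Purely empirical forecast: the future will look like the historical average.
empirical_forecast = past.mean()

# "Future": the same noise, plus a trend the training data never contained.
years_ahead = np.arange(1, 31)
future = 20.0 + 0.1 * years_ahead + rng.normal(0.0, 1.0, size=30)

errors = future - empirical_forecast
print(f"Mean forecast error over 30 years: {errors.mean():+.2f}")
print(f"Forecast error in year 30:         {errors[-1]:+.2f}")
```

The errors grow with time, and no amount of additional past data fixes this; only a model of the mechanism driving the trend can.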

In my view, the solution to all this is first-principles modeling, as climate scientists practice it. As with meteorology, in climatology knowledge of the physics and chemistry of the atmosphere and ocean, embodied in the partial differential equations of climate models, is preferred to the methods of statistics and data science for predicting both the weather and the climate.

References


Richard A. Kerr, 2013: In the hot seat. Science, 342: 688-689.

Murray L. Salby, 2012: Physics of the Atmosphere and Climate. Cambridge University Press, p. xvi.

Randy Showstack, 2013: Earth sciences and societal needs explored at National Research Council meeting. EOS, Transactions of the American Geophysical Union, 94 (48): 457-459.



The lunacy of the Apollo Lunar Landing Legacy Act

According to Hertzfeld and Pace (2013), last summer a bill was introduced in the US House of Representatives, H.R. 2617, called the Apollo Lunar Landing Legacy Act. The bill was sponsored by Reps. Donna Edwards (D-MD) and Eddie Bernice Johnson (D-TX). Hertzfeld and Pace write that “In essence, it proposes to designate the Apollo landing sites and U.S. equipment on the Moon as a U.S. National Park with jurisdiction under the auspices of the U.S. Department of Interior.”

The article explains the very real reasons why one might be concerned about protecting the historical artifacts on the Moon, given increasing space activity by other nations as well as increasing involvement by the private sector in space operations. However, the authors take pains to explain that the proposed law is a bad idea: it conflicts with international law, notably the U.N. Outer Space Treaty of 1967, and the bill is, moreover, “unenforceable.” Hertzfeld and Pace go on to propose addressing the legitimate concerns of the bill's sponsors through international agreements rather than unilateral U.S. legislation. I find their arguments persuasive. Writing for the Huffington Post, Leonard David points out that the watchdog group Citizens Against Government Waste gave the sponsors of the bill a “Porkers of the Month” award “for straying so far from reality's orbit, wasting the taxpayer's money on the paper and ink on which H.R. 2617 is written, and engaging in sheer 'lunarcy'.” Sounds about right to me.

We don't do politics on this blog unless it intersects with science policy.  As a matter of public policy then, DTLR opposes H.R. 2617.  The bill is well intentioned, but there are far better ways to achieve its goals.

Reference


Henry R. Hertzfeld and Scott N. Pace, 2013: International cooperation on human lunar heritage. Science, 342: 1049-1050.

When mice mislead

Two months ago, the Nov. 22, 2013, issue of Science announced the detection of high-energy neutrinos from beyond the solar system. This, of course, is a major achievement for physicists. However, my attention was drawn to another story in the same issue, “When mice mislead” (Couzin-Frankel, 2013). I regard it as one of the most important works of science journalism of the year just ended.

The article documents “bad habits” in studies of laboratory animals, such as mice, that can lead to non-reproducible results and misleading conclusions. The bad habits discussed include:
  • Dropping data, such as from animals enrolled in a study but excluded from the analysis for any number of reasons.
  • Lack of randomization and blinding.
  • No attention paid to inclusion/exclusion criteria for enrolling animals.
  • Different experimental conditions for different groups of animals.
  • Sample sizes too small to yield definitive results, since researchers have very good reasons (ethical and financial) to minimize the number of animals used for research (a rough power calculation illustrating this point appears after this list).
  • Publication bias, along the lines of Ioannidis (2005).
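
As promised above, here is a back-of-the-envelope power calculation, a sketch using the standard two-sample normal-approximation formula; the effect size, alpha, and power values are assumptions chosen only for illustration.

```python
from scipy.stats import norm

# Rough two-sample sample-size formula (normal approximation):
#   n per group ~= 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
# Assumed inputs, for illustration only: standardized effect size d = 0.5,
# two-sided alpha = 0.05, desired power = 0.80.
d, alpha, power = 0.5, 0.05, 0.80

z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96
z_power = norm.ppf(power)           # ~0.84

n_per_group = 2 * (z_alpha + z_power) ** 2 / d ** 2
print(f"Animals needed per group for d = {d}: about {n_per_group:.0f}")   # ~63

# Conversely, with only 10 animals per group, the smallest effect detectable
# at 80% power is much larger than d = 0.5:
n = 10
detectable_d = (z_alpha + z_power) * (2 / n) ** 0.5
print(f"Smallest detectable effect with n = {n} per group: d ~= {detectable_d:.2f}")  # ~1.25
```

The gap between what small groups can detect and the effects typically reported is one engine of the publication bias noted above.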
A good example of the randomization problem comes from Lisa Bero, interviewed in the article. Speaking of scientists and their mentors, Bero states that “Their idea of randomization is, you stick your hand in the cage and whichever one comes up to you, you grab. That is not a random way to select an animal.” Couzin-Frankel goes on to say that “Some animals might be fearful, or biters, or they might just be curled up in the corner, asleep. None will be chosen. And there, bias begins.”
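
For contrast, here is a minimal sketch of what genuine randomization and blinding could look like in practice (the animal IDs, group names, and seed below are hypothetical): the whole cohort is enumerated up front, assignment comes from a seeded shuffle rather than from whichever animal approaches the cage door, and the experimenter handles only opaque codes.

```python
import random
import string

# Hypothetical cohort: 20 animal IDs, two groups. The whole cohort is
# enumerated up front and assigned by a seeded random shuffle, not by
# whichever animal happens to come to the cage door.
animal_ids = [f"M{i:03d}" for i in range(1, 21)]
groups = ["treatment", "control"]

rng = random.Random(20140120)        # fixed seed makes the assignment auditable
shuffled = animal_ids[:]
rng.shuffle(shuffled)

# Alternating down the shuffled list gives balanced groups.
assignment = {aid: groups[i % len(groups)] for i, aid in enumerate(shuffled)}

# Blinding: the experimenter works only with opaque codes; the key linking
# codes to groups is held by someone not doing the measurements.
codes = {aid: "".join(rng.choices(string.ascii_uppercase, k=6)) for aid in animal_ids}
blinding_key = {codes[aid]: assignment[aid] for aid in animal_ids}

for aid in animal_ids[:3]:
    print(aid, "->", codes[aid])
```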

All of these bad habits are ones that have largely been eliminated from randomized clinical trials. Lab animal studies are traditionally pursued with far less rigor than clinical studies, but the article suggests that the good habits that dominate clinical trials could really clean up preclinical research if they were adopted widely there too. In other words, it's time to raise the level of the game in lab animal studies, and in practice it wouldn't take much additional effort to do so. In fact, this very article will help those of us who try to push back on bad habits: we now have a convenient summary of the evidence that such bad habits really matter and should be avoided.

Surprisingly, Lisa Bero found that industry-funded research is less likely to endorse a drug than research funded through other means, “maybe because companies don't want to pour millions of dollars into testing a treatment in people that's unlikely to help them.” This certainly has the ring of truth; however, I think that even within industrial labs, the good habits of clinical trials are not always pervasive among users of lab animals.

Joseph Bass, also interviewed, takes a less pessimistic view. He believes there are substantive reasons why many mouse studies fail to reproduce, such as the temperature at which mice are housed, or variation in their responses with age.  However, he seems too optimistic to me.  Nonreproducible research is more pervasive than I would like, and while fitness for purpose as a decision criterion should always overrule "one size fits all" rules and checklists, I think such rules and checklists will do more good than harm at this point in the history of science.

Couzin-Frankel (2013) refers to the ongoing effort by the NIH to draft rules for the research it funds, to encourage openness and reproducibility, as well as to the checklist for biology research promulgated by Nature last year (discussed here). She even says that Science is considering a similar policy! An NIH official is quoted as saying, “Sometimes the fundamentals get pushed aside—the basics of experimental design, the basics of statistics.” Amen!  This quote summarizes the problem in a nutshell.

I have been critical of Science as a follower, not a leader, on reproducible research. However, its publication of Couzin-Frankel's report goes a long way toward earning forgiveness.

References


Jennifer Couzin-Frankel, 2013: When mice mislead. Science, 342: 922-925.

John P.A. Ioannidis, 2005: Why most published research findings are false. PLoS Medicine, 2 (8), e124: 696-701.



Wrap-up on last year's Science and The Economist special issues

In this post I will wrap up my discussion of the Science special issue on “Communication in Science: Pressures and Predators,” from last October, and the Economist special feature on “How Science Goes Wrong” (Oct. 19-25, 2013, issue). I have discussed various aspects of the Science special issue previously, particularly the open access “sting” operation by John Bohannon (with a follow-up here). I wrote about the excellent article by Jennifer Couzin-Frankel in the Science special issue, as well as the Economist special issue, here.

In this post I first want to summarize the salient points from the policy forum article in the Science special issue, by Diane Harley (2013). The article begins by discussing the potential for vehicles of communication other than the traditional peer-reviewed journal article. Social media technology, the open source movement in computer science, and crowd-sourced projects such as Wikipedia illustrate the possibilities. The arXiv preprint server and open access journals are specific manifestations within the scholarly community, along with less laudable developments such as bibliometrics for evaluating the quality of a researcher, a journal, or an institution. Harley's research has found, however, that the scientific community, including its youngest members, has been resistant to these new developments. The traditional peer-reviewed article appears to be the least risky form of communicating research, particularly in view of funding, tenure, and promotion practices. Harley's study of 12 disciplines “revealed that individual imperatives for career self-interest, advancing the field, and receiving credit are often more powerful motivators in publishing decisions than the technological affordances of new media.”

The increasing deluge of publications, driven by the demands of funding, tenure, and promotion pressures, has resulted in an increased need for filtering research. The imprimatur of “good journals” is often used as just such a filter. Thus, the choice of where to publish is made based on three factors: prestige, time to publication, and visibility to a target audience.

Harley goes on to discuss how the final, peer-reviewed version of a paper receives the greatest weight, compared to preprints, working papers, conference papers, and the like. She also discusses the lack of traction that experiments in open peer review have had, as well as the unfortunate decision by two journals to stop publishing supplementary data because referees could not cope with reviewing such material. Finally, alternative bibliometrics based on social media can too easily be gamed.

I've touched on a few highlights of the paper that caught my attention; the full paper is well worth reading and pondering. It provides a good airing of the tensions regarding scientific communication that the infrastructure of our profession will need to resolve.

Next, I will mention that the December 6, 2013, issue of Science published a selection of letters and online comments reacting to the “Communication in Science” special issue. The only one I want to point to is a letter by Lopez-Cozar et al., describing how, as an experiment, they were able to game Google Scholar by uploading fake documents. This is an example of the vulnerability of alternative bibliometrics that Harley alludes to.

The Economist also published a selection of letters to its “How Science Goes Wrong” special issue in the November 9, 2013, issue. There is not much for me to comment on there either, with one exception. Professor Stuart Firestein, a Columbia University biologist, wrote a fairly critical letter. He writes, “Demanding that scientists be sophisticated statisticians is as silly as demanding that statisticians be competent molecular biologists or electrophysiologists. Both are professional abilities that are not likely to be mastered by the same people. I agree that every laboratory should have the services of a professional statistician, but that is a luxury available at best to a few wealthy labs.”

I think Firestein is missing the point, although the Economist did not do a good job of making the point I want to make. The most important contribution of statistics is not statistical methodology, but critical thinking. Much of that critical thinking is non-mathematical in nature, and I believe it could be taught to lay scientists by lay scientists. Unfortunately, even in statistics courses taught by professional statisticians, the kind of critical thinking I am speaking of is often absent. I shall seek to make this point more fully in another forum.


Reference


Diane Harley, 2013: Scholarly communication: cultural contexts, evolving models. Science, 342: 80-82.



Congratulations to the winners of the International Data Rescue Award in the Geosciences!

Last month at the American Geophysical Union fall meeting, the above award was given to the Nimbus Data Rescue Project of the National Snow and Ice Data Center, an organization at the University of Colorado, Boulder, funded by NOAA, NASA, and NSF.  According to Showstack (2014), the Nimbus project is "recovering, reprocessing, and digitizing infrared and visible data from the NASA-funded Nimbus 1, 2, and 3 weather satellites, the first of which launched in 1964.  None of the early Nimbus data had been available for 4 decades because of archaic data formats and difficulty in accessing film rolls."  Three runners-up for the prize were also chosen; you can read more about them here as well as in Showstack (2014).

The award is sponsored by the Integrated Earth Data Applications group at Columbia University and by Elsevier, which has a business interest in data stewardship services. Elsevier might be considered a controversial player, as many researchers are unhappy with its allegedly predatory journal pricing policies. Nonetheless, we should praise them when they do something right, and this appears to be a rare example.

I've written previously about data stewardship issues here and here. I am pleased that the scientific community is giving more and more attention to such issues, as exemplified by this new award program. I hope that it continues and that similar awards emerge for other sciences.  I am also more than relieved to begin a new year of blogging with some positive news.

Reference


Randy Showstack, 2014: Award program recognizes efforts to protect geoscience data. EOS, Transactions of the American Geophysical Union, 95 (1): 2.