Sunday, November 3, 2013

Data management plan to be included in open access mandate

This month's APS News has a front-page report by Michael Lucibella, “Open Access Mandate will Include Raw Data.” The story focuses on the forthcoming mandate from the U.S. Office of Science and Technology Policy (OSTP) requiring open access to journal papers derived from federally funded research, one year after publication. Lucibella says that although no official statement has been made, the mandate is expected to include a data management plan, so that data sets generated with public funds are made available to the public as well. The story quotes the OSTP memo stating that “scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze.” Specifics are “just starting to take shape.” The story goes on to outline various challenges to such a mandate.

One valuable feature of the mandate is that “Data points that have been expunged from the final analysis will likely have to be included, the idea being that scientists can evaluate why those points were eliminated.” In principle this is a good thing, but it will be nearly impossible to enforce. There may also be some subjectivity involved: in my view, data that clearly result from documented technical errors should probably not be included, and transcription errors should be corrected before posting. I would also like metadata to be included along with the raw data files.

The story states that computer codes would not be included in the mandate, “though talks are continuing over this point.” The story quotes statistician Victoria Stodden, who expresses concern that the omission of computer codes will obstruct reproducible research. I share Stodden's concern and I hope the mandate will include computer codes.

Modulo the concern about computer programs, DTLR endorses both the mandate to make journal articles public after one year and the mandate to make the data publicly available. I was a co-author on two publications where we provided supplemental information that included data sets and computer scripts. However, I've co-authored nearly 20 refereed papers in total, and most of them did not include such supplemental information. As a result, all these years later, it is impossible for me to reproduce any of that other work. (Caveat: some of this research was not financed by public funds; nonetheless I believe the principle should apply to all published research.) I wish such a mandate had been in place at the beginning of my career, so that all of my published work could be reproducible. With job changes and so on, I long ago lost track of the data sets and computer codes that were employed in doing the work reported in those papers.

Reference

