Monday, September 16, 2013

Data stewardship

In last week's issue of Eos, Karen Simmons (2013) writes about data stewardship and her experience of attempting to recover and migrate data from long-completed NASA space probe missions of the 1970s and 1980s. Some of the data (on 7-track tapes and punch cards) was about to be discarded because the storage facility was being shut down. Much of the metadata needed to interpret the raw data is buried in mission documents that may be lost as personnel retire, change jobs, or even change offices. The article is well worth reading and, though she doesn't mention it, the issues she raises are closely related to reproducible research. The difficulties she describes are particularly extreme: data from expensive, taxpayer-funded space probes, each unique in its own way but long decommissioned, were being casually discarded. Still, any of us doing experimental or observational science has much to learn from her experience. Sadly, she finds that "usable science data and metadata from the early days of planetary exploration are now missing from the archives." Don't let this happen to you!

Reference


K. E. Simmons, 2013: Lost Science: Protecting Data Through Improved Archiving. Eos, Transactions of the American Geophysical Union, 94(37): 323-324.

Saturday, September 7, 2013

The AMS Data Policy Statement: Full and Open Access to Data

The American Meteorological Society has a draft statement on Full and Open Access to Data that is open for comments from members until Oct. 3. As a member, I submitted the following comment:

"I applaud the principles outlined in this document, and AMS's willingness to take a definite stand on the issues discussed. I am particularly in favor of strong encouragement for academic journals and funding agencies to *require* that data sources be clearly identified and publicly available, unless a justification can be given. What the policy fails to address, perhaps because it is out of scope, is a fuller endorsement of the principles of reproducible research, which would also require making computer code publicly available, specifying the software options and settings used, and fully specifying any data reduction or manipulation carried out between the raw and the analyzed data sets (e.g., filtering, or interpolation to convert unequally spaced time series into equally spaced ones). Thank you for this opportunity to comment."
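To make "full specification" of a data reduction step a little more concrete, here is a minimal Python sketch of one such step: resampling an unequally spaced time series onto an equally spaced grid. It assumes plain linear interpolation with no extrapolation; that method, the function name `resample_linear`, and the fields of the provenance record are all hypothetical choices for illustration, not anything the AMS draft prescribes. The point is simply that the procedure returns a record of exactly what was done alongside the result, so the record can be archived with the data.

```python
import json
from bisect import bisect_right


def resample_linear(times, values, step):
    """Linearly interpolate an unequally spaced series onto a uniform grid.

    Returns (grid, resampled_values, provenance). The provenance dict uses a
    hypothetical schema; its purpose is to record every choice made.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching (time, value) samples")
    if any(t2 <= t1 for t1, t2 in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")
    if step <= 0:
        raise ValueError("step must be positive")

    # Uniform grid from the first sample up to (not beyond) the last one,
    # so no extrapolation is ever performed.
    grid = []
    n = 0
    while times[0] + n * step <= times[-1]:
        grid.append(times[0] + n * step)
        n += 1

    resampled = []
    for t in grid:
        # Index of the input interval [times[i], times[i + 1]] containing t.
        i = min(bisect_right(times, t) - 1, len(times) - 2)
        t0, t1 = times[i], times[i + 1]
        w = (t - t0) / (t1 - t0)
        resampled.append(values[i] + w * (values[i + 1] - values[i]))

    provenance = {
        "method": "linear interpolation",
        "step": step,
        "start": grid[0],
        "end": grid[-1],
        "extrapolation": "none",
        "n_input": len(times),
        "n_output": len(grid),
    }
    return grid, resampled, provenance


if __name__ == "__main__":
    grid, vals, record = resample_linear([0.0, 1.0, 3.0, 4.0],
                                         [0.0, 2.0, 6.0, 4.0], 1.0)
    print(vals)
    # Archive this record alongside the resampled data.
    print(json.dumps(record, indent=2))
```

Any other interpolation method would do just as well for the argument; what matters is that the method and its settings are written down in machine-readable form rather than left in someone's memory.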

The Diffusion Tensor Literary Review (DTLR) similarly endorses the draft statement on Full and Open Access to Data.