Sunday, October 27, 2013

Biology's dry future

A few weeks ago, Science magazine featured a very interesting story by Robert F. Service titled “Biology's Dry Future.” The subtitle tells us, “The explosion of publicly available databases housing sequences, structures, and images allows life scientists to make fundamental discoveries without ever getting their hands 'wet' at the lab bench.” The story highlights two quotes from interviews. The first is by Atul Butte of Stanford University School of Medicine: “I'm like a kid in a candy store. There is so much we can do.” The second is by David Heckerman of Microsoft Research: “You basically don't need a wet lab to explore biology.”

The title of the story is not quite accurate. Wet lab biologists will always be needed to do experiments and generate data. What is novel here is a new breed of biologist who works on data generated by other labs and need not have a lab of their own. This may be new to biology, but physics has long had a split between experimentalists and theorists, recently joined by computationalists.

Of particular interest to DTLR are the three “growing pains” mentioned in the article: data access, data standardization, and genetic privacy. I will focus on the first two here. Regarding data access:

In many cases, researchers who have spent their careers generating powerful data sets are reluctant to share. They may be hoping to mine it themselves before others make discoveries based on their work. Or the data may be raw and in need of further analyses or annotation. “These are really hard problems,” Butte says. “We need better systems to reward people that share their data.”

DTLR endorses that last sentence. First of all, anyone who makes the effort to generate a good data set should make the effort to document and annotate it for use. Even if the data are never shared, pretending that they might be shared one day instills the necessary discipline for documentation and annotation. Moreover, if the work is publicly funded, then in my view the social contract requires that the data be made available to the broader scientific community at some point, perhaps after an appropriate period of exclusive use, say, no more than two years. (This is about the time needed for a grad student or post-doc to squeeze at least one paper out of the results.) The new Nature online journal for data sets would provide an excellent venue to generate a peer-reviewed publication for the data set alone, rather than for discoveries that can be made with it. Bear in mind that in physics, Nobel prizes are awarded to both theorists and experimentalists. Biology as a discipline should adopt a similar cultural mindset to reward both wet bench and dry bench biologists.

Regarding standardization:

Not only do research groups file their data using different software tools and file formats, but also in many cases the design of the experiments—and therefore precisely what is being measured—can differ. Butte and others argue that dealing with multiple file formats is somewhat cumbersome but that the problem is surmountable. But it can be harder to account for differences in experimental design when comparing large data sets.

DTLR could not have said it better. The core problem here is experimental design, and it will always be a limiting factor for dry lab biologists trying to combine data from more than one experiment. A similar problem exists in clinical medicine, under the term 'meta-analysis', and I'm not sure there are really good solutions there either. The best approach, in my view, is to take any findings based on multiple data sets as tentative, exploratory, and hypothesis-generating, rather than definitive. The findings should then be confirmed (or refuted) in a new experiment. This is where the dry lab biologist might have to return to the bench.

Finally, DTLR cautions that dry lab biologists should still spend some time in the lab, at least while in training. There is no substitute for bench time for getting a feel for how sloppy and imprecise experimental data can be, and for where the pitfalls and potential systematic and random errors may arise. It is too easy for a dry bench scientist to take data found in a database at face value. Spending time at the bench will provide a needed reality check.

Reference


Robert F. Service, 2013: Biology's dry future. Science, 342: 186-189.
