A few weeks ago, Science magazine
featured a very interesting story by Robert F. Service titled
“Biology's Dry Future.” The subtitle tells us, “The explosion
of publicly available databases housing sequences, structures, and
images allows life scientists to make fundamental discoveries without
ever getting their hands 'wet' at the lab bench.” The story
highlights two quotes from interviews. The first is by Atul Butte of
Stanford University School of Medicine: “I'm like a kid in a candy
store. There is so much we can do.” The second is by David
Heckerman of Microsoft Research: “You basically don't need a wet
lab to explore biology.”
The title of the story is not quite
accurate. Wet lab biologists will always be needed to do
experiments and generate data. What is novel here is the new breed of
biologist who works on data generated by other labs, but need not
have a lab themselves. This may be new to biology, but physicists
have long had a split between experimentalists and theorists,
recently joined by computationalists.
Of particular interest to DTLR are the
three “growing pains” mentioned in the article: data access,
data standardization, and genetic privacy. I will focus on the first
two here. Regarding data access:
In many cases, researchers who have spent their careers generating powerful data sets are reluctant to share. They may be hoping to mine it themselves before others make discoveries based on their work. Or the data may be raw and in need of further analyses or annotation. “These are really hard problems,” Butte says. “We need better systems to reward people that share their data.”
DTLR endorses that last sentence.
First of all, anyone who makes the effort to generate a good data set
should make the effort to document and annotate it for use. Even if
the data are never shared, pretending that they might be shared one day
instills the necessary discipline for documentation and annotation.
Moreover, if the work is publicly funded, then in my view the social
contract requires that the data be made available to the broader scientific community at some point, perhaps
after an appropriate time period of exclusive use, say, no more than
two years. (This is about the time needed for a grad student or
post-doc to squeeze at least one paper out of the results.) The new
Nature online journal for data sets would provide an excellent venue
to generate a peer reviewed publication for the data set alone,
rather than for the discoveries that can be made with it. Bear in mind that
in physics, Nobel prizes are awarded to both theorists and
experimentalists. Biology as a discipline should adopt a similar
cultural mindset to reward both wet bench and dry bench biologists.
Regarding standardization:
Not only do research groups file their data using different software tools and file formats, but also in many cases the design of the experiments—and therefore precisely what is being measured—can differ. Butte and others argue that dealing with multiple file formats is somewhat cumbersome but that the problem is surmountable. But it can be harder to account for differences in experimental design when comparing large data sets.
DTLR could not have said it better.
The core problem here is experimental design, and it will always be a
limiting factor for dry lab biologists trying to combine data from
more than one experiment. A similar problem exists in clinical
medicine, under the term 'meta-analysis', and I'm not sure there are
really good solutions there either. The best approach, in my view, is to take any
findings based on multiple data sets as tentative, exploratory, and hypothesis-generating, rather than definitive. The findings should then be
confirmed (or refuted) in a new experiment. This is where the dry lab
biologist might have to return to the bench.
Finally, DTLR cautions that dry lab
biologists should still spend some time in the lab, at least while in
training. There is no substitute for bench time for getting a feel
for how sloppy and imprecise experimental data can be, and where the
pitfalls and potential systematic and random errors may arise.
It is too easy for a dry bench scientist to take data found in a
database at face value. Spending time at the bench will provide a
needed reality check.
Reference
Robert F. Service, 2013: Biology's dry
future. Science, 342: 186-189.