It sounds like a good problem to have, but it is still a very real problem: we are now producing genomic and metagenomic data much faster than we can analyze it. Two of BEACON’s own scientists are quite familiar with this problem, and were quoted in today’s New York Times article, “DNA Sequencing Caught in Deluge of Data.”
C. Titus Brown, a leader of BEACON’s thrust group 1 (Evolution of Genomes, Networks, and Evolvability) who also teaches one of BEACON’s core multidisciplinary graduate courses, points out the problem of too much data:
The Human Microbiome Project, which is sequencing the microbial populations in the human digestive tract, has generated about a million times as much sequence data as a single human genome, said C. Titus Brown, a bioinformatics specialist at Michigan State University.
“It’s not at all clear what you do with that data,” he said. “Doing a comprehensive analysis of it is essentially impossible at the moment.”
Professor Brown of Michigan State said: “We are going to have to come up with really clever ways to throw away data so we can see new stuff.”
E. Virginia Armbrust, University of Washington professor and BEACON researcher, comments on the overwhelming amount of data produced by metagenomics projects:
E. Virginia Armbrust, who studies ocean-dwelling microscopic organisms at the University of Washington, said her lab generated 60 billion bases — as much as 20 human genomes — from just two surface water samples. It took weeks to do the sequencing, but nearly two years to then analyze the data, she said.
“There is more data that is infiltrating lots of different fields that weren’t particularly ready for that,” Professor Armbrust said. “It’s all a little overwhelming.”
One thing is clear: BEACON is at the forefront of bioinformatics research and is poised to find new solutions to this unusual problem.