Entries tagged with “genomics” from O'Reilly Radar
Four short links: 20 November 2009
Social Network Search for Morons, Bulking Up Bio Data, Better E-Mail, Better Standards
by Nat Torkington | @gnat | comments: 1
- Spokeo -- abysmal indictment of society, first prize in mankind's race to the bottom. Uncover personal photos, videos, and secrets ... GUARANTEED! Spokeo deep searches within 48 major social networks to find truly mouth-watering news about friends and coworkers. PS, anybody who gives their gmail username and password to a site that specializes in dishing dirt can only be described as a fucking idiot. (via Jim Stogdill, who was equally disappointed in our species)
- Biologists rally to sequence 'neglected' microbes (Nature) -- The Genomic Encyclopedia of Bacteria and Archaea is a project to sequence genomes from more branches of the evolutionary tree of life. Eisen's team selected and sequenced more than 100 'neglected' species that lacked close relatives among the 1,000 genomes already in GenBank. The researchers reported earlier this year at the JGI's Fourth Annual User Meeting that even mapping the first 56 of these microbes' genomes increased the rate of discovery of new gene and protein families with new biological properties. It also improved the researchers' ability to predict the role of genes with unknown functions in already sequenced organisms. (via Jonathan Eisen)
- Mail Learning: The What and the How (Simon Cozens) -- a few things that a really good mail analysis tool needs to do. I hope that my mail client and server do these out of the box in the next five years.
- Introducing the Open Web Foundation Agreement -- The Open Web Foundation Agreement itself establishes the copyright and patent rights for a specification, ensuring that downstream consumers may freely implement and reuse the licensed specification without seeking further permission. In addition to the agreement itself, we also created an easy-to-read "Deed" that provides a high level overview of the agreement. Applying the open source approach to better standards.
tags: bio, data, email, genomics, idiots, opensource, search, social graph, social software, standards
Four short links: 10 November 2009
DIY Diagnostic Chips, Genetics on $5k a Genome, Cellphones as Diagnostic Microscopes, AR-Equipped Mechanics Do It Heads-Up
by Nat Torkington | @gnat | comments: 0
- A children’s toy inspires a cheap, easy production method for high-tech diagnostic chips -- microfluidic chips (with tiny liquid-filled channels) can cost $100k and more. Michelle Khine used the Shrinky Dinks children's toy to make her own. "I thought if I could print out the [designs] at a certain resolution and then make them shrink, I could make channels the right size for microfluidics," she says. (via BoingBoing)
- Complete Genomics publishes in Science on low-cost sequencing of 3 human genomes (press release) -- The consumables cost for these three genomes sequenced on the proof-of-principle genomic DNA nanoarrays ranged from $8,005 for 87x coverage to $1,726 for 45x coverage for the samples described in this report. Drive that cost down! There's a gold rush in biological discovery at the moment as we pick the low-hanging fruit of gross correlations between genome and physiome, but the science to reveal the workings of cause and effect is still in its infancy. We're in the position of the 18th century natural philosophers who were playing with static electricity, oxygen, anaesthetics, and so on but who lacked today's deeper insights into physical and chemical structure that explain the effects they were able to obtain. More data at this stage means more low-hanging fruit can be plucked, but the real power comes when we understand "how" and not just "what". (via BoingBoing)
- Far From a Lab? Turn a Cellphone into a Microscope (NY Times) -- for some tests, you can use a camphone instead of a microscope. In one prototype, a slide holding a finger prick of blood can be inserted over the phone’s camera sensor. The sensor detects the slide’s contents and sends the information wirelessly to a hospital or regional health center. For instance, the phones can detect the asymmetric shape of diseased blood cells or other abnormal cells, or note an increase of white blood cells, a sign of infection, he said.
- Augmented reality helps Marine mechanics carry out repair work (MIT TR) -- A user wears a head-worn display, and the AR system provides assistance by showing 3-D arrows that point to a relevant component, text instructions, floating labels and warnings, and animated, 3-D models of the appropriate tools. An Android-powered G1 smart phone attached to the mechanic's wrist provides touchscreen controls for cueing up the next sequence of instructions. [...] The mechanics using the AR system located and started repair tasks 56 percent faster, on average, than when wearing the untracked headset, and 47 percent faster than when using just a stationary computer screen.
tags: augmented reality, diybio, genomics, hacks, medicine, mobile, sensors
Four short links: 29 July 2009
by Nat Torkington | @gnat | comments: 3
- Bioweathermap -- crowdsourcing the gathering of environmental samples for DNA sequencing to study the changing distribution of microbial life. Another George Church project. (via timoreilly at Twitter)
- We Are All African Now -- a great article about our genetic history and the computational genomics that makes it possible. (via Tim Bray)
- Standing Out In The Crowd -- OSCON keynote by Kirrily Robert on women in open source. Excellent.
- Energy Harvesting Powers Printed LED -- an interesting combination of two emerging technologies. Like an RFID, the circuit has a current induced by the presence of a changing RF field. The EL display and the RFID circuit are printed in organic compounds, whereas the power control is built with traditional circuit fabrication techniques. (via Freaklabs)
tags: bio, energy, gender imbalance, genomics, history, materials science, opensource, oscon
Sequencing a Genome a Week
Radar Talks to OSCON Speaker David Dooling
by James Turner | comments: 3
Running time: 34:51
The Human Genome Project took 13 years to fully sequence a single human's genetic information. At Washington University's Genome Center, they can now do one in a week. But when you're generating that much data, just keeping track of it can become a major challenge in itself. David Dooling is in charge of managing the massive output of the Center's herd of gene sequencing machines, and making it available to researchers inside the Center and around the world. He'll be speaking at OSCON, the O'Reilly Open Source Convention. His talk, titled The Freedom to Cure Cancer: Open Source Software in Genomics, will be about how he uses open source tools to keep things under control, and he agreed to talk about how the field of genomics is evolving.
James Turner: Can you start by describing what it is you do and how you came to be doing it?
David Dooling: Sure. I work at the Genome Center at Washington University in St. Louis. We are one of the handful or so of large scale genome sequencing centers around the world. What that means is essentially we participate in large genome sequencing projects that some people may have heard of, like the Human Genome Project, Thousand Genomes Project, things like that. And involved in that is a lot of data processing, laboratory processing, tracking and all sorts of things, so it's a rather large enterprise.
There are about 300 or so people that work here. And how I came to work here was that about eight years ago, I decided that I wanted to get more into programming and more into open science. So I took a job as a programmer here at the Genome Center and gradually worked my way around to where I am now, where I oversee all of the software development and IT infrastructure here at the Genome Center. And it's a fairly large IT infrastructure.
We have somewhere around three petabytes of storage online, and somewhere north of 3,000 cores in our computational cluster. And we're generating terabytes, tens of terabytes of data, per day with our current sequencing instruments. The sorts of things that we're doing now as we transition from more fundamental evolutionary types of projects, such as the Human Genome Project and subsequent projects like the Mouse Genome Project, we've done things like corn and things of that nature, now we're doing more and more sequencing projects related to medicine and medical sequencing.
Last year, we published the full cancer genome sequence. In doing both the cancer and the normal, we were able to determine the differences between those two genomes and begin to identify what might've possibly caused cancer in that individual. So projects like that. We're also doing projects with metabolic syndromes, like diabetes, and several other cancer projects as well. That's essentially what we're doing and how we're doing it and how I got here.
James Turner: Genomics is an area that seems to be on the steep part of the hockey stick curve right now. In just a decade, we've gone from sequencing one genome over a period of years to doing them routinely. Can you talk a bit about what's enabled this acceleration?
David Dooling: Well, a whole host of things. But I think really at the core was the changing fundamentals of sequencing itself. For a long time, DNA sequencing was based on a process invented by Sanger, sometimes called Sanger Sequencing, sometimes called capillary electrophoresis now because of the last revision of the instruments that were generated. But essentially with that approach, you did reactions in 96 plate wells. You processed sequence in these 96 plate well chunks. And you did reactions in there. You loaded them on the readers, and the readers read out sequence for each of those 96 wells. So that's sort of how you processed it. And at the height of that sort of sequencing, which was only a few years ago, we had about 130 or so of those instruments each churning about 15 to 20 runs per day. Each run gave you 100 pieces of sequences. You had 100 or so machines. And so you got on the order of a few thousand sequence reads, that's what we called them, because of the way the instrument read the information.
Now, since that time, 454 was first [of the new generation of sequencers] and then Solexa came, which was later bought out by Illumina, and the ABI SOLiD has a platform. There's one from Helicos as well. And then several other third generation, those first being the second generation, sequencers have come out. And what those do is greatly increase the parallelism with which you're able to process DNA and sequence it. So instead of a few thousand runs per day, or a few thousand reads per day, you may get a few million reads per run. And these runs, for some of the platforms, do take a little bit longer. But the parallelism of it increases your throughput tremendously. And so now we have about 35 to 40 of these highly parallel instruments in-house. And with that, we're able to sequence the human genome to complete coverage in less than a week.
So the main driver has been this change in the sequencing technology and the parallelism of it. It's a fundamentally different chemistry, different physics. The flipside of it is that we talked about the hockey stick, and so that hockey stick is the sequencing hockey stick, but it's brought several other hockey sticks along with it, mainly the amount of data that these things generate. And the amount of processing power that is required to process that data has increased greatly as well. Much faster than Moore's Law over the last two years or so. Whereas with those original instruments, you would generate on the order of megabytes per day, now we're doing tens of terabytes per day with these new instruments. And then processing that, instead of taking a single processor a few minutes, it can take a small cluster a few days to actually analyze the data from each of these runs.
Those are the main things. The enabling technology was the change in the sequencing chemistry itself. And then what had to come along with that was building these infrastructures to be able to track these things and process these things and store all of this data as the instruments increased in their abilities.
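The capillary-era figures Dooling quotes can be turned into a rough back-of-envelope calculation. The sketch below is only an illustration: the function and variable names are made up for this example, and it assumes one read per well of a 96-well plate, with every instrument running at the quoted rate.

```python
# Back-of-envelope throughput for the capillary (Sanger) era, using the
# figures quoted in the interview. All names here are illustrative.

INSTRUMENTS = 130        # "about 130 or so of those instruments"
RUNS_PER_DAY = (15, 20)  # "about 15 to 20 runs per day" per instrument
READS_PER_RUN = 96       # assumption: one read per well of a 96-well plate


def daily_totals(instruments, runs_per_instrument, reads_per_run):
    """Return (runs per day, reads per day) across the whole fleet."""
    runs = instruments * runs_per_instrument
    return runs, runs * reads_per_run


for runs_per_instrument in RUNS_PER_DAY:
    runs, reads = daily_totals(INSTRUMENTS, runs_per_instrument, READS_PER_RUN)
    print(f"{runs_per_instrument} runs/instrument/day: "
          f"{runs:,} runs and {reads:,} reads per day")
```

Under these assumptions the fleet comes to roughly 2,000 to 2,600 runs and about 190,000 to 250,000 reads per day. For comparison, Dooling puts a single run on one of the newer instruments at a few million reads.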
tags: genomics, informatics, interviews, open source, oscon
Four short links: 9 Jan 2009
by Nat Torkington | @gnat | comments: 0
Four questions, one per link: what next, can it solve a big problem, what's the final boss for Python programming, and why on earth would anyone want yogurt that glows in the dark?
- End Times - gloomy piece on the future of journalism, to be added to the large pile of other gloomy pieces on the future of journalism (e.g., Bad News, Good News). The critical problem is still how to pay for journalism if the new media revenues are significantly lower than old, and if the new media economics decree that journalism is dead then who fills the social good role that journalism's death will leave?
- Ward Cunningham's Visible Workings - an intriguing glimpse, from March last year, into the way Ward lays out web interactions. Nice system for laying out these interactions, but it's also fascinating for how it makes transparent what will happen as a result of the data you submit. How scalable is this? Could it tackle privacy?
- Project Euler - fun programming exercises that require more than math to finish. We learn by doing, not by reading, so interesting exercises are part and parcel of training. It's interesting to see educators are moving from being authors to being game designers, providing a series of staged challenges that make us stronger by defeating them. I'm presently dying in as many ways as I can while learning iterators and generators in Python, as a way of ensuring I have Python's "game physics" sussed.
- Rise of the Garage Genome Hackers - more on hobbyist molecular biology. It mentions DIYBio, the Cambridge biohacker collective that I first heard about at BioBarCamp. (via Glyn Moody)
tags: biology, design, diy, education, games, genomics, journalism, make, media, programming, python
Challenges for the New Genomics
by Matt Wood | comments: 14
New guest blogger Matt Wood heads up the Production Software team at the Wellcome Trust Sanger Institute, where he builds tools and processes to manage tens of terabytes of data per day in support of genomic research. Matt will be exploring the intersection of data, computer technology, and science on Radar.
The original Human Genome Project was completed in 2003, after a 13-year worldwide effort and a billion dollar budget. The quest to sequence all three billion letters of the human genome, which encodes a wide range of human characteristics including the risk of disease, has provided the foundation for modern biomedical research.
Through research built around the human genome, the scientific community aims to learn more about the interplay of genes, and the role of biologically active regions of the genome in maintaining health or causing disease. Since such active areas are often well conserved between species, and given the huge costs involved in sequencing a human genome, scientists have worked hard to sequence a wide range of organisms that span evolutionary history.
This has resulted in the publication of around 40 different species' genomes, ranging from C. elegans to the Chimpanzee, from the Opossum to the Orangutan. These genomic sequences have helped progress the state of the art of human genomic research, in part, by helping to identify biologically important genes.
Whilst there is great value in comparing genomes between species, the answers to key questions of an individual's genetic makeup can only be found by looking at individuals within the same species. Until recently, this has been prohibitively expensive. We needed a quantum leap in cost-effective, timely individual genome sequencing, a leap delivered by a new wave of technologies from companies such as Illumina, Roche and Applied Biosystems.
In the last 18 months, new horizons in genomic research have opened up, along with a number of new projects looking to make a big impact (the 1000 Genomes Project and International Cancer Genome Consortium to name but two). Despite the huge potential, these new technologies bring with them some tough challenges for modern biological research.
High throughput
For the first time, biology has become truly data driven. New short-read sequencing technologies offer orders of magnitude greater resolution when sequencing DNA, sufficient to detect the single-letter changes that could indicate an increased risk of disease. The cost of this enhanced resolution comes in the form of substantial data throughput requirements, with a single sequencing instrument generating terabytes of data a week--more than all biological protocols to date. The methods by which data of this scale can be efficiently moved, analyzed, and made available to scientific collaborators (not least the challenge of backing it up), are cause for intense activity and discussion in biomedical research institutes around the globe.
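To put those volumes in network terms, here is a minimal sketch that converts a data volume into the average transfer rate needed to keep up with it. It assumes decimal units (1 TB = 10**12 bytes) and uses 1 TB per week and 10 TB per day as illustrative lower bounds for "terabytes of data a week" and "tens of terabytes of data per day"; the function name is hypothetical.

```python
# Sustained transfer rate needed to keep up with a given data volume.
# Assumes decimal units (1 TB = 10**12 bytes); the volumes are illustrative.

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6


def sustained_mb_per_s(terabytes, days):
    """Average MB/s required to move `terabytes` within `days`."""
    return terabytes * BYTES_PER_TB / (days * SECONDS_PER_DAY) / BYTES_PER_MB


# One instrument producing 1 TB in a week.
print(f"1 TB/week: about {sustained_mb_per_s(1, 7):.2f} MB/s")
# A facility producing 10 TB in a day.
print(f"10 TB/day: about {sustained_mb_per_s(10, 1):.1f} MB/s")
```

These are averages over the whole period. Copying the same data a second time for backup or for collaborators doubles the figure, which is part of why moving and storing it gets so much attention.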
Very rapid change
Scientific research has always been a relatively dynamic realm to work in, but the novel requirements of these new technologies bring with them unprecedented levels of flux. Software tools built around these technologies are required to bend and flex with the same agility as the frequently updated and refined underlying laboratory protocols and analysis techniques. A new breed of development approaches, techniques and technologies is needed to help biological researchers add value to this data.
In a very short space of time the biological sciences have caught up with the data and analysis requirements of other large scale domains, such as high energy physics and astronomy. It is an exciting and challenging time to work in areas with such large scale requirements, and I look forward to discussing the role of distribution, architecture and the networked future of science here on Radar.
tags: genomics, informatics, science, software