When 23andme offered a few select clients the opportunity to have the protein-coding portion of their genome sequenced, Gabe Rudy jumped at the chance. On Wednesday, he walked strangers through the results. His conclusion: most detected genetic "variants of interest" are either not variants or not interesting. "Clinics beware," he writes in a blog detailing the analysis.
23andme's standard service does not sequence people's DNA but instead probes for common variants. It then lists these variants alongside an analysis of health, ancestry, and other information, such as whether you carry a variant more often found in people who find that cilantro tastes soapy.
The exome sequence contained no such information, says Rudy. It was simply a list of "variant calls", or differences that had been found between the sequenced individual and the reference genome. There are several research software pipelines available to call variants. 23andme used what is probably the most popular one, which is available from the Broad Institute.
An executive at DNA analysis company Golden Helix, Rudy was much better prepared than most to tackle this list.
He took the files he received for himself (as well as for his wife and son) and poured them into his own company's software: the SNP and Variation Suite (SVS) and a freely available visualization and inspection tool called GenomeBrowse. Next, he began to assess the evidence behind his 151,000 variant calls and to put them in their biological context.
While whole-genome sequences cover all the DNA on all the chromosomes, exomes focus on the 2% or so of the genome that contains genes. Though exome sequencing aims to provide data for all protein-coding genes, only about three-quarters of genetic regions are profiled with enough accuracy for variants to be called confidently in a "research-grade" (30x) exome. Even with "clinical-grade" exomes, in which each DNA fragment is sampled 80 times or more, 5 to 15% of variants will still go uncalled. And those "low-coverage" regions vary with each exome. As a result, Rudy had variants called in his own genome that he couldn't compare with those in his wife and son.
The raw data contained about 151,000 variants, but that number dropped to 80,000 when he pulled out problematic variants. These were generally variants with too few reads (meaning not enough data had been collected for certainty) or variants with far too many reads, indicating they came from highly duplicated regions and so could not be reliably tied to particular genes or even chromosomes.
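The read-depth filter described above can be sketched in a few lines of Python. This is only an illustration, not the method Rudy or Golden Helix's software uses: the `VariantCall` record, the `has_trustworthy_depth` helper, and both thresholds are hypothetical, since the article gives no actual cutoffs.

```python
from dataclasses import dataclass

# Hypothetical cutoffs chosen purely for illustration; the article does not
# say what thresholds were actually applied.
MIN_DEPTH = 10   # fewer reads than this: not enough evidence
MAX_DEPTH = 500  # more reads than this: possibly a highly duplicated region


@dataclass
class VariantCall:
    chrom: str
    pos: int
    depth: int  # number of reads covering the site


def has_trustworthy_depth(call, min_depth=MIN_DEPTH, max_depth=MAX_DEPTH):
    """Return True if the call has neither too few nor far too many reads."""
    return min_depth <= call.depth <= max_depth


# Made-up example calls, not real data.
calls = [
    VariantCall("1", 1000, 3),     # too few reads
    VariantCall("2", 2000, 45),    # plausible coverage
    VariantCall("3", 3000, 4200),  # far too many reads
]

kept = [c for c in calls if has_trustworthy_depth(c)]
print([(c.chrom, c.pos) for c in kept])
```

In practice, sensible cutoffs would depend on the overall coverage of the exome in question, so fixed numbers like these should be treated as placeholders.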
Then he excluded variants that were quite common. (Not only are these unlikely to have catastrophic impacts on health, but about 17,000 are already interrogated by 23andme's publicly available, more interpretive services designed to detect common variation.) And then he excluded variants not expected to change the protein sequence, for example variants in introns (regions within genes that do not code for protein) as well as "synonymous" variants, which code for the same protein but with a different DNA sequence. (Such variants might have an effect on cells, but we don't know enough to figure out what it is.)
That left him with nearly 1,700 variants likely to affect the protein. Of these, about 197 were predicted to be of a sort called "loss-of-function", meaning that the protein would either not be made at all, or would be made in a radically different way.
With this list in hand, he turned to a database called OMIM (Online Mendelian Inheritance in Man) that catalogues genes associated with diseases. That left him with 40 variants. Then, he focused on genes in which both copies contained a variant, since most genetic diseases require problems in both copies of a gene to have a real impact. Now his list had been whittled down to just 16, a mere 0.01% of the total.
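To make the scale of that whittling concrete, the short Python sketch below tabulates the counts reported in this article at each stage and expresses each as a share of the raw calls. The stage labels and the `percent_of_total` helper are illustrative names, and the figure of 1,700 is the article's "nearly 1,700" rounded, so the output is approximate.

```python
# Counts as reported in the article; 1,700 is given there as "nearly 1,700".
funnel = [
    ("raw variant calls", 151_000),
    ("after removing problematic calls", 80_000),
    ("likely to affect the protein", 1_700),
    ("in genes catalogued by OMIM", 40),
    ("variant in both gene copies", 16),
]

total = funnel[0][1]


def percent_of_total(count, total=total):
    """Share of the raw variant calls that survive to this stage, in percent."""
    return 100.0 * count / total


for label, count in funnel:
    print(f"{label:35s} {count:>7,d}  {percent_of_total(count):.2f}%")
```

The last row works out to roughly 0.01%, matching the figure quoted above.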
Now was the time to reassess the evidence that each variant existed. For some variants, there were no reads supporting the variant call; the bad call came down to a bug in the variant-calling software. All in all, 5 of the calls proved to be bad when the evidence was examined. Four variants were actually common, but in a way that was hard to detect. And three were also found in his wife, a healthy, unrelated individual. Those three variants are probably common and will show up in population catalogues, such as the 1000 Genomes data, as those catalogues improve.
That left three variants that seemed to be real and also seemed to be in genes. Rudy has no idea what two of them mean, but one variant had been classified as pathogenic. Diseases associated with this gene include a fatal neonatal form and a late-onset form linked to a toxic build-up of metabolites of dietary protein.
To be sure, the variants that were whittled away hold medically relevant clues, but scientists have yet to piece them together. And Rudy's detective work with the 16 variants he deemed most interesting showed how quickly intriguing signals can evaporate.
While very similar approaches have helped find the cause of rare genetic diseases, we are still a ways away from helping healthy people routinely make medical sense of their genome, says Rudy. What a clinician needs to advise patients is very different from what the most readily available tools provide. Finding and assessing variants is "still largely a research field, and tools are built by researchers who are trying to push the limits of their data," he warns.
But even if few people may be able to make sense of their genome sequence, many more people will soon be able to obtain one, whether or not 23andme rolls out its exome service to more people. Late last week, a company called Gene By Gene announced a direct-to-consumer sequencing service. It will sequence exomes for $695 and whole genomes for $5495.
The company will explicitly not provide any interpretation services, says President Bennett Greenspan. The reasoning is that providing medical advice could run afoul of FDA regulations. Though genomicists say that few physicians know how to deal with genetic information, Greenspan bets that there are some hospitals and physicians without access to sequencing who are willing to give it a try. "We're not going to be the guys who will turn up our nose because a doctor wants to look at something and has three samples."
But if experience is any guide, the quest to figure out what all the letters mean will be more difficult than finding the letters in the first place.
[Note: This post has been edited to clarify definition of research-grade exomes and low coverage regions.]
Source: http://blogs.nature.com/news/2012/12/expert-tours-his-own-exome-and-finds-mainly-false-alarms.html