Wednesday, February 26, 2014

NGS Saves A Young Life


One of the most electrifying talks at AGBT this year was given by Joe DeRisi of UCSF, who gave a brief intro on the difficulty of diagnosing the root cause of encephalitis (as it can be autoimmune, viral, protozoal, bacterial and probably a few other causes) and then ran down a gripping case history which seemed straight out of House.


The patient in the case history was a 14 year old boy (which hit home; that’s my son’s age) born with Severe Combined Immunodeficiency Syndrome (SCID). He had previously received a bone marrow transplant from his mother.  In summer 2013 he swam on vacation in Puerto Rico and at a Florida resort (nervous murmurs from the crowd, many of whom had just swum at a Florida resort).  The household also had multiple cats (toxoplasmosis?).  Given the setting and the preamble, this was clearly going to be a zebra hunt.

I can’t do justice to all the stages of the progression of this patient’s illness, so I won’t try.  It started with a bout of conjunctivitis, then later uveitis.  Then more medical visits, with elevated white cell counts.  Given the patient’s complex background, sometimes he was treated for an infection, other times for possible Graft-versus-Host Disease (GvHD), and other times for a recurrence of SCID.  Clearly these are approaches which might not be simpatico: GvHD is treated with immunosuppressants, but for infections one might want to tune up the immune system.  Various assays for infection, using cultures and PCR, repeatedly came back negative.  Diagnostic tests became increasingly invasive, starting with spinal taps.

Eventually, the boy landed in the hospital and was there for over a month, with steadily worsening condition. A 1 cubic centimeter brain biopsy (shudders from the crowd) revealed inflammation, but no definite cause.  DeRisi showed a picture of the boy which his parents had made public; he was nearly encased in medical tubing. Due to worsening mental status, a coma was induced.  In desperation, the doctors approached DeRisi to use sequencing as an unbiased search for an occult pathogen.

So, a protocol was quickly thrown together and approved by an emergency Institutional Review Board (IRB).  DNA was purified from the brain biopsy and subjected to sequencing on a MiSeq.  The entire process from DNA to results took 2 days, with the team painfully aware that their patient might expire at any time. 

The sequence reads were mostly human, and after those were culled out much of the remainder was very pedestrian.  But 400+ reads mapped to Leptospira, a spirochete which can cause encephalitis and which often presents with eye infections.  Further sequencing fingerprinted it as a strain common in the Caribbean.  The IRB reconvened to consider the issue of treating a patient based on research data from a non-CLIA lab, but given the general safety of the recommended course (high-dose penicillin) and the grave condition of the patient, treatment was initiated and the boy recovered.   Further testing at the CDC with a specific PCR test for Leptospira, under CLIA conditions, was negative (but the test has a rated sensitivity of only 60%!) – a reminder that CLIA is solely a set of procedural requirements, not a guarantee of analytical validity.

It’s a great story (I hope DeRisi and team publish it), but it can also be seen as a great jumping off point for designing a system to tackle such cases.

For example, DeRisi stated that most of the first day was taken up by library preparation.  There are certainly faster library preps; Illumina claims 1.5 hours for a rapid Nextera XT protocol.  Should the protocol include a step to deplete human DNA?  NEB has a kit based on methylation, and Carlos Bustamante from Stanford talked about using RNA baits derived from human genomic libraries to enhance microbiome studies (or conversely, to enrich for ancient human DNA).  Would the extra complexity and time gain added sensitivity?  Could no-PCR or low-PCR library preps be used, to both accelerate the preparation and reduce the risk of carryover contamination?  Or would other tools, such as selecting sample barcodes from a very large pool of barcodes, be the best way to detect cross-contamination?
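On that last point, a large barcode pool helps most if every pair of barcodes differs at enough positions that a stray or mutated index is recognizable as foreign. A minimal sketch of greedily building such a pool (the barcode length, minimum distance, and pool size here are illustrative assumptions, not anything from DeRisi's protocol):

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))

def pick_barcodes(length: int = 6, min_dist: int = 3, limit: int = 24):
    """Greedily keep candidates that differ from every barcode already
    chosen at >= min_dist positions.  An observed index that is not
    within one error of exactly one pool member can then be flagged as
    possible cross-contamination rather than silently demultiplexed."""
    chosen = []
    for cand in ("".join(p) for p in product("ACGT", repeat=length)):
        if all(hamming(cand, kept) >= min_dist for kept in chosen):
            chosen.append(cand)
            if len(chosen) == limit:
                break
    return chosen

barcodes = pick_barcodes()
```

Commercial index sets are designed with similar (and stricter) distance constraints; the point of the sketch is only that a well-separated pool turns demultiplexing into an error-detecting code.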

There’s also the choice of platform and read characteristics.  DeRisi used MiSeq, which has some fast options.  I’ve heard some vocal proponents of the Ion platform for rapid analyses such as this.  How many reads are needed, and how long should they be?  What are the effects on downstream informatics?  And would this be a good target for some of the emerging platforms?  The David Jaffe (Broad Institute) talk on assembling with Oxford Nanopore data was a bit short on statistics, but perhaps this platform would have enough firepower to detect a pathogen, with the advantage of no PCR.  But is the desire simply to identify a bacterium, or should one be shooting to detect subtle features?  I doubt the latter is practical; the depth of the data DeRisi described was far short of being able to assemble or to detect some small feature, but should this be a goal?

If the key requirements for rapid pathogen identification are speed and read quantity, but with relaxed demands on base accuracy or read length, then this field may represent a huge opportunity for emerging sequencing technologies.  Several such companies were at the conference in different forms – Genapsys presented, Genia & PicoSeq had posters, and a charming fellow from Quantum Biosystems was showing off the evolutionary history of their chips in the bar.  If simple identification of pathogens is the goal, then perhaps lots of really noisy reads would be sufficient – and pathogen detection an early revenue opportunity for these companies – particularly if this sort of analysis becomes routine and expected at every major medical center.

The downstream informatics could be a rich source of innovation, as this took a significant amount of time (as long as the sequencing, if I recall correctly).  Could reads be scanned for human-ness as they are generated, with the sequencer only exporting non-human reads?  DeRisi used BLASTN versus all of GenBank after depleting the human reads, which is neither the fastest algorithm nor an ideal database.  GenBank has both a lot of redundancy and a lot of uninteresting genomes; if you identify Ficus reads, will you care in this setting?  Perhaps just matching the k-mer profile of the sequences would be sufficient.  How much time could be saved by a tool which wasn’t actually aligning to the human genome, but simply finding fragments that could align to the human genome?  Or do you just take their protocol and have ginormous compute resources on standby, using a large cloud very briefly to chew through the data?  And for clinicians who are not genomicists, how do you best present the end results?  DeRisi showed a taxonomy browser, but using one implies a certain degree of training and background.  Perhaps a list of bad actors reported in rank order of abundance makes more sense.
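To make the k-mer idea concrete, here is a toy sketch (in no way DeRisi's actual pipeline): flag reads whose k-mers are mostly found in a host index, without any alignment, then rank candidate taxa by how many surviving reads share k-mers with each reference. The sequences, taxon names, k, and cutoff below are all invented for illustration; a real tool would use a compact index over the whole human genome and a curated pathogen database.

```python
from collections import Counter

def kmers(seq: str, k: int) -> set:
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def host_fraction(read: str, host_index: set, k: int) -> float:
    """Fraction of a read's k-mers present in the host index."""
    km = kmers(read, k)
    return sum(m in host_index for m in km) / len(km) if km else 0.0

def screen_reads(reads, host_seqs, taxa, k=8, host_cutoff=0.5):
    """Drop probable host reads without full alignment, then count how
    many surviving reads share at least one k-mer with each taxon."""
    host_index = set().union(*(kmers(s, k) for s in host_seqs))
    taxon_index = {name: kmers(seq, k) for name, seq in taxa.items()}
    hits = Counter()
    for read in reads:
        if host_fraction(read, host_index, k) >= host_cutoff:
            continue  # probable host read: never leaves the machine
        for name, index in taxon_index.items():
            if kmers(read, k) & index:
                hits[name] += 1
    return hits.most_common()  # "bad actors" in rank order of abundance
```

The host filter answers exactly the cheaper question posed above: not "where does this read align in the human genome?" but "could this read align to the human genome at all?", and the ranked `most_common()` output is the sort of bad-actors-by-abundance report a non-genomicist clinician could act on.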

A truly poignant question is whether this infection could have been detected by NGS earlier and less invasively, before the patient and family went through so much suffering and anxiety.  Could it have been detected in the spinal taps?  I don't believe DeRisi addressed that question, though specific PCR assays designed from the sequence data failed to detect Leptospira in the blood samples.

DeRisi is setting up a center at UCSF to routinely use sequencing to identify pathogens, as well as a rat brain slice assay for autoimmunity (I just caught myself before explaining that one to someone over lunch!). Centers such as this will presumably work out the sorts of questions above: what are the requirements for this space, and what are the best ways to meet them?  There are some great opportunities here for the bioinformatics community to focus on something truly different and potentially life-saving, far better than dubious performance improvement claims for short read aligners.  And also more challenging: real-time bioinformatics has not received much attention, and carries with it some hard programming problems. If releasing human datasets has privacy concerns, perhaps DeRisi could release some of the data from his ursine subjects?  Diverse public datasets, and perhaps even CASP- or Assemblathon-style challenges, would seem very apropos for rapid pathogen detection, where the stakes are potentially very high.

15 comments:

nextgenseek said...

Thanks for the awesome post. Glad that you beat me to it :) I thought this great story got completely swept away by the Nanopore talk at AGBT. Really glad that you wrote it. I have storified tweets from Joe DeRisi's talk here

http://nextgenseek.com/2014/02/ngs-in-critical-care-a-feel-good-story/

Anonymous said...

"In desperation, the doctors approached DeRisi to use sequencing as an unbiased search for an occult pathogen." This part is fascinating to me. Do we know how the doctors knew to contact DeRisi? How did they know who he was, or what he might be able to do? If this were fiction, this would be a deus ex machina. It's that missing link that explains so much of what goes on in the world. Sure, maybe he plays golf with one of the doctors and it was totally serendipitous. Or is there now a consciousness among clinicians of NGS (and knowledge of the people doing the sequencing and bioinformatics) such that stories like this actually happen all the time? I'm just curious.

Keith Robison said...

Anonymous: that's how DeRisi presented it, in my memory - a desperation call. Whether they were aware of his polar bear work he didn't say; I found that when checking whether the story had been published. Not being in an academic medical center, I can't comment on the degree to which this communication occurs -- but it certainly needs to!

Unknown said...

Many thanks Keith for this write-up. (You saved me the effort!)

There are a few items that I've thought about since that great talk, in order:

1) A bit of a custom-tailored situation to demonstrate the power of NGS for a critical-care case. You point out the multiple possible etiologies for encephalitis, and in this case, since the infectious diagnostics came back negative, the physicians presumed it was autoimmune in nature. And the patient's condition only grew worse. So here was a case where the patient was both immune-compromised and exposed to a possible environmental pathogen.

2) DeRisi focused his efforts on the 90 min of analysis and how it could be accelerated, with no detail about how the sequencing could have been sped up. Thanks for the acknowledgement that the Ion Torrent PGM could have been used to shave at least 10h if not 12h from the sample-to-answer cycle.

3) Also, Joe did not mention anything at all about the 7h sample preparation. He knew he was working with an unknown causative agent, possibly including fungus and virus. In the Q&A he was asked about it; he just mentioned it was a 'total nucleic acid prep', which presumably meant accounting for both RNA and DNA viruses, along with fungi (which could pose problems of their own) and Gram+ and Gram- bacteria. But DeRisi knew all this, and knew how to prepare separate libraries, equalize/pool them, and sequence.

Bonus point 4: Out of 1570 cases in the past 7y, a full 63% went undiagnosed. So there's a real unmet healthcare need here that NGS can solve.

Thanks again for the post.

Dale

xiechaos said...

Thanks for sharing this. When doctors were trying to use culture to identify the pathogen infecting my 2-month-old son a month ago, I strongly felt sequencing should be used for such purposes.

I also like your proposal for a CASP- or Assemblathon-style challenge. There was such a challenge organized by DTRA of the Department of Defense last year (DTRA Algorithm Challenge, 1 million dollar prize) covering exactly the same problem. We won the challenge by developing a series of new algorithms, such as a fast host read filter, a fast and sensitive GenBank alignment tool (it has to work with reads from MiSeq, Ion Torrent, 454 and PacBio), and an accurate taxa assignment algorithm. Hopefully those algorithms will be released soon, and hopefully sequencing will be routinely used in hospitals to detect pathogens soon.

Jonathan Jacobs said...

Great post Keith - your comment on using the Ion platform for "rapid ID of pathogens" is spot on - some protocols in my lab are <12 hours at the moment (AmpliSeq) for biosurveillance purposes. The MiSeq, though, is a very capable, fast, brute-force "metagenomics" platform when it needs to be as well (speaking from experience). But in a diagnostics/biosurveillance role you're not _really_ doing metagenomics. The basic question is "Given a list of pathogens (maybe a fairly big list, but smaller than GenBank) - are any of these bugs in this sample?"

IMHO - this is where NGS is going for rapid pathogen diagnostics. In a few years, given the rate of NGS innovation, the days of PCR testing samples for pathogen ID will be dead.

Unknown said...

The quality of the reads coming from an NGS platform is key for diagnostic purposes. Turnaround time is fine, but quality is everything!

klontok said...

@Anonymous: I was a grad student in the DeRisi lab. Before switching over to NGS, the lab did similar pathogen hunting using a custom-designed microarray called the ViroChip. In fact, they (I say "they" because I worked on a completely different project) helped identify the causative agent for SARS, and there were several projects in the lab on identifying novel viruses in patient samples of unknown etiology from Bay Area hospitals. So, the lab is actually somewhat well-known for this type of thing.

Torsten Seemann said...

Thanks for this story - it is particularly interesting to me having worked on Leptospira genomics since I started in bioinformatics 11 years ago!

The k-mer approach for identification is a good strategy, recently implemented in Kraken by Wood and Salzberg: http://ccb.jhu.edu/software/kraken/

Leptospira has now been at AGBT, on The Simpsons, on Mythbusters, and on The Big Bang Theory! :-)

Mark Pallen said...

Thanks for the interesting post: a triumph for diagnostic metagenomics! Readers might be interested in some of our recent publications on this approach:
Diagnostic metagenomics: potential applications to bacterial, viral and parasitic infections
http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=9186805

A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of Shiga-toxigenic Escherichia coli O104:H4
http://jama.jamanetwork.com/article.aspx?articleid=1677374

Metagenomic analysis of tuberculosis in a mummy
http://www.nejm.org/doi/full/10.1056/NEJMc1302295

PasserBy said...

I think the bigger issue is the failure of the directed PCRs. What this shows is a disconnect in the curation of the PCR or similar technique (qPCR, Sanger sequencing, fusion NGS) that should keep the sequence design current with the correct specifications. NGS is a great tool and I use it quite often for specific cases like this; however, after reviewing what was done and not done, it almost always comes down to a failure to update the oligo design for current strain and region information that causes a specific assay to fail. Other times, the time point at which the sample is collected and analyzed means the target is not present in the sample. NGS is a great tool, but for several hundred/thousand dollars per sample it would be much more beneficial to the patient, and more cost-effective, to run an accurately designed and curated qPCR or panel of qPCRs in a half day than to spend the effort required by NGS. That said, each molecular analysis tool has its place. The key to successful molecular analysis is rapid, redundant, and repetitive bioinformatics curating the assay designs in silico on a regular and ongoing basis. Good to hear the causative agent was identified in this case.

Keith Robison said...

PasserBy: thanks for the comment! While I agree with you that a good qPCR panel could be an option, the appeals of sequencing are that, on the one hand, it empowers detection of a very broad range of possible pathogens, and on the other, the cost of sequencing is still on a steady downward drop, whereas qPCR is a pretty mature technology. Shotgun sequencing also has the advantage of not requiring any assay-specific reagents to be pre-positioned at point-of-care; for this setting, time is critical.