Monday, March 20, 2017

plexWell: Illumina Libraries by the Plateload

The advent of so-called next generation sequencers, particularly those from Illumina, has brought the price of sequence data down dramatically.  However, there is a catch: the cost of preparing DNA to go into the sequencer, the process known as library preparation, has glided downwards on a much shallower trajectory.  This means that for projects wishing to sequence very large numbers of small genomes or large constructs, the cost of library preparation can be similar to or even exceed the cost of data generation.  A small company north of Boston called seqWell Inc has a new approach to Illumina library generation which they are on the cusp of making widely available.  Not only does it bring the cost per well down, but it is also designed to yield normalized libraries from relatively unnormalized samples.

Note that I've prepared this piece after interviewing seqWell, but I'm also very familiar with the technology as we have been accessing it for the last couple of years via seqWell's contract sequencing service.  I did let seqWell review this prior to release, as I must be mindful of the confidentiality agreements I am bound by through my employer.  seqWell was also generous in allowing me to steal images.

First, a quick summary of the current state of Illumina library preparation.  A number of general strategies for library preparation have been developed, but they largely fall into two camps.  What I will call "conventional" libraries start by shearing the DNA by physical, chemical or enzymatic means.  The fragmented DNA is end repaired, tailed and then ligated to adapters.  Such preparations can give great control and uniformity for library properties such as insert size and can (particularly with physical shearing) give very uniform coverage, but are difficult to perform en masse except with expensive liquid handling robots.

In tagmentation approaches, transposase synaptic complexes (transposase primed with an oligonucleotide) are used to both fragment the DNA and link the sequencing adapters.  The most popular and best known such approach is Nextera from Illumina, but similar schemes exist for other platforms.  In the current scheme, Nextera transposase complexes add both the P5 and P7 adapters needed for sequencing but do not barcode the DNA; barcoding is performed during the library amplification PCR (some Illumina publications use barcoded transposons, but these have not been commercialized).  After barcoding PCR, the samples can then be pooled.  Illumina has long offered 96 different barcode combinations: 12 row barcodes and 8 column barcodes.
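The 96 combinations come from pairing every barcode in the set of 12 with every barcode in the set of 8.  A minimal Python sketch of that combinatorial indexing follows; the labels `A01`..`A12` and `B1`..`B8` are hypothetical placeholders, not real Illumina index names or sequences.

```python
from itertools import product

# Hypothetical labels standing in for the real index sequences.
SET_OF_12 = [f"A{i:02d}" for i in range(1, 13)]
SET_OF_8 = [f"B{i}" for i in range(1, 9)]


def index_pairs(first_set, second_set):
    """Return every (first, second) barcode pair; each pair tags one library."""
    return list(product(first_set, second_set))


pairs = index_pairs(SET_OF_12, SET_OF_8)
print(len(pairs))  # 96
```

Only 20 distinct barcode oligos are needed to label 96 libraries, which is the appeal of dual indexing.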

Nextera can be sensitive in several steps to the amount of input DNA.  During the tagmentation step, the ratio of transposases to DNA determines the overall insert size distribution.  If too little DNA is in the reaction, then transposases will tend to hop close to each other and generate very short inserts.  Overload the reaction and most transposases will be inserted far from each other, generating very long inserts.  The pooling step also has a chance for mischief; if the concentrations of the different libraries are wildly different, then higher sequencing depth will be required to adequately sequence the low concentration libraries.  Illumina has introduced bead-based schemes to normalize the libraries, but the possibility for error can still arise.
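The pooling problem is easy to see with a little arithmetic.  The sketch below assumes (my simplification, not anything from Illumina or seqWell) that each library receives reads in direct proportion to its share of the pool; the library names and concentrations are made up for illustration.

```python
def expected_reads(concentrations, total_reads):
    """Split total_reads across libraries in proportion to their concentration.

    Assumes equal-volume pooling and reads proportional to pool share.
    """
    total = sum(concentrations.values())
    return {name: total_reads * c / total for name, c in concentrations.items()}


def undersequenced(concentrations, total_reads, min_reads):
    """Names of libraries expected to fall short of min_reads."""
    shares = expected_reads(concentrations, total_reads)
    return sorted(name for name, n in shares.items() if n < min_reads)


# Made-up pool: three libraries at the same concentration, one 10-fold lower.
pool = {"A": 10.0, "B": 10.0, "C": 10.0, "D": 1.0}
print(undersequenced(pool, total_reads=400_000, min_reads=50_000))  # ['D']
```

A balanced pool of four would give each library 100,000 reads; here the dilute library gets roughly 13,000, so the whole run must be scaled up to rescue it.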

Nextera tends to be the lowest price library construction system for Illumina sequencers, perhaps getting the cost of a library into the $50-$100 range.  Nextera also avoids expensive DNA fragmentation instruments (which can have expensive consumables) and is a much simpler workflow than ligation-based procedures.  Periodically papers are published which claim to get the cost of libraries down much lower, often by using a more sparing amount of the Nextera transposase reagent.  But none of these seem to have really taken off, perhaps because variability in library quality increases with the diluted reagent.

What seqWell has developed with their plexWell reagent set is a variation on the tagmentation approach that attacks these problems.  seqWell estimates that plexWell reduces library construction costs by 50-90% and in addition eliminates, in most cases, the cost and labor of normalizing samples.  By compressing input DNA quantity variation into very little library quantity variation, samples can be packed more tightly on a sequencer, as there is less worry that the lowest input samples will be undersequenced.  Now, the one catch is that this pricing really only kicks in if you prepare libraries in large batches, but for really large projects with lots of samples, that shouldn't be an issue.

The plexWell protocol starts with the stamping of sample DNA into the plexWell 96-well reagent plates.  These plates contain transposase complexes which tagment the DNA, but only with the P7 tag.  The reagent is formulated to insert tags relatively sparsely.  Each well's reagent contains a unique barcode.

After that first tagmentation step the samples from a 96-well plate are pooled.  Now a second transposase reagent is added which has its own barcode (to track to the plate) and adds the P5 tag.  This reagent also contains carrier DNA, which forces the overall DNA concentration into a predictable range.  So the frequency of hopping into the DNA is very predictable, which means the insert size distribution is very consistent between runs.

Since each sample is now barcoded by well (from the first tagmentation) and plate (second tagmentation), a pool-of-pools can now be made to carry through further processing.  So plexWell can address 96 plates of 96 wells each for 9,216 individual libraries, enabling very large numbers of samples to be run in a single flowcell lane.  Now obviously this would make no sense with large genomes, but if you are sequencing 10-20 kb constructs then a MiSeq or even MiniSeq run could accommodate many, many constructs.  You do need to allow for host DNA background, though the degree of it will depend on the construct and your DNA purification methods -- preps of small plasmids can be very clean whereas large BACs inevitably have preps with significant background.
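Here is a rough Python sketch of the two-level addressing and of the back-of-envelope coverage budget.  Everything named here is my own illustration: the barcode labels are placeholders, `demultiplex` is not seqWell's software, and the run size, read length and host fraction in the example call are assumed values rather than instrument specifications.

```python
from itertools import product

WELLS_PER_PLATE = 96
PLATE_BARCODES = 96

# Hypothetical labels standing in for the real barcode sequences.
well_barcodes = [f"W{i:02d}" for i in range(1, WELLS_PER_PLATE + 1)]
plate_barcodes = [f"P{i:02d}" for i in range(1, PLATE_BARCODES + 1)]

# Every library is addressed by its (plate, well) barcode pair.
address_space = list(product(plate_barcodes, well_barcodes))
print(len(address_space))  # 9216


def demultiplex(reads, sample_sheet):
    """Group sequences by sample using the (plate, well) pair on each read.

    reads: iterable of (plate_barcode, well_barcode, sequence) tuples.
    sample_sheet: dict mapping (plate_barcode, well_barcode) -> sample name.
    Reads whose pair is not in the sheet are collected under the key None.
    """
    bins = {}
    for plate_bc, well_bc, seq in reads:
        sample = sample_sheet.get((plate_bc, well_bc))
        bins.setdefault(sample, []).append(seq)
    return bins


def mean_coverage(run_reads, read_length, n_libraries, construct_bp,
                  host_fraction=0.0):
    """Rough mean coverage per construct if reads split evenly across libraries.

    host_fraction is the share of reads lost to host DNA background.
    """
    usable_reads = run_reads / n_libraries * (1.0 - host_fraction)
    return usable_reads * read_length / construct_bp


# Assumed example values only: 20 million reads, 150 bp reads, 20% host background.
print(mean_coverage(20_000_000, 150, 9216, 20_000, host_fraction=0.2))
```

Plugging in your own instrument output and construct sizes shows quickly whether a full 9,216-library pool is sensible or whether fewer plates per run would be wiser.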

So by splitting the tagmentation into two steps, instead of requiring a single step to add both tags, plexWell enables a more rapid consolidation to a single tube (though Nextera could accomplish this by tagging the transposases with barcodes, which Illumina has published in research papers but not commercialized), gives better control over insert size and a greater tolerance to input DNA variation.

Indeed, seqWell touts that plexWell can compress about 100-fold variation in input DNA (1 to 100 ng) to about 2-4 fold variation in read counts between libraries, with very consistent insert sizes.
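To get a feel for what that compression buys, the sketch below estimates how much extra sequencing a pool needs so that its weakest library still reaches its target.  It assumes, purely for illustration, that library yields are spread evenly on a log scale across the stated fold range; that distribution is my assumption and not seqWell data.

```python
def log_spaced_yields(fold_range, n_libraries):
    """Relative yields spaced evenly on a log scale from 1 to fold_range."""
    if n_libraries == 1:
        return [1.0]
    return [fold_range ** (i / (n_libraries - 1)) for i in range(n_libraries)]


def oversequencing_factor(fold_range, n_libraries=96):
    """Reads needed, relative to a perfectly balanced pool, so that the
    weakest library still reaches its target read count."""
    yields = log_spaced_yields(fold_range, n_libraries)
    mean_yield = sum(yields) / len(yields)
    return mean_yield / min(yields)


for fold in (100, 4, 2):
    print(fold, round(oversequencing_factor(fold), 1))
```

Under this toy model the penalty for a 100-fold spread is many times larger than for a 2-4 fold spread, which is the practical meaning of packing samples more tightly on a run.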

plexWell also shows an insensitivity to %GC content that is superior to Nextera in terms of coverage, at least for an E. coli test sample.

The applications of such a lower cost library preparation system are manifold.  I've mentioned synthetic biology constructs, but this technique can also be applied to microbial genomes, long amplicon libraries (such as 16S or ITS for species identification), ORFeome projects and the like.  A goal of the company is to make such projects cost-feasible on desktop sequencers and thereby chip away at one remaining stronghold of Sanger sequencing.

Don't be surprised to see future updates on seqWell Inc.  At AGBT they had posters both on plexWell and on a single tube phased read library approach.  That's a lot of innovation coming out of a very small company: I think their full-time employee count is up to 5!  seqWell is housed in incubator space at a small local college, which is a concept I'd love to see more of.  Then there's the personal angle: I developed this piece through face-to-face and email interviews with three members of the seqWell team: CEO Joe Mellor, CTO Jack Leonard and VP Commercialization Chris Boggess.  Even before working with this group on projects we had connections: Joe was a postdoc with Frederick Roth, who was a fellow graduate student in the Church lab (and is also a seqWell advisor), and Jack was at Codon Devices.


Anonymous said...

Would you be able to comment on whether this system is applicable to cDNA/transcriptome sequencing? That would be important as a kind of substitute for qPCR of multiple samples, or for tagging a lot of single cells. The ability to tag a cell with a single barcode is pretty useful, I think.


Rob said...

Hi, price comparisons could be made between plexWell and the Amyris protocol, which also uses downscaled tagmentase.