Introduction
Long-read sequencing isn’t exactly new. Instruments for this method have been on the market for the past decade. However, recent technological advancements, particularly from PacBio, have increased the buzz around long-read seq. In fact, in a 2023 Nature Methods article, Marx dubbed it the “method of the year”.1
Long-read sequencing technologies can assess large and complex regions of the genome, which makes them well-suited for clinical applications like molecular diagnosis and precision medicine.2
Beyond human health, long-read technologies are also being applied to planetary health efforts amid threats like climate change. For example, HiFi technology, which we discuss in more detail below, is demonstrating great value in AgBiotech areas like conservation, crop improvement, disease management, and food safety.3
So read on as we outline the current state of long-read sequencing and offer insights on where these powerful technologies are headed and the critical role of NGS library prep.
Benefits of Long-Read Sequencing
Analyzing data is a lot like building a jigsaw puzzle because you don’t always know what the final picture is going to be.
With short-read sequencing, you generate a puzzle filled with millions of tiny pieces, and it’s difficult to know exactly how everything fits together. Typically, with short-read platforms, the maximum length of an individual read is only a few hundred bases.
Conversely, with long-read seq, you generate fewer, much larger puzzle pieces, which makes it easier to reassemble a genome and resolve difficult-to-map regions. With this method, raw reads can be hundreds of thousands of bases in length.
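To put rough numbers on the puzzle analogy, here is a minimal Python sketch of how many reads it takes to reach a given average coverage. The genome size (about 3.1 Gb, roughly a human genome), the 30x coverage target, and the 150-base and 15,000-base read lengths are illustrative assumptions, not specifications of any particular instrument.

```python
def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: int) -> int:
    """Reads required so that reads * read_length / genome_size >= coverage."""
    total_bases = coverage * genome_size_bp
    # Integer ceiling division avoids floating-point rounding on large values.
    return (total_bases + read_length_bp - 1) // read_length_bp


# Illustrative assumptions (not taken from any vendor specification).
GENOME_SIZE_BP = 3_100_000_000  # roughly a human genome
COVERAGE = 30                   # average depth target
SHORT_READ_BP = 150             # "a few hundred bases" at most
LONG_READ_BP = 15_000           # a typical long-read length for this sketch

short_reads = reads_needed(GENOME_SIZE_BP, SHORT_READ_BP, COVERAGE)
long_reads = reads_needed(GENOME_SIZE_BP, LONG_READ_BP, COVERAGE)

print(f"Short reads needed: {short_reads:,}")
print(f"Long reads needed:  {long_reads:,}")
print(f"Ratio: {short_reads // long_reads}x fewer pieces with long reads")
```

Under these assumptions, the long-read puzzle has 100 times fewer pieces for the same average coverage, which is the intuition behind easier assembly.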
To that point, long reads are hugely beneficial for identifying complex structural variation, such as large insertions/deletions, inversions, repeats, duplications, and translocations. This technology can also be used to phase SNPs into haplotypes, analyze full-length plasmids, build scaffolds for de novo assembly, and resolve splicing events in full-length cDNA. Oftentimes, other information like DNA methylation can be detected using long-read seq.
Potential Disadvantages of Long-Read Sequencing
There are some pitfalls, however, compared to short-read technologies.
For example, the overall yield and throughput of long-read instruments are generally lower and, because of this, the cost per base of long-read technologies is generally higher compared to short-read technologies (although this continues to drop as technology improves).
Long-read technologies also have a higher raw error rate. However, this can often be corrected through higher coverage or consensus sequencing methods.4,5
Further, many common NGS methods and tools were built and tuned for short-read data, so long-read data requires more specialized analysis. While the life sciences community is developing and sharing an increasing number of tools for long reads, labs often need custom bioinformatic support to analyze a long-read seq project.
Also, library prep methods for long reads tend to be more labor-intensive and, in most cases, more challenging to scale and automate compared to short-read seq methods.6
Current State of Long-Read Technologies
Oxford Nanopore Technologies (ONT) and PacBio are the two main players in the long-read arena. However, these two platforms rely on very different underlying technologies.
ONT tools rely on the ability to sequence long DNA molecules that are deposited on a patterned flow cell with many small protein pores.7
When the DNA molecules pass through the pores under the force of an applied voltage field, the conductance at each pore changes based on the electrophysical properties of the DNA bases that are in the pore at each instant in time.
On the other hand, PacBio’s SMRT technology relies on the sequencing of single long molecules that are captured in very small openings, called zero-mode waveguides (ZMWs), which are patterned on a sequencing flow cell surface.8
The principle of operation of ZMW-based sequencing is that the dimensions of the extremely small ZMWs are small relative to the wavelength of light generated during sequential nucleotide incorporation events. This small size acts to amplify and improve the signal-to-noise ratio of an otherwise difficult-to-detect signal produced by the activity of a single polymerase.
ZMWs still produce a signal of individually read nucleotides that is relatively noisy compared to short-read sequencing. To account for this, the sequenced molecules in each ZMW can be read via circular consensus (producing so-called “HiFi reads”) by exploiting closed-loop hairpins at the ends of each sequenced molecule. Because the sequence of each molecule is read several times, the accuracy of the HiFi reads is significantly improved over the raw accuracy of the underlying signal produced during sequencing.
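The idea that repeated passes over the same molecule cancel out random errors can be illustrated with a toy majority vote. This is only a sketch: it assumes the passes are already aligned and of equal length, and it is not PacBio's actual consensus algorithm, which also has to handle insertions, deletions, and per-base quality information. The function name and example sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Return the most common base at each position across aligned passes."""
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("this toy example assumes passes of equal length")
    # zip(*passes) yields one column of bases per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Five hypothetical passes over the same 8-base molecule.
# Each noisy pass carries a single random substitution error.
passes = [
    "ACGTTGCA",
    "ACGTAGCA",  # error at position 4
    "ACCTTGCA",  # error at position 2
    "ACGTTGCA",
    "ACGTTGGA",  # error at position 6
]

consensus = majority_consensus(passes)
print(consensus)
```

Because the errors fall at different positions in different passes, each one is outvoted and the consensus recovers the underlying sequence.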
Critical Role of NGS Library Prep
Both ONT and PacBio have made strides to increase the yield, throughput, and accuracy of their sequencing technologies. However, to date, the industry has not paid much attention to scaling up long-read library prep to match demands.
While some long-read methods include PCR steps (targeted sequencing, for example), many long-read library prep methods are completely free of amplification. This is a crucial feature that allows researchers to sequence native bases, avoid introducing PCR biases, and gain extra information like DNA methylation. The trade-off is that skipping amplification requires a large input of DNA into library prep, often up to 5 micrograms depending on the method.
Standard long-read library prep methods typically involve mechanical fragmentation of DNA to anywhere from around 8 kb to ≥20 kb, depending on the application. This is done using either individual centrifugation tubes (for example, G-tubes from Covaris) or syringe-based methods like the Megaruptor. These approaches are low-throughput, time-consuming, and essentially impossible to automate.
With increased sequencer yields, the need for higher-scale multiplexing solutions is also growing, particularly in the areas of targeted, low-pass, and bacterial/viral genome sequencing. Most current workflows rely on ligation to add barcoded adapters, which is a multistep, expensive, and time-consuming process.
Take long-read targeted capture on the PacBio Revio as an example: one can fit hundreds of samples on a single Revio SMRT Cell, depending on the panel size and desired coverage. The current standard method to make these libraries requires syringe-based fragmentation, which can take up to 1 hour to fragment just 8 samples. A full plate of 96 would require 12 hours for the fragmentation alone, unless a lab has a fleet of these shearing instruments on hand.
Plus, as the samples are all processed in individual tubes, there is an increased risk of sample swaps and other user errors compared to plate-based fragmentation.
This step is then followed by ligation-based library prep, which could take another 3 hours.
It quickly becomes clear that using these methods to routinely process large-scale targeted capture projects is a daunting and expensive task.
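The arithmetic behind that estimate can be sketched in a few lines of Python. The 8 samples per hour and the 3-hour ligation step come from the figures above. The function names, the option to run several shearing instruments in parallel, and the treatment of ligation as one fixed 3-hour block regardless of sample count are simplifying assumptions for illustration.

```python
import math


def fragmentation_hours(
    n_samples: int,
    samples_per_run: int = 8,   # from the text: 8 samples per run
    hours_per_run: float = 1.0,  # from the text: up to 1 hour per run
    instruments: int = 1,        # assumption: identical instruments in parallel
) -> float:
    """Worst-case hands-on hours for syringe-based fragmentation."""
    runs = math.ceil(n_samples / samples_per_run)
    # With several instruments, runs are processed in parallel rounds.
    rounds = math.ceil(runs / instruments)
    return rounds * hours_per_run


def total_prep_hours(n_samples: int, ligation_hours: float = 3.0, **kwargs) -> float:
    """Fragmentation plus ligation, assuming ligation is one fixed-length block."""
    return fragmentation_hours(n_samples, **kwargs) + ligation_hours


print(fragmentation_hours(96))                 # one instrument
print(fragmentation_hours(96, instruments=4))  # four instruments in parallel
print(total_prep_hours(96))                    # fragmentation plus ligation
```

With one instrument, a 96-sample plate needs 12 hours of fragmentation and about 15 hours in total under these assumptions. Even four instruments running in parallel still leave 3 hours of shearing before ligation begins.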
The Future of Long-Read Library Prep
To match the throughput and scale of new long-read technologies, seqWell is developing library prep products that leverage a long history of expertise in the creation of multiplexing workflows.
We see the use of transposase-based technology as a key tool for this effort. Transposase chemistry has unique advantages versus conventional workflows that involve mechanical shearing and ligation-based chemistry.9
One of the key features of transposase-based tools is that the steps of fragmenting and adapting DNA molecules with sample barcodes and sequencing adapters can be compressed into a single molecular step. This has the potential to remove steps, improve the overall robustness and efficiency of the library preparation process, and allow for more samples to be processed at the same time.
seqWell is developing new methods that leverage optimized Tn5 transposase tagmentation to produce high-quality multiplexed libraries. These innovations are built off our purePlex™ unique dual index technology.
This library prep chemistry, currently dubbed purePlex HC, is a highly streamlined workflow that permits pooling of samples into plexes for hybrid capture immediately following transposase-mediated tagging, saving users both time and cost while generating libraries with high molecular complexity.10
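Pooling samples right after tagging works because each sample carries its own pair of index sequences, so reads can be sorted back to their sample of origin after sequencing. The sketch below shows the general idea of unique dual index demultiplexing. It is a generic illustration rather than seqWell's software or the purePlex index set: the index sequences, sample names, and the `demultiplex` function are all hypothetical, and it uses exact matching only, whereas real tools usually tolerate a small number of mismatches.

```python
# Hypothetical sample sheet: each sample owns one unique (i7, i5) index pair.
SAMPLE_SHEET = {
    ("ACGTACGT", "TTGCAAGC"): "sample_A",
    ("GGATCCAA", "CATGTCAG"): "sample_B",
}


def demultiplex(reads, sample_sheet):
    """Assign reads to samples by exact match on the (i7, i5) index pair.

    `reads` is an iterable of (read_id, i7, i5) tuples. Reads whose index
    pair is not in the sample sheet are returned separately as unassigned.
    """
    assigned = {sample: [] for sample in sample_sheet.values()}
    unassigned = []
    for read_id, i7, i5 in reads:
        sample = sample_sheet.get((i7, i5))
        if sample is None:
            unassigned.append(read_id)
        else:
            assigned[sample].append(read_id)
    return assigned, unassigned


reads = [
    ("read_1", "ACGTACGT", "TTGCAAGC"),  # matches sample_A
    ("read_2", "GGATCCAA", "CATGTCAG"),  # matches sample_B
    ("read_3", "ACGTACGT", "CATGTCAG"),  # i7 of A with i5 of B: rejected
]

assigned, unassigned = demultiplex(reads, SAMPLE_SHEET)
print(assigned)
print(unassigned)
```

Requiring both indexes to match a known pair is what lets a mixed pair such as `read_3` be set aside instead of being credited to the wrong sample.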
Additionally, we are developing a version of this technology that is optimized for creating large-insert (8-10 kb) libraries for long-read targeted capture and sequencing with a highly streamlined and automation-friendly workflow.
If you are interested in learning more about purePlex HC and the other solutions we are designing to address the challenge of scalable long-read sequencing, please get in touch. Our team would love to discuss these innovations with you.
References
1. Nature Methods | Method of the year: long-read sequencing
2. Human Genomics | Long-read sequencing in clinical settings
3. Eremid | HiFi sequencing with the PacBio Revio in AgBiotech
4. PacBio | Understanding accuracy in DNA sequencing
5. Oxford Nanopore Technologies | Sequencing accuracy
6. seqWell Blog | Short-Read Sequencing vs. Long-Read Sequencing: Which Technology is Right for Your Research?
7. Oxford Nanopore Technologies | The nanopore sequencing workflow
8. PacBio | Sequencing 101: long-read sequencing
9. seqWell | Transposase-based chemistry
10. seqWell | purePlex HC scientific poster