Sequencing technology has come a long way since the first human genome was sequenced in 2003. In the early days, scientists had to laboriously read each letter of DNA one by one. As this field has advanced, new players have emerged, offering different technologies that can sequence DNA faster and more efficiently.
With these advancements, however, there are now additional considerations for selecting a sequencing technology – from what coverage is needed and how many samples need to be multiplexed together, to what read length would be most beneficial.
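To make the coverage and multiplexing trade-off concrete, here is a minimal Python sketch. It estimates how many samples fit on one run using mean coverage = total sequenced bases / genome size. The function name and every run figure are hypothetical. The sketch also assumes perfectly balanced pooling with no duplicate or unusable reads, so real plans usually build in some headroom.

```python
import math

def samples_per_run(run_output_reads: int, bases_per_read: int,
                    genome_size_bp: int, target_coverage: float) -> int:
    """Estimate how many samples one run supports at a target mean coverage.

    Mean coverage C = (reads * bases per read) / genome size, so each sample
    needs target_coverage * genome_size_bp sequenced bases.
    """
    total_bases = run_output_reads * bases_per_read
    bases_needed_per_sample = target_coverage * genome_size_bp
    # Round down: a partial sample would fall short of the target coverage.
    return math.floor(total_bases / bases_needed_per_sample)

# Hypothetical run: 1 billion read pairs at 2 x 150 bp = 300 bases per pair,
# human genome of ~3.1 Gb sequenced to 30x.
human_samples = samples_per_run(1_000_000_000, 300, 3_100_000_000, 30)
# Hypothetical bacterial project: 10 million pairs, ~5 Mb genome, 100x.
bacterial_samples = samples_per_run(10_000_000, 300, 5_000_000, 100)
print(human_samples, bacterial_samples)
```

Under these assumptions, the same instrument output supports three human genomes or six bacterial genomes. This is why coverage, genome size, and multiplexing have to be planned together.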
One of the key decisions in any sequencing experiment is which platform to use. Platforms fall into two broad classes: short-read and long-read sequencing. Short-read platforms generate reads of at most a few hundred bases. Long-read platforms generate reads from a few thousand to hundreds of thousands of bases long. Each has its own benefits and drawbacks, and the better choice depends on the specific goals of the experiment.1
This blog post is your guide to the strengths and weaknesses of short- and long-read sequencing approaches, and to when each is most useful.
The Workhorse in NGS Labs: Short-read Sequencing
Short-read sequencing is a powerful tool for generating genomic data. It can sequence DNA or RNA faster and at a lower cost than traditional methods such as Sanger sequencing. This technology has revolutionized biomedical research and led to important discoveries in genomics, evolution, and disease. It has also been used to assemble whole genomes, providing valuable information about the structure and function of genes. Short reads are well suited to three kinds of applications: counting the abundance of specific sequences, identifying variants within otherwise well-conserved sequences, and profiling the expression of particular transcripts.2
Short-read sequencing has long been considered the workhorse in NGS labs, mainly because it delivers high-depth, high-quality data at the lowest cost per base. Illumina’s platforms dominate this field. Thermo Fisher Scientific’s Ion Torrent technology, such as the Ion Proton, is another established option.
In 2022 alone, several new short-read sequencing technologies reached the market, helping to drive costs even lower. These include instruments from Element Biosciences, Ultima Genomics, MGI and Singular Genomics, as well as the Omniome technology acquired by PacBio.
While these technologies use different chemistries, they all generate reads that are at most a few hundred bases long. Still, the reads are high quality, and there are a lot of them. Depending on the sequencer, a run yields anywhere from a few million to tens of billions of reads. This lets researchers achieve high coverage of their genomes or targets of interest, enabling high-confidence SNP and mutation calling.
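Why high read counts matter for variant calling can be sketched with the classic Lander–Waterman idealization. In that model, the depth at any base follows a Poisson distribution whose mean is the overall coverage. The helper below is hypothetical, as is the 10-read threshold in the example. The model also ignores real-world effects such as amplification bias, so actual coverage is less uniform than it predicts.

```python
import math

def prob_depth_at_least(mean_coverage: float, min_depth: int) -> float:
    """Probability that a given base is covered by at least `min_depth` reads,
    assuming per-base depth ~ Poisson(mean_coverage)."""
    # P(depth < min_depth) = sum over i < min_depth of e^-C * C^i / i!
    p_below = sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(min_depth)
    )
    return 1.0 - p_below

# Expected fraction of bases reaching a hypothetical 10-read calling threshold.
for coverage in (5, 15, 30):
    print(f"{coverage}x: {prob_depth_at_least(coverage, 10):.4f}")
```

Raising the mean coverage sharply increases the fraction of bases with enough supporting reads for a confident call.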
All of these technologies share a common limitation: they cannot sequence long stretches of DNA in a single read. To sequence a large region such as a human genome, the DNA must first be fragmented and amplified. Bioinformatic programs then assemble the resulting reads into a continuous sequence. Unfortunately, the amplification steps can introduce biases into the samples. In addition, short reads may not overlap enough to be placed unambiguously, particularly when a repeated sequence is longer than the reads themselves. Overall, this means that sequencing a highly complex and repetitive genome, like that of a human, can be challenging with these technologies.1
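The repeat problem can be illustrated with a toy Python model. It places error-free reads by exact string matching, a deliberate simplification of real aligners and assemblers. The genome, the 7 bp repeat, and the function name are all hypothetical. The point is that a read falling entirely inside a repeat matches more than one location, while a read long enough to reach unique flanking sequence does not.

```python
from collections import Counter

def ambiguous_read_fraction(genome: str, read_len: int) -> float:
    """Fraction of all error-free reads of length `read_len` (one per start
    position) whose sequence occurs more than once in the genome, i.e. reads
    that cannot be placed uniquely by exact matching."""
    if not 0 < read_len <= len(genome):
        raise ValueError("read_len must be between 1 and the genome length")
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    ambiguous = sum(1 for r in reads if counts[r] > 1)
    return ambiguous / len(reads)

# Toy genome: a 7 bp repeat present twice, separated by unique flanks.
# The bases flanking the two copies differ, so reads that extend past the
# repeat can tell the copies apart.
repeat = "GATTACA"
genome = "CCC" + repeat + "TTT" + repeat + "GGG"

for read_len in (4, 7, 8):
    print(read_len, round(ambiguous_read_fraction(genome, read_len), 2))
```

In this toy genome, reads of 4 or 7 bp include ambiguous placements, while every 8 bp read maps to a single location. Real repeats can span thousands of bases, which is where long reads help.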
New sequencing technologies have made it possible to generate vast amounts of data, but they have also introduced a new burden: a multistep library prep process. This process can be time-consuming and expensive. It often results in inaccurate per-sample read counts and uneven insert-size distributions. seqWell’s plexWell technology is designed to address these problems. With plexWell, researchers can achieve balanced multiplexed libraries with highly uniform insert-size distributions and accurate per-sample read counts. As a result, plexWell gives researchers an important new tool for maximizing the efficiency of their sequencing experiments.
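Pool balance is often summarized with the coefficient of variation (CV) of per-sample read counts. The Python sketch below is a generic, hypothetical illustration of that metric, not seqWell’s own QC method. Its 50%-of-mean cutoff for flagging under-represented samples is an arbitrary assumption.

```python
import statistics

def read_count_cv(read_counts: dict[str, int]) -> float:
    """Coefficient of variation (population std. dev. / mean) of per-sample
    read counts; 0.0 means every sample received the same number of reads."""
    counts = list(read_counts.values())
    return statistics.pstdev(counts) / statistics.mean(counts)

def flag_low_samples(read_counts: dict[str, int], fraction_of_mean: float = 0.5) -> list[str]:
    """Names of samples whose read count falls below a fraction of the pool mean."""
    mean = statistics.mean(list(read_counts.values()))
    return sorted(name for name, n in read_counts.items() if n < fraction_of_mean * mean)

# Hypothetical demultiplexed read counts for a three-sample pool.
pool = {"sample_A": 100_000, "sample_B": 300_000, "sample_C": 20_000}
print(round(read_count_cv(pool), 3), flag_low_samples(pool))
```

A lower CV means read counts land closer to what was planned for each sample, so fewer samples need to be re-sequenced to reach their target coverage.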
Long-read Sequencing: More Accessible than Ever
Long-read sequencing is another powerful tool that can provide insights across a wide variety of genomic applications. As the name implies, long-read sequencers generate much longer reads, anywhere from a few thousand to hundreds of thousands of bases. These longer reads make it easier to identify complex structural variation such as large insertions/deletions, inversions, repeats, duplications, and translocations. Long reads can also be used to phase SNPs into haplotypes, build scaffolds for de novo assembly, and resolve splicing events in full-length cDNA.
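Phasing works because a single read that covers two heterozygous SNPs shows which alleles sit together on the same chromosome copy. The Python sketch below is a hypothetical toy that makes four simplifying assumptions:

- Each read is reduced to the alleles it observed at SNP positions.
- Reads are error-free.
- The sample is diploid.
- The two SNPs lie 5 kb apart, beyond the reach of any short read.

```python
from collections import Counter
from typing import Optional

# Hypothetical read representation: {SNP position: observed base}.
Read = dict[int, str]

def phase_two_snps(reads: list[Read], pos_a: int, pos_b: int) -> Optional[list[tuple[str, str]]]:
    """Infer the two haplotypes at a pair of SNPs from reads covering both.

    Returns the two most frequently co-observed allele pairs (sorted), or
    None if fewer than two distinct pairs are supported."""
    # Only reads that observe both SNPs carry linkage information.
    pairs = Counter((r[pos_a], r[pos_b]) for r in reads if pos_a in r and pos_b in r)
    top_two = pairs.most_common(2)
    if len(top_two) < 2:
        return None
    return sorted(pair for pair, _ in top_two)

# SNPs 5 kb apart: 150 bp short reads each see only one of them.
short_reads = [{1_000: "A"}, {1_000: "C"}, {6_000: "G"}, {6_000: "T"}]
# Long reads span both SNPs, linking A with G and C with T.
long_reads = [{1_000: "A", 6_000: "G"}, {1_000: "C", 6_000: "T"}, {1_000: "A", 6_000: "G"}]

print(phase_two_snps(short_reads, 1_000, 6_000))  # None
print(phase_two_snps(long_reads, 1_000, 6_000))   # [('A', 'G'), ('C', 'T')]
```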
Long-read instruments have been on the market for the past decade, but their lower yield, higher error rate, and higher cost have kept them from wider adoption.
A related downside is that per-read accuracy can be much lower than that of short-read sequencing. The high error rate of nanopore technology is largely due to the inability to precisely control the speed at which DNA molecules pass through the pore, and these errors are systematic. Errors in PacBio’s single-molecule real-time (SMRT) sequencing, by contrast, are largely random. Because random errors rarely recur at the same position, they can be reduced by circular consensus sequencing (CCS). In CCS, the polymerase reads the same circularized DNA molecule several times within a zero-mode waveguide. This generates reads with an accuracy of at least 99.8%, similar to that of NGS platforms.1
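The difference between random and systematic errors can be illustrated with a simple majority-vote model in Python. This is a hypothetical simplification, not PacBio’s actual consensus algorithm. It assumes each pass over a base errs independently with the same probability, and it pessimistically treats any wrong majority as a wrong call. The 10% per-pass error rate in the example is an assumption chosen only for illustration.

```python
import math

def majority_consensus_error(per_pass_error: float, passes: int) -> float:
    """Probability that a majority vote over `passes` independent reads of one
    base is wrong, if each pass is wrong with probability `per_pass_error`."""
    if passes < 1 or passes % 2 == 0:
        raise ValueError("passes must be a positive odd number to avoid ties")
    wrong_majority = passes // 2 + 1  # fewest erroneous passes that win the vote
    # Binomial tail: sum over k >= wrong_majority of C(n, k) p^k (1 - p)^(n - k)
    return sum(
        math.comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(wrong_majority, passes + 1)
    )

# Hypothetical 10% per-pass error rate.
for n in (1, 3, 5, 9):
    print(n, majority_consensus_error(0.10, n))
```

Under these assumptions, three passes already cut the error from 0.10 to 0.028, and more passes drive it lower still. A systematic error behaves differently: it recurs identically on every pass and wins every vote, so consensus cannot remove it.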
Data processing can also be a hurdle, since analysis takes longer for organisms with larger genomes. PacBio and Oxford Nanopore Technologies (ONT) have both been working to make long-read sequencing more accessible.
PacBio has improved the chemistry on its Sequel II instruments, enabling “HiFi sequencing” via circular consensus. HiFi sequencing can read DNA fragments of up to 15–20 kb with error rates approaching those of short-read sequencing.
ONT uses nanopore sequencing and offers a variety of platforms spanning a range of prices, data outputs, and portability. This flexibility suits labs of all sizes. Nanopore reads can reach hundreds of kilobases, although the per-base error rate is somewhat higher than that of PacBio HiFi sequencing.
The Best of Both Worlds: Mixing Short- and Long-read Data
For many projects, combining short- and long-read data has clear advantages. Short-read sequencing offers a lower cost per base, high depth, and higher-quality data, which researchers can use to generate high-confidence SNP and mutation calls. They can then layer long-read information on top to resolve complex structural variants (SVs) and phase haplotypes. This requires more sophisticated analysis methods. For de novo assembly or rare-disease sequencing, however, using both technologies can be highly beneficial and lead to a better understanding of genetic variation.
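One simple way to picture this layering is to merge two call sets by variant size, keeping small variants from short reads and structural variants from long reads. The Python sketch below is hypothetical. The Variant class stands in for parsed VCF records, and the 50 bp SV cutoff is a commonly used convention rather than a fixed rule. Real hybrid pipelines also reconcile overlapping or conflicting calls.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str

    @property
    def length(self) -> int:
        # Length of the longer allele: 1 for a SNV, roughly the event size for indels/SVs.
        return max(len(self.ref), len(self.alt))

def combine_calls(short_read_calls: list[Variant], long_read_calls: list[Variant],
                  sv_min_len: int = 50) -> list[Variant]:
    """Keep small variants from short reads and SVs (>= sv_min_len) from long reads."""
    small = [v for v in short_read_calls if v.length < sv_min_len]
    large = [v for v in long_read_calls if v.length >= sv_min_len]
    # Simple lexical chromosome sort, adequate for this toy example.
    return sorted(small + large, key=lambda v: (v.chrom, v.pos))

snv = Variant("chr1", 100, "A", "G")
short_del = Variant("chr1", 500, "A" + "T" * 59, "A")  # 60 bp REF allele, called from short reads
long_del = Variant("chr1", 505, "C" + "G" * 79, "C")   # 80 bp REF allele, called from long reads
print(combine_calls([snv, short_del], [snv, long_del]))
```

Here the SNV comes from the short-read set and the large deletion from the long-read set, while the short-read call of an SV-sized event is set aside.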
The debate between short-read and long-read sequencing is ongoing, but it is clear that each technology has unique benefits. For the most comprehensive results, combining the two is often the best approach. It gives you the most complete picture of your samples while still taking advantage of the speed and affordability of short-read sequencing.