In recent years, there has been an explosion in our ability to generate sequencing data. Thanks to advances in technology, we are now able to sequence genomes at a fraction of the time and cost that it took just a decade ago. This has had a profound impact on our understanding of biology, as well as on the way we diagnose and treat disease.
Amid this progress, there has also been significant innovation in how efficiently we use sequencing data. One powerful example is low-coverage (“low-pass”) sequencing, which determines the sequence of a new genome by matching it against information from previously sequenced genomes. Low-pass sequencing, also called “skim seq” or low-coverage whole genome sequencing (lcWGS), is a widely used method for surveying genomes for >99% of the information they contain, in some cases with less than 1x coverage.
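To make “less than 1x coverage” concrete, here is a minimal sketch of the usual coverage arithmetic. It assumes reads land uniformly at random on the genome (an idealized Poisson model), and the read count, read length, and approximate human genome size below are illustrative values, not figures from any particular experiment.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base: (reads x length) / genome size."""
    return num_reads * read_length / genome_size


def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Assumes reads are placed uniformly at random (Poisson model),
    which real libraries only approximate.
    """
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    genome_size = 3_100_000_000  # rough human genome size in bases (assumption)
    num_reads = 10_000_000       # illustrative read count
    read_length = 150            # illustrative read length in bases

    c = mean_coverage(num_reads, read_length, genome_size)
    print(f"mean coverage: {c:.2f}x")
    print(f"fraction of bases seen at least once: {fraction_covered(c):.0%}")
```

Under this model, a sample sequenced below 1x leaves a large share of bases unobserved, which is why the missing information has to come from previously sequenced genomes.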
To understand the capability of this approach, it is helpful to first look at the history of genotyping and how the use of sequencing has changed how we think about surveying genomes.
Microarrays: The Gold Standard in Genetic Analysis
One of the most utilized genetic assays for the last 20 years has been a technique called microarray (or “array-based”) genotyping. This is a powerful tool that allows researchers to rapidly screen for a large number of genetic markers that are present in a genome, making it invaluable for mapping disease-related genes and understanding the genetics of complex traits.
Microarrays work by detecting the presence of specific sequences of interest in a sample by using surface-bound nucleic acid probes that hybridize to different target sequences, enabling these sequences to be read via fluorescent detection.
Although there are newer, more sophisticated ways to genotype using DNA sequencing instruments, microarray-based genotyping is still widely used in a variety of applications for the simple reason that it is often the most cost-effective option. When compared to other methods, microarray-based genotyping requires less input material and can be run in a shorter amount of time. In addition, the data generated by this method is considered highly reliable and easy to interpret.
One reason microarrays are effective is that many of the polymorphisms (variations) present in one person’s genome are actually shared with many other individuals. The simplest example of this is the case of direct relatives. However, if you look at the entirety of the human population, there are huge segments of our genome that are widely shared globally. Because of this, the number of segments that need to be measured to determine if two people are related is quite small.
The polymorphisms that are present in each person’s genome are inherited as large segments (blocks) of DNA that make up haplotypes. By implication, the vast majority of variation in a genome can in fact be ascertained by determining which haplotypes the genome carries. This is what we call “haplotype structure”: the pattern by which the large inherited blocks of sequence in a genome correspond to similar large blocks present in the genomes of others.
To determine the haplotype structure of a single genome, two pieces of information are required: a substantial database of possible haplotypes built from many genomes in the population, and enough sequencing coverage of the single genome to accurately match its polymorphisms to known haplotypes.
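The matching idea can be illustrated with a toy sketch. It assumes a single haploid block, a tiny made-up reference panel, and error-free observations at only a few sites; the names `REFERENCE_PANEL`, `best_matching_haplotype`, and `fill_in` are hypothetical. Production software uses probabilistic models across many blocks and both chromosome copies, so this is an illustration of the principle rather than a real method.

```python
from typing import Dict, List, Optional

# Hypothetical reference panel: alleles (0 = reference, 1 = alternate)
# at eight variable sites for three known haplotypes.
REFERENCE_PANEL: Dict[str, List[int]] = {
    "hap_A": [0, 0, 1, 1, 0, 1, 0, 0],
    "hap_B": [1, 0, 0, 1, 1, 0, 0, 1],
    "hap_C": [0, 1, 1, 0, 0, 1, 1, 0],
}


def best_matching_haplotype(
    observed: List[Optional[int]], panel: Dict[str, List[int]]
) -> str:
    """Return the panel haplotype with the fewest mismatches at observed sites.

    Sites with no sequencing reads are None and are ignored when scoring.
    """
    def mismatches(hap: List[int]) -> int:
        return sum(1 for o, h in zip(observed, hap) if o is not None and o != h)

    return min(panel, key=lambda name: mismatches(panel[name]))


def fill_in(observed: List[Optional[int]], haplotype: List[int]) -> List[int]:
    """Keep observed alleles and copy the matched haplotype at unobserved sites."""
    return [h if o is None else o for o, h in zip(observed, haplotype)]


if __name__ == "__main__":
    # Only four of the eight sites were covered by a read in this toy sample.
    observed = [None, 0, None, 1, 1, None, None, 1]
    name = best_matching_haplotype(observed, REFERENCE_PANEL)
    print(name, fill_in(observed, REFERENCE_PANEL[name]))
```

The sketch shows why both ingredients matter: with a larger panel, more sites must be observed to tell similar haplotypes apart, and with no panel at all the unobserved sites cannot be filled in.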
More Powerful than Microarrays: Combining Sequencing with Imputation
The genomics era has been a game-changer for haplotype research. In the past, haplotype studies were limited by the small number of reference genomes that were available, making it difficult to get an accurate picture of which haplotypes were common among humans and which were rare. However, thanks to projects like the International HapMap Project and the All of Us Research Program, we now have access to a huge number of reference genomes. This has allowed us to create databases of haplotypes that are much more comprehensive than anything that was possible in the past.
From this has emerged the ability to “impute,” or infer, the sequence of new individuals from even a small amount of sequencing data, an approach first described ten years ago by Pasaniuc et al.
The combination of sequencing and imputation is more powerful than microarray-based genotyping in some crucial ways.
First, it is a less biased approach because, unlike microarrays, it does not depend on probing only for known sequences. Because the majority of sequenced genomes come from a few specific populations, this is critical for human subpopulations that are genetically isolated, unrelated to others, or underrepresented in available sequencing data.
Second, low-pass data can be enhanced with targeted sequencing data to determine whether rare variants are present in critical regions of the genome (such as disease-related genes).
Yet, challenges remain. While low-pass sequencing and genotype imputation will continue to grow as an effective way to determine the genetic information present in individual genomes, making the approach cost-effective requires running large numbers of samples together to reduce the cost per sample. Sample multiplexing is therefore an important factor, because modern sequencers are capable of generating data from hundreds to thousands of samples at the same time. plexWell technology is a powerful approach to this problem with its inherent capabilities for multiplexing performance and workflow simplicity.
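The effect of multiplexing on cost per sample comes down to simple arithmetic, sketched below. Every number here (run output, run cost, library prep cost, target coverage) is a made-up placeholder, and the function names are hypothetical; real capacity also depends on factors such as read quality and how evenly samples are represented in the pool.

```python
def samples_per_run(
    run_output_bases: float, genome_size: float, target_coverage: float
) -> int:
    """How many samples fit in one run at the target per-sample coverage."""
    bases_per_sample = genome_size * target_coverage
    return int(run_output_bases // bases_per_sample)


def cost_per_sample(
    run_cost: float, prep_cost_per_sample: float, n_samples: int
) -> float:
    """Per-sample cost: library prep plus an equal share of the run cost."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return prep_cost_per_sample + run_cost / n_samples


if __name__ == "__main__":
    # Placeholder values for illustration only.
    run_output = 3_000_000_000_000  # bases produced by one sequencing run
    genome = 3_000_000_000          # genome size in bases
    run_cost = 10_000.0             # cost of one run
    prep_cost = 10.0                # library prep cost per sample

    for coverage in (30.0, 1.0, 0.5):
        n = samples_per_run(run_output, genome, coverage)
        print(f"{coverage}x: {n} samples, {cost_per_sample(run_cost, prep_cost, n):.2f} per sample")
```

As the target coverage drops, the shared run cost is split across more samples, so per-sample library preparation becomes the dominant term. That is why multiplexing capacity and workflow simplicity matter so much for low-pass sequencing.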
The application of plexWell technology will continue to drive down the cost of sequencing, making it possible for more people to have their genomes sequenced. With the ever-growing number of samples that can be run simultaneously, we are one step closer to a future where everyone has access to their genetic information.