Scientists have long been fascinated by how populations evolve. In recent years, population genomics has emerged as a powerful tool for answering this question. By sequencing the genomes of large numbers of people from around the world, researchers are able to identify genetic variants that are associated with particular traits or diseases. This information can be used to develop new methods for both diagnosing and treating illnesses.
Over the last decade, genome-wide association studies (GWAS) have been used to identify genetic variation that contributes to a variety of complex diseases. In GWAS, phenotypes and genotypes from a large set of individuals are collected, and scientists look for associations between the two.1 Researchers typically use genotyping arrays followed by imputation with population reference panels, giving them a closer look at the genetic factors behind illnesses throughout our society. However, these accurate and affordable designs come with limitations that stem in part “from ascertainment bias of the genotyped variants present at particular SNP arrays. This limits the genome-wide variant coverage particularly for the discovery of the association with novel and/or rare variants.”2
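To make the idea of "looking for associations" concrete, here is a minimal sketch of a single-variant allelic test: a Pearson chi-square test on a 2x2 table of allele counts in cases versus controls. The function name and the counts in the example are illustrative assumptions, not data from any study cited here, and real GWAS pipelines usually rely on regression models with covariates rather than this bare test.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Assumes every row and column total is non-zero; a variant that is
    monomorphic in the sample would need to be filtered out beforehand.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternate allele is seen 60 times in cases
# and 40 times in controls, out of 100 alleles in each group.
chi2, p_value = allelic_chi_square(60, 40, 40, 60)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

In a genome-wide scan this test would be repeated for every genotyped or imputed variant, so the significance threshold has to account for the very large number of tests.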
Whole-genome sequencing (WGS) is a powerful approach that provides an unbiased alternative for probing a large fraction of genetic variation, improving the power and accuracy of association tests. However, it is still very costly in GWAS designs, which often require assessing entire cohorts.2
While different methods can be used to study population genomics, scientists must carefully balance cost against the data collected. The decreased cost of sequencing, combined with more accessible analysis methods, has enabled a shift from looking at discrete biomarkers to large-scale sequencing approaches, including low-coverage whole genome sequencing.
The Many Approaches to Population Genomics
Sequencing approaches to population genomics vary in their ability to capture the genetic diversity of a species. These include:
Microarrays
One of the most common molecular techniques for assessing global genetic variation is the microarray. A collection of minute DNA spots attached to a solid surface, a microarray enables expression levels of large numbers of genes to be measured simultaneously. In GWAS, genotype data from arrays can be used to associate phenotypes with specific genetic variants within a population.3
The genotyping arrays available for GWAS can yield accurate results at relatively low cost. Although affordable, microarrays have drawbacks: you are limited to the sites on the array, and a large reference panel is required for comparison with your data.1 This makes the approach difficult to extend to new populations, whether in human genetics or in entirely new organisms.
Reduced Representation Methodologies (RRS)
RRS is a popular method used to assay the diversity of genetic loci across an organism’s genome. It is considered a cost-effective approach for obtaining reliable diversity information for population genetics and many software tools have been developed to process RRS data.4
Decreasing sequencing costs and the increasing availability of genomic resources mean that population genetic studies are using genomic data more frequently. Whereas in the past tens of microsatellites may have been used to infer population structure and answer fundamental and applied questions, now thousands of single nucleotide polymorphisms (SNPs) can be generated and aligned to reference genomes. RRS works by reducing the portion of the genome to be sequenced: the DNA is digested with restriction enzymes, and the resulting fragments undergo next-generation sequencing (NGS). RRS provides an effective method of sequencing a large number of genome-wide loci across many individuals. Additionally, when coupled with a high-quality assembled reference genome, RRS “improves the reliability of genotype calls and subsequently improves any downstream inferences.”4
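The reduction step can be illustrated with a small in silico digest. The sketch below cuts a linear sequence at every occurrence of a recognition site and then keeps only fragments within a size window, which is roughly how a reduced subset of loci ends up being sequenced. The function names, the toy sequence, and the size window are assumptions made for illustration; the default site is EcoRI (GAATTC, cut after the first G), and the code assumes an uppercase sequence and a palindromic site so that one strand is enough to search.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear DNA sequence at every occurrence of a recognition site.

    cut_offset is the number of bases into the site where the cut falls
    (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # trailing fragment after the last cut
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls inside the selection window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy example: two EcoRI sites in a 21-base sequence.
toy_sequence = "AAAGAATTCCCCCGAATTCTT"
all_fragments = digest(toy_sequence)
selected = size_select(all_fragments, min_len=5, max_len=8)
print(all_fragments)
print(selected)
```

Because the same enzyme cuts at the same sites in every individual, the selected fragments tend to cover the same loci across samples, which is what makes the resulting SNP calls comparable.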
Low-Coverage WGS with Imputation
Low-coverage whole genome sequencing (LC-WGS) followed by imputation is a cost-effective genotyping approach for disease and population genetics studies.5 As the name implies, with LC-WGS, a large number of samples are sequenced with each sample averaging low coverage, typically 0.4 – 2X. This allows hundreds of samples to be loaded onto a single sequencing run.
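The arithmetic behind "hundreds of samples per run" is simple enough to sketch. Mean coverage is the total number of sequenced bases divided by the genome size, and the number of samples that fit on a run is the run's output divided by the bases each sample needs. The function names and all numeric values below are illustrative assumptions (a round 3 Gb genome and a hypothetical run output), not specifications of any instrument or kit, and the calculation ignores duplicates, unmapped reads, and uneven pooling.

```python
def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def samples_per_run(run_output_bp, target_coverage, genome_size_bp):
    """How many samples fit on one run at a given mean coverage."""
    bases_per_sample = target_coverage * genome_size_bp
    return int(run_output_bp // bases_per_sample)


GENOME_SIZE_BP = 3_000_000_000       # assumed round figure for a human genome
RUN_OUTPUT_BP = 3_000_000_000_000    # hypothetical 3 Tb run output

# 10 million 150 bp reads on a 3 Gb genome gives 0.5X mean coverage.
print(mean_coverage(10_000_000, 150, GENOME_SIZE_BP))

# At 0.5X per sample, the hypothetical run above holds 2000 samples.
print(samples_per_run(RUN_OUTPUT_BP, 0.5, GENOME_SIZE_BP))
```

The same functions show the trade-off directly: halving the target coverage doubles the number of samples per run, which is why imputation quality becomes the limiting factor at the low end of the coverage range.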
Unlike deep sequencing strategies, LC-WGS relies on computational methods to fill in the missing data, and it allows scientists to study more populations, making it an affordable alternative to genotyping arrays and other technologies.3 Recent advances in imputation have removed bottlenecks around computation and leveraging of large reference panels.
LC-WGS has great potential for research in Africa, where genetic diversity is greater than in populations outside Africa. LC-WGS of fetal cell-free DNA (cfDNA) from noninvasive prenatal testing is another promising area of research. For example, a study of 140,000 Chinese women identified genetic associations with maternal traits such as height and BMI and provided insights into the genetic structure and migration history of the Chinese population.1
Multiplexed NGS Libraries with plexWell LP 384
Sequencing system output has increased at an astounding rate over the last several years, providing users with an opportunity to significantly reduce their sequencing costs per sample by simply loading more libraries per sequencing run. Until now, however, the methods for preparing and accurately pooling large numbers of libraries upstream of sequencing have been hampered by outdated NGS library prep kits that were originally designed to convert a single sample into a single sequence-ready NGS library.
Our plexWell LP 384 kit is engineered for low-pass whole genome library prep and sequencing. This multiplexed library preparation procedure is optimized for inputs of 10 ng of purified dsDNA per sample and typically generates library fragment lengths ranging from 500 to 1,000 bp. The primary advantages of the plexWell Library Preparation Kits are a streamlined 96-sample multiplexed library preparation workflow that tolerates variation in DNA input concentration and substantial savings on labor and consumable costs.6,7
Using a plexWell LP 384 kit, multiple libraries can easily be prepared in 96-sample batches and loaded on the same sequencing run, all in a single day.7
Using Genomics to Aid Population Genetics/Conservation
Recent advances in genomics have greatly increased research opportunities for non-model species. For wildlife, a growing availability of reference genomes means that population genetics is no longer restricted to a small set of anonymous loci.4
For example, the California Conservation Genomics Project (CCGP) is using the latest techniques in genomics to build a genomic dataset of nearly 250 species and subspecies, covering the ecoregions and habitats of California. These data will be used to develop a unique genomic map of California that visualizes meaningful genomic variation, such as climate-resilient hotspots and the corridors that connect them.
The results of this work will be shared with conservation groups and policy makers to inform conservation. With steady progress already being made, soon there will be an abundance of data to make better conservation and land management decisions.
Sequencing-based approaches to population genomics are growing in popularity for good reason. Low-coverage WGS, in particular, allows researchers to gather data once and revisit it for additional analyses. As we continue to learn more about population genomics, we are sure to unlock many more secrets about our past, present, and future.
- plexWell LP384 Library Prep Kit User Guide