Introduction

Next-generation sequencing (NGS) has ushered in a new era for the life science industry, providing unparalleled speed and throughput for DNA and RNA sequencing.1 NGS technologies generate significant amounts of data, and labs can reduce costs by multiplexing, which means sequencing multiple indexed samples together in a single run.

Regardless of the library prep method used, however, multiplexing can sometimes lead to index misassignment. The cause of this error is a phenomenon known as “index hopping,” in which reads from one sample “hop” to the index of another sample during multiplexed sequencing.1

Cause and Effect of Index Hopping

Index hopping is primarily the result of cross-contamination that occurs during the sample preparation or sequencing stages.2 Even with the most rigorous quality control, labs still face the risk of samples getting mixed up.

While this misassignment of libraries between indexes only affects approximately 1% of total reads, it can still yield inaccurate and potentially misleading results if not correctly identified and managed.3

The challenges posed by index hopping are substantial. Although it only affects a small percentage of reads, its impact on data quality can be significant, resulting in misidentification of sequences, which skews results and potentially leads to false conclusions.4,5 For example, in a 2020 Nature Communications article, Farouni et al. concluded that low levels of index hopping can lead to phantom molecules, which complicate downstream analysis of single-cell RNA-seq experiments.6 These phantom molecules can affect cell characterization or identification and, in high amounts, can influence cell calling, which leads to an overestimation of cell numbers.7

In addition, index hopping affects data quality and downstream analysis in several ways:

  1. Data misinterpretation: The main challenge of index hopping is the misassignment of sequences to the wrong sample, which can significantly skew data interpretation and downstream analysis.
  2. Lower accuracy: Index hopping can decrease the accuracy of results due to erroneous assignment of reads. This can lead to false positives and negatives, making it difficult to interpret data.
  3. Increased complexity: Index hopping adds to the complexity of data analysis, requiring advanced bioinformatics tools and algorithms to correct the errors.
  4. Resource-intensive: Identifying and eliminating misassigned reads can be resource-intensive, adding to the time and cost of sequencing projects.
  5. Impact on sensitive applications: In sensitive applications like single-cell genomics or detecting low-frequency variants, even a small amount of index hopping can have a significant impact.
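
To put item 5 in numbers, the back-of-the-envelope sketch below estimates how many reads hop in a run and what apparent variant allele fraction hopping alone could create in a sample that does not carry the variant. It assumes hopped reads are spread evenly across the other samples in the pool and that all samples are sequenced to equal depth. The function names, the 400 million read run, and the 8-sample pool are illustrative assumptions; only the roughly 1% hop rate comes from the text above.

```python
def expected_hopped_reads(total_reads, hop_rate):
    """Expected number of misassigned reads in a run (simple proportion)."""
    return total_reads * hop_rate


def apparent_vaf_from_hopping(hop_rate, n_samples, carrier_samples=1, carrier_vaf=1.0):
    """Expected false variant allele fraction in a variant-free sample.

    Assumption: a fraction `hop_rate` of the sample's reads are foreign and
    come in equal shares from the other `n_samples - 1` samples in the pool.
    """
    others = n_samples - 1
    if others < 1:
        raise ValueError("a multiplexed pool needs at least two samples")
    if not 0 <= carrier_samples <= others:
        raise ValueError("carrier_samples must be between 0 and n_samples - 1")
    # Share of foreign reads that come from carriers, times how often
    # those carrier reads show the variant allele.
    return hop_rate * (carrier_samples / others) * carrier_vaf


# Illustrative values (assumptions, not measurements).
hopped = expected_hopped_reads(total_reads=400_000_000, hop_rate=0.01)
false_vaf = apparent_vaf_from_hopping(hop_rate=0.01, n_samples=8)
print(f"Hopped reads in the run: about {hopped:,.0f}")
print(f"Apparent VAF from hopping alone: {false_vaf:.2%}")
```

Under these assumptions, a single homozygous carrier in an 8-sample pool contributes an apparent allele fraction of roughly 0.14% to every other sample, which is in the range where low-frequency variant callers operate.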

Strategies to Mitigate Index Hopping

Experimental Approaches

Enhanced library preparation techniques, such as unique dual indexing and unique molecular identifiers (UMIs), have shown promise in minimizing index hopping.

Dual indexing involves the use of two indexes for each sample. With unique dual indexing, neither index is reused across samples in the pool, so a read carrying an unexpected combination of indexes can be identified as index-hopped and removed. This method is becoming increasingly popular due to its compatibility with various sequencing platforms.8
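
As a minimal sketch of how index pairs can flag hopped reads during demultiplexing, the example below classifies a read by its two index sequences. It assumes unique dual indexes (no index shared between samples) and exact index matches with no mismatch tolerance. The sample names, index sequences, and the `build_classifier` helper are hypothetical and are not taken from any kit or demultiplexing tool.

```python
def build_classifier(sample_sheet):
    """Return a function that maps an (i7, i5) index pair to a sample label.

    `sample_sheet` maps sample name -> (i7, i5). With unique dual indexes,
    every i7 and every i5 belongs to exactly one sample.
    """
    pair_to_sample = {pair: sample for sample, pair in sample_sheet.items()}
    known_i7 = {i7 for i7, _ in pair_to_sample}
    known_i5 = {i5 for _, i5 in pair_to_sample}

    def classify(i7, i5):
        if (i7, i5) in pair_to_sample:
            return pair_to_sample[(i7, i5)]  # expected pair: keep the read
        if i7 in known_i7 and i5 in known_i5:
            return "HOPPED"  # both indexes are valid, but not together
        return "UNKNOWN"  # at least one index is not in the pool

    return classify


# Hypothetical sample sheet for illustration only.
sheet = {
    "sample_A": ("ATTACTCG", "TATAGCCT"),
    "sample_B": ("TCCGGAGA", "ATAGAGGC"),
}
classify = build_classifier(sheet)

print(classify("ATTACTCG", "TATAGCCT"))  # sample_A's own pair
print(classify("ATTACTCG", "ATAGAGGC"))  # i7 of sample_A with i5 of sample_B
print(classify("GGGGGGGG", "TATAGCCT"))  # i7 not present in the pool
```

Reads labeled `HOPPED` or `UNKNOWN` would be discarded rather than assigned to a sample. With combinatorial dual indexing, where indexes are shared between samples, the mismatched pair in the second call could be a legitimate sample and would go undetected.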

Another library prep innovation is the use of UMIs. These are short, random sequences attached to each DNA molecule before amplification. They serve as unique tags that can help identify and eliminate PCR duplicates, thus enhancing sequencing data accuracy.9
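
The duplicate-removal idea can be sketched as follows: reads that share the same mapping position, strand, and UMI are treated as PCR copies of one original molecule, and only one read per group is kept. This is a simplified illustration that assumes exact UMI matches (no correction of sequencing errors within the UMI). The `Read` record and its fields are hypothetical, not the data model of any particular tool.

```python
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    pos: int
    strand: str
    umi: str
    mean_quality: float


def deduplicate_by_umi(reads):
    """Keep one read per (chrom, pos, strand, UMI) group.

    Within each group, the read with the highest mean base quality is kept.
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        current = best.get(key)
        if current is None or read.mean_quality > current.mean_quality:
            best[key] = read
    return list(best.values())


# Hypothetical reads for illustration only.
reads = [
    Read("r1", "chr1", 100, "+", "AACG", 30.0),
    Read("r2", "chr1", 100, "+", "AACG", 35.0),  # PCR duplicate of r1
    Read("r3", "chr1", 100, "+", "TTGC", 32.0),  # same position, different molecule
    Read("r4", "chr2", 500, "-", "AACG", 28.0),  # same UMI, different position
]
unique_reads = deduplicate_by_umi(reads)
print([r.name for r in unique_reads])
```

Here r1 and r2 collapse into one molecule, while r3 is retained because its UMI differs even though it maps to the same position. Without UMIs, r3 would be indistinguishable from a PCR duplicate.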

Both dual indexing and UMIs offer several benefits. They can significantly reduce errors and improve the detection of low-frequency mutations.10 Sequencing with UMIs can reduce the rate of false-positive variant calls and increase the sensitivity of variant detection.11

Optimization of PCR Conditions

PCR optimization plays a critical role in reducing index hopping. This is especially true when dealing with samples that have high homology or low diversity.

PCR optimization helps ensure that primers are efficient, specific, and sensitive, so the assay is robust and reliable.12 When PCR conditions are optimized, the amplification step of library preparation is more accurate, which reduces the chance that a DNA fragment will index hop and be assigned to the index of a different sample.


Computational and Bioinformatics Approaches

Computational approaches are useful when working with low-pass sequencing data. Studies can benefit from using computational approaches over array genotyping, with advantages that include increased accuracy for risk prediction.

For example, computational approaches can help ensure accuracy in data analysis and in the pipelines that perform it. Statistical modeling can help identify phantom molecules that result from index hopping, purge them from the data, and classify the remaining data more accurately for better downstream quality control.13 Further, data analysis pipelines can use metrics such as pathogen detection, taxonomic classification level, and target read count to detect instances of index hopping.14
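
As an illustration of the general idea of purging phantom molecules, the sketch below uses a simple read-fraction heuristic for single-cell data: when the same cell barcode, UMI, and gene combination appears in more than one sample of a multiplexed run, the molecule is assigned to the sample holding most of its reads, or discarded if no sample clearly dominates. This is a deliberately simplified stand-in and not the statistical model from the cited study. The function name, the input layout, and the 0.8 threshold are assumptions chosen for the example.

```python
from collections import defaultdict


def purge_phantoms(read_counts, min_fraction=0.8):
    """Filter likely phantom molecules from multiplexed single-cell counts.

    `read_counts` maps (sample, cell_barcode, umi, gene) -> read count (> 0).
    A molecule is kept only in the sample that holds at least `min_fraction`
    of its reads across all samples; ambiguous molecules are dropped.
    """
    by_molecule = defaultdict(dict)
    for (sample, barcode, umi, gene), n_reads in read_counts.items():
        by_molecule[(barcode, umi, gene)][sample] = n_reads

    kept = {}
    for molecule, per_sample in by_molecule.items():
        total = sum(per_sample.values())
        if total <= 0:
            continue  # nothing to assign
        top_sample, top_reads = max(per_sample.items(), key=lambda kv: kv[1])
        if top_reads / total >= min_fraction:
            kept[(top_sample, *molecule)] = top_reads
    return kept


# Hypothetical counts for illustration only.
counts = {
    ("S1", "AAAC", "GGTT", "GeneX"): 19,
    ("S2", "AAAC", "GGTT", "GeneX"): 1,  # likely phantom copy of the S1 molecule
    ("S1", "CCCA", "TTAA", "GeneY"): 3,
    ("S2", "CCCA", "TTAA", "GeneY"): 3,  # ambiguous: neither sample dominates
    ("S2", "GGGT", "ACAC", "GeneZ"): 5,  # seen in one sample only
}
cleaned = purge_phantoms(counts)
for key, n_reads in cleaned.items():
    print(key, n_reads)
```

The threshold trades sensitivity against specificity: a higher `min_fraction` discards more ambiguous molecules, while a lower one retains more data at the risk of keeping phantoms. A model-based method estimates this trade-off from the data instead of fixing it in advance.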

Bioinformatics approaches are also helpful, because computational algorithms can correct index hopping errors. Post-amplification libraries can present adapter bands, and adapter dimers cause index hopping and demultiplexing issues. Bioinformatics offers several error correction methods, such as deconvolution algorithms that can help remove adapter dimers and identify instances of index hopping.15

While bioinformatics approaches offer advantages, they also have some challenges and limitations. For example, in NGS-based HLA genotyping, bioinformatics offers helpful processing for large amounts of data. However, if sequence data need to be assigned to their region of origin, read cross-mapping from or to homologous sequences may confound correct read attribution.16

Innovative Library Prep Tools

Addressing the complex challenges posed by index hopping requires innovative solutions. Library prep tools like seqWell’s purePlex DNA Library Prep Kits play a crucial role in mitigating these issues, delivering high-quality sequencing data with minimized errors.

purePlex provides speed, performance, and auto-normalization with 384 unique dual indexes (4 sets of 96). These features help combat index hopping by ensuring each DNA fragment is correctly assigned to its original sample, and purePlex’s accuracy significantly reduces the likelihood of misidentification and data loss.

Conclusion

By implementing the strategies outlined in this blog and combining them with advanced library prep solutions like purePlex, researchers can effectively mitigate the impact of index hopping.

As a result, labs can achieve higher success rates in their low-pass applications while maintaining the high-throughput demands of the industry.

This blog is the second in our series on mitigating common challenges encountered during low-pass WGS. The first post is titled “Improve Success Rates in Low-Pass NGS Applications.” Stay tuned for more insights on this important topic.

References

  1. IDT | Dual Index UMI Adapters reduce index hopping and improve variant identification
  2. BMC Genomics | Eliminate index cross-talk and significantly improve sensitivity of massively parallel sequencing
  3. seqWell | Getting a Read on NGS Barcodes
  4. seqWell | Improve Success Rates in Low-Pass NGS Applications
  5. Illumina | Minimize index hopping in multiplexed runs
  6. Nature Communications | Model-based analysis of sample index hopping reveals its widespread artifacts in multiplexed single-cell RNA-sequencing
  7. 10x Genomics | Index hopping and how to resolve it
  8. BMC Genomics | Dual indexed library design enables compatibility of in-Drop single-cell RNA-sequencing with exAMP chemistry sequencing platforms
  9. Springer Link | Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers
  10. PLOS ONE | Benefits and Challenges with Applying Unique Molecular Identifiers
  11. Illumina | Benefits of Unique Molecular Identifiers
  12. Horticulture Research | An optimized protocol for stepwise optimization of real-time RT-PCR analysis
  13. Nature Communications | Model-based analysis of sample index hopping reveals its widespread artifacts in multiplexed single-cell RNA-sequencing
  14. Journal of Clinical Virology | Benchmark of thirteen bioinformatic pipelines for metagenomic virus diagnostics using datasets from clinical samples
  15. STAR Protocols | Deconvolution of in vivo protein-RNA contacts using fractionated eCLIP-seq
  16. Transfusion Medicine and Hemotherapy | Bioinformatics Strategies, Challenges, and Opportunities for Next Generation Sequencing-Based HLA Genotyping