Indexing with Intention: Matching Indexing Strategies to Your Projects and Experiments

If you’ve spent any time preparing Illumina sequencing libraries in the last few years, you’ve probably heard a lot about index hopping, and perhaps even more about how unique dual indexing (UDI) is the only safe way forward. While UDI absolutely has its place, the reality is a bit more nuanced. For many common Illumina sequencing applications, combinatorial dual indexing (CDI) remains an appropriate and often more practical choice.

Let’s unpack why…

 

What is Index Hopping?

Index hopping refers to the misassignment of reads to the wrong sample due to index sequences swapping during clustering and sequencing on the flow cell. This phenomenon is most commonly associated with Illumina’s patterned flow cells and exclusion amplification chemistry, where free adapters or index primers can occasionally get incorporated into the wrong library molecules.

The key point that often gets lost in the conversation is the scale of the phenomenon: the actual rate of index hopping is typically quite low, often well under 0.5% and in many cases closer to 0.1 – 0.2% of the total reads sequenced.

 

When Would That Low Rate of Index Hopping Matter?

Index hopping becomes a real concern when you are trying to detect extremely rare signals, such as:

  • Low-frequency somatic variants
  • Ultra-low abundance transcripts
  • Rare microbial species in metagenomics
  • Minimal residual disease or contamination studies

In these scenarios, even a fraction of a percent of misassigned reads can blur the line between real signal and noise. This is where unique dual indexing is the best course of action: because each i7 and each i5 index is used in only one sample, a read in which one index has hopped carries an index pair that matches no sample, so it can be identified and filtered bioinformatically with high confidence.
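To make the filtering idea concrete, here is a minimal Python sketch of what happens to a single hopped read under each strategy. The index names, sample names, and the `classify_read` helper are all hypothetical, and real demultiplexers also handle mismatches and quality filtering, so treat this as an illustration rather than how any particular tool works.

```python
from itertools import product

def classify_read(i7, i5, sample_sheet):
    """Return the sample assigned to an (i7, i5) pair, or None if the pair is unexpected."""
    return sample_sheet.get((i7, i5))

# UDI (hypothetical names): every i7 and every i5 appears in exactly one sample.
udi_sheet = {
    ("i7_A", "i5_A"): "sample_1",
    ("i7_B", "i5_B"): "sample_2",
}

# CDI (hypothetical names): all i7 x i5 combinations are assigned to samples.
cdi_sheet = {
    pair: f"sample_{n}"
    for n, pair in enumerate(product(["i7_A", "i7_B"], ["i5_A", "i5_B"]), start=1)
}

# A read from the (i7_A, i5_A) library whose i5 index has hopped to i5_B.
hopped_read = ("i7_A", "i5_B")

udi_result = classify_read(*hopped_read, udi_sheet)  # pair matches no sample
cdi_result = classify_read(*hopped_read, cdi_sheet)  # pair is valid for another sample

print("UDI:", udi_result)
print("CDI:", cdi_result)
```

Under the UDI sheet the hopped pair matches nothing and can be discarded. Under a fully populated CDI sheet the same pair is a legitimate combination, so the read is quietly assigned to another sample.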

 

For Most Other Applications, CDI is More Than Sufficient

However, most sequencing experiments don’t live at the edge of detectability. For a wide range of standard Illumina applications, the low rate of index hopping simply does not meaningfully impact results. Examples include:

  • Standard whole genome sequencing
  • Genotyping-by-sequencing, skim-seq, and population-scale studies
  • Targeted panels with moderate allele frequencies
  • Bulk RNA sequencing
  • Plasmid and amplicon sequencing

In these cases, a <0.5% hopping rate has little practical effect: the misassigned reads are few relative to biological variation and are absorbed by sequencing depth and the statistical robustness of the analysis. The signal you care about is orders of magnitude stronger than any misassigned background reads.

Put simply: if you’re not hunting for needles in a haystack, a few extra pieces of straw won’t change the picture.
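A rough back-of-the-envelope comparison shows why. The sketch below takes the upper-end 0.5% hopping rate quoted above and makes a deliberately pessimistic assumption: that this whole fraction of a sample's reads comes from other samples and carries the confounding signal. The allele fractions are illustrative values, not measurements, and `signal_to_background` is a hypothetical helper.

```python
def signal_to_background(signal_fraction, hop_rate):
    """Ratio of the true signal fraction to the worst-case hopped-read fraction."""
    if hop_rate <= 0:
        raise ValueError("hop_rate must be positive")
    return signal_fraction / hop_rate

HOP_RATE = 0.005  # 0.5%, the upper end of the range quoted in the text

# Illustrative allele fractions (assumed for this example)
germline_het = signal_to_background(0.50, HOP_RATE)   # heterozygous germline variant
rare_somatic = signal_to_background(0.005, HOP_RATE)  # somatic variant at 0.5%

print(f"germline het: signal is about {germline_het:.0f}x the hopped background")
print(f"rare somatic: signal is about {rare_somatic:.0f}x the hopped background")
```

Even under this worst-case assumption, a heterozygous germline call sits roughly two orders of magnitude above the hopped background, while a 0.5% somatic variant is about the same size as the background. That is the regime where UDI earns its keep.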

 

Advantages of Combinatorial Indexing Strategies

A major advantage of combinatorial dual indexing is the ability to reach higher multiplexing levels without needing to screen and validate thousands of unique index pairs. Indexes are notoriously fickle in terms of sequencing robustness, and some simply do not perform well with Illumina chemistry. To reach high numbers of UDI combinations, you may need to screen and validate many thousands of indexes to find a group of UDIs that work well together as a set. With CDI, expanding an index set is much less onerous, because every added i7 index can be combined with every existing i5 index, and vice versa.
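The scaling difference is simple arithmetic, sketched below in Python. The function names are hypothetical, and the CDI estimate assumes a square layout with equal numbers of i7 and i5 indexes. Real kits may use other layouts such as 8 x 12.

```python
import math

def cdi_combinations(n_i7, n_i5):
    """Samples addressable when every i7 can be paired with every i5."""
    return n_i7 * n_i5

def udi_pairs(n_i7, n_i5):
    """Samples addressable when each index may be used in only one pair."""
    return min(n_i7, n_i5)

def indexes_needed(n_samples):
    """Validated index sequences needed to label n_samples (square CDI layout assumed)."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    side = math.isqrt(n_samples - 1) + 1  # ceiling of the square root
    return {"udi": 2 * n_samples, "cdi": 2 * side}

print(cdi_combinations(96, 96))  # combinations from 96 i7 and 96 i5 indexes
print(udi_pairs(96, 96))         # unique pairs from the same 192 indexes
print(indexes_needed(9216))      # indexes to validate for 9,216 samples
```

With 96 i7 and 96 i5 indexes, CDI addresses 9,216 samples from 192 validated sequences. Labeling the same number of samples with UDI would need 18,432 validated sequences.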

For high-throughput, cost-sensitive applications such as plasmid and amplicon sequencing, skim-seq, and population-scale genotyping, the ability to pool and sequence thousands of samples together using CDI can lead to major cost and time savings. This is particularly attractive for genotyping-by-sequencing in agrigenomics and plasmid sequencing in synthetic biology.

 

The Bottom Line

The most important takeaway is this: indexing strategy should match the biological question being asked and support the scale of the work being done.

  • If your goal is robust, scalable sequencing for most common genomics applications, CDI is the most efficient and cost-effective option.
  • If your experiment depends on detecting rare events at or below 1% frequency, UDI is a smart investment.

While index hopping is real, it is a phenomenon that happens at a very low rate. For many Illumina sequencing applications, its impact is negligible and combinatorial dual indexing remains a reliable and practical choice. Unique dual indexing is a powerful tool when you need it, but it’s not a universal requirement.

As with most things in genomics, the best solution isn’t one-size-fits-all—it’s the one that fits your experiment.