“The Anatomy of an Order-Preserving k-mer Dictionary” by Giulio Ermanno PIBIRI
🗓 Date & Time
Wednesday, July 2
🕐 13:30 – 14:30 (JST)
📍 Venue
- AIP Open Space (AIP members only)
- Online via Zoom (link will be shared with registered participants)
🎤 Speaker
Giulio Ermanno PIBIRI
Ca’ Foscari University
🔗 GitHub Profile
🧠Organizer
Succinct Information Processing Team in RIKEN-AIP
🧪 Title
The Anatomy of an Order-Preserving k-mer Dictionary
🧬 Abstract
Efficiently storing and querying large collections of kmers while preserving their original order in sequences is a critical challenge in bioinformatics. Some approaches often optimize for membership queries but neglect the importance of maintaining the sequential arrangement of kmers, which is essential for applications like read mapping and genome assembly.
We will introduce an order-preserving kmer dictionary built upon three fundamental building blocks:
Spectrum-preserving string sets, which enable compact representation of kmer
sets while maintaining the original spectrum.(Random) minimizers, a class of randomized algorithms that sample representative
kmers, allowing for space-efficient indexing and localized searches.(Compressed) Minimal perfect hashing, providing collision-free and constant-time lookups in highly compressed space.
By integrating these components, our data structure efficiently encodes the order of kmers and allows fast exact membership, as well as streaming and navigational queries. This data structure is also at the basis for the indexing of colored de Bruijn graph that we will cover on July 4th.