
“The Anatomy of an Order-Preserving k-mer Dictionary” by Giulio Ermanno PIBIRI

Wed, 02 Jul 2025 13:30 - 14:30 JST
Online Link visible to participants

Free admission

Description

🗓 Date & Time

Wednesday, July 2

🕐 13:30 – 14:30 (JST)

📍 Venue

  • AIP Open Space (AIP members only)
  • Online via Zoom (link will be shared with registered participants)

🎤 Speaker

Giulio Ermanno PIBIRI

Ca’ Foscari University

🔗 GitHub Profile

🧠 Organizer

Succinct Information Processing Team in RIKEN-AIP


🧪 Title

The Anatomy of an Order-Preserving k-mer Dictionary


🧬 Abstract

Efficiently storing and querying large collections of k-mers while preserving their original order in the input sequences is a critical challenge in bioinformatics. Many existing approaches optimize for membership queries but neglect the sequential arrangement of k-mers, which is essential for applications such as read mapping and genome assembly.

We will introduce an order-preserving k-mer dictionary built upon three fundamental building blocks:

  • Spectrum-preserving string sets, which enable a compact representation of k-mer sets while maintaining the original spectrum.

  • (Random) minimizers, a class of randomized algorithms that sample representative k-mers, allowing for space-efficient indexing and localized searches.

  • (Compressed) minimal perfect hashing, which provides collision-free, constant-time lookups in highly compressed space.
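
As a rough illustration of the second building block, the following Python sketch computes a random minimizer for each k-mer of a short sequence. It is not taken from the talk: the function names (`mmer_hash`, `minimizer`, `kmers`), the use of BLAKE2b as the pseudo-random order, and the example parameters are all assumptions made for this sketch.

```python
import hashlib


def mmer_hash(mmer: str, seed: int = 0) -> int:
    """Pseudo-random 64-bit value for an m-mer (assumed stand-in for any good hash)."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(mmer.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minimizer(kmer: str, m: int, seed: int = 0) -> tuple[str, int]:
    """Return (m-mer, offset) of the m-mer with the smallest hash; leftmost wins ties."""
    offsets = range(len(kmer) - m + 1)
    best = min(offsets, key=lambda i: (mmer_hash(kmer[i:i + m], seed), i))
    return kmer[best:best + m], best


def kmers(seq: str, k: int) -> list[str]:
    """All k-mers of seq, in their order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    # Hypothetical example values: k = 7, m = 3.
    for x in kmers("ACGGTAGAACCGATTCA", 7):
        print(x, minimizer(x, 3))
```

Consecutive k-mers overlap in all but one character, so they frequently share the same minimizer. This locality is what makes minimizers useful for grouping k-mers and restricting a search to a small region.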

By integrating these components, our data structure efficiently encodes the order of k-mers and supports fast exact membership, as well as streaming and navigational queries. This data structure also forms the basis for indexing colored de Bruijn graphs, which we will cover on July 4th.
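
To make the idea of an order-preserving lookup concrete, here is a toy Python sketch, not the speaker's implementation. It assumes the input is a list of strings in which every k-mer occurs exactly once (as in a spectrum-preserving string set), ignores reverse complements, and uses an ordinary Python `dict` where a real index would use a compressed minimal perfect hash function. The class name `ToyKmerDictionary` and all parameters are hypothetical.

```python
import hashlib


def minimizer(kmer: str, m: int) -> tuple[str, int]:
    """Return (m-mer, offset) with the smallest BLAKE2b hash; leftmost wins ties."""
    def h(s: str) -> int:
        return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "little")
    best = min(range(len(kmer) - m + 1), key=lambda i: (h(kmer[i:i + m]), i))
    return kmer[best:best + m], best


class ToyKmerDictionary:
    """Maps each k-mer to its rank in the order the k-mers appear in `strings`."""

    def __init__(self, strings: list[str], k: int, m: int):
        self.strings, self.k, self.m = strings, k, m
        # first_id[j] is the id of the first k-mer of strings[j].
        self.first_id = []
        total = 0
        for s in strings:
            self.first_id.append(total)
            total += len(s) - k + 1
        self.size = total
        # minimizer -> set of (string index, position of the minimizer in that string).
        # A plain dict stands in for a minimal perfect hash over the minimizers.
        self.buckets: dict[str, set[tuple[int, int]]] = {}
        for j, s in enumerate(strings):
            for i in range(len(s) - k + 1):
                mini, off = minimizer(s[i:i + k], m)
                self.buckets.setdefault(mini, set()).add((j, i + off))

    def lookup(self, kmer: str) -> int:
        """Return the id of `kmer`, or -1 if it is not in the dictionary."""
        if len(kmer) != self.k:
            return -1
        mini, off = minimizer(kmer, self.m)
        for j, pos in self.buckets.get(mini, ()):
            start = pos - off  # where the k-mer must begin if it matches here
            if start >= 0 and self.strings[j][start:start + self.k] == kmer:
                return self.first_id[j] + start
        return -1


if __name__ == "__main__":
    d = ToyKmerDictionary(["ACGTACGGT", "TTGACCA"], k=5, m=3)
    print(d.lookup("GTACG"), d.lookup("TGACC"), d.lookup("AAAAA"))
```

Because ids follow the positions of k-mers within the strings, two k-mers that are consecutive in a string receive consecutive ids, which is the property that streaming and navigational queries can exploit.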


About this community

RIKEN AIP Public

Public events of RIKEN Center for Advanced Intelligence Project (AIP)
