[RIKEN AIP Achievement Report Event] CompressedAI2025: Compressed Data Structures for Advanced Intelligent Data Analysis

Name: [RIKEN AIP Achievement Report Event] CompressedAI2025: Compressed Data Structures for Advanced Intelligent Data Analysis
Start: 2025-07-03T04:00+09:00
End: 2025-07-04T08:00+09:00

2025/07/03(木) 04:00 〜 2025/07/04(金) 08:00

主催：RIKEN AIP Public

イベントに申し込む

📊 RIKEN AIP Achievement Report Event

CompressedAI2025: Compressed Data Structures for Advanced Intelligent Data Analysis

🧠 Workshop Overview

This workshop highlights recent advancements in compressed data structures and AI, with a focus on real-world applications in data analysis. Topics include:

Advanced compressed data structures
Efficient algorithms
Cutting-edge ML models
Graph-based methodologies
Algorithmic techniques for real-world challenges

The event features talks by leading researchers with significant contributions in these fields.

🎯 Key Topics

Advanced Algorithmic Techniques
Compressed Data Structures
Machine Learning
Applications: Bioinformatics, Chemoinformatics, Trajectory Data Analyses

🏢 Organization

Succinct Information Processing Team, RIKEN-AIP

🗓️ Date

July 3rd and 4th

📍 Venue

In-person: RIKEN-AIP Nihonbashi office (Access) (please register via the Google form)
Virtual: Zoom

👥 Speakers

📅 Program Schedule

Day 1 – July 3rd

Time	Speaker	Title
13:00–13:50	Yasuo Tabei	Past Achievements and Future Directions in Succinct Information Processing
13:50–14:40	Takaaki Nishimoto	Optimal-Time Queries and Constructions of RLBWT in BWT-runs Bounded Space
14:40–15:30	Ben Langmead	The new move (infra)structure for pangenomics
15:30–16:00	—	☕ Coffee Break
16:00–16:50	Yoshitaka Yamamoto	Application and Limitation of Itemset-Frequency Sketches
16:50–17:40	Rossano Venturini (Online)	Fast KNN Retrieval for Dense and Sparse Learned Embedding

Day 2 – July 4th

Time	Speaker	Title
13:00–13:50	Simon J. Puglisi	Fast k-mer lookups with the Spectral Burrows-Wheeler transform
13:50–14:40	Giulio Ermanno Pibiri	Repetition-aware Compression and Query of Colored de Bruijn Graphs
14:40–15:10	—	☕ Coffee Break
15:10–16:00	Tomohiro I	Space-efficient SLP Encoding for O(log N)-time Random Access
16:00–16:50	Travis Gagie (Online)	Fun with MEMs

📄 Abstracts

Fun with MEMs

A maximal exact match (MEM) of a pattern P with respect to a text T is a substring of P occurring in T but that can't be extended in either direction while still occurring in T. Finding MEMs is an important step in many bioinformatics tasks, such as DNA read alignment, and taxonomic and metagenomic classification. Although there are theoretically superior options, most practitioners find MEMs with Li's forward-backward algorithm, which is used in several popular tools. The number of steps forward-backward takes is roughly proportional to the total length of the MEMs, however, which tends to be dominated by short MEMs that are likely due to noise.

Last year we noticed that a strategy similar to Boyer-Moore pattern matching, which we call Boyer-Moore-Li (BML), can be used to speed up forward-backward when looking only for long MEMs. Li incorporated BML into his tool ropebwt3 (over a weekend!) and found it often gives a tenfold speedup in practice. Continuing the Boyer-Moore philosophy of ignoring as much of the input as possible, we then introduced a k-mer filtration step: if a k-mer P[i..i + k - 1] does not occur in T then no MEM of length L > k can contain P[i..i + k - 1]; therefore, given k and a lower bound L > k on the lengths of the MEMs we want, we can use a Bloom filter of the distinct k-mers in T to break P into maximal substrings containing only k-mers accepted by the filter, which we call pseudo-MEMs, and then search only the pseudo-MEMs of length L or more. In fact, our experiments show it is usually enough to search only a few of the longest pseudo-MEMs, and this is much faster than searching all of P even with BML.

We are now working on mixing BML with two-level indexing. We build an rsync-like parse of T and index the parse with a run-length compressed suffix array (RLCSA) and T itself with an FM-index. Given P, we parse it and try as much as we can to run BML on the parses of P with respect to the parse of T, dropping into P and T themselves as little as we can. Matching phrase by phrase is faster than matching character by character, especially since the parse of T is over a large alphabet (the dictionary of distinct phrases) and that makes the RLCSA cache-efficient.

This is joint work with many people, including Hideo Bannai, Christina Boucher, Nate Brown, Lore Depuydt, Dominik Köppl, Ben Langmead, Giovanni Manzini, Gonzalo Navarro, Marinella Sciortino and Mohsen Zakeri.

The new move (infra)structure for pangenomics

Pangenome indexing is an important tool for avoiding the problem of reference bias in sequencing data analysis. Compressed indexes like r-index are practical because they use space proportional to the amount of non-redundant sequence in a pangenome. Since the r-index appeared, it became the basis of new methods for (a) computing maximal exact matches, (b) performing multi-class and taxonomic classification, and (c) read alignment.

Nishimoto & Tabei recently proposed the "move structure" as an alternative to the r-index. The move structure has advantageous time and space complexity, achieving the same important O(r) space bound as the r-index. It also radically rearranges how the compressed index is laid out. Instead of being fragmented over many wavelet trees, bitvectors and lookup tables, the move structure simply consists of one table. A move-structure match query involves a predictable and small number of movements through the table. This leads to excellent locality of reference, yielding the best query speed of any full-text compressed index.

I will discuss how the move structure has become the basis for a new suite of practical pangenome methods, starting with my group's work on Movi. Movi is a move-structure-based compressed index for pangenome indexing. I will discuss many practical improvements we have made in Movi, including (a) integrating "thresholds" into the structure so that it can perform MONI-like queries, (b) minimizing the number of bytes used to represent each row of the structure using compression, (c) different strategies and rationales for "splitting" rows of the structure, (d) specific ways in which the move structure (and its notion of "splitting") makes it applicable to classification. I will also discuss how the move structure can continue to develop into a core pangenome method alongside r-index. Much of this work is joint with Travis Gagie and Christina Boucher.

キーワード
タグ
ツール	Discord Google Hangout Google Meet Remo Skype Slack Microsoft Teams Whereby YouTube Live Zoom
開催日
こだわり条件	人気のウェビナー終了のウェビナーを含む

[RIKEN AIP Achievement Report Event] CompressedAI2025: Compressed Data Structures for Advanced Intelligent Data Analysis

主催：RIKEN AIP Public

📊 RIKEN AIP Achievement Report Event

CompressedAI2025: Compressed Data Structures for Advanced Intelligent Data Analysis

🧠 Workshop Overview

🎯 Key Topics

🏢 Organization

🗓️ Date

📍 Venue

👥 Speakers

📅 Program Schedule

Day 1 – July 3rd

Day 2 – July 4th

📄 Abstracts

Fun with MEMs

The new move (infra)structure for pangenomics

似たイベント

【WACA関東支部主催】【増席】生成AI時代のサイトデータ分析手法 ― ウェブ解析士のための爆速改善講座＆フォローアップテスト対策講座＜ウェブ解析士・上級ウェブ解析士＞〜2026年も資格維持するために〜

主催：ウェブ解析士協会 WACA【公式】

ChatGPTとCANVAでサクッと作れる！初心者にもできる動画作成講座

主催：ビーラブカンパニー公式セミナー・イベント

企業法務最前線！リーガルテック/AIの現在地点～先進事例の学びをぎゅっと凝縮～

主催：マジセミ×ビジネス（デジタルとの新たな出会いと体験）

クラウドネイティブBI「Sigma」で実現する Snowflake データの徹底活用！

主催：マジセミ×データ活用（デジタルとの新たな出会いと体験）

UiPath Friends 九州～初めてでも大丈夫！話して学ぶ、自動化とAIの一歩～

主催：UiPath Friends [公式]

📊 RIKEN AIP Achievement Report Event

CompressedAI2025: Compressed Data Structures for Advanced Intelligent Data Analysis

🧠 Workshop Overview

🎯 Key Topics

🏢 Organization

🗓️ Date

📍 Venue

👥 Speakers

📅 Program Schedule

Day 1 – July 3rd

Day 2 – July 4th

📄 Abstracts

Fun with MEMs

The new move (infra)structure for pangenomics

似たイベント

【WACA関東支部主催】【増席】生成AI時代のサイトデータ分析手法 ― ウェブ解析士のための爆速改善講座＆フォローアップテスト対策講座＜ウェブ解析士・上級ウェブ解析士＞〜2026年も資格維持するために〜

主催：ウェブ解析士協会 WACA【公式】

ChatGPTとCANVAでサクッと作れる！初心者にもできる動画作成講座

主催：ビーラブカンパニー公式セミナー・イベント

企業法務 最前線！リーガルテック/AIの現在地点 ～先進事例の学びをぎゅっと凝縮～

主催：マジセミ×ビジネス（デジタルとの新たな出会いと体験）

クラウドネイティブBI「Sigma」で実現する Snowflake データの徹底活用！

主催：マジセミ×データ活用（デジタルとの新たな出会いと体験）

UiPath Friends 九州 ～初めてでも大丈夫！話して学ぶ、自動化とAIの一歩～

主催：UiPath Friends [公式]

企業法務最前線！リーガルテック/AIの現在地点～先進事例の学びをぎゅっと凝縮～

UiPath Friends 九州～初めてでも大丈夫！話して学ぶ、自動化とAIの一歩～