Clique algorithm optimality in active clustering

Conjecture 5 · arXiv:2110.14521

arXiv Conjecture high confidence— first stated 2021-10-27

Authors: Quentin Lutz, Élie de Panafieu, Alex Scott, Maya Stein

Status open high confidence

Conjecture 5 of arXiv:2110.14521 asserts that among all active clustering (AC) algorithms, the clique algorithm achieves minimal average query complexity when items are assigned to classes independently with fixed probabilities p_1 >= p_2 >= ... >= p_k. The paper itself provides Theorem 6 as supporting evidence, showing the clique algorithm's asymptotic expected query count equals sum_{i=1}^k i*p_i*n. A search of citing literature (via Semantic Scholar) yields six citing papers as of May 2026, none of which address this conjecture; the conjecture appears to remain open.

Reviewer notes. Semantic Scholar lists 6 citing papers (as of May 2026): Bastide & Groenland 2025 (distance query reconstruction), A3S 2024 (arXiv:2407.10196, general active clustering), Bastide & Groenland 2023 (arXiv:2306.05979), a handwriting paper, a deep constrained clustering paper, and a lexicographic unranking paper. None of these appears to address Conjecture 5. The conjecture is recent (NeurIPS 2021) and highly specific to the probabilistic AC model; no follow-up was found in the indexed literature.

Auto-reviewed 2026-05-15 with claude-sonnet-4-6 (web search enabled).

Conjecture. For an $n$-set with a random partition with probabilities $p_1, p_2, \ldots, p_k$, the clique algorithm has minimal average complexity among all AC algorithms.

Context

In the model where each of $n$ items independently belongs to class $C_i$ with probability $p_i$ (with $p_1 \geq p_2 \geq \cdots \geq p_k$), the authors argue that an optimal algorithm should compare each new element first to the largest identified class. The clique algorithm does exactly this, comparing each new item to blocks in decreasing order of size. The conjecture is stated directly before Theorem 6, which characterises the asymptotic expected query count of the clique algorithm as $\sum_{i=1}^k i p_i n$, offered as supporting evidence.

Source paper

Active clustering for labeling training data
Quentin Lutz, Élie de Panafieu, Alex Scott, Maya Stein · 2021-10-27
https://arxiv.org/abs/2110.14521 PDF source