Clique algorithm optimality in active clustering
Conjecture 5 · arXiv:2110.14521
Status open high confidence
Conjecture 5 of arXiv:2110.14521 asserts that among all active clustering (AC) algorithms, the clique algorithm achieves minimal average query complexity when items are assigned to classes independently with fixed probabilities p_1 >= p_2 >= ... >= p_k. The paper itself provides Theorem 6 as supporting evidence, showing the clique algorithm's asymptotic expected query count equals sum_{i=1}^k i*p_i*n. A search of citing literature (via Semantic Scholar) yields six citing papers as of May 2026, none of which address this conjecture; the conjecture appears to remain open.
Reviewer notes. Semantic Scholar lists 6 citing papers (as of May 2026): Bastide & Groenland 2025 (distance query reconstruction), A3S 2024 (arXiv:2407.10196, general active clustering), Bastide & Groenland 2023 (arXiv:2306.05979), a handwriting paper, a deep constrained clustering paper, and a lexicographic unranking paper. None of these appears to address Conjecture 5. The conjecture is recent (NeurIPS 2021) and highly specific to the probabilistic AC model; no follow-up was found in the indexed literature.
Context
In the model where each of $n$ items independently belongs to class $C_i$ with probability $p_i$ (with $p_1 \geq p_2 \geq \cdots \geq p_k$), the authors argue that an optimal algorithm should compare each new element first to the largest identified class. The clique algorithm does exactly this, comparing each new item to blocks in decreasing order of size. The conjecture is stated directly before Theorem 6, which characterises the asymptotic expected query count of the clique algorithm as $\sum_{i=1}^k i p_i n$, offered as supporting evidence.
Source paper
Active clustering for labeling training data
Quentin Lutz, Élie de Panafieu, Alex Scott, Maya Stein · 2021-10-27
https://arxiv.org/abs/2110.14521
PDF source