KyroBench

Measuring whether context systems stay correct when knowledge changes and defend against pollution and staleness.


Leaderboard

The headline frontier score reports all-gates incident success: an incident counts only if every gate passes. Retrieval and semantic signals are diagnostics.

Updated June 12, 2026
4 systems / 36,864 scored checks / 12,288 retrievals per run
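To illustrate how an all-gates score differs from a diagnostic signal, the sketch below counts an incident as a success only when every gate passes, while retrieval is reported separately. The gate names are borrowed from the leaderboard columns; the `IncidentResult` type and both scoring functions are hypothetical and are not KyroBench's scoring code.

```python
from dataclasses import dataclass

# Hypothetical gate names, borrowed from the leaderboard's column headers.
GATES = ("semantic", "proof", "freshness", "pollution")


@dataclass
class IncidentResult:
    retrieved_relevant: bool  # diagnostic only: never decides success
    gates: dict[str, bool]    # gate name -> passed? (a missing gate counts as failed)


def headline_score(results: list[IncidentResult]) -> float:
    """Share of incidents (0-100) in which every gate passed."""
    if not results:
        return 0.0
    passed = sum(all(r.gates.get(g, False) for g in GATES) for r in results)
    return 100.0 * passed / len(results)


def retrieval_diagnostic(results: list[IncidentResult]) -> float:
    """Share of incidents (0-100) where relevant evidence surfaced at all."""
    if not results:
        return 0.0
    return 100.0 * sum(r.retrieved_relevant for r in results) / len(results)
```

Because a missing gate counts as a failure here, a system that exposes no proof metadata cannot reach a nonzero headline score under this sketch, however well it retrieves.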

Retrieval signal (0-100)

Relevant evidence surfaced before certification checks.

KyroDB: 79.1
Graphiti/Zep: 71.6
Qdrant: 53.3
Mem0: 0.06

Certified: 0/4 systems
Top retrieval: KyroDB
Fastest p95: Graphiti/Zep

Retrieval signal shows whether the system can surface plausible evidence. It is useful for diagnosis, but it does not certify the context because similar text can still be stale, polluted, or unverifiable.

KyroDB: Strong retrieval coverage and complete proof metadata, but not certified because the held-out semantic and freshness gates did not clear.

Graphiti/Zep: High retrieval signal and fast responses, but not certified because it lacks proof metadata and complete context-support behavior.

Qdrant: Useful as a raw vector-store baseline; it does not expose native freshness, authority, or proof semantics for certification.

Mem0: Strict requests were often blocked as stale, leaving very little completed retrieval signal in the official run.

System	Retrieval	Semantic	Proof	Freshness	Pollution	p95 latency
KyroDB	79.1	0.0	100.0	0.0	100.0	520 ms
Graphiti/Zep	71.6	0.0	0.0	69.4	100.0	168 ms
Qdrant	53.3	0.0	0.0	0.0	100.0	231 ms
Mem0	0.06	0.04	0.0	0.0	2.5	401 ms
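The p95 latency column is the latency that 95% of requests do not exceed. The page does not say which percentile definition KyroBench uses; the sketch below assumes the nearest-rank method, and `p95` is a hypothetical helper.

```python
import math


def percentile_nearest_rank(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def p95(samples_ms: list[float]) -> float:
    return percentile_nearest_rank(samples_ms, 95)
```

Other definitions (for example, linear interpolation between ranks) can give slightly different values on small samples.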

What KyroBench measures

KyroBench tests the context layer an agent receives before it acts: whether evidence is current, in scope, clean, supported, and small enough to use.
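"Small enough to use" implies packing ranked evidence into a fixed budget. A minimal sketch, assuming a token budget and approximating tokens by whitespace splitting (a real system would use its model's tokenizer); `pack_context` is hypothetical.

```python
def pack_context(ranked_items: list[str], budget_tokens: int) -> list[str]:
    """Greedily keep items in rank order while they fit the budget.
    Items that do not fit are skipped, so a smaller, lower-ranked item
    can still use the remaining space."""
    packed: list[str] = []
    used = 0
    for item in ranked_items:
        cost = len(item.split())  # crude token estimate (assumption)
        if used + cost <= budget_tokens:
            packed.append(item)
            used += cost
    return packed
```

Skipping rather than stopping at the first oversized item is a design choice; stopping preserves strict rank order at the cost of unused budget.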

Freshness

Newer valid evidence must beat old text that still matches the query.
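One way to make newer valid evidence win is to collapse each fact to its latest valid version before ranking by similarity. The sketch assumes every item carries a hypothetical `fact_id` grouping versions of the same fact, a `valid_from` timestamp, and a validity flag; none of these names come from KyroBench.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Evidence:
    fact_id: str        # hypothetical key grouping versions of one fact
    text: str
    similarity: float   # query match score from the retriever
    valid_from: datetime
    valid: bool = True  # False if retracted or otherwise invalidated


def freshest_per_fact(candidates: list[Evidence]) -> list[Evidence]:
    """Keep only the newest valid version of each fact, then rank by similarity.
    A superseded version is dropped even when it matches the query better."""
    latest: dict[str, Evidence] = {}
    for ev in candidates:
        if not ev.valid:
            continue
        current = latest.get(ev.fact_id)
        if current is None or ev.valid_from > current.valid_from:
            latest[ev.fact_id] = ev
    return sorted(latest.values(), key=lambda e: e.similarity, reverse=True)
```

In this sketch an older "limit is 100 rps" entry with similarity 0.95 loses to a newer "limit is 250 rps" entry with similarity 0.80 for the same fact.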

Scope

Similar evidence from another scope is treated as incorrect context.
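A minimal sketch of a scope gate, assuming each item carries a hypothetical `scope` label (such as a tenant or project) that must match the request exactly; hierarchical scopes would need a richer comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    text: str
    scope: str         # hypothetical scope label, e.g. tenant or project
    similarity: float


def in_scope(candidates: list[Item], request_scope: str) -> list[Item]:
    """Drop out-of-scope items before ranking, however similar they are."""
    kept = [c for c in candidates if c.scope == request_scope]
    return sorted(kept, key=lambda c: c.similarity, reverse=True)


def scope_violations(returned: list[Item], request_scope: str) -> int:
    """Count returned items that a scope check would mark as incorrect context."""
    return sum(1 for c in returned if c.scope != request_scope)
```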

Pollution

Close but invalid content must not survive ranking or packing.
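The sketch below filters known-invalid items before ranking, so a close but invalid near-duplicate cannot take a slot in the packed context. It assumes the set of invalid ids is already known (how it is produced is outside this sketch); `rank_and_pack` and `pollution_survivors` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    doc_id: str
    similarity: float


def rank_and_pack(candidates: list[Candidate], invalid_ids: set[str], k: int) -> list[Candidate]:
    # Filter first: an invalid near-duplicate must not displace a valid item from the top k.
    clean = [c for c in candidates if c.doc_id not in invalid_ids]
    clean.sort(key=lambda c: c.similarity, reverse=True)
    return clean[:k]


def pollution_survivors(packed: list[Candidate], invalid_ids: set[str]) -> list[str]:
    """Ids of invalid items that reached the final context (should be empty)."""
    return [c.doc_id for c in packed if c.doc_id in invalid_ids]
```

Ranking without the filter would put a poisoned copy scoring 0.93 ahead of the genuine 0.88 document, which `pollution_survivors` would then report.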

Proof

Returned context needs enough metadata for independent verification.
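As an illustration of verifiable metadata, the sketch assumes each returned chunk carries a source id, a source revision, and a SHA-256 digest of its text, so a third party can check it against the source. These field names and the `verify` helper are hypothetical, not KyroBench's proof format.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextChunk:
    text: str
    source_id: str       # hypothetical: where the text came from
    source_version: str  # hypothetical: which revision of that source
    sha256: str          # digest of `text`, recorded when the chunk was stored


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify(chunk: ContextChunk, source_store: dict[tuple[str, str], str]) -> bool:
    """Independently check a chunk: the cited source revision must exist,
    contain the chunk's text, and match the recorded digest."""
    original = source_store.get((chunk.source_id, chunk.source_version))
    if original is None:
        return False
    return chunk.text in original and digest(chunk.text) == chunk.sha256
```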

Multi-hop

The system must combine related evidence without losing constraints.
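A two-hop sketch of combining evidence without losing constraints: each hypothetical `Fact` carries the conditions under which it holds, and chaining two facts keeps the union of those conditions, refusing the combination when they contradict. This is an illustration under those assumptions, not KyroBench's multi-hop check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    # Conditions under which the fact holds, e.g. {("region", "eu")}.
    conditions: frozenset[tuple[str, str]] = frozenset()


def chain(first: Fact, second: Fact) -> Optional[tuple[str, dict[str, str]]]:
    """Two-hop composition: follow first.obj into second.subject.
    Returns the final object plus the union of both facts' conditions,
    or None if the facts do not connect or their conditions conflict."""
    if first.obj != second.subject:
        return None
    merged = dict(first.conditions)
    for key, value in second.conditions:
        if key in merged and merged[key] != value:
            return None  # e.g. region=eu on one hop, region=us on the other
        merged[key] = value
    return second.obj, merged
```

Carrying conditions forward means the final answer ("alice, for the eu region on weekdays") states every constraint picked up along the path instead of just the last hop's.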