
EvolutionDB Agent-Memory Benchmarks (v1)

Run at 2026-04-28T08:00:55.049149Z

These numbers are produced by bench/run_all.py against a single EvolutionDB process on 127.0.0.1:9967 (the default Docker Compose setup). Latencies are wall-clock measurements taken in the Python ctypes client, so they include the EVO text-protocol round-trip.
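The timing loop behind these tables can be sketched roughly as follows. This is an illustrative reconstruction, not the actual bench/run_all.py code; `bench_op` and the percentile indexing are our own names and choices.

```python
# Sketch of a wall-clock latency bench, assuming the style described above.
# `bench_op` is a hypothetical helper, not the real bench/run_all.py API.
import time
import statistics

def bench_op(fn, n=200):
    """Time `fn` n times and return latency stats in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()  # one client round-trip, e.g. a memory_put call
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    # Nearest-rank percentile on the sorted samples.
    pct = lambda p: samples[min(len(samples) - 1, int(p / 100 * len(samples)))]
    return {
        "n": n,
        "mean": statistics.mean(samples),
        "p50": pct(50), "p95": pct(95), "p99": pct(99),
        "min": samples[0], "max": samples[-1],
    }
```

Each table below is one such loop per operation, with n as shown in the first column.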

Latency (single process)

op                     n     mean    p50     p95     p99     min     max
memory_put             200   2.724   2.329   5.221   8.454   1.099   13.050
memory_get             200   0.493   0.431   0.745   1.841   0.331   2.413
checkpoint_put         200   2.359   2.080   3.886   4.795   1.027   14.451
checkpoint_get_latest  200   0.769   0.765   0.830   0.872   0.688   0.927
memory_search_top10    200   3.527   3.495   3.813   4.173   3.266   4.792

units: milliseconds

Reactive delivery latency

op             n     mean      p50       p95       p99       min     max
push (NOTIFY)  150   0.254     0.247     0.308     0.362     0.204   0.633
poll @ 1000ms  150   470.122   431.990   928.760   986.340   8.080   993.700

units: milliseconds
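The shape of the polling row follows directly from the polling model: an event lands at a random instant, and a poller on a fixed interval T only observes it at the next tick, so delivery latency is roughly uniform on [0, T]. A minimal simulation (our own sketch, not part of the bench) shows the same profile, with the mean near T/2 and the maximum bounded by T:

```python
# Sketch: polling delivery latency is roughly uniform on [0, interval].
# Hypothetical simulation, not the actual reactive bench.
import random

def simulate_poll_latency(interval_ms=1000.0, n=150, seed=42):
    """Latency from a random event instant to the next poll tick."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, interval_ms) for _ in range(n)]

latencies = simulate_poll_latency()
# mean lands near interval/2; max stays strictly under the interval
```

This matches the measured row above: poll mean around 470 ms and max just under the 1000 ms interval, versus sub-millisecond push.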

Temporal-query latency

op                       n     mean    p50     p95     p99     min     max
select live snapshot     150   2.027   1.923   2.959   3.939   1.601   5.732
select AS OF historical  150   0.495   0.489   0.573   0.595   0.423   0.605

units: milliseconds
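The two query shapes measured above look roughly like this. The exact EvolutionDB query syntax and table names here are assumptions for illustration only; they are not taken from the EvolutionDB documentation.

```python
# Hypothetical query shapes only -- syntax and schema are assumed,
# not confirmed against EvolutionDB's actual EVO text protocol.
live_sql = "SELECT value FROM memory WHERE agent_id = ?"
hist_sql = "SELECT value FROM memory AS OF TIMESTAMP ? WHERE agent_id = ?"

# A historical AS OF read can be cheaper than a live read when it resolves
# against an immutable, already-materialized version, which is consistent
# with the table above (AS OF p50 0.489 ms vs live p50 1.923 ms).
```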

LongMemEval (lexical fallback)

  • dataset: <built-in synthetic fixture>
  • questions: 2
  • ingested messages: 7
  • recall@10: 1.000 (2/2)
  • retrieval p50: 214.646 ms, p99: 206.949 ms
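recall@10 here counts a question as a hit when any gold message appears in its top-10 retrieved list. A minimal sketch of that metric (the function name and data shapes are ours, not the benchmark's):

```python
# Sketch of recall@k as used above; hypothetical helper, not bench code.
def recall_at_k(retrieved, gold, k=10):
    """Fraction of questions whose top-k retrieved ids include a gold id.

    retrieved: list of ranked id lists, one per question.
    gold: list of gold id lists, one per question.
    """
    hits = sum(1 for r, g in zip(retrieved, gold) if set(r[:k]) & set(g))
    return hits / len(gold)
```

With 2 questions and both answered from the top 10, this yields 1.000, matching the row above.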

Notes

  • The reactive bench compares EvolutionDB's NOTIFY push against a polling loop at the chosen interval; push p99 should land well under 10 ms, while the polling worst case is bounded by the interval (1000 ms by default).
  • The LongMemEval row uses the built-in synthetic fixture by default. For the full public dataset, run bench/longmemeval/run.py --dataset path/to/longmemeval.json and supply an embedding model via the embed_fn parameter.
  • Cross-vendor comparison rows (Zep / Mem0 / langgraph-store-mongodb / Pinecone) are deferred to v2 — those backends ship as separate Docker images and the runner will sweep them once the official compatibility test harness (Task 220) lands.
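Wiring an embedding model into the full-dataset run might look like this. The `embed_fn` signature below (a batch of strings in, a batch of vectors out) is an assumption based on the note above, not a documented contract, and the toy embedding is a stand-in for a real model.

```python
# Hypothetical embed_fn shape -- signature assumed from the note above.
from typing import List

def embed_fn(texts: List[str]) -> List[List[float]]:
    """Toy batch embedder: one small vector per input text.

    Replace with a real embedding model for meaningful retrieval;
    this stand-in only demonstrates the assumed call shape.
    """
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]
```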