EvolutionDB Agent-Memory Benchmarks (v1)
run at 2026-04-28T08:00:55.049149Z
These numbers are produced by bench/run_all.py against a single EvolutionDB process on 127.0.0.1:9967 (default Docker compose). Latencies are wall-clock from the Python ctypes client — they include the EVO text protocol round-trip.
Latency (single process)
| op | n | mean | p50 | p95 | p99 | min | max |
|---|---|---|---|---|---|---|---|
| memory_put | 200 | 2.724 | 2.329 | 5.221 | 8.454 | 1.099 | 13.050 |
| memory_get | 200 | 0.493 | 0.431 | 0.745 | 1.841 | 0.331 | 2.413 |
| checkpoint_put | 200 | 2.359 | 2.080 | 3.886 | 4.795 | 1.027 | 14.451 |
| checkpoint_get_latest | 200 | 0.769 | 0.765 | 0.830 | 0.872 | 0.688 | 0.927 |
| memory_search_top10 | 200 | 3.527 | 3.495 | 3.813 | 4.173 | 3.266 | 4.792 |
units: milliseconds
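
The summary columns in the table above (mean, p50, p95, p99, min, max) can be derived from raw per-call timings along the following lines. This is a minimal sketch, not the code in bench/run_all.py: the helper names `time_op`, `percentile`, and `summarize` are hypothetical, and the linearly interpolated percentile is an assumption, since the report does not state which percentile method the runner uses.

```python
import math
import statistics
import time


def percentile(sorted_samples, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = (len(sorted_samples) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    return sorted_samples[lo] + (sorted_samples[hi] - sorted_samples[lo]) * frac


def summarize(samples_ms):
    """Reduce a list of latencies (ms) to the columns used in the tables."""
    s = sorted(samples_ms)
    return {
        "n": len(s),
        "mean": statistics.fmean(s),
        "p50": percentile(s, 50),
        "p95": percentile(s, 95),
        "p99": percentile(s, 99),
        "min": s[0],
        "max": s[-1],
    }


def time_op(op, n=200):
    """Call op() n times and return per-call wall-clock latencies in ms."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples
```

Passing a zero-argument callable that performs one client round-trip, for example `summarize(time_op(do_one_call, n=200))` with a hypothetical `do_one_call`, would yield one table row. Because the timer wraps the whole client call, protocol round-trip time is included, as in the figures above.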
Reactive delivery latency
| op | n | mean | p50 | p95 | p99 | min | max |
|---|---|---|---|---|---|---|---|
| push (NOTIFY) | 150 | 0.254 | 0.247 | 0.308 | 0.362 | 0.204 | 0.633 |
| poll @ 1000ms | 150 | 470.122 | 431.990 | 928.760 | 986.340 | 8.080 | 993.700 |
units: milliseconds
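
The poll row is broadly what a simple model predicts: if an event lands at a uniformly random moment inside a polling interval, it waits until the next tick, so the delay is spread over the interval with a mean near half of it and a worst case equal to the interval. The sketch below simulates that idealized model. It is an illustration under that uniform-arrival assumption, not the benchmark's own code, and `simulate_poll_delay` is a hypothetical name.

```python
import random
import statistics


def simulate_poll_delay(interval_ms=1000.0, n=150, seed=0):
    """Simulate delivery delay (ms) for a consumer polling every interval_ms.

    Each event arrives at a uniformly random offset within an interval and is
    only observed at the next poll tick, so its delay is the time remaining
    until that tick.
    """
    rng = random.Random(seed)
    return [interval_ms - rng.uniform(0.0, interval_ms) for _ in range(n)]


if __name__ == "__main__":
    delays = simulate_poll_delay()
    print(f"mean={statistics.fmean(delays):.1f} ms  max={max(delays):.1f} ms")
```

Under this model the delay never exceeds the polling interval, which matches the measured maximum staying just below 1000 ms, while a push notification avoids the wait entirely.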
Temporal-query latency
| op | n | mean | p50 | p95 | p99 | min | max |
|---|---|---|---|---|---|---|---|
| select live snapshot | 150 | 2.027 | 1.923 | 2.959 | 3.939 | 1.601 | 5.732 |
| select AS OF historical | 150 | 0.495 | 0.489 | 0.573 | 0.595 | 0.423 | 0.605 |
units: milliseconds
LongMemEval (lexical fallback)
- dataset: <built-in synthetic fixture>
- questions: 2
- ingested messages: 7
- recall@10: 1.000 (2/2)
- retrieval p50: 214.646 ms, p99: 206.949 ms
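
A recall@10 figure such as 1.000 (2/2) can be computed as the fraction of questions for which at least one gold evidence message appears among the top 10 retrieved items. The sketch below assumes that hit-based definition; the scoring used by the runner or by the official LongMemEval tooling may differ (for example, by requiring all evidence messages), and `recall_at_k` is a hypothetical helper.

```python
def recall_at_k(results, k=10):
    """Hit-based recall@k.

    results: list of (retrieved_ids, gold_ids) pairs, one per question, where
    retrieved_ids is ranked best-first. A question counts as a hit if any gold
    id appears in the first k retrieved ids.
    Returns (recall, hits).
    """
    if not results:
        raise ValueError("no questions to score")
    hits = sum(
        1 for retrieved, gold in results if set(retrieved[:k]) & set(gold)
    )
    return hits / len(results), hits


if __name__ == "__main__":
    # Two made-up questions, both answered within the top 10.
    demo = [(["m3", "m1"], ["m1"]), (["m7", "m5", "m2"], ["m2"])]
    recall, hits = recall_at_k(demo, k=10)
    print(f"recall@10: {recall:.3f} ({hits}/{len(demo)})")
```

With only two questions, the metric can take just three values (0, 0.5, 1), so the fixture result is best read as a smoke test.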
Notes
- The reactive bench compares EvoSQL's NOTIFY push against a polling loop at the chosen interval — push p99 should land well under 10 ms while polling worst-case is bounded by the interval (1000 ms by default).
- The LongMemEval row uses the built-in synthetic fixture by default. For the full public dataset, run `bench/longmemeval/run.py --dataset path/to/longmemeval.json` and supply an embedding model via the `embed_fn` parameter.
- Cross-vendor comparison rows (Zep / Mem0 / langgraph-store-mongodb / Pinecone) are deferred to v2. Those backends ship as separate Docker images, and the runner will sweep them once the official compatibility test harness (Task 220) lands.