
EvolutionDB Agent-Memory Benchmarks (v1)

Run at 2026-04-28T08:00:55.049149Z

These numbers are produced by bench/run_all.py against a single EvolutionDB process on 127.0.0.1:9967 (the default Docker Compose setup). Latencies are wall-clock measurements taken in the Python ctypes client, so they include the EVO text-protocol round-trip.
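The timing loop behind these tables can be sketched roughly as follows. This is an illustrative reconstruction, not the actual bench/run_all.py code; `bench_op` and the percentile indexing are our own names and choices.

```python
# Sketch of a wall-clock latency bench, assuming the style described above.
# `bench_op` is a hypothetical helper, not the real bench/run_all.py API.
import time
import statistics

def bench_op(fn, n=200):
    """Time `fn` n times and return latency stats in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()  # one client round-trip, e.g. a memory_put call
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    # Nearest-rank percentile on the sorted samples.
    pct = lambda p: samples[min(len(samples) - 1, int(p / 100 * len(samples)))]
    return {
        "n": n,
        "mean": statistics.mean(samples),
        "p50": pct(50), "p95": pct(95), "p99": pct(99),
        "min": samples[0], "max": samples[-1],
    }
```

Each table below is one such loop per operation, with n as shown in the first column.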

Latency (single process)

op                     n     mean    p50     p95     p99     min     max
memory_put             200   2.724   2.329   5.221   8.454   1.099   13.050
memory_get             200   0.493   0.431   0.745   1.841   0.331   2.413
checkpoint_put         200   2.359   2.080   3.886   4.795   1.027   14.451
checkpoint_get_latest  200   0.769   0.765   0.830   0.872   0.688   0.927
memory_search_top10    200   3.527   3.495   3.813   4.173   3.266   4.792

units: milliseconds

Reactive delivery latency

op             n     mean      p50       p95       p99       min     max
push (NOTIFY)  150   0.254     0.247     0.308     0.362     0.204   0.633
poll @ 1000ms  150   470.122   431.990   928.760   986.340   8.080   993.700

units: milliseconds
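The shape of the polling row follows directly from the polling model: an event lands at a random instant, and a poller on a fixed interval T only observes it at the next tick, so delivery latency is roughly uniform on [0, T]. A minimal simulation (our own sketch, not part of the bench) shows the same profile, with the mean near T/2 and the maximum bounded by T:

```python
# Sketch: polling delivery latency is roughly uniform on [0, interval].
# Hypothetical simulation, not the actual reactive bench.
import random

def simulate_poll_latency(interval_ms=1000.0, n=150, seed=42):
    """Latency from a random event instant to the next poll tick."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, interval_ms) for _ in range(n)]

latencies = simulate_poll_latency()
# mean lands near interval/2; max stays strictly under the interval
```

This matches the measured row above: poll mean around 470 ms and max just under the 1000 ms interval, versus sub-millisecond push.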

Temporal-query latency

op                       n     mean    p50     p95     p99     min     max
select live snapshot     150   2.027   1.923   2.959   3.939   1.601   5.732
select AS OF historical  150   0.495   0.489   0.573   0.595   0.423   0.605

units: milliseconds
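The two query shapes measured above look roughly like this. The exact EvolutionDB query syntax and table names here are assumptions for illustration only; they are not taken from the EvolutionDB documentation.

```python
# Hypothetical query shapes only -- syntax and schema are assumed,
# not confirmed against EvolutionDB's actual EVO text protocol.
live_sql = "SELECT value FROM memory WHERE agent_id = ?"
hist_sql = "SELECT value FROM memory AS OF TIMESTAMP ? WHERE agent_id = ?"

# A historical AS OF read can be cheaper than a live read when it resolves
# against an immutable, already-materialized version, which is consistent
# with the table above (AS OF p50 0.489 ms vs live p50 1.923 ms).
```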

LongMemEval (lexical fallback)

  • dataset: <built-in synthetic fixture>
  • questions: 2
  • ingested messages: 7
  • recall@10: 1.000 (2/2)
  • retrieval p50: 214.646 ms, p99: 206.949 ms
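recall@10 here counts a question as a hit when any gold message appears in its top-10 retrieved list. A minimal sketch of that metric (the function name and data shapes are ours, not the benchmark's):

```python
# Sketch of recall@k as used above; hypothetical helper, not bench code.
def recall_at_k(retrieved, gold, k=10):
    """Fraction of questions whose top-k retrieved ids include a gold id.

    retrieved: list of ranked id lists, one per question.
    gold: list of gold id lists, one per question.
    """
    hits = sum(1 for r, g in zip(retrieved, gold) if set(r[:k]) & set(g))
    return hits / len(gold)
```

With 2 questions and both answered from the top 10, this yields 1.000, matching the row above.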

Notes

  • The reactive bench compares EvolutionDB's NOTIFY push against a polling loop at the chosen interval; push p99 should land well under 10 ms, while the polling worst case is bounded by the interval (1000 ms by default).
  • The LongMemEval row uses the built-in synthetic fixture by default. For the full public dataset, run bench/longmemeval/run.py --dataset path/to/longmemeval.json and supply an embedding model via the embed_fn parameter.
  • Cross-vendor comparison rows (Zep / Mem0 / langgraph-store-mongodb / Pinecone) are deferred to v2 — those backends ship as separate Docker images and the runner will sweep them once the official compatibility test harness (Task 220) lands.
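Wiring an embedding model into the full-dataset run might look like this. The `embed_fn` signature below (a batch of strings in, a batch of vectors out) is an assumption based on the note above, not a documented contract, and the toy embedding is a stand-in for a real model.

```python
# Hypothetical embed_fn shape -- signature assumed from the note above.
from typing import List

def embed_fn(texts: List[str]) -> List[List[float]]:
    """Toy batch embedder: one small vector per input text.

    Replace with a real embedding model for meaningful retrieval;
    this stand-in only demonstrates the assumed call shape.
    """
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]
```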