Part 4 — Temporal memory: `FOR SYSTEM_TIME AS OF` on top of MVCC¶

Series: Long-Term Memory in EvolutionDB — previous: Part 3, Vectors and HNSW.

The hardest invariant for an agent-memory system to hold is what did I believe at this point in time? Agents act on the memory they have when they decide to act. If by the time you debug the decision the memory has changed, the audit trail is broken. The standard fix is to write append-only audit logs alongside the live tables — and immediately discover that joining the audit log back to the live table at a given timestamp is its own software project.

EvolutionDB takes a different route, and it costs us almost nothing because we already had the parts. Every row in every table is already versioned by the storage engine for transactional reasons. We just had to expose that versioning at the SQL surface.

What `FOR SYSTEM_TIME` looks like¶

The grammar is borrowed from the SQL:2011 temporal extension:

sql SELECT * FROM user_memory FOR SYSTEM_TIME AS OF TRANSACTION 4217 WHERE mem_namespace = 'alptekin';

There's also a timestamp variant — AS OF TIMESTAMP '2026-03-15 12:00' — which the engine resolves to the highest committed XID at or before that wall-clock time. The XID variant is the primitive; the timestamp variant is sugar.

Either way, the result is the rows as they were when transaction 4217 committed. New rows that landed after are invisible. Rows that existed at the time but were later deleted are still present. Rows that have been updated show their old values.

How it works — and what's actually new¶

There is genuinely nothing new on the storage side. The storage engine has tracked xmin and xmax on every row since Task 37 (the MVCC roll-out), originally for snapshot isolation: when a transaction runs at READ COMMITTED or REPEATABLE READ, the visibility predicate checks the row's xmin/xmax against the active snapshot and decides whether to show it.

AS OF is the same machinery with a synthetic snapshot. The relevant function in mvcc.c is six lines of substance:

c void mvcc_snapshot_at_xid(uint32_t at_xid, Snapshot *out) { memset(out, 0, sizeof(Snapshot)); if (at_xid == 0) { mvcc_snapshot_take(out); return; } out->xmin = at_xid + 1; /* nothing in-progress in the past view */ out->xmax = at_xid + 1; /* anything >= xmax is "future" → invisible */ out->nactive = 0; out->my_xid = 0; out->snapshot_csn = clog_get_csn(at_xid); }

The function builds a fake snapshot positioned right after at_xid committed. The xmin = xmax = at_xid + 1 setup is the trick: with no active transactions in the past view (nactive = 0), the visibility check reduces to "was this row's xmin ≤ at_xid AND was its xmax either zero or > at_xid". That's exactly what was visible right after at_xid committed, by definition.

The engine plumbs this through the parser via a tiny side-channel:

c void SetAsOfXid(uint32_t at_xid) { if (g_qctx) g_qctx->asof_xid = at_xid; }

The grammar action calls SetAsOfXid while parsing the FOR SYSTEM_TIME clause; query_executor reads g_qctx->asof_xid when constructing the snapshot for the SELECT, swaps in mvcc_snapshot_at_xid instead of mvcc_snapshot_take, runs the query normally, then resets the field so the next statement starts fresh.

There is no additional disk read, no separate history table, no shadow join. The rows the visibility predicate filters out for the live query are the same ones it filters in for the temporal query. The cost is one branch in the visibility check.

Why CSN matters¶

Earlier versions of EvolutionDB's MVCC visibility check walked an active transaction list every time. With a few thousand TPS that list got long, the visibility check turned into a hotspot, and the sequential scan it dominated would creep up to dozens of microseconds per row.

Task 37's CSN (Commit Sequence Number) layer fixed that. Each commit gets a monotonically-increasing CSN; the snapshot records the highest CSN at the moment it was taken; visibility for committed transactions becomes one CLOG lookup and one comparison:

"If the row's xmin's CSN ≤ snapshot.csn, it's visible without scanning the active list."

The active list is only consulted for transactions that aren't yet known to the CSN cache — typically the few hundred that committed in the last second. Old history is checked in O(1).

For temporal queries, this is what makes AS OF cheap. The synthetic snapshot at XID at_xid carries the CSN of at_xid's commit:

c out->snapshot_csn = clog_get_csn(at_xid);

…and from there every row visibility check is O(1). A FOR SYSTEM_TIME AS OF TRANSACTION N over a 100k-row memory store runs at the same speed as the equivalent live query. We've measured this in the temporal benchmark suite (bench/temporal/) and it holds within a few percent.

Retention is the real cost¶

The temporal feature is free at query time, but it's not free at storage time. Every UPDATE leaves an old version on disk; every DELETE turns a row into a tombstone but doesn't reclaim the bytes. If nothing ever cleans those up, the database fills the disk.

The engine has two mechanisms working against this:

1. Auto-RECLAIM (Task 27 + GAP-7) walks tables in the background and removes versions that no live snapshot can possibly need. The "cannot possibly need" predicate is xmin_horizon — the lowest XID held open by any active transaction. Anything older and retired and not pinned by SET SYSTEM_TIME_RETENTION is fair game.

2. SET SYSTEM_TIME_RETENTION = '30 days' (Task 209) explicitly holds versions back from RECLAIM for the configured window. The default is seven days; the operator knob is the env var EVOSQL_SYSTEM_TIME_RETENTION_DAYS plus the SQL setter. Beyond the window, RECLAIM treats the version as collectible and AS OF queries returning a timestamp older than the window will return an error (02000 — no_data, retention window expired) rather than silently returning a partial truth.

The retention knob is the operator's lever. Tighten it for disk budgets, loosen it for compliance audits, set it explicitly per memory store if some classes of memory matter more than others.

What this gets you in practice¶

Three things, in increasing order of how much they matter to me.

Audit, the boring one. "What did the agent know about user X at 3:42 PM yesterday when it triggered the refund?" becomes a single query, no separate audit infrastructure required:

sql SELECT mem_key, mem_value FROM user_memory FOR SYSTEM_TIME AS OF TIMESTAMP '2026-05-04 15:42:00' WHERE mem_namespace = 'user-x';

Diff, the interesting one. Comparing memory between two points in time is a self-join across two AS OF snapshots:

sql SELECT n.mem_key, n.mem_value AS new_value, o.mem_value AS old_value FROM user_memory FOR SYSTEM_TIME AS OF TRANSACTION 4217 o FULL JOIN user_memory FOR SYSTEM_TIME AS OF TRANSACTION 4612 n ON n.mem_key = o.mem_key WHERE n.mem_value IS DISTINCT FROM o.mem_value;

That returns every fact that changed (or was added or removed) between those two transactions. It's a memory diff. We use it during testing — point-in-time inspection of how the agent's beliefs evolved across a multi-step run is invaluable when something goes wrong four steps in.

Replay, the one I care most about. A failed agent run can be re-driven against the memory state it actually saw, rather than the current state. You take the timestamp of the failure, run the agent in dry-run mode with SET TRANSACTION SNAPSHOT pointing at that timestamp, and the model sees exactly what it saw at the time — including the absence of facts that arrived later. This is the closest thing to a debugger an agent can have.

The internal demo for this lives in examples/agent_replay/. It re-drives a recorded ReAct loop against a MEMORY STORE at three different timestamps and shows how the model's plan diverges based on what it knew when. It's also a depressingly effective way to show why the standard "log everything to a file" approach is insufficient.

Limits we know about¶

The temporal layer is built on top of MVCC, so its limits are MVCC's limits.

No AS OF FUTURE. XIDs in the future haven't happened yet. The parser rejects them at compile time.

The retention window is the floor. A query against a timestamp older than the configured retention returns 02000 rather than a partial result. We chose the explicit error over silent truncation because agents that act on truncated memory are worse than agents that fail loudly.

Replication is best-effort for AS OF. Replicas catch up asynchronously by default; an AS OF query on a lagging replica may not yet have the version you're asking for. The engine returns 08006 — connection_failure_in_doubt until the replay catches up to your requested XID. Synchronous-commit setups don't see this, but they trade latency for the guarantee.

TDE doesn't change the story. Encrypted-at-rest pages are decrypted into the buffer pool transparently, and the visibility predicate operates on the decrypted in-memory view. AS OF works on a TDE-enabled database with no changes to the SQL or the operator experience.

What's next¶

Part 5 turns to push: how LISTEN/NOTIFY was extended into a durable subscription queue (Task 210), how the CDC streaming server (Task 211) shapes a row-level change feed for downstream consumers, and what the __sub_<name> mailbox looks like under the hood.

→ Part 5 — Push, not poll (planned)

Part 4 — Temporal memory: FOR SYSTEM_TIME AS OF on top of MVCC¶

What FOR SYSTEM_TIME looks like¶