Part 5 — Push, not poll: durable subscriptions and CDC streaming

Series: Long-Term Memory in EvolutionDB — previous: Part 4, Temporal memory.

The standard way an agent finds out something happened in its memory is to ask. Every second, every five seconds, every minute. Anything new? Anything new? Anything new? Most of the time the answer is no, and the network round-trip plus the SQL execution plus the marshalling cost on both ends adds up to a steady tax with no benefit.

The product piece quoted a 2,900× speedup for push over polling. This article walks through where that number comes from: the in-memory LISTEN/NOTIFY primitive (Task 91), the durable subscription mailbox on top of it (Task 210), and the CDC streaming server that turns the WAL into an external event feed (Task 211).

The in-memory layer: LISTEN/NOTIFY

LISTEN/NOTIFY is the Postgres-shaped pub/sub primitive: a session calls LISTEN channel, another session calls NOTIFY channel, 'payload', and the first session receives the payload. EvolutionDB's implementation lives in evolution/db/Notify.c and adaptor/notify.c, split across the engine and the wire layer because the registry of listeners is per-connection.

There are three things worth knowing about how it's wired:

1. Delivery is commit-time, not statement-time. A NOTIFY inside a transaction does not deliver until the transaction commits. The payload is staged in the transaction's notify queue while the transaction is in progress; if the transaction rolls back, the notifications are dropped silently. This matches Postgres's semantics and means a listener never sees an event for a row that doesn't exist.

2. Per-transaction dedup. Two identical NOTIFY ch, 'p' calls in the same transaction produce one delivered event. This is by design — agents that re-run the same workflow step on retry shouldn't fan out to twice the listeners.

3. Output mutex per connection. Notifications are pushed onto the same TCP stream the connection is reading queries from, so the engine acquires a per-connection output mutex before writing. Otherwise an in-flight result row and a notification frame could interleave in the output buffer. (This is the bug that took the longest to diagnose during Task 91.)
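The first two behaviors — commit-time delivery and per-transaction dedup — can be modeled in a few lines. This is an illustrative Python sketch of the semantics only, not EvolutionDB's C implementation; all names here are invented for the example.

```python
class Transaction:
    """Models commit-time NOTIFY delivery with per-transaction dedup."""

    def __init__(self, registry):
        self.registry = registry   # channel -> list of listener queues
        self.staged = []           # (channel, payload) in call order
        self.seen = set()          # dedup key: identical NOTIFYs collapse

    def notify(self, channel, payload):
        key = (channel, payload)
        if key in self.seen:       # second identical NOTIFY is a no-op
            return
        self.seen.add(key)
        self.staged.append(key)

    def commit(self):
        # Deliver only now: a listener never sees an uncommitted event.
        for channel, payload in self.staged:
            for queue in self.registry.get(channel, []):
                queue.append(payload)
        self.staged.clear()

    def rollback(self):
        self.staged.clear()        # dropped silently, nothing delivered


registry = {"memory_updated": [[]]}   # one listener with an empty queue
tx = Transaction(registry)
tx.notify("memory_updated", "p")
tx.notify("memory_updated", "p")      # deduped within the transaction
assert registry["memory_updated"][0] == []       # nothing before commit
tx.commit()
assert registry["memory_updated"][0] == ["p"]    # delivered exactly once
```

A rolled-back transaction's staged notifications simply never leave `staged`, which is the "dropped silently" behavior above.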

The wire format on the PG protocol is the standard 'A' message — backend pid + channel name + payload — for compatibility with psql and every existing Postgres-aware client. On the EVO native protocol it's a single NOTIFY ch payload\n line, also frame-aligned with the output mutex.
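The 'A' (NotificationResponse) framing is simple enough to show end to end. The sketch below encodes and decodes the standard Postgres message layout — byte `'A'`, a self-inclusive int32 length, int32 backend pid, then NUL-terminated channel and payload strings; it is an illustration of the wire format, not EvolutionDB code.

```python
import struct

def encode_notification(pid: int, channel: str, payload: str) -> bytes:
    """Frame a Postgres NotificationResponse: 'A' + int32 length
    (counting itself) + int32 backend pid + channel NUL + payload NUL."""
    body = (struct.pack("!i", pid)
            + channel.encode() + b"\x00"
            + payload.encode() + b"\x00")
    return b"A" + struct.pack("!i", len(body) + 4) + body

def decode_notification(frame: bytes):
    assert frame[0:1] == b"A"
    (length,) = struct.unpack("!i", frame[1:5])
    (pid,) = struct.unpack("!i", frame[5:9])
    # Remaining bytes (up to the declared length) are two NUL-terminated strings.
    channel, payload, _tail = frame[9:1 + length].split(b"\x00")
    return pid, channel.decode(), payload.decode()
```

Because every frame is self-delimiting, interleaving a complete 'A' frame between result rows is safe — which is exactly why the per-connection output mutex must protect whole frames, not individual writes.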

The big drawback of in-memory LISTEN/NOTIFY is that it's exactly that — in-memory. A subscriber that disconnects between the publish and the next reconnect misses the message. This is fine for some workflows and catastrophic for others. Hence Task 210.

The durable layer: CREATE SUBSCRIPTION

A subscription is a named, persistent mailbox. The DDL is one statement:

```sql
CREATE SUBSCRIPTION agent_events FOR CHANNEL 'memory_updated';
```

That creates a backing table __sub_agent_events(seq, payload, ts) and registers the subscription in the catalog. From that moment, every NOTIFY 'memory_updated', '...' that commits appends one row to the backing table. The Subscription.h header summarises the contract:

Whenever a NOTIFY commits on the channel that subscription is bound to, the payload lands in the subscription's __sub_<name> backing table with an auto-incrementing seq. The consumer drains the mailbox with RESUME SUBSCRIPTION (returns rows in seq order) and acknowledges progress with ACK SUBSCRIPTION ... UPTO N. The auto-RECLAIM daemon prunes acked rows and TTL-expired ones.
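That contract — append on publish, drain in seq order, ack a watermark, reclaim behind it — can be sketched directly. This is a Python model of the semantics for illustration; the real mailbox is a table, not an in-process object.

```python
class DurableSubscription:
    """Illustrative model of the __sub_<name> mailbox contract."""

    def __init__(self):
        self.rows = []        # (seq, payload): the backing table
        self.next_seq = 1     # auto-incrementing seq
        self.last_acked = 0   # catalog row's watermark

    def publish(self, payload):
        """NOTIFY commit path: append one row per committed NOTIFY."""
        self.rows.append((self.next_seq, payload))
        self.next_seq += 1

    def resume(self):
        """RESUME SUBSCRIPTION: unacked rows, in seq order."""
        return [(s, p) for s, p in self.rows if s > self.last_acked]

    def ack(self, upto):
        """ACK SUBSCRIPTION ... UPTO N: watermark only moves forward."""
        self.last_acked = max(self.last_acked, upto)

    def reclaim(self):
        """Auto-RECLAIM: prune rows at or below the watermark."""
        self.rows = [(s, p) for s, p in self.rows if s > self.last_acked]


sub = DurableSubscription()
for p in ("a", "b", "c"):
    sub.publish(p)
assert sub.resume() == [(1, "a"), (2, "b"), (3, "c")]
sub.ack(2)
sub.reclaim()
assert sub.resume() == [(3, "c")]
```

Note that `resume()` is idempotent until an ACK moves the watermark: a consumer that crashes mid-drain re-reads the same rows on reconnect.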

Two DML verbs do the work:

```sql
RESUME SUBSCRIPTION agent_events;         -- drain unacked rows
ACK SUBSCRIPTION agent_events UPTO 1042;  -- mark seq <= 1042 done
```

The publish side is subscription_publish, called from the NOTIFY commit path:

```c
/* Called from the NOTIFY commit path after the in-memory broadcast.
 * For each durable subscription bound to `channel`, append a row to
 * the backing __sub_<name> table with an auto-incrementing seq. */
void subscription_publish(const char *channel, const char *payload);
```

The order of operations matters. The in-memory broadcast fires first, so live subscribers see the event with no extra latency. The durable append happens next, in the same transaction as the NOTIFY itself — which means the durable mailbox is exactly as consistent as the underlying data change. If the publishing transaction rolls back, the live notification is never delivered and the durable row is never written.

The mailbox is a regular table

This is the design choice that pays for itself.

__sub_agent_events is a normal user-style heap with a primary-key B+ tree on seq. RESUME is a SELECT with an open cursor. ACK is an UPDATE that bumps the last_acked_seq column on the catalog row. RECLAIM is the same auto-RECLAIM daemon that handles MVCC garbage — it now knows about subscription mailboxes and prunes rows whose seq is below the catalog row's last_acked_seq.
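To make the "it's just SQL over a table" point concrete, here is the same lifecycle expressed as plain SQL — run against SQLite purely for illustration. The table and column names follow the article; EvolutionDB's actual catalog layout differs, and the `catalog` table here is a stand-in for its catalog row.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE __sub_agent_events "
           "(seq INTEGER PRIMARY KEY, payload TEXT, ts TEXT)")
db.execute("CREATE TABLE catalog (name TEXT PRIMARY KEY, last_acked_seq INTEGER)")
db.execute("INSERT INTO catalog VALUES ('agent_events', 0)")

# Publish side: each committed NOTIFY is just an INSERT.
for p in ("a", "b", "c"):
    db.execute("INSERT INTO __sub_agent_events (payload, ts) "
               "VALUES (?, datetime('now'))", (p,))

# RESUME is a SELECT over unacked rows in seq order.
rows = db.execute(
    "SELECT seq, payload FROM __sub_agent_events "
    "WHERE seq > (SELECT last_acked_seq FROM catalog WHERE name='agent_events') "
    "ORDER BY seq").fetchall()
assert rows == [(1, "a"), (2, "b"), (3, "c")]

# ACK is an UPDATE on the catalog row; RECLAIM deletes acked rows.
db.execute("UPDATE catalog SET last_acked_seq = 2 WHERE name='agent_events'")
db.execute("DELETE FROM __sub_agent_events "
           "WHERE seq <= (SELECT last_acked_seq FROM catalog WHERE name='agent_events')")
assert db.execute("SELECT seq, payload FROM __sub_agent_events").fetchall() == [(3, "c")]
```

Nothing in that lifecycle needs machinery beyond what any table already has — which is why WAL, TDE, replication, and temporal queries come along for free.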

That means every property of regular tables applies to subscription mailboxes for free:

  • They participate in WAL, so a crash mid-publish doesn't lose acknowledged-but-not-yet-flushed events.
  • They participate in TDE, so the payloads are encrypted at rest if the database is.
  • They participate in replication, so a hot standby has the same mailbox state as the primary.
  • They participate in FOR SYSTEM_TIME AS OF, so you can ask "what events were in the mailbox at 3:42 PM yesterday" — useful if the consumer was buggy and you need to replay.

We didn't have to write any of those. The mailbox is just rows.

CDC streaming: when the consumer isn't a SQL client

There's one more notch up. LISTEN/NOTIFY and durable subscriptions both assume the consumer holds a SQL connection. For consumers that don't — Kafka bridges, observability pipelines, agent runtimes in languages without a SQL driver — Task 211 adds a TCP/JSON-lines CDC server.

The DDL is:

```sql
CREATE CDC STREAM agent_cdc FOR TABLE __mem_user_memory;
```

The handshake is one TCP connection per consumer:

```
CDC SUBSCRIBE agent_cdc [FROM LSN 1042]
```

…and from then on, every row change to the named table arrives as one JSON line:

```json
{"op":"I","table":"__mem_user_memory","pk":"alptekinkey42",
 "new":{...},"lsn":4812,"ts":"2026-05-04T15:42:13Z"}
```

op is I, U, or D. The lsn is the WAL log-sequence number, which the consumer pins so a reconnect with FROM LSN N resumes from exactly where it left off. The ts is server-side wall-clock time, useful for MAX_LAG calculations on the consumer side.
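The consumer-side bookkeeping is small: parse each JSON line, validate the op code, and pin the LSN so the next handshake resumes correctly. The class below is an illustrative sketch, not a shipped SDK; only the handshake string and field names come from the protocol described above.

```python
import json

class CDCConsumer:
    """Tracks the resume point for a CDC stream consumer."""

    def __init__(self, stream: str = "agent_cdc"):
        self.stream = stream
        self.last_lsn = 0   # pinned so reconnects don't replay events

    def subscribe_command(self) -> str:
        """Handshake line to send on (re)connect."""
        cmd = f"CDC SUBSCRIBE {self.stream}"
        if self.last_lsn:
            cmd += f" FROM LSN {self.last_lsn}"
        return cmd

    def handle_line(self, line: str) -> dict:
        """Parse one JSON line and advance the resume point."""
        event = json.loads(line)
        assert event["op"] in ("I", "U", "D")
        self.last_lsn = event["lsn"]
        return event


c = CDCConsumer()
assert c.subscribe_command() == "CDC SUBSCRIBE agent_cdc"
c.handle_line('{"op":"I","table":"t","pk":"k","new":{},'
              '"lsn":4812,"ts":"2026-05-04T15:42:13Z"}')
assert c.subscribe_command() == "CDC SUBSCRIBE agent_cdc FROM LSN 4812"
```

A real consumer would persist `last_lsn` before acting on the event (or after, depending on whether it prefers at-most-once or at-least-once delivery on crash).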

The CDC server is opt-in (default disabled, enabled with --cdc-port 9970) because not every deployment needs it and the WAL decoder costs a little memory per active subscriber. When it is enabled, the underlying mechanism is the same WAL we already write for crash recovery — there is no separate replication log, no tail-of-table trigger, no double-write.

What "2,900×" actually measures

The launch blog quotes a measured difference of ~0.4 ms for NOTIFY push delivery versus ~990 ms for polling at 1 s intervals. The benchmark for both lives in bench/reactive/:

  • The push case publishes a row, the subscriber receives the notification on the same connection, the test measures the publish-to-receive wall-clock latency. Median is around 200 µs; p99 is around 400 µs.
  • The poll case sets a 1 s polling interval, publishes a row at a random offset within the interval, and measures publish-to-detect wall-clock latency. The expected value is half the interval (≈ 500 ms), and we report p99 instead, which is approximately the full interval.
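The poll case's expected value is easy to check numerically. This is a hedged stand-in for the benchmark logic, not the bench/reactive/ harness itself: publishes land at uniform random offsets within a 1 s interval, and the poller only notices at the next tick, so detection latency is the remainder of the interval.

```python
import random
import statistics

def poll_latency_samples(interval_ms: float = 1000.0, n: int = 100_000,
                         seed: int = 7) -> list[float]:
    """Detection latency when a publish lands at a uniform offset
    within a polling interval: latency = interval - offset."""
    rng = random.Random(seed)
    return [interval_ms - rng.uniform(0, interval_ms) for _ in range(n)]

samples = poll_latency_samples()
mean = statistics.fmean(samples)
p99 = statistics.quantiles(samples, n=100)[98]   # 99th percentile
assert 480 < mean < 520   # expected value ~ half the interval
assert p99 > 970          # p99 ~ the full interval
```

This is why the article reports the poll case's p99 as roughly the full interval: a publish that lands just after a tick waits almost the entire second.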

The 2,900× ratio is the poll latency divided by the push latency; the rounded figures above (990 ms / 0.4 ms) give ≈ 2,475×, and the headline number presumably comes from the unrounded measurements. Either way, it's a wall-clock latency advantage, not a throughput claim. Throughput is dominated by what the consumer does with the event after receiving it, which is application-specific and not what the benchmark tries to measure.

The interesting takeaway isn't the headline number. It's that an agent which polls every second wakes up, on average, half a second late on every event. Over a long run, that's the difference between "feels like it's responding to me" and "feels like it's behind."

Tests

The push layer is exercised by:

  • tests/test_listen_notify.py (Task 91) — 20 cases, including per-tx dedup, savepoint scoping, output-mutex correctness under concurrent listeners.
  • tests/test_durable_subscription.py (Task 210) — RESUME ordering, ACK semantics, RECLAIM-prunes-acked, crash-during-publish recovery.
  • tests/test_cdc_stream.py (Task 211) — JSON-line framing, LSN resume, INSERT/UPDATE/DELETE op codes, multi-consumer fan-out.

What's next

Part 6 walks through the C SDK that ties all of this together — the opaque evo_conn_t, the EVO text protocol's surface in libevosql-memory, and how the same underlying calls are reached from Python ctypes, Go cgo, and Rust bindgen.

→ Part 6 — The C SDK and FFI (planned)