Skip to content

DML Inner-Loop Profile — Baseline

Phase 6.0 measurement. No code optimizations — this file captures the real per-helper cost so Phase 6.1 is driven by data instead of guesses.

Setup

  • Platform: macOS (Darwin 25.3.0), local server
  • Binary: feature/dml-profiling branch, debug build (-O0 -g)
  • Server: EVOSQL_PASSWORD=admin EVOSQL_DML_PROFILE=1 ./adaptor/evosql-server
  • Profile log: /tmp/evosql_dml_profile.log
  • Scenario per iteration (6 total, first 2 discarded as warm-up): sql CREATE TABLE bench_items_N (id INTEGER PRIMARY KEY, val VARCHAR(50)); INSERT INTO bench_items_N VALUES (1,'v1'), ..., (10000,'v10000'); -- 10000 rows DELETE FROM bench_items_N WHERE id <= 5000; -- 5000 rows UPDATE bench_items_N SET val = 'u' WHERE id <= 7000; -- 2000 rows DROP TABLE bench_items_N;
  • No triggers, no secondary indexes, no foreign keys referencing the table. Every trigger_*, sec_idx_*, fk_check hit below is a no-op catalog scan that returns 0 — pure overhead.

Profile-mode overhead

clock_gettime(CLOCK_MONOTONIC) wrappers add ~28 ms to DELETE 5K and ~15 ms to UPDATE 2K vs the off-mode wall-clock. Relative percentages below are the signal; subtract ~5 µs/row from absolute numbers to approximate real off-mode cost.

Wall-clock, last 4 runs, on: INS=1949 ms DEL=113 ms UPD=81 ms (profile mode) INS=~1570 DEL=~85 UPD=~65 (off mode, Phase 5 baseline)

DELETE — 5000 rows, 4-run average

loop_total = 109.16 ms avg_us = 21.83 per row

Helper calls total_ms avg_us %loop Notes
bt2_search 5000 27.75 5.55 25.4% hot — cursor already had rid
trigger_before 5000 25.28 5.06 23.2% no triggers exist — 100% catalog scan waste
trigger_after 5000 24.92 4.98 22.8% no triggers exist — 100% catalog scan waste
sec_idx_delete 5000 14.29 2.86 13.1% no sec indexes exist — 100% catalog scan waste
[unaccounted] - 6.19 - 5.7% loop control, frees, memset, returning check
vmap_clear 5000 2.42 0.48 2.2%
heap_set_xmax 5000 2.35 0.47 2.2% in-place tuple mark — surprisingly cheap
fk_check 5000 1.79 0.36 1.6% no FKs referencing — fast early return
lock_row_acquire 5000 1.45 0.29 1.3% Phase 1 overestimate — hash insert is cheap
tapi_heap_read 5000 0.90 0.18 0.8% Phase 1 overestimate — heap read is cheap
cg_check_write 5000 0.79 0.16 0.7%
tup_extract 5000 0.68 0.14 0.6%
mvcc_ensure_xid 5000 0.17 0.04 0.2% idempotent NOP after first call
conc_mod_log 5000 0.17 0.03 0.2% no concurrent index build active

Top 4 = 84.5% of loop_total

The top 4 helpers consume 84.5% of the loop, and three of them (trigger_before, trigger_after, sec_idx_delete = 59.1%) are pure no-op catalog lookups for features the table does not have.

UPDATE — 2000 rows, 4-run average

loop_total = 74.83 ms avg_us = 37.42 per row (apply_update_total = 99.8% of loop)

Helper calls total_ms avg_us %loop Notes
apply_update_total 2000 74.66 37.33 99.8% the work is all inside ApplyUpdateToRow
bt2_update 2000 16.31 8.16 21.8% PK tree points to new heap rid (unavoidable)
bt2_search 2000 15.72 7.86 21.0% redundant — cursor already had rid
trigger_before 2000 10.67 5.34 14.3% no triggers — 100% waste
trigger_after 2000 10.31 5.15 13.8% no triggers — 100% waste
vmap_clear 4000 1.57 0.39 2.1% called twice per row (old + new page)
heap_insert 2000 1.35 0.68 1.8% non-destructive UPDATE writes new tuple
heap_set_xmax 2000 0.92 0.46 1.2% soft-delete old version
lock_row_acquire 2000 0.41 0.21 0.6%
tapi_heap_read 2000 0.36 0.18 0.5%
cg_check_write 2000 0.33 0.16 0.4%
mvcc_ensure_xid 2000 0.06 0.03 0.1%

Top 5 = 70.7% of loop_total

Same pattern as DELETE: two no-op trigger scans (28.1%) + a redundant bt2_search (21.0%) dominate.

Estimate vs reality

Phase 1 code-reading estimates (from the pre-work exploration) vs the numbers measured here:

Helper Phase 1 guess (µs/row) Measured (µs/row) Verdict
bt2_search 2-3 5.55 underestimated — cursor save wins more than expected
tapi_heap_read 5-8 0.18 32-44× overestimate — page is hot in buffer pool
lock_row_acquire 3-5 0.29 10-17× overestimate — hash insert is fast
fk_check (no FK) 10-50 0.36 28-140× overestimate — early return already works
cg_check_write 1-2 0.16 6-12× overestimate
heap_set_xmax 3-5 0.47 6-10× overestimate — in-place, page is hot
vmap_clear 0.5-1 0.48 accurate
trigger_* (no trigger) ~0 4.98 wildly underestimated — no per-table cache, full scan every row
sec_idx_delete (no idx) ~0 2.86 wildly underestimated — same full scan waste

Lesson: the intuition "no triggers / no FKs / no indexes → free" was wrong. The early-return paths inside those helpers still pay the full catalog B+ tree scan to discover that there's nothing to do. Phase 6.1 should focus on caching presence flags on TableDesc, not on skipping locks or elimination of hot-path helpers that are already cheap.

Phase 6.1 recommendation (data-driven)

Rank by (avg_us × frequency) across DELETE + UPDATE:

Rank Helper Fix Expected save
1 trigger_before/after (no-trigger) Cache has_triggers flag on TableDesc at resolve time; skip evo_fire_triggers when 0 DELETE ~10 µs/row = 50 ms total
UPDATE ~10 µs/row = 20 ms total
2 bt2_search (fast-path redundancy) Save (key, rid) pairs during PK-range cursor walk; drop bt2_search from inner loop DELETE ~5.5 µs/row = 28 ms total
UPDATE ~7.9 µs/row = 16 ms total
3 sec_idx_delete (no-index) Cache has_secondary_indexes flag; skip delete_secondary_indexes when 0 DELETE ~2.9 µs/row = 14 ms total
4 vmap_clear per-page batching Sort match list by page_no, clear vmap once per page DELETE ~0.3 µs/row = 1.5 ms total
UPDATE ~0.2 µs/row = 0.4 ms total
5 heap_set_xmax per-page batching Group same-page soft-deletes → single pgm_read_page / pgm_write_page DELETE ~0.3 µs/row = 1.5 ms

Conservative total for #1-#3 alone (skipping #4/#5 as noise):

  • DELETE 5K: 85 ms → 85 − 50 − 28 − 14 = ~-7 ms … clamp to ~5 ms
  • UPDATE 2K: 65 ms → 65 − 20 − 16 = ~29 ms

1–#3 are all low-risk, schema-cache-only changes — no batch

lock, no deferred undo, no transaction-semantics risk. A single TableMetaFlags struct populated during tapi_resolve() + a (key, rid) pair in the fast-path collection list would knock DELETE under the <10 ms target and bring UPDATE within 5× of PG.

Trigger caching alone is the biggest-ROI single change and should be Phase 6.1. bt2_search elimination and secondary-index flag are Phase 6.2 / 6.3.

Files referenced

  • Delete.c inner loop: evolution/db/Delete.c:556-711
  • Update.c inner loop: evolution/db/Update.c:1414-1458
  • ApplyUpdateToRow: evolution/db/Update.c:331-1019
  • Trigger dispatch: evo_fire_triggers() — full catalog scan every call (see cat_list_triggers_for_table inside Trigger.c)
  • Secondary index loader: idx_load_secondary() — catalog B+ tree scan every call
  • PK range fast path (Phase 5): evolution/db/Delete.c:408-480, evolution/db/Update.c:1280-1402