Skip to content

DML Profile — After Phase 6.3 Constraint Presence Cache

Follow-up to Phase 6.2. UPDATE was still dominated by apply_update_total (99.6% of loop), and inside that 37% of loop_total was invisible to the profile — ApplyUpdateToRow calls ReadCheckConstraintsWithNames + cat_list_constraints (FK) + cat_list_constraints (UNIQUE) once per row. Each of those catalog scans returns 0 rows for a table with no CHECK / FK / UNIQUE constraints, but still pays the prefix-scan cost.

Phase 6.3 probes all three once per statement in tapi_probe_constraints(), caches the flags on TableDesc, and guards the three call sites in ApplyUpdateToRow.

Wall-clock (off mode, 3 consecutive runs)

Op Phase 6.2 Phase 6.3 Δ from 6.2 vs pre-Phase-5 PG
INSERT 10K 1517 ms 1535 ms +1% +3% 1509 ms
DELETE 5K 12 ms 12 ms -88% 2 ms
UPDATE 2K 33 ms 24 ms -27% -67% 5 ms
TOTAL 1562 ms 1571 ms +0.6% -5.7% 1516 ms

Runs: INS=1516/1549/1541, DEL=12/12/12, UPD=24/24/24. Stable.

INSERT is essentially unchanged (Phase 6.3 doesn't touch the INSERT constraint paths — see "Deferred" below). DELETE is unchanged (no CHECK/FK/UNIQUE helpers on the DELETE path). The whole Phase 6.3 win is in UPDATE.

Profile mode — UPDATE 2000 rows (4-run avg)

Before (Phase 6.2): loop_total 29.39 ms avg 14.70 us/row apply_update_total 29.29 ms 99.6% bt2_update 14.41 ms 49.0% [unaccounted-inside] ~10.75 ms ~37% ← hidden catalog scans vmap_clear x 2 1.30 ms 4.4% heap_insert 1.16 ms 3.9% ...

After (Phase 6.3): loop_total 20.25 ms avg 10.12 us/row (-9.1 ms) apply_update_total 20.16 ms 99.6% bt2_update 14.05 ms 69.4% ← now dominant vmap_clear x 2 0.95 ms 4.7% heap_insert 0.93 ms 4.6% heap_set_xmax 0.53 ms 2.6% lock_row_acquire 0.46 ms 2.3% ...

The 9.1 ms saved corresponds directly to the three catalog scans per row (2000 rows × 3 scans × ~1.5 µs each ≈ 9 ms) that 6.3 now skips.

bt2_update is now 69.4% of loop_total. It's the PK B+ tree entry update required by non-destructive MVCC UPDATE (soft-delete old + heap insert new + bt2_update to point the PK entry at the new rid). This is structurally required and can't be eliminated without reworking the concurrency model.

DELETE — no change

DELETE's loop_total stays at ~10.8 ms, profile unchanged:

[unaccounted] 4.54 ms 42.8% heap_set_xmax 1.26 ms 11.9% fk_check 1.20 ms 11.3% vmap_clear 1.19 ms 11.2% cg_check_write 0.61 ms 5.8% tapi_heap_read 0.55 ms 5.2% lock_row_acquire 0.51 ms 4.8% tup_extract 0.50 ms 4.7%

The DELETE loop doesn't touch CHECK/UNIQUE/FK-local constraints — it only does fk_check (referencing FK lookup for cascade/restrict), which is already cheap at 0.24 µs/row.

Implementation notes

Why not tapi_resolve?

Original Phase 6.3 plan was to populate the flags inside tapi_resolve() alongside has_triggers / has_secondary_indexes. That attempt crashed the server because:

  1. Adding fields to TableDesc grew the struct. Without a make clean a partial rebuild left some .o files linking against the old struct layout while others used the new one — classic ODR violation causing stack-frame mismatches between callers and callees. Clean rebuild resolved it.
  2. Adding a cat_list_constraints call to every tapi_resolve affected hundreds of call sites (SELECT, DDL, DML helpers) — too broad for a performance-only change.

Phase 6.3 uses a separate tapi_probe_constraints(TableDesc *td) helper called explicitly from UpdateProcess after the initial resolve. Narrow blast radius: only the UPDATE hot path pays the probe cost, and only once per statement.

Column-level inline UNIQUE

VARCHAR(50) UNIQUE declared inline does not create a catalog 'U' constraint — it lives as ColumnDesc.is_unique. The probe only scans the catalog, so UpdateProcess manually OR-s in the inline-unique bit from the column descriptors before entering the Phase 2 loop:

c tapi_probe_constraints(&td); for (int ci = 0; ci < allNCols; ci++) { if (allCols[ci].is_unique) { td.has_unique_constraints = 1; break; } }

Deferred: INSERT constraint cache

INSERT's validate_types → validate_not_null → validate_unique → validate_check → validate_foreign_keys chain has the same per-row catalog scan pattern. An earlier attempt to gate those at the InsertProcess level broke validate_unique (which also consults column-level inline UNIQUE) and was reverted.

A safer approach is to push the presence check into each validate_* function after their own tapi_resolve — same as ApplyUpdateToRow's approach. Deferred because:

  1. INSERT 10K in the benchmark is already within 0.3% of PG (1535 vs 1509 ms). No gap to close.
  2. The validators have divergent code paths (inline is_unique, per-column NOT NULL flags, etc.) that need careful handling — higher risk than UPDATE's cleaner call sites.

Regression

All 13 test suites green:

Suite Result
test_delete_where.py 15/15
test_update_where.py 14/14
test_foreign_key.py 22/22
test_check.py 22/22 ← critical (has_check_constraints path)
test_unique.py 30/30 ← critical (has_unique_constraints path)
test_mvcc.py 10/10
test_pk_range_fastpath.py 19/19
test_triggers.py 17/17
test_index.py 24/24
test_multi_delete.py 10/10
test_multi_update.py 10/10
test_reclaim_mvcc.py 8/8
test_composite_constraints.py 29/29

test_check, test_unique, test_foreign_key, and test_composite_constraints prove the guards correctly switch to the full validation path when has_check/fk/unique=1.

Remaining opportunities

UPDATE at 24 ms is now 79% bt2_update (plus the stack below it). The remaining paths to push further:

  • bt2_update leaf_pno cache: carry the PK tree leaf page number from the fast-path cursor walk, skip find_leaf inside bt2_update. Estimated save: ~3 µs/row → UPDATE 24 → ~18 ms. Medium refactor.
  • HOT in-place UPDATE for fast path: no secondary indexes → no HOT eligibility check needed, insert new tuple on same page. Complex MVCC interaction. Deferred.
  • Inner profile: the ~5.4 µs/row [unaccounted-inside] inside apply_update_total after Phase 6.3 is smaller than before but still not measured. UpdateBuildBinaryRecord and UpdateExtractAllFields are the main suspects. Optional instrumentation pass if further gains are wanted.

Even without those, UPDATE at 24 ms is 4.8× PG — from 15× pre- Phase-5. TOTAL at 1571 ms is 3.6% above PG. Phase 6 diminishing returns curve has bent.