DML Profile — After Phase 6.3 Constraint Presence Cache¶
Follow-up to Phase 6.2. UPDATE was still dominated by apply_update_total
(99.6% of loop), and inside that 37% of loop_total was invisible
to the profile — ApplyUpdateToRow calls ReadCheckConstraintsWithNames
+ cat_list_constraints (FK) + cat_list_constraints (UNIQUE) once
per row. Each of those catalog scans returns 0 rows for a table with
no CHECK / FK / UNIQUE constraints, but still pays the prefix-scan
cost.
Phase 6.3 probes all three once per statement in
tapi_probe_constraints(), caches the flags on TableDesc, and
guards the three call sites in ApplyUpdateToRow.
Wall-clock (off mode, 3 consecutive runs)¶
| Op | Phase 6.2 | Phase 6.3 | Δ from 6.2 | vs pre-Phase-5 | PG |
|---|---|---|---|---|---|
| INSERT 10K | 1517 ms | 1535 ms | +1% | +3% | 1509 ms |
| DELETE 5K | 12 ms | 12 ms | — | -88% | 2 ms |
| UPDATE 2K | 33 ms | 24 ms | -27% | -67% | 5 ms |
| TOTAL | 1562 ms | 1571 ms | +0.6% | -5.7% | 1516 ms |
Runs: INS=1516/1549/1541, DEL=12/12/12, UPD=24/24/24. Stable.
INSERT is essentially unchanged (Phase 6.3 doesn't touch the INSERT constraint paths — see "Deferred" below). DELETE is unchanged (no CHECK/FK/UNIQUE helpers on the DELETE path). The whole Phase 6.3 win is in UPDATE.
Profile mode — UPDATE 2000 rows (4-run avg)¶
Before (Phase 6.2):
loop_total 29.39 ms avg 14.70 us/row
apply_update_total 29.29 ms 99.6%
bt2_update 14.41 ms 49.0%
[unaccounted-inside] ~10.75 ms ~37% ← hidden catalog scans
vmap_clear x 2 1.30 ms 4.4%
heap_insert 1.16 ms 3.9%
...
After (Phase 6.3):
loop_total 20.25 ms avg 10.12 us/row (-9.1 ms)
apply_update_total 20.16 ms 99.6%
bt2_update 14.05 ms 69.4% ← now dominant
vmap_clear x 2 0.95 ms 4.7%
heap_insert 0.93 ms 4.6%
heap_set_xmax 0.53 ms 2.6%
lock_row_acquire 0.46 ms 2.3%
...
The 9.1 ms saved corresponds directly to the three catalog scans per row (2000 rows × 3 scans × ~1.5 µs each ≈ 9 ms) that 6.3 now skips.
bt2_update is now 69.4% of loop_total. It's the PK B+ tree
entry update required by non-destructive MVCC UPDATE (soft-delete old
+ heap insert new + bt2_update to point the PK entry at the new rid).
This is structurally required and can't be eliminated without reworking
the concurrency model.
DELETE — no change¶
DELETE's loop_total stays at ~10.8 ms, profile unchanged:
[unaccounted] 4.54 ms 42.8%
heap_set_xmax 1.26 ms 11.9%
fk_check 1.20 ms 11.3%
vmap_clear 1.19 ms 11.2%
cg_check_write 0.61 ms 5.8%
tapi_heap_read 0.55 ms 5.2%
lock_row_acquire 0.51 ms 4.8%
tup_extract 0.50 ms 4.7%
The DELETE loop doesn't touch CHECK/UNIQUE/FK-local constraints — it
only does fk_check (referencing FK lookup for cascade/restrict),
which is already cheap at 0.24 µs/row.
Implementation notes¶
Why not tapi_resolve?¶
Original Phase 6.3 plan was to populate the flags inside tapi_resolve()
alongside has_triggers / has_secondary_indexes. That attempt
crashed the server because:
- Adding fields to
TableDescgrew the struct. Without amake cleana partial rebuild left some .o files linking against the old struct layout while others used the new one — classic ODR violation causing stack-frame mismatches between callers and callees. Clean rebuild resolved it. - Adding a
cat_list_constraintscall to everytapi_resolveaffected hundreds of call sites (SELECT, DDL, DML helpers) — too broad for a performance-only change.
Phase 6.3 uses a separate tapi_probe_constraints(TableDesc *td)
helper called explicitly from UpdateProcess after the initial
resolve. Narrow blast radius: only the UPDATE hot path pays the
probe cost, and only once per statement.
Column-level inline UNIQUE¶
VARCHAR(50) UNIQUE declared inline does not create a catalog 'U'
constraint — it lives as ColumnDesc.is_unique. The probe only
scans the catalog, so UpdateProcess manually OR-s in the
inline-unique bit from the column descriptors before entering the
Phase 2 loop:
c
tapi_probe_constraints(&td);
for (int ci = 0; ci < allNCols; ci++) {
if (allCols[ci].is_unique) { td.has_unique_constraints = 1; break; }
}
Deferred: INSERT constraint cache¶
INSERT's validate_types → validate_not_null → validate_unique →
validate_check → validate_foreign_keys chain has the same per-row
catalog scan pattern. An earlier attempt to gate those at the
InsertProcess level broke validate_unique (which also consults
column-level inline UNIQUE) and was reverted.
A safer approach is to push the presence check into each
validate_* function after their own tapi_resolve — same as
ApplyUpdateToRow's approach. Deferred because:
- INSERT 10K in the benchmark is already within 0.3% of PG (1535 vs 1509 ms). No gap to close.
- The validators have divergent code paths (inline is_unique, per-column NOT NULL flags, etc.) that need careful handling — higher risk than UPDATE's cleaner call sites.
Regression¶
All 13 test suites green:
| Suite | Result |
|---|---|
| test_delete_where.py | 15/15 |
| test_update_where.py | 14/14 |
| test_foreign_key.py | 22/22 |
| test_check.py | 22/22 ← critical (has_check_constraints path) |
| test_unique.py | 30/30 ← critical (has_unique_constraints path) |
| test_mvcc.py | 10/10 |
| test_pk_range_fastpath.py | 19/19 |
| test_triggers.py | 17/17 |
| test_index.py | 24/24 |
| test_multi_delete.py | 10/10 |
| test_multi_update.py | 10/10 |
| test_reclaim_mvcc.py | 8/8 |
| test_composite_constraints.py | 29/29 |
test_check, test_unique, test_foreign_key, and
test_composite_constraints prove the guards correctly switch to
the full validation path when has_check/fk/unique=1.
Remaining opportunities¶
UPDATE at 24 ms is now 79% bt2_update (plus the stack below it). The remaining paths to push further:
- bt2_update leaf_pno cache: carry the PK tree leaf page number
from the fast-path cursor walk, skip
find_leafinsidebt2_update. Estimated save: ~3 µs/row → UPDATE 24 → ~18 ms. Medium refactor. - HOT in-place UPDATE for fast path: no secondary indexes → no HOT eligibility check needed, insert new tuple on same page. Complex MVCC interaction. Deferred.
- Inner profile: the ~5.4 µs/row
[unaccounted-inside]insideapply_update_totalafter Phase 6.3 is smaller than before but still not measured.UpdateBuildBinaryRecordandUpdateExtractAllFieldsare the main suspects. Optional instrumentation pass if further gains are wanted.
Even without those, UPDATE at 24 ms is 4.8× PG — from 15× pre- Phase-5. TOTAL at 1571 ms is 3.6% above PG. Phase 6 diminishing returns curve has bent.