From 04f84a827dd7f6c2b361753c1c0630ce2f165d41 Mon Sep 17 00:00:00 2001
From: claude-noether
Date: Wed, 6 May 2026 05:55:11 +0000
Subject: [PATCH] iter7 Phase 1: lock A+B+C (msync verify + slot-leak fix + cap_pool harness)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Operator chose A+B+C — closes all three internal carry items from iter5/iter6 in one iteration:
- A: msync pixel-correctness verification (iter5 sonnet C3)
- B: slot-leak error recovery (iter6 internal carry)
- C: probe-pattern test harness for cap_pool race (iter5 sonnet C4 / iter6 candidate A formal anchor)

Phase 1 success criteria locked per-track. Phase 5 sonnet review mandatory before commit per CLAUDE.md user-global rule.

Execution order: B (smallest, additive) -> C (synthetic test, no driver change) -> A (verification — runs against iter7-end driver including any B/C changes).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 phase0_findings_iter7.md | 53 ++++++++++++++++++++++++++--------------
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/phase0_findings_iter7.md b/phase0_findings_iter7.md
index 3db2d04..8af072a 100644
--- a/phase0_findings_iter7.md
+++ b/phase0_findings_iter7.md
@@ -138,9 +138,11 @@ Same as iter6. Plus for new candidates:
 - For E (perf): `pidstat -u`, `/sys/class/devfreq/fde60000.gpu`, mpv stats overlay, Firefox `about:processes`.
 - For F (DMABUF): kernel docs `Documentation/userspace-api/media/v4l/buffer.rst`, hantro driver source `drivers/staging/media/hantro/`.
 
-## In-scope (LOCKING DEFERRED — Phase 1 user input)
+## In-scope (LOCKED 2026-05-06 for iteration 7) — A + B + C
 
-To be locked at Phase 1 from candidates A..F above. G is out-of-campaign-scope. H, I are separate top-level campaign decisions, not iter7 candidates.
+Operator locked **A + B + C**: msync pixel-correctness verification, slot-leak error recovery, and cap_pool-race synthetic test harness. Closes all three iter5/iter6 internal carry items in one iteration.
+
+D (upstreaming), E (perf binding cell), F (V4L2_MEMORY_DMABUF) deferred to iter8+. G (WiFi-IRQ frame drops) remains out-of-campaign-scope. H, I are separate top-level campaigns.
 
 ## Out-of-scope (LOCKED 2026-05-06 for iteration 7)
 
@@ -153,25 +155,38 @@ To be locked at Phase 1 from candidates A..F above. G is out-of-campaign-scope.
 - New target hardware (fresnel, ampere) — separate campaigns (H, I above).
 - WiFi-IRQ frame drops — system-level, not libva-multiplanar.
 
-## Phase 1 success criterion (will lock after user picks candidate)
+## Phase 1 success criterion (LOCKED 2026-05-06 for iteration 7)
 
-Pre-lock template:
-- For candidate A: "100-frame `vaapi-copy` produces frame hashes matching either FFmpeg SW baseline (preferred) or iter1 baseline (if msync-removal causes any divergence). If divergence, msync restored and verified."
-- For candidate B: "Synthetic fault-injection (REINIT returns -EBUSY after N frames) demonstrates pool starvation pre-fix; post-fix demonstrates `request_pool_force_release` reclaims the slot and decode resumes."
-- For candidate C: "Synthetic test program issues `vaCreateSurfaces(small)` then `vaCreateSurfaces(big)` then decodes bbb's first I-frame; driver stderr has zero REQBUFS-EBUSY events; output frame sha matches FFmpeg SW reference for that I-frame."
-- For candidate D: "Mozilla Bugzilla bug filed with combined 160-line patch attached, references bug 1833354/1965646. Bootlin patch series prepared as a clean iter1-iter6 sequence on a separate branch, ready to send (no PR until operator OK)."
-- For candidate E: "Anchored perf table for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW, SW baseline} across drop count + CPU% + frame timing + GPU freq on bbb_1080p30. Reproducible from documented script."
-- For candidate F: "vaapi-copy + vaapi --vo=null still produce real frames with V4L2_MEMORY_DMABUF-backed OUTPUT buffers; race window architecturally closed."
+> All three sub-tracks must independently pass on the iter7-end driver build:
+>
+> **A — msync pixel-correctness verification**
+> - 100-frame `mpv --hwdec=vaapi-copy --o=output_%04d.yuv` against `bbb_1080p30_h264.mp4`.
+> - Frame-by-frame sha256 of the captured YUV planes compared against FFmpeg SW decode reference (`ffmpeg -i bbb -frames:v 100 -f rawvideo -pix_fmt nv12 -`).
+> - **Pass:** all 100 frames match SW reference byte-for-byte (or visually-identical with documented bit-precision delta if the kernel's NV12 packing differs trivially from FFmpeg's). Formally closes iter5 sonnet C3.
+> - **Fail action:** restore `msync(MS_SYNC | MS_INVALIDATE)` in the surface DQBUF path; re-run; verify match. Document either way.
+>
+> **B — Slot-leak error recovery**
+> - `request_pool_force_release(pool, slot_index)` added to request_pool.{c,h}; REINITs the slot's fd and clears `busy=true`.
+> - Called from `RequestSyncSurface` error paths after `media_request_reinit` or `DQBUF` failure.
+> - Synthetic fault-injection: a debug compile flag returns `-EBUSY` from REINIT after N frames. Pre-fix: pool starves after 16 errors. Post-fix: pool recovers; decode continues across error events.
+> - mpv-vaapi-copy 100-frame regression test still GREEN (no regression on the happy path).
+>
+> **C — Probe-pattern test harness for cap_pool race**
+> - C program at `tests/cap_pool_probe_pattern.c` (~50 lines) using libva: open device, `vaCreateContext`, `vaCreateSurfaces(128×128, 4)`, dispose, `vaCreateSurfaces(1920×1080, 4)`, decode bbb's first I-frame, sha256 the output.
+> - **Pass:** zero `REQBUFS-EBUSY` events in driver stderr; decoded frame sha matches FFmpeg SW reference for the same I-frame; harness exits 0.
+> - Formally anchors iter5 sonnet C4 / iter6 candidate A — the race that was organically exercised by YouTube's resolution renegotiations is now also covered by a deterministic synthetic test.
+>
+> Phase 5 sonnet review must explicitly confirm: (a) any restored msync (if A required it) is correctly placed, (b) `request_pool_force_release` doesn't introduce new mutable global state or break the pool's invariants, (c) the cap_pool harness is a real test (not just a fixture-hardcoded check that passes trivially).
 
-## Stop point
+## Phase 1 LOCKED. Iteration 7 proceeds.
 
-**Phase 1 lock requires user input** — pick from A..F (and any pairing).
-
-Recommended primary: **A + B + C** — closes all three internal carry items, leaves D/E/F for iter8+ depending on whether operator wants upstream-filing or perf-anchor next.
-
-Alternative leans:
-- **D alone** if operator wants the upstream-prep iteration now
-- **E alone** if perf measurement matters more than carryover-closure
-- **F alone** if architectural cleanliness drives the next iteration
+iter7 = A + B + C combined. Phases 2..8:
+- Phase 2: situation analysis for each track (A/B/C) — what we expect to find, what tools needed, what could go wrong
+- Phase 3: baseline anchor — capture pre-fix state for each (A: current frame hashes vs SW; B: current pool starvation under fault inject; C: current behavior on probe pattern)
+- Phase 4: execute. Order: B (smallest, additive) → C (synthetic test, no driver change) → A (verification — runs against the iter7-end driver including any B/C changes)
+- Phase 5: sonnet review of combined diff before commit
+- Phase 6: deploy iter7 driver to ohm
+- Phase 7: verify all three tracks against locked criteria above
+- Phase 8: close
 
 After lock, iter7 phases 2..8 proceed autonomously per "Stop only if user is needed."
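
Reviewer note on track A: the frame-by-frame sha256 comparison could be sketched roughly as below. This is an illustration only, not part of the patch. It assumes both decodes are available as one concatenated raw NV12 byte stream with no row padding (the captured per-frame `.yuv` files would need to be concatenated first), and the helper names `nv12_frame_size`, `frame_hashes`, and `first_mismatch` are hypothetical.

```python
import hashlib


def nv12_frame_size(width, height):
    # NV12 stores a full-resolution Y plane followed by an interleaved
    # half-resolution UV plane: width*height + width*height/2 bytes.
    return width * height * 3 // 2


def frame_hashes(raw, width, height):
    """Split a raw NV12 byte stream into frames and sha256 each one."""
    size = nv12_frame_size(width, height)
    if len(raw) % size != 0:
        raise ValueError("stream length is not a whole number of frames")
    return [
        hashlib.sha256(raw[offset:offset + size]).hexdigest()
        for offset in range(0, len(raw), size)
    ]


def first_mismatch(hw_hashes, sw_hashes):
    """Return the index of the first differing frame, or None if identical.

    A difference in frame count is reported at the index where the
    shorter list ends.
    """
    for index, (hw, sw) in enumerate(zip(hw_hashes, sw_hashes)):
        if hw != sw:
            return index
    if len(hw_hashes) != len(sw_hashes):
        return min(len(hw_hashes), len(sw_hashes))
    return None
```

Under these assumptions, a pass for track A corresponds to `first_mismatch(...)` returning `None` for 100 hardware-decoded frames against the FFmpeg software reference; any integer result names the first frame to inspect before deciding whether msync needs to be restored.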