iter7 Phase 1: lock A+B+C (msync verify + slot-leak fix + cap_pool harness)

Operator chose A+B+C — closes all three internal carry items from
iter5/iter6 in one iteration:
- A: msync pixel-correctness verification (iter5 sonnet C3)
- B: slot-leak error recovery (iter6 internal carry)
- C: probe-pattern test harness for cap_pool race (iter5 sonnet C4 /
  iter6 candidate A formal anchor)

Phase 1 success criteria locked per-track. Phase 5 sonnet review
mandatory before commit per CLAUDE.md user-global rule.

Execution order: B (smallest, additive) -> C (synthetic test, no
driver change) -> A (verification — runs against iter7-end driver
including any B/C changes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:55:11 +00:00
parent 6f03fb8baa
commit 04f84a827d
+34 -19
@@ -138,9 +138,11 @@ Same as iter6. Plus for new candidates:
- For E (perf): `pidstat -u`, `/sys/class/devfreq/fde60000.gpu`, mpv stats overlay, Firefox `about:processes`.
- For F (DMABUF): kernel docs `Documentation/userspace-api/media/v4l/buffer.rst`, hantro driver source `drivers/staging/media/hantro/`.
## In-scope (LOCKING DEFERRED — Phase 1 user input)
## In-scope (LOCKED 2026-05-06 for iteration 7) — A + B + C
To be locked at Phase 1 from candidates A..F above. G is out-of-campaign-scope. H, I are separate top-level campaign decisions, not iter7 candidates.
Operator locked **A + B + C**: msync pixel-correctness verification, slot-leak error recovery, and cap_pool-race synthetic test harness. Closes all three iter5/iter6 internal carry items in one iteration.
D (upstreaming), E (perf binding cell), F (V4L2_MEMORY_DMABUF) deferred to iter8+. G (WiFi-IRQ frame drops) remains out-of-campaign-scope. H, I are separate top-level campaigns.
## Out-of-scope (LOCKED 2026-05-06 for iteration 7)
@@ -153,25 +155,38 @@ To be locked at Phase 1 from candidates A..F above. G is out-of-campaign-scope.
- New target hardware (fresnel, ampere) — separate campaigns (H, I above).
- WiFi-IRQ frame drops — system-level, not libva-multiplanar.
## Phase 1 success criterion (will lock after user picks candidate)
## Phase 1 success criterion (LOCKED 2026-05-06 for iteration 7)
Pre-lock template:
- For candidate A: "100-frame `vaapi-copy` produces frame hashes matching either FFmpeg SW baseline (preferred) or iter1 baseline (if msync-removal causes any divergence). If divergence, msync restored and verified."
- For candidate B: "Synthetic fault-injection (REINIT returns -EBUSY after N frames) demonstrates pool starvation pre-fix; post-fix demonstrates `request_pool_force_release` reclaims the slot and decode resumes."
- For candidate C: "Synthetic test program issues `vaCreateSurfaces(small)` then `vaCreateSurfaces(big)` then decodes bbb's first I-frame; driver stderr has zero REQBUFS-EBUSY events; output frame sha matches FFmpeg SW reference for that I-frame."
- For candidate D: "Mozilla Bugzilla bug filed with combined 160-line patch attached, references bug 1833354/1965646. Bootlin patch series prepared as a clean iter1-iter6 sequence on a separate branch, ready to send (no PR until operator OK)."
- For candidate E: "Anchored perf table for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW, SW baseline} across drop count + CPU% + frame timing + GPU freq on bbb_1080p30. Reproducible from documented script."
- For candidate F: "vaapi-copy + vaapi --vo=null still produce real frames with V4L2_MEMORY_DMABUF-backed OUTPUT buffers; race window architecturally closed."
> All three sub-tracks must independently pass on the iter7-end driver build:
>
> **A — msync pixel-correctness verification**
> - 100-frame `mpv --hwdec=vaapi-copy --o=output_%04d.yuv` against `bbb_1080p30_h264.mp4`.
> - Frame-by-frame sha256 of the captured YUV planes compared against FFmpeg SW decode reference (`ffmpeg -i bbb -frames:v 100 -f rawvideo -pix_fmt nv12 -`).
> - **Pass:** all 100 frames match the SW reference byte-for-byte (or are visually identical, with a documented bit-precision delta, if the kernel's NV12 packing differs trivially from FFmpeg's). Formally closes iter5 sonnet C3.
> - **Fail action:** restore `msync(MS_SYNC | MS_INVALIDATE)` in the surface DQBUF path; re-run; verify match. Document either way.
>
> **B — Slot-leak error recovery**
> - `request_pool_force_release(pool, slot_index)` added to request_pool.{c,h}; REINITs the slot's fd and clears the slot's `busy` flag (sets `busy=false`).
> - Called from `RequestSyncSurface` error paths after `media_request_reinit` or `DQBUF` failure.
> - Synthetic fault-injection: a debug compile flag returns `-EBUSY` from REINIT after N frames. Pre-fix: pool starves after 16 errors. Post-fix: pool recovers; decode continues across error events.
> - mpv-vaapi-copy 100-frame regression test still GREEN (no regression on the happy path).
>
> **C — Probe-pattern test harness for cap_pool race**
> - C program at `tests/cap_pool_probe_pattern.c` (~50 lines) using libva: open device, `vaCreateContext`, `vaCreateSurfaces(128×128, 4)`, dispose, `vaCreateSurfaces(1920×1080, 4)`, decode bbb's first I-frame, sha256 the output.
> - **Pass:** zero `REQBUFS-EBUSY` events in driver stderr; decoded frame sha matches FFmpeg SW reference for the same I-frame; harness exits 0.
> - Formally anchors iter5 sonnet C4 / iter6 candidate A — the race that was organically exercised by YouTube's resolution renegotiations is now also covered by a deterministic synthetic test.
>
> Phase 5 sonnet review must explicitly confirm: (a) any restored msync (if A required it) is correctly placed, (b) `request_pool_force_release` doesn't introduce new mutable global state or break the pool's invariants, (c) the cap_pool harness is a real test (not just a fixture-hardcoded check that passes trivially).
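
The pre-fix/post-fix behaviour that track B's fault injection is meant to show can be sketched with a toy model. This is a rough illustration only, not the driver's code: `toy_pool`, `toy_pool_acquire`, `toy_pool_release`, `toy_pool_force_release` and `toy_decode` are hypothetical names, the 16-slot size is taken from "pool starves after 16 errors", and the real `request_pool_force_release` would also REINIT the slot's media-request fd, which the toy omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_POOL_SLOTS 16 /* assumption: pool size implied by "starves after 16 errors" */

struct toy_slot { bool busy; };
struct toy_pool { struct toy_slot slots[TOY_POOL_SLOTS]; };

/* Return the index of a free slot and mark it busy, or -1 if the pool is starved. */
int toy_pool_acquire(struct toy_pool *pool)
{
    assert(pool != NULL);
    for (int i = 0; i < TOY_POOL_SLOTS; i++) {
        if (!pool->slots[i].busy) {
            pool->slots[i].busy = true;
            return i;
        }
    }
    return -1;
}

/* Happy-path release after a successful sync. */
void toy_pool_release(struct toy_pool *pool, int slot_index)
{
    assert(pool != NULL && slot_index >= 0 && slot_index < TOY_POOL_SLOTS);
    pool->slots[slot_index].busy = false;
}

/* Hypothetical stand-in for request_pool_force_release: only clears the busy
 * flag here; the real function would also REINIT the slot's fd. */
void toy_pool_force_release(struct toy_pool *pool, int slot_index)
{
    assert(pool != NULL && slot_index >= 0 && slot_index < TOY_POOL_SLOTS);
    pool->slots[slot_index].busy = false;
}

/* Simulate nframes decodes. Every frame with index >= fail_after hits an
 * injected sync error. With recover == false the slot leaks (pre-fix);
 * with recover == true the error path force-releases it (post-fix).
 * Returns how many frames obtained a slot before the pool starved. */
int toy_decode(struct toy_pool *pool, int nframes, int fail_after, bool recover)
{
    int submitted = 0;
    for (int frame = 0; frame < nframes; frame++) {
        int slot = toy_pool_acquire(pool);
        if (slot < 0)
            break; /* pool starved: no free slot left */
        submitted++;
        if (frame < fail_after)
            toy_pool_release(pool, slot);       /* normal completion */
        else if (recover)
            toy_pool_force_release(pool, slot); /* error path reclaims the slot */
        /* else: error path returns early and the slot stays busy (leak) */
    }
    return submitted;
}
```

In this model, 100 frames with errors injected from frame 10 onward stall once the 16 leaked slots are exhausted when `recover` is false, and run to completion when it is true. The real harness would observe the same contrast through decode progress rather than a return value.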
## Stop point
## Phase 1 LOCKED. Iteration 7 proceeds.
**Phase 1 lock requires user input** — pick from A..F (and any pairing).
Recommended primary: **A + B + C** — closes all three internal carry items, leaves D/E/F for iter8+ depending on whether operator wants upstream-filing or perf-anchor next.
Alternative leans:
- **D alone** if operator wants the upstream-prep iteration now
- **E alone** if perf measurement matters more than carryover-closure
- **F alone** if architectural cleanliness drives the next iteration
iter7 = A + B + C combined. Phases 2..8:
- Phase 2: situation analysis for each track (A/B/C) — what we expect to find, what tools needed, what could go wrong
- Phase 3: baseline anchor — capture pre-fix state for each (A: current frame hashes vs SW; B: current pool starvation under fault inject; C: current behavior on probe pattern)
- Phase 4: execute. Order: B (smallest, additive) → C (synthetic test, no driver change) → A (verification — runs against the iter7-end driver including any B/C changes)
- Phase 5: sonnet review of combined diff before commit
- Phase 6: deploy iter7 driver to ohm
- Phase 7: verify all three tracks against locked criteria above
- Phase 8: close
After lock, iter7 phases 2..8 proceed autonomously per "Stop only if user is needed."
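
For track A, the frame-by-frame comparison used in Phases 3 and 7 could look roughly like the sketch below. It makes several assumptions: both captures are already loaded into memory as tightly packed NV12 with even dimensions, the helper names `nv12_frame_bytes` and `first_mismatched_frame` are hypothetical, and it uses a direct `memcmp` in place of the sha256 hashes named in the criteria (equivalent for a pass/fail equality check, and it avoids a crypto dependency).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes in one tightly packed NV12 frame: a full-resolution Y plane followed
 * by an interleaved UV plane at half the size. Assumes even width and height
 * and no row padding, which may not hold for the decoder's native buffers. */
size_t nv12_frame_bytes(size_t width, size_t height)
{
    return width * height + (width * height) / 2;
}

/* Compare two captures frame by frame. Returns the index of the first frame
 * whose bytes differ, or -1 if all nframes frames are identical. */
long first_mismatched_frame(const unsigned char *hw, const unsigned char *sw,
                            size_t frame_bytes, size_t nframes)
{
    assert(hw != NULL && sw != NULL && frame_bytes > 0);
    for (size_t i = 0; i < nframes; i++) {
        const size_t offset = i * frame_bytes;
        if (memcmp(hw + offset, sw + offset, frame_bytes) != 0)
            return (long)i;
    }
    return -1;
}
```

A return value of -1 over 100 frames corresponds to the byte-for-byte pass condition. Any other value names the first frame to inspect before deciding whether msync needs to be restored.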