iter6 Phase 2: A∪I merge + bug class identified
Phase 1 amended: scope merged (A: cap_pool resolution-change race + I: Firefox VIDIOC_QBUF EINVAL) after Phase 2 telemetry showed they're facets of the same buffer-pool / surface-recycle lifecycle weakness.

Phase 2 findings:
- Original "S_EXT_CTRLS fails on frame 1" was transient state, does not reproduce on iter6-DX diagnostic build.
- Reproducible failure: OUTPUT VIDIOC_QBUF EINVAL after a varying number of successful frames (1, 19, 53 across three runs).
- mpv-vaapi-copy clean: single-surface recycle pattern doesn't trigger the race; Firefox's multi-surface MediaSource pattern does.
- DQBUF index-mismatch theory: ruled out.
- Control payload divergence: ruled out (first 64 bytes byte-identical between mpv and Firefox).
- Surviving hypothesis: request_fd lifecycle race. fd=30 reused on every frame after close, kernel-side request object may not release synchronously, next QBUF on REQUEST_FD=30 collides with stale state.

Phase 4 leading approach is C: extend iter4's "drain before reuse" discipline from request_fd to OUTPUT pool slot. Mirror picture.c's cap_pool unbind-before-rebind pattern in the OUTPUT lifecycle.

iter6-DX diagnostic build is local on ohm (/home/mfritsche/iter6-fork-dx). Diagnostics are not committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
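The "drain before reuse" discipline named as approach C can be pictured with a toy model. This is only a sketch, not the driver's code: `OutputPool`, `SlotState`, and the method names are hypothetical, and the actual fix would live in the C OUTPUT lifecycle. It illustrates just the invariant: a slot that has been queued is not handed out or requeued until it has been dequeued.

```python
from enum import Enum, auto


class SlotState(Enum):
    FREE = auto()    # drained; safe to hand out
    QUEUED = auto()  # queued to the kernel (models QBUF), not yet dequeued


class OutputPool:
    """Toy model of an OUTPUT buffer pool enforcing drain-before-reuse.

    Hypothetical illustration only; names do not come from the driver.
    """

    def __init__(self, n_slots):
        self.state = [SlotState.FREE] * n_slots

    def queue(self, index):
        # Refuse to requeue a slot whose previous use has not been drained.
        if self.state[index] is not SlotState.FREE:
            raise RuntimeError(f"slot {index} still queued; drain before reuse")
        self.state[index] = SlotState.QUEUED

    def dequeue(self, index):
        # Models a successful DQBUF for this slot.
        if self.state[index] is not SlotState.QUEUED:
            raise RuntimeError(f"slot {index} was not queued")
        self.state[index] = SlotState.FREE

    def acquire(self):
        # Return the lowest-numbered drained slot, or None if all are in flight.
        for index, state in enumerate(self.state):
            if state is SlotState.FREE:
                return index
        return None
```

In this model a multi-surface consumer that rotates through slots gets `None` from `acquire()` when every slot is in flight, rather than a slot that is still queued.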
+16
-11
@@ -140,13 +140,17 @@ Same as iter5. Plus for new candidates:
 - For E (perf): `pidstat -u` for CPU%, Mali-G52 freq via `/sys/class/devfreq/fde60000.gpu`.
 - For F (MPEG-2): need an MPEG-2 fixture (`mpv --dump-stream` from a public DVD or transcode bbb to MPEG-2).
 
-## In-scope (LOCKED 2026-05-05 for iteration 6) — I (Firefox VIDIOC_S_EXT_CTRLS / VIDIOC_QBUF EINVAL)
+## In-scope (LOCKED 2026-05-05 for iteration 6) — A∪I (cap_pool / OUTPUT-buffer-recycle lifecycle)
 
-Operator locked candidate **I** (Firefox VIDIOC_QBUF EINVAL on first frame, enriched post-iter5-amend telemetry to also include S_EXT_CTRLS EINVAL).
+Operator locked candidate **I**, then **merged with candidate A** after Phase 2 telemetry (2026-05-05) showed they are facets of the same underlying bug:
 
-Why I: deterministic repro on two consumers (bbb fixture + YouTube avc1), narrowest scope of any candidate, gates the only known consumer hard-failure under the post-amendment binary. Also: iter5-amend just unblocked Firefox's path; closing this completes the Firefox HW-decode story end-to-end.
+- **A (iter5 sonnet C4 carryover)**: cap_pool resolution-change race — REQBUFS-EBUSY when CAPTURE pool isn't drained before re-allocation. mpv libplacebo `--vo=gpu` triggers it via Vulkan-fallback resolution change.
 
-Other candidates (A, B, C, D, E, F, G) deferred to iter7+. H (fourier-fresnel) remains separate top-level campaign.
+- **I (iter6 candidate)**: Firefox `VIDIOC_QBUF` EINVAL on OUTPUT after ~19 successful S_EXT_CTRLS calls. The original "S_EXT_CTRLS EINVAL on frame 1" framing was transient state; the reproducible failure is at OUTPUT-buffer requeue with rotating `source_index`. mpv-vaapi-copy (single-surface recycle) doesn't hit this; Firefox (multi-surface rotation through libva) does.
+
+Why merge: both are buffer-pool / surface-recycle lifecycle issues at OUTPUT (and CAPTURE) drain ordering. Partial fixes risk just shifting the symptom. The intended Phase 4 fix is a single coherent rework of cap_pool + DQBUF/QBUF sequencing in the surface lifecycle.
+
+Other candidates (B, C, D, E, F, G) deferred to iter7+. H (fourier-fresnel) remains separate top-level campaign.
 
 ## Out-of-scope (LOCKED 2026-05-05 for iteration 6)
 
@@ -156,16 +160,17 @@ Other candidates (A, B, C, D, E, F, G) deferred to iter7+. H (fourier-fresnel) r
 - New codecs OUTSIDE H.264 / MPEG-2 (VP8/VP9/AV1/HEVC out per iter1 lock).
 - New target hardware (fresnel, ampere) — separate campaign (H above).
 
-## Phase 1 success criterion (LOCKED 2026-05-05 for iteration 6)
+## Phase 1 success criterion (LOCKED 2026-05-05 for iteration 6, AMENDED 2026-05-05 for A∪I merge)
 
-> Firefox 150 (iter5-amend, sandbox enabled, `LIBVA_DRIVER_NAME=v4l2_request`) plays a known-h264 fixture (`bbb_1080p30_h264.mp4`) for ≥30 seconds with HW decode actually engaged: `cap_pool_init` succeeds, **zero `Unable to set control(s)` and zero `Unable to queue buffer`** in driver stderr, `lsof /dev/video1` shows the Firefox Utility process holding the device throughout playback, frames advance, no SW fallback in `about:support`'s "Decoder Backend" fields.
+> All three consumer paths must be GREEN on the iter6 driver:
 >
-> Acceptance evidence (capture all three):
-> 1. Driver stderr lines: only the single per-process `cap_pool_init: 24 slots ready` log, no per-frame errors.
-> 2. `lsof /dev/video1` snapshot at t=15s into playback shows a Firefox process (PID parent or descendant of the launcher) with the device open.
-> 3. about:support's media decoder section names `vaapi`-or-equivalent for video/h264, not `ffvpx`.
+> 1. **Firefox** (iter5-amend binary, sandbox enabled, `LIBVA_DRIVER_NAME=v4l2_request`) plays `bbb_1080p30_h264.mp4` for ≥30 seconds with HW decode engaged: zero `Unable to queue buffer` / `Unable to set control(s)` per-frame, `lsof /dev/video1` shows the Firefox Utility process holding the device, frames advance.
 >
-> Phase 5 sonnet review must explicitly confirm that the fix is on the libva-side (or jointly libva + a Firefox-side patch), and that mpv-vaapi-copy 2000-frame test still GREEN (no regression introduced).
+> 2. **mpv libplacebo `--vo=gpu`** runs ≥30 seconds on the same fixture without segfault and without REQBUFS-EBUSY events at init or resolution-change boundaries (carries iter5 sonnet C4 caveat to closure).
+>
+> 3. **mpv `--hwdec=vaapi-copy`** (regression check) decodes 2000 frames clean, identical pattern to iter5-end driver baseline (sha `4bed52ec5d44b389…`).
+>
+> Phase 5 sonnet review must confirm: (a) fix is libva-side (not consumer-specific kludge), (b) all three consumer paths verified, (c) no new mutable global state introduced (Track E discipline).
 
 ## Phase 1 LOCKED. Iteration 6 proceeds.
 
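The "zero `Unable to queue buffer` / `Unable to set control(s)`" part of criterion 1 can be checked mechanically against captured driver stderr. A minimal sketch, assuming stderr was saved to a text file: the function names are hypothetical, matching is plain substring search, and only the two message strings are taken from the criterion itself.

```python
# Error strings quoted in the Phase 1 success criterion.
ERROR_MARKERS = ("Unable to queue buffer", "Unable to set control(s)")


def count_driver_errors(lines):
    """Count how many lines contain each error marker (substring match)."""
    counts = {marker: 0 for marker in ERROR_MARKERS}
    for line in lines:
        for marker in ERROR_MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


def stderr_is_clean(lines):
    """True when none of the error markers appears in any line."""
    return all(n == 0 for n in count_driver_errors(lines).values())
```

For example, with a hypothetical capture file `driver-stderr.log`, `stderr_is_clean(open("driver-stderr.log"))` returns `True` only if neither message appears. The `lsof` and frame-advance checks still need to be verified separately.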