libva-multiplanar/phase0_findings_iter6.md
claude-noether 793409b960 iter6 Phase 2: A∪I merge + bug class identified
Phase 1 amended: scope merged (A: cap_pool resolution-change race
+ I: Firefox VIDIOC_QBUF EINVAL) after Phase 2 telemetry showed
they're facets of the same buffer-pool / surface-recycle lifecycle
weakness.

Phase 2 findings:
- Original "S_EXT_CTRLS fails on frame 1" was transient state, does
  not reproduce on iter6-DX diagnostic build.
- Reproducible failure: OUTPUT VIDIOC_QBUF EINVAL after a varying
  number of successful frames (1, 19, 53 across three runs).
- mpv-vaapi-copy clean — single-surface recycle pattern doesn't
  trigger the race; Firefox's multi-surface MediaSource pattern does.
- DQBUF index-mismatch theory: ruled out.
- Control payload divergence: ruled out (first 64 bytes byte-identical
  between mpv and Firefox).
- Surviving hypothesis: request_fd lifecycle race — fd=30 reused on
  every frame after close, kernel-side request object may not release
  synchronously, next QBUF on REQUEST_FD=30 collides with stale state.

Phase 4 leading approach: C — extend iter4's "drain before reuse"
discipline from request_fd to OUTPUT pool slot. Mirror picture.c's
cap_pool unbind-before-rebind pattern in the OUTPUT lifecycle.

iter6-DX diagnostic build is local on ohm (/home/mfritsche/iter6-fork-dx).
Diagnostics are not committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:37:31 +00:00

# Iteration 6 — Phase 0 (substrate / motivation / inventory)
Opens 2026-05-05 immediately after iter5 close (`phase8_iteration5_close.md`, fork commit `c8b6ede`, campaign close `8e6d9e6`).
## Predecessor close-out summary (iteration 5 → iteration 6)
iter5 was a four-track iteration: A (DEBUG sweep) + G (PGO-disabled Firefox rebuild) + B (mpv libplacebo segfault) + E (multi-context safety). All four closed GREEN. Driver logging is clean: 1 v4l2-request log line per 2000-frame stress run, down from ~30+ lines/frame pre-iter5. Process-global mutable state eliminated. Firefox-fourier 150.0.1-1.1 deployed (169 MB libxul, 21× smaller than the iter3 PGO-instrumented build).

The campaign's original substrate question ("make multi-planar libva work end-to-end on Rockchip hantro for production VAAPI consumers") is empirically answered at the libva-side decode layer. iter6 is about closing the remaining quality and verification gates and, gated on operator instruction, the upstream prep work.
## Iteration 6 candidate research questions
### A. Cap_pool resolution-change race (carried from iter5 Phase 5 sonnet C4)
> Fix the latent REQBUFS-EBUSY race in `CreateSurfaces2`. When a libva consumer probes with `vaCreateSurfaces(N, M)` and then re-allocates with different dimensions while CAPTURE is streaming (STREAMON active), the cap_pool drain doesn't complete before REQBUFS(0) on OUTPUT. The kernel returns EBUSY, the driver pushes ahead with a garbage `sizes[1]` (uninitialized memory), and the consumer either hits SIGSEGV or falls back to SW.

**Empirical signature** (iter3 substrate, iter5 Phase 7B re-test): `Unable to request buffers: Device or resource busy` at init, then `cap_pool_init: query_buffer failed for slot N`, then ffmpeg `AVHWFramesContext: Failed to create surface: 2 (resource allocation failed)`. mpv recovers via SW fallback (no SIGSEGV in iter5; this consumer happens to handle the failure). Other consumers (chromium-style: probe + real-alloc + STREAMON in tight succession) might not.

**Fix shape**: in the resolution-change branch of `surface.c::CreateSurfaces2`, enforce the ordering `STREAMOFF on CAPTURE → REQBUFS(0) on CAPTURE → STREAMOFF on OUTPUT → REQBUFS(0) on OUTPUT → S_FMT on OUTPUT → REQBUFS(N) on OUTPUT`. Estimated ~30 lines. The current code calls `cap_pool_destroy` (which issues REQBUFS(0) on CAPTURE) and then `v4l2_request_buffers(output, 0)`, but never issues STREAMOFF first; the kernel rejects REQBUFS(0) on a streaming queue with EBUSY.

**Why first**: highest-leverage closure of a known-broken edge case. Required for a cleaner upstream submission. Small scope.
### B. msync removal pixel-correctness verification (carried from iter5 Phase 5 sonnet C3)
> Add a frame-hash spot check to confirm that the iter5 sweep's `msync(MS_SYNC|MS_INVALIDATE)` removal in `surface.c::RequestSyncSurface` doesn't silently corrupt CAPTURE buffer reads on this CMA-backed hantro setup.

**Plan**: extend the mpv vaapi-copy stress test to checksum the first-frame Y plane and compare it against a known-good baseline captured before the removal. If they match, the msync removal is verified. If they diverge, restore the msync and investigate.

**Fast**: ~1 hour total (capture baseline, write checksum harness, run side-by-side).
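
The checksum harness only needs a stable hash over the visible luma bytes. A minimal sketch, assuming the harness already holds a pointer to the first frame's Y plane plus its row pitch (for example from a mapped `VAImage`, whose `pitches[0]` gives the stride); `y_plane_fnv1a` is a hypothetical name, and 64-bit FNV-1a is an arbitrary choice of non-cryptographic hash. Each row is hashed only up to `width`, because padding bytes between `width` and the pitch are not guaranteed to be initialised and could make identical frames hash differently across builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over the visible luma bytes only; row padding is skipped. */
static uint64_t y_plane_fnv1a(const uint8_t *y, size_t width, size_t height,
                              size_t pitch)
{
    uint64_t h = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */

    assert(pitch >= width);
    for (size_t row = 0; row < height; row++) {
        const uint8_t *p = y + row * pitch;

        for (size_t col = 0; col < width; col++) {
            h ^= p[col];
            h *= 1099511628211ULL; /* FNV-1a 64-bit prime */
        }
    }
    return h;
}
```

Record the hash once on the pre-removal build and once on the iter5-end build; any single differing visible byte changes the result.
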
### C. Verify chromium-fourier + iter5 driver compatibility (NEW from iter5)
> Run the existing chromium-fourier 149 build (`~/src/chromium-fourier/` if it exists, otherwise rebuild from the chromium-fourier patch series) against the iter5-end libva-v4l2-request-fourier driver. Determine whether the iter4+iter5 libva-side fixes obviate any of chromium-fourier's Step-1 patches.

**Why this question exists**: stock Brave on PineTab2 doesn't engage VAAPI at all, because its GPU process dies at GL bindings init (`InitializeStaticGLBindingsOneOff failed`, `GLES3 is unsupported`). Part of chromium-fourier's value is the GL-stack fix that gets Chromium's GPU process running, separate from any libva-side patches it carries. With iter4 (DPB FFmpeg-semantics, fresh request_fd) and iter5 (debug-clean, multi-context safe) landed, some of chromium-fourier's libva-side patches may now be redundant.

**Tested in iter5**: stock Brave with three GL settings (`default`, `--use-gl=egl`, `--use-gl=desktop`) fails at GL bindings init in every case. The failure mode differs per setting, but none reaches the decoder.

**Diagnostic plan**:
1. Confirm chromium-fourier 149 still builds and runs on ohm.
2. Point it at `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` with the iter5 driver.
3. Capture stderr for `v4l2-request:` and `vaapi:` traces.
4. If decode works, determine which chromium-fourier libva-side patches are still load-bearing (revert and retest each).
5. Surface a "minimal chromium-fourier" patch set if some patches are obsoleted.

**Output**: an updated chromium-fourier patch matrix (which patches are still needed; which ones iter4/iter5 obsoletes).
### D. Bootlin / Mozilla upstreaming prep (carried from iter3 candidate G + iter4 + iter5)
> File the Mozilla Bugzilla bug for `/dev/media*` + V4L2-stateless RDD sandbox with the firefox-fourier patch. File a bootlin issue on `bootlin/libva-v4l2-request` with the iter1+2+3+4+5 patches as a cohesive working set.

**Why now**: the iter5 sweep and multi-context fix move the fork toward upstream-readiness. Patches are clean. The Mozilla bug doesn't exist yet (per iter3 Phase 0 Sonnet research). bootlin upstream has been dormant since 2021, but the fork is now substantially ahead with empirically verified fixes.

**Stance**: per `feedback_no_upstream.md`, gated on explicit operator instruction. Listed for completeness.
### E. Performance binding cell (carried from iter1+2+3+4+5)
> Establish a measurement protocol: drop counts, effective FPS, browser CPU%, scanout-plane residency for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW (with iter5 G's PGO-disabled binary), SW baseline}. Anchor results in `phaseN_evidence/`.

**Why**: anchors all iter1 through iter5 claims to numbers. Carried for five iterations. iter5 G's PGO-disabled Firefox is now deployed, so this is the first iteration with both a clean driver and a release-quality Firefox to measure with.
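
As a cross-check on `pidstat -u`, per-process CPU% can be derived directly from `/proc/<pid>/stat`: per proc(5), fields 14 and 15 are `utime` and `stime` in clock ticks (`sysconf(_SC_CLK_TCK)` ticks per second). A minimal sketch with hypothetical helper names; the presented-frame count for effective FPS is assumed to come from the player and is not read here. Parsing starts after the last `)` because the command-name field is parenthesised and may itself contain spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract utime/stime (fields 14 and 15, clock ticks) from one
 * /proc/<pid>/stat line. Returns 0 on success, -1 on parse failure. */
static int parse_proc_stat_ticks(const char *line, unsigned long *utime,
                                 unsigned long *stime)
{
    const char *p = strrchr(line, ')'); /* end of the (comm) field */

    if (p == NULL)
        return -1;
    /* Skip fields 3..13: state ppid pgrp session tty_nr tpgid flags
     * minflt cminflt majflt cmajflt. */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               utime, stime) != 2)
        return -1;
    return 0;
}

/* CPU% over a sampling window; 100% means one core fully busy. */
static double cpu_percent(unsigned long ticks_delta, long clk_tck,
                          double wall_seconds)
{
    assert(clk_tck > 0 && wall_seconds > 0.0);
    return 100.0 * ((double)ticks_delta / (double)clk_tck) / wall_seconds;
}

/* Effective FPS: frames actually presented divided by wall time. */
static double effective_fps(unsigned long frames_presented, double wall_seconds)
{
    assert(wall_seconds > 0.0);
    return (double)frames_presented / wall_seconds;
}
```

Usage: read the stat line twice, `t` seconds apart, and pass `(utime2 + stime2) - (utime1 + stime1)`, `sysconf(_SC_CLK_TCK)` and `t` to `cpu_percent`.
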
### F. Multi-codec audit (carried from iter1 lock backlog)
> Verify the MPEG-2 decode path. The iter1 lock said "H.264 first; MPEG-2 next", yet iter1 through iter5 were all H.264-only.

**Plan**: find an MPEG-2 test fixture, confirm via vainfo that the MPEG-2 profile is advertised, and decode it via mpv vaapi-copy. Verify hantro G1's MPEG-2 path through libva-v4l2-request-fourier. Surface any codec-specific bugs (the iter4 DPB+request_fd fixes were H.264-specific; MPEG-2 has a different control flow).
### I. Firefox VIDIOC_QBUF EINVAL on first frame (NEW from iter5 amendment 2026-05-05)
> The iter5 amendment (Utility seccomp fix) closes Track F: the sandbox no longer blocks the V4L2 request API. With the sandbox open, Firefox loads the v4l2_request driver and `cap_pool_init: 24 slots ready` succeeds; then a single `Unable to queue buffer: Invalid argument` (VIDIOC_QBUF EINVAL) appears on what looks like the first frame, after which Firefox falls back to SW.

**Why this is a new candidate, not a Track A regression**: mpv `--hwdec=vaapi-copy` decoded 2000 frames clean on the same iter5-end driver build (sha `4bed52ec5d44b389…`). Only Firefox triggers the EINVAL, so the bug lives in the Firefox-specific consumer path through libva, not in the per-frame request_fd/DPB logic that iter4 closed.

**Failure modes observed (iter5-amend telemetry, 2026-05-05)**:
```
bbb_1080p30_h264.mp4 (single decode attempt):
v4l2-request: cap_pool_init: 24 slots ready
v4l2-request: Unable to queue buffer: Invalid argument
YouTube avc1 (Enhancer for YouTube forcing h264, multiple decode attempts on same tab):
v4l2-request: cap_pool_init: 24 slots ready (×4)
v4l2-request: Unable to set control(s): Invalid argument
v4l2-request: Unable to queue buffer: Invalid argument
```
The S_EXT_CTRLS EINVAL is the more diagnostic signal: it indicates that the kernel is rejecting one of the compound H.264 controls (SPS/PPS/SLICE_PARAMS/DECODE_PARAMS/SCALING_MATRIX). The iter5 sweep removed the iter4 per-control TRY isolation (`d3a299b`); temporarily reinstating it (or using strace) is iter6 step one.

**Diagnostic plan**:
1. Reinstate per-control VIDIOC_TRY_EXT_CTRLS isolation in v4l2.c (the iter4 diagnostic that pinpointed which compound control fails). Do not commit; use only for diagnosis.
2. Strace the Firefox Utility process during decode init. Capture exact `v4l2_ext_control` payloads. Cross-reference with FFmpeg's `v4l2_request_h264.c` (cached at `references/ffmpeg-kwiboo/`, the empirical authority per memory).
3. Once the offending control is identified, look at how Firefox's libva path constructs it differently from mpv-vaapi-copy. Likely: a header field Firefox doesn't populate, an order-of-operations difference, or a VAImage / VAExportSurfaceHandle side effect that perturbs driver state.

**Reproducibility**: 100% on either `file:///home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` or YouTube avc1 (with enhanced-h264ify or Enhancer for YouTube forcing h264), using the `LIBVA_DRIVER_NAME=v4l2_request` environment, the iter5-amend Firefox 150, and the sandbox enabled.

**Risk**: medium. It could be a 5-line fix in the v4l2.c QBUF prep, or it could surface a fundamental Firefox-vs-mpv divergence in libva surface management.
### G. V4L2_MEMORY_DMABUF (carried from iter2+3+4+5)
> Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation (iter2 Fix 3 was a statistical / LRU mitigation; this is the architectural fix).

**Why**: eliminates the race window by construction rather than mitigating it. Significant kernel-side test surface: does hantro on this kernel actually accept the DMABUF memory type? GStreamer's v4l2slh264dec uses MMAP, so DMABUF on hantro may not be tested upstream.

**Risk**: highest unknown of any candidate. Possibly requires kernel work.
### H. fourier-fresnel campaign (carried from iter5 followon-campaigns memory)
> Open the `fourier-fresnel` campaign: port libva-v4l2-request-fourier from ohm RK3568 to fresnel RK3399 (Pinebook Pro). Validates generality of iter1+2+3+4+5 fixes on a second hardware target.

**Stance**: this is a **separate top-level campaign**, not an iter6 candidate. Listed here for completeness because operator may pick it as the next thing to open (sequenced ahead of `panvk-bifrost`). Charter at `~/src/fourier-fresnel/` once opened.
### Recommended pairings
- **A + B** (cap_pool race fix + msync verify) — small, surgical, both close iter5 carryovers, both prerequisites for clean upstream submission. Tightest scope.
- **A + C** (cap_pool fix + chromium-fourier verify) — closes the cap_pool race AND validates the libva backend on a second consumer family. Mid scope.
- **D alone** (upstream prep) — gated on operator instruction. Small if everything else is already clean (which iter5 mostly achieved).
- **E alone** (perf binding cell) — anchors campaign-wide claims to numbers. Carried five iterations.
- **F alone** (MPEG-2) — validates beyond H.264 scope.
- **G alone** (DMABUF) — high-risk architectural.
- **I alone** (Firefox QBUF EINVAL) — narrow, deterministic repro; gates the only known hard failure in a consumer running the iter5 amendment. Strong candidate for the iter6 lock.
## State that carries (re-verified iter5 close)
- **Hardware**: ohm RK3568 hantro G1/G2, kernel 6.19.10. Access: `ohm` (LAN) or `ohm.vpn`.
- **Userspace**: firefox 150.0.1-1.1 (iter5 G PGO-disabled fourier rebuild), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
- **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `4bed52ec5d44b389...` (iter5-end, post-cleanup).
- **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`. (For Track F MPEG-2: need new fixture.)
- **Build container**: firefox-fourier LXD on boltzmann, persistent.
## State that does NOT carry
- iter5 mpv stress logs are tmpfs-volatile.
- chromium-fourier 149 — uncertain whether it still builds against current Chromium upstream / mesa / etc. Track C's Phase 2 has to confirm or rebuild.
## Tooling and measurement-instrument inventory
Same as iter5. Plus for new candidates:
- For A (cap_pool race fix): write a probe-pattern test program (`vaCreateSurfaces` 16x16 then 1920x1080 in tight succession) to deterministically trigger the race. Quicker repro than waiting for mpv libplacebo to randomly hit it.
- For B (msync verify): need a frame-hash baseline pre-iter5-sweep. Could use git checkout to a pre-c8b6ede commit + capture, then compare to iter5-end.
- For C (chromium-fourier): need to find/rebuild chromium-fourier. Operator's `~/src/chromium-fourier/` likely has the build.
- For E (perf): `pidstat -u` for CPU%, Mali-G52 freq via `/sys/class/devfreq/fde60000.gpu`.
- For F (MPEG-2): need an MPEG-2 fixture (`mpv --dump-stream` from a public DVD or transcode bbb to MPEG-2).
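
For the Mali-G52 frequency under E, `cur_freq` is the standard devfreq sysfs attribute (in Hz), so sampling it at the same cadence as the CPU figures is enough. A minimal sketch with a hypothetical helper name, assuming the node path listed for E (`/sys/class/devfreq/fde60000.gpu/cur_freq`):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one unsigned integer from a sysfs attribute,
 * e.g. /sys/class/devfreq/fde60000.gpu/cur_freq (Hz). 0 on success. */
static int read_sysfs_ull(const char *path, unsigned long long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%llu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```
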
## In-scope (LOCKED 2026-05-05 for iteration 6): A∪I (cap_pool / OUTPUT-buffer-recycle lifecycle)
Operator locked candidate **I**, then **merged with candidate A** after Phase 2 telemetry (2026-05-05) showed they are facets of the same underlying bug:
- **A (iter5 sonnet C4 carryover)**: cap_pool resolution-change race — REQBUFS-EBUSY when CAPTURE pool isn't drained before re-allocation. mpv libplacebo `--vo=gpu` triggers it via Vulkan-fallback resolution change.
- **I (iter6 candidate)**: Firefox `VIDIOC_QBUF` EINVAL on OUTPUT after a varying number of successful frames (1, 19, and 53 across three Phase 2 runs). The original "S_EXT_CTRLS EINVAL on frame 1" framing was transient state; the reproducible failure is at OUTPUT-buffer requeue with a rotating `source_index`. mpv-vaapi-copy (single-surface recycle) doesn't hit this; Firefox (multi-surface rotation through libva) does.

Why merge: both are buffer-pool / surface-recycle lifecycle issues in OUTPUT (and CAPTURE) drain ordering. Partial fixes risk merely shifting the symptom. The intended Phase 4 fix is a single coherent rework of cap_pool plus DQBUF/QBUF sequencing in the surface lifecycle.
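
One way to read the "drain before reuse" discipline applied to OUTPUT slots is explicit per-context ownership tracking: a slot returns to the free list only after VIDIOC_DQBUF hands it back. In videobuf2, which hantro uses, VIDIOC_QBUF of a buffer that is not in the dequeued state fails with EINVAL, so picking a slot via a rotating `source_index` while the kernel still owns it is one mechanism consistent with the Phase 2 signature; this is a hypothesis, not a confirmed root cause. A minimal sketch; `out_pool` and its functions are hypothetical and not taken from `picture.c`.

```c
#include <assert.h>
#include <stdbool.h>

#define OUT_SLOTS_MAX 32

enum out_slot_state { OUT_FREE, OUT_QUEUED };

/* Per-context ownership of OUTPUT (bitstream) buffer slots. */
struct out_pool {
    unsigned int n;
    enum out_slot_state state[OUT_SLOTS_MAX];
};

static void out_pool_init(struct out_pool *p, unsigned int n)
{
    assert(n <= OUT_SLOTS_MAX);
    p->n = n;
    for (unsigned int i = 0; i < n; i++)
        p->state[i] = OUT_FREE;
}

/* Pick a free slot and mark it queued; -1 means "DQBUF something first". */
static int out_pool_acquire(struct out_pool *p)
{
    for (unsigned int i = 0; i < p->n; i++) {
        if (p->state[i] == OUT_FREE) {
            p->state[i] = OUT_QUEUED;
            return (int)i;
        }
    }
    return -1;
}

/* Call after VIDIOC_DQBUF returns this index on the OUTPUT queue.
 * false flags a double release or bogus index: a lifecycle bug. */
static bool out_pool_release(struct out_pool *p, unsigned int idx)
{
    if (idx >= p->n || p->state[idx] != OUT_QUEUED)
        return false;
    p->state[idx] = OUT_FREE;
    return true;
}
```

Keeping the pool inside a per-context struct rather than in file-scope statics follows the Track E discipline of no new mutable global state.
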

Other candidates (B, C, D, E, F, G) are deferred to iter7+. H (fourier-fresnel) remains a separate top-level campaign.
## Out-of-scope (LOCKED 2026-05-05 for iteration 6)
- iter5-completed work (Track A sweep, Track G PGO rebuild, Track E multi-context, Track B mpv libplacebo) — done.
- iter4-completed work (Track A frame-11 EINVAL fix) — done.
- iter3-completed work (Track F sandbox patch) — done in firefox-fourier; upstream filing is iter6 candidate D.
- New codecs OUTSIDE H.264 / MPEG-2 (VP8/VP9/AV1/HEVC out per iter1 lock).
- New target hardware (fresnel, ampere) — separate campaign (H above).
## Phase 1 success criterion (LOCKED 2026-05-05 for iteration 6, AMENDED 2026-05-05 for A∪I merge)
> All three consumer paths must be GREEN on the iter6 driver:
>
> 1. **Firefox** (iter5-amend binary, sandbox enabled, `LIBVA_DRIVER_NAME=v4l2_request`) plays `bbb_1080p30_h264.mp4` for ≥30 seconds with HW decode engaged: zero `Unable to queue buffer` / `Unable to set control(s)` per-frame, `lsof /dev/video1` shows the Firefox Utility process holding the device, frames advance.
>
> 2. **mpv libplacebo `--vo=gpu`** runs ≥30 seconds on the same fixture without segfault and without REQBUFS-EBUSY events at init or resolution-change boundaries (carries iter5 sonnet C4 caveat to closure).
>
> 3. **mpv `--hwdec=vaapi-copy`** (regression check) decodes 2000 frames clean, identical pattern to iter5-end driver baseline (sha `4bed52ec5d44b389…`).
>
> Phase 5 sonnet review must confirm: (a) fix is libva-side (not consumer-specific kludge), (b) all three consumer paths verified, (c) no new mutable global state introduced (Track E discipline).
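
Part of criterion 1 can be checked mechanically from captured stderr. A minimal sketch with hypothetical helper names; it covers only the two failure strings, so the `lsof /dev/video1` ownership check and frame advancement still need separate verification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in a captured log. */
static size_t count_occurrences(const char *log, const char *needle)
{
    size_t n = 0;
    size_t len;

    assert(log != NULL && needle != NULL);
    len = strlen(needle);
    if (len == 0)
        return 0;
    for (const char *p = strstr(log, needle); p != NULL; p = strstr(p + len, needle))
        n++;
    return n;
}

/* Criterion-1 log gate: both failure messages must be absent. */
static int firefox_log_is_green(const char *log)
{
    return count_occurrences(log, "Unable to queue buffer") == 0 &&
           count_occurrences(log, "Unable to set control(s)") == 0;
}
```
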
## Phase 1 LOCKED. Iteration 6 proceeds.
iter6 = merged candidate **A∪I** (locked as **I** alone, amended after Phase 2 telemetry). Phases 2..8:
- Phase 2: situation analysis — strace the failing Firefox decode, identify the offending compound H.264 control payload field, compare to FFmpeg `v4l2_request_h264.c` and to mpv-vaapi-copy's payload at the same call site.
- Phase 3: baseline anchor — capture the iter5-amend driver stderr + about:support decoder backend strings on the failing case, snapshot pre-fix.
- Phase 4: plan + execute the fix in the libva-v4l2-request-fourier driver (or, if root cause is upstream Firefox, document and stop).
- Phase 5: sonnet review.
- Phase 6: deploy fixed driver to ohm.
- Phase 7: verify against the success criterion above.
- Phase 8: close.

After lock, iter6 phases 2..8 proceed autonomously per "Stop only if user is needed."
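
For the Phase 2 payload comparison, a byte-level diff over each control's full payload avoids stopping at a fixed prefix: the Phase 2 check compared the first 64 bytes, while compound controls such as the H.264 decode parameters (which carry a 16-entry DPB) are considerably larger. A minimal sketch, assuming both payloads have already been captured into memory with the same `size`; `first_diff` and `dump_window` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Offset of the first differing byte, or n if the payloads are identical. */
static size_t first_diff(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t i = 0;

    assert(a != NULL && b != NULL);
    while (i < n && a[i] == b[i])
        i++;
    return i;
}

/* Print both payloads side by side around offset `at`, marking differences. */
static void dump_window(const unsigned char *a, const unsigned char *b,
                        size_t n, size_t at, size_t radius)
{
    size_t lo = at > radius ? at - radius : 0;
    size_t hi = at + radius < n ? at + radius : n;

    for (size_t i = lo; i < hi; i++)
        printf("%04zx: %02x %02x%s\n", i, a[i], b[i], a[i] != b[i] ? "  <--" : "");
}
```
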