libva-multiplanar/phase0_findings_iter6.md
claude-noether c7c0bcae72 iter6 candidate I: enrich with YouTube avc1 telemetry
Re-tested the iter5-amend binary on YouTube with Enhancer for YouTube
forcing h264 (avc1). Captured a richer failure pattern:

- Multiple cap_pool_init events (4 decode attempts on one tab)
- Zero seccomp violations (Track F still GREEN under YT load)
- "Unable to set control(s): Invalid argument" — VIDIOC_S_EXT_CTRLS EINVAL
  on one of SPS/PPS/SLICE_PARAMS/DECODE_PARAMS/SCALING_MATRIX (cannot
  identify which without per-control TRY isolation, which iter5
  sweep d3a299b removed)
- "Unable to queue buffer: Invalid argument" — VIDIOC_QBUF EINVAL

The S_EXT_CTRLS EINVAL is the more diagnostic of the two — points
to a compound H.264 control payload mismatch between Firefox's libva
path and what mpv-vaapi-copy sends.

Updated iter6 candidate I:
- Added the YouTube failure pattern alongside the bbb pattern
- Step-1 of diagnostic plan: reinstate per-control TRY isolation
  temporarily (do not commit; diagnosis-only)
- Cross-reference with FFmpeg v4l2_request_h264.c per existing memory
- Reproducibility: now 100% on YT-avc1 OR bbb-h264

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 19:42:32 +00:00

# Iteration 6 — Phase 0 (substrate / motivation / inventory)
Opens 2026-05-05 immediately after iter5 close (`phase8_iteration5_close.md`, fork commit `c8b6ede`, campaign close `8e6d9e6`).
## Predecessor close-out summary (iteration 5 → iteration 6)
iter5 was a four-track iteration: A (DEBUG sweep) + G (PGO-disabled Firefox rebuild) + B (mpv libplacebo segfault) + E (multi-context safety). All four closed GREEN. Driver source clean (1 v4l2-request log line per 2000-frame stress, was ~30+ lines/frame pre-iter5). Process-global mutable state eliminated. Firefox-fourier 150.0.1-1.1 deployed (169 MB libxul, 21× smaller than iter3 PGO-instrumented).
The campaign's original substrate question — "make multi-planar libva work end-to-end on Rockchip hantro for production VAAPI consumers" — is empirically achieved at the libva-side decode layer. iter6 is about closing the remaining quality + verification gates and (gated on operator instruction) the upstream prep work.
## Iteration 6 candidate research questions
### A. Cap_pool resolution-change race (carried from iter5 Phase 5 sonnet C4)
> Fix the latent REQBUFS-EBUSY race in `CreateSurfaces2`. When a libva consumer probes with `vaCreateSurfaces(N, M)` and then re-allocates with different dimensions while CAPTURE STREAMON is active, the cap_pool drain does not fully complete before REQBUFS(0) on OUTPUT. The kernel returns EBUSY, the driver pushes ahead with garbage `sizes[1]` (uninitialized memory), and the consumer hits SIGSEGV or falls back to SW.
**Empirical signature** (iter3 substrate, iter5 Phase 7B re-test): `Unable to request buffers: Device or resource busy` at init, then `cap_pool_init: query_buffer failed for slot N`, then ffmpeg `AVHWFramesContext: Failed to create surface: 2 (resource allocation failed)`. mpv recovers via SW fallback (no SIGSEGV in iter5 — the consumer happens to handle it). Other consumers (chromium-style: probe + real-alloc + STREAMON in tight succession) might not.
**Fix shape**: in `surface.c::CreateSurfaces2` resolution-change branch, ensure ordering is `STREAMOFF on CAPTURE → REQBUFS(0) on CAPTURE → STREAMOFF on OUTPUT → REQBUFS(0) on OUTPUT → S_FMT on OUTPUT → REQBUFS(N) on OUTPUT`. ~30 lines. The current code calls `cap_pool_destroy` (issues REQBUFS(0) on CAPTURE) then `v4l2_request_buffers(output, 0)` but doesn't STREAMOFF first — kernel rejects REQBUFS(0) on a streaming queue.
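
The ordering argument can be checked against a toy model before touching `surface.c`. The sketch below is not driver code: `FakeQueue`, `teardown_current` and `teardown_proposed` are hypothetical names, and the only kernel behaviour modelled is the one stated above (REQBUFS on a streaming queue returns EBUSY). S_FMT is left as a comment because the model carries no format state.

```python
import errno


class FakeQueue:
    """Toy model of one V4L2 buffer queue: REQBUFS fails with EBUSY while streaming."""

    def __init__(self, name, streaming=False, buffers=0):
        self.name = name
        self.streaming = streaming
        self.buffers = buffers

    def streamoff(self):
        self.streaming = False
        return 0

    def reqbufs(self, count):
        if self.streaming:
            return -errno.EBUSY
        self.buffers = count
        return 0


def teardown_current(capture, output):
    """Order described for the current code: REQBUFS(0) on both queues, no STREAMOFF."""
    return [capture.reqbufs(0), output.reqbufs(0)]


def teardown_proposed(capture, output, new_count):
    """Proposed order: STREAMOFF before REQBUFS(0) on each queue, then re-allocate OUTPUT."""
    return [
        capture.streamoff(),
        capture.reqbufs(0),
        output.streamoff(),
        output.reqbufs(0),
        # S_FMT on OUTPUT would go here; the toy model has no format state.
        output.reqbufs(new_count),
    ]
```

With both queues streaming, `teardown_current` yields EBUSY results, while `teardown_proposed` returns only zeros and leaves OUTPUT holding `new_count` buffers. The model says nothing about drain timing inside the real kernel; it only makes the call order explicit.
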
**Why first**: highest-leverage closure of a known-broken edge case. Required for cleaner upstream submission. Small scope.
### B. msync removal pixel-correctness verification (carried from iter5 Phase 5 sonnet C3)
> Add a frame-hash spot check to confirm the iter5 sweep's msync(MS_SYNC|MS_INVALIDATE) removal in `surface.c::RequestSyncSurface` doesn't silently corrupt CAPTURE buffer reads on this CMA-backed hantro setup.
**Plan**: extend mpv vaapi-copy stress test to checksum the first-frame Y-plane and compare against a known-good baseline captured pre-removal. If they match, msync removal is verified. If they diverge, restore the msync + investigate.
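
A minimal sketch of the checksum step, assuming the first frame has already been dumped to a raw buffer with the luma plane first and a known row stride (NV12-like linear layout). The function names and the choice of SHA-256 are illustrative assumptions, not something the harness already contains. Only visible pixels are hashed, so differing stride padding between the two builds cannot produce a false mismatch.

```python
import hashlib


def y_plane_sha256(frame: bytes, width: int, height: int, stride: int) -> str:
    """Hash only the visible luma pixels, skipping per-row stride padding."""
    if stride < width:
        raise ValueError("stride must be >= width")
    if len(frame) < stride * height:
        raise ValueError("frame is shorter than the Y plane")
    digest = hashlib.sha256()
    view = memoryview(frame)
    for row in range(height):
        start = row * stride
        digest.update(view[start:start + width])
    return digest.hexdigest()


def matches_baseline(frame, width, height, stride, baseline_hex):
    """True when the frame's Y-plane hash equals the recorded baseline hash."""
    return y_plane_sha256(frame, width, height, stride) == baseline_hex
```

Typical use would be to record `y_plane_sha256(...)` once on the pre-removal build, store the hex string next to the fixture, and call `matches_baseline(...)` on the post-removal build.
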
**Fast**: ~1 hour total (capture baseline, write checksum harness, run side-by-side).
### C. Verify chromium-fourier + iter5 driver compatibility (NEW from iter5)
> Run the existing chromium-fourier 149 (`~/src/chromium-fourier/` if it exists, or rebuild from chromium-fourier patch series) against the iter5-end libva-v4l2-request-fourier driver. Determine whether iter4+iter5 libva-side fixes obviate any of chromium-fourier's Step-1 patches.
**Why this question exists**: stock Brave on PineTab2 doesn't engage VAAPI at all (its GPU process dies at GL bindings init — `InitializeStaticGLBindingsOneOff failed`, `GLES3 is unsupported`). chromium-fourier's value is partially the GL-stack fix that gets Chromium's GPU process running, separate from any libva-side patches it carries. With iter4 (DPB FFmpeg-semantics, fresh request_fd) and iter5 (debug-clean, multi-context safe) landed, some chromium-fourier libva-side patches may now be redundant.
**Tested in iter5**: stock Brave under three GL configurations (default, `--use-gl=egl`, `--use-gl=desktop`) fails at GL bindings init every time. The failure mode differs per configuration, but none of them reaches VAAPI or the decoder.
**Diagnostic plan**:
1. Confirm chromium-fourier 149 still builds and runs on ohm
2. Point it at `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` with the iter5 driver
3. Capture stderr for `v4l2-request:` and `vaapi:` traces
4. If decode works: which chromium-fourier libva-side patches are still load-bearing? (revert + retest each)
5. Surface a "minimal chromium-fourier" patch set if some patches are obsoleted
**Output**: an updated chromium-fourier patch matrix (which patches are still needed; which iter4/iter5 obsoletes).
### D. Bootlin / Mozilla upstreaming prep (carried from iter3 candidate G + iter4 + iter5)
> File the Mozilla Bugzilla bug for `/dev/media*` + V4L2-stateless RDD sandbox with the firefox-fourier patch. File a bootlin issue on `bootlin/libva-v4l2-request` with iter1+2+3+4+5 patches as a cohesive working set.
**Why now**: the iter5 sweep + multi-context fix shifts the fork toward upstream-readiness. Patches are clean. Mozilla bug doesn't exist yet (per iter3 Phase 0 Sonnet research). bootlin upstream is dormant since 2021 but the fork is now substantially ahead with empirically-verified fixes.
**Stance**: per `feedback_no_upstream.md`, gated on explicit operator instruction. Listed for completeness.
### E. Performance binding cell (carried from iter1+2+3+4+5)
> Establish a measurement protocol: drop counts, effective FPS, browser CPU%, scanout-plane residency for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW (with iter5 G's PGO-disabled binary), SW baseline}. Anchor in `phaseN_evidence/`.
**Why**: anchors all iter1..iter5 claims to numbers. Carried five iterations. iter5 G's PGO-disabled Firefox is now deployed, so this is the first iteration that has both a clean driver AND a release-quality Firefox to measure with.
### F. Multi-codec audit (carried from iter1 lock backlog)
> Verify MPEG-2 decode path. iter1 lock said "H.264 first; MPEG-2 next." iter1+2+3+4+5 all H.264-only.
**Plan**: find an MPEG-2 test fixture, decode via mpv vaapi-copy + vainfo. Verify hantro G1's MPEG-2 path through libva-v4l2-request-fourier. Surface any codec-specific bugs (the iter4 DPB+request_fd fixes were H.264-specific; MPEG-2 has different control flow).
### I. Firefox VIDIOC_QBUF EINVAL on first frame (NEW from iter5 amendment 2026-05-05)
> The iter5 amendment (Utility seccomp fix) closes Track F: sandbox no longer blocks the V4L2 request API. With sandbox open, Firefox loads the v4l2_request driver, `cap_pool_init: 24 slots ready` succeeds, then a single `Unable to queue buffer: Invalid argument` (VIDIOC_QBUF EINVAL) on what looks like the first frame, after which Firefox falls back to SW.
**Why this is a new candidate, not a Track A regression**: mpv `--hwdec=vaapi-copy` decoded 2000 frames clean on the same iter5-end driver build (sha `4bed52ec5d44b389…`). Only Firefox triggers the EINVAL. So the bug lives in the Firefox-specific consumer path through libva, not in the per-frame request_fd/DPB logic that iter4 closed.
**Failure modes observed (iter5-amend telemetry, 2026-05-05)**:
```
bbb_1080p30_h264.mp4 (single decode attempt):
v4l2-request: cap_pool_init: 24 slots ready
v4l2-request: Unable to queue buffer: Invalid argument
YouTube avc1 (Enhancer for YouTube forcing h264, multiple decode attempts on same tab):
v4l2-request: cap_pool_init: 24 slots ready (×4)
v4l2-request: Unable to set control(s): Invalid argument
v4l2-request: Unable to queue buffer: Invalid argument
```
The S_EXT_CTRLS EINVAL is the more diagnostic signal — one of the compound H.264 controls (SPS/PPS/SLICE_PARAMS/DECODE_PARAMS/SCALING_MATRIX) is being rejected by the kernel. iter5 sweep removed the iter4 per-control TRY isolation (`d3a299b`); reinstating it temporarily (or strace) is iter6 step one.
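
The two excerpts above can be told apart mechanically, which helps when comparing many captured runs. The sketch below counts the three message types per log and applies the rule that a rejected control outranks a QBUF failure. The category labels and function names are hypothetical, and it assumes each event appears on its own `v4l2-request:` line (the `(×4)` in the excerpt is an annotation, not log output).

```python
from collections import Counter

# Each signature is a tuple of substrings that must all appear on one log line.
SIGNATURES = {
    "cap_pool_ready": ("cap_pool_init:", "slots ready"),
    "s_ext_ctrls_einval": ("Unable to set control(s): Invalid argument",),
    "qbuf_einval": ("Unable to queue buffer: Invalid argument",),
}


def summarize_log(log_text):
    """Count known driver messages in a captured stderr log."""
    counts = Counter()
    for line in log_text.splitlines():
        if "v4l2-request:" not in line:
            continue
        for name, parts in SIGNATURES.items():
            if all(part in line for part in parts):
                counts[name] += 1
    return counts


def classify(counts):
    """Label a run by its most diagnostic failure; a control rejection takes priority."""
    if counts["s_ext_ctrls_einval"]:
        return "control-rejected"
    if counts["qbuf_einval"]:
        return "qbuf-only"
    return "clean"
```

Under this rule the bbb excerpt classifies as `qbuf-only` and the YouTube excerpt as `control-rejected` with four `cap_pool_ready` events.
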
**Diagnostic plan**:
1. Reinstate per-control VIDIOC_TRY_EXT_CTRLS isolation in v4l2.c (the iter4 diagnostic that pinpointed which compound control fails). Do not commit; use only for diagnosis.
2. Strace the Firefox Utility process during decode init. Capture exact `v4l2_ext_control` payloads. Cross-reference with FFmpeg's `v4l2_request_h264.c` (cached at `references/ffmpeg-kwiboo/`, the empirical authority per memory).
3. Once the offending control is identified, look at how Firefox's libva path constructs it differently from mpv-vaapi-copy. Likely: a header field Firefox doesn't populate, an order-of-operations difference, or a VAImage / VAExportSurfaceHandle side effect that perturbs driver state.
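
The isolation idea in step 1 is simple enough to state as a loop: submit each compound control alone and record which ones are rejected. The sketch below models that logic only. In the driver, `try_one` would wrap a single-control `VIDIOC_TRY_EXT_CTRLS` call in `v4l2.c`; here it is a plain callback, and `make_fake_try` is a demonstration stand-in for the kernel. All names are hypothetical.

```python
import errno

# The five compound H.264 controls named in the failure analysis.
H264_CONTROL_NAMES = ("SPS", "PPS", "SLICE_PARAMS", "DECODE_PARAMS", "SCALING_MATRIX")


def isolate_rejected(controls, try_one):
    """Try each control on its own and return {name: negative errno} for the rejected ones.

    controls: mapping of control name to payload bytes.
    try_one:  callable(name, payload) returning 0 on success or a negative errno.
    """
    rejected = {}
    for name, payload in controls.items():
        rc = try_one(name, payload)
        if rc != 0:
            rejected[name] = rc
    return rejected


def make_fake_try(rejected_names):
    """Stand-in for the kernel, for demonstration only: rejects the listed names."""
    def try_one(name, payload):
        return -errno.EINVAL if name in rejected_names else 0
    return try_one
```

Trying controls one at a time trades a few extra ioctls per frame for an unambiguous answer, which is why it suits a diagnosis-only build rather than a committed change.
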
**Reproducibility**: 100% on either `file:///home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` OR YouTube avc1 (with enhanced-h264ify or Enhancer for YouTube forcing h264) + `LIBVA_DRIVER_NAME=v4l2_request` env vars + iter5-amend Firefox 150 + sandbox enabled.
**Risk**: medium. Could be a 5-line fix in v4l2.c QBUF prep, or could surface a fundamental Firefox-vs-mpv divergence in libva surface management.
### G. V4L2_MEMORY_DMABUF (carried from iter2+3+4+5)
> Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation (iter2 Fix 3 was statistical / LRU mitigation; this is architectural).
**Why**: closes the race window by construction instead of shrinking it statistically. It also opens a significant kernel-side test surface: does hantro on this kernel actually accept the DMABUF memory type? GStreamer's v4l2slh264dec uses MMAP, so DMABUF on hantro may not be tested upstream.
**Risk**: highest unknown of any candidate. Possibly requires kernel work.
### H. fourier-fresnel campaign (carried from iter5 followon-campaigns memory)
> Open the `fourier-fresnel` campaign — port libva-v4l2-request-fourier from ohm RK3568 to fresnel RK3399 (Pinebook Pro). Validates generality of iter1+2+3+4+5 fixes on a second hardware target.
**Stance**: this is a **separate top-level campaign**, not an iter6 candidate. Listed here for completeness because operator may pick it as the next thing to open (sequenced ahead of `panvk-bifrost`). Charter at `~/src/fourier-fresnel/` once opened.
### Recommended pairings
- **A + B** (cap_pool race fix + msync verify) — small, surgical, both close iter5 carryovers, both prerequisites for clean upstream submission. Tightest scope.
- **A + C** (cap_pool fix + chromium-fourier verify) — closes the cap_pool race AND validates the libva backend on a second consumer family. Mid scope.
- **D alone** (upstream prep) — gated on operator instruction. Small if everything else is already clean (which iter5 mostly achieved).
- **E alone** (perf binding cell) — anchors campaign-wide claims to numbers. Carried five iterations.
- **F alone** (MPEG-2) — validates beyond H.264 scope.
- **G alone** (DMABUF) — high-risk architectural.
- **I alone** (Firefox QBUF EINVAL) — narrow, deterministic repro, gates the only known consumer-with-iter5-amendment hard-failure. Strong candidate for the iter6 lock.
## State that carries (re-verified iter5 close)
- **Hardware**: ohm RK3568 hantro G1/G2, kernel 6.19.10. Access: `ohm` (LAN) or `ohm.vpn`.
- **Userspace**: firefox 150.0.1-1.1 (iter5 G PGO-disabled fourier rebuild), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
- **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `4bed52ec5d44b389...` (iter5-end, post-cleanup).
- **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`. (Candidate F, MPEG-2, needs a new fixture.)
- **Build container**: firefox-fourier LXD on boltzmann, persistent.
## State that does NOT carry
- iter5 mpv stress logs are tmpfs-volatile.
- chromium-fourier 149: uncertain whether it still builds against current Chromium upstream, mesa, and the rest of the stack. Candidate C's Phase 2 has to confirm or rebuild.
## Tooling and measurement-instrument inventory
Same as iter5. Plus for new candidates:
- For A (cap_pool race fix): write a probe-pattern test program (`vaCreateSurfaces` 16x16 then 1920x1080 in tight succession) to deterministically trigger the race. Quicker repro than waiting for mpv libplacebo to randomly hit it.
- For B (msync verify): need a frame-hash baseline pre-iter5-sweep. Could use git checkout to a pre-c8b6ede commit + capture, then compare to iter5-end.
- For C (chromium-fourier): need to find/rebuild chromium-fourier. Operator's `~/src/chromium-fourier/` likely has the build.
- For E (perf): `pidstat -u` for CPU%, Mali-G52 freq via `/sys/class/devfreq/fde60000.gpu`.
- For F (MPEG-2): need an MPEG-2 fixture (`mpv --stream-dump=<file>` from a public DVD, or transcode bbb to MPEG-2).
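
For the GPU-frequency half of the perf instrument, a minimal sampling sketch follows. It assumes the devfreq directory exposes the standard `cur_freq` attribute (typically reported in Hz); the function names, sample count, and interval are illustrative choices, not an existing script.

```python
import time
from pathlib import Path


def sample_devfreq(devfreq_dir, samples=5, interval_s=0.2):
    """Read <devfreq_dir>/cur_freq repeatedly and return the readings as integers."""
    path = Path(devfreq_dir) / "cur_freq"
    readings = []
    for i in range(samples):
        readings.append(int(path.read_text().strip()))
        if i + 1 < samples:
            time.sleep(interval_s)
    return readings


def freq_summary(readings):
    """Reduce a list of frequency readings to min, max, and mean."""
    if not readings:
        raise ValueError("no readings")
    return {
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }
```

On ohm this would be called as `freq_summary(sample_devfreq("/sys/class/devfreq/fde60000.gpu"))` while the decode test runs, alongside `pidstat -u` for CPU%.
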
## In-scope (LOCKING DEFERRED — Phase 1 user input)
To be locked at Phase 1 from candidates A..G and I above. H is a separate top-level campaign decision, not an iter6 candidate.
## Out-of-scope (LOCKED 2026-05-05 for iteration 6)
- iter5-completed work (Track A sweep, Track G PGO rebuild, Track E multi-context, Track B mpv libplacebo) — done.
- iter4-completed work (Track A frame-11 EINVAL fix) — done.
- iter3-completed work (Track F sandbox patch) — done in firefox-fourier; upstream filing is iter6 candidate D.
- New codecs OUTSIDE H.264 / MPEG-2 (VP8/VP9/AV1/HEVC out per iter1 lock).
- New target hardware (fresnel, ampere) — separate campaign (H above).
## Phase 1 success criterion (will lock after user picks candidate)
Pre-lock template:
- For candidate A: "`vaCreateSurfaces` race-probe test program runs cleanly with no REQBUFS EBUSY events; mpv libplacebo --vo=gpu test still GREEN; iter5-end smoke test still 0 EINVAL."
- For candidate B: "vaapi-copy 100-frame run produces frame-hash matching pre-iter5-sweep baseline; OR if divergence, msync restored and verified."
- For candidate C: "chromium-fourier 149 builds + runs against iter5 driver. `v4l2-request:` traces present in stderr. Decode works at ≥24 fps. Patch matrix updated indicating which chromium-fourier libva-side patches are obsolete vs still required."
- For candidate D: "Mozilla Bugzilla bug filed with firefox-fourier patch attached; bootlin issue filed against libva-v4l2-request with iter1-iter5 patch series."
- For candidate E: "Anchored perf table for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW, SW baseline} across drop count + CPU% + frame timing on bbb_1080p30. Reproducible from documented script."
- For candidate F: "MPEG-2 fixture decodes through hantro G1 via libva-v4l2-request-fourier without crashes / EINVAL."
- For candidate G: "vaapi-copy + vaapi --vo=null still produce real frames with V4L2_MEMORY_DMABUF-backed CAPTURE buffers; race window architecturally closed."
## Stop point
**Phase 1 lock requires user input**: pick from A..G and I (and any pairing).
Recommended primary: **A + B** (cap_pool race fix + msync verify) — both are iter5 sonnet review caveats, both small scope, both prerequisites for clean upstream submission. If the operator wants to make iter6 the upstream-prep iteration, **A + B + D** (the pair plus upstream filings) is the natural next step.
Alternative leans:
- **C alone** if "do we still need chromium-fourier?" is the more pressing question
- **E alone** if perf measurement matters more than upstream prep
- **F alone** if multi-codec coverage is a higher priority than refinement
After lock, iter6 phases 2..8 proceed autonomously per "Stop only if user is needed."