c7c0bcae72
Re-tested the iter5-amend binary on YouTube with Enhancer for YouTube forcing h264 (avc1). Captured a richer failure pattern:

- Multiple cap_pool_init events (4 decode attempts on one tab)
- Zero seccomp violations (Track F still GREEN under YT load)
- "Unable to set control(s): Invalid argument": VIDIOC_S_EXT_CTRLS EINVAL on one of SPS/PPS/SLICE_PARAMS/DECODE_PARAMS/SCALING_MATRIX (cannot identify which without per-control TRY isolation, which iter5 sweep d3a299b removed)
- "Unable to queue buffer: Invalid argument": VIDIOC_QBUF EINVAL

The S_EXT_CTRLS EINVAL is the more diagnostic of the two: it points to a compound H.264 control payload mismatch between Firefox's libva path and what mpv-vaapi-copy sends.

Updated iter6 candidate I:

- Added the YouTube failure pattern alongside the bbb pattern
- Step-1 of diagnostic plan: reinstate per-control TRY isolation temporarily (do not commit; diagnosis-only)
- Cross-reference with FFmpeg v4l2_request_h264.c per existing memory
- Reproducibility: now 100% on YT-avc1 OR bbb-h264

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
178 lines
15 KiB
Markdown
# Iteration 6 — Phase 0 (substrate / motivation / inventory)

Opens 2026-05-05 immediately after iter5 close (`phase8_iteration5_close.md`, fork commit `c8b6ede`, campaign close `8e6d9e6`).

## Predecessor close-out summary (iteration 5 → iteration 6)

iter5 was a four-track iteration: A (DEBUG sweep) + G (PGO-disabled Firefox rebuild) + B (mpv libplacebo segfault) + E (multi-context safety). All four closed GREEN. Driver source clean (1 v4l2-request log line per 2000-frame stress, was ~30+ lines/frame pre-iter5). Process-global mutable state eliminated. Firefox-fourier 150.0.1-1.1 deployed (169 MB libxul, 21× smaller than iter3 PGO-instrumented).

The campaign's original substrate question, "make multi-planar libva work end-to-end on Rockchip hantro for production VAAPI consumers", is empirically achieved at the libva-side decode layer. iter6 is about closing the remaining quality + verification gates and (gated on operator instruction) the upstream prep work.

## Iteration 6 candidate research questions

### A. Cap_pool resolution-change race (carried from iter5 Phase 5 sonnet C4)

> Fix the latent REQBUFS-EBUSY race in `CreateSurfaces2`. When a libva consumer probes with `vaCreateSurfaces(N, M)` then re-allocates with different dimensions while CAPTURE STREAMON is active, the cap_pool drain doesn't fully complete before REQBUFS(0) on OUTPUT. The kernel returns EBUSY, the driver pushes ahead with garbage `sizes[1]` (uninitialized memory shape), and the consumer hits SIGSEGV or falls back to SW.

**Empirical signature** (iter3 substrate, iter5 Phase 7B re-test): `Unable to request buffers: Device or resource busy` at init, then `cap_pool_init: query_buffer failed for slot N`, then ffmpeg `AVHWFramesContext: Failed to create surface: 2 (resource allocation failed)`. mpv recovers via SW fallback (no SIGSEGV in iter5; the consumer happens to handle it). Other consumers (chromium-style: probe + real-alloc + STREAMON in tight succession) might not.

**Fix shape**: in the `surface.c::CreateSurfaces2` resolution-change branch, ensure the ordering is `STREAMOFF on CAPTURE → REQBUFS(0) on CAPTURE → STREAMOFF on OUTPUT → REQBUFS(0) on OUTPUT → S_FMT on OUTPUT → REQBUFS(N) on OUTPUT`. ~30 lines. The current code calls `cap_pool_destroy` (issues REQBUFS(0) on CAPTURE) then `v4l2_request_buffers(output, 0)` but doesn't STREAMOFF first, so the kernel rejects REQBUFS(0) on a streaming queue.

**Why first**: highest-leverage closure of a known-broken edge case. Required for cleaner upstream submission. Small scope.

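The fix-shape ordering above can be checked with a toy model. This is a minimal Python sketch, not driver code: it encodes one assumed kernel rule (REQBUFS on a queue that is still streaming fails with EBUSY) and assumes both queues are streaming on entry. `first_ebusy`, `FIXED_ORDER`, and `CURRENT_ORDER` are illustrative names, not identifiers from `surface.c`.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str]  # (operation, queue)

# Ordering quoted from the fix shape.
FIXED_ORDER: List[Step] = [
    ("STREAMOFF", "CAPTURE"),
    ("REQBUFS(0)", "CAPTURE"),
    ("STREAMOFF", "OUTPUT"),
    ("REQBUFS(0)", "OUTPUT"),
    ("S_FMT", "OUTPUT"),
    ("REQBUFS(N)", "OUTPUT"),
]

# Ordering described for the current code: no STREAMOFF before REQBUFS(0).
CURRENT_ORDER: List[Step] = [
    ("REQBUFS(0)", "CAPTURE"),
    ("REQBUFS(0)", "OUTPUT"),
]


def first_ebusy(steps: List[Step],
                streaming: Tuple[str, ...] = ("CAPTURE", "OUTPUT")) -> Optional[Step]:
    """Return the first step the modelled kernel would reject, or None.

    Assumption: the only rule modelled is "REQBUFS on a streaming queue
    returns EBUSY". Other kernel checks (e.g. on S_FMT) are not modelled.
    """
    active = set(streaming)
    for op, queue in steps:
        if op == "STREAMOFF":
            active.discard(queue)
        elif op.startswith("REQBUFS") and queue in active:
            return (op, queue)
    return None
```

Under these assumptions `first_ebusy(FIXED_ORDER)` finds no rejected step, while `first_ebusy(CURRENT_ORDER)` flags the first REQBUFS(0). Which queue fails first on real hardware depends on which queues are actually streaming at that moment.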
### B. msync removal pixel-correctness verification (carried from iter5 Phase 5 sonnet C3)

> Add a frame-hash spot check to confirm the iter5 sweep's msync(MS_SYNC|MS_INVALIDATE) removal in `surface.c::RequestSyncSurface` doesn't silently corrupt CAPTURE buffer reads on this CMA-backed hantro setup.

**Plan**: extend mpv vaapi-copy stress test to checksum the first-frame Y-plane and compare against a known-good baseline captured pre-removal. If they match, msync removal is verified. If they diverge, restore the msync + investigate.

**Fast**: ~1 hour total (capture baseline, write checksum harness, run side-by-side).

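The checksum harness in the plan above might look roughly like this. It is a sketch under stated assumptions: the frame is available as raw bytes in an 8-bit layout with the Y plane first (e.g. NV12), and `y_plane_sha256` / `matches_baseline` are hypothetical helper names. How the raw frame is dumped from mpv is left open.

```python
import hashlib


def y_plane_sha256(frame: bytes, width: int, height: int, stride: int = 0) -> str:
    """Hash only the visible luma samples of one raw frame.

    Assumes the Y plane comes first and each row occupies `stride` bytes
    (defaults to `width`). Row padding is skipped so that a stride change
    alone does not alter the hash.
    """
    stride = stride or width
    if stride < width or len(frame) < stride * height:
        raise ValueError("frame too small for the given geometry")
    digest = hashlib.sha256()
    for row in range(height):
        start = row * stride
        digest.update(frame[start:start + width])
    return digest.hexdigest()


def matches_baseline(frame: bytes, baseline_hex: str,
                     width: int, height: int, stride: int = 0) -> bool:
    """Compare a frame's Y-plane hash against a stored baseline digest."""
    return y_plane_sha256(frame, width, height, stride) == baseline_hex
```

Hashing only the visible Y samples keeps the comparison independent of chroma layout and row padding, which may differ between the pre-removal and post-removal captures.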
### C. Verify chromium-fourier + iter5 driver compatibility (NEW from iter5)

> Run the existing chromium-fourier 149 (`~/src/chromium-fourier/` if it exists, or rebuild from chromium-fourier patch series) against the iter5-end libva-v4l2-request-fourier driver. Determine whether iter4+iter5 libva-side fixes obviate any of chromium-fourier's Step-1 patches.

**Why this question exists**: stock Brave on PineTab2 doesn't engage VAAPI at all (its GPU process dies at GL bindings init: `InitializeStaticGLBindingsOneOff failed`, `GLES3 is unsupported`). chromium-fourier's value is partially the GL-stack fix that gets Chromium's GPU process running, separate from any libva-side patches it carries. With iter4 (DPB FFmpeg-semantics, fresh request_fd) and iter5 (debug-clean, multi-context safe) landed, some chromium-fourier libva-side patches may now be redundant.

**Tested in iter5**: stock Brave with three GL configurations (default, `--use-gl=egl`, `--use-gl=desktop`) fails at GL bindings init in every case. The failure mode differs for each, but none reaches the decoder.

**Diagnostic plan**:

1. Confirm chromium-fourier 149 still builds and runs on ohm
2. Point it at `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` with iter5 driver
3. Capture stderr for `v4l2-request:` + `vaapi:` traces
4. If decode works: which chromium-fourier libva-side patches are still load-bearing? (revert + retest each)
5. Surface a "minimal chromium-fourier" patch set if some patches are obsoleted

**Output**: an updated chromium-fourier patch matrix (which patches are still needed; which iter4/iter5 obsoletes).

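Step 4 of the diagnostic plan (revert + retest each patch) is a leave-one-out loop. The sketch below shows only that control flow. `decode_ok` is a hypothetical hook standing in for "build with these patches and run the bbb fixture", and the patch names used in any call are placeholders, not real chromium-fourier patch names.

```python
from typing import Callable, Dict, List


def classify_patches(patches: List[str],
                     decode_ok: Callable[[List[str]], bool]) -> Dict[str, str]:
    """Revert each patch in turn and record whether decode still works.

    Returns a patch matrix mapping patch name to "load-bearing" or
    "obsolete-candidate". Leave-one-out cannot see interactions: two
    patches that are each redundant alone may not be removable together,
    so a final retest of the reduced set is still needed.
    """
    if not decode_ok(list(patches)):
        raise RuntimeError("baseline with all patches applied does not decode")
    matrix: Dict[str, str] = {}
    for patch in patches:
        remaining = [p for p in patches if p != patch]
        matrix[patch] = "obsolete-candidate" if decode_ok(remaining) else "load-bearing"
    return matrix
```

The result maps directly onto the patch matrix named as this candidate's output.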
### D. Bootlin / Mozilla upstreaming prep (carried from iter3 candidate G + iter4 + iter5)

> File the Mozilla Bugzilla bug for `/dev/media*` + V4L2-stateless RDD sandbox with the firefox-fourier patch. File a bootlin issue on `bootlin/libva-v4l2-request` with iter1+2+3+4+5 patches as a cohesive working set.

**Why now**: the iter5 sweep + multi-context fix shifts the fork toward upstream-readiness. Patches are clean. Mozilla bug doesn't exist yet (per iter3 Phase 0 Sonnet research). bootlin upstream is dormant since 2021 but the fork is now substantially ahead with empirically-verified fixes.

**Stance**: per `feedback_no_upstream.md`, gated on explicit operator instruction. Listed for completeness.

### E. Performance binding cell (carried from iter1+2+3+4+5)

> Establish a measurement protocol: drop counts, effective FPS, browser CPU%, scanout-plane residency for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW (with iter5 G's PGO-disabled binary), SW baseline}. Anchor in `phaseN_evidence/`.

**Why**: anchors all iter1–iter5 claims to numbers. Carried five iterations. iter5 G's PGO-disabled Firefox is now deployed, so this is the first iteration that has both a clean driver AND a release-quality Firefox to measure with.

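One possible shape for a row of the measurement table is sketched below. The field names and the `PerfSample` class are assumptions made for illustration; the protocol above does not fix a schema, and scanout-plane residency is omitted because the text does not say how it would be sampled.

```python
from dataclasses import dataclass


@dataclass
class PerfSample:
    consumer: str          # e.g. "mpv vaapi-copy" or "SW baseline"
    frames_expected: int   # frames the clip should present during the run
    frames_dropped: int    # drop count reported by the consumer
    wall_seconds: float    # wall-clock duration of the run
    cpu_percent: float     # average CPU%, e.g. taken from `pidstat -u`

    @property
    def effective_fps(self) -> float:
        """Frames actually presented per second of wall-clock time."""
        if self.wall_seconds <= 0:
            raise ValueError("wall_seconds must be positive")
        return (self.frames_expected - self.frames_dropped) / self.wall_seconds

    @property
    def drop_ratio(self) -> float:
        """Fraction of expected frames that were dropped."""
        if self.frames_expected <= 0:
            raise ValueError("frames_expected must be positive")
        return self.frames_dropped / self.frames_expected
```

Deriving effective FPS from the drop count and run duration, instead of recording it separately, keeps the two numbers consistent across consumers.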
### F. Multi-codec audit (carried from iter1 lock backlog)

> Verify MPEG-2 decode path. iter1 lock said "H.264 first; MPEG-2 next." iter1+2+3+4+5 all H.264-only.

**Plan**: find an MPEG-2 test fixture, decode via mpv vaapi-copy + vainfo. Verify hantro G1's MPEG-2 path through libva-v4l2-request-fourier. Surface any codec-specific bugs (the iter4 DPB+request_fd fixes were H.264-specific; MPEG-2 has different control flow).

### I. Firefox VIDIOC_QBUF EINVAL on first frame (NEW from iter5 amendment 2026-05-05)

> The iter5 amendment (Utility seccomp fix) closes Track F: sandbox no longer blocks the V4L2 request API. With sandbox open, Firefox loads the v4l2_request driver, `cap_pool_init: 24 slots ready` succeeds, then a single `Unable to queue buffer: Invalid argument` (VIDIOC_QBUF EINVAL) on what looks like the first frame, after which Firefox falls back to SW.

**Why this is a new candidate, not a Track A regression**: mpv `--hwdec=vaapi-copy` decoded 2000 frames clean on the same iter5-end driver build (sha `4bed52ec5d44b389…`). Only Firefox triggers the EINVAL. So the bug lives in the Firefox-specific consumer path through libva, not in the per-frame request_fd/DPB logic that iter4 closed.

**Failure modes observed (iter5-amend telemetry, 2026-05-05)**:

```
bbb_1080p30_h264.mp4 (single decode attempt):
v4l2-request: cap_pool_init: 24 slots ready
v4l2-request: Unable to queue buffer: Invalid argument

YouTube avc1 (Enhancer for YouTube forcing h264, multiple decode attempts on same tab):
v4l2-request: cap_pool_init: 24 slots ready (×4)
v4l2-request: Unable to set control(s): Invalid argument
v4l2-request: Unable to queue buffer: Invalid argument
```
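To compare the two patterns across runs, the captured stderr can be tallied mechanically. A minimal sketch, assuming the driver emits exactly the message texts shown above, one per line; `tally` and the category keys are hypothetical names. The `(×4)` in the listing is an annotation, so a real capture would contain four separate `cap_pool_init` lines.

```python
import re
from collections import Counter

# Message texts copied from the failure listing; category keys are illustrative.
PATTERNS = {
    "cap_pool_init": re.compile(r"cap_pool_init: \d+ slots ready"),
    "s_ext_ctrls_einval": re.compile(r"Unable to set control\(s\): Invalid argument"),
    "qbuf_einval": re.compile(r"Unable to queue buffer: Invalid argument"),
}


def tally(stderr_text: str) -> Counter:
    """Count driver events per category in a captured stderr dump."""
    counts: Counter = Counter()
    for line in stderr_text.splitlines():
        if "v4l2-request:" not in line:
            continue  # ignore output from other components
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

A run that shows `s_ext_ctrls_einval` at all belongs to the YouTube-style pattern; a run with only `qbuf_einval` matches the bbb pattern.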

The S_EXT_CTRLS EINVAL is the more diagnostic signal: one of the compound H.264 controls (SPS/PPS/SLICE_PARAMS/DECODE_PARAMS/SCALING_MATRIX) is being rejected by the kernel. The iter5 sweep removed the iter4 per-control TRY isolation (`d3a299b`); reinstating it temporarily (or using strace) is iter6 step one.

**Diagnostic plan**:

1. Reinstate per-control VIDIOC_TRY_EXT_CTRLS isolation in v4l2.c (the iter4 diagnostic that pinpointed which compound control fails). Do not commit; use only for diagnosis.
2. Strace the Firefox Utility process during decode init. Capture exact `v4l2_ext_control` payloads. Cross-reference with FFmpeg's `v4l2_request_h264.c` (cached at `references/ffmpeg-kwiboo/`, the empirical authority per memory).
3. Once the offending control is identified, look at how Firefox's libva path constructs it differently from mpv-vaapi-copy. Likely: a header field Firefox doesn't populate, an order-of-operations difference, or a VAImage / `vaExportSurfaceHandle` side effect that perturbs driver state.

**Reproducibility**: 100% on either `file:///home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` OR YouTube avc1 (with enhanced-h264ify or Enhancer for YouTube forcing h264) + `LIBVA_DRIVER_NAME=v4l2_request` env var + iter5-amend Firefox 150 + sandbox enabled.

**Risk**: medium. Could be a 5-line fix in v4l2.c QBUF prep, or could surface a fundamental Firefox-vs-mpv divergence in libva surface management.

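Step 1 of the diagnostic plan amounts to submitting each compound control on its own and seeing which one the kernel refuses. The real diagnostic lives in C in `v4l2.c`, so the Python below only sketches the control flow. `try_one` is a hypothetical hook standing in for a single-control VIDIOC_TRY_EXT_CTRLS call, and the short control names are the ones used in the text, not kernel control IDs.

```python
import errno
from typing import Callable, Dict, Iterable

# Compound H.264 controls as named in the text above.
H264_CONTROLS = ("SPS", "PPS", "SLICE_PARAMS", "DECODE_PARAMS", "SCALING_MATRIX")


def isolate_failing_controls(try_one: Callable[[str], int],
                             controls: Iterable[str] = H264_CONTROLS) -> Dict[str, int]:
    """Try each control individually and collect the ones that fail.

    `try_one(name)` returns 0 on success or a positive errno value.
    A control that passes alone may still fail when batched with others,
    so an empty result does not prove the combined call is valid.
    """
    failures: Dict[str, int] = {}
    for name in controls:
        err = try_one(name)
        if err != 0:
            failures[name] = err
    return failures
```

If isolation reports nothing while the batched call still returns EINVAL, that points at an interaction between controls and makes the strace payload comparison in step 2 the more useful next move.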
### G. V4L2_MEMORY_DMABUF (carried from iter2+3+4+5)

> Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation (iter2 Fix 3 was statistical / LRU mitigation; this is architectural).

**Why**: race window mathematically eliminated. Significant kernel-side test surface (does hantro on this kernel actually accept DMABUF type? GStreamer's v4l2slh264dec uses MMAP, so DMABUF on hantro may not be tested upstream).

**Risk**: highest unknown of any candidate. Possibly requires kernel work.

### H. fourier-fresnel campaign (carried from iter5 followon-campaigns memory)

> Open the `fourier-fresnel` campaign: port libva-v4l2-request-fourier from ohm RK3568 to fresnel RK3399 (Pinebook Pro). Validates generality of iter1+2+3+4+5 fixes on a second hardware target.

**Stance**: this is a **separate top-level campaign**, not an iter6 candidate. Listed here for completeness because operator may pick it as the next thing to open (sequenced ahead of `panvk-bifrost`). Charter at `~/src/fourier-fresnel/` once opened.

### Recommended pairings

- **A + B** (cap_pool race fix + msync verify): small, surgical, both close iter5 carryovers, both prerequisites for clean upstream submission. Tightest scope.
- **A + C** (cap_pool fix + chromium-fourier verify): closes the cap_pool race AND validates the libva backend on a second consumer family. Mid scope.
- **D alone** (upstream prep): gated on operator instruction. Small if everything else is already clean (which iter5 mostly achieved).
- **E alone** (perf binding cell): anchors campaign-wide claims to numbers. Carried five iterations.
- **F alone** (MPEG-2): validates beyond H.264 scope.
- **G alone** (DMABUF): high-risk architectural.
- **I alone** (Firefox QBUF EINVAL): narrow, deterministic repro, gates the only known hard failure of a consumer on the iter5-amendment build. Strong candidate for the iter6 lock.

## State that carries (re-verified iter5 close)

- **Hardware**: ohm RK3568 hantro G1/G2, kernel 6.19.10. Access: `ohm` (LAN) or `ohm.vpn`.
- **Userspace**: firefox 150.0.1-1.1 (iter5 G PGO-disabled fourier rebuild), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
- **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `4bed52ec5d44b389...` (iter5-end, post-cleanup).
- **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`. (For candidate F, MPEG-2: need new fixture.)
- **Build container**: firefox-fourier LXD on boltzmann, persistent.

## State that does NOT carry

- iter5 mpv stress logs are tmpfs-volatile.
- chromium-fourier 149: uncertain whether it still builds against current Chromium upstream / mesa / etc. Candidate C's Phase 2 has to confirm or rebuild.

## Tooling and measurement-instrument inventory

Same as iter5. Plus for new candidates:
- For A (cap_pool race fix): write a probe-pattern test program (`vaCreateSurfaces` 16x16 then 1920x1080 in tight succession) to deterministically trigger the race. Quicker repro than waiting for mpv libplacebo to randomly hit it.
- For B (msync verify): need a frame-hash baseline pre-iter5-sweep. Could use git checkout to a pre-`c8b6ede` commit + capture, then compare to iter5-end.
- For C (chromium-fourier): need to find/rebuild chromium-fourier. Operator's `~/src/chromium-fourier/` likely has the build.
- For E (perf): `pidstat -u` for CPU%, Mali-G52 freq via `/sys/class/devfreq/fde60000.gpu`.
- For F (MPEG-2): need an MPEG-2 fixture (`mpv --stream-dump=<file>` from a public DVD, or transcode bbb to MPEG-2).

## In-scope (LOCKING DEFERRED — Phase 1 user input)

To be locked at Phase 1 from candidates A..G and I above. H is a separate top-level campaign decision, not an iter6 candidate.

## Out-of-scope (LOCKED 2026-05-05 for iteration 6)

- iter5-completed work (Track A sweep, Track G PGO rebuild, Track E multi-context, Track B mpv libplacebo): done.
- iter4-completed work (Track A frame-11 EINVAL fix): done.
- iter3-completed work (Track F sandbox patch): done in firefox-fourier; upstream filing is iter6 candidate D.
- New codecs OUTSIDE H.264 / MPEG-2 (VP8/VP9/AV1/HEVC out per iter1 lock).
- New target hardware (fresnel, ampere): separate campaign (H above).

## Phase 1 success criterion (will lock after user picks candidate)

Pre-lock template:
- For candidate A: "`vaCreateSurfaces` race-probe test program runs cleanly with no REQBUFS EBUSY events; mpv libplacebo --vo=gpu test still GREEN; iter5-end smoke test still 0 EINVAL."
- For candidate B: "vaapi-copy 100-frame run produces frame-hash matching pre-iter5-sweep baseline; OR if divergence, msync restored and verified."
- For candidate C: "chromium-fourier 149 builds + runs against iter5 driver. `v4l2-request:` traces present in stderr. Decode works at ≥24 fps. Patch matrix updated indicating which chromium-fourier libva-side patches are obsolete vs still required."
- For candidate D: "Mozilla Bugzilla bug filed with firefox-fourier patch attached; bootlin issue filed against libva-v4l2-request with iter1-iter5 patch series."
- For candidate E: "Anchored perf table for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW, SW baseline} across drop count + CPU% + frame timing on bbb_1080p30. Reproducible from documented script."
- For candidate F: "MPEG-2 fixture decodes through hantro G1 via libva-v4l2-request-fourier without crashes / EINVAL."
- For candidate G: "vaapi-copy + vaapi --vo=null still produce real frames with V4L2_MEMORY_DMABUF-backed CAPTURE buffers; race window architecturally closed."

## Stop point

**Phase 1 lock requires user input**: pick from A..G or I (and any pairing).

Recommended primary: **A + B** (cap_pool race fix + msync verify). Both are iter5 sonnet review caveats, both small scope, both prerequisites for clean upstream submission. If the operator wants to make iter6 the upstream-prep iteration, **A + B + D** (the pair plus upstream filings) is the natural next step.

Alternative leans:
- **C alone** if "do we still need chromium-fourier?" is the more pressing question
- **E alone** if perf measurement matters more than upstream prep
- **F alone** if multi-codec coverage is a higher priority than refinement

After lock, iter6 phases 2..8 proceed autonomously per "Stop only if user is needed."