libva-multiplanar/phase0_findings_iter6.md
claude-noether 793409b960 iter6 Phase 2: A∪I merge + bug class identified
Phase 1 amended: scope merged (A: cap_pool resolution-change race
+ I: Firefox VIDIOC_QBUF EINVAL) after Phase 2 telemetry showed
they're facets of the same buffer-pool / surface-recycle lifecycle
weakness.

Phase 2 findings:
- Original "S_EXT_CTRLS fails on frame 1" was transient state, does
  not reproduce on iter6-DX diagnostic build.
- Reproducible failure: OUTPUT VIDIOC_QBUF EINVAL after a varying
  number of successful frames (1, 19, 53 across three runs).
- mpv-vaapi-copy clean — single-surface recycle pattern doesn't
  trigger the race; Firefox's multi-surface MediaSource pattern does.
- DQBUF index-mismatch theory: ruled out.
- Control payload divergence: ruled out (first 64 bytes byte-identical
  between mpv and Firefox).
- Surviving hypothesis: request_fd lifecycle race — fd=30 reused on
  every frame after close, kernel-side request object may not release
  synchronously, next QBUF on REQUEST_FD=30 collides with stale state.

Phase 4 leading approach: C — extend iter4's "drain before reuse"
discipline from request_fd to OUTPUT pool slot. Mirror picture.c's
cap_pool unbind-before-rebind pattern in the OUTPUT lifecycle.

iter6-DX diagnostic build is local on ohm (/home/mfritsche/iter6-fork-dx).
Diagnostics are not committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:37:31 +00:00


Iteration 6 — Phase 0 (substrate / motivation / inventory)

Opens 2026-05-05 immediately after iter5 close (phase8_iteration5_close.md, fork commit c8b6ede, campaign close 8e6d9e6).

Predecessor close-out summary (iteration 5 → iteration 6)

iter5 was a four-track iteration: A (DEBUG sweep) + G (PGO-disabled Firefox rebuild) + B (mpv libplacebo segfault) + E (multi-context safety). All four closed GREEN. Driver source clean (1 v4l2-request log line per 2000-frame stress run, down from ~30+ lines per frame pre-iter5). Process-global mutable state eliminated. Firefox-fourier 150.0.1-1.1 deployed (169 MB libxul, 21× smaller than the iter3 PGO-instrumented build).

The campaign's original substrate question — "make multi-planar libva work end-to-end on Rockchip hantro for production VAAPI consumers" — is empirically achieved at the libva-side decode layer. iter6 is about closing the remaining quality + verification gates and (gated on operator instruction) the upstream prep work.

Iteration 6 candidate research questions

A. Cap_pool resolution-change race (carried from iter5 Phase 5 sonnet C4)

Fix the latent REQBUFS-EBUSY race in CreateSurfaces2. A libva consumer may probe with vaCreateSurfaces(N, M) and then re-allocate with different dimensions while CAPTURE STREAMON is active. In that case the cap_pool drain doesn't fully complete before REQBUFS(0) on OUTPUT. The kernel returns EBUSY, the driver pushes ahead with garbage sizes[1] (uninitialized memory shape), and the consumer hits SIGSEGV or falls back to SW.

Empirical signature (iter3 substrate, iter5 Phase 7B re-test): "Unable to request buffers: Device or resource busy" at init, then "cap_pool_init: query_buffer failed for slot N", then ffmpeg "AVHWFramesContext: Failed to create surface: 2 (resource allocation failed)". mpv recovers via SW fallback (no SIGSEGV in iter5; the consumer happens to handle it). Other consumers (chromium-style: probe + real-alloc + STREAMON in tight succession) might not.

Fix shape: in surface.c::CreateSurfaces2 resolution-change branch, ensure ordering is STREAMOFF on CAPTURE → REQBUFS(0) on CAPTURE → STREAMOFF on OUTPUT → REQBUFS(0) on OUTPUT → S_FMT on OUTPUT → REQBUFS(N) on OUTPUT. ~30 lines. The current code calls cap_pool_destroy (issues REQBUFS(0) on CAPTURE) then v4l2_request_buffers(output, 0) but doesn't STREAMOFF first — kernel rejects REQBUFS(0) on a streaming queue.

Why first: highest-leverage closure of a known-broken edge case. Required for cleaner upstream submission. Small scope.

B. msync removal pixel-correctness verification (carried from iter5 Phase 5 sonnet C3)

Add a frame-hash spot check to confirm the iter5 sweep's msync(MS_SYNC|MS_INVALIDATE) removal in surface.c::RequestSyncSurface doesn't silently corrupt CAPTURE buffer reads on this CMA-backed hantro setup.

Plan: extend mpv vaapi-copy stress test to checksum the first-frame Y-plane and compare against a known-good baseline captured pre-removal. If they match, msync removal is verified. If they diverge, restore the msync + investigate.

Fast: ~1 hour total (capture baseline, write checksum harness, run side-by-side).
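Here is a sketch of the checksum the harness could compute. It assumes the harness has a CPU pointer to the luma plane plus its width, height and row stride; for a mapped VAImage, `pitches[0]` is the Y stride. FNV-1a is an arbitrary choice, and any stable hash works for a baseline-vs-candidate comparison. `y_plane_fnv1a` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over the visible luma bytes only. Rows may carry
 * padding (stride > width) that is not part of the image, so each row
 * hashes exactly `width` bytes and then skips ahead by `stride`. */
uint64_t y_plane_fnv1a(const uint8_t *y, size_t width, size_t height,
                       size_t stride)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */

    assert(stride >= width);
    for (size_t row = 0; row < height; row++) {
        const uint8_t *p = y + row * stride;
        for (size_t col = 0; col < width; col++) {
            h ^= p[col];
            h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
        }
    }
    return h;
}
```

Skipping padding keeps the hash independent of allocation layout. As a result, a mismatch between the pre-removal baseline and iter5-end points at pixel content, not at stride differences.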

C. Verify chromium-fourier + iter5 driver compatibility (NEW from iter5)

Run the existing chromium-fourier 149 (~/src/chromium-fourier/ if it exists, or rebuild from chromium-fourier patch series) against the iter5-end libva-v4l2-request-fourier driver. Determine whether iter4+iter5 libva-side fixes obviate any of chromium-fourier's Step-1 patches.

Why this question exists: stock Brave on PineTab2 doesn't engage VAAPI at all (its GPU process dies at GL bindings init — InitializeStaticGLBindingsOneOff failed, GLES3 is unsupported). chromium-fourier's value is partially the GL-stack fix that gets Chromium's GPU process running, separate from any libva-side patches it carries. With iter4 (DPB FFmpeg-semantics, fresh request_fd) and iter5 (debug-clean, multi-context safe) landed, some chromium-fourier libva-side patches may now be redundant.

Tested in iter5: stock Brave fails at GL bindings init under all three GL settings (default, --use-gl=egl, --use-gl=desktop). The failure mode differs for each, but none reaches VAAPI or the decoder.

Diagnostic plan:

  1. Confirm chromium-fourier 149 still builds and runs on ohm
  2. Point it at /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 with iter5 driver
  3. Capture stderr for v4l2-request: + vaapi: traces
  4. If decode works: which chromium-fourier libva-side patches are still load-bearing? (revert + retest each)
  5. Surface a "minimal chromium-fourier" patch set if some patches are obsoleted

Output: an updated chromium-fourier patch matrix (which patches are still needed; which iter4/iter5 obsoletes).

D. Bootlin / Mozilla upstreaming prep (carried from iter3 candidate G + iter4 + iter5)

File the Mozilla Bugzilla bug for /dev/media* + V4L2-stateless RDD sandbox with the firefox-fourier patch. File a bootlin issue on bootlin/libva-v4l2-request with iter1+2+3+4+5 patches as a cohesive working set.

Why now: the iter5 sweep + multi-context fix shifts the fork toward upstream-readiness. Patches are clean. Mozilla bug doesn't exist yet (per iter3 Phase 0 Sonnet research). bootlin upstream is dormant since 2021 but the fork is now substantially ahead with empirically-verified fixes.

Stance: per feedback_no_upstream.md, gated on explicit operator instruction. Listed for completeness.

E. Performance binding cell (carried from iter1+2+3+4+5)

Establish a measurement protocol: drop counts, effective FPS, browser CPU%, scanout-plane residency for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW (with iter5 G's PGO-disabled binary), SW baseline}. Anchor in phaseN_evidence/.

Why: anchors all iter1 through iter5 claims to numbers. Carried five iterations. iter5 G's PGO-disabled Firefox is now deployed, making this the first iteration with both a clean driver AND a release-quality Firefox to measure with.
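The arithmetic behind the binding-cell columns can be sketched under a few assumptions:

- Frame counts come from the player's own presented/dropped counters.
- CPU time is the delta of utime+stime over the window. These are fields 14 and 15 of /proc/<pid>/stat, in clock ticks.
- CPU% follows pidstat's default convention (without -I), where 100% means one fully busy core.

`struct perf_sample` and the helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-run sample for one cell of the binding table. */
struct perf_sample {
    unsigned long frames_shown;   /* frames presented during the window */
    unsigned long frames_dropped; /* frames the player reported dropped */
    double wall_s;                /* window length, seconds */
    unsigned long cpu_ticks;      /* delta of utime+stime, /proc/<pid>/stat */
    long ticks_per_s;             /* sysconf(_SC_CLK_TCK) on the target */
};

double effective_fps(const struct perf_sample *s)
{
    assert(s->wall_s > 0.0);
    return (double)s->frames_shown / s->wall_s;
}

double drop_pct(const struct perf_sample *s)
{
    unsigned long total = s->frames_shown + s->frames_dropped;
    return total ? 100.0 * (double)s->frames_dropped / (double)total : 0.0;
}

/* pidstat's default convention: 100% means one core fully busy. */
double cpu_pct(const struct perf_sample *s)
{
    assert(s->wall_s > 0.0 && s->ticks_per_s > 0);
    double cpu_s = (double)s->cpu_ticks / (double)s->ticks_per_s;
    return 100.0 * cpu_s / s->wall_s;
}
```

For example, take 870 frames shown and 30 dropped over a 30 s window, with 1500 ticks at 100 ticks/s. That gives 29.0 effective FPS, about 3.3% drops and 50% CPU (half a core).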

F. Multi-codec audit (carried from iter1 lock backlog)

Verify MPEG-2 decode path. iter1 lock said "H.264 first; MPEG-2 next." iter1+2+3+4+5 all H.264-only.

Plan: find an MPEG-2 test fixture, decode via mpv vaapi-copy + vainfo. Verify hantro G1's MPEG-2 path through libva-v4l2-request-fourier. Surface any codec-specific bugs (the iter4 DPB+request_fd fixes were H.264-specific; MPEG-2 has different control flow).

I. Firefox VIDIOC_QBUF EINVAL on first frame (NEW from iter5 amendment 2026-05-05)

The iter5 amendment (Utility seccomp fix) closes Track F: the sandbox no longer blocks the V4L2 request API. With the sandbox open, Firefox loads the v4l2_request driver and "cap_pool_init: 24 slots ready" succeeds. A single "Unable to queue buffer: Invalid argument" (VIDIOC_QBUF EINVAL) follows on what looks like the first frame, after which Firefox falls back to SW.

Why this is a new candidate, not a Track A regression: mpv --hwdec=vaapi-copy decoded 2000 frames clean on the same iter5-end driver build (sha 4bed52ec5d44b389…). Only Firefox triggers the EINVAL. So the bug lives in the Firefox-specific consumer path through libva, not in the per-frame request_fd/DPB logic that iter4 closed.

Failure modes observed (iter5-amend telemetry, 2026-05-05):

bbb_1080p30_h264.mp4 (single decode attempt):
  v4l2-request: cap_pool_init: 24 slots ready
  v4l2-request: Unable to queue buffer: Invalid argument

YouTube avc1 (Enhancer for YouTube forcing h264, multiple decode attempts on same tab):
  v4l2-request: cap_pool_init: 24 slots ready  (×4)
  v4l2-request: Unable to set control(s): Invalid argument
  v4l2-request: Unable to queue buffer: Invalid argument

The S_EXT_CTRLS EINVAL is the more diagnostic signal: the kernel is rejecting one of the compound H.264 controls (SPS/PPS/SLICE_PARAMS/DECODE_PARAMS/SCALING_MATRIX). The iter5 sweep removed the iter4 per-control TRY isolation (d3a299b). Iter6 step one is to reinstate it temporarily, or to use strace instead.

Diagnostic plan:

  1. Reinstate per-control VIDIOC_TRY_EXT_CTRLS isolation in v4l2.c (the iter4 diagnostic that pinpointed which compound control fails). Do not commit; use only for diagnosis.
  2. Strace the Firefox Utility process during decode init. Capture exact v4l2_ext_control payloads. Cross-reference with FFmpeg's v4l2_request_h264.c (cached at references/ffmpeg-kwiboo/, the empirical authority per memory).
  3. Once the offending control is identified, look at how Firefox's libva path constructs it differently from mpv-vaapi-copy. Likely: a header field Firefox doesn't populate, an order-of-operations difference, or a VAImage / VAExportSurfaceHandle side effect that perturbs driver state.

Reproducibility: 100% on either file:///home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 OR YouTube avc1 (with enhanced-h264ify or Enhancer for YouTube forcing h264) + LIBVA_DRIVER_NAME=v4l2_request env var + iter5-amend Firefox 150 + sandbox enabled.

Risk: medium. Could be a 5-line fix in v4l2.c QBUF prep, or could surface a fundamental Firefox-vs-mpv divergence in libva surface management.

G. V4L2_MEMORY_DMABUF (carried from iter2+3+4+5)

Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation (iter2 Fix 3 was statistical / LRU mitigation; this is architectural).

Why: eliminates the race window by construction rather than statistically. The catch is a significant kernel-side test surface: does hantro on this kernel actually accept the DMABUF memory type? GStreamer's v4l2slh264dec uses MMAP, so DMABUF on hantro may not be tested upstream.

Risk: highest unknown of any candidate. Possibly requires kernel work.

H. fourier-fresnel campaign (carried from iter5 followon-campaigns memory)

Open the fourier-fresnel campaign — port libva-v4l2-request-fourier from ohm RK3568 to fresnel RK3399 (Pinebook Pro). Validates generality of iter1+2+3+4+5 fixes on a second hardware target.

Stance: this is a separate top-level campaign, not an iter6 candidate. Listed here for completeness because operator may pick it as the next thing to open (sequenced ahead of panvk-bifrost). Charter at ~/src/fourier-fresnel/ once opened.

Candidate scope shapes for the iter6 lock:

  • A + B (cap_pool race fix + msync verify) — small, surgical, both close iter5 carryovers, both prerequisites for clean upstream submission. Tightest scope.
  • A + C (cap_pool fix + chromium-fourier verify) — closes the cap_pool race AND validates the libva backend on a second consumer family. Mid scope.
  • D alone (upstream prep) — gated on operator instruction. Small if everything else is already clean (which iter5 mostly achieved).
  • E alone (perf binding cell) — anchors campaign-wide claims to numbers. Carried five iterations.
  • F alone (MPEG-2) — validates beyond H.264 scope.
  • G alone (DMABUF) — high-risk architectural.
  • I alone (Firefox QBUF EINVAL) — narrow, deterministic repro, gates the only known consumer-with-iter5-amendment hard-failure. Strong candidate for the iter6 lock.

State that carries (re-verified iter5 close)

  • Hardware: ohm RK3568 hantro G1/G2, kernel 6.19.10. Access: ohm (LAN) or ohm.vpn.
  • Userspace: firefox 150.0.1-1.1 (iter5 G PGO-disabled fourier rebuild), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
  • Driver installed: /usr/lib/dri/v4l2_request_drv_video.so sha256 4bed52ec5d44b389... (iter5-end, post-cleanup).
  • Test fixture: bbb_1080p30_h264.mp4 sha256 dcf8a7170fbd.... (For candidate F, MPEG-2: need a new fixture.)
  • Build container: firefox-fourier LXD on boltzmann, persistent.

State that does NOT carry

  • iter5 mpv stress logs are tmpfs-volatile.
  • chromium-fourier 149 — uncertain whether it still builds against current Chromium upstream / mesa / etc. Track C's Phase 2 has to confirm or rebuild.

Tooling and measurement-instrument inventory

Same as iter5. Plus for new candidates:

  • For A (cap_pool race fix): write a probe-pattern test program (vaCreateSurfaces 16x16 then 1920x1080 in tight succession) to deterministically trigger the race. Quicker repro than waiting for mpv libplacebo to randomly hit it.
  • For B (msync verify): need a frame-hash baseline pre-iter5-sweep. Could use git checkout to a pre-c8b6ede commit + capture, then compare to iter5-end.
  • For C (chromium-fourier): need to find/rebuild chromium-fourier. Operator's ~/src/chromium-fourier/ likely has the build.
  • For E (perf): pidstat -u for CPU%, Mali-G52 freq via /sys/class/devfreq/fde60000.gpu.
  • For F (MPEG-2): need an MPEG-2 fixture (mpv --stream-dump from a public DVD, or transcode bbb to MPEG-2).

In-scope (LOCKED 2026-05-05 for iteration 6): A∪I (cap_pool / OUTPUT-buffer-recycle lifecycle)

Operator locked candidate I, then merged with candidate A after Phase 2 telemetry (2026-05-05) showed they are facets of the same underlying bug:

  • A (iter5 sonnet C4 carryover): cap_pool resolution-change race — REQBUFS-EBUSY when CAPTURE pool isn't drained before re-allocation. mpv libplacebo --vo=gpu triggers it via Vulkan-fallback resolution change.

  • I (iter6 candidate): Firefox VIDIOC_QBUF EINVAL on OUTPUT after a varying number of successful frames (1, 19 and 53 across three Phase 2 runs). The original "S_EXT_CTRLS EINVAL on frame 1" framing was transient state. The reproducible failure is at OUTPUT-buffer requeue with rotating source_index. mpv-vaapi-copy (single-surface recycle) doesn't hit this; Firefox (multi-surface rotation through libva) does.
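One way to express "drain before reuse" at the OUTPUT pool slot is as pure bookkeeping. A slot belongs to the kernel from QBUF until its DQBUF, and the next frame may only take a slot the driver has taken back, rather than one derived from a rotating source_index. This illustrates the Phase 4 leading approach under assumptions; it is not the driver's structure. `struct output_pool` and its functions are hypothetical. It is at least consistent with the symptom, since vb2 rejects QBUF of a buffer that is not in the dequeued state with EINVAL.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OUTPUT_SLOTS 32

/* Hypothetical OUTPUT (bitstream) pool bookkeeping: in_kernel[i] is true
 * from QBUF of slot i until the DQBUF that returns it. */
struct output_pool {
    unsigned n;
    bool in_kernel[MAX_OUTPUT_SLOTS];
};

void output_pool_init(struct output_pool *p, unsigned n)
{
    assert(n <= MAX_OUTPUT_SLOTS);
    p->n = n;
    for (unsigned i = 0; i < n; i++)
        p->in_kernel[i] = false;
}

/* Take a slot the kernel has handed back and mark it queued.
 * -1 means every slot is still owned by the kernel: DQBUF first. */
int output_pool_acquire(struct output_pool *p)
{
    for (unsigned i = 0; i < p->n; i++) {
        if (!p->in_kernel[i]) {
            p->in_kernel[i] = true;
            return (int)i;
        }
    }
    return -1;
}

/* Call after DQBUF on OUTPUT has returned slot `index`. */
void output_pool_release(struct output_pool *p, unsigned index)
{
    assert(index < p->n && p->in_kernel[index]);
    p->in_kernel[index] = false;
}
```

When acquire returns -1, the caller must wait for a completed request and DQBUF before decoding the next frame. With only one surface in flight, as in the single-surface recycle pattern, that branch is never taken.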

Why merge: both are buffer-pool / surface-recycle lifecycle issues at OUTPUT (and CAPTURE) drain ordering. Partial fixes risk just shifting the symptom. The intended Phase 4 fix is a single coherent rework of cap_pool + DQBUF/QBUF sequencing in the surface lifecycle.

Other candidates (B, C, D, E, F, G) deferred to iter7+. H (fourier-fresnel) remains separate top-level campaign.

Out-of-scope (LOCKED 2026-05-05 for iteration 6)

  • iter5-completed work (Track A sweep, Track G PGO rebuild, Track E multi-context, Track B mpv libplacebo) — done.
  • iter4-completed work (Track A frame-11 EINVAL fix) — done.
  • iter3-completed work (Track F sandbox patch) — done in firefox-fourier; upstream filing is iter6 candidate D.
  • New codecs OUTSIDE H.264 / MPEG-2 (VP8/VP9/AV1/HEVC out per iter1 lock).
  • New target hardware (fresnel, ampere) — separate campaign (H above).

Phase 1 success criterion (LOCKED 2026-05-05 for iteration 6, AMENDED 2026-05-05 for A∪I merge)

All three consumer paths must be GREEN on the iter6 driver:

  1. Firefox (iter5-amend binary, sandbox enabled, LIBVA_DRIVER_NAME=v4l2_request) plays bbb_1080p30_h264.mp4 for ≥30 seconds with HW decode engaged. Pass conditions: zero per-frame "Unable to queue buffer" or "Unable to set control(s)" messages, lsof /dev/video1 shows the Firefox Utility process holding the device, and frames advance.

  2. mpv libplacebo --vo=gpu runs ≥30 seconds on the same fixture without segfault and without REQBUFS-EBUSY events at init or resolution-change boundaries (carries iter5 sonnet C4 caveat to closure).

  3. mpv --hwdec=vaapi-copy (regression check) decodes 2000 frames clean, identical pattern to iter5-end driver baseline (sha 4bed52ec5d44b389…).
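The log half of criteria 1 and 2 is mechanical enough to script. This sketch counts the three driver failure strings quoted in this document in captured stderr. `driver_failure_count` is a hypothetical name, and the lsof and frames-advance checks are not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Failure strings taken from the driver messages quoted in this document. */
static const char *const kFailurePatterns[] = {
    "Unable to queue buffer",    /* QBUF failure (criterion 1) */
    "Unable to set control(s)",  /* S_EXT_CTRLS failure (criterion 1) */
    "Unable to request buffers", /* REQBUFS EBUSY signature (criterion 2) */
};

/* Count non-overlapping occurrences of needle in log. */
static size_t count_occurrences(const char *log, const char *needle)
{
    size_t n = 0;
    size_t len = strlen(needle);

    assert(len > 0);
    for (const char *p = strstr(log, needle); p != NULL;
         p = strstr(p + len, needle))
        n++;
    return n;
}

/* Total failure-string hits in a captured stderr buffer; 0 means clean. */
size_t driver_failure_count(const char *log)
{
    size_t total = 0;
    size_t npat = sizeof(kFailurePatterns) / sizeof(kFailurePatterns[0]);

    for (size_t i = 0; i < npat; i++)
        total += count_occurrences(log, kFailurePatterns[i]);
    return total;
}
```

For criterion 3 the same count should match the iter5-end baseline run rather than be zero, since that baseline itself logged one v4l2-request line per 2000-frame stress run.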

Phase 5 sonnet review must confirm: (a) fix is libva-side (not consumer-specific kludge), (b) all three consumer paths verified, (c) no new mutable global state introduced (Track E discipline).

Phase 1 LOCKED. Iteration 6 proceeds.

iter6 = A∪I (locked as candidate I, merged with A after Phase 2). Phases 2..8:

  • Phase 2: situation analysis — strace the failing Firefox decode, identify the offending compound H.264 control payload field, compare to FFmpeg v4l2_request_h264.c and to mpv-vaapi-copy's payload at the same call site.
  • Phase 3: baseline anchor — capture the iter5-amend driver stderr + about:support decoder backend strings on the failing case, snapshot pre-fix.
  • Phase 4: plan + execute the fix in the libva-v4l2-request-fourier driver (or, if root cause is upstream Firefox, document and stop).
  • Phase 5: sonnet review.
  • Phase 6: deploy fixed driver to ohm.
  • Phase 7: verify against the success criterion above.
  • Phase 8: close.

After lock, iter6 phases 2..8 proceed autonomously per "Stop only if user is needed."