marfrit 67494ae7ee Iteration 4 close — Track A locked, three-iteration carryover resolved
The iter1+iter2+iter3 frame-11 EINVAL is empirically eliminated. mpv
direct stress test on ohm via patched libva-v4l2-request-fourier:

  RequestBeginPicture:     2130
  RequestSyncSurface:      4254
  S_EXT_CTRLS EINVAL:      0
  Unable to set control(s): 0
  Generic EINVAL:          0
  ENETDOWN:                0

2130 frames at 24 fps = real-time HW decode (>98% of 2160-frame max
in 90 seconds wall time). Track A's Phase 1 success criterion crushed.

Three correctness fixes (4 fork commits):
- 74d8dd1: DPB fields=V4L2_H264_FRAME_REF + skip stale entries
- 385dee1: fresh request_fd per frame (THE load-bearing fix)
- b81ce69: B-slice L1 reflist .fields copy-paste

Plus diagnostic instrumentation (a12d299, 4892656, f21bdf0) deferred
to iter5 sweep alongside earlier iter1/iter3 instrumentation.

Three new memory entries: kernel obfuscation extends to compound TRY,
request_fd lifecycle (fresh per frame), FFmpeg as empirical authority.
README iteration table updated.

Carries to iter5 substrate: DEBUG sweep, mpv libplacebo segfault,
multi-context libva safety, PGO Firefox rebuild, eventual upstream
prep (Mozilla bug + bootlin libva-v4l2-request).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:29:43 +00:00

libva-multiplanar

Single-question campaign on the libva V4L2-stateless backend: make multi-planar libva work, end-to-end, on Rockchip hantro hardware, for production VA-API consumers (Brave/Chromium, Firefox via libavcodec, mpv --hwdec=vaapi, vainfo as smoke test).

The deliverable is a libva-v4l2-request fork that any VA-API consumer can dlopen and get H.264 (initially) and MPEG-2 hardware-decoded NV12 dmabufs out of, on PineTab2 RK3568 first, with the same plumbing intended to extend to RK3399 (fresnel) and RK3588 (boltzmann/ampere) once the RK3568 path is solid.

The fork lives as a subdirectory of this campaign:

  • libva-v4l2-request-fourier/ — clone of bootlin/libva-v4l2-request with our master ahead. Existing substrate: see its STUDY.md for the build-cleanly + probe + control-flow + WIP-tracing work landed before this campaign opened.

This README is the Claude-facing entry point for resumption after compaction. Read it first.

Origin

fourier_attribution campaign closed 2026-05-04 with the per-package wheat-vs-chaff verdict on bbb 1080p H.264 first-60s playback (PineTab2):

  • kwin-fourier: WHEAT, robust. Removing it triples kwin CPU, drives Mali to 95 % peak-freq residency, doubles drops. Confirmed.
  • chromium-fourier: WHEAT-but-fragile (Sonnet review's downgrade). Removing it (= falling back to stock Brave 1.89 / Chromium-147 base) costs 83 pp browser CPU (54 % → 137 %) — a magnitude consistent with multi-planar libva enabling the hantro hardware-decode fast path, but confounded with the Brave-147-vs-Chromium-149 base-version delta. Cell E (vanilla Chromium 149 control) was identified as the cheapest disambiguator and not yet run.
  • qt6-fourier: CHAFF on this workload.

Phase 5 review: https://dokuwiki.reauktion.de/doku.php?id=fourier:attribution_2026-05-03

The chromium-fourier verdict's load-bearing claim is "multi-planar libva is the binding decode-side enabler on hantro." Whether that claim survives a clean control depends on this campaign's deliverable shipping. The reverse is also true: until a working multi-planar libva-v4l2-request lands, no consumer other than chromium-fourier-with-Step-1-patches has hardware decode on RK3568. Firefox VAAPI, mpv --hwdec=vaapi, gst-vaapi, vainfo all degrade to software or fall over.

Process

Eight-plus-one phase loop per feedback_dev_process.md. Phase 0 of each iteration is locked in phase0_findings*.md — read the latest iteration's substrate next.

Phase 5 (second-model review) and Phase 8 (iteration close + memory entry) follow the predecessor cadence — invoke the sonnet subagent for the review pattern.

Per the feedback_replicate_baseline_first.md lesson: any binding cell in this campaign anchors to in-session-acquired data. The migrated STUDY.md material and ohm_gl_fix patch-correctness audits are reference history, not threshold sources.

Iteration history

| Iter | Status | Locked question | Outcome |
| --- | --- | --- | --- |
| 1 | Closed 2026-05-04 | "Does multi-planar libva-v4l2-request decode H.264 to NV12 dmabufs on hantro for any consumer?" | YES. vaapi-copy + Firefox-with-sandbox-bypass + vainfo all engage hantro. Documented bugs: surface-export DMA-BUF lifecycle race, multi-resolution session corruption, Mesa WSI 64-pitch alignment. See phase8_iteration1_close.md. |
| 2 | Closed 2026-05-04 | "Harden the iter1 deliverable: fix the three known bugs without regressing scope." | DONE. Fix 1 (resolution-change format-cache invalidation), Fix 2 (DRM_FORMAT_MOD_INVALID conditional for non-64 pitch), Fix 3 (decoupled cap_pool with LRU recycling for DMA-BUF lifecycle). mpv vaapi DMA-BUF playback "smooth" per operator inspection. See phase8_iteration2_close.md. |
| 3 | Closed 2026-05-05 | "F+A: verify the Firefox RDD sandbox hypothesis by patched-binary, while resolving the carryover frame-11 EINVAL on the same rig." | F GREEN: patched Firefox decodes through libva without MOZ_DISABLE_RDD_SANDBOX=1 (broker policy + seccomp ioctl '\|' allow + driver select() → poll() migration). A REPRODUCED: frame-11 EINVAL fires deterministically on a single-slice P-frame, Y2 instrumentation logs the failing controls. Track A's fix deferred to iter4. See phase8_iteration3_close.md. |
| 4 | Closed 2026-05-05 | "Track A solo: fix the iter1+2+3 carryover frame-11 EINVAL." | GREEN. Three correctness fixes landed (DPB fields=FRAME_REF + skip stale entries, fresh request_fd per frame, B-slice L1 reflist .fields copy-paste). mpv direct stress test verified 2130 BeginPictures over 90s with 0 EINVAL events of any kind: real-time HW decode through libva-v4l2-request-fourier. See phase8_iteration4_close.md. |
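
The iter4 "fresh request_fd per frame" fix can be illustrated with a minimal Python sketch (the fork itself is C, so this is not its code). The two ioctl numbers follow the definitions in linux/media.h, encoded with the asm-generic _IOC layout used on arm64; note the '|' type byte, the same one the iter3 seccomp allowance refers to. `decode_one_frame` and the `submit` callback are hypothetical names, and the V4L2 control and buffer submission is deliberately left to the caller.

```python
import fcntl
import os
import select
import struct

# Linux _IOC encoding (asm-generic layout, as used on arm64 and x86).
_IOC_NRSHIFT, _IOC_TYPESHIFT, _IOC_SIZESHIFT, _IOC_DIRSHIFT = 0, 8, 16, 30
_IOC_NONE, _IOC_READ = 0, 2


def _ioc(direction, type_char, nr, size):
    """Build an ioctl request number from its four fields."""
    return ((direction << _IOC_DIRSHIFT) | (size << _IOC_SIZESHIFT)
            | (ord(type_char) << _IOC_TYPESHIFT) | (nr << _IOC_NRSHIFT))


# linux/media.h: media-controller ioctls use the type byte '|'.
MEDIA_IOC_REQUEST_ALLOC = _ioc(_IOC_READ, "|", 0x05, struct.calcsize("i"))
MEDIA_REQUEST_IOC_QUEUE = _ioc(_IOC_NONE, "|", 0x80, 0)


def decode_one_frame(media_fd, submit, ioctl=fcntl.ioctl, timeout_ms=1000):
    """Allocate a request, let `submit` attach controls and buffers, queue it,
    wait for completion, and always close the request fd afterwards.

    `submit` is a hypothetical callback standing in for the per-frame
    VIDIOC_S_EXT_CTRLS and VIDIOC_QBUF calls that reference the request fd.
    Returns True if completion was signalled before the timeout.
    """
    buf = bytearray(struct.calcsize("i"))
    ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC, buf)    # kernel writes the new fd into buf
    (request_fd,) = struct.unpack("i", buf)
    try:
        submit(request_fd)
        ioctl(request_fd, MEDIA_REQUEST_IOC_QUEUE)
        poller = select.poll()
        poller.register(request_fd, select.POLLPRI)  # request completion arrives as POLLPRI
        return bool(poller.poll(timeout_ms))
    finally:
        os.close(request_fd)                         # never carried over to the next frame
```

The `ioctl` parameter exists only so the lifecycle can be exercised without a /dev/media node; on ohm the default `fcntl.ioctl` would be used with a file descriptor opened on the hantro media device.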

Predecessor work that this campaign builds on

State (carry-over) — fork content, file:line pointers, contract analyses:

  • libva-v4l2-request-fourier/STUDY.md — Phase 0 / Phase 2 substrate already written, dated through 2026-05-02. Goal statement, why-the-fork-exists, build-cleanly stack of fixes, probe/control-flow fixes, eager-probe rationale, failure-mode-as-of-2026-04-26 (Brave-side wall is chromeos pipeline, not libva surface stack).
  • libva-v4l2-request-fourier/ git history: 12 commits ahead of bootlin tip a3c2476, including kernel-UAPI renames, NV12 multi-plane format entry, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE probe fallback, and recent (2026-05-02) WIP entry-point tracing for Brave's libva surface stack.
  • ~/src/ohm_gl_fix/phase6/step1/ — Step 1 patches 0001..0018, contract-correct port of libva-v4l2-request to hantro multi-planar / Chromium-149 era. Audit at audit_0008_decode_params_2026-05-01.md. vainfo enumerates H.264 profiles cleanly on this binary; Brave's chromium-fourier 149 binary engages this libva path end-to-end (per fourier_attribution cell A's 54 % browser CPU vs cell B's 137 %). Step 1 patches are the working substrate that this campaign should reconcile against the libva-v4l2-request-fourier master and either fold-in or supersede.
  • ~/src/ohm_gl_fix/ — closed campaign, README documents the Step 1 audit and the test corpus (bbb_1080p30_h264.mp4 etc.).
  • ~/src/fourier_attribution/ — most recent campaign. Pay attention to:
    • Cell A (chromium-fourier on, libva-multi-planar engaged): browser_cpu_median = 54.4 %, fps = 24.0, drops_60s = 12.
    • Cell B (Brave 1.89 / Chromium 147, libva path absent or broken): browser_cpu_median = 137.15 %, fps = 23.18, drops_60s = 16.
    • phase4_findings.md for cross-cell verdict; phase5_review_sonnet_2026-05-04.md for the reviewer's pushback on the chromium-fourier conclusion.
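
As a quick check on how the Cell A and Cell B figures relate to the "83 pp" quoted under Origin, the arithmetic can be sketched in Python. The numbers are copied from the two cells above; the variable names are illustrative only.

```python
# Figures copied from fourier_attribution cells A and B.
cell_a = {"browser_cpu_median": 54.4, "fps": 24.0, "drops_60s": 12}
cell_b = {"browser_cpu_median": 137.15, "fps": 23.18, "drops_60s": 16}

# Absolute CPU cost of losing the libva path, in percentage points.
cpu_delta_pp = round(cell_b["browser_cpu_median"] - cell_a["browser_cpu_median"], 2)
# Relative cost: how many times more browser CPU cell B burns.
cpu_ratio = round(cell_b["browser_cpu_median"] / cell_a["browser_cpu_median"], 2)
# Additional dropped frames in the 60 s window.
extra_drops = cell_b["drops_60s"] - cell_a["drops_60s"]

print(f"CPU delta: {cpu_delta_pp} pp, ratio: {cpu_ratio}x, extra drops: {extra_drops}")
```

The delta works out to 82.75 pp, which is the figure rounded to 83 pp in the verdict; the ratio is about 2.5x. Neither number addresses the base-version confound the reviewer raised.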

Reference history (context, NOT data this campaign anchors to) — orthogonal scanout-plane constraint:

  • ~/src/kwin_overlay_subsurface/phase2_source_findings.md — rockchip-drm RK3568 plane format/modifier table. Plane 39 (Primary, NV12 LINEAR) is the only NV12-capable scanout; KWin owns it. Plane 45 (Overlay) advertises zero NV12. Therefore: even when libva-multi-planar produces a clean NV12 dmabuf, no scanout plane is reachable while KWin runs, and some component must GL-composite NV12 → RGB before display. This is orthogonal to libva: libva is on the decode side, the scanout-plane gap is on the display side. They're separate problems with separate fixes.
  • ~/src/x11-session-research/phase0_evidence/x11_baseline_2026-05-03/x1_summary.md — confirms the scanout-plane gap isn't fixable by switching session servers either. mpv-xv {SW,HW} and mpv-gpu {SW,HW} all leave Plane 39 in XRGB8888 throughout. It's a kernel/Mesa/Xorg-DDX gap, not a hardware-decoding gap. Don't expect this campaign to "fix the video pipeline end-to-end" — fixing libva-multi-planar fixes the decode side; the scanout-plane question stays open after.
  • ~/src/kwin_overlay_subsurface/ — closed without patch (phase8_handover.md); its feedback_replicate_baseline_first.md lesson is the discipline that this campaign inherits.

External reference:

  • Mozilla bug 1833354 / 1965646 (Firefox HW decode on RK3566/RK3588 explicitly needs libva-v4l2-request, not v4l2-m2m).
  • Bootlin upstream bootlin/libva-v4l2-request — dormant since 2021, written for single-plane sunxi-cedrus.
  • Collabora's cros-codecs (Rust, bypasses libva) — strategic replacement, not shipping soon.
  • Other dormant forks (per STUDY.md): jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov — none ship multi-planar.

In-scope (LOCKED 2026-05-04 in phase0_findings.md)

  • libva-v4l2-request fork (libva-v4l2-request-fourier/), backend only — multi-planar correctness across the V4L2-stateless lifecycle: format probing (single-plane fallback to multi-plane), control protocol sequencing, surface-handle export, dmabuf modifier negotiation.
  • H.264 first; MPEG-2 next. HEVC explicitly out.
  • Hardware target: ohm RK3568 hantro G1/G2 first iteration only. fresnel RK3399 + ampere/boltzmann RK3588 explicit future iterations after ohm path is solid.
  • Test consumers: vainfo, mpv --hwdec=vaapi, Firefox media.ffmpeg.vaapi.enabled, chromium-fourier 149 (regression check). Brave 1.89 deferred (chromeos-pipeline gating, not a libva-side problem).
  • Phase 1 success criterion: boolean correctness — "libva accepted + providing access to hardware decoder". Performance metrics deferred to follow-up iteration.
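
For the dmabuf modifier negotiation item, one plausible reading of the iter2 "DRM_FORMAT_MOD_INVALID conditional for non-64 pitch" fix is sketched below in Python (the fork itself is C). The two modifier values are the ones defined in drm_fourcc.h. The function name, the 64-byte constant as a hard threshold, and the choice of DRM_FORMAT_MOD_LINEAR for aligned pitches are assumptions made for illustration, not a description of the fork's actual code.

```python
# Values from drm_fourcc.h:
#   DRM_FORMAT_MOD_LINEAR  = fourcc_mod_code(NONE, 0)
#   DRM_FORMAT_MOD_INVALID = fourcc_mod_code(NONE, DRM_FORMAT_RESERVED)
DRM_FORMAT_MOD_LINEAR = 0x0
DRM_FORMAT_MOD_INVALID = 0x00FFFFFFFFFFFFFF

# Assumption: the alignment named in the iter1 bug list (Mesa WSI 64-pitch).
PITCH_ALIGNMENT = 64


def export_modifier(pitch_bytes: int) -> int:
    """Pick the modifier to advertise for an exported NV12 plane.

    Hypothetical helper: an aligned pitch is advertised as explicit LINEAR,
    anything else falls back to INVALID so the importer treats the layout
    as implicit instead of rejecting a LINEAR buffer with an unexpected pitch.
    """
    if pitch_bytes <= 0:
        raise ValueError("pitch must be positive")
    if pitch_bytes % PITCH_ALIGNMENT == 0:
        return DRM_FORMAT_MOD_LINEAR
    return DRM_FORMAT_MOD_INVALID
```

Under these assumptions a 1920-byte luma pitch (1080p) is exported as LINEAR, while an 854-byte pitch is exported with DRM_FORMAT_MOD_INVALID.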

Out-of-scope (LOCKED 2026-05-04)

  • Front-end libva (API library). Backend only.
  • Other hardware: fresnel, ampere, boltzmann — separate iterations.
  • HEVC, VP8, VP9, AV1 codecs.
  • Performance metrics (CPU%, fps, drops_60s, panfrost freq).
  • KWin / Wayland scanout-plane work — orthogonal (kwin_overlay_subsurface closed without patch).
  • cros-codecs Rust replacement (per user_stance_rust.md).
  • Bootlin / Collabora upstreaming (per feedback_no_upstream.md).

Hardware target

  • ohm — PineTab2, Rockchip RK3568 (4× Cortex-A55, Mali-G52 MP2, hantro G1/G2 VPU). Kernel 6.19.10-danctnix1-1-pinetab2. Primary measurement target.
  • (later) fresnel — Pinebook Pro, Rockchip RK3399 (hantro G1, no G2). EndeavourOS-ARM custom OC kernel — see reference_fresnel_kernel_constraints.md.
  • (much later) ampere/boltzmann — RK3588 (hantro VDPU381). Adding VDPU381 is a code addition this fork doesn't have today.

Non-upstreaming default

Inherited from the predecessors. Patches must be aligned to upstream in syntax and semantics, but no PR/MR/bug-report happens without explicit operator instruction. Bootlin upstream is dormant; once this campaign reaches a defensible state, Markus may wish to engage Bootlin / Collabora / Hans de Goede / Jernej Škrabec — that's a separate explicit decision.

Repository layout

~/src/libva-multiplanar/                       <- this campaign (its own git repo for findings)
├── README.md                                  <- this file
├── (worklist.md, phase0_findings.md, ... — created as phases land)
└── libva-v4l2-request-fourier/                <- the actual fork (separate git repo)
    ├── .git/                                  <- origin: marfrit/libva-v4l2-request-fourier
    │                                              upstream: bootlin/libva-v4l2-request
    ├── STUDY.md                               <- pre-existing Phase 0/2 substrate
    └── src/                                   <- libva-v4l2-request source tree

The campaign repo and the fork repo are separate git repositories — campaign findings and fork commits are versioned independently. This matches the operator's general pattern (ohm_gl_fix campaign vs the bootlin fork it patched).

Operator-facing repo URL TBD: probably git.reauktion.de/marfrit/libva-multiplanar once the campaign produces something worth pushing. The fork is already at git.reauktion.de/marfrit/libva-v4l2-request-fourier.

File map

Iteration 1 (closed):

| File | What it is |
| --- | --- |
| phase0_findings.md | iter1 substrate: locked research question, locked scope, predecessor state, source-read references |
| phase0_evidence/ | iter1 inventory + baseline anchor |
| phase4_iter2_plan.md | (mis-named; actually iter1 Phase 4) diff against FFmpeg + hantro kernel source identifying the bug fixed in iter1 |
| phase5_review_2026-05-04.md | iter1 sonnet review |
| phase6_findings.md | iter1 Phase 6: hantro decodes real H.264 pixels |
| phase7_findings.md | iter1 Phase 7 verification: vaapi-copy works, surface-export bug surfaces |
| phase8_iteration1_close.md | iter1 close |
| diff_against_ffmpeg.md | Cross-reference of fork divergence vs FFmpeg's V4L2 request-API code |

Iteration 2 (closed):

| File | What it is |
| --- | --- |
| phase0_findings_iter2.md | iter2 substrate |
| phase2_iter2_analysis.md | iter2 situation analysis |
| phase5_review_iter2_2026-05-04.md | iter2 sonnet review (3 architecture blockers + REQBUFS gap) |
| phase8_iteration2_close.md | iter2 close (Fix 1 + Fix 2 + Fix 3 landed) |

Iteration 3 (closed):

| File | What it is |
| --- | --- |
| phase0_findings_iter3.md | iter3 substrate. Read this for current iteration state. |
| phase2_iter3_situation.md | Mozilla sandbox source verbatim (broker policy + cap filter) |
| phase3_iter3_baseline.md | Pre-patch baseline anchor (ohm offline; iter2-close evidence anchored) |
| phase4_iter3_plan.md | Patch authorship + PKGBUILD overlay + Track A diagnostic plan |
| phase5_iter3_review.md | iter3 Phase 5 sonnet review (Y1 patch idiom fix, Y2 driver error_idx instrumentation, B-slice bug) |
| phase6_iter3_findings.md | iter3 Phase 6 build-side surprises (proper unified-diff, no --enable-v4l2, GPG rotation, ALARM-stale wasi cluster, onnxruntime gap) |
| firefox-fourier/ | Patch + PKGBUILD overlay artifacts for the boltzmann LXD container build |
| firefox-fourier/0001-rdd-allow-stateless-v4l2-request-api.patch | The Firefox RDD sandbox patch (allows /dev/media*; cap-filter widened for stateless decoders) |
| firefox-fourier/PKGBUILD-overlay.md | PKGBUILD overlay strategy: verified working sequence |
| firefox-fourier/bootstrap.sh | Reproducible bootstrap script (run as builder inside the firefox-fourier LXD) |

Always-current:

| File | What it is |
| --- | --- |
| README.md | This file |
| libva-v4l2-request-fourier/ | The fork (separate repo: marfrit/libva-v4l2-request-fourier) |
| references/ | External docs: kernel source excerpts, Mozilla bugzilla notes |

Build infrastructure

iter3 introduced a remote build host: firefox-fourier LXD container on boltzmann (RK3588 aarch64, 8 cores, 24 GB RAM, NVMe /build). Provisioned by the his agent, accessed as ssh -J boltzmann builder@firefox-fourier. Used to compile Firefox 150.0.1 with the iter3 sandbox patch ("firefox-fourier" build).
