claude-noether 080490d4a7 firefox-fourier: add user-facing README explaining quick vs proper path
Two paths for HW decode through Firefox:
1. Easy: stock Firefox + MOZ_DISABLE_RDD_SANDBOX=1 (sandbox off,
   defense-in-depth lost — OK for personal-machine use)
2. Proper: apply 0001-rdd-allow-stateless-v4l2-request-api.patch to
   firefox-150.0.1 source, build (Arch overlay or generic mach build),
   install. Sandbox stays on; HW decode works.

Patch covers all three sandbox gates discovered iter3-5:
- Broker: cap-filter widening + /dev/media* enumeration
- Seccomp: ioctl magic byte '|' (linux/media.h)

README also points at the companion libva-v4l2-request-fourier repo
which carries the libva-side fixes (request_fd lifecycle, DPB
FFmpeg-semantics, B-slice L1, multi-context safety) needed alongside
the Firefox patch.

Top-level README cross-link added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 17:48:34 +00:00

# libva-multiplanar
Single-question campaign on the libva V4L2-stateless backend: **make multi-planar libva work, end-to-end, on Rockchip hantro hardware, for production VA-API consumers (Brave/Chromium, Firefox via libavcodec, mpv `--hwdec=vaapi`, vainfo as smoke test)**.
The deliverable is a libva-v4l2-request fork that any VA-API consumer can dlopen to get hardware-decoded NV12 dmabufs for H.264 (initially) and MPEG-2. The first target is PineTab2 RK3568; the same plumbing is intended to extend to RK3399 (fresnel) and RK3588 (boltzmann/ampere) once the RK3568 path is solid.
The fork lives as a subdirectory of this campaign:
- [`libva-v4l2-request-fourier/`](libva-v4l2-request-fourier/) — clone of `bootlin/libva-v4l2-request` with our `master` ahead. Existing substrate: see its [`STUDY.md`](libva-v4l2-request-fourier/STUDY.md) for the build-cleanly + probe + control-flow + WIP-tracing work landed before this campaign opened.
This README is the Claude-facing entry point for resumption after compaction. Read it first.
## Origin
`fourier_attribution` campaign closed 2026-05-04 with the per-package wheat-vs-chaff verdict on bbb 1080p H.264 first-60s playback (PineTab2):
- **kwin-fourier**: WHEAT, robust. Removing it triples kwin CPU, drives Mali to 95 % peak-freq residency, doubles drops. Confirmed.
- **chromium-fourier**: WHEAT-but-fragile (Sonnet review's downgrade). Removing it (= falling back to stock Brave 1.89 / Chromium-147 base) costs 83 pp browser CPU (54 % → 137 %) — a magnitude consistent with **multi-planar libva enabling the hantro hardware-decode fast path**, but confounded with the Brave-147-vs-Chromium-149 base-version delta. Cell E (vanilla Chromium 149 control) was identified as the cheapest disambiguator and not yet run.
- **qt6-fourier**: CHAFF on this workload.
Phase 5 review: <https://dokuwiki.reauktion.de/doku.php?id=fourier:attribution_2026-05-03>
The chromium-fourier verdict's load-bearing claim is "multi-planar libva is the binding decode-side enabler on hantro." Whether that claim survives a clean control depends on this campaign's deliverable shipping. **The reverse is also true**: until a working multi-planar libva-v4l2-request lands, no consumer other than chromium-fourier-with-Step-1-patches has hardware decode on RK3568. Firefox VAAPI, mpv `--hwdec=vaapi`, gst-vaapi, vainfo all degrade to software or fall over.
## Process
Eight-plus-one phase loop per [`feedback_dev_process.md`](../../.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md). Phase 0 of each iteration is locked in `phase0_findings*.md` — read the latest iteration's substrate next.
Phase 5 (second-model review) and Phase 8 (iteration close + memory entry) follow the predecessor cadence — invoke the sonnet subagent for the review pattern.
Per the [`feedback_replicate_baseline_first.md`](../../.claude/projects/-home-mfritsche-src-kwin-overlay-subsurface/memory/feedback_replicate_baseline_first.md) lesson: any binding cell in this campaign anchors to in-session-acquired data. The migrated STUDY.md material and ohm_gl_fix patch-correctness audits are reference history, not threshold sources.
## Iteration history
| Iter | Status | Locked question | Outcome |
|---|---|---|---|
| 1 | Closed 2026-05-04 | "Does multi-planar libva-v4l2-request decode H.264 to NV12 dmabufs on hantro for any consumer?" | YES. vaapi-copy + Firefox-with-sandbox-bypass + vainfo all engage hantro. Documented bugs: surface-export DMA-BUF lifecycle race, multi-resolution session corruption, Mesa WSI 64-pitch alignment. See `phase8_iteration1_close.md`. |
| 2 | Closed 2026-05-04 | "Harden the iter1 deliverable: fix the three known bugs without regressing scope." | DONE. Fix 1 (resolution-change format-cache invalidation), Fix 2 (DRM_FORMAT_MOD_INVALID conditional for non-64 pitch), Fix 3 (decoupled `cap_pool` with LRU recycling for DMA-BUF lifecycle). mpv vaapi DMA-BUF playback "smooth" per operator inspection. See `phase8_iteration2_close.md`. |
| 3 | Closed 2026-05-05 | "F+A: verify the Firefox RDD sandbox hypothesis by patched-binary, while resolving the carryover frame-11 EINVAL on the same rig." | F GREEN — patched Firefox decodes through libva without `MOZ_DISABLE_RDD_SANDBOX=1` (broker policy + seccomp ioctl `'\|'` allow + driver `select() → poll()` migration). A REPRODUCED — frame-11 EINVAL fires deterministically on a single-slice P-frame, Y2 instrumentation logs the failing controls. Track A's fix deferred to iter4. See `phase8_iteration3_close.md`. |
| 4 | Closed 2026-05-05 | "Track A solo — fix the iter1+2+3 carryover frame-11 EINVAL." | GREEN. Three correctness fixes landed (DPB `fields=FRAME_REF` + skip stale entries, fresh `request_fd` per frame, B-slice L1 reflist `.fields` copy-paste). mpv direct stress test verified 2130 BeginPictures over 90s with 0 EINVAL events of any kind — real-time HW decode through libva-v4l2-request-fourier. See `phase8_iteration4_close.md`. |
| 5 | Closed 2026-05-05 | "A+G+B+E quad: DEBUG sweep + PGO-disabled Firefox rebuild + libplacebo segfault + multi-context safety." | GREEN, all four tracks. ~339 lines of instrumentation removed (iter1+iter3+iter4 noise) — driver builds clean, per-frame log noise zero. firefox-fourier 150.0.1-1.1 rebuilt non-PGO (169 MB libxul, 21× smaller, 2.7× faster decode). LAST_OUTPUT_* moved per-driver-data. mpv `--vo=gpu` 0 segfaults. One iter6+ caveat: cap_pool resolution-change race latent under untested consumer probe patterns (Phase 5 sonnet C4). See `phase8_iteration5_close.md`. |
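
Iteration 2's Fix 2 ("DRM_FORMAT_MOD_INVALID conditional for non-64 pitch") can be pictured with the minimal sketch below. It is an illustration, not the fork's code: the function name `export_modifier` and the constant `PITCH_ALIGNMENT` are hypothetical, and the direction of the conditional (report LINEAR only for 64-byte-aligned pitches, INVALID otherwise) is an assumption read off the table entry. The two modifier values are the ones defined in `drm_fourcc.h`.

```python
# Illustrative sketch only; names are hypothetical, not taken from the fork.
DRM_FORMAT_MOD_LINEAR = 0x0
DRM_FORMAT_MOD_INVALID = 0x00FFFFFFFFFFFFFF  # value from drm_fourcc.h
PITCH_ALIGNMENT = 64  # assumed, per the "Mesa WSI 64-pitch alignment" note


def export_modifier(pitch_bytes: int) -> int:
    """Pick the modifier to report when exporting a linear NV12 dmabuf.

    Assumption: an explicit LINEAR modifier is only reported when the
    pitch is a multiple of 64 bytes; otherwise the modifier is left
    unspecified (INVALID) so the importer falls back to implicit layout.
    """
    if pitch_bytes <= 0:
        raise ValueError("pitch must be positive")
    if pitch_bytes % PITCH_ALIGNMENT == 0:
        return DRM_FORMAT_MOD_LINEAR
    return DRM_FORMAT_MOD_INVALID
```

Under this assumption a 1920-byte pitch (1920 = 30 × 64) would be exported as LINEAR, while an 854-byte pitch would be exported with the modifier left as INVALID.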
## Predecessor work that this campaign builds on
State (carry-over) — fork content, file:line pointers, contract analyses:
- [`libva-v4l2-request-fourier/STUDY.md`](libva-v4l2-request-fourier/STUDY.md) — Phase 0 / Phase 2 substrate already written, dated through 2026-05-02. Goal statement, why-the-fork-exists, build-cleanly stack of fixes, probe/control-flow fixes, eager-probe rationale, failure-mode-as-of-2026-04-26 (Brave-side wall is chromeos pipeline, not libva surface stack).
- [`libva-v4l2-request-fourier/`](libva-v4l2-request-fourier/) git history: 12 commits ahead of bootlin tip `a3c2476`, including kernel-UAPI renames, NV12 multi-plane format entry, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE probe fallback, and recent (2026-05-02) WIP entry-point tracing for Brave's libva surface stack.
- [`~/src/ohm_gl_fix/phase6/step1/`](../ohm_gl_fix/phase6/step1/) — Step 1 patches 0001..0018, contract-correct port of libva-v4l2-request to hantro multi-planar / Chromium-149 era. Audit at `audit_0008_decode_params_2026-05-01.md`. **vainfo enumerates H.264 profiles cleanly on this binary; Brave's chromium-fourier 149 binary engages this libva path end-to-end** (per `fourier_attribution` cell A's 54 % browser CPU vs cell B's 137 %). Step 1 patches are the working substrate that this campaign should reconcile against the libva-v4l2-request-fourier `master` and either fold-in or supersede.
- [`~/src/ohm_gl_fix/`](../ohm_gl_fix/) — closed campaign, README documents the Step 1 audit and the test corpus (`bbb_1080p30_h264.mp4` etc.).
- [`~/src/fourier_attribution/`](../fourier_attribution/) — most recent campaign. Pay attention to:
  - Cell A (chromium-fourier on, libva-multi-planar engaged): browser_cpu_median = 54.4 %, fps = 24.0, drops_60s = 12.
  - Cell B (Brave 1.89 / Chromium 147, libva path absent or broken): browser_cpu_median = 137.15 %, fps = 23.18, drops_60s = 16.
  - `phase4_findings.md` for cross-cell verdict; `phase5_review_sonnet_2026-05-04.md` for the reviewer's pushback on the chromium-fourier conclusion.
Reference history (context, NOT data this campaign anchors to) — orthogonal scanout-plane constraint:
- [`~/src/kwin_overlay_subsurface/phase2_source_findings.md`](../kwin_overlay_subsurface/phase2_source_findings.md) — rockchip-drm RK3568 plane format/modifier table. **Plane 39 (Primary, NV12 LINEAR) is the only NV12-capable scanout; KWin owns it. Plane 45 (Overlay) advertises zero NV12.** Therefore: even when libva-multi-planar produces a clean NV12 dmabuf, no scanout plane is reachable while KWin runs, and some component must GL-composite NV12 → RGB before display. **This is orthogonal to libva**: libva is on the decode side, the scanout-plane gap is on the display side. They're separate problems with separate fixes.
- [`~/src/x11-session-research/phase0_evidence/x11_baseline_2026-05-03/x1_summary.md`](../x11-session-research/phase0_evidence/x11_baseline_2026-05-03/x1_summary.md) — confirms the scanout-plane gap isn't fixable by switching session servers either. mpv-xv {SW,HW} and mpv-gpu {SW,HW} all leave Plane 39 in XRGB8888 throughout. It's a kernel/Mesa/Xorg-DDX gap, not a hardware-decoding gap. **Don't expect this campaign to "fix the video pipeline end-to-end"** — fixing libva-multi-planar fixes the decode side; the scanout-plane question stays open after.
- [`~/src/kwin_overlay_subsurface/`](../kwin_overlay_subsurface/) — closed without patch (`phase8_handover.md`); its `feedback_replicate_baseline_first.md` lesson is the discipline that this campaign inherits.
**Firefox patch** for the RDD sandbox: see [`firefox-fourier/README.md`](firefox-fourier/README.md). Two ways to get HW decode working — quick (env var, sandbox off) or proper (patched Firefox, sandbox kept on). Patch is ~50 lines, applies cleanly to firefox-150.0.1 source.
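
One of the gates the patch opens, per the iteration 3 outcome in the history table, is a seccomp allow for ioctls whose magic byte is `'|'` (the media controller / request API in `linux/media.h`). The sketch below shows how that byte sits inside an ioctl request number under the generic Linux encoding (bits 8..15). The request values are computed from the kernel's `_IO`/`_IOR` macros; the allow-list and the `would_allow` helper are hypothetical and only mimic the idea, they are not Mozilla's sandbox code.

```python
# ioctl request numbers, generic Linux encoding (asm-generic/ioctl.h):
# bits 0..7 = nr, bits 8..15 = type ("magic"), 16..29 = size, 30..31 = dir.
MEDIA_IOC_REQUEST_ALLOC = 0x80047C05  # _IOR('|', 0x05, int)
MEDIA_REQUEST_IOC_QUEUE = 0x00007C80  # _IO('|', 0x80)
VIDIOC_QUERYCAP = 0x80685600          # _IOR('V', 0, struct v4l2_capability)
TCGETS = 0x00005401                   # a tty ioctl, as a counter-example

# Hypothetical allow-list for illustration: V4L2 ('V') plus media ('|').
ALLOWED_MAGIC = {"V", "|"}


def ioctl_magic(request: int) -> str:
    """Return the magic (type) byte of an ioctl request as a character."""
    return chr((request >> 8) & 0xFF)


def would_allow(request: int) -> bool:
    """Mimic a filter that admits ioctls by magic byte only."""
    return ioctl_magic(request) in ALLOWED_MAGIC
```

A filter that only knows `'V'` would pass `VIDIOC_QUERYCAP` but reject `MEDIA_IOC_REQUEST_ALLOC`, which is consistent with stateless decode failing under the stock sandbox even though plain V4L2 ioctls get through.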
External reference:
- Mozilla bug 1833354 / 1965646 (Firefox HW decode on RK3566/RK3588 explicitly needs libva-v4l2-request, not v4l2-m2m).
- Bootlin upstream `bootlin/libva-v4l2-request` — dormant since 2021, written for single-plane sunxi-cedrus.
- Collabora's `cros-codecs` (Rust, bypasses libva) — strategic replacement, not shipping soon.
- Other dormant forks (per `STUDY.md`): jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov — none ship multi-planar.
## In-scope (LOCKED 2026-05-04 in phase0_findings.md)
- libva-v4l2-request fork (`libva-v4l2-request-fourier/`), **backend only** — multi-planar correctness across the V4L2-stateless lifecycle: format probing (single-plane fallback to multi-plane), control protocol sequencing, surface-handle export, dmabuf modifier negotiation.
- H.264 first; MPEG-2 next. HEVC explicitly out.
- Hardware target: **ohm RK3568 hantro G1/G2 first iteration only.** fresnel RK3399 + ampere/boltzmann RK3588 explicit future iterations after ohm path is solid.
- Test consumers: vainfo, mpv `--hwdec=vaapi`, Firefox `media.ffmpeg.vaapi.enabled`, chromium-fourier 149 (regression check). Brave 1.89 deferred (chromeos-pipeline gating, not a libva-side problem).
- Phase 1 success criterion: **boolean correctness** — "libva accepted + providing access to hardware decoder". Performance metrics deferred to follow-up iteration.
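
The "single-plane fallback to multi-plane" probing named in the first scope item can be sketched as a pure function over the capability bits a device reports through `VIDIOC_QUERYCAP`. The constants are the values from `linux/videodev2.h`; the helper name `select_buffer_types` and the exact try-single-plane-first ordering are assumptions for illustration, not the fork's actual probe code.

```python
# Capability bits and buffer types from linux/videodev2.h.
V4L2_CAP_VIDEO_M2M_MPLANE = 0x00004000
V4L2_CAP_VIDEO_M2M = 0x00008000
V4L2_BUF_TYPE_VIDEO_CAPTURE = 1
V4L2_BUF_TYPE_VIDEO_OUTPUT = 2
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE = 9
V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE = 10


def select_buffer_types(device_caps: int) -> tuple[int, int]:
    """Hypothetical helper: return (output_type, capture_type).

    For a memory-to-memory decoder the OUTPUT queue carries the
    bitstream and the CAPTURE queue carries decoded frames. The
    single-plane API is tried first, then the multi-plane API.
    """
    if device_caps & V4L2_CAP_VIDEO_M2M:
        return (V4L2_BUF_TYPE_VIDEO_OUTPUT, V4L2_BUF_TYPE_VIDEO_CAPTURE)
    if device_caps & V4L2_CAP_VIDEO_M2M_MPLANE:
        return (V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
                V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)
    raise ValueError("device is not a memory-to-memory video device")
```

A device that advertises only `V4L2_CAP_VIDEO_M2M_MPLANE` ends up on the `*_MPLANE` buffer types, which is the case a backend written for single-plane hardware never reaches.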
## Out-of-scope (LOCKED 2026-05-04)
- Front-end libva (API library). Backend only.
- Other hardware: fresnel, ampere, boltzmann — separate iterations.
- HEVC, VP8, VP9, AV1 codecs.
- Performance metrics (CPU%, fps, drops_60s, panfrost freq).
- KWin / Wayland scanout-plane work — orthogonal (`kwin_overlay_subsurface` closed without patch).
- `cros-codecs` Rust replacement (per [`user_stance_rust.md`](../../.claude/projects/-home-mfritsche-src-fourier/memory/user_stance_rust.md)).
- Bootlin / Collabora upstreaming (per [`feedback_no_upstream.md`](../../.claude/projects/-home-mfritsche-src/memory/feedback_no_upstream.md)).
## Hardware target
- ohm — PineTab2, Rockchip RK3568 (4× Cortex-A55, Mali-G52 MP2, hantro G1/G2 VPU). Kernel `6.19.10-danctnix1-1-pinetab2`. **Primary measurement target.**
- (later) fresnel — Pinebook Pro, Rockchip RK3399 (hantro G1, no G2). EndeavourOS-ARM custom OC kernel — see [`reference_fresnel_kernel_constraints.md`](../../.claude/projects/-home-mfritsche-src-fourier/memory/reference_fresnel_kernel_constraints.md).
- (much later) ampere/boltzmann — RK3588 (hantro VDPU381). Adding VDPU381 is a code addition this fork doesn't have today.
## Non-upstreaming default
Inherited from the predecessors. Patches must be aligned to upstream in syntax and semantics, but no PR/MR/bug-report happens without explicit operator instruction. Bootlin upstream is dormant; once this campaign reaches a defensible state, Markus may wish to engage Bootlin / Collabora / Hans de Goede / Jernej Škrabec — that's a separate explicit decision.
## Repository layout
```
~/src/libva-multiplanar/ <- this campaign (its own git repo for findings)
├── README.md <- this file
├── (worklist.md, phase0_findings.md, ... — created as phases land)
└── libva-v4l2-request-fourier/ <- the actual fork (separate git repo)
    ├── .git/ <- origin: marfrit/libva-v4l2-request-fourier
    │            upstream: bootlin/libva-v4l2-request
    ├── STUDY.md <- pre-existing Phase 0/2 substrate
    ├── src/ <- libva-v4l2-request source tree
```
The campaign repo and the fork repo are **separate git repositories** — campaign findings and fork commits are versioned independently. This matches the operator's general pattern (`ohm_gl_fix` campaign vs the bootlin fork it patched).
Operator-facing repo URL TBD: probably `git.reauktion.de/marfrit/libva-multiplanar` once the campaign produces something worth pushing. The fork is already at `git.reauktion.de/marfrit/libva-v4l2-request-fourier`.
## File map
Iteration 1 (closed):
| File | What it is |
|---|---|
| `phase0_findings.md` | iter1 substrate: locked research question, locked scope, predecessor state, source-read references |
| `phase0_evidence/` | iter1 inventory + baseline anchor |
| `phase4_iter2_plan.md` | (mis-named — actually iter1 Phase 4) diff against FFmpeg + hantro kernel source identifying the bug fixed in iter1 |
| `phase5_review_2026-05-04.md` | iter1 sonnet review |
| `phase6_findings.md` | iter1 Phase 6: hantro decodes real H.264 pixels |
| `phase7_findings.md` | iter1 Phase 7 verification: vaapi-copy works, surface-export bug surfaces |
| `phase8_iteration1_close.md` | iter1 close |
| `diff_against_ffmpeg.md` | Cross-reference of fork divergence vs FFmpeg's V4L2 request-API code |
Iteration 2 (closed):
| File | What it is |
|---|---|
| `phase0_findings_iter2.md` | iter2 substrate |
| `phase2_iter2_analysis.md` | iter2 situation analysis |
| `phase5_review_iter2_2026-05-04.md` | iter2 sonnet review (3 architecture blockers + REQBUFS gap) |
| `phase8_iteration2_close.md` | iter2 close (Fix 1 + Fix 2 + Fix 3 landed) |
Iteration 3 (closed):
| File | What it is |
|---|---|
| `phase0_findings_iter3.md` | iter3 substrate. **Read this for current iteration state.** |
| `phase2_iter3_situation.md` | Mozilla sandbox source verbatim (broker policy + cap filter) |
| `phase3_iter3_baseline.md` | Pre-patch baseline anchor (ohm offline; iter2-close evidence anchored) |
| `phase4_iter3_plan.md` | Patch authorship + PKGBUILD overlay + Track A diagnostic plan |
| `phase5_iter3_review.md` | iter3 Phase 5 sonnet review (Y1 patch idiom fix, Y2 driver error_idx instrumentation, B-slice bug) |
| `phase6_iter3_findings.md` | iter3 Phase 6 build-side surprises (proper unified-diff, no `--enable-v4l2`, GPG rotation, ALARM-stale wasi cluster, onnxruntime gap) |
| `firefox-fourier/` | Patch + PKGBUILD overlay artifacts for the boltzmann LXD container build |
| `firefox-fourier/0001-rdd-allow-stateless-v4l2-request-api.patch` | The Firefox RDD sandbox patch (allows /dev/media\*; cap-filter widened for stateless decoders) |
| `firefox-fourier/PKGBUILD-overlay.md` | PKGBUILD overlay strategy — verified working sequence |
| `firefox-fourier/bootstrap.sh` | Reproducible bootstrap script (run as `builder` inside the firefox-fourier LXD) |
Always-current:
| File | What it is |
|---|---|
| `README.md` | This file |
| `libva-v4l2-request-fourier/` | The fork (separate repo: `marfrit/libva-v4l2-request-fourier`) |
| `references/` | External docs: kernel source excerpts, Mozilla bugzilla notes |
## Build infrastructure
iter3 introduced a remote build host: `firefox-fourier` LXD container on `boltzmann` (RK3588 aarch64, 8 cores, 24 GB RAM, NVMe `/build`). Provisioned by the `his` agent, accessed as `ssh -J boltzmann builder@firefox-fourier`. Used to compile Firefox 150.0.1 with the iter3 sandbox patch ("firefox-fourier" build).