libva-multiplanar/README.md
claude-noether ec769a9687 docs: clarify Rockchip silicon across operative docs (RK3566)
PineTab2 is Rockchip RK3566 silicon, not RK3568. The hantro driver
attaches via the rockchip,rk3568-vpu DT compatible because RK3566/
RK3568 silicon is close enough to share that variant. The proper
RK3566 mainline driver target (rkvdec2 / vdpu346) has no kernel
support yet — Christian Hewitt's patch series LKML 2025/12/26/206
is unmerged.

Updated operative docs to use the consistent form:
"PineTab2 (Rockchip RK3566 silicon; hantro driver via the
rockchip,rk3568-vpu DT compatible)" or shorter variants.

Files updated:
- README.md (campaign top-level): TL;DR, deliverable, KWin link,
  hardware target, hardware listing
- firefox-fourier/README.md: tested-on line
- phase8_iteration7_close.md: hardware carry
- phase8_iteration6_close.md: hardware carry, MPEG-2 drop
  rationale
- phase0_findings_iter7.md: predecessor summary, fourier-fresnel
  description, hardware carry
- phase2_iter7_situation.md: msync hypothesis hardware reference

Historical iter1-iter5 phase docs left as-is — they're snapshots
of what the campaign believed at the time. The canonical source
for the silicon-ID correction is track_F_research_2026-05-06.md
(commit 358801b).

Not a correctness change. The campaign's empirical evidence is
unaffected — the hantro/rk3568-vpu driver path that we exercised
was always the actual decode path on PineTab2 silicon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:39:28 +00:00

# libva-multiplanar
## TL;DR
**libva** (the reference implementation of VA-API, the Video Acceleration API) is the standard userspace shim that lets video apps (Firefox, Chromium, mpv, GStreamer, VLC) talk to GPU/VPU hardware decoders without per-vendor code. Apps call libva; libva loads a backend driver (a `.so` in `/usr/lib/dri/`) for the local hardware. Common backends: `intel-iHD` (Intel iGPUs), `mesa-va-gallium` (AMD), `nvidia-vaapi-driver`, and **`v4l2_request`** for V4L2-stateless ARM/embedded decoders (Rockchip hantro, Allwinner cedrus, sun*, etc.). This campaign hardens a fork of the v4l2-request backend so that it actually works end-to-end, for real consumers, on Rockchip hantro. The target is PineTab2: Rockchip **RK3566** silicon, with the hantro driver attaching via the `rockchip,rk3568-vpu` DT compatible because RK3566 and RK3568 are close enough to share that variant; mainline `rkvdec2`/`vdpu346` support for RK3566 is not yet merged.
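
As a rough illustration of the backend lookup described above: libva resolves a driver name to a shared object named `<name>_drv_video.so` inside its driver directory, and honors the `LIBVA_DRIVER_NAME` and `LIBVA_DRIVERS_PATH` environment variables. The sketch below only computes candidate paths. The default directory is taken from the paragraph above and differs between distributions, and the helper name `candidate_driver_paths` is hypothetical.

```python
import os

DEFAULT_DRIVER_DIR = "/usr/lib/dri"  # assumption: distro default, as stated above


def candidate_driver_paths(default_name, env=None):
    """Return the backend .so paths to try, one per search directory."""
    env = os.environ if env is None else env
    # LIBVA_DRIVER_NAME overrides the autodetected driver name.
    name = env.get("LIBVA_DRIVER_NAME") or default_name
    # LIBVA_DRIVERS_PATH is a colon-separated list of directories.
    search = env.get("LIBVA_DRIVERS_PATH") or DEFAULT_DRIVER_DIR
    return [os.path.join(d, f"{name}_drv_video.so") for d in search.split(":") if d]


print(candidate_driver_paths("v4l2_request", env={}))
```

With an empty environment this yields a single path ending in `v4l2_request_drv_video.so`; exporting `LIBVA_DRIVER_NAME=v4l2_request` is the usual way to force this backend when autodetection picks something else.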
**`firefox-fourier`** ([build instructions](firefox-fourier/README.md)) exists because Firefox 150's RDD-process sandbox blocks two things V4L2-stateless decoders need: `/dev/media*` request-API nodes (broker policy never lists them) and `linux/media.h` ioctls (seccomp policy doesn't admit magic byte `'|'`). Stock Firefox on hantro/cedrus/sun* hardware therefore SW-decodes everything unless you set `MOZ_DISABLE_RDD_SANDBOX=1` (which turns the whole sandbox off). The `firefox-fourier` patch — ~50 lines across two files — opens *just* what V4L2-stateless decoders need, sandbox stays on. Tested working with the iter5-end libva backend.
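
To make the "magic byte `'|'`" remark concrete, the sketch below composes the three request-API ioctl numbers from `linux/media.h` and extracts their type field. It assumes the generic Linux `_IOC` bit layout (as used on arm64 and x86) and a 4-byte `int`. The `allowed` helper is a deliberately simplified stand-in for a seccomp allow-list, not Firefox's actual policy code.

```python
IOC_NONE, IOC_READ = 0, 2  # direction bits in the generic _IOC layout


def ioc(direction, type_char, nr, size):
    """Compose an ioctl request number: dir<<30 | size<<16 | type<<8 | nr."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


def ioc_type(request):
    """Extract the 8-bit 'magic' type field (bits 8..15) as a character."""
    return chr((request >> 8) & 0xFF)


# Definitions mirrored from linux/media.h.
MEDIA_IOC_REQUEST_ALLOC = ioc(IOC_READ, "|", 0x05, 4)   # _IOR('|', 0x05, int)
MEDIA_REQUEST_IOC_QUEUE = ioc(IOC_NONE, "|", 0x80, 0)   # _IO('|', 0x80)
MEDIA_REQUEST_IOC_REINIT = ioc(IOC_NONE, "|", 0x81, 0)  # _IO('|', 0x81)


def allowed(request, allowed_types):
    """Hypothetical filter: admit a request only if its type byte is listed."""
    return ioc_type(request) in allowed_types


for req in (MEDIA_IOC_REQUEST_ALLOC, MEDIA_REQUEST_IOC_QUEUE, MEDIA_REQUEST_IOC_REINIT):
    print(hex(req), ioc_type(req), allowed(req, {"V"}), allowed(req, {"V", "|"}))
```

A filter that admits only the `'V'` (videodev2) type rejects all three calls, which matches the symptom described above: the V4L2 device opens, but no request can be allocated or queued.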
**Sister campaigns** in the operator's "fourier" series — same hardware, different layers of the pipeline:
- [`kwin-fourier`](https://git.reauktion.de/marfrit/marfrit-packages/src/branch/master/arch/kwin-fourier) — KWin compositor patches addressing scanout-plane / dmabuf-fence behavior on rockchip-drm (the kernel DRM driver that handles RK3566/RK3568/RK3588 display). Decode-side fix in this campaign + render-side fix in kwin-fourier are complementary; together they make HW video work end-to-end on PineTab2.
- `chromium-fourier` — Chromium fork enabling VAAPI on aarch64 + GL-stack workarounds for Mali-G52/Panfrost. Stock Brave doesn't even reach VAAPI on this rig (GPU process dies at GL bindings); chromium-fourier unblocks it. Verifying compatibility with iter5 driver is iter6 candidate C.
---
Single-question campaign on the libva V4L2-stateless backend: **make multi-planar libva work, end-to-end, on Rockchip hantro hardware, for production VA-API consumers (Brave/Chromium, Firefox via libavcodec, mpv `--hwdec=vaapi`, vainfo as smoke test)**.
The deliverable is a libva-v4l2-request fork that any VA-API consumer can dlopen and get H.264 hardware-decoded NV12 dmabufs out of, on PineTab2 (RK3566 via hantro/rk3568-vpu) first, with the same plumbing intended to extend to RK3399 (fresnel) and RK3588 (boltzmann/ampere) once the PineTab2 path is solid. (MPEG-2 was iter1 backlog, dropped at iter6 close — CPU handles MPEG-2 fine on the A55 cluster.)
The fork lives as a subdirectory of this campaign:
- [`libva-v4l2-request-fourier/`](libva-v4l2-request-fourier/) — clone of `bootlin/libva-v4l2-request` with our `master` ahead. Existing substrate: see its [`STUDY.md`](libva-v4l2-request-fourier/STUDY.md) for the build-cleanly + probe + control-flow + WIP-tracing work landed before this campaign opened.
This README is the Claude-facing entry point for resumption after compaction. Read it first.
## Origin
`fourier_attribution` campaign closed 2026-05-04 with the per-package wheat-vs-chaff verdict on bbb 1080p H.264 first-60s playback (PineTab2):
- **kwin-fourier**: WHEAT, robust. Removing it triples kwin CPU, drives Mali to 95 % peak-freq residency, doubles drops. Confirmed.
- **chromium-fourier**: WHEAT-but-fragile (Sonnet review's downgrade). Removing it (= falling back to stock Brave 1.89 / Chromium-147 base) costs 83 pp browser CPU (54 % → 137 %) — a magnitude consistent with **multi-planar libva enabling the hantro hardware-decode fast path**, but confounded with the Brave-147-vs-Chromium-149 base-version delta. Cell E (vanilla Chromium 149 control) was identified as the cheapest disambiguator and not yet run.
- **qt6-fourier**: CHAFF on this workload.
Phase 5 review: <https://dokuwiki.reauktion.de/doku.php?id=fourier:attribution_2026-05-03>
The chromium-fourier verdict's load-bearing claim is "multi-planar libva is the binding decode-side enabler on hantro." Whether that claim survives a clean control depends on this campaign's deliverable shipping. **The reverse is also true**: until a working multi-planar libva-v4l2-request lands, no consumer other than chromium-fourier-with-Step-1-patches has hardware decode on PineTab2 (RK3566 via hantro/rk3568-vpu). Firefox VAAPI, mpv `--hwdec=vaapi`, gst-vaapi, vainfo all degrade to software or fall over.
## Process
Eight-plus-one phase loop per [`feedback_dev_process.md`](../../.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md). Phase 0 of each iteration is locked in `phase0_findings*.md` — read the latest iteration's substrate next.
Phase 5 (second-model review) and Phase 8 (iteration close + memory entry) follow the predecessor cadence — invoke the sonnet subagent for the review pattern.
Per the [`feedback_replicate_baseline_first.md`](../../.claude/projects/-home-mfritsche-src-kwin-overlay-subsurface/memory/feedback_replicate_baseline_first.md) lesson: any binding cell in this campaign anchors to in-session-acquired data. The migrated STUDY.md material and ohm_gl_fix patch-correctness audits are reference history, not threshold sources.
## Iteration history
| Iter | Status | Locked question | Outcome |
|---|---|---|---|
| 1 | Closed 2026-05-04 | "Does multi-planar libva-v4l2-request decode H.264 to NV12 dmabufs on hantro for any consumer?" | YES. vaapi-copy + Firefox-with-sandbox-bypass + vainfo all engage hantro. Documented bugs: surface-export DMA-BUF lifecycle race, multi-resolution session corruption, Mesa WSI 64-pitch alignment. See `phase8_iteration1_close.md`. |
| 2 | Closed 2026-05-04 | "Harden the iter1 deliverable: fix the three known bugs without regressing scope." | DONE. Fix 1 (resolution-change format-cache invalidation), Fix 2 (DRM_FORMAT_MOD_INVALID conditional for non-64 pitch), Fix 3 (decoupled `cap_pool` with LRU recycling for DMA-BUF lifecycle). mpv vaapi DMA-BUF playback "smooth" per operator inspection. See `phase8_iteration2_close.md`. |
| 3 | Closed 2026-05-05 | "F+A: verify the Firefox RDD sandbox hypothesis by patched-binary, while resolving the carryover frame-11 EINVAL on the same rig." | F GREEN — patched Firefox decodes through libva without `MOZ_DISABLE_RDD_SANDBOX=1` (broker policy + seccomp ioctl `'\|'` allow + driver `select() → poll()` migration). A REPRODUCED — frame-11 EINVAL fires deterministically on a single-slice P-frame, Y2 instrumentation logs the failing controls. Track A's fix deferred to iter4. See `phase8_iteration3_close.md`. |
| 4 | Closed 2026-05-05 | "Track A solo — fix the iter1+2+3 carryover frame-11 EINVAL." | GREEN. Three correctness fixes landed (DPB `fields=FRAME_REF` + skip stale entries, fresh `request_fd` per frame, B-slice L1 reflist `.fields` copy-paste). mpv direct stress test verified 2130 BeginPictures over 90s with 0 EINVAL events of any kind — real-time HW decode through libva-v4l2-request-fourier. See `phase8_iteration4_close.md`. |
| 5 | Closed 2026-05-05 | "A+G+B+E quad: DEBUG sweep + PGO-disabled Firefox rebuild + libplacebo segfault + multi-context safety." | GREEN, all four tracks. ~339 lines of instrumentation removed (iter1+iter3+iter4 noise) — driver builds clean, per-frame log noise zero. firefox-fourier 150.0.1-1.1 rebuilt non-PGO (169 MB libxul, 21× smaller, 2.7× faster decode). LAST_OUTPUT_* moved per-driver-data. mpv `--vo=gpu` 0 segfaults. One iter6+ caveat: cap_pool resolution-change race latent under untested consumer probe patterns (Phase 5 sonnet C4). See `phase8_iteration5_close.md`. |
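
Fix 2 in the iter2 row can be pictured as a small decision made at surface-export time. The sketch below assumes the rule is "advertise `DRM_FORMAT_MOD_LINEAR` only when the pitch is a multiple of 64 bytes, otherwise advertise `DRM_FORMAT_MOD_INVALID` so the importer does not assume a layout". The exact condition in the fork may differ, and the function name is hypothetical. The two modifier values are the ones defined in `drm_fourcc.h`.

```python
# Constants as defined in drm_fourcc.h.
DRM_FORMAT_MOD_LINEAR = 0
DRM_FORMAT_MOD_INVALID = 0x00FFFFFFFFFFFFFF

PITCH_ALIGN = 64  # assumption: the alignment the importer expects, in bytes


def export_modifier(pitch_bytes):
    """Pick the modifier to advertise for an exported surface (assumed rule)."""
    if pitch_bytes % PITCH_ALIGN == 0:
        return DRM_FORMAT_MOD_LINEAR
    # Unaligned pitch: do not claim a layout the importer may mis-handle.
    return DRM_FORMAT_MOD_INVALID


for pitch in (1920, 1280, 854):
    print(pitch, hex(export_modifier(pitch)))
```

Under this assumption, 1920- and 1280-byte pitches export as linear, while an 854-byte pitch falls back to the invalid modifier.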
## Predecessor work that this campaign builds on
State (carry-over) — fork content, file:line pointers, contract analyses:
- [`libva-v4l2-request-fourier/STUDY.md`](libva-v4l2-request-fourier/STUDY.md) — Phase 0 / Phase 2 substrate already written, dated through 2026-05-02. Goal statement, why-the-fork-exists, build-cleanly stack of fixes, probe/control-flow fixes, eager-probe rationale, failure-mode-as-of-2026-04-26 (Brave-side wall is chromeos pipeline, not libva surface stack).
- [`libva-v4l2-request-fourier/`](libva-v4l2-request-fourier/) git history: 12 commits ahead of bootlin tip `a3c2476`, including kernel-UAPI renames, NV12 multi-plane format entry, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE probe fallback, and recent (2026-05-02) WIP entry-point tracing for Brave's libva surface stack.
- [`~/src/ohm_gl_fix/phase6/step1/`](../ohm_gl_fix/phase6/step1/) — Step 1 patches 0001..0018, contract-correct port of libva-v4l2-request to hantro multi-planar / Chromium-149 era. Audit at `audit_0008_decode_params_2026-05-01.md`. **vainfo enumerates H.264 profiles cleanly on this binary; Brave's chromium-fourier 149 binary engages this libva path end-to-end** (per `fourier_attribution` cell A's 54 % browser CPU vs cell B's 137 %). Step 1 patches are the working substrate that this campaign should reconcile against the libva-v4l2-request-fourier `master` and either fold-in or supersede.
- [`~/src/ohm_gl_fix/`](../ohm_gl_fix/) — closed campaign, README documents the Step 1 audit and the test corpus (`bbb_1080p30_h264.mp4` etc.).
- [`~/src/fourier_attribution/`](../fourier_attribution/) — most recent campaign. Pay attention to:
  - Cell A (chromium-fourier on, libva-multi-planar engaged): browser_cpu_median = 54.4 %, fps = 24.0, drops_60s = 12.
  - Cell B (Brave 1.89 / Chromium 147, libva path absent or broken): browser_cpu_median = 137.15 %, fps = 23.18, drops_60s = 16.
  - `phase4_findings.md` for cross-cell verdict; `phase5_review_sonnet_2026-05-04.md` for the reviewer's pushback on the chromium-fourier conclusion.
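
The cell A versus cell B gap can be checked with a few lines of arithmetic. The numbers are copied from the two cells listed above; the helper name is hypothetical, and the result says nothing about the Brave-147-vs-Chromium-149 confound raised in the review.

```python
# Medians copied from the fourier_attribution cells listed above.
cell_a = {"browser_cpu_median": 54.4, "fps": 24.0, "drops_60s": 12}
cell_b = {"browser_cpu_median": 137.15, "fps": 23.18, "drops_60s": 16}


def cell_delta(with_patch, without_patch):
    """Per-metric difference (without - with), rounded to two decimals."""
    return {k: round(without_patch[k] - with_patch[k], 2) for k in with_patch}


delta = cell_delta(cell_a, cell_b)
print(delta)
```

The CPU delta comes out at 82.75 percentage points, which is the "83 pp" figure quoted in the Origin section; the fps and drop deltas are small by comparison.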
Reference history (context, NOT data this campaign anchors to) — orthogonal scanout-plane constraint:
- [`~/src/kwin_overlay_subsurface/phase2_source_findings.md`](../kwin_overlay_subsurface/phase2_source_findings.md) — rockchip-drm plane format/modifier table on PineTab2 (RK3566). **Plane 39 (Primary, NV12 LINEAR) is the only NV12-capable scanout; KWin owns it. Plane 45 (Overlay) advertises zero NV12.** Therefore: even when libva-multi-planar produces a clean NV12 dmabuf, no scanout plane is reachable while KWin runs, and some component must GL-composite NV12 → RGB before display. **This is orthogonal to libva**: libva is on the decode side, the scanout-plane gap is on the display side. They're separate problems with separate fixes.
- [`~/src/x11-session-research/phase0_evidence/x11_baseline_2026-05-03/x1_summary.md`](../x11-session-research/phase0_evidence/x11_baseline_2026-05-03/x1_summary.md) — confirms the scanout-plane gap isn't fixable by switching session servers either. mpv-xv {SW,HW} and mpv-gpu {SW,HW} all leave Plane 39 in XRGB8888 throughout. It's a kernel/Mesa/Xorg-DDX gap, not a hardware-decoding gap. **Don't expect this campaign to "fix the video pipeline end-to-end"** — fixing libva-multi-planar fixes the decode side; the scanout-plane question stays open after.
- [`~/src/kwin_overlay_subsurface/`](../kwin_overlay_subsurface/) — closed without patch (`phase8_handover.md`); its `feedback_replicate_baseline_first.md` lesson is the discipline that this campaign inherits.
**Firefox patch** for the RDD sandbox: see [`firefox-fourier/README.md`](firefox-fourier/README.md). Two ways to get HW decode working — quick (env var, sandbox off) or proper (patched Firefox, sandbox kept on). Patch is ~50 lines, applies cleanly to firefox-150.0.1 source.
External reference:
- Mozilla bug 1833354 / 1965646 (Firefox HW decode on RK3566/RK3588 explicitly needs libva-v4l2-request, not v4l2-m2m).
- Bootlin upstream `bootlin/libva-v4l2-request` — dormant since 2021, written for single-plane sunxi-cedrus.
- Collabora's `cros-codecs` (Rust, bypasses libva) — strategic replacement, not shipping soon.
- Other dormant forks (per `STUDY.md`): jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov — none ship multi-planar.
## In-scope (LOCKED 2026-05-04 in phase0_findings.md)
- libva-v4l2-request fork (`libva-v4l2-request-fourier/`), **backend only** — multi-planar correctness across the V4L2-stateless lifecycle: format probing (single-plane fallback to multi-plane), control protocol sequencing, surface-handle export, dmabuf modifier negotiation.
- H.264 first; MPEG-2 next. HEVC explicitly out.
- Hardware target: **ohm PineTab2 (RK3566 silicon, hantro driver via rk3568-vpu DT compatible) first iteration only.** fresnel RK3399 + ampere/boltzmann RK3588 explicit future iterations after ohm path is solid.
- Test consumers: vainfo, mpv `--hwdec=vaapi`, Firefox `media.ffmpeg.vaapi.enabled`, chromium-fourier 149 (regression check). Brave 1.89 deferred (chromeos-pipeline gating, not a libva-side problem).
- Phase 1 success criterion: **boolean correctness** — "libva accepted + providing access to hardware decoder". Performance metrics deferred to follow-up iteration.
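
The "single-plane fallback to multi-plane" probing named in the first bullet can be sketched as a two-step loop. The buffer-type values are the `enum v4l2_buf_type` constants from `videodev2.h`. The `try_format` callback is a hypothetical stand-in for whatever ioctl the backend actually issues (for example a format enumeration on the capture queue), so this shows the control flow only, not the fork's code.

```python
# enum v4l2_buf_type values from linux/videodev2.h.
V4L2_BUF_TYPE_VIDEO_CAPTURE = 1
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE = 9


def probe_capture_type(try_format):
    """Return the first capture buffer type the driver accepts, single-plane first."""
    for buf_type in (V4L2_BUF_TYPE_VIDEO_CAPTURE, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE):
        if try_format(buf_type):
            return buf_type
    raise RuntimeError("driver accepts neither single- nor multi-planar capture")


def multiplanar_only(buf_type):
    """Fake driver for illustration: rejects single-plane, accepts multi-plane."""
    return buf_type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE


print(probe_capture_type(multiplanar_only))
```

A driver that only exposes multi-planar queues fails the first attempt and is picked up by the second, which is the case the original single-plane code path never handled.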
## Out-of-scope (LOCKED 2026-05-04)
- Front-end libva (API library). Backend only.
- Other hardware: fresnel, ampere, boltzmann — separate iterations.
- HEVC, VP8, VP9, AV1 codecs.
- Performance metrics (CPU%, fps, drops_60s, panfrost freq).
- KWin / Wayland scanout-plane work — orthogonal (`kwin_overlay_subsurface` closed without patch).
- `cros-codecs` Rust replacement (per [`user_stance_rust.md`](../../.claude/projects/-home-mfritsche-src-fourier/memory/user_stance_rust.md)).
- Bootlin / Collabora upstreaming (per [`feedback_no_upstream.md`](../../.claude/projects/-home-mfritsche-src/memory/feedback_no_upstream.md)).
## Hardware target
- ohm — PineTab2, Rockchip **RK3566** silicon (4× Cortex-A55, Mali-G52 MP2, hantro G1/G2 VPU). Hantro driver attaches via the `rockchip,rk3568-vpu` DT compatible since RK3566/RK3568 silicon is close enough; the proper RK3566 driver target (`rkvdec2`/`vdpu346`) has no mainline support yet — Christian Hewitt's patch series ([LKML 2025/12/26/206](https://lkml.org/lkml/2025/12/26/206)) is unmerged. Kernel `6.19.10-danctnix1-1-pinetab2`. **Primary measurement target.**
- (later) fresnel — Pinebook Pro, Rockchip RK3399 (hantro G1, no G2). EndeavourOS-ARM custom OC kernel — see [`reference_fresnel_kernel_constraints.md`](../../.claude/projects/-home-mfritsche-src-fourier/memory/reference_fresnel_kernel_constraints.md).
- (much later) ampere/boltzmann — RK3588 (VDPU381, which belongs to the `rkvdec2` decoder family rather than hantro). Adding VDPU381 is a code addition this fork doesn't have today.
## Non-upstreaming default
Inherited from the predecessors. Patches must be aligned to upstream in syntax and semantics, but no PR/MR/bug-report happens without explicit operator instruction. Bootlin upstream is dormant; once this campaign reaches a defensible state, Markus may wish to engage Bootlin / Collabora / Hans de Goede / Jernej Škrabec — that's a separate explicit decision.
## Repository layout
```
~/src/libva-multiplanar/                 <- this campaign (its own git repo for findings)
├── README.md                            <- this file
├── (worklist.md, phase0_findings.md, ...; created as phases land)
└── libva-v4l2-request-fourier/          <- the actual fork (separate git repo)
    ├── .git/                            <- origin: marfrit/libva-v4l2-request-fourier
    │                                       upstream: bootlin/libva-v4l2-request
    ├── STUDY.md                         <- pre-existing Phase 0/2 substrate
    └── src/                             <- libva-v4l2-request source tree
```
The campaign repo and the fork repo are **separate git repositories** — campaign findings and fork commits are versioned independently. This matches the operator's general pattern (`ohm_gl_fix` campaign vs the bootlin fork it patched).
Operator-facing repo URL TBD: probably `git.reauktion.de/marfrit/libva-multiplanar` once the campaign produces something worth pushing. The fork is already at `git.reauktion.de/marfrit/libva-v4l2-request-fourier`.
## File map
Iteration 1 (closed):
| File | What it is |
|---|---|
| `phase0_findings.md` | iter1 substrate: locked research question, locked scope, predecessor state, source-read references |
| `phase0_evidence/` | iter1 inventory + baseline anchor |
| `phase4_iter2_plan.md` | (mis-named — actually iter1 Phase 4) diff against FFmpeg + hantro kernel source identifying the bug fixed in iter1 |
| `phase5_review_2026-05-04.md` | iter1 sonnet review |
| `phase6_findings.md` | iter1 Phase 6: hantro decodes real H.264 pixels |
| `phase7_findings.md` | iter1 Phase 7 verification: vaapi-copy works, surface-export bug surfaces |
| `phase8_iteration1_close.md` | iter1 close |
| `diff_against_ffmpeg.md` | Cross-reference of fork divergence vs FFmpeg's V4L2 request-API code |
Iteration 2 (closed):
| File | What it is |
|---|---|
| `phase0_findings_iter2.md` | iter2 substrate |
| `phase2_iter2_analysis.md` | iter2 situation analysis |
| `phase5_review_iter2_2026-05-04.md` | iter2 sonnet review (3 architecture blockers + REQBUFS gap) |
| `phase8_iteration2_close.md` | iter2 close (Fix 1 + Fix 2 + Fix 3 landed) |
Iteration 3 (closed 2026-05-05):
| File | What it is |
|---|---|
| `phase0_findings_iter3.md` | iter3 substrate. **Read this for current iteration state.** |
| `phase2_iter3_situation.md` | Mozilla sandbox source verbatim (broker policy + cap filter) |
| `phase3_iter3_baseline.md` | Pre-patch baseline anchor (ohm offline; iter2-close evidence anchored) |
| `phase4_iter3_plan.md` | Patch authorship + PKGBUILD overlay + Track A diagnostic plan |
| `phase5_iter3_review.md` | iter3 Phase 5 sonnet review (Y1 patch idiom fix, Y2 driver error_idx instrumentation, B-slice bug) |
| `phase6_iter3_findings.md` | iter3 Phase 6 build-side surprises (proper unified-diff, no `--enable-v4l2`, GPG rotation, ALARM-stale wasi cluster, onnxruntime gap) |
| `firefox-fourier/` | Patch + PKGBUILD overlay artifacts for the boltzmann LXD container build |
| `firefox-fourier/0001-rdd-allow-stateless-v4l2-request-api.patch` | The Firefox RDD sandbox patch (allows /dev/media\*; cap-filter widened for stateless decoders) |
| `firefox-fourier/PKGBUILD-overlay.md` | PKGBUILD overlay strategy — verified working sequence |
| `firefox-fourier/bootstrap.sh` | Reproducible bootstrap script (run as `builder` inside the firefox-fourier LXD) |
Always-current:
| File | What it is |
|---|---|
| `README.md` | This file |
| `libva-v4l2-request-fourier/` | The fork (separate repo: `marfrit/libva-v4l2-request-fourier`) |
| `references/` | External docs: kernel source excerpts, Mozilla bugzilla notes |
## Build infrastructure
iter3 introduced a remote build host: `firefox-fourier` LXD container on `boltzmann` (RK3588 aarch64, 8 cores, 24 GB RAM, NVMe `/build`). Provisioned by the `his` agent, accessed as `ssh -J boltzmann builder@firefox-fourier`. Used to compile Firefox 150.0.1 with the iter3 sandbox patch ("firefox-fourier" build).