libva-multiplanar/README.md
marfrit f91469abe3 Iteration 3 close — F GREEN, A reproduced + diagnosed for iter4
Phase 1 locked F (Firefox RDD sandbox verify-by-patch) and A (frame-11
EINVAL diagnose) running in parallel on a single firefox-fourier build.

Track F: GREEN. Patched Firefox 150.0.1 (firefox-fourier, pkgrel=1.1)
launches on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 and engages our
libva-v4l2-request backend end-to-end. Three patches needed (Phase 2
identified one and deferred two):
  - Broker policy (SandboxBrokerPolicyFactory.cpp): allow /dev/media*,
    extend cap-filter to admit stateless decoders that lack M2M caps.
  - Seccomp policy (SandboxFilter.cpp): allow ioctl magic byte '|'
    for <linux/media.h> request-API ioctls.
  - Driver (media.c): replace select() with poll() — Mozilla's RDD
    seccomp common policy admits poll/ppoll/epoll_* but not
    select/pselect6. Driver-side fix preferred; smaller surface,
    portable across sandbox policies, and poll() is the modern API.

Track A: REPRODUCES + DIAGNOSED. Frame-11 EINVAL fires deterministically
on a single-slice P-frame (slice_type=0, frame_num=5, post-IDR) — the
exact iter1/iter2 carryover signature, confirming it isn't environmental.
Y2 instrumentation (in v4l2_ioctl_controls) now logs num_controls /
error_idx / per-control id+size on EINVAL. Sizes match kernel UAPI;
error_idx == num_controls is the kernel's "all bad / no specific control"
sentinel — it's a request-level rejection, not a single-field violation.
Fix is iter4's lock; rig + Y2 in place for fast iter4 turnaround.

Build infrastructure introduced: firefox-fourier LXD container on
boltzmann (RK3588 aarch64, persistent, ssh -J boltzmann
builder@firefox-fourier). Upstream Arch x86_64 wasi packages installed
to work around 4-year-stale ALARM versions. PGO generation crashes at
exit (LXC has no display); obj/dist/ tarball used as the deployable
artifact instead of the pacman package.

Phase 6 surprises captured in phase6_iter3_findings.md: malformed
first-cut patch (descriptive vs numeric hunk headers), --enable-v4l2
isn't a Mozilla 150 flag (auto-set on aarch64+GTK), Mozilla 2025 PGP
key rotation, ALARM-stale wasi, onnxruntime missing in ALARM, and the
"no tricks" lesson (revert workarounds first when redirected).

Carries to iter4 substrate: Track A fix is the natural lock; mpv
libplacebo --vo=gpu segfault stays as separate iter4 candidate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:56:34 +00:00

# libva-multiplanar
Single-question campaign on the libva V4L2-stateless backend: **make multi-planar libva work, end-to-end, on Rockchip hantro hardware, for production VA-API consumers (Brave/Chromium, Firefox via libavcodec, mpv `--hwdec=vaapi`, vainfo as smoke test)**.
The deliverable is a libva-v4l2-request fork that any VA-API consumer can dlopen and get H.264 (initially) and MPEG-2 hardware-decoded NV12 dmabufs out of, on PineTab2 RK3568 first, with the same plumbing intended to extend to RK3399 (fresnel) and RK3588 (boltzmann/ampere) once the RK3568 path is solid.
The fork lives as a subdirectory of this campaign:
- [`libva-v4l2-request-fourier/`](libva-v4l2-request-fourier/) — clone of `bootlin/libva-v4l2-request` with our `master` ahead. Existing substrate: see its [`STUDY.md`](libva-v4l2-request-fourier/STUDY.md) for the build-cleanly + probe + control-flow + WIP-tracing work landed before this campaign opened.
This README is the Claude-facing entry point for resumption after compaction. Read it first.
## Origin
`fourier_attribution` campaign closed 2026-05-04 with the per-package wheat-vs-chaff verdict on bbb 1080p H.264 first-60s playback (PineTab2):
- **kwin-fourier**: WHEAT, robust. Removing it triples kwin CPU, drives Mali to 95 % peak-freq residency, doubles drops. Confirmed.
- **chromium-fourier**: WHEAT-but-fragile (Sonnet review's downgrade). Removing it (= falling back to stock Brave 1.89 / Chromium-147 base) costs 83 pp browser CPU (54 % → 137 %) — a magnitude consistent with **multi-planar libva enabling the hantro hardware-decode fast path**, but confounded with the Brave-147-vs-Chromium-149 base-version delta. Cell E (vanilla Chromium 149 control) was identified as the cheapest disambiguator and not yet run.
- **qt6-fourier**: CHAFF on this workload.
Phase 5 review: <https://dokuwiki.reauktion.de/doku.php?id=fourier:attribution_2026-05-03>
The chromium-fourier verdict's load-bearing claim is "multi-planar libva is the binding decode-side enabler on hantro." Whether that claim survives a clean control depends on this campaign's deliverable shipping. **The reverse is also true**: until a working multi-planar libva-v4l2-request lands, no consumer other than chromium-fourier-with-Step-1-patches has hardware decode on RK3568. Firefox VAAPI, mpv `--hwdec=vaapi`, gst-vaapi, vainfo all degrade to software or fall over.
## Process
Eight-plus-one phase loop per [`feedback_dev_process.md`](../../.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md). Phase 0 of each iteration is locked in `phase0_findings*.md` — read the latest iteration's substrate next.
Phase 5 (second-model review) and Phase 8 (iteration close + memory entry) follow the predecessor cadence — invoke the sonnet subagent for the review pattern.
Per the [`feedback_replicate_baseline_first.md`](../../.claude/projects/-home-mfritsche-src-kwin-overlay-subsurface/memory/feedback_replicate_baseline_first.md) lesson: any binding cell in this campaign anchors to in-session-acquired data. The migrated STUDY.md material and ohm_gl_fix patch-correctness audits are reference history, not threshold sources.
## Iteration history
| Iter | Status | Locked question | Outcome |
|---|---|---|---|
| 1 | Closed 2026-05-04 | "Does multi-planar libva-v4l2-request decode H.264 to NV12 dmabufs on hantro for any consumer?" | YES. vaapi-copy + Firefox-with-sandbox-bypass + vainfo all engage hantro. Documented bugs: surface-export DMA-BUF lifecycle race, multi-resolution session corruption, Mesa WSI 64-pitch alignment. See `phase8_iteration1_close.md`. |
| 2 | Closed 2026-05-04 | "Harden the iter1 deliverable: fix the three known bugs without regressing scope." | DONE. Fix 1 (resolution-change format-cache invalidation), Fix 2 (DRM_FORMAT_MOD_INVALID conditional for non-64 pitch), Fix 3 (decoupled `cap_pool` with LRU recycling for DMA-BUF lifecycle). mpv vaapi DMA-BUF playback "smooth" per operator inspection. See `phase8_iteration2_close.md`. |
| 3 | Closed 2026-05-05 | "F+A: verify the Firefox RDD sandbox hypothesis by patched-binary, while resolving the carryover frame-11 EINVAL on the same rig." | F GREEN — patched Firefox decodes through libva without `MOZ_DISABLE_RDD_SANDBOX=1` (broker policy + seccomp ioctl `'\|'` allow + driver `select() → poll()` migration). A REPRODUCED — frame-11 EINVAL fires deterministically on a single-slice P-frame, Y2 instrumentation logs the failing controls. Track A's fix deferred to iter4. See `phase8_iteration3_close.md`. |
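
The seccomp part of the iter3 fix keys on the ioctl type ("magic") byte. As a rough illustration of why one byte, `'|'`, covers the `<linux/media.h>` request-API calls, the Python sketch below rebuilds a few ioctl numbers with the generic Linux encoding (the layout used on aarch64 and x86_64) and extracts the type byte. The helper names `ioc`, `ioc_type` and `type_allowed` are hypothetical, and this is not Mozilla's filter code; the fork itself is C, Python is used here only for brevity.

```python
# Generic Linux ioctl number layout (asm-generic/ioctl.h):
#   bits 0-7 = nr, bits 8-15 = type ("magic byte"),
#   bits 16-29 = argument size, bits 30-31 = direction.
IOC_NONE, IOC_WRITE, IOC_READ = 0, 1, 2


def ioc(direction, type_char, nr, size=0):
    """Build an ioctl request number the way _IO/_IOR/_IOW do."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


def ioc_type(request):
    """Extract the type byte, i.e. what _IOC_TYPE() returns in C."""
    return chr((request >> 8) & 0xFF)


# <linux/media.h> request API: all three use type byte '|'.
MEDIA_IOC_REQUEST_ALLOC = ioc(IOC_READ, "|", 0x05, 4)   # _IOR('|', 0x05, int)
MEDIA_REQUEST_IOC_QUEUE = ioc(IOC_NONE, "|", 0x80)      # _IO('|', 0x80)
MEDIA_REQUEST_IOC_REINIT = ioc(IOC_NONE, "|", 0x81)     # _IO('|', 0x81)

# A V4L2 ioctl for comparison: type byte 'V'.
VIDIOC_STREAMON = ioc(IOC_WRITE, "V", 18, 4)            # _IOW('V', 18, int)


def type_allowed(request, allowed_types):
    """Hypothetical type-byte allow-list check (illustration only)."""
    return ioc_type(request) in allowed_types


if __name__ == "__main__":
    for name in ("MEDIA_IOC_REQUEST_ALLOC", "MEDIA_REQUEST_IOC_QUEUE",
                 "MEDIA_REQUEST_IOC_REINIT", "VIDIOC_STREAMON"):
        value = globals()[name]
        print(f"{name:26s} 0x{value:08x} type={ioc_type(value)!r}")
```

In this sketch an allow-list containing only `'V'` rejects all three media ioctls, while adding `'|'` admits them without naming each request individually.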
## Predecessor work that this campaign builds on
State (carry-over) — fork content, file:line pointers, contract analyses:
- [`libva-v4l2-request-fourier/STUDY.md`](libva-v4l2-request-fourier/STUDY.md) — Phase 0 / Phase 2 substrate already written, dated through 2026-05-02. Goal statement, why-the-fork-exists, build-cleanly stack of fixes, probe/control-flow fixes, eager-probe rationale, failure-mode-as-of-2026-04-26 (Brave-side wall is chromeos pipeline, not libva surface stack).
- [`libva-v4l2-request-fourier/`](libva-v4l2-request-fourier/) git history: 12 commits ahead of bootlin tip `a3c2476`, including kernel-UAPI renames, NV12 multi-plane format entry, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE probe fallback, and recent (2026-05-02) WIP entry-point tracing for Brave's libva surface stack.
- [`~/src/ohm_gl_fix/phase6/step1/`](../ohm_gl_fix/phase6/step1/) — Step 1 patches 0001..0018, contract-correct port of libva-v4l2-request to hantro multi-planar / Chromium-149 era. Audit at `audit_0008_decode_params_2026-05-01.md`. **vainfo enumerates H.264 profiles cleanly on this binary; Brave's chromium-fourier 149 binary engages this libva path end-to-end** (per `fourier_attribution` cell A's 54 % browser CPU vs cell B's 137 %). Step 1 patches are the working substrate that this campaign should reconcile against the libva-v4l2-request-fourier `master` and either fold-in or supersede.
- [`~/src/ohm_gl_fix/`](../ohm_gl_fix/) — closed campaign, README documents the Step 1 audit and the test corpus (`bbb_1080p30_h264.mp4` etc.).
- [`~/src/fourier_attribution/`](../fourier_attribution/) — most recent campaign. Pay attention to:
  - Cell A (chromium-fourier on, libva-multi-planar engaged): browser_cpu_median = 54.4 %, fps = 24.0, drops_60s = 12.
  - Cell B (Brave 1.89 / Chromium 147, libva path absent or broken): browser_cpu_median = 137.15 %, fps = 23.18, drops_60s = 16.
  - `phase4_findings.md` for cross-cell verdict; `phase5_review_sonnet_2026-05-04.md` for the reviewer's pushback on the chromium-fourier conclusion.
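
For orientation, the cell A vs cell B gap can be recomputed directly from the two sets of figures quoted above. This is only a worked-arithmetic sketch: the dictionary layout and the `cell_delta` helper are hypothetical, and it does nothing to address the base-version confound.

```python
# Figures copied from the fourier_attribution cells quoted above.
CELL_A = {"browser_cpu_median": 54.4, "fps": 24.0, "drops_60s": 12}
CELL_B = {"browser_cpu_median": 137.15, "fps": 23.18, "drops_60s": 16}


def cell_delta(with_fork, without_fork):
    """Return per-metric differences (without_fork - with_fork), rounded."""
    return {
        key: round(without_fork[key] - with_fork[key], 2)
        for key in with_fork
    }


if __name__ == "__main__":
    delta = cell_delta(CELL_A, CELL_B)
    ratio = CELL_B["browser_cpu_median"] / CELL_A["browser_cpu_median"]
    print(f"browser CPU: +{delta['browser_cpu_median']:.2f} pp "
          f"({ratio:.2f}x)")                      # +82.75 pp (2.52x)
    print(f"fps:         {-delta['fps']:+.2f}")   # -0.82
    print(f"drops_60s:   {delta['drops_60s']:+d}")  # +4
```

The 82.75 pp difference is the figure this README rounds to 83 pp.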
Reference history (context, NOT data this campaign anchors to) — orthogonal scanout-plane constraint:
- [`~/src/kwin_overlay_subsurface/phase2_source_findings.md`](../kwin_overlay_subsurface/phase2_source_findings.md) — rockchip-drm RK3568 plane format/modifier table. **Plane 39 (Primary, NV12 LINEAR) is the only NV12-capable scanout; KWin owns it. Plane 45 (Overlay) advertises zero NV12.** Therefore: even when libva-multi-planar produces a clean NV12 dmabuf, no scanout plane is reachable while KWin runs, and some component must GL-composite NV12 → RGB before display. **This is orthogonal to libva**: libva is on the decode side, the scanout-plane gap is on the display side. They're separate problems with separate fixes.
- [`~/src/x11-session-research/phase0_evidence/x11_baseline_2026-05-03/x1_summary.md`](../x11-session-research/phase0_evidence/x11_baseline_2026-05-03/x1_summary.md) — confirms the scanout-plane gap isn't fixable by switching session servers either. mpv-xv {SW,HW} and mpv-gpu {SW,HW} all leave Plane 39 in XRGB8888 throughout. It's a kernel/Mesa/Xorg-DDX gap, not a hardware-decoding gap. **Don't expect this campaign to "fix the video pipeline end-to-end"** — fixing libva-multi-planar fixes the decode side; the scanout-plane question stays open after.
- [`~/src/kwin_overlay_subsurface/`](../kwin_overlay_subsurface/) — closed without patch (`phase8_handover.md`); its `feedback_replicate_baseline_first.md` lesson is the discipline that this campaign inherits.
External reference:
- Mozilla bug 1833354 / 1965646 (Firefox HW decode on RK3566/RK3588 explicitly needs libva-v4l2-request, not v4l2-m2m).
- Bootlin upstream `bootlin/libva-v4l2-request` — dormant since 2021, written for single-plane sunxi-cedrus.
- Collabora's `cros-codecs` (Rust, bypasses libva) — strategic replacement, not shipping soon.
- Other dormant forks (per `STUDY.md`): jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov — none ship multi-planar.
## In-scope (LOCKED 2026-05-04 in phase0_findings.md)
- libva-v4l2-request fork (`libva-v4l2-request-fourier/`), **backend only** — multi-planar correctness across the V4L2-stateless lifecycle: format probing (single-plane fallback to multi-plane), control protocol sequencing, surface-handle export, dmabuf modifier negotiation.
- H.264 first; MPEG-2 next. HEVC explicitly out.
- Hardware target: **ohm RK3568 hantro G1/G2 first iteration only.** fresnel RK3399 + ampere/boltzmann RK3588 explicit future iterations after ohm path is solid.
- Test consumers: vainfo, mpv `--hwdec=vaapi`, Firefox `media.ffmpeg.vaapi.enabled`, chromium-fourier 149 (regression check). Brave 1.89 deferred (chromeos-pipeline gating, not a libva-side problem).
- Phase 1 success criterion: **boolean correctness** — "libva accepted + providing access to hardware decoder". Performance metrics deferred to follow-up iteration.
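
The "single-plane fallback to multi-plane" probing in the first bullet can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not the fork's C code: `enum_formats` is a hypothetical callable standing in for a `VIDIOC_ENUM_FMT` loop on a given buffer type, and the probe order is assumed. The buffer-type constants and the fourcc packing follow the V4L2 UAPI.

```python
# Buffer-type values from <linux/videodev2.h>.
V4L2_BUF_TYPE_VIDEO_CAPTURE = 1
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE = 9


def fourcc(code):
    """Pack a 4-character code the way v4l2_fourcc() does (little-endian)."""
    return int.from_bytes(code.encode("ascii"), "little")


V4L2_PIX_FMT_NV12 = fourcc("NV12")


def probe_capture_type(enum_formats, wanted_format):
    """Try the single-planar capture queue first, then fall back to
    the multi-planar one. Returns the buffer type that offers
    wanted_format, or None if neither does.

    enum_formats(buf_type) is a hypothetical stand-in for enumerating
    pixel formats on that queue; it returns an iterable of fourccs.
    """
    for buf_type in (V4L2_BUF_TYPE_VIDEO_CAPTURE,
                     V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE):
        if wanted_format in enum_formats(buf_type):
            return buf_type
    return None


def fake_mplane_only_device(buf_type):
    """Simulated decoder that exposes NV12 only on the MPLANE queue."""
    if buf_type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE:
        return [V4L2_PIX_FMT_NV12]
    return []


if __name__ == "__main__":
    chosen = probe_capture_type(fake_mplane_only_device, V4L2_PIX_FMT_NV12)
    print("capture buffer type:", chosen)
```

Against the simulated multi-planar-only device the single-planar probe finds nothing and the fallback selects the MPLANE queue, which is the case a single-plane-only backend would never reach.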
## Out-of-scope (LOCKED 2026-05-04)
- Front-end libva (API library). Backend only.
- Other hardware: fresnel, ampere, boltzmann — separate iterations.
- HEVC, VP8, VP9, AV1 codecs.
- Performance metrics (CPU%, fps, drops_60s, panfrost freq).
- KWin / Wayland scanout-plane work — orthogonal (`kwin_overlay_subsurface` closed without patch).
- `cros-codecs` Rust replacement (per [`user_stance_rust.md`](../../.claude/projects/-home-mfritsche-src-fourier/memory/user_stance_rust.md)).
- Bootlin / Collabora upstreaming (per [`feedback_no_upstream.md`](../../.claude/projects/-home-mfritsche-src/memory/feedback_no_upstream.md)).
## Hardware target
- ohm — PineTab2, Rockchip RK3568 (4× Cortex-A55, Mali-G52 MP2, hantro G1/G2 VPU). Kernel `6.19.10-danctnix1-1-pinetab2`. **Primary measurement target.**
- (later) fresnel — Pinebook Pro, Rockchip RK3399 (hantro G1, no G2). EndeavourOS-ARM custom OC kernel — see [`reference_fresnel_kernel_constraints.md`](../../.claude/projects/-home-mfritsche-src-fourier/memory/reference_fresnel_kernel_constraints.md).
- (much later) ampere/boltzmann — RK3588 (hantro VDPU381). Adding VDPU381 is a code addition this fork doesn't have today.
## Non-upstreaming default
Inherited from the predecessors. Patches must be aligned to upstream in syntax and semantics, but no PR/MR/bug-report happens without explicit operator instruction. Bootlin upstream is dormant; once this campaign reaches a defensible state, Markus may wish to engage Bootlin / Collabora / Hans de Goede / Jernej Škrabec — that's a separate explicit decision.
## Repository layout
```
~/src/libva-multiplanar/                    <- this campaign (its own git repo for findings)
├── README.md                               <- this file
├── (worklist.md, phase0_findings.md, ...)  <- created as phases land
└── libva-v4l2-request-fourier/             <- the actual fork (separate git repo)
    ├── .git/                               <- origin:   marfrit/libva-v4l2-request-fourier
    │                                          upstream: bootlin/libva-v4l2-request
    ├── STUDY.md                            <- pre-existing Phase 0/2 substrate
    └── src/                                <- libva-v4l2-request source tree
```
The campaign repo and the fork repo are **separate git repositories** — campaign findings and fork commits are versioned independently. This matches the operator's general pattern (`ohm_gl_fix` campaign vs the bootlin fork it patched).
Operator-facing repo URL TBD: probably `git.reauktion.de/marfrit/libva-multiplanar` once the campaign produces something worth pushing. The fork is already at `git.reauktion.de/marfrit/libva-v4l2-request-fourier`.
## File map
Iteration 1 (closed):
| File | What it is |
|---|---|
| `phase0_findings.md` | iter1 substrate: locked research question, locked scope, predecessor state, source-read references |
| `phase0_evidence/` | iter1 inventory + baseline anchor |
| `phase4_iter2_plan.md` | (mis-named — actually iter1 Phase 4) diff against FFmpeg + hantro kernel source identifying the bug fixed in iter1 |
| `phase5_review_2026-05-04.md` | iter1 sonnet review |
| `phase6_findings.md` | iter1 Phase 6: hantro decodes real H.264 pixels |
| `phase7_findings.md` | iter1 Phase 7 verification: vaapi-copy works, surface-export bug surfaces |
| `phase8_iteration1_close.md` | iter1 close |
| `diff_against_ffmpeg.md` | Cross-reference of fork divergence vs FFmpeg's V4L2 request-API code |
Iteration 2 (closed):
| File | What it is |
|---|---|
| `phase0_findings_iter2.md` | iter2 substrate |
| `phase2_iter2_analysis.md` | iter2 situation analysis |
| `phase5_review_iter2_2026-05-04.md` | iter2 sonnet review (3 architecture blockers + REQBUFS gap) |
| `phase8_iteration2_close.md` | iter2 close (Fix 1 + Fix 2 + Fix 3 landed) |
Iteration 3 (closed):
| File | What it is |
|---|---|
| `phase0_findings_iter3.md` | iter3 substrate. **Read this for current iteration state.** |
| `phase2_iter3_situation.md` | Mozilla sandbox source verbatim (broker policy + cap filter) |
| `phase3_iter3_baseline.md` | Pre-patch baseline anchor (ohm offline; iter2-close evidence anchored) |
| `phase4_iter3_plan.md` | Patch authorship + PKGBUILD overlay + Track A diagnostic plan |
| `phase5_iter3_review.md` | iter3 Phase 5 sonnet review (Y1 patch idiom fix, Y2 driver error_idx instrumentation, B-slice bug) |
| `phase6_iter3_findings.md` | iter3 Phase 6 build-side surprises (proper unified-diff, no `--enable-v4l2`, GPG rotation, ALARM-stale wasi cluster, onnxruntime gap) |
| `phase8_iteration3_close.md` | iter3 close (F GREEN; Track A reproduced and diagnosed, fix deferred to iter4) |
| `firefox-fourier/` | Patch + PKGBUILD overlay artifacts for the boltzmann LXD container build |
| `firefox-fourier/0001-rdd-allow-stateless-v4l2-request-api.patch` | The Firefox RDD sandbox patch (allows /dev/media\*; cap-filter widened for stateless decoders) |
| `firefox-fourier/PKGBUILD-overlay.md` | PKGBUILD overlay strategy — verified working sequence |
| `firefox-fourier/bootstrap.sh` | Reproducible bootstrap script (run as `builder` inside the firefox-fourier LXD) |
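
The Y2 `error_idx` instrumentation listed in the table above boils down to one comparison when reading its log output. The Python sketch below shows that decision on a hypothetical mirror of the logged fields (`count`, `error_idx`, per-control id and size); the class and function names and the control ids are placeholders, not the driver's identifiers. One caveat, as far as the kernel documentation for `VIDIOC_S_EXT_CTRLS` reads: `error_idx == count` is also what a failed pre-validation step reports, so `VIDIOC_TRY_EXT_CTRLS` may still be able to narrow the failure to one control.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExtControlsLog:
    """Hypothetical mirror of what Y2 logs from struct v4l2_ext_controls."""
    count: int                        # number of controls submitted
    error_idx: int                    # index reported back by the kernel
    controls: List[Tuple[int, int]]   # (control id, payload size) pairs


def classify_einval(log):
    """Interpret error_idx after an EINVAL from a controls ioctl."""
    if log.error_idx >= log.count:
        # Sentinel: the kernel did not attribute the error to one control.
        return "no specific control identified (error_idx == count)"
    ctrl_id, size = log.controls[log.error_idx]
    return (f"control #{log.error_idx} rejected "
            f"(id=0x{ctrl_id:08x}, size={size})")


if __name__ == "__main__":
    # Placeholder ids and sizes, for illustration only.
    controls = [(0x1, 16), (0x2, 32), (0x3, 64)]
    print(classify_einval(ExtControlsLog(3, 3, controls)))
    print(classify_einval(ExtControlsLog(3, 1, controls)))
```

The first call models the frame-11 signature (no control singled out); the second shows what a per-control rejection would look like in the same log format.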
Always-current:
| File | What it is |
|---|---|
| `README.md` | This file |
| `libva-v4l2-request-fourier/` | The fork (separate repo: `marfrit/libva-v4l2-request-fourier`) |
| `references/` | External docs: kernel source excerpts, Mozilla bugzilla notes |
## Build infrastructure
iter3 introduced a remote build host: `firefox-fourier` LXD container on `boltzmann` (RK3588 aarch64, 8 cores, 24 GB RAM, NVMe `/build`). Provisioned by the `his` agent, accessed as `ssh -J boltzmann builder@firefox-fourier`. Used to compile Firefox 150.0.1 with the iter3 sandbox patch ("firefox-fourier" build).