Files
libva-multiplanar/phase0_findings.md
T
marfrit f15ba8b147 Phase 0 in-session re-verification: 2026-04-26 picture holds
Re-executed deliverables #1 (verify failure-mode finding) and #4 (capture
contract trace) on ohm against the substrate that's actually deployed —
not the libva-v4l2-request-fourier git fork master, but the
libva-v4l2-request-ohm-gl-fix package built on boltzmann from the Step 1
18-patch series.

Result: vainfo enumerates 7 H.264 + 2 MPEG-2 profiles cleanly; mpv
--hwdec=vaapi-copy decodes 68 H.264 frames end-to-end through the full
V4L2-stateless contract on hantro /dev/video1 + /dev/media0. Zero
EINVAL/EAGAIN/EBUSY on the request-API path. No rig drift requiring
Phase 2 loopback.

Inventory finding documented: the git fork at e8c3937 is a pre-Step-1
substrate; rebuilding from it as-is would be a regression. Step 1
reconciliation (deliverable #2) is upstream of any future build-from-fork
action.

Rig caveat captured: --hwdec=vaapi requires a real VO; --hwdec=vaapi-copy
is the headless-safe alternative for SSH-driven test rigs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 09:16:38 +00:00

160 lines
18 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 0 — libva-multiplanar
This campaign's substrate, locked research question, and pre-Phase-1 inventory work. Adapted from the prior `STUDY.md` in the fork (`libva-v4l2-request-fourier/STUDY.md` as of commit `e0acc33`, which has now been replaced with a pointer to this file) and re-framed against the 8(+1)-phase loop discipline.
## Campaign-contained data discipline (governing rule)
Per [`feedback_dev_process.md`](../../.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md) Phase 0 + [`feedback_replicate_baseline_first.md`](../../.claude/projects/-home-mfritsche-src-kwin-overlay-subsurface/memory/feedback_replicate_baseline_first.md):
This campaign acquires its own measurement data in-session. Predecessor work (the fork's prior `STUDY.md`, `ohm_gl_fix/phase6/step1/` audit, `fourier_attribution` cell-A vs cell-B numbers) is documented for **state** carry-over — file:line pointers, contract analyses, build recipes, kernel-UAPI rename catalog, the V4L2-request multi-planar API map — but its measurement claims (e.g. "vainfo enumerates seven H.264 profiles cleanly", "Brave wall is chromeos pipeline as of 2026-04-26") are **reference history** until re-verified in-session. The 2026-04-26 failure-mode finding may have drifted; re-establish before relying on it.
## Research question (LOCKED 2026-05-04)
> **"Make libva-v4l2-request accepted at all by VA-API consumers on PineTab2 RK3568, providing access to the hantro G1/G2 hardware decoder for H.264 and MPEG-2, end-to-end. Performance metrics are explicitly deferred to a follow-up iteration."**
Pass/fail is **boolean correctness**, not throughput:
- Does the consumer dlopen `v4l2_request_drv_video.so`?
- Does it complete the VA-API surface lifecycle calls without falling back to SW?
- Does an actual V4L2 request-API ioctl (`VIDIOC_QBUF` with attached SPS/PPS controls + a request fd → `MEDIA_REQUEST_IOC_QUEUE``VIDIOC_DQBUF` of a populated CAPTURE buffer) land on hantro?
If yes → done for the iteration. Frame-rate / CPU% / drops measurement is a separate iteration whose binding cells will be locked separately.
## Mechanism the question targets
Hantro VPU on RK3568 exposes its decode interface as a **multi-planar V4L2 stateless** device (`/dev/video1`, `V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE` + `V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE`, request-API for control submission). VA-API consumers (mpv, Firefox via libavcodec, Chromium/Brave via its own decoder, vainfo as smoke test) speak libva, not V4L2 directly. The bridge they expect is `libva-v4l2-request` — a libva backend that translates `vaCreateSurfaces2` / `vaBeginPicture` / `vaRenderPicture` / `vaEndPicture` into the V4L2-stateless protocol.
Bootlin's upstream `libva-v4l2-request` (dormant since 2021) was written for **single-plane** sunxi-cedrus. None of the other public forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) ship multi-planar end-to-end. Collabora's strategic replacement `cros-codecs` is Rust + bypasses libva and is not shipping soon — leaving a hole that this campaign closes.
External pointers:
- Mozilla bug 1833354 / 1965646 — Firefox HW decode on RK3566/RK3588 explicitly requires `libva-v4l2-request`, not `v4l2-m2m`.
- Bootlin upstream (dormant): <https://github.com/bootlin/libva-v4l2-request>
## Predecessor close-out summary (state carry-over, not data)
### From `~/src/ohm_gl_fix/phase6/step1/` (closed 2026-05-02, contract-correct port snapshot)
Patches `0001..0018` against an early multi-planar branch of `libva-v4l2-request`, plus the audit at `audit_0008_decode_params_2026-05-01.md`. Most relevant for this campaign:
- `0008-h264-decode-params-correctness.patch` — V4L2_CTRL_TYPE_FWHT_PARAMS / DECODE_PARAMS shape verified against `hantro_h264.c` kernel source.
- `0012-h264-omit-scaling-matrix-frame-based.patch` — contract-correct gating of `SCALING_MATRIX` control by `matrix_set` rather than decode mode (one of the canonical examples of "Phase-3-derived implementation considered harmful" in `feedback_dev_process.md`).
- vainfo enumerates H.264 profiles cleanly with these patches against `chromium-fourier 149` binary, confirmed by `fourier_attribution` cell-A (54 % browser CPU, fps 24.0). **State**: the patches map cleanly onto a multi-planar libva-v4l2-request and represent a correctness baseline.
The Step 1 patches must be reconciled against the libva-v4l2-request-fourier `master` (12 commits ahead of bootlin tip). Either fold-in (preferred), or supersede the fork's WIP commits with the audit-anchored Step 1 set, or document why a divergent path makes sense.
### From `libva-v4l2-request-fourier/` (the fork, now sub-tree of this campaign)
Carry-over **state** (re-verify before treating as current):
- 12 commits ahead of bootlin `a3c2476`. Six "build cleanly against current kernel UAPI" commits (`V4L2_PIX_FMT_H264_SLICE_RAW``V4L2_PIX_FMT_H264_SLICE` rename; missing `utils.h` include; HEVC strip; `h264-ctrls.h` shim with `V4L2_CID_MPEG_VIDEO_H264_*``V4L2_CID_STATELESS_H264_*` aliases; `struct v4l2_ctrl_h264_slice_params` shape updates; `tiled_yuv.S` aarch64 stub).
- Five probe + control flow fix commits (`src/video.c` NV12 multi-plane format entry; `src/surface.c` MPLANE probe fallback; eager probe in `RequestInit`; `src/context.c` rename pass; **WIP**: `STREAMON` defer in `RequestCreateContext` — the V4L2 stateless protocol on hantro requires OUTPUT format → SPS controls → first slice queued → THEN STREAMON; deferring lets `vaCreateContext` succeed but proper sequencing is the next phase).
- `src/utils.c` diagnostic logging tee to `/tmp/libva-fourier.log` (will revert before any final).
- Recent (2026-05-02) WIP entry-point tracing across `surface.c`, `image.c`, `buffer.c`, `context.c` for Brave's libva surface stack instrumentation.
The build artifact is a ~265 KB `.so`. `vainfo` + `mpv --hwdec=vaapi` enumerated profiles end-to-end as of 2026-04-26.
### From `~/src/fourier_attribution/` (closed 2026-05-04 with Phase 5 review)
- Cell A (chromium-fourier 149 with Step 1 + Step 2 patches): `browser_cpu_median = 54.4 %`, `effective_fps = 24.0`, `drops_60s = 12`. **The libva-multi-planar path is engaged here** — this is what end-to-end success looks like at the workload level.
- Cell B (stock Brave 1.89 / Chromium 147): `browser_cpu_median = 137 %`, `fps = 23.18`, `drops_60s = 16`. **Brave's libva path falls back to SW** because of the chromeos-pipeline gating documented in `STUDY.md` § "Brave's failure is not in our driver".
- The 83 pp browser-CPU gap is the campaign-relevant signal that "multi-planar libva is the binding decode-side enabler" — but Sonnet's Phase 5 review correctly flagged this is confounded with the Brave-147-vs-Chromium-149 base-version delta. Cell E (vanilla Chromium 149) was identified as the cheapest disambiguator.
**Phase 7 verification gate (LOCKED 2026-05-04)**: when this campaign's Phase 6 lands a working multi-planar libva-v4l2-request, Phase 7 will retest `fourier_attribution` cell B (Brave) and the deferred cell E (vanilla Chromium 149) on this campaign's deliverable — that retroactively answers the chromium-fourier wheat verdict's confound.
### From `~/src/kwin_overlay_subsurface/` and `~/src/x11-session-research/` (orthogonal)
The NV12-scanout-plane gap on rockchip-drm RK3568 (Plane 39 the only NV12-LINEAR plane; Plane 45 advertises zero NV12 modifiers; X server doesn't program either with NV12 regardless of session server) is **orthogonal** to this campaign. libva is decode-side; the scanout gap is display-side. Don't confuse them. This campaign's deliverable does not unstick that. The display-side absorbs the NV12 → RGB GL-composite step in KWin (kept cheap by `kwin-fourier`'s `watchDmaBuf` fix per the `fourier_attribution` cell-D evidence).
## Current ohm state (carry-over from `fourier_attribution`)
- Kernel: `6.19.10-danctnix1-1-pinetab2`
- Mesa: `1:26.0.5-1`
- Plasma 6.6.4 Wayland session
- `qt6-base-fourier 1:6.11.0-3`, `qt6-xcb-private-headers-fourier 1:6.11.0-3`, `kwin-fourier 1:6.6.4-3` installed (cell-A package state restored end of `fourier_attribution`)
- `chromium-fourier 149` binary at `/tmp/chromium-ohm-gl-fix-step2/chrome` (Step 1 + Step 2 engaged)
- `brave-bin 1:1.89.145-1` (Chromium 147 base, control browser)
- governor `performance`, baloo disabled
- hantro on `/dev/video1`, `/dev/media0` — multi-planar V4L2 stateless
The fork tree at `~/src/libva-multiplanar/libva-v4l2-request-fourier/` is on commit `e0acc33` (master) with no uncommitted changes. Build harness: `meson setup` + `ninja` directly on ohm (small library, no distcc per operator instruction).
## In-scope (LOCKED 2026-05-04)
- libva-v4l2-request **backend only**. Libva front-end (the API library) is mature and supports multi-planar; out of scope for this campaign. Revisit only if Phase 2 source-read surfaces a specific front-end gap.
- Hardware target: **ohm RK3568 hantro G1/G2 first iteration only**. Other devices (fresnel RK3399 hantro G1, ampere/boltzmann RK3588 VDPU381) are explicit future iterations after the ohm path is solid. RK3588 in particular needs VDPU381 driver code that doesn't exist in the fork yet.
- Codecs: H.264 first; MPEG-2 next. HEVC explicitly out (kernel CIDs renamed, RK3566 has no HW HEVC, current fork stripped HEVC per the build-cleanly stack).
- Test consumers (LOCKED 2026-05-04):
- `vainfo` — smoke test, enumerates profiles + entrypoints
- `mpv --hwdec=vaapi` — most directly testable end-to-end consumer for HW decode validation
- Firefox via `media.ffmpeg.vaapi.enabled` + `LIBVA_DRIVER_NAME=v4l2_request` — primary "real consumer" target per Mozilla bug 1965646
- chromium-fourier 149 — regression check (cell A confirmed working; verify still works under any fork changes)
- Brave 1.89 — *deferred* test consumer; the chromeos-pipeline gating documented in `STUDY.md` is upstream to libva and probably not fixable from this campaign's seat. Test it for completeness; don't gate Phase 7 on it.
## Out-of-scope (LOCKED 2026-05-04)
- Front-end libva.
- Other hardware (fresnel, ampere, boltzmann) — separate iterations.
- HEVC, VP8, VP9, AV1.
- Userspace bitstream parsing (kernel V4L2-stateless does this; library forwards parameters).
- HEVC RFC reference frame compression (Rockchip-specific, kernel disabled on ohm).
- Performance metrics. **Explicitly deferred to a follow-up iteration.** Do not lock Phase 1 binding cells around CPU%, fps, drops_60s, or panfrost freq.
- KWin / Wayland scanout-plane work (orthogonal; different campaigns closed).
- `cros-codecs` Rust replacement (out per `user_stance_rust.md`).
- Bootlin / Collabora upstreaming. Per `feedback_no_upstream.md`: no PRs, no MRs, no bug reports unless explicitly tasked. Bootlin upstream is dormant; the question of engaging Hans de Goede / Jernej Škrabec / Collabora when this campaign reaches a defensible state is a separate explicit decision.
## Open questions before Phase 1 lock
1. **In-session re-verification of the 2026-04-26 failure-mode finding** — is it still "vainfo + mpv probes work end-to-end; Brave wall is chromeos pipeline upstream of libva"? Phase 0 inventory must confirm or update before binding cells lock.
2. **Step 1 reconciliation** — fold-in `ohm_gl_fix/phase6/step1/0001..0018` to libva-v4l2-request-fourier `master`, supersede fork WIP, or run a divergent branch? Phase 2 source-read should make the call before Phase 4 plan.
3. **Firefox configuration** — does `media.ffmpeg.vaapi.enabled=true` + `LIBVA_DRIVER_NAME=v4l2_request` + `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` work as documented? Phase 0 inventory item.
4. **`STREAMON` ordering on hantro** — STUDY.md flags this as the load-bearing pending fix: "set both queue formats up front, queue the first buffer with controls attached, then `STREAMON` both queues". Verify against `gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c` and `FFmpeg/libavcodec/v4l2_request*` — both proven working on the same hardware. This is Phase 6 implementation work but the audit needs to land in Phase 2.
5. **`V4L2_EVENT_SOURCE_CHANGE` handling** — needed for resolution-change streams; not strictly required for the fixed-resolution `bbb_1080p30_h264.mp4` test clip. Defer to Phase 6+ iteration if first-frame decode succeeds without it.
## Open questions resolved in this exchange
- *libva fork scope*: backend only.
- *Hardware target lock*: ohm first; others future iterations.
- *Test corpus*: vainfo, mpv `--hwdec=vaapi`, Firefox VAAPI, chromium-fourier 149, Brave 1.89 (deferred).
- *Phase 1 success criterion*: boolean correctness ("libva accepted + providing access to hardware decoder"). Performance metrics deferred.
- *Cell E folded into Phase 7 verification gate*: confirmed.
- *distcc*: no — small library, builds on ohm directly.
- *Gitea repo for campaign root*: create `marfrit/libva-multiplanar` empty now; don't push until something publish-worthy lands.
## What Phase 0 will deliver (regardless of detail)
1. **Re-verify the failure-mode finding in-session.** Build the current fork on ohm, install to `/usr/lib/dri/v4l2_request_drv_video.so`, run `vainfo` and `mpv --hwdec=vaapi` on `bbb_1080p30_h264.mp4`. Capture syscall/strace + V4L2 ioctl trace. Compare against the 2026-04-26 STUDY.md picture; loop back to Phase 2 if rig differs.
2. **Reconcile Step 1 (`ohm_gl_fix/phase6/step1/0001..0018`) against fork master.** Map each Step 1 patch to a fork commit (or to a missing slot). Decide fold-in vs supersede vs branch-and-keep.
3. **Verify Firefox configuration end-to-end.** Stock Firefox + `media.ffmpeg.vaapi.enabled=true` + LIBVA env vars — does it engage our backend, fall back to SW, or fail to load? Phase 0 inventory item.
4. **Phase 0 baseline anchor (in-session N=3-equivalent).** For the boolean-success criterion, the "anchor" is more like a contract trace than a metric distribution: capture the V4L2 request-API ioctl sequence on a known-working consumer (chromium-fourier 149 binary on ohm — already engages this libva path per cell A) for 1 frame's decode, in-session, before any fork modifications. That trace is the spec the Phase 6 implementation must reproduce.
## In-session re-verification result (2026-05-04)
Items #1 and #4 above executed against the substrate that was actually deployed on ohm. Full write-up: [`phase0_evidence/2026-05-04/findings.md`](phase0_evidence/2026-05-04/findings.md). Headline:
- **Item #1 — 2026-04-26 picture HOLDS** at boolean-correctness level. vainfo enumerates 7 H.264 + 2 MPEG-2 profiles cleanly; `mpv --hwdec=vaapi-copy` decodes 68 H.264 frames end-to-end through the full V4L2-stateless contract on hantro (`/dev/video1` + `/dev/media0`) with zero EINVAL/EAGAIN/EBUSY on the request-API path. No rig drift requiring Phase 2 loopback.
- **Item #4 — contract trace captured** for mpv vaapi-copy. The chromium-fourier-as-spec-source plan from Phase 0 substrate is no longer blocking — mpv's trace is a clean reproducible substitute (same backend, same per-frame lifecycle: `MEDIA_REQUEST_IOC_REINIT` → per-request `S_EXT_CTRLS``QBUF`+`MEDIA_REQUEST_IOC_QUEUE``DQBUF`). Chromium trace remains worth capturing as cross-validation but isn't needed to lock Phase 1.
- **Substrate inventory shift**: the installed `/usr/lib/dri/v4l2_request_drv_video.so` on ohm is **not** built from `libva-v4l2-request-fourier/master`. It's `libva-v4l2-request-ohm-gl-fix 1.0.0.r0.ga3c2476-2`, built on **boltzmann** 2026-05-02 from `~/src/marfrit-packages/arch/libva-v4l2-request-ohm-gl-fix/PKGBUILD` (which applies `fourier-local.patch` + Step 1 patches `0001..0018` on top of bootlin tarball `a3c2476`). The git fork at `e8c3937` is a *pre-Step-1* substrate — it has the multi-planar wedge + HEVC strip + UAPI shim + STREAMON-defer WIP, but lacks `0002..0018` (request_pool, conditional PRED_WEIGHTS, ANNEX_B start codes, fill DECODE_PARAMS from VAAPI, no CAPTURE S_FMT, SCALING_MATRIX matrix_set predicate, level_idc, POC sentinel strip, DPB picnum, P/B-frame flags). **Rebuilding from the fork as-is would be a regression** — Phase 0 deliverable #2 (Step 1 reconciliation) is upstream of any "build from fork and install" step. The "Build + install on ohm" section below describes the *target* recipe once reconciliation lands; the *current* binary on ohm matches its build chain via the marfrit-packages PKGBUILD on boltzmann.
- **Rig caveat**: `mpv --hwdec=vaapi --vo=null` fails with `Could not create device.` because vo=null doesn't provide a DRM context to vaapi proper — this is mpv-side, not libva. Headless test rigs (SSH session) must use `--hwdec=vaapi-copy` or run inside a real Plasma/X session.
Phase 0 deliverables status: #1 ✓, #2 not started (recommended next), #3 not started (independent), #4 ✓.
## Source-read references (carry-over from STUDY.md)
For Phase 2 source-read and Phase 6 implementation:
- **FFmpeg** — `libavcodec/v4l2_request.c`, `v4l2_request_buffer.c`, per-codec `v4l2_request_h264.c`. Already multi-planar, already works on hantro. Closest-API canonical example. Active downstream: `code.ffmpeg.org/Kwiboo/FFmpeg/` branch `v4l2-request-n8.1`. 2024-08 v2 patchset on the FFmpeg list.
- **GStreamer v4l2codecs** — `gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c` + `gstv4l2codecsh264dec.c`. Canonical multi-planar S_FMT / REQBUFS / EXPBUF + request-API control submission on the exact Rockchip drivers we target.
- **Chromium** — `media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc}` + `v4l2_queue.cc`. ChromeOS-mature multi-planar; higher abstraction than we need but useful for surface lifecycle / request-fd tracking patterns.
## Test fixtures
- **Test clip**: `/moviedata/fourier-test/bbb_1080p30_h264.mp4` on doppler (SHA-16 `dcf8a7170fbd49bb`, 1920×1080 H.264, 24 fps source). Already present at `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` on ohm from the `fourier_attribution` campaign. Pull via hertz `lxc file pull` if not present elsewhere.
- **Reference path that already works on the same hardware**: `gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec ! waylandsink` — 6 % CPU, zero drops on ohm. That's the ceiling at the workload-end; libva path is expected to match within rounding once accepted. (Ceiling info noted; *not* a Phase 1 binding cell — performance is deferred.)
## Build + install on ohm
- `meson setup build && ninja -C build` directly on ohm. Small library; ~265 KB `.so`. **No distcc** (operator instruction; not enough work to be worth the orchestration).
- Install path: `/usr/lib/dri/v4l2_request_drv_video.so`.
- Activate: `LIBVA_DRIVER_NAME=v4l2_request` + `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` + `LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0`.
- Once the port works: package as `marfrit/libva-v4l2-request-fourier` next to `ffmpeg-v4l2-request-git`, with `provides=(libva-v4l2-request-git)` shape. (Out of Phase 1 scope — packaging is post-Phase-7.)