libva-multiplanar/phase0_findings.md
marfrit f115fa6cbc Phase 0 deliverable #3 (Firefox): headless-rig finding
Firefox 150.0.1 + media.ffmpeg.vaapi.enabled=true + LIBVA_DRIVER_NAME=
v4l2_request, executed under Xvfb on ohm.

Result: inconclusive at the boolean-correctness level. RDD process
dlopens libva.so.2 + libva-drm.so.2 + libva-x11.so.2 for capability
probe then immediately closes them; never reaches vaInitialize, never
opens /dev/dri/renderD128, never reaches v4l2_request_drv_video.so.
Falls back to software H.264 in RDD via FFmpeg-OS-library PDM
(Broadcast support from 'RDD', support=H264 SWDEC).

Root cause: Xvfb provides software framebuffer with no DRI/DRM
render-node integration. Firefox's gfx-environment platform-fitness
check rejects VAAPI before adding it to the RDD PDM order list.
Not a libva-side or driver-side fault — mpv --hwdec=vaapi-copy in
the same headless rig DID engage end-to-end (per
phase0_evidence/2026-05-04/findings.md).

Definitive Firefox verdict requires retesting inside a live Plasma
session — deferred to live-session run (next commit).

Also: Phase 0 deliverable #2 (Step 1 reconciliation into fork
master) was completed and pushed to marfrit/libva-v4l2-request-fourier
between this and the prior Phase 0 commit; status table updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 10:19:14 +00:00


Phase 0 — libva-multiplanar

This campaign's substrate, locked research question, and pre-Phase-1 inventory work. Adapted from the prior STUDY.md in the fork (libva-v4l2-request-fourier/STUDY.md as of commit e0acc33, which has now been replaced with a pointer to this file) and re-framed against the 8(+1)-phase loop discipline.

Campaign-contained data discipline (governing rule)

Per feedback_dev_process.md Phase 0 + feedback_replicate_baseline_first.md:

This campaign acquires its own measurement data in-session. Predecessor work (the fork's prior STUDY.md, ohm_gl_fix/phase6/step1/ audit, fourier_attribution cell-A vs cell-B numbers) is documented for state carry-over — file:line pointers, contract analyses, build recipes, kernel-UAPI rename catalog, the V4L2-request multi-planar API map — but its measurement claims (e.g. "vainfo enumerates seven H.264 profiles cleanly", "Brave wall is chromeos pipeline as of 2026-04-26") are reference history until re-verified in-session. The 2026-04-26 failure-mode finding may have drifted; re-establish before relying on it.

Research question (LOCKED 2026-05-04)

"Make libva-v4l2-request accepted at all by VA-API consumers on PineTab2 RK3568, providing access to the hantro G1/G2 hardware decoder for H.264 and MPEG-2, end-to-end. Performance metrics are explicitly deferred to a follow-up iteration."

Pass/fail is boolean correctness, not throughput:

  • Does the consumer dlopen v4l2_request_drv_video.so?
  • Does it complete the VA-API surface lifecycle calls without falling back to SW?
  • Does an actual V4L2 request-API ioctl sequence (VIDIOC_QBUF with attached SPS/PPS controls + a request fd → MEDIA_REQUEST_IOC_QUEUE → VIDIOC_DQBUF of a populated CAPTURE buffer) land on hantro?

If yes → done for the iteration. Frame-rate / CPU% / drops measurement is a separate iteration whose binding cells will be locked separately.
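
The first and third checks can be mechanized against a captured syscall trace. The following is a minimal sketch, assuming strace-style text in which ioctl requests are printed by symbolic name; the function name, the result keys, and the expected line shape are hypothetical and not taken from the campaign's tooling. The second check (surface lifecycle without SW fallback) is not visible at syscall level and is deliberately left out.

```python
import re


def check_boolean_criteria(trace_lines):
    """Derive pass/fail booleans from strace-style lines (assumed format)."""
    text = "\n".join(trace_lines)
    return {
        # Criterion 1: the consumer opened the backend shared object.
        "backend_loaded": "v4l2_request_drv_video.so" in text,
        # Criterion 3a: a request was queued through the media request API.
        "request_queued": "MEDIA_REQUEST_IOC_QUEUE" in text,
        # Criterion 3b: a CAPTURE buffer was dequeued ('.' does not cross
        # newlines, so both tokens must sit on the same trace line).
        "capture_dequeued": re.search(r"VIDIOC_DQBUF.*CAPTURE", text) is not None,
    }
```

A run counts as a pass for these two criteria only when all three values are true; a trace that shows only libva.so.2 being opened yields `backend_loaded == False`.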

Mechanism the question targets

Hantro VPU on RK3568 exposes its decode interface as a multi-planar V4L2 stateless device (/dev/video1, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE + V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, request-API for control submission). VA-API consumers (mpv, Firefox via libavcodec, Chromium/Brave via its own decoder, vainfo as smoke test) speak libva, not V4L2 directly. The bridge they expect is libva-v4l2-request — a libva backend that translates vaCreateSurfaces2 / vaBeginPicture / vaRenderPicture / vaEndPicture into the V4L2-stateless protocol.

Bootlin's upstream libva-v4l2-request (dormant since 2021) was written for single-plane sunxi-cedrus. None of the other public forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) ship multi-planar end-to-end. Collabora's strategic replacement cros-codecs is Rust + bypasses libva and is not shipping soon — leaving a hole that this campaign closes.

External pointers:

Predecessor close-out summary (state carry-over, not data)

From ~/src/ohm_gl_fix/phase6/step1/ (closed 2026-05-02, contract-correct port snapshot)

Patches 0001..0018 against an early multi-planar branch of libva-v4l2-request, plus the audit at audit_0008_decode_params_2026-05-01.md. Most relevant for this campaign:

  • 0008-h264-decode-params-correctness.patch — V4L2_CTRL_TYPE_H264_DECODE_PARAMS shape verified against hantro_h264.c kernel source.
  • 0012-h264-omit-scaling-matrix-frame-based.patch — contract-correct gating of SCALING_MATRIX control by matrix_set rather than decode mode (one of the canonical examples of "Phase-3-derived implementation considered harmful" in feedback_dev_process.md).
  • vainfo enumerates H.264 profiles cleanly with these patches against chromium-fourier 149 binary, confirmed by fourier_attribution cell-A (54 % browser CPU, fps 24.0). State: the patches map cleanly onto a multi-planar libva-v4l2-request and represent a correctness baseline.

The Step 1 patches must be reconciled against the libva-v4l2-request-fourier master (12 commits ahead of bootlin tip). Either fold-in (preferred), or supersede the fork's WIP commits with the audit-anchored Step 1 set, or document why a divergent path makes sense.

From libva-v4l2-request-fourier/ (the fork, now sub-tree of this campaign)

Carry-over state (re-verify before treating as current):

  • 12 commits ahead of bootlin a3c2476. Six "build cleanly against current kernel UAPI" commits (V4L2_PIX_FMT_H264_SLICE_RAW → V4L2_PIX_FMT_H264_SLICE rename; missing utils.h include; HEVC strip; h264-ctrls.h shim with V4L2_CID_MPEG_VIDEO_H264_* → V4L2_CID_STATELESS_H264_* aliases; struct v4l2_ctrl_h264_slice_params shape updates; tiled_yuv.S aarch64 stub).
  • Five probe + control flow fix commits (src/video.c NV12 multi-plane format entry; src/surface.c MPLANE probe fallback; eager probe in RequestInit; src/context.c rename pass; WIP: STREAMON defer in RequestCreateContext — the V4L2 stateless protocol on hantro requires OUTPUT format → SPS controls → first slice queued → THEN STREAMON; deferring lets vaCreateContext succeed but proper sequencing is the next phase).
  • src/utils.c diagnostic logging tee to /tmp/libva-fourier.log (will revert before any final).
  • Recent (2026-05-02) WIP entry-point tracing across surface.c, image.c, buffer.c, context.c for Brave's libva surface stack instrumentation.
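
The STREAMON-defer item in the list above describes an ordering constraint (OUTPUT format, then SPS controls, then first slice queued, then STREAMON). A minimal sketch of checking that ordering on a simplified event list follows; the event labels (`S_FMT_OUTPUT`, `S_EXT_CTRLS`, `QBUF_OUTPUT`, `STREAMON`) are hypothetical shorthand for this illustration, not identifiers from the fork.

```python
def streamon_is_deferred(events):
    """True if OUTPUT S_FMT, controls, and the first OUTPUT QBUF all
    precede the first STREAMON, in that relative order."""
    required = ["S_FMT_OUTPUT", "S_EXT_CTRLS", "QBUF_OUTPUT"]
    if "STREAMON" not in events:
        return False
    # Only events before the first STREAMON are relevant.
    before = events[: events.index("STREAMON")]
    pos = -1
    for step in required:
        try:
            # Each step must appear after the previous required step.
            pos = before.index(step, pos + 1)
        except ValueError:
            return False
    return True
```

An eager sequence such as `["S_FMT_OUTPUT", "STREAMON", "S_EXT_CTRLS", "QBUF_OUTPUT"]` returns `False`, which is the shape the WIP commit is meant to avoid.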

The build artifact is a ~265 KB .so. vainfo + mpv --hwdec=vaapi enumerated profiles end-to-end as of 2026-04-26.

From ~/src/fourier_attribution/ (closed 2026-05-04 with Phase 5 review)

  • Cell A (chromium-fourier 149 with Step 1 + Step 2 patches): browser_cpu_median = 54.4 %, effective_fps = 24.0, drops_60s = 12. The libva-multi-planar path is engaged here — this is what end-to-end success looks like at the workload level.
  • Cell B (stock Brave 1.89 / Chromium 147): browser_cpu_median = 137 %, fps = 23.18, drops_60s = 16. Brave's libva path falls back to SW because of the chromeos-pipeline gating documented in STUDY.md § "Brave's failure is not in our driver".
  • The 83 pp browser-CPU gap is the campaign-relevant signal that "multi-planar libva is the binding decode-side enabler" — but Sonnet's Phase 5 review correctly flagged this is confounded with the Brave-147-vs-Chromium-149 base-version delta. Cell E (vanilla Chromium 149) was identified as the cheapest disambiguator.
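
The 83 pp figure follows from the two medians quoted above. As a quick arithmetic check (variable names are illustrative; the values are the cell A and cell B numbers from this section):

```python
cell_a_cpu = 54.4   # browser_cpu_median for cell A, in percent
cell_b_cpu = 137.0  # browser_cpu_median for cell B, in percent

# Gap in percentage points between the SW-fallback and libva-engaged cells.
gap_pp = cell_b_cpu - cell_a_cpu
print(f"{gap_pp:.1f} pp")  # 82.6 pp, quoted as 83 pp after rounding

# The fps difference is small by comparison.
fps_delta = 24.0 - 23.18
print(f"{fps_delta:.2f} fps")  # 0.82 fps
```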

Phase 7 verification gate (LOCKED 2026-05-04): when this campaign's Phase 6 lands a working multi-planar libva-v4l2-request, Phase 7 will retest fourier_attribution cell B (Brave) and the deferred cell E (vanilla Chromium 149) on this campaign's deliverable — that retroactively answers the chromium-fourier wheat verdict's confound.

From ~/src/kwin_overlay_subsurface/ and ~/src/x11-session-research/ (orthogonal)

The NV12-scanout-plane gap on rockchip-drm RK3568 (Plane 39 the only NV12-LINEAR plane; Plane 45 advertises zero NV12 modifiers; X server doesn't program either with NV12 regardless of session server) is orthogonal to this campaign. libva is decode-side; the scanout gap is display-side. Don't confuse them. This campaign's deliverable does not unstick that. The display-side absorbs the NV12 → RGB GL-composite step in KWin (kept cheap by kwin-fourier's watchDmaBuf fix per the fourier_attribution cell-D evidence).

Current ohm state (carry-over from fourier_attribution)

  • Kernel: 6.19.10-danctnix1-1-pinetab2
  • Mesa: 1:26.0.5-1
  • Plasma 6.6.4 Wayland session
  • qt6-base-fourier 1:6.11.0-3, qt6-xcb-private-headers-fourier 1:6.11.0-3, kwin-fourier 1:6.6.4-3 installed (cell-A package state restored end of fourier_attribution)
  • chromium-fourier 149 binary at /tmp/chromium-ohm-gl-fix-step2/chrome (Step 1 + Step 2 engaged)
  • brave-bin 1:1.89.145-1 (Chromium 147 base, control browser)
  • governor performance, baloo disabled
  • hantro on /dev/video1, /dev/media0 — multi-planar V4L2 stateless

The fork tree at ~/src/libva-multiplanar/libva-v4l2-request-fourier/ is on commit e0acc33 (master) with no uncommitted changes. Build harness: meson setup + ninja directly on ohm (small library, no distcc per operator instruction).

In-scope (LOCKED 2026-05-04)

  • libva-v4l2-request backend only. Libva front-end (the API library) is mature and supports multi-planar; out of scope for this campaign. Revisit only if Phase 2 source-read surfaces a specific front-end gap.
  • Hardware target: ohm RK3568 hantro G1/G2 first iteration only. Other devices (fresnel RK3399 hantro G1, ampere/boltzmann RK3588 VDPU381) are explicit future iterations after the ohm path is solid. RK3588 in particular needs VDPU381 driver code that doesn't exist in the fork yet.
  • Codecs: H.264 first; MPEG-2 next. HEVC explicitly out (kernel CIDs renamed, RK3566 has no HW HEVC, current fork stripped HEVC per the build-cleanly stack).
  • Test consumers (LOCKED 2026-05-04):
    • vainfo — smoke test, enumerates profiles + entrypoints
    • mpv --hwdec=vaapi — most directly testable end-to-end consumer for HW decode validation
    • Firefox via media.ffmpeg.vaapi.enabled + LIBVA_DRIVER_NAME=v4l2_request — primary "real consumer" target per Mozilla bug 1965646
    • chromium-fourier 149 — regression check (cell A confirmed working; verify still works under any fork changes)
    • Brave 1.89 — deferred test consumer; the chromeos-pipeline gating documented in STUDY.md is upstream to libva and probably not fixable from this campaign's seat. Test it for completeness; don't gate Phase 7 on it.

Out-of-scope (LOCKED 2026-05-04)

  • Front-end libva.
  • Other hardware (fresnel, ampere, boltzmann) — separate iterations.
  • HEVC, VP8, VP9, AV1.
  • Userspace bitstream parsing in the library (with a V4L2 stateless decoder the VA-API consumer parses the bitstream; the library only forwards the parsed parameters to the kernel).
  • HEVC RFC reference frame compression (Rockchip-specific, kernel disabled on ohm).
  • Performance metrics. Explicitly deferred to a follow-up iteration. Do not lock Phase 1 binding cells around CPU%, fps, drops_60s, or panfrost freq.
  • KWin / Wayland scanout-plane work (orthogonal; different campaigns closed).
  • cros-codecs Rust replacement (out per user_stance_rust.md).
  • Bootlin / Collabora upstreaming. Per feedback_no_upstream.md: no PRs, no MRs, no bug reports unless explicitly tasked. Bootlin upstream is dormant; the question of engaging Hans de Goede / Jernej Škrabec / Collabora when this campaign reaches a defensible state is a separate explicit decision.

Open questions before Phase 1 lock

  1. In-session re-verification of the 2026-04-26 failure-mode finding — is it still "vainfo + mpv probes work end-to-end; Brave wall is chromeos pipeline upstream of libva"? Phase 0 inventory must confirm or update before binding cells lock.
  2. Step 1 reconciliation — fold-in ohm_gl_fix/phase6/step1/0001..0018 to libva-v4l2-request-fourier master, supersede fork WIP, or run a divergent branch? Phase 2 source-read should make the call before Phase 4 plan.
  3. Firefox configuration — does media.ffmpeg.vaapi.enabled=true + LIBVA_DRIVER_NAME=v4l2_request + LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 work as documented? Phase 0 inventory item.
  4. STREAMON ordering on hantro — STUDY.md flags this as the load-bearing pending fix: "set both queue formats up front, queue the first buffer with controls attached, then STREAMON both queues". Verify against gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c and FFmpeg/libavcodec/v4l2_request* — both proven working on the same hardware. This is Phase 6 implementation work but the audit needs to land in Phase 2.
  5. V4L2_EVENT_SOURCE_CHANGE handling — needed for resolution-change streams; not strictly required for the fixed-resolution bbb_1080p30_h264.mp4 test clip. Defer to Phase 6+ iteration if first-frame decode succeeds without it.

Open questions resolved in this exchange

  • libva fork scope: backend only.
  • Hardware target lock: ohm first; others future iterations.
  • Test corpus: vainfo, mpv --hwdec=vaapi, Firefox VAAPI, chromium-fourier 149, Brave 1.89 (deferred).
  • Phase 1 success criterion: boolean correctness ("libva accepted + providing access to hardware decoder"). Performance metrics deferred.
  • Cell E folded into Phase 7 verification gate: confirmed.
  • distcc: no — small library, builds on ohm directly.
  • Gitea repo for campaign root: create marfrit/libva-multiplanar empty now; don't push until something publish-worthy lands.

What Phase 0 will deliver (regardless of detail)

  1. Re-verify the failure-mode finding in-session. Build the current fork on ohm, install to /usr/lib/dri/v4l2_request_drv_video.so, run vainfo and mpv --hwdec=vaapi on bbb_1080p30_h264.mp4. Capture syscall/strace + V4L2 ioctl trace. Compare against the 2026-04-26 STUDY.md picture; loop back to Phase 2 if rig differs.
  2. Reconcile Step 1 (ohm_gl_fix/phase6/step1/0001..0018) against fork master. Map each Step 1 patch to a fork commit (or to a missing slot). Decide fold-in vs supersede vs branch-and-keep.
  3. Verify Firefox configuration end-to-end. Stock Firefox + media.ffmpeg.vaapi.enabled=true + LIBVA env vars — does it engage our backend, fall back to SW, or fail to load? Phase 0 inventory item.
  4. Phase 0 baseline anchor (in-session N=3-equivalent). For the boolean-success criterion, the "anchor" is more like a contract trace than a metric distribution: capture the V4L2 request-API ioctl sequence on a known-working consumer (chromium-fourier 149 binary on ohm — already engages this libva path per cell A) for 1 frame's decode, in-session, before any fork modifications. That trace is the spec the Phase 6 implementation must reproduce.

In-session re-verification result (2026-05-04)

Items #1 and #4 above executed against the substrate that was actually deployed on ohm. Full write-up: phase0_evidence/2026-05-04/findings.md. Headline:

  • Item #1 — 2026-04-26 picture HOLDS at boolean-correctness level. vainfo enumerates 7 H.264 + 2 MPEG-2 profiles cleanly; mpv --hwdec=vaapi-copy decodes 68 H.264 frames end-to-end through the full V4L2-stateless contract on hantro (/dev/video1 + /dev/media0) with zero EINVAL/EAGAIN/EBUSY on the request-API path. No rig drift requiring Phase 2 loopback.
  • Item #4 — contract trace captured for mpv vaapi-copy. The chromium-fourier-as-spec-source plan from Phase 0 substrate is no longer blocking — mpv's trace is a clean reproducible substitute (same backend, same per-frame lifecycle: MEDIA_REQUEST_IOC_REINIT → per-request S_EXT_CTRLS → QBUF + MEDIA_REQUEST_IOC_QUEUE → DQBUF). Chromium trace remains worth capturing as cross-validation but isn't needed to lock Phase 1.
  • Substrate inventory shift: the installed /usr/lib/dri/v4l2_request_drv_video.so on ohm is not built from libva-v4l2-request-fourier/master. It's libva-v4l2-request-ohm-gl-fix 1.0.0.r0.ga3c2476-2, built on boltzmann 2026-05-02 from ~/src/marfrit-packages/arch/libva-v4l2-request-ohm-gl-fix/PKGBUILD (which applies fourier-local.patch + Step 1 patches 0001..0018 on top of bootlin tarball a3c2476). The git fork at e8c3937 is a pre-Step-1 substrate — it has the multi-planar wedge + HEVC strip + UAPI shim + STREAMON-defer WIP, but lacks 0002..0018 (request_pool, conditional PRED_WEIGHTS, ANNEX_B start codes, fill DECODE_PARAMS from VAAPI, no CAPTURE S_FMT, SCALING_MATRIX matrix_set predicate, level_idc, POC sentinel strip, DPB picnum, P/B-frame flags). Rebuilding from the fork as-is would be a regression — Phase 0 deliverable #2 (Step 1 reconciliation) is upstream of any "build from fork and install" step. The "Build + install on ohm" section below describes the target recipe once reconciliation lands; the current binary on ohm matches its build chain via the marfrit-packages PKGBUILD on boltzmann.
  • Rig caveat: mpv --hwdec=vaapi --vo=null fails with "Could not create device." because vo=null doesn't provide a DRM context to vaapi proper — this is mpv-side, not libva. Headless test rigs (SSH session) must use --hwdec=vaapi-copy or run inside a real Plasma/X session.
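
The "decoded frames with zero EINVAL/EAGAIN/EBUSY" statement for Item #1 can be re-derived from a trace with a small counter. This is a sketch under stated assumptions: strace-style lines with symbolic ioctl names and `= -1 ERRNO` suffixes on failure; the function name and the pairing rule (one successful CAPTURE dequeue per previously queued request) are illustrative simplifications, not the method used for the write-up in phase0_evidence.

```python
import re

ERRNOS = ("EINVAL", "EAGAIN", "EBUSY")


def summarize_request_path(trace_lines):
    """Count completed request cycles and selected ioctl errors."""
    frames = 0
    pending = 0  # requests queued but not yet matched by a CAPTURE dequeue
    errors = {name: 0 for name in ERRNOS}
    for line in trace_lines:
        if "ioctl(" not in line:
            continue
        for name in ERRNOS:
            if re.search(rf"= -1 {name}\b", line):
                errors[name] += 1
        ok = line.rstrip().endswith("= 0")
        if "MEDIA_REQUEST_IOC_QUEUE" in line and ok:
            pending += 1
        elif "VIDIOC_DQBUF" in line and "CAPTURE" in line and ok and pending:
            frames += 1
            pending -= 1
    return frames, errors
```

On a clean run the returned error counts are all zero and `frames` matches the number of decoded frames the consumer reports.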

Phase 0 deliverables status: #1 ✓, #2 ✓ (Step 1 reconciled into fork master and pushed; see libva-v4l2-request-fourier/ git log), #3 ⚠ partial (see below), #4 ✓.

Firefox engagement test (Phase 0 deliverable #3, 2026-05-04)

Stock Firefox 150.0.1 + media.ffmpeg.vaapi.enabled=true + LIBVA_DRIVER_NAME=v4l2_request env, executed under Xvfb on ohm. Full write-up: phase0_evidence/2026-05-04-firefox/findings.md.

Verdict: inconclusive at the boolean-correctness level under the headless rig. Firefox's RDD process dlopens libva.so.2 + libva-drm.so.2 + libva-x11.so.2 for capability probe then immediately closes them; never reaches vaInitialize, never opens /dev/dri/renderD128, never reaches v4l2_request_drv_video.so. Falls back to software H.264 in RDD via FFmpeg-OS-library PDM (Broadcast support from 'RDD', support=H264 SWDEC). The gating decision happens at Firefox's gfx-environment platform-fitness check, before VAAPI device init — Xvfb provides software framebuffer with no DRI/DRM render-node integration, so Firefox's PDM enumerator skips VAAPI entirely. Not a libva-side or driver-side fault.

mpv --hwdec=vaapi-copy in the same headless rig DID engage end-to-end, so the issue is specifically Firefox's gfx-env requirements being stricter. Definitive Firefox verdict requires retesting inside a live Plasma session — currently ohm has only SDDM greeter on tty1 with no active user session.
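
The distinction drawn here (libva loaded for a probe versus the backend actually engaged) can be expressed as a small classifier over the paths a process opened. This is a sketch only; the function name, the category labels, and the example library directory are hypothetical, and the input is assumed to be a list of path strings extracted from a trace by some other step.

```python
def classify_vaapi_engagement(opened_paths):
    """Classify how far a process got toward the VA-API backend."""
    loaded_libva = any("libva.so" in p for p in opened_paths)
    opened_render_node = any(p.startswith("/dev/dri/renderD") for p in opened_paths)
    loaded_backend = any(p.endswith("v4l2_request_drv_video.so") for p in opened_paths)

    if loaded_backend:
        return "backend-loaded"
    if loaded_libva and opened_render_node:
        return "device-opened-no-backend"
    if loaded_libva:
        # Matches the headless Firefox observation: libraries opened, then closed.
        return "probe-only"
    return "no-libva"
```

Under this scheme the Xvfb Firefox run described above lands in `probe-only`, while the mpv run in the same rig lands in `backend-loaded`.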

Implication for Phase 1: Firefox stays as a target consumer in the corpus, but the binding cell for "does Firefox engage HW decode" is locked to Phase 7 verification in a real session, not to a Phase 0 baseline. mpv --hwdec=vaapi-copy carries the boolean-correctness substrate for Phase 0; vainfo + chromium-fourier 149 (TBD) provide additional triangulation.

Source-read references (carry-over from STUDY.md)

For Phase 2 source-read and Phase 6 implementation:

  • FFmpeg: libavcodec/v4l2_request.c, v4l2_request_buffer.c, per-codec v4l2_request_h264.c. Already multi-planar, already works on hantro. Closest-API canonical example. Active downstream: code.ffmpeg.org/Kwiboo/FFmpeg/ branch v4l2-request-n8.1. 2024-08 v2 patchset on the FFmpeg list.
  • GStreamer v4l2codecs: gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c + gstv4l2codecsh264dec.c. Canonical multi-planar S_FMT / REQBUFS / EXPBUF + request-API control submission on the exact Rockchip drivers we target.
  • Chromium: media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc} + v4l2_queue.cc. ChromeOS-mature multi-planar; higher abstraction than we need but useful for surface lifecycle / request-fd tracking patterns.

Test fixtures

  • Test clip: /moviedata/fourier-test/bbb_1080p30_h264.mp4 on doppler (SHA-16 dcf8a7170fbd49bb, 1920×1080 H.264, 24 fps source). Already present at /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 on ohm from the fourier_attribution campaign. Pull via hertz lxc file pull if not present elsewhere.
  • Reference path that already works on the same hardware: gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec ! waylandsink — 6 % CPU, zero drops on ohm. That's the ceiling at the workload-end; libva path is expected to match within rounding once accepted. (Ceiling info noted; not a Phase 1 binding cell — performance is deferred.)

Build + install on ohm

  • meson setup build && ninja -C build directly on ohm. Small library; ~265 KB .so. No distcc (operator instruction; not enough work to be worth the orchestration).
  • Install path: /usr/lib/dri/v4l2_request_drv_video.so.
  • Activate: LIBVA_DRIVER_NAME=v4l2_request + LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 + LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0.
  • Once the port works: package as marfrit/libva-v4l2-request-fourier next to ffmpeg-v4l2-request-git, with provides=(libva-v4l2-request-git) shape. (Out of Phase 1 scope — packaging is post-Phase-7.)
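
For repeatable runs, the activation variables and a traced invocation can be assembled programmatically. The sketch below is an assumption-laden helper, not part of the campaign's harness: the function names and the log path are hypothetical, while the environment variable names and device paths are the ones listed above. The strace options used (`-f`, `-e trace=`, `-o`) are standard strace flags.

```python
import os


def vaapi_env(video="/dev/video1", media="/dev/media0", base=None):
    """Return an environment that selects the v4l2_request backend."""
    env = dict(os.environ if base is None else base)
    env.update({
        "LIBVA_DRIVER_NAME": "v4l2_request",
        "LIBVA_V4L2_REQUEST_VIDEO_PATH": video,
        "LIBVA_V4L2_REQUEST_MEDIA_PATH": media,
    })
    return env


def traced_command(consumer_argv, log_path):
    """Wrap a consumer command so openat/ioctl calls are logged by strace."""
    # -f follows child processes, -e trace= filters syscalls, -o writes a log file.
    return ["strace", "-f", "-e", "trace=openat,ioctl", "-o", log_path, *consumer_argv]


# Example (not executed here):
#   import subprocess
#   subprocess.run(traced_command(["vainfo"], "/tmp/vainfo.strace"), env=vaapi_env())
```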