Single-question campaign — make multi-planar libva accepted by VA-API consumers on Rockchip hantro RK3568 (PineTab2/ohm first iteration). Backend only, success criterion is boolean correctness, performance deferred. Substrate carried over from libva-v4l2-request-fourier STUDY.md (commit e0acc33 in the fork) plus locked decisions from the 2026-05-04 setup exchange. Fork lives as a subdirectory: libva-v4l2-request-fourier/ (separate git repo, origin marfrit/libva-v4l2-request-fourier, upstream bootlin/libva-v4l2-request). Empty Gitea repo created at git.reauktion.de/marfrit/libva-multiplanar; local origin remote set, no push yet (per operator instruction — wait for publish-worthy state).
Phase 0 — libva-multiplanar
This campaign's substrate, locked research question, and pre-Phase-1 inventory work. Adapted from the prior STUDY.md in the fork (libva-v4l2-request-fourier/STUDY.md as of commit e0acc33, which has now been replaced with a pointer to this file) and re-framed against the 8(+1)-phase loop discipline.
Campaign-contained data discipline (governing rule)
Per feedback_dev_process.md Phase 0 + feedback_replicate_baseline_first.md:
This campaign acquires its own measurement data in-session. Predecessor work (the fork's prior STUDY.md, ohm_gl_fix/phase6/step1/ audit, fourier_attribution cell-A vs cell-B numbers) is documented for state carry-over — file:line pointers, contract analyses, build recipes, kernel-UAPI rename catalog, the V4L2-request multi-planar API map — but its measurement claims (e.g. "vainfo enumerates seven H.264 profiles cleanly", "Brave wall is chromeos pipeline as of 2026-04-26") are reference history until re-verified in-session. The 2026-04-26 failure-mode finding may have drifted; re-establish before relying on it.
Research question (LOCKED 2026-05-04)
"Make libva-v4l2-request accepted at all by VA-API consumers on PineTab2 RK3568, providing access to the hantro G1/G2 hardware decoder for H.264 and MPEG-2, end-to-end. Performance metrics are explicitly deferred to a follow-up iteration."
Pass/fail is boolean correctness, not throughput:
- Does the consumer dlopen v4l2_request_drv_video.so?
- Does it complete the VA-API surface lifecycle calls without falling back to SW?
- Does an actual V4L2 request-API ioctl (VIDIOC_QBUF with attached SPS/PPS controls + a request fd → MEDIA_REQUEST_IOC_QUEUE → VIDIOC_DQBUF of a populated CAPTURE buffer) land on hantro?
If yes → done for the iteration. Frame-rate / CPU% / drops measurement is a separate iteration whose binding cells will be locked separately.
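The third criterion can be turned into a mechanical check on a captured trace. The sketch below is a minimal illustration, assuming the ioctl names have already been extracted from an strace log into a plain list; the function name and the input format are hypothetical, not part of the fork. It only confirms that the three ioctls appear in order. It does not confirm that the CAPTURE buffer is populated or that the calls targeted the hantro device node.

```python
from typing import Iterable

# The three ioctls named in the pass/fail criterion, in required order.
REQUIRED_ORDER = ("VIDIOC_QBUF", "MEDIA_REQUEST_IOC_QUEUE", "VIDIOC_DQBUF")


def request_cycle_landed(ioctl_names: Iterable[str]) -> bool:
    """Return True if REQUIRED_ORDER occurs as an ordered subsequence.

    ioctl_names is a hypothetical pre-parsed list of ioctl request names,
    one entry per traced call, in the order the calls were made.
    """
    wanted = iter(REQUIRED_ORDER)
    current = next(wanted)
    for name in ioctl_names:
        if name == current:
            current = next(wanted, None)
            if current is None:
                return True  # all three steps seen, in order
    return False


if __name__ == "__main__":
    trace = [
        "VIDIOC_S_FMT",
        "VIDIOC_QBUF",
        "MEDIA_REQUEST_IOC_QUEUE",
        "VIDIOC_DQBUF",
    ]
    print(request_cycle_landed(trace))
```

Unrelated ioctls between the three steps are ignored on purpose, since a real trace interleaves control and format calls.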
Mechanism the question targets
Hantro VPU on RK3568 exposes its decode interface as a multi-planar V4L2 stateless device (/dev/video1, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE + V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, request-API for control submission). VA-API consumers (mpv, Firefox via libavcodec, Chromium/Brave via its own decoder, vainfo as smoke test) speak libva, not V4L2 directly. The bridge they expect is libva-v4l2-request — a libva backend that translates vaCreateSurfaces2 / vaBeginPicture / vaRenderPicture / vaEndPicture into the V4L2-stateless protocol.
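To make the single-plane versus multi-planar distinction concrete, the following sketch lists the relevant enum v4l2_buf_type values from linux/videodev2.h and maps each single-plane queue type to its multi-planar counterpart. The helper names are hypothetical; the fork's actual probing logic may be structured differently.

```python
# enum v4l2_buf_type values as defined in <linux/videodev2.h>.
V4L2_BUF_TYPE_VIDEO_CAPTURE = 1
V4L2_BUF_TYPE_VIDEO_OUTPUT = 2
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE = 9
V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE = 10

# Single-plane queue type -> multi-planar counterpart.
_TO_MPLANE = {
    V4L2_BUF_TYPE_VIDEO_CAPTURE: V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
    V4L2_BUF_TYPE_VIDEO_OUTPUT: V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
}


def is_multiplanar(buf_type: int) -> bool:
    """True for the two multi-planar video queue types."""
    return buf_type in (
        V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
        V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
    )


def to_mplane(buf_type: int) -> int:
    """Map a single-plane video queue type to its MPLANE variant.

    Types that are already multi-planar (or unrelated) pass through unchanged.
    """
    return _TO_MPLANE.get(buf_type, buf_type)
```

A backend written for a single-plane driver uses types 1 and 2 throughout; on hantro the same operations must be issued with types 9 and 10 and per-plane structures.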
Bootlin's upstream libva-v4l2-request (dormant since 2021) was written for single-plane sunxi-cedrus. None of the other public forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) ship multi-planar end-to-end. Collabora's strategic replacement cros-codecs is Rust + bypasses libva and is not shipping soon — leaving a hole that this campaign closes.
External pointers:
- Mozilla bug 1833354 / 1965646: Firefox HW decode on RK3566/RK3588 explicitly requires libva-v4l2-request, not v4l2-m2m.
- Bootlin upstream (dormant): https://github.com/bootlin/libva-v4l2-request
Predecessor close-out summary (state carry-over, not data)
From ~/src/ohm_gl_fix/phase6/step1/ (closed 2026-05-02, contract-correct port snapshot)
Patches 0001..0018 against an early multi-planar branch of libva-v4l2-request, plus the audit at audit_0008_decode_params_2026-05-01.md. Most relevant for this campaign:
- 0008-h264-decode-params-correctness.patch: V4L2_CTRL_TYPE_FWHT_PARAMS / DECODE_PARAMS shape verified against hantro_h264.c kernel source.
- 0012-h264-omit-scaling-matrix-frame-based.patch: contract-correct gating of SCALING_MATRIX control by matrix_set rather than decode mode (one of the canonical examples of "Phase-3-derived implementation considered harmful" in feedback_dev_process.md).
- vainfo enumerates H.264 profiles cleanly with these patches against chromium-fourier 149 binary, confirmed by fourier_attribution cell-A (54 % browser CPU, fps 24.0).

State: the patches map cleanly onto a multi-planar libva-v4l2-request and represent a correctness baseline.
The Step 1 patches must be reconciled against the libva-v4l2-request-fourier master (12 commits ahead of bootlin tip). Either fold-in (preferred), or supersede the fork's WIP commits with the audit-anchored Step 1 set, or document why a divergent path makes sense.
From libva-v4l2-request-fourier/ (the fork, now sub-tree of this campaign)
Carry-over state (re-verify before treating as current):
- 12 commits ahead of bootlin a3c2476. Six "build cleanly against current kernel UAPI" commits (V4L2_PIX_FMT_H264_SLICE_RAW → V4L2_PIX_FMT_H264_SLICE rename; missing utils.h include; HEVC strip; h264-ctrls.h shim with V4L2_CID_MPEG_VIDEO_H264_* → V4L2_CID_STATELESS_H264_* aliases; struct v4l2_ctrl_h264_slice_params shape updates; tiled_yuv.S aarch64 stub).
- Five probe + control flow fix commits (src/video.c NV12 multi-plane format entry; src/surface.c MPLANE probe fallback; eager probe in RequestInit; src/context.c rename pass; WIP: STREAMON defer in RequestCreateContext). The V4L2 stateless protocol on hantro requires OUTPUT format → SPS controls → first slice queued → THEN STREAMON; deferring lets vaCreateContext succeed but proper sequencing is the next phase.
- src/utils.c diagnostic logging tee to /tmp/libva-fourier.log (will revert before any final).
- Recent (2026-05-02) WIP entry-point tracing across surface.c, image.c, buffer.c, context.c for Brave's libva surface stack instrumentation.
The build artifact is a ~265 KB .so. vainfo + mpv --hwdec=vaapi enumerated profiles end-to-end as of 2026-04-26.
From ~/src/fourier_attribution/ (closed 2026-05-04 with Phase 5 review)
- Cell A (chromium-fourier 149 with Step 1 + Step 2 patches): browser_cpu_median = 54.4 %, effective_fps = 24.0, drops_60s = 12. The libva-multi-planar path is engaged here; this is what end-to-end success looks like at the workload level.
- Cell B (stock Brave 1.89 / Chromium 147): browser_cpu_median = 137 %, fps = 23.18, drops_60s = 16. Brave's libva path falls back to SW because of the chromeos-pipeline gating documented in STUDY.md § "Brave's failure is not in our driver".
- The 83 pp browser-CPU gap is the campaign-relevant signal that "multi-planar libva is the binding decode-side enabler", but Sonnet's Phase 5 review correctly flagged that this is confounded with the Brave-147-vs-Chromium-149 base-version delta. Cell E (vanilla Chromium 149) was identified as the cheapest disambiguator.
Phase 7 verification gate (LOCKED 2026-05-04): when this campaign's Phase 6 lands a working multi-planar libva-v4l2-request, Phase 7 will retest fourier_attribution cell B (Brave) and the deferred cell E (vanilla Chromium 149) on this campaign's deliverable — that retroactively answers the chromium-fourier wheat verdict's confound.
From ~/src/kwin_overlay_subsurface/ and ~/src/x11-session-research/ (orthogonal)
The NV12-scanout-plane gap on rockchip-drm RK3568 (Plane 39 the only NV12-LINEAR plane; Plane 45 advertises zero NV12 modifiers; X server doesn't program either with NV12 regardless of session server) is orthogonal to this campaign. libva is decode-side; the scanout gap is display-side. Don't confuse them. This campaign's deliverable does not unstick that. The display-side absorbs the NV12 → RGB GL-composite step in KWin (kept cheap by kwin-fourier's watchDmaBuf fix per the fourier_attribution cell-D evidence).
Current ohm state (carry-over from fourier_attribution)
- Kernel: 6.19.10-danctnix1-1-pinetab2
- Mesa: 1:26.0.5-1
- Plasma 6.6.4 Wayland session
- qt6-base-fourier 1:6.11.0-3, qt6-xcb-private-headers-fourier 1:6.11.0-3, kwin-fourier 1:6.6.4-3 installed (cell-A package state restored end of fourier_attribution)
- chromium-fourier 149 binary at /tmp/chromium-ohm-gl-fix-step2/chrome (Step 1 + Step 2 engaged)
- brave-bin 1:1.89.145-1 (Chromium 147 base, control browser)
- governor performance, baloo disabled
- hantro on /dev/video1, /dev/media0 (multi-planar V4L2 stateless)
The fork tree at ~/src/libva-multiplanar/libva-v4l2-request-fourier/ is on commit e0acc33 (master) with no uncommitted changes. Build harness: meson setup + ninja directly on ohm (small library, no distcc per operator instruction).
In-scope (LOCKED 2026-05-04)
- libva-v4l2-request backend only. Libva front-end (the API library) is mature and supports multi-planar; out of scope for this campaign. Revisit only if Phase 2 source-read surfaces a specific front-end gap.
- Hardware target: ohm RK3568 hantro G1/G2 first iteration only. Other devices (fresnel RK3399 hantro G1, ampere/boltzmann RK3588 VDPU381) are explicit future iterations after the ohm path is solid. RK3588 in particular needs VDPU381 driver code that doesn't exist in the fork yet.
- Codecs: H.264 first; MPEG-2 next. HEVC explicitly out (kernel CIDs renamed, RK3566 has no HW HEVC, current fork stripped HEVC per the build-cleanly stack).
- Test consumers (LOCKED 2026-05-04):
  - vainfo: smoke test, enumerates profiles + entrypoints
  - mpv --hwdec=vaapi: most directly testable end-to-end consumer for HW decode validation
  - Firefox via media.ffmpeg.vaapi.enabled + LIBVA_DRIVER_NAME=v4l2_request: primary "real consumer" target per Mozilla bug 1965646
  - chromium-fourier 149: regression check (cell A confirmed working; verify it still works under any fork changes)
  - Brave 1.89: deferred test consumer; the chromeos-pipeline gating documented in STUDY.md is upstream of libva and probably not fixable from this campaign's seat. Test it for completeness; don't gate Phase 7 on it.
Out-of-scope (LOCKED 2026-05-04)
- Front-end libva.
- Other hardware (fresnel, ampere, boltzmann) — separate iterations.
- HEVC, VP8, VP9, AV1.
- Userspace bitstream parsing (kernel V4L2-stateless does this; library forwards parameters).
- HEVC RFC reference frame compression (Rockchip-specific, kernel disabled on ohm).
- Performance metrics. Explicitly deferred to a follow-up iteration. Do not lock Phase 1 binding cells around CPU%, fps, drops_60s, or panfrost freq.
- KWin / Wayland scanout-plane work (orthogonal; different campaigns closed).
- cros-codecs Rust replacement (out per user_stance_rust.md).
- Bootlin / Collabora upstreaming. Per feedback_no_upstream.md: no PRs, no MRs, no bug reports unless explicitly tasked. Bootlin upstream is dormant; the question of engaging Hans de Goede / Jernej Škrabec / Collabora when this campaign reaches a defensible state is a separate explicit decision.
Open questions before Phase 1 lock
- In-session re-verification of the 2026-04-26 failure-mode finding — is it still "vainfo + mpv probes work end-to-end; Brave wall is chromeos pipeline upstream of libva"? Phase 0 inventory must confirm or update before binding cells lock.
- Step 1 reconciliation: fold-in ohm_gl_fix/phase6/step1/0001..0018 to libva-v4l2-request-fourier master, supersede fork WIP, or run a divergent branch? Phase 2 source-read should make the call before Phase 4 plan.
- Firefox configuration: does media.ffmpeg.vaapi.enabled=true + LIBVA_DRIVER_NAME=v4l2_request + LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 work as documented? Phase 0 inventory item.
- STREAMON ordering on hantro: STUDY.md flags this as the load-bearing pending fix: "set both queue formats up front, queue the first buffer with controls attached, then STREAMON both queues". Verify against gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c and FFmpeg/libavcodec/v4l2_request*, both proven working on the same hardware. This is Phase 6 implementation work but the audit needs to land in Phase 2.
- V4L2_EVENT_SOURCE_CHANGE handling: needed for resolution-change streams; not strictly required for the fixed-resolution bbb_1080p30_h264.mp4 test clip. Defer to Phase 6+ iteration if first-frame decode succeeds without it.
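The STREAMON ordering rule quoted from STUDY.md can be expressed as a small trace checker, which may help when auditing the GStreamer and FFmpeg sequences in Phase 2. This is a sketch under stated assumptions: the event representation (ioctl name plus a queue label of "OUTPUT" or "CAPTURE") and the function name are hypothetical, and the check does not verify that controls were actually attached to the first buffer.

```python
from typing import List, Tuple

# (ioctl name, queue label). Both the tuple shape and the labels are
# illustrative assumptions about how a trace would be pre-parsed.
Event = Tuple[str, str]

BOTH_QUEUES = {"OUTPUT", "CAPTURE"}


def streamon_order_violations(events: List[Event]) -> List[str]:
    """Report each STREAMON that precedes format setup or the first OUTPUT QBUF."""
    formats_set = set()
    first_buffer_queued = False
    problems = []
    for index, (ioctl, queue) in enumerate(events):
        if ioctl == "VIDIOC_S_FMT":
            formats_set.add(queue)
        elif ioctl == "VIDIOC_QBUF" and queue == "OUTPUT":
            first_buffer_queued = True
        elif ioctl == "VIDIOC_STREAMON":
            if not BOTH_QUEUES <= formats_set:
                problems.append(
                    f"event {index}: STREAMON on {queue} before both formats were set"
                )
            if not first_buffer_queued:
                problems.append(
                    f"event {index}: STREAMON on {queue} before the first OUTPUT buffer was queued"
                )
    return problems
```

An empty result means the trace respects the quoted ordering; any entry points at the offending event index.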
Open questions resolved in this exchange
- libva fork scope: backend only.
- Hardware target lock: ohm first; others future iterations.
- Test corpus: vainfo, mpv --hwdec=vaapi, Firefox VAAPI, chromium-fourier 149, Brave 1.89 (deferred).
- Phase 1 success criterion: boolean correctness ("libva accepted + providing access to hardware decoder"). Performance metrics deferred.
- Cell E folded into Phase 7 verification gate: confirmed.
- distcc: no — small library, builds on ohm directly.
- Gitea repo for campaign root: create marfrit/libva-multiplanar empty now; don't push until something publish-worthy lands.
What Phase 0 will deliver (regardless of detail)
- Re-verify the failure-mode finding in-session. Build the current fork on ohm, install to /usr/lib/dri/v4l2_request_drv_video.so, run vainfo and mpv --hwdec=vaapi on bbb_1080p30_h264.mp4. Capture syscall/strace + V4L2 ioctl trace. Compare against the 2026-04-26 STUDY.md picture; loop back to Phase 2 if rig differs.
- Reconcile Step 1 (ohm_gl_fix/phase6/step1/0001..0018) against fork master. Map each Step 1 patch to a fork commit (or to a missing slot). Decide fold-in vs supersede vs branch-and-keep.
- Verify Firefox configuration end-to-end. Stock Firefox + media.ffmpeg.vaapi.enabled=true + LIBVA env vars: does it engage our backend, fall back to SW, or fail to load? Phase 0 inventory item.
- Phase 0 baseline anchor (in-session N=3-equivalent). For the boolean-success criterion, the "anchor" is more like a contract trace than a metric distribution: capture the V4L2 request-API ioctl sequence on a known-working consumer (chromium-fourier 149 binary on ohm, which already engages this libva path per cell A) for 1 frame's decode, in-session, before any fork modifications. That trace is the spec the Phase 6 implementation must reproduce.
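The patch-to-commit mapping in the reconciliation item could start from a subject-line match before any diff reading. The sketch below is one possible first pass, assuming patch file names and commit subjects differ only in numbering, suffix, and punctuation; the function names and the commit hash in the usage example are hypothetical. A None value marks a "missing slot" that needs manual inspection.

```python
import re
from typing import Dict, List, Optional


def normalize(subject: str) -> str:
    """Drop a leading NNNN- prefix and a .patch suffix, then collapse punctuation."""
    subject = subject.strip().lower()
    subject = re.sub(r"^\d{4}-", "", subject)
    subject = re.sub(r"\.patch$", "", subject)
    return re.sub(r"[^a-z0-9]+", " ", subject).strip()


def map_patches(
    patches: List[str], commit_subjects: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Map each patch file name to a commit hash, or None if no subject matches.

    commit_subjects maps commit hash -> commit subject line.
    """
    by_subject = {normalize(text): sha for sha, text in commit_subjects.items()}
    return {patch: by_subject.get(normalize(patch)) for patch in patches}


if __name__ == "__main__":
    patches = [
        "0008-h264-decode-params-correctness.patch",
        "0012-h264-omit-scaling-matrix-frame-based.patch",
    ]
    # Hypothetical fork history, for illustration only.
    commits = {"abc1234": "h264: decode params correctness"}
    for patch, sha in map_patches(patches, commits).items():
        print(patch, "->", sha or "missing slot")
```

Exact subject matching will miss commits that were reworded or squashed, so unmatched entries are a prompt for review, not proof of absence.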
Source-read references (carry-over from STUDY.md)
For Phase 2 source-read and Phase 6 implementation:
- FFmpeg: libavcodec/v4l2_request.c, v4l2_request_buffer.c, per-codec v4l2_request_h264.c. Already multi-planar, already works on hantro. Closest-API canonical example. Active downstream: code.ffmpeg.org/Kwiboo/FFmpeg/ branch v4l2-request-n8.1. 2024-08 v2 patchset on the FFmpeg list.
- GStreamer v4l2codecs: gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c + gstv4l2codecsh264dec.c. Canonical multi-planar S_FMT / REQBUFS / EXPBUF + request-API control submission on the exact Rockchip drivers we target.
- Chromium: media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc} + v4l2_queue.cc. ChromeOS-mature multi-planar; higher abstraction than we need but useful for surface lifecycle / request-fd tracking patterns.
Test fixtures
- Test clip:
- Test clip: /moviedata/fourier-test/bbb_1080p30_h264.mp4 on doppler (SHA-16 dcf8a7170fbd49bb, 1920×1080 H.264, 24 fps source). Already present at /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 on ohm from the fourier_attribution campaign. Pull via hertz lxc file pull if not present elsewhere.
- Reference path that already works on the same hardware: gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec ! waylandsink, at 6 % CPU and zero drops on ohm. That's the ceiling at the workload-end; libva path is expected to match within rounding once accepted. (Ceiling info noted; not a Phase 1 binding cell, since performance is deferred.)
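Before any run, it may be worth confirming that the clip on ohm is the same file as the one on doppler. The fixture entry gives a 16-character digest but does not say which hash algorithm produced it; the sketch below assumes it is the first 16 hex characters of SHA-256, which is a guess that should be checked against how the value was originally generated. The function name is hypothetical.

```python
import hashlib


def digest_prefix(path: str, algorithm: str = "sha256", length: int = 16) -> str:
    """Return the first `length` hex characters of the file's digest.

    The default algorithm is an assumption; pass another hashlib name
    (for example "sha1") if the recorded value was produced differently.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large clips are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()[:length]
```

Usage would be a comparison such as `digest_prefix("/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4") == "dcf8a7170fbd49bb"`; a mismatch means either a different file or a different algorithm.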
Build + install on ohm
- meson setup build && ninja -C build directly on ohm. Small library; ~265 KB .so. No distcc (operator instruction; not enough work to be worth the orchestration).
- Install path: /usr/lib/dri/v4l2_request_drv_video.so.
- Activate: LIBVA_DRIVER_NAME=v4l2_request + LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 + LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0.
- Once the port works: package as marfrit/libva-v4l2-request-fourier next to ffmpeg-v4l2-request-git, with provides=(libva-v4l2-request-git) shape. (Out of Phase 1 scope; packaging is post-Phase-7.)
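For scripted test runs, the activation variables can be assembled in one place so that every consumer is launched with the same environment. This is a minimal sketch: the three variable names and default device paths come from the activation item above, while the helper name and its parameters are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def libva_env(
    video_path: str = "/dev/video1",
    media_path: str = "/dev/media0",
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the backend activation variables set.

    `base` defaults to the current process environment; it is never mutated.
    """
    env = dict(os.environ if base is None else base)
    env.update(
        {
            "LIBVA_DRIVER_NAME": "v4l2_request",
            "LIBVA_V4L2_REQUEST_VIDEO_PATH": video_path,
            "LIBVA_V4L2_REQUEST_MEDIA_PATH": media_path,
        }
    )
    return env
```

A consumer can then be started with, for example, `subprocess.run(["vainfo"], env=libva_env())`, keeping the device paths in a single definition if the hantro node enumerates differently after a kernel update.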