Update with current state: library builds clean, vainfo enumerates profiles, vaCreateContext succeeds on Brave (with STREAMON deferred as WIP unblocker), next failure is frame pool initialization in vaCreateSurfaces2. Documents the 12-step diff stack vs bootlin upstream and what still needs to happen to actually decode a frame. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9.5 KiB
libva-v4l2-request — Fourier port study
Goal
Make this libva backend usable on multiplanar V4L2 stateless decoders:
specifically the Rockchip Hantro VPU (RK3566 ohm) and the upcoming RK3588
hantro/VDPU381 path. End deliverable: any VAAPI client (Brave, Firefox via
ffmpeg-vaapi, mpv --hwdec=vaapi, vlc, ...) gets HW decode for H.264 + MPEG-2
on the Fourier fleet without going through GStreamer.
Why this fork exists
Bootlin upstream https://github.com/bootlin/libva-v4l2-request went dormant
around 2021 and was written for single-plane sunxi-cedrus decoders.
Collabora's strategic replacement is cros-codecs (Rust) — it bypasses libva
entirely, targets Chromium/Firefox direct integration, and is not shipping
soon. That leaves a hole for VAAPI clients on Rockchip. None of the public
forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) shipped multiplanar.
Reference: Mozilla bug 1833354 / 1965646 explicitly notes "Rockchip uses v4l2-request, not v4l2-m2m" — Firefox HW decode on RK3566/RK3588 needs exactly a working libva-v4l2-request to bridge.
State today
Bootlin tip is a3c2476. Stack of WIP commits on top:
Build cleanly against current kernel UAPI
V4L2_PIX_FMT_H264_SLICE_RAW→V4L2_PIX_FMT_H264_SLICErename.src/h264.c: missing#include "utils.h"forrequest_log().- HEVC stripped —
h265.c/h265.hexcluded frommeson.build,hevc-ctrls.hreplaced by passthrough to<linux/v4l2-controls.h>, four HEVC case blocks removed frompicture.c. include/h264-ctrls.hmade into a passthrough shim to<linux/v4l2-controls.h>plusV4L2_CID_MPEG_VIDEO_H264_* → V4L2_CID_STATELESS_H264_*aliases (kernel renamed during upstreaming).src/h264.cshape updates to track the upstreamedstruct v4l2_ctrl_h264_slice_params: drop.size, usestruct v4l2_h264_reference {fields, index}forref_pic_list*, movepred_weight_tableout to its ownV4L2_CID_STATELESS_H264_PRED_WEIGHTScontrol, dropdecode_params.num_slices(kernel infers from queued controls).src/tiled_yuv.S: aarch64 stub oftiled_to_planar— the ARMv7 NEON body is#ifndef __aarch64__-guarded; without a stub the.sohad an undefined symbol and dlopen failed.
Library now builds clean (~265 KB .so) and vainfo enumerates H.264 +
MPEG-2 profiles.
Probe + control flow fixes
src/video.c: add NV12 multi-plane format entry;video_format_find()takesbool mplaneso single- and multi-plane NV12 entries don't collide on pixelformat.src/surface.c: probe block triesV4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANEas a fallback after single-plane probes fail.- Eager probe in
RequestInit— Chromium's vaapi_video_decoder callsvaCreateContextbeforevaCreateSurfaces2, so the original lazy "setdriver_data->video_formatinRequestCreateSurfaces" path was too late. Promoted the probe intovideo_format_probe()invideo.cand call it at init time. src/context.c: sameV4L2_PIX_FMT_H264_SLICE_RAW → _SLICErename that was missed in the first pass.- WIP: defer
STREAMONcalls inRequestCreateContext. The V4L2 stateless protocol on hantro requires OUTPUT format → SPS controls → first slice queued → THENSTREAMON. Deferring letsvaCreateContextsucceed; proper sequencing is the next phase.
Diagnostic logging (will revert before final)
src/utils.c:request_log()tees to/tmp/libva-fourier.logso sandboxed Chromium GPU processes don't swallow the trace output.
Failure mode reached now (2026-04-26)
vainfo works fully. Brave on ohm with the patched .so reaches deeper:
- Init: detects NV12 multi-plane CAPTURE format ✓
- vaCreateContext: profile=H264Main 1920×1088, S_FMT + CREATE_BUFS pass ✓
- vaCreateContext: STREAMON OUTPUT returned EINVAL ← deferred to unblock
- Now stuck at:
failed Initialize()ing the frame poolfromvaapi_video_decoder.cc
That's the next-phase boundary. vaCreateSurfaces2 (frame-pool init) is
where Chromium discovers buffer geometry for output frames; the library
needs to do CAPTURE-side S_FMT + REQBUFS + EXPBUF without single-
plane assumptions on a freshly-created context, and the V4L2 stateless
protocol's source-change-event dance probably needs implementing too.
Port plan
The seam to flip is the entire kernel-userspace V4L2 boundary. The work is mostly mechanical and concentrated in three files:
src/v4l2.c — helpers (the bottleneck; all other files call into this)
Add a v4l2_type_is_mplane() predicate (one already exists upstream — keep it)
and dual paths through:
v4l2_set_format()— populate eitherformat.fmt.pixorformat.fmt.pix_mp, includingplane_fmt[0].sizeimagefor OUTPUT andnum_planesdefaulting to 1 for raw/NV12 capture.v4l2_create_buffers()/v4l2_request_buffers()— setv4l2_create_buffers.format.typeto the MPLANE variant when source is mplane.v4l2_query_buffer()/v4l2_export_buffer()— switch on type, usev4l2_buffer.m.planesarray (lengthnum_planes) instead ofm.userptr/m.fd. EXPBUF now needsplane=0parameter.v4l2_queue_buffer()/v4l2_dequeue_buffer()— samem.planes[]switch. The OUTPUT side passes the bitstream slice asm.planes[0].bytesused.
Reference: libavcodec/v4l2_buffers.c and libavcodec/v4l2_context.c in
FFmpeg already do this branching cleanly — it's the closest API match. Crib
V4L2_TYPE_IS_MULTIPLANAR() style switching there. GStreamer's
gstv4l2decoder.c is the second reference; it covers the request-API +
mplane path explicitly for the same Rockchip hardware we target.
src/context.c — context creation
RequestCreateContext calls into v4l2_set_format() for the OUTPUT and
CAPTURE queues. Detect the queue capability at context creation (cache the
mplane bit on the context object) and pick the right type for every
subsequent helper call.
src/picture.c — frame submission
The QBUF / DQBUF / EXPBUF paths in RequestEndPicture() and friends. Same
pattern — switch on the cached mplane bit and use the multiplanar variants
of the v4l2.c helpers. The slice-data submission (m.planes[0].bytesused)
is the load-bearing change here.
Reference implementations (read these side-by-side with our diff)
- FFmpeg —
libavcodec/v4l2_request.c,v4l2_request_buffer.c, per-codec files likev4l2_request_h264.c. Already multiplanar, already works on hantro/rkvdec — this is the closest-API canonical example.- https://github.com/FFmpeg/FFmpeg/tree/master/libavcodec
- 2024-08 v2 patchset: https://www.mail-archive.com/ffmpeg-devel@ffmpeg.org/msg169515.html
- Active downstream: https://code.ffmpeg.org/Kwiboo/FFmpeg/ (branch
v4l2-request-n8.1)
- GStreamer v4l2codecs —
gst-plugins-bad/sys/v4l2codecs/.gstv4l2decoder.chas the canonical multiplanar S_FMT / REQBUFS / EXPBUF code on the exact Rockchip drivers we target.gstv4l2codecsh264dec.cshows the request-API controls submission. - Chromium —
media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc}v4l2_queue.cc. ChromeOS-mature multiplanar code; higher abstraction than we need but useful for surface lifecycle / request-fd tracking patterns.
Test fixtures
- ohm — RK3566 PineTab2, kernel
6.19.10-danctnix1-1-pinetab2. Hantro decoder exposes S264 / MG2S / VP8F formats on/dev/video1(multiplanar). This is the primary dev target. Brave on ohm is the integration test endpoint;vainfo LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1is the unit test. - Test clip:
/moviedata/fourier-test/bbb_1080p30_h264.mp4on doppler (SHA-16dcf8a7170fbd49bb, 1920×1080 H.264, 24 fps source despite the name). Pull via hertzlxc file pull. - Reference path that already works on the same hardware: GStreamer
gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec ! waylandsink— 6 % CPU, zero drops. That's the ceiling for what we're trying to match through the libva path.
Out of scope (for the first port milestone)
- HEVC — kernel CIDs renamed, RK3566 has no HW HEVC. Deferred until RK3588 silicon is on the bench AND a separate HEVC-revival pass.
- VP9, VP8, AV1 — no HW path or out of bootlin's original codec set.
- Userspace bitstream parsing — kernel V4L2 stateless API does the parsing; this library only forwards parameters. No need to touch.
- HEVC RFC (reference frame compression) — Rockchip-specific, kernel
config has it disabled (
CONFIG_VIDEO_HANTRO_HEVC_RFC=non ohm).
Build + install
- Build container:
fermi(Arch ARM aarch64 LXC on hertz).meson setupninjastraight off the source tree, no makepkg dance needed for development iteration.
- Install path:
/usr/lib/dri/v4l2_request_drv_video.so. - Activate:
LIBVA_DRIVER_NAME=v4l2_requestplus the path env varsLIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1andLIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0. - Once the port works: package as
marfrit/libva-v4l2-request-fouriernext toffmpeg-v4l2-request-git, with the sameprovides=(libva-v4l2-request-git)shape.
Ack
Bootlin authored the original library under MIT/LGPL2.1; this fork adds GPL-2.0-licensed shim files (HEVC strip, multiplanar plumbing) and is meant to track upstream if upstream ever picks the work back up.