test0r e0acc33455 STUDY.md: phase 2 finding — libva surface stack works; Brave wall is chromeos pipeline
mpv --hwdec=vaapi successfully probes our driver end-to-end:
RequestQueryImageFormats, QueryConfigEntrypoints, CreateConfig,
QuerySurfaceAttributes, CreateSurfaces2, DeriveImage, CreateImage,
CreateBuffer, ExportSurfaceHandle all run clean across all seven enumerated
profiles. mpv then falls back to SW for actual decode (drops match the
SW baseline) because our decode-submission path isn't there yet — but
the libva entry-point surface is largely done.

Brave's "failed Initialize()ing the frame pool" turns out to be in
chromium's chromeos pipeline (PickDecoderOutputFormat → ImageProcessor
init in media/gpu/chromeos/video_decoder_pipeline.cc), not in our
driver. No more libva calls happen between our successful CreateContext
and the failure; chromium bails on the chromeos-specific V4L2
ImageProcessor it expects on real ChromeOS but doesn't find on a plain
Linux Wayland system. Fix is on the Chromium build side, not here.

Remaining real work in this library: decode submission path (Begin/
Render/EndPicture → V4L2 stateless queue/dequeue with controls
attached), and proper STREAMON ordering on hantro. STUDY.md now
documents both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 22:41:54 +00:00


libva-v4l2-request — Fourier port study

Goal

Make this libva backend usable on multiplanar V4L2 stateless decoders: specifically the Rockchip Hantro VPU (RK3566 ohm) and the upcoming RK3588 hantro/VDPU381 path. End deliverable: any VAAPI client (Brave, Firefox via ffmpeg-vaapi, mpv --hwdec=vaapi, vlc, ...) gets HW decode for H.264 + MPEG-2 on the Fourier fleet without going through GStreamer.

Why this fork exists

Bootlin upstream https://github.com/bootlin/libva-v4l2-request went dormant around 2021 and was written for single-plane sunxi-cedrus decoders. Collabora's strategic replacement is cros-codecs (Rust) — it bypasses libva entirely, targets Chromium/Firefox direct integration, and is not shipping soon. That leaves a hole for VAAPI clients on Rockchip. None of the public forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) shipped multiplanar.

Reference: Mozilla bug 1833354 / 1965646 explicitly notes "Rockchip uses v4l2-request, not v4l2-m2m". Firefox HW decode on RK3566/RK3588 needs exactly this: a working libva-v4l2-request as the bridge.

State today

Bootlin tip is a3c2476. Stack of WIP commits on top:

Build cleanly against current kernel UAPI

  1. V4L2_PIX_FMT_H264_SLICE_RAW → V4L2_PIX_FMT_H264_SLICE rename.
  2. src/h264.c: missing #include "utils.h" for request_log().
  3. HEVC stripped — h265.c/h265.h excluded from meson.build, hevc-ctrls.h replaced by passthrough to <linux/v4l2-controls.h>, four HEVC case blocks removed from picture.c.
  4. include/h264-ctrls.h made into a passthrough shim to <linux/v4l2-controls.h> plus V4L2_CID_MPEG_VIDEO_H264_* → V4L2_CID_STATELESS_H264_* aliases (kernel renamed during upstreaming).
  5. src/h264.c shape updates to track the upstreamed struct v4l2_ctrl_h264_slice_params: drop .size, use struct v4l2_h264_reference {fields, index} for ref_pic_list*, move pred_weight_table out to its own V4L2_CID_STATELESS_H264_PRED_WEIGHTS control, drop decode_params.num_slices (kernel infers from queued controls).
  6. src/tiled_yuv.S: aarch64 stub of tiled_to_planar — the ARMv7 NEON body is #ifndef __aarch64__-guarded; without a stub the .so had an undefined symbol and dlopen failed.

Library now builds clean (~265 KB .so) and vainfo enumerates H.264 + MPEG-2 profiles.
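The aliasing in item 4 and the reference-list change in item 5 could look roughly like the sketch below. It is an illustration, not the fork's actual include/h264-ctrls.h: only a subset of the old names is aliased, fill_frame_ref() is a hypothetical helper, and it assumes kernel headers new enough to carry the V4L2_CID_STATELESS_H264_* definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/types.h>
#include <linux/v4l2-controls.h>

/* Passthrough shim sketch: the definitions come from the kernel UAPI header
 * above; the pre-upstream control names are mapped onto the renamed ones. */
#ifndef V4L2_CID_MPEG_VIDEO_H264_SPS
#define V4L2_CID_MPEG_VIDEO_H264_SPS \
	V4L2_CID_STATELESS_H264_SPS
#define V4L2_CID_MPEG_VIDEO_H264_PPS \
	V4L2_CID_STATELESS_H264_PPS
#define V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX \
	V4L2_CID_STATELESS_H264_SCALING_MATRIX
#define V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS \
	V4L2_CID_STATELESS_H264_SLICE_PARAMS
#define V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS \
	V4L2_CID_STATELESS_H264_DECODE_PARAMS
#endif

/* Hypothetical helper: the upstreamed ref_pic_list0/1 arrays hold
 * struct v4l2_h264_reference entries (a DPB index plus a field selector)
 * rather than bare indices. This marks a whole-frame reference. */
static void fill_frame_ref(struct v4l2_h264_reference *ref,
			   unsigned char dpb_index)
{
	assert(ref != NULL);
	ref->fields = V4L2_H264_FRAME_REF;
	ref->index = dpb_index;
}
```

With the aliases in place, code that still spells the old control IDs compiles unchanged, while the struct-shape changes still have to be made by hand in src/h264.c.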

Probe + control flow fixes

  1. src/video.c: add NV12 multi-plane format entry; video_format_find() takes bool mplane so single- and multi-plane NV12 entries don't collide on pixelformat.
  2. src/surface.c: probe block tries V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE as a fallback after single-plane probes fail.
  3. Eager probe in RequestInit — Chromium's vaapi_video_decoder calls vaCreateContext before vaCreateSurfaces2, so the original lazy "set driver_data->video_format in RequestCreateSurfaces" path was too late. Promoted the probe into video_format_probe() in video.c and call it at init time.
  4. src/context.c: same V4L2_PIX_FMT_H264_SLICE_RAW → _SLICE rename that was missed in the first pass.
  5. WIP: defer STREAMON calls in RequestCreateContext. The V4L2 stateless protocol on hantro requires OUTPUT format → SPS controls → first slice queued → THEN STREAMON. Deferring lets vaCreateContext succeed; proper sequencing is the next phase.

Diagnostic logging (will revert before final)

  1. src/utils.c: request_log() tees to /tmp/libva-fourier.log so sandboxed Chromium GPU processes don't swallow the trace output.
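A tee of this kind can be sketched as below. This is not the fork's request_log(): tee_log() and the "v4l2-request: " prefix are hypothetical, and the extra sink is passed in as a FILE pointer rather than hard-coded to /tmp/libva-fourier.log.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tee logger: every message goes to stderr and, when a sink
 * is given, to that sink as well. The va_list is copied because it cannot
 * be walked twice. */
static void tee_log(FILE *sink, const char *format, ...)
{
	va_list arguments;
	va_list copy;

	assert(format != NULL);

	va_start(arguments, format);
	va_copy(copy, arguments);

	fprintf(stderr, "v4l2-request: ");
	vfprintf(stderr, format, arguments);

	if (sink != NULL) {
		fprintf(sink, "v4l2-request: ");
		vfprintf(sink, format, copy);
		/* Flush so the trace survives a killed GPU process. */
		fflush(sink);
	}

	va_end(copy);
	va_end(arguments);
}
```

Flushing after each message costs some throughput, which is acceptable for a diagnostic aid that is meant to be reverted.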

Failure mode reached now (2026-04-26)

vainfo works fully. mpv --hwdec=vaapi probes our driver end-to-end successfully (seven profiles enumerated, RequestQueryImageFormats / QueryConfigEntrypoints / CreateConfig / QuerySurfaceAttributes / CreateSurfaces2 / DeriveImage / CreateImage / CreateBuffer / ExportSurfaceHandle all run clean) — then falls back to libavcodec's software H.264 decoder for the actual decode. mpv's drop pattern matches the SW baseline (≈16 drops/s through KWin gpu-next), so the SW path is in use, not ours.

Brave's failure is not in our driver. The verbose log shows:

ApplyResolutionChangeWithScreenSizes()
PickDecoderOutputFormat(): Initializing ImageProcessor; max buffers: 16
ERROR: failed Initialize()ing the frame pool

PickDecoderOutputFormat from media/gpu/chromeos/video_decoder_pipeline.cc is the ChromeOS pipeline trying to init a V4L2 ImageProcessor (a ChromeOS-specific concept — separate V4L2 m2m chip block for color conversion / scaling). Brave-on-Linux runs this code path because the build doesn't gate it on is_chromeos, but on a plain Linux Wayland system there's nothing for the ImageProcessor to bind to and it bails before any libva call lands in our driver. No code change in libva-v4l2-request-fourier will fix Brave; the lever is on the Chromium / Brave build side (skip the chromeos pipeline, or supply the expected V4L2 image processor device).

What still needs to happen for actual decode

The libva entry-point surface is largely done (probing works); the remaining gap is the decode submission path:

  • RequestBeginPicture / RequestRenderPicture / RequestEndPicture need to map to V4L2 stateless: queue the OUTPUT buffer with the encoded slice plus the SPS/PPS/SLICE_PARAMS/PRED_WEIGHTS/DECODE_PARAMS/SCALING_MATRIX controls via the request fd, then queue the request to trigger the decode and dequeue a CAPTURE buffer.
  • The STREAMON ordering needs proper sequencing on hantro: the current WIP defer in RequestCreateContext (commit 44a7327) bypasses the EINVAL but doesn't actually enable the queue. The real fix is to set both queue formats up front, queue the first buffer with controls attached, then STREAMON both queues.
  • Source-change-event (V4L2_EVENT_SOURCE_CHANGE) handling is probably needed for resolution-change streams; not strictly required for the fixed-resolution Big Buck Bunny clip we test with.

Once that lands, mpv --hwdec=vaapi should switch from "probe then SW decode" to "actual HW decode through our path", and the user-facing recipe matches what FFmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime already delivers (14 % CPU realtime).

Brave / Chromium specifically remains parked behind the chromeos pipeline issue, independent of this library's progress.

Port plan

The seam to flip is the entire kernel-userspace V4L2 boundary. The work is mostly mechanical and concentrated in three files:

src/v4l2.c — helpers (the bottleneck; all other files call into this)

Add a v4l2_type_is_mplane() predicate (one already exists upstream — keep it) and dual paths through:

  • v4l2_set_format() — populate either format.fmt.pix or format.fmt.pix_mp, including plane_fmt[0].sizeimage for OUTPUT and num_planes defaulting to 1 for raw/NV12 capture.
  • v4l2_create_buffers() / v4l2_request_buffers() — set v4l2_create_buffers.format.type to the MPLANE variant when source is mplane.
  • v4l2_query_buffer() / v4l2_export_buffer(): switch on type and use the v4l2_buffer.m.planes array (with v4l2_buffer.length holding the plane count) instead of the single-plane m.offset / m.userptr / m.fd members. EXPBUF now needs a plane=0 argument.
  • v4l2_queue_buffer() / v4l2_dequeue_buffer(): same m.planes[] switch. On the OUTPUT side, the size of the bitstream slice goes in m.planes[0].bytesused.

Reference: libavcodec/v4l2_buffers.c and libavcodec/v4l2_context.c in FFmpeg already do this branching cleanly — it's the closest API match. Crib V4L2_TYPE_IS_MULTIPLANAR() style switching there. GStreamer's gstv4l2decoder.c is the second reference; it covers the request-API + mplane path explicitly for the same Rockchip hardware we target.

src/context.c — context creation

RequestCreateContext calls into v4l2_set_format() for the OUTPUT and CAPTURE queues. Detect the queue capability at context creation (cache the mplane bit on the context object) and pick the right type for every subsequent helper call.

src/picture.c — frame submission

The QBUF / DQBUF / EXPBUF paths in RequestEndPicture() and friends. Same pattern — switch on the cached mplane bit and use the multiplanar variants of the v4l2.c helpers. The slice-data submission (m.planes[0].bytesused) is the load-bearing change here.

Reference implementations (read these side-by-side with our diff)

Test fixtures

  • ohm: RK3566 PineTab2, kernel 6.19.10-danctnix1-1-pinetab2. The Hantro decoder exposes S264 / MG2S / VP8F formats on /dev/video1 (multiplanar). This is the primary dev target. Brave on ohm is the integration test endpoint; running vainfo with LIBVA_DRIVER_NAME=v4l2_request and LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 set in the environment is the unit test.
  • Test clip: /moviedata/fourier-test/bbb_1080p30_h264.mp4 on doppler (SHA-16 dcf8a7170fbd49bb, 1920×1080 H.264, 24 fps source despite the name). Pull via hertz lxc file pull.
  • Reference path that already works on the same hardware: GStreamer gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec ! waylandsink — 6 % CPU, zero drops. That's the ceiling for what we're trying to match through the libva path.

Out of scope (for the first port milestone)

  • HEVC: kernel CIDs renamed, RK3566 has no HW HEVC. Deferred until RK3588 silicon is on the bench, and then handled in a separate HEVC-revival pass.
  • VP9, VP8, AV1 — no HW path or out of bootlin's original codec set.
  • Userspace bitstream parsing: the VAAPI client already parses the bitstream and hands over parsed parameter buffers, which the V4L2 stateless API takes as controls; this library only forwards parameters. No need to touch.
  • HEVC RFC (reference frame compression) — Rockchip-specific, kernel config has it disabled (CONFIG_VIDEO_HANTRO_HEVC_RFC=n on ohm).

Build + install

  • Build container: fermi (Arch ARM aarch64 LXC on hertz). meson setup + ninja straight off the source tree, no makepkg dance needed for development iteration.
  • Install path: /usr/lib/dri/v4l2_request_drv_video.so.
  • Activate: LIBVA_DRIVER_NAME=v4l2_request plus the path env vars LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 and LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0.
  • Once the port works: package as marfrit/libva-v4l2-request-fourier next to ffmpeg-v4l2-request-git, with the same provides=(libva-v4l2-request-git) shape.

Ack

Bootlin authored the original library under MIT/LGPL2.1; this fork adds GPL-2.0-licensed shim files (HEVC strip, multiplanar plumbing) and is meant to track upstream if upstream ever picks the work back up.