From 294bdc24f69c7403b8a274d7365cc34b96a47c75 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Sat, 25 Apr 2026 21:30:54 +0000 Subject: [PATCH] STUDY.md: port plan + reference implementations + test fixtures Cold-start dossier for the multiplanar port: goal, why-this-fork-exists, state-today, port plan (v4l2.c / context.c / picture.c), reference impls to read side-by-side (FFmpeg libavcodec/v4l2_request*, GStreamer gst-plugins-bad/sys/v4l2codecs, Chromium media/gpu/v4l2), test fixtures (ohm + bbb_1080p30_h264.mp4 + GStreamer ceiling at 6% CPU), out-of-scope (HEVC/VP9/AV1). Co-Authored-By: Claude Opus 4.7 (1M context) --- STUDY.md | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 148 insertions(+) create mode 100644 STUDY.md diff --git a/STUDY.md b/STUDY.md new file mode 100644 index 0000000..d91b1c6 --- /dev/null +++ b/STUDY.md @@ -0,0 +1,148 @@ +# libva-v4l2-request — Fourier port study + +## Goal + +Make this libva backend usable on **multiplanar** V4L2 stateless decoders: +specifically the Rockchip Hantro VPU (RK3566 ohm) and the upcoming RK3588 +hantro/VDPU381 path. End deliverable: any VAAPI client (Brave, Firefox via +ffmpeg-vaapi, mpv `--hwdec=vaapi`, vlc, ...) gets HW decode for H.264 + MPEG-2 +on the Fourier fleet without going through GStreamer. + +## Why this fork exists + +Bootlin upstream went dormant +around 2021 and was written for **single-plane** sunxi-cedrus decoders. +Collabora's strategic replacement is `cros-codecs` (Rust) — it bypasses libva +entirely, targets Chromium/Firefox direct integration, and **is not shipping +soon**. That leaves a hole for VAAPI clients on Rockchip. None of the public +forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) shipped multiplanar. + +Reference: Mozilla bug 1833354 / 1965646 explicitly notes "Rockchip uses +v4l2-request, not v4l2-m2m" — Firefox HW decode on RK3566/RK3588 needs exactly +a working libva-v4l2-request to bridge. + +## State today + +Initial commits applied to bootlin tip `a3c2476`: + +1. `V4L2_PIX_FMT_H264_SLICE_RAW` → `V4L2_PIX_FMT_H264_SLICE` (kernel UAPI rename). +2. `src/h264.c`: missing `#include "utils.h"` for `request_log()` (GCC 14 fatal). +3. HEVC stripped — h265.c/h265.h excluded from `meson.build`, hevc-ctrls.h + replaced by passthrough to ``, four HEVC case blocks + removed from `picture.c` (kernel `V4L2_CID_MPEG_VIDEO_HEVC_*` was renamed + to `V4L2_CID_STATELESS_HEVC_*`; ohm has no HW HEVC anyway). +4. `src/config.c`: profile probe falls back to `V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE` + when single-plane returns no formats. **`vainfo` now lists H.264 + MPEG-2 + profiles on ohm**, and Brave's GPU process picks them up via VAAPI. + +Failure mode reached and stuck on: + +``` +ERROR:media/gpu/vaapi/vaapi_wrapper.cc:2407] vaCreateContext failed, + VA error: operation failed +``` + +That's the next-phase boundary — the real port work below. + +## Port plan + +The seam to flip is the entire kernel-userspace V4L2 boundary. The work is +mostly mechanical and concentrated in three files: + +### `src/v4l2.c` — helpers (the bottleneck; all other files call into this) + +Add a `v4l2_type_is_mplane()` predicate (one already exists upstream — keep it) +and dual paths through: + +- `v4l2_set_format()` — populate either `format.fmt.pix` or `format.fmt.pix_mp`, + including `plane_fmt[0].sizeimage` for OUTPUT and `num_planes` defaulting to 1 + for raw/NV12 capture. +- `v4l2_create_buffers()` / `v4l2_request_buffers()` — set + `v4l2_create_buffers.format.type` to the MPLANE variant when source is mplane. +- `v4l2_query_buffer()` / `v4l2_export_buffer()` — switch on type, use + `v4l2_buffer.m.planes` array (length `num_planes`) instead of `m.userptr` / + `m.fd`. EXPBUF now needs `plane=0` parameter. +- `v4l2_queue_buffer()` / `v4l2_dequeue_buffer()` — same `m.planes[]` switch. + The OUTPUT side passes the bitstream slice as `m.planes[0].bytesused`. + +Reference: `libavcodec/v4l2_buffers.c` and `libavcodec/v4l2_context.c` in +FFmpeg already do this branching cleanly — it's the closest API match. Crib +`V4L2_TYPE_IS_MULTIPLANAR()` style switching there. GStreamer's +`gstv4l2decoder.c` is the second reference; it covers the request-API + +mplane path explicitly for the same Rockchip hardware we target. + +### `src/context.c` — context creation + +`RequestCreateContext` calls into `v4l2_set_format()` for the OUTPUT and +CAPTURE queues. Detect the queue capability at context creation (cache the +mplane bit on the context object) and pick the right type for every +subsequent helper call. + +### `src/picture.c` — frame submission + +The QBUF / DQBUF / EXPBUF paths in `RequestEndPicture()` and friends. Same +pattern — switch on the cached mplane bit and use the multiplanar variants +of the `v4l2.c` helpers. The slice-data submission (`m.planes[0].bytesused`) +is the load-bearing change here. + +## Reference implementations (read these side-by-side with our diff) + +- **FFmpeg** — `libavcodec/v4l2_request.c`, `v4l2_request_buffer.c`, per-codec + files like `v4l2_request_h264.c`. Already multiplanar, already works on + hantro/rkvdec — this is the closest-API canonical example. + - + - 2024-08 v2 patchset: + - Active downstream: (branch `v4l2-request-n8.1`) +- **GStreamer v4l2codecs** — `gst-plugins-bad/sys/v4l2codecs/`. `gstv4l2decoder.c` + has the canonical multiplanar S_FMT / REQBUFS / EXPBUF code on the exact + Rockchip drivers we target. `gstv4l2codecsh264dec.c` shows the request-API + controls submission. + - +- **Chromium** — `media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc}` + + `v4l2_queue.cc`. ChromeOS-mature multiplanar code; higher abstraction than + we need but useful for surface lifecycle / request-fd tracking patterns. + - + +## Test fixtures + +- **ohm** — RK3566 PineTab2, kernel `6.19.10-danctnix1-1-pinetab2`. Hantro + decoder exposes S264 / MG2S / VP8F formats on `/dev/video1` (multiplanar). + This is the primary dev target. Brave on ohm is the integration test + endpoint; `vainfo LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` + is the unit test. +- **Test clip**: `/moviedata/fourier-test/bbb_1080p30_h264.mp4` on doppler + (SHA-16 `dcf8a7170fbd49bb`, 1920×1080 H.264, 24 fps source despite the + name). Pull via hertz `lxc file pull`. +- **Reference path that already works on the same hardware**: GStreamer + `gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec ! + waylandsink` — 6 % CPU, zero drops. That's the ceiling for what we're + trying to match through the libva path. + +## Out of scope (for the first port milestone) + +- HEVC — kernel CIDs renamed, RK3566 has no HW HEVC. Deferred until RK3588 + silicon is on the bench AND a separate HEVC-revival pass. +- VP9, VP8, AV1 — no HW path or out of bootlin's original codec set. +- Userspace bitstream parsing — kernel V4L2 stateless API does the parsing; + this library only forwards parameters. No need to touch. +- HEVC RFC (reference frame compression) — Rockchip-specific, kernel + config has it disabled (`CONFIG_VIDEO_HANTRO_HEVC_RFC=n` on ohm). + +## Build + install + +- Build container: `fermi` (Arch ARM aarch64 LXC on hertz). `meson setup` + + `ninja` straight off the source tree, no makepkg dance needed for + development iteration. +- Install path: `/usr/lib/dri/v4l2_request_drv_video.so`. +- Activate: `LIBVA_DRIVER_NAME=v4l2_request` plus the path env vars + `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` and + `LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0`. +- Once the port works: package as `marfrit/libva-v4l2-request-fourier` + next to `ffmpeg-v4l2-request-git`, with the same + `provides=(libva-v4l2-request-git)` shape. + +## Ack + +Bootlin authored the original library under MIT/LGPL2.1; this fork adds +GPL-2.0-licensed shim files (HEVC strip, multiplanar plumbing) and is meant +to track upstream if upstream ever picks the work back up.