From e8c39374359b595de5dadee9ee652df862b2cc4e Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Mon, 4 May 2026 08:08:32 +0000
Subject: [PATCH] STUDY.md: replace with pointer to libva-multiplanar campaign Phase 0

The Phase 0 / Phase 2 substrate that lived here has been transformed
into ../phase0_findings.md as the campaign-level Phase 0 document.
This file is reduced to a pointer + a git-show recipe to recover the
prior content from commit e0acc33.
---
 STUDY.md | 234 ++-----------------------------------------------------------------------------------------------------------------------------
 1 file changed, 5 insertions(+), 229 deletions(-)

diff --git a/STUDY.md b/STUDY.md
index 50e10e1..e7d8087 100644
--- a/STUDY.md
+++ b/STUDY.md
@@ -1,233 +1,9 @@
-# libva-v4l2-request — Fourier port study
+# STUDY.md → moved
 
-## Goal
+The Phase 0 / Phase 2 substrate that previously lived here has been transformed into the campaign-level Phase 0 document at:
 
-Make this libva backend usable on **multiplanar** V4L2 stateless decoders:
-specifically the Rockchip Hantro VPU (RK3566 ohm) and the upcoming RK3588
-hantro/VDPU381 path. End deliverable: any VAAPI client (Brave, Firefox via
-ffmpeg-vaapi, mpv `--hwdec=vaapi`, vlc, ...) gets HW decode for H.264 + MPEG-2
-on the Fourier fleet without going through GStreamer.
+- [`../phase0_findings.md`](../phase0_findings.md)
 
-## Why this fork exists
+That document also points at the remaining open questions for Phase 1 lock and the verification gate at Phase 7. Read it together with the campaign README at [`../README.md`](../README.md).
 
-Bootlin upstream went dormant
-around 2021 and was written for **single-plane** sunxi-cedrus decoders.
-Collabora's strategic replacement is `cros-codecs` (Rust) — it bypasses libva
-entirely, targets Chromium/Firefox direct integration, and **is not shipping
-soon**. That leaves a hole for VAAPI clients on Rockchip. None of the public
-forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) shipped multiplanar.
-
-Reference: Mozilla bug 1833354 / 1965646 explicitly notes "Rockchip uses
-v4l2-request, not v4l2-m2m" — Firefox HW decode on RK3566/RK3588 needs exactly
-a working libva-v4l2-request to bridge.
-
-## State today
-
-Bootlin tip is `a3c2476`. Stack of WIP commits on top:
-
-### Build cleanly against current kernel UAPI
-
-1. `V4L2_PIX_FMT_H264_SLICE_RAW` → `V4L2_PIX_FMT_H264_SLICE` rename.
-2. `src/h264.c`: missing `#include "utils.h"` for `request_log()`.
-3. HEVC stripped — `h265.c`/`h265.h` excluded from `meson.build`,
-   `hevc-ctrls.h` replaced by passthrough to ``,
-   four HEVC case blocks removed from `picture.c`.
-4. `include/h264-ctrls.h` made into a passthrough shim to
-   `` plus `V4L2_CID_MPEG_VIDEO_H264_* →
-   V4L2_CID_STATELESS_H264_*` aliases (kernel renamed during upstreaming).
-5. `src/h264.c` shape updates to track the upstreamed `struct
-   v4l2_ctrl_h264_slice_params`: drop `.size`, use `struct
-   v4l2_h264_reference {fields, index}` for `ref_pic_list*`, move
-   `pred_weight_table` out to its own `V4L2_CID_STATELESS_H264_PRED_WEIGHTS`
-   control, drop `decode_params.num_slices` (kernel infers from queued
-   controls).
-6. `src/tiled_yuv.S`: aarch64 stub of `tiled_to_planar` — the ARMv7 NEON
-   body is `#ifndef __aarch64__`-guarded; without a stub the `.so` had
-   an undefined symbol and dlopen failed.
-
-Library now builds clean (~265 KB `.so`) and `vainfo` enumerates H.264 +
-MPEG-2 profiles.
-
-### Probe + control flow fixes
-
-7. `src/video.c`: add NV12 multi-plane format entry; `video_format_find()`
-   takes `bool mplane` so single- and multi-plane NV12 entries don't
-   collide on pixelformat.
-8. `src/surface.c`: probe block tries `V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE`
-   as a fallback after single-plane probes fail.
-9. **Eager probe in `RequestInit`** — Chromium's vaapi_video_decoder calls
-   `vaCreateContext` *before* `vaCreateSurfaces2`, so the original lazy
-   "set `driver_data->video_format` in `RequestCreateSurfaces`" path was
-   too late. Promoted the probe into `video_format_probe()` in
-   `video.c` and call it at init time.
-10. `src/context.c`: same `V4L2_PIX_FMT_H264_SLICE_RAW → _SLICE` rename
-    that was missed in the first pass.
-11. **WIP**: defer `STREAMON` calls in `RequestCreateContext`. The V4L2
-    stateless protocol on hantro requires OUTPUT format → SPS controls →
-    first slice queued → THEN `STREAMON`. Deferring lets `vaCreateContext`
-    succeed; proper sequencing is the next phase.
-
-### Diagnostic logging (will revert before final)
-
-12. `src/utils.c`: `request_log()` tees to `/tmp/libva-fourier.log` so
-    sandboxed Chromium GPU processes don't swallow the trace output.
-
-### Failure mode reached now (2026-04-26)
-
-`vainfo` works fully. **mpv `--hwdec=vaapi` probes our driver end-to-end
-successfully** (seven profiles enumerated, `RequestQueryImageFormats /
-QueryConfigEntrypoints / CreateConfig / QuerySurfaceAttributes /
-CreateSurfaces2 / DeriveImage / CreateImage / CreateBuffer /
-ExportSurfaceHandle` all run clean) — then falls back to libavcodec's
-software H.264 decoder for the actual decode. mpv's drop pattern matches
-the SW baseline (≈16 drops/s through KWin gpu-next), so the SW path is
-in use, not ours.
-
-Brave's failure is **not** in our driver. The verbose log shows:
-
-```
-ApplyResolutionChangeWithScreenSizes()
-PickDecoderOutputFormat(): Initializing ImageProcessor; max buffers: 16
-ERROR: failed Initialize()ing the frame pool
-```
-
-`PickDecoderOutputFormat` from `media/gpu/chromeos/video_decoder_pipeline.cc`
-is the **ChromeOS** pipeline trying to init a V4L2 ImageProcessor (a
-ChromeOS-specific concept — separate V4L2 m2m chip block for color
-conversion / scaling). Brave-on-Linux runs this code path because the
-build doesn't gate it on `is_chromeos`, but on a plain Linux Wayland
-system there's nothing for the ImageProcessor to bind to and it bails
-before any libva call lands in our driver. **No code change in
-libva-v4l2-request-fourier will fix Brave**; the lever is on the
-Chromium / Brave build side (skip the chromeos pipeline, or supply the
-expected V4L2 image processor device).
-
-### What still needs to happen for actual decode
-
-The libva entry-point surface is largely done (probing works); the
-remaining gap is the **decode submission path**:
-
-- `RequestBeginPicture` / `RequestRenderPicture` / `RequestEndPicture`
-  need to map to V4L2 stateless: queue OUTPUT buffer with the encoded
-  slice + the SPS/PPS/SLICE_PARAMS/PRED_WEIGHTS/DECODE_PARAMS/
-  SCALING_MATRIX controls via the request fd, then trigger decode and
-  dequeue a CAPTURE buffer.
-- The `STREAMON` ordering needs proper sequencing on hantro: the
-  current WIP defer in `RequestCreateContext` (commit `44a7327`) bypasses
-  the EINVAL but doesn't actually enable the queue. The real fix is to
-  set both queue formats up front, queue the first buffer with controls
-  attached, then `STREAMON` both queues.
-- Source-change-event (`V4L2_EVENT_SOURCE_CHANGE`) handling is probably
-  needed for resolution-change streams; not strictly required for the
-  fixed-resolution Big Buck Bunny clip we test with.
-
-Once that lands, `mpv --hwdec=vaapi` should switch from "probe then SW
-decode" to "actual HW decode through our path", and the user-facing
-recipe matches what FFmpeg `-hwaccel v4l2request -hwaccel_output_format
-drm_prime` already delivers (14 % CPU realtime).
-
-Brave / Chromium specifically remains parked behind the chromeos
-pipeline issue, independent of this library's progress.
-
-## Port plan
-
-The seam to flip is the entire kernel-userspace V4L2 boundary. The work is
-mostly mechanical and concentrated in three files:
-
-### `src/v4l2.c` — helpers (the bottleneck; all other files call into this)
-
-Add a `v4l2_type_is_mplane()` predicate (one already exists upstream — keep it)
-and dual paths through:
-
-- `v4l2_set_format()` — populate either `format.fmt.pix` or `format.fmt.pix_mp`,
-  including `plane_fmt[0].sizeimage` for OUTPUT and `num_planes` defaulting to 1
-  for raw/NV12 capture.
-- `v4l2_create_buffers()` / `v4l2_request_buffers()` — set
-  `v4l2_create_buffers.format.type` to the MPLANE variant when source is mplane.
-- `v4l2_query_buffer()` / `v4l2_export_buffer()` — switch on type, use
-  `v4l2_buffer.m.planes` array (length `num_planes`) instead of `m.userptr` /
-  `m.fd`. EXPBUF now needs `plane=0` parameter.
-- `v4l2_queue_buffer()` / `v4l2_dequeue_buffer()` — same `m.planes[]` switch.
-  The OUTPUT side passes the bitstream slice as `m.planes[0].bytesused`.
-
-Reference: `libavcodec/v4l2_buffers.c` and `libavcodec/v4l2_context.c` in
-FFmpeg already do this branching cleanly — it's the closest API match. Crib
-`V4L2_TYPE_IS_MULTIPLANAR()` style switching there. GStreamer's
-`gstv4l2decoder.c` is the second reference; it covers the request-API +
-mplane path explicitly for the same Rockchip hardware we target.
-
-### `src/context.c` — context creation
-
-`RequestCreateContext` calls into `v4l2_set_format()` for the OUTPUT and
-CAPTURE queues. Detect the queue capability at context creation (cache the
-mplane bit on the context object) and pick the right type for every
-subsequent helper call.
-
-### `src/picture.c` — frame submission
-
-The QBUF / DQBUF / EXPBUF paths in `RequestEndPicture()` and friends. Same
-pattern — switch on the cached mplane bit and use the multiplanar variants
-of the `v4l2.c` helpers. The slice-data submission (`m.planes[0].bytesused`)
-is the load-bearing change here.
-
-## Reference implementations (read these side-by-side with our diff)
-
-- **FFmpeg** — `libavcodec/v4l2_request.c`, `v4l2_request_buffer.c`, per-codec
-  files like `v4l2_request_h264.c`. Already multiplanar, already works on
-  hantro/rkvdec — this is the closest-API canonical example.
-  -
-  - 2024-08 v2 patchset:
-  - Active downstream: (branch `v4l2-request-n8.1`)
-- **GStreamer v4l2codecs** — `gst-plugins-bad/sys/v4l2codecs/`. `gstv4l2decoder.c`
-  has the canonical multiplanar S_FMT / REQBUFS / EXPBUF code on the exact
-  Rockchip drivers we target. `gstv4l2codecsh264dec.c` shows the request-API
-  controls submission.
-  -
-- **Chromium** — `media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc}`
-  + `v4l2_queue.cc`. ChromeOS-mature multiplanar code; higher abstraction than
-  we need but useful for surface lifecycle / request-fd tracking patterns.
-  -
-
-## Test fixtures
-
-- **ohm** — RK3566 PineTab2, kernel `6.19.10-danctnix1-1-pinetab2`. Hantro
-  decoder exposes S264 / MG2S / VP8F formats on `/dev/video1` (multiplanar).
-  This is the primary dev target. Brave on ohm is the integration test
-  endpoint; `vainfo LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1`
-  is the unit test.
-- **Test clip**: `/moviedata/fourier-test/bbb_1080p30_h264.mp4` on doppler
-  (SHA-16 `dcf8a7170fbd49bb`, 1920×1080 H.264, 24 fps source despite the
-  name). Pull via hertz `lxc file pull`.
-- **Reference path that already works on the same hardware**: GStreamer
-  `gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec !
-  waylandsink` — 6 % CPU, zero drops. That's the ceiling for what we're
-  trying to match through the libva path.
-
-## Out of scope (for the first port milestone)
-
-- HEVC — kernel CIDs renamed, RK3566 has no HW HEVC. Deferred until RK3588
-  silicon is on the bench AND a separate HEVC-revival pass.
-- VP9, VP8, AV1 — no HW path or out of bootlin's original codec set.
-- Userspace bitstream parsing — kernel V4L2 stateless API does the parsing;
-  this library only forwards parameters. No need to touch.
-- HEVC RFC (reference frame compression) — Rockchip-specific, kernel
-  config has it disabled (`CONFIG_VIDEO_HANTRO_HEVC_RFC=n` on ohm).
-
-## Build + install
-
-- Build container: `fermi` (Arch ARM aarch64 LXC on hertz). `meson setup`
-  + `ninja` straight off the source tree, no makepkg dance needed for
-  development iteration.
-- Install path: `/usr/lib/dri/v4l2_request_drv_video.so`.
-- Activate: `LIBVA_DRIVER_NAME=v4l2_request` plus the path env vars
-  `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` and
-  `LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0`.
-- Once the port works: package as `marfrit/libva-v4l2-request-fourier`
-  next to `ffmpeg-v4l2-request-git`, with the same
-  `provides=(libva-v4l2-request-git)` shape.
-
-## Ack
-
-Bootlin authored the original library under MIT/LGPL2.1; this fork adds
-GPL-2.0-licensed shim files (HEVC strip, multiplanar plumbing) and is meant
-to track upstream if upstream ever picks the work back up.
+The git commit that this file points back to (the last commit while STUDY.md still held the substrate content) is `e0acc33` — `git show e0acc33:STUDY.md` recovers the historical content if needed.