STUDY.md: replace with pointer to libva-multiplanar campaign Phase 0

The Phase 0 / Phase 2 substrate that lived here has been transformed
into ../phase0_findings.md as the campaign-level Phase 0 document.
This file is reduced to a pointer + a git-show recipe to recover the
prior content from commit e0acc33.
2026-05-04 08:08:32 +00:00
parent e0acc33455
commit e8c3937435
+5 -229
@@ -1,233 +1,9 @@
Added lines (the new STUDY.md; its closing `git show` line is rendered at the end of this diff):

# STUDY.md → moved
The Phase 0 / Phase 2 substrate that previously lived here has been transformed into the campaign-level Phase 0 document at:
- [`../phase0_findings.md`](../phase0_findings.md)
That document also points at the remaining open questions for Phase 1 lock and the verification gate at Phase 7. Read it together with the campaign README at [`../README.md`](../README.md).

Removed lines (the previous STUDY.md):

# libva-v4l2-request — Fourier port study
## Goal
Make this libva backend usable on **multiplanar** V4L2 stateless decoders:
specifically the Rockchip Hantro VPU (RK3566 ohm) and the upcoming RK3588
hantro/VDPU381 path. End deliverable: any VAAPI client (Brave, Firefox via
ffmpeg-vaapi, mpv `--hwdec=vaapi`, vlc, ...) gets HW decode for H.264 + MPEG-2
on the Fourier fleet without going through GStreamer.
## Why this fork exists
Bootlin upstream <https://github.com/bootlin/libva-v4l2-request> went dormant
around 2021 and was written for **single-plane** sunxi-cedrus decoders.
Collabora's strategic replacement is `cros-codecs` (Rust) — it bypasses libva
entirely, targets Chromium/Firefox direct integration, and **is not shipping
soon**. That leaves a hole for VAAPI clients on Rockchip. None of the public
forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) shipped multiplanar.
Reference: Mozilla bug 1833354 / 1965646 explicitly notes "Rockchip uses
v4l2-request, not v4l2-m2m" — Firefox HW decode on RK3566/RK3588 needs exactly
a working libva-v4l2-request to bridge.
## State today
Bootlin tip is `a3c2476`. Stack of WIP commits on top:
### Build cleanly against current kernel UAPI
1. `V4L2_PIX_FMT_H264_SLICE_RAW` → `V4L2_PIX_FMT_H264_SLICE` rename.
2. `src/h264.c`: missing `#include "utils.h"` for `request_log()`.
3. HEVC stripped — `h265.c`/`h265.h` excluded from `meson.build`,
`hevc-ctrls.h` replaced by passthrough to `<linux/v4l2-controls.h>`,
four HEVC case blocks removed from `picture.c`.
4. `include/h264-ctrls.h` made into a passthrough shim to
`<linux/v4l2-controls.h>` plus `V4L2_CID_MPEG_VIDEO_H264_* →
V4L2_CID_STATELESS_H264_*` aliases (kernel renamed during upstreaming).
5. `src/h264.c` shape updates to track the upstreamed `struct
v4l2_ctrl_h264_slice_params`: drop `.size`, use `struct
v4l2_h264_reference {fields, index}` for `ref_pic_list*`, move
`pred_weight_table` out to its own `V4L2_CID_STATELESS_H264_PRED_WEIGHTS`
control, drop `decode_params.num_slices` (kernel infers from queued
controls).
6. `src/tiled_yuv.S`: aarch64 stub of `tiled_to_planar` — the ARMv7 NEON
body is `#ifndef __aarch64__`-guarded; without a stub the `.so` had
an undefined symbol and dlopen failed.
Library now builds clean (~265 KB `.so`) and `vainfo` enumerates H.264 +
MPEG-2 profiles.
### Probe + control flow fixes
7. `src/video.c`: add NV12 multi-plane format entry; `video_format_find()`
takes `bool mplane` so single- and multi-plane NV12 entries don't
collide on pixelformat.
8. `src/surface.c`: probe block tries `V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE`
as a fallback after single-plane probes fail.
9. **Eager probe in `RequestInit`** — Chromium's vaapi_video_decoder calls
`vaCreateContext` *before* `vaCreateSurfaces2`, so the original lazy
"set `driver_data->video_format` in `RequestCreateSurfaces`" path was
too late. Promoted the probe into `video_format_probe()` in
`video.c` and call it at init time.
10. `src/context.c`: same `V4L2_PIX_FMT_H264_SLICE_RAW → _SLICE` rename
that was missed in the first pass.
11. **WIP**: defer `STREAMON` calls in `RequestCreateContext`. The V4L2
stateless protocol on hantro requires OUTPUT format → SPS controls →
first slice queued → THEN `STREAMON`. Deferring lets `vaCreateContext`
succeed; proper sequencing is the next phase.
### Diagnostic logging (will revert before final)
12. `src/utils.c`: `request_log()` tees to `/tmp/libva-fourier.log` so
sandboxed Chromium GPU processes don't swallow the trace output.
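Item 12's tee has one detail that is easy to miss. A `va_list` can only be walked once, so writing the same message to two streams requires `va_copy()`. This is a hypothetical sketch: the function name, the `path` parameter (added only to keep it testable) and the message prefix are illustrative, not the fork's code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define LOG_PREFIX "v4l2-request: " /* illustrative prefix */

/*
 * Hypothetical tee logger: the same message goes to stderr and is appended
 * to `path`, so a sandboxed process that discards stderr still leaves a
 * trace on disk. (The fork hard-codes /tmp/libva-fourier.log.)
 */
void request_log_tee(const char *path, const char *format, ...)
{
	va_list args, copy;
	FILE *f;

	assert(path != NULL && format != NULL);

	va_start(args, format);
	va_copy(copy, args); /* each va_list may be consumed only once */

	fputs(LOG_PREFIX, stderr);
	vfprintf(stderr, format, args);

	f = fopen(path, "a");
	if (f != NULL) { /* a failed open must not break decoding */
		fputs(LOG_PREFIX, f);
		vfprintf(f, format, copy);
		fclose(f);
	}

	va_end(copy);
	va_end(args);
}
```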
### Failure mode reached now (2026-04-26)
`vainfo` works fully. **mpv `--hwdec=vaapi` probes our driver end-to-end
successfully** (seven profiles enumerated, `RequestQueryImageFormats /
QueryConfigEntrypoints / CreateConfig / QuerySurfaceAttributes /
CreateSurfaces2 / DeriveImage / CreateImage / CreateBuffer /
ExportSurfaceHandle` all run clean) — then falls back to libavcodec's
software H.264 decoder for the actual decode. mpv's drop pattern matches
the SW baseline (≈16 drops/s with mpv's gpu-next output under KWin), so the
SW path is in use, not ours.
Brave's failure is **not** in our driver. The verbose log shows:
```
ApplyResolutionChangeWithScreenSizes()
PickDecoderOutputFormat(): Initializing ImageProcessor; max buffers: 16
ERROR: failed Initialize()ing the frame pool
```
`PickDecoderOutputFormat` from `media/gpu/chromeos/video_decoder_pipeline.cc`
is the **ChromeOS** pipeline trying to init a V4L2 ImageProcessor (a
ChromeOS-specific concept — separate V4L2 m2m chip block for color
conversion / scaling). Brave-on-Linux runs this code path because the
build doesn't gate it on `is_chromeos`, but on a plain Linux Wayland
system there's nothing for the ImageProcessor to bind to and it bails
before any libva call lands in our driver. **No code change in
libva-v4l2-request-fourier will fix Brave**; the lever is on the
Chromium / Brave build side (skip the chromeos pipeline, or supply the
expected V4L2 image processor device).
### What still needs to happen for actual decode
The libva entry-point surface is largely done (probing works); the
remaining gap is the **decode submission path**:
- `RequestBeginPicture` / `RequestRenderPicture` / `RequestEndPicture`
need to map to V4L2 stateless: queue OUTPUT buffer with the encoded
slice + the SPS/PPS/SLICE_PARAMS/PRED_WEIGHTS/DECODE_PARAMS/
SCALING_MATRIX controls via the request fd, then trigger decode and
dequeue a CAPTURE buffer.
- The `STREAMON` ordering needs proper sequencing on hantro: the
current WIP defer in `RequestCreateContext` (commit `44a7327`) bypasses
the EINVAL but doesn't actually enable the queue. The real fix is to
set both queue formats up front, queue the first buffer with controls
attached, then `STREAMON` both queues.
- Source-change-event (`V4L2_EVENT_SOURCE_CHANGE`) handling is probably
needed for resolution-change streams; not strictly required for the
fixed-resolution Big Buck Bunny clip we test with.
Once that lands, `mpv --hwdec=vaapi` should switch from "probe then SW
decode" to "actual HW decode through our path", and the user-facing
recipe matches what FFmpeg `-hwaccel v4l2request -hwaccel_output_format
drm_prime` already delivers (14 % CPU realtime).
Brave / Chromium specifically remains parked behind the chromeos
pipeline issue, independent of this library's progress.
## Port plan
The seam to flip is the entire kernel-userspace V4L2 boundary. The work is
mostly mechanical and concentrated in three files:
### `src/v4l2.c` — helpers (the bottleneck; all other files call into this)
Keep the `v4l2_type_is_mplane()` predicate that upstream already has, and add
dual paths through:
- `v4l2_set_format()` — populate either `format.fmt.pix` or `format.fmt.pix_mp`,
including `plane_fmt[0].sizeimage` for OUTPUT and `num_planes` defaulting to 1
for raw/NV12 capture.
- `v4l2_create_buffers()` / `v4l2_request_buffers()` — set
`v4l2_create_buffers.format.type` (and `v4l2_requestbuffers.type`) to the MPLANE
variant when the queue is multiplanar.
- `v4l2_query_buffer()` / `v4l2_export_buffer()` — switch on type, use the
`v4l2_buffer.m.planes` array (with `v4l2_buffer.length` holding the plane count)
instead of the single-plane `m.offset` / `m.userptr` / `m.fd` members. EXPBUF
now needs `v4l2_exportbuffer.plane` set (0 for the single NV12 plane).
- `v4l2_queue_buffer()` / `v4l2_dequeue_buffer()` — same `m.planes[]` switch.
The OUTPUT side reports the bitstream slice size in `m.planes[0].bytesused`.
Reference: `libavcodec/v4l2_buffers.c` and `libavcodec/v4l2_context.c` in
FFmpeg already do this branching cleanly — it's the closest API match. Crib the
`V4L2_TYPE_IS_MULTIPLANAR()`-style switching from there. GStreamer's
`gstv4l2decoder.c` is the second reference; it covers the request-API +
mplane path explicitly for the same Rockchip hardware we target.
### `src/context.c` — context creation
`RequestCreateContext` calls into `v4l2_set_format()` for the OUTPUT and
CAPTURE queues. Detect the queue capability at context creation (cache the
mplane bit on the context object) and pick the right type for every
subsequent helper call.
### `src/picture.c` — frame submission
The QBUF / DQBUF / EXPBUF paths in `RequestEndPicture()` and friends. Same
pattern — switch on the cached mplane bit and use the multiplanar variants
of the `v4l2.c` helpers. The slice-data submission (`m.planes[0].bytesused`)
is the load-bearing change here.
## Reference implementations (read these side-by-side with our diff)
- **FFmpeg** — `libavcodec/v4l2_request.c`, `v4l2_request_buffer.c`, per-codec
files like `v4l2_request_h264.c`. Already multiplanar, already works on
hantro/rkvdec — this is the closest-API canonical example.
- <https://github.com/FFmpeg/FFmpeg/tree/master/libavcodec>
- 2024-08 v2 patchset: <https://www.mail-archive.com/ffmpeg-devel@ffmpeg.org/msg169515.html>
- Active downstream: <https://code.ffmpeg.org/Kwiboo/FFmpeg/> (branch `v4l2-request-n8.1`)
- **GStreamer v4l2codecs** — `gst-plugins-bad/sys/v4l2codecs/`. `gstv4l2decoder.c`
has the canonical multiplanar S_FMT / REQBUFS / EXPBUF code on the exact
Rockchip drivers we target. `gstv4l2codecsh264dec.c` shows the request-API
controls submission.
- <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/tree/main/subprojects/gst-plugins-bad/sys/v4l2codecs>
- **Chromium** — `media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc}`
and `v4l2_queue.cc`. ChromeOS-mature multiplanar code; higher abstraction than
we need but useful for surface lifecycle / request-fd tracking patterns.
- <https://chromium.googlesource.com/chromium/src/+/refs/heads/main/media/gpu/v4l2/>
## Test fixtures
- **ohm** — RK3566 PineTab2, kernel `6.19.10-danctnix1-1-pinetab2`. Hantro
decoder exposes S264 / MG2S / VP8F formats on `/dev/video1` (multiplanar).
This is the primary dev target. Brave on ohm is the integration test
endpoint; `LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 vainfo`
is the unit test.
- **Test clip**: `/moviedata/fourier-test/bbb_1080p30_h264.mp4` on doppler
(SHA-16 `dcf8a7170fbd49bb`, 1920×1080 H.264, 24 fps source despite the
name). Pull via hertz `lxc file pull`.
- **Reference path that already works on the same hardware**: GStreamer
`gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec !
waylandsink` — 6 % CPU, zero drops. That's the ceiling for what we're
trying to match through the libva path.
## Out of scope (for the first port milestone)
- HEVC — kernel CIDs renamed, RK3566 has no HW HEVC. Deferred until RK3588
silicon is on the bench AND a separate HEVC-revival pass.
- VP9, VP8, AV1 — no HW path or out of bootlin's original codec set.
- Userspace bitstream parsing — kernel V4L2 stateless API does the parsing;
this library only forwards parameters. No need to touch.
- HEVC RFC (reference frame compression) — Rockchip-specific, kernel
config has it disabled (`CONFIG_VIDEO_HANTRO_HEVC_RFC=n` on ohm).
## Build + install
- Build container: `fermi` (Arch ARM aarch64 LXC on hertz). `meson setup`
and `ninja` straight off the source tree, no makepkg dance needed for
development iteration.
- Install path: `/usr/lib/dri/v4l2_request_drv_video.so`.
- Activate: `LIBVA_DRIVER_NAME=v4l2_request` plus the path env vars
`LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` and
`LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0`.
- Once the port works: package as `marfrit/libva-v4l2-request-fourier`
next to `ffmpeg-v4l2-request-git`, with the same
`provides=(libva-v4l2-request-git)` shape.
## Ack
Bootlin authored the original library under MIT/LGPL2.1; this fork adds
GPL-2.0-licensed shim files (HEVC strip, multiplanar plumbing) and is meant
to track upstream if upstream ever picks the work back up.

Added line (closing the new STUDY.md):
The git commit that this file points back to (the last commit while STUDY.md still held the substrate content) is `e0acc33`; `git show e0acc33:STUDY.md` recovers the historical content if needed.