# libva-v4l2-request — Fourier port study

## Goal

Make this libva backend usable on **multiplanar** V4L2 stateless decoders,
specifically the Rockchip Hantro VPU (RK3566 ohm) and the upcoming RK3588
hantro/VDPU381 path. End deliverable: any VAAPI client (Brave, Firefox via
ffmpeg-vaapi, mpv `--hwdec=vaapi`, vlc, ...) gets HW decode for H.264 + MPEG-2
on the Fourier fleet without going through GStreamer.

## Why this fork exists

Bootlin upstream <https://github.com/bootlin/libva-v4l2-request> went dormant
around 2021 and was written for **single-plane** sunxi-cedrus decoders.
Collabora's strategic replacement is `cros-codecs` (Rust): it bypasses libva
entirely, targets direct Chromium/Firefox integration, and **is not shipping
soon**. That leaves a hole for VAAPI clients on Rockchip. None of the public
forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) shipped multiplanar
support.

Reference: Mozilla bug 1833354 / 1965646 explicitly notes "Rockchip uses
v4l2-request, not v4l2-m2m". Firefox HW decode on RK3566/RK3588 needs exactly
this: a working libva-v4l2-request as the bridge.

## State today

Bootlin tip is `a3c2476`. The stack of WIP commits on top:

### Build cleanly against current kernel UAPI

1. `V4L2_PIX_FMT_H264_SLICE_RAW` → `V4L2_PIX_FMT_H264_SLICE` rename.
2. `src/h264.c`: add the missing `#include "utils.h"` for `request_log()`.
3. HEVC stripped: `h265.c`/`h265.h` excluded from `meson.build`,
   `hevc-ctrls.h` replaced by a passthrough to `<linux/v4l2-controls.h>`,
   four HEVC case blocks removed from `picture.c`.
4. `include/h264-ctrls.h` turned into a passthrough shim to
   `<linux/v4l2-controls.h>` plus `V4L2_CID_MPEG_VIDEO_H264_*` →
   `V4L2_CID_STATELESS_H264_*` aliases (the kernel renamed the controls
   during upstreaming).
5. `src/h264.c` shape updates to track the upstreamed
   `struct v4l2_ctrl_h264_slice_params`: drop `.size`, use
   `struct v4l2_h264_reference {fields, index}` for `ref_pic_list*`, move
   `pred_weight_table` out to its own `V4L2_CID_STATELESS_H264_PRED_WEIGHTS`
   control, drop `decode_params.num_slices` (kernel infers from queued
   controls).
6. `src/tiled_yuv.S`: aarch64 stub of `tiled_to_planar`. The ARMv7 NEON
   body is guarded by `#ifndef __aarch64__`; without a stub the `.so` had
   an undefined symbol and `dlopen` failed.

The library now builds clean (~265 KB `.so`) and `vainfo` enumerates H.264 +
MPEG-2 profiles.

### Probe + control flow fixes

7. `src/video.c`: add an NV12 multi-plane format entry; `video_format_find()`
   takes `bool mplane` so single- and multi-plane NV12 entries don't
   collide on pixelformat.
8. `src/surface.c`: the probe block tries `V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE`
   as a fallback after the single-plane probes fail.
9. **Eager probe in `RequestInit`**: Chromium's vaapi_video_decoder calls
   `vaCreateContext` *before* `vaCreateSurfaces2`, so the original lazy
   "set `driver_data->video_format` in `RequestCreateSurfaces`" path was
   too late. The probe now lives in `video_format_probe()` in
   `video.c` and is called at init time.
10. `src/context.c`: the same `V4L2_PIX_FMT_H264_SLICE_RAW` → `_SLICE` rename,
    which was missed in the first pass.
11. **WIP**: defer `STREAMON` calls in `RequestCreateContext`. The V4L2
    stateless protocol on hantro requires OUTPUT format → SPS controls →
    first slice queued → THEN `STREAMON`. Deferring lets `vaCreateContext`
    succeed; proper sequencing is the next phase.

### Diagnostic logging (will revert before final)

12. `src/utils.c`: `request_log()` tees to `/tmp/libva-fourier.log` so
    sandboxed Chromium GPU processes don't swallow the trace output.
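
A tee of this kind can be as small as the sketch below. The function name `log_tee()` and its signature are hypothetical and do not reflect how `request_log()` is actually written in `src/utils.c`; the sketch only shows the "write to stderr, then append to a file" pattern.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: print one formatted message to stderr and append
 * the same message to a log file.
 */
static void log_tee(const char *path, const char *format, ...)
{
	va_list args;
	FILE *file;

	assert(path != NULL && format != NULL);

	va_start(args, format);
	vfprintf(stderr, format, args);
	va_end(args);

	file = fopen(path, "a");
	if (file == NULL)
		return; /* path not writable; stderr already has the message */

	/* The argument list must be restarted for the second pass. */
	va_start(args, format);
	vfprintf(file, format, args);
	va_end(args);

	fclose(file);
}
```

Opening and closing the file on every call is slow but keeps the log intact if the GPU process is killed mid-decode, which seems an acceptable trade for temporary diagnostics.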

### Failure mode reached now (2026-04-26)

`vainfo` works fully. Brave on ohm with the patched `.so` reaches deeper:

- Init: detects NV12 multi-plane CAPTURE format ✓
- vaCreateContext: profile=H264Main 1920×1088, S_FMT + CREATE_BUFS pass ✓
- vaCreateContext: STREAMON OUTPUT returned EINVAL ← deferred to unblock
- Now stuck at: `failed Initialize()ing the frame pool` from
  `vaapi_video_decoder.cc`

That's the next-phase boundary. `vaCreateSurfaces2` (frame-pool init) is
where Chromium discovers buffer geometry for output frames; the library
needs to do CAPTURE-side `S_FMT` + `REQBUFS` + `EXPBUF` without
single-plane assumptions on a freshly created context, and the V4L2 stateless
protocol's source-change-event dance probably needs implementing too.

## Port plan

The seam to flip is the entire kernel-userspace V4L2 boundary. The work is
mostly mechanical and concentrated in three files:

### `src/v4l2.c`: helpers (the bottleneck; all other files call into this)

Keep the `v4l2_type_is_mplane()` predicate that already exists upstream and
add dual paths through:

- `v4l2_set_format()`: populate either `format.fmt.pix` or `format.fmt.pix_mp`,
  including `plane_fmt[0].sizeimage` for OUTPUT and `num_planes` defaulting to 1
  for raw/NV12 capture.
- `v4l2_create_buffers()` / `v4l2_request_buffers()`: set
  `v4l2_create_buffers.format.type` to the MPLANE variant when the source is
  mplane.
- `v4l2_query_buffer()` / `v4l2_export_buffer()`: switch on type and use the
  `v4l2_buffer.m.planes` array (with `v4l2_buffer.length` holding the plane
  count) instead of the single-plane `m.offset` / `m.userptr` / `m.fd`
  members. EXPBUF now needs an explicit `plane=0` parameter.
- `v4l2_queue_buffer()` / `v4l2_dequeue_buffer()`: same `m.planes[]` switch.
  The OUTPUT side passes the bitstream slice size as `m.planes[0].bytesused`.

Reference: `libavcodec/v4l2_buffers.c` and `libavcodec/v4l2_context.c` in
FFmpeg already do this branching cleanly; they are the closest API match. Crib
the `V4L2_TYPE_IS_MULTIPLANAR()`-style switching from there. GStreamer's
`gstv4l2decoder.c` is the second reference; it covers the request-API +
mplane path explicitly for the same Rockchip hardware we target.

### `src/context.c`: context creation

`RequestCreateContext` calls into `v4l2_set_format()` for the OUTPUT and
CAPTURE queues. Detect the queue capability at context creation (cache the
mplane bit on the context object) and pick the right type for every
subsequent helper call.

### `src/picture.c`: frame submission

This file holds the QBUF / DQBUF / EXPBUF paths in `RequestEndPicture()` and
friends. Same pattern: switch on the cached mplane bit and use the multiplanar
variants of the `v4l2.c` helpers. The slice-data submission
(`m.planes[0].bytesused`) is the load-bearing change here.

## Reference implementations (read these side-by-side with our diff)

- **FFmpeg**: `libavcodec/v4l2_request.c`, `v4l2_request_buffer.c`, per-codec
  files like `v4l2_request_h264.c`. Already multiplanar, already works on
  hantro/rkvdec; this is the closest-API canonical example.
  - <https://github.com/FFmpeg/FFmpeg/tree/master/libavcodec>
  - 2024-08 v2 patchset: <https://www.mail-archive.com/ffmpeg-devel@ffmpeg.org/msg169515.html>
  - Active downstream: <https://code.ffmpeg.org/Kwiboo/FFmpeg/> (branch `v4l2-request-n8.1`)
- **GStreamer v4l2codecs**: `gst-plugins-bad/sys/v4l2codecs/`. `gstv4l2decoder.c`
  has the canonical multiplanar S_FMT / REQBUFS / EXPBUF code on the exact
  Rockchip drivers we target. `gstv4l2codecsh264dec.c` shows the request-API
  controls submission.
  - <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/tree/main/subprojects/gst-plugins-bad/sys/v4l2codecs>
- **Chromium**: `media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc}`
  plus `v4l2_queue.cc`. ChromeOS-mature multiplanar code; higher abstraction
  than we need but useful for surface lifecycle / request-fd tracking patterns.
  - <https://chromium.googlesource.com/chromium/src/+/refs/heads/main/media/gpu/v4l2/>

## Test fixtures

- **ohm**: RK3566 PineTab2, kernel `6.19.10-danctnix1-1-pinetab2`. The Hantro
  decoder exposes S264 / MG2S / VP8F formats on `/dev/video1` (multiplanar).
  This is the primary dev target. Brave on ohm is the integration test
  endpoint;
  `LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 vainfo`
  is the unit test.
- **Test clip**: `/moviedata/fourier-test/bbb_1080p30_h264.mp4` on doppler
  (SHA-16 `dcf8a7170fbd49bb`, 1920×1080 H.264, 24 fps source despite the
  name). Pull via hertz `lxc file pull`.
- **Reference path that already works on the same hardware**: GStreamer
  `gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec !
  waylandsink` at 6 % CPU with zero drops. That's the ceiling for what we're
  trying to match through the libva path.

## Out of scope (for the first port milestone)

- HEVC: kernel CIDs renamed, RK3566 has no HW HEVC. Deferred until RK3588
  silicon is on the bench, and then handled in a separate HEVC-revival pass.
- VP9, VP8, AV1: no HW path or out of bootlin's original codec set.
- Userspace bitstream parsing: the VAAPI client already parses the headers,
  and the V4L2 stateless API takes the parsed parameters as controls; this
  library only forwards them. No need to touch.
- HEVC RFC (reference frame compression): Rockchip-specific, kernel
  config has it disabled (`CONFIG_VIDEO_HANTRO_HEVC_RFC=n` on ohm).

## Build + install

- Build container: `fermi` (Arch ARM aarch64 LXC on hertz). `meson setup` plus
  `ninja` straight off the source tree, no makepkg dance needed for
  development iteration.
- Install path: `/usr/lib/dri/v4l2_request_drv_video.so`.
- Activate: `LIBVA_DRIVER_NAME=v4l2_request` plus the path env vars
  `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` and
  `LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0`.
- Once the port works: package as `marfrit/libva-v4l2-request-fourier`
  next to `ffmpeg-v4l2-request-git`, with the same
  `provides=(libva-v4l2-request-git)` shape.

## Ack

Bootlin authored the original library under MIT/LGPL2.1; this fork adds
GPL-2.0-licensed shim files (HEVC strip, multiplanar plumbing) and is meant
to track upstream if upstream ever picks the work back up.