STUDY.md: port plan + reference implementations + test fixtures
Cold-start dossier for the multiplanar port: goal, why-this-fork-exists, state-today, port plan (v4l2.c / context.c / picture.c), reference impls to read side-by-side (FFmpeg libavcodec/v4l2_request*, GStreamer gst-plugins-bad/sys/v4l2codecs, Chromium media/gpu/v4l2), test fixtures (ohm + bbb_1080p30_h264.mp4 + GStreamer ceiling at 6% CPU), out-of-scope (HEVC/VP9/AV1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# libva-v4l2-request: Fourier port study

## Goal

Make this libva backend usable on **multiplanar** V4L2 stateless decoders:
specifically the Rockchip Hantro VPU (RK3566, ohm) and the upcoming RK3588
hantro/VDPU381 path. End deliverable: any VAAPI client (Brave, Firefox via
ffmpeg-vaapi, mpv `--hwdec=vaapi`, vlc, ...) gets HW decode for H.264 + MPEG-2
on the Fourier fleet without going through GStreamer.

## Why this fork exists

Bootlin upstream <https://github.com/bootlin/libva-v4l2-request> went dormant
around 2021 and was written for **single-plane** sunxi-cedrus decoders.
Collabora's strategic replacement is `cros-codecs` (Rust): it bypasses libva
entirely, targets Chromium/Firefox direct integration, and **is not shipping
soon**. That leaves a hole for VAAPI clients on Rockchip. None of the public
forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) shipped multiplanar.

Reference: Mozilla bug 1833354 / 1965646 explicitly notes "Rockchip uses
v4l2-request, not v4l2-m2m". Firefox HW decode on RK3566/RK3588 therefore needs
exactly this: a working libva-v4l2-request as the bridge.

## State today

Initial commits applied to bootlin tip `a3c2476`:

1. `V4L2_PIX_FMT_H264_SLICE_RAW` → `V4L2_PIX_FMT_H264_SLICE` (kernel UAPI rename).
2. `src/h264.c`: added the missing `#include "utils.h"` for `request_log()`
   (the implicit declaration is a fatal error under GCC 14).
3. HEVC stripped: `h265.c` / `h265.h` excluded from `meson.build`, `hevc-ctrls.h`
   replaced by a passthrough to `<linux/v4l2-controls.h>`, four HEVC case blocks
   removed from `picture.c` (the kernel renamed `V4L2_CID_MPEG_VIDEO_HEVC_*`
   to `V4L2_CID_STATELESS_HEVC_*`; ohm has no HW HEVC anyway).
4. `src/config.c`: the profile probe falls back to `V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE`
   when the single-plane queue returns no formats. **`vainfo` now lists H.264 + MPEG-2
   profiles on ohm**, and Brave's GPU process picks them up via VAAPI.

Failure mode reached and stuck on:

```
ERROR:media/gpu/vaapi/vaapi_wrapper.cc:2407] vaCreateContext failed,
VA error: operation failed
```

That's the next-phase boundary: the real port work below.

## Port plan

The seam to flip is the entire kernel-userspace V4L2 boundary. The work is
mostly mechanical and concentrated in three files:

### `src/v4l2.c`: helpers (the bottleneck; all other files call into this)

Keep the `v4l2_type_is_mplane()` predicate (one already exists upstream)
and add dual paths through:

- `v4l2_set_format()`: populate either `format.fmt.pix` or `format.fmt.pix_mp`,
  including `plane_fmt[0].sizeimage` for OUTPUT and `num_planes` defaulting to 1
  for raw/NV12 capture.
- `v4l2_create_buffers()` / `v4l2_request_buffers()`: set
  `v4l2_create_buffers.format.type` to the MPLANE variant when the source is mplane.
- `v4l2_query_buffer()` / `v4l2_export_buffer()`: switch on type and use the
  `v4l2_buffer.m.planes` array (with `v4l2_buffer.length` set to the plane
  count) instead of the single-plane `m.offset` / `m.userptr` / `m.fd`.
  `VIDIOC_EXPBUF` additionally needs `plane = 0` in `struct v4l2_exportbuffer`.
- `v4l2_queue_buffer()` / `v4l2_dequeue_buffer()`: same `m.planes[]` switch.
  The OUTPUT side passes the size of the bitstream slice as `m.planes[0].bytesused`.

Reference: `libavcodec/v4l2_buffers.c` and `libavcodec/v4l2_context.c` in
FFmpeg already do this branching cleanly and are the closest API match. Crib
the `V4L2_TYPE_IS_MULTIPLANAR()` style of switching from there. GStreamer's
`gstv4l2decoder.c` is the second reference; it covers the request-API +
mplane path explicitly for the same Rockchip hardware we target.

### `src/context.c`: context creation

`RequestCreateContext` calls into `v4l2_set_format()` for the OUTPUT and
CAPTURE queues. Detect the queue capability at context creation (cache the
mplane bit on the context object) and pick the right type for every
subsequent helper call.

### `src/picture.c`: frame submission

The QBUF / DQBUF / EXPBUF paths in `RequestEndPicture()` and friends follow
the same pattern: switch on the cached mplane bit and use the multiplanar
variants of the `v4l2.c` helpers. The slice-data submission
(`m.planes[0].bytesused`) is the load-bearing change here.

## Reference implementations (read these side-by-side with our diff)

- **FFmpeg**: `libavcodec/v4l2_request.c`, `v4l2_request_buffer.c`, per-codec
  files like `v4l2_request_h264.c`. Already multiplanar, already works on
  hantro/rkvdec; this is the canonical example with the closest API.
  - <https://github.com/FFmpeg/FFmpeg/tree/master/libavcodec>
  - 2024-08 v2 patchset: <https://www.mail-archive.com/ffmpeg-devel@ffmpeg.org/msg169515.html>
  - Active downstream: <https://code.ffmpeg.org/Kwiboo/FFmpeg/> (branch `v4l2-request-n8.1`)
- **GStreamer v4l2codecs**: `gst-plugins-bad/sys/v4l2codecs/`. `gstv4l2decoder.c`
  has the canonical multiplanar S_FMT / REQBUFS / EXPBUF code on the exact
  Rockchip drivers we target. `gstv4l2codecsh264dec.c` shows the request-API
  controls submission.
  - <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/tree/main/subprojects/gst-plugins-bad/sys/v4l2codecs>
- **Chromium**: `media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc}`
  and `v4l2_queue.cc`. ChromeOS-mature multiplanar code; higher abstraction than
  we need but useful for surface lifecycle / request-fd tracking patterns.
  - <https://chromium.googlesource.com/chromium/src/+/refs/heads/main/media/gpu/v4l2/>

## Test fixtures

- **ohm**: RK3566 PineTab2, kernel `6.19.10-danctnix1-1-pinetab2`. The Hantro
  decoder exposes S264 / MG2S / VP8F formats on `/dev/video1` (multiplanar).
  This is the primary dev target. Brave on ohm is the integration test
  endpoint; `LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 vainfo`
  is the unit test.
- **Test clip**: `/moviedata/fourier-test/bbb_1080p30_h264.mp4` on doppler
  (SHA-16 `dcf8a7170fbd49bb`, 1920×1080 H.264, 24 fps source despite the
  name). Pull via hertz `lxc file pull`.
- **Reference path that already works on the same hardware**: GStreamer
  `gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec ! waylandsink`
  runs at 6% CPU with zero drops. That's the ceiling for what we're
  trying to match through the libva path.

## Out of scope (for the first port milestone)

- HEVC: kernel CIDs renamed, RK3566 has no HW HEVC. Deferred until RK3588
  silicon is on the bench AND a separate HEVC-revival pass is scheduled.
- VP9, VP8, AV1: no HW path or outside bootlin's original codec set.
- Userspace bitstream parsing: with the V4L2 stateless API the kernel does not
  parse headers, but the VAAPI client already does and hands the parsed
  parameters to the driver; this library only forwards them as controls.
  No need to touch.
- HEVC RFC (reference frame compression): Rockchip-specific, and the kernel
  config has it disabled (`CONFIG_VIDEO_HANTRO_HEVC_RFC=n` on ohm).

## Build + install

- Build container: `fermi` (Arch ARM aarch64 LXC on hertz). `meson setup`
  and `ninja` straight off the source tree, no makepkg dance needed for
  development iteration.
- Install path: `/usr/lib/dri/v4l2_request_drv_video.so`.
- Activate: `LIBVA_DRIVER_NAME=v4l2_request` plus the path env vars
  `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` and
  `LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0`.
- Once the port works: package as `marfrit/libva-v4l2-request-fourier`
  next to `ffmpeg-v4l2-request-git`, with the same
  `provides=(libva-v4l2-request-git)` shape.

## Ack

Bootlin authored the original library under MIT/LGPL2.1; this fork adds
GPL-2.0-licensed shim files (HEVC strip, multiplanar plumbing) and is meant
to track upstream if upstream ever picks the work back up.