STUDY.md: replace with pointer to libva-multiplanar campaign Phase 0
The Phase 0 / Phase 2 substrate that lived here has been transformed
into ../phase0_findings.md as the campaign-level Phase 0 document.
This file is reduced to a pointer + a git-show recipe to recover the
prior content from commit e0acc33.
@@ -1,233 +1,9 @@
# STUDY.md → moved

The Phase 0 / Phase 2 substrate that previously lived here has been transformed into the campaign-level Phase 0 document at:

- [`../phase0_findings.md`](../phase0_findings.md)

That document also points at the remaining open questions for Phase 1 lock and the verification gate at Phase 7. Read it together with the campaign README at [`../README.md`](../README.md).

# libva-v4l2-request — Fourier port study

## Goal

Make this libva backend usable on **multiplanar** V4L2 stateless decoders:
specifically the Rockchip Hantro VPU (RK3566 ohm) and the upcoming RK3588
hantro/VDPU381 path. End deliverable: any VAAPI client (Brave, Firefox via
ffmpeg-vaapi, mpv `--hwdec=vaapi`, vlc, ...) gets HW decode for H.264 + MPEG-2
on the Fourier fleet without going through GStreamer.

## Why this fork exists

Bootlin upstream <https://github.com/bootlin/libva-v4l2-request> went dormant
around 2021 and was written for **single-plane** sunxi-cedrus decoders.
Collabora's strategic replacement is `cros-codecs` (Rust) — it bypasses libva
entirely, targets Chromium/Firefox direct integration, and **is not shipping
soon**. That leaves a hole for VAAPI clients on Rockchip. None of the public
forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) shipped multiplanar.

Reference: Mozilla bug 1833354 / 1965646 explicitly notes "Rockchip uses
v4l2-request, not v4l2-m2m" — Firefox HW decode on RK3566/RK3588 needs exactly
a working libva-v4l2-request to bridge.

## State today

Bootlin tip is `a3c2476`. Stack of WIP commits on top:

### Build cleanly against current kernel UAPI

1. `V4L2_PIX_FMT_H264_SLICE_RAW` → `V4L2_PIX_FMT_H264_SLICE` rename.
2. `src/h264.c`: missing `#include "utils.h"` for `request_log()`.
3. HEVC stripped — `h265.c`/`h265.h` excluded from `meson.build`,
   `hevc-ctrls.h` replaced by passthrough to `<linux/v4l2-controls.h>`,
   four HEVC case blocks removed from `picture.c`.
4. `include/h264-ctrls.h` made into a passthrough shim to
   `<linux/v4l2-controls.h>` plus `V4L2_CID_MPEG_VIDEO_H264_* →
   V4L2_CID_STATELESS_H264_*` aliases (kernel renamed during upstreaming).
5. `src/h264.c` shape updates to track the upstreamed `struct
   v4l2_ctrl_h264_slice_params`: drop `.size`, use `struct
   v4l2_h264_reference {fields, index}` for `ref_pic_list*`, move
   `pred_weight_table` out to its own `V4L2_CID_STATELESS_H264_PRED_WEIGHTS`
   control, drop `decode_params.num_slices` (kernel infers from queued
   controls).
6. `src/tiled_yuv.S`: aarch64 stub of `tiled_to_planar` — the ARMv7 NEON
   body is `#ifndef __aarch64__`-guarded; without a stub the `.so` had
   an undefined symbol and dlopen failed.

Library now builds clean (~265 KB `.so`) and `vainfo` enumerates H.264 +
MPEG-2 profiles.

### Probe + control flow fixes

7. `src/video.c`: add NV12 multi-plane format entry; `video_format_find()`
   takes `bool mplane` so single- and multi-plane NV12 entries don't
   collide on pixelformat.
8. `src/surface.c`: probe block tries `V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE`
   as a fallback after single-plane probes fail.
9. **Eager probe in `RequestInit`** — Chromium's vaapi_video_decoder calls
   `vaCreateContext` *before* `vaCreateSurfaces2`, so the original lazy
   "set `driver_data->video_format` in `RequestCreateSurfaces`" path was
   too late. Promoted the probe into `video_format_probe()` in
   `video.c` and call it at init time.
10. `src/context.c`: same `V4L2_PIX_FMT_H264_SLICE_RAW → _SLICE` rename
    that was missed in the first pass.
11. **WIP**: defer `STREAMON` calls in `RequestCreateContext`. The V4L2
    stateless protocol on hantro requires OUTPUT format → SPS controls →
    first slice queued → THEN `STREAMON`. Deferring lets `vaCreateContext`
    succeed; proper sequencing is the next phase.

### Diagnostic logging (will revert before final)

12. `src/utils.c`: `request_log()` tees to `/tmp/libva-fourier.log` so
    sandboxed Chromium GPU processes don't swallow the trace output.

### Failure mode reached now (2026-04-26)

`vainfo` works fully. **mpv `--hwdec=vaapi` probes our driver end-to-end
successfully** (seven profiles enumerated, `RequestQueryImageFormats /
QueryConfigEntrypoints / CreateConfig / QuerySurfaceAttributes /
CreateSurfaces2 / DeriveImage / CreateImage / CreateBuffer /
ExportSurfaceHandle` all run clean) — then falls back to libavcodec's
software H.264 decoder for the actual decode. mpv's drop pattern matches
the SW baseline (≈16 drops/s through KWin gpu-next), so the SW path is
in use, not ours.

Brave's failure is **not** in our driver. The verbose log shows:

```
ApplyResolutionChangeWithScreenSizes()
PickDecoderOutputFormat(): Initializing ImageProcessor; max buffers: 16
ERROR: failed Initialize()ing the frame pool
```

`PickDecoderOutputFormat` from `media/gpu/chromeos/video_decoder_pipeline.cc`
is the **ChromeOS** pipeline trying to init a V4L2 ImageProcessor (a
ChromeOS-specific concept — separate V4L2 m2m chip block for color
conversion / scaling). Brave-on-Linux runs this code path because the
build doesn't gate it on `is_chromeos`, but on a plain Linux Wayland
system there's nothing for the ImageProcessor to bind to and it bails
before any libva call lands in our driver. **No code change in
libva-v4l2-request-fourier will fix Brave**; the lever is on the
Chromium / Brave build side (skip the chromeos pipeline, or supply the
expected V4L2 image processor device).

### What still needs to happen for actual decode

The libva entry-point surface is largely done (probing works); the
remaining gap is the **decode submission path**:

- `RequestBeginPicture` / `RequestRenderPicture` / `RequestEndPicture`
  need to map to V4L2 stateless: queue OUTPUT buffer with the encoded
  slice + the SPS/PPS/SLICE_PARAMS/PRED_WEIGHTS/DECODE_PARAMS/
  SCALING_MATRIX controls via the request fd, then trigger decode and
  dequeue a CAPTURE buffer.
- The `STREAMON` ordering needs proper sequencing on hantro: the
  current WIP defer in `RequestCreateContext` (commit `44a7327`) bypasses
  the EINVAL but doesn't actually enable the queue. The real fix is to
  set both queue formats up front, queue the first buffer with controls
  attached, then `STREAMON` both queues.
- Source-change-event (`V4L2_EVENT_SOURCE_CHANGE`) handling is probably
  needed for resolution-change streams; not strictly required for the
  fixed-resolution Big Buck Bunny clip we test with.

Once that lands, `mpv --hwdec=vaapi` should switch from "probe then SW
decode" to "actual HW decode through our path", and the user-facing
recipe matches what FFmpeg `-hwaccel v4l2request -hwaccel_output_format
drm_prime` already delivers (14 % CPU realtime).

Brave / Chromium specifically remains parked behind the chromeos
pipeline issue, independent of this library's progress.

## Port plan

The seam to flip is the entire kernel-userspace V4L2 boundary. The work is
mostly mechanical and concentrated in three files:

### `src/v4l2.c` — helpers (the bottleneck; all other files call into this)

Add a `v4l2_type_is_mplane()` predicate (one already exists upstream — keep it)
and dual paths through:

- `v4l2_set_format()` — populate either `format.fmt.pix` or `format.fmt.pix_mp`,
  including `plane_fmt[0].sizeimage` for OUTPUT and `num_planes` defaulting to 1
  for raw/NV12 capture.
- `v4l2_create_buffers()` / `v4l2_request_buffers()` — set
  `v4l2_create_buffers.format.type` to the MPLANE variant when source is mplane.
- `v4l2_query_buffer()` / `v4l2_export_buffer()` — switch on type, use
  `v4l2_buffer.m.planes` array (length `num_planes`) instead of `m.userptr` /
  `m.fd`. EXPBUF now needs `plane=0` parameter.
- `v4l2_queue_buffer()` / `v4l2_dequeue_buffer()` — same `m.planes[]` switch.
  The OUTPUT side passes the bitstream slice as `m.planes[0].bytesused`.

Reference: `libavcodec/v4l2_buffers.c` and `libavcodec/v4l2_context.c` in
FFmpeg already do this branching cleanly — it's the closest API match. Crib
`V4L2_TYPE_IS_MULTIPLANAR()` style switching there. GStreamer's
`gstv4l2decoder.c` is the second reference; it covers the request-API +
mplane path explicitly for the same Rockchip hardware we target.

### `src/context.c` — context creation

`RequestCreateContext` calls into `v4l2_set_format()` for the OUTPUT and
CAPTURE queues. Detect the queue capability at context creation (cache the
mplane bit on the context object) and pick the right type for every
subsequent helper call.

### `src/picture.c` — frame submission

The QBUF / DQBUF / EXPBUF paths in `RequestEndPicture()` and friends. Same
pattern — switch on the cached mplane bit and use the multiplanar variants
of the `v4l2.c` helpers. The slice-data submission (`m.planes[0].bytesused`)
is the load-bearing change here.

## Reference implementations (read these side-by-side with our diff)

- **FFmpeg** — `libavcodec/v4l2_request.c`, `v4l2_request_buffer.c`, per-codec
  files like `v4l2_request_h264.c`. Already multiplanar, already works on
  hantro/rkvdec — this is the closest-API canonical example.
  - <https://github.com/FFmpeg/FFmpeg/tree/master/libavcodec>
  - 2024-08 v2 patchset: <https://www.mail-archive.com/ffmpeg-devel@ffmpeg.org/msg169515.html>
  - Active downstream: <https://code.ffmpeg.org/Kwiboo/FFmpeg/> (branch `v4l2-request-n8.1`)
- **GStreamer v4l2codecs** — `gst-plugins-bad/sys/v4l2codecs/`. `gstv4l2decoder.c`
  has the canonical multiplanar S_FMT / REQBUFS / EXPBUF code on the exact
  Rockchip drivers we target. `gstv4l2codecsh264dec.c` shows the request-API
  controls submission.
  - <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/tree/main/subprojects/gst-plugins-bad/sys/v4l2codecs>
- **Chromium** — `media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc}` +
  `v4l2_queue.cc`. ChromeOS-mature multiplanar code; higher abstraction than
  we need but useful for surface lifecycle / request-fd tracking patterns.
  - <https://chromium.googlesource.com/chromium/src/+/refs/heads/main/media/gpu/v4l2/>

## Test fixtures

- **ohm** — RK3566 PineTab2, kernel `6.19.10-danctnix1-1-pinetab2`. Hantro
  decoder exposes S264 / MG2S / VP8F formats on `/dev/video1` (multiplanar).
  This is the primary dev target. Brave on ohm is the integration test
  endpoint; `LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 vainfo`
  is the unit test.
- **Test clip**: `/moviedata/fourier-test/bbb_1080p30_h264.mp4` on doppler
  (SHA-16 `dcf8a7170fbd49bb`, 1920×1080 H.264, 24 fps source despite the
  name). Pull via hertz `lxc file pull`.
- **Reference path that already works on the same hardware**: GStreamer
  `gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec !
  waylandsink` — 6 % CPU, zero drops. That's the ceiling for what we're
  trying to match through the libva path.

## Out of scope (for the first port milestone)

- HEVC — kernel CIDs renamed, RK3566 has no HW HEVC. Deferred until RK3588
  silicon is on the bench AND a separate HEVC-revival pass.
- VP9, VP8, AV1 — no HW path or out of bootlin's original codec set.
- Userspace bitstream parsing — kernel V4L2 stateless API does the parsing;
  this library only forwards parameters. No need to touch.
- HEVC RFC (reference frame compression) — Rockchip-specific, kernel
  config has it disabled (`CONFIG_VIDEO_HANTRO_HEVC_RFC=n` on ohm).

## Build + install

- Build container: `fermi` (Arch ARM aarch64 LXC on hertz). `meson setup` +
  `ninja` straight off the source tree, no makepkg dance needed for
  development iteration.
- Install path: `/usr/lib/dri/v4l2_request_drv_video.so`.
- Activate: `LIBVA_DRIVER_NAME=v4l2_request` plus the path env vars
  `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` and
  `LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0`.
- Once the port works: package as `marfrit/libva-v4l2-request-fourier`
  next to `ffmpeg-v4l2-request-git`, with the same
  `provides=(libva-v4l2-request-git)` shape.

## Ack

Bootlin authored the original library under MIT/LGPL2.1; this fork adds
GPL-2.0-licensed shim files (HEVC strip, multiplanar plumbing) and is meant
to track upstream if upstream ever picks the work back up.

The git commit that this file points back to (the last commit while STUDY.md still held the substrate content) is `e0acc33` — `git show e0acc33:STUDY.md` recovers the historical content if needed.