README: candid 'standoff' framing for Pi 5 HEVC + RK matrix

Replace the original 2018 Bootlin upstream README with the
fourier-fork situation as of May 2026. What works: fresnel 5/5,
ampere iter1+2, ohm baseline (all RK family, mainline VDPU381/383
landing Feb 2026 helps).

What doesn't: Pi 5 HEVC via this backend. New 'The Pi 5 standoff'
section captures the honest situation surfaced by the May 2026
web-research pass:

- Kwiboo's ffmpeg-v4l2request hwaccel: 8 years un-merged upstream
- libva-v4l2-request: no commits since ~2021
- rpi-hevc-dec mainline: 17 months in review, still not merged;
  Pi 6.18.x downstream has active HEVC regressions (#7228, #7306)
- Mozilla bug 1969297 picks the ffmpeg-hwaccel-context path, not
  libva — explicit ack that strict drivers need libavcodec's
  internal SPS context
- Frames the issue as ecosystem coordination failure (principal-
  agent stalemate), not architectural impossibility

Notes that iter40 + iter40b lands but parks: backend infra is
sound + reusable for any future strict V4L2 stateless target ffmpeg
ships before libva does, but the user-facing Pi 5 HEVC story will
not come from this backend — it'll come from Mozilla / Kwiboo /
upstream coordination unblocking.

iter38 5/5 fresnel + 9-profile ampere baselines preserved
post-iter40b — documented as no-regression in phase7_pi5_hevc_close.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-17 19:58:52 +00:00
parent 071b08dcc2
commit 941fbc5b1b
+131 -56
View File
@@ -1,75 +1,150 @@
# v4l2-request libVA Backend
# libva-v4l2-request-fourier
## About
VA-API ICD backend for V4L2 stateless video decoders. Fourier-campaign
fork of the dormant `bootlin/libva-v4l2-request` upstream.
This libVA backend is designed to work with the Linux Video4Linux2
Request API that is used by a number of video codecs drivers,
including the Video Engine found in most Allwinner SoCs.
## What works
## Status
| SoC / host | Codecs verified bit-exact vs `kdirect` |
|---|---|
| RK3399 (fresnel — Pinebook Pro) | H.264, HEVC Main, VP9 Profile 0, VP8, MPEG-2 — 5/5 at iter38 |
| RK3588 (ampere) | H.264 (iter1 ampere-fourier); HEVC EXT_SPS structure clean (iter2); other codecs in progress |
| RK3568 / RK3566 (ohm — PineTab2) | iter1-5 baseline (libva-multiplanar campaign) |
The v4l2-request libVA backend currently supports the following formats:
* MPEG2 (Simple and Main profiles)
* H264 (Baseline, Main and High profiles)
* H265 (Main profile)
`kdirect` = `ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime ...`
through Kwiboo's downstream ffmpeg patches. The Rockchip family has the
benefit of years of `rkvdec` + `hantro-vpu` iteration in mainline + the
RK3588/RK3576 video decoder series **landing in mainline February 2026**.
## What does NOT work, and why it's stalled
| Target | Status | Blocker |
|---|---|---|
| H264 Hi10P on RK3399 | enumerated, decode returns all-zero | RK3399 silicon doesn't implement 10-bit despite kernel advertising the profile (iter39 close, Option B applied) |
| HEVC Main10 on RK3399 | not enumerated | same as Hi10P |
| **Pi 5 / CM5 (BCM2712 / `rpi-hevc-dec`)** | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved | see "The Pi 5 standoff" below |
### The Pi 5 standoff
iter40 + iter40b add a third multi-device-probe slot for
`rpi-hevc-dec`, an NC12 SAND128 detile primitive, per-driver gates
around the SPS pre-seed + start-code-prepend + scaling_matrix submission,
and a (fragile, fixture-specific) SPS field override using the
GStreamer 1.28.2 H.265 parser. ICD discovery works, `vainfo` lists
`VAProfileHEVCMain`, S\_FMT / REQBUFS / STREAMON all succeed.
**Decode itself never succeeds** — every CAPTURE DQBUF returns
`V4L2_BUF_FLAG_ERROR`. Driver author John Cox confirmed strict SPS
validation is intentional ("`try_ext_ctrls returned an error (22)` is
expected as it is validating the SPS"), and VAAPI's
`VAPictureParameterBufferHEVC` simply doesn't carry the bitstream-true
scalars (`sps_max_num_reorder_pics`, `sps_max_latency_increase_plus1`,
slice-level `num_entry_point_offsets`) that the driver wants. We can't
fish the SPS out of `source_data` either, because ffmpeg-vaapi parses
the SPS itself and passes only slice NAL bytes to libva backends.
This is not a bug in our backend, in libva, in ffmpeg, or in the kernel
driver. It's an ecosystem coordination failure of long standing:
- **Kwiboo's `ffmpeg-v4l2request` hwaccel** has been in production via
LibreELEC since December 2018. Re-submitted to ffmpeg-devel as a v2
series in August 2024. Still un-merged in May 2026 — **eight years
in the upstream review queue**.
- **`libva-v4l2-request`** (this project's upstream) hasn't taken
meaningful commits since ~2021. Nobody wants to own the impedance
mismatch between VAAPI's Intel-shaped "give me raw bitstream, I'll
parse" and V4L2 stateless's kernel-shaped "give me parsed structs,
I'll just drive the HW."
- **`rpi-hevc-dec` mainline submission** is at v4 (July 2025), 17
months in review. The Pi 6.18.x downstream kernel meanwhile has
active HEVC regressions ([raspberrypi/linux#7228](https://github.com/raspberrypi/linux/issues/7228),
[#7306](https://github.com/raspberrypi/linux/issues/7306)) that
aren't being fast-tracked because "the new uAPI is coming."
- **Mozilla is implementing Pi 5 HEVC via ffmpeg's hwaccel-context
path** (bug [1969297](https://bugzilla.mozilla.org/show_bug.cgi?id=1969297)),
not via libva — explicit acknowledgement from David Turner that
libavcodec needs to retain the SPS context for the strict driver to
accept the control batch.
What end-users actually do today: run Pi OS (downstream-patched ffmpeg
+ downstream kernel) or LibreELEC (Kwiboo's patches + downstream
kernel). Anyone on a stock distro outside those two: no HW HEVC on
Pi 5.
Nobody who has authority to merge has skin in the game. Everyone with
skin in the game lacks authority. Result: 8-year stalemate, three
forks of working code, no merged upstream.
### What this means for this backend
We chose to extend `libva-v4l2-request` into Pi 5 territory because
the architecture maps cleanly onto the existing iter38 multi-device
probe. That work landed (iter40 commit `3ffa9d0`, iter40b commit
`071b08d`). It's reusable infrastructure for any future strict V4L2
stateless decoder that ffmpeg ships before libva does.
But the *user-facing* Pi 5 HEVC story will not come from this
backend. The backend was a clean architectural target inside a
coordination dead-end. The actual Pi 5 HEVC path through libva
requires either:
- a VAAPI extension exposing the SPS scalars rpi-hevc-dec validates
against (Intel-driven; no Pi-aligned principal),
- a libva-internal `VABufferType` for raw SPS/PPS NAL bytes (no
maintainer),
- ffmpeg-vaapi forwarding `num_entry_point_offsets` to backends
(small upstream patch; no champion), OR
- the political situation around Kwiboo's series unblocks (no
visible movement).
iter40 + iter40b are **landed but parked**. The fresnel + ampere
sibling paths are unaffected (5/5 fresnel + 9 profiles ampere
verified post-iter40b, no regression). Phase 8 packaging is
deliberately skipped — shipping a `.deb` whose primary advertised
target (Pi 5) doesn't actually decode would mislead users.
See `phase0_pi5_hevc.md`, `phase1_pi5_hevc.md`,
`phase5_pi5_hevc_review.md`, `phase7_pi5_hevc_close.md` for the
chapter's full empirical record.
## Instructions
In order to use this libVA backend, the `v4l2_request` driver has to
be specified through the `LIBVA_DRIVER_NAME` environment variable, as
such:
In order to use this backend, set the `LIBVA_DRIVER_NAME` environment
variable:
export LIBVA_DRIVER_NAME=v4l2_request
A media player that supports VAAPI (such as VLC) can then be used to decode a
video in a supported format:
Then a VA-API-capable player can decode supported codecs on a probed
device:
vlc path/to/video.mpg
vlc path/to/video.mp4
mpv --hwdec=vaapi path/to/video.mp4
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -f null -
Sample media files can be obtained from:
http://samplemedia.linaro.org/MPEG2/
http://samplemedia.linaro.org/MPEG4/SVT/
The backend auto-detects available decoders via the V4L2 media
topology walk; honors `LIBVA_V4L2_REQUEST_VIDEO_PATH` and
`LIBVA_V4L2_REQUEST_MEDIA_PATH` for explicit device selection.
## Technical Notes
### Surface
### Multi-device probe (iter38)
A Surface is an internal data structure never handled by the VA's user
containing the output of a rendering. Usualy, a bunch of surfaces are created
at the begining of decoding and they are then used alternatively. When
created, a surface is assigned a corresponding v4l capture buffer and it is
kept until the end of decoding. Syncing a surface waits for the v4l buffer to
be available and then dequeue it.
A single libva session opens both `rkvdec` and `hantro-vpu` (and, on
hosts where it's present, `rpi-hevc-dec`) at init. `RequestCreateConfig`
re-targets the active fd per profile via
`request_switch_device_for_profile()`. Pool teardown happens at
switch time; the next `CreateContext` rebuilds against the right
device.
Note: since a Surface is kept private from the VA's user, it can ask to
directly render a Surface on screen in an X Drawable. Some kind of
implementation is available in PutSurface but this is only for development
purpose.
### Surface / Context / Picture / Image
### Context
A Surface is an internal data structure containing rendering output.
A Context owns the V4L2 lifecycle (S\_FMT, CAPTURE pool, ctrl-batch
defaults) for one decode session. A Picture is one encoded input
frame's set of buffers. An Image is a Standard VA pixel-format view
on a decoded Surface — the backend detiles SAND/COL128 or unpacks
NV15 to NV12/P010 here so consumers see linear pitches.
A Context is a global data structure used for rendering a video of a certain
format. When a context is created, input buffers are created and v4l's output
(which is the compressed data input queue, since capture is the real output)
format is set.
### Picture
A Picture is an encoded input frame made of several buffers. A single input
can contain slice data, headers and IQ matrix. Each Picture is assigned a
request ID when created and each corresponding buffer might be turned into a
v4l buffers or extended control when rendered. Finally they are submitted to
kernel space when reaching EndPicture.
The real rendering is done in EndPicture instead of RenderPicture
because the v4l2 driver expects to have the full corresponding
extended control when a buffer is queued and we don't know in which
order the different RenderPicture will be called.
### Image
An Image is a standard data structure containing rendered frames in a usable
pixel format. Here we only use NV12 buffers which are converted from sunxi's
proprietary tiled pixel format with tiled_yuv when deriving an Image from a
Surface.
The real rendering is in `EndPicture`, not `RenderPicture`, because
the kernel needs the full extended-control batch when the OUTPUT
buffer is queued, and `RenderPicture` order is consumer-defined.