forked from marfrit/libva-v4l2-request-fourier
README: candid 'standoff' framing for Pi 5 HEVC + RK matrix
Replace the original 2018 Bootlin upstream README with the fourier-fork
situation as of May 2026.

What works: fresnel 5/5, ampere iter1+2, ohm baseline (all RK family,
mainline VDPU381/383 landing Feb 2026 helps).
What doesn't: Pi 5 HEVC via this backend.

New 'The Pi 5 standoff' section captures the honest situation surfaced
by the May 2026 web-research pass:

- Kwiboo's ffmpeg-v4l2request hwaccel: 8 years un-merged upstream
- libva-v4l2-request: no commits since ~2021
- rpi-hevc-dec mainline: 17 months in review, still not merged; Pi
  6.18.x downstream has active HEVC regressions (#7228, #7306)
- Mozilla bug 1969297 picks the ffmpeg-hwaccel-context path, not libva:
  explicit ack that strict drivers need libavcodec's internal SPS
  context
- Frames the issue as ecosystem coordination failure (principal-agent
  stalemate), not architectural impossibility

Notes that iter40 + iter40b lands but parks: backend infra is sound +
reusable for any future strict V4L2 stateless target ffmpeg ships
before libva does, but the user-facing Pi 5 HEVC story will not come
from this backend; it'll come from Mozilla / Kwiboo / upstream
coordination unblocking.

iter38 5/5 fresnel + 9-profile ampere baselines preserved post-iter40b,
documented as no-regression in phase7_pi5_hevc_close.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -1,75 +1,150 @@
# v4l2-request libVA Backend
# libva-v4l2-request-fourier

## About
VA-API ICD backend for V4L2 stateless video decoders. Fourier-campaign
fork of the dormant `bootlin/libva-v4l2-request` upstream.

This libVA backend is designed to work with the Linux Video4Linux2
Request API that is used by a number of video codec drivers,
including the Video Engine found in most Allwinner SoCs.
## What works

## Status
| SoC / host | Codecs verified bit-exact vs `kdirect` |
|---|---|
| RK3399 (fresnel, Pinebook Pro) | H.264, HEVC Main, VP9 Profile 0, VP8, MPEG-2: 5/5 at iter38 |
| RK3588 (ampere) | H.264 (iter1 ampere-fourier); HEVC EXT_SPS structure clean (iter2); other codecs in progress |
| RK3568 / RK3566 (ohm, PineTab2) | iter1-5 baseline (libva-multiplanar campaign) |

The v4l2-request libVA backend currently supports the following formats:
* MPEG2 (Simple and Main profiles)
* H264 (Baseline, Main and High profiles)
* H265 (Main profile)
`kdirect` = `ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime ...`
through Kwiboo's downstream ffmpeg patches. The Rockchip family has the
benefit of years of `rkvdec` + `hantro-vpu` iteration in mainline + the
RK3588/RK3576 video decoder series **landing in mainline February 2026**.

## What does NOT work, and why it's stalled

| Target | Status | Blocker |
|---|---|---|
| H264 Hi10P on RK3399 | enumerated, decode returns all-zero | RK3399 silicon doesn't implement 10-bit despite kernel advertising the profile (iter39 close, Option B applied) |
| HEVC Main10 on RK3399 | not enumerated | same as Hi10P |
| **Pi 5 / CM5 (BCM2712 / `rpi-hevc-dec`)** | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved | see "The Pi 5 standoff" below |

### The Pi 5 standoff

iter40 + iter40b add a third multi-device-probe slot for
`rpi-hevc-dec`, an NC12 SAND128 detile primitive, per-driver gates
around the SPS pre-seed + start-code-prepend + scaling_matrix submission,
and a (fragile, fixture-specific) SPS field override using the
GStreamer 1.28.2 H.265 parser. ICD discovery works, `vainfo` lists
`VAProfileHEVCMain`, S\_FMT / REQBUFS / STREAMON all succeed.

**Decode itself never succeeds**: every CAPTURE DQBUF returns
`V4L2_BUF_FLAG_ERROR`. Driver author John Cox confirmed strict SPS
validation is intentional ("`try_ext_ctrls returned an error (22)` is
expected as it is validating the SPS"), and VAAPI's
`VAPictureParameterBufferHEVC` simply doesn't carry the bitstream-true
scalars (`sps_max_num_reorder_pics`, `sps_max_latency_increase_plus1`,
slice-level `num_entry_point_offsets`) that the driver wants. We can't
fish the SPS out of `source_data` either, because ffmpeg-vaapi parses
the SPS itself and passes only slice NAL bytes to libva backends.

This is not a bug in our backend, in libva, in ffmpeg, or in the kernel
driver. It's an ecosystem coordination failure of long standing:

- **Kwiboo's `ffmpeg-v4l2request` hwaccel** has been in production via
  LibreELEC since December 2018. Re-submitted to ffmpeg-devel as a v2
  series in August 2024. Still un-merged in May 2026: **eight years
  in the upstream review queue**.
- **`libva-v4l2-request`** (this project's upstream) hasn't taken
  meaningful commits since ~2021. Nobody wants to own the impedance
  mismatch between VAAPI's Intel-shaped "give me raw bitstream, I'll
  parse" and V4L2 stateless's kernel-shaped "give me parsed structs,
  I'll just drive the HW."
- **`rpi-hevc-dec` mainline submission** is at v4 (July 2025), 17
  months in review. The Pi 6.18.x downstream kernel meanwhile has
  active HEVC regressions ([raspberrypi/linux#7228](https://github.com/raspberrypi/linux/issues/7228),
  [#7306](https://github.com/raspberrypi/linux/issues/7306)) that
  aren't being fast-tracked because "the new uAPI is coming."
- **Mozilla is implementing Pi 5 HEVC via ffmpeg's hwaccel-context
  path** (bug [1969297](https://bugzilla.mozilla.org/show_bug.cgi?id=1969297)),
  not via libva, with explicit acknowledgement from David Turner that
  libavcodec needs to retain the SPS context for the strict driver to
  accept the control batch.

What end-users actually do today: run Pi OS (downstream-patched ffmpeg +
downstream kernel) or LibreELEC (Kwiboo's patches + downstream
kernel). Anyone on a stock distro outside those two: no HW HEVC on
Pi 5.

Nobody who has authority to merge has skin in the game. Everyone with
skin in the game lacks authority. Result: 8-year stalemate, three
forks of working code, no merged upstream.

### What this means for this backend

We chose to extend `libva-v4l2-request` into Pi 5 territory because
the architecture maps cleanly onto the existing iter38 multi-device
probe. That work landed (iter40 commit `3ffa9d0`, iter40b commit
`071b08d`). It's reusable infrastructure for any future strict V4L2
stateless decoder that ffmpeg ships before libva does.

But the *user-facing* Pi 5 HEVC story will not come from this
backend. The backend was a clean architectural target inside a
coordination dead-end. The actual Pi 5 HEVC path through libva
requires one of:

- a VAAPI extension exposing the SPS scalars rpi-hevc-dec validates
  against (Intel-driven; no Pi-aligned principal),
- a libva-internal `VABufferType` for raw SPS/PPS NAL bytes (no
  maintainer),
- ffmpeg-vaapi forwarding `num_entry_point_offsets` to backends
  (small upstream patch; no champion), OR
- the political situation around Kwiboo's series unblocking (no
  visible movement).

iter40 + iter40b are **landed but parked**. The fresnel + ampere
sibling paths are unaffected (5/5 fresnel + 9 profiles ampere
verified post-iter40b, no regression). Phase 8 packaging is
deliberately skipped: shipping a `.deb` whose primary advertised
target (Pi 5) doesn't actually decode would mislead users.

See `phase0_pi5_hevc.md`, `phase1_pi5_hevc.md`,
`phase5_pi5_hevc_review.md`, `phase7_pi5_hevc_close.md` for the
chapter's full empirical record.

## Instructions

In order to use this libVA backend, the `v4l2_request` driver has to
be specified through the `LIBVA_DRIVER_NAME` environment variable, as
such:
In order to use this backend, set the `LIBVA_DRIVER_NAME` environment
variable:

    export LIBVA_DRIVER_NAME=v4l2_request

A media player that supports VAAPI (such as VLC) can then be used to decode a
video in a supported format:
Then a VA-API-capable player can decode supported codecs on a probed
device:

    vlc path/to/video.mpg
    vlc path/to/video.mp4
    mpv --hwdec=vaapi path/to/video.mp4
    ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -f null -

Sample media files can be obtained from:

    http://samplemedia.linaro.org/MPEG2/
    http://samplemedia.linaro.org/MPEG4/SVT/
The backend auto-detects available decoders via the V4L2 media
topology walk and honors `LIBVA_V4L2_REQUEST_VIDEO_PATH` and
`LIBVA_V4L2_REQUEST_MEDIA_PATH` for explicit device selection.

## Technical Notes

### Surface
### Multi-device probe (iter38)

A Surface is an internal data structure never handled by the VA's user
containing the output of a rendering. Usually, a bunch of surfaces are created
at the beginning of decoding and they are then used alternately. When
created, a surface is assigned a corresponding v4l capture buffer and it is
kept until the end of decoding. Syncing a surface waits for the v4l buffer to
be available and then dequeues it.
A single libva session opens both `rkvdec` and `hantro-vpu` (and, on
hosts where it's present, `rpi-hevc-dec`) at init. `RequestCreateConfig`
re-targets the active fd per profile via
`request_switch_device_for_profile()`. Pool teardown happens at
switch time; the next `CreateContext` rebuilds against the right
device.
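
The per-profile switch described above can be sketched as follows. This is an illustration under stated assumptions, not the backend's actual code: `struct probed_device`, `struct session`, `enum codec` and `switch_device_for_codec` are hypothetical names, and the real `request_switch_device_for_profile()` keys on VA profiles, not on this simplified codec enum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical codec families standing in for VAProfile values. */
enum codec { CODEC_MPEG2, CODEC_H264, CODEC_HEVC, CODEC_VP8, CODEC_VP9 };

/* One probed decoder: its open fd and the codecs it claims to handle. */
struct probed_device {
    const char *driver;       /* e.g. "rkvdec" or "hantro-vpu" */
    int video_fd;             /* fd opened at init, -1 if the device is absent */
    unsigned int codec_mask;  /* bit i set => enum codec value i is supported */
};

struct session {
    struct probed_device devices[3];
    size_t num_devices;
    int active;               /* index into devices, -1 before the first config */
    int pool_valid;           /* nonzero while a CAPTURE pool exists */
};

/*
 * Select the first present device that supports the codec. If that changes
 * the active device, invalidate the pool so the next context rebuilds it.
 * Returns the chosen index, or -1 if no probed device supports the codec.
 */
static int switch_device_for_codec(struct session *s, enum codec c)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->num_devices; i++) {
        const struct probed_device *d = &s->devices[i];
        if (d->video_fd < 0 || !(d->codec_mask & (1u << c)))
            continue;
        if (s->active != (int)i) {
            s->pool_valid = 0;  /* teardown happens at switch time */
            s->active = (int)i;
        }
        return (int)i;
    }
    return -1;
}
```

Keeping every fd open from init means a profile change only costs a pool rebuild, not a fresh media-topology walk.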

Note: since a Surface is kept private from the VA's user, it can ask to
directly render a Surface on screen in an X Drawable. Some kind of
implementation is available in PutSurface but this is only for development
purposes.
### Surface / Context / Picture / Image

### Context
A Surface is an internal data structure containing rendering output.
A Context owns the V4L2 lifecycle (S\_FMT, CAPTURE pool, ctrl-batch
defaults) for one decode session. A Picture is one encoded input
frame's set of buffers. An Image is a standard VA pixel-format view
on a decoded Surface: the backend detiles SAND/COL128 or unpacks
NV15 to NV12/P010 here so consumers see linear pitches.
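
The detile step can be illustrated for a single 8-bit plane. This is a minimal sketch, not the backend's primitive. The layout is an assumption read off the "SAND128 / COL128" name: vertical columns 128 bytes wide, rows stored back to back inside a column, and a fixed byte stride between columns. Real frames may keep luma and chroma rows in the same column, which is not modeled here. `detile_plane` and `COLUMN_WIDTH` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COLUMN_WIDTH 128u  /* bytes per column row; assumed from the format name */

/*
 * Copy one 8-bit plane from a column-tiled layout into a linear one.
 * Assumed layout: the plane is cut into vertical columns COLUMN_WIDTH bytes
 * wide; each column stores its rows back to back, and consecutive columns
 * start column_stride bytes apart.
 */
static void detile_plane(uint8_t *dst, size_t dst_pitch,
                         const uint8_t *src, size_t column_stride,
                         size_t width_bytes, size_t height)
{
    assert(dst != NULL && src != NULL);
    assert(dst_pitch >= width_bytes);
    assert(column_stride >= (size_t)COLUMN_WIDTH * height);

    for (size_t x = 0; x < width_bytes; x += COLUMN_WIDTH) {
        /* The last column may be only partially covered by the image. */
        size_t run = width_bytes - x;
        if (run > COLUMN_WIDTH)
            run = COLUMN_WIDTH;

        const uint8_t *column = src + (x / COLUMN_WIDTH) * column_stride;
        for (size_t y = 0; y < height; y++)
            memcpy(dst + y * dst_pitch + x, column + y * COLUMN_WIDTH, run);
    }
}
```

Copying whole 128-byte runs keeps the inner loop a plain `memcpy`, so the cost is dominated by memory bandwidth instead of per-pixel address arithmetic.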

A Context is a global data structure used for rendering a video of a certain
format. When a context is created, input buffers are created and v4l's output
(which is the compressed data input queue, since capture is the real output)
format is set.

### Picture

A Picture is an encoded input frame made of several buffers. A single input
can contain slice data, headers and IQ matrix. Each Picture is assigned a
request ID when created and each corresponding buffer might be turned into a
v4l buffer or extended control when rendered. Finally they are submitted to
kernel space when reaching EndPicture.

The real rendering is done in EndPicture instead of RenderPicture
because the v4l2 driver expects to have the full corresponding
extended control when a buffer is queued and we don't know in which
order the different RenderPicture calls will be made.

### Image

An Image is a standard data structure containing rendered frames in a usable
pixel format. Here we only use NV12 buffers which are converted from sunxi's
proprietary tiled pixel format with tiled_yuv when deriving an Image from a
Surface.
The real rendering is in `EndPicture`, not `RenderPicture`, because
the kernel needs the full extended-control batch when the OUTPUT
buffer is queued, and `RenderPicture` order is consumer-defined.
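
The deferral can be sketched as a picture object that only records buffers during rendering and submits once the set is complete. All names here (`struct picture`, `enum buffer_kind`, `render_picture`, `end_picture`) are hypothetical, and the kernel submission is reduced to a comment. It is a sketch of the ordering argument, not the backend's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer kinds a consumer may hand over, in any order. */
enum buffer_kind {
    BUF_PICTURE_PARAMS,
    BUF_SLICE_PARAMS,
    BUF_SLICE_DATA,
    BUF_KIND_COUNT
};

struct picture {
    const void *pending[BUF_KIND_COUNT];  /* stashed until end_picture() */
    size_t size[BUF_KIND_COUNT];
    bool submitted;
};

/* RenderPicture analogue: only records the buffer, touches no kernel state. */
static void render_picture(struct picture *p, enum buffer_kind kind,
                           const void *data, size_t size)
{
    assert(p != NULL && kind < BUF_KIND_COUNT);
    p->pending[kind] = data;
    p->size[kind] = size;
}

/*
 * EndPicture analogue: the full set is known only here, so this is where
 * the controls would be attached to the request and the OUTPUT buffer queued.
 * Returns false when a required buffer is still missing.
 */
static bool end_picture(struct picture *p)
{
    assert(p != NULL);
    for (int k = 0; k < BUF_KIND_COUNT; k++)
        if (p->pending[k] == NULL)
            return false;

    /* Real code would issue VIDIOC_S_EXT_CTRLS, VIDIOC_QBUF and
     * MEDIA_REQUEST_IOC_QUEUE here; omitted in this sketch. */
    p->submitted = true;
    return true;
}
```

Because `render_picture` never touches the device, the consumer can deliver slice data before or after the parameter buffers without changing what reaches the kernel.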