# libva-v4l2-request-fourier

VA-API ICD backend for V4L2 stateless video decoders. Fourier-campaign
fork of the dormant `bootlin/libva-v4l2-request` upstream.

## What works

| SoC / host | Codecs verified bit-exact vs `kdirect` |
|---|---|
| RK3399 (fresnel — Pinebook Pro) | H.264, HEVC Main, VP9 Profile 0, VP8, MPEG-2 — 5/5 at iter38 |
| RK3588 (ampere) | H.264 (iter1 ampere-fourier); HEVC EXT_SPS structure clean (iter2); other codecs in progress |
| RK3568 / RK3566 (ohm — PineTab2) | iter1-5 baseline (libva-multiplanar campaign) |

`kdirect` = `ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime ...`
through Kwiboo's downstream ffmpeg patches. The Rockchip family benefits
from years of `rkvdec` and `hantro-vpu` iteration in mainline, plus the
RK3588/RK3576 video decoder series **landing in mainline in February 2026**.

## What does NOT work, and why it's stalled

| Target | Status | Blocker |
|---|---|---|
| H.264 Hi10P on RK3399 | enumerated, decode returns all-zero | RK3399 silicon doesn't implement 10-bit despite kernel advertising the profile (iter39 close, Option B applied) |
| HEVC Main10 on RK3399 | not enumerated | same as Hi10P |
| **Pi 5 / CM5 (BCM2712 / `rpi-hevc-dec`)** | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved | see "The Pi 5 standoff" below |

### The Pi 5 standoff

iter40 + iter40b add a third multi-device-probe slot for
`rpi-hevc-dec`, an NC12 SAND128 detile primitive, per-driver gates
around the SPS pre-seed + start-code-prepend + scaling_matrix submission,
and a (fragile, fixture-specific) SPS field override using the
GStreamer 1.28.2 H.265 parser. ICD discovery works, `vainfo` lists
`VAProfileHEVCMain`, S\_FMT / REQBUFS / STREAMON all succeed.

**Decode itself never succeeds** — every CAPTURE DQBUF returns
`V4L2_BUF_FLAG_ERROR`. Driver author John Cox confirmed strict SPS
validation is intentional ("`try_ext_ctrls returned an error (22)` is
expected as it is validating the SPS"), and VAAPI's
`VAPictureParameterBufferHEVC` simply doesn't carry the bitstream-true
scalars (`sps_max_num_reorder_pics`, `sps_max_latency_increase_plus1`,
slice-level `num_entry_point_offsets`) that the driver wants. We can't
fish the SPS out of `source_data` either, because ffmpeg-vaapi parses
the SPS itself and passes only slice NAL bytes to libva backends.
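
The gap can be sketched in C. The two structs below are hypothetical, heavily trimmed stand-ins for `VAPictureParameterBufferHEVC` and the kernel's HEVC SPS control, and `fill_sps_from_va()` is an illustrative name, not a function in this backend. The sketch only shows that some SPS fields have a VA-API source while the two SPS scalars named above do not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in: a few fields VA-API does provide. */
struct va_hevc_pic_params {
    uint16_t pic_width_in_luma_samples;
    uint16_t pic_height_in_luma_samples;
    uint8_t  sps_max_dec_pic_buffering_minus1;
};

/* Hypothetical, trimmed stand-in for what the driver validates. */
struct kernel_hevc_sps {
    uint16_t pic_width_in_luma_samples;
    uint16_t pic_height_in_luma_samples;
    uint8_t  sps_max_dec_pic_buffering_minus1;
    uint8_t  sps_max_num_reorder_pics;        /* no VA-API source */
    uint8_t  sps_max_latency_increase_plus1;  /* no VA-API source */
};

enum { MISSING_REORDER = 1 << 0, MISSING_LATENCY = 1 << 1 };

/* Copy what VA-API carries; return a mask of fields left as placeholders. */
unsigned fill_sps_from_va(struct kernel_hevc_sps *sps,
                          const struct va_hevc_pic_params *va)
{
    assert(sps != NULL && va != NULL);

    sps->pic_width_in_luma_samples = va->pic_width_in_luma_samples;
    sps->pic_height_in_luma_samples = va->pic_height_in_luma_samples;
    sps->sps_max_dec_pic_buffering_minus1 = va->sps_max_dec_pic_buffering_minus1;

    /* Placeholders only: these need not match the real bitstream values. */
    sps->sps_max_num_reorder_pics = 0;
    sps->sps_max_latency_increase_plus1 = 0;

    return MISSING_REORDER | MISSING_LATENCY;
}
```

A non-zero return value marks an SPS that was partly guessed, which is exactly the case a strictly validating driver is free to reject.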

This is not a bug in our backend, in libva, in ffmpeg, or in the kernel
driver. It's an ecosystem coordination failure of long standing:

- **Kwiboo's `ffmpeg-v4l2request` hwaccel** has been in production via
  LibreELEC since December 2018. Re-submitted to ffmpeg-devel as a v2
  series in August 2024. Still unmerged in May 2026, **roughly seven
  and a half years** after it first shipped.
- **`libva-v4l2-request`** (this project's upstream) hasn't taken
  meaningful commits since ~2021. Nobody wants to own the impedance
  mismatch between VAAPI's Intel-shaped "give me raw bitstream, I'll
  parse" and V4L2 stateless's kernel-shaped "give me parsed structs,
  I'll just drive the HW."
- **`rpi-hevc-dec` mainline submission** is at v4 (July 2025), 17
  months in review. The Pi 6.18.x downstream kernel meanwhile has
  active HEVC regressions ([raspberrypi/linux#7228](https://github.com/raspberrypi/linux/issues/7228),
  [#7306](https://github.com/raspberrypi/linux/issues/7306)) that
  aren't being fast-tracked because "the new uAPI is coming."
- **Mozilla is implementing Pi 5 HEVC via ffmpeg's hwaccel-context
  path** (bug [1969297](https://bugzilla.mozilla.org/show_bug.cgi?id=1969297)),
  not via libva — explicit acknowledgement from David Turner that
  libavcodec needs to retain the SPS context for the strict driver to
  accept the control batch.

What end-users actually do today: run Pi OS (downstream-patched ffmpeg
+ downstream kernel) or LibreELEC (Kwiboo's patches + downstream
kernel). Anyone on a stock distro outside those two: no HW HEVC on
Pi 5.

Nobody who has authority to merge has skin in the game. Everyone with
skin in the game lacks authority. Result: a stalemate of more than
seven years, three forks of working code, no merged upstream.

### What this means for this backend

We chose to extend `libva-v4l2-request` into Pi 5 territory because
the architecture maps cleanly onto the existing iter38 multi-device
probe. That work landed (iter40 commit `3ffa9d0`, iter40b commit
`071b08d`). It's reusable infrastructure for any future strict V4L2
stateless decoder that ffmpeg ships before libva does.

But the *user-facing* Pi 5 HEVC story will not come from this
backend. The backend was a clean architectural target inside a
coordination dead-end. The actual Pi 5 HEVC path through libva
requires either:

- a VAAPI extension exposing the SPS scalars rpi-hevc-dec validates
  against (Intel-driven; no Pi-aligned principal),
- a libva-internal `VABufferType` for raw SPS/PPS NAL bytes (no
  maintainer),
- ffmpeg-vaapi forwarding `num_entry_point_offsets` to backends
  (small upstream patch; no champion), OR
- the political situation around Kwiboo's series unblocking (no
  visible movement).

iter40 + iter40b are **landed but parked**. The fresnel + ampere
sibling paths are unaffected (5/5 fresnel + 9 profiles ampere
verified post-iter40b, no regression). Phase 8 packaging is
deliberately skipped — shipping a `.deb` whose primary advertised
target (Pi 5) doesn't actually decode would mislead users.

See `phase0_pi5_hevc.md`, `phase1_pi5_hevc.md`,
`phase5_pi5_hevc_review.md`, `phase7_pi5_hevc_close.md` for the
chapter's full empirical record.

## Instructions

To use this backend, set the `LIBVA_DRIVER_NAME` environment
variable:

	export LIBVA_DRIVER_NAME=v4l2_request

Then a VA-API-capable player can decode supported codecs on a probed
device:

	vlc path/to/video.mp4
	mpv --hwdec=vaapi path/to/video.mp4
	ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -f null -

The backend auto-detects available decoders by walking the V4L2 media
topology. For explicit device selection it honors
`LIBVA_V4L2_REQUEST_VIDEO_PATH` and `LIBVA_V4L2_REQUEST_MEDIA_PATH`.
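
As a minimal sketch of how such an override could take precedence over the probed path: the helper names `pick_device_path()` and `resolve_video_path()` are hypothetical, and treating an empty variable as unset is an assumption, not documented behaviour of this backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: prefer a non-empty override, else the probed path. */
const char *pick_device_path(const char *override, const char *probed_path)
{
    assert(probed_path != NULL);

    /* Assumption: an empty variable is treated the same as an unset one. */
    if (override != NULL && override[0] != '\0')
        return override;
    return probed_path;
}

/* Hypothetical wrapper: read the documented variable for the video node. */
const char *resolve_video_path(const char *probed_path)
{
    return pick_device_path(getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH"),
                            probed_path);
}
```

The media node would be resolved the same way with `LIBVA_V4L2_REQUEST_MEDIA_PATH`.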

## Technical Notes

### Multi-device probe (iter38)

A single libva session opens both `rkvdec` and `hantro-vpu` (and, on
hosts where it's present, `rpi-hevc-dec`) at init. `RequestCreateConfig`
re-targets the active fd per profile via
`request_switch_device_for_profile()`. Pool teardown happens at
switch time; the next `CreateContext` rebuilds against the right
device.
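
The switching logic can be sketched roughly as follows. Everything here is illustrative: the enums, `struct session`, the capability masks and `switch_device_for_codec()` are hypothetical stand-ins, not the backend's actual `request_switch_device_for_profile()`, which keys on VA profiles and real driver capabilities.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical codec and device-slot identifiers. */
enum codec { CODEC_H264, CODEC_HEVC, CODEC_VP9, CODEC_VP8, CODEC_MPEG2, CODEC_COUNT };
enum slot { SLOT_RKVDEC, SLOT_HANTRO, SLOT_RPI_HEVC, SLOT_COUNT };

struct session {
    int fds[SLOT_COUNT];  /* one fd per probed decoder, -1 if absent */
    int active;           /* slot currently targeted, -1 if none yet */
    int pool_valid;       /* nonzero while the CAPTURE pool matches `active` */
};

/* Illustrative capability masks, not read from the real drivers. */
static const unsigned slot_caps[SLOT_COUNT] = {
    [SLOT_RKVDEC]   = (1u << CODEC_H264) | (1u << CODEC_HEVC) | (1u << CODEC_VP9),
    [SLOT_HANTRO]   = (1u << CODEC_VP8) | (1u << CODEC_MPEG2),
    [SLOT_RPI_HEVC] = (1u << CODEC_HEVC),
};

/* Pick the first present slot that handles the codec; returns its fd or -1. */
int switch_device_for_codec(struct session *s, enum codec c)
{
    assert(s != NULL && c < CODEC_COUNT);

    for (int i = 0; i < SLOT_COUNT; i++) {
        if (s->fds[i] < 0 || !(slot_caps[i] & (1u << c)))
            continue;
        if (s->active != i) {
            s->active = i;
            s->pool_valid = 0;  /* torn down now; rebuilt at context creation */
        }
        return s->fds[i];
    }
    return -1;  /* no probed device decodes this codec */
}
```

Invalidating the pool only when the slot actually changes keeps back-to-back configs for the same device cheap.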

### Surface / Context / Picture / Image

A Surface is the backend's internal data structure holding decoded
(rendered) output. A Context owns the V4L2 lifecycle (S\_FMT, CAPTURE
pool, ctrl-batch defaults) for one decode session. A Picture is one
encoded input frame's set of buffers. An Image is a standard VA
pixel-format view on a decoded Surface: the backend detiles
SAND/COL128 or unpacks NV15 to NV12/P010 here so consumers see linear
pitches.
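
To illustrate the detile step, here is a minimal sketch for one 8-bit plane stored in 128-byte-wide columns. The function name and parameters are hypothetical, and it assumes the caller already knows the byte distance between the starts of adjacent columns (`col_stride`). It is not the backend's actual primitive and does not handle the chroma plane or 10-bit variants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COL_WIDTH 128  /* bytes per row inside one column */

/*
 * Copy a column-tiled 8-bit plane into a linear buffer.
 * Byte (x, y) is assumed to live at:
 *   (x / 128) * col_stride + y * 128 + (x % 128)
 */
void detile_col128_plane(uint8_t *dst, size_t dst_pitch,
                         const uint8_t *src, size_t col_stride,
                         size_t width, size_t height)
{
    assert(dst != NULL && src != NULL && dst_pitch >= width);

    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x += COL_WIDTH) {
            /* The last column may be narrower than 128 visible bytes. */
            size_t run = (width - x < COL_WIDTH) ? width - x : COL_WIDTH;
            const uint8_t *from =
                src + (x / COL_WIDTH) * col_stride + y * COL_WIDTH;

            memcpy(dst + y * dst_pitch + x, from, run);
        }
    }
}
```

Copying one column row at a time keeps each `memcpy` contiguous on both sides, which is usually the cheap way to walk this layout on the CPU.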

The actual decode submission happens in `EndPicture`, not
`RenderPicture`: the kernel needs the full extended-control batch when
the OUTPUT buffer is queued, and the order in which a consumer passes
its buffers to `RenderPicture` is consumer-defined.
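
The split can be sketched as "collect in `RenderPicture`, submit in `EndPicture`". The types and function names below are hypothetical, and the sketch keeps only one buffer per kind, whereas a real picture may carry several slices. It only shows why arrival order stops mattering once submission is deferred.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer kinds; a real backend keys on VABufferType. */
enum buf_kind {
    BUF_PIC_PARAMS,
    BUF_IQ_MATRIX,
    BUF_SLICE_PARAMS,
    BUF_SLICE_DATA,
    BUF_KIND_COUNT
};

struct picture {
    const void *bufs[BUF_KIND_COUNT];  /* latest buffer seen per kind */
    int submitted;
};

/* RenderPicture stage: record the buffer, in whatever order it arrives. */
void picture_render(struct picture *p, enum buf_kind kind, const void *data)
{
    assert(p != NULL && kind < BUF_KIND_COUNT);
    p->bufs[kind] = data;
}

/*
 * EndPicture stage: the full set is known only now, so the batch is built
 * here. Returns 0 on success, -1 if a mandatory buffer never arrived.
 */
int picture_end(struct picture *p)
{
    assert(p != NULL);

    if (!p->bufs[BUF_PIC_PARAMS] || !p->bufs[BUF_SLICE_PARAMS] ||
        !p->bufs[BUF_SLICE_DATA])
        return -1;

    /* A real backend would set the extended controls and queue the
     * OUTPUT buffer with its request at this point. */
    p->submitted = 1;
    return 0;
}
```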