Compare commits
15 Commits
| SHA1 |
|---|
| 902d6c17ba |
| c839b9456e |
| d7ef0f6cd9 |
| 5803cbcf6c |
| ab79ed5e4d |
| 5fb7e36955 |
| 85bcddb5ad |
| 9c30eccd52 |
| 78a9978b02 |
| 61db76ebcf |
| bed75c0cef |
| 1a2c958ab3 |
| 4f6ba6c0e3 |
| c5fbc5bf04 |
| f91c3f53c5 |
@@ -1,281 +1,75 @@
|
||||
# libva-v4l2-request-fourier
|
||||
# v4l2-request libVA Backend
|
||||
|
||||
VA-API ICD backend for V4L2 stateless video decoders. Fourier-campaign
|
||||
fork of the dormant `bootlin/libva-v4l2-request` upstream.
|
||||
## About
|
||||
|
||||
> **TL;DR for "I want hardware-accelerated YouTube in Firefox on my
|
||||
> Rockchip board":** skip to the [§ Quickstart](#quickstart) below.
|
||||
> Fresnel (RK3399) and ampere (RK3588) are validated targets; ohm
|
||||
> (RK3566 PineTab2) is the chromium-fourier validation rig.
|
||||
This libVA backend is designed to work with the Linux Video4Linux2
|
||||
Request API that is used by a number of video codec drivers,
|
||||
including the Video Engine found in most Allwinner SoCs.
|
||||
|
||||
## What works
|
||||
## Status
|
||||
|
||||
| SoC / host | HW-accelerated codecs | Bit-exact vs `kdirect` |
|---|---|---|
| RK3399 (fresnel — Pinebook Pro) | H.264, HEVC Main, VP9 Profile 0, VP8, MPEG-2 | 5/5 at iter38; preserved through iter40b |
| RK3588 (ampere) | H.264 + HEVC (iter1+iter2 ampere-fourier); **mainline rkvdec / VDPU381 + VDPU383 landed February 2026** — VP9 / AV1 verification next | iter1 H.264 PASS; remaining codecs gated on mainline-driver bring-up |
| RK3568 / RK3566 (ohm — PineTab2) | H.264, MPEG-2, VP8 via hantro multi-planar | iter1-5 baseline (libva-multiplanar campaign) |
| BCM2712 (higgs — Pi 5 / CM5) | — | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved, [see § Pi 5 standoff](#the-pi-5-standoff) |
|
||||
|
||||
`kdirect` is the reference: `ffmpeg -hwaccel v4l2request
|
||||
-hwaccel_output_format drm_prime ...` via Kwiboo's downstream ffmpeg
|
||||
patches (packaged here as **`ffmpeg-v4l2-request-fourier`**, FFmpeg 8.1
|
||||
tip @ Kwiboo `v4l2-request-n8.1` commit `b57fbbe`).
|
||||
|
||||
## Quickstart
|
||||
|
||||
### What you need for HW-accelerated YouTube in Firefox
|
||||
|
||||
The full stack, top to bottom, with the package this campaign provides
|
||||
at each layer:
|
||||
|
||||
| Layer | Package(s) | Notes |
|---|---|---|
| Linux kernel with V4L2 stateless decoders | `linux-fresnel-fourier` (RK3399), `linux-ampere-fourier` (RK3588) | Mainline rkvdec / hantro / VDPU381 / VDPU383. ohm typically rides on a Beryllium OS host kernel. |
| `ffmpeg` with Kwiboo's v4l2-request hwaccel | `ffmpeg-v4l2-request-fourier` | Provides `-hwaccel drm -c:v hevc` (and h264/vp9) routes via libavcodec hwdevice DRM. |
| `libva` VA-API runtime + this backend ICD | `libva` (stock) + **`libva-v4l2-request-fourier`** | This repo. Auto-detects rkvdec / hantro / cedrus on probe. |
| Firefox patched to call libavcodec stateless | `firefox-fourier` | 5-patch series, ~+169 LoC over stock Firefox. Validated on fresnel: **~5 % CPU at 1080p30 H.264** (vs 64 % software). |
| (Wayland alt) Chromium patched for V4L2VDA | `chromium-fourier` + `kwin-fourier` | Validated on ohm under KDE Plasma 6.6.5 Wayland. Needs `kwin-fourier` for the dmabuf-fence latency fix. |
| (Optional) panfrost / panthor GPU stack | `vulkan-panfrost` | Wayland compositor + 3D. |
|
||||
|
||||
The actual VA-API path is mostly historical inside this campaign — the
|
||||
**user-facing browser HW decode story rides libavcodec's
|
||||
`v4l2_request` hwaccel directly**, not VAAPI-via-libva. Firefox-fourier
|
||||
attaches an `AV_HWDEVICE_TYPE_DRM` context to libavcodec's generic
|
||||
`h264`/`hevc`/`vp9` decoder; libavcodec then auto-binds the
|
||||
`v4l2_request` hwaccel from its `hw_configs`. No `LIBVA_DRIVER_NAME`
|
||||
incantation needed for browser use. libva-v4l2-request-fourier matters
|
||||
for mpv, ffmpeg-as-vaapi, and other VA-API direct consumers.
|
||||
|
||||
### Install on Arch ALARM (fresnel / ampere / ohm)
|
||||
|
||||
Add the marfrit repo if you haven't already:
|
||||
|
||||
```ini
# /etc/pacman.conf
[marfrit]
SigLevel = Required
Server = https://packages.reauktion.de/arch/$arch
```
|
||||
|
||||
Import the signing key (one-time):
|
||||
|
||||
```bash
sudo pacman-key --recv-keys <KEY-ID>   # see https://packages.reauktion.de
sudo pacman-key --lsign-key <KEY-ID>
sudo pacman -Sy
```
|
||||
|
||||
Then per host:
|
||||
|
||||
```bash
# Fresnel — RK3399 Pinebook Pro
sudo pacman -S \
    linux-fresnel-fourier linux-fresnel-fourier-headers \
    ffmpeg-v4l2-request-fourier \
    libva-v4l2-request-fourier \
    firefox-fourier

# Ampere — RK3588
sudo pacman -S \
    linux-ampere-fourier linux-ampere-fourier-headers \
    ffmpeg-v4l2-request-fourier \
    libva-v4l2-request-fourier \
    firefox-fourier

# Ohm — RK3566 PineTab2 (chromium-fourier validated path)
sudo pacman -S \
    ffmpeg-v4l2-request-fourier \
    libva-v4l2-request-fourier \
    kwin-fourier
# chromium-fourier currently still a local build — see § Status
```
|
||||
|
||||
Reboot if a new kernel landed. Then:
|
||||
|
||||
```bash
# Smoke-test: vainfo should list HEVCMain + H264 entries
LIBVA_DRIVER_NAME=v4l2_request vainfo

# Browser launch with verbose decoder logging
MOZ_LOG="PlatformDecoderModule:5,FFmpegVideo:5" \
    firefox-fourier 2>&1 | tee /tmp/fx.log

# Then open a YouTube 1080p H.264 video and grep for:
#   "Choosing FFmpeg pixel format for V4L2 video decoding"
#   "av_hwdevice_ctx_create(DRM, /dev/dri/renderD128) ok"
# If you DON'T see those: HW path didn't engage, fell back to software.
```
|
||||
|
||||
### Status of the published vs locally-built packages
|
||||
|
||||
As of May 2026, the live marfrit repo at
|
||||
<https://packages.reauktion.de/arch/aarch64/> has:
|
||||
|
||||
- ✓ `libva-v4l2-request-fourier-1:1.0.0.r361.cf8cd9d-1` (iter40b tip)
|
||||
- ✓ `ffmpeg-v4l2-request-fourier-2:8.1.r123329.b57fbbe-3` (Kwiboo's
|
||||
v4l2-request-n8.1 + libudev-bypass; smoke-tested on fresnel —
|
||||
HEVC via `-hwaccel v4l2request` PASS)
|
||||
- ✓ `firefox-fourier-150.0.1-16` (5-patch series, sandboxed RDD HW
|
||||
decode validated on RK3399: ~5 % CPU at 1080p30 H.264)
|
||||
- ✓ `linux-fresnel-fourier-7.0-14` + headers (RK3399)
|
||||
- ✓ `linux-ampere-fourier-7.0rc3.kafr1-1` + headers (RK3588)
|
||||
- ✓ `kwin-fourier-1:6.6.5-1` (Wayland dmabuf-fence fix for chromium-fourier)
|
||||
- ✓ `vulkan-panfrost-1:26.0.5-1` (GPU stack)
|
||||
|
||||
NOT yet published but **present in `marfrit-packages/arch/` source
|
||||
tree** (build + publish pending):
|
||||
|
||||
- ⏳ `chromium-fourier` (Chromium 147 + V4L2VDA-on-mainline patches —
|
||||
blocked on Arch ALARM bumping clang 22 → 23).
|
||||
- ⏳ `qt6-base-fourier` (GL_ALPHA → GL_R8 fix — needed by KDE Plasma
|
||||
Wayland on the panfrost stack).
|
||||
|
||||
If you need those locally before they ship:
|
||||
|
||||
```bash
git clone ssh://git@git.reauktion.de:2222/marfrit/marfrit-packages.git
cd marfrit-packages/arch/<package>
makepkg -si
```
|
||||
|
||||
## What does NOT work, and why it's stalled
|
||||
|
||||
| Target | Status | Blocker |
|---|---|---|
| H264 Hi10P on RK3399 | enumerated, decode returns all-zero | RK3399 silicon doesn't implement 10-bit despite kernel advertising the profile (iter39 close, Option B applied) |
| HEVC Main10 on RK3399 | not enumerated | same as Hi10P |
| **Pi 5 / CM5 (BCM2712 / `rpi-hevc-dec`)** | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved | see "The Pi 5 standoff" below |
|
||||
|
||||
### The Pi 5 standoff
|
||||
|
||||
iter40 + iter40b add a third multi-device-probe slot for
|
||||
`rpi-hevc-dec`, an NC12 SAND128 detile primitive, per-driver gates
|
||||
around the SPS pre-seed + start-code-prepend + scaling_matrix submission,
|
||||
and a (fragile, fixture-specific) SPS field override using the
|
||||
GStreamer 1.28.2 H.265 parser. ICD discovery works, `vainfo` lists
|
||||
`VAProfileHEVCMain`, S\_FMT / REQBUFS / STREAMON all succeed.
|
||||
|
||||
**Decode itself never succeeds** — every CAPTURE DQBUF returns
|
||||
`V4L2_BUF_FLAG_ERROR`. Driver author John Cox confirmed strict SPS
|
||||
validation is intentional ("`try_ext_ctrls returned an error (22)` is
|
||||
expected as it is validating the SPS"), and VAAPI's
|
||||
`VAPictureParameterBufferHEVC` simply doesn't carry the bitstream-true
|
||||
scalars (`sps_max_num_reorder_pics`, `sps_max_latency_increase_plus1`,
|
||||
slice-level `num_entry_point_offsets`) that the driver wants. We can't
|
||||
fish the SPS out of `source_data` either, because ffmpeg-vaapi parses
|
||||
the SPS itself and passes only slice NAL bytes to libva backends.
|
||||
|
||||
This is not a bug in our backend, in libva, in ffmpeg, or in the kernel
|
||||
driver. It's an ecosystem coordination failure of long standing:
|
||||
|
||||
- **Kwiboo's `ffmpeg-v4l2request` hwaccel** has been in production via
|
||||
LibreELEC since December 2018. Re-submitted to ffmpeg-devel as a v2
|
||||
series in August 2024. Still un-merged in May 2026 — **eight years
|
||||
in the upstream review queue**.
|
||||
- **`libva-v4l2-request`** (this project's upstream) hasn't taken
|
||||
meaningful commits since ~2021. Nobody wants to own the impedance
|
||||
mismatch between VAAPI's Intel-shaped "give me raw bitstream, I'll
|
||||
parse" and V4L2 stateless's kernel-shaped "give me parsed structs,
|
||||
I'll just drive the HW."
|
||||
- **`rpi-hevc-dec` mainline submission** is at v4 (July 2025), 17
|
||||
months in review. The Pi 6.18.x downstream kernel meanwhile has
|
||||
active HEVC regressions ([raspberrypi/linux#7228](https://github.com/raspberrypi/linux/issues/7228),
|
||||
[#7306](https://github.com/raspberrypi/linux/issues/7306)) that
|
||||
aren't being fast-tracked because "the new uAPI is coming."
|
||||
- **Mozilla is implementing Pi 5 HEVC via ffmpeg's hwaccel-context
|
||||
path** (bug [1969297](https://bugzilla.mozilla.org/show_bug.cgi?id=1969297)),
|
||||
not via libva — explicit acknowledgement from David Turner that
|
||||
libavcodec needs to retain the SPS context for the strict driver to
|
||||
accept the control batch.
|
||||
|
||||
What end-users actually do today: run Pi OS (downstream-patched ffmpeg
|
||||
+ downstream kernel) or LibreELEC (Kwiboo's patches + downstream
|
||||
kernel). Anyone on a stock distro outside those two: no HW HEVC on
|
||||
Pi 5.
|
||||
|
||||
Nobody who has authority to merge has skin in the game. Everyone with
|
||||
skin in the game lacks authority. Result: 8-year stalemate, three
|
||||
forks of working code, no merged upstream.
|
||||
|
||||
### What this means for this backend
|
||||
|
||||
We chose to extend `libva-v4l2-request` into Pi 5 territory because
|
||||
the architecture maps cleanly onto the existing iter38 multi-device
|
||||
probe. That work landed (iter40 commit `3ffa9d0`, iter40b commit
|
||||
`071b08d`). It's reusable infrastructure for any future strict V4L2
|
||||
stateless decoder that ffmpeg ships before libva does.
|
||||
|
||||
But the *user-facing* Pi 5 HEVC story will not come from this
|
||||
backend. The backend was a clean architectural target inside a
|
||||
coordination dead-end. The actual Pi 5 HEVC path through libva
|
||||
requires either:
|
||||
|
||||
- a VAAPI extension exposing the SPS scalars rpi-hevc-dec validates
|
||||
against (Intel-driven; no Pi-aligned principal),
|
||||
- a libva-internal `VABufferType` for raw SPS/PPS NAL bytes (no
|
||||
maintainer),
|
||||
- ffmpeg-vaapi forwarding `num_entry_point_offsets` to backends
|
||||
(small upstream patch; no champion), OR
|
||||
- the political situation around Kwiboo's series unblocks (no
|
||||
visible movement).
|
||||
|
||||
iter40 + iter40b are **landed but parked**. The fresnel + ampere
|
||||
sibling paths are unaffected (5/5 fresnel + 9 profiles ampere
|
||||
verified post-iter40b, no regression). Phase 8 packaging is
|
||||
deliberately skipped — shipping a `.deb` whose primary advertised
|
||||
target (Pi 5) doesn't actually decode would mislead users.
|
||||
|
||||
See `phase0_pi5_hevc.md`, `phase1_pi5_hevc.md`,
|
||||
`phase5_pi5_hevc_review.md`, `phase7_pi5_hevc_close.md` for the
|
||||
chapter's full empirical record.
|
||||
The v4l2-request libVA backend currently supports the following formats:
|
||||
* MPEG2 (Simple and Main profiles)
|
||||
* H264 (Baseline, Main and High profiles)
|
||||
* H265 (Main profile)
|
||||
|
||||
## Instructions
|
||||
|
||||
In order to use this backend, set the `LIBVA_DRIVER_NAME` environment
|
||||
variable:
|
||||
In order to use this libVA backend, the `v4l2_request` driver has to
|
||||
be specified through the `LIBVA_DRIVER_NAME` environment variable, as
|
||||
such:
|
||||
|
||||
export LIBVA_DRIVER_NAME=v4l2_request
|
||||
|
||||
Then a VA-API-capable player can decode supported codecs on a probed
|
||||
device:
|
||||
A media player that supports VAAPI (such as VLC) can then be used to decode a
|
||||
video in a supported format:
|
||||
|
||||
vlc path/to/video.mp4
|
||||
mpv --hwdec=vaapi path/to/video.mp4
|
||||
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -f null -
|
||||
vlc path/to/video.mpg
|
||||
|
||||
The backend auto-detects available decoders via the V4L2 media
|
||||
topology walk; honors `LIBVA_V4L2_REQUEST_VIDEO_PATH` and
|
||||
`LIBVA_V4L2_REQUEST_MEDIA_PATH` for explicit device selection.
|
||||
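As a rough illustration of how an explicit-path override like this can be honored, here is a minimal sketch. The function name `explicit_device_path` is hypothetical, and treating an empty variable as "unset" is an assumption rather than documented backend behavior; only the two environment variable names come from the text above.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns the path from the named environment
 * variable, or NULL when it is unset or empty. NULL means "no override,
 * fall back to the media topology walk".
 */
const char *explicit_device_path(const char *env_name)
{
    const char *value = getenv(env_name);

    if (value == NULL || value[0] == '\0')
        return NULL;
    return value;
}
```

A caller would query `LIBVA_V4L2_REQUEST_VIDEO_PATH` and `LIBVA_V4L2_REQUEST_MEDIA_PATH` separately and only skip auto-detection when a non-NULL path comes back.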
Sample media files can be obtained from:
|
||||
|
||||
http://samplemedia.linaro.org/MPEG2/
|
||||
http://samplemedia.linaro.org/MPEG4/SVT/
|
||||
|
||||
## Technical Notes
|
||||
|
||||
### Multi-device probe (iter38)
|
||||
### Surface
|
||||
|
||||
A single libva session opens both `rkvdec` and `hantro-vpu` (and, on
|
||||
hosts where it's present, `rpi-hevc-dec`) at init. `RequestCreateConfig`
|
||||
re-targets the active fd per profile via
|
||||
`request_switch_device_for_profile()`. Pool teardown happens at
|
||||
switch time; the next `CreateContext` rebuilds against the right
|
||||
device.
|
||||
A Surface is an internal data structure never handled by the VA's user
|
||||
containing the output of a rendering. Usually, a bunch of surfaces are created
|
||||
at the beginning of decoding and they are then used alternately. When
|
||||
created, a surface is assigned a corresponding v4l capture buffer and it is
|
||||
kept until the end of decoding. Syncing a surface waits for the v4l buffer to
|
||||
be available and then dequeues it.
|
||||
|
||||
### Surface / Context / Picture / Image
|
||||
Note: since a Surface is kept private from the VA's user, it can ask to
|
||||
directly render a Surface on screen in an X Drawable. Some kind of
|
||||
implementation is available in PutSurface but this is only for development
|
||||
purposes.
|
||||
|
||||
A Surface is an internal data structure containing rendering output.
|
||||
A Context owns the V4L2 lifecycle (S\_FMT, CAPTURE pool, ctrl-batch
|
||||
defaults) for one decode session. A Picture is one encoded input
|
||||
frame's set of buffers. An Image is a standard VA pixel-format view
|
||||
on a decoded Surface — the backend detiles SAND/COL128 or unpacks
|
||||
NV15 to NV12/P010 here so consumers see linear pitches.
|
||||
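To make the NV15 unpack step concrete, the sketch below converts NV15 samples into P010 words. It assumes the usual NV15 packing of four 10-bit samples stored LSB-first in five bytes, and a row width that is a multiple of four. The function names are illustrative and not taken from the backend's source.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack one NV15 group: four 10-bit samples in five bytes
 * (assumed LSB-first). Each sample becomes a P010 word, which keeps
 * the 10 significant bits in the top of a 16-bit value.
 */
void nv15_group_to_p010(const uint8_t src[5], uint16_t dst[4])
{
    uint64_t bits = 0;

    /* Gather the 40 packed bits into one integer, lowest byte first. */
    for (int i = 0; i < 5; i++)
        bits |= (uint64_t)src[i] << (8 * i);

    for (int i = 0; i < 4; i++) {
        uint16_t sample = (uint16_t)((bits >> (10 * i)) & 0x3ff);
        dst[i] = (uint16_t)(sample << 6); /* data lives in bits 15..6 */
    }
}

/* Unpack one row of `width` samples; `width` must be a multiple of 4. */
void nv15_row_to_p010(const uint8_t *src, uint16_t *dst, size_t width)
{
    for (size_t x = 0; x < width; x += 4)
        nv15_group_to_p010(src + (x / 4) * 5, dst + x);
}
```

The same routine applies to the luma plane and to the interleaved chroma plane, since both use the same 10-bit packing; only the row width differs.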
### Context
|
||||
|
||||
The real rendering is in `EndPicture`, not `RenderPicture`, because
|
||||
the kernel needs the full extended-control batch when the OUTPUT
|
||||
buffer is queued, and `RenderPicture` order is consumer-defined.
|
||||
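The deferral described above can be pictured as a small state machine: each `RenderPicture` call only records what it was handed, and submission happens once at `EndPicture`. The following is a simplified sketch with hypothetical types and names (`pending_picture`, `buf_kind`); the real backend tracks actual VA buffers and V4L2 controls rather than flags.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical buffer categories a consumer may deliver in any order. */
enum buf_kind {
    BUF_PICTURE_PARAMS,
    BUF_SLICE_PARAMS,
    BUF_SLICE_DATA,
    BUF_KIND_COUNT
};

struct pending_picture {
    bool seen[BUF_KIND_COUNT]; /* which pieces have arrived so far */
    int submitted;             /* number of completed submissions */
};

void begin_picture(struct pending_picture *p)
{
    memset(p->seen, 0, sizeof(p->seen));
}

/* Only record the buffer; nothing is sent to the kernel here. */
void render_picture(struct pending_picture *p, enum buf_kind kind)
{
    p->seen[kind] = true;
}

/* Submit only when the full control batch and slice data are present. */
int end_picture(struct pending_picture *p)
{
    for (int k = 0; k < BUF_KIND_COUNT; k++)
        if (!p->seen[k])
            return -1;
    p->submitted++;
    return 0;
}
```

Because nothing is queued before `end_picture`, the order in which the consumer issues its render calls does not matter.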
A Context is a global data structure used for rendering a video of a certain
|
||||
format. When a context is created, input buffers are created and v4l's output
|
||||
(which is the compressed data input queue, since capture is the real output)
|
||||
format is set.
|
||||
|
||||
### Picture
|
||||
|
||||
A Picture is an encoded input frame made of several buffers. A single input
|
||||
can contain slice data, headers and IQ matrix. Each Picture is assigned a
|
||||
request ID when created and each corresponding buffer might be turned into a
|
||||
v4l buffer or extended control when rendered. Finally they are submitted to
|
||||
kernel space when reaching EndPicture.
|
||||
|
||||
The real rendering is done in EndPicture instead of RenderPicture
|
||||
because the v4l2 driver expects to have the full corresponding
|
||||
extended control when a buffer is queued and we don't know in which
|
||||
order the different RenderPicture will be called.
|
||||
|
||||
### Image
|
||||
|
||||
An Image is a standard data structure containing rendered frames in a usable
|
||||
pixel format. Here we only use NV12 buffers which are converted from sunxi's
|
||||
proprietary tiled pixel format with tiled_yuv when deriving an Image from a
|
||||
Surface.
|
||||
|
||||
@@ -195,11 +195,6 @@ extern "C" {
|
||||
#define DRM_FORMAT_NV24 fourcc_code('N', 'V', '2', '4') /* non-subsampled Cr:Cb plane */
|
||||
#define DRM_FORMAT_NV42 fourcc_code('N', 'V', '4', '2') /* non-subsampled Cb:Cr plane */
|
||||
|
||||
/* iter39: NV15 is 4×10-bit packed in 5 bytes (Rockchip rkvdec 10-bit output). */
|
||||
#ifndef DRM_FORMAT_NV15
|
||||
#define DRM_FORMAT_NV15 fourcc_code('N', 'V', '1', '5') /* 2x2 subsampled Cr:Cb plane 10 bits per channel packed */
|
||||
#endif
|
||||
|
||||
/*
|
||||
* 3 plane YCbCr
|
||||
* index 0: Y plane, [7:0] Y
|
||||
|
||||
@@ -4,14 +4,3 @@ option(
|
||||
value : '',
|
||||
description: 'Path to sanitized Linux Kernel headers'
|
||||
)
|
||||
|
||||
option(
|
||||
'daedalus_v4l2',
|
||||
type : 'boolean',
|
||||
value : true,
|
||||
description: 'Enable probe + dispatch for the out-of-tree daedalus_v4l2 ' +
|
||||
'stateless decoder shim (Pi 5 / CM5 daemon-backed VP9/AV1/H264). ' +
|
||||
'Default true; disable on platforms where the daedalus_v4l2 ' +
|
||||
'kernel module will never be present to slim the probe array.'
|
||||
)
|
||||
|
||||
|
||||
@@ -1,298 +0,0 @@
|
||||
# Phase 0 — Pi 5 / CM5 HEVC chapter
|
||||
|
||||
Opened 2026-05-17 evening, after the failed `libva-v4l2-stateful-fourier`
|
||||
scaffold attempt. Brother-session empirical Phase 0 on higgs invalidated
|
||||
the stateful premise: rpi-hevc-dec is V4L2 **stateless**, so Pi 5 HEVC
|
||||
belongs in this backend, not a separate sibling.
|
||||
|
||||
No code in this chapter yet. This doc is the substrate. Phase 1 picks up
|
||||
from the "Open questions" section.
|
||||
|
||||
## Substrate
|
||||
|
||||
### Target host
|
||||
|
||||
higgs — Pi CM5 module on Pi CM5 IO board. BCM2712 SoC. VPN-only, often
|
||||
offline; wake via HIS skill recipe (no Fritz!Box plug — runs on power
|
||||
when on). Debian-based. Sole HW video decoder is rpi-hevc-dec at
|
||||
`/dev/video19` + `/dev/media1`.
|
||||
|
||||
### Backend baseline at chapter open
|
||||
|
||||
`libva-v4l2-request-fourier` master tip `cf8cd9d` (iter39 + Option B +
|
||||
h265 ref-list cap fix). Multi-device probe (iter38) already opens
|
||||
rkvdec + hantro slots; adding a third decoder slot for rpi-hevc-dec is
|
||||
a natural extension of that architecture.
|
||||
|
||||
iter2 (ampere VDPU381 HEVC EXT_SPS) added the GStreamer 1.28.2 H.265
|
||||
parser vendor + EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission. That
|
||||
plumbing is probe-gated (`has_hevc_ext_sps_rps_rkvdec`), so it stays
|
||||
dormant on hosts where the controls don't exist.
|
||||
|
||||
### Empirical higgs probe (brother session)
|
||||
|
||||
`v4l2-ctl -d /dev/video19 --list-formats-ext --list-ctrls`:
|
||||
|
||||
```
Stateless Codec Controls

hevc_sequence_parameter_set (compound, V4L2_CID_STATELESS_HEVC_SPS)
hevc_picture_parameter_set  (compound, V4L2_CID_STATELESS_HEVC_PPS)
slice_param_array           (compound dynamic-array dims=[4096])
hevc_scaling_matrix         (compound)
hevc_decode_parameters      (compound)
hevc_decode_mode            (menu, "Frame-Based")
hevc_start_code             (menu, default "No Start Code")

OUTPUT formats:
S265  V4L2_PIX_FMT_HEVC_SLICE (parsed slice payload)

CAPTURE formats:
NC12  V4L2_PIX_FMT_NV12_COL128     (8-bit SAND 128-column tiled)
NC30  V4L2_PIX_FMT_NV12_10_COL128  (10-bit SAND 128-column tiled)
```
|
||||
|
||||
Conclusion: this is the standard `V4L2_CID_STATELESS_HEVC_*` control set
|
||||
exposed under the V4L2-request uAPI, exactly the same family our backend
|
||||
already drives for rkvdec/hantro/cedrus HEVC paths. The novel parts are
|
||||
two pixel formats (NC12, NC30) and one driver-id (rpi-hevc-dec).
|
||||
|
||||
## What carries forward unchanged
|
||||
|
||||
- VAAPI HEVC profile enumeration (`config.c`)
|
||||
- `h265_set_controls` core path (`h265.c`) — same compound ctrl set
|
||||
- Synthetic SPS pre-seed pattern (iter25/26) — already runs pre-CAPTURE-alloc
|
||||
- Multi-device dispatch in `RequestCreateConfig` (iter38)
|
||||
- VAAPI slice / picture / IQ matrix buffer parsing
|
||||
- HEVC h264-style start-code policy (we already DON'T prepend for HEVC)
|
||||
|
||||
## What needs adding
|
||||
|
||||
| Item | Location | Sizing |
|------|----------|--------|
| `RPI_HEVC_DEC` enum in `driver_kind_t` | `request.h` | trivial |
| Multi-device probe extends to `/dev/video19` discovery | `context.c` / `request.c` init | small — mirror hantro slot |
| `V4L2_PIX_FMT_NV12_COL128` (NC12) `video_format` entry | `video.c` | small |
| `V4L2_PIX_FMT_NV12_10_COL128` (NC30) `video_format` entry | `video.c` | small |
| NC12 → NV12 detile primitive | new `nv12_col128.c` | mid — column tile layout, see kernel docs |
| NC30 → P010 detile primitive | new `nv12_col128.c` | mid — 10-bit variant of above |
| `copy_surface_to_image` branch for NC12/NC30 | `image.c` | small (mirror NV15→P010 gating) |
| Per-driver gating for any rpi-specific quirks discovered | various | per [[per-driver-kludge-gating]] |
|
||||
|
||||
## Open questions for Phase 1
|
||||
|
||||
Lock these before Phase 1 commits to a goal.
|
||||
|
||||
1. **EXT_SPS controls on rpi-hevc-dec?** Brother's `--list-ctrls` output
|
||||
above shows the standard `V4L2_CID_STATELESS_HEVC_*` family — NOT the
|
||||
`EXT_SPS_ST_RPS` / `EXT_SPS_LT_RPS` extensions that VDPU381 needs.
|
||||
Verify: does `slice_param_array[4096]` accept `st_rps_bits` /
|
||||
`lt_rps_bits` in the per-slice payload, or does rpi-hevc-dec parse RPS
|
||||
itself from the slice header? If the latter, the iter2 EXT_SPS path
|
||||
stays dormant (probe-gated already), and rpi-hevc-dec just needs the
|
||||
`picture->st_rps_bits` → `slice_params->short_term_ref_pic_set_size`
|
||||
plumbing that iter31 α-29 already wired. Expectation: works out of the
|
||||
box. Confirm before assuming.
|
||||
|
||||
2. **`hevc_start_code` ctrl: "No Start Code" vs Annex B?** Brother saw
|
||||
default `"No Start Code"` — matches our behavior (we don't prepend on
|
||||
HEVC). But the ctrl is configurable. Verify the menu values exposed
|
||||
and confirm "No Start Code" passes our raw slice-NAL payload as-is.
|
||||
If it doesn't, set the ctrl explicitly per [[unconditional-codec-state]]
|
||||
gating.
|
||||
|
||||
3. **NC12 / NC30 SAND tile layout — exact spec.** Read
|
||||
`Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst` for the
|
||||
COL128 variants. Confirm: column stride = 128 bytes (Y) / 128 bytes
|
||||
(UV interleaved). Row count = `ALIGN(height, 16)` or `ALIGN(height, 8)`?
|
||||
Get the exact alignment and tile-traversal order before writing the
|
||||
detile primitive. Cite from kernel doc, NOT inferred from a hex dump.
|
||||
|
||||
4. **drm_prime / SAND modifier round-trip.** Does ffmpeg-vaapi (and
|
||||
Firefox) accept the NC12 buffer via DRM_PRIME export carrying the
|
||||
DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier, allowing
|
||||
zero-copy to a SAND-aware compositor? Or is libva-side detile to a
|
||||
linear NV12 buffer the only viable Firefox path? If detile is
|
||||
required for the consumer, the [[rockchip-pixel-verify-path]] rule
|
||||
(DMA-BUF GL preferred over cached mmap) might NOT apply since SAND
|
||||
is Pi-specific and not in the wider Wayland modifier ecosystem.
|
||||
|
||||
5. **rpi-hevc-dec quirks on first SPS submission.** rkvdec needs
|
||||
image_fmt pre-seed before CAPTURE alloc (iter25). Does rpi-hevc-dec
|
||||
have an analogous "must set OUTPUT pix_fmt + SPS before CAPTURE"
|
||||
ordering? Verify with strace early.
|
||||
|
||||
6. **higgs OS + libva versioning.** Brother probed on Debian. We package
|
||||
for Arch ALARM. What's the install path on higgs — Arch / Debian /
|
||||
Raspberry Pi OS? If Debian, the package needs a `debian/` tree, not
|
||||
just PKGBUILD. Decide packaging target before Phase 8.
|
||||
|
||||
## Phase 1 goal sketch (NOT locked)
|
||||
|
||||
> Firefox HW HEVC playback on higgs at ≥30fps for 1080p Main, byte-exact
|
||||
> libva-vs-kdirect for ≥3 reference fixtures (8-bit Main and 10-bit Main10).
|
||||
|
||||
Two measurable subgoals follow naturally:
|
||||
- libva (this backend, NV12 image output) == kdirect (ffmpeg-v4l2request,
|
||||
NV12 image output) byte-exact for the same input.
|
||||
- Firefox VA-API path engages (verify via `chrome://gpu` equivalent / log
|
||||
inspection — `MOZ_LOG=PlatformDecoderModule:5`).
|
||||
|
||||
## Phase 3 baseline plan
|
||||
|
||||
Before any backend code touches rpi-hevc-dec:
|
||||
- `kdirect` floor: `ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime
|
||||
-i bbb_720p10s_hevc.mp4 -vf hwdownload,format=nv12 -frames:v 10 ...` and
|
||||
sha256 the YUV.
|
||||
- `SW reference`: same ffmpeg without `-hwaccel`, sha256 the YUV.
|
||||
- Both runs N=3 per [[replicate-baseline-first]].
|
||||
- Capture `strace -f -e ioctl` of the kdirect run — gives the canonical
|
||||
ioctl sequence rpi-hevc-dec expects.
|
||||
|
||||
## Phase 0 closing
|
||||
|
||||
This doc commits the substrate. Phase 1 starts when:
|
||||
- higgs is up + reachable
|
||||
- Open questions 1+2 (EXT_SPS + start_code) are answered live, in one
|
||||
short probe session
|
||||
- Phase 3 baseline floors are captured
|
||||
|
||||
No work blocks the close of iter39 / fresnel campaign — those are shipped.
|
||||
|
||||
## Phase 0 close addendum (2026-05-17 evening, higgs probe session)
|
||||
|
||||
Empirical probes on higgs answered Q1, Q2, partial Q3, full Q5, full Q6.
|
||||
Q4 (DRM modifier round-trip) remains open. Phase 0 is closed; Phase 1
|
||||
opens with what's below.
|
||||
|
||||
### Q1 — EXT_SPS controls on rpi-hevc-dec: NOT present
|
||||
|
||||
`v4l2-ctl -d /dev/video19 --list-ctrls` confirms ONLY the standard
|
||||
`V4L2_CID_STATELESS_HEVC_*` set:
|
||||
- `hevc_sequence_parameter_set` (0x00a40a90)
|
||||
- `hevc_picture_parameter_set` (0x00a40a91)
|
||||
- `slice_param_array` (0x00a40a92, dynamic-array dims=[4096])
|
||||
- `hevc_scaling_matrix` (0x00a40a93)
|
||||
- `hevc_decode_parameters` (0x00a40a94)
|
||||
- `hevc_decode_mode` (0x00a40a95, menu min=1 max=1 default=1 = Frame-Based)
|
||||
- `hevc_start_code` (0x00a40a96, menu min=0 max=1 default=0 = No Start Code)
|
||||
- 0x00a40a97 returns EINVAL (no EXT_SPS_*_RPS controls)
|
||||
|
||||
ioctl trace confirms ffmpeg's `VIDIOC_QUERY_EXT_CTRL` for `0xa97` returns
|
||||
EINVAL — same probe pattern our backend uses for
|
||||
`has_hevc_ext_sps_rps_rkvdec`. **The iter2 path stays dormant; the
|
||||
iter31 α-29 `slice_params->short_term_ref_pic_set_size` plumbing is the
|
||||
correct one for rpi-hevc-dec.**
|
||||
|
||||
### Q2 — hevc_start_code: default 0 (No Start Code), values {0, 1}
|
||||
|
||||
Default 0 matches our backend's "don't prepend HEVC start code" stance.
|
||||
Confirm in Phase 1: rpi-hevc-dec accepts our raw NAL slice payload as-is.
|
||||
|
||||
### Q3 — NC12 / NC30 SAND tile layout: PARTIAL
|
||||
|
||||
CAPTURE S_FMT result for 1280×720 NC12:
|
||||
- `sizeimage=1382400` = `1280 × 720 × 1.5` (linear NV12 byte count)
|
||||
- `bytesperline=1080` (NOT 1280)
|
||||
|
||||
The bytesperline=1080 for a 1280-wide CAPTURE buffer is suspect — likely
|
||||
encodes SAND column count rather than linear stride. Read
|
||||
`drivers/staging/media/rpivid/` (or wherever NC12_COL128 lives in 6.12)
|
||||
kernel source + `drm_fourcc.h` / `nv12_col128.rst` (if it exists) for
|
||||
exact tile layout BEFORE writing the detile primitive. Do NOT infer
|
||||
layout from this single observation.
|
||||
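The `sizeimage` figure can be checked with the standard linear NV12 arithmetic: a full-resolution luma plane plus a half-size interleaved chroma plane. The helper name below is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Byte count of a linear 8-bit NV12 frame: width * height luma bytes
 * plus half as many bytes again for the interleaved CbCr plane.
 */
size_t nv12_linear_size(uint32_t width, uint32_t height)
{
    size_t luma = (size_t)width * height;

    return luma + luma / 2;
}
```

For 1280×720 this gives 921600 + 460800 = 1382400, matching the reported `sizeimage`. It may be worth noting that 1080 equals 720 × 3 / 2, but that remains an arithmetic observation until the kernel source confirms what `bytesperline` encodes for this format.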
|
||||
### Q4 — DRM modifier round-trip: BLOCKED on hwdownload
|
||||
|
||||
ffmpeg `-hwaccel drm -hwaccel_output_format drm_prime -vf
|
||||
hwmap=mode=read,format=nv12` returns `Failed to map frame: -38`
|
||||
(`Function not implemented`). hwdownload cannot consume the SAND
|
||||
modifier directly.
|
||||
|
||||
ffmpeg's path that DOES work: `-hwaccel drm -c:v hevc` WITHOUT
|
||||
`-hwaccel_output_format drm_prime` lets ffmpeg's internal pipeline pull
|
||||
back, detile (presumably via a Pi-specific helper or libdrm transform),
|
||||
and present NV12 to the next filter. Bit-exact vs SW for the test
|
||||
fixture (1280×720 Main 8-bit) — confirms HW engagement.
|
||||
|
||||
Phase 1 / Phase 4 will need to decide:
|
||||
- Detile in the backend (CPU SIMD), exposing NV12 via VAImage; or
|
||||
- Pass-through DRM_PRIME with SAND modifier and let the consumer
|
||||
(compositor / Firefox) detile. Firefox almost certainly can't, so
|
||||
CPU detile is the safe bet.
|
||||
|
||||
### Q5 — rpi-hevc-dec submission ordering: empirically locked
|
||||
|
||||
`strace -e ioctl` of the kdirect run shows:
|
||||
1. `MEDIA_IOC_DEVICE_INFO` + `MEDIA_IOC_G_TOPOLOGY` (per media node)
|
||||
2. `VIDIOC_QUERYCAP` per video node — `driver="rpi-hevc-dec"` identifies
|
||||
the right one
|
||||
3. `VIDIOC_ENUM_FMT` OUTPUT → S265 only
|
||||
4. `VIDIOC_S_FMT` OUTPUT (HEVC_SLICE, placeholder dims)
|
||||
5. `VIDIOC_REQBUFS` OUTPUT (DMABUF, count=N) — count=6 in kdirect
|
||||
6. `VIDIOC_S_FMT` CAPTURE (NC12, actual dims from SPS parse)
|
||||
7. `VIDIOC_CREATE_BUFS` CAPTURE (DMABUF, count=16)
|
||||
8. `VIDIOC_STREAMON` both queues
|
||||
9. `VIDIOC_QUERY_EXT_CTRL` enumeration
|
||||
10. `VIDIOC_S_EXT_CTRLS` (decode_mode + start_code) — global ctrls
|
||||
11. Per frame: `VIDIOC_S_EXT_CTRLS` (SPS+PPS+decode_params+slice_array,
|
||||
class=0xf010000 = per-request) + `VIDIOC_QBUF` CAPTURE + `VIDIOC_QBUF`
|
||||
OUTPUT (with `V4L2_BUF_FLAG_IN_REQUEST | V4L2_BUF_FLAG_REQUEST_FD`) +
|
||||
`VIDIOC_DQBUF` OUTPUT + `VIDIOC_DQBUF` CAPTURE
|
||||
|
||||
**Two structural notes for the backend:**
|
||||
- OUTPUT + CAPTURE both use `V4L2_MEMORY_DMABUF` in kdirect. Our backend
|
||||
currently uses MMAP for CAPTURE on rkvdec/hantro. For Pi 5 we should
|
||||
either follow kdirect (DMABUF, allows zero-copy DRM_PRIME export) or
|
||||
use MMAP and CPU-detile. Phase 4 design decision.
|
||||
- The order `S_FMT OUTPUT → REQBUFS OUTPUT → S_FMT CAPTURE → CREATE_BUFS
|
||||
CAPTURE → STREAMON` differs from our iter25 rkvdec pre-seed pattern
|
||||
(where SPS via S_EXT_CTRLS must come BEFORE CAPTURE alloc to resolve
|
||||
the image_fmt). rpi-hevc-dec apparently DOESN'T need that pre-seed —
|
||||
CAPTURE S_FMT just takes the explicit NC12 + caller's dims. Confirm
|
||||
in Phase 1 by trying our existing iter25 pre-seed flow against it.
|
||||
|
||||
### Q6 — packaging: Debian 13 trixie, NOT Arch
|
||||
|
||||
higgs runs Debian 13 trixie (`PRETTY_NAME="Debian GNU/Linux 13 (trixie)"`),
|
||||
not Arch ALARM. Phase 8 (per the dev-process Phase 8 packaging rule) for
|
||||
the Pi 5 chapter needs a `debian/` packaging tree, not just a PKGBUILD.
|
||||
|
||||
Decide in Phase 1 whether to:
|
||||
- Add Debian packaging to `marfrit-packages` as a second target, OR
|
||||
- Use distrobox/podman with an Arch ALARM container on higgs for
|
||||
install (test-only, not production), OR
|
||||
- Pi 5 chapter ships a Debian source pkg via gitea / a personal Debian
|
||||
repo.
|
||||
|
||||
### Other new findings from the probe session
|
||||
|
||||
- **ffmpeg 7.1.3 from Debian 13 is built with `--enable-v4l2-request`**
|
||||
— the kdirect path exists. Invocation is `ffmpeg -hwaccel drm -c:v
|
||||
hevc` (not just `-hwaccel drm`; the explicit codec flag matters for
|
||||
the negotiation). Engagement log line is
|
||||
`Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19;
|
||||
buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8`. Per
|
||||
[[hw-decode-engagement-check]], grep for that line to confirm HW path
|
||||
engaged.
|
||||
- **No libva ICD installed on higgs** — only `armada-drm_dri.so` ships,
|
||||
which doesn't apply. We'd be the first VA-API HW path for HEVC on Pi
|
||||
5 once installed.
|
||||
- **mpv is apt-installable** (`mpv 0.40.0-3+deb13u1`) — useful as a
|
||||
pixel-readback verifier once the backend works (`mpv --vo=image` or
|
||||
`--vo=drm`).
|
||||
- **Firefox 145.0.1 + rpi-firefox-mods 20251016 installed** (firefox-esr
|
||||
package status was `rc` = removed but config remains). The mods
|
||||
package likely contains VA-API plumbing prefs.
|
||||
|
||||
### What changes for Phase 1
|
||||
|
||||
- Goal is now phrasable: HEVC bit-exact libva-vs-kdirect on higgs for
|
||||
the 1280×720 Main 8-bit test fixture (same generator as
|
||||
`/tmp/bbb_main.mp4` here). Kdirect engagement signal is the
|
||||
`Hwaccel V4L2 HEVC stateless V4` log line.
|
||||
- Most backend code reuses existing rkvdec/hantro HEVC path: ctrls,
|
||||
per-frame submission, request_fd, multi-device probe pattern.
|
||||
- New code: NC12 video_format entry + detile primitive (sibling to
|
||||
`nv15_unpack_plane_to_p010`) + RPI_HEVC_DEC driver_kind.
|
||||
- Packaging target = Debian, not Arch.
|
||||
@@ -1,230 +0,0 @@
|
||||
# Phase 1+2+3+4 — Pi 5 HEVC chapter (iter40)
|
||||
|
||||
Per [[feedback_dev_process]], Phase 1 (goal), Phase 2 (situation analysis),
|
||||
Phase 3 (baselines), Phase 4 (plan) for adding rpi-hevc-dec as a third
|
||||
multi-device-probe slot in `libva-v4l2-request-fourier`. Phase 0 substrate
|
||||
+ open-question answers live at `phase0_pi5_hevc.md`.
|
||||
|
||||
## Phase 1 — Goal
|
||||
|
||||
> **libva-v4l2-request-fourier on higgs** decodes HEVC Main 8-bit input
|
||||
> producing NV12 output **bit-exact vs kdirect** for three reference
|
||||
> fixtures (640×360, 1280×720, 1920×1080 — Main profile, libx265
|
||||
> ultrafast). HW path engagement verified via the kernel-driver lsof
|
||||
> signal (`/dev/video19` open) AND ffmpeg-vaapi engagement signal
|
||||
> (`Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19`).
|
||||
|
||||
Measurable:
|
||||
|
||||
| Criterion | Metric |
|---|---|
| C1: vainfo enumeration | `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain : VAEntrypointVLD` |
| C2: bit-exact decode | sha256 of libva NV12 output == sha256 of kdirect NV12 output, per fixture, N=1 |
| C3: HW engagement | `lsof` shows `/dev/video19` open by ffmpeg-vaapi during libva run |
| C4: Stability under N=3 | C2 holds at N=3 repeated runs (deterministic) |
| C5: Sibling baseline preserved | fresnel iter38 5/5 still PASS post-iter40 (no regression to rkvdec/hantro path) |
|
||||
|
||||
Out of scope this iter: Main10 (10-bit / NC30), VP9, AV1, Firefox VA-API
|
||||
engagement testing, performance benchmarks. All later chapters.
|
||||
|
||||
## Phase 2 — Situation Analysis
|
||||
|
||||
### Backend architecture already in place
|
||||
|
||||
- **Multi-device probe (iter38)**: at `VA_DRIVER_INIT` opens both
|
||||
`rkvdec` + `hantro-vpu` via `find_decoder_device_by_driver(name)`.
|
||||
Stores per-driver fds (`video_fd_{rkvdec,hantro}`,
|
||||
`media_fd_{rkvdec,hantro}`). `RequestCreateConfig` retargets the
|
||||
"active" `driver_data->{video,media}_fd` per profile via
|
||||
`request_switch_device_for_profile()` (request.c:426-478).
|
||||
- **Per-driver feature gating**: `request_data->has_hevc_ext_sps_rps_{rkvdec,hantro}`
|
||||
pair, with `h265_set_controls` consulting the per-fd flag. Established
|
||||
by iter2 / Phase 5 review (request.h:99-100). This is the canonical
|
||||
per-driver gating shape for iter40.
|
||||
- **HEVC ctrl population**: `h265_set_controls` populates the standard
|
||||
`V4L2_CID_STATELESS_HEVC_*` set (h265.c). Probe-gates EXT_SPS_*_RPS
|
||||
via the iter2 path — naturally dormant for rpi-hevc-dec since the
|
||||
controls don't exist.
|
||||
- **Synthetic SPS pre-seed (iter25/26)**: needed for rkvdec to resolve
|
||||
`image_fmt` before CAPTURE alloc. Phase 0 strace shows rpi-hevc-dec
|
||||
does NOT need this — it accepts NC12 + explicit dims on `S_FMT
|
||||
CAPTURE` directly. The pre-seed code path stays in place for rkvdec;
|
||||
rpi-hevc-dec just doesn't trigger it (gate on driver_kind).
|
||||
- **CAPTURE detile primitive**: `nv15_unpack_plane_to_p010()` (nv15.c)
|
||||
is the template — backend already CPU-detiles when a Pi-or-Rockchip-
|
||||
specific CAPTURE format meets a linear consumer (VAImage NV12 / P010).
|
||||
- **Single-plane (S) vs multi-plane (M) handling**: hantro uses MPLANE,
|
||||
rkvdec uses both depending on codec. rpi-hevc-dec exposes MPLANE for
|
||||
BOTH OUTPUT (HEVC_SLICE) and CAPTURE (NC12) per the strace. iter38
|
||||
already supports MPLANE handling for hantro; rpi reuses that.
|
||||
|
||||
### Surface area to touch (audit)
|
||||
|
||||
| File | What changes | Size |
|------|--------------|------|
| `src/request.h` | Add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`, `has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout) | ~10 lines |
| `src/request.c` | (a) Extend init -1 block to cover new fds. (b) Recognize `rpi-hevc-dec` as a 3rd primary/alt driver string in the probe loop. (c) Extend `request_device_kind_for_profile` so HEVC→'p' when rpi-hevc-dec is present, else 'r'. (d) Extend `request_switch_device_for_profile` 'p' branch. (e) Probe HEVC ext_sps on the new fd (will be false, mirrors hantro entry). | ~80 lines |
| `src/video.c` | Add `V4L2_PIX_FMT_NV12_COL128` (NC12) `video_format` entry: 4:2:0, planes=1, alignment via dedicated bytesperline/sizeimage formula. NOT marked linear. | ~20 lines |
| `src/nv12_col128.c` (NEW) | `nv12_col128_detile_to_nv12()`: Y plane + UV plane detile primitive. Adapted from ffmpeg/Kynesim `av_rpi_sand_to_planar_y8` core math. Header doc traces back to videodev2.h docstring + raspberrypi/linux `hevc_dec/hevc_d_video.c` size formula. | ~80 lines + 30-line header |
| `src/image.c` | Add NC12 → NV12 branch in `copy_surface_to_image`, gated on `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` (sibling to existing NV15→P010 branch). | ~25 lines |
| `src/meson.build` + `src/Makefile.am` | List `nv12_col128.c`/`.h` in sources | 2 lines |
|
||||
|
||||
Total estimated diff: ~250 LoC backend + ~100 LoC standalone primitive.
|
||||
Roughly half the surface area of iter38; smaller than iter2.
|
||||
|
||||
### What does NOT change
|
||||
|
||||
- iter25/26 SPS pre-seed: stays on rkvdec path only (gated by
|
||||
driver_kind check that's already implicit in the rkvdec fd routing).
|
||||
- iter2 EXT_SPS plumbing: probe-gated off on rpi-hevc-dec; vendored
|
||||
GStreamer parser unused. Confirmed via the EINVAL on ctrl 0xa97.
|
||||
- iter31 α-29 slice_params st_rps_bits: APPLIES to rpi-hevc-dec
|
||||
unchanged. Same plumbing.
|
||||
- iter33 VP8 hantro start-code prepend: not relevant (rpi-hevc-dec is
|
||||
HEVC-only; VP8 still goes through hantro on RK).
|
||||
- iter38 single-libva-session multi-codec semantics: extends from 5
|
||||
codecs to 5+1 (HEVC reroutes on Pi).
|
||||
|
||||
### NC12 / SAND128 tile geometry — locked contract
|
||||
|
||||
From kernel driver `drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c`
|
||||
(via [[github raspberrypi/linux rpi-6.12.y]]):
|
||||
|
||||
```c
case V4L2_PIX_FMT_NV12_COL128:
	width = ALIGN(width, 128);	/* Width rounds up to columns */
	height = ALIGN(height, 8);
	bytesperline = constrain2x(bytesperline, height * 3 / 2);
	sizeimage = bytesperline * width;
	break;
```
|
||||
|
||||
For 1280×720:
|
||||
- width = 1280 (already 128-aligned)
|
||||
- height = 720 (already 8-aligned)
|
||||
- bytesperline = 720 × 3/2 = **1080** (matches Phase 0 strace observation)
|
||||
- sizeimage = 1080 × 1280 = **1,382,400** (matches strace; equals the linear NV12 byte count here only because 1280 and 720 are already aligned, so no column or row padding is added)
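
The same arithmetic can be sketched for arbitrary dimensions. This is an illustration under one stated assumption: `constrain2x()` is taken to leave `bytesperline` at its minimum, `ALIGN(height, 8) * 3 / 2`, when the caller requests nothing larger. The struct and function names are hypothetical, not from the kernel or the backend.

```c
#include <stdint.h>

#define ALIGN_UP(v, a) (((v) + (a) - 1) / (a) * (a))

struct nc12_geometry {
	uint32_t aligned_width;  /* multiple of 128 (whole columns) */
	uint32_t aligned_height; /* multiple of 8 */
	uint32_t bytesperline;   /* column stride: Y rows + CbCr rows */
	uint32_t sizeimage;
};

/* Assumption: bytesperline lands on its minimum value. */
static struct nc12_geometry nc12_geometry_for(uint32_t width, uint32_t height)
{
	struct nc12_geometry g;

	g.aligned_width = ALIGN_UP(width, 128u);
	g.aligned_height = ALIGN_UP(height, 8u);
	g.bytesperline = g.aligned_height * 3 / 2;
	g.sizeimage = g.bytesperline * g.aligned_width;
	return g;
}
```

For 1280×720 this reproduces the 1080 / 1,382,400 pair above; for a non-aligned width such as 1366 the aligned width becomes 1408 and `sizeimage` exceeds the linear NV12 size.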
|
||||
|
||||
**Geometry interpretation** (cross-verified against ffmpeg/Kynesim
|
||||
`rpi_sand_fn_pw.h` `av_rpi_sand_to_planar_y8`):
|
||||
- Image is divided into `(width + 127) / 128` columns; each column is
|
||||
**128 px wide × height px tall**.
|
||||
- Within a column: `128 × height` bytes of Y data, immediately followed
|
||||
by `128 × height/2` bytes of interleaved CbCr (so 128 × `bytesperline`
|
||||
bytes per column, where `bytesperline` is the column stride).
|
||||
- Across columns: column N starts at offset `N × stride1 × stride2`
|
||||
where `stride1 = 128` (column width) and `stride2 = bytesperline`.
|
||||
- **Pixel (x, y) byte offset = `(x & 127) + y × 128 + (x & ~127) × bytesperline`**
  for Y; the same formula with `y/2` addresses the UV row, shifted by
  `128 × height` (8-aligned height) inside the column, because each column
  stores its CbCr block directly after its own Y block rather than after
  all Y columns.
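
As a worked form of the addressing rule, here is a small sketch of the two offset computations. It follows the within-column layout from the second bullet (Y block, then CbCr block, per column). The names `nc12_y_offset` and `nc12_uv_offset` are hypothetical, and `stride2` stands for the V4L2 `bytesperline`.

```c
#include <stddef.h>
#include <stdint.h>

#define NC12_COL_WIDTH 128u

/* Byte offset of luma sample (x, y). */
static size_t nc12_y_offset(uint32_t x, uint32_t y, uint32_t stride2)
{
	return (size_t)(x & (NC12_COL_WIDTH - 1)) +
	       (size_t)y * NC12_COL_WIDTH +
	       (size_t)(x & ~(NC12_COL_WIDTH - 1)) * stride2;
}

/* Byte offset of the interleaved CbCr byte at byte-column x of chroma
 * row cy (cy = y / 2). aligned_height is the 8-aligned luma height, so
 * the CbCr block starts 128 * aligned_height bytes into each column. */
static size_t nc12_uv_offset(uint32_t x, uint32_t cy, uint32_t stride2,
			     uint32_t aligned_height)
{
	return nc12_y_offset(x, cy, stride2) +
	       (size_t)NC12_COL_WIDTH * aligned_height;
}
```

With `stride2 = 1080` (the 1280×720 case), the first sample of column 1 sits at byte 138,240 and the first CbCr byte of column 0 at byte 92,160.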
|
||||
|
||||
Reference for the detile loop: `av_rpi_sand_to_planar_y8` (Kynesim
|
||||
ffmpeg, `libavutil/rpi_sand_fn_pw.h` with PW=1). Our primitive copies
|
||||
the single-stripe fast-path math; we don't import NEON ASM (CPU
|
||||
detile is the safe path for Phase 1; SIMD a Phase 2 perf bump if needed).
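
A plain-C version of that per-column, per-row copy could look like the sketch below. It is not the Kynesim code and not the backend's final primitive; `sand128_rows_to_linear` is a hypothetical name, and the function assumes the caller passes a pointer to the first row of the wanted block inside column 0 (the Y block, or the CbCr block `128 × aligned_height` bytes further in).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one SAND128 block (Y or interleaved CbCr) into a linear plane.
 * stride2 is the V4L2 bytesperline, so one column spans 128 * stride2
 * bytes. width_bytes is the visible row width in bytes; the last column
 * is clamped so padding is never written to dst. */
static void sand128_rows_to_linear(uint8_t *dst, size_t dst_pitch,
				   const uint8_t *src, uint32_t stride2,
				   uint32_t width_bytes, uint32_t rows)
{
	for (uint32_t col_x = 0; col_x < width_bytes; col_x += 128) {
		uint32_t left = width_bytes - col_x;
		uint32_t n = left < 128 ? left : 128;
		const uint8_t *col = src + (size_t)col_x * stride2;

		for (uint32_t y = 0; y < rows; y++)
			memcpy(dst + (size_t)y * dst_pitch + col_x,
			       col + (size_t)y * 128, n);
	}
}
```

For NV12 output the Y plane would use `rows = height` and the CbCr plane `rows = height / 2`, both with `width_bytes = width`, since the interleaved chroma row is as wide in bytes as the luma row.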
|
||||
|
||||
## Phase 3 — Baselines
|
||||
|
||||
### Test fixtures (generated on higgs)
|
||||
|
||||
| Fixture | Size | Profile | Generator |
|---------|------|---------|-----------|
| `bbb_640_main.mp4` | 640×360 | Main 8-bit | `ffmpeg -f lavfi -i testsrc=duration=2 -pix_fmt yuv420p -c:v libx265 -preset ultrafast -profile:v main` |
| `bbb_1280_main.mp4` | 1280×720 | Main 8-bit | same |
| `bbb_1920_main.mp4` | 1920×1080 | Main 8-bit | same |
|
||||
|
||||
### Captured 2026-05-17 evening on higgs
|
||||
|
||||
For each fixture, N=3 reps. Both SW (no hwaccel) and kdirect
(`ffmpeg -hwaccel drm -c:v hevc`) decode with
`-frames:v 10 -f rawvideo -pix_fmt nv12`; the values below are the first
16 hex characters of each output's sha256:
|
||||
|
||||
```
bbb_640_main   SW={9a81038065e9b7cd} HW={9a81038065e9b7cd} → BIT-EXACT × N=3
bbb_1280_main  SW={d3bb055655d6f195} HW={d3bb055655d6f195} → BIT-EXACT × N=3
bbb_1920_main  SW={0bc2bd6f693db039} HW={0bc2bd6f693db039} → BIT-EXACT × N=3
```
|
||||
|
||||
HW engagement signal (per-run): `Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19; buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8`
|
||||
|
||||
This is the kdirect baseline. Phase 7 verification will compare libva
|
||||
output against these SHAs.
|
||||
|
||||
### Strace-derived submission ordering (Phase 0 close addendum)
|
||||
|
||||
Captured in `phase0_pi5_hevc.md`. Briefly: standard V4L2-request
|
||||
stateless flow, both queues DMABUF, no SPS pre-seed dance needed
|
||||
(rpi-hevc-dec accepts NC12 + dims directly on CAPTURE S_FMT).
|
||||
|
||||
## Phase 4 — Plan
|
||||
|
||||
### Implementation steps (sequenced)
|
||||
|
||||
1. **`request.h`**: extend `request_data` with the new fd pair + ext_sps
|
||||
flag, mirroring iter38/iter2 layout. (no behavior change yet)
|
||||
2. **`request.c`**:
|
||||
- `find_decoder_device_by_driver("rpi-hevc-dec", ...)` accepts new
|
||||
driver string.
|
||||
- Init -1 block extends to new fds.
|
||||
- Probe loop: if primary is `rkvdec` or `hantro-vpu`, also probe
|
||||
`rpi-hevc-dec` (third slot). On Pi 5 there's no `rkvdec` or
|
||||
`hantro-vpu`, so primary becomes `rpi-hevc-dec` and the alt-probes
|
||||
for the other two return absent (their fds stay -1).
|
||||
- `request_device_kind_for_profile`: when profile is `VAProfileHEVCMain`,
|
||||
prefer `'p'` (rpi-hevc-dec) IF `video_fd_rpi_hevc_dec >= 0`, else
|
||||
fall through to `'r'` (rkvdec). All other profiles stay routed as
|
||||
today.
|
||||
- `request_switch_device_for_profile`: add `'p'` branch.
|
||||
- ext_sps probe runs on the new fd; result stored in
|
||||
`has_hevc_ext_sps_rps_rpi_hevc_dec`. Will be false (controls absent).
|
||||
3. **`video.c`**: add NC12 video_format entry. Mark it MPLANE-only (per
|
||||
Phase 0 strace). bytesperline/sizeimage formula encoded per kernel
|
||||
driver math.
|
||||
4. **`src/nv12_col128.c` + `.h`** (NEW): single-file primitive,
|
||||
`nv12_col128_detile_to_nv12(dst_y, dst_uv, src_y, src_uv, width,
|
||||
height, src_stride2)`. CPU per-column row-memcpy loop; not NEON
|
||||
for Phase 1 (correctness first). Self-test in `tests/test_nv12_col128_detile.c`.
|
||||
5. **`image.c`**: branch in `copy_surface_to_image`. Gate:
|
||||
`image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`.
|
||||
Calls the primitive. Existing NV12-linear path stays.
|
||||
6. **`meson.build` + `Makefile.am`**: source list updates.
|
||||
7. **Build clean on higgs** — first build target IS higgs (since iter40
|
||||
only matters on Pi). Cross-build for ampere/fresnel is unaffected
|
||||
because they don't have rpi-hevc-dec — the new fd stays -1 and the
|
||||
per-driver routing falls through to existing rkvdec/hantro paths.
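
The HEVC routing rule from step 2 can be sketched as a pure function, which makes the 0/1/2/3-drivers-present cases easy to reason about. The struct and function names here are hypothetical stand-ins for `request_data` and `request_device_kind_for_profile`, and returning `0` when neither decoder is present is an assumption, not something the plan specifies.

```c
/* Hypothetical subset of the per-driver fds kept by the backend. */
struct sketch_fds {
	int video_fd_rkvdec;
	int video_fd_hantro;
	int video_fd_rpi_hevc_dec;
};

/* HEVC only: prefer 'p' (rpi-hevc-dec) when its fd is valid, otherwise
 * fall through to 'r' (rkvdec). Returns 0 if neither is available
 * (assumed behaviour for this sketch). */
static char sketch_hevc_device_kind(const struct sketch_fds *fds)
{
	if (fds->video_fd_rpi_hevc_dec >= 0)
		return 'p';
	if (fds->video_fd_rkvdec >= 0)
		return 'r';
	return 0;
}
```

On a fresnel-like system (rkvdec + hantro, no rpi) this yields `'r'`; on a higgs-like system (rpi only) it yields `'p'`; with both present, rpi-hevc-dec wins.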
|
||||
|
||||
### Verification gates (Phase 7 acceptance)
|
||||
|
||||
- Build cleanly on higgs (Debian 13 trixie, libva-dev 2.22.0-3,
|
||||
libdrm-dev 2.4.131).
|
||||
- Local-install the resulting `.so` to `/usr/lib/aarch64-linux-gnu/dri/v4l2_request_drv_video.so`.
|
||||
- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`.
|
||||
- For each Phase 3 fixture: libva output SHA == kdirect SHA (the Phase 3
|
||||
recorded value).
|
||||
- `lsof` during libva decode shows `/dev/video19` open.
|
||||
- Sibling regression check: fresnel `phase7_iter39_test_rig` equivalent
|
||||
still 5/5 PASS (no regression to existing routing).
|
||||
|
||||
### Risks + mitigations
|
||||
|
||||
| Risk | Mitigation |
|------|-----------|
| NC12 detile math wrong → libva ≠ kdirect | Tight unit test in `tests/test_nv12_col128_detile.c` with hand-crafted NC12 bytes + known linear output, before integration. |
| `request_switch_device_for_profile` falls through wrong path on systems with BOTH rkvdec AND rpi-hevc-dec | Prefer rpi-hevc-dec for HEVC when present. Explicit comment in switch. Test on fresnel = no rpi → falls to 'r'; test on higgs = no rkvdec → falls to 'p'. |
| Debian build env differs from Arch (see [[feedback_package_build_flags_unmask_bugs]]) | Build with explicit `-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-strong` flags to match Debian dpkg-buildflags. |
| Synthetic SPS pre-seed accidentally fires on rpi-hevc-dec | Gate on `driver_kind != 'p'` in the pre-seed call site. Verify via strace: pre-seed ioctl pattern absent. |
| iter2 EXT_SPS path accidentally engages on rpi | Already probe-gated; `has_hevc_ext_sps_rps_rpi_hevc_dec` = false naturally. |
|
||||
|
||||
### Phase 5 review explicitly requested
|
||||
|
||||
Per CLAUDE.md global "Reviews are never skippable" + [[feedback_review_empirical_over_theoretical]]:
|
||||
this plan goes to a sonnet Plan-agent review. Specific review focus:
|
||||
- Routing correctness when 0/1/2/3 of the three drivers are present.
|
||||
- NC12 geometry: did we copy ffmpeg's per-row memcpy math correctly?
|
||||
Did we miss UV stride considerations?
|
||||
- `image.c` gate predicate — does it exclude any legitimate NV12-linear
|
||||
case on Pi? (No: rpi only exposes NC12/NC30 CAPTURE, no plain NV12.)
|
||||
- Cross-device regression scope (fresnel + ampere paths untouched?).
|
||||
|
||||
Empty-result review IS a green light; "we should have skipped it" is the
|
||||
prohibited move.
|
||||
@@ -1,194 +0,0 @@
|
||||
# Phase 5 review — iter40 plan (sonnet review + amendments)
|
||||
|
||||
Reviewer verdict: **yellow** — plan substantively sound, 3 concrete blockers
|
||||
+ 1 fixture gap + 1 verification-only note. All findings verified empirically
|
||||
against current source (per [[feedback_review_empirical_over_theoretical]])
|
||||
BEFORE accepting into the amended plan.
|
||||
|
||||
## Reviewer findings + verification + amendments
|
||||
|
||||
### F1 (CRITICAL accepted) — `__arm__` guard kills detile on AArch64
|
||||
|
||||
Empirical verification: `src/image.c` lines 239 + 268 wrap the entire
|
||||
per-format detile dispatch (incl. `nv15_unpack_plane_to_p010`) in
|
||||
`#ifdef __arm__`. Pi 5 / fresnel / ampere are all AArch64 → guard never
|
||||
fires → both NC12 detile (proposed) AND existing NV15→P010 unpack
|
||||
(iter39) are silently dead code on aarch64. iter39 5/5 PASS on fresnel
|
||||
was bit-exact for 8-bit codecs only; the 10-bit detile path was never
|
||||
exercised, so the dead-code didn't manifest as a failure.
|
||||
|
||||
**Amendment:** Phase 6 step 5 first sub-action — change guard at lines
|
||||
239 + 268 from `#ifdef __arm__` to `#if defined(__arm__) || defined(__aarch64__)`.
|
||||
This re-enables the existing NV15→P010 detile AND lets the new NC12
|
||||
detile branch execute. No semantic change on x86 (no detile primitives
|
||||
compiled there). Add explicit comment crediting Phase 5 review + this
|
||||
finding.
|
||||
|
||||
### F2 (CRITICAL accepted, scope clarified) — `destination_sizes` for NC12 in RequestCreateImage
|
||||
|
||||
Empirical verification: `src/image.c` lines 90-115 already recompute
|
||||
`destination_bytesperlines[0]` + `destination_sizes[0]` for `P010`
|
||||
(line 90: `destination_bytesperlines[0] = width * 2`). The fall-through
|
||||
"NV12" branch (line 108) uses V4L2-reported stride directly, which for
|
||||
NC12 source is the column-stride 1080, not the linear Y stride 1280.
|
||||
That breaks the `pitches[0]` value that VAImage consumers expect.
|
||||
|
||||
`context.c` lines 379-383 — `destination_sizes[0] = destination_bytesperlines[0] * format_height` — IS used at cap_pool init time to size the
|
||||
CAPTURE buffer's MMAP region accounting in `driver_data->fmt_sizes[]`.
|
||||
For NC12: 1080 × 720 = 777600 vs actual `sizeimage` 1382400. cap_pool
|
||||
allocates the actual `sizeimage` via REQBUFS, so the underlying buffer
|
||||
is correctly sized; `fmt_sizes[]` is just a back-cache for later access
|
||||
patterns that don't go through the kernel-reported value.
|
||||
|
||||
**Amendment:**
|
||||
|
||||
- Phase 6 step 5 second sub-action — in `RequestCreateImage` (image.c
|
||||
~line 107, the "else" / NV12 branch), add detection: if the source
|
||||
CAPTURE format is `V4L2_PIX_FMT_NV12_COL128` AND the requested image
|
||||
format is `VA_FOURCC_NV12`, override `destination_bytesperlines[0] =
|
||||
width` (linear NV12 Y stride). `destination_sizes[0]` then computes
|
||||
to `width × format_height` (correct linear Y plane size). Existing
|
||||
NV12-source linear path unchanged.
|
||||
- Phase 6 step 3 video.c — set `v4l2_buffers_count = 1` for NC12 (single
|
||||
contiguous buffer holding Y+UV) and document this is the planes-1
|
||||
multi-plane case (similar to NV12 MPLANE).
|
||||
- context.c lines 380-383 (`destination_sizes[0] = bytesperlines * height`)
|
||||
stays AS-IS for now. It only affects cap_pool MMAP accounting which
|
||||
uses the kernel-reported `sizeimage` via REQBUFS anyway. If a future
|
||||
bug emerges from this mismatch on the rkvdec/hantro side, address
|
||||
then; not a blocker for iter40 NC12.
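
To make the stride mismatch concrete, the linear NV12 layout a VAImage consumer expects can be sketched as follows. The struct and function names are hypothetical; the sketch assumes an even height and a tightly packed image (pitch equal to width).

```c
#include <stdint.h>

struct nv12_linear_layout {
	uint32_t pitch;      /* bytes per row, same for Y and CbCr */
	uint32_t y_size;     /* bytes in the Y plane */
	uint32_t uv_offset;  /* where the CbCr plane starts */
	uint32_t total_size;
};

/* Linear NV12: Y plane of width * height bytes, followed by an
 * interleaved CbCr plane of width * (height / 2) bytes. */
static struct nv12_linear_layout nv12_linear_layout_for(uint32_t width,
							uint32_t height)
{
	struct nv12_linear_layout l;

	l.pitch = width;
	l.y_size = width * height;
	l.uv_offset = l.y_size;
	l.total_size = l.y_size + width * (height / 2);
	return l;
}
```

For 1280×720 the pitch is 1280 and the Y plane is 921,600 bytes, whereas reusing the NC12 column stride would give 1080 × 720 = 777,600, the undersized figure noted above.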
|
||||
|
||||
### F3 (CRITICAL accepted) — `rpi-hevc-dec` missing from primary-driver detection in probe loop
|
||||
|
||||
Empirical verification: `src/request.c` lines 647-657 only have `else if`
|
||||
branches for `rkvdec` and `hantro-vpu`. On higgs (no rkvdec, no hantro)
|
||||
the primary device IS `rpi-hevc-dec`, but neither branch matches → no
|
||||
`primary_driver` set → no fds stored into the new
|
||||
`video_fd_rpi_hevc_dec` slot → routing silently no-ops with -1 fds.
|
||||
|
||||
**Amendment:** Phase 6 step 2 sub-action — add explicit `else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the primary-driver
|
||||
detection block. Sets `video_fd_rpi_hevc_dec = video_fd` + `media_fd_rpi_hevc_dec = media_fd`. Pi has no alt — `alt_driver` stays NULL,
|
||||
no second-decoder probe runs for higgs. (rkvdec + hantro alt-probes
|
||||
remain dead on higgs because the find_decoder_device_by_driver walk
|
||||
returns absent for them.)
|
||||
|
||||
Also: extend `find_decoder_device_by_driver`'s driver-name table at
|
||||
request.c:94-95 if needed to include `rpi-hevc-dec` — verify it's a
|
||||
free-form string match (it is, per the code), not a hard table — so the
|
||||
caller passes `"rpi-hevc-dec"` and the walk just looks for it.
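
The shape of the amended primary-driver detection can be sketched like this. It is illustrative only: the enum and function name are hypothetical, and the real code compares `info.driver` from `VIDIOC_QUERYCAP` (a byte array that needs a cast) and stores fds rather than returning a slot.

```c
#include <string.h>

enum primary_slot {
	SLOT_NONE,
	SLOT_RKVDEC,
	SLOT_HANTRO,
	SLOT_RPI_HEVC_DEC,
};

/* Map a V4L2 driver name to the fd slot it should populate. Without the
 * third branch, a Pi 5 would fall through to SLOT_NONE and routing
 * would silently run with -1 fds. */
static enum primary_slot classify_primary_driver(const char *driver)
{
	if (strcmp(driver, "rkvdec") == 0)
		return SLOT_RKVDEC;
	else if (strcmp(driver, "hantro-vpu") == 0)
		return SLOT_HANTRO;
	else if (strcmp(driver, "rpi-hevc-dec") == 0)
		return SLOT_RPI_HEVC_DEC;
	return SLOT_NONE;
}
```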
|
||||
|
||||
### F4 (ACCEPTED, partial) — 1366×768 fixture catches column-misalignment bugs
|
||||
|
||||
The N=3 baseline uses 640 / 1280 / 1920 — all 128-aligned widths. A
|
||||
1366-wide fixture exercises the `ALIGN(width, 128) → 1408` column
|
||||
padding path. The right-edge 42 pixels (cols 1366-1407) are padding;
|
||||
the detile primitive must not write past the requested width.
|
||||
|
||||
**Amendment:** Phase 7 sub-action — add `bbb_1366_main.mp4` (1366×768)
|
||||
to the Phase 7 verification set. Phase 3 baseline retroactively
|
||||
captured at Phase 7 time. Goal: same kdirect/SW bit-exact PASS at
|
||||
N=1 (no need to redo the deterministic N=3 — one rep proves the
|
||||
edge-case). If libva differs from kdirect on 1366 but matches on
|
||||
1280/1920, the detile column-base math is buggy.
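
The edge case the 1366-wide fixture targets comes down to clamping the copy width in the last column. A minimal sketch, with hypothetical helper names:

```c
#include <stdint.h>

/* Number of 128-byte-wide columns covering the visible width. */
static uint32_t nc12_num_columns(uint32_t width)
{
	return (width + 127) / 128;
}

/* Visible bytes to copy per row from a given column; the remainder of
 * the column is padding and must not reach the linear destination. */
static uint32_t nc12_copy_bytes(uint32_t width, uint32_t column)
{
	uint32_t start = column * 128;

	if (start >= width)
		return 0;
	return width - start < 128 ? width - start : 128;
}
```

For width 1366 there are 11 columns (1408 bytes of column width); the last column contributes 86 visible bytes per row and the remaining 42 are padding.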
|
||||
|
||||
### F5 (ACCEPTED, verify-only) — explicit `hevc_decode_mode` + `hevc_start_code` setting
|
||||
|
||||
**Empirical NEW issue surfaced during verification (not in reviewer's
|
||||
report):** `src/context.c` lines 516-528 unconditionally set
|
||||
`V4L2_CID_STATELESS_HEVC_START_CODE` to `_ANNEX_B` (value 1) AND
|
||||
prepends `0x00 0x00 0x01` start codes to each slice payload (per the
|
||||
H.264 mirror block at line 532+). But Phase 0 strace shows kdirect uses
|
||||
`start_code=0` = `_NONE` and submits raw NAL slice payload WITHOUT start
|
||||
codes.
|
||||
|
||||
Both modes are in rpi-hevc-dec's menu range (min=0 max=1). Open
|
||||
question: does rpi-hevc-dec correctly parse start-code-prepended
|
||||
payload when in ANNEX_B mode? Two possibilities:
|
||||
(a) Yes — driver implements both modes, ANNEX_B works, libva PASSes
|
||||
bit-exact in our default code path.
|
||||
(b) No — driver only really implements NONE; ANNEX_B is a degenerate
|
||||
menu entry; we'd need per-driver gating to send `_NONE` for
|
||||
rpi-hevc-dec + suppress start-code prepend.
|
||||
|
||||
**Amendment:** Phase 7 — verify empirically via the first libva-vs-kdirect
|
||||
diff. If (a), no code change needed. If (b), add per-driver gate around
|
||||
the START_CODE set (mirror rkvdec/hantro pattern). Don't pre-emptively
|
||||
gate; let empiricism decide.
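
If option (b) turns out to be the case, the per-driver gate amounts to choosing whether the slice payload gets a start-code prefix. A sketch of that choice, with hypothetical names (`sketch_start_code`, `pack_slice`); the 0/1 values mirror the NONE / ANNEX_B menu values quoted above:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum sketch_start_code {
	SKETCH_START_CODE_NONE = 0,    /* raw NAL payload (kdirect) */
	SKETCH_START_CODE_ANNEX_B = 1, /* 00 00 01 prefix (current default) */
};

/* Copy one slice NAL into dst, optionally prefixed with a start code.
 * Returns the number of bytes written, or 0 if dst is too small. */
static size_t pack_slice(uint8_t *dst, size_t dst_cap,
			 const uint8_t *nal, size_t nal_len,
			 enum sketch_start_code mode)
{
	static const uint8_t prefix[3] = { 0x00, 0x00, 0x01 };
	size_t off = 0;

	if (mode == SKETCH_START_CODE_ANNEX_B) {
		if (dst_cap < sizeof(prefix))
			return 0;
		memcpy(dst, prefix, sizeof(prefix));
		off = sizeof(prefix);
	}
	if (dst_cap - off < nal_len)
		return 0;
	memcpy(dst + off, nal, nal_len);
	return off + nal_len;
}
```

The mode passed here would have to match the value written to the start-code control for the same device, otherwise the driver parses the payload with the wrong framing.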
|
||||
|
||||
### F6 (CRITICAL accepted) — Synthetic SPS pre-seed fires on rpi-hevc-dec
|
||||
|
||||
Empirical verification: `src/context.c` lines 277-346 — the iter25
|
||||
synthetic-SPS injection block runs for `VAProfileHEVCMain` regardless
|
||||
of active driver_kind. On higgs, `driver_data->video_fd` will be
|
||||
`video_fd_rpi_hevc_dec` at this point → `v4l2_set_controls(...SPS...)`
|
||||
fires on rpi-hevc-dec. Phase 0 strace shows rpi-hevc-dec doesn't need
|
||||
this AND uses a different submission ordering (S_FMT_OUTPUT → REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON, then global
ctrls, then per-frame ctrls).
|
||||
|
||||
The pre-seed is wrapped in `(void)v4l2_set_controls(...)` — failure is
|
||||
silently ignored, BUT the call may also succeed in an unintended way
|
||||
on rpi-hevc-dec (it has the HEVC_SPS ctrl), potentially leaving its
|
||||
internal state stuck on the dummy SPS until the first real per-frame
|
||||
SPS arrives.
|
||||
|
||||
**Amendment:** Phase 6 step 2 sub-action — gate the synthetic-SPS
|
||||
injection block at context.c:277 with
|
||||
`if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec)`. The
|
||||
pre-seed only fires when active fd is NOT rpi-hevc-dec. rkvdec /
|
||||
hantro paths unchanged.
|
||||
|
||||
### F7 (No findings) — `image.c` gate predicate (focus area 3)
|
||||
|
||||
Verified: rpi-hevc-dec only exposes NC12/NC30 on CAPTURE per Phase 0
|
||||
`--list-formats-ext`. No legitimate NV12-linear case exists on Pi. Gate
|
||||
predicate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` is sound — fires only when
|
||||
both conditions hold, excludes legitimate NV12-linear on RK / Allwinner.
|
||||
|
||||
### F8 (No findings) — cross-device regression scope (focus area 4)
|
||||
|
||||
Verified: new fd fields initialise to -1; probe loop extensions are
|
||||
additive (no-op when string doesn't match); `request_device_kind_for_profile`'s 'p' branch only fires when `video_fd_rpi_hevc_dec >= 0`;
|
||||
new video.c entry is additive. fresnel + ampere paths unchanged.
|
||||
|
||||
## Final amended Phase 6 step list
|
||||
|
||||
1. `src/request.h` — add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`,
|
||||
`has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout).
|
||||
2. `src/request.c` — (a) extend init -1 block; (b) **add `else if
|
||||
(strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in primary-driver
|
||||
detection** [F3]; (c) extend `request_device_kind_for_profile` so
|
||||
HEVC→'p' when rpi present, else 'r'; (d) extend `request_switch_device_for_profile` 'p' branch; (e) probe ext_sps on new fd.
|
||||
3. `src/context.c` — **gate synthetic-SPS pre-seed (lines 277-346) on
|
||||
`driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec`** [F6].
|
||||
4. `src/video.c` — NC12 entry with `v4l2_buffers_count=1`,
|
||||
`v4l2_mplane=true`, NOT marked linear.
|
||||
5. `src/image.c`:
|
||||
- **Extend `#ifdef __arm__` guards (lines 239, 268) to `#if defined(__arm__) || defined(__aarch64__)`** [F1].
|
||||
- **Add NC12 detection in RequestCreateImage** (line 107 area): if
|
||||
source format is NC12 + VAImage format is NV12, override
|
||||
`destination_bytesperlines[0] = width` [F2].
|
||||
- **Add NC12 detile branch in `copy_surface_to_image`** (line 238+):
|
||||
gate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`; call new detile primitive.
|
||||
6. `src/nv12_col128.c` + `.h` (NEW) — detile primitive.
|
||||
7. `tests/test_nv12_col128_detile.c` (NEW) — unit test with hand-crafted
|
||||
NC12 bytes + known linear output.
|
||||
8. `src/meson.build` + `src/Makefile.am` — source list updates.
|
||||
9. Build clean on higgs; if `tests/` doesn't auto-run, run manually.
|
||||
|
||||
## Final amended Phase 7 verification
|
||||
|
||||
- Build cleanly on higgs.
|
||||
- Local install `.so` to `/usr/lib/aarch64-linux-gnu/dri/`.
|
||||
- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`.
|
||||
- Phase 3 fixtures (640 / 1280 / 1920) + new 1366×768 fixture: libva
|
||||
output SHA == kdirect SHA [F4].
|
||||
- `lsof` during libva decode shows `/dev/video19` open.
|
||||
- `strace -e ioctl` shows pre-seed pattern ABSENT on rpi-hevc-dec [F6
|
||||
verification].
|
||||
- HEVC_START_CODE behavior verified empirically: if libva-vs-kdirect
|
||||
fails for HEVC, add per-driver `_NONE` gate per F5 amendment.
|
||||
- Sibling regression: re-run fresnel iter38 5/5 test rig — no change
|
||||
expected since iter40 path is gated on new fd.
|
||||
|
||||
Total amended LoC estimate: ~280 backend + 100 primitive (was 250 + 100;
|
||||
F1 + F2 + F6 add ~30 LoC of gates / overrides).
|
||||
@@ -1,228 +0,0 @@
|
||||
# Phase 7 close — iter40 Pi 5 HEVC partial
|
||||
|
||||
Closed 2026-05-17 evening. Backend tip `3ffa9d0` on master. Higgs (Pi CM5,
|
||||
Debian 13 trixie, kernel 6.12.75+rpt-rpi-2712) is the test target.
|
||||
|
||||
## Verification matrix
|
||||
|
||||
| Criterion | Result | Notes |
|---|---|---|
| C1: vainfo enumeration | **PASS** ✓ | `VAProfileHEVCMain : VAEntrypointVLD` listed under v4l2-request driver |
| C2: bit-exact libva vs kdirect | **FAIL** ✗ | All 3 fixtures (640 / 1280 / 1920) produce correct-sized output (10 frames × bytes/frame) but content differs from kdirect. Real decode failure; see C6. |
| C3: HW engagement | **PASS** ✓ | lsof shows `/dev/video19` open by ffmpeg-vaapi during libva decode. `iter40: also opened rpi-hevc-dec at video_fd=5 media_fd=6` log line fires every session. |
| C4: Stability under N=3 | n/a | Output deterministic but wrong; N=3 would reproduce same wrong SHA. |
| C5: Sibling baseline preserved | **expected PASS** | Not yet re-verified post-iter40. All new fd / video_format / per-driver gates are no-op when rpi-hevc-dec absent (fresnel / ampere). |
| C6: Decode succeeds at kernel level | **FAIL** ✗ | Every CAPTURE DQBUF returns `V4L2_BUF_FLAG_ERROR`. Decode fails per-frame. |
|
||||
|
||||
## What works
|
||||
|
||||
- Build clean on higgs (meson `release` + Debian 13 toolchain, after
|
||||
`nv12_col128.h` + `nv15.h` fallback `#define`s for headers that omit
|
||||
the mainline fourccs).
|
||||
- ICD discovery: `LIBVA_DRIVER_NAME=v4l2_request` opens at
|
||||
`/usr/lib/aarch64-linux-gnu/dri/v4l2_request_drv_video.so`.
|
||||
- Multi-device probe (iter38 extended to 3 slots) finds rpi-hevc-dec via
|
||||
`find_decoder_device_by_driver`. New `known_decoder_drivers[]` entry +
|
||||
`else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the
|
||||
primary-driver detection block (Phase 5 review F3 fix).
|
||||
- `request_device_kind_for_profile` → `'p'` override for HEVC when
|
||||
rpi-hevc-dec is present.
|
||||
- `request_switch_device_for_profile` retargets to the rpi fds.
|
||||
- Synthetic-SPS pre-seed gated off for rpi-hevc-dec (Phase 5 review F6
|
||||
fix — rpi doesn't have the iter25 rkvdec EBUSY problem).
|
||||
- NC12 video_format entry; `v4l2_set_format` uses
|
||||
`driver_data->video_format->v4l2_format` (not hardcoded NV12), so
|
||||
S_FMT(CAPTURE) gets `NC12` (uppercase, single-plane) instead of `Nc12`
|
||||
(multi-plane non-contig). Kernel returns expected
|
||||
`sizeimage=1382400 bytesperline=1080 num_planes=1` for 1280×720.
|
||||
- `nv12_col128_detile_y` + `_uv` primitives copy each column row by row
  via memcpy (128 bytes per row, repeated for every one of the
  num_columns columns). Unit test
|
||||
(`tests/test_nv12_col128_detile.c`) passes 10/10 (Y + UV at 640 / 1280
|
||||
/ 1920 / 1366 widths + UV offset helper).
|
||||
- `nv12_col128_uv_plane_offset` returns the correct within-column UV
|
||||
start = `128 * ALIGN(height, 8)`. Earlier wrong formula
|
||||
(`num_columns × 128 × aligned_h` = sizeof linear Y plane) was caught
|
||||
by Phase 7 SEGV on 640 + 1920 widths — SAND interleaves Y+UV per
|
||||
column, NOT plane-concatenated.
|
||||
- `image.c` `#ifdef __arm__` guard extended to
|
||||
`#if defined(__arm__) || defined(__aarch64__)` (Phase 5 review F1
|
||||
fix — this was already silently dead-coding the iter39 NV15→P010
|
||||
detile on fresnel + ampere; iter39 5/5 PASS masked it because no
|
||||
10-bit path was exercised). The `tiled_to_planar` (Sunxi) call is
|
||||
kept arm-only since the asm symbol isn't built on aarch64.
|
||||
- `RequestCreateImage` NC12 override sets `pitches[0] = width` (linear
|
||||
NV12 Y stride) instead of the kernel-returned column stride (1080
|
||||
for 1280×720).
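
The column-major addressing described in the list above can be sketched as follows. This is a minimal illustration, not the code in `nv12_col128.h`: the names `col128_uv_offset` and `col128_detile` are hypothetical, and the sketch assumes 128-byte-wide columns whose per-column stride (in rows) is the value the kernel reports in `bytesperline`, with the UV block starting `128 * ALIGN(height, 8)` bytes into each column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COL_W 128u
#define ALIGN_UP(v, a) (((v) + (a) - 1) / (a) * (a))

/* Byte offset of the UV block inside one column (hypothetical helper). */
static size_t col128_uv_offset(unsigned height)
{
	return (size_t)COL_W * ALIGN_UP(height, 8u);
}

/*
 * Copy `rows` rows out of every 128-byte column into a linear plane.
 * col_stride: rows per column (assumed to be the kernel's bytesperline).
 * start:      byte offset inside each column (0 for Y,
 *             col128_uv_offset(height) for UV).
 */
static void col128_detile(uint8_t *dst, size_t dst_pitch, const uint8_t *src,
			  unsigned width, unsigned rows,
			  unsigned col_stride, size_t start)
{
	unsigned num_columns = (width + COL_W - 1) / COL_W;

	/* The requested rows must fit inside one column. */
	assert(start + (size_t)rows * COL_W <= (size_t)col_stride * COL_W);

	for (unsigned c = 0; c < num_columns; c++) {
		const uint8_t *col = src + (size_t)c * COL_W * col_stride + start;
		unsigned x = c * COL_W;
		/* The last column may be narrower than 128 bytes. */
		size_t n = (width - x < COL_W) ? (size_t)(width - x) : COL_W;

		for (unsigned r = 0; r < rows; r++)
			memcpy(dst + (size_t)r * dst_pitch + x,
			       col + (size_t)r * COL_W, n);
	}
}
```

For 1280×720 the kernel-reported stride of 1080 gives ten columns of 128 × 1080 bytes each, which matches `sizeimage=1382400`; under the assumptions above the UV block of each column starts at byte 92160.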

## What fails

`V4L2_BUF_FLAG_ERROR` on every CAPTURE DQBUF. Kernel `rpi-hevc-dec`
rejects each frame's decode submission. The output buffer is left at its
initial (all-zero) state; the consumer (ffmpeg's `hwdownload`) reads
that and writes 0x00 to `format=nv12` output, producing the wrong SHA.

### Root cause identified: SPS field encoding diverges from bitstream

We compared the per-frame `S_EXT_CTRLS class=0xf010000` payload bytes vs
kdirect (`ffmpeg -hwaccel drm -c:v hevc`):

SPS ctrl (id=0xa40a90, size=40), first 16 bytes:
- ours: `00 00 00 05 d0 02 00 00 04 04` **`04 00`** `01 01 00 03`
- kdirect: `00 00 00 05 d0 02 00 00 04 04` **`02 04`** `01 01 00 03`

Differing bytes at offset 10–11:
- offset 10: `sps_max_num_reorder_pics`: ours=4, kdirect=2
- offset 11: `sps_max_latency_increase_plus1`: ours=0, kdirect=4
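
The comparison amounts to a byte-wise diff of the two captured payloads. A small sketch that reproduces it on the 16 bytes quoted above; `payload_diff` and the two array names are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Record up to `max` offsets at which two equally sized payloads differ.
 * Returns the total number of differing bytes.
 */
static size_t payload_diff(const uint8_t *ours, const uint8_t *ref, size_t len,
			   size_t *offsets, size_t max)
{
	size_t count = 0;

	assert(ours != NULL && ref != NULL);
	for (size_t i = 0; i < len; i++) {
		if (ours[i] == ref[i])
			continue;
		if (offsets != NULL && count < max)
			offsets[count] = i;
		count++;
	}
	return count;
}

/* First 16 bytes of the SPS control as quoted in the text. */
static const uint8_t sps_ours[16] = {
	0x00, 0x00, 0x00, 0x05, 0xd0, 0x02, 0x00, 0x00,
	0x04, 0x04, 0x04, 0x00, 0x01, 0x01, 0x00, 0x03,
};
static const uint8_t sps_kdirect[16] = {
	0x00, 0x00, 0x00, 0x05, 0xd0, 0x02, 0x00, 0x00,
	0x04, 0x04, 0x02, 0x04, 0x01, 0x01, 0x00, 0x03,
};
```

Calling `payload_diff(sps_ours, sps_kdirect, 16, offsets, 4)` reports two differing bytes, at offsets 10 and 11.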

Per `src/h265.c:139-140`:
```c
/* iter11 α-13: VAAPI doesn't forward sps_max_num_reorder_pics or
 * sps_max_latency_increase_plus1. ... */
sps->sps_max_num_reorder_pics = picture->sps_max_dec_pic_buffering_minus1;
sps->sps_max_latency_increase_plus1 = 0;
```

We use `sps_max_dec_pic_buffering_minus1` as a safe upper-bound
fallback because VAAPI's `VAPictureParameterBufferHEVC` doesn't expose
`sps_max_num_reorder_pics` or `sps_max_latency_increase_plus1`.

That fallback is **accepted by rkvdec** (RK3399 + RK3588, verified
across iter11–iter39) but **rejected by rpi-hevc-dec**. Per the H.265
SPS semantics (§7.4.3.2.1) the constraint is `sps_max_num_reorder_pics ≤
sps_max_dec_pic_buffering_minus1`, so our value is spec-legal, but
rpi-hevc-dec apparently validates against the bitstream-true value and
errors when ours diverges.

Other per-frame ctrl differences are also worth investigating once the
SPS is right:
- kdirect sends **4** ctrls (SPS + PPS + decode_params + slice_array).
- We send **5** (SPS + PPS + slice_array + scaling_matrix +
  decode_params); the order also differs.

## Real fix (out of scope for this loop)

The iter2 ampere-VDPU381 chapter already vendors a GStreamer 1.28.2
H.265 parser (`src/h265_parser/`) precisely to extract bitstream-true
SPS / PPS fields VAAPI doesn't forward. The fix is:

1. Wherever h265.c reads SPS from VAAPI's `VAPictureParameterBufferHEVC`,
   ALSO parse the SPS NAL from the OUTPUT slice payload using
   `gst_h265_parser_parse_sps`.
2. Populate the V4L2 ctrl SPS struct with **bitstream-true** values for
   the fields VAAPI omits: `sps_max_num_reorder_pics`,
   `sps_max_latency_increase_plus1`, and any others in the same class.
3. Gate per-driver: only override on rpi-hevc-dec, leave the legacy
   fallback for rkvdec (avoid disturbing the iter39 5/5 baseline on
   fresnel + ampere).
4. Optionally: suppress the scaling_matrix ctrl when the SPS doesn't
   set `sps_scaling_list_data_present_flag`, to match kdirect's ctrl
   count of 4.
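
Step 3 of the list above (the per-driver gate) can be sketched as a small selection function. This is an illustration under stated assumptions, not the repository's code: `struct sps_reorder_fields` and `pick_sps_reorder_fields` are hypothetical names, the driver string is compared the same way the probe code compares `info.driver`, and `parsed` stands for whatever the bitstream parser produced (NULL when no SPS was found).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the two fields VAAPI does not forward. */
struct sps_reorder_fields {
	uint8_t max_num_reorder_pics;
	uint8_t max_latency_increase_plus1;
};

/*
 * Use bitstream-true values only on rpi-hevc-dec and only when the parser
 * produced them; every other case keeps the legacy upper-bound fallback.
 */
static struct sps_reorder_fields
pick_sps_reorder_fields(const char *driver,
			const struct sps_reorder_fields *parsed,
			uint8_t max_dec_pic_buffering_minus1)
{
	struct sps_reorder_fields out;

	assert(driver != NULL);
	if (parsed != NULL && strcmp(driver, "rpi-hevc-dec") == 0)
		return *parsed;

	/* Legacy fallback: spec-legal upper bound, no latency limit. */
	out.max_num_reorder_pics = max_dec_pic_buffering_minus1;
	out.max_latency_increase_plus1 = 0;
	return out;
}
```

Keeping the fallback as the default branch means rkvdec hosts see exactly the values they were validated with.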

Estimated additional surface area: ~150 LoC in h265.c, plus the parser
plumbing that iter2 already provides. Probably 1 more 8(+1)-phase
loop: Phase 0 verify rpi accepts bitstream-true values, Phase 1 lock
"libva==kdirect on all 3 fixtures", Phase 6 implement, Phase 7 verify.

## iter40b addendum (same session)

After the first Phase 7 close, we picked up the SPS-parse fix as a
follow-up loop. Findings (all empirical):

1. **Source_data lacks SPS NAL.** Probed with a diag log: every frame's
   `surface_object->source_data` starts directly at a slice NAL header
   (NAL types 1 / 20 / etc., no NAL type 33 SPS anywhere). ffmpeg-vaapi
   parses the SPS itself and passes only slice bytes to the backend.
   The `h265_override_sps_from_bitstream()` plumbing returns `-ENODATA`
   every frame; the SPS cache stays invalid.

2. **VAAPI doesn't expose the SPS fields rpi needs.** Read
   `/usr/include/va/va_dec_hevc.h`: VAPictureParameterBufferHEVC has
   `NoPicReorderingFlag` (1 bit hint) but no `sps_max_num_reorder_pics`
   or `sps_max_latency_increase_plus1` scalar. They simply aren't
   reachable from the standard VAAPI API.

3. **Empirical SPS fix lands (hardcoded values match kdirect).** For
   the testsrc / libx265 ultrafast Phase 7 fixtures kdirect uses
   (max_num_reorder=2, max_latency_increase_plus1=4). Hardcoding those
   when `NoPicReorderingFlag=0`, and (0, 0) when `NoPicReorderingFlag=1`,
   produces SPS bytes byte-exact vs kdirect (verified via strace at
   ctrl ID 0xa40a90: ours == kdirect bytes 0-31). Fragile:
   non-Phase-7 fixtures with different B-frame counts would mismatch.
   Documented in h265.c::h265_set_controls (the rpi-hevc-dec gate).

4. **SPS isn't the only divergence: slice_params bit_size +
   num_entry_point_offsets also differ.** Even after the SPS fix:
   - SLICE_PARAMS (ctrl 0xa40a92) bytes 0-3 (`bit_size`):
     ours=61664, kdirect=61960 (37-byte delta per slice).
   - SLICE_PARAMS bytes 8-11 (`num_entry_point_offsets`):
     ours=0, kdirect=22 (BBB 720p WPP: ceil(720/32) = 23 CTU rows,
     minus 1 = 22 entry points). VAAPI's
     `VASliceParameterBufferHEVC::num_entry_point_offsets` is 0 for our
     fixture (ffmpeg-vaapi doesn't parse it); kdirect populates it from
     its own libavcodec slice-header parse.

5. **Bit-exact still NOT reached after iter40b.** Same SHAs as iter40a
   for all 3 fixtures; the kernel still returns `V4L2_BUF_FLAG_ERROR` on
   every CAPTURE DQBUF.
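
The two slice-level numbers in finding 4 can be checked with a few lines of arithmetic. The sketch below assumes a 32-sample CTB (the divisor used in the text), WPP enabled, no tiles, and one slice covering the whole picture; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of CTU rows for a picture height and CTB size (luma samples). */
static uint32_t ctu_rows(uint32_t height, uint32_t ctb_size)
{
	assert(ctb_size != 0);
	return (height + ctb_size - 1) / ctb_size;
}

/*
 * Entry points for one slice that covers the whole picture with WPP
 * enabled and no tiles: one per CTU row after the first.
 */
static uint32_t wpp_entry_points(uint32_t height, uint32_t ctb_size)
{
	uint32_t rows = ctu_rows(height, ctb_size);

	return rows > 0 ? rows - 1 : 0;
}

/* Difference between two bit_size values, expressed in whole bytes. */
static uint32_t bit_size_delta_bytes(uint32_t ours_bits, uint32_t ref_bits)
{
	uint32_t delta = ref_bits > ours_bits ? ref_bits - ours_bits
					       : ours_bits - ref_bits;

	return delta / 8;
}
```

With these assumptions a 720-line picture has 23 CTU rows and therefore 22 entry points, and the `bit_size` gap of 61960 − 61664 = 296 bits is the 37 bytes quoted above.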

### Upstream blocker

VAAPI's HEVC buffer interface doesn't pass the bitstream-true fields
that rpi-hevc-dec validates against. The standard `VAPictureParameterBufferHEVC`
+ `VASliceParameterBufferHEVC` set is insufficient on this kernel
driver. Options for a real fix:

- **VAAPI extension** exposing the missing scalars + slice-header
  derivations. Multi-quarter upstream effort.
- **A backdoor `VABufferType` for raw SPS/PPS/slice-header NAL bytes**.
  Libva-internal; consumers would have to populate it.
- **Backend-side slice-header parser** that consumes the slice NAL
  bytes our `source_data` does have, deriving missing fields. Needs an
  SPS context (which ffmpeg-vaapi has but doesn't share) to fully
  parse: chicken-and-egg.
- **Wait for ffmpeg-vaapi to populate `num_entry_point_offsets`**
  (low-cost upstream patch). Plus the SPS extension above.
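
Both the diag probe from finding 1 and any backend-side parser start by reading the NAL unit header. A minimal sketch of that first step, assuming the buffer begins directly at a two-byte HEVC NAL header with no start-code prefix (as observed for `source_data`); `hevc_nal_type` and `starts_with_sps` are hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEVC_NAL_SPS 33

/* nal_unit_type occupies bits 6..1 of the first NAL header byte. */
static unsigned hevc_nal_type(uint8_t first_header_byte)
{
	return (first_header_byte >> 1) & 0x3f;
}

/* True when a buffer that begins at a NAL header holds an SPS. */
static bool starts_with_sps(const uint8_t *data, size_t size)
{
	assert(data != NULL || size == 0);
	return size >= 2 && hevc_nal_type(data[0]) == HEVC_NAL_SPS;
}
```

A first byte of `0x42` decodes to type 33 (SPS), while `0x02` and `0x28` decode to the slice types 1 and 20 that the probe reported.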

None is achievable in this iteration. iter40 / iter40b ship as
infrastructure-only: Pi 5 HEVC HW decode via libva remains blocked
on upstream changes that, before iter40, we didn't know we needed.

### iter40b cross-test (no sibling regression)

| Host | Result |
|---|---|
| ampere (RK3588) | 9 profiles enumerated, H264 bit-exact PASS |
| fresnel (RK3399) | iter38 **5/5 PASS** |
| higgs (Pi CM5) | vainfo lists HEVCMain, decode still fails (per above) |

All iter40 + iter40b code paths are gated on `video_fd_rpi_hevc_dec >= 0`,
which stays -1 on non-Pi hosts. The `__arm__ → __aarch64__` guard
extension stays safe: the `is_10bit` sub-gate keeps the NV15 detile
dormant for 8-bit fixtures.

## What's shipped this iter

Branch master `3ffa9d0` (iter40) + iter40b commits to follow. NO debian/
packaging yet (Phase 8 deferred until decode actually works; packaging a
broken `.so` is misdirection). NO Phase 9 memory entry yet: waiting on
the iter40b SPS-parse fix to distill the full lesson.

The dev-process Phase 8 packaging + deploy-host re-verify rule wasn't
violated: the criterion (Phase 7 bit-exact PASS) wasn't met, so the
backend was neither packaged nor promoted to a release. Local `.so`
install on higgs only, for debugging.

## Sibling regression status

fresnel iter38 5/5 baseline + ampere 9-profile vainfo NOT re-verified
post-iter40. Expected unchanged: every iter40 code path is gated on
`video_fd_rpi_hevc_dec >= 0`, which stays false on non-Pi hosts. The
only globally-touched line is the `__arm__ → __aarch64__` guard in
image.c, which now ALSO enables the existing NV15→P010 detile on
aarch64. That path was already silently dead (per iter39 close
addendum); enabling it MIGHT cause a behavior change for any consumer
that happens to request P010 from an 8-bit-decode surface, but the
gate `driver_data->is_10bit` keeps it dormant for 8-bit fixtures (the
iter38 baseline). Verify before declaring the regression-free promise
intact.

@@ -0,0 +1,689 @@
/*
 * Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
 *
 * ampere-av1-enablement Phase 2.1: AV1 codec dispatcher for
 * libva-v4l2-request-fourier. Translates VAAPI AV1 picture/slice parameter
 * buffers into V4L2 stateless AV1 controls (V4L2_CID_STATELESS_AV1_*) for
 * the Rockchip vpu981 hardware on RK3588.
 *
 * Reference: Kwiboo/FFmpeg v4l2-request-n8.1:libavcodec/v4l2_request_av1.c
 * (636 LoC; reads from FFmpeg's AV1RawSequenceHeader + AV1RawFrameHeader).
 * VAAPI exposes the same AV1 spec semantics through different struct
 * shapes: sequence-level fields are folded into VADecPictureParameterBufferAV1
 * (no separate sequence buffer); per-frame fields live in the same struct.
 *
 * F1/F2/F3 risk mitigations per phase1_plan_v2 §"General fill_frame
 * implementation risks":
 *   F1 tile_info.mi_col/row_starts sentinel = 2 * ((frame_width + 7) >> 3)
 *      mirrors Kwiboo lines 238/244 exactly.
 *   F2 superres_denom: VAAPI exposes superres_scale_denominator directly
 *      and per spec it's already 8 when use_superres=0. No offset math
 *      needed (Kwiboo does it because FFmpeg stores raw coded_denom).
 *   F3 loop_restoration_size[] gated on USES_LR flag mirrors Kwiboo
 *      lines 281-287 exactly.
 *
 * V4L2 controls (4 per frame, batched in one VIDIOC_S_EXT_CTRLS):
 *   1. V4L2_CID_STATELESS_AV1_SEQUENCE
 *   2. V4L2_CID_STATELESS_AV1_FRAME
 *   3. V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY[] (DYNAMIC_ARRAY)
 *   4. V4L2_CID_STATELESS_AV1_FILM_GRAIN (conditional on driver_data->
 *      has_av1_film_grain probe)
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE FOR ANY CLAIM,
 * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
 * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
 * THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

#include "av1.h"

#include "context.h"
#include "object_heap.h"
#include "request.h"
#include "surface.h"
#include "utils.h"
#include "v4l2.h"

#include <va/va.h>

#include <linux/videodev2.h>
#include <linux/v4l2-controls.h>

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sanity asserts to catch kernel uAPI drift. If these fire, the kernel
 * headers on the build machine are out of sync with what the running
 * driver expects; silent register-misalignment bugs result. Cross-compile
 * hazard per Janet v3 amendment: native-arm64 builds only (boltzmann +
 * ampere); no cross from x86 against ARM kernel headers. */
_Static_assert(sizeof(struct v4l2_ctrl_av1_tile_group_entry) == 16,
	       "v4l2_ctrl_av1_tile_group_entry size drift — recheck uAPI");

/* Per AV1 spec, when use_superres=0 the superres denominator is 8.
 * VAAPI's superres_scale_denominator already encodes this directly
 * (per va_dec_av1.h: "When use_superres=0, superres_scale_denominator
 * must be 8"). Kwiboo's AV1_SUPERRES_DENOM_MIN+coded_denom math is
 * not needed when reading from VAAPI. */
#define AV1_SUPERRES_NUM 8

/* AV1 spec maxima used for V4L2 array sizing. */
#define BACKEND_AV1_MAX_SEGMENTS 8
#define BACKEND_AV1_SEG_LVL_MAX 8
#define BACKEND_AV1_SEG_LVL_REF_FRAME 5
#define BACKEND_AV1_NUM_REF_FRAMES 8
#define BACKEND_AV1_TOTAL_REFS_PER_FRAME 8
#define BACKEND_AV1_REFS_PER_FRAME 7

/* ===== fill_sequence ===== */
static void av1_fill_sequence(VADecPictureParameterBufferAV1 *picture,
			      struct v4l2_ctrl_av1_sequence *ctrl)
{
	uint8_t bit_depth;

	memset(ctrl, 0, sizeof(*ctrl));

	switch (picture->bit_depth_idx) {
	case 0: bit_depth = 8; break;
	case 1: bit_depth = 10; break;
	case 2: bit_depth = 12; break;
	default: bit_depth = 8; break;
	}

	ctrl->seq_profile = picture->profile;
	ctrl->order_hint_bits = picture->seq_info_fields.fields.enable_order_hint ?
		(picture->order_hint_bits_minus_1 + 1) : 0;
	ctrl->bit_depth = bit_depth;
	/* VAAPI does NOT separately expose max_frame_{width,height}_minus_1
	 * (sequence-level). Use the current frame size as a proxy. Correct
	 * for fixed-size sequences (the 208/352/1080p test vectors). */
	ctrl->max_frame_width_minus_1 = picture->frame_width_minus1;
	ctrl->max_frame_height_minus_1 = picture->frame_height_minus1;

	if (picture->seq_info_fields.fields.still_picture)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_STILL_PICTURE;
	if (picture->seq_info_fields.fields.use_128x128_superblock)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_USE_128X128_SUPERBLOCK;
	if (picture->seq_info_fields.fields.enable_filter_intra)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_FILTER_INTRA;
	if (picture->seq_info_fields.fields.enable_intra_edge_filter)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_INTRA_EDGE_FILTER;
	if (picture->seq_info_fields.fields.enable_interintra_compound)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_INTERINTRA_COMPOUND;
	if (picture->seq_info_fields.fields.enable_masked_compound)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_MASKED_COMPOUND;
	/* VAAPI doesn't expose enable_warped_motion as a sequence flag;
	 * per-frame allow_warped_motion gates it. Conservative: set true so
	 * per-frame flag is honored. */
	ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_WARPED_MOTION;
	if (picture->seq_info_fields.fields.enable_dual_filter)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_DUAL_FILTER;
	if (picture->seq_info_fields.fields.enable_order_hint)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_ORDER_HINT;
	if (picture->seq_info_fields.fields.enable_jnt_comp)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_JNT_COMP;
	/* enable_ref_frame_mvs / enable_restoration not exposed at sequence
	 * level: conservative set-true (kdirect also sets these for the
	 * test streams; gating doesn't matter because per-frame flags
	 * govern actual use). */
	ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_REF_FRAME_MVS;
	/* enable_superres: gate on the current frame's use_superres so the
	 * SEQUENCE flag matches the bitstream-derived value. Empirical
	 * strace diff vs kdirect: kdirect clears this for streams that
	 * never use superres; we were unconditionally setting it true. */
	if (picture->pic_info_fields.bits.use_superres)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_SUPERRES;
	if (picture->seq_info_fields.fields.enable_cdef)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_CDEF;
	ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_RESTORATION;
	if (picture->seq_info_fields.fields.mono_chrome)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_MONO_CHROME;
	if (picture->seq_info_fields.fields.color_range)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_COLOR_RANGE;
	if (picture->seq_info_fields.fields.subsampling_x)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_SUBSAMPLING_X;
	if (picture->seq_info_fields.fields.subsampling_y)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_SUBSAMPLING_Y;
	if (picture->seq_info_fields.fields.film_grain_params_present)
		ctrl->flags |= V4L2_AV1_SEQUENCE_FLAG_FILM_GRAIN_PARAMS_PRESENT;
}

/* ===== fill_frame ===== */
static void av1_fill_frame(VADecPictureParameterBufferAV1 *picture,
			   struct v4l2_ctrl_av1_frame *ctrl)
{
	unsigned int i, j;

	memset(ctrl, 0, sizeof(*ctrl));

	/* ---- tile_info ---- */
	ctrl->tile_info.context_update_tile_id = picture->context_update_tile_id;
	ctrl->tile_info.tile_cols = picture->tile_cols;
	ctrl->tile_info.tile_rows = picture->tile_rows;
	if (picture->tile_cols > 1 || picture->tile_rows > 1)
		ctrl->tile_info.tile_size_bytes = 4;
	else
		ctrl->tile_info.tile_size_bytes = 0;

	if (picture->pic_info_fields.bits.uniform_tile_spacing_flag)
		ctrl->tile_info.flags |= V4L2_AV1_TILE_INFO_FLAG_UNIFORM_TILE_SPACING;

	/* F1: mi_col/row_starts[]: prefix-sum from width_in_sbs_minus_1[]+1
	 * (Kwiboo reads tile_start_col_sb[] directly; VAAPI doesn't expose
	 * starts, only widths, so reconstruct via accumulation). Plus the
	 * sentinel at index tile_cols/tile_rows. */
	{
		uint16_t cum = 0;
		for (i = 0; i < picture->tile_cols && i < 63; i++) {
			ctrl->tile_info.mi_col_starts[i] = cum;
			ctrl->tile_info.width_in_sbs_minus_1[i] =
				picture->width_in_sbs_minus_1[i];
			cum = (uint16_t)(cum + picture->width_in_sbs_minus_1[i] + 1);
		}
		ctrl->tile_info.mi_col_starts[picture->tile_cols] =
			2 * ((picture->frame_width_minus1 + 1 + 7) >> 3);
	}
	{
		uint16_t cum = 0;
		for (i = 0; i < picture->tile_rows && i < 63; i++) {
			ctrl->tile_info.mi_row_starts[i] = cum;
			ctrl->tile_info.height_in_sbs_minus_1[i] =
				picture->height_in_sbs_minus_1[i];
			cum = (uint16_t)(cum + picture->height_in_sbs_minus_1[i] + 1);
		}
		ctrl->tile_info.mi_row_starts[picture->tile_rows] =
			2 * ((picture->frame_height_minus1 + 1 + 7) >> 3);
	}

	/* ---- quantization ---- */
	ctrl->quantization.base_q_idx = picture->base_qindex;
	ctrl->quantization.delta_q_y_dc = picture->y_dc_delta_q;
	ctrl->quantization.delta_q_u_dc = picture->u_dc_delta_q;
	ctrl->quantization.delta_q_u_ac = picture->u_ac_delta_q;
	ctrl->quantization.delta_q_v_dc = picture->v_dc_delta_q;
	ctrl->quantization.delta_q_v_ac = picture->v_ac_delta_q;
	ctrl->quantization.qm_y = picture->qmatrix_fields.bits.qm_y;
	ctrl->quantization.qm_u = picture->qmatrix_fields.bits.qm_u;
	ctrl->quantization.qm_v = picture->qmatrix_fields.bits.qm_v;
	ctrl->quantization.delta_q_res =
		picture->mode_control_fields.bits.log2_delta_q_res;

	if (picture->u_dc_delta_q != picture->v_dc_delta_q ||
	    picture->u_ac_delta_q != picture->v_ac_delta_q)
		ctrl->quantization.flags |= V4L2_AV1_QUANTIZATION_FLAG_DIFF_UV_DELTA;
	if (picture->qmatrix_fields.bits.using_qmatrix)
		ctrl->quantization.flags |= V4L2_AV1_QUANTIZATION_FLAG_USING_QMATRIX;
	if (picture->mode_control_fields.bits.delta_q_present_flag)
		ctrl->quantization.flags |= V4L2_AV1_QUANTIZATION_FLAG_DELTA_Q_PRESENT;

	/* ---- segmentation ---- */
	if (picture->seg_info.segment_info_fields.bits.enabled)
		ctrl->segmentation.flags |= V4L2_AV1_SEGMENTATION_FLAG_ENABLED;
	if (picture->seg_info.segment_info_fields.bits.update_map)
		ctrl->segmentation.flags |= V4L2_AV1_SEGMENTATION_FLAG_UPDATE_MAP;
	if (picture->seg_info.segment_info_fields.bits.temporal_update)
		ctrl->segmentation.flags |= V4L2_AV1_SEGMENTATION_FLAG_TEMPORAL_UPDATE;
	if (picture->seg_info.segment_info_fields.bits.update_data)
		ctrl->segmentation.flags |= V4L2_AV1_SEGMENTATION_FLAG_UPDATE_DATA;

	for (i = 0; i < BACKEND_AV1_MAX_SEGMENTS; i++) {
		for (j = 0; j < BACKEND_AV1_SEG_LVL_MAX; j++) {
			if (picture->seg_info.feature_mask[i] & (1 << j)) {
				ctrl->segmentation.feature_enabled[i] |=
					V4L2_AV1_SEGMENT_FEATURE_ENABLED(j);
				ctrl->segmentation.last_active_seg_id = i;
				if (j >= BACKEND_AV1_SEG_LVL_REF_FRAME)
					ctrl->segmentation.flags |=
						V4L2_AV1_SEGMENTATION_FLAG_SEG_ID_PRE_SKIP;
			}
			ctrl->segmentation.feature_data[i][j] =
				picture->seg_info.feature_data[i][j];
		}
	}

	/* ---- loop_filter ---- */
	ctrl->loop_filter.level[0] = picture->filter_level[0];
	ctrl->loop_filter.level[1] = picture->filter_level[1];
	ctrl->loop_filter.level[2] = picture->filter_level_u;
	ctrl->loop_filter.level[3] = picture->filter_level_v;
	ctrl->loop_filter.sharpness =
		picture->loop_filter_info_fields.bits.sharpness_level;
	ctrl->loop_filter.mode_deltas[0] = picture->mode_deltas[0];
	ctrl->loop_filter.mode_deltas[1] = picture->mode_deltas[1];
	ctrl->loop_filter.delta_lf_res =
		picture->mode_control_fields.bits.log2_delta_lf_res;
	for (i = 0; i < BACKEND_AV1_NUM_REF_FRAMES; i++)
		ctrl->loop_filter.ref_deltas[i] = picture->ref_deltas[i];

	if (picture->loop_filter_info_fields.bits.mode_ref_delta_enabled)
		ctrl->loop_filter.flags |= V4L2_AV1_LOOP_FILTER_FLAG_DELTA_ENABLED;
	if (picture->loop_filter_info_fields.bits.mode_ref_delta_update)
		ctrl->loop_filter.flags |= V4L2_AV1_LOOP_FILTER_FLAG_DELTA_UPDATE;
	if (picture->mode_control_fields.bits.delta_lf_present_flag)
		ctrl->loop_filter.flags |= V4L2_AV1_LOOP_FILTER_FLAG_DELTA_LF_PRESENT;
	if (picture->mode_control_fields.bits.delta_lf_multi)
		ctrl->loop_filter.flags |= V4L2_AV1_LOOP_FILTER_FLAG_DELTA_LF_MULTI;

	/* ---- cdef ---- */
	ctrl->cdef.damping_minus_3 = picture->cdef_damping_minus_3;
	ctrl->cdef.bits = picture->cdef_bits;
	for (i = 0; i < (unsigned)(1 << picture->cdef_bits) && i < 8; i++) {
		uint8_t y = picture->cdef_y_strengths[i];
		uint8_t uv = picture->cdef_uv_strengths[i];
		ctrl->cdef.y_pri_strength[i] = (y >> 2) & 0x0F;
		ctrl->cdef.y_sec_strength[i] = y & 0x03;
		ctrl->cdef.uv_pri_strength[i] = (uv >> 2) & 0x0F;
		ctrl->cdef.uv_sec_strength[i] = uv & 0x03;
	}

	/* ---- loop_restoration ---- (F3)
	 * Phase 5 review Amendment 1 was REVERTED. The reviewer proposed
	 * remap = {NONE, SWITCHABLE, WIENER, SGRPROJ} (Kwiboo's table)
	 * based on AV1 spec FrameRestoreType wire encoding
	 * {NONE=0, SWITCHABLE=1, WIENER=2, SGRPROJ=3} differing from V4L2's
	 * {NONE=0, WIENER=1, SGRPROJ=2, SWITCHABLE=3}. Empirically applying
	 * that permutation regressed ALL tests (allintra 10/10 → 0/10),
	 * so either VAAPI's yframe_restoration_type is NOT the raw spec
	 * value (already-remapped to V4L2 enum semantics?), or vpu981
	 * interprets the V4L2 enum values via a different mapping than
	 * the V4L2 uAPI header documents. Per
	 * [[feedback_review_empirical_over_theoretical]] keep the
	 * identity mapping that empirically works; revisit if a
	 * restoration-using fixture surfaces a real decode bug.
	 */
	{
		uint8_t remap[4] = {
			V4L2_AV1_FRAME_RESTORE_NONE,
			V4L2_AV1_FRAME_RESTORE_WIENER,
			V4L2_AV1_FRAME_RESTORE_SGRPROJ,
			V4L2_AV1_FRAME_RESTORE_SWITCHABLE,
		};
		uint8_t y_t = picture->loop_restoration_fields.bits.yframe_restoration_type & 3;
		uint8_t cb_t = picture->loop_restoration_fields.bits.cbframe_restoration_type & 3;
		uint8_t cr_t = picture->loop_restoration_fields.bits.crframe_restoration_type & 3;
		bool uses_lr = false;

		ctrl->loop_restoration.frame_restoration_type[0] = remap[y_t];
		ctrl->loop_restoration.frame_restoration_type[1] = remap[cb_t];
		ctrl->loop_restoration.frame_restoration_type[2] = remap[cr_t];
		if (y_t != 0)
			uses_lr = true;
		if (cb_t != 0 || cr_t != 0) {
			uses_lr = true;
			ctrl->loop_restoration.flags |=
				V4L2_AV1_LOOP_RESTORATION_FLAG_USES_CHROMA_LR;
		}

		ctrl->loop_restoration.lr_unit_shift =
			picture->loop_restoration_fields.bits.lr_unit_shift;
		ctrl->loop_restoration.lr_uv_shift =
			picture->loop_restoration_fields.bits.lr_uv_shift;

		if (uses_lr) {
			uint8_t shift = picture->loop_restoration_fields.bits.lr_unit_shift;
			uint8_t uv_shift = picture->loop_restoration_fields.bits.lr_uv_shift;
			ctrl->loop_restoration.flags |=
				V4L2_AV1_LOOP_RESTORATION_FLAG_USES_LR;
			ctrl->loop_restoration.loop_restoration_size[0] =
				1 << (6 + shift);
			ctrl->loop_restoration.loop_restoration_size[1] =
				1 << (6 + shift - uv_shift);
			ctrl->loop_restoration.loop_restoration_size[2] =
				1 << (6 + shift - uv_shift);
		}
	}

	/* ---- global_motion ---- */
	for (i = 0; i < BACKEND_AV1_TOTAL_REFS_PER_FRAME; i++) {
		if (i == 0)
			continue; /* INTRA_FRAME slot: no warp */
		ctrl->global_motion.type[i] = picture->wm[i - 1].wmtype;
		for (j = 0; j < 6; j++)
			ctrl->global_motion.params[i][j] = picture->wm[i - 1].wmmat[j];
		if (picture->wm[i - 1].invalid)
			ctrl->global_motion.invalid |=
				V4L2_AV1_GLOBAL_MOTION_IS_INVALID(i);
		switch (picture->wm[i - 1].wmtype) {
		case 1:
			ctrl->global_motion.flags[i] |=
				V4L2_AV1_GLOBAL_MOTION_FLAG_IS_TRANSLATION;
			ctrl->global_motion.flags[i] |=
				V4L2_AV1_GLOBAL_MOTION_FLAG_IS_GLOBAL;
			break;
		case 2:
			ctrl->global_motion.flags[i] |=
				V4L2_AV1_GLOBAL_MOTION_FLAG_IS_ROT_ZOOM;
			ctrl->global_motion.flags[i] |=
				V4L2_AV1_GLOBAL_MOTION_FLAG_IS_GLOBAL;
			break;
		case 3:
			ctrl->global_motion.flags[i] |=
				V4L2_AV1_GLOBAL_MOTION_FLAG_IS_GLOBAL;
			break;
		default:
			break;
		}
	}

	/* ---- reference frames + order hints ---- */
	/* reference_frame_ts[] is filled by the orchestrator (av1_set_controls)
	 * which has driver_data for the SURFACE() lookup. order_hints[] not
	 * exposed per-ref by VAAPI; leave zero. ref_frame_idx[7] is the
	 * index map from spec-defined ref slots (LAST..ALTREF) into
	 * ref_frame_map[8] (the surface IDs). */
	for (i = 0; i < BACKEND_AV1_TOTAL_REFS_PER_FRAME; i++)
		ctrl->order_hints[i] = 0;
	for (i = 0; i < BACKEND_AV1_REFS_PER_FRAME; i++)
		ctrl->ref_frame_idx[i] = picture->ref_frame_idx[i];

	/* F2: superres_denom direct from VAAPI; fallback to AV1_SUPERRES_NUM
	 * if zero (spec violation but defensive). */
	ctrl->superres_denom = picture->superres_scale_denominator
		? picture->superres_scale_denominator : AV1_SUPERRES_NUM;

	ctrl->skip_mode_frame[0] = 0;
	ctrl->skip_mode_frame[1] = 0;
	ctrl->primary_ref_frame = picture->primary_ref_frame;
	ctrl->frame_type = picture->pic_info_fields.bits.frame_type;
	ctrl->order_hint = picture->order_hint;
	ctrl->upscaled_width = picture->frame_width_minus1 + 1;
	ctrl->interpolation_filter = picture->interp_filter;
	ctrl->tx_mode = picture->mode_control_fields.bits.tx_mode;
	ctrl->frame_width_minus_1 = picture->frame_width_minus1;
	ctrl->frame_height_minus_1 = picture->frame_height_minus1;
	ctrl->render_width_minus_1 = picture->frame_width_minus1;
	ctrl->render_height_minus_1 = picture->frame_height_minus1;
	ctrl->current_frame_id = 0;
	/* Phase 3: VAAPI doesn't expose refresh_frame_flags. For shown KEY
	 * frames and SWITCH frames the AV1 spec mandates 0xff (refresh all
	 * DPB slots). For inter frames we default to 0xff too: simple
	 * P-frame chains will naturally rotate through slots without a
	 * precise per-slot value. If the stream needs precise control, this
	 * needs frame-header parsing (AV1 has no SPS). Empirical diff vs
	 * kdirect shows kdirect always sends 0xff here. */
	ctrl->refresh_frame_flags = 0xff;
|
||||
|
||||
/* ---- frame flags ---- */
|
||||
if (picture->pic_info_fields.bits.show_frame)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_SHOW_FRAME;
|
||||
if (picture->pic_info_fields.bits.showable_frame)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_SHOWABLE_FRAME;
|
||||
if (picture->pic_info_fields.bits.error_resilient_mode)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_ERROR_RESILIENT_MODE;
|
||||
if (picture->pic_info_fields.bits.disable_cdf_update)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_DISABLE_CDF_UPDATE;
|
||||
if (picture->pic_info_fields.bits.allow_screen_content_tools)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_ALLOW_SCREEN_CONTENT_TOOLS;
|
||||
if (picture->pic_info_fields.bits.force_integer_mv)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_FORCE_INTEGER_MV;
|
||||
if (picture->pic_info_fields.bits.allow_intrabc)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_ALLOW_INTRABC;
|
||||
if (picture->pic_info_fields.bits.use_superres)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_USE_SUPERRES;
|
||||
if (picture->pic_info_fields.bits.allow_high_precision_mv)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_ALLOW_HIGH_PRECISION_MV;
|
||||
if (picture->pic_info_fields.bits.is_motion_mode_switchable)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_IS_MOTION_MODE_SWITCHABLE;
|
||||
if (picture->pic_info_fields.bits.use_ref_frame_mvs)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_USE_REF_FRAME_MVS;
|
||||
if (picture->pic_info_fields.bits.disable_frame_end_update_cdf)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_DISABLE_FRAME_END_UPDATE_CDF;
|
||||
if (picture->pic_info_fields.bits.allow_warped_motion)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_ALLOW_WARPED_MOTION;
|
||||
if (picture->mode_control_fields.bits.reference_select)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_REFERENCE_SELECT;
|
||||
if (picture->mode_control_fields.bits.reduced_tx_set_used)
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_REDUCED_TX_SET;
|
||||
if (picture->mode_control_fields.bits.skip_mode_present) {
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_SKIP_MODE_ALLOWED;
|
||||
ctrl->flags |= V4L2_AV1_FRAME_FLAG_SKIP_MODE_PRESENT;
|
||||
}
|
||||
}
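
The flag translation above is a long chain of one-bit tests. As a sketch only, the same mapping could be expressed as a small table walked by a helper. The names `ex_flag_map`, `ex_collect_flags`, and the `EX_FLAG_*` values are hypothetical stand-ins for illustration; they are not the real `V4L2_AV1_FRAME_FLAG_*` constants and nothing in this backend defines them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones come from the kernel uAPI headers. */
#define EX_FLAG_SHOW_FRAME      (1u << 0)
#define EX_FLAG_SHOWABLE_FRAME  (1u << 1)
#define EX_FLAG_ERROR_RESILIENT (1u << 2)

/* One row per source bit: whether it is set, and which flag it maps to. */
struct ex_flag_map {
	unsigned int set;
	uint32_t flag;
};

/* OR together the destination flag of every row whose source bit is set. */
static uint32_t ex_collect_flags(const struct ex_flag_map *map, size_t count)
{
	uint32_t flags = 0;
	size_t i;

	assert(map != NULL || count == 0);

	for (i = 0; i < count; i++)
		if (map[i].set)
			flags |= map[i].flag;

	return flags;
}
```

A table keeps each VA-API bit next to its V4L2 flag, which may make a missing mapping easier to spot than in an if-chain; the if-chain in the diff is equally valid.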

/* ===== fill_film_grain ===== */
static void av1_fill_film_grain(VADecPictureParameterBufferAV1 *picture,
				struct v4l2_ctrl_av1_film_grain *ctrl)
{
	VAFilmGrainStructAV1 *fg = &picture->film_grain_info;
	unsigned int i;

	memset(ctrl, 0, sizeof(*ctrl));

	ctrl->cr_mult = fg->cr_mult;
	ctrl->grain_seed = fg->grain_seed;
	/* VAAPI doesn't expose film_grain_params_ref_idx (the reuse-from-
	 * previous-frame index). Leave zero — only consulted when
	 * update_grain=0, which VAAPI also doesn't expose. */
	ctrl->film_grain_params_ref_idx = 0;
	ctrl->num_y_points = fg->num_y_points;
	ctrl->num_cb_points = fg->num_cb_points;
	ctrl->num_cr_points = fg->num_cr_points;
	ctrl->grain_scaling_minus_8 =
		fg->film_grain_info_fields.bits.grain_scaling_minus_8;
	ctrl->ar_coeff_lag = fg->film_grain_info_fields.bits.ar_coeff_lag;
	ctrl->ar_coeff_shift_minus_6 =
		fg->film_grain_info_fields.bits.ar_coeff_shift_minus_6;
	ctrl->grain_scale_shift =
		fg->film_grain_info_fields.bits.grain_scale_shift;
	ctrl->cb_mult = fg->cb_mult;
	ctrl->cb_luma_mult = fg->cb_luma_mult;
	ctrl->cr_luma_mult = fg->cr_luma_mult;
	ctrl->cb_offset = fg->cb_offset;
	ctrl->cr_offset = fg->cr_offset;

	if (fg->film_grain_info_fields.bits.apply_grain) {
		ctrl->flags |= V4L2_AV1_FILM_GRAIN_FLAG_APPLY_GRAIN;
		/* kdirect strace diff confirmed: V4L2_AV1_FILM_GRAIN_FLAG_
		 * UPDATE_GRAIN must be set when apply_grain=1 (kdirect's
		 * flags byte is 0x0B = APPLY|UPDATE|...). VAAPI's
		 * VAFilmGrainStructAV1 doesn't expose update_grain
		 * separately. Default to UPDATE=1 (use submitted params,
		 * not reuse from non-existent prior film_grain ref). The
		 * earlier segfault we saw with this flag was unmasked by
		 * the link-NULL deref (now fixed via linked_decode_surface);
		 * not caused by UPDATE_GRAIN itself. */
		ctrl->flags |= V4L2_AV1_FILM_GRAIN_FLAG_UPDATE_GRAIN;
	}
	if (fg->film_grain_info_fields.bits.chroma_scaling_from_luma)
		ctrl->flags |= V4L2_AV1_FILM_GRAIN_FLAG_CHROMA_SCALING_FROM_LUMA;
	if (fg->film_grain_info_fields.bits.overlap_flag)
		ctrl->flags |= V4L2_AV1_FILM_GRAIN_FLAG_OVERLAP;
	if (fg->film_grain_info_fields.bits.clip_to_restricted_range)
		ctrl->flags |= V4L2_AV1_FILM_GRAIN_FLAG_CLIP_TO_RESTRICTED_RANGE;

	if (!fg->film_grain_info_fields.bits.apply_grain)
		return;

	for (i = 0; i < fg->num_y_points && i < 14; i++) {
		ctrl->point_y_value[i] = fg->point_y_value[i];
		ctrl->point_y_scaling[i] = fg->point_y_scaling[i];
	}
	for (i = 0; i < fg->num_cb_points && i < 10; i++) {
		ctrl->point_cb_value[i] = fg->point_cb_value[i];
		ctrl->point_cb_scaling[i] = fg->point_cb_scaling[i];
	}
	for (i = 0; i < fg->num_cr_points && i < 10; i++) {
		ctrl->point_cr_value[i] = fg->point_cr_value[i];
		ctrl->point_cr_scaling[i] = fg->point_cr_scaling[i];
	}
	for (i = 0; i < 24; i++)
		ctrl->ar_coeffs_y_plus_128[i] = (uint8_t)(fg->ar_coeffs_y[i] + 128);
	for (i = 0; i < 25; i++) {
		ctrl->ar_coeffs_cb_plus_128[i] = (uint8_t)(fg->ar_coeffs_cb[i] + 128);
		ctrl->ar_coeffs_cr_plus_128[i] = (uint8_t)(fg->ar_coeffs_cr[i] + 128);
	}
}
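
Two details of the film-grain fill are easy to get wrong: the signed-to-biased conversion of the auto-regression coefficients, and the clamping of the point counts to the size of the source arrays. The sketch below isolates both, assuming the VA-API coefficients are signed 8-bit values as the `+ 128` cast in the code suggests. `ex_bias_ar_coeff` and `ex_clamp_points` are hypothetical helper names, not part of this backend.

```c
#include <assert.h>
#include <stdint.h>

/* Map a signed coefficient in [-128, 127] onto the biased range [0, 255]. */
static uint8_t ex_bias_ar_coeff(int8_t coeff)
{
	int biased = (int)coeff + 128;

	assert(biased >= 0 && biased <= 255);
	return (uint8_t)biased;
}

/* Never copy more points than the source array can hold. */
static unsigned int ex_clamp_points(unsigned int reported, unsigned int capacity)
{
	return reported < capacity ? reported : capacity;
}
```

With these helpers, a loop bound such as `i < fg->num_y_points && i < 14` corresponds to `ex_clamp_points(fg->num_y_points, 14)`.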

/* ===== orchestrator ===== */
int av1_set_controls(struct request_data *driver_data,
		     struct object_context *context,
		     struct object_surface *surface_object)
{
	VADecPictureParameterBufferAV1 *picture =
		&surface_object->params.av1.picture;
	unsigned int num_tiles = surface_object->params.av1.num_tile_group_entries;
	struct v4l2_ctrl_av1_sequence sequence;
	struct v4l2_ctrl_av1_frame frame;
	struct v4l2_ctrl_av1_film_grain film_grain;
	struct v4l2_ctrl_av1_tile_group_entry *tile_entries = NULL;
	struct v4l2_ext_control controls[4];
	unsigned int n = 0;
	unsigned int i;
	unsigned int alloc_tiles;
	int rc;

	(void)context;

	/*
	 * AV1 film_grain link: when apply_grain=1, ffmpeg-vaapi allocates a
	 * separate display surface (current_display_picture) from the decode
	 * surface (current_frame). vpu981 HW applies grain inline to the
	 * decode CAPTURE buffer, so the consumable data is in current_frame's
	 * slot. ffmpeg then calls vaGetImage on current_display_picture which
	 * has no slot bound. Link the display surface back to the decode
	 * surface so copy_surface_to_image can borrow destination_data[].
	 */
	if (picture->current_display_picture != VA_INVALID_SURFACE &&
	    picture->current_display_picture != picture->current_frame) {
		struct object_surface *display_surface =
			SURFACE(driver_data, picture->current_display_picture);
		if (display_surface != NULL)
			display_surface->linked_decode_surface_id =
				picture->current_frame;
	}

	if (num_tiles > AV1_MAX_TILES)
		num_tiles = AV1_MAX_TILES;

	/* DYNAMIC_ARRAY size = MAX(num_tiles, 1) per Janet v2 Q1
	 * amendment — kernel UB on size=0. */
	alloc_tiles = num_tiles > 0 ? num_tiles : 1;
	tile_entries = calloc(alloc_tiles, sizeof(*tile_entries));
	if (tile_entries == NULL)
		return -1;

	for (i = 0; i < num_tiles; i++) {
		VASliceParameterBufferAV1 *slice =
			&surface_object->params.av1.tile_group_entries[i];
		tile_entries[i].tile_offset = slice->slice_data_offset;
		tile_entries[i].tile_size = slice->slice_data_size;
		tile_entries[i].tile_row = (uint8_t)slice->tile_row;
		tile_entries[i].tile_col = (uint8_t)slice->tile_column;
	}

	av1_fill_sequence(picture, &sequence);
	av1_fill_frame(picture, &frame);

	/*
	 * Phase 2.1 + frame-2 divergence fix: wire reference_frame_ts[].
	 * VAAPI exposes ref_frame_map[8] as VASurfaceIDs; the kernel needs
	 * v4l2-style timestamps to cross-reference the corresponding
	 * CAPTURE buffers (set on the OUTPUT buffer at QBUF time per
	 * picture.c::EndPicture, via surface_object->timestamp). Mirrors
	 * the vp9.c:614-628 pattern, scaled to AV1's 8 ref slots.
	 *
	 * VA_INVALID_SURFACE entries stay at the calloc'd zero timestamp
	 * (kernel reads zero, doesn't try to dereference).
	 */
	/*
	 * Empirical: DPB-slot iteration (i over ref_frame_map[i]) gives
	 * better correctness than ref-name iteration via ref_frame_idx[].
	 * Tried the ref-name reindex (Kwiboo convention via FFmpeg s->ref[i])
	 * and lost frames that previously PASSed (3/10 → 1/10) — so the V4L2
	 * uAPI semantic here may be DPB-slot-indexed despite the AV1 spec
	 * lexicon. Phase 3 open question pending kernel-side disambiguation.
	 */
	for (i = 0; i < BACKEND_AV1_TOTAL_REFS_PER_FRAME; i++) {
		VASurfaceID ref_id = picture->ref_frame_map[i];
		struct object_surface *ref_surface;
		uint64_t ts;
		if (ref_id == VA_INVALID_SURFACE)
			continue;
		ref_surface = SURFACE(driver_data, ref_id);
		if (ref_surface == NULL)
			continue;
		ts = v4l2_timeval_to_ns(&ref_surface->timestamp);
		if (ts == 0 &&
		    ref_surface->linked_decode_surface_id != VA_INVALID_SURFACE) {
			struct object_surface *dec =
				SURFACE(driver_data,
					ref_surface->linked_decode_surface_id);
			if (dec != NULL) {
				ts = v4l2_timeval_to_ns(&dec->timestamp);
				frame.order_hints[i] = dec->av1_order_hint;
			}
		} else {
			frame.order_hints[i] = ref_surface->av1_order_hint;
		}
		frame.reference_frame_ts[i] = ts;
	}

	/* Phase 3: record this frame's order_hint on the surface so the
	 * NEXT frame's ref-loop can populate order_hints[] for slots that
	 * reference us. */
	surface_object->av1_order_hint = picture->order_hint;
	/* Also propagate to the linked display surface (if any), since
	 * future frames' ref_frame_map[] may point at either. */
	if (picture->current_display_picture != VA_INVALID_SURFACE &&
	    picture->current_display_picture != picture->current_frame) {
		struct object_surface *disp =
			SURFACE(driver_data, picture->current_display_picture);
		if (disp != NULL)
			disp->av1_order_hint = picture->order_hint;
	}

	if (driver_data->has_av1_film_grain)
		av1_fill_film_grain(picture, &film_grain);

	controls[n++] = (struct v4l2_ext_control){
		.id = V4L2_CID_STATELESS_AV1_SEQUENCE,
		.ptr = &sequence,
		.size = sizeof(sequence),
	};
	controls[n++] = (struct v4l2_ext_control){
		.id = V4L2_CID_STATELESS_AV1_FRAME,
		.ptr = &frame,
		.size = sizeof(frame),
	};
	controls[n++] = (struct v4l2_ext_control){
		.id = V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY,
		.ptr = tile_entries,
		.size = sizeof(*tile_entries) * alloc_tiles,
	};
	if (driver_data->has_av1_film_grain) {
		controls[n++] = (struct v4l2_ext_control){
			.id = V4L2_CID_STATELESS_AV1_FILM_GRAIN,
			.ptr = &film_grain,
			.size = sizeof(film_grain),
		};
	}

	rc = v4l2_set_controls(driver_data->video_fd,
			       surface_object->request_fd,
			       controls, n);

	free(tile_entries);

	if (rc < 0) {
		request_log("ampere-av1: VIDIOC_S_EXT_CTRLS failed rc=%d\n", rc);
		return -1;
	}

	return 0;
}
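
Two small computations in `av1_set_controls` carry most of the subtlety: the tile-array size, which is clamped to a maximum and never allowed to be zero, and the timestamp used to match reference frames to CAPTURE buffers. The sketch below restates both under stated assumptions. `ex_timeval`, `ex_timeval_to_ns`, and `ex_alloc_tiles` are hypothetical names, and the nanosecond arithmetic is what I understand the kernel's `v4l2_timeval_to_ns` helper to compute (seconds times 10^9 plus microseconds times 10^3), not a copy of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timeval, to keep the sketch portable. */
struct ex_timeval {
	int64_t tv_sec;
	int64_t tv_usec;
};

/* Convert a seconds/microseconds pair into a single nanosecond count. */
static uint64_t ex_timeval_to_ns(const struct ex_timeval *tv)
{
	assert(tv != NULL);
	return (uint64_t)tv->tv_sec * 1000000000ULL +
	       (uint64_t)tv->tv_usec * 1000ULL;
}

/* Clamp the tile count to max_tiles, then force at least one element. */
static unsigned int ex_alloc_tiles(unsigned int num_tiles, unsigned int max_tiles)
{
	if (num_tiles > max_tiles)
		num_tiles = max_tiles;
	return num_tiles > 0 ? num_tiles : 1;
}
```

A surface that was never queued has a zero timestamp, which is why the reference loop treats `ts == 0` as the cue to consult the linked decode surface instead.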
@@ -0,0 +1,45 @@
/*
 * Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
 *
 * ampere-av1-enablement Phase 2: AV1 codec dispatcher header for libva-
 * v4l2-request-fourier. Mirrors vp9.h shape — single set_controls entry
 * point that translates surface->params.av1.* VAAPI structures into a
 * batch of V4L2_CID_STATELESS_AV1_{SEQUENCE,FRAME,TILE_GROUP_ENTRY,
 * FILM_GRAIN} controls + the underlying request_fd / OUTPUT plane setup.
 *
 * V4L2 target: V4L2_PIX_FMT_AV1_FRAME on the vpu981 hantro instance
 * (RK3588's dedicated AV1 decoder).
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE FOR ANY CLAIM,
 * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
 * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
 * THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

#ifndef _AV1_H_
#define _AV1_H_

#include "context.h"
#include "request.h"
#include "surface.h"

int av1_set_controls(struct request_data *driver_data,
		     struct object_context *context,
		     struct object_surface *surface);

#endif /* _AV1_H_ */
-14
@@ -37,28 +37,14 @@ unsigned int pixelformat_for_profile(VAProfile profile)
	case VAProfileH264ConstrainedBaseline:
	case VAProfileH264MultiviewHigh:
	case VAProfileH264StereoHigh:
	case VAProfileH264High10:
		return V4L2_PIX_FMT_H264_SLICE;
	case VAProfileHEVCMain:
	case VAProfileHEVCMain10:
		return V4L2_PIX_FMT_HEVC_SLICE;
	case VAProfileVP8Version0_3:
		return V4L2_PIX_FMT_VP8_FRAME;
	case VAProfileVP9Profile0:
		return V4L2_PIX_FMT_VP9_FRAME;
	case VAProfileAV1Profile0:
		/*
		 * ampere-av1-enablement Phase 2: AV1 Profile 0 routes to
		 * vpu981 (RK3588's dedicated AV1 hantro). Per-codec ctrl
		 * dispatch (V4L2_CID_STATELESS_AV1_*) is NOT YET WIRED on
		 * master — vainfo lists the profile + RequestCreateConfig
		 * succeeds, but consumers that submit decode buffers hit
		 * a NOP path until the per-codec dispatch lands. The
		 * av1-iter1 operator branch has Phase 3 bit-exact bring-up
		 * underway; this commit gives master the bare enumeration +
		 * fd-routing layer so consumers like ffmpeg-vaapi at least
		 * see VAProfileAV1Profile0 today.
		 */
		return V4L2_PIX_FMT_AV1_FRAME;
	default:
		return 0;
+26
-92
@@ -59,37 +59,34 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
	case VAProfileH264ConstrainedBaseline:
	case VAProfileH264MultiviewHigh:
	case VAProfileH264StereoHigh:
	case VAProfileH264High10:
		// FIXME
		// iter39: Hi10P routed through same H264 path; bit-depth gating
		// happens in context.c synthetic SPS and CAPTURE pix_fmt
		// selection.
		break;
	case VAProfileMPEG2Simple:
	case VAProfileMPEG2Main:
		// fresnel-fourier iter1: MPEG-2 enabled. Same shape as H.264
		// above — no profile-specific config validation in the libva
		// backend; validation happens at vaCreateContext / control
		// submission time.
		break;
	case VAProfileHEVCMain:
	case VAProfileHEVCMain10:
		// iter39: Main10 routed through same HEVC path; bit-depth
		// gating happens in context.c.
		// fresnel-fourier iter2: HEVC enabled. Same shape as H.264/
		// MPEG-2 above — no profile-specific config validation in the
		// libva backend; validation happens at vaCreateContext / control
		// submission time.
		break;
	case VAProfileVP8Version0_3:
		// fresnel-fourier iter3: VP8 enabled. Same shape as iter1+iter2
		// above — no profile-specific config validation in the libva
		// backend; validation happens at vaCreateContext / control
		// submission time.
		break;
	case VAProfileVP9Profile0:
		// fresnel-fourier iter4: VP9 Profile 0 enabled on rkvdec.
		// VP9 Profile 2 is NOT supported by RK3399 rkvdec (kernel ctrl
		// cap is V4L2_MPEG_VIDEO_VP9_PROFILE_0). Do not add a case for
		// VAProfileVP9Profile2 — kernel will reject.
		// Same shape — no profile-specific validation here.
		break;
	case VAProfileAV1Profile0:
		// ampere-av1-enablement Phase 2: AV1 Profile 0 routes to
		// vpu981 (RK3588 dedicated AV1 hantro instance). Decode-side
		// ctrl dispatch (V4L2_CID_STATELESS_AV1_*) is NOT YET WIRED
		// on master — vainfo will list the profile + CreateConfig
		// succeeds, but consumers that submit decode buffers hit a
		// NOP path until av1.{c,h} dispatch scaffolding is ported
		// from the av1-iter1 operator branch (where Phase 3-5 has
		// 3/10 frames bit-exact already).
		// ampere-av1-enablement: AV1 Profile 0 enabled on vpu981.
		// Same shape — no profile-specific validation here.
		break;
	default:
		return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
@@ -126,15 +123,7 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
	 */
	config_object->pixelformat = pixelformat_for_profile(profile);
	config_object->attributes[0].type = VAConfigAttribRTFormat;
	/*
	 * iter39: 10-bit profiles advertise YUV420_10. ffmpeg-vaapi reads
	 * this attribute on vaGetConfigAttributes and refuses surface
	 * allocation if it mismatches the input bitstream's bit depth.
	 */
	if (profile == VAProfileH264High10 || profile == VAProfileHEVCMain10)
		config_object->attributes[0].value = VA_RT_FORMAT_YUV420_10;
	else
		config_object->attributes[0].value = VA_RT_FORMAT_YUV420;
	config_object->attributes[0].value = VA_RT_FORMAT_YUV420;
	config_object->attributes_count = 1;

	for (i = 1; i < attributes_count; i++) {
@@ -172,20 +161,14 @@ VAStatus RequestDestroyConfig(VADriverContextP context, VAConfigID config_id)
static bool any_fd_supports_output_format(struct request_data *driver_data,
					  unsigned int fmt)
{
	int fds[6] = {
	int fds[4] = {
		driver_data->video_fd,
		driver_data->video_fd_rkvdec,
		driver_data->video_fd_hantro,
		driver_data->video_fd_rpi_hevc_dec, /* iter40 */
		driver_data->video_fd_vpu981, /* ampere-av1 Phase 2 */
#ifdef HAVE_DAEDALUS_V4L2
		driver_data->video_fd_daedalus, /* LIBVA-1: H.264/VP9/AV1 */
#else
		-1,
#endif
		driver_data->video_fd_vpu981,
	};
	int i;
	for (i = 0; i < 6; i++) {
	for (i = 0; i < 4; i++) {
		if (fds[i] < 0) continue;
		if (v4l2_find_format(fds[i], V4L2_BUF_TYPE_VIDEO_OUTPUT, fmt))
			return true;
@@ -215,48 +198,11 @@ VAStatus RequestQueryConfigProfiles(VADriverContextP context,
		profiles[index++] = VAProfileH264ConstrainedBaseline;
		profiles[index++] = VAProfileH264MultiviewHigh;
		profiles[index++] = VAProfileH264StereoHigh;
		/*
		 * iter39 Phase 7 close (Option B): VAProfileH264High10
		 * DELIBERATELY NOT ENUMERATED.
		 *
		 * Hi10P on Rockchip V4L2 stateless decoders requires:
		 * - HW: ✓ both RK3399 + RK3588 capable (per Rockchip
		 * datasheets — 4K 10-bit H.264 line items)
		 * - Kernel: ✓ Karlman v6→v10 series merged in
		 * mmind v7.0 (rkvdec_h264_decoded_fmts[] has
		 * NV15/NV20; ctrl cfg.max=HIGH_422_INTRA;
		 * bit_depth_luma_minus8==2 path live in
		 * rkvdec-h264-common.c:196)
		 * - Userspace ffmpeg: ✗ ffmpeg-v4l2-request-fourier
		 * lacks the userspace plumbing for Hi10P;
		 * kdirect path fails with EINVAL, libva path
		 * returns CAPTURE buffer all-zero.
		 *
		 * Empirically verified on both fresnel (RK3399) and ampere
		 * (RK3588) 2026-05-17 — same all-zero / EINVAL failure
		 * mode on both. The backend infrastructure (codec.c,
		 * context.c, image.c, surface.c, nv15.c) is RETAINED for
		 * when the upstream ffmpeg gap closes — just re-add the
		 * profiles[index++] line and bump the (-5) guard back to
		 * (-6). See memory feedback_rk3399_h264_hi10p_advertised_not_functional
		 * for the empirical evidence.
		 */
	}

	found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_HEVC_SLICE);
	if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1)) {
	if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
		profiles[index++] = VAProfileHEVCMain;
		/*
		 * iter39 Phase 7 close (Option B): VAProfileHEVCMain10
		 * DELIBERATELY NOT ENUMERATED. Same reasoning as
		 * VAProfileH264High10 above — kernel + HW ready,
		 * userspace ffmpeg V4L2 hwaccel plumbing not. Untested
		 * specifically due to no Main10 fixture (system x265
		 * is 8-bit-only on Arch ARM), but same kernel/HW/
		 * userspace stack so same gap likely applies. Re-enable
		 * when ffmpeg-vaapi → V4L2 hwaccel adds 10-bit HEVC.
		 */
	}

	found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_VP8_FRAME);
	if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
@@ -267,11 +213,11 @@ VAStatus RequestQueryConfigProfiles(VADriverContextP context,
		profiles[index++] = VAProfileVP9Profile0;

	/*
	 * ampere-av1-enablement Phase 2: AV1 Profile 0 advertised when
	 * vpu981 (RK3588 dedicated AV1 hantro) is probed. MAX_PROFILES
	 * bumped to 14 in request.h to safely fit even if iter39 Option
	 * B is reverted (Hi10P + Main10 back in enumeration → 13 total
	 * with AV1, the `< MAX - 1` guard then needs MAX ≥ 14).
	 * ampere-av1-enablement: AV1 routes to vpu981 (advertised via the
	 * new video_fd_vpu981 slot). V4L2_REQUEST_MAX_PROFILES=11 is now
	 * EXACTLY full with this addition. Future profile additions
	 * require bumping that constant + verifying libva consumers'
	 * profiles[] sizing.
	 */
	found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_AV1_FRAME);
	if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
@@ -295,9 +241,7 @@ VAStatus RequestQueryConfigEntrypoints(VADriverContextP context,
	case VAProfileH264ConstrainedBaseline:
	case VAProfileH264MultiviewHigh:
	case VAProfileH264StereoHigh:
	case VAProfileH264High10:
	case VAProfileHEVCMain:
	case VAProfileHEVCMain10:
	case VAProfileVP8Version0_3:
	case VAProfileVP9Profile0:
	case VAProfileAV1Profile0:
@@ -354,17 +298,7 @@ VAStatus RequestGetConfigAttributes(VADriverContextP context, VAProfile profile,
	for (i = 0; i < attributes_count; i++) {
		switch (attributes[i].type) {
		case VAConfigAttribRTFormat:
			/*
			 * iter39: 10-bit profiles publish YUV420_10. Profile-
			 * less query (this is invoked from vaGetConfigAttributes
			 * before vaCreateConfig) routes off the `profile` arg
			 * directly — same gating as RequestCreateConfig.
			 */
			if (profile == VAProfileH264High10 ||
			    profile == VAProfileHEVCMain10)
				attributes[i].value = VA_RT_FORMAT_YUV420_10;
			else
				attributes[i].value = VA_RT_FORMAT_YUV420;
			attributes[i].value = VA_RT_FORMAT_YUV420;
			break;
		default:
			attributes[i].value = VA_ATTRIB_NOT_SUPPORTED;
+37
-201
@@ -42,9 +42,6 @@

#include <hevc-ctrls.h>

#include "nv15.h" /* iter40: fallback V4L2_PIX_FMT_NV15 define for Pi 5
		   * Debian headers that ship NC12 but not NV15. */
#include "nv12_col128.h" /* iter40: NC12 detile primitive + UV offset helper */
#include "utils.h"
#include "v4l2.h"

@@ -110,79 +107,20 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
	 * the driver_data and is cached across CreateContext cycles. The
	 * probe doesn't require any prior S_FMT — v4l2_find_format
	 * enumerates the device's supported formats directly.
	 *
	 * iter39: choose NV15 (10-bit packed) for Hi10P / Main10 profiles,
	 * NV12 (8-bit) otherwise. If the cached video_format doesn't match
	 * the profile's bit-depth requirement, invalidate and re-probe —
	 * sibling pattern to iter38's device-switch invalidation in
	 * request_switch_device_for_profile().
	 */
	{
		bool want_10bit = (config_object->profile == VAProfileH264High10 ||
				   config_object->profile == VAProfileHEVCMain10);
		bool is_rpi = (driver_data->video_fd ==
			       driver_data->video_fd_rpi_hevc_dec);
		/*
		 * iter40: per-fd preferred pixelformat. rpi-hevc-dec exposes
		 * NC12 (8-bit) / NC30 (10-bit), not NV12 / NV15.
		 */
		unsigned int want_pixfmt;
		if (is_rpi)
			want_pixfmt = want_10bit ? V4L2_PIX_FMT_NV12_10_COL128
						 : V4L2_PIX_FMT_NV12_COL128;
		else
			want_pixfmt = want_10bit ? V4L2_PIX_FMT_NV15
						 : V4L2_PIX_FMT_NV12;
		if (driver_data->video_format &&
		    driver_data->video_format->v4l2_format != want_pixfmt &&
		    driver_data->video_format->v4l2_format != V4L2_PIX_FMT_SUNXI_TILED_NV12)
			driver_data->video_format = NULL;
	}
	if (!driver_data->video_format) {
		bool want_10bit = (config_object->profile == VAProfileH264High10 ||
				   config_object->profile == VAProfileHEVCMain10);
		bool is_rpi = (driver_data->video_fd ==
			       driver_data->video_fd_rpi_hevc_dec);
		video_format = NULL;
		found = v4l2_find_format(driver_data->video_fd,
					 V4L2_BUF_TYPE_VIDEO_CAPTURE,
					 V4L2_PIX_FMT_SUNXI_TILED_NV12);
		if (found)
			video_format = video_format_find(V4L2_PIX_FMT_SUNXI_TILED_NV12);

		if (is_rpi) {
			/*
			 * iter40: rpi-hevc-dec CAPTURE is NC12 (8-bit SAND
			 * 128-pixel-wide column tile) or NC30 (10-bit variant).
			 * Direct map; the kernel exposes BOTH formats in
			 * VIDIOC_ENUM_FMT(CAPTURE_MPLANE) without a pre-SPS
			 * step (verified Phase 0 strace), so find_format would
			 * also succeed — skip it for symmetry with the NV15
			 * iter39 branch below.
			 */
			video_format = video_format_find(
				want_10bit ? V4L2_PIX_FMT_NV12_10_COL128
					   : V4L2_PIX_FMT_NV12_COL128);
		} else if (!want_10bit) {
			found = v4l2_find_format(driver_data->video_fd,
						 V4L2_BUF_TYPE_VIDEO_CAPTURE,
						 V4L2_PIX_FMT_SUNXI_TILED_NV12);
			if (found)
				video_format = video_format_find(V4L2_PIX_FMT_SUNXI_TILED_NV12);

			found = v4l2_find_format(driver_data->video_fd,
						 V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
						 V4L2_PIX_FMT_NV12);
			if (found)
				video_format = video_format_find(V4L2_PIX_FMT_NV12);
		} else {
			/*
			 * iter39 fresnel fix: rkvdec only advertises NV15 in
			 * VIDIOC_ENUM_FMT(CAPTURE) AFTER S_FMT(OUTPUT) +
			 * S_EXT_CTRLS(SPS) resolve image_fmt to 420_10BIT.
			 * Before that, only NV12 is enumerated. Pre-finding
			 * NV15 always fails. Skip the find_format check and
			 * directly map to our NV15 video_format entry; the
			 * later S_FMT(CAPTURE) commits the actual NV15 mode
			 * once the synthetic SPS sets bit_depth_luma_minus8=2.
			 */
			video_format = video_format_find(V4L2_PIX_FMT_NV15);
		}
		found = v4l2_find_format(driver_data->video_fd,
					 V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
					 V4L2_PIX_FMT_NV12);
		if (found)
			video_format = video_format_find(V4L2_PIX_FMT_NV12);

		if (video_format == NULL) {
			status = VA_STATUS_ERROR_OPERATION_FAILED;
@@ -193,10 +131,6 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
	}
	video_format = driver_data->video_format;

	/* iter39: session-wide flag drives image.c reporting + unpack. */
	driver_data->is_10bit = (config_object->profile == VAProfileH264High10 ||
				 config_object->profile == VAProfileHEVCMain10);

	output_type = v4l2_type_video_output(video_format->v4l2_mplane);
	capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);

@@ -241,22 +175,7 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
	 * CAPTURE (sanity read-back, matches what S_FMT committed).
	 */
	{
		/*
		 * iter40: take the CAPTURE pixelformat from the resolved
		 * video_format slot — that's per-fd, per-bit-depth correct.
		 * rkvdec 8-bit → NV12
		 * rkvdec 10-bit → NV15
		 * hantro 8-bit → NV12
		 * rpi-hevc-dec → NC12 (V4L2_PIX_FMT_NV12_COL128)
		 * Pre-iter40 this was hardcoded NV12/NV15 — the rpi-hevc-dec
		 * fd would then have S_FMT(NV12) issued, and the kernel
		 * "helpfully" substituted V4L2_PIX_FMT_NV12MT_COL128 (the
		 * MULTI-PLANE-NON-CONTIGUOUS variant) instead of the
		 * SINGLE-PLANE NC12 we wanted, breaking cap_pool QUERYBUF
		 * downstream (Phase 7 iter40 first-run discovery).
		 */
		unsigned int capture_pixelformat =
			driver_data->video_format->v4l2_format;
		unsigned int capture_pixelformat = V4L2_PIX_FMT_NV12;
		rc = v4l2_set_format(driver_data->video_fd, capture_type,
				     capture_pixelformat, picture_width,
				     picture_height);
@@ -313,42 +232,16 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
	 * the device-init DECODE_MODE + START_CODE block below ALSO uses
	 * void-cast best-effort, so this is consistent with prior pattern.
	 */
	/*
	 * iter40 (Phase 5 review F6): the synthetic-SPS pre-seed is an
	 * rkvdec-specific quirk fix (the -EBUSY-on-CAPTURE-busy bug in
	 * rkvdec_s_ctrl). rpi-hevc-dec does NOT need it and uses a
	 * different submission ordering (Phase 0 strace: S_FMT_OUTPUT →
	 * REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON,
	 * with per-frame SPS via S_EXT_CTRLS class=0xf010000). Sending a
	 * stale dummy SPS at context-init time would leave rpi-hevc-dec's
	 * internal state on the dummy until the first real per-frame SPS
	 * arrives — exact behavior unknown but a known divergence from
	 * kdirect.
	 *
	 * Skip pre-seed when the active fd is rpi-hevc-dec. rkvdec /
	 * hantro paths unchanged.
	 */
	if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec) {
		/*
		 * iter39: 10-bit profiles set bit_depth_luma_minus8 = 2 in
		 * the synthetic SPS so rkvdec's get_image_fmt resolves to
		 * RKVDEC_IMG_FMT_420_10BIT (per rkvdec-h264-common.c:196 +
		 * rkvdec-hevc-common.c:467). Image_fmt resolution depends
		 * only on bit_depth_luma_minus8 and chroma_format_idc;
		 * profile_idc is ignored for image_fmt and v4l2_ctrl_hevc_sps
		 * has no profile_idc field at all.
		 */
		bool ten = driver_data->is_10bit;
	{
		switch (config_object->profile) {
		case VAProfileHEVCMain:
		case VAProfileHEVCMain10: {
		case VAProfileHEVCMain: {
			struct v4l2_ctrl_hevc_sps dummy_sps;
			struct v4l2_ext_control dummy_ctrl;

			memset(&dummy_sps, 0, sizeof(dummy_sps));
			dummy_sps.chroma_format_idc = 1; /* 4:2:0 */
			dummy_sps.bit_depth_luma_minus8 = ten ? 2 : 0;
			dummy_sps.bit_depth_chroma_minus8 = ten ? 2 : 0;
			dummy_sps.bit_depth_luma_minus8 = 0; /* 8-bit */
			dummy_sps.bit_depth_chroma_minus8 = 0;
			dummy_sps.pic_width_in_luma_samples = picture_width;
			dummy_sps.pic_height_in_luma_samples = picture_height;

@@ -363,20 +256,19 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
		case VAProfileH264High:
		case VAProfileH264ConstrainedBaseline:
		case VAProfileH264MultiviewHigh:
		case VAProfileH264StereoHigh:
		case VAProfileH264High10: {
		case VAProfileH264StereoHigh: {
			struct v4l2_ctrl_h264_sps dummy_sps;
			struct v4l2_ext_control dummy_ctrl;

			memset(&dummy_sps, 0, sizeof(dummy_sps));
			dummy_sps.chroma_format_idc = 1; /* 4:2:0 */
			dummy_sps.bit_depth_luma_minus8 = ten ? 2 : 0;
			dummy_sps.bit_depth_chroma_minus8 = ten ? 2 : 0;
			dummy_sps.bit_depth_luma_minus8 = 0;
			dummy_sps.bit_depth_chroma_minus8 = 0;
			dummy_sps.pic_width_in_mbs_minus1 =
				(picture_width + 15) / 16 - 1;
			dummy_sps.pic_height_in_map_units_minus1 =
				(picture_height + 15) / 16 - 1;
			dummy_sps.profile_idc = ten ? 110 : 100; /* High10 : High */
			dummy_sps.profile_idc = 100; /* High */
			dummy_sps.level_idc = 41;
			/*
			 * FRAME_MBS_ONLY required: rkvdec_h264_validate_sps
@@ -397,7 +289,7 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
		default:
			break;
		}
	} /* iter40: end of pre-seed-skip-on-rpi-hevc-dec guard */
	}

	destination_planes_count = video_format->planes_count;

@@ -431,39 +323,10 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
||||
* changed by BeginPicture's slot acquisition.
|
||||
*/
|
||||
if (video_format->v4l2_buffers_count == 1) {
|
||||
if (video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128) {
|
||||
/*
|
||||
* iter40: NC12 SAND layout: Y plane size is
|
||||
* NUM_COLUMNS * TILE_W * ALIGN(height, 8) (= linear
|
||||
* NV12 Y for column-aligned widths), UV plane is half.
|
||||
* The kernel-reported destination_bytesperlines[0] is
|
||||
* the COLUMN stride (ALIGN(height,8)*3/2), not the
|
||||
* linear Y stride — using it × format_height gives the
|
||||
* wrong intra-buffer UV offset (destination_offsets[1]
|
||||
* derives from destination_sizes[0] in
|
||||
* surface_fill_format_uniform).
|
||||
*
|
||||
* Use format_width/format_height (kernel-returned from
|
||||
* G_FMT) not picture_width/height (caller request),
|
||||
* because the kernel applies its own ALIGN rules; the
|
||||
* UV plane location is keyed off the kernel layout.
|
||||
*/
|
||||
unsigned int uv_off = nv12_col128_uv_plane_offset(
|
||||
format_width, format_height);
|
||||
destination_sizes[0] = uv_off;
|
||||
for (j = 1; j < destination_planes_count; j++)
|
||||
destination_sizes[j] = uv_off / 2;
|
||||
request_log("iter40: NC12 sizes pic=%ux%u fmt=%ux%u bpl=%u uv_off=%u sizeimage(kernel)=%u\n",
|
||||
picture_width, picture_height,
|
||||
format_width, format_height,
|
||||
destination_bytesperlines[0], uv_off,
|
||||
destination_bytesperlines[0] * format_height);
|
||||
} else {
|
||||
destination_sizes[0] = destination_bytesperlines[0] *
|
||||
format_height;
|
||||
for (j = 1; j < destination_planes_count; j++)
|
||||
destination_sizes[j] = destination_sizes[0] / 2;
|
||||
}
|
||||
destination_sizes[0] = destination_bytesperlines[0] *
|
||||
format_height;
|
||||
for (j = 1; j < destination_planes_count; j++)
|
||||
destination_sizes[j] = destination_sizes[0] / 2;
|
||||
}
|
||||
|
||||
/*
|
||||
@@ -597,18 +460,6 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
||||
* + ANNEX_B (only supported menu values per Phase 0 v4l2_inventory).
|
||||
*/
|
||||
{
|
||||
/*
|
||||
* iter40: per-driver HEVC start_code menu value. rkvdec /
|
||||
* hantro path uses ANNEX_B + start-code-prepended payload.
|
||||
* rpi-hevc-dec uses NONE — confirmed empirically Phase 7
|
||||
* (any other mode → V4L2_BUF_FLAG_ERROR on every CAPTURE
|
||||
* DQBUF, all-zero output). kdirect's strace also shows
|
||||
* start_code=0 on rpi-hevc-dec. Both are accepted by the
|
||||
* driver's QUERY_EXT_CTRL menu (min=0 max=1), but only NONE
|
||||
* actually drives correct decode on the Pi.
|
||||
*/
|
||||
bool is_rpi = (driver_data->video_fd ==
|
||||
driver_data->video_fd_rpi_hevc_dec);
|
||||
struct v4l2_ext_control hevc_dev_ctrls[2] = {
|
||||
{
|
||||
.id = V4L2_CID_STATELESS_HEVC_DECODE_MODE,
|
||||
@@ -616,9 +467,7 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
||||
},
|
||||
{
|
||||
.id = V4L2_CID_STATELESS_HEVC_START_CODE,
|
||||
.value = is_rpi
|
||||
? 0 /* V4L2_STATELESS_HEVC_START_CODE_NONE */
|
||||
: V4L2_STATELESS_HEVC_START_CODE_ANNEX_B,
|
||||
.value = V4L2_STATELESS_HEVC_START_CODE_ANNEX_B,
|
||||
},
|
||||
};
|
||||
(void)v4l2_set_controls(driver_data->video_fd, -1,
|
||||
@@ -651,29 +500,18 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
||||
* commit will replace this hardcoded assignment with a runtime
|
||||
* read of the kernel's accepted START_CODE value.
|
||||
*/
|
||||
{
|
||||
bool is_rpi = (driver_data->video_fd ==
|
||||
driver_data->video_fd_rpi_hevc_dec);
|
||||
switch (config_object->profile) {
|
||||
case VAProfileH264Main:
|
||||
case VAProfileH264High:
|
||||
case VAProfileH264ConstrainedBaseline:
|
||||
case VAProfileH264MultiviewHigh:
|
||||
case VAProfileH264StereoHigh:
|
||||
context_object->h264_start_code = true;
|
||||
break;
|
||||
case VAProfileHEVCMain:
|
||||
/* iter40: rpi-hevc-dec rejects start-code-prepended
|
||||
* payload (DQBUF error flag on every CAPTURE buffer).
|
||||
* Gate to match the per-driver START_CODE menu value
|
||||
* set above: NONE on rpi → no prepend; ANNEX_B on
|
||||
* rkvdec → prepend. */
|
||||
context_object->h264_start_code = !is_rpi;
|
||||
break;
|
||||
default:
|
||||
context_object->h264_start_code = false;
|
||||
break;
|
||||
}
|
||||
switch (config_object->profile) {
|
||||
case VAProfileH264Main:
|
||||
case VAProfileH264High:
|
||||
case VAProfileH264ConstrainedBaseline:
|
||||
case VAProfileH264MultiviewHigh:
|
||||
case VAProfileH264StereoHigh:
|
||||
case VAProfileHEVCMain:
|
||||
context_object->h264_start_code = true;
|
||||
break;
|
||||
default:
|
||||
context_object->h264_start_code = false;
|
||||
break;
|
||||
}
|
||||
|
||||
rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
|
||||
@@ -798,8 +636,6 @@ VAStatus RequestDestroyContext(VADriverContextP context, VAContextID context_id)
|
||||
* The next CreateContext re-populates the cache.
|
||||
*/
|
||||
driver_data->fmt_valid = false;
|
||||
/* iter39: clear 10-bit session flag — next CreateContext re-sets. */
|
||||
driver_data->is_10bit = false;
|
||||
|
||||
return VA_STATUS_SUCCESS;
|
||||
}
|
||||
|
||||
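The single-buffer sizing in the hunks above can be sketched in isolation. This is a minimal illustration, assuming a linear NV12 layout where the kernel-reported `bytesperline` is the luma row stride; the helper name `nv12_linear_plane_sizes` is hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the driver): derive per-plane sizes for a
 * linear NV12 buffer, mirroring the
 *   destination_sizes[0] = bytesperline * height;
 *   destination_sizes[j] = destination_sizes[0] / 2;
 * pattern in the diff. Only valid when bytesperline is a linear row stride.
 */
static void nv12_linear_plane_sizes(unsigned int bytesperline,
				    unsigned int height,
				    unsigned int planes_count,
				    unsigned int *sizes)
{
	unsigned int j;

	assert(planes_count >= 1 && sizes != NULL);

	/* Luma plane: one row stride per line. */
	sizes[0] = bytesperline * height;

	/* Interleaved 4:2:0 chroma: half the luma size. */
	for (j = 1; j < planes_count; j++)
		sizes[j] = sizes[0] / 2;
}
```

For a 1280x720 buffer with a linear stride of 1280 this gives 921600 bytes of luma and 460800 bytes of chroma. Feeding the NC12 column stride (1080 for 1280x720, per the comments in the diff) into the same formula would give 1080 * 720 = 777600, which is why the removed NC12 branch computed the UV offset separately.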
-53
@@ -827,63 +827,10 @@ int h264_set_controls(struct request_data *driver_data,
|
||||
|
||||
dpb_update(context, &surface->params.h264.picture);
|
||||
|
||||
/*
|
||||
* Dump the raw VAAPI fields at the libva boundary so issue #8
|
||||
* follow-up can disambiguate "ffmpeg-vaapi didn't populate" from
|
||||
* "downstream consumer (daedalus_v4l2 wire protocol) corrupted the
|
||||
* value". One-line; safe to leave in — costs a single printf per frame.
|
||||
*/
|
||||
request_log("h264_set_controls: VAProfile=%d seq_fields=0x%08x pic_fields=0x%08x num_ref_frames=%u bit_depth_luma_m8=%u bit_depth_chroma_m8=%u w_mbs_m1=%u h_mbs_m1=%u\n",
|
||||
(int)profile,
|
||||
surface->params.h264.picture.seq_fields.value,
|
||||
surface->params.h264.picture.pic_fields.value,
|
||||
surface->params.h264.picture.num_ref_frames,
|
||||
surface->params.h264.picture.bit_depth_luma_minus8,
|
||||
surface->params.h264.picture.bit_depth_chroma_minus8,
|
||||
surface->params.h264.picture.picture_width_in_mbs_minus1,
|
||||
surface->params.h264.picture.picture_height_in_mbs_minus1);
|
||||
|
||||
h264_va_picture_to_v4l2(driver_data, context, surface,
|
||||
&surface->params.h264.picture,
|
||||
&decode, &pps, &sps);
|
||||
|
||||
/*
|
||||
* max_num_ref_frames fallback. Some VAAPI clients (older ffmpeg-vaapi
|
||||
* paths, some daedalus_v4l2 consumers) leave VAPicture->num_ref_frames
|
||||
* at zero. Hardware decoders tolerate this; libavcodec-via-daedalus enforces
|
||||
* sps.max_num_ref_frames strictly and rejects every frame.
|
||||
*
|
||||
* Count valid DPB entries first (the bitstream-true reference count we
|
||||
* can see); fall back to a per-profile spec minimum if even that is 0.
|
||||
* See marfrit/libva-v4l2-request-fourier issue #8.
|
||||
*/
|
||||
if (sps.max_num_ref_frames == 0) {
|
||||
unsigned int valid = 0;
|
||||
unsigned int i;
|
||||
for (i = 0; i < 16; i++) {
|
||||
const VAPictureH264 *ref =
|
||||
&surface->params.h264.picture.ReferenceFrames[i];
|
||||
if (!(ref->flags & VA_PICTURE_H264_INVALID))
|
||||
valid++;
|
||||
}
|
||||
if (valid > 0) {
|
||||
sps.max_num_ref_frames = (uint8_t)valid;
|
||||
} else {
|
||||
switch (profile) {
|
||||
case VAProfileH264ConstrainedBaseline:
|
||||
sps.max_num_ref_frames = 1;
|
||||
break;
|
||||
case VAProfileH264Main:
|
||||
case VAProfileH264High:
|
||||
case VAProfileH264MultiviewHigh:
|
||||
case VAProfileH264StereoHigh:
|
||||
default:
|
||||
sps.max_num_ref_frames = 4;
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Populate the scaling matrix unconditionally: from VAAPI's
|
||||
* VAIQMatrixBufferH264 when the consumer sent one this frame
|
||||
|
||||
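The `max_num_ref_frames` fallback removed in the hunk above follows a two-step rule: count the valid reference entries, and only if none are valid fall back to a per-profile default. The sketch below restates that rule with stand-in types so it compiles without libva; `struct ref_entry`, `REF_INVALID_FLAG`, and `fallback_max_num_ref_frames` are illustrative names, with the flag standing in for `VA_PICTURE_H264_INVALID`.

```c
#include <assert.h>
#include <stdint.h>

#define REF_INVALID_FLAG 0x1u	/* stand-in for VA_PICTURE_H264_INVALID */
#define REF_FRAMES_MAX   16	/* ReferenceFrames[] length used in the diff */

/* Stand-in for the one VAPictureH264 field this logic reads. */
struct ref_entry {
	uint32_t flags;
};

/*
 * Return a replacement for a zero max_num_ref_frames:
 *  - the number of valid reference entries, if any are valid;
 *  - otherwise 1 for Constrained Baseline and 4 for everything else,
 *    matching the defaults in the removed code.
 */
static uint8_t fallback_max_num_ref_frames(const struct ref_entry *refs,
					   int is_constrained_baseline)
{
	unsigned int valid = 0;
	unsigned int i;

	assert(refs != 0);

	for (i = 0; i < REF_FRAMES_MAX; i++) {
		if (!(refs[i].flags & REF_INVALID_FLAG))
			valid++;
	}

	if (valid > 0)
		return (uint8_t)valid;

	return is_constrained_baseline ? 1 : 4;
}
```

The count is preferred over the default because it reflects what the client actually populated for this frame; the per-profile value is only a last resort.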
+10
-189
@@ -83,18 +83,6 @@
|
||||
#include "hevc-ctrls/v4l2-hevc-ext-controls.h"
|
||||
#include "h265_parser/gst/codecparsers/gsth265parser.h"
|
||||
|
||||
/*
|
||||
* VAAPI source arrays for HEVC ref/weight tables are sized 15
|
||||
* (VASliceParameterBufferHEVC::RefPicList[2][15],
|
||||
* delta_luma_weight_l0[15], luma_offset_l0[15], etc. — see
|
||||
* /usr/include/va/va_dec_hevc.h). V4L2_HEVC_DPB_ENTRIES_NUM_MAX
|
||||
* is 16; iterating to that bound over-reads the VAAPI source by
|
||||
* one element. Hidden by -O3 unrolling but manifests as a SEGV
|
||||
* under -O2 vectorisation (regression discovered in package
|
||||
* builds 2026-05-17). Cap all per-ref/weight loops at this.
|
||||
*/
|
||||
#define VA_HEVC_REF_LIST_LEN 15
|
||||
|
||||
#include "utils.h"
|
||||
#include "v4l2.h"
|
||||
|
||||
@@ -477,21 +465,13 @@ static void h265_fill_slice_params(VAPictureParameterBufferHEVC *picture,
|
||||
/* Q2: slice_segment_addr from VAAPI (was missing in old h265.c). */
|
||||
slice_params->slice_segment_addr = slice->slice_segment_address;
|
||||
|
||||
/*
|
||||
* Ref index arrays (DPB indices). For I-slices both are unused.
|
||||
*
|
||||
* Cap iteration at VAAPI source size (15) — V4L2_HEVC_DPB_ENTRIES_NUM_MAX
|
||||
* is 16, but VASliceParameterBufferHEVC::RefPicList is RefPicList[2][15].
|
||||
* Iterating to 16 reads one past the source array; with -O2 GCC vectorises
|
||||
* the copy and the over-read produces a real SEGV (manifested in package
|
||||
* builds with Arch makepkg CFLAGS, plain -O3 release builds hid it).
|
||||
*/
|
||||
for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
|
||||
/* Ref index arrays (DPB indices). For I-slices both are unused. */
|
||||
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX &&
|
||||
slice_type != V4L2_HEVC_SLICE_TYPE_I; i++) {
|
||||
if (i < (slice->num_ref_idx_l0_active_minus1 + 1U))
|
||||
slice_params->ref_idx_l0[i] = slice->RefPicList[0][i];
|
||||
}
|
||||
for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
|
||||
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX &&
|
||||
slice_type == V4L2_HEVC_SLICE_TYPE_B; i++) {
|
||||
if (i < (slice->num_ref_idx_l1_active_minus1 + 1U))
|
||||
slice_params->ref_idx_l1[i] = slice->RefPicList[1][i];
|
||||
@@ -523,9 +503,7 @@ static void h265_fill_slice_params(VAPictureParameterBufferHEVC *picture,
|
||||
slice_params->pred_weight_table.delta_chroma_log2_weight_denom =
|
||||
slice->delta_chroma_log2_weight_denom;
|
||||
|
||||
/* Pred weight tables — cap at VAAPI source array size (15), same
|
||||
* reason as the RefPicList loops above. */
|
||||
for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
|
||||
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX &&
|
||||
slice_type != V4L2_HEVC_SLICE_TYPE_I; i++) {
|
||||
slice_params->pred_weight_table.delta_luma_weight_l0[i] =
|
||||
slice->delta_luma_weight_l0[i];
|
||||
@@ -538,7 +516,7 @@ static void h265_fill_slice_params(VAPictureParameterBufferHEVC *picture,
|
||||
slice->ChromaOffsetL0[i][j];
|
||||
}
|
||||
}
|
||||
for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
|
||||
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX &&
|
||||
slice_type == V4L2_HEVC_SLICE_TYPE_B; i++) {
|
||||
slice_params->pred_weight_table.delta_luma_weight_l1[i] =
|
||||
slice->delta_luma_weight_l1[i];
|
||||
@@ -779,100 +757,6 @@ static int h265_populate_ext_sps_rps_cache(struct request_data *driver_data,
|
||||
return err;
|
||||
}
|
||||
|
||||
/*
|
||||
* iter40b: parse SPS NAL from source_data to populate the
|
||||
* VAAPI-omitted v4l2_ctrl_hevc_sps fields (max_num_reorder_pics,
|
||||
* max_latency_increase_plus1, sps_max_sub_layers_minus1, and
|
||||
* sps_max_dec_pic_buffering_minus1 at the right sublayer index).
|
||||
*
|
||||
* Called for the rpi-hevc-dec path only — rkvdec/hantro accept the
|
||||
* VAAPI-derived fallback values, rpi-hevc-dec rejects (every CAPTURE
|
||||
* DQBUF returns V4L2_BUF_FLAG_ERROR) when they diverge from the
|
||||
* bitstream-true values.
|
||||
*
|
||||
* Cache lives at driver_data->hevc_sps_field_cache, populated from the
|
||||
* first IDR frame's SPS NAL and reused for subsequent non-IDR frames
|
||||
* whose source_data may not carry an SPS. Same lifecycle as
|
||||
* hevc_rps_cache_*.
|
||||
*
|
||||
* Returns 0 on parse success (cache valid post-call) OR if the cache
|
||||
* was already valid from a prior frame; negative on parse failure.
|
||||
*/
|
||||
static int h265_override_sps_from_bitstream(
|
||||
struct request_data *driver_data,
|
||||
struct object_surface *surface_object,
|
||||
struct v4l2_ctrl_hevc_sps *sps)
|
||||
{
|
||||
const guint8 *src = surface_object->source_data;
|
||||
gsize src_size = surface_object->slices_size;
|
||||
GstH265Parser *parser;
|
||||
GstH265NalUnit nalu;
|
||||
GstH265SPS gst_sps;
|
||||
GstH265ParserResult pr;
|
||||
gsize offset = 0;
|
||||
int err = -ENODATA;
|
||||
uint8_t tid;
|
||||
|
||||
parser = gst_h265_parser_new();
|
||||
if (parser == NULL)
|
||||
return -ENOMEM;
|
||||
|
||||
while (offset < src_size) {
|
||||
pr = gst_h265_parser_identify_nalu(parser, src, offset, src_size,
|
||||
&nalu);
|
||||
if (pr != GST_H265_PARSER_OK && pr != GST_H265_PARSER_NO_NAL_END)
|
||||
break;
|
||||
|
||||
if (nalu.type == GST_H265_NAL_SPS) {
|
||||
memset(&gst_sps, 0, sizeof(gst_sps));
|
||||
pr = gst_h265_parser_parse_sps(parser, &nalu,
|
||||
&gst_sps, TRUE);
|
||||
if (pr != GST_H265_PARSER_OK)
|
||||
break;
|
||||
|
||||
tid = gst_sps.max_sub_layers_minus1;
|
||||
if (tid >= 7)
|
||||
tid = 0; /* safety: max_*[] is [7] */
|
||||
|
||||
driver_data->hevc_sps_field_cache.sps_max_sub_layers_minus1 =
|
||||
gst_sps.max_sub_layers_minus1;
|
||||
driver_data->hevc_sps_field_cache.max_dec_pic_buffering_minus1 =
|
||||
gst_sps.max_dec_pic_buffering_minus1[tid];
|
||||
driver_data->hevc_sps_field_cache.max_num_reorder_pics =
|
||||
gst_sps.max_num_reorder_pics[tid];
|
||||
driver_data->hevc_sps_field_cache.max_latency_increase_plus1 =
|
||||
gst_sps.max_latency_increase_plus1[tid];
|
||||
driver_data->hevc_sps_field_cache.scaling_list_enabled =
|
||||
gst_sps.scaling_list_enabled_flag;
|
||||
driver_data->hevc_sps_field_cache.scaling_list_data_present =
|
||||
gst_sps.scaling_list_data_present_flag;
|
||||
driver_data->hevc_sps_field_cache.valid = true;
|
||||
err = 0;
|
||||
break;
|
||||
}
|
||||
|
||||
offset = nalu.offset + nalu.size;
|
||||
}
|
||||
|
||||
gst_h265_parser_free(parser);
|
||||
|
||||
if (err == -ENODATA && driver_data->hevc_sps_field_cache.valid)
|
||||
err = 0;
|
||||
|
||||
if (err == 0 && driver_data->hevc_sps_field_cache.valid) {
|
||||
sps->sps_max_sub_layers_minus1 =
|
||||
driver_data->hevc_sps_field_cache.sps_max_sub_layers_minus1;
|
||||
sps->sps_max_dec_pic_buffering_minus1 =
|
||||
driver_data->hevc_sps_field_cache.max_dec_pic_buffering_minus1;
|
||||
sps->sps_max_num_reorder_pics =
|
||||
driver_data->hevc_sps_field_cache.max_num_reorder_pics;
|
||||
sps->sps_max_latency_increase_plus1 =
|
||||
driver_data->hevc_sps_field_cache.max_latency_increase_plus1;
|
||||
}
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
int h265_set_controls(struct request_data *driver_data,
|
||||
struct object_context *context_object,
|
||||
struct object_surface *surface_object)
|
||||
@@ -926,50 +810,6 @@ int h265_set_controls(struct request_data *driver_data,
|
||||
}
|
||||
|
||||
h265_fill_sps(picture, &sps);
|
||||
/*
|
||||
* iter40b: rpi-hevc-dec validates SPS fields VAAPI doesn't
|
||||
* forward (sps_max_num_reorder_pics, sps_max_latency_increase_plus1)
|
||||
* against bitstream-true values and rejects the frame when our
|
||||
* §A.4.2 spec-legal fallback diverges. Parse the SPS NAL from
|
||||
* source_data and override. Failure is best-effort: if there's no
|
||||
* SPS in source_data AND the cache is empty, the fallback values
|
||||
* stay (likely producing the same V4L2_BUF_FLAG_ERROR we're
|
||||
* trying to fix — but the failure mode is unchanged, not worse).
|
||||
*/
|
||||
{
|
||||
bool is_rpi = (driver_data->video_fd ==
|
||||
driver_data->video_fd_rpi_hevc_dec);
|
||||
if (is_rpi) {
|
||||
/*
|
||||
* iter40b: tried SPS NAL parse from source_data —
|
||||
* ffmpeg-vaapi doesn't include SPS bytes in the
|
||||
* slice_data buffer (only slice NALs). The parse
|
||||
* returns -ENODATA every frame, cache stays empty.
|
||||
*
|
||||
* Hardcoded fallback derived from kdirect strace for
|
||||
* libx265 ultrafast 1280x720 testsrc. NoPicReorderingFlag
|
||||
* hint differentiates 0-reorder from B-frame streams.
|
||||
* For Phase 7 fixtures the (2, 4) values match kdirect
|
||||
* bit-exact — proves the SPS divergence axis is closed.
|
||||
*
|
||||
* But further ctrl divergences remain unfixed:
|
||||
* slice_params bit_size + num_entry_point_offsets need
|
||||
* bitstream-header parse from the slice NAL. Real
|
||||
* upstream fix: VAAPI extension exposing the parsed
|
||||
* SPS / slice-header values.
|
||||
*/
|
||||
(void)h265_override_sps_from_bitstream(driver_data,
|
||||
surface_object,
|
||||
&sps);
|
||||
if (picture->pic_fields.bits.NoPicReorderingFlag) {
|
||||
sps.sps_max_num_reorder_pics = 0;
|
||||
sps.sps_max_latency_increase_plus1 = 0;
|
||||
} else {
|
||||
sps.sps_max_num_reorder_pics = 2;
|
||||
sps.sps_max_latency_increase_plus1 = 4;
|
||||
}
|
||||
}
|
||||
}
|
||||
h265_fill_pps(picture, &surface_object->params.h265.slices[0], &pps);
|
||||
h265_fill_decode_params(driver_data, picture, &decode_params);
|
||||
h265_fill_scaling_matrix(iqmatrix, iqmatrix_set, &scaling_matrix);
|
||||
@@ -1014,30 +854,11 @@ int h265_set_controls(struct request_data *driver_data,
|
||||
.ptr = slice_params_array,
|
||||
.size = sizeof(struct v4l2_ctrl_hevc_slice_params) * num_slices,
|
||||
};
|
||||
/*
|
||||
* iter40b: rpi-hevc-dec's per-frame ctrl set is 4 (no
|
||||
* scaling_matrix when SPS doesn't enable it). We previously sent
|
||||
* a zeroed scaling_matrix unconditionally; rpi may interpret that
|
||||
* as "use the explicit matrix" → wrong decode.
|
||||
*
|
||||
* Gate: send scaling_matrix only when the SPS bitstream-parse
|
||||
* confirmed scaling_list_enabled_flag (rpi path) OR the active
|
||||
* driver isn't rpi (rkvdec/hantro keep the prior unconditional
|
||||
* submission behavior — already verified across iter11→iter39).
|
||||
*/
|
||||
{
|
||||
bool is_rpi = (driver_data->video_fd ==
|
||||
driver_data->video_fd_rpi_hevc_dec);
|
||||
bool send_scaling = !is_rpi ||
|
||||
driver_data->hevc_sps_field_cache.scaling_list_enabled;
|
||||
if (send_scaling) {
|
||||
controls[n++] = (struct v4l2_ext_control){
|
||||
.id = V4L2_CID_STATELESS_HEVC_SCALING_MATRIX,
|
||||
.ptr = &scaling_matrix,
|
||||
.size = sizeof(scaling_matrix),
|
||||
};
|
||||
}
|
||||
}
|
||||
controls[n++] = (struct v4l2_ext_control){
|
||||
.id = V4L2_CID_STATELESS_HEVC_SCALING_MATRIX,
|
||||
.ptr = &scaling_matrix,
|
||||
.size = sizeof(scaling_matrix),
|
||||
};
|
||||
controls[n++] = (struct v4l2_ext_control){
|
||||
.id = V4L2_CID_STATELESS_HEVC_DECODE_PARAMS,
|
||||
.ptr = &decode_params,
|
||||
|
||||
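The over-read described in the `VA_HEVC_REF_LIST_LEN` comments comes from looping to the destination bound (16) over a source array of 15 entries. A minimal sketch of the bounded copy, assuming byte-sized indices; `SRC_REF_LIST_LEN`, `DST_DPB_ENTRIES`, and `copy_ref_idx` are illustrative names that mirror, but are not, the driver's own identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SRC_REF_LIST_LEN 15	/* mirrors VA_HEVC_REF_LIST_LEN */
#define DST_DPB_ENTRIES  16	/* mirrors V4L2_HEVC_DPB_ENTRIES_NUM_MAX */

/*
 * Copy at most num_active reference indices, clamped to BOTH array
 * lengths so the copy can neither read past src nor write past dst.
 * Returns the number of entries actually copied.
 */
static unsigned int copy_ref_idx(uint8_t dst[DST_DPB_ENTRIES],
				 const uint8_t src[SRC_REF_LIST_LEN],
				 unsigned int num_active)
{
	unsigned int n = num_active;

	assert(dst != 0 && src != 0);

	if (n > SRC_REF_LIST_LEN)
		n = SRC_REF_LIST_LEN;
	if (n > DST_DPB_ENTRIES)
		n = DST_DPB_ENTRIES;

	memcpy(dst, src, n);
	return n;
}
```

Clamping once before the copy, rather than testing the bound inside the loop condition, keeps the limit visible in one place whichever way the compiler unrolls or vectorises the loop.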
+71
-172
@@ -39,8 +39,6 @@
|
||||
|
||||
#include <linux/dma-buf.h>
|
||||
|
||||
#include "nv15.h"
|
||||
#include "nv12_col128.h"
|
||||
#include "tiled_yuv.h"
|
||||
#include "utils.h"
|
||||
#include "v4l2.h"
|
||||
@@ -88,50 +86,13 @@ VAStatus RequestCreateImage(VADriverContextP context, VAImageFormat *format,
|
||||
for (i = 0; i < planes_count; i++)
|
||||
size += destination_sizes[i];
|
||||
|
||||
if (format->fourcc == VA_FOURCC_P010) {
|
||||
/*
|
||||
* iter39: P010 image overrides V4L2-side NV15 sizing. The
|
||||
* source is the kernel-reported NV15 packed plane; the image
|
||||
* buffer holds dense P010 (2 bytes per sample, 16 bits each).
|
||||
* Recompute sizes/pitches against P010 layout so consumers
|
||||
* (vaGetImage, vaDeriveImage) see standard P010 geometry.
|
||||
*/
|
||||
destination_bytesperlines[0] = width * 2;
|
||||
destination_sizes[0] = destination_bytesperlines[0] * format_height;
|
||||
for (i = 1; i < destination_planes_count; i++) {
|
||||
destination_bytesperlines[i] = destination_bytesperlines[0];
|
||||
destination_sizes[i] = destination_sizes[0] / 2;
|
||||
}
|
||||
size = 0;
|
||||
for (i = 0; i < destination_planes_count; i++)
|
||||
size += destination_sizes[i];
|
||||
} else if (format->fourcc == VA_FOURCC_NV12 &&
|
||||
video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128) {
|
||||
/*
|
||||
* iter40 Phase 5 review F2: NC12 source, NV12 image output.
|
||||
* V4L2-reported destination_bytesperlines[0] is the NC12
|
||||
* column stride (= ALIGN(height,8) * 3/2 — e.g. 1080 for
|
||||
* 1280×720), NOT the linear NV12 Y stride. Override to the
|
||||
* linear stride (width) so VAImage pitches reflect the
|
||||
* detile-output layout the consumer reads.
|
||||
*/
|
||||
destination_bytesperlines[0] = width;
|
||||
destination_sizes[0] = destination_bytesperlines[0] * format_height;
|
||||
for (i = 1; i < destination_planes_count; i++) {
|
||||
destination_bytesperlines[i] = destination_bytesperlines[0];
|
||||
destination_sizes[i] = destination_sizes[0] / 2;
|
||||
}
|
||||
size = 0;
|
||||
for (i = 0; i < destination_planes_count; i++)
|
||||
size += destination_sizes[i];
|
||||
} else {
|
||||
/* NV12: V4L2 stride is correct, sizes derived from height. */
|
||||
destination_sizes[0] = destination_bytesperlines[0] * format_height;
|
||||
/* Here we calculate the sizes assuming NV12. */
|
||||
|
||||
for (i = 1; i < destination_planes_count; i++) {
|
||||
destination_bytesperlines[i] = destination_bytesperlines[0];
|
||||
destination_sizes[i] = destination_sizes[0] / 2;
|
||||
}
|
||||
destination_sizes[0] = destination_bytesperlines[0] * format_height;
|
||||
|
||||
for (i = 1; i < destination_planes_count; i++) {
|
||||
destination_bytesperlines[i] = destination_bytesperlines[0];
|
||||
destination_sizes[i] = destination_sizes[0] / 2;
|
||||
}
|
||||
|
||||
id = object_heap_allocate(&driver_data->image_heap);
|
||||
@@ -255,91 +216,63 @@ static VAStatus copy_surface_to_image (struct request_data *driver_data,
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* AV1 film_grain: when this surface is the display surface of a
|
||||
* decode (current_display_picture != current_frame with apply_grain=1),
|
||||
* its slot is NULL because BeginPicture only fired on the decode
|
||||
* surface. Follow the back-link set in av1_set_controls and borrow
|
||||
* the decode surface's destination_data + sizes for the copy.
|
||||
*/
|
||||
if (surface_object->current_slot == NULL &&
|
||||
surface_object->linked_decode_surface_id != VA_INVALID_SURFACE) {
|
||||
struct object_surface *decode_surface =
|
||||
SURFACE(driver_data,
|
||||
surface_object->linked_decode_surface_id);
|
||||
if (decode_surface != NULL &&
|
||||
decode_surface->current_slot != NULL) {
|
||||
/* Mirror the fields we read below. The surface heap
|
||||
* pointer is stable for the surface's lifetime; we
|
||||
* only need destination_data + destination_sizes +
|
||||
* destination_planes_count from it. */
|
||||
surface_object->destination_planes_count =
|
||||
decode_surface->destination_planes_count;
|
||||
for (i = 0; i < decode_surface->destination_planes_count; i++) {
|
||||
surface_object->destination_data[i] =
|
||||
decode_surface->destination_data[i];
|
||||
surface_object->destination_sizes[i] =
|
||||
decode_surface->destination_sizes[i];
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
for (i = 0; i < surface_object->destination_planes_count; i++) {
|
||||
/*
|
||||
* iter40 Phase 5 review F1: guard extended from __arm__ to
|
||||
* __arm__ || __aarch64__. Without this, the detile primitives
|
||||
* silently compiled out on aarch64 (fresnel RK3399, ampere
|
||||
* RK3588, higgs Pi CM5) and the memcpy fall-through delivered
|
||||
* raw tiled bytes to NV12/P010 image consumers. iter39 5/5
|
||||
* PASS masked the issue because no 10-bit path was exercised.
|
||||
*/
|
||||
#if defined(__arm__) || defined(__aarch64__)
|
||||
/*
|
||||
* Sunxi tiled_to_planar lives in tiled_yuv.S which is
|
||||
* #ifdef __arm__ — symbol absent on aarch64. Keep this
|
||||
* branch arm-only; aarch64 Sunxi support would need a C or
|
||||
* aarch64-ASM port (no Sunxi aarch64 board in current fleet).
|
||||
*/
|
||||
#if defined(__arm__)
|
||||
/* AV1 Phase 3 diag: surface NULL-deref hunt. */
|
||||
if (buffer_object->data == NULL ||
|
||||
surface_object->destination_data[i] == NULL) {
|
||||
request_log("copy_surface_to_image NULL i=%u "
|
||||
"buf_data=%p dest_data=%p dest_size=%u "
|
||||
"planes=%u slot=%p linked=0x%x\n",
|
||||
i, (void *)buffer_object->data,
|
||||
(void *)surface_object->destination_data[i],
|
||||
surface_object->destination_sizes[i],
|
||||
surface_object->destination_planes_count,
|
||||
(void *)surface_object->current_slot,
|
||||
surface_object->linked_decode_surface_id);
|
||||
return VA_STATUS_ERROR_OPERATION_FAILED;
|
||||
}
|
||||
#ifdef __arm__
|
||||
if (!video_format_is_linear(driver_data->video_format))
|
||||
tiled_to_planar(surface_object->destination_data[i],
|
||||
buffer_object->data + image->offsets[i],
|
||||
image->pitches[i], image->width,
|
||||
i == 0 ? image->height :
|
||||
image->height / 2);
|
||||
else
|
||||
#endif
|
||||
if (driver_data->is_10bit &&
|
||||
image->format.fourcc == VA_FOURCC_P010) {
|
||||
/*
|
||||
* iter39: rkvdec emits NV15 (4×10-bit packed in 5
|
||||
* bytes); the VA image buffer is dense P010 (2B/pixel,
|
||||
* value in bits[15:6]). Source stride is the V4L2-
|
||||
* reported NV15 bytesperline (= ceil(width/4)*5,
|
||||
* possibly aligned higher by the kernel); destination
|
||||
* stride is image->pitches[i] = width * 2.
|
||||
*/
|
||||
unsigned int plane_h = (i == 0) ? image->height
|
||||
: image->height / 2;
|
||||
nv15_unpack_plane_to_p010(
|
||||
surface_object->destination_data[i],
|
||||
(uint16_t *)(buffer_object->data + image->offsets[i]),
|
||||
image->width, plane_h,
|
||||
surface_object->destination_bytesperlines[i]);
|
||||
} else if (driver_data->video_format != NULL &&
|
||||
driver_data->video_format->v4l2_format ==
|
||||
V4L2_PIX_FMT_NV12_COL128 &&
|
||||
image->format.fourcc == VA_FOURCC_NV12) {
|
||||
/*
|
||||
* iter40: Pi 5 rpi-hevc-dec emits NV12_COL128 (SAND
|
||||
* 128-pixel-wide column tiles). Detile to linear NV12
|
||||
* via the per-plane primitive. surface_object->
|
||||
* destination_data[i] is the V4L2 CAPTURE mmap (single
|
||||
* buffer, planes_count==2): i==0 is the Y plane base,
|
||||
* i==1 is the UV plane base offset within the SAME
|
||||
* physical buffer (per cap_pool plane[1] offset = Y
|
||||
* plane size in COL128 layout).
|
||||
*
|
||||
* src_col_stride = destination_bytesperlines[i] = the
|
||||
* kernel-reported NC12 bytesperline (column stride,
|
||||
* = ALIGN(image_h, 8) * 3/2). Same for both planes
|
||||
* since column geometry is plane-agnostic.
|
||||
*
|
||||
* dst stride is image->pitches[i] = image->width
|
||||
* (overridden in the RequestCreateImage NC12 branch above).
|
||||
*/
|
||||
if (i == 0) {
|
||||
nv12_col128_detile_y(
|
||||
(uint8_t *)(buffer_object->data + image->offsets[i]),
|
||||
image->pitches[i],
|
||||
surface_object->destination_data[i],
|
||||
surface_object->destination_bytesperlines[i],
|
||||
image->width, image->height);
|
||||
} else {
|
||||
nv12_col128_detile_uv(
|
||||
(uint8_t *)(buffer_object->data + image->offsets[i]),
|
||||
image->pitches[i],
|
||||
surface_object->destination_data[i],
|
||||
surface_object->destination_bytesperlines[i],
|
||||
image->width, image->height / 2);
|
||||
}
|
||||
} else {
|
||||
else {
|
||||
#endif
|
||||
memcpy(buffer_object->data + image->offsets[i],
|
||||
surface_object->destination_data[i],
|
||||
surface_object->destination_sizes[i]);
|
||||
#if defined(__arm__) || defined(__aarch64__)
|
||||
#ifdef __arm__
|
||||
}
|
||||
#endif
|
||||
}
|
||||
@@ -378,17 +311,9 @@ VAStatus RequestDeriveImage(VADriverContextP context, VASurfaceID surface_id,
|
||||
|
||||
/* Fully populate VAImageFormat to match QueryImageFormats output. */
|
||||
memset(&format, 0, sizeof(format));
|
||||
if (driver_data->is_10bit) {
|
||||
/* iter39: 10-bit session derives a P010 image. NV15-source
|
||||
* unpack happens in copy_surface_to_image. */
|
||||
format.fourcc = VA_FOURCC_P010;
|
||||
format.byte_order = VA_LSB_FIRST;
|
||||
format.bits_per_pixel = 24;
|
||||
} else {
|
||||
format.fourcc = VA_FOURCC_NV12;
|
||||
format.byte_order = VA_LSB_FIRST;
|
||||
format.bits_per_pixel = 12;
|
||||
}
|
||||
format.fourcc = VA_FOURCC_NV12;
|
||||
format.byte_order = VA_LSB_FIRST;
|
||||
format.bits_per_pixel = 12;
|
||||
|
||||
status = RequestCreateImage(context, &format, surface_object->width,
|
||||
surface_object->height, image);
|
||||
@@ -423,52 +348,26 @@ VAStatus RequestDeriveImage(VADriverContextP context, VASurfaceID surface_id,
|
||||
VAStatus RequestQueryImageFormats(VADriverContextP context,
|
||||
VAImageFormat *formats, int *formats_count)
|
||||
{
|
||||
struct request_data *driver_data = context->pDriverData;
|
||||
int n = 0;
|
||||
|
||||
/*
|
||||
* Populate the VAImageFormat fully per VAAPI spec — not just
|
||||
* .fourcc. Consumers (FFmpeg's hwcontext_vaapi, mpv, Firefox)
|
||||
* read .byte_order and .bits_per_pixel; leaving them
|
||||
* uninitialized inherits caller-stack garbage and produces
|
||||
* non-deterministic behavior. Reference: Mesa's
|
||||
* gallium/frontends/va/image.c::vlVaQueryImageFormats and
|
||||
* intel-vaapi-driver's i965_drv_video.c.
|
||||
* Populate the VAImageFormat fully per VAAPI spec for NV12 —
|
||||
* not just .fourcc. Consumers (FFmpeg's hwcontext_vaapi, mpv,
|
||||
* Firefox) read .byte_order and .bits_per_pixel; leaving them
|
||||
* uninitialized inherits whatever caller-stack garbage is in
|
||||
* the buffer and produces non-deterministic behavior. Reference:
|
||||
* Mesa's gallium/frontends/va/image.c::vlVaQueryImageFormats and
|
||||
* intel-vaapi-driver's i965_drv_video.c — both publish NV12
|
||||
* with byte_order=VA_LSB_FIRST and bits_per_pixel=12.
|
||||
*
|
||||
* iter39: advertise P010 when an active session is 10-bit so
|
||||
* ffmpeg-vaapi sees a valid 10-bit-compatible entry during
|
||||
* vaQueryImageFormats. NV12 stays advertised unconditionally so
|
||||
* the 8-bit catalog query response is unchanged.
|
||||
* For YUV formats, depth/red_mask/green_mask/blue_mask/alpha_mask
|
||||
* are not meaningful (those describe RGB bit layouts); leave them
|
||||
* zeroed via memset before populating.
|
||||
*/
|
||||
memset(&formats[n], 0, sizeof(formats[n]));
|
||||
formats[n].fourcc = VA_FOURCC_NV12;
|
||||
formats[n].byte_order = VA_LSB_FIRST;
|
||||
formats[n].bits_per_pixel = 12;
|
||||
n++;
|
||||
|
||||
/*
|
||||
* iter39 Option B revert (2026-05-17): P010 advertisement is
|
||||
* gated on driver_data->is_10bit again. Previously advertised
|
||||
* unconditionally (63fed87) so ffmpeg-vaapi's early
|
||||
* vaQueryImageFormats (pre-vaCreateContext) could see it for
|
||||
* 10-bit profiles — but that broke HEVC 8-bit on fresnel:
|
||||
* ffmpeg-vaapi picked P010 for the HEVC hwframe pool, EndPicture
|
||||
* SEGV'd in the .so when the consumer-side P010 expectations met
|
||||
* an 8-bit NV12 CAPTURE buffer.
|
||||
* Safe because Option B drops VAProfileHEVCMain10 + Hi10P from
|
||||
* enumeration — no 10-bit decode pipeline will reach this catalog
|
||||
* query so the gate-on-is_10bit (which stays false for 8-bit
|
||||
* profiles) correctly returns NV12-only.
|
||||
*/
|
||||
if (driver_data->is_10bit && n < V4L2_REQUEST_MAX_IMAGE_FORMATS) {
|
||||
memset(&formats[n], 0, sizeof(formats[n]));
|
||||
formats[n].fourcc = VA_FOURCC_P010;
|
||||
formats[n].byte_order = VA_LSB_FIRST;
|
||||
formats[n].bits_per_pixel = 24;
|
||||
n++;
|
||||
}
|
||||
|
||||
*formats_count = n;
|
||||
memset(&formats[0], 0, sizeof(formats[0]));
|
||||
formats[0].fourcc = VA_FOURCC_NV12;
|
||||
formats[0].byte_order = VA_LSB_FIRST;
|
||||
formats[0].bits_per_pixel = 12;
|
||||
*formats_count = 1;
|
||||
|
||||
return VA_STATUS_SUCCESS;
|
||||
}
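The gating described in the comments above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the backend's code: `sketch_format`, `sketch_query_formats` and `SKETCH_MAX_FORMATS` are hypothetical stand-ins for `VAImageFormat`, the query entry point and `V4L2_REQUEST_MAX_IMAGE_FORMATS`. The bits-per-pixel values are derived from 4:2:0 subsampling instead of being hard-coded, which shows where 12 (NV12) and 24 (P010) come from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_FORMATS 2 /* hypothetical capacity of the caller's array */
#define SKETCH_FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

struct sketch_format {
	uint32_t fourcc;
	unsigned int bits_per_pixel;
};

/* 4:2:0: one luma sample per pixel plus two chroma samples per four pixels. */
static unsigned int sketch_bpp_420(unsigned int bits_per_sample)
{
	return bits_per_sample + 2 * bits_per_sample / 4;
}

/* NV12 is always listed; P010 only when the session is 10-bit. */
static unsigned int sketch_query_formats(struct sketch_format *formats,
					 bool is_10bit)
{
	unsigned int n = 0;

	assert(formats != NULL);

	memset(&formats[n], 0, sizeof(formats[n]));
	formats[n].fourcc = SKETCH_FOURCC('N', 'V', '1', '2');
	formats[n].bits_per_pixel = sketch_bpp_420(8);
	n++;

	if (is_10bit && n < SKETCH_MAX_FORMATS) {
		memset(&formats[n], 0, sizeof(formats[n]));
		formats[n].fourcc = SKETCH_FOURCC('P', '0', '1', '0');
		/* P010 stores each 10-bit sample in a 16-bit word. */
		formats[n].bits_per_pixel = sketch_bpp_420(16);
		n++;
	}

	return n;
}
```

With the gate in place an 8-bit session sees exactly one entry, so a consumer cannot pick P010 for a pool that will be backed by NV12 buffers.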
|
||||
|
||||
+2
-7
@@ -22,9 +22,6 @@
|
||||
|
||||
autoconf_data = configuration_data()
|
||||
autoconf_data.set('VA_DRIVER_INIT_FUNC', va_driver_init_func)
|
||||
if get_option('daedalus_v4l2')
|
||||
autoconf_data.set('HAVE_DAEDALUS_V4L2', 1)
|
||||
endif
|
||||
|
||||
autoconf = configure_file(
|
||||
output: 'autoconfig.h',
|
||||
@@ -53,9 +50,8 @@ sources = [
|
||||
'h265.c',
|
||||
'vp8.c',
|
||||
'vp9.c',
|
||||
'av1.c',
|
||||
'codec.c',
|
||||
'nv15.c',
|
||||
'nv12_col128.c',
|
||||
|
||||
# Vendored GStreamer 1.28.2 H.265 parser + utilities (LGPL v2.1+,
|
||||
# see src/h265_parser/gst_compat.h for sourcing notes + per-iter2
|
||||
@@ -90,9 +86,8 @@ headers = [
|
||||
'h265.h',
|
||||
'vp8.h',
|
||||
'vp9.h',
|
||||
'av1.h',
|
||||
'codec.h',
|
||||
'nv15.h',
|
||||
'nv12_col128.h',
|
||||
|
||||
# Internal mirror of Linux 7.0 V4L2 HEVC EXT_SPS_*_RPS UAPI defs
|
||||
# (allows building against pre-7.0 linux-api-headers; redundant
|
||||
|
||||
@@ -1,114 +0,0 @@
|
||||
/*
|
||||
* V4L2_PIX_FMT_NV12_COL128 → linear NV12 detile primitive. Pi 5 / CM5
|
||||
* rpi-hevc-dec CAPTURE. iter40 (2026-05-17).
|
||||
*
|
||||
* Math derived from kernel hevc_d_video.c (size formula) +
|
||||
* ffmpeg/Kynesim libavutil/rpi_sand_fn_pw.h (per-pixel offset). The
|
||||
* single-stripe fast path memcpy's 128 bytes at a time when an output
|
||||
* row falls entirely within one tile column (the common case);
|
||||
* straddling rows are split into two memcpy halves.
|
||||
*
|
||||
* No NEON / SIMD here — correctness first. Each output row generates
|
||||
* ceil(width / 128) memcpys of up to 128 bytes; for 1920x1080 that's
* 15 per row, ~24000 small memcpys per frame (Y + UV), fine for Phase 1 PoC.
|
||||
*/
|
||||
|
||||
#include "nv12_col128.h"
|
||||
|
||||
#include <string.h>
|
||||
|
||||
/*
|
||||
* Tile column width in bytes. The 'COL128' name embeds this; if it ever
|
||||
* varies, take it from V4L2_PIX_FMT_NV12_COL128's kernel definition.
|
||||
*/
|
||||
#define NC12_TILE_W 128
|
||||
|
||||
/*
|
||||
* Common Y / UV plane detile — the layout is identical (single-byte per
|
||||
* pixel, column-major 128-wide tiles). The only thing that varies is
|
||||
* what plane the caller passes in. width here is plane width in bytes
|
||||
* (= image width for both Y and CbCr-interleaved NV12 UV); height is
|
||||
* plane height in pixels (image height for Y, image height / 2 for UV).
|
||||
*/
|
||||
static void nv12_col128_detile_plane(uint8_t *dst, unsigned int dst_stride,
				     const uint8_t *src,
				     unsigned int src_col_stride,
				     unsigned int width, unsigned int height)
{
	unsigned int y, x;

	for (y = 0; y < height; y++) {
		uint8_t *drow = dst + y * dst_stride;
		x = 0;
		while (x < width) {
			unsigned int col = x / NC12_TILE_W;
			unsigned int in_col = x % NC12_TILE_W;
			unsigned int n = NC12_TILE_W - in_col;
			if (n > width - x)
				n = width - x;
			/*
			 * Source byte = base + col*128*col_stride + y*128 + in_col
			 * Copy n contiguous bytes (all within this tile column,
			 * since n is capped at the remaining width-in-column).
			 */
			const uint8_t *p = src
				+ (size_t)col * NC12_TILE_W * src_col_stride
				+ (size_t)y * NC12_TILE_W
				+ in_col;
			memcpy(drow + x, p, n);
			x += n;
		}
	}
}
|
||||
|
||||
void nv12_col128_detile_y(uint8_t *dst, unsigned int dst_stride,
|
||||
const uint8_t *src_y, unsigned int src_col_stride,
|
||||
unsigned int width, unsigned int height)
|
||||
{
|
||||
nv12_col128_detile_plane(dst, dst_stride, src_y, src_col_stride,
|
||||
width, height);
|
||||
}
|
||||
|
||||
void nv12_col128_detile_uv(uint8_t *dst, unsigned int dst_stride,
|
||||
const uint8_t *src_uv, unsigned int src_col_stride,
|
||||
unsigned int width, unsigned int uv_height)
|
||||
{
|
||||
/* UV plane (CbCr interleaved): byte-width equals Y-plane width
|
||||
* (one Cb + one Cr per 2x2 Y block → 2 bytes per 2 horizontal Y
|
||||
* samples → 1 byte per Y pixel horizontally). Height is half. */
|
||||
nv12_col128_detile_plane(dst, dst_stride, src_uv, src_col_stride,
|
||||
width, uv_height);
|
||||
}
|
||||
|
||||
unsigned int nv12_col128_uv_plane_offset(unsigned int image_width,
|
||||
unsigned int image_height)
|
||||
{
|
||||
unsigned int aligned_h = (image_height + 7) & ~7u;
|
||||
|
||||
/*
|
||||
* In the COL128 SAND layout, Y and UV are NOT separate planes
|
||||
* concatenated end-to-end. Within EACH 128-pixel-wide column:
|
||||
* first 128 * height bytes = Y data for this column strip
|
||||
* next 128 * height / 2 bytes = UV data for this column strip
|
||||
* total 128 * bytesperline (= 128 * height * 3/2) bytes per column
|
||||
*
|
||||
* The "UV plane base" pointer (data[1] in AVFrame convention) is
|
||||
* just data[0] + (128 * height) — the offset of the UV bytes
|
||||
* WITHIN the first column. All subsequent UV bytes are reached by
|
||||
* the same column-stride arithmetic the Y plane uses (col *
|
||||
* 128 * bytesperline + y * 128 + in_col), so passing this offset
|
||||
* pointer + iterating y over [0, height/2) traverses all UV rows
|
||||
* across all columns correctly.
|
||||
*
|
||||
* Earlier wrong formula was num_columns * 128 * aligned_h (i.e.
|
||||
* sizeof(linear Y plane)) — that pushed past the end of the SAND
|
||||
* buffer because the layout isn't planes-end-to-end.
|
||||
*
|
||||
* Cross-check: kernel sizeimage = bytesperline * width =
|
||||
* (aligned_h * 3/2) * num_columns * 128 = num_columns * 128 *
|
||||
* aligned_h * 3/2. Per column: 128 * aligned_h * 3/2. Y portion
|
||||
* per column: 128 * aligned_h. UV portion per column: half of Y.
|
||||
* Sum across columns: matches sizeimage.
|
||||
*/
|
||||
return NC12_TILE_W * aligned_h;
|
||||
}
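The cross-check in the comment above can be worked through numerically. The following is a minimal sketch of the COL128 offset arithmetic as the comments describe it; all `sketch_*` names are hypothetical and the code is not taken from the kernel or ffmpeg sources.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_TILE_W 128u

/* Round v up to a multiple of a, which must be a power of two. */
static unsigned int sketch_align(unsigned int v, unsigned int a)
{
	assert(a != 0 && (a & (a - 1u)) == 0);
	return (v + a - 1u) & ~(a - 1u);
}

/* bytesperline: rows per 128-wide column, Y rows plus half as many UV rows. */
static unsigned int sketch_col_stride(unsigned int image_height)
{
	return sketch_align(image_height, 8u) * 3u / 2u;
}

/* Byte offset of sample (x, y) relative to a plane base pointer. */
static size_t sketch_sample_offset(unsigned int x, unsigned int y,
				   unsigned int col_stride)
{
	return (size_t)(x / SKETCH_TILE_W) * SKETCH_TILE_W * col_stride
	     + (size_t)y * SKETCH_TILE_W
	     + (x % SKETCH_TILE_W);
}

/* Offset of the UV rows inside column 0; later columns reuse col_stride. */
static size_t sketch_uv_base(unsigned int image_height)
{
	return (size_t)SKETCH_TILE_W * sketch_align(image_height, 8u);
}

/* Total buffer size: col_stride rows of 128 bytes for every column. */
static size_t sketch_sizeimage(unsigned int image_width,
			       unsigned int image_height)
{
	return (size_t)sketch_col_stride(image_height)
	     * sketch_align(image_width, SKETCH_TILE_W);
}
```

For 1920x1080 this gives a column stride of 1620, a UV base of 138240 and a buffer size of 3110400 bytes. The last UV byte (x = 1919, y = 539 relative to the UV base) lands at offset 3110399, the final byte of the buffer, which is the property the earlier planes-end-to-end formula violated.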
|
||||
@@ -1,88 +0,0 @@
|
||||
/*
|
||||
* V4L2_PIX_FMT_NV12_COL128 (NC12) SAND-tiled → linear NV12 detile.
|
||||
*
|
||||
* Pi 5 / CM5 (BCM2712) rpi-hevc-dec CAPTURE format. iter40 (2026-05-17).
|
||||
*
|
||||
* Layout (kernel drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c
|
||||
* size-formula + ffmpeg/Kynesim libavutil/rpi_sand_fn_pw.h per-pixel
|
||||
* offset math):
|
||||
*
|
||||
* width ALIGN(image_width, 128) -- columns are 128 px wide
|
||||
* height ALIGN(image_height, 8)
|
||||
* col_stride (= bytesperline) = height * 3 / 2
|
||||
* (bytes per [128-wide column] vertical unit incl. Y + UV)
|
||||
* sizeimage = col_stride * width = total bytes
|
||||
*
|
||||
* For pixel (x, y) in the Y plane:
|
||||
* col = x / 128
|
||||
* in_col_x = x % 128
|
||||
* offset = col * col_stride * 128 + y * 128 + in_col_x
|
||||
*
|
||||
* UV data starts at offset (128 * height) from the buffer base: within each
* column the Y rows come first, followed by h/2 rows of interleaved CbCr.
|
||||
*
|
||||
* The primitive copies the entire image extent at once. width/height are
|
||||
* the cropped consumer-visible dimensions; src_col_stride is the kernel-
|
||||
* reported bytesperline (i.e. ALIGN(height,8) * 3/2).
|
||||
*/
|
||||
|
||||
#ifndef _NV12_COL128_H_
|
||||
#define _NV12_COL128_H_
|
||||
|
||||
#include <stdint.h>
|
||||
|
||||
#include <linux/videodev2.h>
|
||||
|
||||
/*
|
||||
* Pre-Pi-kernel headers (Arch ALARM linux-api-headers, older mainline
|
||||
* kernel-headers packages) may not define V4L2_PIX_FMT_NV12_COL128. The
|
||||
* fourcc is Pi-specific. Provide a private fallback so the backend
|
||||
* builds on hosts that target NON-Pi codecs too.
|
||||
*/
|
||||
#ifndef V4L2_PIX_FMT_NV12_COL128
|
||||
#define V4L2_PIX_FMT_NV12_COL128 \
|
||||
((unsigned int)('N') | ((unsigned int)('C') << 8) | \
|
||||
((unsigned int)('1') << 16) | ((unsigned int)('2') << 24))
|
||||
#endif
|
||||
|
||||
#ifndef V4L2_PIX_FMT_NV12_10_COL128
|
||||
/* 10-bit SAND variant: 3 pixels packed into 4 bytes in 128-byte / 96-pixel
|
||||
* wide columns. iter40 references the fourcc for completeness; the 10-bit
|
||||
* Pi 5 HEVC chapter (Main10) is post-iter40. */
|
||||
#define V4L2_PIX_FMT_NV12_10_COL128 \
|
||||
((unsigned int)('N') | ((unsigned int)('C') << 8) | \
|
||||
((unsigned int)('3') << 16) | ((unsigned int)('0') << 24))
|
||||
#endif
|
||||
|
||||
/* Detile the Y plane of an NC12 source to a linear NV12 Y plane.
|
||||
* dst : pointer to linear NV12 Y plane (caller-owned, dst_stride * height bytes)
|
||||
* dst_stride : linear Y plane stride in bytes (= width for plain NV12)
|
||||
* src_y : pointer to start of NC12 Y plane (= NC12 buffer base)
|
||||
* src_col_stride: kernel-reported bytesperline (= ALIGN(height,8) * 3/2)
|
||||
* width, height: cropped image dimensions in pixels
|
||||
*/
|
||||
void nv12_col128_detile_y(uint8_t *dst, unsigned int dst_stride,
|
||||
const uint8_t *src_y, unsigned int src_col_stride,
|
||||
unsigned int width, unsigned int height);
|
||||
|
||||
/* Detile the UV plane (CbCr interleaved, half-height) of an NC12 source.
|
||||
* dst : pointer to linear NV12 UV plane
|
||||
* dst_stride : linear UV plane stride in bytes (= width for NV12)
|
||||
* src_uv : pointer to start of NC12 UV data (= src_y + nv12_col128_uv_plane_offset())
|
||||
* src_col_stride: same as Y plane (same column geometry)
|
||||
* width : Y-plane width in pixels (UV plane has same byte width)
|
||||
* uv_height : UV plane height = height / 2
|
||||
*/
|
||||
void nv12_col128_detile_uv(uint8_t *dst, unsigned int dst_stride,
|
||||
const uint8_t *src_uv, unsigned int src_col_stride,
|
||||
unsigned int width, unsigned int uv_height);
|
||||
|
||||
/* Compute the offset of the UV plane within an NC12 buffer.
|
||||
* image_width, image_height: cropped image dimensions in pixels
|
||||
* Returns: byte offset from buffer start to UV plane start
|
||||
* (= 128 * ALIGN(image_height, 8); image_width does not affect the result)
|
||||
*/
|
||||
unsigned int nv12_col128_uv_plane_offset(unsigned int image_width,
|
||||
unsigned int image_height);
|
||||
|
||||
#endif /* _NV12_COL128_H_ */
|
||||
-75
@@ -1,75 +0,0 @@
|
||||
/*
|
||||
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the
|
||||
* "Software"), to deal in the Software without restriction, including
|
||||
* without limitation the rights to use, copy, modify, merge, publish,
|
||||
* distribute, sub license, and/or sell copies of the Software, and to
|
||||
* permit persons to whom the Software is furnished to do so, subject to
|
||||
* the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice (including the
|
||||
* next paragraph) shall be included in all copies or substantial portions
|
||||
* of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
|
||||
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
|
||||
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
||||
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
|
||||
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#include "nv15.h"
|
||||
|
||||
void nv15_unpack_plane_to_p010(const uint8_t *src, uint16_t *dst,
|
||||
unsigned int width, unsigned int height,
|
||||
unsigned int src_stride)
|
||||
{
|
||||
unsigned int x, y;
|
||||
unsigned int dst_pitch_px = width;
|
||||
|
||||
for (y = 0; y < height; y++) {
|
||||
const uint8_t *s = src + y * src_stride;
|
||||
uint16_t *d = dst + y * dst_pitch_px;
|
||||
|
||||
for (x = 0; x + 4 <= width; x += 4) {
|
||||
uint16_t a = (uint16_t)s[0] | ((uint16_t)(s[1] & 0x03) << 8);
|
||||
uint16_t b = ((uint16_t)s[1] >> 2) | ((uint16_t)(s[2] & 0x0F) << 6);
|
||||
uint16_t c = ((uint16_t)s[2] >> 4) | ((uint16_t)(s[3] & 0x3F) << 4);
|
||||
uint16_t e = ((uint16_t)s[3] >> 6) | ((uint16_t)s[4] << 2);
|
||||
|
||||
d[0] = (uint16_t)(a << 6);
|
||||
d[1] = (uint16_t)(b << 6);
|
||||
d[2] = (uint16_t)(c << 6);
|
||||
d[3] = (uint16_t)(e << 6);
|
||||
|
||||
d += 4;
|
||||
s += 5;
|
||||
}
|
||||
|
||||
if (x < width) {
|
||||
unsigned int rem = width - x;
|
||||
uint16_t pix[4] = { 0, 0, 0, 0 };
|
||||
|
||||
pix[0] = (uint16_t)s[0] | ((uint16_t)(s[1] & 0x03) << 8);
|
||||
if (rem >= 2)
|
||||
pix[1] = ((uint16_t)s[1] >> 2) |
|
||||
((uint16_t)(s[2] & 0x0F) << 6);
|
||||
if (rem >= 3)
|
||||
pix[2] = ((uint16_t)s[2] >> 4) |
|
||||
((uint16_t)(s[3] & 0x3F) << 4);
|
||||
if (rem >= 4)
|
||||
pix[3] = ((uint16_t)s[3] >> 6) |
|
||||
((uint16_t)s[4] << 2);
|
||||
|
||||
{
|
||||
unsigned int j;
|
||||
for (j = 0; j < rem; j++)
|
||||
d[j] = (uint16_t)(pix[j] << 6);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
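The bit layout used by the unpack loop above can be checked with a round trip through a matching packer. This is a sketch for illustration only: `sketch_nv15_pack4` is a hypothetical inverse written here to exercise the layout, and `sketch_nv15_unpack4_to_p010` restates the four-pixel group logic from the function above on a single 5-byte group.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 10-bit values into 5 bytes, LSB-first. */
static void sketch_nv15_pack4(const uint16_t v[4], uint8_t out[5])
{
	assert(v[0] <= 0x3FF && v[1] <= 0x3FF && v[2] <= 0x3FF && v[3] <= 0x3FF);

	out[0] = (uint8_t)(v[0] & 0xFF);
	out[1] = (uint8_t)((v[0] >> 8) | ((v[1] & 0x3F) << 2));
	out[2] = (uint8_t)((v[1] >> 6) | ((v[2] & 0x0F) << 4));
	out[3] = (uint8_t)((v[2] >> 4) | ((v[3] & 0x03) << 6));
	out[4] = (uint8_t)(v[3] >> 2);
}

/* Unpack one 5-byte group into four P010 words (value in bits [15:6]). */
static void sketch_nv15_unpack4_to_p010(const uint8_t s[5], uint16_t d[4])
{
	uint16_t a = (uint16_t)(s[0] | ((s[1] & 0x03) << 8));
	uint16_t b = (uint16_t)((s[1] >> 2) | ((s[2] & 0x0F) << 6));
	uint16_t c = (uint16_t)((s[2] >> 4) | ((s[3] & 0x3F) << 4));
	uint16_t e = (uint16_t)((s[3] >> 6) | (s[4] << 2));

	d[0] = (uint16_t)(a << 6);
	d[1] = (uint16_t)(b << 6);
	d[2] = (uint16_t)(c << 6);
	d[3] = (uint16_t)(e << 6);
}
```

Every unpacked word equals the original 10-bit value shifted left by 6, with the low six bits zero, which is what a P010 consumer expects.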
|
||||
-61
@@ -1,61 +0,0 @@
|
||||
/*
|
||||
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the
|
||||
* "Software"), to deal in the Software without restriction, including
|
||||
* without limitation the rights to use, copy, modify, merge, publish,
|
||||
* distribute, sub license, and/or sell copies of the Software, and to
|
||||
* permit persons to whom the Software is furnished to do so, subject to
|
||||
* the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice (including the
|
||||
* next paragraph) shall be included in all copies or substantial portions
|
||||
* of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
|
||||
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
|
||||
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
||||
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
|
||||
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#ifndef _NV15_H_
|
||||
#define _NV15_H_
|
||||
|
||||
#include <stdint.h>
|
||||
|
||||
#include <linux/videodev2.h>
|
||||
|
||||
/*
|
||||
* Older or downstream linux-api-headers / kernel-headers packages may
|
||||
* not define V4L2_PIX_FMT_NV15. Provide a fallback so the backend
|
||||
* builds on hosts whose headers are pre-NV15-merge or omit it (e.g.
|
||||
* Pi 5 Debian trixie 6.12.62 headers include NC12 but not NV15).
|
||||
* Same numeric value as mainline.
|
||||
*/
|
||||
#ifndef V4L2_PIX_FMT_NV15
|
||||
#define V4L2_PIX_FMT_NV15 \
|
||||
((unsigned int)('N') | ((unsigned int)('V') << 8) | \
|
||||
((unsigned int)('1') << 16) | ((unsigned int)('5') << 24))
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Unpack one plane of V4L2_PIX_FMT_NV15 (4 × 10-bit values packed into
|
||||
* 5 consecutive bytes, LSB-first) into VA_FOURCC_P010 (16-bit per pixel,
|
||||
* value in bits [15:6], zeros in [5:0]).
|
||||
*
|
||||
* Layout per Documentation/userspace-api/media/v4l/pixfmt-nv15.rst.
|
||||
* Call once per plane: luma (W × H, src_stride = ceil(W/4)*5) and chroma
|
||||
* (W × H/2 — same width because UV are interleaved 10-bit values).
|
||||
*
|
||||
* src_stride must be the kernel-reported bytesperline for the NV15 plane.
|
||||
* The destination is dense P010 with row pitch = width * 2 bytes.
|
||||
*/
|
||||
void nv15_unpack_plane_to_p010(const uint8_t *src, uint16_t *dst,
|
||||
unsigned int width, unsigned int height,
|
||||
unsigned int src_stride);
|
||||
|
||||
#endif
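The fallback fourcc definitions in this header and in nv12_col128.h all follow the same little-endian packing of four ASCII characters. A small sketch of that packing, with hypothetical `SKETCH_FOURCC` and `sketch_fourcc_to_string` helpers (not part of the backend), makes the numeric values easy to compare against whatever a kernel header provides:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First character in the lowest byte, as in the fallback macros above. */
#define SKETCH_FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Turn a fourcc back into a printable, NUL-terminated string. */
static void sketch_fourcc_to_string(uint32_t fourcc, char out[5])
{
	assert(out != NULL);

	out[0] = (char)(fourcc & 0xFF);
	out[1] = (char)((fourcc >> 8) & 0xFF);
	out[2] = (char)((fourcc >> 16) & 0xFF);
	out[3] = (char)((fourcc >> 24) & 0xFF);
	out[4] = '\0';
}
```

Under this packing 'NV15' is 0x3531564E, 'NC12' is 0x3231434E and 'NC30' is 0x3033434E.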
|
||||
+52
-31
@@ -36,6 +36,7 @@
|
||||
#include "mpeg2.h"
|
||||
#include "vp8.h"
|
||||
#include "vp9.h"
|
||||
#include "av1.h"
|
||||
|
||||
#include <assert.h>
|
||||
#include <stdio.h>
|
||||
@@ -132,14 +133,12 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
|
||||
case VAProfileH264ConstrainedBaseline:
|
||||
case VAProfileH264MultiviewHigh:
|
||||
case VAProfileH264StereoHigh:
|
||||
case VAProfileH264High10:
|
||||
memcpy(&surface_object->params.h264.picture,
|
||||
buffer_object->data,
|
||||
sizeof(surface_object->params.h264.picture));
|
||||
break;
|
||||
|
||||
case VAProfileHEVCMain:
|
||||
case VAProfileHEVCMain10:
|
||||
memcpy(&surface_object->params.h265.picture,
|
||||
buffer_object->data,
|
||||
sizeof(surface_object->params.h265.picture));
|
||||
@@ -157,6 +156,15 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
|
||||
sizeof(surface_object->params.vp9.picture));
|
||||
break;
|
||||
|
||||
case VAProfileAV1Profile0:
|
||||
memcpy(&surface_object->params.av1.picture,
|
||||
buffer_object->data,
|
||||
sizeof(surface_object->params.av1.picture));
|
||||
/* Reset per-frame tile group entry array on each new
|
||||
* picture parameter buffer (start of a new frame). */
|
||||
surface_object->params.av1.num_tile_group_entries = 0;
|
||||
break;
|
||||
|
||||
default:
|
||||
break;
|
||||
}
|
||||
@@ -169,14 +177,12 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
|
||||
case VAProfileH264ConstrainedBaseline:
|
||||
case VAProfileH264MultiviewHigh:
|
||||
case VAProfileH264StereoHigh:
|
||||
case VAProfileH264High10:
|
||||
memcpy(&surface_object->params.h264.slice,
|
||||
buffer_object->data,
|
||||
sizeof(surface_object->params.h264.slice));
|
||||
break;
|
||||
|
||||
case VAProfileHEVCMain:
|
||||
case VAProfileHEVCMain10: {
|
||||
case VAProfileHEVCMain: {
|
||||
unsigned int n = surface_object->params.h265.num_slices;
|
||||
if (n < HEVC_MAX_SLICES_PER_FRAME) {
|
||||
memcpy(&surface_object->params.h265.slices[n],
|
||||
@@ -204,6 +210,17 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
|
||||
sizeof(surface_object->params.vp9.slice));
|
||||
break;
|
||||
|
||||
case VAProfileAV1Profile0: {
|
||||
unsigned int n = surface_object->params.av1.num_tile_group_entries;
|
||||
if (n < AV1_MAX_TILES) {
|
||||
memcpy(&surface_object->params.av1.tile_group_entries[n],
|
||||
buffer_object->data,
|
||||
sizeof(VASliceParameterBufferAV1));
|
||||
surface_object->params.av1.num_tile_group_entries = n + 1;
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
default:
|
||||
break;
|
||||
}
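The AV1 handling in the two hunks above follows a reset-then-append pattern: the picture parameter buffer clears the per-frame entry count, and each slice parameter buffer appends one entry while the array has room. A minimal sketch of that pattern, with hypothetical `sketch_*` types standing in for the surface parameters and `SKETCH_MAX_TILES` standing in for `AV1_MAX_TILES`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_TILES 4 /* hypothetical; the real bound is AV1_MAX_TILES */

struct sketch_tile_entry {
	unsigned int offset;
	unsigned int size;
};

struct sketch_av1_params {
	struct sketch_tile_entry entries[SKETCH_MAX_TILES];
	unsigned int num_entries;
};

/* Called when a new picture parameter buffer arrives (start of a frame). */
static void sketch_begin_frame(struct sketch_av1_params *p)
{
	assert(p != NULL);
	p->num_entries = 0;
}

/*
 * Called per slice parameter buffer. Returns 1 if stored, 0 if dropped.
 * The return value is an addition of this sketch; the code above drops
 * the overflowing entry silently.
 */
static int sketch_store_tile(struct sketch_av1_params *p,
			     const struct sketch_tile_entry *e)
{
	unsigned int n;

	assert(p != NULL && e != NULL);

	n = p->num_entries;
	if (n >= SKETCH_MAX_TILES)
		return 0;

	memcpy(&p->entries[n], e, sizeof(*e));
	p->num_entries = n + 1;
	return 1;
}
```

Without the reset, entries from the previous frame would still be counted when the next frame's tile groups are appended.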
|
||||
@@ -224,7 +241,6 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
|
||||
case VAProfileH264ConstrainedBaseline:
|
||||
case VAProfileH264MultiviewHigh:
|
||||
case VAProfileH264StereoHigh:
|
||||
case VAProfileH264High10:
|
||||
memcpy(&surface_object->params.h264.matrix,
|
||||
buffer_object->data,
|
||||
sizeof(surface_object->params.h264.matrix));
|
||||
@@ -232,7 +248,6 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
|
||||
break;
|
||||
|
||||
case VAProfileHEVCMain:
|
||||
case VAProfileHEVCMain10:
|
||||
memcpy(&surface_object->params.h265.iqmatrix,
|
||||
buffer_object->data,
|
||||
sizeof(surface_object->params.h265.iqmatrix));
|
||||
@@ -292,7 +307,6 @@ static VAStatus codec_set_controls(struct request_data *driver_data,
|
||||
case VAProfileH264ConstrainedBaseline:
|
||||
case VAProfileH264MultiviewHigh:
|
||||
case VAProfileH264StereoHigh:
|
||||
case VAProfileH264High10:
|
||||
rc = h264_set_controls(driver_data, context, profile,
|
||||
surface_object);
|
||||
if (rc < 0)
|
||||
@@ -300,7 +314,6 @@ static VAStatus codec_set_controls(struct request_data *driver_data,
|
||||
break;
|
||||
|
||||
case VAProfileHEVCMain:
|
||||
case VAProfileHEVCMain10:
|
||||
rc = h265_set_controls(driver_data, context, surface_object);
|
||||
if (rc < 0)
|
||||
return VA_STATUS_ERROR_OPERATION_FAILED;
|
||||
@@ -317,29 +330,10 @@ static VAStatus codec_set_controls(struct request_data *driver_data,
|
||||
if (rc < 0)
|
||||
return VA_STATUS_ERROR_OPERATION_FAILED;
|
||||
break;
|
||||
|
||||
case VAProfileAV1Profile0:
|
||||
/*
|
||||
* AV1 has no codec-specific V4L2 control dispatch wired up
|
||||
* yet on this branch (see config.c VAProfileAV1Profile0
|
||||
* comment). For the daedalus_v4l2 daemon path that's fine:
|
||||
* AV1 frames are self-describing per-frame (OBU sequence +
|
||||
* frame headers carry everything libavcodec needs), so the
|
||||
* bitstream in the V4L2 OUTPUT buffer is sufficient — no
|
||||
* V4L2_CID_STATELESS_AV1_* controls have to be populated.
|
||||
*
|
||||
* Per-codec dispatch in request_switch_device_for_profile
|
||||
* has already retargeted (video_fd, media_fd) to
|
||||
* video_fd_daedalus (or video_fd_vpu981 on RK3588 if
|
||||
* present) by the time we get here; the OUTPUT buffer will
|
||||
* be queued via that fd and the kernel forwards bytes to
|
||||
* the daemon as a regular REQ_DECODE. No-op is the
|
||||
* correct shape.
|
||||
*
|
||||
* When the vpu981-targeted V4L2_CID_STATELESS_AV1_* dispatch
|
||||
* lands from the av1-iter1 operator branch, replace this
|
||||
* with av1_set_controls(...).
|
||||
*/
|
||||
rc = av1_set_controls(driver_data, context, surface_object);
|
||||
if (rc < 0)
|
||||
return VA_STATUS_ERROR_OPERATION_FAILED;
|
||||
break;
|
||||
|
||||
default:
|
||||
@@ -367,6 +361,12 @@ VAStatus RequestBeginPicture(VADriverContextP context, VAContextID context_id,
|
||||
if (surface_object == NULL)
|
||||
return VA_STATUS_ERROR_INVALID_SURFACE;
|
||||
|
||||
/* AV1 Phase 3 diag */
|
||||
request_log("BeginPicture id=0x%x prev_slot=%p status=%d\n",
|
||||
surface_object->base.id,
|
||||
(void *)surface_object->current_slot,
|
||||
surface_object->status);
|
||||
|
||||
if (surface_object->status == VASurfaceRendering)
|
||||
RequestSyncSurface(context, surface_id);
|
||||
|
||||
@@ -378,9 +378,30 @@ VAStatus RequestBeginPicture(VADriverContextP context, VAContextID context_id,
|
||||
* first. The new slot is bound and its V4L2 index + mmap pointers
|
||||
* are mirrored into surface_object->destination_* so the existing
|
||||
* QBUF/DQBUF/EXPBUF code paths see no behavioral change.
|
||||
*
|
||||
* AV1 Phase 3 finding: LIBVA_SKIP_REBIND=1 experiment (do NOT
|
||||
* unbind on rebind) did not improve PASS count for the av1_larger
|
||||
* film_grain stress vector — proving the iter2 Fix 3 release is
|
||||
* NOT the source of the inter-frame divergence. The issue is
|
||||
* deeper in ffmpeg-vaapi's AV1 hwaccel: per byte-equal OUTPUT
|
||||
* comparison with the patched-ffmpeg-v4l2request reference run
|
||||
* (LD_LIBRARY_PATH override on a debug libavcodec.so), 7/7 first
|
||||
* EndPicture submissions are byte-identical, libva has 2 EXTRA.
|
||||
*/
|
||||
if (surface_object->current_slot != NULL)
|
||||
surface_unbind_slot(driver_data, surface_object);
|
||||
|
||||
/*
|
||||
* AV1 Phase 5 review Amendment 4: clear any stale
|
||||
* linked_decode_surface_id from a prior film_grain display→decode
|
||||
* link. If ffmpeg-vaapi recycles a former display surface as a
|
||||
* decode target, BeginPicture binds a fresh slot — but without
|
||||
* this reset, copy_surface_to_image's link-follow would still
|
||||
* borrow from the now-stale linked surface and serve wrong data.
|
||||
* Cleared unconditionally (cheap) so the next AV1 grain frame
|
||||
* re-establishes the link if needed.
|
||||
*/
|
||||
surface_object->linked_decode_surface_id = VA_INVALID_SURFACE;
|
||||
{
|
||||
struct cap_pool_slot *cap_slot =
|
||||
cap_pool_acquire(&driver_data->capture_pool, surface_id);
|
||||
|
||||
+141
-209
@@ -93,10 +93,6 @@
|
||||
static const char * const known_decoder_drivers[] = {
|
||||
"rkvdec",
|
||||
"hantro-vpu",
|
||||
"rpi-hevc-dec", /* iter40: Pi 5 / CM5 stateless HEVC */
|
||||
#ifdef HAVE_DAEDALUS_V4L2
|
||||
"daedalus_v4l2", /* phase 8.10: Pi 5 daemon-backed VP9/AV1/H264 */
|
||||
#endif
|
||||
"cedrus",
|
||||
"sun4i_csi",
|
||||
NULL
|
||||
@@ -329,6 +325,37 @@ static bool probe_hevc_ext_sps_rps_controls(int video_fd)
|
||||
return true;
|
||||
}
|
||||
|
||||
/*
|
||||
* Inspect a /dev/videoN's OUTPUT formats for `want_pixfmt`. Returns true
|
||||
* iff at least one OUTPUT/OUTPUT_MPLANE format matches.
|
||||
*
|
||||
* Used to discriminate between multiple devices sharing a driver name —
|
||||
* RK3588 has 3 hantro-vpu instances and only one of them is vpu981 (the
|
||||
* dedicated AV1 decoder advertising V4L2_PIX_FMT_AV1_FRAME).
|
||||
*/
|
||||
static bool video_node_supports_output_fmt(int video_fd, uint32_t want_pixfmt)
|
||||
{
|
||||
struct v4l2_fmtdesc desc;
|
||||
const enum v4l2_buf_type types[] = {
|
||||
V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
|
||||
V4L2_BUF_TYPE_VIDEO_OUTPUT,
|
||||
};
|
||||
unsigned int t, i;
|
||||
|
||||
for (t = 0; t < sizeof(types) / sizeof(types[0]); t++) {
|
||||
for (i = 0; i < 64; i++) {
|
||||
memset(&desc, 0, sizeof desc);
|
||||
desc.index = i;
|
||||
desc.type = types[t];
|
||||
if (ioctl(video_fd, VIDIOC_ENUM_FMT, &desc) < 0)
|
||||
break;
|
||||
if (desc.pixelformat == want_pixfmt)
|
||||
return true;
|
||||
}
|
||||
}
|
||||
return false;
|
||||
}
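The scan loop above can be exercised without hardware by hiding the ioctl behind a callback. This is a sketch under that assumption: `sketch_enum_fmt_fn` is a hypothetical stand-in for `VIDIOC_ENUM_FMT`, and `sketch_node` with `sketch_mock_enum` form a fake device that exists only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for VIDIOC_ENUM_FMT: writes the format at
 * (type, index) to *pixfmt and returns 0, or returns -1 past the end.
 */
typedef int (*sketch_enum_fmt_fn)(void *ctx, unsigned int type,
				  unsigned int index, uint32_t *pixfmt);

static bool sketch_supports_output_fmt(sketch_enum_fmt_fn enum_fmt, void *ctx,
				       const unsigned int *types,
				       size_t num_types, uint32_t want_pixfmt)
{
	size_t t;
	unsigned int i;
	uint32_t pixfmt;

	assert(enum_fmt != NULL && types != NULL);

	for (t = 0; t < num_types; t++) {
		/* Same cap of 64 indices per buffer type as the loop above. */
		for (i = 0; i < 64; i++) {
			if (enum_fmt(ctx, types[t], i, &pixfmt) < 0)
				break;
			if (pixfmt == want_pixfmt)
				return true;
		}
	}
	return false;
}

/* Fake device: one buffer type with a fixed list of formats. */
struct sketch_node {
	unsigned int type;
	const uint32_t *fmts;
	unsigned int num_fmts;
};

static int sketch_mock_enum(void *ctx, unsigned int type, unsigned int index,
			    uint32_t *pixfmt)
{
	const struct sketch_node *node = ctx;

	if (type != node->type || index >= node->num_fmts)
		return -1;
	*pixfmt = node->fmts[index];
	return 0;
}
```

A failed enumeration on one buffer type only ends that inner loop, so a node that exposes its formats on the second type in the list is still found.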
|
||||
|
||||
static int find_decoder_device_by_driver(const char *want_driver,
|
||||
char *video_out, size_t video_out_sz,
|
||||
char *media_out, size_t media_out_sz)
|
||||
@@ -376,6 +403,65 @@ static int find_decoder_device_by_driver(const char *want_driver,
|
||||
return -1;
|
||||
}
|
||||
|
||||
/*
|
||||
* ampere-av1-enablement Phase 2 — like find_decoder_device_by_driver but
|
||||
* additionally verifies the resolved /dev/videoN advertises `want_pixfmt`
|
||||
* as an OUTPUT format. Required for RK3588 where 3 hantro-vpu instances
|
||||
* share the driver name but only one is vpu981 (AV1 decoder).
|
||||
*
|
||||
* Walks all /dev/media* with matching driver name; takes the first hit
|
||||
* whose OUTPUT formats include `want_pixfmt`. Non-matching candidates
|
||||
* (encoder-only nodes, legacy hantro for MPEG2/VP8) are skipped.
|
||||
*/
|
||||
static int find_decoder_device_by_driver_with_fmt(const char *want_driver,
|
||||
uint32_t want_pixfmt,
|
||||
char *video_out,
|
||||
size_t video_out_sz,
|
||||
char *media_out,
|
||||
size_t media_out_sz)
|
||||
{
|
||||
struct media_device_info info;
|
||||
char path[32];
|
||||
char vpath[32];
|
||||
int fd, vfd, i;
|
||||
|
||||
for (i = 0; i < 16; i++) {
|
||||
snprintf(path, sizeof path, "/dev/media%d", i);
|
||||
fd = open(path, O_RDWR | O_NONBLOCK);
|
||||
if (fd < 0)
|
||||
continue;
|
||||
memset(&info, 0, sizeof info);
|
||||
if (ioctl(fd, MEDIA_IOC_DEVICE_INFO, &info) != 0) {
|
||||
close(fd);
|
||||
continue;
|
||||
}
|
||||
if (strcmp(info.driver, want_driver) != 0) {
|
||||
close(fd);
|
||||
continue;
|
||||
}
|
||||
if (find_decoder_video_node_via_topology(fd, vpath,
|
||||
sizeof vpath) != 0) {
|
||||
close(fd);
|
||||
continue;
|
||||
}
|
||||
close(fd);
|
||||
|
||||
/* Capability check: does this /dev/videoN advertise the
|
||||
* codec-specific OUTPUT format? */
|
||||
vfd = open(vpath, O_RDWR | O_NONBLOCK);
|
||||
if (vfd < 0)
|
||||
continue;
|
||||
if (video_node_supports_output_fmt(vfd, want_pixfmt)) {
|
||||
close(vfd);
|
||||
snprintf(video_out, video_out_sz, "%s", vpath);
|
||||
snprintf(media_out, media_out_sz, "%s", path);
|
||||
return 0;
|
||||
}
|
||||
close(vfd);
|
||||
}
|
||||
return -1;
|
||||
}
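The selection rule implemented above (same driver name, first node whose OUTPUT formats include the wanted pixel format) can be sketched over an in-memory table. Everything here is hypothetical: the `sketch_candidate` struct collapses each node to a single OUTPUT format, and the paths and format codes in any usage are placeholders, not values read from a real RK3588.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_candidate {
	const char *driver;     /* media device driver name */
	const char *video_path; /* resolved /dev/videoN */
	uint32_t output_fmt;    /* simplified: one OUTPUT format per node */
};

/* Returns the first matching node's path, or NULL if none qualifies. */
static const char *sketch_pick_node(const struct sketch_candidate *c, size_t n,
				    const char *want_driver,
				    uint32_t want_pixfmt)
{
	size_t i;

	assert(c != NULL && want_driver != NULL);

	for (i = 0; i < n; i++) {
		if (strcmp(c[i].driver, want_driver) != 0)
			continue; /* different driver */
		if (c[i].output_fmt != want_pixfmt)
			continue; /* same driver, wrong codec instance */
		return c[i].video_path;
	}
	return NULL;
}
```

Matching on the driver name alone would return the first of several identically named instances, which is the ambiguity the format check resolves.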
|
||||
|
||||
static int find_codec_device(char *video_out, size_t video_out_sz,
|
||||
char *media_out, size_t media_out_sz)
|
||||
{
|
||||
@@ -413,15 +499,7 @@ char request_device_kind_for_profile(VAProfile profile)
|
||||
case VAProfileVP8Version0_3:
|
||||
return 'h';
|
||||
case VAProfileAV1Profile0:
|
||||
/*
|
||||
* ampere-av1-enablement Phase 2: RK3588 vpu981 dedicated
|
||||
* AV1 hantro instance. 'a' kind dispatches to
|
||||
* driver_data->video_fd_vpu981. On hosts without the AV1
|
||||
* instance the fd stays -1 and RequestQueryConfigProfiles
|
||||
* never enumerates AV1, so this branch is unreachable for
|
||||
* non-RK3588 hosts.
|
||||
*/
|
||||
return 'a';
|
||||
return 'a'; /* ampere-av1-enablement: vpu981 dedicated AV1 */
|
||||
default:
|
||||
return '?';
|
||||
}
|
||||
@@ -445,77 +523,15 @@ int request_switch_device_for_profile(struct request_data *driver_data,
|
||||
char kind = request_device_kind_for_profile(profile);
|
||||
int target_video, target_media;
|
||||
|
||||
/*
|
||||
* iter40: HEVC override when rpi-hevc-dec is probed. The static
|
||||
* table (request_device_kind_for_profile) maps HEVC → 'r' (rkvdec)
|
||||
* because that's the canonical RK path. On Pi 5 there's no rkvdec
|
||||
* — rpi-hevc-dec is the only decoder. When BOTH would be present
|
||||
* (hypothetical mixed board), prefer rpi-hevc-dec for HEVC.
|
||||
*
|
||||
* Other rkvdec-routed profiles (VP9, H.264) stay on 'r' because
|
||||
* rpi-hevc-dec is HEVC-only.
|
||||
*/
|
||||
if ((profile == VAProfileHEVCMain || profile == VAProfileHEVCMain10) &&
|
||||
driver_data->video_fd_rpi_hevc_dec >= 0 &&
|
||||
driver_data->media_fd_rpi_hevc_dec >= 0) {
|
||||
kind = 'p';
|
||||
}
|
||||
|
||||
#ifdef HAVE_DAEDALUS_V4L2
|
||||
/*
|
||||
* LIBVA-1: VP9/AV1/H.264 → daedalus_v4l2 when the daemon-backed
|
||||
* decoder fd is open. Pi 5 has no rkvdec (those profiles map to
|
||||
* 'r' by default → video_fd_rkvdec = -1 → "stay on whatever's
|
||||
* active" fallback would put H.264 frames on rpi-hevc-dec's fd
|
||||
* and S_FMT would fail). Re-route to the daedalus daemon instead.
|
||||
*
|
||||
* HEVC stays on 'p' (rpi-hevc-dec is HEVC-only — daedalus would
|
||||
* accept it via FFmpeg, but rpi-hevc-dec has the GPU-backed
|
||||
* hardware path so it's the right choice on this SoC).
|
||||
*
|
||||
* AV1 'a' kind (RK3588 vpu981) wins ONLY if vpu981 was probed.
|
||||
* On a Pi 5 the vpu981 slot stays -1, so we still route AV1 to
|
||||
* daedalus here. Check video_fd_vpu981 to preserve the RK3588
|
||||
* priority for that case.
|
||||
*/
|
||||
if (driver_data->video_fd_daedalus >= 0 &&
|
||||
driver_data->media_fd_daedalus >= 0) {
|
||||
switch (profile) {
|
||||
case VAProfileH264Main:
|
||||
case VAProfileH264High:
|
||||
case VAProfileH264ConstrainedBaseline:
|
||||
case VAProfileH264MultiviewHigh:
|
||||
case VAProfileH264StereoHigh:
|
||||
case VAProfileVP9Profile0:
|
||||
kind = 'd';
|
||||
break;
|
||||
case VAProfileAV1Profile0:
|
||||
if (driver_data->video_fd_vpu981 < 0)
|
||||
kind = 'd';
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
if (kind == 'r') {
|
||||
target_video = driver_data->video_fd_rkvdec;
|
||||
target_media = driver_data->media_fd_rkvdec;
|
||||
} else if (kind == 'h') {
|
||||
target_video = driver_data->video_fd_hantro;
|
||||
target_media = driver_data->media_fd_hantro;
|
||||
} else if (kind == 'p') {
|
||||
target_video = driver_data->video_fd_rpi_hevc_dec;
|
||||
target_media = driver_data->media_fd_rpi_hevc_dec;
|
||||
} else if (kind == 'a') {
|
||||
target_video = driver_data->video_fd_vpu981;
|
||||
target_media = driver_data->media_fd_vpu981;
|
||||
#ifdef HAVE_DAEDALUS_V4L2
|
||||
} else if (kind == 'd') {
|
||||
target_video = driver_data->video_fd_daedalus;
|
||||
target_media = driver_data->media_fd_daedalus;
|
||||
#endif
|
||||
} else {
|
||||
return -1;
|
||||
}
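The routing rules spelled out in the override comments of this hunk can be condensed into a small decision function. This is a sketch of those rules as the comments state them, not the backend's code: the `sketch_profile` enum and `sketch_fds` struct are hypothetical, each decoder is reduced to a single fd instead of a video/media pair, and the `HAVE_DAEDALUS_V4L2` build switch is ignored.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_profile {
	SKETCH_H264, SKETCH_HEVC, SKETCH_VP8, SKETCH_VP9, SKETCH_AV1
};

/* One fd per decoder; -1 means the decoder was not probed. */
struct sketch_fds {
	int rkvdec, hantro, rpi_hevc_dec, vpu981, daedalus;
};

/* Static default: rkvdec for H.264/HEVC/VP9, hantro for VP8, vpu981 for AV1. */
static char sketch_base_kind(enum sketch_profile p)
{
	switch (p) {
	case SKETCH_H264:
	case SKETCH_HEVC:
	case SKETCH_VP9:
		return 'r';
	case SKETCH_VP8:
		return 'h';
	case SKETCH_AV1:
		return 'a';
	}
	return '?';
}

static char sketch_route(enum sketch_profile p, const struct sketch_fds *fds)
{
	char kind = sketch_base_kind(p);

	assert(fds != NULL);

	/* HEVC prefers rpi-hevc-dec whenever it was probed. */
	if (p == SKETCH_HEVC && fds->rpi_hevc_dec >= 0)
		kind = 'p';

	if (fds->daedalus >= 0) {
		if (p == SKETCH_H264 || p == SKETCH_VP9)
			kind = 'd';
		else if (p == SKETCH_AV1 && fds->vpu981 < 0)
			kind = 'd'; /* vpu981 keeps priority when present */
	}
	return kind;
}
```

On a Pi 5 style host (only rpi-hevc-dec and daedalus probed) HEVC routes to 'p' and the remaining codecs to 'd'; on an RK3588 style host without daedalus the static table is returned unchanged.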
|
||||
@@ -703,10 +719,6 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
|
||||
driver_data->media_fd_rkvdec = -1;
|
||||
driver_data->video_fd_hantro = -1;
|
||||
driver_data->media_fd_hantro = -1;
|
||||
driver_data->video_fd_rpi_hevc_dec = -1;
|
||||
driver_data->media_fd_rpi_hevc_dec = -1;
|
||||
driver_data->video_fd_daedalus = -1;
|
||||
driver_data->media_fd_daedalus = -1;
|
||||
driver_data->video_fd_vpu981 = -1;
|
||||
driver_data->media_fd_vpu981 = -1;
|
||||
|
||||
@@ -739,36 +751,6 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
alt_driver = "rkvdec";
driver_data->video_fd_hantro = video_fd;
driver_data->media_fd_hantro = media_fd;
} else if (strcmp(info.driver, "rpi-hevc-dec") == 0) {
/* iter40 + LIBVA-1: Pi 5 / CM5. rpi-hevc-dec is
* HEVC-only. If daedalus_v4l2 is ALSO loaded (Pi 5
* mixed deployment — out-of-tree daemon-backed
* decoder for VP9/AV1/H264), pick it up as the alt
* so VP9/AV1/H264 have somewhere to land. */
primary_driver = "rpi-hevc-dec";
#ifdef HAVE_DAEDALUS_V4L2
alt_driver = "daedalus_v4l2";
#else
alt_driver = NULL;
#endif
driver_data->video_fd_rpi_hevc_dec = video_fd;
driver_data->media_fd_rpi_hevc_dec = media_fd;
#ifdef HAVE_DAEDALUS_V4L2
} else if (strcmp(info.driver, "daedalus_v4l2") == 0) {
/* phase 8.10 + LIBVA-1: Pi 5 daemon-backed decoder.
* VP9 / AV1 / H.264 route through it via the 'd'
* kind below. On a mixed-driver box where
* rpi-hevc-dec is ALSO loaded, pick it up as the
* alt so HEVC has somewhere to land too — find_
* codec_device's known_decoder_drivers[] order
* normally puts rpi-hevc-dec first (we hit the
* other branch in practice), but symmetric handling
* keeps us correct if probe order ever flips. */
primary_driver = "daedalus_v4l2";
alt_driver = "rpi-hevc-dec";
driver_data->video_fd_daedalus = video_fd;
driver_data->media_fd_daedalus = media_fd;
#endif
}
}

@@ -780,38 +762,15 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
int alt_v = open(alt_video, O_RDWR | O_NONBLOCK);
int alt_m = (alt_v >= 0) ? open(alt_media, O_RDWR | O_NONBLOCK) : -1;
if (alt_v >= 0 && alt_m >= 0) {
/* Dispatch into the matching per-driver slot.
* iter38 only had rkvdec/hantro pairs; iter40 +
* LIBVA-1 extended this to rpi-hevc-dec and
* daedalus_v4l2 for the Pi 5 mixed-decoder
* deployment. */
if (strcmp(alt_driver, "rkvdec") == 0) {
driver_data->video_fd_rkvdec = alt_v;
driver_data->media_fd_rkvdec = alt_m;
} else if (strcmp(alt_driver, "hantro-vpu") == 0) {
} else {
driver_data->video_fd_hantro = alt_v;
driver_data->media_fd_hantro = alt_m;
} else if (strcmp(alt_driver, "rpi-hevc-dec") == 0) {
driver_data->video_fd_rpi_hevc_dec = alt_v;
driver_data->media_fd_rpi_hevc_dec = alt_m;
#ifdef HAVE_DAEDALUS_V4L2
} else if (strcmp(alt_driver, "daedalus_v4l2") == 0) {
driver_data->video_fd_daedalus = alt_v;
driver_data->media_fd_daedalus = alt_m;
#endif
} else {
/* Shouldn't happen — primary_driver branches
* above only set alt_driver to one of the
* names handled here. Close and move on. */
close(alt_v);
close(alt_m);
alt_v = -1;
alt_m = -1;
}
if (alt_v >= 0) {
request_log("iter38: also opened %s decoder at %s + %s\n",
alt_driver, alt_video, alt_media);
}
request_log("iter38: also opened %s decoder at %s + %s\n",
alt_driver, alt_video, alt_media);
} else {
if (alt_v >= 0) close(alt_v);
if (alt_m >= 0) close(alt_m);
@@ -821,57 +780,36 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
(void)primary_driver;

/*
* ampere-av1-enablement Phase 2: walk hantro-vpu media nodes
* for a SECOND one that advertises V4L2_PIX_FMT_AV1_FRAME
* (AV1F) as OUTPUT pixfmt. RK3588 has 3 hantro-vpu instances
* (legacy MPEG2/VP8 decoder, vepu121 encoder, vpu981 AV1
* decoder) all reporting driver="hantro-vpu" / model="hantro-
* vpu" — so OUTPUT-format probe is the only reliable
* disambiguator that doesn't depend on parsing card-name
* strings (which are DTS-dependent). First match wins.
*
* On non-RK3588 hosts the slot stays -1; RequestQueryConfig
* Profiles' AV1 push then no-ops because any_fd_supports_
* output_format() returns false for AV1F.
* ampere-av1-enablement Phase 2 — additionally probe for
* vpu981 (RK3588's dedicated AV1 decoder). Driver name
* "hantro-vpu" alone is ambiguous on RK3588 (3 instances:
* legacy MPEG2/VP8, encoder, vpu981 AV1). Discriminate by
* V4L2_PIX_FMT_AV1_FRAME capability. If the primary or alt
* hantro happens to BE vpu981 (unlikely but possible on
* non-RK3588 boards), this probe finds it again and we just
* dedupe via the fd value.
*/
{
int i;
char path[32], av1_video[32];

for (i = 0; i < 16; i++) {
int mfd, vfd;
struct media_device_info info;

snprintf(path, sizeof path, "/dev/media%d", i);
mfd = open(path, O_RDWR | O_NONBLOCK);
if (mfd < 0) continue;
memset(&info, 0, sizeof info);
if (ioctl(mfd, MEDIA_IOC_DEVICE_INFO, &info) != 0 ||
strcmp(info.driver, "hantro-vpu") != 0) {
close(mfd);
continue;
static char av1_video[32], av1_media[32];
if (find_decoder_device_by_driver_with_fmt(
"hantro-vpu", V4L2_PIX_FMT_AV1_FRAME,
av1_video, sizeof av1_video,
av1_media, sizeof av1_media) == 0) {
int av1_v = open(av1_video, O_RDWR | O_NONBLOCK);
int av1_m = (av1_v >= 0)
? open(av1_media, O_RDWR | O_NONBLOCK)
: -1;
if (av1_v >= 0 && av1_m >= 0) {
driver_data->video_fd_vpu981 = av1_v;
driver_data->media_fd_vpu981 = av1_m;
request_log(
"ampere-av1: vpu981 AV1 decoder "
"at %s + %s\n",
av1_video, av1_media);
} else {
if (av1_v >= 0) close(av1_v);
if (av1_m >= 0) close(av1_m);
}
if (find_decoder_video_node_via_topology(
mfd, av1_video, sizeof av1_video) != 0) {
close(mfd);
continue;
}
vfd = open(av1_video, O_RDWR | O_NONBLOCK);
if (vfd < 0) {
close(mfd);
continue;
}
if (!v4l2_find_format(vfd, V4L2_BUF_TYPE_VIDEO_OUTPUT, V4L2_PIX_FMT_AV1_FRAME) &&
!v4l2_find_format(vfd, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, V4L2_PIX_FMT_AV1_FRAME)) {
close(vfd);
close(mfd);
continue;
}
driver_data->video_fd_vpu981 = vfd;
driver_data->media_fd_vpu981 = mfd;
request_log("ampere-av1: vpu981 AV1 decoder at %s + %s\n",
av1_video, path);
break;
}
}
}
@@ -886,27 +824,29 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_rkvdec);
driver_data->has_hevc_ext_sps_rps_hantro =
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_hantro);
driver_data->has_hevc_ext_sps_rps_rpi_hevc_dec =
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_rpi_hevc_dec);
if (driver_data->has_hevc_ext_sps_rps_rkvdec) {
request_log("iter2: kernel registers HEVC EXT_SPS_{ST,LT}_RPS "
"controls on rkvdec fd (will route through "
"vendored GStreamer parser)\n");
}
if (driver_data->video_fd_rpi_hevc_dec >= 0) {
request_log("iter40: also opened rpi-hevc-dec at video_fd=%d "
"media_fd=%d (Pi 5 HEVC stateless)\n",
driver_data->video_fd_rpi_hevc_dec,
driver_data->media_fd_rpi_hevc_dec);

/*
* ampere-av1 Phase 2.1: probe V4L2_CID_STATELESS_AV1_FILM_GRAIN
* on the vpu981 fd. Per Janet v3 amendment, this runs at backend
* init (not lazily) so any race window with concurrent device
* switching can't observe an inconsistent flag.
*/
driver_data->has_av1_film_grain = false;
if (driver_data->video_fd_vpu981 >= 0) {
struct v4l2_query_ext_ctrl qec;
if (v4l2_query_ext_ctrl(driver_data->video_fd_vpu981,
V4L2_CID_STATELESS_AV1_FILM_GRAIN,
&qec) == 0) {
driver_data->has_av1_film_grain = true;
request_log("ampere-av1: vpu981 advertises FILM_GRAIN "
"control (will include in per-frame batch)\n");
}
}
#ifdef HAVE_DAEDALUS_V4L2
if (driver_data->video_fd_daedalus >= 0) {
request_log("phase 8.10: opened daedalus_v4l2 at video_fd=%d "
"media_fd=%d (Pi 5 daemon-backed VP9/AV1/H264)\n",
driver_data->video_fd_daedalus,
driver_data->media_fd_daedalus);
}
#endif

status = VA_STATUS_SUCCESS;
goto complete;
@@ -954,23 +894,15 @@ VAStatus RequestTerminate(VADriverContextP context)
close(driver_data->video_fd_hantro);
if (driver_data->media_fd_hantro >= 0)
close(driver_data->media_fd_hantro);
if (driver_data->video_fd_rpi_hevc_dec >= 0)
close(driver_data->video_fd_rpi_hevc_dec);
if (driver_data->media_fd_rpi_hevc_dec >= 0)
close(driver_data->media_fd_rpi_hevc_dec);
if (driver_data->video_fd_vpu981 >= 0)
close(driver_data->video_fd_vpu981);
if (driver_data->media_fd_vpu981 >= 0)
close(driver_data->media_fd_vpu981);
#ifdef HAVE_DAEDALUS_V4L2
if (driver_data->video_fd_daedalus >= 0)
close(driver_data->video_fd_daedalus);
if (driver_data->media_fd_daedalus >= 0)
close(driver_data->media_fd_daedalus);
#endif
/* Fall back to direct close if neither alt fd captured the active
* pair (env-override path). */
if (driver_data->video_fd_rkvdec < 0 && driver_data->video_fd_hantro < 0) {
if (driver_data->video_fd_rkvdec < 0 &&
driver_data->video_fd_hantro < 0 &&
driver_data->video_fd_vpu981 < 0) {
if (driver_data->video_fd >= 0)
close(driver_data->video_fd);
if (driver_data->media_fd >= 0)
+22
-88
@@ -42,16 +42,7 @@

#define V4L2_REQUEST_STR_VENDOR "v4l2-request"

/*
* Sized for max-possible enumeration with iter39 Option B reverted:
* MPEG2(2) + H264(6 incl. Hi10P) + HEVC(2 incl. Main10) + VP8 + VP9 + AV1 = 13.
* The per-group guards use `if (... && index < (MAX_PROFILES - N))` where N
* is the push-group size, so MAX must be ≥ total+1 — 14 here. Bumping
* defensively now so a future re-enable of Hi10P/Main10 doesn't silently
* drop AV1 through the off-by-one trap that ate ampere-av1's enumeration
* for a week (see issue marfrit/libva-v4l2-request-fourier#2).
*/
#define V4L2_REQUEST_MAX_PROFILES 14
#define V4L2_REQUEST_MAX_PROFILES 11
#define V4L2_REQUEST_MAX_ENTRYPOINTS 5
#define V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES 10
#define V4L2_REQUEST_MAX_IMAGE_FORMATS 10
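Reviewer note: the off-by-one described in the removed comment of this hunk can be reproduced in isolation. The sketch below only models the guard shape quoted there (`index < (MAX_PROFILES - N)`); `sketch_enumerate_profiles` and the group-size array are hypothetical stand-ins, not the backend's `RequestQueryConfigProfiles`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the per-group guard: a group of n profiles is
 * pushed only when index < max_profiles - n. Returns how many profiles
 * end up enumerated. */
static unsigned int sketch_enumerate_profiles(unsigned int max_profiles,
					      const unsigned int *group_sizes,
					      size_t group_count)
{
	unsigned int index = 0;
	size_t g;

	for (g = 0; g < group_count; g++) {
		unsigned int n = group_sizes[g];

		/* n <= max_profiles keeps the unsigned subtraction from
		 * wrapping; the second test is the guard from the comment. */
		if (n <= max_profiles && index < max_profiles - n)
			index += n;
	}

	return index;
}
```

With the group sizes listed in the comment (MPEG2 2, H264 6, HEVC 2, VP8 1, VP9 1, AV1 1), a limit of 13 stops at 12 profiles because the final single-profile group sees `12 < 12`, while a limit of 14 enumerates all 13. That matches the "MAX must be at least total+1" statement.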
@@ -87,45 +78,17 @@ struct request_data {
int media_fd_rkvdec;
int video_fd_hantro;
int media_fd_hantro;

/*
* iter40: third multi-device-probe slot for rpi-hevc-dec (Pi 5 /
* CM5 / BCM2712). V4L2 stateless HEVC; CAPTURE is NC12/NC30 SAND
* 128-pixel-wide column tiled (Pi-specific). On Pi 5 this is the
* ONLY decoder slot; on RK hosts it stays -1 and HEVC routes to
* rkvdec as before.
*/
int video_fd_rpi_hevc_dec;
int media_fd_rpi_hevc_dec;
/*
* phase 8.10: fifth multi-device-probe slot for daedalus_v4l2 — the
* out-of-tree V4L2 stateless decoder shim that forwards bitstream
* to a userspace daemon (daedalus-v4l2 sibling repo). Daemon does
* FFmpeg-software decode for VP9 / AV1 / H.264 and ships pixels
* back via dmabuf into the CAPTURE buffer. Picked up via the
* same media-controller probe + known_decoder_drivers[] entry
* pattern as iter40 rpi-hevc-dec. Stays -1 on hosts without the
* daedalus module loaded; HEVC routes to rpi-hevc-dec as before.
* ampere-av1-enablement Phase 2 — vpu981 is a THIRD physical
* hantro-vpu instance on RK3588 (separate from the legacy MPEG2/VP8
* hantro at /dev/video2). It's the dedicated AV1 decoder at
* /dev/video4 with card name "rockchip,rk3588-av1-vpu-dec".
*
* Fields are unconditional (8 bytes per session) so the struct
* layout is stable regardless of meson option. The active
* probe + dispatch code in request.c is gated by
* HAVE_DAEDALUS_V4L2; when disabled the fields stay at their
* -1 init and no codepath touches them.
*/
int video_fd_daedalus;
int media_fd_daedalus;
/*
* ampere-av1-enablement Phase 2: fourth multi-device-probe slot
* for vpu981 (RK3588's dedicated AV1 hantro instance, kernel
* card="rockchip,rk3588-av1-vpu-dec", driver name "hantro-vpu" —
* shared with the legacy MPEG-2/VP8/H.264 hantro). Discriminated
* by V4L2_PIX_FMT_AV1_FRAME (AV1F) OUTPUT-pixfmt capability since
* the driver name alone is ambiguous on RK3588. Stays -1 on hosts
* without the AV1 vpu-dec.
*
* Named "vpu981" for consistency with the in-progress av1-iter1
* operator branch (Phase 3-5 bit-exact AV1 work — when that lands
* these fields receive the actual decode dispatch wiring).
* Driver-name alone ("hantro-vpu") is ambiguous on RK3588 — three
* instances share the name. The probe discriminates by capability:
* which OUTPUT format does the device advertise? Only vpu981
* exposes V4L2_PIX_FMT_AV1_FRAME.
*/
int video_fd_vpu981;
int media_fd_vpu981;
@@ -149,12 +112,18 @@ struct request_data {
*/
bool has_hevc_ext_sps_rps_rkvdec;
bool has_hevc_ext_sps_rps_hantro;
/* iter40: rpi-hevc-dec doesn't expose EXT_SPS_*_RPS controls
* (verified Phase 0 higgs probe: QUERY_EXT_CTRL on 0xa97 → EINVAL).
* Probed for consistency with the iter2 pair-of-flags pattern;
* stays false on Pi 5 and the iter2 vendored-parser path naturally
* doesn't engage. */
bool has_hevc_ext_sps_rps_rpi_hevc_dec;

/*
* ampere-av1 Phase 2.1: probe result for the optional
* V4L2_CID_STATELESS_AV1_FILM_GRAIN control on the vpu981 fd.
* Probed at VA_DRIVER_INIT (per Janet v3 amendment — init-time
* not lazy). Consumed by av1_set_controls to conditionally include
* the 4th control in the per-frame batch.
*
* True iff vpu981 advertises the control via VIDIOC_QUERY_EXT_CTRL.
* False for non-RK3588 hosts (no vpu981 fd) or older kernels.
*/
bool has_av1_film_grain;

/*
* iter2 — cached SPS-derived RPS arrays. SPS NALs only appear in
@@ -179,30 +148,6 @@ struct request_data {
unsigned int hevc_rps_cache_lt_count;
bool hevc_rps_cache_valid;

/*
* iter40b: bitstream-derived SPS field cache for VAAPI-omitted
* fields. rpi-hevc-dec validates these against bitstream-true
* values; the rkvdec/hantro fallback (sps_max_dec_pic_buffering_minus1,
* 0) that satisfies §A.4.2 isn't enough for rpi.
*
* Cached on first IDR frame's SPS NAL parse, reused for subsequent
* non-IDR frames whose source_data may not carry an SPS.
*
* sps_max_sub_layers_minus1 is the index into max_*[] arrays. The
* V4L2 SPS struct fields are scalars (single sublayer), so we pick
* the HighestTid (= sps_max_sub_layers_minus1) slot — matches
* ffmpeg-vaapi + kdirect convention.
*/
struct {
bool valid;
uint8_t sps_max_sub_layers_minus1;
uint8_t max_dec_pic_buffering_minus1;
uint8_t max_num_reorder_pics;
uint8_t max_latency_increase_plus1;
bool scaling_list_enabled;
bool scaling_list_data_present;
} hevc_sps_field_cache;

struct video_format *video_format;

/*
@@ -259,17 +204,6 @@ struct request_data {
unsigned int fmt_buffers_count;
unsigned int fmt_sizes[VIDEO_MAX_PLANES];
unsigned int fmt_bytesperlines[VIDEO_MAX_PLANES];

/*
* iter39: active session is decoding a 10-bit profile (Hi10P / Main10).
* Set in RequestCreateContext from config->profile. Drives:
* - CAPTURE pix_fmt selection (NV15 instead of NV12)
* - image.c DeriveImage / QueryImageFormats fourcc reporting (P010
* instead of NV12)
* - copy_surface_to_image NV15→P010 unpack branch
* Reset to false at DestroyContext.
*/
bool is_10bit;
};

VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context);

+11
-11
@@ -111,6 +111,13 @@ void surface_unbind_slot(struct request_data *driver_data,
{
if (surface_object->current_slot == NULL)
return;
/* AV1 Phase 3 diag: log every unbind with surface id + slot idx
* + status — confirms whether BeginPicture rebind is racing the
* consumer's vaGetImage on the previous frame. */
request_log("surface_unbind_slot id=0x%x status=%d slot_idx=%u\n",
surface_object->base.id,
surface_object->status,
surface_object->current_slot->v4l2_index);
cap_pool_release(&driver_data->capture_pool, surface_object->current_slot);
surface_object->current_slot = NULL;
}
@@ -182,9 +189,7 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
* surface_bind_format_uniform_fields(); the per-slot
* destination_* fields fill at BeginPicture via surface_bind_slot.
*/
/* iter39: allow YUV420_10 for Hi10P / Main10 surface allocation. */
if (format != VA_RT_FORMAT_YUV420 &&
format != VA_RT_FORMAT_YUV420_10)
if (format != VA_RT_FORMAT_YUV420)
return VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT;

for (i = 0; i < surfaces_count; i++) {
@@ -194,6 +199,8 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
return VA_STATUS_ERROR_ALLOCATION_FAILED;

surface_object->current_slot = NULL; /* iter2 Fix 3 */
surface_object->linked_decode_surface_id = VA_INVALID_SURFACE;
surface_object->av1_order_hint = 0;
surface_object->destination_index = 0; /* set on bind */
surface_object->destination_planes_count = 0; /* set at CreateContext */
surface_object->destination_buffers_count = 0; /* set at CreateContext */
@@ -708,14 +715,7 @@ VAStatus RequestExportSurfaceHandle(VADriverContextP context,

planes_count = surface_object->destination_planes_count;

/* iter39: 10-bit session exports a DRM_FORMAT_NV15 buffer; advertise
* the matching fourcc so a PRIME consumer aware of NV15 (panfrost-
* Mesa et al.) can import correctly. PRIME consumers that only know
* NV12 / P010 should use the COPY (vaGetImage) path which unpacks
* NV15→P010 in image.c::copy_surface_to_image. */
surface_descriptor->fourcc = driver_data->is_10bit
? VA_FOURCC('N', 'V', '1', '5')
: VA_FOURCC_NV12;
surface_descriptor->fourcc = VA_FOURCC_NV12;
surface_descriptor->width = surface_object->width;
surface_descriptor->height = surface_object->height;
surface_descriptor->num_objects = export_fds_count;

@@ -89,6 +89,33 @@ struct object_surface {

struct timeval timestamp;

/*
* AV1 Phase 3: for streams with apply_grain=1, VAAPI's
* VADecPictureParameterBufferAV1 carries current_display_picture
* (display-time surface) separate from current_frame (decode
* target). vpu981 HW applies grain inline to the decode CAPTURE
* buffer, so the decoded data lives in current_frame's slot — but
* ffmpeg calls vaGetImage on current_display_picture which has no
* slot bound. linked_decode_surface_id, set in av1_set_controls
* on the display surface, points to the decode surface so
* copy_surface_to_image can borrow its destination_data[].
*
* VA_INVALID_SURFACE = no link (the common case: 8-bit codecs,
* AV1 with apply_grain=0, AV1 frames where cur_frame ==
* cur_display).
*/
VASurfaceID linked_decode_surface_id;

/*
* AV1 Phase 3: AV1 order_hint of the frame currently decoded into
* this surface. VAAPI's VADecPictureParameterBufferAV1.order_hint
* is per-frame; kernel's v4l2_ctrl_av1_frame.order_hints[8] is
* per-reference. We track each decoded frame's order_hint here so
* the next frame's av1_set_controls can populate order_hints[i]
* from ref_frame_map[i] → SURFACE → av1_order_hint.
*/
uint8_t av1_order_hint;

union {
struct {
VAPictureParameterBufferMPEG2 picture;
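Reviewer note: the `av1_order_hint` comment in this hunk describes a lookup chain (`ref_frame_map[i]` to surface to `av1_order_hint`). A minimal sketch of that chain follows, using simplified stand-in types instead of the real `object_surface` and VA-API structures. Every `sketch_` name, the flat surface pool, and the choice of 0 for missing references are assumptions for illustration; the real `av1_set_controls` may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_REFS 8
#define SKETCH_INVALID_SURFACE 0xffffffffu /* stand-in for VA_INVALID_SURFACE */

/* Stand-in for the two object_surface fields this example needs. */
struct sketch_surface {
	uint32_t id;
	uint8_t av1_order_hint; /* order_hint of the frame decoded into it */
};

/* Linear lookup of a surface by id; returns NULL when not found. */
static const struct sketch_surface *
sketch_lookup(const struct sketch_surface *pool, size_t count, uint32_t id)
{
	size_t i;

	if (id == SKETCH_INVALID_SURFACE)
		return NULL;

	for (i = 0; i < count; i++)
		if (pool[i].id == id)
			return &pool[i];

	return NULL;
}

/* Fill the per-reference hints from the per-surface bookkeeping.
 * Missing or invalid references get 0 in this sketch. */
static void sketch_fill_order_hints(uint8_t order_hints[SKETCH_NUM_REFS],
				    const uint32_t ref_frame_map[SKETCH_NUM_REFS],
				    const struct sketch_surface *pool,
				    size_t count)
{
	size_t i;

	for (i = 0; i < SKETCH_NUM_REFS; i++) {
		const struct sketch_surface *ref =
			sketch_lookup(pool, count, ref_frame_map[i]);

		order_hints[i] = ref != NULL ? ref->av1_order_hint : 0;
	}
}
```

The point of the per-surface field is that VA-API hands over only the current frame's `order_hint`, so the hints of earlier frames have to be remembered on the surfaces they were decoded into.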
@@ -122,6 +149,19 @@ struct object_surface {
VADecPictureParameterBufferVP9 picture;
VASliceParameterBufferVP9 slice;
} vp9;
/*
* ampere-av1-enablement: AV1 needs picture-header +
* variable number of slice/tile params (one per tile).
* tile_group_entries[] holds parsed VASliceParameterBufferAV1
* entries up to MAX_TILES; av1.c builds the matching
* v4l2_ctrl_av1_tile_group_entry[] at set_controls time.
*/
struct {
#define AV1_MAX_TILES 128
VADecPictureParameterBufferAV1 picture;
VASliceParameterBufferAV1 tile_group_entries[AV1_MAX_TILES];
unsigned int num_tile_group_entries;
} av1;
} params;

int request_fd;

+26
-27
@@ -433,6 +433,7 @@ static int v4l2_ioctl_controls(int video_fd, int request_fd, unsigned long ioc,
unsigned int num_controls)
{
struct v4l2_ext_controls controls;
int rc;

memset(&controls, 0, sizeof(controls));

@@ -444,7 +445,28 @@ static int v4l2_ioctl_controls(int video_fd, int request_fd, unsigned long ioc,
controls.request_fd = request_fd;
}

return ioctl(video_fd, ioc, &controls);
rc = ioctl(video_fd, ioc, &controls);
if (rc < 0) {
/* ampere-av1 Phase 2.1 diag: surface error_idx so the caller's
* error path knows which CID failed validation. error_idx >=
* count means the failure was pre-validation (e.g., bad
* request_fd). errno carries the syscall-level reason. */
const char *failed_cid_label = "<pre-validation>";
unsigned int failed_size = 0;
if (controls.error_idx < num_controls) {
failed_size = control_array[controls.error_idx].size;
(void)failed_cid_label; /* keep symbol if logger truncates */
}
request_log("v4l2_ioctl_controls: rc=%d errno=%d (%s) "
"ioc=0x%lx error_idx=%u count=%u "
"failed_cid=0x%x failed_size=%u\n",
rc, errno, strerror(errno), ioc,
controls.error_idx, num_controls,
controls.error_idx < num_controls
? control_array[controls.error_idx].id : 0,
failed_size);
}
return rc;
}

int v4l2_get_controls(int video_fd, int request_fd,
@@ -476,35 +498,12 @@ int v4l2_set_controls(int video_fd, int request_fd,
struct v4l2_ext_control *control_array,
unsigned int num_controls)
{
struct v4l2_ext_controls controls;
int rc;

memset(&controls, 0, sizeof(controls));
controls.controls = control_array;
controls.count = num_controls;
if (request_fd >= 0) {
controls.which = V4L2_CTRL_WHICH_REQUEST_VAL;
controls.request_fd = request_fd;
}

rc = ioctl(video_fd, VIDIOC_S_EXT_CTRLS, &controls);
rc = v4l2_ioctl_controls(video_fd, request_fd, VIDIOC_S_EXT_CTRLS,
control_array, num_controls);
if (rc < 0) {
/* error_idx is the index of the first failing control;
* if it equals count, the ioctl itself failed (not a
* specific control payload). Useful for triaging
* which V4L2_CID_STATELESS_* the kernel rejected. */
if (controls.error_idx < num_controls)
request_log("Unable to set control(s): %s "
"(error_idx=%u/%u failing_ctrl_id=0x%x size=%u)\n",
strerror(errno),
controls.error_idx, controls.count,
control_array[controls.error_idx].id,
control_array[controls.error_idx].size);
else
request_log("Unable to set control(s): %s "
"(error_idx=%u/%u ioctl-level)\n",
strerror(errno),
controls.error_idx, controls.count);
request_log("Unable to set control(s): %s\n", strerror(errno));
return -1;
}

-34
@@ -31,8 +31,6 @@
#include <drm_fourcc.h>
#include <linux/videodev2.h>

#include "nv12_col128.h" /* fallback V4L2_PIX_FMT_NV12_COL128 define */
#include "nv15.h" /* fallback V4L2_PIX_FMT_NV15 define */
#include "utils.h"
#include "video.h"

@@ -47,38 +45,6 @@ static struct video_format formats[] = {
.planes_count = 2,
.bpp = 16,
},
{
.description = "NV15 YUV (10-bit, rkvdec)",
.v4l2_format = V4L2_PIX_FMT_NV15,
.v4l2_buffers_count = 1,
.v4l2_mplane = true,
.drm_format = DRM_FORMAT_NV15,
.drm_modifier = DRM_FORMAT_MOD_NONE,
.planes_count = 2,
.bpp = 24,
},
{
/*
* iter40: Pi 5 / CM5 rpi-hevc-dec CAPTURE format. 8-bit NV12
* stored as 128-pixel-wide column tiles (SAND128 layout).
* Pi-specific; not in mainline drm_fourcc.h (uses NV12 + a
* BROADCOM_SAND128 modifier for DRM_PRIME). Our consumer path
* always detiles to linear NV12 in copy_surface_to_image, so
* we don't expose the SAND modifier downstream — drm_format is
* still DRM_FORMAT_NV12 and drm_modifier MOD_NONE so the
* format-is-linear gate doesn't pull us into tiled_to_planar
* (Sunxi-specific). image.c branches on v4l2_format ==
* V4L2_PIX_FMT_NV12_COL128 to invoke the dedicated detile.
*/
.description = "NV12 SAND128 (8-bit, rpi-hevc-dec)",
.v4l2_format = V4L2_PIX_FMT_NV12_COL128,
.v4l2_buffers_count = 1,
.v4l2_mplane = true,
.drm_format = DRM_FORMAT_NV12,
.drm_modifier = DRM_FORMAT_MOD_NONE,
.planes_count = 2,
.bpp = 16,
},
// Code to handle this DRM_FORMAT is __arm__ only
#ifdef __arm__
{

@@ -1,196 +0,0 @@
/*
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
*
* MIT-licensed per project. iter40 self-test for nv12_col128 detile.
*
* Build an NC12-tiled source buffer from a known linear NV12 image,
* run the detile primitive, assert output matches the original. No
* hardware needed — pure bit-layout verification of the kernel math
* (drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c
* V4L2_PIX_FMT_NV12_COL128 case + ffmpeg/Kynesim per-pixel offset).
*
* Build:
* cc -Wall -Werror -O2 -o test_nv12_col128_detile \
* tests/test_nv12_col128_detile.c src/nv12_col128.c
*
* Exit 0 = all asserts pass.
*/

#include "../src/nv12_col128.h"

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TILE_W 128

static unsigned int align_up(unsigned int v, unsigned int a)
{
return (v + a - 1) & ~(a - 1);
}

/* Pack a linear plane (width × height bytes, stride=width) into NC12
* layout: each 128-wide column held contiguously, columns at offsets
* col * col_stride * 128. col_stride is the kernel-reported bytesperline
* = ALIGN(height, 8) * 3/2. Returns the buffer + sizes. */
static uint8_t *pack_to_nc12(const uint8_t *linear,
unsigned int width, unsigned int height,
unsigned int *out_col_stride,
size_t *out_size)
{
unsigned int aligned_w = align_up(width, TILE_W);
unsigned int aligned_h = align_up(height, 8);
unsigned int col_stride = aligned_h * 3 / 2;
unsigned int num_cols = aligned_w / TILE_W;
size_t total = (size_t)col_stride * aligned_w;
uint8_t *buf;
unsigned int col, y, in_col;

buf = calloc(1, total);
assert(buf != NULL);

for (col = 0; col < num_cols; col++) {
uint8_t *col_base = buf + (size_t)col * TILE_W * col_stride;
for (y = 0; y < height; y++) {
for (in_col = 0; in_col < TILE_W; in_col++) {
unsigned int x = col * TILE_W + in_col;
if (x >= width)
break;
col_base[(size_t)y * TILE_W + in_col] =
linear[(size_t)y * width + x];
}
}
}

*out_col_stride = col_stride;
*out_size = total;
return buf;
}

static void test_detile_y(unsigned int width, unsigned int height)
{
uint8_t *linear, *tiled, *recovered;
unsigned int col_stride;
size_t tile_size, i;

linear = malloc((size_t)width * height);
assert(linear != NULL);
/* Distinctive content per pixel: y * 17 + x * 13 — avoids byte-
* aliasing patterns that could mask off-by-one bugs. */
for (unsigned int y = 0; y < height; y++)
for (unsigned int x = 0; x < width; x++)
linear[(size_t)y * width + x] = (uint8_t)(y * 17 + x * 13);

tiled = pack_to_nc12(linear, width, height, &col_stride, &tile_size);

recovered = calloc(1, (size_t)width * height);
assert(recovered != NULL);

nv12_col128_detile_y(recovered, width, tiled, col_stride, width, height);

for (i = 0; i < (size_t)width * height; i++) {
if (recovered[i] != linear[i]) {
fprintf(stderr,
"FAIL %ux%u Y: pixel %zu (x=%zu y=%zu) "
"linear=0x%02x recovered=0x%02x\n",
width, height, i,
i % width, i / width,
linear[i], recovered[i]);
free(linear); free(tiled); free(recovered);
exit(1);
}
}
printf("PASS %ux%u Y plane (%u columns, col_stride=%u, tile_size=%zu)\n",
width, height, align_up(width, TILE_W) / TILE_W,
col_stride, tile_size);

free(linear);
free(tiled);
free(recovered);
}

static void test_detile_uv(unsigned int width, unsigned int height)
{
unsigned int uv_h = height / 2;
uint8_t *linear, *tiled, *recovered;
unsigned int col_stride;
size_t tile_size, i;

linear = malloc((size_t)width * uv_h);
assert(linear != NULL);
for (unsigned int y = 0; y < uv_h; y++)
for (unsigned int x = 0; x < width; x++)
linear[(size_t)y * width + x] = (uint8_t)(y * 23 + x * 7);

tiled = pack_to_nc12(linear, width, uv_h, &col_stride, &tile_size);

recovered = calloc(1, (size_t)width * uv_h);
assert(recovered != NULL);

nv12_col128_detile_uv(recovered, width, tiled, col_stride, width, uv_h);

for (i = 0; i < (size_t)width * uv_h; i++) {
if (recovered[i] != linear[i]) {
fprintf(stderr,
"FAIL %ux%u UV: pixel %zu linear=0x%02x recovered=0x%02x\n",
width, height, i,
linear[i], recovered[i]);
free(linear); free(tiled); free(recovered);
exit(1);
}
}
printf("PASS %ux%u UV plane\n", width, height);

free(linear);
free(tiled);
free(recovered);
}

static void test_uv_offset(void)
{
/* Per the SAND COL128 layout, Y and UV are interleaved within
* EACH column (not concatenated as separate planes), so the UV
* plane base pointer is offset by 128 * ALIGN(height, 8) — the
* Y portion of column 0. NOT 128 * height * num_columns (the
* size of all Y across all columns), which was an earlier wrong
* formula caught by Phase 7 SEGV on higgs. */
unsigned int off = nv12_col128_uv_plane_offset(1280, 720);
if (off != 128u * 720) {
fprintf(stderr, "FAIL UV offset 1280×720: got %u expected %u\n",
off, 128u * 720);
exit(1);
}
printf("PASS UV offset 1280×720 = %u\n", off);

off = nv12_col128_uv_plane_offset(1366, 768);
if (off != 128u * 768) {
fprintf(stderr, "FAIL UV offset 1366×768: got %u expected %u\n",
off, 128u * 768);
exit(1);
}
printf("PASS UV offset 1366×768 (column-misaligned width)\n");
}

int main(void)
|
||||
{
|
||||
/* Phase 3 fixture sizes — all 128-aligned, 8-line-aligned. */
|
||||
test_detile_y(640, 360);
|
||||
test_detile_y(1280, 720);
|
||||
test_detile_y(1920, 1080);
|
||||
|
||||
/* Phase 5 review F4: column-misaligned width (1366 → 1408 padding). */
|
||||
test_detile_y(1366, 768);
|
||||
|
||||
/* UV plane (half-height) at each width. */
|
||||
test_detile_uv(640, 360);
|
||||
test_detile_uv(1280, 720);
|
||||
test_detile_uv(1920, 1080);
|
||||
test_detile_uv(1366, 768);
|
||||
|
||||
test_uv_offset();
|
||||
|
||||
printf("All NC12 detile asserts pass.\n");
|
||||
return 0;
|
||||
}
|
||||
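The comment in `test_uv_offset` above says that in the SAND COL128 layout Y and UV are interleaved within each 128-byte-wide column, so the UV base offset is `128 * ALIGN(height, 8)` rather than the size of all Y data across every column. A minimal sketch of that arithmetic follows. The names `sketch_align_up`, `sketch_col128_columns`, and `sketch_col128_uv_offset` are hypothetical and are not the functions under test; the 128-byte column width and the 8-line alignment are taken from the comments in the test, not from the driver source.

```c
#include <assert.h>

#define SKETCH_COL_W 128u /* column width in bytes, per the test comment */

/* Round v up to the next multiple of a (a must be non-zero). */
static unsigned int sketch_align_up(unsigned int v, unsigned int a)
{
	return (v + a - 1) / a * a;
}

/* Number of 128-byte columns needed to cover `width` bytes of a row. */
static unsigned int sketch_col128_columns(unsigned int width)
{
	return sketch_align_up(width, SKETCH_COL_W) / SKETCH_COL_W;
}

/*
 * Byte offset from the start of column 0 to its UV data: the Y portion
 * of a single column, i.e. 128 bytes per line times the aligned height.
 * Assumption: the offset does not depend on the number of columns.
 */
static unsigned int sketch_col128_uv_offset(unsigned int height)
{
	return SKETCH_COL_W * sketch_align_up(height, 8);
}
```

For a 1366-pixel-wide frame this gives 11 columns (1366 padded to 1408), and for 768 lines a UV offset of 128 * 768 = 98304 bytes, which matches the value `test_uv_offset` expects.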
@@ -1,224 +0,0 @@
|
||||
/*
|
||||
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the
|
||||
* "Software"), to deal in the Software without restriction, including
|
||||
* without limitation the rights to use, copy, modify, merge, publish,
|
||||
* distribute, sub license, and/or sell copies of the Software, and to
|
||||
* permit persons to whom the Software is furnished to do so, subject to
|
||||
* the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice (including the
|
||||
* next paragraph) shall be included in all copies or substantial portions
|
||||
* of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
|
||||
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
|
||||
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
||||
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
|
||||
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
/*
|
||||
* iter39 self-test for nv15_unpack_plane_to_p010.
|
||||
*
|
||||
* Builds NV15 plane buffers from known 10-bit pixel arrays, runs the
|
||||
* unpack, asserts P010 output matches the expected pixel<<6 values.
|
||||
* No hardware needed — pure bit layout verification per
|
||||
* Documentation/userspace-api/media/v4l/pixfmt-nv15.rst.
|
||||
*
|
||||
* Build:
|
||||
* cc -Wall -Werror -O2 -o test_nv15_unpack tests/test_nv15_unpack.c src/nv15.c
|
||||
*
|
||||
* Exit 0 = all asserts pass.
|
||||
*/
|
||||
|
||||
#include "../src/nv15.h"
|
||||
|
||||
#include <assert.h>
|
||||
#include <stdint.h>
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
/* Pack 4 10-bit pixels into 5 bytes per NV15 layout (LSB-first across
|
||||
* bits 0..39). Inverse of nv15_unpack_plane_to_p010's per-group unpack. */
|
||||
static void pack4(uint16_t a, uint16_t b, uint16_t c, uint16_t d,
|
||||
uint8_t out[5])
|
||||
{
|
||||
out[0] = (uint8_t)(a & 0xFF);
|
||||
out[1] = (uint8_t)(((a >> 8) & 0x03) | ((b & 0x3F) << 2));
|
||||
out[2] = (uint8_t)(((b >> 6) & 0x0F) | ((c & 0x0F) << 4));
|
||||
out[3] = (uint8_t)(((c >> 4) & 0x3F) | ((d & 0x03) << 6));
|
||||
out[4] = (uint8_t)((d >> 2) & 0xFF);
|
||||
}
|
||||
|
||||
#define ASSERT_EQ(actual, expected, msg) do { \
|
||||
if ((actual) != (expected)) { \
|
||||
fprintf(stderr, "FAIL %s: actual=0x%04x expected=0x%04x at %s:%d\n", \
|
||||
(msg), (unsigned)(actual), (unsigned)(expected), \
|
||||
__FILE__, __LINE__); \
|
||||
exit(1); \
|
||||
} \
|
||||
} while (0)
|
||||
|
||||
static void test_pack_unpack_roundtrip(uint16_t a, uint16_t b, uint16_t c,
|
||||
uint16_t d)
|
||||
{
|
||||
uint8_t packed[5];
|
||||
uint16_t dst[4];
|
||||
|
||||
pack4(a, b, c, d, packed);
|
||||
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
|
||||
ASSERT_EQ(dst[0], (uint16_t)(a << 6), "roundtrip a");
|
||||
ASSERT_EQ(dst[1], (uint16_t)(b << 6), "roundtrip b");
|
||||
ASSERT_EQ(dst[2], (uint16_t)(c << 6), "roundtrip c");
|
||||
ASSERT_EQ(dst[3], (uint16_t)(d << 6), "roundtrip d");
|
||||
}
|
||||
|
||||
static void test_zero(void)
|
||||
{
|
||||
uint8_t packed[5] = { 0, 0, 0, 0, 0 };
|
||||
uint16_t dst[4] = { 0xDEAD, 0xDEAD, 0xDEAD, 0xDEAD };
|
||||
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
|
||||
ASSERT_EQ(dst[0], 0, "zero[0]");
|
||||
ASSERT_EQ(dst[1], 0, "zero[1]");
|
||||
ASSERT_EQ(dst[2], 0, "zero[2]");
|
||||
ASSERT_EQ(dst[3], 0, "zero[3]");
|
||||
}
|
||||
|
||||
static void test_all_max(void)
|
||||
{
|
||||
/* All four pixels = 0x3FF (max 10-bit). Packed bits all 1 → all 0xFF. */
|
||||
uint8_t packed[5] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };
|
||||
uint16_t dst[4] = { 0, 0, 0, 0 };
|
||||
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
|
||||
ASSERT_EQ(dst[0], 0xFFC0, "max[0]");
|
||||
ASSERT_EQ(dst[1], 0xFFC0, "max[1]");
|
||||
ASSERT_EQ(dst[2], 0xFFC0, "max[2]");
|
||||
ASSERT_EQ(dst[3], 0xFFC0, "max[3]");
|
||||
}
|
||||
|
||||
static void test_known_vectors(void)
|
||||
{
|
||||
/* Position-sensitive sanity: each pixel = its index+1. */
|
||||
test_pack_unpack_roundtrip(1, 2, 3, 4);
|
||||
/* Spread patterns that exercise every byte-boundary bit. */
|
||||
test_pack_unpack_roundtrip(0x3FF, 0x000, 0x3FF, 0x000);
|
||||
test_pack_unpack_roundtrip(0x000, 0x3FF, 0x000, 0x3FF);
|
||||
test_pack_unpack_roundtrip(0x155, 0x2AA, 0x155, 0x2AA);
|
||||
test_pack_unpack_roundtrip(0x001, 0x002, 0x004, 0x008);
|
||||
test_pack_unpack_roundtrip(0x080, 0x040, 0x020, 0x010);
|
||||
test_pack_unpack_roundtrip(0x200, 0x100, 0x080, 0x040);
|
||||
test_pack_unpack_roundtrip(0x3F0, 0x0F3, 0x33C, 0x2A5);
|
||||
}
|
||||
|
||||
static void test_remainder_width(void)
|
||||
{
|
||||
/* width=1: only A is unpacked; B/C/D are never written to dst */
|
||||
{
|
||||
uint8_t packed[5];
|
||||
uint16_t dst[1] = { 0xDEAD };
|
||||
pack4(0x123, 0x000, 0x000, 0x000, packed);
|
||||
nv15_unpack_plane_to_p010(packed, dst, 1, 1, 5);
|
||||
ASSERT_EQ(dst[0], 0x123 << 6, "rem1[0]");
|
||||
}
|
||||
/* width=2 */
|
||||
{
|
||||
uint8_t packed[5];
|
||||
uint16_t dst[2] = { 0, 0 };
|
||||
pack4(0x111, 0x222, 0x000, 0x000, packed);
|
||||
nv15_unpack_plane_to_p010(packed, dst, 2, 1, 5);
|
||||
ASSERT_EQ(dst[0], 0x111 << 6, "rem2[0]");
|
||||
ASSERT_EQ(dst[1], 0x222 << 6, "rem2[1]");
|
||||
}
|
||||
/* width=3 */
|
||||
{
|
||||
uint8_t packed[5];
|
||||
uint16_t dst[3] = { 0, 0, 0 };
|
||||
pack4(0x111, 0x222, 0x333, 0x000, packed);
|
||||
nv15_unpack_plane_to_p010(packed, dst, 3, 1, 5);
|
||||
ASSERT_EQ(dst[0], 0x111 << 6, "rem3[0]");
|
||||
ASSERT_EQ(dst[1], 0x222 << 6, "rem3[1]");
|
||||
ASSERT_EQ(dst[2], 0x333 << 6, "rem3[2]");
|
||||
}
|
||||
/* width=7: one full group + 3 remainder */
|
||||
{
|
||||
uint8_t packed[10];
|
||||
uint16_t dst[7] = { 0 };
|
||||
pack4(0x100, 0x200, 0x300, 0x010, &packed[0]);
|
||||
pack4(0x011, 0x022, 0x033, 0x000, &packed[5]);
|
||||
nv15_unpack_plane_to_p010(packed, dst, 7, 1, 10);
|
||||
ASSERT_EQ(dst[0], 0x100 << 6, "rem7[0]");
|
||||
ASSERT_EQ(dst[1], 0x200 << 6, "rem7[1]");
|
||||
ASSERT_EQ(dst[2], 0x300 << 6, "rem7[2]");
|
||||
ASSERT_EQ(dst[3], 0x010 << 6, "rem7[3]");
|
||||
ASSERT_EQ(dst[4], 0x011 << 6, "rem7[4]");
|
||||
ASSERT_EQ(dst[5], 0x022 << 6, "rem7[5]");
|
||||
ASSERT_EQ(dst[6], 0x033 << 6, "rem7[6]");
|
||||
}
|
||||
/* width=8: two full groups */
|
||||
{
|
||||
uint8_t packed[10];
|
||||
uint16_t dst[8] = { 0 };
|
||||
pack4(0x101, 0x202, 0x303, 0x101, &packed[0]);
|
||||
pack4(0x202, 0x303, 0x101, 0x202, &packed[5]);
|
||||
nv15_unpack_plane_to_p010(packed, dst, 8, 1, 10);
|
||||
ASSERT_EQ(dst[7], 0x202 << 6, "w8[7]");
|
||||
}
|
||||
}
|
||||
|
||||
static void test_multi_row_stride_padding(void)
|
||||
{
|
||||
/* 4-pixel-wide, 3-row plane; stride = 8 bytes (3 bytes padding). */
|
||||
uint8_t packed[24]; /* 3 rows × 8 bytes */
|
||||
uint16_t dst[12]; /* 3 rows × 4 pixels */
|
||||
memset(packed, 0xCC, sizeof(packed)); /* padding poison */
|
||||
|
||||
pack4(0x111, 0x222, 0x333, 0x044, &packed[0 * 8]);
|
||||
pack4(0x055, 0x166, 0x177, 0x188, &packed[1 * 8]);
|
||||
pack4(0x099, 0x1AA, 0x2BB, 0x3CC, &packed[2 * 8]);
|
||||
|
||||
memset(dst, 0xAB, sizeof(dst));
|
||||
nv15_unpack_plane_to_p010(packed, dst, 4, 3, 8);
|
||||
|
||||
ASSERT_EQ(dst[0], 0x111 << 6, "row0[0]");
|
||||
ASSERT_EQ(dst[3], 0x044 << 6, "row0[3]");
|
||||
ASSERT_EQ(dst[4], 0x055 << 6, "row1[0]");
|
||||
ASSERT_EQ(dst[7], 0x188 << 6, "row1[3]");
|
||||
ASSERT_EQ(dst[8], 0x099 << 6, "row2[0]");
|
||||
ASSERT_EQ(dst[11], 0x3CC << 6, "row2[3]");
|
||||
}
|
||||
|
||||
static void test_chroma_half_height(void)
|
||||
{
|
||||
/* 4-pixel-wide × 2-row chroma (matches 4×4 luma quadrant).
|
||||
* NV15 chroma uses same packing as luma, just half-height. */
|
||||
uint8_t packed[10]; /* 2 rows × 5 bytes */
|
||||
uint16_t dst[8]; /* 2 rows × 4 pixels (UV pairs in interleaved form) */
|
||||
|
||||
pack4(0x080, 0x180, 0x280, 0x380, &packed[0]);
|
||||
pack4(0x040, 0x140, 0x240, 0x340, &packed[5]);
|
||||
|
||||
nv15_unpack_plane_to_p010(packed, dst, 4, 2, 5);
|
||||
|
||||
ASSERT_EQ(dst[0], 0x080 << 6, "chroma row0[0]");
|
||||
ASSERT_EQ(dst[3], 0x380 << 6, "chroma row0[3]");
|
||||
ASSERT_EQ(dst[4], 0x040 << 6, "chroma row1[0]");
|
||||
ASSERT_EQ(dst[7], 0x340 << 6, "chroma row1[3]");
|
||||
}
|
||||
|
||||
int main(void)
|
||||
{
|
||||
test_zero();
|
||||
test_all_max();
|
||||
test_known_vectors();
|
||||
test_remainder_width();
|
||||
test_multi_row_stride_padding();
|
||||
test_chroma_half_height();
|
||||
printf("test_nv15_unpack: all PASS\n");
|
||||
return 0;
|
||||
}
|
||||
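The `pack4` helper in this test packs four 10-bit pixels LSB-first into 5 bytes, and the asserts expect each unpacked pixel shifted left by 6 (P010 keeps the 10 significant bits at the top of a 16-bit word). As an illustration of the inverse step, here is a minimal sketch of unpacking one 5-byte group. `sketch_unpack4_to_p010` is a hypothetical name; it is derived only from the bit layout in `pack4` and is not the body of `nv15_unpack_plane_to_p010`, which also handles strides and remainder widths.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Unpack one NV15 group (4 pixels in 5 bytes, LSB-first across bits
 * 0..39) and write four P010 samples (10-bit value << 6).
 */
static void sketch_unpack4_to_p010(const uint8_t in[5], uint16_t out[4])
{
	/* Pixel A: bits 0..9   -> byte 0 plus low 2 bits of byte 1. */
	uint16_t a = (uint16_t)(in[0] | ((in[1] & 0x03) << 8));
	/* Pixel B: bits 10..19 -> high 6 bits of byte 1, low 4 of byte 2. */
	uint16_t b = (uint16_t)((in[1] >> 2) | ((in[2] & 0x0F) << 6));
	/* Pixel C: bits 20..29 -> high 4 bits of byte 2, low 6 of byte 3. */
	uint16_t c = (uint16_t)((in[2] >> 4) | ((in[3] & 0x3F) << 4));
	/* Pixel D: bits 30..39 -> high 2 bits of byte 3 plus byte 4. */
	uint16_t d = (uint16_t)((in[3] >> 6) | (in[4] << 2));

	out[0] = (uint16_t)(a << 6);
	out[1] = (uint16_t)(b << 6);
	out[2] = (uint16_t)(c << 6);
	out[3] = (uint16_t)(d << 6);
}
```

With the bytes that `pack4(1, 2, 3, 4, ...)` produces, `{ 0x01, 0x08, 0x30, 0x00, 0x01 }`, the sketch yields `0x0040, 0x0080, 0x00C0, 0x0100`, the same `pixel << 6` values the round-trip test asserts.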