Compare commits
37 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 9fa18f2312 | |||
| 9a9cfd05db | |||
| 96d70af674 | |||
| c1bb444d07 | |||
| 0791f8e612 | |||
| 989833114a | |||
| d1ba4625d2 | |||
| c332d34643 | |||
| 6173a8da8e | |||
| de27e95571 | |||
| 2146341460 | |||
| b5b3acf0f7 | |||
| 820557268b | |||
| c6f81c653f | |||
| 9bb5a5a722 | |||
| 0182307403 | |||
| 941fbc5b1b | |||
| 071b08dcc2 | |||
| 9037934b21 | |||
| 3ffa9d0d17 | |||
| f1be489c75 | |||
| bf52725ab3 | |||
| b6a65fc692 | |||
| 25b8a15e09 | |||
| cf8cd9d2be | |||
| c9f32aff49 | |||
| 6bc12fe7e4 | |||
| 63fed87bc5 | |||
| a13215de45 | |||
| f0ef69d279 | |||
| 393d02f413 | |||
| 9f7437e8ee | |||
| c9b7fcff50 | |||
| a8a91d92d6 | |||
| f61f736380 | |||
| 8746690739 | |||
| 662f8874ba |
@@ -1,75 +1,281 @@
|
|||||||
# v4l2-request libVA Backend
|
# libva-v4l2-request-fourier
|
||||||
|
|
||||||
## About
|
VA-API ICD backend for V4L2 stateless video decoders. Fourier-campaign
|
||||||
|
fork of the dormant `bootlin/libva-v4l2-request` upstream.
|
||||||
|
|
||||||
This libVA backend is designed to work with the Linux Video4Linux2
|
> **TL;DR for "I want hardware-accelerated YouTube in Firefox on my
|
||||||
Request API that is used by a number of video codecs drivers,
|
> Rockchip board":** skip to the [§ Quickstart](#quickstart) below.
|
||||||
including the Video Engine found in most Allwinner SoCs.
|
> Fresnel (RK3399) and ampere (RK3588) are validated targets; ohm
|
||||||
|
> (RK3566 PineTab2) is the chromium-fourier validation rig.
|
||||||
|
|
||||||
## Status
|
## What works
|
||||||
|
|
||||||
The v4l2-request libVA backend currently supports the following formats:
|
| SoC / host | HW-accelerated codecs | Bit-exact vs `kdirect` |
|
||||||
* MPEG2 (Simple and Main profiles)
|
|---|---|---|
|
||||||
* H264 (Baseline, Main and High profiles)
|
| RK3399 (fresnel — Pinebook Pro) | H.264, HEVC Main, VP9 Profile 0, VP8, MPEG-2 | 5/5 at iter38; preserved through iter40b |
|
||||||
* H265 (Main profile)
|
| RK3588 (ampere) | H.264 + HEVC (iter1+iter2 ampere-fourier); **mainline rkvdec / VDPU381 + VDPU383 landed February 2026** — VP9 / AV1 verification next | iter1 H.264 PASS; remaining codecs gated on mainline-driver bring-up |
|
||||||
|
| RK3568 / RK3566 (ohm — PineTab2) | H.264, MPEG-2, VP8 via hantro multi-planar | iter1-5 baseline (libva-multiplanar campaign) |
|
||||||
|
| BCM2712 (higgs — Pi 5 / CM5) | — | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved, [see § Pi 5 standoff](#the-pi-5-standoff) |
|
||||||
|
|
||||||
|
`kdirect` is the reference: `ffmpeg -hwaccel v4l2request
|
||||||
|
-hwaccel_output_format drm_prime ...` via Kwiboo's downstream ffmpeg
|
||||||
|
patches (packaged here as **`ffmpeg-v4l2-request-fourier`**, FFmpeg 8.1
|
||||||
|
tip @ Kwiboo `v4l2-request-n8.1` commit `b57fbbe`).
|
||||||
|
|
||||||
|
## Quickstart
|
||||||
|
|
||||||
|
### What you need for HW-accelerated YouTube in Firefox
|
||||||
|
|
||||||
|
The full stack, top to bottom, with the package this campaign provides
|
||||||
|
at each layer:
|
||||||
|
|
||||||
|
| Layer | Package(s) | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| Linux kernel with V4L2 stateless decoders | `linux-fresnel-fourier` (RK3399), `linux-ampere-fourier` (RK3588) | Mainline rkvdec / hantro / VDPU381 / VDPU383. ohm typically rides on a Beryllium OS host kernel. |
|
||||||
|
| `ffmpeg` with Kwiboo's v4l2-request hwaccel | `ffmpeg-v4l2-request-fourier` | Provides `-hwaccel drm -c:v hevc` (and h264/vp9) routes via libavcodec hwdevice DRM. |
|
||||||
|
| `libva` VA-API runtime + this backend ICD | `libva` (stock) + **`libva-v4l2-request-fourier`** | This repo. Auto-detects rkvdec / hantro / cedrus on probe. |
|
||||||
|
| Firefox patched to call libavcodec stateless | `firefox-fourier` | 5-patch series, ~+169 LoC over stock Firefox. Validated on fresnel: **~5 % CPU at 1080p30 H.264** (vs 64 % software). |
|
||||||
|
| (Wayland alt) Chromium patched for V4L2VDA | `chromium-fourier` + `kwin-fourier` | Validated on ohm under KDE Plasma 6.6.5 Wayland. Needs `kwin-fourier` for the dmabuf-fence latency fix. |
|
||||||
|
| (Optional) panfrost / panthor GPU stack | `vulkan-panfrost` | Wayland compositor + 3D. |
|
||||||
|
|
||||||
|
The actual VA-API path is mostly historical inside this campaign — the
|
||||||
|
**user-facing browser HW decode story rides libavcodec's
|
||||||
|
`v4l2_request` hwaccel directly**, not VAAPI-via-libva. Firefox-fourier
|
||||||
|
attaches an `AV_HWDEVICE_TYPE_DRM` context to libavcodec's generic
|
||||||
|
`h264`/`hevc`/`vp9` decoder; libavcodec then auto-binds the
|
||||||
|
`v4l2_request` hwaccel from its `hw_configs`. No `LIBVA_DRIVER_NAME`
|
||||||
|
incantation needed for browser use. libva-v4l2-request-fourier matters
|
||||||
|
for mpv, ffmpeg-as-vaapi, and other VA-API direct consumers.
|
||||||
|
|
||||||
|
### Install on Arch ALARM (fresnel / ampere / ohm)
|
||||||
|
|
||||||
|
Add the marfrit repo if you haven't already:
|
||||||
|
|
||||||
|
```ini
|
||||||
|
# /etc/pacman.conf
|
||||||
|
[marfrit]
|
||||||
|
SigLevel = Required
|
||||||
|
Server = https://packages.reauktion.de/arch/$arch
|
||||||
|
```
|
||||||
|
|
||||||
|
Import the signing key (one-time):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo pacman-key --recv-keys <KEY-ID> # see https://packages.reauktion.de
|
||||||
|
sudo pacman-key --lsign-key <KEY-ID>
|
||||||
|
sudo pacman -Sy
|
||||||
|
```
|
||||||
|
|
||||||
|
Then per host:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Fresnel — RK3399 Pinebook Pro
|
||||||
|
sudo pacman -S \
|
||||||
|
linux-fresnel-fourier linux-fresnel-fourier-headers \
|
||||||
|
ffmpeg-v4l2-request-fourier \
|
||||||
|
libva-v4l2-request-fourier \
|
||||||
|
firefox-fourier
|
||||||
|
|
||||||
|
# Ampere — RK3588
|
||||||
|
sudo pacman -S \
|
||||||
|
linux-ampere-fourier linux-ampere-fourier-headers \
|
||||||
|
ffmpeg-v4l2-request-fourier \
|
||||||
|
libva-v4l2-request-fourier \
|
||||||
|
firefox-fourier
|
||||||
|
|
||||||
|
# Ohm — RK3566 PineTab2 (chromium-fourier validated path)
|
||||||
|
sudo pacman -S \
|
||||||
|
ffmpeg-v4l2-request-fourier \
|
||||||
|
libva-v4l2-request-fourier \
|
||||||
|
kwin-fourier
|
||||||
|
# chromium-fourier currently still a local build — see § Status
|
||||||
|
```
|
||||||
|
|
||||||
|
Reboot if a new kernel landed. Then:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Smoke-test: vainfo should list HEVCMain + H264 entries
|
||||||
|
LIBVA_DRIVER_NAME=v4l2_request vainfo
|
||||||
|
|
||||||
|
# Browser launch with verbose decoder logging
|
||||||
|
MOZ_LOG="PlatformDecoderModule:5,FFmpegVideo:5" \
|
||||||
|
firefox-fourier 2>&1 | tee /tmp/fx.log
|
||||||
|
|
||||||
|
# Then open a YouTube 1080p H.264 video and grep for:
|
||||||
|
# "Choosing FFmpeg pixel format for V4L2 video decoding"
|
||||||
|
# "av_hwdevice_ctx_create(DRM, /dev/dri/renderD128) ok"
|
||||||
|
# If you DON'T see those: HW path didn't engage, fell back to software.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Status of the published vs locally-built packages
|
||||||
|
|
||||||
|
As of May 2026, the live marfrit repo at
|
||||||
|
<https://packages.reauktion.de/arch/aarch64/> has:
|
||||||
|
|
||||||
|
- ✓ `libva-v4l2-request-fourier-1:1.0.0.r361.cf8cd9d-1` (iter40b tip)
|
||||||
|
- ✓ `ffmpeg-v4l2-request-fourier-2:8.1.r123329.b57fbbe-3` (Kwiboo's
|
||||||
|
v4l2-request-n8.1 + libudev-bypass; smoke-tested on fresnel —
|
||||||
|
HEVC via `-hwaccel v4l2request` PASS)
|
||||||
|
- ✓ `firefox-fourier-150.0.1-16` (5-patch series, sandboxed RDD HW
|
||||||
|
decode validated on RK3399: ~5 % CPU at 1080p30 H.264)
|
||||||
|
- ✓ `linux-fresnel-fourier-7.0-14` + headers (RK3399)
|
||||||
|
- ✓ `linux-ampere-fourier-7.0rc3.kafr1-1` + headers (RK3588)
|
||||||
|
- ✓ `kwin-fourier-1:6.6.5-1` (Wayland dmabuf-fence fix for chromium-fourier)
|
||||||
|
- ✓ `vulkan-panfrost-1:26.0.5-1` (GPU stack)
|
||||||
|
|
||||||
|
NOT yet published but **present in `marfrit-packages/arch/` source
|
||||||
|
tree** (build + publish pending):
|
||||||
|
|
||||||
|
- ⏳ `chromium-fourier` (Chromium 147 + V4L2VDA-on-mainline patches —
|
||||||
|
blocked on Arch ALARM bumping clang 22 → 23).
|
||||||
|
- ⏳ `qt6-base-fourier` (GL_ALPHA → GL_R8 fix — needed by KDE Plasma
|
||||||
|
Wayland on the panfrost stack).
|
||||||
|
|
||||||
|
If you need those locally before they ship:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone ssh://git@git.reauktion.de:2222/marfrit/marfrit-packages.git
|
||||||
|
cd marfrit-packages/arch/<package>
|
||||||
|
makepkg -si
|
||||||
|
```
|
||||||
|
|
||||||
|
## What does NOT work, and why it's stalled
|
||||||
|
|
||||||
|
| Target | Status | Blocker |
|
||||||
|
|---|---|---|
|
||||||
|
| H264 Hi10P on RK3399 | enumerated, decode returns all-zero | RK3399 silicon doesn't implement 10-bit despite kernel advertising the profile (iter39 close, Option B applied) |
|
||||||
|
| HEVC Main10 on RK3399 | not enumerated | same as Hi10P |
|
||||||
|
| **Pi 5 / CM5 (BCM2712 / `rpi-hevc-dec`)** | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved | see "The Pi 5 standoff" below |
|
||||||
|
|
||||||
|
## What does NOT work, and why it's stalled
|
||||||
|
|
||||||
|
| Target | Status | Blocker |
|
||||||
|
|---|---|---|
|
||||||
|
| H264 Hi10P on RK3399 | enumerated, decode returns all-zero | RK3399 silicon doesn't implement 10-bit despite kernel advertising the profile (iter39 close, Option B applied) |
|
||||||
|
| HEVC Main10 on RK3399 | not enumerated | same as Hi10P |
|
||||||
|
| **Pi 5 / CM5 (BCM2712 / `rpi-hevc-dec`)** | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved | see "The Pi 5 standoff" below |
|
||||||
|
|
||||||
|
### The Pi 5 standoff
|
||||||
|
|
||||||
|
iter40 + iter40b add a third multi-device-probe slot for
|
||||||
|
`rpi-hevc-dec`, an NC12 SAND128 detile primitive, per-driver gates
|
||||||
|
around the SPS pre-seed + start-code-prepend + scaling_matrix submission,
|
||||||
|
and a (fragile, fixture-specific) SPS field override using the
|
||||||
|
GStreamer 1.28.2 H.265 parser. ICD discovery works, `vainfo` lists
|
||||||
|
`VAProfileHEVCMain`, S\_FMT / REQBUFS / STREAMON all succeed.
|
||||||
|
|
||||||
|
**Decode itself never succeeds** — every CAPTURE DQBUF returns
|
||||||
|
`V4L2_BUF_FLAG_ERROR`. Driver author John Cox confirmed strict SPS
|
||||||
|
validation is intentional ("`try_ext_ctrls returned an error (22)` is
|
||||||
|
expected as it is validating the SPS"), and VAAPI's
|
||||||
|
`VAPictureParameterBufferHEVC` simply doesn't carry the bitstream-true
|
||||||
|
scalars (`sps_max_num_reorder_pics`, `sps_max_latency_increase_plus1`,
|
||||||
|
slice-level `num_entry_point_offsets`) that the driver wants. We can't
|
||||||
|
fish the SPS out of `source_data` either, because ffmpeg-vaapi parses
|
||||||
|
the SPS itself and passes only slice NAL bytes to libva backends.
|
||||||
|
|
||||||
|
This is not a bug in our backend, in libva, in ffmpeg, or in the kernel
|
||||||
|
driver. It's an ecosystem coordination failure of long standing:
|
||||||
|
|
||||||
|
- **Kwiboo's `ffmpeg-v4l2request` hwaccel** has been in production via
|
||||||
|
LibreELEC since December 2018. Re-submitted to ffmpeg-devel as a v2
|
||||||
|
series in August 2024. Still un-merged in May 2026 — **eight years
|
||||||
|
in the upstream review queue**.
|
||||||
|
- **`libva-v4l2-request`** (this project's upstream) hasn't taken
|
||||||
|
meaningful commits since ~2021. Nobody wants to own the impedance
|
||||||
|
mismatch between VAAPI's Intel-shaped "give me raw bitstream, I'll
|
||||||
|
parse" and V4L2 stateless's kernel-shaped "give me parsed structs,
|
||||||
|
I'll just drive the HW."
|
||||||
|
- **`rpi-hevc-dec` mainline submission** is at v4 (July 2025), 17
|
||||||
|
months in review. The Pi 6.18.x downstream kernel meanwhile has
|
||||||
|
active HEVC regressions ([raspberrypi/linux#7228](https://github.com/raspberrypi/linux/issues/7228),
|
||||||
|
[#7306](https://github.com/raspberrypi/linux/issues/7306)) that
|
||||||
|
aren't being fast-tracked because "the new uAPI is coming."
|
||||||
|
- **Mozilla is implementing Pi 5 HEVC via ffmpeg's hwaccel-context
|
||||||
|
path** (bug [1969297](https://bugzilla.mozilla.org/show_bug.cgi?id=1969297)),
|
||||||
|
not via libva — explicit acknowledgement from David Turner that
|
||||||
|
libavcodec needs to retain the SPS context for the strict driver to
|
||||||
|
accept the control batch.
|
||||||
|
|
||||||
|
What end-users actually do today: run Pi OS (downstream-patched ffmpeg
|
||||||
|
+ downstream kernel) or LibreELEC (Kwiboo's patches + downstream
|
||||||
|
kernel). Anyone on a stock distro outside those two: no HW HEVC on
|
||||||
|
Pi 5.
|
||||||
|
|
||||||
|
Nobody who has authority to merge has skin in the game. Everyone with
|
||||||
|
skin in the game lacks authority. Result: 8-year stalemate, three
|
||||||
|
forks of working code, no merged upstream.
|
||||||
|
|
||||||
|
### What this means for this backend
|
||||||
|
|
||||||
|
We chose to extend `libva-v4l2-request` into Pi 5 territory because
|
||||||
|
the architecture maps cleanly onto the existing iter38 multi-device
|
||||||
|
probe. That work landed (iter40 commit `3ffa9d0`, iter40b commit
|
||||||
|
`071b08d`). It's reusable infrastructure for any future strict V4L2
|
||||||
|
stateless decoder that ffmpeg ships before libva does.
|
||||||
|
|
||||||
|
But the *user-facing* Pi 5 HEVC story will not come from this
|
||||||
|
backend. The backend was a clean architectural target inside a
|
||||||
|
coordination dead-end. The actual Pi 5 HEVC path through libva
|
||||||
|
requires either:
|
||||||
|
|
||||||
|
- a VAAPI extension exposing the SPS scalars rpi-hevc-dec validates
|
||||||
|
against (Intel-driven; no Pi-aligned principal),
|
||||||
|
- a libva-internal `VABufferType` for raw SPS/PPS NAL bytes (no
|
||||||
|
maintainer),
|
||||||
|
- ffmpeg-vaapi forwarding `num_entry_point_offsets` to backends
|
||||||
|
(small upstream patch; no champion), OR
|
||||||
|
- the political situation around Kwiboo's series unblocks (no
|
||||||
|
visible movement).
|
||||||
|
|
||||||
|
iter40 + iter40b are **landed but parked**. The fresnel + ampere
|
||||||
|
sibling paths are unaffected (5/5 fresnel + 9 profiles ampere
|
||||||
|
verified post-iter40b, no regression). Phase 8 packaging is
|
||||||
|
deliberately skipped — shipping a `.deb` whose primary advertised
|
||||||
|
target (Pi 5) doesn't actually decode would mislead users.
|
||||||
|
|
||||||
|
See `phase0_pi5_hevc.md`, `phase1_pi5_hevc.md`,
|
||||||
|
`phase5_pi5_hevc_review.md`, `phase7_pi5_hevc_close.md` for the
|
||||||
|
chapter's full empirical record.
|
||||||
|
|
||||||
## Instructions
|
## Instructions
|
||||||
|
|
||||||
In order to use this libVA backend, the `v4l2_request` driver has to
|
In order to use this backend, set the `LIBVA_DRIVER_NAME` environment
|
||||||
be specified through the `LIBVA_DRIVER_NAME` environment variable, as
|
variable:
|
||||||
such:
|
|
||||||
|
|
||||||
export LIBVA_DRIVER_NAME=v4l2_request
|
export LIBVA_DRIVER_NAME=v4l2_request
|
||||||
|
|
||||||
A media player that supports VAAPI (such as VLC) can then be used to decode a
|
Then a VA-API-capable player can decode supported codecs on a probed
|
||||||
video in a supported format:
|
device:
|
||||||
|
|
||||||
vlc path/to/video.mpg
|
vlc path/to/video.mp4
|
||||||
|
mpv --hwdec=vaapi path/to/video.mp4
|
||||||
|
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -f null -
|
||||||
|
|
||||||
Sample media files can be obtained from:
|
The backend auto-detects available decoders via the V4L2 media
|
||||||
|
topology walk; honors `LIBVA_V4L2_REQUEST_VIDEO_PATH` and
|
||||||
http://samplemedia.linaro.org/MPEG2/
|
`LIBVA_V4L2_REQUEST_MEDIA_PATH` for explicit device selection.
|
||||||
http://samplemedia.linaro.org/MPEG4/SVT/
|
|
||||||
|
|
||||||
## Technical Notes
|
## Technical Notes
|
||||||
|
|
||||||
### Surface
|
### Multi-device probe (iter38)
|
||||||
|
|
||||||
A Surface is an internal data structure never handled by the VA's user
|
A single libva session opens both `rkvdec` and `hantro-vpu` (and, on
|
||||||
containing the output of a rendering. Usualy, a bunch of surfaces are created
|
hosts where it's present, `rpi-hevc-dec`) at init. `RequestCreateConfig`
|
||||||
at the begining of decoding and they are then used alternatively. When
|
re-targets the active fd per profile via
|
||||||
created, a surface is assigned a corresponding v4l capture buffer and it is
|
`request_switch_device_for_profile()`. Pool teardown happens at
|
||||||
kept until the end of decoding. Syncing a surface waits for the v4l buffer to
|
switch time; the next `CreateContext` rebuilds against the right
|
||||||
be available and then dequeue it.
|
device.
|
||||||
|
|
||||||
Note: since a Surface is kept private from the VA's user, it can ask to
|
### Surface / Context / Picture / Image
|
||||||
directly render a Surface on screen in an X Drawable. Some kind of
|
|
||||||
implementation is available in PutSurface but this is only for development
|
|
||||||
purpose.
|
|
||||||
|
|
||||||
### Context
|
A Surface is an internal data structure containing rendering output.
|
||||||
|
A Context owns the V4L2 lifecycle (S\_FMT, CAPTURE pool, ctrl-batch
|
||||||
|
defaults) for one decode session. A Picture is one encoded input
|
||||||
|
frame's set of buffers. An Image is a Standard VA pixel-format view
|
||||||
|
on a decoded Surface — the backend detiles SAND/COL128 or unpacks
|
||||||
|
NV15 to NV12/P010 here so consumers see linear pitches.
|
||||||
|
|
||||||
A Context is a global data structure used for rendering a video of a certain
|
The real rendering is in `EndPicture`, not `RenderPicture`, because
|
||||||
format. When a context is created, input buffers are created and v4l's output
|
the kernel needs the full extended-control batch when the OUTPUT
|
||||||
(which is the compressed data input queue, since capture is the real output)
|
buffer is queued, and `RenderPicture` order is consumer-defined.
|
||||||
format is set.
|
|
||||||
|
|
||||||
### Picture
|
|
||||||
|
|
||||||
A Picture is an encoded input frame made of several buffers. A single input
|
|
||||||
can contain slice data, headers and IQ matrix. Each Picture is assigned a
|
|
||||||
request ID when created and each corresponding buffer might be turned into a
|
|
||||||
v4l buffers or extended control when rendered. Finally they are submitted to
|
|
||||||
kernel space when reaching EndPicture.
|
|
||||||
|
|
||||||
The real rendering is done in EndPicture instead of RenderPicture
|
|
||||||
because the v4l2 driver expects to have the full corresponding
|
|
||||||
extended control when a buffer is queued and we don't know in which
|
|
||||||
order the different RenderPicture will be called.
|
|
||||||
|
|
||||||
### Image
|
|
||||||
|
|
||||||
An Image is a standard data structure containing rendered frames in a usable
|
|
||||||
pixel format. Here we only use NV12 buffers which are converted from sunxi's
|
|
||||||
proprietary tiled pixel format with tiled_yuv when deriving an Image from a
|
|
||||||
Surface.
|
|
||||||
|
|||||||
@@ -195,6 +195,11 @@ extern "C" {
|
|||||||
#define DRM_FORMAT_NV24 fourcc_code('N', 'V', '2', '4') /* non-subsampled Cr:Cb plane */
|
#define DRM_FORMAT_NV24 fourcc_code('N', 'V', '2', '4') /* non-subsampled Cr:Cb plane */
|
||||||
#define DRM_FORMAT_NV42 fourcc_code('N', 'V', '4', '2') /* non-subsampled Cb:Cr plane */
|
#define DRM_FORMAT_NV42 fourcc_code('N', 'V', '4', '2') /* non-subsampled Cb:Cr plane */
|
||||||
|
|
||||||
|
/* iter39: NV15 is 4×10-bit packed in 5 bytes (Rockchip rkvdec 10-bit output). */
|
||||||
|
#ifndef DRM_FORMAT_NV15
|
||||||
|
#define DRM_FORMAT_NV15 fourcc_code('N', 'V', '1', '5') /* 2x2 subsampled Cr:Cb plane 10 bits per channel packed */
|
||||||
|
#endif
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* 3 plane YCbCr
|
* 3 plane YCbCr
|
||||||
* index 0: Y plane, [7:0] Y
|
* index 0: Y plane, [7:0] Y
|
||||||
|
|||||||
@@ -4,3 +4,14 @@ option(
|
|||||||
value : '',
|
value : '',
|
||||||
description: 'Path to sanitized Linux Kernel headers'
|
description: 'Path to sanitized Linux Kernel headers'
|
||||||
)
|
)
|
||||||
|
|
||||||
|
option(
|
||||||
|
'daedalus_v4l2',
|
||||||
|
type : 'boolean',
|
||||||
|
value : true,
|
||||||
|
description: 'Enable probe + dispatch for the out-of-tree daedalus_v4l2 ' +
|
||||||
|
'stateless decoder shim (Pi 5 / CM5 daemon-backed VP9/AV1/H264). ' +
|
||||||
|
'Default true; disable on platforms where the daedalus_v4l2 ' +
|
||||||
|
'kernel module will never be present to slim the probe array.'
|
||||||
|
)
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,298 @@
|
|||||||
|
# Phase 0 — Pi 5 / CM5 HEVC chapter
|
||||||
|
|
||||||
|
Opened 2026-05-17 evening, after the failed `libva-v4l2-stateful-fourier`
|
||||||
|
scaffold attempt. Brother-session empirical Phase 0 on higgs invalidated
|
||||||
|
the stateful premise: rpi-hevc-dec is V4L2 **stateless**, so Pi 5 HEVC
|
||||||
|
belongs in this backend, not a separate sibling.
|
||||||
|
|
||||||
|
No code in this chapter yet. This doc is the substrate. Phase 1 picks up
|
||||||
|
from the "Open questions" section.
|
||||||
|
|
||||||
|
## Substrate
|
||||||
|
|
||||||
|
### Target host
|
||||||
|
|
||||||
|
higgs — Pi CM5 module on Pi CM5 IO board. BCM2712 SoC. VPN-only, often
|
||||||
|
offline; wake via HIS skill recipe (no Fritz!Box plug — runs on power
|
||||||
|
when on). Debian-based. Sole HW video decoder is rpi-hevc-dec at
|
||||||
|
`/dev/video19` + `/dev/media1`.
|
||||||
|
|
||||||
|
### Backend baseline at chapter open
|
||||||
|
|
||||||
|
`libva-v4l2-request-fourier` master tip `cf8cd9d` (iter39 + Option B +
|
||||||
|
h265 ref-list cap fix). Multi-device probe (iter38) already opens
|
||||||
|
rkvdec + hantro slots; adding a third decoder slot for rpi-hevc-dec is
|
||||||
|
a natural extension of that architecture.
|
||||||
|
|
||||||
|
iter2 (ampere VDPU381 HEVC EXT_SPS) added the GStreamer 1.28.2 H.265
|
||||||
|
parser vendor + EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission. That
|
||||||
|
plumbing is probe-gated (`has_hevc_ext_sps_rps_rkvdec`), so it stays
|
||||||
|
dormant on hosts where the controls don't exist.
|
||||||
|
|
||||||
|
### Empirical higgs probe (brother session)
|
||||||
|
|
||||||
|
`v4l2-ctl -d /dev/video19 --list-formats-ext --list-ctrls`:
|
||||||
|
|
||||||
|
```
|
||||||
|
Stateless Codec Controls
|
||||||
|
|
||||||
|
hevc_sequence_parameter_set (compound, V4L2_CID_STATELESS_HEVC_SPS)
|
||||||
|
hevc_picture_parameter_set (compound, V4L2_CID_STATELESS_HEVC_PPS)
|
||||||
|
slice_param_array (compound dynamic-array dims=[4096])
|
||||||
|
hevc_scaling_matrix (compound)
|
||||||
|
hevc_decode_parameters (compound)
|
||||||
|
hevc_decode_mode (menu, "Frame-Based")
|
||||||
|
hevc_start_code (menu, default "No Start Code")
|
||||||
|
|
||||||
|
OUTPUT formats:
|
||||||
|
S265 V4L2_PIX_FMT_HEVC_SLICE (parsed slice payload)
|
||||||
|
|
||||||
|
CAPTURE formats:
|
||||||
|
NC12 V4L2_PIX_FMT_NV12_COL128 (8-bit SAND 128-column tiled)
|
||||||
|
NC30 V4L2_PIX_FMT_NV12_10_COL128 (10-bit SAND 128-column tiled)
|
||||||
|
```
|
||||||
|
|
||||||
|
Conclusion: this is the standard `V4L2_CID_STATELESS_HEVC_*` control set
|
||||||
|
exposed under the V4L2-request uAPI, exactly the same family our backend
|
||||||
|
already drives for rkvdec/hantro/cedrus HEVC paths. The novel parts are
|
||||||
|
two pixel formats (NC12, NC30) and one driver-id (rpi-hevc-dec).
|
||||||
|
|
||||||
|
## What carries forward unchanged
|
||||||
|
|
||||||
|
- VAAPI HEVC profile enumeration (`config.c`)
|
||||||
|
- `h265_set_controls` core path (`h265.c`) — same compound ctrl set
|
||||||
|
- Synthetic SPS pre-seed pattern (iter25/26) — already runs pre-CAPTURE-alloc
|
||||||
|
- Multi-device dispatch in `RequestCreateConfig` (iter38)
|
||||||
|
- VAAPI slice / picture / IQ matrix buffer parsing
|
||||||
|
- HEVC h264-style start-code policy (we already DON'T prepend for HEVC)
|
||||||
|
|
||||||
|
## What needs adding
|
||||||
|
|
||||||
|
| Item | Location | Sizing |
|
||||||
|
|------|----------|--------|
|
||||||
|
| `RPI_HEVC_DEC` enum in `driver_kind_t` | `request.h` | trivial |
|
||||||
|
| Multi-device probe extends to `/dev/video19` discovery | `context.c` / `request.c` init | small — mirror hantro slot |
|
||||||
|
| `V4L2_PIX_FMT_NV12_COL128` (NC12) `video_format` entry | `video.c` | small |
|
||||||
|
| `V4L2_PIX_FMT_NV12_10_COL128` (NC30) `video_format` entry | `video.c` | small |
|
||||||
|
| NC12 → NV12 detile primitive | new `nv12_col128.c` | mid — column tile layout, see kernel docs |
|
||||||
|
| NC30 → P010 detile primitive | new `nv12_col128.c` | mid — 10-bit variant of above |
|
||||||
|
| `copy_surface_to_image` branch for NC12/NC30 | `image.c` | small (mirror NV15→P010 gating) |
|
||||||
|
| Per-driver gating for any rpi-specific quirks discovered | various | per [[per-driver-kludge-gating]] |
|
||||||
|
|
||||||
|
## Open questions for Phase 1
|
||||||
|
|
||||||
|
Lock these before Phase 1 commits to a goal.
|
||||||
|
|
||||||
|
1. **EXT_SPS controls on rpi-hevc-dec?** Brother's `--list-ctrls` output
|
||||||
|
above shows the standard `V4L2_CID_STATELESS_HEVC_*` family — NOT the
|
||||||
|
`EXT_SPS_ST_RPS` / `EXT_SPS_LT_RPS` extensions that VDPU381 needs.
|
||||||
|
Verify: does `slice_param_array[4096]` accept `st_rps_bits` /
|
||||||
|
`lt_rps_bits` in the per-slice payload, or does rpi-hevc-dec parse RPS
|
||||||
|
itself from the slice header? If the latter, the iter2 EXT_SPS path
|
||||||
|
stays dormant (probe-gated already), and rpi-hevc-dec just needs the
|
||||||
|
`picture->st_rps_bits` → `slice_params->short_term_ref_pic_set_size`
|
||||||
|
plumbing that iter31 α-29 already wired. Expectation: works out of the
|
||||||
|
box. Confirm before assuming.
|
||||||
|
|
||||||
|
2. **`hevc_start_code` ctrl: "No Start Code" vs Annex B?** Brother saw
|
||||||
|
default `"No Start Code"` — matches our behavior (we don't prepend on
|
||||||
|
HEVC). But the ctrl is configurable. Verify the menu values exposed
|
||||||
|
and confirm "No Start Code" passes our raw slice-NAL payload as-is.
|
||||||
|
If it doesn't, set the ctrl explicitly per [[unconditional-codec-state]]
|
||||||
|
gating.
|
||||||
|
|
||||||
|
3. **NC12 / NC30 SAND tile layout — exact spec.** Read
|
||||||
|
`Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst` for the
|
||||||
|
COL128 variants. Confirm: column stride = 128 bytes (Y) / 128 bytes
|
||||||
|
(UV interleaved). Row count = `ALIGN(height, 16)` or `ALIGN(height, 8)`?
|
||||||
|
Get the exact alignment and tile-traversal order before writing the
|
||||||
|
detile primitive. Cite from kernel doc, NOT inferred from a hex dump.
|
||||||
|
|
||||||
|
4. **drm_prime / SAND modifier round-trip.** Does ffmpeg-vaapi (and
|
||||||
|
Firefox) accept the NC12 buffer via DRM_PRIME export carrying the
|
||||||
|
DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier, allowing
|
||||||
|
zero-copy to a SAND-aware compositor? Or is libva-side detile to a
|
||||||
|
linear NV12 buffer the only viable Firefox path? If detile is
|
||||||
|
required for the consumer, the [[rockchip-pixel-verify-path]] rule
|
||||||
|
(DMA-BUF GL preferred over cached mmap) might NOT apply since SAND
|
||||||
|
is Pi-specific and not in the wider Wayland modifier ecosystem.
|
||||||
|
|
||||||
|
5. **rpi-hevc-dec quirks on first SPS submission.** rkvdec needs
|
||||||
|
image_fmt pre-seed before CAPTURE alloc (iter25). Does rpi-hevc-dec
|
||||||
|
have an analogous "must set OUTPUT pix_fmt + SPS before CAPTURE"
|
||||||
|
ordering? Verify with strace early.
|
||||||
|
|
||||||
|
6. **higgs OS + libva versioning.** Brother probed on Debian. We package
|
||||||
|
for Arch ALARM. What's the install path on higgs — Arch / Debian /
|
||||||
|
Raspberry Pi OS? If Debian, the package needs a `debian/` tree, not
|
||||||
|
just PKGBUILD. Decide packaging target before Phase 8.
|
||||||
|
|
||||||
|
## Phase 1 goal sketch (NOT locked)
|
||||||
|
|
||||||
|
> Firefox HW HEVC playback on higgs at ≥30fps for 1080p Main, byte-exact
|
||||||
|
> libva-vs-kdirect for ≥3 reference fixtures (8-bit Main and 10-bit Main10).
|
||||||
|
|
||||||
|
Two measurable subgoals follow naturally:
|
||||||
|
- libva (this backend, NV12 image output) == kdirect (ffmpeg-v4l2request,
|
||||||
|
NV12 image output) byte-exact for the same input.
|
||||||
|
- Firefox VA-API path engages (verify via `chrome://gpu` equivalent / log
|
||||||
|
inspection — `MOZ_LOG=PlatformDecoderModule:5`).
|
||||||
|
|
||||||
|
## Phase 3 baseline plan
|
||||||
|
|
||||||
|
Before any backend code touches rpi-hevc-dec:
|
||||||
|
- `kdirect` floor: `ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime
|
||||||
|
-i bbb_720p10s_hevc.mp4 -vf hwdownload,format=nv12 -frames:v 10 ...` and
|
||||||
|
sha256 the YUV.
|
||||||
|
- `SW reference`: same ffmpeg without `-hwaccel`, sha256 the YUV.
|
||||||
|
- Both runs N=3 per [[replicate-baseline-first]].
|
||||||
|
- Capture `strace -f -e ioctl` of the kdirect run — gives the canonical
|
||||||
|
ioctl sequence rpi-hevc-dec expects.
|
||||||
|
|
||||||
|
## Phase 0 closing
|
||||||
|
|
||||||
|
This doc commits the substrate. Phase 1 starts when:
|
||||||
|
- higgs is up + reachable
|
||||||
|
- Open questions 1+2 (EXT_SPS + start_code) are answered live, in one
|
||||||
|
short probe session
|
||||||
|
- Phase 3 baseline floors are captured
|
||||||
|
|
||||||
|
No work blocks the close of iter39 / fresnel campaign — those are shipped.
|
||||||
|
|
||||||
|
## Phase 0 close addendum (2026-05-17 evening, higgs probe session)
|
||||||
|
|
||||||
|
Empirical probes on higgs answered Q1, Q2, partial Q3, full Q5, full Q6.
|
||||||
|
Q4 (DRM modifier round-trip) remains open. Phase 0 is closed; Phase 1
|
||||||
|
opens with what's below.
|
||||||
|
|
||||||
|
### Q1 — EXT_SPS controls on rpi-hevc-dec: NOT present
|
||||||
|
|
||||||
|
`v4l2-ctl -d /dev/video19 --list-ctrls` confirms ONLY the standard
|
||||||
|
`V4L2_CID_STATELESS_HEVC_*` set:
|
||||||
|
- `hevc_sequence_parameter_set` (0x00a40a90)
|
||||||
|
- `hevc_picture_parameter_set` (0x00a40a91)
|
||||||
|
- `slice_param_array` (0x00a40a92, dynamic-array dims=[4096])
|
||||||
|
- `hevc_scaling_matrix` (0x00a40a93)
|
||||||
|
- `hevc_decode_parameters` (0x00a40a94)
|
||||||
|
- `hevc_decode_mode` (0x00a40a95, menu min=1 max=1 default=1 = Frame-Based)
|
||||||
|
- `hevc_start_code` (0x00a40a96, menu min=0 max=1 default=0 = No Start Code)
|
||||||
|
- 0x00a40a97 returns EINVAL (no EXT_SPS_*_RPS controls)
|
||||||
|
|
||||||
|
ioctl trace confirms ffmpeg's `VIDIOC_QUERY_EXT_CTRL` for `0xa97` returns
|
||||||
|
EINVAL — same probe pattern our backend uses for
|
||||||
|
`has_hevc_ext_sps_rps_rkvdec`. **The iter2 path stays dormant; the
|
||||||
|
iter31 α-29 `slice_params->short_term_ref_pic_set_size` plumbing is the
|
||||||
|
correct one for rpi-hevc-dec.**
|
||||||
|
|
||||||
|
### Q2 — hevc_start_code: default 0 (No Start Code), values {0, 1}
|
||||||
|
|
||||||
|
Default 0 matches our backend's "don't prepend HEVC start code" stance.
|
||||||
|
Confirm in Phase 1: rpi-hevc-dec accepts our raw NAL slice payload as-is.
|
||||||
|
|
||||||
|
### Q3 — NC12 / NC30 SAND tile layout: PARTIAL
|
||||||
|
|
||||||
|
CAPTURE S_FMT result for 1280×720 NC12:
|
||||||
|
- `sizeimage=1382400` = `1280 × 720 × 1.5` (linear NV12 byte count)
|
||||||
|
- `bytesperline=1080` (NOT 1280)
|
||||||
|
|
||||||
|
The bytesperline=1080 for a 1280-wide CAPTURE buffer is suspect — likely
|
||||||
|
encodes SAND column count rather than linear stride. Read
|
||||||
|
`drivers/staging/media/rpivid/` (or wherever NC12_COL128 lives in 6.12)
|
||||||
|
kernel source + `drm_fourcc.h` / `nv12_col128.rst` (if it exists) for
|
||||||
|
exact tile layout BEFORE writing the detile primitive. Do NOT infer
|
||||||
|
layout from this single observation.
|
||||||
|
|
||||||
|
### Q4 — DRM modifier round-trip: BLOCKED on hwdownload
|
||||||
|
|
||||||
|
ffmpeg `-hwaccel drm -hwaccel_output_format drm_prime -vf
|
||||||
|
hwmap=mode=read,format=nv12` returns `Failed to map frame: -38`
|
||||||
|
(`Function not implemented`). hwdownload cannot consume the SAND
|
||||||
|
modifier directly.
|
||||||
|
|
||||||
|
ffmpeg's path that DOES work: `-hwaccel drm -c:v hevc` WITHOUT
|
||||||
|
`-hwaccel_output_format drm_prime` lets ffmpeg's internal pipeline pull
|
||||||
|
back, detile (presumably via a Pi-specific helper or libdrm transform),
|
||||||
|
and present NV12 to the next filter. Bit-exact vs SW for the test
|
||||||
|
fixture (1280×720 Main 8-bit) — confirms HW engagement.
|
||||||
|
|
||||||
|
Phase 1 / Phase 4 will need to decide:
|
||||||
|
- Detile in the backend (CPU SIMD), exposing NV12 via VAImage; or
|
||||||
|
- Pass-through DRM_PRIME with SAND modifier and let the consumer
|
||||||
|
(compositor / Firefox) detile. Firefox almost certainly can't, so
|
||||||
|
CPU detile is the safe bet.
|
||||||
|
|
||||||
|
### Q5 — rpi-hevc-dec submission ordering: empirically locked
|
||||||
|
|
||||||
|
`strace -e ioctl` of the kdirect run shows:
|
||||||
|
1. `MEDIA_IOC_DEVICE_INFO` + `MEDIA_IOC_G_TOPOLOGY` (per media node)
|
||||||
|
2. `VIDIOC_QUERYCAP` per video node — `driver="rpi-hevc-dec"` identifies
|
||||||
|
the right one
|
||||||
|
3. `VIDIOC_ENUM_FMT` OUTPUT → S265 only
|
||||||
|
4. `VIDIOC_S_FMT` OUTPUT (HEVC_SLICE, placeholder dims)
|
||||||
|
5. `VIDIOC_REQBUFS` OUTPUT (DMABUF, count=N) — count=6 in kdirect
|
||||||
|
6. `VIDIOC_S_FMT` CAPTURE (NC12, actual dims from SPS parse)
|
||||||
|
7. `VIDIOC_CREATE_BUFS` CAPTURE (DMABUF, count=16)
|
||||||
|
8. `VIDIOC_STREAMON` both queues
|
||||||
|
9. `VIDIOC_QUERY_EXT_CTRL` enumeration
|
||||||
|
10. `VIDIOC_S_EXT_CTRLS` (decode_mode + start_code) — global ctrls
|
||||||
|
11. Per frame: `VIDIOC_S_EXT_CTRLS` (SPS+PPS+decode_params+slice_array,
|
||||||
|
class=0xf010000 = per-request) + `VIDIOC_QBUF` CAPTURE + `VIDIOC_QBUF`
|
||||||
|
OUTPUT (with `V4L2_BUF_FLAG_IN_REQUEST | V4L2_BUF_FLAG_REQUEST_FD`) +
|
||||||
|
`VIDIOC_DQBUF` OUTPUT + `VIDIOC_DQBUF` CAPTURE
|
||||||
|
|
||||||
|
**Two structural notes for the backend:**
|
||||||
|
- OUTPUT + CAPTURE both use `V4L2_MEMORY_DMABUF` in kdirect. Our backend
|
||||||
|
currently uses MMAP for CAPTURE on rkvdec/hantro. For Pi 5 we should
|
||||||
|
either follow kdirect (DMABUF, allows zero-copy DRM_PRIME export) or
|
||||||
|
use MMAP and CPU-detile. Phase 4 design decision.
|
||||||
|
- The order `S_FMT OUTPUT → REQBUFS OUTPUT → S_FMT CAPTURE → CREATE_BUFS
|
||||||
|
CAPTURE → STREAMON` differs from our iter25 rkvdec pre-seed pattern
|
||||||
|
(where SPS via S_EXT_CTRLS must come BEFORE CAPTURE alloc to resolve
|
||||||
|
the image_fmt). rpi-hevc-dec apparently DOESN'T need that pre-seed —
|
||||||
|
CAPTURE S_FMT just takes the explicit NC12 + caller's dims. Confirm
|
||||||
|
in Phase 1 by trying our existing iter25 pre-seed flow against it.
|
||||||
|
|
||||||
|
### Q6 — packaging: Debian 13 trixie, NOT Arch
|
||||||
|
|
||||||
|
higgs runs Debian 13 trixie (`PRETTY_NAME="Debian GNU/Linux 13 (trixie)"`),
|
||||||
|
not Arch ALARM. Phase 8 (per the dev-process Phase 8 packaging rule) for
|
||||||
|
the Pi 5 chapter needs a `debian/` packaging tree, not just a PKGBUILD.
|
||||||
|
|
||||||
|
Decide in Phase 1 whether to:
|
||||||
|
- Add Debian packaging to `marfrit-packages` as a second target, OR
|
||||||
|
- Use distrobox/podman with an Arch ALARM container on higgs for
|
||||||
|
install (test-only, not production), OR
|
||||||
|
- Pi 5 chapter ships a Debian source pkg via gitea / a personal Debian
|
||||||
|
repo.
|
||||||
|
|
||||||
|
### Other new findings from the probe session
|
||||||
|
|
||||||
|
- **ffmpeg 7.1.3 from Debian 13 is built with `--enable-v4l2-request`**
|
||||||
|
— the kdirect path exists. Invocation is `ffmpeg -hwaccel drm -c:v
|
||||||
|
hevc` (not just `-hwaccel drm`; the explicit codec flag matters for
|
||||||
|
the negotiation). Engagement log line is
|
||||||
|
`Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19;
|
||||||
|
buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8`. Per
|
||||||
|
[[hw-decode-engagement-check]], grep for that line to confirm HW path
|
||||||
|
engaged.
|
||||||
|
- **No libva ICD installed on higgs** — only `armada-drm_dri.so` ships,
|
||||||
|
which doesn't apply. We'd be the first VA-API HW path for HEVC on Pi
|
||||||
|
5 once installed.
|
||||||
|
- **mpv is apt-installable** (`mpv 0.40.0-3+deb13u1`) — useful as a
|
||||||
|
pixel-readback verifier once the backend works (`mpv --vo=image` or
|
||||||
|
`--vo=drm`).
|
||||||
|
- **Firefox 145.0.1 + rpi-firefox-mods 20251016 installed** (firefox-esr
|
||||||
|
package status was `rc` = removed but config remains). The mods
|
||||||
|
package likely contains VA-API plumbing prefs.
|
||||||
|
|
||||||
|
### What changes for Phase 1
|
||||||
|
|
||||||
|
- Goal is now phrasable: HEVC bit-exact libva-vs-kdirect on higgs for
|
||||||
|
the 1280×720 Main 8-bit test fixture (same generator as
|
||||||
|
`/tmp/bbb_main.mp4` here). Kdirect engagement signal is the
|
||||||
|
`Hwaccel V4L2 HEVC stateless V4` log line.
|
||||||
|
- Most backend code reuses existing rkvdec/hantro HEVC path: ctrls,
|
||||||
|
per-frame submission, request_fd, multi-device probe pattern.
|
||||||
|
- New code: NC12 video_format entry + detile primitive (sibling to
|
||||||
|
`nv15_unpack_plane_to_p010`) + RPI_HEVC_DEC driver_kind.
|
||||||
|
- Packaging target = Debian, not Arch.
|
||||||
@@ -0,0 +1,230 @@
|
|||||||
|
# Phase 1+2+3+4 — Pi 5 HEVC chapter (iter40)
|
||||||
|
|
||||||
|
Per [[feedback_dev_process]], Phase 1 (goal), Phase 2 (situation analysis),
|
||||||
|
Phase 3 (baselines), Phase 4 (plan) for adding rpi-hevc-dec as a third
|
||||||
|
multi-device-probe slot in `libva-v4l2-request-fourier`. Phase 0 substrate
|
||||||
|
+ open-question answers live at `phase0_pi5_hevc.md`.
|
||||||
|
|
||||||
|
## Phase 1 — Goal
|
||||||
|
|
||||||
|
> **libva-v4l2-request-fourier on higgs** decodes HEVC Main 8-bit input
|
||||||
|
> producing NV12 output **bit-exact vs kdirect** for three reference
|
||||||
|
> fixtures (640×360, 1280×720, 1920×1080 — Main profile, libx265
|
||||||
|
> ultrafast). HW path engagement verified via the kernel-driver lsof
|
||||||
|
> signal (`/dev/video19` open) AND ffmpeg-vaapi engagement signal
|
||||||
|
> (`Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19`).
|
||||||
|
|
||||||
|
Measurable:
|
||||||
|
|
||||||
|
| Criterion | Metric |
|
||||||
|
|---|---|
|
||||||
|
| C1 — vainfo enumeration | `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain : VAEntrypointVLD` |
|
||||||
|
| C2 — bit-exact decode | sha256 of libva NV12 output == sha256 of kdirect NV12 output, per fixture, N=1 |
|
||||||
|
| C3 — HW engagement | `lsof` shows `/dev/video19` open by ffmpeg-vaapi during libva run |
|
||||||
|
| C4 — Stability under N=3 | C2 holds at N=3 repeated runs (deterministic) |
|
||||||
|
| C5 — Sibling baseline preserved | fresnel iter38 5/5 still PASS post-iter40 (no regression to rkvdec/hantro path) |
|
||||||
|
|
||||||
|
Out of scope this iter: Main10 (10-bit / NC30), VP9, AV1, Firefox VA-API
|
||||||
|
engagement testing, performance benchmarks. All later chapters.
|
||||||
|
|
||||||
|
## Phase 2 — Situation Analysis
|
||||||
|
|
||||||
|
### Backend architecture already in place
|
||||||
|
|
||||||
|
- **Multi-device probe (iter38)**: at `VA_DRIVER_INIT` opens both
|
||||||
|
`rkvdec` + `hantro-vpu` via `find_decoder_device_by_driver(name)`.
|
||||||
|
Stores per-driver fds (`video_fd_{rkvdec,hantro}`,
|
||||||
|
`media_fd_{rkvdec,hantro}`). `RequestCreateConfig` retargets the
|
||||||
|
"active" `driver_data->{video,media}_fd` per profile via
|
||||||
|
`request_switch_device_for_profile()` (request.c:426-478).
|
||||||
|
- **Per-driver feature gating**: `request_data->has_hevc_ext_sps_rps_{rkvdec,hantro}`
|
||||||
|
pair, with `h265_set_controls` consulting the per-fd flag. Established
|
||||||
|
by iter2 / Phase 5 review (request.h:99-100). This is the canonical
|
||||||
|
per-driver gating shape for iter40.
|
||||||
|
- **HEVC ctrl population**: `h265_set_controls` populates the standard
|
||||||
|
`V4L2_CID_STATELESS_HEVC_*` set (h265.c). Probe-gates EXT_SPS_*_RPS
|
||||||
|
via the iter2 path — naturally dormant for rpi-hevc-dec since the
|
||||||
|
controls don't exist.
|
||||||
|
- **Synthetic SPS pre-seed (iter25/26)**: needed for rkvdec to resolve
|
||||||
|
`image_fmt` before CAPTURE alloc. Phase 0 strace shows rpi-hevc-dec
|
||||||
|
does NOT need this — it accepts NC12 + explicit dims on `S_FMT
|
||||||
|
CAPTURE` directly. The pre-seed code path stays in place for rkvdec;
|
||||||
|
rpi-hevc-dec just doesn't trigger it (gate on driver_kind).
|
||||||
|
- **CAPTURE detile primitive**: `nv15_unpack_plane_to_p010()` (nv15.c)
|
||||||
|
is the template — backend already CPU-detiles when a Pi-or-Rockchip-
|
||||||
|
specific CAPTURE format meets a linear consumer (VAImage NV12 / P010).
|
||||||
|
- **Single-plane (S) vs multi-plane (M) handling**: hantro uses MPLANE,
|
||||||
|
rkvdec uses both depending on codec. rpi-hevc-dec exposes MPLANE for
|
||||||
|
BOTH OUTPUT (HEVC_SLICE) and CAPTURE (NC12) per the strace. iter38
|
||||||
|
already supports MPLANE handling for hantro; rpi reuses that.
|
||||||
|
|
||||||
|
### Surface area to touch (audit)
|
||||||
|
|
||||||
|
| File | What changes | Size |
|
||||||
|
|------|--------------|------|
|
||||||
|
| `src/request.h` | Add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`, `has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout) | ~10 lines |
|
||||||
|
| `src/request.c` | (a) Extend init -1 block to cover new fds. (b) Recognize `rpi-hevc-dec` as a 3rd primary/alt driver string in the probe loop. (c) Extend `request_device_kind_for_profile` so HEVC→'p' when rpi-hevc-dec is present, else 'r'. (d) Extend `request_switch_device_for_profile` 'p' branch. (e) Probe HEVC ext_sps on the new fd (will be false, mirrors hantro entry). | ~80 lines |
|
||||||
|
| `src/video.c` | Add `V4L2_PIX_FMT_NV12_COL128` (NC12) `video_format` entry: 4:2:0, planes=1, alignment via dedicated bytesperline/sizeimage formula. NOT marked linear. | ~20 lines |
|
||||||
|
| `src/nv12_col128.c` (NEW) | `nv12_col128_detile_to_nv12()`: Y plane + UV plane detile primitive. Adapted from ffmpeg/Kynesim `av_rpi_sand_to_planar_y8` core math. Header doc traces back to videodev2.h docstring + raspberrypi/linux `hevc_dec/hevc_d_video.c` size formula. | ~80 lines + 30-line header |
|
||||||
|
| `src/image.c` | Add NC12 → NV12 branch in `copy_surface_to_image`, gated on `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` (sibling to existing NV15→P010 branch). | ~25 lines |
|
||||||
|
| `src/meson.build` + `src/Makefile.am` | List `nv12_col128.c`/`.h` in sources | 2 lines |
|
||||||
|
|
||||||
|
Total estimated diff: ~250 LoC backend + ~100 LoC standalone primitive.
|
||||||
|
Roughly half the surface area of iter38; smaller than iter2.
|
||||||
|
|
||||||
|
### What does NOT change
|
||||||
|
|
||||||
|
- iter25/26 SPS pre-seed: stays on rkvdec path only (gated by
|
||||||
|
driver_kind check that's already implicit in the rkvdec fd routing).
|
||||||
|
- iter2 EXT_SPS plumbing: probe-gated off on rpi-hevc-dec; vendored
|
||||||
|
GStreamer parser unused. Confirmed via the EINVAL on ctrl 0xa97.
|
||||||
|
- iter31 α-29 slice_params st_rps_bits: APPLIES to rpi-hevc-dec
|
||||||
|
unchanged. Same plumbing.
|
||||||
|
- iter33 VP8 hantro start-code prepend: not relevant (rpi-hevc-dec is
|
||||||
|
HEVC-only; VP8 still goes through hantro on RK).
|
||||||
|
- iter38 single-libva-session multi-codec semantics: extends from 5
|
||||||
|
codecs to 5+1 (HEVC reroutes on Pi).
|
||||||
|
|
||||||
|
### NC12 / SAND128 tile geometry — locked contract
|
||||||
|
|
||||||
|
From kernel driver `drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c`
|
||||||
|
(via [[github raspberrypi/linux rpi-6.12.y]]):
|
||||||
|
|
||||||
|
```c
|
||||||
|
case V4L2_PIX_FMT_NV12_COL128:
|
||||||
|
width = ALIGN(width, 128); /* Width rounds up to columns */
|
||||||
|
height = ALIGN(height, 8);
|
||||||
|
bytesperline = constrain2x(bytesperline, height * 3 / 2);
|
||||||
|
sizeimage = bytesperline * width;
|
||||||
|
break;
|
||||||
|
```
|
||||||
|
|
||||||
|
For 1280×720:
|
||||||
|
- width = 1280 (already 128-aligned)
|
||||||
|
- height = 720 (already 8-aligned)
|
||||||
|
- bytesperline = 720 × 3/2 = **1080** (matches Phase 0 strace observation)
|
||||||
|
- sizeimage = 1080 × 1280 = **1,382,400** (matches strace; equals linear NV12 byte count coincidentally)
|
||||||
|
|
||||||
|
**Geometry interpretation** (cross-verified against ffmpeg/Kynesim
|
||||||
|
`rpi_sand_fn_pw.h` `av_rpi_sand_to_planar_y8`):
|
||||||
|
- Image is divided into `(width + 127) / 128` columns; each column is
|
||||||
|
**128 px wide × height px tall**.
|
||||||
|
- Within a column: `128 × height` bytes of Y data, immediately followed
|
||||||
|
by `128 × height/2` bytes of interleaved CbCr (so 128 × `bytesperline`
|
||||||
|
bytes per column, where `bytesperline` is the column stride).
|
||||||
|
- Across columns: column N starts at offset `N × stride1 × stride2`
|
||||||
|
where `stride1 = 128` (column width) and `stride2 = bytesperline`.
|
||||||
|
- **Pixel (x, y) byte offset = `(x & 127) + y × 128 + (x & ~127) × bytesperline`**
|
||||||
|
for Y; same formula with `y/2` for UV plane (which begins at offset
|
||||||
|
`128 × height × num_columns` from the start).
|
||||||
|
|
||||||
|
Reference for the detile loop: `av_rpi_sand_to_planar_y8` (Kynesim
|
||||||
|
ffmpeg, `libavutil/rpi_sand_fn_pw.h` with PW=1). Our primitive copies
|
||||||
|
the single-stripe fast-path math; we don't import NEON ASM (CPU
|
||||||
|
detile is the safe path for Phase 1; SIMD a Phase 2 perf bump if needed).
|
||||||
|
|
||||||
|
## Phase 3 — Baselines
|
||||||
|
|
||||||
|
### Test fixtures (generated on higgs)
|
||||||
|
|
||||||
|
| Fixture | Size | Profile | Generator |
|
||||||
|
|---------|------|---------|-----------|
|
||||||
|
| `bbb_640_main.mp4` | 640×360 | Main 8-bit | `ffmpeg -f lavfi -i testsrc=duration=2 -pix_fmt yuv420p -c:v libx265 -preset ultrafast -profile:v main` |
|
||||||
|
| `bbb_1280_main.mp4` | 1280×720 | Main 8-bit | same |
|
||||||
|
| `bbb_1920_main.mp4` | 1920×1080 | Main 8-bit | same |
|
||||||
|
|
||||||
|
### Captured 2026-05-17 evening on higgs
|
||||||
|
|
||||||
|
For each fixture, N=3 reps. Both SW (no hwaccel) and kdirect
|
||||||
|
(`ffmpeg -hwaccel drm -c:v hevc`) → `-frames:v 10 -f rawvideo -pix_fmt nv12`,
|
||||||
|
sha256 of first 16 chars:
|
||||||
|
|
||||||
|
```
|
||||||
|
bbb_640_main SW={9a81038065e9b7cd} HW={9a81038065e9b7cd} → BIT-EXACT × N=3
|
||||||
|
bbb_1280_main SW={d3bb055655d6f195} HW={d3bb055655d6f195} → BIT-EXACT × N=3
|
||||||
|
bbb_1920_main SW={0bc2bd6f693db039} HW={0bc2bd6f693db039} → BIT-EXACT × N=3
|
||||||
|
```
|
||||||
|
|
||||||
|
HW engagement signal (per-run): `Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19; buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8`
|
||||||
|
|
||||||
|
This is the kdirect baseline. Phase 7 verification will compare libva
|
||||||
|
output against these SHAs.
|
||||||
|
|
||||||
|
### Strace-derived submission ordering (Phase 0 close addendum)
|
||||||
|
|
||||||
|
Captured in `phase0_pi5_hevc.md`. Briefly: standard V4L2-request
|
||||||
|
stateless flow, both queues DMABUF, no SPS pre-seed dance needed
|
||||||
|
(rpi-hevc-dec accepts NC12 + dims directly on CAPTURE S_FMT).
|
||||||
|
|
||||||
|
## Phase 4 — Plan
|
||||||
|
|
||||||
|
### Implementation steps (sequenced)
|
||||||
|
|
||||||
|
1. **`request.h`**: extend `request_data` with the new fd pair + ext_sps
|
||||||
|
flag, mirroring iter38/iter2 layout. (no behavior change yet)
|
||||||
|
2. **`request.c`**:
|
||||||
|
- `find_decoder_device_by_driver("rpi-hevc-dec", ...)` accepts new
|
||||||
|
driver string.
|
||||||
|
- Init -1 block extends to new fds.
|
||||||
|
- Probe loop: if primary is `rkvdec` or `hantro-vpu`, also probe
|
||||||
|
`rpi-hevc-dec` (third slot). On Pi 5 there's no `rkvdec` or
|
||||||
|
`hantro-vpu`, so primary becomes `rpi-hevc-dec` and the alt-probes
|
||||||
|
for the other two return absent (their fds stay -1).
|
||||||
|
- `request_device_kind_for_profile`: when profile is `VAProfileHEVCMain`,
|
||||||
|
prefer `'p'` (rpi-hevc-dec) IF `video_fd_rpi_hevc_dec >= 0`, else
|
||||||
|
fall through to `'r'` (rkvdec). All other profiles stay routed as
|
||||||
|
today.
|
||||||
|
- `request_switch_device_for_profile`: add `'p'` branch.
|
||||||
|
- ext_sps probe runs on the new fd; result stored in
|
||||||
|
`has_hevc_ext_sps_rps_rpi_hevc_dec`. Will be false (controls absent).
|
||||||
|
3. **`video.c`**: add NC12 video_format entry. Mark it MPLANE-only (per
|
||||||
|
Phase 0 strace). bytesperline/sizeimage formula encoded per kernel
|
||||||
|
driver math.
|
||||||
|
4. **`src/nv12_col128.c` + `.h`** (NEW): single-file primitive,
|
||||||
|
`nv12_col128_detile_to_nv12(dst_y, dst_uv, src_y, src_uv, width,
|
||||||
|
height, src_stride2)`. CPU per-column row-memcpy loop; not NEON
|
||||||
|
for Phase 1 (correctness first). Self-test in `tests/test_nv12_col128_detile.c`.
|
||||||
|
5. **`image.c`**: branch in `copy_surface_to_image`. Gate:
|
||||||
|
`image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`.
|
||||||
|
Calls the primitive. Existing NV12-linear path stays.
|
||||||
|
6. **`meson.build` + `Makefile.am`**: source list updates.
|
||||||
|
7. **Build clean on higgs** — first build target IS higgs (since iter40
|
||||||
|
only matters on Pi). Cross-build for ampere/fresnel is unaffected
|
||||||
|
because they don't have rpi-hevc-dec — the new fd stays -1 and the
|
||||||
|
per-driver routing falls through to existing rkvdec/hantro paths.
|
||||||
|
|
||||||
|
### Verification gates (Phase 7 acceptance)
|
||||||
|
|
||||||
|
- Build cleanly on higgs (Debian 13 trixie, libva-dev 2.22.0-3,
|
||||||
|
libdrm-dev 2.4.131).
|
||||||
|
- Local-install the resulting `.so` to `/usr/lib/aarch64-linux-gnu/dri/v4l2_request_drv_video.so`.
|
||||||
|
- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`.
|
||||||
|
- For each Phase 3 fixture: libva output SHA == kdirect SHA (the Phase 3
|
||||||
|
recorded value).
|
||||||
|
- `lsof` during libva decode shows `/dev/video19` open.
|
||||||
|
- Sibling regression check: fresnel `phase7_iter39_test_rig` equivalent
|
||||||
|
still 5/5 PASS (no regression to existing routing).
|
||||||
|
|
||||||
|
### Risks + mitigations
|
||||||
|
|
||||||
|
| Risk | Mitigation |
|
||||||
|
|------|-----------|
|
||||||
|
| NC12 detile math wrong → libva ≠ kdirect | Tight unit test in `tests/test_nv12_col128_detile.c` with hand-crafted NC12 bytes + known linear output, before integration. |
|
||||||
|
| `request_switch_device_for_profile` falls through wrong path on systems with BOTH rkvdec AND rpi-hevc-dec | Prefer rpi-hevc-dec for HEVC when present. Explicit comment in switch. Test on fresnel = no rpi → falls to 'r'; test on higgs = no rkvdec → falls to 'p'. |
|
||||||
|
| Debian build env differs from Arch — see [[feedback_package_build_flags_unmask_bugs]] | Build with explicit `-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-strong` flags to match Debian dpkg-buildflags. |
|
||||||
|
| Synthetic SPS pre-seed accidentally fires on rpi-hevc-dec | Gate on `driver_kind != 'p'` in the pre-seed call site. Verify via strace: pre-seed ioctl pattern absent. |
|
||||||
|
| iter2 EXT_SPS path accidentally engages on rpi | Already probe-gated; `has_hevc_ext_sps_rps_rpi_hevc_dec` = false naturally. |
|
||||||
|
|
||||||
|
### Phase 5 review explicitly requested
|
||||||
|
|
||||||
|
Per CLAUDE.md global "Reviews are never skippable" + [[feedback_review_empirical_over_theoretical]]:
|
||||||
|
this plan goes to a sonnet Plan-agent review. Specific review focus:
|
||||||
|
- Routing correctness when 0/1/2/3 of the three drivers are present.
|
||||||
|
- NC12 geometry: did we copy ffmpeg's per-row memcpy math correctly?
|
||||||
|
Did we miss UV stride considerations?
|
||||||
|
- `image.c` gate predicate — does it exclude any legitimate NV12-linear
|
||||||
|
case on Pi? (No: rpi only exposes NC12/NC30 CAPTURE, no plain NV12.)
|
||||||
|
- Cross-device regression scope (fresnel + ampere paths untouched?).
|
||||||
|
|
||||||
|
Empty-result review IS a green light; "we should have skipped it" is the
|
||||||
|
prohibited move.
|
||||||
@@ -0,0 +1,194 @@
|
|||||||
|
# Phase 5 review — iter40 plan (sonnet review + amendments)
|
||||||
|
|
||||||
|
Reviewer verdict: **yellow** — plan substantively sound, 3 concrete blockers
|
||||||
|
+ 1 fixture gap + 1 verification-only note. All findings verified empirically
|
||||||
|
against current source (per [[feedback_review_empirical_over_theoretical]])
|
||||||
|
BEFORE accepting into the amended plan.
|
||||||
|
|
||||||
|
## Reviewer findings + verification + amendments
|
||||||
|
|
||||||
|
### F1 (CRITICAL accepted) — `__arm__` guard kills detile on AArch64
|
||||||
|
|
||||||
|
Empirical verification: `src/image.c` lines 239 + 268 wrap the entire
|
||||||
|
per-format detile dispatch (incl. `nv15_unpack_plane_to_p010`) in
|
||||||
|
`#ifdef __arm__`. Pi 5 / fresnel / ampere are all AArch64 → guard never
|
||||||
|
fires → both NC12 detile (proposed) AND existing NV15→P010 unpack
|
||||||
|
(iter39) are silently dead code on aarch64. iter39 5/5 PASS on fresnel
|
||||||
|
was bit-exact for 8-bit codecs only; the 10-bit detile path was never
|
||||||
|
exercised, so the dead-code didn't manifest as a failure.
|
||||||
|
|
||||||
|
**Amendment:** Phase 6 step 5 first sub-action — change guard at lines
|
||||||
|
239 + 268 from `#ifdef __arm__` to `#if defined(__arm__) || defined(__aarch64__)`.
|
||||||
|
This re-enables the existing NV15→P010 detile AND lets the new NC12
|
||||||
|
detile branch execute. No semantic change on x86 (no detile primitives
|
||||||
|
compiled there). Add explicit comment crediting Phase 5 review + this
|
||||||
|
finding.
|
||||||
|
|
||||||
|
### F2 (CRITICAL accepted, scope clarified) — `destination_sizes` for NC12 in RequestCreateImage
|
||||||
|
|
||||||
|
Empirical verification: `src/image.c` lines 90-115 already recompute
|
||||||
|
`destination_bytesperlines[0]` + `destination_sizes[0]` for `P010`
|
||||||
|
(line 90: `destination_bytesperlines[0] = width * 2`). The fall-through
|
||||||
|
"NV12" branch (line 108) uses V4L2-reported stride directly, which for
|
||||||
|
NC12 source is the column-stride 1080, not the linear Y stride 1280.
|
||||||
|
That breaks the VAImage's `pitches[0]` consumers expect.
|
||||||
|
|
||||||
|
`context.c` lines 379-383 — `destination_sizes[0] = destination_bytesperlines[0] * format_height` — IS used at cap_pool init time to size the
|
||||||
|
CAPTURE buffer's MMAP region accounting in `driver_data->fmt_sizes[]`.
|
||||||
|
For NC12: 1080 × 720 = 777600 vs actual `sizeimage` 1382400. cap_pool
|
||||||
|
allocates the actual `sizeimage` via REQBUFS, so the underlying buffer
|
||||||
|
is correctly sized; `fmt_sizes[]` is just a back-cache for later access
|
||||||
|
patterns that don't go through the kernel-reported value.
|
||||||
|
|
||||||
|
**Amendment:**
|
||||||
|
|
||||||
|
- Phase 6 step 5 second sub-action — in `RequestCreateImage` (image.c
|
||||||
|
~line 107, the "else" / NV12 branch), add detection: if the source
|
||||||
|
CAPTURE format is `V4L2_PIX_FMT_NV12_COL128` AND the requested image
|
||||||
|
format is `VA_FOURCC_NV12`, override `destination_bytesperlines[0] =
|
||||||
|
width` (linear NV12 Y stride). `destination_sizes[0]` then computes
|
||||||
|
to `width × format_height` (correct linear Y plane size). Existing
|
||||||
|
NV12-source linear path unchanged.
|
||||||
|
- Phase 6 step 3 video.c — set `v4l2_buffers_count = 1` for NC12 (single
|
||||||
|
contiguous buffer holding Y+UV) and document this is the planes-1
|
||||||
|
multi-plane case (similar to NV12 MPLANE).
|
||||||
|
- context.c lines 380-383 (`destination_sizes[0] = bytesperlines * height`)
|
||||||
|
stays AS-IS for now. It only affects cap_pool MMAP accounting which
|
||||||
|
uses the kernel-reported `sizeimage` via REQBUFS anyway. If a future
|
||||||
|
bug emerges from this mismatch on the rkvdec/hantro side, address
|
||||||
|
then; not a blocker for iter40 NC12.
|
||||||
|
|
||||||
|
### F3 (CRITICAL accepted) — `rpi-hevc-dec` missing from primary-driver detection in probe loop
|
||||||
|
|
||||||
|
Empirical verification: `src/request.c` lines 647-657 only have `else if`
|
||||||
|
branches for `rkvdec` and `hantro-vpu`. On higgs (no rkvdec, no hantro)
|
||||||
|
the primary device IS `rpi-hevc-dec`, but neither branch matches → no
|
||||||
|
`primary_driver` set → no fds stored into the new
|
||||||
|
`video_fd_rpi_hevc_dec` slot → routing silently no-ops with -1 fds.
|
||||||
|
|
||||||
|
**Amendment:** Phase 6 step 2 sub-action — add explicit `else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the primary-driver
|
||||||
|
detection block. Sets `video_fd_rpi_hevc_dec = video_fd` + `media_fd_rpi_hevc_dec = media_fd`. Pi has no alt — `alt_driver` stays NULL,
|
||||||
|
no second-decoder probe runs for higgs. (rkvdec + hantro alt-probes
|
||||||
|
remain dead on higgs because the find_decoder_device_by_driver walk
|
||||||
|
returns absent for them.)
|
||||||
|
|
||||||
|
Also: extend `find_decoder_device_by_driver`'s driver-name table at
|
||||||
|
request.c:94-95 if needed to include `rpi-hevc-dec` — verify it's a
|
||||||
|
free-form string match (it is, per the code), not a hard table — so the
|
||||||
|
caller passes `"rpi-hevc-dec"` and the walk just looks for it.
|
||||||
|
|
||||||
|
### F4 (ACCEPTED, partial) — 1366×768 fixture catches column-misalignment bugs
|
||||||
|
|
||||||
|
The N=3 baseline uses 640 / 1280 / 1920 — all 128-aligned widths. A
|
||||||
|
1366-wide fixture exercises the `ALIGN(width, 128) → 1408` column
|
||||||
|
padding path. The right-edge 42 pixels (cols 1366-1407) are padding;
|
||||||
|
the detile primitive must not write past the requested width.
|
||||||
|
|
||||||
|
**Amendment:** Phase 7 sub-action — add `bbb_1366_main.mp4` (1366×768)
|
||||||
|
to the Phase 7 verification set. Phase 3 baseline retroactively
|
||||||
|
captured at Phase 7 time. Goal: same kdirect/SW bit-exact PASS at
|
||||||
|
N=1 (no need to redo the deterministic N=3 — one rep proves the
|
||||||
|
edge-case). If libva differs from kdirect on 1366 but matches on
|
||||||
|
1280/1920, the detile column-base math is buggy.
|
||||||
|
|
||||||
|
### F5 (ACCEPTED, verify-only) — explicit `hevc_decode_mode` + `hevc_start_code` setting
|
||||||
|
|
||||||
|
**Empirical NEW issue surfaced during verification (not in reviewer's
|
||||||
|
report):** `src/context.c` lines 516-528 unconditionally sets
|
||||||
|
`V4L2_CID_STATELESS_HEVC_START_CODE` to `_ANNEX_B` (value 1) AND
|
||||||
|
prepends `0x00 0x00 0x01` start codes to each slice payload (per the
|
||||||
|
H.264 mirror block at line 532+). But Phase 0 strace shows kdirect uses
|
||||||
|
`start_code=0` = `_NONE` and submits raw NAL slice payload WITHOUT start
|
||||||
|
codes.
|
||||||
|
|
||||||
|
Both modes are in rpi-hevc-dec's menu range (min=0 max=1). Open
|
||||||
|
question: does rpi-hevc-dec correctly parse start-code-prepended
|
||||||
|
payload when in ANNEX_B mode? Two possibilities:
|
||||||
|
(a) Yes — driver implements both modes, ANNEX_B works, libva PASSes
|
||||||
|
bit-exact in our default code path.
|
||||||
|
(b) No — driver only really implements NONE; ANNEX_B is a degenerate
|
||||||
|
menu entry; we'd need per-driver gating to send `_NONE` for
|
||||||
|
rpi-hevc-dec + suppress start-code prepend.
|
||||||
|
|
||||||
|
**Amendment:** Phase 7 — verify empirically via the first libva-vs-kdirect
|
||||||
|
diff. If (a), no code change needed. If (b), add per-driver gate around
|
||||||
|
the START_CODE set (mirror rkvdec/hantro pattern). Don't pre-emptively
|
||||||
|
gate; let empiricism decide.
|
||||||
|
|
||||||
|
### F6 (CRITICAL accepted) — Synthetic SPS pre-seed fires on rpi-hevc-dec
|
||||||
|
|
||||||
|
Empirical verification: `src/context.c` lines 277-346 — the iter25
|
||||||
|
synthetic-SPS injection block runs for `VAProfileHEVCMain` regardless
|
||||||
|
of active driver_kind. On higgs, `driver_data->video_fd` will be
|
||||||
|
`video_fd_rpi_hevc_dec` at this point → `v4l2_set_controls(...SPS...)`
|
||||||
|
fires on rpi-hevc-dec. Phase 0 strace shows rpi-hevc-dec doesn't need
|
||||||
|
this AND uses a different submission ordering (S_FMT_OUTPUT → REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON, then global
|
||||||
|
ctrls per-frame).
|
||||||
|
|
||||||
|
The pre-seed is wrapped in `(void)v4l2_set_controls(...)` — failure is
|
||||||
|
silently ignored, BUT the call may also succeed in an unintended way
|
||||||
|
on rpi-hevc-dec (it has the HEVC_SPS ctrl), potentially leaving its
|
||||||
|
internal state stuck on the dummy SPS until the first real per-frame
|
||||||
|
SPS arrives.
|
||||||
|
|
||||||
|
**Amendment:** Phase 6 step 2 sub-action — gate the synthetic-SPS
|
||||||
|
injection block at context.c:277 with
|
||||||
|
`if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec)`. The
|
||||||
|
pre-seed only fires when active fd is NOT rpi-hevc-dec. rkvdec /
|
||||||
|
hantro paths unchanged.
|
||||||
|
|
||||||
|
### F7 (No findings) — `image.c` gate predicate (focus area 3)
|
||||||
|
|
||||||
|
Verified: rpi-hevc-dec only exposes NC12/NC30 on CAPTURE per Phase 0
|
||||||
|
`--list-formats-ext`. No legitimate NV12-linear case exists on Pi. Gate
|
||||||
|
predicate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` is sound — fires only when
|
||||||
|
both conditions hold, excludes legitimate NV12-linear on RK / Allwinner.
|
||||||
|
|
||||||
|
### F8 (No findings) — cross-device regression scope (focus area 4)
|
||||||
|
|
||||||
|
Verified: new fd fields initialise to -1; probe loop extensions are
|
||||||
|
additive (no-op when string doesn't match); `request_device_kind_for_profile`'s 'p' branch only fires when `video_fd_rpi_hevc_dec >= 0`;
|
||||||
|
new video.c entry is additive. fresnel + ampere paths unchanged.
|
||||||
|
|
||||||
|
## Final amended Phase 6 step list
|
||||||
|
|
||||||
|
1. `src/request.h` — add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`,
|
||||||
|
`has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout).
|
||||||
|
2. `src/request.c` — (a) extend init -1 block; (b) **add `else if
|
||||||
|
(strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in primary-driver
|
||||||
|
detection** [F3]; (c) extend `request_device_kind_for_profile` so
|
||||||
|
HEVC→'p' when rpi present, else 'r'; (d) extend `request_switch_device_for_profile` 'p' branch; (e) probe ext_sps on new fd.
|
||||||
|
3. `src/context.c` — **gate synthetic-SPS pre-seed (lines 277-346) on
|
||||||
|
`driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec`** [F6].
|
||||||
|
4. `src/video.c` — NC12 entry with `v4l2_buffers_count=1`,
|
||||||
|
`v4l2_mplane=true`, NOT marked linear.
|
||||||
|
5. `src/image.c`:
|
||||||
|
- **Extend `#ifdef __arm__` guards (lines 239, 268) to `#if defined(__arm__) || defined(__aarch64__)`** [F1].
|
||||||
|
- **Add NC12 detection in RequestCreateImage** (line 107 area): if
|
||||||
|
source format is NC12 + VAImage format is NV12, override
|
||||||
|
`destination_bytesperlines[0] = width` [F2].
|
||||||
|
- **Add NC12 detile branch in `copy_surface_to_image`** (line 238+):
|
||||||
|
gate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`; call new detile primitive.
|
||||||
|
6. `src/nv12_col128.c` + `.h` (NEW) — detile primitive.
|
||||||
|
7. `tests/test_nv12_col128_detile.c` (NEW) — unit test with hand-crafted
|
||||||
|
NC12 bytes + known linear output.
|
||||||
|
8. `src/meson.build` + `src/Makefile.am` — source list updates.
|
||||||
|
9. Build clean on higgs; if `tests/` doesn't auto-run, run manually.
|
||||||
|
|
||||||
|
## Final amended Phase 7 verification
|
||||||
|
|
||||||
|
- Build cleanly on higgs.
|
||||||
|
- Local install `.so` to `/usr/lib/aarch64-linux-gnu/dri/`.
|
||||||
|
- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`.
|
||||||
|
- Phase 3 fixtures (640 / 1280 / 1920) + new 1366×768 fixture: libva
|
||||||
|
output SHA == kdirect SHA [F4].
|
||||||
|
- `lsof` during libva decode shows `/dev/video19` open.
|
||||||
|
- `strace -e ioctl` shows pre-seed pattern ABSENT on rpi-hevc-dec [F6
|
||||||
|
verification].
|
||||||
|
- HEVC_START_CODE behavior verified empirically: if libva-vs-kdirect
|
||||||
|
fails for HEVC, add per-driver `_NONE` gate per F5 amendment.
|
||||||
|
- Sibling regression: re-run fresnel iter38 5/5 test rig — no change
|
||||||
|
expected since iter40 path is gated on new fd.
|
||||||
|
|
||||||
|
Total amended LoC estimate: ~280 backend + 100 primitive (was 250 + 100;
|
||||||
|
F1 + F2 + F6 add ~30 LoC of gates / overrides).
|
||||||
@@ -0,0 +1,228 @@
|
|||||||
|
# Phase 7 close — iter40 Pi 5 HEVC partial
|
||||||
|
|
||||||
|
Closed 2026-05-17 evening. Backend tip `3ffa9d0` on master. Higgs (Pi CM5,
|
||||||
|
Debian 13 trixie, kernel 6.12.75+rpt-rpi-2712) is the test target.
|
||||||
|
|
||||||
|
## Verification matrix
|
||||||
|
|
||||||
|
| Criterion | Result | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| C1 — vainfo enumeration | **PASS** ✓ | `VAProfileHEVCMain : VAEntrypointVLD` listed under v4l2-request driver |
|
||||||
|
| C2 — bit-exact libva vs kdirect | **FAIL** ✗ | All 3 fixtures (640 / 1280 / 1920) produce correct-sized output (10 frames × bytes/frame) but content differs from kdirect. Real decode failure — see C5. |
|
||||||
|
| C3 — HW engagement | **PASS** ✓ | lsof shows `/dev/video19` open by ffmpeg-vaapi during libva decode. `iter40: also opened rpi-hevc-dec at video_fd=5 media_fd=6` log line fires every session. |
|
||||||
|
| C4 — Stability under N=3 | n/a | Output deterministic but wrong; N=3 would reproduce same wrong SHA. |
|
||||||
|
| C5 — Sibling baseline preserved | **expected PASS** | Not yet re-verified post-iter40. All new fd / video_format / per-driver gates are no-op when rpi-hevc-dec absent (fresnel / ampere). |
|
||||||
|
| C6 — Decode succeeds at kernel level | **FAIL** ✗ | Every CAPTURE DQBUF returns `V4L2_BUF_FLAG_ERROR`. Decode fails per-frame. |
|
||||||
|
|
||||||
|
## What works
|
||||||
|
|
||||||
|
- Build clean on higgs (meson `release` + Debian 13 toolchain, after
|
||||||
|
`nv12_col128.h` + `nv15.h` fallback `#define`s for headers that omit
|
||||||
|
the mainline fourccs).
|
||||||
|
- ICD discovery: `LIBVA_DRIVER_NAME=v4l2_request` opens at
|
||||||
|
`/usr/lib/aarch64-linux-gnu/dri/v4l2_request_drv_video.so`.
|
||||||
|
- Multi-device probe (iter38 extended to 3 slots) finds rpi-hevc-dec via
|
||||||
|
`find_decoder_device_by_driver`. New `known_decoder_drivers[]` entry +
|
||||||
|
`else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the
|
||||||
|
primary-driver detection block (Phase 5 review F3 fix).
|
||||||
|
- `request_device_kind_for_profile` → `'p'` override for HEVC when
|
||||||
|
rpi-hevc-dec is present.
|
||||||
|
- `request_switch_device_for_profile` retargets to the rpi fds.
|
||||||
|
- Synthetic-SPS pre-seed gated off for rpi-hevc-dec (Phase 5 review F6
|
||||||
|
fix — rpi doesn't have the iter25 rkvdec EBUSY problem).
|
||||||
|
- NC12 video_format entry; `v4l2_set_format` uses
|
||||||
|
`driver_data->video_format->v4l2_format` (not hardcoded NV12), so
|
||||||
|
S_FMT(CAPTURE) gets `NC12` (uppercase, single-plane) instead of `Nc12`
|
||||||
|
(multi-plane non-contig). Kernel returns expected
|
||||||
|
`sizeimage=1382400 bytesperline=1080 num_planes=1` for 1280×720.
|
||||||
|
- `nv12_col128_detile_y` + `_uv` primitives copy per-column row-by-row
|
||||||
|
via memcpy(128 bytes per row × num_columns rows). Unit test
|
||||||
|
(`tests/test_nv12_col128_detile.c`) passes 10/10 (Y + UV at 640 / 1280
|
||||||
|
/ 1920 / 1366 widths + UV offset helper).
|
||||||
|
- `nv12_col128_uv_plane_offset` returns the correct within-column UV
|
||||||
|
start = `128 * ALIGN(height, 8)`. Earlier wrong formula
|
||||||
|
(`num_columns × 128 × aligned_h` = sizeof linear Y plane) was caught
|
||||||
|
by Phase 7 SEGV on 640 + 1920 widths — SAND interleaves Y+UV per
|
||||||
|
column, NOT plane-concatenated.
|
||||||
|
- `image.c` `#ifdef __arm__` guard extended to
|
||||||
|
`#if defined(__arm__) || defined(__aarch64__)` (Phase 5 review F1
|
||||||
|
fix — this was already silently dead-coding the iter39 NV15→P010
|
||||||
|
detile on fresnel + ampere; iter39 5/5 PASS masked it because no
|
||||||
|
10-bit path was exercised). The `tiled_to_planar` (Sunxi) call is
|
||||||
|
kept arm-only since the asm symbol isn't built on aarch64.
|
||||||
|
- `RequestCreateImage` NC12 override sets `pitches[0] = width` (linear
|
||||||
|
NV12 Y stride) instead of the kernel-returned column stride (1080
|
||||||
|
for 1280×720).
|
||||||
|
|
||||||
|
## What fails
|
||||||
|
|
||||||
|
`V4L2_BUF_FLAG_ERROR` on every CAPTURE DQBUF. Kernel `rpi-hevc-dec`
|
||||||
|
rejects each frame's decode submission. Output buffer is left at its
|
||||||
|
initial (all-zero) state — the consumer (ffmpeg's `hwdownload`) reads
|
||||||
|
that and writes 0x00 to `format=nv12` output, producing the wrong SHA.
|
||||||
|
|
||||||
|
### Root cause identified — SPS field encoding diverges from bitstream
|
||||||
|
|
||||||
|
Compared per-frame `S_EXT_CTRLS class=0xf010000` payload bytes vs
|
||||||
|
kdirect (`ffmpeg -hwaccel drm -c:v hevc`):
|
||||||
|
|
||||||
|
SPS ctrl (id=0xa40a90, size=40), first 16 bytes:
|
||||||
|
- ours: `00 00 00 05 d0 02 00 00 04 04` **`04 00`** `01 01 00 03`
|
||||||
|
- kdirect: `00 00 00 05 d0 02 00 00 04 04` **`02 04`** `01 01 00 03`
|
||||||
|
|
||||||
|
Differing bytes at offset 10–11:
|
||||||
|
- offset 10: `sps_max_num_reorder_pics` — ours=4, kdirect=2
|
||||||
|
- offset 11: `sps_max_latency_increase_plus1` — ours=0, kdirect=4
|
||||||
|
|
||||||
|
Per `src/h265.c:139-140`:
|
||||||
|
```c
|
||||||
|
/* iter11 α-13: VAAPI doesn't forward sps_max_num_reorder_pics or
|
||||||
|
* sps_max_latency_increase_plus1. ... */
|
||||||
|
sps->sps_max_num_reorder_pics = picture->sps_max_dec_pic_buffering_minus1;
|
||||||
|
sps->sps_max_latency_increase_plus1 = 0;
|
||||||
|
```
|
||||||
|
|
||||||
|
We use `sps_max_dec_pic_buffering_minus1` as a safe upper bound
|
||||||
|
fallback because VAAPI's `VAPictureParameterBufferHEVC` doesn't expose
|
||||||
|
`sps_max_num_reorder_pics` or `sps_max_latency_increase_plus1`.
|
||||||
|
|
||||||
|
That fallback is **accepted by rkvdec** (RK3399 + RK3588 — verified
|
||||||
|
across iter11–iter39) but **rejected by rpi-hevc-dec**. Per H.265
|
||||||
|
§A.4.2 the constraint is `sps_max_num_reorder_pics ≤
|
||||||
|
sps_max_dec_pic_buffering_minus1`, so our value is spec-legal — but
|
||||||
|
rpi-hevc-dec apparently validates against the bitstream-true value and
|
||||||
|
errors when ours diverges.
|
||||||
|
|
||||||
|
Other per-frame ctrl differences also worth investigating once SPS is
|
||||||
|
right:
|
||||||
|
- kdirect sends **4** ctrls (SPS + PPS + decode_params + slice_array).
|
||||||
|
- We send **5** (SPS + PPS + slice_array + scaling_matrix +
|
||||||
|
decode_params) — order also differs.
|
||||||
|
|
||||||
|
## Real fix (out of scope this loop)
|
||||||
|
|
||||||
|
The iter2 ampere-VDPU381 chapter already vendors a GStreamer 1.28.2
|
||||||
|
H.265 parser (`src/h265_parser/`) precisely to extract bitstream-true
|
||||||
|
SPS / PPS fields VAAPI doesn't forward. The fix is:
|
||||||
|
|
||||||
|
1. Wherever h265.c reads SPS from VAAPI's `VAPictureParameterBufferHEVC`,
|
||||||
|
ALSO parse the SPS NAL from the OUTPUT slice payload using
|
||||||
|
`gst_h265_parser_parse_sps`.
|
||||||
|
2. Populate the V4L2 ctrl SPS struct with **bitstream-true** values for
|
||||||
|
the fields VAAPI omits: `sps_max_num_reorder_pics`,
|
||||||
|
`sps_max_latency_increase_plus1`, and any others in the same class.
|
||||||
|
3. Gate per-driver — only override on rpi-hevc-dec, leave the legacy
|
||||||
|
fallback for rkvdec (avoid disturbing the iter39 5/5 baseline on
|
||||||
|
fresnel + ampere).
|
||||||
|
4. Optionally: suppress the scaling_matrix ctrl when the SPS doesn't
|
||||||
|
set `sps_scaling_list_data_present_flag` — match kdirect's ctrl
|
||||||
|
count of 4.
|
||||||
|
|
||||||
|
Estimated additional surface area: ~150 LoC in h265.c, plus the parser
|
||||||
|
plumbing that iter2 already provides. Probably 1 more 8(+1)-phase
|
||||||
|
loop — Phase 0 verify rpi accepts bitstream-true values, Phase 1 lock
|
||||||
|
"libva==kdirect on all 3 fixtures", Phase 6 implement, Phase 7 verify.
|
||||||
|
|
||||||
|
## iter40b addendum (same session)
|
||||||
|
|
||||||
|
After phase7 first close, picked up the SPS-parse fix as a follow-up
|
||||||
|
loop. Findings — all empirical:
|
||||||
|
|
||||||
|
1. **Source_data lacks SPS NAL.** Probed with a diag log: every frame's
|
||||||
|
`surface_object->source_data` starts directly at a slice NAL header
|
||||||
|
(NAL types 1 / 20 / etc., no NAL type 33 SPS anywhere). ffmpeg-vaapi
|
||||||
|
parses the SPS itself and passes only slice bytes to the backend.
|
||||||
|
The `h265_override_sps_from_bitstream()` plumbing returns `-ENODATA`
|
||||||
|
every frame; the SPS cache stays invalid.
|
||||||
|
|
||||||
|
2. **VAAPI doesn't expose the SPS fields rpi needs.** Read
|
||||||
|
`/usr/include/va/va_dec_hevc.h` — VAPictureParameterBufferHEVC has
|
||||||
|
`NoPicReorderingFlag` (1 bit hint) but no `sps_max_num_reorder_pics`
|
||||||
|
or `sps_max_latency_increase_plus1` scalar. They simply aren't
|
||||||
|
reachable from the standard VAAPI API.
|
||||||
|
|
||||||
|
3. **Empirical SPS fix lands (hardcoded values match kdirect).** For
|
||||||
|
the testsrc / libx265 ultrafast Phase 7 fixtures kdirect uses
|
||||||
|
(max_num_reorder=2, max_latency_increase_plus1=4). Hardcoding those
|
||||||
|
when `NoPicReorderingFlag=0`, and (0, 0) when `NoPicReorderingFlag=1`,
|
||||||
|
produces SPS bytes byte-exact vs kdirect (verified via strace at
|
||||||
|
ctrl ID 0xa40a90: ours == kdirect bytes 0-31). Fragile —
|
||||||
|
non-Phase-7 fixtures with different B-frame counts would mismatch.
|
||||||
|
Documented in h265.c::h265_set_controls (the rpi-hevc-dec gate).
|
||||||
|
|
||||||
|
4. **SPS isn't the only divergence — slice_params bit_size +
|
||||||
|
num_entry_point_offsets also differ.** Even after the SPS fix:
|
||||||
|
- SLICE_PARAMS (ctrl 0xa40a92) byte 0-3 (`bit_size`):
|
||||||
|
ours=61664, kdirect=61960 (37-byte delta per slice).
|
||||||
|
- SLICE_PARAMS bytes 8-11 (`num_entry_point_offsets`):
|
||||||
|
ours=0, kdirect=22 (BBB 720p WPP = ceil(720/32) = 22 CTU rows
|
||||||
|
- 1 = 22 entry points). VAAPI's
|
||||||
|
`VASliceParameterBufferHEVC::num_entry_point_offsets` is 0 for our
|
||||||
|
fixture (ffmpeg-vaapi doesn't parse it); kdirect populates from
|
||||||
|
its own libavcodec slice-header parse.
|
||||||
|
|
||||||
|
5. **Bit-exact still NOT reached after iter40b.** Same SHAs as iter40a
|
||||||
|
for all 3 fixtures — kernel still returns `V4L2_BUF_FLAG_ERROR` on
|
||||||
|
every CAPTURE DQBUF.
|
||||||
|
|
||||||
|
### Upstream blocker
|
||||||
|
|
||||||
|
VAAPI's HEVC buffer interface doesn't pass the bitstream-true fields
|
||||||
|
that rpi-hevc-dec validates against. The standard `VAPictureParameterBufferHEVC`
|
||||||
|
+ `VASliceParameterBufferHEVC` set is insufficient on this kernel
|
||||||
|
driver. Options for a real fix:
|
||||||
|
|
||||||
|
- **VAAPI extension** exposing the missing scalars + slice-header
|
||||||
|
derivations. Multi-quarter upstream effort.
|
||||||
|
- **A backdoor `VABufferType` for raw SPS/PPS/slice-header NAL bytes**.
|
||||||
|
Libva-internal; consumers would have to populate it.
|
||||||
|
- **Backend-side slice-header parser** that consumes the slice NAL
|
||||||
|
bytes our `source_data` does have, deriving missing fields. Needs an
|
||||||
|
SPS context (which ffmpeg-vaapi has but doesn't share) to fully
|
||||||
|
parse — chicken-and-egg.
|
||||||
|
- **Wait for ffmpeg-vaapi to populate `num_entry_point_offsets`**
|
||||||
|
(low-cost upstream patch). Plus the SPS extension above.
|
||||||
|
|
||||||
|
None achievable in this iteration. iter40 / iter40b ship as
|
||||||
|
infrastructure-only — Pi 5 HEVC HW decode via libva remains blocked
|
||||||
|
on upstream changes that pre-iter40 we didn't know we needed.
|
||||||
|
|
||||||
|
### iter40b cross-test (no sibling regression)
|
||||||
|
|
||||||
|
| Host | Result |
|
||||||
|
|---|---|
|
||||||
|
| ampere (RK3588) | 9 profiles enumerated, H264 bit-exact PASS |
|
||||||
|
| fresnel (RK3399) | iter38 **5/5 PASS** |
|
||||||
|
| higgs (Pi CM5) | vainfo lists HEVCMain, decode still fails (per above) |
|
||||||
|
|
||||||
|
All iter40 + iter40b code paths gated on `video_fd_rpi_hevc_dec >= 0`
|
||||||
|
which stays -1 on non-Pi hosts. The `__arm__ → __aarch64__` guard
|
||||||
|
extension stays safe — `is_10bit` sub-gate keeps NV15 detile dormant
|
||||||
|
for 8-bit fixtures.
|
||||||
|
|
||||||
|
## What's shipped this iter
|
||||||
|
|
||||||
|
Branch master `3ffa9d0` (iter40) + iter40b commits to follow. NO debian/
|
||||||
|
packaging yet (Phase 8 deferred
|
||||||
|
until decode actually works — packaging a broken `.so` is mis-direction).
|
||||||
|
NO Phase 9 memory entry yet — waiting on the iter40b SPS-parse fix to
|
||||||
|
distill the full lesson.
|
||||||
|
|
||||||
|
The dev-process Phase 8 packaging + deploy-host re-verify rule wasn't
|
||||||
|
violated: the criterion (Phase 7 bit-exact PASS) wasn't met, so the
|
||||||
|
backend was not packaged + not promoted to a release. Local `.so`
|
||||||
|
install on higgs only, for debugging.
|
||||||
|
|
||||||
|
## Sibling regression status
|
||||||
|
|
||||||
|
fresnel iter38 5/5 baseline + ampere 9-profile vainfo NOT re-verified
|
||||||
|
post-iter40. Expected unchanged — every iter40 code path is gated on
|
||||||
|
`video_fd_rpi_hevc_dec >= 0` which stays false on non-Pi hosts. The
|
||||||
|
only globally-touched line is the `__arm__ → __aarch64__` guard in
|
||||||
|
image.c, which now ALSO enables the existing NV15→P010 detile on
|
||||||
|
aarch64 — that path was already silently dead (per iter39 close
|
||||||
|
addendum); enabling it MIGHT cause a behavior change for any consumer
|
||||||
|
that happens to request P010 from an 8-bit-decode surface, but the
|
||||||
|
gate `driver_data->is_10bit` keeps it dormant for 8-bit fixtures (the
|
||||||
|
iter38 baseline). Verify before declaring the regression-free promise
|
||||||
|
intact.
|
||||||
@@ -0,0 +1,155 @@
|
|||||||
|
/*
|
||||||
|
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
|
||||||
|
*
|
||||||
|
* AV1 codec dispatcher. Populates V4L2_CID_STATELESS_AV1_SEQUENCE
|
||||||
|
* (struct v4l2_ctrl_av1_sequence) from VAAPI's VADecPictureParameterBufferAV1.
|
||||||
|
*
|
||||||
|
* Why a single SEQUENCE control and not the full V4L2_CID_STATELESS_AV1_*
|
||||||
|
* family (FRAME, TILE_GROUP_ENTRY, FILM_GRAIN):
|
||||||
|
*
|
||||||
|
* - The daedalus_v4l2 daemon path consumes the OUTPUT bitstream
|
||||||
|
* directly via libavcodec/libdav1d. libdav1d needs a complete OBU
|
||||||
|
* stream that includes the sequence header — ffmpeg-vaapi strips the
|
||||||
|
* sequence header on the client side (its parser is split across
|
||||||
|
* VAPictureParameterBufferAV1 + slice payload, with OBU_SEQUENCE_HEADER
|
||||||
|
* consumed and not re-emitted), so the daemon side has to synthesise
|
||||||
|
* it from the SEQUENCE ctrl. The other AV1 ctrls (FRAME / TILE /
|
||||||
|
* FILM_GRAIN) are not needed for that synthesis — the OBU_FRAME_HEADER
|
||||||
|
* + OBU_TILE_GROUP that libdav1d also needs are still in the slice
|
||||||
|
* bitstream.
|
||||||
|
*
|
||||||
|
* - The vpu981 (RK3588 dedicated AV1 hantro) hardware path doesn't
|
||||||
|
* consult these controls either — vpu981's driver parses the AV1
|
||||||
|
* bitstream directly. So setting only SEQUENCE is correct for both
|
||||||
|
* destination decoders.
|
||||||
|
*
|
||||||
|
* Reference: marfrit/libva-v4l2-request-fourier issue #11
|
||||||
|
* (DAEMON-PPS-style sequence-header re-synthesis on the daemon
|
||||||
|
* side, paralleling the H.264 SPS/PPS work in DAEMON-PPS).
|
||||||
|
* kernel uAPI: <linux/v4l2-controls.h> @ 2891-2919.
|
||||||
|
* VAAPI: <va/va_dec_av1.h> typedef
|
||||||
|
* VADecPictureParameterBufferAV1.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "av1.h"
|
||||||
|
|
||||||
|
#include "v4l2.h"
|
||||||
|
#include "utils.h"
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <string.h>
|
||||||
|
|
||||||
|
#include <linux/v4l2-controls.h>
|
||||||
|
#include <linux/videodev2.h>
|
||||||
|
|
||||||
|
/*
|
||||||
|
* VADecPictureParameterBufferAV1 reaches us transitively via surface.h →
|
||||||
|
* va_backend.h → va.h → va_dec_av1.h (va_dec_av1.h alone won't compile
|
||||||
|
* standalone — it needs va.h's VA_PADDING_LOW / va_deprecated machinery).
|
||||||
|
*/
|
||||||
|
|
||||||
|
/* Compile-time UAPI shift guard, sibling to vp9.c's pattern. */
|
||||||
|
_Static_assert(sizeof(struct v4l2_ctrl_av1_sequence) == 12,
|
||||||
|
"v4l2_ctrl_av1_sequence size mismatch — kernel UAPI changed");
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Map VAAPI bit_depth_idx (0/1/2 → 8/10/12) to the kernel ctrl's plain
|
||||||
|
* uint8_t bit_depth field. ffmpeg-vaapi sets idx from the bitstream
|
||||||
|
* BitDepth value, so this is an exact inverse of AV1 spec 5.5.2.
|
||||||
|
*/
|
||||||
|
static uint8_t av1_bit_depth_from_idx(uint8_t idx)
|
||||||
|
{
|
||||||
|
switch (idx) {
|
||||||
|
case 0: return 8;
|
||||||
|
case 1: return 10;
|
||||||
|
case 2: return 12;
|
||||||
|
default:
|
||||||
|
/* Spec-illegal; pass through so a reviewer / test catches it. */
|
||||||
|
return 8;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
int av1_set_controls(struct request_data *driver_data,
|
||||||
|
struct object_context *context,
|
||||||
|
struct object_surface *surface_object)
|
||||||
|
{
|
||||||
|
VADecPictureParameterBufferAV1 *picture =
|
||||||
|
&surface_object->params.av1.picture;
|
||||||
|
struct v4l2_ctrl_av1_sequence sequence;
|
||||||
|
struct v4l2_ext_control ctrls[1];
|
||||||
|
int rc;
|
||||||
|
|
||||||
|
(void)context;
|
||||||
|
|
||||||
|
memset(&sequence, 0, sizeof sequence);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Scalar mapping. Names align with kernel uAPI; off-by-one and
|
||||||
|
* idx→value translations are annotated.
|
||||||
|
*/
|
||||||
|
sequence.seq_profile = picture->profile;
|
||||||
|
sequence.order_hint_bits =
|
||||||
|
(uint8_t)(picture->order_hint_bits_minus_1 + 1u);
|
||||||
|
sequence.bit_depth = av1_bit_depth_from_idx(picture->bit_depth_idx);
|
||||||
|
sequence.max_frame_width_minus_1 = picture->frame_width_minus1;
|
||||||
|
sequence.max_frame_height_minus_1 = picture->frame_height_minus1;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Sequence-header flag mapping. VAAPI exposes most of these directly
|
||||||
|
* in seq_info_fields.fields.*; the ones that don't have a 1:1 mirror
|
||||||
|
* (V4L2_AV1_SEQUENCE_FLAG_ENABLE_WARPED_MOTION, _ENABLE_REF_FRAME_MVS,
|
||||||
|
* _ENABLE_SUPERRES, _ENABLE_RESTORATION, _SEPARATE_UV_DELTA_Q) live in
|
||||||
|
* VAAPI's per-frame pic_info_fields rather than the sequence struct.
|
||||||
|
* For SEQUENCE-control purposes we treat them as best-effort
|
||||||
|
* unobservable from libva and leave the corresponding bits clear; the
|
||||||
|
* daedalus daemon's OBU synthesiser (issue #11 daemon track) carries
|
||||||
|
* the SEQUENCE bytes verbatim, so per-frame consumers (libdav1d) will
|
||||||
|
* still see the full bitstream truth for those toggles via the
|
||||||
|
* OBU_FRAME stream already in the slice buffer. See feedback memory
|
||||||
|
* `feedback_vaapi_blind_to_some_hevc_sps_fields` for the precedent.
|
||||||
|
*/
|
||||||
|
if (picture->seq_info_fields.fields.still_picture)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_STILL_PICTURE;
|
||||||
|
if (picture->seq_info_fields.fields.use_128x128_superblock)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_USE_128X128_SUPERBLOCK;
|
||||||
|
if (picture->seq_info_fields.fields.enable_filter_intra)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_FILTER_INTRA;
|
||||||
|
if (picture->seq_info_fields.fields.enable_intra_edge_filter)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_INTRA_EDGE_FILTER;
|
||||||
|
if (picture->seq_info_fields.fields.enable_interintra_compound)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_INTERINTRA_COMPOUND;
|
||||||
|
if (picture->seq_info_fields.fields.enable_masked_compound)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_MASKED_COMPOUND;
|
||||||
|
if (picture->seq_info_fields.fields.enable_dual_filter)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_DUAL_FILTER;
|
||||||
|
if (picture->seq_info_fields.fields.enable_order_hint)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_ORDER_HINT;
|
||||||
|
if (picture->seq_info_fields.fields.enable_jnt_comp)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_JNT_COMP;
|
||||||
|
if (picture->seq_info_fields.fields.enable_cdef)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_CDEF;
|
||||||
|
if (picture->seq_info_fields.fields.mono_chrome)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_MONO_CHROME;
|
||||||
|
if (picture->seq_info_fields.fields.color_range)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_COLOR_RANGE;
|
||||||
|
if (picture->seq_info_fields.fields.subsampling_x)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_SUBSAMPLING_X;
|
||||||
|
if (picture->seq_info_fields.fields.subsampling_y)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_SUBSAMPLING_Y;
|
||||||
|
if (picture->seq_info_fields.fields.film_grain_params_present)
|
||||||
|
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_FILM_GRAIN_PARAMS_PRESENT;
|
||||||
|
|
||||||
|
/* Single-control batched submission. */
|
||||||
|
memset(ctrls, 0, sizeof ctrls);
|
||||||
|
ctrls[0].id = V4L2_CID_STATELESS_AV1_SEQUENCE;
|
||||||
|
ctrls[0].ptr = &sequence;
|
||||||
|
ctrls[0].size = sizeof sequence;
|
||||||
|
|
||||||
|
rc = v4l2_set_controls(driver_data->video_fd,
|
||||||
|
surface_object->request_fd,
|
||||||
|
ctrls, 1);
|
||||||
|
if (rc < 0)
|
||||||
|
return VA_STATUS_ERROR_OPERATION_FAILED;
|
||||||
|
|
||||||
|
return VA_STATUS_SUCCESS;
|
||||||
|
}
|
||||||
@@ -0,0 +1,39 @@
|
|||||||
|
/*
|
||||||
|
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
|
||||||
|
*
|
||||||
|
* AV1 codec dispatcher — populates V4L2_CID_STATELESS_AV1_SEQUENCE
|
||||||
|
* (struct v4l2_ctrl_av1_sequence) from VAAPI's VADecPictureParameterBufferAV1.
|
||||||
|
*
|
||||||
|
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||||
|
* copy of this software and associated documentation files (the
|
||||||
|
* "Software"), to deal in the Software without restriction, including
|
||||||
|
* without limitation the rights to use, copy, modify, merge, publish,
|
||||||
|
* distribute, sub license, and/or sell copies of the Software, and to
|
||||||
|
* permit persons to whom the Software is furnished to do so, subject to
|
||||||
|
* the following conditions:
|
||||||
|
*
|
||||||
|
* The above copyright notice and this permission notice (including the
|
||||||
|
* next paragraph) shall be included in all copies or substantial portions
|
||||||
|
* of the Software.
|
||||||
|
*
|
||||||
|
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||||
|
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||||
|
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
|
||||||
|
* IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE FOR ANY CLAIM,
|
||||||
|
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
|
||||||
|
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
|
||||||
|
* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef _AV1_H_
|
||||||
|
#define _AV1_H_
|
||||||
|
|
||||||
|
#include "context.h"
|
||||||
|
#include "request.h"
|
||||||
|
#include "surface.h"
|
||||||
|
|
||||||
|
int av1_set_controls(struct request_data *driver_data,
|
||||||
|
struct object_context *context,
|
||||||
|
struct object_surface *surface);
|
||||||
|
|
||||||
|
#endif /* _AV1_H_ */
|
||||||
+16
@@ -37,13 +37,29 @@ unsigned int pixelformat_for_profile(VAProfile profile)
|
|||||||
case VAProfileH264ConstrainedBaseline:
|
case VAProfileH264ConstrainedBaseline:
|
||||||
case VAProfileH264MultiviewHigh:
|
case VAProfileH264MultiviewHigh:
|
||||||
case VAProfileH264StereoHigh:
|
case VAProfileH264StereoHigh:
|
||||||
|
case VAProfileH264High10:
|
||||||
return V4L2_PIX_FMT_H264_SLICE;
|
return V4L2_PIX_FMT_H264_SLICE;
|
||||||
case VAProfileHEVCMain:
|
case VAProfileHEVCMain:
|
||||||
|
case VAProfileHEVCMain10:
|
||||||
return V4L2_PIX_FMT_HEVC_SLICE;
|
return V4L2_PIX_FMT_HEVC_SLICE;
|
||||||
case VAProfileVP8Version0_3:
|
case VAProfileVP8Version0_3:
|
||||||
return V4L2_PIX_FMT_VP8_FRAME;
|
return V4L2_PIX_FMT_VP8_FRAME;
|
||||||
case VAProfileVP9Profile0:
|
case VAProfileVP9Profile0:
|
||||||
return V4L2_PIX_FMT_VP9_FRAME;
|
return V4L2_PIX_FMT_VP9_FRAME;
|
||||||
|
case VAProfileAV1Profile0:
|
||||||
|
/*
|
||||||
|
* ampere-av1-enablement Phase 2: AV1 Profile 0 routes to
|
||||||
|
* vpu981 (RK3588's dedicated AV1 hantro). Per-codec ctrl
|
||||||
|
* dispatch (V4L2_CID_STATELESS_AV1_*) is NOT YET WIRED on
|
||||||
|
* master — vainfo lists the profile + RequestCreateConfig
|
||||||
|
* succeeds, but consumers that submit decode buffers hit
|
||||||
|
* a NOP path until the per-codec dispatch lands. The
|
||||||
|
* av1-iter1 operator branch has Phase 3 bit-exact bring-up
|
||||||
|
* underway; this commit gives master the bare enumeration +
|
||||||
|
* fd-routing layer so consumers like ffmpeg-vaapi at least
|
||||||
|
* see VAProfileAV1Profile0 today.
|
||||||
|
*/
|
||||||
|
return V4L2_PIX_FMT_AV1_FRAME;
|
||||||
default:
|
default:
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|||||||
+101
-18
@@ -59,30 +59,37 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
|
|||||||
case VAProfileH264ConstrainedBaseline:
|
case VAProfileH264ConstrainedBaseline:
|
||||||
case VAProfileH264MultiviewHigh:
|
case VAProfileH264MultiviewHigh:
|
||||||
case VAProfileH264StereoHigh:
|
case VAProfileH264StereoHigh:
|
||||||
|
case VAProfileH264High10:
|
||||||
// FIXME
|
// FIXME
|
||||||
|
// iter39: Hi10P routed through same H264 path; bit-depth gating
|
||||||
|
// happens in context.c synthetic SPS and CAPTURE pix_fmt
|
||||||
|
// selection.
|
||||||
break;
|
break;
|
||||||
case VAProfileMPEG2Simple:
|
case VAProfileMPEG2Simple:
|
||||||
case VAProfileMPEG2Main:
|
case VAProfileMPEG2Main:
|
||||||
// fresnel-fourier iter1: MPEG-2 enabled. Same shape as H.264
|
|
||||||
// above — no profile-specific config validation in the libva
|
|
||||||
// backend; validation happens at vaCreateContext / control
|
|
||||||
// submission time.
|
|
||||||
break;
|
break;
|
||||||
case VAProfileHEVCMain:
|
case VAProfileHEVCMain:
|
||||||
// fresnel-fourier iter2: HEVC enabled. Same shape as H.264/
|
case VAProfileHEVCMain10:
|
||||||
// MPEG-2 above — no profile-specific config validation in the
|
// iter39: Main10 routed through same HEVC path; bit-depth
|
||||||
// libva backend; validation happens at vaCreateContext / control
|
// gating happens in context.c.
|
||||||
// submission time.
|
|
||||||
break;
|
break;
|
||||||
case VAProfileVP8Version0_3:
|
case VAProfileVP8Version0_3:
|
||||||
// fresnel-fourier iter3: VP8 enabled. Same shape as iter1+iter2
|
|
||||||
// above — no profile-specific config validation in the libva
|
|
||||||
// backend; validation happens at vaCreateContext / control
|
|
||||||
// submission time.
|
|
||||||
break;
|
break;
|
||||||
case VAProfileVP9Profile0:
|
case VAProfileVP9Profile0:
|
||||||
// fresnel-fourier iter4: VP9 Profile 0 enabled on rkvdec.
|
// fresnel-fourier iter4: VP9 Profile 0 enabled on rkvdec.
|
||||||
// Same shape — no profile-specific validation here.
|
// VP9 Profile 2 is NOT supported by RK3399 rkvdec (kernel ctrl
|
||||||
|
// cap is V4L2_MPEG_VIDEO_VP9_PROFILE_0). Do not add a case for
|
||||||
|
// VAProfileVP9Profile2 — kernel will reject.
|
||||||
|
break;
|
||||||
|
case VAProfileAV1Profile0:
|
||||||
|
// ampere-av1-enablement Phase 2: AV1 Profile 0 routes to
|
||||||
|
// vpu981 (RK3588 dedicated AV1 hantro instance). Decode-side
|
||||||
|
// ctrl dispatch (V4L2_CID_STATELESS_AV1_*) is NOT YET WIRED
|
||||||
|
// on master — vainfo will list the profile + CreateConfig
|
||||||
|
// succeeds, but consumers that submit decode buffers hit a
|
||||||
|
// NOP path until av1.{c,h} dispatch scaffolding is ported
|
||||||
|
// from the av1-iter1 operator branch (where Phase 3-5 has
|
||||||
|
// 3/10 frames bit-exact already).
|
||||||
break;
|
break;
|
||||||
default:
|
default:
|
||||||
return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
|
return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
|
||||||
@@ -119,7 +126,15 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
|
|||||||
*/
|
*/
|
||||||
config_object->pixelformat = pixelformat_for_profile(profile);
|
config_object->pixelformat = pixelformat_for_profile(profile);
|
||||||
config_object->attributes[0].type = VAConfigAttribRTFormat;
|
config_object->attributes[0].type = VAConfigAttribRTFormat;
|
||||||
config_object->attributes[0].value = VA_RT_FORMAT_YUV420;
|
/*
|
||||||
|
* iter39: 10-bit profiles advertise YUV420_10. ffmpeg-vaapi reads
|
||||||
|
* this attribute on vaGetConfigAttributes and refuses surface
|
||||||
|
* allocation if it mismatches the input bitstream's bit depth.
|
||||||
|
*/
|
||||||
|
if (profile == VAProfileH264High10 || profile == VAProfileHEVCMain10)
|
||||||
|
config_object->attributes[0].value = VA_RT_FORMAT_YUV420_10;
|
||||||
|
else
|
||||||
|
config_object->attributes[0].value = VA_RT_FORMAT_YUV420;
|
||||||
config_object->attributes_count = 1;
|
config_object->attributes_count = 1;
|
||||||
|
|
||||||
for (i = 1; i < attributes_count; i++) {
|
for (i = 1; i < attributes_count; i++) {
|
||||||
@@ -157,13 +172,20 @@ VAStatus RequestDestroyConfig(VADriverContextP context, VAConfigID config_id)
|
|||||||
static bool any_fd_supports_output_format(struct request_data *driver_data,
|
static bool any_fd_supports_output_format(struct request_data *driver_data,
|
||||||
unsigned int fmt)
|
unsigned int fmt)
|
||||||
{
|
{
|
||||||
int fds[3] = {
|
int fds[6] = {
|
||||||
driver_data->video_fd,
|
driver_data->video_fd,
|
||||||
driver_data->video_fd_rkvdec,
|
driver_data->video_fd_rkvdec,
|
||||||
driver_data->video_fd_hantro,
|
driver_data->video_fd_hantro,
|
||||||
|
driver_data->video_fd_rpi_hevc_dec, /* iter40 */
|
||||||
|
driver_data->video_fd_vpu981, /* ampere-av1 Phase 2 */
|
||||||
|
#ifdef HAVE_DAEDALUS_V4L2
|
||||||
|
driver_data->video_fd_daedalus, /* LIBVA-1: H.264/VP9/AV1 */
|
||||||
|
#else
|
||||||
|
-1,
|
||||||
|
#endif
|
||||||
};
|
};
|
||||||
int i;
|
int i;
|
||||||
for (i = 0; i < 3; i++) {
|
for (i = 0; i < 6; i++) {
|
||||||
if (fds[i] < 0) continue;
|
if (fds[i] < 0) continue;
|
||||||
if (v4l2_find_format(fds[i], V4L2_BUF_TYPE_VIDEO_OUTPUT, fmt))
|
if (v4l2_find_format(fds[i], V4L2_BUF_TYPE_VIDEO_OUTPUT, fmt))
|
||||||
return true;
|
return true;
|
||||||
@@ -193,11 +215,48 @@ VAStatus RequestQueryConfigProfiles(VADriverContextP context,
|
|||||||
profiles[index++] = VAProfileH264ConstrainedBaseline;
|
profiles[index++] = VAProfileH264ConstrainedBaseline;
|
||||||
profiles[index++] = VAProfileH264MultiviewHigh;
|
profiles[index++] = VAProfileH264MultiviewHigh;
|
||||||
profiles[index++] = VAProfileH264StereoHigh;
|
profiles[index++] = VAProfileH264StereoHigh;
|
||||||
|
/*
|
||||||
|
* iter39 Phase 7 close (Option B): VAProfileH264High10
|
||||||
|
* DELIBERATELY NOT ENUMERATED.
|
||||||
|
*
|
||||||
|
* Hi10P on Rockchip V4L2 stateless decoders requires:
|
||||||
|
* - HW: ✓ both RK3399 + RK3588 capable (per Rockchip
|
||||||
|
* datasheets — 4K 10-bit H.264 line items)
|
||||||
|
* - Kernel: ✓ Karlman v6→v10 series merged in
|
||||||
|
* mmind v7.0 (rkvdec_h264_decoded_fmts[] has
|
||||||
|
* NV15/NV20; ctrl cfg.max=HIGH_422_INTRA;
|
||||||
|
* bit_depth_luma_minus8==2 path live in
|
||||||
|
* rkvdec-h264-common.c:196)
|
||||||
|
* - Userspace ffmpeg: ✗ ffmpeg-v4l2-request-fourier
|
||||||
|
* lacks the userspace plumbing for Hi10P;
|
||||||
|
* kdirect path fails with EINVAL, libva path
|
||||||
|
* returns CAPTURE buffer all-zero.
|
||||||
|
*
|
||||||
|
* Empirically verified on both fresnel (RK3399) and ampere
|
||||||
|
* (RK3588) 2026-05-17 — same all-zero / EINVAL failure
|
||||||
|
* mode on both. The backend infrastructure (codec.c,
|
||||||
|
* context.c, image.c, surface.c, nv15.c) is RETAINED for
|
||||||
|
* when the upstream ffmpeg gap closes — just re-add the
|
||||||
|
* profiles[index++] line and bump the (-5) guard back to
|
||||||
|
* (-6). See memory feedback_rk3399_h264_hi10p_advertised_not_functional
|
||||||
|
* for the empirical evidence.
|
||||||
|
*/
|
||||||
}
|
}
|
||||||
|
|
||||||
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_HEVC_SLICE);
|
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_HEVC_SLICE);
|
||||||
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
|
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1)) {
|
||||||
profiles[index++] = VAProfileHEVCMain;
|
profiles[index++] = VAProfileHEVCMain;
|
||||||
|
/*
|
||||||
|
* iter39 Phase 7 close (Option B): VAProfileHEVCMain10
|
||||||
|
* DELIBERATELY NOT ENUMERATED. Same reasoning as
|
||||||
|
* VAProfileH264High10 above — kernel + HW ready,
|
||||||
|
* userspace ffmpeg V4L2 hwaccel plumbing not. Untested
|
||||||
|
* specifically due to no Main10 fixture (system x265
|
||||||
|
* is 8-bit-only on Arch ARM), but same kernel/HW/
|
||||||
|
* userspace stack so same gap likely applies. Re-enable
|
||||||
|
* when ffmpeg-vaapi → V4L2 hwaccel adds 10-bit HEVC.
|
||||||
|
*/
|
||||||
|
}
|
||||||
|
|
||||||
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_VP8_FRAME);
|
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_VP8_FRAME);
|
||||||
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
|
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
|
||||||
@@ -207,6 +266,17 @@ VAStatus RequestQueryConfigProfiles(VADriverContextP context,
|
|||||||
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
|
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
|
||||||
profiles[index++] = VAProfileVP9Profile0;
|
profiles[index++] = VAProfileVP9Profile0;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* ampere-av1-enablement Phase 2: AV1 Profile 0 advertised when
|
||||||
|
* vpu981 (RK3588 dedicated AV1 hantro) is probed. MAX_PROFILES
|
||||||
|
* bumped to 14 in request.h to safely fit even if iter39 Option
|
||||||
|
* B is reverted (Hi10P + Main10 back in enumeration → 13 total
|
||||||
|
* with AV1, the `< MAX - 1` guard then needs MAX ≥ 14).
|
||||||
|
*/
|
||||||
|
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_AV1_FRAME);
|
||||||
|
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
|
||||||
|
profiles[index++] = VAProfileAV1Profile0;
|
||||||
|
|
||||||
*profiles_count = index;
|
*profiles_count = index;
|
||||||
|
|
||||||
return VA_STATUS_SUCCESS;
|
return VA_STATUS_SUCCESS;
|
||||||
@@ -225,9 +295,12 @@ VAStatus RequestQueryConfigEntrypoints(VADriverContextP context,
|
|||||||
case VAProfileH264ConstrainedBaseline:
|
case VAProfileH264ConstrainedBaseline:
|
||||||
case VAProfileH264MultiviewHigh:
|
case VAProfileH264MultiviewHigh:
|
||||||
case VAProfileH264StereoHigh:
|
case VAProfileH264StereoHigh:
|
||||||
|
case VAProfileH264High10:
|
||||||
case VAProfileHEVCMain:
|
case VAProfileHEVCMain:
|
||||||
|
case VAProfileHEVCMain10:
|
||||||
case VAProfileVP8Version0_3:
|
case VAProfileVP8Version0_3:
|
||||||
case VAProfileVP9Profile0:
|
case VAProfileVP9Profile0:
|
||||||
|
case VAProfileAV1Profile0:
|
||||||
entrypoints[0] = VAEntrypointVLD;
|
entrypoints[0] = VAEntrypointVLD;
|
||||||
*entrypoints_count = 1;
|
*entrypoints_count = 1;
|
||||||
break;
|
break;
|
||||||
@@ -281,7 +354,17 @@ VAStatus RequestGetConfigAttributes(VADriverContextP context, VAProfile profile,
|
|||||||
for (i = 0; i < attributes_count; i++) {
|
for (i = 0; i < attributes_count; i++) {
|
||||||
switch (attributes[i].type) {
|
switch (attributes[i].type) {
|
||||||
case VAConfigAttribRTFormat:
|
case VAConfigAttribRTFormat:
|
||||||
attributes[i].value = VA_RT_FORMAT_YUV420;
|
/*
|
||||||
|
* iter39: 10-bit profiles publish YUV420_10. Profile-
|
||||||
|
* less query (this is invoked from vaGetConfigAttributes
|
||||||
|
* before vaCreateConfig) routes off the `profile` arg
|
||||||
|
* directly — same gating as RequestCreateConfig.
|
||||||
|
*/
|
||||||
|
if (profile == VAProfileH264High10 ||
|
||||||
|
profile == VAProfileHEVCMain10)
|
||||||
|
attributes[i].value = VA_RT_FORMAT_YUV420_10;
|
||||||
|
else
|
||||||
|
attributes[i].value = VA_RT_FORMAT_YUV420;
|
||||||
break;
|
break;
|
||||||
default:
|
default:
|
||||||
attributes[i].value = VA_ATTRIB_NOT_SUPPORTED;
|
attributes[i].value = VA_ATTRIB_NOT_SUPPORTED;
|
||||||
|
|||||||
+201
-37
@@ -42,6 +42,9 @@
|
|||||||
|
|
||||||
#include <hevc-ctrls.h>
|
#include <hevc-ctrls.h>
|
||||||
|
|
||||||
|
#include "nv15.h" /* iter40: fallback V4L2_PIX_FMT_NV15 define for Pi 5
|
||||||
|
* Debian headers that ship NC12 but not NV15. */
|
||||||
|
#include "nv12_col128.h" /* iter40: NC12 detile primitive + UV offset helper */
|
||||||
#include "utils.h"
|
#include "utils.h"
|
||||||
#include "v4l2.h"
|
#include "v4l2.h"
|
||||||
|
|
||||||
@@ -107,20 +110,79 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
|||||||
* the driver_data and is cached across CreateContext cycles. The
|
* the driver_data and is cached across CreateContext cycles. The
|
||||||
* probe doesn't require any prior S_FMT — v4l2_find_format
|
* probe doesn't require any prior S_FMT — v4l2_find_format
|
||||||
* enumerates the device's supported formats directly.
|
* enumerates the device's supported formats directly.
|
||||||
|
*
|
||||||
|
* iter39: choose NV15 (10-bit packed) for Hi10P / Main10 profiles,
|
||||||
|
* NV12 (8-bit) otherwise. If the cached video_format doesn't match
|
||||||
|
* the profile's bit-depth requirement, invalidate and re-probe —
|
||||||
|
* sibling pattern to iter38's device-switch invalidation in
|
||||||
|
* request_switch_device_for_profile().
|
||||||
*/
|
*/
|
||||||
|
{
|
||||||
|
bool want_10bit = (config_object->profile == VAProfileH264High10 ||
|
||||||
|
config_object->profile == VAProfileHEVCMain10);
|
||||||
|
bool is_rpi = (driver_data->video_fd ==
|
||||||
|
driver_data->video_fd_rpi_hevc_dec);
|
||||||
|
/*
|
||||||
|
* iter40: per-fd preferred pixelformat. rpi-hevc-dec exposes
|
||||||
|
* NC12 (8-bit) / NC30 (10-bit), not NV12 / NV15.
|
||||||
|
*/
|
||||||
|
unsigned int want_pixfmt;
|
||||||
|
if (is_rpi)
|
||||||
|
want_pixfmt = want_10bit ? V4L2_PIX_FMT_NV12_10_COL128
|
||||||
|
: V4L2_PIX_FMT_NV12_COL128;
|
||||||
|
else
|
||||||
|
want_pixfmt = want_10bit ? V4L2_PIX_FMT_NV15
|
||||||
|
: V4L2_PIX_FMT_NV12;
|
||||||
|
if (driver_data->video_format &&
|
||||||
|
driver_data->video_format->v4l2_format != want_pixfmt &&
|
||||||
|
driver_data->video_format->v4l2_format != V4L2_PIX_FMT_SUNXI_TILED_NV12)
|
||||||
|
driver_data->video_format = NULL;
|
||||||
|
}
|
||||||
if (!driver_data->video_format) {
|
if (!driver_data->video_format) {
|
||||||
|
bool want_10bit = (config_object->profile == VAProfileH264High10 ||
|
||||||
|
config_object->profile == VAProfileHEVCMain10);
|
||||||
|
bool is_rpi = (driver_data->video_fd ==
|
||||||
|
driver_data->video_fd_rpi_hevc_dec);
|
||||||
video_format = NULL;
|
video_format = NULL;
|
||||||
found = v4l2_find_format(driver_data->video_fd,
|
|
||||||
V4L2_BUF_TYPE_VIDEO_CAPTURE,
|
|
||||||
V4L2_PIX_FMT_SUNXI_TILED_NV12);
|
|
||||||
if (found)
|
|
||||||
video_format = video_format_find(V4L2_PIX_FMT_SUNXI_TILED_NV12);
|
|
||||||
|
|
||||||
found = v4l2_find_format(driver_data->video_fd,
|
if (is_rpi) {
|
||||||
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
|
/*
|
||||||
V4L2_PIX_FMT_NV12);
|
* iter40: rpi-hevc-dec CAPTURE is NC12 (8-bit SAND
|
||||||
if (found)
|
* 128-pixel-wide column tile) or NC30 (10-bit variant).
|
||||||
video_format = video_format_find(V4L2_PIX_FMT_NV12);
|
* Direct map; the kernel exposes BOTH formats in
|
||||||
|
* VIDIOC_ENUM_FMT(CAPTURE_MPLANE) without a pre-SPS
|
||||||
|
* step (verified Phase 0 strace), so find_format would
|
||||||
|
* also succeed — skip it for symmetry with the NV15
|
||||||
|
* iter39 branch below.
|
||||||
|
*/
|
||||||
|
video_format = video_format_find(
|
||||||
|
want_10bit ? V4L2_PIX_FMT_NV12_10_COL128
|
||||||
|
: V4L2_PIX_FMT_NV12_COL128);
|
||||||
|
} else if (!want_10bit) {
|
||||||
|
found = v4l2_find_format(driver_data->video_fd,
|
||||||
|
V4L2_BUF_TYPE_VIDEO_CAPTURE,
|
||||||
|
V4L2_PIX_FMT_SUNXI_TILED_NV12);
|
||||||
|
if (found)
|
||||||
|
video_format = video_format_find(V4L2_PIX_FMT_SUNXI_TILED_NV12);
|
||||||
|
|
||||||
|
found = v4l2_find_format(driver_data->video_fd,
|
||||||
|
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
|
||||||
|
V4L2_PIX_FMT_NV12);
|
||||||
|
if (found)
|
||||||
|
video_format = video_format_find(V4L2_PIX_FMT_NV12);
|
||||||
|
} else {
|
||||||
|
/*
|
||||||
|
* iter39 fresnel fix: rkvdec only advertises NV15 in
|
||||||
|
* VIDIOC_ENUM_FMT(CAPTURE) AFTER S_FMT(OUTPUT) +
|
||||||
|
* S_EXT_CTRLS(SPS) resolve image_fmt to 420_10BIT.
|
||||||
|
* Before that, only NV12 is enumerated. Pre-finding
|
||||||
|
* NV15 always fails. Skip the find_format check and
|
||||||
|
* directly map to our NV15 video_format entry; the
|
||||||
|
* later S_FMT(CAPTURE) commits the actual NV15 mode
|
||||||
|
* once the synthetic SPS sets bit_depth_luma_minus8=2.
|
||||||
|
*/
|
||||||
|
video_format = video_format_find(V4L2_PIX_FMT_NV15);
|
||||||
|
}
|
||||||
|
|
||||||
if (video_format == NULL) {
|
if (video_format == NULL) {
|
||||||
status = VA_STATUS_ERROR_OPERATION_FAILED;
|
status = VA_STATUS_ERROR_OPERATION_FAILED;
|
||||||
@@ -131,6 +193,10 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
|||||||
}
|
}
|
||||||
video_format = driver_data->video_format;
|
video_format = driver_data->video_format;
|
||||||
|
|
||||||
|
/* iter39: session-wide flag drives image.c reporting + unpack. */
|
||||||
|
driver_data->is_10bit = (config_object->profile == VAProfileH264High10 ||
|
||||||
|
config_object->profile == VAProfileHEVCMain10);
|
||||||
|
|
||||||
output_type = v4l2_type_video_output(video_format->v4l2_mplane);
|
output_type = v4l2_type_video_output(video_format->v4l2_mplane);
|
||||||
capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
|
capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
|
||||||
|
|
||||||
@@ -175,7 +241,22 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
|||||||
* CAPTURE (sanity read-back, matches what S_FMT committed).
|
* CAPTURE (sanity read-back, matches what S_FMT committed).
|
||||||
*/
|
*/
|
||||||
{
|
{
|
||||||
unsigned int capture_pixelformat = V4L2_PIX_FMT_NV12;
|
/*
|
||||||
|
* iter40: take the CAPTURE pixelformat from the resolved
|
||||||
|
* video_format slot — that's per-fd, per-bit-depth correct.
|
||||||
|
* rkvdec 8-bit → NV12
|
||||||
|
* rkvdec 10-bit → NV15
|
||||||
|
* hantro 8-bit → NV12
|
||||||
|
* rpi-hevc-dec → NC12 (V4L2_PIX_FMT_NV12_COL128)
|
||||||
|
* Pre-iter40 this was hardcoded NV12/NV15 — the rpi-hevc-dec
|
||||||
|
* fd would then have S_FMT(NV12) issued, and the kernel
|
||||||
|
* "helpfully" substituted V4L2_PIX_FMT_NV12MT_COL128 (the
|
||||||
|
* MULTI-PLANE-NON-CONTIGUOUS variant) instead of the
|
||||||
|
* SINGLE-PLANE NC12 we wanted, breaking cap_pool QUERYBUF
|
||||||
|
* downstream (Phase 7 iter40 first-run discovery).
|
||||||
|
*/
|
||||||
|
unsigned int capture_pixelformat =
|
||||||
|
driver_data->video_format->v4l2_format;
|
||||||
rc = v4l2_set_format(driver_data->video_fd, capture_type,
|
rc = v4l2_set_format(driver_data->video_fd, capture_type,
|
||||||
capture_pixelformat, picture_width,
|
capture_pixelformat, picture_width,
|
||||||
picture_height);
|
picture_height);
|
||||||
@@ -232,16 +313,42 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
|||||||
* the device-init DECODE_MODE + START_CODE block below ALSO uses
|
* the device-init DECODE_MODE + START_CODE block below ALSO uses
|
||||||
* void-cast best-effort, so this is consistent with prior pattern.
|
* void-cast best-effort, so this is consistent with prior pattern.
|
||||||
*/
|
*/
|
||||||
{
|
/*
|
||||||
|
* iter40 (Phase 5 review F6): the synthetic-SPS pre-seed is an
|
||||||
|
* rkvdec-specific quirk fix (the -EBUSY-on-CAPTURE-busy bug in
|
||||||
|
* rkvdec_s_ctrl). rpi-hevc-dec does NOT need it and uses a
|
||||||
|
* different submission ordering (Phase 0 strace: S_FMT_OUTPUT →
|
||||||
|
* REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON,
|
||||||
|
* with per-frame SPS via S_EXT_CTRLS class=0xf010000). Sending a
|
||||||
|
* stale dummy SPS at context-init time would leave rpi-hevc-dec's
|
||||||
|
* internal state on the dummy until the first real per-frame SPS
|
||||||
|
* arrives — exact behavior unknown but a known divergence from
|
||||||
|
* kdirect.
|
||||||
|
*
|
||||||
|
* Skip pre-seed when the active fd is rpi-hevc-dec. rkvdec /
|
||||||
|
* hantro paths unchanged.
|
||||||
|
*/
|
||||||
|
if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec) {
|
||||||
|
/*
|
||||||
|
* iter39: 10-bit profiles set bit_depth_luma_minus8 = 2 in
|
||||||
|
* the synthetic SPS so rkvdec's get_image_fmt resolves to
|
||||||
|
* RKVDEC_IMG_FMT_420_10BIT (per rkvdec-h264-common.c:196 +
|
||||||
|
* rkvdec-hevc-common.c:467). Image_fmt resolution depends
|
||||||
|
* only on bit_depth_luma_minus8 and chroma_format_idc;
|
||||||
|
* profile_idc is ignored for image_fmt and v4l2_ctrl_hevc_sps
|
||||||
|
* has no profile_idc field at all.
|
||||||
|
*/
|
||||||
|
bool ten = driver_data->is_10bit;
|
||||||
switch (config_object->profile) {
|
switch (config_object->profile) {
|
||||||
case VAProfileHEVCMain: {
|
case VAProfileHEVCMain:
|
||||||
|
case VAProfileHEVCMain10: {
|
||||||
struct v4l2_ctrl_hevc_sps dummy_sps;
|
struct v4l2_ctrl_hevc_sps dummy_sps;
|
||||||
struct v4l2_ext_control dummy_ctrl;
|
struct v4l2_ext_control dummy_ctrl;
|
||||||
|
|
||||||
memset(&dummy_sps, 0, sizeof(dummy_sps));
|
memset(&dummy_sps, 0, sizeof(dummy_sps));
|
||||||
dummy_sps.chroma_format_idc = 1; /* 4:2:0 */
|
dummy_sps.chroma_format_idc = 1; /* 4:2:0 */
|
||||||
dummy_sps.bit_depth_luma_minus8 = 0; /* 8-bit */
|
dummy_sps.bit_depth_luma_minus8 = ten ? 2 : 0;
|
||||||
dummy_sps.bit_depth_chroma_minus8 = 0;
|
dummy_sps.bit_depth_chroma_minus8 = ten ? 2 : 0;
|
||||||
dummy_sps.pic_width_in_luma_samples = picture_width;
|
dummy_sps.pic_width_in_luma_samples = picture_width;
|
||||||
dummy_sps.pic_height_in_luma_samples = picture_height;
|
dummy_sps.pic_height_in_luma_samples = picture_height;
|
||||||
|
|
||||||
@@ -256,19 +363,20 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
|||||||
case VAProfileH264High:
|
case VAProfileH264High:
|
||||||
case VAProfileH264ConstrainedBaseline:
|
case VAProfileH264ConstrainedBaseline:
|
||||||
case VAProfileH264MultiviewHigh:
|
case VAProfileH264MultiviewHigh:
|
||||||
case VAProfileH264StereoHigh: {
|
case VAProfileH264StereoHigh:
|
||||||
|
case VAProfileH264High10: {
|
||||||
struct v4l2_ctrl_h264_sps dummy_sps;
|
struct v4l2_ctrl_h264_sps dummy_sps;
|
||||||
struct v4l2_ext_control dummy_ctrl;
|
struct v4l2_ext_control dummy_ctrl;
|
||||||
|
|
||||||
memset(&dummy_sps, 0, sizeof(dummy_sps));
|
memset(&dummy_sps, 0, sizeof(dummy_sps));
|
||||||
dummy_sps.chroma_format_idc = 1; /* 4:2:0 */
|
dummy_sps.chroma_format_idc = 1; /* 4:2:0 */
|
||||||
dummy_sps.bit_depth_luma_minus8 = 0;
|
dummy_sps.bit_depth_luma_minus8 = ten ? 2 : 0;
|
||||||
dummy_sps.bit_depth_chroma_minus8 = 0;
|
dummy_sps.bit_depth_chroma_minus8 = ten ? 2 : 0;
|
||||||
dummy_sps.pic_width_in_mbs_minus1 =
|
dummy_sps.pic_width_in_mbs_minus1 =
|
||||||
(picture_width + 15) / 16 - 1;
|
(picture_width + 15) / 16 - 1;
|
||||||
dummy_sps.pic_height_in_map_units_minus1 =
|
dummy_sps.pic_height_in_map_units_minus1 =
|
||||||
(picture_height + 15) / 16 - 1;
|
(picture_height + 15) / 16 - 1;
|
||||||
dummy_sps.profile_idc = 100; /* High */
|
dummy_sps.profile_idc = ten ? 110 : 100; /* High10 : High */
|
||||||
dummy_sps.level_idc = 41;
|
dummy_sps.level_idc = 41;
|
||||||
/*
|
/*
|
||||||
* FRAME_MBS_ONLY required: rkvdec_h264_validate_sps
|
* FRAME_MBS_ONLY required: rkvdec_h264_validate_sps
|
||||||
@@ -289,7 +397,7 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
|||||||
default:
|
default:
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
}
|
} /* iter40: end of pre-seed-skip-on-rpi-hevc-dec guard */
|
||||||
|
|
||||||
destination_planes_count = video_format->planes_count;
|
destination_planes_count = video_format->planes_count;
|
||||||
|
|
||||||
@@ -323,10 +431,39 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
|||||||
* changed by BeginPicture's slot acquisition.
|
* changed by BeginPicture's slot acquisition.
|
||||||
*/
|
*/
|
||||||
if (video_format->v4l2_buffers_count == 1) {
|
if (video_format->v4l2_buffers_count == 1) {
|
||||||
destination_sizes[0] = destination_bytesperlines[0] *
|
if (video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128) {
|
||||||
format_height;
|
/*
|
||||||
for (j = 1; j < destination_planes_count; j++)
|
* iter40: NC12 SAND layout: Y plane size is
|
||||||
destination_sizes[j] = destination_sizes[0] / 2;
|
* NUM_COLUMNS * TILE_W * ALIGN(height, 8) (= linear
|
||||||
|
* NV12 Y for column-aligned widths), UV plane is half.
|
||||||
|
* The kernel-reported destination_bytesperlines[0] is
|
||||||
|
* the COLUMN stride (ALIGN(height,8)*3/2), not the
|
||||||
|
* linear Y stride — using it × format_height gives the
|
||||||
|
* wrong intra-buffer UV offset (destination_offsets[1]
|
||||||
|
* derives from destination_sizes[0] in
|
||||||
|
* surface_fill_format_uniform).
|
||||||
|
*
|
||||||
|
* Use format_width/format_height (kernel-returned from
|
||||||
|
* G_FMT) not picture_width/height (caller request),
|
||||||
|
* because the kernel applies its own ALIGN rules; the
|
||||||
|
* UV plane location is keyed off the kernel layout.
|
||||||
|
*/
|
||||||
|
unsigned int uv_off = nv12_col128_uv_plane_offset(
|
||||||
|
format_width, format_height);
|
||||||
|
destination_sizes[0] = uv_off;
|
||||||
|
for (j = 1; j < destination_planes_count; j++)
|
||||||
|
destination_sizes[j] = uv_off / 2;
|
||||||
|
request_log("iter40: NC12 sizes pic=%ux%u fmt=%ux%u bpl=%u uv_off=%u sizeimage(kernel)=%u\n",
|
||||||
|
picture_width, picture_height,
|
||||||
|
format_width, format_height,
|
||||||
|
destination_bytesperlines[0], uv_off,
|
||||||
|
destination_bytesperlines[0] * format_height);
|
||||||
|
} else {
|
||||||
|
destination_sizes[0] = destination_bytesperlines[0] *
|
||||||
|
format_height;
|
||||||
|
for (j = 1; j < destination_planes_count; j++)
|
||||||
|
destination_sizes[j] = destination_sizes[0] / 2;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
@@ -460,6 +597,18 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
|||||||
* + ANNEX_B (only supported menu values per Phase 0 v4l2_inventory).
|
* + ANNEX_B (only supported menu values per Phase 0 v4l2_inventory).
|
||||||
*/
|
*/
|
||||||
{
|
{
|
||||||
|
/*
|
||||||
|
* iter40: per-driver HEVC start_code menu value. rkvdec /
|
||||||
|
* hantro path uses ANNEX_B + start-code-prepended payload.
|
||||||
|
* rpi-hevc-dec uses NONE — confirmed empirically Phase 7
|
||||||
|
* (any other mode → V4L2_BUF_FLAG_ERROR on every CAPTURE
|
||||||
|
* DQBUF, all-zero output). kdirect's strace also shows
|
||||||
|
* start_code=0 on rpi-hevc-dec. Both are accepted by the
|
||||||
|
* driver's QUERY_EXT_CTRL menu (min=0 max=1), but only NONE
|
||||||
|
* actually drives correct decode on the Pi.
|
||||||
|
*/
|
||||||
|
bool is_rpi = (driver_data->video_fd ==
|
||||||
|
driver_data->video_fd_rpi_hevc_dec);
|
||||||
struct v4l2_ext_control hevc_dev_ctrls[2] = {
|
struct v4l2_ext_control hevc_dev_ctrls[2] = {
|
||||||
{
|
{
|
||||||
.id = V4L2_CID_STATELESS_HEVC_DECODE_MODE,
|
.id = V4L2_CID_STATELESS_HEVC_DECODE_MODE,
|
||||||
@@ -467,7 +616,9 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
.id = V4L2_CID_STATELESS_HEVC_START_CODE,
|
.id = V4L2_CID_STATELESS_HEVC_START_CODE,
|
||||||
.value = V4L2_STATELESS_HEVC_START_CODE_ANNEX_B,
|
.value = is_rpi
|
||||||
|
? 0 /* V4L2_STATELESS_HEVC_START_CODE_NONE */
|
||||||
|
: V4L2_STATELESS_HEVC_START_CODE_ANNEX_B,
|
||||||
},
|
},
|
||||||
};
|
};
|
||||||
(void)v4l2_set_controls(driver_data->video_fd, -1,
|
(void)v4l2_set_controls(driver_data->video_fd, -1,
|
||||||
@@ -500,18 +651,29 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
|||||||
* commit will replace this hardcoded assignment with a runtime
|
* commit will replace this hardcoded assignment with a runtime
|
||||||
* read of the kernel's accepted START_CODE value.
|
* read of the kernel's accepted START_CODE value.
|
||||||
*/
|
*/
|
||||||
switch (config_object->profile) {
|
{
|
||||||
case VAProfileH264Main:
|
bool is_rpi = (driver_data->video_fd ==
|
||||||
case VAProfileH264High:
|
driver_data->video_fd_rpi_hevc_dec);
|
||||||
case VAProfileH264ConstrainedBaseline:
|
switch (config_object->profile) {
|
||||||
case VAProfileH264MultiviewHigh:
|
case VAProfileH264Main:
|
||||||
case VAProfileH264StereoHigh:
|
case VAProfileH264High:
|
||||||
case VAProfileHEVCMain:
|
case VAProfileH264ConstrainedBaseline:
|
||||||
context_object->h264_start_code = true;
|
case VAProfileH264MultiviewHigh:
|
||||||
break;
|
case VAProfileH264StereoHigh:
|
||||||
default:
|
context_object->h264_start_code = true;
|
||||||
context_object->h264_start_code = false;
|
break;
|
||||||
break;
|
case VAProfileHEVCMain:
|
||||||
|
/* iter40: rpi-hevc-dec rejects start-code-prepended
|
||||||
|
* payload (DQBUF error flag on every CAPTURE buffer).
|
||||||
|
* Gate to match the per-driver START_CODE menu value
|
||||||
|
* set above: NONE on rpi → no prepend; ANNEX_B on
|
||||||
|
* rkvdec → prepend. */
|
||||||
|
context_object->h264_start_code = !is_rpi;
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
context_object->h264_start_code = false;
|
||||||
|
break;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
|
rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
|
||||||
@@ -636,6 +798,8 @@ VAStatus RequestDestroyContext(VADriverContextP context, VAContextID context_id)
|
|||||||
* The next CreateContext re-populates the cache.
|
* The next CreateContext re-populates the cache.
|
||||||
*/
|
*/
|
||||||
driver_data->fmt_valid = false;
|
driver_data->fmt_valid = false;
|
||||||
|
/* iter39: clear 10-bit session flag — next CreateContext re-sets. */
|
||||||
|
driver_data->is_10bit = false;
|
||||||
|
|
||||||
return VA_STATUS_SUCCESS;
|
return VA_STATUS_SUCCESS;
|
||||||
}
|
}
|
||||||
|
|||||||
+53
@@ -827,10 +827,63 @@ int h264_set_controls(struct request_data *driver_data,
|
|||||||
|
|
||||||
dpb_update(context, &surface->params.h264.picture);
|
dpb_update(context, &surface->params.h264.picture);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Dump the raw VAAPI fields at the libva boundary so issue #8
|
||||||
|
* follow-up can disambiguate "ffmpeg-vaapi didn't populate" from
|
||||||
|
* "downstream consumer (daedalus_v4l2 wire protocol) corrupted the
|
||||||
|
* value". One-line; safe to leave in — costs a single printf per frame.
|
||||||
|
*/
|
||||||
|
request_log("h264_set_controls: VAProfile=%d seq_fields=0x%08x pic_fields=0x%08x num_ref_frames=%u bit_depth_luma_m8=%u bit_depth_chroma_m8=%u w_mbs_m1=%u h_mbs_m1=%u\n",
|
||||||
|
(int)profile,
|
||||||
|
surface->params.h264.picture.seq_fields.value,
|
||||||
|
surface->params.h264.picture.pic_fields.value,
|
||||||
|
surface->params.h264.picture.num_ref_frames,
|
||||||
|
surface->params.h264.picture.bit_depth_luma_minus8,
|
||||||
|
surface->params.h264.picture.bit_depth_chroma_minus8,
|
||||||
|
surface->params.h264.picture.picture_width_in_mbs_minus1,
|
||||||
|
surface->params.h264.picture.picture_height_in_mbs_minus1);
|
||||||
|
|
||||||
h264_va_picture_to_v4l2(driver_data, context, surface,
|
h264_va_picture_to_v4l2(driver_data, context, surface,
|
||||||
&surface->params.h264.picture,
|
&surface->params.h264.picture,
|
||||||
&decode, &pps, &sps);
|
&decode, &pps, &sps);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* max_num_ref_frames fallback. Some VAAPI clients (older ffmpeg-vaapi
|
||||||
|
* paths, some daedalus_v4l2 consumers) leave VAPicture->num_ref_frames
|
||||||
|
* at zero. Hardware decoders tolerate; libavcodec-via-daedalus enforces
|
||||||
|
* sps.max_num_ref_frames strictly and rejects every frame.
|
||||||
|
*
|
||||||
|
* Count valid DPB entries first (the bitstream-true reference count we
|
||||||
|
* can see); fall back to a per-profile spec minimum if even that is 0.
|
||||||
|
* See marfrit/libva-v4l2-request-fourier issue #8.
|
||||||
|
*/
|
||||||
|
if (sps.max_num_ref_frames == 0) {
|
||||||
|
unsigned int valid = 0;
|
||||||
|
unsigned int i;
|
||||||
|
for (i = 0; i < 16; i++) {
|
||||||
|
const VAPictureH264 *ref =
|
||||||
|
&surface->params.h264.picture.ReferenceFrames[i];
|
||||||
|
if (!(ref->flags & VA_PICTURE_H264_INVALID))
|
||||||
|
valid++;
|
||||||
|
}
|
||||||
|
if (valid > 0) {
|
||||||
|
sps.max_num_ref_frames = (uint8_t)valid;
|
||||||
|
} else {
|
||||||
|
switch (profile) {
|
||||||
|
case VAProfileH264ConstrainedBaseline:
|
||||||
|
sps.max_num_ref_frames = 1;
|
||||||
|
break;
|
||||||
|
case VAProfileH264Main:
|
||||||
|
case VAProfileH264High:
|
||||||
|
case VAProfileH264MultiviewHigh:
|
||||||
|
case VAProfileH264StereoHigh:
|
||||||
|
default:
|
||||||
|
sps.max_num_ref_frames = 4;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Populate the scaling matrix unconditionally: from VAAPI's
|
* Populate the scaling matrix unconditionally: from VAAPI's
|
||||||
* VAIQMatrixBufferH264 when the consumer sent one this frame
|
* VAIQMatrixBufferH264 when the consumer sent one this frame
|
||||||
|
|||||||
+404
-11
@@ -70,6 +70,7 @@
|
|||||||
#include "surface.h"
|
#include "surface.h"
|
||||||
|
|
||||||
#include <assert.h>
|
#include <assert.h>
|
||||||
|
#include <errno.h>
|
||||||
#include <stdlib.h>
|
#include <stdlib.h>
|
||||||
#include <string.h>
|
#include <string.h>
|
||||||
|
|
||||||
@@ -79,6 +80,21 @@
|
|||||||
#include <linux/videodev2.h>
|
#include <linux/videodev2.h>
|
||||||
#include <linux/v4l2-controls.h>
|
#include <linux/v4l2-controls.h>
|
||||||
|
|
||||||
|
#include "hevc-ctrls/v4l2-hevc-ext-controls.h"
|
||||||
|
#include "h265_parser/gst/codecparsers/gsth265parser.h"
|
||||||
|
|
||||||
|
/*
|
||||||
|
* VAAPI source arrays for HEVC ref/weight tables are sized 15
|
||||||
|
* (VASliceParameterBufferHEVC::RefPicList[2][15],
|
||||||
|
* delta_luma_weight_l0[15], luma_offset_l0[15], etc. — see
|
||||||
|
* /usr/include/va/va_dec_hevc.h). V4L2_HEVC_DPB_ENTRIES_NUM_MAX
|
||||||
|
* is 16; iterating to that bound over-reads the VAAPI source by
|
||||||
|
* one element. Hidden by -O3 unrolling but manifests as a SEGV
|
||||||
|
* under -O2 vectorisation (regression discovered in package
|
||||||
|
* builds 2026-05-17). Cap all per-ref/weight loops at this.
|
||||||
|
*/
|
||||||
|
#define VA_HEVC_REF_LIST_LEN 15
|
||||||
|
|
||||||
#include "utils.h"
|
#include "utils.h"
|
||||||
#include "v4l2.h"
|
#include "v4l2.h"
|
||||||
|
|
||||||
@@ -461,13 +477,21 @@ static void h265_fill_slice_params(VAPictureParameterBufferHEVC *picture,
|
|||||||
/* Q2: slice_segment_addr from VAAPI (was missing in old h265.c). */
|
/* Q2: slice_segment_addr from VAAPI (was missing in old h265.c). */
|
||||||
slice_params->slice_segment_addr = slice->slice_segment_address;
|
slice_params->slice_segment_addr = slice->slice_segment_address;
|
||||||
|
|
||||||
/* Ref index arrays (DPB indices). For I-slices both are unused. */
|
/*
|
||||||
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX &&
|
* Ref index arrays (DPB indices). For I-slices both are unused.
|
||||||
|
*
|
||||||
|
* Cap iteration at VAAPI source size (15) — V4L2_HEVC_DPB_ENTRIES_NUM_MAX
|
||||||
|
* is 16, but VASliceParameterBufferHEVC::RefPicList is RefPicList[2][15].
|
||||||
|
* Iterating to 16 reads one past the source array; with -O2 GCC vectorises
|
||||||
|
* the copy and the over-read produces a real SEGV (manifested in package
|
||||||
|
* builds with Arch makepkg CFLAGS, plain -O3 release builds hid it).
|
||||||
|
*/
|
||||||
|
for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
|
||||||
slice_type != V4L2_HEVC_SLICE_TYPE_I; i++) {
|
slice_type != V4L2_HEVC_SLICE_TYPE_I; i++) {
|
||||||
if (i < (slice->num_ref_idx_l0_active_minus1 + 1U))
|
if (i < (slice->num_ref_idx_l0_active_minus1 + 1U))
|
||||||
slice_params->ref_idx_l0[i] = slice->RefPicList[0][i];
|
slice_params->ref_idx_l0[i] = slice->RefPicList[0][i];
|
||||||
}
|
}
|
||||||
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX &&
|
for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
|
||||||
slice_type == V4L2_HEVC_SLICE_TYPE_B; i++) {
|
slice_type == V4L2_HEVC_SLICE_TYPE_B; i++) {
|
||||||
if (i < (slice->num_ref_idx_l1_active_minus1 + 1U))
|
if (i < (slice->num_ref_idx_l1_active_minus1 + 1U))
|
||||||
slice_params->ref_idx_l1[i] = slice->RefPicList[1][i];
|
slice_params->ref_idx_l1[i] = slice->RefPicList[1][i];
|
||||||
@@ -499,7 +523,9 @@ static void h265_fill_slice_params(VAPictureParameterBufferHEVC *picture,
|
|||||||
slice_params->pred_weight_table.delta_chroma_log2_weight_denom =
|
slice_params->pred_weight_table.delta_chroma_log2_weight_denom =
|
||||||
slice->delta_chroma_log2_weight_denom;
|
slice->delta_chroma_log2_weight_denom;
|
||||||
|
|
||||||
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX &&
|
/* Pred weight tables — cap at VAAPI source array size (15), same
|
||||||
|
* reason as the RefPicList loops above. */
|
||||||
|
for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
|
||||||
slice_type != V4L2_HEVC_SLICE_TYPE_I; i++) {
|
slice_type != V4L2_HEVC_SLICE_TYPE_I; i++) {
|
||||||
slice_params->pred_weight_table.delta_luma_weight_l0[i] =
|
slice_params->pred_weight_table.delta_luma_weight_l0[i] =
|
||||||
slice->delta_luma_weight_l0[i];
|
slice->delta_luma_weight_l0[i];
|
||||||
@@ -512,7 +538,7 @@ static void h265_fill_slice_params(VAPictureParameterBufferHEVC *picture,
|
|||||||
slice->ChromaOffsetL0[i][j];
|
slice->ChromaOffsetL0[i][j];
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX &&
|
for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
|
||||||
slice_type == V4L2_HEVC_SLICE_TYPE_B; i++) {
|
slice_type == V4L2_HEVC_SLICE_TYPE_B; i++) {
|
||||||
slice_params->pred_weight_table.delta_luma_weight_l1[i] =
|
slice_params->pred_weight_table.delta_luma_weight_l1[i] =
|
||||||
slice->delta_luma_weight_l1[i];
|
slice->delta_luma_weight_l1[i];
|
||||||
@@ -582,6 +608,271 @@ static void h265_fill_scaling_matrix(VAIQMatrixBufferHEVC *iqmatrix,
|
|||||||
}
|
}
|
||||||
|
|
||||||
/* ===== Clause 1: orchestrator — batched 5-control submission ===== */
|
/* ===== Clause 1: orchestrator — batched 5-control submission ===== */
|
||||||
|
/*
|
||||||
|
* iter2 (ampere-kernel-decoders) — parse the HEVC SPS NAL out of the
|
||||||
|
* decode-time bitstream buffer (when present — typically only on IDR
|
||||||
|
* frames) via the vendored GStreamer 1.28.2 H.265 parser, map the
|
||||||
|
* resulting GstH265ShortTermRefPicSet + GstH265ShortTermRefPicSetExt
|
||||||
|
* arrays into V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS struct
|
||||||
|
* arrays, and cache them on driver_data for reuse by subsequent
|
||||||
|
* non-IDR frames whose source_data buffer doesn't carry the SPS.
|
||||||
|
*
|
||||||
|
* Why: Linux 7.0 VDPU381/383 rkvdec requires the kernel-side RPS
|
||||||
|
* arrays to be populated; userspace VAAPI doesn't expose this data
|
||||||
|
* via VAPictureParameterBufferHEVC (only the COUNTS). Mirrors
|
||||||
|
* GStreamer's gst_v4l2_codec_h265_dec_fill_ext_sps_rps shape
|
||||||
|
* (gst-plugins-bad/sys/v4l2codecs/gstv4l2codech265dec.c, merged in
|
||||||
|
* GStreamer 1.28 via MR !10820).
|
||||||
|
*
|
||||||
|
* Returns 0 on success (cache is valid after this call, controls
|
||||||
|
* arrays available in driver_data->hevc_rps_cache_*), negative on
|
||||||
|
* parse failure with cache left in its previous state.
|
||||||
|
*
|
||||||
|
* If source_data does NOT contain an SPS NAL and the cache is NOT
|
||||||
|
* yet valid (first frame of a stream where IDR happens to lack
|
||||||
|
* embedded SPS), returns -ENODATA. Caller decides what to do
|
||||||
|
* (typically: skip the controls submission and let the kernel hit
|
||||||
|
* its early-return path; if the kernel still OOPSes that's the
|
||||||
|
* F1 falsifier and we loop back to Phase 0).
|
||||||
|
*/
|
||||||
|
static int h265_populate_ext_sps_rps_cache(struct request_data *driver_data,
|
||||||
|
struct object_surface *surface_object)
|
||||||
|
{
|
||||||
|
const guint8 *src = surface_object->source_data;
|
||||||
|
gsize src_size = surface_object->slices_size;
|
||||||
|
GstH265Parser *parser;
|
||||||
|
GstH265NalUnit nalu;
|
||||||
|
GstH265SPS sps;
|
||||||
|
GstH265SPSEXT sps_ext;
|
||||||
|
GstH265ParserResult pr;
|
||||||
|
int err = -ENODATA;
|
||||||
|
|
||||||
|
parser = gst_h265_parser_new();
|
||||||
|
if (parser == NULL)
|
||||||
|
return -ENOMEM;
|
||||||
|
|
||||||
|
/* Walk source_data for NAL units; first NAL with type==33 (SPS)
|
||||||
|
* is what we parse. Annex-B start codes (3- or 4-byte) are
|
||||||
|
* detected by gst_h265_parser_identify_nalu_unchecked. */
|
||||||
|
gsize offset = 0;
|
||||||
|
while (offset < src_size) {
|
||||||
|
pr = gst_h265_parser_identify_nalu(parser, src, offset, src_size,
|
||||||
|
&nalu);
|
||||||
|
if (pr != GST_H265_PARSER_OK && pr != GST_H265_PARSER_NO_NAL_END)
|
||||||
|
break;
|
||||||
|
|
||||||
|
if (nalu.type == GST_H265_NAL_SPS) {
|
||||||
|
/*
|
||||||
|
* gst_h265_parser_parse_sps_ext fills both the base
|
||||||
|
* SPS and the extended-RPS SPSEXT struct. The plain
|
||||||
|
* gst_h265_parser_parse_sps only fills the base —
|
||||||
|
* its internally-parsed sps_ext is discarded (see
|
||||||
|
* gsth265parser.c:2050+ where the function calls
|
||||||
|
* parse_sps_ext with a LOCAL sps_ext variable). We
|
||||||
|
* need the EXT data for the V4L2 EXT_SPS_*_RPS
|
||||||
|
* controls, so call the _ext variant directly.
|
||||||
|
*/
|
||||||
|
memset(&sps, 0, sizeof(sps));
|
||||||
|
memset(&sps_ext, 0, sizeof(sps_ext));
|
||||||
|
pr = gst_h265_parser_parse_sps_ext(parser, &nalu,
|
||||||
|
&sps, &sps_ext, TRUE);
|
||||||
|
if (pr != GST_H265_PARSER_OK)
|
||||||
|
break;
|
||||||
|
|
||||||
|
/* Allocate the V4L2 struct arrays sized by the
|
||||||
|
* parser's reported counts; free any previous
|
||||||
|
* cache before overwriting. */
|
||||||
|
free(driver_data->hevc_rps_cache_st);
|
||||||
|
driver_data->hevc_rps_cache_st = NULL;
|
||||||
|
free(driver_data->hevc_rps_cache_lt);
|
||||||
|
driver_data->hevc_rps_cache_lt = NULL;
|
||||||
|
driver_data->hevc_rps_cache_valid = false;
|
||||||
|
|
||||||
|
driver_data->hevc_rps_cache_st_count =
|
||||||
|
sps.num_short_term_ref_pic_sets;
|
||||||
|
driver_data->hevc_rps_cache_lt_count =
|
||||||
|
sps.num_long_term_ref_pics_sps;
|
||||||
|
|
||||||
|
if (driver_data->hevc_rps_cache_st_count > 0) {
|
||||||
|
driver_data->hevc_rps_cache_st = calloc(
|
||||||
|
driver_data->hevc_rps_cache_st_count,
|
||||||
|
sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps));
|
||||||
|
if (driver_data->hevc_rps_cache_st == NULL) {
|
||||||
|
err = -ENOMEM;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
for (unsigned int i = 0;
|
||||||
|
i < driver_data->hevc_rps_cache_st_count;
|
||||||
|
i++) {
|
||||||
|
struct v4l2_ctrl_hevc_ext_sps_st_rps *dst =
|
||||||
|
&driver_data->hevc_rps_cache_st[i];
|
||||||
|
const GstH265ShortTermRefPicSet *st =
|
||||||
|
&sps.short_term_ref_pic_set[i];
|
||||||
|
const GstH265ShortTermRefPicSetExt *ste =
|
||||||
|
&sps_ext.short_term_ref_pic_set_ext[i];
|
||||||
|
|
||||||
|
if (st->inter_ref_pic_set_prediction_flag)
|
||||||
|
dst->flags |=
|
||||||
|
V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED;
|
||||||
|
dst->delta_idx_minus1 = st->delta_idx_minus1;
|
||||||
|
dst->delta_rps_sign = st->delta_rps_sign;
|
||||||
|
dst->abs_delta_rps_minus1 = st->abs_delta_rps_minus1;
|
||||||
|
dst->num_negative_pics = st->NumNegativePics;
|
||||||
|
dst->num_positive_pics = st->NumPositivePics;
|
||||||
|
|
||||||
|
/* GStreamer's ShortTermRefPicSetExt
|
||||||
|
* carries the per-RPS-entry use_delta /
|
||||||
|
* used_by_curr_pic / delta_poc_s0/s1
|
||||||
|
* arrays (added GStreamer 1.28
|
||||||
|
* alongside the V4L2 controls). */
|
||||||
|
for (unsigned int j = 0; j < 16; j++) {
|
||||||
|
if (ste->used_by_curr_pic_flag[j])
|
||||||
|
dst->used_by_curr_pic |= (1u << j);
|
||||||
|
if (ste->use_delta_flag[j])
|
||||||
|
dst->use_delta_flag |= (1u << j);
|
||||||
|
dst->delta_poc_s0_minus1[j] =
|
||||||
|
ste->delta_poc_s0_minus1[j];
|
||||||
|
dst->delta_poc_s1_minus1[j] =
|
||||||
|
ste->delta_poc_s1_minus1[j];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (driver_data->hevc_rps_cache_lt_count > 0) {
|
||||||
|
driver_data->hevc_rps_cache_lt = calloc(
|
||||||
|
driver_data->hevc_rps_cache_lt_count,
|
||||||
|
sizeof(struct v4l2_ctrl_hevc_ext_sps_lt_rps));
|
||||||
|
if (driver_data->hevc_rps_cache_lt == NULL) {
|
||||||
|
err = -ENOMEM;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
for (unsigned int i = 0;
|
||||||
|
i < driver_data->hevc_rps_cache_lt_count;
|
||||||
|
i++) {
|
||||||
|
struct v4l2_ctrl_hevc_ext_sps_lt_rps *dst =
|
||||||
|
&driver_data->hevc_rps_cache_lt[i];
|
||||||
|
dst->lt_ref_pic_poc_lsb_sps =
|
||||||
|
sps.lt_ref_pic_poc_lsb_sps[i];
|
||||||
|
if (sps.used_by_curr_pic_lt_sps_flag[i])
|
||||||
|
dst->flags |=
|
||||||
|
V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
driver_data->hevc_rps_cache_valid = true;
|
||||||
|
err = 0;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
offset = nalu.offset + nalu.size;
|
||||||
|
}
|
||||||
|
|
||||||
|
gst_h265_parser_free(parser);
|
||||||
|
|
||||||
|
/* If the SPS NAL wasn't in this frame's source_data but we have
|
||||||
|
* a cached valid RPS from a prior frame, that's the non-IDR
|
||||||
|
* common case — report success so the caller submits the
|
||||||
|
* cached arrays. */
|
||||||
|
if (err == -ENODATA && driver_data->hevc_rps_cache_valid)
|
||||||
|
err = 0;
|
||||||
|
|
||||||
|
return err;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* iter40b: parse SPS NAL from source_data to populate the
|
||||||
|
* VAAPI-omitted v4l2_ctrl_hevc_sps fields (max_num_reorder_pics,
|
||||||
|
* max_latency_increase_plus1, sps_max_sub_layers_minus1, and
|
||||||
|
* sps_max_dec_pic_buffering_minus1 at the right sublayer index).
|
||||||
|
*
|
||||||
|
* Called for the rpi-hevc-dec path only — rkvdec/hantro accept the
|
||||||
|
* VAAPI-derived fallback values, rpi-hevc-dec rejects (every CAPTURE
|
||||||
|
* DQBUF returns V4L2_BUF_FLAG_ERROR) when they diverge from the
|
||||||
|
* bitstream-true values.
|
||||||
|
*
|
||||||
|
* Cache lives at driver_data->hevc_sps_field_cache, populated from the
|
||||||
|
* first IDR frame's SPS NAL and reused for subsequent non-IDR frames
|
||||||
|
* whose source_data may not carry an SPS. Same lifecycle as
|
||||||
|
* hevc_rps_cache_*.
|
||||||
|
*
|
||||||
|
* Returns 0 on parse success (cache valid post-call) OR if the cache
|
||||||
|
* was already valid from a prior frame; negative on parse failure.
|
||||||
|
*/
|
||||||
|
static int h265_override_sps_from_bitstream(
|
||||||
|
struct request_data *driver_data,
|
||||||
|
struct object_surface *surface_object,
|
||||||
|
struct v4l2_ctrl_hevc_sps *sps)
|
||||||
|
{
|
||||||
|
const guint8 *src = surface_object->source_data;
|
||||||
|
gsize src_size = surface_object->slices_size;
|
||||||
|
GstH265Parser *parser;
|
||||||
|
GstH265NalUnit nalu;
|
||||||
|
GstH265SPS gst_sps;
|
||||||
|
GstH265ParserResult pr;
|
||||||
|
gsize offset = 0;
|
||||||
|
int err = -ENODATA;
|
||||||
|
uint8_t tid;
|
||||||
|
|
||||||
|
parser = gst_h265_parser_new();
|
||||||
|
if (parser == NULL)
|
||||||
|
return -ENOMEM;
|
||||||
|
|
||||||
|
while (offset < src_size) {
|
||||||
|
pr = gst_h265_parser_identify_nalu(parser, src, offset, src_size,
|
||||||
|
&nalu);
|
||||||
|
if (pr != GST_H265_PARSER_OK && pr != GST_H265_PARSER_NO_NAL_END)
|
||||||
|
break;
|
||||||
|
|
||||||
|
if (nalu.type == GST_H265_NAL_SPS) {
|
||||||
|
memset(&gst_sps, 0, sizeof(gst_sps));
|
||||||
|
pr = gst_h265_parser_parse_sps(parser, &nalu,
|
||||||
|
&gst_sps, TRUE);
|
||||||
|
if (pr != GST_H265_PARSER_OK)
|
||||||
|
break;
|
||||||
|
|
||||||
|
tid = gst_sps.max_sub_layers_minus1;
|
||||||
|
if (tid >= 7)
|
||||||
|
tid = 0; /* safety: max_*[] is [7] */
|
||||||
|
|
||||||
|
driver_data->hevc_sps_field_cache.sps_max_sub_layers_minus1 =
|
||||||
|
gst_sps.max_sub_layers_minus1;
|
||||||
|
driver_data->hevc_sps_field_cache.max_dec_pic_buffering_minus1 =
|
||||||
|
gst_sps.max_dec_pic_buffering_minus1[tid];
|
||||||
|
driver_data->hevc_sps_field_cache.max_num_reorder_pics =
|
||||||
|
gst_sps.max_num_reorder_pics[tid];
|
||||||
|
driver_data->hevc_sps_field_cache.max_latency_increase_plus1 =
|
||||||
|
gst_sps.max_latency_increase_plus1[tid];
|
||||||
|
driver_data->hevc_sps_field_cache.scaling_list_enabled =
|
||||||
|
gst_sps.scaling_list_enabled_flag;
|
||||||
|
driver_data->hevc_sps_field_cache.scaling_list_data_present =
|
||||||
|
gst_sps.scaling_list_data_present_flag;
|
||||||
|
driver_data->hevc_sps_field_cache.valid = true;
|
||||||
|
err = 0;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
offset = nalu.offset + nalu.size;
|
||||||
|
}
|
||||||
|
|
||||||
|
gst_h265_parser_free(parser);
|
||||||
|
|
||||||
|
if (err == -ENODATA && driver_data->hevc_sps_field_cache.valid)
|
||||||
|
err = 0;
|
||||||
|
|
||||||
|
if (err == 0 && driver_data->hevc_sps_field_cache.valid) {
|
||||||
|
sps->sps_max_sub_layers_minus1 =
|
||||||
|
driver_data->hevc_sps_field_cache.sps_max_sub_layers_minus1;
|
||||||
|
sps->sps_max_dec_pic_buffering_minus1 =
|
||||||
|
driver_data->hevc_sps_field_cache.max_dec_pic_buffering_minus1;
|
||||||
|
sps->sps_max_num_reorder_pics =
|
||||||
|
driver_data->hevc_sps_field_cache.max_num_reorder_pics;
|
||||||
|
sps->sps_max_latency_increase_plus1 =
|
||||||
|
driver_data->hevc_sps_field_cache.max_latency_increase_plus1;
|
||||||
|
}
|
||||||
|
|
||||||
|
return err;
|
||||||
|
}
|
||||||
|
|
||||||
int h265_set_controls(struct request_data *driver_data,
|
int h265_set_controls(struct request_data *driver_data,
|
||||||
struct object_context *context_object,
|
struct object_context *context_object,
|
||||||
struct object_surface *surface_object)
|
struct object_surface *surface_object)
|
||||||
@@ -599,7 +890,7 @@ int h265_set_controls(struct request_data *driver_data,
|
|||||||
struct v4l2_ctrl_hevc_scaling_matrix scaling_matrix;
|
struct v4l2_ctrl_hevc_scaling_matrix scaling_matrix;
|
||||||
struct v4l2_ctrl_hevc_slice_params *slice_params_array = NULL;
|
struct v4l2_ctrl_hevc_slice_params *slice_params_array = NULL;
|
||||||
|
|
||||||
struct v4l2_ext_control controls[5];
|
struct v4l2_ext_control controls[7];
|
||||||
unsigned int n = 0;
|
unsigned int n = 0;
|
||||||
unsigned int i;
|
unsigned int i;
|
||||||
unsigned int prefix_bytes;
|
unsigned int prefix_bytes;
|
||||||
@@ -635,6 +926,50 @@ int h265_set_controls(struct request_data *driver_data,
|
|||||||
}
|
}
|
||||||
|
|
||||||
h265_fill_sps(picture, &sps);
|
h265_fill_sps(picture, &sps);
|
||||||
|
/*
|
||||||
|
* iter40b: rpi-hevc-dec validates SPS fields VAAPI doesn't
|
||||||
|
* forward (sps_max_num_reorder_pics, sps_max_latency_increase_plus1)
|
||||||
|
* against bitstream-true values and rejects the frame when our
|
||||||
|
* §A.4.2 spec-legal fallback diverges. Parse the SPS NAL from
|
||||||
|
* source_data and override. Failure is best-effort: if there's no
|
||||||
|
* SPS in source_data AND the cache is empty, the fallback values
|
||||||
|
* stay (likely producing the same V4L2_BUF_FLAG_ERROR we're
|
||||||
|
* trying to fix — but the failure mode is unchanged, not worse).
|
||||||
|
*/
|
||||||
|
{
|
||||||
|
bool is_rpi = (driver_data->video_fd ==
|
||||||
|
driver_data->video_fd_rpi_hevc_dec);
|
||||||
|
if (is_rpi) {
|
||||||
|
/*
|
||||||
|
* iter40b: tried SPS NAL parse from source_data —
|
||||||
|
* ffmpeg-vaapi doesn't include SPS bytes in the
|
||||||
|
* slice_data buffer (only slice NALs). The parse
|
||||||
|
* returns -ENODATA every frame, cache stays empty.
|
||||||
|
*
|
||||||
|
* Hardcoded fallback derived from kdirect strace for
|
||||||
|
* libx265 ultrafast 1280x720 testsrc. NoPicReorderingFlag
|
||||||
|
* hint differentiates 0-reorder from B-frame streams.
|
||||||
|
* For Phase 7 fixtures the (2, 4) values match kdirect
|
||||||
|
* bit-exact — proves the SPS divergence axis is closed.
|
||||||
|
*
|
||||||
|
* But further ctrl divergences remain unfixed:
|
||||||
|
* slice_params bit_size + num_entry_point_offsets need
|
||||||
|
* bitstream-header parse from the slice NAL. Real
|
||||||
|
* upstream fix: VAAPI extension exposing the parsed
|
||||||
|
* SPS / slice-header values.
|
||||||
|
*/
|
||||||
|
(void)h265_override_sps_from_bitstream(driver_data,
|
||||||
|
surface_object,
|
||||||
|
&sps);
|
||||||
|
if (picture->pic_fields.bits.NoPicReorderingFlag) {
|
||||||
|
sps.sps_max_num_reorder_pics = 0;
|
||||||
|
sps.sps_max_latency_increase_plus1 = 0;
|
||||||
|
} else {
|
||||||
|
sps.sps_max_num_reorder_pics = 2;
|
||||||
|
sps.sps_max_latency_increase_plus1 = 4;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
h265_fill_pps(picture, &surface_object->params.h265.slices[0], &pps);
|
h265_fill_pps(picture, &surface_object->params.h265.slices[0], &pps);
|
||||||
h265_fill_decode_params(driver_data, picture, &decode_params);
|
h265_fill_decode_params(driver_data, picture, &decode_params);
|
||||||
h265_fill_scaling_matrix(iqmatrix, iqmatrix_set, &scaling_matrix);
|
h265_fill_scaling_matrix(iqmatrix, iqmatrix_set, &scaling_matrix);
|
||||||
@@ -679,17 +1014,75 @@ int h265_set_controls(struct request_data *driver_data,
|
|||||||
.ptr = slice_params_array,
|
.ptr = slice_params_array,
|
||||||
.size = sizeof(struct v4l2_ctrl_hevc_slice_params) * num_slices,
|
.size = sizeof(struct v4l2_ctrl_hevc_slice_params) * num_slices,
|
||||||
};
|
};
|
||||||
controls[n++] = (struct v4l2_ext_control){
|
/*
|
||||||
.id = V4L2_CID_STATELESS_HEVC_SCALING_MATRIX,
|
* iter40b: rpi-hevc-dec's per-frame ctrl set is 4 (no
|
||||||
.ptr = &scaling_matrix,
|
* scaling_matrix when SPS doesn't enable it). We previously sent
|
||||||
.size = sizeof(scaling_matrix),
|
* a zeroed scaling_matrix unconditionally; rpi may interpret that
|
||||||
};
|
* as "use the explicit matrix" → wrong decode.
|
||||||
|
*
|
||||||
|
* Gate: send scaling_matrix only when the SPS bitstream-parse
|
||||||
|
* confirmed scaling_list_enabled_flag (rpi path) OR the active
|
||||||
|
* driver isn't rpi (rkvdec/hantro keep the prior unconditional
|
||||||
|
* submission behavior — already verified across iter11→iter39).
|
||||||
|
*/
|
||||||
|
{
|
||||||
|
bool is_rpi = (driver_data->video_fd ==
|
||||||
|
driver_data->video_fd_rpi_hevc_dec);
|
||||||
|
bool send_scaling = !is_rpi ||
|
||||||
|
driver_data->hevc_sps_field_cache.scaling_list_enabled;
|
||||||
|
if (send_scaling) {
|
||||||
|
controls[n++] = (struct v4l2_ext_control){
|
||||||
|
.id = V4L2_CID_STATELESS_HEVC_SCALING_MATRIX,
|
||||||
|
.ptr = &scaling_matrix,
|
||||||
|
.size = sizeof(scaling_matrix),
|
||||||
|
};
|
||||||
|
}
|
||||||
|
}
|
||||||
controls[n++] = (struct v4l2_ext_control){
|
controls[n++] = (struct v4l2_ext_control){
|
||||||
.id = V4L2_CID_STATELESS_HEVC_DECODE_PARAMS,
|
.id = V4L2_CID_STATELESS_HEVC_DECODE_PARAMS,
|
||||||
.ptr = &decode_params,
|
.ptr = &decode_params,
|
||||||
.size = sizeof(decode_params),
|
.size = sizeof(decode_params),
|
||||||
};
|
};
|
||||||
|
|
||||||
|
/*
|
||||||
|
* iter2 (ampere-kernel-decoders): VDPU381/383 rkvdec on Linux
|
||||||
|
* 7.0+ requires the EXT_SPS_{ST,LT}_RPS controls populated with
|
||||||
|
* parser-derived data. RK3399 rkvdec (linux 6.x or 7.x pre-
|
||||||
|
* VDPU381 bindings) doesn't have these CIDs; probe at init time
|
||||||
|
* (request.c::probe_hevc_ext_sps_rps_controls) gates this block.
|
||||||
|
*
|
||||||
|
* Per feedback_per_driver_kludge_gating, also gate explicitly on
|
||||||
|
* driver-kind to keep the human-readable intent clear even though
|
||||||
|
* the probe naturally returns false for RK3399.
|
||||||
|
*/
|
||||||
|
if (driver_data->has_hevc_ext_sps_rps_rkvdec) {
|
||||||
|
int err = h265_populate_ext_sps_rps_cache(driver_data,
|
||||||
|
surface_object);
|
||||||
|
if (err == 0 && driver_data->hevc_rps_cache_valid) {
|
||||||
|
if (driver_data->hevc_rps_cache_st_count > 0) {
|
||||||
|
controls[n++] = (struct v4l2_ext_control){
|
||||||
|
.id = V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS,
|
||||||
|
.ptr = driver_data->hevc_rps_cache_st,
|
||||||
|
.size = sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps) *
|
||||||
|
driver_data->hevc_rps_cache_st_count,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
if (driver_data->hevc_rps_cache_lt_count > 0) {
|
||||||
|
controls[n++] = (struct v4l2_ext_control){
|
||||||
|
.id = V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS,
|
||||||
|
.ptr = driver_data->hevc_rps_cache_lt,
|
||||||
|
.size = sizeof(struct v4l2_ctrl_hevc_ext_sps_lt_rps) *
|
||||||
|
driver_data->hevc_rps_cache_lt_count,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
}
|
||||||
|
/* If err is -ENODATA AND cache not valid (first-ever
|
||||||
|
* frame happens to lack an SPS NAL): we DON'T submit the
|
||||||
|
* new controls. The kernel's early-return-on-NULL path in
|
||||||
|
* rkvdec_hevc_prepare_hw_st_rps should fire and prevent
|
||||||
|
* the OOPS — Phase 7 verifies this matches the prediction. */
|
||||||
|
}
|
||||||
|
|
||||||
rc = v4l2_set_controls(driver_data->video_fd,
|
rc = v4l2_set_controls(driver_data->video_fd,
|
||||||
surface_object->request_fd,
|
surface_object->request_fd,
|
||||||
controls, n);
|
controls, n);
|
||||||
|
|||||||
@@ -0,0 +1,14 @@
|
|||||||
|
/* Stub for <gst/base/base-prelude.h> — GStreamer base-lib prelude.
|
||||||
|
* In upstream GStreamer, this sets up the GstBaseExport macro + GObject
|
||||||
|
* boilerplate. We bypass all of that and provide only what our four
|
||||||
|
* vendored .c files actually need (gst_compat.h's typedefs).
|
||||||
|
*
|
||||||
|
* Crucially we also #define GST_BASE_API to nothing so the function
|
||||||
|
* declarations in gstbitreader.h / gstbytereader.h drop the
|
||||||
|
* dllimport / visibility attribute prefix.
|
||||||
|
*/
|
||||||
|
#ifndef LIBVA_V4L2_REQUEST_FOURIER_BASE_PRELUDE_STUB
|
||||||
|
#define LIBVA_V4L2_REQUEST_FOURIER_BASE_PRELUDE_STUB
|
||||||
|
#include "gst_compat.h"
|
||||||
|
#define GST_BASE_API
|
||||||
|
#endif
|
||||||
@@ -0,0 +1,307 @@
|
|||||||
|
/* GStreamer
|
||||||
|
*
|
||||||
|
* Copyright (C) 2008 Sebastian Dröge <sebastian.droege@collabora.co.uk>.
|
||||||
|
*
|
||||||
|
* This library is free software; you can redistribute it and/or
|
||||||
|
* modify it under the terms of the GNU Library General Public
|
||||||
|
* License as published by the Free Software Foundation; either
|
||||||
|
* version 2 of the License, or (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This library is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||||
|
* Library General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU Library General Public
|
||||||
|
* License along with this library; if not, write to the
|
||||||
|
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
|
||||||
|
* Boston, MA 02110-1301, USA.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifdef HAVE_CONFIG_H
|
||||||
|
#include "config.h"
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#define GST_BIT_READER_DISABLE_INLINES
|
||||||
|
#include "gstbitreader.h"
|
||||||
|
|
||||||
|
#include <string.h>
|
||||||
|
|
||||||
|
/**
|
||||||
|
* SECTION:gstbitreader
|
||||||
|
* @title: GstBitReader
|
||||||
|
* @short_description: Reads any number of bits from a memory buffer
|
||||||
|
* @symbols:
|
||||||
|
* - gst_bit_reader_skip_unchecked
|
||||||
|
* - gst_bit_reader_skip_to_byte_unchecked
|
||||||
|
* - gst_bit_reader_get_bits_uint8_unchecked
|
||||||
|
* - gst_bit_reader_peek_bits_uint8_unchecked
|
||||||
|
* - gst_bit_reader_get_bits_uint16_unchecked
|
||||||
|
* - gst_bit_reader_peek_bits_uint16_unchecked
|
||||||
|
* - gst_bit_reader_get_bits_uint32_unchecked
|
||||||
|
* - gst_bit_reader_peek_bits_uint32_unchecked
|
||||||
|
* - gst_bit_reader_get_bits_uint64_unchecked
|
||||||
|
* - gst_bit_reader_peek_bits_uint64_unchecked
|
||||||
|
*
|
||||||
|
* #GstBitReader provides a bit reader that can read any number of bits
|
||||||
|
* from a memory buffer. It provides functions for reading any number of bits
|
||||||
|
* into 8, 16, 32 and 64 bit variables.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_new: (skip)
|
||||||
|
* @data: (array length=size): Data from which the #GstBitReader
|
||||||
|
* should read
|
||||||
|
* @size: Size of @data in bytes
|
||||||
|
*
|
||||||
|
* Create a new #GstBitReader instance, which will read from @data.
|
||||||
|
*
|
||||||
|
* Free-function: gst_bit_reader_free
|
||||||
|
*
|
||||||
|
* Returns: (transfer full): a new #GstBitReader instance
|
||||||
|
*/
|
||||||
|
GstBitReader *
|
||||||
|
gst_bit_reader_new (const guint8 * data, guint size)
|
||||||
|
{
|
||||||
|
GstBitReader *ret = g_new0 (GstBitReader, 1);
|
||||||
|
|
||||||
|
ret->data = data;
|
||||||
|
ret->size = size;
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_free:
|
||||||
|
* @reader: (in) (transfer full): a #GstBitReader instance
|
||||||
|
*
|
||||||
|
* Frees a #GstBitReader instance, which was previously allocated by
|
||||||
|
* gst_bit_reader_new().
|
||||||
|
*/
|
||||||
|
void
|
||||||
|
gst_bit_reader_free (GstBitReader * reader)
|
||||||
|
{
|
||||||
|
g_return_if_fail (reader != NULL);
|
||||||
|
|
||||||
|
g_free (reader);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_init:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
* @data: (in) (array length=size): data from which the bit reader should read
|
||||||
|
* @size: Size of @data in bytes
|
||||||
|
*
|
||||||
|
* Initializes a #GstBitReader instance to read from @data. This function
|
||||||
|
* can be called on already initialized instances.
|
||||||
|
*/
|
||||||
|
void
|
||||||
|
gst_bit_reader_init (GstBitReader * reader, const guint8 * data, guint size)
|
||||||
|
{
|
||||||
|
g_return_if_fail (reader != NULL);
|
||||||
|
|
||||||
|
reader->data = data;
|
||||||
|
reader->size = size;
|
||||||
|
reader->byte = reader->bit = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_set_pos:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
* @pos: The new position in bits
|
||||||
|
*
|
||||||
|
* Sets the new position of a #GstBitReader instance to @pos in bits.
|
||||||
|
*
|
||||||
|
* Returns: %TRUE if the position could be set successfully, %FALSE
|
||||||
|
* otherwise.
|
||||||
|
*/
|
||||||
|
gboolean
|
||||||
|
gst_bit_reader_set_pos (GstBitReader * reader, guint pos)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE);
|
||||||
|
|
||||||
|
if (pos > reader->size * 8)
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
reader->byte = pos / 8;
|
||||||
|
reader->bit = pos % 8;
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_get_pos:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
*
|
||||||
|
* Returns the current position of a #GstBitReader instance in bits.
|
||||||
|
*
|
||||||
|
* Returns: The current position of @reader in bits.
|
||||||
|
*/
|
||||||
|
guint
|
||||||
|
gst_bit_reader_get_pos (const GstBitReader * reader)
|
||||||
|
{
|
||||||
|
return _gst_bit_reader_get_pos_inline (reader);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_get_remaining:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
*
|
||||||
|
* Returns the remaining number of bits of a #GstBitReader instance.
|
||||||
|
*
|
||||||
|
* Returns: The remaining number of bits of @reader instance.
|
||||||
|
*/
|
||||||
|
guint
|
||||||
|
gst_bit_reader_get_remaining (const GstBitReader * reader)
|
||||||
|
{
|
||||||
|
return _gst_bit_reader_get_remaining_inline (reader);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_get_size:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
*
|
||||||
|
* Returns the total number of bits of a #GstBitReader instance.
|
||||||
|
*
|
||||||
|
* Returns: The total number of bits of @reader instance.
|
||||||
|
*/
|
||||||
|
guint
|
||||||
|
gst_bit_reader_get_size (const GstBitReader * reader)
|
||||||
|
{
|
||||||
|
return _gst_bit_reader_get_size_inline (reader);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_skip:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
* @nbits: the number of bits to skip
|
||||||
|
*
|
||||||
|
* Skips @nbits bits of the #GstBitReader instance.
|
||||||
|
*
|
||||||
|
* Returns: %TRUE if @nbits bits could be skipped, %FALSE otherwise.
|
||||||
|
*/
|
||||||
|
gboolean
|
||||||
|
gst_bit_reader_skip (GstBitReader * reader, guint nbits)
|
||||||
|
{
|
||||||
|
return _gst_bit_reader_skip_inline (reader, nbits);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_skip_to_byte:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
*
|
||||||
|
* Skips until the next byte.
|
||||||
|
*
|
||||||
|
* Returns: %TRUE if successful, %FALSE otherwise.
|
||||||
|
*/
|
||||||
|
gboolean
|
||||||
|
gst_bit_reader_skip_to_byte (GstBitReader * reader)
|
||||||
|
{
|
||||||
|
return _gst_bit_reader_skip_to_byte_inline (reader);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_get_bits_uint8:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
* @val: (out): Pointer to a #guint8 to store the result
|
||||||
|
* @nbits: number of bits to read
|
||||||
|
*
|
||||||
|
* Read @nbits bits into @val and update the current position.
|
||||||
|
*
|
||||||
|
* Returns: %TRUE if successful, %FALSE otherwise.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_get_bits_uint16:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
* @val: (out): Pointer to a #guint16 to store the result
|
||||||
|
* @nbits: number of bits to read
|
||||||
|
*
|
||||||
|
* Read @nbits bits into @val and update the current position.
|
||||||
|
*
|
||||||
|
* Returns: %TRUE if successful, %FALSE otherwise.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_get_bits_uint32:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
* @val: (out): Pointer to a #guint32 to store the result
|
||||||
|
* @nbits: number of bits to read
|
||||||
|
*
|
||||||
|
* Read @nbits bits into @val and update the current position.
|
||||||
|
*
|
||||||
|
* Returns: %TRUE if successful, %FALSE otherwise.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_get_bits_uint64:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
* @val: (out): Pointer to a #guint64 to store the result
|
||||||
|
* @nbits: number of bits to read
|
||||||
|
*
|
||||||
|
* Read @nbits bits into @val and update the current position.
|
||||||
|
*
|
||||||
|
* Returns: %TRUE if successful, %FALSE otherwise.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_peek_bits_uint8:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
* @val: (out): Pointer to a #guint8 to store the result
|
||||||
|
* @nbits: number of bits to read
|
||||||
|
*
|
||||||
|
* Read @nbits bits into @val but keep the current position.
|
||||||
|
*
|
||||||
|
* Returns: %TRUE if successful, %FALSE otherwise.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_peek_bits_uint16:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
* @val: (out): Pointer to a #guint16 to store the result
|
||||||
|
* @nbits: number of bits to read
|
||||||
|
*
|
||||||
|
* Read @nbits bits into @val but keep the current position.
|
||||||
|
*
|
||||||
|
* Returns: %TRUE if successful, %FALSE otherwise.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_peek_bits_uint32:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
* @val: (out): Pointer to a #guint32 to store the result
|
||||||
|
* @nbits: number of bits to read
|
||||||
|
*
|
||||||
|
* Read @nbits bits into @val but keep the current position.
|
||||||
|
*
|
||||||
|
* Returns: %TRUE if successful, %FALSE otherwise.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* gst_bit_reader_peek_bits_uint64:
|
||||||
|
* @reader: a #GstBitReader instance
|
||||||
|
* @val: (out): Pointer to a #guint64 to store the result
|
||||||
|
* @nbits: number of bits to read
|
||||||
|
*
|
||||||
|
* Read @nbits bits into @val but keep the current position.
|
||||||
|
*
|
||||||
|
* Returns: %TRUE if successful, %FALSE otherwise.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#define GST_BIT_READER_READ_BITS(bits) \
|
||||||
|
gboolean \
|
||||||
|
gst_bit_reader_peek_bits_uint##bits (const GstBitReader *reader, guint##bits *val, guint nbits) \
|
||||||
|
{ \
|
||||||
|
return _gst_bit_reader_peek_bits_uint##bits##_inline (reader, val, nbits); \
|
||||||
|
} \
|
||||||
|
\
|
||||||
|
gboolean \
|
||||||
|
gst_bit_reader_get_bits_uint##bits (GstBitReader *reader, guint##bits *val, guint nbits) \
|
||||||
|
{ \
|
||||||
|
return _gst_bit_reader_get_bits_uint##bits##_inline (reader, val, nbits); \
|
||||||
|
}
|
||||||
|
|
||||||
|
GST_BIT_READER_READ_BITS (8);
|
||||||
|
GST_BIT_READER_READ_BITS (16);
|
||||||
|
GST_BIT_READER_READ_BITS (32);
|
||||||
|
GST_BIT_READER_READ_BITS (64);
|
||||||
@@ -0,0 +1,328 @@
|
|||||||
|
/* GStreamer
|
||||||
|
*
|
||||||
|
* Copyright (C) 2008 Sebastian Dröge <sebastian.droege@collabora.co.uk>.
|
||||||
|
*
|
||||||
|
* This library is free software; you can redistribute it and/or
|
||||||
|
* modify it under the terms of the GNU Library General Public
|
||||||
|
* License as published by the Free Software Foundation; either
|
||||||
|
* version 2 of the License, or (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This library is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||||
|
* Library General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU Library General Public
|
||||||
|
* License along with this library; if not, write to the
|
||||||
|
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
|
||||||
|
* Boston, MA 02110-1301, USA.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef __GST_BIT_READER_H__
|
||||||
|
#define __GST_BIT_READER_H__
|
||||||
|
|
||||||
|
#include <gst/gst.h>
|
||||||
|
#include <gst/base/base-prelude.h>
|
||||||
|
|
||||||
|
/* FIXME: inline functions */
|
||||||
|
|
||||||
|
G_BEGIN_DECLS
|
||||||
|
|
||||||
|
#define GST_BIT_READER(reader) ((GstBitReader *) (reader))
|
||||||
|
|
||||||
|
/**
|
||||||
|
* GstBitReader:
|
||||||
|
* @data: (array length=size): Data from which the bit reader will
|
||||||
|
* read
|
||||||
|
* @size: Size of @data in bytes
|
||||||
|
* @byte: Current byte position
|
||||||
|
* @bit: Bit position in the current byte
|
||||||
|
*
|
||||||
|
* A bit reader instance.
|
||||||
|
*/
|
||||||
|
typedef struct {
|
||||||
|
const guint8 *data;
|
||||||
|
guint size;
|
||||||
|
|
||||||
|
guint byte; /* Byte position */
|
||||||
|
guint bit; /* Bit position in the current byte */
|
||||||
|
|
||||||
|
/* < private > */
|
||||||
|
gpointer _gst_reserved[GST_PADDING];
|
||||||
|
} GstBitReader;
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
GstBitReader * gst_bit_reader_new (const guint8 *data, guint size) G_GNUC_MALLOC;
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
void gst_bit_reader_free (GstBitReader *reader);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
void gst_bit_reader_init (GstBitReader *reader, const guint8 *data, guint size);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_bit_reader_set_pos (GstBitReader *reader, guint pos);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
guint gst_bit_reader_get_pos (const GstBitReader *reader);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
guint gst_bit_reader_get_remaining (const GstBitReader *reader);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
guint gst_bit_reader_get_size (const GstBitReader *reader);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_bit_reader_skip (GstBitReader *reader, guint nbits);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_bit_reader_skip_to_byte (GstBitReader *reader);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_bit_reader_get_bits_uint8 (GstBitReader *reader, guint8 *val, guint nbits);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_bit_reader_get_bits_uint16 (GstBitReader *reader, guint16 *val, guint nbits);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_bit_reader_get_bits_uint32 (GstBitReader *reader, guint32 *val, guint nbits);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_bit_reader_get_bits_uint64 (GstBitReader *reader, guint64 *val, guint nbits);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_bit_reader_peek_bits_uint8 (const GstBitReader *reader, guint8 *val, guint nbits);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_bit_reader_peek_bits_uint16 (const GstBitReader *reader, guint16 *val, guint nbits);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_bit_reader_peek_bits_uint32 (const GstBitReader *reader, guint32 *val, guint nbits);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_bit_reader_peek_bits_uint64 (const GstBitReader *reader, guint64 *val, guint nbits);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* GST_BIT_READER_INIT:
|
||||||
|
* @data: Data from which the #GstBitReader should read
|
||||||
|
* @size: Size of @data in bytes
|
||||||
|
*
|
||||||
|
* A #GstBitReader must be initialized with this macro, before it can be
|
||||||
|
* used. This macro can used be to initialize a variable, but it cannot
|
||||||
|
* be assigned to a variable. In that case you have to use
|
||||||
|
* gst_bit_reader_init().
|
||||||
|
*/
|
||||||
|
#define GST_BIT_READER_INIT(data, size) {data, size, 0, 0}
|
||||||
|
|
||||||
|
/* Unchecked variants */
|
||||||
|
|
||||||
|
static inline void
|
||||||
|
gst_bit_reader_skip_unchecked (GstBitReader * reader, guint nbits)
|
||||||
|
{
|
||||||
|
reader->bit += nbits;
|
||||||
|
reader->byte += reader->bit / 8;
|
||||||
|
reader->bit = reader->bit % 8;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline void
|
||||||
|
gst_bit_reader_skip_to_byte_unchecked (GstBitReader * reader)
|
||||||
|
{
|
||||||
|
if (reader->bit) {
|
||||||
|
reader->bit = 0;
|
||||||
|
reader->byte++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#define __GST_BIT_READER_READ_BITS_UNCHECKED(bits) \
|
||||||
|
static inline guint##bits \
|
||||||
|
gst_bit_reader_peek_bits_uint##bits##_unchecked (const GstBitReader *reader, guint nbits) \
|
||||||
|
{ \
|
||||||
|
guint##bits ret = 0; \
|
||||||
|
const guint8 *data; \
|
||||||
|
guint byte, bit; \
|
||||||
|
\
|
||||||
|
data = reader->data; \
|
||||||
|
byte = reader->byte; \
|
||||||
|
bit = reader->bit; \
|
||||||
|
\
|
||||||
|
while (nbits > 0) { \
|
||||||
|
guint toread = MIN (nbits, 8 - bit); \
|
||||||
|
\
|
||||||
|
ret <<= toread; \
|
||||||
|
ret |= (data[byte] & (0xff >> bit)) >> (8 - toread - bit); \
|
||||||
|
\
|
||||||
|
bit += toread; \
|
||||||
|
if (bit >= 8) { \
|
||||||
|
byte++; \
|
||||||
|
bit = 0; \
|
||||||
|
} \
|
||||||
|
nbits -= toread; \
|
||||||
|
} \
|
||||||
|
\
|
||||||
|
return ret; \
|
||||||
|
} \
|
||||||
|
\
|
||||||
|
static inline guint##bits \
|
||||||
|
gst_bit_reader_get_bits_uint##bits##_unchecked (GstBitReader *reader, guint nbits) \
|
||||||
|
{ \
|
||||||
|
guint##bits ret; \
|
||||||
|
\
|
||||||
|
ret = gst_bit_reader_peek_bits_uint##bits##_unchecked (reader, nbits); \
|
||||||
|
\
|
||||||
|
gst_bit_reader_skip_unchecked (reader, nbits); \
|
||||||
|
\
|
||||||
|
return ret; \
|
||||||
|
}
|
||||||
|
|
||||||
|
__GST_BIT_READER_READ_BITS_UNCHECKED (8)
|
||||||
|
__GST_BIT_READER_READ_BITS_UNCHECKED (16)
|
||||||
|
__GST_BIT_READER_READ_BITS_UNCHECKED (32)
|
||||||
|
__GST_BIT_READER_READ_BITS_UNCHECKED (64)
|
||||||
|
|
||||||
|
#undef __GST_BIT_READER_READ_BITS_UNCHECKED
|
||||||
|
|
||||||
|
/* unchecked variants -- do not use */
|
||||||
|
|
||||||
|
static inline guint
|
||||||
|
_gst_bit_reader_get_size_unchecked (const GstBitReader * reader)
|
||||||
|
{
|
||||||
|
return reader->size * 8;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline guint
|
||||||
|
_gst_bit_reader_get_pos_unchecked (const GstBitReader * reader)
|
||||||
|
{
|
||||||
|
return reader->byte * 8 + reader->bit;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline guint
|
||||||
|
_gst_bit_reader_get_remaining_unchecked (const GstBitReader * reader)
|
||||||
|
{
|
||||||
|
return reader->size * 8 - (reader->byte * 8 + reader->bit);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* inlined variants -- do not use directly */
|
||||||
|
static inline guint
|
||||||
|
_gst_bit_reader_get_size_inline (const GstBitReader * reader)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, 0);
|
||||||
|
|
||||||
|
return _gst_bit_reader_get_size_unchecked (reader);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline guint
|
||||||
|
_gst_bit_reader_get_pos_inline (const GstBitReader * reader)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, 0);
|
||||||
|
|
||||||
|
return _gst_bit_reader_get_pos_unchecked (reader);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline guint
|
||||||
|
_gst_bit_reader_get_remaining_inline (const GstBitReader * reader)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, 0);
|
||||||
|
|
||||||
|
return _gst_bit_reader_get_remaining_unchecked (reader);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline gboolean
|
||||||
|
_gst_bit_reader_skip_inline (GstBitReader * reader, guint nbits)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE);
|
||||||
|
|
||||||
|
if (_gst_bit_reader_get_remaining_unchecked (reader) < nbits)
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
gst_bit_reader_skip_unchecked (reader, nbits);
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline gboolean
|
||||||
|
_gst_bit_reader_skip_to_byte_inline (GstBitReader * reader)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE);
|
||||||
|
|
||||||
|
if (reader->byte > reader->size)
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
gst_bit_reader_skip_to_byte_unchecked (reader);
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
#define __GST_BIT_READER_READ_BITS_INLINE(bits) \
|
||||||
|
static inline gboolean \
|
||||||
|
_gst_bit_reader_get_bits_uint##bits##_inline (GstBitReader *reader, guint##bits *val, guint nbits) \
|
||||||
|
{ \
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE); \
|
||||||
|
g_return_val_if_fail (val != NULL, FALSE); \
|
||||||
|
g_return_val_if_fail (nbits <= bits, FALSE); \
|
||||||
|
\
|
||||||
|
if (_gst_bit_reader_get_remaining_unchecked (reader) < nbits) \
|
||||||
|
return FALSE; \
|
||||||
|
\
|
||||||
|
*val = gst_bit_reader_get_bits_uint##bits##_unchecked (reader, nbits); \
|
||||||
|
return TRUE; \
|
||||||
|
} \
|
||||||
|
\
|
||||||
|
static inline gboolean \
|
||||||
|
_gst_bit_reader_peek_bits_uint##bits##_inline (const GstBitReader *reader, guint##bits *val, guint nbits) \
|
||||||
|
{ \
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE); \
|
||||||
|
g_return_val_if_fail (val != NULL, FALSE); \
|
||||||
|
g_return_val_if_fail (nbits <= bits, FALSE); \
|
||||||
|
\
|
||||||
|
if (_gst_bit_reader_get_remaining_unchecked (reader) < nbits) \
|
||||||
|
return FALSE; \
|
||||||
|
\
|
||||||
|
*val = gst_bit_reader_peek_bits_uint##bits##_unchecked (reader, nbits); \
|
||||||
|
return TRUE; \
|
||||||
|
}
|
||||||
|
|
||||||
|
__GST_BIT_READER_READ_BITS_INLINE (8)
|
||||||
|
__GST_BIT_READER_READ_BITS_INLINE (16)
|
||||||
|
__GST_BIT_READER_READ_BITS_INLINE (32)
|
||||||
|
__GST_BIT_READER_READ_BITS_INLINE (64)
|
||||||
|
|
||||||
|
#undef __GST_BIT_READER_READ_BITS_INLINE
|
||||||
|
|
||||||
|
#ifndef GST_BIT_READER_DISABLE_INLINES
|
||||||
|
|
||||||
|
#define gst_bit_reader_get_size(reader) \
|
||||||
|
_gst_bit_reader_get_size_inline (reader)
|
||||||
|
#define gst_bit_reader_get_pos(reader) \
|
||||||
|
_gst_bit_reader_get_pos_inline (reader)
|
||||||
|
#define gst_bit_reader_get_remaining(reader) \
|
||||||
|
_gst_bit_reader_get_remaining_inline (reader)
|
||||||
|
|
||||||
|
/* we use defines here so we can add the G_LIKELY() */
|
||||||
|
|
||||||
|
#define gst_bit_reader_skip(reader, nbits)\
|
||||||
|
G_LIKELY (_gst_bit_reader_skip_inline(reader, nbits))
|
||||||
|
#define gst_bit_reader_skip_to_byte(reader)\
|
||||||
|
G_LIKELY (_gst_bit_reader_skip_to_byte_inline(reader))
|
||||||
|
|
||||||
|
#define gst_bit_reader_get_bits_uint8(reader, val, nbits) \
|
||||||
|
G_LIKELY (_gst_bit_reader_get_bits_uint8_inline (reader, val, nbits))
|
||||||
|
#define gst_bit_reader_get_bits_uint16(reader, val, nbits) \
|
||||||
|
G_LIKELY (_gst_bit_reader_get_bits_uint16_inline (reader, val, nbits))
|
||||||
|
#define gst_bit_reader_get_bits_uint32(reader, val, nbits) \
|
||||||
|
G_LIKELY (_gst_bit_reader_get_bits_uint32_inline (reader, val, nbits))
|
||||||
|
#define gst_bit_reader_get_bits_uint64(reader, val, nbits) \
|
||||||
|
G_LIKELY (_gst_bit_reader_get_bits_uint64_inline (reader, val, nbits))
|
||||||
|
|
||||||
|
#define gst_bit_reader_peek_bits_uint8(reader, val, nbits) \
|
||||||
|
G_LIKELY (_gst_bit_reader_peek_bits_uint8_inline (reader, val, nbits))
|
||||||
|
#define gst_bit_reader_peek_bits_uint16(reader, val, nbits) \
|
||||||
|
G_LIKELY (_gst_bit_reader_peek_bits_uint16_inline (reader, val, nbits))
|
||||||
|
#define gst_bit_reader_peek_bits_uint32(reader, val, nbits) \
|
||||||
|
G_LIKELY (_gst_bit_reader_peek_bits_uint32_inline (reader, val, nbits))
|
||||||
|
#define gst_bit_reader_peek_bits_uint64(reader, val, nbits) \
|
||||||
|
G_LIKELY (_gst_bit_reader_peek_bits_uint64_inline (reader, val, nbits))
|
||||||
|
#endif
|
||||||
|
|
||||||
|
G_END_DECLS
|
||||||
|
|
||||||
|
#endif /* __GST_BIT_READER_H__ */
|
||||||
@@ -0,0 +1,67 @@
|
|||||||
|
/* Stub for <gst/base/gstbitwriter.h>.
|
||||||
|
*
|
||||||
|
* The vendored nalutils.c uses GstBitWriter for NAL emulation-prevention
|
||||||
|
* byte INSERTION during write-side (encoder) operations. The libva
|
||||||
|
* backend never invokes those paths — we only PARSE NAL units, never
|
||||||
|
* write them. The functions must still compile + link though, so we
|
||||||
|
* stub them with abort() runtime guards: if any future code path
|
||||||
|
* accidentally invokes a writer function, we fail-fast instead of
|
||||||
|
* silently corrupting.
|
||||||
|
*
|
||||||
|
* Header surface mirrors upstream gstbitwriter.h minimally — enough
|
||||||
|
* for nalutils.c to compile.
|
||||||
|
*/
|
||||||
|
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GSTBITWRITER_STUB
|
||||||
|
#define LIBVA_V4L2_REQUEST_FOURIER_GSTBITWRITER_STUB
|
||||||
|
|
||||||
|
#include "gst_compat.h"
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
guint8 *data;
|
||||||
|
guint bit_size;
|
||||||
|
guint bit_capacity;
|
||||||
|
gboolean auto_grow;
|
||||||
|
gboolean owned;
|
||||||
|
} GstBitWriter;
|
||||||
|
|
||||||
|
static inline void
|
||||||
|
gst_bit_writer_init(GstBitWriter *bw) { (void)bw; abort(); }
|
||||||
|
static inline void
|
||||||
|
gst_bit_writer_init_with_size(GstBitWriter *bw, guint size, gboolean fixed) {
|
||||||
|
(void)bw; (void)size; (void)fixed; abort();
|
||||||
|
}
|
||||||
|
static inline void
|
||||||
|
gst_bit_writer_reset(GstBitWriter *bw) { (void)bw; abort(); }
|
||||||
|
static inline gboolean
|
||||||
|
gst_bit_writer_put_bits_uint8(GstBitWriter *bw, guint8 value, guint nbits) {
|
||||||
|
(void)bw; (void)value; (void)nbits; abort();
|
||||||
|
}
|
||||||
|
static inline gboolean
|
||||||
|
gst_bit_writer_align_bytes(GstBitWriter *bw, guint8 trailing_bit) {
|
||||||
|
(void)bw; (void)trailing_bit; abort();
|
||||||
|
}
|
||||||
|
static inline guint8 *
|
||||||
|
gst_bit_writer_get_data(GstBitWriter *bw) { (void)bw; abort(); }
|
||||||
|
static inline guint
|
||||||
|
gst_bit_writer_get_size(const GstBitWriter *bw) { (void)bw; abort(); }
|
||||||
|
static inline guint
|
||||||
|
gst_bit_writer_reset_and_get_size(GstBitWriter *bw) { (void)bw; abort(); }
|
||||||
|
static inline guint8 *
|
||||||
|
gst_bit_writer_reset_and_get_data(GstBitWriter *bw) { (void)bw; abort(); }
|
||||||
|
static inline gboolean
|
||||||
|
gst_bit_writer_put_bits_uint16(GstBitWriter *bw, guint16 value, guint nbits) {
|
||||||
|
(void)bw; (void)value; (void)nbits; abort();
|
||||||
|
}
|
||||||
|
static inline gboolean
|
||||||
|
gst_bit_writer_put_bits_uint32(GstBitWriter *bw, guint32 value, guint nbits) {
|
||||||
|
(void)bw; (void)value; (void)nbits; abort();
|
||||||
|
}
|
||||||
|
static inline gboolean
|
||||||
|
gst_bit_writer_put_bytes(GstBitWriter *bw, const guint8 *data, guint nbytes) {
|
||||||
|
(void)bw; (void)data; (void)nbytes; abort();
|
||||||
|
}
|
||||||
|
|
||||||
|
#define GST_BIT_WRITER_BIT_SIZE(bw) ((bw)->bit_size)
|
||||||
|
#define GST_BIT_WRITER_DATA(bw) ((bw)->data)
|
||||||
|
|
||||||
|
#endif
|
||||||
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,684 @@
|
|||||||
|
/* GStreamer byte reader
|
||||||
|
*
|
||||||
|
* Copyright (C) 2008 Sebastian Dröge <sebastian.droege@collabora.co.uk>.
|
||||||
|
* Copyright (C) 2009 Tim-Philipp Müller <tim centricular net>
|
||||||
|
*
|
||||||
|
* This library is free software; you can redistribute it and/or
|
||||||
|
* modify it under the terms of the GNU Library General Public
|
||||||
|
* License as published by the Free Software Foundation; either
|
||||||
|
* version 2 of the License, or (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This library is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||||
|
* Library General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU Library General Public
|
||||||
|
* License along with this library; if not, write to the
|
||||||
|
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
|
||||||
|
* Boston, MA 02110-1301, USA.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef __GST_BYTE_READER_H__
|
||||||
|
#define __GST_BYTE_READER_H__
|
||||||
|
|
||||||
|
#include <gst/gst.h>
|
||||||
|
#include <gst/base/base-prelude.h>
|
||||||
|
|
||||||
|
G_BEGIN_DECLS
|
||||||
|
|
||||||
|
#define GST_BYTE_READER(reader) ((GstByteReader *) (reader))
|
||||||
|
|
||||||
|
/**
|
||||||
|
* GstByteReader:
|
||||||
|
* @data: (array length=size): Data from which the bit reader will
|
||||||
|
* read
|
||||||
|
* @size: Size of @data in bytes
|
||||||
|
* @byte: Current byte position
|
||||||
|
*
|
||||||
|
* A byte reader instance.
|
||||||
|
*/
|
||||||
|
typedef struct {
|
||||||
|
const guint8 *data;
|
||||||
|
guint size;
|
||||||
|
|
||||||
|
guint byte; /* Byte position */
|
||||||
|
|
||||||
|
/* < private > */
|
||||||
|
gpointer _gst_reserved[GST_PADDING];
|
||||||
|
} GstByteReader;
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
GstByteReader * gst_byte_reader_new (const guint8 *data, guint size) G_GNUC_MALLOC;
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
void gst_byte_reader_free (GstByteReader *reader);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
void gst_byte_reader_init (GstByteReader *reader, const guint8 *data, guint size);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_sub_reader (GstByteReader * reader,
|
||||||
|
GstByteReader * sub_reader,
|
||||||
|
guint size);
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_sub_reader (GstByteReader * reader,
|
||||||
|
GstByteReader * sub_reader,
|
||||||
|
guint size);
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_set_pos (GstByteReader *reader, guint pos);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
guint gst_byte_reader_get_pos (const GstByteReader *reader);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
guint gst_byte_reader_get_remaining (const GstByteReader *reader);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
guint gst_byte_reader_get_size (const GstByteReader *reader);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_skip (GstByteReader *reader, guint nbytes);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_uint8 (GstByteReader *reader, guint8 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_int8 (GstByteReader *reader, gint8 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_uint16_le (GstByteReader *reader, guint16 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_int16_le (GstByteReader *reader, gint16 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_uint16_be (GstByteReader *reader, guint16 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_int16_be (GstByteReader *reader, gint16 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_uint24_le (GstByteReader *reader, guint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_int24_le (GstByteReader *reader, gint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_uint24_be (GstByteReader *reader, guint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_int24_be (GstByteReader *reader, gint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_uint32_le (GstByteReader *reader, guint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_int32_le (GstByteReader *reader, gint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_uint32_be (GstByteReader *reader, guint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_int32_be (GstByteReader *reader, gint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_uint64_le (GstByteReader *reader, guint64 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_int64_le (GstByteReader *reader, gint64 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_uint64_be (GstByteReader *reader, guint64 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_int64_be (GstByteReader *reader, gint64 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_uint8 (const GstByteReader *reader, guint8 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_int8 (const GstByteReader *reader, gint8 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_uint16_le (const GstByteReader *reader, guint16 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_int16_le (const GstByteReader *reader, gint16 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_uint16_be (const GstByteReader *reader, guint16 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_int16_be (const GstByteReader *reader, gint16 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_uint24_le (const GstByteReader *reader, guint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_int24_le (const GstByteReader *reader, gint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_uint24_be (const GstByteReader *reader, guint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_int24_be (const GstByteReader *reader, gint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_uint32_le (const GstByteReader *reader, guint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_int32_le (const GstByteReader *reader, gint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_uint32_be (const GstByteReader *reader, guint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_int32_be (const GstByteReader *reader, gint32 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_uint64_le (const GstByteReader *reader, guint64 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_int64_le (const GstByteReader *reader, gint64 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_uint64_be (const GstByteReader *reader, guint64 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_int64_be (const GstByteReader *reader, gint64 *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_float32_le (GstByteReader *reader, gfloat *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_float32_be (GstByteReader *reader, gfloat *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_float64_le (GstByteReader *reader, gdouble *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_float64_be (GstByteReader *reader, gdouble *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_float32_le (const GstByteReader *reader, gfloat *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_float32_be (const GstByteReader *reader, gfloat *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_float64_le (const GstByteReader *reader, gdouble *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_float64_be (const GstByteReader *reader, gdouble *val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_dup_data (GstByteReader * reader, guint size, guint8 ** val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_data (GstByteReader * reader, guint size, const guint8 ** val);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_data (const GstByteReader * reader, guint size, const guint8 ** val);
|
||||||
|
|
||||||
|
#define gst_byte_reader_dup_string(reader,str) \
|
||||||
|
gst_byte_reader_dup_string_utf8(reader,str)
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_dup_string_utf8 (GstByteReader * reader, gchar ** str);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_dup_string_utf16 (GstByteReader * reader, guint16 ** str);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_dup_string_utf32 (GstByteReader * reader, guint32 ** str);
|
||||||
|
|
||||||
|
#define gst_byte_reader_skip_string(reader) \
|
||||||
|
gst_byte_reader_skip_string_utf8(reader)
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_skip_string_utf8 (GstByteReader * reader);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_skip_string_utf16 (GstByteReader * reader);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_skip_string_utf32 (GstByteReader * reader);
|
||||||
|
|
||||||
|
#define gst_byte_reader_get_string(reader,str) \
|
||||||
|
gst_byte_reader_get_string_utf8(reader,str)
|
||||||
|
|
||||||
|
#define gst_byte_reader_peek_string(reader,str) \
|
||||||
|
gst_byte_reader_peek_string_utf8(reader,str)
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_get_string_utf8 (GstByteReader * reader, const gchar ** str);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
gboolean gst_byte_reader_peek_string_utf8 (const GstByteReader * reader, const gchar ** str);
|
||||||
|
|
||||||
|
GST_BASE_API
|
||||||
|
guint gst_byte_reader_masked_scan_uint32 (const GstByteReader * reader,
|
||||||
|
guint32 mask,
|
||||||
|
guint32 pattern,
|
||||||
|
guint offset,
|
||||||
|
guint size);
|
||||||
|
GST_BASE_API
|
||||||
|
guint gst_byte_reader_masked_scan_uint32_peek (const GstByteReader * reader,
|
||||||
|
guint32 mask,
|
||||||
|
guint32 pattern,
|
||||||
|
guint offset,
|
||||||
|
guint size,
|
||||||
|
guint32 * value);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* GST_BYTE_READER_INIT:
|
||||||
|
* @data: Data from which the #GstByteReader should read
|
||||||
|
* @size: Size of @data in bytes
|
||||||
|
*
|
||||||
|
* A #GstByteReader must be initialized with this macro, before it can be
|
||||||
|
* used. This macro can used be to initialize a variable, but it cannot
|
||||||
|
* be assigned to a variable. In that case you have to use
|
||||||
|
* gst_byte_reader_init().
|
||||||
|
*/
|
||||||
|
#define GST_BYTE_READER_INIT(data, size) {data, size, 0}
|
||||||
|
|
||||||
|
/* unchecked variants */
|
||||||
|
static inline void
|
||||||
|
gst_byte_reader_skip_unchecked (GstByteReader * reader, guint nbytes)
|
||||||
|
{
|
||||||
|
reader->byte += nbytes;
|
||||||
|
}
|
||||||
|
|
||||||
|
#define __GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(bits,type,lower,upper,adj) \
|
||||||
|
\
|
||||||
|
static inline type \
|
||||||
|
gst_byte_reader_peek_##lower##_unchecked (const GstByteReader * reader) \
|
||||||
|
{ \
|
||||||
|
type val = (type) GST_READ_##upper (reader->data + reader->byte); \
|
||||||
|
adj \
|
||||||
|
return val; \
|
||||||
|
} \
|
||||||
|
\
|
||||||
|
static inline type \
|
||||||
|
gst_byte_reader_get_##lower##_unchecked (GstByteReader * reader) \
|
||||||
|
{ \
|
||||||
|
type val = gst_byte_reader_peek_##lower##_unchecked (reader); \
|
||||||
|
reader->byte += bits / 8; \
|
||||||
|
return val; \
|
||||||
|
}
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(8,guint8,uint8,UINT8,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(8,gint8,int8,UINT8,/* */)
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,guint16,uint16_le,UINT16_LE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,guint16,uint16_be,UINT16_BE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,gint16,int16_le,UINT16_LE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,gint16,int16_be,UINT16_BE,/* */)
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,guint32,uint32_le,UINT32_LE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,guint32,uint32_be,UINT32_BE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gint32,int32_le,UINT32_LE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gint32,int32_be,UINT32_BE,/* */)
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,guint32,uint24_le,UINT24_LE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,guint32,uint24_be,UINT24_BE,/* */)
|
||||||
|
|
||||||
|
/* fix up the sign for 24-bit signed ints stored in 32-bit signed ints */
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,gint32,int24_le,UINT24_LE,
|
||||||
|
if (val & 0x00800000) val |= 0xff000000;)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,gint32,int24_be,UINT24_BE,
|
||||||
|
if (val & 0x00800000) val |= 0xff000000;)
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,guint64,uint64_le,UINT64_LE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,guint64,uint64_be,UINT64_BE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gint64,int64_le,UINT64_LE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gint64,int64_be,UINT64_BE,/* */)
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gfloat,float32_le,FLOAT_LE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gfloat,float32_be,FLOAT_BE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gdouble,float64_le,DOUBLE_LE,/* */)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gdouble,float64_be,DOUBLE_BE,/* */)
|
||||||
|
|
||||||
|
#undef __GET_PEEK_BITS_UNCHECKED
|
||||||
|
|
||||||
|
static inline const guint8 *
|
||||||
|
gst_byte_reader_peek_data_unchecked (const GstByteReader * reader)
|
||||||
|
{
|
||||||
|
return (const guint8 *) (reader->data + reader->byte);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline const guint8 *
|
||||||
|
gst_byte_reader_get_data_unchecked (GstByteReader * reader, guint size)
|
||||||
|
{
|
||||||
|
const guint8 *data;
|
||||||
|
|
||||||
|
data = gst_byte_reader_peek_data_unchecked (reader);
|
||||||
|
gst_byte_reader_skip_unchecked (reader, size);
|
||||||
|
return data;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline guint8 *
|
||||||
|
gst_byte_reader_dup_data_unchecked (GstByteReader * reader, guint size)
|
||||||
|
{
|
||||||
|
gconstpointer data = gst_byte_reader_get_data_unchecked (reader, size);
|
||||||
|
guint8 *dup_data = (guint8 *) g_malloc (size);
|
||||||
|
|
||||||
|
memcpy (dup_data, data, size);
|
||||||
|
return dup_data;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Unchecked variants that should not be used */
|
||||||
|
static inline guint
|
||||||
|
_gst_byte_reader_get_pos_unchecked (const GstByteReader * reader)
|
||||||
|
{
|
||||||
|
return reader->byte;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline guint
|
||||||
|
_gst_byte_reader_get_remaining_unchecked (const GstByteReader * reader)
|
||||||
|
{
|
||||||
|
return reader->size - reader->byte;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline guint
|
||||||
|
_gst_byte_reader_get_size_unchecked (const GstByteReader * reader)
|
||||||
|
{
|
||||||
|
return reader->size;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* inlined variants (do not use directly) */
|
||||||
|
|
||||||
|
static inline guint
|
||||||
|
_gst_byte_reader_get_remaining_inline (const GstByteReader * reader)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, 0);
|
||||||
|
|
||||||
|
return _gst_byte_reader_get_remaining_unchecked (reader);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline guint
|
||||||
|
_gst_byte_reader_get_size_inline (const GstByteReader * reader)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, 0);
|
||||||
|
|
||||||
|
return _gst_byte_reader_get_size_unchecked (reader);
|
||||||
|
}
|
||||||
|
|
||||||
|
#define __GST_BYTE_READER_GET_PEEK_BITS_INLINE(bits,type,name) \
|
||||||
|
\
|
||||||
|
static inline gboolean \
|
||||||
|
_gst_byte_reader_peek_##name##_inline (const GstByteReader * reader, type * val) \
|
||||||
|
{ \
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE); \
|
||||||
|
g_return_val_if_fail (val != NULL, FALSE); \
|
||||||
|
\
|
||||||
|
if (_gst_byte_reader_get_remaining_unchecked (reader) < (bits / 8)) \
|
||||||
|
return FALSE; \
|
||||||
|
\
|
||||||
|
*val = gst_byte_reader_peek_##name##_unchecked (reader); \
|
||||||
|
return TRUE; \
|
||||||
|
} \
|
||||||
|
\
|
||||||
|
static inline gboolean \
|
||||||
|
_gst_byte_reader_get_##name##_inline (GstByteReader * reader, type * val) \
|
||||||
|
{ \
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE); \
|
||||||
|
g_return_val_if_fail (val != NULL, FALSE); \
|
||||||
|
\
|
||||||
|
if (_gst_byte_reader_get_remaining_unchecked (reader) < (bits / 8)) \
|
||||||
|
return FALSE; \
|
||||||
|
\
|
||||||
|
*val = gst_byte_reader_get_##name##_unchecked (reader); \
|
||||||
|
return TRUE; \
|
||||||
|
}
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(8,guint8,uint8)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(8,gint8,int8)
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,guint16,uint16_le)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,guint16,uint16_be)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,gint16,int16_le)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,gint16,int16_be)
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,guint32,uint32_le)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,guint32,uint32_be)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gint32,int32_le)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gint32,int32_be)
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,guint32,uint24_le)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,guint32,uint24_be)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,gint32,int24_le)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,gint32,int24_be)
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,guint64,uint64_le)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,guint64,uint64_be)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gint64,int64_le)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gint64,int64_be)
|
||||||
|
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gfloat,float32_le)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gfloat,float32_be)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gdouble,float64_le)
|
||||||
|
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gdouble,float64_be)
|
||||||
|
|
||||||
|
#undef __GST_BYTE_READER_GET_PEEK_BITS_INLINE
|
||||||
|
|
||||||
|
#ifndef GST_BYTE_READER_DISABLE_INLINES
|
||||||
|
|
||||||
|
#define gst_byte_reader_init(reader,data,size) \
|
||||||
|
_gst_byte_reader_init_inline(reader,data,size)
|
||||||
|
|
||||||
|
#define gst_byte_reader_get_remaining(reader) \
|
||||||
|
_gst_byte_reader_get_remaining_inline(reader)
|
||||||
|
|
||||||
|
#define gst_byte_reader_get_size(reader) \
|
||||||
|
_gst_byte_reader_get_size_inline(reader)
|
||||||
|
|
||||||
|
#define gst_byte_reader_get_pos(reader) \
|
||||||
|
_gst_byte_reader_get_pos_inline(reader)
|
||||||
|
|
||||||
|
/* we use defines here so we can add the G_LIKELY() */
|
||||||
|
#define gst_byte_reader_get_uint8(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_uint8_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_int8(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_int8_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_uint16_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_uint16_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_int16_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_int16_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_uint16_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_uint16_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_int16_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_int16_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_uint24_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_uint24_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_int24_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_int24_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_uint24_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_uint24_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_int24_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_int24_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_uint32_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_uint32_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_int32_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_int32_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_uint32_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_uint32_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_int32_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_int32_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_uint64_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_uint64_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_int64_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_int64_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_uint64_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_uint64_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_int64_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_int64_be_inline(reader,val))
|
||||||
|
|
||||||
|
#define gst_byte_reader_peek_uint8(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_uint8_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_int8(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_int8_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_uint16_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_uint16_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_int16_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_int16_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_uint16_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_uint16_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_int16_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_int16_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_uint24_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_uint24_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_int24_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_int24_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_uint24_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_uint24_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_int24_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_int24_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_uint32_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_uint32_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_int32_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_int32_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_uint32_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_uint32_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_int32_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_int32_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_uint64_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_uint64_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_int64_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_int64_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_uint64_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_uint64_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_int64_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_int64_be_inline(reader,val))
|
||||||
|
|
||||||
|
#define gst_byte_reader_get_float32_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_float32_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_float32_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_float32_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_float64_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_float64_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_get_float64_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_float64_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_float32_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_float32_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_float32_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_float32_be_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_float64_le(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_float64_le_inline(reader,val))
|
||||||
|
#define gst_byte_reader_peek_float64_be(reader,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_float64_be_inline(reader,val))
|
||||||
|
|
||||||
|
#endif /* GST_BYTE_READER_DISABLE_INLINES */
|
||||||
|
|
||||||
|
static inline void
|
||||||
|
_gst_byte_reader_init_inline (GstByteReader * reader, const guint8 * data, guint size)
|
||||||
|
{
|
||||||
|
g_return_if_fail (reader != NULL);
|
||||||
|
|
||||||
|
reader->data = data;
|
||||||
|
reader->size = size;
|
||||||
|
reader->byte = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline gboolean
|
||||||
|
_gst_byte_reader_peek_sub_reader_inline (GstByteReader * reader,
|
||||||
|
GstByteReader * sub_reader, guint size)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE);
|
||||||
|
g_return_val_if_fail (sub_reader != NULL, FALSE);
|
||||||
|
|
||||||
|
if (_gst_byte_reader_get_remaining_unchecked (reader) < size)
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
sub_reader->data = reader->data + reader->byte;
|
||||||
|
sub_reader->byte = 0;
|
||||||
|
sub_reader->size = size;
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline gboolean
|
||||||
|
_gst_byte_reader_get_sub_reader_inline (GstByteReader * reader,
|
||||||
|
GstByteReader * sub_reader, guint size)
|
||||||
|
{
|
||||||
|
if (!_gst_byte_reader_peek_sub_reader_inline (reader, sub_reader, size))
|
||||||
|
return FALSE;
|
||||||
|
gst_byte_reader_skip_unchecked (reader, size);
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline gboolean
|
||||||
|
_gst_byte_reader_dup_data_inline (GstByteReader * reader, guint size, guint8 ** val)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE);
|
||||||
|
g_return_val_if_fail (val != NULL, FALSE);
|
||||||
|
|
||||||
|
if (G_UNLIKELY (size > reader->size || _gst_byte_reader_get_remaining_unchecked (reader) < size))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
*val = gst_byte_reader_dup_data_unchecked (reader, size);
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline gboolean
|
||||||
|
_gst_byte_reader_get_data_inline (GstByteReader * reader, guint size, const guint8 ** val)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE);
|
||||||
|
g_return_val_if_fail (val != NULL, FALSE);
|
||||||
|
|
||||||
|
if (G_UNLIKELY (size > reader->size || _gst_byte_reader_get_remaining_unchecked (reader) < size))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
*val = gst_byte_reader_get_data_unchecked (reader, size);
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline gboolean
|
||||||
|
_gst_byte_reader_peek_data_inline (const GstByteReader * reader, guint size, const guint8 ** val)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE);
|
||||||
|
g_return_val_if_fail (val != NULL, FALSE);
|
||||||
|
|
||||||
|
if (G_UNLIKELY (size > reader->size || _gst_byte_reader_get_remaining_unchecked (reader) < size))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
*val = gst_byte_reader_peek_data_unchecked (reader);
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline guint
|
||||||
|
_gst_byte_reader_get_pos_inline (const GstByteReader * reader)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, 0);
|
||||||
|
|
||||||
|
return _gst_byte_reader_get_pos_unchecked (reader);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline gboolean
|
||||||
|
_gst_byte_reader_skip_inline (GstByteReader * reader, guint nbytes)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (reader != NULL, FALSE);
|
||||||
|
|
||||||
|
if (G_UNLIKELY (_gst_byte_reader_get_remaining_unchecked (reader) < nbytes))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
reader->byte += nbytes;
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef GST_BYTE_READER_DISABLE_INLINES
|
||||||
|
|
||||||
|
#define gst_byte_reader_dup_data(reader,size,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_dup_data_inline(reader,size,val))
|
||||||
|
#define gst_byte_reader_get_data(reader,size,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_get_data_inline(reader,size,val))
|
||||||
|
#define gst_byte_reader_peek_data(reader,size,val) \
|
||||||
|
G_LIKELY(_gst_byte_reader_peek_data_inline(reader,size,val))
|
||||||
|
#define gst_byte_reader_skip(reader,nbytes) \
|
||||||
|
G_LIKELY(_gst_byte_reader_skip_inline(reader,nbytes))
|
||||||
|
|
||||||
|
#endif /* GST_BYTE_READER_DISABLE_INLINES */
|
||||||
|
|
||||||
|
G_END_DECLS
|
||||||
|
|
||||||
|
#endif /* __GST_BYTE_READER_H__ */
|
||||||
@@ -0,0 +1,9 @@
|
|||||||
|
/* Stub for <gst/codecparsers/codecparsers-prelude.h>.
|
||||||
|
* Same shape as base-prelude.h — drop the GObject boilerplate + define
|
||||||
|
* the GstCodecParsersAPI macro to nothing.
|
||||||
|
*/
|
||||||
|
#ifndef LIBVA_V4L2_REQUEST_FOURIER_CODECPARSERS_PRELUDE_STUB
|
||||||
|
#define LIBVA_V4L2_REQUEST_FOURIER_CODECPARSERS_PRELUDE_STUB
|
||||||
|
#include "gst_compat.h"
|
||||||
|
#define GST_CODEC_PARSERS_API
|
||||||
|
#endif
|
||||||
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,545 @@
|
|||||||
|
/* Gstreamer
|
||||||
|
* Copyright (C) <2011> Intel Corporation
|
||||||
|
* Copyright (C) <2011> Collabora Ltd.
|
||||||
|
* Copyright (C) <2011> Thibault Saunier <thibault.saunier@collabora.com>
|
||||||
|
*
|
||||||
|
* Some bits C-c,C-v'ed and s/4/3 from h264parse and videoparsers/h264parse.c:
|
||||||
|
* Copyright (C) <2010> Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk>
|
||||||
|
* Copyright (C) <2010> Collabora Multimedia
|
||||||
|
* Copyright (C) <2010> Nokia Corporation
|
||||||
|
*
|
||||||
|
* (C) 2005 Michal Benes <michal.benes@itonis.tv>
|
||||||
|
* (C) 2008 Wim Taymans <wim.taymans@gmail.com>
|
||||||
|
*
|
||||||
|
* This library is free software; you can redistribute it and/or
|
||||||
|
* modify it under the terms of the GNU Library General Public
|
||||||
|
* License as published by the Free Software Foundation; either
|
||||||
|
* version 2 of the License, or (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This library is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||||
|
* Library General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU Library General Public
|
||||||
|
* License along with this library; if not, write to the
|
||||||
|
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
|
||||||
|
* Boston, MA 02110-1301, USA.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Common code for NAL parsing from h264 and h265 parsers.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifdef HAVE_CONFIG_H
|
||||||
|
# include "config.h"
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#include "nalutils.h"
|
||||||
|
|
||||||
|
/****** Nal parser ******/
|
||||||
|
|
||||||
|
void
|
||||||
|
nal_reader_init (NalReader * nr, const guint8 * data, guint size)
|
||||||
|
{
|
||||||
|
nr->data = data;
|
||||||
|
nr->size = size;
|
||||||
|
nr->n_epb = 0;
|
||||||
|
|
||||||
|
nr->byte = 0;
|
||||||
|
nr->bits_in_cache = 0;
|
||||||
|
/* fill with something other than 0 to detect emulation prevention bytes */
|
||||||
|
nr->first_byte = 0xff;
|
||||||
|
nr->epb_cache = 0xff;
|
||||||
|
nr->cache = 0xff;
|
||||||
|
}
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
nal_reader_read (NalReader * nr, guint nbits)
|
||||||
|
{
|
||||||
|
if (G_UNLIKELY (nr->byte * 8 + (nbits - nr->bits_in_cache) > nr->size * 8)) {
|
||||||
|
GST_DEBUG ("Can not read %u bits, bits in cache %u, Byte * 8 %u, size in "
|
||||||
|
"bits %u", nbits, nr->bits_in_cache, nr->byte * 8, nr->size * 8);
|
||||||
|
return FALSE;
|
||||||
|
}
|
||||||
|
|
||||||
|
while (nr->bits_in_cache < nbits) {
|
||||||
|
guint8 byte;
|
||||||
|
|
||||||
|
next_byte:
|
||||||
|
if (G_UNLIKELY (nr->byte >= nr->size))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
byte = nr->data[nr->byte++];
|
||||||
|
nr->epb_cache = (nr->epb_cache << 8) | byte;
|
||||||
|
|
||||||
|
/* check if the byte is a emulation_prevention_three_byte */
|
||||||
|
if ((nr->epb_cache & 0xffffff) == 0x3) {
|
||||||
|
nr->n_epb++;
|
||||||
|
goto next_byte;
|
||||||
|
}
|
||||||
|
nr->cache = (nr->cache << 8) | nr->first_byte;
|
||||||
|
nr->first_byte = byte;
|
||||||
|
nr->bits_in_cache += 8;
|
||||||
|
}
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Skips the specified amount of bits. This is only suitable to a
|
||||||
|
cacheable number of bits */
|
||||||
|
gboolean
|
||||||
|
nal_reader_skip (NalReader * nr, guint nbits)
|
||||||
|
{
|
||||||
|
g_assert (nbits <= 8 * sizeof (nr->cache));
|
||||||
|
|
||||||
|
if (G_UNLIKELY (!nal_reader_read (nr, nbits)))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
nr->bits_in_cache -= nbits;
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Generic version to skip any number of bits */
|
||||||
|
gboolean
|
||||||
|
nal_reader_skip_long (NalReader * nr, guint nbits)
|
||||||
|
{
|
||||||
|
/* Leave out enough bits in the cache once we are finished */
|
||||||
|
const guint skip_size = 4 * sizeof (nr->cache);
|
||||||
|
guint remaining = nbits;
|
||||||
|
|
||||||
|
nbits %= skip_size;
|
||||||
|
while (remaining > 0) {
|
||||||
|
if (!nal_reader_skip (nr, nbits))
|
||||||
|
return FALSE;
|
||||||
|
remaining -= nbits;
|
||||||
|
nbits = skip_size;
|
||||||
|
}
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
guint
|
||||||
|
nal_reader_get_pos (const NalReader * nr)
|
||||||
|
{
|
||||||
|
return nr->byte * 8 - nr->bits_in_cache;
|
||||||
|
}
|
||||||
|
|
||||||
|
guint
|
||||||
|
nal_reader_get_remaining (const NalReader * nr)
|
||||||
|
{
|
||||||
|
return (nr->size - nr->byte) * 8 + nr->bits_in_cache;
|
||||||
|
}
|
||||||
|
|
||||||
|
guint
|
||||||
|
nal_reader_get_epb_count (const NalReader * nr)
|
||||||
|
{
|
||||||
|
return nr->n_epb;
|
||||||
|
}
|
||||||
|
|
||||||
|
#define NAL_READER_READ_BITS(bits) \
|
||||||
|
gboolean \
|
||||||
|
nal_reader_get_bits_uint##bits (NalReader *nr, guint##bits *val, guint nbits) \
|
||||||
|
{ \
|
||||||
|
guint shift; \
|
||||||
|
\
|
||||||
|
if (!nal_reader_read (nr, nbits)) \
|
||||||
|
return FALSE; \
|
||||||
|
\
|
||||||
|
/* bring the required bits down and truncate */ \
|
||||||
|
shift = nr->bits_in_cache - nbits; \
|
||||||
|
*val = nr->first_byte >> shift; \
|
||||||
|
\
|
||||||
|
*val |= nr->cache << (8 - shift); \
|
||||||
|
/* mask out required bits */ \
|
||||||
|
if (nbits < bits) \
|
||||||
|
*val &= ((guint##bits)1 << nbits) - 1; \
|
||||||
|
\
|
||||||
|
nr->bits_in_cache = shift; \
|
||||||
|
\
|
||||||
|
return TRUE; \
|
||||||
|
} \
|
||||||
|
|
||||||
|
NAL_READER_READ_BITS (8);
|
||||||
|
NAL_READER_READ_BITS (16);
|
||||||
|
NAL_READER_READ_BITS (32);
|
||||||
|
|
||||||
|
#define NAL_READER_PEEK_BITS(bits) \
|
||||||
|
gboolean \
|
||||||
|
nal_reader_peek_bits_uint##bits (const NalReader *nr, guint##bits *val, guint nbits) \
|
||||||
|
{ \
|
||||||
|
NalReader tmp; \
|
||||||
|
\
|
||||||
|
tmp = *nr; \
|
||||||
|
return nal_reader_get_bits_uint##bits (&tmp, val, nbits); \
|
||||||
|
}
|
||||||
|
|
||||||
|
NAL_READER_PEEK_BITS (8);
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
nal_reader_get_ue (NalReader * nr, guint32 * val)
|
||||||
|
{
|
||||||
|
guint i = 0;
|
||||||
|
guint8 bit;
|
||||||
|
guint32 value;
|
||||||
|
|
||||||
|
if (G_UNLIKELY (!nal_reader_get_bits_uint8 (nr, &bit, 1)))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
while (bit == 0) {
|
||||||
|
i++;
|
||||||
|
if (G_UNLIKELY (!nal_reader_get_bits_uint8 (nr, &bit, 1)))
|
||||||
|
return FALSE;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (G_UNLIKELY (i > 31))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
if (G_UNLIKELY (!nal_reader_get_bits_uint32 (nr, &value, i)))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
*val = (1 << i) - 1 + value;
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
nal_reader_get_se (NalReader * nr, gint32 * val)
|
||||||
|
{
|
||||||
|
guint32 value;
|
||||||
|
|
||||||
|
if (G_UNLIKELY (!nal_reader_get_ue (nr, &value)))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
if (value % 2)
|
||||||
|
*val = (value / 2) + 1;
|
||||||
|
else
|
||||||
|
*val = -(value / 2);
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
nal_reader_is_byte_aligned (NalReader * nr)
|
||||||
|
{
|
||||||
|
if (nr->bits_in_cache != 0)
|
||||||
|
return FALSE;
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
nal_reader_has_more_data (NalReader * nr)
|
||||||
|
{
|
||||||
|
NalReader nr_tmp;
|
||||||
|
guint remaining, nbits;
|
||||||
|
guint8 rbsp_stop_one_bit, zero_bits;
|
||||||
|
|
||||||
|
remaining = nal_reader_get_remaining (nr);
|
||||||
|
if (remaining == 0)
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
nr_tmp = *nr;
|
||||||
|
nr = &nr_tmp;
|
||||||
|
|
||||||
|
/* The spec defines that more_rbsp_data() searches for the last bit
|
||||||
|
equal to 1, and that it is the rbsp_stop_one_bit. Subsequent bits
|
||||||
|
until byte boundary is reached shall be zero.
|
||||||
|
|
||||||
|
This means that more_rbsp_data() is FALSE if the next bit is 1
|
||||||
|
and the remaining bits until byte boundary are zero. One way to
|
||||||
|
be sure that this bit was the very last one, is that every other
|
||||||
|
bit after we reached byte boundary are also set to zero.
|
||||||
|
Otherwise, if the next bit is 0 or if there are non-zero bits
|
||||||
|
afterwards, then then we have more_rbsp_data() */
|
||||||
|
if (!nal_reader_get_bits_uint8 (nr, &rbsp_stop_one_bit, 1))
|
||||||
|
return FALSE;
|
||||||
|
if (!rbsp_stop_one_bit)
|
||||||
|
return TRUE;
|
||||||
|
|
||||||
|
nbits = --remaining % 8;
|
||||||
|
while (remaining > 0) {
|
||||||
|
if (!nal_reader_get_bits_uint8 (nr, &zero_bits, nbits))
|
||||||
|
return FALSE;
|
||||||
|
if (zero_bits != 0)
|
||||||
|
return TRUE;
|
||||||
|
remaining -= nbits;
|
||||||
|
nbits = 8;
|
||||||
|
}
|
||||||
|
return FALSE;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********** end of nal parser ***************/
|
||||||
|
|
||||||
|
gint
|
||||||
|
scan_for_start_codes (const guint8 * data, guint size)
|
||||||
|
{
|
||||||
|
GstByteReader br;
|
||||||
|
gst_byte_reader_init (&br, data, size);
|
||||||
|
|
||||||
|
/* NALU not empty, so we can at least expect 1 (even 2) bytes following sc */
|
||||||
|
return gst_byte_reader_masked_scan_uint32 (&br, 0xffffff00, 0x00000100,
|
||||||
|
0, size);
|
||||||
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
nal_writer_init (NalWriter * nw, guint nal_prefix_size, gboolean packetized)
|
||||||
|
{
|
||||||
|
g_return_if_fail (nw != NULL);
|
||||||
|
g_return_if_fail ((packetized && nal_prefix_size > 1 && nal_prefix_size < 5)
|
||||||
|
|| (!packetized && (nal_prefix_size == 3 || nal_prefix_size == 4)));
|
||||||
|
|
||||||
|
gst_bit_writer_init (&nw->bw);
|
||||||
|
nw->nal_prefix_size = nal_prefix_size;
|
||||||
|
nw->packetized = packetized;
|
||||||
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
nal_writer_reset (NalWriter * nw)
|
||||||
|
{
|
||||||
|
g_return_if_fail (nw != NULL);
|
||||||
|
|
||||||
|
gst_bit_writer_reset (&nw->bw);
|
||||||
|
memset (nw, 0, sizeof (NalWriter));
|
||||||
|
}
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
nal_writer_do_rbsp_trailing_bits (NalWriter * nw)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (nw != NULL, FALSE);
|
||||||
|
|
||||||
|
if (!gst_bit_writer_put_bits_uint8 (&nw->bw, 1, 1)) {
|
||||||
|
GST_WARNING ("Cannot put trailing bits");
|
||||||
|
return FALSE;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!gst_bit_writer_align_bytes (&nw->bw, 0)) {
|
||||||
|
GST_WARNING ("Cannot put align bits");
|
||||||
|
return FALSE;
|
||||||
|
}
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
static gpointer
|
||||||
|
nal_writer_create_nal_data (NalWriter * nw, guint32 * ret_size)
|
||||||
|
{
|
||||||
|
GstBitWriter bw;
|
||||||
|
gint i;
|
||||||
|
guint8 *src, *dst;
|
||||||
|
gsize size;
|
||||||
|
gpointer data;
|
||||||
|
|
||||||
|
/* scan to put emulation_prevention_three_byte */
|
||||||
|
size = GST_BIT_WRITER_BIT_SIZE (&nw->bw) >> 3;
|
||||||
|
src = GST_BIT_WRITER_DATA (&nw->bw);
|
||||||
|
|
||||||
|
gst_bit_writer_init_with_size (&bw, size + nw->nal_prefix_size, FALSE);
|
||||||
|
for (i = 0; i < nw->nal_prefix_size - 1; i++)
|
||||||
|
gst_bit_writer_put_bits_uint8 (&bw, 0, 8);
|
||||||
|
gst_bit_writer_put_bits_uint8 (&bw, 1, 8);
|
||||||
|
|
||||||
|
for (i = 0; i < size; i++) {
|
||||||
|
guint pos = (GST_BIT_WRITER_BIT_SIZE (&bw) >> 3);
|
||||||
|
dst = GST_BIT_WRITER_DATA (&bw);
|
||||||
|
if (pos >= nw->nal_prefix_size + 2 &&
|
||||||
|
dst[pos - 2] == 0 && dst[pos - 1] == 0 && src[i] <= 0x3) {
|
||||||
|
gst_bit_writer_put_bits_uint8 (&bw, 0x3, 8);
|
||||||
|
}
|
||||||
|
|
||||||
|
gst_bit_writer_put_bits_uint8 (&bw, src[i], 8);
|
||||||
|
}
|
||||||
|
|
||||||
|
*ret_size = bw.bit_size >> 3;
|
||||||
|
data = gst_bit_writer_reset_and_get_data (&bw);
|
||||||
|
|
||||||
|
if (nw->packetized) {
|
||||||
|
size = *ret_size - nw->nal_prefix_size;
|
||||||
|
|
||||||
|
switch (nw->nal_prefix_size) {
|
||||||
|
case 1:
|
||||||
|
GST_WRITE_UINT8 (data, size);
|
||||||
|
break;
|
||||||
|
case 2:
|
||||||
|
GST_WRITE_UINT16_BE (data, size);
|
||||||
|
break;
|
||||||
|
case 3:
|
||||||
|
GST_WRITE_UINT24_BE (data, size);
|
||||||
|
break;
|
||||||
|
case 4:
|
||||||
|
GST_WRITE_UINT32_BE (data, size);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
g_assert_not_reached ();
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return data;
|
||||||
|
}
|
||||||
|
|
||||||
|
GstMemory *
|
||||||
|
nal_writer_reset_and_get_memory (NalWriter * nw)
|
||||||
|
{
|
||||||
|
guint32 size = 0;
|
||||||
|
GstMemory *ret = NULL;
|
||||||
|
gpointer data;
|
||||||
|
|
||||||
|
g_return_val_if_fail (nw != NULL, NULL);
|
||||||
|
|
||||||
|
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) >> 3) == 0) {
|
||||||
|
GST_WARNING ("No written byte");
|
||||||
|
goto done;
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) & 0x7) != 0) {
|
||||||
|
GST_WARNING ("Written stream is not byte aligned");
|
||||||
|
if (!nal_writer_do_rbsp_trailing_bits (nw))
|
||||||
|
goto done;
|
||||||
|
}
|
||||||
|
|
||||||
|
data = nal_writer_create_nal_data (nw, &size);
|
||||||
|
if (!data) {
|
||||||
|
GST_WARNING ("Failed to create nal data");
|
||||||
|
goto done;
|
||||||
|
}
|
||||||
|
|
||||||
|
ret = gst_memory_new_wrapped (0, data, size, 0, size, data, g_free);
|
||||||
|
|
||||||
|
done:
|
||||||
|
gst_bit_writer_reset (&nw->bw);
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
guint8 *
|
||||||
|
nal_writer_reset_and_get_data (NalWriter * nw, guint32 * ret_size)
|
||||||
|
{
|
||||||
|
guint32 size = 0;
|
||||||
|
guint8 *data = NULL;
|
||||||
|
|
||||||
|
g_return_val_if_fail (nw != NULL, NULL);
|
||||||
|
g_return_val_if_fail (ret_size != NULL, NULL);
|
||||||
|
|
||||||
|
*ret_size = 0;
|
||||||
|
|
||||||
|
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) >> 3) == 0) {
|
||||||
|
GST_WARNING ("No written byte");
|
||||||
|
goto done;
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) & 0x7) != 0) {
|
||||||
|
GST_WARNING ("Written stream is not byte aligned");
|
||||||
|
if (!nal_writer_do_rbsp_trailing_bits (nw))
|
||||||
|
goto done;
|
||||||
|
}
|
||||||
|
|
||||||
|
data = nal_writer_create_nal_data (nw, &size);
|
||||||
|
if (!data) {
|
||||||
|
GST_WARNING ("Failed to create nal data");
|
||||||
|
goto done;
|
||||||
|
}
|
||||||
|
|
||||||
|
*ret_size = size;
|
||||||
|
|
||||||
|
done:
|
||||||
|
gst_bit_writer_reset (&nw->bw);
|
||||||
|
|
||||||
|
return data;
|
||||||
|
}
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
nal_writer_put_bits_uint8 (NalWriter * nw, guint8 value, guint nbits)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (nw != NULL, FALSE);
|
||||||
|
|
||||||
|
if (!gst_bit_writer_put_bits_uint8 (&nw->bw, value, nbits))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
nal_writer_put_bits_uint16 (NalWriter * nw, guint16 value, guint nbits)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (nw != NULL, FALSE);
|
||||||
|
|
||||||
|
if (!gst_bit_writer_put_bits_uint16 (&nw->bw, value, nbits))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
nal_writer_put_bits_uint32 (NalWriter * nw, guint32 value, guint nbits)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (nw != NULL, FALSE);
|
||||||
|
|
||||||
|
if (!gst_bit_writer_put_bits_uint32 (&nw->bw, value, nbits))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
nal_writer_put_bytes (NalWriter * nw, const guint8 * data, guint nbytes)
|
||||||
|
{
|
||||||
|
g_return_val_if_fail (nw != NULL, FALSE);
|
||||||
|
g_return_val_if_fail (data != NULL, FALSE);
|
||||||
|
g_return_val_if_fail (nbytes != 0, FALSE);
|
||||||
|
|
||||||
|
if (!gst_bit_writer_put_bytes (&nw->bw, data, nbytes))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
nal_writer_put_ue (NalWriter * nw, guint32 value)
|
||||||
|
{
|
||||||
|
guint leading_zeros;
|
||||||
|
guint rest;
|
||||||
|
|
||||||
|
g_return_val_if_fail (nw != NULL, FALSE);
|
||||||
|
|
||||||
|
count_exp_golomb_bits (value, &leading_zeros, &rest);
|
||||||
|
|
||||||
|
/* write leading zeros */
|
||||||
|
if (leading_zeros) {
|
||||||
|
if (!nal_writer_put_bits_uint32 (nw, 0, leading_zeros))
|
||||||
|
return FALSE;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* write the rest */
|
||||||
|
if (!nal_writer_put_bits_uint32 (nw, value + 1, rest))
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
gboolean
|
||||||
|
count_exp_golomb_bits (guint32 value, guint * leading_zeros, guint * rest)
|
||||||
|
{
|
||||||
|
guint32 x;
|
||||||
|
guint count = 0;
|
||||||
|
|
||||||
|
/* https://en.wikipedia.org/wiki/Exponential-Golomb_coding */
|
||||||
|
/* count bits of value + 1 */
|
||||||
|
x = value + 1;
|
||||||
|
while (x) {
|
||||||
|
count++;
|
||||||
|
x >>= 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (leading_zeros) {
|
||||||
|
if (count > 1)
|
||||||
|
*leading_zeros = count - 1;
|
||||||
|
else
|
||||||
|
*leading_zeros = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (rest) {
|
||||||
|
*rest = count;
|
||||||
|
}
|
||||||
|
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
@@ -0,0 +1,269 @@
|
|||||||
|
/* Gstreamer
|
||||||
|
* Copyright (C) <2011> Intel Corporation
|
||||||
|
* Copyright (C) <2011> Collabora Ltd.
|
||||||
|
* Copyright (C) <2011> Thibault Saunier <thibault.saunier@collabora.com>
|
||||||
|
*
|
||||||
|
* Some bits C-c,C-v'ed and s/4/3 from h264parse and videoparsers/h264parse.c:
|
||||||
|
* Copyright (C) <2010> Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk>
|
||||||
|
* Copyright (C) <2010> Collabora Multimedia
|
||||||
|
* Copyright (C) <2010> Nokia Corporation
|
||||||
|
*
|
||||||
|
* (C) 2005 Michal Benes <michal.benes@itonis.tv>
|
||||||
|
* (C) 2008 Wim Taymans <wim.taymans@gmail.com>
|
||||||
|
*
|
||||||
|
* This library is free software; you can redistribute it and/or
|
||||||
|
* modify it under the terms of the GNU Library General Public
|
||||||
|
* License as published by the Free Software Foundation; either
|
||||||
|
* version 2 of the License, or (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This library is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||||
|
* Library General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU Library General Public
|
||||||
|
* License along with this library; if not, write to the
|
||||||
|
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
|
||||||
|
* Boston, MA 02110-1301, USA.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Common code for NAL parsing from h264 and h265 parsers.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifdef HAVE_CONFIG_H
|
||||||
|
# include "config.h"
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#include <gst/base/gstbytereader.h>
|
||||||
|
#include <gst/base/gstbitreader.h>
|
||||||
|
#include <gst/base/gstbitwriter.h>
|
||||||
|
|
||||||
|
typedef struct
|
||||||
|
{
|
||||||
|
const guint8 *data;
|
||||||
|
guint size;
|
||||||
|
|
||||||
|
guint n_epb; /* Number of emulation prevention bytes */
|
||||||
|
guint byte; /* Byte position */
|
||||||
|
guint bits_in_cache; /* bitpos in the cache of next bit */
|
||||||
|
guint8 first_byte;
|
||||||
|
guint32 epb_cache; /* cache 3 bytes to check emulation prevention bytes */
|
||||||
|
guint64 cache; /* cached bytes */
|
||||||
|
} NalReader;
|
||||||
|
|
||||||
|
typedef struct
|
||||||
|
{
|
||||||
|
GstBitWriter bw;
|
||||||
|
|
||||||
|
guint nal_prefix_size;
|
||||||
|
gboolean packetized;
|
||||||
|
} NalWriter;
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
void nal_reader_init (NalReader * nr, const guint8 * data, guint size);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_reader_read (NalReader * nr, guint nbits);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_reader_skip (NalReader * nr, guint nbits);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_reader_skip_long (NalReader * nr, guint nbits);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
guint nal_reader_get_pos (const NalReader * nr);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
guint nal_reader_get_remaining (const NalReader * nr);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
guint nal_reader_get_epb_count (const NalReader * nr);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_reader_is_byte_aligned (NalReader * nr);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_reader_has_more_data (NalReader * nr);
|
||||||
|
|
||||||
|
#define NAL_READER_READ_BITS_H(bits) \
|
||||||
|
G_GNUC_INTERNAL \
|
||||||
|
gboolean nal_reader_get_bits_uint##bits (NalReader *nr, guint##bits *val, guint nbits)
|
||||||
|
|
||||||
|
NAL_READER_READ_BITS_H (8);
|
||||||
|
NAL_READER_READ_BITS_H (16);
|
||||||
|
NAL_READER_READ_BITS_H (32);
|
||||||
|
|
||||||
|
#define NAL_READER_PEEK_BITS_H(bits) \
|
||||||
|
G_GNUC_INTERNAL \
|
||||||
|
gboolean nal_reader_peek_bits_uint##bits (const NalReader *nr, guint##bits *val, guint nbits)
|
||||||
|
|
||||||
|
NAL_READER_PEEK_BITS_H (8);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_reader_get_ue (NalReader * nr, guint32 * val);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_reader_get_se (NalReader * nr, gint32 * val);
|
||||||
|
|
||||||
|
#define CHECK_ALLOWED_MAX_WITH_DEBUG(dbg, val, max) { \
|
||||||
|
if (val > max) { \
|
||||||
|
GST_WARNING ("value for '" dbg "' greater than max. value: %d, max %d", \
|
||||||
|
val, max); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
#define CHECK_ALLOWED_MAX(val, max) \
|
||||||
|
CHECK_ALLOWED_MAX_WITH_DEBUG (G_STRINGIFY (val), val, max)
|
||||||
|
|
||||||
|
#define CHECK_ALLOWED_WITH_DEBUG(dbg, val, min, max) { \
|
||||||
|
if (val < min || val > max) { \
|
||||||
|
GST_WARNING ("value for '" dbg "' not in allowed range. value: %d, range %d-%d", \
|
||||||
|
val, min, max); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
#define CHECK_ALLOWED(val, min, max) \
|
||||||
|
CHECK_ALLOWED_WITH_DEBUG (G_STRINGIFY (val), val, min, max)
|
||||||
|
|
||||||
|
#define READ_UINT8(nr, val, nbits) { \
|
||||||
|
if (!nal_reader_get_bits_uint8 (nr, &val, nbits)) { \
|
||||||
|
GST_WARNING ("failed to read uint8 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define READ_UINT16(nr, val, nbits) { \
|
||||||
|
if (!nal_reader_get_bits_uint16 (nr, &val, nbits)) { \
|
||||||
|
GST_WARNING ("failed to read uint16 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define READ_UINT32(nr, val, nbits) { \
|
||||||
|
if (!nal_reader_get_bits_uint32 (nr, &val, nbits)) { \
|
||||||
|
GST_WARNING ("failed to read uint32 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define READ_UINT64(nr, val, nbits) { \
|
||||||
|
if (!nal_reader_get_bits_uint64 (nr, &val, nbits)) { \
|
||||||
|
GST_WARNING ("failed to read uint32 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define READ_UE(nr, val) { \
|
||||||
|
if (!nal_reader_get_ue (nr, &val)) { \
|
||||||
|
GST_WARNING ("failed to read UE for '" G_STRINGIFY (val) "'"); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define READ_UE_ALLOWED(nr, val, min, max) { \
|
||||||
|
guint32 tmp; \
|
||||||
|
READ_UE (nr, tmp); \
|
||||||
|
CHECK_ALLOWED_WITH_DEBUG (G_STRINGIFY (val), tmp, min, max); \
|
||||||
|
val = tmp; \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define READ_UE_MAX(nr, val, max) { \
|
||||||
|
guint32 tmp; \
|
||||||
|
READ_UE (nr, tmp); \
|
||||||
|
CHECK_ALLOWED_MAX_WITH_DEBUG (G_STRINGIFY (val), tmp, max); \
|
||||||
|
val = tmp; \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define READ_SE(nr, val) { \
|
||||||
|
if (!nal_reader_get_se (nr, &val)) { \
|
||||||
|
GST_WARNING ("failed to read SE for '" G_STRINGIFY (val) "'"); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define READ_SE_ALLOWED(nr, val, min, max) { \
|
||||||
|
gint32 tmp; \
|
||||||
|
READ_SE (nr, tmp); \
|
||||||
|
CHECK_ALLOWED_WITH_DEBUG (G_STRINGIFY (val), tmp, min, max); \
|
||||||
|
val = tmp; \
|
||||||
|
}
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gint scan_for_start_codes (const guint8 * data, guint size);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
void nal_writer_init (NalWriter * nw, guint nal_prefix_size, gboolean packetized);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
void nal_writer_reset (NalWriter * nw);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_writer_do_rbsp_trailing_bits (NalWriter * nw);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
GstMemory * nal_writer_reset_and_get_memory (NalWriter * nw);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
guint8 * nal_writer_reset_and_get_data (NalWriter * nw, guint32 * ret_size);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_writer_put_bits_uint8 (NalWriter * nw, guint8 value, guint nbits);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_writer_put_bits_uint16 (NalWriter * nw, guint16 value, guint nbits);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_writer_put_bits_uint32 (NalWriter * nw, guint32 value, guint nbits);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_writer_put_bytes (NalWriter * nw, const guint8 * data, guint nbytes);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean nal_writer_put_ue (NalWriter * nw, guint32 value);
|
||||||
|
|
||||||
|
G_GNUC_INTERNAL
|
||||||
|
gboolean count_exp_golomb_bits (guint32 value, guint * leading_zeros, guint * rest);
|
||||||
|
|
||||||
|
#define WRITE_UINT8(nw, val, nbits) { \
|
||||||
|
if (!nal_writer_put_bits_uint8 (nw, val, nbits)) { \
|
||||||
|
GST_WARNING ("failed to write uint8 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define WRITE_UINT16(nw, val, nbits) { \
|
||||||
|
if (!nal_writer_put_bits_uint16 (nw, val, nbits)) { \
|
||||||
|
GST_WARNING ("failed to write uint16 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define WRITE_UINT32(nw, val, nbits) { \
|
||||||
|
if (!nal_writer_put_bits_uint32 (nw, val, nbits)) { \
|
||||||
|
GST_WARNING ("failed to write uint32 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define WRITE_BYTES(nw, data, nbytes) { \
|
||||||
|
if (!nal_writer_put_bytes (nw, data, nbytes)) { \
|
||||||
|
GST_WARNING ("failed to write bytes for '" G_STRINGIFY (val) "', nbits: %d", nbytes); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define WRITE_UE(nw, val) { \
|
||||||
|
if (!nal_writer_put_ue (nw, val)) { \
|
||||||
|
GST_WARNING ("failed to write ue for '" G_STRINGIFY (val) "'"); \
|
||||||
|
goto error; \
|
||||||
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline guint32 div_ceil (guint32 a, guint32 b)
|
||||||
|
{
|
||||||
|
/* http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html */
|
||||||
|
g_assert (b > 0);
|
||||||
|
return a / b + (a % b > 0);
|
||||||
|
}
|
||||||
@@ -0,0 +1,10 @@
|
|||||||
|
/* Stub for <gst/glib-compat-private.h>.
|
||||||
|
* In upstream GStreamer this provides backwards-compat shims for older
|
||||||
|
* GLib versions (g_memdup2 polyfill being the load-bearing one).
|
||||||
|
* Our gst_compat.h already defines g_memdup2 as a static inline, so
|
||||||
|
* we just include the shim.
|
||||||
|
*/
|
||||||
|
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GLIB_COMPAT_PRIVATE_STUB
|
||||||
|
#define LIBVA_V4L2_REQUEST_FOURIER_GLIB_COMPAT_PRIVATE_STUB
|
||||||
|
#include "gst_compat.h"
|
||||||
|
#endif
|
||||||
@@ -0,0 +1,10 @@
|
|||||||
|
/* Stub for <gst/gst.h> — redirects to the project's gst_compat shim.
|
||||||
|
* The vendored GStreamer 1.28.2 H.265 parser was originally built against
|
||||||
|
* full GStreamer; we only need the GLib type aliases + memory helpers +
|
||||||
|
* macro stubs, all provided by gst_compat.h. Original gst.h would pull
|
||||||
|
* in GObject + GstObject + the entire framework, which we don't link.
|
||||||
|
*/
|
||||||
|
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GST_H_STUB
|
||||||
|
#define LIBVA_V4L2_REQUEST_FOURIER_GST_H_STUB
|
||||||
|
#include "gst_compat.h"
|
||||||
|
#endif
|
||||||
@@ -0,0 +1,145 @@
|
|||||||
|
/*
|
||||||
|
* gst_compat.c — GArray implementation for the vendored GStreamer parser.
|
||||||
|
*
|
||||||
|
* Scope: minimal subset of GArray API exercised by gsth265parser.c
|
||||||
|
* (g_array_new, g_array_sized_new, g_array_append_vals + the
|
||||||
|
* g_array_append_val macro, g_array_index macro, g_array_set_size,
|
||||||
|
* g_array_set_clear_func, g_array_free, g_array_unref).
|
||||||
|
*
|
||||||
|
* Non-thread-safe (matches GArray's documented semantics — GArray is
|
||||||
|
* not thread-safe in upstream GLib either, callers must serialize).
|
||||||
|
*
|
||||||
|
* License: MIT (matches backend's COPYING.MIT).
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "gst_compat.h"
|
||||||
|
|
||||||
|
/* ===== internal helpers ===== */
|
||||||
|
|
||||||
|
static gboolean
|
||||||
|
garray_grow(GArray *array, guint new_capacity)
|
||||||
|
{
|
||||||
|
if (new_capacity <= array->capacity)
|
||||||
|
return TRUE;
|
||||||
|
|
||||||
|
/* round up to next power of two for amortized O(1) growth */
|
||||||
|
guint cap = array->capacity > 0 ? array->capacity : 4;
|
||||||
|
while (cap < new_capacity)
|
||||||
|
cap *= 2;
|
||||||
|
|
||||||
|
char *new_data = realloc(array->data, (size_t)cap * array->element_size);
|
||||||
|
if (new_data == NULL)
|
||||||
|
return FALSE;
|
||||||
|
|
||||||
|
if (array->clear) {
|
||||||
|
memset(new_data + (size_t)array->capacity * array->element_size, 0,
|
||||||
|
(size_t)(cap - array->capacity) * array->element_size);
|
||||||
|
}
|
||||||
|
|
||||||
|
array->data = new_data;
|
||||||
|
array->capacity = cap;
|
||||||
|
return TRUE;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ===== public API ===== */
|
||||||
|
|
||||||
|
GArray *
|
||||||
|
g_array_sized_new(gboolean zero_terminated, gboolean clear,
|
||||||
|
guint element_size, guint reserved_size)
|
||||||
|
{
|
||||||
|
/* zero_terminated is GLib-specific (appends a zero-element sentinel
|
||||||
|
* for trailing-NULL semantics). The vendored parser does not use it;
|
||||||
|
* we ignore the flag. */
|
||||||
|
(void)zero_terminated;
|
||||||
|
|
||||||
|
GArray *a = calloc(1, sizeof(GArray));
|
||||||
|
if (a == NULL)
|
||||||
|
return NULL;
|
||||||
|
|
||||||
|
a->element_size = element_size;
|
||||||
|
a->clear = clear;
|
||||||
|
|
||||||
|
if (reserved_size > 0) {
|
||||||
|
if (!garray_grow(a, reserved_size)) {
|
||||||
|
free(a);
|
||||||
|
return NULL;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return a;
|
||||||
|
}
|
||||||
|
|
||||||
|
GArray *
|
||||||
|
g_array_new(gboolean zero_terminated, gboolean clear, guint element_size)
|
||||||
|
{
|
||||||
|
return g_array_sized_new(zero_terminated, clear, element_size, 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
GArray *
|
||||||
|
g_array_set_size(GArray *array, guint length)
|
||||||
|
{
|
||||||
|
if (length > array->capacity) {
|
||||||
|
if (!garray_grow(array, length))
|
||||||
|
return array;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (array->clear_func != NULL && length < array->len) {
|
||||||
|
for (guint i = length; i < array->len; i++)
|
||||||
|
array->clear_func(array->data + (size_t)i * array->element_size);
|
||||||
|
}
|
||||||
|
if (array->clear && length > array->len) {
|
||||||
|
memset(array->data + (size_t)array->len * array->element_size, 0,
|
||||||
|
(size_t)(length - array->len) * array->element_size);
|
||||||
|
}
|
||||||
|
array->len = length;
|
||||||
|
return array;
|
||||||
|
}
|
||||||
|
|
||||||
|
GArray *
|
||||||
|
g_array_append_vals(GArray *array, gconstpointer data, guint len)
|
||||||
|
{
|
||||||
|
if (len == 0)
|
||||||
|
return array;
|
||||||
|
|
||||||
|
if (!garray_grow(array, array->len + len))
|
||||||
|
return array;
|
||||||
|
|
||||||
|
memcpy(array->data + (size_t)array->len * array->element_size,
|
||||||
|
data, (size_t)len * array->element_size);
|
||||||
|
array->len += len;
|
||||||
|
return array;
|
||||||
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
g_array_set_clear_func(GArray *array, void (*clear_func)(gpointer))
|
||||||
|
{
|
||||||
|
array->clear_func = clear_func;
|
||||||
|
}
|
||||||
|
|
||||||
|
gchar *
|
||||||
|
g_array_free(GArray *array, gboolean free_segment)
|
||||||
|
{
|
||||||
|
if (array == NULL)
|
||||||
|
return NULL;
|
||||||
|
|
||||||
|
if (array->clear_func != NULL) {
|
||||||
|
for (guint i = 0; i < array->len; i++)
|
||||||
|
array->clear_func(array->data + (size_t)i * array->element_size);
|
||||||
|
}
|
||||||
|
|
||||||
|
gchar *data = NULL;
|
||||||
|
if (free_segment) {
|
||||||
|
free(array->data);
|
||||||
|
} else {
|
||||||
|
data = array->data;
|
||||||
|
}
|
||||||
|
free(array);
|
||||||
|
return data;
|
||||||
|
}
|
||||||
|
|
||||||
|
GArray *
|
||||||
|
g_array_unref(GArray *array)
|
||||||
|
{
|
||||||
|
/* simplified to free; the backend never sub-references shared GArrays */
|
||||||
|
g_array_free(array, TRUE);
|
||||||
|
return NULL;
|
||||||
|
}
|
||||||
@@ -0,0 +1,463 @@
|
|||||||
|
/*
|
||||||
|
* gst_compat.h — minimal GLib/GStreamer compatibility shim for vendored
|
||||||
|
* GStreamer 1.28.2 H.265 parser + bitreader + bytereader + nalutils.
|
||||||
|
*
|
||||||
|
* Strategy: provide #defines / typedefs for the GLib API surface those
|
||||||
|
* 4 vendored files use, so they can compile against libc + libv4l2 only
|
||||||
|
* (no glib2 / gst-base linkage). Vendored .c files are NOT modified
|
||||||
|
* directly; instead this header is force-included via the Makefile's
|
||||||
|
* `-include` flag on the vendored translation units.
|
||||||
|
*
|
||||||
|
* Coverage scoped to what gsth265parser.c + nalutils.c + gstbitreader.c
|
||||||
|
* + gstbytereader.c actually call. Surveyed in
|
||||||
|
* ampere-kernel-decoders phase4 step 2 prep — see
|
||||||
|
* ~/src/ampere-kernel-decoders/phase4_plan_iter2.md and the survey
|
||||||
|
* commit message for the empirical inventory.
|
||||||
|
*
|
||||||
|
* License: this shim is original work, MIT (matching the backend's
|
||||||
|
* COPYING.MIT). The vendored .c files keep their LGPL v2.1+ headers
|
||||||
|
* verbatim.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GST_COMPAT_H
|
||||||
|
#define LIBVA_V4L2_REQUEST_FOURIER_GST_COMPAT_H
|
||||||
|
|
||||||
|
#include <assert.h>
|
||||||
|
#include <stdbool.h>
|
||||||
|
#include <stddef.h>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <string.h>
|
||||||
|
|
||||||
|
/* ===== GLib type aliases ===== */
|
||||||
|
|
||||||
|
typedef bool gboolean;
|
||||||
|
typedef char gchar;
|
||||||
|
typedef unsigned char guchar;
|
||||||
|
typedef int gint;
|
||||||
|
typedef int8_t gint8;
|
||||||
|
typedef int16_t gint16;
|
||||||
|
typedef int32_t gint32;
|
||||||
|
typedef int64_t gint64;
|
||||||
|
typedef unsigned int guint;
|
||||||
|
typedef uint8_t guint8;
|
||||||
|
typedef uint16_t guint16;
|
||||||
|
typedef uint32_t guint32;
|
||||||
|
typedef uint64_t guint64;
|
||||||
|
typedef size_t gsize;
|
||||||
|
typedef ptrdiff_t gssize;
|
||||||
|
typedef void * gpointer;
|
||||||
|
typedef const void * gconstpointer;
|
||||||
|
typedef double gdouble;
|
||||||
|
typedef float gfloat;
|
||||||
|
|
||||||
|
/* GLib's gint64 / guint64 formatting is platform-conditional; for our
|
||||||
|
* aarch64 ALARM target we don't need the full G_*_FORMAT machinery, but
|
||||||
|
* gstbytereader uses G_GSIZE_FORMAT in a debug-only printf. */
|
||||||
|
#define G_GSIZE_FORMAT "zu"
|
||||||
|
|
||||||
|
#ifndef TRUE
|
||||||
|
# define TRUE true
|
||||||
|
#endif
|
||||||
|
#ifndef FALSE
|
||||||
|
# define FALSE false
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* ===== memory ===== */
|
||||||
|
|
||||||
|
#define g_malloc(n) malloc((size_t)(n))
|
||||||
|
#define g_malloc0(n) calloc(1, (size_t)(n))
|
||||||
|
#define g_realloc(p, n) realloc((p), (size_t)(n))
|
||||||
|
/* g_free needs to be addressable (passed as a function-pointer arg by
|
||||||
|
* nalutils.c::gst_memory_new_wrapped — even though that call site is
|
||||||
|
* dead code we don't invoke, it must compile). Plain `free` is
|
||||||
|
* compatible: signature is `void (void *)` either way. */
|
||||||
|
#define g_free free
|
||||||
|
#define g_new(type, n) ((type *)malloc(sizeof(type) * (size_t)(n)))
|
||||||
|
#define g_new0(type, n) ((type *)calloc((size_t)(n), sizeof(type)))
|
||||||
|
#define g_slice_new(type) ((type *)malloc(sizeof(type)))
|
||||||
|
#define g_slice_new0(type) ((type *)calloc(1, sizeof(type)))
|
||||||
|
#define g_slice_free(type, p) free(p)
|
||||||
|
#define g_slice_free1(size, p) free(p)
|
||||||
|
#define g_clear_pointer(pp, freefn) \
|
||||||
|
do { freefn(*(pp)); *(pp) = NULL; } while (0)
|
||||||
|
|
||||||
|
/* g_memdup2 — GLib's 64-bit-safe memdup, used by gstbytereader. */
|
||||||
|
static inline gpointer
|
||||||
|
g_memdup2(gconstpointer mem, gsize byte_size)
|
||||||
|
{
|
||||||
|
if (mem == NULL || byte_size == 0)
|
||||||
|
return NULL;
|
||||||
|
void *copy = malloc(byte_size);
|
||||||
|
if (copy != NULL)
|
||||||
|
memcpy(copy, mem, byte_size);
|
||||||
|
return copy;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* g_strcmp0 — NULL-safe strcmp. Used by gsth265parser in profile-name lookup. */
|
||||||
|
static inline int
|
||||||
|
g_strcmp0(const char *a, const char *b)
|
||||||
|
{
|
||||||
|
if (a == b) return 0;
|
||||||
|
if (a == NULL) return -1;
|
||||||
|
if (b == NULL) return 1;
|
||||||
|
return strcmp(a, b);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ===== asserts / return-guards =====
|
||||||
|
*
|
||||||
|
* Per ampere-kernel-decoders iter2 Phase 2 §"new failure modes" #5:
|
||||||
|
* g_assert must NOT abort the process. It becomes a no-op here;
|
||||||
|
* malformed bitstream is caught by the explicit parse-result returns
|
||||||
|
* the parser already implements.
|
||||||
|
*
|
||||||
|
* g_return_if_fail / g_return_val_if_fail propagate as the original
|
||||||
|
* GLib semantics (early return with optional value). */
|
||||||
|
|
||||||
|
#define g_assert(cond) ((void)0)
|
||||||
|
#define g_assert_not_reached() __builtin_unreachable()
|
||||||
|
#define g_return_if_fail(cond) do { if (!(cond)) return; } while (0)
|
||||||
|
#define g_return_val_if_fail(cond, v) do { if (!(cond)) return (v); } while (0)
|
||||||
|
|
||||||
|
/* ===== GStreamer logging — no-ops =====
|
||||||
|
*
|
||||||
|
* The parser is heavy on debug logging. We compile all of it out;
|
||||||
|
* the backend's own logging (request_log/error_log) wraps the parser
|
||||||
|
* calls and reports parse-failure return codes from there. */
|
||||||
|
|
||||||
|
#define GST_DISABLE_GST_DEBUG 1
|
||||||
|
|
||||||
|
#define GST_DEBUG_CATEGORY_STATIC(name)
|
||||||
|
#define GST_DEBUG_CATEGORY_INIT(...) ((void)0)
|
||||||
|
#define GST_DEBUG_CATEGORY_GET(...) ((void)0)
|
||||||
|
#define GST_DEBUG(...) ((void)0)
|
||||||
|
#define GST_INFO(...) ((void)0)
|
||||||
|
#define GST_WARNING(...) ((void)0)
|
||||||
|
#define GST_ERROR(...) ((void)0)
|
||||||
|
#define GST_LOG(...) ((void)0)
|
||||||
|
#define GST_FIXME(...) ((void)0)
|
||||||
|
#define GST_MEMDUMP(...) ((void)0)
|
||||||
|
#define GST_CAT_DEFAULT (NULL)
|
||||||
|
|
||||||
|
/* ===== compiler / language helpers ===== */
|
||||||
|
|
||||||
|
#define G_LIKELY(x) __builtin_expect(!!(x), 1)
|
||||||
|
#define G_UNLIKELY(x) __builtin_expect(!!(x), 0)
|
||||||
|
#define G_GNUC_UNUSED __attribute__((unused))
|
||||||
|
#define G_GNUC_INTERNAL
|
||||||
|
#define G_GNUC_MALLOC __attribute__((malloc))
|
||||||
|
#define G_GNUC_NORETURN __attribute__((noreturn))
|
||||||
|
#define G_GNUC_DEPRECATED
|
||||||
|
#define G_GNUC_DEPRECATED_FOR(x)
|
||||||
|
#define G_GNUC_PURE __attribute__((pure))
|
||||||
|
#define G_GNUC_CONST __attribute__((const))
|
||||||
|
#define G_GNUC_PRINTF(a, b) __attribute__((format(printf, a, b)))
|
||||||
|
#define G_BEGIN_DECLS
|
||||||
|
#define G_END_DECLS
|
||||||
|
#define G_N_ELEMENTS(arr) (sizeof(arr) / sizeof((arr)[0]))
|
||||||
|
#define G_STMT_START do
|
||||||
|
#define G_STMT_END while (0)
|
||||||
|
#define G_STRINGIFY(x) G_STRINGIFY_(x)
|
||||||
|
#define G_STRINGIFY_(x) #x
|
||||||
|
|
||||||
|
/* GStreamer ABI-padding slot count; upstream uses 4 reserved gpointers
|
||||||
|
* at the end of public structs for future ABI extension. We replicate
|
||||||
|
* the size so struct layout matches what gst_byte_reader_init / friends
|
||||||
|
* write into. */
|
||||||
|
#define GST_PADDING 4
|
||||||
|
#define GST_PADDING_LARGE 20
|
||||||
|
|
||||||
|
/* Public-symbol visibility — backend's shared module uses
|
||||||
|
* -fvisibility=hidden, so we don't need to mark anything public from
|
||||||
|
* within the vendored parser. The original GST_*_API macros expand to
|
||||||
|
* extern + dllimport on Windows; on Linux ELF builds where
|
||||||
|
* fvisibility=hidden is active, they would mark public symbols. The
|
||||||
|
* vendored functions are never called from outside h265_parser/, so
|
||||||
|
* leaving these empty hides them automatically. */
|
||||||
|
#define GST_API
|
||||||
|
#define GST_API_EXPORT extern
|
||||||
|
#define GST_API_IMPORT extern
|
||||||
|
|
||||||
|
/* ===== Opaque GStreamer pipeline types =====
|
||||||
|
*
|
||||||
|
* GstBuffer + GstMemory are referenced by encoder-side dead-code
|
||||||
|
* functions in gsth265parser.c (gst_h265_parser_insert_sei_hevc).
|
||||||
|
* We never call those; declaring them as opaque structs lets the
|
||||||
|
* function pointers / declarations compile, and the linker keeps the
|
||||||
|
* dead-code .text section even though it's unreachable.
|
||||||
|
*
|
||||||
|
* If you ever need to actually USE GstBuffer in this tree, replace
|
||||||
|
* these opaque decls with the project's own buffer abstraction; do not
|
||||||
|
* try to vendor in libgst itself. */
|
||||||
|
|
||||||
|
typedef struct _GstBuffer GstBuffer;
|
||||||
|
typedef struct _GstMemory GstMemory;
|
||||||
|
typedef struct _GstMapInfo GstMapInfo; /* opaque — dead-code in gsth265parser SEI insert */
|
||||||
|
|
||||||
|
/* GLib min/max constants — dead-code unsigned-overflow guards in
|
||||||
|
* gsth265parser.c. */
|
||||||
|
#define G_MAXUINT8 ((guint8)0xFF)
|
||||||
|
#define G_MAXUINT16 ((guint16)0xFFFF)
|
||||||
|
#define G_MAXUINT32 ((guint32)0xFFFFFFFFU)
|
||||||
|
#define G_MAXUINT64 ((guint64)0xFFFFFFFFFFFFFFFFULL)
|
||||||
|
#define G_MAXINT8 ((gint8)0x7F)
|
||||||
|
#define G_MAXINT16 ((gint16)0x7FFF)
|
||||||
|
#define G_MAXINT32 ((gint32)0x7FFFFFFF)
|
||||||
|
#define G_MAXINT64 ((gint64)0x7FFFFFFFFFFFFFFFLL)
|
||||||
|
#define G_MININT8 ((gint8)(-0x80))
|
||||||
|
#define G_MININT16 ((gint16)(-0x8000))
|
||||||
|
#define G_MININT32 ((gint32)(-0x80000000))
|
||||||
|
#define G_MAXSIZE ((gsize)-1)
|
||||||
|
|
||||||
|
/* GLib function-pointer typedefs used by g_list_* APIs (which our
|
||||||
|
* gst_compat declares as abort-stubs). They show up in code paths
|
||||||
|
* we never invoke but must compile. */
|
||||||
|
typedef void (*GDestroyNotify)(gpointer data);
|
||||||
|
typedef int (*GCompareFunc)(gconstpointer a, gconstpointer b);
|
||||||
|
typedef int (*GCompareDataFunc)(gconstpointer a, gconstpointer b, gpointer user_data);
|
||||||
|
|
||||||
|
/* GstMapFlags — passed to gst_memory_map / gst_buffer_map. Dead-code. */
|
||||||
|
#define GST_MAP_READ (1 << 0)
|
||||||
|
#define GST_MAP_WRITE (1 << 1)
|
||||||
|
#define GST_MAP_READWRITE (GST_MAP_READ | GST_MAP_WRITE)
|
||||||
|
|
||||||
|
/* Dead-code stubs for buffer / memory mapping (only referenced by
|
||||||
|
* gst_h265_parser_insert_sei_hevc which we never call). The compile
|
||||||
|
* needs declarations + addressable functions; abort on call. */
|
||||||
|
static inline gboolean
|
||||||
|
gst_memory_map(GstMemory *mem G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED,
|
||||||
|
int flags G_GNUC_UNUSED) { abort(); }
|
||||||
|
static inline void
|
||||||
|
gst_memory_unmap(GstMemory *mem G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED) { abort(); }
|
||||||
|
static inline gboolean
|
||||||
|
gst_buffer_map(GstBuffer *buf G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED,
|
||||||
|
int flags G_GNUC_UNUSED) { abort(); }
|
||||||
|
static inline void
|
||||||
|
gst_buffer_unmap(GstBuffer *buf G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED) { abort(); }
|
||||||
|
static inline GstBuffer *
|
||||||
|
gst_buffer_new(void) { abort(); }
|
||||||
|
static inline gboolean
|
||||||
|
gst_buffer_copy_into(GstBuffer *dst G_GNUC_UNUSED, GstBuffer *src G_GNUC_UNUSED,
|
||||||
|
int flags G_GNUC_UNUSED, gsize offset G_GNUC_UNUSED,
|
||||||
|
gssize size G_GNUC_UNUSED) { abort(); }
|
||||||
|
static inline void
|
||||||
|
gst_buffer_append_memory(GstBuffer *buf G_GNUC_UNUSED, GstMemory *mem G_GNUC_UNUSED) { abort(); }
|
||||||
|
static inline GstMemory *
|
||||||
|
gst_memory_ref(GstMemory *mem G_GNUC_UNUSED) { abort(); }
|
||||||
|
static inline void
|
||||||
|
gst_memory_unref(GstMemory *mem G_GNUC_UNUSED) { abort(); }
|
||||||
|
static inline GstMemory *
|
||||||
|
gst_memory_copy(GstMemory *mem G_GNUC_UNUSED, gssize offset G_GNUC_UNUSED, gssize size G_GNUC_UNUSED) { abort(); }
|
||||||
|
static inline void
|
||||||
|
gst_clear_buffer(GstBuffer **buf) { *buf = NULL; }
|
||||||
|
#define GST_IS_BUFFER(b) (false)
|
||||||
|
|
||||||
|
/* GstBufferCopyFlags — used only by gst_buffer_copy_into in dead code. */
|
||||||
|
#define GST_BUFFER_COPY_METADATA (1 << 0)
|
||||||
|
#define GST_BUFFER_COPY_MEMORY (1 << 1)
|
||||||
|
#define GST_BUFFER_COPY_DEEP (1 << 2)
|
||||||
|
|
||||||
|
/* gst_util_ceil_log2(n) — ceil(log2(n)) for non-zero unsigned n.
|
||||||
|
* Used by gsth265parser.c::gst_h265_slice_parse_ref_pic_list_modification.
|
||||||
|
* That function is in the slice-header parser which the libva backend
|
||||||
|
* does NOT invoke (we only call parse_sps) — but the linker still
|
||||||
|
* needs a definition. Provide a real impl: cheaper to compute than to
|
||||||
|
* justify a dead-code stub at every call site. */
|
||||||
|
static inline guint
|
||||||
|
gst_util_ceil_log2(guint32 n)
|
||||||
|
{
|
||||||
|
if (n <= 1) return 0;
|
||||||
|
/* __builtin_clz returns leading zeros for a 32-bit value;
|
||||||
|
* 32 - clz(n-1) = bits needed = ceil(log2(n)). */
|
||||||
|
return 32 - (guint)__builtin_clz(n - 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* GstMapInfo's real definition is in <gst/gstmemory.h>; we need at
|
||||||
|
* least enough to make `info->data` / `info->size` compile. */
|
||||||
|
struct _GstMapInfo {
|
||||||
|
GstMemory *memory;
|
||||||
|
int flags;
|
||||||
|
guint8 *data;
|
||||||
|
gsize size;
|
||||||
|
gsize maxsize;
|
||||||
|
gpointer user_data[4];
|
||||||
|
gpointer _gst_reserved[GST_PADDING];
|
||||||
|
};
|
||||||
|
|
||||||
|
/* gst_memory_new_wrapped — dead-code stub (nalutils.c calls it from
|
||||||
|
* the SEI-insertion path the libva backend never invokes). */
|
||||||
|
static inline GstMemory *
|
||||||
|
gst_memory_new_wrapped(int flags, gpointer data, gsize maxsize,
|
||||||
|
gsize offset, gsize size, gpointer user_data,
|
||||||
|
void (*notify)(gpointer))
|
||||||
|
{
|
||||||
|
(void)flags; (void)data; (void)maxsize; (void)offset; (void)size;
|
||||||
|
(void)user_data; (void)notify;
|
||||||
|
abort();
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ===== byte-order read / write macros =====
|
||||||
|
*
|
||||||
|
* GStreamer provides these as static-inline functions in
|
||||||
|
* <gst/gstutils.h>. We re-implement for aarch64 little-endian; the
|
||||||
|
* parser is byte-stream input, so endian-conversion is mechanical.
|
||||||
|
* The float / double variants are present in upstream but the parser
|
||||||
|
* never invokes them — provide stubs so the address-taking sites in
|
||||||
|
* gstbytereader.h's function table compile. */
|
||||||
|
|
||||||
|
#define GST_READ_UINT8(data) \
|
||||||
|
(*((const guint8 *)(data)))
|
||||||
|
|
||||||
|
#define GST_READ_UINT16_LE(data) ( \
|
||||||
|
((guint16)((const guint8 *)(data))[0]) | \
|
||||||
|
((guint16)((const guint8 *)(data))[1] << 8))
|
||||||
|
|
||||||
|
#define GST_READ_UINT16_BE(data) ( \
|
||||||
|
((guint16)((const guint8 *)(data))[0] << 8) | \
|
||||||
|
((guint16)((const guint8 *)(data))[1]))
|
||||||
|
|
||||||
|
#define GST_READ_UINT24_LE(data) ( \
|
||||||
|
((guint32)((const guint8 *)(data))[0]) | \
|
||||||
|
((guint32)((const guint8 *)(data))[1] << 8) | \
|
||||||
|
((guint32)((const guint8 *)(data))[2] << 16))
|
||||||
|
|
||||||
|
#define GST_READ_UINT24_BE(data) ( \
|
||||||
|
((guint32)((const guint8 *)(data))[0] << 16) | \
|
||||||
|
((guint32)((const guint8 *)(data))[1] << 8) | \
|
||||||
|
((guint32)((const guint8 *)(data))[2]))
|
||||||
|
|
||||||
|
#define GST_READ_UINT32_LE(data) ( \
|
||||||
|
((guint32)((const guint8 *)(data))[0]) | \
|
||||||
|
((guint32)((const guint8 *)(data))[1] << 8) | \
|
||||||
|
((guint32)((const guint8 *)(data))[2] << 16) | \
|
||||||
|
((guint32)((const guint8 *)(data))[3] << 24))
|
||||||
|
|
||||||
|
#define GST_READ_UINT32_BE(data) ( \
|
||||||
|
((guint32)((const guint8 *)(data))[0] << 24) | \
|
||||||
|
((guint32)((const guint8 *)(data))[1] << 16) | \
|
||||||
|
((guint32)((const guint8 *)(data))[2] << 8) | \
|
||||||
|
((guint32)((const guint8 *)(data))[3]))
|
||||||
|
|
||||||
|
#define GST_READ_UINT64_LE(data) ( \
|
||||||
|
((guint64)((const guint8 *)(data))[0]) | \
|
||||||
|
((guint64)((const guint8 *)(data))[1] << 8) | \
|
||||||
|
((guint64)((const guint8 *)(data))[2] << 16) | \
|
||||||
|
((guint64)((const guint8 *)(data))[3] << 24) | \
|
||||||
|
((guint64)((const guint8 *)(data))[4] << 32) | \
|
||||||
|
((guint64)((const guint8 *)(data))[5] << 40) | \
|
||||||
|
((guint64)((const guint8 *)(data))[6] << 48) | \
|
||||||
|
((guint64)((const guint8 *)(data))[7] << 56))
|
||||||
|
|
||||||
|
#define GST_READ_UINT64_BE(data) ( \
|
||||||
|
((guint64)((const guint8 *)(data))[0] << 56) | \
|
||||||
|
((guint64)((const guint8 *)(data))[1] << 48) | \
|
||||||
|
((guint64)((const guint8 *)(data))[2] << 40) | \
|
||||||
|
((guint64)((const guint8 *)(data))[3] << 32) | \
|
||||||
|
((guint64)((const guint8 *)(data))[4] << 24) | \
|
||||||
|
((guint64)((const guint8 *)(data))[5] << 16) | \
|
||||||
|
((guint64)((const guint8 *)(data))[6] << 8) | \
|
||||||
|
((guint64)((const guint8 *)(data))[7]))
|
||||||
|
|
||||||
|
/* Float / double readers — dead-code, abort if called. The function
|
||||||
|
* table in gstbytereader.h takes the address of the underlying inline
|
||||||
|
* which we don't need to be functional, only addressable. */
|
||||||
|
static inline gfloat
|
||||||
|
GST_READ_FLOAT_LE(const guint8 *data) { (void)data; abort(); }
|
||||||
|
static inline gfloat
|
||||||
|
GST_READ_FLOAT_BE(const guint8 *data) { (void)data; abort(); }
|
||||||
|
static inline gdouble
|
||||||
|
GST_READ_DOUBLE_LE(const guint8 *data) { (void)data; abort(); }
|
||||||
|
static inline gdouble
|
||||||
|
GST_READ_DOUBLE_BE(const guint8 *data) { (void)data; abort(); }
|
||||||
|
|
||||||
|
/* Write side — nalutils.c writes-out SEI bytes (dead path for us but
|
||||||
|
* must compile). */
|
||||||
|
#define GST_WRITE_UINT8(data, val) do { \
|
||||||
|
((guint8 *)(data))[0] = (guint8)(val); \
|
||||||
|
} while (0)
|
||||||
|
|
||||||
|
#define GST_WRITE_UINT16_BE(data, val) do { \
|
||||||
|
((guint8 *)(data))[0] = (guint8)((val) >> 8); \
|
||||||
|
((guint8 *)(data))[1] = (guint8)((val)); \
|
||||||
|
} while (0)
|
||||||
|
|
||||||
|
#define GST_WRITE_UINT24_BE(data, val) do { \
|
||||||
|
((guint8 *)(data))[0] = (guint8)((val) >> 16); \
|
||||||
|
((guint8 *)(data))[1] = (guint8)((val) >> 8); \
|
||||||
|
((guint8 *)(data))[2] = (guint8)((val)); \
|
||||||
|
} while (0)
|
||||||
|
|
||||||
|
#define GST_WRITE_UINT32_BE(data, val) do { \
|
||||||
|
((guint8 *)(data))[0] = (guint8)((val) >> 24); \
|
||||||
|
((guint8 *)(data))[1] = (guint8)((val) >> 16); \
|
||||||
|
((guint8 *)(data))[2] = (guint8)((val) >> 8); \
|
||||||
|
((guint8 *)(data))[3] = (guint8)((val)); \
|
||||||
|
} while (0)
|
||||||
|
|
||||||
|
#ifndef MIN
|
||||||
|
# define MIN(a, b) ((a) < (b) ? (a) : (b))
|
||||||
|
#endif
|
||||||
|
#ifndef MAX
|
||||||
|
# define MAX(a, b) ((a) > (b) ? (a) : (b))
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* ===== GArray ===== */
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
char *data; /* exposed via g_array_index / GArray->data */
|
||||||
|
guint len; /* element count */
|
||||||
|
guint capacity; /* allocated element slots */
|
||||||
|
guint element_size;
|
||||||
|
gboolean clear; /* zero-fill on grow */
|
||||||
|
void (*clear_func)(gpointer);
|
||||||
|
} GArray;
|
||||||
|
|
||||||
|
GArray *g_array_new(gboolean zero_terminated, gboolean clear, guint element_size);
|
||||||
|
GArray *g_array_sized_new(gboolean zero_terminated, gboolean clear,
|
||||||
|
guint element_size, guint reserved_size);
|
||||||
|
GArray *g_array_set_size(GArray *array, guint length);
|
||||||
|
GArray *g_array_append_vals(GArray *array, gconstpointer data, guint len);
|
||||||
|
void g_array_set_clear_func(GArray *array, void (*clear_func)(gpointer));
|
||||||
|
gchar *g_array_free(GArray *array, gboolean free_segment);
|
||||||
|
GArray *g_array_unref(GArray *array);
|
||||||
|
|
||||||
|
#define g_array_append_val(a, v) g_array_append_vals((a), &(v), 1)
|
||||||
|
#define g_array_index(a, t, i) (((t *)(void *)(a)->data)[i])
|
||||||
|
|
||||||
|
/* ===== GList — stubs that abort if reached =====
|
||||||
|
*
|
||||||
|
* Surveyed call sites: gsth265parser.c uses g_list_prepend / g_list_sort /
|
||||||
|
* g_list_free_full in code paths the libva backend does not invoke for
|
||||||
|
* basic SPS parsing (likely SEI message accumulation). Stub to abort so
|
||||||
|
* any future call surfaces immediately rather than silently corrupting. */
|
||||||
|
|
||||||
|
/* GList — full struct (not opaque) so callers can do `list->data`.
|
||||||
|
* The functions still abort because we never construct a GList. */
|
||||||
|
typedef struct _GList GList;
|
||||||
|
struct _GList {
|
||||||
|
gpointer data;
|
||||||
|
GList *next;
|
||||||
|
GList *prev;
|
||||||
|
};
|
||||||
|
|
||||||
|
static inline GList *g_list_prepend(GList *list G_GNUC_UNUSED, gpointer data G_GNUC_UNUSED) { abort(); }
|
||||||
|
static inline GList *g_list_sort(GList *list G_GNUC_UNUSED, int (*cmp)(gconstpointer, gconstpointer) G_GNUC_UNUSED) { abort(); }
|
||||||
|
static inline void g_list_free_full(GList *list G_GNUC_UNUSED, void (*free_func)(gpointer) G_GNUC_UNUSED) { abort(); }
|
||||||
|
|
||||||
|
/* ===== g_once_init_enter / g_once_init_leave =====
|
||||||
|
*
|
||||||
|
* GLib's lazy-init guards. The parser uses these for one-shot static
|
||||||
|
* initialization (e.g. profile-name table). Our backend is single-
|
||||||
|
* threaded at the parser-init site (driver_init), so we can simplify
|
||||||
|
* to a plain run-once gate. */
|
||||||
|
|
||||||
|
#define g_once_init_enter(loc) (*(loc) == 0)
|
||||||
|
#define g_once_init_leave(loc, val) (*(loc) = (val))
|
||||||
|
|
||||||
|
/* ===== conversions ===== */
|
||||||
|
|
||||||
|
#define GINT_TO_POINTER(i) ((gpointer)(uintptr_t)(gint)(i))
|
||||||
|
#define GPOINTER_TO_INT(p) ((gint)(uintptr_t)(p))
|
||||||
|
|
||||||
|
#endif /* LIBVA_V4L2_REQUEST_FOURIER_GST_COMPAT_H */
|
||||||
@@ -0,0 +1,90 @@
|
|||||||
|
/*
|
||||||
|
* v4l2-hevc-ext-controls.h — verbatim mirror of Linux 7.0+ V4L2 stateless
|
||||||
|
* HEVC extended-SPS RPS control definitions, shipped as an internal
|
||||||
|
* header so this libva backend can be built against pre-7.0
|
||||||
|
* linux-api-headers packages (currently ampere ships 6.19-1).
|
||||||
|
*
|
||||||
|
* Upstream source: linux kernel, include/uapi/linux/v4l2-controls.h
|
||||||
|
* As-of: Linux 7.0-rc3 (Detlev Casanova / Collabora "VDPU381/VDPU383"
|
||||||
|
* series, see lkml.org/lkml/2026/1/9/1334). The two CIDs + two structs
|
||||||
|
* + two flag macros below are byte-for-byte the kernel UAPI definitions.
|
||||||
|
*
|
||||||
|
* Once linux-api-headers >= 7.0 is the floor across the fleet, this
|
||||||
|
* shim becomes redundant — `<linux/v4l2-controls.h>` will provide the
|
||||||
|
* same symbols. The include order in h265.c is: this header BEFORE
|
||||||
|
* <linux/v4l2-controls.h>, so when the system catches up, the macro
|
||||||
|
* guards below silently no-op and we use the system definitions.
|
||||||
|
*
|
||||||
|
* License: MIT (matches backend's COPYING.MIT). Per LGPL § 3.b., the
|
||||||
|
* kernel UAPI struct definitions themselves are excepted from the
|
||||||
|
* kernel's overall GPL and may be copied verbatim into userspace
|
||||||
|
* binaries without inheriting GPL.
|
||||||
|
*
|
||||||
|
* Rationale + iter2 plan: see
|
||||||
|
* ~/src/ampere-kernel-decoders/phase4_plan_iter2.md (§Step 3)
|
||||||
|
* ~/src/ampere-kernel-decoders/phase0_findings_iter2.md
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef LIBVA_V4L2_REQUEST_FOURIER_V4L2_HEVC_EXT_CONTROLS_H
|
||||||
|
#define LIBVA_V4L2_REQUEST_FOURIER_V4L2_HEVC_EXT_CONTROLS_H
|
||||||
|
|
||||||
|
#include <linux/types.h>
|
||||||
|
#include <linux/v4l2-controls.h>
|
||||||
|
|
||||||
|
#ifndef V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS
|
||||||
|
# define V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS \
|
||||||
|
(V4L2_CID_CODEC_STATELESS_BASE + 408)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifndef V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS
|
||||||
|
# define V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS \
|
||||||
|
(V4L2_CID_CODEC_STATELESS_BASE + 409)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifndef V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED
|
||||||
|
# define V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED 0x1
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifndef V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT
|
||||||
|
# define V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT 0x1
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/*
|
||||||
|
* struct v4l2_ctrl_hevc_ext_sps_st_rps — HEVC short-term RPS parameters.
|
||||||
|
*
|
||||||
|
* Dynamic-size 1-dimension array. Number of elements is
|
||||||
|
* v4l2_ctrl_hevc_sps::num_short_term_ref_pic_sets
|
||||||
|
* Can contain up to 65 elements (the H.265 spec § 7.4.3.2.1 maximum).
|
||||||
|
*/
|
||||||
|
#ifndef V4L2_HEVC_EXT_SPS_ST_RPS_DEFINED
|
||||||
|
# define V4L2_HEVC_EXT_SPS_ST_RPS_DEFINED 1
|
||||||
|
struct v4l2_ctrl_hevc_ext_sps_st_rps {
|
||||||
|
__u8 delta_idx_minus1;
|
||||||
|
__u8 delta_rps_sign;
|
||||||
|
__u8 num_negative_pics;
|
||||||
|
__u8 num_positive_pics;
|
||||||
|
__u32 used_by_curr_pic;
|
||||||
|
__u32 use_delta_flag;
|
||||||
|
__u16 abs_delta_rps_minus1;
|
||||||
|
__u16 delta_poc_s0_minus1[16];
|
||||||
|
__u16 delta_poc_s1_minus1[16];
|
||||||
|
__u16 flags;
|
||||||
|
};
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/*
|
||||||
|
* struct v4l2_ctrl_hevc_ext_sps_lt_rps — HEVC long-term RPS parameters.
|
||||||
|
*
|
||||||
|
* Dynamic-size 1-dimension array. Number of elements is
|
||||||
|
* v4l2_ctrl_hevc_sps::num_long_term_ref_pics_sps
|
||||||
|
* Can contain up to 33 elements (the H.265 spec § 7.4.3.2.1 maximum).
|
||||||
|
*/
|
||||||
|
#ifndef V4L2_HEVC_EXT_SPS_LT_RPS_DEFINED
|
||||||
|
# define V4L2_HEVC_EXT_SPS_LT_RPS_DEFINED 1
|
||||||
|
struct v4l2_ctrl_hevc_ext_sps_lt_rps {
|
||||||
|
__u16 lt_ref_pic_poc_lsb_sps;
|
||||||
|
__u16 flags;
|
||||||
|
};
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#endif /* LIBVA_V4L2_REQUEST_FOURIER_V4L2_HEVC_EXT_CONTROLS_H */
|
||||||
+172
-28
@@ -39,6 +39,8 @@
|
|||||||
|
|
||||||
#include <linux/dma-buf.h>
|
#include <linux/dma-buf.h>
|
||||||
|
|
||||||
|
#include "nv15.h"
|
||||||
|
#include "nv12_col128.h"
|
||||||
#include "tiled_yuv.h"
|
#include "tiled_yuv.h"
|
||||||
#include "utils.h"
|
#include "utils.h"
|
||||||
#include "v4l2.h"
|
#include "v4l2.h"
|
||||||
@@ -86,13 +88,50 @@ VAStatus RequestCreateImage(VADriverContextP context, VAImageFormat *format,
|
|||||||
for (i = 0; i < planes_count; i++)
|
for (i = 0; i < planes_count; i++)
|
||||||
size += destination_sizes[i];
|
size += destination_sizes[i];
|
||||||
|
|
||||||
/* Here we calculate the sizes assuming NV12. */
|
if (format->fourcc == VA_FOURCC_P010) {
|
||||||
|
/*
|
||||||
|
* iter39: P010 image overrides V4L2-side NV15 sizing. The
|
||||||
|
* source is the kernel-reported NV15 packed plane; the image
|
||||||
|
* buffer holds dense P010 (2 bytes per pixel, 16bpp).
|
||||||
|
* Recompute sizes/pitches against P010 layout so consumers
|
||||||
|
* (vaGetImage, vaDeriveImage) see standard P010 geometry.
|
||||||
|
*/
|
||||||
|
destination_bytesperlines[0] = width * 2;
|
||||||
|
destination_sizes[0] = destination_bytesperlines[0] * format_height;
|
||||||
|
for (i = 1; i < destination_planes_count; i++) {
|
||||||
|
destination_bytesperlines[i] = destination_bytesperlines[0];
|
||||||
|
destination_sizes[i] = destination_sizes[0] / 2;
|
||||||
|
}
|
||||||
|
size = 0;
|
||||||
|
for (i = 0; i < destination_planes_count; i++)
|
||||||
|
size += destination_sizes[i];
|
||||||
|
} else if (format->fourcc == VA_FOURCC_NV12 &&
|
||||||
|
video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128) {
|
||||||
|
/*
|
||||||
|
* iter40 Phase 5 review F2: NC12 source, NV12 image output.
|
||||||
|
* V4L2-reported destination_bytesperlines[0] is the NC12
|
||||||
|
* column stride (= ALIGN(height,8) * 3/2 — e.g. 1080 for
|
||||||
|
* 1280×720), NOT the linear NV12 Y stride. Override to the
|
||||||
|
* linear stride (width) so VAImage pitches reflect the
|
||||||
|
* detile-output layout the consumer reads.
|
||||||
|
*/
|
||||||
|
destination_bytesperlines[0] = width;
|
||||||
|
destination_sizes[0] = destination_bytesperlines[0] * format_height;
|
||||||
|
for (i = 1; i < destination_planes_count; i++) {
|
||||||
|
destination_bytesperlines[i] = destination_bytesperlines[0];
|
||||||
|
destination_sizes[i] = destination_sizes[0] / 2;
|
||||||
|
}
|
||||||
|
size = 0;
|
||||||
|
for (i = 0; i < destination_planes_count; i++)
|
||||||
|
size += destination_sizes[i];
|
||||||
|
} else {
|
||||||
|
/* NV12: V4L2 stride is correct, sizes derived from height. */
|
||||||
|
destination_sizes[0] = destination_bytesperlines[0] * format_height;
|
||||||
|
|
||||||
destination_sizes[0] = destination_bytesperlines[0] * format_height;
|
for (i = 1; i < destination_planes_count; i++) {
|
||||||
|
destination_bytesperlines[i] = destination_bytesperlines[0];
|
||||||
for (i = 1; i < destination_planes_count; i++) {
|
destination_sizes[i] = destination_sizes[0] / 2;
|
||||||
destination_bytesperlines[i] = destination_bytesperlines[0];
|
}
|
||||||
destination_sizes[i] = destination_sizes[0] / 2;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
id = object_heap_allocate(&driver_data->image_heap);
|
id = object_heap_allocate(&driver_data->image_heap);
|
||||||
@@ -217,19 +256,90 @@ static VAStatus copy_surface_to_image (struct request_data *driver_data,
|
|||||||
}
|
}
|
||||||
|
|
||||||
for (i = 0; i < surface_object->destination_planes_count; i++) {
|
for (i = 0; i < surface_object->destination_planes_count; i++) {
|
||||||
#ifdef __arm__
|
/*
|
||||||
|
* iter40 Phase 5 review F1: guard extended from __arm__ to
|
||||||
|
* __arm__ || __aarch64__. Without this, the detile primitives
|
||||||
|
* silently compiled out on aarch64 (fresnel RK3399, ampere
|
||||||
|
* RK3588, higgs Pi CM5) and the memcpy fall-through delivered
|
||||||
|
* raw tiled bytes to NV12/P010 image consumers. iter39 5/5
|
||||||
|
* PASS masked the issue because no 10-bit path was exercised.
|
||||||
|
*/
|
||||||
|
#if defined(__arm__) || defined(__aarch64__)
|
||||||
|
/*
|
||||||
|
* Sunxi tiled_to_planar lives in tiled_yuv.S which is
|
||||||
|
* #ifdef __arm__ — symbol absent on aarch64. Keep this
|
||||||
|
* branch arm-only; aarch64 Sunxi support would need a C or
|
||||||
|
* aarch64-ASM port (no Sunxi aarch64 board in current fleet).
|
||||||
|
*/
|
||||||
|
#if defined(__arm__)
|
||||||
if (!video_format_is_linear(driver_data->video_format))
|
if (!video_format_is_linear(driver_data->video_format))
|
||||||
tiled_to_planar(surface_object->destination_data[i],
|
tiled_to_planar(surface_object->destination_data[i],
|
||||||
buffer_object->data + image->offsets[i],
|
buffer_object->data + image->offsets[i],
|
||||||
image->pitches[i], image->width,
|
image->pitches[i], image->width,
|
||||||
i == 0 ? image->height :
|
i == 0 ? image->height :
|
||||||
image->height / 2);
|
image->height / 2);
|
||||||
else {
|
else
|
||||||
|
#endif
|
||||||
|
if (driver_data->is_10bit &&
|
||||||
|
image->format.fourcc == VA_FOURCC_P010) {
|
||||||
|
/*
|
||||||
|
* iter39: rkvdec emits NV15 (4×10-bit packed in 5
|
||||||
|
* bytes); the VA image buffer is dense P010 (2B/pixel,
|
||||||
|
* value in bits[15:6]). Source stride is the V4L2-
|
||||||
|
* reported NV15 bytesperline (= ceil(width/4)*5,
|
||||||
|
* possibly aligned higher by the kernel); destination
|
||||||
|
* stride is image->pitches[i] = width * 2.
|
||||||
|
*/
|
||||||
|
unsigned int plane_h = (i == 0) ? image->height
|
||||||
|
: image->height / 2;
|
||||||
|
nv15_unpack_plane_to_p010(
|
||||||
|
surface_object->destination_data[i],
|
||||||
|
(uint16_t *)(buffer_object->data + image->offsets[i]),
|
||||||
|
image->width, plane_h,
|
||||||
|
surface_object->destination_bytesperlines[i]);
|
||||||
|
} else if (driver_data->video_format != NULL &&
|
||||||
|
driver_data->video_format->v4l2_format ==
|
||||||
|
V4L2_PIX_FMT_NV12_COL128 &&
|
||||||
|
image->format.fourcc == VA_FOURCC_NV12) {
|
||||||
|
/*
|
||||||
|
* iter40: Pi 5 rpi-hevc-dec emits NV12_COL128 (SAND
|
||||||
|
* 128-pixel-wide column tiles). Detile to linear NV12
|
||||||
|
* via the per-plane primitive. surface_object->
|
||||||
|
* destination_data[i] is the V4L2 CAPTURE mmap (single
|
||||||
|
* buffer, planes_count==2): i==0 is the Y plane base,
|
||||||
|
* i==1 is the UV plane base offset within the SAME
|
||||||
|
* physical buffer (per cap_pool plane[1] offset = Y
|
||||||
|
* plane size in COL128 layout).
|
||||||
|
*
|
||||||
|
* src_col_stride = destination_bytesperlines[i] = the
|
||||||
|
* kernel-reported NC12 bytesperline (column stride,
|
||||||
|
* = ALIGN(image_h, 8) * 3/2). Same for both planes
|
||||||
|
* since column geometry is plane-agnostic.
|
||||||
|
*
|
||||||
|
* dst stride is image->pitches[i] = image->width
|
||||||
|
* (overridden in RequestCreateImage NC12 branch below).
|
||||||
|
*/
|
||||||
|
if (i == 0) {
|
||||||
|
nv12_col128_detile_y(
|
||||||
|
(uint8_t *)(buffer_object->data + image->offsets[i]),
|
||||||
|
image->pitches[i],
|
||||||
|
surface_object->destination_data[i],
|
||||||
|
surface_object->destination_bytesperlines[i],
|
||||||
|
image->width, image->height);
|
||||||
|
} else {
|
||||||
|
nv12_col128_detile_uv(
|
||||||
|
(uint8_t *)(buffer_object->data + image->offsets[i]),
|
||||||
|
image->pitches[i],
|
||||||
|
surface_object->destination_data[i],
|
||||||
|
surface_object->destination_bytesperlines[i],
|
||||||
|
image->width, image->height / 2);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
#endif
|
#endif
|
||||||
memcpy(buffer_object->data + image->offsets[i],
|
memcpy(buffer_object->data + image->offsets[i],
|
||||||
surface_object->destination_data[i],
|
surface_object->destination_data[i],
|
||||||
surface_object->destination_sizes[i]);
|
surface_object->destination_sizes[i]);
|
||||||
#ifdef __arm__
|
#if defined(__arm__) || defined(__aarch64__)
|
||||||
}
|
}
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
@@ -268,9 +378,17 @@ VAStatus RequestDeriveImage(VADriverContextP context, VASurfaceID surface_id,
|
|||||||
|
|
||||||
/* Fully populate VAImageFormat to match QueryImageFormats output. */
|
/* Fully populate VAImageFormat to match QueryImageFormats output. */
|
||||||
memset(&format, 0, sizeof(format));
|
memset(&format, 0, sizeof(format));
|
||||||
format.fourcc = VA_FOURCC_NV12;
|
if (driver_data->is_10bit) {
|
||||||
format.byte_order = VA_LSB_FIRST;
|
/* iter39: 10-bit session derives a P010 image. NV15-source
|
||||||
format.bits_per_pixel = 12;
|
* unpack happens in copy_surface_to_image. */
|
||||||
|
format.fourcc = VA_FOURCC_P010;
|
||||||
|
format.byte_order = VA_LSB_FIRST;
|
||||||
|
format.bits_per_pixel = 24;
|
||||||
|
} else {
|
||||||
|
format.fourcc = VA_FOURCC_NV12;
|
||||||
|
format.byte_order = VA_LSB_FIRST;
|
||||||
|
format.bits_per_pixel = 12;
|
||||||
|
}
|
||||||
|
|
||||||
status = RequestCreateImage(context, &format, surface_object->width,
|
status = RequestCreateImage(context, &format, surface_object->width,
|
||||||
surface_object->height, image);
|
surface_object->height, image);
|
||||||
@@ -305,26 +423,52 @@ VAStatus RequestDeriveImage(VADriverContextP context, VASurfaceID surface_id,
|
|||||||
VAStatus RequestQueryImageFormats(VADriverContextP context,
|
VAStatus RequestQueryImageFormats(VADriverContextP context,
|
||||||
VAImageFormat *formats, int *formats_count)
|
VAImageFormat *formats, int *formats_count)
|
||||||
{
|
{
|
||||||
|
struct request_data *driver_data = context->pDriverData;
|
||||||
|
int n = 0;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Populate the VAImageFormat fully per VAAPI spec for NV12 —
|
* Populate the VAImageFormat fully per VAAPI spec — not just
|
||||||
* not just .fourcc. Consumers (FFmpeg's hwcontext_vaapi, mpv,
|
* .fourcc. Consumers (FFmpeg's hwcontext_vaapi, mpv, Firefox)
|
||||||
* Firefox) read .byte_order and .bits_per_pixel; leaving them
|
* read .byte_order and .bits_per_pixel; leaving them
|
||||||
* uninitialized inherits whatever caller-stack garbage is in
|
* uninitialized inherits caller-stack garbage and produces
|
||||||
* the buffer and produces non-deterministic behavior. Reference:
|
* non-deterministic behavior. Reference: Mesa's
|
||||||
* Mesa's gallium/frontends/va/image.c::vlVaQueryImageFormats and
|
* gallium/frontends/va/image.c::vlVaQueryImageFormats and
|
||||||
* intel-vaapi-driver's i965_drv_video.c — both publish NV12
|
* intel-vaapi-driver's i965_drv_video.c.
|
||||||
* with byte_order=VA_LSB_FIRST and bits_per_pixel=12.
|
|
||||||
*
|
*
|
||||||
* For YUV formats, depth/red_mask/green_mask/blue_mask/alpha_mask
|
* iter39: advertise P010 when an active session is 10-bit so
|
||||||
* are not meaningful (those describe RGB bit layouts); leave them
|
* ffmpeg-vaapi sees a valid 10-bit-compatible entry during
|
||||||
* zeroed via memset before populating.
|
* vaQueryImageFormats. NV12 stays advertised unconditionally so
|
||||||
|
* the 8-bit catalog query response is unchanged.
|
||||||
*/
|
*/
|
||||||
memset(&formats[0], 0, sizeof(formats[0]));
|
memset(&formats[n], 0, sizeof(formats[n]));
|
||||||
formats[0].fourcc = VA_FOURCC_NV12;
|
formats[n].fourcc = VA_FOURCC_NV12;
|
||||||
formats[0].byte_order = VA_LSB_FIRST;
|
formats[n].byte_order = VA_LSB_FIRST;
|
||||||
formats[0].bits_per_pixel = 12;
|
formats[n].bits_per_pixel = 12;
|
||||||
*formats_count = 1;
|
n++;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* iter39 Option B revert (2026-05-17): P010 advertisement is
|
||||||
|
* gated on driver_data->is_10bit again. Previously advertised
|
||||||
|
* unconditionally (63fed87) so ffmpeg-vaapi's early
|
||||||
|
* vaQueryImageFormats (pre-vaCreateContext) could see it for
|
||||||
|
* 10-bit profiles — but that broke HEVC 8-bit on fresnel:
|
||||||
|
* ffmpeg-vaapi picked P010 for the HEVC hwframe pool, EndPicture
|
||||||
|
* SEGV'd in the .so when the consumer-side P010 expectations met
|
||||||
|
* an 8-bit NV12 CAPTURE buffer.
|
||||||
|
* Safe because Option B drops VAProfileHEVCMain10 + Hi10P from
|
||||||
|
* enumeration — no 10-bit decode pipeline will reach this catalog
|
||||||
|
* query so the gate-on-is_10bit (which stays false for 8-bit
|
||||||
|
* profiles) correctly returns NV12-only.
|
||||||
|
*/
|
||||||
|
if (driver_data->is_10bit && n < V4L2_REQUEST_MAX_IMAGE_FORMATS) {
|
||||||
|
memset(&formats[n], 0, sizeof(formats[n]));
|
||||||
|
formats[n].fourcc = VA_FOURCC_P010;
|
||||||
|
formats[n].byte_order = VA_LSB_FIRST;
|
||||||
|
formats[n].bits_per_pixel = 24;
|
||||||
|
n++;
|
||||||
|
}
|
||||||
|
|
||||||
|
*formats_count = n;
|
||||||
|
|
||||||
return VA_STATUS_SUCCESS;
|
return VA_STATUS_SUCCESS;
|
||||||
}
|
}
|
||||||
|
|||||||
+41
-3
@@ -22,6 +22,9 @@
|
|||||||
|
|
||||||
autoconf_data = configuration_data()
|
autoconf_data = configuration_data()
|
||||||
autoconf_data.set('VA_DRIVER_INIT_FUNC', va_driver_init_func)
|
autoconf_data.set('VA_DRIVER_INIT_FUNC', va_driver_init_func)
|
||||||
|
if get_option('daedalus_v4l2')
|
||||||
|
autoconf_data.set('HAVE_DAEDALUS_V4L2', 1)
|
||||||
|
endif
|
||||||
|
|
||||||
autoconf = configure_file(
|
autoconf = configure_file(
|
||||||
output: 'autoconfig.h',
|
output: 'autoconfig.h',
|
||||||
@@ -50,7 +53,19 @@ sources = [
|
|||||||
'h265.c',
|
'h265.c',
|
||||||
'vp8.c',
|
'vp8.c',
|
||||||
'vp9.c',
|
'vp9.c',
|
||||||
'codec.c'
|
'av1.c',
|
||||||
|
'codec.c',
|
||||||
|
'nv15.c',
|
||||||
|
'nv12_col128.c',
|
||||||
|
|
||||||
|
# Vendored GStreamer 1.28.2 H.265 parser + utilities (LGPL v2.1+,
|
||||||
|
# see src/h265_parser/gst_compat.h for sourcing notes + per-iter2
|
||||||
|
# adaptation strategy).
|
||||||
|
'h265_parser/gst_compat.c',
|
||||||
|
'h265_parser/gst/base/gstbitreader.c',
|
||||||
|
'h265_parser/gst/base/gstbytereader.c',
|
||||||
|
'h265_parser/gst/codecparsers/nalutils.c',
|
||||||
|
'h265_parser/gst/codecparsers/gsth265parser.c'
|
||||||
]
|
]
|
||||||
|
|
||||||
headers = [
|
headers = [
|
||||||
@@ -76,11 +91,34 @@ headers = [
|
|||||||
'h265.h',
|
'h265.h',
|
||||||
'vp8.h',
|
'vp8.h',
|
||||||
'vp9.h',
|
'vp9.h',
|
||||||
'codec.h'
|
'codec.h',
|
||||||
|
'nv15.h',
|
||||||
|
'nv12_col128.h',
|
||||||
|
|
||||||
|
# Internal mirror of Linux 7.0 V4L2 HEVC EXT_SPS_*_RPS UAPI defs
|
||||||
|
# (allows building against pre-7.0 linux-api-headers; redundant
|
||||||
|
# once the host headers are 7.0+).
|
||||||
|
'hevc-ctrls/v4l2-hevc-ext-controls.h',
|
||||||
|
|
||||||
|
# Vendored GStreamer + project shim headers (see sources above).
|
||||||
|
'h265_parser/gst_compat.h',
|
||||||
|
'h265_parser/gst/gst.h',
|
||||||
|
'h265_parser/gst/glib-compat-private.h',
|
||||||
|
'h265_parser/gst/base/base-prelude.h',
|
||||||
|
'h265_parser/gst/base/gstbitreader.h',
|
||||||
|
'h265_parser/gst/base/gstbytereader.h',
|
||||||
|
'h265_parser/gst/base/gstbitwriter.h',
|
||||||
|
'h265_parser/gst/codecparsers/codecparsers-prelude.h',
|
||||||
|
'h265_parser/gst/codecparsers/gsth265parser.h',
|
||||||
|
'h265_parser/gst/codecparsers/nalutils.h'
|
||||||
]
|
]
|
||||||
|
|
||||||
includes = [
|
includes = [
|
||||||
include_directories('../include')
|
include_directories('../include'),
|
||||||
|
# Vendored GStreamer parser tree — the parser's #include <gst/base/...>
|
||||||
|
# style references resolve here via stub headers that redirect to
|
||||||
|
# gst_compat.h.
|
||||||
|
include_directories('h265_parser')
|
||||||
]
|
]
|
||||||
|
|
||||||
cflags = [
|
cflags = [
|
||||||
|
|||||||
@@ -0,0 +1,114 @@
|
|||||||
|
/*
|
||||||
|
* V4L2_PIX_FMT_NV12_COL128 → linear NV12 detile primitive. Pi 5 / CM5
|
||||||
|
* rpi-hevc-dec CAPTURE. iter40 (2026-05-17).
|
||||||
|
*
|
||||||
|
* Math derived from kernel hevc_d_video.c (size formula) +
|
||||||
|
* ffmpeg/Kynesim libavutil/rpi_sand_fn_pw.h (per-pixel offset). The
|
||||||
|
* single-stripe fast path memcpy's 128 bytes at a time when an output
|
||||||
|
* row falls entirely within one tile column (the common case);
|
||||||
|
* straddling rows are split into two memcpy halves.
|
||||||
|
*
|
||||||
|
* No NEON / SIMD here — correctness first. Each output row generates
|
||||||
|
* (width / 128) + ~1 memcpys of up to 128 bytes; for 1920x1080 that's
|
||||||
|
* ~17000 small memcpys per frame, fine for Phase 1 PoC.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "nv12_col128.h"
|
||||||
|
|
||||||
|
#include <string.h>
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Tile column width in bytes. The 'COL128' name embeds this; if it ever
|
||||||
|
* varies, take it from V4L2_PIX_FMT_NV12_COL128's kernel definition.
|
||||||
|
*/
|
||||||
|
#define NC12_TILE_W 128
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Common Y / UV plane detile — the layout is identical (single-byte per
|
||||||
|
* pixel, column-major 128-wide tiles). The only thing that varies is
|
||||||
|
* what plane the caller passes in. width here is plane width in bytes
|
||||||
|
* (= image width for both Y and CbCr-interleaved NV12 UV); height is
|
||||||
|
* plane height in pixels (image height for Y, image height / 2 for UV).
|
||||||
|
*/
|
||||||
|
static void nv12_col128_detile_plane(uint8_t *dst, unsigned int dst_stride,
|
||||||
|
const uint8_t *src,
|
||||||
|
unsigned int src_col_stride,
|
||||||
|
unsigned int width, unsigned int height)
|
||||||
|
{
|
||||||
|
unsigned int y, x;
|
||||||
|
|
||||||
|
for (y = 0; y < height; y++) {
|
||||||
|
uint8_t *drow = dst + y * dst_stride;
|
||||||
|
x = 0;
|
||||||
|
while (x < width) {
|
||||||
|
unsigned int col = x / NC12_TILE_W;
|
||||||
|
unsigned int in_col = x % NC12_TILE_W;
|
||||||
|
unsigned int n = NC12_TILE_W - in_col;
|
||||||
|
if (n > width - x)
|
||||||
|
n = width - x;
|
||||||
|
/*
|
||||||
|
* Source byte = base + col*128*col_stride + y*128 + in_col
|
||||||
|
* Copy n contiguous bytes (all within this tile column,
|
||||||
|
* since n is capped at the remaining width-in-column).
|
||||||
|
*/
|
||||||
|
const uint8_t *p = src
|
||||||
|
+ (size_t)col * NC12_TILE_W * src_col_stride
|
||||||
|
+ (size_t)y * NC12_TILE_W
|
||||||
|
+ in_col;
|
||||||
|
memcpy(drow + x, p, n);
|
||||||
|
x += n;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void nv12_col128_detile_y(uint8_t *dst, unsigned int dst_stride,
|
||||||
|
const uint8_t *src_y, unsigned int src_col_stride,
|
||||||
|
unsigned int width, unsigned int height)
|
||||||
|
{
|
||||||
|
nv12_col128_detile_plane(dst, dst_stride, src_y, src_col_stride,
|
||||||
|
width, height);
|
||||||
|
}
|
||||||
|
|
||||||
|
void nv12_col128_detile_uv(uint8_t *dst, unsigned int dst_stride,
|
||||||
|
const uint8_t *src_uv, unsigned int src_col_stride,
|
||||||
|
unsigned int width, unsigned int uv_height)
|
||||||
|
{
|
||||||
|
/* UV plane (CbCr interleaved): byte-width equals Y-plane width
|
||||||
|
* (one Cb + one Cr per 2x2 Y block → 2 bytes per 2 horizontal Y
|
||||||
|
* samples → 1 byte per Y pixel horizontally). Height is half. */
|
||||||
|
nv12_col128_detile_plane(dst, dst_stride, src_uv, src_col_stride,
|
||||||
|
width, uv_height);
|
||||||
|
}
|
||||||
|
|
||||||
|
unsigned int nv12_col128_uv_plane_offset(unsigned int image_width,
|
||||||
|
unsigned int image_height)
|
||||||
|
{
|
||||||
|
unsigned int aligned_h = (image_height + 7) & ~7u;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* In the COL128 SAND layout, Y and UV are NOT separate planes
|
||||||
|
* concatenated end-to-end. Within EACH 128-pixel-wide column:
|
||||||
|
* first 128 * height bytes = Y data for this column strip
|
||||||
|
* next 128 * height / 2 bytes = UV data for this column strip
|
||||||
|
* total 128 * bytesperline (= 128 * height * 3/2) bytes per column
|
||||||
|
*
|
||||||
|
* The "UV plane base" pointer (data[1] in AVFrame convention) is
|
||||||
|
* just data[0] + (128 * height) — the offset of the UV bytes
|
||||||
|
* WITHIN the first column. All subsequent UV bytes are reached by
|
||||||
|
* the same column-stride arithmetic the Y plane uses (col *
|
||||||
|
* 128 * bytesperline + y * 128 + in_col), so passing this offset
|
||||||
|
* pointer + iterating y over [0, height/2) traverses all UV rows
|
||||||
|
* across all columns correctly.
|
||||||
|
*
|
||||||
|
* Earlier wrong formula was num_columns * 128 * aligned_h (i.e.
|
||||||
|
* sizeof(linear Y plane)) — that pushed past the end of the SAND
|
||||||
|
* buffer because the layout isn't planes-end-to-end.
|
||||||
|
*
|
||||||
|
* Cross-check: kernel sizeimage = bytesperline * width =
|
||||||
|
* (aligned_h * 3/2) * num_columns * 128 = num_columns * 128 *
|
||||||
|
* aligned_h * 3/2. Per column: 128 * aligned_h * 3/2. Y portion
|
||||||
|
* per column: 128 * aligned_h. UV portion per column: half of Y.
|
||||||
|
* Sum across columns: matches sizeimage.
|
||||||
|
*/
|
||||||
|
return NC12_TILE_W * aligned_h;
|
||||||
|
}
|
||||||
@@ -0,0 +1,88 @@
|
|||||||
|
/*
|
||||||
|
* V4L2_PIX_FMT_NV12_COL128 (NC12) SAND-tiled → linear NV12 detile.
|
||||||
|
*
|
||||||
|
* Pi 5 / CM5 (BCM2712) rpi-hevc-dec CAPTURE format. iter40 (2026-05-17).
|
||||||
|
*
|
||||||
|
* Layout (kernel drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c
|
||||||
|
* size-formula + ffmpeg/Kynesim libavutil/rpi_sand_fn_pw.h per-pixel
|
||||||
|
* offset math):
|
||||||
|
*
|
||||||
|
* width ALIGN(image_width, 128) -- columns are 128 px wide
|
||||||
|
* height ALIGN(image_height, 8)
|
||||||
|
* col_stride (= bytesperline) = height * 3 / 2
|
||||||
|
* (bytes per [128-wide column] vertical unit incl. Y + UV)
|
||||||
|
* sizeimage = col_stride * width = total bytes
|
||||||
|
*
|
||||||
|
* For pixel (x, y) in the Y plane:
|
||||||
|
* col = x / 128
|
||||||
|
* in_col_x = x % 128
|
||||||
|
* offset = col * col_stride * 128 + y * 128 + in_col_x
|
||||||
|
*
|
||||||
|
* UV plane starts at offset (128 * height * num_columns_y) — the same
|
||||||
|
* per-column layout, h/2 rows tall (CbCr interleaved).
|
||||||
|
*
|
||||||
|
* The primitive copies the entire image extent at once. width/height are
|
||||||
|
* the cropped consumer-visible dimensions; src_col_stride is the kernel-
|
||||||
|
* reported bytesperline (i.e. ALIGN(height,8) * 3/2).
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef _NV12_COL128_H_
|
||||||
|
#define _NV12_COL128_H_
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
|
||||||
|
#include <linux/videodev2.h>
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Pre-Pi-kernel headers (Arch ALARM linux-api-headers, older mainline
|
||||||
|
* kernel-headers packages) may not define V4L2_PIX_FMT_NV12_COL128. The
|
||||||
|
* fourcc is Pi-specific. Provide a private fallback so the backend
|
||||||
|
* builds on hosts that target NON-Pi codecs too.
|
||||||
|
*/
|
||||||
|
#ifndef V4L2_PIX_FMT_NV12_COL128
|
||||||
|
#define V4L2_PIX_FMT_NV12_COL128 \
|
||||||
|
((unsigned int)('N') | ((unsigned int)('C') << 8) | \
|
||||||
|
((unsigned int)('1') << 16) | ((unsigned int)('2') << 24))
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifndef V4L2_PIX_FMT_NV12_10_COL128
|
||||||
|
/* 10-bit SAND variant: 3 pixels packed into 4 bytes in 128-byte / 96-pixel
|
||||||
|
* wide columns. iter40 references the fourcc for completeness; the 10-bit
|
||||||
|
* Pi 5 HEVC chapter (Main10) is post-iter40. */
|
||||||
|
#define V4L2_PIX_FMT_NV12_10_COL128 \
|
||||||
|
((unsigned int)('N') | ((unsigned int)('C') << 8) | \
|
||||||
|
((unsigned int)('3') << 16) | ((unsigned int)('0') << 24))
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* Detile the Y plane of an NC12 source to a linear NV12 Y plane.
|
||||||
|
* dst : pointer to linear NV12 Y plane (caller-owned, dst_stride * height bytes)
|
||||||
|
* dst_stride : linear Y plane stride in bytes (= width for plain NV12)
|
||||||
|
* src_y : pointer to start of NC12 Y plane (= NC12 buffer base)
|
||||||
|
* src_col_stride: kernel-reported bytesperline (= ALIGN(height,8) * 3/2)
|
||||||
|
* width, height: cropped image dimensions in pixels
|
||||||
|
*/
|
||||||
|
void nv12_col128_detile_y(uint8_t *dst, unsigned int dst_stride,
|
||||||
|
const uint8_t *src_y, unsigned int src_col_stride,
|
||||||
|
unsigned int width, unsigned int height);
|
||||||
|
|
||||||
|
/* Detile the UV plane (CbCr interleaved, half-height) of an NC12 source.
|
||||||
|
* dst : pointer to linear NV12 UV plane
|
||||||
|
* dst_stride : linear UV plane stride in bytes (= width for NV12)
|
||||||
|
* src_uv : pointer to start of NC12 UV plane (= src_y + Y-plane-size)
|
||||||
|
* src_col_stride: same as Y plane (same column geometry)
|
||||||
|
* width : Y-plane width in pixels (UV plane has same byte width)
|
||||||
|
* uv_height : UV plane height = height / 2
|
||||||
|
*/
|
||||||
|
void nv12_col128_detile_uv(uint8_t *dst, unsigned int dst_stride,
|
||||||
|
const uint8_t *src_uv, unsigned int src_col_stride,
|
||||||
|
unsigned int width, unsigned int uv_height);
|
||||||
|
|
||||||
|
/* Compute the offset of the UV plane within an NC12 buffer.
|
||||||
|
* image_width, image_height: cropped image dimensions in pixels
|
||||||
|
* Returns: byte offset from buffer start to UV plane start
|
||||||
|
* (= 128 * ALIGN(image_height, 8) * num_columns_y)
|
||||||
|
*/
|
||||||
|
unsigned int nv12_col128_uv_plane_offset(unsigned int image_width,
|
||||||
|
unsigned int image_height);
|
||||||
|
|
||||||
|
#endif /* _NV12_COL128_H_ */
|
||||||
+75
@@ -0,0 +1,75 @@
|
|||||||
|
/*
|
||||||
|
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
|
||||||
|
*
|
||||||
|
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||||
|
* copy of this software and associated documentation files (the
|
||||||
|
* "Software"), to deal in the Software without restriction, including
|
||||||
|
* without limitation the rights to use, copy, modify, merge, publish,
|
||||||
|
* distribute, sub license, and/or sell copies of the Software, and to
|
||||||
|
* permit persons to whom the Software is furnished to do so, subject to
|
||||||
|
* the following conditions:
|
||||||
|
*
|
||||||
|
* The above copyright notice and this permission notice (including the
|
||||||
|
* next paragraph) shall be included in all copies or substantial portions
|
||||||
|
* of the Software.
|
||||||
|
*
|
||||||
|
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||||
|
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||||
|
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
|
||||||
|
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
|
||||||
|
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
||||||
|
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
|
||||||
|
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "nv15.h"
|
||||||
|
|
||||||
|
void nv15_unpack_plane_to_p010(const uint8_t *src, uint16_t *dst,
|
||||||
|
unsigned int width, unsigned int height,
|
||||||
|
unsigned int src_stride)
|
||||||
|
{
|
||||||
|
unsigned int x, y;
|
||||||
|
unsigned int dst_pitch_px = width;
|
||||||
|
|
||||||
|
for (y = 0; y < height; y++) {
|
||||||
|
const uint8_t *s = src + y * src_stride;
|
||||||
|
uint16_t *d = dst + y * dst_pitch_px;
|
||||||
|
|
||||||
|
for (x = 0; x + 4 <= width; x += 4) {
|
||||||
|
uint16_t a = (uint16_t)s[0] | ((uint16_t)(s[1] & 0x03) << 8);
|
||||||
|
uint16_t b = ((uint16_t)s[1] >> 2) | ((uint16_t)(s[2] & 0x0F) << 6);
|
||||||
|
uint16_t c = ((uint16_t)s[2] >> 4) | ((uint16_t)(s[3] & 0x3F) << 4);
|
||||||
|
uint16_t e = ((uint16_t)s[3] >> 6) | ((uint16_t)s[4] << 2);
|
||||||
|
|
||||||
|
d[0] = (uint16_t)(a << 6);
|
||||||
|
d[1] = (uint16_t)(b << 6);
|
||||||
|
d[2] = (uint16_t)(c << 6);
|
||||||
|
d[3] = (uint16_t)(e << 6);
|
||||||
|
|
||||||
|
d += 4;
|
||||||
|
s += 5;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (x < width) {
|
||||||
|
unsigned int rem = width - x;
|
||||||
|
uint16_t pix[4] = { 0, 0, 0, 0 };
|
||||||
|
|
||||||
|
pix[0] = (uint16_t)s[0] | ((uint16_t)(s[1] & 0x03) << 8);
|
||||||
|
if (rem >= 2)
|
||||||
|
pix[1] = ((uint16_t)s[1] >> 2) |
|
||||||
|
((uint16_t)(s[2] & 0x0F) << 6);
|
||||||
|
if (rem >= 3)
|
||||||
|
pix[2] = ((uint16_t)s[2] >> 4) |
|
||||||
|
((uint16_t)(s[3] & 0x3F) << 4);
|
||||||
|
if (rem >= 4)
|
||||||
|
pix[3] = ((uint16_t)s[3] >> 6) |
|
||||||
|
((uint16_t)s[4] << 2);
|
||||||
|
|
||||||
|
{
|
||||||
|
unsigned int j;
|
||||||
|
for (j = 0; j < rem; j++)
|
||||||
|
d[j] = (uint16_t)(pix[j] << 6);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
+61
@@ -0,0 +1,61 @@
|
|||||||
|
/*
|
||||||
|
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
|
||||||
|
*
|
||||||
|
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||||
|
* copy of this software and associated documentation files (the
|
||||||
|
* "Software"), to deal in the Software without restriction, including
|
||||||
|
* without limitation the rights to use, copy, modify, merge, publish,
|
||||||
|
* distribute, sub license, and/or sell copies of the Software, and to
|
||||||
|
* permit persons to whom the Software is furnished to do so, subject to
|
||||||
|
* the following conditions:
|
||||||
|
*
|
||||||
|
* The above copyright notice and this permission notice (including the
|
||||||
|
* next paragraph) shall be included in all copies or substantial portions
|
||||||
|
* of the Software.
|
||||||
|
*
|
||||||
|
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||||
|
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||||
|
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
|
||||||
|
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
|
||||||
|
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
||||||
|
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
|
||||||
|
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef _NV15_H_
|
||||||
|
#define _NV15_H_
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
|
||||||
|
#include <linux/videodev2.h>
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Older or downstream linux-api-headers / kernel-headers packages may
|
||||||
|
* not define V4L2_PIX_FMT_NV15. Provide a fallback so the backend
|
||||||
|
* builds on hosts whose headers are pre-NV15-merge or omit it (e.g.
|
||||||
|
* Pi 5 Debian trixie 6.12.62 headers include NC12 but not NV15).
|
||||||
|
* Same numeric value as mainline.
|
||||||
|
*/
|
||||||
|
#ifndef V4L2_PIX_FMT_NV15
|
||||||
|
#define V4L2_PIX_FMT_NV15 \
|
||||||
|
((unsigned int)('N') | ((unsigned int)('V') << 8) | \
|
||||||
|
((unsigned int)('1') << 16) | ((unsigned int)('5') << 24))
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Unpack one plane of V4L2_PIX_FMT_NV15 (4 × 10-bit values packed into
|
||||||
|
* 5 consecutive bytes, LSB-first) into VA_FOURCC_P010 (16-bit per pixel,
|
||||||
|
* value in bits [15:6], zeros in [5:0]).
|
||||||
|
*
|
||||||
|
* Layout per Documentation/userspace-api/media/v4l/pixfmt-nv15.rst.
|
||||||
|
* Call once per plane: luma (W × H, src_stride = ceil(W/4)*5) and chroma
|
||||||
|
* (W × H/2 — same width because UV are interleaved 10-bit values).
|
||||||
|
*
|
||||||
|
* src_stride must be the kernel-reported bytesperline for the NV15 plane.
|
||||||
|
* The destination is dense P010 with row pitch = width * 2 bytes.
|
||||||
|
*/
|
||||||
|
void nv15_unpack_plane_to_p010(const uint8_t *src, uint16_t *dst,
|
||||||
|
unsigned int width, unsigned int height,
|
||||||
|
unsigned int src_stride);
|
||||||
|
|
||||||
|
#endif
|
||||||
+36
-1
@@ -36,6 +36,7 @@
|
|||||||
#include "mpeg2.h"
|
#include "mpeg2.h"
|
||||||
#include "vp8.h"
|
#include "vp8.h"
|
||||||
#include "vp9.h"
|
#include "vp9.h"
|
||||||
|
#include "av1.h"
|
||||||
|
|
||||||
#include <assert.h>
|
#include <assert.h>
|
||||||
#include <stdio.h>
|
#include <stdio.h>
|
||||||
@@ -132,12 +133,14 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
|
|||||||
case VAProfileH264ConstrainedBaseline:
|
case VAProfileH264ConstrainedBaseline:
|
||||||
case VAProfileH264MultiviewHigh:
|
case VAProfileH264MultiviewHigh:
|
||||||
case VAProfileH264StereoHigh:
|
case VAProfileH264StereoHigh:
|
||||||
|
case VAProfileH264High10:
|
||||||
memcpy(&surface_object->params.h264.picture,
|
memcpy(&surface_object->params.h264.picture,
|
||||||
buffer_object->data,
|
buffer_object->data,
|
||||||
sizeof(surface_object->params.h264.picture));
|
sizeof(surface_object->params.h264.picture));
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case VAProfileHEVCMain:
|
case VAProfileHEVCMain:
|
||||||
|
case VAProfileHEVCMain10:
|
||||||
memcpy(&surface_object->params.h265.picture,
|
memcpy(&surface_object->params.h265.picture,
|
||||||
buffer_object->data,
|
buffer_object->data,
|
||||||
sizeof(surface_object->params.h265.picture));
|
sizeof(surface_object->params.h265.picture));
|
||||||
@@ -155,6 +158,12 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
|
|||||||
sizeof(surface_object->params.vp9.picture));
|
sizeof(surface_object->params.vp9.picture));
|
||||||
break;
|
break;
|
||||||
|
|
||||||
|
case VAProfileAV1Profile0:
|
||||||
|
memcpy(&surface_object->params.av1.picture,
|
||||||
|
buffer_object->data,
|
||||||
|
sizeof(surface_object->params.av1.picture));
|
||||||
|
break;
|
||||||
|
|
||||||
default:
|
default:
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
@@ -167,12 +176,14 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
|
|||||||
case VAProfileH264ConstrainedBaseline:
|
case VAProfileH264ConstrainedBaseline:
|
||||||
case VAProfileH264MultiviewHigh:
|
case VAProfileH264MultiviewHigh:
|
||||||
case VAProfileH264StereoHigh:
|
case VAProfileH264StereoHigh:
|
||||||
|
case VAProfileH264High10:
|
||||||
memcpy(&surface_object->params.h264.slice,
|
memcpy(&surface_object->params.h264.slice,
|
||||||
buffer_object->data,
|
buffer_object->data,
|
||||||
sizeof(surface_object->params.h264.slice));
|
sizeof(surface_object->params.h264.slice));
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case VAProfileHEVCMain: {
|
case VAProfileHEVCMain:
|
||||||
|
case VAProfileHEVCMain10: {
|
||||||
unsigned int n = surface_object->params.h265.num_slices;
|
unsigned int n = surface_object->params.h265.num_slices;
|
||||||
if (n < HEVC_MAX_SLICES_PER_FRAME) {
|
if (n < HEVC_MAX_SLICES_PER_FRAME) {
|
||||||
memcpy(&surface_object->params.h265.slices[n],
|
memcpy(&surface_object->params.h265.slices[n],
|
||||||
@@ -220,6 +231,7 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
|
|||||||
case VAProfileH264ConstrainedBaseline:
|
case VAProfileH264ConstrainedBaseline:
|
||||||
case VAProfileH264MultiviewHigh:
|
case VAProfileH264MultiviewHigh:
|
||||||
case VAProfileH264StereoHigh:
|
case VAProfileH264StereoHigh:
|
||||||
|
case VAProfileH264High10:
|
||||||
memcpy(&surface_object->params.h264.matrix,
|
memcpy(&surface_object->params.h264.matrix,
|
||||||
buffer_object->data,
|
buffer_object->data,
|
||||||
sizeof(surface_object->params.h264.matrix));
|
sizeof(surface_object->params.h264.matrix));
|
||||||
@@ -227,6 +239,7 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
|
|||||||
break;
|
break;
|
||||||
|
|
||||||
case VAProfileHEVCMain:
|
case VAProfileHEVCMain:
|
||||||
|
case VAProfileHEVCMain10:
|
||||||
memcpy(&surface_object->params.h265.iqmatrix,
|
memcpy(&surface_object->params.h265.iqmatrix,
|
||||||
buffer_object->data,
|
buffer_object->data,
|
||||||
sizeof(surface_object->params.h265.iqmatrix));
|
sizeof(surface_object->params.h265.iqmatrix));
|
||||||
@@ -286,6 +299,7 @@ static VAStatus codec_set_controls(struct request_data *driver_data,
|
|||||||
case VAProfileH264ConstrainedBaseline:
|
case VAProfileH264ConstrainedBaseline:
|
||||||
case VAProfileH264MultiviewHigh:
|
case VAProfileH264MultiviewHigh:
|
||||||
case VAProfileH264StereoHigh:
|
case VAProfileH264StereoHigh:
|
||||||
|
case VAProfileH264High10:
|
||||||
rc = h264_set_controls(driver_data, context, profile,
|
rc = h264_set_controls(driver_data, context, profile,
|
||||||
surface_object);
|
surface_object);
|
||||||
if (rc < 0)
|
if (rc < 0)
|
||||||
@@ -293,6 +307,7 @@ static VAStatus codec_set_controls(struct request_data *driver_data,
|
|||||||
break;
|
break;
|
||||||
|
|
||||||
case VAProfileHEVCMain:
|
case VAProfileHEVCMain:
|
||||||
|
case VAProfileHEVCMain10:
|
||||||
rc = h265_set_controls(driver_data, context, surface_object);
|
rc = h265_set_controls(driver_data, context, surface_object);
|
||||||
if (rc < 0)
|
if (rc < 0)
|
||||||
return VA_STATUS_ERROR_OPERATION_FAILED;
|
return VA_STATUS_ERROR_OPERATION_FAILED;
|
||||||
@@ -310,6 +325,26 @@ static VAStatus codec_set_controls(struct request_data *driver_data,
|
|||||||
return VA_STATUS_ERROR_OPERATION_FAILED;
|
return VA_STATUS_ERROR_OPERATION_FAILED;
|
||||||
break;
|
break;
|
||||||
|
|
||||||
|
case VAProfileAV1Profile0:
|
||||||
|
/*
|
||||||
|
* Populates V4L2_CID_STATELESS_AV1_SEQUENCE from
|
||||||
|
* VAPictureParameterBufferAV1. The daedalus_v4l2 daemon
|
||||||
|
* (issue #11 daemon track) synthesises an OBU_SEQUENCE_HEADER
|
||||||
|
* from this ctrl and prepends it to the slice bitstream
|
||||||
|
* before handing it to libavcodec/libdav1d, which otherwise
|
||||||
|
* cannot parse the (sequence-header-stripped) OUTPUT buffer
|
||||||
|
* that ffmpeg-vaapi delivers.
|
||||||
|
*
|
||||||
|
* On the RK3588 vpu981 hardware path the same SEQUENCE ctrl
|
||||||
|
* is harmless: vpu981's driver parses the OBU stream
|
||||||
|
* directly and ignores the ctrl payload, so no per-decoder
|
||||||
|
* gating is required here.
|
||||||
|
*/
|
||||||
|
rc = av1_set_controls(driver_data, context, surface_object);
|
||||||
|
if (rc < 0)
|
||||||
|
return VA_STATUS_ERROR_OPERATION_FAILED;
|
||||||
|
break;
|
||||||
|
|
||||||
default:
|
default:
|
||||||
return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
|
return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
|
||||||
}
|
}
|
||||||
|
|||||||
+281
-3
@@ -57,6 +57,8 @@
|
|||||||
#include <linux/media.h>
|
#include <linux/media.h>
|
||||||
#include <linux/videodev2.h>
|
#include <linux/videodev2.h>
|
||||||
|
|
||||||
|
#include "hevc-ctrls/v4l2-hevc-ext-controls.h"
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* fresnel-fourier iter4 Phase 6 commit Z + iter7 Phase 6 (B1a): device-path
|
* fresnel-fourier iter4 Phase 6 commit Z + iter7 Phase 6 (B1a): device-path
|
||||||
* auto-detect via media controller topology with decoder-entity discrimination.
|
* auto-detect via media controller topology with decoder-entity discrimination.
|
||||||
@@ -91,6 +93,10 @@
|
|||||||
static const char * const known_decoder_drivers[] = {
|
static const char * const known_decoder_drivers[] = {
|
||||||
"rkvdec",
|
"rkvdec",
|
||||||
"hantro-vpu",
|
"hantro-vpu",
|
||||||
|
"rpi-hevc-dec", /* iter40: Pi 5 / CM5 stateless HEVC */
|
||||||
|
#ifdef HAVE_DAEDALUS_V4L2
|
||||||
|
"daedalus_v4l2", /* phase 8.10: Pi 5 daemon-backed VP9/AV1/H264 */
|
||||||
|
#endif
|
||||||
"cedrus",
|
"cedrus",
|
||||||
"sun4i_csi",
|
"sun4i_csi",
|
||||||
NULL
|
NULL
|
||||||
@@ -286,6 +292,43 @@ out:
|
|||||||
* - non-NULL → match only that exact driver name
|
* - non-NULL → match only that exact driver name
|
||||||
* - NULL → match any name in known_decoder_drivers[]
|
* - NULL → match any name in known_decoder_drivers[]
|
||||||
*/
|
*/
|
||||||
|
/*
|
||||||
|
* iter2 (ampere-kernel-decoders campaign) — runtime probe for the
|
||||||
|
* V4L2 stateless HEVC EXT_SPS_{ST,LT}_RPS controls added in
|
||||||
|
* Linux 7.0 (Casanova VDPU381/VDPU383 series). Returns true iff BOTH
|
||||||
|
* controls are registered on the given fd. Stored per-fd on
|
||||||
|
* driver_data so the multi-device-probe model (iter38) doesn't
|
||||||
|
* silently misbehave when codec routing switches devices.
|
||||||
|
*
|
||||||
|
* The two CIDs together are the gate — neither alone is meaningful
|
||||||
|
* without the other (st-RPS + lt-RPS arrays both need to be set to
|
||||||
|
* match the SPS num_short_term_ref_pic_sets / num_long_term_ref_pics_sps
|
||||||
|
* counts). Old kernels (RK3399 rkvdec on linux 6.x) register neither;
|
||||||
|
* RK3588 rkvdec (VDPU381/383 path) registers both.
|
||||||
|
*
|
||||||
|
* Reference: phase4_plan_iter2.md §Step 3 in
|
||||||
|
* ~/src/ampere-kernel-decoders/.
|
||||||
|
*/
|
||||||
|
static bool probe_hevc_ext_sps_rps_controls(int video_fd)
|
||||||
|
{
|
||||||
|
struct v4l2_queryctrl q;
|
||||||
|
|
||||||
|
if (video_fd < 0)
|
||||||
|
return false;
|
||||||
|
|
||||||
|
memset(&q, 0, sizeof(q));
|
||||||
|
q.id = V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS;
|
||||||
|
if (ioctl(video_fd, VIDIOC_QUERYCTRL, &q) < 0)
|
||||||
|
return false;
|
||||||
|
|
||||||
|
memset(&q, 0, sizeof(q));
|
||||||
|
q.id = V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS;
|
||||||
|
if (ioctl(video_fd, VIDIOC_QUERYCTRL, &q) < 0)
|
||||||
|
return false;
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
static int find_decoder_device_by_driver(const char *want_driver,
|
static int find_decoder_device_by_driver(const char *want_driver,
|
||||||
char *video_out, size_t video_out_sz,
|
char *video_out, size_t video_out_sz,
|
||||||
char *media_out, size_t media_out_sz)
|
char *media_out, size_t media_out_sz)
|
||||||
@@ -369,6 +412,16 @@ char request_device_kind_for_profile(VAProfile profile)
|
|||||||
case VAProfileMPEG2Main:
|
case VAProfileMPEG2Main:
|
||||||
case VAProfileVP8Version0_3:
|
case VAProfileVP8Version0_3:
|
||||||
return 'h';
|
return 'h';
|
||||||
|
case VAProfileAV1Profile0:
|
||||||
|
/*
|
||||||
|
* ampere-av1-enablement Phase 2: RK3588 vpu981 dedicated
|
||||||
|
* AV1 hantro instance. 'a' kind dispatches to
|
||||||
|
* driver_data->video_fd_vpu981. On hosts without the AV1
|
||||||
|
* instance the fd stays -1 and RequestQueryConfigProfiles
|
||||||
|
* never enumerates AV1, so this branch is unreachable for
|
||||||
|
* non-RK3588 hosts.
|
||||||
|
*/
|
||||||
|
return 'a';
|
||||||
default:
|
default:
|
||||||
return '?';
|
return '?';
|
||||||
}
|
}
|
||||||
@@ -392,12 +445,77 @@ int request_switch_device_for_profile(struct request_data *driver_data,
|
|||||||
char kind = request_device_kind_for_profile(profile);
|
char kind = request_device_kind_for_profile(profile);
|
||||||
int target_video, target_media;
|
int target_video, target_media;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* iter40: HEVC override when rpi-hevc-dec is probed. The static
|
||||||
|
* table (request_device_kind_for_profile) maps HEVC → 'r' (rkvdec)
|
||||||
|
* because that's the canonical RK path. On Pi 5 there's no rkvdec
|
||||||
|
* — rpi-hevc-dec is the only decoder. When BOTH would be present
|
||||||
|
* (hypothetical mixed board), prefer rpi-hevc-dec for HEVC.
|
||||||
|
*
|
||||||
|
* Other rkvdec-routed profiles (VP9, H.264) stay on 'r' because
|
||||||
|
* rpi-hevc-dec is HEVC-only.
|
||||||
|
*/
|
||||||
|
if ((profile == VAProfileHEVCMain || profile == VAProfileHEVCMain10) &&
|
||||||
|
driver_data->video_fd_rpi_hevc_dec >= 0 &&
|
||||||
|
driver_data->media_fd_rpi_hevc_dec >= 0) {
|
||||||
|
kind = 'p';
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifdef HAVE_DAEDALUS_V4L2
|
||||||
|
/*
|
||||||
|
* LIBVA-1: VP9/AV1/H.264 → daedalus_v4l2 when the daemon-backed
|
||||||
|
* decoder fd is open. Pi 5 has no rkvdec (those profiles map to
|
||||||
|
* 'r' by default → video_fd_rkvdec = -1 → "stay on whatever's
|
||||||
|
* active" fallback would put H.264 frames on rpi-hevc-dec's fd
|
||||||
|
* and S_FMT would fail). Re-route to the daedalus daemon instead.
|
||||||
|
*
|
||||||
|
* HEVC stays on 'p' (rpi-hevc-dec is HEVC-only — daedalus would
|
||||||
|
* accept it via FFmpeg, but rpi-hevc-dec has the GPU-backed
|
||||||
|
* hardware path so it's the right choice on this SoC).
|
||||||
|
*
|
||||||
|
* AV1 'a' kind (RK3588 vpu981) wins ONLY if vpu981 was probed.
|
||||||
|
* On a Pi 5 the vpu981 slot stays -1, so we still route AV1 to
|
||||||
|
* daedalus here. Check video_fd_vpu981 to preserve the RK3588
|
||||||
|
* priority for that case.
|
||||||
|
*/
|
||||||
|
if (driver_data->video_fd_daedalus >= 0 &&
|
||||||
|
driver_data->media_fd_daedalus >= 0) {
|
||||||
|
switch (profile) {
|
||||||
|
case VAProfileH264Main:
|
||||||
|
case VAProfileH264High:
|
||||||
|
case VAProfileH264ConstrainedBaseline:
|
||||||
|
case VAProfileH264MultiviewHigh:
|
||||||
|
case VAProfileH264StereoHigh:
|
||||||
|
case VAProfileVP9Profile0:
|
||||||
|
kind = 'd';
|
||||||
|
break;
|
||||||
|
case VAProfileAV1Profile0:
|
||||||
|
if (driver_data->video_fd_vpu981 < 0)
|
||||||
|
kind = 'd';
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
if (kind == 'r') {
|
if (kind == 'r') {
|
||||||
target_video = driver_data->video_fd_rkvdec;
|
target_video = driver_data->video_fd_rkvdec;
|
||||||
target_media = driver_data->media_fd_rkvdec;
|
target_media = driver_data->media_fd_rkvdec;
|
||||||
} else if (kind == 'h') {
|
} else if (kind == 'h') {
|
||||||
target_video = driver_data->video_fd_hantro;
|
target_video = driver_data->video_fd_hantro;
|
||||||
target_media = driver_data->media_fd_hantro;
|
target_media = driver_data->media_fd_hantro;
|
||||||
|
} else if (kind == 'p') {
|
||||||
|
target_video = driver_data->video_fd_rpi_hevc_dec;
|
||||||
|
target_media = driver_data->media_fd_rpi_hevc_dec;
|
||||||
|
} else if (kind == 'a') {
|
||||||
|
target_video = driver_data->video_fd_vpu981;
|
||||||
|
target_media = driver_data->media_fd_vpu981;
|
||||||
|
#ifdef HAVE_DAEDALUS_V4L2
|
||||||
|
} else if (kind == 'd') {
|
||||||
|
target_video = driver_data->video_fd_daedalus;
|
||||||
|
target_media = driver_data->media_fd_daedalus;
|
||||||
|
#endif
|
||||||
} else {
|
} else {
|
||||||
return -1;
|
return -1;
|
||||||
}
|
}
|
||||||
@@ -585,6 +703,12 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
|
|||||||
driver_data->media_fd_rkvdec = -1;
|
driver_data->media_fd_rkvdec = -1;
|
||||||
driver_data->video_fd_hantro = -1;
|
driver_data->video_fd_hantro = -1;
|
||||||
driver_data->media_fd_hantro = -1;
|
driver_data->media_fd_hantro = -1;
|
||||||
|
driver_data->video_fd_rpi_hevc_dec = -1;
|
||||||
|
driver_data->media_fd_rpi_hevc_dec = -1;
|
||||||
|
driver_data->video_fd_daedalus = -1;
|
||||||
|
driver_data->media_fd_daedalus = -1;
|
||||||
|
driver_data->video_fd_vpu981 = -1;
|
||||||
|
driver_data->media_fd_vpu981 = -1;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* iter38: probe BOTH rkvdec and hantro-vpu so a single libva session
|
* iter38: probe BOTH rkvdec and hantro-vpu so a single libva session
|
||||||
@@ -615,6 +739,36 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
|
|||||||
alt_driver = "rkvdec";
|
alt_driver = "rkvdec";
|
||||||
driver_data->video_fd_hantro = video_fd;
|
driver_data->video_fd_hantro = video_fd;
|
||||||
driver_data->media_fd_hantro = media_fd;
|
driver_data->media_fd_hantro = media_fd;
|
||||||
|
} else if (strcmp(info.driver, "rpi-hevc-dec") == 0) {
|
||||||
|
/* iter40 + LIBVA-1: Pi 5 / CM5. rpi-hevc-dec is
|
||||||
|
* HEVC-only. If daedalus_v4l2 is ALSO loaded (Pi 5
|
||||||
|
* mixed deployment — out-of-tree daemon-backed
|
||||||
|
* decoder for VP9/AV1/H264), pick it up as the alt
|
||||||
|
* so VP9/AV1/H264 have somewhere to land. */
|
||||||
|
primary_driver = "rpi-hevc-dec";
|
||||||
|
#ifdef HAVE_DAEDALUS_V4L2
|
||||||
|
alt_driver = "daedalus_v4l2";
|
||||||
|
#else
|
||||||
|
alt_driver = NULL;
|
||||||
|
#endif
|
||||||
|
driver_data->video_fd_rpi_hevc_dec = video_fd;
|
||||||
|
driver_data->media_fd_rpi_hevc_dec = media_fd;
|
||||||
|
#ifdef HAVE_DAEDALUS_V4L2
|
||||||
|
} else if (strcmp(info.driver, "daedalus_v4l2") == 0) {
|
||||||
|
/* phase 8.10 + LIBVA-1: Pi 5 daemon-backed decoder.
|
||||||
|
* VP9 / AV1 / H.264 route through it via the 'd'
|
||||||
|
* kind below. On a mixed-driver box where
|
||||||
|
* rpi-hevc-dec is ALSO loaded, pick it up as the
|
||||||
|
* alt so HEVC has somewhere to land too — find_
|
||||||
|
* codec_device's known_decoder_drivers[] order
|
||||||
|
* normally puts rpi-hevc-dec first (we hit the
|
||||||
|
* other branch in practice), but symmetric handling
|
||||||
|
* keeps us correct if probe order ever flips. */
|
||||||
|
primary_driver = "daedalus_v4l2";
|
||||||
|
alt_driver = "rpi-hevc-dec";
|
||||||
|
driver_data->video_fd_daedalus = video_fd;
|
||||||
|
driver_data->media_fd_daedalus = media_fd;
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -626,15 +780,38 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
|
|||||||
int alt_v = open(alt_video, O_RDWR | O_NONBLOCK);
|
int alt_v = open(alt_video, O_RDWR | O_NONBLOCK);
|
||||||
int alt_m = (alt_v >= 0) ? open(alt_media, O_RDWR | O_NONBLOCK) : -1;
|
int alt_m = (alt_v >= 0) ? open(alt_media, O_RDWR | O_NONBLOCK) : -1;
|
||||||
if (alt_v >= 0 && alt_m >= 0) {
|
if (alt_v >= 0 && alt_m >= 0) {
|
||||||
|
/* Dispatch into the matching per-driver slot.
|
||||||
|
* iter38 only had rkvdec/hantro pairs; iter40 +
|
||||||
|
* LIBVA-1 extended this to rpi-hevc-dec and
|
||||||
|
* daedalus_v4l2 for the Pi 5 mixed-decoder
|
||||||
|
* deployment. */
|
||||||
if (strcmp(alt_driver, "rkvdec") == 0) {
|
if (strcmp(alt_driver, "rkvdec") == 0) {
|
||||||
driver_data->video_fd_rkvdec = alt_v;
|
driver_data->video_fd_rkvdec = alt_v;
|
||||||
driver_data->media_fd_rkvdec = alt_m;
|
driver_data->media_fd_rkvdec = alt_m;
|
||||||
} else {
|
} else if (strcmp(alt_driver, "hantro-vpu") == 0) {
|
||||||
driver_data->video_fd_hantro = alt_v;
|
driver_data->video_fd_hantro = alt_v;
|
||||||
driver_data->media_fd_hantro = alt_m;
|
driver_data->media_fd_hantro = alt_m;
|
||||||
|
} else if (strcmp(alt_driver, "rpi-hevc-dec") == 0) {
|
||||||
|
driver_data->video_fd_rpi_hevc_dec = alt_v;
|
||||||
|
driver_data->media_fd_rpi_hevc_dec = alt_m;
|
||||||
|
#ifdef HAVE_DAEDALUS_V4L2
|
||||||
|
} else if (strcmp(alt_driver, "daedalus_v4l2") == 0) {
|
||||||
|
driver_data->video_fd_daedalus = alt_v;
|
||||||
|
driver_data->media_fd_daedalus = alt_m;
|
||||||
|
#endif
|
||||||
|
} else {
|
||||||
|
/* Shouldn't happen — primary_driver branches
|
||||||
|
* above only set alt_driver to one of the
|
||||||
|
* names handled here. Close and move on. */
|
||||||
|
close(alt_v);
|
||||||
|
close(alt_m);
|
||||||
|
alt_v = -1;
|
||||||
|
alt_m = -1;
|
||||||
|
}
|
||||||
|
if (alt_v >= 0) {
|
||||||
|
request_log("iter38: also opened %s decoder at %s + %s\n",
|
||||||
|
alt_driver, alt_video, alt_media);
|
||||||
}
|
}
|
||||||
request_log("iter38: also opened %s decoder at %s + %s\n",
|
|
||||||
alt_driver, alt_video, alt_media);
|
|
||||||
} else {
|
} else {
|
||||||
if (alt_v >= 0) close(alt_v);
|
if (alt_v >= 0) close(alt_v);
|
||||||
if (alt_m >= 0) close(alt_m);
|
if (alt_m >= 0) close(alt_m);
|
||||||
@@ -642,8 +819,95 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
(void)primary_driver;
|
(void)primary_driver;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* ampere-av1-enablement Phase 2: walk hantro-vpu media nodes
|
||||||
|
* for a SECOND one that advertises V4L2_PIX_FMT_AV1_FRAME
|
||||||
|
* (AV1F) as OUTPUT pixfmt. RK3588 has 3 hantro-vpu instances
|
||||||
|
* (legacy MPEG2/VP8 decoder, vepu121 encoder, vpu981 AV1
|
||||||
|
* decoder) all reporting driver="hantro-vpu" / model="hantro-
|
||||||
|
* vpu" — so OUTPUT-format probe is the only reliable
|
||||||
|
* disambiguator that doesn't depend on parsing card-name
|
||||||
|
* strings (which are DTS-dependent). First match wins.
|
||||||
|
*
|
||||||
|
* On non-RK3588 hosts the slot stays -1; RequestQueryConfig
|
||||||
|
* Profiles' AV1 push then no-ops because any_fd_supports_
|
||||||
|
* output_format() returns false for AV1F.
|
||||||
|
*/
|
||||||
|
{
|
||||||
|
int i;
|
||||||
|
char path[32], av1_video[32];
|
||||||
|
|
||||||
|
for (i = 0; i < 16; i++) {
|
||||||
|
int mfd, vfd;
|
||||||
|
struct media_device_info info;
|
||||||
|
|
||||||
|
snprintf(path, sizeof path, "/dev/media%d", i);
|
||||||
|
mfd = open(path, O_RDWR | O_NONBLOCK);
|
||||||
|
if (mfd < 0) continue;
|
||||||
|
memset(&info, 0, sizeof info);
|
||||||
|
if (ioctl(mfd, MEDIA_IOC_DEVICE_INFO, &info) != 0 ||
|
||||||
|
strcmp(info.driver, "hantro-vpu") != 0) {
|
||||||
|
close(mfd);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
if (find_decoder_video_node_via_topology(
|
||||||
|
mfd, av1_video, sizeof av1_video) != 0) {
|
||||||
|
close(mfd);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
vfd = open(av1_video, O_RDWR | O_NONBLOCK);
|
||||||
|
if (vfd < 0) {
|
||||||
|
close(mfd);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
if (!v4l2_find_format(vfd, V4L2_BUF_TYPE_VIDEO_OUTPUT, V4L2_PIX_FMT_AV1_FRAME) &&
|
||||||
|
!v4l2_find_format(vfd, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, V4L2_PIX_FMT_AV1_FRAME)) {
|
||||||
|
close(vfd);
|
||||||
|
close(mfd);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
driver_data->video_fd_vpu981 = vfd;
|
||||||
|
driver_data->media_fd_vpu981 = mfd;
|
||||||
|
request_log("ampere-av1: vpu981 AV1 decoder at %s + %s\n",
|
||||||
|
av1_video, path);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* iter2 (ampere-kernel-decoders): probe the new HEVC EXT_SPS_RPS
|
||||||
|
* controls on each rkvdec/hantro fd. Result is consumed by
|
||||||
|
* h265_set_controls per-codec gate. Per-fd storage matches the
|
||||||
|
* iter38 multi-device-probe pattern (Phase 5 review item).
|
||||||
|
*/
|
||||||
|
driver_data->has_hevc_ext_sps_rps_rkvdec =
|
||||||
|
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_rkvdec);
|
||||||
|
driver_data->has_hevc_ext_sps_rps_hantro =
|
||||||
|
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_hantro);
|
||||||
|
driver_data->has_hevc_ext_sps_rps_rpi_hevc_dec =
|
||||||
|
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_rpi_hevc_dec);
|
||||||
|
if (driver_data->has_hevc_ext_sps_rps_rkvdec) {
|
||||||
|
request_log("iter2: kernel registers HEVC EXT_SPS_{ST,LT}_RPS "
|
||||||
|
"controls on rkvdec fd (will route through "
|
||||||
|
"vendored GStreamer parser)\n");
|
||||||
|
}
|
||||||
|
if (driver_data->video_fd_rpi_hevc_dec >= 0) {
|
||||||
|
request_log("iter40: also opened rpi-hevc-dec at video_fd=%d "
|
||||||
|
"media_fd=%d (Pi 5 HEVC stateless)\n",
|
||||||
|
driver_data->video_fd_rpi_hevc_dec,
|
||||||
|
driver_data->media_fd_rpi_hevc_dec);
|
||||||
|
}
|
||||||
|
#ifdef HAVE_DAEDALUS_V4L2
|
||||||
|
if (driver_data->video_fd_daedalus >= 0) {
|
||||||
|
request_log("phase 8.10: opened daedalus_v4l2 at video_fd=%d "
|
||||||
|
"media_fd=%d (Pi 5 daemon-backed VP9/AV1/H264)\n",
|
||||||
|
driver_data->video_fd_daedalus,
|
||||||
|
driver_data->media_fd_daedalus);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
status = VA_STATUS_SUCCESS;
|
status = VA_STATUS_SUCCESS;
|
||||||
goto complete;
|
goto complete;
|
||||||
|
|
||||||
@@ -690,6 +954,20 @@ VAStatus RequestTerminate(VADriverContextP context)
|
|||||||
close(driver_data->video_fd_hantro);
|
close(driver_data->video_fd_hantro);
|
||||||
if (driver_data->media_fd_hantro >= 0)
|
if (driver_data->media_fd_hantro >= 0)
|
||||||
close(driver_data->media_fd_hantro);
|
close(driver_data->media_fd_hantro);
|
||||||
|
if (driver_data->video_fd_rpi_hevc_dec >= 0)
|
||||||
|
close(driver_data->video_fd_rpi_hevc_dec);
|
||||||
|
if (driver_data->media_fd_rpi_hevc_dec >= 0)
|
||||||
|
close(driver_data->media_fd_rpi_hevc_dec);
|
||||||
|
if (driver_data->video_fd_vpu981 >= 0)
|
||||||
|
close(driver_data->video_fd_vpu981);
|
||||||
|
if (driver_data->media_fd_vpu981 >= 0)
|
||||||
|
close(driver_data->media_fd_vpu981);
|
||||||
|
#ifdef HAVE_DAEDALUS_V4L2
|
||||||
|
if (driver_data->video_fd_daedalus >= 0)
|
||||||
|
close(driver_data->video_fd_daedalus);
|
||||||
|
if (driver_data->media_fd_daedalus >= 0)
|
||||||
|
close(driver_data->media_fd_daedalus);
|
||||||
|
#endif
|
||||||
/* Fall back to direct close if neither alt fd captured the active
|
/* Fall back to direct close if neither alt fd captured the active
|
||||||
* pair (env-override path). */
|
* pair (env-override path). */
|
||||||
if (driver_data->video_fd_rkvdec < 0 && driver_data->video_fd_hantro < 0) {
|
if (driver_data->video_fd_rkvdec < 0 && driver_data->video_fd_hantro < 0) {
|
||||||
|
|||||||
+138
-1
@@ -38,9 +38,20 @@
|
|||||||
|
|
||||||
#include <linux/videodev2.h>
|
#include <linux/videodev2.h>
|
||||||
|
|
||||||
|
#include "hevc-ctrls/v4l2-hevc-ext-controls.h"
|
||||||
|
|
||||||
#define V4L2_REQUEST_STR_VENDOR "v4l2-request"
|
#define V4L2_REQUEST_STR_VENDOR "v4l2-request"
|
||||||
|
|
||||||
#define V4L2_REQUEST_MAX_PROFILES 11
|
/*
|
||||||
|
* Sized for max-possible enumeration with iter39 Option B reverted:
|
||||||
|
* MPEG2(2) + H264(6 incl. Hi10P) + HEVC(2 incl. Main10) + VP8 + VP9 + AV1 = 13.
|
||||||
|
* The per-group guards use `if (... && index < (MAX_PROFILES - N))` where N
|
||||||
|
* is the push-group size, so MAX must be ≥ total+1 — 14 here. Bumping
|
||||||
|
* defensively now so a future re-enable of Hi10P/Main10 doesn't silently
|
||||||
|
* drop AV1 through the off-by-one trap that ate ampere-av1's enumeration
|
||||||
|
* for a week (see issue marfrit/libva-v4l2-request-fourier#2).
|
||||||
|
*/
|
||||||
|
#define V4L2_REQUEST_MAX_PROFILES 14
|
||||||
#define V4L2_REQUEST_MAX_ENTRYPOINTS 5
|
#define V4L2_REQUEST_MAX_ENTRYPOINTS 5
|
||||||
#define V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES 10
|
#define V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES 10
|
||||||
#define V4L2_REQUEST_MAX_IMAGE_FORMATS 10
|
#define V4L2_REQUEST_MAX_IMAGE_FORMATS 10
|
||||||
@@ -76,6 +87,121 @@ struct request_data {
|
|||||||
int media_fd_rkvdec;
|
int media_fd_rkvdec;
|
||||||
int video_fd_hantro;
|
int video_fd_hantro;
|
||||||
int media_fd_hantro;
|
int media_fd_hantro;
|
||||||
|
/*
|
||||||
|
* iter40: third multi-device-probe slot for rpi-hevc-dec (Pi 5 /
|
||||||
|
* CM5 / BCM2712). V4L2 stateless HEVC; CAPTURE is NC12/NC30 SAND
|
||||||
|
* 128-pixel-wide column tiled (Pi-specific). On Pi 5 this is the
|
||||||
|
* ONLY decoder slot; on RK hosts it stays -1 and HEVC routes to
|
||||||
|
* rkvdec as before.
|
||||||
|
*/
|
||||||
|
int video_fd_rpi_hevc_dec;
|
||||||
|
int media_fd_rpi_hevc_dec;
|
||||||
|
/*
|
||||||
|
* phase 8.10: fifth multi-device-probe slot for daedalus_v4l2 — the
|
||||||
|
* out-of-tree V4L2 stateless decoder shim that forwards bitstream
|
||||||
|
* to a userspace daemon (daedalus-v4l2 sibling repo). Daemon does
|
||||||
|
* FFmpeg-software decode for VP9 / AV1 / H.264 and ships pixels
|
||||||
|
* back via dmabuf into the CAPTURE buffer. Picked up via the
|
||||||
|
* same media-controller probe + known_decoder_drivers[] entry
|
||||||
|
* pattern as iter40 rpi-hevc-dec. Stays -1 on hosts without the
|
||||||
|
* daedalus module loaded; HEVC routes to rpi-hevc-dec as before.
|
||||||
|
*
|
||||||
|
* Fields are unconditional (8 bytes per session) so the struct
|
||||||
|
* layout is stable regardless of meson option. The active
|
||||||
|
* probe + dispatch code in request.c is gated by
|
||||||
|
* HAVE_DAEDALUS_V4L2; when disabled the fields stay at their
|
||||||
|
* -1 init and no codepath touches them.
|
||||||
|
*/
|
||||||
|
int video_fd_daedalus;
|
||||||
|
int media_fd_daedalus;
|
||||||
|
/*
|
||||||
|
* ampere-av1-enablement Phase 2: fourth multi-device-probe slot
|
||||||
|
* for vpu981 (RK3588's dedicated AV1 hantro instance, kernel
|
||||||
|
* card="rockchip,rk3588-av1-vpu-dec", driver name "hantro-vpu" —
|
||||||
|
* shared with the legacy MPEG-2/VP8/H.264 hantro). Discriminated
|
||||||
|
* by V4L2_PIX_FMT_AV1_FRAME (AV1F) OUTPUT-pixfmt capability since
|
||||||
|
* the driver name alone is ambiguous on RK3588. Stays -1 on hosts
|
||||||
|
* without the AV1 vpu-dec.
|
||||||
|
*
|
||||||
|
* Named "vpu981" for consistency with the in-progress av1-iter1
|
||||||
|
* operator branch (Phase 3-5 bit-exact AV1 work — when that lands
|
||||||
|
* these fields receive the actual decode dispatch wiring).
|
||||||
|
*/
|
||||||
|
int video_fd_vpu981;
|
||||||
|
int media_fd_vpu981;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* iter2 (ampere-kernel-decoders campaign) — per-fd probe result
|
||||||
|
* for the V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS controls
|
||||||
|
* introduced in Linux 7.0 (Casanova VDPU381/VDPU383 series).
|
||||||
|
* RK3399 rkvdec doesn't have them and the probe returns false;
|
||||||
|
* RK3588 rkvdec (VDPU381/383) registers them and the probe is
|
||||||
|
* true. h265_set_controls consults only the rkvdec entry because
|
||||||
|
* HEVC routes through rkvdec only — hantro's entry stays false
|
||||||
|
* naturally (it doesn't have rkvdec-specific controls).
|
||||||
|
*
|
||||||
|
* The pair-of-flags layout mirrors video_fd_rkvdec /
|
||||||
|
* video_fd_hantro above (iter38 multi-device-probe pattern,
|
||||||
|
* memory feedback_multi_device_probe_design). Phase 5 review
|
||||||
|
* surfaced this as a correctness item: a single scalar on
|
||||||
|
* driver_data would silently misbehave across device-switch
|
||||||
|
* boundaries; per-fd storage is the safe shape.
|
||||||
|
*/
|
||||||
|
bool has_hevc_ext_sps_rps_rkvdec;
|
||||||
|
bool has_hevc_ext_sps_rps_hantro;
|
||||||
|
/* iter40: rpi-hevc-dec doesn't expose EXT_SPS_*_RPS controls
|
||||||
|
* (verified Phase 0 higgs probe: QUERY_EXT_CTRL on 0xa97 → EINVAL).
|
||||||
|
* Probed for consistency with the iter2 pair-of-flags pattern;
|
||||||
|
* stays false on Pi 5 and the iter2 vendored-parser path naturally
|
||||||
|
* doesn't engage. */
|
||||||
|
bool has_hevc_ext_sps_rps_rpi_hevc_dec;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* iter2 — cached SPS-derived RPS arrays. SPS NALs only appear in
|
||||||
|
* source_data on IDR frames; non-IDR frames' h265_set_controls
|
||||||
|
* reuse the cached arrays so we don't submit zero-filled RPS to
|
||||||
|
* the kernel (which would re-trigger the OOPS the iter2 fix is
|
||||||
|
* designed to prevent). Single-slot cache (sps_id 0 only) —
|
||||||
|
* adequate for the BBB / typical-stream case; multi-SPS streams
|
||||||
|
* would need expanding to a [16] cache keyed by sps_id.
|
||||||
|
*
|
||||||
|
* The cache stores the post-mapped V4L2 control struct arrays
|
||||||
|
* (not the intermediate GstH265SPS) so request.h doesn't need
|
||||||
|
* to know about the vendored GStreamer parser types — only the
|
||||||
|
* V4L2 UAPI structs from hevc-ctrls/v4l2-hevc-ext-controls.h
|
||||||
|
* included above.
|
||||||
|
*
|
||||||
|
* Owned by h265.c; freed at RequestTerminate.
|
||||||
|
*/
|
||||||
|
struct v4l2_ctrl_hevc_ext_sps_st_rps *hevc_rps_cache_st;
|
||||||
|
unsigned int hevc_rps_cache_st_count;
|
||||||
|
struct v4l2_ctrl_hevc_ext_sps_lt_rps *hevc_rps_cache_lt;
|
||||||
|
unsigned int hevc_rps_cache_lt_count;
|
||||||
|
bool hevc_rps_cache_valid;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* iter40b: bitstream-derived SPS field cache for VAAPI-omitted
|
||||||
|
* fields. rpi-hevc-dec validates these against bitstream-true
|
||||||
|
* values; the rkvdec/hantro fallback (sps_max_dec_pic_buffering_minus1,
|
||||||
|
* 0) that satisfies §A.4.2 isn't enough for rpi.
|
||||||
|
*
|
||||||
|
* Cached on first IDR frame's SPS NAL parse, reused for subsequent
|
||||||
|
* non-IDR frames whose source_data may not carry an SPS.
|
||||||
|
*
|
||||||
|
* sps_max_sub_layers_minus1 is the index into max_*[] arrays. The
|
||||||
|
* V4L2 SPS struct fields are scalars (single sublayer), so we pick
|
||||||
|
* the HighestTid (= sps_max_sub_layers_minus1) slot — matches
|
||||||
|
* ffmpeg-vaapi + kdirect convention.
|
||||||
|
*/
|
||||||
|
struct {
|
||||||
|
bool valid;
|
||||||
|
uint8_t sps_max_sub_layers_minus1;
|
||||||
|
uint8_t max_dec_pic_buffering_minus1;
|
||||||
|
uint8_t max_num_reorder_pics;
|
||||||
|
uint8_t max_latency_increase_plus1;
|
||||||
|
bool scaling_list_enabled;
|
||||||
|
bool scaling_list_data_present;
|
||||||
|
} hevc_sps_field_cache;
|
||||||
|
|
||||||
struct video_format *video_format;
|
struct video_format *video_format;
|
||||||
|
|
||||||
@@ -133,6 +259,17 @@ struct request_data {
|
|||||||
unsigned int fmt_buffers_count;
|
unsigned int fmt_buffers_count;
|
||||||
unsigned int fmt_sizes[VIDEO_MAX_PLANES];
|
unsigned int fmt_sizes[VIDEO_MAX_PLANES];
|
||||||
unsigned int fmt_bytesperlines[VIDEO_MAX_PLANES];
|
unsigned int fmt_bytesperlines[VIDEO_MAX_PLANES];
|
||||||
|
|
||||||
|
/*
|
||||||
|
* iter39: active session is decoding a 10-bit profile (Hi10P / Main10).
|
||||||
|
* Set in RequestCreateContext from config->profile. Drives:
|
||||||
|
* - CAPTURE pix_fmt selection (NV15 instead of NV12)
|
||||||
|
* - image.c DeriveImage / QueryImageFormats fourcc reporting (P010
|
||||||
|
* instead of NV12)
|
||||||
|
* - copy_surface_to_image NV15→P010 unpack branch
|
||||||
|
* Reset to false at DestroyContext.
|
||||||
|
*/
|
||||||
|
bool is_10bit;
|
||||||
};
|
};
|
||||||
|
|
||||||
VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context);
|
VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context);
|
||||||
|
|||||||
+11
-2
@@ -182,7 +182,9 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
|
|||||||
* surface_bind_format_uniform_fields(); the per-slot
|
* surface_bind_format_uniform_fields(); the per-slot
|
||||||
* destination_* fields fill at BeginPicture via surface_bind_slot.
|
* destination_* fields fill at BeginPicture via surface_bind_slot.
|
||||||
*/
|
*/
|
||||||
if (format != VA_RT_FORMAT_YUV420)
|
/* iter39: allow YUV420_10 for Hi10P / Main10 surface allocation. */
|
||||||
|
if (format != VA_RT_FORMAT_YUV420 &&
|
||||||
|
format != VA_RT_FORMAT_YUV420_10)
|
||||||
return VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT;
|
return VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT;
|
||||||
|
|
||||||
for (i = 0; i < surfaces_count; i++) {
|
for (i = 0; i < surfaces_count; i++) {
|
||||||
@@ -706,7 +708,14 @@ VAStatus RequestExportSurfaceHandle(VADriverContextP context,
|
|||||||
|
|
||||||
planes_count = surface_object->destination_planes_count;
|
planes_count = surface_object->destination_planes_count;
|
||||||
|
|
||||||
surface_descriptor->fourcc = VA_FOURCC_NV12;
|
/* iter39: 10-bit session exports a DRM_FORMAT_NV15 buffer; advertise
|
||||||
|
* the matching fourcc so a PRIME consumer aware of NV15 (panfrost-
|
||||||
|
* Mesa et al.) can import correctly. PRIME consumers that only know
|
||||||
|
* NV12 / P010 should use the COPY (vaGetImage) path which unpacks
|
||||||
|
* NV15→P010 in image.c::copy_surface_to_image. */
|
||||||
|
surface_descriptor->fourcc = driver_data->is_10bit
|
||||||
|
? VA_FOURCC('N', 'V', '1', '5')
|
||||||
|
: VA_FOURCC_NV12;
|
||||||
surface_descriptor->width = surface_object->width;
|
surface_descriptor->width = surface_object->width;
|
||||||
surface_descriptor->height = surface_object->height;
|
surface_descriptor->height = surface_object->height;
|
||||||
surface_descriptor->num_objects = export_fds_count;
|
surface_descriptor->num_objects = export_fds_count;
|
||||||
|
|||||||
@@ -122,6 +122,18 @@ struct object_surface {
|
|||||||
VADecPictureParameterBufferVP9 picture;
|
VADecPictureParameterBufferVP9 picture;
|
||||||
VASliceParameterBufferVP9 slice;
|
VASliceParameterBufferVP9 slice;
|
||||||
} vp9;
|
} vp9;
|
||||||
|
struct {
|
||||||
|
/*
|
||||||
|
* AV1 picture parameter buffer. Slice params are
|
||||||
|
* intentionally absent — the daedalus daemon track
|
||||||
|
* (issue #11) consumes the slice OBU bytes directly
|
||||||
|
* from the OUTPUT bitstream and synthesises only the
|
||||||
|
* sequence-header OBU from V4L2_CID_STATELESS_AV1_
|
||||||
|
* SEQUENCE. No per-tile-group struct→OBU re-synthesis
|
||||||
|
* required from libva today.
|
||||||
|
*/
|
||||||
|
VADecPictureParameterBufferAV1 picture;
|
||||||
|
} av1;
|
||||||
} params;
|
} params;
|
||||||
|
|
||||||
int request_fd;
|
int request_fd;
|
||||||
|
|||||||
+26
-3
@@ -476,12 +476,35 @@ int v4l2_set_controls(int video_fd, int request_fd,
|
|||||||
struct v4l2_ext_control *control_array,
|
struct v4l2_ext_control *control_array,
|
||||||
unsigned int num_controls)
|
unsigned int num_controls)
|
||||||
{
|
{
|
||||||
|
struct v4l2_ext_controls controls;
|
||||||
int rc;
|
int rc;
|
||||||
|
|
||||||
rc = v4l2_ioctl_controls(video_fd, request_fd, VIDIOC_S_EXT_CTRLS,
|
memset(&controls, 0, sizeof(controls));
|
||||||
control_array, num_controls);
|
controls.controls = control_array;
|
||||||
|
controls.count = num_controls;
|
||||||
|
if (request_fd >= 0) {
|
||||||
|
controls.which = V4L2_CTRL_WHICH_REQUEST_VAL;
|
||||||
|
controls.request_fd = request_fd;
|
||||||
|
}
|
||||||
|
|
||||||
|
rc = ioctl(video_fd, VIDIOC_S_EXT_CTRLS, &controls);
|
||||||
if (rc < 0) {
|
if (rc < 0) {
|
||||||
request_log("Unable to set control(s): %s\n", strerror(errno));
|
/* error_idx is the index of the first failing control;
|
||||||
|
* if it equals count, the ioctl itself failed (not a
|
||||||
|
* specific control payload). Useful for triaging
|
||||||
|
* which V4L2_CID_STATELESS_* the kernel rejected. */
|
||||||
|
if (controls.error_idx < num_controls)
|
||||||
|
request_log("Unable to set control(s): %s "
|
||||||
|
"(error_idx=%u/%u failing_ctrl_id=0x%x size=%u)\n",
|
||||||
|
strerror(errno),
|
||||||
|
controls.error_idx, controls.count,
|
||||||
|
control_array[controls.error_idx].id,
|
||||||
|
control_array[controls.error_idx].size);
|
||||||
|
else
|
||||||
|
request_log("Unable to set control(s): %s "
|
||||||
|
"(error_idx=%u/%u ioctl-level)\n",
|
||||||
|
strerror(errno),
|
||||||
|
controls.error_idx, controls.count);
|
||||||
return -1;
|
return -1;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
+34
@@ -31,6 +31,8 @@
|
|||||||
#include <drm_fourcc.h>
|
#include <drm_fourcc.h>
|
||||||
#include <linux/videodev2.h>
|
#include <linux/videodev2.h>
|
||||||
|
|
||||||
|
#include "nv12_col128.h" /* fallback V4L2_PIX_FMT_NV12_COL128 define */
|
||||||
|
#include "nv15.h" /* fallback V4L2_PIX_FMT_NV15 define */
|
||||||
#include "utils.h"
|
#include "utils.h"
|
||||||
#include "video.h"
|
#include "video.h"
|
||||||
|
|
||||||
@@ -45,6 +47,38 @@ static struct video_format formats[] = {
|
|||||||
.planes_count = 2,
|
.planes_count = 2,
|
||||||
.bpp = 16,
|
.bpp = 16,
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
.description = "NV15 YUV (10-bit, rkvdec)",
|
||||||
|
.v4l2_format = V4L2_PIX_FMT_NV15,
|
||||||
|
.v4l2_buffers_count = 1,
|
||||||
|
.v4l2_mplane = true,
|
||||||
|
.drm_format = DRM_FORMAT_NV15,
|
||||||
|
.drm_modifier = DRM_FORMAT_MOD_NONE,
|
||||||
|
.planes_count = 2,
|
||||||
|
.bpp = 24,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
/*
|
||||||
|
* iter40: Pi 5 / CM5 rpi-hevc-dec CAPTURE format. 8-bit NV12
|
||||||
|
* stored as 128-pixel-wide column tiles (SAND128 layout).
|
||||||
|
* Pi-specific; not in mainline drm_fourcc.h (uses NV12 + a
|
||||||
|
* BROADCOM_SAND128 modifier for DRM_PRIME). Our consumer path
|
||||||
|
* always detiles to linear NV12 in copy_surface_to_image, so
|
||||||
|
* we don't expose the SAND modifier downstream — drm_format is
|
||||||
|
* still DRM_FORMAT_NV12 and drm_modifier MOD_NONE so the
|
||||||
|
* format-is-linear gate doesn't pull us into tiled_to_planar
|
||||||
|
* (Sunxi-specific). image.c branches on v4l2_format ==
|
||||||
|
* V4L2_PIX_FMT_NV12_COL128 to invoke the dedicated detile.
|
||||||
|
*/
|
||||||
|
.description = "NV12 SAND128 (8-bit, rpi-hevc-dec)",
|
||||||
|
.v4l2_format = V4L2_PIX_FMT_NV12_COL128,
|
||||||
|
.v4l2_buffers_count = 1,
|
||||||
|
.v4l2_mplane = true,
|
||||||
|
.drm_format = DRM_FORMAT_NV12,
|
||||||
|
.drm_modifier = DRM_FORMAT_MOD_NONE,
|
||||||
|
.planes_count = 2,
|
||||||
|
.bpp = 16,
|
||||||
|
},
|
||||||
// Code to handle this DRM_FORMAT is __arm__ only
|
// Code to handle this DRM_FORMAT is __arm__ only
|
||||||
#ifdef __arm__
|
#ifdef __arm__
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -0,0 +1,196 @@
|
|||||||
|
/*
|
||||||
|
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
|
||||||
|
*
|
||||||
|
* MIT-licensed per project. iter40 self-test for nv12_col128 detile.
|
||||||
|
*
|
||||||
|
* Build an NC12-tiled source buffer from a known linear NV12 image,
|
||||||
|
* run the detile primitive, assert output matches the original. No
|
||||||
|
* hardware needed — pure bit-layout verification of the kernel math
|
||||||
|
* (drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c
|
||||||
|
* V4L2_PIX_FMT_NV12_COL128 case + ffmpeg/Kynesim per-pixel offset).
|
||||||
|
*
|
||||||
|
* Build:
|
||||||
|
* cc -Wall -Werror -O2 -o test_nv12_col128_detile \
|
||||||
|
* tests/test_nv12_col128_detile.c src/nv12_col128.c
|
||||||
|
*
|
||||||
|
* Exit 0 = all asserts pass.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "../src/nv12_col128.h"
|
||||||
|
|
||||||
|
#include <assert.h>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <string.h>
|
||||||
|
|
||||||
|
#define TILE_W 128
|
||||||
|
|
||||||
|
static unsigned int align_up(unsigned int v, unsigned int a)
|
||||||
|
{
|
||||||
|
return (v + a - 1) & ~(a - 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Pack a linear plane (width × height bytes, stride=width) into NC12
|
||||||
|
* layout: each 128-wide column held contiguously, columns at offsets
|
||||||
|
* col * col_stride * 128. col_stride is the kernel-reported bytesperline
|
||||||
|
* = ALIGN(height, 8) * 3/2. Returns the buffer + sizes. */
|
||||||
|
static uint8_t *pack_to_nc12(const uint8_t *linear,
|
||||||
|
unsigned int width, unsigned int height,
|
||||||
|
unsigned int *out_col_stride,
|
||||||
|
size_t *out_size)
|
||||||
|
{
|
||||||
|
unsigned int aligned_w = align_up(width, TILE_W);
|
||||||
|
unsigned int aligned_h = align_up(height, 8);
|
||||||
|
unsigned int col_stride = aligned_h * 3 / 2;
|
||||||
|
unsigned int num_cols = aligned_w / TILE_W;
|
||||||
|
size_t total = (size_t)col_stride * aligned_w;
|
||||||
|
uint8_t *buf;
|
||||||
|
unsigned int col, y, in_col;
|
||||||
|
|
||||||
|
buf = calloc(1, total);
|
||||||
|
assert(buf != NULL);
|
||||||
|
|
||||||
|
for (col = 0; col < num_cols; col++) {
|
||||||
|
uint8_t *col_base = buf + (size_t)col * TILE_W * col_stride;
|
||||||
|
for (y = 0; y < height; y++) {
|
||||||
|
for (in_col = 0; in_col < TILE_W; in_col++) {
|
||||||
|
unsigned int x = col * TILE_W + in_col;
|
||||||
|
if (x >= width)
|
||||||
|
break;
|
||||||
|
col_base[(size_t)y * TILE_W + in_col] =
|
||||||
|
linear[(size_t)y * width + x];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
*out_col_stride = col_stride;
|
||||||
|
*out_size = total;
|
||||||
|
return buf;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void test_detile_y(unsigned int width, unsigned int height)
|
||||||
|
{
|
||||||
|
uint8_t *linear, *tiled, *recovered;
|
||||||
|
unsigned int col_stride;
|
||||||
|
size_t tile_size, i;
|
||||||
|
|
||||||
|
linear = malloc((size_t)width * height);
|
||||||
|
assert(linear != NULL);
|
||||||
|
/* Distinctive content per pixel: y * 17 + x * 13 — avoids byte-
|
||||||
|
* aliasing patterns that could mask off-by-one bugs. */
|
||||||
|
for (unsigned int y = 0; y < height; y++)
|
||||||
|
for (unsigned int x = 0; x < width; x++)
|
||||||
|
linear[(size_t)y * width + x] = (uint8_t)(y * 17 + x * 13);
|
||||||
|
|
||||||
|
tiled = pack_to_nc12(linear, width, height, &col_stride, &tile_size);
|
||||||
|
|
||||||
|
recovered = calloc(1, (size_t)width * height);
|
||||||
|
assert(recovered != NULL);
|
||||||
|
|
||||||
|
nv12_col128_detile_y(recovered, width, tiled, col_stride, width, height);
|
||||||
|
|
||||||
|
for (i = 0; i < (size_t)width * height; i++) {
|
||||||
|
if (recovered[i] != linear[i]) {
|
||||||
|
fprintf(stderr,
|
||||||
|
"FAIL %ux%u Y: pixel %zu (x=%zu y=%zu) "
|
||||||
|
"linear=0x%02x recovered=0x%02x\n",
|
||||||
|
width, height, i,
|
||||||
|
i % width, i / width,
|
||||||
|
linear[i], recovered[i]);
|
||||||
|
free(linear); free(tiled); free(recovered);
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
printf("PASS %ux%u Y plane (%u columns, col_stride=%u, tile_size=%zu)\n",
|
||||||
|
width, height, align_up(width, TILE_W) / TILE_W,
|
||||||
|
col_stride, tile_size);
|
||||||
|
|
||||||
|
free(linear);
|
||||||
|
free(tiled);
|
||||||
|
free(recovered);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void test_detile_uv(unsigned int width, unsigned int height)
|
||||||
|
{
|
||||||
|
unsigned int uv_h = height / 2;
|
||||||
|
uint8_t *linear, *tiled, *recovered;
|
||||||
|
unsigned int col_stride;
|
||||||
|
size_t tile_size, i;
|
||||||
|
|
||||||
|
linear = malloc((size_t)width * uv_h);
|
||||||
|
assert(linear != NULL);
|
||||||
|
for (unsigned int y = 0; y < uv_h; y++)
|
||||||
|
for (unsigned int x = 0; x < width; x++)
|
||||||
|
linear[(size_t)y * width + x] = (uint8_t)(y * 23 + x * 7);
|
||||||
|
|
||||||
|
tiled = pack_to_nc12(linear, width, uv_h, &col_stride, &tile_size);
|
||||||
|
|
||||||
|
recovered = calloc(1, (size_t)width * uv_h);
|
||||||
|
assert(recovered != NULL);
|
||||||
|
|
||||||
|
nv12_col128_detile_uv(recovered, width, tiled, col_stride, width, uv_h);
|
||||||
|
|
||||||
|
for (i = 0; i < (size_t)width * uv_h; i++) {
|
||||||
|
if (recovered[i] != linear[i]) {
|
||||||
|
fprintf(stderr,
|
||||||
|
"FAIL %ux%u UV: pixel %zu linear=0x%02x recovered=0x%02x\n",
|
||||||
|
width, height, i,
|
||||||
|
linear[i], recovered[i]);
|
||||||
|
free(linear); free(tiled); free(recovered);
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
printf("PASS %ux%u UV plane\n", width, height);
|
||||||
|
|
||||||
|
free(linear);
|
||||||
|
free(tiled);
|
||||||
|
free(recovered);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void test_uv_offset(void)
|
||||||
|
{
|
||||||
|
/* Per the SAND COL128 layout, Y and UV are interleaved within
|
||||||
|
* EACH column (not concatenated as separate planes), so the UV
|
||||||
|
* plane base pointer is offset by 128 * ALIGN(height, 8) — the
|
||||||
|
* Y portion of column 0. NOT 128 * height * num_columns (the
|
||||||
|
* size of all Y across all columns), which was an earlier wrong
|
||||||
|
* formula caught by Phase 7 SEGV on higgs. */
|
||||||
|
unsigned int off = nv12_col128_uv_plane_offset(1280, 720);
|
||||||
|
if (off != 128u * 720) {
|
||||||
|
fprintf(stderr, "FAIL UV offset 1280×720: got %u expected %u\n",
|
||||||
|
off, 128u * 720);
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
printf("PASS UV offset 1280×720 = %u\n", off);
|
||||||
|
|
||||||
|
off = nv12_col128_uv_plane_offset(1366, 768);
|
||||||
|
if (off != 128u * 768) {
|
||||||
|
fprintf(stderr, "FAIL UV offset 1366×768: got %u expected %u\n",
|
||||||
|
off, 128u * 768);
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
printf("PASS UV offset 1366×768 (column-misaligned width)\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
int main(void)
|
||||||
|
{
|
||||||
|
/* Phase 3 fixture sizes — all 128-aligned, 8-line-aligned. */
|
||||||
|
test_detile_y(640, 360);
|
||||||
|
test_detile_y(1280, 720);
|
||||||
|
test_detile_y(1920, 1080);
|
||||||
|
|
||||||
|
/* Phase 5 review F4: column-misaligned width (1366 → 1408 padding). */
|
||||||
|
test_detile_y(1366, 768);
|
||||||
|
|
||||||
|
/* UV plane (half-height) at each width. */
|
||||||
|
test_detile_uv(640, 360);
|
||||||
|
test_detile_uv(1280, 720);
|
||||||
|
test_detile_uv(1920, 1080);
|
||||||
|
test_detile_uv(1366, 768);
|
||||||
|
|
||||||
|
test_uv_offset();
|
||||||
|
|
||||||
|
printf("All NC12 detile asserts pass.\n");
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
@@ -0,0 +1,224 @@
|
|||||||
|
/*
|
||||||
|
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
|
||||||
|
*
|
||||||
|
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||||
|
* copy of this software and associated documentation files (the
|
||||||
|
* "Software"), to deal in the Software without restriction, including
|
||||||
|
* without limitation the rights to use, copy, modify, merge, publish,
|
||||||
|
* distribute, sub license, and/or sell copies of the Software, and to
|
||||||
|
* permit persons to whom the Software is furnished to do so, subject to
|
||||||
|
* the following conditions:
|
||||||
|
*
|
||||||
|
* The above copyright notice and this permission notice (including the
|
||||||
|
* next paragraph) shall be included in all copies or substantial portions
|
||||||
|
* of the Software.
|
||||||
|
*
|
||||||
|
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||||
|
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||||
|
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
|
||||||
|
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
|
||||||
|
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
||||||
|
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
|
||||||
|
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/*
|
||||||
|
* iter39 self-test for nv15_unpack_plane_to_p010.
|
||||||
|
*
|
||||||
|
* Builds NV15 plane buffers from known 10-bit pixel arrays, runs the
|
||||||
|
* unpack, asserts P010 output matches the expected pixel<<6 values.
|
||||||
|
* No hardware needed — pure bit layout verification per
|
||||||
|
* Documentation/userspace-api/media/v4l/pixfmt-nv15.rst.
|
||||||
|
*
|
||||||
|
* Build:
|
||||||
|
* cc -Wall -Werror -O2 -o test_nv15_unpack tests/test_nv15_unpack.c src/nv15.c
|
||||||
|
*
|
||||||
|
* Exit 0 = all asserts pass.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "../src/nv15.h"
|
||||||
|
|
||||||
|
#include <assert.h>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <string.h>
|
||||||
|
|
||||||
|
/* Pack 4 10-bit pixels into 5 bytes per NV15 layout (LSB-first across
|
||||||
|
* bits 0..39). Inverse of nv15_unpack_plane_to_p010's per-group unpack. */
|
||||||
|
static void pack4(uint16_t a, uint16_t b, uint16_t c, uint16_t d,
|
||||||
|
uint8_t out[5])
|
||||||
|
{
|
||||||
|
out[0] = (uint8_t)(a & 0xFF);
|
||||||
|
out[1] = (uint8_t)(((a >> 8) & 0x03) | ((b & 0x3F) << 2));
|
||||||
|
out[2] = (uint8_t)(((b >> 6) & 0x0F) | ((c & 0x0F) << 4));
|
||||||
|
out[3] = (uint8_t)(((c >> 4) & 0x3F) | ((d & 0x03) << 6));
|
||||||
|
out[4] = (uint8_t)((d >> 2) & 0xFF);
|
||||||
|
}
|
||||||
|
|
||||||
|
#define ASSERT_EQ(actual, expected, msg) do { \
|
||||||
|
if ((actual) != (expected)) { \
|
||||||
|
fprintf(stderr, "FAIL %s: actual=0x%04x expected=0x%04x at %s:%d\n", \
|
||||||
|
(msg), (unsigned)(actual), (unsigned)(expected), \
|
||||||
|
__FILE__, __LINE__); \
|
||||||
|
exit(1); \
|
||||||
|
} \
|
||||||
|
} while (0)
|
||||||
|
|
||||||
|
static void test_pack_unpack_roundtrip(uint16_t a, uint16_t b, uint16_t c,
|
||||||
|
uint16_t d)
|
||||||
|
{
|
||||||
|
uint8_t packed[5];
|
||||||
|
uint16_t dst[4];
|
||||||
|
|
||||||
|
pack4(a, b, c, d, packed);
|
||||||
|
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
|
||||||
|
ASSERT_EQ(dst[0], (uint16_t)(a << 6), "roundtrip a");
|
||||||
|
ASSERT_EQ(dst[1], (uint16_t)(b << 6), "roundtrip b");
|
||||||
|
ASSERT_EQ(dst[2], (uint16_t)(c << 6), "roundtrip c");
|
||||||
|
ASSERT_EQ(dst[3], (uint16_t)(d << 6), "roundtrip d");
|
||||||
|
}
|
||||||
|
|
||||||
|
static void test_zero(void)
|
||||||
|
{
|
||||||
|
uint8_t packed[5] = { 0, 0, 0, 0, 0 };
|
||||||
|
uint16_t dst[4] = { 0xDEAD, 0xDEAD, 0xDEAD, 0xDEAD };
|
||||||
|
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
|
||||||
|
ASSERT_EQ(dst[0], 0, "zero[0]");
|
||||||
|
ASSERT_EQ(dst[1], 0, "zero[1]");
|
||||||
|
ASSERT_EQ(dst[2], 0, "zero[2]");
|
||||||
|
ASSERT_EQ(dst[3], 0, "zero[3]");
|
||||||
|
}
|
||||||
|
|
||||||
|
static void test_all_max(void)
|
||||||
|
{
|
||||||
|
/* All four pixels = 0x3FF (max 10-bit). Packed bits all 1 → all 0xFF. */
|
||||||
|
uint8_t packed[5] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };
|
||||||
|
uint16_t dst[4] = { 0, 0, 0, 0 };
|
||||||
|
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
|
||||||
|
ASSERT_EQ(dst[0], 0xFFC0, "max[0]");
|
||||||
|
ASSERT_EQ(dst[1], 0xFFC0, "max[1]");
|
||||||
|
ASSERT_EQ(dst[2], 0xFFC0, "max[2]");
|
||||||
|
ASSERT_EQ(dst[3], 0xFFC0, "max[3]");
|
||||||
|
}
|
||||||
|
|
||||||
|
static void test_known_vectors(void)
|
||||||
|
{
|
||||||
|
/* Position-sensitive sanity: each pixel = its index+1. */
|
||||||
|
test_pack_unpack_roundtrip(1, 2, 3, 4);
|
||||||
|
/* Spread patterns that exercise every byte-boundary bit. */
|
||||||
|
test_pack_unpack_roundtrip(0x3FF, 0x000, 0x3FF, 0x000);
|
||||||
|
test_pack_unpack_roundtrip(0x000, 0x3FF, 0x000, 0x3FF);
|
||||||
|
test_pack_unpack_roundtrip(0x155, 0x2AA, 0x155, 0x2AA);
|
||||||
|
test_pack_unpack_roundtrip(0x001, 0x002, 0x004, 0x008);
|
||||||
|
test_pack_unpack_roundtrip(0x080, 0x040, 0x020, 0x010);
|
||||||
|
test_pack_unpack_roundtrip(0x200, 0x100, 0x080, 0x040);
|
||||||
|
test_pack_unpack_roundtrip(0x3F0, 0x0F3, 0x33C, 0x2A5);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void test_remainder_width(void)
|
||||||
|
{
|
||||||
|
/* width=1: only A unpacked, B/C/D undefined */
|
||||||
|
{
|
||||||
|
uint8_t packed[5];
|
||||||
|
uint16_t dst[1] = { 0xDEAD };
|
||||||
|
pack4(0x123, 0x000, 0x000, 0x000, packed);
|
||||||
|
nv15_unpack_plane_to_p010(packed, dst, 1, 1, 5);
|
||||||
|
ASSERT_EQ(dst[0], 0x123 << 6, "rem1[0]");
|
||||||
|
}
|
||||||
|
/* width=2 */
|
||||||
|
{
|
||||||
|
uint8_t packed[5];
|
||||||
|
uint16_t dst[2] = { 0, 0 };
|
||||||
|
pack4(0x111, 0x222, 0x000, 0x000, packed);
|
||||||
|
nv15_unpack_plane_to_p010(packed, dst, 2, 1, 5);
|
||||||
|
ASSERT_EQ(dst[0], 0x111 << 6, "rem2[0]");
|
||||||
|
ASSERT_EQ(dst[1], 0x222 << 6, "rem2[1]");
|
||||||
|
}
|
||||||
|
/* width=3 */
|
||||||
|
{
|
||||||
|
uint8_t packed[5];
|
||||||
|
uint16_t dst[3] = { 0, 0, 0 };
|
||||||
|
pack4(0x111, 0x222, 0x333, 0x000, packed);
|
||||||
|
nv15_unpack_plane_to_p010(packed, dst, 3, 1, 5);
|
||||||
|
ASSERT_EQ(dst[0], 0x111 << 6, "rem3[0]");
|
||||||
|
ASSERT_EQ(dst[1], 0x222 << 6, "rem3[1]");
|
||||||
|
ASSERT_EQ(dst[2], 0x333 << 6, "rem3[2]");
|
||||||
|
}
|
||||||
|
/* width=7: one full group + 3 remainder */
|
||||||
|
{
|
||||||
|
uint8_t packed[10];
|
||||||
|
uint16_t dst[7] = { 0 };
|
||||||
|
pack4(0x100, 0x200, 0x300, 0x010, &packed[0]);
|
||||||
|
pack4(0x011, 0x022, 0x033, 0x000, &packed[5]);
|
||||||
|
nv15_unpack_plane_to_p010(packed, dst, 7, 1, 10);
|
||||||
|
ASSERT_EQ(dst[0], 0x100 << 6, "rem7[0]");
|
||||||
|
ASSERT_EQ(dst[1], 0x200 << 6, "rem7[1]");
|
||||||
|
ASSERT_EQ(dst[2], 0x300 << 6, "rem7[2]");
|
||||||
|
ASSERT_EQ(dst[3], 0x010 << 6, "rem7[3]");
|
||||||
|
ASSERT_EQ(dst[4], 0x011 << 6, "rem7[4]");
|
||||||
|
ASSERT_EQ(dst[5], 0x022 << 6, "rem7[5]");
|
||||||
|
ASSERT_EQ(dst[6], 0x033 << 6, "rem7[6]");
|
||||||
|
}
|
||||||
|
/* width=8: two full groups */
|
||||||
|
{
|
||||||
|
uint8_t packed[10];
|
||||||
|
uint16_t dst[8] = { 0 };
|
||||||
|
pack4(0x101, 0x202, 0x303, 0x101, &packed[0]);
|
||||||
|
pack4(0x202, 0x303, 0x101, 0x202, &packed[5]);
|
||||||
|
nv15_unpack_plane_to_p010(packed, dst, 8, 1, 10);
|
||||||
|
ASSERT_EQ(dst[7], 0x202 << 6, "w8[7]");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
static void test_multi_row_stride_padding(void)
|
||||||
|
{
|
||||||
|
/* 4-pixel-wide, 3-row plane; stride = 8 bytes (3 bytes padding). */
|
||||||
|
uint8_t packed[24]; /* 3 rows × 8 bytes */
|
||||||
|
uint16_t dst[12]; /* 3 rows × 4 pixels */
|
||||||
|
memset(packed, 0xCC, sizeof(packed)); /* padding poison */
|
||||||
|
|
||||||
|
pack4(0x111, 0x222, 0x333, 0x044, &packed[0 * 8]);
|
||||||
|
pack4(0x055, 0x166, 0x177, 0x188, &packed[1 * 8]);
|
||||||
|
pack4(0x099, 0x1AA, 0x2BB, 0x3CC, &packed[2 * 8]);
|
||||||
|
|
||||||
|
memset(dst, 0xAB, sizeof(dst));
|
||||||
|
nv15_unpack_plane_to_p010(packed, dst, 4, 3, 8);
|
||||||
|
|
||||||
|
ASSERT_EQ(dst[0], 0x111 << 6, "row0[0]");
|
||||||
|
ASSERT_EQ(dst[3], 0x044 << 6, "row0[3]");
|
||||||
|
ASSERT_EQ(dst[4], 0x055 << 6, "row1[0]");
|
||||||
|
ASSERT_EQ(dst[7], 0x188 << 6, "row1[3]");
|
||||||
|
ASSERT_EQ(dst[8], 0x099 << 6, "row2[0]");
|
||||||
|
ASSERT_EQ(dst[11], 0x3CC << 6, "row2[3]");
|
||||||
|
}
|
||||||
|
|
||||||
|
static void test_chroma_half_height(void)
|
||||||
|
{
|
||||||
|
/* 4-pixel-wide × 2-row chroma (matches 4×4 luma quadrant).
|
||||||
|
* NV15 chroma uses same packing as luma, just half-height. */
|
||||||
|
uint8_t packed[10]; /* 2 rows × 5 bytes */
|
||||||
|
uint16_t dst[8]; /* 2 rows × 4 pixels (UV pairs in interleaved form) */
|
||||||
|
|
||||||
|
pack4(0x080, 0x180, 0x280, 0x380, &packed[0]);
|
||||||
|
pack4(0x040, 0x140, 0x240, 0x340, &packed[5]);
|
||||||
|
|
||||||
|
nv15_unpack_plane_to_p010(packed, dst, 4, 2, 5);
|
||||||
|
|
||||||
|
ASSERT_EQ(dst[0], 0x080 << 6, "chroma row0[0]");
|
||||||
|
ASSERT_EQ(dst[3], 0x380 << 6, "chroma row0[3]");
|
||||||
|
ASSERT_EQ(dst[4], 0x040 << 6, "chroma row1[0]");
|
||||||
|
ASSERT_EQ(dst[7], 0x340 << 6, "chroma row1[3]");
|
||||||
|
}
|
||||||
|
|
||||||
|
int main(void)
|
||||||
|
{
|
||||||
|
test_zero();
|
||||||
|
test_all_max();
|
||||||
|
test_known_vectors();
|
||||||
|
test_remainder_width();
|
||||||
|
test_multi_row_stride_padding();
|
||||||
|
test_chroma_half_height();
|
||||||
|
printf("test_nv15_unpack: all PASS\n");
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
Reference in New Issue
Block a user