21 Commits

Author SHA1 Message Date
test0r e8c3937435 STUDY.md: replace with pointer to libva-multiplanar campaign Phase 0
The Phase 0 / Phase 2 substrate that lived here has been transformed
into ../phase0_findings.md as the campaign-level Phase 0 document.
This file is reduced to a pointer + a git-show recipe to recover the
prior content from commit e0acc33.
2026-05-04 08:08:32 +00:00
test0r e0acc33455 STUDY.md: phase 2 finding — libva surface stack works; Brave wall is chromeos pipeline
mpv --hwdec=vaapi successfully probes our driver end-to-end:
RequestQueryImageFormats, QueryConfigEntrypoints, CreateConfig,
QuerySurfaceAttributes, CreateSurfaces2, DeriveImage, CreateImage,
CreateBuffer, ExportSurfaceHandle all run clean across all seven enumerated
profiles. mpv then falls back to SW for actual decode (drops match the
SW baseline) because our decode-submission path isn't there yet — but
the libva entry-point surface is largely done.

Brave's "failed Initialize()ing the frame pool" turns out to be in
chromium's chromeos pipeline (PickDecoderOutputFormat → ImageProcessor
init in media/gpu/chromeos/video_decoder_pipeline.cc), not in our
driver. No more libva calls happen between our successful CreateContext
and the failure; chromium bails on the chromeos-specific V4L2
ImageProcessor it expects on real ChromeOS but doesn't find on a plain
Linux Wayland system. Fix is on the Chromium build side, not here.

Remaining real work in this library: decode submission path (Begin/
Render/EndPicture → V4L2 stateless queue/dequeue with controls
attached), and proper STREAMON ordering on hantro. STUDY.md now
documents both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 22:41:54 +00:00
test0r 283df22748 WIP: log surfaces_count + flags + SUCCESS marker in CreateContext 2026-04-25 22:34:22 +00:00
test0r 4671f64479 WIP: ENTER traces in image.c + buffer.c entry points 2026-04-25 22:30:44 +00:00
test0r 07fd527114 WIP: more entry-point tracing (CreateConfig, GetConfigAttributes, QuerySurfaceAttributes, QueryConfigEntrypoints)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 22:25:36 +00:00
test0r 2737ff921d WIP: instrument surface.c (CreateSurfaces2, ExportSurfaceHandle) for tracing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 22:21:44 +00:00
test0r f2c3a4c32f STUDY.md: checkpoint after first-day port work
Update with current state: library builds clean, vainfo enumerates
profiles, vaCreateContext succeeds on Brave (with STREAMON deferred as
WIP unblocker), next failure is frame pool initialization in
vaCreateSurfaces2. Documents the 12-step diff stack vs bootlin upstream
and what still needs to happen to actually decode a frame.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 22:10:11 +00:00
test0r 44a73271ae WIP: defer STREAMON in CreateContext (probe how far vaCreateContext gets)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 22:08:28 +00:00
test0r 0a3432ad64 Eager CAPTURE format probe in RequestInit
Chromium's vaapi_video_decoder may call vaCreateContext with surfaces=NULL,
surfaces_count=0 and then create surfaces afterwards via vaCreateSurfaces2.
In that order driver_data->video_format is still NULL when CreateContext
runs and our early `video_format == NULL` guard returns OPERATION_FAILED.
Confirmed via temporary request_log() — Brave hits exactly that path on
ohm.

Move the probe out of RequestCreateSurfaces into a new
video_format_probe() helper in video.c, and call it eagerly from
RequestInit. RequestCreateSurfaces still re-probes if init came up NULL,
which preserves the original lazy behaviour for any caller that needs it.

Also small clean-up of surface.c since the probe block moved out: drop
the now-dead `bool found` local.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 22:03:38 +00:00
test0r ac674b84ec src/utils.c: tee request_log() to /tmp/libva-fourier.log
Sandboxed processes (Chromium GPU process etc.) redirect stderr, so
request_log() output never reaches our test harness. Add a lazy-open
append to /tmp/libva-fourier.log so we can capture diagnostic logging
regardless of process sandboxing.

Will revert before final.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:59:57 +00:00
test0r 1458622729 src/context.c: temporary diagnostic logging in CreateContext
Added request_log() calls at every failure path inside RequestCreateContext
to identify exactly which guard fires when Brave's vaCreateContext fails on
ohm. Will revert these before final.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:57:33 +00:00
test0r 4d14ffb801 src/tiled_yuv.S: stub tiled_to_planar on aarch64
The whole body of tiled_yuv.S is wrapped in #ifndef __aarch64__ — the
ARMv7 NEON Thumb code doesn't assemble on aarch64. On 64-bit ARM the
.S object is therefore empty and tiled_to_planar comes out as an
undefined symbol in the .so, which dlopen rejects at vainfo time:

  libva error: dlopen ... failed: undefined symbol: tiled_to_planar

Runtime-wise, tiled_to_planar is only called when
video_format_is_linear() returns false, which only happens for the
sunxi-cedrus DRM_FORMAT_MOD_ALLWINNER_TILED NV12 entry — never on
hantro/rkvdec. So the right fix is a no-op stub on aarch64 just to
satisfy the linker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:54:19 +00:00
test0r 13e9b64bcf src/h264.c: drop num_slices field; kernel infers from queued controls
decode->num_slices was the last field on v4l2_ctrl_h264_decode_params
that didn't survive the upstream cleanup. The kernel now infers the
slice count from how many slice_params controls were queued via the
request API for the given OUTPUT buffer; no explicit count is needed.

This was the only remaining build error; library should now compile clean
against current linux-api-headers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:52:17 +00:00
test0r fc4bb1063f src/h264.c: track upstream UAPI shape for H.264 controls
The H.264 stateless control structs and CIDs were upstreamed and
reshaped during the journey from libva-v4l2-request's bundled headers
to the kernel's <linux/v4l2-controls.h>. Three breaking changes need
matching code-side updates:

1. struct v4l2_ctrl_h264_slice_params lost its .size field. The full
   slice byte length is now communicated via the OUTPUT buffer's
   m.planes[0].bytesused at QBUF time, which v4l2_queue_buffer already
   sets. Drop the assignment.

2. struct v4l2_h264_reference replaces the bare integer DPB index in
   ref_pic_list0[] / ref_pic_list1[]. The new struct is { __u8 fields,
   __u8 index } where `fields` is V4L2_H264_TOP_FIELD_REF (0x1) /
   BOTTOM_FIELD_REF (0x2) / FRAME_REF (0x3). Hantro G1 only does
   frame-based H.264, so use V4L2_H264_FRAME_REF unconditionally.

3. The pred_weight_table block moved out of slice_params into its own
   V4L2_CID_STATELESS_H264_PRED_WEIGHTS control with struct
   v4l2_ctrl_h264_pred_weights. Add it as a separate v4l2_set_control()
   call alongside SPS/PPS/SLICE_PARAMS/DECODE_PARAMS/SCALING_MATRIX, and
   thread the pred_weights pointer through h264_va_slice_to_v4l2().

Also: switch the v4l2_set_control() CID arguments from V4L2_CID_MPEG_VIDEO_H264_*
to V4L2_CID_STATELESS_H264_* directly. h264-ctrls.h still defines the
old names as aliases so external callers keep working, but using the
canonical names internally matches what the kernel actually expects and
avoids confusion when reading h264.c side-by-side with the kernel
source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:51:43 +00:00
test0r da9f2a55d2 include/h264-ctrls.h: passthrough to system header + CID aliases
H.264 stateless controls were upstreamed to the kernel some time after
this library went dormant. The struct names match exactly (the kernel
adopted v4l2_ctrl_h264_{sps,pps,scaling_matrix,pred_weights,slice_params,
decode_params} and v4l2_h264_{weight_factors,dpb_entry,reference}
verbatim) so the duplicated copies in include/h264-ctrls.h trigger
redefinition errors as soon as any source file pulls in
<linux/videodev2.h> (which transitively pulls in <linux/v4l2-controls.h>).

Same fix as hevc-ctrls.h in commit 4ccbfe9: replace the bundled struct
definitions with a passthrough to <linux/v4l2-controls.h>. The CID
prefix changed during upstreaming from V4L2_CID_MPEG_VIDEO_H264_* to
V4L2_CID_STATELESS_H264_*; provide compatibility aliases so h264.c keeps
compiling. V4L2_PIX_FMT_H264_SLICE is already in <linux/videodev2.h>
(same fourcc value, same name we renamed to in commit c1f5108) so no
need to redefine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:49:26 +00:00
test0r 53999cd154 The real multiplanar wedge: probe MPLANE in surface.c, add NV12 mplane
This is what was actually broken — and it's much smaller than the STUDY
guessed.

The library is in fact multiplanar-aware throughout the v4l2.c helpers
(v4l2_setup_format, v4l2_get_format, v4l2_query_buffer, v4l2_queue_buffer,
v4l2_dequeue_buffer all branch on v4l2_type_is_mplane). The whole
"context.c / picture.c are single-plane" hypothesis from STUDY.md was wrong —
they derive output_type and capture_type from
video_format->v4l2_mplane and pass them through.

The bug is upstream of all that: driver_data->video_format is set in
RequestCreateSurfaces only, and the probe there only tries
V4L2_BUF_TYPE_VIDEO_CAPTURE (single-plane). On hantro the single-plane
ENUM_FMT returns nothing, video_format stays NULL, and every subsequent
operation hits the `if (video_format == NULL) return OPERATION_FAILED`
guard at the top of context.c / buffer.c / image.c. That's exactly what
Brave's vaCreateContext failure was — not a downstream multiplanar fault,
just the format-detection short-circuit firing.

Three small changes:

- src/video.c: add an NV12 multi-plane entry next to the existing
  single-plane NV12 / Sunxi entries. Same pixelformat fourcc (S264 vs NV12
  has nothing to do with mplane), distinguished by the v4l2_mplane bit.
- src/video.h + src/video.c: video_format_find() takes a `bool mplane`
  parameter and matches both fields. Without this the single-plane and
  multi-plane NV12 entries collide on pixelformat.
- src/surface.c: the probe block now tries V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE
  as a fallback after the single-plane probes return nothing. On a
  single-plane decoder the mplane probe is a no-op; on hantro it picks
  the new mplane NV12 entry, video_format gets set, and the rest of the
  library — already mplane-aware — does the right thing for OUTPUT
  S_FMT, REQBUFS, EXPBUF, QBUF, DQBUF.

Also: src/context.c: the H264 case still referenced V4L2_PIX_FMT_H264_SLICE_RAW
that we renamed in commit c1f5108. Caught now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:47:30 +00:00
test0r 294bdc24f6 STUDY.md: port plan + reference implementations + test fixtures
Cold-start dossier for the multiplanar port: goal, why-this-fork-exists,
state-today, port plan (v4l2.c / context.c / picture.c), reference impls
to read side-by-side (FFmpeg libavcodec/v4l2_request*, GStreamer
gst-plugins-bad/sys/v4l2codecs, Chromium media/gpu/v4l2), test fixtures
(ohm + bbb_1080p30_h264.mp4 + GStreamer ceiling at 6% CPU), out-of-scope
(HEVC/VP9/AV1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:30:54 +00:00
test0r 2f54a8d9e8 src/config.c: probe both single- and multi-plane V4L2 buffer types
Bootlin's original probe assumed single-plane decoders (the sunxi-cedrus
target the library was originally written for). Rockchip's hantro VPU
on RK3566/RK3588 reports V4L2_CAP_VIDEO_M2M_MPLANE only — the single-
plane VIDIOC_ENUM_FMT path returns nothing, so vainfo prints an empty
"Supported profile and entrypoints" list and downstream apps fall back
to software.

Add an OUTPUT_MPLANE fallback to all three format probes (MPEG-2, H.264,
HEVC). On a single-plane decoder the second probe is a no-op; on a
multi-plane decoder it's what makes the profile list non-empty.

This is the *probe* fix only — it gets vainfo to enumerate profiles and
gets Brave's GPU process to attempt vaCreateContext. The rest of the
multiplanar port (S_FMT / REQBUFS / EXPBUF / QBUF in context.c, picture.c,
v4l2.c) is the next phase; vaCreateContext currently fails because
those paths are still single-plane only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:27:46 +00:00
test0r 4ccbfe923f Strip HEVC build path
Three changes that together make the build compile cleanly against a
current linux-api-headers (>= 5.x post-HEVC-UAPI-rename):

- src/meson.build: comment h265.c + h265.h out of sources/headers.
- include/hevc-ctrls.h: replace bundled HEVC structs with a single
  #include <linux/v4l2-controls.h>. The bundled definitions were
  identical to what later landed in mainline as V4L2_CID_STATELESS_HEVC_*
  (renamed) and v4l2_ctrl_hevc_* (kept the field names but moved into
  the kernel public header). Keeping a duplicate copy now triggers
  redefinition errors. The header is kept as a passthrough rather than
  deleted so any downstream patch that says #include <hevc-ctrls.h>
  still compiles.
- src/picture.c: drop the four HEVC case blocks. Three of them were in
  switches that already had `default: break`, so removing them is
  functionally a no-op. The fourth was the only external reference to
  h265_set_controls — removing it lets the library link cleanly with
  h265.c excluded.

Why this is OK rather than the more ambitious "fix HEVC properly":
RK3566 has no HW HEVC at all (the only decoder block is the Hantro G1
which speaks H.264 / MPEG-2 / VP8). HEVC can come back as a separate
effort once we're on RK3588 silicon AND the library is updated to the
renamed kernel CIDs. For Fourier's first port milestone (H.264 multi-
plane on RK3566 hantro) HEVC is dead weight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:26:28 +00:00
test0r 1b02c9b476 src/h264.c: include utils.h for request_log() prototype
GCC's -Wimplicit-function-declaration is fatal in current toolchains
(GCC 14+) but was a warning when this code was written. h264.c uses
request_log() in two places without including utils.h; the build was
quietly relying on an implicit declaration that recent compilers reject.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:16:35 +00:00
test0r c1f5108ac3 include/h264-ctrls.h, src/config.c: rename to V4L2_PIX_FMT_H264_SLICE
Kernel mainline renamed V4L2_PIX_FMT_H264_SLICE_RAW → V4L2_PIX_FMT_H264_SLICE
some time after this library went dormant (the 'S264' fourcc value is
unchanged; only the C identifier moved). The two definitions have the same
value so this is purely cosmetic at runtime, but keeping the name aligned
with linux/videodev2.h matches downstream patches (e.g. bootlin PR #38)
and avoids confusion when reading kernel + library side-by-side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:13:11 +00:00
74 changed files with 1086 additions and 22577 deletions
+56 -262
View File
@@ -1,281 +1,75 @@
# libva-v4l2-request-fourier
# v4l2-request libVA Backend
VA-API ICD backend for V4L2 stateless video decoders. Fourier-campaign
fork of the dormant `bootlin/libva-v4l2-request` upstream.
## About
> **TL;DR for "I want hardware-accelerated YouTube in Firefox on my
> Rockchip board":** skip to the [§ Quickstart](#quickstart) below.
> Fresnel (RK3399) and ampere (RK3588) are validated targets; ohm
> (RK3566 PineTab2) is the chromium-fourier validation rig.
This libVA backend is designed to work with the Linux Video4Linux2
Request API that is used by a number of video codecs drivers,
including the Video Engine found in most Allwinner SoCs.
## What works
## Status
| SoC / host | HW-accelerated codecs | Bit-exact vs `kdirect` |
|---|---|---|
| RK3399 (fresnel — Pinebook Pro) | H.264, HEVC Main, VP9 Profile 0, VP8, MPEG-2 | 5/5 at iter38; preserved through iter40b |
| RK3588 (ampere) | H.264 + HEVC (iter1+iter2 ampere-fourier); **mainline rkvdec / VDPU381 + VDPU383 landed February 2026** — VP9 / AV1 verification next | iter1 H.264 PASS; remaining codecs gated on mainline-driver bring-up |
| RK3568 / RK3566 (ohm — PineTab2) | H.264, MPEG-2, VP8 via hantro multi-planar | iter1-5 baseline (libva-multiplanar campaign) |
| BCM2712 (higgs — Pi 5 / CM5) | — | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved, [see § Pi 5 standoff](#the-pi-5-standoff) |
`kdirect` is the reference: `ffmpeg -hwaccel v4l2request
-hwaccel_output_format drm_prime ...` via Kwiboo's downstream ffmpeg
patches (packaged here as **`ffmpeg-v4l2-request-fourier`**, FFmpeg 8.1
tip @ Kwiboo `v4l2-request-n8.1` commit `b57fbbe`).
## Quickstart
### What you need for HW-accelerated YouTube in Firefox
The full stack, top to bottom, with the package this campaign provides
at each layer:
| Layer | Package(s) | Notes |
|---|---|---|
| Linux kernel with V4L2 stateless decoders | `linux-fresnel-fourier` (RK3399), `linux-ampere-fourier` (RK3588) | Mainline rkvdec / hantro / VDPU381 / VDPU383. ohm typically rides on a Beryllium OS host kernel. |
| `ffmpeg` with Kwiboo's v4l2-request hwaccel | `ffmpeg-v4l2-request-fourier` | Provides `-hwaccel drm -c:v hevc` (and h264/vp9) routes via libavcodec hwdevice DRM. |
| `libva` VA-API runtime + this backend ICD | `libva` (stock) + **`libva-v4l2-request-fourier`** | This repo. Auto-detects rkvdec / hantro / cedrus on probe. |
| Firefox patched to call libavcodec stateless | `firefox-fourier` | 5-patch series, ~+169 LoC over stock Firefox. Validated on fresnel: **~5 % CPU at 1080p30 H.264** (vs 64 % software). |
| (Wayland alt) Chromium patched for V4L2VDA | `chromium-fourier` + `kwin-fourier` | Validated on ohm under KDE Plasma 6.6.5 Wayland. Needs `kwin-fourier` for the dmabuf-fence latency fix. |
| (Optional) panfrost / panthor GPU stack | `vulkan-panfrost` | Wayland compositor + 3D. |
The actual VA-API path is mostly historical inside this campaign — the
**user-facing browser HW decode story rides libavcodec's
`v4l2_request` hwaccel directly**, not VAAPI-via-libva. Firefox-fourier
attaches an `AV_HWDEVICE_TYPE_DRM` context to libavcodec's generic
`h264`/`hevc`/`vp9` decoder; libavcodec then auto-binds the
`v4l2_request` hwaccel from its `hw_configs`. No `LIBVA_DRIVER_NAME`
incantation needed for browser use. libva-v4l2-request-fourier matters
for mpv, ffmpeg-as-vaapi, and other VA-API direct consumers.
### Install on Arch ALARM (fresnel / ampere / ohm)
Add the marfrit repo if you haven't already:
```ini
# /etc/pacman.conf
[marfrit]
SigLevel = Required
Server = https://packages.reauktion.de/arch/$arch
```
Import the signing key (one-time):
```bash
sudo pacman-key --recv-keys <KEY-ID> # see https://packages.reauktion.de
sudo pacman-key --lsign-key <KEY-ID>
sudo pacman -Sy
```
Then per host:
```bash
# Fresnel — RK3399 Pinebook Pro
sudo pacman -S \
linux-fresnel-fourier linux-fresnel-fourier-headers \
ffmpeg-v4l2-request-fourier \
libva-v4l2-request-fourier \
firefox-fourier
# Ampere — RK3588
sudo pacman -S \
linux-ampere-fourier linux-ampere-fourier-headers \
ffmpeg-v4l2-request-fourier \
libva-v4l2-request-fourier \
firefox-fourier
# Ohm — RK3566 PineTab2 (chromium-fourier validated path)
sudo pacman -S \
ffmpeg-v4l2-request-fourier \
libva-v4l2-request-fourier \
kwin-fourier
# chromium-fourier currently still a local build — see § Status
```
Reboot if a new kernel landed. Then:
```bash
# Smoke-test: vainfo should list HEVCMain + H264 entries
LIBVA_DRIVER_NAME=v4l2_request vainfo
# Browser launch with verbose decoder logging
MOZ_LOG="PlatformDecoderModule:5,FFmpegVideo:5" \
firefox-fourier 2>&1 | tee /tmp/fx.log
# Then open a YouTube 1080p H.264 video and grep for:
# "Choosing FFmpeg pixel format for V4L2 video decoding"
# "av_hwdevice_ctx_create(DRM, /dev/dri/renderD128) ok"
# If you DON'T see those: HW path didn't engage, fell back to software.
```
### Status of the published vs locally-built packages
As of May 2026, the live marfrit repo at
<https://packages.reauktion.de/arch/aarch64/> has:
-`libva-v4l2-request-fourier-1:1.0.0.r361.cf8cd9d-1` (iter40b tip)
-`ffmpeg-v4l2-request-fourier-2:8.1.r123329.b57fbbe-3` (Kwiboo's
v4l2-request-n8.1 + libudev-bypass; smoke-tested on fresnel —
HEVC via `-hwaccel v4l2request` PASS)
-`firefox-fourier-150.0.1-16` (5-patch series, sandboxed RDD HW
decode validated on RK3399: ~5 % CPU at 1080p30 H.264)
-`linux-fresnel-fourier-7.0-14` + headers (RK3399)
-`linux-ampere-fourier-7.0rc3.kafr1-1` + headers (RK3588)
-`kwin-fourier-1:6.6.5-1` (Wayland dmabuf-fence fix for chromium-fourier)
-`vulkan-panfrost-1:26.0.5-1` (GPU stack)
NOT yet published but **present in `marfrit-packages/arch/` source
tree** (build + publish pending):
-`chromium-fourier` (Chromium 147 + V4L2VDA-on-mainline patches —
blocked on Arch ALARM bumping clang 22 → 23).
-`qt6-base-fourier` (GL_ALPHA → GL_R8 fix — needed by KDE Plasma
Wayland on the panfrost stack).
If you need those locally before they ship:
```bash
git clone ssh://git@git.reauktion.de:2222/marfrit/marfrit-packages.git
cd marfrit-packages/arch/<package>
makepkg -si
```
## What does NOT work, and why it's stalled
| Target | Status | Blocker |
|---|---|---|
| H264 Hi10P on RK3399 | enumerated, decode returns all-zero | RK3399 silicon doesn't implement 10-bit despite kernel advertising the profile (iter39 close, Option B applied) |
| HEVC Main10 on RK3399 | not enumerated | same as Hi10P |
| **Pi 5 / CM5 (BCM2712 / `rpi-hevc-dec`)** | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved | see "The Pi 5 standoff" below |
## What does NOT work, and why it's stalled
| Target | Status | Blocker |
|---|---|---|
| H264 Hi10P on RK3399 | enumerated, decode returns all-zero | RK3399 silicon doesn't implement 10-bit despite kernel advertising the profile (iter39 close, Option B applied) |
| HEVC Main10 on RK3399 | not enumerated | same as Hi10P |
| **Pi 5 / CM5 (BCM2712 / `rpi-hevc-dec`)** | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved | see "The Pi 5 standoff" below |
### The Pi 5 standoff
iter40 + iter40b add a third multi-device-probe slot for
`rpi-hevc-dec`, an NC12 SAND128 detile primitive, per-driver gates
around the SPS pre-seed + start-code-prepend + scaling_matrix submission,
and a (fragile, fixture-specific) SPS field override using the
GStreamer 1.28.2 H.265 parser. ICD discovery works, `vainfo` lists
`VAProfileHEVCMain`, S\_FMT / REQBUFS / STREAMON all succeed.
**Decode itself never succeeds** — every CAPTURE DQBUF returns
`V4L2_BUF_FLAG_ERROR`. Driver author John Cox confirmed strict SPS
validation is intentional ("`try_ext_ctrls returned an error (22)` is
expected as it is validating the SPS"), and VAAPI's
`VAPictureParameterBufferHEVC` simply doesn't carry the bitstream-true
scalars (`sps_max_num_reorder_pics`, `sps_max_latency_increase_plus1`,
slice-level `num_entry_point_offsets`) that the driver wants. We can't
fish the SPS out of `source_data` either, because ffmpeg-vaapi parses
the SPS itself and passes only slice NAL bytes to libva backends.
This is not a bug in our backend, in libva, in ffmpeg, or in the kernel
driver. It's an ecosystem coordination failure of long standing:
- **Kwiboo's `ffmpeg-v4l2request` hwaccel** has been in production via
LibreELEC since December 2018. Re-submitted to ffmpeg-devel as a v2
series in August 2024. Still un-merged in May 2026 — **eight years
in the upstream review queue**.
- **`libva-v4l2-request`** (this project's upstream) hasn't taken
meaningful commits since ~2021. Nobody wants to own the impedance
mismatch between VAAPI's Intel-shaped "give me raw bitstream, I'll
parse" and V4L2 stateless's kernel-shaped "give me parsed structs,
I'll just drive the HW."
- **`rpi-hevc-dec` mainline submission** is at v4 (July 2025), 17
months in review. The Pi 6.18.x downstream kernel meanwhile has
active HEVC regressions ([raspberrypi/linux#7228](https://github.com/raspberrypi/linux/issues/7228),
[#7306](https://github.com/raspberrypi/linux/issues/7306)) that
aren't being fast-tracked because "the new uAPI is coming."
- **Mozilla is implementing Pi 5 HEVC via ffmpeg's hwaccel-context
path** (bug [1969297](https://bugzilla.mozilla.org/show_bug.cgi?id=1969297)),
not via libva — explicit acknowledgement from David Turner that
libavcodec needs to retain the SPS context for the strict driver to
accept the control batch.
What end-users actually do today: run Pi OS (downstream-patched ffmpeg
+ downstream kernel) or LibreELEC (Kwiboo's patches + downstream
kernel). Anyone on a stock distro outside those two: no HW HEVC on
Pi 5.
Nobody who has authority to merge has skin in the game. Everyone with
skin in the game lacks authority. Result: 8-year stalemate, three
forks of working code, no merged upstream.
### What this means for this backend
We chose to extend `libva-v4l2-request` into Pi 5 territory because
the architecture maps cleanly onto the existing iter38 multi-device
probe. That work landed (iter40 commit `3ffa9d0`, iter40b commit
`071b08d`). It's reusable infrastructure for any future strict V4L2
stateless decoder that ffmpeg ships before libva does.
But the *user-facing* Pi 5 HEVC story will not come from this
backend. The backend was a clean architectural target inside a
coordination dead-end. The actual Pi 5 HEVC path through libva
requires either:
- a VAAPI extension exposing the SPS scalars rpi-hevc-dec validates
against (Intel-driven; no Pi-aligned principal),
- a libva-internal `VABufferType` for raw SPS/PPS NAL bytes (no
maintainer),
- ffmpeg-vaapi forwarding `num_entry_point_offsets` to backends
(small upstream patch; no champion), OR
- the political situation around Kwiboo's series unblocks (no
visible movement).
iter40 + iter40b are **landed but parked**. The fresnel + ampere
sibling paths are unaffected (5/5 fresnel + 9 profiles ampere
verified post-iter40b, no regression). Phase 8 packaging is
deliberately skipped — shipping a `.deb` whose primary advertised
target (Pi 5) doesn't actually decode would mislead users.
See `phase0_pi5_hevc.md`, `phase1_pi5_hevc.md`,
`phase5_pi5_hevc_review.md`, `phase7_pi5_hevc_close.md` for the
chapter's full empirical record.
The v4l2-request libVA backend currently supports the following formats:
* MPEG2 (Simple and Main profiles)
* H264 (Baseline, Main and High profiles)
* H265 (Main profile)
## Instructions
In order to use this backend, set the `LIBVA_DRIVER_NAME` environment
variable:
In order to use this libVA backend, the `v4l2_request` driver has to
be specified through the `LIBVA_DRIVER_NAME` environment variable, as
such:
export LIBVA_DRIVER_NAME=v4l2_request
Then a VA-API-capable player can decode supported codecs on a probed
device:
A media player that supports VAAPI (such as VLC) can then be used to decode a
video in a supported format:
vlc path/to/video.mp4
mpv --hwdec=vaapi path/to/video.mp4
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -f null -
vlc path/to/video.mpg
The backend auto-detects available decoders via the V4L2 media
topology walk; honors `LIBVA_V4L2_REQUEST_VIDEO_PATH` and
`LIBVA_V4L2_REQUEST_MEDIA_PATH` for explicit device selection.
Sample media files can be obtained from:
http://samplemedia.linaro.org/MPEG2/
http://samplemedia.linaro.org/MPEG4/SVT/
## Technical Notes
### Multi-device probe (iter38)
### Surface
A single libva session opens both `rkvdec` and `hantro-vpu` (and, on
hosts where it's present, `rpi-hevc-dec`) at init. `RequestCreateConfig`
re-targets the active fd per profile via
`request_switch_device_for_profile()`. Pool teardown happens at
switch time; the next `CreateContext` rebuilds against the right
device.
A Surface is an internal data structure never handled by the VA's user
containing the output of a rendering. Usualy, a bunch of surfaces are created
at the begining of decoding and they are then used alternatively. When
created, a surface is assigned a corresponding v4l capture buffer and it is
kept until the end of decoding. Syncing a surface waits for the v4l buffer to
be available and then dequeue it.
### Surface / Context / Picture / Image
Note: since a Surface is kept private from the VA's user, it can ask to
directly render a Surface on screen in an X Drawable. Some kind of
implementation is available in PutSurface but this is only for development
purpose.
A Surface is an internal data structure containing rendering output.
A Context owns the V4L2 lifecycle (S\_FMT, CAPTURE pool, ctrl-batch
defaults) for one decode session. A Picture is one encoded input
frame's set of buffers. An Image is a Standard VA pixel-format view
on a decoded Surface — the backend detiles SAND/COL128 or unpacks
NV15 to NV12/P010 here so consumers see linear pitches.
### Context
The real rendering is in `EndPicture`, not `RenderPicture`, because
the kernel needs the full extended-control batch when the OUTPUT
buffer is queued, and `RenderPicture` order is consumer-defined.
A Context is a global data structure used for rendering a video of a certain
format. When a context is created, input buffers are created and v4l's output
(which is the compressed data input queue, since capture is the real output)
format is set.
### Picture
A Picture is an encoded input frame made of several buffers. A single input
can contain slice data, headers and IQ matrix. Each Picture is assigned a
request ID when created and each corresponding buffer might be turned into a
v4l buffers or extended control when rendered. Finally they are submitted to
kernel space when reaching EndPicture.
The real rendering is done in EndPicture instead of RenderPicture
because the v4l2 driver expects to have the full corresponding
extended control when a buffer is queued and we don't know in which
order the different RenderPicture will be called.
### Image
An Image is a standard data structure containing rendered frames in a usable
pixel format. Here we only use NV12 buffers which are converted from sunxi's
proprietary tiled pixel format with tiled_yuv when deriving an Image from a
Surface.
-5
View File
@@ -195,11 +195,6 @@ extern "C" {
#define DRM_FORMAT_NV24 fourcc_code('N', 'V', '2', '4') /* non-subsampled Cr:Cb plane */
#define DRM_FORMAT_NV42 fourcc_code('N', 'V', '4', '2') /* non-subsampled Cb:Cr plane */
/* iter39: NV15 is 4×10-bit packed in 5 bytes (Rockchip rkvdec 10-bit output). */
#ifndef DRM_FORMAT_NV15
#define DRM_FORMAT_NV15 fourcc_code('N', 'V', '1', '5') /* 2x2 subsampled Cr:Cb plane 10 bits per channel packed */
#endif
/*
* 3 plane YCbCr
* index 0: Y plane, [7:0] Y
+46
View File
@@ -0,0 +1,46 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Fourier-local: H.264 stateless controls and structs were upstreamed to
* the kernel some time after this library went dormant. Defer all of:
*
* struct v4l2_ctrl_h264_sps / pps / scaling_matrix / pred_weights /
* slice_params / decode_params
* struct v4l2_h264_weight_factors / dpb_entry / reference
* enum v4l2_*_h264_*
*
* to <linux/v4l2-controls.h>, where they now live with the same field
* names. The duplicated definitions in the original bootlin
* include/h264-ctrls.h conflict at compile time on any current kernel.
*
* The CID prefix was also renamed from V4L2_CID_MPEG_VIDEO_H264_* to
* V4L2_CID_STATELESS_H264_*. Provide compatibility aliases so h264.c
* keeps compiling without an invasive sed across that file.
*
* V4L2_PIX_FMT_H264_SLICE itself lives in <linux/videodev2.h> on current
* kernels, so no need to redefine it here.
*/
#ifndef _H264_CTRLS_H_
#define _H264_CTRLS_H_
#include <linux/videodev2.h>
#include <linux/v4l2-controls.h>
#ifndef V4L2_CID_MPEG_VIDEO_H264_SPS
#define V4L2_CID_MPEG_VIDEO_H264_SPS \
V4L2_CID_STATELESS_H264_SPS
#define V4L2_CID_MPEG_VIDEO_H264_PPS \
V4L2_CID_STATELESS_H264_PPS
#define V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX \
V4L2_CID_STATELESS_H264_SCALING_MATRIX
#define V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS \
V4L2_CID_STATELESS_H264_SLICE_PARAMS
#define V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS \
V4L2_CID_STATELESS_H264_DECODE_PARAMS
#define V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE \
V4L2_CID_STATELESS_H264_DECODE_MODE
#define V4L2_CID_MPEG_VIDEO_H264_START_CODE \
V4L2_CID_STATELESS_H264_START_CODE
#endif
#endif /* _H264_CTRLS_H_ */
+16 -6
View File
@@ -1,9 +1,19 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Fourier-local override: HEVC controls are upstream since linux-media
* 6.6+, so defer to the kernel's linux/v4l2-controls.h instead of
* duplicating the struct definitions (duplication causes redefinition
* errors on newer linux-api-headers). */
#ifndef _LIBVA_V4L2_REQUEST_HEVC_CTRLS_H
#define _LIBVA_V4L2_REQUEST_HEVC_CTRLS_H
/*
* Fourier-local: HEVC support stripped from this build. The bundled
* HEVC controls structs and CIDs (V4L2_CID_MPEG_VIDEO_HEVC_*) were
* upstreamed and renamed to V4L2_CID_STATELESS_HEVC_* in mainline
* linux/v4l2-controls.h, making the original duplicated definitions
* conflict on any current kernel. RK3566 has no HW HEVC anyway, so the
* port is starting from H.264 + MPEG-2 only; HEVC can come back as a
* separate effort once a) the host has HEVC silicon (RK3588 hantro/
* VDPU381) and b) the library is updated to the renamed CIDs.
*
* h265.c and h265.h are excluded from src/meson.build; this header is
* left as a placeholder so any stray '#include <hevc-ctrls.h>' from a
* downstream patch keeps compiling.
*/
#ifndef _HEVC_CTRLS_H_
#define _HEVC_CTRLS_H_
#include <linux/v4l2-controls.h>
#endif
+82
View File
@@ -0,0 +1,82 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* These are the MPEG2 state controls for use with stateless MPEG-2
* codec drivers.
*
* It turns out that these structs are not stable yet and will undergo
* more changes. So keep them private until they are stable and ready to
* become part of the official public API.
*/
#ifndef _MPEG2_CTRLS_H_
#define _MPEG2_CTRLS_H_
#define V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS (V4L2_CID_MPEG_BASE+250)
#define V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION (V4L2_CID_MPEG_BASE+251)
/* enum v4l2_ctrl_type type values */
#define V4L2_CTRL_TYPE_MPEG2_SLICE_PARAMS 0x0103
#define V4L2_CTRL_TYPE_MPEG2_QUANTIZATION 0x0104
#define V4L2_MPEG2_PICTURE_CODING_TYPE_I 1
#define V4L2_MPEG2_PICTURE_CODING_TYPE_P 2
#define V4L2_MPEG2_PICTURE_CODING_TYPE_B 3
#define V4L2_MPEG2_PICTURE_CODING_TYPE_D 4
struct v4l2_mpeg2_sequence {
/* ISO/IEC 13818-2, ITU-T Rec. H.262: Sequence header */
__u16 horizontal_size;
__u16 vertical_size;
__u32 vbv_buffer_size;
/* ISO/IEC 13818-2, ITU-T Rec. H.262: Sequence extension */
__u16 profile_and_level_indication;
__u8 progressive_sequence;
__u8 chroma_format;
};
struct v4l2_mpeg2_picture {
/* ISO/IEC 13818-2, ITU-T Rec. H.262: Picture header */
__u8 picture_coding_type;
/* ISO/IEC 13818-2, ITU-T Rec. H.262: Picture coding extension */
__u8 f_code[2][2];
__u8 intra_dc_precision;
__u8 picture_structure;
__u8 top_field_first;
__u8 frame_pred_frame_dct;
__u8 concealment_motion_vectors;
__u8 q_scale_type;
__u8 intra_vlc_format;
__u8 alternate_scan;
__u8 repeat_first_field;
__u16 progressive_frame;
};
struct v4l2_ctrl_mpeg2_slice_params {
__u32 bit_size;
__u32 data_bit_offset;
__u64 backward_ref_ts;
__u64 forward_ref_ts;
struct v4l2_mpeg2_sequence sequence;
struct v4l2_mpeg2_picture picture;
/* ISO/IEC 13818-2, ITU-T Rec. H.262: Slice */
__u32 quantiser_scale_code;
};
struct v4l2_ctrl_mpeg2_quantization {
/* ISO/IEC 13818-2, ITU-T Rec. H.262: Quant matrix extension */
__u8 load_intra_quantiser_matrix;
__u8 load_non_intra_quantiser_matrix;
__u8 load_chroma_intra_quantiser_matrix;
__u8 load_chroma_non_intra_quantiser_matrix;
__u8 intra_quantiser_matrix[64];
__u8 non_intra_quantiser_matrix[64];
__u8 chroma_intra_quantiser_matrix[64];
__u8 chroma_non_intra_quantiser_matrix[64];
};
#endif
-11
View File
@@ -4,14 +4,3 @@ option(
value : '',
description: 'Path to sanitized Linux Kernel headers'
)
option(
'daedalus_v4l2',
type : 'boolean',
value : true,
description: 'Enable probe + dispatch for the out-of-tree daedalus_v4l2 ' +
'stateless decoder shim (Pi 5 / CM5 daemon-backed VP9/AV1/H264). ' +
'Default true; disable on platforms where the daedalus_v4l2 ' +
'kernel module will never be present to slim the probe array.'
)
-298
View File
@@ -1,298 +0,0 @@
# Phase 0 — Pi 5 / CM5 HEVC chapter
Opened 2026-05-17 evening, after the failed `libva-v4l2-stateful-fourier`
scaffold attempt. Brother-session empirical Phase 0 on higgs invalidated
the stateful premise: rpi-hevc-dec is V4L2 **stateless**, so Pi 5 HEVC
belongs in this backend, not a separate sibling.
No code in this chapter yet. This doc is the substrate. Phase 1 picks up
from the "Open questions" section.
## Substrate
### Target host
higgs — Pi CM5 module on Pi CM5 IO board. BCM2712 SoC. VPN-only, often
offline; wake via HIS skill recipe (no Fritz!Box plug — runs on power
when on). Debian-based. Sole HW video decoder is rpi-hevc-dec at
`/dev/video19` + `/dev/media1`.
### Backend baseline at chapter open
`libva-v4l2-request-fourier` master tip `cf8cd9d` (iter39 + Option B +
h265 ref-list cap fix). Multi-device probe (iter38) already opens
rkvdec + hantro slots; adding a third decoder slot for rpi-hevc-dec is
a natural extension of that architecture.
iter2 (ampere VDPU381 HEVC EXT_SPS) added the GStreamer 1.28.2 H.265
parser vendor + EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission. That
plumbing is probe-gated (`has_hevc_ext_sps_rps_rkvdec`), so it stays
dormant on hosts where the controls don't exist.
### Empirical higgs probe (brother session)
`v4l2-ctl -d /dev/video19 --list-formats-ext --list-ctrls`:
```
Stateless Codec Controls
hevc_sequence_parameter_set (compound, V4L2_CID_STATELESS_HEVC_SPS)
hevc_picture_parameter_set (compound, V4L2_CID_STATELESS_HEVC_PPS)
slice_param_array (compound dynamic-array dims=[4096])
hevc_scaling_matrix (compound)
hevc_decode_parameters (compound)
hevc_decode_mode (menu, "Frame-Based")
hevc_start_code (menu, default "No Start Code")
OUTPUT formats:
S265 V4L2_PIX_FMT_HEVC_SLICE (parsed slice payload)
CAPTURE formats:
NC12 V4L2_PIX_FMT_NV12_COL128 (8-bit SAND 128-column tiled)
NC30 V4L2_PIX_FMT_NV12_10_COL128 (10-bit SAND 128-column tiled)
```
Conclusion: this is the standard `V4L2_CID_STATELESS_HEVC_*` control set
exposed under the V4L2-request uAPI, exactly the same family our backend
already drives for rkvdec/hantro/cedrus HEVC paths. The novel parts are
two pixel formats (NC12, NC30) and one driver-id (rpi-hevc-dec).
## What carries forward unchanged
- VAAPI HEVC profile enumeration (`config.c`)
- `h265_set_controls` core path (`h265.c`) — same compound ctrl set
- Synthetic SPS pre-seed pattern (iter25/26) — already runs pre-CAPTURE-alloc
- Multi-device dispatch in `RequestCreateConfig` (iter38)
- VAAPI slice / picture / IQ matrix buffer parsing
- HEVC h264-style start-code policy (we already DON'T prepend for HEVC)
## What needs adding
| Item | Location | Sizing |
|------|----------|--------|
| `RPI_HEVC_DEC` enum in `driver_kind_t` | `request.h` | trivial |
| Multi-device probe extends to `/dev/video19` discovery | `context.c` / `request.c` init | small — mirror hantro slot |
| `V4L2_PIX_FMT_NV12_COL128` (NC12) `video_format` entry | `video.c` | small |
| `V4L2_PIX_FMT_NV12_10_COL128` (NC30) `video_format` entry | `video.c` | small |
| NC12 → NV12 detile primitive | new `nv12_col128.c` | mid — column tile layout, see kernel docs |
| NC30 → P010 detile primitive | new `nv12_col128.c` | mid — 10-bit variant of above |
| `copy_surface_to_image` branch for NC12/NC30 | `image.c` | small (mirror NV15→P010 gating) |
| Per-driver gating for any rpi-specific quirks discovered | various | per [[per-driver-kludge-gating]] |
## Open questions for Phase 1
Lock these before Phase 1 commits to a goal.
1. **EXT_SPS controls on rpi-hevc-dec?** Brother's `--list-ctrls` output
above shows the standard `V4L2_CID_STATELESS_HEVC_*` family — NOT the
`EXT_SPS_ST_RPS` / `EXT_SPS_LT_RPS` extensions that VDPU381 needs.
Verify: does `slice_param_array[4096]` accept `st_rps_bits` /
`lt_rps_bits` in the per-slice payload, or does rpi-hevc-dec parse RPS
itself from the slice header? If the latter, the iter2 EXT_SPS path
stays dormant (probe-gated already), and rpi-hevc-dec just needs the
`picture->st_rps_bits``slice_params->short_term_ref_pic_set_size`
plumbing that iter31 α-29 already wired. Expectation: works out of the
box. Confirm before assuming.
2. **`hevc_start_code` ctrl: "No Start Code" vs Annex B?** Brother saw
default `"No Start Code"` — matches our behavior (we don't prepend on
HEVC). But the ctrl is configurable. Verify the menu values exposed
and confirm "No Start Code" passes our raw slice-NAL payload as-is.
If it doesn't, set the ctrl explicitly per [[unconditional-codec-state]]
gating.
3. **NC12 / NC30 SAND tile layout — exact spec.** Read
`Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst` for the
COL128 variants. Confirm: column stride = 128 bytes (Y) / 128 bytes
(UV interleaved). Row count = `ALIGN(height, 16)` or `ALIGN(height, 8)`?
Get the exact alignment and tile-traversal order before writing the
detile primitive. Cite from kernel doc, NOT inferred from a hex dump.
4. **drm_prime / SAND modifier round-trip.** Does ffmpeg-vaapi (and
Firefox) accept the NC12 buffer via DRM_PRIME export carrying the
DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier, allowing
zero-copy to a SAND-aware compositor? Or is libva-side detile to a
linear NV12 buffer the only viable Firefox path? If detile is
required for the consumer, the [[rockchip-pixel-verify-path]] rule
(DMA-BUF GL preferred over cached mmap) might NOT apply since SAND
is Pi-specific and not in the wider Wayland modifier ecosystem.
5. **rpi-hevc-dec quirks on first SPS submission.** rkvdec needs
image_fmt pre-seed before CAPTURE alloc (iter25). Does rpi-hevc-dec
have an analogous "must set OUTPUT pix_fmt + SPS before CAPTURE"
ordering? Verify with strace early.
6. **higgs OS + libva versioning.** Brother probed on Debian. We package
for Arch ALARM. What's the install path on higgs — Arch / Debian /
Raspberry Pi OS? If Debian, the package needs a `debian/` tree, not
just PKGBUILD. Decide packaging target before Phase 8.
## Phase 1 goal sketch (NOT locked)
> Firefox HW HEVC playback on higgs at ≥30fps for 1080p Main, byte-exact
> libva-vs-kdirect for ≥3 reference fixtures (8-bit Main and 10-bit Main10).
Two measurable subgoals follow naturally:
- libva (this backend, NV12 image output) == kdirect (ffmpeg-v4l2request,
NV12 image output) byte-exact for the same input.
- Firefox VA-API path engages (verify via `chrome://gpu` equivalent / log
inspection — `MOZ_LOG=PlatformDecoderModule:5`).
## Phase 3 baseline plan
Before any backend code touches rpi-hevc-dec:
- `kdirect` floor: `ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime
-i bbb_720p10s_hevc.mp4 -vf hwdownload,format=nv12 -frames:v 10 ...` and
sha256 the YUV.
- `SW reference`: same ffmpeg without `-hwaccel`, sha256 the YUV.
- Both runs N=3 per [[replicate-baseline-first]].
- Capture `strace -f -e ioctl` of the kdirect run — gives the canonical
ioctl sequence rpi-hevc-dec expects.
## Phase 0 closing
This doc commits the substrate. Phase 1 starts when:
- higgs is up + reachable
- Open questions 1+2 (EXT_SPS + start_code) are answered live, in one
short probe session
- Phase 3 baseline floors are captured
No work blocks the close of iter39 / fresnel campaign — those are shipped.
## Phase 0 close addendum (2026-05-17 evening, higgs probe session)
Empirical probes on higgs answered Q1, Q2, partial Q3, full Q5, full Q6.
Q4 (DRM modifier round-trip) remains open. Phase 0 is closed; Phase 1
opens with what's below.
### Q1 — EXT_SPS controls on rpi-hevc-dec: NOT present
`v4l2-ctl -d /dev/video19 --list-ctrls` confirms ONLY the standard
`V4L2_CID_STATELESS_HEVC_*` set:
- `hevc_sequence_parameter_set` (0x00a40a90)
- `hevc_picture_parameter_set` (0x00a40a91)
- `slice_param_array` (0x00a40a92, dynamic-array dims=[4096])
- `hevc_scaling_matrix` (0x00a40a93)
- `hevc_decode_parameters` (0x00a40a94)
- `hevc_decode_mode` (0x00a40a95, menu min=1 max=1 default=1 = Frame-Based)
- `hevc_start_code` (0x00a40a96, menu min=0 max=1 default=0 = No Start Code)
- 0x00a40a97 returns EINVAL (no EXT_SPS_*_RPS controls)
ioctl trace confirms ffmpeg's `VIDIOC_QUERY_EXT_CTRL` for `0xa97` returns
EINVAL — same probe pattern our backend uses for
`has_hevc_ext_sps_rps_rkvdec`. **The iter2 path stays dormant; the
iter31 α-29 `slice_params->short_term_ref_pic_set_size` plumbing is the
correct one for rpi-hevc-dec.**
### Q2 — hevc_start_code: default 0 (No Start Code), values {0, 1}
Default 0 matches our backend's "don't prepend HEVC start code" stance.
Confirm in Phase 1: rpi-hevc-dec accepts our raw NAL slice payload as-is.
### Q3 — NC12 / NC30 SAND tile layout: PARTIAL
CAPTURE S_FMT result for 1280×720 NC12:
- `sizeimage=1382400` = `1280 × 720 × 1.5` (linear NV12 byte count)
- `bytesperline=1080` (NOT 1280)
The bytesperline=1080 for a 1280-wide CAPTURE buffer is suspect — likely
encodes SAND column count rather than linear stride. Read
`drivers/staging/media/rpivid/` (or wherever NC12_COL128 lives in 6.12)
kernel source + `drm_fourcc.h` / `nv12_col128.rst` (if it exists) for
exact tile layout BEFORE writing the detile primitive. Do NOT infer
layout from this single observation.
### Q4 — DRM modifier round-trip: BLOCKED on hwdownload
ffmpeg `-hwaccel drm -hwaccel_output_format drm_prime -vf
hwmap=mode=read,format=nv12` returns `Failed to map frame: -38`
(`Function not implemented`). hwdownload cannot consume the SAND
modifier directly.
ffmpeg's path that DOES work: `-hwaccel drm -c:v hevc` WITHOUT
`-hwaccel_output_format drm_prime` lets ffmpeg's internal pipeline pull
back, detile (presumably via a Pi-specific helper or libdrm transform),
and present NV12 to the next filter. Bit-exact vs SW for the test
fixture (1280×720 Main 8-bit) — confirms HW engagement.
Phase 1 / Phase 4 will need to decide:
- Detile in the backend (CPU SIMD), exposing NV12 via VAImage; or
- Pass-through DRM_PRIME with SAND modifier and let the consumer
(compositor / Firefox) detile. Firefox almost certainly can't, so
CPU detile is the safe bet.
### Q5 — rpi-hevc-dec submission ordering: empirically locked
`strace -e ioctl` of the kdirect run shows:
1. `MEDIA_IOC_DEVICE_INFO` + `MEDIA_IOC_G_TOPOLOGY` (per media node)
2. `VIDIOC_QUERYCAP` per video node — `driver="rpi-hevc-dec"` identifies
the right one
3. `VIDIOC_ENUM_FMT` OUTPUT → S265 only
4. `VIDIOC_S_FMT` OUTPUT (HEVC_SLICE, placeholder dims)
5. `VIDIOC_REQBUFS` OUTPUT (DMABUF, count=N) — count=6 in kdirect
6. `VIDIOC_S_FMT` CAPTURE (NC12, actual dims from SPS parse)
7. `VIDIOC_CREATE_BUFS` CAPTURE (DMABUF, count=16)
8. `VIDIOC_STREAMON` both queues
9. `VIDIOC_QUERY_EXT_CTRL` enumeration
10. `VIDIOC_S_EXT_CTRLS` (decode_mode + start_code) — global ctrls
11. Per frame: `VIDIOC_S_EXT_CTRLS` (SPS+PPS+decode_params+slice_array,
class=0xf010000 = per-request) + `VIDIOC_QBUF` CAPTURE + `VIDIOC_QBUF`
OUTPUT (with `V4L2_BUF_FLAG_IN_REQUEST | V4L2_BUF_FLAG_REQUEST_FD`) +
`VIDIOC_DQBUF` OUTPUT + `VIDIOC_DQBUF` CAPTURE
**Two structural notes for the backend:**
- OUTPUT + CAPTURE both use `V4L2_MEMORY_DMABUF` in kdirect. Our backend
currently uses MMAP for CAPTURE on rkvdec/hantro. For Pi 5 we should
either follow kdirect (DMABUF, allows zero-copy DRM_PRIME export) or
use MMAP and CPU-detile. Phase 4 design decision.
- The order `S_FMT OUTPUT → REQBUFS OUTPUT → S_FMT CAPTURE → CREATE_BUFS
CAPTURE → STREAMON` differs from our iter25 rkvdec pre-seed pattern
(where SPS via S_EXT_CTRLS must come BEFORE CAPTURE alloc to resolve
the image_fmt). rpi-hevc-dec apparently DOESN'T need that pre-seed —
CAPTURE S_FMT just takes the explicit NC12 + caller's dims. Confirm
in Phase 1 by trying our existing iter25 pre-seed flow against it.
### Q6 — packaging: Debian 13 trixie, NOT Arch
higgs runs Debian 13 trixie (`PRETTY_NAME="Debian GNU/Linux 13 (trixie)"`),
not Arch ALARM. Phase 8 (per the dev-process Phase 8 packaging rule) for
the Pi 5 chapter needs a `debian/` packaging tree, not just a PKGBUILD.
Decide in Phase 1 whether to:
- Add Debian packaging to `marfrit-packages` as a second target, OR
- Use distrobox/podman with an Arch ALARM container on higgs for
install (test-only, not production), OR
- Pi 5 chapter ships a Debian source pkg via gitea / a personal Debian
repo.
### Other new findings from the probe session
- **ffmpeg 7.1.3 from Debian 13 is built with `--enable-v4l2-request`**
— the kdirect path exists. Invocation is `ffmpeg -hwaccel drm -c:v
hevc` (not just `-hwaccel drm`; the explicit codec flag matters for
the negotiation). Engagement log line is
`Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19;
buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8`. Per
[[hw-decode-engagement-check]], grep for that line to confirm HW path
engaged.
- **No libva ICD installed on higgs** — only `armada-drm_dri.so` ships,
which doesn't apply. We'd be the first VA-API HW path for HEVC on Pi
5 once installed.
- **mpv is apt-installable** (`mpv 0.40.0-3+deb13u1`) — useful as a
pixel-readback verifier once the backend works (`mpv --vo=image` or
`--vo=drm`).
- **Firefox 145.0.1 + rpi-firefox-mods 20251016 installed** (firefox-esr
package status was `rc` = removed but config remains). The mods
package likely contains VA-API plumbing prefs.
### What changes for Phase 1
- Goal is now phrasable: HEVC bit-exact libva-vs-kdirect on higgs for
the 1280×720 Main 8-bit test fixture (same generator as
`/tmp/bbb_main.mp4` here). Kdirect engagement signal is the
`Hwaccel V4L2 HEVC stateless V4` log line.
- Most backend code reuses existing rkvdec/hantro HEVC path: ctrls,
per-frame submission, request_fd, multi-device probe pattern.
- New code: NC12 video_format entry + detile primitive (sibling to
`nv15_unpack_plane_to_p010`) + RPI_HEVC_DEC driver_kind.
- Packaging target = Debian, not Arch.
-230
View File
@@ -1,230 +0,0 @@
# Phase 1+2+3+4 — Pi 5 HEVC chapter (iter40)
Per [[feedback_dev_process]], Phase 1 (goal), Phase 2 (situation analysis),
Phase 3 (baselines), Phase 4 (plan) for adding rpi-hevc-dec as a third
multi-device-probe slot in `libva-v4l2-request-fourier`. Phase 0 substrate
+ open-question answers live at `phase0_pi5_hevc.md`.
## Phase 1 — Goal
> **libva-v4l2-request-fourier on higgs** decodes HEVC Main 8-bit input
> producing NV12 output **bit-exact vs kdirect** for three reference
> fixtures (640×360, 1280×720, 1920×1080 — Main profile, libx265
> ultrafast). HW path engagement verified via the kernel-driver lsof
> signal (`/dev/video19` open) AND ffmpeg-vaapi engagement signal
> (`Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19`).
Measurable:
| Criterion | Metric |
|---|---|
| C1 — vainfo enumeration | `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain : VAEntrypointVLD` |
| C2 — bit-exact decode | sha256 of libva NV12 output == sha256 of kdirect NV12 output, per fixture, N=1 |
| C3 — HW engagement | `lsof` shows `/dev/video19` open by ffmpeg-vaapi during libva run |
| C4 — Stability under N=3 | C2 holds at N=3 repeated runs (deterministic) |
| C5 — Sibling baseline preserved | fresnel iter38 5/5 still PASS post-iter40 (no regression to rkvdec/hantro path) |
Out of scope this iter: Main10 (10-bit / NC30), VP9, AV1, Firefox VA-API
engagement testing, performance benchmarks. All later chapters.
## Phase 2 — Situation Analysis
### Backend architecture already in place
- **Multi-device probe (iter38)**: at `VA_DRIVER_INIT` opens both
`rkvdec` + `hantro-vpu` via `find_decoder_device_by_driver(name)`.
Stores per-driver fds (`video_fd_{rkvdec,hantro}`,
`media_fd_{rkvdec,hantro}`). `RequestCreateConfig` retargets the
"active" `driver_data->{video,media}_fd` per profile via
`request_switch_device_for_profile()` (request.c:426-478).
- **Per-driver feature gating**: `request_data->has_hevc_ext_sps_rps_{rkvdec,hantro}`
pair, with `h265_set_controls` consulting the per-fd flag. Established
by iter2 / Phase 5 review (request.h:99-100). This is the canonical
per-driver gating shape for iter40.
- **HEVC ctrl population**: `h265_set_controls` populates the standard
`V4L2_CID_STATELESS_HEVC_*` set (h265.c). Probe-gates EXT_SPS_*_RPS
via the iter2 path — naturally dormant for rpi-hevc-dec since the
controls don't exist.
- **Synthetic SPS pre-seed (iter25/26)**: needed for rkvdec to resolve
`image_fmt` before CAPTURE alloc. Phase 0 strace shows rpi-hevc-dec
does NOT need this — it accepts NC12 + explicit dims on `S_FMT
CAPTURE` directly. The pre-seed code path stays in place for rkvdec;
rpi-hevc-dec just doesn't trigger it (gate on driver_kind).
- **CAPTURE detile primitive**: `nv15_unpack_plane_to_p010()` (nv15.c)
is the template — backend already CPU-detiles when a Pi-or-Rockchip-
specific CAPTURE format meets a linear consumer (VAImage NV12 / P010).
- **Single-plane (S) vs multi-plane (M) handling**: hantro uses MPLANE,
rkvdec uses both depending on codec. rpi-hevc-dec exposes MPLANE for
BOTH OUTPUT (HEVC_SLICE) and CAPTURE (NC12) per the strace. iter38
already supports MPLANE handling for hantro; rpi reuses that.
### Surface area to touch (audit)
| File | What changes | Size |
|------|--------------|------|
| `src/request.h` | Add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`, `has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout) | ~10 lines |
| `src/request.c` | (a) Extend init -1 block to cover new fds. (b) Recognize `rpi-hevc-dec` as a 3rd primary/alt driver string in the probe loop. (c) Extend `request_device_kind_for_profile` so HEVC→'p' when rpi-hevc-dec is present, else 'r'. (d) Extend `request_switch_device_for_profile` 'p' branch. (e) Probe HEVC ext_sps on the new fd (will be false, mirrors hantro entry). | ~80 lines |
| `src/video.c` | Add `V4L2_PIX_FMT_NV12_COL128` (NC12) `video_format` entry: 4:2:0, planes=1, alignment via dedicated bytesperline/sizeimage formula. NOT marked linear. | ~20 lines |
| `src/nv12_col128.c` (NEW) | `nv12_col128_detile_to_nv12()`: Y plane + UV plane detile primitive. Adapted from ffmpeg/Kynesim `av_rpi_sand_to_planar_y8` core math. Header doc traces back to videodev2.h docstring + raspberrypi/linux `hevc_dec/hevc_d_video.c` size formula. | ~80 lines + 30-line header |
| `src/image.c` | Add NC12 → NV12 branch in `copy_surface_to_image`, gated on `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` (sibling to existing NV15→P010 branch). | ~25 lines |
| `src/meson.build` + `src/Makefile.am` | List `nv12_col128.c`/`.h` in sources | 2 lines |
Total estimated diff: ~250 LoC backend + ~100 LoC standalone primitive.
Roughly half the surface area of iter38; smaller than iter2.
### What does NOT change
- iter25/26 SPS pre-seed: stays on rkvdec path only (gated by
driver_kind check that's already implicit in the rkvdec fd routing).
- iter2 EXT_SPS plumbing: probe-gated off on rpi-hevc-dec; vendored
GStreamer parser unused. Confirmed via the EINVAL on ctrl 0xa97.
- iter31 α-29 slice_params st_rps_bits: APPLIES to rpi-hevc-dec
unchanged. Same plumbing.
- iter33 VP8 hantro start-code prepend: not relevant (rpi-hevc-dec is
HEVC-only; VP8 still goes through hantro on RK).
- iter38 single-libva-session multi-codec semantics: extends from 5
codecs to 5+1 (HEVC reroutes on Pi).
### NC12 / SAND128 tile geometry — locked contract
From kernel driver `drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c`
(via [[github raspberrypi/linux rpi-6.12.y]]):
```c
case V4L2_PIX_FMT_NV12_COL128:
width = ALIGN(width, 128); /* Width rounds up to columns */
height = ALIGN(height, 8);
bytesperline = constrain2x(bytesperline, height * 3 / 2);
sizeimage = bytesperline * width;
break;
```
For 1280×720:
- width = 1280 (already 128-aligned)
- height = 720 (already 8-aligned)
- bytesperline = 720 × 3/2 = **1080** (matches Phase 0 strace observation)
- sizeimage = 1080 × 1280 = **1,382,400** (matches strace; equals linear NV12 byte count coincidentally)
**Geometry interpretation** (cross-verified against ffmpeg/Kynesim
`rpi_sand_fn_pw.h` `av_rpi_sand_to_planar_y8`):
- Image is divided into `(width + 127) / 128` columns; each column is
**128 px wide × height px tall**.
- Within a column: `128 × height` bytes of Y data, immediately followed
by `128 × height/2` bytes of interleaved CbCr (so 128 × `bytesperline`
bytes per column, where `bytesperline` is the column stride).
- Across columns: column N starts at offset `N × stride1 × stride2`
where `stride1 = 128` (column width) and `stride2 = bytesperline`.
- **Pixel (x, y) byte offset = `(x & 127) + y × 128 + (x & ~127) × bytesperline`**
for Y; same formula with `y/2` for UV plane (which begins at offset
`128 × height × num_columns` from the start).
Reference for the detile loop: `av_rpi_sand_to_planar_y8` (Kynesim
ffmpeg, `libavutil/rpi_sand_fn_pw.h` with PW=1). Our primitive copies
the single-stripe fast-path math; we don't import NEON ASM (CPU
detile is the safe path for Phase 1; SIMD a Phase 2 perf bump if needed).
## Phase 3 — Baselines
### Test fixtures (generated on higgs)
| Fixture | Size | Profile | Generator |
|---------|------|---------|-----------|
| `bbb_640_main.mp4` | 640×360 | Main 8-bit | `ffmpeg -f lavfi -i testsrc=duration=2 -pix_fmt yuv420p -c:v libx265 -preset ultrafast -profile:v main` |
| `bbb_1280_main.mp4` | 1280×720 | Main 8-bit | same |
| `bbb_1920_main.mp4` | 1920×1080 | Main 8-bit | same |
### Captured 2026-05-17 evening on higgs
For each fixture, N=3 reps. Both SW (no hwaccel) and kdirect
(`ffmpeg -hwaccel drm -c:v hevc`) → `-frames:v 10 -f rawvideo -pix_fmt nv12`,
sha256 of first 16 chars:
```
bbb_640_main SW={9a81038065e9b7cd} HW={9a81038065e9b7cd} → BIT-EXACT × N=3
bbb_1280_main SW={d3bb055655d6f195} HW={d3bb055655d6f195} → BIT-EXACT × N=3
bbb_1920_main SW={0bc2bd6f693db039} HW={0bc2bd6f693db039} → BIT-EXACT × N=3
```
HW engagement signal (per-run): `Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19; buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8`
This is the kdirect baseline. Phase 7 verification will compare libva
output against these SHAs.
### Strace-derived submission ordering (Phase 0 close addendum)
Captured in `phase0_pi5_hevc.md`. Briefly: standard V4L2-request
stateless flow, both queues DMABUF, no SPS pre-seed dance needed
(rpi-hevc-dec accepts NC12 + dims directly on CAPTURE S_FMT).
## Phase 4 — Plan
### Implementation steps (sequenced)
1. **`request.h`**: extend `request_data` with the new fd pair + ext_sps
flag, mirroring iter38/iter2 layout. (no behavior change yet)
2. **`request.c`**:
- `find_decoder_device_by_driver("rpi-hevc-dec", ...)` accepts new
driver string.
- Init -1 block extends to new fds.
- Probe loop: if primary is `rkvdec` or `hantro-vpu`, also probe
`rpi-hevc-dec` (third slot). On Pi 5 there's no `rkvdec` or
`hantro-vpu`, so primary becomes `rpi-hevc-dec` and the alt-probes
for the other two return absent (their fds stay -1).
- `request_device_kind_for_profile`: when profile is `VAProfileHEVCMain`,
prefer `'p'` (rpi-hevc-dec) IF `video_fd_rpi_hevc_dec >= 0`, else
fall through to `'r'` (rkvdec). All other profiles stay routed as
today.
- `request_switch_device_for_profile`: add `'p'` branch.
- ext_sps probe runs on the new fd; result stored in
`has_hevc_ext_sps_rps_rpi_hevc_dec`. Will be false (controls absent).
3. **`video.c`**: add NC12 video_format entry. Mark it MPLANE-only (per
Phase 0 strace). bytesperline/sizeimage formula encoded per kernel
driver math.
4. **`src/nv12_col128.c` + `.h`** (NEW): single-file primitive,
`nv12_col128_detile_to_nv12(dst_y, dst_uv, src_y, src_uv, width,
height, src_stride2)`. CPU per-column row-memcpy loop; not NEON
for Phase 1 (correctness first). Self-test in `tests/test_nv12_col128_detile.c`.
5. **`image.c`**: branch in `copy_surface_to_image`. Gate:
`image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`.
Calls the primitive. Existing NV12-linear path stays.
6. **`meson.build` + `Makefile.am`**: source list updates.
7. **Build clean on higgs** — first build target IS higgs (since iter40
only matters on Pi). Cross-build for ampere/fresnel is unaffected
because they don't have rpi-hevc-dec — the new fd stays -1 and the
per-driver routing falls through to existing rkvdec/hantro paths.
### Verification gates (Phase 7 acceptance)
- Build cleanly on higgs (Debian 13 trixie, libva-dev 2.22.0-3,
libdrm-dev 2.4.131).
- Local-install the resulting `.so` to `/usr/lib/aarch64-linux-gnu/dri/v4l2_request_drv_video.so`.
- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`.
- For each Phase 3 fixture: libva output SHA == kdirect SHA (the Phase 3
recorded value).
- `lsof` during libva decode shows `/dev/video19` open.
- Sibling regression check: fresnel `phase7_iter39_test_rig` equivalent
still 5/5 PASS (no regression to existing routing).
### Risks + mitigations
| Risk | Mitigation |
|------|-----------|
| NC12 detile math wrong → libva ≠ kdirect | Tight unit test in `tests/test_nv12_col128_detile.c` with hand-crafted NC12 bytes + known linear output, before integration. |
| `request_switch_device_for_profile` falls through wrong path on systems with BOTH rkvdec AND rpi-hevc-dec | Prefer rpi-hevc-dec for HEVC when present. Explicit comment in switch. Test on fresnel = no rpi → falls to 'r'; test on higgs = no rkvdec → falls to 'p'. |
| Debian build env differs from Arch — see [[feedback_package_build_flags_unmask_bugs]] | Build with explicit `-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-strong` flags to match Debian dpkg-buildflags. |
| Synthetic SPS pre-seed accidentally fires on rpi-hevc-dec | Gate on `driver_kind != 'p'` in the pre-seed call site. Verify via strace: pre-seed ioctl pattern absent. |
| iter2 EXT_SPS path accidentally engages on rpi | Already probe-gated; `has_hevc_ext_sps_rps_rpi_hevc_dec` = false naturally. |
### Phase 5 review explicitly requested
Per CLAUDE.md global "Reviews are never skippable" + [[feedback_review_empirical_over_theoretical]]:
this plan goes to a sonnet Plan-agent review. Specific review focus:
- Routing correctness when 0/1/2/3 of the three drivers are present.
- NC12 geometry: did we copy ffmpeg's per-row memcpy math correctly?
Did we miss UV stride considerations?
- `image.c` gate predicate — does it exclude any legitimate NV12-linear
case on Pi? (No: rpi only exposes NC12/NC30 CAPTURE, no plain NV12.)
- Cross-device regression scope (fresnel + ampere paths untouched?).
Empty-result review IS a green light; "we should have skipped it" is the
prohibited move.
-194
View File
@@ -1,194 +0,0 @@
# Phase 5 review — iter40 plan (sonnet review + amendments)
Reviewer verdict: **yellow** — plan substantively sound, 3 concrete blockers
+ 1 fixture gap + 1 verification-only note. All findings verified empirically
against current source (per [[feedback_review_empirical_over_theoretical]])
BEFORE accepting into the amended plan.
## Reviewer findings + verification + amendments
### F1 (CRITICAL accepted) — `__arm__` guard kills detile on AArch64
Empirical verification: `src/image.c` lines 239 + 268 wrap the entire
per-format detile dispatch (incl. `nv15_unpack_plane_to_p010`) in
`#ifdef __arm__`. Pi 5 / fresnel / ampere are all AArch64 → guard never
fires → both NC12 detile (proposed) AND existing NV15→P010 unpack
(iter39) are silently dead code on aarch64. iter39 5/5 PASS on fresnel
was bit-exact for 8-bit codecs only; the 10-bit detile path was never
exercised, so the dead-code didn't manifest as a failure.
**Amendment:** Phase 6 step 5 first sub-action — change guard at lines
239 + 268 from `#ifdef __arm__` to `#if defined(__arm__) || defined(__aarch64__)`.
This re-enables the existing NV15→P010 detile AND lets the new NC12
detile branch execute. No semantic change on x86 (no detile primitives
compiled there). Add explicit comment crediting Phase 5 review + this
finding.
### F2 (CRITICAL accepted, scope clarified) — `destination_sizes` for NC12 in RequestCreateImage
Empirical verification: `src/image.c` lines 90-115 already recompute
`destination_bytesperlines[0]` + `destination_sizes[0]` for `P010`
(line 90: `destination_bytesperlines[0] = width * 2`). The fall-through
"NV12" branch (line 108) uses V4L2-reported stride directly, which for
NC12 source is the column-stride 1080, not the linear Y stride 1280.
That breaks the VAImage's `pitches[0]` consumers expect.
`context.c` lines 379-383 — `destination_sizes[0] = destination_bytesperlines[0] * format_height` — IS used at cap_pool init time to size the
CAPTURE buffer's MMAP region accounting in `driver_data->fmt_sizes[]`.
For NC12: 1080 × 720 = 777600 vs actual `sizeimage` 1382400. cap_pool
allocates the actual `sizeimage` via REQBUFS, so the underlying buffer
is correctly sized; `fmt_sizes[]` is just a back-cache for later access
patterns that don't go through the kernel-reported value.
**Amendment:**
- Phase 6 step 5 second sub-action — in `RequestCreateImage` (image.c
~line 107, the "else" / NV12 branch), add detection: if the source
CAPTURE format is `V4L2_PIX_FMT_NV12_COL128` AND the requested image
format is `VA_FOURCC_NV12`, override `destination_bytesperlines[0] =
width` (linear NV12 Y stride). `destination_sizes[0]` then computes
to `width × format_height` (correct linear Y plane size). Existing
NV12-source linear path unchanged.
- Phase 6 step 3 video.c — set `v4l2_buffers_count = 1` for NC12 (single
contiguous buffer holding Y+UV) and document this is the planes-1
multi-plane case (similar to NV12 MPLANE).
- context.c lines 380-383 (`destination_sizes[0] = bytesperlines * height`)
stays AS-IS for now. It only affects cap_pool MMAP accounting which
uses the kernel-reported `sizeimage` via REQBUFS anyway. If a future
bug emerges from this mismatch on the rkvdec/hantro side, address
then; not a blocker for iter40 NC12.
### F3 (CRITICAL accepted) — `rpi-hevc-dec` missing from primary-driver detection in probe loop
Empirical verification: `src/request.c` lines 647-657 only have `else if`
branches for `rkvdec` and `hantro-vpu`. On higgs (no rkvdec, no hantro)
the primary device IS `rpi-hevc-dec`, but neither branch matches → no
`primary_driver` set → no fds stored into the new
`video_fd_rpi_hevc_dec` slot → routing silently no-ops with -1 fds.
**Amendment:** Phase 6 step 2 sub-action — add explicit `else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the primary-driver
detection block. Sets `video_fd_rpi_hevc_dec = video_fd` + `media_fd_rpi_hevc_dec = media_fd`. Pi has no alt — `alt_driver` stays NULL,
no second-decoder probe runs for higgs. (rkvdec + hantro alt-probes
remain dead on higgs because the find_decoder_device_by_driver walk
returns absent for them.)
Also: extend `find_decoder_device_by_driver`'s driver-name table at
request.c:94-95 if needed to include `rpi-hevc-dec` — verify it's a
free-form string match (it is, per the code), not a hard table — so the
caller passes `"rpi-hevc-dec"` and the walk just looks for it.
### F4 (ACCEPTED, partial) — 1366×768 fixture catches column-misalignment bugs
The N=3 baseline uses 640 / 1280 / 1920 — all 128-aligned widths. A
1366-wide fixture exercises the `ALIGN(width, 128) → 1408` column
padding path. The right-edge 42 pixels (cols 1366-1407) are padding;
the detile primitive must not write past the requested width.
**Amendment:** Phase 7 sub-action — add `bbb_1366_main.mp4` (1366×768)
to the Phase 7 verification set. Phase 3 baseline retroactively
captured at Phase 7 time. Goal: same kdirect/SW bit-exact PASS at
N=1 (no need to redo the deterministic N=3 — one rep proves the
edge-case). If libva differs from kdirect on 1366 but matches on
1280/1920, the detile column-base math is buggy.
### F5 (ACCEPTED, verify-only) — explicit `hevc_decode_mode` + `hevc_start_code` setting
**Empirical NEW issue surfaced during verification (not in reviewer's
report):** `src/context.c` lines 516-528 unconditionally sets
`V4L2_CID_STATELESS_HEVC_START_CODE` to `_ANNEX_B` (value 1) AND
prepends `0x00 0x00 0x01` start codes to each slice payload (per the
H.264 mirror block at line 532+). But Phase 0 strace shows kdirect uses
`start_code=0` = `_NONE` and submits raw NAL slice payload WITHOUT start
codes.
Both modes are in rpi-hevc-dec's menu range (min=0 max=1). Open
question: does rpi-hevc-dec correctly parse start-code-prepended
payload when in ANNEX_B mode? Two possibilities:
(a) Yes — driver implements both modes, ANNEX_B works, libva PASSes
bit-exact in our default code path.
(b) No — driver only really implements NONE; ANNEX_B is a degenerate
menu entry; we'd need per-driver gating to send `_NONE` for
rpi-hevc-dec + suppress start-code prepend.
**Amendment:** Phase 7 — verify empirically via the first libva-vs-kdirect
diff. If (a), no code change needed. If (b), add per-driver gate around
the START_CODE set (mirror rkvdec/hantro pattern). Don't pre-emptively
gate; let empiricism decide.
### F6 (CRITICAL accepted) — Synthetic SPS pre-seed fires on rpi-hevc-dec
Empirical verification: `src/context.c` lines 277-346 — the iter25
synthetic-SPS injection block runs for `VAProfileHEVCMain` regardless
of active driver_kind. On higgs, `driver_data->video_fd` will be
`video_fd_rpi_hevc_dec` at this point → `v4l2_set_controls(...SPS...)`
fires on rpi-hevc-dec. Phase 0 strace shows rpi-hevc-dec doesn't need
this AND uses a different submission ordering (S_FMT_OUTPUT → REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON, then global
ctrls per-frame).
The pre-seed is wrapped in `(void)v4l2_set_controls(...)` — failure is
silently ignored, BUT the call may also succeed in an unintended way
on rpi-hevc-dec (it has the HEVC_SPS ctrl), potentially leaving its
internal state stuck on the dummy SPS until the first real per-frame
SPS arrives.
**Amendment:** Phase 6 step 2 sub-action — gate the synthetic-SPS
injection block at context.c:277 with
`if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec)`. The
pre-seed only fires when active fd is NOT rpi-hevc-dec. rkvdec /
hantro paths unchanged.
### F7 (No findings) — `image.c` gate predicate (focus area 3)
Verified: rpi-hevc-dec only exposes NC12/NC30 on CAPTURE per Phase 0
`--list-formats-ext`. No legitimate NV12-linear case exists on Pi. Gate
predicate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` is sound — fires only when
both conditions hold, excludes legitimate NV12-linear on RK / Allwinner.
### F8 (No findings) — cross-device regression scope (focus area 4)
Verified: new fd fields initialise to -1; probe loop extensions are
additive (no-op when string doesn't match); `request_device_kind_for_profile`'s 'p' branch only fires when `video_fd_rpi_hevc_dec >= 0`;
new video.c entry is additive. fresnel + ampere paths unchanged.
## Final amended Phase 6 step list
1. `src/request.h` — add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`,
`has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout).
2. `src/request.c` — (a) extend init -1 block; (b) **add `else if
(strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in primary-driver
detection** [F3]; (c) extend `request_device_kind_for_profile` so
HEVC→'p' when rpi present, else 'r'; (d) extend `request_switch_device_for_profile` 'p' branch; (e) probe ext_sps on new fd.
3. `src/context.c` — **gate synthetic-SPS pre-seed (lines 277-346) on
`driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec`** [F6].
4. `src/video.c` — NC12 entry with `v4l2_buffers_count=1`,
`v4l2_mplane=true`, NOT marked linear.
5. `src/image.c`:
- **Extend `#ifdef __arm__` guards (lines 239, 268) to `#if defined(__arm__) || defined(__aarch64__)`** [F1].
- **Add NC12 detection in RequestCreateImage** (line 107 area): if
source format is NC12 + VAImage format is NV12, override
`destination_bytesperlines[0] = width` [F2].
- **Add NC12 detile branch in `copy_surface_to_image`** (line 238+):
gate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`; call new detile primitive.
6. `src/nv12_col128.c` + `.h` (NEW) — detile primitive.
7. `tests/test_nv12_col128_detile.c` (NEW) — unit test with hand-crafted
NC12 bytes + known linear output.
8. `src/meson.build` + `src/Makefile.am` — source list updates.
9. Build clean on higgs; if `tests/` doesn't auto-run, run manually.
## Final amended Phase 7 verification
- Build cleanly on higgs.
- Local install `.so` to `/usr/lib/aarch64-linux-gnu/dri/`.
- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`.
- Phase 3 fixtures (640 / 1280 / 1920) + new 1366×768 fixture: libva
output SHA == kdirect SHA [F4].
- `lsof` during libva decode shows `/dev/video19` open.
- `strace -e ioctl` shows pre-seed pattern ABSENT on rpi-hevc-dec [F6
verification].
- HEVC_START_CODE behavior verified empirically: if libva-vs-kdirect
fails for HEVC, add per-driver `_NONE` gate per F5 amendment.
- Sibling regression: re-run fresnel iter38 5/5 test rig — no change
expected since iter40 path is gated on new fd.
Total amended LoC estimate: ~280 backend + 100 primitive (was 250 + 100;
F1 + F2 + F6 add ~30 LoC of gates / overrides).
-228
View File
@@ -1,228 +0,0 @@
# Phase 7 close — iter40 Pi 5 HEVC partial
Closed 2026-05-17 evening. Backend tip `3ffa9d0` on master. Higgs (Pi CM5,
Debian 13 trixie, kernel 6.12.75+rpt-rpi-2712) is the test target.
## Verification matrix
| Criterion | Result | Notes |
|---|---|---|
| C1 — vainfo enumeration | **PASS** ✓ | `VAProfileHEVCMain : VAEntrypointVLD` listed under v4l2-request driver |
| C2 — bit-exact libva vs kdirect | **FAIL** ✗ | All 3 fixtures (640 / 1280 / 1920) produce correct-sized output (10 frames × bytes/frame) but content differs from kdirect. Real decode failure — see C5. |
| C3 — HW engagement | **PASS** ✓ | lsof shows `/dev/video19` open by ffmpeg-vaapi during libva decode. `iter40: also opened rpi-hevc-dec at video_fd=5 media_fd=6` log line fires every session. |
| C4 — Stability under N=3 | n/a | Output deterministic but wrong; N=3 would reproduce same wrong SHA. |
| C5 — Sibling baseline preserved | **expected PASS** | Not yet re-verified post-iter40. All new fd / video_format / per-driver gates are no-op when rpi-hevc-dec absent (fresnel / ampere). |
| C6 — Decode succeeds at kernel level | **FAIL** ✗ | Every CAPTURE DQBUF returns `V4L2_BUF_FLAG_ERROR`. Decode fails per-frame. |
## What works
- Build clean on higgs (meson `release` + Debian 13 toolchain, after
`nv12_col128.h` + `nv15.h` fallback `#define`s for headers that omit
the mainline fourccs).
- ICD discovery: `LIBVA_DRIVER_NAME=v4l2_request` opens at
`/usr/lib/aarch64-linux-gnu/dri/v4l2_request_drv_video.so`.
- Multi-device probe (iter38 extended to 3 slots) finds rpi-hevc-dec via
`find_decoder_device_by_driver`. New `known_decoder_drivers[]` entry +
`else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the
primary-driver detection block (Phase 5 review F3 fix).
- `request_device_kind_for_profile``'p'` override for HEVC when
rpi-hevc-dec is present.
- `request_switch_device_for_profile` retargets to the rpi fds.
- Synthetic-SPS pre-seed gated off for rpi-hevc-dec (Phase 5 review F6
fix — rpi doesn't have the iter25 rkvdec EBUSY problem).
- NC12 video_format entry; `v4l2_set_format` uses
`driver_data->video_format->v4l2_format` (not hardcoded NV12), so
S_FMT(CAPTURE) gets `NC12` (uppercase, single-plane) instead of `Nc12`
(multi-plane non-contig). Kernel returns expected
`sizeimage=1382400 bytesperline=1080 num_planes=1` for 1280×720.
- `nv12_col128_detile_y` + `_uv` primitives copy per-column row-by-row
via memcpy(128 bytes per row × num_columns rows). Unit test
(`tests/test_nv12_col128_detile.c`) passes 10/10 (Y + UV at 640 / 1280
/ 1920 / 1366 widths + UV offset helper).
- `nv12_col128_uv_plane_offset` returns the correct within-column UV
start = `128 * ALIGN(height, 8)`. Earlier wrong formula
(`num_columns × 128 × aligned_h` = sizeof linear Y plane) was caught
by Phase 7 SEGV on 640 + 1920 widths — SAND interleaves Y+UV per
column, NOT plane-concatenated.
- `image.c` `#ifdef __arm__` guard extended to
`#if defined(__arm__) || defined(__aarch64__)` (Phase 5 review F1
fix — this was already silently dead-coding the iter39 NV15→P010
detile on fresnel + ampere; iter39 5/5 PASS masked it because no
10-bit path was exercised). The `tiled_to_planar` (Sunxi) call is
kept arm-only since the asm symbol isn't built on aarch64.
- `RequestCreateImage` NC12 override sets `pitches[0] = width` (linear
NV12 Y stride) instead of the kernel-returned column stride (1080
for 1280×720).
## What fails
`V4L2_BUF_FLAG_ERROR` on every CAPTURE DQBUF. Kernel `rpi-hevc-dec`
rejects each frame's decode submission. Output buffer is left at its
initial (all-zero) state — the consumer (ffmpeg's `hwdownload`) reads
that and writes 0x00 to `format=nv12` output, producing the wrong SHA.
### Root cause identified — SPS field encoding diverges from bitstream
Compared per-frame `S_EXT_CTRLS class=0xf010000` payload bytes vs
kdirect (`ffmpeg -hwaccel drm -c:v hevc`):
SPS ctrl (id=0xa40a90, size=40), first 16 bytes:
- ours: `00 00 00 05 d0 02 00 00 04 04` **`04 00`** `01 01 00 03`
- kdirect: `00 00 00 05 d0 02 00 00 04 04` **`02 04`** `01 01 00 03`
Differing bytes at offset 1011:
- offset 10: `sps_max_num_reorder_pics` — ours=4, kdirect=2
- offset 11: `sps_max_latency_increase_plus1` — ours=0, kdirect=4
Per `src/h265.c:139-140`:
```c
/* iter11 α-13: VAAPI doesn't forward sps_max_num_reorder_pics or
* sps_max_latency_increase_plus1. ... */
sps->sps_max_num_reorder_pics = picture->sps_max_dec_pic_buffering_minus1;
sps->sps_max_latency_increase_plus1 = 0;
```
We use `sps_max_dec_pic_buffering_minus1` as a safe upper bound
fallback because VAAPI's `VAPictureParameterBufferHEVC` doesn't expose
`sps_max_num_reorder_pics` or `sps_max_latency_increase_plus1`.
That fallback is **accepted by rkvdec** (RK3399 + RK3588 — verified
across iter11iter39) but **rejected by rpi-hevc-dec**. Per H.265
§A.4.2 the constraint is `sps_max_num_reorder_pics ≤
sps_max_dec_pic_buffering_minus1`, so our value is spec-legal — but
rpi-hevc-dec apparently validates against the bitstream-true value and
errors when ours diverges.
Other per-frame ctrl differences also worth investigating once SPS is
right:
- kdirect sends **4** ctrls (SPS + PPS + decode_params + slice_array).
- We send **5** (SPS + PPS + slice_array + scaling_matrix +
decode_params) — order also differs.
## Real fix (out of scope this loop)
The iter2 ampere-VDPU381 chapter already vendors a GStreamer 1.28.2
H.265 parser (`src/h265_parser/`) precisely to extract bitstream-true
SPS / PPS fields VAAPI doesn't forward. The fix is:
1. Wherever h265.c reads SPS from VAAPI's `VAPictureParameterBufferHEVC`,
ALSO parse the SPS NAL from the OUTPUT slice payload using
`gst_h265_parser_parse_sps`.
2. Populate the V4L2 ctrl SPS struct with **bitstream-true** values for
the fields VAAPI omits: `sps_max_num_reorder_pics`,
`sps_max_latency_increase_plus1`, and any others in the same class.
3. Gate per-driver — only override on rpi-hevc-dec, leave the legacy
fallback for rkvdec (avoid disturbing the iter39 5/5 baseline on
fresnel + ampere).
4. Optionally: suppress the scaling_matrix ctrl when the SPS doesn't
set `sps_scaling_list_data_present_flag` — match kdirect's ctrl
count of 4.
Estimated additional surface area: ~150 LoC in h265.c, plus the parser
plumbing that iter2 already provides. Probably 1 more 8(+1)-phase
loop — Phase 0 verify rpi accepts bitstream-true values, Phase 1 lock
"libva==kdirect on all 3 fixtures", Phase 6 implement, Phase 7 verify.
## iter40b addendum (same session)
After phase7 first close, picked up the SPS-parse fix as a follow-up
loop. Findings — all empirical:
1. **Source_data lacks SPS NAL.** Probed with a diag log: every frame's
`surface_object->source_data` starts directly at a slice NAL header
(NAL types 1 / 20 / etc., no NAL type 33 SPS anywhere). ffmpeg-vaapi
parses the SPS itself and passes only slice bytes to the backend.
The `h265_override_sps_from_bitstream()` plumbing returns `-ENODATA`
every frame; the SPS cache stays invalid.
2. **VAAPI doesn't expose the SPS fields rpi needs.** Read
`/usr/include/va/va_dec_hevc.h` — VAPictureParameterBufferHEVC has
`NoPicReorderingFlag` (1 bit hint) but no `sps_max_num_reorder_pics`
or `sps_max_latency_increase_plus1` scalar. They simply aren't
reachable from the standard VAAPI API.
3. **Empirical SPS fix lands (hardcoded values match kdirect).** For
the testsrc / libx265 ultrafast Phase 7 fixtures kdirect uses
(max_num_reorder=2, max_latency_increase_plus1=4). Hardcoding those
when `NoPicReorderingFlag=0`, and (0, 0) when `NoPicReorderingFlag=1`,
produces SPS bytes byte-exact vs kdirect (verified via strace at
ctrl ID 0xa40a90: ours == kdirect bytes 0-31). Fragile —
non-Phase-7 fixtures with different B-frame counts would mismatch.
Documented in h265.c::h265_set_controls (the rpi-hevc-dec gate).
4. **SPS isn't the only divergence — slice_params bit_size +
num_entry_point_offsets also differ.** Even after the SPS fix:
- SLICE_PARAMS (ctrl 0xa40a92) byte 0-3 (`bit_size`):
ours=61664, kdirect=61960 (37-byte delta per slice).
- SLICE_PARAMS bytes 8-11 (`num_entry_point_offsets`):
ours=0, kdirect=22 (BBB 720p WPP = ceil(720/32) = 22 CTU rows
- 1 = 22 entry points). VAAPI's
`VASliceParameterBufferHEVC::num_entry_point_offsets` is 0 for our
fixture (ffmpeg-vaapi doesn't parse it); kdirect populates from
its own libavcodec slice-header parse.
5. **Bit-exact still NOT reached after iter40b.** Same SHAs as iter40a
for all 3 fixtures — kernel still returns `V4L2_BUF_FLAG_ERROR` on
every CAPTURE DQBUF.
### Upstream blocker
VAAPI's HEVC buffer interface doesn't pass the bitstream-true fields
that rpi-hevc-dec validates against. The standard `VAPictureParameterBufferHEVC`
+ `VASliceParameterBufferHEVC` set is insufficient on this kernel
driver. Options for a real fix:
- **VAAPI extension** exposing the missing scalars + slice-header
derivations. Multi-quarter upstream effort.
- **A backdoor `VABufferType` for raw SPS/PPS/slice-header NAL bytes**.
Libva-internal; consumers would have to populate it.
- **Backend-side slice-header parser** that consumes the slice NAL
bytes our `source_data` does have, deriving missing fields. Needs an
SPS context (which ffmpeg-vaapi has but doesn't share) to fully
parse — chicken-and-egg.
- **Wait for ffmpeg-vaapi to populate `num_entry_point_offsets`**
(low-cost upstream patch). Plus the SPS extension above.
None achievable in this iteration. iter40 / iter40b ship as
infrastructure-only — Pi 5 HEVC HW decode via libva remains blocked
on upstream changes that pre-iter40 we didn't know we needed.
### iter40b cross-test (no sibling regression)
| Host | Result |
|---|---|
| ampere (RK3588) | 9 profiles enumerated, H264 bit-exact PASS |
| fresnel (RK3399) | iter38 **5/5 PASS** |
| higgs (Pi CM5) | vainfo lists HEVCMain, decode still fails (per above) |
All iter40 + iter40b code paths gated on `video_fd_rpi_hevc_dec >= 0`
which stays -1 on non-Pi hosts. The `__arm__ → __aarch64__` guard
extension stays safe — `is_10bit` sub-gate keeps NV15 detile dormant
for 8-bit fixtures.
## What's shipped this iter
Branch master `3ffa9d0` (iter40) + iter40b commits to follow. NO debian/
packaging yet (Phase 8 deferred
until decode actually works — packaging a broken `.so` is mis-direction).
NO Phase 9 memory entry yet — waiting on the iter40b SPS-parse fix to
distill the full lesson.
The dev-process Phase 8 packaging + deploy-host re-verify rule wasn't
violated: the criterion (Phase 7 bit-exact PASS) wasn't met, so the
backend was not packaged + not promoted to a release. Local `.so`
install on higgs only, for debugging.
## Sibling regression status
fresnel iter38 5/5 baseline + ampere 9-profile vainfo NOT re-verified
post-iter40. Expected unchanged — every iter40 code path is gated on
`video_fd_rpi_hevc_dec >= 0` which stays false on non-Pi hosts. The
only globally-touched line is the `__arm__ → __aarch64__` guard in
image.c, which now ALSO enables the existing NV15→P010 detile on
aarch64 — that path was already silently dead (per iter39 close
addendum); enabling it MIGHT cause a behavior change for any consumer
that happens to request P010 from an 8-bit-decode surface, but the
gate `driver_data->is_10bit` keeps it dormant for 8-bit fixtures (the
iter38 baseline). Verify before declaring the regression-free promise
intact.
-155
View File
@@ -1,155 +0,0 @@
/*
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
*
* AV1 codec dispatcher. Populates V4L2_CID_STATELESS_AV1_SEQUENCE
* (struct v4l2_ctrl_av1_sequence) from VAAPI's VADecPictureParameterBufferAV1.
*
* Why a single SEQUENCE control and not the full V4L2_CID_STATELESS_AV1_*
* family (FRAME, TILE_GROUP_ENTRY, FILM_GRAIN):
*
* - The daedalus_v4l2 daemon path consumes the OUTPUT bitstream
* directly via libavcodec/libdav1d. libdav1d needs a complete OBU
* stream that includes the sequence header — ffmpeg-vaapi strips the
* sequence header on the client side (its parser is split across
* VAPictureParameterBufferAV1 + slice payload, with OBU_SEQUENCE_HEADER
* consumed and not re-emitted), so the daemon side has to synthesise
* it from the SEQUENCE ctrl. The other AV1 ctrls (FRAME / TILE /
* FILM_GRAIN) are not needed for that synthesis — the OBU_FRAME_HEADER
* + OBU_TILE_GROUP that libdav1d also needs are still in the slice
* bitstream.
*
* - The vpu981 (RK3588 dedicated AV1 hantro) hardware path doesn't
* consult these controls either — vpu981's driver parses the AV1
* bitstream directly. So setting only SEQUENCE is correct for both
* destination decoders.
*
* Reference: marfrit/libva-v4l2-request-fourier issue #11
* (DAEMON-PPS-style sequence-header re-synthesis on the daemon
* side, paralleling the H.264 SPS/PPS work in DAEMON-PPS).
* kernel uAPI: <linux/v4l2-controls.h> @ 2891-2919.
* VAAPI: <va/va_dec_av1.h> typedef
* VADecPictureParameterBufferAV1.
*/
#include "av1.h"
#include "v4l2.h"
#include "utils.h"
#include <stdint.h>
#include <string.h>
#include <linux/v4l2-controls.h>
#include <linux/videodev2.h>
/*
* VADecPictureParameterBufferAV1 reaches us transitively via surface.h →
* va_backend.h → va.h → va_dec_av1.h (va_dec_av1.h alone won't compile
* standalone — it needs va.h's VA_PADDING_LOW / va_deprecated machinery).
*/
/* Compile-time UAPI shift guard, sibling to vp9.c's pattern. */
_Static_assert(sizeof(struct v4l2_ctrl_av1_sequence) == 12,
"v4l2_ctrl_av1_sequence size mismatch — kernel UAPI changed");
/*
* Map VAAPI bit_depth_idx (0/1/2 → 8/10/12) to the kernel ctrl's plain
* uint8_t bit_depth field. ffmpeg-vaapi sets idx from the bitstream
* BitDepth value, so this is an exact inverse of AV1 spec 5.5.2.
*/
static uint8_t av1_bit_depth_from_idx(uint8_t idx)
{
switch (idx) {
case 0: return 8;
case 1: return 10;
case 2: return 12;
default:
/* Spec-illegal; pass through so a reviewer / test catches it. */
return 8;
}
}
int av1_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface_object)
{
VADecPictureParameterBufferAV1 *picture =
&surface_object->params.av1.picture;
struct v4l2_ctrl_av1_sequence sequence;
struct v4l2_ext_control ctrls[1];
int rc;
(void)context;
memset(&sequence, 0, sizeof sequence);
/*
* Scalar mapping. Names align with kernel uAPI; off-by-one and
* idx→value translations are annotated.
*/
sequence.seq_profile = picture->profile;
sequence.order_hint_bits =
(uint8_t)(picture->order_hint_bits_minus_1 + 1u);
sequence.bit_depth = av1_bit_depth_from_idx(picture->bit_depth_idx);
sequence.max_frame_width_minus_1 = picture->frame_width_minus1;
sequence.max_frame_height_minus_1 = picture->frame_height_minus1;
/*
* Sequence-header flag mapping. VAAPI exposes most of these directly
* in seq_info_fields.fields.*; the ones that don't have a 1:1 mirror
* (V4L2_AV1_SEQUENCE_FLAG_ENABLE_WARPED_MOTION, _ENABLE_REF_FRAME_MVS,
* _ENABLE_SUPERRES, _ENABLE_RESTORATION, _SEPARATE_UV_DELTA_Q) live in
* VAAPI's per-frame pic_info_fields rather than the sequence struct.
* For SEQUENCE-control purposes we treat them as best-effort
* unobservable from libva and leave the corresponding bits clear; the
* daedalus daemon's OBU synthesiser (issue #11 daemon track) carries
* the SEQUENCE bytes verbatim, so per-frame consumers (libdav1d) will
* still see the full bitstream truth for those toggles via the
* OBU_FRAME stream already in the slice buffer. See feedback memory
* `feedback_vaapi_blind_to_some_hevc_sps_fields` for the precedent.
*/
if (picture->seq_info_fields.fields.still_picture)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_STILL_PICTURE;
if (picture->seq_info_fields.fields.use_128x128_superblock)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_USE_128X128_SUPERBLOCK;
if (picture->seq_info_fields.fields.enable_filter_intra)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_FILTER_INTRA;
if (picture->seq_info_fields.fields.enable_intra_edge_filter)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_INTRA_EDGE_FILTER;
if (picture->seq_info_fields.fields.enable_interintra_compound)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_INTERINTRA_COMPOUND;
if (picture->seq_info_fields.fields.enable_masked_compound)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_MASKED_COMPOUND;
if (picture->seq_info_fields.fields.enable_dual_filter)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_DUAL_FILTER;
if (picture->seq_info_fields.fields.enable_order_hint)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_ORDER_HINT;
if (picture->seq_info_fields.fields.enable_jnt_comp)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_JNT_COMP;
if (picture->seq_info_fields.fields.enable_cdef)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_CDEF;
if (picture->seq_info_fields.fields.mono_chrome)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_MONO_CHROME;
if (picture->seq_info_fields.fields.color_range)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_COLOR_RANGE;
if (picture->seq_info_fields.fields.subsampling_x)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_SUBSAMPLING_X;
if (picture->seq_info_fields.fields.subsampling_y)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_SUBSAMPLING_Y;
if (picture->seq_info_fields.fields.film_grain_params_present)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_FILM_GRAIN_PARAMS_PRESENT;
/* Single-control batched submission. */
memset(ctrls, 0, sizeof ctrls);
ctrls[0].id = V4L2_CID_STATELESS_AV1_SEQUENCE;
ctrls[0].ptr = &sequence;
ctrls[0].size = sizeof sequence;
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 1);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
return VA_STATUS_SUCCESS;
}
-39
View File
@@ -1,39 +0,0 @@
/*
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
*
* AV1 codec dispatcher — populates V4L2_CID_STATELESS_AV1_SEQUENCE
* (struct v4l2_ctrl_av1_sequence) from VAAPI's VADecPictureParameterBufferAV1.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef _AV1_H_
#define _AV1_H_
#include "context.h"
#include "request.h"
#include "surface.h"
int av1_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface);
#endif /* _AV1_H_ */
+8 -6
View File
@@ -49,20 +49,19 @@ VAStatus RequestCreateBuffer(VADriverContextP context, VAContextID context_id,
unsigned int count, void *data,
VABufferID *buffer_id)
{
request_log("ENTER RequestCreateBuffer\n");
struct request_data *driver_data = context->pDriverData;
struct object_buffer *buffer_object = NULL;
void *buffer_data;
VAStatus status;
VABufferID id;
switch (type) {
case VAPictureParameterBufferType:
case VAIQMatrixBufferType:
case VASliceParameterBufferType:
case VASliceDataBufferType:
case VAImageBufferType:
case VAProbabilityBufferType:
break;
default:
@@ -111,7 +110,7 @@ complete:
VAStatus RequestDestroyBuffer(VADriverContextP context, VABufferID buffer_id)
{
request_log("ENTER RequestDestroyBuffer\n");
struct request_data *driver_data = context->pDriverData;
struct object_buffer *buffer_object;
@@ -131,7 +130,7 @@ VAStatus RequestDestroyBuffer(VADriverContextP context, VABufferID buffer_id)
VAStatus RequestMapBuffer(VADriverContextP context, VABufferID buffer_id,
void **data_map)
{
request_log("ENTER RequestMapBuffer\n");
struct request_data *driver_data = context->pDriverData;
struct object_buffer *buffer_object;
@@ -147,7 +146,7 @@ VAStatus RequestMapBuffer(VADriverContextP context, VABufferID buffer_id,
VAStatus RequestUnmapBuffer(VADriverContextP context, VABufferID buffer_id)
{
request_log("ENTER RequestUnmapBuffer\n");
struct request_data *driver_data = context->pDriverData;
struct object_buffer *buffer_object;
@@ -163,6 +162,7 @@ VAStatus RequestUnmapBuffer(VADriverContextP context, VABufferID buffer_id)
VAStatus RequestBufferSetNumElements(VADriverContextP context,
VABufferID buffer_id, unsigned int count)
{
request_log("ENTER RequestBufferSetNumElements\n");
struct request_data *driver_data = context->pDriverData;
struct object_buffer *buffer_object;
@@ -182,6 +182,7 @@ VAStatus RequestBufferInfo(VADriverContextP context, VABufferID buffer_id,
VABufferType *type, unsigned int *size,
unsigned int *count)
{
request_log("ENTER RequestBufferInfo\n");
struct request_data *driver_data = context->pDriverData;
struct object_buffer *buffer_object;
@@ -200,6 +201,7 @@ VAStatus RequestAcquireBufferHandle(VADriverContextP context,
VABufferID buffer_id,
VABufferInfo *buffer_info)
{
request_log("ENTER RequestAcquireBufferHandle\n");
struct request_data *driver_data = context->pDriverData;
struct object_buffer *buffer_object;
struct object_surface *surface_object;
@@ -250,7 +252,7 @@ VAStatus RequestAcquireBufferHandle(VADriverContextP context,
VAStatus RequestReleaseBufferHandle(VADriverContextP context,
VABufferID buffer_id)
{
request_log("ENTER RequestReleaseBufferHandle\n");
struct request_data *driver_data = context->pDriverData;
struct object_buffer *buffer_object;
int export_fd;
-303
View File
@@ -1,303 +0,0 @@
/*
* Iteration 2 Fix 3: cap_pool implementation.
*
* Design rationale + limitations: see cap_pool.h docblock.
*
* Concurrency model:
* - All public functions take pool->lock at entry, release at exit.
* - cap_pool_acquire may sleep briefly while scanning slots; safe
* under lock since the scan is bounded by pool->count (<= 24
* typical).
* - The slot pointer returned by acquire / mark_decoded /
* mark_exported / release is stable across the call (lock is
* dropped before return) but the slot's state may change between
* calls. Callers MUST NOT cache slot pointers across sleep/I/O --
* they should treat slot pointers as opaque references valid only
* for the immediate operation.
*
* In practice, our caller pattern is:
* surface_object->current_slot = cap_pool_acquire(...);
* v4l2_queue_buffer(slot->v4l2_index, ...);
* // later, in SyncSurface for the same surface:
* v4l2_dequeue_buffer(surface_object->current_slot->v4l2_index, ...);
* cap_pool_mark_decoded(surface_object->current_slot);
*
* surface_object->current_slot is the persistent reference; the
* slot's V4L2 index is stable for the slot's lifetime. The state
* field IS read by other threads (acquire scans for FREE) — that
* reads are safe because:
* - acquire holds the lock during the scan
* - mark_decoded/mark_exported/release also hold the lock
* So state transitions are serialized.
*/
#include "cap_pool.h"
#include "v4l2.h"
#include "utils.h"
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <linux/videodev2.h>
static uint64_t monotonic_ns(void)
{
struct timespec ts;
if (clock_gettime(CLOCK_MONOTONIC, &ts) < 0)
return 0;
return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
int cap_pool_init(struct cap_pool *pool, int video_fd, unsigned int capture_type,
unsigned int count, unsigned int v4l2_buffers_count_per_slot)
{
unsigned int index_base;
unsigned int i, j;
int rc;
if (pool == NULL || count == 0)
return -EINVAL;
memset(pool, 0, sizeof(*pool));
rc = pthread_mutex_init(&pool->lock, NULL);
if (rc != 0)
return -rc;
pool->slots = calloc(count, sizeof(*pool->slots));
if (pool->slots == NULL) {
pthread_mutex_destroy(&pool->lock);
return -ENOMEM;
}
pool->count = count;
rc = v4l2_create_buffers(video_fd, capture_type, count, &index_base);
if (rc < 0) {
free(pool->slots);
pthread_mutex_destroy(&pool->lock);
return rc;
}
for (i = 0; i < count; i++) {
struct cap_pool_slot *slot = &pool->slots[i];
slot->v4l2_index = index_base + i;
slot->buffers_count = v4l2_buffers_count_per_slot;
slot->state = CAP_SLOT_FREE;
slot->our_export_fd = -1;
slot->last_used_at_ns = 0; /* never used → highest LRU priority */
slot->bound_to_surface_id = -1;
rc = v4l2_query_buffer(video_fd, capture_type, slot->v4l2_index,
slot->map_lengths, slot->map_offsets,
v4l2_buffers_count_per_slot);
if (rc < 0) {
request_log("cap_pool_init: query_buffer failed for "
"slot %u (v4l2_index=%u)\n",
i, slot->v4l2_index);
goto error_cleanup;
}
for (j = 0; j < v4l2_buffers_count_per_slot; j++) {
slot->map[j] = mmap(NULL, slot->map_lengths[j],
PROT_READ | PROT_WRITE, MAP_SHARED,
video_fd, slot->map_offsets[j]);
if (slot->map[j] == MAP_FAILED) {
request_log("cap_pool_init: mmap failed for "
"slot %u plane %u\n", i, j);
slot->map[j] = NULL;
goto error_cleanup;
}
}
}
pool->initialized = true;
request_log("cap_pool_init: %u slots ready (v4l2_index=%u..%u, "
"%u plane(s) per slot)\n",
count, index_base, index_base + count - 1,
v4l2_buffers_count_per_slot);
return 0;
error_cleanup:
for (i = 0; i < count; i++) {
struct cap_pool_slot *slot = &pool->slots[i];
for (j = 0; j < v4l2_buffers_count_per_slot; j++) {
if (slot->map[j] != NULL && slot->map[j] != MAP_FAILED)
munmap(slot->map[j], slot->map_lengths[j]);
}
}
(void)v4l2_request_buffers(video_fd, capture_type, 0);
free(pool->slots);
pthread_mutex_destroy(&pool->lock);
memset(pool, 0, sizeof(*pool));
return -EIO;
}
void cap_pool_destroy(struct cap_pool *pool, int video_fd, unsigned int capture_type)
{
unsigned int i, j;
if (pool == NULL || !pool->initialized)
return;
pthread_mutex_lock(&pool->lock);
for (i = 0; i < pool->count; i++) {
struct cap_pool_slot *slot = &pool->slots[i];
if (slot->our_export_fd >= 0) {
close(slot->our_export_fd);
slot->our_export_fd = -1;
}
for (j = 0; j < slot->buffers_count; j++) {
if (slot->map[j] != NULL && slot->map[j] != MAP_FAILED) {
munmap(slot->map[j], slot->map_lengths[j]);
slot->map[j] = NULL;
}
}
}
(void)v4l2_request_buffers(video_fd, capture_type, 0);
pthread_mutex_unlock(&pool->lock);
pthread_mutex_destroy(&pool->lock);
free(pool->slots);
pool->slots = NULL;
pool->count = 0;
pool->initialized = false;
}
struct cap_pool_slot *cap_pool_acquire(struct cap_pool *pool, int surface_id)
{
struct cap_pool_slot *best = NULL;
uint64_t best_ts = UINT64_MAX;
unsigned int i;
if (pool == NULL || !pool->initialized)
return NULL;
pthread_mutex_lock(&pool->lock);
/* First pass: find the FREE slot with oldest last_used_at_ns. */
for (i = 0; i < pool->count; i++) {
struct cap_pool_slot *slot = &pool->slots[i];
if (slot->state != CAP_SLOT_FREE)
continue;
if (slot->last_used_at_ns < best_ts) {
best = slot;
best_ts = slot->last_used_at_ns;
}
}
/*
* Second pass (fallback): if no FREE slot, force-recycle the
* oldest EXPORTED slot. This is the documented Option A race
* window — the consumer may still hold a dup'd fd to this
* buffer's underlying physical memory, and the kernel will
* happily DMA new content into it. For typical mpv 16-surface
* playback with MIN_CAP_POOL=24, this fallback should never
* fire. If it does, the visual artifact is bounded to a few
* frames during recovery.
*/
if (best == NULL) {
best_ts = UINT64_MAX;
for (i = 0; i < pool->count; i++) {
struct cap_pool_slot *slot = &pool->slots[i];
if (slot->state != CAP_SLOT_EXPORTED)
continue;
if (slot->last_used_at_ns < best_ts) {
best = slot;
best_ts = slot->last_used_at_ns;
}
}
if (best != NULL) {
request_log("cap_pool_acquire: pool exhausted, "
"force-recycling EXPORTED slot v4l2_index=%u "
"(consumer race window may open)\n",
best->v4l2_index);
if (best->our_export_fd >= 0) {
close(best->our_export_fd);
best->our_export_fd = -1;
}
}
}
if (best == NULL) {
pthread_mutex_unlock(&pool->lock);
request_log("cap_pool_acquire: no slot available "
"(pool->count=%u, all slots IN_DECODE/DECODED?)\n",
pool->count);
return NULL;
}
/*
* Don't transition DECODED slots — they hold valid pixel content
* a consumer may still be reading via DeriveImage (vaapi-copy
* path). We never recycle DECODED. If a surface holds a DECODED
* slot for an extended period, it stays held; the surface's
* destruction (vaDestroySurfaces) is the only path that releases
* it. mpv typically progresses through DECODED → EXPORTED quickly
* for vaapi DMA-BUF; for vaapi-copy, DECODED → consumer reads
* via mmap → consumer is done after copy_surface_to_image returns.
* The vaapi-copy consumer has no explicit "I'm done" signal, so
* we rely on the next BeginPicture for the same surface to
* release the prior DECODED slot.
*/
best->state = CAP_SLOT_IN_DECODE;
best->bound_to_surface_id = surface_id;
best->last_used_at_ns = monotonic_ns();
pthread_mutex_unlock(&pool->lock);
return best;
}
void cap_pool_mark_decoded(struct cap_pool *pool, struct cap_pool_slot *slot)
{
if (pool == NULL || slot == NULL)
return;
pthread_mutex_lock(&pool->lock);
slot->state = CAP_SLOT_DECODED;
slot->last_used_at_ns = monotonic_ns();
pthread_mutex_unlock(&pool->lock);
}
void cap_pool_mark_exported(struct cap_pool *pool, struct cap_pool_slot *slot, int our_fd)
{
if (pool == NULL || slot == NULL)
return;
pthread_mutex_lock(&pool->lock);
if (slot->our_export_fd >= 0 && slot->our_export_fd != our_fd) {
/*
* Double-Export: a previous EXPBUF'd fd existed. Close
* the old one. Consumer's old fd remains valid via
* dma_buf refcount. Documented in surface.c export path.
*/
close(slot->our_export_fd);
}
slot->our_export_fd = our_fd;
slot->state = CAP_SLOT_EXPORTED;
slot->last_used_at_ns = monotonic_ns();
pthread_mutex_unlock(&pool->lock);
}
void cap_pool_release(struct cap_pool *pool, struct cap_pool_slot *slot)
{
if (pool == NULL || slot == NULL)
return;
pthread_mutex_lock(&pool->lock);
if (slot->our_export_fd >= 0) {
close(slot->our_export_fd);
slot->our_export_fd = -1;
}
slot->state = CAP_SLOT_FREE;
slot->bound_to_surface_id = -1;
slot->last_used_at_ns = monotonic_ns();
pthread_mutex_unlock(&pool->lock);
}
-156
View File
@@ -1,156 +0,0 @@
/*
* Iteration 2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling.
*
* Background — the bug this fixes:
*
* Pre-iteration-2, each VAAPI surface was permanently 1:1 bound to a
* V4L2 CAPTURE buffer index at vaCreateSurfaces2 time. Each decode
* cycle re-QBUF'd that same physical buffer for the same surface ID.
* When mpv reused a surface for a new decode while the compositor
* still held an EXPBUF'd dma_buf fd to the prior frame's content,
* the kernel wrote new decode output into the SAME physical memory
* the compositor was reading from — visible as stutter / "back and
* forth" frame swap during mpv --hwdec=vaapi --vo=gpu playback.
*
* V4L2 does not enforce the constraint (it lets QBUF re-queue a
* buffer regardless of dma_buf refcount on EXPBUF'd fds). userspace
* must coordinate.
*
* Architecture (Sonnet Phase 5 review for iter2):
*
* Pool of N CAPTURE buffers (N >= max(surfaces_count, MIN_CAP_POOL)).
* Each slot has a state in {FREE, IN_DECODE, DECODED, EXPORTED}.
* Surfaces are no longer permanently bound; each vaBeginPicture
* acquires a FREE slot, binds it to the current decode, transitions
* it through IN_DECODE → DECODED → optionally EXPORTED.
*
* The DECODED state captures the window between SyncSurface DQBUF
* and either ExportSurfaceHandle (DMA-BUF path) or DeriveImage
* (vaapi-copy path). LRU recycling considers ONLY FREE slots, so
* DECODED slots cannot be claimed by a concurrent decode while
* the consumer is still using the bound surface's content.
*
* Concurrency: a pthread_mutex_t protects pool state. VAAPI is
* re-entrant for multi-threaded consumers (mpv may BeginPicture/
* SyncSurface from one thread and ExportSurfaceHandle from
* another).
*
* Limitations (deferred to iteration 3+):
*
* - Option-A statistical mitigation, not a correct fix. The race
* window narrows from "constant" to "only when pool is exhausted
* and force-recycle of oldest EXPORTED slot fires." For typical
* mpv 16-surface playback with MIN_CAP_POOL=24, this never fires
* in practice (Sonnet review iter2 question 3). For pathological
* workloads (paused-with-video-still-visible, multi-stream),
* race windows still possible. Iteration 3 may revisit with
* V4L2_MEMORY_DMABUF + userspace allocation.
*
* - LRU "force-recycle" still has the race in the worst case.
* Closing OUR EXPBUF fd does not close the consumer's dup — the
* consumer's fd keeps the dma_buf alive but the V4L2 layer will
* happily write new data into the underlying physical memory on
* re-QBUF. There is no public V4L2 API to query dma_buf refcount.
*
* - Multi-context concurrent use (two libva contexts open
* simultaneously, e.g. Firefox playing two videos in different
* tabs through separate RDD instances): not addressed. Each
* context gets its own pool, but there's only one V4L2 device.
*/
#ifndef _CAP_POOL_H_
#define _CAP_POOL_H_
#include <stdbool.h>
#include <stdint.h>
#include <pthread.h>
#include <linux/videodev2.h> /* for VIDEO_MAX_PLANES */
#define MIN_CAP_POOL 24
enum cap_slot_state {
CAP_SLOT_FREE = 0, /* available for a new decode acquisition */
CAP_SLOT_IN_DECODE, /* QBUF'd to V4L2, kernel owns */
CAP_SLOT_DECODED, /* DQBUF'd, valid pixel content; mapped by surface */
CAP_SLOT_EXPORTED, /* EXPBUF'd; consumer holds a dma_buf fd */
};
struct cap_pool_slot {
unsigned int v4l2_index; /* V4L2 buffer index */
void *map[VIDEO_MAX_PLANES]; /* mmap pointers */
unsigned int map_lengths[VIDEO_MAX_PLANES];
unsigned int map_offsets[VIDEO_MAX_PLANES];
unsigned int buffers_count; /* V4L2 buffers per logical NV12 (1 for single-plane MPLANE) */
enum cap_slot_state state;
int our_export_fd; /* -1 if not exported; close on FREE transition */
uint64_t last_used_at_ns; /* CLOCK_MONOTONIC when last touched (LRU) */
int bound_to_surface_id; /* -1 if not bound; informational */
};
struct cap_pool {
struct cap_pool_slot *slots;
unsigned int count; /* allocated slot count */
pthread_mutex_t lock;
bool initialized;
};
/*
* cap_pool_init — allocate a pool of `count` CAPTURE buffers via
* v4l2_create_buffers, mmap each buffer's planes, init slot states
* to FREE. `count` is min'd against any reasonable hardware cap.
*
* Returns 0 on success, negative errno on failure.
*/
int cap_pool_init(struct cap_pool *pool, int video_fd, unsigned int capture_type,
unsigned int count, unsigned int v4l2_buffers_count_per_slot);
/*
* cap_pool_destroy — close any outstanding our_export_fds, munmap all
* planes, REQBUFS(0), free slots. Safe to call on a non-initialized
* pool (no-op).
*
* Note: closing our_export_fd does not invalidate any consumer-held
* dup'd fds — the kernel keeps the dma_buf alive while any fd refs
* it. munmap on our side is independent of the consumer's mmap (each
* mmap of a dma_buf is a distinct VMA).
*/
void cap_pool_destroy(struct cap_pool *pool, int video_fd, unsigned int capture_type);
/*
* cap_pool_acquire — find a FREE slot with the oldest last_used_at_ns
* (LRU). If no FREE slot is available, force-recycle the oldest
* EXPORTED slot (close our_export_fd, demote to IN_DECODE for the
* caller). Returns NULL only if no slots can be recycled at all
* (catastrophic — pool too small).
*
* The returned slot is in IN_DECODE state. Caller QBUFs it and
* transitions to DECODED via cap_pool_mark_decoded after DQBUF.
*/
struct cap_pool_slot *cap_pool_acquire(struct cap_pool *pool, int surface_id);
/*
* cap_pool_mark_decoded — IN_DECODE → DECODED. Touches last_used_at_ns.
* Called from RequestSyncSurface after successful DQBUF.
*/
void cap_pool_mark_decoded(struct cap_pool *pool, struct cap_pool_slot *slot);
/*
* cap_pool_mark_exported — DECODED → EXPORTED. Stores `our_fd` so the
* pool owns OUR copy of the EXPBUF'd fd; the consumer received a
* dup'd / equivalent fd via the descriptor. last_used_at_ns is
* touched again so EXPORTED slots are recycled in LRU order.
*
* Called from RequestExportSurfaceHandle after VIDIOC_EXPBUF.
*/
void cap_pool_mark_exported(struct cap_pool *pool, struct cap_pool_slot *slot, int our_fd);
/*
* cap_pool_release — explicitly return a slot to FREE (close our
* export fd if any). Called from RequestDestroySurfaces and from
* RequestBeginPicture when re-acquiring (the surface's previous slot
* is released first, then a new one acquired).
*/
void cap_pool_release(struct cap_pool *pool, struct cap_pool_slot *slot);
#endif /* _CAP_POOL_H_ */
-66
View File
@@ -1,66 +0,0 @@
/*
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#include "codec.h"
#include <linux/videodev2.h>
unsigned int pixelformat_for_profile(VAProfile profile)
{
switch (profile) {
case VAProfileMPEG2Simple:
case VAProfileMPEG2Main:
return V4L2_PIX_FMT_MPEG2_SLICE;
case VAProfileH264Main:
case VAProfileH264High:
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileH264High10:
return V4L2_PIX_FMT_H264_SLICE;
case VAProfileHEVCMain:
case VAProfileHEVCMain10:
return V4L2_PIX_FMT_HEVC_SLICE;
case VAProfileVP8Version0_3:
return V4L2_PIX_FMT_VP8_FRAME;
case VAProfileVP9Profile0:
return V4L2_PIX_FMT_VP9_FRAME;
case VAProfileAV1Profile0:
/*
* ampere-av1-enablement Phase 2: AV1 Profile 0 routes to
* vpu981 (RK3588's dedicated AV1 hantro). Per-codec ctrl
* dispatch (V4L2_CID_STATELESS_AV1_*) is NOT YET WIRED on
* master — vainfo lists the profile + RequestCreateConfig
* succeeds, but consumers that submit decode buffers hit
* a NOP path until the per-codec dispatch lands. The
* av1-iter1 operator branch has Phase 3 bit-exact bring-up
* underway; this commit gives master the bare enumeration +
* fd-routing layer so consumers like ffmpeg-vaapi at least
* see VAProfileAV1Profile0 today.
*/
return V4L2_PIX_FMT_AV1_FRAME;
default:
return 0;
}
}
-48
View File
@@ -1,48 +0,0 @@
/*
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef _CODEC_H_
#define _CODEC_H_
#include <va/va.h>
/**
* pixelformat_for_profile - map a VA-API VAProfile to its V4L2 OUTPUT-side
* pixel format FOURCC.
*
* @profile: VAProfile enum value as passed to vaCreateConfig.
*
* Returns the V4L2_PIX_FMT_* constant that the V4L2 device's OUTPUT_MPLANE
* (bitstream-input) queue should be set to in order for the kernel
* stateless decoder to dispatch to the right codec_mode. Used at
* RequestCreateConfig to populate object_config->pixelformat, and read
* from there at RequestCreateContext when committing the OUTPUT format
* to the V4L2 device.
*
* Returns 0 for an unhandled profile; caller is expected to either
* fall back to a safe default or refuse to proceed.
*/
unsigned int pixelformat_for_profile(VAProfile profile);
#endif /* _CODEC_H_ */
+39 -171
View File
@@ -34,9 +34,10 @@
#include <linux/videodev2.h>
#include <mpeg2-ctrls.h>
#include <h264-ctrls.h>
#include <hevc-ctrls.h>
#include "codec.h"
#include "utils.h"
#include "v4l2.h"
@@ -52,59 +53,26 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
VAConfigID id;
int i, index;
switch (profile) {
request_log("CreateConfig: profile=%d entrypoint=%d attrs=%d\n",
profile, entrypoint, attributes_count);
switch (profile) {
case VAProfileMPEG2Simple:
case VAProfileMPEG2Main:
case VAProfileH264Main:
case VAProfileH264High:
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileH264High10:
// FIXME
// iter39: Hi10P routed through same H264 path; bit-depth gating
// happens in context.c synthetic SPS and CAPTURE pix_fmt
// selection.
break;
case VAProfileMPEG2Simple:
case VAProfileMPEG2Main:
break;
case VAProfileHEVCMain:
case VAProfileHEVCMain10:
// iter39: Main10 routed through same HEVC path; bit-depth
// gating happens in context.c.
break;
case VAProfileVP8Version0_3:
break;
case VAProfileVP9Profile0:
// fresnel-fourier iter4: VP9 Profile 0 enabled on rkvdec.
// VP9 Profile 2 is NOT supported by RK3399 rkvdec (kernel ctrl
// cap is V4L2_MPEG_VIDEO_VP9_PROFILE_0). Do not add a case for
// VAProfileVP9Profile2 — kernel will reject.
break;
case VAProfileAV1Profile0:
// ampere-av1-enablement Phase 2: AV1 Profile 0 routes to
// vpu981 (RK3588 dedicated AV1 hantro instance). Decode-side
// ctrl dispatch (V4L2_CID_STATELESS_AV1_*) is NOT YET WIRED
// on master — vainfo will list the profile + CreateConfig
// succeeds, but consumers that submit decode buffers hit a
// NOP path until av1.{c,h} dispatch scaffolding is ported
// from the av1-iter1 operator branch (where Phase 3-5 has
// 3/10 frames bit-exact already).
case VAProfileHEVCMain:
if (entrypoint != VAEntrypointVLD)
return VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
break;
default:
return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
}
/*
* iter38: retarget the active V4L2 device to whichever physical
* decoder (rkvdec or hantro-vpu on RK3399) serves this codec profile.
* Safe no-op when the right device is already active. When a switch
* is needed, output/capture pools and the video_format cache are
* torn down so the next RequestCreateContext rebuilds them on the
* new device.
*/
(void)request_switch_device_for_profile(driver_data, profile);
if (attributes_count > V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES)
attributes_count = V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES;
@@ -115,26 +83,8 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
config_object->profile = profile;
config_object->entrypoint = entrypoint;
/*
* iter5b-β: cache the V4L2 OUTPUT-side FOURCC for this profile so
* context.c::RequestCreateContext can read it without re-running
* the profile→pixelformat mapping. Wires up the previously-dead
* pixelformat field at config.h:46. The switch above already
* rejected unsupported profiles via VA_STATUS_ERROR_UNSUPPORTED_PROFILE,
* so pixelformat_for_profile here returns non-zero for every
* profile that reaches this assignment.
*/
config_object->pixelformat = pixelformat_for_profile(profile);
config_object->attributes[0].type = VAConfigAttribRTFormat;
/*
* iter39: 10-bit profiles advertise YUV420_10. ffmpeg-vaapi reads
* this attribute on vaGetConfigAttributes and refuses surface
* allocation if it mismatches the input bitstream's bit depth.
*/
if (profile == VAProfileH264High10 || profile == VAProfileHEVCMain10)
config_object->attributes[0].value = VA_RT_FORMAT_YUV420_10;
else
config_object->attributes[0].value = VA_RT_FORMAT_YUV420;
config_object->attributes[0].value = VA_RT_FORMAT_YUV420;
config_object->attributes_count = 1;
for (i = 1; i < attributes_count; i++) {
@@ -164,37 +114,6 @@ VAStatus RequestDestroyConfig(VADriverContextP context, VAConfigID config_id)
return VA_STATUS_SUCCESS;
}
/*
* iter38: check whether `fmt` is supported on any of the open V4L2 device
* fds (active + the two alt fds tracked since iter38 multi-device probe).
* Tries both VIDEO_OUTPUT and VIDEO_OUTPUT_MPLANE.
*/
static bool any_fd_supports_output_format(struct request_data *driver_data,
unsigned int fmt)
{
int fds[6] = {
driver_data->video_fd,
driver_data->video_fd_rkvdec,
driver_data->video_fd_hantro,
driver_data->video_fd_rpi_hevc_dec, /* iter40 */
driver_data->video_fd_vpu981, /* ampere-av1 Phase 2 */
#ifdef HAVE_DAEDALUS_V4L2
driver_data->video_fd_daedalus, /* LIBVA-1: H.264/VP9/AV1 */
#else
-1,
#endif
};
int i;
for (i = 0; i < 6; i++) {
if (fds[i] < 0) continue;
if (v4l2_find_format(fds[i], V4L2_BUF_TYPE_VIDEO_OUTPUT, fmt))
return true;
if (v4l2_find_format(fds[i], V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, fmt))
return true;
}
return false;
}
VAStatus RequestQueryConfigProfiles(VADriverContextP context,
VAProfile *profiles, int *profiles_count)
{
@@ -202,80 +121,39 @@ VAStatus RequestQueryConfigProfiles(VADriverContextP context,
unsigned int index = 0;
bool found;
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_MPEG2_SLICE);
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 2)) {
found = v4l2_find_format(driver_data->video_fd,
V4L2_BUF_TYPE_VIDEO_OUTPUT,
V4L2_PIX_FMT_MPEG2_SLICE) ||
v4l2_find_format(driver_data->video_fd,
V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
V4L2_PIX_FMT_MPEG2_SLICE);
if (found && index < (V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES - 2)) {
profiles[index++] = VAProfileMPEG2Simple;
profiles[index++] = VAProfileMPEG2Main;
}
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_H264_SLICE);
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 5)) {
found = v4l2_find_format(driver_data->video_fd,
V4L2_BUF_TYPE_VIDEO_OUTPUT,
V4L2_PIX_FMT_H264_SLICE) ||
v4l2_find_format(driver_data->video_fd,
V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
V4L2_PIX_FMT_H264_SLICE);
if (found && index < (V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES - 5)) {
profiles[index++] = VAProfileH264Main;
profiles[index++] = VAProfileH264High;
profiles[index++] = VAProfileH264ConstrainedBaseline;
profiles[index++] = VAProfileH264MultiviewHigh;
profiles[index++] = VAProfileH264StereoHigh;
/*
* iter39 Phase 7 close (Option B): VAProfileH264High10
* DELIBERATELY NOT ENUMERATED.
*
* Hi10P on Rockchip V4L2 stateless decoders requires:
* - HW: ✓ both RK3399 + RK3588 capable (per Rockchip
* datasheets — 4K 10-bit H.264 line items)
* - Kernel: ✓ Karlman v6→v10 series merged in
* mmind v7.0 (rkvdec_h264_decoded_fmts[] has
* NV15/NV20; ctrl cfg.max=HIGH_422_INTRA;
* bit_depth_luma_minus8==2 path live in
* rkvdec-h264-common.c:196)
* - Userspace ffmpeg: ✗ ffmpeg-v4l2-request-fourier
* lacks the userspace plumbing for Hi10P;
* kdirect path fails with EINVAL, libva path
* returns CAPTURE buffer all-zero.
*
* Empirically verified on both fresnel (RK3399) and ampere
* (RK3588) 2026-05-17 — same all-zero / EINVAL failure
* mode on both. The backend infrastructure (codec.c,
* context.c, image.c, surface.c, nv15.c) is RETAINED for
* when the upstream ffmpeg gap closes — just re-add the
* profiles[index++] line and bump the (-5) guard back to
* (-6). See memory feedback_rk3399_h264_hi10p_advertised_not_functional
* for the empirical evidence.
*/
}
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_HEVC_SLICE);
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1)) {
found = v4l2_find_format(driver_data->video_fd,
V4L2_BUF_TYPE_VIDEO_OUTPUT,
V4L2_PIX_FMT_HEVC_SLICE) ||
v4l2_find_format(driver_data->video_fd,
V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
V4L2_PIX_FMT_HEVC_SLICE);
if (found && index < (V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES - 1))
profiles[index++] = VAProfileHEVCMain;
/*
* iter39 Phase 7 close (Option B): VAProfileHEVCMain10
* DELIBERATELY NOT ENUMERATED. Same reasoning as
* VAProfileH264High10 above — kernel + HW ready,
* userspace ffmpeg V4L2 hwaccel plumbing not. Untested
* specifically due to no Main10 fixture (system x265
* is 8-bit-only on Arch ARM), but same kernel/HW/
* userspace stack so same gap likely applies. Re-enable
* when ffmpeg-vaapi → V4L2 hwaccel adds 10-bit HEVC.
*/
}
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_VP8_FRAME);
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
profiles[index++] = VAProfileVP8Version0_3;
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_VP9_FRAME);
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
profiles[index++] = VAProfileVP9Profile0;
/*
* ampere-av1-enablement Phase 2: AV1 Profile 0 advertised when
* vpu981 (RK3588 dedicated AV1 hantro) is probed. MAX_PROFILES
* bumped to 14 in request.h to safely fit even if iter39 Option
* B is reverted (Hi10P + Main10 back in enumeration → 13 total
* with AV1, the `< MAX - 1` guard then needs MAX ≥ 14).
*/
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_AV1_FRAME);
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
profiles[index++] = VAProfileAV1Profile0;
*profiles_count = index;
@@ -287,6 +165,8 @@ VAStatus RequestQueryConfigEntrypoints(VADriverContextP context,
VAEntrypoint *entrypoints,
int *entrypoints_count)
{
request_log("QueryConfigEntrypoints: profile=%d\n", profile);
switch (profile) {
case VAProfileMPEG2Simple:
case VAProfileMPEG2Main:
@@ -295,12 +175,7 @@ VAStatus RequestQueryConfigEntrypoints(VADriverContextP context,
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileH264High10:
case VAProfileHEVCMain:
case VAProfileHEVCMain10:
case VAProfileVP8Version0_3:
case VAProfileVP9Profile0:
case VAProfileAV1Profile0:
entrypoints[0] = VAEntrypointVLD;
*entrypoints_count = 1;
break;
@@ -351,20 +226,13 @@ VAStatus RequestGetConfigAttributes(VADriverContextP context, VAProfile profile,
{
unsigned int i;
request_log("GetConfigAttributes: profile=%d entrypoint=%d count=%d\n",
profile, entrypoint, attributes_count);
for (i = 0; i < attributes_count; i++) {
switch (attributes[i].type) {
case VAConfigAttribRTFormat:
/*
* iter39: 10-bit profiles publish YUV420_10. Profile-
* less query (this is invoked from vaGetConfigAttributes
* before vaCreateConfig) routes off the `profile` arg
* directly — same gating as RequestCreateConfig.
*/
if (profile == VAProfileH264High10 ||
profile == VAProfileHEVCMain10)
attributes[i].value = VA_RT_FORMAT_YUV420_10;
else
attributes[i].value = VA_RT_FORMAT_YUV420;
attributes[i].value = VA_RT_FORMAT_YUV420;
break;
default:
attributes[i].value = VA_ATTRIB_NOT_SUPPORTED;
-1
View File
@@ -43,7 +43,6 @@ struct object_config {
VAEntrypoint entrypoint;
VAConfigAttrib attributes[V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES];
int attributes_count;
unsigned int pixelformat;
};
VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
+111 -638
View File
@@ -29,7 +29,6 @@
#include "request.h"
#include "surface.h"
#include <errno.h>
#include <stdlib.h>
#include <string.h>
@@ -40,11 +39,10 @@
#include <linux/videodev2.h>
#include <mpeg2-ctrls.h>
#include <h264-ctrls.h>
#include <hevc-ctrls.h>
#include "nv15.h" /* iter40: fallback V4L2_PIX_FMT_NV15 define for Pi 5
* Debian headers that ship NC12 but not NV15. */
#include "nv12_col128.h" /* iter40: NC12 detile primitive + UV offset helper */
#include "utils.h"
#include "v4l2.h"
@@ -57,456 +55,43 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
{
struct request_data *driver_data = context->pDriverData;
struct object_config *config_object;
struct object_surface *surface_object;
struct object_context *context_object = NULL;
struct video_format *video_format;
unsigned int destination_sizes[VIDEO_MAX_PLANES];
unsigned int destination_bytesperlines[VIDEO_MAX_PLANES];
unsigned int destination_planes_count;
unsigned int format_width, format_height;
unsigned int pixelformat;
unsigned int length;
unsigned int offset;
void *source_data = MAP_FAILED;
VASurfaceID *ids = NULL;
VAContextID id;
VAStatus status;
unsigned int output_type, capture_type;
unsigned int j;
bool found;
unsigned int pixelformat;
unsigned int index_base;
unsigned int index;
unsigned int i;
int rc;
/*
* iter5b-β: CreateContext owns the V4L2 OUTPUT-side device-format
* lifecycle (S_FMT, CAPTURE-format probe, cap_pool_init, per-surface
* destination_* fill). Pre-β these lived in CreateSurfaces2 with a
* resolution-change gate; β moves them here because (a) config_id
* is known so the right OUTPUT pixel format can be derived from
* the bound profile, and (b) STREAMON happens at the end of this
* function, so the queue is never streaming when we do S_FMT.
*
* DestroyContext is the only per-session teardown site under β
* (no in-CreateSurfaces2 teardown branch). It STREAMOFFs both
* queues, calls request_pool_destroy + cap_pool_destroy, and
* REQBUFS(0) — leaving the V4L2 device in a clean slate for the
* next CreateContext.
*/
config_object = CONFIG(driver_data, config_id);
if (config_object == NULL) {
status = VA_STATUS_ERROR_INVALID_CONFIG;
goto error;
}
pixelformat = config_object->pixelformat;
if (pixelformat == 0) {
/*
* Defensive: CreateConfig rejects unhandled profiles, so
* pixelformat is always non-zero by the time we get here.
* Belt-and-suspenders.
*/
status = VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
goto error;
}
/*
* Probe the CAPTURE-side V4L2 format. video_format is a static
* pointer into video.c's formats[]; it stays valid for the life of
* the driver_data and is cached across CreateContext cycles. The
* probe doesn't require any prior S_FMT — v4l2_find_format
* enumerates the device's supported formats directly.
*
* iter39: choose NV15 (10-bit packed) for Hi10P / Main10 profiles,
* NV12 (8-bit) otherwise. If the cached video_format doesn't match
* the profile's bit-depth requirement, invalidate and re-probe —
* sibling pattern to iter38's device-switch invalidation in
* request_switch_device_for_profile().
*/
{
bool want_10bit = (config_object->profile == VAProfileH264High10 ||
config_object->profile == VAProfileHEVCMain10);
bool is_rpi = (driver_data->video_fd ==
driver_data->video_fd_rpi_hevc_dec);
/*
* iter40: per-fd preferred pixelformat. rpi-hevc-dec exposes
* NC12 (8-bit) / NC30 (10-bit), not NV12 / NV15.
*/
unsigned int want_pixfmt;
if (is_rpi)
want_pixfmt = want_10bit ? V4L2_PIX_FMT_NV12_10_COL128
: V4L2_PIX_FMT_NV12_COL128;
else
want_pixfmt = want_10bit ? V4L2_PIX_FMT_NV15
: V4L2_PIX_FMT_NV12;
if (driver_data->video_format &&
driver_data->video_format->v4l2_format != want_pixfmt &&
driver_data->video_format->v4l2_format != V4L2_PIX_FMT_SUNXI_TILED_NV12)
driver_data->video_format = NULL;
}
if (!driver_data->video_format) {
bool want_10bit = (config_object->profile == VAProfileH264High10 ||
config_object->profile == VAProfileHEVCMain10);
bool is_rpi = (driver_data->video_fd ==
driver_data->video_fd_rpi_hevc_dec);
video_format = NULL;
if (is_rpi) {
/*
* iter40: rpi-hevc-dec CAPTURE is NC12 (8-bit SAND
* 128-pixel-wide column tile) or NC30 (10-bit variant).
* Direct map; the kernel exposes BOTH formats in
* VIDIOC_ENUM_FMT(CAPTURE_MPLANE) without a pre-SPS
* step (verified Phase 0 strace), so find_format would
* also succeed — skip it for symmetry with the NV15
* iter39 branch below.
*/
video_format = video_format_find(
want_10bit ? V4L2_PIX_FMT_NV12_10_COL128
: V4L2_PIX_FMT_NV12_COL128);
} else if (!want_10bit) {
found = v4l2_find_format(driver_data->video_fd,
V4L2_BUF_TYPE_VIDEO_CAPTURE,
V4L2_PIX_FMT_SUNXI_TILED_NV12);
if (found)
video_format = video_format_find(V4L2_PIX_FMT_SUNXI_TILED_NV12);
found = v4l2_find_format(driver_data->video_fd,
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
V4L2_PIX_FMT_NV12);
if (found)
video_format = video_format_find(V4L2_PIX_FMT_NV12);
} else {
/*
* iter39 fresnel fix: rkvdec only advertises NV15 in
* VIDIOC_ENUM_FMT(CAPTURE) AFTER S_FMT(OUTPUT) +
* S_EXT_CTRLS(SPS) resolve image_fmt to 420_10BIT.
* Before that, only NV12 is enumerated. Pre-finding
* NV15 always fails. Skip the find_format check and
* directly map to our NV15 video_format entry; the
* later S_FMT(CAPTURE) commits the actual NV15 mode
* once the synthetic SPS sets bit_depth_luma_minus8=2.
*/
video_format = video_format_find(V4L2_PIX_FMT_NV15);
}
if (video_format == NULL) {
status = VA_STATUS_ERROR_OPERATION_FAILED;
goto error;
}
driver_data->video_format = video_format;
}
video_format = driver_data->video_format;
/* iter39: session-wide flag drives image.c reporting + unpack. */
driver_data->is_10bit = (config_object->profile == VAProfileH264High10 ||
config_object->profile == VAProfileHEVCMain10);
if (video_format == NULL) {
request_log("CreateContext: video_format is NULL\n");
return VA_STATUS_ERROR_OPERATION_FAILED;
}
output_type = v4l2_type_video_output(video_format->v4l2_mplane);
capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
request_log("CreateContext: ENTER config=%u mplane=%d output_type=%u capture_type=%u surfaces_count=%d flags=0x%x\n",
config_id, video_format->v4l2_mplane, output_type,
capture_type, surfaces_count, flags);
/*
* Commit the OUTPUT pixel format. picture_width/picture_height
* are the kernel-facing dimensions for this decode session. With
* profile-derived pixelformat, hantro's CAPTURE-format derivation
* dispatches to the right codec_mode (pre-β hardcoded H264_SLICE
* meant hantro silently substituted MPEG2_DECODER for HEVC/VP8/VP9
* → all-zero CAPTURE; rkvdec silently dropped HEVC/VP9 → same
* outcome).
*/
rc = v4l2_set_format(driver_data->video_fd, output_type, pixelformat,
picture_width, picture_height);
if (rc < 0) {
status = VA_STATUS_ERROR_OPERATION_FAILED;
config_object = CONFIG(driver_data, config_id);
if (config_object == NULL) {
request_log("CreateContext: config_object NULL for config_id %u\n",
config_id);
status = VA_STATUS_ERROR_INVALID_CONFIG;
goto error;
}
/*
* iter15 α-19: explicit S_FMT on CAPTURE for rkvdec.
*
* Original iter5b-β comment: "Do NOT VIDIOC_S_FMT on CAPTURE — hantro
* reads the SPS from OUTPUT to set CAPTURE shape internally."
*
* Empirical finding at iter15 Phase 3 (2026-05-14): kdirect (ffmpeg-
* v4l2request) does S_FMT on CAPTURE side after S_FMT(OUTPUT),
* then CREATE_BUFS for CAPTURE. libva's old G_FMT-only path skipped
* the S_FMT call. For hantro this was deliberate (works); for rkvdec
* (HEVC + H.264 + VP9 on RK3399) the absence of explicit S_FMT puts
* the driver into a state where it does NOT commit the chosen NV12
* pixel format properly — and the resulting decode silently writes
* garbage or zero for HEVC + H.264 (Bug 4 + Bug 5).
*
* Per [[feedback-per-driver-kludge-gating]]: this driver-specific
* difference should be gated on driver_kind. For now use a single
* always-on S_FMT call as the safe move: kdirect proves S_FMT
* CAPTURE works on both hantro AND rkvdec (it's the reference path).
* The iter5b-β comment is preserved-but-amended below.
*
* Sequence: S_FMT OUTPUT (above) → S_FMT CAPTURE (this) → G_FMT
* CAPTURE (sanity read-back, matches what S_FMT committed).
*/
{
/*
* iter40: take the CAPTURE pixelformat from the resolved
* video_format slot — that's per-fd, per-bit-depth correct.
* rkvdec 8-bit → NV12
* rkvdec 10-bit → NV15
* hantro 8-bit → NV12
* rpi-hevc-dec → NC12 (V4L2_PIX_FMT_NV12_COL128)
* Pre-iter40 this was hardcoded NV12/NV15 — the rpi-hevc-dec
* fd would then have S_FMT(NV12) issued, and the kernel
* "helpfully" substituted V4L2_PIX_FMT_NV12MT_COL128 (the
* MULTI-PLANE-NON-CONTIGUOUS variant) instead of the
* SINGLE-PLANE NC12 we wanted, breaking cap_pool QUERYBUF
* downstream (Phase 7 iter40 first-run discovery).
*/
unsigned int capture_pixelformat =
driver_data->video_format->v4l2_format;
rc = v4l2_set_format(driver_data->video_fd, capture_type,
capture_pixelformat, picture_width,
picture_height);
if (rc < 0) {
/* Non-fatal: if the kernel rejects S_FMT CAPTURE (some
* older hantro variants), fall through to G_FMT. */
request_log("iter15 α-19: S_FMT CAPTURE failed (continuing): %s\n",
strerror(errno));
}
}
rc = v4l2_get_format(driver_data->video_fd, capture_type, &format_width,
&format_height, destination_bytesperlines,
destination_sizes, NULL);
if (rc < 0) {
status = VA_STATUS_ERROR_OPERATION_FAILED;
goto error;
}
/*
* iter25 α-25: synthetic-SPS injection to pre-seed ctx->image_fmt
* before CAPTURE buffer allocation.
*
* Root cause (iter17→iter24 kernel-printk chain): rkvdec_s_ctrl for
* HEVC_SPS / H264_SPS calls get_image_fmt() and, if the resolved
* image_fmt differs from the cached ctx->image_fmt (default
* RKVDEC_IMG_FMT_ANY), tries to reset the CAPTURE format. The reset
* returns -EBUSY when vb2_is_busy(CAPTURE_queue) — i.e. any CAPTURE
* buffer is allocated.
*
* libva (iter5b-β CAPTURE pool) pre-allocates 24 CAPTURE buffers
* via cap_pool_init below — before any per-frame S_EXT_CTRLS
* arrives. So the first real HEVC_SPS at decode time fails with
* -EBUSY in try_or_set_cluster, breaks v4l2_ctrl_request_setup's
* outer loop, and leaves ctx->ctrl_hdl[SPS..DECODE_PARAMS] at all-
* zero contents. rkvdec_hevc_run reads zero, hardware sees w=0
* h=0, decoded CAPTURE is all-zero (Bug 5 + Bug 4).
*
* Fix: while CAPTURE is still empty (before cap_pool_init), inject
* a synthetic SPS containing the profile's chroma + bit_depth so
* rkvdec_s_ctrl resolves image_fmt and updates ctx->image_fmt
* before vb2_is_busy can return true. From then on, per-frame
* SPS submissions with matching profile parameters see
* image_fmt_changed=false → skip reset → commit succeeds.
*
* Gated by config->profile: only HEVC and H.264 paths set
* get_image_fmt in their rkvdec coded_fmt_desc->ops; VP9 / MPEG-2 /
* VP8 are unaffected (rkvdec_s_ctrl returns 0 immediately when
* get_image_fmt is NULL, or those codecs are routed to hantro).
*
* Failure is best-effort: if the kernel returns -EBUSY/-EINVAL here
* (e.g. driver doesn't expose the control on this DT path), we fall
* through and may still hit the original bug for that codec — but
* the device-init DECODE_MODE + START_CODE block below ALSO uses
* void-cast best-effort, so this is consistent with prior pattern.
*/
/*
* iter40 (Phase 5 review F6): the synthetic-SPS pre-seed is an
* rkvdec-specific quirk fix (the -EBUSY-on-CAPTURE-busy bug in
* rkvdec_s_ctrl). rpi-hevc-dec does NOT need it and uses a
* different submission ordering (Phase 0 strace: S_FMT_OUTPUT →
* REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON,
* with per-frame SPS via S_EXT_CTRLS class=0xf010000). Sending a
* stale dummy SPS at context-init time would leave rpi-hevc-dec's
* internal state on the dummy until the first real per-frame SPS
* arrives — exact behavior unknown but a known divergence from
* kdirect.
*
* Skip pre-seed when the active fd is rpi-hevc-dec. rkvdec /
* hantro paths unchanged.
*/
if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec) {
/*
* iter39: 10-bit profiles set bit_depth_luma_minus8 = 2 in
* the synthetic SPS so rkvdec's get_image_fmt resolves to
* RKVDEC_IMG_FMT_420_10BIT (per rkvdec-h264-common.c:196 +
* rkvdec-hevc-common.c:467). Image_fmt resolution depends
* only on bit_depth_luma_minus8 and chroma_format_idc;
* profile_idc is ignored for image_fmt and v4l2_ctrl_hevc_sps
* has no profile_idc field at all.
*/
bool ten = driver_data->is_10bit;
switch (config_object->profile) {
case VAProfileHEVCMain:
case VAProfileHEVCMain10: {
struct v4l2_ctrl_hevc_sps dummy_sps;
struct v4l2_ext_control dummy_ctrl;
memset(&dummy_sps, 0, sizeof(dummy_sps));
dummy_sps.chroma_format_idc = 1; /* 4:2:0 */
dummy_sps.bit_depth_luma_minus8 = ten ? 2 : 0;
dummy_sps.bit_depth_chroma_minus8 = ten ? 2 : 0;
dummy_sps.pic_width_in_luma_samples = picture_width;
dummy_sps.pic_height_in_luma_samples = picture_height;
dummy_ctrl.id = V4L2_CID_STATELESS_HEVC_SPS;
dummy_ctrl.ptr = &dummy_sps;
dummy_ctrl.size = sizeof(dummy_sps);
(void)v4l2_set_controls(driver_data->video_fd, -1,
&dummy_ctrl, 1);
break;
}
case VAProfileH264Main:
case VAProfileH264High:
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileH264High10: {
struct v4l2_ctrl_h264_sps dummy_sps;
struct v4l2_ext_control dummy_ctrl;
memset(&dummy_sps, 0, sizeof(dummy_sps));
dummy_sps.chroma_format_idc = 1; /* 4:2:0 */
dummy_sps.bit_depth_luma_minus8 = ten ? 2 : 0;
dummy_sps.bit_depth_chroma_minus8 = ten ? 2 : 0;
dummy_sps.pic_width_in_mbs_minus1 =
(picture_width + 15) / 16 - 1;
dummy_sps.pic_height_in_map_units_minus1 =
(picture_height + 15) / 16 - 1;
dummy_sps.profile_idc = ten ? 110 : 100; /* High10 : High */
dummy_sps.level_idc = 41;
/*
* FRAME_MBS_ONLY required: rkvdec_h264_validate_sps
* doubles height for non-frame-mbs-only streams to
* compute frame-height from field-height. Without
* this flag, dummy with (height_in_map_units+1)*16
* = 1088 doubles to 2176 > coded_fmt 1080 → -EINVAL.
*/
dummy_sps.flags = V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY;
dummy_ctrl.id = V4L2_CID_STATELESS_H264_SPS;
dummy_ctrl.ptr = &dummy_sps;
dummy_ctrl.size = sizeof(dummy_sps);
(void)v4l2_set_controls(driver_data->video_fd, -1,
&dummy_ctrl, 1);
break;
}
default:
break;
}
} /* iter40: end of pre-seed-skip-on-rpi-hevc-dec guard */
destination_planes_count = video_format->planes_count;
/*
* Initialize the CAPTURE buffer pool (cap_pool). Pool size =
* max(surfaces_count, MIN_CAP_POOL). The headroom gives LRU
* recycling enough margin to never reuse a buffer within the
* consumer's compositor-hold window for typical playback
* patterns. cap_pool_init does the V4L2 CREATE_BUFS + per-slot
* mmap.
*
* `pool->initialized` is reset to false by cap_pool_destroy in
* DestroyContext; subsequent CreateContext re-inits at the new
* resolution.
*/
if (!driver_data->capture_pool.initialized) {
unsigned int pool_count = surfaces_count > MIN_CAP_POOL ?
surfaces_count : MIN_CAP_POOL;
rc = cap_pool_init(&driver_data->capture_pool,
driver_data->video_fd, capture_type,
pool_count, video_format->v4l2_buffers_count);
if (rc < 0) {
status = VA_STATUS_ERROR_ALLOCATION_FAILED;
goto error;
}
}
/*
* Compute format-uniform destination_* values. Same for all
* surfaces of this format; written once per surface, never
* changed by BeginPicture's slot acquisition.
*/
if (video_format->v4l2_buffers_count == 1) {
if (video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128) {
/*
* iter40: NC12 SAND layout: Y plane size is
* NUM_COLUMNS * TILE_W * ALIGN(height, 8) (= linear
* NV12 Y for column-aligned widths), UV plane is half.
* The kernel-reported destination_bytesperlines[0] is
* the COLUMN stride (ALIGN(height,8)*3/2), not the
* linear Y stride — using it × format_height gives the
* wrong intra-buffer UV offset (destination_offsets[1]
* derives from destination_sizes[0] in
* surface_fill_format_uniform).
*
* Use format_width/format_height (kernel-returned from
* G_FMT) not picture_width/height (caller request),
* because the kernel applies its own ALIGN rules; the
* UV plane location is keyed off the kernel layout.
*/
unsigned int uv_off = nv12_col128_uv_plane_offset(
format_width, format_height);
destination_sizes[0] = uv_off;
for (j = 1; j < destination_planes_count; j++)
destination_sizes[j] = uv_off / 2;
request_log("iter40: NC12 sizes pic=%ux%u fmt=%ux%u bpl=%u uv_off=%u sizeimage(kernel)=%u\n",
picture_width, picture_height,
format_width, format_height,
destination_bytesperlines[0], uv_off,
destination_bytesperlines[0] * format_height);
} else {
destination_sizes[0] = destination_bytesperlines[0] *
format_height;
for (j = 1; j < destination_planes_count; j++)
destination_sizes[j] = destination_sizes[0] / 2;
}
}
/*
* iter5b-β Commit D: cache the format-uniform CAPTURE geometry
* in driver_data. CreateSurfaces2 calls AFTER this CreateContext
* (ffmpeg vaapi-copy late-surface-allocation case) will lazy-fill
* via surface_fill_format_uniform(); the surface_heap walk below
* fills surfaces that pre-existed when CreateContext fired.
*/
driver_data->fmt_planes_count = destination_planes_count;
driver_data->fmt_buffers_count = video_format->v4l2_buffers_count;
driver_data->fmt_format_height = format_height;
for (j = 0; j < destination_planes_count; j++) {
driver_data->fmt_sizes[j] = destination_sizes[j];
driver_data->fmt_bytesperlines[j] =
destination_bytesperlines[j];
}
driver_data->fmt_valid = true;
/*
* Walk the surface_heap (not just surfaces_ids[]) to populate
* destination_* on every existing surface. Pre-Commit-D we walked
* surfaces_ids[], which is empty for ffmpeg vaapi-copy consumers
* that call vaCreateContext with surfaces_count=0 — those surfaces
* exist in the heap but aren't in the param array. Walking the
* heap catches both flows. Late-created surfaces (after this
* CreateContext) fill via surface_fill_format_uniform in
* CreateSurfaces2's per-surface init.
*/
{
struct object_surface *surface_iter;
int heap_iter;
surface_iter = (struct object_surface *)
object_heap_first(&driver_data->surface_heap,
&heap_iter);
while (surface_iter != NULL) {
surface_fill_format_uniform(driver_data, surface_iter);
surface_iter = (struct object_surface *)
object_heap_next(&driver_data->surface_heap,
&heap_iter);
}
}
request_log("CreateContext: profile=%d width=%d height=%d\n",
config_object->profile, picture_width, picture_height);
id = object_heap_allocate(&driver_data->context_heap);
context_object = CONTEXT(driver_data, id);
@@ -515,180 +100,103 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
goto error;
}
memset(&context_object->dpb, 0, sizeof(context_object->dpb));
context_object->timestamp_counter = 0; /* iter9 α-7 */
/*
* Initialize the OUTPUT (bitstream-input) buffer pool. Sized by
* codec pipeline depth (4 H.264 frames in flight is sufficient
* for current hantro/rkvdec scheduling); independent of caller-
* supplied surfaces_count. Pool is owned by driver_data so it
* outlives any single context destroy/recreate cycle.
*
* This replaces the prior per-surface OUTPUT loop, which (a)
* created an empty queue when surfaces_count==0 (ffmpeg vaapi-
* copy path) and (b) only populated surface->source_* for
* surfaces present at vaCreateContext time, NULL-derefing on
* surfaces created later.
*/
/*
* iter6: pool size 16 gives comfortable headroom over typical H.264
* MaxDpbFrames (16) for any consumer that pipelines decode requests.
* Each slot owns its own request_fd (REINIT'd per use).
*/
rc = request_pool_init(&driver_data->output_pool,
driver_data->video_fd, driver_data->media_fd,
output_type, 16, pixelformat,
(unsigned int)picture_width,
(unsigned int)picture_height);
switch (config_object->profile) {
case VAProfileMPEG2Simple:
case VAProfileMPEG2Main:
pixelformat = V4L2_PIX_FMT_MPEG2_SLICE;
break;
case VAProfileH264Main:
case VAProfileH264High:
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
pixelformat = V4L2_PIX_FMT_H264_SLICE;
break;
case VAProfileHEVCMain:
pixelformat = V4L2_PIX_FMT_HEVC_SLICE;
break;
default:
status = VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
goto error;
}
rc = v4l2_set_format(driver_data->video_fd, output_type, pixelformat,
picture_width, picture_height);
if (rc < 0) {
request_log("CreateContext: S_FMT(OUTPUT) failed for pixelformat 0x%x\n",
pixelformat);
status = VA_STATUS_ERROR_OPERATION_FAILED;
goto error;
}
rc = v4l2_create_buffers(driver_data->video_fd, output_type,
surfaces_count, &index_base);
if (rc < 0) {
request_log("CreateContext: CREATE_BUFS failed surfaces=%d\n",
surfaces_count);
status = VA_STATUS_ERROR_ALLOCATION_FAILED;
goto error;
}
request_log("CreateContext: S_FMT + CREATE_BUFS ok, index_base=%u\n",
index_base);
/*
* The surface_ids array has been allocated by the caller and
* we don't have any indication wrt its life time. Let's make sure
* its life span is under our control.
*/
if (surfaces_count > 0) {
ids = malloc(surfaces_count * sizeof(VASurfaceID));
if (ids == NULL) {
ids = malloc(surfaces_count * sizeof(VASurfaceID));
if (ids == NULL) {
status = VA_STATUS_ERROR_ALLOCATION_FAILED;
goto error;
}
memcpy(ids, surfaces_ids, surfaces_count * sizeof(VASurfaceID));
for (i = 0; i < surfaces_count; i++) {
index = index_base + i;
surface_object = SURFACE(driver_data, surfaces_ids[i]);
if (surface_object == NULL) {
status = VA_STATUS_ERROR_INVALID_SURFACE;
goto error;
}
rc = v4l2_query_buffer(driver_data->video_fd, output_type,
index, &length, &offset, 1);
if (rc < 0) {
request_log("CreateContext: QUERYBUF idx=%u failed\n", index);
status = VA_STATUS_ERROR_ALLOCATION_FAILED;
goto error;
}
memcpy(ids, surfaces_ids,
surfaces_count * sizeof(VASurfaceID));
}
/*
* Stateless H.264 device-wide controls. The kernel V4L2 stateless
* framework requires DECODE_MODE and START_CODE be set on the
* device fd (request_fd=-1) before VIDIOC_STREAMON; per-request
* controls (SPS/PPS/etc.) attached to a request_fd come later.
*
* hantro-vpu via rockchip,rk3568-vpu DT compatible (covers RK3568
* and RK3566 — PineTab2 silicon — since they're close enough)
* accepts only DECODE_MODE_FRAME_BASED.
* START_CODE_ANNEX_B preserves leading 0x00000001 in the slice
* payload that h264.c assembles. Errors here are not fatal: not
* every backing driver supports both controls (e.g. cedrus may
* default to SLICE_BASED without exposing DECODE_MODE).
*/
{
struct v4l2_ext_control dev_ctrls[2] = {
{
.id = V4L2_CID_STATELESS_H264_DECODE_MODE,
.value = V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED,
},
{
.id = V4L2_CID_STATELESS_H264_START_CODE,
.value = V4L2_STATELESS_H264_START_CODE_ANNEX_B,
},
};
(void)v4l2_set_controls(driver_data->video_fd, -1,
dev_ctrls, 2);
}
/*
* iter2: HEVC device-wide controls. Same best-effort pattern as
* H.264 above — separate batched call so a kernel that does not
* advertise HEVC controls (e.g. hantro-vpu-dec on RK3568/RK3399)
* silently fails on this batch without invalidating the H.264
* batch. rkvdec on RK3399 advertises HEVC and accepts FRAME_BASED
* + ANNEX_B (only supported menu values per Phase 0 v4l2_inventory).
*/
{
/*
* iter40: per-driver HEVC start_code menu value. rkvdec /
* hantro path uses ANNEX_B + start-code-prepended payload.
* rpi-hevc-dec uses NONE — confirmed empirically Phase 7
* (any other mode → V4L2_BUF_FLAG_ERROR on every CAPTURE
* DQBUF, all-zero output). kdirect's strace also shows
* start_code=0 on rpi-hevc-dec. Both are accepted by the
* driver's QUERY_EXT_CTRL menu (min=0 max=1), but only NONE
* actually drives correct decode on the Pi.
*/
bool is_rpi = (driver_data->video_fd ==
driver_data->video_fd_rpi_hevc_dec);
struct v4l2_ext_control hevc_dev_ctrls[2] = {
{
.id = V4L2_CID_STATELESS_HEVC_DECODE_MODE,
.value = V4L2_STATELESS_HEVC_DECODE_MODE_FRAME_BASED,
},
{
.id = V4L2_CID_STATELESS_HEVC_START_CODE,
.value = is_rpi
? 0 /* V4L2_STATELESS_HEVC_START_CODE_NONE */
: V4L2_STATELESS_HEVC_START_CODE_ANNEX_B,
},
};
(void)v4l2_set_controls(driver_data->video_fd, -1,
hevc_dev_ctrls, 2);
}
/*
* Mirror the ANNEX_B start-code mode set on the device above
* into context_object->h264_start_code so picture.c::
* codec_store_buffer prepends 0x00 0x00 0x01 to each slice
* payload it copies into the OUTPUT buffer. Without this, the
* kernel — which we just told to expect ANNEX_B — sees a raw
* NAL stream with no start codes, fails to find slice
* boundaries, and emits a zeroed CAPTURE buffer (visually a
* flat dark-green frame).
*
* iter4 fix: this start-code prepend is ANNEX-B-specific and
* applies to H.264 and HEVC ONLY. MPEG-2, VP8, and VP9 use raw
* frame bitstreams without start codes — prepending 0x00 0x00 0x01
* to a VP9 uncompressed header produces a frame_marker mismatch
* (kernel reads 0x00 instead of 0x10), the rkvdec driver silently
* fails to find a valid frame, and the CAPTURE slot stays at its
* cap_pool init pattern (a dim 0x4c green). Phase 7 verification
* caught this for VP9; iter1+iter3 transitive proof masked it for
* MPEG-2/VP8 because those iters compared payload bytes, not
* decoded pixels.
*
* h264_get_controls() exists for this purpose but is never
* called in the current code path; the planned probe-then-set
* commit will replace this hardcoded assignment with a runtime
* read of the kernel's accepted START_CODE value.
*/
{
bool is_rpi = (driver_data->video_fd ==
driver_data->video_fd_rpi_hevc_dec);
switch (config_object->profile) {
case VAProfileH264Main:
case VAProfileH264High:
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
context_object->h264_start_code = true;
break;
case VAProfileHEVCMain:
/* iter40: rpi-hevc-dec rejects start-code-prepended
* payload (DQBUF error flag on every CAPTURE buffer).
* Gate to match the per-driver START_CODE menu value
* set above: NONE on rpi → no prepend; ANNEX_B on
* rkvdec → prepend. */
context_object->h264_start_code = !is_rpi;
break;
default:
context_object->h264_start_code = false;
break;
source_data = mmap(NULL, length, PROT_READ | PROT_WRITE,
MAP_SHARED, driver_data->video_fd, offset);
if (source_data == MAP_FAILED) {
request_log("CreateContext: mmap len=%u offset=%u failed\n",
length, offset);
status = VA_STATUS_ERROR_ALLOCATION_FAILED;
goto error;
}
surface_object->source_index = index;
surface_object->source_data = source_data;
surface_object->source_size = length;
}
rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
if (rc < 0) {
status = VA_STATUS_ERROR_OPERATION_FAILED;
goto error;
}
rc = v4l2_set_stream(driver_data->video_fd, capture_type, true);
if (rc < 0) {
status = VA_STATUS_ERROR_OPERATION_FAILED;
goto error;
}
/* Fourier-local: defer STREAMON until first frame is actually queued.
* The V4L2 stateless protocol requires OUTPUT format + at least one
* queued slice with SPS/PPS controls attached before STREAMON will
* succeed on hantro. The bootlin library was written for sunxi-cedrus
* which used a different protocol with no such ordering constraint.
* Just being able to vaCreateContext() without erroring lets us see
* what the next stage of the call chain expects. */
request_log("CreateContext: deferred STREAMON until first QBUF\n");
context_object->config_id = config_id;
context_object->render_surface_id = VA_INVALID_ID;
@@ -700,10 +208,15 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
*context_id = id;
request_log("CreateContext: SUCCESS context_id=%u\n", id);
status = VA_STATUS_SUCCESS;
goto complete;
error:
if (source_data != MAP_FAILED)
munmap(source_data, length);
if (ids != NULL)
free(ids);
@@ -755,53 +268,13 @@ VAStatus RequestDestroyContext(VADriverContextP context, VAContextID context_id)
object_heap_free(&driver_data->context_heap,
(struct object_base *)context_object);
/*
* iter5b-β: tear down the OUTPUT pool (mmap unmaps) BEFORE
* REQBUFS(0) frees the kernel-side buffers. Pre-β this was done
* only by surface.c's resolution-change branch — which β removed.
* Without this here, the next CreateContext's request_pool_init
* sees pool->initialized=true with stale slot pointers, returns
* 0 without re-CREATE_BUFS, and the next QBUF EINVALs because
* the slots reference buffer indices that no longer exist
* (Phase 5 v2 review CRIT-2).
*/
if (driver_data->output_pool.initialized)
request_pool_destroy(&driver_data->output_pool);
rc = v4l2_request_buffers(driver_data->video_fd, output_type, 0);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
/*
* Iter2 Fix 3 (still relevant under β): cap_pool owns the
* CAPTURE buffers' mmaps + any outstanding our_export_fds. Tear
* it down (which also issues REQBUFS(0) on CAPTURE), so the next
* CreateContext cycle sees a clean slate.
*/
cap_pool_destroy(&driver_data->capture_pool, driver_data->video_fd,
capture_type);
/*
* iter5b-β: driver_data->video_format is a static-ref pointer
* into video.c's formats[]; it stays valid for the life of the
* driver_data and intentionally survives DestroyContext cycles.
* The next CreateContext's `if (!driver_data->video_format)`
* guard skips the probe — correct, because the device's CAPTURE
* format menu doesn't change.
*
* The pre-β surface_reset_format_cache() call here is removed:
* β doesn't have a last_output_{width,height,pixelformat} cache
* (those fields are deleted). Each CreateContext is a fresh
* S_FMT(OUTPUT) cycle.
*
* Commit D: invalidate the format-uniform cache so a CreateSurfaces2
* call between DestroyContext and the next CreateContext doesn't
* lazy-fill with stale geometry from the now-torn-down session.
* The next CreateContext re-populates the cache.
*/
driver_data->fmt_valid = false;
/* iter39: clear 10-bit session flag — next CreateContext re-sets. */
driver_data->is_10bit = false;
rc = v4l2_request_buffers(driver_data->video_fd, capture_type, 0);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
return VA_STATUS_SUCCESS;
}
-24
View File
@@ -27,9 +27,6 @@
#ifndef _CONTEXT_H_
#define _CONTEXT_H_
#include <stdbool.h>
#include <stdint.h>
#include <va/va_backend.h>
#include "object_heap.h"
@@ -53,27 +50,6 @@ struct object_context {
/* H264 only */
struct h264_dpb dpb;
bool h264_start_code;
/*
* iter9 α-7: monotonic per-context timestamp counter (us). Replaces
* gettimeofday in EndPicture so DPB.reference_ts / OUTPUT QBUF ts
* are small values matching ffmpeg-v4l2request's pattern. Placed
* here (object_context) not driver_data per Phase 5 IMP-1 to avoid
* cross-context collisions.
*/
uint64_t timestamp_counter;
/* fresnel-fourier iter4: VP9 loop-filter delta state, persisted across
* frames per kernel UAPI <linux/v4l2-controls.h>:2578 ("If this syntax
* element is not present in the bitstream, users should pass its last
* value.") and VP9 spec defaults from FFmpeg vp9.c:666-671. Reset on
* keyframe / error-resilient / intra-only via vp9_lf.initialized=false. */
struct {
int8_t ref_deltas[4];
int8_t mode_deltas[2];
bool initialized;
} vp9_lf;
};
VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
+59 -629
View File
@@ -28,18 +28,17 @@
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/videodev2.h>
#include <h264-ctrls.h>
#include "request.h"
#include "utils.h"
#include "surface.h"
#include "v4l2.h"
#include "h264_slice_header.h"
enum h264_slice_type {
H264_SLICE_P = 0,
@@ -97,8 +96,7 @@ static struct h264_dpb_entry *dpb_find_entry(struct object_context *context)
}
static struct h264_dpb_entry *dpb_lookup(struct object_context *context,
VAPictureH264 *pic, unsigned int *idx,
unsigned char *fields)
VAPictureH264 *pic, unsigned int *idx)
{
unsigned int i;
@@ -112,16 +110,6 @@ static struct h264_dpb_entry *dpb_lookup(struct object_context *context,
if (idx)
*idx = i;
if (fields) {
//if (entry->pic.TopFieldOrderCnt < entry->pic.BottomFieldOrderCnt) {
// *fields = V4L2_H264_TOP_FIELD_REF;
//} else if (entry->pic.TopFieldOrderCnt > entry->pic.BottomFieldOrderCnt) {
// *fields = V4L2_H264_BOTTOM_FIELD_REF;
//} else {
*fields = V4L2_H264_FRAME_REF;
//}
}
return entry;
}
}
@@ -143,7 +131,7 @@ static void dpb_insert(struct object_context *context, VAPictureH264 *pic,
if (is_picture_null(pic))
return;
if (dpb_lookup(context, pic, NULL, NULL))
if (dpb_lookup(context, pic, NULL))
return;
if (!entry)
@@ -178,7 +166,7 @@ static void dpb_update(struct object_context *context,
if (is_picture_null(pic))
continue;
entry = dpb_lookup(context, pic, NULL, NULL);
entry = dpb_lookup(context, pic, NULL);
if (entry) {
entry->age = context->dpb.age;
entry->used = true;
@@ -188,61 +176,10 @@ static void dpb_update(struct object_context *context,
}
}
/*
* Strip ffmpeg-vaapi's POC sentinel.
*
* ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
* 0x10000 in libavcodec/h264dec.c (lines 301 and 444 of v8.0). After
* an IDR the idr() helper resets prev_poc_msb to that same sentinel.
* ff_h264_init_poc (libavcodec/h264_parse.c lines 296-305) then
* computes pc->poc_msb as prev_poc_msb when the slice header's
* poc_lsb hasn't wrapped — which is the typical case for normal
* content. The sentinel leaks into field_poc[] and from there into
* VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
* libavcodec/vaapi_h264.c::fill_vaapi_pic.
*
* Working VAAPI backends (intel-iHD, i965 verified empirically on
* meitner 2026-05-02) tolerate the high word — they either mask it
* or treat POCs as relative comparisons. V4L2 stateless H.264
* driver-side consumers (hantro_h264.c::prepare_table feeds the
* value direct to tbl->poc[]) need the spec value, so we strip the
* sentinel here at the libva-v4l2-request boundary.
*
* Detection by bit-16-set rather than blind subtraction so that a
* future ffmpeg version that fixes the sentinel leak degrades
* gracefully. POC values for non-degenerate H.264 content rarely
* exceed 16 bits; bit 16 set is a strong signal of the sentinel.
*
* Empty DPB slots (VA_PICTURE_H264_INVALID) carry POC=0 by
* libavcodec/vaapi_h264.c::init_vaapi_pic and need no fix-up.
*/
static inline int32_t h264_strip_ffmpeg_poc_sentinel(int32_t poc, uint32_t flags)
{
if (flags & VA_PICTURE_H264_INVALID)
return 0;
/*
* iter8 α-2: pass POC values through unchanged for rkvdec. The
* sentinel-subtract was added for hantro's tbl->poc[] prepare_table
* which fed the value through unmasked. rkvdec writes POC to MMIO
* via writel_relaxed (rkvdec-h264.c:975-978) and the macro
* RKVDEC_CUR_POC is a 32-bit passthrough. kdirect (ffmpeg-v4l2request)
* delivers the sentinel-encoded value directly and decodes
* correctly; libva's strip was the cause of the 16x32 partial-fill
* Bug 4 symptom. Hantro+H.264 isn't exercised on RK3399 (hantro-dec
* doesn't advertise H.264 there) — restoring the strip per-driver
* is iter9 work if it ever surfaces.
*/
return poc;
}
static void h264_fill_dpb(struct request_data *data,
struct object_context *context,
VAPictureParameterBufferH264 *VAPicture,
struct v4l2_ctrl_h264_decode_params *decode)
{
const int max_frame_num =
1 << (VAPicture->seq_fields.bits.log2_max_frame_num_minus4 + 4);
const int cur_frame_num = (int)VAPicture->frame_num;
int i;
for (i = 0; i < H264_DPB_SIZE; i++) {
@@ -252,26 +189,7 @@ static void h264_fill_dpb(struct request_data *data,
SURFACE(data, entry->pic.picture_id);
uint64_t timestamp;
/*
* Skip entries no longer referenced by the consumer's
* VAPictureParameterBufferH264.ReferenceFrames[]. dpb_update()
* clears `used` for all entries then re-marks only those in the
* current ReferenceFrames list; entries with valid=true but
* used=false are stale (a frame the libva consumer has retired
* from its DPB).
*
* Without this skip, our V4L2 dpb[] grows monotonically until
* H264_DPB_SIZE; by frame_num=10 it carries 7+ entries while
* SPS.max_num_ref_frames may be 4. The kernel reflist builder /
* cluster validator rejects the request with EINVAL once the
* count exceeds the SPS contract — which iter1+iter2+iter3
* surfaced as the "frame-11 EINVAL" carryover. iter4 fix:
* report only currently-used entries to match FFmpeg's
* libavcodec/v4l2_request_h264.c::fill_dpb behaviour (which
* iterates h->short_ref[] / h->long_ref[] — exactly the
* currently-referenced set).
*/
if (!entry->valid || !entry->used)
if (!entry->valid)
continue;
if (surface) {
@@ -280,47 +198,8 @@ static void h264_fill_dpb(struct request_data *data,
}
dpb->frame_num = entry->pic.frame_idx;
/*
* Per ext-ctrls-codec-stateless.rst, dpb[].pic_num must
* equal the H.264 spec's PicNum (8-28) for short-term refs
* or LongTermPicNum (8-29) for long-term refs.
*
* For frames (not field-coded), PicNum = FrameNumWrap.
* FrameNumWrap = (frame_num > cur_frame_num)
* ? frame_num - max_frame_num
* : frame_num
* (per spec section 8.2.4.1, frame_num wraparound).
*
* VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic
* line 64): VAPictureH264.frame_idx holds long_term_frame_idx
* for long-term refs and frame_num for short-term refs. So
* for long-term entries we copy frame_idx straight through
* as LongTermPicNum.
*
* fourier's previous code set pic_num to picture_id (the
* VAAPI surface id) which is unrelated to H.264 PicNum;
* mediatek's vdec_h264_req_common.c::dst_entry->pic_num is
* one consumer that fails on that. Hantro doesn't read
* pic_num at all (uses reference_ts for ref resolution),
* which is why fourier's wrong value never surfaced on
* PineTab2 (RK3566 via hantro/rk3568-vpu).
*/
if (entry->pic.flags & VA_PICTURE_H264_LONG_TERM_REFERENCE) {
dpb->pic_num = entry->pic.frame_idx;
} else {
int frame_num = (int)entry->pic.frame_idx;
dpb->pic_num = (frame_num > cur_frame_num)
? frame_num - max_frame_num
: frame_num;
}
dpb->top_field_order_cnt =
h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt,
entry->pic.flags);
dpb->bottom_field_order_cnt =
h264_strip_ffmpeg_poc_sentinel(entry->pic.BottomFieldOrderCnt,
entry->pic.flags);
dpb->top_field_order_cnt = entry->pic.TopFieldOrderCnt;
dpb->bottom_field_order_cnt = entry->pic.BottomFieldOrderCnt;
dpb->flags = V4L2_H264_DPB_ENTRY_FLAG_VALID;
@@ -329,27 +208,6 @@ static void h264_fill_dpb(struct request_data *data,
if (entry->pic.flags & VA_PICTURE_H264_LONG_TERM_REFERENCE)
dpb->flags |= V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM;
/*
* Mark this DPB entry as a frame reference (both top + bottom
* fields). The kernel's v4l2_h264_init_reflist_builder iterates
* dpb[] and skips entries whose `fields` member is zero — they
* count as "no valid field reference for this entry." For
* frame-coded streams (BBB and most desktop H.264) every
* reference is a frame reference; per UAPI doc
* (ext-ctrls-codec-stateless.rst), fields must be set to
* V4L2_H264_FRAME_REF (= TOP|BOTTOM) for frames.
*
* Cross-reference: FFmpeg libavcodec/v4l2_request_h264.c::
* fill_dpb_entry sets entry->fields from pic->reference; for
* frames pic->reference includes V4L2_H264_FRAME_REF. Without
* this, P-slices that need to walk the reference list (the
* first one in BBB is at frame 11) hit "no valid refs" inside
* the kernel's reflist builder and S_EXT_CTRLS rejects the
* whole request with EINVAL (error_idx == count, the kernel's
* "application bug" sentinel).
*/
dpb->fields = V4L2_H264_FRAME_REF;
}
}
@@ -361,126 +219,13 @@ static void h264_va_picture_to_v4l2(struct request_data *driver_data,
struct v4l2_ctrl_h264_pps *pps,
struct v4l2_ctrl_h264_sps *sps)
{
unsigned char *b;
unsigned char nal_ref_idc;
unsigned char nal_unit_type;
h264_fill_dpb(driver_data, context, decode);
/* Extract missing nal_ref_idc and nal_unit_type */
b = surface->source_data;
if (context->h264_start_code)
b += 3;
nal_ref_idc = (b[0] >> 5) & 0x3;
nal_unit_type = b[0] & 0x1f;
/*
* Bit-parse the slice_header() to recover fields VAAPI doesn't
* forward and that hantro G1 hardware reads out of DECODE_PARAMS:
*
* - dec_ref_pic_marking_bit_size -> G1_REG_DEC_CTRL5_REFPIC_MK_LEN
* - idr_pic_id -> G1_REG_DEC_CTRL5_IDR_PIC_ID
* - pic_order_cnt_bit_size -> G1_REG_DEC_CTRL6_POC_LENGTH
* - pic_order_cnt_lsb / delta_pic_order_cnt_* (used by hantro
* reference-list builder for poc_type=0/1 inter prediction)
*
* Without these set correctly, hantro's hardware bitstream parser
* walks past zero bits, lands on garbage, decodes zero pixels —
* the all-zero CAPTURE output observed during 2026-05-04 Phase 0.
*
* Spec: ITU-T H.264 §7.3.3 slice_header. Cross-reference (proven
* working): FFmpeg libavcodec/h264_slice.c populates
* H264SliceContext::ref_pic_marking_bit_size and
* pic_order_cnt_bit_size by the same bit-precise parse.
*/
{
const struct h264_slice_header_context sh_ctx = {
.separate_colour_plane_flag =
(VAPicture->seq_fields.bits.residual_colour_transform_flag != 0),
.log2_max_frame_num_minus4 =
VAPicture->seq_fields.bits.log2_max_frame_num_minus4,
.frame_mbs_only_flag =
(VAPicture->seq_fields.bits.frame_mbs_only_flag != 0),
.pic_order_cnt_type =
VAPicture->seq_fields.bits.pic_order_cnt_type,
.log2_max_pic_order_cnt_lsb_minus4 =
VAPicture->seq_fields.bits.log2_max_pic_order_cnt_lsb_minus4,
.delta_pic_order_always_zero_flag =
(VAPicture->seq_fields.bits.delta_pic_order_always_zero_flag != 0),
.bottom_field_pic_order_in_frame_present_flag =
(VAPicture->pic_fields.bits.pic_order_present_flag != 0),
.redundant_pic_cnt_present_flag =
(VAPicture->pic_fields.bits.redundant_pic_cnt_present_flag != 0),
.weighted_pred_flag =
(VAPicture->pic_fields.bits.weighted_pred_flag != 0),
.weighted_bipred_idc =
VAPicture->pic_fields.bits.weighted_bipred_idc,
.num_ref_idx_l0_default_active_minus1 =
surface->params.h264.slice.num_ref_idx_l0_active_minus1,
.num_ref_idx_l1_default_active_minus1 =
surface->params.h264.slice.num_ref_idx_l1_active_minus1,
.chroma_format_idc =
VAPicture->seq_fields.bits.chroma_format_idc,
.bit_depth_luma_minus8 =
VAPicture->bit_depth_luma_minus8,
.bit_depth_chroma_minus8 =
VAPicture->bit_depth_chroma_minus8,
.nal_unit_type = nal_unit_type,
.nal_ref_idc = nal_ref_idc,
};
struct h264_slice_header_info sh = { 0 };
unsigned char *nal_payload = b + 1; /* past NAL header byte */
size_t nal_payload_len = surface->slices_size -
(size_t)((nal_payload) - (unsigned char *)surface->source_data);
int sh_rc = h264_parse_slice_header(nal_payload, nal_payload_len,
&sh_ctx, &sh);
if (sh_rc == 0) {
decode->idr_pic_id = sh.idr_pic_id;
decode->pic_order_cnt_lsb = sh.pic_order_cnt_lsb;
decode->delta_pic_order_cnt_bottom = sh.delta_pic_order_cnt_bottom;
decode->delta_pic_order_cnt0 = sh.delta_pic_order_cnt0;
decode->delta_pic_order_cnt1 = sh.delta_pic_order_cnt1;
decode->pic_order_cnt_bit_size = sh.pic_order_cnt_bit_size;
decode->dec_ref_pic_marking_bit_size = sh.dec_ref_pic_marking_bit_size;
} else {
request_log("slice_header parse FAILED rc=%d "
"(payload_len=%zu) — DECODE_PARAMS bit_size "
"fields left zero, hantro will likely produce zeros\n",
sh_rc, nal_payload_len);
}
}
h264_fill_dpb(driver_data, context, VAPicture, decode);
/*
* Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field
* we can derive from VAAPI's pre-parsed VAPictureParameterBuffer
* + bitstream byte. Cross-reference: GStreamer
* gstv4l2codech264dec.c::gst_v4l2_codec_h264_dec_fill_decoder_params
* (lines 632-678).
*
* Fields not derivable from VAAPI (idr_pic_id, pic_order_cnt_lsb,
* delta_pic_order_cnt_*, dec_ref_pic_marking_bit_size,
* pic_order_cnt_bit_size, slice_group_change_cycle) require a
* full slice_header() bit-level parse, which libva-v4l2-request
* does not currently do. They are left at zero-init and the
* kernel-side hantro-vpu may compute them itself when scanning
* the OUTPUT bitstream — a hypothesis verified empirically by
* running this patch and inspecting the CAPTURE buffer.
*/
decode->nal_ref_idc = nal_ref_idc;
decode->frame_num = VAPicture->frame_num;
decode->top_field_order_cnt =
h264_strip_ffmpeg_poc_sentinel(VAPicture->CurrPic.TopFieldOrderCnt,
VAPicture->CurrPic.flags);
decode->bottom_field_order_cnt =
h264_strip_ffmpeg_poc_sentinel(VAPicture->CurrPic.BottomFieldOrderCnt,
VAPicture->CurrPic.flags);
if (nal_unit_type == 5)
decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
if (VAPicture->pic_fields.bits.field_pic_flag)
decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC;
if (VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD)
decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD;
/* num_slices is no longer carried by v4l2_ctrl_h264_decode_params; the
* kernel infers slice count from the number of slice_params controls
* queued (one per slice via the request API). */
decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
decode->bottom_field_order_cnt = VAPicture->CurrPic.BottomFieldOrderCnt;
pps->weighted_bipred_idc =
VAPicture->pic_fields.bits.weighted_bipred_idc;
@@ -513,7 +258,6 @@ static void h264_va_picture_to_v4l2(struct request_data *driver_data,
if (VAPicture->pic_fields.bits.redundant_pic_cnt_present_flag)
pps->flags |= V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT;
sps->max_num_ref_frames = VAPicture->num_ref_frames;
sps->chroma_format_idc = VAPicture->seq_fields.bits.chroma_format_idc;
sps->bit_depth_luma_minus8 = VAPicture->bit_depth_luma_minus8;
sps->bit_depth_chroma_minus8 = VAPicture->bit_depth_chroma_minus8;
@@ -560,32 +304,6 @@ static void h264_va_matrix_to_v4l2(struct request_data *driver_data,
sizeof(v4l2_matrix->scaling_list_8x8[3]));
}
/*
* H.264 spec default scaling matrices: Flat_4x4_16 and Flat_8x8_16
* (every entry = 16). When sps_scaling_matrix_present_flag and
* pps_scaling_matrix_present_flag are both false, the bitstream
* carries no explicit scaling lists and the decoder uses these
* flat defaults — matching ITU-T H.264 (08/2024) §7.4.2.1.1.1
* (sequence scaling) and §7.4.2.2 (picture scaling).
*
* Why we always provide the matrix: hantro G1's set_params reads
* pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT to drive
* the G1_REG_DEC_CTRL2_TYPE1_QUANT_E hardware bit. FFmpeg's
* v4l2_request_h264.c always submits the SCALING_MATRIX control
* with the spec default when the bitstream omits explicit lists,
* and always sets the SCALING_MATRIX_PRESENT flag (commit
* comment: "FFmpeg always provide a scaling matrix"). We mirror
* that so the kernel sees a consistent control set across drivers.
*/
static void h264_default_flat_scaling_matrix(
struct v4l2_ctrl_h264_scaling_matrix *v4l2_matrix)
{
memset(v4l2_matrix->scaling_list_4x4, 16,
sizeof(v4l2_matrix->scaling_list_4x4));
memset(v4l2_matrix->scaling_list_8x8, 16,
sizeof(v4l2_matrix->scaling_list_8x8));
}
static void h264_copy_pred_table(struct v4l2_h264_weight_factors *factors,
unsigned int num_refs,
int16_t luma_weight[32],
@@ -613,11 +331,12 @@ static void h264_va_slice_to_v4l2(struct request_data *driver_data,
VASliceParameterBufferH264 *VASlice,
VAPictureParameterBufferH264 *VAPicture,
struct v4l2_ctrl_h264_slice_params *slice,
struct v4l2_ctrl_h264_pred_weights *weights)
struct v4l2_ctrl_h264_pred_weights *pred_weights)
{
/* Slice byte size is now communicated via the OUTPUT buffer's
* bytesused at QBUF time; the v4l2_ctrl_h264_slice_params struct
* was slimmed during upstreaming and no longer carries it. */
slice->header_bit_size = VASlice->slice_data_bit_offset;
//if (context->h264_start_code)
// slice->header_bit_size += 3 * 8;
slice->first_mb_in_slice = VASlice->first_mb_in_slice;
slice->slice_type = VASlice->slice_type;
slice->cabac_init_idc = VASlice->cabac_init_idc;
@@ -638,14 +357,13 @@ static void h264_va_slice_to_v4l2(struct request_data *driver_data,
VAPictureH264 *pic = &VASlice->RefPicList0[i];
struct h264_dpb_entry *entry;
unsigned int idx;
unsigned char fields;
entry = dpb_lookup(context, pic, &idx, &fields);
entry = dpb_lookup(context, pic, &idx);
if (!entry)
continue;
slice->ref_pic_list0[i].index = idx;
slice->ref_pic_list0[i].fields = fields;
slice->ref_pic_list0[i].fields = V4L2_H264_FRAME_REF;
}
}
@@ -659,28 +377,29 @@ static void h264_va_slice_to_v4l2(struct request_data *driver_data,
VAPictureH264 *pic = &VASlice->RefPicList1[i];
struct h264_dpb_entry *entry;
unsigned int idx;
unsigned char fields;
entry = dpb_lookup(context, pic, &idx, &fields);
entry = dpb_lookup(context, pic, &idx);
if (!entry)
continue;
slice->ref_pic_list1[i].index = idx;
slice->ref_pic_list1[i].fields = fields;
slice->ref_pic_list1[i].fields = V4L2_H264_FRAME_REF;
}
}
if (VASlice->direct_spatial_mv_pred_flag)
slice->flags |= V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED;
weights->chroma_log2_weight_denom =
/* Predicted weight table moved out of slice_params into its own
* V4L2_CID_STATELESS_H264_PRED_WEIGHTS control during upstreaming. */
pred_weights->chroma_log2_weight_denom =
VASlice->chroma_log2_weight_denom;
weights->luma_log2_weight_denom =
pred_weights->luma_log2_weight_denom =
VASlice->luma_log2_weight_denom;
if (((VASlice->slice_type % 5) == H264_SLICE_P) ||
((VASlice->slice_type % 5) == H264_SLICE_B))
h264_copy_pred_table(&weights->weight_factors[0],
h264_copy_pred_table(&pred_weights->weight_factors[0],
slice->num_ref_idx_l0_active_minus1 + 1,
VASlice->luma_weight_l0,
VASlice->luma_offset_l0,
@@ -688,7 +407,7 @@ static void h264_va_slice_to_v4l2(struct request_data *driver_data,
VASlice->chroma_offset_l0);
if ((VASlice->slice_type % 5) == H264_SLICE_B)
h264_copy_pred_table(&weights->weight_factors[1],
h264_copy_pred_table(&pred_weights->weight_factors[1],
slice->num_ref_idx_l1_active_minus1 + 1,
VASlice->luma_weight_l1,
VASlice->luma_offset_l1,
@@ -696,130 +415,21 @@ static void h264_va_slice_to_v4l2(struct request_data *driver_data,
VASlice->chroma_offset_l1);
}
int h264_get_controls(struct request_data *driver_data,
struct object_context *context)
{
struct v4l2_ext_control controls[2] = {
{
.id = V4L2_CID_STATELESS_H264_DECODE_MODE,
}, {
.id = V4L2_CID_STATELESS_H264_START_CODE,
}
};
int rc;
rc = v4l2_get_controls(driver_data->video_fd, -1, controls, 2);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
switch (controls[0].value) {
case V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED:
break;
case V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED:
break;
default:
request_log("Unsupported decode mode\n");
return VA_STATUS_ERROR_OPERATION_FAILED;
}
switch (controls[1].value) {
case V4L2_STATELESS_H264_START_CODE_NONE:
context->h264_start_code = false;
break;
case V4L2_STATELESS_H264_START_CODE_ANNEX_B:
context->h264_start_code = true;
break;
default:
request_log("Unsupported start code\n");
return VA_STATUS_ERROR_OPERATION_FAILED;
}
return VA_STATUS_SUCCESS;
}
static inline __u8 h264_profile_to_idc(VAProfile profile)
{
switch (profile) {
case VAProfileH264Main:
return 77;
case VAProfileH264High:
return 100;
case VAProfileH264ConstrainedBaseline:
return 66;
case VAProfileH264MultiviewHigh:
return 118;
case VAProfileH264StereoHigh:
return 128;
default:
return 0;
}
}
/*
* Derive sps.level_idc from the encoded frame size in macroblocks per
* H.264 Annex A.3 (Table A-1) MaxFS thresholds. Each level's MaxFS is
* the maximum encoded frame size in MBs the level supports; we pick
* the smallest level whose MaxFS contains the actual frame size.
*
* Level decoding for the V4L2 control: level_idc = level * 10
* Level 1.0 → 10, Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60.
*
* VAAPI does not expose the bitstream's actual level_idc on the
* decode side (VAPictureParameterBufferH264 has no such field) — see
* va.h. The H.264 SPS NAL is parsed client-side by ffmpeg-vaapi /
* mpv and only slice data is forwarded in VASliceDataBuffer, so a
* SPS-NAL byte parser is not viable at this layer.
*
* Without framerate we cannot also check MaxMBPS / MaxBR / MaxCPB.
* That gap is acceptable in practice: consumers that push
* temporally-dense streams (high MBPS) almost always also push
* spatially-large frames (high MaxFS), so frame-size-based level
* selection over-allocates on the temporal axis but never
* under-allocates a level the consumer relies on for correct
* decode-resource sizing.
*
* Picks for typical content:
* 1080p (8160 MBs) → Level 4.1 (level_idc = 41)
* 4K (32400 MBs) → Level 5.1 (level_idc = 51)
* 8K (138240 MBs) → Level 6.0 (level_idc = 60)
*
* Replaces the hardcoded level_idc=51 from patch 0013.
*/
static inline __u8 h264_derive_level_idc(unsigned int width_in_mbs,
unsigned int height_in_mbs)
{
const unsigned int frame_size_mbs = width_in_mbs * height_in_mbs;
if (frame_size_mbs <= 99) return 10; /* Level 1.0 */
if (frame_size_mbs <= 396) return 11; /* Level 1.1 - 2.0 */
if (frame_size_mbs <= 792) return 21; /* Level 2.1 */
if (frame_size_mbs <= 1620) return 22; /* Level 2.2 - 3.0 */
if (frame_size_mbs <= 3600) return 31; /* Level 3.1 */
if (frame_size_mbs <= 5120) return 32; /* Level 3.2 */
if (frame_size_mbs <= 8192) return 41; /* Level 4.0 - 4.1 */
if (frame_size_mbs <= 8704) return 42; /* Level 4.2 */
if (frame_size_mbs <= 22080) return 50; /* Level 5.0 */
if (frame_size_mbs <= 36864) return 51; /* Level 5.1 - 5.2 */
if (frame_size_mbs <= 139264) return 60; /* Level 6.0 - 6.2 */
return 62; /* > Level 6 ceiling */
}
int h264_set_controls(struct request_data *driver_data,
struct object_context *context,
VAProfile profile,
struct object_surface *surface)
{
struct v4l2_ctrl_h264_scaling_matrix matrix = { 0 };
struct v4l2_ctrl_h264_decode_params decode = { 0 };
struct v4l2_ctrl_h264_slice_params slice = { 0 };
struct v4l2_ctrl_h264_pred_weights weights = { 0 };
struct v4l2_ctrl_h264_pred_weights pred_weights = { 0 };
struct v4l2_ctrl_h264_pps pps = { 0 };
struct v4l2_ctrl_h264_sps sps = { 0 };
struct h264_dpb_entry *output;
int rc;
output = dpb_lookup(context, &surface->params.h264.picture.CurrPic,
NULL, NULL);
NULL);
if (!output)
output = dpb_find_entry(context);
@@ -827,227 +437,47 @@ int h264_set_controls(struct request_data *driver_data,
dpb_update(context, &surface->params.h264.picture);
/*
* Dump the raw VAAPI fields at the libva boundary so issue #8
* follow-up can disambiguate "ffmpeg-vaapi didn't populate" from
* "downstream consumer (daedalus_v4l2 wire protocol) corrupted the
* value". One-line; safe to leave in — costs a single printf per frame.
*/
request_log("h264_set_controls: VAProfile=%d seq_fields=0x%08x pic_fields=0x%08x num_ref_frames=%u bit_depth_luma_m8=%u bit_depth_chroma_m8=%u w_mbs_m1=%u h_mbs_m1=%u\n",
(int)profile,
surface->params.h264.picture.seq_fields.value,
surface->params.h264.picture.pic_fields.value,
surface->params.h264.picture.num_ref_frames,
surface->params.h264.picture.bit_depth_luma_minus8,
surface->params.h264.picture.bit_depth_chroma_minus8,
surface->params.h264.picture.picture_width_in_mbs_minus1,
surface->params.h264.picture.picture_height_in_mbs_minus1);
h264_va_picture_to_v4l2(driver_data, context, surface,
&surface->params.h264.picture,
&decode, &pps, &sps);
/*
* max_num_ref_frames fallback. Some VAAPI clients (older ffmpeg-vaapi
* paths, some daedalus_v4l2 consumers) leave VAPicture->num_ref_frames
* at zero. Hardware decoders tolerate; libavcodec-via-daedalus enforces
* sps.max_num_ref_frames strictly and rejects every frame.
*
* Count valid DPB entries first (the bitstream-true reference count we
* can see); fall back to a per-profile spec minimum if even that is 0.
* See marfrit/libva-v4l2-request-fourier issue #8.
*/
if (sps.max_num_ref_frames == 0) {
unsigned int valid = 0;
unsigned int i;
for (i = 0; i < 16; i++) {
const VAPictureH264 *ref =
&surface->params.h264.picture.ReferenceFrames[i];
if (!(ref->flags & VA_PICTURE_H264_INVALID))
valid++;
}
if (valid > 0) {
sps.max_num_ref_frames = (uint8_t)valid;
} else {
switch (profile) {
case VAProfileH264ConstrainedBaseline:
sps.max_num_ref_frames = 1;
break;
case VAProfileH264Main:
case VAProfileH264High:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
default:
sps.max_num_ref_frames = 4;
break;
}
}
}
/*
* Populate the scaling matrix unconditionally: from VAAPI's
* VAIQMatrixBufferH264 when the consumer sent one this frame
* (matrix_set), otherwise from the H.264 spec flat defaults.
* Submitted to the kernel as V4L2_CID_STATELESS_H264_SCALING_MATRIX
* for every request — required for FFmpeg/hantro contract parity
* (see h264_default_flat_scaling_matrix() docblock).
*/
if (surface->params.h264.matrix_set)
h264_va_matrix_to_v4l2(driver_data, context,
&surface->params.h264.matrix, &matrix);
else
h264_default_flat_scaling_matrix(&matrix);
h264_va_matrix_to_v4l2(driver_data, context,
&surface->params.h264.matrix, &matrix);
h264_va_slice_to_v4l2(driver_data, context,
&surface->params.h264.slice,
&surface->params.h264.picture, &slice, &weights);
&surface->params.h264.picture, &slice,
&pred_weights);
/*
* Mirror SCALING_MATRIX_PRESENT in PPS flags. Hantro G1 set_params
* gates its G1_REG_DEC_CTRL2_TYPE1_QUANT_E register bit on this;
* FFmpeg sets it unconditionally with the comment "FFmpeg always
* provide a scaling matrix." We submit the matrix always (above),
* so the flag must be set always to match.
*/
pps.flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT;
rc = v4l2_set_control(driver_data->video_fd, surface->request_fd,
V4L2_CID_STATELESS_H264_DECODE_PARAMS, &decode,
sizeof(decode));
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
/*
* Populate pps->num_ref_idx_l0/l1_default_active_minus1. Hantro G1
* writes both into G1_REG_DEC_CTRL6_REFIDX0_ACTIVE / REFIDX1_ACTIVE
* MMIO registers (via "(field) + 1", so an uninitialized 0 here
* would advertise "1 active reference per list" to hardware, wrong
* for I/IDR frames with 0 refs and wrong for B frames with >1).
*
* VAAPI's VAPictureParameterBufferH264 does not carry the parsed
* PPS num_ref_idx_l*_default_active_minus1 fields — those are in
* the bitstream's PPS NAL which VAAPI consumers parse client-side
* but don't forward. The closest available source is VASlice's
* num_ref_idx_l*_active_minus1, which is the per-slice override
* defaulting to the PPS value (H.264 §7.4.3 num_ref_idx_active_
* override_flag). For most streams these values match; mismatch
* only on streams with explicit per-slice overrides.
*
* For IDR frames (no references), the values are not used by
* hantro's reference list builder, so a wrong value here is
* harmless. For inter frames it matters and slice-derived is
* the best we can do without a full PPS-NAL parser.
*/
pps.num_ref_idx_l0_default_active_minus1 =
surface->params.h264.slice.num_ref_idx_l0_active_minus1;
pps.num_ref_idx_l1_default_active_minus1 =
surface->params.h264.slice.num_ref_idx_l1_active_minus1;
rc = v4l2_set_control(driver_data->video_fd, surface->request_fd,
V4L2_CID_STATELESS_H264_SLICE_PARAMS, &slice,
sizeof(slice));
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
/*
* Derive PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags
* from VASliceParameterBufferH264.slice_type. VAAPI's slice_type
* matches the H.264 spec slice_type semantic: 0=P, 1=B, 2=I, 3=SP,
* 4=SI; values 5..9 mean "all slices in the picture have this
* slice_type" (mod 5 yields the underlying type). VAAPI consumers
* (ffmpeg, mpv) populate this for every slice; in FRAME_BASED mode
* we only see the most-recent slice's params, but slice_type is
* uniform across a single coded picture for our purposes.
*
* Kernel consumers that read these flags: tegra-vde
* (drivers/media/platform/nvidia/tegra-vde/h264.c lines 783-799 of
* 6.19.x) selects the inter-frame decode kernel. Hantro / rkvdec /
* cedrus / mediatek / qcom-iris-stateless do not consume them.
* Setting them keeps the libva-v4l2-request fork upstreamable
* across drivers without affecting hantro behaviour.
*
* Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters
* Flags — V4L2_H264_DECODE_PARAM_FLAG_PFRAME / _BFRAME.
*/
switch (surface->params.h264.slice.slice_type % 5) {
case H264_SLICE_P:
decode.flags |= V4L2_H264_DECODE_PARAM_FLAG_PFRAME;
break;
case H264_SLICE_B:
decode.flags |= V4L2_H264_DECODE_PARAM_FLAG_BFRAME;
break;
default:
/* I / SP / SI: no extra flag. */
break;
}
rc = v4l2_set_control(driver_data->video_fd, surface->request_fd,
V4L2_CID_STATELESS_H264_PRED_WEIGHTS,
&pred_weights, sizeof(pred_weights));
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
sps.profile_idc = h264_profile_to_idc(profile);
rc = v4l2_set_control(driver_data->video_fd, surface->request_fd,
V4L2_CID_STATELESS_H264_PPS, &pps, sizeof(pps));
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
/*
* Derive level_idc from encoded frame size per H.264 Annex A.3.
* VAAPI doesn't expose level_idc on the decode side (see
* h264_derive_level_idc()'s docblock for the rationale); we pick
* the smallest level whose MaxFS contains the picture dimensions.
* Replaces patch 0013's intermediate hardcode of 51.
*/
sps.level_idc = h264_derive_level_idc(
(unsigned int)surface->params.h264.picture.picture_width_in_mbs_minus1 + 1u,
(unsigned int)surface->params.h264.picture.picture_height_in_mbs_minus1 + 1u);
rc = v4l2_set_control(driver_data->video_fd, surface->request_fd,
V4L2_CID_STATELESS_H264_SPS, &sps, sizeof(sps));
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
/*
* Build the per-request control list incrementally:
* - SPS, PPS, DECODE_PARAMS, SCALING_MATRIX: always required.
* Hantro G1 reads the SCALING_MATRIX_PRESENT flag from PPS to
* gate hardware register G1_REG_DEC_CTRL2_TYPE1_QUANT_E and
* reads the matrix entries directly into hardware tables when
* decoding. FFmpeg always submits the matrix (with spec-default
* flat values when no explicit lists are in the bitstream); we
* match that — see h264_default_flat_scaling_matrix() docblock.
* Earlier patch 0012 made SCALING_MATRIX submission conditional
* on VAAPI's VAIQMatrixBuffer arrival; that was corpus-correct
* (bbb has no explicit scaling lists) but inconsistent with the
* hantro contract — replaced 2026-05-04.
* - SLICE_PARAMS: SLICE_BASED only. Kernel doc
* ext-ctrls-codec-stateless.rst (FRAME_BASED entry):
* "When this mode is selected, the
* V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall not be
* set." Submitting it under FRAME_BASED triggers cluster-
* validation EINVAL at error_idx=count.
* - PRED_WEIGHTS: SLICE_BASED + V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
*
* Patch 0002 unconditionally sets the device to FRAME_BASED,
* so slice_based is hardcoded false here. When the planned
* probe-then-set commit lands, this becomes
* context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED.
*/
struct v4l2_ext_control controls[6] = { 0 };
unsigned int num_controls = 0;
const bool slice_based = false; /* TODO: probe via context->decode_mode */
controls[num_controls].id = V4L2_CID_STATELESS_H264_SPS;
controls[num_controls].p_h264_sps = &sps;
controls[num_controls].size = sizeof(sps);
num_controls++;
controls[num_controls].id = V4L2_CID_STATELESS_H264_PPS;
controls[num_controls].p_h264_pps = &pps;
controls[num_controls].size = sizeof(pps);
num_controls++;
controls[num_controls].id = V4L2_CID_STATELESS_H264_DECODE_PARAMS;
controls[num_controls].p_h264_decode_params = &decode;
controls[num_controls].size = sizeof(decode);
num_controls++;
controls[num_controls].id = V4L2_CID_STATELESS_H264_SCALING_MATRIX;
controls[num_controls].p_h264_scaling_matrix = &matrix;
controls[num_controls].size = sizeof(matrix);
num_controls++;
if (slice_based) {
controls[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
controls[num_controls].p_h264_slice_params = &slice;
controls[num_controls].size = sizeof(slice);
num_controls++;
if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice)) {
controls[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
controls[num_controls].ptr = &weights;
controls[num_controls].size = sizeof(weights);
num_controls++;
}
}
rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
controls, num_controls);
rc = v4l2_set_control(driver_data->video_fd, surface->request_fd,
V4L2_CID_STATELESS_H264_SCALING_MATRIX, &matrix,
sizeof(matrix));
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
-3
View File
@@ -51,11 +51,8 @@ struct h264_dpb {
unsigned int age;
};
int h264_get_controls(struct request_data *driver_data,
struct object_context *context);
int h264_set_controls(struct request_data *data,
struct object_context *context,
VAProfile profile,
struct object_surface *surface);
#endif
-361
View File
@@ -1,361 +0,0 @@
/*
* H.264 slice header bit-parser implementation.
*
* Implements just enough of ITU-T Rec. H.264 (08/2024) §7.3.3
* slice_header to populate the V4L2 DECODE_PARAMS bit-position
* fields (idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
* pic_order_cnt_bit_size, dec_ref_pic_marking_bit_size).
*
* Skips through ref_pic_list_modification() and pred_weight_table()
* because dec_ref_pic_marking() (whose bit length we need) comes
* after them. MVC extensions (nal_unit_type 20/21) are not handled
* — this fork strips MVC alongside HEVC.
*/
#include "h264_slice_header.h"
#include <errno.h>
#include <string.h>
/*
* Minimal RBSP bit reader. Reads bits MSB-first. Tracks bit_pos for
* caller use (e.g. computing the size of a syntax element by
* pre/post bit_pos delta).
*/
struct br {
const uint8_t *data;
size_t length; /* bytes */
size_t bit_pos;
bool error;
};
static uint32_t br_read_u(struct br *b, unsigned n)
{
uint32_t v = 0;
while (n--) {
if (b->bit_pos >= b->length * 8) {
b->error = true;
return 0;
}
v = (v << 1) | ((b->data[b->bit_pos >> 3] >>
(7 - (b->bit_pos & 7))) & 1u);
b->bit_pos++;
}
return v;
}
static uint32_t br_read_ue(struct br *b)
{
unsigned zeros = 0;
while (br_read_u(b, 1) == 0) {
if (b->error || ++zeros >= 32)
return 0;
}
if (zeros == 0)
return 0;
return (1u << zeros) - 1u + br_read_u(b, zeros);
}
static int32_t br_read_se(struct br *b)
{
uint32_t v = br_read_ue(b);
if (v & 1u)
return (int32_t)((v + 1u) >> 1);
return -(int32_t)(v >> 1);
}
/*
* RBSP unescape: strip emulation prevention bytes (after every
* 0x00 0x00 in the encoded stream, an extra 0x03 is inserted to
* prevent byte-aligned start-code emulation; we strip those before
* bit-parsing). Output buffer must be at least as large as input.
*
* Slice headers are short (<100 bits typically), so we unescape
* only the first H264_SLICE_HEADER_SCAN_BYTES = 64 input bytes.
* That covers any realistic slice header including
* dec_ref_pic_marking() and a generous safety margin.
*/
#define H264_SLICE_HEADER_SCAN_BYTES 64
static size_t rbsp_unescape(uint8_t *out, const uint8_t *in,
size_t in_len)
{
size_t out_len = 0;
int zero_run = 0;
size_t i;
size_t cap = in_len < H264_SLICE_HEADER_SCAN_BYTES ?
in_len : H264_SLICE_HEADER_SCAN_BYTES;
for (i = 0; i < cap; i++) {
if (zero_run >= 2 && in[i] == 0x03) {
zero_run = 0;
continue;
}
out[out_len++] = in[i];
zero_run = (in[i] == 0x00) ? zero_run + 1 : 0;
}
return out_len;
}
/*
* §7.3.3.1 ref_pic_list_modification() — skip past it without
* keeping any values. Length depends on slice_type and the loop
* terminator modification_of_pic_nums_idc == 3.
*/
static void skip_ref_pic_list_modification(struct br *b,
uint32_t slice_type)
{
uint32_t st_mod5 = slice_type % 5;
if (st_mod5 != 2 && st_mod5 != 4) {
/* P, SP, B */
uint32_t ref_pic_list_modification_flag_l0 = br_read_u(b, 1);
if (ref_pic_list_modification_flag_l0) {
uint32_t mod_idc;
do {
mod_idc = br_read_ue(b);
if (mod_idc == 0 || mod_idc == 1)
br_read_ue(b); /* abs_diff_pic_num_minus1 */
else if (mod_idc == 2)
br_read_ue(b); /* long_term_pic_num */
if (b->error)
return;
} while (mod_idc != 3);
}
}
if (st_mod5 == 1) {
/* B */
uint32_t ref_pic_list_modification_flag_l1 = br_read_u(b, 1);
if (ref_pic_list_modification_flag_l1) {
uint32_t mod_idc;
do {
mod_idc = br_read_ue(b);
if (mod_idc == 0 || mod_idc == 1)
br_read_ue(b);
else if (mod_idc == 2)
br_read_ue(b);
if (b->error)
return;
} while (mod_idc != 3);
}
}
}
/*
* §7.3.3.2 pred_weight_table() — skip past it. Length depends on
* the active reference counts and chroma_format_idc.
*/
static void skip_pred_weight_table(struct br *b,
uint32_t slice_type,
uint8_t chroma_format_idc,
uint8_t bit_depth_luma_minus8,
uint8_t bit_depth_chroma_minus8,
uint32_t num_ref_idx_l0_active_minus1,
uint32_t num_ref_idx_l1_active_minus1)
{
uint32_t i, j;
uint32_t st_mod5 = slice_type % 5;
(void)bit_depth_luma_minus8;
(void)bit_depth_chroma_minus8;
br_read_ue(b); /* luma_log2_weight_denom */
if (chroma_format_idc != 0)
br_read_ue(b); /* chroma_log2_weight_denom */
for (i = 0; i <= num_ref_idx_l0_active_minus1 && !b->error; i++) {
uint32_t luma_weight_l0_flag = br_read_u(b, 1);
if (luma_weight_l0_flag) {
br_read_se(b); /* luma_weight_l0 */
br_read_se(b); /* luma_offset_l0 */
}
if (chroma_format_idc != 0) {
uint32_t chroma_weight_l0_flag = br_read_u(b, 1);
if (chroma_weight_l0_flag) {
for (j = 0; j < 2; j++) {
br_read_se(b);
br_read_se(b);
}
}
}
}
if (st_mod5 == 1) {
for (i = 0; i <= num_ref_idx_l1_active_minus1 && !b->error; i++) {
uint32_t luma_weight_l1_flag = br_read_u(b, 1);
if (luma_weight_l1_flag) {
br_read_se(b);
br_read_se(b);
}
if (chroma_format_idc != 0) {
uint32_t chroma_weight_l1_flag = br_read_u(b, 1);
if (chroma_weight_l1_flag) {
for (j = 0; j < 2; j++) {
br_read_se(b);
br_read_se(b);
}
}
}
}
}
}
int h264_parse_slice_header(const uint8_t *nal_payload,
size_t nal_payload_length,
const struct h264_slice_header_context *ctx,
struct h264_slice_header_info *out)
{
uint8_t unescaped[H264_SLICE_HEADER_SCAN_BYTES];
size_t unescaped_len;
struct br b = { 0 };
bool idr_pic_flag = (ctx->nal_unit_type == 5);
uint32_t slice_type;
uint32_t num_ref_idx_l0_active_minus1;
uint32_t num_ref_idx_l1_active_minus1;
size_t pic_order_cnt_start;
size_t pic_order_cnt_end;
size_t dec_ref_pic_marking_start;
size_t dec_ref_pic_marking_end;
bool field_pic_flag = false;
memset(out, 0, sizeof(*out));
if (!nal_payload || nal_payload_length == 0)
return -EINVAL;
unescaped_len = rbsp_unescape(unescaped, nal_payload,
nal_payload_length);
if (unescaped_len < 2)
return -EINVAL;
b.data = unescaped;
b.length = unescaped_len;
b.bit_pos = 0;
b.error = false;
/* slice_header() per §7.3.3 */
out->first_mb_in_slice = br_read_ue(&b);
slice_type = br_read_ue(&b);
out->slice_type = slice_type;
out->pic_parameter_set_id = br_read_ue(&b);
if (ctx->separate_colour_plane_flag)
(void)br_read_u(&b, 2); /* colour_plane_id */
out->frame_num = br_read_u(&b, ctx->log2_max_frame_num_minus4 + 4u);
if (!ctx->frame_mbs_only_flag) {
field_pic_flag = (br_read_u(&b, 1) != 0);
if (field_pic_flag)
(void)br_read_u(&b, 1); /* bottom_field_flag */
}
if (idr_pic_flag)
out->idr_pic_id = (uint16_t)br_read_ue(&b);
/*
* pic_order_cnt syntax — measure bit length from the start of
* pic_order_cnt_lsb / delta_pic_order_cnt[0] to the end of
* delta_pic_order_cnt_bottom / delta_pic_order_cnt[1]. This is
* what V4L2 calls pic_order_cnt_bit_size and what hantro G1
* writes into G1_REG_DEC_CTRL6_POC_LENGTH.
*/
pic_order_cnt_start = b.bit_pos;
if (ctx->pic_order_cnt_type == 0) {
out->pic_order_cnt_lsb = (uint16_t)br_read_u(
&b, ctx->log2_max_pic_order_cnt_lsb_minus4 + 4u);
if (ctx->bottom_field_pic_order_in_frame_present_flag &&
!field_pic_flag)
out->delta_pic_order_cnt_bottom = br_read_se(&b);
} else if (ctx->pic_order_cnt_type == 1 &&
!ctx->delta_pic_order_always_zero_flag) {
out->delta_pic_order_cnt0 = br_read_se(&b);
if (ctx->bottom_field_pic_order_in_frame_present_flag &&
!field_pic_flag)
out->delta_pic_order_cnt1 = br_read_se(&b);
}
pic_order_cnt_end = b.bit_pos;
out->pic_order_cnt_bit_size = (uint32_t)(pic_order_cnt_end -
pic_order_cnt_start);
if (ctx->redundant_pic_cnt_present_flag)
(void)br_read_ue(&b); /* redundant_pic_cnt */
if (slice_type % 5 == 1) /* B */
(void)br_read_u(&b, 1); /* direct_spatial_mv_pred_flag */
num_ref_idx_l0_active_minus1 = ctx->num_ref_idx_l0_default_active_minus1;
num_ref_idx_l1_active_minus1 = ctx->num_ref_idx_l1_default_active_minus1;
{
uint32_t st = slice_type % 5;
if (st == 0 || st == 3 || st == 1) {
/* P, SP, B */
uint32_t override = br_read_u(&b, 1);
if (override) {
num_ref_idx_l0_active_minus1 = br_read_ue(&b);
if (st == 1)
num_ref_idx_l1_active_minus1 = br_read_ue(&b);
}
}
}
skip_ref_pic_list_modification(&b, slice_type);
if (b.error)
return -EIO;
{
uint32_t st = slice_type % 5;
bool do_pwt =
(ctx->weighted_pred_flag && (st == 0 || st == 3)) ||
(ctx->weighted_bipred_idc == 1 && st == 1);
if (do_pwt) {
skip_pred_weight_table(&b, slice_type,
ctx->chroma_format_idc,
ctx->bit_depth_luma_minus8,
ctx->bit_depth_chroma_minus8,
num_ref_idx_l0_active_minus1,
num_ref_idx_l1_active_minus1);
if (b.error)
return -EIO;
}
}
/*
* dec_ref_pic_marking() per §7.3.3.3 — measure bit length;
* hantro G1 writes this into G1_REG_DEC_CTRL5_REFPIC_MK_LEN.
*/
dec_ref_pic_marking_start = b.bit_pos;
if (ctx->nal_ref_idc != 0) {
if (idr_pic_flag) {
(void)br_read_u(&b, 1); /* no_output_of_prior_pics_flag */
(void)br_read_u(&b, 1); /* long_term_reference_flag */
} else {
uint32_t adaptive = br_read_u(&b, 1);
if (adaptive) {
uint32_t mmco;
do {
mmco = br_read_ue(&b);
if (mmco == 1 || mmco == 3)
br_read_ue(&b); /* difference_of_pic_nums_minus1 */
if (mmco == 2)
br_read_ue(&b); /* long_term_pic_num */
if (mmco == 3 || mmco == 6)
br_read_ue(&b); /* long_term_frame_idx */
if (mmco == 4)
br_read_ue(&b); /* max_long_term_frame_idx_plus1 */
if (b.error)
return -EIO;
} while (mmco != 0);
}
}
}
dec_ref_pic_marking_end = b.bit_pos;
out->dec_ref_pic_marking_bit_size =
(uint32_t)(dec_ref_pic_marking_end - dec_ref_pic_marking_start);
if (b.error)
return -EIO;
return 0;
}
-95
View File
@@ -1,95 +0,0 @@
/*
* H.264 slice header bit-parser for libva-v4l2-request.
*
* Extracts the slice-header bit-position and value fields that
* V4L2_CID_STATELESS_H264_DECODE_PARAMS requires (idr_pic_id,
* pic_order_cnt_lsb, delta_pic_order_cnt_*, pic_order_cnt_bit_size,
* dec_ref_pic_marking_bit_size). VAAPI's pre-parsed
* VAPictureParameterBufferH264 / VASliceParameterBufferH264 do not
* carry these — they live only in the bitstream's slice_header()
* syntax. Hantro G1 (drivers/media/platform/verisilicon/
* hantro_g1_h264_dec.c::set_params) writes the bit_size fields
* directly into MMIO registers G1_REG_DEC_CTRL5_REFPIC_MK_LEN and
* G1_REG_DEC_CTRL6_POC_LENGTH; with zeros the hardware bitstream
* parser walks past zero bits, lands on garbage, decodes nothing.
*
* Spec reference: ITU-T Rec. H.264 (08/2024) §7.3.3 slice_header
* and §7.3.3.1 ref_pic_list_modification, §7.3.3.2 pred_weight_table,
* §7.3.3.3 dec_ref_pic_marking.
*
* Cross-reference (proven working on hantro): FFmpeg's
* libavcodec/h264_slice.c populates H264SliceContext::ref_pic_marking_
* bit_size and pic_order_cnt_bit_size from its bit-precise slice
* header parse, then v4l2_request_h264.c forwards them.
*/
#ifndef H264_SLICE_HEADER_H
#define H264_SLICE_HEADER_H
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
struct h264_slice_header_context {
/* From SPS (the active SPS at slice-time). */
bool separate_colour_plane_flag;
uint8_t log2_max_frame_num_minus4;
bool frame_mbs_only_flag;
uint8_t pic_order_cnt_type;
uint8_t log2_max_pic_order_cnt_lsb_minus4;
bool delta_pic_order_always_zero_flag;
/* From PPS (the active PPS at slice-time). */
bool bottom_field_pic_order_in_frame_present_flag;
bool redundant_pic_cnt_present_flag;
bool weighted_pred_flag;
uint8_t weighted_bipred_idc;
uint8_t num_ref_idx_l0_default_active_minus1;
uint8_t num_ref_idx_l1_default_active_minus1;
uint8_t chroma_format_idc;
uint8_t bit_depth_luma_minus8;
uint8_t bit_depth_chroma_minus8;
/* From the NAL unit header (already extracted by the caller). */
uint8_t nal_unit_type;
uint8_t nal_ref_idc;
};
struct h264_slice_header_info {
uint16_t idr_pic_id;
uint16_t pic_order_cnt_lsb;
int32_t delta_pic_order_cnt_bottom;
int32_t delta_pic_order_cnt0;
int32_t delta_pic_order_cnt1;
uint32_t pic_order_cnt_bit_size;
uint32_t dec_ref_pic_marking_bit_size;
/* Diagnostic — useful for cross-checking VAAPI vs bitstream values. */
uint32_t first_mb_in_slice;
uint32_t slice_type;
uint32_t pic_parameter_set_id;
uint32_t frame_num;
};
/*
* Parse slice_header() up to dec_ref_pic_marking() (inclusive) of
* the H.264 RBSP slice_layer_without_partitioning_rbsp() syntax,
* extracting the V4L2 DECODE_PARAMS fields. Returns 0 on success,
* negative errno-shaped value on parse failure (insufficient data,
* malformed exp-Golomb, etc.).
*
* @nal_payload: pointer to the byte AFTER the NAL header byte
* (i.e. start of the RBSP proper; caller has already
* skipped any ANNEX_B start code and the 1-byte
* nal_unit_header). Will be RBSP-unescaped internally
* before parsing.
* @nal_payload_length: bytes available at @nal_payload.
* @ctx: SPS/PPS/NAL context required to drive the parse.
* @out: filled on success. All fields zero-initialized first.
*/
int h264_parse_slice_header(const uint8_t *nal_payload,
size_t nal_payload_length,
const struct h264_slice_header_context *ctx,
struct h264_slice_header_info *out);
#endif /* H264_SLICE_HEADER_H */
+231 -920
View File
File diff suppressed because it is too large Load Diff
-6
View File
@@ -27,12 +27,6 @@
#ifndef _H265_H_
#define _H265_H_
/* Maximum number of slices per frame the libva backend will accumulate
* before submitting to the kernel (kernel HEVC slice_params dynamic-array
* accepts up to 600 entries per Phase 0 V4L2 inventory; 64 is a
* conservative cap for typical fixtures + safety bound). */
#define HEVC_MAX_SLICES_PER_FRAME 64
struct object_context;
struct object_surface;
struct request_data;
-14
View File
@@ -1,14 +0,0 @@
/* Stub for <gst/base/base-prelude.h> — GStreamer base-lib prelude.
* In upstream GStreamer, this sets up the GstBaseExport macro + GObject
* boilerplate. We bypass all of that and provide only what our four
* vendored .c files actually need (gst_compat.h's typedefs).
*
* Crucially we also #define GST_BASE_API to nothing so the function
* declarations in gstbitreader.h / gstbytereader.h drop the
* dllimport / visibility attribute prefix.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_BASE_PRELUDE_STUB
#define LIBVA_V4L2_REQUEST_FOURIER_BASE_PRELUDE_STUB
#include "gst_compat.h"
#define GST_BASE_API
#endif
-307
View File
@@ -1,307 +0,0 @@
/* GStreamer
*
* Copyright (C) 2008 Sebastian Dröge <sebastian.droege@collabora.co.uk>.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
* Boston, MA 02110-1301, USA.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#define GST_BIT_READER_DISABLE_INLINES
#include "gstbitreader.h"
#include <string.h>
/**
* SECTION:gstbitreader
* @title: GstBitReader
* @short_description: Reads any number of bits from a memory buffer
* @symbols:
* - gst_bit_reader_skip_unchecked
* - gst_bit_reader_skip_to_byte_unchecked
* - gst_bit_reader_get_bits_uint8_unchecked
* - gst_bit_reader_peek_bits_uint8_unchecked
* - gst_bit_reader_get_bits_uint16_unchecked
* - gst_bit_reader_peek_bits_uint16_unchecked
* - gst_bit_reader_get_bits_uint32_unchecked
* - gst_bit_reader_peek_bits_uint32_unchecked
* - gst_bit_reader_get_bits_uint64_unchecked
* - gst_bit_reader_peek_bits_uint64_unchecked
*
* #GstBitReader provides a bit reader that can read any number of bits
* from a memory buffer. It provides functions for reading any number of bits
* into 8, 16, 32 and 64 bit variables.
*/
/**
* gst_bit_reader_new: (skip)
* @data: (array length=size): Data from which the #GstBitReader
* should read
* @size: Size of @data in bytes
*
* Create a new #GstBitReader instance, which will read from @data.
*
* Free-function: gst_bit_reader_free
*
* Returns: (transfer full): a new #GstBitReader instance
*/
GstBitReader *
gst_bit_reader_new (const guint8 * data, guint size)
{
GstBitReader *ret = g_new0 (GstBitReader, 1);
ret->data = data;
ret->size = size;
return ret;
}
/**
* gst_bit_reader_free:
* @reader: (in) (transfer full): a #GstBitReader instance
*
* Frees a #GstBitReader instance, which was previously allocated by
* gst_bit_reader_new().
*/
void
gst_bit_reader_free (GstBitReader * reader)
{
g_return_if_fail (reader != NULL);
g_free (reader);
}
/**
* gst_bit_reader_init:
* @reader: a #GstBitReader instance
* @data: (in) (array length=size): data from which the bit reader should read
* @size: Size of @data in bytes
*
* Initializes a #GstBitReader instance to read from @data. This function
* can be called on already initialized instances.
*/
void
gst_bit_reader_init (GstBitReader * reader, const guint8 * data, guint size)
{
g_return_if_fail (reader != NULL);
reader->data = data;
reader->size = size;
reader->byte = reader->bit = 0;
}
/**
* gst_bit_reader_set_pos:
* @reader: a #GstBitReader instance
* @pos: The new position in bits
*
* Sets the new position of a #GstBitReader instance to @pos in bits.
*
* Returns: %TRUE if the position could be set successfully, %FALSE
* otherwise.
*/
gboolean
gst_bit_reader_set_pos (GstBitReader * reader, guint pos)
{
g_return_val_if_fail (reader != NULL, FALSE);
if (pos > reader->size * 8)
return FALSE;
reader->byte = pos / 8;
reader->bit = pos % 8;
return TRUE;
}
/**
* gst_bit_reader_get_pos:
* @reader: a #GstBitReader instance
*
* Returns the current position of a #GstBitReader instance in bits.
*
* Returns: The current position of @reader in bits.
*/
guint
gst_bit_reader_get_pos (const GstBitReader * reader)
{
return _gst_bit_reader_get_pos_inline (reader);
}
/**
* gst_bit_reader_get_remaining:
* @reader: a #GstBitReader instance
*
* Returns the remaining number of bits of a #GstBitReader instance.
*
* Returns: The remaining number of bits of @reader instance.
*/
guint
gst_bit_reader_get_remaining (const GstBitReader * reader)
{
return _gst_bit_reader_get_remaining_inline (reader);
}
/**
* gst_bit_reader_get_size:
* @reader: a #GstBitReader instance
*
* Returns the total number of bits of a #GstBitReader instance.
*
* Returns: The total number of bits of @reader instance.
*/
guint
gst_bit_reader_get_size (const GstBitReader * reader)
{
return _gst_bit_reader_get_size_inline (reader);
}
/**
* gst_bit_reader_skip:
* @reader: a #GstBitReader instance
* @nbits: the number of bits to skip
*
* Skips @nbits bits of the #GstBitReader instance.
*
* Returns: %TRUE if @nbits bits could be skipped, %FALSE otherwise.
*/
gboolean
gst_bit_reader_skip (GstBitReader * reader, guint nbits)
{
return _gst_bit_reader_skip_inline (reader, nbits);
}
/**
* gst_bit_reader_skip_to_byte:
* @reader: a #GstBitReader instance
*
* Skips until the next byte.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
gboolean
gst_bit_reader_skip_to_byte (GstBitReader * reader)
{
return _gst_bit_reader_skip_to_byte_inline (reader);
}
/**
* gst_bit_reader_get_bits_uint8:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint8 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val and update the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_get_bits_uint16:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint16 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val and update the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_get_bits_uint32:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint32 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val and update the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_get_bits_uint64:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint64 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val and update the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_peek_bits_uint8:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint8 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val but keep the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_peek_bits_uint16:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint16 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val but keep the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_peek_bits_uint32:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint32 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val but keep the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_peek_bits_uint64:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint64 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val but keep the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
#define GST_BIT_READER_READ_BITS(bits) \
gboolean \
gst_bit_reader_peek_bits_uint##bits (const GstBitReader *reader, guint##bits *val, guint nbits) \
{ \
return _gst_bit_reader_peek_bits_uint##bits##_inline (reader, val, nbits); \
} \
\
gboolean \
gst_bit_reader_get_bits_uint##bits (GstBitReader *reader, guint##bits *val, guint nbits) \
{ \
return _gst_bit_reader_get_bits_uint##bits##_inline (reader, val, nbits); \
}
GST_BIT_READER_READ_BITS (8);
GST_BIT_READER_READ_BITS (16);
GST_BIT_READER_READ_BITS (32);
GST_BIT_READER_READ_BITS (64);
-328
View File
@@ -1,328 +0,0 @@
/* GStreamer
*
* Copyright (C) 2008 Sebastian Dröge <sebastian.droege@collabora.co.uk>.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
* Boston, MA 02110-1301, USA.
*/
#ifndef __GST_BIT_READER_H__
#define __GST_BIT_READER_H__
#include <gst/gst.h>
#include <gst/base/base-prelude.h>
/* FIXME: inline functions */
G_BEGIN_DECLS
#define GST_BIT_READER(reader) ((GstBitReader *) (reader))
/**
* GstBitReader:
* @data: (array length=size): Data from which the bit reader will
* read
* @size: Size of @data in bytes
* @byte: Current byte position
* @bit: Bit position in the current byte
*
* A bit reader instance.
*/
typedef struct {
const guint8 *data;
guint size;
guint byte; /* Byte position */
guint bit; /* Bit position in the current byte */
/* < private > */
gpointer _gst_reserved[GST_PADDING];
} GstBitReader;
GST_BASE_API
GstBitReader * gst_bit_reader_new (const guint8 *data, guint size) G_GNUC_MALLOC;
GST_BASE_API
void gst_bit_reader_free (GstBitReader *reader);
GST_BASE_API
void gst_bit_reader_init (GstBitReader *reader, const guint8 *data, guint size);
GST_BASE_API
gboolean gst_bit_reader_set_pos (GstBitReader *reader, guint pos);
GST_BASE_API
guint gst_bit_reader_get_pos (const GstBitReader *reader);
GST_BASE_API
guint gst_bit_reader_get_remaining (const GstBitReader *reader);
GST_BASE_API
guint gst_bit_reader_get_size (const GstBitReader *reader);
GST_BASE_API
gboolean gst_bit_reader_skip (GstBitReader *reader, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_skip_to_byte (GstBitReader *reader);
GST_BASE_API
gboolean gst_bit_reader_get_bits_uint8 (GstBitReader *reader, guint8 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_get_bits_uint16 (GstBitReader *reader, guint16 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_get_bits_uint32 (GstBitReader *reader, guint32 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_get_bits_uint64 (GstBitReader *reader, guint64 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_peek_bits_uint8 (const GstBitReader *reader, guint8 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_peek_bits_uint16 (const GstBitReader *reader, guint16 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_peek_bits_uint32 (const GstBitReader *reader, guint32 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_peek_bits_uint64 (const GstBitReader *reader, guint64 *val, guint nbits);
/**
* GST_BIT_READER_INIT:
* @data: Data from which the #GstBitReader should read
* @size: Size of @data in bytes
*
* A #GstBitReader must be initialized with this macro, before it can be
* used. This macro can used be to initialize a variable, but it cannot
* be assigned to a variable. In that case you have to use
* gst_bit_reader_init().
*/
#define GST_BIT_READER_INIT(data, size) {data, size, 0, 0}
/* Unchecked variants */
static inline void
gst_bit_reader_skip_unchecked (GstBitReader * reader, guint nbits)
{
reader->bit += nbits;
reader->byte += reader->bit / 8;
reader->bit = reader->bit % 8;
}
static inline void
gst_bit_reader_skip_to_byte_unchecked (GstBitReader * reader)
{
if (reader->bit) {
reader->bit = 0;
reader->byte++;
}
}
#define __GST_BIT_READER_READ_BITS_UNCHECKED(bits) \
static inline guint##bits \
gst_bit_reader_peek_bits_uint##bits##_unchecked (const GstBitReader *reader, guint nbits) \
{ \
guint##bits ret = 0; \
const guint8 *data; \
guint byte, bit; \
\
data = reader->data; \
byte = reader->byte; \
bit = reader->bit; \
\
while (nbits > 0) { \
guint toread = MIN (nbits, 8 - bit); \
\
ret <<= toread; \
ret |= (data[byte] & (0xff >> bit)) >> (8 - toread - bit); \
\
bit += toread; \
if (bit >= 8) { \
byte++; \
bit = 0; \
} \
nbits -= toread; \
} \
\
return ret; \
} \
\
static inline guint##bits \
gst_bit_reader_get_bits_uint##bits##_unchecked (GstBitReader *reader, guint nbits) \
{ \
guint##bits ret; \
\
ret = gst_bit_reader_peek_bits_uint##bits##_unchecked (reader, nbits); \
\
gst_bit_reader_skip_unchecked (reader, nbits); \
\
return ret; \
}
__GST_BIT_READER_READ_BITS_UNCHECKED (8)
__GST_BIT_READER_READ_BITS_UNCHECKED (16)
__GST_BIT_READER_READ_BITS_UNCHECKED (32)
__GST_BIT_READER_READ_BITS_UNCHECKED (64)
#undef __GST_BIT_READER_READ_BITS_UNCHECKED
/* unchecked variants -- do not use */
static inline guint
_gst_bit_reader_get_size_unchecked (const GstBitReader * reader)
{
return reader->size * 8;
}
static inline guint
_gst_bit_reader_get_pos_unchecked (const GstBitReader * reader)
{
return reader->byte * 8 + reader->bit;
}
static inline guint
_gst_bit_reader_get_remaining_unchecked (const GstBitReader * reader)
{
return reader->size * 8 - (reader->byte * 8 + reader->bit);
}
/* inlined variants -- do not use directly */
static inline guint
_gst_bit_reader_get_size_inline (const GstBitReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_bit_reader_get_size_unchecked (reader);
}
static inline guint
_gst_bit_reader_get_pos_inline (const GstBitReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_bit_reader_get_pos_unchecked (reader);
}
static inline guint
_gst_bit_reader_get_remaining_inline (const GstBitReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_bit_reader_get_remaining_unchecked (reader);
}
static inline gboolean
_gst_bit_reader_skip_inline (GstBitReader * reader, guint nbits)
{
g_return_val_if_fail (reader != NULL, FALSE);
if (_gst_bit_reader_get_remaining_unchecked (reader) < nbits)
return FALSE;
gst_bit_reader_skip_unchecked (reader, nbits);
return TRUE;
}
static inline gboolean
_gst_bit_reader_skip_to_byte_inline (GstBitReader * reader)
{
g_return_val_if_fail (reader != NULL, FALSE);
if (reader->byte > reader->size)
return FALSE;
gst_bit_reader_skip_to_byte_unchecked (reader);
return TRUE;
}
#define __GST_BIT_READER_READ_BITS_INLINE(bits) \
static inline gboolean \
_gst_bit_reader_get_bits_uint##bits##_inline (GstBitReader *reader, guint##bits *val, guint nbits) \
{ \
g_return_val_if_fail (reader != NULL, FALSE); \
g_return_val_if_fail (val != NULL, FALSE); \
g_return_val_if_fail (nbits <= bits, FALSE); \
\
if (_gst_bit_reader_get_remaining_unchecked (reader) < nbits) \
return FALSE; \
\
*val = gst_bit_reader_get_bits_uint##bits##_unchecked (reader, nbits); \
return TRUE; \
} \
\
static inline gboolean \
_gst_bit_reader_peek_bits_uint##bits##_inline (const GstBitReader *reader, guint##bits *val, guint nbits) \
{ \
g_return_val_if_fail (reader != NULL, FALSE); \
g_return_val_if_fail (val != NULL, FALSE); \
g_return_val_if_fail (nbits <= bits, FALSE); \
\
if (_gst_bit_reader_get_remaining_unchecked (reader) < nbits) \
return FALSE; \
\
*val = gst_bit_reader_peek_bits_uint##bits##_unchecked (reader, nbits); \
return TRUE; \
}
__GST_BIT_READER_READ_BITS_INLINE (8)
__GST_BIT_READER_READ_BITS_INLINE (16)
__GST_BIT_READER_READ_BITS_INLINE (32)
__GST_BIT_READER_READ_BITS_INLINE (64)
#undef __GST_BIT_READER_READ_BITS_INLINE
#ifndef GST_BIT_READER_DISABLE_INLINES
#define gst_bit_reader_get_size(reader) \
_gst_bit_reader_get_size_inline (reader)
#define gst_bit_reader_get_pos(reader) \
_gst_bit_reader_get_pos_inline (reader)
#define gst_bit_reader_get_remaining(reader) \
_gst_bit_reader_get_remaining_inline (reader)
/* we use defines here so we can add the G_LIKELY() */
#define gst_bit_reader_skip(reader, nbits)\
G_LIKELY (_gst_bit_reader_skip_inline(reader, nbits))
#define gst_bit_reader_skip_to_byte(reader)\
G_LIKELY (_gst_bit_reader_skip_to_byte_inline(reader))
#define gst_bit_reader_get_bits_uint8(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_get_bits_uint8_inline (reader, val, nbits))
#define gst_bit_reader_get_bits_uint16(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_get_bits_uint16_inline (reader, val, nbits))
#define gst_bit_reader_get_bits_uint32(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_get_bits_uint32_inline (reader, val, nbits))
#define gst_bit_reader_get_bits_uint64(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_get_bits_uint64_inline (reader, val, nbits))
#define gst_bit_reader_peek_bits_uint8(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_peek_bits_uint8_inline (reader, val, nbits))
#define gst_bit_reader_peek_bits_uint16(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_peek_bits_uint16_inline (reader, val, nbits))
#define gst_bit_reader_peek_bits_uint32(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_peek_bits_uint32_inline (reader, val, nbits))
#define gst_bit_reader_peek_bits_uint64(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_peek_bits_uint64_inline (reader, val, nbits))
#endif
G_END_DECLS
#endif /* __GST_BIT_READER_H__ */
-67
View File
@@ -1,67 +0,0 @@
/* Stub for <gst/base/gstbitwriter.h>.
*
* The vendored nalutils.c uses GstBitWriter for NAL emulation-prevention
* byte INSERTION during write-side (encoder) operations. The libva
* backend never invokes those paths — we only PARSE NAL units, never
* write them. The functions must still compile + link though, so we
* stub them with abort() runtime guards: if any future code path
* accidentally invokes a writer function, we fail-fast instead of
* silently corrupting.
*
* Header surface mirrors upstream gstbitwriter.h minimally — enough
* for nalutils.c to compile.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GSTBITWRITER_STUB
#define LIBVA_V4L2_REQUEST_FOURIER_GSTBITWRITER_STUB
#include "gst_compat.h"
typedef struct {
guint8 *data;
guint bit_size;
guint bit_capacity;
gboolean auto_grow;
gboolean owned;
} GstBitWriter;
static inline void
gst_bit_writer_init(GstBitWriter *bw) { (void)bw; abort(); }
static inline void
gst_bit_writer_init_with_size(GstBitWriter *bw, guint size, gboolean fixed) {
(void)bw; (void)size; (void)fixed; abort();
}
static inline void
gst_bit_writer_reset(GstBitWriter *bw) { (void)bw; abort(); }
static inline gboolean
gst_bit_writer_put_bits_uint8(GstBitWriter *bw, guint8 value, guint nbits) {
(void)bw; (void)value; (void)nbits; abort();
}
static inline gboolean
gst_bit_writer_align_bytes(GstBitWriter *bw, guint8 trailing_bit) {
(void)bw; (void)trailing_bit; abort();
}
static inline guint8 *
gst_bit_writer_get_data(GstBitWriter *bw) { (void)bw; abort(); }
static inline guint
gst_bit_writer_get_size(const GstBitWriter *bw) { (void)bw; abort(); }
static inline guint
gst_bit_writer_reset_and_get_size(GstBitWriter *bw) { (void)bw; abort(); }
static inline guint8 *
gst_bit_writer_reset_and_get_data(GstBitWriter *bw) { (void)bw; abort(); }
static inline gboolean
gst_bit_writer_put_bits_uint16(GstBitWriter *bw, guint16 value, guint nbits) {
(void)bw; (void)value; (void)nbits; abort();
}
static inline gboolean
gst_bit_writer_put_bits_uint32(GstBitWriter *bw, guint32 value, guint nbits) {
(void)bw; (void)value; (void)nbits; abort();
}
static inline gboolean
gst_bit_writer_put_bytes(GstBitWriter *bw, const guint8 *data, guint nbytes) {
(void)bw; (void)data; (void)nbytes; abort();
}
#define GST_BIT_WRITER_BIT_SIZE(bw) ((bw)->bit_size)
#define GST_BIT_WRITER_DATA(bw) ((bw)->data)
#endif
File diff suppressed because it is too large Load Diff
-684
View File
@@ -1,684 +0,0 @@
/* GStreamer byte reader
*
* Copyright (C) 2008 Sebastian Dröge <sebastian.droege@collabora.co.uk>.
* Copyright (C) 2009 Tim-Philipp Müller <tim centricular net>
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
* Boston, MA 02110-1301, USA.
*/
#ifndef __GST_BYTE_READER_H__
#define __GST_BYTE_READER_H__
#include <gst/gst.h>
#include <gst/base/base-prelude.h>
G_BEGIN_DECLS
#define GST_BYTE_READER(reader) ((GstByteReader *) (reader))
/**
* GstByteReader:
* @data: (array length=size): Data from which the bit reader will
* read
* @size: Size of @data in bytes
* @byte: Current byte position
*
* A byte reader instance.
*/
typedef struct {
const guint8 *data;
guint size;
guint byte; /* Byte position */
/* < private > */
gpointer _gst_reserved[GST_PADDING];
} GstByteReader;
GST_BASE_API
GstByteReader * gst_byte_reader_new (const guint8 *data, guint size) G_GNUC_MALLOC;
GST_BASE_API
void gst_byte_reader_free (GstByteReader *reader);
GST_BASE_API
void gst_byte_reader_init (GstByteReader *reader, const guint8 *data, guint size);
GST_BASE_API
gboolean gst_byte_reader_peek_sub_reader (GstByteReader * reader,
GstByteReader * sub_reader,
guint size);
GST_BASE_API
gboolean gst_byte_reader_get_sub_reader (GstByteReader * reader,
GstByteReader * sub_reader,
guint size);
GST_BASE_API
gboolean gst_byte_reader_set_pos (GstByteReader *reader, guint pos);
GST_BASE_API
guint gst_byte_reader_get_pos (const GstByteReader *reader);
GST_BASE_API
guint gst_byte_reader_get_remaining (const GstByteReader *reader);
GST_BASE_API
guint gst_byte_reader_get_size (const GstByteReader *reader);
GST_BASE_API
gboolean gst_byte_reader_skip (GstByteReader *reader, guint nbytes);
GST_BASE_API
gboolean gst_byte_reader_get_uint8 (GstByteReader *reader, guint8 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int8 (GstByteReader *reader, gint8 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint16_le (GstByteReader *reader, guint16 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int16_le (GstByteReader *reader, gint16 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint16_be (GstByteReader *reader, guint16 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int16_be (GstByteReader *reader, gint16 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint24_le (GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int24_le (GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint24_be (GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int24_be (GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint32_le (GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int32_le (GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint32_be (GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int32_be (GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint64_le (GstByteReader *reader, guint64 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int64_le (GstByteReader *reader, gint64 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint64_be (GstByteReader *reader, guint64 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int64_be (GstByteReader *reader, gint64 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint8 (const GstByteReader *reader, guint8 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int8 (const GstByteReader *reader, gint8 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint16_le (const GstByteReader *reader, guint16 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int16_le (const GstByteReader *reader, gint16 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint16_be (const GstByteReader *reader, guint16 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int16_be (const GstByteReader *reader, gint16 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint24_le (const GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int24_le (const GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint24_be (const GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int24_be (const GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint32_le (const GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int32_le (const GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint32_be (const GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int32_be (const GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint64_le (const GstByteReader *reader, guint64 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int64_le (const GstByteReader *reader, gint64 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint64_be (const GstByteReader *reader, guint64 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int64_be (const GstByteReader *reader, gint64 *val);
GST_BASE_API
gboolean gst_byte_reader_get_float32_le (GstByteReader *reader, gfloat *val);
GST_BASE_API
gboolean gst_byte_reader_get_float32_be (GstByteReader *reader, gfloat *val);
GST_BASE_API
gboolean gst_byte_reader_get_float64_le (GstByteReader *reader, gdouble *val);
GST_BASE_API
gboolean gst_byte_reader_get_float64_be (GstByteReader *reader, gdouble *val);
GST_BASE_API
gboolean gst_byte_reader_peek_float32_le (const GstByteReader *reader, gfloat *val);
GST_BASE_API
gboolean gst_byte_reader_peek_float32_be (const GstByteReader *reader, gfloat *val);
GST_BASE_API
gboolean gst_byte_reader_peek_float64_le (const GstByteReader *reader, gdouble *val);
GST_BASE_API
gboolean gst_byte_reader_peek_float64_be (const GstByteReader *reader, gdouble *val);
GST_BASE_API
gboolean gst_byte_reader_dup_data (GstByteReader * reader, guint size, guint8 ** val);
GST_BASE_API
gboolean gst_byte_reader_get_data (GstByteReader * reader, guint size, const guint8 ** val);
GST_BASE_API
gboolean gst_byte_reader_peek_data (const GstByteReader * reader, guint size, const guint8 ** val);
#define gst_byte_reader_dup_string(reader,str) \
gst_byte_reader_dup_string_utf8(reader,str)
GST_BASE_API
gboolean gst_byte_reader_dup_string_utf8 (GstByteReader * reader, gchar ** str);
GST_BASE_API
gboolean gst_byte_reader_dup_string_utf16 (GstByteReader * reader, guint16 ** str);
GST_BASE_API
gboolean gst_byte_reader_dup_string_utf32 (GstByteReader * reader, guint32 ** str);
#define gst_byte_reader_skip_string(reader) \
gst_byte_reader_skip_string_utf8(reader)
GST_BASE_API
gboolean gst_byte_reader_skip_string_utf8 (GstByteReader * reader);
GST_BASE_API
gboolean gst_byte_reader_skip_string_utf16 (GstByteReader * reader);
GST_BASE_API
gboolean gst_byte_reader_skip_string_utf32 (GstByteReader * reader);
#define gst_byte_reader_get_string(reader,str) \
gst_byte_reader_get_string_utf8(reader,str)
#define gst_byte_reader_peek_string(reader,str) \
gst_byte_reader_peek_string_utf8(reader,str)
GST_BASE_API
gboolean gst_byte_reader_get_string_utf8 (GstByteReader * reader, const gchar ** str);
GST_BASE_API
gboolean gst_byte_reader_peek_string_utf8 (const GstByteReader * reader, const gchar ** str);
GST_BASE_API
guint gst_byte_reader_masked_scan_uint32 (const GstByteReader * reader,
guint32 mask,
guint32 pattern,
guint offset,
guint size);
GST_BASE_API
guint gst_byte_reader_masked_scan_uint32_peek (const GstByteReader * reader,
guint32 mask,
guint32 pattern,
guint offset,
guint size,
guint32 * value);
/**
* GST_BYTE_READER_INIT:
* @data: Data from which the #GstByteReader should read
* @size: Size of @data in bytes
*
* A #GstByteReader must be initialized with this macro, before it can be
* used. This macro can used be to initialize a variable, but it cannot
* be assigned to a variable. In that case you have to use
* gst_byte_reader_init().
*/
#define GST_BYTE_READER_INIT(data, size) {data, size, 0}
/* unchecked variants */
static inline void
gst_byte_reader_skip_unchecked (GstByteReader * reader, guint nbytes)
{
reader->byte += nbytes;
}
#define __GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(bits,type,lower,upper,adj) \
\
static inline type \
gst_byte_reader_peek_##lower##_unchecked (const GstByteReader * reader) \
{ \
type val = (type) GST_READ_##upper (reader->data + reader->byte); \
adj \
return val; \
} \
\
static inline type \
gst_byte_reader_get_##lower##_unchecked (GstByteReader * reader) \
{ \
type val = gst_byte_reader_peek_##lower##_unchecked (reader); \
reader->byte += bits / 8; \
return val; \
}
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(8,guint8,uint8,UINT8,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(8,gint8,int8,UINT8,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,guint16,uint16_le,UINT16_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,guint16,uint16_be,UINT16_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,gint16,int16_le,UINT16_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,gint16,int16_be,UINT16_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,guint32,uint32_le,UINT32_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,guint32,uint32_be,UINT32_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gint32,int32_le,UINT32_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gint32,int32_be,UINT32_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,guint32,uint24_le,UINT24_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,guint32,uint24_be,UINT24_BE,/* */)
/* fix up the sign for 24-bit signed ints stored in 32-bit signed ints */
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,gint32,int24_le,UINT24_LE,
if (val & 0x00800000) val |= 0xff000000;)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,gint32,int24_be,UINT24_BE,
if (val & 0x00800000) val |= 0xff000000;)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,guint64,uint64_le,UINT64_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,guint64,uint64_be,UINT64_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gint64,int64_le,UINT64_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gint64,int64_be,UINT64_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gfloat,float32_le,FLOAT_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gfloat,float32_be,FLOAT_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gdouble,float64_le,DOUBLE_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gdouble,float64_be,DOUBLE_BE,/* */)
#undef __GET_PEEK_BITS_UNCHECKED
static inline const guint8 *
gst_byte_reader_peek_data_unchecked (const GstByteReader * reader)
{
return (const guint8 *) (reader->data + reader->byte);
}
static inline const guint8 *
gst_byte_reader_get_data_unchecked (GstByteReader * reader, guint size)
{
const guint8 *data;
data = gst_byte_reader_peek_data_unchecked (reader);
gst_byte_reader_skip_unchecked (reader, size);
return data;
}
static inline guint8 *
gst_byte_reader_dup_data_unchecked (GstByteReader * reader, guint size)
{
gconstpointer data = gst_byte_reader_get_data_unchecked (reader, size);
guint8 *dup_data = (guint8 *) g_malloc (size);
memcpy (dup_data, data, size);
return dup_data;
}
/* Unchecked variants that should not be used */
static inline guint
_gst_byte_reader_get_pos_unchecked (const GstByteReader * reader)
{
return reader->byte;
}
static inline guint
_gst_byte_reader_get_remaining_unchecked (const GstByteReader * reader)
{
return reader->size - reader->byte;
}
static inline guint
_gst_byte_reader_get_size_unchecked (const GstByteReader * reader)
{
return reader->size;
}
/* inlined variants (do not use directly) */
static inline guint
_gst_byte_reader_get_remaining_inline (const GstByteReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_byte_reader_get_remaining_unchecked (reader);
}
static inline guint
_gst_byte_reader_get_size_inline (const GstByteReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_byte_reader_get_size_unchecked (reader);
}
#define __GST_BYTE_READER_GET_PEEK_BITS_INLINE(bits,type,name) \
\
static inline gboolean \
_gst_byte_reader_peek_##name##_inline (const GstByteReader * reader, type * val) \
{ \
g_return_val_if_fail (reader != NULL, FALSE); \
g_return_val_if_fail (val != NULL, FALSE); \
\
if (_gst_byte_reader_get_remaining_unchecked (reader) < (bits / 8)) \
return FALSE; \
\
*val = gst_byte_reader_peek_##name##_unchecked (reader); \
return TRUE; \
} \
\
static inline gboolean \
_gst_byte_reader_get_##name##_inline (GstByteReader * reader, type * val) \
{ \
g_return_val_if_fail (reader != NULL, FALSE); \
g_return_val_if_fail (val != NULL, FALSE); \
\
if (_gst_byte_reader_get_remaining_unchecked (reader) < (bits / 8)) \
return FALSE; \
\
*val = gst_byte_reader_get_##name##_unchecked (reader); \
return TRUE; \
}
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(8,guint8,uint8)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(8,gint8,int8)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,guint16,uint16_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,guint16,uint16_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,gint16,int16_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,gint16,int16_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,guint32,uint32_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,guint32,uint32_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gint32,int32_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gint32,int32_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,guint32,uint24_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,guint32,uint24_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,gint32,int24_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,gint32,int24_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,guint64,uint64_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,guint64,uint64_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gint64,int64_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gint64,int64_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gfloat,float32_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gfloat,float32_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gdouble,float64_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gdouble,float64_be)
#undef __GST_BYTE_READER_GET_PEEK_BITS_INLINE
#ifndef GST_BYTE_READER_DISABLE_INLINES
#define gst_byte_reader_init(reader,data,size) \
_gst_byte_reader_init_inline(reader,data,size)
#define gst_byte_reader_get_remaining(reader) \
_gst_byte_reader_get_remaining_inline(reader)
#define gst_byte_reader_get_size(reader) \
_gst_byte_reader_get_size_inline(reader)
#define gst_byte_reader_get_pos(reader) \
_gst_byte_reader_get_pos_inline(reader)
/* we use defines here so we can add the G_LIKELY() */
#define gst_byte_reader_get_uint8(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint8_inline(reader,val))
#define gst_byte_reader_get_int8(reader,val) \
G_LIKELY(_gst_byte_reader_get_int8_inline(reader,val))
#define gst_byte_reader_get_uint16_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint16_le_inline(reader,val))
#define gst_byte_reader_get_int16_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_int16_le_inline(reader,val))
#define gst_byte_reader_get_uint16_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint16_be_inline(reader,val))
#define gst_byte_reader_get_int16_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_int16_be_inline(reader,val))
#define gst_byte_reader_get_uint24_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint24_le_inline(reader,val))
#define gst_byte_reader_get_int24_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_int24_le_inline(reader,val))
#define gst_byte_reader_get_uint24_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint24_be_inline(reader,val))
#define gst_byte_reader_get_int24_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_int24_be_inline(reader,val))
#define gst_byte_reader_get_uint32_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint32_le_inline(reader,val))
#define gst_byte_reader_get_int32_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_int32_le_inline(reader,val))
#define gst_byte_reader_get_uint32_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint32_be_inline(reader,val))
#define gst_byte_reader_get_int32_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_int32_be_inline(reader,val))
#define gst_byte_reader_get_uint64_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint64_le_inline(reader,val))
#define gst_byte_reader_get_int64_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_int64_le_inline(reader,val))
#define gst_byte_reader_get_uint64_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint64_be_inline(reader,val))
#define gst_byte_reader_get_int64_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_int64_be_inline(reader,val))
#define gst_byte_reader_peek_uint8(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint8_inline(reader,val))
#define gst_byte_reader_peek_int8(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int8_inline(reader,val))
#define gst_byte_reader_peek_uint16_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint16_le_inline(reader,val))
#define gst_byte_reader_peek_int16_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int16_le_inline(reader,val))
#define gst_byte_reader_peek_uint16_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint16_be_inline(reader,val))
#define gst_byte_reader_peek_int16_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int16_be_inline(reader,val))
#define gst_byte_reader_peek_uint24_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint24_le_inline(reader,val))
#define gst_byte_reader_peek_int24_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int24_le_inline(reader,val))
#define gst_byte_reader_peek_uint24_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint24_be_inline(reader,val))
#define gst_byte_reader_peek_int24_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int24_be_inline(reader,val))
#define gst_byte_reader_peek_uint32_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint32_le_inline(reader,val))
#define gst_byte_reader_peek_int32_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int32_le_inline(reader,val))
#define gst_byte_reader_peek_uint32_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint32_be_inline(reader,val))
#define gst_byte_reader_peek_int32_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int32_be_inline(reader,val))
#define gst_byte_reader_peek_uint64_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint64_le_inline(reader,val))
#define gst_byte_reader_peek_int64_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int64_le_inline(reader,val))
#define gst_byte_reader_peek_uint64_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint64_be_inline(reader,val))
#define gst_byte_reader_peek_int64_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int64_be_inline(reader,val))
#define gst_byte_reader_get_float32_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_float32_le_inline(reader,val))
#define gst_byte_reader_get_float32_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_float32_be_inline(reader,val))
#define gst_byte_reader_get_float64_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_float64_le_inline(reader,val))
#define gst_byte_reader_get_float64_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_float64_be_inline(reader,val))
#define gst_byte_reader_peek_float32_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_float32_le_inline(reader,val))
#define gst_byte_reader_peek_float32_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_float32_be_inline(reader,val))
#define gst_byte_reader_peek_float64_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_float64_le_inline(reader,val))
#define gst_byte_reader_peek_float64_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_float64_be_inline(reader,val))
#endif /* GST_BYTE_READER_DISABLE_INLINES */
static inline void
_gst_byte_reader_init_inline (GstByteReader * reader, const guint8 * data, guint size)
{
g_return_if_fail (reader != NULL);
reader->data = data;
reader->size = size;
reader->byte = 0;
}
static inline gboolean
_gst_byte_reader_peek_sub_reader_inline (GstByteReader * reader,
GstByteReader * sub_reader, guint size)
{
g_return_val_if_fail (reader != NULL, FALSE);
g_return_val_if_fail (sub_reader != NULL, FALSE);
if (_gst_byte_reader_get_remaining_unchecked (reader) < size)
return FALSE;
sub_reader->data = reader->data + reader->byte;
sub_reader->byte = 0;
sub_reader->size = size;
return TRUE;
}
static inline gboolean
_gst_byte_reader_get_sub_reader_inline (GstByteReader * reader,
GstByteReader * sub_reader, guint size)
{
if (!_gst_byte_reader_peek_sub_reader_inline (reader, sub_reader, size))
return FALSE;
gst_byte_reader_skip_unchecked (reader, size);
return TRUE;
}
static inline gboolean
_gst_byte_reader_dup_data_inline (GstByteReader * reader, guint size, guint8 ** val)
{
g_return_val_if_fail (reader != NULL, FALSE);
g_return_val_if_fail (val != NULL, FALSE);
if (G_UNLIKELY (size > reader->size || _gst_byte_reader_get_remaining_unchecked (reader) < size))
return FALSE;
*val = gst_byte_reader_dup_data_unchecked (reader, size);
return TRUE;
}
static inline gboolean
_gst_byte_reader_get_data_inline (GstByteReader * reader, guint size, const guint8 ** val)
{
g_return_val_if_fail (reader != NULL, FALSE);
g_return_val_if_fail (val != NULL, FALSE);
if (G_UNLIKELY (size > reader->size || _gst_byte_reader_get_remaining_unchecked (reader) < size))
return FALSE;
*val = gst_byte_reader_get_data_unchecked (reader, size);
return TRUE;
}
static inline gboolean
_gst_byte_reader_peek_data_inline (const GstByteReader * reader, guint size, const guint8 ** val)
{
g_return_val_if_fail (reader != NULL, FALSE);
g_return_val_if_fail (val != NULL, FALSE);
if (G_UNLIKELY (size > reader->size || _gst_byte_reader_get_remaining_unchecked (reader) < size))
return FALSE;
*val = gst_byte_reader_peek_data_unchecked (reader);
return TRUE;
}
static inline guint
_gst_byte_reader_get_pos_inline (const GstByteReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_byte_reader_get_pos_unchecked (reader);
}
static inline gboolean
_gst_byte_reader_skip_inline (GstByteReader * reader, guint nbytes)
{
g_return_val_if_fail (reader != NULL, FALSE);
if (G_UNLIKELY (_gst_byte_reader_get_remaining_unchecked (reader) < nbytes))
return FALSE;
reader->byte += nbytes;
return TRUE;
}
#ifndef GST_BYTE_READER_DISABLE_INLINES
#define gst_byte_reader_dup_data(reader,size,val) \
G_LIKELY(_gst_byte_reader_dup_data_inline(reader,size,val))
#define gst_byte_reader_get_data(reader,size,val) \
G_LIKELY(_gst_byte_reader_get_data_inline(reader,size,val))
#define gst_byte_reader_peek_data(reader,size,val) \
G_LIKELY(_gst_byte_reader_peek_data_inline(reader,size,val))
#define gst_byte_reader_skip(reader,nbytes) \
G_LIKELY(_gst_byte_reader_skip_inline(reader,nbytes))
#endif /* GST_BYTE_READER_DISABLE_INLINES */
G_END_DECLS
#endif /* __GST_BYTE_READER_H__ */
@@ -1,9 +0,0 @@
/* Stub for <gst/codecparsers/codecparsers-prelude.h>.
* Same shape as base-prelude.h — drop the GObject boilerplate + define
* the GstCodecParsersAPI macro to nothing.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_CODECPARSERS_PRELUDE_STUB
#define LIBVA_V4L2_REQUEST_FOURIER_CODECPARSERS_PRELUDE_STUB
#include "gst_compat.h"
#define GST_CODEC_PARSERS_API
#endif
File diff suppressed because it is too large Load Diff
File diff suppressed because it is too large Load Diff
-545
View File
@@ -1,545 +0,0 @@
/* Gstreamer
* Copyright (C) <2011> Intel Corporation
* Copyright (C) <2011> Collabora Ltd.
* Copyright (C) <2011> Thibault Saunier <thibault.saunier@collabora.com>
*
* Some bits C-c,C-v'ed and s/4/3 from h264parse and videoparsers/h264parse.c:
* Copyright (C) <2010> Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk>
* Copyright (C) <2010> Collabora Multimedia
* Copyright (C) <2010> Nokia Corporation
*
* (C) 2005 Michal Benes <michal.benes@itonis.tv>
* (C) 2008 Wim Taymans <wim.taymans@gmail.com>
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
* Boston, MA 02110-1301, USA.
*/
/*
* Common code for NAL parsing from h264 and h265 parsers.
*/
#ifdef HAVE_CONFIG_H
# include "config.h"
#endif
#include "nalutils.h"
/****** Nal parser ******/
void
nal_reader_init (NalReader * nr, const guint8 * data, guint size)
{
nr->data = data;
nr->size = size;
nr->n_epb = 0;
nr->byte = 0;
nr->bits_in_cache = 0;
/* fill with something other than 0 to detect emulation prevention bytes */
nr->first_byte = 0xff;
nr->epb_cache = 0xff;
nr->cache = 0xff;
}
gboolean
nal_reader_read (NalReader * nr, guint nbits)
{
if (G_UNLIKELY (nr->byte * 8 + (nbits - nr->bits_in_cache) > nr->size * 8)) {
GST_DEBUG ("Can not read %u bits, bits in cache %u, Byte * 8 %u, size in "
"bits %u", nbits, nr->bits_in_cache, nr->byte * 8, nr->size * 8);
return FALSE;
}
while (nr->bits_in_cache < nbits) {
guint8 byte;
next_byte:
if (G_UNLIKELY (nr->byte >= nr->size))
return FALSE;
byte = nr->data[nr->byte++];
nr->epb_cache = (nr->epb_cache << 8) | byte;
/* check if the byte is a emulation_prevention_three_byte */
if ((nr->epb_cache & 0xffffff) == 0x3) {
nr->n_epb++;
goto next_byte;
}
nr->cache = (nr->cache << 8) | nr->first_byte;
nr->first_byte = byte;
nr->bits_in_cache += 8;
}
return TRUE;
}
/* Skips the specified amount of bits. This is only suitable to a
cacheable number of bits */
gboolean
nal_reader_skip (NalReader * nr, guint nbits)
{
g_assert (nbits <= 8 * sizeof (nr->cache));
if (G_UNLIKELY (!nal_reader_read (nr, nbits)))
return FALSE;
nr->bits_in_cache -= nbits;
return TRUE;
}
/* Generic version to skip any number of bits */
gboolean
nal_reader_skip_long (NalReader * nr, guint nbits)
{
/* Leave out enough bits in the cache once we are finished */
const guint skip_size = 4 * sizeof (nr->cache);
guint remaining = nbits;
nbits %= skip_size;
while (remaining > 0) {
if (!nal_reader_skip (nr, nbits))
return FALSE;
remaining -= nbits;
nbits = skip_size;
}
return TRUE;
}
guint
nal_reader_get_pos (const NalReader * nr)
{
return nr->byte * 8 - nr->bits_in_cache;
}
guint
nal_reader_get_remaining (const NalReader * nr)
{
return (nr->size - nr->byte) * 8 + nr->bits_in_cache;
}
guint
nal_reader_get_epb_count (const NalReader * nr)
{
return nr->n_epb;
}
#define NAL_READER_READ_BITS(bits) \
gboolean \
nal_reader_get_bits_uint##bits (NalReader *nr, guint##bits *val, guint nbits) \
{ \
guint shift; \
\
if (!nal_reader_read (nr, nbits)) \
return FALSE; \
\
/* bring the required bits down and truncate */ \
shift = nr->bits_in_cache - nbits; \
*val = nr->first_byte >> shift; \
\
*val |= nr->cache << (8 - shift); \
/* mask out required bits */ \
if (nbits < bits) \
*val &= ((guint##bits)1 << nbits) - 1; \
\
nr->bits_in_cache = shift; \
\
return TRUE; \
} \
NAL_READER_READ_BITS (8);
NAL_READER_READ_BITS (16);
NAL_READER_READ_BITS (32);
#define NAL_READER_PEEK_BITS(bits) \
gboolean \
nal_reader_peek_bits_uint##bits (const NalReader *nr, guint##bits *val, guint nbits) \
{ \
NalReader tmp; \
\
tmp = *nr; \
return nal_reader_get_bits_uint##bits (&tmp, val, nbits); \
}
NAL_READER_PEEK_BITS (8);
gboolean
nal_reader_get_ue (NalReader * nr, guint32 * val)
{
guint i = 0;
guint8 bit;
guint32 value;
if (G_UNLIKELY (!nal_reader_get_bits_uint8 (nr, &bit, 1)))
return FALSE;
while (bit == 0) {
i++;
if (G_UNLIKELY (!nal_reader_get_bits_uint8 (nr, &bit, 1)))
return FALSE;
}
if (G_UNLIKELY (i > 31))
return FALSE;
if (G_UNLIKELY (!nal_reader_get_bits_uint32 (nr, &value, i)))
return FALSE;
*val = (1 << i) - 1 + value;
return TRUE;
}
gboolean
nal_reader_get_se (NalReader * nr, gint32 * val)
{
guint32 value;
if (G_UNLIKELY (!nal_reader_get_ue (nr, &value)))
return FALSE;
if (value % 2)
*val = (value / 2) + 1;
else
*val = -(value / 2);
return TRUE;
}
gboolean
nal_reader_is_byte_aligned (NalReader * nr)
{
if (nr->bits_in_cache != 0)
return FALSE;
return TRUE;
}
gboolean
nal_reader_has_more_data (NalReader * nr)
{
NalReader nr_tmp;
guint remaining, nbits;
guint8 rbsp_stop_one_bit, zero_bits;
remaining = nal_reader_get_remaining (nr);
if (remaining == 0)
return FALSE;
nr_tmp = *nr;
nr = &nr_tmp;
/* The spec defines that more_rbsp_data() searches for the last bit
equal to 1, and that it is the rbsp_stop_one_bit. Subsequent bits
until byte boundary is reached shall be zero.
This means that more_rbsp_data() is FALSE if the next bit is 1
and the remaining bits until byte boundary are zero. One way to
be sure that this bit was the very last one, is that every other
bit after we reached byte boundary are also set to zero.
Otherwise, if the next bit is 0 or if there are non-zero bits
afterwards, then then we have more_rbsp_data() */
if (!nal_reader_get_bits_uint8 (nr, &rbsp_stop_one_bit, 1))
return FALSE;
if (!rbsp_stop_one_bit)
return TRUE;
nbits = --remaining % 8;
while (remaining > 0) {
if (!nal_reader_get_bits_uint8 (nr, &zero_bits, nbits))
return FALSE;
if (zero_bits != 0)
return TRUE;
remaining -= nbits;
nbits = 8;
}
return FALSE;
}
/*********** end of nal parser ***************/
gint
scan_for_start_codes (const guint8 * data, guint size)
{
GstByteReader br;
gst_byte_reader_init (&br, data, size);
/* NALU not empty, so we can at least expect 1 (even 2) bytes following sc */
return gst_byte_reader_masked_scan_uint32 (&br, 0xffffff00, 0x00000100,
0, size);
}
void
nal_writer_init (NalWriter * nw, guint nal_prefix_size, gboolean packetized)
{
g_return_if_fail (nw != NULL);
g_return_if_fail ((packetized && nal_prefix_size > 1 && nal_prefix_size < 5)
|| (!packetized && (nal_prefix_size == 3 || nal_prefix_size == 4)));
gst_bit_writer_init (&nw->bw);
nw->nal_prefix_size = nal_prefix_size;
nw->packetized = packetized;
}
void
nal_writer_reset (NalWriter * nw)
{
g_return_if_fail (nw != NULL);
gst_bit_writer_reset (&nw->bw);
memset (nw, 0, sizeof (NalWriter));
}
gboolean
nal_writer_do_rbsp_trailing_bits (NalWriter * nw)
{
g_return_val_if_fail (nw != NULL, FALSE);
if (!gst_bit_writer_put_bits_uint8 (&nw->bw, 1, 1)) {
GST_WARNING ("Cannot put trailing bits");
return FALSE;
}
if (!gst_bit_writer_align_bytes (&nw->bw, 0)) {
GST_WARNING ("Cannot put align bits");
return FALSE;
}
return TRUE;
}
static gpointer
nal_writer_create_nal_data (NalWriter * nw, guint32 * ret_size)
{
GstBitWriter bw;
gint i;
guint8 *src, *dst;
gsize size;
gpointer data;
/* scan to put emulation_prevention_three_byte */
size = GST_BIT_WRITER_BIT_SIZE (&nw->bw) >> 3;
src = GST_BIT_WRITER_DATA (&nw->bw);
gst_bit_writer_init_with_size (&bw, size + nw->nal_prefix_size, FALSE);
for (i = 0; i < nw->nal_prefix_size - 1; i++)
gst_bit_writer_put_bits_uint8 (&bw, 0, 8);
gst_bit_writer_put_bits_uint8 (&bw, 1, 8);
for (i = 0; i < size; i++) {
guint pos = (GST_BIT_WRITER_BIT_SIZE (&bw) >> 3);
dst = GST_BIT_WRITER_DATA (&bw);
if (pos >= nw->nal_prefix_size + 2 &&
dst[pos - 2] == 0 && dst[pos - 1] == 0 && src[i] <= 0x3) {
gst_bit_writer_put_bits_uint8 (&bw, 0x3, 8);
}
gst_bit_writer_put_bits_uint8 (&bw, src[i], 8);
}
*ret_size = bw.bit_size >> 3;
data = gst_bit_writer_reset_and_get_data (&bw);
if (nw->packetized) {
size = *ret_size - nw->nal_prefix_size;
switch (nw->nal_prefix_size) {
case 1:
GST_WRITE_UINT8 (data, size);
break;
case 2:
GST_WRITE_UINT16_BE (data, size);
break;
case 3:
GST_WRITE_UINT24_BE (data, size);
break;
case 4:
GST_WRITE_UINT32_BE (data, size);
break;
default:
g_assert_not_reached ();
break;
}
}
return data;
}
GstMemory *
nal_writer_reset_and_get_memory (NalWriter * nw)
{
guint32 size = 0;
GstMemory *ret = NULL;
gpointer data;
g_return_val_if_fail (nw != NULL, NULL);
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) >> 3) == 0) {
GST_WARNING ("No written byte");
goto done;
}
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) & 0x7) != 0) {
GST_WARNING ("Written stream is not byte aligned");
if (!nal_writer_do_rbsp_trailing_bits (nw))
goto done;
}
data = nal_writer_create_nal_data (nw, &size);
if (!data) {
GST_WARNING ("Failed to create nal data");
goto done;
}
ret = gst_memory_new_wrapped (0, data, size, 0, size, data, g_free);
done:
gst_bit_writer_reset (&nw->bw);
return ret;
}
guint8 *
nal_writer_reset_and_get_data (NalWriter * nw, guint32 * ret_size)
{
guint32 size = 0;
guint8 *data = NULL;
g_return_val_if_fail (nw != NULL, NULL);
g_return_val_if_fail (ret_size != NULL, NULL);
*ret_size = 0;
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) >> 3) == 0) {
GST_WARNING ("No written byte");
goto done;
}
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) & 0x7) != 0) {
GST_WARNING ("Written stream is not byte aligned");
if (!nal_writer_do_rbsp_trailing_bits (nw))
goto done;
}
data = nal_writer_create_nal_data (nw, &size);
if (!data) {
GST_WARNING ("Failed to create nal data");
goto done;
}
*ret_size = size;
done:
gst_bit_writer_reset (&nw->bw);
return data;
}
gboolean
nal_writer_put_bits_uint8 (NalWriter * nw, guint8 value, guint nbits)
{
g_return_val_if_fail (nw != NULL, FALSE);
if (!gst_bit_writer_put_bits_uint8 (&nw->bw, value, nbits))
return FALSE;
return TRUE;
}
gboolean
nal_writer_put_bits_uint16 (NalWriter * nw, guint16 value, guint nbits)
{
g_return_val_if_fail (nw != NULL, FALSE);
if (!gst_bit_writer_put_bits_uint16 (&nw->bw, value, nbits))
return FALSE;
return TRUE;
}
gboolean
nal_writer_put_bits_uint32 (NalWriter * nw, guint32 value, guint nbits)
{
g_return_val_if_fail (nw != NULL, FALSE);
if (!gst_bit_writer_put_bits_uint32 (&nw->bw, value, nbits))
return FALSE;
return TRUE;
}
gboolean
nal_writer_put_bytes (NalWriter * nw, const guint8 * data, guint nbytes)
{
g_return_val_if_fail (nw != NULL, FALSE);
g_return_val_if_fail (data != NULL, FALSE);
g_return_val_if_fail (nbytes != 0, FALSE);
if (!gst_bit_writer_put_bytes (&nw->bw, data, nbytes))
return FALSE;
return TRUE;
}
gboolean
nal_writer_put_ue (NalWriter * nw, guint32 value)
{
guint leading_zeros;
guint rest;
g_return_val_if_fail (nw != NULL, FALSE);
count_exp_golomb_bits (value, &leading_zeros, &rest);
/* write leading zeros */
if (leading_zeros) {
if (!nal_writer_put_bits_uint32 (nw, 0, leading_zeros))
return FALSE;
}
/* write the rest */
if (!nal_writer_put_bits_uint32 (nw, value + 1, rest))
return FALSE;
return TRUE;
}
gboolean
count_exp_golomb_bits (guint32 value, guint * leading_zeros, guint * rest)
{
guint32 x;
guint count = 0;
/* https://en.wikipedia.org/wiki/Exponential-Golomb_coding */
/* count bits of value + 1 */
x = value + 1;
while (x) {
count++;
x >>= 1;
}
if (leading_zeros) {
if (count > 1)
*leading_zeros = count - 1;
else
*leading_zeros = 0;
}
if (rest) {
*rest = count;
}
return TRUE;
}
-269
View File
@@ -1,269 +0,0 @@
/* Gstreamer
* Copyright (C) <2011> Intel Corporation
* Copyright (C) <2011> Collabora Ltd.
* Copyright (C) <2011> Thibault Saunier <thibault.saunier@collabora.com>
*
* Some bits C-c,C-v'ed and s/4/3 from h264parse and videoparsers/h264parse.c:
* Copyright (C) <2010> Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk>
* Copyright (C) <2010> Collabora Multimedia
* Copyright (C) <2010> Nokia Corporation
*
* (C) 2005 Michal Benes <michal.benes@itonis.tv>
* (C) 2008 Wim Taymans <wim.taymans@gmail.com>
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
* Boston, MA 02110-1301, USA.
*/
/**
* Common code for NAL parsing from h264 and h265 parsers.
*/
#ifdef HAVE_CONFIG_H
# include "config.h"
#endif
#include <gst/base/gstbytereader.h>
#include <gst/base/gstbitreader.h>
#include <gst/base/gstbitwriter.h>
typedef struct
{
const guint8 *data;
guint size;
guint n_epb; /* Number of emulation prevention bytes */
guint byte; /* Byte position */
guint bits_in_cache; /* bitpos in the cache of next bit */
guint8 first_byte;
guint32 epb_cache; /* cache 3 bytes to check emulation prevention bytes */
guint64 cache; /* cached bytes */
} NalReader;
typedef struct
{
GstBitWriter bw;
guint nal_prefix_size;
gboolean packetized;
} NalWriter;
G_GNUC_INTERNAL
void nal_reader_init (NalReader * nr, const guint8 * data, guint size);
G_GNUC_INTERNAL
gboolean nal_reader_read (NalReader * nr, guint nbits);
G_GNUC_INTERNAL
gboolean nal_reader_skip (NalReader * nr, guint nbits);
G_GNUC_INTERNAL
gboolean nal_reader_skip_long (NalReader * nr, guint nbits);
G_GNUC_INTERNAL
guint nal_reader_get_pos (const NalReader * nr);
G_GNUC_INTERNAL
guint nal_reader_get_remaining (const NalReader * nr);
G_GNUC_INTERNAL
guint nal_reader_get_epb_count (const NalReader * nr);
G_GNUC_INTERNAL
gboolean nal_reader_is_byte_aligned (NalReader * nr);
G_GNUC_INTERNAL
gboolean nal_reader_has_more_data (NalReader * nr);
#define NAL_READER_READ_BITS_H(bits) \
G_GNUC_INTERNAL \
gboolean nal_reader_get_bits_uint##bits (NalReader *nr, guint##bits *val, guint nbits)
NAL_READER_READ_BITS_H (8);
NAL_READER_READ_BITS_H (16);
NAL_READER_READ_BITS_H (32);
#define NAL_READER_PEEK_BITS_H(bits) \
G_GNUC_INTERNAL \
gboolean nal_reader_peek_bits_uint##bits (const NalReader *nr, guint##bits *val, guint nbits)
NAL_READER_PEEK_BITS_H (8);
G_GNUC_INTERNAL
gboolean nal_reader_get_ue (NalReader * nr, guint32 * val);
G_GNUC_INTERNAL
gboolean nal_reader_get_se (NalReader * nr, gint32 * val);
#define CHECK_ALLOWED_MAX_WITH_DEBUG(dbg, val, max) { \
if (val > max) { \
GST_WARNING ("value for '" dbg "' greater than max. value: %d, max %d", \
val, max); \
goto error; \
} \
}
#define CHECK_ALLOWED_MAX(val, max) \
CHECK_ALLOWED_MAX_WITH_DEBUG (G_STRINGIFY (val), val, max)
#define CHECK_ALLOWED_WITH_DEBUG(dbg, val, min, max) { \
if (val < min || val > max) { \
GST_WARNING ("value for '" dbg "' not in allowed range. value: %d, range %d-%d", \
val, min, max); \
goto error; \
} \
}
#define CHECK_ALLOWED(val, min, max) \
CHECK_ALLOWED_WITH_DEBUG (G_STRINGIFY (val), val, min, max)
#define READ_UINT8(nr, val, nbits) { \
if (!nal_reader_get_bits_uint8 (nr, &val, nbits)) { \
GST_WARNING ("failed to read uint8 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define READ_UINT16(nr, val, nbits) { \
if (!nal_reader_get_bits_uint16 (nr, &val, nbits)) { \
GST_WARNING ("failed to read uint16 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define READ_UINT32(nr, val, nbits) { \
if (!nal_reader_get_bits_uint32 (nr, &val, nbits)) { \
GST_WARNING ("failed to read uint32 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define READ_UINT64(nr, val, nbits) { \
if (!nal_reader_get_bits_uint64 (nr, &val, nbits)) { \
GST_WARNING ("failed to read uint32 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define READ_UE(nr, val) { \
if (!nal_reader_get_ue (nr, &val)) { \
GST_WARNING ("failed to read UE for '" G_STRINGIFY (val) "'"); \
goto error; \
} \
}
#define READ_UE_ALLOWED(nr, val, min, max) { \
guint32 tmp; \
READ_UE (nr, tmp); \
CHECK_ALLOWED_WITH_DEBUG (G_STRINGIFY (val), tmp, min, max); \
val = tmp; \
}
#define READ_UE_MAX(nr, val, max) { \
guint32 tmp; \
READ_UE (nr, tmp); \
CHECK_ALLOWED_MAX_WITH_DEBUG (G_STRINGIFY (val), tmp, max); \
val = tmp; \
}
#define READ_SE(nr, val) { \
if (!nal_reader_get_se (nr, &val)) { \
GST_WARNING ("failed to read SE for '" G_STRINGIFY (val) "'"); \
goto error; \
} \
}
#define READ_SE_ALLOWED(nr, val, min, max) { \
gint32 tmp; \
READ_SE (nr, tmp); \
CHECK_ALLOWED_WITH_DEBUG (G_STRINGIFY (val), tmp, min, max); \
val = tmp; \
}
G_GNUC_INTERNAL
gint scan_for_start_codes (const guint8 * data, guint size);
G_GNUC_INTERNAL
void nal_writer_init (NalWriter * nw, guint nal_prefix_size, gboolean packetized);
G_GNUC_INTERNAL
void nal_writer_reset (NalWriter * nw);
G_GNUC_INTERNAL
gboolean nal_writer_do_rbsp_trailing_bits (NalWriter * nw);
G_GNUC_INTERNAL
GstMemory * nal_writer_reset_and_get_memory (NalWriter * nw);
G_GNUC_INTERNAL
guint8 * nal_writer_reset_and_get_data (NalWriter * nw, guint32 * ret_size);
G_GNUC_INTERNAL
gboolean nal_writer_put_bits_uint8 (NalWriter * nw, guint8 value, guint nbits);
G_GNUC_INTERNAL
gboolean nal_writer_put_bits_uint16 (NalWriter * nw, guint16 value, guint nbits);
G_GNUC_INTERNAL
gboolean nal_writer_put_bits_uint32 (NalWriter * nw, guint32 value, guint nbits);
G_GNUC_INTERNAL
gboolean nal_writer_put_bytes (NalWriter * nw, const guint8 * data, guint nbytes);
G_GNUC_INTERNAL
gboolean nal_writer_put_ue (NalWriter * nw, guint32 value);
G_GNUC_INTERNAL
gboolean count_exp_golomb_bits (guint32 value, guint * leading_zeros, guint * rest);
#define WRITE_UINT8(nw, val, nbits) { \
if (!nal_writer_put_bits_uint8 (nw, val, nbits)) { \
GST_WARNING ("failed to write uint8 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define WRITE_UINT16(nw, val, nbits) { \
if (!nal_writer_put_bits_uint16 (nw, val, nbits)) { \
GST_WARNING ("failed to write uint16 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define WRITE_UINT32(nw, val, nbits) { \
if (!nal_writer_put_bits_uint32 (nw, val, nbits)) { \
GST_WARNING ("failed to write uint32 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define WRITE_BYTES(nw, data, nbytes) { \
if (!nal_writer_put_bytes (nw, data, nbytes)) { \
GST_WARNING ("failed to write bytes for '" G_STRINGIFY (val) "', nbits: %d", nbytes); \
goto error; \
} \
}
#define WRITE_UE(nw, val) { \
if (!nal_writer_put_ue (nw, val)) { \
GST_WARNING ("failed to write ue for '" G_STRINGIFY (val) "'"); \
goto error; \
} \
}
static inline guint32 div_ceil (guint32 a, guint32 b)
{
/* http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html */
g_assert (b > 0);
return a / b + (a % b > 0);
}
-10
View File
@@ -1,10 +0,0 @@
/* Stub for <gst/glib-compat-private.h>.
* In upstream GStreamer this provides backwards-compat shims for older
* GLib versions (g_memdup2 polyfill being the load-bearing one).
* Our gst_compat.h already defines g_memdup2 as a static inline, so
* we just include the shim.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GLIB_COMPAT_PRIVATE_STUB
#define LIBVA_V4L2_REQUEST_FOURIER_GLIB_COMPAT_PRIVATE_STUB
#include "gst_compat.h"
#endif
-10
View File
@@ -1,10 +0,0 @@
/* Stub for <gst/gst.h> — redirects to the project's gst_compat shim.
* The vendored GStreamer 1.28.2 H.265 parser was originally built against
* full GStreamer; we only need the GLib type aliases + memory helpers +
* macro stubs, all provided by gst_compat.h. Original gst.h would pull
* in GObject + GstObject + the entire framework, which we don't link.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GST_H_STUB
#define LIBVA_V4L2_REQUEST_FOURIER_GST_H_STUB
#include "gst_compat.h"
#endif
-145
View File
@@ -1,145 +0,0 @@
/*
* gst_compat.c GArray implementation for the vendored GStreamer parser.
*
* Scope: minimal subset of GArray API exercised by gsth265parser.c
* (g_array_new, g_array_sized_new, g_array_append_vals + the
* g_array_append_val macro, g_array_index macro, g_array_set_size,
* g_array_set_clear_func, g_array_free, g_array_unref).
*
* Non-thread-safe (matches GArray's documented semantics GArray is
* not thread-safe in upstream GLib either, callers must serialize).
*
* License: MIT (matches backend's COPYING.MIT).
*/
#include "gst_compat.h"
/* ===== internal helpers ===== */
static gboolean
garray_grow(GArray *array, guint new_capacity)
{
if (new_capacity <= array->capacity)
return TRUE;
/* round up to next power of two for amortized O(1) growth */
guint cap = array->capacity > 0 ? array->capacity : 4;
while (cap < new_capacity)
cap *= 2;
char *new_data = realloc(array->data, (size_t)cap * array->element_size);
if (new_data == NULL)
return FALSE;
if (array->clear) {
memset(new_data + (size_t)array->capacity * array->element_size, 0,
(size_t)(cap - array->capacity) * array->element_size);
}
array->data = new_data;
array->capacity = cap;
return TRUE;
}
/* ===== public API ===== */
GArray *
g_array_sized_new(gboolean zero_terminated, gboolean clear,
guint element_size, guint reserved_size)
{
/* zero_terminated is GLib-specific (appends a zero-element sentinel
* for trailing-NULL semantics). The vendored parser does not use it;
* we ignore the flag. */
(void)zero_terminated;
GArray *a = calloc(1, sizeof(GArray));
if (a == NULL)
return NULL;
a->element_size = element_size;
a->clear = clear;
if (reserved_size > 0) {
if (!garray_grow(a, reserved_size)) {
free(a);
return NULL;
}
}
return a;
}
GArray *
g_array_new(gboolean zero_terminated, gboolean clear, guint element_size)
{
return g_array_sized_new(zero_terminated, clear, element_size, 0);
}
GArray *
g_array_set_size(GArray *array, guint length)
{
if (length > array->capacity) {
if (!garray_grow(array, length))
return array;
}
if (array->clear_func != NULL && length < array->len) {
for (guint i = length; i < array->len; i++)
array->clear_func(array->data + (size_t)i * array->element_size);
}
if (array->clear && length > array->len) {
memset(array->data + (size_t)array->len * array->element_size, 0,
(size_t)(length - array->len) * array->element_size);
}
array->len = length;
return array;
}
GArray *
g_array_append_vals(GArray *array, gconstpointer data, guint len)
{
if (len == 0)
return array;
if (!garray_grow(array, array->len + len))
return array;
memcpy(array->data + (size_t)array->len * array->element_size,
data, (size_t)len * array->element_size);
array->len += len;
return array;
}
void
g_array_set_clear_func(GArray *array, void (*clear_func)(gpointer))
{
array->clear_func = clear_func;
}
gchar *
g_array_free(GArray *array, gboolean free_segment)
{
if (array == NULL)
return NULL;
if (array->clear_func != NULL) {
for (guint i = 0; i < array->len; i++)
array->clear_func(array->data + (size_t)i * array->element_size);
}
gchar *data = NULL;
if (free_segment) {
free(array->data);
} else {
data = array->data;
}
free(array);
return data;
}
GArray *
g_array_unref(GArray *array)
{
/* simplified to free; the backend never sub-references shared GArrays */
g_array_free(array, TRUE);
return NULL;
}
-463
View File
@@ -1,463 +0,0 @@
/*
* gst_compat.h minimal GLib/GStreamer compatibility shim for vendored
* GStreamer 1.28.2 H.265 parser + bitreader + bytereader + nalutils.
*
* Strategy: provide #defines / typedefs for the GLib API surface those
* 4 vendored files use, so they can compile against libc + libv4l2 only
* (no glib2 / gst-base linkage). Vendored .c files are NOT modified
* directly; instead this header is force-included via the Makefile's
* `-include` flag on the vendored translation units.
*
* Coverage scoped to what gsth265parser.c + nalutils.c + gstbitreader.c
* + gstbytereader.c actually call. Surveyed in
* ampere-kernel-decoders phase4 step 2 prep see
* ~/src/ampere-kernel-decoders/phase4_plan_iter2.md and the survey
* commit message for the empirical inventory.
*
* License: this shim is original work, MIT (matching the backend's
* COPYING.MIT). The vendored .c files keep their LGPL v2.1+ headers
* verbatim.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GST_COMPAT_H
#define LIBVA_V4L2_REQUEST_FOURIER_GST_COMPAT_H
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* ===== GLib type aliases ===== */
typedef bool gboolean;
typedef char gchar;
typedef unsigned char guchar;
typedef int gint;
typedef int8_t gint8;
typedef int16_t gint16;
typedef int32_t gint32;
typedef int64_t gint64;
typedef unsigned int guint;
typedef uint8_t guint8;
typedef uint16_t guint16;
typedef uint32_t guint32;
typedef uint64_t guint64;
typedef size_t gsize;
typedef ptrdiff_t gssize;
typedef void * gpointer;
typedef const void * gconstpointer;
typedef double gdouble;
typedef float gfloat;
/* GLib's gint64 / guint64 formatting is platform-conditional; for our
* aarch64 ALARM target we don't need the full G_*_FORMAT machinery, but
* gstbytereader uses G_GSIZE_FORMAT in a debug-only printf. */
#define G_GSIZE_FORMAT "zu"
#ifndef TRUE
# define TRUE true
#endif
#ifndef FALSE
# define FALSE false
#endif
/* ===== memory ===== */
#define g_malloc(n) malloc((size_t)(n))
#define g_malloc0(n) calloc(1, (size_t)(n))
#define g_realloc(p, n) realloc((p), (size_t)(n))
/* g_free needs to be addressable (passed as a function-pointer arg by
* nalutils.c::gst_memory_new_wrapped even though that call site is
* dead code we don't invoke, it must compile). Plain `free` is
* compatible: signature is `void (void *)` either way. */
#define g_free free
#define g_new(type, n) ((type *)malloc(sizeof(type) * (size_t)(n)))
#define g_new0(type, n) ((type *)calloc((size_t)(n), sizeof(type)))
#define g_slice_new(type) ((type *)malloc(sizeof(type)))
#define g_slice_new0(type) ((type *)calloc(1, sizeof(type)))
#define g_slice_free(type, p) free(p)
#define g_slice_free1(size, p) free(p)
#define g_clear_pointer(pp, freefn) \
do { freefn(*(pp)); *(pp) = NULL; } while (0)
/* g_memdup2 — GLib's 64-bit-safe memdup, used by gstbytereader. */
static inline gpointer
g_memdup2(gconstpointer mem, gsize byte_size)
{
if (mem == NULL || byte_size == 0)
return NULL;
void *copy = malloc(byte_size);
if (copy != NULL)
memcpy(copy, mem, byte_size);
return copy;
}
/* g_strcmp0 — NULL-safe strcmp. Used by gsth265parser in profile-name lookup. */
static inline int
g_strcmp0(const char *a, const char *b)
{
if (a == b) return 0;
if (a == NULL) return -1;
if (b == NULL) return 1;
return strcmp(a, b);
}
/* ===== asserts / return-guards =====
*
* Per ampere-kernel-decoders iter2 Phase 2 §"new failure modes" #5:
* g_assert must NOT abort the process. It becomes a no-op here;
* malformed bitstream is caught by the explicit parse-result returns
* the parser already implements.
*
* g_return_if_fail / g_return_val_if_fail propagate as the original
* GLib semantics (early return with optional value). */
#define g_assert(cond) ((void)0)
#define g_assert_not_reached() __builtin_unreachable()
#define g_return_if_fail(cond) do { if (!(cond)) return; } while (0)
#define g_return_val_if_fail(cond, v) do { if (!(cond)) return (v); } while (0)
/* ===== GStreamer logging — no-ops =====
*
* The parser is heavy on debug logging. We compile all of it out;
* the backend's own logging (request_log/error_log) wraps the parser
* calls and reports parse-failure return codes from there. */
#define GST_DISABLE_GST_DEBUG 1
#define GST_DEBUG_CATEGORY_STATIC(name)
#define GST_DEBUG_CATEGORY_INIT(...) ((void)0)
#define GST_DEBUG_CATEGORY_GET(...) ((void)0)
#define GST_DEBUG(...) ((void)0)
#define GST_INFO(...) ((void)0)
#define GST_WARNING(...) ((void)0)
#define GST_ERROR(...) ((void)0)
#define GST_LOG(...) ((void)0)
#define GST_FIXME(...) ((void)0)
#define GST_MEMDUMP(...) ((void)0)
#define GST_CAT_DEFAULT (NULL)
/* ===== compiler / language helpers ===== */
#define G_LIKELY(x) __builtin_expect(!!(x), 1)
#define G_UNLIKELY(x) __builtin_expect(!!(x), 0)
#define G_GNUC_UNUSED __attribute__((unused))
#define G_GNUC_INTERNAL
#define G_GNUC_MALLOC __attribute__((malloc))
#define G_GNUC_NORETURN __attribute__((noreturn))
#define G_GNUC_DEPRECATED
#define G_GNUC_DEPRECATED_FOR(x)
#define G_GNUC_PURE __attribute__((pure))
#define G_GNUC_CONST __attribute__((const))
#define G_GNUC_PRINTF(a, b) __attribute__((format(printf, a, b)))
#define G_BEGIN_DECLS
#define G_END_DECLS
#define G_N_ELEMENTS(arr) (sizeof(arr) / sizeof((arr)[0]))
#define G_STMT_START do
#define G_STMT_END while (0)
#define G_STRINGIFY(x) G_STRINGIFY_(x)
#define G_STRINGIFY_(x) #x
/* GStreamer ABI-padding slot count; upstream uses 4 reserved gpointers
* at the end of public structs for future ABI extension. We replicate
* the size so struct layout matches what gst_byte_reader_init / friends
* write into. */
#define GST_PADDING 4
#define GST_PADDING_LARGE 20
/* Public-symbol visibility — backend's shared module uses
* -fvisibility=hidden, so we don't need to mark anything public from
* within the vendored parser. The original GST_*_API macros expand to
* extern + dllimport on Windows; on Linux ELF builds where
* fvisibility=hidden is active, they would mark public symbols. The
* vendored functions are never called from outside h265_parser/, so
* leaving these empty hides them automatically. */
#define GST_API
#define GST_API_EXPORT extern
#define GST_API_IMPORT extern
/* ===== Opaque GStreamer pipeline types =====
*
* GstBuffer + GstMemory are referenced by encoder-side dead-code
* functions in gsth265parser.c (gst_h265_parser_insert_sei_hevc).
* We never call those; declaring them as opaque structs lets the
* function pointers / declarations compile, and the linker keeps the
* dead-code .text section even though it's unreachable.
*
* If you ever need to actually USE GstBuffer in this tree, replace
* these opaque decls with the project's own buffer abstraction; do not
* try to vendor in libgst itself. */
typedef struct _GstBuffer GstBuffer;
typedef struct _GstMemory GstMemory;
typedef struct _GstMapInfo GstMapInfo; /* opaque — dead-code in gsth265parser SEI insert */
/* GLib min/max constants — dead-code unsigned-overflow guards in
* gsth265parser.c. */
#define G_MAXUINT8 ((guint8)0xFF)
#define G_MAXUINT16 ((guint16)0xFFFF)
#define G_MAXUINT32 ((guint32)0xFFFFFFFFU)
#define G_MAXUINT64 ((guint64)0xFFFFFFFFFFFFFFFFULL)
#define G_MAXINT8 ((gint8)0x7F)
#define G_MAXINT16 ((gint16)0x7FFF)
#define G_MAXINT32 ((gint32)0x7FFFFFFF)
#define G_MAXINT64 ((gint64)0x7FFFFFFFFFFFFFFFLL)
#define G_MININT8 ((gint8)(-0x80))
#define G_MININT16 ((gint16)(-0x8000))
#define G_MININT32 ((gint32)(-0x80000000))
#define G_MAXSIZE ((gsize)-1)
/* GLib function-pointer typedefs used by g_list_* APIs (which our
* gst_compat declares as abort-stubs). They show up in code paths
* we never invoke but must compile. */
typedef void (*GDestroyNotify)(gpointer data);
typedef int (*GCompareFunc)(gconstpointer a, gconstpointer b);
typedef int (*GCompareDataFunc)(gconstpointer a, gconstpointer b, gpointer user_data);
/* GstMapFlags — passed to gst_memory_map / gst_buffer_map. Dead-code. */
#define GST_MAP_READ (1 << 0)
#define GST_MAP_WRITE (1 << 1)
#define GST_MAP_READWRITE (GST_MAP_READ | GST_MAP_WRITE)
/* Dead-code stubs for buffer / memory mapping (only referenced by
* gst_h265_parser_insert_sei_hevc which we never call). The compile
* needs declarations + addressable functions; abort on call. */
static inline gboolean
gst_memory_map(GstMemory *mem G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED,
int flags G_GNUC_UNUSED) { abort(); }
static inline void
gst_memory_unmap(GstMemory *mem G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED) { abort(); }
static inline gboolean
gst_buffer_map(GstBuffer *buf G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED,
int flags G_GNUC_UNUSED) { abort(); }
static inline void
gst_buffer_unmap(GstBuffer *buf G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED) { abort(); }
static inline GstBuffer *
gst_buffer_new(void) { abort(); }
static inline gboolean
gst_buffer_copy_into(GstBuffer *dst G_GNUC_UNUSED, GstBuffer *src G_GNUC_UNUSED,
int flags G_GNUC_UNUSED, gsize offset G_GNUC_UNUSED,
gssize size G_GNUC_UNUSED) { abort(); }
static inline void
gst_buffer_append_memory(GstBuffer *buf G_GNUC_UNUSED, GstMemory *mem G_GNUC_UNUSED) { abort(); }
static inline GstMemory *
gst_memory_ref(GstMemory *mem G_GNUC_UNUSED) { abort(); }
static inline void
gst_memory_unref(GstMemory *mem G_GNUC_UNUSED) { abort(); }
static inline GstMemory *
gst_memory_copy(GstMemory *mem G_GNUC_UNUSED, gssize offset G_GNUC_UNUSED, gssize size G_GNUC_UNUSED) { abort(); }
static inline void
gst_clear_buffer(GstBuffer **buf) { *buf = NULL; }
#define GST_IS_BUFFER(b) (false)
/* GstBufferCopyFlags — used only by gst_buffer_copy_into in dead code. */
#define GST_BUFFER_COPY_METADATA (1 << 0)
#define GST_BUFFER_COPY_MEMORY (1 << 1)
#define GST_BUFFER_COPY_DEEP (1 << 2)
/* gst_util_ceil_log2(n) — ceil(log2(n)) for non-zero unsigned n.
* Used by gsth265parser.c::gst_h265_slice_parse_ref_pic_list_modification.
* That function is in the slice-header parser which the libva backend
* does NOT invoke (we only call parse_sps) but the linker still
* needs a definition. Provide a real impl: cheaper to compute than to
* justify a dead-code stub at every call site. */
static inline guint
gst_util_ceil_log2(guint32 n)
{
if (n <= 1) return 0;
/* __builtin_clz returns leading zeros for a 32-bit value;
* 32 - clz(n-1) = bits needed = ceil(log2(n)). */
return 32 - (guint)__builtin_clz(n - 1);
}
/* GstMapInfo's real definition is in <gst/gstmemory.h>; we need at
* least enough to make `info->data` / `info->size` compile. */
struct _GstMapInfo {
GstMemory *memory;
int flags;
guint8 *data;
gsize size;
gsize maxsize;
gpointer user_data[4];
gpointer _gst_reserved[GST_PADDING];
};
/* gst_memory_new_wrapped — dead-code stub (nalutils.c calls it from
* the SEI-insertion path the libva backend never invokes). */
static inline GstMemory *
gst_memory_new_wrapped(int flags, gpointer data, gsize maxsize,
gsize offset, gsize size, gpointer user_data,
void (*notify)(gpointer))
{
(void)flags; (void)data; (void)maxsize; (void)offset; (void)size;
(void)user_data; (void)notify;
abort();
}
/* ===== byte-order read / write macros =====
*
* GStreamer provides these as static-inline functions in
* <gst/gstutils.h>. We re-implement for aarch64 little-endian; the
* parser is byte-stream input, so endian-conversion is mechanical.
* The float / double variants are present in upstream but the parser
* never invokes them provide stubs so the address-taking sites in
* gstbytereader.h's function table compile. */
#define GST_READ_UINT8(data) \
(*((const guint8 *)(data)))
#define GST_READ_UINT16_LE(data) ( \
((guint16)((const guint8 *)(data))[0]) | \
((guint16)((const guint8 *)(data))[1] << 8))
#define GST_READ_UINT16_BE(data) ( \
((guint16)((const guint8 *)(data))[0] << 8) | \
((guint16)((const guint8 *)(data))[1]))
#define GST_READ_UINT24_LE(data) ( \
((guint32)((const guint8 *)(data))[0]) | \
((guint32)((const guint8 *)(data))[1] << 8) | \
((guint32)((const guint8 *)(data))[2] << 16))
#define GST_READ_UINT24_BE(data) ( \
((guint32)((const guint8 *)(data))[0] << 16) | \
((guint32)((const guint8 *)(data))[1] << 8) | \
((guint32)((const guint8 *)(data))[2]))
#define GST_READ_UINT32_LE(data) ( \
((guint32)((const guint8 *)(data))[0]) | \
((guint32)((const guint8 *)(data))[1] << 8) | \
((guint32)((const guint8 *)(data))[2] << 16) | \
((guint32)((const guint8 *)(data))[3] << 24))
#define GST_READ_UINT32_BE(data) ( \
((guint32)((const guint8 *)(data))[0] << 24) | \
((guint32)((const guint8 *)(data))[1] << 16) | \
((guint32)((const guint8 *)(data))[2] << 8) | \
((guint32)((const guint8 *)(data))[3]))
#define GST_READ_UINT64_LE(data) ( \
((guint64)((const guint8 *)(data))[0]) | \
((guint64)((const guint8 *)(data))[1] << 8) | \
((guint64)((const guint8 *)(data))[2] << 16) | \
((guint64)((const guint8 *)(data))[3] << 24) | \
((guint64)((const guint8 *)(data))[4] << 32) | \
((guint64)((const guint8 *)(data))[5] << 40) | \
((guint64)((const guint8 *)(data))[6] << 48) | \
((guint64)((const guint8 *)(data))[7] << 56))
#define GST_READ_UINT64_BE(data) ( \
((guint64)((const guint8 *)(data))[0] << 56) | \
((guint64)((const guint8 *)(data))[1] << 48) | \
((guint64)((const guint8 *)(data))[2] << 40) | \
((guint64)((const guint8 *)(data))[3] << 32) | \
((guint64)((const guint8 *)(data))[4] << 24) | \
((guint64)((const guint8 *)(data))[5] << 16) | \
((guint64)((const guint8 *)(data))[6] << 8) | \
((guint64)((const guint8 *)(data))[7]))
/* Float / double readers — dead-code, abort if called. The function
* table in gstbytereader.h takes the address of the underlying inline
* which we don't need to be functional, only addressable. */
static inline gfloat
GST_READ_FLOAT_LE(const guint8 *data) { (void)data; abort(); }
static inline gfloat
GST_READ_FLOAT_BE(const guint8 *data) { (void)data; abort(); }
static inline gdouble
GST_READ_DOUBLE_LE(const guint8 *data) { (void)data; abort(); }
static inline gdouble
GST_READ_DOUBLE_BE(const guint8 *data) { (void)data; abort(); }
/* Write side — nalutils.c writes-out SEI bytes (dead path for us but
* must compile). */
#define GST_WRITE_UINT8(data, val) do { \
((guint8 *)(data))[0] = (guint8)(val); \
} while (0)
#define GST_WRITE_UINT16_BE(data, val) do { \
((guint8 *)(data))[0] = (guint8)((val) >> 8); \
((guint8 *)(data))[1] = (guint8)((val)); \
} while (0)
#define GST_WRITE_UINT24_BE(data, val) do { \
((guint8 *)(data))[0] = (guint8)((val) >> 16); \
((guint8 *)(data))[1] = (guint8)((val) >> 8); \
((guint8 *)(data))[2] = (guint8)((val)); \
} while (0)
#define GST_WRITE_UINT32_BE(data, val) do { \
((guint8 *)(data))[0] = (guint8)((val) >> 24); \
((guint8 *)(data))[1] = (guint8)((val) >> 16); \
((guint8 *)(data))[2] = (guint8)((val) >> 8); \
((guint8 *)(data))[3] = (guint8)((val)); \
} while (0)
#ifndef MIN
# define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif
#ifndef MAX
# define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif
/* ===== GArray ===== */
typedef struct {
char *data; /* exposed via g_array_index / GArray->data */
guint len; /* element count */
guint capacity; /* allocated element slots */
guint element_size;
gboolean clear; /* zero-fill on grow */
void (*clear_func)(gpointer);
} GArray;
GArray *g_array_new(gboolean zero_terminated, gboolean clear, guint element_size);
GArray *g_array_sized_new(gboolean zero_terminated, gboolean clear,
guint element_size, guint reserved_size);
GArray *g_array_set_size(GArray *array, guint length);
GArray *g_array_append_vals(GArray *array, gconstpointer data, guint len);
void g_array_set_clear_func(GArray *array, void (*clear_func)(gpointer));
gchar *g_array_free(GArray *array, gboolean free_segment);
GArray *g_array_unref(GArray *array);
#define g_array_append_val(a, v) g_array_append_vals((a), &(v), 1)
#define g_array_index(a, t, i) (((t *)(void *)(a)->data)[i])
/* ===== GList — stubs that abort if reached =====
*
* Surveyed call sites: gsth265parser.c uses g_list_prepend / g_list_sort /
* g_list_free_full in code paths the libva backend does not invoke for
* basic SPS parsing (likely SEI message accumulation). Stub to abort so
* any future call surfaces immediately rather than silently corrupting. */
/* GList — full struct (not opaque) so callers can do `list->data`.
* The functions still abort because we never construct a GList. */
typedef struct _GList GList;
struct _GList {
gpointer data;
GList *next;
GList *prev;
};
static inline GList *g_list_prepend(GList *list G_GNUC_UNUSED, gpointer data G_GNUC_UNUSED) { abort(); }
static inline GList *g_list_sort(GList *list G_GNUC_UNUSED, int (*cmp)(gconstpointer, gconstpointer) G_GNUC_UNUSED) { abort(); }
static inline void g_list_free_full(GList *list G_GNUC_UNUSED, void (*free_func)(gpointer) G_GNUC_UNUSED) { abort(); }
/* ===== g_once_init_enter / g_once_init_leave =====
*
* GLib's lazy-init guards. The parser uses these for one-shot static
* initialization (e.g. profile-name table). Our backend is single-
* threaded at the parser-init site (driver_init), so we can simplify
* to a plain run-once gate. */
#define g_once_init_enter(loc) (*(loc) == 0)
#define g_once_init_leave(loc, val) (*(loc) = (val))
/* ===== conversions ===== */
#define GINT_TO_POINTER(i) ((gpointer)(uintptr_t)(gint)(i))
#define GPOINTER_TO_INT(p) ((gint)(uintptr_t)(p))
#endif /* LIBVA_V4L2_REQUEST_FOURIER_GST_COMPAT_H */
-90
View File
@@ -1,90 +0,0 @@
/*
* v4l2-hevc-ext-controls.h verbatim mirror of Linux 7.0+ V4L2 stateless
* HEVC extended-SPS RPS control definitions, shipped as an internal
* header so this libva backend can be built against pre-7.0
* linux-api-headers packages (currently ampere ships 6.19-1).
*
* Upstream source: linux kernel, include/uapi/linux/v4l2-controls.h
* As-of: Linux 7.0-rc3 (Detlev Casanova / Collabora "VDPU381/VDPU383"
* series, see lkml.org/lkml/2026/1/9/1334). The two CIDs + two structs
* + two flag macros below are byte-for-byte the kernel UAPI definitions.
*
* Once linux-api-headers >= 7.0 is the floor across the fleet, this
* shim becomes redundant `<linux/v4l2-controls.h>` will provide the
* same symbols. The include order in h265.c is: this header BEFORE
* <linux/v4l2-controls.h>, so when the system catches up, the macro
* guards below silently no-op and we use the system definitions.
*
* License: MIT (matches backend's COPYING.MIT). Per LGPL § 3.b., the
* kernel UAPI struct definitions themselves are excepted from the
* kernel's overall GPL and may be copied verbatim into userspace
* binaries without inheriting GPL.
*
* Rationale + iter2 plan: see
* ~/src/ampere-kernel-decoders/phase4_plan_iter2.md (§Step 3)
* ~/src/ampere-kernel-decoders/phase0_findings_iter2.md
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_V4L2_HEVC_EXT_CONTROLS_H
#define LIBVA_V4L2_REQUEST_FOURIER_V4L2_HEVC_EXT_CONTROLS_H
#include <linux/types.h>
#include <linux/v4l2-controls.h>
#ifndef V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS
# define V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS \
(V4L2_CID_CODEC_STATELESS_BASE + 408)
#endif
#ifndef V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS
# define V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS \
(V4L2_CID_CODEC_STATELESS_BASE + 409)
#endif
#ifndef V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED
# define V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED 0x1
#endif
#ifndef V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT
# define V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT 0x1
#endif
/*
* struct v4l2_ctrl_hevc_ext_sps_st_rps HEVC short-term RPS parameters.
*
* Dynamic-size 1-dimension array. Number of elements is
* v4l2_ctrl_hevc_sps::num_short_term_ref_pic_sets
* Can contain up to 65 elements (the H.265 spec § 7.4.3.2.1 maximum).
*/
#ifndef V4L2_HEVC_EXT_SPS_ST_RPS_DEFINED
# define V4L2_HEVC_EXT_SPS_ST_RPS_DEFINED 1
struct v4l2_ctrl_hevc_ext_sps_st_rps {
__u8 delta_idx_minus1;
__u8 delta_rps_sign;
__u8 num_negative_pics;
__u8 num_positive_pics;
__u32 used_by_curr_pic;
__u32 use_delta_flag;
__u16 abs_delta_rps_minus1;
__u16 delta_poc_s0_minus1[16];
__u16 delta_poc_s1_minus1[16];
__u16 flags;
};
#endif
/*
* struct v4l2_ctrl_hevc_ext_sps_lt_rps HEVC long-term RPS parameters.
*
* Dynamic-size 1-dimension array. Number of elements is
* v4l2_ctrl_hevc_sps::num_long_term_ref_pics_sps
* Can contain up to 33 elements (the H.265 spec § 7.4.3.2.1 maximum).
*/
#ifndef V4L2_HEVC_EXT_SPS_LT_RPS_DEFINED
# define V4L2_HEVC_EXT_SPS_LT_RPS_DEFINED 1
struct v4l2_ctrl_hevc_ext_sps_lt_rps {
__u16 lt_ref_pic_poc_lsb_sps;
__u16 flags;
};
#endif
#endif /* LIBVA_V4L2_REQUEST_FOURIER_V4L2_HEVC_EXT_CONTROLS_H */
+20 -269
View File
@@ -31,16 +31,8 @@
#include "video.h"
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>
#include "nv15.h"
#include "nv12_col128.h"
#include "tiled_yuv.h"
#include "utils.h"
#include "v4l2.h"
@@ -48,6 +40,7 @@
VAStatus RequestCreateImage(VADriverContextP context, VAImageFormat *format,
int width, int height, VAImage *image)
{
request_log("ENTER RequestCreateImage\n");
struct request_data *driver_data = context->pDriverData;
unsigned int destination_sizes[VIDEO_MAX_PLANES];
unsigned int destination_bytesperlines[VIDEO_MAX_PLANES];
@@ -88,50 +81,13 @@ VAStatus RequestCreateImage(VADriverContextP context, VAImageFormat *format,
for (i = 0; i < planes_count; i++)
size += destination_sizes[i];
if (format->fourcc == VA_FOURCC_P010) {
/*
* iter39: P010 image overrides V4L2-side NV15 sizing. The
* source is the kernel-reported NV15 packed plane; the image
* buffer holds dense P010 (2 bytes per pixel, 16bpp).
* Recompute sizes/pitches against P010 layout so consumers
* (vaGetImage, vaDeriveImage) see standard P010 geometry.
*/
destination_bytesperlines[0] = width * 2;
destination_sizes[0] = destination_bytesperlines[0] * format_height;
for (i = 1; i < destination_planes_count; i++) {
destination_bytesperlines[i] = destination_bytesperlines[0];
destination_sizes[i] = destination_sizes[0] / 2;
}
size = 0;
for (i = 0; i < destination_planes_count; i++)
size += destination_sizes[i];
} else if (format->fourcc == VA_FOURCC_NV12 &&
video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128) {
/*
* iter40 Phase 5 review F2: NC12 source, NV12 image output.
* V4L2-reported destination_bytesperlines[0] is the NC12
* column stride (= ALIGN(height,8) * 3/2 e.g. 1080 for
* 1280×720), NOT the linear NV12 Y stride. Override to the
* linear stride (width) so VAImage pitches reflect the
* detile-output layout the consumer reads.
*/
destination_bytesperlines[0] = width;
destination_sizes[0] = destination_bytesperlines[0] * format_height;
for (i = 1; i < destination_planes_count; i++) {
destination_bytesperlines[i] = destination_bytesperlines[0];
destination_sizes[i] = destination_sizes[0] / 2;
}
size = 0;
for (i = 0; i < destination_planes_count; i++)
size += destination_sizes[i];
} else {
/* NV12: V4L2 stride is correct, sizes derived from height. */
destination_sizes[0] = destination_bytesperlines[0] * format_height;
/* Here we calculate the sizes assuming NV12. */
for (i = 1; i < destination_planes_count; i++) {
destination_bytesperlines[i] = destination_bytesperlines[0];
destination_sizes[i] = destination_sizes[0] / 2;
}
destination_sizes[0] = destination_bytesperlines[0] * format_height;
for (i = 1; i < destination_planes_count; i++) {
destination_bytesperlines[i] = destination_bytesperlines[0];
destination_sizes[i] = destination_sizes[0] / 2;
}
id = object_heap_allocate(&driver_data->image_heap);
@@ -170,7 +126,7 @@ VAStatus RequestCreateImage(VADriverContextP context, VAImageFormat *format,
VAStatus RequestDestroyImage(VADriverContextP context, VAImageID image_id)
{
request_log("ENTER RequestDestroyImage\n");
struct request_data *driver_data = context->pDriverData;
struct object_image *image_object;
VAStatus status;
@@ -195,162 +151,23 @@ static VAStatus copy_surface_to_image (struct request_data *driver_data,
{
struct object_buffer *buffer_object;
unsigned int i;
int sync_fds[VIDEO_MAX_PLANES];
unsigned int n_sync_fds = 0;
buffer_object = BUFFER(driver_data, image->buf);
if (buffer_object == NULL)
return VA_STATUS_ERROR_INVALID_BUFFER;
for (i = 0; i < VIDEO_MAX_PLANES; i++)
sync_fds[i] = -1;
/*
* iter13 α-17: explicit cache sync around the CAPTURE buffer read.
*
* The CAPTURE buffer is V4L2_MEMORY_MMAP and was mapped at
* cap_pool_init time with cached attributes. Kernel decode writes to
* the buffer via DMA, which doesn't propagate to the CPU's cache
* observer for that virtual mapping. Reading from
* surface_object->destination_data[] without an explicit cache
* invalidation returns stale data observed empirically as Bug 4
* (H.264 partial-fill) and Bug 5 (HEVC all-zero) when libva went
* through the SAME readback path that kdirect ffmpeg-v4l2request +
* DRM_PRIME-mmap successfully reads (kdirect's drm-prime mmap
* implicitly handles sync).
*
* DMA_BUF_IOCTL_SYNC(START | READ) makes the CPU mapping coherent
* with the producing engine's writes; END releases the sync.
* Per V4L2 + dma-buf spec, this is the userspace contract for
* cached-mmap'd buffers (Tomasz Figa, linaro-mm-sig 2024-07-11).
*
* Requires a dma-buf fd: get one via VIDIOC_EXPBUF, sync, close.
* Per-call cost is one ioctl pair + one fd open/close per plane.
* Could be optimised by caching the EXPBUF fd on the cap_pool slot,
* but doing it just-in-time keeps the lifecycle uncomplicated. The
* EXPBUF fd's dup count doesn't affect the V4L2 buffer's underlying
* pages; closing the fd is a no-op on memory.
*
* If EXPBUF fails (e.g., consumer-held EXPBUF prevents a second one
* only true for hantro G1 oddity), we skip the sync silently. The
* existing pre-iter13 behavior is preserved on the error path.
*/
if (surface_object->current_slot != NULL &&
driver_data->video_format != NULL) {
unsigned int capture_type =
v4l2_type_video_capture(driver_data->video_format->v4l2_mplane);
if (v4l2_export_buffer(driver_data->video_fd, capture_type,
surface_object->destination_index,
O_RDONLY, sync_fds,
surface_object->destination_buffers_count) >= 0) {
n_sync_fds = surface_object->destination_buffers_count;
for (i = 0; i < n_sync_fds; i++) {
struct dma_buf_sync s = {
.flags = DMA_BUF_SYNC_START |
DMA_BUF_SYNC_READ,
};
/* failure is non-fatal: we continue with the read */
(void)ioctl(sync_fds[i], DMA_BUF_IOCTL_SYNC, &s);
}
}
}
for (i = 0; i < surface_object->destination_planes_count; i++) {
/*
* iter40 Phase 5 review F1: guard extended from __arm__ to
* __arm__ || __aarch64__. Without this, the detile primitives
* silently compiled out on aarch64 (fresnel RK3399, ampere
* RK3588, higgs Pi CM5) and the memcpy fall-through delivered
* raw tiled bytes to NV12/P010 image consumers. iter39 5/5
* PASS masked the issue because no 10-bit path was exercised.
*/
#if defined(__arm__) || defined(__aarch64__)
/*
* Sunxi tiled_to_planar lives in tiled_yuv.S which is
* #ifdef __arm__ symbol absent on aarch64. Keep this
* branch arm-only; aarch64 Sunxi support would need a C or
* aarch64-ASM port (no Sunxi aarch64 board in current fleet).
*/
#if defined(__arm__)
if (!video_format_is_linear(driver_data->video_format))
tiled_to_planar(surface_object->destination_data[i],
buffer_object->data + image->offsets[i],
image->pitches[i], image->width,
i == 0 ? image->height :
image->height / 2);
else
#endif
if (driver_data->is_10bit &&
image->format.fourcc == VA_FOURCC_P010) {
/*
* iter39: rkvdec emits NV15 (4×10-bit packed in 5
* bytes); the VA image buffer is dense P010 (2B/pixel,
* value in bits[15:6]). Source stride is the V4L2-
* reported NV15 bytesperline (= ceil(width/4)*5,
* possibly aligned higher by the kernel); destination
* stride is image->pitches[i] = width * 2.
*/
unsigned int plane_h = (i == 0) ? image->height
: image->height / 2;
nv15_unpack_plane_to_p010(
surface_object->destination_data[i],
(uint16_t *)(buffer_object->data + image->offsets[i]),
image->width, plane_h,
surface_object->destination_bytesperlines[i]);
} else if (driver_data->video_format != NULL &&
driver_data->video_format->v4l2_format ==
V4L2_PIX_FMT_NV12_COL128 &&
image->format.fourcc == VA_FOURCC_NV12) {
/*
* iter40: Pi 5 rpi-hevc-dec emits NV12_COL128 (SAND
* 128-pixel-wide column tiles). Detile to linear NV12
* via the per-plane primitive. surface_object->
* destination_data[i] is the V4L2 CAPTURE mmap (single
* buffer, planes_count==2): i==0 is the Y plane base,
* i==1 is the UV plane base offset within the SAME
* physical buffer (per cap_pool plane[1] offset = Y
* plane size in COL128 layout).
*
* src_col_stride = destination_bytesperlines[i] = the
* kernel-reported NC12 bytesperline (column stride,
* = ALIGN(image_h, 8) * 3/2). Same for both planes
* since column geometry is plane-agnostic.
*
* dst stride is image->pitches[i] = image->width
* (overridden in RequestCreateImage NC12 branch below).
*/
if (i == 0) {
nv12_col128_detile_y(
(uint8_t *)(buffer_object->data + image->offsets[i]),
image->pitches[i],
surface_object->destination_data[i],
surface_object->destination_bytesperlines[i],
image->width, image->height);
} else {
nv12_col128_detile_uv(
(uint8_t *)(buffer_object->data + image->offsets[i]),
image->pitches[i],
surface_object->destination_data[i],
surface_object->destination_bytesperlines[i],
image->width, image->height / 2);
}
} else {
#endif
else {
memcpy(buffer_object->data + image->offsets[i],
surface_object->destination_data[i],
surface_object->destination_sizes[i]);
#if defined(__arm__) || defined(__aarch64__)
}
#endif
}
/* iter13 α-17: release cache sync. END pairs with each START. */
for (i = 0; i < n_sync_fds; i++) {
struct dma_buf_sync s = {
.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ,
};
(void)ioctl(sync_fds[i], DMA_BUF_IOCTL_SYNC, &s);
close(sync_fds[i]);
}
return VA_STATUS_SUCCESS;
@@ -359,13 +176,13 @@ static VAStatus copy_surface_to_image (struct request_data *driver_data,
VAStatus RequestDeriveImage(VADriverContextP context, VASurfaceID surface_id,
VAImage *image)
{
request_log("ENTER RequestDeriveImage\n");
struct request_data *driver_data = context->pDriverData;
struct object_surface *surface_object;
struct object_buffer *buffer_object;
VAImageFormat format;
VAStatus status;
surface_object = SURFACE(driver_data, surface_id);
if (surface_object == NULL)
return VA_STATUS_ERROR_INVALID_SURFACE;
@@ -376,41 +193,16 @@ VAStatus RequestDeriveImage(VADriverContextP context, VASurfaceID surface_id,
return status;
}
/* Fully populate VAImageFormat to match QueryImageFormats output. */
memset(&format, 0, sizeof(format));
if (driver_data->is_10bit) {
/* iter39: 10-bit session derives a P010 image. NV15-source
* unpack happens in copy_surface_to_image. */
format.fourcc = VA_FOURCC_P010;
format.byte_order = VA_LSB_FIRST;
format.bits_per_pixel = 24;
} else {
format.fourcc = VA_FOURCC_NV12;
format.byte_order = VA_LSB_FIRST;
format.bits_per_pixel = 12;
}
format.fourcc = VA_FOURCC_NV12;
status = RequestCreateImage(context, &format, surface_object->width,
surface_object->height, image);
if (status != VA_STATUS_SUCCESS)
return status;
/*
* Iter2 Fix 3: skip the surfaceimage copy when no CAPTURE slot is
* bound. ffmpeg's av_hwframe_ctx_init probes vaDeriveImage on a
* never-decoded surface to learn the format; it doesn't read the
* data. With the cap_pool decoupling, destination_data[] is NULL
* until BeginPicture binds a slot copying from a NULL source
* crashed in memcpy. The image's buffer remains zero-initialized;
* subsequent post-decode DeriveImage on the same surface (after
* BeginPicture has bound a slot) does the real copy.
*/
if (surface_object->current_slot != NULL) {
status = copy_surface_to_image (driver_data, surface_object,
image);
if (status != VA_STATUS_SUCCESS)
return status;
}
status = copy_surface_to_image (driver_data, surface_object, image);
if (status != VA_STATUS_SUCCESS)
return status;
surface_object->status = VASurfaceReady;
@@ -423,52 +215,9 @@ VAStatus RequestDeriveImage(VADriverContextP context, VASurfaceID surface_id,
VAStatus RequestQueryImageFormats(VADriverContextP context,
VAImageFormat *formats, int *formats_count)
{
struct request_data *driver_data = context->pDriverData;
int n = 0;
/*
* Populate the VAImageFormat fully per VAAPI spec not just
* .fourcc. Consumers (FFmpeg's hwcontext_vaapi, mpv, Firefox)
* read .byte_order and .bits_per_pixel; leaving them
* uninitialized inherits caller-stack garbage and produces
* non-deterministic behavior. Reference: Mesa's
* gallium/frontends/va/image.c::vlVaQueryImageFormats and
* intel-vaapi-driver's i965_drv_video.c.
*
* iter39: advertise P010 when an active session is 10-bit so
* ffmpeg-vaapi sees a valid 10-bit-compatible entry during
* vaQueryImageFormats. NV12 stays advertised unconditionally so
* the 8-bit catalog query response is unchanged.
*/
memset(&formats[n], 0, sizeof(formats[n]));
formats[n].fourcc = VA_FOURCC_NV12;
formats[n].byte_order = VA_LSB_FIRST;
formats[n].bits_per_pixel = 12;
n++;
/*
* iter39 Option B revert (2026-05-17): P010 advertisement is
* gated on driver_data->is_10bit again. Previously advertised
* unconditionally (63fed87) so ffmpeg-vaapi's early
* vaQueryImageFormats (pre-vaCreateContext) could see it for
* 10-bit profiles but that broke HEVC 8-bit on fresnel:
* ffmpeg-vaapi picked P010 for the HEVC hwframe pool, EndPicture
* SEGV'd in the .so when the consumer-side P010 expectations met
* an 8-bit NV12 CAPTURE buffer.
* Safe because Option B drops VAProfileHEVCMain10 + Hi10P from
* enumeration no 10-bit decode pipeline will reach this catalog
* query so the gate-on-is_10bit (which stays false for 8-bit
* profiles) correctly returns NV12-only.
*/
if (driver_data->is_10bit && n < V4L2_REQUEST_MAX_IMAGE_FORMATS) {
memset(&formats[n], 0, sizeof(formats[n]));
formats[n].fourcc = VA_FOURCC_P010;
formats[n].byte_order = VA_LSB_FIRST;
formats[n].bits_per_pixel = 24;
n++;
}
*formats_count = n;
request_log("ENTER RequestQueryImageFormats\n");
formats[0].fourcc = VA_FOURCC_NV12;
*formats_count = 1;
return VA_STATUS_SUCCESS;
}
@@ -476,6 +225,7 @@ VAStatus RequestQueryImageFormats(VADriverContextP context,
VAStatus RequestSetImagePalette(VADriverContextP context, VAImageID image_id,
unsigned char *palette)
{
request_log("ENTER RequestSetImagePalette\n");
return VA_STATUS_ERROR_UNIMPLEMENTED;
}
@@ -483,12 +233,12 @@ VAStatus RequestGetImage(VADriverContextP context, VASurfaceID surface_id,
int x, int y, unsigned int width, unsigned int height,
VAImageID image_id)
{
request_log("ENTER RequestGetImage\n");
struct request_data *driver_data = context->pDriverData;
struct object_surface *surface_object;
struct object_image *image_object;
VAImage *image;
surface_object = SURFACE(driver_data, surface_id);
if (surface_object == NULL)
return VA_STATUS_ERROR_INVALID_SURFACE;
@@ -510,5 +260,6 @@ VAStatus RequestPutImage(VADriverContextP context, VASurfaceID surface_id,
int dst_x, int dst_y, unsigned int dst_width,
unsigned int dst_height)
{
request_log("ENTER RequestPutImage\n");
return VA_STATUS_ERROR_UNIMPLEMENTED;
}
+8 -8
View File
@@ -26,7 +26,7 @@
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <poll.h>
#include <sys/select.h>
#include <linux/media.h>
@@ -78,19 +78,19 @@ int media_request_queue(int request_fd)
int media_request_wait_completion(int request_fd)
{
/* poll() instead of select(): Firefox's RDD seccomp policy admits
* poll/ppoll but not select/pselect6 (as of FF150). Functionally
* equivalent here the media request fd signals completion via
* exceptional condition, mapped to POLLPRI for poll(). */
struct pollfd pfd = { .fd = request_fd, .events = POLLPRI };
struct timeval tv = { 0, 300000 };
fd_set except_fds;
int rc;
rc = poll(&pfd, 1, 300 /* ms */);
FD_ZERO(&except_fds);
FD_SET(request_fd, &except_fds);
rc = select(request_fd + 1, NULL, NULL, &except_fds, &tv);
if (rc == 0) {
request_log("Timeout when waiting for media request\n");
return -1;
} else if (rc < 0) {
request_log("Unable to poll media request: %s\n",
request_log("Unable to select media request: %s\n",
strerror(errno));
return -1;
}
+3 -53
View File
@@ -22,9 +22,6 @@
autoconf_data = configuration_data()
autoconf_data.set('VA_DRIVER_INIT_FUNC', va_driver_init_func)
if get_option('daedalus_v4l2')
autoconf_data.set('HAVE_DAEDALUS_V4L2', 1)
endif
autoconf = configure_file(
output: 'autoconfig.h',
@@ -47,25 +44,7 @@ sources = [
'v4l2.c',
'mpeg2.c',
'h264.c',
'h264_slice_header.c',
'request_pool.c',
'cap_pool.c',
'h265.c',
'vp8.c',
'vp9.c',
'av1.c',
'codec.c',
'nv15.c',
'nv12_col128.c',
# Vendored GStreamer 1.28.2 H.265 parser + utilities (LGPL v2.1+,
# see src/h265_parser/gst_compat.h for sourcing notes + per-iter2
# adaptation strategy).
'h265_parser/gst_compat.c',
'h265_parser/gst/base/gstbitreader.c',
'h265_parser/gst/base/gstbytereader.c',
'h265_parser/gst/codecparsers/nalutils.c',
'h265_parser/gst/codecparsers/gsth265parser.c'
# 'h265.c' # Fourier-local: HEVC stripped (see commit log)
]
headers = [
@@ -85,40 +64,11 @@ headers = [
'v4l2.h',
'mpeg2.h',
'h264.h',
'h264_slice_header.h',
'request_pool.h',
'cap_pool.h',
'h265.h',
'vp8.h',
'vp9.h',
'codec.h',
'nv15.h',
'nv12_col128.h',
# Internal mirror of Linux 7.0 V4L2 HEVC EXT_SPS_*_RPS UAPI defs
# (allows building against pre-7.0 linux-api-headers; redundant
# once the host headers are 7.0+).
'hevc-ctrls/v4l2-hevc-ext-controls.h',
# Vendored GStreamer + project shim headers (see sources above).
'h265_parser/gst_compat.h',
'h265_parser/gst/gst.h',
'h265_parser/gst/glib-compat-private.h',
'h265_parser/gst/base/base-prelude.h',
'h265_parser/gst/base/gstbitreader.h',
'h265_parser/gst/base/gstbytereader.h',
'h265_parser/gst/base/gstbitwriter.h',
'h265_parser/gst/codecparsers/codecparsers-prelude.h',
'h265_parser/gst/codecparsers/gsth265parser.h',
'h265_parser/gst/codecparsers/nalutils.h'
# 'h265.h' # Fourier-local: HEVC stripped (see commit log)
]
includes = [
include_directories('../include'),
# Vendored GStreamer parser tree — the parser's #include <gst/base/...>
# style references resolve here via stub headers that redirect to
# gst_compat.h.
include_directories('h265_parser')
include_directories('../include')
]
cflags = [
+87 -182
View File
@@ -23,34 +23,6 @@
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
/*
* fresnel-fourier iter1 Phase 6 commit B: rewrite against new split
* V4L2_CID_STATELESS_MPEG2_{SEQUENCE,PICTURE,QUANTISATION} stateless
* controls (mainline kernel <linux/v4l2-controls.h>:1985-2105).
*
* Replaces the staging-era V4L2_CID_MPEG_VIDEO_MPEG2_{SLICE_PARAMS,
* QUANTIZATION} combined-struct API that the fork previously used
* via include/mpeg2-ctrls.h (deleted in commit C).
*
* Per-frame submission: one batched VIDIOC_S_EXT_CTRLS with three
* controls (12-byte SEQUENCE + 32-byte PICTURE + 256-byte QUANTISATION),
* matching FFmpeg libavcodec/v4l2_request_mpeg2.c:130-155 reference
* implementation. Verified empirically in fresnel-fourier Phase 0
* cross-validator sweep and Phase 3 Baseline C verbatim payload.
*
* Quantisation matrix order: zigzag scanning order per kernel doc
* v4l2-controls.h:2076. VAAPI VAIQMatrixBufferMPEG2 also stores in
* zigzag scanning order (per VAAPI spec). Direct memcpy works; no
* permutation in the libva backend. Kernel hantro_mpeg2.c::
* hantro_mpeg2_dec_copy_qtable applies the zigzag-to-raster
* permutation when copying to the hardware quantisation table.
*
* Default matrices (when iqmatrix_set==false): MPEG-2 spec defaults
* per ISO/IEC 13818-2 Table 7-3, transcribed from Phase 3 Baseline C
* QUANTISATION verbatim payload (256 bytes captured from
* ffmpeg-v4l2request decode of bbb_720p10s_mpeg2.ts).
*/
#include "mpeg2.h"
#include "context.h"
#include "request.h"
@@ -63,187 +35,120 @@
#include <sys/mman.h>
#include <linux/videodev2.h>
#include <linux/v4l2-controls.h>
#include <mpeg2-ctrls.h>
#include "v4l2.h"
/*
* MPEG-2 default intra quantisation matrix in zigzag scanning order
* (ISO/IEC 13818-2 Table 7-3, verified empirically against
* fresnel-fourier Phase 3 Baseline C QUANTISATION payload bytes 0..63
* from a ffmpeg-v4l2request decode of the BBB 720p10s MPEG-2 fixture).
*/
static const __u8 mpeg2_default_intra_matrix[64] = {
8, 16, 16, 19, 16, 19, 22, 22,
22, 22, 22, 22, 26, 24, 26, 27,
27, 27, 26, 26, 26, 26, 27, 27,
27, 29, 29, 29, 34, 34, 34, 29,
29, 29, 27, 27, 29, 29, 32, 32,
34, 34, 37, 38, 37, 35, 35, 34,
35, 38, 38, 40, 40, 40, 48, 48,
46, 46, 56, 56, 58, 69, 69, 83,
};
/*
* MPEG-2 default non-intra quantisation matrix is uniformly 16 in spec.
* Verified against Phase 3 Baseline C QUANTISATION payload bytes
* 64..127 (all 0x10 = 16). Same applies to chroma_non_intra
* (bytes 192..255). Filled at runtime via memset rather than a
* separate const array to keep the binary smaller.
*/
int mpeg2_set_controls(struct request_data *driver_data,
struct object_context *context_object,
struct object_surface *surface_object)
{
VAPictureParameterBufferMPEG2 *picture =
&surface_object->params.mpeg2.picture;
VASliceParameterBufferMPEG2 *slice =
&surface_object->params.mpeg2.slice;
VAIQMatrixBufferMPEG2 *iqmatrix =
&surface_object->params.mpeg2.iqmatrix;
bool iqmatrix_set = surface_object->params.mpeg2.iqmatrix_set;
/* Clause 2: v4l2_ctrl_mpeg2_sequence (12 bytes) */
struct v4l2_ctrl_mpeg2_sequence sequence;
/* Clause 3: v4l2_ctrl_mpeg2_picture (32 bytes; reserved[5] must be zero) */
struct v4l2_ctrl_mpeg2_picture pic;
/* Clause 4: v4l2_ctrl_mpeg2_quantisation (256 bytes) */
struct v4l2_ctrl_mpeg2_quantisation quant;
struct v4l2_ctrl_mpeg2_slice_params slice_params;
struct v4l2_ctrl_mpeg2_quantization quantization;
struct object_surface *forward_reference_surface;
struct object_surface *backward_reference_surface;
uint64_t timestamp;
unsigned int i;
int rc;
memset(&sequence, 0, sizeof sequence);
memset(&pic, 0, sizeof pic); /* zeros pic.reserved[5] per Clause 3 */
memset(&quant, 0, sizeof quant);
memset(&slice_params, 0, sizeof(slice_params));
/* === Clause 2: SEQUENCE ===
*
* VAAPI's VAPictureParameterBufferMPEG2 doesn't expose the
* sequence-extension's progressive_sequence flag separately;
* use progressive_frame from the picture-coding extension as a
* proxy. They're identical for typical streams (BBB is
* progressive throughout).
*/
sequence.horizontal_size = picture->horizontal_size;
sequence.vertical_size = picture->vertical_size;
sequence.vbv_buffer_size = surface_object->source_size;
sequence.profile_and_level_indication = 0; /* not exposed by VAAPI */
sequence.chroma_format = 1; /* 4:2:0 — campaign codec scope */
if (picture->picture_coding_extension.bits.progressive_frame)
sequence.flags |= V4L2_MPEG2_SEQ_FLAG_PROGRESSIVE;
slice_params.bit_size = surface_object->slices_size * 8;
slice_params.data_bit_offset = 0;
slice_params.sequence.horizontal_size = picture->horizontal_size;
slice_params.sequence.vertical_size = picture->vertical_size;
slice_params.sequence.vbv_buffer_size = SOURCE_SIZE_MAX;
slice_params.sequence.profile_and_level_indication = 0;
slice_params.sequence.progressive_sequence = 0;
slice_params.sequence.chroma_format = 1; // 4:2:0
slice_params.picture.picture_coding_type = picture->picture_coding_type;
slice_params.picture.f_code[0][0] = (picture->f_code >> 12) & 0x0f;
slice_params.picture.f_code[0][1] = (picture->f_code >> 8) & 0x0f;
slice_params.picture.f_code[1][0] = (picture->f_code >> 4) & 0x0f;
slice_params.picture.f_code[1][1] = (picture->f_code >> 0) & 0x0f;
slice_params.picture.intra_dc_precision =
picture->picture_coding_extension.bits.intra_dc_precision;
slice_params.picture.picture_structure =
picture->picture_coding_extension.bits.picture_structure;
slice_params.picture.top_field_first =
picture->picture_coding_extension.bits.top_field_first;
slice_params.picture.frame_pred_frame_dct =
picture->picture_coding_extension.bits.frame_pred_frame_dct;
slice_params.picture.concealment_motion_vectors =
picture->picture_coding_extension.bits
.concealment_motion_vectors;
slice_params.picture.q_scale_type =
picture->picture_coding_extension.bits.q_scale_type;
slice_params.picture.intra_vlc_format =
picture->picture_coding_extension.bits.intra_vlc_format;
slice_params.picture.alternate_scan =
picture->picture_coding_extension.bits.alternate_scan;
slice_params.picture.repeat_first_field =
picture->picture_coding_extension.bits.repeat_first_field;
slice_params.picture.progressive_frame =
picture->picture_coding_extension.bits.progressive_frame;
slice_params.quantiser_scale_code = slice->quantiser_scale_code;
/* === Clause 3: PICTURE ===
*
* Behavioral correction vs. previous mpeg2.c at this iter1:
* old code self-referenced surface_object->timestamp when the
* VAAPI ref picture was VA_INVALID_ID. New code sets ts = 0 for
* missing refs, matching kernel doc's 0-as-sentinel convention
* (verified against Phase 3 Baseline C frame 1: I-frame has both
* forward_ref_ts and backward_ref_ts == 0; FFmpeg
* libavcodec/v4l2_request_mpeg2.c:98-108 uses same convention).
*/
forward_reference_surface =
SURFACE(driver_data, picture->forward_reference_picture);
if (forward_reference_surface != NULL)
pic.forward_ref_ts =
v4l2_timeval_to_ns(&forward_reference_surface->timestamp);
if (forward_reference_surface == NULL)
forward_reference_surface = surface_object;
timestamp = v4l2_timeval_to_ns(&forward_reference_surface->timestamp);
slice_params.forward_ref_ts = timestamp;
backward_reference_surface =
SURFACE(driver_data, picture->backward_reference_picture);
if (backward_reference_surface != NULL)
pic.backward_ref_ts =
v4l2_timeval_to_ns(&backward_reference_surface->timestamp);
if (backward_reference_surface == NULL)
backward_reference_surface = surface_object;
if (picture->picture_coding_extension.bits.top_field_first)
pic.flags |= V4L2_MPEG2_PIC_FLAG_TOP_FIELD_FIRST;
if (picture->picture_coding_extension.bits.frame_pred_frame_dct)
pic.flags |= V4L2_MPEG2_PIC_FLAG_FRAME_PRED_DCT;
if (picture->picture_coding_extension.bits.concealment_motion_vectors)
pic.flags |= V4L2_MPEG2_PIC_FLAG_CONCEALMENT_MV;
if (picture->picture_coding_extension.bits.q_scale_type)
pic.flags |= V4L2_MPEG2_PIC_FLAG_Q_SCALE_TYPE;
if (picture->picture_coding_extension.bits.intra_vlc_format)
pic.flags |= V4L2_MPEG2_PIC_FLAG_INTRA_VLC;
if (picture->picture_coding_extension.bits.alternate_scan)
pic.flags |= V4L2_MPEG2_PIC_FLAG_ALT_SCAN;
if (picture->picture_coding_extension.bits.repeat_first_field)
pic.flags |= V4L2_MPEG2_PIC_FLAG_REPEAT_FIRST;
if (picture->picture_coding_extension.bits.progressive_frame)
pic.flags |= V4L2_MPEG2_PIC_FLAG_PROGRESSIVE;
timestamp = v4l2_timeval_to_ns(&backward_reference_surface->timestamp);
slice_params.backward_ref_ts = timestamp;
pic.f_code[0][0] = (picture->f_code >> 12) & 0x0f;
pic.f_code[0][1] = (picture->f_code >> 8) & 0x0f;
pic.f_code[1][0] = (picture->f_code >> 4) & 0x0f;
pic.f_code[1][1] = (picture->f_code >> 0) & 0x0f;
pic.picture_coding_type = picture->picture_coding_type;
pic.picture_structure =
picture->picture_coding_extension.bits.picture_structure;
pic.intra_dc_precision =
picture->picture_coding_extension.bits.intra_dc_precision;
/* pic.reserved[5] zeroed by memset above */
/* === Clause 4: QUANTISATION ===
*
* Kernel always reads all four matrices unconditionally
* (no load_* flags in the new API; kernel hantro_mpeg2.c
* doesn't synthesize defaults). When VAAPI's consumer didn't
* send VAIQMatrixBufferType (iqmatrix_set==false), populate
* with MPEG-2 spec default matrices.
*
* VAAPI VAIQMatrixBufferMPEG2 stores matrices in zigzag scanning
* order (per VAAPI spec). Kernel expects zigzag scanning order
* (per v4l2-controls.h:2076). Direct memcpy.
*/
if (iqmatrix_set) {
memcpy(quant.intra_quantiser_matrix,
iqmatrix->intra_quantiser_matrix, 64);
memcpy(quant.non_intra_quantiser_matrix,
iqmatrix->non_intra_quantiser_matrix, 64);
memcpy(quant.chroma_intra_quantiser_matrix,
iqmatrix->chroma_intra_quantiser_matrix, 64);
memcpy(quant.chroma_non_intra_quantiser_matrix,
iqmatrix->chroma_non_intra_quantiser_matrix, 64);
} else {
memcpy(quant.intra_quantiser_matrix,
mpeg2_default_intra_matrix, 64);
memset(quant.non_intra_quantiser_matrix, 16, 64);
memcpy(quant.chroma_intra_quantiser_matrix,
mpeg2_default_intra_matrix, 64);
memset(quant.chroma_non_intra_quantiser_matrix, 16, 64);
}
/* === Clause 1+5: batched submission ===
*
* One VIDIOC_S_EXT_CTRLS with all three controls. Matches
* src/h264.c:986 pattern (single v4l2_set_controls call) and
* FFmpeg ff_v4l2_request_decode_frame contract. Bound to the
* surface's permanent request_fd (iter6 per-OUTPUT-slot binding;
* picture.c:284 sets surface_object->request_fd at BeginPicture).
*/
struct v4l2_ext_control ctrls[3] = {
{
.id = V4L2_CID_STATELESS_MPEG2_SEQUENCE,
.ptr = &sequence,
.size = sizeof sequence,
},
{
.id = V4L2_CID_STATELESS_MPEG2_PICTURE,
.ptr = &pic,
.size = sizeof pic,
},
{
.id = V4L2_CID_STATELESS_MPEG2_QUANTISATION,
.ptr = &quant,
.size = sizeof quant,
},
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 3);
rc = v4l2_set_control(driver_data->video_fd, surface_object->request_fd,
V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS,
&slice_params, sizeof(slice_params));
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
if (iqmatrix_set) {
quantization.load_intra_quantiser_matrix =
iqmatrix->load_intra_quantiser_matrix;
quantization.load_non_intra_quantiser_matrix =
iqmatrix->load_non_intra_quantiser_matrix;
quantization.load_chroma_intra_quantiser_matrix =
iqmatrix->load_chroma_intra_quantiser_matrix;
quantization.load_chroma_non_intra_quantiser_matrix =
iqmatrix->load_chroma_non_intra_quantiser_matrix;
for (i = 0; i < 64; i++) {
quantization.intra_quantiser_matrix[i] =
iqmatrix->intra_quantiser_matrix[i];
quantization.non_intra_quantiser_matrix[i] =
iqmatrix->non_intra_quantiser_matrix[i];
quantization.chroma_intra_quantiser_matrix[i] =
iqmatrix->chroma_intra_quantiser_matrix[i];
quantization.chroma_non_intra_quantiser_matrix[i] =
iqmatrix->chroma_non_intra_quantiser_matrix[i];
}
rc = v4l2_set_control(driver_data->video_fd,
surface_object->request_fd,
V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION,
&quantization, sizeof(quantization));
}
return 0;
}
-114
View File
@@ -1,114 +0,0 @@
/*
* V4L2_PIX_FMT_NV12_COL128 linear NV12 detile primitive. Pi 5 / CM5
* rpi-hevc-dec CAPTURE. iter40 (2026-05-17).
*
* Math derived from kernel hevc_d_video.c (size formula) +
* ffmpeg/Kynesim libavutil/rpi_sand_fn_pw.h (per-pixel offset). The
* single-stripe fast path memcpy's 128 bytes at a time when an output
* row falls entirely within one tile column (the common case);
* straddling rows are split into two memcpy halves.
*
* No NEON / SIMD here correctness first. Each output row generates
* (width / 128) + ~1 memcpys of up to 128 bytes; for 1920x1080 that's
* ~17000 small memcpys per frame, fine for Phase 1 PoC.
*/
#include "nv12_col128.h"
#include <string.h>
/*
* Tile column width in bytes. The 'COL128' name embeds this; if it ever
* varies, take it from V4L2_PIX_FMT_NV12_COL128's kernel definition.
*/
#define NC12_TILE_W 128
/*
* Common Y / UV plane detile the layout is identical (single-byte per
* pixel, column-major 128-wide tiles). The only thing that varies is
* what plane the caller passes in. width here is plane width in bytes
* (= image width for both Y and CbCr-interleaved NV12 UV); height is
* plane height in pixels (image height for Y, image height / 2 for UV).
*/
static void nv12_col128_detile_plane(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src,
unsigned int src_col_stride,
unsigned int width, unsigned int height)
{
unsigned int y, x;
for (y = 0; y < height; y++) {
uint8_t *drow = dst + y * dst_stride;
x = 0;
while (x < width) {
unsigned int col = x / NC12_TILE_W;
unsigned int in_col = x % NC12_TILE_W;
unsigned int n = NC12_TILE_W - in_col;
if (n > width - x)
n = width - x;
/*
* Source byte = base + col*128*col_stride + y*128 + in_col
* Copy n contiguous bytes (all within this tile column,
* since n is capped at the remaining width-in-column).
*/
const uint8_t *p = src
+ (size_t)col * NC12_TILE_W * src_col_stride
+ (size_t)y * NC12_TILE_W
+ in_col;
memcpy(drow + x, p, n);
x += n;
}
}
}
void nv12_col128_detile_y(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src_y, unsigned int src_col_stride,
unsigned int width, unsigned int height)
{
nv12_col128_detile_plane(dst, dst_stride, src_y, src_col_stride,
width, height);
}
void nv12_col128_detile_uv(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src_uv, unsigned int src_col_stride,
unsigned int width, unsigned int uv_height)
{
/* UV plane (CbCr interleaved): byte-width equals Y-plane width
* (one Cb + one Cr per 2x2 Y block 2 bytes per 2 horizontal Y
* samples 1 byte per Y pixel horizontally). Height is half. */
nv12_col128_detile_plane(dst, dst_stride, src_uv, src_col_stride,
width, uv_height);
}
unsigned int nv12_col128_uv_plane_offset(unsigned int image_width,
unsigned int image_height)
{
unsigned int aligned_h = (image_height + 7) & ~7u;
/*
* In the COL128 SAND layout, Y and UV are NOT separate planes
* concatenated end-to-end. Within EACH 128-pixel-wide column:
* first 128 * height bytes = Y data for this column strip
* next 128 * height / 2 bytes = UV data for this column strip
* total 128 * bytesperline (= 128 * height * 3/2) bytes per column
*
* The "UV plane base" pointer (data[1] in AVFrame convention) is
* just data[0] + (128 * height) the offset of the UV bytes
* WITHIN the first column. All subsequent UV bytes are reached by
* the same column-stride arithmetic the Y plane uses (col *
* 128 * bytesperline + y * 128 + in_col), so passing this offset
* pointer + iterating y over [0, height/2) traverses all UV rows
* across all columns correctly.
*
* Earlier wrong formula was num_columns * 128 * aligned_h (i.e.
* sizeof(linear Y plane)) that pushed past the end of the SAND
* buffer because the layout isn't planes-end-to-end.
*
* Cross-check: kernel sizeimage = bytesperline * width =
* (aligned_h * 3/2) * num_columns * 128 = num_columns * 128 *
* aligned_h * 3/2. Per column: 128 * aligned_h * 3/2. Y portion
* per column: 128 * aligned_h. UV portion per column: half of Y.
* Sum across columns: matches sizeimage.
*/
return NC12_TILE_W * aligned_h;
}
-88
View File
@@ -1,88 +0,0 @@
/*
* V4L2_PIX_FMT_NV12_COL128 (NC12) SAND-tiled linear NV12 detile.
*
* Pi 5 / CM5 (BCM2712) rpi-hevc-dec CAPTURE format. iter40 (2026-05-17).
*
* Layout (kernel drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c
* size-formula + ffmpeg/Kynesim libavutil/rpi_sand_fn_pw.h per-pixel
* offset math):
*
* width ALIGN(image_width, 128) -- columns are 128 px wide
* height ALIGN(image_height, 8)
* col_stride (= bytesperline) = height * 3 / 2
* (bytes per [128-wide column] vertical unit incl. Y + UV)
* sizeimage = col_stride * width = total bytes
*
* For pixel (x, y) in the Y plane:
* col = x / 128
* in_col_x = x % 128
* offset = col * col_stride * 128 + y * 128 + in_col_x
*
* UV plane starts at offset (128 * height * num_columns_y) the same
* per-column layout, h/2 rows tall (CbCr interleaved).
*
* The primitive copies the entire image extent at once. width/height are
* the cropped consumer-visible dimensions; src_col_stride is the kernel-
* reported bytesperline (i.e. ALIGN(height,8) * 3/2).
*/
#ifndef _NV12_COL128_H_
#define _NV12_COL128_H_
#include <stdint.h>
#include <linux/videodev2.h>
/*
* Pre-Pi-kernel headers (Arch ALARM linux-api-headers, older mainline
* kernel-headers packages) may not define V4L2_PIX_FMT_NV12_COL128. The
* fourcc is Pi-specific. Provide a private fallback so the backend
* builds on hosts that target NON-Pi codecs too.
*/
#ifndef V4L2_PIX_FMT_NV12_COL128
#define V4L2_PIX_FMT_NV12_COL128 \
((unsigned int)('N') | ((unsigned int)('C') << 8) | \
((unsigned int)('1') << 16) | ((unsigned int)('2') << 24))
#endif
#ifndef V4L2_PIX_FMT_NV12_10_COL128
/* 10-bit SAND variant: 3 pixels packed into 4 bytes in 128-byte / 96-pixel
* wide columns. iter40 references the fourcc for completeness; the 10-bit
* Pi 5 HEVC chapter (Main10) is post-iter40. */
#define V4L2_PIX_FMT_NV12_10_COL128 \
((unsigned int)('N') | ((unsigned int)('C') << 8) | \
((unsigned int)('3') << 16) | ((unsigned int)('0') << 24))
#endif
/* Detile the Y plane of an NC12 source to a linear NV12 Y plane.
* dst : pointer to linear NV12 Y plane (caller-owned, dst_stride * height bytes)
* dst_stride : linear Y plane stride in bytes (= width for plain NV12)
* src_y : pointer to start of NC12 Y plane (= NC12 buffer base)
* src_col_stride: kernel-reported bytesperline (= ALIGN(height,8) * 3/2)
* width, height: cropped image dimensions in pixels
*/
void nv12_col128_detile_y(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src_y, unsigned int src_col_stride,
unsigned int width, unsigned int height);
/* Detile the UV plane (CbCr interleaved, half-height) of an NC12 source.
* dst : pointer to linear NV12 UV plane
* dst_stride : linear UV plane stride in bytes (= width for NV12)
* src_uv : pointer to start of NC12 UV plane (= src_y + Y-plane-size)
* src_col_stride: same as Y plane (same column geometry)
* width : Y-plane width in pixels (UV plane has same byte width)
* uv_height : UV plane height = height / 2
*/
void nv12_col128_detile_uv(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src_uv, unsigned int src_col_stride,
unsigned int width, unsigned int uv_height);
/* Compute the offset of the UV plane within an NC12 buffer.
* image_width, image_height: cropped image dimensions in pixels
* Returns: byte offset from buffer start to UV plane start
* (= 128 * ALIGN(image_height, 8) * num_columns_y)
*/
unsigned int nv12_col128_uv_plane_offset(unsigned int image_width,
unsigned int image_height);
#endif /* _NV12_COL128_H_ */
-75
View File
@@ -1,75 +0,0 @@
/*
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#include "nv15.h"
void nv15_unpack_plane_to_p010(const uint8_t *src, uint16_t *dst,
unsigned int width, unsigned int height,
unsigned int src_stride)
{
unsigned int x, y;
unsigned int dst_pitch_px = width;
for (y = 0; y < height; y++) {
const uint8_t *s = src + y * src_stride;
uint16_t *d = dst + y * dst_pitch_px;
for (x = 0; x + 4 <= width; x += 4) {
uint16_t a = (uint16_t)s[0] | ((uint16_t)(s[1] & 0x03) << 8);
uint16_t b = ((uint16_t)s[1] >> 2) | ((uint16_t)(s[2] & 0x0F) << 6);
uint16_t c = ((uint16_t)s[2] >> 4) | ((uint16_t)(s[3] & 0x3F) << 4);
uint16_t e = ((uint16_t)s[3] >> 6) | ((uint16_t)s[4] << 2);
d[0] = (uint16_t)(a << 6);
d[1] = (uint16_t)(b << 6);
d[2] = (uint16_t)(c << 6);
d[3] = (uint16_t)(e << 6);
d += 4;
s += 5;
}
if (x < width) {
unsigned int rem = width - x;
uint16_t pix[4] = { 0, 0, 0, 0 };
pix[0] = (uint16_t)s[0] | ((uint16_t)(s[1] & 0x03) << 8);
if (rem >= 2)
pix[1] = ((uint16_t)s[1] >> 2) |
((uint16_t)(s[2] & 0x0F) << 6);
if (rem >= 3)
pix[2] = ((uint16_t)s[2] >> 4) |
((uint16_t)(s[3] & 0x3F) << 4);
if (rem >= 4)
pix[3] = ((uint16_t)s[3] >> 6) |
((uint16_t)s[4] << 2);
{
unsigned int j;
for (j = 0; j < rem; j++)
d[j] = (uint16_t)(pix[j] << 6);
}
}
}
}
-61
View File
@@ -1,61 +0,0 @@
/*
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef _NV15_H_
#define _NV15_H_
#include <stdint.h>
#include <linux/videodev2.h>
/*
* Older or downstream linux-api-headers / kernel-headers packages may
* not define V4L2_PIX_FMT_NV15. Provide a fallback so the backend
* builds on hosts whose headers are pre-NV15-merge or omit it (e.g.
* Pi 5 Debian trixie 6.12.62 headers include NC12 but not NV15).
* Same numeric value as mainline.
*/
#ifndef V4L2_PIX_FMT_NV15
#define V4L2_PIX_FMT_NV15 \
((unsigned int)('N') | ((unsigned int)('V') << 8) | \
((unsigned int)('1') << 16) | ((unsigned int)('5') << 24))
#endif
/*
* Unpack one plane of V4L2_PIX_FMT_NV15 (4 × 10-bit values packed into
* 5 consecutive bytes, LSB-first) into VA_FOURCC_P010 (16-bit per pixel,
* value in bits [15:6], zeros in [5:0]).
*
* Layout per Documentation/userspace-api/media/v4l/pixfmt-nv15.rst.
* Call once per plane: luma (W × H, src_stride = ceil(W/4)*5) and chroma
* (W × H/2 same width because UV are interleaved 10-bit values).
*
* src_stride must be the kernel-reported bytesperline for the NV15 plane.
* The destination is dense P010 with row pitch = width * 2 bytes.
*/
void nv15_unpack_plane_to_p010(const uint8_t *src, uint16_t *dst,
unsigned int width, unsigned int height,
unsigned int src_stride);
#endif
+19 -513
View File
@@ -34,14 +34,8 @@
#include "h264.h"
#include "h265.h"
#include "mpeg2.h"
#include "vp8.h"
#include "vp9.h"
#include "av1.h"
#include "request_pool.h"
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
@@ -56,255 +50,27 @@
#include "autoconfig.h"
/*
* iter#15 issue #15: ensure the in-flight surface's OUTPUT mmap has
* room for `delta` more bytes appended to slices_size; if not, grow the
* pool transparently via request_pool_resize.
*
* Sequence on overflow:
* 1. Snapshot the surface's accumulated bytes to a temp heap buffer.
* 2. Release the surface's OUTPUT pool slot back to FREE (resize
* requires no slot be borrowed).
* 3. Compute new sizeimage = roundup(needed * 2, 4 KiB), and at least
* double the current source_size so geometric growth amortises
* repeated overruns at the same resolution.
* 4. Call request_pool_resize.
* 5. Re-acquire a pool slot (the new pool has fresh indices and fds).
* 6. Re-mirror surface_object->source_{index,data,size,request_fd}
* from the new slot.
* 7. Restore the saved bytes via memcpy into the new mmap.
*
* Returns VA_STATUS_SUCCESS on clean resize (or no resize needed) and
* VA_STATUS_ERROR_ALLOCATION_FAILED on heap-alloc / V4L2 / kernel
* failure the libva client falls back to surface re-creation as
* before the resize hook landed.
*
* NOTE on inline-Sync invariant: RequestEndPicture calls
* RequestSyncSurface inline, so when codec_store_buffer runs no other
* pool slot is borrowed across libva-driver-API entry points. The
* temporary release-then-reacquire of the in-flight slot here keeps
* that invariant intact across the resize.
*/
static VAStatus
codec_store_buffer_ensure_capacity(struct request_data *driver_data,
struct object_surface *surface_object,
size_t need)
{
struct request_pool_slot *slot;
uint8_t *save_buf;
size_t save_size;
unsigned int saved_index;
size_t want_sizeimage;
unsigned int new_sizeimage;
int new_index;
int rc;
if (need <= surface_object->source_size)
return VA_STATUS_SUCCESS;
save_size = surface_object->slices_size;
save_buf = NULL;
if (save_size > 0) {
save_buf = malloc(save_size);
if (save_buf == NULL) {
request_log("codec_store_buffer_ensure_capacity: malloc(%zu) for resize-save failed\n",
save_size);
return VA_STATUS_ERROR_ALLOCATION_FAILED;
}
memcpy(save_buf, surface_object->source_data, save_size);
}
/*
* Temporarily release the in-flight slot. The slot's V4L2 buffer
* has NOT been QBUF'd yet (QBUF lives in RequestEndPicture, after
* this codec_store_buffer call), so the release is a clean
* busy=false flip; no kernel state is in question. The slot's
* stale request_fd does not need to be saved the resize closes
* every slot's fd and the post-resize acquire below re-mirrors a
* fresh slot's request_fd into surface_object->request_fd.
*/
saved_index = surface_object->source_index;
request_pool_release(&driver_data->output_pool, saved_index);
/*
* Geometric growth: at least 2× the current source_size, but no
* less than 2× the required total so a single resize covers the
* triggering append plus comfortable headroom for the rest of
* this frame. Round up to a 4 KiB page boundary so the kernel's
* own alignment doesn't waste pages. Compute in size_t so the
* 2× doubling can't silently wrap at 2 GiB on 32-bit unsigned int
* (sizeimage stays bounded by V4L2's u32, but the doubling target
* could otherwise overflow before the clamp).
*/
want_sizeimage = need * 2;
if (want_sizeimage < (size_t)surface_object->source_size * 2)
want_sizeimage = (size_t)surface_object->source_size * 2;
if (want_sizeimage > 0x40000000u) /* 1 GiB hard cap — V4L2 sizeimage is u32 */
want_sizeimage = 0x40000000u;
want_sizeimage = (want_sizeimage + 0xFFFu) & ~(size_t)0xFFFu;
new_sizeimage = (unsigned int)want_sizeimage;
request_log("codec_store_buffer: OUTPUT-pool resize (need %zu > cap %u → new_sizeimage %u)\n",
need, surface_object->source_size, new_sizeimage);
rc = request_pool_resize(&driver_data->output_pool, new_sizeimage);
if (rc < 0) {
/*
* Resize failed. The original slot was already released
* above, so surface_object->source_data is now pointing
* at a FREE-but-still-borrowable mmap. Restore the
* surface's slot mirror so EndPicture / DestroyContext
* unwind paths see a consistent (if partial) state.
*
* If the resize aborted early (pre-STREAMOFF), the slot
* is intact: re-acquiring the same index is the inverse
* of the temporary release above. If it aborted later
* (post-teardown), the slot's data/size were zeroed in
* place by request_pool_resize and the re-acquire flips
* busy=true on a dead slot still safe, because the
* caller will return ERROR_ALLOCATION_FAILED and the
* libva consumer destroys the surface/context.
*/
(void)request_pool_acquire(&driver_data->output_pool);
free(save_buf);
return VA_STATUS_ERROR_ALLOCATION_FAILED;
}
new_index = request_pool_acquire(&driver_data->output_pool);
if (new_index < 0) {
free(save_buf);
return VA_STATUS_ERROR_ALLOCATION_FAILED;
}
slot = request_pool_slot(&driver_data->output_pool,
(unsigned int)new_index);
if (slot == NULL) {
request_pool_release(&driver_data->output_pool,
(unsigned int)new_index);
free(save_buf);
return VA_STATUS_ERROR_ALLOCATION_FAILED;
}
surface_object->source_index = slot->index;
surface_object->source_data = slot->data;
surface_object->source_size = slot->size;
surface_object->request_fd = slot->request_fd;
if (need > surface_object->source_size) {
/*
* Kernel rounded the new sizeimage down below what we
* needed drivers may clamp at their per-codec ceiling.
* Don't corrupt memory; surface the error to libva.
*/
request_log("codec_store_buffer_ensure_capacity: kernel returned sizeimage %u < required %zu\n",
surface_object->source_size, need);
free(save_buf);
return VA_STATUS_ERROR_ALLOCATION_FAILED;
}
if (save_buf != NULL) {
memcpy(surface_object->source_data, save_buf, save_size);
free(save_buf);
}
return VA_STATUS_SUCCESS;
}
static VAStatus codec_store_buffer(struct request_data *driver_data,
struct object_context *context,
VAProfile profile,
struct object_surface *surface_object,
struct object_buffer *buffer_object)
{
switch (buffer_object->type) {
case VASliceDataBufferType: {
case VASliceDataBufferType:
/*
* Since there is no guarantee that the allocation
* order is the same as the submission order (via
* RenderPicture), we can't use a V4L2 buffer directly
* and have to copy from a regular buffer.
*
* Capacity guard (issue #13 + #15): surface_object->source_data
* points at an OUTPUT-pool mmap of size source_size, negotiated
* at S_FMT time. A stream-level resolution upshift can produce
* a slice larger than this allocation. Each append site below
* computes the post-append running total and calls
* codec_store_buffer_ensure_capacity, which transparently grows
* the OUTPUT pool (request_pool_resize) so the existing memcpy
* has room. The hard error path (VA_STATUS_ERROR_ALLOCATION_FAILED)
* only fires if both the heap save buffer AND the kernel-side
* grow fail at which point libavcodec recreates the surface.
*/
size_t need;
VAStatus ensure_rc;
if (context->h264_start_code) {
static const char start_code[3] = { 0x00, 0x00, 0x01 };
need = (size_t)surface_object->slices_size +
sizeof(start_code);
ensure_rc = codec_store_buffer_ensure_capacity(
driver_data, surface_object, need);
if (ensure_rc != VA_STATUS_SUCCESS)
return ensure_rc;
memcpy(surface_object->source_data +
memcpy(surface_object->source_data +
surface_object->slices_size,
start_code, sizeof(start_code));
surface_object->slices_size += sizeof(start_code);
}
/*
* iter33 α-30: VP8 OUTPUT buffer needs the uncompressed
* frame header that ffmpeg-vaapi stripped before submitting
* VASliceData. Hantro's vp8_dec_run reads OUTPUT[0..N] with
* an assumed offset of 10 bytes (keyframe) or 3 bytes
* (interframe) before the first_partition data see
* rockchip_vpu2_hw_vp8_dec.c:349.
*
* ffmpeg-vaapi (vaapi_vp8.c:191-192) strips
* header_size = 3 + 7 * s->keyframe
* before submitting the slice data, so libva needs to
* pre-pad the OUTPUT with that many bytes. Hantro only
* uses these bytes for offset arithmetic, not parsing,
* so zero-filled placeholder is sufficient.
*
* ffmpeg-v4l2request (kdirect path) does NOT strip the
* header, hence its OUTPUT is byte-equal to SW reference
* and decode works correctly. This is the only material
* difference between the two front-ends for VP8.
*
* key_frame in VAAPI's pic_fields.bits is INVERTED:
* 0 keyframe, 1 interframe.
*/
if (profile == VAProfileVP8Version0_3 &&
surface_object->params.vp8.iqmatrix_set /* picture parsed by now */) {
unsigned int header_size =
surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
10 : 3;
need = (size_t)surface_object->slices_size + header_size;
ensure_rc = codec_store_buffer_ensure_capacity(
driver_data, surface_object, need);
if (ensure_rc != VA_STATUS_SUCCESS)
return ensure_rc;
memset(surface_object->source_data +
surface_object->slices_size,
0, header_size);
surface_object->slices_size += header_size;
}
{
size_t payload = (size_t)buffer_object->size *
buffer_object->count;
need = (size_t)surface_object->slices_size + payload;
ensure_rc = codec_store_buffer_ensure_capacity(
driver_data, surface_object, need);
if (ensure_rc != VA_STATUS_SUCCESS)
return ensure_rc;
memcpy(surface_object->source_data +
surface_object->slices_size,
buffer_object->data, payload);
surface_object->slices_size += payload;
}
buffer_object->data,
buffer_object->size * buffer_object->count);
surface_object->slices_size +=
buffer_object->size * buffer_object->count;
surface_object->slices_count++;
break;
}
case VAPictureParameterBufferType:
switch (profile) {
@@ -320,37 +86,11 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileH264High10:
memcpy(&surface_object->params.h264.picture,
buffer_object->data,
sizeof(surface_object->params.h264.picture));
break;
case VAProfileHEVCMain:
case VAProfileHEVCMain10:
memcpy(&surface_object->params.h265.picture,
buffer_object->data,
sizeof(surface_object->params.h265.picture));
break;
case VAProfileVP8Version0_3:
memcpy(&surface_object->params.vp8.picture,
buffer_object->data,
sizeof(surface_object->params.vp8.picture));
break;
case VAProfileVP9Profile0:
memcpy(&surface_object->params.vp9.picture,
buffer_object->data,
sizeof(surface_object->params.vp9.picture));
break;
case VAProfileAV1Profile0:
memcpy(&surface_object->params.av1.picture,
buffer_object->data,
sizeof(surface_object->params.av1.picture));
break;
default:
break;
}
@@ -363,41 +103,11 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileH264High10:
memcpy(&surface_object->params.h264.slice,
buffer_object->data,
sizeof(surface_object->params.h264.slice));
break;
case VAProfileHEVCMain:
case VAProfileHEVCMain10: {
unsigned int n = surface_object->params.h265.num_slices;
if (n < HEVC_MAX_SLICES_PER_FRAME) {
memcpy(&surface_object->params.h265.slices[n],
buffer_object->data,
sizeof(VASliceParameterBufferHEVC));
surface_object->params.h265.num_slices = n + 1;
}
/* Keep .slice mirror populated as last-slice ref for
* h265_fill_pps which reads dependent_slice_segment_flag */
memcpy(&surface_object->params.h265.slice,
buffer_object->data,
sizeof(surface_object->params.h265.slice));
break;
}
case VAProfileVP8Version0_3:
memcpy(&surface_object->params.vp8.slice,
buffer_object->data,
sizeof(surface_object->params.vp8.slice));
break;
case VAProfileVP9Profile0:
memcpy(&surface_object->params.vp9.slice,
buffer_object->data,
sizeof(surface_object->params.vp9.slice));
break;
default:
break;
}
@@ -418,40 +128,9 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileH264High10:
memcpy(&surface_object->params.h264.matrix,
buffer_object->data,
sizeof(surface_object->params.h264.matrix));
surface_object->params.h264.matrix_set = true;
break;
case VAProfileHEVCMain:
case VAProfileHEVCMain10:
memcpy(&surface_object->params.h265.iqmatrix,
buffer_object->data,
sizeof(surface_object->params.h265.iqmatrix));
surface_object->params.h265.iqmatrix_set = true;
break;
case VAProfileVP8Version0_3:
memcpy(&surface_object->params.vp8.iqmatrix,
buffer_object->data,
sizeof(surface_object->params.vp8.iqmatrix));
surface_object->params.vp8.iqmatrix_set = true;
break;
default:
break;
}
break;
case VAProbabilityBufferType:
switch (profile) {
case VAProfileVP8Version0_3:
memcpy(&surface_object->params.vp8.probability,
buffer_object->data,
sizeof(surface_object->params.vp8.probability));
surface_object->params.vp8.probability_set = true;
break;
default:
@@ -486,51 +165,14 @@ static VAStatus codec_set_controls(struct request_data *driver_data,
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileH264High10:
rc = h264_set_controls(driver_data, context, profile,
surface_object);
rc = h264_set_controls(driver_data, context, surface_object);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
break;
case VAProfileHEVCMain:
case VAProfileHEVCMain10:
rc = h265_set_controls(driver_data, context, surface_object);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
break;
case VAProfileVP8Version0_3:
rc = vp8_set_controls(driver_data, context, surface_object);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
break;
case VAProfileVP9Profile0:
rc = vp9_set_controls(driver_data, context, surface_object);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
break;
case VAProfileAV1Profile0:
/*
* Populates V4L2_CID_STATELESS_AV1_SEQUENCE from
* VAPictureParameterBufferAV1. The daedalus_v4l2 daemon
* (issue #11 daemon track) synthesises an OBU_SEQUENCE_HEADER
* from this ctrl and prepends it to the slice bitstream
* before handing it to libavcodec/libdav1d, which otherwise
* cannot parse the (sequence-header-stripped) OUTPUT buffer
* that ffmpeg-vaapi delivers.
*
* On the RK3588 vpu981 hardware path the same SEQUENCE ctrl
* is harmless: vpu981's driver parses the OBU stream
* directly and ignores the ctrl payload, so no per-decoder
* gating is required here.
*/
rc = av1_set_controls(driver_data, context, surface_object);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
break;
/* HEVC stripped: kernel V4L2_CID_MPEG_VIDEO_HEVC_* CIDs were renamed
* to V4L2_CID_STATELESS_HEVC_* upstream, and ohm's hantro VPU has no
* HEVC support anyway. Falls through to the default case below. */
default:
return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
@@ -545,9 +187,6 @@ VAStatus RequestBeginPicture(VADriverContextP context, VAContextID context_id,
struct request_data *driver_data = context->pDriverData;
struct object_context *context_object;
struct object_surface *surface_object;
struct request_pool_slot *slot;
int slot_index;
context_object = CONTEXT(driver_data, context_id);
if (context_object == NULL)
@@ -560,85 +199,6 @@ VAStatus RequestBeginPicture(VADriverContextP context, VAContextID context_id,
if (surface_object->status == VASurfaceRendering)
RequestSyncSurface(context, surface_id);
/*
* Iter2 Fix 3: acquire a CAPTURE-pool slot for this decode cycle.
* If the surface still holds a slot from a prior decode (DECODED
* or EXPORTED the consumer is done with it by definition since
* we got back to BeginPicture for the same surface), release it
* first. The new slot is bound and its V4L2 index + mmap pointers
* are mirrored into surface_object->destination_* so the existing
* QBUF/DQBUF/EXPBUF code paths see no behavioral change.
*/
if (surface_object->current_slot != NULL)
surface_unbind_slot(driver_data, surface_object);
{
struct cap_pool_slot *cap_slot =
cap_pool_acquire(&driver_data->capture_pool, surface_id);
if (cap_slot == NULL)
return VA_STATUS_ERROR_ALLOCATION_FAILED;
surface_bind_slot(surface_object, cap_slot);
/*
* iter8 Phase 7 IMP-1 experiment: env-gated CAPTURE buffer
* pre-zero. LIBVA_V4L2_ZERO_CAPTURE=1 wipes the slot's mmap'd
* region before kernel decode. Discriminates "kernel writes
* partial then aborts" from "kernel writes nothing and we
* see stale residue."
*/
{
static const char *zero_env = NULL;
static bool zero_env_checked = false;
if (!zero_env_checked) {
zero_env = getenv("LIBVA_V4L2_ZERO_CAPTURE");
zero_env_checked = true;
}
if (zero_env != NULL && zero_env[0] == '1') {
unsigned int b;
for (b = 0; b < cap_slot->buffers_count; b++)
if (cap_slot->map[b] != NULL)
memset(cap_slot->map[b], 0,
cap_slot->map_lengths[b]);
}
}
}
/*
* Borrow an OUTPUT (bitstream-input) slot from the driver-wide
* pool for the duration of this Begin/Render/End cycle. The
* surface's source_* fields hold the borrow's mmap pointer/size/
* V4L2 buffer index until RequestSyncSurface releases it after
* VIDIOC_DQBUF.
*/
slot_index = request_pool_acquire(&driver_data->output_pool);
if (slot_index < 0)
return VA_STATUS_ERROR_ALLOCATION_FAILED;
slot = request_pool_slot(&driver_data->output_pool,
(unsigned int)slot_index);
if (slot == NULL) {
request_pool_release(&driver_data->output_pool,
(unsigned int)slot_index);
return VA_STATUS_ERROR_ALLOCATION_FAILED;
}
surface_object->source_index = slot->index;
surface_object->source_data = slot->data;
surface_object->source_size = slot->size;
/*
* iter6: bind the slot's permanent request_fd to this surface for the
* duration of the decode cycle. Replaces the iter4 close+alloc-per-
* frame model. The fd is REINIT'd (not closed) at RequestSyncSurface,
* so the kernel-side request object is reset in place no fd-reuse
* race with another slot's pending decode.
*/
surface_object->request_fd = slot->request_fd;
surface_object->slices_size = 0;
surface_object->slices_count = 0;
surface_object->params.h264.matrix_set = false;
surface_object->params.h265.num_slices = 0;
surface_object->params.vp8.iqmatrix_set = false;
surface_object->params.vp8.probability_set = false;
surface_object->status = VASurfaceRendering;
context_object->render_surface_id = surface_id;
@@ -674,8 +234,7 @@ VAStatus RequestRenderPicture(VADriverContextP context, VAContextID context_id,
if (buffer_object == NULL)
return VA_STATUS_ERROR_INVALID_BUFFER;
rc = codec_store_buffer(driver_data, context_object,
config_object->profile,
rc = codec_store_buffer(driver_data, config_object->profile,
surface_object, buffer_object);
if (rc != VA_STATUS_SUCCESS)
return rc;
@@ -716,75 +275,22 @@ VAStatus RequestEndPicture(VADriverContextP context, VAContextID context_id)
if (surface_object == NULL)
return VA_STATUS_ERROR_INVALID_SURFACE;
/*
* iter9 α-7: monotonic per-context counter instead of gettimeofday,
* so DPB.reference_ts / OUTPUT QBUF ts stay small (matches
* ffmpeg-v4l2request's pattern). Confirmed in iter30 sweep
* (1×, 1000×, 1000000× multipliers all produce identical output);
* the counter scheme works on both rkvdec and hantro vb2_find_buffer.
*/
context_object->timestamp_counter++;
surface_object->timestamp.tv_sec =
(time_t)(context_object->timestamp_counter / 1000000);
surface_object->timestamp.tv_usec =
(suseconds_t)(context_object->timestamp_counter % 1000000);
gettimeofday(&surface_object->timestamp, NULL);
/*
* iter6: request_fd was bound to the surface in BeginPicture from
* the OUTPUT pool slot's permanent fd. Per-frame allocation is gone.
*/
request_fd = surface_object->request_fd;
if (request_fd < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
if (request_fd < 0) {
request_fd = media_request_alloc(driver_data->media_fd);
if (request_fd < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
surface_object->request_fd = request_fd;
}
rc = codec_set_controls(driver_data, context_object,
config_object->profile, surface_object);
if (rc != VA_STATUS_SUCCESS)
return rc;
/*
* iter14 α-16: env-gated dump of OUTPUT bitstream bytes immediately
* before QBUF. LIBVA_V4L2_DUMP_OUTPUT=<dir> writes source_data[0..
* slices_size] to <dir>/output_<profile>_<surface>_<frame>.bin.
* Discriminates whether libva writes the same H.264/HEVC slice bytes
* as kdirect if YES, Bug 4/5 are not in the OUTPUT-side; if NO,
* narrow to which slice-write path produces the divergence.
*
* Off by default; no behavior change when env unset.
*/
{
static const char *dump_env = NULL;
static bool dump_env_checked = false;
if (!dump_env_checked) {
dump_env = getenv("LIBVA_V4L2_DUMP_OUTPUT");
dump_env_checked = true;
}
if (dump_env != NULL && dump_env[0] != '\0' &&
surface_object->source_data != NULL &&
surface_object->slices_size > 0) {
char path[256];
snprintf(path, sizeof(path),
"%s/output_p%d_s%u_t%llu.bin",
dump_env, (int)config_object->profile,
(unsigned int)surface_object->base.id,
(unsigned long long)context_object->timestamp_counter);
FILE *fp = fopen(path, "wb");
if (fp != NULL) {
size_t w = fwrite(surface_object->source_data,
1, surface_object->slices_size,
fp);
request_log("α-16: dumped %zu bytes to %s "
"(slices_count=%u)\n",
w, path,
surface_object->slices_count);
fclose(fp);
} else {
request_log("α-16: fopen(%s) failed: %s\n",
path, strerror(errno));
}
}
}
rc = v4l2_queue_buffer(driver_data->video_fd, -1, capture_type, NULL,
surface_object->destination_index, 0,
surface_object->destination_buffers_count);
+19 -774
View File
@@ -25,12 +25,10 @@
*/
#include "buffer.h"
#include "cap_pool.h"
#include "config.h"
#include "context.h"
#include "image.h"
#include "picture.h"
#include "request_pool.h"
#include "subpicture.h"
#include "surface.h"
@@ -41,9 +39,9 @@
#include "request.h"
#include "utils.h"
#include "v4l2.h"
#include "video.h"
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
@@ -54,508 +52,8 @@
#include <sys/ioctl.h>
#include <linux/media.h>
#include <linux/videodev2.h>
#include "hevc-ctrls/v4l2-hevc-ext-controls.h"
/*
* fresnel-fourier iter4 Phase 6 commit Z + iter7 Phase 6 (B1a): device-path
* auto-detect via media controller topology with decoder-entity discrimination.
*
* Pre-iter4 the backend hardcoded /dev/video0 + /dev/media0. On Linux 7.0 the
* udev/probe order changed and rockchip-rga (an RGB color converter, no codec
* support) now claims /dev/video0 the legacy default returns an empty
* profile list. iter4 commit Z replaced enumeration-order discovery with
* media-topology discovery.
*
* iter7 (B1a): the iter4 walk treated the hantro-vpu driver name as a single
* unit, but hantro-vpu registers BOTH encoder and decoder entities under one
* /dev/mediaN on RK3399. iter4's "pick the first V4L_VIDEO interface" could
* land on the encoder. iter7 walks ENTITIES looking for
* MEDIA_ENT_F_PROC_VIDEO_DECODER, then follows the kernel's link graph
* (data link from proc to IO entity, interface link from IO entity to V4L
* interface) to the correct /dev/videoN.
*
* Two-pass to prefer rkvdec: pass 1 accepts only "rkvdec" (multi-codec
* decoder, 3 of 5 codecs); pass 2 accepts any known decoder driver. On
* RK3399 this makes auto-detect always pick rkvdec when available.
*
* iter4-B1b (multi-decoder routing open BOTH rkvdec AND hantro from one
* backend instance, dispatch per codec) is still deferred. Post-iter7 the
* backend opens one decoder per process; MPEG-2/VP8 (hantro) still need
* explicit LIBVA_V4L2_REQUEST_VIDEO_PATH override when iter7's pass-1
* lands on rkvdec.
*
* Escape hatch: LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 reverts to legacy
* hardcoded /dev/video0 + /dev/media0 for callers that relied on it.
*/
static const char * const known_decoder_drivers[] = {
"rkvdec",
"hantro-vpu",
"rpi-hevc-dec", /* iter40: Pi 5 / CM5 stateless HEVC */
#ifdef HAVE_DAEDALUS_V4L2
"daedalus_v4l2", /* phase 8.10: Pi 5 daemon-backed VP9/AV1/H264 */
#endif
"cedrus",
"sun4i_csi",
NULL
};
static int resolve_dev_node(uint32_t major, uint32_t minor, char *out, size_t out_sz)
{
char sysfs_path[64], target[256];
ssize_t n;
const char *base;
snprintf(sysfs_path, sizeof sysfs_path, "/sys/dev/char/%u:%u", major, minor);
n = readlink(sysfs_path, target, sizeof target - 1);
if (n < 0)
return -1;
target[n] = '\0';
base = strrchr(target, '/');
base = base ? base + 1 : target;
snprintf(out, out_sz, "/dev/%s", base);
return 0;
}
/*
* iter7 B1a: walk topology graph from decoder-proc entity to its V4L_VIDEO
* interface. Returns 0 + sets video_out on success, -1 if this media device
* has no decoder entity (e.g. encoder-only device).
*
* Algorithm (per Phase 5 review, empirically validated against
* boltzmann:~/src/linux-rockchip):
* 1. For each entity E with function == MEDIA_ENT_F_PROC_VIDEO_DECODER:
* 2. Find IO entity neighbors via DATA links (entityentity).
* 3. Find the V4L_VIDEO interface via INTERFACE links from those IO
* neighbors.
* 4. Resolve interface.devnode.major:minor to /dev/videoN.
*
* Two-call MEDIA_IOC_G_TOPOLOGY pattern (Phase 5 IMP-3): first call gets
* counts; second call fills the three arrays after we allocate them.
*
* Link discrimination via MEDIA_LNK_FL_INTERFACE_LINK (1U<<28):
* data links have flags & MEDIA_LNK_FL_INTERFACE_LINK == 0; interface
* links have it set. source_id/sink_id ordering is not guaranteed
* check both endpoints.
*/
static int find_decoder_video_node_via_topology(int media_fd,
char *video_out,
size_t video_out_sz)
{
struct media_v2_topology topo;
struct media_v2_entity *entities = NULL;
struct media_v2_interface *interfaces = NULL;
struct media_v2_link *links = NULL;
struct media_v2_pad *pads = NULL;
int ret = -1;
unsigned int i, j;
memset(&topo, 0, sizeof topo);
if (ioctl(media_fd, MEDIA_IOC_G_TOPOLOGY, &topo) < 0)
return -1;
if (topo.num_entities == 0 || topo.num_interfaces == 0 ||
topo.num_links == 0 || topo.num_pads == 0)
return -1;
entities = calloc(topo.num_entities, sizeof *entities);
interfaces = calloc(topo.num_interfaces, sizeof *interfaces);
links = calloc(topo.num_links, sizeof *links);
pads = calloc(topo.num_pads, sizeof *pads);
if (!entities || !interfaces || !links || !pads)
goto out;
topo.ptr_entities = (uintptr_t)entities;
topo.ptr_interfaces = (uintptr_t)interfaces;
topo.ptr_links = (uintptr_t)links;
topo.ptr_pads = (uintptr_t)pads;
if (ioctl(media_fd, MEDIA_IOC_G_TOPOLOGY, &topo) < 0)
goto out;
for (i = 0; i < topo.num_entities; i++) {
uint32_t proc_id;
uint32_t proc_pad_ids[16];
uint32_t io_entity_ids[16];
unsigned int proc_pad_count = 0;
unsigned int io_count = 0;
if (entities[i].function != MEDIA_ENT_F_PROC_VIDEO_DECODER)
continue;
proc_id = entities[i].id;
/* Step 2a: collect pads belonging to the proc entity. Data
* links connect PADs, not entities directly. */
for (j = 0; j < topo.num_pads; j++) {
if (pads[j].entity_id != proc_id)
continue;
if (proc_pad_count < (sizeof proc_pad_ids /
sizeof proc_pad_ids[0]))
proc_pad_ids[proc_pad_count++] = pads[j].id;
}
/* Step 2b: walk data links. For each link with either endpoint
* in proc_pad_ids[], the other endpoint is a pad belonging to
* an IO neighbor. Resolve that pad's entity_id via pads[]. */
for (j = 0; j < topo.num_links; j++) {
uint32_t other_pad = 0;
unsigned int k;
if (links[j].flags & MEDIA_LNK_FL_INTERFACE_LINK)
continue;
for (k = 0; k < proc_pad_count; k++) {
if (links[j].source_id == proc_pad_ids[k])
other_pad = links[j].sink_id;
else if (links[j].sink_id == proc_pad_ids[k])
other_pad = links[j].source_id;
if (other_pad != 0)
break;
}
if (other_pad == 0)
continue;
/* Resolve other_pad to its entity_id. */
for (k = 0; k < topo.num_pads; k++) {
if (pads[k].id != other_pad)
continue;
if (io_count < (sizeof io_entity_ids /
sizeof io_entity_ids[0]))
io_entity_ids[io_count++] =
pads[k].entity_id;
break;
}
}
/* Step 3-4: find an interface link from any IO entity neighbor;
* resolve devnode for the linked V4L_VIDEO interface.
* Interface links connect interfacesentities directly (not
* via pads), so source_id/sink_id is an entity ID on one side
* and an interface ID on the other. */
for (j = 0; j < topo.num_links; j++) {
uint32_t intf_id = 0;
unsigned int k;
if (!(links[j].flags & MEDIA_LNK_FL_INTERFACE_LINK))
continue;
for (k = 0; k < io_count; k++) {
if (links[j].source_id == io_entity_ids[k])
intf_id = links[j].sink_id;
else if (links[j].sink_id == io_entity_ids[k])
intf_id = links[j].source_id;
if (intf_id != 0)
break;
}
if (intf_id == 0)
continue;
for (k = 0; k < topo.num_interfaces; k++) {
if (interfaces[k].id != intf_id)
continue;
if (interfaces[k].intf_type !=
MEDIA_INTF_T_V4L_VIDEO)
break;
if (resolve_dev_node(
interfaces[k].devnode.major,
interfaces[k].devnode.minor,
video_out, video_out_sz) == 0)
ret = 0;
break;
}
if (ret == 0)
goto out;
}
}
out:
free(entities);
free(interfaces);
free(links);
free(pads);
return ret;
}
/*
* iter7 B1a: two-pass walk of /dev/media0..N. Pass 1 accepts only "rkvdec"
* (multi-codec decoder serving 3 of 5 codecs). Pass 2 accepts any
* known_decoder_drivers entry. Within each pass, the chosen media device
* must ALSO contain at least one MEDIA_ENT_F_PROC_VIDEO_DECODER entity
* guards against encoder-only devices that happen to share the same driver
* name (e.g. hantro-vpu encoder vs decoder inside one /dev/mediaN).
*/
/*
* iter38: locate a /dev/mediaN whose driver name matches `want_driver`
* AND exposes at least one MEDIA_ENT_F_PROC_VIDEO_DECODER entity (rules
* out encoder-only devices sharing the same driver name). Resolves the
* matching /dev/videoM via topology graph walk.
*
* `want_driver`:
* - non-NULL match only that exact driver name
* - NULL match any name in known_decoder_drivers[]
*/
/*
* iter2 (ampere-kernel-decoders campaign) runtime probe for the
* V4L2 stateless HEVC EXT_SPS_{ST,LT}_RPS controls added in
* Linux 7.0 (Casanova VDPU381/VDPU383 series). Returns true iff BOTH
* controls are registered on the given fd. Stored per-fd on
* driver_data so the multi-device-probe model (iter38) doesn't
* silently misbehave when codec routing switches devices.
*
* The two CIDs together are the gate neither alone is meaningful
* without the other (st-RPS + lt-RPS arrays both need to be set to
* match the SPS num_short_term_ref_pic_sets / num_long_term_ref_pics_sps
* counts). Old kernels (RK3399 rkvdec on linux 6.x) register neither;
* RK3588 rkvdec (VDPU381/383 path) registers both.
*
* Reference: phase4_plan_iter2.md §Step 3 in
* ~/src/ampere-kernel-decoders/.
*/
static bool probe_hevc_ext_sps_rps_controls(int video_fd)
{
struct v4l2_queryctrl q;
if (video_fd < 0)
return false;
memset(&q, 0, sizeof(q));
q.id = V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS;
if (ioctl(video_fd, VIDIOC_QUERYCTRL, &q) < 0)
return false;
memset(&q, 0, sizeof(q));
q.id = V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS;
if (ioctl(video_fd, VIDIOC_QUERYCTRL, &q) < 0)
return false;
return true;
}
static int find_decoder_device_by_driver(const char *want_driver,
char *video_out, size_t video_out_sz,
char *media_out, size_t media_out_sz)
{
struct media_device_info info;
char path[32];
const char * const *kd;
int fd, i;
for (i = 0; i < 16; i++) {
bool match;
snprintf(path, sizeof path, "/dev/media%d", i);
fd = open(path, O_RDWR | O_NONBLOCK);
if (fd < 0)
continue;
memset(&info, 0, sizeof info);
if (ioctl(fd, MEDIA_IOC_DEVICE_INFO, &info) != 0) {
close(fd);
continue;
}
if (want_driver != NULL) {
match = (strcmp(info.driver, want_driver) == 0);
} else {
match = false;
for (kd = known_decoder_drivers; *kd; kd++) {
if (strcmp(info.driver, *kd) == 0) {
match = true;
break;
}
}
}
if (!match) {
close(fd);
continue;
}
if (find_decoder_video_node_via_topology(
fd, video_out, video_out_sz) == 0) {
snprintf(media_out, media_out_sz, "%s", path);
close(fd);
return 0;
}
close(fd);
}
return -1;
}
static int find_codec_device(char *video_out, size_t video_out_sz,
char *media_out, size_t media_out_sz)
{
if (find_decoder_device_by_driver("rkvdec",
video_out, video_out_sz,
media_out, media_out_sz) == 0)
return 0;
return find_decoder_device_by_driver(NULL,
video_out, video_out_sz,
media_out, media_out_sz);
}
/*
* iter38: profile which physical decoder device should serve it on
* RK3399. Returns 'r' for rkvdec, 'h' for hantro, '?' for unknown.
*
* This is RK3399-shaped knowledge a more general impl would interrogate
* each open device's supported OUTPUT formats. For the campaign-scope
* five codecs, the mapping is stable and explicit.
*/
char request_device_kind_for_profile(VAProfile profile);
char request_device_kind_for_profile(VAProfile profile)
{
switch (profile) {
case VAProfileH264Main:
case VAProfileH264High:
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileHEVCMain:
case VAProfileVP9Profile0:
return 'r';
case VAProfileMPEG2Simple:
case VAProfileMPEG2Main:
case VAProfileVP8Version0_3:
return 'h';
case VAProfileAV1Profile0:
/*
* ampere-av1-enablement Phase 2: RK3588 vpu981 dedicated
* AV1 hantro instance. 'a' kind dispatches to
* driver_data->video_fd_vpu981. On hosts without the AV1
* instance the fd stays -1 and RequestQueryConfigProfiles
* never enumerates AV1, so this branch is unreachable for
* non-RK3588 hosts.
*/
return 'a';
default:
return '?';
}
}
/*
* iter38: retarget driver_data->{video,media}_fd to the device kind
* required by `profile`. If a switch is needed, tear down any per-device
* pool state so the next RequestCreateContext rebuilds it against the
* new device. Returns 0 on success, -1 if the required device wasn't
* probed (e.g. trying VP8 on a system without hantro).
*
* Safe to call repeatedly with the same profile: if the active fd
* already matches, the function is a no-op.
*/
int request_switch_device_for_profile(struct request_data *driver_data,
VAProfile profile);
int request_switch_device_for_profile(struct request_data *driver_data,
VAProfile profile)
{
char kind = request_device_kind_for_profile(profile);
int target_video, target_media;
/*
* iter40: HEVC override when rpi-hevc-dec is probed. The static
* table (request_device_kind_for_profile) maps HEVC 'r' (rkvdec)
* because that's the canonical RK path. On Pi 5 there's no rkvdec
* rpi-hevc-dec is the only decoder. When BOTH would be present
* (hypothetical mixed board), prefer rpi-hevc-dec for HEVC.
*
* Other rkvdec-routed profiles (VP9, H.264) stay on 'r' because
* rpi-hevc-dec is HEVC-only.
*/
if ((profile == VAProfileHEVCMain || profile == VAProfileHEVCMain10) &&
driver_data->video_fd_rpi_hevc_dec >= 0 &&
driver_data->media_fd_rpi_hevc_dec >= 0) {
kind = 'p';
}
#ifdef HAVE_DAEDALUS_V4L2
/*
* LIBVA-1: VP9/AV1/H.264 daedalus_v4l2 when the daemon-backed
* decoder fd is open. Pi 5 has no rkvdec (those profiles map to
* 'r' by default video_fd_rkvdec = -1 "stay on whatever's
* active" fallback would put H.264 frames on rpi-hevc-dec's fd
* and S_FMT would fail). Re-route to the daedalus daemon instead.
*
* HEVC stays on 'p' (rpi-hevc-dec is HEVC-only daedalus would
* accept it via FFmpeg, but rpi-hevc-dec has the GPU-backed
* hardware path so it's the right choice on this SoC).
*
* AV1 'a' kind (RK3588 vpu981) wins ONLY if vpu981 was probed.
* On a Pi 5 the vpu981 slot stays -1, so we still route AV1 to
* daedalus here. Check video_fd_vpu981 to preserve the RK3588
* priority for that case.
*/
if (driver_data->video_fd_daedalus >= 0 &&
driver_data->media_fd_daedalus >= 0) {
switch (profile) {
case VAProfileH264Main:
case VAProfileH264High:
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileVP9Profile0:
kind = 'd';
break;
case VAProfileAV1Profile0:
if (driver_data->video_fd_vpu981 < 0)
kind = 'd';
break;
default:
break;
}
}
#endif
if (kind == 'r') {
target_video = driver_data->video_fd_rkvdec;
target_media = driver_data->media_fd_rkvdec;
} else if (kind == 'h') {
target_video = driver_data->video_fd_hantro;
target_media = driver_data->media_fd_hantro;
} else if (kind == 'p') {
target_video = driver_data->video_fd_rpi_hevc_dec;
target_media = driver_data->media_fd_rpi_hevc_dec;
} else if (kind == 'a') {
target_video = driver_data->video_fd_vpu981;
target_media = driver_data->media_fd_vpu981;
#ifdef HAVE_DAEDALUS_V4L2
} else if (kind == 'd') {
target_video = driver_data->video_fd_daedalus;
target_media = driver_data->media_fd_daedalus;
#endif
} else {
return -1;
}
/* Either side never probed (e.g. env-override single-device init,
* or this kind isn't present on the running kernel) tolerate by
* staying on whatever's already active. RequestCreateConfig still
* accepted the profile via the format probe, so the active fd
* supports it. */
if (target_video < 0 || target_media < 0)
return 0;
if (driver_data->video_fd == target_video &&
driver_data->media_fd == target_media)
return 0; /* already active, nothing to do */
/*
* Tear down any per-device pool state. cap_pool needs capture_type,
* which comes from video_format. Both rkvdec and hantro use
* V4L2_PIX_FMT_NV12 MPLANE on RK3399 (verified Phase 0 inventory)
* so the MPLANE form is always right here.
*/
if (driver_data->capture_pool.initialized) {
cap_pool_destroy(&driver_data->capture_pool,
driver_data->video_fd,
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE);
}
if (driver_data->output_pool.initialized)
request_pool_destroy(&driver_data->output_pool);
/* video_format is a static-ref pointer; re-probe on next
* CreateContext since the new device's format menu may differ. */
driver_data->video_format = NULL;
driver_data->fmt_valid = false;
driver_data->video_fd = target_video;
driver_data->media_fd = target_media;
return 0;
}
/* Set default visibility for the init function only. */
VAStatus __attribute__((visibility("default")))
VA_DRIVER_INIT_FUNC(VADriverContextP context);
@@ -649,23 +147,9 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
object_heap_init(&driver_data->image_heap, sizeof(struct object_image),
IMAGE_ID_OFFSET);
static char auto_video[32], auto_media[32];
bool auto_media_set = false;
video_path = getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH");
if (video_path == NULL) {
if (getenv("LIBVA_V4L2_REQUEST_NO_AUTODETECT")) {
video_path = "/dev/video0";
} else if (find_codec_device(auto_video, sizeof auto_video,
auto_media, sizeof auto_media) == 0) {
video_path = auto_video;
auto_media_set = true;
request_log("auto-selected codec device: %s + %s\n",
auto_video, auto_media);
} else {
video_path = "/dev/video0";
}
}
if (video_path == NULL)
video_path = "/dev/video0";
video_fd = open(video_path, O_RDWR | O_NONBLOCK);
if (video_fd < 0)
@@ -686,12 +170,8 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
}
media_path = getenv("LIBVA_V4L2_REQUEST_MEDIA_PATH");
if (media_path == NULL) {
if (auto_media_set)
media_path = auto_media;
else
media_path = "/dev/media0";
}
if (media_path == NULL)
media_path = "/dev/media0";
media_fd = open(media_path, O_RDWR | O_NONBLOCK);
if (media_fd < 0)
@@ -699,214 +179,19 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
driver_data->video_fd = video_fd;
driver_data->media_fd = media_fd;
driver_data->video_fd_rkvdec = -1;
driver_data->media_fd_rkvdec = -1;
driver_data->video_fd_hantro = -1;
driver_data->media_fd_hantro = -1;
driver_data->video_fd_rpi_hevc_dec = -1;
driver_data->media_fd_rpi_hevc_dec = -1;
driver_data->video_fd_daedalus = -1;
driver_data->media_fd_daedalus = -1;
driver_data->video_fd_vpu981 = -1;
driver_data->media_fd_vpu981 = -1;
/*
* iter38: probe BOTH rkvdec and hantro-vpu so a single libva session
* can serve all 5 codecs. Tag the primary fd (already opened above)
* by inspecting which driver the media_fd is on, then probe the OTHER
* driver and open its fds if present. RequestCreateConfig retargets
* driver_data->{video,media}_fd to the right pair per profile.
*
* Skip the alt-probe when the user provided explicit
* LIBVA_V4L2_REQUEST_VIDEO_PATH / MEDIA_PATH overrides they signal
* a specific single device intent.
*/
if (!getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH") &&
!getenv("LIBVA_V4L2_REQUEST_MEDIA_PATH")) {
struct media_device_info info;
const char *primary_driver = NULL;
const char *alt_driver = NULL;
memset(&info, 0, sizeof info);
if (ioctl(media_fd, MEDIA_IOC_DEVICE_INFO, &info) == 0) {
if (strcmp(info.driver, "rkvdec") == 0) {
primary_driver = "rkvdec";
alt_driver = "hantro-vpu";
driver_data->video_fd_rkvdec = video_fd;
driver_data->media_fd_rkvdec = media_fd;
} else if (strcmp(info.driver, "hantro-vpu") == 0) {
primary_driver = "hantro-vpu";
alt_driver = "rkvdec";
driver_data->video_fd_hantro = video_fd;
driver_data->media_fd_hantro = media_fd;
} else if (strcmp(info.driver, "rpi-hevc-dec") == 0) {
/* iter40 + LIBVA-1: Pi 5 / CM5. rpi-hevc-dec is
* HEVC-only. If daedalus_v4l2 is ALSO loaded (Pi 5
* mixed deployment out-of-tree daemon-backed
* decoder for VP9/AV1/H264), pick it up as the alt
* so VP9/AV1/H264 have somewhere to land. */
primary_driver = "rpi-hevc-dec";
#ifdef HAVE_DAEDALUS_V4L2
alt_driver = "daedalus_v4l2";
#else
alt_driver = NULL;
#endif
driver_data->video_fd_rpi_hevc_dec = video_fd;
driver_data->media_fd_rpi_hevc_dec = media_fd;
#ifdef HAVE_DAEDALUS_V4L2
} else if (strcmp(info.driver, "daedalus_v4l2") == 0) {
/* phase 8.10 + LIBVA-1: Pi 5 daemon-backed decoder.
* VP9 / AV1 / H.264 route through it via the 'd'
* kind below. On a mixed-driver box where
* rpi-hevc-dec is ALSO loaded, pick it up as the
* alt so HEVC has somewhere to land too find_
* codec_device's known_decoder_drivers[] order
* normally puts rpi-hevc-dec first (we hit the
* other branch in practice), but symmetric handling
* keeps us correct if probe order ever flips. */
primary_driver = "daedalus_v4l2";
alt_driver = "rpi-hevc-dec";
driver_data->video_fd_daedalus = video_fd;
driver_data->media_fd_daedalus = media_fd;
#endif
}
}
if (alt_driver != NULL) {
static char alt_video[32], alt_media[32];
if (find_decoder_device_by_driver(alt_driver,
alt_video, sizeof alt_video,
alt_media, sizeof alt_media) == 0) {
int alt_v = open(alt_video, O_RDWR | O_NONBLOCK);
int alt_m = (alt_v >= 0) ? open(alt_media, O_RDWR | O_NONBLOCK) : -1;
if (alt_v >= 0 && alt_m >= 0) {
/* Dispatch into the matching per-driver slot.
* iter38 only had rkvdec/hantro pairs; iter40 +
* LIBVA-1 extended this to rpi-hevc-dec and
* daedalus_v4l2 for the Pi 5 mixed-decoder
* deployment. */
if (strcmp(alt_driver, "rkvdec") == 0) {
driver_data->video_fd_rkvdec = alt_v;
driver_data->media_fd_rkvdec = alt_m;
} else if (strcmp(alt_driver, "hantro-vpu") == 0) {
driver_data->video_fd_hantro = alt_v;
driver_data->media_fd_hantro = alt_m;
} else if (strcmp(alt_driver, "rpi-hevc-dec") == 0) {
driver_data->video_fd_rpi_hevc_dec = alt_v;
driver_data->media_fd_rpi_hevc_dec = alt_m;
#ifdef HAVE_DAEDALUS_V4L2
} else if (strcmp(alt_driver, "daedalus_v4l2") == 0) {
driver_data->video_fd_daedalus = alt_v;
driver_data->media_fd_daedalus = alt_m;
#endif
} else {
/* Shouldn't happen — primary_driver branches
* above only set alt_driver to one of the
* names handled here. Close and move on. */
close(alt_v);
close(alt_m);
alt_v = -1;
alt_m = -1;
}
if (alt_v >= 0) {
request_log("iter38: also opened %s decoder at %s + %s\n",
alt_driver, alt_video, alt_media);
}
} else {
if (alt_v >= 0) close(alt_v);
if (alt_m >= 0) close(alt_m);
}
}
}
(void)primary_driver;
/*
* ampere-av1-enablement Phase 2: walk hantro-vpu media nodes
* for a SECOND one that advertises V4L2_PIX_FMT_AV1_FRAME
* (AV1F) as OUTPUT pixfmt. RK3588 has 3 hantro-vpu instances
* (legacy MPEG2/VP8 decoder, vepu121 encoder, vpu981 AV1
* decoder) all reporting driver="hantro-vpu" / model="hantro-
* vpu" — so OUTPUT-format probe is the only reliable
* disambiguator that doesn't depend on parsing card-name
* strings (which are DTS-dependent). First match wins.
*
* On non-RK3588 hosts the slot stays -1; RequestQueryConfig
* Profiles' AV1 push then no-ops because any_fd_supports_
* output_format() returns false for AV1F.
*/
{
int i;
char path[32], av1_video[32];
for (i = 0; i < 16; i++) {
int mfd, vfd;
struct media_device_info info;
snprintf(path, sizeof path, "/dev/media%d", i);
mfd = open(path, O_RDWR | O_NONBLOCK);
if (mfd < 0) continue;
memset(&info, 0, sizeof info);
if (ioctl(mfd, MEDIA_IOC_DEVICE_INFO, &info) != 0 ||
strcmp(info.driver, "hantro-vpu") != 0) {
close(mfd);
continue;
}
if (find_decoder_video_node_via_topology(
mfd, av1_video, sizeof av1_video) != 0) {
close(mfd);
continue;
}
vfd = open(av1_video, O_RDWR | O_NONBLOCK);
if (vfd < 0) {
close(mfd);
continue;
}
if (!v4l2_find_format(vfd, V4L2_BUF_TYPE_VIDEO_OUTPUT, V4L2_PIX_FMT_AV1_FRAME) &&
!v4l2_find_format(vfd, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, V4L2_PIX_FMT_AV1_FRAME)) {
close(vfd);
close(mfd);
continue;
}
driver_data->video_fd_vpu981 = vfd;
driver_data->media_fd_vpu981 = mfd;
request_log("ampere-av1: vpu981 AV1 decoder at %s + %s\n",
av1_video, path);
break;
}
}
}
/*
* iter2 (ampere-kernel-decoders): probe the new HEVC EXT_SPS_RPS
* controls on each rkvdec/hantro fd. Result is consumed by
* h265_set_controls per-codec gate. Per-fd storage matches the
* iter38 multi-device-probe pattern (Phase 5 review item).
*/
driver_data->has_hevc_ext_sps_rps_rkvdec =
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_rkvdec);
driver_data->has_hevc_ext_sps_rps_hantro =
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_hantro);
driver_data->has_hevc_ext_sps_rps_rpi_hevc_dec =
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_rpi_hevc_dec);
if (driver_data->has_hevc_ext_sps_rps_rkvdec) {
request_log("iter2: kernel registers HEVC EXT_SPS_{ST,LT}_RPS "
"controls on rkvdec fd (will route through "
"vendored GStreamer parser)\n");
}
if (driver_data->video_fd_rpi_hevc_dec >= 0) {
request_log("iter40: also opened rpi-hevc-dec at video_fd=%d "
"media_fd=%d (Pi 5 HEVC stateless)\n",
driver_data->video_fd_rpi_hevc_dec,
driver_data->media_fd_rpi_hevc_dec);
}
#ifdef HAVE_DAEDALUS_V4L2
if (driver_data->video_fd_daedalus >= 0) {
request_log("phase 8.10: opened daedalus_v4l2 at video_fd=%d "
"media_fd=%d (Pi 5 daemon-backed VP9/AV1/H264)\n",
driver_data->video_fd_daedalus,
driver_data->media_fd_daedalus);
}
#endif
/* Probe the CAPTURE pixel format eagerly so vaCreateContext doesn't
* race ahead of vaCreateSurfaces. Fine if this returns NULL we
* still let the driver init succeed and let surface creation try
* again later (preserves the original lazy behaviour for any caller
* that needs it). */
driver_data->video_format = video_format_probe(video_fd);
if (driver_data->video_format != NULL)
request_log("Init: detected CAPTURE format %s (mplane=%d)\n",
driver_data->video_format->description,
driver_data->video_format->v4l2_mplane);
else
request_log("Init: no CAPTURE format detected at probe time\n");
status = VA_STATUS_SUCCESS;
goto complete;
@@ -934,48 +219,8 @@ VAStatus RequestTerminate(VADriverContextP context)
struct object_config *config_object;
int iterator;
/*
* Tear down the OUTPUT buffer pool before closing video_fd so
* the munmap calls in request_pool_destroy() can still touch the
* mmap regions (which are tied to that fd's lifetime).
*/
request_pool_destroy(&driver_data->output_pool);
/*
* iter38: close both probed device pairs. video_fd / media_fd above
* are ACTIVE pointers into one of these pairs; close the underlying
* fds explicitly. Each may be -1 if its device wasn't found.
*/
if (driver_data->video_fd_rkvdec >= 0)
close(driver_data->video_fd_rkvdec);
if (driver_data->media_fd_rkvdec >= 0)
close(driver_data->media_fd_rkvdec);
if (driver_data->video_fd_hantro >= 0)
close(driver_data->video_fd_hantro);
if (driver_data->media_fd_hantro >= 0)
close(driver_data->media_fd_hantro);
if (driver_data->video_fd_rpi_hevc_dec >= 0)
close(driver_data->video_fd_rpi_hevc_dec);
if (driver_data->media_fd_rpi_hevc_dec >= 0)
close(driver_data->media_fd_rpi_hevc_dec);
if (driver_data->video_fd_vpu981 >= 0)
close(driver_data->video_fd_vpu981);
if (driver_data->media_fd_vpu981 >= 0)
close(driver_data->media_fd_vpu981);
#ifdef HAVE_DAEDALUS_V4L2
if (driver_data->video_fd_daedalus >= 0)
close(driver_data->video_fd_daedalus);
if (driver_data->media_fd_daedalus >= 0)
close(driver_data->media_fd_daedalus);
#endif
/* Fall back to direct close if neither alt fd captured the active
* pair (env-override path). */
if (driver_data->video_fd_rkvdec < 0 && driver_data->video_fd_hantro < 0) {
if (driver_data->video_fd >= 0)
close(driver_data->video_fd);
if (driver_data->media_fd >= 0)
close(driver_data->media_fd);
}
close(driver_data->video_fd);
close(driver_data->media_fd);
/* Cleanup leftover buffers. */
+1 -224
View File
@@ -31,27 +31,14 @@
#include "context.h"
#include "object_heap.h"
#include "request_pool.h"
#include "cap_pool.h"
#include "video.h"
#include <va/va.h>
#include <linux/videodev2.h>
#include "hevc-ctrls/v4l2-hevc-ext-controls.h"
#define V4L2_REQUEST_STR_VENDOR "v4l2-request"
/*
* Sized for max-possible enumeration with iter39 Option B reverted:
* MPEG2(2) + H264(6 incl. Hi10P) + HEVC(2 incl. Main10) + VP8 + VP9 + AV1 = 13.
* The per-group guards use `if (... && index < (MAX_PROFILES - N))` where N
* is the push-group size, so MAX must be total+1 14 here. Bumping
* defensively now so a future re-enable of Hi10P/Main10 doesn't silently
* drop AV1 through the off-by-one trap that ate ampere-av1's enumeration
* for a week (see issue marfrit/libva-v4l2-request-fourier#2).
*/
#define V4L2_REQUEST_MAX_PROFILES 14
#define V4L2_REQUEST_MAX_PROFILES 11
#define V4L2_REQUEST_MAX_ENTRYPOINTS 5
#define V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES 10
#define V4L2_REQUEST_MAX_IMAGE_FORMATS 10
@@ -67,220 +54,10 @@ struct request_data {
int video_fd;
int media_fd;
/*
* iter38: multi-device probe. RK3399 has two V4L2 stateless decoders:
* - rkvdec H264 / HEVC / VP9
* - hantro-vpu (rk3399-vpu-dec) MPEG-2 / VP8
* At VA_DRIVER_INIT we probe both, open their fds, and store them
* here. driver_data->video_fd / media_fd above are the "active" fds
* (point at one of the pairs below). RequestCreateConfig retargets
* them based on the profile's required device. Pools and video_format
* are torn down at retarget time so the next CreateContext rebuilds
* them against the right device.
*
* -1 means that device kind isn't present on this kernel boot.
* Honours LIBVA_V4L2_REQUEST_VIDEO_PATH / MEDIA_PATH explicit
* overrides when those are set, only the single requested device
* is opened and the alt fds stay -1.
*/
int video_fd_rkvdec;
int media_fd_rkvdec;
int video_fd_hantro;
int media_fd_hantro;
/*
* iter40: third multi-device-probe slot for rpi-hevc-dec (Pi 5 /
* CM5 / BCM2712). V4L2 stateless HEVC; CAPTURE is NC12/NC30 SAND
* 128-pixel-wide column tiled (Pi-specific). On Pi 5 this is the
* ONLY decoder slot; on RK hosts it stays -1 and HEVC routes to
* rkvdec as before.
*/
int video_fd_rpi_hevc_dec;
int media_fd_rpi_hevc_dec;
/*
* phase 8.10: fifth multi-device-probe slot for daedalus_v4l2 the
* out-of-tree V4L2 stateless decoder shim that forwards bitstream
* to a userspace daemon (daedalus-v4l2 sibling repo). Daemon does
* FFmpeg-software decode for VP9 / AV1 / H.264 and ships pixels
* back via dmabuf into the CAPTURE buffer. Picked up via the
* same media-controller probe + known_decoder_drivers[] entry
* pattern as iter40 rpi-hevc-dec. Stays -1 on hosts without the
* daedalus module loaded; HEVC routes to rpi-hevc-dec as before.
*
* Fields are unconditional (8 bytes per session) so the struct
* layout is stable regardless of meson option. The active
* probe + dispatch code in request.c is gated by
* HAVE_DAEDALUS_V4L2; when disabled the fields stay at their
* -1 init and no codepath touches them.
*/
int video_fd_daedalus;
int media_fd_daedalus;
/*
* ampere-av1-enablement Phase 2: fourth multi-device-probe slot
* for vpu981 (RK3588's dedicated AV1 hantro instance, kernel
* card="rockchip,rk3588-av1-vpu-dec", driver name "hantro-vpu"
* shared with the legacy MPEG-2/VP8/H.264 hantro). Discriminated
* by V4L2_PIX_FMT_AV1_FRAME (AV1F) OUTPUT-pixfmt capability since
* the driver name alone is ambiguous on RK3588. Stays -1 on hosts
* without the AV1 vpu-dec.
*
* Named "vpu981" for consistency with the in-progress av1-iter1
* operator branch (Phase 3-5 bit-exact AV1 work when that lands
* these fields receive the actual decode dispatch wiring).
*/
int video_fd_vpu981;
int media_fd_vpu981;
/*
* iter2 (ampere-kernel-decoders campaign) per-fd probe result
* for the V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS controls
* introduced in Linux 7.0 (Casanova VDPU381/VDPU383 series).
* RK3399 rkvdec doesn't have them and the probe returns false;
* RK3588 rkvdec (VDPU381/383) registers them and the probe is
* true. h265_set_controls consults only the rkvdec entry because
* HEVC routes through rkvdec only hantro's entry stays false
* naturally (it doesn't have rkvdec-specific controls).
*
* The pair-of-flags layout mirrors video_fd_rkvdec /
* video_fd_hantro above (iter38 multi-device-probe pattern,
* memory feedback_multi_device_probe_design). Phase 5 review
* surfaced this as a correctness item: a single scalar on
* driver_data would silently misbehave across device-switch
* boundaries; per-fd storage is the safe shape.
*/
bool has_hevc_ext_sps_rps_rkvdec;
bool has_hevc_ext_sps_rps_hantro;
/* iter40: rpi-hevc-dec doesn't expose EXT_SPS_*_RPS controls
* (verified Phase 0 higgs probe: QUERY_EXT_CTRL on 0xa97 EINVAL).
* Probed for consistency with the iter2 pair-of-flags pattern;
* stays false on Pi 5 and the iter2 vendored-parser path naturally
* doesn't engage. */
bool has_hevc_ext_sps_rps_rpi_hevc_dec;
/*
* iter2 cached SPS-derived RPS arrays. SPS NALs only appear in
* source_data on IDR frames; non-IDR frames' h265_set_controls
* reuse the cached arrays so we don't submit zero-filled RPS to
* the kernel (which would re-trigger the OOPS the iter2 fix is
* designed to prevent). Single-slot cache (sps_id 0 only)
* adequate for the BBB / typical-stream case; multi-SPS streams
* would need expanding to a [16] cache keyed by sps_id.
*
* The cache stores the post-mapped V4L2 control struct arrays
* (not the intermediate GstH265SPS) so request.h doesn't need
* to know about the vendored GStreamer parser types only the
* V4L2 UAPI structs from hevc-ctrls/v4l2-hevc-ext-controls.h
* included above.
*
* Owned by h265.c; freed at RequestTerminate.
*/
struct v4l2_ctrl_hevc_ext_sps_st_rps *hevc_rps_cache_st;
unsigned int hevc_rps_cache_st_count;
struct v4l2_ctrl_hevc_ext_sps_lt_rps *hevc_rps_cache_lt;
unsigned int hevc_rps_cache_lt_count;
bool hevc_rps_cache_valid;
/*
* iter40b: bitstream-derived SPS field cache for VAAPI-omitted
* fields. rpi-hevc-dec validates these against bitstream-true
* values; the rkvdec/hantro fallback (sps_max_dec_pic_buffering_minus1,
* 0) that satisfies §A.4.2 isn't enough for rpi.
*
* Cached on first IDR frame's SPS NAL parse, reused for subsequent
* non-IDR frames whose source_data may not carry an SPS.
*
* sps_max_sub_layers_minus1 is the index into max_*[] arrays. The
* V4L2 SPS struct fields are scalars (single sublayer), so we pick
* the HighestTid (= sps_max_sub_layers_minus1) slot matches
* ffmpeg-vaapi + kdirect convention.
*/
struct {
bool valid;
uint8_t sps_max_sub_layers_minus1;
uint8_t max_dec_pic_buffering_minus1;
uint8_t max_num_reorder_pics;
uint8_t max_latency_increase_plus1;
bool scaling_list_enabled;
bool scaling_list_data_present;
} hevc_sps_field_cache;
struct video_format *video_format;
/*
* OUTPUT (bitstream-input) buffer pool, decoupled from VA
* surfaces. Sized by codec pipeline depth, populated on first
* RequestCreateContext, torn down at driver Terminate.
*/
struct request_pool output_pool;
/*
* CAPTURE (decoded-frame) buffer pool, decoupled from VA
* surfaces (iter2 Fix 3). Each surface acquires a slot at
* vaBeginPicture time and releases it on the next acquisition
* or vaDestroySurfaces. Pool sized to max(surfaces_count,
* MIN_CAP_POOL) at first vaCreateSurfaces2; torn down at
* vaDestroyContext.
*
* Background: pre-iter2 each surface was 1:1 bound to one
* CAPTURE buffer index; mpv re-using a surface for a new decode
* caused V4L2 to re-QBUF the same physical buffer while a
* compositor still held an EXPBUF'd dma_buf fd, producing
* visible stutter on mpv vaapi --vo=gpu.
*/
struct cap_pool capture_pool;
/*
* iter5b-β: the pre-β last_output_{width,height} cache fields
* and surface_reset_format_cache() helper are deleted. They
* existed because CreateSurfaces2 owned the OUTPUT-side V4L2
* device-format lifecycle and needed to gate re-S_FMT on
* resolution change. β moves that lifecycle to CreateContext,
* which is naturally one-shot per context cycle; no caching is
* required. DestroyContext + next CreateContext rebuild from
* scratch.
*
* iter5b-β Commit D: cache the format-uniform CAPTURE-side
* geometry from v4l2_get_format so CreateSurfaces2 can populate
* a newly-created surface's destination_* fields without
* re-querying the device. Set by CreateContext after the
* v4l2_get_format(CAPTURE) call; consumed by both:
* 1. CreateContext's surface_heap walk (fills surfaces that
* pre-exist when CreateContext fires);
* 2. CreateSurfaces2's per-surface init (fills surfaces
* created AFTER CreateContext, e.g. ffmpeg vaapi-copy
* pool dynamics where the consumer passes surfaces_count=0
* to vaCreateContext and creates surfaces lazily).
*
* fmt_valid is true once CreateContext has populated the cache;
* CreateSurfaces2 only lazy-fills when fmt_valid is true.
*/
bool fmt_valid;
unsigned int fmt_format_height;
unsigned int fmt_planes_count;
unsigned int fmt_buffers_count;
unsigned int fmt_sizes[VIDEO_MAX_PLANES];
unsigned int fmt_bytesperlines[VIDEO_MAX_PLANES];
/*
* iter39: active session is decoding a 10-bit profile (Hi10P / Main10).
* Set in RequestCreateContext from config->profile. Drives:
* - CAPTURE pix_fmt selection (NV15 instead of NV12)
* - image.c DeriveImage / QueryImageFormats fourcc reporting (P010
* instead of NV12)
* - copy_surface_to_image NV15P010 unpack branch
* Reset to false at DestroyContext.
*/
bool is_10bit;
};
VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context);
VAStatus RequestTerminate(VADriverContextP context);
/*
* iter38: retarget driver_data->{video,media}_fd to the device required by
* `profile`. Returns 0 on success, -1 on profile not mappable to any kind.
* Defined in request.c.
*/
int request_switch_device_for_profile(struct request_data *driver_data,
VAProfile profile);
#endif
-351
View File
@@ -1,351 +0,0 @@
/*
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
*/
#include "request_pool.h"
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include "media.h"
#include "utils.h"
#include "v4l2.h"
int request_pool_init(struct request_pool *pool, int video_fd, int media_fd,
unsigned int output_type, unsigned int count,
unsigned int pixelformat,
unsigned int picture_width,
unsigned int picture_height)
{
unsigned int index_base;
unsigned int length;
unsigned int offset;
unsigned int i;
int rc;
if (pool == NULL || count == 0)
return -1;
if (pool->initialized)
return 0;
pool->slots = calloc(count, sizeof(*pool->slots));
if (pool->slots == NULL)
return -1;
pool->count = count;
pool->next = 0;
pool->media_fd = media_fd; /* iter7: kept for force_release re-alloc */
/*
* iter#15: cache the S_FMT params so request_pool_resize can
* re-issue S_FMT with a sizeimage hint override on overrun.
*/
pool->video_fd = video_fd;
pool->output_type = output_type;
pool->pixelformat = pixelformat;
pool->picture_width = picture_width;
pool->picture_height = picture_height;
for (i = 0; i < count; i++)
pool->slots[i].request_fd = -1;
rc = v4l2_create_buffers(video_fd, output_type, count, &index_base);
if (rc < 0)
goto error;
for (i = 0; i < count; i++) {
pool->slots[i].index = index_base + i;
pool->slots[i].busy = false;
rc = v4l2_query_buffer(video_fd, output_type,
pool->slots[i].index,
&length, &offset, 1);
if (rc < 0)
goto error;
pool->slots[i].data = mmap(NULL, length,
PROT_READ | PROT_WRITE,
MAP_SHARED, video_fd, offset);
if (pool->slots[i].data == MAP_FAILED) {
pool->slots[i].data = NULL;
goto error;
}
pool->slots[i].size = length;
/*
* iter6: each pool slot owns a permanent media-request fd,
* allocated once here and REINIT'd between uses in
* RequestSyncSurface. Replaces the iter4 close+alloc-per-
* frame model, whose lowest-free fd reuse was racing with
* the kernel's per-buffer state-machine teardown when the
* consumer rotated through multiple OUTPUT pool slots
* faster than the kernel cleanup drained (Firefox's
* MediaSource pattern). 1:1 slot-to-fd binding eliminates
* cross-slot fd reuse.
*/
pool->slots[i].request_fd = media_request_alloc(media_fd);
if (pool->slots[i].request_fd < 0)
goto error;
}
pool->initialized = true;
return 0;
error:
request_pool_destroy(pool);
return -1;
}
int request_pool_resize(struct request_pool *pool,
unsigned int new_sizeimage_min)
{
unsigned int index_base;
unsigned int length;
unsigned int offset;
unsigned int saved_count;
unsigned int i;
int rc;
if (pool == NULL || !pool->initialized || pool->count == 0)
return -1;
/*
* Pre-condition guard: no slot may be borrowed when we tear the
* pool down. The caller in codec_store_buffer temporarily releases
* the current in-flight surface's slot before invoking us; the
* inline-Sync-in-EndPicture pattern guarantees no other slot is
* borrowed elsewhere in the driver. Bail loudly if anyone breaks
* that invariant rather than corrupting in-flight V4L2 state.
*/
for (i = 0; i < pool->count; i++) {
if (pool->slots[i].busy) {
request_log("request_pool_resize: slot %u still busy — "
"caller must release before resize\n", i);
return -1;
}
}
saved_count = pool->count;
/* STREAMOFF the OUTPUT queue so REQBUFS(0) is accepted. */
rc = v4l2_set_stream(pool->video_fd, pool->output_type, false);
if (rc < 0)
return -1;
/*
* Tear down every slot: munmap, close per-slot request_fd. Slot
* fields are zeroed in place so failure halfway is recoverable.
*/
for (i = 0; i < pool->count; i++) {
if (pool->slots[i].data != NULL && pool->slots[i].size > 0) {
munmap(pool->slots[i].data, pool->slots[i].size);
pool->slots[i].data = NULL;
pool->slots[i].size = 0;
}
if (pool->slots[i].request_fd >= 0) {
close(pool->slots[i].request_fd);
pool->slots[i].request_fd = -1;
}
}
/*
* Release the V4L2 OUTPUT buffer indices. REQBUFS(0) is the only
* way to ask the kernel to free buffers so CREATE_BUFS can re-
* allocate with a new per-buffer sizeimage.
*/
rc = v4l2_request_buffers(pool->video_fd, pool->output_type, 0);
if (rc < 0)
return -1;
/*
* Re-issue S_FMT with the cached dimensions but a larger
* sizeimage. The kernel may round up further (driver-specific
* page / alignment rules); we accept whatever it returns and
* pick that up from per-slot v4l2_query_buffer below.
*/
rc = v4l2_set_format_sizeimage(pool->video_fd, pool->output_type,
pool->pixelformat,
pool->picture_width,
pool->picture_height,
new_sizeimage_min);
if (rc < 0)
return -1;
rc = v4l2_create_buffers(pool->video_fd, pool->output_type,
saved_count, &index_base);
if (rc < 0)
return -1;
for (i = 0; i < saved_count; i++) {
pool->slots[i].index = index_base + i;
pool->slots[i].busy = false;
rc = v4l2_query_buffer(pool->video_fd, pool->output_type,
pool->slots[i].index,
&length, &offset, 1);
if (rc < 0)
return -1;
pool->slots[i].data = mmap(NULL, length,
PROT_READ | PROT_WRITE,
MAP_SHARED, pool->video_fd, offset);
if (pool->slots[i].data == MAP_FAILED) {
pool->slots[i].data = NULL;
return -1;
}
pool->slots[i].size = length;
pool->slots[i].request_fd = media_request_alloc(pool->media_fd);
if (pool->slots[i].request_fd < 0)
return -1;
}
rc = v4l2_set_stream(pool->video_fd, pool->output_type, true);
if (rc < 0)
return -1;
pool->next = 0;
return 0;
}
void request_pool_destroy(struct request_pool *pool)
{
unsigned int i;
if (pool == NULL || pool->slots == NULL)
return;
for (i = 0; i < pool->count; i++) {
if (pool->slots[i].request_fd >= 0)
close(pool->slots[i].request_fd);
if (pool->slots[i].data != NULL && pool->slots[i].size > 0)
munmap(pool->slots[i].data, pool->slots[i].size);
}
free(pool->slots);
pool->slots = NULL;
pool->count = 0;
pool->next = 0;
pool->initialized = false;
}
int request_pool_acquire(struct request_pool *pool)
{
unsigned int start;
unsigned int i;
if (pool == NULL || !pool->initialized || pool->count == 0)
return -1;
start = pool->next;
for (i = 0; i < pool->count; i++) {
unsigned int slot = (start + i) % pool->count;
if (!pool->slots[slot].busy) {
pool->slots[slot].busy = true;
pool->next = (slot + 1) % pool->count;
return (int)pool->slots[slot].index;
}
}
/* All slots busy; caller must wait for an in-flight DQBUF. */
return -1;
}
void request_pool_release(struct request_pool *pool, unsigned int index)
{
unsigned int i;
if (pool == NULL || pool->slots == NULL)
return;
for (i = 0; i < pool->count; i++) {
if (pool->slots[i].index == index) {
pool->slots[i].busy = false;
return;
}
}
}
void request_pool_force_release(struct request_pool *pool, unsigned int index)
{
struct request_pool_slot *slot;
unsigned int i;
if (pool == NULL || pool->slots == NULL)
return;
slot = NULL;
for (i = 0; i < pool->count; i++) {
if (pool->slots[i].index == index) {
slot = &pool->slots[i];
break;
}
}
if (slot == NULL)
return;
/*
* Try to recover the kernel-side request object via REINIT first.
* REINIT is the cheap path: kernel resets the request in place,
* fd stays valid, slot can be reused immediately.
*/
if (slot->request_fd >= 0 && media_request_reinit(slot->request_fd) == 0) {
slot->busy = false;
return;
}
/*
* REINIT failed (or slot's fd was already invalid). Close the fd
* and try to allocate a fresh one. This costs an extra ioctl pair
* relative to the REINIT happy path but keeps the slot usable.
*
* NOTE: alloc may return the same lowest-free fd number that was
* just closed. That's fine here because (a) this is a rare error-
* recovery path, not the per-frame happy path, and (b) the slot's
* V4L2 buffer has already been DQBUF'd by this point (or is in an
* indeterminate state we can't recover from regardless), so the
* iter6 race condition (cross-slot fd-reuse against a kernel
* buffer in mid-cleanup) does not apply.
*/
if (slot->request_fd >= 0)
close(slot->request_fd);
slot->request_fd = media_request_alloc(pool->media_fd);
if (slot->request_fd < 0) {
/*
* Realloc failed. Slot is now permanently dead leave
* busy=true so acquire skips it. Pool capacity is
* effectively reduced by 1 until pool destroy.
*/
return;
}
slot->busy = false;
}
struct request_pool_slot *request_pool_slot(struct request_pool *pool,
unsigned int index)
{
unsigned int i;
if (pool == NULL || pool->slots == NULL)
return NULL;
for (i = 0; i < pool->count; i++) {
if (pool->slots[i].index == index)
return &pool->slots[i];
}
return NULL;
}
-162
View File
@@ -1,162 +0,0 @@
/*
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
*/
#ifndef _REQUEST_POOL_H_
#define _REQUEST_POOL_H_
#include <stdbool.h>
/*
* OUTPUT (bitstream-input) buffer pool, decoupled from caller-allocated
* VA surfaces. Sizing is driven by codec pipeline depth (typically 4
* for H.264), not by the consumer's surface count.
*
* The pool owns the V4L2 buffer indices and their mmap pointers. A
* decode request "borrows" a slot at vaBeginPicture, fills it across
* vaRenderPicture calls, queues it at vaEndPicture, and releases it
* after VIDIOC_DQBUF returns.
*
* This replaces the per-surface OUTPUT-buffer ownership model in the
* pre-refactor code, where object_surface.source_* fields permanently
* held a single OUTPUT buffer per surface incorrect because OUTPUT
* buffers are request-time resources, not picture-time resources, and
* because the per-surface loop in RequestCreateContext only ran when
* surfaces_count > 0 (breaking ffmpeg's vaapi-copy num_render_targets=0
* convention).
*/
struct request_pool_slot {
unsigned int index; /* V4L2 buffer index in OUTPUT queue */
void *data; /* mmap pointer for this slot */
unsigned int size; /* mmap size in bytes */
bool busy; /* true while borrowed for a request */
int request_fd; /* per-slot media-request fd, allocated
* once at pool init, REINIT'd between
* uses. iter6: replaces iter4 close+
* alloc-per-frame to eliminate cross-
* slot fd-reuse race that broke Firefox
* MediaSource's multi-surface decode. */
};
struct request_pool {
struct request_pool_slot *slots;
unsigned int count;
unsigned int next; /* round-robin acquire cursor */
int media_fd; /* iter7: kept for
* force_release re-alloc */
bool initialized;
/*
* iter#15: cached S_FMT params from request_pool_init, so
* request_pool_resize can re-S_FMT the OUTPUT queue with a new
* sizeimage override on a mid-session resolution upshift overrun
* without the caller having to re-thread these through six call
* sites. video_fd is also cached so the resize is fully
* self-contained request_pool_resize takes only the pool and
* the new sizeimage hint.
*/
int video_fd;
unsigned int output_type;
unsigned int pixelformat;
unsigned int picture_width;
unsigned int picture_height;
};
/*
* Allocate count OUTPUT buffers via VIDIOC_CREATE_BUFS, query and mmap
* each, populate pool->slots[]. Caller must have already done
* VIDIOC_S_FMT on the OUTPUT queue. The S_FMT params (pixelformat,
* picture_width, picture_height) are stashed on the pool so that
* request_pool_resize can re-issue S_FMT with the same dimensions but
* a larger sizeimage hint. Returns 0 on success, -1 on failure.
*/
int request_pool_init(struct request_pool *pool, int video_fd, int media_fd,
unsigned int output_type, unsigned int count,
unsigned int pixelformat,
unsigned int picture_width,
unsigned int picture_height);
/*
* iter#15: grow the OUTPUT pool's per-slot sizeimage in place.
*
* Issued from codec_store_buffer when an Annex-B start code / VP8
* header pad / slice payload won't fit in the current
* surface->source_size i.e. the stream's per-frame bitstream budget
* has outgrown the OUTPUT pool slot's mmap (typical cause: SPS-driven
* resolution upshift mid-session).
*
* Steps:
* 1. STREAMOFF the OUTPUT queue.
* 2. munmap every slot, close every per-slot media-request fd.
* 3. VIDIOC_REQBUFS(count=0) to release the V4L2 buffer indices.
* 4. S_FMT with the cached pixelformat / picture_width /
* picture_height but a sizeimage hint of new_sizeimage_min.
* 5. CREATE_BUFS with the original slot count.
* 6. Per-slot: query buffer length, mmap, alloc fresh request_fd.
* 7. STREAMON.
*
* Returns 0 on success, -1 on failure (caller falls back to
* VA_STATUS_ERROR_ALLOCATION_FAILED the libva consumer recreates
* the surface at the new resolution).
*
* Pre-condition: NO pool slot is currently borrowed (busy=false on
* every slot) AND no buffer is in-flight on the OUTPUT queue. The
* inline-Sync-in-EndPicture pattern (RequestEndPicture calls
* RequestSyncSurface before returning) makes this trivially true at
* codec_store_buffer time for the only-supported single-context
* single-render-surface flow: the in-flight surface's slot is the
* sole borrowed slot, and the resize caller temporarily releases it
* before calling here.
*/
int request_pool_resize(struct request_pool *pool,
unsigned int new_sizeimage_min);
/*
* Munmap all slots and free the slots array. Idempotent.
*/
void request_pool_destroy(struct request_pool *pool);
/*
* Claim the next free slot (round-robin). Returns the slot's V4L2
* buffer index on success (slot in pool->slots[] is determined by
* the returned index), or -1 if all slots are busy.
*/
int request_pool_acquire(struct request_pool *pool);
/*
* Mark the slot at pool->slots[i] free for reuse. Caller must pass the
* V4L2 buffer index returned earlier from request_pool_acquire().
*/
void request_pool_release(struct request_pool *pool, unsigned int index);
/*
* iter7: error-recovery release. Called from RequestSyncSurface error
* paths when media_request_reinit or VIDIOC_DQBUF failed mid-cycle and
* the slot's request_fd is now in an undefined state. REINITs the fd;
* if REINIT fails (kernel-side request object too far gone), close
* the fd and re-alloc a fresh one. If realloc also fails, the slot
* is left busy=true (effectively dead, count decremented by 1) pool
* survives but with reduced capacity until driver terminate. Other
* slots are unaffected.
*
* Caller passes the V4L2 buffer index from request_pool_acquire().
*/
void request_pool_force_release(struct request_pool *pool,
unsigned int index);
/*
* Look up the pool slot owning a given V4L2 buffer index. Returns
* pointer to the slot on success, NULL if the index is out of range.
* The returned pointer is valid until pool destruction; do not free.
*/
struct request_pool_slot *request_pool_slot(struct request_pool *pool,
unsigned int index);
#endif
+187 -451
View File
@@ -29,7 +29,6 @@
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
@@ -47,112 +46,6 @@
#include "v4l2.h"
#include "video.h"
/*
* iter5b-β: the OUTPUT-side V4L2 device-format lifecycle moved out
* of this file. Pre-β CreateSurfaces2 owned the S_FMT(OUTPUT) +
* CAPTURE-format probe + cap_pool_init + per-surface destination_*
* fill; now that responsibility lives in context.c::RequestCreateContext
* where the bound config (and therefore the active VAProfile) is
* known via config_id. CreateSurfaces2 retains only surface object
* ID allocation and per-surface bookkeeping. The previous
* `surface_reset_format_cache` helper and `last_output_width/height`
* fields are deleted (β doesn't gate re-S_FMT on
* resolution the lifecycle is CreateContext-centric and natural
* setup/teardown happens at each context cycle).
*/
/*
* Iter2 Fix 3 helpers bind / unbind a cap_pool_slot to an
* object_surface. Called from BeginPicture (acquire+bind) and
* DestroySurfaces (unbind). Populates surface_object->destination_*
* fields from the slot so existing code paths (the QBUF in
* picture.c::EndPicture, the EXPBUF in ExportSurfaceHandle, the
* mmap-read in copy_surface_to_image) continue to work unchanged.
*
* surface_bind_slot is called only from BeginPicture; the surface's
* format-uniform fields (destination_planes_count, destination_sizes,
* destination_offsets, destination_bytesperlines) are already set
* by CreateSurfaces2 and stay constant.
*/
void surface_bind_slot(struct object_surface *surface_object,
struct cap_pool_slot *slot)
{
unsigned int j;
surface_object->current_slot = slot;
surface_object->destination_index = slot->v4l2_index;
surface_object->destination_buffers_count = slot->buffers_count;
for (j = 0; j < slot->buffers_count; j++) {
surface_object->destination_map[j] = slot->map[j];
surface_object->destination_map_lengths[j] = slot->map_lengths[j];
surface_object->destination_map_offsets[j] = slot->map_offsets[j];
}
/*
* destination_data[j] is the per-plane CPU pointer used by
* copy_surface_to_image. For single-buffer MPLANE NV12 (our
* common case), all planes live in slot->map[0] at varying
* offsets recorded in destination_offsets[].
*/
if (slot->buffers_count == 1) {
for (j = 0; j < surface_object->destination_planes_count; j++)
surface_object->destination_data[j] =
(unsigned char *)slot->map[0] +
surface_object->destination_offsets[j];
} else {
for (j = 0; j < surface_object->destination_planes_count; j++)
surface_object->destination_data[j] = slot->map[j];
}
}
void surface_unbind_slot(struct request_data *driver_data,
struct object_surface *surface_object)
{
if (surface_object->current_slot == NULL)
return;
cap_pool_release(&driver_data->capture_pool, surface_object->current_slot);
surface_object->current_slot = NULL;
}
/*
* iter5b-β Commit D: fill format-uniform destination_* on a surface
* from driver_data's CAPTURE-format cache. Idempotent: no-op if
* destination_planes_count is non-zero already.
*/
void surface_fill_format_uniform(struct request_data *driver_data,
struct object_surface *surface_object)
{
unsigned int j;
if (!driver_data->fmt_valid)
return;
if (surface_object->destination_planes_count != 0)
return;
surface_object->destination_planes_count = driver_data->fmt_planes_count;
surface_object->destination_buffers_count = driver_data->fmt_buffers_count;
if (driver_data->fmt_buffers_count == 1) {
for (j = 0; j < driver_data->fmt_planes_count; j++) {
surface_object->destination_offsets[j] =
j > 0 ? driver_data->fmt_sizes[j - 1] : 0;
surface_object->destination_sizes[j] =
driver_data->fmt_sizes[j];
surface_object->destination_bytesperlines[j] =
driver_data->fmt_bytesperlines[0];
}
} else if (driver_data->fmt_buffers_count == driver_data->fmt_planes_count) {
for (j = 0; j < driver_data->fmt_planes_count; j++) {
surface_object->destination_offsets[j] = 0;
surface_object->destination_sizes[j] =
driver_data->fmt_sizes[j];
surface_object->destination_bytesperlines[j] =
driver_data->fmt_bytesperlines[j];
}
}
}
VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
unsigned int width, unsigned int height,
VASurfaceID *surfaces_ids,
@@ -162,41 +55,146 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
{
struct request_data *driver_data = context->pDriverData;
struct object_surface *surface_object;
unsigned int i;
struct video_format *video_format = NULL;
unsigned int destination_sizes[VIDEO_MAX_PLANES];
unsigned int destination_bytesperlines[VIDEO_MAX_PLANES];
unsigned int destination_planes_count;
unsigned int format_width, format_height;
unsigned int capture_type;
unsigned int index_base;
unsigned int index;
unsigned int i, j;
VASurfaceID id;
int rc;
/*
* iter5b-β: only RT-format-level validation here. All V4L2
* device state (OUTPUT format, CAPTURE format probe,
* cap_pool_init, per-surface destination_* fill) is deferred
* to RequestCreateContext where the bound VAConfigID
* (and therefore the active VAProfile) is known. CreateSurfaces2
* has no config_id parameter; the VA-API contract is
* CreateConfig CreateSurfaces CreateContext, and we
* can't know the OUTPUT pixel format until CreateContext binds.
*
* Surface objects allocated here hold only the requested
* width/height and per-surface lifecycle bookkeeping
* (current_slot, status, params, etc). The format-uniform
* destination_* fields are filled by CreateContext via
* surface_bind_format_uniform_fields(); the per-slot
* destination_* fields fill at BeginPicture via surface_bind_slot.
*/
/* iter39: allow YUV420_10 for Hi10P / Main10 surface allocation. */
if (format != VA_RT_FORMAT_YUV420 &&
format != VA_RT_FORMAT_YUV420_10)
request_log("CreateSurfaces2: format=0x%x %dx%d count=%d\n",
format, width, height, surfaces_count);
if (format != VA_RT_FORMAT_YUV420) {
request_log("CreateSurfaces2: rejecting RT_FORMAT 0x%x\n", format);
return VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT;
}
if (!driver_data->video_format) {
/* Could happen if RequestInit's eager probe came back NULL —
* try one more time here in case the device only became
* format-enumerable after streaming setup. */
driver_data->video_format = video_format_probe(driver_data->video_fd);
if (driver_data->video_format == NULL) {
request_log("CreateSurfaces2: video_format probe still NULL\n");
return VA_STATUS_ERROR_OPERATION_FAILED;
}
}
video_format = driver_data->video_format;
capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
/* Set the CAPTURE format on first surface creation so the size is
* pinned to the picture dimensions the caller passed in. */
rc = v4l2_set_format(driver_data->video_fd, capture_type,
video_format->v4l2_format, width, height);
if (rc < 0) {
request_log("CreateSurfaces2: S_FMT(CAPTURE) failed for fmt=0x%x %dx%d\n",
video_format->v4l2_format, width, height);
return VA_STATUS_ERROR_OPERATION_FAILED;
}
rc = v4l2_get_format(driver_data->video_fd, capture_type, &format_width,
&format_height, destination_bytesperlines,
destination_sizes, NULL);
if (rc < 0) {
request_log("CreateSurfaces2: G_FMT(CAPTURE) failed\n");
return VA_STATUS_ERROR_OPERATION_FAILED;
}
request_log("CreateSurfaces2: G_FMT got %dx%d bpl[0]=%d size[0]=%d\n",
format_width, format_height,
destination_bytesperlines[0], destination_sizes[0]);
destination_planes_count = video_format->planes_count;
rc = v4l2_create_buffers(driver_data->video_fd, capture_type,
surfaces_count, &index_base);
if (rc < 0) {
request_log("CreateSurfaces2: CREATE_BUFS(CAPTURE) failed\n");
return VA_STATUS_ERROR_ALLOCATION_FAILED;
}
request_log("CreateSurfaces2: CREATE_BUFS ok index_base=%u\n", index_base);
for (i = 0; i < surfaces_count; i++) {
index = index_base + i;
id = object_heap_allocate(&driver_data->surface_heap);
surface_object = SURFACE(driver_data, id);
if (surface_object == NULL)
return VA_STATUS_ERROR_ALLOCATION_FAILED;
surface_object->current_slot = NULL; /* iter2 Fix 3 */
surface_object->destination_index = 0; /* set on bind */
surface_object->destination_planes_count = 0; /* set at CreateContext */
surface_object->destination_buffers_count = 0; /* set at CreateContext */
rc = v4l2_query_buffer(driver_data->video_fd, capture_type,
index,
surface_object->destination_map_lengths,
surface_object->destination_map_offsets,
video_format->v4l2_buffers_count);
if (rc < 0) {
request_log("CreateSurfaces2: QUERYBUF idx=%u failed\n", index);
return VA_STATUS_ERROR_ALLOCATION_FAILED;
}
for (j = 0; j < video_format->v4l2_buffers_count; j++) {
surface_object->destination_map[j] =
mmap(NULL,
surface_object->destination_map_lengths[j],
PROT_READ | PROT_WRITE, MAP_SHARED,
driver_data->video_fd,
surface_object->destination_map_offsets[j]);
if (surface_object->destination_map[j] == MAP_FAILED) {
request_log("CreateSurfaces2: mmap idx=%u plane=%u len=%u offset=%u failed\n",
index, j,
surface_object->destination_map_lengths[j],
surface_object->destination_map_offsets[j]);
return VA_STATUS_ERROR_ALLOCATION_FAILED;
}
}
/*
* FIXME: Handle this per-pixelformat, trying to generalize it
* is not a reasonable approach. The final description should be
* in terms of (logical) planes.
*/
if (video_format->v4l2_buffers_count == 1) {
destination_sizes[0] = destination_bytesperlines[0] *
format_height;
for (j = 1; j < destination_planes_count; j++)
destination_sizes[j] = destination_sizes[0] / 2;
for (j = 0; j < destination_planes_count; j++) {
surface_object->destination_offsets[j] =
j > 0 ? destination_sizes[j - 1] : 0;
surface_object->destination_data[j] =
((unsigned char *)surface_object->destination_map[0] +
surface_object->destination_offsets[j]);
surface_object->destination_sizes[j] =
destination_sizes[j];
surface_object->destination_bytesperlines[j] =
destination_bytesperlines[0];
}
} else if (video_format->v4l2_buffers_count == destination_planes_count) {
for (j = 0; j < destination_planes_count; j++) {
surface_object->destination_offsets[j] = 0;
surface_object->destination_data[j] =
surface_object->destination_map[j];
surface_object->destination_sizes[j] =
destination_sizes[j];
surface_object->destination_bytesperlines[j] =
destination_bytesperlines[j];
}
} else {
request_log("CreateSurfaces2: buffers_count=%u planes_count=%u mismatch\n",
video_format->v4l2_buffers_count,
destination_planes_count);
return VA_STATUS_ERROR_ALLOCATION_FAILED;
}
surface_object->status = VASurfaceReady;
surface_object->width = width;
@@ -206,6 +204,13 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
surface_object->source_data = NULL;
surface_object->source_size = 0;
surface_object->destination_index = index;
surface_object->destination_planes_count =
destination_planes_count;
surface_object->destination_buffers_count =
video_format->v4l2_buffers_count;
memset(&surface_object->params, 0,
sizeof(surface_object->params));
surface_object->slices_count = 0;
@@ -213,17 +218,6 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
surface_object->request_fd = -1;
/*
* iter5b-β Commit D: if CreateContext has already populated
* the format-uniform cache (driver_data->fmt_valid), fill
* the new surface's destination_* immediately. This covers
* the case where a consumer creates more surfaces AFTER
* CreateContext. The first batch of surfaces (created before
* CreateContext) gets filled by CreateContext's surface_heap
* walk; this lazy-fill handles late arrivals.
*/
surface_fill_format_uniform(driver_data, surface_object);
surfaces_ids[i] = id;
}
@@ -243,32 +237,26 @@ VAStatus RequestDestroySurfaces(VADriverContextP context,
{
struct request_data *driver_data = context->pDriverData;
struct object_surface *surface_object;
unsigned int i;
unsigned int i, j;
for (i = 0; i < surfaces_count; i++) {
surface_object = SURFACE(driver_data, surfaces_ids[i]);
if (surface_object == NULL)
return VA_STATUS_ERROR_INVALID_SURFACE;
/*
* source_* are now transient borrows from request_pool, not
* surface-owned mappings; the pool owns the underlying mmap.
* Nothing to free here.
*
* Iter2 Fix 3: destination_* mappings are owned by cap_pool;
* surface_unbind_slot returns the slot to FREE (closing OUR
* EXPBUF fd if any). Pool-owned mmaps are freed at
* cap_pool_destroy time (RequestDestroyContext).
*/
surface_unbind_slot(driver_data, surface_object);
if (surface_object->source_data != NULL &&
surface_object->source_size > 0)
munmap(surface_object->source_data,
surface_object->source_size);
/*
* iter6: request_fd is owned by the OUTPUT pool slot, not by
* the surface. Do not close here. The pool closes all slot
* fds at request_pool_destroy time, which fires from
* RequestTerminate (driver unload) the pool is driver-wide
* and survives context destroy/recreate cycles.
*/
for (j = 0; j < surface_object->destination_buffers_count; j++)
if (surface_object->destination_map[j] != NULL &&
surface_object->destination_map_lengths[j] > 0)
munmap(surface_object->destination_map[j],
surface_object->destination_map_lengths[j]);
if (surface_object->request_fd > 0)
close(surface_object->request_fd);
object_heap_free(&driver_data->surface_heap,
(struct object_base *)surface_object);
@@ -279,9 +267,8 @@ VAStatus RequestDestroySurfaces(VADriverContextP context,
VAStatus RequestSyncSurface(VADriverContextP context, VASurfaceID surface_id)
{
struct request_data *driver_data = context->pDriverData;
struct object_surface *surface_object = NULL;
struct object_surface *surface_object;
VAStatus status;
struct video_format *video_format;
unsigned int output_type, capture_type;
@@ -326,51 +313,19 @@ VAStatus RequestSyncSurface(VADriverContextP context, VASurfaceID surface_id)
goto error;
}
/*
* iter6: the request_fd belongs to the OUTPUT pool slot, not to the
* surface. REINIT to reset its state in place close+alloc would
* reuse the lowest-free fd number against a kernel object whose
* teardown hasn't fully drained, racing with QBUF on a slot that
* was just released. The pool's 1:1 slot-to-fd binding eliminates
* cross-slot fd reuse, and REINIT here resets the request object
* for the next decode cycle on the same slot.
*
* Iter4's frame-11 EINVAL (which prompted the iter4 close+alloc
* model) was a control-payload bug DPB carry-over with FFmpeg's
* V4L2_H264_FRAME_REF semantics not yet matched. That's been fixed
* since iter4 (`74d8dd1`), so REINIT is no longer compromised by
* the cluster-validation EINVAL pattern.
*/
rc = media_request_reinit(request_fd);
if (rc < 0) {
status = VA_STATUS_ERROR_OPERATION_FAILED;
goto error;
}
surface_object->request_fd = -1;
rc = v4l2_dequeue_buffer(driver_data->video_fd, -1, output_type,
surface_object->source_index, 1);
if (rc < 0) {
status = VA_STATUS_ERROR_OPERATION_FAILED;
/*
* iter7: OUTPUT DQBUF failed. The V4L2 buffer is in an
* indeterminate kernel state it may still be QUEUED. Do
* NOT return the slot to acquire-rotation: the next QBUF
* on it would EINVAL. Leave source_data set so the error
* handler skips force_release and the slot stays dead-busy.
*/
goto error_buffer_indeterminate;
goto error;
}
/*
* OUTPUT buffer is back from the kernel: return its pool slot
* for reuse and clear the surface's transient borrow handle.
*/
request_pool_release(&driver_data->output_pool,
surface_object->source_index);
surface_object->source_data = NULL;
surface_object->source_size = 0;
rc = v4l2_dequeue_buffer(driver_data->video_fd, -1, capture_type,
surface_object->destination_index,
surface_object->destination_buffers_count);
@@ -379,152 +334,14 @@ VAStatus RequestSyncSurface(VADriverContextP context, VASurfaceID surface_id)
goto error;
}
/*
* Iter2 Fix 3: CAPTURE buffer is back from the kernel with valid
* pixel content. Transition the slot IN_DECODE DECODED. The slot
* stays bound to this surface until either ExportSurfaceHandle
* ( EXPORTED), the next BeginPicture for this surface (slot is
* released first), or DestroySurfaces (release).
*/
if (surface_object->current_slot != NULL) {
cap_pool_mark_decoded(&driver_data->capture_pool,
surface_object->current_slot);
/*
* iter8 Phase 6 (γ): env-gated diagnostic dump of the CAPTURE
* buffer immediately after DQBUF + mark_decoded. Distinguishes
* "kernel didn't write" from "libva mis-reads" for Bug 4
* (H.264 partial-fill). Off by default; enable with
* LIBVA_V4L2_DUMP_CAPTURE=1. destination_data[] is valid here
* (surface_bind_slot populated it at BeginPicture).
*/
static const char *dump_env = NULL;
static bool dump_env_checked = false;
if (!dump_env_checked) {
dump_env = getenv("LIBVA_V4L2_DUMP_CAPTURE");
dump_env_checked = true;
}
if (dump_env != NULL && dump_env[0] == '1') {
unsigned int p;
char hexbuf[128];
request_log("γ-dump: surface_id=%u v4l2_index=%u planes=%u\n",
(unsigned int)surface_id,
surface_object->destination_index,
surface_object->destination_planes_count);
for (p = 0; p < surface_object->destination_planes_count; p++) {
const unsigned char *d = surface_object->destination_data[p];
size_t sz = surface_object->destination_sizes[p];
size_t scan_lim;
unsigned int nz = 0;
size_t i;
int pos;
if (d == NULL) {
request_log("γ-dump: plane[%u] NULL ptr (size=%zu)\n",
p, sz);
continue;
}
/*
* Phase 5 MIN-2: scan at least one Y-MB row
* (16 lines * bytesperline) for plane 0, else
* 1024 bytes for chroma plane.
*/
if (p == 0) {
size_t mbrow =
surface_object->destination_bytesperlines[0] * 16;
scan_lim = sz < mbrow ? sz : mbrow;
} else {
scan_lim = sz < 1024 ? sz : 1024;
}
for (i = 0; i < scan_lim; i++)
if (d[i] != 0)
nz++;
request_log("γ-dump: plane[%u] sz=%zu bpl=%u "
"scan=%zu non_zero=%u\n",
p, sz,
surface_object->destination_bytesperlines[p],
scan_lim, nz);
pos = 0;
for (i = 0; i < 32 && i < sz; i++)
pos += snprintf(hexbuf + pos,
sizeof(hexbuf) - pos,
"%02x ", d[i]);
request_log("γ-dump: plane[%u] head[0..32]: %s\n",
p, hexbuf);
if (sz >= 32) {
pos = 0;
for (i = 0; i < 32; i++)
pos += snprintf(hexbuf + pos,
sizeof(hexbuf) - pos,
"%02x ", d[sz - 32 + i]);
request_log("γ-dump: plane[%u] tail[%zu..%zu]: %s\n",
p, sz - 32, sz - 1, hexbuf);
}
}
}
}
surface_object->status = VASurfaceDisplaying;
status = VA_STATUS_SUCCESS;
goto complete;
error:
/*
* iter7: error recovery for the OUTPUT pool slot. If the surface
* acquired a slot in BeginPicture (source_data != NULL indicates
* an active borrow), reset the slot's request_fd via
* request_pool_force_release so the slot returns to the
* acquire-rotation. force_release tries REINIT first; falls back
* to close+alloc if REINIT fails; leaves the slot dead-busy if
* even alloc fails (other slots unaffected). Replaces iter6's
* accepted bounded leak.
*
* Reachable from: media_request_queue / wait_completion / REINIT
* failures. NOT reachable for OUTPUT-DQBUF failure (separate label
* `error_buffer_indeterminate` below) because in that case the
* V4L2 buffer is in an indeterminate kernel state and reusing the
* slot would EINVAL on the next QBUF.
*
* If the surface never acquired a slot (source_data == NULL),
* there is no slot to release; nothing to do.
*/
if (surface_object != NULL) {
if (surface_object->source_data != NULL) {
request_pool_force_release(&driver_data->output_pool,
surface_object->source_index);
surface_object->source_data = NULL;
surface_object->source_size = 0;
}
surface_object->request_fd = -1;
}
goto complete;
error_buffer_indeterminate:
/*
* iter7: OUTPUT DQBUF failed after a successful REINIT. The kernel
* V4L2 buffer is in an unknown state (possibly still QUEUED with
* pending decode result, possibly half-dequeued, possibly stuck
* in driver internals). The slot's request_fd has already been
* REINIT'd to a clean state, but reusing the slot for a new
* decode would QBUF on a buffer the kernel may still hold
* triggering exactly the iter6 race we eliminated for the happy
* path.
*
* Leave the slot dead-busy: don't release, don't force_release.
* Other slots are unaffected. If this fires repeatedly, the pool
* leaks slots until starvation, at which point acquire returns -1
* and BeginPicture cleanly propagates ALLOCATION_FAILED. This is
* a strictly safer failure mode than reusing an indeterminate
* V4L2 buffer.
*/
if (surface_object != NULL) {
surface_object->source_data = NULL;
surface_object->source_size = 0;
if (request_fd >= 0) {
close(request_fd);
surface_object->request_fd = -1;
}
@@ -537,7 +354,6 @@ VAStatus RequestQuerySurfaceAttributes(VADriverContextP context,
VASurfaceAttrib *attributes,
unsigned int *attributes_count)
{
struct request_data *driver_data = context->pDriverData;
VASurfaceAttrib *attributes_list;
unsigned int attributes_list_size = V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES *
@@ -545,6 +361,9 @@ VAStatus RequestQuerySurfaceAttributes(VADriverContextP context,
int memory_types;
unsigned int i = 0;
request_log("QuerySurfaceAttributes: config=%u attrs=%p count=%p\n",
config, attributes, attributes_count);
attributes_list = malloc(attributes_list_size);
memset(attributes_list, 0, attributes_list_size);
@@ -616,7 +435,6 @@ VAStatus RequestQuerySurfaceStatus(VADriverContextP context,
struct request_data *driver_data = context->pDriverData;
struct object_surface *surface_object;
surface_object = SURFACE(driver_data, surface_id);
if (surface_object == NULL)
return VA_STATUS_ERROR_INVALID_SURFACE;
@@ -670,16 +488,26 @@ VAStatus RequestExportSurfaceHandle(VADriverContextP context,
VAStatus status;
int rc;
video_format = driver_data->video_format;
if (video_format == NULL)
return VA_STATUS_ERROR_OPERATION_FAILED;
request_log("ExportSurfaceHandle: surface_id=%u mem_type=0x%x flags=0x%x\n",
surface_id, mem_type, flags);
if (mem_type != VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME_2)
video_format = driver_data->video_format;
if (video_format == NULL) {
request_log("ExportSurfaceHandle: video_format NULL\n");
return VA_STATUS_ERROR_OPERATION_FAILED;
}
if (mem_type != VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME_2) {
request_log("ExportSurfaceHandle: rejecting mem_type 0x%x (only DRM_PRIME_2)\n",
mem_type);
return VA_STATUS_ERROR_UNSUPPORTED_MEMORY_TYPE;
}
surface_object = SURFACE(driver_data, surface_id);
if (surface_object == NULL)
if (surface_object == NULL) {
request_log("ExportSurfaceHandle: invalid surface_id\n");
return VA_STATUS_ERROR_INVALID_SURFACE;
}
export_fds_count = surface_object->destination_buffers_count;
export_fds = malloc(export_fds_count * sizeof(*export_fds));
@@ -690,32 +518,15 @@ VAStatus RequestExportSurfaceHandle(VADriverContextP context,
surface_object->destination_index, O_RDONLY,
export_fds, export_fds_count);
if (rc < 0) {
request_log("ExportSurfaceHandle: EXPBUF idx=%u count=%u failed\n",
surface_object->destination_index, export_fds_count);
status = VA_STATUS_ERROR_OPERATION_FAILED;
goto error;
}
/*
* Iter2 Fix 3: pool now owns OUR copy of the EXPBUF'd fd. The
* consumer receives a dup'd / equivalent fd via the descriptor.
* Slot transitions DECODED EXPORTED; it will be force-recyclable
* by LRU when the pool is exhausted, but FREE slots are always
* preferred.
*/
if (surface_object->current_slot != NULL && export_fds_count > 0)
cap_pool_mark_exported(&driver_data->capture_pool,
surface_object->current_slot,
export_fds[0]);
planes_count = surface_object->destination_planes_count;
/* iter39: 10-bit session exports a DRM_FORMAT_NV15 buffer; advertise
* the matching fourcc so a PRIME consumer aware of NV15 (panfrost-
* Mesa et al.) can import correctly. PRIME consumers that only know
* NV12 / P010 should use the COPY (vaGetImage) path which unpacks
* NV15P010 in image.c::copy_surface_to_image. */
surface_descriptor->fourcc = driver_data->is_10bit
? VA_FOURCC('N', 'V', '1', '5')
: VA_FOURCC_NV12;
surface_descriptor->fourcc = VA_FOURCC_NV12;
surface_descriptor->width = surface_object->width;
surface_descriptor->height = surface_object->height;
surface_descriptor->num_objects = export_fds_count;
@@ -726,102 +537,27 @@ VAStatus RequestExportSurfaceHandle(VADriverContextP context,
for (i = 0; i < planes_count; i++)
size += surface_object->destination_sizes[i];
/*
* Iteration 2 Fix 2: choose drm_format_modifier conditionally on
* pitch alignment. Mesa's WSI / Panfrost compositor path rejects
* DRM_FORMAT_MOD_NONE (= LINEAR explicit) buffers whose pitch isn't
* GPU-aligned (typically 64+ bytes for Mali). For 1920-wide content
* the pitch is 1920 (64-aligned, fine); for 864-wide content the
* pitch is 864 (only 16-aligned), Mesa rejects with "WSI pitch not
* properly aligned" and Firefox falls back to SW.
*
* Setting DRM_FORMAT_MOD_INVALID tells the importer "modifier
* unknown, treat as implicit / texture-only" — Firefox's
* DMABufSurface.cpp:1920 explicitly omits modifier attribs from
* eglCreateImage when the value is MOD_INVALID, bypassing Mesa's
* scanout-alignment check. The buffer is then texture-imported
* (small perf cost) instead of WSI scanout-imported, which is
* the correct behavior for a buffer that doesn't meet scanout
* alignment requirements.
*
* We branch on pitch alignment to preserve LINEAR semantics for
* already-aligned content (avoids unnecessary perf cost on the
* common 1920-wide case).
*
* Sonnet Phase 5 review (iter2 question 4) endorsed this
* conditional approach over a universal MOD_INVALID change.
*/
for (i = 0; i < export_fds_count; i++) {
uint64_t modifier = video_format->drm_modifier;
unsigned int bytesperline =
surface_object->destination_bytesperlines[0];
if (bytesperline & 63) /* not 64-byte aligned */
modifier = DRM_FORMAT_MOD_INVALID;
surface_descriptor->objects[i].drm_format_modifier = modifier;
surface_descriptor->objects[i].drm_format_modifier =
video_format->drm_modifier;
surface_descriptor->objects[i].fd = export_fds[i];
surface_descriptor->objects[i].size = export_fds_count == 1 ?
size :
surface_object->destination_sizes[i];
}
/*
* Layer construction depends on the consumer's request flags
* (VA_EXPORT_SURFACE_*_LAYERS):
*
* COMPOSED_LAYERS (default, mpv): one layer carrying both
* Y and UV planes (drm_format=NV12, num_planes=2). Mesa
* imports as a single NV12 EGLImage.
*
* SEPARATE_LAYERS (Firefox 150 RDD): two layers, Y as a
* single-plane R8 layer, UV as a single-plane GR88 layer.
* Firefox's GetVAAPISurfaceDescriptor passes
* VA_EXPORT_SURFACE_SEPARATE_LAYERS so its DMABufSurfaceYUV
* import code can address Y and UV planes independently.
* Without this branch, Firefox parsed our COMPOSED layout
* as if it were SEPARATE, found bogus layer-1 data, and
* silently fell back to FFmpeg(FFVPX) software decode.
*
* The earlier path 0001 mplane port assumed a single COMPOSED
* shape fine for mpv but breaks any consumer requesting
* SEPARATE. Honor the flag.
*/
if ((flags & VA_EXPORT_SURFACE_SEPARATE_LAYERS) && planes_count == 2) {
surface_descriptor->num_layers = 2;
surface_descriptor->num_layers = 1;
/* Layer 0: Y plane as DRM_FORMAT_R8 (1 byte/pixel luma). */
surface_descriptor->layers[0].drm_format = DRM_FORMAT_R8;
surface_descriptor->layers[0].num_planes = 1;
surface_descriptor->layers[0].object_index[0] =
export_fds_count == 1 ? 0 : 0;
surface_descriptor->layers[0].offset[0] =
surface_object->destination_offsets[0];
surface_descriptor->layers[0].pitch[0] =
surface_object->destination_bytesperlines[0];
surface_descriptor->layers[0].drm_format = video_format->drm_format;
surface_descriptor->layers[0].num_planes = planes_count;
/* Layer 1: UV plane as DRM_FORMAT_GR88 (interleaved
* U+V, 2 bytes/pixel chroma at half resolution). */
surface_descriptor->layers[1].drm_format = DRM_FORMAT_GR88;
surface_descriptor->layers[1].num_planes = 1;
surface_descriptor->layers[1].object_index[0] =
export_fds_count == 1 ? 0 : 1;
surface_descriptor->layers[1].offset[0] =
surface_object->destination_offsets[1];
surface_descriptor->layers[1].pitch[0] =
surface_object->destination_bytesperlines[1];
} else {
/* COMPOSED_LAYERS / default: one layer with all planes. */
surface_descriptor->num_layers = 1;
surface_descriptor->layers[0].drm_format = video_format->drm_format;
surface_descriptor->layers[0].num_planes = planes_count;
for (i = 0; i < planes_count; i++) {
surface_descriptor->layers[0].object_index[i] =
export_fds_count == 1 ? 0 : i;
surface_descriptor->layers[0].offset[i] =
surface_object->destination_offsets[i];
surface_descriptor->layers[0].pitch[i] =
surface_object->destination_bytesperlines[i];
}
for (i = 0; i < planes_count; i++) {
surface_descriptor->layers[0].object_index[i] =
export_fds_count == 1 ? 0 : i;
surface_descriptor->layers[0].offset[i] =
surface_object->destination_offsets[i];
surface_descriptor->layers[0].pitch[i] =
surface_object->destination_bytesperlines[i];
}
status = VA_STATUS_SUCCESS;
+1 -86
View File
@@ -32,11 +32,6 @@
#include <va/va_backend.h>
#include "object_heap.h"
#include "cap_pool.h"
#include "h265.h"
struct request_data;
#define SURFACE(data, id) \
((struct object_surface *)object_heap_lookup(&(data)->surface_heap, id))
@@ -45,7 +40,7 @@ struct request_data;
struct object_surface {
struct object_base base;
VASurfaceStatus status;
VAStatus status;
int width;
int height;
@@ -53,26 +48,6 @@ struct object_surface {
void *source_data;
unsigned int source_size;
/*
* Iter2 Fix 3: destination_* fields below are now per-decode-cycle.
* They are populated from current_slot in RequestBeginPicture and
* remain valid through SyncSurface, ExportSurfaceHandle, and
* DeriveImage/copy_surface_to_image (vaapi-copy path). Subsequent
* BeginPicture for this surface releases the prior slot and
* acquires a new one.
*
* destination_planes_count, destination_sizes, destination_offsets,
* destination_bytesperlines are FORMAT-uniform across all CAPTURE
* buffers, so they're set once at CreateSurfaces2 time and stay.
*
* destination_index, destination_map[], destination_map_lengths,
* destination_map_offsets, destination_data[] are SLOT-specific
* and re-populated each BeginPicture from current_slot.
*
* destination_buffers_count is also format-uniform (V4L2 planes
* per buffer = 1 for single-plane MPLANE NV12).
*/
struct cap_pool_slot *current_slot; /* iter2 Fix 3 */
unsigned int destination_index;
void *destination_map[VIDEO_MAX_PLANES];
unsigned int destination_map_lengths[VIDEO_MAX_PLANES];
@@ -98,42 +73,15 @@ struct object_surface {
} mpeg2;
struct {
VAIQMatrixBufferH264 matrix;
bool matrix_set;
VAPictureParameterBufferH264 picture;
VASliceParameterBufferH264 slice;
} h264;
struct {
VAPictureParameterBufferHEVC picture;
VASliceParameterBufferHEVC slice;
VASliceParameterBufferHEVC slices[HEVC_MAX_SLICES_PER_FRAME];
unsigned int num_slices;
VAIQMatrixBufferHEVC iqmatrix;
bool iqmatrix_set;
} h265;
struct {
VAPictureParameterBufferVP8 picture;
VASliceParameterBufferVP8 slice;
VAIQMatrixBufferVP8 iqmatrix;
bool iqmatrix_set;
VAProbabilityDataBufferVP8 probability;
bool probability_set;
} vp8;
struct {
VADecPictureParameterBufferVP9 picture;
VASliceParameterBufferVP9 slice;
} vp9;
struct {
/*
* AV1 picture parameter buffer. Slice params are
* intentionally absent the daedalus daemon track
* (issue #11) consumes the slice OBU bytes directly
* from the OUTPUT bitstream and synthesises only the
* sequence-header OBU from V4L2_CID_STATELESS_AV1_
* SEQUENCE. No per-tile-group structOBU re-synthesis
* required from libva today.
*/
VADecPictureParameterBufferAV1 picture;
} av1;
} params;
int request_fd;
@@ -177,37 +125,4 @@ VAStatus RequestExportSurfaceHandle(VADriverContextP context,
VASurfaceID surface_id, uint32_t mem_type,
uint32_t flags, void *descriptor);
/*
* iter5b-β Commit D: populate a surface's format-uniform destination_*
* fields (planes_count, buffers_count, offsets, sizes, bytesperlines)
* from driver_data's cached CAPTURE-side geometry. Idempotent: skip
* if already filled (destination_planes_count != 0). Caller must
* ensure driver_data->fmt_valid is true (CreateContext has run).
*
* Called by:
* - context.c::RequestCreateContext after v4l2_get_format(CAPTURE)
* populates the cache; walks the surface_heap and fills every
* existing surface (covers surfaces created before CreateContext,
* including the ffmpeg vaapi-copy case where surfaces_count=0 is
* passed but surfaces exist in the heap from earlier
* CreateSurfaces2 calls).
* - surface.c::RequestCreateSurfaces2 after surface allocation,
* covering the case where CreateContext fired before this
* CreateSurfaces2 call (fmt cache is valid, fill immediately).
*/
void surface_fill_format_uniform(struct request_data *driver_data,
struct object_surface *surface_object);
/*
* Iter2 Fix 3: bind / unbind a CAPTURE-pool slot to an object_surface.
* Called from picture.c::RequestBeginPicture (acquire+bind) and
* surface.c::RequestDestroySurfaces (unbind). Mirrors slot's V4L2 index
* and mmap pointers into surface_object->destination_* so existing
* QBUF/DQBUF/EXPBUF code paths see no behavioral change.
*/
void surface_bind_slot(struct object_surface *surface_object,
struct cap_pool_slot *slot);
void surface_unbind_slot(struct request_data *driver_data,
struct object_surface *surface_object);
#endif
+16 -1
View File
@@ -27,7 +27,7 @@
.section .note.GNU-stack,"",%progbits /* mark stack as non-executable */
#endif
#ifdef __arm__
#ifndef __aarch64__
.text
.syntax unified
@@ -182,4 +182,19 @@ thumb_function tiled_deinterleave_to_planar
b 7b
end_function tiled_deinterleave_to_planar
#else /* __aarch64__ */
/* Fourier-local: aarch64 stub. The body of this file is ARMv7 NEON Thumb
* assembly used by sunxi-cedrus (Allwinner-tiled NV12 linear NV12) and
* is never reached on aarch64 hosts because video_format_is_linear()
* returns true for our NV12 entries. Provide a stub so the shared object
* still resolves the symbol; if ever actually called, return immediately. */
.text
.global tiled_to_planar
.type tiled_to_planar, %function
.hidden tiled_to_planar
tiled_to_planar:
ret
.size tiled_to_planar, .-tiled_to_planar
#endif
+14
View File
@@ -33,10 +33,24 @@
void request_log(const char *format, ...)
{
va_list arguments;
static FILE *trace_fp;
fprintf(stderr, "%s: ", V4L2_REQUEST_STR_VENDOR);
va_start(arguments, format);
vfprintf(stderr, format, arguments);
va_end(arguments);
/* Fourier-local: also tee to /tmp/libva-fourier.log so messages are
* visible from sandboxed GPU processes (Chromium etc.) where stderr
* is redirected. Append-mode, lazy-open. */
if (trace_fp == NULL)
trace_fp = fopen("/tmp/libva-fourier.log", "a");
if (trace_fp != NULL) {
fprintf(trace_fp, "%s: ", V4L2_REQUEST_STR_VENDOR);
va_start(arguments, format);
vfprintf(trace_fp, format, arguments);
va_end(arguments);
fflush(trace_fp);
}
}
+13 -185
View File
@@ -113,28 +113,6 @@ static void v4l2_setup_format(struct v4l2_format *format, unsigned int type,
}
}
static void v4l2_setup_format_sizeimage(struct v4l2_format *format,
unsigned int type,
unsigned int width, unsigned int height,
unsigned int pixelformat,
unsigned int sizeimage)
{
memset(format, 0, sizeof(*format));
format->type = type;
if (v4l2_type_is_mplane(type)) {
format->fmt.pix_mp.width = width;
format->fmt.pix_mp.height = height;
format->fmt.pix_mp.plane_fmt[0].sizeimage = sizeimage;
format->fmt.pix_mp.pixelformat = pixelformat;
} else {
format->fmt.pix.width = width;
format->fmt.pix.height = height;
format->fmt.pix.sizeimage = sizeimage;
format->fmt.pix.pixelformat = pixelformat;
}
}
bool v4l2_find_format(int video_fd, unsigned int type, unsigned int pixelformat)
{
struct v4l2_fmtdesc fmtdesc;
@@ -194,30 +172,6 @@ int v4l2_set_format(int video_fd, unsigned int type, unsigned int pixelformat,
return 0;
}
int v4l2_set_format_sizeimage(int video_fd, unsigned int type,
unsigned int pixelformat,
unsigned int width, unsigned int height,
unsigned int sizeimage)
{
struct v4l2_format format;
int rc;
if (sizeimage == 0)
return v4l2_set_format(video_fd, type, pixelformat, width, height);
v4l2_setup_format_sizeimage(&format, type, width, height, pixelformat,
sizeimage);
rc = ioctl(video_fd, VIDIOC_S_FMT, &format);
if (rc < 0) {
request_log("Unable to set format (sizeimage=%u) for type %d: %s\n",
sizeimage, type, strerror(errno));
return -1;
}
return 0;
}
int v4l2_get_format(int video_fd, unsigned int type, unsigned int *width,
unsigned int *height, unsigned int *bytesperline,
unsigned int *sizes, unsigned int *planes_count)
@@ -474,60 +428,23 @@ int v4l2_export_buffer(int video_fd, unsigned int type, unsigned int index,
return 0;
}
static int v4l2_ioctl_controls(int video_fd, int request_fd, unsigned long ioc,
struct v4l2_ext_control *control_array,
unsigned int num_controls)
{
struct v4l2_ext_controls controls;
memset(&controls, 0, sizeof(controls));
controls.controls = control_array;
controls.count = num_controls;
if (request_fd >= 0) {
controls.which = V4L2_CTRL_WHICH_REQUEST_VAL;
controls.request_fd = request_fd;
}
return ioctl(video_fd, ioc, &controls);
}
int v4l2_get_controls(int video_fd, int request_fd,
struct v4l2_ext_control *control_array,
unsigned int num_controls)
{
int rc;
rc = v4l2_ioctl_controls(video_fd, request_fd, VIDIOC_G_EXT_CTRLS,
control_array, num_controls);
if (rc < 0) {
/*
* EACCES on G_EXT_CTRLS for request fds is the normal case on
* this hantro rig the kernel doesn't allow readback through
* the request_fd. Caller (h264.c) tracks this with a one-time
* "V4L2 readback unavailable" announcement. Suppress per-call
* noise to keep the log signal-to-noise high.
*/
if (errno != EACCES)
request_log("Unable to get control(s): %s\n",
strerror(errno));
return -1;
}
return 0;
}
int v4l2_set_controls(int video_fd, int request_fd,
struct v4l2_ext_control *control_array,
unsigned int num_controls)
int v4l2_set_control(int video_fd, int request_fd, unsigned int id, void *data,
unsigned int size)
{
struct v4l2_ext_control control;
struct v4l2_ext_controls controls;
int rc;
memset(&control, 0, sizeof(control));
memset(&controls, 0, sizeof(controls));
controls.controls = control_array;
controls.count = num_controls;
control.id = id;
control.ptr = data;
control.size = size;
controls.controls = &control;
controls.count = 1;
if (request_fd >= 0) {
controls.which = V4L2_CTRL_WHICH_REQUEST_VAL;
controls.request_fd = request_fd;
@@ -535,42 +452,13 @@ int v4l2_set_controls(int video_fd, int request_fd,
rc = ioctl(video_fd, VIDIOC_S_EXT_CTRLS, &controls);
if (rc < 0) {
/* error_idx is the index of the first failing control;
* if it equals count, the ioctl itself failed (not a
* specific control payload). Useful for triaging
* which V4L2_CID_STATELESS_* the kernel rejected. */
if (controls.error_idx < num_controls)
request_log("Unable to set control(s): %s "
"(error_idx=%u/%u failing_ctrl_id=0x%x size=%u)\n",
strerror(errno),
controls.error_idx, controls.count,
control_array[controls.error_idx].id,
control_array[controls.error_idx].size);
else
request_log("Unable to set control(s): %s "
"(error_idx=%u/%u ioctl-level)\n",
strerror(errno),
controls.error_idx, controls.count);
request_log("Unable to set control: %s\n", strerror(errno));
return -1;
}
return 0;
}
int v4l2_set_control(int video_fd, int request_fd, unsigned int id, void *data,
unsigned int size)
{
struct v4l2_ext_control control;
memset(&control, 0, sizeof(control));
control.id = id;
control.ptr = data;
control.size = size;
return v4l2_set_controls(video_fd, request_fd, &control, 1);
}
int v4l2_set_stream(int video_fd, unsigned int type, bool enable)
{
enum v4l2_buf_type buf_type = type;
@@ -586,63 +474,3 @@ int v4l2_set_stream(int video_fd, unsigned int type, bool enable)
return 0;
}
int v4l2_query_ext_ctrl(int video_fd, unsigned int id,
struct v4l2_query_ext_ctrl *qec)
{
struct v4l2_query_ext_ctrl local;
struct v4l2_query_ext_ctrl *target = qec ? qec : &local;
int rc;
memset(target, 0, sizeof(*target));
target->id = id;
rc = ioctl(video_fd, VIDIOC_QUERY_EXT_CTRL, target);
if (rc < 0)
return -1;
return 0;
}
int v4l2_query_menu(int video_fd, unsigned int id, unsigned int index,
struct v4l2_querymenu *qm)
{
int rc;
if (qm == NULL)
return -1;
memset(qm, 0, sizeof(*qm));
qm->id = id;
qm->index = index;
rc = ioctl(video_fd, VIDIOC_QUERYMENU, qm);
if (rc < 0)
return -1;
return 0;
}
bool v4l2_ctrl_menu_has_value(int video_fd, unsigned int id,
unsigned int value)
{
struct v4l2_query_ext_ctrl qec;
struct v4l2_querymenu qm;
long long i;
if (v4l2_query_ext_ctrl(video_fd, id, &qec) < 0)
return false;
if (qec.type != V4L2_CTRL_TYPE_MENU &&
qec.type != V4L2_CTRL_TYPE_INTEGER_MENU)
return false;
for (i = qec.minimum; i <= qec.maximum; i += qec.step ? qec.step : 1) {
if (v4l2_query_menu(video_fd, id, (unsigned int)i, &qm) < 0)
continue;
if ((unsigned int)i == value)
return true;
}
return false;
}
-50
View File
@@ -36,17 +36,6 @@ bool v4l2_find_format(int video_fd, unsigned int type,
unsigned int pixelformat);
int v4l2_set_format(int video_fd, unsigned int type, unsigned int pixelformat,
unsigned int width, unsigned int height);
/*
* Same as v4l2_set_format but explicitly overrides the OUTPUT
* sizeimage hint. Pass sizeimage=0 to get the v4l2_set_format default
* (SOURCE_SIZE_MAX for OUTPUT, 0 for CAPTURE). Used by
* request_pool_resize on a mid-session bitstream-budget overrun to
* grow the OUTPUT pool slots past the SOURCE_SIZE_MAX floor.
*/
int v4l2_set_format_sizeimage(int video_fd, unsigned int type,
unsigned int pixelformat,
unsigned int width, unsigned int height,
unsigned int sizeimage);
int v4l2_get_format(int video_fd, unsigned int type, unsigned int *width,
unsigned int *height, unsigned int *bytesperline,
unsigned int *sizes, unsigned int *planes_count);
@@ -65,47 +54,8 @@ int v4l2_dequeue_buffer(int video_fd, int request_fd, unsigned int type,
int v4l2_export_buffer(int video_fd, unsigned int type, unsigned int index,
unsigned int flags, int *export_fds,
unsigned int export_fds_count);
int v4l2_get_controls(int video_fd, int request_fd,
struct v4l2_ext_control *controls,
unsigned int num_controls);
int v4l2_set_controls(int video_fd, int request_fd,
struct v4l2_ext_control *controls,
unsigned int num_controls);
int v4l2_set_control(int video_fd, int request_fd, unsigned int id, void *data,
unsigned int size);
int v4l2_set_stream(int video_fd, unsigned int type, bool enable);
/*
* Capability-probe helpers. These let calling code discover what the
* backing kernel driver supports rather than hardcoding assumptions
* about specific decoder hardware.
*/
/*
* Query the metadata of an extended control by CID. Fills *qec on
* success. Returns 0 if the control exists, -1 (errno=EINVAL) if the
* driver does not expose this CID. Pass qec=NULL to test existence
* only.
*/
struct v4l2_query_ext_ctrl;
int v4l2_query_ext_ctrl(int video_fd, unsigned int id,
struct v4l2_query_ext_ctrl *qec);
/*
* Query a single menu item of a menu/intmenu control at the given
* index. Fills *qm on success. Returns 0 if the menu item exists at
* this index, -1 otherwise.
*/
struct v4l2_querymenu;
int v4l2_query_menu(int video_fd, unsigned int id, unsigned int index,
struct v4l2_querymenu *qm);
/*
* Convenience: for a menu-type control, return true iff `value` is a
* valid menu entry (i.e. the driver accepts it). Walks all menu items
* up to the control's maximum to check.
*/
bool v4l2_ctrl_menu_has_value(int video_fd, unsigned int id,
unsigned int value);
#endif
+48 -39
View File
@@ -31,9 +31,8 @@
#include <drm_fourcc.h>
#include <linux/videodev2.h>
#include "nv12_col128.h" /* fallback V4L2_PIX_FMT_NV12_COL128 define */
#include "nv15.h" /* fallback V4L2_PIX_FMT_NV15 define */
#include "utils.h"
#include "v4l2.h"
#include "video.h"
static struct video_format formats[] = {
@@ -41,46 +40,22 @@ static struct video_format formats[] = {
.description = "NV12 YUV",
.v4l2_format = V4L2_PIX_FMT_NV12,
.v4l2_buffers_count = 1,
.v4l2_mplane = false,
.drm_format = DRM_FORMAT_NV12,
.drm_modifier = DRM_FORMAT_MOD_NONE,
.planes_count = 2,
.bpp = 16,
},
{
.description = "NV12 YUV (multi-plane)",
.v4l2_format = V4L2_PIX_FMT_NV12,
.v4l2_buffers_count = 1,
.v4l2_mplane = true,
.drm_format = DRM_FORMAT_NV12,
.drm_modifier = DRM_FORMAT_MOD_NONE,
.planes_count = 2,
.bpp = 16,
},
{
.description = "NV15 YUV (10-bit, rkvdec)",
.v4l2_format = V4L2_PIX_FMT_NV15,
.v4l2_buffers_count = 1,
.v4l2_mplane = true,
.drm_format = DRM_FORMAT_NV15,
.drm_modifier = DRM_FORMAT_MOD_NONE,
.planes_count = 2,
.bpp = 24,
},
{
/*
* iter40: Pi 5 / CM5 rpi-hevc-dec CAPTURE format. 8-bit NV12
* stored as 128-pixel-wide column tiles (SAND128 layout).
* Pi-specific; not in mainline drm_fourcc.h (uses NV12 + a
* BROADCOM_SAND128 modifier for DRM_PRIME). Our consumer path
* always detiles to linear NV12 in copy_surface_to_image, so
* we don't expose the SAND modifier downstream drm_format is
* still DRM_FORMAT_NV12 and drm_modifier MOD_NONE so the
* format-is-linear gate doesn't pull us into tiled_to_planar
* (Sunxi-specific). image.c branches on v4l2_format ==
* V4L2_PIX_FMT_NV12_COL128 to invoke the dedicated detile.
*/
.description = "NV12 SAND128 (8-bit, rpi-hevc-dec)",
.v4l2_format = V4L2_PIX_FMT_NV12_COL128,
.v4l2_buffers_count = 1,
.v4l2_mplane = true,
.drm_format = DRM_FORMAT_NV12,
.drm_modifier = DRM_FORMAT_MOD_NONE,
.planes_count = 2,
.bpp = 16,
},
// Code to handle this DRM_FORMAT is __arm__ only
#ifdef __arm__
{
.description = "Sunxi tiled NV12 YUV",
.v4l2_format = V4L2_PIX_FMT_SUNXI_TILED_NV12,
@@ -91,22 +66,56 @@ static struct video_format formats[] = {
.planes_count = 2,
.bpp = 16
},
#endif
};
static unsigned int formats_count = sizeof(formats) / sizeof(formats[0]);
struct video_format *video_format_find(unsigned int pixelformat)
struct video_format *video_format_find(unsigned int pixelformat, bool mplane)
{
unsigned int i;
for (i = 0; i < formats_count; i++)
if (formats[i].v4l2_format == pixelformat)
if (formats[i].v4l2_format == pixelformat &&
formats[i].v4l2_mplane == mplane)
return &formats[i];
return NULL;
}
/*
* Probe the V4L2 video device for a supported CAPTURE pixel format and
* return the matching video_format entry. Tries single-plane CAPTURE first
* (the original sunxi-cedrus path: SUNXI_TILED_NV12 or NV12), then falls
* back to multi-plane CAPTURE (Rockchip hantro / RK3588 VDPU381). Returns
* NULL if no match.
*
* Centralised here so RequestInit can populate driver_data->video_format
* before any vaCreateContext call. The original library set it lazily in
* RequestCreateSurfaces, but Chromium's vaapi_video_decoder may call
* vaCreateContext first (with surfaces=NULL, surfaces_count=0) and only
* create surfaces afterwards via vaCreateSurfaces2.
*/
struct video_format *video_format_probe(int video_fd)
{
struct video_format *format = NULL;
if (v4l2_find_format(video_fd, V4L2_BUF_TYPE_VIDEO_CAPTURE,
V4L2_PIX_FMT_SUNXI_TILED_NV12))
format = video_format_find(V4L2_PIX_FMT_SUNXI_TILED_NV12, false);
if (format == NULL &&
v4l2_find_format(video_fd, V4L2_BUF_TYPE_VIDEO_CAPTURE,
V4L2_PIX_FMT_NV12))
format = video_format_find(V4L2_PIX_FMT_NV12, false);
if (format == NULL &&
v4l2_find_format(video_fd, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
V4L2_PIX_FMT_NV12))
format = video_format_find(V4L2_PIX_FMT_NV12, true);
return format;
}
bool video_format_is_linear(struct video_format *format)
{
if (format == NULL)
+2 -1
View File
@@ -38,7 +38,8 @@ struct video_format {
unsigned int bpp;
};
struct video_format *video_format_find(unsigned int pixelformat);
struct video_format *video_format_find(unsigned int pixelformat, bool mplane);
struct video_format *video_format_probe(int video_fd);
bool video_format_is_linear(struct video_format *format);
#endif
-263
View File
@@ -1,263 +0,0 @@
/*
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
*
* fresnel-fourier iter3 Phase 6 commit B: VP8 codec dispatcher
* implemented against V4L2_CID_STATELESS_VP8_FRAME (kernel UAPI
* <linux/v4l2-controls.h>:1900-1958). Single batched control per
* frame, no init-time device-wide menus (VP8 has no DECODE_MODE/
* START_CODE confirmed by Phase 0 V4L2 inventory + Phase 3
* cross-validator strace).
*
* Reference: FFmpeg libavcodec/v4l2_request_vp8.c (kwiboo branch);
* FFmpeg libavcodec/vaapi_vp8.c (VAAPI source-side
* verification of the field semantics);
* kernel drivers/media/platform/verisilicon/
* hantro_g1_vp8_dec.c (RK3399 hardware reads
* first_part_header_bits + first_part_size to compute
* MB-data DMA offset).
*
* Phase 5 review amendments incorporated (see phase5_iter3_review.md):
* C1 first_part_header_bits = slice->macroblock_offset
* (NOT 0; kernel reads it unconditionally; same formula as
* v4l2_request_vp8.c uses internally)
* C2 first_part_size = slice->partition_size[0] +
* ((macroblock_offset + 7) / 8)
* (recover total partition size from VAAPI's post-parse
* remainder)
* C3 VAProbabilityBufferType (not VAProbabilityDataBufferType)
* C4 (int8_t) cast (not (s8); kernel-internal typedef not in
* userspace UAPI)
* S3 assert(probability_set) runtime guard (kernel has NO
* coeff_probs default fallback; consumer MUST send
* VAProbabilityBufferType per frame)
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*/
#include "vp8.h"
#include "context.h"
#include "request.h"
#include "surface.h"
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>
#include <linux/v4l2-controls.h>
#include "v4l2.h"
int vp8_set_controls(struct request_data *driver_data,
struct object_context *context_object,
struct object_surface *surface_object)
{
VAPictureParameterBufferVP8 *picture =
&surface_object->params.vp8.picture;
VASliceParameterBufferVP8 *slice =
&surface_object->params.vp8.slice;
VAIQMatrixBufferVP8 *iqmatrix =
&surface_object->params.vp8.iqmatrix;
VAProbabilityDataBufferVP8 *probability =
&surface_object->params.vp8.probability;
bool iqmatrix_set = surface_object->params.vp8.iqmatrix_set;
bool probability_set = surface_object->params.vp8.probability_set;
struct v4l2_ctrl_vp8_frame frame;
struct object_surface *last_ref;
struct object_surface *golden_ref;
struct object_surface *alt_ref;
int rc;
int i, j;
memset(&frame, 0, sizeof frame);
/* Phase 5 S3: kernel has no coeff_probs default fallback. The
* VAAPI consumer chain (FFmpeg's vaapi_vp8.c:146-148, used by
* mpv and ffmpeg-vaapi) always sends VAProbabilityBufferType
* per frame. Surface immediately if a future consumer doesn't. */
assert(probability_set);
/* Clause 3: frame geometry + per-frame scalars */
frame.width = picture->frame_width;
frame.height = picture->frame_height;
frame.horizontal_scale = 0; /* not exposed by VAAPI */
frame.vertical_scale = 0;
frame.version = picture->pic_fields.bits.version;
frame.prob_skip_false = picture->prob_skip_false;
frame.prob_intra = picture->prob_intra;
frame.prob_last = picture->prob_last;
frame.prob_gf = picture->prob_gf;
/* Phase 3 Q2: VAAPI counts include control partition;
* kernel counts DCT only off-by-one. */
frame.num_dct_parts = slice->num_of_partitions - 1;
/* Clause 4: DPB timestamp resolution (mirrors mpeg2.c pattern;
* NULL surface timestamp stays 0 from memset). */
last_ref = SURFACE(driver_data, picture->last_ref_frame);
golden_ref = SURFACE(driver_data, picture->golden_ref_frame);
alt_ref = SURFACE(driver_data, picture->alt_ref_frame);
if (last_ref != NULL)
frame.last_frame_ts =
v4l2_timeval_to_ns(&last_ref->timestamp);
if (golden_ref != NULL)
frame.golden_frame_ts =
v4l2_timeval_to_ns(&golden_ref->timestamp);
if (alt_ref != NULL)
frame.alt_frame_ts =
v4l2_timeval_to_ns(&alt_ref->timestamp);
/* Clause 5: loop filter mapping */
for (i = 0; i < 4; i++) {
frame.lf.ref_frm_delta[i] =
picture->loop_filter_deltas_ref_frame[i];
frame.lf.mb_mode_delta[i] =
picture->loop_filter_deltas_mode[i];
}
frame.lf.sharpness_level = picture->pic_fields.bits.sharpness_level;
frame.lf.level = picture->loop_filter_level[0];
if (picture->pic_fields.bits.loop_filter_adj_enable)
frame.lf.flags |= V4L2_VP8_LF_ADJ_ENABLE;
if (picture->pic_fields.bits.mode_ref_lf_delta_update)
frame.lf.flags |= V4L2_VP8_LF_DELTA_UPDATE;
if (picture->pic_fields.bits.filter_type)
frame.lf.flags |= V4L2_VP8_LF_FILTER_TYPE_SIMPLE;
/* Clause 6: quantization base + delta derivation */
if (iqmatrix_set) {
frame.quant.y_ac_qi =
iqmatrix->quantization_index[0][0];
frame.quant.y_dc_delta = (int8_t)
(iqmatrix->quantization_index[0][1] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_dc_delta = (int8_t)
(iqmatrix->quantization_index[0][2] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_ac_delta = (int8_t)
(iqmatrix->quantization_index[0][3] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_dc_delta = (int8_t)
(iqmatrix->quantization_index[0][4] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_ac_delta = (int8_t)
(iqmatrix->quantization_index[0][5] -
iqmatrix->quantization_index[0][0]);
}
if (picture->pic_fields.bits.segmentation_enabled && iqmatrix_set) {
for (i = 1; i < 4; i++)
frame.segment.quant_update[i] = (int8_t)
(iqmatrix->quantization_index[i][0] -
iqmatrix->quantization_index[0][0]);
}
/* Clause 7: segment fields */
for (i = 0; i < 3; i++)
frame.segment.segment_probs[i] =
picture->mb_segment_tree_probs[i];
if (picture->pic_fields.bits.segmentation_enabled)
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_ENABLED;
if (picture->pic_fields.bits.update_mb_segmentation_map)
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_UPDATE_MAP;
if (picture->pic_fields.bits.update_segment_feature_data)
frame.segment.flags |=
V4L2_VP8_SEGMENT_FLAG_UPDATE_FEATURE_DATA;
/* DELTA_VALUE_MODE: VAAPI doesn't expose abs_delta. FFmpeg sets
* unconditionally per !s->segmentation.absolute_vals (default).
* Kernel ignores when ENABLED bit clear (BBB case). */
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_DELTA_VALUE_MODE;
if (picture->pic_fields.bits.segmentation_enabled) {
for (i = 0; i < 4; i++)
frame.segment.lf_update[i] = (int8_t)
(picture->loop_filter_level[i] -
picture->loop_filter_level[0]);
}
/* Clause 8: entropy table mapping (3 VAAPI sources merged) */
for (i = 0; i < 4; i++)
frame.entropy.y_mode_probs[i] = picture->y_mode_probs[i];
for (i = 0; i < 3; i++)
frame.entropy.uv_mode_probs[i] = picture->uv_mode_probs[i];
for (i = 0; i < 2; i++)
for (j = 0; j < 19; j++)
frame.entropy.mv_probs[i][j] =
picture->mv_probs[i][j];
/* coeff_probs[4][8][3][11]: VAAPI layout matches kernel exactly;
* direct memcpy. Both vaapi_vp8.c:133-143 and v4l2_request_vp8.c:
* 141-153 apply identical coeff_bands_inverse reordering before
* writing VAAPI consumer has done the reordering for us. */
memcpy(frame.entropy.coeff_probs,
probability->dct_coeff_probs,
sizeof frame.entropy.coeff_probs);
/* Clause 9: coder state + first-partition fields */
frame.coder_state.range = picture->bool_coder_ctx.range;
frame.coder_state.value = picture->bool_coder_ctx.value;
frame.coder_state.bit_count = picture->bool_coder_ctx.count;
/* Phase 5 C1+C2: macroblock_offset IS first_part_header_bits by
* source identity; kernel hantro_g1_vp8_dec.c:260 reads it
* unconditionally to compute MB-data DMA offset. partition_size[0]
* is the post-parse REMAINDER; recover total via
* + ceil(macroblock_offset/8). */
frame.first_part_header_bits = slice->macroblock_offset;
frame.first_part_size =
slice->partition_size[0] +
((uint32_t)slice->macroblock_offset + 7) / 8;
for (i = 0; i < 8; i++)
frame.dct_part_sizes[i] = slice->partition_size[i + 1];
/* Clause 9: flags assembly (6 mainline-documented bits only;
* EXPERIMENTAL + bit 0x40 NOT replicated despite ffmpeg-v4l2-
* request-git setting them kernel hantro_vp8.c only inspects
* KEY_FRAME bit). VAAPI inverts: key_frame=0 means it IS a
* keyframe per VP8 spec. */
if (!picture->pic_fields.bits.key_frame)
frame.flags |= V4L2_VP8_FRAME_FLAG_KEY_FRAME;
frame.flags |= V4L2_VP8_FRAME_FLAG_SHOW_FRAME;
if (picture->pic_fields.bits.mb_no_coeff_skip)
frame.flags |= V4L2_VP8_FRAME_FLAG_MB_NO_SKIP_COEFF;
if (picture->pic_fields.bits.sign_bias_golden)
frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_GOLDEN;
if (picture->pic_fields.bits.sign_bias_alternate)
frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_ALT;
/* Clause 1+10: single-control batched submission */
struct v4l2_ext_control ctrls[1] = {
{
.id = V4L2_CID_STATELESS_VP8_FRAME,
.ptr = &frame,
.size = sizeof frame,
},
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 1);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
return 0;
}
-38
View File
@@ -1,38 +0,0 @@
/*
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
*
* fresnel-fourier iter3: VP8 codec dispatcher header.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*/
#ifndef _VP8_H_
#define _VP8_H_
struct object_context;
struct object_surface;
struct request_data;
int vp8_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface_object);
#endif
-754
View File
@@ -1,754 +0,0 @@
/*
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
*
* fresnel-fourier iter4 Phase 6 commit B: VP9 codec dispatcher
* implemented against V4L2_CID_STATELESS_VP9_FRAME (0xa40a2c) +
* V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (0xa40a2d). rkvdec on
* RK3399 mandatorily requires both controls per
* drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble:752.
*
* Reference: FFmpeg libavcodec/v4l2_request_vp9.c (kwiboo branch);
* FFmpeg libavcodec/vaapi_vp9.c (VAAPI source-side
* verification of field semantics);
* kernel drivers/media/v4l2-core/v4l2-vp9.c +
* drivers/staging/media/rkvdec/rkvdec-vp9.c.
*
* Phase 5 review amendments incorporated (see phase5_iter4_review.md):
* C1 frame.interpolation_filter = picture->mcomp_filter_type
* (NO XOR; vaapi_vp9.c:62 already applied the XOR before storing
* into VAAPI's mcomp_filter_type; double-XOR would swap
* EIGHTTAP and EIGHTTAP_SMOOTH for inter frames)
* C2 LF deltas persisted across frames in object_context.vp9_lf,
* init to VP9 spec defaults {1,0,-1,-1,0,0} on
* keyframe/intra_only/error_resilient, updated only when parsed
* lf_delta.update=1, ALWAYS copied to kernel control
* C3 vp9_fill_compressed_hdr takes out_reference_mode pointer
* (reference_mode lives in v4l2_ctrl_vp9_frame, NOT in
* _compressed_hdr; threaded via parameter)
*
* Suggested findings incorporated:
* S4 uv_mode memcpy from FFmpeg's fill_compressed_hdr is omitted
* rkvdec reads uv_mode from kernel's persistent
* probability_tables, NOT from prob_updates ctrl
* S5 lossless_flag semantics align with FFmpeg's s->s.h.lossless
* (LosslessFlag = base_qindex==0 && y_dc_delta_q==0 &&
* uv_dc_delta_q==0 && uv_ac_delta_q==0)
*/
#include "vp9.h"
#include "v4l2.h"
#include "utils.h"
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <linux/v4l2-controls.h>
#include <linux/videodev2.h>
/* Clause 3: compile-time size assertions. UAPI shifts must fail loudly. */
_Static_assert(sizeof(struct v4l2_ctrl_vp9_frame) == 168,
"v4l2_ctrl_vp9_frame size mismatch — kernel UAPI changed");
_Static_assert(sizeof(struct v4l2_ctrl_vp9_compressed_hdr) == 2040,
"v4l2_ctrl_vp9_compressed_hdr size mismatch — kernel UAPI changed");
/*
* VPX range coder minimal port of FFmpeg vpx_rac.[ch] + vp89_rac.h.
* Stateless static helpers; bitstream-only readers. ~80 LOC.
*/
static const uint8_t vpx_norm_shift[256] = {
8,7,6,6,5,5,5,5,4,4,4,4,4,4,4,4,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
};
struct vp9_rac {
int high;
int bits;
const uint8_t *buffer;
const uint8_t *end;
unsigned int code_word;
};
static int vp9_rac_init(struct vp9_rac *c, const uint8_t *buf, int size)
{
if (size < 1)
return -1;
c->high = 255;
c->bits = -16;
c->buffer = buf;
c->end = buf + size;
c->code_word = ((unsigned)buf[0] << 16) | ((unsigned)buf[1] << 8) | buf[2];
c->buffer += 3;
return 0;
}
static unsigned vp9_rac_renorm(struct vp9_rac *c)
{
int shift = vpx_norm_shift[c->high];
int bits = c->bits;
unsigned code_word = c->code_word;
c->high <<= shift;
code_word <<= shift;
bits += shift;
if (bits >= 0 && c->buffer + 1 < c->end) {
code_word |= (((unsigned)c->buffer[0] << 8) | c->buffer[1]) << bits;
c->buffer += 2;
bits -= 16;
}
c->bits = bits;
return code_word;
}
static int vp9_rac_get_prob(struct vp9_rac *c, uint8_t prob)
{
unsigned code_word = vp9_rac_renorm(c);
unsigned low = 1 + (((c->high - 1) * prob) >> 8);
unsigned low_shift = low << 16;
int bit = code_word >= low_shift;
c->high = bit ? c->high - low : low;
c->code_word = bit ? code_word - low_shift : code_word;
return bit;
}
static int vp9_rac_get_branchy(struct vp9_rac *c, int prob)
{
return vp9_rac_get_prob(c, (uint8_t)prob);
}
static int vp9_rac_bit(struct vp9_rac *c)
{
return vp9_rac_get_prob(c, 128);
}
static int vp9_rac_uint(struct vp9_rac *c, int bits)
{
int value = 0;
while (bits--)
value = (value << 1) | vp9_rac_bit(c);
return value;
}
/* inv_map_table: VP9 differential probability update table.
* Verbatim copy from FFmpeg v4l2_request_vp9.c:44-64. */
static const uint8_t inv_map_table[255] = {
7, 20, 33, 46, 59, 72, 85, 98, 111, 124, 137, 150, 163, 176,
189, 202, 215, 228, 241, 254, 1, 2, 3, 4, 5, 6, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 99, 100,
101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 112, 113, 114, 115,
116, 117, 118, 119, 120, 121, 122, 123, 125, 126, 127, 128, 129, 130,
131, 132, 133, 134, 135, 136, 138, 139, 140, 141, 142, 143, 144, 145,
146, 147, 148, 149, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160,
161, 162, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 190, 191,
192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 203, 204, 205, 206,
207, 208, 209, 210, 211, 212, 213, 214, 216, 217, 218, 219, 220, 221,
222, 223, 224, 225, 226, 227, 229, 230, 231, 232, 233, 234, 235, 236,
237, 238, 239, 240, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251,
252, 253, 253,
};
static int read_prob_delta(struct vp9_rac *c)
{
int d;
if (!vp9_rac_bit(c)) {
d = vp9_rac_uint(c, 4);
} else if (!vp9_rac_bit(c)) {
d = vp9_rac_uint(c, 4) + 16;
} else if (!vp9_rac_bit(c)) {
d = vp9_rac_uint(c, 5) + 32;
} else {
d = vp9_rac_uint(c, 7);
if (d >= 65)
d = (d << 1) - 65 + vp9_rac_bit(c);
d += 64;
}
return inv_map_table[d];
}
/*
* Clause 6: minimal big-endian bit reader over the uncompressed header
* for the fields VAAPI doesn't expose: lf_delta_enabled / lf_delta_update /
* lf_ref_deltas / lf_mode_deltas / base_q_idx / delta_q_y_dc / delta_q_uv_dc /
* delta_q_uv_ac.
*
* Walks: frame_marker(2) + profile(2 or 3) + show_existing_frame(1) +
* frame_type(1) + show_frame(1) + error_resilient(1) +
* if keyframe: sync_code(24) + color_config + frame_size + render_size
* else: intra_only(1 if !show_frame) + reset(2) +
* if intra_only: sync_code(24) + (if profile>0: color_config) +
* refresh_flags(8) + frame_size + render_size
* else: refresh_flags(8) + 3*(ref_idx(3)+sign_bias(1)) +
* frame_size_with_refs + allow_hpmv(1) + interp_filter(2 or 3)
* loop_filter_params + quantization_params
*
* Only profile-0 paths are exercised for BBB; non-profile-0 fields read
* their bits but do not write them back. Keep targeted, not general.
*/
struct uh_reader {
const uint8_t *buf;
size_t size;
size_t bit_pos;
};
static unsigned uh_read_bits(struct uh_reader *r, int n)
{
unsigned v = 0;
int i;
for (i = 0; i < n; i++) {
size_t byte = r->bit_pos >> 3;
int bit = 7 - (r->bit_pos & 7);
if (byte >= r->size)
return 0;
v = (v << 1) | ((r->buf[byte] >> bit) & 1);
r->bit_pos++;
}
return v;
}
/* Phase 7 fix: VP9 spec s(N) is N magnitude bits + 1 sign bit (total N+1).
* Previous uh_read_signed_6 read 4+1=5 bits instead of 6+1=7; bit drift of
* 2 bits per ref_delta accumulated across the lf_delta updates and shifted
* base_q_idx by 8 bits, producing 0x41 (frame 1 keyframe) instead of 0x2e.
* Phase 3 anchor cross-check confirmed the corrected 7-bit read places
* base_q_idx at bit 111 with value 0x2e=46. */
static int uh_read_sbits(struct uh_reader *r, int n)
{
int v = (int)uh_read_bits(r, n);
int sign = (int)uh_read_bits(r, 1);
return sign ? -v : v;
}
static int uh_read_delta_q(struct uh_reader *r)
{
/* read_delta_q(): if delta_coded bit set, read s(4) = 4 mag + 1 sign */
if (uh_read_bits(r, 1))
return uh_read_sbits(r, 4);
return 0;
}
static void vp9_parse_uncompressed_header_lf_quant(
const uint8_t *data, uint32_t size,
struct v4l2_ctrl_vp9_frame *frame,
int8_t persistent_ref_deltas[4],
int8_t persistent_mode_deltas[2],
bool *out_keyframe_or_intraonly,
bool *out_lf_delta_updated)
{
struct uh_reader r = { .buf = data, .size = size, .bit_pos = 0 };
bool keyframe, intra_only = false, show_frame, error_resilient;
int profile;
int i;
*out_lf_delta_updated = false;
uh_read_bits(&r, 2); /* frame_marker */
{
int p_lo = uh_read_bits(&r, 1);
int p_hi = uh_read_bits(&r, 1);
profile = p_lo + (p_hi << 1);
if (profile == 3)
uh_read_bits(&r, 1);
}
if (uh_read_bits(&r, 1)) /* show_existing_frame */
return; /* no LF/quant in the bitstream */
keyframe = !uh_read_bits(&r, 1);
show_frame = uh_read_bits(&r, 1);
error_resilient = uh_read_bits(&r, 1);
if (keyframe) {
uh_read_bits(&r, 24); /* sync_code */
/* color_config (profile=0): just bt709 + range bit */
if (profile >= 2)
uh_read_bits(&r, 1); /* ten_or_twelve_bit */
uh_read_bits(&r, 3); /* color_space */
if (1) { /* color_space != CS_RGB */
uh_read_bits(&r, 1); /* color_range */
if (profile == 1 || profile == 3) {
uh_read_bits(&r, 2); /* subsampling */
uh_read_bits(&r, 1); /* reserved */
}
} else if (profile == 1 || profile == 3) {
uh_read_bits(&r, 1); /* reserved */
}
uh_read_bits(&r, 16); /* frame_width_minus_1 */
uh_read_bits(&r, 16); /* frame_height_minus_1 */
if (uh_read_bits(&r, 1)) {
uh_read_bits(&r, 16);
uh_read_bits(&r, 16);
}
} else {
intra_only = show_frame ? 0 : uh_read_bits(&r, 1);
if (!error_resilient)
uh_read_bits(&r, 2); /* reset_frame_context */
if (intra_only) {
uh_read_bits(&r, 24); /* sync_code */
if (profile > 0) {
if (profile >= 2)
uh_read_bits(&r, 1);
uh_read_bits(&r, 3); /* color_space */
uh_read_bits(&r, 1); /* color_range */
if (profile == 1 || profile == 3) {
uh_read_bits(&r, 2);
uh_read_bits(&r, 1);
}
}
uh_read_bits(&r, 8); /* refresh_frame_flags */
uh_read_bits(&r, 16);
uh_read_bits(&r, 16);
if (uh_read_bits(&r, 1)) {
uh_read_bits(&r, 16);
uh_read_bits(&r, 16);
}
} else {
uh_read_bits(&r, 8); /* refresh_frame_flags */
for (i = 0; i < 3; i++) {
uh_read_bits(&r, 3);
uh_read_bits(&r, 1);
}
/* frame_size_with_refs: up to 3 found_ref bits, then
* if no found_ref: explicit width+height; else ref-pick.
* Then render_size. Just walk it. */
{
bool found = false;
for (i = 0; i < 3; i++) {
if (uh_read_bits(&r, 1))
found = true;
}
if (!found) {
uh_read_bits(&r, 16);
uh_read_bits(&r, 16);
}
if (uh_read_bits(&r, 1)) {
uh_read_bits(&r, 16);
uh_read_bits(&r, 16);
}
}
uh_read_bits(&r, 1); /* allow_hpmv */
if (uh_read_bits(&r, 1)) /* interp_filter switchable */
;
else
uh_read_bits(&r, 2); /* interp_filter literal */
}
}
*out_keyframe_or_intraonly = keyframe || intra_only;
uh_read_bits(&r, 1); /* refresh_frame_context */
uh_read_bits(&r, 1); /* frame_parallel_decoding_mode */
if (!error_resilient || keyframe || intra_only)
uh_read_bits(&r, 2); /* frame_context_idx + reset_frame_context */
/* loop_filter_params */
uh_read_bits(&r, 6); /* filter_level (already in VAAPI) */
uh_read_bits(&r, 3); /* sharpness (already in VAAPI) */
if (uh_read_bits(&r, 1)) { /* lf_delta.enabled */
frame->lf.flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_ENABLED;
if (uh_read_bits(&r, 1)) { /* lf_delta.updated */
frame->lf.flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_UPDATE;
*out_lf_delta_updated = true;
for (i = 0; i < 4; i++) {
if (uh_read_bits(&r, 1))
persistent_ref_deltas[i] =
(int8_t)uh_read_sbits(&r, 6);
}
for (i = 0; i < 2; i++) {
if (uh_read_bits(&r, 1))
persistent_mode_deltas[i] =
(int8_t)uh_read_sbits(&r, 6);
}
}
}
/* quantization_params */
frame->quant.base_q_idx = (uint8_t)uh_read_bits(&r, 8);
frame->quant.delta_q_y_dc = (int8_t)uh_read_delta_q(&r);
frame->quant.delta_q_uv_dc = (int8_t)uh_read_delta_q(&r);
frame->quant.delta_q_uv_ac = (int8_t)uh_read_delta_q(&r);
}
/*
* Clause 9: compressed-header parser port of FFmpeg
* v4l2_request_vp9.c:99-261::fill_compressed_hdr.
*
* Phase 5 C3: out_reference_mode threaded via out-param. Callers
* derive `allowcompinter` from VAAPI sign-bias bits and pass it.
*/
#define V4L2_VP9_TX_MODE_ONLY_4X4_LOCAL 0
#define V4L2_VP9_TX_MODE_ALLOW_32X32_LOCAL 3
#define V4L2_VP9_TX_MODE_SELECT_LOCAL 4
static void vp9_fill_compressed_hdr(
struct v4l2_ctrl_vp9_compressed_hdr *ctrl,
const uint8_t *buffer, uint32_t size,
uint8_t lossless_flag,
bool keyframe_or_intraonly,
bool allowcompinter,
bool highprecision_mvs,
int interp_filter_switchable,
uint8_t *out_reference_mode)
{
struct vp9_rac c;
int comppredmode = 0;
int i, j, k, l, m, n;
*out_reference_mode = 0;
if (vp9_rac_init(&c, buffer, size) < 0)
return;
if (vp9_rac_get_branchy(&c, 128)) /* marker bit */
return;
if (lossless_flag) {
ctrl->tx_mode = V4L2_VP9_TX_MODE_ONLY_4X4_LOCAL;
} else {
ctrl->tx_mode = (uint8_t)vp9_rac_uint(&c, 2);
if (ctrl->tx_mode == V4L2_VP9_TX_MODE_ALLOW_32X32_LOCAL)
ctrl->tx_mode = (uint8_t)(ctrl->tx_mode + vp9_rac_bit(&c));
if (ctrl->tx_mode == V4L2_VP9_TX_MODE_SELECT_LOCAL) {
for (i = 0; i < 2; i++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->tx8[i][0] = (uint8_t)read_prob_delta(&c);
for (i = 0; i < 2; i++)
for (j = 0; j < 2; j++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->tx16[i][j] = (uint8_t)read_prob_delta(&c);
for (i = 0; i < 2; i++)
for (j = 0; j < 3; j++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->tx32[i][j] = (uint8_t)read_prob_delta(&c);
}
}
for (i = 0; i < 4; i++) {
if (vp9_rac_bit(&c)) {
for (j = 0; j < 2; j++)
for (k = 0; k < 2; k++)
for (l = 0; l < 6; l++)
for (m = 0; m < 6; m++) {
if (m >= 3 && l == 0)
break;
for (n = 0; n < 3; n++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->coef[i][j][k][l][m][n] =
(uint8_t)read_prob_delta(&c);
}
}
if (ctrl->tx_mode == i)
break;
}
for (i = 0; i < 3; i++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->skip[i] = (uint8_t)read_prob_delta(&c);
if (!keyframe_or_intraonly) {
for (i = 0; i < 7; i++)
for (j = 0; j < 3; j++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->inter_mode[i][j] = (uint8_t)read_prob_delta(&c);
if (interp_filter_switchable)
for (i = 0; i < 4; i++)
for (j = 0; j < 2; j++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->interp_filter[i][j] =
(uint8_t)read_prob_delta(&c);
for (i = 0; i < 4; i++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->is_inter[i] = (uint8_t)read_prob_delta(&c);
if (allowcompinter) {
comppredmode = vp9_rac_bit(&c);
if (comppredmode)
comppredmode += vp9_rac_bit(&c);
if (comppredmode == 2) /* PRED_SWITCHABLE */
for (i = 0; i < 5; i++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->comp_mode[i] = (uint8_t)read_prob_delta(&c);
} else {
comppredmode = 0; /* PRED_SINGLEREF */
}
if (comppredmode != 1) { /* != PRED_COMPREF */
for (i = 0; i < 5; i++) {
if (vp9_rac_get_branchy(&c, 252))
ctrl->single_ref[i][0] = (uint8_t)read_prob_delta(&c);
if (vp9_rac_get_branchy(&c, 252))
ctrl->single_ref[i][1] = (uint8_t)read_prob_delta(&c);
}
}
if (comppredmode != 0) { /* != PRED_SINGLEREF */
for (i = 0; i < 5; i++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->comp_ref[i] = (uint8_t)read_prob_delta(&c);
}
for (i = 0; i < 4; i++)
for (j = 0; j < 9; j++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->y_mode[i][j] = (uint8_t)read_prob_delta(&c);
for (i = 0; i < 4; i++)
for (j = 0; j < 4; j++)
for (k = 0; k < 3; k++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->partition[(i * 4) + j][k] =
(uint8_t)read_prob_delta(&c);
for (i = 0; i < 3; i++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->mv.joint[i] = (uint8_t)((vp9_rac_uint(&c, 7) << 1) | 1);
for (i = 0; i < 2; i++) {
if (vp9_rac_get_branchy(&c, 252))
ctrl->mv.sign[i] = (uint8_t)((vp9_rac_uint(&c, 7) << 1) | 1);
for (j = 0; j < 10; j++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->mv.classes[i][j] = (uint8_t)((vp9_rac_uint(&c, 7) << 1) | 1);
if (vp9_rac_get_branchy(&c, 252))
ctrl->mv.class0_bit[i] = (uint8_t)((vp9_rac_uint(&c, 7) << 1) | 1);
for (j = 0; j < 10; j++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->mv.bits[i][j] = (uint8_t)((vp9_rac_uint(&c, 7) << 1) | 1);
}
for (i = 0; i < 2; i++) {
for (j = 0; j < 2; j++)
for (k = 0; k < 3; k++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->mv.class0_fr[i][j][k] =
(uint8_t)((vp9_rac_uint(&c, 7) << 1) | 1);
for (j = 0; j < 3; j++)
if (vp9_rac_get_branchy(&c, 252))
ctrl->mv.fr[i][j] = (uint8_t)((vp9_rac_uint(&c, 7) << 1) | 1);
}
if (highprecision_mvs) {
for (i = 0; i < 2; i++) {
if (vp9_rac_get_branchy(&c, 252))
ctrl->mv.class0_hp[i] = (uint8_t)((vp9_rac_uint(&c, 7) << 1) | 1);
if (vp9_rac_get_branchy(&c, 252))
ctrl->mv.hp[i] = (uint8_t)((vp9_rac_uint(&c, 7) << 1) | 1);
}
}
}
*out_reference_mode = (uint8_t)comppredmode;
}
/*
* Clause 1+2+4+5+7+10+11+12: orchestrate VP9 control submission.
* 2 batched controls per frame: VP9_FRAME + VP9_COMPRESSED_HDR.
*/
int vp9_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface_object)
{
VADecPictureParameterBufferVP9 *picture =
&surface_object->params.vp9.picture;
VASliceParameterBufferVP9 *slice =
&surface_object->params.vp9.slice;
struct v4l2_ctrl_vp9_frame frame;
struct v4l2_ctrl_vp9_compressed_hdr compressed_hdr;
struct v4l2_ext_control ctrls[2];
int rc, i;
bool keyframe = !picture->pic_fields.bits.frame_type;
bool intra_only = picture->pic_fields.bits.intra_only;
bool error_resilient = picture->pic_fields.bits.error_resilient_mode;
bool allowcompinter;
bool keyframe_or_intraonly_parsed = false;
bool lf_delta_updated = false;
uint8_t parsed_reference_mode = 0;
memset(&frame, 0, sizeof frame);
memset(&compressed_hdr, 0, sizeof compressed_hdr);
/* Clause 4: frame geometry + per-frame scalars */
frame.frame_width_minus_1 = (uint16_t)(picture->frame_width - 1);
frame.frame_height_minus_1 = (uint16_t)(picture->frame_height - 1);
frame.render_width_minus_1 = frame.frame_width_minus_1;
frame.render_height_minus_1 = frame.frame_height_minus_1;
frame.profile = picture->profile;
frame.bit_depth = picture->bit_depth;
frame.tile_cols_log2 = picture->log2_tile_columns;
frame.tile_rows_log2 = picture->log2_tile_rows;
frame.frame_context_idx = picture->pic_fields.bits.frame_context_idx;
frame.lf.level = picture->filter_level;
frame.lf.sharpness = picture->sharpness_level;
frame.uncompressed_header_size = picture->frame_header_length_in_bytes;
frame.compressed_header_size = picture->first_partition_size;
/* Clause 5: DPB timestamp resolution */
{
VASurfaceID last_id = picture->reference_frames[picture->pic_fields.bits.last_ref_frame];
VASurfaceID gold_id = picture->reference_frames[picture->pic_fields.bits.golden_ref_frame];
VASurfaceID alt_id = picture->reference_frames[picture->pic_fields.bits.alt_ref_frame];
struct object_surface *last_ref =
(last_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, last_id) : NULL;
struct object_surface *gold_ref =
(gold_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, gold_id) : NULL;
struct object_surface *alt_ref =
(alt_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, alt_id) : NULL;
if (last_ref) frame.last_frame_ts = v4l2_timeval_to_ns(&last_ref->timestamp);
if (gold_ref) frame.golden_frame_ts = v4l2_timeval_to_ns(&gold_ref->timestamp);
if (alt_ref) frame.alt_frame_ts = v4l2_timeval_to_ns(&alt_ref->timestamp);
}
if (picture->pic_fields.bits.last_ref_frame_sign_bias)
frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_LAST;
if (picture->pic_fields.bits.golden_ref_frame_sign_bias)
frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_GOLDEN;
if (picture->pic_fields.bits.alt_ref_frame_sign_bias)
frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_ALT;
allowcompinter = !(
picture->pic_fields.bits.last_ref_frame_sign_bias ==
picture->pic_fields.bits.golden_ref_frame_sign_bias &&
picture->pic_fields.bits.golden_ref_frame_sign_bias ==
picture->pic_fields.bits.alt_ref_frame_sign_bias);
/* Clause 6: persistent LF delta state — Phase 5 C2 */
if (!context->vp9_lf.initialized || keyframe || intra_only || error_resilient) {
context->vp9_lf.ref_deltas[0] = 1;
context->vp9_lf.ref_deltas[1] = 0;
context->vp9_lf.ref_deltas[2] = -1;
context->vp9_lf.ref_deltas[3] = -1;
context->vp9_lf.mode_deltas[0] = 0;
context->vp9_lf.mode_deltas[1] = 0;
context->vp9_lf.initialized = true;
}
vp9_parse_uncompressed_header_lf_quant(
surface_object->source_data,
surface_object->source_size,
&frame,
context->vp9_lf.ref_deltas,
context->vp9_lf.mode_deltas,
&keyframe_or_intraonly_parsed,
&lf_delta_updated);
(void)lf_delta_updated;
for (i = 0; i < 4; i++)
frame.lf.ref_deltas[i] = context->vp9_lf.ref_deltas[i];
for (i = 0; i < 2; i++)
frame.lf.mode_deltas[i] = context->vp9_lf.mode_deltas[i];
/* Clause 7: segmentation mapping */
for (i = 0; i < 7; i++)
frame.seg.tree_probs[i] = picture->mb_segment_tree_probs[i];
for (i = 0; i < 3; i++)
frame.seg.pred_probs[i] = picture->segment_pred_probs[i];
if (picture->pic_fields.bits.segmentation_enabled)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_ENABLED;
if (picture->pic_fields.bits.segmentation_update_map)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_UPDATE_MAP;
if (picture->pic_fields.bits.segmentation_temporal_update)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_TEMPORAL_UPDATE;
for (i = 0; i < 8; i++) {
if (slice->seg_param[i].segment_flags.fields.segment_reference_enabled) {
frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_REF_FRAME;
frame.seg.feature_data[i][V4L2_VP9_SEG_LVL_REF_FRAME] =
(int16_t)slice->seg_param[i].segment_flags.fields.segment_reference;
}
if (slice->seg_param[i].segment_flags.fields.segment_reference_skipped)
frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_SKIP;
}
/* Clause 10: frame flags + reference_mode + interpolation_filter */
if (keyframe)
frame.flags |= V4L2_VP9_FRAME_FLAG_KEY_FRAME;
if (picture->pic_fields.bits.show_frame)
frame.flags |= V4L2_VP9_FRAME_FLAG_SHOW_FRAME;
if (error_resilient)
frame.flags |= V4L2_VP9_FRAME_FLAG_ERROR_RESILIENT;
if (intra_only)
frame.flags |= V4L2_VP9_FRAME_FLAG_INTRA_ONLY;
if (picture->pic_fields.bits.allow_high_precision_mv)
frame.flags |= V4L2_VP9_FRAME_FLAG_ALLOW_HIGH_PREC_MV;
if (picture->pic_fields.bits.refresh_frame_context)
frame.flags |= V4L2_VP9_FRAME_FLAG_REFRESH_FRAME_CTX;
if (picture->pic_fields.bits.frame_parallel_decoding_mode)
frame.flags |= V4L2_VP9_FRAME_FLAG_PARALLEL_DEC_MODE;
if (picture->pic_fields.bits.subsampling_x)
frame.flags |= V4L2_VP9_FRAME_FLAG_X_SUBSAMPLING;
if (picture->pic_fields.bits.subsampling_y)
frame.flags |= V4L2_VP9_FRAME_FLAG_Y_SUBSAMPLING;
/* Phase 5 C1: NO XOR. VAAPI's mcomp_filter_type is already post-XOR. */
frame.interpolation_filter = picture->pic_fields.bits.mcomp_filter_type;
/* reset_frame_context: FFmpeg's (resetctx > 0 ? resetctx - 1 : 0) */
frame.reset_frame_context =
picture->pic_fields.bits.reset_frame_context > 0
? (uint8_t)(picture->pic_fields.bits.reset_frame_context - 1)
: 0;
/* Clause 9: compressed-header parser fills both compressed_hdr and
* out_reference_mode. allowcompinter derived from sign biases above. */
{
int interp_switchable = (frame.interpolation_filter == V4L2_VP9_INTERP_FILTER_SWITCHABLE);
vp9_fill_compressed_hdr(
&compressed_hdr,
surface_object->source_data + frame.uncompressed_header_size,
frame.compressed_header_size,
picture->pic_fields.bits.lossless_flag,
keyframe || intra_only,
allowcompinter,
picture->pic_fields.bits.allow_high_precision_mv,
interp_switchable,
&parsed_reference_mode);
}
frame.reference_mode = parsed_reference_mode;
/* Clause 11: 2-control batched submission */
memset(ctrls, 0, sizeof ctrls);
ctrls[0].id = V4L2_CID_STATELESS_VP9_FRAME;
ctrls[0].ptr = &frame;
ctrls[0].size = sizeof frame;
ctrls[1].id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR;
ctrls[1].ptr = &compressed_hdr;
ctrls[1].size = sizeof compressed_hdr;
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 2);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
return VA_STATUS_SUCCESS;
}
-38
View File
@@ -1,38 +0,0 @@
/*
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
*
* fresnel-fourier iter4 Phase 6 commit B: VP9 codec dispatcher header.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef _VP9_H_
#define _VP9_H_
#include "context.h"
#include "request.h"
#include "surface.h"
int vp9_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface);
#endif /* _VP9_H_ */
-167
View File
@@ -1,167 +0,0 @@
/*
* cap_pool_probe_pattern.c synthetic regression test for the
* iter5 sonnet C4 / iter6 candidate A "cap_pool resolution-change race."
*
* Exercises the surface-allocation pattern that originally tripped
* REQBUFS-EBUSY on the iter5-end driver: vaCreateSurfaces at one
* resolution, then vaDestroySurfaces, then vaCreateSurfaces at a
* different resolution. iter6's REINIT discipline + cap_pool's
* REQBUFS(0)-on-CAPTURE-and-OUTPUT during S_FMT-on-resolution-change
* (CreateSurfaces2 in surface.c) closes this race; this test anchors
* that fact with a deterministic repro.
*
* Build:
* gcc -O2 -Wall -Wextra -o cap_pool_probe_pattern \
* cap_pool_probe_pattern.c \
* $(pkg-config --cflags --libs libva libva-drm)
*
* Run:
* LIBVA_DRIVER_NAME=v4l2_request \
* LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
* LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
* ./cap_pool_probe_pattern
*
* Pass criterion (on iter6 driver and later):
* - Exit code 0
* - No "REQBUFS" / "EBUSY" / "Unable to request buffers" /
* "Unable to set format" lines on the v4l2-request driver's stderr
* - vainfo or visual inspection confirms the test program reached
* the "PASS" line on stdout
*
* Fail behavior pre-iter5: vaCreateSurfaces at the second resolution
* would emit REQBUFS-EBUSY because OUTPUT/CAPTURE buffers from the
* first allocation hadn't been torn down before S_FMT was attempted
* on the new resolution. iter5's CreateSurfaces2 added the dual
* REQBUFS(0) drain; iter6's REINIT keeps the OUTPUT pool's request_fd
* lifecycle clean across the destroy-recreate cycle.
*/
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <va/va.h>
#include <va/va_drm.h>
#define DRM_RENDER_NODE "/dev/dri/renderD128"
static const char *va_status_str(VAStatus s)
{
return vaErrorStr(s);
}
#define VA_OK_OR_FAIL(call, msg) do { \
VAStatus _vs = (call); \
if (_vs != VA_STATUS_SUCCESS) { \
fprintf(stderr, "FAIL: %s: %s (0x%x)\n", \
(msg), va_status_str(_vs), _vs); \
return 10; \
} \
} while (0)
int main(void)
{
int drm_fd;
VADisplay dpy;
int va_major = 0, va_minor = 0;
VAConfigID config = VA_INVALID_ID;
VAContextID context = VA_INVALID_ID;
VASurfaceID small_surfaces[4];
VASurfaceID big_surfaces[4];
const unsigned int small_w = 128, small_h = 128;
const unsigned int big_w = 1920, big_h = 1080;
/* Open render node + libva display. */
drm_fd = open(DRM_RENDER_NODE, O_RDWR);
if (drm_fd < 0) {
fprintf(stderr, "FAIL: open(%s): %s\n",
DRM_RENDER_NODE, strerror(errno));
return 1;
}
dpy = vaGetDisplayDRM(drm_fd);
if (dpy == NULL) {
fprintf(stderr, "FAIL: vaGetDisplayDRM returned NULL\n");
close(drm_fd);
return 2;
}
VA_OK_OR_FAIL(vaInitialize(dpy, &va_major, &va_minor),
"vaInitialize");
printf("libva %d.%d initialized via %s\n", va_major, va_minor,
DRM_RENDER_NODE);
/*
* vaCreateConfig with H.264 Main + VLD entrypoint forces our
* driver's RequestCreateConfig to set up the H.264 decode path,
* which is the path that reaches CreateSurfaces2 (and the
* resolution-change handling there).
*/
VA_OK_OR_FAIL(vaCreateConfig(dpy, VAProfileH264Main, VAEntrypointVLD,
NULL, 0, &config),
"vaCreateConfig(H264Main, VLD)");
/* Phase 1: allocate small probe-pattern surfaces.
*
* iter5 sonnet C4 specified the race as vaCreateSurfaces(small)
* then vaCreateSurfaces(big), allocation-only matching mpv's
* libplacebo probe pattern that surfaced the original failure.
* No context creation needed for the C4 race; the cap_pool's
* resolution-change handling lives in CreateSurfaces2 itself
* (REQBUFS(0)+S_FMT pair on the OUTPUT queue, cap_pool_destroy
* + cap_pool_init on the CAPTURE queue).
*
* (vaCreateContext + recreate at a new resolution surfaced an
* additional STREAMON-on-recreate failure during iter7 Phase 7
* verification. That's iter8 candidate; out of scope for the C4
* regression test.)
*/
printf("Phase 1: vaCreateSurfaces %ux%u, count=4\n", small_w, small_h);
VA_OK_OR_FAIL(vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420,
small_w, small_h, small_surfaces, 4,
NULL, 0),
"vaCreateSurfaces (small)");
/* Phase 2: dispose small surfaces. Our driver's CreateSurfaces2
* keeps the cap_pool initialized at the small resolution; the
* pool will be torn down + rebuilt by Phase 3's resolution-change
* branch in CreateSurfaces2.
*/
printf("Phase 2: vaDestroySurfaces (small)\n");
VA_OK_OR_FAIL(vaDestroySurfaces(dpy, small_surfaces, 4),
"vaDestroySurfaces (small)");
/* Phase 3: allocate at the new (much larger) resolution. This is
* the C4 race-hitting path: pre-iter5 hit REQBUFS-EBUSY because
* CAPTURE/OUTPUT buffers from the small allocation hadn't been
* torn down before S_FMT on the new size. iter5's CreateSurfaces2
* added the dual REQBUFS(0) drain; iter7 also adds OUTPUT pool
* teardown for the case where a context-bound resolution change
* leaves the request_pool stale (defensive not exercised in
* this no-context test path).
*/
printf("Phase 3: vaCreateSurfaces %ux%u, count=4 (resolution change)\n",
big_w, big_h);
VA_OK_OR_FAIL(vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420,
big_w, big_h, big_surfaces, 4,
NULL, 0),
"vaCreateSurfaces (big)");
/* Phase 4: clean up. */
printf("Phase 4: cleanup\n");
VA_OK_OR_FAIL(vaDestroySurfaces(dpy, big_surfaces, 4),
"vaDestroySurfaces (big)");
VA_OK_OR_FAIL(vaDestroyConfig(dpy, config),
"vaDestroyConfig");
VA_OK_OR_FAIL(vaTerminate(dpy),
"vaTerminate");
close(drm_fd);
(void)context; /* unused in the C4-faithful no-context test path */
printf("PASS: cap_pool probe-pattern resolution-change handled cleanly.\n");
printf("Inspect driver stderr for absence of REQBUFS/EBUSY/Unable lines.\n");
return 0;
}
-53
View File
@@ -1,53 +0,0 @@
#!/bin/bash
# run_cap_pool_probe.sh — orchestrate the cap_pool probe-pattern regression test.
#
# Runs the cap_pool_probe_pattern test program with the v4l2_request driver
# and grep-checks driver stderr for race indicators. Exits 0 on PASS, 1 on FAIL.
#
# Usage: ./run_cap_pool_probe.sh [path_to_test_binary]
# If no argument, looks for ./cap_pool_probe_pattern in the same directory.
set -eu
BIN="${1:-$(dirname "$0")/cap_pool_probe_pattern}"
if [[ ! -x "$BIN" ]]; then
echo "FAIL: test binary not found or not executable: $BIN" >&2
echo "Build it first:" >&2
echo " gcc -O2 -Wall -Wextra -o $BIN $(dirname "$0")/cap_pool_probe_pattern.c \\" >&2
echo " \$(pkg-config --cflags --libs libva libva-drm)" >&2
exit 2
fi
LOG=$(mktemp -t cap_pool_probe.XXXXXX.log)
trap 'rm -f "$LOG"' EXIT
env LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
"$BIN" >"$LOG" 2>&1
rc=$?
echo "--- test program output ---"
cat "$LOG"
echo "--- end output ---"
if [[ "$rc" -ne 0 ]]; then
echo "FAIL: test binary exited with rc=$rc" >&2
exit 1
fi
# Race indicators on driver-prefixed lines only (avoids matching the
# test program's own informational output). Driver log lines start with
# "v4l2-request:".
race_lines=$(grep -E '^v4l2-request:' "$LOG" \
| grep -iE 'REQBUFS|EBUSY|Unable to request buffers|Unable to set format' \
|| true)
if [[ -n "$race_lines" ]]; then
echo "FAIL: driver stderr contains race indicators:" >&2
echo "$race_lines" >&2
exit 1
fi
echo "PASS: cap_pool probe-pattern test clean (no race indicators)."
exit 0
-139
View File
@@ -1,139 +0,0 @@
#!/bin/bash
# run_msync_pixel_verify.sh — verify decoded pixel correctness post-msync-removal.
#
# iter5 sweep commit d3a299b removed msync(MS_SYNC|MS_INVALIDATE) from the
# CAPTURE buffer DQBUF path alongside the iter1 patch-0010 hex-dump diagnostic.
# iter5 Phase 5 sonnet caveat C3 flagged: no formal pixel-correctness check
# was done. This script is that check.
#
# Approach:
# 1. SW reference: ffmpeg libavcodec H.264 decode of bbb_1080p30_h264.mp4,
# first 100 frames, NV12 raw output -> sw_ref.yuv.
# 2. HW subject: same input through our v4l2_request driver via
# ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi
# -i ... -vf hwdownload,format=nv12 -f rawvideo -pix_fmt nv12
# Captures the post-DQBUF buffer through libva readback, exercising
# the same code path we removed msync from.
# 3. Compare: byte-for-byte cmp + per-frame sha256.
#
# Pass: byte-for-byte identical (or per-frame sha matches) -> msync
# verifiably unnecessary on this hardware/kernel; iter5 sonnet C3 closes.
# Fail: divergence; restore msync in surface.c, re-run, document outcome.
#
# Usage: ./run_msync_pixel_verify.sh [fixture_path]
# If no argument, defaults to /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4
set -eu
FIXTURE="${1:-/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4}"
N_FRAMES=100
WORKDIR=$(mktemp -d -t msync_verify.XXXXXX)
trap 'rm -rf "$WORKDIR"' EXIT
if [[ ! -f "$FIXTURE" ]]; then
echo "FAIL: fixture not found: $FIXTURE" >&2
exit 2
fi
# Probe fixture dimensions for crop alignment of the HW path.
# Hantro pads height to MB boundaries (16-line align); FFmpeg SW decode
# returns crop-aligned (visible) frame size. Without explicit cropping
# on the HW side, hwdownload + format=nv12 emits MB-padded frames, which
# would diverge in size from SW even if pixels are correct.
read FIXTURE_W FIXTURE_H < <(ffprobe -v error -select_streams v:0 \
-show_entries stream=width,height -of csv=p=0 "$FIXTURE" \
| tr ',' ' ')
if [[ -z "${FIXTURE_W:-}" || -z "${FIXTURE_H:-}" ]]; then
echo "FAIL: ffprobe could not read width/height from $FIXTURE" >&2
exit 2
fi
echo "Fixture: $FIXTURE ($FIXTURE_W x $FIXTURE_H)"
echo "Frames: $N_FRAMES"
echo "Workdir: $WORKDIR"
echo
# 1. SW reference
echo "[1/3] FFmpeg SW decode -> sw_ref.yuv"
ffmpeg -hide_banner -loglevel error -y \
-i "$FIXTURE" \
-frames:v "$N_FRAMES" \
-f rawvideo -pix_fmt nv12 \
"$WORKDIR/sw_ref.yuv"
SW_BYTES=$(stat -c %s "$WORKDIR/sw_ref.yuv")
SW_SHA=$(sha256sum "$WORKDIR/sw_ref.yuv" | cut -d' ' -f1)
echo " sw_ref.yuv: $SW_BYTES bytes, sha256=$SW_SHA"
# 2. HW subject via libva v4l2_request
# Explicit crop=$FIXTURE_W:$FIXTURE_H after hwdownload normalizes any
# MB-padding the HW driver applies (hantro pads height to multiples of
# 16). Without this crop, an iter6+ correct decode could falsely
# diverge in total byte count from the SW reference.
echo "[2/3] FFmpeg HW decode via v4l2_request driver -> hw_capture.yuv"
env LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
ffmpeg -hide_banner -loglevel error -y \
-hwaccel vaapi -hwaccel_output_format vaapi \
-i "$FIXTURE" \
-vf "hwdownload,format=nv12,crop=$FIXTURE_W:$FIXTURE_H:0:0" \
-frames:v "$N_FRAMES" \
-f rawvideo -pix_fmt nv12 \
"$WORKDIR/hw_capture.yuv"
HW_BYTES=$(stat -c %s "$WORKDIR/hw_capture.yuv")
HW_SHA=$(sha256sum "$WORKDIR/hw_capture.yuv" | cut -d' ' -f1)
echo " hw_capture.yuv: $HW_BYTES bytes, sha256=$HW_SHA"
echo
# 3. Compare
echo "[3/3] Compare"
if [[ "$SW_BYTES" -ne "$HW_BYTES" ]]; then
# Diagnose stride/padding artifacts before declaring pixel
# corruption. With explicit crop in step 2 this should not
# happen, but if a future kernel change shifts the alignment
# we want a clear diagnostic, not a false pixel-corruption
# accusation.
EXPECTED_SW=$(( FIXTURE_W * FIXTURE_H * 3 / 2 * N_FRAMES ))
for PAD in 16 32; do
PADDED_H=$(( (FIXTURE_H + PAD - 1) / PAD * PAD ))
EXPECTED_PADDED=$(( FIXTURE_W * PADDED_H * 3 / 2 * N_FRAMES ))
if [[ "$HW_BYTES" -eq "$EXPECTED_PADDED" ]]; then
echo "DIAGNOSTIC: HW size $HW_BYTES matches MB-padded layout" >&2
echo " ($FIXTURE_W x $PADDED_H, $PAD-line align). The crop=$FIXTURE_W:$FIXTURE_H" >&2
echo " filter step did not normalize. Check FFmpeg version / hwdownload behavior." >&2
echo " This is a stride artifact, not pixel corruption." >&2
exit 3
fi
done
echo "FAIL: size mismatch (SW=$SW_BYTES vs HW=$HW_BYTES, expected $EXPECTED_SW)" >&2
echo " Different frame count or NV12 packing — investigate." >&2
exit 1
fi
if [[ "$SW_SHA" == "$HW_SHA" ]]; then
echo "PASS: byte-for-byte identical."
echo " msync removal verified safe on this hardware/kernel."
exit 0
fi
# Per-frame divergence analysis on full-buffer mismatch.
echo "Buffer-level sha differs. Computing per-frame divergence..."
FRAME_SIZE=$(( SW_BYTES / N_FRAMES ))
DIVERGENT=0
for ((i = 0; i < N_FRAMES; i++)); do
OFFSET=$(( i * FRAME_SIZE ))
SW_FRAME_SHA=$(dd if="$WORKDIR/sw_ref.yuv" bs="$FRAME_SIZE" \
count=1 skip="$i" 2>/dev/null | sha256sum | cut -d' ' -f1)
HW_FRAME_SHA=$(dd if="$WORKDIR/hw_capture.yuv" bs="$FRAME_SIZE" \
count=1 skip="$i" 2>/dev/null | sha256sum | cut -d' ' -f1)
if [[ "$SW_FRAME_SHA" != "$HW_FRAME_SHA" ]]; then
DIVERGENT=$(( DIVERGENT + 1 ))
[[ "$DIVERGENT" -le 5 ]] && \
echo " frame $i: SW=$SW_FRAME_SHA HW=$HW_FRAME_SHA"
fi
done
echo "FAIL: $DIVERGENT / $N_FRAMES frames diverge from SW reference."
echo " Action: restore msync(MS_SYNC|MS_INVALIDATE) in surface.c"
echo " RequestSyncSurface DQBUF path; re-run this script."
exit 1
-299
View File
@@ -1,299 +0,0 @@
#!/bin/bash
# run_perf_binding_cell.sh — iter8 perf binding cell.
#
# Anchors campaign-wide claims with measured numbers. Runs four consumer
# configurations for $DURATION seconds each on $FIXTURE and emits a
# markdown table comparing:
# 1. mpv --hwdec=vaapi (DMA-BUF zero-copy through libva)
# 2. mpv --hwdec=vaapi-copy (HW decode + VAImage readback)
# 3. firefox (iter5-amend, sandbox enabled, file:// URL)
# 4. mpv --hwdec=no (SW decode baseline / control)
#
# For each consumer: CPU% (median + p90), GPU freq (median MHz), drops in
# measurement window, p50 frame interval (ms), VmRSS delta (MiB).
#
# Usage:
# ./run_perf_binding_cell.sh [fixture_path]
#
# If no argument, defaults to /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4
# Override DURATION via env: DURATION=60 ./run_perf_binding_cell.sh
#
# Reproducibility: results depend on (a) the iter7-end driver being installed
# at /usr/lib/dri/v4l2_request_drv_video.so, (b) ohm idle (no other compute
# load), (c) fixture present at the expected path. Run on a stable thermal
# state (after a few minutes of cool-down).
set -eu
FIXTURE="${1:-/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4}"
DURATION="${DURATION:-30}"
WORKDIR="${WORKDIR:-$(mktemp -d -t perf_binding.XXXXXX)}"
GPU_DEVFREQ_PATH="${GPU_DEVFREQ_PATH:-/sys/class/devfreq/fde60000.gpu/cur_freq}"
# DISPLAY/Wayland env for the operator's session, needed for Firefox under sudo.
export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/1001}"
export WAYLAND_DISPLAY="${WAYLAND_DISPLAY:-wayland-0}"
export DISPLAY="${DISPLAY:-:0}"
export XAUTHORITY="${XAUTHORITY:-/run/user/1001/xauth_pxiMur}"
# libva env vars for the v4l2_request driver path.
export LIBVA_DRIVER_NAME=v4l2_request
export LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1
export LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0
if [[ ! -f "$FIXTURE" ]]; then
echo "FAIL: fixture not found: $FIXTURE" >&2
exit 2
fi
mkdir -p "$WORKDIR"
echo "Fixture: $FIXTURE ($(stat -c %s "$FIXTURE") bytes)"
echo "Duration: ${DURATION}s per configuration"
echo "Workdir: $WORKDIR"
echo "GPU freq: $GPU_DEVFREQ_PATH"
echo "Driver sha: $(sha256sum /usr/lib/dri/v4l2_request_drv_video.so | cut -d' ' -f1)"
echo
# percentile_from_stream sorted_file pct -> echo Nth percentile value
# Argument: a file with one numeric value per line (no header), and a percentile
# integer (50, 90, etc.). Numbers can be float; uses awk.
percentile_from_stream() {
local file="$1" pct="$2"
awk -v pct="$pct" '
{ a[NR] = $1 }
END {
if (NR == 0) { print "0"; exit }
# sort
for (i = 1; i <= NR; i++) for (j = i+1; j <= NR; j++) if (a[i] > a[j]) { t = a[i]; a[i] = a[j]; a[j] = t }
idx = int((pct/100.0) * NR + 0.5)
if (idx < 1) idx = 1
if (idx > NR) idx = NR
print a[idx]
}' "$file"
}
# Background-poll GPU freq while the consumer runs. Writes Hz values to $1.
poll_gpu_freq() {
local out="$1"
: >"$out"
while [[ -e "/proc/$BG_PARENT_PID" ]]; do
if [[ -r "$GPU_DEVFREQ_PATH" ]]; then
cat "$GPU_DEVFREQ_PATH" 2>/dev/null >>"$out" || true
fi
sleep 0.1
done
}
# Run a single consumer configuration. Args:
# $1 label (used for filename, no spaces)
# $2 launcher cmd (will be exec'd as mfritsche; should be a single line)
# $3 'mpv' or 'firefox' — affects how we find the PID to track
run_consumer() {
local label="$1"
local launcher="$2"
local kind="$3"
local logdir="$WORKDIR/$label"
mkdir -p "$logdir"
echo "=== Running: $label ==="
# Kill any running firefox/mpv first to clean state.
pkill -f firefox 2>/dev/null || true
pkill -x mpv 2>/dev/null || true
sleep 1
# VmRSS at start (we'll read again at end) — captured per-PID after launch.
# Launch consumer in background, capture stdout+stderr to a log.
(
eval "$launcher" >"$logdir/consumer.log" 2>&1
) &
local launcher_pid=$!
# Wait briefly for the process tree to spawn the actual decode worker.
sleep 4
local target_pid
case "$kind" in
mpv)
target_pid=$(pgrep -x mpv | head -1)
;;
firefox)
# Firefox's RDD process holds /dev/video1; that's the one with
# the libva decoder context. Wait an extra few seconds for it
# to spawn and bind the device.
sleep 6
target_pid=$(pgrep -af 'contentproc.*\brdd\b' | awk '{print $1}' | head -1)
if [[ -z "${target_pid:-}" ]]; then
# Fallback: find whichever firefox process holds /dev/video1.
target_pid=$(sudo lsof -t /dev/video1 2>/dev/null | head -1 || true)
fi
;;
*)
echo " bad kind: $kind" >&2
return 1
;;
esac
if [[ -z "${target_pid:-}" ]]; then
echo " WARN: could not locate $kind process; skipping pidstat" >&2
# Let the consumer run for the duration anyway so the log gets data.
sleep "$DURATION"
kill -TERM "$launcher_pid" 2>/dev/null || true
pkill -f firefox 2>/dev/null || true
pkill -x mpv 2>/dev/null || true
return 0
fi
echo " Tracking PID $target_pid"
# VmRSS at start.
local rss_start
rss_start=$(awk '/^VmRSS:/{print $2}' "/proc/$target_pid/status" 2>/dev/null || echo 0)
echo " VmRSS start: ${rss_start} kB"
# Poll GPU freq in background (keyed off launcher_pid).
BG_PARENT_PID=$launcher_pid
poll_gpu_freq "$logdir/gpu_freq.log" &
local poll_pid=$!
# Run pidstat for $DURATION seconds.
pidstat -u -p "$target_pid" 1 "$DURATION" >"$logdir/pidstat.log" 2>&1 || true
# VmRSS at end (before killing).
local rss_end
rss_end=$(awk '/^VmRSS:/{print $2}' "/proc/$target_pid/status" 2>/dev/null || echo "$rss_start")
# Stop everything.
kill "$poll_pid" 2>/dev/null || true
kill -TERM "$launcher_pid" 2>/dev/null || true
pkill -f firefox 2>/dev/null || true
pkill -x mpv 2>/dev/null || true
sleep 1
# Parse pidstat by header: locate the %CPU column index from the
# column-name row (where any field equals "%CPU"), then apply it
# to data rows. Robust across sysstat 12.x point releases.
# pidstat default output has no '#' header marker — the header is
# the first row containing "%CPU" as a field.
awk '
# Header row: any line where some field equals "%CPU".
!col {
for (i = 1; i <= NF; i++) if ($i == "%CPU") { col = i; next }
}
# Data row: lines whose value at $col is numeric. Skip the
# trailing "Average" summary by requiring $col to parse cleanly.
col && NF >= col && $col ~ /^[0-9]+(\.[0-9]+)?$/ {
print $col
}
' "$logdir/pidstat.log" >"$logdir/cpu_pct.log" || true
local cpu_p50 cpu_p90
if [[ -s "$logdir/cpu_pct.log" ]]; then
cpu_p50=$(percentile_from_stream "$logdir/cpu_pct.log" 50)
cpu_p90=$(percentile_from_stream "$logdir/cpu_pct.log" 90)
else
cpu_p50="ERR"
cpu_p90="ERR"
fi
# GPU freq median. Values are Hz; convert to MHz via temp file (avoids
# unreliable /dev/stdin in a nested subshell-over-pipe).
local gpu_med_mhz
if [[ -s "$logdir/gpu_freq.log" ]]; then
awk '{print $1/1000000}' "$logdir/gpu_freq.log" >"$logdir/gpu_freq_mhz.log"
gpu_med_mhz=$(percentile_from_stream "$logdir/gpu_freq_mhz.log" 50)
else
gpu_med_mhz="—"
fi
# RSS delta MiB.
local rss_delta_mib
rss_delta_mib=$(awk -v s="$rss_start" -v e="$rss_end" 'BEGIN{printf "%.1f", (e-s)/1024.0}')
# Drops + p50 frame interval — only available for mpv.
local drops="—"
local p50_frame_ms="—"
if [[ "$kind" == "mpv" ]]; then
drops=$(grep -oE 'frame-drop-count[^\t ]*\s*=\s*[0-9]+' "$logdir/consumer.log" \
| awk -F= '{print $2}' | tr -d ' ' | tail -1)
drops="${drops:-0}"
# p50 frame interval from mpv vsync-jitter or frame timing — leave
# as "—" unless mpv emitted detailed timing.
fi
# Emit row.
cat >>"$WORKDIR/results.tsv" <<-ROW
$label $cpu_p50 $cpu_p90 $drops $p50_frame_ms $gpu_med_mhz $rss_delta_mib
ROW
echo " CPU% p50=$cpu_p50 p90=$cpu_p90 drops=$drops gpu_med=$gpu_med_mhz MHz rss_delta=$rss_delta_mib MiB"
echo
}
# Header for results.
echo "consumer cpu_p50 cpu_p90 drops_${DURATION}s p50_frame_ms gpu_med_mhz rss_delta_mib" >"$WORKDIR/results.tsv"
# === Configurations ===
# 1. mpv DMA-BUF zero-copy
run_consumer "mpv-vaapi-dmabuf" \
"sudo -u mfritsche env LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
mpv --no-config --hwdec=vaapi --vo=null --no-audio \
--term-status-msg='\${frame-drop-count}' \
--length=$DURATION '$FIXTURE'" \
mpv
# 2. mpv vaapi-copy
run_consumer "mpv-vaapi-copy" \
"sudo -u mfritsche env LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
mpv --no-config --hwdec=vaapi-copy --vo=null --no-audio \
--term-status-msg='\${frame-drop-count}' \
--length=$DURATION '$FIXTURE'" \
mpv
# 3. Firefox-fourier (iter5-amend, sandbox enabled)
run_consumer "firefox-fourier-hw" \
"sudo -u mfritsche env XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR \
WAYLAND_DISPLAY=$WAYLAND_DISPLAY DISPLAY=$DISPLAY \
XAUTHORITY=$XAUTHORITY \
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
firefox --new-window 'file://$FIXTURE'" \
firefox
# 4. SW baseline
run_consumer "mpv-sw-baseline" \
"sudo -u mfritsche mpv --no-config --hwdec=no --vo=null --no-audio \
--term-status-msg='\${frame-drop-count}' \
--length=$DURATION '$FIXTURE'" \
mpv
# === Generate markdown table ===
{
echo "# Performance binding cell — iter8 (libva-multiplanar campaign)"
echo
echo "Run date: $(date -Iseconds)"
echo "Host: $(uname -n) ($(uname -m))"
echo "Kernel: $(uname -r)"
echo "Driver sha256: \`$(sha256sum /usr/lib/dri/v4l2_request_drv_video.so | cut -d' ' -f1)\`"
echo "Fixture: \`$FIXTURE\` ($(stat -c %s "$FIXTURE") bytes)"
echo "Duration per consumer: ${DURATION}s"
echo
echo "| Consumer | CPU% p50 | CPU% p90 | Drops in window | p50 frame ms | GPU MHz median | VmRSS Δ MiB |"
echo "|---|---|---|---|---|---|---|"
tail -n +2 "$WORKDIR/results.tsv" | awk -F'\t' '{
printf "| %s | %s | %s | %s | %s | %s | %s |\n",
$1, $2, $3, $4, $5, $6, $7
}'
} >"$WORKDIR/perf_binding_cell.md"
echo "=== Done ==="
echo "Results: $WORKDIR/perf_binding_cell.md"
echo "Per-consumer logs: $WORKDIR/{mpv-vaapi-dmabuf,mpv-vaapi-copy,firefox-fourier-hw,mpv-sw-baseline}/"
echo
cat "$WORKDIR/perf_binding_cell.md"
-196
View File
@@ -1,196 +0,0 @@
/*
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
*
* MIT-licensed per project. iter40 self-test for nv12_col128 detile.
*
* Build an NC12-tiled source buffer from a known linear NV12 image,
* run the detile primitive, assert output matches the original. No
* hardware needed pure bit-layout verification of the kernel math
* (drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c
* V4L2_PIX_FMT_NV12_COL128 case + ffmpeg/Kynesim per-pixel offset).
*
* Build:
* cc -Wall -Werror -O2 -o test_nv12_col128_detile \
* tests/test_nv12_col128_detile.c src/nv12_col128.c
*
* Exit 0 = all asserts pass.
*/
#include "../src/nv12_col128.h"
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TILE_W 128
static unsigned int align_up(unsigned int v, unsigned int a)
{
return (v + a - 1) & ~(a - 1);
}
/* Pack a linear plane (width × height bytes, stride=width) into NC12
* layout: each 128-wide column held contiguously, columns at offsets
* col * col_stride * 128. col_stride is the kernel-reported bytesperline
* = ALIGN(height, 8) * 3/2. Returns the buffer + sizes. */
static uint8_t *pack_to_nc12(const uint8_t *linear,
unsigned int width, unsigned int height,
unsigned int *out_col_stride,
size_t *out_size)
{
unsigned int aligned_w = align_up(width, TILE_W);
unsigned int aligned_h = align_up(height, 8);
unsigned int col_stride = aligned_h * 3 / 2;
unsigned int num_cols = aligned_w / TILE_W;
size_t total = (size_t)col_stride * aligned_w;
uint8_t *buf;
unsigned int col, y, in_col;
buf = calloc(1, total);
assert(buf != NULL);
for (col = 0; col < num_cols; col++) {
uint8_t *col_base = buf + (size_t)col * TILE_W * col_stride;
for (y = 0; y < height; y++) {
for (in_col = 0; in_col < TILE_W; in_col++) {
unsigned int x = col * TILE_W + in_col;
if (x >= width)
break;
col_base[(size_t)y * TILE_W + in_col] =
linear[(size_t)y * width + x];
}
}
}
*out_col_stride = col_stride;
*out_size = total;
return buf;
}
static void test_detile_y(unsigned int width, unsigned int height)
{
uint8_t *linear, *tiled, *recovered;
unsigned int col_stride;
size_t tile_size, i;
linear = malloc((size_t)width * height);
assert(linear != NULL);
/* Distinctive content per pixel: y * 17 + x * 13 — avoids byte-
* aliasing patterns that could mask off-by-one bugs. */
for (unsigned int y = 0; y < height; y++)
for (unsigned int x = 0; x < width; x++)
linear[(size_t)y * width + x] = (uint8_t)(y * 17 + x * 13);
tiled = pack_to_nc12(linear, width, height, &col_stride, &tile_size);
recovered = calloc(1, (size_t)width * height);
assert(recovered != NULL);
nv12_col128_detile_y(recovered, width, tiled, col_stride, width, height);
for (i = 0; i < (size_t)width * height; i++) {
if (recovered[i] != linear[i]) {
fprintf(stderr,
"FAIL %ux%u Y: pixel %zu (x=%zu y=%zu) "
"linear=0x%02x recovered=0x%02x\n",
width, height, i,
i % width, i / width,
linear[i], recovered[i]);
free(linear); free(tiled); free(recovered);
exit(1);
}
}
printf("PASS %ux%u Y plane (%u columns, col_stride=%u, tile_size=%zu)\n",
width, height, align_up(width, TILE_W) / TILE_W,
col_stride, tile_size);
free(linear);
free(tiled);
free(recovered);
}
static void test_detile_uv(unsigned int width, unsigned int height)
{
unsigned int uv_h = height / 2;
uint8_t *linear, *tiled, *recovered;
unsigned int col_stride;
size_t tile_size, i;
linear = malloc((size_t)width * uv_h);
assert(linear != NULL);
for (unsigned int y = 0; y < uv_h; y++)
for (unsigned int x = 0; x < width; x++)
linear[(size_t)y * width + x] = (uint8_t)(y * 23 + x * 7);
tiled = pack_to_nc12(linear, width, uv_h, &col_stride, &tile_size);
recovered = calloc(1, (size_t)width * uv_h);
assert(recovered != NULL);
nv12_col128_detile_uv(recovered, width, tiled, col_stride, width, uv_h);
for (i = 0; i < (size_t)width * uv_h; i++) {
if (recovered[i] != linear[i]) {
fprintf(stderr,
"FAIL %ux%u UV: pixel %zu linear=0x%02x recovered=0x%02x\n",
width, height, i,
linear[i], recovered[i]);
free(linear); free(tiled); free(recovered);
exit(1);
}
}
printf("PASS %ux%u UV plane\n", width, height);
free(linear);
free(tiled);
free(recovered);
}
static void test_uv_offset(void)
{
/* Per the SAND COL128 layout, Y and UV are interleaved within
* EACH column (not concatenated as separate planes), so the UV
* plane base pointer is offset by 128 * ALIGN(height, 8) the
* Y portion of column 0. NOT 128 * height * num_columns (the
* size of all Y across all columns), which was an earlier wrong
* formula caught by Phase 7 SEGV on higgs. */
unsigned int off = nv12_col128_uv_plane_offset(1280, 720);
if (off != 128u * 720) {
fprintf(stderr, "FAIL UV offset 1280×720: got %u expected %u\n",
off, 128u * 720);
exit(1);
}
printf("PASS UV offset 1280×720 = %u\n", off);
off = nv12_col128_uv_plane_offset(1366, 768);
if (off != 128u * 768) {
fprintf(stderr, "FAIL UV offset 1366×768: got %u expected %u\n",
off, 128u * 768);
exit(1);
}
printf("PASS UV offset 1366×768 (column-misaligned width)\n");
}
int main(void)
{
/* Phase 3 fixture sizes — all 128-aligned, 8-line-aligned. */
test_detile_y(640, 360);
test_detile_y(1280, 720);
test_detile_y(1920, 1080);
/* Phase 5 review F4: column-misaligned width (1366 → 1408 padding). */
test_detile_y(1366, 768);
/* UV plane (half-height) at each width. */
test_detile_uv(640, 360);
test_detile_uv(1280, 720);
test_detile_uv(1920, 1080);
test_detile_uv(1366, 768);
test_uv_offset();
printf("All NC12 detile asserts pass.\n");
return 0;
}
-224
View File
@@ -1,224 +0,0 @@
/*
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
/*
* iter39 self-test for nv15_unpack_plane_to_p010.
*
* Builds NV15 plane buffers from known 10-bit pixel arrays, runs the
* unpack, asserts P010 output matches the expected pixel<<6 values.
* No hardware needed pure bit layout verification per
* Documentation/userspace-api/media/v4l/pixfmt-nv15.rst.
*
* Build:
* cc -Wall -Werror -O2 -o test_nv15_unpack tests/test_nv15_unpack.c src/nv15.c
*
* Exit 0 = all asserts pass.
*/
#include "../src/nv15.h"
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Pack 4 10-bit pixels into 5 bytes per NV15 layout (LSB-first across
* bits 0..39). Inverse of nv15_unpack_plane_to_p010's per-group unpack. */
static void pack4(uint16_t a, uint16_t b, uint16_t c, uint16_t d,
uint8_t out[5])
{
out[0] = (uint8_t)(a & 0xFF);
out[1] = (uint8_t)(((a >> 8) & 0x03) | ((b & 0x3F) << 2));
out[2] = (uint8_t)(((b >> 6) & 0x0F) | ((c & 0x0F) << 4));
out[3] = (uint8_t)(((c >> 4) & 0x3F) | ((d & 0x03) << 6));
out[4] = (uint8_t)((d >> 2) & 0xFF);
}
#define ASSERT_EQ(actual, expected, msg) do { \
if ((actual) != (expected)) { \
fprintf(stderr, "FAIL %s: actual=0x%04x expected=0x%04x at %s:%d\n", \
(msg), (unsigned)(actual), (unsigned)(expected), \
__FILE__, __LINE__); \
exit(1); \
} \
} while (0)
static void test_pack_unpack_roundtrip(uint16_t a, uint16_t b, uint16_t c,
uint16_t d)
{
uint8_t packed[5];
uint16_t dst[4];
pack4(a, b, c, d, packed);
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
ASSERT_EQ(dst[0], (uint16_t)(a << 6), "roundtrip a");
ASSERT_EQ(dst[1], (uint16_t)(b << 6), "roundtrip b");
ASSERT_EQ(dst[2], (uint16_t)(c << 6), "roundtrip c");
ASSERT_EQ(dst[3], (uint16_t)(d << 6), "roundtrip d");
}
static void test_zero(void)
{
uint8_t packed[5] = { 0, 0, 0, 0, 0 };
uint16_t dst[4] = { 0xDEAD, 0xDEAD, 0xDEAD, 0xDEAD };
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
ASSERT_EQ(dst[0], 0, "zero[0]");
ASSERT_EQ(dst[1], 0, "zero[1]");
ASSERT_EQ(dst[2], 0, "zero[2]");
ASSERT_EQ(dst[3], 0, "zero[3]");
}
static void test_all_max(void)
{
/* All four pixels = 0x3FF (max 10-bit). Packed bits all 1 → all 0xFF. */
uint8_t packed[5] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };
uint16_t dst[4] = { 0, 0, 0, 0 };
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
ASSERT_EQ(dst[0], 0xFFC0, "max[0]");
ASSERT_EQ(dst[1], 0xFFC0, "max[1]");
ASSERT_EQ(dst[2], 0xFFC0, "max[2]");
ASSERT_EQ(dst[3], 0xFFC0, "max[3]");
}
static void test_known_vectors(void)
{
/* Position-sensitive sanity: each pixel = its index+1. */
test_pack_unpack_roundtrip(1, 2, 3, 4);
/* Spread patterns that exercise every byte-boundary bit. */
test_pack_unpack_roundtrip(0x3FF, 0x000, 0x3FF, 0x000);
test_pack_unpack_roundtrip(0x000, 0x3FF, 0x000, 0x3FF);
test_pack_unpack_roundtrip(0x155, 0x2AA, 0x155, 0x2AA);
test_pack_unpack_roundtrip(0x001, 0x002, 0x004, 0x008);
test_pack_unpack_roundtrip(0x080, 0x040, 0x020, 0x010);
test_pack_unpack_roundtrip(0x200, 0x100, 0x080, 0x040);
test_pack_unpack_roundtrip(0x3F0, 0x0F3, 0x33C, 0x2A5);
}
static void test_remainder_width(void)
{
/* width=1: only A unpacked, B/C/D undefined */
{
uint8_t packed[5];
uint16_t dst[1] = { 0xDEAD };
pack4(0x123, 0x000, 0x000, 0x000, packed);
nv15_unpack_plane_to_p010(packed, dst, 1, 1, 5);
ASSERT_EQ(dst[0], 0x123 << 6, "rem1[0]");
}
/* width=2 */
{
uint8_t packed[5];
uint16_t dst[2] = { 0, 0 };
pack4(0x111, 0x222, 0x000, 0x000, packed);
nv15_unpack_plane_to_p010(packed, dst, 2, 1, 5);
ASSERT_EQ(dst[0], 0x111 << 6, "rem2[0]");
ASSERT_EQ(dst[1], 0x222 << 6, "rem2[1]");
}
/* width=3 */
{
uint8_t packed[5];
uint16_t dst[3] = { 0, 0, 0 };
pack4(0x111, 0x222, 0x333, 0x000, packed);
nv15_unpack_plane_to_p010(packed, dst, 3, 1, 5);
ASSERT_EQ(dst[0], 0x111 << 6, "rem3[0]");
ASSERT_EQ(dst[1], 0x222 << 6, "rem3[1]");
ASSERT_EQ(dst[2], 0x333 << 6, "rem3[2]");
}
/* width=7: one full group + 3 remainder */
{
uint8_t packed[10];
uint16_t dst[7] = { 0 };
pack4(0x100, 0x200, 0x300, 0x010, &packed[0]);
pack4(0x011, 0x022, 0x033, 0x000, &packed[5]);
nv15_unpack_plane_to_p010(packed, dst, 7, 1, 10);
ASSERT_EQ(dst[0], 0x100 << 6, "rem7[0]");
ASSERT_EQ(dst[1], 0x200 << 6, "rem7[1]");
ASSERT_EQ(dst[2], 0x300 << 6, "rem7[2]");
ASSERT_EQ(dst[3], 0x010 << 6, "rem7[3]");
ASSERT_EQ(dst[4], 0x011 << 6, "rem7[4]");
ASSERT_EQ(dst[5], 0x022 << 6, "rem7[5]");
ASSERT_EQ(dst[6], 0x033 << 6, "rem7[6]");
}
/* width=8: two full groups */
{
uint8_t packed[10];
uint16_t dst[8] = { 0 };
pack4(0x101, 0x202, 0x303, 0x101, &packed[0]);
pack4(0x202, 0x303, 0x101, 0x202, &packed[5]);
nv15_unpack_plane_to_p010(packed, dst, 8, 1, 10);
ASSERT_EQ(dst[7], 0x202 << 6, "w8[7]");
}
}
static void test_multi_row_stride_padding(void)
{
/* 4-pixel-wide, 3-row plane; stride = 8 bytes (3 bytes padding). */
uint8_t packed[24]; /* 3 rows × 8 bytes */
uint16_t dst[12]; /* 3 rows × 4 pixels */
memset(packed, 0xCC, sizeof(packed)); /* padding poison */
pack4(0x111, 0x222, 0x333, 0x044, &packed[0 * 8]);
pack4(0x055, 0x166, 0x177, 0x188, &packed[1 * 8]);
pack4(0x099, 0x1AA, 0x2BB, 0x3CC, &packed[2 * 8]);
memset(dst, 0xAB, sizeof(dst));
nv15_unpack_plane_to_p010(packed, dst, 4, 3, 8);
ASSERT_EQ(dst[0], 0x111 << 6, "row0[0]");
ASSERT_EQ(dst[3], 0x044 << 6, "row0[3]");
ASSERT_EQ(dst[4], 0x055 << 6, "row1[0]");
ASSERT_EQ(dst[7], 0x188 << 6, "row1[3]");
ASSERT_EQ(dst[8], 0x099 << 6, "row2[0]");
ASSERT_EQ(dst[11], 0x3CC << 6, "row2[3]");
}
static void test_chroma_half_height(void)
{
/* 4-pixel-wide × 2-row chroma (matches 4×4 luma quadrant).
* NV15 chroma uses same packing as luma, just half-height. */
uint8_t packed[10]; /* 2 rows × 5 bytes */
uint16_t dst[8]; /* 2 rows × 4 pixels (UV pairs in interleaved form) */
pack4(0x080, 0x180, 0x280, 0x380, &packed[0]);
pack4(0x040, 0x140, 0x240, 0x340, &packed[5]);
nv15_unpack_plane_to_p010(packed, dst, 4, 2, 5);
ASSERT_EQ(dst[0], 0x080 << 6, "chroma row0[0]");
ASSERT_EQ(dst[3], 0x380 << 6, "chroma row0[3]");
ASSERT_EQ(dst[4], 0x040 << 6, "chroma row1[0]");
ASSERT_EQ(dst[7], 0x340 << 6, "chroma row1[3]");
}
int main(void)
{
test_zero();
test_all_max();
test_known_vectors();
test_remainder_width();
test_multi_row_stride_padding();
test_chroma_half_height();
printf("test_nv15_unpack: all PASS\n");
return 0;
}