Sonnet review 7.3 / 9.6 from iter1 + carried iter2/3/4 substrate.
Two libva driver_data instances in the same process (e.g. Firefox
playing two tabs at different resolutions, or Firefox + mpv via the
same dlopened backend) would race on the static cache.
Move to struct request_data.last_output_width/height. The V4L2
device fd is already per-driver_data, so this is the correct binding
unit (one fd, one current OUTPUT format).
Verified: two concurrent mpv processes (2s stagger) both decode
300 frames cleanly with no cross-corruption. Same-instant init still
hits kernel-level fd contention on /dev/video1 (hantro is a
single-instance device); cross-process serialization is out of scope
for a libva backend.
Resolves the surface_reset_format_cache() callsite: now takes
driver_data parameter (was zero-arg).
Also drops the 'rc' unused-variable warning in v4l2_ioctl_controls
that the iter5 sweep left behind.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE
buffer. mpv reusing a surface for a new decode while the compositor still
held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to
write fresh decode output into the same physical memory the compositor
was reading -- visible as stutter / back-and-forth swap on
mpv --hwdec=vaapi --vo=gpu playback.
Architecture:
- New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers
(N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state
{FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t.
- Surfaces no longer own buffers; each vaBeginPicture acquires the
oldest FREE slot (LRU), binds it for the decode cycle, and the slot
cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF).
- Slot is released on next BeginPicture for the same surface or on
vaDestroySurfaces.
Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+):
- Option-A statistical mitigation; race window narrows to "pool
exhausted, force-recycle of oldest EXPORTED slot." For typical mpv
16-surface playback with MIN_CAP_POOL=24 the fallback never fires.
- Multi-context concurrent use not addressed (one V4L2 device, multiple
cap_pools -- iter3 scope).
Other call sites updated:
- picture.c::BeginPicture acquires + binds, releasing prior slot if any.
- surface.c::SyncSurface marks slot DECODED after DQBUF.
- surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR
EXPBUF fd for force-recycle close().
- surface.c::DestroySurfaces releases via surface_unbind_slot;
cap_pool owns the mmaps now.
- surface.c::CreateSurfaces2 destroys the pool in the resolution-change
path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS).
- context.c::DestroyContext invokes cap_pool_destroy.
- image.c::DeriveImage skips copy_surface_to_image when current_slot is
NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces).
Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly
recycling slot indices, real luma gradient. mpv vaapi --vo=gpu
operator-inspection follows.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix 1 of iteration 2 per phase4_iter2_plan.md.
Adds surface_reset_format_cache() exposed from src/surface.h. Called
from RequestDestroyContext after the dual REQBUFS(0). Without this,
multi-video Firefox sessions on mozilla.org corrupted the next
session's CAPTURE format query: the kernel reset to defaults but
our LAST_OUTPUT_WIDTH/HEIGHT cache still said 'already 1920x1088,'
so the next G_FMT returned 48x48 and the exported descriptor
encoded wrong pitch/offset.
Also adds REQBUFS(0) on CAPTURE in the resolution-change path of
RequestCreateSurfaces2 (Sonnet Phase 5 review iter2 9.1). The
existing code only did REQBUFS(0) on OUTPUT before re-S_FMTting;
hantro derives CAPTURE format from OUTPUT format, so leftover
CAPTURE buffers from the prior resolution would also block the
implicit format change. Pre-existing bug surfaced by Sonnet's
audit; Fix 3 pool refactor would have exposed it more often.
Limitation noted in surface.h docblock: the LAST_OUTPUT_WIDTH/
HEIGHT cache is a static process-global, so concurrent multi-
context use still races (Sonnet 7.3 / 9.6). Iteration 2 only
addresses sequential sessions. Multi-context safety is iteration 3+.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VAAPI signals "explicit scaling lists are present in the bitstream"
implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a
VAIQMatrixBufferH264 alongside RenderPicture iff
sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag.
When the bitstream uses default (flat) scaling, no IQMatrixBuffer
arrives and the in-tree h264.matrix struct stays zero-initialised.
fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this
via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set,
h265.iqmatrix_set) — the H.264 path was missing the equivalent flag,
so set_controls always submitted the scaling matrix, including the
zero-initialised case.
Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in
PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed
scaling_list_8x8, producing a zeroed CAPTURE buffer despite a
successful decode round-trip (no V4L2_BUF_FLAG_ERROR,
bytesused=3655712 reported).
Earlier draft of this patch unconditionally omitted SCALING_MATRIX in
FRAME_BASED. That's corpus-correct (bbb has no explicit scaling
lists) but the wrong predicate: the kernel-side gating is by
"matrix-supplied vs. not," not by decode mode. Streams that signal
explicit scaling lists must submit SCALING_MATRIX in either mode.
Contract verification (audit_0008_decode_params_2026-05-01.md +
hantro_h264.c::assemble_scaling_list): the kernel uses the supplied
matrix when SCALING_MATRIX is in the control batch and falls back
to spec-defined defaults when absent. Mode-independent.
This patch:
- surface.h: adds bool matrix_set to params.h264, mirroring
mpeg2.iqmatrix_set / h265.iqmatrix_set.
- picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case):
sets matrix_set = true when the buffer arrives.
- picture.c RequestBeginPicture: resets matrix_set = false at the
start of each Begin/Render/End cycle.
- h264.c h264_set_controls: builds the controls[] array
incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff
matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only
when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved —
kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is
selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall
not be set."
Cross-reference: kernel UAPI section
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX
(matrix supplied iff explicit scaling lists in bitstream) and
hantro_h264.c::assemble_scaling_list (consumes supplied matrix or
falls back to defaults).
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Compound patch carrying the fork's pre-Step-1 substrate, originally
authored by Jernej Škrabec / fourier on top of bootlin's a3c2476:
- src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to
V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline
(V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the
passthrough shim).
- include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h>
(kernel-side HEVC controls now live in the canonical UAPI header).
- src/meson.build: src/h265.c / src/h265.h commented out — HEVC
build path is excluded from this fork (RK3568 hantro G1/G2 has
no HEVC, and the kernel-side HEVC controls have a separate
rework in flight upstream).
- src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly
source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep
the build linking).
- include/h264-ctrls.h: removed (dead post-fourier — no source
includes it; the passthrough shim's CID aliases live in the
kernel header now).
Functionally equivalent to the prior fork master commits:
c1f5108 V4L2_PIX_FMT_H264_SLICE rename
4ccbfe9 Strip HEVC build path
da9f2a5 include/h264-ctrls.h passthrough + CID aliases
fc4bb10 src/h264.c track upstream UAPI shape
13e9b64 src/h264.c drop num_slices field
4d14ffb src/tiled_yuv.S aarch64 stub
1b02c9b src/h264.c include utils.h
Folded into one commit during 2026-05-04 Step 1 reconciliation
(see ../phase0_evidence/2026-05-04/findings.md). Per-patch history
of the early fork commits preserved on the pre-step1 branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reference frames are now identified using their timestamp:
set the timestamp when queuing the output buffer and use it to identify
the frame later on.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
This adds support for MPEG2 quantization matrices, which are optional
given that fallback default matrices are used (on the kernel side) when
no such matrix is provided.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
This is the latest version of dma-buf export, that does support
specifying DRM modifiers.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
As part of our renaming effort, Rename the libva hooks names to mention
request instead of SunxiCedrus
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The num_slices parameter was improperly set to the number of reference
frames, which is incorrect.
Add a counter for the number of slices per surface, and set num_slices to
that value.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The coding style has been a bit erratic. Enforce the linux kernel coding
style by reusing their .clang-format file, running clang-format on the
source, and ignoring the few shortcomings that clang-format has at the
moment (especially on aligning the define values).
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Using VAAPI as a video output (through vaPutSurface) is deprecated and
definitely not recommended for any use case. Since we're starting to
support non-X11 pipelines, remove X11 support altogether.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
This VA backend uses v4l2's Frame API proposal to interface with the
"sunxi-cedrus" video driver on Allwinner SoC. Only a few parts of the
code are really dependent on sunxi-cedrus and this VA backend could be
reused for other v4l drivers using the Frame API.