Five fixes for AV1 with film_grain on vpu981 (RK3588): three structural
(fixes 1-3), one film-grain flag (fix 4), one reference timestamp
(fix 5). Output is no longer empty / crashed; frame 0 (IDR with
apply_grain=1) is bit-exact vs kdirect. Inter frames still diverge.
Fix 1 — surface.h + surface.c: linked_decode_surface_id field on
object_surface, initialized to VA_INVALID_SURFACE. When AV1 picture has
apply_grain=1, VAAPI's VADecPictureParameterBufferAV1 carries a
current_display_picture distinct from current_frame. ffmpeg-vaapi calls
vaBeginPicture on current_frame (decode surface, slot gets bound) but
vaGetImage on current_display_picture (display surface, no slot) → NULL
deref in copy_surface_to_image.
Fix 2 — av1.c: in av1_set_controls, when cur_frame != cur_display, set
display_surface->linked_decode_surface_id = current_frame. Establishes
the back-link so display surface can borrow decode surface's data.
Fix 3 — image.c copy_surface_to_image: when slot is NULL and the
surface has linked_decode_surface_id, lookup the decode surface and
mirror its destination_data[] + destination_sizes[] +
destination_planes_count. NULL guard with diagnostic log retained.
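The back-link and fallback in fixes 1-3 can be sketched roughly as
follows. This is a minimal illustration, not the driver code: struct
surface is a trimmed stand-in for object_surface, and has_slot,
surface_table, surface_lookup and surface_borrow_from_linked are
hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_SURFACE_ID 0xFFFFFFFFu /* stand-in for VA_INVALID_SURFACE */
#define MAX_PLANES 3
#define MAX_SURFACES 8

/* Hypothetical, trimmed-down surface object. */
struct surface {
	uint32_t linked_decode_surface_id;
	int has_slot; /* non-zero once a CAPTURE slot is bound */
	void *destination_data[MAX_PLANES];
	unsigned int destination_sizes[MAX_PLANES];
	unsigned int destination_planes_count;
};

/* Hypothetical table standing in for the driver's surface heap. */
static struct surface *surface_table[MAX_SURFACES];

static struct surface *surface_lookup(uint32_t id)
{
	return id < MAX_SURFACES ? surface_table[id] : NULL;
}

/*
 * If the surface has no slot of its own, mirror the plane pointers of
 * its linked decode surface. Returns 0 when the surface ends up with
 * usable plane data, -1 when the caller must bail out (NULL guard).
 */
static int surface_borrow_from_linked(struct surface *display)
{
	const struct surface *decode;
	unsigned int i;

	assert(display != NULL);
	if (display->has_slot)
		return 0; /* own data is valid, nothing to borrow */
	if (display->linked_decode_surface_id == INVALID_SURFACE_ID)
		return -1;

	decode = surface_lookup(display->linked_decode_surface_id);
	if (decode == NULL || !decode->has_slot)
		return -1;

	for (i = 0; i < decode->destination_planes_count && i < MAX_PLANES; i++) {
		display->destination_data[i] = decode->destination_data[i];
		display->destination_sizes[i] = decode->destination_sizes[i];
	}
	display->destination_planes_count = i;
	return 0;
}
```

The -1 return corresponds to the retained NULL guard: the caller logs
and returns an error instead of dereferencing a missing slot.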
Fix 4 — av1.c fill_film_grain: when apply_grain=1, also set
V4L2_AV1_FILM_GRAIN_FLAG_UPDATE_GRAIN. Confirmed by strace-diff: kdirect
sends flags=0x0B (APPLY|UPDATE|...), libva was sending 0x09 (APPLY but
no UPDATE). Without UPDATE the kernel tries to reuse from
film_grain_params_ref_idx=0, which is never populated. Earlier reverted
because UPDATE seemed to trigger a SEGV — but that SEGV was the
unmasked NULL-slot deref; with fix 1+2+3 in place UPDATE is safe.
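For reference, the flag arithmetic behind the 0x09 vs 0x0B observation,
as a small sketch. The bit values are believed to mirror
<linux/v4l2-controls.h> and are redefined locally only so the example
builds without kernel headers; reading 0x08 as OVERLAP is an
assumption drawn from those values, and film_grain_flags is a
hypothetical helper, not the driver's fill_film_grain.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the UAPI header; guarded in case it is included. */
#ifndef V4L2_AV1_FILM_GRAIN_FLAG_APPLY_GRAIN
#define V4L2_AV1_FILM_GRAIN_FLAG_APPLY_GRAIN 0x1
#define V4L2_AV1_FILM_GRAIN_FLAG_UPDATE_GRAIN 0x2
#define V4L2_AV1_FILM_GRAIN_FLAG_OVERLAP 0x8
#endif

/*
 * Hypothetical helper: build the film grain flags byte. When grain is
 * applied, UPDATE_GRAIN is set as well so the kernel uses the
 * parameters sent with this frame instead of a reference slot.
 */
static uint8_t film_grain_flags(int apply_grain, int overlap_flag)
{
	uint8_t flags = 0;

	assert(apply_grain == 0 || apply_grain == 1);
	if (!apply_grain)
		return 0;

	flags |= V4L2_AV1_FILM_GRAIN_FLAG_APPLY_GRAIN;
	flags |= V4L2_AV1_FILM_GRAIN_FLAG_UPDATE_GRAIN;
	if (overlap_flag)
		flags |= V4L2_AV1_FILM_GRAIN_FLAG_OVERLAP;
	return flags;
}
```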
Fix 5 — av1.c reference_frame_ts plumbing: when a referenced surface
has timestamp=0 AND linked_decode_surface_id set, follow the link to
find the decode surface that carries the real timestamp. Display
surfaces don't get OUTPUT QBUF'd by us, so their own timestamp stays
zero.
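A minimal sketch of the timestamp lookup in fix 5, assuming surfaces
are addressable by index. struct ref_surface and reference_timestamp
are hypothetical names for illustration; the real code works on
object_surface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_SURFACE_ID 0xFFFFFFFFu /* stand-in for VA_INVALID_SURFACE */

/* Hypothetical, trimmed-down view of a surface for this example. */
struct ref_surface {
	uint64_t timestamp; /* 0 when the surface was never OUTPUT-queued */
	uint32_t linked_decode_surface_id;
};

/*
 * Return the timestamp to place in reference_frame_ts for surface id.
 * A display surface with timestamp 0 defers to its decode surface.
 */
static uint64_t reference_timestamp(const struct ref_surface *surfaces,
				    size_t count, uint32_t id)
{
	const struct ref_surface *s;

	assert(surfaces != NULL);
	if (id >= count)
		return 0;

	s = &surfaces[id];
	if (s->timestamp == 0 &&
	    s->linked_decode_surface_id != INVALID_SURFACE_ID &&
	    s->linked_decode_surface_id < count)
		return surfaces[s->linked_decode_surface_id].timestamp;

	return s->timestamp;
}
```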
Also: BeginPicture diagnostic log + surface_unbind_slot diagnostic log
+ v4l2.c error_idx diagnostic (kept from earlier — useful for ongoing
investigation).
Verification on ampere:
  test_av1.ivf (208x208, 2 frames, no grain): bit-exact PASS, sha
    029ee72c214b37c1 (unchanged, no regression)
  av1_larger.ivf (352x288, 10 frames, film_grain alternates):
    frame 0 (key, apply_grain=1): PASS bit-exact vs kdirect
    frame 4: PASS bit-exact
    frames 1,2,3,5,6,7,8,9: DIFFER
Frame 0 PASS proves: SEQUENCE + FRAME + TILE_GROUP_ENTRY + FILM_GRAIN
mapping is correct for IDR. Frame 4 PASS is unexplained but encouraging.
Inter-frame divergence (frame 1+) points at: reference handling for
inter prediction is still off — either order_hints[] (still zero,
VAAPI doesn't expose per-ref), or grain-applied vs pre-grain DPB
semantics, or ref_frame_idx pointing into the wrong surface space.
Next investigation: per-frame strace diff between libva and kdirect
controls payload to spot remaining field mis-mappings on inter frames.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
V4L2 CAPTURE buffers are V4L2_MEMORY_MMAP and mapped cached. The
decoder's DMA writes are not coherent with the CPU cache, so reading
destination_data[] without DMA_BUF_IOCTL_SYNC (DMA_BUF_SYNC_START |
DMA_BUF_SYNC_READ) returns stale data on RK3399. This was observed as
Bug 4 (H.264 partial-fill) and Bug 5 (HEVC all-zero) when libva goes
through cached-mmap readback, while kdirect ffmpeg-v4l2request +
DRM_PRIME-mmap reads cleanly via implicit sync.
Per Tomasz Figa's 2024 linaro-mm-sig discussion and
feedback_rfc_v2_vb2_dma_resv_scope.md, cache sync on cached-mmap'd
V4L2 buffers is a userspace responsibility. The RFC v2 fence work
doesn't engage this path; this ioctl pair does.
Just-in-time EXPBUF + SYNC + close per copy. Per-call cost is one
ioctl pair + one fd lifecycle per plane. Could cache the EXPBUF fd
on cap_pool slot but doing it transient keeps lifecycle simple.
Closing the EXPBUF fd only drops that dma-buf reference; the V4L2
buffer memory itself is unaffected.
If EXPBUF or SYNC fails, fall through to existing memcpy path —
preserves pre-iter13 behavior on the error branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the iter1 patch-0014 ENTER traces from buffer.c, image.c,
picture.c, surface.c. These were diagnostic-only entry-point logs
added during iter1's "where does Firefox RDD crash?" investigation.
With the iter1+iter2+iter3+iter4 fixes landed, the entry-point
traces are pure noise.
If a future investigation needs entry-point coverage, strace -e trace
on the libva consumer process gives equivalent visibility without
modifying the driver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE
buffer. mpv reusing a surface for a new decode while the compositor still
held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to
write fresh decode output into the same physical memory the compositor
was reading -- visible as stutter / back-and-forth swap on
mpv --hwdec=vaapi --vo=gpu playback.
Architecture:
- New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers
  (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state
  {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t.
- Surfaces no longer own buffers; each vaBeginPicture acquires the
  oldest FREE slot (LRU), binds it for the decode cycle, and the slot
  cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF).
- Slot is released on next BeginPicture for the same surface or on
  vaDestroySurfaces.
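The slot lifecycle above can be sketched as a small state machine. This
is an illustration under assumptions, not cap_pool.c: the pool size is
shrunk to 4, LRU order is tracked with a release sequence number, and
pool_init, pool_acquire, pool_set_state and pool_release are
hypothetical names. Force-recycling of EXPORTED slots on exhaustion is
left out; the sketch just returns -1.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SLOTS 4 /* illustration only; the commit uses at least 24 */

enum slot_state { SLOT_FREE, SLOT_IN_DECODE, SLOT_DECODED, SLOT_EXPORTED };

struct cap_slot {
	enum slot_state state;
	uint64_t freed_seq; /* lower value = released longer ago */
};

struct cap_pool {
	pthread_mutex_t lock;
	uint64_t seq;
	struct cap_slot slots[POOL_SLOTS];
};

static void pool_init(struct cap_pool *p)
{
	size_t i;

	assert(p != NULL);
	pthread_mutex_init(&p->lock, NULL);
	for (i = 0; i < POOL_SLOTS; i++) {
		p->slots[i].state = SLOT_FREE;
		p->slots[i].freed_seq = i;
	}
	p->seq = POOL_SLOTS;
}

/* Acquire the oldest FREE slot; returns its index, or -1 if none. */
static int pool_acquire(struct cap_pool *p)
{
	int best = -1;
	size_t i;

	pthread_mutex_lock(&p->lock);
	for (i = 0; i < POOL_SLOTS; i++) {
		if (p->slots[i].state != SLOT_FREE)
			continue;
		if (best < 0 || p->slots[i].freed_seq < p->slots[best].freed_seq)
			best = (int)i;
	}
	if (best >= 0)
		p->slots[best].state = SLOT_IN_DECODE;
	pthread_mutex_unlock(&p->lock);
	return best;
}

/* Advance a slot, e.g. to DECODED after DQBUF or EXPORTED after EXPBUF. */
static void pool_set_state(struct cap_pool *p, int idx, enum slot_state st)
{
	assert(idx >= 0 && idx < POOL_SLOTS);
	pthread_mutex_lock(&p->lock);
	p->slots[idx].state = st;
	pthread_mutex_unlock(&p->lock);
}

/* Return a slot to the pool and stamp it as the most recently freed. */
static void pool_release(struct cap_pool *p, int idx)
{
	assert(idx >= 0 && idx < POOL_SLOTS);
	pthread_mutex_lock(&p->lock);
	p->slots[idx].state = SLOT_FREE;
	p->slots[idx].freed_seq = p->seq++;
	pthread_mutex_unlock(&p->lock);
}
```

Picking the oldest FREE slot delays reuse of a just-released buffer for
as long as the pool allows, which is what narrows the window in which a
consumer may still be reading it.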
Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+):
- Option-A statistical mitigation; race window narrows to "pool
exhausted, force-recycle of oldest EXPORTED slot." For typical mpv
16-surface playback with MIN_CAP_POOL=24 the fallback never fires.
- Multi-context concurrent use not addressed (one V4L2 device, multiple
cap_pools -- iter3 scope).
Other call sites updated:
- picture.c::BeginPicture acquires + binds, releasing prior slot if any.
- surface.c::SyncSurface marks slot DECODED after DQBUF.
- surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR
EXPBUF fd for force-recycle close().
- surface.c::DestroySurfaces releases via surface_unbind_slot;
cap_pool owns the mmaps now.
- surface.c::CreateSurfaces2 destroys the pool in the resolution-change
path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS).
- context.c::DestroyContext invokes cap_pool_destroy.
- image.c::DeriveImage skips copy_surface_to_image when current_slot is
NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces).
Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly
recycling slot indices, real luma gradient. mpv vaapi --vo=gpu
operator-inspection follows.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
QueryImageFormats and DeriveImage previously set only .fourcc and
left byte_order, bits_per_pixel, depth, and the color masks unwritten,
so they kept whatever was in the caller's buffer. VAAPI consumers that
read these fields (FFmpeg's hwcontext_vaapi.c::vaapi_init_pixfmt,
intel-vaapi-driver test paths) inherit caller-stack garbage and behave
non-deterministically.
Cross-reference: Mesa's gallium/frontends/va/image.c and
intel-vaapi-driver's i965_drv_video.c both publish NV12 with
byte_order=VA_LSB_FIRST and bits_per_pixel=12. We now match.
For YUV formats, depth/red_mask/green_mask/blue_mask/alpha_mask
are not meaningful (RGB-bitlayout-only fields); leave them zeroed
via memset.
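A minimal sketch of the memset-then-fill pattern for NV12. To keep it
buildable without libva headers, struct image_format is a local
stand-in for the leading fields of VAImageFormat, and the constants
are local copies assumed to equal VA_FOURCC_NV12 and VA_LSB_FIRST;
fill_nv12_format is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed equal to VA_FOURCC_NV12 ('N','V','1','2', little-endian). */
#define NV12_FOURCC 0x3231564Eu
/* Assumed equal to VA_LSB_FIRST. */
#define LSB_FIRST 1u

/* Local stand-in mirroring the leading fields of VAImageFormat. */
struct image_format {
	uint32_t fourcc;
	uint32_t byte_order;
	uint32_t bits_per_pixel;
	uint32_t depth;
	uint32_t red_mask;
	uint32_t green_mask;
	uint32_t blue_mask;
	uint32_t alpha_mask;
};

/* Zero everything first so no caller-buffer garbage survives. */
static void fill_nv12_format(struct image_format *fmt)
{
	assert(fmt != NULL);
	memset(fmt, 0, sizeof(*fmt));
	fmt->fourcc = NV12_FOURCC;
	fmt->byte_order = LSB_FIRST;
	fmt->bits_per_pixel = 12;
	/* depth and the RGB masks stay zero for YUV formats. */
}
```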
Audit context: 2026-05-04 cross-reference of all libva entry
points Firefox 150 calls vs our backend implementations. The
SEPARATE_LAYERS fix (commit ac891a0) cleared the load-bearing
bug; this fixes a latent uninitialized-field issue that was
masked by mpv's tolerance.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds request_log on entry to:
- RequestSyncSurface
- RequestQuerySurfaceAttributes
- RequestQuerySurfaceStatus (including the returned status value)
- RequestDeriveImage
- RequestQueryImageFormats
- RequestGetImage
Goal: identify which API call Firefox 150 makes that returns
differently than it expects, causing the SW fallback after
frame 0. mpv works end-to-end with the surface-export fix in
place; Firefox does not. Per operator's correction: don't assume
mpv's success means the driver is correct — Firefox may detect
a real spec violation that mpv silently tolerates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Compound patch carrying the fork's pre-Step-1 substrate, originally
authored by Jernej Škrabec / fourier on top of bootlin's a3c2476:
- src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to
V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline
(V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the
passthrough shim).
- include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h>
(kernel-side HEVC controls now live in the canonical UAPI header).
- src/meson.build: src/h265.c / src/h265.h commented out — HEVC
build path is excluded from this fork (RK3568 hantro G1/G2 has
no HEVC, and the kernel-side HEVC controls have a separate
rework in flight upstream).
- src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly
source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep
the build linking).
- include/h264-ctrls.h: removed (dead post-fourier — no source
includes it; the passthrough shim's CID aliases live in the
kernel header now).
Functionally equivalent to the prior fork master commits:
c1f5108 V4L2_PIX_FMT_H264_SLICE rename
4ccbfe9 Strip HEVC build path
da9f2a5 include/h264-ctrls.h passthrough + CID aliases
fc4bb10 src/h264.c track upstream UAPI shape
13e9b64 src/h264.c drop num_slices field
4d14ffb src/tiled_yuv.S aarch64 stub
1b02c9b src/h264.c include utils.h
Folded into one commit during 2026-05-04 Step 1 reconciliation
(see ../phase0_evidence/2026-05-04/findings.md). Per-patch history
of the early fork commits preserved on the pre-step1 branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
We were first copying the image structure and then setting the pitches
and offsets, so this information was lost. This fixes the vaDeriveImage
and vaGetImage implementations.
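The ordering problem can be shown with a small sketch. struct
image_layout and the helper names are hypothetical, and the NV12
layout with pitch equal to width is an assumption made only to have
concrete numbers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the pitch/offset part of an image. */
struct image_layout {
	unsigned int num_planes;
	unsigned int pitches[3];
	unsigned int offsets[3];
};

/* Assumed NV12 layout: luma plane followed by interleaved chroma. */
static void fill_nv12_layout(struct image_layout *img, unsigned int width,
			     unsigned int height)
{
	img->num_planes = 2;
	img->pitches[0] = width;
	img->pitches[1] = width;
	img->offsets[0] = 0;
	img->offsets[1] = width * height;
}

/* Old order: the caller's copy is taken before the fields are set. */
static void derive_image_buggy(struct image_layout *internal,
			       struct image_layout *out,
			       unsigned int width, unsigned int height)
{
	assert(internal != NULL && out != NULL);
	*out = *internal;
	fill_nv12_layout(internal, width, height);
}

/* Fixed order: fill first, then copy to the caller. */
static void derive_image_fixed(struct image_layout *internal,
			       struct image_layout *out,
			       unsigned int width, unsigned int height)
{
	assert(internal != NULL && out != NULL);
	fill_nv12_layout(internal, width, height);
	*out = *internal;
}
```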
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
This enables raw playback within GStreamer. This is useful for testing
even if slower than DMABuf. This is a partial implementation since we
don't implement partial copy of the surface.
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
The IMAGE macro takes an implicit driver_data argument. In order to make
it obvious that we need it, let's put it as an explicit parameter.
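A rough before/after of the macro change, for illustration only. The
struct contents, MAX_IMAGES, image_lookup and the IMAGE_IMPLICIT name
are hypothetical; the real macro wraps the driver's object heap lookup.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IMAGES 4

/* Hypothetical, trimmed-down objects for this example. */
struct object_image {
	int in_use;
};

struct request_data {
	struct object_image images[MAX_IMAGES];
};

static struct object_image *image_lookup(struct request_data *data,
					 unsigned int id)
{
	assert(data != NULL);
	if (id >= MAX_IMAGES || !data->images[id].in_use)
		return NULL;
	return &data->images[id];
}

/* Before: only compiles where a variable named driver_data is in scope. */
#define IMAGE_IMPLICIT(id) image_lookup(driver_data, (id))

/* After: the dependency is visible at every call site. */
#define IMAGE(data, id) image_lookup((data), (id))
```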
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
void * can be assigned from and stored to any pointer type without any
warning. Remove the explicit casts.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The cedrus_data structure carries the old name. In order to migrate to the
new name, let's rename it to request_data.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The sunxi_cedrus.h header contains a bunch of defines prefixed with
SUNXI_CEDRUS.
As part of the ongoing migration to a more generic name, change that
prefix to V4L2_REQUEST, and rename the header file to request.h.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
As part of our renaming effort, rename the libva hook names to mention
request instead of SunxiCedrus.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The coding style has been a bit erratic. Enforce the Linux kernel coding
style by reusing their .clang-format file, running clang-format on the
source, and ignoring the few shortcomings that clang-format has at the
moment (especially on aligning the define values).
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
This long structure name makes it quite difficult to fit within the
80-character limit. Shorten it.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The BUFFER macro takes an implicit driver_data argument. In order to make
it obvious that we need it, let's put it as an explicit parameter.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
This VA backend uses V4L2's Frame API proposal to interface with the
"sunxi-cedrus" video driver on Allwinner SoCs. Only a few parts of the
code really depend on sunxi-cedrus, and this VA backend could be
reused for other V4L2 drivers using the Frame API.