claude-noether 5803cbcf6c ampere-av1 Phase 3 progress: film_grain link + UPDATE_GRAIN; frame 0 bit-exact
Three structural fixes for AV1 with film_grain on vpu981 (RK3588). Output
is no longer empty / crashed; frame 0 (IDR with apply_grain=1) is
bit-exact vs kdirect. Inter frames still diverge.

Fix 1 — surface.h + surface.c: linked_decode_surface_id field on
object_surface, initialized to VA_INVALID_SURFACE. When AV1 picture has
apply_grain=1, VAAPI's VADecPictureParameterBufferAV1 carries a
current_display_picture distinct from current_frame. ffmpeg-vaapi calls
vaBeginPicture on current_frame (decode surface, slot gets bound) but
vaGetImage on current_display_picture (display surface, no slot) → NULL
deref in copy_surface_to_image.

Fix 2 — av1.c: in av1_set_controls, when cur_frame != cur_display, set
display_surface->linked_decode_surface_id = current_frame. Establishes
the back-link so display surface can borrow decode surface's data.

Fix 3 — image.c copy_surface_to_image: when slot is NULL and the
surface has linked_decode_surface_id, lookup the decode surface and
mirror its destination_data[] + destination_sizes[] +
destination_planes_count. NULL guard with diagnostic log retained.

Fix 4 — av1.c fill_film_grain: when apply_grain=1, also set
V4L2_AV1_FILM_GRAIN_FLAG_UPDATE_GRAIN. Confirmed by strace-diff: kdirect
sends flags=0x0B (APPLY|UPDATE|...), libva was sending 0x09 (APPLY but
no UPDATE). Without UPDATE the kernel tries to reuse from
film_grain_params_ref_idx=0, which is never populated. Earlier reverted
because UPDATE seemed to trigger a SEGV — but that SEGV was the
unmasked NULL-slot deref; with fix 1+2+3 in place UPDATE is safe.

Fix 5 — av1.c reference_frame_ts plumbing: when a referenced surface
has timestamp=0 AND linked_decode_surface_id set, follow the link to
find the decode surface that carries the real timestamp. Display
surfaces don't get OUTPUT QBUF'd by us, so their own timestamp stays
zero.

Also: BeginPicture diagnostic log + surface_unbind_slot diagnostic log
+ v4l2.c error_idx diagnostic (kept from earlier — useful for ongoing
investigation).

Verification on ampere:
  test_av1.ivf (208x208, 2 frames, no grain): bit-exact PASS sha
    029ee72c214b37c1 (unchanged, no regression)
  av1_larger.ivf (352x288, 10 frames, film_grain alternates):
    frame 0 (key, apply_grain=1): PASS bit-exact vs kdirect
    frame 4: PASS bit-exact
    frames 1,2,3,5,6,7,8,9: DIFFER

Frame 0 PASS proves: SEQUENCE + FRAME + TILE_GROUP_ENTRY + FILM_GRAIN
mapping is correct for IDR. Frame 4 PASS is unexplained but encouraging.
Inter-frame divergence (frame 1+) points at: reference handling for
inter prediction is still off — either order_hints[] (still zero,
VAAPI doesn't expose per-ref), or grain-applied vs pre-grain DPB
semantics, or ref_frame_idx pointing into the wrong surface space.

Next investigation: per-frame strace diff between libva and kdirect
controls payload to spot remaining field mis-mappings on inter frames.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 10:45:31 +00:00
2016-08-26 15:43:09 +02:00
2016-08-26 15:43:09 +02:00
2018-09-08 08:51:51 +02:00

v4l2-request libVA Backend

About

This libVA backend is designed to work with the Linux Video4Linux2 Request API that is used by a number of video codecs drivers, including the Video Engine found in most Allwinner SoCs.

Status

The v4l2-request libVA backend currently supports the following formats:

  • MPEG2 (Simple and Main profiles)
  • H264 (Baseline, Main and High profiles)
  • H265 (Main profile)

Instructions

In order to use this libVA backend, the v4l2_request driver has to be specified through the LIBVA_DRIVER_NAME environment variable, as such:

export LIBVA_DRIVER_NAME=v4l2_request

A media player that supports VAAPI (such as VLC) can then be used to decode a video in a supported format:

vlc path/to/video.mpg

Sample media files can be obtained from:

http://samplemedia.linaro.org/MPEG2/
http://samplemedia.linaro.org/MPEG4/SVT/

Technical Notes

Surface

A Surface is an internal data structure never handled by the VA's user containing the output of a rendering. Usualy, a bunch of surfaces are created at the begining of decoding and they are then used alternatively. When created, a surface is assigned a corresponding v4l capture buffer and it is kept until the end of decoding. Syncing a surface waits for the v4l buffer to be available and then dequeue it.

Note: since a Surface is kept private from the VA's user, it can ask to directly render a Surface on screen in an X Drawable. Some kind of implementation is available in PutSurface but this is only for development purpose.

Context

A Context is a global data structure used for rendering a video of a certain format. When a context is created, input buffers are created and v4l's output (which is the compressed data input queue, since capture is the real output) format is set.

Picture

A Picture is an encoded input frame made of several buffers. A single input can contain slice data, headers and IQ matrix. Each Picture is assigned a request ID when created and each corresponding buffer might be turned into a v4l buffers or extended control when rendered. Finally they are submitted to kernel space when reaching EndPicture.

The real rendering is done in EndPicture instead of RenderPicture because the v4l2 driver expects to have the full corresponding extended control when a buffer is queued and we don't know in which order the different RenderPicture will be called.

Image

An Image is a standard data structure containing rendered frames in a usable pixel format. Here we only use NV12 buffers which are converted from sunxi's proprietary tiled pixel format with tiled_yuv when deriving an Image from a Surface.

S
Description
bootlin/libva-v4l2-request fork: multiplanar V4L2 support for Rockchip hantro (Fourier)
Readme 2.6 MiB
Languages
C 96.3%
Shell 1.9%
Meson 0.8%
Assembly 0.4%
Makefile 0.4%
Other 0.2%