claude-noether c839b9456e ampere-av1 Phase 3 finding: iter2 Fix 3 release is NOT the divergence cause
Investigated whether picture.c::BeginPicture's iter2 Fix 3 release-on-
rebind was causing AV1 inter-frame divergence on av1_larger.ivf
(film_grain stress vector). Added env-gated LIBVA_SKIP_REBIND=1
experiment (leak old slot instead of release); A/B run showed identical
3/10 PASS count with and without the release. Hypothesis disproved.

Where the divergence actually lives:
  - patched ffmpeg-v4l2-request-fourier libavcodec.so with a fwrite
    diag in ff_v4l2_request_append_output → 7 dump files for the
    -frames:v 5 kdirect run, sizes [15133, 3670, 1970, 1323, 812,
    886, 1310] BYTE-IDENTICAL to our LIBVA_V4L2_DUMP_OUTPUT first 7
    submissions for the same input
  - our backend has 2 EXTRA EndPicture calls (t8 size 824, t9 size
    487) on RE-USED surfaces (0x4000008 and 0x4000006)
  - the extras happen because ffmpeg-vaapi's AV1 hwaccel issues
    redecode requests onto surfaces that already hold frames the
    consumer hasn't downloaded yet
  - SKIP_REBIND should let those redecodes' slots stay around but
    doesn't help, because surface_object->current_slot can only
    point at ONE slot at a time and bind_slot overwrites it

True root cause: ffmpeg-vaapi AV1 hwaccel's surface accounting is
incompatible with the iter2 Fix 3 1:1 surface↔slot invariant when
the stream has show_existing_frame frames. Fix would need either
(a) cap_pool tracking N surfaces per slot, or (b) backend reading
ffmpeg-vaapi's display-order mapping and remapping slots accordingly.
Both are non-trivial Phase 4 work — outside this iteration's scope.

Reverted the LIBVA_SKIP_REBIND env-gate to clean shape. Comment
updated with the investigation outcome so the next session has the
context without rediscovering.

State: 3/10 av1_larger frames bit-exact (frames 0/2/4, the
apply_grain=1 IDR-derived ones). test_av1.ivf 208x208 still bit-exact
PASS (no regression). diagnostic logs in BeginPicture +
surface_unbind_slot + v4l2_ioctl_controls retained for future
investigation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 12:12:23 +00:00

v4l2-request libVA Backend

About

This libVA backend is designed to work with the Linux Video4Linux2 Request API, which is used by a number of video codec drivers, including the one for the Video Engine found in most Allwinner SoCs.

Status

The v4l2-request libVA backend currently supports the following formats:

  • MPEG2 (Simple and Main profiles)
  • H264 (Baseline, Main and High profiles)
  • H265 (Main profile)

Instructions

In order to use this libVA backend, the v4l2_request driver has to be specified through the LIBVA_DRIVER_NAME environment variable, as such:

export LIBVA_DRIVER_NAME=v4l2_request

A media player that supports VAAPI (such as VLC) can then be used to decode a video in a supported format:

vlc path/to/video.mpg

Sample media files can be obtained from:

http://samplemedia.linaro.org/MPEG2/
http://samplemedia.linaro.org/MPEG4/SVT/

Technical Notes

Surface

A Surface is an internal data structure, never handled by the VA's user, that contains the output of a rendering. Usually, a set of surfaces is created at the beginning of decoding and the surfaces are then used in rotation. When created, a surface is assigned a corresponding v4l capture buffer, which it keeps until the end of decoding. Syncing a surface waits for the v4l buffer to become available and then dequeues it.
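
The surface lifecycle described above can be modelled roughly as follows. This is a minimal sketch, not the backend's actual code: struct surface, struct surface_pool, pool_init, pool_next and surface_sync are hypothetical names, the pool size is arbitrary, and the real wait-and-dequeue step on the v4l capture queue is replaced by a flag.

```c
#include <stdbool.h>

#define SURFACE_COUNT 4 /* arbitrary pool size chosen for this sketch */

/* Hypothetical model of a surface: it keeps one capture buffer for life. */
struct surface {
	unsigned int capture_index; /* assigned at creation, never changed */
	bool queued;                /* true while a decode is pending */
};

struct surface_pool {
	struct surface surfaces[SURFACE_COUNT];
	unsigned int next; /* index of the surface to hand out next */
};

/* Create all surfaces up front, one capture buffer per surface. */
void pool_init(struct surface_pool *pool)
{
	for (unsigned int i = 0; i < SURFACE_COUNT; i++) {
		pool->surfaces[i].capture_index = i;
		pool->surfaces[i].queued = false;
	}
	pool->next = 0;
}

/* Stand-in for the caller cycling through the surfaces in rotation. */
struct surface *pool_next(struct surface_pool *pool)
{
	struct surface *surface = &pool->surfaces[pool->next];

	pool->next = (pool->next + 1) % SURFACE_COUNT;
	surface->queued = true;
	return surface;
}

/* Stand-in for syncing: a real backend would wait for the capture
 * buffer to become available and dequeue it at this point. */
void surface_sync(struct surface *surface)
{
	surface->queued = false;
}
```

Once every surface has been used, pool_next wraps around to the first one again, which keeps its original capture buffer index.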

Note: since a Surface is kept private from the VA's user, the user can instead ask for a Surface to be rendered directly on screen in an X Drawable. A basic implementation is available in PutSurface, but it is intended for development purposes only.

Context

A Context is a global data structure used for rendering a video of a certain format. When a context is created, input buffers are created and the format of v4l's output queue is set. In v4l terms, the output queue carries the compressed input data, while the capture queue carries the decoded frames.

Picture

A Picture is an encoded input frame made of several buffers. A single input can contain slice data, headers and an IQ matrix. Each Picture is assigned a request ID when created, and each corresponding buffer might be turned into a v4l buffer or an extended control when rendered. Finally, they are submitted to kernel space when EndPicture is reached.

The real rendering is done in EndPicture instead of RenderPicture because the v4l2 driver expects the full set of corresponding extended controls to be present when a buffer is queued, and we don't know in which order the different RenderPicture calls will be made.
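
The deferral described above can be illustrated with a small model. This is a sketch under stated assumptions rather than the backend's implementation: enum buf_kind, struct picture, picture_begin, picture_render and picture_end are hypothetical names, and the actual setting of extended controls and queueing of buffers is reduced to a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer kinds; a real backend deals with VA buffer types. */
enum buf_kind {
	BUF_PICTURE_PARAMS,
	BUF_IQ_MATRIX,
	BUF_SLICE_DATA,
	BUF_KIND_COUNT
};

struct picture {
	const void *bufs[BUF_KIND_COUNT]; /* last buffer recorded per kind */
	size_t sizes[BUF_KIND_COUNT];
	bool submitted;
};

/* BeginPicture stand-in: start from an empty picture. */
void picture_begin(struct picture *picture)
{
	for (int i = 0; i < BUF_KIND_COUNT; i++) {
		picture->bufs[i] = NULL;
		picture->sizes[i] = 0;
	}
	picture->submitted = false;
}

/* RenderPicture stand-in: only record the buffer, in whatever order
 * the caller provides it. Nothing is sent to the kernel yet. */
void picture_render(struct picture *picture, enum buf_kind kind,
		    const void *data, size_t size)
{
	picture->bufs[kind] = data;
	picture->sizes[kind] = size;
}

/* EndPicture stand-in: submit only once the parameters and the slice
 * data are both present (the IQ matrix is treated as optional here).
 * Returns 0 on success, -1 if the picture is incomplete. */
int picture_end(struct picture *picture)
{
	if (picture->bufs[BUF_PICTURE_PARAMS] == NULL ||
	    picture->bufs[BUF_SLICE_DATA] == NULL)
		return -1;

	/* A real backend would set the extended controls and queue the
	 * buffer and the request at this point. */
	picture->submitted = true;
	return 0;
}
```

Because picture_render only records pointers, the order of the calls between picture_begin and picture_end does not matter; completeness is checked once, at the end.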

Image

An Image is a standard data structure containing rendered frames in a usable pixel format. Here we only use NV12 buffers which are converted from sunxi's proprietary tiled pixel format with tiled_yuv when deriving an Image from a Surface.
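
For reference, the NV12 layout of such an Image can be computed as below. This is a sketch assuming tightly packed rows with no driver-specific padding; struct nv12_layout and nv12_layout_for are hypothetical names, and real buffers may use a larger pitch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an NV12 buffer: a full-resolution Y
 * plane followed by one plane of interleaved U/V samples that is
 * subsampled by two in both directions. */
struct nv12_layout {
	uint32_t pitch;    /* bytes per row, shared by both planes */
	size_t y_offset;   /* start of the Y plane */
	size_t uv_offset;  /* start of the interleaved UV plane */
	size_t total_size; /* bytes needed for the whole image */
};

/* Assumes tightly packed rows; the pitch is rounded up to an even
 * number of bytes so that every chroma row holds whole U/V pairs. */
struct nv12_layout nv12_layout_for(uint32_t width, uint32_t height)
{
	struct nv12_layout layout;
	uint32_t chroma_rows = (height + 1) / 2;

	layout.pitch = (width + 1) & ~(uint32_t)1;
	layout.y_offset = 0;
	layout.uv_offset = (size_t)layout.pitch * height;
	layout.total_size = layout.uv_offset +
			    (size_t)layout.pitch * chroma_rows;
	return layout;
}
```

For a 208x208 frame this places the UV plane at byte 43264 and gives a total size of 64896 bytes, that is, 1.5 bytes per pixel.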

Description
bootlin/libva-v4l2-request fork: multiplanar V4L2 support for Rockchip hantro (Fourier)