Replaces stub av1_set_controls with full VAAPI → V4L2 stateless AV1
control translation. Four V4L2 controls batched per-frame:
V4L2_CID_STATELESS_AV1_SEQUENCE (sequence-level flags)
V4L2_CID_STATELESS_AV1_FRAME (heavy — quant, lf, cdef, lr, gm,
tile_info, refs, frame flags)
V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY[] (DYNAMIC_ARRAY, size=MAX(N,1))
V4L2_CID_STATELESS_AV1_FILM_GRAIN (gated on driver_data->has_av1_film_grain)
Reference: Kwiboo/FFmpeg v4l2-request-n8.1:libavcodec/v4l2_request_av1.c
(636 LoC); same V4L2 output schema, sourced from VAAPI's
VADecPictureParameterBufferAV1 instead of FFmpeg's AV1RawSequenceHeader.
VAAPI gap notes (fields the spec needs but VAAPI doesn't expose):
- sequence max_frame_{width,height}_minus_1 — use current frame size
- enable_warped_motion / enable_ref_frame_mvs / enable_superres /
enable_restoration sequence-level — conservative set-true (per-frame
flags gate actual behavior)
- order_hints[], reference_frame_ts[] — zero (kernel cross-refs by
OUTPUT timestamp / surface id)
- tile_start_col_sb[] / tile_start_row_sb[] — reconstruct via
prefix-sum on VAAPI's width/height_in_sbs_minus_1[]
- tile_size_bytes — set to 4 for multi-tile frames (max value), 0
for single-tile (matches Kwiboo's conditional)
- render_width/height — fall back to coded dimensions
- current_frame_id / refresh_frame_flags / skip_mode_frame_idx /
buffer_removal_time / frame_refs_short_signaling — zero
- film_grain_params_ref_idx / update_grain — zero (only consulted in
reuse paths; apply_grain=1 + populated arrays drive decode directly)
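The prefix-sum reconstruction of the tile start offsets noted above can be sketched as follows. This is a minimal illustration, not the code from this commit: the function name `tile_starts_from_sizes` is hypothetical, and it assumes the per-tile sizes arrive as `uint16_t` "minus 1" values, as in VAAPI's `width_in_sbs_minus_1[]` / `height_in_sbs_minus_1[]`.

```c
#include <stdint.h>

/*
 * Hypothetical helper: rebuild tile start offsets (in superblocks) from
 * per-tile sizes stored as "size minus 1".
 * `starts` must have room for count + 1 entries; the last entry receives
 * the running total, which a caller may use or ignore.
 */
void tile_starts_from_sizes(const uint16_t *size_in_sbs_minus_1,
                            unsigned count, uint32_t *starts)
{
    uint32_t pos = 0;

    for (unsigned i = 0; i < count; i++) {
        starts[i] = pos;                              /* tile i begins here */
        pos += (uint32_t)size_in_sbs_minus_1[i] + 1;  /* advance by tile size */
    }
    starts[count] = pos;                              /* total size in SBs */
}
```

The same helper would serve both the column and the row direction, since each is an independent running sum.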
F1/F2/F3 risk mitigations per phase1_plan_v2:
F1: mi_col/row_starts sentinel = 2 * ((frame_width + 7) >> 3) at
index [tile_cols]/[tile_rows] — mirrors Kwiboo lines 238/244
F2: superres_denom direct from VAAPI's superres_scale_denominator
(VAAPI's encoding is the final value; no AV1_SUPERRES_DENOM_MIN
math). Fallback to AV1_SUPERRES_NUM=8 if zero.
F3: loop_restoration_size[] gated on USES_LR flag derived from
y_t != 0 || cb_t != 0 || cr_t != 0 — mirrors Kwiboo lines 281-287
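The three mitigations above reduce to small pure functions. The sketch below is illustrative only: the helper names are hypothetical, and the row sentinel is assumed to use the frame height in the same formula, following the MiRows definition in the AV1 specification.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef AV1_SUPERRES_NUM
#define AV1_SUPERRES_NUM 8 /* fallback denominator named in F2 */
#endif

/* F1: size in 4x4 mode-info units, used as the end sentinel.
 * Pass frame_width for columns; frame_height for rows is an assumption. */
uint32_t mi_units_from_pixels(uint32_t frame_dim)
{
    return 2 * ((frame_dim + 7) >> 3);
}

/* F2: take the VAAPI denominator as the final value, with a fallback
 * to AV1_SUPERRES_NUM when it is zero. */
uint8_t superres_denom_or_default(uint8_t vaapi_denom)
{
    return vaapi_denom ? vaapi_denom : AV1_SUPERRES_NUM;
}

/* F3: loop restoration is in use if any plane has a non-zero type. */
bool uses_loop_restoration(uint8_t y_type, uint8_t cb_type, uint8_t cr_type)
{
    return y_type != 0 || cb_type != 0 || cr_type != 0;
}
```

For the 208x208 smoke vector mentioned below, `mi_units_from_pixels(208)` evaluates to 52.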
Plus:
- request.h: has_av1_film_grain bool on driver_data
- request.c: probe VIDIOC_QUERY_EXT_CTRL for FILM_GRAIN on vpu981 fd
at VA_DRIVER_INIT (Janet v3 amendment A: init-time, not lazy)
Compile-tested on boltzmann (aarch64 native, gcc 15.2.1): clean .so,
0 errors, pre-existing GStreamer #warnings only.
Phase 3 verification on ampere is next: 208x208 smoke + film_grain
stress vector (av1-1-b8-23-film_grain-50.ivf) byte-compare libva vs
kdirect (Phase 0 proved kdirect bit-perfect).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v4l2-request libVA Backend
About
This libVA backend is designed to work with the Linux Video4Linux2 Request API, which is used by a number of video codec drivers, including the Video Engine found in most Allwinner SoCs.
Status
The v4l2-request libVA backend currently supports the following formats:
- MPEG2 (Simple and Main profiles)
- H264 (Baseline, Main and High profiles)
- H265 (Main profile)
Instructions
In order to use this libVA backend, the v4l2_request driver has to
be specified through the LIBVA_DRIVER_NAME environment variable, as
such:
export LIBVA_DRIVER_NAME=v4l2_request
A media player that supports VAAPI (such as VLC) can then be used to decode a video in a supported format:
vlc path/to/video.mpg
Sample media files can be obtained from:
http://samplemedia.linaro.org/MPEG2/
http://samplemedia.linaro.org/MPEG4/SVT/
Technical Notes
Surface
A Surface is an internal data structure, never handled directly by the VA user, that contains the output of a rendering. Usually, a set of surfaces is created at the beginning of decoding and they are then used in rotation. When created, a surface is assigned a corresponding v4l capture buffer, which it keeps until the end of decoding. Syncing a surface waits for the v4l buffer to become available and then dequeues it.
Note: since a Surface is kept private from the VA user, the user can ask to render a Surface directly on screen in an X Drawable. A partial implementation is available in PutSurface, but it is intended for development purposes only.
Context
A Context is a global data structure used for rendering a video of a certain format. When a context is created, input buffers are created and the v4l output format is set. In v4l terms, "output" is the compressed data input queue, since capture is the real output.
Picture
A Picture is an encoded input frame made of several buffers. A single input can contain slice data, headers and an IQ matrix. Each Picture is assigned a request ID when created, and each corresponding buffer may be turned into a v4l buffer or an extended control when rendered. Finally, they are submitted to kernel space on reaching EndPicture.
The real rendering is done in EndPicture instead of RenderPicture because the v4l2 driver expects the full set of corresponding extended controls to be present when a buffer is queued, and we don't know in which order the different RenderPicture calls will be made.
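The deferral described above can be illustrated with a small self-contained sketch. All names here (`struct picture`, `picture_render`, `picture_end`, the `buf_kind` values) are hypothetical stand-ins, not the backend's types: the render step only records buffers, and the end step is the first point where the complete set is known.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer kinds; the real backend works with libVA buffer types. */
enum buf_kind { BUF_HEADERS, BUF_IQ_MATRIX, BUF_SLICE_DATA, BUF_KIND_COUNT };

struct picture {
    const void *data[BUF_KIND_COUNT];
    size_t size[BUF_KIND_COUNT];
};

/* BeginPicture analogue: start with an empty set of buffers. */
void picture_begin(struct picture *pic)
{
    memset(pic, 0, sizeof(*pic));
}

/* RenderPicture analogue: only record the buffer, in any arrival order. */
void picture_render(struct picture *pic, enum buf_kind kind,
                    const void *data, size_t size)
{
    pic->data[kind] = data;
    pic->size[kind] = size;
}

/*
 * EndPicture analogue: the full set is known here, so this is where the
 * controls and the slice buffer would be submitted together.
 * Returns the number of recorded buffers, or -1 if slice data is missing.
 */
int picture_end(const struct picture *pic)
{
    int count = 0;

    if (pic->data[BUF_SLICE_DATA] == NULL)
        return -1;

    for (int kind = 0; kind < BUF_KIND_COUNT; kind++)
        if (pic->data[kind] != NULL)
            count++;

    return count;
}
```

With this shape, slice data can be rendered before or after its headers without affecting what is submitted.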
Image
An Image is a standard data structure containing rendered frames in a usable pixel format. Here we only use NV12 buffers, which are converted from sunxi's proprietary tiled pixel format with tiled_yuv when deriving an Image from a Surface.