test0r fdb0b728d7 h264: strip ffmpeg-vaapi POC sentinel before passing to V4L2
ROOT CAUSE for "kernel decodes successfully but produces zeroed
CAPTURE buffers despite no V4L2_BUF_FLAG_ERROR":

ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
0x10000 as a sentinel for "uninitialised":
  libavcodec/h264dec.c:301 — global init in ff_h264_decode_init
  libavcodec/h264dec.c:444 — IDR reset in idr() helper
ff_h264_init_poc (libavcodec/h264_parse.c:296-305) then computes
pc->poc_msb = pc->prev_poc_msb whenever the slice header's
pic_order_cnt_lsb hasn't wrapped relative to prev_poc_lsb (which
is the typical case for any normal H.264 content with sane POC
ordering). The sentinel leaks into field_poc[] (line 305) and from
there into VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
libavcodec/vaapi_h264.c::fill_vaapi_pic (lines 73-78).

Empirical confirmation via meitner 2026-05-02 ground-truth test:
ran an LD_PRELOAD shim around vaCreateBuffer against an i965
VAAPI backend decoding a 60-frame H.264 Main clip. Every frame
showed TopFieldOrderCnt = (POC | 0x10000):

  Frame 1 IDR:  raw bytes "00 00 01 00" at offset 12 → TopFOC=65536
  Frame 2:      raw bytes "06 00 01 00"             → TopFOC=65542
  Frame 3:      "02 00 01 00"                       → TopFOC=65538

i965 successfully decodes regardless. V4L2 stateless drivers
(hantro_h264.c::prepare_table feeds the value direct to
tbl->poc[i*2]/[32], the kernel reflist builder uses it directly
for cur_pic_order_count comparison) cannot tolerate the high word —
the kernel's resource sizing math sees POC=65536 for an IDR and
breaks.

This patch adds h264_strip_ffmpeg_poc_sentinel() as a small static
inline in src/h264.c. It detects bit 16 set rather than blindly
subtracting, so a future ffmpeg version that fixes the leak
degrades gracefully. The helper is applied at all four POC sites:

  1. h264_fill_dpb:           dpb->top_field_order_cnt
  2. h264_fill_dpb:           dpb->bottom_field_order_cnt
  3. h264_va_picture_to_v4l2: decode->top_field_order_cnt
  4. h264_va_picture_to_v4l2: decode->bottom_field_order_cnt

VA_PICTURE_H264_INVALID DPB slots are short-circuited to POC=0
because libavcodec/vaapi_h264.c::init_vaapi_pic (line 43) already
sets POC=0 there; the sentinel never applies. Zeroing them
explicitly removes a class of "stale POC value in invalidated
slot" foot-guns.

Non-trivial follow-ups identified during the meitner experiment
that are NOT addressed by this patch:
  - PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags are
    not yet derived from VASliceParameterBufferH264.slice_type. The
    bbb corpus is I-only at the start so this hasn't been a
    blocker, but a clip with B-frames will need the slice-type
    routing patch.
  - h264_fill_dpb's pic_num assignment (entry->pic.picture_id) is
    almost certainly wrong per the kernel doc — pic_num must equal
    the H.264 spec's PicNum / FrameNumWrap, not the VAAPI surface
    id. Out of scope here; will surface as a defect on streams
    that have multi-frame DPB lookups.

Cross-references:
  audit_0008_decode_params_2026-05-01.md — kernel-side consumer
    audit confirming POC fields are userspace-required.
  api_contract_findings_2026-05-01.md — VAAPI doc gap on POC
    semantics; H.264 spec section 8.2.1 is the binding contract.
  meitner_2026-05-02_vaapi_idr_groundtruth/ — full empirical
    capture of the sentinel pattern across 60 frames.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
2016-08-26 15:43:09 +02:00
2016-08-26 15:43:09 +02:00
2018-09-08 08:51:51 +02:00

v4l2-request libVA Backend

About

This libVA backend is designed to work with the Linux Video4Linux2 Request API that is used by a number of video codecs drivers, including the Video Engine found in most Allwinner SoCs.

Status

The v4l2-request libVA backend currently supports the following formats:

  • MPEG2 (Simple and Main profiles)
  • H264 (Baseline, Main and High profiles)
  • H265 (Main profile)

Instructions

In order to use this libVA backend, the v4l2_request driver has to be specified through the LIBVA_DRIVER_NAME environment variable, as such:

export LIBVA_DRIVER_NAME=v4l2_request

A media player that supports VAAPI (such as VLC) can then be used to decode a video in a supported format:

vlc path/to/video.mpg

Sample media files can be obtained from:

http://samplemedia.linaro.org/MPEG2/
http://samplemedia.linaro.org/MPEG4/SVT/

Technical Notes

Surface

A Surface is an internal data structure never handled by the VA's user containing the output of a rendering. Usualy, a bunch of surfaces are created at the begining of decoding and they are then used alternatively. When created, a surface is assigned a corresponding v4l capture buffer and it is kept until the end of decoding. Syncing a surface waits for the v4l buffer to be available and then dequeue it.

Note: since a Surface is kept private from the VA's user, it can ask to directly render a Surface on screen in an X Drawable. Some kind of implementation is available in PutSurface but this is only for development purpose.

Context

A Context is a global data structure used for rendering a video of a certain format. When a context is created, input buffers are created and v4l's output (which is the compressed data input queue, since capture is the real output) format is set.

Picture

A Picture is an encoded input frame made of several buffers. A single input can contain slice data, headers and IQ matrix. Each Picture is assigned a request ID when created and each corresponding buffer might be turned into a v4l buffers or extended control when rendered. Finally they are submitted to kernel space when reaching EndPicture.

The real rendering is done in EndPicture instead of RenderPicture because the v4l2 driver expects to have the full corresponding extended control when a buffer is queued and we don't know in which order the different RenderPicture will be called.

Image

An Image is a standard data structure containing rendered frames in a usable pixel format. Here we only use NV12 buffers which are converted from sunxi's proprietary tiled pixel format with tiled_yuv when deriving an Image from a Surface.

S
Description
bootlin/libva-v4l2-request fork: multiplanar V4L2 support for Rockchip hantro (Fourier)
Readme 2.6 MiB
Languages
C 96.3%
Shell 1.9%
Meson 0.8%
Assembly 0.4%
Makefile 0.4%
Other 0.2%