Per Phase 4 plan + Phase 5 review amendments (SPS parse-and-cache,
per-fd gating).
src/h265.c additions:
- #include <errno.h>, the v4l2-hevc-ext-controls.h, and the
vendored gst/codecparsers/gsth265parser.h
- new static helper h265_populate_ext_sps_rps_cache(): walks
surface_object->source_data for an SPS NAL (nal_unit_type == 33)
using gst_h265_parser_identify_nalu; if found, calls
gst_h265_parser_parse_sps_ext (NOT gst_h265_parser_parse_sps —
the latter discards the per-RPS-entry EXT data we need); maps
GstH265ShortTermRefPicSet (base) + GstH265ShortTermRefPicSetExt
(carrying use_delta_flag[16], used_by_curr_pic_flag[16],
delta_poc_s0_minus1[16], delta_poc_s1_minus1[16]) into the V4L2
struct arrays; stores on driver_data->hevc_rps_cache_*
- non-IDR-frame handling: cache holds across frames, so frames
whose source_data lacks an SPS NAL reuse the previously-parsed
cached arrays (Phase 5 review item #3)
- controls[] grows from [5] to [7]; the 2 new entries are appended
after the standard 5 (SPS/PPS/SLICE_PARAMS/SCALING_MATRIX/
DECODE_PARAMS), gated by driver_data->has_hevc_ext_sps_rps_rkvdec
(per-fd probe result from Step 3) + the cache being valid
- field-by-field mapping mirrors GStreamer's
gst_v4l2_codec_h265_dec_fill_ext_sps_rps verbatim (the upstream
reference identified in Phase 0 prior-art survey)
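
For illustration only, the SPS scan described above can be sketched
without the vendored parser. This is not the code in src/h265.c (which
uses gst_h265_parser_identify_nalu). It assumes source_data is an
Annex B byte stream with 00 00 01 start codes, and find_sps_nal is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEVC_NAL_SPS 33 /* nal_unit_type of an SPS NAL in HEVC */

/*
 * Return the offset of the first SPS NAL header byte in buf, or -1 if
 * no SPS is present. Assumption: Annex B framing. A 4-byte start code
 * (00 00 00 01) ends in the same 3 bytes, so it is matched as well.
 */
static long find_sps_nal(const uint8_t *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);

    /* Need the 3 start-code bytes plus the 2-byte NAL unit header. */
    for (i = 0; i + 4 < len; i++) {
        if (buf[i] != 0 || buf[i + 1] != 0 || buf[i + 2] != 1)
            continue;

        /* nal_unit_type occupies bits 1..6 of the first header byte. */
        if (((buf[i + 3] >> 1) & 0x3f) == HEVC_NAL_SPS)
            return (long)(i + 3);
    }

    return -1;
}
```

A frame whose buffer yields -1 here corresponds to the non-IDR case
above, where the cached arrays are reused.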
src/request.h additions:
- struct request_data carries hevc_rps_cache_st (array pointer),
_st_count, hevc_rps_cache_lt, _lt_count, hevc_rps_cache_valid.
Single-slot cache (sps_id 0 only; multi-SPS streams would need
expanding). Stores POST-MAPPED V4L2 structs so request.h doesn't
need to know GstH265SPS / GstH265SPSEXT types.
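
A minimal sketch of the single-slot cache and the [5] vs [7] gating,
under stated assumptions: struct rps_cache, rps_cache_update and
hevc_control_count are hypothetical stand-ins, not the actual fields or
functions in request.h / h265.c, and the mapped V4L2 arrays are reduced
to their counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEVC_BASE_CONTROLS 5 /* SPS, PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS */
#define HEVC_EXT_CONTROLS  2 /* the two appended RPS extension controls */

/* Hypothetical, trimmed-down stand-in for the hevc_rps_cache_* fields. */
struct rps_cache {
    bool valid;            /* set once an SPS has been parsed and mapped */
    unsigned int st_count; /* short-term RPS entries currently cached */
    unsigned int lt_count; /* long-term RPS entries currently cached */
};

/* Called once per frame; sps_found tells whether source_data had an SPS. */
static void rps_cache_update(struct rps_cache *cache, bool sps_found,
                             unsigned int st_count, unsigned int lt_count)
{
    assert(cache != NULL);

    if (!sps_found)
        return; /* no SPS in this frame: keep the previous contents */

    cache->st_count = st_count;
    cache->lt_count = lt_count;
    cache->valid = true;
}

/* Number of controls to submit: 7 only if the probe passed and the cache is valid. */
static unsigned int hevc_control_count(const struct rps_cache *cache,
                                       bool has_ext_probe)
{
    assert(cache != NULL);

    if (has_ext_probe && cache->valid)
        return HEVC_BASE_CONTROLS + HEVC_EXT_CONTROLS;

    return HEVC_BASE_CONTROLS;
}
```

Because the slot is only overwritten when an SPS is seen, a frame
without one leaves the count at 7 as long as the per-fd probe passed.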
Critical interpretation correction (Phase 5 review followup):
GstH265SPS has short_term_ref_pic_set[65] (base) but NOT
short_term_ref_pic_set_ext[]. The EXT array lives on a SEPARATE
GstH265SPSEXT struct accessed via gst_h265_parser_parse_sps_ext.
The 'plain' gst_h265_parser_parse_sps internally calls _ext with a
LOCAL discarded SPSEXT (see gsth265parser.c:2050). Our call must
use the _ext variant directly to keep the EXT data. Caught during
Step 4 first-build error.
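
The pitfall can be shown with toy stand-ins. These are NOT the
GStreamer signatures: toy_parse_sps, toy_parse_sps_ext and both structs
are hypothetical and only mimic the wrapper-with-a-local pattern
described above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sps     { int sps_id; };
struct toy_sps_ext { int flag; };

/* Stand-in for the _ext variant: fills both output structs. */
static int toy_parse_sps_ext(int input, struct toy_sps *sps,
                             struct toy_sps_ext *ext)
{
    assert(sps != NULL && ext != NULL);

    sps->sps_id = input;
    ext->flag = 1;
    return 0;
}

/* Stand-in for the plain variant: the EXT data lands in a local and is lost. */
static int toy_parse_sps(int input, struct toy_sps *sps)
{
    struct toy_sps_ext discarded; /* goes out of scope on return */

    return toy_parse_sps_ext(input, sps, &discarded);
}
```

A caller of toy_parse_sps gets a filled base struct but has no way to
reach the EXT fields; only a direct toy_parse_sps_ext call keeps them.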
Build verified: ninja -C build clean. .so is 759 KB (up from 485 KB
original, 682 KB after Step 2 vendor; the +77 KB is the new helper
+ extension).
iter2 Phase 6 Step 5 (install + reboot + smoke-test) is the F1
falsifier moment: if HEVC stops OOPSing, the mechanism is confirmed;
if it still OOPSes, loop back to Phase 0 with kernel-agent#11 re-opened.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v4l2-request libVA Backend
About
This libVA backend is designed to work with the Linux Video4Linux2 Request API, which is used by a number of video codec drivers, including the one for the Video Engine found in most Allwinner SoCs.
Status
The v4l2-request libVA backend currently supports the following formats:
- MPEG2 (Simple and Main profiles)
- H264 (Baseline, Main and High profiles)
- H265 (Main profile)
Instructions
In order to use this libVA backend, the v4l2_request driver has to
be specified through the LIBVA_DRIVER_NAME environment variable, as
such:
export LIBVA_DRIVER_NAME=v4l2_request
A media player that supports VAAPI (such as VLC) can then be used to decode a video in a supported format:
vlc path/to/video.mpg
Sample media files can be obtained from:
http://samplemedia.linaro.org/MPEG2/
http://samplemedia.linaro.org/MPEG4/SVT/
Technical Notes
Surface
A Surface is an internal data structure, never handled directly by the VA user, that contains the output of a rendering operation. Usually, a set of surfaces is created at the beginning of decoding and they are then used in turn. When created, a surface is assigned a corresponding V4L2 capture buffer, which it keeps until the end of decoding. Syncing a surface waits for the V4L2 buffer to become available and then dequeues it.
Note: since a Surface is kept private from the VA user, the user can only ask for it to be rendered directly on screen in an X Drawable. A rudimentary implementation of this is available in PutSurface, but it is intended for development purposes only.
Context
A Context is a global data structure used for rendering a video of a certain format. When a context is created, input buffers are created and the format of the V4L2 output queue is set. In V4L2 terms, the output queue is the input queue for compressed data, since the capture queue carries the actual decoded output.
Picture
A Picture is an encoded input frame made of several buffers. A single input can contain slice data, headers and an IQ matrix. Each Picture is assigned a request ID when created, and each corresponding buffer may be turned into a V4L2 buffer or an extended control when rendered. Finally, they are submitted to kernel space when EndPicture is reached.
The real rendering is done in EndPicture instead of RenderPicture because the V4L2 driver expects the full set of corresponding extended controls to be present when a buffer is queued, and we don't know in which order the different RenderPicture calls will be made.
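
The deferral described above can be sketched roughly as follows. This is a simplified model, not the backend's actual code: `struct picture`, `enum buf_kind` and the three functions are hypothetical names, and the rule that picture parameters and slice data are both required before submission is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical buffer kinds a VA user may pass to RenderPicture. */
enum buf_kind {
    BUF_PICTURE_PARAMS,
    BUF_IQ_MATRIX,
    BUF_SLICE_DATA,
    BUF_KIND_COUNT
};

struct picture {
    bool have[BUF_KIND_COUNT]; /* which buffers have been rendered so far */
    bool submitted;            /* true once the request went to the kernel */
};

static void begin_picture(struct picture *pic)
{
    assert(pic != NULL);
    memset(pic, 0, sizeof(*pic));
}

/* RenderPicture only records the buffer; nothing is queued yet. */
static void render_picture(struct picture *pic, enum buf_kind kind)
{
    assert(pic != NULL && kind < BUF_KIND_COUNT);
    pic->have[kind] = true;
}

/* EndPicture submits once, after all buffers are known, in any order. */
static int end_picture(struct picture *pic)
{
    assert(pic != NULL);

    /* Assumption: parameters and slice data are mandatory, IQ matrix is not. */
    if (!pic->have[BUF_PICTURE_PARAMS] || !pic->have[BUF_SLICE_DATA])
        return -1;

    pic->submitted = true; /* stands in for setting controls and queuing */
    return 0;
}
```

With this shape, rendering the slice data before or after the picture parameters gives the same result, since nothing reaches the driver until `end_picture` runs.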
Image
An Image is a standard data structure containing rendered frames in a usable pixel format. Here we only use NV12 buffers, which are converted from sunxi's proprietary tiled pixel format with tiled_yuv when an Image is derived from a Surface.