Per Phase 4 plan + Phase 5 review amendments (SPS parse-and-cache,
per-fd gating).
src/h265.c additions:
- #include <errno.h>, the v4l2-hevc-ext-controls.h, and the
vendored gst/codecparsers/gsth265parser.h
- new static helper h265_populate_ext_sps_rps_cache(): walks
surface_object->source_data for an SPS NAL (nal_unit_type == 33)
using gst_h265_parser_identify_nalu; if found, calls
gst_h265_parser_parse_sps_ext (NOT gst_h265_parser_parse_sps —
the latter discards the per-RPS-entry EXT data we need); maps
GstH265ShortTermRefPicSet (base) + GstH265ShortTermRefPicSetExt
(carrying use_delta_flag[16], used_by_curr_pic_flag[16],
delta_poc_s0_minus1[16], delta_poc_s1_minus1[16]) into the V4L2
struct arrays; stores on driver_data->hevc_rps_cache_*
- non-IDR-frame handling: cache holds across frames, so frames
whose source_data lacks an SPS NAL reuse the previously-parsed
cached arrays (Phase 5 review item #3)
- controls[] grows from [5] to [7]; the 2 new entries are appended
after the standard 5 (SPS/PPS/SLICE_PARAMS/SCALING_MATRIX/
DECODE_PARAMS), gated by driver_data->has_hevc_ext_sps_rps_rkvdec
(per-fd probe result from Step 3) + the cache being valid
- field-by-field mapping mirrors GStreamer's
gst_v4l2_codec_h265_dec_fill_ext_sps_rps verbatim (the upstream
reference identified in Phase 0 prior-art survey)
src/request.h additions:
- struct request_data carries hevc_rps_cache_st (array pointer),
_st_count, hevc_rps_cache_lt, _lt_count, hevc_rps_cache_valid.
Single-slot cache (sps_id 0 only; multi-SPS streams would need
expanding). Stores POST-MAPPED V4L2 structs so request.h doesn't
need to know GstH265SPS / GstH265SPSEXT types.
Critical interpretation correction (Phase 5 review followup):
GstH265SPS has short_term_ref_pic_set[65] (base) but NOT
short_term_ref_pic_set_ext[]. The EXT array lives on a SEPARATE
GstH265SPSEXT struct accessed via gst_h265_parser_parse_sps_ext.
The 'plain' gst_h265_parser_parse_sps internally calls _ext with a
LOCAL discarded SPSEXT (see gsth265parser.c:2050). Our call must
use the _ext variant directly to keep the EXT data. Caught during
Step 4 first-build error.
Build verified: ninja -C build clean. .so is 759 KB (up from 485 KB
original, 682 KB after Step 2 vendor — the +80 KB is the new helper
+ extension).
iter2 Phase 6 Step 5 (install + reboot + smoke-test) is the F1
falsifier moment: if HEVC stops OOPSing, mechanism confirmed; if it
still OOPSes, loopback Phase 0 with re-opened kernel-agent#11.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
α-26 (iter26) wrote VAAPI's picture->st_rps_bits to the V4L2 decode_params
field of the same name based on field-name match. Per V4L2 spec, this field
is the bit-count of st_ref_pic_set() *in the SPS* — VAAPI doesn't expose
that. The slice-header bit-count (which IS what VAAPI's st_rps_bits provides)
belongs in slice_params->short_term_ref_pic_set_size (handled correctly in
α-29).
rkvdec doesn't read decode_params->short_term_ref_pic_set_size, so the
misroute was harmless but stale. This revert restores spec-correct semantics
(0 when SPS bit-count is unknown).
Cosmetic cleanup; no functional change.
ROOT CAUSE FIX for HEVC frame 2+ divergence (Bug 5 remainder).
rkvdec's assemble_sw_rps (rkvdec-hevc.c:386-389) uses
sl_params->short_term_ref_pic_set_size to compute the bit offset where
long-term RPS data starts in the slice header. When zero, it falls back
to fls(num_short_term_ref_pic_sets - 1) — wrong when num=1 (BBB's case).
α-26 misdirected: set decode_params->short_term_ref_pic_set_size = st_rps_bits
but rkvdec doesn't use that field. The correct consumer is slice_params per
V4L2 spec and rkvdec source.
VAAPI's picture->st_rps_bits is documented as: 'number of bits that structure
short_term_ref_pic_set(num_short_term_ref_pic_sets) takes in slice segment
header when short_term_ref_pic_set_sps_flag equals 0' — exactly what
sl_params->short_term_ref_pic_set_size means.
Frames 1 (IDR) unaffected (V4L2 rkvdec gates on !IDR_PIC flag).
Frames 2+: bit offset for long-term RPS now correct, slice header parsing
no longer falls off the edge of the entropy bitstream.
Env-gated via LIBVA_HEVC_DUMP_SLICE_TAIL=1. Goal: characterise the
40-byte inflation in libva's slice_data buffer vs ffmpeg-v4l2request
(see iter27/28 close — HEVC frame 2+ divergence at byte 1382401).
Dumps per slice: nal_unit_type, slice_data_size, slice_data_byte_offset,
and the last 80 bytes of source_data for that slice. Lets us see if the
trailing 40 bytes are (a) real entropy, (b) trailing zeros, (c) a
next-NAL start code prefix, or (d) random memory.
VAAPI's slice_data_size includes NAL+slice header bytes that precede the
slice payload. rkvdec_hevc expects bit_size to cover the slice payload
(starting at data_byte_offset). Setting bit_size = slice_data_size * 8
made rkvdec read past slice payload → wrong entropy state → frame 2+
garbage despite correct ctx->image_fmt (iter25) and decode_params
(iter26).
Empirical match: with formula (slice_data_size - slice_data_byte_offset)
* 8, libva produces bit_size=44096 for BBB frame 2 matching kdirect's
44096 exactly per iter27 dmesg printk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VAPictureParameterBufferHEVC exposes st_rps_bits — the number of bits
the inline short_term_ref_pic_set syntax element takes in the slice
header. rkvdec's DPB resolution for P/B frames uses this to skip the
RPS data correctly; with size=0 it skips wrong bytes and reads wrong
references → frame 2+ visual divergence.
iter25 evidence: libva HEVC frame 1 byte-identical to kdirect, but
frame 2 diverges at the decode_params bytes 4-5 (libva 0x00 0x00,
kdirect 0x0a 0x00 = 10).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Env-gated by LIBVA_V4L2_REQ_GETBACK. After v4l2_set_controls() against
the request_fd in h265_set_controls(), issue G_EXT_CTRLS with the same
request_fd targeting SPS and log first 16 bytes returned.
iter20 (kernel printk) found rkvdec sees all-zero ctx->ctrl_hdl SPS for
libva HEVC vs correct bytes for kdirect. The remaining branch is whether
req->p_new was ever staged with libva's payload, or whether
v4l2_ctrl_request_setup failed to apply it.
α-24 distinguishes the two:
zero readback -> staging failed in v4l2_s_ext_ctrls
non-zero -> apply failed in v4l2_ctrl_request_setup
EACCES -> kernel disallows req readback; need deeper printk
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If removing DECODE_PARAMS from libva's S_EXT_CTRLS batch lets the other
4 controls stage, rkvdec_hevc_run printk will show w=1280 h=720 etc.
That confirms DECODE_PARAMS specifically is failing kernel validation
and rolling back the whole batch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Static storage for sps/pps/decode_params/scaling_matrix + no-free for
slice_params_array. Tests the kernel-defers-compound-copy hypothesis
from iter17 P7 finding.
If hashes change -> mechanism 3 confirmed; will refactor to per-surface
heap allocation.
If hashes unchanged -> mechanism 3 disproved; iter19 explores
mechanisms 1/2/5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes in one commit:
α-13 (h265_fill_sps): sps_max_num_reorder_pics now derived from
sps_max_dec_pic_buffering_minus1 (safe upper bound per H.265 §A.4.2)
instead of hardcoded 0. Phase 5b empirically showed rkvdec ignores
this field on RK3399, so this is wire-correctness hygiene only — matches
kdirect's payload pattern without behavior change.
α-14 (h265_set_controls): derive IRAP_PIC / IDR_PIC flags from the
first slice's nal_unit_type (parsed by h265_fill_slice_params into
slice_params_array[0].nal_unit_type). Without these flags rkvdec
doesn't recognise the keyframe boundary, treats IDR as inter without
references, and produces all-zero CAPTURE output — observed as Bug 5
on libva HEVC (06b2c5a0...). kdirect sets these from the bitstream
parse and decodes correctly (9340b832...).
Mapping:
nal_unit_type 16..23 -> IRAP_PIC
nal_unit_type 19 (IDR_W_RADL) or 20 (IDR_N_LP) -> IDR_PIC
HEVC-only (no risk to other codecs). h265_set_controls already
profile-gated via picture.c::codec_set_controls VAProfileHEVCMain
dispatch. Per feedback_unconditional_codec_state.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrites src/h265.c (407 lines → 588 lines) and the picture.c HEVC
dispatch + per-slice accumulation against the modern split V4L2_CID_
STATELESS_HEVC_{SPS,PPS,SLICE_PARAMS,SCALING_MATRIX,DECODE_PARAMS,
DECODE_MODE,START_CODE} stateless controls. Replaces the staging-era
V4L2_CID_MPEG_VIDEO_HEVC_{SPS,PPS,SLICE_PARAMS} CIDs that were
removed from the kernel UAPI.
Per-frame submission: ONE batched VIDIOC_S_EXT_CTRLS, count=5,
ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS:
0xa40a90 SPS (40 bytes)
0xa40a91 PPS (64 bytes)
0xa40a92 SLICE_PARAMS (variable; dynamic-array; one entry per slice)
0xa40a93 SCALING_MATRIX (1296 bytes; memset-zero when no scaling list)
0xa40a94 DECODE_PARAMS (328 bytes; per-frame DPB info)
Plus device-wide menus set once at context.c init (separate batched
S_EXT_CTRLS call so a kernel without HEVC controls — e.g. hantro on
RK3568/RK3399 — silently fails its batch without invalidating H.264):
0xa40a95 DECODE_MODE (FRAME_BASED on rkvdec)
0xa40a96 START_CODE (ANNEX_B on rkvdec)
Reference: FFmpeg libavcodec/v4l2_request_hevc.c:505-565
(v4l2_request_hevc_queue_decode batched submission shape).
Phase 5 review amendments incorporated:
C1 (data_byte_offset NOT data_bit_offset):
Old h265.c at lines 184-209 ran an 8-bit search to compute
bit-granularity offset. New API renames the field to
data_byte_offset (u32 byte offset). Bit-search dropped; replaced
with plain byte offset = source_offset + slice->slice_data_byte_offset.
C2 (dpb_entry.flags only LONG_TERM_REFERENCE; pic_order_cnt_val
singular; poc_st_curr_*[] arrays hold DPB INDICES not POC):
h265_fill_decode_params replaces old slice-params DPB iteration
with explicit DPB classification + index-array population.
For each VAAPI ReferenceFrames[i]:
- Classify into ST_CURR_BEFORE / ST_CURR_AFTER / LT_CURR via
VA_PICTURE_HEVC_RPS_* flags.
- Set dpb[j].timestamp, .pic_order_cnt_val (singular), .field_pic.
- Set dpb[j].flags = LONG_TERM_REFERENCE iff RPS_LT_CURR.
- Append j (DPB index, u8) to poc_st_curr_before[k] /
poc_st_curr_after[k] / poc_lt_curr[k] based on classification.
C3 (union-aliasing reasoning corrected):
BeginPicture's params.h265.num_slices = 0 reset is benign for
non-HEVC profiles because byte ~17764 of the params union is past
any field non-HEVC profiles read, NOT because RenderPicture's
per-buffer copies overwrite that location. Wording amended in
phase4_iter2_plan.md per phase5_iter2_review.md.
S1 (PPS flags 19 + 20 — DEBLOCKING_FILTER_CONTROL_PRESENT and
UNIFORM_SPACING):
Empirically VAAPI does NOT expose either flag in the
VAPictureParameterBufferHEVC pic_fields.bits or
slice_parsing_fields.bits. Both bits left zero. BBB-720p10s_hevc
fixture uses neither tiles nor explicit deblocking-control
parameters, so the omission is correct for the iter2 binding cell.
S2 (3 PPS scalars added):
pic_parameter_set_id (default 0; VAAPI doesn't expose),
num_ref_idx_l0_default_active_minus1, num_ref_idx_l1_default_
active_minus1 (both populated from VAAPI picture struct).
Q2 (slice_segment_addr populated):
Was missing in old h265.c. Now sourced from
VAAPI's slice->slice_segment_address.
S3 (SCALING_MATRIX content choice):
Implementer choice taken: when iqmatrix_set==false (BBB has no
scaling list per SPS flags = SAO|STRONG_INTRA_SMOOTHING),
h265_fill_scaling_matrix sends memset-zero. Matches FFmpeg's
sl=NULL pattern at v4l2_request_hevc.c:384-403 (preserves
byte-equality vs cross-validator anchor).
S4 (FFmpeg function name fix): cosmetic; no code impact.
Plus one Phase 6 inline correction: phase 5 review S1 suggested
VAAPI exposes uniform_spacing_flag in pic_fields.bits; empirical
test-compile shows it doesn't. Comment added in h265_fill_pps
documenting the omission.
Picture.c changes (3 edits):
1. codec_set_controls HEVCMain dispatch (lines 204-206 → call
h265_set_controls; replaces explicit Fourier-local: HEVC stripped
reject).
2. codec_store_buffer HEVC VASliceParameterBufferType case: append
VAAPI slice param to params.h265.slices[N] array, increment
num_slices. Single-slice mirror at .slice retained for
h265_fill_pps (which reads dependent_slice_segment_flag from
LongSliceFlags).
3. RequestBeginPicture: add params.h265.num_slices = 0 reset
alongside existing h264.matrix_set = false reset.
Surface.h: extend params.h265 struct with slices[HEVC_MAX_SLICES_PER_
FRAME=64] array + num_slices counter. ~17 KB extra per surface union;
24 surfaces in iter7 cap_pool = ~400 KB total surface_heap growth.
object_heap allocator picks up new size automatically via
sizeof(struct object_surface).
Context.c: separate 2-control batched call sets HEVC DECODE_MODE +
START_CODE device-wide. Same best-effort (void)v4l2_set_controls
pattern as the existing H.264 device-init block; if kernel doesn't
advertise HEVC controls (hantro on RK3568/RK3399), the batch silently
fails without invalidating the H.264 batch.
Meson.build: uncomment 'h265.c' (line 50) and 'h265.h' (line 73)
in sources + headers lists.
H265.h: added HEVC_MAX_SLICES_PER_FRAME=64 #define before struct
forward declarations.
Phase 6 smoke test on fresnel (post Commit A + Commit B):
Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec env binding
(/dev/video1 + /dev/media0). PASS.
Criterion 3: ffmpeg -hwaccel vaapi HEVC decode of bbb_720p10s_hevc.mp4
-frames:v 5 -f null -, exit 0. cap_pool_init: 24 slots
ready. PASS.
Criterion 4: mpv --hwdec=vaapi --vo=image at +02s seek, HEVC fixture:
HW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
SW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
HW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
SW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
HW=SW byte-identical for both frames; frame1 != frame2 (real motion).
PASS.
Criterion 5: regression hashes hold for both prior cells:
H.264 +30s HW frame 1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9 (T4 ref MATCH)
H.264 +30s HW frame 2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8 (T4 ref MATCH)
MPEG-2 +02s HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 (iter1 ref MATCH)
MPEG-2 +02s HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de (iter1 ref MATCH)
PASS.
All five criteria green on first build attempt — Phase 5 review
caught the 3 Critical UAPI errors (data_bit_offset → data_byte_offset
rename; dpb.rps field gone + pic_order_cnt_val rename + index-array
semantics) that would have been Phase 6 compile failures or silent
Phase 7 byte-compare divergences. Without that review pass, this
commit would have been the start of a 2+ loopback debugging cycle.
Refs:
../fresnel-fourier/phase4_iter2_plan.md (10 contract clauses,
File 4 patch shape)
../fresnel-fourier/phase5_iter2_review.md (C1, C2, C3, S1, S2,
S3, S4, Q2 amendments
all incorporated)
../fresnel-fourier/phase0_evidence/2026-05-08/iter2_phase3/
ffmpeg_v4l2req.stdout (cross-validator anchor — Phase 7
bonus byte-compare verification target)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reference frames are now identified using their timestamp:
set the timestamp when queuing the output buffer and use it to identify
the frame later on.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>