GStreamer's MERGED gst_v4l2_codec_h265_dec_fill_ext_sps_rps in gst-plugins-bad (GStreamer 1.28, MR !10820) is the primary upstream reference. It walks its own gst_h265_parser_*'s GstH265SPS.short_term_ref_pic_set[] array; field names match the H.265 spec, with a one-to-one mapping to the V4L2 control struct. Header strategy: runtime-optional control probe, NO #ifndef shim. Casanova's FFmpeg WIP branch (v4l2-request-ext-sps-rps-n8.0.1 at gitlab.collabora.com) is the secondary reference: it walks libavcodec's internal HEVCSPS->st_rps[] with different field names. Useful as a cross-check but not the primary template (renaming gymnastics). cros-codecs has no support yet (would follow GStreamer's shape if added). Casanova's kernel-test framework uses fluster through these two upstream consumers; no other reference exists. Q1 (architecture): resolved. Implement H.265 SPS parser in backend, mirror GStreamer pattern with spec-compliant field names. Q2 (UAPI shim): resolved. Runtime-optional control probe per GStreamer pattern, NOT #ifndef shim. Remaining sub-question for Phase 1: parser SOURCE (vendor GStreamer's gsth265parser.c, adapt to backend idioms, or implement minimal fresh from H.265 §7.3.7). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 0 — iter2 (HEVC backend EXT_SPS_*_RPS extension) substrate
Closed 2026-05-16 evening, post-meta-iter1-close.
Research question
Can a libva-v4l2-request-fourier patch that registers and populates V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS and _LT_RPS unblock HEVC HW decode on ampere RK3588 — and if so, what is the source of the RPS array contents (which VAAPI's VAPictureParameterBufferHEVC does NOT expose)?
Substrate
Backend HEVC code layout (in ~/src/libva-v4l2-request-fourier/src/h265.c on ampere):
- h265_fill_sps at line 96: populates struct v4l2_ctrl_hevc_sps from VAPictureParameterBufferHEVC. Reads picture->num_short_term_ref_pic_sets (line 145) and picture->num_long_term_ref_pic_sps (line 146) into the SPS struct. Does NOT touch RPS arrays.
- h265_fill_pps at line 173: populates struct v4l2_ctrl_hevc_pps. Comment at line 238: "VAAPI does not expose either flag in VAPictureParameterBufferHEVC."
- h265_fill_decode_params at ~line 256: DECODE_PARAMS population; ends with a comment at line 325 referencing iter31's va-st-rps-bits-is-slice-field correction (the field with the same name in different V4L2 structs has different semantics).
- h265_fill_slice_params at line 361: SLICE_PARAMS per slice. Has the iter31 α-29 fix: slice_params->short_term_ref_pic_set_size = picture->st_rps_bits (line 477+). VAAPI's st_rps_bits is the slice-header bit count, so it belongs here.
- h265_set_controls (the call site that registers controls) at ~line 660: registers 5 controls today (SPS, PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS) via v4l2_set_controls. DECODE_MODE and START_CODE are registered earlier, in context.c:465-469.
No H.265 bitstream parser exists. The backend has h264_slice_header.{c,h} for H.264 slice-header parsing (precedent that the codebase does this when needed), but no h265_* parser file.
VAAPI's VAPictureParameterBufferHEVC only exposes RPS COUNTS, not contents. Confirmed by grepping all VAPicture*HEVC field references in h265.c — only num_short_term_ref_pic_sets and num_long_term_ref_pic_sps are read, no delta_poc_s0_minus1[], no delta_idx_minus1, no per-RPS fields. VAAPI's struct simply doesn't carry them.
Kernel struct shapes for the new controls (from ~/src/linux-rockchip/include/uapi/linux/v4l2-controls.h):
struct v4l2_ctrl_hevc_ext_sps_st_rps { // dynamic array, sized by sps->num_short_term_ref_pic_sets, ≤65 entries
    __u8 delta_idx_minus1;
    __u8 delta_rps_sign;
    __u8 num_negative_pics;
    __u8 num_positive_pics;
    __u32 used_by_curr_pic;
    __u32 use_delta_flag;
    __u16 abs_delta_rps_minus1;
    __u16 delta_poc_s0_minus1[16];
    __u16 delta_poc_s1_minus1[16];
    __u16 flags; // V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED
};
struct v4l2_ctrl_hevc_ext_sps_lt_rps { // dynamic array, sized by sps->num_long_term_ref_pics_sps, ≤65 entries
    __u16 lt_ref_pic_poc_lsb_sps;
    __u16 flags; // V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT
};
linux-api-headers 6.19-1 on ampere does NOT define these — the backend would need a local UAPI shim (precedent: no current hevc-ctrls/ dir in the backend, would need to be added).
Kernel function that crashes (from rkvdec-hevc-common.c:380-410):
static void rkvdec_hevc_prepare_hw_st_rps(struct rkvdec_hevc_run *run, struct rkvdec_rps *rps,
                                          struct v4l2_ctrl_hevc_ext_sps_st_rps *cache)
{
    if (!run->ext_sps_st_rps)
        return; // ← early return for NULL pointer
    if (!memcmp(cache, run->ext_sps_st_rps, // ← OOPSes here per the stack trace
                sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps)))
        return;
    /* ... per-element processing */
}
The crash IS in this memcmp. For the crash to happen at all:
- run->ext_sps_st_rps must be non-NULL (else the early return fires before memcmp), AND
- memcmp must dereference an unmapped / invalid address from one of cache or run->ext_sps_st_rps.
Open mechanism question: how does run->ext_sps_st_rps become a non-NULL pointer to invalid memory when the userspace never sets the control? Two candidates:
- (a) The V4L2 control framework auto-allocates the control's p_cur.p to a default-zeroed buffer; later, v4l2_ctrl_find returns a control whose p_cur.p is a stale sentinel after some state transition.
- (b) The control storage is lazily allocated only on first set, but v4l2_ctrl_find returns the registered control object whose p_cur.p is whatever the registration-time stub left it as (likely uninitialized).
Resolving (a) vs (b) requires reading drivers/media/v4l2-core/v4l2-ctrls-*.c for the auto-allocation behavior of dynamic-array controls. Phase 2 work — not Phase 0.
In-session baseline anchor for iter2
The HEVC OOPS reproducer remains as captured in ampere-fourier iter1 Phase 0:
LIBVA_DRIVER_NAME=v4l2_request \
ffmpeg -hide_banner -hwaccel vaapi -hwaccel_output_format vaapi \
-i ~/measurements/encoded/bbb_60s_720p.hevc.mp4 \
-vf "hwdownload,format=nv12" -frames:v 30 -f null -
# → kernel OOPS in dmesg, v4l2_mem2mem wedges all decoders until reboot
This is the iter2 falsifier — if a backend patch makes this stop OOPSing, the survey hypothesis is corroborated. If it still OOPSes the same way, mechanism is something else.
Existing precedent: UAPI shim files
The backend currently has NO hevc-ctrls/ directory (was searched; doesn't exist). The H.264 path uses system kernel headers via <linux/v4l2-controls.h>. Adding new HEVC CIDs that aren't in linux-api-headers 6.19-1 will require:
- Adding a hevc-ctrls/ directory with a local stub header that defines the missing constants + structs (matching the kernel 7.0 definitions verbatim).
- OR bumping the linux-api-headers package on ampere to 7.0+.
Per the fresnel-iter25 / feedback_rkvdec_image_fmt_pre_seed precedent, the backend ships local UAPI shims when the kernel side gets ahead of distro headers. Iter2 follows that precedent unless the operator prefers the headers-bump route.
Open questions tabled into Phase 1
- Architecture for RPS data sourcing (the BIG one): given VAAPI doesn't expose the RPS table contents, how does the backend obtain them?
  - (A) Implement an H.265 SPS bitstream parser in the backend: ~800-1500 lines of new code, well-defined per H.265 spec §7.3.2.2 + §7.3.7, follows the h264_slice_header.c precedent. Highest scope, but self-contained and doesn't add dependencies.
  - (B) Test the "minimal patch with zero-init RPS data" hypothesis first: if just registering the controls (with delta_idx_minus1=0, num_*_pics=0 etc.) eliminates the OOPS, then HEVC decode probably produces wrong/black frames but doesn't crash. This stages the risk: the first stage confirms the mechanism, the second stage (real parsing) follows. This is the staged approach Phase 1 of the META campaign already named as iter2's first concrete action.
  - (C) Link libavcodec's HEVC parser: adds a build-time dep on FFmpeg's HEVC code, would expose H265RawSPS. Avoids reimplementing the parser. Out of campaign-typical practice (backend is minimal-deps); operator decision.
  - (D) Some other channel I haven't identified, e.g. ffmpeg-vaapi's VABufferType ecosystem may have an SPS-RPS extension somewhere; Phase 2 would need to confirm.
- linux-api-headers shim vs bump: ship hevc-ctrls/ per iter25 precedent, or bump the package?
- Mechanism reconstruction depth: do we need to read v4l2-ctrls-*.c to fully understand WHY the OOPS happens, or is "make ext_sps_*_rps non-NULL with valid data" empirically sufficient to validate the fix?
- Test-decode reference clip: BBB 60s 720p HEVC is the iter1 substrate; works for iter2 too. No new clip needed.
- Phase 7 verification anchor: ampere-fourier iter1 baseline (H.264 + VP8 + MPEG-2 still PASS C1-C6) PLUS new HEVC C1-C6 — iter2's Phase 1 success criteria should mirror iter1's per-codec C1-C6 with HEVC added; floor for HEVC SSIM Y at f720 expected in H.264-drift territory (~0.65 ± 0.05) per fresnel iter1 + ampere iter1 convergent observations.
Phase 0 close
Substrate captured: 5 existing HEVC controls in the backend, no H.265 parser, VAAPI doesn't expose RPS contents, kernel struct shapes documented, mechanism partially understood (memcmp dereferences invalid memory; precise cause = open Q3). 5 open questions for Phase 1, with Q1 (architecture for RPS sourcing) being the load-bearing decision.
Upstream-consumer survey (added 2026-05-16 post-Phase-0)
Per feedback_upstream_alignment_over_speed, surveyed real upstream V4L2 stateless HEVC consumers for the EXT_SPS_*_RPS pattern. Subagent transcript: ~/.../tasks/aa6f3e6382bc0d721.output. Findings:
| Consumer | Status | Pattern |
|---|---|---|
| GStreamer | MERGED for GStreamer 1.28 (!10820) | Walks its own gst_h265_parser_*'s GstH265SPS.short_term_ref_pic_set[] array — field names match H.265 spec, one-to-one mapping to the V4L2 struct. Header strategy: runtime-optional control probe, NO #ifndef shim. File: subprojects/gst-plugins-bad/sys/v4l2codecs/gstv4l2codech265dec.c, function gst_v4l2_codec_h265_dec_fill_ext_sps_rps |
| FFmpeg (Casanova WIP) | Not yet on ffmpeg-devel (branch v4l2-request-ext-sps-rps-n8.0.1 at gitlab.collabora.com) | Walks libavcodec's internal HEVCSPS->st_rps[] (different field names than spec: rps_predict, delta_idx, abs_delta_rps, etc.; requires translation). LT_RPS commented-out (incomplete). Function: fill_ext_sps_st_rps in libavcodec/v4l2_request_hevc.c |
| cros-codecs | No support yet (would parse via own cros_codecs::codec::h265::parser::Sps when added, same shape as GStreamer) | n/a |
| Casanova kernel-test framework | fluster through GStreamer 1.28 + Collabora FFmpeg WIP; no separate reference consumer | n/a |
| Bootlin libva-v4l2-request | Dormant since 2019, no 7.0-UAPI work | n/a |
Upstream-aligned pattern is unambiguous: parse the H.265 SPS NAL ourselves, populate the V4L2 controls from our parser's output. Both active upstream consumers (GStreamer merged, FFmpeg WIP) follow this exactly. VAAPI does not and will not expose the RPS array content, so we must parse.
GStreamer's mapping is the cleanest reference — GstH265ShortTermRefPicSet field names mirror the H.265 spec, so the V4L2-control assignment is mechanical. FFmpeg's renaming gymnastics are a useful cross-check but should NOT be the primary template.
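To illustrate what "mechanical" means here, a sketch of the per-entry assignment when the parser output uses spec field names. Assumptions, all hypothetical: struct parsed_st_rps stands in for whatever the backend's parser produces, local_hevc_ext_sps_st_rps mirrors the kernel layout quoted earlier, and the flag value 0x1 is a placeholder rather than the real UAPI constant. The two __u32 bitmask fields are deliberately left at zero, because their bit packing is not stated in these notes and must be taken from the kernel documentation or GStreamer's implementation.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder value (assumption); the real constant is
 * V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED from the UAPI header. */
#define LOCAL_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED 0x1

/* Hypothetical local mirror of the kernel struct quoted earlier. */
struct local_hevc_ext_sps_st_rps {
    uint8_t  delta_idx_minus1;
    uint8_t  delta_rps_sign;
    uint8_t  num_negative_pics;
    uint8_t  num_positive_pics;
    uint32_t used_by_curr_pic;
    uint32_t use_delta_flag;
    uint16_t abs_delta_rps_minus1;
    uint16_t delta_poc_s0_minus1[16];
    uint16_t delta_poc_s1_minus1[16];
    uint16_t flags;
};

/* Hypothetical parser output for one st_ref_pic_set(), spec field names. */
struct parsed_st_rps {
    uint8_t  inter_ref_pic_set_prediction_flag;
    uint8_t  delta_idx_minus1;
    uint8_t  delta_rps_sign;
    uint16_t abs_delta_rps_minus1;
    uint8_t  num_negative_pics;
    uint8_t  num_positive_pics;
    uint16_t delta_poc_s0_minus1[16];
    uint16_t delta_poc_s1_minus1[16];
};

/* Copy the fields whose names match one-to-one between spec and control. */
static void fill_ext_sps_st_rps_entry(struct local_hevc_ext_sps_st_rps *dst,
                                      const struct parsed_st_rps *src)
{
    memset(dst, 0, sizeof(*dst));
    dst->delta_idx_minus1 = src->delta_idx_minus1;
    dst->delta_rps_sign = src->delta_rps_sign;
    dst->abs_delta_rps_minus1 = src->abs_delta_rps_minus1;
    dst->num_negative_pics = src->num_negative_pics;
    dst->num_positive_pics = src->num_positive_pics;
    memcpy(dst->delta_poc_s0_minus1, src->delta_poc_s0_minus1,
           sizeof(dst->delta_poc_s0_minus1));
    memcpy(dst->delta_poc_s1_minus1, src->delta_poc_s1_minus1,
           sizeof(dst->delta_poc_s1_minus1));
    if (src->inter_ref_pic_set_prediction_flag)
        dst->flags |= LOCAL_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED;
    /* used_by_curr_pic and use_delta_flag stay zero in this sketch:
     * their bit layout is not confirmed here. */
}
```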
Header strategy decided: no #ifndef shim. Mirror GStreamer's "optional control" probe path — at backend init, VIDIOC_QUERYCTRL the two new CIDs; if both present and the active driver-kind is VDPU381/383 HEVC, set them; if absent, log + skip (graceful fallback for older kernels). Constants + struct shapes need to be available at compile time, however, so the build pipeline either requires linux-api-headers ≥ 7.0 OR ships a minimal internal header with just the two new CIDs + structs (with a comment pointing to the upstream UAPI source). Picking which of those is a tactical Phase 4 detail.
Phase 0 update — Q1 (architecture) and Q2 (UAPI shim) resolved
- Q1 (architecture for RPS data sourcing): resolved. Implement an H.265 SPS parser in the backend (option (A) in the Phase 1 list above), mirroring GStreamer's gst_v4l2_codec_h265_dec_fill_ext_sps_rps pattern with one-to-one spec-compliant field names. Per-RPS-set + LT_RPS arrays.
- Q2 (UAPI shim vs headers bump): runtime-optional control probe (not a header #ifndef shim). Compile-time access to the new CIDs/structs is handled via either a headers package bump OR a minimal internal header; Phase 4 picks tactically.
- Q3 (mechanism reconstruction depth): now lower-priority. Once the backend populates valid RPS data per the upstream pattern, the OOPS should be gone whatever its precise cause was. If somehow it isn't, loop back to Phase 0 with whatever new evidence the failure surfaces.
- Q4 (test clip): unchanged — BBB iter1 carries.
- Q5 (Phase 7 anchor): unchanged — ampere-fourier iter1 + HEVC C1-C6 added.
Sub-question remaining for Phase 1 lock: what's the H.265 SPS parser source? Three options, all upstream-aligned:
- (B1) Vendor GStreamer's parser: copy subprojects/gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.c (LGPL, compatible with libva backend license). Keeps backend self-contained; reuses thoroughly tested code; carries forward GStreamer's spec-compliant field naming. Mostly a copy + minor adaptation (drop GLib dependency or replace with libc equivalents).
- (B2) Adapt GStreamer's parser to the backend's idioms: same data flow but rewritten to match h264_slice_header.c style (plain C, no GLib). More work; fewer LOC.
- (B3) Implement a minimal SPS-RPS-only parser fresh from H.265 spec §7.3.7: narrowest scope (just the bits needed for the two controls), but does not benefit from GStreamer's edge-case handling.
(B1) is the most upstream-aligned. (B2) is the same data flow with the project's house style. (B3) is the most minimal but reinvents.
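For a sense of the scope of (B3), a sketch of the core of a §7.3.7 reader: an MSB-first bit reader, ue(v) Exp-Golomb decoding, and the explicitly coded branch of st_ref_pic_set() (the one taken when inter_ref_pic_set_prediction_flag is 0, which is always the case for stRpsIdx 0). All type and function names are hypothetical. The inter-predicted branch and the rest of the SPS syntax that precedes the RPS table are not covered.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bit_reader {
    const uint8_t *data;
    size_t size_bits;
    size_t pos;    /* next bit to read, counted from the MSB of data[0] */
    int error;     /* set once a read runs past the end or a code is invalid */
};

static uint32_t read_bit(struct bit_reader *br)
{
    uint32_t bit;

    if (br->pos >= br->size_bits) {
        br->error = 1;
        return 0;
    }
    bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* ue(v): N leading zero bits, a 1 bit, then N suffix bits. */
static uint32_t read_ue(struct bit_reader *br)
{
    unsigned int zeros = 0;
    uint32_t suffix = 0;

    while (read_bit(br) == 0) {
        if (br->error || ++zeros > 31) {
            br->error = 1;
            return 0;
        }
    }
    for (unsigned int i = 0; i < zeros; i++)
        suffix = (suffix << 1) | read_bit(br);
    return (1u << zeros) - 1 + suffix;
}

struct parsed_st_rps {
    uint8_t  num_negative_pics;
    uint8_t  num_positive_pics;
    uint16_t delta_poc_s0_minus1[16];
    uint16_t delta_poc_s1_minus1[16];
    uint16_t used_by_curr_pic_s0_flags; /* bit i = used_by_curr_pic_s0_flag[i] */
    uint16_t used_by_curr_pic_s1_flags; /* bit i = used_by_curr_pic_s1_flag[i] */
};

/* Parse the non-predicted branch of st_ref_pic_set(). The caller has
 * already consumed inter_ref_pic_set_prediction_flag where it is present.
 * Returns 0 on success, -1 on truncated or out-of-range input. */
static int parse_st_rps_explicit(struct bit_reader *br,
                                 struct parsed_st_rps *rps)
{
    uint32_t neg = read_ue(br);
    uint32_t pos = read_ue(br);

    if (br->error || neg > 16 || pos > 16)
        return -1;
    memset(rps, 0, sizeof(*rps));
    rps->num_negative_pics = (uint8_t)neg;
    rps->num_positive_pics = (uint8_t)pos;

    for (uint32_t i = 0; i < neg; i++) {
        uint32_t delta = read_ue(br);
        if (delta > 0x7fff)   /* spec range is 0 to 2^15 - 1 */
            return -1;
        rps->delta_poc_s0_minus1[i] = (uint16_t)delta;
        if (read_bit(br))
            rps->used_by_curr_pic_s0_flags |= (uint16_t)(1u << i);
    }
    for (uint32_t i = 0; i < pos; i++) {
        uint32_t delta = read_ue(br);
        if (delta > 0x7fff)
            return -1;
        rps->delta_poc_s1_minus1[i] = (uint16_t)delta;
        if (read_bit(br))
            rps->used_by_curr_pic_s1_flags |= (uint16_t)(1u << i);
    }
    return br->error ? -1 : 0;
}
```

Two things this sketch leaves out would account for much of the remaining work in any of the three options: the NAL payload must have its emulation-prevention bytes (the 0x03 in each 0x000003 sequence) removed before bit reading, and the inter-predicted branch needs the previously parsed sets to derive NumDeltaPocs for the reference set.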