ampere-kernel-decoders/phase0_findings_iter2.md
marfrit 299e376d51 iter2 phase0 update: upstream-consumer survey closes Q1 + Q2
GStreamer's MERGED v4l2_codec_h265_dec_fill_ext_sps_rps in
gst-plugins-bad (GStreamer 1.28, MR !10820) is the primary upstream
reference. Walks its own gst_h265_parser_*'s GstH265SPS.short_term_ref_pic_set[]
array, field names match the H.265 spec, one-to-one
mapping to the V4L2 control struct. Header strategy: runtime-optional
control probe, NO #ifndef shim.

Casanova's FFmpeg WIP branch (v4l2-request-ext-sps-rps-n8.0.1 at
gitlab.collabora.com) is the secondary reference — walks libavcodec
internal HEVCSPS->st_rps[] with different field names. Useful as
cross-check but not the primary template (renaming gymnastics).

cros-codecs has no support yet (would follow GStreamer's shape if
added). Casanova's kernel-test framework uses fluster through these
two upstream consumers; no other reference exists.

Q1 (architecture): resolved — implement H.265 SPS parser in backend,
mirror GStreamer pattern with spec-compliant field names.
Q2 (UAPI shim): resolved — runtime-optional control probe per
GStreamer pattern, NOT #ifndef shim.

Remaining sub-question for Phase 1: parser SOURCE (vendor GStreamer's
gsth265parser.c, adapt to backend idioms, or implement minimal fresh
from H.265 §7.3.7).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:38:52 +00:00


Phase 0 — iter2 (HEVC backend EXT_SPS_*_RPS extension) substrate

Closed 2026-05-16 evening, post-meta-iter1-close.

Research question

Can a libva-v4l2-request-fourier patch that registers and populates V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS and _LT_RPS unblock HEVC HW decode on ampere RK3588 — and if so, what is the source of the RPS array contents (which VAAPI's VAPictureParameterBufferHEVC does NOT expose)?

Substrate

Backend HEVC code layout (in ~/src/libva-v4l2-request-fourier/src/h265.c on ampere):

  • h265_fill_sps at line 96 — populates struct v4l2_ctrl_hevc_sps from VAPictureParameterBufferHEVC. Reads picture->num_short_term_ref_pic_sets (line 145) and picture->num_long_term_ref_pic_sps (line 146) into the SPS struct. Does NOT touch RPS arrays.
  • h265_fill_pps at line 173 — populates struct v4l2_ctrl_hevc_pps. Comment at line 238: "VAAPI does not expose either flag in VAPictureParameterBufferHEVC."
  • h265_fill_decode_params at ~line 256 — DECODE_PARAMS population; ends with comment at line 325 referencing iter31's va-st-rps-bits-is-slice-field correction (the field with the same name in different V4L2 structs has different semantics).
  • h265_fill_slice_params at line 361 — SLICE_PARAMS per slice. Has the iter31 α-29 fix: slice_params->short_term_ref_pic_set_size = picture->st_rps_bits (line 477+) — VAAPI's st_rps_bits is the slice-header bit-count, belongs here.
  • h265_set_controls (the call site that registers controls) at ~line 660 — registers 5 controls today: SPS, PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS via v4l2_set_controls. Plus DECODE_MODE + START_CODE registered earlier in context.c:465-469.

No H.265 bitstream parser exists. The backend has h264_slice_header.{c,h} for H.264 slice-header parsing (precedent that the codebase does this when needed), but no h265_* parser file.

VAAPI's VAPictureParameterBufferHEVC only exposes RPS COUNTS, not contents. Confirmed by grepping all VAPicture*HEVC field references in h265.c: the only RPS-related fields read are the counts num_short_term_ref_pic_sets and num_long_term_ref_pic_sps, plus the slice-header bit-count st_rps_bits (which goes to SLICE_PARAMS). There is no delta_poc_s0_minus1[], no delta_idx_minus1, and no other per-RPS field; VAAPI's struct simply doesn't carry them.

Kernel struct shapes for the new controls (from ~/src/linux-rockchip/include/uapi/linux/v4l2-controls.h):

struct v4l2_ctrl_hevc_ext_sps_st_rps {  // dynamic array, sized by sps->num_short_term_ref_pic_sets, ≤65 entries
    __u8  delta_idx_minus1;
    __u8  delta_rps_sign;
    __u8  num_negative_pics;
    __u8  num_positive_pics;
    __u32 used_by_curr_pic;
    __u32 use_delta_flag;
    __u16 abs_delta_rps_minus1;
    __u16 delta_poc_s0_minus1[16];
    __u16 delta_poc_s1_minus1[16];
    __u16 flags;  // V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED
};

struct v4l2_ctrl_hevc_ext_sps_lt_rps {  // dynamic array, sized by sps->num_long_term_ref_pics_sps, ≤65 entries
    __u16 lt_ref_pic_poc_lsb_sps;
    __u16 flags;  // V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT
};

linux-api-headers 6.19-1 on ampere does NOT define these, so the backend would need a local UAPI shim (there is no hevc-ctrls/ directory in the backend today; it would need to be added).
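As a sketch of what a minimal internal header would have to carry, the two kernel shapes above can be mirrored with <stdint.h> types and their layout pinned with static assertions. The struct and helper names here are hypothetical, and the key assumption is that field order and widths match the kernel header verbatim. The CID constants are deliberately left out, because their values must be copied from the upstream UAPI rather than guessed. As the V4L2 documentation describes dynamic-array controls, the payload is n consecutive elements with the control's size set to n times the element size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirrors of the kernel UAPI structs, with <stdint.h>
 * types standing in for __u8/__u16/__u32. ASSUMPTION: field order and widths
 * match include/uapi/linux/v4l2-controls.h verbatim; re-check before use. */
struct ext_sps_st_rps_mirror {
    uint8_t  delta_idx_minus1;
    uint8_t  delta_rps_sign;
    uint8_t  num_negative_pics;
    uint8_t  num_positive_pics;
    uint32_t used_by_curr_pic;
    uint32_t use_delta_flag;
    uint16_t abs_delta_rps_minus1;
    uint16_t delta_poc_s0_minus1[16];
    uint16_t delta_poc_s1_minus1[16];
    uint16_t flags;
};

struct ext_sps_lt_rps_mirror {
    uint16_t lt_ref_pic_poc_lsb_sps;
    uint16_t flags;
};

/* With natural alignment the ST element has no holes: 4 x u8 (bytes 0-3),
 * 2 x u32 (4-11), u16 (12-13), 2 x u16[16] (14-77), u16 flags (78-79). */
_Static_assert(offsetof(struct ext_sps_st_rps_mirror, delta_poc_s0_minus1) == 14,
               "unexpected ST_RPS layout");
_Static_assert(sizeof(struct ext_sps_st_rps_mirror) == 80, "unexpected ST_RPS size");
_Static_assert(sizeof(struct ext_sps_lt_rps_mirror) == 4, "unexpected LT_RPS size");

/* Byte size of an n-element dynamic-array payload (elements back to back). */
static size_t st_rps_payload_bytes(unsigned num_sets)
{
    return (size_t)num_sets * sizeof(struct ext_sps_st_rps_mirror);
}

static size_t lt_rps_payload_bytes(unsigned num_lt)
{
    return (size_t)num_lt * sizeof(struct ext_sps_lt_rps_mirror);
}
```

If one of the static assertions fires on a build host, the mirror has drifted from the assumed layout, which is exactly the failure mode a hand-copied header invites.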

Kernel function that crashes (from rkvdec-hevc-common.c:380-410):

static void rkvdec_hevc_prepare_hw_st_rps(struct rkvdec_hevc_run *run, struct rkvdec_rps *rps,
                                          struct v4l2_ctrl_hevc_ext_sps_st_rps *cache)
{
    if (!run->ext_sps_st_rps)
        return;                                                       // ← early return for NULL pointer
    if (!memcmp(cache, run->ext_sps_st_rps,                          // ← OOPSes here per the stack trace
                sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps)))
        return;
    /* ... per-element processing */
}

The crash IS in this memcmp. For the crash to happen at all:

  • run->ext_sps_st_rps must be non-NULL (else early-return fires before memcmp), AND
  • memcmp must dereference an unmapped / invalid address from one of cache or run->ext_sps_st_rps.

Open mechanism question: how does run->ext_sps_st_rps become a non-NULL pointer to invalid memory when the userspace never sets the control? Two candidates:

  • (a) V4L2 control framework auto-allocates the control's p_cur.p to a default-zeroed buffer; later, v4l2_ctrl_find returns a control whose p_cur.p is a stale sentinel after some state transition.
  • (b) The control storage is lazily allocated only on first set, but v4l2_ctrl_find returns the registered control object whose p_cur.p is whatever the registration-time stub left it as (likely uninitialized).

Resolving (a) vs (b) requires reading drivers/media/v4l2-core/v4l2-ctrls-*.c for the auto-allocation behavior of dynamic-array controls. Phase 2 work — not Phase 0.

In-session baseline anchor for iter2

The HEVC OOPS reproducer remains as captured in ampere-fourier iter1 Phase 0:

LIBVA_DRIVER_NAME=v4l2_request \
ffmpeg -hide_banner -hwaccel vaapi -hwaccel_output_format vaapi \
    -i ~/measurements/encoded/bbb_60s_720p.hevc.mp4 \
    -vf "hwdownload,format=nv12" -frames:v 30 -f null -
# → kernel OOPS in dmesg, v4l2_mem2mem wedges all decoders until reboot

This is the iter2 falsifier: if a backend patch makes this stop OOPSing, the survey hypothesis is corroborated. If it still OOPSes the same way, the mechanism is something else.

Existing precedent: UAPI shim files

The backend currently has NO hevc-ctrls/ directory (was searched; doesn't exist). The H.264 path uses system kernel headers via <linux/v4l2-controls.h>. Adding new HEVC CIDs that aren't in linux-api-headers 6.19-1 will require:

  • Adding a hevc-ctrls/ directory with a local stub header that defines the missing constants + structs (matching the kernel 7.0 definitions verbatim).
  • OR bumping the linux-api-headers package on ampere to 7.0+.

Per the fresnel-iter25 / feedback_rkvdec_image_fmt_pre_seed precedent, the backend ships local UAPI shims when the kernel side gets ahead of distro headers. Iter2 follows that precedent unless the operator prefers the headers-bump route.

Open questions tabled into Phase 1

  1. Architecture for RPS data sourcing (the BIG one): given VAAPI doesn't expose the RPS table contents, how does the backend obtain them?
    • (A) Implement H.265 SPS bitstream parser in the backend — ~800-1500 lines of new code, well-defined per H.265 spec §7.3.2.2 + §7.3.7, follows h264_slice_header.c precedent. Highest scope, but self-contained and doesn't add dependencies.
    • (B) Test the "minimal patch with zero-init RPS data" hypothesis first. If just registering the controls (with delta_idx_minus1=0, num_*_pics=0, etc.) eliminates the OOPS, HEVC decode probably produces wrong/black frames but no longer crashes. This stages the risk: the zero-init patch confirms the mechanism, and real parsing follows. This is the staged approach Phase 1 of the META campaign already named as iter2's first concrete action.
    • (C) Link libavcodec's HEVC parser — adds a build-time dep on FFmpeg's HEVC code, would expose H265RawSPS. Avoids reimplementing the parser. Out of campaign-typical practice (backend is minimal-deps); operator decision.
    • (D) Some other channel I haven't identified — e.g. ffmpeg-vaapi's VABufferType ecosystem may have an SPS-RPS extension somewhere; Phase 2 would need to confirm.
  2. linux-api-headers shim vs bump: ship hevc-ctrls/ per iter25 precedent, or bump the package?
  3. Mechanism reconstruction depth: do we need to read v4l2-ctrls-*.c to fully understand WHY the OOPS happens, or is "make ext_sps_*_rps non-NULL with valid data" empirically sufficient to validate the fix?
  4. Test-decode reference clip: BBB 60s 720p HEVC is the iter1 substrate; works for iter2 too. No new clip needed.
  5. Phase 7 verification anchor: ampere-fourier iter1 baseline (H.264 + VP8 + MPEG-2 still PASS C1-C6) PLUS new HEVC C1-C6 — iter2's Phase 1 success criteria should mirror iter1's per-codec C1-C6 with HEVC added; floor for HEVC SSIM Y at f720 expected in H.264-drift territory (~0.65 ± 0.05) per fresnel iter1 + ampere iter1 convergent observations.
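Option (B) in Q1 amounts to sending zero-filled arrays sized by the two counts the backend already reads from VAPictureParameterBufferHEVC (num_short_term_ref_pic_sets and num_long_term_ref_pic_sps). A minimal sketch of building those payloads, assuming mirror structs with the kernel layout shown earlier; make_zero_rps and the other names are hypothetical. A zeroed element means flags == 0 (no inter-RPS prediction) and num_negative_pics == num_positive_pics == 0. The range checks use the H.265 SPS semantic limits (0..64 short-term sets, 0..32 long-term pictures).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirrors of the kernel structs (layout assumed identical). */
struct ext_sps_st_rps_mirror {
    uint8_t  delta_idx_minus1, delta_rps_sign, num_negative_pics, num_positive_pics;
    uint32_t used_by_curr_pic, use_delta_flag;
    uint16_t abs_delta_rps_minus1;
    uint16_t delta_poc_s0_minus1[16], delta_poc_s1_minus1[16];
    uint16_t flags;
};
struct ext_sps_lt_rps_mirror { uint16_t lt_ref_pic_poc_lsb_sps, flags; };

struct zero_rps {
    struct ext_sps_st_rps_mirror *st;  /* NULL when there are no ST sets */
    size_t st_bytes;
    struct ext_sps_lt_rps_mirror *lt;  /* NULL when there are no LT pictures */
    size_t lt_bytes;
};

/* Builds zero-filled ST/LT payloads sized from the VAAPI counts.
 * Returns 0 on success, -1 on out-of-range counts or allocation failure. */
static int make_zero_rps(unsigned num_st, unsigned num_lt, struct zero_rps *out)
{
    /* H.265 SPS semantics: num_short_term_ref_pic_sets is 0..64,
     * num_long_term_ref_pics_sps is 0..32. */
    if (num_st > 64 || num_lt > 32)
        return -1;
    out->st = num_st ? calloc(num_st, sizeof *out->st) : NULL;
    out->lt = num_lt ? calloc(num_lt, sizeof *out->lt) : NULL;
    if ((num_st && !out->st) || (num_lt && !out->lt)) {
        free(out->st);
        free(out->lt);
        return -1;
    }
    out->st_bytes = num_st * sizeof *out->st;
    out->lt_bytes = num_lt * sizeof *out->lt;
    return 0;
}

static void free_zero_rps(struct zero_rps *z)
{
    free(z->st);
    free(z->lt);
}
```

Such buffers only probe the mechanism: as option (B) itself notes, frames decoded with all-zero RPS data would be expected to be wrong even if the OOPS disappears.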

Phase 0 close

Substrate captured: 5 existing HEVC controls in the backend, no H.265 parser, VAAPI doesn't expose RPS contents, kernel struct shapes documented, mechanism partially understood (memcmp dereferences invalid memory; precise cause = open Q3). 5 open questions for Phase 1, with Q1 (architecture for RPS sourcing) being the load-bearing decision.


Upstream-consumer survey (added 2026-05-16 post-Phase-0)

Per feedback_upstream_alignment_over_speed, surveyed real upstream V4L2 stateless HEVC consumers for the EXT_SPS_*_RPS pattern. Subagent transcript: ~/.../tasks/aa6f3e6382bc0d721.output. Findings:

| Consumer | Status | Pattern |
| --- | --- | --- |
| GStreamer | MERGED for GStreamer 1.28 (!10820) | Walks its own gst_h265_parser_*'s GstH265SPS.short_term_ref_pic_set[] array; field names match the H.265 spec, one-to-one mapping to the V4L2 struct. Header strategy: runtime-optional control probe, NO #ifndef shim. File: subprojects/gst-plugins-bad/sys/v4l2codecs/gstv4l2codech265dec.c, function gst_v4l2_codec_h265_dec_fill_ext_sps_rps |
| FFmpeg (Casanova WIP) | Not yet on ffmpeg-devel (branch v4l2-request-ext-sps-rps-n8.0.1 at gitlab.collabora.com) | Walks libavcodec's internal HEVCSPS->st_rps[] (field names differ from the spec: rps_predict, delta_idx, abs_delta_rps, etc., so translation is required). LT_RPS commented out (incomplete). Function: fill_ext_sps_st_rps in libavcodec/v4l2_request_hevc.c |
| cros-codecs | No support yet (when added, would parse via its own cros_codecs::codec::h265::parser::Sps, the same shape as GStreamer) | n/a |
| Casanova kernel-test framework | fluster through GStreamer 1.28 + Collabora FFmpeg WIP; no separate reference consumer | n/a |
| Bootlin libva-v4l2-request | Dormant since 2019, no 7.0-UAPI work | n/a |

Upstream-aligned pattern is unambiguous: parse the H.265 SPS NAL ourselves, populate the V4L2 controls from our parser's output. Both active upstream consumers (GStreamer merged, FFmpeg WIP) follow this exactly. VAAPI does not and will not expose the RPS array content, so we must parse.
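Parsing the SPS NAL starts one layer below the SPS fields: the two-byte HEVC NAL unit header identifies an SPS (nal_unit_type 33, SPS_NUT), and every emulation_prevention_three_byte (the 0x03 in a 0x00 0x00 0x03 sequence) must be removed to recover the RBSP over which the §7.3.2.2 syntax is defined. A minimal sketch, assuming the backend already holds the SPS NAL unit bytes without an Annex B start code; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { HEVC_NUT_SPS = 33 };   /* SPS_NUT in H.265 Table 7-1 */

/* nal_unit_type from the 2-byte HEVC NAL unit header (§7.3.1.2):
 * forbidden_zero_bit(1) nal_unit_type(6) nuh_layer_id(6) nuh_temporal_id_plus1(3).
 * Returns -1 if the buffer is too short to hold a header. */
static int hevc_nal_type(const uint8_t *nal, size_t len)
{
    return len >= 2 ? (nal[0] >> 1) & 0x3f : -1;
}

/* Copies NAL payload bytes (after the 2-byte header) into `out`, dropping each
 * emulation_prevention_three_byte, i.e. a 0x03 that follows two 0x00 bytes
 * (§7.3.1.1). `out` must have room for `len` bytes; returns the RBSP length. */
static size_t nal_payload_to_rbsp(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    int zeros = 0;                       /* consecutive 0x00 bytes just copied */
    for (size_t i = 0; i < len; i++) {
        if (zeros >= 2 && in[i] == 0x03) {
            zeros = 0;                   /* skip it; the zero run restarts after it */
            continue;
        }
        out[n++] = in[i];
        zeros = in[i] == 0x00 ? zeros + 1 : 0;
    }
    return n;
}
```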

GStreamer's mapping is the cleanest reference: GstH265ShortTermRefPicSet field names mirror the H.265 spec, so the V4L2-control assignment is mechanical. FFmpeg's renaming gymnastics are a useful cross-check but should NOT be the primary template.

Header strategy decided: no #ifndef shim. Mirror GStreamer's "optional control" probe path — at backend init, VIDIOC_QUERYCTRL the two new CIDs; if both present and the active driver-kind is VDPU381/383 HEVC, set them; if absent, log + skip (graceful fallback for older kernels). Constants + struct shapes need to be available at compile time, however, so the build pipeline either requires linux-api-headers ≥ 7.0 OR ships a minimal internal header with just the two new CIDs + structs (with a comment pointing to the upstream UAPI source). Picking which of those is a tactical Phase 4 detail.
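The probe-then-decide step can be kept separate from the ioctl itself, which also makes it unit-testable. A sketch under stated assumptions: ctrl_query_fn is a hypothetical callback that on real hardware would wrap VIDIOC_QUERYCTRL (or VIDIOC_QUERY_EXT_CTRL, which also reports array dimensions), both of which fail with EINVAL for a control ID the driver does not expose; driver_needs_ext_rps stands in for the VDPU381/383 HEVC driver-kind check. The CIDs are passed in rather than hard-coded, since their values must come from the upstream UAPI.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical query callback: true if the driver exposes control `cid`.
 * A real implementation would issue ioctl(fd, VIDIOC_QUERYCTRL, &qc) with
 * qc.id = cid and treat failure (errno EINVAL) as "absent". */
typedef bool (*ctrl_query_fn)(void *ctx, uint32_t cid);

enum ext_rps_mode { EXT_RPS_SKIP, EXT_RPS_SET };

/* Decided once at backend init; the result gates whether the per-frame
 * path fills and sets the two EXT_SPS_*_RPS controls. */
static enum ext_rps_mode probe_ext_rps(ctrl_query_fn query, void *ctx,
                                       uint32_t st_cid, uint32_t lt_cid,
                                       bool driver_needs_ext_rps)
{
    if (!driver_needs_ext_rps)
        return EXT_RPS_SKIP;
    if (query(ctx, st_cid) && query(ctx, lt_cid))
        return EXT_RPS_SET;
    /* Graceful fallback for kernels without the new controls. */
    fprintf(stderr, "v4l2-request: HEVC EXT_SPS_ST/LT_RPS controls absent, skipping\n");
    return EXT_RPS_SKIP;
}

/* Table-driven stand-in for tests: ctx is a 0-terminated array of supported CIDs. */
static bool query_from_list(void *ctx, uint32_t cid)
{
    for (const uint32_t *p = ctx; *p != 0; p++)
        if (*p == cid)
            return true;
    return false;
}
```

Keeping the decision in one function means the fallback path (older kernel, or a non-VDPU38x driver) can be exercised without hardware.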

Phase 0 update — Q1 (architecture) and Q2 (UAPI shim) resolved

  • Q1 (architecture for RPS data sourcing): B — implement H.265 SPS parser in backend, mirroring GStreamer's gst_v4l2_codec_h265_dec_fill_ext_sps_rps pattern with one-to-one spec-compliant field names. Per-RPS-set + LT_RPS arrays.
  • Q2 (UAPI shim vs headers bump): runtime-optional control probe (not a header #ifndef shim). Compile-time access to the new CIDs/structs handled via either a headers package bump OR a minimal internal header — Phase 4 picks tactically.
  • Q3 (mechanism reconstruction depth): now lower priority. Once the backend populates valid RPS data per the upstream pattern, the OOPS should be gone whatever its precise cause was. If it is not, loop back to Phase 0 with whatever new evidence the failure surfaces.
  • Q4 (test clip): unchanged — BBB iter1 carries.
  • Q5 (Phase 7 anchor): unchanged — ampere-fourier iter1 + HEVC C1-C6 added.

Sub-question remaining for Phase 1 lock: what's the H.265 SPS parser source? Three options, all upstream-aligned:

  • (B1) Vendor GStreamer's parser: copy subprojects/gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.c (LGPL, compatible with libva backend license). Keeps backend self-contained; reuses thoroughly tested code; carries forward GStreamer's spec-compliant field naming. Mostly a copy + minor adaptation (drop the GLib and GStreamer base-library dependencies, e.g. GstBitReader and the codecparsers NAL-reader helpers, or replace them with libc equivalents).
  • (B2) Adapt GStreamer's parser to the backend's idioms: same data flow but rewritten to match h264_slice_header.c style (plain C, no GLib). More work; fewer LOC.
  • (B3) Implement minimal SPS-RPS-only parser fresh from H.265 spec §7.3.7 — narrowest scope (just the bits needed for the two controls), but does not benefit from GStreamer's edge-case handling.

(B1) is the most upstream-aligned. (B2) is the same data flow with the project's house style. (B3) is the most minimal but reinvents.
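For a sense of what (B3) involves, here is a sketch of the core of §7.3.7 as it appears inside the SPS: an Exp-Golomb bit reader plus st_ref_pic_set() parsing. The easily missed part is that an inter-predicted set's flag count depends on NumDeltaPocs of the previous set, so even a parser that only forwards raw syntax elements to the V4L2 control has to run the §7.4.8 delta-POC derivation. This is an illustrative sketch, not GStreamer's code, and all names are hypothetical. It assumes RBSP input (emulation prevention already removed), keeps only the derived state later sets need, and does not capture the raw fields for v4l2_ctrl_hevc_ext_sps_st_rps, the long-term part, or the rest of the SPS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MSB-first reader over RBSP bytes; `err` is sticky (overrun or out-of-range value). */
struct bitreader { const uint8_t *buf; size_t len, pos; bool err; };

static uint32_t read_bits(struct bitreader *br, int n)
{
    uint32_t v = 0;
    for (int i = 0; i < n; i++, br->pos++) {
        if (br->pos >= br->len * 8) { br->err = true; return 0; }
        v = (v << 1) | ((br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u);
    }
    return v;
}

/* ue(v) 0-th order Exp-Golomb (H.265 §9.2); values above `max` set err. */
static uint32_t read_ue(struct bitreader *br, uint32_t max)
{
    int zeros = 0;
    while (!br->err && read_bits(br, 1) == 0)
        if (++zeros > 31) br->err = true;
    uint32_t v = br->err ? 0 : ((1u << zeros) - 1) + read_bits(br, zeros);
    if (v > max) br->err = true;
    return br->err ? 0 : v;
}

/* Derived per-set state (§7.4.8) that later sets need. 17 slots so the
 * inter-RPS derivation cannot overrun before the NumDeltaPocs <= 16 check. */
struct st_rps { int num_negative, num_positive; int32_t delta_poc_s0[17], delta_poc_s1[17]; };

/* Parses st_ref_pic_set(idx) inside the SPS (idx < num_short_term_ref_pic_sets):
 * delta_idx_minus1 is absent there, so RefRpsIdx = idx - 1. Returns false on
 * error; the caller must stop, since later sets may predict from this one. */
static bool parse_st_rps(struct bitreader *br, int idx, struct st_rps *sets)
{
    struct st_rps *cur = &sets[idx];
    cur->num_negative = cur->num_positive = 0;
    if (idx != 0 && read_bits(br, 1)) {                 /* inter_ref_pic_set_prediction_flag */
        const struct st_rps *ref = &sets[idx - 1];
        int sign = (int)read_bits(br, 1);                /* delta_rps_sign */
        int32_t delta_rps = (1 - 2 * sign) * ((int32_t)read_ue(br, 32767) + 1);
        int n_ref = ref->num_negative + ref->num_positive;   /* NumDeltaPocs[RefRpsIdx] */
        bool use_delta[17];
        for (int j = 0; j <= n_ref; j++) {
            bool used = read_bits(br, 1);                /* used_by_curr_pic_flag[j] */
            use_delta[j] = used || read_bits(br, 1);     /* use_delta_flag[j], inferred 1 */
        }
        if (br->err)
            return false;
        int i = 0;                                       /* negative (S0) deltas */
        for (int j = ref->num_positive - 1; j >= 0; j--) {
            int32_t d = ref->delta_poc_s1[j] + delta_rps;
            if (d < 0 && use_delta[ref->num_negative + j]) cur->delta_poc_s0[i++] = d;
        }
        if (delta_rps < 0 && use_delta[n_ref]) cur->delta_poc_s0[i++] = delta_rps;
        for (int j = 0; j < ref->num_negative; j++) {
            int32_t d = ref->delta_poc_s0[j] + delta_rps;
            if (d < 0 && use_delta[j]) cur->delta_poc_s0[i++] = d;
        }
        cur->num_negative = i;
        i = 0;                                           /* positive (S1) deltas */
        for (int j = ref->num_negative - 1; j >= 0; j--) {
            int32_t d = ref->delta_poc_s0[j] + delta_rps;
            if (d > 0 && use_delta[j]) cur->delta_poc_s1[i++] = d;
        }
        if (delta_rps > 0 && use_delta[n_ref]) cur->delta_poc_s1[i++] = delta_rps;
        for (int j = 0; j < ref->num_positive; j++) {
            int32_t d = ref->delta_poc_s1[j] + delta_rps;
            if (d > 0 && use_delta[ref->num_negative + j]) cur->delta_poc_s1[i++] = d;
        }
        cur->num_positive = i;
    } else {
        uint32_t nneg = read_ue(br, 16), npos = read_ue(br, 16);
        if (br->err || nneg + npos > 16)
            return false;
        int32_t poc = 0;
        for (uint32_t k = 0; k < nneg; k++) {
            poc -= (int32_t)read_ue(br, 32767) + 1;      /* delta_poc_s0_minus1[k] */
            read_bits(br, 1);                            /* used_by_curr_pic_s0_flag[k] */
            cur->delta_poc_s0[k] = poc;
        }
        poc = 0;
        for (uint32_t k = 0; k < npos; k++) {
            poc += (int32_t)read_ue(br, 32767) + 1;      /* delta_poc_s1_minus1[k] */
            read_bits(br, 1);                            /* used_by_curr_pic_s1_flag[k] */
            cur->delta_poc_s1[k] = poc;
        }
        cur->num_negative = (int)nneg;
        cur->num_positive = (int)npos;
    }
    return !br->err && cur->num_negative + cur->num_positive <= 16;
}
```

In an SPS parser these calls would sit in a loop over idx = 0 .. num_short_term_ref_pic_sets - 1 sharing one struct st_rps array of up to 64 entries, aborting on the first false return.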