ampere-kernel-decoders/phase0_findings_iter2.md
marfrit cd047a34de iter2 phase0: HEVC backend extension substrate
5 existing HEVC controls in backend (SPS/PPS/SLICE_PARAMS/
SCALING_MATRIX/DECODE_PARAMS at h265.c:660-688) + DECODE_MODE/START_CODE
in context.c. No H.265 bitstream parser in backend (h264_slice_header.c
is the only such precedent, and it covers H.264 only).

CRITICAL substrate finding: VAAPI VAPictureParameterBufferHEVC
exposes RPS COUNTS (num_short_term_ref_pic_sets,
num_long_term_ref_pic_sps) but NOT the per-RPS array contents
(delta_poc_s0_minus1[], delta_idx_minus1, etc.). So the backend
can't just copy from VAAPI — needs another data source.

5 open questions tabled for iter2 Phase 1, with Q1 = architecture
for RPS data sourcing being load-bearing:
  A. Implement H.265 SPS parser in backend (~800-1500 LOC)
  B. Stage-A test minimal-patch hypothesis (zero-init RPS) first
  C. Link libavcodec's H265RawSPS (adds FFmpeg build dep)
  D. Some other channel TBD (e.g. VAAPI extension buffer)

Plus Q2 (linux-api-headers shim vs bump), Q3 (mechanism depth),
Q4 (test clip — BBB iter1 carries), Q5 (Phase 7 anchor).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:33:50 +00:00


Phase 0 — iter2 (HEVC backend EXT_SPS_*_RPS extension) substrate

Closed 2026-05-16 evening, post-meta-iter1-close.

Research question

Can a libva-v4l2-request-fourier patch that registers and populates V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS and _LT_RPS unblock HEVC HW decode on ampere RK3588 — and if so, what is the source of the RPS array contents (which VAAPI's VAPictureParameterBufferHEVC does NOT expose)?

Substrate

Backend HEVC code layout (in ~/src/libva-v4l2-request-fourier/src/h265.c on ampere):

  • h265_fill_sps at line 96 — populates struct v4l2_ctrl_hevc_sps from VAPictureParameterBufferHEVC. Reads picture->num_short_term_ref_pic_sets (line 145) and picture->num_long_term_ref_pic_sps (line 146) into the SPS struct. Does NOT touch RPS arrays.
  • h265_fill_pps at line 173 — populates struct v4l2_ctrl_hevc_pps. Comment at line 238: "VAAPI does not expose either flag in VAPictureParameterBufferHEVC."
  • h265_fill_decode_params at ~line 256 — DECODE_PARAMS population; ends with comment at line 325 referencing iter31's va-st-rps-bits-is-slice-field correction (the field with the same name in different V4L2 structs has different semantics).
  • h265_fill_slice_params at line 361 — SLICE_PARAMS per slice. Has the iter31 α-29 fix: slice_params->short_term_ref_pic_set_size = picture->st_rps_bits (line 477+) — VAAPI's st_rps_bits is the slice-header bit-count, belongs here.
  • h265_set_controls (the call site that registers controls) at ~line 660 — registers 5 controls today: SPS, PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS via v4l2_set_controls. Plus DECODE_MODE + START_CODE registered earlier in context.c:465-469.

No H.265 bitstream parser exists. The backend has h264_slice_header.{c,h} for H.264 slice-header parsing (precedent that the codebase does this when needed), but no h265_* parser file.

VAAPI's VAPictureParameterBufferHEVC only exposes RPS COUNTS, not contents. Confirmed by grepping all VAPicture*HEVC field references in h265.c — only num_short_term_ref_pic_sets and num_long_term_ref_pic_sps are read, no delta_poc_s0_minus1[], no delta_idx_minus1, no per-RPS fields. VAAPI's struct simply doesn't carry them.
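For a sense of what reading the missing fields from the bitstream involves, here is a minimal sketch of the explicit (non-predicted) branch of st_ref_pic_set() from H.265 §7.3.7. It is not backend code: struct bitreader, br_u, br_ue, struct st_rps_fields and parse_st_rps_explicit are hypothetical names, the input is assumed to be RBSP with emulation-prevention bytes already stripped, and the two used_*_mask fields are an illustrative packing, not the kernel's used_by_curr_pic layout. The inter-RPS-predicted branch and the rest of the SPS are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over an RBSP buffer. */
struct bitreader {
    const uint8_t *buf;
    size_t size_bits;   /* total readable bits */
    size_t pos;         /* index of the next bit */
    int error;          /* set to 1 on overrun or malformed code */
};

/* u(n): read n bits, most significant first. */
static uint32_t br_u(struct bitreader *br, unsigned int n)
{
    uint32_t v = 0;

    assert(n <= 32);
    while (n--) {
        if (br->pos >= br->size_bits) {
            br->error = 1;
            return 0;
        }
        v = (v << 1) | ((br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u);
        br->pos++;
    }
    return v;
}

/* ue(v): count leading zero bits, skip the stop bit, read that many suffix bits. */
static uint32_t br_ue(struct bitreader *br)
{
    unsigned int zeros = 0;

    while (!br->error && br_u(br, 1) == 0) {
        if (++zeros > 31) {
            br->error = 1;
            return 0;
        }
    }
    if (br->error)
        return 0;
    return ((1u << zeros) - 1u) + br_u(br, zeros);
}

/* Hypothetical holder for one explicitly coded short-term RPS. */
struct st_rps_fields {
    uint8_t  num_negative_pics;
    uint8_t  num_positive_pics;
    uint16_t delta_poc_s0_minus1[16];
    uint16_t delta_poc_s1_minus1[16];
    uint16_t used_s0_mask;  /* bit i = used_by_curr_pic_s0_flag[i] (illustrative packing) */
    uint16_t used_s1_mask;  /* bit i = used_by_curr_pic_s1_flag[i] (illustrative packing) */
};

/*
 * Parse the syntax that follows when inter_ref_pic_set_prediction_flag is 0
 * (or is absent because this is set index 0). The caller is assumed to have
 * consumed that flag already. Returns 0 on success, -1 on error.
 */
static int parse_st_rps_explicit(struct bitreader *br, struct st_rps_fields *out)
{
    uint32_t neg, pos, i;

    *out = (struct st_rps_fields){0};
    neg = br_ue(br);
    pos = br_ue(br);
    if (br->error || neg > 16 || pos > 16)
        return -1;
    out->num_negative_pics = (uint8_t)neg;
    out->num_positive_pics = (uint8_t)pos;

    for (i = 0; i < neg; i++) {
        out->delta_poc_s0_minus1[i] = (uint16_t)br_ue(br);
        if (br_u(br, 1))
            out->used_s0_mask |= (uint16_t)(1u << i);
    }
    for (i = 0; i < pos; i++) {
        out->delta_poc_s1_minus1[i] = (uint16_t)br_ue(br);
        if (br_u(br, 1))
            out->used_s1_mask |= (uint16_t)(1u << i);
    }
    return br->error ? -1 : 0;
}
```

A real parser would also need NAL unit framing, emulation-prevention removal, every SPS field that precedes the RPS loop, and the predicted branch (delta_idx_minus1, delta_rps_sign, abs_delta_rps_minus1, use_delta_flag), which is where most of the line count would go.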

Kernel struct shapes for the new controls (from ~/src/linux-rockchip/include/uapi/linux/v4l2-controls.h):

struct v4l2_ctrl_hevc_ext_sps_st_rps {  // dynamic array, sized by sps->num_short_term_ref_pic_sets, ≤65 entries
    __u8  delta_idx_minus1;
    __u8  delta_rps_sign;
    __u8  num_negative_pics;
    __u8  num_positive_pics;
    __u32 used_by_curr_pic;
    __u32 use_delta_flag;
    __u16 abs_delta_rps_minus1;
    __u16 delta_poc_s0_minus1[16];
    __u16 delta_poc_s1_minus1[16];
    __u16 flags;  // V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED
};

struct v4l2_ctrl_hevc_ext_sps_lt_rps {  // dynamic array, sized by sps->num_long_term_ref_pics_sps, ≤65 entries
    __u16 lt_ref_pic_poc_lsb_sps;
    __u16 flags;  // V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT
};

linux-api-headers 6.19-1 on ampere does NOT define these, so the backend would need a local UAPI shim. No hevc-ctrls/ directory exists in the backend today; one would have to be added.
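A local shim would have to reproduce these layouts byte for byte. The sketch below mirrors the two structs with stdint types and pins the sizes with static assertions. It assumes the field lists quoted above are complete and carry no packing attributes; the mirror_* names and the two helper functions are hypothetical, and the control ID values are deliberately left out because they are not quoted in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the ST RPS entry quoted above (hypothetical name),
 * with uint8_t/uint16_t/uint32_t standing in for __u8/__u16/__u32. */
struct mirror_hevc_ext_sps_st_rps {
    uint8_t  delta_idx_minus1;
    uint8_t  delta_rps_sign;
    uint8_t  num_negative_pics;
    uint8_t  num_positive_pics;
    uint32_t used_by_curr_pic;
    uint32_t use_delta_flag;
    uint16_t abs_delta_rps_minus1;
    uint16_t delta_poc_s0_minus1[16];
    uint16_t delta_poc_s1_minus1[16];
    uint16_t flags;
};

/* Userspace mirror of the LT RPS entry quoted above (hypothetical name). */
struct mirror_hevc_ext_sps_lt_rps {
    uint16_t lt_ref_pic_poc_lsb_sps;
    uint16_t flags;
};

/* With natural alignment the quoted field order leaves no padding holes:
 * 4 + 8 + 2 + 32 + 32 + 2 = 80 bytes per ST entry, 4 bytes per LT entry. */
static_assert(sizeof(struct mirror_hevc_ext_sps_st_rps) == 80, "ST RPS entry size");
static_assert(offsetof(struct mirror_hevc_ext_sps_st_rps, delta_poc_s0_minus1) == 14,
              "ST RPS array offset");
static_assert(sizeof(struct mirror_hevc_ext_sps_lt_rps) == 4, "LT RPS entry size");

/* Payload size in bytes for a dynamic array of n ST entries. */
static size_t st_rps_payload_bytes(unsigned int n)
{
    return (size_t)n * sizeof(struct mirror_hevc_ext_sps_st_rps);
}

/* Payload size in bytes for a dynamic array of n LT entries. */
static size_t lt_rps_payload_bytes(unsigned int n)
{
    return (size_t)n * sizeof(struct mirror_hevc_ext_sps_lt_rps);
}
```

Under these assumptions a full 65-entry ST array is 5200 bytes and a full LT array is 260 bytes. If the real kernel header disagrees with either assertion, the mirror is wrong and the shim should copy the kernel definition verbatim instead.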

Kernel function that crashes (from rkvdec-hevc-common.c:380-410):

static void rkvdec_hevc_prepare_hw_st_rps(struct rkvdec_hevc_run *run, struct rkvdec_rps *rps,
                                          struct v4l2_ctrl_hevc_ext_sps_st_rps *cache)
{
    if (!run->ext_sps_st_rps)
        return;                                                       // ← early return for NULL pointer
    if (!memcmp(cache, run->ext_sps_st_rps,                          // ← OOPSes here per the stack trace
                sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps)))
        return;
    /* ... per-element processing */
}

The crash IS in this memcmp. For the crash to happen at all:

  • run->ext_sps_st_rps must be non-NULL (else early-return fires before memcmp), AND
  • memcmp must dereference an unmapped / invalid address from one of cache or run->ext_sps_st_rps.
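The two conditions can be modelled in userspace. The sketch below is not the kernel function: struct rps_entry, enum rps_action and classify_rps are hypothetical, and the 80-byte entry is a placeholder size. It only illustrates that the first guard filters out NULL and nothing else, so any non-NULL pointer, valid or not, reaches the memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one RPS entry; only its size matters here. */
struct rps_entry {
    unsigned char bytes[80];
};

enum rps_action {
    RPS_SKIP_NO_CONTROL,  /* control pointer was NULL */
    RPS_SKIP_UNCHANGED,   /* first entry matches the cached copy */
    RPS_REPROGRAM         /* first entry differs, hardware tables need rebuilding */
};

/* Models the two guards from the quoted function: a NULL test, then a
 * one-entry memcmp against the cache. */
static enum rps_action classify_rps(const struct rps_entry *cache,
                                    const struct rps_entry *ctrl)
{
    assert(cache != NULL);

    if (!ctrl)
        return RPS_SKIP_NO_CONTROL;

    /* From here on ctrl is dereferenced. A non-NULL pointer to unmapped
     * memory passes the test above and faults inside memcmp. */
    if (!memcmp(cache, ctrl, sizeof(*ctrl)))
        return RPS_SKIP_UNCHANGED;

    return RPS_REPROGRAM;
}
```

With valid buffers the model returns one of the three actions and never faults, which is why the observed OOPS points at the pointer value itself and not at the comparison logic.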

Open mechanism question: how does run->ext_sps_st_rps become a non-NULL pointer to invalid memory when the userspace never sets the control? Two candidates:

  • (a) V4L2 control framework auto-allocates the control's p_cur.p to a default-zeroed buffer; later, v4l2_ctrl_find returns a control whose p_cur.p is a stale sentinel after some state transition.
  • (b) The control storage is lazily allocated only on first set, but v4l2_ctrl_find returns the registered control object whose p_cur.p is whatever the registration-time stub left it as (likely uninitialized).

Resolving (a) vs (b) requires reading drivers/media/v4l2-core/v4l2-ctrls-*.c for the auto-allocation behavior of dynamic-array controls. Phase 2 work — not Phase 0.

In-session baseline anchor for iter2

The HEVC OOPS reproducer remains as captured in ampere-fourier iter1 Phase 0:

LIBVA_DRIVER_NAME=v4l2_request \
ffmpeg -hide_banner -hwaccel vaapi -hwaccel_output_format vaapi \
    -i ~/measurements/encoded/bbb_60s_720p.hevc.mp4 \
    -vf "hwdownload,format=nv12" -frames:v 30 -f null -
# → kernel OOPS in dmesg, v4l2_mem2mem wedges all decoders until reboot

This is the iter2 falsifier: if a backend patch stops this command from OOPSing, the survey hypothesis is corroborated. If it still OOPSes the same way, the mechanism lies elsewhere.

Existing precedent: UAPI shim files

The backend currently has NO hevc-ctrls/ directory (was searched; doesn't exist). The H.264 path uses system kernel headers via <linux/v4l2-controls.h>. Adding new HEVC CIDs that aren't in linux-api-headers 6.19-1 will require:

  • Adding a hevc-ctrls/ directory with a local stub header that defines the missing constants + structs (matching the kernel 7.0 definitions verbatim).
  • OR bumping the linux-api-headers package on ampere to 7.0+.

Per the fresnel-iter25 / feedback_rkvdec_image_fmt_pre_seed precedent, the backend ships local UAPI shims when the kernel side gets ahead of distro headers. Iter2 follows that precedent unless the operator prefers the headers-bump route.

Open questions tabled into Phase 1

  1. Architecture for RPS data sourcing (the BIG one): given VAAPI doesn't expose the RPS table contents, how does the backend obtain them?
    • (A) Implement H.265 SPS bitstream parser in the backend — ~800-1500 lines of new code, well-defined per H.265 spec §7.3.2.2 + §7.3.7, follows h264_slice_header.c precedent. Highest scope, but self-contained and doesn't add dependencies.
    • (B) Test the "minimal patch with zero-init RPS data" hypothesis first. If just registering the controls (with delta_idx_minus1=0, num_*_pics=0, etc.) eliminates the OOPS, HEVC decode will probably produce wrong or black frames but will not crash. This de-risks the work in stages: stage A (zero-init registration) confirms the mechanism, and stage B (real parsing) follows. The stage letters are unrelated to the option letters in this list. This is the staged approach that Phase 1 of the META campaign already named as iter2's first concrete action.
    • (C) Link libavcodec's HEVC parser — adds a build-time dep on FFmpeg's HEVC code, would expose H265RawSPS. Avoids reimplementing the parser. Out of campaign-typical practice (backend is minimal-deps); operator decision.
    • (D) Some other channel I haven't identified — e.g. ffmpeg-vaapi's VABufferType ecosystem may have an SPS-RPS extension somewhere; Phase 2 would need to confirm.
  2. linux-api-headers shim vs bump: ship hevc-ctrls/ per iter25 precedent, or bump the package?
  3. Mechanism reconstruction depth: do we need to read v4l2-ctrls-*.c to fully understand WHY the OOPS happens, or is "make ext_sps_*_rps non-NULL with valid data" empirically sufficient to validate the fix?
  4. Test-decode reference clip: BBB 60s 720p HEVC is the iter1 substrate; works for iter2 too. No new clip needed.
  5. Phase 7 verification anchor: the ampere-fourier iter1 baseline (H.264 + VP8 + MPEG-2 still PASS C1-C6) plus new HEVC C1-C6. Iter2's Phase 1 success criteria should mirror iter1's per-codec C1-C6 with HEVC added. The floor for HEVC SSIM Y at f720 is expected in H.264-drift territory (~0.65 ± 0.05), per the convergent fresnel iter1 and ampere iter1 observations.
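For option (B) in question 1, the payload side of the zero-init experiment is small. The sketch below only builds the zeroed buffer. It assumes the entry size is supplied by whatever struct definition the shim ends up providing and takes the 65-entry upper bound from the struct comments above; struct zeroed_payload, alloc_zeroed_rps and RPS_MAX_ENTRIES are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RPS_MAX_ENTRIES 65u  /* upper bound quoted in the struct comments */

/* Hypothetical pairing of a control payload with its size in bytes. */
struct zeroed_payload {
    void  *ptr;    /* calloc'd buffer, or NULL */
    size_t bytes;  /* count * entry_size, or 0 */
};

/*
 * Allocate `count` all-zero RPS entries of `entry_size` bytes each.
 * Returns {NULL, 0} when count is 0, exceeds the bound, or calloc fails,
 * so the caller can decide whether to skip the control entirely.
 */
static struct zeroed_payload alloc_zeroed_rps(unsigned int count, size_t entry_size)
{
    struct zeroed_payload p = { NULL, 0 };

    assert(entry_size > 0);
    if (count == 0 || count > RPS_MAX_ENTRIES)
        return p;

    p.ptr = calloc(count, entry_size);
    if (p.ptr)
        p.bytes = (size_t)count * entry_size;
    return p;
}
```

The ptr and bytes pair would then be handed to the kernel as the ptr and size members of a struct v4l2_ext_control, presumably alongside the five controls h265_set_controls already sends. Whether the driver accepts a zero-length array when the SPS declares no sets is not established here and would need checking in Phase 2.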

Phase 0 close

Substrate captured: 5 existing HEVC controls in the backend, no H.265 parser, VAAPI doesn't expose RPS contents, kernel struct shapes documented, mechanism partially understood (memcmp dereferences invalid memory; precise cause = open Q3). Five open questions go to Phase 1, with Q1 (architecture for RPS sourcing) being the load-bearing decision.