iter2 phase0: HEVC backend extension substrate

5 existing HEVC controls in backend (SPS/PPS/SLICE_PARAMS/
SCALING_MATRIX/DECODE_PARAMS at h265.c:660-688) + DECODE_MODE/
START_CODE in context.c. No H.265 bitstream parser in backend
(h264_slice_header.c is the only such precedent, for H.264).

CRITICAL substrate finding: VAAPI VAPictureParameterBufferHEVC
exposes RPS COUNTS (num_short_term_ref_pic_sets,
num_long_term_ref_pic_sps) but NOT the per-RPS array contents
(delta_poc_s0_minus1[], delta_idx_minus1, etc.). So the backend
can't just copy from VAAPI — needs another data source.

5 open questions tabled for iter2 Phase 1, with Q1 = architecture
for RPS data sourcing being load-bearing:
  A. Implement H.265 SPS parser in backend (~800-1500 LOC)
  B. Stage-A test minimal-patch hypothesis (zero-init RPS) first
  C. Link libavcodec's H265RawSPS (adds FFmpeg build dep)
  D. Some other channel TBD (e.g. VAAPI extension buffer)

Plus Q2 (linux-api-headers shim vs bump), Q3 (mechanism depth),
Q4 (test clip — BBB iter1 carries), Q5 (Phase 7 anchor).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:33:50 +00:00
parent 0b3c23ba66
commit cd047a34de
@@ -0,0 +1,108 @@
# Phase 0 — iter2 (HEVC backend EXT_SPS_*_RPS extension) substrate
Closed 2026-05-16 evening, post-meta-iter1-close.
## Research question
**Can a libva-v4l2-request-fourier patch that registers and populates `V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS` and `_LT_RPS` unblock HEVC HW decode on ampere RK3588 — and if so, what is the source of the RPS array contents (which VAAPI's `VAPictureParameterBufferHEVC` does NOT expose)?**
## Substrate
**Backend HEVC code layout** (in `~/src/libva-v4l2-request-fourier/src/h265.c` on ampere):
- `h265_fill_sps` at line 96 — populates `struct v4l2_ctrl_hevc_sps` from `VAPictureParameterBufferHEVC`. Reads `picture->num_short_term_ref_pic_sets` (line 145) and `picture->num_long_term_ref_pic_sps` (line 146) into the SPS struct. **Does NOT touch RPS arrays.**
- `h265_fill_pps` at line 173 — populates `struct v4l2_ctrl_hevc_pps`. Comment at line 238: *"VAAPI does not expose either flag in VAPictureParameterBufferHEVC."*
- `h265_fill_decode_params` at ~line 256 — DECODE_PARAMS population; ends with comment at line 325 referencing iter31's `va-st-rps-bits-is-slice-field` correction (the field with the same name in different V4L2 structs has different semantics).
- `h265_fill_slice_params` at line 361 — SLICE_PARAMS per slice. Has the iter31 α-29 fix: `slice_params->short_term_ref_pic_set_size = picture->st_rps_bits` (line 477+) — VAAPI's `st_rps_bits` is the slice-header bit-count, belongs here.
- `h265_set_controls` (the call site that registers controls) at ~line 660 — registers **5** controls today: SPS, PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS via `v4l2_set_controls`. Plus DECODE_MODE + START_CODE registered earlier in `context.c:465-469`.
**No H.265 bitstream parser exists.** The backend has `h264_slice_header.{c,h}` for H.264 slice-header parsing (precedent that the codebase does this when needed), but no `h265_*` parser file.
**VAAPI's `VAPictureParameterBufferHEVC` only exposes RPS COUNTS, not contents.** Confirmed by grepping all `VAPicture*HEVC` field references in h265.c — only `num_short_term_ref_pic_sets` and `num_long_term_ref_pic_sps` are read, no `delta_poc_s0_minus1[]`, no `delta_idx_minus1`, no per-RPS fields. VAAPI's struct simply doesn't carry them.
**Kernel struct shapes for the new controls** (from `~/src/linux-rockchip/include/uapi/linux/v4l2-controls.h`):
```c
struct v4l2_ctrl_hevc_ext_sps_st_rps { // dynamic array, sized by sps->num_short_term_ref_pic_sets, ≤65 entries
__u8 delta_idx_minus1;
__u8 delta_rps_sign;
__u8 num_negative_pics;
__u8 num_positive_pics;
__u32 used_by_curr_pic;
__u32 use_delta_flag;
__u16 abs_delta_rps_minus1;
__u16 delta_poc_s0_minus1[16];
__u16 delta_poc_s1_minus1[16];
__u16 flags; // V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED
};
struct v4l2_ctrl_hevc_ext_sps_lt_rps { // dynamic array, sized by sps->num_long_term_ref_pics_sps, ≤65 entries
__u16 lt_ref_pic_poc_lsb_sps;
__u16 flags; // V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT
};
```
`linux-api-headers 6.19-1` on ampere does NOT define these. The backend would need a local UAPI shim, and since it has no `hevc-ctrls/` dir today, one would need to be added.
**Kernel function that crashes** (from `rkvdec-hevc-common.c:380-410`):
```c
static void rkvdec_hevc_prepare_hw_st_rps(struct rkvdec_hevc_run *run, struct rkvdec_rps *rps,
struct v4l2_ctrl_hevc_ext_sps_st_rps *cache)
{
if (!run->ext_sps_st_rps)
return; // ← early return for NULL pointer
if (!memcmp(cache, run->ext_sps_st_rps, // ← OOPSes here per the stack trace
sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps)))
return;
/* ... per-element processing */
}
```
The crash IS in this memcmp. For the crash to happen at all:
- `run->ext_sps_st_rps` must be non-NULL (else early-return fires before memcmp), AND
- `memcmp` must dereference an unmapped / invalid address from one of `cache` or `run->ext_sps_st_rps`.
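The two conditions above can be exercised in a small userspace model of the guard. This is only a sketch: `struct st_rps_model` is a cut-down stand-in for the kernel struct, and `model_prepare_st_rps` and the `PREP_*` names are hypothetical, not kernel identifiers. It illustrates that the NULL check filters exactly one bad pointer value, so any other invalid address would still reach `memcmp`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Cut-down stand-in for v4l2_ctrl_hevc_ext_sps_st_rps (assumption: only
 * a few fields are kept, enough to make two payloads differ). */
struct st_rps_model {
    uint8_t  num_negative_pics;
    uint8_t  num_positive_pics;
    uint16_t flags;
};

enum prep_result {
    PREP_SKIPPED_NULL,      /* control pointer was NULL */
    PREP_SKIPPED_UNCHANGED, /* first element matches the cache */
    PREP_PROCESSED          /* would fall through to per-element work */
};

/* Mirrors the control flow of the quoted kernel function. The per-element
 * processing is elided in the source, so it is not modelled here. */
static enum prep_result
model_prepare_st_rps(const struct st_rps_model *ctrl,
                     const struct st_rps_model *cache)
{
    assert(cache != NULL);

    if (!ctrl)
        return PREP_SKIPPED_NULL;

    /* A non-NULL but invalid ctrl pointer would fault right here:
     * the NULL test above cannot tell it apart from a valid one. */
    if (!memcmp(cache, ctrl, sizeof(*cache)))
        return PREP_SKIPPED_UNCHANGED;

    return PREP_PROCESSED;
}
```

In this model an all-zero payload compared against an all-zero cache takes the second early return. Whether the real `cache` starts out zeroed is not established here, so treat that as something to check rather than a prediction about a zero-initialised control.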
**Open mechanism question**: how does `run->ext_sps_st_rps` become a non-NULL pointer to invalid memory when userspace never sets the control? Two candidates:
- (a) V4L2 control framework auto-allocates the control's `p_cur.p` to a default-zeroed buffer; later, `v4l2_ctrl_find` returns a control whose `p_cur.p` is a stale sentinel after some state transition.
- (b) The control storage is lazily allocated only on first set, but `v4l2_ctrl_find` returns the registered control object whose `p_cur.p` is whatever the registration-time stub left it as (likely uninitialized).
Resolving (a) vs (b) requires reading `drivers/media/v4l2-core/v4l2-ctrls-*.c` for the auto-allocation behavior of dynamic-array controls. Phase 2 work — not Phase 0.
## In-session baseline anchor for iter2
The HEVC OOPS reproducer remains as captured in `ampere-fourier` iter1 Phase 0:
```sh
LIBVA_DRIVER_NAME=v4l2_request \
ffmpeg -hide_banner -hwaccel vaapi -hwaccel_output_format vaapi \
-i ~/measurements/encoded/bbb_60s_720p.hevc.mp4 \
-vf "hwdownload,format=nv12" -frames:v 30 -f null -
# → kernel OOPS in dmesg, v4l2_mem2mem wedges all decoders until reboot
```
This is the iter2 falsifier: if a backend patch stops this reproducer from OOPSing, the survey hypothesis is corroborated. If it still OOPSes the same way, the mechanism is something else.
## Existing precedent: UAPI shim files
The backend currently has NO `hevc-ctrls/` directory (searched; none exists). The H.264 path uses system kernel headers via `<linux/v4l2-controls.h>`. Adding new HEVC CIDs that aren't in `linux-api-headers 6.19-1` will require one of:
- Adding a `hevc-ctrls/` directory with a local stub header that defines the missing constants + structs (matching the kernel 7.0 definitions verbatim).
- Bumping the `linux-api-headers` package on ampere to 7.0+.
Per the fresnel-iter25 / `feedback_rkvdec_image_fmt_pre_seed` precedent, the backend ships local UAPI shims when the kernel side gets ahead of distro headers. Iter2 follows that precedent unless the operator prefers the headers-bump route.
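If the shim route is taken, one cheap safeguard is to pin the struct layout at compile time. The sketch below mirrors the two structs quoted earlier using `<stdint.h>` types. The `shim_` prefix, `SHIM_MAX_RPS_ENTRIES` and `shim_st_rps_payload_bytes` are hypothetical names, and the expected sizes (80 and 4 bytes) are derived from the quoted field lists on common ABIs, not read from a kernel build. The control ID values are deliberately left out, since they have to be copied from the kernel header rather than guessed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the quoted v4l2_ctrl_hevc_ext_sps_st_rps field list. */
struct shim_hevc_ext_sps_st_rps {
    uint8_t  delta_idx_minus1;
    uint8_t  delta_rps_sign;
    uint8_t  num_negative_pics;
    uint8_t  num_positive_pics;
    uint32_t used_by_curr_pic;
    uint32_t use_delta_flag;
    uint16_t abs_delta_rps_minus1;
    uint16_t delta_poc_s0_minus1[16];
    uint16_t delta_poc_s1_minus1[16];
    uint16_t flags;
};

/* Mirror of the quoted v4l2_ctrl_hevc_ext_sps_lt_rps field list. */
struct shim_hevc_ext_sps_lt_rps {
    uint16_t lt_ref_pic_poc_lsb_sps;
    uint16_t flags;
};

/* Upper bound taken from the "<=65 entries" note on the quoted structs. */
#define SHIM_MAX_RPS_ENTRIES 65u

/* Fail the build if the mirrored layout ever drifts. */
static_assert(sizeof(struct shim_hevc_ext_sps_st_rps) == 80,
              "st_rps layout drifted");
static_assert(sizeof(struct shim_hevc_ext_sps_lt_rps) == 4,
              "lt_rps layout drifted");
static_assert(offsetof(struct shim_hevc_ext_sps_st_rps,
                       delta_poc_s0_minus1) == 14,
              "unexpected padding before delta_poc_s0_minus1");

/* Payload size for a dynamic-array control holding `count` ST RPS
 * entries; returns 0 when count exceeds the assumed bound. */
static size_t shim_st_rps_payload_bytes(unsigned int count)
{
    if (count > SHIM_MAX_RPS_ENTRIES)
        return 0;
    return (size_t)count * sizeof(struct shim_hevc_ext_sps_st_rps);
}
```

A real shim would additionally guard its definitions so they vanish once the system headers provide them. That is usually done with an `#ifndef` on one of the control ID macros.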
## Open questions tabled into Phase 1
1. **Architecture for RPS data sourcing** (the BIG one): given VAAPI doesn't expose the RPS table contents, how does the backend obtain them?
   - **(A) Implement H.265 SPS bitstream parser in the backend**: ~800-1500 lines of new code, well-defined per H.265 spec §7.3.2.2 + §7.3.7, follows `h264_slice_header.c` precedent. Highest scope, but self-contained and doesn't add dependencies.
   - **(B) Test the "minimal patch with zero-init RPS data" hypothesis first**: if just registering the controls (with `delta_idx_minus1=0, num_*_pics=0` etc.) eliminates the OOPS, then HEVC decode probably produces wrong/black frames but doesn't crash. This stages the risk: stage A (this zero-init test) confirms the mechanism, then stage B (real parsing) follows. This is the staged approach Phase 1 of the META campaign already named as iter2's first concrete action.
   - **(C) Link libavcodec's HEVC parser**: adds a build-time dep on FFmpeg's HEVC code, would expose `H265RawSPS`. Avoids reimplementing the parser. Out of campaign-typical practice (backend is minimal-deps); operator decision.
   - **(D) Some other channel I haven't identified**: e.g. ffmpeg-vaapi's `VABufferType` ecosystem may have an SPS-RPS extension somewhere; Phase 2 would need to confirm.
2. **`linux-api-headers` shim vs bump**: ship `hevc-ctrls/` per iter25 precedent, or bump the package?
3. **Mechanism reconstruction depth**: do we need to read `v4l2-ctrls-*.c` to fully understand WHY the OOPS happens, or is "make ext_sps_*_rps non-NULL with valid data" empirically sufficient to validate the fix?
4. **Test-decode reference clip**: BBB 60s 720p HEVC is the iter1 substrate; works for iter2 too. No new clip needed.
5. **Phase 7 verification anchor**: ampere-fourier iter1 baseline (H.264 + VP8 + MPEG-2 still PASS C1-C6) PLUS new HEVC C1-C6 — iter2's Phase 1 success criteria should mirror iter1's per-codec C1-C6 with HEVC added; floor for HEVC SSIM Y at f720 expected in H.264-drift territory (~0.65 ± 0.05) per fresnel iter1 + ampere iter1 convergent observations.
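
For option (A), the SPS and `st_ref_pic_set()` syntax elements are largely fixed-width `u(n)` fields or unsigned Exp-Golomb `ue(v)` codes, so a bit reader with those two primitives is the natural starting point. The following is a minimal sketch, not backend code: `struct bitreader` and the `br_*` functions are hypothetical names, and it assumes the caller has already stripped emulation-prevention bytes from the NAL payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSB-first bit reader over an in-memory buffer. */
struct bitreader {
    const uint8_t *data;
    size_t size_bits; /* total number of readable bits */
    size_t pos;       /* index of the next bit to read */
    int error;        /* set once a read runs past the end */
};

static void br_init(struct bitreader *br, const uint8_t *data,
                    size_t size_bytes)
{
    assert(br != NULL && data != NULL);
    br->data = data;
    br->size_bits = size_bytes * 8;
    br->pos = 0;
    br->error = 0;
}

/* u(n): read n bits (n <= 32), most significant bit first.
 * Returns 0 and sets br->error if the buffer is exhausted. */
static uint32_t br_read_bits(struct bitreader *br, unsigned int n)
{
    uint32_t value = 0;

    assert(n <= 32);
    while (n--) {
        if (br->pos >= br->size_bits) {
            br->error = 1;
            return 0;
        }
        value = (value << 1) |
                ((br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u);
        br->pos++;
    }
    return value;
}

/* ue(v): count leading zero bits up to the first 1 bit, then read that
 * many suffix bits. Decoded value = 2^zeros - 1 + suffix. */
static uint32_t br_read_ue(struct bitreader *br)
{
    unsigned int zeros = 0;

    while (br_read_bits(br, 1) == 0) {
        /* Stop on end-of-buffer or on a prefix too long for 32 bits. */
        if (br->error || ++zeros > 31) {
            br->error = 1;
            return 0;
        }
    }
    return ((1u << zeros) - 1u) + br_read_bits(br, zeros);
}
```

As a worked case, the two bytes `0xA6 0x40` hold the bit strings `1`, `010`, `011`, `00100`, which decode as `ue(v)` values 0, 1, 2 and 3.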
## Phase 0 close
Substrate captured: 5 existing HEVC controls in the backend, no H.265 parser, VAAPI doesn't expose RPS contents, kernel struct shapes documented, mechanism partially understood (memcmp dereferences invalid memory; precise cause = open Q3). 5 open questions for Phase 1, with Q1 (architecture for RPS sourcing) being the load-bearing decision.