Files
ampere-kernel-decoders/phase4_plan_iter2.md
T
marfrit d2ff661554 iter2 phase5: second-model review, 1 finding rebut, 3 findings adopt
Plan subagent (sonnet) reviewed Phase 0-4 with raw artifacts. Four
findings:

1. GstH265SPSEXT data-source gap — REBUT with empirical evidence.
   Reviewer confused GstH265SPSEXT (SPS extension params, separate
   struct) with GstH265ShortTermRefPicSetExt (RPS-extended struct,
   contains the 4 fields they flagged as missing). Both
   ShortTermRefPicSet AND ShortTermRefPicSetExt ARE in the vendored
   gsth265parser.h. Direct read of fetched header confirmed.
2. Per-fd storage for has_hevc_ext_sps_rps — ADOPT. Mirror iter38
   pattern of storing rkvdec + hantro fds separately. Add explicit
   driver_kind=='r' gate for human-readable intent.
3. SPS NAL caching strategy — ADOPT, critical. SPS NALs only arrive
   at IDR frames; per-frame walk would submit zero-filled RPS for
   non-IDR frames and re-OOPS. Parse-and-cache at first IDR, reuse
   on subsequent frames.
4. C3 prediction caveat — ADOPT. Anchor SHA per-clip (HEVC HW vs
   HEVC SW) not cross-codec; iter1's shared SHA across codecs was
   lucky empirical convergence, not guaranteed.

Three Phase 4 amendments applied as appendix to phase4_plan_iter2.md:
  - §Step 3 — per-driver-kind probe storage (pair instead of scalar)
  - §Step 4 — explicit two-struct mapping table; SPS parse-and-cache
  - §Phase 7 predictions C3 — anchor per-clip

Risk register gains risk #6 (SPS absent on non-IDR frames).

Per feedback_review_empirical_over_theoretical: the Finding #1
rebut was done by reading the actual vendored header file content,
not by source-reading the reviewer's argument. Empirical evidence
won, as the memory rule requires.

Plan sound with amendments. Phase 6 can proceed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:57:06 +00:00

15 KiB

Phase 4 — iter2 plan (concrete implementation sequence)

Drafted 2026-05-16 evening, post-Phase-3-baseline. This file is what Phase 6 executes verbatim, strict scope per feedback_dev_process §6 ("scope strictly to plan — resist feature creep").

Files touched

Status Path Approx. LOC Origin
NEW src/h265_parser/h265parser.c ~5410 vendored from gstreamer/subprojects/gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.c (LGPL v2.1+, headers preserved verbatim)
NEW src/h265_parser/h265parser.h ~2440 vendored, same source path
NEW src/h265_parser/bitreader.c ~300 vendored from gstreamer/subprojects/gstreamer/gst/parse/gstbitreader.c (LGPL v2.1+)
NEW src/h265_parser/bitreader.h ~300 vendored, same source path
NEW src/hevc-ctrls/v4l2-hevc-ext-controls.h ~50 hand-written, mirrors kernel UAPI 7.0 verbatim with citation
MODIFIED src/h265.c +~80, -~5 new include of v4l2-hevc-ext-controls.h + h265_parser/; extend h265_set_controls to add 2 new entries to the controls[] array when probed-supported; per-driver-kind gate
MODIFIED src/context.c +~30 runtime probe via VIDIOC_QUERYCTRL for the two new CIDs at context init; store result on driver_data
MODIFIED src/request.h (or src/codec.h, TBD) +~5 add bool has_hevc_ext_sps_rps; field on struct request_data
MODIFIED src/Makefile.am +~5 add new source files to libv4l2_request_drv_video_la_SOURCES
MODIFIED README.md +~15 new "Vendored third-party components" section listing the 4 LGPL files with upstream URLs + pinned-revision SHA
MODIFIED STUDY.md (optional) +~20 brief note that the H.265 parser is vendored verbatim from GStreamer for upstream-bug-fix-sync simplicity

Total: ~8650 LOC added (~7900 vendored unchanged, ~750 hand-written), modifications to 5 existing files.

Phase 6 step-by-step sequence

Strict ordering — each step is a separate commit so Phase 7 can bisect if needed.

Step 1 — vendor the parser unchanged

Fetch from GStreamer main mirror (or pin to GStreamer 1.28 tag — confirm in Phase 5 review). Copy verbatim into src/h265_parser/. Drop into git as-is. The four vendored files compile WILL not compile yet (GLib deps not satisfied). Build is broken at this commit — that's intentional: this step is "upstream code baseline, ready to be adapted."

Commit message clear: "vendor GStreamer H.265 parser unchanged @ — does not build yet, see step 2."

Step 2 — GLib-to-libc mechanical adaptation

Per the 6 replacements documented in Phase 2:

  1. GArray *struct { void *data; size_t count; size_t element_size; } plus 4-5 helper functions matching g_array_* API surface. Either inline-replace at every call site OR add a thin wrapper header src/h265_parser/array_compat.h. Phase 5 review decides which.
  2. g_malloc / g_free / g_slice_* → libc malloc / free. Direct substitution; check for size-overflow patterns that g_malloc traps and libc doesn't (none expected in parser code, but verify).
  3. g_clear_pointer (&p, free)do { free(p); p = NULL; } while (0) inline.
  4. g_assert (cond) → propagate to the caller as a parser-failure return code. CRITICAL: do NOT use assert() here per Phase 2 §"new failure modes" §5; the parser must return an error, not abort the process.
  5. gboolean / gint* / guint* / gsize<stdbool.h> + <stdint.h> C99 types (bool, int8_t, int16_t, int32_t, uint8_t, uint16_t, uint32_t, size_t).
  6. GST_DEBUG_CATEGORY / GST_DEBUG (...) / GST_WARNING (...) / GST_ERROR (...) → backend's request_log / error_log (check src/utils.h for existing log macros — likely there).

Build now succeeds; parser is callable but unused (no callers yet).

Commit message: "GLib-to-libc mechanical adaptation of vendored parser; no logic changes." Reference the 6-row table verbatim.

Step 3 — add UAPI header + runtime probe

Write src/hevc-ctrls/v4l2-hevc-ext-controls.h with verbatim copies of:

  • #define V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS (V4L2_CID_CODEC_STATELESS_BASE + 408)
  • #define V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS (V4L2_CID_CODEC_STATELESS_BASE + 409)
  • struct v4l2_ctrl_hevc_ext_sps_st_rps { … } (exact kernel definition)
  • struct v4l2_ctrl_hevc_ext_sps_lt_rps { … }
  • #define V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED 0x1
  • #define V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT 0x1

Header citation block: upstream kernel UAPI source URL + commit + dated as 7.0-rc3 reference; include guard.

Add bool has_hevc_ext_sps_rps; to struct request_data (in src/request.h).

Add runtime probe to context init (src/context.c or new helper in src/codec.c):

static bool probe_hevc_ext_sps_controls(int video_fd) {
    struct v4l2_queryctrl q = { .id = V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS };
    if (ioctl(video_fd, VIDIOC_QUERYCTRL, &q) < 0) return false;
    q = (struct v4l2_queryctrl){ .id = V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS };
    return ioctl(video_fd, VIDIOC_QUERYCTRL, &q) >= 0;
}
/* on the rkvdec fd at init time: */
driver_data->has_hevc_ext_sps_rps = probe_hevc_ext_sps_controls(rkvdec_fd);

Per-driver-kind also (the gate is "kernel supports it AND driver-kind is vdpu381/383"):

  • For RK3399 (driver_data->driver_kind == 'r' in iter38 terms): probe returns false on its rkvdec fd, so has_hevc_ext_sps_rps is naturally false. No explicit gate needed beyond the probe.
  • For ampere RK3588 (also 'r' per iter38 — same driver-kind tag): probe returns true on the rkvdec fd. The gate fires correctly.

Commit message: "add HEVC EXT_SPS_*_RPS UAPI header + runtime probe (no callers yet)."

Step 4 — wire h265_set_controls

Extend h265_set_controls in src/h265.c. After the existing 5-control block (~line 660-690):

if (driver_data->has_hevc_ext_sps_rps) {
    /* Parse the SPS NAL from the bitstream buffer to source RPS data.
     * VAAPI's VAPictureParameterBufferHEVC does NOT expose the RPS array
     * contents (only counts), per the upstream-survey finding documented in
     * ~/src/ampere-kernel-decoders/phase0_findings_iter2.md.
     */
    /* call into vendored gst_h265_parser_* to fill st_rps[] + lt_rps[] arrays
     * from the SPS NAL bytes the backend already has access to (the
     * VAEncSequenceParameterBuffer / source_data buffer) */
    /* ... allocate dynamic arrays sized to num_short_term_ref_pic_sets +
     *     num_long_term_ref_pics_sps; field-by-field map per the GStreamer
     *     reference mapping in fill_ext_sps_rps */

    controls[n++] = (struct v4l2_ext_control){
        .id = V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS,
        .ptr = st_rps_array,
        .size = sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps) * num_st_rps,
    };
    controls[n++] = (struct v4l2_ext_control){
        .id = V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS,
        .ptr = lt_rps_array,
        .size = sizeof(struct v4l2_ctrl_hevc_ext_sps_lt_rps) * num_lt_rps,
    };
}

The actual field-by-field mapping mirrors GStreamer's gst_v4l2_codec_h265_dec_fill_ext_sps_rps literally — per Phase 0 survey, the names match the H.265 spec on both sides, so it's a mechanical copy.

The SPS NAL must be located in the bitstream buffer. The backend has the buffer already (from VASliceData / source_data); a small helper walks it for nal_unit_type == 33 (SPS) and feeds the NAL bytes to the parser.

Commit message: "h265: populate EXT_SPS_ST_RPS + LT_RPS controls when kernel supports them."

Step 5 — build, install, test (Phase 7 territory but Step 5 is iter2's "smoke check")

cd ~/src/libva-v4l2-request-fourier
ninja -C build
sudo install -m644 build/src/v4l2_request_drv_video.so /usr/lib/dri/
sudo systemctl reboot  # clear any wedged m2m state from Phase 3
# (wait for ampere to come back)
# Smoke check (NOT full Phase 7):
LIBVA_DRIVER_NAME=v4l2_request timeout 8 ffmpeg \
    -hwaccel vaapi -hwaccel_output_format vaapi \
    -i ~/measurements/encoded/bbb_60s_720p.hevc.mp4 \
    -vf "hwdownload,format=nv12" -frames:v 5 -f null -
# Predicted: exit 0, no kernel OOPS in dmesg.

If smoke fails (still OOPSes): F1 fires → loop back to Phase 0 with re-opened kernel-agent#11.

Smoke success advances to full Phase 7 (C1-C8 sweep).

Step 6 — README + STUDY.md updates

  • README.md: add "Vendored third-party components" section listing the 4 vendored files with upstream URLs + SHA pin + GPL note.
  • STUDY.md (optional, if file exists and is appropriate): brief paragraph that the H.265 parser is vendored verbatim from GStreamer to preserve future bug-fix sync capability.

Commit message: "README: document vendored GStreamer H.265 parser."

Predicted Phase 7 outcomes (in C1-C8 terms)

Criterion Prediction Loopback if violated
C1 (HEVC decode completes) PASS — 1440 frames decoded, exit 0 F1 → Phase 0
C2 (HW path engaged, new ioctls fire) PASS — VIDIOC_S_EXT_CTRLS count up from 4 to 6; MEDIA_REQUEST_IOC_QUEUE up from 1 to ≥1440 F1
C3 (frame 0 byte-identical vs SW) PASS — HEVC I-frame at t=0, no inter-prediction, same sha as ampere-fourier iter1 H.264/VP8/MPEG-2 (3214803d8be74416) F2 → Phase 4 (parser bisect)
C4 (frame 720 SSIM Y in H.264 territory ~0.65±0.05) Likely PASS in that band, but RK3588 may differ; record actual value as new in-session-acquired data point not a falsifier per Phase 1 — document and proceed
C5 (FPS N=3) Unconstrained, expect 100-400 fps; matches H.264 / VP8 / MPEG-2 order of magnitude not a falsifier
C6 (dmesg clean) PASS — no rkvdec_hevc_prepare_hw_st_rps OOPS F1
C7 (firefox-fourier HEVC, opportunistic) PASS if SDDM auto-login Wayland session still up not iter2-blocking
C8 (iter1 3-codec baseline regression check) PASS — patch is HEVC-only behind a gate; H.264 / VP8 / MPEG-2 paths unchanged F3 → Phase 4 (per-driver-gate audit)

Risk register

Risk Probability Mitigation
Vendored parser doesn't compile cleanly without GLib (more deps than mapped in Phase 2) medium Phase 2 mapping is mechanical; expect 1-2 surprises (GTypeClass usage somewhere). Address inline in Step 2; don't extend scope further.
Parser produces RPS data slightly different from kernel's spec interpretation medium F2 falsifier — bisect against GStreamer gst-launch on same clip if it fires
linux-api-headers bump happens unexpectedly mid-iter2 low UAPI header is internal; works regardless of host headers version
ampere reboot during Step 5 takes longer than expected low SDDM auto-login restores Wayland session automatically
Step 4's SPS-NAL-location helper misidentifies the SPS in some bitstream layouts low-medium Phase 5 review item; test with multiple BBB encodes if needed

Risks NOT mitigated (accepted as is)

  • The full Phase 6 sequence is substantial work (~8.7k LOC added, multiple build/test cycles, kernel reboots). Per operator directive, time/effort don't matter; the budget is "however long correctness takes."

Phase 4 close

5 commits, 8.7k LOC, 5 file modifications. Phase 7 predictions concrete (C1-C8 mapped). 5 risks named, 4 mitigated. Ready for Phase 5 second-model review.


Phase 5 amendments (applied 2026-05-16 post-review)

Amendment §Step 3 — per-driver-kind probe storage

Replace single bool has_hevc_ext_sps_rps; field on struct request_data with a pair:

bool has_hevc_ext_sps_rps_rkvdec;
bool has_hevc_ext_sps_rps_hantro;

Probe runs at the time each fd is opened (init for primary, alt-probe for secondary per iter38). Storage parallels video_fd_rkvdec / video_fd_hantro. h265_set_controls consults only the rkvdec entry (HEVC routes through rkvdec only).

§Step 4 control-submission guard becomes explicit:

if (driver_data->driver_kind == 'r' && driver_data->has_hevc_ext_sps_rps_rkvdec) {
    /* … populate + submit EXT_SPS_*_RPS controls … */
}

The driver_kind == 'r' clause is logically redundant given the probe-result gate but provides reviewer-flagged human-readable intent.

Amendment §Step 4 — explicit two-struct usage + SPS caching

The GStreamer reference fill_ext_sps_rps reads from BOTH GstH265ShortTermRefPicSet AND GstH265ShortTermRefPicSetExt (both structs are in the vendored gsth265parser.h — see Phase 5 Finding #1 rebut). Step 4 mapping:

V4L2 field Source in vendored parser
delta_idx_minus1, delta_rps_sign, abs_delta_rps_minus1, num_negative_pics, num_positive_pics, flags GstH265ShortTermRefPicSet (members delta_idx_minus1, delta_rps_sign, abs_delta_rps_minus1, NumNegativePics, NumPositivePics, inter_ref_pic_set_prediction_flag)
use_delta_flag, used_by_curr_pic, delta_poc_s0_minus1[], delta_poc_s1_minus1[] GstH265ShortTermRefPicSetExt (members use_delta_flag[], used_by_curr_pic_flag[], delta_poc_s0_minus1[], delta_poc_s1_minus1[])

If GStreamer's gst_h265_decoder_get_sps_ext() helper is in the base-decoder class (not in gsth265parser.c), inline its body — it likely just reaches into sps->short_term_ref_pic_set_ext[i] on the parsed GstH265SPS.

SPS parse-and-cache strategy (Phase 5 Finding #3):

  • Add a cache slot on struct request_data: struct hevc_sps_cache_entry { uint8_t sps_id; bool valid; GstH265SPS parsed_sps; } hevc_sps_cache[16] (sized to match the H.265 spec's max of 16 active SPS).
  • At first h265_set_controls invocation OR whenever the SPS NAL is present in source_data (walk for nal_unit_type == 33): parse, store at hevc_sps_cache[sps_id], mark valid.
  • On subsequent frames: look up hevc_sps_cache[pps->seq_parameter_set_id], reuse if valid; if invalid AND SPS NAL not in current source_data, log warning + skip the EXT_SPS_*_RPS controls (graceful fallback — better to fail soft than re-OOPS).
  • Cache cleared in RequestDestroyContext.

This adds approximately +20 LOC to Step 4 (cache plumbing) on top of the original control-submission code.

Amendment §"Phase 7 predictions" C3 row

Replace:

C3 — PASS — HEVC I-frame at t=0, no inter-prediction, same sha as ampere-fourier iter1 H.264/VP8/MPEG-2 (3214803d8be74416)

With:

C3 — PASS — HEVC frame 0 HW-decode SHA matches HEVC SW-decode SHA from the same encoded clip (anchored per-clip, not cross-codec). Cross-codec sha convergence observed in iter1 was empirical-not-guaranteed: encoder's quantization choices differ across codecs and the SHA match was a happy coincidence, not a contract.

Updated risk register (added risk #6)

| 6 | SPS NAL absent from source_data on non-IDR frames; without caching, h265_set_controls submits zero-filled RPS for every non-IDR → re-OOPS | high (would have fired in practice) | parse-and-cache per Amendment §Step 4 above; first IDR populates cache, subsequent non-IDR frames reuse it; graceful skip + warning if cache miss AND no SPS NAL available |