iter3 1-line kernel fix eliminated the OOPS. Now diagnosing why
the 5-control batch (SPS PPS SLICE_PARAMS SCALING_MATRIX
DECODE_PARAMS) returns EINVAL with error_idx=count=5 → all-zero
output.
Locked-in evidence: control sizes match kernel-expected elem_size
for every CID. SPS values strace-decoded to (chroma=1, bit_depth=0,
1280x720) all pass validate_sps numerically. coded_fmt from S_FMT
trace is 1280x720 S265. validate_new dprintk doesn't fire despite
DEV_DEBUG_CTRL=0x20 set → rejection is silent inside
try_or_set_cluster's try_ctrl path (rkvdec_hevc_validate_sps).
Phase 1 starts with an empirical check: pr_warn at validate_sps entry to
confirm or refute. Instrumented module is already built and ready to
load post-reboot.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rkvdec_hevc_run_preamble conditionally assigns run->ext_sps_st_rps
and run->ext_sps_lt_rps only when ctx->has_sps_*_rps is true. The
caller (vdpu381-hevc.c:591) allocates the run struct without zero-init,
so when has_sps_st_rps is false, run.ext_sps_st_rps retains stack
garbage (0x51a0 deterministically on RK3588 CoolPi CM5 GenBook).
The `if (!run->ext_sps_st_rps) return;` guard in prepare_hw_st_rps does
not trigger (garbage is non-NULL), so memcmp dereferences 0x51a0 → OOPS.
Backend diagnostic instrumentation (iter3-Q4) revealed the full
dispatch path BeginPicture → RenderPicture → EndPicture →
h265_set_controls → populate_ext_sps_rps_cache(ENODATA, because
ffmpeg-vaapi strips SPS/VPS/PPS, leaving only slice NALs) →
fallback to 5-ctrl batch (EINVAL) → MEDIA_REQUEST_IOC_QUEUE → kernel
OOPS. The OOPS occurs even when our EXT_SPS_*_RPS controls are
NEVER submitted — purely a kernel-side init bug.
1-line patches proposed (Option A — fix in preamble, Option B —
zero-init run struct at caller).
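The two options can be sketched in userspace C. This is a minimal sketch, not the driver code: demo_ctx, demo_run and every function name below are hypothetical stand-ins, and the stale stack value is simulated with a marker pointer because reading a truly uninitialized field is undefined behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the driver's ctx and run structs. */
struct demo_ctx {
    bool has_sps_st_rps;
    const void *st_rps;          /* control payload, valid only if the flag is set */
};

struct demo_run {
    const void *ext_sps_st_rps;
};

/* Buggy shape: the field is written only when the flag is set. */
static void preamble_conditional(const struct demo_ctx *ctx, struct demo_run *run)
{
    if (ctx->has_sps_st_rps)
        run->ext_sps_st_rps = ctx->st_rps;
}

/* Option A shape: always assign, NULL when the control is absent. */
static void preamble_always_assign(const struct demo_ctx *ctx, struct demo_run *run)
{
    run->ext_sps_st_rps = ctx->has_sps_st_rps ? ctx->st_rps : NULL;
}

/* Same test as the guard in prepare_hw_st_rps: true means memcmp would run. */
static bool would_dereference(const struct demo_run *run)
{
    return run->ext_sps_st_rps != NULL;
}

/* Stand-in for stale stack contents (0x51a0 in the real OOPS). */
static const char stale_marker = 0;

/* Returns true if a stale pointer gets past the guard for the chosen variant. */
static bool stale_pointer_survives(bool use_option_a, bool use_option_b)
{
    struct demo_ctx ctx = { .has_sps_st_rps = false, .st_rps = NULL };
    struct demo_run run = { .ext_sps_st_rps = &stale_marker };

    if (use_option_b)
        run = (struct demo_run){ 0 };    /* Option B: caller zero-initializes */

    if (use_option_a)
        preamble_always_assign(&ctx, &run);
    else
        preamble_conditional(&ctx, &run);

    return would_dereference(&run);
}
```

With neither option the stale pointer survives the guard; either option alone makes the guard see NULL and return early.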
Q1-Q5 all answered. Userspace iter2 work stays — upstream-aligned
path for streams that need EXT_SPS_*_RPS. Iter4 deferred until
kernel patch lands via kernel-agent.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Entry condition: iter2 F1 closed with deterministic x1=0x51a0
evidence + 'our new controls don't reach the kernel' strace.
Substrate:
- kernel source ampere:~/src/linux-rockchip @ ampere-minimal-devices
(same tree as boltzmann's linux-rk3588-marfrit branch)
- module-only rebuild path: rockchip_vdec.ko, ~30s on boltzmann
16-core, deploy via scp + rmmod/insmod cycle (no reboot needed)
5 open questions for Phase 1:
Q1 decode 0x51a0 (candidate: sizeof × count? 261*80 = 0x5190, 16 short)
Q2 where does ctrl->p_cur.p = 0x51a0 happen? (printk every
assignment)
Q3 is ctx->has_sps_st_rps true even w/o backend S_EXT_CTRLS?
Q4 (CHEAPEST) why don't our new CIDs reach the kernel — log
h265_populate_ext_sps_rps_cache return path. NO KERNEL REBUILD.
Q4 first; informs all others.
Q5 RK3588 routes through vdpu381-hevc.c or vdpu383-hevc.c?
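The Q1 candidate can be checked with plain arithmetic. A small sketch; the 261 and 80 are the guessed count and element size from Q1, and the function names are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Candidate from Q1: count elements of elem_size bytes each (both are guesses). */
static uint32_t demo_candidate_size(uint32_t count, uint32_t elem_size)
{
    return count * elem_size;
}

/* Signed distance in bytes between the observed x1 value and a candidate. */
static int32_t demo_distance(uint32_t observed, uint32_t candidate)
{
    return (int32_t)(observed - candidate);
}

/* Prints the comparison for the values quoted in Q1. */
static void demo_print_q1_check(void)
{
    const uint32_t observed = 0x51a0;    /* x1 from the OOPS register dump */
    const uint32_t candidate = demo_candidate_size(261, 80);

    printf("observed  = 0x%x (%u)\n", (unsigned)observed, (unsigned)observed);
    printf("candidate = 0x%x (%u)\n", (unsigned)candidate, (unsigned)candidate);
    printf("delta     = %d bytes\n", (int)demo_distance(observed, candidate));
}
```

261*80 is 20880 (0x5190) while 0x51a0 is 20896, so the product alone does not reproduce the observed value.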
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 6 ran Steps 1-4 cleanly (vendor + adapt + UAPI header +
h265_set_controls wiring; 4 atomic commits f91c3f5..1a2c958 on
backend master, build green). Phase 6 Step 5 smoke test triggered
F1 falsifier per Phase 1 falsifier path.
Evidence:
- strace ioctl trace shows per-frame VIDIOC_S_EXT_CTRLS carries 5
controls (IDs 0xa40a90..0xa40a94 = standard HEVC SPS/PPS/SLICE/
SCALING/DECODE_PARAMS). The new 0xa40a98 (EXT_SPS_ST_RPS) and
0xa40a99 (EXT_SPS_LT_RPS) DO NOT appear in any S_EXT_CTRLS call.
- Probe-line still fires (has_hevc_ext_sps_rps_rkvdec=true confirmed
via vainfo log) — so the gate is fine, but the populate-and-add
code path doesn't reach v4l2_set_controls.
- OOPS register state identical across pre-iter2 + post-iter2:
x1 = 0x51a0 (small integer treated as pointer; pgd=0 confirms
invalid). The kernel reads this same garbage value regardless of
whether userspace tries to set the controls or not.
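The traced IDs decode as offsets from the stateless codec control base. A sketch with the constants redefined locally under DEMO_ names so it builds without kernel headers; the base and the 400..404 offsets follow the UAPI v4l2-controls.h definitions, while 408/409 are only inferred here from the traced values 0xa40a98/0xa40a99.

```c
#include <stdint.h>

/* Local copies of the UAPI values (class 0x00a40000, stateless base | 0x900). */
#define DEMO_CTRL_CLASS_CODEC_STATELESS 0x00a40000u
#define DEMO_CID_CODEC_STATELESS_BASE   (DEMO_CTRL_CLASS_CODEC_STATELESS | 0x900u)

/* The five standard HEVC controls sit at BASE + 400 .. BASE + 404. */
enum demo_hevc_offset {
    DEMO_HEVC_SPS            = 400,
    DEMO_HEVC_PPS            = 401,
    DEMO_HEVC_SLICE_PARAMS   = 402,
    DEMO_HEVC_SCALING_MATRIX = 403,
    DEMO_HEVC_DECODE_PARAMS  = 404,
};

/* Control ID for a given offset from the stateless base. */
static uint32_t demo_hevc_cid(unsigned int offset)
{
    return DEMO_CID_CODEC_STATELESS_BASE + offset;
}

/* Offset of a traced control ID relative to the stateless base. */
static unsigned int demo_offset_of(uint32_t cid)
{
    return cid - DEMO_CID_CODEC_STATELESS_BASE;
}
```

Feeding the strace IDs through demo_offset_of() gives 400..404 for the five submitted controls and 408/409 for the two that never appear.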
Hypothesis revision: Phase 0's 'UAPI-gap' reconstruction was
PARTIALLY refuted. Even when userspace doesn't populate the new
CIDs (pre-iter2) AND when it tries to (iter2 but the call doesn't
actually fire), the kernel ends up with run->ext_sps_st_rps=0x51a0.
The 0x51a0 is a deterministic kernel-side state — uninitialized
ctrl->p_cur.p or a confused offset-vs-pointer.
Three diagnostic next-steps for iter3 (kernel-side investigation):
1. Backend instrumentation: log h265_populate_ext_sps_rps_cache
return code + source_data SPS NAL search outcome
2. Backend code-path check: is h265.c::h265_set_controls really
the call site, or does picture.c dispatch via something else?
3. Kernel instrumentation: printk in rkvdec_hevc_prepare_hw_st_rps
dumping run->ext_sps_st_rps as read from ctrl->p_cur.p
Meta-campaign re-shuffle:
- iter2 closes F1 (this commit)
- iter3 was 'VP9 enablement' -> now repurposed as 'HEVC kernel
investigation' (more urgent, has concrete evidence to pursue)
- iter4 = VP9 kernel enablement (was iter3)
Source code stays on backend master — iter2 infrastructure
(vendored parser, UAPI shim, runtime probe) is reusable for iter3+
regardless of whether the eventual kernel-side fix changes how
userspace integrates.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plan subagent (sonnet) reviewed Phase 0-4 with raw artifacts. Four
findings:
1. GstH265SPSEXT data-source gap — REBUT with empirical evidence.
Reviewer confused GstH265SPSEXT (SPS extension params, separate
struct) with GstH265ShortTermRefPicSetExt (RPS-extended struct,
contains the 4 fields they flagged as missing). Both
ShortTermRefPicSet AND ShortTermRefPicSetExt ARE in the vendored
gsth265parser.h. Direct read of fetched header confirmed.
2. Per-fd storage for has_hevc_ext_sps_rps — ADOPT. Mirror iter38
pattern of storing rkvdec + hantro fds separately. Add explicit
driver_kind=='r' gate for human-readable intent.
3. SPS NAL caching strategy — ADOPT, critical. SPS NALs only arrive
at IDR frames; per-frame walk would submit zero-filled RPS for
non-IDR frames and re-OOPS. Parse-and-cache at first IDR, reuse
on subsequent frames.
4. C3 prediction caveat — ADOPT. Anchor SHA per-clip (HEVC HW vs
HEVC SW) not cross-codec; iter1's shared SHA across codecs was
lucky empirical convergence, not guaranteed.
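Finding 3's parse-and-cache shape, as a minimal sketch. All names are hypothetical and the payload is reduced to two counters; the real cache would hold the full parsed RPS arrays.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed-SPS payload; the real one would carry the RPS arrays. */
struct demo_sps_rps {
    uint8_t num_short_term_ref_pic_sets;
    uint8_t num_long_term_ref_pics_sps;
};

struct demo_rps_cache {
    bool valid;
    struct demo_sps_rps rps;
};

/*
 * Per-frame entry point. `parsed` is non-NULL only when this frame's data
 * carried an SPS NAL (typically the IDR); otherwise the cached copy is reused.
 * Returns 0 and fills `out`, or -ENODATA if no SPS has been seen yet.
 */
static int demo_rps_for_frame(struct demo_rps_cache *cache,
                              const struct demo_sps_rps *parsed,
                              struct demo_sps_rps *out)
{
    if (parsed) {
        cache->rps = *parsed;
        cache->valid = true;
    }
    if (!cache->valid)
        return -ENODATA;    /* caller skips the EXT controls instead of zero-filling */

    *out = cache->rps;
    return 0;
}
```

The point of the -ENODATA branch is that a frame without any cached SPS never submits zero-filled RPS data.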
Three Phase 4 amendments applied as appendix to phase4_plan_iter2.md:
- §Step 3 — per-driver-kind probe storage (pair instead of scalar)
- §Step 4 — explicit two-struct mapping table; SPS parse-and-cache
- §Phase 7 predictions C3 — anchor per-clip
Risk register gains risk #6 (SPS absent on non-IDR frames).
Per feedback_review_empirical_over_theoretical: the Finding #1
rebuttal was done by reading the actual vendored header file content,
not by arguing against the reviewer's reasoning. Empirical evidence
won, as the memory rule requires.
Plan sound with amendments. Phase 6 can proceed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HEVC OOPS reproducer captured with strace ioctl trace. Backend
submits 4 VIDIOC_S_EXT_CTRLS (existing SPS/PPS/SCALING_MATRIX/
DECODE_PARAMS) and 1 MEDIA_REQUEST_IOC_QUEUE, kernel faults in
rkvdec_hevc_prepare_hw_st_rps -> __pi_memcmp during processing of
that single request, ffmpeg hangs (m2m wedged), needs SIGKILL.
ZERO V4L2_CID_STATELESS_HEVC_EXT_SPS_*_RPS calls in trace, which
corroborates the iter2 hypothesis that the backend never sets the new
7.0-UAPI controls. This is independent evidence, in addition to the
source grep that already showed zero hits.
Phase 7 prediction table: VIDIOC_S_EXT_CTRLS 4 -> 6, QBUF 2 -> 10
(for 5 frames), MEDIA_REQUEST_IOC_QUEUE 1 -> 5, dmesg empty
(no new rkvdec_hevc_prepare_hw_st_rps oops).
Falsifier anchor F1: if the trace re-appears post-patch, iter2 loops
back to Phase 0 with re-opened kernel-agent#11.
ampere is in wedged-m2m state post-capture; reboot needed before
Phase 6 / Phase 7. Documented as pre-action for those phases.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fetched gsth265parser.c (5409 lines) + .h (2440 lines) from GStreamer
main mirror, plus GstBitReader (~600 LOC, vendored separately) as
the parser's only dependency outside the parser itself.
License LGPL v2.1+ (Intel + Sreerenj Balachandran), preserved verbatim
in vendored copies; compatible with backend COPYING.LGPL. Add README
note listing the two vendored files.
GLib adaptation mapped to 6 mechanical replacements:
GArray/g_array_* -> plain C dynamic arrays (count+ptr)
g_malloc/g_free -> libc malloc/free
g_clear_pointer -> inline free+NULL
g_assert -> propagate parser-failure-code (NOT abort!)
gboolean/gint/etc -> stdbool/stdint
GST_DEBUG_* -> backend's request_log/error_log
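Three of these replacements (GArray, g_clear_pointer, g_assert) sketched in plain C. The demo_ names are illustrative only and are not taken from the vendored parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical replacement for a GArray of uint32_t: explicit count + pointer. */
struct demo_u32_array {
    uint32_t *data;
    size_t count;
    size_t capacity;
};

/* Stand-in for g_array_append_val; returns false on allocation failure. */
static bool demo_u32_array_append(struct demo_u32_array *arr, uint32_t value)
{
    if (arr->count == arr->capacity) {
        size_t new_cap = arr->capacity ? arr->capacity * 2 : 8;
        uint32_t *grown = realloc(arr->data, new_cap * sizeof(*grown));

        if (!grown)
            return false;
        arr->data = grown;
        arr->capacity = new_cap;
    }
    arr->data[arr->count++] = value;
    return true;
}

/* Stand-in for g_clear_pointer(&ptr, g_free): free, then reset to NULL. */
static void demo_u32_array_clear(struct demo_u32_array *arr)
{
    free(arr->data);
    arr->data = NULL;
    arr->count = 0;
    arr->capacity = 0;
}

enum demo_parse_result { DEMO_PARSE_OK = 0, DEMO_PARSE_ERROR = -1 };

/* Where upstream might g_assert(idx < count), report failure instead of aborting. */
static enum demo_parse_result demo_u32_array_get(const struct demo_u32_array *arr,
                                                 size_t idx, uint32_t *out)
{
    if (idx >= arr->count)
        return DEMO_PARSE_ERROR;
    *out = arr->data[idx];
    return DEMO_PARSE_OK;
}
```

A malformed stream then surfaces as DEMO_PARSE_ERROR to the caller rather than killing the process that loaded the backend.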
Vendor the FULL parser without trimming, per upstream-alignment rule;
dead code (PPS/slice/SEI parsing we don't strictly need) is acceptable
because it keeps syncing upstream bug fixes simple.
Header strategy concretized: new src/hevc-ctrls/v4l2-hevc-ext-controls.h
~50 lines with verbatim kernel UAPI defs; runtime probe via
VIDIOC_QUERYCTRL at backend init, stored per-driver_data, gated by
both kernel-supports + active driver-kind is vdpu381/383.
Build system impact: 2-line Makefile.am addition, no autoconf, no
pkg-config changes. Compile time uptick acceptable.
New constraints (6 total):
1. Vendored LGPL header verbatim preservation + README note
2. Hand-build install path (carries from iter1)
3. Reboot needed after HEVC OOPS recovery (iter2 test cycle)
5. Replace g_assert with error propagation, not abort
6. Parser interpretation may differ from kernel even with same spec
Ready for Phase 3 (mostly inherited from iter1 + iter2 phase 0).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GStreamer's MERGED v4l2_codec_h265_dec_fill_ext_sps_rps in
gst-plugins-bad (GStreamer 1.28, MR !10820) is the primary upstream
reference. It walks the GstH265SPS.short_term_ref_pic_set[] array from
its own gst_h265_parser_* API; field names match the H.265 spec and map
one-to-one to the V4L2 control struct. Header strategy: runtime-optional
control probe, NO #ifndef shim.
Casanova's FFmpeg WIP branch (v4l2-request-ext-sps-rps-n8.0.1 at
gitlab.collabora.com) is the secondary reference — walks libavcodec
internal HEVCSPS->st_rps[] with different field names. Useful as
cross-check but not the primary template (renaming gymnastics).
cros-codecs has no support yet (would follow GStreamer's shape if
added). Casanova's kernel-test framework uses fluster through these
two upstream consumers; no other reference exists.
Q1 (architecture): resolved — implement H.265 SPS parser in backend,
mirror GStreamer pattern with spec-compliant field names.
Q2 (UAPI shim): resolved — runtime-optional control probe per
GStreamer pattern, NOT #ifndef shim.
Remaining sub-question for Phase 1: parser SOURCE (vendor GStreamer's
gsth265parser.c, adapt to backend idioms, or implement minimal fresh
from H.265 §7.3.7).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
5 existing HEVC controls in backend (SPS/PPS/SLICE_PARAMS/
SCALING_MATRIX/DECODE_PARAMS at h265.c:660-688) + DECODE_MODE/START_CODE
in context.c. No H.265 bitstream parser in backend (h264_slice_header.c
is the only such precedent, for H.264).
CRITICAL substrate finding: VAAPI VAPictureParameterBufferHEVC
exposes RPS COUNTS (num_short_term_ref_pic_sets,
num_long_term_ref_pic_sps) but NOT the per-RPS array contents
(delta_poc_s0_minus1[], delta_idx_minus1, etc.). So the backend
can't just copy from VAAPI — needs another data source.
5 open questions tabled for iter2 Phase 1, with Q1 = architecture
for RPS data sourcing being load-bearing:
A. Implement H.265 SPS parser in backend (~800-1500 LOC)
B. Stage-A test minimal-patch hypothesis (zero-init RPS) first
C. Link libavcodec's H265RawSPS (adds FFmpeg build dep)
D. Some other channel TBD (e.g. VAAPI extension buffer)
Plus Q2 (linux-api-headers shim vs bump), Q3 (mechanism depth),
Q4 (test clip — BBB iter1 carries), Q5 (Phase 7 anchor).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Meta-iter1 was a scoping iteration — deliverable is the campaign
umbrella + sub-iter ledger, not a code change. Phases 2-8 are
either rolled into Phase 0+1 (situation, plan) or deliberately N/A
(no meta-level measurements or verification beyond 'issues filed +
ledger exists').
Skipping the Phase 5 review is the documented deviation from 'reviews
are never skippable': justified because Phase 0's prior-art survey +
verification against kernel source IS the review-equivalent rigor
(per feedback_review_empirical_over_theoretical), and there's no
separate Phase 4 plan to review beyond the ledger. iter2 + iter3
each get a full Phase 5 review on their own.
No new memory entry — lessons fall under existing
feedback_dev_process (Phase 1 loopback when scoping was wrong) and
feedback_characterize_before_change (meta-iter1 = scope + ledger
extends naturally).
Next: start iter2 Phase 0 (HEVC backend extension via
V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS / _LT_RPS).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Operator picked option 2: meta-campaign coordinating an HEVC backend
iteration (iter2 against libva-v4l2-request-fourier#3) and a VP9
kernel iteration (iter3 against kernel-agent#12) as parallel children
under one umbrella.
Iteration order: HEVC first because (a) smaller blast radius / faster
feedback loop, (b) the survey's mechanism reconstruction is a
hypothesis not yet test-verified — cheapest way to confirm is a
minimal backend patch, (c) VP9 has unresolved upstream-tree selection
that benefits from absorbing iter2 findings first. Operator may
overrule for parallel execution.
Meta-success: iter2 + iter3 both close with HEVC + VP9 added to
ampere-fourier Phase 3 instrumentation (C1-C6 against per-codec
floors); no regression to iter1 3-codec baseline; patches in right
repos (libva for HEVC, kernel-agent experiment branch for VP9, NOT
linux-ampere-fourier baseline per operator policy).
Falsifiers: iter2 patch doesn't fix HEVC -> re-open kernel-agent#11
with new evidence; iter3 tree doesn't rebase cleanly -> pick another;
Phase 7 regression to iter1 baseline -> Phase 5 review missed shared-
code interaction.
iter1 of the meta-campaign is mostly Phase 0+1+8: scoping + ledger +
close. Real engineering happens inside iter2 and iter3.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 0 ran the operator-mandated upstream prior-art survey FIRST,
before any source-read or hypothesis. Headline finding: the HEVC
OOPS is fundamentally re-scoped from 'kernel bug' to 'userspace UAPI
gap' against the new 7.0 controls
V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS / _LT_RPS.
Survey + verification:
- Casanova/Collabora v8 series merged in Linux 7.0 added the two
new V4L2 controls for VDPU381 HEVC; backend 7ac934e (June pre-iter38)
predates the UAPI and grep returns zero hits for these CIDs.
- ampere linux-api-headers is still 6.19-1, doesn't define the
constants — the backend literally cannot reference them without a
headers bump.
- ampere kernel source rkvdec-hevc-common.c:500-509 looks up the new
CIDs; if backend never set them, rkvdec_hevc_prepare_hw_st_rps
reads invalid memory via memcmp — exactly the __pi_memcmp OOPS
symptom.
VP9 still kernel-side per the v4 cover ('This patch only adds support
for H264 and H265 in both variants'). Multiple competing out-of-tree
starting trees: Sarma's android tree (working but Android-flavored),
dongioia/rock5bplus-rkvdec2 (mainline-style claims), Kwiboo (no VP9
on RK3588 yet). RKVDEC2 separate-driver path is dead — future VP9
extends the existing rkvdec driver's VDPU381 variant_ops.
Five open questions tabled for Phase 1 — most important being campaign
re-scope (HEVC moves to backend campaign; this stays VP9 kernel-only
OR becomes a meta-campaign coordinating both).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sister to ampere-fourier (userspace consumer) and fresnel-fourier
(RK3399 peer). Tracks the 2 kernel-side blockers from ampere-fourier
iter1: HEVC OOPS in rkvdec_hevc_prepare_hw_st_rps (kernel-agent#11)
and VP9 enablement on VDPU381/383 (kernel-agent#12). AV1 stays
userspace (libva-v4l2-request-fourier#2), not in this campaign.
Process notes: Phase 0 includes a non-optional upstream prior-art
survey (linux-rockchip / linux-media / linux-mm / Kwiboo / Bootlin)
before any code; per operator policy, patches go to kernel-agent
experiment branches, NOT into the linux-ampere-fourier baseline
package.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>