30 Commits

Author SHA1 Message Date
Markus Fritsche fdd9c635cb Campaign close: HEVC decode bit-perfect on ampere
HW vs SW frame 100 byte-compare: 100.00% exact match (1382400 /
1382400 bytes). 182 unique byte values, identical top-10
distribution.
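
A minimal sketch of the frame-level byte-compare, assuming both decodes
were dumped as raw NV12 files of equal geometry. The function names and
the in-memory `bytes` inputs are illustrative, not the campaign's actual
tooling.

```python
def nv12_frame_size(width: int, height: int) -> int:
    """Bytes in one NV12 frame: full-size Y plane plus half-size interleaved CbCr."""
    return width * height * 3 // 2


def compare_frame(hw: bytes, sw: bytes, width: int, height: int, index: int):
    """Return (matching_bytes, total_bytes) for frame `index` of two raw NV12 dumps."""
    size = nv12_frame_size(width, height)
    start = index * size
    hw_frame = hw[start:start + size]
    sw_frame = sw[start:start + size]
    if len(hw_frame) != size or len(sw_frame) != size:
        raise ValueError("dump too short for requested frame")
    # Count positions where the hardware and software decodes agree.
    matching = sum(a == b for a, b in zip(hw_frame, sw_frame))
    return matching, size
```

For 1280x720 NV12, `nv12_frame_size` gives 1382400 bytes, the per-frame
total quoted above.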

Campaign actually completed at iter4 (kernel-agent#14 ext_sps
NULL init + kernel-agent#15 HEVC_SLICE_PARAMS registration).
iter5 'uniform Y=0x10/CbCr=0x80 black output' was a
mis-diagnosis: the first 3 frames of the bbb video ARE genuinely
all-black (intro fade-in). A 2-min SW-reference byte-compare
at iter5 close would have ended the campaign 10+ hours earlier.

iter6 (vb2 fence series, 3 versions, 6 WeChat recoveries):
off-path, but found a real upstream NULL deref at
dma_fence->context (offset 0x20) in dma_resv_add_fence —
file as kernel-agent#16 when UART confirms the register dump.

iter7 (DT dma-coherent on rkvdec): off-path AND falsified —
RK3588 rkvdec is NOT in ACE-Lite coherent domain. dma-coherent
causes HW timeouts. Reverted.

5 new/updated memories:
- feedback_compare_hw_against_sw_reference.md (the lesson)
- feedback_backup_before_module_replace.md
- feedback_sddm_autologin_disable.md
- feedback_no_session_termination_attempts.md (reinforced)
- reference_dmabuf_resv_blocker.md (overturned claim)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:41:06 +00:00
Markus Fritsche 24272596cd iter7 Phase 0: pivot to cache-coherency hypothesis (H1) for iter5 black-output
iter6 was an off-path investigation. Sonnet round-1 review + my
own corrected memory feedback_rfc_v2_vb2_dma_resv_scope.md make
clear: vb2 fence series targets Wayland compositor green-frames,
not libva cached-mmap readback. iter6 found a real upstream NULL
deref bug (filed for kernel-agent#16 when UART captures the trace)
but it's not on the critical path for iter5.

iter7 returns to iter5's actual hypothesis ladder:
- H1(a) DT dma-coherent on rkvdec node — cheapest, first
- H1(b) backend DMA_BUF_IOCTL_SYNC userspace fix — if H1(a) fails
- H2 wrong-DMA-address — if H1(a)+(b) fail
- H3 false-positive DEC_RDY — last resort

Test on vanilla kernel + iter3+iter4-fixed modules (already
verified working pipeline). Pass criterion: ffmpeg-vaapi output
contains byte values other than 16 and 128.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:32:20 +00:00
Markus Fritsche b8930df801 iter6 v6 substrate: source-trace points NULL deref at 0x20 to dma_fence->context
Decoded ESR 0x96000004 = DFSC 0x04, level-0 translation fault (pure NULL deref), at virtual
address 0x20. Structural offset analysis: struct dma_fence has
u64 context at offset 32 (=0x20). dma_buf->ops also at 0x20 but
0004's code guards against NULL dbuf.
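
The ESR decode above can be reproduced with a few bit-field extractions.
This is a sketch covering only the fields relevant here (EC, IL, WnR,
DFSC) and only the translation-fault DFSC codes; the helper name is
illustrative.

```python
# DFSC codes for translation faults (Arm ARM, data abort ISS encoding).
DFSC_TRANSLATION_FAULTS = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into the fields used for data-abort triage."""
    dfsc = esr & 0x3F
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class
        "il": (esr >> 25) & 0x1,    # instruction length bit
        "wnr": (esr >> 6) & 0x1,    # 1 = write access, 0 = read access
        "dfsc": dfsc,
        "dfsc_name": DFSC_TRANSLATION_FAULTS.get(dfsc, "other"),
    }
```

`decode_esr(0x96000004)` yields EC 0x25 (data abort taken without a
change in exception level), WnR 0 (a read), and DFSC 0x04.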

Leading hypothesis: dma_resv_add_fence iterates existing fences
in dbuf->resv->shared[] to merge-by-context. If RCU-managed
fence cleanup races with concurrent add, a freed slot becomes NULL
and the iteration dereferences NULL->context (offset 0x20).

Timing matches: 18-31 min uptime for first wedge (decode-cycle
churn needed); fast reboot loops after (BTRFS replays unflushed
state). KASAN doesn't catch (NULL deref is not UAF). Lockdep
doesn't catch (fence lifecycle race, not lock order).

Proposed 0004 v2 fix: use DMA_RESV_USAGE_KERNEL (single-slot,
replaces previous) instead of DMA_RESV_USAGE_WRITE (multi-slot
list with race window), OR dma_resv_replace_fences() for explicit
context-keyed atomic swap.

Confirmation path: when UART lands, look for pc inside
dma_resv_add_fence and the NULL-pointer register holding the
stale fence slot.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:29:57 +00:00
Markus Fritsche 9050b454a4 iter6 v3: stage KCSAN-only config delta script for second-pass build
KASAN and KCSAN are mutually exclusive on 7.0-rc3 kernel
(lib/Kconfig.kcsan: 'depends on ... !KASAN'). User picked
KASAN first. This script applies the KCSAN-only config delta
on top of the lockdep base for the follow-up build, so the
KCSAN config survives session-compacting.

Run AFTER the KASAN-pass test cycle completes, IF KASAN comes
up silent on the GPU-compositor wedge test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 19:36:24 +00:00
Markus Fritsche 96e2d439c9 iter6 v3 plan: apply Sonnet round-3 amendments
6 targeted fixes, no rewrite:
1. Add DEBUG_ATOMIC_SLEEP + DEBUG_LIST explicitly to config delta
2. Drop PAGE_POISONING_NO_SANITY (keep the sanity check)
3. Fix mpv hwdec flag (vaapi-copy is SW! use v4l2m2m or rkmpp).
   Extend monitoring window 2min → 20min (KASAN slows runtime 3-8x).
   Add glmark2 + ffmpeg-null concurrent option.
4. Add R11: /boot/firmware headroom pre-flight (KASAN_OUTLINE +15-25% Image size).
5. Replace serial fallback with netconsole (no clip work needed).
6. Add H4 isolation exit path: headless-only DRM ffmpeg loop if
   compositor test silent — isolates panthor from vb2.

All round-3 ACCEPT/AMENDs applied. v3 ready for execution.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 19:28:19 +00:00
Markus Fritsche 2de6870cb4 iter6 v3 plan: KASAN+KCSAN delta on lockdep substrate
Pending architect delta-review before any execution.

Adds: KASAN_GENERIC + KCSAN + DEBUG_PREEMPT + DEBUG_OBJECTS +
SLUB_DEBUG_ON + PAGE_POISONING. CONFIG_LOCALVERSION=-lockdep-kasan
so separate from existing lockdep modules.

Smoke test extended: after headless ffmpeg passes, manually log in
to plasma + run mpv hwdec=vaapi-copy to reproduce v2's compositor
wedge condition. Monitor pstore + journal for KASAN/KCSAN reports.

Risk additions: KASAN/KCSAN boot slowness, KASAN shadow region,
KASAN false positives, GPU repro requires physical login.

Exit conditions: success (sanitizer report fires) / partial (still
silent — switch to serial) / failure (KASAN kernel won't boot).

~75-90 min build + ~45 min test cycle for 0004 verdict.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 19:26:55 +00:00
Markus Fritsche 53e247724f iter6 v2 attempt-2 close — 0004 headless-clean but compositor-wedges
Step 2.1 (0004 vb2 helper alone) on lockdep+ramoops kernel:
- Headless smoke test PASS: ffmpeg rc=0, 4147200B output, zero
  lockdep splat, iter4-DIAG validate_sps per-frame, iter5-IRQ DEC_RDY=1
- ~10 min later when GPU compositor + concurrent vb2 traffic active:
  silent wedge → panic_timeout=10 → boot loop
- PROVE_LOCKING + 9 other lock-debug flags DID NOT catch it before wedge

Architect H1 hypothesis (classical lock-order inversion lockdep models)
is incomplete. Refined hypothesis ladder for v3:
- H1' cross-context wait not modeled by lockdep (KCSAN)
- H2 use-after-free / double-signal of dma_fence (KASAN)
- H3 race in fence alloc/init under concurrency (KCSAN)
- H4 panthor consumer-side bug (separate non-vb2 test)

Sub-bug fixed during execution: a forced rebuild of all vb2 consumers
is required after 0004 (touch core.h + make -j8 modules); otherwise an
ABI mismatch produces 0-byte ffmpeg output.

DT-based ramoops reservation works (memmap= cmdline is x86-only).
Verified 2465B+70818B ramoops capture across panic+reset.

Recovery via WeChat stick chroot — 4 modules restored to pre-base
md5s, extlinux default flipped to arch_devices vanilla. Ampere
confirmed clean.

v3 plan pending: add KASAN+KCSAN+DEBUG_PREEMPT, full rebuild,
intentionally reproduce wedge under compositor. ~75 min build.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 19:26:30 +00:00
Markus Fritsche 1656f84a40 iter6 plan v2: log A5 execution deviation — PROVE_RCU forced =y by kconfig
CONFIG_PROVE_RCU is selected by CONFIG_PROVE_LOCKING via Kconfig
dependency. Cannot disable without disabling PROVE_LOCKING itself
(which would defeat the purpose). Documenting the execution deviation
from A5: PROVE_RCU=y remains. 10-min HW watchdog has plenty of margin.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 16:54:38 +00:00
Markus Fritsche a8d563a917 iter6 plan v2: ramoops amendments (user chose ramoops over serial)
User picked ramoops path for the 0.4 hard gate. Current ampere kernel
has CONFIG_PSTORE_RAM=m but lacks PSTORE_CONSOLE, so ramoops can only
be made operational AFTER lockdep kernel rebuild.

4 amendments:
- 0.4: restructured. 0.4a/b survey current state (informational only),
  0.4c notes accepted limitation (hard spinlock+IRQ-off won't flush),
  0.4-G hard gate moves to step 1.8a (after lockdep kernel boots)
- 1.2: add --enable PSTORE_CONSOLE --enable PSTORE_PMSG
- 1.6: extend lockdep extlinux append with ramoops carve-out cmdline
  (memmap=0x100000$0x10000000 ramoops.mem_address=0x10000000
  ramoops.mem_size=0x100000 console_size=0x40000 dump_oops=1).
  DEFAULT override is mandatory per Q3 (ramoops-only operator).
- 1.7/1.8: split into 1.7 (boot+module load), 1.8a (sysrq-trigger
  ramoops verify HARD GATE), 1.8b (regular smoke test)
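
A small consistency check for the 1.6 carve-out, as a sketch: it only
verifies that the `memmap=size$base` reservation and the
`ramoops.mem_address` / `ramoops.mem_size` parameters describe the same
region. The helper names are hypothetical and nothing here touches a
real boot configuration.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Collect key=value tokens from a kernel command line fragment."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
    return params


def carve_out_matches(cmdline: str) -> bool:
    """True when memmap=size$base equals the ramoops address/size pair."""
    params = parse_cmdline(cmdline)
    size_text, base_text = params["memmap"].split("$", 1)
    reserved = (int(base_text, 0), int(size_text, 0))
    ramoops = (
        int(params["ramoops.mem_address"], 0),
        int(params["ramoops.mem_size"], 0),
    )
    return reserved == ramoops
```

With the values listed in 1.6, both describe 0x100000 bytes (1 MiB) at
0x10000000.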

Documented limitation accepted by user: hard spinlock-with-IRQ-off
deadlocks (the worst-case iter6 v1 wedge shape) may not flush to
pstore before watchdog reset. Serial would catch those; ramoops may
miss. Bisect-apply 0004→0005→0006→0007v2 should surface lockdep
splats BEFORE the deadlock becomes a hard hang anyway.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 16:53:14 +00:00
Markus Fritsche 543bed905f iter6 plan v2: add remote-operator DEFAULT override note (Sonnet round 2 Q3)
Sonnet round 2 review ACCEPTED 0007 v2 + 4 of 5 amendment areas.
Q3 found one new issue introduced by amendments: U-Boot extlinux menu
selection requires physical keyboard/HDMI presence — not workable for
SSH-only operators relying on serial console.

Fix (single prose addition to step 1.6): if remote-only operator
relying on serial (0.4b), temporarily set DEFAULT arch_devices_lockdep
for this test boot. Restore DEFAULT arch_devices before any non-test
reboot. If ramoops-only (0.4a, no serial), DEFAULT override MANDATORY.

Q1 (0007 v2 source), Q2 (5 amendments), Q4 (recovery sufficiency) all
ACCEPTed. With this Q3 prose fix applied, Phase 5 review complete.
Pending user sign-off + serial cable confirmation before pre-flight.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 16:35:16 +00:00
Markus Fritsche b02baffca7 iter6 post-mortem Phase 4 v2: per-codec 0007 + lockdep base kernel
Amendments per Sonnet architect review round 1:
- A1: 0007 v2 rewritten — 7 per-codec run() insertion points,
  matches hantro pattern (after preamble metadata copy, before HW kick).
  Old v1 (in rkvdec_device_run) REJECTED — wrong structural placement.
- A2: panthor ww_mutex/dma_resv contention added as primary hypothesis H1.
  Smoke test 1.9/2.x extended to exercise GPU compositor path.
- A3: CONFIG_LOCALVERSION=-lockdep so lockdep kernel uname differs from
  vanilla — prevents modules_install overwriting working tree.
- A4: pstore/serial gate is now HARD (one-of required); pre-flight aborts
  if neither serial nor ramoops is functional.
- A5: PROVE_RCU removed from initial config — boot latency risk pushes
  past watchdog before lockdep prints. Add back only if first run clean.

0007-v2 patch attached: 8 hunks across rkvdec-{h264,hevc,vdpu381-h264,
vdpu381-hevc,vdpu383-h264,vdpu383-hevc,vp9}.c + rkvdec.c queue_init flag.
25 lines insertions.

Pending Phase 5 round 2 delta-review of v2 source + amended plan before
any execution.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 16:33:08 +00:00
Markus Fritsche 0ef64406b6 iter6 post-mortem Phase 4: bisect-apply plan with lockdep base kernel
Pre-flight (5 steps): backup, pstore, serial console verify
Debug base kernel (8 steps): PROVE_LOCKING + LOCKDEP + DEBUG_*
  full kernel rebuild, separate extlinux label, keep vanilla default
Bisect-apply (4 steps): 0004 → 0005 → 0006 → 0007, reboot+test between each
Risk register: 5 risks with mitigations
Total wall-time: ~150 min if clean

Pending Phase 5 architect review (Sonnet) before any execution.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 16:13:48 +00:00
Markus Fritsche 11d2dde8ab iter6 post-mortem Phase 0: substrate + safeguards
Locks in forensics from boot -9..-1 journal:
- Silent watchdog reset, no oops/panic in logs
- vblank/edp WARNs are pre-existing (also fire on recovered boot 0)
- vb2_buffer_attach_release_fence / dma_fence / dma_resv NEVER
  appear in iter6-boot kernel logs: the deadlock sits at a level where
  the kernel can't reach printk
- Hardware Synopsys DesignWare watchdog is the reset mechanism

Six non-negotiable safeguards for any retry:
1. backup .ko AND off-device archive before sudo install
2. CONFIG_PROVE_LOCKING + DEBUG_ATOMIC_SLEEP + LOCKDEP etc
3. bisect-apply one patch at a time, reboot+test between
4. SDDM auto-login OFF (done — file renamed .disabled-iter6postmortem)
5. pstore.backend=ramoops to capture kernel oops across reset
6. Phase 5 architect review of plan + 0007 source before apply

Four gating questions for Phase 1, starting with bisect:
- which of 4 patches is the actual vector
- lockdep splat hidden by CONFIG_PROVE_LOCKING=n
- why no oops in journal
- producer-side fence-alloc hang vs consumer-side wait hang

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 16:02:09 +00:00
Markus Fritsche 1594def84e iter5 close — HW reports DEC_RDY but CAPTURE is uniform black
Post-iter3+iter4 patches: per-frame S_EXT_CTRLS succeeds, OUTPUT
bitstream is byte-identical to raw HEVC, hardware IRQ reports
STA_INT=0x107 DEC_RDY=1 TIMEOUT=0 ERROR=0 on every decode, zero
IOMMU faults. But γ-dump shows CAPTURE plane[0]=uniform 0x10 (Y),
plane[1]=uniform 0x80 (CbCr) — video black.
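
The plane check can be sketched as below, assuming a raw NV12 CAPTURE
dump with no row padding (stride equal to width). Function names are
illustrative. A uniform result says only that the frame is black; it
takes a software decode of the same frame to tell a blank decode from
genuinely black content.

```python
def split_nv12(frame: bytes, width: int, height: int):
    """Return (y_plane, cbcr_plane) for one unpadded NV12 frame."""
    y_size = width * height
    return frame[:y_size], frame[y_size:y_size + y_size // 2]


def is_uniform_black(frame: bytes, width: int, height: int) -> bool:
    """True when Y is all 0x10 and CbCr is all 0x80 (limited-range black)."""
    y_plane, cbcr_plane = split_nv12(frame, width, height)
    return set(y_plane) == {0x10} and set(cbcr_plane) == {0x80}
```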

Leading hypothesis for iter6: cache coherency between hardware-
written DMA buffer and userspace cached mmap — same pattern as
RK3399 documented in feedback_rockchip_pixel_verify_path. Iter6
falsifier: VAExportSurfaceHandle → DMA-BUF → DMA_BUF_IOCTL_SYNC,
read. If real content visible, coherency confirmed.

Three open kernel-agent issues: #14 (iter3, verified), #15 (iter4,
verified), #16 TBD (iter5 finding).

Substrate: ampere kernel carries iter3 + iter4 + iter5 IRQ pr_warn.
Backend .so unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 11:43:59 +00:00
Markus Fritsche 46c956bd51 iter4 close — second kernel bug: missing HEVC_SLICE_PARAMS registration
Casanova/Collabora v7.0 HEVC series forgot to register
V4L2_CID_STATELESS_HEVC_SLICE_PARAMS in vdpu38x_hevc_ctrl_descs[].
The legacy rkvdec_hevc_ctrl_descs[] (RK3399 path) has it; the new
vdpu381/vdpu383 path doesn't. Every per-frame S_EXT_CTRLS fails
with EINVAL ("cannot find control id 0xa40a92").
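
For reference, the control id in that message can be derived from the
stateless codec control base. A sketch using the constants from the
kernel's v4l2-controls.h; the dictionary and helper are illustrative.

```python
V4L2_CTRL_CLASS_CODEC_STATELESS = 0x00A40000
V4L2_CID_CODEC_STATELESS_BASE = V4L2_CTRL_CLASS_CODEC_STATELESS | 0x900

# Offsets of the five per-frame HEVC controls relative to the base.
HEVC_CONTROL_OFFSETS = {
    "SPS": 400,
    "PPS": 401,
    "SLICE_PARAMS": 402,
    "SCALING_MATRIX": 403,
    "DECODE_PARAMS": 404,
}


def hevc_cid(name: str) -> int:
    """Return the V4L2_CID_STATELESS_HEVC_<name> control id."""
    return V4L2_CID_CODEC_STATELESS_BASE + HEVC_CONTROL_OFFSETS[name]
```

`hex(hevc_cid("SLICE_PARAMS"))` is `0xa40a92`, the id the kernel could
not find.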

Surfaced via dev_debug=0x3f on /sys/class/video4linux/videoN —
prepare_ext_ctrls's "cannot find" dprintk is gated behind
V4L2_DEV_DEBUG_CTRL (bit 0x20), invisible by default.
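
A sketch that decodes a dev_debug mask into the V4L2_DEV_DEBUG_* flags
it enables, to show why 0x3f surfaces the control-framework dprintk
while a mask without bit 0x20 does not. The helper name is illustrative.

```python
# Bit values of the V4L2_DEV_DEBUG_* flags from include/media/v4l2-ioctl.h.
V4L2_DEV_DEBUG_FLAGS = {
    0x01: "IOCTL",
    0x02: "IOCTL_ARG",
    0x04: "FOP",
    0x08: "STREAMING",
    0x10: "POLL",
    0x20: "CTRL",
}


def enabled_flags(mask: int) -> list:
    """List the debug flag names set in a dev_debug mask, lowest bit first."""
    return [name for bit, name in sorted(V4L2_DEV_DEBUG_FLAGS.items()) if mask & bit]
```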

1-line patch (5 lines with formatting) mirrors the legacy entry:
SLICE_PARAMS as DYNAMIC_ARRAY, dims={600} (HEVC level >6 max).

Verified on ampere: no EINVAL, no dmesg errors, ffmpeg exit 0,
3-frame NV12 output structurally valid. But output bytes are all
Y=16/Cb=Cr=128 (solid black) — separate downstream bitstream-
feeding bug, deferred to iter5.

Iter5 starts with LIBVA_V4L2_DUMP_OUTPUT to confirm whether the
OUTPUT bitstream is reaching the kernel correctly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 11:18:04 +00:00
Markus Fritsche 0a400b6f08 iter4 Phase 0: HEVC per-frame S_EXT_CTRLS EINVAL substrate
iter3 1-line kernel fix eliminated the OOPS. Now diagnosing why
the 5-control batch (SPS PPS SLICE_PARAMS SCALING_MATRIX
DECODE_PARAMS) returns EINVAL with error_idx=count=5 → all-zero
output.

Locked-in evidence: control sizes match kernel-expected elem_size
for every CID. SPS values strace-decoded to (chroma=1, bit_depth=0,
1280x720) all pass validate_sps numerically. coded_fmt from S_FMT
trace is 1280x720 S265. validate_new dprintk doesn't fire despite
DEV_DEBUG_CTRL=0x20 set → rejection is silent inside
try_or_set_cluster's try_ctrl path (rkvdec_hevc_validate_sps).

Phase 1 starts with an empirical check: pr_warn at validate_sps entry to
confirm/refute. Instrumented module already built + ready to
load post-reboot.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 11:14:41 +00:00
Markus Fritsche 659c08c81c iter3 close — kernel root cause: uninitialized run.ext_sps_st_rps
rkvdec_hevc_run_preamble conditionally assigns run->ext_sps_st_rps
and run->ext_sps_lt_rps only when ctx->has_sps_*_rps is true. The
caller (vdpu381-hevc.c:591) allocates the run struct without zero-init,
so when has_sps_st_rps is false, run.ext_sps_st_rps retains stack
garbage (0x51a0 deterministically on RK3588 CoolPi CM5 GenBook).

prepare_hw_st_rps's `if (!run->ext_sps_st_rps) return;` guard does not
fire (garbage is non-NULL), then memcmp dereferences 0x51a0 → OOPS.

Backend diagnostic instrumentation (iter3-Q4) revealed the full
dispatch path BeginPicture → RenderPicture → EndPicture →
h265_set_controls → populate_ext_sps_rps_cache(ENODATA, because
ffmpeg-vaapi strips SPS/VPS/PPS, leaving only slice NALs) →
fallback to 5-ctrl batch (EINVAL) → MEDIA_REQUEST_IOC_QUEUE → kernel
OOPS. The OOPS occurs even when our EXT_SPS_*_RPS controls are
NEVER submitted — purely a kernel-side init bug.

1-line patches proposed (Option A — fix in preamble, Option B —
zero-init run struct at caller).

Q1-Q5 all answered. Userspace iter2 work stays — upstream-aligned
path for streams that need EXT_SPS_*_RPS. Iter4 deferred until
kernel patch lands via kernel-agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 10:34:13 +00:00
marfrit dfebd8017f iter3 phase0: HEVC kernel-side investigation substrate
Entry condition: iter2 F1 closed with deterministic x1=0x51a0
evidence + 'our new controls don't reach the kernel' strace.

Substrate:
- kernel source ampere:~/src/linux-rockchip @ ampere-minimal-devices
  (same tree as boltzmann's linux-rk3588-marfrit branch)
- module-only rebuild path: rockchip_vdec.ko, ~30s on boltzmann
  16-core, deploy via scp + rmmod/insmod cycle (no reboot needed)

5 open questions for Phase 1:
  Q1 decode 0x51a0 (candidate: 261*80=sizeof × count? note 261*80 =
     0x5190, 16 bytes short of 0x51a0)
  Q2 where does ctrl->p_cur.p = 0x51a0 happen? (printk every
     assignment)
  Q3 is ctx->has_sps_st_rps true even w/o backend S_EXT_CTRLS?
  Q4 (CHEAPEST) why don't our new CIDs reach the kernel — log
     h265_populate_ext_sps_rps_cache return path. NO KERNEL REBUILD.
     Q4 first; informs all others.
  Q5 RK3588 routes through vdpu381-hevc.c or vdpu383-hevc.c?

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 09:48:57 +00:00
marfrit 17aa443f8f iter2 close: F1 falsifier fired, mechanism reconstruction was wrong
Phase 6 ran Steps 1-4 cleanly (vendor + adapt + UAPI header +
h265_set_controls wiring; 4 atomic commits f91c3f5..1a2c958 on
backend master, build green). Phase 6 Step 5 smoke test triggered
F1 falsifier per Phase 1 falsifier path.

Evidence:
- strace ioctl trace shows per-frame VIDIOC_S_EXT_CTRLS carries 5
  controls (IDs 0xa40a90..0xa40a94 = standard HEVC SPS/PPS/SLICE/
  SCALING/DECODE_PARAMS). The new 0xa40a98 (EXT_SPS_ST_RPS) and
  0xa40a99 (EXT_SPS_LT_RPS) DO NOT appear in any S_EXT_CTRLS call.
- Probe-line still fires (has_hevc_ext_sps_rps_rkvdec=true confirmed
  via vainfo log) — so the gate is fine, but the populate-and-add
  code path doesn't reach v4l2_set_controls.
- OOPS register state identical across pre-iter2 + post-iter2:
  x1 = 0x51a0 (small integer treated as pointer; pgd=0 confirms
  invalid). The kernel reads this same garbage value regardless of
  whether userspace tries to set the controls or not.

Hypothesis revision: Phase 0's 'UAPI-gap' reconstruction was
PARTIALLY refuted. Even when userspace doesn't populate the new
CIDs (pre-iter2) AND when it tries to (iter2 but the call doesn't
actually fire), the kernel ends up with run->ext_sps_st_rps=0x51a0.
The 0x51a0 is a deterministic kernel-side state — uninitialized
ctrl->p_cur.p or a confused offset-vs-pointer.

Three diagnostic next-steps for iter3 (kernel-side investigation):
  1. Backend instrumentation: log h265_populate_ext_sps_rps_cache
     return code + source_data SPS NAL search outcome
  2. Backend code-path check: is h265.c::h265_set_controls really
     the call site, or does picture.c dispatch via something else?
  3. Kernel instrumentation: printk in rkvdec_hevc_prepare_hw_st_rps
     dumping run->ext_sps_st_rps as read from ctrl->p_cur.p

Meta-campaign re-shuffle:
  - iter2 closes F1 (this commit)
  - iter3 was 'VP9 enablement' -> now bumped to 'HEVC kernel
    investigation' (more urgent, has concrete evidence to pursue)
  - iter4 = VP9 kernel enablement (was iter3)

Source code stays on backend master — iter2 infrastructure
(vendored parser, UAPI shim, runtime probe) is reusable for iter3+
regardless of whether the eventual kernel-side fix changes how
userspace integrates.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 09:32:05 +00:00
marfrit d2ff661554 iter2 phase5: second-model review, 1 finding rebut, 3 findings adopt
Plan subagent (sonnet) reviewed Phase 0-4 with raw artifacts. Four
findings:

1. GstH265SPSEXT data-source gap — REBUT with empirical evidence.
   Reviewer confused GstH265SPSEXT (SPS extension params, separate
   struct) with GstH265ShortTermRefPicSetExt (RPS-extended struct,
   contains the 4 fields they flagged as missing). Both
   ShortTermRefPicSet AND ShortTermRefPicSetExt ARE in the vendored
   gsth265parser.h. Direct read of fetched header confirmed.
2. Per-fd storage for has_hevc_ext_sps_rps — ADOPT. Mirror iter38
   pattern of storing rkvdec + hantro fds separately. Add explicit
   driver_kind=='r' gate for human-readable intent.
3. SPS NAL caching strategy — ADOPT, critical. SPS NALs only arrive
   at IDR frames; per-frame walk would submit zero-filled RPS for
   non-IDR frames and re-OOPS. Parse-and-cache at first IDR, reuse
   on subsequent frames.
4. C3 prediction caveat — ADOPT. Anchor SHA per-clip (HEVC HW vs
   HEVC SW) not cross-codec; iter1's shared SHA across codecs was
   lucky empirical convergence, not guaranteed.
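
Finding 3's parse-and-cache strategy, sketched under simplifying
assumptions: input is an Annex B byte stream, the cache stores the raw
access unit rather than a parsed SPS, and the class and function names
are hypothetical. The real backend would parse the SPS with the vendored
parser.

```python
HEVC_NAL_SPS = 33  # nal_unit_type of an HEVC sequence parameter set


def nal_types(annexb: bytes):
    """Yield nal_unit_type for each NAL that follows a 00 00 01 start code."""
    pos = annexb.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(annexb):
        # nal_unit_type is bits 1..6 of the first NAL header byte.
        yield (annexb[pos + 3] >> 1) & 0x3F
        pos = annexb.find(b"\x00\x00\x01", pos + 3)


class SpsCache:
    """Remember the last access unit that carried an SPS NAL."""

    def __init__(self):
        self._unit = None

    def update(self, annexb: bytes) -> bool:
        """Refresh the cache if this unit contains an SPS; report whether it did."""
        if HEVC_NAL_SPS in nal_types(annexb):
            self._unit = annexb
            return True
        return False

    def get(self):
        """Return the cached unit, or None if no SPS has been seen yet."""
        return self._unit
```

Non-IDR frames leave the cache untouched, so the RPS data from the most
recent SPS stays available instead of being replaced by zeros.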

Three Phase 4 amendments applied as appendix to phase4_plan_iter2.md:
  - §Step 3 — per-driver-kind probe storage (pair instead of scalar)
  - §Step 4 — explicit two-struct mapping table; SPS parse-and-cache
  - §Phase 7 predictions C3 — anchor per-clip

Risk register gains risk #6 (SPS absent on non-IDR frames).

Per feedback_review_empirical_over_theoretical: the Finding #1
rebut was done by reading the actual vendored header file content,
not by source-reading the reviewer's argument. Empirical evidence
won, as the memory rule requires.

Plan sound with amendments. Phase 6 can proceed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:57:06 +00:00
marfrit a575e37190 iter2 phase4: concrete implementation plan for Phase 6
5 commits, ~8.7k LOC, 5 existing file mods. Strict order so Phase 7
can bisect if needed:

Step 1: vendor GStreamer parser unchanged @ pinned SHA (does NOT
        build yet — that's intentional, this is upstream baseline)
Step 2: GLib-to-libc mechanical adaptation per the 6-row Phase 2
        table; no logic changes; build succeeds; parser unused
Step 3: add src/hevc-ctrls/v4l2-hevc-ext-controls.h with verbatim
        kernel UAPI defs + runtime VIDIOC_QUERYCTRL probe at context
        init; store on driver_data->has_hevc_ext_sps_rps; gates by
        kernel-supports AND vdpu381/383 driver-kind
Step 4: wire h265_set_controls — add 2 entries to controls[] after
        the existing 5, gated by probe; SPS NAL parsing via the
        vendored gst_h265_parser_*; field-by-field map mirrors
        GStreamer's fill_ext_sps_rps verbatim
Step 5: build, install, REBOOT (to clear m2m wedge from Phase 3
        baseline), smoke-test with 5-frame HEVC decode
Step 6: README documents the 4 vendored LGPL files

Phase 7 C1-C8 predictions explicit + falsifier mapping (F1 -> Phase
0, F2 -> Phase 4 parser bisect, F3 -> Phase 4 per-driver gate audit).

Risk register: 5 risks named, 4 mitigated. Accepted-as-is: the work
is substantial; per operator directive, time/effort budget is
'however long correctness takes.'

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:48:48 +00:00
marfrit f5e681caf5 iter2 phase3: HEVC BEFORE-state baseline
HEVC OOPS reproducer captured with strace ioctl trace. Backend
submits 4 VIDIOC_S_EXT_CTRLS (existing SPS/PPS/SCALING_MATRIX/
DECODE_PARAMS) and 1 MEDIA_REQUEST_IOC_QUEUE, kernel faults in
rkvdec_hevc_prepare_hw_st_rps -> __pi_memcmp during processing of
that single request, ffmpeg hangs (m2m wedged), needs SIGKILL.

ZERO V4L2_CID_STATELESS_HEVC_EXT_SPS_*_RPS calls in trace —
corroborates iter2 hypothesis that the backend never sets the new
7.0-UAPI controls. Confirms by independent evidence (in addition
to grep of source which already showed zero hits).

Phase 7 prediction table: VIDIOC_S_EXT_CTRLS 4 -> 6, QBUF 2 -> 10
(for 5 frames), MEDIA_REQUEST_IOC_QUEUE 1 -> 5, dmesg empty
(no new rkvdec_hevc_prepare_hw_st_rps oops).
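
The before/after counts in that table can be tallied from a saved strace
log. A sketch, assuming strace printed the request names symbolically
(requests it cannot decode appear as `_IOC(...)` and are not counted);
the function name is illustrative.

```python
import re
from collections import Counter

# Matches the symbolic request name in lines like:
#   ioctl(5, VIDIOC_S_EXT_CTRLS, {...}) = 0
IOCTL_RE = re.compile(r"ioctl\(\d+, ([A-Z][A-Z0-9_]*)")


def count_ioctls(trace: str) -> Counter:
    """Count ioctl requests by name in strace output."""
    return Counter(IOCTL_RE.findall(trace))
```

Comparing `count_ioctls(before)` with `count_ioctls(after)` gives the
per-request deltas directly.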

Falsifier anchor F1: if the trace re-appears post-patch, iter2 loops
back to Phase 0 with re-opened kernel-agent#11.

ampere is in wedged-m2m state post-capture; reboot needed before
Phase 6 / Phase 7. Documented as pre-action for those phases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:48:12 +00:00
marfrit 5541e01d26 iter2 phase2: vendoring spec concrete + dependency chain re-verified
Fetched gsth265parser.c (5409 lines) + .h (2440 lines) from GStreamer
main mirror. Plus GstBitReader (~600 LOC vendored separately) as
the parser's only non-self-contained dependency.

License LGPL v2.1+ (Intel + Sreerenj Balachandran), preserved verbatim
in vendored copies; compatible with backend COPYING.LGPL. Add README
note listing the two vendored files.

GLib adaptation mapped to 6 mechanical replacements:
  GArray/g_array_* -> plain C dynamic arrays (count+ptr)
  g_malloc/g_free  -> libc malloc/free
  g_clear_pointer  -> inline free+NULL
  g_assert         -> propagate parser-failure-code (NOT abort!)
  gboolean/gint/etc -> stdbool/stdint
  GST_DEBUG_*      -> backend's request_log/error_log

Vendor the FULL parser unchanged per upstream-alignment rule; dead
code (PPS/slice/SEI parsing we don't strictly need) is acceptable
to preserve upstream-bug-fix-sync simplicity.

Header strategy concretized: new src/hevc-ctrls/v4l2-hevc-ext-controls.h
~50 lines with verbatim kernel UAPI defs; runtime probe via
VIDIOC_QUERYCTRL at backend init, stored per-driver_data, gated by
both kernel-supports + active driver-kind is vdpu381/383.

Build system impact: 2-line Makefile.am addition, no autoconf, no
pkg-config changes. Compile time uptick acceptable.

New constraints (6 total):
  1. Vendored LGPL header verbatim preservation + README note
  2. Hand-build install path (carries from iter1)
  3. Reboot needed after HEVC OOPS recovery (iter2 test cycle)
  5. Replace g_assert with error propagation, not abort
  6. Parser interpretation may differ from kernel even with same spec

Ready for Phase 3 (mostly inherited from iter1 + iter2 phase 0).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:42:07 +00:00
marfrit eefc378d93 iter2 phase1: HEVC backend extension goal + vendoring spec
Architecture locked: implement H.265 SPS parser in backend by
vendoring GStreamer's gst-plugins-bad/codecparsers/gsth265parser.c
directly (B1 per operator decision 2026-05-16). Drop GLib deps,
preserve LGPL header + upstream function names (gst_h265_parser_*),
add README note pinning vendored revision.

Header strategy: runtime-optional V4L2 control probe (no #ifndef
shim, per GStreamer pattern). Compile-time CID + struct defs in a
new internal header src/hevc-ctrls/v4l2-hevc-ext-controls.h
mirroring fresnel iter25 precedent.

8 success criteria for iter2:
  C1 — decode completes, 1440 frames
  C2 — HW path engaged (ioctl trace shows new CID writes)
  C3 — frame 0 byte-identical vs SW reference
  C4 — frame 720 SSIM Y in H.264-drift territory, no fixed threshold
  C5 — FPS N=3 with sigma, no fixed threshold
  C6 — dmesg clean, no rkvdec_hevc_prepare_hw_st_rps OOPS
  C7 — firefox-fourier vendor-default HEVC engagement (now possible
       with SDDM auto-login configured; not iter2-blocking)
  C8 — regression check: ampere-fourier iter1's 3-codec baseline
       still passes C1-C6 per iter1 per-codec floors

4 falsifier branches with explicit loopback edges (F1: HEVC still
OOPSes -> re-open ka#11; F2: garbage output -> parser bisect;
F3: regression -> per-driver-kind gate; F4: license issue -> revisit).

Ready for iter2 Phase 2 situation analysis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:41:22 +00:00
marfrit 299e376d51 iter2 phase0 update: upstream-consumer survey closes Q1 + Q2
GStreamer's MERGED v4l2_codec_h265_dec_fill_ext_sps_rps in
gst-plugins-bad (GStreamer 1.28, MR !10820) is the primary upstream
reference. Walks its own gst_h265_parser_'s GstH265SPS.short_term_
ref_pic_set[] array, field names match the H.265 spec, one-to-one
mapping to the V4L2 control struct. Header strategy: runtime-optional
control probe, NO #ifndef shim.

Casanova's FFmpeg WIP branch (v4l2-request-ext-sps-rps-n8.0.1 at
gitlab.collabora.com) is the secondary reference — walks libavcodec
internal HEVCSPS->st_rps[] with different field names. Useful as
cross-check but not the primary template (renaming gymnastics).

cros-codecs has no support yet (would follow GStreamer's shape if
added). Casanova's kernel-test framework uses fluster through these
two upstream consumers; no other reference exists.

Q1 (architecture): resolved — implement H.265 SPS parser in backend,
mirror GStreamer pattern with spec-compliant field names.
Q2 (UAPI shim): resolved — runtime-optional control probe per
GStreamer pattern, NOT #ifndef shim.

Remaining sub-question for Phase 1: parser SOURCE (vendor GStreamer's
gsth265parser.c, adapt to backend idioms, or implement minimal fresh
from H.265 §7.3.7).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:38:52 +00:00
marfrit cd047a34de iter2 phase0: HEVC backend extension substrate
5 existing HEVC controls in backend (SPS/PPS/SLICE_PARAMS/SCALING_
MATRIX/DECODE_PARAMS at h265.c:660-688) + DECODE_MODE/START_CODE in
context.c. No H.265 bitstream parser in backend (h264_slice_header.c
is the only such precedent — for H.264).

CRITICAL substrate finding: VAAPI VAPictureParameterBufferHEVC
exposes RPS COUNTS (num_short_term_ref_pic_sets,
num_long_term_ref_pic_sps) but NOT the per-RPS array contents
(delta_poc_s0_minus1[], delta_idx_minus1, etc.). So the backend
can't just copy from VAAPI — needs another data source.

5 open questions tabled for iter2 Phase 1, with Q1 = architecture
for RPS data sourcing being load-bearing:
  A. Implement H.265 SPS parser in backend (~800-1500 LOC)
  B. Stage-A test minimal-patch hypothesis (zero-init RPS) first
  C. Link libavcodec's H265RawSPS (adds FFmpeg build dep)
  D. Some other channel TBD (e.g. VAAPI extension buffer)

Plus Q2 (linux-api-headers shim vs bump), Q3 (mechanism depth),
Q4 (test clip — BBB iter1 carries), Q5 (Phase 7 anchor).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:33:50 +00:00
marfrit 0b3c23ba66 meta-iter1 close: consolidated phases 2-8 with rationale
Meta-iter1 was a scoping iteration — deliverable is the campaign
umbrella + sub-iter ledger, not a code change. Phases 2-8 are
either rolled into Phase 0+1 (situation, plan) or deliberately N/A
(no meta-level measurements or verification beyond 'issues filed +
ledger exists').

Phase 5 review is the documented deviation from 'reviews are never
skippable': justified because Phase 0's prior-art survey +
verification against kernel source IS the review-equivalent rigor
(per feedback_review_empirical_over_theoretical), and there's no
separate Phase 4 plan to review beyond the ledger. iter2 + iter3
each get a full Phase 5 review on their own.

No new memory entry — lessons fall under existing
feedback_dev_process (Phase 1 loopback when scoping was wrong) and
feedback_characterize_before_change (meta-iter1 = scope + ledger
extends naturally).

Next: start iter2 Phase 0 (HEVC backend extension via
V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS / _LT_RPS).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:32:33 +00:00
marfrit c46e025b2e iter1 phase1: lock meta-campaign goal + sub-iteration ledger
Operator picked option 2: meta-campaign coordinating an HEVC backend
iteration (iter2 against libva-v4l2-request-fourier#3) and a VP9
kernel iteration (iter3 against kernel-agent#12) as parallel children
under one umbrella.

Iteration order: HEVC first because (a) smaller blast radius / faster
feedback loop, (b) the survey's mechanism reconstruction is a
hypothesis not yet test-verified — cheapest way to confirm is a
minimal backend patch, (c) VP9 has unresolved upstream-tree selection
that benefits from absorbing iter2 findings first. Operator may
overrule for parallel execution.

Meta-success: iter2 + iter3 both close with HEVC + VP9 added to
ampere-fourier Phase 3 instrumentation (C1-C6 against per-codec
floors); no regression to iter1 3-codec baseline; patches in right
repos (libva for HEVC, kernel-agent experiment branch for VP9, NOT
linux-ampere-fourier baseline per operator policy).

Falsifiers: iter2 patch doesn't fix HEVC -> re-open kernel-agent#11
with new evidence; iter3 tree doesn't rebase cleanly -> pick another;
Phase 7 regression to iter1 baseline -> Phase 5 review missed shared-
code interaction.

iter1 of the meta-campaign is mostly Phase 0+1+8: scoping + ledger +
close. Real engineering happens inside iter2 and iter3.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:32:06 +00:00
marfrit 2a5f5c296e iter1 phase0: substrate + prior-art survey (HEVC reclassified)
Phase 0 ran the operator-mandated upstream prior-art survey FIRST,
before any source-read or hypothesis. Headline finding: the HEVC
OOPS is fundamentally re-scoped from 'kernel bug' to 'userspace UAPI
gap' against the new 7.0 controls
V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS / _LT_RPS.

Survey + verification:
- Casanova/Collabora v8 series merged in Linux 7.0 added the two
  new V4L2 controls for VDPU381 HEVC; backend 7ac934e (June pre-iter38)
  predates the UAPI and grep returns zero hits for these CIDs.
- ampere linux-api-headers is still 6.19-1 and doesn't define the
  constants, so the backend cannot reference them without a headers
  bump.
- ampere kernel source rkvdec-hevc-common.c:500-509 looks up the new
  CIDs; if backend never set them, rkvdec_hevc_prepare_hw_st_rps
  reads invalid memory via memcmp — exactly the __pi_memcmp OOPS
  symptom.
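
The failure mode in the last bullet can be illustrated with a
defensive comparison. This is only a sketch, assuming the crash comes
from comparing against a control payload pointer that userspace never
populated. struct rps_blob and rps_needs_update are hypothetical
stand-ins, not the kernel's types or functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a control payload; not the kernel's type. */
struct rps_blob {
    unsigned char bytes[16];
};

/*
 * Returns 1 if the cached RPS must be rewritten, 0 if it is unchanged,
 * and -1 if the new payload was never supplied (new_rps == NULL).
 */
static int rps_needs_update(const struct rps_blob *cached,
                            const struct rps_blob *new_rps)
{
    if (!new_rps)
        return -1;  /* refuse instead of handing NULL to memcmp */
    if (!cached)
        return 1;   /* nothing cached yet, so always write */
    return memcmp(cached, new_rps, sizeof(*new_rps)) != 0;
}
```

Returning an error for the missing payload lets the caller fail the
request cleanly rather than fault inside memcmp.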

VP9 still kernel-side per the v4 cover ('This patch only adds support
for H264 and H265 in both variants'). Multiple competing out-of-tree
starting trees: Sarma's android tree (working but Android-flavored),
dongioia/rock5bplus-rkvdec2 (mainline-style claims), Kwiboo (no VP9
on RK3588 yet). RKVDEC2 separate-driver path is dead — future VP9
extends the existing rkvdec driver's VDPU381 variant_ops.

Five open questions tabled for Phase 1 — most important being campaign
re-scope (HEVC moves to backend campaign; this stays VP9 kernel-only
OR becomes a meta-campaign coordinating both).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:10:40 +00:00
marfrit 72f658d7b9 scaffold: kernel-side sibling campaign for RK3588 decoder enablement
Sister to ampere-fourier (userspace consumer) and fresnel-fourier
(RK3399 peer). Tracks the 2 kernel-side blockers from ampere-fourier
iter1: HEVC OOPS in rkvdec_hevc_prepare_hw_st_rps (kernel-agent#11)
and VP9 enablement on VDPU381/383 (kernel-agent#12). AV1 stays
userspace (libva-v4l2-request-fourier#2), not in this campaign.

Process notes: Phase 0 includes a non-optional upstream prior-art
survey (linux-rockchip / linux-media / linux-mm / Kwiboo / Bootlin)
before any code; per operator policy, patches go to kernel-agent
experiment branches, NOT into the linux-ampere-fourier baseline
package.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:07:55 +00:00