HW vs SW frame 100 byte-compare: 100.00% exact match (1382400 / 1382400 bytes). 182 unique byte values, identical top-10 distribution. Campaign actually completed at iter4 (kernel-agent#14 ext_sps NULL init + kernel-agent#15 HEVC_SLICE_PARAMS registration). iter5 'uniform Y=0x10/CbCr=0x80 black output' was a mis-diagnosis — the bbb video first 3 frames ARE genuinely all-black (intro fade-in). A 2-min SW-reference byte-compare at iter5 close would have ended the campaign 10+ hours earlier. iter6 (vb2 fence series, 3 versions, 6 WeChat recoveries): off-path, but found a real upstream NULL deref at dma_fence->context (offset 0x20) in dma_resv_add_fence — file as kernel-agent#16 when UART confirms the register dump. iter7 (DT dma-coherent on rkvdec): off-path AND falsified — RK3588 rkvdec is NOT in ACE-Lite coherent domain. dma-coherent causes HW timeouts. Reverted. 5 new/updated memories: - feedback_compare_hw_against_sw_reference.md (the lesson) - feedback_backup_before_module_replace.md - feedback_sddm_autologin_disable.md - feedback_no_session_termination_attempts.md (reinforced) - reference_dmabuf_resv_blocker.md (overturned claim) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
7.9 KiB
Campaign close — ampere HEVC decode bit-perfect
Date: 2026-05-17 ~00:42 (single-session campaign 2026-05-16 morning through 2026-05-17 early hours)
Branch: master
Cross-ref: phase0/phase1/etc. for iter1-iter4 (real bugs fixed); iter5_close.md (mis-diagnosis), iter6_v2_attempt2_close.md, iter6_v6_substrate_null_deref_at_0x20.md (off-path), phase0_findings_iter7.md (off-path)
Final verification (the missing iter5 check)
ssh ampere 'ffmpeg ... -i bbb_60s_720p.hevc.mp4 -vf format=nv12,select=eq(n,100) -frames:v 1 -f rawvideo /tmp/sw-frame100.nv12' # SW reference
ssh ampere 'LIBVA_DRIVER_NAME=v4l2_request ffmpeg ... -hwaccel vaapi -vf hwdownload,format=nv12,select=eq(n,100) -frames:v 1 -f rawvideo /tmp/hw-frame100.nv12' # HW path
ssh ampere 'python3 -c "..." byte-compare'
Result: HW vs SW frame 100 exact match: 100.00% (1382400 / 1382400 bytes identical). 182 unique byte values, identical top-10 frequency distribution (208:64838, 206:56757, 233:48907, 210:47080, 207:46245, ...). Same on iter5-IRQ pr_warn: every per-frame IRQ shows STA_INT=0x00000107 DEC_RDY=1 TIMEOUT=0 ERROR=0.
What actually closes the campaign
Two upstream-aligned kernel fixes:
| # | Patch | Status | Closes |
|---|---|---|---|
| iter3 / kernel-agent#14 | rkvdec_hevc_run_preamble NULL-inits run->ext_sps_st_rps / lt_rps before conditional assignment |
filed + verified by kernel-agent attempt | HEVC OOPS at prepare_hw_st_rps+0x38 (memcmp on stack-garbage 0x51a0) |
| iter4 / kernel-agent#15 | vdpu38x_hevc_ctrl_descs[] registers V4L2_CID_STATELESS_HEVC_SLICE_PARAMS with DYNAMIC_ARRAY/dims={600} (legacy rkvdec_hevc_ctrl_descs[] had it; Casanova v7.0 forgot for the RK3588/RK3576 new path) |
filed + verified by kernel-agent attempt | per-frame S_EXT_CTRLS EINVAL ("cannot find control id 0xa40a92") |
With both kernel patches + the existing libva-v4l2-request-fourier iter38b backend, HEVC decode on RK3588 is bit-perfect against libavcodec SW reference. Nothing else was needed.
Iter5 mis-diagnosis (root cause of wasted work)
At iter5 close, I observed ffmpeg -frames:v 3 output to /tmp/o.nv12 was uniform Y=0x10, CbCr=0x80. I incorrectly assumed:
- This meant the buffer was uninitialised OR the HW didn't actually decode
- Therefore there was a downstream "black-output bug" worth investigating
What was actually true:
- The bbb_60s_720p.hevc.mp4 video's first 3 frames are genuinely all-black (intro fade-in)
Y=0x10, CbCr=0x80is the correct NV12 representation of "video black" (studio range)- HW decoded exactly what SW would have decoded — the decoder was working
A single ffmpeg ... -frames:v 3 /tmp/sw-ref.nv12 SW reference + byte-compare at iter5 close would have shown 100% match, no bug, campaign closed.
What iter6 + iter7 produced
Off-path investigations triggered by the iter5 mis-diagnosis. Outcomes:
iter6 (vb2 fence series)
- Real finding (off-path but valuable for upstream): there's a NULL deref in
dma_resv_add_fence's context-merge iteration on RK3588 — virtual address 0x20 maps tostruct dma_fence::contextfield. Triggered byvb2_buffer_attach_release_fence'sUSAGE_WRITE(list-of-fences accumulation racing with RCU GC). Proposed fix: useUSAGE_KERNEL(single-slot atomic replace). Seeiter6_v6_substrate_null_deref_at_0x20.md. Not on ampere HEVC critical path — only matters for Wayland compositor + V4L2 dmabuf-import scenarios. - Cost: 6 WeChat-stick recoveries, 64-min lockdep kernel build, 53-min KASAN kernel build, multiple intermediate doc rounds.
- Status: file upstream as kernel-agent#16 when UART confirms register dump matches the hypothesis. Otherwise park as documented hypothesis.
iter7 (DT dma-coherent)
- Falsified: enabling
dma-coherentonvdec0/vdec1DT nodes makes the kernel skip cache management for rkvdec DMA. Hardware then TIMES OUT (13/15 framesTIMEOUT=1 DEC_RDY=0) because it reads stale input data and can't decode. CPU sees an all-zero CAPTURE buffer (kvzalloc'd default, never written). REVERTED — DTB restored from.pre-iter7-bkp. - Cost: ~20 min for DTS edit + dtb build + reboot cycle + verification.
- Status: closed as falsified.
dma-coherentdoes NOT apply to rkvdec on RK3588 — the IP is NOT in the ACE-Lite coherent domain. Confirms RK3588 rkvdec requires kernel DMA cache management.
Persistent state at campaign close
- ampere: vanilla
7.0.0-rc3-devices+kernel + iter3+iter4-fixed modules. Bit-perfect HEVC decode confirmed. Default extlinux labelarch_devices. - Source tree (
ampere:~/src/linux-rockchip): iter3 + iter4 + iter4-DIAG (validate_sps pr_warn, cosmetic) + iter5-IRQ (vdpu381_irq_handler pr_warn, cosmetic) patches uncommitted. Need to commit + push tolinux-rk3588-marfritbranch as the upstream-bound contribution. - Backend (
ampere:/usr/lib/dri/v4l2_request_drv_video.so): iter3 instrumented build (md5404041ea2dcc03c769e0ab8c43ddadd6). Diagnostic logging (iter3-Q4 pr_warn) can stay or be stripped — no behavioural impact. - Lockdep + lockdep-kasan kernels: installed in
/lib/modules/and/boot/firmware/, NOT default in extlinux. Available for future hypothesis work but not loaded. Modules at/lib/modules/7.0.0-rc3-lockdep+/are 0004-contaminated (will wedge if booted); the lockdep-kasan modules have the same issue. Either delete those labels from extlinux or leave as-is. - WeChat stick (
higgs:/dev/sdaor wherever it lives now): configured for ampere recovery — defaultcoolpi_rk3588_gbook, fstabLABEL=writable / ext4. Worked through 7 recovery cycles today. Preserved. - Iter6 v1 .ko backups:
ampere:~/iter6-broken-modules-bak-20260516-1720/(the OOPS-causing modules from the first iter6 attempt) — preserved for binary-diff analysis if anyone wants to forensic-pick the NULL-deref bug later. - DTS backup:
rk3588-coolpi-cm5-genbook.dts.pre-iter7-bkpon ampere,.pre-ramoops-bkpfrom earlier — iter6 ramoops DT changes still present in DT source but only enabled on the lockdep label, not used in the default vanilla boot. - Open kernel-agent issues: #14 (iter3, verified working), #15 (iter4, verified working), #16 TBD (iter6 NULL deref hypothesis, awaits UART). No follow-ups required for the ampere HEVC use case.
Memory updates landed this campaign
| Memory | Note |
|---|---|
feedback_compare_hw_against_sw_reference.md (new) |
the lesson from the iter5 mis-diagnosis |
feedback_backup_before_module_replace.md (new) |
the lesson from the iter6 v1 recovery |
feedback_sddm_autologin_disable.md (new) |
the lesson from "rename insufficient" |
feedback_no_session_termination_attempts.md (reinforced) |
5 repeat violations today, expanded incident citation |
reference_dmabuf_resv_blocker.md (corrected) |
overturned the "vb2_dma_resv fixes readback" claim — actually scope is compositor green-frames |
What's next (campaign-level, separate work)
- Commit iter3+iter4 patches to ampere:
~/src/linux-rockchiplinux-rk3588-marfritbranch + push (kernel-agent product source). - Promote both patches via kernel-agent into the next
linux-ampere-fourierpackage build (when ka-promote lands or via manual flow). - File iter6 NULL deref upstream once UART trace exists — separate from this campaign.
- Iter4 + iter5 close docs need a small amendment noting the actual end-state (closed correctly with this verification, not at iter5 close as previously documented).
Campaign metrics
- Wall-time: 2026-05-16 morning → 2026-05-17 ~00:42 (~16h of active session, multiple breaks)
- Iterations attempted: iter1-iter4 (productive) + iter5-iter7 (off-path)
- WeChat-stick recoveries: 7
- Kernel rebuilds: 3 (vanilla, lockdep, lockdep-kasan); 2 module-only rebuild cycles
- Kernel-agent issues opened: 2 (verified), 1 staged (TBD)
- Architect (Sonnet) review rounds: 3 + 1 amendment cycle
- The single check that would have closed the campaign 10+ hours earlier:
ffmpeg ... format=nv12 SW reference + byte-compare. Saved into memory.