Markus Fritsche fdd9c635cb Campaign close: HEVC decode bit-perfect on ampere
HW vs SW frame 100 byte-compare: 100.00% exact match (1382400 /
1382400 bytes). 182 unique byte values, identical top-10
distribution.

Campaign actually completed at iter4 (kernel-agent#14 ext_sps
NULL init + kernel-agent#15 HEVC_SLICE_PARAMS registration).
iter5 'uniform Y=0x10/CbCr=0x80 black output' was a
mis-diagnosis: the first 3 frames of the bbb video ARE
genuinely all-black (intro fade-in). A 2-min SW-reference
byte-compare at iter5 close would have ended the campaign
10+ hours earlier.

iter6 (vb2 fence series, 3 versions, 6 WeChat recoveries):
off-path, but found a real upstream NULL deref at
dma_fence->context (offset 0x20) in dma_resv_add_fence —
file as kernel-agent#16 when UART confirms the register dump.

iter7 (DT dma-coherent on rkvdec): off-path AND falsified —
RK3588 rkvdec is NOT in ACE-Lite coherent domain. dma-coherent
causes HW timeouts. Reverted.

5 new/updated memories:
- feedback_compare_hw_against_sw_reference.md (the lesson)
- feedback_backup_before_module_replace.md
- feedback_sddm_autologin_disable.md
- feedback_no_session_termination_attempts.md (reinforced)
- reference_dmabuf_resv_blocker.md (overturned claim)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:41:06 +00:00

Campaign close — ampere HEVC decode bit-perfect

Date: 2026-05-17 ~00:42 (single-session campaign, 2026-05-16 morning through the early hours of 2026-05-17)
Branch: master
Cross-ref: phase0/phase1/etc. for iter1-iter4 (real bugs fixed); iter5_close.md (mis-diagnosis); iter6_v2_attempt2_close.md, iter6_v6_substrate_null_deref_at_0x20.md (off-path); phase0_findings_iter7.md (off-path)

Final verification (the missing iter5 check)

ssh ampere 'ffmpeg ... -i bbb_60s_720p.hevc.mp4 -vf "format=nv12,select=eq(n\,100)" -frames:v 1 -f rawvideo /tmp/sw-frame100.nv12'   # SW reference
ssh ampere 'LIBVA_DRIVER_NAME=v4l2_request ffmpeg ... -hwaccel vaapi -vf "hwdownload,format=nv12,select=eq(n\,100)" -frames:v 1 -f rawvideo /tmp/hw-frame100.nv12'   # HW path
ssh ampere 'python3 -c "..." byte-compare'

Result: HW vs SW frame 100 exact match: 100.00% (1382400 / 1382400 bytes identical; 1280 x 720 x 1.5 = 1382400 bytes is exactly one 720p NV12 frame). Both frames contain 182 unique byte values with an identical top-10 frequency distribution (value:count 208:64838, 206:56757, 233:48907, 210:47080, 207:46245, ...). The iter5-IRQ pr_warn instrumentation agrees: every per-frame IRQ shows STA_INT=0x00000107 DEC_RDY=1 TIMEOUT=0 ERROR=0.
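
The byte-compare one-liner above is elided. Below is a minimal Python sketch of an equivalent check that reports the same metrics (exact-match percentage, unique byte values, value frequency distribution). It assumes both dumps are single 1280x720 NV12 frames. nv12_frame_size, compare_frames and compare_files are hypothetical names, not the script that was actually run:

```python
from collections import Counter


def nv12_frame_size(width: int, height: int) -> int:
    """One NV12 frame: full-resolution Y plane plus a half-size interleaved CbCr plane."""
    return width * height * 3 // 2


def compare_frames(hw: bytes, sw: bytes) -> dict:
    """Byte-compare two raw frames and summarise their byte-value histograms."""
    if len(hw) != len(sw):
        raise ValueError(f"size mismatch: hw={len(hw)} sw={len(sw)}")
    identical = sum(1 for a, b in zip(hw, sw) if a == b)
    hw_hist, sw_hist = Counter(hw), Counter(sw)
    return {
        "match_pct": 100.0 * identical / len(sw) if sw else 100.0,
        "identical_bytes": identical,
        "total_bytes": len(sw),
        "unique_values": (len(hw_hist), len(sw_hist)),
        # Full histogram equality implies identical top-10 counts as well.
        "histograms_equal": hw_hist == sw_hist,
        "top10_sw": sw_hist.most_common(10),
    }


def compare_files(hw_path: str, sw_path: str, width: int = 1280, height: int = 720) -> dict:
    """Read two raw NV12 dumps, check each is exactly one frame, then compare them."""
    expected = nv12_frame_size(width, height)
    frames = []
    for path in (hw_path, sw_path):
        with open(path, "rb") as f:
            data = f.read()
        if len(data) != expected:
            raise ValueError(f"{path}: {len(data)} bytes, expected {expected}")
        frames.append(data)
    return compare_frames(frames[0], frames[1])
```

For example, you could save the block as a module with the hypothetical name framecmp.py and run python3 -c 'import framecmp; print(framecmp.compare_files("/tmp/hw-frame100.nv12", "/tmp/sw-frame100.nv12"))'. The full histograms are compared rather than only the top 10, because most_common() orders tied counts by first occurrence.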

What actually closes the campaign

Two upstream-aligned kernel fixes:

| # | Patch | Status | Closes |
|---|---|---|---|
| iter3 / kernel-agent#14 | rkvdec_hevc_run_preamble NULL-inits run->ext_sps_st_rps / lt_rps before the conditional assignment | filed + verified by kernel-agent attempt | HEVC OOPS at prepare_hw_st_rps+0x38 (memcmp on stack-garbage 0x51a0) |
| iter4 / kernel-agent#15 | vdpu38x_hevc_ctrl_descs[] registers V4L2_CID_STATELESS_HEVC_SLICE_PARAMS with DYNAMIC_ARRAY/dims={600} (the legacy rkvdec_hevc_ctrl_descs[] had it; Casanova v7.0 omitted it from the new RK3588/RK3576 path) | filed + verified by kernel-agent attempt | per-frame S_EXT_CTRLS EINVAL ("cannot find control id 0xa40a92") |
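
The id in the #15 error decodes directly to the unregistered control. The Python sketch below shows the arithmetic. It uses the constant values from the V4L2 UAPI header include/uapi/linux/v4l2-controls.h: the stateless codec class is 0x00a40000, the stateless base is that class | 0x900, and the HEVC controls sit at base + 400 onwards. The name table is transcribed by hand and covers only part of the HEVC range, so treat it as illustrative:

```python
# Values as defined in include/uapi/linux/v4l2-controls.h (hand-transcribed HEVC subset).
V4L2_CTRL_CLASS_CODEC_STATELESS = 0x00A40000
V4L2_CID_CODEC_STATELESS_BASE = V4L2_CTRL_CLASS_CODEC_STATELESS | 0x900

HEVC_CTRLS = {
    400: "V4L2_CID_STATELESS_HEVC_SPS",
    401: "V4L2_CID_STATELESS_HEVC_PPS",
    402: "V4L2_CID_STATELESS_HEVC_SLICE_PARAMS",
    403: "V4L2_CID_STATELESS_HEVC_SCALING_MATRIX",
    404: "V4L2_CID_STATELESS_HEVC_DECODE_PARAMS",
    405: "V4L2_CID_STATELESS_HEVC_DECODE_MODE",
    406: "V4L2_CID_STATELESS_HEVC_START_CODE",
}


def ctrl_class(ctrl_id: int) -> int:
    """Same masking as the V4L2_CTRL_ID2CLASS() macro."""
    return ctrl_id & 0x0FFF0000


def decode_stateless_hevc(ctrl_id: int) -> str:
    """Name a stateless-codec control id by its offset from the stateless base."""
    if ctrl_class(ctrl_id) != V4L2_CTRL_CLASS_CODEC_STATELESS:
        return f"0x{ctrl_id:x}: not a stateless-codec control"
    offset = ctrl_id - V4L2_CID_CODEC_STATELESS_BASE
    name = HEVC_CTRLS.get(offset, f"stateless base + {offset} (not in this table)")
    return f"0x{ctrl_id:x} = {name}"


print(decode_stateless_hevc(0xA40A92))  # 0xa40a92 = V4L2_CID_STATELESS_HEVC_SLICE_PARAMS
```

0xa40a92 - 0xa40900 = 0x192 = 402, which is the slice-params slot. That is consistent with the EINVAL disappearing once vdpu38x_hevc_ctrl_descs[] registers that control.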

With both kernel patches plus the existing libva-v4l2-request-fourier iter38b backend, HEVC decode on RK3588 is bit-perfect against the libavcodec SW reference (verified by the frame-100 byte-compare above). Nothing else was needed.

Iter5 mis-diagnosis (root cause of wasted work)

At iter5 close, I observed that the ffmpeg -frames:v 3 output in /tmp/o.nv12 was uniformly Y=0x10, CbCr=0x80, and incorrectly assumed:

  • This meant the buffer was uninitialised OR the HW didn't actually decode
  • Therefore there was a downstream "black-output bug" worth investigating

What was actually true:

  • The bbb_60s_720p.hevc.mp4 video's first 3 frames are genuinely all-black (intro fade-in)
  • Y=0x10, CbCr=0x80 is the correct NV12 representation of "video black" (studio range)
  • HW decoded exactly what SW would have decoded — the decoder was working
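
The reason 0x10/0x80 means black: in 8-bit limited ("studio") range, Y = 16 + 219*E'Y, Cb = 128 + 224*E'Pb and Cr = 128 + 224*E'Pr. Zero luma with zero colour difference therefore lands on Y=16, Cb=Cr=128. The minimal Python sketch below assumes BT.709 coefficients (the black point comes out the same under BT.601) and the NV12 layout (Y plane, then interleaved CbCr at half resolution). rgb_to_ycbcr_limited and is_video_black are hypothetical helper names:

```python
def rgb_to_ycbcr_limited(r: float, g: float, b: float,
                         kr: float = 0.2126, kb: float = 0.0722) -> tuple:
    """Map non-linear R'G'B' in [0, 1] to 8-bit limited-range Y'CbCr (BT.709 Kr/Kb by default)."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    pb = (b - y) / (2.0 * (1.0 - kb))
    pr = (r - y) / (2.0 * (1.0 - kr))
    return round(16 + 219 * y), round(128 + 224 * pb), round(128 + 224 * pr)


def is_video_black(frame: bytes, width: int, height: int) -> bool:
    """True if a single NV12 frame is uniformly Y=0x10 and Cb=Cr=0x80."""
    luma = width * height
    if len(frame) != luma * 3 // 2:
        raise ValueError("not a single NV12 frame of the given size")
    y_plane, uv_plane = frame[:luma], frame[luma:]
    return y_plane.count(0x10) == luma and uv_plane.count(0x80) == len(uv_plane)


print(rgb_to_ycbcr_limited(0.0, 0.0, 0.0))  # (16, 128, 128): video black
print(rgb_to_ycbcr_limited(1.0, 1.0, 1.0))  # (235, 128, 128): video white
```

An all-zero buffer (like the never-written CAPTURE buffer seen in iter7) fails this check. With Cb=Cr=0 it displays as dark green rather than black, which is one way to tell "decoded black" apart from "never written".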

A single SW-reference decode (ffmpeg ... -frames:v 3 /tmp/sw-ref.nv12) plus a byte-compare at iter5 close would have shown a 100% match: no bug, campaign closed.
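
For a multi-frame dump such as the -frames:v 3 case, the same check can run per frame. Split both raw NV12 streams into fixed-size frames and compare each pair, so that any mismatch is pinned to a frame index. The sketch below works under the same assumptions (1280x720 NV12 by default). per_frame_mismatches is a hypothetical name:

```python
def per_frame_mismatches(hw: bytes, sw: bytes, width: int = 1280, height: int = 720) -> list:
    """Return the indices of frames that differ between two raw NV12 streams."""
    frame_size = width * height * 3 // 2
    if len(hw) != len(sw) or len(sw) % frame_size:
        raise ValueError("streams must be equal length and a whole number of frames")
    mismatches = []
    for i in range(len(sw) // frame_size):
        start = i * frame_size
        if hw[start:start + frame_size] != sw[start:start + frame_size]:
            mismatches.append(i)
    return mismatches
```

An empty list means every frame matches. For iter5, that check would have been run on the contents of /tmp/o.nv12 and /tmp/sw-ref.nv12.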

What iter6 + iter7 produced

Off-path investigations triggered by the iter5 mis-diagnosis. Outcomes:

iter6 (vb2 fence series)

  • Real finding (off-path, but valuable for upstream): a NULL deref in dma_resv_add_fence's context-merge iteration on RK3588. The faulting virtual address 0x20 is the offset of struct dma_fence::context, i.e. a NULL fence pointer being dereferenced. Hypothesised trigger: vb2_buffer_attach_release_fence's use of USAGE_WRITE (list-of-fences accumulation racing with RCU GC). Proposed fix: use USAGE_KERNEL (single-slot atomic replace). See iter6_v6_substrate_null_deref_at_0x20.md. Not on the ampere HEVC critical path; it only matters for Wayland compositor + V4L2 dmabuf-import scenarios.
  • Cost: 6 WeChat-stick recoveries, 64-min lockdep kernel build, 53-min KASAN kernel build, multiple intermediate doc rounds.
  • Status: file upstream as kernel-agent#16 once UART confirms that the register dump matches the hypothesis; otherwise, park it as a documented hypothesis.
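
You can check the 0x20 attribution by laying out struct dma_fence with LP64 (aarch64) field sizes. The minimal ctypes sketch below assumes the field order of include/linux/dma-fence.h in recent mainline: lock, ops, a union of cb_list/timestamp/rcu, then context, seqno, flags, refcount and error. Verify that order against the 7.0.0-rc3 tree before relying on it. Pointers and unsigned long are modelled as 8-byte integers, so the result does not depend on the host:

```python
import ctypes

PTR = ctypes.c_uint64  # aarch64 LP64 pointer width, independent of the host running this


class ListHead(ctypes.Structure):   # struct list_head { next, prev }
    _fields_ = [("next", PTR), ("prev", PTR)]


class RcuHead(ctypes.Structure):    # struct rcu_head { next, func }
    _fields_ = [("next", PTR), ("func", PTR)]


class FenceUnion(ctypes.Union):     # cb_list / timestamp / rcu share the same storage
    _fields_ = [("cb_list", ListHead), ("timestamp", ctypes.c_int64), ("rcu", RcuHead)]


class DmaFence(ctypes.Structure):
    _fields_ = [
        ("lock", PTR),                 # spinlock_t *
        ("ops", PTR),                  # const struct dma_fence_ops *
        ("u", FenceUnion),             # anonymous union in the kernel header
        ("context", ctypes.c_uint64),  # u64
        ("seqno", ctypes.c_uint64),    # u64
        ("flags", ctypes.c_uint64),    # unsigned long on LP64
        ("refcount", ctypes.c_int32),  # struct kref -> refcount_t -> atomic_t (int)
        ("error", ctypes.c_int32),     # int
    ]


for name in ("lock", "ops", "u", "context", "seqno"):
    print(f"{name:8s} offset 0x{getattr(DmaFence, name).offset:02x}")
```

With this layout, context sits at 0x20. That is consistent with, though not proof of, a NULL dma_fence pointer. Any structure with a field at 0x20, dereferenced through NULL, would fault at the same address, which is why the UART register dump is still needed.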

iter7 (DT dma-coherent)

  • Falsified: enabling dma-coherent on the vdec0/vdec1 DT nodes makes the kernel skip cache management for rkvdec DMA. The hardware then TIMES OUT (13/15 frames TIMEOUT=1 DEC_RDY=0) because it reads stale input data and cannot decode. The CPU sees an all-zero CAPTURE buffer (kvzalloc'd default, never written by the HW). REVERTED: DTB restored from .pre-iter7-bkp.
  • Cost: ~20 min for DTS edit + dtb build + reboot cycle + verification.
  • Status: closed as falsified. dma-coherent does NOT apply to rkvdec on RK3588 — the IP is NOT in the ACE-Lite coherent domain. Confirms RK3588 rkvdec requires kernel DMA cache management.

Persistent state at campaign close

  • ampere: vanilla 7.0.0-rc3-devices+ kernel + iter3+iter4-fixed modules. Bit-perfect HEVC decode confirmed. Default extlinux label arch_devices.
  • Source tree (ampere:~/src/linux-rockchip): iter3 + iter4 + iter4-DIAG (validate_sps pr_warn, cosmetic) + iter5-IRQ (vdpu381_irq_handler pr_warn, cosmetic) patches uncommitted. Need to commit + push to linux-rk3588-marfrit branch as the upstream-bound contribution.
  • Backend (ampere:/usr/lib/dri/v4l2_request_drv_video.so): iter3 instrumented build (md5 404041ea2dcc03c769e0ab8c43ddadd6). Diagnostic logging (iter3-Q4 pr_warn) can stay or be stripped — no behavioural impact.
  • Lockdep + lockdep-kasan kernels: installed in /lib/modules/ and /boot/firmware/, NOT default in extlinux. Available for future hypothesis work but not loaded. Modules at /lib/modules/7.0.0-rc3-lockdep+/ are 0004-contaminated (will wedge if booted); the lockdep-kasan modules have the same issue. Either delete those labels from extlinux or leave as-is.
  • WeChat stick (higgs:/dev/sda or wherever it lives now): configured for ampere recovery — default coolpi_rk3588_gbook, fstab LABEL=writable / ext4. Worked through 7 recovery cycles today. Preserved.
  • Iter6 v1 .ko backups: ampere:~/iter6-broken-modules-bak-20260516-1720/ (the OOPS-causing modules from the first iter6 attempt) — preserved for binary-diff analysis if anyone wants to forensic-pick the NULL-deref bug later.
  • DTS backup: rk3588-coolpi-cm5-genbook.dts.pre-iter7-bkp on ampere, .pre-ramoops-bkp from earlier — iter6 ramoops DT changes still present in DT source but only enabled on the lockdep label, not used in the default vanilla boot.
  • Open kernel-agent issues: #14 (iter3, verified working), #15 (iter4, verified working), #16 TBD (iter6 NULL deref hypothesis, awaits UART). No follow-ups required for the ampere HEVC use case.

Memory updates landed this campaign

| Memory | Note |
|---|---|
| feedback_compare_hw_against_sw_reference.md (new) | the lesson from the iter5 mis-diagnosis |
| feedback_backup_before_module_replace.md (new) | the lesson from the iter6 v1 recovery |
| feedback_sddm_autologin_disable.md (new) | the lesson from "rename insufficient" |
| feedback_no_session_termination_attempts.md (reinforced) | 5 repeat violations today, expanded incident citation |
| reference_dmabuf_resv_blocker.md (corrected) | overturned the "vb2_dma_resv fixes readback" claim; its actual scope is compositor green frames |

What's next (campaign-level, separate work)

  • Commit iter3+iter4 patches to ampere:~/src/linux-rockchip linux-rk3588-marfrit branch + push (kernel-agent product source).
  • Promote both patches via kernel-agent into the next linux-ampere-fourier package build (when ka-promote lands or via manual flow).
  • File iter6 NULL deref upstream once UART trace exists — separate from this campaign.
  • Iter4 + iter5 close docs need a small amendment noting the actual end-state (closed correctly with this verification, not at iter5 close as previously documented).

Campaign metrics

  • Wall-time: 2026-05-16 morning → 2026-05-17 ~00:42 (~16h of active session, multiple breaks)
  • Iterations attempted: iter1-iter4 (productive) + iter5-iter7 (off-path)
  • WeChat-stick recoveries: 7
  • Kernel rebuilds: 3 (vanilla, lockdep, lockdep-kasan); 2 module-only rebuild cycles
  • Kernel-agent issues opened: 2 (verified), 1 staged (TBD)
  • Architect (Sonnet) review rounds: 3 + 1 amendment cycle
  • The single check that would have closed the campaign 10+ hours earlier: an ffmpeg ... format=nv12 SW-reference decode plus a byte-compare. Saved to memory as feedback_compare_hw_against_sw_reference.md.