fdd9c635cb
HW vs SW frame 100 byte-compare: 100.00% exact match (1382400 / 1382400 bytes). 182 unique byte values, identical top-10 distribution. Campaign actually completed at iter4 (kernel-agent#14 ext_sps NULL init + kernel-agent#15 HEVC_SLICE_PARAMS registration). iter5 'uniform Y=0x10/CbCr=0x80 black output' was a mis-diagnosis — the bbb video first 3 frames ARE genuinely all-black (intro fade-in). A 2-min SW-reference byte-compare at iter5 close would have ended the campaign 10+ hours earlier. iter6 (vb2 fence series, 3 versions, 6 WeChat recoveries): off-path, but found a real upstream NULL deref at dma_fence->context (offset 0x20) in dma_resv_add_fence — file as kernel-agent#16 when UART confirms the register dump. iter7 (DT dma-coherent on rkvdec): off-path AND falsified — RK3588 rkvdec is NOT in ACE-Lite coherent domain. dma-coherent causes HW timeouts. Reverted. 5 new/updated memories: - feedback_compare_hw_against_sw_reference.md (the lesson) - feedback_backup_before_module_replace.md - feedback_sddm_autologin_disable.md - feedback_no_session_termination_attempts.md (reinforced) - reference_dmabuf_resv_blocker.md (overturned claim) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
92 lines
7.9 KiB
Markdown
92 lines
7.9 KiB
Markdown
# Campaign close — ampere HEVC decode bit-perfect
|
|
|
|
Date: 2026-05-17 ~00:42 (single-session campaign 2026-05-16 morning through 2026-05-17 early hours)
|
|
Branch: `master`
|
|
Cross-ref: phase0/phase1/etc. for iter1-iter4 (real bugs fixed); iter5_close.md (mis-diagnosis), iter6_v2_attempt2_close.md, iter6_v6_substrate_null_deref_at_0x20.md (off-path), phase0_findings_iter7.md (off-path)
|
|
|
|
## Final verification (the missing iter5 check)
|
|
|
|
```
|
|
ssh ampere 'ffmpeg ... -i bbb_60s_720p.hevc.mp4 -vf format=nv12,select=eq(n,100) -frames:v 1 -f rawvideo /tmp/sw-frame100.nv12' # SW reference
|
|
ssh ampere 'LIBVA_DRIVER_NAME=v4l2_request ffmpeg ... -hwaccel vaapi -vf hwdownload,format=nv12,select=eq(n,100) -frames:v 1 -f rawvideo /tmp/hw-frame100.nv12' # HW path
|
|
ssh ampere 'python3 -c "..." byte-compare'
|
|
```
|
|
|
|
Result: **HW vs SW frame 100 exact match: 100.00%** (1382400 / 1382400 bytes identical). 182 unique byte values, identical top-10 frequency distribution `(208:64838, 206:56757, 233:48907, 210:47080, 207:46245, ...)`. Same on iter5-IRQ pr_warn: every per-frame IRQ shows `STA_INT=0x00000107 DEC_RDY=1 TIMEOUT=0 ERROR=0`.
|
|
|
|
## What actually closes the campaign
|
|
|
|
Two upstream-aligned kernel fixes:
|
|
|
|
| # | Patch | Status | Closes |
|
|
|---|-------|--------|--------|
|
|
| iter3 / kernel-agent#14 | `rkvdec_hevc_run_preamble` NULL-inits `run->ext_sps_st_rps` / `lt_rps` before conditional assignment | filed + verified by kernel-agent attempt | HEVC OOPS at `prepare_hw_st_rps+0x38` (memcmp on stack-garbage 0x51a0) |
|
|
| iter4 / kernel-agent#15 | `vdpu38x_hevc_ctrl_descs[]` registers `V4L2_CID_STATELESS_HEVC_SLICE_PARAMS` with `DYNAMIC_ARRAY`/dims={600} (legacy `rkvdec_hevc_ctrl_descs[]` had it; Casanova v7.0 forgot for the RK3588/RK3576 new path) | filed + verified by kernel-agent attempt | per-frame `S_EXT_CTRLS` EINVAL ("cannot find control id 0xa40a92") |
|
|
|
|
With both kernel patches + the existing `libva-v4l2-request-fourier` iter38b backend, HEVC decode on RK3588 is **bit-perfect against libavcodec SW reference**. Nothing else was needed.
|
|
|
|
## Iter5 mis-diagnosis (root cause of wasted work)
|
|
|
|
At iter5 close, I observed `ffmpeg -frames:v 3` output to `/tmp/o.nv12` was uniform `Y=0x10, CbCr=0x80`. I incorrectly assumed:
|
|
- This meant the buffer was uninitialised OR the HW didn't actually decode
|
|
- Therefore there was a downstream "black-output bug" worth investigating
|
|
|
|
What was actually true:
|
|
- The bbb_60s_720p.hevc.mp4 video's first 3 frames are **genuinely all-black** (intro fade-in)
|
|
- `Y=0x10, CbCr=0x80` is the **correct NV12 representation of "video black" (studio range)**
|
|
- HW decoded exactly what SW would have decoded — the decoder was working
|
|
|
|
A single `ffmpeg ... -frames:v 3 /tmp/sw-ref.nv12` SW reference + byte-compare at iter5 close would have shown 100% match, no bug, campaign closed.
|
|
|
|
## What iter6 + iter7 produced
|
|
|
|
Off-path investigations triggered by the iter5 mis-diagnosis. Outcomes:
|
|
|
|
### iter6 (vb2 fence series)
|
|
- **Real finding (off-path but valuable for upstream)**: there's a NULL deref in `dma_resv_add_fence`'s context-merge iteration on RK3588 — virtual address 0x20 maps to `struct dma_fence::context` field. Triggered by `vb2_buffer_attach_release_fence`'s `USAGE_WRITE` (list-of-fences accumulation racing with RCU GC). Proposed fix: use `USAGE_KERNEL` (single-slot atomic replace). See `iter6_v6_substrate_null_deref_at_0x20.md`. **Not on ampere HEVC critical path** — only matters for Wayland compositor + V4L2 dmabuf-import scenarios.
|
|
- **Cost**: 6 WeChat-stick recoveries, 64-min lockdep kernel build, 53-min KASAN kernel build, multiple intermediate doc rounds.
|
|
- **Status**: file upstream as kernel-agent#16 when UART confirms register dump matches the hypothesis. Otherwise park as documented hypothesis.
|
|
|
|
### iter7 (DT dma-coherent)
|
|
- **Falsified**: enabling `dma-coherent` on `vdec0` / `vdec1` DT nodes makes the kernel skip cache management for rkvdec DMA. Hardware then **TIMES OUT** (13/15 frames `TIMEOUT=1 DEC_RDY=0`) because it reads stale input data and can't decode. CPU sees an all-zero CAPTURE buffer (kvzalloc'd default, never written). REVERTED — DTB restored from `.pre-iter7-bkp`.
|
|
- **Cost**: ~20 min for DTS edit + dtb build + reboot cycle + verification.
|
|
- **Status**: closed as falsified. `dma-coherent` does NOT apply to rkvdec on RK3588 — the IP is NOT in the ACE-Lite coherent domain. Confirms RK3588 rkvdec requires kernel DMA cache management.
|
|
|
|
## Persistent state at campaign close
|
|
|
|
- **ampere**: vanilla `7.0.0-rc3-devices+` kernel + iter3+iter4-fixed modules. Bit-perfect HEVC decode confirmed. Default extlinux label `arch_devices`.
|
|
- **Source tree** (`ampere:~/src/linux-rockchip`): iter3 + iter4 + iter4-DIAG (validate_sps pr_warn, cosmetic) + iter5-IRQ (vdpu381_irq_handler pr_warn, cosmetic) patches uncommitted. Need to commit + push to `linux-rk3588-marfrit` branch as the upstream-bound contribution.
|
|
- **Backend** (`ampere:/usr/lib/dri/v4l2_request_drv_video.so`): iter3 instrumented build (md5 `404041ea2dcc03c769e0ab8c43ddadd6`). Diagnostic logging (iter3-Q4 pr_warn) can stay or be stripped — no behavioural impact.
|
|
- **Lockdep + lockdep-kasan kernels**: installed in `/lib/modules/` and `/boot/firmware/`, NOT default in extlinux. Available for future hypothesis work but not loaded. Modules at `/lib/modules/7.0.0-rc3-lockdep+/` are 0004-contaminated (will wedge if booted); the lockdep-kasan modules have the same issue. Either delete those labels from extlinux or leave as-is.
|
|
- **WeChat stick** (`higgs:/dev/sda` or wherever it lives now): configured for ampere recovery — default `coolpi_rk3588_gbook`, fstab `LABEL=writable / ext4`. Worked through 7 recovery cycles today. Preserved.
|
|
- **Iter6 v1 .ko backups**: `ampere:~/iter6-broken-modules-bak-20260516-1720/` (the OOPS-causing modules from the first iter6 attempt) — preserved for binary-diff analysis if anyone wants to forensic-pick the NULL-deref bug later.
|
|
- **DTS backup**: `rk3588-coolpi-cm5-genbook.dts.pre-iter7-bkp` on ampere, `.pre-ramoops-bkp` from earlier — iter6 ramoops DT changes still present in DT source but only enabled on the lockdep label, not used in the default vanilla boot.
|
|
- **Open kernel-agent issues**: #14 (iter3, verified working), #15 (iter4, verified working), #16 TBD (iter6 NULL deref hypothesis, awaits UART). No follow-ups required for the ampere HEVC use case.
|
|
|
|
## Memory updates landed this campaign
|
|
|
|
| Memory | Note |
|
|
|---|---|
|
|
| `feedback_compare_hw_against_sw_reference.md` (new) | the lesson from the iter5 mis-diagnosis |
|
|
| `feedback_backup_before_module_replace.md` (new) | the lesson from the iter6 v1 recovery |
|
|
| `feedback_sddm_autologin_disable.md` (new) | the lesson from "rename insufficient" |
|
|
| `feedback_no_session_termination_attempts.md` (reinforced) | 5 repeat violations today, expanded incident citation |
|
|
| `reference_dmabuf_resv_blocker.md` (corrected) | overturned the "vb2_dma_resv fixes readback" claim — actually scope is compositor green-frames |
|
|
|
|
## What's next (campaign-level, separate work)
|
|
|
|
- Commit iter3+iter4 patches to ampere:`~/src/linux-rockchip` `linux-rk3588-marfrit` branch + push (kernel-agent product source).
|
|
- Promote both patches via kernel-agent into the next `linux-ampere-fourier` package build (when ka-promote lands or via manual flow).
|
|
- File iter6 NULL deref upstream once UART trace exists — separate from this campaign.
|
|
- Iter4 + iter5 close docs need a small amendment noting the actual end-state (closed correctly with this verification, not at iter5 close as previously documented).
|
|
|
|
## Campaign metrics
|
|
|
|
- Wall-time: 2026-05-16 morning → 2026-05-17 ~00:42 (~16h of active session, multiple breaks)
|
|
- Iterations attempted: iter1-iter4 (productive) + iter5-iter7 (off-path)
|
|
- WeChat-stick recoveries: 7
|
|
- Kernel rebuilds: 3 (vanilla, lockdep, lockdep-kasan); 2 module-only rebuild cycles
|
|
- Kernel-agent issues opened: 2 (verified), 1 staged (TBD)
|
|
- Architect (Sonnet) review rounds: 3 + 1 amendment cycle
|
|
- The single check that would have closed the campaign 10+ hours earlier: `ffmpeg ... format=nv12 SW reference + byte-compare`. Saved into memory.
|