ampere-kernel-decoders/phase0_findings_iter7.md
Markus Fritsche 24272596cd iter7 Phase 0: pivot to cache-coherency hypothesis (H1) for iter5 black-output
iter6 was an off-path investigation. Sonnet round-1 review + my
own corrected memory feedback_rfc_v2_vb2_dma_resv_scope.md make
clear: vb2 fence series targets Wayland compositor green-frames,
not libva cached-mmap readback. iter6 found a real upstream NULL
deref bug (filed for kernel-agent#16 when UART captures the trace)
but it's not on the critical path for iter5.

iter7 returns to iter5's actual hypothesis ladder:
- H1(a) DT dma-coherent on rkvdec node — cheapest, first
- H1(b) backend DMA_BUF_IOCTL_SYNC userspace fix — if H1(a) fails
- H2 wrong-DMA-address — if H1(a)+(b) fail
- H3 false-positive DEC_RDY — last resort

Test on vanilla kernel + iter3+iter4-fixed modules (already
verified working pipeline). Pass criterion: ffmpeg-vaapi output
shows more than {16, 128} unique bytes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:32:20 +00:00

# Phase 0 — iter7 substrate (cache-coherency angle for iter5 black-output)
Opened 2026-05-17 ~00:45, following the iter6 v1-v6 chain that exhausted the vb2-fence-series hypothesis (iter6 was an off-path investigation; see Why-pivot below).
## Research question
**Does forcing the RK3588 rkvdec DMA path into the cache-coherent domain (via DT `dma-coherent` property on the rkvdec node) make iter5's CAPTURE buffer readback show real decoded NV12 content instead of uniform Y=0x10/CbCr=0x80?**
## Locked-in evidence carried from iter5 (still binding)
| Observation | Source | Status |
|-------------|--------|--------|
| Iter3+iter4 kernel patches verified working: HEVC OOPS gone, EINVAL gone, decoder runs end-to-end | iter3_close.md (kernel-agent#14) + iter4_close.md (kernel-agent#15) | verified, both upstream-aligned |
| ffmpeg HEVC test: rc=0, /tmp/o.nv12 = 4147200 bytes (exact 3×NV12-frame size for 1280×720) | iter5_close.md | confirmed |
| iter5-IRQ pr_warn diagnostic: every frame ends with `STA_INT=0x107 DEC_RDY=1 TIMEOUT=0 ERROR=0` | iter5_close.md | confirmed — hardware decode succeeds |
| Output bytes are uniform Y=0x10 (luma "video black") and Cb/Cr=0x80 (chroma neutral) — solid black | iter5_close.md | confirmed — symptom not garbage |
| OUTPUT bitstream is byte-identical to raw HEVC NAL with Annex-B prefix prepended | iter5_close.md | confirmed |
| populate_ext_sps_rps_cache returns -ENODATA because ffmpeg-vaapi strips SPS/VPS/PPS | iter5_close.md | confirmed Phase 5 reviewer's prediction |
| Backend's 5-control batch commits (post iter4 SLICE_PARAMS registration) | iter5_close.md | confirmed via iter4-DIAG validate_sps firing per-frame |
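
The 4147200-byte figure in the table can be re-derived from the NV12 layout. A minimal sketch, assuming 8-bit samples and tightly packed planes with no stride padding (the helper name `nv12_frame_bytes` is illustrative, not from the backend):

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit NV12 frame, assuming no row padding.

    NV12 stores a full-resolution Y plane followed by one interleaved
    CbCr plane subsampled 2x2, so chroma adds half the luma size.
    """
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even width and height")
    luma = width * height
    chroma = (width // 2) * (height // 2) * 2  # one Cb + one Cr per 2x2 block
    return luma + chroma


frame = nv12_frame_bytes(1280, 720)
print(frame)      # 1382400
print(3 * frame)  # 4147200, the size of /tmp/o.nv12 for three frames
```
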
## Why pivot from iter6 (vb2 fence series)
Iter6's premise was: "vb2 fence series unlocks libva cached-mmap readback". **Premise is wrong.** Sonnet round-1 architect review pointed this out; my own memory `feedback_rfc_v2_vb2_dma_resv_scope.md` (which I corrected today) confirms: the fence series targets **Wayland compositor implicit-sync green-frames** on GPU consumers, NOT the libva readback path. iter6's investigation found a real upstream bug (NULL deref at dma_fence->context inside dma_resv_add_fence, see `iter6_v6_substrate_null_deref_at_0x20.md`), but that bug fix wouldn't have made iter5 work either. iter6 = off-path; iter7 = back on the right hypothesis ladder.
## Iter5's actual hypothesis ladder (carried to iter7)
| H | Hypothesis | Test |
|---|-----------|------|
| H1 | **DMA cache coherency** between hardware-written CAPTURE buffer and userspace cached-mmap. HW writes to physical RAM. CPU reads via cached mmap. If rkvdec isn't in the ACE-Lite coherent domain (or the kernel doesn't know it is), CPU cache holds stale pre-decode bytes. | (a) DT `dma-coherent` property on rkvdec node; (b) backend `DMA_BUF_IOCTL_SYNC` round-trip |
| H2 | HW writes to a different physical address than vb2-allocated CAPTURE buffer | log dst_addr in rkvdec.config_registers vs vb2_dma_contig_plane_dma_addr; compare |
| H3 | DEC_RDY=1 is a false positive — pipeline registers look valid but actual decode is no-op | examine register-config code |
H1 is the leading hypothesis (matches RK3399 pattern from `reference_dmabuf_resv_blocker.md`, matches the "solid value" symptom). Iter7 starts with H1(a): DT `dma-coherent`.
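
For reference, the H1(b) round-trip is a begin/end pair of `DMA_BUF_IOCTL_SYNC` calls around the CPU read. The sketch below shows that bracket in Python rather than in the backend's own code. It assumes the CAPTURE buffer is available as a dma-buf fd (for example via `VIDIOC_EXPBUF`) and that the readback goes through an mmap of that fd; neither has been confirmed for this backend. The flag values and the `struct dma_buf_sync { __u64 flags; }` layout come from `include/uapi/linux/dma-buf.h`; the helper names are hypothetical.

```python
import fcntl
import struct
from contextlib import contextmanager

# Flag values from include/uapi/linux/dma-buf.h
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_WRITE = 2 << 0
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2


def _iow(type_char: str, nr: int, size: int) -> int:
    """Generic _IOW() encoding (asm-generic/ioctl.h layout, as on arm64/x86)."""
    ioc_write = 1
    return (ioc_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


# DMA_BUF_IOCTL_SYNC is _IOW('b', 0, struct dma_buf_sync), one __u64 of flags.
DMA_BUF_IOCTL_SYNC = _iow("b", 0, struct.calcsize("Q"))


def dma_buf_sync(fd: int, flags: int) -> None:
    """Issue one DMA_BUF_IOCTL_SYNC call on a dma-buf fd."""
    fcntl.ioctl(fd, DMA_BUF_IOCTL_SYNC, struct.pack("Q", flags))


@contextmanager
def cpu_read_access(dmabuf_fd: int):
    """Bracket a CPU read of a device-written dma-buf with SYNC_START/SYNC_END."""
    dma_buf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ)
    try:
        yield
    finally:
        dma_buf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ)
```

Usage would look like `with cpu_read_access(fd): data = mapping[:]`, where `fd` and `mapping` are the exported dma-buf and its mmap. On a non-coherent device the START call is what gives the exporter the chance to invalidate stale CPU cache lines before the read.
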
## Substrate for iter7 H1(a) test
- ampere kernel source: `~/src/linux-rockchip` branch `ampere-minimal-devices`; working tree has iter3+iter4 patches uncommitted and iter6 patches partially applied. Cleanup is needed eventually but does not block this test: only the dtb is rebuilt, and the source state of the .c files doesn't matter for a dtb-only build
- DTS file to edit: `arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dts`
- rkvdec node: `&rkvdec_ccu`, `&rkvdec0`, `&rkvdec1` (rk3588 has dual-core rkvdec). For initial test, add `dma-coherent` to ALL rkvdec nodes
- DTB output: `arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dtb`
- Install path: `/boot/firmware/rk3588-coolpi-cm5-genbook.dtb-7.0.0-rc3-devices+` (vanilla kernel's dtb, since we're testing on vanilla which has iter3+iter4 fixes baked into the loaded modules)
- Test reboot needed (the kernel parses the DTB only once, during early init)
- Test command: `LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i ~/measurements/encoded/bbb_60s_720p.hevc.mp4 -vf hwdownload,format=nv12 -frames:v 3 -f rawvideo -pix_fmt nv12 /tmp/o.nv12`
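- Pass criterion: `head -c 4147200 /tmp/o.nv12 | od -An -v -tu1 -w1 | sort -u | head -10` shows MORE THAN 2 unique bytes (i.e. not just `16` and `128`). The `-v` is required: without it `od` collapses runs of identical lines into a `*` line, which `sort -u` would count as a spurious third value even on solid-black output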
- Better pass criterion: byte-compare against ffmpeg SW-decoded NV12 of the same source — if reasonably close (HW vs SW differ slightly but not solid-color), confirmed PASS
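
Both pass criteria above can also be checked from a short script instead of a shell pipeline. This is a minimal sketch: the function names are illustrative, the software-decoded reference file is assumed to have been produced separately with the same frame count and pixel format, and no numeric threshold for "reasonably close" is implied.

```python
def looks_solid_black(data: bytes) -> bool:
    """True when the dump contains nothing but 16 (Y black) and 128 (neutral chroma)."""
    return set(data) <= {16, 128}


def unique_byte_values(data: bytes) -> list:
    """Sorted list of distinct byte values, the script equivalent of `sort -u`."""
    return sorted(set(data))


def mean_abs_diff(hw: bytes, sw: bytes) -> float:
    """Mean absolute per-byte difference between HW and SW decoded dumps."""
    if len(hw) != len(sw):
        raise ValueError("dumps differ in length; frame count or format mismatch")
    if not hw:
        return 0.0
    return sum(abs(a - b) for a, b in zip(hw, sw)) / len(hw)


def read_dump(path: str, limit: int = 4147200) -> bytes:
    """Read at most `limit` bytes (three 1280x720 NV12 frames by default)."""
    with open(path, "rb") as fh:
        return fh.read(limit)
```

A run would call `looks_solid_black(read_dump("/tmp/o.nv12"))` for the basic criterion, and `mean_abs_diff` against the reference dump for the stricter one; a solid-black HW dump yields a large difference against any real content.
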
## Risk register
| # | Risk | Mitigation |
|---|------|-----------|
| R1 | `dma-coherent` on a non-coherent SoC IP causes DATA CORRUPTION: the kernel skips cache maintenance, so the CPU can read stale lines after a DMA write and the device can miss dirty lines that were never written back before a DMA read. Could corrupt decoded data OR oops on a missed cacheline | Upstream comments indicate rkvdec sits in the RK3588 ACE-Lite coherent domain; if that is wrong, the symptom is more corruption, not less. Back up the vanilla DTB before the swap. Recovery via WeChat stick |
| R2 | DTB-only change doesn't kick in because boot loader caches the old DTB | extlinux on this system reads `fdt` path on each boot — uncached. Safe |
| R3 | iter6 source mods in working tree contaminate dtb rebuild | DTB rebuild only touches DT files, not .c files. Safe |
| R4 | If H1(a) fails (still black), need to try H1(b) backend SYNC. Backend rebuild + replace `.so` on ampere — already routine work | documented path |
## Open questions tabled for Phase 1
(Phase 1 starts after Phase 0 close + this test runs.)
1. If H1(a) succeeds: is the kernel rkvdec driver also affected (needs explicit dma_sync calls when DT lies about coherency)? Verify by stress-testing under varied load (concurrent vb2 + GPU) and checking for data corruption.
2. If H1(a) fails: is H1(b) the right next move, or did we mis-diagnose? Need to check `/sys/class/dma_heap/` or `/sys/firmware/devicetree/.../dma-coherent` state to confirm DT property took effect.
3. If both H1 variants fail: H2 (wrong DMA address) — instrument rkvdec to log dst_addr vs vb2 mapping.
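
For question 2, one way to confirm the property reached the running kernel is to walk the live device tree, where nodes appear as directories and properties as files. A minimal sketch follows; the root `/sys/firmware/devicetree/base` is the standard location, but the node-name fragment passed in is an assumption that has to be matched against the actual rkvdec node names on the board.

```python
import os


def nodes_with_property(root: str, name_fragment: str, prop: str = "dma-coherent"):
    """List (node_path, has_prop) for DT node directories matching name_fragment.

    In the sysfs view of the device tree each node is a directory and each
    property is a file, so an empty boolean property such as dma-coherent
    shows up as a zero-length file inside the node directory.
    """
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name_fragment in os.path.basename(dirpath):
            results.append((dirpath, prop in filenames))
    return sorted(results)


def report(root: str = "/sys/firmware/devicetree/base",
           name_fragment: str = "video-codec") -> None:
    """Print one line per matching node; the default fragment is a guess."""
    for path, present in nodes_with_property(root, name_fragment):
        print(f"{path}: dma-coherent {'present' if present else 'absent'}")
```

If every matching node reports `absent` after the reboot, the new DTB was not the one loaded, and a still-black result says nothing about H1(a).
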
## Phase 0 close
Substrate locked. Pivot from iter6 documented (real bug, off-path). H1(a) DT dma-coherent is cheapest test, minimal risk, ~10 min wallclock from ampere-reachable to pass/fail verdict.