iter6 was an off-path investigation. Sonnet round-1 review + my
own corrected memory feedback_rfc_v2_vb2_dma_resv_scope.md make
clear: vb2 fence series targets Wayland compositor green-frames,
not libva cached-mmap readback. iter6 found a real upstream NULL
deref bug (filed for kernel-agent#16 when UART captures the trace)
but it's not on the critical path for iter5.
iter7 returns to iter5's actual hypothesis ladder:
- H1(a) DT dma-coherent on rkvdec node — cheapest, first
- H1(b) backend DMA_BUF_IOCTL_SYNC userspace fix — if H1(a) fails
- H2 wrong-DMA-address — if H1(a)+(b) fail
- H3 false-positive DEC_RDY — last resort
Test on vanilla kernel + iter3+iter4-fixed modules (already
verified working pipeline). Pass criterion: ffmpeg-vaapi output
shows more than 2 unique byte values (not just {16, 128}).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 0 — iter7 substrate (cache-coherency angle for iter5 black-output)
Opened 2026-05-17 ~00:45, following iter6 v1-v6 chain that exhausted the vb2-fence-series hypothesis (iter6 was an off-path investigation — see Why-pivot below).
Research question
Does forcing the RK3588 rkvdec DMA path into the cache-coherent domain (via DT dma-coherent property on the rkvdec node) make iter5's CAPTURE buffer readback show real decoded NV12 content instead of uniform Y=0x10/CbCr=0x80?
Locked-in evidence carried from iter5 (still binding)
| Observation | Source | Status |
|---|---|---|
| Iter3+iter4 kernel patches verified working: HEVC OOPS gone, EINVAL gone, decoder runs end-to-end | iter3_close.md (kernel-agent#14) + iter4_close.md (kernel-agent#15) | verified, both upstream-aligned |
| ffmpeg HEVC test: rc=0, /tmp/o.nv12 = 4147200 bytes (exact 3×NV12-frame size for 1280×720) | iter5_close.md | confirmed |
| iter5-IRQ pr_warn diagnostic: every frame ends with STA_INT=0x107 DEC_RDY=1 TIMEOUT=0 ERROR=0 | iter5_close.md | confirmed: hardware decode succeeds |
| Output bytes are uniform Y=0x10 (luma "video black") and Cb/Cr=0x80 (chroma neutral) — solid black | iter5_close.md | confirmed — symptom not garbage |
| OUTPUT bitstream is byte-identical to raw HEVC NAL with Annex-B prefix prepended | iter5_close.md | confirmed |
| populate_ext_sps_rps_cache returns -ENODATA because ffmpeg-vaapi strips SPS/VPS/PPS | iter5_close.md | confirmed Phase 5 reviewer's prediction |
| Backend's 5-control batch commits (post iter4 SLICE_PARAMS registration) | iter5_close.md | confirmed via iter4-DIAG validate_sps firing per-frame |
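
The 4147200-byte figure in the table can be re-derived from the NV12 layout. The sketch below assumes 8-bit NV12 with no row padding (one full-resolution Y plane plus one half-height interleaved CbCr plane); `nv12_frame_bytes` is a helper name invented here for illustration.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Size of one 8-bit NV12 frame, assuming even dimensions and no stride padding."""
    luma = width * height            # Y plane: 1 byte per pixel
    chroma = width * (height // 2)   # interleaved CbCr plane: half the rows, same row width
    return luma + chroma


frames = 3
total = frames * nv12_frame_bytes(1280, 720)
print(total)
```

For 1280×720 this gives 1382400 bytes per frame, so three frames account for the whole file.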
Why pivot from iter6 (vb2 fence series)
Iter6's premise was: "vb2 fence series unlocks libva cached-mmap readback". Premise is wrong. Sonnet round-1 architect review pointed this out; my own memory feedback_rfc_v2_vb2_dma_resv_scope.md (which I corrected today) confirms: the fence series targets Wayland compositor implicit-sync green-frames on GPU consumers, NOT the libva readback path. iter6's investigation found a real upstream bug (NULL deref at dma_fence->context inside dma_resv_add_fence, see iter6_v6_substrate_null_deref_at_0x20.md), but that bug fix wouldn't have made iter5 work either. iter6 = off-path; iter7 = back on the right hypothesis ladder.
Iter5's actual hypothesis ladder (carried to iter7)
| H | Hypothesis | Test |
|---|---|---|
| H1 | DMA cache coherency between hardware-written CAPTURE buffer and userspace cached-mmap. HW writes to physical RAM. CPU reads via cached mmap. If rkvdec isn't in the ACE-Lite coherent domain (or the kernel doesn't know it is), CPU cache holds stale pre-decode bytes. | (a) DT dma-coherent property on rkvdec node; (b) backend DMA_BUF_IOCTL_SYNC round-trip |
| H2 | HW writes to a different physical address than vb2-allocated CAPTURE buffer | log dst_addr in rkvdec.config_registers vs vb2_dma_contig_plane_dma_addr; compare |
| H3 | DEC_RDY=1 is a false positive — pipeline registers look valid but actual decode is no-op | examine register-config code |
H1 is the leading hypothesis (matches RK3399 pattern from reference_dmabuf_resv_blocker.md, matches the "solid value" symptom). Iter7 starts with H1 (a) — DT dma-coherent.
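
For reference, the H1(b) round-trip could look roughly like the following. This is only a sketch of the ioctl sequence, not the backend's code: the real backend is not Python, and the sketch assumes the CAPTURE buffer is available to userspace as a dma-buf file descriptor. The constants follow the Linux UAPI header `linux/dma-buf.h`; `cpu_read_access` and `_iow` are names invented here.

```python
import fcntl
import struct
from contextlib import contextmanager

# Flag values from the Linux UAPI header linux/dma-buf.h.
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_WRITE = 2 << 0
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2


def _iow(type_char: str, nr: int, size: int) -> int:
    """asm-generic _IOW(): direction=write (1), then size, type and number fields."""
    return (1 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# _IOW('b', 0, struct dma_buf_sync); the struct holds a single __u64 flags field.
DMA_BUF_IOCTL_SYNC = _iow("b", 0, 8)


@contextmanager
def cpu_read_access(dmabuf_fd: int):
    """Bracket a CPU read of a dma-buf mapping with SYNC_START / SYNC_END."""
    start = struct.pack("=Q", DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ)
    end = struct.pack("=Q", DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ)
    fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, start)
    try:
        yield
    finally:
        fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, end)
```

Usage would be `with cpu_read_access(fd): data = mapping[:]`, where `fd` and `mapping` are the (assumed) exported dma-buf descriptor and its mmap.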
Substrate for iter7 H1(a) test
- ampere kernel source: `~/src/linux-rockchip` branch `ampere-minimal-devices`, working tree has iter3+iter4 patches uncommitted + iter6 patches partially in (cleanup needed eventually, but not before the DTB rebuild: a dtb-only build does not depend on the state of the .c files)
- DTS file to edit: `arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dts`
- rkvdec node: `&rkvdec_ccu`, `&rkvdec0`, `&rkvdec1` (rk3588 has dual-core rkvdec). For initial test, add `dma-coherent` to ALL rkvdec nodes
- DTB output: `arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dtb`
- Install path: `/boot/firmware/rk3588-coolpi-cm5-genbook.dtb-7.0.0-rc3-devices+` (vanilla kernel's dtb, since we're testing on vanilla which has iter3+iter4 fixes baked into the loaded modules)
- Test reboot needed (the kernel only parses the DTB during early init)
- Test command: `LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i ~/measurements/encoded/bbb_60s_720p.hevc.mp4 -vf hwdownload,format=nv12 -frames:v 3 -f rawvideo -pix_fmt nv12 /tmp/o.nv12`
- Pass criterion: `head -c 4147200 /tmp/o.nv12 | od -An -tu1 -w1 | sort -u | head -10` shows MORE THAN 2 unique bytes (i.e. not just `16` and `128`)
- Better pass criterion: byte-compare against ffmpeg SW-decoded NV12 of the same source; if reasonably close (HW vs SW differ slightly but not solid-color), confirmed PASS
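
Both pass criteria can also be scripted. The following is a minimal sketch, not part of the test plan itself: the function names are invented, the software-decoded reference file is assumed to be produced separately, and no numeric threshold for "reasonably close" is implied because none has been fixed yet.

```python
FRAME_BYTES_3X = 4147200  # three 1280x720 NV12 frames, as in the iter5 evidence


def read_prefix(path: str, limit: int = FRAME_BYTES_3X) -> bytes:
    """Read at most `limit` bytes from the start of a raw NV12 dump."""
    with open(path, "rb") as f:
        return f.read(limit)


def unique_byte_values(path: str, limit: int = FRAME_BYTES_3X) -> set:
    """Set of distinct byte values in the dump (same idea as od | sort -u)."""
    return set(read_prefix(path, limit))


def passes_basic_criterion(path: str, limit: int = FRAME_BYTES_3X) -> bool:
    """Basic criterion: more than 2 unique byte values, i.e. not only 16 and 128."""
    return len(unique_byte_values(path, limit)) > 2


def mean_abs_diff(hw_path: str, sw_path: str, limit: int = FRAME_BYTES_3X) -> float:
    """Mean absolute per-byte difference between HW and SW decoded dumps."""
    hw = read_prefix(hw_path, limit)
    sw = read_prefix(sw_path, limit)
    if not hw or len(hw) != len(sw):
        raise ValueError("dumps must be non-empty and the same length")
    return sum(abs(a - b) for a, b in zip(hw, sw)) / len(hw)
```

A solid-black dump fails `passes_basic_criterion`, while `mean_abs_diff` gives a single number to judge the HW output against the SW reference.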
Risk register
| # | Risk | Mitigation |
|---|---|---|
| R1 | dma-coherent on a non-coherent SoC IP causes DATA CORRUPTION (the kernel skips cache maintenance: no invalidate before the CPU reads device-written data, no writeback before the device reads CPU-written data); could corrupt decoded data OR oops on missed cacheline | RK3588 ACE-Lite supports rkvdec coherent per upstream comments; if wrong, symptom is more corruption not less. Backup vanilla DTB before swap. Recovery via WeChat stick |
| R2 | DTB-only change doesn't kick in because boot loader caches the old DTB | extlinux on this system reads fdt path on each boot — uncached. Safe |
| R3 | iter6 source mods in working tree contaminate dtb rebuild | DTB rebuild only touches DT files, not .c files. Safe |
| R4 | If H1(a) fails (still black), need to try H1(b) backend SYNC. Backend rebuild + replace .so on ampere is already routine work | documented path |
Open questions tabled for Phase 1
(Phase 1 starts after Phase 0 close + this test runs.)
- If H1(a) succeeds: is the kernel rkvdec driver also affected (needs explicit dma_sync calls when DT lies about coherency)? Need to verify by stress-testing under varied load (concurrent vb2 + GPU) for no data corruption.
- If H1(a) fails: is H1(b) the right next move, or did we mis-diagnose? Need to check `/sys/class/dma_heap/` or `/sys/firmware/devicetree/.../dma-coherent` state to confirm DT property took effect.
- If both H1 variants fail: H2 (wrong DMA address), instrument rkvdec to log dst_addr vs vb2 mapping.
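
Checking whether the property took effect can be done by walking the live device tree, where a boolean property such as `dma-coherent` shows up as an empty file inside the node's directory. The sketch below is one way to do that; the `video-codec` name fragment in the usage note is a guess at how the rkvdec nodes are named in this tree, and `nodes_with_property` is a helper name invented here.

```python
import os


def nodes_with_property(root: str, name_fragment: str, prop: str = "dma-coherent") -> dict:
    """Map each DT node directory whose name contains `name_fragment`
    to True/False depending on whether `prop` exists in that node."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if name_fragment in os.path.basename(dirpath):
            found[dirpath] = prop in filenames
    return found
```

On the target this would be called as `nodes_with_property("/sys/firmware/devicetree/base", "video-codec")`; any node reported as `False` did not pick up the property from the installed DTB.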
Phase 0 close
Substrate locked. Pivot from iter6 documented (real bug, off-path). H1(a) DT dma-coherent is cheapest test, minimal risk, ~10 min wallclock from ampere-reachable to pass/fail verdict.