From 7b54ff6c2d8a8fcafe817889ba8c4569fb14f0d7 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Fri, 8 May 2026 22:31:36 +0000
Subject: [PATCH] iter1 phase 3 close: H6 ruled out by DMA_BUF_IOCTL_SYNC patch test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Built mpv-fourier-1:0.41.0-9 with the DMA_BUF_IOCTL_SYNC(SYNC_START|
SYNC_RW) + SYNC_END(SYNC_RW) patch in both vaapi_dmabuf_importer and
drmprime_dmabuf_importer. Installed on ohm via [marfrit].

Test:
  mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause --start=00:00:00.42 fourier-test/bbb_1080p30_h264.mp4

Result: spectacle screenshot md5 = c8c8e9b88521a0069f709d483451c3d4,
BYTE-IDENTICAL to the baseline green-frame screenshot. Visual: same
solid dark green ~ RGB(0, 77, 0) (BT.709 limited-range YUV(0,0,0) per
the README math). Userspace cache-sync ioctl has zero effect. H6 ruled
out.

Phase 3 critical observation: --hwdec=v4l2request --vo=gpu (CPU-mmap
then glTexSubImage2D upload path) is known-working. So the buffer DOES
contain valid YUV data. Only the zero-copy dma_buf-to-Mali path renders
zeros. This concentrates the hypothesis on the dma_buf → Mali GPU
import/translation step itself.

Live hypothesis space:
  H2, H3, H5, H6  ruled out
  H1              less likely (not conclusively excluded)
  H4              latent (low conf)
  H7              LEADING: panfrost dma_buf import / GPU-side cache or
                  BO-type / cache-attribute mismatch

Next probes:
  1. Read panfrost kernel-mode dma_buf import path (~45 min)
  2. EGL importer harness with synthetic udmabuf NV12 (~1-2h)
  3. MESA_DEBUG verbose log (~15 min recon)

mpv-fourier-1:0.41.0-9 keeps the no-op patch (harmless). Future iter
close replaces or removes.

Posted to dmabuf-modifier-triage#1 comment 259.
---
 phase2_iter1_findings.md | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/phase2_iter1_findings.md b/phase2_iter1_findings.md
index 97ca4b5..b1f39ef 100644
--- a/phase2_iter1_findings.md
+++ b/phase2_iter1_findings.md
@@ -89,9 +89,15 @@ So the green frame is **not caused by mpv or ffmpeg's descriptor construction**.
 
 4. **kwin-fourier 0001 still has effect we missed.** Even though we ruled out kwin-fourier as a compositor-replacement A/B, that test was on an earlier kernel/Mesa combo. Worth verifying the test environment is fully reset.
 
-6. **DMA cache coherency between hantro VPU and Mali GPU** (NEW 2026-05-08, derived from green-color math). BT.601 limited-range YUV(0,0,0) → RGB(0, 135, 0) — exactly the green tone we see. This means *panfrost is reading zero-fill bytes despite hantro having written valid YUV data*. If the dma_buf hasn't been cache-flushed/invalidated correctly between writer (hantro) and reader (Mali), Mali samples stale zeroed memory. V4L2 does NOT attach implicit fences to CAPTURE buffers on dequeue — this gap is exactly what our `vb2_dma_resv` RFC v2 addresses for upstream (see `project_vb2_dma_resv_v2_state.md`). Testable on ohm: between mpv's VIDIOC_DQBUF and the wl_dmabuf submit, mmap the EXPBUF fd from a sibling process via `pidfd_getfd`, force CPU cache sync via `DMA_BUF_IOCTL_SYNC` with `DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ`, read back; if it shows real Y data, the hantro-side write completed but the Mali-side read sees zeros — narrows to GPU-side cache invalidation gap.
+6. ~~**DMA cache coherency between hantro VPU and Mali GPU** (NEW 2026-05-08, derived from green-color math).~~ **RULED OUT 2026-05-08 phase 3** by the iter1 patch test. Patched mpv 0.41.0 (mpv-fourier-1:0.41.0-9) to call `DMA_BUF_IOCTL_SYNC(SYNC_START|SYNC_RW)` + matching `SYNC_END` on each EXPBUF fd in both `vaapi_dmabuf_importer` and `drmprime_dmabuf_importer` before `zwp_linux_buffer_params_v1_add()`. Built via Gitea Actions, installed on ohm. Ran `mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause --start=00:00:00.42 fourier-test/bbb_1080p30_h264.mp4`, captured screenshot. **Result: byte-identical to baseline (md5 c8c8e9b88521a0069f709d483451c3d4).** The userspace cache-sync ioctl has no effect. Either hantro's `dma_buf_ops->begin_cpu_access` is a no-op (likely on Rockchip, where many dma-buf-heap allocations are non-coherent CPU-cached but rely on different sync paths), OR the gap is on the GPU consumer side and CPU-cache state is irrelevant.
 
-7. **Panfrost dma_buf import doesn't perform GPU-side cache invalidation** when mapping an imported fd. Even if data has reached DRAM, Mali's MMU/cache may serve stale reads. Testable: search `pan_kmod_*_buf_import` / Mali kernel-mode driver for `dma_buf_attach` + `dma_buf_map_attachment` calls; look for missing cache-invalidate or missing DMA fence wait. (Sub-hypothesis of 6, but at a different layer.)
+   **Critical phase-3 observation**: `--hwdec=v4l2request --vo=gpu` (texture-upload path) is known-working and renders correctly. That path mmaps the dma_buf into mpv's CPU memory, then uploads to a GL texture via `glTexSubImage2D`. **The CPU CAN read valid YUV data from the buffer**; only the zero-copy dma_buf-to-GPU import path renders zeros. This rules out "data isn't there" entirely and concentrates the hypothesis on the dma_buf → Mali GPU import/translation step itself (H7 territory).
+
+7. **Panfrost dma_buf import doesn't perform GPU-side cache invalidation OR doesn't import with the right BO type** when mapping an imported fd. Even if data has reached DRAM, Mali's MMU/cache may serve stale reads, OR the imported BO is created without the `DMA-COHERENT` flag panfrost expects, leaving Mali sampling un-snooped memory. Phase 3 narrowed: this is now the LEADING hypothesis. Sub-cases to investigate:
+   - 7a. Panfrost's `dma_buf_attach` + `dma_buf_map_attachment` calls miss cache-invalidate. Source is `panfrost_gem_prime_import_sg_table` in `drivers/gpu/drm/panfrost/panfrost_gem.c` (referenced from `panfrost_drv.c`).
+   - 7b. The imported BO is mapped non-coherent in Mali's MMU, but the buffer was allocated cacheable (or vice-versa). Sync mismatch.
+   - 7c. Panfrost uses ioremap_wc / ioremap_cache the wrong way for hantro-allocated CMA pages.
+   - 7d. The Mali-G52 panfrost path does NOT support imported dma_buf for sampling at all, only for scanout direct-pass-through. (Less likely; would mean GL texture creation should fail, but in our case it succeeds and renders zeros.)
 
 ## Recommended next moves for iter1
 
@@ -105,15 +111,14 @@ d. **Update `marfrit/dmabuf-modifier-triage#1`** with this revised analysis. The
 
 ## Status
 
-- iter1 phase 2 closed 2026-05-08. Source-read of KWin 6.6.4 + Mesa 26.0.6 + arithmetic + the `lseek` probe rule out / reduce-confidence on the original hypothesis space:
-  - H2 (KWin wl_dmabuf import) — **innocent**. Both wl-protocol params validation and EGL import paths forward user-supplied offset verbatim to `eglCreateImage`. KWin's YUV→RGB shader (glshadermanager.cpp:189) reads `.r=Cb, .g=Cr` matching mesa's GR88→RG88_UNORM mapping.
-  - H3 (hantro size cap) — **innocent**. Runtime probe shows `lseek(SEEK_END)=3,657,728`, well past offset 2,088,960.
-  - H5 (offset mismatch with hantro NV12 stride) — **innocent**. Arithmetic shows Y/UV/trailing-metadata layout fits cleanly.
-  - H1 (panfrost EGL non-zero offset) — **less likely** after Mesa source-read. Linear path correctly captures `whandle->offset` and adds it to GPU base for sampling. Cannot be conclusively ruled out without runtime EGL probe but obvious places are clean.
-- **New leading hypotheses: H6 (DMA cache coherency between hantro VPU and Mali GPU) and H7 (panfrost dma_buf import lacks GPU-side cache invalidation).** Derived from green-color math: BT.601 limited-range YUV(0,0,0) → RGB(0, 135, 0) is exactly the observed green tone — so panfrost is reading **zeros** despite hantro having written real data. This points to a synchronization/coherency gap, not an offset/format bug.
+- iter1 phase 3 closed 2026-05-08. The DMA_BUF_IOCTL_SYNC patch (mpv-fourier-1:0.41.0-9, both vaapi_dmabuf_importer + drmprime_dmabuf_importer) had **zero effect**: green-frame screenshot byte-identical to baseline. **H6 ruled out.**
+- Five hypotheses ruled out (H2, H3, H5, H6, the ad-hoc offset variant). H1 less-likely after Mesa source-read but not conclusively excluded.
+- **Leading hypothesis: H7**, panfrost's dma_buf import / GPU-side cache or BO-type handling. Pinned by the *known-good* counter-test: `mpv --hwdec=v4l2request --vo=gpu` (CPU-mmap → glTexSubImage2D upload path) renders correctly. So the buffer DOES contain valid data; only the zero-copy dma_buf→Mali path renders zeros.
 - Acceptance criterion (`screenshots/frame10_expected.png`) is unchanged.
-- Delivery vehicle re-evaluation: `mpv-fourier-1:0.41.0-8` is the right vehicle ONLY for an mpv-side defensive workaround. Given the new hypotheses, the fix more likely lands in: (a) the kernel V4L2 vb2 layer (attach implicit fence on DQBUF — exactly the `vb2_dma_resv` RFC at lore.kernel.org), (b) panfrost's kernel-mode driver dma_buf import path (cache invalidate), or (c) a userspace `DMA_BUF_IOCTL_SYNC` workaround between mpv's DQBUF and wl-submit.
-- Next probe options ranked by cost-to-decisiveness:
-  1. **Cache-sync ioctl test** (~30 min) — patch `mpv-fourier`'s `vo_dmabuf_wayland.c` to call `DMA_BUF_IOCTL_SYNC(DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ)` on each EXPBUF fd before submitting to KWin. If green goes away, H6/H7 confirmed (workaround viable; root-cause kernel bug separately).
-  2. **EGL importer harness** (~1-2h) — synthesize a known NV12 buffer in CPU memory, write to a `udmabuf`, EGL_image-import with PLANE0_OFFSET=2088960, render via fullscreen quad, glReadPixels. Decides H1 definitively. If reads return correct synthesized data, H1 ruled out cleanly; if zeros, H1 confirmed.
-  3. **Mesa debug log** (~15 min) — run mpv with `MESA_DEBUG=1 PAN_MESA_DEBUG=trace,bo` and inspect what panfrost sees for our buffers. Cheap but may not be conclusive.
+- Delivery vehicle re-evaluation again: with H6 gone, the userspace mpv workaround is no longer the right delivery vehicle for this iteration. The fix most likely lands in: (a) panfrost kernel-mode driver (`drivers/gpu/drm/panfrost/`), (b) Mesa-panfrost userspace if there's an EGL_image attribute / format-import quirk, (c) hantro driver-side allocation flags (V4L2_MEMORY_DMABUF + appropriate cache attribute), or (d) a kernel bridge (e.g., DMA_BUF_IOCTL_SET_NAME with cache-aware variant).
+- Next probe options ranked:
+  1. **Read panfrost kernel-mode dma_buf import** (~45 min, cheap source-read, no hardware): inspect `drivers/gpu/drm/panfrost/panfrost_gem.c` `panfrost_gem_prime_import_sg_table` and Mali MMU mapping for cache attributes / IO-coherency settings. May spot the gap directly.
+  2. **EGL importer harness with synthetic NV12 in udmabuf** (~1-2h): allocate via udmabuf (CPU-coherent), write known YUV pattern, eglCreateImage from the udmabuf, render, glReadPixels. If it reads back correct data → bug is hantro-allocated-buffer-specific (cache-attribute mismatch). If it ALSO reads zeros → general panfrost dma_buf import bug (less likely).
+  3. **Run mpv with `MESA_DEBUG=verbose` + `PAN_MESA_DEBUG=sync,trace`** (~15 min): may show something at the import boundary. Cheap recon.
+
+mpv-fourier-1:0.41.0-9 keeps the no-op patch installed for now; it's harmless. Future iter close (iter2 or further phase under iter1) will replace it with whatever the actual fix is, OR pkgrel-bump back to remove the dead patch if the fix lands elsewhere (kernel/Mesa).
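
Side note, not part of the patch: the green-tone arithmetic that the commit message and hypothesis 6 lean on (all-zero limited-range YUV decoding to a pure green) can be re-checked with a few lines of Python. This is a minimal sketch assuming the standard BT.601 and BT.709 luma coefficients and the usual 8-bit limited-range scaling; the function name `limited_yuv_to_rgb` is made up for this note and is not taken from mpv, KWin or the README. It leaves the channels unrounded, since the quoted 135 and 77 depend on how the result is rounded.

```python
# Luma coefficients (Kr, Kb) as defined by BT.601 and BT.709.
BT601 = (0.299, 0.114)
BT709 = (0.2126, 0.0722)


def limited_yuv_to_rgb(y, cb, cr, coeffs):
    """Convert one 8-bit limited-range Y'CbCr sample to 8-bit full-range RGB.

    Returns unrounded floats clamped to [0, 255] so the rounding choice
    stays visible to the caller.
    """
    kr, kb = coeffs
    kg = 1.0 - kr - kb
    # Limited range: luma spans 16..235, chroma spans 16..240 centred on 128.
    yn = (y - 16) / 219.0
    pb = (cb - 128) / 224.0
    pr = (cr - 128) / 224.0
    # Invert Y = Kr*R + Kg*G + Kb*B using the colour-difference definitions.
    r = yn + 2.0 * (1.0 - kr) * pr
    b = yn + 2.0 * (1.0 - kb) * pb
    g = (yn - kr * r - kb * b) / kg
    # Clamp after scaling; out-of-gamut red and blue collapse to 0 here.
    return tuple(min(255.0, max(0.0, 255.0 * c)) for c in (r, g, b))


if __name__ == "__main__":
    for name, coeffs in (("BT.601", BT601), ("BT.709", BT709)):
        r, g, b = limited_yuv_to_rgb(0, 0, 0, coeffs)
        print(f"{name}: R={r:.1f} G={g:.1f} B={b:.1f}")
```

For a zero-filled buffer, red and blue clamp to 0 under both matrices, and green lands within one code value of the figures quoted above: about 135 for BT.601 and about 77 for BT.709. So a solid dark green frame is consistent with the GPU sampling zeros, and which of the two greens shows up hints at which matrix the compositor applied.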