iter1 phase 2: Mesa-panfrost source-read shifts theory to cache coherency

Shallow-cloned Mesa 26.0.6 (matches ohm's installed mesa+vulkan-panfrost)
and traced the per-plane EGL import path through panfrost.

Findings:

(a) pan_screen.c:443 — external_only=is_yuv → NV12+LINEAR forces
    KWin's per-plane path (Y as R8, UV as DRM_FORMAT_GR88).

(b) loader_dri_helper.c:43 — DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM.
    Sampling: .r=byte 0=Cb, .g=byte 1=Cr. Matches KWin's shader.

(c) KWin shader (glshadermanager.cpp:189): vec4(Y, .r, .g, 1) then
    yuvToRgb*. So result.y=U, result.z=V. Math is consistent.

(d) pan_resource.c:354-358 captures whandle->offset → explicit_layout.
    pan_mod.c:663-667 honors offset_B with only alignment check.
    pan_texture.c:361 etc. set texture base = plane->base + offset_B.

Source code is clean. H1 (panfrost offset bug) demoted to LESS-LIKELY.
Cannot be conclusively ruled out without runtime EGL probe.

Green-color math points elsewhere:
  BT.601 limited-range YUV(0,0,0)
    → R = 1.164*(-16) + 1.596*(-128) = -223 → clamp 0
    → G = 1.164*(-16) - 0.391*(-128) - 0.813*(-128) = +135
    → B = 1.164*(-16) + 2.018*(-128) = -277 → clamp 0
  = RGB(0, 135, 0) — EXACTLY the observed green tone.

Conclusion: panfrost appears to be reading ZERO-FILL bytes (Y=0 as
well as Cb=Cr=0) despite hantro writing real data. That points away
from a format/offset bug and toward a cache coherency or
synchronization bug.

New leading hypotheses:

H6 — DMA cache coherency between hantro VPU and Mali GPU. V4L2 does
NOT attach implicit fences on DQBUF for CAPTURE buffers (the exact
gap our vb2_dma_resv RFC v2 addresses upstream). Mali starts sampling
before hantro's writes flush to coherent DRAM.

H7 — Panfrost dma_buf import lacks GPU-side cache invalidation at
attach/map time. Mali MMU/cache serves stale (zero) reads.

Next probe options ranked:
1. Patch mpv-fourier to issue DMA_BUF_IOCTL_SYNC on EXPBUF fds before
   wl-submit. Cheap (~30 min), decisive on H6/H7, doubles as workaround.
2. EGL importer harness with synthetic NV12 (~1-2h), decides H1.
3. MESA_DEBUG=1 PAN_MESA_DEBUG=trace,bo log (~15 min, may not decide).

Leaning toward option 1.

Posted to dmabuf-modifier-triage#1 comment 257.
2026-05-08 21:54:49 +00:00
parent 735f7f7ae3
commit d26d662c04
+23 -8
@@ -73,7 +73,13 @@ So the green frame is **not caused by mpv or ffmpeg's descriptor construction**.
## New hypothesis space (one of these is the real bug)
1. **Mali-G52 panfrost EGL_EXT_image_dma_buf_import_modifiers regression for NV12 with non-zero plane offset.** The driver may sample plane 1 from offset 0 of the imported fd instead of offset 2088960, giving zero-fill UV. Testable: a minimal EGL importer C program that imports a known NV12 dmabuf with offsets and reads back via `glReadPixels`.
1. **Mali-G52 panfrost EGL_EXT_image_dma_buf_import_modifiers regression for NV12 with non-zero plane offset.** **Source-read 2026-05-08 of Mesa 26.0.6 makes this LESS LIKELY.** Trace at `references/mesa-26.0.6/`:
- `src/gallium/drivers/panfrost/pan_screen.c:443` reports `external_only[count] = is_yuv` for any YUV format → NV12+LINEAR is external_only, forcing KWin's per-plane import path (Y as R8, UV as DRM_FORMAT_GR88).
- `src/loader/loader_dri_helper.c:43` maps `DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM` (the byte-order distinction is preserved at the pipe-format level — `.r` = byte 0 = Cb, `.g` = byte 1 = Cr — matching KWin's shader assumption at glshadermanager.cpp:189 `result.yz = sampler1.rg`).
- `src/gallium/drivers/panfrost/pan_resource.c:354-358` captures `whandle->offset` into `explicit_layout.offset_B` for the import.
- `src/panfrost/lib/pan_mod.c:663-667` (linear modifier slice-init) honors `layout_constraints->offset_B` directly with only an alignment check; 2,088,960 is page-aligned, satisfies 16-/64-/4096-byte alignments alike.
- `src/panfrost/lib/pan_texture.c:361,561,660,773,817` set the texture descriptor's GPU base to `plane->base + slayout->offset_B` — i.e., sampling reads from `bo_gpu + 2,088,960`.
- **Conclusion**: the panfrost source as written DOES honor a non-zero plane offset. A source-read alone cannot rule out a runtime bug, but the obvious places are clean. To rule this hypothesis in or out definitively, write the EGL importer harness with synthetic NV12 data.
2. ~~**KWin's wl_dmabuf import logic deduplicates the dup'd fds incorrectly.**~~ **RULED OUT 2026-05-08** by source-read of KWin 6.6.4 at `references/kwin-6.6.4/src/wayland/linuxdmabufv1clientbuffer.cpp` + `src/opengl/{eglbackend,egldisplay}.cpp`. (a) `LinuxDmaBufParamsV1::zwp_linux_buffer_params_v1_add` simply stores per-plane fd/offset/pitch in separate slots, no dedup. (b) `LinuxDmaBufParamsV1::test()` does `lseek(SEEK_END)` per plane + range checks against the resulting size; our 3,657,728 satisfies all of them. (c) `EglDisplay::importDmaBufAsImage` (both the combined and per-plane forms) passes `dmabuf.fd[i]`, `dmabuf.offset[i]`, `dmabuf.pitch[i]` straight to `eglCreateImage(EGL_LINUX_DMA_BUF_EXT, ...)` with no transformation. (d) `EglBackend::testImportBuffer` chooses between combined import and per-plane (Y as R8 / UV as RG88 from offset 2,088,960) based on whether NV12+LINEAR is in `nonExternalOnlySupportedDrmFormats()`. **Either path** forwards `offset = 2,088,960` to the driver. KWin is innocent.
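The plane offsets and size checks quoted above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, not code from Mesa or KWin: the helper name `nv12_linear_layout` is hypothetical, and the 1920x1088 coded size with pitch equal to width is an assumption inferred from 2,088,960 = 1920 × 1088.

```python
def nv12_linear_layout(width, height, fd_size):
    """Plane layout for tightly packed linear NV12 (assumes pitch == width)."""
    y_size = width * height           # one byte of luma per pixel
    uv_offset = y_size                # interleaved CbCr plane follows Y directly
    uv_size = width * (height // 2)   # half the rows, two bytes per chroma pair
    uv_end = uv_offset + uv_size
    return {
        "uv_offset": uv_offset,
        "uv_end": uv_end,
        # the alignments mentioned in the pan_mod.c note: 16, 64 and 4096 bytes
        "aligned": all(uv_offset % a == 0 for a in (16, 64, 4096)),
        # the kind of range check KWin's test() performs against lseek(SEEK_END)
        "fits_in_fd": uv_end <= fd_size,
    }


if __name__ == "__main__":
    # 1920x1088 is an assumed coded size; 3,657,728 is the probed fd size.
    print(nv12_linear_layout(1920, 1088, 3_657_728))
```

Under that assumption the UV plane starts at 2,088,960 and ends at 3,133,440, which matches the figures in these notes and leaves room for the trailing metadata inside the 3,657,728-byte fd.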
@@ -83,6 +89,10 @@ So the green frame is **not caused by mpv or ffmpeg's descriptor construction**.
4. **kwin-fourier 0001 still has effect we missed.** Even though we ruled out kwin-fourier as a compositor-replacement A/B, that test was on an earlier kernel/Mesa combo. Worth verifying the test environment is fully reset.
6. **DMA cache coherency between hantro VPU and Mali GPU** (NEW 2026-05-08, derived from green-color math). BT.601 limited-range YUV(0,0,0) → RGB(0, 135, 0), which is exactly the green tone we see. This means *panfrost is reading zero-fill bytes despite hantro having written valid YUV data*. If the dma_buf hasn't been cache-flushed/invalidated correctly between writer (hantro) and reader (Mali), Mali samples stale zeroed memory. V4L2 does NOT attach implicit fences to CAPTURE buffers on dequeue; this gap is exactly what our `vb2_dma_resv` RFC v2 addresses for upstream (see `project_vb2_dma_resv_v2_state.md`). Testable on ohm: between mpv's VIDIOC_DQBUF and the wl_dmabuf submit, duplicate the EXPBUF fd into a sibling process via `pidfd_getfd`, mmap it, force a CPU cache sync via `DMA_BUF_IOCTL_SYNC` with `DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ`, and read it back. If the readback shows real Y data, the hantro-side write completed while the Mali-side read still sees zeros, which narrows the bug to a GPU-side cache invalidation gap.
7. **Panfrost dma_buf import doesn't perform GPU-side cache invalidation** when mapping an imported fd. Even if data has reached DRAM, Mali's MMU/cache may serve stale reads. Testable: search `pan_kmod_*_buf_import` / Mali kernel-mode driver for `dma_buf_attach` + `dma_buf_map_attachment` calls; look for missing cache-invalidate or missing DMA fence wait. (Sub-hypothesis of 6, but at a different layer.)
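The green-color arithmetic behind hypothesis 6 can be reproduced directly. The sketch below uses the same 8-bit BT.601 limited-range coefficients quoted in these notes; the function name `bt601_limited_to_rgb` is illustrative only, and KWin's shader works on normalized floats rather than 8-bit integers, so this is an approximation of that path.

```python
def bt601_limited_to_rgb(y, cb, cr):
    """Convert one 8-bit BT.601 limited-range YCbCr sample to 8-bit RGB."""
    c = y - 16     # luma relative to the limited-range black level
    d = cb - 128   # Cb centred on zero
    e = cr - 128   # Cr centred on zero
    r = 1.164 * c + 1.596 * e
    g = 1.164 * c - 0.391 * d - 0.813 * e
    b = 1.164 * c + 2.018 * d

    def clamp(v):
        # round to the nearest integer, then clamp to the 8-bit range
        return max(0, min(255, round(v)))

    return clamp(r), clamp(g), clamp(b)


if __name__ == "__main__":
    print(bt601_limited_to_rgb(0, 0, 0))  # all-zero Y and CbCr bytes
```

An all-zero sample maps to (0, 135, 0), while a valid black sample (16, 128, 128) maps to (0, 0, 0), so the observed tone is what zero-filled planes would produce rather than what decoded black video would.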
## Recommended next moves for iter1
a. **Write a small C harness that does VIDIOC_EXPBUF on a hantro CAPTURE buffer and reports fd size + backing dma_buf info.** Decides hypothesis 3 in 30 minutes. Run on ohm directly.
@@ -95,10 +105,15 @@ d. **Update `marfrit/dmabuf-modifier-triage#1`** with this revised analysis. The
## Status
- iter1 phase 2 closed 2026-05-08. Three of five hypotheses ruled out:
- H2 (KWin wl_dmabuf import) — KWin 6.6.4 source-read clean; both wl-protocol params validation and EGL import paths forward user-supplied offset verbatim to `eglCreateImage`. Innocent.
- H3 (hantro size cap) — `/tmp/expbuf_probe.c` runtime probe on ohm: `lseek(SEEK_END)=3,657,728`, well past offset 2,088,960. Innocent.
- H5 (offset mismatch with hantro NV12 stride) — arithmetic disproves: Y at [0, 2,088,960), UV at [2,088,960, 3,133,440), trailing Rockchip metadata at [3,133,440, 3,655,712). ffmpeg's plane[1].offset is correct.
- **Live hypothesis: H1 panfrost's `EGL_DMA_BUF_PLANE*_OFFSET_EXT` handling for LINEAR NV12 (or per-plane RG88 with non-zero offset).** Acceptance criterion (`screenshots/frame10_expected.png`) is unchanged.
- Delivery vehicle (`mpv-fourier-1:0.41.0-8`) is still the right shipping path **if** the fix turns out to be a defensive workaround in mpv. With kernel + ffmpeg + mpv + KWin all exonerated, the patch most likely lands in Mesa-panfrost (`vulkan-panfrost` package — already in marfrit). Less-likely fallback: kernel hantro driver-side adjustment.
- Next probe: a minimal EGL importer harness that recreates the per-plane RG88-from-offset-2088960 import. Run with a known-good NV12 buffer (synthesized in CPU memory, written to a CMA dmabuf via udmabuf) and `glReadPixels` the resulting EGL_image. If the read shows zeros where UV data should be, panfrost is confirmed culprit. ~1-2h C code, runs on ohm.
- iter1 phase 2 closed 2026-05-08. Source-read of KWin 6.6.4 + Mesa 26.0.6, plus arithmetic and the `lseek` probe, rule out or reduce confidence in the original hypotheses:
- H2 (KWin wl_dmabuf import) — **innocent**. Both wl-protocol params validation and EGL import paths forward user-supplied offset verbatim to `eglCreateImage`. KWin's YUV→RGB shader (glshadermanager.cpp:189) reads `.r=Cb, .g=Cr` matching mesa's GR88→RG88_UNORM mapping.
- H3 (hantro size cap) — **innocent**. Runtime probe shows `lseek(SEEK_END)=3,657,728`, well past offset 2,088,960.
- H5 (offset mismatch with hantro NV12 stride) — **innocent**. Arithmetic shows Y/UV/trailing-metadata layout fits cleanly.
- H1 (panfrost EGL non-zero offset) — **less likely** after Mesa source-read. Linear path correctly captures `whandle->offset` and adds it to GPU base for sampling. Cannot be conclusively ruled out without runtime EGL probe but obvious places are clean.
- **New leading hypotheses: H6 (DMA cache coherency between hantro VPU and Mali GPU) and H7 (panfrost dma_buf import lacks GPU-side cache invalidation).** Derived from green-color math: BT.601 limited-range YUV(0,0,0) → RGB(0, 135, 0) is exactly the observed green tone — so panfrost is reading **zeros** despite hantro having written real data. This points to a synchronization/coherency gap, not an offset/format bug.
- Acceptance criterion (`screenshots/frame10_expected.png`) is unchanged.
- Delivery vehicle re-evaluation: `mpv-fourier-1:0.41.0-8` is the right vehicle ONLY for an mpv-side defensive workaround. Given the new hypotheses, the fix more likely lands in: (a) the kernel V4L2 vb2 layer (attach implicit fence on DQBUF — exactly the `vb2_dma_resv` RFC at lore.kernel.org), (b) panfrost's kernel-mode driver dma_buf import path (cache invalidate), or (c) a userspace `DMA_BUF_IOCTL_SYNC` workaround between mpv's DQBUF and wl-submit.
- Next probe options ranked by cost-to-decisiveness:
1. **Cache-sync ioctl test** (~30 min): patch `mpv-fourier`'s `vo_dmabuf_wayland.c` to call `DMA_BUF_IOCTL_SYNC` with `DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ`, followed by the matching `DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ`, on each EXPBUF fd before submitting to KWin. If the green goes away, H6/H7 is confirmed: the workaround is viable, and the root-cause kernel bug is handled separately.
2. **EGL importer harness** (~1-2h): synthesize a known NV12 buffer in CPU memory, write it to a `udmabuf`, import the UV plane as a single-plane RG88 EGLImage with `EGL_DMA_BUF_PLANE0_OFFSET_EXT`=2088960, render via a fullscreen quad, and `glReadPixels` the result. Decides H1 definitively: correct synthesized data rules H1 out cleanly; zeros confirm it.
3. **Mesa debug log** (~15 min): run mpv with `MESA_DEBUG=1 PAN_MESA_DEBUG=trace,bo` and inspect what panfrost sees for our buffers. Cheap but may not be conclusive.
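For the cache-sync ioctl test, the call sequence can be sketched as follows. This is an illustration only: the actual patch would be C inside mpv, and the helper name `dmabuf_cpu_read_bracket` and its `ioctl` parameter (a hook so the sequence can be exercised without a real dma-buf) are hypothetical. The flag values and the `_IOW('b', 0, struct dma_buf_sync)` encoding follow `<linux/dma-buf.h>`, assuming the generic ioctl number layout used on arm64 and x86.

```python
import fcntl
import struct

# Flag values from <linux/dma-buf.h>.
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_WRITE = 2 << 0
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2

# struct dma_buf_sync { __u64 flags; }
_SYNC_STRUCT = struct.Struct("=Q")

# _IOW('b', 0, struct dma_buf_sync): write direction (1) << 30 | size << 16 | type << 8 | nr
DMA_BUF_IOCTL_SYNC = (1 << 30) | (_SYNC_STRUCT.size << 16) | (ord("b") << 8) | 0


def dmabuf_cpu_read_bracket(fd, ioctl=fcntl.ioctl):
    """Open and then close a CPU read-access window on a dma-buf fd."""
    start = _SYNC_STRUCT.pack(DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ)
    end = _SYNC_STRUCT.pack(DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ)
    ioctl(fd, DMA_BUF_IOCTL_SYNC, start)  # begin CPU access
    ioctl(fd, DMA_BUF_IOCTL_SYNC, end)    # end CPU access
```

Calling `dmabuf_cpu_read_bracket(fd)` on each EXPBUF fd before the wl_dmabuf submit mirrors the proposed probe. Whether that bracket is enough to clear the green frame is exactly what the test is meant to find out.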