Read Linux 6.12 panfrost source. Smoking gun chain:
(a) panfrost_gem.c:262 — obj->base.map_wc = !pfdev->coherent
Imports get write-combine (uncached) CPU mapping if non-coherent.
(b) panfrost_drv.c:625 — pfdev->coherent comes from DT dma-coherent.
On ohm: NO dma-coherent in /sys/firmware/devicetree/base/.
So pfdev->coherent = false.
(c) panfrost_mmu.c:330 — int prot = IOMMU_READ | IOMMU_WRITE
NO IOMMU_CACHE. Mali's IOMMU mapping is non-snooping.
Mali reads directly from DRAM, bypassing CPU caches.
(d) KWin caches EGL_images per-fd (eglbackend.cpp:282).
Cache sync only at dma_buf_map_attachment time (one-time).
(e) V4L2 doesn't attach dma_resv fences to CAPTURE buffers on DQBUF.
No per-frame cache flush trigger.
Architectural picture: hantro writes through CPU L1/L2/L3 caches,
Mali reads through non-snooping IOMMU, sees stale/zero DRAM. Result:
green frames.
Counter-validation: --vo=gpu (CPU-mmap → glTexSubImage2D upload to
Mali-private BO) works correctly. CPU mmap triggers begin_cpu_access
sync. Mali-private BO is naturally cache-coherent. Per-frame implicit
sync via the CPU mmap path.
Why DMA_BUF_IOCTL_SYNC (phase 3) didn't help: that's CPU-side cache
management. Doesn't propagate to GPU IOMMU.
ROOT CAUSE: Same root cause as our upstream vb2_dma_resv RFC v2.
With V4L2 attaching dma_resv fence on DQBUF, mesa-panfrost's implicit
fence-wait at sample time enforces cache writeback. RFC v2 IS the fix.
Proposed iter2 path: build linux-pinetab2-danctnix-besser 7.0 with
RFC v2 patches applied, install on ohm, retest. ~2-3 hours total.
If green goes away, we have:
- confirmation that our RFC fixes a shipping-product bug
- a locally shippable kernel package
- strong data point for the v2 cover letter
Posted to dmabuf-modifier-triage#1 comment 260.
Built mpv-fourier-1:0.41.0-9 with the DMA_BUF_IOCTL_SYNC(SYNC_START|
SYNC_RW) + SYNC_END(SYNC_RW) patch in both vaapi_dmabuf_importer and
drmprime_dmabuf_importer. Installed on ohm via [marfrit].
Test: mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause
--start=00:00:00.42 fourier-test/bbb_1080p30_h264.mp4
Result: spectacle screenshot md5 = c8c8e9b88521a0069f709d483451c3d4
— BYTE-IDENTICAL to the baseline green-frame screenshot.
Visual: same solid dark green ~ RGB(0, 77, 0) (BT.709
limited-range YUV(0,0,0) per the README math).
Userspace cache-sync ioctl has zero effect. H6 ruled out.
Phase 3 critical observation: --hwdec=v4l2request --vo=gpu (CPU-
mmap then glTexSubImage2D upload path) is known-working. So the
buffer DOES contain valid YUV data. Only the zero-copy dma_buf-
to-Mali path renders zeros. Concentrates hypothesis on the
dma_buf → Mali GPU import/translation step itself.
Live hypothesis space:
H1..H3, H5, H6 ruled out
H4 latent (low conf)
H7 LEADING — panfrost dma_buf import / GPU-side cache or
BO-type / cache-attribute mismatch
Next probes:
1. Read panfrost kernel-mode dma_buf import path (~45 min)
2. EGL importer harness with synthetic udmabuf NV12 (~1-2h)
3. MESA_DEBUG verbose log (~15 min recon)
mpv-fourier-1:0.41.0-9 keeps the no-op patch (harmless). Future
iter close replaces or removes.
Posted to dmabuf-modifier-triage#1 comment 259.
Shallow-cloned Mesa 26.0.6 (matches ohm's installed mesa+vulkan-panfrost)
and traced the per-plane EGL import path through panfrost.
Findings:
(a) pan_screen.c:443 — external_only=is_yuv → NV12+LINEAR forces
KWin's per-plane path (Y as R8, UV as DRM_FORMAT_GR88).
(b) loader_dri_helper.c:43 — DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM.
Sampling: .r=byte 0=Cb, .g=byte 1=Cr. Matches KWin's shader.
(c) KWin shader (glshadermanager.cpp:189): vec4(Y, .r, .g, 1) then
yuvToRgb*. So result.y=U, result.z=V. Math is consistent.
(d) pan_resource.c:354-358 captures whandle->offset → explicit_layout.
pan_mod.c:663-667 honors offset_B with only alignment check.
pan_texture.c:361 etc. set texture base = plane->base + offset_B.
Source code is clean. H1 (panfrost offset bug) demoted to LESS-LIKELY.
Cannot be conclusively ruled out without runtime EGL probe.
Green-color math points elsewhere:
BT.601 limited-range YUV(0,0,0)
→ R = 1.164*(-16) + 1.596*(-128) = -223 → clamp 0
→ G = 1.164*(-16) - 0.391*(-128) - 0.813*(-128) = +135
→ B = 1.164*(-16) + 2.018*(-128) = -277 → clamp 0
= RGB(0, 135, 0) — EXACTLY the observed green tone.
Conclusion: panfrost is reading ZERO-FILL bytes despite hantro
writing real data. Not a format/offset bug — a cache coherency
or synchronization bug.
New leading hypotheses:
H6 — DMA cache coherency between hantro VPU and Mali GPU. V4L2 does
NOT attach implicit fences on DQBUF for CAPTURE buffers (the exact
gap our vb2_dma_resv RFC v2 addresses upstream). Mali starts sampling
before hantro's writes flush to coherent DRAM.
H7 — Panfrost dma_buf import lacks GPU-side cache invalidation at
attach/map time. Mali MMU/cache serves stale (zero) reads.
Next probe options ranked:
1. Patch mpv-fourier to issue DMA_BUF_IOCTL_SYNC on EXPBUF fds before
wl-submit. Cheap (~30 min), decisive on H6/H7, doubles as workaround.
2. EGL importer harness with synthetic NV12 (~1-2h), decides H1.
3. MESA_DEBUG=1 PAN_MESA_DEBUG=trace,bo log (~15 min, may not decide).
Leaning toward option 1.
Posted to dmabuf-modifier-triage#1 comment 257.
Shallow-cloned KWin 6.6.4 (references/kwin-6.6.4/, gitignored) and
read the wl_dmabuf protocol handler + EGL import paths.
Findings:
(a) LinuxDmaBufParamsV1::zwp_linux_buffer_params_v1_add stores
fd[i]/offset[i]/pitch[i] per plane index, no fd dedup, no
cross-plane comparison.
(b) LinuxDmaBufParamsV1::test() does lseek(SEEK_END) per plane and
range-checks against the result. Our exported size 3,657,728
satisfies all checks for offset[1]=2,088,960 + pitch[1]=1920.
(c) EglDisplay::importDmaBufAsImage (egldisplay.cpp:166 combined
and :218 per-plane) passes user-supplied fd/offset/pitch
straight to eglCreateImage(EGL_LINUX_DMA_BUF_EXT) verbatim. No
transformation.
(d) EglBackend::testImportBuffer (eglbackend.cpp:338) chooses
between combined (line 342) and per-plane (line 353-356)
based on whether NV12+LINEAR is in nonExternalOnlySupportedDrmFormats.
Either path passes offset=2088960 to the driver.
KWin is innocent. Live hypothesis narrows to H1 (panfrost's
EGL_DMA_BUF_PLANE*_OFFSET_EXT handling for LINEAR NV12, most likely
via the per-plane RG88-from-offset-2088960 import path).
Posted to dmabuf-modifier-triage#1 comment 256.
Three of five hypotheses ruled out (H2, H3, H5). Remaining: H1
(panfrost EGL offset), H4 (kwin-fourier residual, low conf).
Next probe: minimal EGL importer harness on ohm.
Probe `/tmp/expbuf_probe.c` (snapshot at probes/expbuf_probe.c) opens
/dev/video1, sets OUTPUT format H264_SLICE 1920x1088, REQBUFS 4 capture
buffers, EXPBUF on plane 0 of buffer 0, lseek(fd, 0, SEEK_END).
On ohm (kernel besser-7.0, hantro-vpu / rk3568-vpu-dec):
CAPTURE: NV12 1920x1088 num_planes=1 sizeimage=3655712
EXPBUF fd lseek(SEEK_END) = 3657728 (page-rounded from 3655712)
Kernel exports the dma_buf at full sizeimage; offset 2,088,960
(plane 1 base in ffmpeg's drm-frame-descriptor) is well inside.
Hantro is innocent.
Side observation: sizeimage = 3,655,712 > naive NV12's 3,133,440.
The 522,272-byte excess is trailing padding (likely Rockchip
per-frame MV / context metadata) past the UV plane. Y and UV layout
fit cleanly within [0, 3,133,440), exactly where mpv/ffmpeg expect.
Remaining hypothesis space: H1 (panfrost EGL non-zero plane offset),
H2 (KWin wl_dmabuf import), H4 (kwin-fourier residual, low conf).
Next probe queued: H2 source-read of KWin 6.6.4 wl_dmabuf import
path. ~30 min, no hardware needed. If that turns up nothing,
write the EGL importer harness for H1.
Posted to dmabuf-modifier-triage#1 comment 255.
mpv 0.41.0's drmprime_dmabuf_importer reads correctly. Kwiboo's
ffmpeg V4L2 hwaccel at b57fbbe sets planes[1].object_index=0 for
hantro single-planar NV12 (LINEAR modifier branch, line 157), so
mpv should produce identical fd values for both .add() calls.
Runtime confirms via strace: ffmpeg does one VIDIOC_EXPBUF per
CAPTURE buffer, returning ONE fd. nb_objects=1.
The "different fds per plane" observed in WAYLAND_DEBUG is most
likely libwayland's wl_closure_marshal dup_cloexec'ing the fd at
protocol-marshal time — both .add()s use the same source fd, the
trace shows post-dup values which are consecutive but point at the
same dma_buf.
This means the earlier phase 0 conclusion ("mpv mixes per-plane
fds with single-allocation offset") was wrong. The wl_dmabuf
message is internally consistent. Bug is somewhere else.
New hypothesis space (in phase2_iter1_findings.md):
- Mali-G52 panfrost EGL_dma_buf_import with non-zero plane offset
- KWin wl_dmabuf import deduplication bug
- hantro kernel exports dma_buf with size < full allocation
- environment-reset incompleteness from earlier kwin-fourier A/B
Recommended next moves: probe fd size on ohm, mpv debug-logging
patch, KWin source-read, update issue #1 with revised analysis.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>