Read Linux 6.12 panfrost source. Smoking gun chain:
(a) panfrost_gem.c:262 — obj->base.map_wc = !pfdev->coherent
Imports get write-combine (uncached) CPU mapping if non-coherent.
(b) panfrost_drv.c:625 — pfdev->coherent comes from DT dma-coherent.
On ohm: NO dma-coherent in /sys/firmware/devicetree/base/.
So pfdev->coherent = false.
(c) panfrost_mmu.c:330 — int prot = IOMMU_READ | IOMMU_WRITE
NO IOMMU_CACHE. Mali's IOMMU mapping is non-snooping.
Mali reads directly from DRAM, bypassing CPU caches.
(d) KWin caches EGL_images per-fd (eglbackend.cpp:282).
Cache sync only at dma_buf_map_attachment time (one-time).
(e) V4L2 doesn't attach dma_resv fences to CAPTURE buffers on DQBUF.
No per-frame cache flush trigger.
Architectural picture: hantro writes through CPU L1/L2/L3 caches,
Mali reads through non-snooping IOMMU, sees stale/zero DRAM. Result:
green frames.
Counter-validation: --vo=gpu (CPU-mmap → glTexSubImage2D upload to
Mali-private BO) works correctly. CPU mmap triggers begin_cpu_access
sync. Mali-private BO is naturally cache-coherent. Per-frame implicit
sync via the CPU mmap path.
Why DMA_BUF_IOCTL_SYNC (phase 3) didn't help: that's CPU-side cache
management. Doesn't propagate to GPU IOMMU.
ROOT CAUSE: Same root cause as our upstream vb2_dma_resv RFC v2.
With V4L2 attaching dma_resv fence on DQBUF, mesa-panfrost's implicit
fence-wait at sample time enforces cache writeback. RFC v2 IS the fix.
Proposed iter2 path: build linux-pinetab2-danctnix-besser 7.0 with
RFC v2 patches applied, install on ohm, retest. ~2-3 hours total.
If green goes away, we have:
- confirmation that our RFC fixes a shipping-product bug
- a locally shippable kernel package
- strong data point for the v2 cover letter
Posted to dmabuf-modifier-triage#1 comment 260.
Built mpv-fourier-1:0.41.0-9 with the DMA_BUF_IOCTL_SYNC(SYNC_START|
SYNC_RW) + SYNC_END(SYNC_RW) patch in both vaapi_dmabuf_importer and
drmprime_dmabuf_importer. Installed on ohm via [marfrit].
Test: mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause
--start=00:00:00.42 fourier-test/bbb_1080p30_h264.mp4
Result: spectacle screenshot md5 = c8c8e9b88521a0069f709d483451c3d4
— BYTE-IDENTICAL to the baseline green-frame screenshot.
Visual: same solid dark green ~ RGB(0, 77, 0) (BT.709
limited-range YUV(0,0,0) per the README math).
Userspace cache-sync ioctl has zero effect. H6 ruled out.
Phase 3 critical observation: --hwdec=v4l2request --vo=gpu (CPU-
mmap then glTexSubImage2D upload path) is known-working. So the
buffer DOES contain valid YUV data. Only the zero-copy dma_buf-
to-Mali path renders zeros. Concentrates hypothesis on the
dma_buf → Mali GPU import/translation step itself.
Live hypothesis space:
H1..H3, H5, H6 ruled out
H4 latent (low conf)
H7 LEADING — panfrost dma_buf import / GPU-side cache or
BO-type / cache-attribute mismatch
Next probes:
1. Read panfrost kernel-mode dma_buf import path (~45 min)
2. EGL importer harness with synthetic udmabuf NV12 (~1-2h)
3. MESA_DEBUG verbose log (~15 min recon)
mpv-fourier-1:0.41.0-9 keeps the no-op patch (harmless). Future
iter close replaces or removes.
Posted to dmabuf-modifier-triage#1 comment 259.
Shallow-cloned Mesa 26.0.6 (matches ohm's installed mesa+vulkan-panfrost)
and traced the per-plane EGL import path through panfrost.
Findings:
(a) pan_screen.c:443 — external_only=is_yuv → NV12+LINEAR forces
KWin's per-plane path (Y as R8, UV as DRM_FORMAT_GR88).
(b) loader_dri_helper.c:43 — DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM.
Sampling: .r=byte 0=Cb, .g=byte 1=Cr. Matches KWin's shader.
(c) KWin shader (glshadermanager.cpp:189): vec4(Y, .r, .g, 1) then
yuvToRgb*. So result.y=U, result.z=V. Math is consistent.
(d) pan_resource.c:354-358 captures whandle->offset → explicit_layout.
pan_mod.c:663-667 honors offset_B with only alignment check.
pan_texture.c:361 etc. set texture base = plane->base + offset_B.
Source code is clean. H1 (panfrost offset bug) demoted to LESS-LIKELY.
Cannot be conclusively ruled out without runtime EGL probe.
Green-color math points elsewhere:
BT.601 limited-range YUV(0,0,0)
→ R = 1.164*(-16) + 1.596*(-128) = -223 → clamp 0
→ G = 1.164*(-16) - 0.391*(-128) - 0.813*(-128) = +135
→ B = 1.164*(-16) + 2.018*(-128) = -277 → clamp 0
= RGB(0, 135, 0) — EXACTLY the observed green tone.
Conclusion: panfrost is reading ZERO-FILL bytes despite hantro
writing real data. Not a format/offset bug — a cache coherency
or synchronization bug.
New leading hypotheses:
H6 — DMA cache coherency between hantro VPU and Mali GPU. V4L2 does
NOT attach implicit fences on DQBUF for CAPTURE buffers (the exact
gap our vb2_dma_resv RFC v2 addresses upstream). Mali starts sampling
before hantro's writes flush to coherent DRAM.
H7 — Panfrost dma_buf import lacks GPU-side cache invalidation at
attach/map time. Mali MMU/cache serves stale (zero) reads.
Next probe options ranked:
1. Patch mpv-fourier to issue DMA_BUF_IOCTL_SYNC on EXPBUF fds before
wl-submit. Cheap (~30 min), decisive on H6/H7, doubles as workaround.
2. EGL importer harness with synthetic NV12 (~1-2h), decides H1.
3. MESA_DEBUG=1 PAN_MESA_DEBUG=trace,bo log (~15 min, may not decide).
Leaning toward option 1.
Posted to dmabuf-modifier-triage#1 comment 257.
Shallow-cloned KWin 6.6.4 (references/kwin-6.6.4/, gitignored) and
read the wl_dmabuf protocol handler + EGL import paths.
Findings:
(a) LinuxDmaBufParamsV1::zwp_linux_buffer_params_v1_add stores
fd[i]/offset[i]/pitch[i] per plane index, no fd dedup, no
cross-plane comparison.
(b) LinuxDmaBufParamsV1::test() does lseek(SEEK_END) per plane and
range-checks against the result. Our exported size 3,657,728
satisfies all checks for offset[1]=2,088,960 + pitch[1]=1920.
(c) EglDisplay::importDmaBufAsImage (egldisplay.cpp:166 combined
and :218 per-plane) passes user-supplied fd/offset/pitch
straight to eglCreateImage(EGL_LINUX_DMA_BUF_EXT) verbatim. No
transformation.
(d) EglBackend::testImportBuffer (eglbackend.cpp:338) chooses
between combined (line 342) and per-plane (line 353-356)
based on whether NV12+LINEAR is in nonExternalOnlySupportedDrmFormats.
Either path passes offset=2088960 to the driver.
KWin is innocent. Live hypothesis narrows to H1 (panfrost's
EGL_DMA_BUF_PLANE*_OFFSET_EXT handling for LINEAR NV12, most likely
via the per-plane RG88-from-offset-2088960 import path).
Posted to dmabuf-modifier-triage#1 comment 256.
Three of five hypotheses ruled out (H2, H3, H5). Remaining: H1
(panfrost EGL offset), H4 (kwin-fourier residual, low conf).
Next probe: minimal EGL importer harness on ohm.
Probe `/tmp/expbuf_probe.c` (snapshot at probes/expbuf_probe.c) opens
/dev/video1, sets OUTPUT format H264_SLICE 1920x1088, REQBUFS 4 capture
buffers, EXPBUF on plane 0 of buffer 0, lseek(fd, 0, SEEK_END).
On ohm (kernel besser-7.0, hantro-vpu / rk3568-vpu-dec):
CAPTURE: NV12 1920x1088 num_planes=1 sizeimage=3655712
EXPBUF fd lseek(SEEK_END) = 3657728 (page-rounded from 3655712)
Kernel exports the dma_buf at full sizeimage; offset 2,088,960
(plane 1 base in ffmpeg's drm-frame-descriptor) is well inside.
Hantro is innocent.
Side observation: sizeimage = 3,655,712 > naive NV12's 3,133,440.
The 522,272-byte excess is trailing padding (likely Rockchip
per-frame MV / context metadata) past the UV plane. Y and UV layout
fit cleanly within [0, 3,133,440), exactly where mpv/ffmpeg expect.
Remaining hypothesis space: H1 (panfrost EGL non-zero plane offset),
H2 (KWin wl_dmabuf import), H4 (kwin-fourier residual, low conf).
Next probe queued: H2 source-read of KWin 6.6.4 wl_dmabuf import
path. ~30 min, no hardware needed. If that turns up nothing,
write the EGL importer harness for H1.
Posted to dmabuf-modifier-triage#1 comment 255.
mpv 0.41.0's drmprime_dmabuf_importer reads correctly. Kwiboo's
ffmpeg V4L2 hwaccel at b57fbbe sets planes[1].object_index=0 for
hantro single-planar NV12 (LINEAR modifier branch, line 157), so
mpv should produce identical fd values for both .add() calls.
Runtime confirms via strace: ffmpeg does one VIDIOC_EXPBUF per
CAPTURE buffer, returning ONE fd. nb_objects=1.
The "different fds per plane" observed in WAYLAND_DEBUG is most
likely libwayland's wl_closure_marshal dup_cloexec'ing the fd at
protocol-marshal time — both .add()s use the same source fd, the
trace shows post-dup values which are consecutive but point at the
same dma_buf.
This means the earlier phase 0 conclusion ("mpv mixes per-plane
fds with single-allocation offset") was wrong. The wl_dmabuf
message is internally consistent. Bug is somewhere else.
New hypothesis space (in phase2_iter1_findings.md):
- Mali-G52 panfrost EGL_dma_buf_import with non-zero plane offset
- KWin wl_dmabuf import deduplication bug
- hantro kernel exports dma_buf with size < full allocation
- environment-reset incompleteness from earlier kwin-fourier A/B
Recommended next moves: probe fd size on ohm, mpv debug-logging
patch, KWin source-read, update issue #1 with revised analysis.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operator pushback: phase 0 should unblock iter1, not gate it. The
locked question is "fix what's locally in scope" — kill the green,
not just identify a layer. The captured wl_dmabuf message is
internally inconsistent on its own (per-plane fds + single-allocation
offset for plane 1 is a contradiction no valid producer can claim
simultaneously). mpv's translation layer produces this regardless of
which producer feeds it, so iter1 can write the fix from the ffmpeg-
side data alone. The libva-path WAYLAND_DEBUG comparison after iter9
is a follow-up validation that confirms the fix handles both producer
shapes, not a prerequisite for writing it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operator decision 2026-05-08: the iter1 fix is shipped only when frame
10 of the bbb fixture under --vo=dmabuf-wayland matches the expected
reference (captured via --vo=gpu on same hardware/session).
- screenshots/frame10_dmabuf_green.png — current broken state
- screenshots/frame10_expected.png — target post-fix state
- screenshots/README.md — verification protocol (SSIM > 0.95 + valid
WAYLAND_DEBUG .add() semantics + no regression on vo=gpu / vo=wlshm)
- phase0_findings.md — references the criterion in the conclusion section
This locks the campaign's deliverable. Patches that change wire-protocol
.add() shape but don't restore the picture do not satisfy iter1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reference image of the bug as visible on ohm. Captured 2026-05-08 via
mpv --hwdec=v4l2request --vo=dmabuf-wayland --pause --start=00:00:00.42
--fullscreen fourier-test/bbb_1080p30_h264.mp4 then spectacle -b -f -n.
Uniform dark green at approx RGB(0, 75, 0) — exactly what the all-zero
NV12 hypothesis (Y=U=V=0 through BT.601/709 conversion) predicts and
what marfrit/dmabuf-modifier-triage#1 diagnoses (KWin reads past-EOF
on the UV plane fd, returns zeros for chroma).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eight directed A/B tests on ohm ruled out every layer that doesn't
matter (libva, decoder content, color tags, kwin-fourier 0001 patch,
Mesa 26.0.6 vs 26.0.5, Wayland generally, kernel 6.19.10 vs 7.0,
KWin Vulkan vs OpenGL backend). WAYLAND_DEBUG=1 capture surfaced the
real bug at the protocol-message layer:
add(fd 41, plane=0, offset=0, stride=1920, mod=0,0)
add(fd 42, plane=1, offset=2088960, stride=1920, mod=0,0)
mpv mixes per-plane fds (V4L2 MPLANE export semantics) with
single-allocation offset for plane 1 (single-fd export semantics).
KWin reads UV from past-EOF on the UV-plane fd → all-zero NV12 →
dark green render (Y=U=V=0 in BT.601/709 ≈ RGB(0,70,0)).
Filed at marfrit/dmabuf-modifier-triage#1 with full diagnosis +
suggested fix path. Iter1 fix work is blocked on libva-multiplanar
iter9 (need clean libva path to verify which producer is at fault
or whether mpv VO is the sole bug). Working interim path is
mpv --hwdec=v4l2request --vo=gpu (correct picture, slow shader path).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User runs kwin-fourier with active patch
0001-transaction-bypass-watchDmaBuf-fence-wait.patch — this bypasses
KWin's implicit-sync fence wait on dmabufs. If KWin samples a hantro
CAPTURE buffer before the decoder fence signals, it gets all-zeros
NV12 which renders solid green in YUV→RGB. This is the most likely
cause and is decisively testable via a stock-kwin A/B (item 1).
Hypothesis: pre-iter5 the libva path was masked because vaSyncSurface
provided ordering. iter6/7 may have changed that. Both libva and
ffmpeg-v4l2request paths now expose the lack of fence-wait.
Run-history breadcrumb (#51 introduce, #58 switch to 0002, #59 revert
to 0001) recorded in the findings doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Focused triage of marfrit/libva-multiplanar#1 — dmabuf-wayland green
on ohm independent of decoder backend. Locked question: identify the
layer responsible (libva / ffmpeg / KWin / Mesa-panfrost / kernel)
and file upstream where appropriate. Performance is explicitly out of
scope — user has working slow path via vo=gpu hwdec=v4l2request.
Phase 0 deliverables: vaExportSurfaceHandle + AVDRMFrameDescriptor
modifier captures, Wayland linux-dmabuf-v1 advertise snapshot, pacman
upgrade timeline review for the iter5→iter8 regression window, and
stock-kwin A/B isolating kwin-fourier as a candidate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>