6 Commits

Author SHA1 Message Date
marfrit 54fb20bcc0 iter1 phase 4: H7 confirmed by panfrost kernel source-read
Read Linux 6.12 panfrost source. Smoking gun chain:

(a) panfrost_gem.c:262 — obj->base.map_wc = !pfdev->coherent
    Imports get write-combine (uncached) CPU mapping if non-coherent.

(b) panfrost_drv.c:625 — pfdev->coherent comes from DT dma-coherent.
    On ohm: NO dma-coherent in /sys/firmware/devicetree/base/.
    So pfdev->coherent = false.

(c) panfrost_mmu.c:330 — int prot = IOMMU_READ | IOMMU_WRITE
    NO IOMMU_CACHE. Mali's IOMMU mapping is non-snooping.
    Mali reads directly from DRAM, bypassing CPU caches.

(d) KWin caches EGL_images per-fd (eglbackend.cpp:282).
    Cache sync only at dma_buf_map_attachment time (one-time).

(e) V4L2 doesn't attach dma_resv fences to CAPTURE buffers on DQBUF.
    No per-frame cache flush trigger.
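
Step (b) can be re-checked from userspace. A minimal sketch, assuming
the caller passes the sysfs directory of the GPU's device-tree node;
the commit only names the base directory, so the node path and the
helper name dt_node_is_dma_coherent are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if the device-tree node directory `node_path` carries a
 * `dma-coherent` property, 0 if it does not, -1 if the path is too long.
 * sysfs exposes each DT property as a file inside the node directory.
 * The caller supplies the node path (not given in the commit). */
int dt_node_is_dma_coherent(const char *node_path)
{
    char prop[512];
    int n;

    assert(node_path != NULL);
    n = snprintf(prop, sizeof(prop), "%s/dma-coherent", node_path);
    if (n < 0 || (size_t)n >= sizeof(prop))
        return -1;
    return access(prop, F_OK) == 0 ? 1 : 0;
}
```

A result of 0 for the GPU node is consistent with pfdev->coherent =
false. It only mirrors the DT lookup, not panfrost's probe logic.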

Architectural picture: hantro writes through CPU L1/L2/L3 caches,
Mali reads through non-snooping IOMMU, sees stale/zero DRAM. Result:
green frames.

Counter-validation: --vo=gpu (CPU-mmap → glTexSubImage2D upload to
Mali-private BO) works correctly. CPU mmap triggers begin_cpu_access
sync. Mali-private BO is naturally cache-coherent. Per-frame implicit
sync via the CPU mmap path.

Why DMA_BUF_IOCTL_SYNC (phase 3) didn't help: it only manages
CPU-side caches. It doesn't propagate to the GPU IOMMU.

ROOT CAUSE: the same gap our upstream vb2_dma_resv RFC v2 addresses.
With V4L2 attaching a dma_resv fence on DQBUF, mesa-panfrost's implicit
fence-wait at sample time enforces cache writeback. RFC v2 IS the fix.

Proposed iter2 path: build linux-pinetab2-danctnix-besser 7.0 with
RFC v2 patches applied, install on ohm, retest. ~2-3 hours total.

If green goes away, we have:
  - confirmation that our RFC fixes a shipping-product bug
  - a locally shippable kernel package
  - strong data point for the v2 cover letter

Posted to dmabuf-modifier-triage#1 comment 260.
2026-05-08 22:40:51 +00:00
marfrit 7b54ff6c2d iter1 phase 3 close: H6 ruled out by DMA_BUF_IOCTL_SYNC patch test
Built mpv-fourier-1:0.41.0-9 with the DMA_BUF_IOCTL_SYNC patch
(SYNC_START|SYNC_RW before, SYNC_END|SYNC_RW after) in both
vaapi_dmabuf_importer and drmprime_dmabuf_importer. Installed on ohm
via [marfrit].
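
For reference, the sync bracket looks roughly like this. It is a
sketch against the uapi header <linux/dma-buf.h>, not the actual mpv
patch; the helper names dmabuf_sync and dmabuf_sync_bracket are
hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <linux/dma-buf.h>
#include <sys/ioctl.h>

/* Issue one DMA_BUF_IOCTL_SYNC with the given flags, retrying on EINTR.
 * Returns 0 on success, -1 with errno set on failure. */
static int dmabuf_sync(int fd, __u64 flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    assert((flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK) == 0);
    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && errno == EINTR);
    return ret;
}

/* Open and immediately close a read/write CPU-access window on the
 * dma_buf. Any CPU access to the buffer would sit between the calls. */
int dmabuf_sync_bracket(int dmabuf_fd)
{
    if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW) == -1)
        return -1;
    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW);
}
```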

Test: mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause
      --start=00:00:00.42 fourier-test/bbb_1080p30_h264.mp4

Result: spectacle screenshot md5 = c8c8e9b88521a0069f709d483451c3d4
        — BYTE-IDENTICAL to the baseline green-frame screenshot.
        Visual: same solid dark green ~ RGB(0, 77, 0) (BT.709
        limited-range YUV(0,0,0) per the README math).

Userspace cache-sync ioctl has zero effect. H6 ruled out.

Phase 3 critical observation: --hwdec=v4l2request --vo=gpu (CPU-mmap
then glTexSubImage2D upload path) is known-working. So the buffer
DOES contain valid YUV data. Only the zero-copy dma_buf-to-Mali path
renders zeros. Concentrates hypothesis on the dma_buf → Mali GPU
import/translation step itself.

Live hypothesis space:
  H1..H3, H5, H6 ruled out
  H4 latent (low conf)
  H7 LEADING — panfrost dma_buf import / GPU-side cache or
  BO-type / cache-attribute mismatch

Next probes:
  1. Read panfrost kernel-mode dma_buf import path (~45 min)
  2. EGL importer harness with synthetic udmabuf NV12 (~1-2h)
  3. MESA_DEBUG verbose log (~15 min recon)

mpv-fourier-1:0.41.0-9 keeps the no-op patch (harmless). A future
iter close will replace or remove it.

Posted to dmabuf-modifier-triage#1 comment 259.
2026-05-08 22:31:36 +00:00
marfrit d26d662c04 iter1 phase 2: Mesa-panfrost source-read shifts theory to cache coherency
Shallow-cloned Mesa 26.0.6 (matches ohm's installed mesa+vulkan-panfrost)
and traced the per-plane EGL import path through panfrost.

Findings:

(a) pan_screen.c:443 — external_only=is_yuv → NV12+LINEAR forces
    KWin's per-plane path (Y as R8, UV as DRM_FORMAT_GR88).

(b) loader_dri_helper.c:43 — DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM.
    Sampling: .r=byte 0=Cb, .g=byte 1=Cr. Matches KWin's shader.

(c) KWin shader (glshadermanager.cpp:189): vec4(Y, .r, .g, 1) then
    yuvToRgb*. So result.y=U, result.z=V. Math is consistent.

(d) pan_resource.c:354-358 captures whandle->offset → explicit_layout.
    pan_mod.c:663-667 honors offset_B with only alignment check.
    pan_texture.c:361 etc. set texture base = plane->base + offset_B.

Source code is clean. H1 (panfrost offset bug) demoted to LESS-LIKELY.
Cannot be conclusively ruled out without runtime EGL probe.

Green-color math points elsewhere:
  BT.601 limited-range YUV(0,0,0)
    → R = 1.164*(-16) + 1.596*(-128) = -223 → clamp 0
    → G = 1.164*(-16) - 0.391*(-128) - 0.813*(-128) = +135
    → B = 1.164*(-16) + 2.018*(-128) = -277 → clamp 0
  = RGB(0, 135, 0) — EXACTLY the observed green tone.
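
The arithmetic above can be reproduced with a small helper. This is a
sketch using the same BT.601 limited-range coefficients as the lines
above; the names rgb8 and bt601_limited_to_rgb are illustrative and
not taken from KWin or Mesa:

```c
#include <assert.h>

struct rgb8 { int r, g, b; };

/* Clamp to [0, 255] and round to the nearest integer. */
static int clamp_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (int)(v + 0.5);
}

/* BT.601 limited-range YCbCr (8-bit) to full-range RGB. */
struct rgb8 bt601_limited_to_rgb(int y, int cb, int cr)
{
    struct rgb8 out;
    double yy, d, e;

    assert(y >= 0 && y <= 255 && cb >= 0 && cb <= 255 && cr >= 0 && cr <= 255);
    yy = 1.164 * (y - 16);
    d = cb - 128;   /* Cb (U) offset */
    e = cr - 128;   /* Cr (V) offset */
    out.r = clamp_u8(yy + 1.596 * e);
    out.g = clamp_u8(yy - 0.391 * d - 0.813 * e);
    out.b = clamp_u8(yy + 2.018 * d);
    return out;
}
```

Calling bt601_limited_to_rgb(0, 0, 0) gives the all-zero-buffer case
worked out above.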

Conclusion: panfrost is reading ZERO-FILL bytes despite hantro
writing real data. Not a format/offset bug — a cache coherency
or synchronization bug.

New leading hypotheses:

H6 — DMA cache coherency between hantro VPU and Mali GPU. V4L2 does
NOT attach implicit fences on DQBUF for CAPTURE buffers (the exact
gap our vb2_dma_resv RFC v2 addresses upstream). Mali starts sampling
before hantro's writes flush to coherent DRAM.

H7 — Panfrost dma_buf import lacks GPU-side cache invalidation at
attach/map time. Mali MMU/cache serves stale (zero) reads.

Next probe options ranked:
1. Patch mpv-fourier to issue DMA_BUF_IOCTL_SYNC on EXPBUF fds before
   wl-submit. Cheap (~30 min), decisive on H6/H7, doubles as workaround.
2. EGL importer harness with synthetic NV12 (~1-2h), decides H1.
3. MESA_DEBUG=1 PAN_MESA_DEBUG=trace,bo log (~15 min, may not decide).

Leaning toward option 1.

Posted to dmabuf-modifier-triage#1 comment 257.
2026-05-08 21:54:49 +00:00
marfrit 735f7f7ae3 iter1 phase 2: hypothesis 2 ruled out by KWin source-read
Shallow-cloned KWin 6.6.4 (references/kwin-6.6.4/, gitignored) and
read the wl_dmabuf protocol handler + EGL import paths.

Findings:

(a) LinuxDmaBufParamsV1::zwp_linux_buffer_params_v1_add stores
    fd[i]/offset[i]/pitch[i] per plane index, no fd dedup, no
    cross-plane comparison.

(b) LinuxDmaBufParamsV1::test() does lseek(SEEK_END) per plane and
    range-checks against the result. Our exported size 3,657,728
    satisfies all checks for offset[1]=2,088,960 + pitch[1]=1920.

(c) EglDisplay::importDmaBufAsImage (egldisplay.cpp:166 combined
    and :218 per-plane) passes user-supplied fd/offset/pitch
    straight to eglCreateImage(EGL_LINUX_DMA_BUF_EXT) verbatim. No
    transformation.

(d) EglBackend::testImportBuffer (eglbackend.cpp:338) chooses
    between combined (line 342) and per-plane (lines 353-356)
    based on whether NV12+LINEAR is in nonExternalOnlySupportedDrmFormats.
    Either path passes offset=2088960 to the driver.
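
The range check in (b) can be restated as a standalone predicate. This
is a hypothetical sketch of that kind of bounds test, not KWin's
actual code; the function name plane_fits and the exact set of
conditions are assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a plane starting at `offset` with `rows` rows of `pitch`
 * bytes lies entirely inside a dma_buf of `buf_size` bytes, where
 * buf_size is what lseek(fd, 0, SEEK_END) reported.
 * Arithmetic is done in 64 bits so offset + pitch * rows cannot wrap. */
bool plane_fits(uint64_t buf_size, uint32_t offset, uint32_t pitch,
                uint32_t rows)
{
    uint64_t first_row_end = (uint64_t)offset + pitch;
    uint64_t plane_end = (uint64_t)offset + (uint64_t)pitch * rows;

    return offset < buf_size &&
           first_row_end <= buf_size &&
           plane_end <= buf_size;
}
```

With the values from (b), the UV plane of a 1920x1088 NV12 frame has
544 rows, so the call is plane_fits(3657728, 2088960, 1920, 544).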

KWin is innocent. Live hypothesis narrows to H1 (panfrost's
EGL_DMA_BUF_PLANE*_OFFSET_EXT handling for LINEAR NV12, most likely
via the per-plane RG88-from-offset-2088960 import path).

Posted to dmabuf-modifier-triage#1 comment 256.

Three of five hypotheses ruled out (H2, H3, H5). Remaining: H1
(panfrost EGL offset), H4 (kwin-fourier residual, low conf).
Next probe: minimal EGL importer harness on ohm.
2026-05-08 21:43:28 +00:00
marfrit 89a4b81654 iter1 phase 2: hypothesis 3 ruled out by EXPBUF lseek probe
Probe `/tmp/expbuf_probe.c` (snapshot at probes/expbuf_probe.c) opens
/dev/video1, sets OUTPUT format H264_SLICE 1920x1088, REQBUFS 4 capture
buffers, EXPBUF on plane 0 of buffer 0, lseek(fd, 0, SEEK_END).

On ohm (kernel besser-7.0, hantro-vpu / rk3568-vpu-dec):
  CAPTURE: NV12 1920x1088 num_planes=1 sizeimage=3655712
  EXPBUF fd lseek(SEEK_END) = 3657728  (page-rounded from 3655712)

Kernel exports the dma_buf at full sizeimage; offset 2,088,960
(plane 1 base in ffmpeg's drm-frame-descriptor) is well inside.
Hantro is innocent.

Side observation: sizeimage = 3,655,712 > naive NV12's 3,133,440.
The 522,272-byte excess is trailing padding (likely Rockchip
per-frame MV / context metadata) past the UV plane. Y and UV layout
fit cleanly within [0, 3,133,440), exactly where mpv/ffmpeg expect.
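
The size figures can be checked with a few lines of arithmetic. A
sketch assuming 8-bit NV12 with pitch equal to width and a 4096-byte
page size; the helper names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in the Y plane: one byte per pixel. Also the UV plane offset. */
uint64_t nv12_y_bytes(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height;
}

/* Bytes in a tightly packed NV12 frame: Y plane plus a half-height
 * interleaved CbCr plane. Height must be even. */
uint64_t nv12_total_bytes(uint32_t width, uint32_t height)
{
    assert(height % 2 == 0);
    return nv12_y_bytes(width, height) + (uint64_t)width * (height / 2);
}

/* Round v up to a multiple of align, which must be a power of two. */
uint64_t round_up_pow2(uint64_t v, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (v + align - 1) & ~(align - 1);
}
```

For 1920x1088 this reproduces the plane 1 offset, the naive NV12 size,
and the page-rounded lseek result quoted in this commit.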

Remaining hypothesis space: H1 (panfrost EGL non-zero plane offset),
H2 (KWin wl_dmabuf import), H4 (kwin-fourier residual, low conf).

Next probe queued: H2 source-read of KWin 6.6.4 wl_dmabuf import
path. ~30 min, no hardware needed. If that turns up nothing,
write the EGL importer harness for H1.

Posted to dmabuf-modifier-triage#1 comment 255.
2026-05-08 21:11:09 +00:00
marfrit eddd9ef88f phase2 iter1: source-read overturns earlier mpv-bug conclusion
mpv 0.41.0's drmprime_dmabuf_importer reads as correct. Kwiboo's
ffmpeg V4L2 hwaccel at b57fbbe sets planes[1].object_index=0 for
hantro single-planar NV12 (LINEAR modifier branch, line 157), so
mpv should produce identical fd values for both .add() calls.

Runtime confirms via strace: ffmpeg does one VIDIOC_EXPBUF per
CAPTURE buffer, returning ONE fd. nb_objects=1.

The "different fds per plane" observed in WAYLAND_DEBUG is most
likely libwayland's wl_closure_marshal dup_cloexec'ing the fd at
protocol-marshal time — both .add()s use the same source fd, the
trace shows post-dup values which are consecutive but point at the
same dma_buf.

This means the earlier phase 0 conclusion ("mpv mixes per-plane
fds with single-allocation offset") was wrong. The wl_dmabuf
message is internally consistent. Bug is somewhere else.

New hypothesis space (in phase2_iter1_findings.md):
  - Mali-G52 panfrost EGL_dma_buf_import with non-zero plane offset
  - KWin wl_dmabuf import deduplication bug
  - hantro kernel exports dma_buf with size < full allocation
  - environment-reset incompleteness from earlier kwin-fourier A/B

Recommended next moves: probe fd size on ohm, mpv debug-logging
patch, KWin source-read, update issue #1 with revised analysis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:12:47 +00:00