Files
dmabuf-modifier-triage/phase2_iter1_findings.md
T
marfrit d26d662c04 iter1 phase 2: Mesa-panfrost source-read shifts theory to cache coherency
Shallow-cloned Mesa 26.0.6 (matches ohm's installed mesa+vulkan-panfrost)
and traced the per-plane EGL import path through panfrost.

Findings:

(a) pan_screen.c:443 — external_only=is_yuv → NV12+LINEAR forces
    KWin's per-plane path (Y as R8, UV as DRM_FORMAT_GR88).

(b) loader_dri_helper.c:43 — DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM.
    Sampling: .r=byte 0=Cb, .g=byte 1=Cr. Matches KWin's shader.

(c) KWin shader (glshadermanager.cpp:189): vec4(Y, .r, .g, 1) then
    yuvToRgb*. So result.y=U, result.z=V. Math is consistent.

(d) pan_resource.c:354-358 captures whandle->offset → explicit_layout.
    pan_mod.c:663-667 honors offset_B with only alignment check.
    pan_texture.c:361 etc. set texture base = plane->base + offset_B.

Source code is clean. H1 (panfrost offset bug) demoted to LESS-LIKELY.
Cannot be conclusively ruled out without runtime EGL probe.

Green-color math points elsewhere:
  BT.601 limited-range YUV(0,0,0)
    → R = 1.164*(-16) + 1.596*(-128) = -223 → clamp 0
    → G = 1.164*(-16) - 0.391*(-128) - 0.813*(-128) = +135
    → B = 1.164*(-16) + 2.018*(-128) = -277 → clamp 0
  = RGB(0, 135, 0) — EXACTLY the observed green tone.

Conclusion: panfrost is reading ZERO-FILL bytes despite hantro
writing real data. Not a format/offset bug — a cache coherency
or synchronization bug.

New leading hypotheses:

H6 — DMA cache coherency between hantro VPU and Mali GPU. V4L2 does
NOT attach implicit fences on DQBUF for CAPTURE buffers (the exact
gap our vb2_dma_resv RFC v2 addresses upstream). Mali starts sampling
before hantro's writes flush to coherent DRAM.

H7 — Panfrost dma_buf import lacks GPU-side cache invalidation at
attach/map time. Mali MMU/cache serves stale (zero) reads.

Next probe options ranked:
1. Patch mpv-fourier to issue DMA_BUF_IOCTL_SYNC on EXPBUF fds before
   wl-submit. Cheap (~30 min), decisive on H6/H7, doubles as workaround.
2. EGL importer harness with synthetic NV12 (~1-2h), decides H1.
3. MESA_DEBUG=1 PAN_MESA_DEBUG=trace,bo log (~15 min, may not decide).

Leaning toward option 1.

Posted to dmabuf-modifier-triage#1 comment 257.
2026-05-08 21:54:49 +00:00

120 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 2 — iter1 source-read findings (REOPEN of root-cause analysis)
**Opened 2026-05-08** during the iter1 phase 2 source-read of mpv 0.41.0 + Kwiboo's ffmpeg fork at commit `b57fbbe`. Phase 0's earlier conclusion ("mpv mixes per-plane fds with single-allocation offset") needs revision — the source code reads + runtime probe show the situation is more nuanced than the WAYLAND_DEBUG wire trace alone suggested.
## What the source actually says
**mpv `video/out/vo_dmabuf_wayland.c` `drmprime_dmabuf_importer` (lines 250-277)** straightforwardly relays the producer's `AVDRMFrameDescriptor`:
```c
for (plane_no = 0; plane_no < layer.nb_planes; ++plane_no) {
AVDRMPlaneDescriptor plane = layer.planes[plane_no];
int object_index = plane.object_index;
AVDRMObjectDescriptor object = desc->objects[object_index];
uint64_t modifier = object.format_modifier;
zwp_linux_buffer_params_v1_add(params, object.fd, plane_no, plane.offset,
plane.pitch, modifier >> 32, modifier & 0xffffffff);
}
```
No `dup()`, no rewriting, no transformation. mpv passes through what `AVDRMFrameDescriptor` says.
**Kwiboo's `libavutil/hwcontext_v4l2request.c` `v4l2request_set_drm_descriptor` (lines 138-198)** for hantro's NV12 single-planar (V4L2_PIX_FMT_NV12, the format `v4l2-ctl --get-fmt-video-mplane-cap` reports for `/dev/video1` on ohm):
```c
desc->base.nb_objects = num_planes; // = 1 for single-planar NV12 on hantro
desc->base.objects[0].fd = exportbuffer.fd; // VIDIOC_EXPBUF returns ONE fd
// in v4l2request_set_drm_descriptor:
desc->nb_layers = 1;
layer->nb_planes = 1;
layer->planes[0].object_index = 0;
layer->planes[0].offset = 0;
layer->planes[0].pitch = bytesperline; // 1920
if (modifier != ARM_VENDOR) { // hantro outputs LINEAR (0x0), so this is true
layer->nb_planes = 2;
layer->planes[1].object_index = 0; // ← BOTH PLANES point at object 0
layer->planes[1].offset = pitch * height; // 1920 * 1088 = 2088960
layer->planes[1].pitch = layer->planes[0].pitch;
}
```
Per the source, mpv should produce **identical** fd values in the two `.add()` calls — both pulling from `desc->objects[0].fd`.
## What the runtime probe says
`v4l2-ctl --get-fmt-video-mplane-cap` on ohm `/dev/video1`:
```
Pixel Format : 'NV12' (Y/UV 4:2:0)
Number of planes : 1
sizeimage=3655712, bytesperline=1920
```
`strace -e trace=ioctl mpv ...` confirms ffmpeg only does **one** `VIDIOC_EXPBUF` per CAPTURE buffer (`index=N, plane=0` → one fd), exactly matching `nb_objects = 1`.
But `WAYLAND_DEBUG=1 mpv ...` shows two `.add()` calls **with different fd numbers** per buffer:
```
add(fd 41, 0, 0, 1920, 0, 0)
add(fd 42, 1, 2088960, 1920, 0, 0)
```
These fd numbers are **consecutive**, suggesting libwayland's `wl_closure_marshal` is `dup_cloexec`'ing the fd at protocol-marshal time and the trace prints the post-dup fd. Both fd 41 and fd 42 are dups of the same underlying `dma_buf` object (originally fd 17 or similar in mpv's table).
## Implications for iter1
The earlier phase 0 conclusion that mpv constructs an "internally inconsistent" wl_dmabuf message was **wrong**. There is no inconsistency at the producer ↔ mpv layer:
- nb_objects = 1, both planes use object 0 → mpv passes the same fd value into both `.add()` calls
- libwayland dups it before sending → wire trace shows different fd numbers, but they refer to the same backing memory
- Plane 1's offset = 2088960 is correct relative to the (single) underlying allocation
So the green frame is **not caused by mpv or ffmpeg's descriptor construction**. Something else.
## New hypothesis space (one of these is the real bug)
1. **Mali-G52 panfrost EGL_EXT_image_dma_buf_import_modifiers regression for NV12 with non-zero plane offset.** **Source-read 2026-05-08 of Mesa 26.0.6 makes this LESS LIKELY.** Trace at `references/mesa-26.0.6/`:
- `src/gallium/drivers/panfrost/pan_screen.c:443` reports `external_only[count] = is_yuv` for any YUV format → NV12+LINEAR is external_only, forcing KWin's per-plane import path (Y as R8, UV as DRM_FORMAT_GR88).
- `src/loader/loader_dri_helper.c:43` maps `DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM` (the byte-order distinction is preserved at the pipe-format level — `.r` = byte 0 = Cb, `.g` = byte 1 = Cr — matching KWin's shader assumption at glshadermanager.cpp:189 `result.yz = sampler1.rg`).
- `src/gallium/drivers/panfrost/pan_resource.c:354-358` captures `whandle->offset` into `explicit_layout.offset_B` for the import.
- `src/panfrost/lib/pan_mod.c:663-667` (linear modifier slice-init) honors `layout_constraints->offset_B` directly with only an alignment check; 2,088,960 is page-aligned, satisfies 16-/64-/4096-byte alignments alike.
- `src/panfrost/lib/pan_texture.c:361,561,660,773,817` set the texture descriptor's GPU base to `plane->base + slayout->offset_B` — i.e., sampling reads from `bo_gpu + 2,088,960`.
- **Conclusion**: panfrost source code as written DOES honor non-zero plane offset. Source-read alone cannot rule out runtime bug — but the obvious places are clean. To definitively rule in/out, write the EGL importer harness with synthetic NV12 data.
2. ~~**KWin's wl_dmabuf import logic deduplicates the dup'd fds incorrectly.**~~ **RULED OUT 2026-05-08** by source-read of KWin 6.6.4 at `references/kwin-6.6.4/src/wayland/linuxdmabufv1clientbuffer.cpp` + `src/opengl/{eglbackend,egldisplay}.cpp`. (a) `LinuxDmaBufParamsV1::zwp_linux_buffer_params_v1_add` simply stores per-plane fd/offset/pitch in separate slots, no dedup. (b) `LinuxDmaBufParamsV1::test()` does `lseek(SEEK_END)` per plane + range checks against the resulting size; our 3,657,728 satisfies all of them. (c) `EglDisplay::importDmaBufAsImage` (both the combined and per-plane forms) passes `dmabuf.fd[i]`, `dmabuf.offset[i]`, `dmabuf.pitch[i]` straight to `eglCreateImage(EGL_LINUX_DMA_BUF_EXT, ...)` with no transformation. (d) `EglBackend::testImportBuffer` chooses between combined import and per-plane (Y as R8 / UV as RG88 from offset 2,088,960) based on whether NV12+LINEAR is in `nonExternalOnlySupportedDrmFormats()`. **Either path** forwards `offset = 2,088,960` to the driver. KWin is innocent.
3. ~~**hantro kernel driver exports a `dma_buf` with `size` < full allocation.**~~ **RULED OUT 2026-05-08** by `/tmp/expbuf_probe.c` on ohm. Driver `hantro-vpu` on `rk3568-vpu-dec` reports `CAPTURE: NV12 1920x1088 num_planes=1 sizeimage=3655712`; `VIDIOC_EXPBUF` yields fd whose `lseek(fd, 0, SEEK_END) = 3,657,728` (page-rounded up from 3,655,712). Offset 2,088,960 (plane 1 base) is firmly inside the exported size. Kernel is innocent.
Side observation worth recording: `sizeimage = 3,655,712` is bigger than naïve NV12's 1920×1088×1.5 = 3,133,440. The 522,272-byte excess sits **past** the UV plane (Y at [0, 2,088,960), UV at [2,088,960, 3,133,440), trailing padding at [3,133,440, 3,655,712)). On Rockchip codecs that tail commonly holds per-frame motion-vector / decoder-context data. Confirms ffmpeg's hardcoded `planes[1].offset = pitch*height = 2,088,960` is correct.
4. **kwin-fourier 0001 still has effect we missed.** Even though we ruled out kwin-fourier as a compositor-replacement A/B, that test was on an earlier kernel/Mesa combo. Worth verifying the test environment is fully reset.
6. **DMA cache coherency between hantro VPU and Mali GPU** (NEW 2026-05-08, derived from green-color math). BT.601 limited-range YUV(0,0,0) → RGB(0, 135, 0) — exactly the green tone we see. This means *panfrost is reading zero-fill bytes despite hantro having written valid YUV data*. If the dma_buf hasn't been cache-flushed/invalidated correctly between writer (hantro) and reader (Mali), Mali samples stale zeroed memory. V4L2 does NOT attach implicit fences to CAPTURE buffers on dequeue — this gap is exactly what our `vb2_dma_resv` RFC v2 addresses for upstream (see `project_vb2_dma_resv_v2_state.md`). Testable on ohm: between mpv's VIDIOC_DQBUF and the wl_dmabuf submit, mmap the EXPBUF fd from a sibling process via `pidfd_getfd`, force CPU cache sync via `DMA_BUF_IOCTL_SYNC` with `DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ`, read back; if it shows real Y data, the hantro-side write completed but the Mali-side read sees zeros — narrows to GPU-side cache invalidation gap.
7. **Panfrost dma_buf import doesn't perform GPU-side cache invalidation** when mapping an imported fd. Even if data has reached DRAM, Mali's MMU/cache may serve stale reads. Testable: search `pan_kmod_*_buf_import` / Mali kernel-mode driver for `dma_buf_attach` + `dma_buf_map_attachment` calls; look for missing cache-invalidate or missing DMA fence wait. (Sub-hypothesis of 6, but at a different layer.)
## Recommended next moves for iter1
a. **Write a small C harness that does VIDIOC_EXPBUF on a hantro CAPTURE buffer and reports fd size + backing dma_buf info.** Decides hypothesis 3 in 30 minutes. Run on ohm directly.
b. **Patch mpv with `MP_VERBOSE` logging of the AVDRMFrameDescriptor fields at .add()-call time** (nb_objects, planes[].object_index, planes[].offset, objects[].size). Confirms the source-read is correct at runtime. Drop into mpv-fourier's `prepare()` slot, bump pkgrel, rebuild on fermi (~10 min CI).
c. **Read KWin's wl_dmabuf import logic** (KDE Plasma 6 / KWin 6.6.4 source) for how it handles multiple-fd-same-buffer cases. ~30 min source-read.
d. **Update `marfrit/dmabuf-modifier-triage#1`** with this revised analysis. The current issue body claims the bug is in mpv's plane-semantics translation — that conclusion is now overturned.
## Status
- iter1 phase 2 closed 2026-05-08. Source-read of KWin 6.6.4 + Mesa 26.0.6 + arithmetic + the `lseek` probe rule out / reduce-confidence on the original hypothesis space:
- H2 (KWin wl_dmabuf import) — **innocent**. Both wl-protocol params validation and EGL import paths forward user-supplied offset verbatim to `eglCreateImage`. KWin's YUV→RGB shader (glshadermanager.cpp:189) reads `.r=Cb, .g=Cr` matching mesa's GR88→RG88_UNORM mapping.
- H3 (hantro size cap) — **innocent**. Runtime probe shows `lseek(SEEK_END)=3,657,728`, well past offset 2,088,960.
- H5 (offset mismatch with hantro NV12 stride) — **innocent**. Arithmetic shows Y/UV/trailing-metadata layout fits cleanly.
- H1 (panfrost EGL non-zero offset) — **less likely** after Mesa source-read. Linear path correctly captures `whandle->offset` and adds it to GPU base for sampling. Cannot be conclusively ruled out without runtime EGL probe but obvious places are clean.
- **New leading hypotheses: H6 (DMA cache coherency between hantro VPU and Mali GPU) and H7 (panfrost dma_buf import lacks GPU-side cache invalidation).** Derived from green-color math: BT.601 limited-range YUV(0,0,0) → RGB(0, 135, 0) is exactly the observed green tone — so panfrost is reading **zeros** despite hantro having written real data. This points to a synchronization/coherency gap, not an offset/format bug.
- Acceptance criterion (`screenshots/frame10_expected.png`) is unchanged.
- Delivery vehicle re-evaluation: `mpv-fourier-1:0.41.0-8` is the right vehicle ONLY for an mpv-side defensive workaround. Given the new hypotheses, the fix more likely lands in: (a) the kernel V4L2 vb2 layer (attach implicit fence on DQBUF — exactly the `vb2_dma_resv` RFC at lore.kernel.org), (b) panfrost's kernel-mode driver dma_buf import path (cache invalidate), or (c) a userspace `DMA_BUF_IOCTL_SYNC` workaround between mpv's DQBUF and wl-submit.
- Next probe options ranked by cost-to-decisiveness:
1. **Cache-sync ioctl test** (~30 min) — patch `mpv-fourier`'s `vo_dmabuf_wayland.c` to call `DMA_BUF_IOCTL_SYNC(DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ)` on each EXPBUF fd before submitting to KWin. If green goes away, H6/H7 confirmed (workaround viable; root-cause kernel bug separately).
2. **EGL importer harness** (~1-2h) — synthesize a known NV12 buffer in CPU memory, write to a `udmabuf`, EGL_image-import with PLANE0_OFFSET=2088960, render via fullscreen quad, glReadPixels. Decides H1 definitively. If reads return correct synthesized data, H1 ruled out cleanly; if zeros, H1 confirmed.
3. **Mesa debug log** (~15 min) — run mpv with `MESA_DEBUG=1 PAN_MESA_DEBUG=trace,bo` and inspect what panfrost sees for our buffers. Cheap but may not be conclusive.