d26d662c04
Shallow-cloned Mesa 26.0.6 (matches ohm's installed mesa+vulkan-panfrost)
and traced the per-plane EGL import path through panfrost.
Findings:
(a) pan_screen.c:443 — external_only=is_yuv → NV12+LINEAR forces
KWin's per-plane path (Y as R8, UV as DRM_FORMAT_GR88).
(b) loader_dri_helper.c:43 — DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM.
Sampling: .r=byte 0=Cb, .g=byte 1=Cr. Matches KWin's shader.
(c) KWin shader (glshadermanager.cpp:189): vec4(Y, .r, .g, 1) then
yuvToRgb*. So result.y=U, result.z=V. Math is consistent.
(d) pan_resource.c:354-358 captures whandle->offset → explicit_layout.
pan_mod.c:663-667 honors offset_B with only alignment check.
pan_texture.c:361 etc. set texture base = plane->base + offset_B.
Source code is clean. H1 (panfrost offset bug) demoted to LESS-LIKELY.
Cannot be conclusively ruled out without runtime EGL probe.
Green-color math points elsewhere:
BT.601 limited-range YUV(0,0,0)
→ R = 1.164*(-16) + 1.596*(-128) = -223 → clamp 0
→ G = 1.164*(-16) - 0.391*(-128) - 0.813*(-128) = +135
→ B = 1.164*(-16) + 2.018*(-128) = -277 → clamp 0
= RGB(0, 135, 0) — EXACTLY the observed green tone.
Conclusion: panfrost is reading ZERO-FILL bytes despite hantro
writing real data. Not a format/offset bug — a cache coherency
or synchronization bug.
New leading hypotheses:
H6 — DMA cache coherency between hantro VPU and Mali GPU. V4L2 does
NOT attach implicit fences on DQBUF for CAPTURE buffers (the exact
gap our vb2_dma_resv RFC v2 addresses upstream). Mali starts sampling
before hantro's writes flush to coherent DRAM.
H7 — Panfrost dma_buf import lacks GPU-side cache invalidation at
attach/map time. Mali MMU/cache serves stale (zero) reads.
Next probe options ranked:
1. Patch mpv-fourier to issue DMA_BUF_IOCTL_SYNC on EXPBUF fds before
wl-submit. Cheap (~30 min), decisive on H6/H7, doubles as workaround.
2. EGL importer harness with synthetic NV12 (~1-2h), decides H1.
3. MESA_DEBUG=1 PAN_MESA_DEBUG=trace,bo log (~15 min, may not decide).
Leaning toward option 1.
Posted to dmabuf-modifier-triage#1 comment 257.
120 lines
12 KiB
Markdown
120 lines
12 KiB
Markdown
# Phase 2 — iter1 source-read findings (REOPEN of root-cause analysis)
|
||
|
||
**Opened 2026-05-08** during the iter1 phase 2 source-read of mpv 0.41.0 + Kwiboo's ffmpeg fork at commit `b57fbbe`. Phase 0's earlier conclusion ("mpv mixes per-plane fds with single-allocation offset") needs revision — the source code reads + runtime probe show the situation is more nuanced than the WAYLAND_DEBUG wire trace alone suggested.
|
||
|
||
## What the source actually says
|
||
|
||
**mpv `video/out/vo_dmabuf_wayland.c` `drmprime_dmabuf_importer` (lines 250-277)** straightforwardly relays the producer's `AVDRMFrameDescriptor`:
|
||
|
||
```c
|
||
for (plane_no = 0; plane_no < layer.nb_planes; ++plane_no) {
|
||
AVDRMPlaneDescriptor plane = layer.planes[plane_no];
|
||
int object_index = plane.object_index;
|
||
AVDRMObjectDescriptor object = desc->objects[object_index];
|
||
uint64_t modifier = object.format_modifier;
|
||
zwp_linux_buffer_params_v1_add(params, object.fd, plane_no, plane.offset,
|
||
plane.pitch, modifier >> 32, modifier & 0xffffffff);
|
||
}
|
||
```
|
||
|
||
No `dup()`, no rewriting, no transformation. mpv passes through what `AVDRMFrameDescriptor` says.
|
||
|
||
**Kwiboo's `libavutil/hwcontext_v4l2request.c` `v4l2request_set_drm_descriptor` (lines 138-198)** for hantro's NV12 single-planar (V4L2_PIX_FMT_NV12, the format `v4l2-ctl --get-fmt-video-mplane-cap` reports for `/dev/video1` on ohm):
|
||
|
||
```c
|
||
desc->base.nb_objects = num_planes; // = 1 for single-planar NV12 on hantro
|
||
desc->base.objects[0].fd = exportbuffer.fd; // VIDIOC_EXPBUF returns ONE fd
|
||
// in v4l2request_set_drm_descriptor:
|
||
desc->nb_layers = 1;
|
||
layer->nb_planes = 1;
|
||
layer->planes[0].object_index = 0;
|
||
layer->planes[0].offset = 0;
|
||
layer->planes[0].pitch = bytesperline; // 1920
|
||
if (modifier != ARM_VENDOR) { // hantro outputs LINEAR (0x0), so this is true
|
||
layer->nb_planes = 2;
|
||
layer->planes[1].object_index = 0; // ← BOTH PLANES point at object 0
|
||
layer->planes[1].offset = pitch * height; // 1920 * 1088 = 2088960
|
||
layer->planes[1].pitch = layer->planes[0].pitch;
|
||
}
|
||
```
|
||
|
||
Per the source, mpv should produce **identical** fd values in the two `.add()` calls — both pulling from `desc->objects[0].fd`.
|
||
|
||
## What the runtime probe says
|
||
|
||
`v4l2-ctl --get-fmt-video-mplane-cap` on ohm `/dev/video1`:
|
||
|
||
```
|
||
Pixel Format : 'NV12' (Y/UV 4:2:0)
|
||
Number of planes : 1
|
||
sizeimage=3655712, bytesperline=1920
|
||
```
|
||
|
||
`strace -e trace=ioctl mpv ...` confirms ffmpeg only does **one** `VIDIOC_EXPBUF` per CAPTURE buffer (`index=N, plane=0` → one fd), exactly matching `nb_objects = 1`.
|
||
|
||
But `WAYLAND_DEBUG=1 mpv ...` shows two `.add()` calls **with different fd numbers** per buffer:
|
||
|
||
```
|
||
add(fd 41, 0, 0, 1920, 0, 0)
|
||
add(fd 42, 1, 2088960, 1920, 0, 0)
|
||
```
|
||
|
||
These fd numbers are **consecutive**, suggesting libwayland's `wl_closure_marshal` is `dup_cloexec`'ing the fd at protocol-marshal time and the trace prints the post-dup fd. Both fd 41 and fd 42 are dups of the same underlying `dma_buf` object (originally fd 17 or similar in mpv's table).
|
||
|
||
## Implications for iter1
|
||
|
||
The earlier phase 0 conclusion that mpv constructs an "internally inconsistent" wl_dmabuf message was **wrong**. There is no inconsistency at the producer ↔ mpv layer:
|
||
|
||
- nb_objects = 1, both planes use object 0 → mpv passes the same fd value into both `.add()` calls
|
||
- libwayland dups it before sending → wire trace shows different fd numbers, but they refer to the same backing memory
|
||
- Plane 1's offset = 2088960 is correct relative to the (single) underlying allocation
|
||
|
||
So the green frame is **not caused by mpv or ffmpeg's descriptor construction**. Something else.
|
||
|
||
## New hypothesis space (one of these is the real bug)
|
||
|
||
1. **Mali-G52 panfrost EGL_EXT_image_dma_buf_import_modifiers regression for NV12 with non-zero plane offset.** **Source-read 2026-05-08 of Mesa 26.0.6 makes this LESS LIKELY.** Trace at `references/mesa-26.0.6/`:
|
||
- `src/gallium/drivers/panfrost/pan_screen.c:443` reports `external_only[count] = is_yuv` for any YUV format → NV12+LINEAR is external_only, forcing KWin's per-plane import path (Y as R8, UV as DRM_FORMAT_GR88).
|
||
- `src/loader/loader_dri_helper.c:43` maps `DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM` (the byte-order distinction is preserved at the pipe-format level — `.r` = byte 0 = Cb, `.g` = byte 1 = Cr — matching KWin's shader assumption at glshadermanager.cpp:189 `result.yz = sampler1.rg`).
|
||
- `src/gallium/drivers/panfrost/pan_resource.c:354-358` captures `whandle->offset` into `explicit_layout.offset_B` for the import.
|
||
- `src/panfrost/lib/pan_mod.c:663-667` (linear modifier slice-init) honors `layout_constraints->offset_B` directly with only an alignment check; 2,088,960 is page-aligned, satisfies 16-/64-/4096-byte alignments alike.
|
||
- `src/panfrost/lib/pan_texture.c:361,561,660,773,817` set the texture descriptor's GPU base to `plane->base + slayout->offset_B` — i.e., sampling reads from `bo_gpu + 2,088,960`.
|
||
- **Conclusion**: panfrost source code as written DOES honor non-zero plane offset. Source-read alone cannot rule out runtime bug — but the obvious places are clean. To definitively rule in/out, write the EGL importer harness with synthetic NV12 data.
|
||
|
||
2. ~~**KWin's wl_dmabuf import logic deduplicates the dup'd fds incorrectly.**~~ **RULED OUT 2026-05-08** by source-read of KWin 6.6.4 at `references/kwin-6.6.4/src/wayland/linuxdmabufv1clientbuffer.cpp` + `src/opengl/{eglbackend,egldisplay}.cpp`. (a) `LinuxDmaBufParamsV1::zwp_linux_buffer_params_v1_add` simply stores per-plane fd/offset/pitch in separate slots, no dedup. (b) `LinuxDmaBufParamsV1::test()` does `lseek(SEEK_END)` per plane + range checks against the resulting size; our 3,657,728 satisfies all of them. (c) `EglDisplay::importDmaBufAsImage` (both the combined and per-plane forms) passes `dmabuf.fd[i]`, `dmabuf.offset[i]`, `dmabuf.pitch[i]` straight to `eglCreateImage(EGL_LINUX_DMA_BUF_EXT, ...)` with no transformation. (d) `EglBackend::testImportBuffer` chooses between combined import and per-plane (Y as R8 / UV as RG88 from offset 2,088,960) based on whether NV12+LINEAR is in `nonExternalOnlySupportedDrmFormats()`. **Either path** forwards `offset = 2,088,960` to the driver. KWin is innocent.
|
||
|
||
3. ~~**hantro kernel driver exports a `dma_buf` with `size` < full allocation.**~~ **RULED OUT 2026-05-08** by `/tmp/expbuf_probe.c` on ohm. Driver `hantro-vpu` on `rk3568-vpu-dec` reports `CAPTURE: NV12 1920x1088 num_planes=1 sizeimage=3655712`; `VIDIOC_EXPBUF` yields fd whose `lseek(fd, 0, SEEK_END) = 3,657,728` (page-rounded up from 3,655,712). Offset 2,088,960 (plane 1 base) is firmly inside the exported size. Kernel is innocent.
|
||
|
||
Side observation worth recording: `sizeimage = 3,655,712` is bigger than naïve NV12's 1920×1088×1.5 = 3,133,440. The 522,272-byte excess sits **past** the UV plane (Y at [0, 2,088,960), UV at [2,088,960, 3,133,440), trailing padding at [3,133,440, 3,655,712)). On Rockchip codecs that tail commonly holds per-frame motion-vector / decoder-context data. Confirms ffmpeg's hardcoded `planes[1].offset = pitch*height = 2,088,960` is correct.
|
||
|
||
4. **kwin-fourier 0001 still has effect we missed.** Even though we ruled out kwin-fourier as a compositor-replacement A/B, that test was on an earlier kernel/Mesa combo. Worth verifying the test environment is fully reset.
|
||
|
||
6. **DMA cache coherency between hantro VPU and Mali GPU** (NEW 2026-05-08, derived from green-color math). BT.601 limited-range YUV(0,0,0) → RGB(0, 135, 0) — exactly the green tone we see. This means *panfrost is reading zero-fill bytes despite hantro having written valid YUV data*. If the dma_buf hasn't been cache-flushed/invalidated correctly between writer (hantro) and reader (Mali), Mali samples stale zeroed memory. V4L2 does NOT attach implicit fences to CAPTURE buffers on dequeue — this gap is exactly what our `vb2_dma_resv` RFC v2 addresses for upstream (see `project_vb2_dma_resv_v2_state.md`). Testable on ohm: between mpv's VIDIOC_DQBUF and the wl_dmabuf submit, mmap the EXPBUF fd from a sibling process via `pidfd_getfd`, force CPU cache sync via `DMA_BUF_IOCTL_SYNC` with `DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ`, read back; if it shows real Y data, the hantro-side write completed but the Mali-side read sees zeros — narrows to GPU-side cache invalidation gap.
|
||
|
||
7. **Panfrost dma_buf import doesn't perform GPU-side cache invalidation** when mapping an imported fd. Even if data has reached DRAM, Mali's MMU/cache may serve stale reads. Testable: search `pan_kmod_*_buf_import` / Mali kernel-mode driver for `dma_buf_attach` + `dma_buf_map_attachment` calls; look for missing cache-invalidate or missing DMA fence wait. (Sub-hypothesis of 6, but at a different layer.)
|
||
|
||
## Recommended next moves for iter1
|
||
|
||
a. **Write a small C harness that does VIDIOC_EXPBUF on a hantro CAPTURE buffer and reports fd size + backing dma_buf info.** Decides hypothesis 3 in 30 minutes. Run on ohm directly.
|
||
|
||
b. **Patch mpv with `MP_VERBOSE` logging of the AVDRMFrameDescriptor fields at .add()-call time** (nb_objects, planes[].object_index, planes[].offset, objects[].size). Confirms the source-read is correct at runtime. Drop into mpv-fourier's `prepare()` slot, bump pkgrel, rebuild on fermi (~10 min CI).
|
||
|
||
c. **Read KWin's wl_dmabuf import logic** (KDE Plasma 6 / KWin 6.6.4 source) for how it handles multiple-fd-same-buffer cases. ~30 min source-read.
|
||
|
||
d. **Update `marfrit/dmabuf-modifier-triage#1`** with this revised analysis. The current issue body claims the bug is in mpv's plane-semantics translation — that conclusion is now overturned.
|
||
|
||
## Status
|
||
|
||
- iter1 phase 2 closed 2026-05-08. Source-read of KWin 6.6.4 + Mesa 26.0.6 + arithmetic + the `lseek` probe rule out / reduce-confidence on the original hypothesis space:
|
||
- H2 (KWin wl_dmabuf import) — **innocent**. Both wl-protocol params validation and EGL import paths forward user-supplied offset verbatim to `eglCreateImage`. KWin's YUV→RGB shader (glshadermanager.cpp:189) reads `.r=Cb, .g=Cr` matching mesa's GR88→RG88_UNORM mapping.
|
||
- H3 (hantro size cap) — **innocent**. Runtime probe shows `lseek(SEEK_END)=3,657,728`, well past offset 2,088,960.
|
||
- H5 (offset mismatch with hantro NV12 stride) — **innocent**. Arithmetic shows Y/UV/trailing-metadata layout fits cleanly.
|
||
- H1 (panfrost EGL non-zero offset) — **less likely** after Mesa source-read. Linear path correctly captures `whandle->offset` and adds it to GPU base for sampling. Cannot be conclusively ruled out without runtime EGL probe but obvious places are clean.
|
||
- **New leading hypotheses: H6 (DMA cache coherency between hantro VPU and Mali GPU) and H7 (panfrost dma_buf import lacks GPU-side cache invalidation).** Derived from green-color math: BT.601 limited-range YUV(0,0,0) → RGB(0, 135, 0) is exactly the observed green tone — so panfrost is reading **zeros** despite hantro having written real data. This points to a synchronization/coherency gap, not an offset/format bug.
|
||
- Acceptance criterion (`screenshots/frame10_expected.png`) is unchanged.
|
||
- Delivery vehicle re-evaluation: `mpv-fourier-1:0.41.0-8` is the right vehicle ONLY for an mpv-side defensive workaround. Given the new hypotheses, the fix more likely lands in: (a) the kernel V4L2 vb2 layer (attach implicit fence on DQBUF — exactly the `vb2_dma_resv` RFC at lore.kernel.org), (b) panfrost's kernel-mode driver dma_buf import path (cache invalidate), or (c) a userspace `DMA_BUF_IOCTL_SYNC` workaround between mpv's DQBUF and wl-submit.
|
||
- Next probe options ranked by cost-to-decisiveness:
|
||
1. **Cache-sync ioctl test** (~30 min) — patch `mpv-fourier`'s `vo_dmabuf_wayland.c` to call `DMA_BUF_IOCTL_SYNC(DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ)` on each EXPBUF fd before submitting to KWin. If green goes away, H6/H7 confirmed (workaround viable; root-cause kernel bug separately).
|
||
2. **EGL importer harness** (~1-2h) — synthesize a known NV12 buffer in CPU memory, write to a `udmabuf`, EGL_image-import with PLANE0_OFFSET=2088960, render via fullscreen quad, glReadPixels. Decides H1 definitively. If reads return correct synthesized data, H1 ruled out cleanly; if zeros, H1 confirmed.
|
||
3. **Mesa debug log** (~15 min) — run mpv with `MESA_DEBUG=1 PAN_MESA_DEBUG=trace,bo` and inspect what panfrost sees for our buffers. Cheap but may not be conclusive.
|