7b54ff6c2d
Built mpv-fourier-1:0.41.0-9 with the DMA_BUF_IOCTL_SYNC(SYNC_START|
SYNC_RW) + SYNC_END(SYNC_RW) patch in both vaapi_dmabuf_importer and
drmprime_dmabuf_importer. Installed on ohm via [marfrit].
Test: mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause
--start=00:00:00.42 fourier-test/bbb_1080p30_h264.mp4
Result: spectacle screenshot md5 = c8c8e9b88521a0069f709d483451c3d4
— BYTE-IDENTICAL to the baseline green-frame screenshot.
Visual: same solid dark green ~ RGB(0, 77, 0) (BT.709
limited-range YUV(0,0,0) per the README math).
Userspace cache-sync ioctl has zero effect. H6 ruled out.
Phase 3 critical observation: --hwdec=v4l2request --vo=gpu (CPU-
mmap then glTexSubImage2D upload path) is known-working. So the
buffer DOES contain valid YUV data. Only the zero-copy dma_buf-
to-Mali path renders zeros. Concentrates hypothesis on the
dma_buf → Mali GPU import/translation step itself.
Live hypothesis space:
H2, H3, H5, H6 ruled out; H1 less likely (not excluded)
H4 latent (low conf)
H7 LEADING — panfrost dma_buf import / GPU-side cache or
BO-type / cache-attribute mismatch
Next probes:
1. Read panfrost kernel-mode dma_buf import path (~45 min)
2. EGL importer harness with synthetic udmabuf NV12 (~1-2h)
3. MESA_DEBUG verbose log (~15 min recon)
mpv-fourier-1:0.41.0-9 keeps the no-op patch (harmless). Future
iter close replaces or removes.
Posted to dmabuf-modifier-triage#1 comment 259.
# Phase 2 — iter1 source-read findings (REOPEN of root-cause analysis)

**Opened 2026-05-08** during the iter1 phase 2 source-read of mpv 0.41.0 + Kwiboo's ffmpeg fork at commit `b57fbbe`. Phase 0's earlier conclusion ("mpv mixes per-plane fds with single-allocation offset") needs revision. The source reads plus a runtime probe show the situation is more nuanced than the WAYLAND_DEBUG wire trace alone suggested.
## What the source actually says

**mpv `video/out/vo_dmabuf_wayland.c` `drmprime_dmabuf_importer` (lines 250-277)** straightforwardly relays the producer's `AVDRMFrameDescriptor`:
```c
for (plane_no = 0; plane_no < layer.nb_planes; ++plane_no) {
    AVDRMPlaneDescriptor plane = layer.planes[plane_no];
    int object_index = plane.object_index;
    AVDRMObjectDescriptor object = desc->objects[object_index];
    uint64_t modifier = object.format_modifier;
    zwp_linux_buffer_params_v1_add(params, object.fd, plane_no, plane.offset,
                                   plane.pitch, modifier >> 32, modifier & 0xffffffff);
}
```
No `dup()`, no rewriting, no transformation: mpv passes through exactly what `AVDRMFrameDescriptor` says.

**Kwiboo's `libavutil/hwcontext_v4l2request.c` `v4l2request_set_drm_descriptor` (lines 138-198)**, for hantro's single-planar NV12 (`V4L2_PIX_FMT_NV12`, the format `v4l2-ctl --get-fmt-video-mplane-cap` reports for `/dev/video1` on ohm):
```c
desc->base.nb_objects = num_planes;           // = 1 for single-planar NV12 on hantro
desc->base.objects[0].fd = exportbuffer.fd;   // VIDIOC_EXPBUF returns ONE fd

// in v4l2request_set_drm_descriptor:
desc->nb_layers = 1;
layer->nb_planes = 1;
layer->planes[0].object_index = 0;
layer->planes[0].offset = 0;
layer->planes[0].pitch = bytesperline;        // 1920
if (modifier != ARM_VENDOR) {                 // hantro outputs LINEAR (0x0), so this is true
    layer->nb_planes = 2;
    layer->planes[1].object_index = 0;        // ← BOTH PLANES point at object 0
    layer->planes[1].offset = pitch * height; // 1920 * 1088 = 2088960
    layer->planes[1].pitch = layer->planes[0].pitch;
}
```
Per the source, mpv should produce **identical** fd values in the two `.add()` calls — both pulling from `desc->objects[0].fd`.
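The two excerpts can be combined into a small self-contained model to check that claim. The structs below are hypothetical, trimmed-down stand-ins for libavutil's DRM descriptor types, not FFmpeg's real definitions. The recorded `add_call` stands in for `zwp_linux_buffer_params_v1_add()`. `fill_nv12_linear()` follows the Kwiboo excerpt for LINEAR NV12, and `relay()` follows mpv's loop.

```c
#include <stdint.h>

/* Hypothetical mirrors of the libavutil DRM descriptor types (single layer). */
typedef struct { int fd; uint64_t format_modifier; } drm_object;
typedef struct { int object_index; uint32_t offset; uint32_t pitch; } drm_plane;
typedef struct {
    int nb_objects;
    drm_object objects[4];
    int nb_planes;
    drm_plane planes[4];
} drm_frame_desc;

/* Hypothetical record of one zwp_linux_buffer_params_v1_add() call. */
typedef struct { int fd; int plane_idx; uint32_t offset; uint32_t pitch; } add_call;

/* Fill the descriptor the way the Kwiboo excerpt does for single-planar LINEAR NV12. */
static void fill_nv12_linear(drm_frame_desc *d, int expbuf_fd,
                             uint32_t bytesperline, uint32_t height)
{
    d->nb_objects = 1;                          /* one VIDIOC_EXPBUF fd */
    d->objects[0].fd = expbuf_fd;
    d->objects[0].format_modifier = 0;          /* DRM_FORMAT_MOD_LINEAR */
    d->nb_planes = 2;
    d->planes[0] = (drm_plane){ 0, 0, bytesperline };
    d->planes[1] = (drm_plane){ 0, bytesperline * height, bytesperline };
}

/* Relay plane by plane, like mpv's drmprime_dmabuf_importer loop. */
static int relay(const drm_frame_desc *d, add_call *out)
{
    for (int plane_no = 0; plane_no < d->nb_planes; ++plane_no) {
        const drm_plane *p = &d->planes[plane_no];
        const drm_object *o = &d->objects[p->object_index];
        out[plane_no] = (add_call){ o->fd, plane_no, p->offset, p->pitch };
    }
    return d->nb_planes;
}
```

With `fill_nv12_linear(&d, fd, 1920, 1088)` followed by `relay()`, both recorded calls carry the same fd. They differ only in plane index (0/1) and offset (0/2,088,960). Any fd-number difference therefore has to come from a later layer.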
## What the runtime probe says

`v4l2-ctl --get-fmt-video-mplane-cap` on ohm `/dev/video1`:
```
Pixel Format : 'NV12' (Y/UV 4:2:0)
Number of planes : 1
sizeimage=3655712, bytesperline=1920
```
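The `sizeimage` and `bytesperline` values above pin down the single-allocation NV12 layout. This is a minimal arithmetic sketch. It assumes even height and 4096-byte pages, which is the page size that the page-rounded EXPBUF size in hypothesis 3 implies. `nv12_layout()` and its struct are hypothetical helpers, not part of any library.

```c
#include <stdint.h>

/* Hypothetical helper type: byte layout of one single-allocation NV12 frame. */
typedef struct {
    uint64_t uv_offset;     /* plane 1 base = pitch * height (end of luma) */
    uint64_t nv12_end;      /* end of packed Y + CbCr = pitch * height * 3/2 */
    uint64_t tail;          /* sizeimage bytes past the CbCr plane */
    uint64_t rounded_size;  /* sizeimage rounded up to whole pages */
} nv12_layout_t;

/* Assumes even height and sizeimage >= the packed NV12 size. */
static nv12_layout_t nv12_layout(uint64_t pitch, uint64_t height,
                                 uint64_t sizeimage, uint64_t page)
{
    nv12_layout_t l;
    l.uv_offset = pitch * height;
    l.nv12_end = l.uv_offset + pitch * (height / 2);
    l.tail = sizeimage - l.nv12_end;
    l.rounded_size = (sizeimage + page - 1) / page * page;
    return l;
}
```

Plugging in pitch 1920, height 1088, sizeimage 3,655,712 and 4096-byte pages gives these values:

- a UV offset of 2,088,960 (exactly 510 pages)
- a packed NV12 end of 3,133,440
- a 522,272-byte tail
- a rounded size of 3,657,728

These are the same figures used further down in this note.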
`strace -e trace=ioctl mpv ...` confirms ffmpeg only does **one** `VIDIOC_EXPBUF` per CAPTURE buffer (`index=N, plane=0` → one fd), exactly matching `nb_objects = 1`.

But `WAYLAND_DEBUG=1 mpv ...` shows two `.add()` calls **with different fd numbers** per buffer:
```
add(fd 41, 0, 0, 1920, 0, 0)
add(fd 42, 1, 2088960, 1920, 0, 0)
```
These fd numbers are **consecutive**. That suggests libwayland's `wl_closure_marshal` duplicates the fd (`dup_cloexec`) at protocol-marshal time, and the trace prints the duplicated fd. Both fd 41 and fd 42 are dups of the same underlying `dma_buf` object (originally fd 17 or similar in mpv's table).
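To check the dup explanation directly, two fds held by the same process can be compared with `fstat()`. Equal `st_dev`/`st_ino` means they name the same object. This is a minimal sketch. It assumes the running kernel gives each `dma_buf` its own inode in the dma-buf pseudo-filesystem; recent kernels do, but this is worth confirming on ohm. `same_backing_object()` is a hypothetical helper. A debug build of KWin could, for example, call it on `dmabuf.fd[0]` and `dmabuf.fd[1]` after receiving the buffer.

```c
#include <sys/stat.h>

/* Hypothetical helper: 1 if both fds refer to the same underlying object
 * (same st_dev and st_ino), 0 if they do not, -1 if either fstat() fails.
 * For dma_buf fds this relies on each dma_buf having a unique inode. */
static int same_backing_object(int fd_a, int fd_b)
{
    struct stat a, b;

    if (fstat(fd_a, &a) != 0 || fstat(fd_b, &b) != 0)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A `dup()` of an fd always compares equal. Two independently allocated buffers do not.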
## Implications for iter1

The earlier phase 0 conclusion that mpv constructs an "internally inconsistent" wl_dmabuf message was **wrong**. There is no inconsistency at the producer ↔ mpv layer:

- nb_objects = 1 and both planes use object 0 → mpv passes the same fd value into both `.add()` calls
- libwayland dups it before sending → the wire trace shows different fd numbers, but they refer to the same backing memory
- Plane 1's offset = 2088960 is correct relative to the (single) underlying allocation

So the green frame is **not caused by mpv's or ffmpeg's descriptor construction**. The cause lies elsewhere.
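The observed colour says what the GPU actually sampled. Decoding all-zero samples (Y = Cb = Cr = 0) with the BT.709 limited-range equations lands on the dark green RGB(0, 77, 0) seen in the screenshot. This is the "green-color math" that hypothesis 6 was derived from. This is a minimal sketch, assuming standard BT.709 coefficients and round-half-up to 8 bits. `bt709_limited_to_rgb()` is a hypothetical helper and is not claimed to match KWin's shader bit-for-bit.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb8;

/* Scale a normalised value to 8 bits with round-half-up, clamping to 0..255. */
static uint8_t to_u8(double v)
{
    v = v * 255.0 + 0.5;
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return (uint8_t)v;
}

/* BT.709 limited-range ("TV range") YCbCr -> full-range RGB. */
static rgb8 bt709_limited_to_rgb(uint8_t y, uint8_t cb, uint8_t cr)
{
    double yn = (y - 16) / 219.0;     /* luma 16..235 -> 0..1 */
    double pb = (cb - 128) / 224.0;   /* chroma 16..240 -> -0.5..0.5 */
    double pr = (cr - 128) / 224.0;
    rgb8 out = {
        to_u8(yn + 1.5748 * pr),
        to_u8(yn - 0.1873 * pb - 0.4681 * pr),
        to_u8(yn + 1.8556 * pb),
    };
    return out;
}
```

With zeros in all three channels, R and B clamp to 0 and G works out to about 0.301 × 255 ≈ 77. Zeros in both planes are therefore enough to produce the observed colour. By contrast, legal black (16, 128, 128) and white (235, 128, 128) decode to (0, 0, 0) and (255, 255, 255).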
## New hypothesis space (one of these is the real bug)

1. **Mali-G52 panfrost EGL_EXT_image_dma_buf_import_modifiers regression for NV12 with non-zero plane offset.** **A 2026-05-08 source-read of Mesa 26.0.6 makes this LESS LIKELY.** Trace at `references/mesa-26.0.6/`:
   - `src/gallium/drivers/panfrost/pan_screen.c:443` reports `external_only[count] = is_yuv` for any YUV format → NV12+LINEAR is external_only, forcing KWin's per-plane import path (Y as R8, UV as DRM_FORMAT_GR88).
   - `src/loader/loader_dri_helper.c:43` maps `DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM`. The byte-order distinction is preserved at the pipe-format level (`.r` = byte 0 = Cb, `.g` = byte 1 = Cr), matching KWin's shader assumption at glshadermanager.cpp:189 `result.yz = sampler1.rg`.
   - `src/gallium/drivers/panfrost/pan_resource.c:354-358` captures `whandle->offset` into `explicit_layout.offset_B` for the import.
   - `src/panfrost/lib/pan_mod.c:663-667` (linear modifier slice-init) honors `layout_constraints->offset_B` directly, with only an alignment check. 2,088,960 is page-aligned, so it satisfies 16-, 64- and 4096-byte alignment alike.
   - `src/panfrost/lib/pan_texture.c:361,561,660,773,817` set the texture descriptor's GPU base to `plane->base + slayout->offset_B`, i.e., sampling reads from `bo_gpu + 2,088,960`.
   - **Conclusion**: the panfrost source as written DOES honor a non-zero plane offset. Source-read alone cannot rule out a runtime bug, but the obvious places are clean. To rule this in or out definitively, write the EGL importer harness with synthetic NV12 data.
2. ~~**KWin's wl_dmabuf import logic deduplicates the dup'd fds incorrectly.**~~ **RULED OUT 2026-05-08** by source-read of KWin 6.6.4 at `references/kwin-6.6.4/src/wayland/linuxdmabufv1clientbuffer.cpp` + `src/opengl/{eglbackend,egldisplay}.cpp`. (a) `LinuxDmaBufParamsV1::zwp_linux_buffer_params_v1_add` simply stores per-plane fd/offset/pitch in separate slots, no dedup. (b) `LinuxDmaBufParamsV1::test()` does `lseek(SEEK_END)` per plane + range checks against the resulting size; our 3,657,728 satisfies all of them. (c) `EglDisplay::importDmaBufAsImage` (both the combined and per-plane forms) passes `dmabuf.fd[i]`, `dmabuf.offset[i]`, `dmabuf.pitch[i]` straight to `eglCreateImage(EGL_LINUX_DMA_BUF_EXT, ...)` with no transformation. (d) `EglBackend::testImportBuffer` chooses between combined import and per-plane (Y as R8 / UV as RG88 from offset 2,088,960) based on whether NV12+LINEAR is in `nonExternalOnlySupportedDrmFormats()`. **Either path** forwards `offset = 2,088,960` to the driver. KWin is innocent.
3. ~~**hantro kernel driver exports a `dma_buf` with `size` < full allocation.**~~ **RULED OUT 2026-05-08** by `/tmp/expbuf_probe.c` on ohm. Driver `hantro-vpu` on `rk3568-vpu-dec` reports `CAPTURE: NV12 1920x1088 num_planes=1 sizeimage=3655712`. `VIDIOC_EXPBUF` yields an fd whose `lseek(fd, 0, SEEK_END) = 3,657,728` (page-rounded up from 3,655,712). Offset 2,088,960 (plane 1 base) is firmly inside the exported size. Kernel is innocent.

Side observation worth recording: `sizeimage = 3,655,712` is bigger than naïve NV12's 1920×1088×1.5 = 3,133,440. The 522,272-byte excess sits **past** the UV plane:

- Y at [0, 2,088,960)
- UV at [2,088,960, 3,133,440)
- trailing padding at [3,133,440, 3,655,712)

On Rockchip codecs that tail commonly holds per-frame motion-vector / decoder-context data. This confirms that ffmpeg's hardcoded `planes[1].offset = pitch*height = 2,088,960` is correct.

4. **kwin-fourier 0001 still has an effect we missed.** Even though we ruled out kwin-fourier as a compositor-replacement A/B, that test was on an earlier kernel/Mesa combo. Worth verifying that the test environment is fully reset.
6. ~~**DMA cache coherency between hantro VPU and Mali GPU** (NEW 2026-05-08, derived from green-color math).~~ **RULED OUT 2026-05-08 phase 3** by the iter1 patch test.
   - Patched mpv 0.41.0 (mpv-fourier-1:0.41.0-9) to call `DMA_BUF_IOCTL_SYNC(SYNC_START|SYNC_RW)` + matching `SYNC_END` on each EXPBUF fd. The calls sit in both `vaapi_dmabuf_importer` and `drmprime_dmabuf_importer`, before `zwp_linux_buffer_params_v1_add()`.
   - Built via Gitea Actions and installed on ohm. Ran `mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause --start=00:00:00.42 fourier-test/bbb_1080p30_h264.mp4` and captured a screenshot.
   - **Result: byte-identical to baseline (md5 c8c8e9b88521a0069f709d483451c3d4).** The userspace cache-sync ioctl has no effect.
   - Two readings remain. Either hantro's `dma_buf_ops->begin_cpu_access` is a no-op (likely on Rockchip: many dma-buf-heap allocations are non-coherent and CPU-cached but rely on different sync paths), OR the gap is on the GPU consumer side and CPU-cache state is irrelevant.

**Critical phase-3 observation**: `--hwdec=v4l2request --vo=gpu` (texture-upload path) is known-working and renders correctly. That path mmaps the dma_buf into mpv's address space, then uploads to a GL texture via `glTexSubImage2D`. **The CPU CAN read valid YUV data from the buffer**; only the zero-copy dma_buf-to-GPU import path renders zeros. This rules out "data isn't there" entirely and concentrates the hypothesis on the dma_buf → Mali GPU import/translation step itself (H7 territory).

7. **Panfrost dma_buf import doesn't perform GPU-side cache invalidation OR doesn't import with the right BO type** when mapping an imported fd. Even if data has reached DRAM, Mali's MMU/cache may serve stale reads. Alternatively, the imported BO is created without the `DMA-COHERENT` flag panfrost expects, leaving Mali sampling un-snooped memory. Phase 3 narrowed the field: this is now the LEADING hypothesis. Sub-cases to investigate:
   - 7a. Panfrost's `dma_buf_attach` + `dma_buf_map_attachment` calls miss a cache invalidate. Source to read: `drivers/gpu/drm/panfrost/panfrost_gem.c` (`panfrost_gem_prime_import_sg_table`) and `panfrost_gem_shrinker.c`.
   - 7b. The imported BO is mapped non-coherent in Mali's MMU, but the buffer was allocated cacheable (or vice versa). Sync mismatch.
   - 7c. Panfrost uses ioremap_wc / ioremap_cache the wrong way for hantro-allocated CMA pages.
   - 7d. The Mali-G52 panfrost path does NOT support imported dma_buf for sampling at all, only for direct scanout pass-through. (Less likely: that would mean GL texture creation should fail, but in our case it succeeds and renders zeros.)
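The H6 patch in item 6 brackets each exported fd with a `DMA_BUF_IOCTL_SYNC` start/end pair. This is a minimal sketch of that bracket using the uapi in `<linux/dma-buf.h>`. `dmabuf_sync()` and `dmabuf_sync_bracket()` are hypothetical helpers, not the actual mpv-fourier patch. Retrying on `EINTR`/`EAGAIN` is an assumption borrowed from common userspace practice. The sketch only reproduces the ioctl sequence, which did not fix anything here.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue one DMA_BUF_IOCTL_SYNC, retrying if interrupted. Returns 0 or -errno. */
static int dmabuf_sync(int fd, uint64_t flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
    return ret == -1 ? -errno : 0;
}

/* Hypothetical helper mirroring the H6 patch: START|RW then END|RW on one fd. */
static int dmabuf_sync_bracket(int fd)
{
    int ret = dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW);

    if (ret != 0)
        return ret;
    return dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW);
}
```

On a non-dma_buf fd the ioctl fails with `ENOTTY`. That is a cheap way to confirm the patched code path really receives dma_buf fds. On a real EXPBUF fd, the exporter's CPU-access hooks run between START and END.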
## Recommended next moves for iter1

a. **Write a small C harness that does VIDIOC_EXPBUF on a hantro CAPTURE buffer and reports fd size + backing dma_buf info.** Decides hypothesis 3 in 30 minutes. Run on ohm directly.

b. **Patch mpv with `MP_VERBOSE` logging of the AVDRMFrameDescriptor fields at .add()-call time** (nb_objects, planes[].object_index, planes[].offset, objects[].size). Confirms the source-read is correct at runtime. Drop into mpv-fourier's `prepare()` slot, bump pkgrel, rebuild on fermi (~10 min CI).

c. **Read KWin's wl_dmabuf import logic** (KDE Plasma 6 / KWin 6.6.4 source) for how it handles multiple-fd-same-buffer cases. ~30 min source-read.

d. **Update `marfrit/dmabuf-modifier-triage#1`** with this revised analysis. The current issue body claims the bug is in mpv's plane-semantics translation; that conclusion is now overturned.
## Status

- iter1 phase 3 closed 2026-05-08. The DMA_BUF_IOCTL_SYNC patch (mpv-fourier-1:0.41.0-9, both vaapi_dmabuf_importer + drmprime_dmabuf_importer) had **zero effect**: the green-frame screenshot is byte-identical to baseline. **H6 ruled out.**
- Five hypotheses ruled out (H2, H3, H5, H6, the ad-hoc offset variant). H1 is less likely after the Mesa source-read but not conclusively excluded.
- **Leading hypothesis: H7** (panfrost's dma_buf import / GPU-side cache or BO-type handling). It is pinned by the *known-good* counter-test: `mpv --hwdec=v4l2request --vo=gpu` (CPU-mmap → glTexSubImage2D upload path) renders correctly. So the buffer DOES contain valid data; only the zero-copy dma_buf→Mali path renders zeros.
- Acceptance criterion (`screenshots/frame10_expected.png`) is unchanged.
- Delivery vehicle re-evaluated again: with H6 gone, the userspace mpv workaround is no longer the right delivery vehicle for this iteration. The fix lands in one of:
  - (a) the panfrost kernel-mode driver (`drivers/gpu/drm/panfrost/`)
  - (b) Mesa-panfrost userspace, if there's an EGL_image attribute / format-import quirk
  - (c) hantro driver-side allocation flags (V4L2_MEMORY_DMABUF + appropriate cache attribute)
  - (d) a kernel bridge (e.g., DMA_BUF_IOCTL_SET_NAME with a cache-aware variant)
- Next probe options ranked:
  1. **Read panfrost kernel-mode dma_buf import** (~45 min, cheap source-read, no hardware): inspect `drivers/gpu/drm/panfrost/panfrost_gem.c` `panfrost_gem_prime_import_sg_table` and the Mali MMU mapping for cache attributes / IO-coherency settings. May spot the gap directly.
  2. **EGL importer harness with synthetic NV12 in udmabuf** (~1-2h): allocate via udmabuf (CPU-coherent), write a known YUV pattern, eglCreateImage from the udmabuf, render, glReadPixels.
     - If it reads back correct data → the bug is specific to hantro-allocated buffers (cache-attribute mismatch).
     - If it ALSO reads zeros → general panfrost dma_buf import bug (less likely).
  3. **Run mpv with `MESA_DEBUG=verbose` + `PAN_MESA_DEBUG=sync,trace`** (~15 min): may show something at the import boundary. Cheap recon.

mpv-fourier-1:0.41.0-9 keeps the no-op patch installed for now; it's harmless. A future iter close (iter2 or a further phase under iter1) will replace it with whatever the actual fix is. OR, if the fix lands elsewhere (kernel/Mesa), a pkgrel bump will remove the dead patch.
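The first half of probe 2 is a CPU-written NV12 buffer exported as a dma_buf, and it could start from a sketch like this. It assumes the kernel exposes `/dev/udmabuf` (CONFIG_UDMABUF) and uses the uapi in `<linux/udmabuf.h>`. `fill_nv12_pattern()` and `make_nv12_udmabuf()` are hypothetical helpers, and the chroma values 90/240 are arbitrary. The EGL/GL half is not shown. The returned fd would go to `eglCreateImage` with plane offsets 0 and `pitch * height`, the way KWin imports the hantro buffer.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/* Hypothetical helper: recognisable NV12 pattern (width >= 2, even height).
 * Luma is a horizontal ramp over 16..235; chroma is a constant Cb/Cr pair,
 * so a correct import should never read back as all-zero samples. */
static void fill_nv12_pattern(uint8_t *base, uint32_t width, uint32_t height,
                              uint32_t pitch, uint8_t cb, uint8_t cr)
{
    for (uint32_t y = 0; y < height; ++y)
        for (uint32_t x = 0; x < width; ++x)
            base[(size_t)y * pitch + x] = (uint8_t)(16 + (x * 219u) / (width - 1));

    uint8_t *uv = base + (size_t)pitch * height;     /* plane 1 offset */
    for (uint32_t y = 0; y < height / 2; ++y)
        for (uint32_t x = 0; x + 1 < width; x += 2) {
            uv[(size_t)y * pitch + x] = cb;          /* byte 0 = Cb */
            uv[(size_t)y * pitch + x + 1] = cr;      /* byte 1 = Cr */
        }
}

/* Hypothetical helper: memfd -> pattern -> udmabuf. Returns a dma_buf fd or -1. */
static int make_nv12_udmabuf(uint32_t width, uint32_t height, uint32_t pitch)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = ((size_t)pitch * height * 3 / 2 + page - 1) / page * page;
    int dmabuf = -1, dev;
    uint8_t *map;

    int memfd = memfd_create("nv12-pattern", MFD_ALLOW_SEALING);
    if (memfd < 0)
        return -1;
    if (ftruncate(memfd, (off_t)size) != 0 ||
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) != 0)   /* udmabuf requires this seal */
        goto out;

    map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, 0);
    if (map == MAP_FAILED)
        goto out;
    fill_nv12_pattern(map, width, height, pitch, 90, 240);
    munmap(map, size);

    dev = open("/dev/udmabuf", O_RDWR | O_CLOEXEC);
    if (dev >= 0) {
        struct udmabuf_create create = {
            .memfd = (uint32_t)memfd,
            .flags = UDMABUF_FLAGS_CLOEXEC,
            .offset = 0,
            .size = size,
        };
        dmabuf = ioctl(dev, UDMABUF_CREATE, &create);   /* new dma_buf fd, or -1 */
        close(dev);
    }
out:
    close(memfd);
    return dmabuf;
}
```

Keeping the pattern non-zero in both planes means the two outcomes of probe 2 are easy to tell apart from the screenshot. A dark-green readback points back at the import path; a ramp does not.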