# Phase 2: iter1 source-read findings (REOPEN of root-cause analysis)

**Opened 2026-05-08** during the iter1 phase 2 source-read of mpv 0.41.0 + Kwiboo's ffmpeg fork at commit `b57fbbe`. Phase 0's earlier conclusion ("mpv mixes per-plane fds with single-allocation offset") needs revision: the source-code reads plus a runtime probe show the situation is more nuanced than the WAYLAND_DEBUG wire trace alone suggested.

## What the source actually says

**mpv `video/out/vo_dmabuf_wayland.c` `drmprime_dmabuf_importer` (lines 250-277)** straightforwardly relays the producer's `AVDRMFrameDescriptor`:

```c
for (plane_no = 0; plane_no < layer.nb_planes; ++plane_no) {
    AVDRMPlaneDescriptor plane = layer.planes[plane_no];
    int object_index = plane.object_index;
    AVDRMObjectDescriptor object = desc->objects[object_index];
    uint64_t modifier = object.format_modifier;
    zwp_linux_buffer_params_v1_add(params, object.fd, plane_no, plane.offset,
                                   plane.pitch, modifier >> 32, modifier & 0xffffffff);
}
```

No `dup()`, no rewriting, no transformation: mpv passes through exactly what `AVDRMFrameDescriptor` says.
**Kwiboo's `libavutil/hwcontext_v4l2request.c` `v4l2request_set_drm_descriptor` (lines 138-198)** for hantro's NV12 single-planar (`V4L2_PIX_FMT_NV12`, the format `v4l2-ctl --get-fmt-video-mplane-cap` reports for `/dev/video1` on ohm):

```c
desc->base.nb_objects = num_planes;            // = 1 for single-planar NV12 on hantro
desc->base.objects[0].fd = exportbuffer.fd;    // VIDIOC_EXPBUF returns ONE fd

// in v4l2request_set_drm_descriptor:
desc->nb_layers = 1;
layer->nb_planes = 1;
layer->planes[0].object_index = 0;
layer->planes[0].offset = 0;
layer->planes[0].pitch = bytesperline;         // 1920

if (modifier != ARM_VENDOR) {                  // hantro outputs LINEAR (0x0), so this is true
    layer->nb_planes = 2;
    layer->planes[1].object_index = 0;         // ← BOTH PLANES point at object 0
    layer->planes[1].offset = pitch * height;  // 1920 * 1088 = 2088960
    layer->planes[1].pitch = layer->planes[0].pitch;
}
```

Per the source, mpv should produce **identical** fd values in the two `.add()` calls, both pulling from `desc->objects[0].fd`.

## What the runtime probe says

`v4l2-ctl --get-fmt-video-mplane-cap` on ohm `/dev/video1`:

```
Pixel Format : 'NV12' (Y/UV 4:2:0)
Number of planes : 1
sizeimage=3655712, bytesperline=1920
```

`strace -e trace=ioctl mpv ...` confirms ffmpeg does only **one** `VIDIOC_EXPBUF` per CAPTURE buffer (`index=N, plane=0` → one fd), exactly matching `nb_objects = 1`.

But `WAYLAND_DEBUG=1 mpv ...` shows two `.add()` calls **with different fd numbers** per buffer:

```
add(fd 41, 0, 0, 1920, 0, 0)
add(fd 42, 1, 2088960, 1920, 0, 0)
```

These fd numbers are **consecutive**, suggesting libwayland's `wl_closure_marshal` is `dup_cloexec`'ing the fd at protocol-marshal time and the trace prints the post-dup fd. Both fd 41 and fd 42 are dups of the same underlying `dma_buf` object (originally fd 17 or similar in mpv's table).

## Implications for iter1

The earlier phase 0 conclusion that mpv constructs an "internally inconsistent" wl_dmabuf message was **wrong**.
There is no inconsistency at the producer ↔ mpv layer:

- nb_objects = 1 and both planes use object 0 → mpv passes the same fd value into both `.add()` calls.
- libwayland dups it before sending → the wire trace shows different fd numbers, but they refer to the same backing memory.
- Plane 1's offset = 2088960 is correct relative to the (single) underlying allocation.

So the green frame is **not caused by mpv or ffmpeg's descriptor construction**. Something else.

## New hypothesis space (one of these is the real bug)

1. **Mali-G52 panfrost EGL_EXT_image_dma_buf_import_modifiers regression for NV12 with non-zero plane offset.** **Source-read 2026-05-08 of Mesa 26.0.6 makes this LESS LIKELY.** Trace at `references/mesa-26.0.6/`:
   - `src/gallium/drivers/panfrost/pan_screen.c:443` reports `external_only[count] = is_yuv` for any YUV format → NV12+LINEAR is external_only, forcing KWin's per-plane import path (Y as R8, UV as DRM_FORMAT_GR88).
   - `src/loader/loader_dri_helper.c:43` maps `DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM` (the byte-order distinction is preserved at the pipe-format level: `.r` = byte 0 = Cb, `.g` = byte 1 = Cr, matching KWin's shader assumption at `glshadermanager.cpp:189` `result.yz = sampler1.rg`).
   - `src/gallium/drivers/panfrost/pan_resource.c:354-358` captures `whandle->offset` into `explicit_layout.offset_B` for the import.
   - `src/panfrost/lib/pan_mod.c:663-667` (linear modifier slice-init) honors `layout_constraints->offset_B` directly with only an alignment check; 2,088,960 is page-aligned, so it satisfies 16-, 64- and 4096-byte alignment alike.
   - `src/panfrost/lib/pan_texture.c:361,561,660,773,817` set the texture descriptor's GPU base to `plane->base + slayout->offset_B`, i.e., sampling reads from `bo_gpu + 2,088,960`.
   - **Conclusion**: panfrost source code as written DOES honor non-zero plane offset. Source-read alone cannot rule out a runtime bug, but the obvious places are clean.
     To definitively rule in/out, write the EGL importer harness with synthetic NV12 data.

2. ~~**KWin's wl_dmabuf import logic deduplicates the dup'd fds incorrectly.**~~ **RULED OUT 2026-05-08** by source-read of KWin 6.6.4 at `references/kwin-6.6.4/src/wayland/linuxdmabufv1clientbuffer.cpp` + `src/opengl/{eglbackend,egldisplay}.cpp`.
   - (a) `LinuxDmaBufParamsV1::zwp_linux_buffer_params_v1_add` simply stores per-plane fd/offset/pitch in separate slots; no dedup.
   - (b) `LinuxDmaBufParamsV1::test()` does `lseek(SEEK_END)` per plane plus range checks against the resulting size; our 3,657,728 satisfies all of them.
   - (c) `EglDisplay::importDmaBufAsImage` (both the combined and per-plane forms) passes `dmabuf.fd[i]`, `dmabuf.offset[i]`, `dmabuf.pitch[i]` straight to `eglCreateImage(EGL_LINUX_DMA_BUF_EXT, ...)` with no transformation.
   - (d) `EglBackend::testImportBuffer` chooses between combined import and per-plane import (Y as R8 / UV as RG88 from offset 2,088,960) based on whether NV12+LINEAR is in `nonExternalOnlySupportedDrmFormats()`. **Either path** forwards `offset = 2,088,960` to the driver.

   KWin is innocent.

3. ~~**hantro kernel driver exports a `dma_buf` with `size` < full allocation.**~~ **RULED OUT 2026-05-08** by `/tmp/expbuf_probe.c` on ohm. Driver `hantro-vpu` on `rk3568-vpu-dec` reports `CAPTURE: NV12 1920x1088 num_planes=1 sizeimage=3655712`; `VIDIOC_EXPBUF` yields an fd whose `lseek(fd, 0, SEEK_END) = 3,657,728` (page-rounded up from 3,655,712). Offset 2,088,960 (plane 1 base) is firmly inside the exported size. Kernel is innocent.

   Side observation worth recording: `sizeimage = 3,655,712` is bigger than naïve NV12's 1920×1088×1.5 = 3,133,440. The 522,272-byte excess sits **past** the UV plane (Y at [0, 2,088,960), UV at [2,088,960, 3,133,440), trailing padding at [3,133,440, 3,655,712)). On Rockchip codecs that tail commonly holds per-frame motion-vector / decoder-context data. This confirms ffmpeg's hardcoded `planes[1].offset = pitch*height = 2,088,960` is correct.

4. **kwin-fourier 0001 still has an effect we missed.** Even though we ruled out kwin-fourier as a compositor-replacement A/B, that test was on an earlier kernel/Mesa combo. Worth verifying the test environment is fully reset.

6. ~~**DMA cache coherency between hantro VPU and Mali GPU** (NEW 2026-05-08, derived from green-color math).~~ **RULED OUT 2026-05-08 phase 3** by the iter1 patch test. Patched mpv 0.41.0 (mpv-fourier-1:0.41.0-9) to call `DMA_BUF_IOCTL_SYNC(SYNC_START|SYNC_RW)` + matching `SYNC_END` on each EXPBUF fd in both `vaapi_dmabuf_importer` and `drmprime_dmabuf_importer` before `zwp_linux_buffer_params_v1_add()`. Built via Gitea Actions, installed on ohm. Ran `mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause --start=00:00:00.42 fourier-test/bbb_1080p30_h264.mp4`, captured a screenshot. **Result: byte-identical to baseline (md5 c8c8e9b88521a0069f709d483451c3d4).** The userspace cache-sync ioctl has no effect. Either hantro's `dma_buf_ops->begin_cpu_access` is a no-op (likely on Rockchip: many dma-buf-heap allocations are non-coherent CPU-cached but rely on different sync paths), OR the gap is on the GPU consumer side and CPU-cache state is irrelevant.

   **Critical phase-3 observation**: `--hwdec=v4l2request --vo=gpu` (texture-upload path) is known-working and renders correctly. That path mmaps the dma_buf into mpv's CPU memory, then uploads it to a GL texture via `glTexSubImage2D`. **The CPU CAN read valid YUV data from the buffer**; only the zero-copy dma_buf-to-GPU import path renders zeros. This rules out "data isn't there" entirely and concentrates the hypothesis on the dma_buf → Mali GPU import/translation step itself (H7 territory).

7. **Panfrost dma_buf import doesn't perform GPU-side cache invalidation OR doesn't import with the right BO type** when mapping an imported fd.
   Even if data has reached DRAM, Mali's MMU/cache may serve stale reads, OR the imported BO is created without the `DMA-COHERENT` flag panfrost expects, leaving Mali sampling un-snooped memory. Phase 3 narrowed things down: this is now the LEADING hypothesis. Sub-cases to investigate:
   - 7a. Panfrost's `dma_buf_attach` + `dma_buf_map_attachment` calls miss a cache invalidate. Sources: `drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c` and `panfrost_gem_prime_import_sg_table` in `panfrost_gem.c` (hooked up in `panfrost_drv.c`).
   - 7b. The imported BO is mapped non-coherent in Mali's MMU, but the buffer was allocated cacheable (or vice versa). Sync mismatch.
   - 7c. Panfrost uses ioremap_wc / ioremap_cache the wrong way for hantro-allocated CMA pages.
   - 7d. The Mali-G52 panfrost path does NOT support imported dma_buf for sampling at all, only for scanout direct pass-through. (Less likely; GL texture creation would then be expected to fail, but in our case it succeeds and renders zeros.)

## Recommended next moves for iter1

a. **Write a small C harness that does VIDIOC_EXPBUF on a hantro CAPTURE buffer and reports fd size + backing dma_buf info.** Decides hypothesis 3 in 30 minutes. Run on ohm directly.

b. **Patch mpv with `MP_VERBOSE` logging of the AVDRMFrameDescriptor fields at .add()-call time** (nb_objects, planes[].object_index, planes[].offset, objects[].size). Confirms the source-read is correct at runtime. Drop into mpv-fourier's `prepare()` slot, bump pkgrel, rebuild on fermi (~10 min CI).

c. **Read KWin's wl_dmabuf import logic** (KDE Plasma 6 / KWin 6.6.4 source) for how it handles multiple-fd-same-buffer cases. ~30 min source-read.

d. **Update `marfrit/dmabuf-modifier-triage#1`** with this revised analysis. The current issue body claims the bug is in mpv's plane-semantics translation; that conclusion is now overturned.

## Status

- iter1 phase 3 closed 2026-05-08.
  The DMA_BUF_IOCTL_SYNC patch (mpv-fourier-1:0.41.0-9, both vaapi_dmabuf_importer + drmprime_dmabuf_importer) had **zero effect**: the green-frame screenshot is byte-identical to baseline. **H6 ruled out.**
- Five hypotheses ruled out (H2, H3, H5, H6, the ad-hoc offset variant). H1 is less likely after the Mesa source-read but not conclusively excluded.
- **Leading hypothesis: H7**, panfrost's dma_buf import / GPU-side cache or BO-type handling. Pinned by the *known-good* counter-test: `mpv --hwdec=v4l2request --vo=gpu` (CPU-mmap → glTexSubImage2D upload path) renders correctly. So the buffer DOES contain valid data; only the zero-copy dma_buf→Mali path renders zeros.
- Acceptance criterion (`screenshots/frame10_expected.png`) is unchanged.
- Delivery vehicle re-evaluated again: with H6 gone, the userspace mpv workaround is no longer the right delivery vehicle for this iteration. The fix lands in one of: (a) panfrost kernel-mode driver (`drivers/gpu/drm/panfrost/`), (b) Mesa-panfrost userspace if there's an EGL_image attribute / format-import quirk, (c) hantro driver-side allocation flags (V4L2_MEMORY_DMABUF + appropriate cache attribute), or (d) a kernel bridge (e.g., a cache-aware variant of the `DMA_BUF_SET_NAME` ioctl).
- Next probe options ranked:
  1. **Read panfrost kernel-mode dma_buf import** (~45 min, cheap source-read, no hardware): inspect `drivers/gpu/drm/panfrost/panfrost_gem.c` `panfrost_gem_prime_import_sg_table` and the Mali MMU mapping for cache attributes / IO-coherency settings. May spot the gap directly.
  2. **EGL importer harness with synthetic NV12 in udmabuf** (~1-2h): allocate via udmabuf (CPU-coherent), write a known YUV pattern, eglCreateImage from the udmabuf, render, glReadPixels. If it reads back correct data → the bug is specific to hantro-allocated buffers (cache-attribute mismatch). If it ALSO reads zeros → general panfrost dma_buf import bug (less likely).
  3. **Run mpv with `MESA_DEBUG=verbose` + `PAN_MESA_DEBUG=sync,trace`** (~15 min): may show something at the import boundary. Cheap recon.

mpv-fourier-1:0.41.0-9 keeps the no-op patch installed for now; it's harmless. A future iter close (iter2 or a further phase under iter1) will replace it with whatever the actual fix is, OR pkgrel-bump back to remove the dead patch if the fix lands elsewhere (kernel/Mesa).