# Phase 2: iter1 source-read findings (REOPEN of root-cause analysis)

**Opened 2026-05-08** during the iter1 phase 2 source-read of mpv 0.41.0 + Kwiboo's ffmpeg fork at commit `b57fbbe`. Phase 0's earlier conclusion ("mpv mixes per-plane fds with single-allocation offset") needs revision: the source reads and a runtime probe show that the situation is more nuanced than the WAYLAND_DEBUG wire trace alone suggested.

## What the source actually says

**mpv `video/out/vo_dmabuf_wayland.c` `drmprime_dmabuf_importer` (lines 250-277)** straightforwardly relays the producer's `AVDRMFrameDescriptor`:

```c
/* Excerpt: desc (the AVDRMFrameDescriptor), layer (the current
 * AVDRMLayerDescriptor) and params come from the enclosing function. */
for (plane_no = 0; plane_no < layer.nb_planes; ++plane_no) {
    AVDRMPlaneDescriptor plane = layer.planes[plane_no];
    int object_index = plane.object_index;
    AVDRMObjectDescriptor object = desc->objects[object_index];
    uint64_t modifier = object.format_modifier;
    /* One add() per plane: fd of the referenced object, the plane's offset
     * and pitch, and the 64-bit modifier split into hi/lo halves. */
    zwp_linux_buffer_params_v1_add(params, object.fd, plane_no, plane.offset,
                                   plane.pitch, modifier >> 32,
                                   modifier & 0xffffffff);
}
```

No `dup()`, no rewriting, no transformation: mpv passes through whatever `AVDRMFrameDescriptor` says.
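To make the relay concrete, here is a minimal sketch that replays the loop above and records each `add()` request instead of sending it. The structs `drm_object`, `drm_plane`, `drm_layer` and `add_args` and the function `relay_planes` are hypothetical, trimmed stand-ins for the libavutil descriptors (not their real definitions), and a single layer is assumed, as in the excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, trimmed to the fields the relay loop reads.
 * These are NOT libavutil's AVDRM*Descriptor definitions. */
struct drm_object { int fd; uint64_t format_modifier; };
struct drm_plane  { int object_index; ptrdiff_t offset; ptrdiff_t pitch; };
struct drm_layer  { int nb_planes; struct drm_plane planes[4]; };

/* Arguments of one recorded zwp_linux_buffer_params_v1.add request. */
struct add_args {
    int fd;
    uint32_t plane_idx, offset, stride, modifier_hi, modifier_lo;
};

/* Replays the relay loop for one layer: each plane is forwarded with the
 * fd of the object it references, untouched. Returns the number of
 * requests written to out (which must hold layer->nb_planes entries). */
static int relay_planes(const struct drm_object *objects, int nb_objects,
                        const struct drm_layer *layer, struct add_args *out)
{
    for (int i = 0; i < layer->nb_planes; ++i) {
        const struct drm_plane *p = &layer->planes[i];
        assert(p->object_index >= 0 && p->object_index < nb_objects);
        const struct drm_object *o = &objects[p->object_index];
        out[i] = (struct add_args){
            .fd = o->fd, /* same object => same fd value; no dup() here */
            .plane_idx = (uint32_t)i,
            .offset = (uint32_t)p->offset,
            .stride = (uint32_t)p->pitch,
            .modifier_hi = (uint32_t)(o->format_modifier >> 32),
            .modifier_lo = (uint32_t)(o->format_modifier & 0xffffffffu),
        };
    }
    return layer->nb_planes;
}
```

Fed one LINEAR object and two planes that both reference it (offsets 0 and 1920 × 1088, pitch 1920), the sketch records two requests carrying the same fd value, so any fd-number difference seen later on the wire cannot originate in this loop.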
**Kwiboo's `libavutil/hwcontext_v4l2request.c` `v4l2request_set_drm_descriptor` (lines 138-198)** for hantro's single-planar NV12 (V4L2_PIX_FMT_NV12, the format `v4l2-ctl --get-fmt-video-mplane-cap` reports for `/dev/video1` on ohm):

```c
desc->base.nb_objects = num_planes;          // = 1 for single-planar NV12 on hantro
desc->base.objects[0].fd = exportbuffer.fd;  // VIDIOC_EXPBUF returns ONE fd

// in v4l2request_set_drm_descriptor:
desc->nb_layers = 1;
layer->nb_planes = 1;
layer->planes[0].object_index = 0;
layer->planes[0].offset = 0;
layer->planes[0].pitch = bytesperline;       // 1920

if (modifier != ARM_VENDOR) {                // hantro outputs LINEAR (0x0), so this is true
    layer->nb_planes = 2;
    layer->planes[1].object_index = 0;        // ← BOTH PLANES point at object 0
    layer->planes[1].offset = pitch * height; // 1920 * 1088 = 2088960
    layer->planes[1].pitch = layer->planes[0].pitch;
}
```

Per the source, mpv should pass **identical** fd values in the two `.add()` calls, since both read `desc->objects[0].fd`.

## What the runtime probe says

`v4l2-ctl --get-fmt-video-mplane-cap` on ohm `/dev/video1`:

```
Pixel Format : 'NV12' (Y/UV 4:2:0)
Number of planes : 1
sizeimage=3655712, bytesperline=1920
```

`strace -e trace=ioctl mpv ...` confirms that ffmpeg does only **one** `VIDIOC_EXPBUF` per CAPTURE buffer (`index=N, plane=0` → one fd), exactly matching `nb_objects = 1`.

But `WAYLAND_DEBUG=1 mpv ...` shows two `.add()` calls **with different fd numbers** per buffer:

```
add(fd 41, 0, 0, 1920, 0, 0)
add(fd 42, 1, 2088960, 1920, 0, 0)
```

These fd numbers are **consecutive**, suggesting that libwayland's `wl_closure_marshal` is `dup_cloexec`'ing the fd at protocol-marshal time and that the trace prints the post-dup fd. Both fd 41 and fd 42 are then dups of the same underlying `dma_buf` object (originally fd 17 or similar in mpv's table).

## Implications for iter1

The earlier phase 0 conclusion that mpv constructs an "internally inconsistent" wl_dmabuf message was **wrong**.
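The phase 0 misreading came from treating different fd numbers as different buffers. What matters is whether two fds name the same kernel object, which can be checked with `fstat()` (same `st_dev` and `st_ino`). The minimal sketch below uses hypothetical helpers; its dup is assumed to mirror libwayland's marshal-time duplication via `fcntl(F_DUPFD_CLOEXEC)`, and a pipe stands in for the dma_buf so it runs anywhere. Applying `same_backing_object()` to real dma-buf fds assumes a kernel that gives every dma_buf its own inode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Stand-in for the dup libwayland performs at marshal time (assumed to be
 * fcntl(F_DUPFD_CLOEXEC)): new fd number, same open file. */
static int dup_cloexec(int fd)
{
    return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}

/* 1 if both fds name the same kernel object (same st_dev and st_ino),
 * 0 if not, -1 if fstat() fails. For dma-buf fds this assumes a kernel
 * that allocates one inode per dma_buf. */
static int same_backing_object(int fd_a, int fd_b)
{
    struct stat a, b;
    if (fstat(fd_a, &a) == -1 || fstat(fd_b, &b) == -1)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Demo on a pipe: the duplicate gets a different fd number (the lowest
 * free one) but still refers to the same object. Returns 1 on success. */
static int dup_keeps_identity(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int copy = dup_cloexec(p[0]);
    int same = copy >= 0 && copy != p[0] &&
               same_backing_object(p[0], copy) == 1;
    if (copy >= 0)
        close(copy);
    close(p[0]);
    close(p[1]);
    return same;
}
```

The same comparison distinguishes genuinely different objects: two separate pipes report different inodes.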
There is no inconsistency at the producer ↔ mpv layer:

- nb_objects = 1 and both planes use object 0, so mpv passes the same fd value into both `.add()` calls.
- libwayland dups it before sending, so the wire trace shows different fd numbers, but they refer to the same backing memory.
- Plane 1's offset of 2088960 is correct relative to the (single) underlying allocation.

So the green frame is **not caused by mpv or ffmpeg's descriptor construction**. The cause lies elsewhere.

## New hypothesis space (one of these is the real bug)

1. **Mali-G52 panfrost EGL_EXT_image_dma_buf_import_modifiers regression for NV12 with non-zero plane offset.** **Source-read 2026-05-08 of Mesa 26.0.6 makes this LESS LIKELY.** Trace at `references/mesa-26.0.6/`:
   - `src/gallium/drivers/panfrost/pan_screen.c:443` reports `external_only[count] = is_yuv` for any YUV format → NV12+LINEAR is external_only, forcing KWin's per-plane import path (Y as R8, UV as DRM_FORMAT_GR88).
   - `src/loader/loader_dri_helper.c:43` maps `DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM`. The byte-order distinction is preserved at the pipe-format level (`.r` = byte 0 = Cb, `.g` = byte 1 = Cr), matching KWin's shader assumption at glshadermanager.cpp:189 `result.yz = sampler1.rg`.
   - `src/gallium/drivers/panfrost/pan_resource.c:354-358` captures `whandle->offset` into `explicit_layout.offset_B` for the import.
   - `src/panfrost/lib/pan_mod.c:663-667` (linear modifier slice-init) honors `layout_constraints->offset_B` directly with only an alignment check; 2,088,960 is page-aligned (510 × 4096), so it satisfies 16-, 64- and 4096-byte alignment alike.
   - `src/panfrost/lib/pan_texture.c:361,561,660,773,817` set the texture descriptor's GPU base to `plane->base + slayout->offset_B`, i.e., sampling reads from `bo_gpu + 2,088,960`.
   - **Conclusion**: panfrost source code as written DOES honor a non-zero plane offset. Source-read alone cannot rule out a runtime bug, but the obvious places are clean.
     To definitively rule it in or out, write the EGL importer harness with synthetic NV12 data.

2. ~~**KWin's wl_dmabuf import logic deduplicates the dup'd fds incorrectly.**~~ **RULED OUT 2026-05-08** by source-read of KWin 6.6.4 at `references/kwin-6.6.4/src/wayland/linuxdmabufv1clientbuffer.cpp` + `src/opengl/{eglbackend,egldisplay}.cpp`:
   - (a) `LinuxDmaBufParamsV1::zwp_linux_buffer_params_v1_add` simply stores per-plane fd/offset/pitch in separate slots; there is no dedup.
   - (b) `LinuxDmaBufParamsV1::test()` does `lseek(SEEK_END)` per plane plus range checks against the resulting size; our exported size of 3,657,728 bytes satisfies all of them.
   - (c) `EglDisplay::importDmaBufAsImage` (both the combined and per-plane forms) passes `dmabuf.fd[i]`, `dmabuf.offset[i]`, `dmabuf.pitch[i]` straight to `eglCreateImage(EGL_LINUX_DMA_BUF_EXT, ...)` with no transformation.
   - (d) `EglBackend::testImportBuffer` chooses between combined import and per-plane import (Y as R8 / UV as RG88 from offset 2,088,960) based on whether NV12+LINEAR is in `nonExternalOnlySupportedDrmFormats()`. **Either path** forwards `offset = 2,088,960` to the driver.

   KWin is innocent.

3. ~~**hantro kernel driver exports a `dma_buf` with `size` < full allocation.**~~ **RULED OUT 2026-05-08** by `/tmp/expbuf_probe.c` on ohm. Driver `hantro-vpu` on `rk3568-vpu-dec` reports `CAPTURE: NV12 1920x1088 num_planes=1 sizeimage=3655712`; `VIDIOC_EXPBUF` yields an fd whose `lseek(fd, 0, SEEK_END) = 3,657,728` (3,655,712 rounded up to a whole page). Offset 2,088,960 (plane 1 base) is firmly inside the exported size. Kernel is innocent.

   Side observation worth recording: `sizeimage = 3,655,712` is bigger than naïve NV12's 1920×1088×1.5 = 3,133,440. The 522,272-byte excess sits **past** the UV plane (Y at [0, 2,088,960), UV at [2,088,960, 3,133,440), trailing padding at [3,133,440, 3,655,712)). On Rockchip codecs that tail commonly holds per-frame motion-vector / decoder-context data. This confirms that ffmpeg's hardcoded `planes[1].offset = pitch*height = 2,088,960` is correct.

4. **kwin-fourier 0001 still has an effect we missed.** Even though we ruled out kwin-fourier as a compositor-replacement A/B, that test was on an earlier kernel/Mesa combo. Worth verifying that the test environment is fully reset.

6. ~~**DMA cache coherency between hantro VPU and Mali GPU** (NEW 2026-05-08, derived from green-color math).~~ **RULED OUT 2026-05-08 phase 3** by the iter1 patch test. Patched mpv 0.41.0 (mpv-fourier-1:0.41.0-9) to call `DMA_BUF_IOCTL_SYNC(SYNC_START|SYNC_RW)` + matching `SYNC_END` on each EXPBUF fd in both `vaapi_dmabuf_importer` and `drmprime_dmabuf_importer` before `zwp_linux_buffer_params_v1_add()`. Built via Gitea Actions, installed on ohm. Ran `mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause --start=00:00:00.42 fourier-test/bbb_1080p30_h264.mp4` and captured a screenshot. **Result: byte-identical to baseline (md5 c8c8e9b88521a0069f709d483451c3d4).** The userspace cache-sync ioctl has no effect. Either hantro's `dma_buf_ops->begin_cpu_access` is a no-op (likely on Rockchip; many dma-buf-heap allocations are non-coherent CPU-cached but rely on different sync paths), OR the gap is on the GPU consumer side and CPU-cache state is irrelevant.

   **Critical phase-3 observation**: `--hwdec=v4l2request --vo=gpu` (texture-upload path) is known-working and renders correctly. That path mmaps the dma_buf into mpv's CPU memory, then uploads to a GL texture via `glTexSubImage2D`. **The CPU CAN read valid YUV data from the buffer**; only the zero-copy dma_buf-to-GPU import path renders zeros. This rules out "data isn't there" entirely and concentrates the hypothesis on the dma_buf → Mali GPU import/translation step itself (H7 territory).

7. **Panfrost dma_buf import doesn't perform GPU-side cache invalidation OR doesn't import with the right BO type.** ✅ **CONFIRMED VIA SOURCE-READ 2026-05-08 phase 4** (Linux 6.12 panfrost source at `~/src/linux-rfc/drivers/gpu/drm/panfrost/`):
   - `panfrost_gem_create_object` (panfrost_gem.c:262): `obj->base.map_wc = !pfdev->coherent;` sets a write-combined (uncached) CPU mapping when the device isn't coherent. This applies to imports too, via `drm_gem_shmem_prime_import_sg_table`.
   - `pfdev->coherent` (panfrost_drv.c:625): `device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT`, i.e., derived from the DT `dma-coherent` property on the panfrost node.
   - On ohm (RK3566 PineTab2 besser-7.0): **NO `dma-coherent` property** anywhere in `/sys/firmware/devicetree/base/` (verified via `find ... -name dma-coherent`). So `pfdev->coherent = false`.
   - `panfrost_mmu_map` (panfrost_mmu.c:330): **`int prot = IOMMU_READ | IOMMU_WRITE;`**, with **no `IOMMU_CACHE`**. Imported BOs get mapped into Mali's IOMMU as non-snooping, so Mali reads directly from DRAM.
   - The only cache sync happens **once**, at `dma_buf_map_attachment` time (during EGL_image import in KWin). KWin caches the EGL_image per-fd in `m_importedBuffers` (eglbackend.cpp:282) and reuses it for every subsequent frame.
   - **No per-frame cache sync mechanism exists**: V4L2 doesn't attach `dma_resv` fences to CAPTURE buffers on DQBUF (the exact gap addressed by our `vb2_dma_resv` RFC v2 upstream).

   **Architectural picture**: hantro VPU writes decoded YUV to its CMA buffer through CPU L1/L2/L3 caches. Mali GPU reads through its IOMMU with no cache snoop. Without a per-frame fence-driven cache flush, Mali sees DRAM-direct content, which lags behind hantro's writes (often the zero-fill of freshly allocated pages). **Result: green frames.**

   **Counter-validation**: `mpv --hwdec=v4l2request --vo=gpu` (CPU mmap of the dma_buf → glTexSubImage2D upload to a Mali-private BO) works correctly. CPU mmap triggers cache sync via dma-buf's `begin_cpu_access`.
   The Mali-private destination BO is normally cached/coherent because Mali allocated it. The CPU mmap path provides implicit per-frame cache sync. This confirms that the buffer DOES contain valid data and that only the zero-copy dma_buf-to-Mali-IOMMU path lacks per-frame sync.

   **Why `DMA_BUF_IOCTL_SYNC` (phase 3) didn't help**: that ioctl invokes `dma_buf_ops->begin_cpu_access` / `end_cpu_access`, both of which are **CPU**-side cache management. They don't propagate to the GPU's IOMMU mapping. The GPU still reads through its non-snooping mapping; CPU cache state is irrelevant to it.

   **Real fix path**: kernel-side V4L2 `vb2_dma_resv` patches (our upstream RFC v2). With V4L2 attaching a `dma_resv` fence on DQBUF for CAPTURE, mesa-panfrost's implicit fence-wait at sample time will block until hantro's writes signal, and the fence signaling semantics imply cache writeback. The fence-wait + cache-flush combination resolves the green frames.

## Recommended next moves for iter1

a. **Write a small C harness that does VIDIOC_EXPBUF on a hantro CAPTURE buffer and reports fd size + backing dma_buf info.** Decides hypothesis 3 in 30 minutes. Run on ohm directly.

b. **Patch mpv with `MP_VERBOSE` logging of the AVDRMFrameDescriptor fields at .add()-call time** (nb_objects, planes[].object_index, planes[].offset, objects[].size). Confirms the source-read is correct at runtime. Drop into mpv-fourier's `prepare()` slot, bump pkgrel, rebuild on fermi (~10 min CI).

c. **Read KWin's wl_dmabuf import logic** (KDE Plasma 6 / KWin 6.6.4 source) for how it handles multiple-fd-same-buffer cases. ~30 min source-read.

d. **Update `marfrit/dmabuf-modifier-triage#1`** with this revised analysis. The current issue body claims the bug is in mpv's plane-semantics translation; that conclusion is now overturned.

## Status

- iter1 phase 4 closed 2026-05-08.
  **H7 confirmed via panfrost kernel source-read.** The dmabuf-wayland green-frame bug is structurally caused by the *missing per-frame cache sync mechanism* between the hantro VPU and the Mali GPU on a non-coherent SoC (RK3566), with KWin caching the EGL_image per-fd. V4L2 doesn't attach `dma_resv` fences to CAPTURE buffers on DQBUF, so panfrost has no per-frame fence to wait on and never flushes cross-device caches between frames.
- Six hypotheses ruled out (H1, H2, H3, H5, H6, ad-hoc offset variant). H4 latent. **H7 leading and root-cause-confirmed**.
- Acceptance criterion (`screenshots/frame10_expected.png`) is unchanged.
- **Critical discovery for the campaign**: the dmabuf-wayland green has **the same root cause** that the upstream RFC we're already advancing addresses. The `vb2_dma_resv` v2 patches we're preparing for the linux-media list ARE the fix for ohm.

### Proposed iter2 / phase 5 path

Take the kernel rebuild route. Build `linux-pinetab2-danctnix-besser` 7.0 with the `vb2_dma_resv` RFC v2 patches applied, install on ohm, and retest. If the green goes away, we have:

1. Confirmation that our upstream RFC fixes a real shipping-product bug
2. A locally-shippable fix via `linux-pinetab2-fourier` (or a similar fresnel-style kernel package)
3. A strong concrete data point to include in the v2 cover letter

Estimated effort:

- Apply RFC v2 patches to besser-7.0 source: ~30 min (patches need rebasing; current upstream RFC is against 6.12-rc, besser is 7.0)
- Build kernel via distcc (his can wire up DISTCC_HOSTS): ~45-90 min
- Install + reboot + retest: ~15 min
- Total: ~2-3 hours

### Alternative paths if kernel-rebuild blocks

a. **EGL importer harness with synthetic NV12 in udmabuf** (~1-2h): would CONFIRM by independent test that the issue is specifically producer-cache-flush (synthetic NV12 in a CPU-allocated udmabuf is written via CPU mmap and therefore naturally flushed, so it should render correctly even with current panfrost). Worth doing as additional evidence.

b. **mpv-fourier `--vo=dmabuf-wayland` workaround**: re-import the dma_buf each frame from the mpv side. Defeats zero-copy. Not desirable; only viable as a last-resort fallback.

c. **kwin-fourier workaround**: invalidate the cached EGL_image per frame. Same downside (zero-copy defeated), but it would help validate the kernel theory.

mpv-fourier-1:0.41.0-9 keeps the harmless no-op DMA_BUF_IOCTL_SYNC patch installed for now. If the kernel-rebuild path works, the patch will be reverted in the next mpv-fourier rev.
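For reference, the userspace bracket that the phase-3 patch places around each EXPBUF fd can be sketched as below. This is a minimal sketch assuming the UAPI header `<linux/dma-buf.h>`; `dmabuf_sync` and `dmabuf_sync_bracket` are hypothetical helper names, not mpv functions. As analysed under H7, both calls only reach the exporter's `begin_cpu_access`/`end_cpu_access`, i.e. CPU-side cache maintenance.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue one DMA_BUF_IOCTL_SYNC, restarting on EINTR/EAGAIN as the kernel's
 * dma-buf documentation asks. Returns 0, or -1 with errno set. */
static int dmabuf_sync(int fd, uint64_t flags)
{
    /* The kernel rejects a sync that requests neither read nor write. */
    assert(flags & DMA_BUF_SYNC_RW);
    struct dma_buf_sync sync = { .flags = flags };
    int ret;
    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
    return ret;
}

/* SYNC_START|SYNC_RW followed by the matching SYNC_END, as in the phase-3
 * patch. Nothing here touches the GPU's IOMMU mapping. */
static int dmabuf_sync_bracket(int fd)
{
    if (dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW) == -1)
        return -1;
    return dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW);
}
```

The retry loop only follows the ioctl's documented restart contract; it does not change the H7 conclusion that per-frame GPU-side coherency has to come from a kernel-side fence.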