diff --git a/phase2_iter1_findings.md b/phase2_iter1_findings.md new file mode 100644 index 0000000..4324a70 --- /dev/null +++ b/phase2_iter1_findings.md @@ -0,0 +1,98 @@ +# Phase 2 — iter1 source-read findings (REOPEN of root-cause analysis) + +**Opened 2026-05-08** during the iter1 phase 2 source-read of mpv 0.41.0 + Kwiboo's ffmpeg fork at commit `b57fbbe`. Phase 0's earlier conclusion ("mpv mixes per-plane fds with single-allocation offset") needs revision — the source code reads + runtime probe show the situation is more nuanced than the WAYLAND_DEBUG wire trace alone suggested. + +## What the source actually says + +**mpv `video/out/vo_dmabuf_wayland.c` `drmprime_dmabuf_importer` (lines 250-277)** straightforwardly relays the producer's `AVDRMFrameDescriptor`: + +```c +for (plane_no = 0; plane_no < layer.nb_planes; ++plane_no) { + AVDRMPlaneDescriptor plane = layer.planes[plane_no]; + int object_index = plane.object_index; + AVDRMObjectDescriptor object = desc->objects[object_index]; + uint64_t modifier = object.format_modifier; + zwp_linux_buffer_params_v1_add(params, object.fd, plane_no, plane.offset, + plane.pitch, modifier >> 32, modifier & 0xffffffff); +} +``` + +No `dup()`, no rewriting, no transformation. mpv passes through what `AVDRMFrameDescriptor` says. + +**Kwiboo's `libavutil/hwcontext_v4l2request.c` `v4l2request_set_drm_descriptor` (lines 138-198)** for hantro's NV12 single-planar (V4L2_PIX_FMT_NV12, the format `v4l2-ctl --get-fmt-video-mplane-cap` reports for `/dev/video1` on ohm): + +```c +desc->base.nb_objects = num_planes; // = 1 for single-planar NV12 on hantro +desc->base.objects[0].fd = exportbuffer.fd; // VIDIOC_EXPBUF returns ONE fd +// in v4l2request_set_drm_descriptor: +desc->nb_layers = 1; +layer->nb_planes = 1; +layer->planes[0].object_index = 0; +layer->planes[0].offset = 0; +layer->planes[0].pitch = bytesperline; // 1920 +if (modifier != ARM_VENDOR) { // hantro outputs LINEAR (0x0), so this is true + layer->nb_planes = 2; + layer->planes[1].object_index = 0; // ← BOTH PLANES point at object 0 + layer->planes[1].offset = pitch * height; // 1920 * 1088 = 2088960 + layer->planes[1].pitch = layer->planes[0].pitch; +} +``` + +Per the source, mpv should produce **identical** fd values in the two `.add()` calls — both pulling from `desc->objects[0].fd`. + +## What the runtime probe says + +`v4l2-ctl --get-fmt-video-mplane-cap` on ohm `/dev/video1`: + +``` +Pixel Format : 'NV12' (Y/UV 4:2:0) +Number of planes : 1 +sizeimage=3655712, bytesperline=1920 +``` + +`strace -e trace=ioctl mpv ...` confirms ffmpeg only does **one** `VIDIOC_EXPBUF` per CAPTURE buffer (`index=N, plane=0` → one fd), exactly matching `nb_objects = 1`. + +But `WAYLAND_DEBUG=1 mpv ...` shows two `.add()` calls **with different fd numbers** per buffer: + +``` +add(fd 41, 0, 0, 1920, 0, 0) +add(fd 42, 1, 2088960, 1920, 0, 0) +``` + +These fd numbers are **consecutive**, suggesting libwayland's `wl_closure_marshal` is `dup_cloexec`'ing the fd at protocol-marshal time and the trace prints the post-dup fd. Both fd 41 and fd 42 are dups of the same underlying `dma_buf` object (originally fd 17 or similar in mpv's table). + +## Implications for iter1 + +The earlier phase 0 conclusion that mpv constructs an "internally inconsistent" wl_dmabuf message was **wrong**. There is no inconsistency at the producer ↔ mpv layer: + +- nb_objects = 1, both planes use object 0 → mpv passes the same fd value into both `.add()` calls +- libwayland dups it before sending → wire trace shows different fd numbers, but they refer to the same backing memory +- Plane 1's offset = 2088960 is correct relative to the (single) underlying allocation + +So the green frame is **not caused by mpv or ffmpeg's descriptor construction**. Something else. + +## New hypothesis space (one of these is the real bug) + +1. **Mali-G52 panfrost EGL_EXT_image_dma_buf_import_modifiers regression for NV12 with non-zero plane offset.** The driver may sample plane 1 from offset 0 of the imported fd instead of offset 2088960, giving zero-fill UV. Testable: a minimal EGL importer C program that imports a known NV12 dmabuf with offsets and reads back via `glReadPixels`. + +2. **KWin's wl_dmabuf import logic deduplicates the dup'd fds incorrectly.** If KWin sees two fds and detects (via `dma_buf_id` or kcmp(2)) that they're the same backing object, but then mishandles the per-plane offsets, this'd produce green. Testable: read KWin source `src/wayland/linuxdmabufv1clientbuffer.cpp` and compositor backend's EGL import path. + +3. **hantro kernel driver exports a `dma_buf` with `size` < full allocation.** If hantro caps the exported fd's mappable size to just the Y plane (2,088,960 bytes), offset 2,088,960 is past EOF — read fails silently, returns zeros. Testable: `ssh ohm "fcntl(F_GETSIZE) on EXPBUF fd"` or just `lseek(fd, 0, SEEK_END)` in a small probe. + +4. **kwin-fourier 0001 still has effect we missed.** Even though we ruled out kwin-fourier as a compositor-replacement A/B, that test was on an earlier kernel/Mesa combo. Worth verifying the test environment is fully reset. + +## Recommended next moves for iter1 + +a. **Write a small C harness that does VIDIOC_EXPBUF on a hantro CAPTURE buffer and reports fd size + backing dma_buf info.** Decides hypothesis 3 in 30 minutes. Run on ohm directly. + +b. **Patch mpv with `MP_VERBOSE` logging of the AVDRMFrameDescriptor fields at .add()-call time** (nb_objects, planes[].object_index, planes[].offset, objects[].size). Confirms the source-read is correct at runtime. Drop into mpv-fourier's `prepare()` slot, bump pkgrel, rebuild on fermi (~10 min CI). + +c. **Read KWin's wl_dmabuf import logic** (KDE Plasma 6 / KWin 6.6.4 source) for how it handles multiple-fd-same-buffer cases. ~30 min source-read. + +d. **Update `marfrit/dmabuf-modifier-triage#1`** with this revised analysis. The current issue body claims the bug is in mpv's plane-semantics translation — that conclusion is now overturned. + +## Status + +- iter1 phase 2 ongoing. Pivot from "fix is in mpv" to "real layer is unclear, need more diagnostics." +- Acceptance criterion (`screenshots/frame10_expected.png`) is unchanged. +- Delivery vehicle (`mpv-fourier-1:0.41.0-8`) is still the right shipping path **if** the fix turns out to be a defensive workaround in mpv. Otherwise the patch lands in ffmpeg-v4l2-request-fourier (unlikely per this analysis), KWin, kernel hantro, or a Mesa-panfrost user-mode driver patch.