dmabuf-modifier-triage/phase2_iter1_findings.md
marfrit 7b54ff6c2d iter1 phase 3 close: H6 ruled out by DMA_BUF_IOCTL_SYNC patch test
Built mpv-fourier-1:0.41.0-9 with the DMA_BUF_IOCTL_SYNC(SYNC_START|
SYNC_RW) + SYNC_END(SYNC_RW) patch in both vaapi_dmabuf_importer and
drmprime_dmabuf_importer. Installed on ohm via [marfrit].

Test: mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause
      --start=00:00:00.42 fourier-test/bbb_1080p30_h264.mp4

Result: spectacle screenshot md5 = c8c8e9b88521a0069f709d483451c3d4
        — BYTE-IDENTICAL to the baseline green-frame screenshot.
        Visual: same solid dark green ~ RGB(0, 77, 0) (BT.709
        limited-range YUV(0,0,0) per the README math).

Userspace cache-sync ioctl has zero effect. H6 ruled out.

Phase 3 critical observation: --hwdec=v4l2request --vo=gpu (CPU-
mmap then glTexSubImage2D upload path) is known-working. So the
buffer DOES contain valid YUV data. Only the zero-copy dma_buf-
to-Mali path renders zeros. Concentrates hypothesis on the
dma_buf → Mali GPU import/translation step itself.

Live hypothesis space:
  H1..H3, H5, H6 ruled out
  H4 latent (low conf)
  H7 LEADING — panfrost dma_buf import / GPU-side cache or
  BO-type / cache-attribute mismatch

Next probes:
  1. Read panfrost kernel-mode dma_buf import path (~45 min)
  2. EGL importer harness with synthetic udmabuf NV12 (~1-2h)
  3. MESA_DEBUG verbose log (~15 min recon)

mpv-fourier-1:0.41.0-9 keeps the no-op patch (harmless). Future
iter close replaces or removes.

Posted to dmabuf-modifier-triage#1 comment 259.
2026-05-08 22:31:36 +00:00


Phase 2 — iter1 source-read findings (REOPEN of root-cause analysis)

Opened 2026-05-08 during the iter1 phase 2 source-read of mpv 0.41.0 + Kwiboo's ffmpeg fork at commit b57fbbe. Phase 0's earlier conclusion ("mpv mixes per-plane fds with single-allocation offset") needs revision: the source read and a runtime probe together show that the situation is more nuanced than the WAYLAND_DEBUG wire trace alone suggested.

What the source actually says

mpv video/out/vo_dmabuf_wayland.c drmprime_dmabuf_importer (lines 250-277) straightforwardly relays the producer's AVDRMFrameDescriptor:

for (plane_no = 0; plane_no < layer.nb_planes; ++plane_no) {
    AVDRMPlaneDescriptor plane = layer.planes[plane_no];
    int object_index = plane.object_index;
    AVDRMObjectDescriptor object = desc->objects[object_index];
    uint64_t modifier = object.format_modifier;
    zwp_linux_buffer_params_v1_add(params, object.fd, plane_no, plane.offset,
                                   plane.pitch, modifier >> 32, modifier & 0xffffffff);
}

No dup(), no rewriting, no transformation. mpv passes through what AVDRMFrameDescriptor says.
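The last two arguments of the add() call in the excerpt are simply the upper and lower 32 bits of the 64-bit format modifier. A minimal sketch of that split, with hypothetical helper names (`split_modifier`, `join_modifier`) that do not exist in mpv:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two 32-bit halves that
 * zwp_linux_buffer_params_v1.add carries for one 64-bit modifier. */
struct modifier_halves {
    uint32_t hi;
    uint32_t lo;
};

/* Recombine the halves into the original 64-bit modifier. */
static uint64_t join_modifier(struct modifier_halves h)
{
    return ((uint64_t)h.hi << 32) | h.lo;
}

/* Split a modifier the same way the excerpt does inline. */
static struct modifier_halves split_modifier(uint64_t modifier)
{
    struct modifier_halves h;
    h.hi = (uint32_t)(modifier >> 32);
    h.lo = (uint32_t)(modifier & 0xffffffffu);
    assert(join_modifier(h) == modifier); /* the split must round-trip */
    return h;
}
```

For the LINEAR modifier (0x0) that hantro reports, both halves are zero, which matches the trailing `0, 0` in the wire trace further down.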

Kwiboo's libavutil/hwcontext_v4l2request.c v4l2request_set_drm_descriptor (lines 138-198) for hantro's NV12 single-planar (V4L2_PIX_FMT_NV12, the format v4l2-ctl --get-fmt-video-mplane-cap reports for /dev/video1 on ohm):

desc->base.nb_objects = num_planes;        // = 1 for single-planar NV12 on hantro
desc->base.objects[0].fd = exportbuffer.fd; // VIDIOC_EXPBUF returns ONE fd
// in v4l2request_set_drm_descriptor:
desc->nb_layers = 1;
layer->nb_planes = 1;
layer->planes[0].object_index = 0;
layer->planes[0].offset = 0;
layer->planes[0].pitch = bytesperline;     // 1920
if (modifier != ARM_VENDOR) {              // hantro outputs LINEAR (0x0), so this is true
    layer->nb_planes = 2;
    layer->planes[1].object_index = 0;     // ← BOTH PLANES point at object 0
    layer->planes[1].offset = pitch * height;  // 1920 * 1088 = 2088960
    layer->planes[1].pitch = layer->planes[0].pitch;
}

Per the source, mpv should produce identical fd values in the two .add() calls — both pulling from desc->objects[0].fd.
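The layout rule from the excerpt can be restated as a small standalone function. This is only a sketch: `struct plane_desc` is a hypothetical stand-in for the three AVDRMPlaneDescriptor fields used here, not the real ffmpeg type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the AVDRMPlaneDescriptor fields of interest. */
struct plane_desc {
    int      object_index;
    uint32_t offset;
    uint32_t pitch;
};

/* Describe a LINEAR NV12 frame that lives in ONE exported object:
 * Y starts at offset 0, interleaved CbCr starts right after
 * pitch * height bytes, and both planes share the same pitch. */
static void nv12_planes_single_object(uint32_t pitch, uint32_t height,
                                      struct plane_desc planes[2])
{
    assert(pitch != 0 && height != 0);

    planes[0].object_index = 0;
    planes[0].offset = 0;
    planes[0].pitch = pitch;

    planes[1].object_index = 0;           /* same object as plane 0 */
    planes[1].offset = pitch * height;    /* 1920 * 1088 = 2088960 on ohm */
    planes[1].pitch = pitch;
}
```

With pitch 1920 and the coded height 1088, plane 1 lands at 2088960, which is also a multiple of 4096.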

What the runtime probe says

v4l2-ctl --get-fmt-video-mplane-cap on ohm /dev/video1:

Pixel Format      : 'NV12' (Y/UV 4:2:0)
Number of planes  : 1
sizeimage=3655712, bytesperline=1920

strace -e trace=ioctl mpv ... confirms ffmpeg only does one VIDIOC_EXPBUF per CAPTURE buffer (index=N, plane=0 → one fd), exactly matching nb_objects = 1.

But WAYLAND_DEBUG=1 mpv ... shows two .add() calls with different fd numbers per buffer:

add(fd 41, 0, 0,       1920, 0, 0)
add(fd 42, 1, 2088960, 1920, 0, 0)

The fd numbers are consecutive, which suggests that libwayland's wl_closure_marshal duplicates the fd (dup with CLOEXEC) at protocol-marshal time and that the trace prints the post-dup fd. On that reading, fd 41 and fd 42 are both dups of the same underlying dma_buf object (originally fd 17 or similar in mpv's table).

Implications for iter1

The earlier phase 0 conclusion that mpv constructs an "internally inconsistent" wl_dmabuf message was wrong. There is no inconsistency at the producer ↔ mpv layer:

  • nb_objects = 1, both planes use object 0 → mpv passes the same fd value into both .add() calls
  • libwayland dups it before sending → wire trace shows different fd numbers, but they refer to the same backing memory
  • Plane 1's offset = 2088960 is correct relative to the (single) underlying allocation

So the green frame is not caused by mpv's or ffmpeg's descriptor construction. The cause lies elsewhere.

New hypothesis space (one of these is the real bug)

  1. Mali-G52 panfrost EGL_EXT_image_dma_buf_import_modifiers regression for NV12 with non-zero plane offset. Source-read 2026-05-08 of Mesa 26.0.6 makes this LESS LIKELY. Trace at references/mesa-26.0.6/:

    • src/gallium/drivers/panfrost/pan_screen.c:443 reports external_only[count] = is_yuv for any YUV format → NV12+LINEAR is external_only, forcing KWin's per-plane import path (Y as R8, UV as DRM_FORMAT_GR88).
    • src/loader/loader_dri_helper.c:43 maps DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM (the byte-order distinction is preserved at the pipe-format level — .r = byte 0 = Cb, .g = byte 1 = Cr — matching KWin's shader assumption at glshadermanager.cpp:189 result.yz = sampler1.rg).
    • src/gallium/drivers/panfrost/pan_resource.c:354-358 captures whandle->offset into explicit_layout.offset_B for the import.
    • src/panfrost/lib/pan_mod.c:663-667 (linear modifier slice-init) honors layout_constraints->offset_B directly with only an alignment check; 2,088,960 is page-aligned, satisfies 16-/64-/4096-byte alignments alike.
    • src/panfrost/lib/pan_texture.c:361,561,660,773,817 set the texture descriptor's GPU base to plane->base + slayout->offset_B — i.e., sampling reads from bo_gpu + 2,088,960.
    • Conclusion: panfrost source code as written DOES honor a non-zero plane offset. A source-read alone cannot rule out a runtime bug, but the obvious places are clean. To rule it in or out definitively, write the EGL importer harness with synthetic NV12 data.
  2. KWin's wl_dmabuf import logic deduplicates the dup'd fds incorrectly. RULED OUT 2026-05-08 by source-read of KWin 6.6.4 at references/kwin-6.6.4/src/wayland/linuxdmabufv1clientbuffer.cpp + src/opengl/{eglbackend,egldisplay}.cpp. (a) LinuxDmaBufParamsV1::zwp_linux_buffer_params_v1_add simply stores per-plane fd/offset/pitch in separate slots, no dedup. (b) LinuxDmaBufParamsV1::test() does lseek(SEEK_END) per plane + range checks against the resulting size; our 3,657,728 satisfies all of them. (c) EglDisplay::importDmaBufAsImage (both the combined and per-plane forms) passes dmabuf.fd[i], dmabuf.offset[i], dmabuf.pitch[i] straight to eglCreateImage(EGL_LINUX_DMA_BUF_EXT, ...) with no transformation. (d) EglBackend::testImportBuffer chooses between combined import and per-plane (Y as R8 / UV as RG88 from offset 2,088,960) based on whether NV12+LINEAR is in nonExternalOnlySupportedDrmFormats(). Either path forwards offset = 2,088,960 to the driver. KWin is innocent.

  3. hantro kernel driver exports a dma_buf with size < full allocation. RULED OUT 2026-05-08 by /tmp/expbuf_probe.c on ohm. Driver hantro-vpu on rk3568-vpu-dec reports CAPTURE: NV12 1920x1088 num_planes=1 sizeimage=3655712; VIDIOC_EXPBUF yields fd whose lseek(fd, 0, SEEK_END) = 3,657,728 (page-rounded up from 3,655,712). Offset 2,088,960 (plane 1 base) is firmly inside the exported size. Kernel is innocent.

    Side observation worth recording: sizeimage = 3,655,712 is bigger than naïve NV12's 1920×1088×1.5 = 3,133,440. The 522,272-byte excess sits past the UV plane (Y at [0, 2,088,960), UV at [2,088,960, 3,133,440), trailing padding at [3,133,440, 3,655,712)). On Rockchip codecs that tail commonly holds per-frame motion-vector / decoder-context data. Confirms ffmpeg's hardcoded planes[1].offset = pitch*height = 2,088,960 is correct.

  4. kwin-fourier 0001 still has an effect we missed. We did rule out kwin-fourier with a compositor-replacement A/B test, but that test ran on an earlier kernel/Mesa combo. Worth verifying that the test environment is fully reset.

  6. DMA cache coherency between hantro VPU and Mali GPU (NEW 2026-05-08, derived from green-color math). RULED OUT 2026-05-08 phase 3 by the iter1 patch test. Patched mpv 0.41.0 (mpv-fourier-1:0.41.0-9) to call DMA_BUF_IOCTL_SYNC(SYNC_START|SYNC_RW) + matching SYNC_END on each EXPBUF fd in both vaapi_dmabuf_importer and drmprime_dmabuf_importer before zwp_linux_buffer_params_v1_add(). Built via Gitea Actions, installed on ohm. Ran mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause --start=00:00:00.42 fourier-test/bbb_1080p30_h264.mp4, captured screenshot. Result: byte-identical to baseline (md5 c8c8e9b88521a0069f709d483451c3d4). The userspace cache-sync ioctl has no effect. Either hantro's dma_buf_ops->begin_cpu_access is a no-op (likely on Rockchip, where many dma-buf-heap allocations are non-coherent CPU-cached but rely on different sync paths), or the gap is on the GPU consumer side and CPU-cache state is irrelevant.

    Critical phase-3 observation: --hwdec=v4l2request --vo=gpu (texture-upload path) is known-working and renders correctly. That path mmaps the dma_buf into mpv's CPU memory, then uploads to a GL texture via glTexSubImage2D. The CPU CAN read valid YUV data from the buffer; only the zero-copy dma_buf-to-GPU import path renders zeros. This rules out "data isn't there" entirely and concentrates the hypothesis on the dma_buf → Mali GPU import/translation step itself (H7 territory).

  7. Panfrost dma_buf import doesn't perform GPU-side cache invalidation OR doesn't import with the right BO type when mapping an imported fd. Even if data has reached DRAM, Mali's MMU/cache may serve stale reads, OR the imported BO is created without the DMA-COHERENT flag panfrost expects, leaving Mali sampling un-snooped memory. Phase 3 narrowed: this is now the LEADING hypothesis. Sub-cases to investigate:

    • 7a. Panfrost's dma_buf_attach + dma_buf_map_attachment calls miss cache-invalidate. Source is drivers/gpu/drm/panfrost/panfrost_gem.c (panfrost_gem_prime_import_sg_table), with the import hook wired up in panfrost_drv.c.
    • 7b. The imported BO is mapped non-coherent in Mali's MMU, but the buffer was allocated cacheable (or vice-versa). Sync mismatch.
    • 7c. Panfrost uses ioremap_wc / ioremap_cache the wrong way for hantro-allocated CMA pages.
    • 7d. The Mali-G52 panfrost path does NOT support imported dma_buf for sampling at all — only for scanout direct-pass-through. (Less likely; would mean GL texture creation should fail, but in our case it succeeds and renders zeros.)
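The green-color reasoning behind the cache-coherency hypothesis (H6) can be reproduced directly: a buffer that samples as all zeros, interpreted as BT.709 limited-range Y'CbCr, converts to the dark green seen in the screenshot. The sketch below uses the standard BT.709 coefficients. The function names are hypothetical and it is not the README's own code.

```c
#include <assert.h>
#include <stdint.h>

struct rgb8 { uint8_t r, g, b; };

/* Clamp to [0, 255] and round to nearest. */
static uint8_t clamp_to_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (uint8_t)(v + 0.5);
}

/* 8-bit BT.709 limited-range Y'CbCr -> 8-bit full-range R'G'B'. */
static struct rgb8 bt709_limited_to_rgb(uint8_t y, uint8_t cb, uint8_t cr)
{
    double yn = (y - 16) / 219.0;    /* luma normalised to 0..1 */
    double pb = (cb - 128) / 224.0;  /* chroma normalised to -0.5..0.5 */
    double pr = (cr - 128) / 224.0;
    struct rgb8 out;

    out.r = clamp_to_u8(255.0 * (yn + 1.5748 * pr));
    out.g = clamp_to_u8(255.0 * (yn - 0.1873 * pb - 0.4681 * pr));
    out.b = clamp_to_u8(255.0 * (yn + 1.8556 * pb));
    return out;
}

/* Sanity check against the nominal black and white code points. */
static void bt709_self_check(void)
{
    struct rgb8 black = bt709_limited_to_rgb(16, 128, 128);
    struct rgb8 white = bt709_limited_to_rgb(235, 128, 128);

    assert(black.r == 0 && black.g == 0 && black.b == 0);
    assert(white.r == 255 && white.g == 255 && white.b == 255);
}
```

`bt709_limited_to_rgb(0, 0, 0)` gives red and blue clamped to 0 and green of about 76.9, which rounds to 77. That is the RGB(0, 77, 0) recorded for the green frame, consistent with the GPU sampling zeros for both planes.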

a. Write a small C harness that does VIDIOC_EXPBUF on a hantro CAPTURE buffer and reports fd size + backing dma_buf info. Decides hypothesis 3 in 30 minutes. Run on ohm directly.

b. Patch mpv with MP_VERBOSE logging of the AVDRMFrameDescriptor fields at .add()-call time (nb_objects, planes[].object_index, planes[].offset, objects[].size). Confirms the source-read is correct at runtime. Drop into mpv-fourier's prepare() slot, bump pkgrel, rebuild on fermi (~10 min CI).

c. Read KWin's wl_dmabuf import logic (KDE Plasma 6 / KWin 6.6.4 source) for how it handles multiple-fd-same-buffer cases. ~30 min source-read.

d. Update marfrit/dmabuf-modifier-triage#1 with this revised analysis. The current issue body claims the bug is in mpv's plane-semantics translation — that conclusion is now overturned.
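The size and range checks that steps a and c revolve around are small enough to sketch in isolation. The code below is not /tmp/expbuf_probe.c and not KWin's test(). It is a hypothetical restatement of the same arithmetic: take the fd size from lseek(SEEK_END), then check that each plane's byte range fits inside it. The VIDIOC_EXPBUF call that produces the fd is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Size of the object behind fd as reported by lseek(SEEK_END), or -1. */
static off_t fd_size(int fd)
{
    off_t size = lseek(fd, 0, SEEK_END);

    if (size >= 0)
        lseek(fd, 0, SEEK_SET); /* put the file offset back at the start */
    return size;
}

/* Round n up to the next multiple of align (for example the page size). */
static uint64_t round_up(uint64_t n, uint64_t align)
{
    assert(align != 0);
    return (n + align - 1) / align * align;
}

/* 1 if `rows` rows of `pitch` bytes starting at `offset` fit in `size`. */
static int plane_fits(uint64_t size, uint64_t offset,
                      uint64_t pitch, uint64_t rows)
{
    return offset <= size && pitch * rows <= size - offset;
}

/* Print one plane's byte range and whether it is inside the buffer. */
static void report_plane(const char *name, uint64_t size, uint64_t offset,
                         uint64_t pitch, uint64_t rows)
{
    printf("%s: [%llu, %llu) %s\n", name,
           (unsigned long long)offset,
           (unsigned long long)(offset + pitch * rows),
           plane_fits(size, offset, pitch, rows) ? "fits" : "OUT OF RANGE");
}
```

With the numbers recorded above, `round_up(3655712, 4096)` is 3657728, the Y plane occupies [0, 2088960), and the UV plane (544 rows of 1920 bytes) occupies [2088960, 3133440). Both sit inside the exported size.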

Status

  • iter1 phase 3 closed 2026-05-08. The DMA_BUF_IOCTL_SYNC patch (mpv-fourier-1:0.41.0-9, both vaapi_dmabuf_importer + drmprime_dmabuf_importer) had zero effect — green-frame screenshot byte-identical to baseline. H6 ruled out.
  • Five hypotheses ruled out (H2, H3, H5, H6, the ad-hoc offset variant). H1 less-likely after Mesa source-read but not conclusively excluded.
  • Leading hypothesis: H7 — panfrost's dma_buf import / GPU-side cache or BO-type handling. Pinned by the known-good counter-test: mpv --hwdec=v4l2request --vo=gpu (CPU-mmap → glTexSubImage2D upload path) renders correctly. So the buffer DOES contain valid data; only the zero-copy dma_buf→Mali path renders zeros.
  • Acceptance criterion (screenshots/frame10_expected.png) is unchanged.
  • Delivery vehicle, re-evaluated again: with H6 gone, the userspace mpv workaround is no longer the right delivery vehicle for this iteration. The fix would land in one of: (a) the panfrost kernel-mode driver (drivers/gpu/drm/panfrost/), (b) Mesa-panfrost userspace if there's an EGL_image attribute / format-import quirk, (c) hantro driver-side allocation flags (V4L2_MEMORY_DMABUF + appropriate cache attribute), or (d) a kernel bridge (e.g., a cache-aware dma_buf ioctl in the style of DMA_BUF_SET_NAME; no such variant exists today, so this would be new kernel API).
  • Next probe options ranked:
    1. Read panfrost kernel-mode dma_buf import (~45 min, cheap source-read, no hardware): inspect drivers/gpu/drm/panfrost/panfrost_gem.c panfrost_gem_prime_import_sg_table and Mali MMU mapping for cache attributes / IO-coherency settings. May spot the gap directly.
    2. EGL importer harness with synthetic NV12 in udmabuf (~1-2h): allocate via udmabuf (CPU-coherent), write known YUV pattern, eglCreateImage from the udmabuf, render, glReadPixels. If it reads back correct data → bug is hantro-allocated-buffer-specific (cache-attribute mismatch). If it ALSO reads zeros → general panfrost dma_buf import bug (less likely).
    3. Run mpv with MESA_DEBUG=verbose + PAN_MESA_DEBUG=sync,trace (~15 min): may show something at the import boundary. Cheap recon.
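For probe option 2, the part that can be prepared without hardware is the synthetic pattern itself. The sketch below fills a CPU-visible LINEAR NV12 buffer with a horizontal luma ramp and constant chroma, so a glReadPixels result can be compared against known values. The function name and the choice of pattern are assumptions. The udmabuf allocation and the eglCreateImage import are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a LINEAR NV12 buffer with a left-to-right luma ramp over the
 * limited range 16..235 and a constant CbCr pair. Layout matches the
 * single-object case: Y at offset 0, CbCr at pitch * height. */
static void fill_nv12_pattern(uint8_t *buf, size_t buf_size,
                              uint32_t width, uint32_t height, uint32_t pitch,
                              uint8_t cb, uint8_t cr)
{
    size_t uv_offset = (size_t)pitch * height;

    assert(width >= 2 && width <= pitch);
    assert(width % 2 == 0 && height % 2 == 0);
    assert(buf_size >= uv_offset + (size_t)pitch * (height / 2));

    /* Y plane: one byte per pixel. */
    for (uint32_t row = 0; row < height; row++) {
        uint8_t *line = buf + (size_t)row * pitch;
        for (uint32_t x = 0; x < width; x++)
            line[x] = (uint8_t)(16 + (219u * x) / (width - 1));
    }

    /* CbCr plane: half the rows, Cb/Cr interleaved per 2x2 pixel block. */
    for (uint32_t row = 0; row < height / 2; row++) {
        uint8_t *line = buf + uv_offset + (size_t)row * pitch;
        for (uint32_t x = 0; x < width; x += 2) {
            line[x] = cb;
            line[x + 1] = cr;
        }
    }
}
```

Using a non-zero, asymmetric chroma pair makes both failure modes visible: all-zero reads show up as the familiar green, and swapped Cb/Cr shows up as a hue shift.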

mpv-fourier-1:0.41.0-9 keeps the no-op patch installed for now — it's harmless. Future iter close (iter2 or further phase under iter1) will replace it with whatever the actual fix is, OR pkgrel-bump back to remove the dead patch if the fix lands elsewhere (kernel/Mesa).
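For reference, the bracket that the no-op patch issues looks roughly like the sketch below. It is not the actual mpv-fourier patch: the helper names are hypothetical and the EINTR/EAGAIN retry loop is an assumption based on the dma-buf UAPI header. Only `DMA_BUF_IOCTL_SYNC`, `struct dma_buf_sync` and the `DMA_BUF_SYNC_*` flags come from `<linux/dma-buf.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <linux/dma-buf.h>
#include <sys/ioctl.h>

/* Issue one DMA_BUF_IOCTL_SYNC call, retrying if it was interrupted. */
static int dmabuf_sync(int fd, __u64 flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
    return ret;
}

/* Open and immediately close a CPU-access window on a dma_buf fd.
 * Returns 0 on success, -1 with errno set on failure. */
static int dmabuf_cpu_access_bracket(int fd)
{
    assert(fd >= 0);
    if (dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW) != 0)
        return -1;
    /* CPU reads or writes would go here; the patch performs none. */
    return dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW);
}
```

Because nothing touches the buffer between START and END, the bracket can only help if the exporter's begin/end CPU-access hooks do cache maintenance as a side effect, which matches the "zero effect" result.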