mpv vo_dmabuf_wayland: plane-semantics mismatch — different fds per plane combined with single-allocation offset for plane 1 (root cause of the green on ohm) #1

Closed
opened 2026-05-08 14:28:37 +00:00 by marfrit · 9 comments
Owner

Summary

WAYLAND_DEBUG=1 capture of mpv --hwdec=v4l2request --vo=dmabuf-wayland against hantro-decoded NV12 H.264 on ohm (RK3566 / Mali-G52 / kernel 7.0.danctnix1-1-pinetab2-danctnix-besser, KWin 6.6.4 OpenGL backend / Mesa-panfrost 26.0.6) shows the mpv-side wl_dmabuf protocol message is internally inconsistent. The zwp_linux_buffer_params_v1.add() calls combine per-plane fds (V4L2 MPLANE export semantics) with a single-allocation offset for plane 1 — KWin imports plane 1 from the wrong byte address and reads zeros for the UV chroma plane, producing the dark-green frames the user sees.

Discovered as the layer-isolation conclusion of the Phase 0 work in this repo. All other suspects (kwin-fourier 0001 patch, Mesa 26.0.5 vs 26.0.6, libva, decoder content correctness, color tagging, Wayland/KWin generally, kernel 6.19.10 vs 7.0, KWin Vulkan vs OpenGL backend) were ruled out by A/B testing. See phase0_findings.md for the elimination ladder.

Trace (verbatim from ohm session, 2026-05-08)

WAYLAND_DEBUG=1 mpv --hwdec=v4l2request --vo=dmabuf-wayland --frames=3 \
    fourier-test/bbb_1080p30_h264.mp4

[ 675518.484]  -> zwp_linux_dmabuf_v1#27.create_params(new id zwp_linux_buffer_params_v1#54)
[ 675518.609]  -> zwp_linux_buffer_params_v1#54.add(fd 41, 0, 0,       1920, 0, 0)   ← Y plane
[ 675518.632]  -> zwp_linux_buffer_params_v1#54.add(fd 42, 1, 2088960, 1920, 0, 0)   ← UV plane
[ 675518.652]  -> zwp_linux_buffer_params_v1#54.create_immed(new id wl_buffer#56, 1920, 1080, 842094158, 0)
[ 675518.667]  -> zwp_linux_buffer_params_v1#54.destroy()

All subsequent frames follow the same pattern: plane 0 uses fd N at offset 0, plane 1 uses fd N+1 at offset 2088960.

  • 1920×1088 = 2088960, so the offset is the size of a height-aligned (1080→1088) Y plane in NV12. That offset is correct for a single underlying allocation where Y is at [0..2088960) and UV is at [2088960..3133440).
  • BUT plane 0 uses fd 41 and plane 1 uses fd 42, which are different file descriptors. V4L2 MPLANE EXPBUF returns one fd per plane, so fd 41 would be the Y plane's fd and fd 42 the UV plane's fd. Each fd then addresses its own plane starting at byte offset 0, even if both are backed by the same memory.
  • Result of add(fd 42, plane=1, offset=2088960, ...): KWin imports plane 1 from fd 42 starting at offset 2088960. If fd 42 maps only the UV plane (1920×544 = 1044480 bytes, about 1 MB, for 1920×1080 NV12), offset 2088960 is past EOF → KWin reads zeros for UV → all-zero NV12 buffer.
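The arithmetic in the bullets above, and the format code 842094158 passed to create_immed, can be checked with a small sketch. The helper names (drm_fourcc_to_str, nv12_single_alloc) and the 16-row height alignment are assumptions made for illustration; they are not taken from mpv, ffmpeg, or the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a DRM fourcc code (little-endian packed ASCII) into a string.
 * out must have room for 5 bytes: 4 characters plus the terminating NUL. */
static void drm_fourcc_to_str(uint32_t fourcc, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((fourcc >> (8 * i)) & 0xff);
    out[4] = '\0';
}

/* Byte layout of NV12 when both planes live in one allocation. */
struct nv12_layout {
    size_t y_offset;  /* Y plane always starts at 0 */
    size_t uv_offset; /* UV plane starts right after the aligned Y plane */
    size_t min_size;  /* first byte past the UV plane */
};

/* Round v up to the next multiple of a (a > 0). */
static uint32_t align_up(uint32_t v, uint32_t a)
{
    return (v + a - 1) / a * a;
}

/* height_align is an assumption: 16 reproduces the 1080 -> 1088 rounding
 * seen in the trace. */
static struct nv12_layout nv12_single_alloc(uint32_t pitch, uint32_t height,
                                            uint32_t height_align)
{
    uint32_t h = align_up(height, height_align);
    struct nv12_layout l;

    l.y_offset = 0;
    l.uv_offset = (size_t)pitch * h;
    /* NV12 chroma has half the rows of luma at the same pitch. */
    l.min_size = l.uv_offset + (size_t)pitch * (h / 2);
    return l;
}
```

Calling nv12_single_alloc(1920, 1080, 16) gives uv_offset = 2088960 and min_size = 3133440, the two boundaries quoted above, and drm_fourcc_to_str(842094158, buf) yields "NV12".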

Why this produces dark green specifically

NV12 chroma is biased: U=128 + V=128 = "no color" (luma-only grayscale). When U=V=0 (instead of 128):

  • Y=0 + U=0 + V=0 through a limited-range BT.709 → RGB matrix clamps to approximately (0, 77, 0), a dark green. The BT.601 matrix would give a noticeably brighter green, roughly (0, 136, 0).

Matches the user's observed symptom exactly.
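The colour prediction can be reproduced with a short calculation. The sketch below assumes limited-range (16..235 luma, 16..240 chroma) input and the standard luma coefficients; whether KWin's shader uses exactly this matrix for this stream was not checked here, and the function name yuv_limited_to_rgb is hypothetical.

```c
#include <stdint.h>

struct rgb8 { uint8_t r, g, b; };

/* Clamp a normalised value to [0, 1] and scale to 8 bits with rounding. */
static uint8_t to_u8(double v)
{
    if (v < 0.0) v = 0.0;
    if (v > 1.0) v = 1.0;
    return (uint8_t)(v * 255.0 + 0.5);
}

/* Convert one limited-range Y'CbCr sample to full-range RGB.
 * kr, kb are the luma coefficients:
 *   BT.709: kr = 0.2126, kb = 0.0722
 *   BT.601: kr = 0.299,  kb = 0.114 */
static struct rgb8 yuv_limited_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                                      double kr, double kb)
{
    double kg = 1.0 - kr - kb;
    double yn = (y - 16) / 219.0;   /* luma scaled to 0..1 */
    double cb = (u - 128) / 224.0;  /* chroma scaled to -0.5..0.5 */
    double cr = (v - 128) / 224.0;

    double r = yn + 2.0 * (1.0 - kr) * cr;
    double b = yn + 2.0 * (1.0 - kb) * cb;
    double g = yn - (2.0 * kb * (1.0 - kb) / kg) * cb
                  - (2.0 * kr * (1.0 - kr) / kg) * cr;

    struct rgb8 out = { to_u8(r), to_u8(g), to_u8(b) };
    return out;
}
```

With y = u = v = 0, red and blue go negative and clamp to 0, while green stays positive because both chroma terms are subtracted with a negative sign. That is why an all-zero buffer comes out green rather than black.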

Why mpv --vo=gpu doesn't trigger this

mpv's vo=gpu imports the dmabuf via libva's vaExportSurfaceHandle or ffmpeg's av_hwframe_transfer_data directly into mpv's own EGL context. Those APIs return the correct per-plane structure (each plane fd maps the corresponding plane at offset 0). mpv-side EGL_EXT_image_dma_buf_import then samples both planes correctly. The bug only manifests in mpv's vo_dmabuf_wayland code path.

Why mpv --vo=wlshm doesn't trigger this

wlshm goes through CPU memcpy into a wl_shm buffer. Bypasses the dmabuf protocol entirely. No plane-semantics translation issue.

Where the bug lives

mpv's video/out/vo_dmabuf_wayland.c (or its drm_prime helper, e.g. video/out/hwdec/dmabuf_interop_pl.c if libplacebo is involved). The translation from the producer's plane info (AVDRMFrameDescriptor for ffmpeg path, VADRMPRIMESurfaceDescriptor for libva path) to wl_dmabuf .add() calls is mishandling one of two cases:

Case A: producer uses single-fd-multiple-offsets convention

The producer's plane info has all planes referencing the same fd, with offsets pointing to where each plane lives in that one allocation. mpv translation should pass add(producer.fd, i, producer.plane[i].offset, producer.plane[i].pitch, ...) for every plane — same fd repeated, with offsets.

Case B: producer uses multi-fd-zero-offset convention

The producer's plane info has each plane with its own fd, each fd already pointing at the start of its own plane's allocation. mpv translation should pass add(producer.plane[i].fd, i, 0, producer.plane[i].pitch, ...) for every plane — different fds, offset always 0.

The trace shows mpv mixes both: different fds (Case B) and non-zero offset for plane 1 (Case A). One of the two reads from the producer is wrong — either the plane fds are being assigned right but offsets are wrong, or the offsets are being assigned right but fds should all be the same.
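The Case A / Case B distinction can be written down as a small classifier over the (fd, offset) pairs seen in a trace. This is only a sketch: the types and names are hypothetical, and it compares fd numbers literally, so it cannot tell whether two different numbers refer to the same underlying buffer.

```c
#include <stdint.h>

enum plane_convention {
    CONV_SINGLE_FD_OFFSETS,    /* Case A: one fd, per-plane offsets */
    CONV_MULTI_FD_ZERO_OFFSET, /* Case B: one fd per plane, offset 0 */
    CONV_MIXED                 /* neither convention holds */
};

struct plane_ref {
    int fd;
    uint32_t offset;
};

/* Classify n (n >= 1) plane references by fd number and offset only. */
static enum plane_convention classify_planes(const struct plane_ref *p, int n)
{
    int all_same_fd = 1, all_distinct_fd = 1, all_zero_offset = 1;

    for (int i = 0; i < n; i++) {
        if (p[i].offset != 0)
            all_zero_offset = 0;
        if (p[i].fd != p[0].fd)
            all_same_fd = 0;
        for (int j = 0; j < i; j++)
            if (p[j].fd == p[i].fd)
                all_distinct_fd = 0;
    }

    if (all_same_fd)
        return CONV_SINGLE_FD_OFFSETS;
    if (all_distinct_fd && all_zero_offset)
        return CONV_MULTI_FD_ZERO_OFFSET;
    return CONV_MIXED;
}
```

Fed the pairs from the trace, (41, 0) and (42, 2088960), this returns CONV_MIXED; either of the two candidate patterns in the fix path returns one of the consistent values.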

Suggested fix path

  1. Audit how vo_dmabuf_wayland.c extracts plane fd + offset from AVDRMFrameDescriptor (ffmpeg path) and VADRMPRIMESurfaceDescriptor (libva path). Check whether mpv is reading the wrong field.
  2. Compare against working VO paths in mpv. Outside mpv, gst-launch with waylandsink would be a useful cross-check, since it does not go through the same code path.
  3. Verify with a fixed mpv build that the picture displays correctly when:
    • Either: add(fd 41, 0, 0, 1920, 0, 0); add(fd 41, 1, 2088960, 1920, 0, 0) — same fd for both planes
    • Or: add(fd 41, 0, 0, 1920, 0, 0); add(fd 42, 1, 0, 1920, 0, 0) — different fds, both offset 0

Decisive verifier (deferred — needs libva-multiplanar iter9 fix first)

Re-run the WAYLAND_DEBUG capture with --hwdec=vaapi (libva path) once libva-v4l2-request-fourier#1 is fixed and the cap_pool/REQBUFS cascade no longer prevents libva playback. If the libva path produces the same wrong .add() pattern → bug is in mpv VO. If it produces a different wrong pattern → bug is in the producer (libva or ffmpeg V4L2 hwaccel respectively).

Cross-references

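  • marfrit/libva-multiplanar#1 (https://git.reauktion.de/marfrit/libva-multiplanar/issues/1): the parent triage issue covering the user-visible green frames symptom. This issue is the discovered root cause.
  • marfrit/libva-v4l2-request-fourier#1 (https://git.reauktion.de/marfrit/libva-v4l2-request-fourier/issues/1): separate libva cap_pool/REQBUFS cascade bug; gates the libva-path verifier.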
  • Phase 0 elimination ladder in ~/src/dmabuf-modifier-triage/phase0_findings.md.

Environment

  • Host: ohm (PineTab2, RK3566, hantro G1, Mali-G52)
  • Kernel: 7.0.0-danctnix1-1-pinetab2-danctnix-besser (the bug ALSO reproduces on 6.19.10-danctnix1-1-pinetab2 — kernel-independent)
  • Mesa: 26.0.6-arch1.1 (also reproduces on 26.0.5 — Mesa-version-independent)
  • KWin: 6.6.4 OpenGL backend (also reproduces on stock arch kwin without kwin-fourier patches — compositor-variant-independent)
  • mpv: v0.41.0 (built Feb 14 2026)
  • ffmpeg-v4l2-request: 2:8.1-3
Author
Owner

Reference screenshot — frame 10

What the bug looks like on ohm. Captured 2026-05-08 via mpv --hwdec=v4l2request --vo=dmabuf-wayland --pause --start=00:00:00.42 --fullscreen fourier-test/bbb_1080p30_h264.mp4 + spectacle -b -f -n.

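[Image: frame 10 dmabuf-wayland green, https://git.reauktion.de/marfrit/dmabuf-modifier-triage/raw/branch/master/screenshots/frame10_dmabuf_green.png]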

Uniform dark green ≈ RGB(0, 75, 0). The black bars at top and bottom are letterboxing from fullscreen presentation on PineTab2's 1280×800 display vs the 1920×1080 / 16:9 source, not part of the bug. The green hue is close to the predicted output of an all-zero NV12 buffer through a limited-range BT.709 → RGB conversion (Y=0 + U=0 + V=0 ≈ RGB(0, 77, 0)), which is consistent with the diagnosis: KWin reads the UV plane past EOF on fd 42 and gets zeros for chroma.

Committed at screenshots/frame10_dmabuf_green.png (commit e293078).

Author
Owner

Phase 2 source-read overturns the original root-cause analysis (2026-05-08)

When the iter1 phase 2 source-read started against mpv 0.41.0 + Kwiboo's ffmpeg fork at commit b57fbbe (the _commit pin in marfrit-packages/arch/ffmpeg-v4l2-request-fourier/PKGBUILD), the original conclusion in this issue body — that mpv mixes per-plane fds with a single-allocation offset for plane 1 — did not survive contact with the actual code. The earlier diagnosis was wrong. Recording the revision here so future readers don't act on it.

What the source actually says

mpv's video/out/vo_dmabuf_wayland.c::drmprime_dmabuf_importer (lines 250-277):

for (plane_no = 0; plane_no < layer.nb_planes; ++plane_no) {
    AVDRMPlaneDescriptor plane = layer.planes[plane_no];
    int object_index = plane.object_index;
    AVDRMObjectDescriptor object = desc->objects[object_index];
    uint64_t modifier = object.format_modifier;
    zwp_linux_buffer_params_v1_add(params, object.fd, plane_no, plane.offset,
                                   plane.pitch, modifier >> 32, modifier & 0xffffffff);
}

No dup(), no rewriting. mpv passes the producer's AVDRMFrameDescriptor through unchanged.
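The last two arguments of add() are the high and low 32-bit halves of the 64-bit format modifier, which is why the trace ends each add() with "0, 0" for a LINEAR (modifier 0) buffer. A minimal sketch of that split, with a hypothetical helper name:

```c
#include <stdint.h>

/* Split a 64-bit DRM format modifier into the two 32-bit halves that
 * zwp_linux_buffer_params_v1.add() carries on the wire. */
static void split_modifier(uint64_t modifier, uint32_t *hi, uint32_t *lo)
{
    *hi = (uint32_t)(modifier >> 32);
    *lo = (uint32_t)(modifier & 0xffffffffu);
}
```

For modifier 0 both halves are 0. A vendor modifier such as 0x0800000000000001 would show up as hi = 0x08000000, lo = 1.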

Kwiboo's libavutil/hwcontext_v4l2request.c::v4l2request_set_drm_descriptor (lines 138-198):

desc->base.nb_objects = num_planes;        // = 1 for hantro single-planar NV12
desc->base.objects[0].fd = exportbuffer.fd;
layer->planes[0].object_index = 0;
layer->planes[0].offset = 0;
layer->planes[0].pitch = bytesperline;     // 1920
if (modifier != ARM_VENDOR) {              // hantro outputs LINEAR (0x0)
    layer->nb_planes = 2;
    layer->planes[1].object_index = 0;     // ← BOTH PLANES point at object 0
    layer->planes[1].offset = pitch * height;  // 1920 * 1088 = 2088960
}

Per this code, mpv should produce two .add() calls with identical fd values — both pull from desc->objects[0].fd.

What the runtime probe says

v4l2-ctl --get-fmt-video-mplane-cap on ohm /dev/video1:

Pixel Format      : 'NV12' (Y/UV 4:2:0)
Number of planes  : 1
plane_fmt[0]: sizeimage=3655712, bytesperline=1920

strace -e trace=ioctl mpv ... confirms ffmpeg does one VIDIOC_EXPBUF per CAPTURE buffer, returning a single fd. nb_objects = 1 matches.

Why the WAYLAND_DEBUG trace shows different fds

The original analysis read the trace literally:

add(fd 41, 0, 0,       1920, 0, 0)
add(fd 42, 1, 2088960, 1920, 0, 0)

Most likely explanation: libwayland's wl_closure_marshal dup_cloexec's the fd at protocol-marshal time, and WAYLAND_DEBUG prints the post-dup value. Both .add() calls pass the same source fd into libwayland; libwayland creates two dups (consecutive integers in the fd table), sends each via SCM_RIGHTS to the compositor, and prints the dup'd values. The two dups refer to the same underlying dma_buf object.

This means the wl_dmabuf message is not internally inconsistent:

  • Both planes reference the same backing memory (via fds that happen to have different table indices)
  • Plane 1's offset (2088960) is correct relative to the (single) underlying allocation that contains both Y and UV
  • ffmpeg's objects[0].size = 3655712 is the full allocation size — well above plane 1's offset

Nothing in the producer chain looks wrong by static analysis.
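The dup explanation can be illustrated in isolation: a dup'd descriptor gets a new number but still refers to the same file. The sketch below compares st_dev/st_ino from fstat(2). Using inode identity to recognise two descriptors of the same dma_buf is an assumption here (it relies on the kernel giving each buffer its own inode), and the helper names are hypothetical.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both descriptors refer to the same inode, 0 if they do
 * not, and -1 if fstat() fails on either of them. */
static int fds_same_inode(int a, int b)
{
    struct stat sa, sb;

    if (fstat(a, &sa) != 0 || fstat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Duplicate fd the way a marshalling layer might, and report whether the
 * duplicate has a different number but the same inode.
 * Returns 1 if so, 0 if not, -1 if dup() fails. */
static int dup_changes_number_only(int fd)
{
    int d = dup(fd);
    int result;

    if (d < 0)
        return -1;
    result = (d != fd) && fds_same_inode(fd, d) == 1;
    close(d);
    return result;
}
```

A check like fds_same_inode() would have to run on the descriptors a process actually holds, for example in a test client before the add() calls or in the compositor after receiving them; the numbers printed by WAYLAND_DEBUG are only meaningful inside the sending process at marshal time.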

Where the bug actually lives — new hypothesis space

  1. Mali-G52 panfrost EGL_EXT_image_dma_buf_import_modifiers regression for NV12 with non-zero plane offset. The driver may sample plane 1 from offset 0 of the imported fd instead of offset 2088960, returning zero-fill UV. Testable: a minimal EGL importer C program against a known NV12 dmabuf with offsets, read back via glReadPixels.
  2. KWin's linux-dmabuf-v1 import deduplicates the dup'd fds incorrectly. KWin may detect (via kcmp(2) or dma_buf_get_unique_id or similar) that the two received fds reference the same dma_buf, then mishandle the per-plane offsets. Source path: src/wayland/linuxdmabufv1clientbuffer.cpp in KDE Plasma 6 + the OpenGL compositor backend's EGL import.
  3. hantro kernel driver caps the exported dma_buf size to just the Y plane (2088960 bytes). If true, KWin's read at offset 2088960 falls past EOF → silent zero-fill → green frame. Testable in 30 minutes with a small C program on ohm: lseek(EXPBUF_fd, 0, SEEK_END) and check whether it returns 2,088,960 or 3,655,712.
  4. Earlier kwin-fourier A/B test was incomplete — it stopped on a stale Plasma session that didn't actually use stock arch kwin's compositor backend. Worth retesting from a fresh login with stock kwin confirmed running.

Recommended decisive next probe

Hypothesis 3 (kernel-side fd size) is the cheapest to falsify. If lseek on the EXPBUF fd reports 3,655,712 (full allocation), move on to hypotheses 1, 2, and 4. If it reports 2,088,960 (Y plane only), the bug is in hantro's kernel dma_buf export code and the fix lands in the kernel driver, not in any user-space component.
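The size check itself is only a few lines once an exported fd is in hand. The sketch below leaves out the V4L2 setup (format negotiation, REQBUFS, VIDIOC_EXPBUF) and takes an already-open descriptor; the enum and function names are hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

enum export_verdict {
    EXPORT_COVERS_UV, /* the fd is large enough to hold the UV plane */
    EXPORT_TOO_SMALL, /* the UV range extends past the end of the fd */
    EXPORT_ERROR      /* lseek() failed, e.g. fd is not seekable */
};

/* Ask the kernel how large the object behind fd is and check whether
 * the byte range [uv_offset, uv_offset + uv_size) fits inside it. */
static enum export_verdict check_export_size(int fd, off_t uv_offset,
                                             off_t uv_size)
{
    off_t size = lseek(fd, 0, SEEK_END);

    if (size < 0) {
        perror("lseek");
        return EXPORT_ERROR;
    }
    printf("fd %d: size=%lld, UV range=[%lld, %lld)\n", fd,
           (long long)size, (long long)uv_offset,
           (long long)(uv_offset + uv_size));
    return size >= uv_offset + uv_size ? EXPORT_COVERS_UV : EXPORT_TOO_SMALL;
}
```

For the buffer discussed here the call would be check_export_size(fd, 2088960, 1044480): a reported size of 2,088,960 returns EXPORT_TOO_SMALL, while anything at or above 3,133,440 returns EXPORT_COVERS_UV.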

Cross-references

  • Source-read details + runtime probe outputs: ~/src/dmabuf-modifier-triage/phase2_iter1_findings.md (commit eddd9ef on origin).
  • Acceptance criterion at screenshots/frame10_expected.png is unchanged regardless of which layer turns out to be at fault.
  • Delivery vehicle (mpv-fourier-1:0.41.0-8) is still right if the fix turns out to be a defensive workaround in mpv. Otherwise the patch lands in ffmpeg / KWin / Mesa-panfrost / kernel hantro per which hypothesis fires.
Author
Owner

iter1 phase 2 — hypothesis 3 RULED OUT (probe run 2026-05-08 on ohm)

Wrote /tmp/expbuf_probe.c — minimal V4L2 harness that opens /dev/video1, sets OUTPUT format to V4L2_PIX_FMT_H264_SLICE 1920×1088, retrieves CAPTURE format, REQBUFS 4 buffers, then VIDIOC_EXPBUF on buffer 0 plane 0 and lseek(fd, 0, SEEK_END).

Result:

driver:   hantro-vpu
card:     rockchip,rk3568-vpu-dec
CAPTURE:  NV12 1920x1088 num_planes=1
  plane[0]: sizeimage=3655712 bytesperline=1920
*** plane[0] EXPBUF fd=4  lseek(SEEK_END)=3657728 ***

Kernel exports the dma_buf at the full sizeimage (3,655,712), page-rounded up to 3,657,728. Offset 2,088,960 (where ffmpeg places UV plane) is well inside. Hantro is innocent.

Bonus observation, also ruling out a candidate hypothesis 5 I was about to file: the reported sizeimage is bigger than naïve NV12's 1920×1088×1.5 = 3,133,440. Difference is 522,272 bytes of trailing padding. But the math works out cleanly:

  • Y at [0, 2,088,960)
  • UV at [2,088,960, 3,133,440)
  • Trailing padding at [3,133,440, 3,655,712) — likely Rockchip per-frame MV/context metadata

Meaning ffmpeg's hardcoded planes[1].offset = pitch*height = 2,088,960 is correct — UV starts exactly there.
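The size figures above can be re-derived mechanically. The sketch assumes a 4096-byte page size for the rounding step (not confirmed on ohm in this thread); the helper names are hypothetical.

```c
#include <stdint.h>

/* Round v up to the next multiple of align (align > 0). */
static uint64_t round_up(uint64_t v, uint64_t align)
{
    return (v + align - 1) / align * align;
}

/* Bytes left over in sizeimage after a tightly packed NV12 frame:
 * the Y plane (pitch * aligned_height) plus a UV plane half that size. */
static uint64_t trailing_padding(uint64_t sizeimage, uint32_t pitch,
                                 uint32_t aligned_height)
{
    uint64_t nv12 = (uint64_t)pitch * aligned_height * 3 / 2;
    return sizeimage - nv12;
}
```

round_up(3655712, 4096) gives 3657728, matching the lseek result, and trailing_padding(3655712, 1920, 1088) gives 522272.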

Remaining hypothesis space:

  1. Mali-G52 panfrost EGL_EXT_image_dma_buf_import_modifiers regression for non-zero plane offset — alive
  2. KWin's wl_dmabuf import logic — alive
  4. kwin-fourier 0001 residual — low-confidence latent

Next probe: hypothesis 2 source-read of KWin 6.6.4 src/wayland/linuxdmabufv1clientbuffer.cpp + EGL backend import path (~30 min, no hardware). If that doesn't show a clear bug, fall back to writing the EGL importer harness for hypothesis 1 (~1-2h).

Author
Owner

iter1 phase 2 — hypothesis 2 RULED OUT (source-read of KWin 6.6.4 on 2026-05-08)

Shallow-cloned KWin 6.6.4 to references/kwin-6.6.4/. Read paths:

  • src/wayland/linuxdmabufv1clientbuffer.cpp — wl_linux_dmabuf protocol handler
  • src/opengl/eglbackend.cpp — testImportBuffer() decision logic
  • src/opengl/egldisplay.cpp — actual eglCreateImage call

No fd dedup, no offset transformation:

  1. LinuxDmaBufParamsV1::zwp_linux_buffer_params_v1_add (line 106) stores each fd/offset/pitch in m_attrs.fd[i], m_attrs.offset[i], m_attrs.pitch[i] per plane index. No comparison of fds across calls. Two .add() invocations with two dup'd-same-fd values produce two independent slots.

  2. LinuxDmaBufParamsV1::test() (line 233) does lseek(fd[i], 0, SEEK_END) per plane and validates offset[i] < size, offset[i] + pitch[i] <= size, modifier consistency. With our exported size 3,657,728 vs offset[1]=2,088,960 + pitch=1920, all checks pass.

  3. EglDisplay::importDmaBufAsImage (both forms — combined multi-plane and per-plane single-format) passes dmabuf.fd[i], dmabuf.offset[i], dmabuf.pitch[i] straight to eglCreateImage(EGL_LINUX_DMA_BUF_EXT, ...) with no transformation.

  4. EglBackend::testImportBuffer (eglbackend.cpp:338) picks between two paths:

    • Combined import (line 342): single EGL_image with EGL_DMA_BUF_PLANE0_* + EGL_DMA_BUF_PLANE1_* attribs. Used if NV12+LINEAR is in nonExternalOnlySupportedDrmFormats().
    • Per-plane import (line 353-356): separate EGL_image per plane (Y as R8, UV as RG88 from offset 2,088,960). Fallback for external-only formats.

Either path forwards offset = 2,088,960 for plane 1 to the driver. KWin is innocent.
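The bounds checks described in item 2 can be paraphrased as a pure function and applied to the numbers from this buffer. This is a restatement of the checks as summarised above, not KWin's code; the function name is hypothetical and the modifier-consistency check is left out.

```c
#include <stdint.h>

/* Returns 1 if a plane with the given offset and pitch passes the two
 * size checks (offset < size, offset + pitch <= size), 0 otherwise. */
static int plane_params_valid(uint64_t fd_size, uint32_t offset,
                              uint32_t pitch)
{
    if (offset >= fd_size)
        return 0;
    if ((uint64_t)offset + pitch > fd_size)
        return 0;
    return 1;
}
```

With fd_size = 3657728, both plane 0 (offset 0) and plane 1 (offset 2088960) pass at pitch 1920. Had the export been capped at 2088960 bytes, plane 1 would have been rejected at this step rather than silently read as zeros.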

Live hypothesis is now solely H1: panfrost's EGL_DMA_BUF_PLANE*_OFFSET_EXT handling for LINEAR NV12 (or per-plane RG88 with non-zero offset).

The per-plane path on Mali-G52 is most likely (Mesa-panfrost typically reports YUV+LINEAR as external-only), meaning UV gets imported as a stand-alone RG88 EGL_image of 960×540 texels (1920 bytes per row) with PLANE0_OFFSET = 2,088,960. If panfrost's KMS/EGL Mesa code samples from offset 0 of the underlying fd instead of honoring 2,088,960, UV would read zero-fill = green frame.

Next probe: minimal EGL importer harness on ohm. Synthesize a known NV12 buffer in CPU memory (Y = 0x80 fill, UV = 0x40/0xC0 alternating), write to a udmabuf, do eglCreateImage with PLANE0_OFFSET=2088960 + format RG88, render via fullscreen quad with samplerExternalOES, glReadPixels. If output reads zeros, panfrost confirmed culprit. ~1-2h. Then file an upstream Mesa bug + handle in vulkan-panfrost or kwin-fourier.
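The CPU-side half of that harness, filling the known test pattern, might look like the sketch below. It reads "0x40/0xC0 alternating" as Cb = 0x40, Cr = 0xC0 for every chroma pair, which is an assumption; the udmabuf and EGL steps are not shown, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill buf with a synthetic NV12 frame: Y = 0x80 everywhere, then
 * interleaved chroma with Cb = 0x40 and Cr = 0xC0.
 * buf must hold at least pitch * aligned_height * 3 / 2 bytes and
 * pitch must be even. Returns the byte offset of the UV plane. */
static size_t fill_test_nv12(uint8_t *buf, uint32_t pitch,
                             uint32_t aligned_height)
{
    size_t uv_offset = (size_t)pitch * aligned_height;
    size_t uv_size = (size_t)pitch * (aligned_height / 2);

    memset(buf, 0x80, uv_offset);
    for (size_t i = 0; i < uv_size; i += 2) {
        buf[uv_offset + i] = 0x40;     /* Cb (U) */
        buf[uv_offset + i + 1] = 0xC0; /* Cr (V) */
    }
    return uv_offset;
}
```

With pitch 1920 and aligned height 1088 the returned offset is 2088960, so a correct import of plane 1 should sample (0x40, 0xC0) pairs, while reading from offset 0 would return 0x80 luma bytes and reading past the data would return zeros. The three outcomes are distinguishable in the glReadPixels result.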

Author
Owner

iter1 phase 2 — Mesa-panfrost source-read + green-color math (2026-05-08)

Shallow-cloned Mesa 26.0.6 (matching ohm's installed mesa 1:26.0.6-1 and vulkan-panfrost 1:26.0.6-1). Read the EGL/DRI/gallium/panfrost stack along the path KWin's per-plane import takes.

Path through Mesa for our case:

  1. pan_screen.c:443 reports external_only=true for any YUV format (including NV12+LINEAR). KWin's testImportBuffer thus takes the per-plane path (Y as R8, UV as DRM_FORMAT_GR88).
  2. loader_dri_helper.c:43 maps DRM_FORMAT_GR88 ↔ PIPE_FORMAT_RG88_UNORM — sampling returns byte 0 as .r, byte 1 as .g. NV12 chroma byte 0=Cb, byte 1=Cr. So .r=U=Cb, .g=V=Cr.
  3. KWin's YUV→RGB shader (glshadermanager.cpp:189): result = vec4(sampler.x, sampler1.rg, 1.0) then yuvToRgb * result.rgba. So result.y=U, result.z=V. Matches what Mesa returns. Chroma swap is NOT the bug.
  4. pan_resource.c:354-358 captures whandle->offset into explicit_layout.offset_B.
  5. pan_mod.c:663-667 (linear modifier slice-init) honors layout_constraints->offset_B directly with only an alignment check; 2,088,960 is page-aligned so passes.
  6. pan_texture.c:361,561,660,773,817 set the texture descriptor's GPU base to plane->base + slayout->offset_B. Sampling reads from bo_gpu + 2,088,960. ✓

Conclusion of source-read: panfrost looks innocent at the offset-handling layer. KWin looks innocent. Format mapping looks consistent. H1 demoted from leading candidate to less-likely (cannot be conclusively ruled out without runtime EGL probe but the obvious places are clean).

Green-color math points elsewhere:

BT.601 limited-range YUV(0,0,0) → RGB conversion:

R = 1.164*(0-16) + 1.596*(0-128)            = -222.9 → clamp 0
G = 1.164*(0-16) - 0.391*(0-128) - 0.813*(0-128) = +135.5 → 135
B = 1.164*(0-16) + 2.018*(0-128)            = -276.9 → clamp 0

→ RGB(0, 135, 0) — exactly the green tone in frame10_dmabuf_green.png.
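The hand calculation above can be re-run mechanically. This is a minimal sketch that reuses the same rounded BT.601 limited-range coefficients; the function name is hypothetical, and rounding to the nearest integer before clamping is an assumption about how the 8-bit value is produced.

```python
def yuv_to_rgb_bt601_limited(y, u, v):
    """Convert one 8-bit limited-range BT.601 YUV sample to clamped 8-bit RGB."""
    # Same rounded coefficients as the hand calculation in the comment above.
    luma = 1.164 * (y - 16)
    r = luma + 1.596 * (v - 128)
    g = luma - 0.391 * (u - 128) - 0.813 * (v - 128)
    b = luma + 2.018 * (u - 128)

    def clamp8(x):
        # Assumption: round to nearest, then clamp to the 8-bit range.
        return max(0, min(255, round(x)))

    return clamp8(r), clamp8(g), clamp8(b)


# An all-zero Y/UV sample lands on the dark green described above.
print(yuv_to_rgb_bt601_limited(0, 0, 0))
```

A neutral sample such as (128, 128, 128) comes out grey with the same function, which is a quick sanity check that the coefficients were typed correctly.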

That means panfrost is reading zero-fill bytes despite hantro having written real YUV data. Not an offset bug, not a format bug — a synchronization or cache-coherency bug.

New leading hypotheses:

H6 — DMA cache coherency between hantro VPU and Mali GPU: V4L2 doesn't attach implicit fences to CAPTURE buffers on DQBUF (this is exactly the gap our vb2_dma_resv RFC addresses upstream — see ~/.claude/projects/-home-mfritsche-src-fourier/memory/project_vb2_dma_resv_v2_state.md). Mali begins sampling before hantro's writes have flushed to coherent memory. Mali sees zero-fill backing.

H7 — Panfrost dma_buf import path lacks GPU-side cache invalidation at attach/map time. Even after data lands in DRAM, Mali's MMU/cache may serve stale reads.

Next probe options (ranked):

  1. Cache-sync ioctl workaround test (~30 min cost, decisive on H6/H7): patch mpv-fourier's vo_dmabuf_wayland.c to call DMA_BUF_IOCTL_SYNC(DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ) on each EXPBUF fd before submitting to KWin. If green goes away → H6/H7 confirmed, workaround is shippable, root-cause kernel bug filed separately.

  2. EGL importer harness (~1-2h): synthesize known NV12 buffer in CPU memory → udmabuf → eglCreateImage with PLANE0_OFFSET=2088960 → render → glReadPixels. Decides H1 definitively. Synthesizing in CPU memory rules out the producer-side cache-coherency variable.

  3. Mesa debug log (~15 min): MESA_DEBUG=1 PAN_MESA_DEBUG=trace,bo for cheap visibility into panfrost's actual buffer/format/offset choices.

Leaning toward option 1 — cheapest probe AND a viable workaround if it works.


iter1 phase 3 — H6 RULED OUT (test run 2026-05-08)

Patch deployed: mpv-fourier-1:0.41.0-9 adds DMA_BUF_IOCTL_SYNC(SYNC_START|SYNC_RW) + matching SYNC_END on each EXPBUF fd in both vaapi_dmabuf_importer AND drmprime_dmabuf_importer (the patch covers VAAPI and DRMPrime paths symmetrically). Built via Gitea Actions run #80, installed on ohm.
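For readers who want to reproduce the sync bracket outside mpv, here is a rough Python sketch of the same START/END pair. It is not the mpv-fourier patch itself. The flag values follow the Linux UAPI header `<linux/dma-buf.h>`; the `_IOW` encoding assumes the generic ioctl layout used on arm64 and x86; the helper names are hypothetical.

```python
import fcntl
import struct

# Flag values as defined in the Linux UAPI header <linux/dma-buf.h>.
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_WRITE = 2 << 0
DMA_BUF_SYNC_RW = DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2


def _iow(type_char, nr, size):
    # Assumption: generic asm-generic/ioctl.h encoding (direction bit 1 = write).
    ioc_write = 1
    return (ioc_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


# struct dma_buf_sync { __u64 flags; } -> 8-byte argument.
DMA_BUF_IOCTL_SYNC = _iow("b", 0, struct.calcsize("=Q"))


def sync_args(flags):
    """Pack the flags field of struct dma_buf_sync."""
    return struct.pack("=Q", flags)


def dmabuf_cpu_sync_bracket(fd):
    """Issue SYNC_START|RW followed by SYNC_END|RW on a dma-buf fd."""
    fcntl.ioctl(fd, DMA_BUF_IOCTL_SYNC, sync_args(DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW))
    fcntl.ioctl(fd, DMA_BUF_IOCTL_SYNC, sync_args(DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW))
```

`dmabuf_cpu_sync_bracket()` only makes sense on a real dma-buf fd (for example one returned by VIDIOC_EXPBUF); on other fds the ioctl is expected to fail with an OSError.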

Test: mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --pause --start=00:00:00.42 --quiet ~/fourier-test/bbb_1080p30_h264.mp4

  • v4l2request hwdec confirmed engaged: v4l2-request: cap_pool_init: 24 slots ready x3
  • VO engaged: VO: [dmabuf-wayland] 1920x1080 vaapi[nv12]
  • Spectacle full-screen capture

Result: md5 c8c8e9b88521a0069f709d483451c3d4, byte-identical to baseline frame10_dmabuf_green.png. Visual inspection confirms: same solid dark green ≈ RGB(0, 77, 0) (BT.709 limited-range YUV(0,0,0)).

Conclusion: the userspace DMA_BUF_IOCTL_SYNC cache-coherency workaround does not fix the green. H6 is dead. Either hantro's dma_buf_ops->begin_cpu_access is a no-op for the buffer type used by the V4L2 stateless decoder (likely on Rockchip), OR the gap is on the GPU consumer side where CPU cache state is irrelevant.

Critical phase-3 observation: --hwdec=v4l2request --vo=gpu (CPU-mmap → glTexSubImage2D upload path) is known-working — renders correctly. So the buffer DOES contain valid YUV data, the CPU CAN read it, and the decoder is producing legit content. Only the zero-copy dma_buf-to-Mali path renders zeros. This rules out "data isn't there" entirely and concentrates the hypothesis on the dma_buf → Mali GPU import/translation step itself.

Live hypothesis space narrows further:

  • H1, H2, H3, H5, H6 all ruled out
  • H7 (panfrost dma_buf import GPU-side cache invalidation OR BO-type / cache-attribute mismatch) — now strongly leading
  • H4 (kwin-fourier residual) — latent, low confidence

Next probe options ranked:

  1. Read panfrost kernel-mode source (drivers/gpu/drm/panfrost/panfrost_gem.c, panfrost_gem_prime_import_sg_table, MMU mapping for cache attributes) — ~45 min, no hardware
  2. EGL importer harness with synthetic NV12 in CPU-allocated udmabuf — distinguishes "hantro-allocated buffer specifically" vs "general panfrost dma_buf import bug" (~1-2h)
  3. MESA_DEBUG=verbose PAN_MESA_DEBUG=sync,trace log of the buggy run — cheap recon (~15 min)

Leaning #1 first (cheapest source-read), #2 if #1 turns up nothing.

mpv-fourier-1:0.41.0-9 keeps the no-op patch installed (harmless). Will be replaced or removed in the next iteration.


iter1 phase 4 — H7 CONFIRMED via panfrost kernel source-read (2026-05-08)

Read the panfrost kernel driver at Linux 6.12 (~/src/linux-rfc/drivers/gpu/drm/panfrost/). Smoking gun chain:

  1. panfrost_gem.c:262: obj->base.map_wc = !pfdev->coherent; sets a write-combine (uncached) CPU mapping when the device isn't coherent. Applies to imports too (drm_gem_shmem_prime_import_sg_table calls back into panfrost_gem_create_object).

  2. panfrost_drv.c:625: pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT; i.e., taken from the DT dma-coherent property on the panfrost node.

  3. On ohm RK3566 PineTab2 besser-7.0: NO dma-coherent property anywhere in /sys/firmware/devicetree/base/ (verified). So pfdev->coherent = false.

  4. panfrost_mmu.c:330: int prot = IOMMU_READ | IOMMU_WRITE; with NO IOMMU_CACHE. Imported BOs are mapped into Mali's IOMMU as non-snooping. Mali reads directly from DRAM, bypassing CPU caches entirely.

  5. KWin caches EGL_images per-fd in m_importedBuffers (eglbackend.cpp:282). For each of the rotating ~15 buffers, EGL_image is created once and reused. The only cache sync is the one at dma_buf_map_attachment time during initial import.

  6. No per-frame cache sync mechanism exists. V4L2 doesn't attach dma_resv fences to CAPTURE buffers on DQBUF.
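The devicetree check in step 3 can be scripted. This sketch walks the exported devicetree and lists every node carrying a given property. The function name is hypothetical; the default root is the sysfs path quoted in step 3.

```python
import os


def find_dt_property(name, root="/sys/firmware/devicetree/base"):
    """Return sorted node paths (relative to root) that carry property `name`.

    In the sysfs devicetree export each node is a directory and each property
    is a file inside it, so a plain directory walk is enough.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            hits.append(os.path.relpath(dirpath, root))
    return sorted(hits)


if __name__ == "__main__":
    nodes = find_dt_property("dma-coherent")
    print(nodes if nodes else "no dma-coherent property found")
```

An empty result corresponds to the "NO dma-coherent property anywhere" observation; a non-empty list shows which nodes would be treated as coherent.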

Architectural picture is now clear: hantro writes through CPU L1/L2/L3 caches → Mali reads through non-snooping IOMMU → DRAM-direct → sees stale or zero-fill data. Result: green frames on every frame after the first cache-flushed one.

Counter-validation that confirms the diagnosis:

  • --vo=gpu works correctly. CPU-mmap of dma_buf triggers cache sync via begin_cpu_access. Then glTexSubImage2D copies to a Mali-private (cached/coherent) BO. Per-frame implicit cache sync.
  • --vo=dmabuf-wayland fails. Zero-copy import → no per-frame sync.

Why DMA_BUF_IOCTL_SYNC (phase 3) didn't help: it invokes begin/end_cpu_access — CPU-side cache management. Doesn't propagate to GPU IOMMU mapping. Mali still reads through its non-snooping mapping.

ROOT CAUSE

The dmabuf-wayland green is structurally the same bug that our vb2_dma_resv RFC v2 addresses upstream on linux-media. With V4L2 attaching a dma_resv fence on CAPTURE DQBUF, mesa-panfrost's implicit fence-wait at sample time will block until hantro signals — and the fence signaling enforces cache writeback. The RFC v2 is the fix for this bug.

Proposed iter2 / phase 5 path

Build linux-pinetab2-danctnix-besser 7.0 with the vb2_dma_resv RFC v2 patches applied. Install on ohm. Retest the green-frame case. If green goes away:

  • Confirms our upstream RFC fixes a real shipping-product bug
  • A locally-shippable kernel package via linux-pinetab2-fourier (or similar)
  • A strong concrete data point for the v2 cover letter

Estimated cost: ~2-3 hours (rebase patches against besser-7.0 → distcc kernel build → reboot ohm → retest).

H7 alive sub-cases (probably moot now, recorded for completeness)

If the kernel-rebuild path is blocked, fallback validation paths:

  • EGL importer harness with synthetic NV12 in CPU-allocated udmabuf — should work because CPU writes get flushed naturally; would confirm by independent test.
  • kwin-fourier per-frame EGL_image invalidate — costly, defeats zero-copy.
  • mpv-fourier per-frame re-import — same.

Status of mpv-fourier-1:0.41.0-9

The harmless no-op DMA_BUF_IOCTL_SYNC patch stays installed. When the kernel-rebuild path works, the patch gets reverted in the next mpv-fourier rev.


Still reproducible on ohm, 2026-05-18. Keeping this issue open: it's the canonical home for the root-cause investigation. Closing the sibling symptom report marfrit/libva-multiplanar#1 (https://git.reauktion.de/marfrit/libva-multiplanar/issues/1) as duplicate, pointing here.

Re-verification (mpv-fourier 1:0.41.0-10, libva-v4l2-request-fourier iter39 cf8cd9d)

$ WAYLAND_DEBUG=1 mpv --hwdec=v4l2request --vo=dmabuf-wayland --frames=3 \
    fourier-test/bbb_1080p30_h264.mp4 2>&1 | grep zwp_linux_buffer_params

zwp_linux_buffer_params_v1#54.add(fd 40, 0, 0,       1920, 0, 0)   ← Y plane: fd 40, offset 0
zwp_linux_buffer_params_v1#54.add(fd 47, 1, 2088960, 1920, 0, 0)   ← UV plane: fd 47, offset 2088960
zwp_linux_buffer_params_v1#54.create_immed(new id wl_buffer#56, 1920, 1080, 842094158, 0)

Same pattern as the broken one in the original report (only the fd numbers differ):

  • Plane 0 uses fd 40, plane 1 uses fd 47 — V4L2 MPLANE EXPBUF gives one fd per plane (Case B).
  • BUT plane 1 has offset 2088960 = 1920 × 1088, the Y-plane size for a single-allocation NV12 (Case A).
  • KWin imports plane 1 starting at offset 2088960 of fd 47. fd 47 is the UV plane's allocation (~1 MB), so 2088960 is past EOF → KWin samples zeros → dark green frame.
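The offset arithmetic and the past-EOF claim in the bullets above can be checked with a few lines. This is only a sketch. The 16-row height alignment is an assumption inferred from 1080 → 1088. The bounds check mirrors the two conditions described for KWin's `LinuxDmaBufParamsV1::test()` earlier in this thread. The 3,657,728-byte size is the exported size quoted there. All function names are hypothetical.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def nv12_layout(pitch, height, height_align=16):
    """Return (uv_offset, end_of_uv) for a single-allocation NV12 buffer.

    Assumption: the Y plane height is padded to `height_align` rows.
    """
    aligned_h = align_up(height, height_align)
    uv_offset = pitch * aligned_h           # UV starts right after the padded Y plane
    uv_size = pitch * aligned_h // 2        # 4:2:0 chroma is half the Y size
    return uv_offset, uv_offset + uv_size


def plane_fits(offset, pitch, fd_size):
    """Bounds checks as described for the compositor-side validation."""
    return offset < fd_size and offset + pitch <= fd_size


uv_offset, end = nv12_layout(1920, 1080)
print(uv_offset, end)
print(plane_fits(uv_offset, 1920, 3657728))   # size reported for the exported fd
print(plane_fits(uv_offset, 1920, 1044480))   # size a UV-only allocation would have
```

If fd 47 really were a UV-only allocation of roughly 1 MB, the second check would fail, so comparing the actual `lseek(fd, 0, SEEK_END)` size against the offset is a cheap way to test the per-plane-fd assumption.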

The bug lives in mpv's vo_dmabuf_wayland.c plane-info translation. mpv upstream has not fixed this between 2026-05-08 and 2026-05-18 (mpv-fourier still tracks 0.41.0).

Suggested next steps (in priority order)

  1. Upstream-check first: search mpv-player/mpv issues for "vo_dmabuf_wayland" + "NV12" + "green" / "plane" terms. If a ticket exists and is being worked on, mark this as "waiting upstream" and tie our fix-eta to theirs.
  2. If no upstream ticket: file one with this trace + the Case A vs Case B analysis from the original report. Reference Mesa-panfrost + KWin 6 as the affected compositor side (works on Mutter / other compositors? unknown — would strengthen the bug report).
  3. Local-fix path: if upstream is slow, write a one-liner patch against mpv's video/out/vo_dmabuf_wayland.c (or the drm_prime helper) that fixes the producer-side read — see "Suggested fix path" §1 above. Ship as mpv-fourier patch in marfrit-packages.

Cross-references unchanged: depends on the producer-side analysis in this issue; gated by neither libva-v4l2-request-fourier#1 (now closed) nor any other fleet issue.


Closing 2026-05-18 — bug is fixed by the cache-sync workaround already shipped in mpv-fourier 0.41.0-10. The plane-semantics diagnosis in the issue body was a misdiagnosis.

Empirical re-verification today

Visual test on ohm with current stack (mpv-fourier 0.41.0-10, libva-v4l2-request-fourier 1:1.0.0.r361.cf8cd9d-1, ffmpeg-v4l2-request-fourier 2:8.1.r123329.b57fbbe-3):

mpv --hwdec=v4l2request --vo=dmabuf-wayland --length=10 bbb_1080p30_h264.mp4

Real Big Buck Bunny content displayed (operator visually confirmed, 2026-05-18 ~09:50 UTC). No green frames, no zero UV plane, no protocol rejection.

Why the original diagnosis was wrong

The issue body interpreted this WAYLAND_DEBUG pattern as broken:

add(fd 40, 0, 0,       1920, 0, 0)
add(fd 47, 1, 2088960, 1920, 0, 0)

…claiming fd 40 and fd 47 are V4L2 per-plane EXPBUF fds (Case B convention) being incorrectly combined with a single-allocation offset (Case A convention).

But strace of mpv on ohm 2026-05-18 shows:

VIDIOC_EXPBUF, {type=V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE, index=N, plane=0, ...} => {fd=X}

plane=0 for every EXPBUF, every CAPTURE buffer index. The hantro driver on RK3566 reports MPLANE but exposes num_planes = 1 for NV12 — Y and UV are packed into a single allocation. So FFmpeg's hwcontext_v4l2request::v4l2request_set_drm_descriptor correctly sets:

  • planes[0].object_index = 0, offset = 0 (Y at start of buffer)
  • planes[1].object_index = 0, offset = pitch × aligned_height = 1920 × 1088 = 2088960 (UV at offset)

…and BOTH planes reference the same single object (= same fd). The wayland fds 40 and 47 are SCM_RIGHTS dups of that single underlying dmabuf, not separate per-plane fds. KWin's import is structurally correct.
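Whether two fd numbers are duplicates of one buffer can be checked directly instead of inferred from the trace. This sketch compares device and inode numbers. It relies on the assumption that each dma-buf is backed by its own inode, which to my knowledge holds on recent kernels; `kcmp(KCMP_FILE)` would be the stricter test. The function name is hypothetical.

```python
import os


def same_underlying_file(fd_a, fd_b):
    """Return True if both fds resolve to the same (device, inode) pair.

    Duplicated fds (dup(), SCM_RIGHTS passing) share one inode, so they
    compare equal here; fds for distinct files or buffers do not.
    """
    a, b = os.fstat(fd_a), os.fstat(fd_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

For fds that live in another process, the same comparison can be made with `os.stat()` on the `/proc/<pid>/fd/<n>` entries, given sufficient permissions.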

The TRUE bug, fixed earlier: missing implicit-fence on V4L2 CAPTURE DQBUF caused the GPU import to read from physical memory before the producer's writes flushed → all-zero UV → BT.601 limited-range YUV(0,0,0) → RGB(0, 135, 0) = solid dark green. Same class as reference_dmabuf_resv_blocker (RK3399 hantro CAPTURE all-zero readback). The cache-sync patch issues explicit DMA_BUF_IOCTL_SYNC(SYNC_RW) on each unique import fd before zwp_linux_buffer_params_v1_add(), invoking the producer driver's begin_cpu_access / end_cpu_access which flushes write buffers on ARM SoCs.

Sibling marfrit/libva-multiplanar#1 — closed earlier today

That sibling was closed as duplicate-of-this-issue with a (now-incorrect) note saying "still reproducible." It was actually no-longer-reproducible at the time of that comment — I'd only verified the wayland trace pattern, not the visual output. Posting a follow-up correction to the sibling for the record.

Root-cause attribution for future-me

  • True root cause: missing implicit fence on V4L2 stateless CAPTURE DQBUF (reference_dmabuf_resv_blocker family).
  • Userspace mitigation: cache-sync DMA_BUF_IOCTL_SYNC before compositor import (mpv-fourier 0001-vo_dmabuf_wayland-explicit-cache-sync-on-import-fd.patch).
  • Long-term fix: vb2_dma_resv opt-in producer fences in the kernel (tracked at marfrit/dmabuf-modifier-triage#3, https://git.reauktion.de/marfrit/dmabuf-modifier-triage/issues/3, operator-driven upstream work).
  • Plane-semantics theory: rejected, was based on an incorrect assumption about V4L2 MPLANE EXPBUF returning one fd per plane.

Closing.


Reference: marfrit/dmabuf-modifier-triage#1