dmabuf-modifier-triage/phase0_findings.md
marfrit 9f406c0c42 phase0: revise framing — iter1 is unblocked, libva comparison is follow-up validation
Operator pushback: phase 0 should unblock iter1, not gate it. The
locked question is "fix what's locally in scope" — kill the green,
not just identify a layer. The captured wl_dmabuf message is
internally inconsistent on its own (per-plane fds + single-allocation
offset for plane 1 is a contradiction no valid producer can claim
simultaneously). mpv's translation layer produces this regardless of
which producer feeds it, so iter1 can write the fix from the ffmpeg-
side data alone. The libva-path WAYLAND_DEBUG comparison after iter9
is a follow-up validation that confirms the fix handles both producer
shapes, not a prerequisite for writing it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 19:01:12 +00:00


Phase 0 — locked research question, substrate, deliverables

Locked 2026-05-08. Iter1 phase 0 substrate.

PHASE 0 CLOSED 2026-05-08 — root cause isolated within hours of opening, faster than expected. See "Phase 0 conclusion" below. The original deliverables list (modifier captures, A/B tests) is preserved as historical record; the elimination ladder went deeper than the planned items but landed at a clear answer.

Phase 0 conclusion

Root cause: mpv's vo_dmabuf_wayland.c constructs the zwp_linux_buffer_params_v1 protocol message with internally inconsistent plane semantics — different fds per plane (V4L2 MPLANE export semantics) combined with single-allocation offset for plane 1 (single-fd export semantics). KWin imports plane 1 from the wrong byte address, reads zeros for the UV chroma plane, and the all-zero NV12 buffer renders as dark green. Filed at marfrit/dmabuf-modifier-triage#1.

Trace (one buffer cycle, abbreviated):

create_params #54
  add(fd 41, plane=0, offset=0,        stride=1920, modifier=0,0)   ← Y plane
  add(fd 42, plane=1, offset=2088960,  stride=1920, modifier=0,0)   ← UV plane
create_immed(wl_buffer#56, 1920, 1080, NV12, flags=0)

1920×1088 = 2088960 is the size of the Y plane (height-aligned 1080→1088). Correct offset for "both planes in one allocation"; wrong when plane 1 is in its own fd.
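The offset arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the 16-row height alignment is an assumption that happens to match the observed 1080→1088 rounding, and the helper names are illustrative rather than taken from mpv or the kernel.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def nv12_plane_sizes(stride: int, height: int, height_align: int = 16):
    """Return (y_size, uv_size) in bytes for an NV12 buffer.

    height_align=16 is an assumption consistent with 1080 -> 1088.
    """
    aligned_h = align_up(height, height_align)
    y_size = stride * aligned_h
    uv_size = stride * (aligned_h // 2)  # interleaved UV, half as many rows
    return y_size, uv_size


y_size, uv_size = nv12_plane_sizes(stride=1920, height=1080)

# Shape A: one allocation behind one fd; plane 1 starts right after plane 0.
single_fd_offsets = (0, y_size)
# Shape B: one fd per plane; each plane starts at byte 0 of its own fd.
per_plane_fd_offsets = (0, 0)

print(y_size, uv_size)        # 2088960 1044480
print(single_fd_offsets)      # (0, 2088960)
print(per_plane_fd_offsets)   # (0, 0)
```

The captured message pairs the fds of shape B with the plane 1 offset of shape A, which is the mismatch described above.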

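Color analysis (why all-zero NV12 is dark green specifically): with Y=0, U=0, V=0 the YUV→RGB conversion drives R and B negative, so both clamp to 0, while G stays positive. The result is a pure dark green, roughly RGB (0, 77, 0) under limited-range BT.709 and a brighter (0, 136, 0) under limited-range BT.601, which matches the dark green the user reported.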
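The conversion can be checked numerically. The sketch below uses the standard Kr/Kb luma coefficients for BT.709 and BT.601 and assumes 8-bit limited-range samples. The exact green level a compositor produces also depends on its shader and any transfer-function handling, so treat the numbers as approximate.

```python
def yuv_to_rgb(y, u, v, kr, kb, limited=True):
    """Convert one 8-bit Y'CbCr sample to clamped 8-bit R'G'B'."""
    kg = 1.0 - kr - kb
    if limited:
        yn = (y - 16) / 219.0
        cb = (u - 128) / 224.0
        cr = (v - 128) / 224.0
    else:
        yn = y / 255.0
        cb = (u - 128) / 255.0
        cr = (v - 128) / 255.0
    # Unclamped values; G must be derived before R and B are clamped.
    r = yn + 2.0 * (1.0 - kr) * cr
    b = yn + 2.0 * (1.0 - kb) * cb
    g = (yn - kr * r - kb * b) / kg

    def clamp(c):
        return max(0, min(255, round(c * 255.0)))

    return clamp(r), clamp(g), clamp(b)


BT709 = dict(kr=0.2126, kb=0.0722)
BT601 = dict(kr=0.299, kb=0.114)

print(yuv_to_rgb(0, 0, 0, **BT709))  # (0, 77, 0)
print(yuv_to_rgb(0, 0, 0, **BT601))  # (0, 136, 0)
```

R and B clamp to zero in both cases, so the hue is pure green regardless of which matrix the compositor picks; only the brightness differs.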

Layers ruled out via directed A/B (in order):

| Layer | Verdict | Test |
|---|---|---|
| libva | not the cause | --hwdec=v4l2request (libva-bypassed) also greens |
| Decoder content | correct | --vo=gpu --hwdec=v4l2request shows real picture |
| Color tagging / HDR PQ | not the cause | --target-colorspace-hint=no + explicit BT.709 SDR triple has no effect |
| kwin-fourier 0001 (watchDmaBuf bypass) | exonerated | stock arch kwin also greens |
| Mesa 26.0.6 vs 26.0.5 | exonerated | downgrade to 26.0.5 still greens |
| Wayland / KWin generally | exonerated | --vo=wlshm shows correct picture |
| Kernel besser-7.0 vs pinetab2-6.19.10 | exonerated | boot.scr swap + reboot, both kernels green |
| KWin Vulkan vs OpenGL backend | exonerated | qdbus6 confirms KWin runs OpenGL + Mesa-panfrost (same path mpv's vo=gpu uses successfully) |

Why the layer-strip works the way it does:

  • mpv's vo=gpu is fine because it imports the dmabuf via libva's vaExportSurfaceHandle / ffmpeg's av_hwframe_transfer_data directly into mpv's own EGL context — those APIs return correct per-plane structure.
  • mpv's vo=dmabuf-wayland is the unique broken path because it constructs the wl_dmabuf protocol message itself, mishandling the producer's plane semantics.
  • mpv's vo=wlshm is fine because it goes through CPU memcpy, bypassing the dmabuf protocol entirely.

iter1 is unblocked. The wl_dmabuf message in the WAYLAND_DEBUG trace is internally inconsistent on its own — combining per-plane fds with a single-allocation offset for plane 1 is something no valid producer can claim simultaneously. mpv's translation layer produces this regardless of which producer feeds it. The fix can be written from the ffmpeg-side data alone.

Follow-up validation (not a prerequisite): once libva-v4l2-request-fourier#1 is fixed and the cap_pool/REQBUFS cascade stops blocking libva playback, re-run the WAYLAND_DEBUG capture with --hwdec=vaapi to confirm the same fix also handles the libva path. If both paths produce the same wrong .add() pattern → one fix covers both producers. If only one does → expand the fix to handle both producer shapes (the AVDRMFrameDescriptor and VADRMPRIMESurfaceDescriptor mappings to wl_dmabuf may need separate logic).
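One way to picture a translation that handles both producer shapes is to take the fd and the offset for each plane from the same descriptor entry and bounds-check the plane against its backing object. The sketch below is illustrative Python, not mpv code: `DrmObject` and `DrmPlane` are hypothetical types loosely modelled on the object/plane split in AVDRMFrameDescriptor, and the object sizes assume tightly sized allocations.

```python
from dataclasses import dataclass


@dataclass
class DrmObject:       # hypothetical: one dmabuf fd
    fd: int
    size: int          # bytes, as reported by the producer


@dataclass
class DrmPlane:        # hypothetical: one image plane
    object_index: int  # which DrmObject backs this plane
    offset: int        # byte offset inside that object
    pitch: int         # bytes per row


def params_add_args(objects, planes, plane_rows):
    """Build (fd, plane_idx, offset, stride) tuples for params.add().

    fd and offset always come from the same plane entry, and every
    plane must fit inside the object it points at.
    """
    calls = []
    for idx, (plane, rows) in enumerate(zip(planes, plane_rows)):
        obj = objects[plane.object_index]
        end = plane.offset + plane.pitch * rows
        if end > obj.size:
            raise ValueError(
                f"plane {idx}: bytes [{plane.offset}, {end}) exceed "
                f"object of {obj.size} bytes (fd {obj.fd})"
            )
        calls.append((obj.fd, idx, plane.offset, plane.pitch))
    return calls


ROWS = (1088, 544)  # Y rows, UV rows for the 1080p NV12 case

# Single allocation: both planes in fd 41.
single = params_add_args(
    [DrmObject(41, 3133440)],
    [DrmPlane(0, 0, 1920), DrmPlane(0, 2088960, 1920)],
    ROWS,
)
# One fd per plane: each plane starts at offset 0.
split = params_add_args(
    [DrmObject(41, 2088960), DrmObject(42, 1044480)],
    [DrmPlane(0, 0, 1920), DrmPlane(1, 0, 1920)],
    ROWS,
)
print(single)  # [(41, 0, 0, 1920), (41, 1, 2088960, 1920)]
print(split)   # [(41, 0, 0, 1920), (42, 1, 0, 1920)]
```

Feeding the captured combination (plane 1 in fd 42 with offset 2088960) into the same function raises `ValueError`, because the plane would end past the 1044480-byte object.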

Working ohm HW-decode workflow until fix lands:

mpv --hwdec=v4l2request --vo=gpu fourier-test/bbb_1080p30_h264.mp4

Correct picture, slow due to GPU shader path on Mali-G52. Documented as the campaign's interim recommended path.

Locked acceptance criterion for iter1 (2026-05-08):

Per operator decision, the iter1 fix is considered shipped only when the before/after screenshot pair in screenshots/ reconciles. The current broken state is screenshots/frame10_dmabuf_green.png (uniform dark green). The target post-fix state is screenshots/frame10_expected.png (correct bbb frame 10, captured via the working --vo=gpu path on the same hardware/session). Verification protocol in screenshots/README.md — SSIM > 0.95 on the central frame area + valid .add() semantics in WAYLAND_DEBUG capture + no regression on --vo=gpu or --vo=wlshm.
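The SSIM part of the criterion can be approximated as follows. This is a simplified sketch, not the protocol in screenshots/README.md: it computes a single-window (global) SSIM over the luma of a central crop, whereas common tools average SSIM over local windows. The 50% crop fraction and the synthetic test images are assumptions for illustration.

```python
def crop_center(img, frac=0.5):
    """Keep the central frac of rows and columns of a 2D list of luma values."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def global_ssim(a, b, max_val=255.0):
    """Single-window SSIM over two equally sized 2D luma arrays."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys) or not xs:
        raise ValueError("images must be non-empty and the same size")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) * (x - mx) for x in xs) / n
    vy = sum((y - my) * (y - my) for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


# Synthetic stand-ins: a gradient for the expected frame, a flat field
# for the uniform green frame.
expected = [[(x * 8 + y * 4) % 256 for x in range(32)] for y in range(32)]
uniform = [[77] * 32 for _ in range(32)]

same = global_ssim(crop_center(expected), crop_center(expected))
broken = global_ssim(crop_center(expected), crop_center(uniform))
print(same > 0.95, broken > 0.95)  # True False
```

A uniform frame has zero variance and zero covariance with any reference, so its score collapses well below the 0.95 threshold.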


Locked research question (original — superseded by Phase 0 conclusion above)

Identify the layer responsible for the dmabuf-wayland green on ohm — libva (vaExportSurfaceHandle modifier reporting), ffmpeg V4L2 request hwaccel (AVDRMFrameDescriptor modifier), KWin (linux-dmabuf-v1 accept logic), Mesa-panfrost (modifier import constraints), or the kernel hantro driver (buffer attribute reporting). File upstream where appropriate; fix what's locally in scope.

Bug tracker: marfrit/libva-multiplanar#1 (user-visible symptom). Root-cause issue: marfrit/dmabuf-modifier-triage#1 (mpv vo_dmabuf_wayland plane-semantics).

Reproduction (verbatim from issue tracker)

# All three on ohm with libva-v4l2-request-fourier-1.0.0.r280.65969da-1
# from [marfrit] and /etc/profile.d/libva-v4l2-request.sh in effect.

# 1. via libva — green (also hits libva-v4l2-request-fourier#1, but green
#    would persist even with that bug fixed)
mpv --hwdec=vaapi --vo=dmabuf-wayland --target-colorspace-hint=no \
    fourier-test/bbb_1080p30_h264.mp4

# 2. via ffmpeg V4L2 request hwaccel — also green (no libva)
mpv --hwdec=v4l2request --vo=dmabuf-wayland \
    fourier-test/bbb_1080p30_h264.mp4

# 3. via ffmpeg V4L2 request hwaccel + GPU shader VO — correct picture (slow)
mpv --hwdec=v4l2request --vo=gpu \
    fourier-test/bbb_1080p30_h264.mp4

Result #3 is the workaround currently in use. The campaign closes when result #1 displays correctly.

Open questions

  1. What modifier does libva's vaExportSurfaceHandle report for the hantro decode surface on ohm? Should be DRM_FORMAT_MOD_LINEAR (0x0) per iter2 Fix 2's pitch-aligned path, but the green suggests otherwise. Need a vainfo-equivalent or a small C harness that calls vaCreateSurfaces + vaExportSurfaceHandle and prints the VADRMPRIMESurfaceDescriptor.objects[i].drm_format_modifier.

  2. What modifier does ffmpeg's V4L2 request hwaccel report for the same decode? Captured via AV_HWFRAME_TRANSFER_DIRECTION_FROM + inspecting the AVDRMFrameDescriptor.objects[i].format_modifier. Probably comes from VIDIOC_G_FMT(CAPTURE_MPLANE) plus a hardcoded LINEAR if v4l2 doesn't report a modifier.

  3. What modifier does KWin advertise via zwp_linux_dmabuf_v1.modifier? From mpv -v output we already know the answer is "NV12 with modifier 0x0 only." But it's worth confirming via wayland-info that this is the only advertised entry, and capturing whether KWin also supports DRM_FORMAT_MOD_INVALID as the catch-all.

  4. Does KWin reject the buffer outright (protocol error) or accept it and display garbage? From the linux-dmabuf-v1 (zwp_linux_dmabuf_v1) protocol perspective, a rejection shows up as a failed event or a protocol error on the buffer params object, while an accepted buffer is simply attached and committed. Run strace on KWin or use a WAYLAND_DEBUG=1 mpv run to capture the protocol exchange.

  5. Is the bug in the modifier handshake or in the buffer's content interpretation? Specifically: if KWin accepts the buffer but renders it wrong, the issue is interpretation (likely Mali-G52 panfrost's NV12 sampler reading raw pixels assuming a stride/layout that doesn't match). If KWin rejects, the issue is negotiation (mpv claims a modifier KWin won't accept).

  6. Has KWin or Mesa-panfrost been upgraded between iter5 close (2026-05-05) and now (2026-05-08)? A pacman -Q log + pacman.log review on ohm tells us whether new package versions correlate with the iter5→iter8 regression window. The kwin-fourier version on ohm (probably 1:6.6.4-1 per packages.reauktion.de) needs cross-checking against the version that was "smooth" at iter5.

  7. Does a non-fourier KWin (stock arch kwin 1:6.6.4-1) exhibit the same green? The kwin-fourier 0001 patch is the known-distinguishing change; pinning back to stock kwin and re-testing isolates whether kwin-fourier introduced the issue.

  8. Does another compositor (wlroots-based sway, or weston) show the green too? This switches the compositor variable. If green there, it's not KWin-specific. If correct there, KWin is the suspect.
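For question 4, the add() requests in a WAYLAND_DEBUG=1 capture can be pulled out mechanically. The sketch below assumes libwayland's debug line shape `zwp_linux_buffer_params_v1#54.add(fd 41, 0, 0, 1920, 0, 0)` (older versions print `@` instead of `#`); the sample log lines are made up to match the trace in the conclusion. The `suspicious` check is only a heuristic for manual review, since a nonzero offset inside a separate fd is legal in general.

```python
import re

# Assumed WAYLAND_DEBUG line shape; prefix details vary by libwayland version.
ADD_RE = re.compile(
    r"zwp_linux_buffer_params_v1[@#](?P<obj>\d+)\.add\("
    r"fd (?P<fd>\d+), (?P<plane>\d+), (?P<offset>\d+), (?P<stride>\d+), "
    r"(?P<mod_hi>\d+), (?P<mod_lo>\d+)\)"
)


def parse_adds(log_text):
    """Group add() requests by buffer params object id."""
    by_params = {}
    for m in ADD_RE.finditer(log_text):
        entry = {k: int(v) for k, v in m.groupdict().items() if k != "obj"}
        by_params.setdefault(int(m["obj"]), []).append(entry)
    return by_params


def suspicious(adds):
    """Heuristic: distinct fds per plane, yet a later plane has a nonzero offset."""
    fds = {a["fd"] for a in adds}
    return len(fds) > 1 and any(
        a["offset"] != 0 for a in adds if a["plane"] > 0
    )


# Made-up sample in the assumed format.
SAMPLE = """\
[1234567.001]  -> zwp_linux_buffer_params_v1#54.add(fd 41, 0, 0, 1920, 0, 0)
[1234567.002]  -> zwp_linux_buffer_params_v1#54.add(fd 42, 1, 2088960, 1920, 0, 0)
[1234567.003]  -> wl_surface#30.commit()
"""

adds = parse_adds(SAMPLE)
print(sorted(adds))          # [54]
print(suspicious(adds[54]))  # True
```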

Phase 0 will deliver

Priority reordered 2026-05-08 after the active kwin-fourier patch was identified as 0001-transaction-bypass-watchDmaBuf-fence-wait.patch — a runtime-observable smoking gun (watchDmaBuf bypasses the implicit-sync fence wait, meaning KWin samples dmabufs without waiting for the producer fence; an all-zeros NV12 buffer renders solid green in YUV→RGB conversion). Stock-kwin A/B is the decisive single-step isolation.

  1. Stock-kwin A/B (was item 5) — pin back to extra/kwin (drops kwin-fourier patches), restart compositor, re-run reproduction. If green clears → the campaign's iter1 narrows to 0001-transaction-bypass-watchDmaBuf-fence-wait.patch. Output to phase0_evidence/<date>/kwin_fourier_ab.md.

  2. vaExportSurfaceHandle modifier capture — small C harness in phase0_evidence/<date>/va_modifier_probe.c linked against libva, prints the DRM_PRIME_2 descriptor for a freshly-allocated NV12 surface on ohm. Captured output goes to phase0_evidence/<date>/va_modifier_capture.md. (Less urgent if item 1 already isolates the cause, but useful as parallel data.)

  3. AVDRMFrameDescriptor modifier capture — small C harness using ffmpeg's av_hwframe_transfer_data against a /dev/media0 + /dev/video1 hwdevice context, prints the modifier ffmpeg reports. Output to phase0_evidence/<date>/av_modifier_capture.md.

  4. Wayland linux-dmabuf-v1 advertised list: wayland-info snapshot + WAYLAND_DEBUG=1 mpv ... excerpt showing the negotiation. Output to phase0_evidence/<date>/kwin_dmabuf_advertise.md.

  5. Pacman upgrade timeline review: journalctl _COMM=pacman or cat /var/log/pacman.log | awk '$1>="[2026-05-05"' on ohm to see what changed between iter5 close and now. Output to phase0_evidence/<date>/pacman_upgrade_window.md. (Useful to confirm kwin-fourier 0001 was already active at iter5 close; if so, the regression is somewhere else.)

  6. Compositor A/B (optional) — if items 1-5 don't conclude, swap compositor (sway via TTY login session) and capture. Output to phase0_evidence/<date>/compositor_ab.md.

Item 1 is ~10 minutes (downgrade + re-login + retest). Items 2-3 are decoder-side captures (~30 min each). Items 4-5 are 5 min each. Item 6 is bigger because it requires login-session swaps.
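For the pacman timeline item, a small filter can narrow /var/log/pacman.log to package changes inside the regression window. This sketch assumes the usual `[timestamp] [ALPM] action package (versions)` line shape with ISO-style timestamps; the sample lines and version strings are made up for illustration and are not taken from ohm.

```python
import re
from datetime import date, datetime

ALPM_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[ALPM\] "
    r"(?P<action>installed|upgraded|downgraded|removed) "
    r"(?P<pkg>\S+) \((?P<versions>[^)]*)\)"
)


def changes_in_window(lines, start, end, packages=None):
    """Return (day, action, pkg, versions) for ALPM events in [start, end]."""
    hits = []
    for line in lines:
        m = ALPM_RE.match(line)
        if not m:
            continue  # skip [PACMAN] command lines, hooks, warnings
        day = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S%z").date()
        if not (start <= day <= end):
            continue
        if packages and m["pkg"] not in packages:
            continue
        hits.append((day, m["action"], m["pkg"], m["versions"]))
    return hits


# Made-up sample lines in the assumed format.
SAMPLE_LOG = [
    "[2026-05-04T21:10:03+0000] [ALPM] upgraded mesa (1:26.0.5-1 -> 1:26.0.6-1)",
    "[2026-05-06T09:15:40+0000] [PACMAN] Running 'pacman -Syu'",
    "[2026-05-06T09:15:42+0000] [ALPM] upgraded kwin-fourier (1:6.6.4-1 -> 1:6.6.4-2)",
]

window = changes_in_window(SAMPLE_LOG, date(2026, 5, 5), date(2026, 5, 8))
for hit in window:
    print(*hit)
```

On a real host the `lines` argument would be the open log file, and `packages` could be restricted to names such as kwin-fourier or mesa.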

kwin-fourier 0001 patch context (2026-05-08)

Active patch: 0001-transaction-bypass-watchDmaBuf-fence-wait.patch in ~/src/marfrit-packages/arch/kwin-fourier/. Run-history breadcrumb:

  • run #51 (84088141, 2026-05-04): introduced as kwin-fourier: bypass watchDmaBuf implicit-sync fence wait (experiment).
  • run #58 (00aa186b, 2026-05-04): switched active patch to 0002-transaction-poll-dmabuf-fd-directly-upstream-shape.patch.
  • run #59 (bc2c97d1, 2026-05-04): reverted active patch to 0001, bumped pkgrel=2.

The hypothesis if stock-kwin clears the green: the bypass introduced a subtle race where the dmabuf is sampled before the v4l2 stateless decoder's CAPTURE buffer has been written. Pre-iter5, this may have been masked because mpv's --hwdec=vaapi with libva-multiplanar produces buffers with explicit vaSyncSurface calls that block until decode-complete (libva's API contract is "buffer is valid after vaSyncSurface returns"). But the same buffers via --hwdec=v4l2request go through ffmpeg's AVDRMFrameDescriptor path which doesn't hit vaSyncSurface — and the implicit fence (which kwin-fourier 0001 ignores) is the only ordering primitive left. So the green showing up on both paths simultaneously is consistent with the hypothesis only if the libva path also somehow lost its sync, which is plausible if iter6/7 changed the libva-multiplanar vaSyncSurface implementation. Worth checking.

After Phase 0 closes, Phase 1 will reproduce on a controlled test rig (probably mpv -v with WAYLAND_DEBUG=1, deterministic frame count, structured output capture) so Phase 4's fix attempt has a clean signal-to-noise environment.

Phase 0 cross-references

  • libva-multiplanar phase0_findings.md — Phase 0 / Phase 2 substrate for the original campaign. The decoder-side facts there are reference (modifier reporting in iter2 Fix 2, NV12 multi-planar paths).
  • kwin-overlay-subsurface phase2_source_findings.md — modifier table for PineTab2's rockchip-drm planes. Plane 39 (Primary, NV12 LINEAR) is the only NV12-capable scanout; this campaign's bug may be related to whether the dmabuf reaches Plane 39 vs goes through GL composition (the predecessor verdict was "no NV12-capable Overlay plane, so KWin always GL-composites").
  • libva-multiplanar iter5 phase8_iteration5_close.md — last close date 2026-05-05 with the "mpv smooth" claim. Verifying that the date stamp is correct and the test was run interactively (not just via the perf binding cell) is one of Phase 0's housekeeping tasks.

Out-of-scope reminders

  • Performance / "make it smooth": this campaign is correctness-only. The user already has --vo=gpu --hwdec=v4l2request as a working slow path.
  • Decoder-side bugs: those belong to libva-multiplanar iter9. Anything that turns out to be vaExportSurfaceHandle lying about the modifier hands the bug back to iter9.
  • Other hardware: ohm is the locked target. fresnel (RK3399, Mali-T860 Midgard) and ampere (RK3588) may or may not exhibit the same — note in cross-campaign memory if they do, but don't expand scope to fix on those hosts.
  • AV1 / VP9 / HEVC dmabuf paths: H.264 only for this triage.