commit 7d68d1723296a97969fe4905c995290870476f2f
Author: Markus Fritsche
Date:   Fri May 8 13:49:51 2026 +0000

    dmabuf-modifier-triage campaign scaffold

    Focused triage of marfrit/libva-multiplanar#1 — dmabuf-wayland green on ohm independent of decoder backend. Locked question: identify the layer responsible (libva / ffmpeg / KWin / Mesa-panfrost / kernel) and file upstream where appropriate. Performance is explicitly out of scope — user has working slow path via vo=gpu hwdec=v4l2request.

    Phase 0 deliverables: vaExportSurfaceHandle + AVDRMFrameDescriptor modifier captures, Wayland linux-dmabuf-v1 advertise snapshot, pacman upgrade timeline review for the iter5→iter8 regression window, and stock-kwin A/B isolating kwin-fourier as a candidate.

    Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..2028810
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,25 @@
+references/
+
+# Inside phase*_evidence/ track narrative .md notes; ignore raw captures.
+phase*_evidence/**/*.txt
+phase*_evidence/**/*.log
+phase*_evidence/**/*.pcap
+phase*_evidence/**/*.strace
+phase*_evidence/**/*.bin
+phase*_evidence/**/*.gz
+phase*_evidence/**/*.zst
+phase*_evidence/**/*.dat
+phase*_evidence/**/*.trace
+phase*_evidence/**/*.json
+phase*_evidence/**/*.ftrace
+phase*_evidence/**/*.jpg
+phase*_evidence/**/*.png
+phase*_evidence/**/*.nv12
+phase*_evidence/**/*.yuv
+phase*_evidence/**/*.tsv
+phase*_evidence/**/*.strace*
+phase*_evidence/**/libva.trace.*
+
+*.log
+*.pcap
+*.strace
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..eea00f5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,64 @@
+# dmabuf-modifier-triage
+
+## TL;DR
+
+Focused triage campaign for **[marfrit/libva-multiplanar#1](https://git.reauktion.de/marfrit/libva-multiplanar/issues/1)** — `mpv --vo=dmabuf-wayland` produces solid green frames on ohm regardless of decoder backend (`--hwdec=vaapi`, `--hwdec=v4l2request`). `--vo=gpu` displays correct content.
The bug is squarely in the dmabuf-wayland↔KWin presentation handoff, not in the decoder.
+
+This campaign exists because libva-multiplanar's iter9 was scoped narrowly to fix its own decoder-side cap_pool/REQBUFS cascade ([marfrit/libva-v4l2-request-fourier#1](https://git.reauktion.de/marfrit/libva-v4l2-request-fourier/issues/1)) and shouldn't bloat to take on the presentation-layer bug. KWin/panfrost stack work is also explicitly out-of-scope for libva-multiplanar per its `Out-of-scope` README section.
+
+## Locked question
+
+**Identify the layer responsible for the dmabuf-wayland green on ohm — libva (`vaExportSurfaceHandle` modifier reporting), ffmpeg V4L2 request hwaccel (`AVDRMFrameDescriptor` modifier), KWin (`linux-dmabuf-v1` accept logic), Mesa-panfrost (modifier import constraints), or the kernel hantro driver (buffer attribute reporting). File upstream where appropriate; fix what's locally in scope.**
+
+Performance / "make smooth" is explicitly **out** of this campaign — the goal is bisection-down to the offending layer + a recorded fix or upstream filing per layer. The user's working HW-decode workflow on ohm right now is `--vo=gpu --hwdec=v4l2request` (slow but correct), which is sufficient until this campaign lands.
+
+## Hardware target
+
+- **ohm** — PineTab2, Rockchip **RK3566** (note: prior README references to RK3568 corrected per `~/src/libva-multiplanar/libva-v4l2-request-fourier/` commit `dcaa1f1`). hantro G1 decoder, **Mali-G52 (Bifrost)** GPU via panfrost.
+- KWin 6.x via Plasma 6 Wayland.
+- Reproducer: `~/fourier-test/bbb_1080p30_h264.mp4` (Pinebook Pro / ohm shared corpus from `ohm_gl_fix` campaign).
+
+Other hosts may serve as A/B controls (different GPU + same decoder, or same GPU + different decoder) — see `phase0_findings.md` for the planned matrix.
+
+## Process
+
+8(+1) phase loop per [`feedback_dev_process.md`](../../.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md). Phase 0 = locked-question + substrate.
Phase 5 review uses the sonnet-architect subagent pattern.
+
+In-session-acquired data discipline per [`feedback_replicate_baseline_first.md`](../../.claude/projects/-home-mfritsche-src-kwin-overlay-subsurface/memory/feedback_replicate_baseline_first.md): the libva-multiplanar campaign's decoder-side measurements are reference history, not threshold sources for this campaign's cells.
+
+## Predecessor work this campaign builds on
+
+- **[`../libva-multiplanar/`](../libva-multiplanar/)** — closed iter1..iter5 + iter6/7/8 work on the libva-v4l2-request-fourier fork. The campaign's iter5/8 close claimed "mpv `--hwdec=vaapi` smooth" on ohm — that claim is what was found to fail on 2026-05-08, with the green being one of the two failure modes (the other is the libva cascade, separately tracked at iter9). Cross-link from libva-multiplanar README's "Known issues at iter8 production tip" section.
+- **[`../kwin-overlay-subsurface/`](../kwin-overlay-subsurface/)** — closed without patch. `phase2_source_findings.md` covers PineTab2's rockchip-drm plane format/modifier table. Plane 39 (Primary, NV12 LINEAR) is the only NV12-capable scanout. Useful because the modifier negotiation we're triaging here may overlap with the scanout-plane-modifier facts that campaign captured.
+- **[`../x11-session-research/`](../x11-session-research/)** — confirmed the scanout-plane gap isn't fixable by switching session servers. Also useful negative space: this triage shouldn't waste cycles on "switch off Wayland" experiments that the predecessor already ran.
+- **[`~/.claude/projects/-home-mfritsche-src-fourier/memory/project_libva_multiplanar_state.md`](../../.claude/projects/-home-mfritsche-src-fourier/memory/project_libva_multiplanar_state.md)** — campaign-state memory captures the package shipped + both bugs found.
+
+External reference:
+
+- mpv `--vo=dmabuf-wayland` source:
+- KWin Wayland linux-dmabuf-v1 source: KDE `kwin/src/wayland/linuxdmabufv1clientbuffer.cpp`
+- Mesa panfrost modifier handling: `src/panfrost/lib/genxml/decode_*` etc.
+- Linux DRM modifier registry: `include/uapi/drm/drm_fourcc.h` (`DRM_FORMAT_MOD_*`)
+
+## Repository layout
+
+```
+~/src/dmabuf-modifier-triage/    <- this campaign (its own git repo)
+├── README.md                    <- this file
+├── phase0_findings.md           <- locked question + Phase 0 work list
+├── (worklist.md, phase[2-8]*.md as phases land)
+└── (no fork code lives here — bug is presentation-side, not in libva-v4l2-request-fourier;
+    fixes either land upstream or in kwin-fourier / panfrost-side packages)
+```
+
+The campaign repo stays separate from any fork. If a fix lands as a downstream patch, it lives in `~/src/marfrit-packages/arch//` (kwin-fourier, mesa, etc.) per the existing fourier umbrella convention, with a pointer back to this campaign's findings.
+
+Operator-facing repo URL: `git.reauktion.de/marfrit/dmabuf-modifier-triage` — created empty during scaffolding, no push until first iteration finds something worth publishing.
+
+## Non-upstreaming default
+
+Inherited from `feedback_no_upstream.md`. **Exception**: when triage proves the bug is upstream code (KWin / Mesa-panfrost / kernel), opening a properly-formatted upstream bug report is the *only* useful outcome of that line of investigation, so this campaign explicitly **does** plan upstream filings as deliverables when scope-correct. Upstream submissions land in `~/src/marfrit-packages/upstream-submissions/dmabuf-modifier-triage/` per the existing convention.
+
+## Build infrastructure
+
+No build host needed — this is a triage campaign, not a code-bearing one. Reproductions run on ohm directly. If a kwin-fourier rebuild is required for an A/B test, the existing boltzmann + marfrit-publish path applies.
diff --git a/phase0_findings.md b/phase0_findings.md
new file mode 100644
index 0000000..1eecf44
--- /dev/null
+++ b/phase0_findings.md
@@ -0,0 +1,85 @@
+# Phase 0 — locked research question, substrate, deliverables
+
+**Locked 2026-05-08.** Iter1 phase 0 substrate.
+
+## Locked research question
+
+> Identify the layer responsible for the dmabuf-wayland green on ohm —
+> libva (`vaExportSurfaceHandle` modifier reporting), ffmpeg V4L2 request
+> hwaccel (`AVDRMFrameDescriptor` modifier), KWin (`linux-dmabuf-v1` accept
+> logic), Mesa-panfrost (modifier import constraints), or the kernel
+> hantro driver (buffer attribute reporting). File upstream where
+> appropriate; fix what's locally in scope.
+
+Bug tracker: [marfrit/libva-multiplanar#1](https://git.reauktion.de/marfrit/libva-multiplanar/issues/1).
+
+## Reproduction (verbatim from issue tracker)
+
+```bash
+# All three on ohm with libva-v4l2-request-fourier-1.0.0.r280.65969da-1
+# from [marfrit] and /etc/profile.d/libva-v4l2-request.sh in effect.
+
+# 1. via libva — green (also hits libva-v4l2-request-fourier#1, but green
+# would persist even with that bug fixed)
+mpv --hwdec=vaapi --vo=dmabuf-wayland --target-colorspace-hint=no \
+    fourier-test/bbb_1080p30_h264.mp4
+
+# 2. via ffmpeg V4L2 request hwaccel — also green (no libva)
+mpv --hwdec=v4l2request --vo=dmabuf-wayland \
+    fourier-test/bbb_1080p30_h264.mp4
+
+# 3. via ffmpeg V4L2 request hwaccel + GPU shader VO — correct picture (slow)
+mpv --hwdec=v4l2request --vo=gpu \
+    fourier-test/bbb_1080p30_h264.mp4
+```
+
+Result #3 is the workaround currently in use. The campaign closes when result #1 displays correctly.
+
+## Open questions
+
+1. **What modifier does libva's `vaExportSurfaceHandle` report for the hantro decode surface on ohm?** Should be `DRM_FORMAT_MOD_LINEAR` (`0x0`) per iter2 Fix 2's pitch-aligned path, but the green suggests otherwise.
Need a `vainfo`-equivalent or a small C harness that calls `vaCreateSurfaces` + `vaExportSurfaceHandle` and prints the `VADRMPRIMESurfaceDescriptor.objects[i].drm_format_modifier`.
+
+2. **What modifier does ffmpeg's V4L2 request hwaccel report for the same decode?** Captured by inspecting `AVDRMFrameDescriptor.objects[i].format_modifier` on the decoded `AV_PIX_FMT_DRM_PRIME` frame (the descriptor is the frame's `data[0]`). Probably comes from `VIDIOC_G_FMT(CAPTURE_MPLANE)` plus a hardcoded LINEAR if v4l2 doesn't report a modifier.
+
+3. **What modifier does KWin advertise via `zwp_linux_dmabuf_v1.modifier`?** From mpv `-v` output we already know the answer is "NV12 with modifier 0x0 only." But it's worth confirming via `wayland-info` that this is the *only* advertised entry, and capturing whether KWin also advertises `DRM_FORMAT_MOD_INVALID` as the catch-all.
+
+4. **Does KWin reject the buffer outright (protocol error) or accept and display garbage?** From the `zwp_linux_dmabuf_v1` protocol perspective: a rejected import surfaces as a `zwp_linux_buffer_params_v1.failed` event or a protocol error, both visible in the client's protocol log. Strace the KWin compositor or use a `WAYLAND_DEBUG=1` mpv run to capture the protocol exchange.
+
+5. **Is the bug in the modifier handshake or in the buffer's content interpretation?** Specifically: if KWin accepts the buffer but renders it wrong, the issue is *interpretation* (likely Mali-G52 panfrost's NV12 sampler reading raw pixels assuming a stride/layout that doesn't match). If KWin rejects, the issue is *negotiation* (mpv claims a modifier KWin won't accept).
+
+6. **Has KWin or Mesa-panfrost been upgraded between iter5 close (2026-05-05) and now (2026-05-08)?** A `pacman -Q` log + `pacman.log` review on ohm tells us whether new package versions correlate with the iter5→iter8 regression window. The kwin-fourier version on ohm (probably `1:6.6.4-1` per packages.reauktion.de) needs cross-checking against the version that was "smooth" at iter5.
+
+7. 
**Does a non-fourier KWin (stock arch `kwin 1:6.6.4-1`) exhibit the same green?** The kwin-fourier 0001 patch is the known-distinguishing change; pinning back to stock kwin and re-testing isolates whether kwin-fourier introduced the issue.
+
+8. **Does another compositor (wlroots-based sway, or weston) show the green too?** Switches the compositor variable. If green there, it's not KWin-specific. If correct there, KWin is the suspect.
+
+## Phase 0 will deliver
+
+1. **vaExportSurfaceHandle modifier capture** — small C harness in `phase0_evidence//va_modifier_probe.c` linked against libva, prints the DRM_PRIME_2 descriptor for a freshly-allocated NV12 surface on ohm. Captured output goes to `phase0_evidence//va_modifier_capture.md`.
+
+2. **AVDRMFrameDescriptor modifier capture** — small C harness using ffmpeg's `av_hwframe_transfer_data` against a /dev/media0 + /dev/video1 hwdevice context, prints the modifier ffmpeg reports. Output to `phase0_evidence//av_modifier_capture.md`.
+
+3. **Wayland linux-dmabuf-v1 advertised list** — `wayland-info` snapshot + `WAYLAND_DEBUG=1 mpv ...` excerpt showing the negotiation. Output to `phase0_evidence//kwin_dmabuf_advertise.md`.
+
+4. **Pacman upgrade timeline review** — `journalctl _COMM=pacman` or `awk '$1>="[2026-05-05"' /var/log/pacman.log` on ohm to see what changed between iter5 close and now. Output to `phase0_evidence//pacman_upgrade_window.md`.
+
+5. **Stock-kwin A/B** — temporarily swap `kwin-fourier` for stock arch `kwin`, re-run reproduction, capture result. Output to `phase0_evidence//kwin_fourier_ab.md`.
+
+6. **Compositor A/B (optional)** — if items 1-5 don't conclude, swap compositor (sway via TTY login session) and capture. Output to `phase0_evidence//compositor_ab.md`.
+
+Items 1-2 are decoder-side captures (~30 min each). Items 3-4 are 5 min each. Items 5-6 are bigger because they require login-session swaps.
+
+After Phase 0 closes, Phase 1 will reproduce on a controlled test rig (probably `mpv -v` with WAYLAND_DEBUG=1, deterministic frame count, structured output capture) so Phase 4's fix attempt has a clean signal-to-noise environment.
+
+## Phase 0 cross-references
+
+- libva-multiplanar `phase0_findings.md` — Phase 0 / Phase 2 substrate for the original campaign. The decoder-side facts there are reference (modifier reporting in iter2 Fix 2, NV12 multi-planar paths).
+- kwin-overlay-subsurface `phase2_source_findings.md` — modifier table for PineTab2's rockchip-drm planes. Plane 39 (Primary, NV12 LINEAR) is the only NV12-capable scanout; this campaign's bug may be related to whether the dmabuf reaches Plane 39 vs goes through GL composition (the predecessor verdict was "no NV12-capable Overlay plane, so KWin always GL-composites").
+- libva-multiplanar `iter5 phase8_iteration5_close.md` — last close date 2026-05-05 with the "mpv smooth" claim. Verifying that the date stamp is correct and the test was run interactively (not just via the perf binding cell) is one of Phase 0's housekeeping tasks.
+
+## Out-of-scope reminders
+
+- Performance / "make it smooth": this campaign is correctness-only. The user already has `--vo=gpu --hwdec=v4l2request` as a working slow path.
+- Decoder-side bugs: those belong to libva-multiplanar iter9. Anything that turns out to be `vaExportSurfaceHandle` lying about the modifier hands the bug back to iter9.
+- Other hardware: ohm is the locked target. fresnel (RK3399, Mali-T860 Midgard) and ampere (RK3588) may or may not exhibit the same — note in cross-campaign memory if they do, but don't expand scope to fix on those hosts.
+- AV1 / VP9 / HEVC dmabuf paths: H.264 only for this triage.
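
Note on Phase 0 deliverable 4 (pacman upgrade timeline review): the log filter could be wrapped in a small shell function so that the window start date and the log path become parameters. The following is a minimal sketch and not part of this commit. `pacman_upgrade_window` is a hypothetical helper name, and the sketch assumes the `[YYYY-MM-DDTHH:MM:SS+ZZZZ] [ALPM] <action> <package> (...)` line layout that current pacman versions write to `/var/log/pacman.log`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an existing tool): print package transactions
# recorded in pacman.log on or after a given date.
# Assumes an ISO-8601 timestamp in field 1, "[ALPM]" in field 2 and the
# action verb in field 3, as written by current pacman versions.
pacman_upgrade_window() {
    local since="$1"                        # window start, e.g. 2026-05-05
    local log="${2:-/var/log/pacman.log}"   # override the path for testing
    # ISO dates sort lexicographically, so a plain string comparison on the
    # timestamp field selects everything from the given day onward.
    awk -v since="[$since" '
        $1 >= since && $2 == "[ALPM]" &&
        $3 ~ /^(installed|upgraded|downgraded|removed)$/
    ' "$log"
}
```

On ohm this might be run as `pacman_upgrade_window 2026-05-05 | grep -E 'kwin|mesa'` to narrow the output to the packages of interest for the iter5→iter8 window; the package-name pattern is illustrative only. Matching installs, downgrades, and removals as well as upgrades is a deliberate choice here, since a swap between `kwin-fourier` and stock `kwin` would not show up as an upgrade.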