# invent.kde.org/plasma/kwin merge request body **Title**: `wayland/transaction: poll dmabuf fd directly instead of EXPORT_SYNC_FILE` **Branch**: `master` (forward-port to `Plasma/6.6` will be requested via backport label after master lands). --- ## Summary `Transaction::watchDmaBuf` calls `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` on every plane of every imported dmabuf and parks the transaction on a `QSocketNotifier(POLLIN)` waiting for the resulting sync_file fd to become readable. This is correct, but unnecessary. The dma-buf core (`drivers/dma-buf/dma-buf.c`, `dma_buf_poll`) has supported `poll(POLLIN)` on the dmabuf fd directly since implicit-sync was introduced. The sync_file we obtain via `EXPORT_SYNC_FILE` wraps the same set of fences that polling the dmabuf fd directly would wait on. The export-then-poll round-trip costs: - one ioctl into the kernel (`DMA_BUF_IOCTL_EXPORT_SYNC_FILE`) - one sync_file allocation + struct + ref-count - one extra dup'd fd we hand to QSocketNotifier …per fence per plane per frame on every wp_linux_dmabuf-v1 client. For a single 1080p video stream at 30 fps with 2 NV12 planes, that's 60 ioctls/sec/client of pure overhead with no semantic difference vs. polling the dmabuf fd directly. This patch dup()s the dmabuf fd we already have and hands it to TransactionFence directly. Same `QSocketNotifier(POLLIN)` wait semantics, fewer syscalls. ## Why this matters On Mali-class hardware (RK3566 / RK3588 / mainline kernel + panfrost or panthor mesa) running KWin Wayland, the per-frame ioctl overhead is a measurable contributor to compositor latency spikes. During a wider investigation into a chrome-on-KWin video playback stall (chromium-fourier campaign, see https://github.com/marfrit/fourier), the watchDmaBuf path was identified as one of two contributors to the deadlock; the other is a kernel-side gap (V4L2 producers don't populate the dmabuf's `dma_resv` exclusive fence, so the sync_file we export is a stub-signalled `dma_fence_get_stub()` representing nothing real). A separate kernel patch series (sent to linux-media) addresses the producer-side gap. With that kernel patch in place, the wait genuinely waits on a producer fence — but the export-then-poll round-trip is still pure overhead. This MR addresses the KWin side of the latency. ## Side effect Removes the dependency on `` and `` from `transaction.cpp`'s includes, since they were only present for `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` / `drmIoctl()`. The `exportWaitSyncFile()` static helper is removed for the same reason. ## Alternative considered Adding a timeout to the existing path (e.g. 16 ms ≈ one 60 Hz vsync). That preserves more of the existing code shape but doesn't remove the per-frame ioctl overhead. Polling the dmabuf fd directly is the cleaner fix and matches what the dma-buf API documents as the recommended primitive for this case. ## Validation Built and run on PineTab2 (RK3566 / Mali-G52 / mainline kernel 6.19.10 / panfrost mesa 26.0.5 / KDE Plasma 6.6.4 Wayland). KWin session starts cleanly, multi-window OpenGL workloads (Brave, plasma-overview, kate, dolphin) render normally — no regression. ### Honest finding from `strace -e trace=ioctl` on `kwin_wayland` In 30 s of 1080p30 H.264 playback under chromium-fourier (wp_linux_dmabuf_v1 client, V4L2-stateless decode capable): - **Zero `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`** calls. - 29,128 ioctl calls total, dominated by: - `DRM_IOCTL_PANFROST_*` (rendering) - `DRM_IOCTL_SYNCOBJ_*` (≈ 7,800 — explicit-sync syncobj operations) The same shape holds with stock Brave + 60 s playback (96,120 total ioctls, 0 EXPORT_SYNC_FILE). KWin 6.6.4 with a modern client negotiates `wp_linux_drm_syncobj_v1` **explicit sync** — `Transaction::watchDmaBuf` is not on the hot path in this configuration. The function still exists and runs for clients that don't advertise drm-syncobj support (older Wayland clients, some video pipelines that import implicit-sync producer dmabufs without wrapping them), but on a current Plasma + drm-syncobj-aware client stack, the call frequency we initially expected (≈ 60 ioctls/sec/client at 1080p30) does not materialize. ### What this means for the patch The patch is still correct and worth taking, but the *value proposition* shifts from "measurable per-frame win on V4L2 video playback" to "remove a kernel round-trip on the legacy implicit-sync path that still services older Wayland clients and V4L2 producers that don't go through drm-syncobj-aware compositor paths". The structural improvement is real wherever `Transaction::watchDmaBuf` fires; we just couldn't construct a benchmark on this hardware that fires it heavily enough to put a percentage on. Reviewers with older client baselines (Plasma 5 era apps, GTK on older Wayland, xwayland passing implicit-sync buffers in some configurations) may see the call site fire more often. ### What is reproducible - No regression: 60-second video playback, no fps loss, no rendering artifacts, no compositor stalls. - Strace evidence available at https://git.reauktion.de/marfrit/marfrit-packages under `upstream-submissions/kwin-fourier/measurements/`: - `stock-6.6.4-bbb1080p30-60s.txt` — `strace -c -f` summary - `stock-cr-fourier-ioctls-30s.log` — full per-ioctl trace under chromium-fourier playback - Patch correctness: the dmabuf fd's `poll()` semantics are documented in `Documentation/driver-api/dma-buf.rst` "Implicit Fence Poll Support" and match what `EXPORT_SYNC_FILE`+poll observes. ## Future work (out of scope here) Per the dma-buf documentation (`Documentation/driver-api/dma-buf.rst`, "Implicit Fence Poll Support"), some V4L2 producers do not populate `dma_resv` exclusive fences, which makes both this direct-poll path and the existing `EXPORT_SYNC_FILE` path stub-resolve immediately for those buffers. Filed as a separate kernel patch series at linux-media-devel. --- # KDE submission notes (Per https://invent.kde.org/plasma/kwin and KDE's typical contribution flow.) ## Account setup (one-time) 1. Create account at https://invent.kde.org 2. Sign KDE's CLA at https://contributoragreement.kde.org 3. SSH key configured at https://invent.kde.org/-/profile/keys ## Fork + branch ```sh # On the github web UI: fork plasma/kwin to your namespace git clone git@invent.kde.org:/kwin.git cd kwin git remote add upstream https://invent.kde.org/plasma/kwin.git git fetch upstream git checkout -b transaction-poll-dmabuf-fd-directly upstream/master git am path/to/kwin-fourier/0002-transaction-poll-dmabuf-fd-directly-upstream-shape.patch git push origin HEAD ``` Then create the MR via the web UI (a banner appears on the git push output linking to the create-MR page). ## Reviewers / labels Recommended reviewers (these tend to handle KWin Wayland MRs): - Vlad Zahorodnii (vladz) — KWin Wayland scene/transaction owner - Xaver Hugl (zamundaaa) — KWin DRM / present pipeline - David Edmundson — KWin generally Labels to suggest: - `Component::Wayland` - `Performance` - After acceptance, request a `Backport::Plasma/6.6` (or whichever LTS branch is current) for downstream distros.