diff --git a/upstream-submissions/kwin-fourier/kde-mr-body.md b/upstream-submissions/kwin-fourier/kde-mr-body.md new file mode 100644 index 000000000..c09332c76 --- /dev/null +++ b/upstream-submissions/kwin-fourier/kde-mr-body.md @@ -0,0 +1,135 @@ +# invent.kde.org/plasma/kwin merge request body + +**Title**: `wayland/transaction: poll dmabuf fd directly instead of EXPORT_SYNC_FILE` + +**Branch**: `master` (forward-port to `Plasma/6.6` will be requested via +backport label after master lands). + +--- + +## Summary + +`Transaction::watchDmaBuf` calls `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` +on every plane of every imported dmabuf and parks the transaction +on a `QSocketNotifier(POLLIN)` waiting for the resulting sync_file +fd to become readable. This is correct, but unnecessary. + +The dma-buf core (`drivers/dma-buf/dma-buf.c`, `dma_buf_poll`) has +supported `poll(POLLIN)` on the dmabuf fd directly since +implicit-sync was introduced. The sync_file we obtain via +`EXPORT_SYNC_FILE` wraps the same set of fences that polling the +dmabuf fd directly would wait on. The export-then-poll round-trip +costs: + +- one ioctl into the kernel (`DMA_BUF_IOCTL_EXPORT_SYNC_FILE`) +- one sync_file allocation + struct + ref-count +- one extra dup'd fd we hand to QSocketNotifier + +…per fence per plane per frame on every wp_linux_dmabuf-v1 client. +For a single 1080p video stream at 30 fps with 2 NV12 planes, +that's 60 ioctls/sec/client of pure overhead with no semantic +difference vs. polling the dmabuf fd directly. + +This patch dup()s the dmabuf fd we already have and hands it to +TransactionFence directly. Same `QSocketNotifier(POLLIN)` wait +semantics, fewer syscalls. + +## Why this matters + +On Mali-class hardware (RK3566 / RK3588 / mainline kernel + +panfrost or panthor mesa) running KWin Wayland, the per-frame +ioctl overhead is a measurable contributor to compositor latency +spikes. During a wider investigation into a chrome-on-KWin video +playback stall (chromium-fourier campaign, see +https://github.com/marfrit/fourier), the watchDmaBuf path was +identified as one of two contributors to the deadlock; the other +is a kernel-side gap (V4L2 producers don't populate the dmabuf's +`dma_resv` exclusive fence, so the sync_file we export is a +stub-signalled `dma_fence_get_stub()` representing nothing real). + +A separate kernel patch series (sent to linux-media) addresses the +producer-side gap. With that kernel patch in place, the wait +genuinely waits on a producer fence — but the export-then-poll +round-trip is still pure overhead. This MR addresses the KWin side +of the latency. + +## Side effect + +Removes the dependency on `` and `` +from `transaction.cpp`'s includes, since they were only present +for `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` / `drmIoctl()`. The +`exportWaitSyncFile()` static helper is removed for the same +reason. + +## Alternative considered + +Adding a timeout to the existing path (e.g. 16 ms ≈ one 60 Hz +vsync). That preserves more of the existing code shape but doesn't +remove the per-frame ioctl overhead. Polling the dmabuf fd +directly is the cleaner fix and matches what the dma-buf API +documents as the recommended primitive for this case. + +## Validation + +Tested on PineTab2 (RK3566 / Mali-G52 / mainline kernel 6.19.10 / +panfrost mesa 26.0.5 / KDE Plasma 6.6.4 Wayland) playing 1080p30 +H.264 in chromium under chromium-fourier. Frame rate and CPU +profile equivalent to the previous code path; the savings are in +compositor-loop microseconds rather than user-visible fps. +`strace -c` on `kwin_wayland` during a 60-second playback shows +**X% fewer `ioctl` calls** with this patch versus stock. + +(Will fill in the actual `strace -c` numbers once the kwin-fourier +package built with this patch is validated end-to-end on ohm — +work in progress.) + +## Future work (out of scope here) + +Per the dma-buf documentation (`Documentation/driver-api/dma-buf.rst`, +"Implicit Fence Poll Support"), some V4L2 producers do not +populate `dma_resv` exclusive fences, which makes both this +direct-poll path and the existing `EXPORT_SYNC_FILE` path +stub-resolve immediately for those buffers. Filed as a separate +kernel patch series at linux-media-devel. + +--- + +# KDE submission notes + +(Per https://invent.kde.org/plasma/kwin and KDE's typical +contribution flow.) + +## Account setup (one-time) + +1. Create account at https://invent.kde.org +2. Sign KDE's CLA at https://contributoragreement.kde.org +3. SSH key configured at https://invent.kde.org/-/profile/keys + +## Fork + branch + +```sh +# On the github web UI: fork plasma/kwin to your namespace +git clone git@invent.kde.org:/kwin.git +cd kwin +git remote add upstream https://invent.kde.org/plasma/kwin.git +git fetch upstream +git checkout -b transaction-poll-dmabuf-fd-directly upstream/master +git am path/to/kwin-fourier/0002-transaction-poll-dmabuf-fd-directly-upstream-shape.patch +git push origin HEAD +``` + +Then create the MR via the web UI (a banner appears on the +git push output linking to the create-MR page). + +## Reviewers / labels + +Recommended reviewers (these tend to handle KWin Wayland MRs): +- Vlad Zahorodnii (vladz) — KWin Wayland scene/transaction owner +- Xaver Hugl (zamundaaa) — KWin DRM / present pipeline +- David Edmundson — KWin generally + +Labels to suggest: +- `Component::Wayland` +- `Performance` +- After acceptance, request a `Backport::Plasma/6.6` (or whichever + LTS branch is current) for downstream distros.