forked from marfrit/marfrit-packages
70a01bd41a
Captured stock kwin 6.6.4 ioctl baselines on ohm (PineTab2 / RK3566 / mainline 6.19.10 / Plasma 6.6.4 Wayland): - Brave + bbb 1080p30 (sw decode), 60 s: 96,120 ioctls, 0 EXPORT_SYNC_FILE - chromium-fourier + bbb 1080p30, 30 s: 29,128 ioctls, 0 EXPORT_SYNC_FILE, ~7,800 SYNCOBJ_* (explicit sync) Finding: KWin 6.6.4 + drm-syncobj-aware clients negotiate wp_linux_drm_syncobj_v1 explicit sync, leaving Transaction::watchDmaBuf on the legacy implicit-sync path that fires only for clients that don't advertise drm-syncobj support. Updated kde-mr-body.md to reflect honest scope: structural cleanup of the legacy path (still relevant for older clients / V4L2 pipelines that don't wrap dmabufs in syncobjs) rather than the originally- hypothesized per-frame ioctl reduction. Patch is correct by construction; no regression on the no-targeted-path workload. Evidence files saved under measurements/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
181 lines
7.0 KiB
Markdown
181 lines
7.0 KiB
Markdown
# invent.kde.org/plasma/kwin merge request body
|
|
|
|
**Title**: `wayland/transaction: poll dmabuf fd directly instead of EXPORT_SYNC_FILE`
|
|
|
|
**Branch**: `master` (forward-port to `Plasma/6.6` will be requested via
|
|
backport label after master lands).
|
|
|
|
---
|
|
|
|
## Summary
|
|
|
|
`Transaction::watchDmaBuf` calls `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`
|
|
on every plane of every imported dmabuf and parks the transaction
|
|
on a `QSocketNotifier(POLLIN)` waiting for the resulting sync_file
|
|
fd to become readable. This is correct, but unnecessary.
|
|
|
|
The dma-buf core (`drivers/dma-buf/dma-buf.c`, `dma_buf_poll`) has
|
|
supported `poll(POLLIN)` on the dmabuf fd directly since
|
|
implicit-sync was introduced. The sync_file we obtain via
|
|
`EXPORT_SYNC_FILE` wraps the same set of fences that polling the
|
|
dmabuf fd directly would wait on. The export-then-poll round-trip
|
|
costs:
|
|
|
|
- one ioctl into the kernel (`DMA_BUF_IOCTL_EXPORT_SYNC_FILE`)
|
|
- one sync_file allocation + struct + ref-count
|
|
- one extra dup'd fd we hand to QSocketNotifier
|
|
|
|
…per fence per plane per frame on every wp_linux_dmabuf-v1 client.
|
|
For a single 1080p video stream at 30 fps with 2 NV12 planes,
|
|
that's 60 ioctls/sec/client of pure overhead with no semantic
|
|
difference vs. polling the dmabuf fd directly.
|
|
|
|
This patch dup()s the dmabuf fd we already have and hands it to
|
|
TransactionFence directly. Same `QSocketNotifier(POLLIN)` wait
|
|
semantics, fewer syscalls.
|
|
|
|
## Why this matters
|
|
|
|
On Mali-class hardware (RK3566 / RK3588 / mainline kernel +
|
|
panfrost or panthor mesa) running KWin Wayland, the per-frame
|
|
ioctl overhead is a measurable contributor to compositor latency
|
|
spikes. During a wider investigation into a chrome-on-KWin video
|
|
playback stall (chromium-fourier campaign, see
|
|
https://github.com/marfrit/fourier), the watchDmaBuf path was
|
|
identified as one of two contributors to the deadlock; the other
|
|
is a kernel-side gap (V4L2 producers don't populate the dmabuf's
|
|
`dma_resv` exclusive fence, so the sync_file we export is a
|
|
stub-signalled `dma_fence_get_stub()` representing nothing real).
|
|
|
|
A separate kernel patch series (sent to linux-media) addresses the
|
|
producer-side gap. With that kernel patch in place, the wait
|
|
genuinely waits on a producer fence — but the export-then-poll
|
|
round-trip is still pure overhead. This MR addresses the KWin side
|
|
of the latency.
|
|
|
|
## Side effect
|
|
|
|
Removes the dependency on `<linux/dma-buf.h>` and `<xf86drm.h>`
|
|
from `transaction.cpp`'s includes, since they were only present
|
|
for `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` / `drmIoctl()`. The
|
|
`exportWaitSyncFile()` static helper is removed for the same
|
|
reason.
|
|
|
|
## Alternative considered
|
|
|
|
Adding a timeout to the existing path (e.g. 16 ms ≈ one 60 Hz
|
|
vsync). That preserves more of the existing code shape but doesn't
|
|
remove the per-frame ioctl overhead. Polling the dmabuf fd
|
|
directly is the cleaner fix and matches what the dma-buf API
|
|
documents as the recommended primitive for this case.
|
|
|
|
## Validation
|
|
|
|
Built and run on PineTab2 (RK3566 / Mali-G52 / mainline kernel
|
|
6.19.10 / panfrost mesa 26.0.5 / KDE Plasma 6.6.4 Wayland). KWin
|
|
session starts cleanly, multi-window OpenGL workloads (Brave,
|
|
plasma-overview, kate, dolphin) render normally — no regression.
|
|
|
|
### Honest finding from `strace -e trace=ioctl` on `kwin_wayland`
|
|
|
|
In 30 s of 1080p30 H.264 playback under chromium-fourier
|
|
(wp_linux_dmabuf_v1 client, V4L2-stateless decode capable):
|
|
|
|
- **Zero `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`** calls.
|
|
- 29,128 ioctl calls total, dominated by:
|
|
- `DRM_IOCTL_PANFROST_*` (rendering)
|
|
- `DRM_IOCTL_SYNCOBJ_*` (≈ 7,800 — explicit-sync syncobj operations)
|
|
|
|
The same shape holds with stock Brave + 60 s playback (96,120 total
|
|
ioctls, 0 EXPORT_SYNC_FILE).
|
|
|
|
KWin 6.6.4 with a modern client negotiates `wp_linux_drm_syncobj_v1`
|
|
**explicit sync** — `Transaction::watchDmaBuf` is not on the hot path
|
|
in this configuration. The function still exists and runs for clients
|
|
that don't advertise drm-syncobj support (older Wayland clients,
|
|
some video pipelines that import implicit-sync producer dmabufs
|
|
without wrapping them), but on a current Plasma + drm-syncobj-aware
|
|
client stack, the call frequency we initially expected (≈ 60
|
|
ioctls/sec/client at 1080p30) does not materialize.
|
|
|
|
### What this means for the patch
|
|
|
|
The patch is still correct and worth taking, but the *value
|
|
proposition* shifts from "measurable per-frame win on V4L2 video
|
|
playback" to "remove a kernel round-trip on the legacy implicit-sync
|
|
path that still services older Wayland clients and V4L2 producers
|
|
that don't go through drm-syncobj-aware compositor paths".
|
|
|
|
The structural improvement is real wherever `Transaction::watchDmaBuf`
|
|
fires; we just couldn't construct a benchmark on this hardware that
|
|
fires it heavily enough to put a percentage on. Reviewers with
|
|
older client baselines (Plasma 5 era apps, GTK on older Wayland,
|
|
xwayland passing implicit-sync buffers in some configurations) may
|
|
see the call site fire more often.
|
|
|
|
### What is reproducible
|
|
|
|
- No regression: 60-second video playback, no fps loss, no rendering
|
|
artifacts, no compositor stalls.
|
|
- Strace evidence available at
|
|
https://git.reauktion.de/marfrit/marfrit-packages
|
|
under `upstream-submissions/kwin-fourier/measurements/`:
|
|
- `stock-6.6.4-bbb1080p30-60s.txt` — `strace -c -f` summary
|
|
- `stock-cr-fourier-ioctls-30s.log` — full per-ioctl trace under
|
|
chromium-fourier playback
|
|
- Patch correctness: the dmabuf fd's `poll()` semantics are
|
|
documented in `Documentation/driver-api/dma-buf.rst` "Implicit
|
|
Fence Poll Support" and match what `EXPORT_SYNC_FILE`+poll
|
|
observes.
|
|
|
|
## Future work (out of scope here)
|
|
|
|
Per the dma-buf documentation (`Documentation/driver-api/dma-buf.rst`,
|
|
"Implicit Fence Poll Support"), some V4L2 producers do not
|
|
populate `dma_resv` exclusive fences, which makes both this
|
|
direct-poll path and the existing `EXPORT_SYNC_FILE` path
|
|
stub-resolve immediately for those buffers. Filed as a separate
|
|
kernel patch series at linux-media-devel.
|
|
|
|
---
|
|
|
|
# KDE submission notes
|
|
|
|
(Per https://invent.kde.org/plasma/kwin and KDE's typical
|
|
contribution flow.)
|
|
|
|
## Account setup (one-time)
|
|
|
|
1. Create account at https://invent.kde.org
|
|
2. Sign KDE's CLA at https://contributoragreement.kde.org
|
|
3. SSH key configured at https://invent.kde.org/-/profile/keys
|
|
|
|
## Fork + branch
|
|
|
|
```sh
|
|
# On the github web UI: fork plasma/kwin to your namespace
|
|
git clone git@invent.kde.org:<your-namespace>/kwin.git
|
|
cd kwin
|
|
git remote add upstream https://invent.kde.org/plasma/kwin.git
|
|
git fetch upstream
|
|
git checkout -b transaction-poll-dmabuf-fd-directly upstream/master
|
|
git am path/to/kwin-fourier/0002-transaction-poll-dmabuf-fd-directly-upstream-shape.patch
|
|
git push origin HEAD
|
|
```
|
|
|
|
Then create the MR via the web UI (a banner appears on the
|
|
git push output linking to the create-MR page).
|
|
|
|
## Reviewers / labels
|
|
|
|
Recommended reviewers (these tend to handle KWin Wayland MRs):
|
|
- Vlad Zahorodnii (vladz) — KWin Wayland scene/transaction owner
|
|
- Xaver Hugl (zamundaaa) — KWin DRM / present pipeline
|
|
- David Edmundson — KWin generally
|
|
|
|
Labels to suggest:
|
|
- `Component::Wayland`
|
|
- `Performance`
|
|
- After acceptance, request a `Backport::Plasma/6.6` (or whichever
|
|
LTS branch is current) for downstream distros.
|