forked from marfrit/marfrit-packages
upstream-submissions/kwin-fourier: validation findings + revised MR body
Captured stock kwin 6.6.4 ioctl baselines on ohm (PineTab2 / RK3566 /
mainline 6.19.10 / Plasma 6.6.4 Wayland):

- Brave + bbb 1080p30 (sw decode), 60 s: 96,120 ioctls, 0 EXPORT_SYNC_FILE
- chromium-fourier + bbb 1080p30, 30 s: 29,128 ioctls, 0 EXPORT_SYNC_FILE,
  ~7,800 SYNCOBJ_* (explicit sync)

Finding: KWin 6.6.4 + drm-syncobj-aware clients negotiate
wp_linux_drm_syncobj_v1 explicit sync. This leaves Transaction::watchDmaBuf
on the legacy implicit-sync path, which fires only for clients that don't
advertise drm-syncobj support.

Updated kde-mr-body.md to reflect the honest scope. The patch is a
structural cleanup of the legacy path, which is still relevant for older
clients and for V4L2 pipelines that don't wrap dmabufs in syncobjs. It is
not the originally-hypothesized per-frame ioctl reduction.

Patch is correct by construction; no regression on workloads that don't
exercise the targeted path.

Evidence files saved under measurements/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```diff
@@ -71,17 +71,62 @@ documents as the recommended primitive for this case.
 
 ## Validation
 
-Tested on PineTab2 (RK3566 / Mali-G52 / mainline kernel 6.19.10 /
-panfrost mesa 26.0.5 / KDE Plasma 6.6.4 Wayland) playing 1080p30
-H.264 in chromium under chromium-fourier. Frame rate and CPU
-profile equivalent to the previous code path; the savings are in
-compositor-loop microseconds rather than user-visible fps.
-`strace -c` on `kwin_wayland` during a 60-second playback shows
-**X% fewer `ioctl` calls** with this patch versus stock.
+Built and run on PineTab2 (RK3566 / Mali-G52 / mainline kernel
+6.19.10 / panfrost mesa 26.0.5 / KDE Plasma 6.6.4 Wayland). KWin
+session starts cleanly, multi-window OpenGL workloads (Brave,
+plasma-overview, kate, dolphin) render normally — no regression.
 
-(Will fill in the actual `strace -c` numbers once the kwin-fourier
-package built with this patch is validated end-to-end on ohm —
-work in progress.)
+### Honest finding from `strace -e trace=ioctl` on `kwin_wayland`
+
+In 30 s of 1080p30 H.264 playback under chromium-fourier
+(wp_linux_dmabuf_v1 client, V4L2-stateless decode capable):
+
+- **Zero `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`** calls.
+- 29,128 ioctl calls total, dominated by:
+  - `DRM_IOCTL_PANFROST_*` (rendering)
+  - `DRM_IOCTL_SYNCOBJ_*` (≈ 7,800 — explicit-sync syncobj operations)
+
+The same shape holds with stock Brave + 60 s playback (96,120 total
+ioctls, 0 EXPORT_SYNC_FILE).
+
+KWin 6.6.4 with a modern client negotiates `wp_linux_drm_syncobj_v1`
+**explicit sync** — `Transaction::watchDmaBuf` is not on the hot path
+in this configuration. The function still exists and runs for clients
+that don't advertise drm-syncobj support (older Wayland clients,
+some video pipelines that import implicit-sync producer dmabufs
+without wrapping them), but on a current Plasma + drm-syncobj-aware
+client stack, the call frequency we initially expected (≈ 60
+ioctls/sec/client at 1080p30) does not materialize.
+
+### What this means for the patch
+
+The patch is still correct and worth taking, but the *value
+proposition* shifts from "measurable per-frame win on V4L2 video
+playback" to "remove a kernel round-trip on the legacy implicit-sync
+path that still services older Wayland clients and V4L2 producers
+that don't go through drm-syncobj-aware compositor paths".
+
+The structural improvement is real wherever `Transaction::watchDmaBuf`
+fires; we just couldn't construct a benchmark on this hardware that
+fires it heavily enough to put a percentage on. Reviewers with
+older client baselines (Plasma 5 era apps, GTK on older Wayland,
+xwayland passing implicit-sync buffers in some configurations) may
+see the call site fire more often.
+
+### What is reproducible
+
+- No regression: 60-second video playback, no fps loss, no rendering
+  artifacts, no compositor stalls.
+- Strace evidence available at
+  https://git.reauktion.de/marfrit/marfrit-packages
+  under `upstream-submissions/kwin-fourier/measurements/`:
+  - `stock-6.6.4-bbb1080p30-60s.txt` — `strace -c -f` summary
+  - `stock-cr-fourier-ioctls-30s.log` — full per-ioctl trace under
+    chromium-fourier playback
+- Patch correctness: the dmabuf fd's `poll()` semantics are
+  documented in `Documentation/driver-api/dma-buf.rst` "Implicit
+  Fence Poll Support" and match what `EXPORT_SYNC_FILE`+poll
+  observes.
 
 ## Future work (out of scope here)
 
```
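The MR body's correctness argument is about two operations. The first is exporting a sync_file with `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` and polling it. The second is calling `poll()` on the dma-buf fd itself. The claim is that both observe the same fences, so the export ioctl can be dropped. The `<linux/dma-buf.h>` documentation for `struct dma_buf_export_sync_file` describes waiting on a sync_file exported with `DMA_BUF_SYNC_READ` as equivalent to `poll()` with `POLLIN`.

Below is a minimal Python sketch of the two wait shapes for a consumer that only reads the buffer. It assumes, as the MR body implies, that the patch replaces the first shape with the second. It is not KWin's `Transaction::watchDmaBuf`:

- `wait_via_sync_file`, `wait_via_dmabuf_poll` and `_poll_in` are hypothetical names.
- The request number is computed with the generic Linux `_IOC` layout used on x86-64 and arm64.

```python
import fcntl
import os
import select
import struct

# Generic Linux _IOC layout (x86-64, arm64): dir << 30 | size << 16 | type << 8 | nr.
# Assumption: some architectures (e.g. powerpc, mips) encode ioctls differently.
_IOC_WRITE = 1
_IOC_READ = 2


def _iowr(type_char: str, nr: int, size: int) -> int:
    """Equivalent of the C _IOWR() macro under the generic layout."""
    return ((_IOC_READ | _IOC_WRITE) << 30) | (size << 16) | (ord(type_char) << 8) | nr


# Layout of struct dma_buf_export_sync_file { __u32 flags; __s32 fd; }.
_EXPORT_SYNC_FILE_FMT = "Ii"
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_IOCTL_EXPORT_SYNC_FILE = _iowr("b", 2, struct.calcsize(_EXPORT_SYNC_FILE_FMT))


def _poll_in(fd: int, timeout_ms: int) -> bool:
    """True if fd reports readable (or error/hangup) within timeout_ms."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return bool(poller.poll(timeout_ms))


def wait_via_sync_file(dmabuf_fd: int, timeout_ms: int) -> bool:
    """Legacy shape: one extra ioctl to export a sync_file, then poll that."""
    buf = bytearray(struct.pack(_EXPORT_SYNC_FILE_FMT, DMA_BUF_SYNC_READ, -1))
    fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, buf)  # fills buf in place
    _, sync_fd = struct.unpack(_EXPORT_SYNC_FILE_FMT, buf)
    try:
        return _poll_in(sync_fd, timeout_ms)
    finally:
        os.close(sync_fd)


def wait_via_dmabuf_poll(dmabuf_fd: int, timeout_ms: int) -> bool:
    """Patched shape: poll the dma-buf fd directly; POLLIN tracks its write fences."""
    return _poll_in(dmabuf_fd, timeout_ms)
```

Both shapes end in the same `poll()`. The difference is the extra `ioctl` and `close` per wait on the legacy path. In an event-loop design the fd would normally be registered with the loop rather than polled with a timeout.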
```diff
@@ -0,0 +1,29 @@
+strace: Process 3891 attached with 6 threads
+strace: Process 3891 detached
+strace: Process 3931 detached
+strace: Process 3941 detached
+strace: Process 3932 detached
+strace: Process 3930 detached
+strace: Process 3928 detached
+% time     seconds  usecs/call     calls    errors syscall
+------ ----------- ----------- --------- --------- ----------------
+ 44.28    8.740874         801     10908           ppoll
+ 35.51    7.011272        1221      5740       459 futex
+  8.65    1.708574          17     96120           ioctl
+  3.41    0.673061         427      1576           clock_nanosleep
+  2.57    0.506512          48     10361           mmap
+  1.94    0.383496          37     10358           munmap
+  1.08    0.212343          16     13241           close
+  0.73    0.145039          42      3426           sendmsg
+  0.59    0.116589           7     16232           getpid
+  0.31    0.061258          18      3337       173 read
+  0.28    0.055864          19      2888      1444 recvmsg
+  0.23    0.045432          10      4428           fcntl
+  0.19    0.037662          24      1514           write
+  0.15    0.029347          16      1749           epoll_pwait
+  0.06    0.011514           8      1432           dup
+  0.01    0.002820        2820         1           restart_syscall
+  0.00    0.000272          34         8           timerfd_settime
+  0.00    0.000054          27         2           sysinfo
+------ ----------- ----------- --------- --------- ----------------
+100.00   19.741983         107    183321      2076 total
```
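The committed `strace -c -f` summary reduces to two numbers the MR body reasons about: the ioctl share of all syscalls and the ioctl rate. The sketch below assumes the standard `strace -c` column layout (% time, seconds, usecs/call, calls, errors, syscall). It also assumes the errors column is left blank for syscalls that never failed. `parse_strace_summary` and `call_share_and_rate` are hypothetical helpers.

```python
from typing import Dict, Tuple


def parse_strace_summary(text: str) -> Dict[str, int]:
    """Map syscall name -> call count from a `strace -c` summary table."""
    calls: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a percentage such as "44.28"; the header,
        # the dashed separators and "strace: Process ..." lines do not.
        if len(fields) < 5 or not fields[0].replace(".", "", 1).isdigit():
            continue
        name = fields[-1]
        if name == "total":  # footer row repeats the column sums
            continue
        # Columns: % time, seconds, usecs/call, calls, [errors], syscall.
        # A blank errors column only shortens the row at the end, so
        # index 3 is always the call count.
        calls[name] = int(fields[3])
    return calls


def call_share_and_rate(calls: Dict[str, int], syscall: str,
                        duration_s: float) -> Tuple[float, float]:
    """Return (fraction of all calls, calls per second) for one syscall."""
    n = calls[syscall]
    return n / sum(calls.values()), n / duration_s
```

This summary matches the 60-second Brave run's 96,120 ioctls, so feed it in with `duration_s=60`. Then `call_share_and_rate(calls, "ioctl", 60)` returns roughly 0.524 and 1602.0. That is about half of `kwin_wayland`'s syscalls, at around 1,600 ioctls per second across its threads.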
File diff suppressed because it is too large
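The MR body's per-ioctl figures come from the full `strace -e trace=ioctl` log `stock-cr-fourier-ioctls-30s.log`. They are zero `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` and ≈ 7,800 `DRM_IOCTL_SYNCOBJ_*` in 30 s. How that log was tallied is not stated. Below is one minimal way to do it, assuming strace's usual decoded call lines, optionally with `[pid N]` prefixes from `-f` and `<path>` fd annotations from `-y`. Undecoded requests are bucketed under their raw token. `tally_ioctls` and `count_prefix` are hypothetical helpers.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the start of an ioctl call line and captures the request token.
# Assumed line shapes, e.g.:
#   [pid  3891] ioctl(14, DRM_IOCTL_SYNCOBJ_WAIT, 0x7ffc0000) = 0
#   ioctl(14</dev/dri/renderD128>, DRM_IOCTL_PANFROST_SUBMIT, 0x1) = 0
# "<... ioctl resumed>" continuation lines do not match, so each call
# split by <unfinished ...> is counted once.
_IOCTL_CALL = re.compile(r"\bioctl\(\d+(?:<[^>]*>)?,\s*([A-Za-z0-9_]+)")


def tally_ioctls(lines: Iterable[str]) -> Counter:
    """Count ioctl calls per request name in an strace log."""
    counts: Counter = Counter()
    for line in lines:
        match = _IOCTL_CALL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_prefix(counts: Counter, prefix: str) -> int:
    """Sum the counts of every request whose name starts with prefix."""
    return sum(n for name, n in counts.items() if name.startswith(prefix))
```

Applied to the 30-second log, `count_prefix(counts, "DRM_IOCTL_SYNCOBJ_") / 30` turns the reported ≈ 7,800 into roughly 260 syncobj ioctls per second. `counts["DMA_BUF_IOCTL_EXPORT_SYNC_FILE"]` is 0 whenever the request never appears.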