marfrit-packages/arch/kwin-fourier/0001-transaction-bypass-watchDmaBuf-fence-wait.patch
marfrit 84088141fd
kwin-fourier: bypass watchDmaBuf implicit-sync fence wait (experiment)

Hypothesis under test: KWin's Transaction::watchDmaBuf calls
DMA_BUF_IOCTL_EXPORT_SYNC_FILE on every plane of every imported
dmabuf and parks the transaction on a QSocketNotifier(POLLIN)
waiting for that sync_file. On V4L2 hantro CAPTURE buffers (RK3566
mainline 6.19, panfrost mesa 26.0.5) the resulting fence either
never signals or signals so late that chrome's 6-buffer V4L2
capture pool exhausts at ~6s, hard-stalling the decoder. mpv with
gpu-next turns into a slideshow at a 76% drop rate. A weston A/B run
with the same chrome v4 binary plays through cleanly, so KWin's
watchDmaBuf is the suspect.

This experiment patches watchDmaBuf to no-op. Wayland clients are
required by spec to ensure buffer contents are complete before
wl_surface.attach+commit, so the fence-wait is a defensive
optimization for misbehaving clients, not a correctness primitive.

If chrome plays through end-to-end at the recorded 34.7% combined
CPU number with this patched KWin, the bug is confirmed and the
upstream fix can be refined (timeout, V4L2-source skip, or use the
dmabuf fd directly in the QSocketNotifier instead of an extra
exported sync_file).

KWIN_PIVOT.md (in chromium-fourier/) carries the discovery thread.
2026-04-28 17:11:04 +00:00


From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH] transaction: bypass watchDmaBuf implicit-sync fence wait
Date: 2026-04-28

Background
----------
KWin's `Transaction::watchDmaBuf` (src/wayland/transaction.cpp) calls
`DMA_BUF_IOCTL_EXPORT_SYNC_FILE` on every plane of every imported
dmabuf and parks the transaction on a `QSocketNotifier(POLLIN)`
waiting for the resulting sync_file fd to become readable. The intent
is correct in principle (wait for the producer to finish writing
before sampling), but on V4L2 hantro CAPTURE buffers (RK3566 mainline
6.19, panfrost mesa 26.0.5) the resulting fence either signals so
late that chrome's 6-buffer V4L2 capture pool exhausts, or never
signals at all.

Symptom (per chromium-fourier KWIN_PIVOT.md):

- chrome v4 attaches a video frame to a wl_subsurface, commits
- KWin's Transaction::commit calls watchDmaBuf, exports a sync_file,
  parks on QSocketNotifier
- sync_file never becomes readable
- Transaction never applies; old surface state never replaced
- wl_buffer.release for the previous video buffer never sent
- chrome's V4L2 capture pool starves at ~6 seconds, decoder blocks,
  audio drains, hard stall
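
The starvation sequence in the list above can be modelled in a few lines. This is a toy sketch, not KWin or chrome code: `CapturePool` and `framesUntilStall` are hypothetical names, and the model assumes one buffer is consumed per committed frame and that a buffer is only returned to the pool when the next transaction applies.

```cpp
#include <deque>
#include <optional>

// Hypothetical model of a fixed-size V4L2 capture pool (illustrative only).
struct CapturePool {
    std::deque<int> freeBuffers; // indices the decoder may still write into

    explicit CapturePool(int size)
    {
        for (int i = 0; i < size; ++i) {
            freeBuffers.push_back(i);
        }
    }

    // Returns a free buffer index, or nullopt when the pool is exhausted
    // (the point at which a real decoder would block).
    std::optional<int> acquire()
    {
        if (freeBuffers.empty()) {
            return std::nullopt;
        }
        const int index = freeBuffers.front();
        freeBuffers.pop_front();
        return index;
    }

    void release(int index) { freeBuffers.push_back(index); }
};

// Counts how many frames can be committed before the pool starves.
// fenceSignals == false models a transaction that stays parked forever,
// so the previous buffer is never released back to the client.
int framesUntilStall(int poolSize, bool fenceSignals, int maxFrames)
{
    CapturePool pool(poolSize);
    std::optional<int> onScreen; // buffer held by the last applied transaction

    for (int frame = 0; frame < maxFrames; ++frame) {
        const std::optional<int> buffer = pool.acquire();
        if (!buffer) {
            return frame; // pool exhausted: decoder stalls here
        }
        if (fenceSignals) {
            // Transaction applies: the previously shown buffer is released.
            if (onScreen) {
                pool.release(*onScreen);
            }
            onScreen = buffer;
        }
        // Otherwise the buffer stays referenced by the parked transaction.
    }
    return maxFrames;
}
```

With a 6-buffer pool and a fence that never signals, the model stalls after exactly six committed frames, while a signalling fence keeps the pool cycling indefinitely.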

mpv with `--vo=gpu-next` on the same KWin session turns into a
slideshow at a 76% drop rate but does not deadlock: its
single-surface attach pattern hits a different transaction shape than
chrome's subsurface flow.

A clean weston A/B run with the same chrome v4 binary plays through
end-to-end, so the bug is specific to KWin's transaction fence-wait
path on this stack, not to Wayland as a protocol.
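
For readers unfamiliar with the export step described in this section, the following is a minimal sketch of exporting a read fence from a dmabuf fd. It is not KWin's `exportWaitSyncFile`: the struct and constant names are hypothetical local mirrors of the kernel uapi (`struct dma_buf_export_sync_file`, `DMA_BUF_SYNC_READ`, `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` from `linux/dma-buf.h`), defined here on the assumption that the build host's headers may predate the ioctl.

```cpp
#include <sys/ioctl.h>
#include <cstdint>

// Local mirror of the kernel's struct dma_buf_export_sync_file
// (hypothetical name; layout assumed to match linux/dma-buf.h).
struct ExportSyncFileArgs {
    std::uint32_t flags; // in: which access the fence should guard
    std::int32_t fd;     // out: the exported sync_file fd
};

// Mirrors DMA_BUF_SYNC_READ: the caller wants to read, so the exported
// fence covers pending writes to the buffer.
constexpr std::uint32_t kDmaBufSyncRead = 1u << 0;

// Mirrors DMA_BUF_IOCTL_EXPORT_SYNC_FILE: _IOWR(DMA_BUF_BASE 'b', 2, ...).
static const unsigned long kExportSyncFile = _IOWR('b', 2, ExportSyncFileArgs);

// Returns a sync_file fd that polls readable once pending writes to the
// dmabuf have completed, or -1 if the export fails (not a dmabuf, old
// kernel, bad fd). The caller owns the returned fd and must close it.
int exportReadFence(int dmabufFd)
{
    ExportSyncFileArgs args{};
    args.flags = kDmaBufSyncRead;
    args.fd = -1;
    if (ioctl(dmabufFd, kExportSyncFile, &args) != 0) {
        return -1;
    }
    return args.fd;
}
```

The returned fd is what ends up parked on the `QSocketNotifier(POLLIN)`; if the fence behind it never signals, nothing ever wakes the transaction.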

Fix
---
This experimental patch no-ops `watchDmaBuf` to test the hypothesis.
Implicit-sync correctness is not lost in this case: the V4L2
producer guarantees the buffer's contents are complete before
chrome sends `wl_surface.attach` + `wl_surface.commit`, and the
zwp_linux_dmabuf_v1 client is required to do so by spec. The
fence-wait is a defensive optimization for misbehaving clients, not
a correctness primitive.

If chrome plays through end-to-end at the recorded 34.7% combined
CPU number under KWin with this patch, the bug is confirmed and the
upstream fix can be refined (timeout, V4L2-source skip, or use the
dmabuf fd directly in the QSocketNotifier instead of an extra
exported sync_file).
diff --git a/src/wayland/transaction.cpp b/src/wayland/transaction.cpp
index 967b22b..e3fbc06 100644
--- a/src/wayland/transaction.cpp
+++ b/src/wayland/transaction.cpp
@@ -263,27 +263,18 @@ static FileDescriptor exportWaitSyncFile(const FileDescriptor &fileDescriptor)
     return FileDescriptor{};
 }
 #endif
 
 void Transaction::watchDmaBuf(TransactionEntry *entry)
 {
-#if defined(Q_OS_LINUX)
-    const DmaBufAttributes *attributes = entry->buffer->dmabufAttributes();
-    if (!attributes) {
-        return;
-    }
-
-    for (int i = 0; i < attributes->planeCount; ++i) {
-        const FileDescriptor &fileDescriptor = attributes->fd[i];
-        if (fileDescriptor.isReadable()) {
-            continue;
-        }
-
-        auto syncFile = exportWaitSyncFile(fileDescriptor);
-        if (syncFile.isValid()) {
-            entry->fences.emplace_back(std::make_unique<TransactionFence>(this, std::move(syncFile)));
-        }
-    }
-#endif
+    // kwin-fourier: no-op the implicit-sync fence wait. On V4L2
+    // hantro CAPTURE buffers (RK3566 mainline 6.19, panfrost mesa
+    // 26.0.5) the DMA_BUF_IOCTL_EXPORT_SYNC_FILE fence either never
+    // signals or signals so late that chrome's V4L2 capture pool
+    // exhausts at ~6s, hard-stalling the decoder. Wayland clients
+    // are required by spec to ensure the buffer's contents are
+    // complete before wl_surface.attach+commit, so this fence-wait
+    // is a belt-and-braces optimization, not a correctness primitive.
+    Q_UNUSED(entry);
 }
 
 } // namespace KWin