The drm_info-during-playback probe under Wayland confirmed KWin IS GL-compositing per frame: Plane 39 rotates triple-buffered ABGR8888 framebuffers (FB IDs 60/61/66) during playback. The earlier "0% kwin CPU = direct-scanout" reading in a1_summary.md was a CPU-blind-spot artifact: Panfrost shader work isn't visible to top or perf-on-userspace. Corrigendum added to a1_summary.md preserving the original text as the on-the-day record. Worklist: A1 entry updated to point at both summaries. The probe itself was invasive (drm_info every 3s perturbed KWin's atomic-commit fast-path, kwin %CPU jumped 0->18 median and 6 drops appeared): usable diagnostically but cannot be embedded in measurement reps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drm_info-during-playback probe — verdict
Date: 2026-05-03 ~17:00 CEST.
Goal: confirm or refute the "KWin direct-scanout" hypothesis
that A1's 0%-kwin-CPU result suggested. Capture
drm_info snapshots every 3 s during a 70 s
chromium-fourier-kwin playback rep, inspect Plane 39's framebuffer
state.
Outcome: direct-scanout hypothesis refuted. KWin IS
GL-compositing every frame and rotating triple-buffered ABGR8888
framebuffers on Plane 39. The CPU-side measurement (top -p kwin_wayland,
perf record) was blind to the work because it's all GPU-side
(Panfrost shader). The 0%-kwin-CPU result in the A1 reps was
misleading.
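The capture schedule described in the goal (one drm_info snapshot every 3 s across a 70 s rep) can be sketched as below. This is a minimal reconstruction, not the script that was actually used: the function names are hypothetical, the zero-padded file names are inferred from the snapshot names in this note, and it assumes a `drm_info` binary on PATH that prints its report to stdout when run without arguments.

```python
import subprocess
import time
from pathlib import Path


def snapshot_names(duration_s, interval_s):
    """Names for the in-playback snapshots: 01.txt, 02.txt, ..."""
    count = int(duration_s // interval_s)
    return [f"{i:02d}.txt" for i in range(1, count + 1)]


def capture_drm_info():
    """Run drm_info with no arguments and return its stdout (assumed CLI usage)."""
    result = subprocess.run(
        ["drm_info"], capture_output=True, text=True, check=True
    )
    return result.stdout


def run_probe(out_dir, duration_s=70, interval_s=3,
              capture=capture_drm_info, sleep=time.sleep):
    """Write one snapshot per interval and return the paths written.

    `capture` and `sleep` are injectable so the loop can be exercised
    without a DRM device and without waiting in real time.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name in snapshot_names(duration_s, interval_s):
        sleep(interval_s)
        path = out / name
        path.write_text(capture())
        written.append(path)
    return written
```

With the defaults, `snapshot_names(70, 3)` yields 23 names, which matches the 01.txt to 23.txt range reported in this note.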
Plane 39 framebuffer state — decisive
| Snapshot | Plane 39 FB ID | Format | Plane 45 |
|---|---|---|---|
| 00_pre.txt (no chrome) | 60 | ABGR8888 | unused (FB 0) |
| 01.txt–23.txt (during 70 s playback) | rotates 60 / 61 / 66 | ABGR8888 | unused (FB 0) |
| 99_post_capture.txt | 66 | ABGR8888 | unused (FB 0) |
Three framebuffers rotating during playback is classic triple-buffering: KWin renders the next frame into the not-currently-scanned-out buffer while DRM scans out from the previous one. Three IDs spread across 23 snapshots taken at arbitrary phase, all ABGR8888 — KWin is doing per-frame GL composite into rotating RGB framebuffers.
No NV12 ever lands on Plane 39. No buffer's FOURCC ever flips from ABGR8888. Direct-scanout of chrome's hardware-decoded NV12 buffer is not engaged.
Plane 45 (Overlay) stays unused throughout — FB ID = 0, CRTC = 0. The hypothesised "video on Plane 39 + desktop on Plane 45" arrangement is structurally available on this hardware but not exercised by KWin in this configuration.
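The decision rule used in this section (rotating RGB framebuffers mean composite, an NV12 framebuffer would mean direct scanout, FB 0 means unused) can be written down as a small classifier. This is a sketch under assumptions: it takes already-extracted `(snapshot, fb_id, fourcc)` tuples rather than parsing drm_info output, the function name is hypothetical, and the per-snapshot FB order in the usage example is made up for illustration. Only the set of IDs and the format come from the table.

```python
from collections import Counter

# Formats that would indicate a video buffer being scanned out directly.
# Assumption: the decoder exports NV12, as stated in the text.
VIDEO_FORMATS = {"NV12"}


def classify_plane(observations):
    """Classify one plane from (snapshot_name, fb_id, fourcc) tuples."""
    fb_ids = Counter()
    formats = set()
    for _name, fb_id, fourcc in observations:
        if fb_id != 0:  # FB 0 means the plane is unused in that snapshot
            fb_ids[fb_id] += 1
            formats.add(fourcc)
    if not fb_ids:
        verdict = "unused"
    elif formats & VIDEO_FORMATS:
        verdict = "direct-scanout candidate"
    else:
        verdict = "composited"
    return {
        "fb_ids": sorted(fb_ids),
        "formats": sorted(formats),
        "verdict": verdict,
    }


# Illustrative input: the rotation ORDER below is invented; the table only
# records that IDs 60, 61 and 66 appear, all as ABGR8888.
plane39 = [("00_pre.txt", 60, "ABGR8888")]
plane39 += [
    (f"{i:02d}.txt", (60, 61, 66)[i % 3], "ABGR8888") for i in range(1, 24)
]
plane39.append(("99_post_capture.txt", 66, "ABGR8888"))

plane45 = [(name, 0, "") for name, _fb, _fmt in plane39]

print(classify_plane(plane39)["verdict"])
print(classify_plane(plane45)["verdict"])
```

A single NV12 observation on Plane 39 would flip the verdict, which is the signal this probe was looking for and did not find.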
Why A1's top -p kwin_wayland and perf record showed 0 %
KWin's per-frame work happens on the GPU via Mesa Panfrost. Per-frame
GL composite invocations dispatch to Panfrost's kernel driver and
the Mali-G52 hardware. The kwin_wayland userspace process spends
microseconds queuing GL commands and the rest of its time blocked
in poll/futex waits. top -p kwin_wayland -d 1 reports %CPU averaged
over 1 s intervals; microseconds-per-frame bursts at 24 fps add up to
a small fraction of a percent, which is indistinguishable from idle
when the rest of the interval is spent waiting on GPU fences.
perf record on the userspace PID likewise misses kernel-side
panfrost work.
What we DID see in perf reports:
- DBus message handling, libz, Qt event dispatcher, kernel memcpy.
What we DIDN'T see (and now know is happening):
- The actual GL composite per frame, dispatched as Mesa GL calls to /dev/dri/renderD128 (panfrost). 24 fps × KWin's composite-shader = real GPU work, just not CPU-visible.
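The arithmetic behind the blind spot is simple enough to check. The sketch below converts a per-frame userspace cost into the %CPU figure top would have to display. The per-frame costs are hypothetical placeholders (the note only says "microseconds"), and `cpu_percent` is an illustrative helper, not part of any tool used here.

```python
def cpu_percent(per_frame_cpu_us, fps):
    """Share of one CPU core, in percent, for a fixed per-frame CPU cost."""
    busy_us_per_second = per_frame_cpu_us * fps
    return busy_us_per_second / 1_000_000 * 100


# Hypothetical per-frame costs; none of these were measured.
for cost_us in (10, 50, 200):
    share = cpu_percent(cost_us, 24)
    print(f"{cost_us:>4} us/frame at 24 fps -> {share:.3f} % CPU")
```

Even at 200 µs of queuing work per frame the process stays below half a percent of one core, so a reading of 0 % says nothing about how much work the GPU is doing on its behalf.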
Heisenberg moment: drm_info itself perturbed the result
This rep showed:
| Metric | drmprobe rep | A1 rep 1 (no probe) |
|---|---|---|
| drops_total | 11 | 0 |
| drops_post_warmup | 6 | 0 |
| frames_total | 1662 | 1685 |
| kwin %CPU median | 18.00 | 0.00 |
| kwin %CPU max | 31.9 | 4.0 |
The act of running drm_info every 3 s (opening
/dev/dri/card0 and querying every plane property)
forced KWin into a slower path. Median kwin %CPU jumped from 0
to 18 %, and 6 frames were dropped post-warmup over the
70 s window.
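To put the two reps on a common scale, the drop counts from the table can be expressed as a percentage of frames. The sketch below uses only the numbers in the table; the dictionary layout and the `drop_rate_percent` helper are illustrative, not part of the campaign tooling.

```python
# Values copied from the comparison table above.
REPS = {
    "drmprobe": {"drops_total": 11, "drops_post_warmup": 6, "frames_total": 1662},
    "a1_rep1": {"drops_total": 0, "drops_post_warmup": 0, "frames_total": 1685},
}


def drop_rate_percent(rep, key="drops_total"):
    """Dropped frames as a percentage of all frames in the rep."""
    return rep[key] / rep["frames_total"] * 100


for name, rep in REPS.items():
    total = drop_rate_percent(rep)
    post = drop_rate_percent(rep, "drops_post_warmup")
    print(f"{name}: {total:.2f} % total, {post:.2f} % post-warmup")
```

The probed rep lands well under 1 % drops, so the perturbation is small in absolute terms but clearly distinguishable from the zero-drop baseline.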
Why: opening /dev/dri/card0 from a third-party client likely either (a) invalidated KWin's atomic-commit fast-path, (b) triggered KWin's heuristic to back off because another DRM master might be present, or (c) the kernel forced a flush of in-flight commits before answering the property query. Whichever the mechanism, the probe is invasive — it changes the very behavior it's measuring.
The 18 % kwin %CPU result is informative even though it's
"perturbed" — it shows that when KWin can't engage its
fastest path, its userspace cost becomes ~18 % for 24 fps
H.264 composite. That's roughly half the predecessor's
~36 % kwin %CPU, suggesting the predecessor's reps were
running an even slower path (perhaps without the
kwin-fourier patches' watchDmaBuf no-op optimization, or
with thermal contention).
What this means for the campaign
The campaign's mechanism is intact and the matrix has real work to do.
Under Plasma Wayland, KWin is doing the per-frame GL composite of chrome's RGB surface (which itself includes the chrome-side GL composite of the NV12 video texture). That's the cost the campaign's "without-KWin" cells were designed to avoid. The cost is real — it's just GPU-side, not CPU-side, so we need GPU-aware metrics to see it.
The matrix's effective_fps and drops_post_warmup
metrics are still valid — they measure user-visible
playback quality regardless of where the cost is. The fact
that today's GPU has enough headroom for both
chrome's-internal-composite + KWin's-additional-composite
at 24 fps means today both sessions could deliver clean
playback. Under stress, only the Wayland session carries the
extra composite pass.
Better metrics to add to Phase 1 binding cells:
- Panfrost GPU utilization via /sys/class/devfreq/fde60000.gpu/load_busy_percent or panfrost_pmu perf events: measures the GPU work top can't see.
- Frame latency (commit → present delta) via wp_presentation_feedback on Wayland and XPresent notify events on X11: directly measures the per-frame composite delay.
- Power consumption / battery drain during sustained playback: quantifies the cumulative GPU-cycle cost.
Without these, top + perf alone will miss the campaign's mechanism every time.
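A GPU-load sampler for the first metric could look roughly like this. It is a sketch under loud assumptions: the sysfs path is copied as written in this note and may not exist under that exact name on a given kernel, the file is assumed to start with an integer percentage (some devfreq drivers append a frequency after an `@`), and all function names are hypothetical.

```python
import re
import statistics
import time

# Path as written in this note; treat its existence and format as assumptions.
GPU_LOAD_PATH = "/sys/class/devfreq/fde60000.gpu/load_busy_percent"


def read_leading_int(path):
    """Return the integer at the start of a small sysfs-style text file."""
    with open(path) as f:
        text = f.read()
    match = re.match(r"\s*(\d+)", text)
    if match is None:
        raise ValueError(f"no leading integer in {path!r}: {text!r}")
    return int(match.group(1))


def sample_gpu_load(path=GPU_LOAD_PATH, samples=70, interval_s=1.0,
                    sleep=time.sleep):
    """Poll the load file and summarise it like the kwin %CPU columns."""
    values = []
    for _ in range(samples):
        values.append(read_leading_int(path))
        sleep(interval_s)
    return {
        "median": statistics.median(values),
        "max": max(values),
        "n": len(values),
    }
```

Reporting median and max mirrors the existing kwin %CPU rows, so a GPU column could sit next to them in the per-rep table. Whether polling this file perturbs playback the way drm_info did would need its own check.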
Why the result restores (not weakens) the campaign
The A1 verdict in a1_summary.md interpreted 0 % kwin CPU as
"KWin doing nothing per frame, direct-scanout engaged,
campaign hypothesis structurally weakened." That interpretation
was wrong because of the GPU-vs-CPU blind spot.
The correct reading: KWin is doing exactly the GL composite the campaign's premise named. CPU measurement isn't sufficient to see it. The matrix needs GPU-side metrics. Once added, the X11 cells should be able to demonstrate a real (if subtle) delta against the Wayland baseline.
The original framing "stutterless playback - possible with X11? proven impossible with Wayland" remains intact: Wayland's per-frame composite is real, and under enough load it WILL miss frames (predecessor's data is consistent with this; today's clean Wayland just means GPU has headroom for this specific 24 fps workload). X11 + non-compositing WM removes the second composite step entirely.
What should change in a1_summary.md
The "Major reframing finding" section overstates the result. The data is consistent with KWin doing per-frame GL composite that's invisible to CPU instrumentation. The "structurally weakened campaign hypothesis" claim should be retracted.
I'll add a corrigendum at the top of a1_summary.md linking
to this file and noting the corrected interpretation. Not
rewriting the original text — keeping it as the original
record of what I concluded with the wrong instrumentation.
Phase 1 implications
Updated:
- Binding cells need GPU-aware metrics. CPU-only instrumentation will miss the campaign's mechanism in any condition that isn't already at the GPU contention limit.
- The drm_info probe is too invasive to embed in measurement reps. Use it diagnostically before/after reps, not during. Direct-scanout decisions can be inferred from the framebuffer-rotation pattern at idle vs at boot vs per-frame; a per-rep probe isn't needed.
- The matrix value isn't lost. The X11 cells will save one GPU pass per frame regardless. Phase 1 should pick a workload that pushes the GPU closer to its limit so the per-frame saving manifests as drops difference. 1080p60 H.264 (vs the current 24 fps) is the obvious bump.
- mpv --vo=xv mechanism test still wanted. Now framed as "does the X server schedule NV12 directly to Plane 39 for an Xv client, bypassing the GL composite that both browsers and Wayland need?" Answer requires a native X11 session.
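The case for the 60 fps workload in the list above comes down to frame budget and pass count. The sketch below works through that arithmetic. It assumes the pass counts implied by this note (two composite passes per frame under Wayland, one under X11 with a non-compositing WM); the helper names are illustrative and no GPU timings are implied.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def composite_passes_per_second(fps, passes_per_frame):
    """GPU composite passes issued per second of playback."""
    return fps * passes_per_frame


# Pass counts are assumptions taken from the text:
# Wayland = browser composite + KWin composite; X11 = browser composite only.
WAYLAND_PASSES = 2
X11_PASSES = 1

for fps in (24, 60):
    saved = (composite_passes_per_second(fps, WAYLAND_PASSES)
             - composite_passes_per_second(fps, X11_PASSES))
    print(f"{fps} fps: budget {frame_budget_ms(fps):.2f} ms/frame, "
          f"X11 saves {saved} passes/s")
```

Moving from 24 to 60 fps shrinks the per-frame budget from about 41.7 ms to about 16.7 ms while the number of saved passes per second rises from 24 to 60, which is why the delta is more likely to show up as a difference in drops.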