# drm_info-during-playback probe: verdict

**Date:** 2026-05-03 ~17:00 CEST.

**Goal:** confirm or refute the "KWin direct-scanout" hypothesis that A1's 0 %-kwin-CPU result suggested. Capture `drm_info` snapshots every 3 s during a 70 s chromium-fourier-kwin playback rep and inspect Plane 39's framebuffer state.

**Outcome:** **direct-scanout hypothesis refuted.** KWin IS GL-compositing every frame and rotating triple-buffered ABGR8888 framebuffers on Plane 39. The CPU-side measurements (`top -p kwin_wayland`, `perf record`) were blind to this work because it all happens on the GPU (Panfrost shaders). The 0 %-kwin-CPU result in the A1 reps was misleading.

## Plane 39 framebuffer state: decisive

| Snapshot | Plane 39 FB ID | Plane 39 format | Plane 45 |
|---|---|---|---|
| `00_pre.txt` (no chrome) | 60 | ABGR8888 | unused (FB 0) |
| `01.txt`–`23.txt` (during 70 s playback) | **rotates 60 / 61 / 66** | ABGR8888 | unused (FB 0) |
| `99_post_capture.txt` | 66 | ABGR8888 | unused (FB 0) |

Three framebuffers rotating during playback is **classic triple-buffering**: one buffer is being scanned out, one is queued for the next flip, and KWin renders the next frame into the third. All three IDs show up across 23 snapshots taken at arbitrary phase, and all are ABGR8888, so KWin is doing a per-frame GL composite into rotating RGB framebuffers.

**No NV12 ever lands on Plane 39.** No buffer's FOURCC ever changes from ABGR8888, so direct scanout of chrome's hardware-decoded NV12 buffer is not engaged.

Plane 45 (Overlay) stays unused throughout: FB ID = 0, CRTC = 0. The hypothesised "video on Plane 39 + desktop on Plane 45" arrangement is structurally available on this hardware but not exercised by KWin in this configuration.

## Why A1's `top -p kwin_wayland` and `perf record` showed 0 %

KWin's per-frame work happens on the GPU via Mesa Panfrost. Per-frame GL composite invocations are dispatched through Panfrost's kernel driver to the Mali-G52 hardware.
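One way to make that GPU-side work visible per process is the kernel's DRM client usage stats. These are not used in any rep above. Each open DRM file descriptor's `/proc/<pid>/fdinfo/<fd>` can carry `drm-client-id` and `drm-engine-<name>: <N> ns` lines with cumulative engine busy time.

What follows is a minimal sketch, and it rests on several assumptions to check on the board first:
- the running kernel's panfrost driver actually exposes these keys for kwin_wayland's render-node fd;
- which engine names it uses;
- whether a profiling switch must be enabled first.

```python
import os
import time


def parse_drm_fdinfo(text):
    """Return (client_id, {engine: busy_ns}) from one fdinfo file's text.

    Assumption: the DRM client usage stats format, e.g.
    "drm-client-id:\t7" and "drm-engine-fragment:\t123456 ns".
    Non-DRM fds yield (None, {}).
    """
    client_id = None
    engines = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if key == "drm-client-id":
            client_id = value
        elif key.startswith("drm-engine-"):
            parts = value.split()
            # Keep only "<uint> ns" entries; this skips drm-engine-capacity-*.
            if len(parts) == 2 and parts[1] == "ns" and parts[0].isdigit():
                engines[key[len("drm-engine-"):]] = int(parts[0])
    return client_id, engines


def sample_pid(pid):
    """Sum engine busy ns over all DRM clients of one process (Linux only)."""
    seen = {}  # client id -> engines; dup'd fds share one client id
    fdinfo_dir = f"/proc/{pid}/fdinfo"
    for fd in os.listdir(fdinfo_dir):
        try:
            with open(os.path.join(fdinfo_dir, fd)) as f:
                client_id, engines = parse_drm_fdinfo(f.read())
        except OSError:
            continue  # fd closed between listdir() and open(), or no access
        if client_id is not None and engines:
            seen[client_id] = engines
    totals = {}
    for engines in seen.values():
        for name, ns in engines.items():
            totals[name] = totals.get(name, 0) + ns
    return totals


def busy_percent(before, after, interval_s):
    """Per-engine busy % between two samples taken interval_s apart."""
    return {
        name: 100.0 * (after[name] - before.get(name, 0)) / (interval_s * 1e9)
        for name in after
    }


def measure(pid, interval_s=1.0):
    """Sample twice, interval_s apart, and return per-engine busy %."""
    first = sample_pid(pid)
    time.sleep(interval_s)
    return busy_percent(first, sample_pid(pid), interval_s)
```

Calling `measure(<kwin_wayland pid>)` during playback would attribute GPU busy time to KWin separately from chrome. Reading another process's fdinfo needs the same user or root. If no `drm-engine-*` lines appear at all, this kernel does not expose them. In that case the devfreq route proposed under "Better metrics" remains the fallback.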
The kwin_wayland userspace process spends microseconds per frame queuing GL commands and the rest of its time blocked in poll/futex waits. `top -p kwin_wayland -d 1` reports %CPU averaged over 1 s intervals. A few microseconds of CPU per frame at 24 fps is a tiny fraction of one core, so it rounds to ~0 %, especially when most of the wall-clock time is spent waiting on GPU fences. `perf record` on the userspace PID likewise samples only CPU execution in kwin's own task context. The shader work on the Mali-G52 itself never shows up, and neither does panfrost kernel work that runs elsewhere (IRQ handlers, scheduler threads).

What we DID see in perf reports:

- DBus message handling, libz, the Qt event dispatcher, kernel memcpy.

What we DIDN'T see (and now know is happening):

- The actual per-frame GL composite, dispatched as Mesa GL calls to `/dev/dri/renderD128` (panfrost). At 24 fps that is 24 composite-shader passes per second: real GPU work, just not CPU-visible.

## Heisenberg moment: drm_info itself perturbed the result

This rep showed:

| Metric | drmprobe rep | A1 rep 1 (no probe) |
|---|---|---|
| drops_total | 11 | 0 |
| drops_post_warmup | 6 | 0 |
| frames_total | 1662 | 1685 |
| kwin %CPU median | **18.00** | **0.00** |
| kwin %CPU max | 31.9 | 4.0 |

The act of running `drm_info` every 3 s (opening `/dev/dri/card0` and querying every plane property) forced KWin into a slower path. kwin %CPU jumped from 0 to ~18 % median, and 6 frames were dropped post-warmup over the 70 s window.

Why: opening `/dev/dri/card0` from a third-party client likely did one of three things:

- (a) invalidated KWin's atomic-commit fast path;
- (b) triggered a KWin heuristic to back off because another DRM master might be present;
- (c) caused the kernel to flush in-flight commits before answering the property query.

Whatever the mechanism, the probe is invasive: it changes the very behavior it's measuring.

The 18 % kwin %CPU result is informative even though it's "perturbed". It shows that **when KWin can't engage its fastest path, its userspace cost becomes ~18 %** for a 24 fps H.264 composite.
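The 0 % and ~18 % figures translate into per-frame CPU time with simple arithmetic, which also shows why a microseconds-per-frame cost reads as ~0 % in `top`. This is a minimal sketch:
- the 50 µs per-frame figure in the usage note is illustrative, not a measurement;
- the `/proc/<pid>/stat` parser assumes the standard proc(5) layout, where utime and stime are fields 14 and 15, counted in clock ticks.

```python
import os


def percent_cpu(cpu_s_per_frame, fps):
    """%CPU of one core for a task that burns cpu_s_per_frame every frame."""
    return 100.0 * cpu_s_per_frame * fps


def cpu_s_per_frame(percent, fps):
    """Inverse: seconds of CPU per frame implied by a %CPU reading."""
    return percent / 100.0 / fps


def stat_cpu_seconds(stat_line, ticks_per_s):
    """utime + stime (seconds) from one /proc/<pid>/stat line.

    The comm field is parenthesised and may contain spaces, so split after
    the last ')'. The remainder starts at field 3 (state), so utime and
    stime (fields 14 and 15) are indexes 11 and 12.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    utime, stime = int(rest[11]), int(rest[12])
    return (utime + stime) / ticks_per_s


def read_stat_cpu_seconds(pid):
    """Live reading (Linux only); top's %CPU is the delta of this per interval."""
    with open(f"/proc/{pid}/stat") as f:
        return stat_cpu_seconds(f.read(), os.sysconf("SC_CLK_TCK"))
```

At 24 fps, 50 µs of CPU per frame is `percent_cpu(50e-6, 24)` ≈ 0.12 % of a core, which a 1 s `top` sample shows as ~0.1 % at most. In the other direction, the drmprobe rep's 18 % median is `cpu_s_per_frame(18.0, 24)` = 7.5 ms of CPU per frame. The 24 fps frame budget is ~41.7 ms.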
The ~18 % figure is roughly half the predecessor's ~36 % kwin %CPU. That suggests the predecessor's reps were running an even slower path, perhaps without the kwin-fourier patches' `watchDmaBuf no-op` optimization, or with thermal contention.

## What this means for the campaign

**The campaign's mechanism is intact and the matrix has real work to do.** Under Plasma Wayland, KWin **is** doing the per-frame GL composite of chrome's RGB surface. That surface itself already includes chrome's own GL composite of the NV12 video texture. KWin's pass is the cost the campaign's "without-KWin" cells were designed to avoid. The cost is real; it's just GPU-side, not CPU-side, so we need GPU-aware metrics to see it.

**The matrix's `effective_fps` and `drops_post_warmup` metrics are still valid**: they measure user-visible playback quality regardless of where the cost lands. At 24 fps, today's GPU has enough headroom for both chrome's internal composite and KWin's additional composite, so both sessions can deliver clean playback. The difference should only surface under stress, because only the Wayland session pays for the extra pass.

**Better metrics to add to Phase 1 binding cells:**

- **Panfrost GPU utilization** via `/sys/class/devfreq/fde60000.gpu/load_busy_percent` or `panfrost_pmu` perf events: measures the GPU work top can't see.
- **Frame latency** (commit → present delta) via `wp_presentation_feedback` on Wayland and `XPresent` notify events on X11: directly measures the per-frame composite delay.
- **Power consumption / battery drain** during sustained playback: quantifies the cumulative GPU-cycle cost.

Without these, top + perf alone *will* miss the campaign's mechanism every time.

## Why the result restores (not weakens) the campaign

The A1 verdict in `a1_summary.md` interpreted 0 % kwin CPU as "KWin doing nothing per frame, direct-scanout engaged, campaign hypothesis structurally weakened." That interpretation was wrong because of the GPU-vs-CPU blind spot.
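Closing that blind spot for the frame-latency metric listed above is mostly arithmetic once per-frame timestamps are logged. This is a minimal sketch, built on these assumptions:
- a hypothetical log of `(commit_ns, presented_ns)` pairs per video frame, for example collected from `wp_presentation_feedback` "presented" events;
- a 1.5 × frame-period threshold for a late gap. It is an illustrative choice, not the campaign's existing `drops_post_warmup` definition. It leaves room for the uneven 33/50 ms cadence of 24 fps content if the panel runs at 60 Hz.

```python
import statistics


def latency_report(frames, fps):
    """Summarise commit->present latency and presentation gaps.

    frames: list of (commit_ns, presented_ns) tuples in presentation order
    (hypothetical log format). fps: content frame rate, e.g. 24.
    """
    period_ns = 1e9 / fps
    latencies_ms = [(p - c) / 1e6 for c, p in frames]
    presented = [p for _, p in frames]
    gaps = [b - a for a, b in zip(presented, presented[1:])]
    # A gap well beyond one content period means a frame was shown late or
    # skipped; round() estimates how many whole periods were lost.
    late = [g for g in gaps if g > 1.5 * period_ns]
    missed = sum(round(g / period_ns) - 1 for g in late)
    span_s = (presented[-1] - presented[0]) / 1e9
    return {
        "latency_median_ms": statistics.median(latencies_ms),
        "latency_max_ms": max(latencies_ms),
        "late_gaps": len(late),
        "missed_periods": missed,
        "effective_fps": (len(presented) - 1) / span_s if span_s > 0 else 0.0,
    }
```

Comparing `latency_median_ms` between Wayland and X11 cells would show the composite delay directly. `missed_periods` tracks the same user-visible symptom as the existing drops metrics.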
The correct reading: KWin is doing exactly the GL composite the campaign's premise named. CPU measurement isn't sufficient to see it; the matrix needs GPU-side metrics. Once those are added, the X11 cells should be able to demonstrate a real (if subtle) delta against the Wayland baseline.

The original framing "stutterless playback - possible with X11? proven impossible with Wayland" remains intact. Wayland's per-frame composite is real, and under enough load it WILL miss frames. The predecessor's data is consistent with this; today's clean Wayland run just means the GPU has headroom for this specific 24 fps workload. X11 + a non-compositing WM removes the second composite step entirely.

## What should change in `a1_summary.md`

The "Major reframing finding" section overstates the result. The data is consistent with KWin doing a per-frame GL composite that's invisible to CPU instrumentation, so the "structurally weakened campaign hypothesis" claim should be retracted.

I'll add a corrigendum at the top of `a1_summary.md` linking to this file and noting the corrected interpretation. I'm not rewriting the original text; it stays as the record of what I concluded with the wrong instrumentation.

## Phase 1 implications

Updated:

1. **Binding cells need GPU-aware metrics.** CPU-only instrumentation will miss the campaign's mechanism in any condition that isn't already at the GPU contention limit.
2. **The drm_info probe is too invasive to embed in measurement reps.** Use it diagnostically before/after reps, not during. Whether direct scanout is engaged can be inferred from the framebuffer pattern at idle, at boot and during playback; a per-rep probe isn't needed.
3. **The matrix value isn't lost.** The X11 cells will save one GPU pass per frame regardless. Phase 1 should pick a workload that pushes the GPU closer to its limit, so the per-frame saving shows up as a difference in drops. 1080p60 H.264 (vs the current 24 fps) is the obvious bump.
4. **mpv `--vo=xv` mechanism test still wanted.** Now framed as "does the X server schedule NV12 directly to Plane 39 for an Xv client, bypassing the GL composite that both browsers and Wayland need?" Answering it requires a native X11 session.
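The check in item 2 can be reduced to a few rules over plane state already extracted from `drm_info` snapshots. This is a minimal sketch. The `PlaneState` record and its fields are hypothetical, and parsing `drm_info` output into them is left out. The YUV format set is non-exhaustive. The rules encode this document's reading:
- a YUV FOURCC such as NV12 on the primary plane, or an overlay plane with a framebuffer attached, points to direct scanout / overlay use;
- only RGB formats rotating across several FB IDs point to GL composition with N-buffering.

```python
from dataclasses import dataclass

# Non-exhaustive set of DRM YUV format names that would indicate video
# buffers being scanned out directly.
YUV_FORMATS = {"NV12", "NV21", "NV16", "P010", "YUV420", "YVU420"}


@dataclass(frozen=True)
class PlaneState:
    """One plane's state in one snapshot (hypothetical record)."""
    snapshot: str
    plane_id: int
    fb_id: int   # 0 means no framebuffer attached
    fourcc: str  # e.g. "ABGR8888"; ignored when fb_id == 0


def classify(states, primary_id, overlay_id):
    """Summarise snapshots of one primary and one overlay plane."""
    primary = [s for s in states if s.plane_id == primary_id and s.fb_id]
    overlay = [s for s in states if s.plane_id == overlay_id and s.fb_id]
    formats = {s.fourcc for s in primary}
    fb_ids = {s.fb_id for s in primary}
    if formats & YUV_FORMATS:
        verdict = "direct scanout of video on primary"
    elif overlay:
        verdict = "overlay plane in use"
    elif len(fb_ids) >= 2:
        verdict = f"GL composite, {len(fb_ids)} rotating RGB buffers"
    else:
        verdict = "static or single RGB buffer"
    return {
        "verdict": verdict,
        "primary_fb_ids": sorted(fb_ids),
        "primary_formats": sorted(formats),
        "overlay_snapshots": len(overlay),
    }
```

Feed it snapshots from a dedicated diagnostic playback run rather than a measurement rep, e.g. `classify(states, primary_id=39, overlay_id=45)`. It then gives the same kind of verdict as the Plane 39 table without putting `drm_info` inside a measured window.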