# A1 baseline: verdict

**Three in-session reps of chromium-fourier 149 / brave_drops_test.html / Plasma Wayland 6.6.4 + kwin-fourier 6.6.4-3 + qt6-base-fourier 6.11.0-3, acquired 2026-05-03 14:23–14:31 CEST.**

Per the campaign-contained-data discipline, this is the only Wayland baseline this campaign uses. Predecessor numbers are referenced below as instrument-sanity context, not as comparison targets.

## Results

| Metric | Rep 1 | Rep 2 | Rep 3 | Median (cluster) |
|---|---|---|---|---|
| frames_total (over 70 s) | 1685 | 1685 | 1686 | **1685** |
| effective fps | 24.04 | 24.04 | 24.05 | **24.04** |
| drops_total | 0 | 0 | 0 | **0** |
| drops_post_warmup (after t=10s) | 0 | 0 | 0 | **0** |
| kwin_wayland %CPU median (1 Hz × 70 samples) | 0.00 | 0.00 | 0.00 | **0.00** |
| kwin_wayland %CPU mean | 0.07 | 0.04 | 0.04 | **0.04** |
| kwin_wayland %CPU max | 4.00 | 1.00 | 3.00 | **3.00** |
| perf samples on kwin_wayland (99 Hz × 70 s) | 39 | 28 | (similar) | tens, not thousands |

**IQR / median for the spread metrics is 0 (the cluster is degenerate at the lower bound).** Per the protocol's exit-condition tree, this is the "tight cluster" branch: Phase 1 binding cells can use these as the anchor with a sub-1% tolerance band on kwin %CPU and a ≤ 1-frame tolerance on drops_post_warmup.

Source video plays at 24 fps for 70.09 s ≈ 1682 frames; the observed 1685 frames is within 3 frames (about 0.2 %) of that (chromium counts the DROPS_TRAJECTORY playing-event frame separately).
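
The medians and the frame-count sanity check above can be reproduced from the table values alone. The following is a minimal sketch, not the campaign's actual tooling: the dictionary keys and the `summarize` / `effective_fps` helpers are hypothetical names, and the quartile convention (`method="inclusive"`) is an assumption, since the protocol's own IQR definition is not quoted here. Dividing by the 70.09 s clip length reproduces the table's effective-fps row; that divisor is inferred from the numbers, not stated by the protocol.

```python
from statistics import median, quantiles

# Per-rep values copied from the Results table (rep 1, rep 2, rep 3).
reps = {
    "frames_total": [1685, 1685, 1686],
    "drops_total": [0, 0, 0],
    "drops_post_warmup": [0, 0, 0],
    "kwin_cpu_median": [0.00, 0.00, 0.00],
    "kwin_cpu_mean": [0.07, 0.04, 0.04],
    "kwin_cpu_max": [4.00, 1.00, 3.00],
}

CLIP_FPS = 24          # nominal frame rate of the source video
CLIP_SECONDS = 70.09   # clip duration quoted in the text


def summarize(values):
    """Return (median, IQR) for one metric across the reps.

    Assumption: inclusive quartiles; with three reps this is a coarse spread measure.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1


def effective_fps(frames, seconds=CLIP_SECONDS):
    """Frames delivered per second of clip time."""
    return frames / seconds


expected_frames = round(CLIP_FPS * CLIP_SECONDS)

for name, values in reps.items():
    med, iqr = summarize(values)
    print(f"{name}: median={med} IQR={iqr:.3f}")

print(f"expected frames: {expected_frames}, "
      f"observed median: {summarize(reps['frames_total'])[0]}")
```

Under this quartile convention the IQR is exactly 0 only for the drops rows and the kwin %CPU median row, which are the rows the tolerance bands are defined on.
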
## What the perf reports say kwin was actually doing

For all three reps, the perf samples on kwin_wayland during playback are dominated by event-loop bookkeeping, **not** by any GL-composite or dmabuf-import path:

| Rep | Top symbol(s) at non-trivial % |
|---|---|
| 1 | `__pi_memcpy_generic` (97.18 %): single memcpy event in a 39-sample run |
| 2 | `libz.so` (37.57 %), `call_filldir` (31.91 %): readdir + zlib |
| 3 | `dbus_message_unref` (38.80 %), `QUnixEventDispatcherQPA::processEvents` (36.72 %), `libz.so` (23.30 %): DBus + Qt event loop |

**Zero samples anywhere in:**

- `glEGLImageTargetTexture2DOES` (the GL EGL image bind path the predecessor's `kwin_overlay_subsurface` campaign Phase 2 hypothesised would dominate per-frame KWin cost on this hardware)
- `panfrost_*` (Mesa Panfrost driver routines)
- `wp_subsurface_*` / `WaylandSurface_*` (overlay/subsurface protocol handling)
- any `Compositor::*` / `OpenGLBackend::*` / `OutputLayer::*` KWin internal symbols

The total cycle count across all three reps is in the millions, not billions: kwin_wayland was scheduled out for >99 % of the 70 s capture window in every rep.

## What this means

**The "KWin is the bottleneck" framing the campaign was built around is structurally weakened by these data.** `README.md` § 1 states the premise as: "the campaign's load-bearing hypothesis is that this plane-allocation freedom translates into measurable browser-video speedup." That hypothesis was built on top of the predecessor's observation that `kwin_wayland` consumed ~36 % CPU during similar playback, attributable per the predecessor's Phase 2 source-read to per-frame GL composite of NV12 → RGB.

**Today's A1 reps show kwin_wayland at 0 % median, with no GL-composite work in the perf samples.** There is no Wayland-side KWin-induced overhead for the X11 cells to *be faster than*.
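
As a rough cross-check of the ">99 % scheduled out" statement in the perf section, the reported sample counts can be turned into an on-CPU fraction. This is an order-of-magnitude sketch only. It assumes each perf sample stands for roughly 1/99 s of kwin_wayland on-CPU time, which frequency-mode sampling only approximates, and `on_cpu_fraction` is a hypothetical helper. Rep 3 is left out because its sample count is only given as "(similar)".

```python
SAMPLE_HZ = 99   # perf sampling frequency from the Results table
CAPTURE_S = 70   # capture window in seconds


def on_cpu_fraction(samples, hz=SAMPLE_HZ, seconds=CAPTURE_S):
    """Approximate share of the window the process spent on-CPU.

    Assumption: one sample ~= 1/hz seconds of on-CPU time, so a process
    that ran continuously would have produced hz * seconds samples.
    """
    return samples / (hz * seconds)


# Sample counts for the reps where the table gives an exact number.
perf_samples = {"rep 1": 39, "rep 2": 28}

for rep, samples in perf_samples.items():
    frac = on_cpu_fraction(samples)
    print(f"{rep}: {samples} samples -> ~{frac:.2%} on-CPU, "
          f"~{1 - frac:.2%} scheduled out")
```

Both reps come out well under 1 % on-CPU, which agrees with the 1 Hz `top` samples showing a 0 % median.
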
### The most likely mechanism (hypothesis, not yet verified)

KWin 6.6.4 (with kwin-fourier 6.6.4-3 patches applied) appears to have engaged its **direct-scanout** code path for the chrome-window-displaying-video workload. KWin's direct-scanout support has existed for years on the Wayland backend and has been progressively widened: when there's a single visible "top" surface (the chrome window) whose buffer matches a hardware plane's format/modifier capabilities, KWin can hand that buffer to DRM directly without first GL-compositing it into KWin's own framebuffer. The browser's RGB or NV12 dmabuf goes onto Plane 39 (the Primary plane on rockchip-drm RK3568) without any per-frame KWin GPU work.

This is **not** the wp_subsurface route the predecessor was investigating; it's a different, simpler scanout path that doesn't require the client to opt into overlay protocol. It just requires the client's surface to be the only visible non-trivial top-level plus a buffer format/modifier that DRM can scan out.

If this hypothesis is correct, two things follow:

1. **The campaign's X11-vs-Wayland delta is much smaller than originally expected.** Both sessions can avoid per-frame compositor work for the single-window video case. The X11 cells are then unlikely to be measurably faster than Wayland for this workload.
2. **The campaign's mechanism is realised under Wayland too, when conditions are right.** "Plane-allocation freedom" is not X11-exclusive; it's just easier to engage on X11 because there's no compositor in the path at all. On Wayland, KWin engages an equivalent fast-path when its heuristics allow.

### What the data does NOT establish

- That direct-scanout is the cause. The data is consistent with direct-scanout but a perf-only diagnosis can't pin it down. Phase 1 should add `drm_info` snapshots during playback (which plane is programmed with the chrome window's buffer FOURCC + modifier?) and KWin debug logging (`KWIN_DRM=1` dumps direct-scanout decisions) to confirm.
- That this behavior holds for multi-window scenarios. A single visible non-trivial top-level window is the simplest case for direct-scanout. If the operator works multi-windowed (panel + chrome + Konsole), the fast-path may decline and kwin %CPU may rebound. The matrix's relevance to daily-driver scenarios depends on this.
- That this behavior holds across browsers. chromium-fourier 149 specifically has the patches that enable smooth NV12 dmabuf production. Brave 147 stock and Firefox 150 may produce buffers in different shapes (RGB-pre-composited, different modifier) that don't satisfy the direct-scanout predicate. Phase 1 reps for those browsers will tell.

## Cross-check: predecessor's same-condition reps

For instrument sanity (NOT as comparison target; these are the predecessor's `kwin_timing_nodebug_rep[1-3]` numbers from 2026-05-02/03):

| Metric | Predecessor median | This campaign median |
|---|---|---|
| frames_total | 1688 | **1685** |
| drops_total | 44 | **0** |
| drops_post_warmup | 28 | **0** |
| kwin %CPU median | 35.9 | **0.00** |

The frames_total match indicates the test page + chromium-fourier emit at the same rate in both campaigns. The drops + kwin %CPU divergence is too large to be measurement noise: something about the runtime conditions changed between 2026-05-02 (predecessor's reps) and 2026-05-03 14:23 (today's reps), even though the package versions are identical and the test page is the same file.

Possible non-package causes worth listing here so a follow-up can investigate (out of A1 scope):

- **Boot generation:** predecessor's reps were on a session that had been running ~9 hours; today's reps were on a session that had been running ~50 minutes after autologin (revert.log entry 6).
- **Cumulative session state:** predecessor's session likely had multiple browser instances and other windows open during the campaign's preceding work; today's session was a fresh autologin from the greeter, with only the test chrome window visible.
- **Thermal:** predecessor's `temp_pre.txt` for rep 1 isn't in our scope to check (would be predecessor data import), but today's reps had cpu-thermal at 36 °C pre-rep, well below thermal-throttle thresholds.
- **kwin / qt patches state:** packages identical per `pacman -Q`, but the runtime state of KWin's heuristics (window-rule cache, scanout-decision history) might differ between sessions. This is a normal property of compositors and could explain some run-to-run variance even on the same binary.

The discipline rule already required the in-session re-measurement this campaign just did. The predecessor's number is no longer the reference; **this campaign's measured median (0 drops, 0 % kwin) is the reference for any X11 cells the campaign will later compare against**.

## Phase 1 implications

The matrix design needs revisiting before Phase 1 cells lock:

1. **The mpv `--vo=xv` cell remains the most informative single point** for the campaign's original mechanism (does the X server route NV12 to Plane 39 directly?), per the browser overlay inventory's verdict.
2. **The browser X11 cells become a measurement of "do browsers under X11 get the equivalent direct-scanout benefit they get under Wayland?"** rather than the original "does X11 win over Wayland?" framing. Three plausible outcomes:
   - X11 cells match Wayland baseline (both engage direct scanout) → "compositor-or-not is irrelevant for the single-window case on this hardware"
   - X11 cells slightly faster than Wayland (X11 path has less per-frame X protocol overhead) → small but real win for X11 daily-driver
   - X11 cells slower than Wayland (X11 path has issues KWin's direct-scanout doesn't) → unexpected; would need re-investigation
3. **A multi-window variant** of the with-KWin baseline should be added before Phase 1 binding-cell lock; otherwise the matrix only measures the easiest scenario. Suggested add: an A1' rep with chrome + Konsole + Plasma panel all visible, to see if kwin %CPU rebounds.
If it does, the matrix's daily-driver-relevance picture is more nuanced.

The campaign continues with the matrix as defined, but with the understanding that the original framing is partially invalidated. Phase 1 will lock around the reframed sub-questions in `phase0_evidence/browser_overlay_inventory_2026-05-03.md` § "Implications for the matrix" + the multi-window add above.

## Files in this evidence dir

```
a1_rep1/ a1_rep2/ a1_rep3/   - three rep evidence dirs
01_live_session.txt          - Wayland session state at A1 capture time
02_predecessor_assets.txt    - verification that predecessor scripts/assets reusable
a1_protocol.md               - protocol spec, run beforehand
a1_summary.md                - this file
```

Each rep directory contains:

- `start.txt` / `end.txt` / `capture_start.txt`: wall-clock
- `temp_pre.txt` / `temp_post.txt`: cpu-thermal temp
- `top_kwin.txt`: kwin_wayland top samples (70 × 1 Hz)
- `top_full.txt`: system top samples (70 × 1 Hz)
- `stderr.log`: chromium stderr (full)
- `drops_trajectory.txt`: DROPS_TRAJECTORY lines (73 each)
- `drops_summary.txt`: frames_total / drops_total / drops_post_warmup
- `kwin_cpu_summary.txt`: kwin %CPU stats
- `perf_record_stderr.txt`: perf recorder's own stderr
- `perf_report_self.txt` / `perf_report_top50.txt`: perf flamegraph (text)

`perf.data` files (~400 KB each) are root-owned on ohm (created by `sudo perf record`) and were not synced to noether. They remain at `/home/mfritsche/phase3_prime_runs/x11research_a1_rep[1-3]/perf.data` on ohm if re-analysis is needed.
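
A small completeness check over the rep directories listed above might look like the sketch below. It is illustrative only: `missing_files` and `check_evidence` are hypothetical helpers, not scripts from this campaign, and the expected-file list is simply transcribed from the per-rep listing (so `perf.data`, which was deliberately not synced, is not included).

```python
from pathlib import Path

# File names transcribed from the "Each rep directory contains" list.
EXPECTED_FILES = [
    "start.txt", "end.txt", "capture_start.txt",
    "temp_pre.txt", "temp_post.txt",
    "top_kwin.txt", "top_full.txt",
    "stderr.log",
    "drops_trajectory.txt", "drops_summary.txt",
    "kwin_cpu_summary.txt",
    "perf_record_stderr.txt",
    "perf_report_self.txt", "perf_report_top50.txt",
]

REP_DIRS = ("a1_rep1", "a1_rep2", "a1_rep3")


def missing_files(rep_dir):
    """Return the expected file names that are absent from one rep directory."""
    rep_dir = Path(rep_dir)
    return [name for name in EXPECTED_FILES if not (rep_dir / name).is_file()]


def check_evidence(root, reps=REP_DIRS):
    """Map each rep directory name to its list of missing files."""
    return {rep: missing_files(Path(root) / rep) for rep in reps}
```

Calling `check_evidence(".")` from inside the evidence directory returns an empty list for every rep when the evidence set is complete.
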