3 in-session reps of chromium-fourier 149 / brave_drops_test.html /
Plasma Wayland 6.6.4 (kwin-fourier 6.6.4-3 + qt6-base-fourier
6.11.0-3 carry-overs intact). Tight cluster IQR=0:
drops_total=0, drops_post_warmup=0, frames_total=1685, kwin %CPU
median=0.00, mean=0.04. Perf samples on kwin (~30 over 70s) show
zero composite/dmabuf/GL symbols — only event-loop bookkeeping.
Most likely mechanism: KWin direct-scanout fast-path engaged for
the single-visible-client video case. The campaign's load-bearing
hypothesis ("X11 + non-compositing WM avoids per-frame GL composite
of NV12") is structurally weakened — KWin already avoids that work
under Wayland for this workload. Phase 1 needs to add a
multi-window A1' variant and drm_info-during-playback to confirm
direct-scanout, then revisit matrix cell design.
revert.log entry 6: SDDM autologin + state.conf swap that landed
the Plasma Wayland session for the A1 reps. Backup of original
state.conf preserved at /var/lib/sddm/state.conf.x11-research-bak;
single-command revert documented.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# A1 baseline: verdict
Three in-session reps of chromium-fourier 149 / brave_drops_test.html / Plasma Wayland 6.6.4 + kwin-fourier 6.6.4-3 + qt6-base-fourier 6.11.0-3 acquired 2026-05-03 14:23–14:31 CEST.
Per the campaign-contained-data discipline, this is the only Wayland baseline this campaign uses. Predecessor numbers are referenced below as instrument-sanity context, not as comparison targets.
## Results
| Metric | Rep 1 | Rep 2 | Rep 3 | Median (cluster) |
|---|---|---|---|---|
| frames_total (over 70 s) | 1685 | 1685 | 1686 | 1685 |
| effective fps | 24.04 | 24.04 | 24.05 | 24.04 |
| drops_total | 0 | 0 | 0 | 0 |
| drops_post_warmup (after t=10s) | 0 | 0 | 0 | 0 |
| kwin_wayland %CPU median (1 Hz × 70 samples) | 0.00 | 0.00 | 0.00 | 0.00 |
| kwin_wayland %CPU mean | 0.07 | 0.04 | 0.04 | 0.04 |
| kwin_wayland %CPU max | 4.00 | 1.00 | 3.00 | 3.00 |
| perf samples on kwin_wayland (99 Hz × 70 s) | 39 | 28 | (similar) | tens, not thousands |
IQR / median for the spread metrics is 0 (cluster is degenerate at the lower bound). Per the protocol's exit-condition tree, this is the "tight cluster" branch: Phase 1 binding cells can use these as the anchor with a sub-1% tolerance band on kwin %CPU and a ≤ 1-frame tolerance on drops_post_warmup.
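A minimal sketch of how a Phase 1 cell could be checked against this anchor, assuming each cell reports the same per-rep kwin %CPU medians and drops_post_warmup counts. The helper name `within_anchor_band` is hypothetical, and reading "sub-1%" as an absolute band of under 1 percentage point is an assumption: a relative band around an anchor of 0.00 would be empty.

```python
from statistics import median

# Anchor: cluster medians from the A1 Results table.
ANCHOR_KWIN_CPU = 0.00        # kwin_wayland %CPU
ANCHOR_DROPS_POST_WARMUP = 0  # frames dropped after t=10 s

# Assumed reading of the protocol's bands: around an anchor of 0.00 a
# relative band is meaningless, so "sub-1 %" is taken as < 1 percentage point.
KWIN_CPU_BAND_PP = 1.0
DROPS_BAND_FRAMES = 1


def within_anchor_band(kwin_cpu_per_rep, drops_post_warmup_per_rep):
    """Hypothetical check: does a cell's rep cluster sit inside the A1 bands?"""
    cpu = median(kwin_cpu_per_rep)
    drops = median(drops_post_warmup_per_rep)
    return (abs(cpu - ANCHOR_KWIN_CPU) < KWIN_CPU_BAND_PP
            and abs(drops - ANCHOR_DROPS_POST_WARMUP) <= DROPS_BAND_FRAMES)


print(within_anchor_band([0.00, 0.00, 0.00], [0, 0, 0]))  # True: the A1 reps
print(within_anchor_band([35.9], [28]))                    # False: predecessor medians
```

Fed the predecessor's medians from the cross-check section, the same test fails by a wide margin, which is the quantitative form of the "too large to be measurement noise" argument.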
Source video plays at 24 fps for 70.09 s ≈ 1682 frames; the observed median of 1685 frames is within 3 frames of that (chromium counts the DROPS_TRAJECTORY playing-event frame separately).
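The effective-fps row in the table is reproduced by dividing frames_total by the 70.09 s playback duration, not by the nominal 70 s window (1685 / 70 would give 24.07). A short check of that arithmetic, using only numbers from this section:

```python
SOURCE_FPS = 24.0   # nominal frame rate of the test video
PLAYBACK_S = 70.09  # playback duration of the video

expected_frames = SOURCE_FPS * PLAYBACK_S  # about 1682.16

frames_by_rep = {"rep1": 1685, "rep2": 1685, "rep3": 1686}
for rep, frames in frames_by_rep.items():
    effective_fps = frames / PLAYBACK_S
    surplus = frames - round(expected_frames)  # frames beyond the nominal count
    print(f"{rep}: effective_fps={effective_fps:.2f} surplus={surplus:+d}")
# rep1: effective_fps=24.04 surplus=+3
# rep2: effective_fps=24.04 surplus=+3
# rep3: effective_fps=24.05 surplus=+4
```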
## What the perf reports say kwin was actually doing
For all three reps, the perf samples on kwin_wayland during playback are dominated by event-loop bookkeeping, not by any GL-composite or dmabuf-import path:
| Rep | Top symbol(s) at non-trivial % |
|---|---|
| 1 | `__pi_memcpy_generic` (97.18 %): single memcpy event in a 39-sample run |
| 2 | `libz.so` (37.57 %), `call_filldir` (31.91 %): readdir + zlib |
| 3 | `dbus_message_unref` (38.80 %), `QUnixEventDispatcherQPA::processEvents` (36.72 %), `libz.so` (23.30 %): DBus + Qt event loop |
Zero samples anywhere in:
- `glEGLImageTargetTexture2DOES` (the GL EGL image bind path the predecessor's `kwin_overlay_subsurface` campaign Phase 2 hypothesised would dominate per-frame KWin cost on this hardware)
- `panfrost_*` (Mesa Panfrost driver routines)
- `wp_subsurface_*` / `WaylandSurface_*` (overlay/subsurface protocol handling)
- any `Compositor::*` / `OpenGLBackend::*` / `OutputLayer::*` KWin internal symbols
The total cycle count across all three reps is in the millions, not billions — kwin_wayland was scheduled out for >99 % of the 70 s capture window in every rep.
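The sample counts alone bound how much of the window kwin_wayland spent on-CPU. A minimal sketch, assuming the 99 Hz in the table is perf's per-thread sampling frequency, so a thread busy for the full 70 s would collect about 99 × 70 = 6930 samples:

```python
PERF_FREQ_HZ = 99  # sampling frequency from the table (99 Hz)
WINDOW_S = 70      # capture window in seconds

# Samples one thread would collect if it were on-CPU for the whole window.
max_samples = PERF_FREQ_HZ * WINDOW_S  # 6930

# Rep 3's count was not recorded ("similar" in the table), so only reps 1-2.
samples_by_rep = {"rep1": 39, "rep2": 28}
for rep, n in samples_by_rep.items():
    on_cpu = n / max_samples
    print(f"{rep}: {n}/{max_samples} -> on-CPU {on_cpu:.2%}, off-CPU {1 - on_cpu:.2%}")
# rep1: 39/6930 -> on-CPU 0.56%, off-CPU 99.44%
# rep2: 28/6930 -> on-CPU 0.40%, off-CPU 99.60%
```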
## What this means
The "KWin is the bottleneck" framing the campaign was built around is structurally weakened by these data.
The campaign's load-bearing hypothesis (README.md § 1) is
"that this plane-allocation freedom translates into measurable
browser-video speedup." That hypothesis was built on top of the
predecessor's observation that kwin_wayland consumed ~36 % CPU
during similar playback, which the predecessor's Phase 2
source-read attributed to per-frame GL composite of NV12 → RGB.
Today's A1 reps show kwin_wayland at 0 % median, with no
GL-composite work in the perf samples. Under Wayland there is
no KWin-induced overhead left for the X11 cells to eliminate.
## The most likely mechanism (hypothesis, not yet verified)
KWin 6.6.4 (with kwin-fourier 6.6.4-3 patches applied) appears to have engaged its direct-scanout code path for this workload, a single chrome window playing video. KWin's direct-scanout support has been there for years on the Wayland backend and has been progressively widened: when there's a single visible "top" surface (the chrome window) whose buffer matches a hardware plane's format/modifier capabilities, KWin can hand that buffer to DRM directly without first GL-compositing it into KWin's own framebuffer. If that is what happened here, the browser's RGB or NV12 dmabuf went onto Plane 39 (the Primary plane on rockchip-drm RK3568) without any per-frame KWin GPU work.
This is not the `wp_subsurface` route the predecessor was investigating; it's a different, simpler scanout path that doesn't require the client to opt into an overlay protocol. It only requires the client's surface to be the only visible non-trivial top-level, with a buffer format/modifier that DRM can scan out.
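The two conditions above can be written as a predicate. This is a deliberately simplified, hypothetical model of the decision as this summary describes it, not KWin's actual code (KWin's real checks are more involved). The class and function names, the format/modifier strings, and the capability set given to plane 39 are illustrative assumptions, and every visible surface is treated as non-trivial.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Surface:
    """A client top-level as the compositor sees it (hypothetical model)."""
    name: str
    visible: bool
    fourcc: str      # buffer pixel format, e.g. "NV12"
    modifier: str    # buffer layout modifier, e.g. "LINEAR"


@dataclass(frozen=True)
class Plane:
    """A DRM plane and the (fourcc, modifier) pairs it can scan out."""
    plane_id: int
    formats: frozenset


def can_direct_scanout(surfaces: list, plane: Plane) -> Optional[str]:
    """Return the name of the surface that could go straight to `plane`,
    or None if the compositor would have to composite.

    Models only two conditions:
    1. exactly one visible top-level surface, and
    2. its buffer format/modifier is scan-out capable on the plane.
    """
    visible = [s for s in surfaces if s.visible]
    if len(visible) != 1:
        return None  # several windows on screen: composite instead
    top = visible[0]
    if (top.fourcc, top.modifier) in plane.formats:
        return top.name
    return None  # format/modifier mismatch: composite instead


# Illustrative capability set only; not a dump of the real RK3568 plane 39.
plane39 = Plane(39, frozenset({("XR24", "LINEAR"), ("NV12", "LINEAR")}))
chrome = Surface("chrome", True, "NV12", "LINEAR")
konsole = Surface("konsole", True, "XR24", "LINEAR")

print(can_direct_scanout([chrome], plane39))           # chrome
print(can_direct_scanout([chrome, konsole], plane39))  # None
```

The second call is the multi-window case: under this model a second visible window alone is enough to push the compositor back onto the composite path, which is why the multi-window A1' variant matters.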
If this hypothesis is correct, two things follow:
- The campaign's X11-vs-Wayland delta is much smaller than originally expected. Both sessions can avoid per-frame compositor work for the single-window video case. The X11 cells will not be measurably faster than Wayland for this workload.
- The campaign's mechanism is realised under Wayland too, when conditions are right. "Plane-allocation freedom" is not X11-exclusive — it's just easier to engage on X11 because there's no compositor in the path at all. On Wayland, KWin engages an equivalent fast-path when its heuristics allow.
## What the data does NOT establish
- That direct-scanout is the cause. The data is consistent with direct-scanout, but a perf-only diagnosis can't pin it down. Phase 1 should add `drm_info` snapshots during playback (which plane is programmed with the chrome window's buffer, and with what FOURCC + modifier?) and KWin debug logging (`KWIN_DRM=1` dumps direct-scanout decisions) to confirm.
- That this behavior holds for multi-window scenarios. A single visible non-trivial top-level window is the simplest case for direct-scanout. If the operator works multi-windowed (panel + chrome + terminal + Konsole), the fast-path may decline and kwin %CPU may rebound. The matrix's relevance to daily-driver scenarios depends on this.
- That this behavior holds across browsers. chromium-fourier 149 specifically has the patches that enable smooth NV12 dmabuf production. Brave 147 stock and Firefox 150 may produce buffers in different shapes (RGB-pre-composited, different modifier) that don't satisfy the direct-scanout predicate. Phase 1 reps for those browsers will tell.
## Cross-check: predecessor's same-condition reps
For instrument sanity (NOT as comparison target — these are
the predecessor's kwin_timing_nodebug_rep[1-3] numbers from
2026-05-02/03):
| Metric | Predecessor median | This campaign median |
|---|---|---|
| frames_total | 1688 | 1685 |
| drops_total | 44 | 0 |
| drops_post_warmup | 28 | 0 |
| kwin %CPU median | 35.9 | 0.00 |
The frames_total match indicates the test page + chromium-fourier emit at the same rate in both campaigns. The drops + kwin %CPU divergence is too large to be measurement noise — something about the runtime conditions changed between 2026-05-02 (predecessor's reps) and 2026-05-03 14:23 (today's reps), even though the package versions are identical and the test page is the same file.
Possible non-package causes worth listing here so a follow-up can investigate (out of A1 scope):
- Boot generation: predecessor's reps were on a session that had been running ~9 hours; today's reps were on a session that had been running ~50 minutes after autologin (revert.log entry 6).
- Cumulative session state: predecessor's session likely had multiple browser instances and other windows open during the campaign's preceding work; today's session was freshly autologin'd from greeter, only the test chrome window visible.
- Thermal: the predecessor's `temp_pre.txt` for rep 1 isn't in our scope to check (that would be a predecessor data import), but today's reps had cpu-thermal at 36 °C pre-rep, well below thermal-throttle thresholds.
- kwin / qt patches state: packages are identical per `pacman -Q`, but the runtime state of KWin's heuristics (window-rule cache, scanout-decision history) might differ between sessions. This is a normal property of compositors and could explain some run-to-run variance even on the same binary.
The discipline rule already required the in-session re-measurement this campaign just did. The predecessor's number is no longer the reference; this campaign's measured median (0 drops, 0 % kwin) is the reference for any X11 cells the campaign will later compare against.
## Phase 1 implications
The matrix design needs revisiting before Phase 1 cells lock:
- The mpv `--vo=xv` cell remains the most informative single point for the campaign's original mechanism (does the X server route NV12 to Plane 39 directly?), per the browser overlay inventory's verdict.
- The browser X11 cells become a measurement of "do browsers under X11 get the equivalent direct-scanout benefit they get under Wayland?" rather than the original "does X11 win over Wayland?" framing. Three plausible outcomes:
  - X11 cells match the Wayland baseline (both engage direct scanout) → "compositor-or-not is irrelevant for the single-window case on this hardware"
  - X11 cells slightly faster than Wayland (X11 path has less per-frame X protocol overhead) → small but real win for an X11 daily-driver
  - X11 cells slower than Wayland (X11 path has issues KWin's direct-scanout doesn't) → unexpected; would need re-investigation
- A multi-window variant of the with-KWin baseline should be added before Phase 1 binding-cell lock — otherwise the matrix only measures the easiest scenario. Suggested add: A1' rep with chrome + Konsole + Plasma panel all visible, see if kwin %CPU rebounds. If it does, the matrix's daily-driver-relevance picture is more nuanced.
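The three outcomes listed above can be made operational with one comparison per metric. A hypothetical sketch: `classify_x11_cell` is an illustrative name, the metric is assumed to be any lower-is-better cost measured identically in both sessions (for example system-wide %CPU from `top_full.txt`), and `band` is the tolerance within which the two sessions count as equal.

```python
def classify_x11_cell(wayland_cost, x11_cost, band):
    """Map an X11 cell onto the three outcomes (hypothetical helper).

    wayland_cost / x11_cost: median of a lower-is-better metric measured
    the same way in both sessions. band: tolerance for "no difference".
    """
    delta = x11_cost - wayland_cost
    if abs(delta) <= band:
        return "match"       # both paths avoid per-frame compositor work
    if delta < 0:
        return "x11_faster"  # small but real X11 win
    return "x11_slower"      # unexpected; needs re-investigation


print(classify_x11_cell(wayland_cost=5.0, x11_cost=5.4, band=1.0))  # match
print(classify_x11_cell(wayland_cost=5.0, x11_cost=3.0, band=1.0))  # x11_faster
```

Because the A1 anchor already sits at 0 drops and 0.00 % kwin CPU, an X11 cell can only classify as "match" or "x11_slower" on those two metrics; the "slightly faster" outcome can only show up in a metric that is not already at its floor.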
The campaign continues with the matrix as defined, but with
the understanding that the original framing is partially
invalidated. Phase 1 will lock around the reframed sub-questions
in phase0_evidence/browser_overlay_inventory_2026-05-03.md §
"Implications for the matrix" + the multi-window add above.
## Files in this evidence dir
- `a1_rep1/`, `a1_rep2/`, `a1_rep3/`: three rep evidence dirs
- `01_live_session.txt`: Wayland session state at A1 capture time
- `02_predecessor_assets.txt`: verification that predecessor scripts/assets are reusable
- `a1_protocol.md`: protocol spec, run beforehand
- `a1_summary.md`: this file
Each rep directory contains:
- `start.txt` / `end.txt` / `capture_start.txt`: wall-clock
- `temp_pre.txt` / `temp_post.txt`: cpu-thermal temp
- `top_kwin.txt`: kwin_wayland top samples (70 × 1 Hz)
- `top_full.txt`: system top samples (70 × 1 Hz)
- `stderr.log`: chromium stderr (full)
- `drops_trajectory.txt`: DROPS_TRAJECTORY lines (73 each)
- `drops_summary.txt`: frames_total / drops_total / drops_post_warmup
- `kwin_cpu_summary.txt`: kwin %CPU stats
- `perf_record_stderr.txt`: perf recorder's own stderr
- `perf_report_self.txt` / `perf_report_top50.txt`: perf report output (text)
perf.data files (~400 KB each) are root-owned on ohm
(created by sudo perf record) and were not synced to noether.
They remain at /home/mfritsche/phase3_prime_runs/x11research_a1_rep[1-3]/perf.data
on ohm if re-analysis is needed.