x11-session-research/phase0_evidence/wayland_baseline_2026-05-03/a1_summary.md

# A1 baseline — verdict

> **CORRIGENDUM 2026-05-03 ~17:00 CEST** — the
> "Major reframing finding" section below interpreted 0 %
> kwin CPU as "KWin doing nothing per frame, direct-scanout
> engaged, campaign hypothesis structurally weakened." That
> interpretation is **wrong**. A subsequent
> drm_info-during-playback probe (see
> `drmprobe_findings.md` in this directory) shows Plane 39
> rotates triple-buffered ABGR8888 framebuffers throughout
> playback — KWin IS doing per-frame GL composite, but
> the work is GPU-side (Panfrost) and invisible to top /
> perf-on-userspace. The campaign's mechanism is intact;
> the matrix needs GPU-aware metrics. The original text
> below is preserved as the on-the-day record of what I
> concluded with insufficient instrumentation.

---

**Three in-session reps of chromium-fourier 149 / brave_drops_test.html
/ Plasma Wayland 6.6.4 + kwin-fourier 6.6.4-3 + qt6-base-fourier
6.11.0-3 acquired 2026-05-03 14:23–14:31 CEST.**

Per the campaign-contained-data discipline, this is the only
Wayland baseline this campaign uses. Predecessor numbers are
referenced below as instrument-sanity context, not as comparison
targets.

## Results

| Metric | Rep 1 | Rep 2 | Rep 3 | Median (cluster) |
|---|---|---|---|---|
| frames_total (over 70 s) | 1685 | 1685 | 1686 | **1685** |
| effective fps | 24.04 | 24.04 | 24.05 | **24.04** |
| drops_total | 0 | 0 | 0 | **0** |
| drops_post_warmup (after t=10s) | 0 | 0 | 0 | **0** |
| kwin_wayland %CPU median (1 Hz × 70 samples) | 0.00 | 0.00 | 0.00 | **0.00** |
| kwin_wayland %CPU mean | 0.07 | 0.04 | 0.04 | **0.04** |
| kwin_wayland %CPU max | 4.00 | 1.00 | 3.00 | **3.00** |
| perf samples on kwin_wayland (99 Hz × 70 s) | 39 | 28 | (similar) | tens, not thousands |
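
The three kwin_wayland %CPU rows can look contradictory at first glance (median 0.00, mean 0.07, max 4.00). A minimal sketch of how a mostly-idle 1 Hz series produces that shape follows. The `samples` list is synthetic, chosen only to be consistent with rep 1's summary row; the measured values are in `a1_rep1/top_kwin.txt`, and `summarise` is an illustrative helper, not part of the campaign's scripts.

```python
from statistics import mean, median

# Synthetic 70-sample series (1 Hz x 70 s): 68 idle seconds plus two
# short bursts. Placeholder data, not the measured top_kwin.txt contents.
samples = [0.0] * 68 + [4.0, 1.0]

def summarise(cpu_samples):
    """Return (median, mean rounded to 2 dp, max) for a list of %CPU samples."""
    return median(cpu_samples), round(mean(cpu_samples), 2), max(cpu_samples)

print(summarise(samples))
```

With only two non-zero seconds out of 70, the median stays pinned at zero while the mean and max still register the bursts.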

**IQR for the spread metrics is 0 (the cluster is degenerate
at the lower bound, so IQR / median is 0 / 0 rather than a small
ratio).** Per the protocol's exit-condition tree, this is the
"tight cluster" branch: Phase 1 binding cells can use these as
the anchor with a sub-1-percentage-point tolerance band on kwin
%CPU and a ≤ 1-frame tolerance on drops_post_warmup.
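
The tight-cluster check and the two tolerance bands can be sketched as below, using the rep values from the results table. The helper names are hypothetical, the inclusive quartile method is one choice among several, and the kwin %CPU band is read here as strictly under one percentage point around the anchor.

```python
from statistics import median, quantiles

def iqr(values):
    """Interquartile range using the inclusive quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def drops_ok(candidate, anchor=0, tol_frames=1):
    """Drops tolerance: within 1 frame of the anchor (inclusive)."""
    return abs(candidate - anchor) <= tol_frames

def kwin_cpu_ok(candidate, anchor=0.0, tol_points=1.0):
    """kwin %CPU tolerance: strictly under 1 percentage point from the anchor."""
    return abs(candidate - anchor) < tol_points

reps = {
    "drops_post_warmup": [0, 0, 0],
    "kwin_cpu_median": [0.00, 0.00, 0.00],
    "frames_total": [1685, 1685, 1686],
}
for name, values in reps.items():
    print(name, "median", median(values), "IQR", iqr(values))
```

Because the anchors for drops and kwin %CPU are both zero, the bands are stated in absolute units (frames, percentage points); a relative band around zero would be meaningless.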

Source video plays at 24 fps for 70.09 s ≈ 1682 frames; observed
1685 frames is within 3 frames (about 0.2 %) of that (chromium
counts the DROPS_TRAJECTORY playing-event frame separately).
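
The frame arithmetic can be checked directly. This sketch assumes the "effective fps" row divides by the 70.09 s source duration rather than a nominal 70 s, which is the reading that reproduces the tabulated 24.04 / 24.05 values; the function name is illustrative.

```python
SOURCE_FPS = 24
DURATION_S = 70.09

def effective_fps(frames, duration_s=DURATION_S):
    """Frames delivered per second over the source duration."""
    return frames / duration_s

expected_frames = SOURCE_FPS * DURATION_S  # about 1682.16

for observed in (1685, 1685, 1686):
    surplus = observed - round(expected_frames)
    print(f"{observed} frames -> {effective_fps(observed):.2f} fps, "
          f"{surplus:+d} frames vs expected")
```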

## What the perf reports say kwin was actually doing

For all three reps, the perf samples on kwin_wayland during
playback are dominated by event-loop bookkeeping, **not** by
any GL-composite or dmabuf-import path:

| Rep | Top symbol(s) at non-trivial % |
|---|---|
| 1 | `__pi_memcpy_generic` (97.18 %) — single memcpy event in a 39-sample run |
| 2 | `libz.so` (37.57 %), `call_filldir` (31.91 %) — readdir + zlib |
| 3 | `dbus_message_unref` (38.80 %), `QUnixEventDispatcherQPA::processEvents` (36.72 %), `libz.so` (23.30 %) — DBus + Qt event loop |

**Zero samples anywhere in:**
- `glEGLImageTargetTexture2DOES` (the GL EGL image bind path
  the predecessor's `kwin_overlay_subsurface` campaign Phase 2
  hypothesised would dominate per-frame KWin cost on this
  hardware)
- `panfrost_*` (Mesa Panfrost driver routines)
- `wp_subsurface_*` / `WaylandSurface_*` (overlay/subsurface
  protocol handling)
- any `Compositor::*` / `OpenGLBackend::*` / `OutputLayer::*`
  KWin internal symbols

The total cycle count across all three reps is in the millions,
not billions — kwin_wayland was scheduled out for >99 % of the
70 s capture window in every rep.
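
The ">99 % scheduled out" figure can be cross-checked from the sample counts alone. This is a rough sketch: it assumes a process that stayed on one CPU for the whole window would yield about 99 × 70 samples, and it measures CPU residency only, so it says nothing about work submitted to the GPU.

```python
SAMPLE_HZ = 99
WINDOW_S = 70

def on_cpu_fraction(samples, hz=SAMPLE_HZ, window_s=WINDOW_S):
    """Observed samples as a fraction of the continuous-run sample budget."""
    return samples / (hz * window_s)

# Sample counts for reps 1 and 2 from the results table.
for rep, n in {1: 39, 2: 28}.items():
    frac = on_cpu_fraction(n)
    print(f"rep {rep}: {n}/{SAMPLE_HZ * WINDOW_S} samples "
          f"-> {frac:.2%} on-CPU, {1 - frac:.2%} off-CPU")
```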

## What this means

**The "KWin is the bottleneck" framing the campaign was built
around is structurally weakened by these data.**

`README.md` § 1 states that "the campaign's load-bearing
hypothesis is that this plane-allocation freedom translates into
measurable browser-video speedup." That hypothesis was built on
top of the predecessor's
observation that `kwin_wayland` consumed ~36 % CPU during
similar playback, attributable per the predecessor's Phase 2
source-read to per-frame GL composite of NV12 → RGB. **Today's
A1 reps show kwin_wayland at 0 % median, with no GL-composite
work in the perf samples.** There is no Wayland-side
KWin-induced overhead for the X11 cells to *be faster than*.

### The most likely mechanism (hypothesis, not yet verified)

KWin 6.6.4 (with kwin-fourier 6.6.4-3 patches applied) appears
to have engaged its **direct-scanout** code path for the
chrome-window-displaying-video workload. KWin's direct-scanout
support has been there for years on the Wayland backend and
has been progressively widened: when there's a single visible
"top" surface (the chrome window) whose buffer matches a
hardware plane's format/modifier capabilities, KWin can hand
that buffer to DRM directly without first GL-compositing it
into KWin's own framebuffer. The browser's RGB or NV12 dmabuf
goes onto Plane 39 (the Primary plane on rockchip-drm RK3568)
without any per-frame KWin GPU work.

This is **not** the wp_subsurface route the predecessor was
investigating — it's a different, simpler scanout path that
doesn't require the client to opt into overlay protocol. It
just requires the client's surface to be the only visible
non-trivial top-level plus a buffer format/modifier that DRM
can scan out.

If this hypothesis is correct, two things follow:

1. **The campaign's X11-vs-Wayland delta is much smaller than
   originally expected.** Both sessions can avoid per-frame
   compositor work for the single-window video case. The X11
   cells will not be measurably faster than Wayland for this
   workload.
2. **The campaign's mechanism is realised under Wayland too,
   when conditions are right.** "Plane-allocation freedom" is
   not X11-exclusive — it's just easier to engage on X11
   because there's no compositor in the path at all. On
   Wayland, KWin engages an equivalent fast-path when its
   heuristics allow.

### What the data does NOT establish

- That direct-scanout is the cause. The data is consistent
  with direct-scanout but a perf-only diagnosis can't pin it
  down. Phase 1 should add `drm_info` snapshots during
  playback (which plane is programmed with the chrome
  window's buffer FOURCC + modifier?) and KWin debug
  logging (`KWIN_DRM=1` dumps direct-scanout decisions) to
  confirm.
- That this behavior holds for multi-window scenarios. A
  single visible non-trivial top-level window is the simplest
  case for direct-scanout. If the operator works
  multi-windowed (panel + chrome + Konsole), the
  fast-path may decline and kwin %CPU may rebound. The
  matrix's relevance to daily-driver scenarios depends on
  this.
- That this behavior holds across browsers. chromium-fourier
  149 specifically has the patches that enable smooth NV12
  dmabuf production. Brave 147 stock and Firefox 150 may
  produce buffers in different shapes (RGB-pre-composited,
  different modifier) that don't satisfy the direct-scanout
  predicate. Phase 1 reps for those browsers will tell.
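
The `drm_info` check proposed in the first bullet could be reduced to a per-plane tally once snapshots exist. The sketch below assumes the plane id, FOURCC and framebuffer id have already been extracted from each snapshot by hand or by a separate parser; the tuple layout and the sample values are hypothetical placeholders, not campaign data, and no particular `drm_info` output format is assumed.

```python
from collections import defaultdict

# Hypothetical pre-extracted snapshots: (plane_id, fourcc, fb_id).
# Placeholder values for illustration only.
snapshots = [
    (39, "ABGR8888", 101),
    (39, "ABGR8888", 102),
    (39, "ABGR8888", 103),
    (39, "ABGR8888", 101),
]

def summarise_planes(rows):
    """Group snapshots by plane: formats seen and distinct framebuffer ids."""
    seen = defaultdict(lambda: {"formats": set(), "fb_ids": set()})
    for plane_id, fourcc, fb_id in rows:
        seen[plane_id]["formats"].add(fourcc)
        seen[plane_id]["fb_ids"].add(fb_id)
    return dict(seen)

for plane_id, info in summarise_planes(snapshots).items():
    print(plane_id, sorted(info["formats"]), len(info["fb_ids"]), "distinct fb ids")
```

A plane that keeps showing one RGB format with a small rotating set of framebuffer ids would point towards a compositor-owned swapchain, while the client's own format on the plane would be consistent with direct scanout. Format alone is not decisive when the client also produces RGB buffers.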

## Cross-check: predecessor's same-condition reps

For instrument sanity (NOT as comparison target — these are
the predecessor's `kwin_timing_nodebug_rep[1-3]` numbers from
2026-05-02/03):

| | Predecessor median | This campaign median |
|---|---|---|
| frames_total | 1688 | **1685** |
| drops_total | 44 | **0** |
| drops_post_warmup | 28 | **0** |
| kwin %CPU median | 35.9 | **0.00** |

The frames_total match indicates the test page + chromium-fourier
emit at the same rate in both campaigns. The drops + kwin %CPU
divergence is too large to be measurement noise — something
about the runtime conditions changed between 2026-05-02/03
(predecessor's reps) and 2026-05-03 14:23 (today's reps), even
though the package versions are identical and the test page is
the same file.

Possible non-package causes worth listing here so a follow-up
can investigate (out of A1 scope):

- **Boot generation:** predecessor's reps were on a session
  that had been running ~9 hours; today's reps were on a
  session that had been running ~50 minutes after autologin
  (revert.log entry 6).
- **Cumulative session state:** predecessor's session likely
  had multiple browser instances and other windows open
  during the campaign's preceding work; today's session was
  freshly autologin'd from greeter, only the test chrome
  window visible.
- **Thermal:** predecessor's `temp_pre.txt` for rep 1 is out
  of scope to check here (it would be a predecessor data
  import). Today's reps had cpu-thermal at 36 °C pre-rep, well
  below thermal-throttle thresholds.
- **kwin / qt patches state:** packages identical per
  `pacman -Q`, but the runtime state of KWin's heuristics
  (window-rule cache, scanout-decision history) might differ
  between sessions. Session-dependent state of this kind is
  plausible for a compositor and could explain some run-to-run
  variance even on the same binary.

The discipline rule already required the in-session re-measurement
this campaign just did. The predecessor's number is no longer
the reference; **this campaign's measured median (0 drops, 0 %
kwin) is the reference for any X11 cells the campaign will
later compare against**.

## Phase 1 implications

The matrix design needs revisiting before Phase 1 cells lock:

1. **The mpv `--vo=xv` cell remains the most informative
   single point** for the campaign's original mechanism (does
   the X server route NV12 to Plane 39 directly?), per the
   browser overlay inventory's verdict.
2. **The browser X11 cells become a measurement of "do
   browsers under X11 get the equivalent direct-scanout
   benefit they get under Wayland?"** rather than the original
   "does X11 win over Wayland?" framing. Three plausible
   outcomes:
   - X11 cells match Wayland baseline (both engage direct
     scanout) → "compositor-or-not is irrelevant for the
     single-window case on this hardware"
   - X11 cells slightly faster than Wayland (X11 path has
     less per-frame X protocol overhead) → small but real
     win for X11 daily-driver
   - X11 cells slower than Wayland (X11 path has issues
     KWin's direct-scanout doesn't) → unexpected; would need
     re-investigation
3. **A multi-window variant** of the with-KWin baseline
   should be added before Phase 1 binding-cell lock —
   otherwise the matrix only measures the easiest scenario.
   Suggested add: A1' rep with chrome + Konsole + Plasma
   panel all visible, see if kwin %CPU rebounds. If it does,
   the matrix's daily-driver-relevance picture is more
   nuanced.

The campaign continues with the matrix as defined, but with
the understanding that the original framing is partially
invalidated. Phase 1 will lock around the reframed sub-questions
in `phase0_evidence/browser_overlay_inventory_2026-05-03.md` §
"Implications for the matrix" + the multi-window add above.

## Files in this evidence dir

```
a1_rep1/  a1_rep2/  a1_rep3/   — three rep evidence dirs
01_live_session.txt              — Wayland session state at A1 capture time
02_predecessor_assets.txt        — verification that predecessor scripts/assets reusable
a1_protocol.md                   — protocol spec, run beforehand
a1_summary.md                    — this file
```

Each rep directory contains:
- `start.txt` / `end.txt` / `capture_start.txt` — wall-clock
- `temp_pre.txt` / `temp_post.txt` — cpu-thermal temp
- `top_kwin.txt` — kwin_wayland top samples (70 × 1 Hz)
- `top_full.txt` — system top samples (70 × 1 Hz)
- `stderr.log` — chromium stderr (full)
- `drops_trajectory.txt` — DROPS_TRAJECTORY lines (73 each)
- `drops_summary.txt` — frames_total / drops_total / drops_post_warmup
- `kwin_cpu_summary.txt` — kwin %CPU stats
- `perf_record_stderr.txt` — perf recorder's own stderr
- `perf_report_self.txt` / `perf_report_top50.txt` — perf
  report output (text)

`perf.data` files (~400 KB each) are root-owned on ohm
(created by `sudo perf record`) and were not synced to noether.
They remain at `/home/mfritsche/phase3_prime_runs/x11research_a1_rep[1-3]/perf.data`
on ohm if re-analysis is needed.