Phase 0: A1 Wayland baseline + state snapshot — major reframing

3 in-session reps of chromium-fourier 149 / brave_drops_test.html /
Plasma Wayland 6.6.4 (kwin-fourier 6.6.4-3 + qt6-base-fourier
6.11.0-3 carry-overs intact). Tight cluster IQR=0:
drops_total=0, drops_post_warmup=0, frames_total=1685, kwin %CPU
median=0.00, mean=0.04. Perf samples on kwin (~30 over 70s) show
zero composite/dmabuf/GL symbols — only event-loop bookkeeping.

Most likely mechanism: KWin direct-scanout fast-path engaged for
the single-visible-client video case. The campaign's load-bearing
hypothesis ("X11 + non-compositing WM avoids per-frame GL composite
of NV12") is structurally weakened — KWin already avoids that work
under Wayland for this workload. Phase 1 needs to add a
multi-window A1' variant and drm_info-during-playback to confirm
direct-scanout, then revisit matrix cell design.

revert.log entry 6: SDDM autologin + state.conf swap that landed
the Plasma Wayland session for the A1 reps. Backup of original
state.conf preserved at /var/lib/sddm/state.conf.x11-research-bak;
single-command revert documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:11:15 +00:00
parent d2e11be430
commit 9a023e9264
45 changed files with 3199 additions and 17 deletions
@@ -0,0 +1,147 @@
# A1 baseline protocol — in-session Plasma Wayland anchor
**Goal:** acquire 3 in-session reps of chromium-fourier video
playback measured under Plasma Wayland with KWin, so the X11
cells of the matrix have a same-session Wayland reference to
compare against. Per the campaign-contained-data
discipline, **this is the only Wayland baseline this campaign
uses**; predecessor numbers are reference history only.
## Cell
- **Browser:** `/tmp/chromium-ohm-gl-fix-step2/chrome`
(chromium-fourier 149.0.7812.0, the existing predecessor
build).
- **Page:** `file:///home/mfritsche/fourier-test/brave_drops_test.html`
(a 30 fps H.264 video element with autoplay that emits its drops
trajectory to the console at 1 Hz; used by the predecessor for all
Phase 3 reps).
- **Session:** Plasma Wayland tty1 / session 433
(the live one, autologin'd via revert.log entry 6).
- **Window:** windowed (default chromium behavior, no fullscreen).
- **Decode:** chromium-fourier's default decode path. With the
Step 1 + Step 2 patches present, this is libva via
`libva-v4l2-request-fourier` driver (V4L2 stateless on hantro).
- **Capture window:** 70 s starting at autoplay-detected.
- **Instrumentation:** `top -p <kwin_wayland PID>` (1 Hz; `-p`
takes a PID, not a process name), `top` (system, 1 Hz),
`sudo perf record -F 99 -g --call-graph dwarf -p <kwin_wayland PID>`,
browser stderr (catches the page's
`DROPS_TRAJECTORY: t=Xs tot=Y drop=Z` 1 Hz log).
**No `WAYLAND_DEBUG=1`** — this is the `nodebug` variant so
the kwin %CPU and drop measurements aren't perturbed by
WAYLAND_DEBUG's per-message overhead.
## Bound metrics per rep
Each rep's evidence dir contains:
- `start.txt` / `end.txt` / `capture_start.txt` — wall-clock
timestamps of phases.
- `temp_pre.txt` / `temp_post.txt` — thermal_zone0 (cpu) at
phase boundaries.
- `top_kwin.txt` — `kwin_wayland` %CPU samples (70 × 1 Hz).
- `top_full.txt` — system-wide top (70 × 1 Hz).
- `perf.data` — perf record at 99 Hz on kwin_wayland.
- `perf_report_self.txt` — perf report (sorted by overhead).
- `perf_report_top50.txt` — first 50 lines of perf report.
- `stderr.log` — full chromium stderr.
- `drops_trajectory.txt` — extracted DROPS_TRAJECTORY lines.
- `kwin_cpu_summary.txt` — kwin %CPU samples / median / mean /
min / max.
- `drops_summary.txt` — `frames_total`, `drops_total`,
`drops_post_warmup` (drops accumulated after t=10 s).
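The three `drops_summary.txt` fields could be derived from the `DROPS_TRAJECTORY` lines roughly as sketched below. This is an illustration, not the orchestrator's actual extraction step. It assumes that `tot` and `drop` are cumulative counters, that `t` is a number of seconds, and that "drops accumulated after t=10 s" means the final `drop` value minus the value at the last sample with t ≤ 10. The name `summarize_drops` is hypothetical.

```python
import re

# Matches the page's 1 Hz log line, e.g. "DROPS_TRAJECTORY: t=12s tot=360 drop=0".
# Assumption: t may be an integer or a decimal number of seconds.
LINE_RE = re.compile(r"DROPS_TRAJECTORY: t=(\d+(?:\.\d+)?)s tot=(\d+) drop=(\d+)")

WARMUP_S = 10.0  # warm-up boundary from the protocol (t=10 s)


def summarize_drops(stderr_text, warmup_s=WARMUP_S):
    """Return frames_total, drops_total and drops_post_warmup from stderr text.

    Assumes tot/drop are cumulative, so the last sample carries the totals.
    Every field is None when no trajectory line is found.
    """
    samples = [
        (float(t), int(tot), int(drop))
        for t, tot, drop in LINE_RE.findall(stderr_text)
    ]
    if not samples:
        return {"frames_total": None, "drops_total": None,
                "drops_post_warmup": None}

    _, frames_total, drops_total = samples[-1]
    # Drops already accumulated by the end of warm-up (0 if no sample that early).
    warm = [drop for t, _, drop in samples if t <= warmup_s]
    drops_at_warmup = warm[-1] if warm else 0
    return {
        "frames_total": frames_total,
        "drops_total": drops_total,
        "drops_post_warmup": drops_total - drops_at_warmup,
    }
```

Returning `None` rather than 0 for a missing trajectory keeps "no data" distinguishable from "no drops", which matters for the per-rep success check.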
## Protocol
Three reps **back-to-back** with ≥ 30 s idle between to let
thermals settle. The three-rep sequence takes just under 6 minutes
of wall time:
```
T+0:00 rep 1: launch + 70s capture + cleanup (~95s)
T+1:35 30s idle (thermal settle)
T+2:05 rep 2: same (~95s)
T+3:40 30s idle
T+4:10 rep 3: same (~95s)
T+5:45 done — pull evidence
```
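The offsets in the timeline follow directly from the per-rep duration and the idle gap. The short sketch below recomputes them as a sanity check; the constants are taken from the timeline, while the helper names `fmt` and `schedule` are illustrative only.

```python
REP_S = 95    # launch + 70 s capture + cleanup, per the timeline
IDLE_S = 30   # thermal-settle gap between reps
N_REPS = 3


def fmt(seconds):
    """Format a second count as M:SS."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def schedule(n_reps=N_REPS, rep_s=REP_S, idle_s=IDLE_S):
    """Return (offset_seconds, label) pairs for each phase, ending with 'done'."""
    events, t = [], 0
    for i in range(1, n_reps + 1):
        events.append((t, f"rep {i}"))
        t += rep_s
        if i < n_reps:  # no idle gap after the last rep
            events.append((t, "idle"))
            t += idle_s
    events.append((t, "done"))
    return events


for offset, label in schedule():
    print(f"T+{fmt(offset)}  {label}")
```

With the values above the final offset is 345 s, i.e. T+5:45, matching the last line of the timeline.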
**SSH-driven:** the orchestrator
`/home/mfritsche/phase3_prime_runs/run_browser_nodebug.sh
$RUN_ID chromium-fourier-kwin` runs end-to-end from a single
SSH command. Operator-side, **a chrome window will appear on
the screen for ~80 s per rep**; the only operator action is
**not interacting with that window** (no clicks, no typing in
the chrome window, no pulling focus). The orchestrator kills
the chrome process cleanly at end of capture.
After the 3 reps complete, this campaign's evidence
sub-directory `phase0_evidence/wayland_baseline_2026-05-03/`
will contain:
```
a1_rep1/ (moved from /home/mfritsche/phase3_prime_runs/x11research_a1_rep1/)
a1_rep2/
a1_rep3/
a1_summary.md (this campaign's interpretation of the 3 reps)
```
The original predecessor evidence at
`/home/mfritsche/phase3_prime_runs/kwin_timing_nodebug_rep[1-3]`
is **untouched**.
## Exit conditions
- Per-rep success = `drops_summary.txt` exists with non-`n/a`
values, `kwin_cpu_summary.txt` exists with samples > 0, perf
report has > 1000 samples.
- Per-rep failure causes:
  - autoplay not detected within 30 s → script aborts, evidence
    dir is partial; rep marked failed.
  - workload exits before autoplay → script aborts.
  - perf record fails (e.g. `perf_event_paranoid` > 1) → script
    continues but perf.data is empty; we'd see this in
    `perf_record_stderr.txt`.
If a rep fails, surface the cause and re-run that rep before
moving on.
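The per-rep success criteria can be expressed as a small check, sketched here under stated assumptions. Because the on-disk layout of `drops_summary.txt` and `kwin_cpu_summary.txt` is not specified in this protocol, the function takes already-parsed values instead of reading the files. `check_rep` and its parameters are hypothetical names.

```python
MIN_PERF_SAMPLES = 1000  # protocol: "perf report has > 1000 samples"

DROPS_KEYS = ("frames_total", "drops_total", "drops_post_warmup")


def check_rep(drops, kwin_samples, perf_samples):
    """Return a list of failure reasons; an empty list means the rep passed.

    drops: dict of drops_summary.txt fields, or None if the file is missing.
    kwin_samples: number of %CPU samples in kwin_cpu_summary.txt (None if missing).
    perf_samples: sample count taken from the perf report.
    """
    reasons = []
    if drops is None:
        reasons.append("drops_summary.txt missing")
    else:
        for key in DROPS_KEYS:
            # A value of 0 is valid; only absent or "n/a" values fail.
            if drops.get(key) in (None, "n/a"):
                reasons.append(f"{key} is n/a")
    if not kwin_samples:
        reasons.append("kwin_cpu_summary.txt missing or has 0 samples")
    if perf_samples <= MIN_PERF_SAMPLES:
        reasons.append(
            f"perf report has {perf_samples} samples (need > {MIN_PERF_SAMPLES})"
        )
    return reasons
```

Returning the reasons rather than a bare boolean makes it straightforward to surface the cause before re-running a failed rep.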
## Decision after 3 reps
Compute median + IQR of `drops_post_warmup`, `frames_total`,
`drops_total`, and kwin %CPU across the three reps. Two
possible verdict shapes:
- **Tight cluster (IQR / median ≤ 0.3):** baseline is stable;
Phase 1 binding cells can use the median as the anchor with
the IQR as the tolerance band.
- **High variance (IQR / median > 0.3):** baseline is noisy;
Phase 1 needs ≥ 5 reps per cell, not 3, and binding-cell
thresholds need IQR-based formulation rather than fixed
numbers. This is the predecessor lesson built into the
worklist's "3 reps minimum (variance is a real concern)".
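A minimal sketch of this decision rule follows. Two details are assumptions rather than part of the protocol: quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`, and a median of 0 (where IQR / median is undefined) is treated as tight only when the IQR is also 0. The names `iqr` and `verdict` are illustrative.

```python
import statistics

TIGHT_RATIO = 0.3  # threshold from the protocol: IQR / median <= 0.3


def iqr(values):
    """Interquartile range using inclusive quartiles (an assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def verdict(values, tight_ratio=TIGHT_RATIO):
    """Classify one metric's per-rep values as 'tight' or 'high-variance'."""
    med = statistics.median(values)
    spread = iqr(values)
    if med == 0:
        # Ratio is undefined at median 0. Assumption: zero spread counts
        # as tight, any non-zero spread as high variance.
        tight = spread == 0
    else:
        tight = spread / abs(med) <= tight_ratio
    return {
        "median": med,
        "iqr": spread,
        "verdict": "tight" if tight else "high-variance",
    }


# Example: three reps that all report zero post-warmup drops.
print(verdict([0, 0, 0]))
```

With only three reps the IQR depends noticeably on the quartile convention, so the same method should be used for every cell being compared.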
## Operator green-light request
Before I fire the 3 reps:
1. Confirm you're OK with **a chrome window popping up on the
screen for ~80 s per rep × 3 reps**, and during that time
**not interacting with it** (mouse stays still, no key
presses).
2. Confirm the current Plasma Wayland session is in a "clean
measurement state" — i.e. nothing else is doing significant
CPU work (ideally close any active terminals/browsers/IDE
windows you don't need; the predecessor's 36-37 % kwin %CPU
baseline assumes a quiescent desktop with just the test
chrome window plus normal Plasma services running).
3. (Optional) Decide whether to also include other variants in
this turn's measurements — e.g. add a rep of `Brave 147` or
`Firefox 150` under Plasma Wayland to start populating the
full matrix. Default scope: just the 3 chromium-fourier
reps; matrix-fill cells go into Phase 1 proper.
When green-lit, I fire `run_browser_nodebug.sh
x11research_a1_rep1 chromium-fourier-kwin` first as a smoke
test, surface its drops_summary.txt + kwin_cpu_summary.txt
output, then on confirmation fire reps 2 and 3 back-to-back.