# A1 baseline protocol — in-session Plasma Wayland anchor

**Goal:** acquire 3 in-session reps of a chromium-fourier video playback measurement under Plasma Wayland with KWin, so the X11 cells of the matrix have a same-session Wayland reference to compare against. Per the campaign-contained-data discipline, **this is the only Wayland baseline this campaign uses**; predecessor numbers are reference history only.

## Cell

- **Browser:** `/tmp/chromium-ohm-gl-fix-step2/chrome` (chromium-fourier 149.0.7812.0, the existing predecessor build).
- **Page:** `file:///home/mfritsche/fourier-test/brave_drops_test.html` (a 30 fps H.264 video element with autoplay, plus a drops trajectory emitted to the console at 1 Hz; used by the predecessor for all Phase 3 reps).
- **Session:** Plasma Wayland tty1 / session 433 (the live one, autologin'd via revert.log entry 6).
- **Window:** windowed (default chromium behavior, no fullscreen).
- **Decode:** chromium-fourier's default decode path. With the Step 1 + Step 2 patches present, this is libva via the `libva-v4l2-request-fourier` driver (V4L2 stateless on hantro).
- **Capture window:** 70 s, starting when autoplay is detected.
- **Instrumentation:** `top -p <kwin_wayland PID>` (1 Hz), `top` (system, 1 Hz), `sudo perf record -F 99 -g --call-graph dwarf -p <kwin_wayland PID>`, browser stderr (catches the page's `DROPS_TRAJECTORY: t=Xs tot=Y drop=Z` 1 Hz log). **No `WAYLAND_DEBUG=1`** — this is the `nodebug` variant, so the kwin %CPU and drop measurements aren't perturbed by WAYLAND_DEBUG's per-message overhead.

## Bound metrics per rep

Each rep's evidence dir contains:

- `start.txt` / `end.txt` / `capture_start.txt` — wall-clock timestamps of phases.
- `temp_pre.txt` / `temp_post.txt` — thermal_zone0 (cpu) at phase boundaries.
- `top_kwin.txt` — `kwin_wayland` %CPU samples (70 × 1 Hz).
- `top_full.txt` — system-wide top (70 × 1 Hz).
- `perf.data` — perf record at 99 Hz on kwin_wayland.
- `perf_report_self.txt` — perf report (sorted by overhead).
- `perf_report_top50.txt` — first 50 lines of perf report.
- `stderr.log` — full chromium stderr.
- `drops_trajectory.txt` — extracted DROPS_TRAJECTORY lines.
- `kwin_cpu_summary.txt` — kwin %CPU samples / median / mean / min / max.
- `drops_summary.txt` — `frames_total`, `drops_total`, `drops_post_warmup` (drops accumulated after t=10 s).

## Protocol

Three reps **back-to-back** with ≥ 30 s idle between to let thermals settle. The whole campaign sequence takes just under 6 minutes of wall time:

```
T+0:00  rep 1: launch + 70s capture + cleanup (~95s)
T+1:35  30s idle (thermal settle)
T+2:05  rep 2: same (~95s)
T+3:40  30s idle
T+4:10  rep 3: same (~95s)
T+5:45  done — pull evidence
```

**SSH-driven:** the orchestrator `/home/mfritsche/phase3_prime_runs/run_browser_nodebug.sh $RUN_ID chromium-fourier-kwin` runs end-to-end from a single SSH command. Operator-side, **a chrome window will appear on the screen for ~80 s per rep**; the only operator action is **not interacting with that window** (no clicks, no typing in the chrome window, no pulling focus). The orchestrator kills the chrome process cleanly at end of capture.

After the 3 reps complete, this campaign's evidence sub-directory `phase0_evidence/wayland_baseline_2026-05-03/` will contain:

```
a1_rep1/        (moved from /home/mfritsche/phase3_prime_runs/x11research_a1_rep1/)
a1_rep2/
a1_rep3/
a1_summary.md   (this campaign's interpretation of the 3 reps)
```

The original predecessor evidence at `/home/mfritsche/phase3_prime_runs/kwin_timing_nodebug_rep[1-3]` is **untouched**.

## Exit conditions

- Per-rep success = `drops_summary.txt` exists with non-`n/a` values, `kwin_cpu_summary.txt` exists with samples > 0, perf report has > 1000 samples.
- Per-rep failure causes:
  - autoplay not detected within 30 s → script aborts, evidence dir is partial; rep marked failed.
  - workload exits before autoplay → script aborts.
  - perf record fails (e.g. `kernel.perf_event_paranoid` > 1) → script continues but perf.data is empty; we'd see this in `perf_record_stderr.txt`.
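The per-rep success criteria above can be sketched as a small check. This is a minimal illustration, not the orchestrator's actual logic: the helper names `parse_summary` and `rep_failures` are hypothetical, and the `key: value` / `key=value` layout assumed for `drops_summary.txt` is a guess, since the on-disk format is not specified here. The kwin and perf sample counts are passed in as plain integers for the same reason.

```python
import re

# Fields that drops_summary.txt is expected to carry (from "Bound metrics per rep").
REQUIRED_DROP_KEYS = ("frames_total", "drops_total", "drops_post_warmup")


def parse_summary(text):
    """Parse 'key: value' or 'key=value' lines into a dict of strings.

    ASSUMPTION: the summary files use one of these two layouts.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z_]+)\s*[:=]\s*(\S+)", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def rep_failures(drops, kwin_samples, perf_samples):
    """Return the reasons a rep misses the exit conditions (empty list = success)."""
    reasons = []
    for key in REQUIRED_DROP_KEYS:
        # A missing key is treated the same as an explicit n/a value.
        if drops.get(key, "n/a") == "n/a":
            reasons.append(f"drops_summary.txt: {key} missing or n/a")
    if kwin_samples <= 0:
        reasons.append("kwin_cpu_summary.txt: no samples")
    if perf_samples <= 1000:
        reasons.append("perf report: need more than 1000 samples")
    return reasons
```

Returning a list of reasons rather than a bare boolean makes it straightforward to surface the cause of a failed rep.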
If a rep fails, surface the cause and re-run that rep before moving on.

## Decision after 3 reps

Compute median + IQR of `drops_post_warmup`, `frames_total`, `drops_total`, and kwin %CPU across the three reps. Two possible verdict shapes:

- **Tight cluster (IQR / median ≤ 0.3):** baseline is stable; Phase 1 binding cells can use the median as the anchor with the IQR as the tolerance band.
- **High variance (IQR / median > 0.3):** baseline is noisy; Phase 1 needs ≥ 5 reps per cell, not 3, and binding-cell thresholds need IQR-based formulation rather than fixed numbers.

This is the predecessor lesson built into the worklist's "3 reps minimum (variance is a real concern)".

## Operator green-light request

Before I fire the 3 reps:

1. Confirm you're OK with **a chrome window popping up on the screen for ~80 s per rep × 3 reps**, and during that time **not interacting with it** (mouse stays still, no key presses).
2. Confirm the current Plasma Wayland session is in a "clean measurement state", i.e. nothing else is doing significant CPU work (ideally close any active terminals/browsers/IDE windows you don't need; the predecessor's 36–37 % kwin %CPU baseline assumes a quiescent desktop with just the test chrome window plus normal Plasma services running).
3. (Optional) Decide whether to also include other variants in this turn's measurements, e.g. add a rep of `Brave 147` or `Firefox 150` under Plasma Wayland to start populating the full matrix. Default scope: just the 3 chromium-fourier reps; matrix-fill cells go into Phase 1 proper.

When green-lit, I fire `run_browser_nodebug.sh x11research_a1_rep1 chromium-fourier-kwin` first as a smoke test, surface its drops_summary.txt + kwin_cpu_summary.txt output, then on confirmation fire reps 2 and 3 back-to-back.
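
As an illustration of the rule in "Decision after 3 reps", the sketch below classifies one metric from its per-rep values. Several choices here are assumptions rather than part of the protocol: the function name `verdict` is hypothetical, the quartiles use Python's `statistics.quantiles` with `method="inclusive"` (with only three values the quartile method changes the IQR noticeably, so the method should be fixed before comparing against 0.3), and a zero median with any spread is treated as high variance because the ratio is undefined there.

```python
from statistics import median, quantiles

THRESHOLD = 0.3  # IQR / median cut-off from "Decision after 3 reps"


def verdict(values, threshold=THRESHOLD):
    """Return (median, iqr, label) for one metric across reps.

    ASSUMPTION: quartiles use the 'inclusive' method; the protocol does not
    say which quartile definition applies.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    iqr = q3 - q1
    if med == 0:
        # ASSUMPTION: the ratio is undefined at median 0, so any spread
        # at all is classified as high variance.
        label = "tight" if iqr == 0 else "high-variance"
    else:
        label = "tight" if iqr / med <= threshold else "high-variance"
    return med, iqr, label
```

Calling `verdict` once per metric (`drops_post_warmup`, `frames_total`, `drops_total`, kwin %CPU) yields the median to use as the anchor and the IQR to use as the tolerance band when the label is `tight`.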