Phase 0: A1 Wayland baseline + state snapshot — major reframing

3 in-session reps of chromium-fourier 149 / brave_drops_test.html / Plasma Wayland 6.6.4 (kwin-fourier 6.6.4-3 + qt6-base-fourier 6.11.0-3 carry-overs intact).

Tight cluster IQR=0: drops_total=0, drops_post_warmup=0, frames_total=1685, kwin %CPU median=0.00, mean=0.04. Perf samples on kwin (~30 over 70s) show zero composite/dmabuf/GL symbols, only event-loop bookkeeping. Most likely mechanism: KWin direct-scanout fast-path engaged for the single-visible-client video case.

The campaign's load-bearing hypothesis ("X11 + non-compositing WM avoids per-frame GL composite of NV12") is structurally weakened: KWin already avoids that work under Wayland for this workload. Phase 1 needs to add a multi-window A1' variant and drm_info-during-playback to confirm direct-scanout, then revisit matrix cell design.

revert.log entry 6: SDDM autologin + state.conf swap that landed the Plasma Wayland session for the A1 reps. Backup of original state.conf preserved at /var/lib/sddm/state.conf.x11-research-bak; single-command revert documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:11:15 +00:00
parent d2e11be430
commit 9a023e9264
45 changed files with 3199 additions and 17 deletions
@@ -0,0 +1,147 @@
+# A1 baseline protocol — in-session Plasma Wayland anchor
+
+**Goal:** acquire 3 in-session reps of a chromium-fourier video
+playback measurement under Plasma Wayland with KWin, so the X11
+cells of the matrix have a same-session Wayland reference to
+compare against. Per the campaign-contained-data
+discipline, **this is the only Wayland baseline this campaign
+uses**; predecessor numbers are reference history only.
+
+## Cell
+
+- **Browser:** `/tmp/chromium-ohm-gl-fix-step2/chrome`
+  (chromium-fourier 149.0.7812.0, the existing predecessor
+  build).
+- **Page:** `file:///home/mfritsche/fourier-test/brave_drops_test.html`
+  (a 30 fps H.264 video element with autoplay, plus a drops
+  trajectory emitted to the console at 1 Hz; used by the
+  predecessor for all Phase 3 reps).
+- **Session:** Plasma Wayland tty1 / session 433
+  (the live one, autologin'd via revert.log entry 6).
+- **Window:** windowed (default chromium behavior, no fullscreen).
+- **Decode:** chromium-fourier's default decode path. With the
+  Step 1 + Step 2 patches present, this is libva via the
+  `libva-v4l2-request-fourier` driver (V4L2 stateless on hantro).
+- **Capture window:** 70 s starting at autoplay-detected.
+- **Instrumentation:** `top -p <kwin_wayland PID>` (1 Hz),
+  `top` (system, 1 Hz), `sudo perf record -F 99 -g
+  --call-graph dwarf -p <kwin_wayland PID>`, browser stderr (catches
+  the page's `DROPS_TRAJECTORY: t=Xs tot=Y drop=Z` 1 Hz log).
+  **No `WAYLAND_DEBUG=1`** — this is the `nodebug` variant so
+  the kwin %CPU and drop measurements aren't perturbed by
+  WAYLAND_DEBUG's per-message overhead.
+
+## Bound metrics per rep
+
+Each rep's evidence dir contains:
+
+- `start.txt` / `end.txt` / `capture_start.txt` — wall-clock
+  timestamps of phases.
+- `temp_pre.txt` / `temp_post.txt` — thermal_zone0 (cpu) at
+  phase boundaries.
+- `top_kwin.txt` — `kwin_wayland` %CPU samples (70 × 1 Hz).
+- `top_full.txt` — system-wide top (70 × 1 Hz).
+- `perf.data` — perf record at 99 Hz on kwin_wayland.
+- `perf_report_self.txt` — perf report (sorted by overhead).
+- `perf_report_top50.txt` — first 50 lines of perf report.
+- `stderr.log` — full chromium stderr.
+- `drops_trajectory.txt` — extracted DROPS_TRAJECTORY lines.
+- `kwin_cpu_summary.txt` — kwin %CPU samples / median / mean /
+  min / max.
+- `drops_summary.txt` — `frames_total`, `drops_total`,
+  `drops_post_warmup` (drops accumulated after t=10 s).
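
As a rough illustration of how the three `drops_summary.txt` fields could be derived from the `DROPS_TRAJECTORY` lines in `stderr.log`, here is a minimal Python sketch. It is not the orchestrator's actual code. It assumes that `tot` and `drop` are cumulative counters and that lines appear in time order; the function name `summarize_drops` is hypothetical.

```python
import re

# Line shape as quoted in the protocol: "DROPS_TRAJECTORY: t=Xs tot=Y drop=Z".
LINE_RE = re.compile(r"DROPS_TRAJECTORY: t=(\d+(?:\.\d+)?)s tot=(\d+) drop=(\d+)")
WARMUP_S = 10.0  # warm-up boundary named in the protocol (t=10 s)


def summarize_drops(stderr_text, warmup_s=WARMUP_S):
    """Derive frames_total, drops_total and drops_post_warmup from stderr text."""
    samples = [
        (float(t), int(tot), int(drop))
        for t, tot, drop in LINE_RE.findall(stderr_text)
    ]
    if not samples:
        # No trajectory lines at all: report n/a so the rep is flagged as failed.
        return {"frames_total": "n/a", "drops_total": "n/a", "drops_post_warmup": "n/a"}
    # Assumption: counters are cumulative, so the last sample carries the totals.
    _, frames_total, drops_total = samples[-1]
    # Drops already accumulated at the end of warm-up (0 if no sample that early).
    warmup = [drop for t, _, drop in samples if t <= warmup_s]
    baseline = warmup[-1] if warmup else 0
    return {
        "frames_total": frames_total,
        "drops_total": drops_total,
        "drops_post_warmup": drops_total - baseline,
    }
```

If the page's counters turn out to be per-interval rather than cumulative, the totals would need to be summed instead of read from the last sample.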
+
+## Protocol
+
+Three reps **back-to-back** with ≥ 30 s idle between them to let
+thermals settle. The whole baseline sequence takes just under
+6 minutes of wall time:
+
+```
+T+0:00   rep 1: launch + 70s capture + cleanup       (~95s)
+T+1:35   30s idle (thermal settle)
+T+2:05   rep 2: same                                  (~95s)
+T+3:40   30s idle
+T+4:10   rep 3: same                                  (~95s)
+T+5:45   done — pull evidence
+```
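
The offsets in the schedule above follow from the stated ~95 s per rep and 30 s idle gaps. A small Python sketch of that arithmetic, with hypothetical helper names `timeline` and `mmss`, can be used to re-derive the schedule if either duration changes:

```python
REP_S = 95    # launch + 70 s capture + cleanup, per the schedule
IDLE_S = 30   # thermal-settle gap between reps
REPS = 3


def mmss(seconds):
    """Format a second count as M:SS."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def timeline(reps=REPS, rep_s=REP_S, idle_s=IDLE_S):
    """Return (offset, label) pairs for each rep, each idle gap, and the end."""
    events, t = [], 0
    for i in range(1, reps + 1):
        events.append((mmss(t), f"rep {i}"))
        t += rep_s
        if i < reps:  # no idle gap after the final rep
            events.append((mmss(t), "idle"))
            t += idle_s
    events.append((mmss(t), "done"))
    return events
```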
+
+**SSH-driven:** the orchestrator
+`/home/mfritsche/phase3_prime_runs/run_browser_nodebug.sh
+$RUN_ID chromium-fourier-kwin` runs end-to-end from a single
+SSH command. Operator-side, **a chrome window will appear on
+the screen for ~80 s per rep**; the only operator action is
+**not interacting with that window** (no clicks, no typing in
+the chrome window, no pulling focus). The orchestrator kills
+the chrome process cleanly at end of capture.
+
+After the 3 reps complete, this campaign's evidence
+sub-directory `phase0_evidence/wayland_baseline_2026-05-03/`
+will contain:
+
+```
+a1_rep1/   (moved from /home/mfritsche/phase3_prime_runs/x11research_a1_rep1/)
+a1_rep2/
+a1_rep3/
+a1_summary.md   (this campaign's interpretation of the 3 reps)
+```
+
+The original predecessor evidence at
+`/home/mfritsche/phase3_prime_runs/kwin_timing_nodebug_rep[1-3]`
+is **untouched**.
+
+## Exit conditions
+
+- Per-rep success = `drops_summary.txt` exists with non-`n/a`
+  values, `kwin_cpu_summary.txt` exists with samples > 0, perf
+  report has > 1000 samples.
+- Per-rep failure causes:
+  - autoplay not detected within 30 s → script aborts, evidence
+    dir is partial; rep marked failed.
+  - workload exits before autoplay → script aborts.
+  - perf record fails (e.g. `kernel.perf_event_paranoid` > 1) →
+    script continues but perf.data is empty; we'd see this in
+    `perf_record_stderr.txt`.
+
+If a rep fails, surface the cause and re-run that rep before
+moving on.
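
The per-rep success criteria above can be expressed as a small check. The Python sketch below is illustrative only: it takes values that have already been parsed (the on-disk format of the summary files is not specified here), and the function name `rep_failures` is hypothetical.

```python
PERF_MIN_SAMPLES = 1000  # protocol: perf report has > 1000 samples


def rep_failures(drops_summary, kwin_cpu_samples, perf_samples):
    """Return the reasons a rep misses the exit conditions (empty list = success).

    drops_summary: dict of drops_summary.txt fields, or None if the file is missing.
    kwin_cpu_samples: number of samples in kwin_cpu_summary.txt.
    perf_samples: number of samples in the perf report.
    """
    reasons = []
    if drops_summary is None:
        reasons.append("drops_summary.txt missing")
    elif any(value == "n/a" for value in drops_summary.values()):
        reasons.append("drops_summary.txt contains n/a values")
    if kwin_cpu_samples <= 0:
        reasons.append("kwin_cpu_summary.txt has no samples")
    if perf_samples <= PERF_MIN_SAMPLES:
        reasons.append("perf report has too few samples")
    return reasons
```

Returning the full list of reasons, rather than stopping at the first, makes it easier to surface the cause before re-running the rep.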
+
+## Decision after 3 reps
+
+Compute median + IQR of `drops_post_warmup`, `frames_total`,
+`drops_total`, and kwin %CPU across the three reps. Two
+possible verdict shapes:
+
+- **Tight cluster (IQR / median ≤ 0.3):** baseline is stable;
+  Phase 1 binding cells can use the median as the anchor with
+  the IQR as the tolerance band.
+- **High variance (IQR / median > 0.3):** baseline is noisy;
+  Phase 1 needs ≥ 5 reps per cell, not 3, and binding-cell
+  thresholds need IQR-based formulation rather than fixed
+  numbers. This is the predecessor lesson built into the
+  worklist's "3 reps minimum (variance is a real concern)".
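
A minimal Python sketch of the decision rule above, under two assumptions the protocol does not state: quartiles are computed with the inclusive method of `statistics.quantiles`, and a metric with median 0 (where IQR / median is undefined) counts as tight only when its IQR is also 0. The function names are hypothetical.

```python
import statistics

RATIO_LIMIT = 0.3  # IQR / median threshold from the protocol


def median_iqr(values):
    """Return (median, IQR) for one metric across the reps."""
    # Assumption: inclusive quartiles; the protocol does not name a method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1


def verdict(values, limit=RATIO_LIMIT):
    """Classify one metric as 'tight' or 'high-variance'."""
    median, iqr = median_iqr(values)
    if median == 0:
        # Assumption: the ratio is undefined here, so fall back to IQR == 0.
        return "tight" if iqr == 0 else "high-variance"
    return "tight" if iqr / abs(median) <= limit else "high-variance"
```

With only three reps, the inclusive IQR reduces to half the range (max minus min, divided by two), so a single outlier rep moves the verdict substantially.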
+
+## Operator green-light request
+
+Before I fire the 3 reps:
+
+1. Confirm you're OK with **a chrome window popping up on the
+   screen for ~80 s per rep × 3 reps**, and during that time
+   **not interacting with it** (mouse stays still, no key
+   presses).
+2. Confirm the current Plasma Wayland session is in a "clean
+   measurement state" — i.e. nothing else is doing significant
+   CPU work (ideally close any active terminals/browsers/IDE
+   windows you don't need; the predecessor's 36-37 % kwin %CPU
+   baseline assumes a quiescent desktop with just the test
+   chrome window plus normal Plasma services running).
+3. (Optional) Decide whether to also include other variants in
+   this turn's measurements — e.g. add a rep of `Brave 147` or
+   `Firefox 150` under Plasma Wayland to start populating the
+   full matrix. Default scope: just the 3 chromium-fourier
+   reps; matrix-fill cells go into Phase 1 proper.
+
+When green-lit, I fire `run_browser_nodebug.sh
+x11research_a1_rep1 chromium-fourier-kwin` first as a smoke
+test, surface its drops_summary.txt + kwin_cpu_summary.txt
+output, then on confirmation fire reps 2 and 3 back-to-back.