x11-session-research/phase0_evidence/wayland_baseline_2026-05-03/a1_protocol.md
marfrit 9a023e9264 Phase 0: A1 Wayland baseline + state snapshot — major reframing
3 in-session reps of chromium-fourier 149 / brave_drops_test.html /
Plasma Wayland 6.6.4 (kwin-fourier 6.6.4-3 + qt6-base-fourier
6.11.0-3 carry-overs intact). Tight cluster IQR=0:
drops_total=0, drops_post_warmup=0, frames_total=1685, kwin %CPU
median=0.00, mean=0.04. Perf samples on kwin (~30 over 70s) show
zero composite/dmabuf/GL symbols — only event-loop bookkeeping.

Most likely mechanism: KWin direct-scanout fast-path engaged for
the single-visible-client video case. The campaign's load-bearing
hypothesis ("X11 + non-compositing WM avoids per-frame GL composite
of NV12") is structurally weakened — KWin already avoids that work
under Wayland for this workload. Phase 1 needs to add a
multi-window A1' variant and drm_info-during-playback to confirm
direct-scanout, then revisit matrix cell design.

revert.log entry 6: SDDM autologin + state.conf swap that landed
the Plasma Wayland session for the A1 reps. Backup of original
state.conf preserved at /var/lib/sddm/state.conf.x11-research-bak;
single-command revert documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:11:15 +00:00

# A1 baseline protocol — in-session Plasma Wayland anchor
**Goal:** acquire 3 in-session reps of a chromium-fourier
video-playback measurement under Plasma Wayland with KWin, so
the X11 cells of the matrix have a same-session Wayland
reference to compare against. Per the campaign-contained-data
discipline, **this is the only Wayland baseline this campaign
uses**; predecessor numbers are reference history only.
## Cell
- **Browser:** `/tmp/chromium-ohm-gl-fix-step2/chrome`
(chromium-fourier 149.0.7812.0, the existing predecessor
build).
- **Page:** `file:///home/mfritsche/fourier-test/brave_drops_test.html`
(a 30 fps H.264 `<video>` element with autoplay, plus a drops
trajectory emitted to the console at 1 Hz; used by the
predecessor for all Phase 3 reps).
- **Session:** Plasma Wayland tty1 / session 433
(the live one, autologin'd via revert.log entry 6).
- **Window:** windowed (default chromium behavior, no fullscreen).
- **Decode:** chromium-fourier's default decode path. With the
Step 1 + Step 2 patches present, this is libva via
`libva-v4l2-request-fourier` driver (V4L2 stateless on hantro).
- **Capture window:** 70 s starting at autoplay-detected.
- **Instrumentation:** `top -p "$(pidof kwin_wayland)"` (1 Hz),
`top` (system, 1 Hz), `sudo perf record -F 99 -g
--call-graph dwarf -p "$(pidof kwin_wayland)"`, browser stderr
(catches the page's `DROPS_TRAJECTORY: t=Xs tot=Y drop=Z` 1 Hz
log). Both `top -p` and `perf record -p` take a PID, not a
process name.
**No `WAYLAND_DEBUG=1`** — this is the `nodebug` variant so
the kwin %CPU and drop measurements aren't perturbed by
WAYLAND_DEBUG's per-message overhead.
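
The `DROPS_TRAJECTORY: t=Xs tot=Y drop=Z` line format named above can be parsed with a small regex. This is a minimal sketch, not the orchestrator's actual extraction step: it assumes `tot` and `drop` are non-negative integers, that `t` may be an integer or decimal number of seconds, and that the marker can appear anywhere inside a longer stderr line. The names `Sample` and `parse_trajectory` are illustrative.

```python
import re
from typing import NamedTuple


class Sample(NamedTuple):
    t: float   # seconds, as printed by the page (assumed numeric)
    tot: int   # frame counter reported by the page
    drop: int  # dropped-frame counter reported by the page


# Matches e.g. "DROPS_TRAJECTORY: t=12s tot=360 drop=0" anywhere in a line.
_LINE = re.compile(r"DROPS_TRAJECTORY: t=(\d+(?:\.\d+)?)s tot=(\d+) drop=(\d+)")


def parse_trajectory(stderr_text: str) -> list[Sample]:
    """Extract every DROPS_TRAJECTORY sample from browser stderr, in order."""
    return [
        Sample(float(t), int(tot), int(drop))
        for t, tot, drop in _LINE.findall(stderr_text)
    ]
```

Lines that do not contain the marker are ignored, so the full `stderr.log` can be passed in unfiltered.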
## Bound metrics per rep
Each rep's evidence dir contains:
- `start.txt` / `end.txt` / `capture_start.txt` — wall-clock
timestamps of phases.
- `temp_pre.txt` / `temp_post.txt` — thermal_zone0 (cpu) at
phase boundaries.
- `top_kwin.txt` — `kwin_wayland` %CPU samples (70 × 1 Hz).
- `top_full.txt` — system-wide top (70 × 1 Hz).
- `perf.data` — perf record at 99 Hz on kwin_wayland.
- `perf_report_self.txt` — perf report (sorted by overhead).
- `perf_report_top50.txt` — first 50 lines of perf report.
- `stderr.log` — full chromium stderr.
- `drops_trajectory.txt` — extracted DROPS_TRAJECTORY lines.
- `kwin_cpu_summary.txt` — kwin %CPU samples / median / mean /
min / max.
- `drops_summary.txt` — `frames_total`, `drops_total`,
`drops_post_warmup` (drops accumulated after t=10 s).
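
The two summary files can be derived from the raw samples along these lines. This is a sketch under stated assumptions rather than the script's real logic: `tot` and `drop` are treated as cumulative counters, the final sample supplies the totals, and `drops_post_warmup` is the final drop count minus the count at the last sample with t ≤ 10 s. The function names and dictionary keys beyond those listed above are hypothetical.

```python
from statistics import mean, median

NA = "n/a"  # placeholder value that the exit conditions check for


def drops_summary(samples, warmup_s=10):
    """Summarise (t_seconds, tot, drop) samples; counters assumed cumulative."""
    if not samples:
        return {"frames_total": NA, "drops_total": NA, "drops_post_warmup": NA}
    ordered = sorted(samples)
    _, frames_total, drops_total = ordered[-1]
    # Drops already on the counter at the end of warmup (0 if no early sample).
    baseline = 0
    for t, _, drop in ordered:
        if t <= warmup_s:
            baseline = drop
    return {
        "frames_total": frames_total,
        "drops_total": drops_total,
        "drops_post_warmup": drops_total - baseline,
    }


def cpu_summary(cpu_samples):
    """Summarise kwin_wayland %CPU samples (one float per 1 Hz reading)."""
    if not cpu_samples:
        return {"samples": 0}
    return {
        "samples": len(cpu_samples),
        "median": median(cpu_samples),
        "mean": mean(cpu_samples),
        "min": min(cpu_samples),
        "max": max(cpu_samples),
    }
```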
## Protocol
Three reps **back-to-back** with ≥ 30 s idle between to let
thermals settle. The whole three-rep sequence takes just under
6 minutes of wall time (3 × ~95 s reps plus 2 × 30 s idle):
```
T+0:00 rep 1: launch + 70s capture + cleanup (~95s)
T+1:35 30s idle (thermal settle)
T+2:05 rep 2: same (~95s)
T+3:40 30s idle
T+4:10 rep 3: same (~95s)
T+5:45 done — pull evidence
```
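
The offsets in the timeline follow directly from the rep and idle durations. A small sketch that recomputes them, assuming the ~95 s per rep and 30 s idle figures shown above; `schedule` and `fmt` are illustrative names.

```python
def schedule(reps=3, rep_s=95, idle_s=30):
    """Return (offset_seconds, label) pairs for the back-to-back sequence."""
    events, t = [], 0
    for i in range(1, reps + 1):
        events.append((t, f"rep {i}"))
        t += rep_s
        if i < reps:  # no idle period after the final rep
            events.append((t, "idle"))
            t += idle_s
    events.append((t, "done"))
    return events


def fmt(seconds):
    """Format an offset in seconds as T+M:SS."""
    return f"T+{seconds // 60}:{seconds % 60:02d}"


if __name__ == "__main__":
    for offset, label in schedule():
        print(fmt(offset), label)
```

With the default durations the last event lands at 345 s, which is the `T+5:45` shown in the timeline.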
**SSH-driven:** the orchestrator
`/home/mfritsche/phase3_prime_runs/run_browser_nodebug.sh
$RUN_ID chromium-fourier-kwin` runs end-to-end from a single
SSH command. Operator-side, **a chrome window will appear on
the screen for ~80 s per rep**; the only operator action is
**not interacting with that window** (no clicks, no typing in
the chrome window, no pulling focus). The orchestrator kills
the chrome process cleanly at end of capture.
After the 3 reps complete, this campaign's evidence
sub-directory `phase0_evidence/wayland_baseline_2026-05-03/`
will contain:
```
a1_rep1/ (moved from /home/mfritsche/phase3_prime_runs/x11research_a1_rep1/)
a1_rep2/
a1_rep3/
a1_summary.md (this campaign's interpretation of the 3 reps)
```
The original predecessor evidence at
`/home/mfritsche/phase3_prime_runs/kwin_timing_nodebug_rep[1-3]`
is **untouched**.
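
The move into the campaign's evidence directory could look like the sketch below. It assumes all three run directories follow the `x11research_a1_repN` pattern shown for rep 1 and that source and destination are reachable from the same host. The helper name `collect_evidence` is hypothetical. It checks every source and destination before moving anything, so a missing rep leaves all directories where they were.

```python
import shutil
from pathlib import Path


def collect_evidence(runs_root: Path, dest_root: Path, reps=(1, 2, 3)):
    """Move x11research_a1_repN/ under runs_root to a1_repN/ under dest_root."""
    pairs = [
        (runs_root / f"x11research_a1_rep{n}", dest_root / f"a1_rep{n}")
        for n in reps
    ]
    # Validate first so a failure never leaves a half-moved evidence set.
    for src, dst in pairs:
        if not src.is_dir():
            raise FileNotFoundError(f"missing rep directory: {src}")
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite: {dst}")
    dest_root.mkdir(parents=True, exist_ok=True)
    for src, dst in pairs:
        shutil.move(str(src), str(dst))
    return [dst for _, dst in pairs]
```

The predecessor directories (`kwin_timing_nodebug_rep[1-3]`) never match the source pattern, so they are not touched.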
## Exit conditions
- Per-rep success = `drops_summary.txt` exists with non-`n/a`
values, `kwin_cpu_summary.txt` exists with samples > 0, perf
report has > 1000 samples.
- Per-rep failure causes:
  - autoplay not detected within 30 s → script aborts, evidence
    dir is partial; rep marked failed.
  - workload exits before autoplay → script aborts.
  - perf record fails (e.g. `perf_event_paranoid` > 1) → script
    continues but perf.data is empty; we'd see this in
    `perf_record_stderr.txt`.

If a rep fails, surface the cause and re-run that rep before
moving on.
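
The per-rep success criteria can be expressed as one check that also reports why a rep failed. This is a sketch only: it takes already-parsed values rather than reading the files, because the on-disk formats of `drops_summary.txt` and `kwin_cpu_summary.txt` are not specified here. The function name `rep_verdict` and its parameters are hypothetical.

```python
REQUIRED_KEYS = ("frames_total", "drops_total", "drops_post_warmup")


def rep_verdict(drops, kwin_samples, perf_samples, min_perf_samples=1000):
    """Return (ok, reasons) for one rep.

    drops: parsed drops_summary.txt as a dict, or None if the file is missing.
    kwin_samples: sample count from kwin_cpu_summary.txt, or None if missing.
    perf_samples: sample count from the perf report, or None if unavailable.
    """
    reasons = []
    if drops is None:
        reasons.append("drops_summary.txt missing")
    else:
        for key in REQUIRED_KEYS:
            if drops.get(key, "n/a") == "n/a":
                reasons.append(f"{key} is n/a")
    if not kwin_samples or kwin_samples <= 0:
        reasons.append("kwin_cpu_summary.txt missing or has no samples")
    if perf_samples is None or perf_samples <= min_perf_samples:
        reasons.append(f"perf report has <= {min_perf_samples} samples")
    return (not reasons, reasons)
```

Returning the reasons alongside the boolean makes it straightforward to surface the cause before re-running the rep.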
## Decision after 3 reps
Compute median + IQR of `drops_post_warmup`, `frames_total`,
`drops_total`, and kwin %CPU across the three reps. Two
possible verdict shapes:
- **Tight cluster (IQR / median ≤ 0.3):** baseline is stable;
Phase 1 binding cells can use the median as the anchor with
the IQR as the tolerance band.
- **High variance (IQR / median > 0.3):** baseline is noisy;
Phase 1 needs ≥ 5 reps per cell, not 3, and binding-cell
thresholds need IQR-based formulation rather than fixed
numbers. This is the predecessor lesson built into the
worklist's "3 reps minimum (variance is a real concern)".
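
The verdict rule can be sketched as follows. Two choices here are assumptions, not part of the protocol: quartiles use the `inclusive` method of Python's `statistics.quantiles` (with only three reps, the choice of method changes the IQR), and the ratio is treated as undefined when the median is 0, in which case the cluster counts as tight only if the IQR is also 0. The function name `cluster_verdict` is illustrative.

```python
from statistics import median, quantiles


def cluster_verdict(values, threshold=0.3):
    """Classify one metric across reps as a tight or high-variance cluster."""
    med = median(values)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if med == 0:
        # IQR / median is undefined at median 0 (e.g. a metric that is 0 in
        # most reps); assumption: only an all-equal cluster counts as tight.
        ratio = None
        tight = iqr == 0
    else:
        ratio = iqr / abs(med)
        tight = ratio <= threshold
    return {"median": med, "iqr": iqr, "ratio": ratio, "tight": tight}
```

For example, `cluster_verdict([36.0, 36.5, 37.0])` has IQR 0.5 against a median of 36.5 and is classified as tight, while `cluster_verdict([1, 5, 10])` has IQR 4.5 against a median of 5 and is classified as high variance.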
## Operator green-light request
Before I fire the 3 reps:
1. Confirm you're OK with **a chrome window popping up on the
screen for ~80 s per rep × 3 reps**, and during that time
**not interacting with it** (mouse stays still, no key
presses).
2. Confirm the current Plasma Wayland session is in a "clean
measurement state" — i.e. nothing else is doing significant
CPU work (ideally close any active terminals/browsers/IDE
windows you don't need; the predecessor's 36-37 % kwin %CPU
baseline assumes a quiescent desktop with just the test
chrome window plus normal Plasma services running).
3. (Optional) Decide whether to also include other variants in
this turn's measurements — e.g. add a rep of `Brave 147` or
`Firefox 150` under Plasma Wayland to start populating the
full matrix. Default scope: just the 3 chromium-fourier
reps; matrix-fill cells go into Phase 1 proper.
When green-lit, I fire `run_browser_nodebug.sh
x11research_a1_rep1 chromium-fourier-kwin` first as a smoke
test, surface its drops_summary.txt + kwin_cpu_summary.txt
output, then on confirmation fire reps 2 and 3 back-to-back.