9a023e9264
3 in-session reps of chromium-fourier 149 / brave_drops_test.html /
Plasma Wayland 6.6.4 (kwin-fourier 6.6.4-3 + qt6-base-fourier
6.11.0-3 carry-overs intact). Tight cluster IQR=0:
drops_total=0, drops_post_warmup=0, frames_total=1685, kwin %CPU
median=0.00, mean=0.04. Perf samples on kwin (~30 over 70s) show
zero composite/dmabuf/GL symbols — only event-loop bookkeeping.
Most likely mechanism: KWin direct-scanout fast-path engaged for
the single-visible-client video case. The campaign's load-bearing
hypothesis ("X11 + non-compositing WM avoids per-frame GL composite
of NV12") is structurally weakened — KWin already avoids that work
under Wayland for this workload. Phase 1 needs to add a
multi-window A1' variant and drm_info-during-playback to confirm
direct-scanout, then revisit matrix cell design.
revert.log entry 6: SDDM autologin + state.conf swap that landed
the Plasma Wayland session for the A1 reps. Backup of original
state.conf preserved at /var/lib/sddm/state.conf.x11-research-bak;
single-command revert documented.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
148 lines
5.9 KiB
Markdown
# A1 baseline protocol: in-session Plasma Wayland anchor

**Goal:** acquire 3 in-session reps of a chromium-fourier
under-Plasma-Wayland-with-KWin video playback measurement, so
the X11 cells of the matrix have a same-session Wayland
reference to compare against. Per the campaign-contained-data
discipline, **this is the only Wayland baseline this campaign
uses**; predecessor numbers are reference history only.

## Cell

- **Browser:** `/tmp/chromium-ohm-gl-fix-step2/chrome`
  (chromium-fourier 149.0.7812.0, the existing predecessor
  build).
- **Page:** `file:///home/mfritsche/fourier-test/brave_drops_test.html`
  (a 30 fps H.264 video element with autoplay; the page emits a
  drops trajectory to the console at 1 Hz and was used by the
  predecessor for all Phase 3 reps).
- **Session:** Plasma Wayland tty1 / session 433
  (the live one, autologin'd via revert.log entry 6).
- **Window:** windowed (default chromium behavior, no fullscreen).
- **Decode:** chromium-fourier's default decode path. With the
  Step 1 + Step 2 patches present, this is libva via the
  `libva-v4l2-request-fourier` driver (V4L2 stateless on hantro).
- **Capture window:** 70 s, starting when autoplay is detected.
- **Instrumentation:** `top -p <kwin_wayland PID>` (1 Hz),
  `top` (system, 1 Hz), `sudo perf record -F 99 -g
  --call-graph dwarf -p <kwin_wayland PID>`, browser stderr (catches
  the page's `DROPS_TRAJECTORY: t=Xs tot=Y drop=Z` 1 Hz log).
  **No `WAYLAND_DEBUG=1`**: this is the `nodebug` variant, so
  the kwin %CPU and drop measurements aren't perturbed by
  WAYLAND_DEBUG's per-message overhead.

## Bound metrics per rep

Each rep's evidence dir contains:

- `start.txt` / `end.txt` / `capture_start.txt`: wall-clock
  timestamps of phases.
- `temp_pre.txt` / `temp_post.txt`: thermal_zone0 (cpu) at
  phase boundaries.
- `top_kwin.txt`: `kwin_wayland` %CPU samples (70 × 1 Hz).
- `top_full.txt`: system-wide top (70 × 1 Hz).
- `perf.data`: perf record at 99 Hz on kwin_wayland.
- `perf_report_self.txt`: perf report (sorted by overhead).
- `perf_report_top50.txt`: first 50 lines of perf report.
- `stderr.log`: full chromium stderr.
- `drops_trajectory.txt`: extracted DROPS_TRAJECTORY lines.
- `kwin_cpu_summary.txt`: kwin %CPU samples / median / mean /
  min / max.
- `drops_summary.txt`: `frames_total`, `drops_total`,
  `drops_post_warmup` (drops accumulated after t=10 s).

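
The two summary files could be derived from the raw captures roughly as follows. This is a minimal sketch, not the orchestrator's actual logic: it assumes the `tot` and `drop` counters in the `DROPS_TRAJECTORY` lines are cumulative, and the function names `summarize_drops` and `summarize_cpu` are hypothetical.

```python
import re
from statistics import mean, median

# Line shape taken from the log format quoted in this document:
#   DROPS_TRAJECTORY: t=12s tot=360 drop=0
LINE_RE = re.compile(r"DROPS_TRAJECTORY: t=(\d+(?:\.\d+)?)s tot=(\d+) drop=(\d+)")

WARMUP_S = 10.0  # drops up to t=10 s do not count toward drops_post_warmup


def summarize_drops(stderr_text):
    """Derive frames_total, drops_total and drops_post_warmup from stderr text."""
    samples = sorted(
        (float(t), int(tot), int(drop))
        for t, tot, drop in LINE_RE.findall(stderr_text)
    )
    if not samples:
        # No trajectory lines at all: report n/a so the rep can be marked failed.
        return {"frames_total": "n/a", "drops_total": "n/a", "drops_post_warmup": "n/a"}
    _, frames_total, drops_total = samples[-1]
    # Assumption: counters are cumulative, so the warmup share is the largest
    # drop count seen at or before the warmup boundary.
    warmup_drops = max((d for t, _, d in samples if t <= WARMUP_S), default=0)
    return {
        "frames_total": frames_total,
        "drops_total": drops_total,
        "drops_post_warmup": drops_total - warmup_drops,
    }


def summarize_cpu(samples):
    """Summarize kwin %CPU samples as samples / median / mean / min / max."""
    if not samples:
        return {"samples": 0, "median": "n/a", "mean": "n/a", "min": "n/a", "max": "n/a"}
    return {
        "samples": len(samples),
        "median": median(samples),
        "mean": mean(samples),
        "min": min(samples),
        "max": max(samples),
    }
```

Extracting the %CPU column from `top_kwin.txt` is left out on purpose, since the column layout depends on the local `top` configuration.
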
## Protocol

Three reps **back-to-back** with ≥ 30 s idle between to let
thermals settle. The whole campaign sequence takes ~6 minutes
of wall time:

```
T+0:00  rep 1: launch + 70s capture + cleanup (~95s)
T+1:35  30s idle (thermal settle)
T+2:05  rep 2: same (~95s)
T+3:40  30s idle
T+4:10  rep 3: same (~95s)
T+5:45  done; pull evidence
```

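
The offsets in the timeline follow from 3 × 95 s of reps plus 2 × 30 s of idle, i.e. 345 s. A small sketch that reproduces them, treating the approximate 95 s rep duration as exact; the helper name `rep_schedule` is hypothetical:

```python
def rep_schedule(reps=3, rep_s=95, idle_s=30):
    """Return (offset, label) rows for back-to-back reps with idle gaps between."""

    def fmt(seconds):
        minutes, secs = divmod(seconds, 60)
        return f"T+{minutes}:{secs:02d}"

    rows = []
    t = 0
    for i in range(1, reps + 1):
        rows.append((fmt(t), f"rep {i}"))
        t += rep_s
        if i < reps:
            # Idle gap only between reps, not after the last one.
            rows.append((fmt(t), "idle"))
            t += idle_s
    rows.append((fmt(t), "done"))
    return rows


if __name__ == "__main__":
    for offset, label in rep_schedule():
        print(offset, label)
```

With the defaults, the last row is `("T+5:45", "done")`. Moving to 5 reps per cell with the same durations would end at T+9:55.
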
**SSH-driven:** the orchestrator
`/home/mfritsche/phase3_prime_runs/run_browser_nodebug.sh
$RUN_ID chromium-fourier-kwin` runs end-to-end from a single
SSH command. Operator-side, **a chrome window will appear on
the screen for ~80 s per rep**; the only operator action is
**not interacting with that window** (no clicks, no typing in
the chrome window, no pulling focus). The orchestrator kills
the chrome process cleanly at end of capture.

After the 3 reps complete, this campaign's evidence
sub-directory `phase0_evidence/wayland_baseline_2026-05-03/`
will contain:

```
a1_rep1/        (moved from /home/mfritsche/phase3_prime_runs/x11research_a1_rep1/)
a1_rep2/
a1_rep3/
a1_summary.md   (this campaign's interpretation of the 3 reps)
```

The original predecessor evidence at
`/home/mfritsche/phase3_prime_runs/kwin_timing_nodebug_rep[1-3]`
is **untouched**.

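
The move into the evidence sub-directory could look like the sketch below. It assumes reps 2 and 3 land in `x11research_a1_rep2` and `x11research_a1_rep3` next to rep 1 (only the rep 1 source path is stated above), and the helper name `collect_reps` is hypothetical. It only ever touches directories with the `x11research_a1_rep` prefix, so the predecessor `kwin_timing_nodebug_rep*` directories are left alone.

```python
import shutil
from pathlib import Path


def collect_reps(runs_root, dest, reps=3):
    """Move runs_root/x11research_a1_repN to dest/a1_repN for N = 1..reps."""
    runs_root, dest = Path(runs_root), Path(dest)
    pairs = [
        (runs_root / f"x11research_a1_rep{n}", dest / f"a1_rep{n}")
        for n in range(1, reps + 1)
    ]
    # Validate everything first so a missing rep never leaves a half-moved set.
    for src, dst in pairs:
        if not src.is_dir():
            raise FileNotFoundError(src)
        if dst.exists():
            raise FileExistsError(dst)
    dest.mkdir(parents=True, exist_ok=True)
    for src, dst in pairs:
        shutil.move(str(src), str(dst))
    return [dst for _, dst in pairs]
```
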
## Exit conditions

- Per-rep success = `drops_summary.txt` exists with non-`n/a`
  values, `kwin_cpu_summary.txt` exists with samples > 0, perf
  report has > 1000 samples.
- Per-rep failure causes:
  - autoplay not detected within 30 s → script aborts, evidence
    dir is partial; rep marked failed.
  - workload exits before autoplay → script aborts.
  - perf record fails (e.g. `kernel.perf_event_paranoid` > 1) →
    script continues but perf.data is empty; we'd see this in
    `perf_record_stderr.txt`.

If a rep fails, surface the cause and re-run that rep before
moving on.

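
The per-rep success rule can be written as a single check. This is a sketch under assumptions: the on-disk format of the summary files is not specified here, so the function takes already-parsed values, and the name `rep_failures` is hypothetical.

```python
from typing import Optional

REQUIRED_KEYS = ("frames_total", "drops_total", "drops_post_warmup")
MIN_PERF_SAMPLES = 1000  # success needs strictly more than this


def rep_failures(drops_summary: Optional[dict], kwin_samples: int, perf_samples: int) -> list:
    """Return the reasons a rep fails the exit conditions; empty list means success."""
    reasons = []
    if drops_summary is None:
        reasons.append("drops_summary.txt missing")
    else:
        for key in REQUIRED_KEYS:
            if drops_summary.get(key, "n/a") == "n/a":
                reasons.append(f"drops_summary.txt: {key} is n/a")
    if kwin_samples <= 0:
        reasons.append("kwin_cpu_summary.txt has no samples")
    if perf_samples <= MIN_PERF_SAMPLES:
        reasons.append(f"perf report has {perf_samples} samples (need > {MIN_PERF_SAMPLES})")
    return reasons
```

One property of the perf threshold is worth keeping in mind when reading the result: `perf record -F 99` only samples while the target is on-CPU, so a 70 s capture tops out near 6930 samples per fully busy thread, and a mostly idle `kwin_wayland` can fall under 1000 samples without perf itself having failed.
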
## Decision after 3 reps

Compute median + IQR of `drops_post_warmup`, `frames_total`,
`drops_total`, and kwin %CPU across the three reps. Two
possible verdict shapes:

- **Tight cluster (IQR / median ≤ 0.3):** baseline is stable;
  Phase 1 binding cells can use the median as the anchor with
  the IQR as the tolerance band.
- **High variance (IQR / median > 0.3):** baseline is noisy;
  Phase 1 needs ≥ 5 reps per cell, not 3, and binding-cell
  thresholds need IQR-based formulation rather than fixed
  numbers. This is the predecessor lesson built into the
  worklist's "3 reps minimum (variance is a real concern)".

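
A sketch of the verdict computation for one metric. Two choices here are assumptions rather than part of the protocol: quartiles use Python's "inclusive" interpolation (the protocol does not name a quartile convention), and a zero median, where IQR / median is undefined, counts as tight only when the IQR is also zero. The function name `verdict` is hypothetical.

```python
from statistics import median, quantiles

TIGHT_RATIO = 0.3  # IQR / median threshold from the protocol


def verdict(values):
    """Classify one metric across reps as 'tight' or 'high-variance'."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(values)
    if med == 0:
        # Assumption: ratio is undefined at median 0, so fall back to IQR == 0.
        tight = iqr == 0
    else:
        tight = iqr / abs(med) <= TIGHT_RATIO
    return {
        "median": med,
        "iqr": iqr,
        "verdict": "tight" if tight else "high-variance",
    }


if __name__ == "__main__":
    # Example values only, not measured data.
    print(verdict([10, 12, 20]))
```

The zero-median branch matters for drop counts, where a clean run reports 0 for every rep and the ratio form of the rule cannot be evaluated.
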
## Operator green-light request

Before I fire the 3 reps:

1. Confirm you're OK with **a chrome window popping up on the
   screen for ~80 s per rep × 3 reps**, and during that time
   **not interacting with it** (mouse stays still, no key
   presses).
2. Confirm the current Plasma Wayland session is in a "clean
   measurement state", i.e. nothing else is doing significant
   CPU work (ideally close any active terminals/browsers/IDE
   windows you don't need; the predecessor's 36-37 % kwin %CPU
   baseline assumes a quiescent desktop with just the test
   chrome window plus normal Plasma services running).
3. (Optional) Decide whether to also include other variants in
   this turn's measurements, e.g. add a rep of `Brave 147` or
   `Firefox 150` under Plasma Wayland to start populating the
   full matrix. Default scope: just the 3 chromium-fourier
   reps; matrix-fill cells go into Phase 1 proper.

When green-lit, I fire `run_browser_nodebug.sh
x11research_a1_rep1 chromium-fourier-kwin` first as a smoke
test, surface its drops_summary.txt + kwin_cpu_summary.txt
output, then on confirmation fire reps 2 and 3 back-to-back.