marfrit 9a023e9264 Phase 0: A1 Wayland baseline + state snapshot — major reframing
3 in-session reps of chromium-fourier 149 / brave_drops_test.html /
Plasma Wayland 6.6.4 (kwin-fourier 6.6.4-3 + qt6-base-fourier
6.11.0-3 carry-overs intact). Tight cluster IQR=0:
drops_total=0, drops_post_warmup=0, frames_total=1685, kwin %CPU
median=0.00, mean=0.04. Perf samples on kwin (~30 over 70s) show
zero composite/dmabuf/GL symbols — only event-loop bookkeeping.

Most likely mechanism: KWin direct-scanout fast-path engaged for
the single-visible-client video case. The campaign's load-bearing
hypothesis ("X11 + non-compositing WM avoids per-frame GL composite
of NV12") is structurally weakened — KWin already avoids that work
under Wayland for this workload. Phase 1 needs to add a
multi-window A1' variant and drm_info-during-playback to confirm
direct-scanout, then revisit matrix cell design.

revert.log entry 6: SDDM autologin + state.conf swap that landed
the Plasma Wayland session for the A1 reps. Backup of original
state.conf preserved at /var/lib/sddm/state.conf.x11-research-bak;
single-command revert documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:11:15 +00:00


A1 baseline protocol — in-session Plasma Wayland anchor

Goal: acquire 3 in-session reps of chromium-fourier video playback under Plasma Wayland with KWin, so that the X11 cells of the matrix have a same-session Wayland reference to compare against. Per the campaign-contained-data discipline, this is the only Wayland baseline this campaign uses; predecessor numbers are reference history only.

Cell

  • Browser: /tmp/chromium-ohm-gl-fix-step2/chrome (chromium-fourier 149.0.7812.0, the existing predecessor build).
  • Page: file:///home/mfritsche/fourier-test/brave_drops_test.html (a 30 fps H.264 video element with autoplay, plus a drops trajectory emitted to the console at 1 Hz; used by the predecessor for all Phase 3 reps).
  • Session: Plasma Wayland tty1 / session 433 (the live one, autologin'd via revert.log entry 6).
  • Window: windowed (default chromium behavior, no fullscreen).
  • Decode: chromium-fourier's default decode path. With the Step 1 + Step 2 patches present, this is libva via libva-v4l2-request-fourier driver (V4L2 stateless on hantro).
  • Capture window: 70 s starting at autoplay-detected.
  • Instrumentation: top -p <kwin_wayland PID> (1 Hz), top (system-wide, 1 Hz), sudo perf record -F 99 -g --call-graph dwarf -p <kwin_wayland PID>, and browser stderr (which catches the page's 1 Hz DROPS_TRAJECTORY: t=Xs tot=Y drop=Z log). Both top -p and perf record -p take a PID rather than a process name. WAYLAND_DEBUG=1 is not set: this is the nodebug variant, so the kwin %CPU and drop measurements are not perturbed by WAYLAND_DEBUG's per-message overhead.

Bound metrics per rep

Each rep's evidence dir contains:

  • start.txt / end.txt / capture_start.txt — wall-clock timestamps of phases.
  • temp_pre.txt / temp_post.txt — thermal_zone0 (cpu) at phase boundaries.
  • top_kwin.txt — kwin_wayland %CPU samples (70 × 1 Hz).
  • top_full.txt — system-wide top (70 × 1 Hz).
  • perf.data — perf record at 99 Hz on kwin_wayland.
  • perf_report_self.txt — perf report (sorted by overhead).
  • perf_report_top50.txt — first 50 lines of perf report.
  • stderr.log — full chromium stderr.
  • drops_trajectory.txt — extracted DROPS_TRAJECTORY lines.
  • kwin_cpu_summary.txt — kwin %CPU samples / median / mean / min / max.
  • drops_summary.txt — frames_total, drops_total, drops_post_warmup (drops accumulated after t=10 s).
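
As a rough illustration of how the three drops_summary.txt values could be derived from the DROPS_TRAJECTORY lines in stderr.log, here is a minimal Python sketch. It is not the orchestrator's actual extraction code. It assumes that tot and drop are cumulative counters, that lines arrive in time order, and that "post warmup" means the final drop count minus the count at the last sample with t ≤ 10 s. The function name summarize_drops is hypothetical.

```python
import re

# Matches the log shape quoted in the Instrumentation bullet:
#   DROPS_TRAJECTORY: t=Xs tot=Y drop=Z
LINE_RE = re.compile(r"DROPS_TRAJECTORY: t=(\d+(?:\.\d+)?)s tot=(\d+) drop=(\d+)")


def summarize_drops(lines, warmup_s=10.0):
    """Return frames_total, drops_total, drops_post_warmup, or None if no samples.

    Assumption: tot/drop are cumulative and samples are in time order.
    """
    samples = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            samples.append((float(m.group(1)), int(m.group(2)), int(m.group(3))))
    if not samples:
        return None

    _, frames_total, drops_total = samples[-1]
    # Drop count at the end of warmup: last sample taken at or before warmup_s.
    warm = [drop for t, _, drop in samples if t <= warmup_s]
    drops_at_warmup = warm[-1] if warm else 0
    return {
        "frames_total": frames_total,
        "drops_total": drops_total,
        "drops_post_warmup": drops_total - drops_at_warmup,
    }
```

Using re.search rather than re.match lets the pattern tolerate whatever logging prefix the browser puts in front of the console message.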

Protocol

Three reps back-to-back with ≥ 30 s idle between them to let thermals settle. The whole three-rep sequence takes just under 6 minutes of wall time:

T+0:00   rep 1: launch + 70s capture + cleanup       (~95s)
T+1:35   30s idle (thermal settle)
T+2:05   rep 2: same                                  (~95s)
T+3:40   30s idle
T+4:10   rep 3: same                                  (~95s)
T+5:45   done — pull evidence
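
The offsets in the timeline above follow from simple arithmetic (3 reps of ~95 s plus 2 idle gaps of 30 s). The Python sketch below reproduces them; the helper names schedule and fmt are hypothetical, and the 95 s per-rep figure is the approximate value from the timeline, not a measured duration.

```python
def schedule(reps=3, rep_s=95, idle_s=30):
    """Return (offset_seconds, label) events for back-to-back reps with idle gaps."""
    events = []
    t = 0
    for i in range(1, reps + 1):
        events.append((t, f"rep {i}"))
        t += rep_s
        if i < reps:  # no idle period after the final rep
            events.append((t, "idle"))
            t += idle_s
    events.append((t, "done"))
    return events


def fmt(offset_s):
    """Format an offset in seconds as T+M:SS."""
    return f"T+{offset_s // 60}:{offset_s % 60:02d}"


if __name__ == "__main__":
    for offset, label in schedule():
        print(fmt(offset), label)
```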

SSH-driven: the orchestrator /home/mfritsche/phase3_prime_runs/run_browser_nodebug.sh $RUN_ID chromium-fourier-kwin runs end-to-end from a single SSH command. Operator-side, a chrome window will appear on the screen for ~80 s per rep; the only operator action is not interacting with that window (no clicks, no typing in the chrome window, no pulling focus). The orchestrator kills the chrome process cleanly at end of capture.

After the 3 reps complete, this campaign's evidence sub-directory phase0_evidence/wayland_baseline_2026-05-03/ will contain:

a1_rep1/   (moved from /home/mfritsche/phase3_prime_runs/x11research_a1_rep1/)
a1_rep2/
a1_rep3/
a1_summary.md   (this campaign's interpretation of the 3 reps)

The original predecessor evidence at /home/mfritsche/phase3_prime_runs/kwin_timing_nodebug_rep[1-3] is untouched.

Exit conditions

  • Per-rep success = drops_summary.txt exists with non-n/a values, kwin_cpu_summary.txt exists with samples > 0, perf report has > 1000 samples.
  • Per-rep failure causes:
    • autoplay not detected within 30 s → script aborts, evidence dir is partial; rep marked failed.
    • workload exits before autoplay → script aborts.
    • perf record fails (e.g. kernel.perf_event_paranoid > 1) → script continues but perf.data is empty; we'd see this in perf_record_stderr.txt.

If a rep fails, surface the cause and re-run that rep before moving on.
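
The per-rep success rule above can be written as a single predicate. The sketch below is illustrative only: it takes already-parsed values rather than reading the evidence files, because the on-disk format of drops_summary.txt and the way the perf sample count is obtained are not specified here. The function name rep_succeeded and its parameters are hypothetical.

```python
REQUIRED_KEYS = ("frames_total", "drops_total", "drops_post_warmup")


def rep_succeeded(drops_summary, kwin_samples, perf_samples):
    """Apply the per-rep success criteria to already-parsed evidence.

    drops_summary: dict of the drops_summary.txt values, or None if the file
                   is missing (assumed representation).
    kwin_samples:  number of %CPU samples in kwin_cpu_summary.txt.
    perf_samples:  number of samples in the perf report.
    """
    if drops_summary is None:
        return False
    # A missing key is treated the same as an explicit "n/a".
    drops_ok = all(
        str(drops_summary.get(key, "n/a")).strip().lower() != "n/a"
        for key in REQUIRED_KEYS
    )
    return drops_ok and kwin_samples > 0 and perf_samples > 1000
```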

Decision after 3 reps

Compute median + IQR of drops_post_warmup, frames_total, drops_total, and kwin %CPU across the three reps. Two possible verdict shapes:

  • Tight cluster (IQR / median ≤ 0.3): baseline is stable; Phase 1 binding cells can use the median as the anchor with the IQR as the tolerance band.
  • High variance (IQR / median > 0.3): baseline is noisy; Phase 1 needs ≥ 5 reps per cell, not 3, and binding-cell thresholds need IQR-based formulation rather than fixed numbers. This is the predecessor lesson built into the worklist's "3 reps minimum (variance is a real concern)".
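
A minimal Python sketch of the verdict rule follows. Two points are assumptions, since the protocol does not pin them down: the IQR is computed with the inclusive quantile method (for three values this is half the range), and when the median is 0 the ratio is undefined, so the sketch treats IQR = 0 as tight and anything else as high variance. The function name spread_verdict is hypothetical.

```python
import statistics


def spread_verdict(values, threshold=0.3):
    """Classify a metric across reps as 'tight' or 'high-variance'."""
    median = statistics.median(values)
    # Inclusive quartiles: interpolate within the observed min..max range.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if median == 0:
        # Assumption: IQR / median is undefined here, so fall back to
        # "tight only if there is no spread at all".
        return "tight" if iqr == 0 else "high-variance"
    return "tight" if iqr / abs(median) <= threshold else "high-variance"


if __name__ == "__main__":
    # Illustrative inputs only, not measured data.
    print(spread_verdict([0, 0, 0]))
    print(spread_verdict([10, 20, 40]))
```

The zero-median branch matters for counters such as drops_post_warmup, where a clean run reports 0 in every rep and a plain division would fail.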

Operator green-light request

Before I fire the 3 reps:

  1. Confirm you're OK with a chrome window popping up on the screen for ~80 s per rep × 3 reps, and during that time not interacting with it (mouse stays still, no key presses).
  2. Confirm the current Plasma Wayland session is in a "clean measurement state" — i.e. nothing else is doing significant CPU work (ideally close any active terminals/browsers/IDE windows you don't need; the predecessor's 36-37 % kwin %CPU baseline assumes a quiescent desktop with just the test chrome window plus normal Plasma services running).
  3. (Optional) Decide whether to also include other variants in this turn's measurements — e.g. add a rep of Brave 147 or Firefox 150 under Plasma Wayland to start populating the full matrix. Default scope: just the 3 chromium-fourier reps; matrix-fill cells go into Phase 1 proper.

When green-lit, I fire run_browser_nodebug.sh x11research_a1_rep1 chromium-fourier-kwin first as a smoke test, surface its drops_summary.txt + kwin_cpu_summary.txt output, then on confirmation fire reps 2 and 3 back-to-back.