Campaign-contained data discipline + repo notice

Operator directive 2026-05-03: this campaign acquires all its own measurement data from scratch. Predecessor numbers are documented for context but never imported as binding cells, comparison targets, or success thresholds. The lesson the predecessor (kwin_overlay_subsurface) closed-without-patch on is exactly that: phase 1 cells anchored to a single historical ohm_gl_fix Phase 0 measurement, three weeks of phase planning on a baseline that didn't reproduce in-session. The strongest version of feedback_replicate_baseline_first.md: "don't import predecessor data, acquire it fresh." The discipline is now documented as a governing rule in three places: - README.md § "Campaign-contained data discipline" - phase0_findings.md § "Campaign-contained data discipline (governing rule)" - worklist.md § "Governing rule (every phase)" Concrete consequences: - A1 baseline (Phase 0 task) is now mandatory at N=3 reps. Single-rep wasn't enough to surface session variance in the predecessor; doing 3 up front makes the baseline robust to the same kind of session-state drift that ate the predecessor's premise. - Phase 1 thresholds are drawn against the A1 baseline measured in this campaign, not against any predecessor number. - metrics.csv (when it lands) only carries data from this campaign's reps. No predecessor rows imported. README.md additionally: - Adds the predecessor chain (ohm_gl_fix -> kwin_overlay_subsurface -> this campaign) with explicit "what stays valid for source- reading" vs "numbers that don't" separation. - Calls out durable substrate available from predecessors: KWin scanout-promotion archaeology, measurement-protocol template, WAYLAND_DEBUG parser. All structural; none measurement-numerical. - Carry-over predecessor system state on ohm (governor pin, baloo disabled, fourier packages) is explicitly distinguished from measurement data. System state inherits; data does not. - Repository line points to the gitea remote ssh://gitea@git.reauktion.de:2222/marfrit/x11-session-research.git phase0_findings.md additionally: - Reframes the predecessor-close-out summary section header to "(context, not data)" and rephrases past-tense numbers so none are stated as "the baseline." - Adds the discipline lesson narrative in-line before the predecessor close-out: a 30-minute N=3 same-session baseline on day 1 of the predecessor would have made the campaign different — and that's the move this campaign starts with. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 06:59:06 +00:00
parent ac301b4e48
commit e8bae670d3
3 changed files with 215 additions and 57 deletions
@@ -1,5 +1,18 @@
 # Work items — x11-session-research

+## Governing rule (every phase)
+
+**Campaign-contained data acquisition.** Every measurement
+this campaign relies on is acquired in this campaign's
+session. Predecessor numbers are reference history, never
+imported as binding cells, comparison targets, or success
+thresholds. The A1 baseline below is not optional — it is
+the in-session Wayland anchor against which X11 cells are
+compared. See `phase0_findings.md` § "Campaign-contained data
+discipline" and `README.md` § "Campaign-contained data
+discipline" for the full rule and the predecessor lesson
+that motivated it.
+
 ## Phase 0 — substrate + research question + inventory

 **Status: IN PROGRESS.** Motivation LOCKED 2026-05-03 in
@@ -64,13 +77,17 @@ inventory and baseline-anchor work below.
  `XPresent` notify events or `INTEL_swap_event` GLX
  extension equivalents — what's available on rockchip is
  open.
- [ ] **A1 baseline: Wayland-with-KWin reference rep.**
-  1× `kwin_timing_nodebug`-equivalent run on current Plasma
-  Wayland session captured into
-  `phase0_evidence/wayland_baseline_rep1/` as the
-  same-session anchor for the cross-session
-  Wayland-vs-X11 comparison. Per
-  `feedback_replicate_baseline_first.md`.
+- [ ] **A1 baseline: in-session Wayland-with-KWin anchor.**
+  Mandatory per the governing rule. 3 reps minimum (variance
+  is a real concern on this hardware per the predecessor
+  data) of a `kwin_timing_nodebug`-equivalent run on the
+  current Plasma Wayland session, captured into
+  `phase0_evidence/wayland_baseline_repN/`. **This is the
+  only Wayland baseline this campaign uses.** No
+  predecessor number substitutes for these reps even if the
+  predecessor measured an "identical" condition. The 3 reps
+  also surface the session-internal variance up front, so
+  Phase 1 thresholds aren't drawn against a single sample.

 ## Phase 1 — binding cells + measurement protocol

@@ -81,10 +98,14 @@ lock will produce `phase1_lock.md` with:
  drops_post_warmup, end-to-end-latency-if-testable,
  compositor+browser CPU at steady state).
 - The clear-pass / clear-fail thresholds: what "X11 is faster"
-  means quantitatively. Provisional: a matrix cell counts as
-  "X11 faster" if effective_fps under X11 is ≥ 28 (out of 30
-  source) AND drops_post_warmup is ≤ 5, on a same-session
-  cross-paired comparison with the corresponding Wayland cell.
+  means quantitatively. Thresholds are drawn from the A1
+  Wayland baseline acquired in *this* campaign — not from
+  predecessor numbers. Provisional shape (numbers TBD from
+  A1): a matrix cell counts as "X11 faster" if its effective
+  fps exceeds the campaign's same-cell Wayland median by a
+  margin larger than the campaign's same-cell Wayland IQR,
+  and its drops_post_warmup is materially below the same
+  cell's Wayland median.
 - The measurement protocol per matrix cell, including how the
  X11 sessions are entered and how the browser is launched
  with libva forced on/off. Mirrors the structure of
@@ -96,10 +117,10 @@ Pending.

 ## Discipline carry-overs from `kwin_overlay_subsurface`

- *Replicate the baseline first* — per
-  `feedback_replicate_baseline_first.md`. Phase 0 task "A1
-  baseline" exists specifically because of this lesson; do not
-  skip it.
+- *Campaign-contained data acquisition* — see governing rule
+  at the top of this file. The strongest version of the
+  predecessor's "replicate the baseline first" lesson:
+  **don't import predecessor data at all**, acquire it fresh.
 - *Phase discipline* — no patches before source-read is
  documented. Re-scoping must be honest about deferral target.
 - *Non-upstreaming default* — bug reports + MRs are explicit