Campaign-contained data discipline + repo notice

Operator directive 2026-05-03: this campaign acquires all its
own measurement data from scratch. Predecessor numbers are
documented for context but never imported as binding cells,
comparison targets, or success thresholds. The lesson the
predecessor (kwin_overlay_subsurface) closed-without-patch on
is exactly that: phase 1 cells anchored to a single historical
ohm_gl_fix Phase 0 measurement, three weeks of phase planning
on a baseline that didn't reproduce in-session.

The strongest version of feedback_replicate_baseline_first.md:
"don't import predecessor data, acquire it fresh." The discipline
is now documented as a governing rule in three places:

- README.md § "Campaign-contained data discipline"
- phase0_findings.md § "Campaign-contained data discipline
  (governing rule)"
- worklist.md § "Governing rule (every phase)"

Concrete consequences:
- A1 baseline (Phase 0 task) is now mandatory at N=3 reps.
  Single-rep wasn't enough to surface session variance in the
  predecessor; doing 3 up front makes the baseline robust to
  the same kind of session-state drift that ate the
  predecessor's premise.
- Phase 1 thresholds are drawn against the A1 baseline measured
  in this campaign, not against any predecessor number.
- metrics.csv (when it lands) only carries data from this
  campaign's reps. No predecessor rows imported.

README.md additionally:
- Adds the predecessor chain (ohm_gl_fix -> kwin_overlay_subsurface
  -> this campaign) with explicit "what stays valid for source-
  reading" vs "numbers that don't" separation.
- Calls out durable substrate available from predecessors:
  KWin scanout-promotion archaeology, measurement-protocol
  template, WAYLAND_DEBUG parser. All structural; none
  measurement-numerical.
- Carry-over predecessor system state on ohm (governor pin,
  baloo disabled, fourier packages) is explicitly distinguished
  from measurement data. System state inherits; data does not.
- Repository line points to the gitea remote
  ssh://gitea@git.reauktion.de:2222/marfrit/x11-session-research.git

phase0_findings.md additionally:
- Reframes the predecessor-close-out summary section header to
  "(context, not data)" and rephrases past-tense numbers so
  none are stated as "the baseline."
- Adds the discipline lesson narrative in-line before the
  predecessor close-out: a 30-minute N=3 same-session baseline
  on day 1 of the predecessor would have made the campaign
  different — and that's the move this campaign starts with.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-03 06:59:06 +00:00
parent ac301b4e48
commit e8bae670d3
3 changed files with 215 additions and 57 deletions
+36 -15
View File
@@ -1,5 +1,18 @@
# Work items — x11-session-research
## Governing rule (every phase)
**Campaign-contained data acquisition.** Every measurement
this campaign relies on is acquired in this campaign's
session. Predecessor numbers are reference history, never
imported as binding cells, comparison targets, or success
thresholds. The A1 baseline below is not optional — it is
the in-session Wayland anchor against which X11 cells are
compared. See `phase0_findings.md` § "Campaign-contained data
discipline" and `README.md` § "Campaign-contained data
discipline" for the full rule and the predecessor lesson
that motivated it.
## Phase 0 — substrate + research question + inventory
**Status: IN PROGRESS.** Motivation LOCKED 2026-05-03 in
@@ -64,13 +77,17 @@ inventory and baseline-anchor work below.
`XPresent` notify events or `INTEL_swap_event` GLX
extension equivalents — what's available on rockchip is
open.
- [ ] **A1 baseline: Wayland-with-KWin reference rep.**
1× `kwin_timing_nodebug`-equivalent run on current Plasma
Wayland session captured into
`phase0_evidence/wayland_baseline_rep1/` as the
same-session anchor for the cross-session
Wayland-vs-X11 comparison. Per
`feedback_replicate_baseline_first.md`.
- [ ] **A1 baseline: in-session Wayland-with-KWin anchor.**
Mandatory per the governing rule. 3 reps minimum (variance
is a real concern on this hardware per the predecessor
data) of a `kwin_timing_nodebug`-equivalent run on the
current Plasma Wayland session, captured into
`phase0_evidence/wayland_baseline_repN/`. **This is the
only Wayland baseline this campaign uses.** No
predecessor number substitutes for these reps even if the
predecessor measured an "identical" condition. The 3 reps
also surface the session-internal variance up front, so
Phase 1 thresholds aren't drawn against a single sample.
## Phase 1 — binding cells + measurement protocol
@@ -81,10 +98,14 @@ lock will produce `phase1_lock.md` with:
drops_post_warmup, end-to-end-latency-if-testable,
compositor+browser CPU at steady state).
- The clear-pass / clear-fail thresholds: what "X11 is faster"
means quantitatively. Provisional: a matrix cell counts as
"X11 faster" if effective_fps under X11 is ≥ 28 (out of 30
source) AND drops_post_warmup is ≤ 5, on a same-session
cross-paired comparison with the corresponding Wayland cell.
means quantitatively. Thresholds are drawn from the A1
Wayland baseline acquired in *this* campaign — not from
predecessor numbers. Provisional shape (numbers TBD from
A1): a matrix cell counts as "X11 faster" if its effective
fps exceeds the campaign's same-cell Wayland median by a
margin larger than the campaign's same-cell Wayland IQR,
and its drops_post_warmup is materially below the same
cell's Wayland median.
- The measurement protocol per matrix cell, including how the
X11 sessions are entered and how the browser is launched
with libva forced on/off. Mirrors the structure of
@@ -96,10 +117,10 @@ Pending.
## Discipline carry-overs from `kwin_overlay_subsurface`
- *Replicate the baseline first* — per
`feedback_replicate_baseline_first.md`. Phase 0 task "A1
baseline" exists specifically because of this lesson; do not
skip it.
- *Campaign-contained data acquisition* — see governing rule
at the top of this file. The strongest version of the
predecessor's "replicate the baseline first" lesson:
**don't import predecessor data at all**, acquire it fresh.
- *Phase discipline* — no patches before source-read is
documented. Re-scoping must be honest about deferral target.
- *Non-upstreaming default* — bug reports + MRs are explicit