Campaign-contained data discipline + repo notice

Operator directive 2026-05-03: this campaign acquires all its own measurement data from scratch. Predecessor numbers are documented for context but never imported as binding cells, comparison targets, or success thresholds. The lesson the predecessor (kwin_overlay_subsurface) closed-without-patch on is exactly that: phase 1 cells anchored to a single historical ohm_gl_fix Phase 0 measurement, three weeks of phase planning on a baseline that didn't reproduce in-session. The strongest version of feedback_replicate_baseline_first.md: "don't import predecessor data, acquire it fresh." The discipline is now documented as a governing rule in three places: - README.md § "Campaign-contained data discipline" - phase0_findings.md § "Campaign-contained data discipline (governing rule)" - worklist.md § "Governing rule (every phase)" Concrete consequences: - A1 baseline (Phase 0 task) is now mandatory at N=3 reps. Single-rep wasn't enough to surface session variance in the predecessor; doing 3 up front makes the baseline robust to the same kind of session-state drift that ate the predecessor's premise. - Phase 1 thresholds are drawn against the A1 baseline measured in this campaign, not against any predecessor number. - metrics.csv (when it lands) only carries data from this campaign's reps. No predecessor rows imported. README.md additionally: - Adds the predecessor chain (ohm_gl_fix -> kwin_overlay_subsurface -> this campaign) with explicit "what stays valid for source- reading" vs "numbers that don't" separation. - Calls out durable substrate available from predecessors: KWin scanout-promotion archaeology, measurement-protocol template, WAYLAND_DEBUG parser. All structural; none measurement-numerical. - Carry-over predecessor system state on ohm (governor pin, baloo disabled, fourier packages) is explicitly distinguished from measurement data. System state inherits; data does not. - Repository line points to the gitea remote ssh://gitea@git.reauktion.de:2222/marfrit/x11-session-research.git phase0_findings.md additionally: - Reframes the predecessor-close-out summary section header to "(context, not data)" and rephrases past-tense numbers so none are stated as "the baseline." - Adds the discipline lesson narrative in-line before the predecessor close-out: a 30-minute N=3 same-session baseline on day 1 of the predecessor would have made the campaign different — and that's the move this campaign starts with. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 06:59:06 +00:00
parent ac301b4e48
commit e8bae670d3
3 changed files with 215 additions and 57 deletions
@@ -5,7 +5,8 @@
 > "Research question (LOCKED 2026-05-03)". Phase 1 still
 > requires the four pre-question Phase 0 deliverables
 > (state snapshot, X11 path inventory, browser-overlay-path
-> inventory, baseline anchor) before binding cells lock.
+> inventory, **own** baseline anchor) before binding cells
 > lock.
 **Research question:** Does cutting out the KWin compositor
 enable faster video display of Brave, chromium-fourier, and
@@ -27,35 +28,117 @@ The matrix is 3 browsers × 2 decode paths × 2 sessions = 12
 cells (some N/A where libva isn't supported). See
 `phase0_findings.md` for the full table.
-## Predecessor
+## Campaign-contained data discipline
-This campaign exists because
+**This campaign acquires all its own measurement data from
-[`../kwin_overlay_subsurface/`](../kwin_overlay_subsurface/)
+scratch in this session. Numbers, baselines, and verdict
-closed 2026-05-03 without patch (`phase8_handover.md`). Its
+thresholds from predecessor campaigns are NEVER imported as
-diagnostic loop terminated at "Phase 0 cage = 0 post-warmup
+binding cells, comparison targets, or success criteria.**
 drops floor not reproducible at N=3." The natural next move
 across the design surface is to vary the display server (X11
 instead of Wayland) on the same hardware and the same client
 binary, but the operator has not yet confirmed that as the
 campaign's specific question.
-## Hardware target (provisional, same as predecessor)
+Predecessor campaigns are documented for *context* — what
 question was asked, what file:line pointers stay valid for
 source-reading, what tooling/state on ohm carries over. But
 their *measurement numbers* (drop counts, perf percentages,
 Δ_present medians, kwin %CPU values) **are reference history
 only, not data this campaign relies on**.
 The discipline lesson is concrete:
 [`../kwin_overlay_subsurface/phase8_handover.md`](../kwin_overlay_subsurface/phase8_handover.md)
 closed 2026-05-03 without patch because its Phase 1 binding
 cells were anchored to a single historical measurement
 (`ohm_gl_fix` Phase 0: cage = 7 drops total / 0 post-warmup).
 That baseline was treated as a stable property of the
 hardware × software stack and the campaign's whole phase
 plan was built around closing the gap to it. When the same
 condition was re-measured at N=3 in the closing session of
 the campaign, none of the three reps reached 0 post-warmup
 drops; the median was 26. The campaign's premise didn't
 hold up to in-session replication, three weeks of phase
 planning ended at Phase 8 close-out, and the lesson is now
 [`feedback_replicate_baseline_first.md`](../kwin_overlay_subsurface/.claude/projects/-home-mfritsche-src-kwin-overlay-subsurface/memory/feedback_replicate_baseline_first.md)
 in this session's memory.
 This campaign acts on that lesson by stronger discipline:
 - Every Phase 1 binding cell anchors to data acquired in
  this campaign's session.
 - Predecessor *measurements* never appear in
  `metrics.csv`. Predecessor *file:line source-reads,
  protocol designs, parser scripts* may be referenced or
  re-used as durable substrate.
 - The Phase 0 "A1 baseline" rep is not optional. It is the
  Wayland-with-KWin reference rep acquired at the start of
  this campaign, against which the X11 cells of the matrix
  will be compared. The predecessor's matching numbers are
  not used as a substitute even if they look identical.
 - If a measurement instrument behaved one way in a
  predecessor campaign, that behavior is re-verified here
  before being relied on (see Phase 0 X11 measurement-tool
  inventory).
 If a measurement in this campaign happens to match a
 predecessor's number, that's a useful corroboration of
 reproducibility — but it cannot substitute for in-session
 acquisition.
 ## Predecessor campaigns
 This campaign is the third in a chain on the same hardware
 target:
 1. [`../ohm_gl_fix/`](../ohm_gl_fix/) — closed 2026-05-02.
   Diagnosed and patched the Step 1 (`libva-v4l2-request`)
   and Step 2 (Chromium `WaylandBufferManagerHost`
   overlay-route) issues. Spun off the Wayland-compositor-
   path question into a separate campaign because that
   question had a different code surface, upstream, and
   reviewer culture.
 2. [`../kwin_overlay_subsurface/`](../kwin_overlay_subsurface/)
   — closed 2026-05-03 without patch
   (`phase8_handover.md`). Diagnostic loop terminated at
   "Phase 0 cage = 0 post-warmup drops floor not
   reproducible at N=3" — the inherited baseline turned
   out to be N=1 historical and didn't replicate in-session.
 3. **This campaign** (`x11-session-research`) — opened
   2026-05-03. Tests whether bypassing the KWin compositor
   via X11 + non-compositing WM avoids the forced
   NV12 → RGB GL-composite step the predecessor identified
   as a structural Wayland constraint on rockchip-drm.
 Durable substrate from the predecessors that this campaign
 *may* read (not import-as-data):
 - `kwin_overlay_subsurface/phase2_source_findings.md` —
  KWin scanout-promotion archaeology (Plane 39 / Plane 45
  format/modifier table on rockchip-drm RK3568); Phase 2-prime
  Shape C source-read of `Display::dispatchEvents` and
  `TransactionFence` (Wayland-side, doesn't transfer to X11
  but illustrates the per-frame dispatch shape).
 - `kwin_overlay_subsurface/phase3_protocol_prime.md` — the
  measurement-protocol structure (operator-vs-Claude split,
  pre-registered hypothesis, falsification table, evidence
  layout). The structural template is reusable. The specific
  numbers and threshold values **are not**.
 - `kwin_overlay_subsurface/scripts/wayland_debug_to_csv.py` —
  parser for libwayland WAYLAND_DEBUG output. Useful for the
  Wayland cells of this campaign's matrix; an X11 equivalent
  is yet-to-build.
 ## Hardware target (same as predecessors)
 ohm — PineTab2, Rockchip RK3568 (4× Cortex-A55, Mali-G52 MP2,
 hantro G1/G2 VPU). Kernel `6.19.10-danctnix1-1-pinetab2`.
 Mesa 26.0.5. Currently runs KDE Plasma 6.6.4 Wayland.
-For an X11-session campaign, ohm needs an Xorg + Plasma X11 (or
+For an X11-session campaign, ohm needs an Xorg + non-compositing
-similar X11 desktop) install path verified. As of 2026-05-03,
+WM install path verified. As of 2026-05-03, the only confirmed
-the only confirmed display path on ohm is
+display path on ohm is `startplasma-wayland`. Phase 0 inventory
-`startplasma-wayland`. **Whether Plasma X11, an alternate X11
+will catalog what's installed, what'd need installing, and
-desktop (XFCE, openbox, lightweight WM), or Plasma running
+what SDDM-advertised sessions exist.
 under a Wayland-Xorg shim is in scope is part of the research
 question to be locked.**
-## Carry-overs from predecessor (still active on ohm)
+## Carry-overs from predecessor (system state, not data)
-Per `kwin_overlay_subsurface/phase1_evidence/ohm_tooling_revert_log.md`:
+Per `../kwin_overlay_subsurface/phase1_evidence/ohm_tooling_revert_log.md`:
 - `qt6-base-fourier 1:6.11.0-3` installed.
 - `kwin-fourier 1:6.6.4-3` installed.
@@ -65,19 +148,29 @@ Per `kwin_overlay_subsurface/phase1_evidence/ohm_tooling_revert_log.md`:
 - `drm-info 2.9.0-1` installed.
 These were not reverted at the predecessor's close-out. This
-campaign inherits them unless an explicit revert is part of
+campaign inherits them as **system state**, distinct from
-the design.
+**measurement data**. A different system state (e.g. a fresh
 Plasma session, or the same packages but different uptime)
 might produce different measurements; this is exactly why
 in-session baseline acquisition matters.
 ## Non-upstreaming default
-Inherited from the predecessor and from `ohm_gl_fix`. Bug
+Inherited from the predecessors. Bug reports + MRs are
-reports + MRs are explicit operator-tasked decisions, not
+explicit operator-tasked decisions, not background process
-background process steps.
+steps.
 ## File map (will grow)
 | File | What it is |
 |---|---|
 | `README.md` | This file. |
-| `phase0_findings.md` | Substrate from the predecessor + the candidate research question. **Awaits operator confirmation/redirect on the question itself.** |
+| `phase0_findings.md` | Locked motivation + experimental matrix + open questions before Phase 1. |
-| `worklist.md` | Phase-by-phase task list. Phase 0 only as of campaign start. |
+| `worklist.md` | Phase-by-phase task list. |
 | `phase0_evidence/` (created when first run lands) | Phase 0 inventory + A1 baseline rep. |
 ## Repository
 `ssh://gitea@git.reauktion.de:2222/marfrit/x11-session-research.git`
 (public). Cloned/pushed via SSH on port 2222 per the home-infra
 gitea convention.
@@ -1,24 +1,68 @@
-# Phase 0 — substrate and provisional research question
+# Phase 0 — substrate and locked research question
-This is the campaign's Phase 0 substrate doc: what we already
+This is the campaign's Phase 0 doc: the locked research
-know from the predecessor `kwin_overlay_subsurface` close-out,
+question (see § "Research question (LOCKED 2026-05-03)"
-what's open, and what the candidate research question looks
+below), the substrate inherited as *context* from the
-like. **The research question is provisional and awaits
+predecessor `kwin_overlay_subsurface`, and the open questions
-operator confirmation before Phase 1 lock.**
+this campaign answers in-session before Phase 1 binding cells
 lock.
-## Predecessor close-out summary
+## Campaign-contained data discipline (governing rule)
 **This campaign acquires all its own measurement data from
 scratch in this session.** Predecessor measurement numbers
 (drop counts, perf percentages, Δ_present medians, kwin %CPU
 values, threshold values) are documented for *context* but
 **never imported as binding cells, comparison targets, or
 success criteria**.
 Concretely, in this Phase 0 doc:
 - Numbers from `kwin_overlay_subsurface` Phase 3 / Phase 3-prime
  (e.g. "C5' had 32 drops total / 22 post-warmup"; "median
  Δ_present 45.85 ms"; "kwin %CPU median 36.90") are quoted
  **only as evidence of past measurements that may or may not
  reproduce**. They establish the SHAPE of what's measurable
  and the fact that measurement infrastructure works on this
  hardware. They do **not** establish "this is the baseline
  the X11 cells will be compared against."
 - The Phase 0 A1 baseline rep (worklist.md) is the
  in-session anchor for any with-KWin Wayland measurement
  this campaign references.
 - No `metrics.csv` row in this campaign is populated from
  predecessor data, even if the predecessor measured an
  identical condition.
 - If an X11 cell appears to "match the cage Phase 0 number,"
  that's incidental — the cage Phase 0 number is not the test
  this campaign is running. The test is "X11 vs in-session
  Wayland baseline."
 The discipline lesson is concrete: the predecessor's Phase 1
 binding cell `drops_post_warmup == 0` was anchored to a single
 ohm_gl_fix Phase 0 cage measurement (7 drops total / 0
 post-warmup). Three weeks of phase planning ran on the
 assumption that floor existed. At N=3 in-session replication
 on the closing day of the predecessor, that floor was missing
 (cage today: 22 / 26 / 56 post-warmup). The campaign closed
 without patch. **A 30-minute N=3 same-session baseline check
 on day 1 would have made the campaign different — or made it
 honestly close earlier.** This campaign acts on that lesson.
 ## Predecessor close-out summary (context, not data)
 [`../kwin_overlay_subsurface/phase8_handover.md`](../kwin_overlay_subsurface/phase8_handover.md)
 (closed 2026-05-03 without patch). Three independent reasons
-no patch landed:
+no patch landed (numbers below are quoted as historical
 context per the governing rule above):
-1. The campaign's locked Phase 1 reference floor
+1. The predecessor's locked Phase 1 reference floor
-   (`drops_post_warmup == 0` from cage) is unreachable at N=3
+   (`drops_post_warmup == 0` from cage) was unreachable in the
-   today. Today's median is 26 post-warmup with the same
+   predecessor's closing N=3 measurement session — same
   chromium-fourier binary, same hardware, same kernel, same
-   Mesa, same kwin-fourier — KWin direct reproduces Phase 0's
+   Mesa, same kwin-fourier as the original Phase 0 measurement.
-   29 post-warmup, but cage now also drops ~22-56 post-warmup
+   KWin direct's number reproduced; cage's 0-floor did not.
-   instead of Phase 0's 0.
+   Numbers quoted in the predecessor's `phase3_prime_findings.md`;
   not used here as a baseline for any cell.
 2. The campaign's surface-of-investigation
   (`wp_subsurface` overlay route) is not engaged by
   `brave_drops_test.html`. Chromium-fourier renders the video
@@ -1,5 +1,18 @@
 # Work items — x11-session-research
 ## Governing rule (every phase)
 **Campaign-contained data acquisition.** Every measurement
 this campaign relies on is acquired in this campaign's
 session. Predecessor numbers are reference history, never
 imported as binding cells, comparison targets, or success
 thresholds. The A1 baseline below is not optional — it is
 the in-session Wayland anchor against which X11 cells are
 compared. See `phase0_findings.md` § "Campaign-contained data
 discipline" and `README.md` § "Campaign-contained data
 discipline" for the full rule and the predecessor lesson
 that motivated it.
 ## Phase 0 — substrate + research question + inventory
 **Status: IN PROGRESS.** Motivation LOCKED 2026-05-03 in
@@ -64,13 +77,17 @@ inventory and baseline-anchor work below.
  `XPresent` notify events or `INTEL_swap_event` GLX
  extension equivalents — what's available on rockchip is
  open.
- [ ] **A1 baseline: Wayland-with-KWin reference rep.**
+- [ ] **A1 baseline: in-session Wayland-with-KWin anchor.**
-  1× `kwin_timing_nodebug`-equivalent run on current Plasma
+  Mandatory per the governing rule. 3 reps minimum (variance
-  Wayland session captured into
+  is a real concern on this hardware per the predecessor
-  `phase0_evidence/wayland_baseline_rep1/` as the
+  data) of a `kwin_timing_nodebug`-equivalent run on the
-  same-session anchor for the cross-session
+  current Plasma Wayland session, captured into
-  Wayland-vs-X11 comparison. Per
+  `phase0_evidence/wayland_baseline_repN/`. **This is the
-  `feedback_replicate_baseline_first.md`.
+  only Wayland baseline this campaign uses.** No
  predecessor number substitutes for these reps even if the
  predecessor measured an "identical" condition. The 3 reps
  also surface the session-internal variance up front, so
  Phase 1 thresholds aren't drawn against a single sample.
 ## Phase 1 — binding cells + measurement protocol
@@ -81,10 +98,14 @@ lock will produce `phase1_lock.md` with:
  drops_post_warmup, end-to-end-latency-if-testable,
  compositor+browser CPU at steady state).
 - The clear-pass / clear-fail thresholds: what "X11 is faster"
-  means quantitatively. Provisional: a matrix cell counts as
+  means quantitatively. Thresholds are drawn from the A1
-  "X11 faster" if effective_fps under X11 is ≥ 28 (out of 30
+  Wayland baseline acquired in *this* campaign — not from
-  source) AND drops_post_warmup is ≤ 5, on a same-session
+  predecessor numbers. Provisional shape (numbers TBD from
-  cross-paired comparison with the corresponding Wayland cell.
+  A1): a matrix cell counts as "X11 faster" if its effective
  fps exceeds the campaign's same-cell Wayland median by a
  margin larger than the campaign's same-cell Wayland IQR,
  and its drops_post_warmup is materially below the same
  cell's Wayland median.
 - The measurement protocol per matrix cell, including how the
  X11 sessions are entered and how the browser is launched
  with libva forced on/off. Mirrors the structure of
@@ -96,10 +117,10 @@ Pending.
 ## Discipline carry-overs from `kwin_overlay_subsurface`
- *Replicate the baseline first* — per
+- *Campaign-contained data acquisition* — see governing rule
-  `feedback_replicate_baseline_first.md`. Phase 0 task "A1
+  at the top of this file. The strongest version of the
-  baseline" exists specifically because of this lesson; do not
+  predecessor's "replicate the baseline first" lesson:
-  skip it.
+  **don't import predecessor data at all**, acquire it fresh.
 - *Phase discipline* — no patches before source-read is
  documented. Re-scoping must be honest about deferral target.
 - *Non-upstreaming default* — bug reports + MRs are explicit