diff --git a/README.md b/README.md index 6cb961f..d6b3c74 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,8 @@ > "Research question (LOCKED 2026-05-03)". Phase 1 still > requires the four pre-question Phase 0 deliverables > (state snapshot, X11 path inventory, browser-overlay-path -> inventory, baseline anchor) before binding cells lock. +> inventory, **own** baseline anchor) before binding cells +> lock. **Research question:** Does cutting out the KWin compositor enable faster video display of Brave, chromium-fourier, and @@ -27,35 +28,117 @@ The matrix is 3 browsers × 2 decode paths × 2 sessions = 12 cells (some N/A where libva isn't supported). See `phase0_findings.md` for the full table. -## Predecessor +## Campaign-contained data discipline -This campaign exists because -[`../kwin_overlay_subsurface/`](../kwin_overlay_subsurface/) -closed 2026-05-03 without patch (`phase8_handover.md`). Its -diagnostic loop terminated at "Phase 0 cage = 0 post-warmup -drops floor not reproducible at N=3." The natural next move -across the design surface is to vary the display server (X11 -instead of Wayland) on the same hardware and the same client -binary, but the operator has not yet confirmed that as the -campaign's specific question. +**This campaign acquires all its own measurement data from +scratch in this session. Numbers, baselines, and verdict +thresholds from predecessor campaigns are NEVER imported as +binding cells, comparison targets, or success criteria.** -## Hardware target (provisional, same as predecessor) +Predecessor campaigns are documented for *context* — what +question was asked, what file:line pointers stay valid for +source-reading, what tooling/state on ohm carries over. But +their *measurement numbers* (drop counts, perf percentages, +Δ_present medians, kwin %CPU values) **are reference history +only, not data this campaign relies on**. + +The discipline lesson is concrete: +[`../kwin_overlay_subsurface/phase8_handover.md`](../kwin_overlay_subsurface/phase8_handover.md) +closed 2026-05-03 without patch because its Phase 1 binding +cells were anchored to a single historical measurement +(`ohm_gl_fix` Phase 0: cage = 7 drops total / 0 post-warmup). +That baseline was treated as a stable property of the +hardware × software stack and the campaign's whole phase +plan was built around closing the gap to it. When the same +condition was re-measured at N=3 in the closing session of +the campaign, none of the three reps reached 0 post-warmup +drops; the median was 26. The campaign's premise didn't +hold up to in-session replication, three weeks of phase +planning ended at Phase 8 close-out, and the lesson is now +[`feedback_replicate_baseline_first.md`](../kwin_overlay_subsurface/.claude/projects/-home-mfritsche-src-kwin-overlay-subsurface/memory/feedback_replicate_baseline_first.md) +in this session's memory. + +This campaign acts on that lesson by stronger discipline: + +- Every Phase 1 binding cell anchors to data acquired in + this campaign's session. +- Predecessor *measurements* never appear in + `metrics.csv`. Predecessor *file:line source-reads, + protocol designs, parser scripts* may be referenced or + re-used as durable substrate. +- The Phase 0 "A1 baseline" rep is not optional. It is the + Wayland-with-KWin reference rep acquired at the start of + this campaign, against which the X11 cells of the matrix + will be compared. The predecessor's matching numbers are + not used as a substitute even if they look identical. +- If a measurement instrument behaved one way in a + predecessor campaign, that behavior is re-verified here + before being relied on (see Phase 0 X11 measurement-tool + inventory). + +If a measurement in this campaign happens to match a +predecessor's number, that's a useful corroboration of +reproducibility — but it cannot substitute for in-session +acquisition. + +## Predecessor campaigns + +This campaign is the third in a chain on the same hardware +target: + +1. [`../ohm_gl_fix/`](../ohm_gl_fix/) — closed 2026-05-02. + Diagnosed and patched the Step 1 (`libva-v4l2-request`) + and Step 2 (Chromium `WaylandBufferManagerHost` + overlay-route) issues. Spun off the Wayland-compositor- + path question into a separate campaign because that + question had a different code surface, upstream, and + reviewer culture. +2. [`../kwin_overlay_subsurface/`](../kwin_overlay_subsurface/) + — closed 2026-05-03 without patch + (`phase8_handover.md`). Diagnostic loop terminated at + "Phase 0 cage = 0 post-warmup drops floor not + reproducible at N=3" — the inherited baseline turned + out to be N=1 historical and didn't replicate in-session. +3. **This campaign** (`x11-session-research`) — opened + 2026-05-03. Tests whether bypassing the KWin compositor + via X11 + non-compositing WM avoids the forced + NV12 → RGB GL-composite step the predecessor identified + as a structural Wayland constraint on rockchip-drm. + +Durable substrate from the predecessors that this campaign +*may* read (not import-as-data): + +- `kwin_overlay_subsurface/phase2_source_findings.md` — + KWin scanout-promotion archaeology (Plane 39 / Plane 45 + format/modifier table on rockchip-drm RK3568); Phase 2-prime + Shape C source-read of `Display::dispatchEvents` and + `TransactionFence` (Wayland-side, doesn't transfer to X11 + but illustrates the per-frame dispatch shape). +- `kwin_overlay_subsurface/phase3_protocol_prime.md` — the + measurement-protocol structure (operator-vs-Claude split, + pre-registered hypothesis, falsification table, evidence + layout). The structural template is reusable. The specific + numbers and threshold values **are not**. +- `kwin_overlay_subsurface/scripts/wayland_debug_to_csv.py` — + parser for libwayland WAYLAND_DEBUG output. Useful for the + Wayland cells of this campaign's matrix; an X11 equivalent + is yet-to-build. + +## Hardware target (same as predecessors) ohm — PineTab2, Rockchip RK3568 (4× Cortex-A55, Mali-G52 MP2, hantro G1/G2 VPU). Kernel `6.19.10-danctnix1-1-pinetab2`. Mesa 26.0.5. Currently runs KDE Plasma 6.6.4 Wayland. -For an X11-session campaign, ohm needs an Xorg + Plasma X11 (or -similar X11 desktop) install path verified. As of 2026-05-03, -the only confirmed display path on ohm is -`startplasma-wayland`. **Whether Plasma X11, an alternate X11 -desktop (XFCE, openbox, lightweight WM), or Plasma running -under a Wayland-Xorg shim is in scope is part of the research -question to be locked.** +For an X11-session campaign, ohm needs an Xorg + non-compositing +WM install path verified. As of 2026-05-03, the only confirmed +display path on ohm is `startplasma-wayland`. Phase 0 inventory +will catalog what's installed, what'd need installing, and +what SDDM-advertised sessions exist. -## Carry-overs from predecessor (still active on ohm) +## Carry-overs from predecessor (system state, not data) -Per `kwin_overlay_subsurface/phase1_evidence/ohm_tooling_revert_log.md`: +Per `../kwin_overlay_subsurface/phase1_evidence/ohm_tooling_revert_log.md`: - `qt6-base-fourier 1:6.11.0-3` installed. - `kwin-fourier 1:6.6.4-3` installed. @@ -65,19 +148,29 @@ Per `kwin_overlay_subsurface/phase1_evidence/ohm_tooling_revert_log.md`: - `drm-info 2.9.0-1` installed. These were not reverted at the predecessor's close-out. This -campaign inherits them unless an explicit revert is part of -the design. +campaign inherits them as **system state**, distinct from +**measurement data**. A different system state (e.g. a fresh +Plasma session, or the same packages but different uptime) +might produce different measurements; this is exactly why +in-session baseline acquisition matters. ## Non-upstreaming default -Inherited from the predecessor and from `ohm_gl_fix`. Bug -reports + MRs are explicit operator-tasked decisions, not -background process steps. +Inherited from the predecessors. Bug reports + MRs are +explicit operator-tasked decisions, not background process +steps. ## File map (will grow) | File | What it is | |---|---| | `README.md` | This file. | -| `phase0_findings.md` | Substrate from the predecessor + the candidate research question. **Awaits operator confirmation/redirect on the question itself.** | -| `worklist.md` | Phase-by-phase task list. Phase 0 only as of campaign start. | +| `phase0_findings.md` | Locked motivation + experimental matrix + open questions before Phase 1. | +| `worklist.md` | Phase-by-phase task list. | +| `phase0_evidence/` (created when first run lands) | Phase 0 inventory + A1 baseline rep. | + +## Repository + +`ssh://gitea@git.reauktion.de:2222/marfrit/x11-session-research.git` +(public). Cloned/pushed via SSH on port 2222 per the home-infra +gitea convention. diff --git a/phase0_findings.md b/phase0_findings.md index 48e3dea..ef3c407 100644 --- a/phase0_findings.md +++ b/phase0_findings.md @@ -1,24 +1,68 @@ -# Phase 0 — substrate and provisional research question +# Phase 0 — substrate and locked research question -This is the campaign's Phase 0 substrate doc: what we already -know from the predecessor `kwin_overlay_subsurface` close-out, -what's open, and what the candidate research question looks -like. **The research question is provisional and awaits -operator confirmation before Phase 1 lock.** +This is the campaign's Phase 0 doc: the locked research +question (see § "Research question (LOCKED 2026-05-03)" +below), the substrate inherited as *context* from the +predecessor `kwin_overlay_subsurface`, and the open questions +this campaign answers in-session before Phase 1 binding cells +lock. -## Predecessor close-out summary +## Campaign-contained data discipline (governing rule) + +**This campaign acquires all its own measurement data from +scratch in this session.** Predecessor measurement numbers +(drop counts, perf percentages, Δ_present medians, kwin %CPU +values, threshold values) are documented for *context* but +**never imported as binding cells, comparison targets, or +success criteria**. + +Concretely, in this Phase 0 doc: + +- Numbers from `kwin_overlay_subsurface` Phase 3 / Phase 3-prime + (e.g. "C5' had 32 drops total / 22 post-warmup"; "median + Δ_present 45.85 ms"; "kwin %CPU median 36.90") are quoted + **only as evidence of past measurements that may or may not + reproduce**. They establish the SHAPE of what's measurable + and the fact that measurement infrastructure works on this + hardware. They do **not** establish "this is the baseline + the X11 cells will be compared against." +- The Phase 0 A1 baseline rep (worklist.md) is the + in-session anchor for any with-KWin Wayland measurement + this campaign references. +- No `metrics.csv` row in this campaign is populated from + predecessor data, even if the predecessor measured an + identical condition. +- If an X11 cell appears to "match the cage Phase 0 number," + that's incidental — the cage Phase 0 number is not the test + this campaign is running. The test is "X11 vs in-session + Wayland baseline." + +The discipline lesson is concrete: the predecessor's Phase 1 +binding cell `drops_post_warmup == 0` was anchored to a single +ohm_gl_fix Phase 0 cage measurement (7 drops total / 0 +post-warmup). Three weeks of phase planning ran on the +assumption that floor existed. At N=3 in-session replication +on the closing day of the predecessor, that floor was missing +(cage today: 22 / 26 / 56 post-warmup). The campaign closed +without patch. **A 30-minute N=3 same-session baseline check +on day 1 would have made the campaign different — or made it +honestly close earlier.** This campaign acts on that lesson. + +## Predecessor close-out summary (context, not data) [`../kwin_overlay_subsurface/phase8_handover.md`](../kwin_overlay_subsurface/phase8_handover.md) (closed 2026-05-03 without patch). Three independent reasons -no patch landed: +no patch landed (numbers below are quoted as historical +context per the governing rule above): -1. The campaign's locked Phase 1 reference floor - (`drops_post_warmup == 0` from cage) is unreachable at N=3 - today. Today's median is 26 post-warmup with the same +1. The predecessor's locked Phase 1 reference floor + (`drops_post_warmup == 0` from cage) was unreachable in the + predecessor's closing N=3 measurement session — same chromium-fourier binary, same hardware, same kernel, same - Mesa, same kwin-fourier — KWin direct reproduces Phase 0's - 29 post-warmup, but cage now also drops ~22-56 post-warmup - instead of Phase 0's 0. + Mesa, same kwin-fourier as the original Phase 0 measurement. + KWin direct's number reproduced; cage's 0-floor did not. + Numbers quoted in the predecessor's `phase3_prime_findings.md`; + not used here as a baseline for any cell. 2. The campaign's surface-of-investigation (`wp_subsurface` overlay route) is not engaged by `brave_drops_test.html`. Chromium-fourier renders the video diff --git a/worklist.md b/worklist.md index 8ac0aaf..a6d7f19 100644 --- a/worklist.md +++ b/worklist.md @@ -1,5 +1,18 @@ # Work items — x11-session-research +## Governing rule (every phase) + +**Campaign-contained data acquisition.** Every measurement +this campaign relies on is acquired in this campaign's +session. Predecessor numbers are reference history, never +imported as binding cells, comparison targets, or success +thresholds. The A1 baseline below is not optional — it is +the in-session Wayland anchor against which X11 cells are +compared. See `phase0_findings.md` § "Campaign-contained data +discipline" and `README.md` § "Campaign-contained data +discipline" for the full rule and the predecessor lesson +that motivated it. + ## Phase 0 — substrate + research question + inventory **Status: IN PROGRESS.** Motivation LOCKED 2026-05-03 in @@ -64,13 +77,17 @@ inventory and baseline-anchor work below. `XPresent` notify events or `INTEL_swap_event` GLX extension equivalents — what's available on rockchip is open. -- [ ] **A1 baseline: Wayland-with-KWin reference rep.** - 1× `kwin_timing_nodebug`-equivalent run on current Plasma - Wayland session captured into - `phase0_evidence/wayland_baseline_rep1/` as the - same-session anchor for the cross-session - Wayland-vs-X11 comparison. Per - `feedback_replicate_baseline_first.md`. +- [ ] **A1 baseline: in-session Wayland-with-KWin anchor.** + Mandatory per the governing rule. 3 reps minimum (variance + is a real concern on this hardware per the predecessor + data) of a `kwin_timing_nodebug`-equivalent run on the + current Plasma Wayland session, captured into + `phase0_evidence/wayland_baseline_repN/`. **This is the + only Wayland baseline this campaign uses.** No + predecessor number substitutes for these reps even if the + predecessor measured an "identical" condition. The 3 reps + also surface the session-internal variance up front, so + Phase 1 thresholds aren't drawn against a single sample. ## Phase 1 — binding cells + measurement protocol @@ -81,10 +98,14 @@ lock will produce `phase1_lock.md` with: drops_post_warmup, end-to-end-latency-if-testable, compositor+browser CPU at steady state). - The clear-pass / clear-fail thresholds: what "X11 is faster" - means quantitatively. Provisional: a matrix cell counts as - "X11 faster" if effective_fps under X11 is ≥ 28 (out of 30 - source) AND drops_post_warmup is ≤ 5, on a same-session - cross-paired comparison with the corresponding Wayland cell. + means quantitatively. Thresholds are drawn from the A1 + Wayland baseline acquired in *this* campaign — not from + predecessor numbers. Provisional shape (numbers TBD from + A1): a matrix cell counts as "X11 faster" if its effective + fps exceeds the campaign's same-cell Wayland median by a + margin larger than the campaign's same-cell Wayland IQR, + and its drops_post_warmup is materially below the same + cell's Wayland median. - The measurement protocol per matrix cell, including how the X11 sessions are entered and how the browser is launched with libva forced on/off. Mirrors the structure of @@ -96,10 +117,10 @@ Pending. ## Discipline carry-overs from `kwin_overlay_subsurface` -- *Replicate the baseline first* — per - `feedback_replicate_baseline_first.md`. Phase 0 task "A1 - baseline" exists specifically because of this lesson; do not - skip it. +- *Campaign-contained data acquisition* — see governing rule + at the top of this file. The strongest version of the + predecessor's "replicate the baseline first" lesson: + **don't import predecessor data at all**, acquire it fresh. - *Phase discipline* — no patches before source-read is documented. Re-scoping must be honest about deferral target. - *Non-upstreaming default* — bug reports + MRs are explicit