x11-session-research/phase0_findings.md
marfrit e8bae670d3 Campaign-contained data discipline + repo notice
Operator directive 2026-05-03: this campaign acquires all its
own measurement data from scratch. Predecessor numbers are
documented for context but never imported as binding cells,
comparison targets, or success thresholds. The lesson the
predecessor (kwin_overlay_subsurface) closed-without-patch on
is exactly that: phase 1 cells anchored to a single historical
ohm_gl_fix Phase 0 measurement, three weeks of phase planning
on a baseline that didn't reproduce in-session.

The strongest version of feedback_replicate_baseline_first.md:
"don't import predecessor data, acquire it fresh." The discipline
is now documented as a governing rule in three places:

- README.md § "Campaign-contained data discipline"
- phase0_findings.md § "Campaign-contained data discipline
  (governing rule)"
- worklist.md § "Governing rule (every phase)"

Concrete consequences:
- A1 baseline (Phase 0 task) is now mandatory at N=3 reps.
  Single-rep wasn't enough to surface session variance in the
  predecessor; doing 3 up front makes the baseline robust to
  the same kind of session-state drift that ate the
  predecessor's premise.
- Phase 1 thresholds are drawn against the A1 baseline measured
  in this campaign, not against any predecessor number.
- metrics.csv (when it lands) only carries data from this
  campaign's reps. No predecessor rows imported.

README.md additionally:
- Adds the predecessor chain (ohm_gl_fix -> kwin_overlay_subsurface
  -> this campaign) with explicit "what stays valid for source-
  reading" vs "numbers that don't" separation.
- Calls out durable substrate available from predecessors:
  KWin scanout-promotion archaeology, measurement-protocol
  template, WAYLAND_DEBUG parser. All structural; none
  measurement-numerical.
- Carry-over predecessor system state on ohm (governor pin,
  baloo disabled, fourier packages) is explicitly distinguished
  from measurement data. System state inherits; data does not.
- Repository line points to the gitea remote
  ssh://gitea@git.reauktion.de:2222/marfrit/x11-session-research.git

phase0_findings.md additionally:
- Reframes the predecessor-close-out summary section header to
  "(context, not data)" and rephrases past-tense numbers so
  none are stated as "the baseline."
- Adds the discipline lesson narrative in-line before the
  predecessor close-out: a 30-minute N=3 same-session baseline
  on day 1 of the predecessor would have made the campaign
  different — and that's the move this campaign starts with.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 06:59:06 +00:00

# Phase 0 — substrate and locked research question
This is the campaign's Phase 0 doc: the locked research
question (see § "Research question (LOCKED 2026-05-03)"
below), the substrate inherited as *context* from the
predecessor `kwin_overlay_subsurface`, and the open questions
this campaign answers in-session before Phase 1 binding cells
lock.
## Campaign-contained data discipline (governing rule)
**This campaign acquires all its own measurement data from
scratch in this session.** Predecessor measurement numbers
(drop counts, perf percentages, Δ_present medians, kwin %CPU
values, threshold values) are documented for *context* but
**never imported as binding cells, comparison targets, or
success criteria**.
Concretely, in this Phase 0 doc:
- Numbers from `kwin_overlay_subsurface` Phase 3 / Phase 3-prime
(e.g. "C5' had 32 drops total / 22 post-warmup"; "median
Δ_present 45.85 ms"; "kwin %CPU median 36.90") are quoted
**only as evidence of past measurements that may or may not
reproduce**. They establish the SHAPE of what's measurable
and the fact that measurement infrastructure works on this
hardware. They do **not** establish "this is the baseline
the X11 cells will be compared against."
- The Phase 0 A1 baseline (N=3 reps; see worklist.md) is the
in-session anchor for any with-KWin Wayland measurement
this campaign references.
- No `metrics.csv` row in this campaign is populated from
predecessor data, even if the predecessor measured an
identical condition.
- If an X11 cell appears to "match the cage Phase 0 number,"
that's incidental — the cage Phase 0 number is not the test
this campaign is running. The test is "X11 vs in-session
Wayland baseline."
The discipline lesson is concrete: the predecessor's Phase 1
binding cell `drops_post_warmup == 0` was anchored to a single
ohm_gl_fix Phase 0 cage measurement (7 drops total / 0
post-warmup). Three weeks of phase planning ran on the
assumption that floor existed. At N=3 in-session replication
on the closing day of the predecessor, that floor was missing
(cage today: 22 / 26 / 56 post-warmup). The campaign closed
without patch. **A 30-minute N=3 same-session baseline check
on day 1 would have made the campaign different — or made it
honestly close earlier.** This campaign acts on that lesson.
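
The day-1 check described above can be sketched in a few lines. This is a minimal illustration, not the campaign's tooling: `floor_reproduces` and its `min_reps` parameter are hypothetical names, and the only data used are the closing-day cage post-warmup drop counts quoted in the paragraph above.

```python
from statistics import median


def floor_reproduces(reps, floor, min_reps=3):
    """Return True only if every same-session rep is at or below the reference floor.

    Refuses to answer with fewer than `min_reps` reps, because a single rep
    cannot show session-to-session variance.
    """
    if len(reps) < min_reps:
        raise ValueError(f"need at least {min_reps} same-session reps, got {len(reps)}")
    return all(r <= floor for r in reps)


# Post-warmup drop counts quoted above for the predecessor's closing-day cage reps.
cage_reps = [22, 26, 56]

# The predecessor's binding cell assumed a floor of 0 post-warmup drops.
print("floor of 0 reproduces:", floor_reproduces(cage_reps, floor=0))
print("median / min / max:", median(cage_reps), min(cage_reps), max(cage_reps))
```

Requiring `min_reps` up front is the design point: the function raises instead of silently accepting an N=1 anchor.
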
## Predecessor close-out summary (context, not data)
[`../kwin_overlay_subsurface/phase8_handover.md`](../kwin_overlay_subsurface/phase8_handover.md)
(closed 2026-05-03 without patch). Three independent reasons
no patch landed (numbers below are quoted as historical
context per the governing rule above):
1. The predecessor's locked Phase 1 reference floor
(`drops_post_warmup == 0` from cage) was unreachable in the
predecessor's closing N=3 measurement session — same
chromium-fourier binary, same hardware, same kernel, same
Mesa, same kwin-fourier as the original Phase 0 measurement.
KWin direct's number reproduced; cage's 0-floor did not.
Numbers quoted in the predecessor's `phase3_prime_findings.md`;
not used here as a baseline for any cell.
2. The campaign's surface-of-investigation
(`wp_subsurface` overlay route) is not engaged by
`brave_drops_test.html`. Chromium-fourier renders the video
element via internal compositing into its main browser
window surface — a single-surface case.
3. The Phase 2 hot-path hypothesis
(`glEGLImageTargetTexture2DOES` dominates `kwin_wayland`'s
per-frame cost) was rejected by Phase 3 perf measurement
by a 100× margin on the wrong side of the threshold.

The diagnostic loop terminated at "the campaign's premise was
N=1 to begin with, and the N=3 in-session re-measurement
doesn't replicate it." This is filed as a feedback memory:
*replicate the N=1 baseline at N=3 in the same session BEFORE
building multi-phase infrastructure around it*.
## What stays valid from the predecessor
Durable substrate listed in
`kwin_overlay_subsurface/phase8_handover.md` § "What's left
for a future session to pick up":
- **Phase 1 scanout-promotion archaeology** (rockchip-drm
RK3568 plane format/modifier table, KWin v6.6.4 promotion
predicate). Plane 39 (Primary, NV12 LINEAR) is the GL
framebuffer; Plane 45 (Overlay) does not advertise NV12 in
any modifier. Both KWin scanout-promotion paths are
structurally rejected for windowed Brave on this DRM driver.
This holds regardless of display server.
- **Phase 2 H1 file:line** in
`kwin_overlay_subsurface/phase2_source_findings.md`. Cold
per Phase 3 measurement; informational only.
- **Phase 2-prime Shape C source-read** of
`Display::dispatchEvents` and `TransactionFence` in KWin's
`src/wayland/`. Specific to the Wayland path; **not relevant
to an X11-session campaign**. The X11 path uses different
KWin surface plumbing (`kwin_x11`) and a different per-frame
protocol (X11 Composite extension + Damage + XPresent), not
Wayland protocol dispatch.
- **Δ_present-46 ms reproducible side-finding** under Plasma
Wayland. Across all measured conditions (chromium-fourier on
KWin, chromium-fourier in cage, stock Brave on KWin), median
Δ_present was 41-46 ms on a 60 Hz panel (16.67 ms per vsync), a
stable queue depth of roughly 2.5-2.8 vsyncs. This finding is independent of the
cage breakdown and **directly testable under X11** as a
comparison point.
- **Measurement infrastructure**:
`kwin_overlay_subsurface/scripts/wayland_debug_to_csv.py`
(libwayland 1.21+ format, 17 unit tests passing) +
`phase3_prime_runs/run_browser.sh` orchestrator on ohm
(handles `WAYLAND_DEBUG=1` capture, perf record, top
sampling, drops trajectory extraction, kill-cleanly). **The
WAYLAND_DEBUG portion does not apply under X11**; an X11
equivalent would be different tooling (`xtrace`, `xev`, or
XCB-debug instrumentation if the client emits any). The
perf+top+drops capture portion remains usable under X11
unchanged.
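
The conversion from a Δ_present value to a vsync queue depth is simple arithmetic, sketched below. `vsync_depth` is a hypothetical helper name. The input values are the predecessor figures quoted in this document as historical context, not baselines for this campaign.

```python
def vsync_depth(delta_present_ms, refresh_hz=60.0):
    """Express a commit-to-present latency as a number of vsync intervals."""
    vsync_ms = 1000.0 / refresh_hz  # 16.67 ms at 60 Hz
    return delta_present_ms / vsync_ms


# Range and median quoted in this document for the predecessor's measurements.
for ms in (41.0, 45.85, 46.0):
    print(f"{ms:6.2f} ms -> {vsync_depth(ms):.2f} vsync intervals")
```

At 60 Hz the quoted 41-46 ms range works out to about 2.5 to 2.8 vsync intervals.
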
## Current ohm state (carry-over from predecessor)
Per `kwin_overlay_subsurface/phase1_evidence/ohm_tooling_revert_log.md`,
not reverted at predecessor close-out:
- `qt6-base-fourier 1:6.11.0-3`
- `kwin-fourier 1:6.6.4-3` (Wayland-side compositor; not in
the hot path under an X11 session)
- `mesa 1:26.0.5-1`
- CPU governor pinned to `performance`
- Baloo permanently disabled
- `drm-info 2.9.0-1`
- Active session: `startplasma-wayland` on tty2,
`kwin_wayland` PID 3927 (as of 2026-05-03 03:05 UTC).
- Browser binaries available: `/tmp/chromium-ohm-gl-fix-step2/chrome`
(chromium-fourier, Step 1 + Step 2 patches, 149.0.7812.0),
`/usr/bin/brave` (`brave-bin 1:1.89.145-1`).
If this campaign needs to switch ohm to an X11 session, that
is a session-level operator action (logout, switch via SDDM,
log back in). It cannot be done unattended.
## Research question (LOCKED 2026-05-03)
> *"Does cutting out the KWin compositor enable faster video
> display of Brave, chromium-fourier, and Firefox — for full
> SW decoding, and for libva decoding (where possible) — on
> PineTab2 RK3568?"*
### Mechanism the question targets
Operator-supplied context 2026-05-03:
> *"hantro emits NV12 which the GPU can't put on a
> compositeable plane. So that is the real bottleneck of
> Wayland."*
This connects directly to the predecessor's Phase 1 finding
(`kwin_overlay_subsurface/phase2_source_findings.md`:170-229):
- Hantro VPU decodes H.264 video into NV12 dmabufs (`DRM_FORMAT_NV12`,
`DRM_FORMAT_MOD_LINEAR`).
- rockchip-drm's only NV12-LINEAR-capable plane is the
Primary plane (Plane 39 on CRTC 52), which the running KWin
uses for its GL framebuffer.
- The overlay plane (Plane 45) advertises no NV12 in any
modifier in `IN_FORMATS`.
- Therefore **no rockchip-drm scanout plane can accept the
NV12 buffer hantro produces while KWin owns the primary
plane.** Some compositing step must convert NV12 → RGB
before display.
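
The plane argument above can be restated as a small lookup. The sketch below is a model of the reasoning, not output from `drm_info`. The `Plane` class and `planes_accepting` helper are hypothetical, and the `XRGB8888` entries are placeholders standing in for whatever RGB formats the planes actually advertise. Only the NV12 facts come from this document.

```python
from dataclasses import dataclass, field


@dataclass
class Plane:
    plane_id: int
    kind: str
    # (format, modifier) pairs the plane advertises in IN_FORMATS.
    in_formats: set = field(default_factory=set)


def planes_accepting(planes, fmt, modifier, in_use=()):
    """Return IDs of planes that advertise (fmt, modifier) and are not already claimed."""
    return [
        p.plane_id
        for p in planes
        if (fmt, modifier) in p.in_formats and p.plane_id not in in_use
    ]


# Model of the two planes discussed above. RGB entries are illustrative placeholders.
PLANES = [
    Plane(39, "Primary", {("NV12", "LINEAR"), ("XRGB8888", "LINEAR")}),
    Plane(45, "Overlay", {("XRGB8888", "LINEAR"), ("XRGB8888", "AFBC")}),
]

# With KWin holding Plane 39 for its GL framebuffer, nothing is left for NV12.
print(planes_accepting(PLANES, "NV12", "LINEAR", in_use={39}))
# With no compositor claiming the Primary plane, Plane 39 is a candidate.
print(planes_accepting(PLANES, "NV12", "LINEAR"))
```

The first call returns an empty list and the second returns `[39]`, which is the constraint and the opening this campaign is built around.
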
The predecessor named the *constraint* (Path B rejected at
the format/modifier intersection) but did not make the
*consequence* explicit: "some component must GL-composite
NV12 → RGB on the GPU because nothing else on this hardware
can put NV12 on a scanout plane." That consequence is
this campaign's motivating insight:
- **Under Plasma Wayland:** when the browser engages the
Wayland subsurface route (chromium's
`WaylandBufferManagerHost::CommitOverlays`), KWin receives
an NV12 dmabuf and must GL-composite it. **KWin's compositor
is the GL-composite step.** When the browser does NOT
engage the subsurface route (the predecessor's measured
case on `brave_drops_test.html` — zero `wp_subsurface` in
the trace), the browser itself converts NV12 → RGB in its
own GL context and hands KWin only RGB; KWin then composites
the RGB to its primary plane.
- **Under X11 without a compositor:** there is no separate
compositor process. Two paths are open to the client:
  - **RGB-composite path**: the browser converts NV12 → RGB in
    its own GL context and presents the RGB result via
    XPresent/DRI3 to the X server, which schedules a page-flip
    on the same Primary plane KWin would have used. This is one
    fewer hand-off than the Wayland-with-subsurface case, but
    the same GL-composite cost as the no-subsurface Wayland case.
  - **Hardware-overlay path** (operator-supplied context
    2026-05-03: *"a X11 pipeline would route around that by
    giving a portion of screen real estate directly to the
    video pipeline"*): the X server allocates the Primary
    plane (Plane 39, supports NV12 LINEAR) to the video
    region and the Overlay plane (Plane 45, supports
    RGB/AFBC) to the rest of the desktop, and the two are
    hardware-blended at scanout time. **No GL-composite of
    NV12 happens anywhere, so the cost the operator named as
    "the real bottleneck" is structurally avoided.**

This second X11 path is what Wayland compositors as
designed today cannot do on rockchip-drm-class hardware: KWin
Wayland *must* own the Primary plane for its compositor
framebuffer (because the Wayland model is "compositor presents
one merged surface per output"), so it cannot give Plane 39
to a video-region NV12 buffer while putting the rest of the
desktop on Plane 45. X11 + non-compositing WM has no such
constraint — different windows can be assigned to different
planes by the X server's plane allocator.
This is the X11 hardware-overlay mechanism that historically
made X11 desktops good at video playback (Xv from the late
1990s, and the modern equivalents via DRI3 + XPresent +
Composite-redirection-disabled). It is structurally absent
in Wayland-with-monolithic-compositor designs.
### Hypothesis the matrix tests
There are three potentially separable costs:
1. **The mandatory NV12 → RGB GL conversion**, which is
*forced* on Wayland-with-KWin because KWin must own the
only NV12-LINEAR-capable plane on this hardware for its
compositor framebuffer. **This cost is structurally
avoidable** under X11 + non-compositing WM via
hardware-plane-overlay (per the operator-supplied insight
above). Whether browsers can be coaxed to *use* the X11
hardware-overlay path — rather than internally compositing
to RGB before presenting — is browser-specific (see Open
questions below).
2. **The fallback GL-composite cost** when the
hardware-overlay path doesn't engage. Both Wayland and X11
pay this when the buffer shape doesn't match a plane —
it just runs in different processes (KWin under Wayland,
browser under X11).
3. **The per-frame compositor overhead** independent of NV12:
dmabuf import, transaction apply, presentation-feedback
wiring, frame-callback delivery — which the predecessor
measured at ~30-37 % of `kwin_wayland`'s CPU during
steady-state video playback even when KWin only saw RGB
surfaces.
The X11 hypothesis is strongest if cost (1) is dominant on
the matrix's with-KWin cells AND the X11 cells trigger the
hardware-overlay path. The X11 hypothesis is weakest if
cost (1) is small and cost (3) is small — in which case the
"cutting out KWin" experiment would show only marginal
differences.
The matrix below is designed to surface which of (1) (2) (3)
dominates per browser × decode path.
"Faster video display" is operationally **a combination of**:
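- **Effective fps actually rendered** (= `(totalVideoFrames - droppedVideoFrames) / elapsed_s`
from `getVideoPlaybackQuality()`, since `totalVideoFrames` counts
dropped frames as well as presented ones; for a 30 fps source the
upper bound is 30, and the question is how close).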
- **Drop count** over the same 70 s window (`droppedVideoFrames`).
- **End-to-end latency** if testable (commit → present;
testable on Wayland via `wp_presentation_feedback`,
testable on X11 via `XPresent` extension or `RandR` vblank
events; protocol-side measurement under each
display-server).
- **Compositor + browser CPU at steady state** (the cost
saved by cutting the compositor is the upper bound on the
patch-payoff if a future campaign tries to optimise the
compositor instead of removing it).
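
The first two metrics can be derived from the two `getVideoPlaybackQuality()` counters as sketched below. This assumes `totalVideoFrames` counts dropped frames as well as presented ones, so rendered frames are the difference. The function name, the dictionary keys, and the example counter values are all hypothetical and are not measurements from this campaign.

```python
def playback_summary(total_video_frames, dropped_video_frames, elapsed_s, source_fps=30.0):
    """Derive rendered-frame metrics from getVideoPlaybackQuality() counters."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    if dropped_video_frames > total_video_frames:
        raise ValueError("dropped frames cannot exceed total frames")
    rendered = total_video_frames - dropped_video_frames
    effective_fps = rendered / elapsed_s
    return {
        "rendered_frames": rendered,
        "effective_fps": effective_fps,
        "fraction_of_source": effective_fps / source_fps,
        "dropped_frames": dropped_video_frames,
    }


# Made-up counters for a 70 s window of a 30 fps source (illustration only).
summary = playback_summary(total_video_frames=2100, dropped_video_frames=21, elapsed_s=70.0)
print(summary)
```

Keeping `dropped_frames` alongside `effective_fps` in one record means both can be written to the same `metrics.csv` row for a rep.
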
### Experimental matrix
Six browser × decode rows (3 browsers × 2 decode paths) × 2
session conditions (with-KWin / without-KWin) = 12 cells:
| Browser | Decode | with-KWin (Plasma Wayland) | without-KWin (X11 session, no compositor) |
|---|---|---|---|
| Brave 147 | full SW | C-W-brave-sw | C-X-brave-sw |
| Brave 147 | libva (if it works) | C-W-brave-libva | C-X-brave-libva |
| chromium-fourier 149 (Step 1 + Step 2) | full SW | C-W-chrf-sw | C-X-chrf-sw |
| chromium-fourier 149 | libva (Step 1 enables it) | C-W-chrf-libva | C-X-chrf-libva |
| Firefox | full SW | C-W-ff-sw | C-X-ff-sw |
| Firefox | libva | C-W-ff-libva | C-X-ff-libva |
The "(if it works)" / "where possible" qualifiers follow the
operator's directive: on rockchip-drm RK3568, libva is so far
known to work only on chromium-fourier (Step 1 ports
`libva-v4l2-request`). For stock Brave 147 and stock Firefox,
libva probably doesn't engage; if Phase 0 inventory confirms
that, those cells are documented N/A. For Firefox, a
`libva-v4l2-request` VA-API driver might still make libva work
through Mozilla's VAAPI backend even on stock Firefox; this is
to be verified in Phase 0 inventory.
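
The cell IDs in the table follow a regular pattern (`C-<session>-<browser>-<decode>`), so they can be generated rather than typed. The sketch below reproduces the twelve IDs in table order. The constant and function names are hypothetical.

```python
from itertools import product

# Short codes as used in the cell IDs of the matrix table.
SESSIONS = ("W", "X")               # W = with-KWin (Plasma Wayland), X = without-KWin (X11)
BROWSERS = ("brave", "chrf", "ff")  # Brave, chromium-fourier, Firefox
DECODES = ("sw", "libva")           # full SW decode, libva decode


def cell_ids():
    """Enumerate every matrix cell ID, row by row in the table's order."""
    return [
        f"C-{session}-{browser}-{decode}"
        for browser, decode, session in product(BROWSERS, DECODES, SESSIONS)
    ]


cells = cell_ids()
print(len(cells))
print(cells)
```

Generating the IDs keeps `metrics.csv` keys consistent with the table. A cell that turns out N/A (libva not engaging) can be recorded under its ID, not silently dropped.
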
### What "cutting out the KWin compositor" means
This campaign uses **X11 session with no compositor in the
display path** as the "without-KWin" cell. Specifically:
- Native Xorg server, NOT XWayland (XWayland would still go
through KWin for display, defeating the purpose).
- Window manager that does NOT composite by default — e.g.
openbox, fluxbox, xfwm4-with-compositing-off, i3, twm.
Plasma X11 uses `kwin_x11` as compositing WM, which is
still a "KWin compositor" — it does not satisfy "cut KWin
out" and is **excluded** from the without-KWin cell.
- Browser windowed (not fullscreen). Even on a non-compositing
WM, fullscreen browsers may engage XPresent direct
presentation paths — testing windowed isolates the
baseline non-compositor windowed display path.
The exact WM choice is a Phase 0 inventory decision (which
WMs are available on ohm, which install cleanly, which
SDDM-advertised sessions exist). Default candidate: openbox.
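
For the "which SDDM-advertised sessions exist" part of that inventory, a read-only sketch follows. It assumes the conventional session directories `/usr/share/xsessions` and `/usr/share/wayland-sessions`, which an SDDM configuration on ohm could override. `list_sessions` is a hypothetical helper.

```python
import configparser
from pathlib import Path


def list_sessions(session_dir):
    """Return (file name, Name, Exec) for each .desktop session file in a directory."""
    sessions = []
    for path in sorted(Path(session_dir).glob("*.desktop")):
        # Desktop files are INI-like; disable interpolation so '%' in Exec lines is kept.
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        parser.read(path, encoding="utf-8")
        if parser.has_section("Desktop Entry"):
            entry = parser["Desktop Entry"]
            sessions.append((path.name, entry.get("Name", ""), entry.get("Exec", "")))
    return sessions


# Assumed default locations; a missing directory simply yields no entries.
for session_dir in ("/usr/share/xsessions", "/usr/share/wayland-sessions"):
    for file_name, name, exec_line in list_sessions(session_dir):
        print(f"{session_dir}: {file_name} | {name} | {exec_line}")
```

An `openbox.desktop` entry appearing under the X11 directory would indicate the default candidate is already selectable at the login screen.
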
### Three plausible outcome shapes
- **(α)** Without-KWin is materially faster across all 6
cells: confirms the KWin compositor cost is a real
bottleneck on this hardware, and X11-session-without-
compositor becomes the recommended daily-driver
configuration for video work on PineTab2.
- **(β)** Without-KWin is comparable or only marginally
faster: the compositor isn't the bottleneck; the drop
phenomenon is hardware/kernel/Mesa-bound, and the
predecessor's Phase 8 closure stands.
- **(γ)** Mixed picture per browser × decode path: e.g.
libva paths benefit but SW paths don't; or Firefox benefits
but chromium-class clients don't. Each cell becomes its own
characterisation.
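
The three shapes can be told apart mechanically once each of the six rows has a without-KWin / with-KWin ratio. The sketch below is illustrative only. `outcome_shape` is a hypothetical name, and the `material` threshold is deliberately a required argument because this document does not define what counts as "materially faster" (Phase 1 thresholds are drawn against the in-session A1 baseline). The example ratios are made up.

```python
def outcome_shape(speedups, material):
    """Classify per-row speedup ratios (without-KWin / with-KWin) as alpha, beta or gamma.

    `material` is the ratio at or above which a row counts as materially faster.
    """
    if not speedups:
        raise ValueError("no rows measured")
    faster = {row for row, ratio in speedups.items() if ratio >= material}
    if len(faster) == len(speedups):
        return "alpha"   # every row materially faster without KWin
    if not faster:
        return "beta"    # no row materially faster
    return "gamma"       # mixed picture per browser x decode path


# Made-up ratios for illustration; 1.10 is a placeholder threshold, not a campaign value.
example = {
    "brave-sw": 1.02, "brave-libva": 1.01,
    "chrf-sw": 1.03, "chrf-libva": 1.25,
    "ff-sw": 1.00, "ff-libva": 1.18,
}
print(outcome_shape(example, material=1.10))
```

With these placeholder inputs the libva rows clear the threshold and the SW rows do not, so the call returns `"gamma"`.
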
### Open questions before Phase 1 lock
The hardware-overlay-path mechanism is structurally available
on X11 + non-compositing WM. Whether it actually engages for
each of the three browsers is browser-specific and currently
unknown:
- **Brave / Chromium ozone-x11**: Chromium has overlay-support
code (`OverlayProcessor`, `GpuMemoryBufferManager`,
`DCOMPSurface` on Windows; on Linux X11 the path is via
XPresent + DMA-BUF + `OverlayCandidate`). Whether Brave
147 / chromium-fourier 149 actually request hardware-overlay
presentation for a windowed video element under X11 is open.
- **Firefox**: the VAAPIVideoDecoder backend produces
hardware-decoded NV12 dmabufs that the GL compositor consumes
internally. Whether Firefox's X11 backend has a path to
hand the dmabuf to the X server for hardware-overlay
presentation (rather than internally composing to RGB) is
open. Mozilla has a `MOZ_X11_EGL` hint and a "hardware video
overlay" pref but these are not universally engaged.
- **Reference clients**: mpv with `--vo=xv`, or `gst-play-1.0`
with `xvimagesink`, are the candidate Xv presentation paths.
mpv with `--vo=gpu --hwdec=auto-copy --gpu-context=x11`, or
`gst-play-1.0` with `glimagesink`, render through GL and so
serve as GL-path references, not overlay paths. Whether
Xv on this Xorg driver actually lands on a hardware plane is
itself to be verified in Phase 0. **Adding mpv to the matrix
as a reference client** would isolate "does the X11
hardware-overlay path work AT ALL on this hardware" from "do
browsers actually use it." If mpv hardware-overlays cleanly
but browsers don't, the conclusion is "the X11 path is fast,
but browsers leave the speedup on the table."
If the operator agrees, Phase 0 inventory should:
1. Verify Plane 39's NV12-LINEAR availability is reachable to
X11 clients (it is for KWin Wayland; should be for X11 too
since Plane 39 is just a DRM resource), and identify which
X11 path actually programs it (modesetting Xorg driver +
`Option "PageFlip" "true"`, or DRI3-presented buffer ending
up on Plane 39 via the X server's plane allocator).
2. Inventory Brave's, chromium-fourier's, and Firefox's X11
overlay-presentation paths to see which (if any) request
hardware-overlay presentation.
3. Add mpv as a reference X11-overlay client to the matrix,
so the campaign has a known-good comparison point.
### What this question does NOT cover
For clarity, since the predecessor was specifically about
the Wayland-overlay-subsurface composite path:
- This campaign is **not** investigating the wp_subsurface
route. The Wayland-cell of the matrix (with-KWin) measures
whatever browser configuration produces under the existing
Plasma Wayland session — windowed, default profile, default
flags. It's a measurement of the as-shipped Plasma Wayland
stack from the user's perspective, not a probe of a
specific KWin code path.
- The Δ_present-46 ms finding from the predecessor is
testable as a free side-finding under both axes (Wayland
and X11) but is not the campaign's primary question.
- Daily-driver fitness (apps that break under X11, touchscreen
behavior, multi-monitor edge cases, etc.) is **not in
scope**. The campaign's deliverable is the matrix above; if
any cell is decisively faster, daily-driver-fitness becomes
a follow-up campaign.
## What's NOT in scope (working assumption)
Until the research question is confirmed, the following are
treated as out of scope so they don't slip into Phase 1
prematurely:
- Patches to KWin, Xorg, kwin-fourier, qt6-base-fourier, or any
other component on ohm. This is **research**, not
patch-development. Per non-upstreaming default, MR/bug-report
filing is explicitly tasked and not scheduled here.
- The Δ_present-46 ms finding's investigation. It's a known
hook from the predecessor; whether this campaign chases it
depends on the locked research question.
- Reverting predecessor tooling state. Governor, baloo,
`qt6-base-fourier`, `kwin-fourier` stay as-is unless the
operator decides otherwise.
- Filing bugs for any of the predecessor's three documented
candidate findings. The same non-upstreaming default applies.
## What Phase 0 will deliver, regardless of framing
Even before the research question is locked, the following are
useful Phase 0 deliverables that don't depend on the specific
question:
1. **State snapshot of ohm under current Plasma Wayland**
captured at campaign start. This is the *before* photo for
any future X11 vs Wayland comparison. Unattended-tractable
(just scripted SSH).
2. **Inventory of available X11 paths on ohm**: what packages
are installed, what session candidates SDDM advertises,
what would need to be installed to enable a Plasma X11
session, what alternate WMs are available. Read-only,
unattended-tractable.
3. **Inventory of measurement instruments that work under
X11**: `xtrace`, `xprop`, `xrandr --verbose --query`, perf
on `Xorg` PID, frame-timing extraction options. Read-only.
4. **A1 baseline** under current Plasma Wayland: re-run N=3
reps of the predecessor's `kwin_timing_nodebug` condition
immediately at the start of this campaign, so the
Wayland-vs-X11 comparison has a same-session anchor.
This is the "set the baseline before instrument changes"
discipline from `feedback_replicate_baseline_first.md`.
These steps are unblocked. They don't commit to a specific
research question and they produce evidence that's useful
under any of the candidate framings.