# Phase 0: substrate and locked research question

This is the campaign's Phase 0 doc: the locked research question (see § "Research question (LOCKED 2026-05-03)" below), the substrate inherited as *context* from the predecessor `kwin_overlay_subsurface`, and the open questions this campaign answers in-session before Phase 1 binding cells lock.

## Campaign-contained data discipline (governing rule)

**This campaign acquires all its own measurement data from scratch in this session.** Predecessor measurement numbers (drop counts, perf percentages, Δ_present medians, kwin %CPU values, threshold values) are documented for *context* but **never imported as binding cells, comparison targets, or success criteria**.

Concretely, in this Phase 0 doc:

- Numbers from `kwin_overlay_subsurface` Phase 3 / Phase 3-prime (e.g. "C5' had 32 drops total / 22 post-warmup"; "median Δ_present 45.85 ms"; "kwin %CPU median 36.90") are quoted **only as evidence of past measurements that may or may not reproduce**. They establish the SHAPE of what's measurable and the fact that measurement infrastructure works on this hardware. They do **not** establish "this is the baseline the X11 cells will be compared against."
- The Phase 0 A1 baseline rep (worklist.md) is the in-session anchor for any with-KWin Wayland measurement this campaign references.
- No `metrics.csv` row in this campaign is populated from predecessor data, even if the predecessor measured an identical condition.
- If an X11 cell appears to "match the cage Phase 0 number," that's incidental: the cage Phase 0 number is not the test this campaign is running. The test is "X11 vs in-session Wayland baseline."

The discipline lesson is concrete: the predecessor's Phase 1 binding cell `drops_post_warmup == 0` was anchored to a single ohm_gl_fix Phase 0 cage measurement (7 drops total / 0 post-warmup). Three weeks of phase planning ran on the assumption that floor existed.
At N=3 in-session replication on the closing day of the predecessor, that floor was missing (cage today: 22 / 26 / 56 post-warmup). The campaign closed without patch. **A 30-minute N=3 same-session baseline check on day 1 would have made the campaign different, or made it honestly close earlier.** This campaign acts on that lesson.

## Predecessor close-out summary (context, not data)

[`../kwin_overlay_subsurface/phase8_handover.md`](../kwin_overlay_subsurface/phase8_handover.md) (closed 2026-05-03 without patch). Three independent reasons no patch landed (numbers below are quoted as historical context per the governing rule above):

1. The predecessor's locked Phase 1 reference floor (`drops_post_warmup == 0` from cage) was unreachable in the predecessor's closing N=3 measurement session, despite the same chromium-fourier binary, same hardware, same kernel, same Mesa, and same kwin-fourier as the original Phase 0 measurement. KWin direct's number reproduced; cage's 0-floor did not. Numbers quoted in the predecessor's `phase3_prime_findings.md`; not used here as a baseline for any cell.
2. The campaign's surface-of-investigation (`wp_subsurface` overlay route) is not engaged by `brave_drops_test.html`. Chromium-fourier renders the video element via internal compositing into its main browser window surface, which is a single-surface case.
3. The Phase 2 hot-path hypothesis (`glEGLImageTargetTexture2DOES` dominates `kwin_wayland`'s per-frame cost) was rejected by Phase 3 perf measurement with 100×-margin on the wrong side of the threshold.

The diagnostic loop terminated at "the campaign's premise was N=1 to begin with, and the N=3 in-session re-measurement doesn't replicate it." This is filed as a feedback memory: *replicate the N=1 baseline at N=3 in the same session BEFORE building multi-phase infrastructure around it*.
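
The replication lesson can be phrased as a pre-flight gate. The following is a minimal sketch, assuming post-warmup drop counts have already been extracted per repetition; the function name `floor_replicates` and its `tolerance` parameter are hypothetical, and the example numbers are the predecessor's closing-day cage reps quoted above (historical context only).

```python
def floor_replicates(drops_per_rep, floor, min_reps=3, tolerance=0):
    """True only if at least `min_reps` same-session reps all land at or below floor + tolerance."""
    if len(drops_per_rep) < min_reps:
        raise ValueError(f"need at least {min_reps} reps, got {len(drops_per_rep)}")
    return all(d <= floor + tolerance for d in drops_per_rep)


# Predecessor's closing-day cage reps (post-warmup drops); not a baseline for this campaign.
cage_reps = [22, 26, 56]
print(floor_replicates(cage_reps, floor=0))  # False: the 0-drop floor did not replicate
```

Refusing to answer with fewer than three reps is the point of the gate: an N=1 measurement cannot pass it by accident.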

## What stays valid from the predecessor

Durable substrate listed in `kwin_overlay_subsurface/phase8_handover.md` § "What's left for a future session to pick up":

- **Phase 1 scanout-promotion archaeology** (rockchip-drm RK3568 plane format/modifier table, KWin v6.6.4 promotion predicate). Plane 39 (Primary, NV12 LINEAR) is the GL framebuffer; Plane 45 (Overlay) does not advertise NV12 in any modifier. Both KWin scanout-promotion paths are structurally rejected for windowed Brave on this DRM driver. This holds regardless of display server.
- **Phase 2 H1 file:line** in `kwin_overlay_subsurface/phase2_source_findings.md`. Cold per Phase 3 measurement; informational only.
- **Phase 2-prime Shape C source-read** of `Display::dispatchEvents` and `TransactionFence` in KWin's `src/wayland/`. Specific to the Wayland path; **not relevant to an X11-session campaign**. The X11 path uses different KWin surface plumbing (`kwin_x11`) and a different per-frame protocol (X11 Composite extension + Damage + XPresent), not Wayland protocol dispatch.
- **Δ_present-46 ms reproducible side-finding** under Plasma Wayland. Across all measured conditions (chromium-fourier on KWin, chromium-fourier in cage, stock Brave on KWin), median Δ_present was 41-46 ms on a 60 Hz panel, a stable ~2.7-vsync queue depth. This finding is independent of the cage breakdown and **directly testable under X11** as a comparison point.
- **Measurement infrastructure**: `kwin_overlay_subsurface/scripts/wayland_debug_to_csv.py` (libwayland 1.21+ format, 17 unit tests passing) + `phase3_prime_runs/run_browser.sh` orchestrator on ohm (handles `WAYLAND_DEBUG=1` capture, perf record, top sampling, drops trajectory extraction, kill-cleanly). **The WAYLAND_DEBUG portion does not apply under X11**; an X11 equivalent would be different tooling (`xtrace`, `xev`, or XCB-debug instrumentation if the client emits any). The perf+top+drops capture portion remains usable under X11 unchanged.
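
The vsync-depth reading of the Δ_present side-finding is plain arithmetic. The following small sketch assumes a nominal 60 Hz refresh (the panel's exact refresh rate is not stated here); `vsync_depth` is an illustrative helper, not part of the predecessor's scripts.

```python
def vsync_depth(latency_ms, refresh_hz=60.0):
    """Express a latency in milliseconds as a number of vsync intervals."""
    vsync_ms = 1000.0 / refresh_hz  # about 16.67 ms at 60 Hz
    return latency_ms / vsync_ms


# Range endpoints and the one median quoted earlier in this doc (historical context only).
for ms in (41.0, 45.85, 46.0):
    print(f"{ms:.2f} ms -> {vsync_depth(ms):.2f} vsync intervals")
```

Under that assumption the 41-46 ms range spans roughly 2.5 to 2.8 vsync intervals, and the ~2.7 figure corresponds to the upper end of the range.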

## Current ohm state (carry-over from predecessor)

Per `kwin_overlay_subsurface/phase1_evidence/ohm_tooling_revert_log.md`, not reverted at predecessor close-out:

- `qt6-base-fourier 1:6.11.0-3`
- `kwin-fourier 1:6.6.4-3` (Wayland-side compositor; not in the hot path under an X11 session)
- `mesa 1:26.0.5-1`
- CPU governor pinned to `performance`
- Baloo permanently disabled
- `drm-info 2.9.0-1`
- Active session: `startplasma-wayland` on tty2, `kwin_wayland` PID 3927 (as of 2026-05-03 03:05 UTC).
- Browser binaries available: `/tmp/chromium-ohm-gl-fix-step2/chrome` (chromium-fourier, Step 1 + Step 2 patches, 149.0.7812.0), `/usr/bin/brave` (`brave-bin 1:1.89.145-1`).

If this campaign needs to switch ohm to an X11 session, that is a session-level operator action (logout, switch via SDDM, log back in). It cannot be done unattended.

## Research question (LOCKED 2026-05-03)

> *"Does cutting out the KWin compositor enable faster video
> display of Brave, chromium-fourier, and Firefox, for full
> SW decoding, and for libva decoding (where possible), on
> PineTab2 RK3568?"*

### Mechanism the question targets

Operator-supplied context 2026-05-03:

> *"hantro emits NV12 which the GPU can't put on a
> compositeable plane. So that is the real bottleneck of
> Wayland."*

This connects directly to the predecessor's Phase 1 finding (`kwin_overlay_subsurface/phase2_source_findings.md`:170-229):

- Hantro VPU decodes H.264 video into NV12 dmabufs (`DRM_FORMAT_NV12`, `DRM_FORMAT_MOD_LINEAR`).
- rockchip-drm's only NV12-LINEAR-capable plane is the Primary plane (Plane 39 on CRTC 52), which the running KWin uses for its GL framebuffer.
- The overlay plane (Plane 45) advertises no NV12 in any modifier in `IN_FORMATS`.
- Therefore **no rockchip-drm scanout plane can accept the NV12 buffer hantro produces while KWin owns the primary plane.** Some compositing step must convert NV12 → RGB before display.
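
The plane-format argument above is a set-membership check. The following toy model assumes only the simplified plane table stated in this doc (Plane 39 advertises NV12 LINEAR; Plane 45 advertises no NV12). The `XRGB8888` entries and the `owner` field are illustrative placeholders, not a transcription of `drm_info` output.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Plane:
    plane_id: int
    kind: str
    formats: set = field(default_factory=set)  # (fourcc, modifier) pairs the plane advertises
    owner: Optional[str] = None                # process currently holding the plane, if any


def scanout_candidates(planes, fourcc, modifier):
    """IDs of planes that advertise the buffer's format/modifier and are not already owned."""
    return [p.plane_id for p in planes
            if (fourcc, modifier) in p.formats and p.owner is None]


planes = [
    Plane(39, "Primary", {("NV12", "LINEAR"), ("XRGB8888", "LINEAR")}, owner="kwin_wayland"),
    Plane(45, "Overlay", {("XRGB8888", "LINEAR")}),
]
print(scanout_candidates(planes, "NV12", "LINEAR"))  # [] while KWin holds Plane 39
planes[0].owner = None
print(scanout_candidates(planes, "NV12", "LINEAR"))  # [39] once the Primary plane is free
```

The second call models the situation the without-KWin cells probe: the same plane table, but with no compositor holding the only NV12-capable plane.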

The predecessor named the *constraint* (Path B rejected at the format/modifier intersection) but the *consequence* ("some component must GL-composite NV12 → RGB on the GPU because nothing else on this hardware can put NV12 on a scanout plane") was not made explicit. That consequence is this campaign's motivating insight:

- **Under Plasma Wayland:** when the browser engages the Wayland subsurface route (chromium's `WaylandBufferManagerHost::CommitOverlays`), KWin receives an NV12 dmabuf and must GL-composite it. **KWin's compositor is the GL-composite step.** When the browser does NOT engage the subsurface route (the predecessor's measured case on `brave_drops_test.html`: zero `wp_subsurface` in the trace), the browser itself converts NV12 → RGB in its own GL context and hands KWin only RGB; KWin then composites the RGB to its primary plane.
- **Under X11 without a compositor:** there is no separate compositor process. Two paths are open to the client:
  - *RGB-composite path* (browser converts NV12 → RGB in its own GL context and presents the RGB result via XPresent/DRI3 to the X server, which schedules a page-flip on the same Primary plane KWin would have used). One fewer hand-off than the Wayland-with-subsurface case but the same GL-composite cost as the no-subsurface Wayland case.
  - **Hardware-overlay path** (operator-supplied context 2026-05-03: *"a X11 pipeline would route around that by giving a portion of screen real estate directly to the video pipeline"*). The X server allocates the Primary plane (Plane 39, supports NV12 LINEAR) to the video region and the Overlay plane (Plane 45, supports RGB/AFBC) to the rest of the desktop. Hardware-blended at scanout time.
    **No GL-composite of NV12 anywhere: the cost the operator named as "the real bottleneck" is structurally avoided.**

This second X11 path is what Wayland compositors as designed today cannot do on rockchip-drm-class hardware: KWin Wayland *must* own the Primary plane for its compositor framebuffer (because the Wayland model is "compositor presents one merged surface per output"), so it cannot give Plane 39 to a video-region NV12 buffer while putting the rest of the desktop on Plane 45. X11 + non-compositing WM has no such constraint: different windows can be assigned to different planes by the X server's plane allocator.

This is the X11 hardware-overlay mechanism that historically made X11 desktops good at video playback (Xv from the late 1990s, and the modern equivalents via DRI3 + XPresent + Composite-redirection-disabled). It is structurally absent in Wayland-with-monolithic-compositor designs.

### Hypothesis the matrix tests

There are three potentially separable costs:

1. **The mandatory NV12 → RGB GL conversion**, which is *forced* on Wayland-with-KWin because KWin must own the only NV12-LINEAR-capable plane on this hardware for its compositor framebuffer. **This cost is structurally avoidable** under X11 + non-compositing WM via hardware-plane-overlay (per the operator-supplied insight above). Whether browsers can be coaxed to *use* the X11 hardware-overlay path, rather than internally compositing to RGB before presenting, is browser-specific (see Open questions below).
2. **The fallback GL-composite cost** when the hardware-overlay path doesn't engage. Both Wayland and X11 pay this when the buffer shape doesn't match a plane; it just runs in different processes (KWin under Wayland, browser under X11).
3. **The per-frame compositor overhead** independent of NV12: dmabuf import, transaction apply, presentation-feedback wiring, and frame-callback delivery, which the predecessor measured at ~30-37 % of `kwin_wayland`'s CPU during steady-state video playback even when KWin only saw RGB surfaces.

The X11 hypothesis is strongest if cost (1) is dominant on the matrix's with-KWin cells AND the X11 cells trigger the hardware-overlay path. The X11 hypothesis is weakest if cost (1) is small and cost (3) is small, in which case the "cutting out KWin" experiment would show only marginal differences. The matrix below is designed to surface which of (1), (2), (3) dominates per browser × decode path.

"Faster video display" is operationally **a combination of**:

- **Effective fps actually rendered** (= `(totalVideoFrames - droppedVideoFrames) / elapsed_s` from `getVideoPlaybackQuality()`, since `totalVideoFrames` counts dropped frames as well as presented ones; for a 30 fps source the upper bound is 30, and the question is how close).
- **Drop count** over the same 70 s window (`droppedVideoFrames`).
- **End-to-end latency** if testable (commit → present; testable on Wayland via `wp_presentation_feedback`, testable on X11 via `XPresent` extension or `RandR` vblank events; protocol-side measurement under each display-server).
- **Compositor + browser CPU at steady state** (the cost saved by cutting the compositor is the upper bound on the patch-payoff if a future campaign tries to optimise the compositor instead of removing it).
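
The first two metrics in the list above reduce to a few lines of arithmetic. The sketch below assumes `totalVideoFrames` includes dropped frames (as the Media Playback Quality definition describes it), so it reports both the raw ratio and the presented-frame ratio. `PlaybackSample`, `summarize`, and the example counts are hypothetical, not campaign data.

```python
from dataclasses import dataclass


@dataclass
class PlaybackSample:
    elapsed_s: float           # wall-clock length of the measurement window
    total_video_frames: int    # getVideoPlaybackQuality().totalVideoFrames
    dropped_video_frames: int  # getVideoPlaybackQuality().droppedVideoFrames


def summarize(sample, source_fps=30.0):
    """Derive fps and drop figures for one measurement window."""
    presented = sample.total_video_frames - sample.dropped_video_frames
    return {
        "total_fps": sample.total_video_frames / sample.elapsed_s,
        "presented_fps": presented / sample.elapsed_s,
        "drops": sample.dropped_video_frames,
        "fraction_of_source": presented / (source_fps * sample.elapsed_s),
    }


# Made-up counts for a 70 s window of a 30 fps source.
print(summarize(PlaybackSample(elapsed_s=70.0, total_video_frames=2100, dropped_video_frames=21)))
```

Keeping `total_fps` next to `presented_fps` makes it visible when a cell looks like "30 fps" only because dropped frames are still counted in the total.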

### Experimental matrix

Six 2-axis cells (3 browsers × 2 decode paths) × 2 session conditions (with-KWin / without-KWin):

| Browser | Decode | with-KWin (Plasma Wayland) | without-KWin (X11 session, no compositor) |
|---|---|---|---|
| Brave 147 | full SW | C-W-brave-sw | C-X-brave-sw |
| Brave 147 | libva (if it works) | C-W-brave-libva | C-X-brave-libva |
| chromium-fourier 149 (Step 1 + Step 2) | full SW | C-W-chrf-sw | C-X-chrf-sw |
| chromium-fourier 149 | libva (Step 1 enables it) | C-W-chrf-libva | C-X-chrf-libva |
| Firefox | full SW | C-W-ff-sw | C-X-ff-sw |
| Firefox | libva | C-W-ff-libva | C-X-ff-libva |

The "(if it works)" / "where possible" qualifier follows the operator's directive: libva on rockchip-drm RK3568 only works on chromium-fourier (Step 1 ports `libva-v4l2-request`); for stock Brave 147 and stock Firefox, libva probably doesn't engage and those cells are documented N/A. For Firefox, the Mesa-side `libva-v4l2-request` may make libva work via Mozilla's VAAPI backend even on stock Firefox; this is to be verified in Phase 0 inventory.

### What "cutting out the KWin compositor" means

This campaign uses **X11 session with no compositor in the display path** as the "without-KWin" cell. Specifically:

- Native Xorg server, NOT XWayland (XWayland would still go through KWin for display, defeating the purpose).
- Window manager that does NOT composite by default, e.g. openbox, fluxbox, xfwm4-with-compositing-off, i3, twm. Plasma X11 uses `kwin_x11` as compositing WM, which is still a "KWin compositor"; it does not satisfy "cut KWin out" and is **excluded** from the without-KWin cell.
- Browser windowed (not fullscreen). Even on a non-compositing WM, fullscreen browsers may engage XPresent direct presentation paths; testing windowed isolates the baseline non-compositor windowed display path.

The exact WM choice is a Phase 0 inventory decision (which WMs are available on ohm, which install cleanly, which SDDM-advertised sessions exist).
Default candidate: openbox.

### Three plausible outcome shapes

- **(α)** Without-KWin is materially faster across all 6 cells: confirms the KWin compositor cost is a real bottleneck on this hardware, and X11-session-without-compositor becomes the recommended daily-driver configuration for video work on PineTab2.
- **(β)** Without-KWin is comparable or only marginally faster: the compositor isn't the bottleneck; the drop phenomenon is hardware/kernel/Mesa-bound, and the predecessor's Phase 8 closure stands.
- **(γ)** Mixed picture per browser × decode path: e.g. libva paths benefit but SW paths don't; or Firefox benefits but chromium-class clients don't. Each cell becomes its own characterisation.

### Open questions before Phase 1 lock

The hardware-overlay-path mechanism is structurally available on X11 + non-compositing WM. Whether it actually engages for each of the three browsers is browser-specific and currently unknown:

- **Brave / Chromium ozone-x11**: Chromium has overlay-support code (`OverlayProcessor`, `GpuMemoryBufferManager`, `DCOMPSurface` on Windows; on Linux X11 the path is via XPresent + DMA-BUF + `OverlayCandidate`). Whether Brave 147 / chromium-fourier 149 actually request hardware-overlay presentation for a windowed video element under X11 is open.
- **Firefox**: VAAPIVideoDecoder backend produces hardware-decoded NV12 dmabufs that the GL compositor consumes internally. Whether Firefox's X11 backend has a path to hand the dmabuf to the X server for hardware-overlay presentation (rather than internally composing to RGB) is open. Mozilla has a `MOZ_X11_EGL` hint and a "hardware video overlay" pref but these are not universally engaged.
- **Reference clients**: mpv with `--vo=xv` or `--vo=gpu --hwdec=auto-copy --gpu-context=x11`, or `gst-play-1.0` with `xvimagesink` or `glimagesink`, are known-good X11 hardware-overlay paths.
  **Adding mpv to the matrix as a reference client** would isolate "does the X11 hardware-overlay path work AT ALL on this hardware" from "do browsers actually use it." If mpv hardware-overlays cleanly but browsers don't, the conclusion is "the X11 path is fast, but browsers leave the speedup on the table."

If the operator agrees, Phase 0 inventory should:

1. Verify Plane 39's NV12-LINEAR availability is reachable to X11 clients (it is for KWin Wayland; should be for X11 too since Plane 39 is just a DRM resource), and identify which X11 path actually programs it (modesetting Xorg driver + `Option "PageFlip" "true"`, or DRI3-presented buffer ending up on Plane 39 via the X server's plane allocator).
2. Inventory Brave's, chromium-fourier's, and Firefox's X11 overlay-presentation paths to see which (if any) request hardware-overlay presentation.
3. Add mpv as a reference X11-overlay client to the matrix, so the campaign has a known-good comparison point.

### What this question does NOT cover

For clarity, since the predecessor was specifically about the Wayland-overlay-subsurface composite path:

- This campaign is **not** investigating the wp_subsurface route. The Wayland-cell of the matrix (with-KWin) measures whatever browser configuration produces under the existing Plasma Wayland session: windowed, default profile, default flags. It's a measurement of the as-shipped Plasma Wayland stack from the user's perspective, not a probe of a specific KWin code path.
- The Δ_present-46 ms finding from the predecessor is testable as a free side-finding under both axes (Wayland and X11) but is not the campaign's primary question.
- Daily-driver fitness (apps that break under X11, touchscreen behavior, multi-monitor edge cases, etc.) is **not in scope**. The campaign's deliverable is the matrix above; if any cell is decisively faster, daily-driver-fitness becomes a follow-up campaign.
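
Since the deliverable is the matrix, its cell IDs can be generated rather than typed by hand, which keeps any results file in step with the table. The sketch below reproduces the `C-<session>-<browser>-<decode>` naming used in the matrix; the dictionary layout and the `cell_ids` helper are illustrative assumptions.

```python
from itertools import product

# Short keys as they appear in the matrix cell names.
BROWSERS = {"brave": "Brave 147", "chrf": "chromium-fourier 149", "ff": "Firefox"}
DECODES = ("sw", "libva")
SESSIONS = {"W": "with-KWin (Plasma Wayland)", "X": "without-KWin (X11 session, no compositor)"}


def cell_ids():
    """All matrix cell IDs, ordered by browser, then decode path, then session."""
    return [f"C-{s}-{b}-{d}" for b, d, s in product(BROWSERS, DECODES, SESSIONS)]


ids = cell_ids()
print(len(ids))  # 12
print(ids[:2])   # ['C-W-brave-sw', 'C-X-brave-sw']
```

A reference client such as mpv would need its own naming decision; none is assumed here.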

## What's NOT in scope (working assumption)

Until the research question is confirmed, the following are treated as out of scope so they don't slip into Phase 1 prematurely:

- Patches to KWin, Xorg, kwin-fourier, qt6-base-fourier, or any other component on ohm. This is **research**, not patch-development. Per non-upstreaming default, MR/bug-report filing is explicitly tasked and not scheduled here.
- Investigation of the Δ_present-46 ms finding. It's a known hook from the predecessor; whether this campaign chases it depends on the locked research question.
- Reverting predecessor tooling state. Governor, baloo, `qt6-base-fourier`, `kwin-fourier` stay as-is unless the operator decides otherwise.
- Filing a bug for any of the predecessor's three documented candidate findings. Same non-upstreaming default applies.

## What Phase 0 will deliver, regardless of framing

Even before the research question is locked, the following are useful Phase 0 deliverables that don't depend on the specific question:

1. **State snapshot of ohm under current Plasma Wayland** captured at campaign start. This is the *before* photo for any future X11 vs Wayland comparison. Unattended-tractable (just scripted SSH).
2. **Inventory of available X11 paths on ohm**: what packages are installed, what session candidates SDDM advertises, what would need to be installed to enable a Plasma X11 session, what alternate WMs are available. Read-only, unattended-tractable.
3. **Inventory of measurement instruments that work under X11**: `xtrace`, `xprop`, `xrandr --verbose --query`, perf on `Xorg` PID, frame-timing extraction options. Read-only.
4. **A1 baseline** under current Plasma Wayland: re-run a single rep of the predecessor's `kwin_timing_nodebug` condition immediately at the start of this campaign, so the comparison Wayland-vs-X11 has a same-session anchor. This is the "set the baseline before instrument changes" discipline from `feedback_replicate_baseline_first.md`.

These steps are unblocked.
They don't commit to a specific research question and they produce evidence that's useful under any of the candidate framings.
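
The session-candidate part of the X11 inventory (deliverable 2) is read-only and easy to script. The sketch below assumes the conventional session directories `/usr/share/xsessions` and `/usr/share/wayland-sessions`; whether SDDM on ohm is configured to read those paths is itself something the inventory should confirm. `list_sessions` is a hypothetical helper.

```python
import configparser
from pathlib import Path


def list_sessions(session_dir):
    """Return {file stem: Exec line} for each .desktop session file in session_dir."""
    sessions = {}
    for path in sorted(Path(session_dir).glob("*.desktop")):
        # Disable interpolation so '%' in Exec lines is kept literally.
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        parser.read(path, encoding="utf-8")
        if parser.has_section("Desktop Entry"):
            sessions[path.stem] = parser["Desktop Entry"].get("Exec", "")
    return sessions


if __name__ == "__main__":
    # Assumed default locations; a missing directory simply yields an empty dict.
    for d in ("/usr/share/xsessions", "/usr/share/wayland-sessions"):
        print(d, list_sessions(d))
```

The script only reads `.desktop` files, so it fits the "read-only, unattended-tractable" constraint of the inventory steps.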