Operator-supplied research question 2026-05-03: "Does cutting out the KWin compositor enable faster video display of Brave, chromium-fourier, and Firefox, for full SW decoding and for libva decoding (where possible), on PineTab2 RK3568?"

Operator-supplied mechanism 2026-05-03 (two messages):

1. "hantro emits NV12 which the GPU can't put on a compositeable plane. So that is the real bottleneck of Wayland." Connects directly to predecessor's Phase 1 finding (kwin_overlay_subsurface/phase2_source_findings.md:170-229): rockchip-drm overlay Plane 45 advertises no NV12 modifier; Primary Plane 39 supports NV12 LINEAR but is owned by KWin for its compositor framebuffer. Predecessor named the constraint but not the consequence: NV12 → RGB GL-composite is forced on Wayland-with-KWin regardless of which protocol path the browser uses.
2. "A X11 pipeline would route around that by giving a portion of screen real estate directly to the video pipeline." The X11 hardware-overlay path: with X11 + non-compositing WM, the X server can allocate Plane 39 (NV12 LINEAR) to the video region and Plane 45 (RGB AFBC) to the rest of the desktop. Hardware-blended at scanout. NO GL-composite anywhere; the cost the operator named as "the real bottleneck" is structurally avoided. This is the X11 hardware-overlay mechanism that historically made X11 desktops good at video playback (Xv → modern DRI3 + XPresent + Composite-redirection-disabled). Wayland-with-monolithic-compositor designs cannot use this freedom: the compositor must own the Primary plane, so the plane-allocation freedom required to put NV12 video on Plane 39 alongside RGB chrome on Plane 45 isn't available.

phase0_findings.md updated with:
- Locked research question + 12-cell experimental matrix (3 browsers × 2 decode paths × 2 sessions; some N/A).
- Three separable cost components the matrix tests for (mandatory NV12→RGB GL conversion if hardware-overlay doesn't engage, fallback GL-composite, per-frame compositor overhead independent of NV12).
- Open questions about whether browsers actually request hardware-overlay presentation under X11, or whether they always internally composite to RGB.
- Recommendation to add mpv as a reference X11-overlay client: distinguishes "X11 path works on this hardware" from "browsers actually use the X11 path."

worklist.md updated:
- Phase 0 motivation + matrix items ticked.
- Pre-Phase-1 inventory broken out: state snapshot, X11 path inventory, browser-overlay-path inventory, mpv reference, X11 measurement-tool inventory, A1 Wayland baseline anchor.
- Phase 1 sketch: binding cells per matrix cell, clear-pass / clear-fail thresholds, measurement protocol mirroring the predecessor's phase3_protocol.md structure.

README banner updated to reflect locked motivation + mechanism summary + matrix shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 — substrate and provisional research question
This is the campaign's Phase 0 substrate doc: what we already
know from the predecessor kwin_overlay_subsurface close-out,
what's open, and what the candidate research question looks
like. The research question was drafted as provisional pending
operator confirmation; the operator has since locked it (see
"Research question (LOCKED 2026-05-03)" below).
Predecessor close-out summary
../kwin_overlay_subsurface/phase8_handover.md
(closed 2026-05-03 without patch). Three independent reasons
no patch landed:
- The campaign's locked Phase 1 reference floor (drops_post_warmup == 0 from cage) is unreachable at N=3 today. Today's median is 26 post-warmup with the same chromium-fourier binary, same hardware, same kernel, same Mesa, same kwin-fourier. KWin direct reproduces Phase 0's 29 post-warmup, but cage now also drops ~22-56 post-warmup instead of Phase 0's 0.
- The campaign's surface-of-investigation (wp_subsurface overlay route) is not engaged by brave_drops_test.html. Chromium-fourier renders the video element via internal compositing into its main browser window surface, which is a single-surface case.
- The Phase 2 hot-path hypothesis (glEGLImageTargetTexture2DOES dominates kwin_wayland's per-frame cost) was rejected by Phase 3 perf measurement with 100×-margin on the wrong side of the threshold.
The diagnostic loop terminated at "the campaign's premise was N=1 to begin with, and the N=3 in-session re-measurement doesn't replicate it." This is filed as a feedback memory: replicate the N=1 baseline at N=3 in the same session BEFORE building multi-phase infrastructure around it.
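The "replicate at N=3 first" rule can be expressed as a small gate that runs before any further infrastructure is built. This is a minimal sketch: the function name, the `min_runs` parameter, and the sample values are assumptions for illustration, not the campaign's actual tooling or measured data.

```python
from statistics import median

def baseline_replicates(samples, floor, min_runs=3):
    """Return True only if enough runs exist and their median meets the floor.

    samples: post-warmup drop counts, one per run (hypothetical input shape).
    floor:   the reference value the campaign wants to build on.
    """
    if len(samples) < min_runs:
        # An N=1 or N=2 observation never counts as a replicated baseline.
        return False
    return median(samples) <= floor

# Illustrative values only; not copied from the predecessor's run logs.
print(baseline_replicates([0], floor=0))           # single run: not replicated
print(baseline_replicates([22, 26, 56], floor=0))  # median 26 misses a floor of 0
```

A gate like this would have flagged the predecessor's reference floor before the multi-phase work started, which is the point of the feedback memory.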
What stays valid from the predecessor
Durable substrate listed in
kwin_overlay_subsurface/phase8_handover.md § "What's left
for a future session to pick up":
- Phase 1 scanout-promotion archaeology (rockchip-drm RK3568 plane format/modifier table, KWin v6.6.4 promotion predicate). Plane 39 (Primary, NV12 LINEAR) is the GL framebuffer; Plane 45 (Overlay) does not advertise NV12 in any modifier. Both KWin scanout-promotion paths are structurally rejected for windowed Brave on this DRM driver. This holds regardless of display server.
- Phase 2 H1 file:line in kwin_overlay_subsurface/phase2_source_findings.md. Cold per Phase 3 measurement; informational only.
- Phase 2-prime Shape C source-read of Display::dispatchEvents and TransactionFence in KWin's src/wayland/. Specific to the Wayland path; not relevant to an X11-session campaign. The X11 path uses different KWin surface plumbing (kwin_x11) and a different per-frame protocol (X11 Composite extension + Damage + XPresent), not Wayland protocol dispatch.
- Δ_present-46 ms reproducible side-finding under Plasma Wayland. Across all measured conditions (chromium-fourier on KWin, chromium-fourier in cage, stock Brave on KWin), median Δ_present was 41-46 ms on a 60 Hz panel, a stable ~2.7-vsync queue depth. This finding is independent of the cage breakdown and directly testable under X11 as a comparison point.
- Measurement infrastructure: kwin_overlay_subsurface/scripts/wayland_debug_to_csv.py (libwayland 1.21+ format, 17 unit tests passing) + phase3_prime_runs/run_browser.sh orchestrator on ohm (handles WAYLAND_DEBUG=1 capture, perf record, top sampling, drops trajectory extraction, kill-cleanly). The WAYLAND_DEBUG portion does not apply under X11; an X11 equivalent would be different tooling (xtrace, xev, or XCB-debug instrumentation if the client emits any). The perf+top+drops capture portion remains usable under X11 unchanged.
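The conversion from a Δ_present figure to a queue depth in vsyncs is simple arithmetic, sketched below so the same number can be recomputed for an X11 run. The function name is an assumption; the 41 ms and 46 ms inputs and the 60 Hz refresh rate are the values quoted above.

```python
def vsync_depth(delta_present_ms, refresh_hz=60.0):
    """Express a commit-to-present latency as a number of refresh intervals."""
    frame_ms = 1000.0 / refresh_hz  # about 16.67 ms at 60 Hz
    return delta_present_ms / frame_ms

for ms in (41.0, 46.0):
    print(f"{ms:.0f} ms -> {vsync_depth(ms):.2f} vsyncs")
```

At 60 Hz the quoted range works out to roughly 2.5 to 2.8 refresh intervals, which is consistent with the "~2.7-vsync" description at the upper end of the range.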
Current ohm state (carry-over from predecessor)
Per kwin_overlay_subsurface/phase1_evidence/ohm_tooling_revert_log.md,
not reverted at predecessor close-out:
- qt6-base-fourier 1:6.11.0-3
- kwin-fourier 1:6.6.4-3 (Wayland-side compositor; not in the hot path under an X11 session)
- mesa 1:26.0.5-1
- CPU governor pinned to performance
- Baloo permanently disabled
- drm-info 2.9.0-1
- Active session: startplasma-wayland on tty2, kwin_wayland PID 3927 (as of 2026-05-03 03:05 UTC).
- Browser binaries available: /tmp/chromium-ohm-gl-fix-step2/chrome (chromium-fourier, Step 1 + Step 2 patches, 149.0.7812.0), /usr/bin/brave (brave-bin 1:1.89.145-1).
If this campaign needs to switch ohm to an X11 session, that is a session-level operator action (logout, switch via SDDM, log back in). It cannot be done unattended.
Research question (LOCKED 2026-05-03)
"Does cutting out the KWin compositor enable faster video display of Brave, chromium-fourier, and Firefox — for full SW decoding, and for libva decoding (where possible) — on PineTab2 RK3568?"
Mechanism the question targets
Operator-supplied context 2026-05-03:
"hantro emits NV12 which the GPU can't put on a compositeable plane. So that is the real bottleneck of Wayland."
This connects directly to the predecessor's Phase 1 finding
(kwin_overlay_subsurface/phase2_source_findings.md:170-229):
- Hantro VPU decodes H.264 video into NV12 dmabufs (DRM_FORMAT_NV12, DRM_FORMAT_MOD_LINEAR).
- rockchip-drm's only NV12-LINEAR-capable plane is the Primary plane (Plane 39 on CRTC 52), which the running KWin uses for its GL framebuffer.
- The overlay plane (Plane 45) advertises no NV12 in any modifier in IN_FORMATS.
- Therefore no rockchip-drm scanout plane can accept the NV12 buffer hantro produces while KWin owns the primary plane. Some compositing step must convert NV12 → RGB before display.
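The format/modifier intersection argument above can be written as a small lookup. This is a sketch under stated assumptions: the `Plane` class and helper names are hypothetical, and the plane table is reduced to the facts stated in the text (the "RGB" entries are placeholders, not a copy of the real IN_FORMATS blob).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plane:
    plane_id: int
    kind: str
    formats: frozenset  # (format, modifier) pairs the plane advertises
    owner: str = ""     # empty string: not claimed by any compositor

def free_planes_for(planes, fmt, modifier):
    """IDs of unclaimed planes that advertise the (format, modifier) pair."""
    return [p.plane_id for p in planes
            if not p.owner and (fmt, modifier) in p.formats]

def rk3568_planes(primary_owner=""):
    # Simplified from the findings above; "RGB" is a placeholder label,
    # not an actual DRM fourcc from the driver's table.
    return [
        Plane(39, "Primary", frozenset({("NV12", "LINEAR"), ("RGB", "LINEAR")}), primary_owner),
        Plane(45, "Overlay", frozenset({("RGB", "AFBC")})),
    ]

# With KWin holding the Primary plane, nothing is left for an NV12 buffer.
print(free_planes_for(rk3568_planes("kwin_wayland"), "NV12", "LINEAR"))
# With no compositor claiming it, Plane 39 is a candidate.
print(free_planes_for(rk3568_planes(), "NV12", "LINEAR"))
```

The model says nothing about whether an X server would actually program Plane 39 that way; it only restates why the with-KWin case has no candidate plane.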
The predecessor named the constraint (Path B rejected at the format/modifier intersection) but the consequence — "some component must GL-composite NV12 → RGB on the GPU because nothing else on this hardware can put NV12 on a scanout plane" — was not made explicit. That consequence is this campaign's motivating insight:
- Under Plasma Wayland: when the browser engages the Wayland subsurface route (chromium's WaylandBufferManagerHost::CommitOverlays), KWin receives an NV12 dmabuf and must GL-composite it. KWin's compositor is the GL-composite step. When the browser does NOT engage the subsurface route (the predecessor's measured case on brave_drops_test.html: zero wp_subsurface in the trace), the browser itself converts NV12 → RGB in its own GL context and hands KWin only RGB; KWin then composites the RGB to its primary plane.
- Under X11 without a compositor: there is no separate compositor process. Two paths are open to the client:
  - RGB-composite path (browser converts NV12 → RGB in its own GL context and presents the RGB result via XPresent/DRI3 to the X server, which schedules a page-flip on the same Primary plane KWin would have used). One fewer hand-off than the Wayland-with-subsurface case but the same GL-composite cost as the no-subsurface Wayland case.
  - Hardware-overlay path (operator-supplied context 2026-05-03: "a X11 pipeline would route around that by giving a portion of screen real estate directly to the video pipeline"). The X server allocates the Primary plane (Plane 39, supports NV12 LINEAR) to the video region and the Overlay plane (Plane 45, supports RGB/AFBC) to the rest of the desktop. Hardware-blended at scanout time. No GL-composite of NV12 anywhere; the cost the operator named as "the real bottleneck" is structurally avoided.
This second X11 path is what Wayland compositors as designed today cannot do on rockchip-drm-class hardware: KWin Wayland must own the Primary plane for its compositor framebuffer (because the Wayland model is "compositor presents one merged surface per output"), so it cannot give Plane 39 to a video-region NV12 buffer while putting the rest of the desktop on Plane 45. X11 + non-compositing WM has no such constraint — different windows can be assigned to different planes by the X server's plane allocator.
This is the X11 hardware-overlay mechanism that historically made X11 desktops good at video playback (Xv from the late 1990s, and the modern equivalents via DRI3 + XPresent + Composite-redirection-disabled). It is structurally absent in Wayland-with-monolithic-compositor designs.
Hypothesis the matrix tests
There are three potentially separable costs:
1. The mandatory NV12 → RGB GL conversion, which is forced on Wayland-with-KWin because KWin must own the only NV12-LINEAR-capable plane on this hardware for its compositor framebuffer. This cost is structurally avoidable under X11 + non-compositing WM via hardware-plane-overlay (per the operator-supplied insight above). Whether browsers can be coaxed to use the X11 hardware-overlay path, rather than internally compositing to RGB before presenting, is browser-specific (see Open questions below).
2. The fallback GL-composite cost when the hardware-overlay path doesn't engage. Both Wayland and X11 pay this when the buffer shape doesn't match a plane; it just runs in different processes (KWin under Wayland, browser under X11).
3. The per-frame compositor overhead independent of NV12: dmabuf import, transaction apply, presentation-feedback wiring, frame-callback delivery. The predecessor measured this at ~30-37 % of kwin_wayland's CPU during steady-state video playback even when KWin only saw RGB surfaces.
The X11 hypothesis is strongest if cost (1) is dominant on the matrix's with-KWin cells AND the X11 cells trigger the hardware-overlay path. The X11 hypothesis is weakest if cost (1) is small and cost (3) is small — in which case the "cutting out KWin" experiment would show only marginal differences.
The matrix below is designed to surface which of (1) (2) (3) dominates per browser × decode path.
"Faster video display" is operationally a combination of:
- Effective fps actually rendered (= (totalVideoFrames - droppedVideoFrames) / elapsed_s from getVideoPlaybackQuality(), because totalVideoFrames also counts dropped frames; for a 30 fps source the upper bound is 30 and the question is how close).
- Drop count over the same 70 s window (droppedVideoFrames).
- End-to-end latency if testable (commit → present; testable on Wayland via wp_presentation_feedback, testable on X11 via the XPresent extension or RandR vblank events; protocol-side measurement under each display server).
- Compositor + browser CPU at steady state (the cost saved by cutting the compositor is the upper bound on the patch-payoff if a future campaign tries to optimise the compositor instead of removing it).
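The first two metrics can be derived from one getVideoPlaybackQuality() sample per run. The sketch below assumes, as the Media Playback Quality definition is commonly read, that totalVideoFrames includes dropped frames, so the presented count is the difference. The function name and the example numbers are hypothetical, not measured results.

```python
def playback_summary(total_frames, dropped_frames, elapsed_s, source_fps=30.0):
    """Summarise one run from a single VideoPlaybackQuality sample.

    total_frames and dropped_frames correspond to totalVideoFrames and
    droppedVideoFrames; elapsed_s is the measurement window in seconds.
    """
    presented = total_frames - dropped_frames
    effective_fps = presented / elapsed_s
    return {
        "effective_fps": effective_fps,
        "dropped": dropped_frames,
        "fraction_of_source": effective_fps / source_fps,
    }

# Hypothetical run: 70 s window of a 30 fps source with 26 drops.
summary = playback_summary(total_frames=2100, dropped_frames=26, elapsed_s=70.0)
print(round(summary["effective_fps"], 2), summary["dropped"])
```

Keeping the drop count alongside the derived fps makes it easy to compare cells that share the same window length.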
Experimental matrix
Six 2-axis cells (3 browsers × 2 decode paths) × 2 session conditions (with-KWin / without-KWin):
| Browser | Decode | with-KWin (Plasma Wayland) | without-KWin (X11 session, no compositor) |
|---|---|---|---|
| Brave 147 | full SW | C-W-brave-sw | C-X-brave-sw |
| Brave 147 | libva (if it works) | C-W-brave-libva | C-X-brave-libva |
| chromium-fourier 149 (Step 1 + Step 2) | full SW | C-W-chrf-sw | C-X-chrf-sw |
| chromium-fourier 149 | libva (Step 1 enables it) | C-W-chrf-libva | C-X-chrf-libva |
| Firefox | full SW | C-W-ff-sw | C-X-ff-sw |
| Firefox | libva | C-W-ff-libva | C-X-ff-libva |
The "(if it works)" / "where possible" qualifier follows the
operator's directive: libva on rockchip-drm RK3568 is only
known to work on chromium-fourier (Step 1 ports
libva-v4l2-request); for stock Brave 147 and stock Firefox,
libva probably doesn't engage, in which case those cells are
documented N/A. For Firefox, the Mesa-side libva-v4l2-request
may make libva work via Mozilla's VAAPI backend even on stock
Firefox; this is to be verified in Phase 0 inventory.
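The cell names in the table follow a regular pattern, so the full matrix can be generated rather than typed. This is a minimal sketch: the function names and the `not_applicable` parameter are assumptions, and which cells end up N/A is still an open inventory item.

```python
from itertools import product

BROWSERS = ("brave", "chrf", "ff")
DECODE_PATHS = ("sw", "libva")
SESSIONS = ("W", "X")  # W = with-KWin (Plasma Wayland), X = without-KWin (X11)

def all_cells():
    """Every cell ID in table order: browser, then decode path, then session."""
    return [f"C-{s}-{b}-{d}" for b, d, s in product(BROWSERS, DECODE_PATHS, SESSIONS)]

def runnable_cells(not_applicable=()):
    """Cells left after removing those documented as N/A."""
    skip = set(not_applicable)
    return [c for c in all_cells() if c not in skip]

print(len(all_cells()))
# Hypothetical: stock Brave libva turns out not to engage in either session.
print(runnable_cells(["C-W-brave-libva", "C-X-brave-libva"]))
```

Generating the IDs keeps run directories and result tables aligned with the matrix if a reference client is added later.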
What "cutting out the KWin compositor" means
This campaign uses X11 session with no compositor in the display path as the "without-KWin" cell. Specifically:
- Native Xorg server, NOT XWayland (XWayland would still go through KWin for display, defeating the purpose).
- Window manager that does NOT composite by default, e.g. openbox, fluxbox, xfwm4-with-compositing-off, i3, twm. Plasma X11 uses kwin_x11 as compositing WM, which is still a "KWin compositor"; it does not satisfy "cut KWin out" and is excluded from the without-KWin cell.
- Browser windowed (not fullscreen). Even on a non-compositing WM, fullscreen browsers may engage XPresent direct presentation paths; testing windowed isolates the baseline non-compositor windowed display path.
The exact WM choice is a Phase 0 inventory decision (which WMs are available on ohm, which install cleanly, which SDDM-advertised sessions exist). Default candidate: openbox.
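Before recording a without-KWin run, it helps to confirm the session really is native Xorg rather than XWayland. The check below is a heuristic sketch: it relies on the common convention that XDG_SESSION_TYPE is "x11" for a native X session and that WAYLAND_DISPLAY is set inside a Wayland session. The function name is an assumption, and the check does not detect whether the window manager is compositing.

```python
import os

def is_native_x11(env):
    """Heuristic: True when the environment looks like a native Xorg session.

    env is a mapping of environment variables (pass dict(os.environ) for
    the current process).
    """
    return env.get("XDG_SESSION_TYPE") == "x11" and not env.get("WAYLAND_DISPLAY")

# Checks the environment of whatever process runs this script.
print(is_native_x11(dict(os.environ)))
```

Taking the environment as a parameter lets the orchestrator log the result per run and keeps the check testable off-device.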
Three plausible outcome shapes
- (α) Without-KWin is materially faster across all 6 cells: confirms the KWin compositor cost is a real bottleneck on this hardware, and X11-session-without-compositor becomes the recommended daily-driver configuration for video work on PineTab2.
- (β) Without-KWin is comparable or only marginally faster: the compositor isn't the bottleneck; the drop phenomenon is hardware/kernel/Mesa-bound, and the predecessor's Phase 8 closure stands.
- (γ) Mixed picture per browser × decode path: e.g. libva paths benefit but SW paths don't; or Firefox benefits but chromium-class clients don't. Each cell becomes its own characterisation.
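The three outcome shapes can be told apart mechanically once each browser × decode cell has a with-KWin and a without-KWin result. The sketch below is illustrative only: the speedup definition (without-KWin effective fps divided by with-KWin effective fps), the 1.10 threshold, and the sample values are all assumptions, since the clear-pass and clear-fail thresholds are a Phase 1 decision.

```python
def classify_outcome(speedups, threshold):
    """Map per-cell speedup ratios onto the alpha / beta / gamma shapes.

    speedups:  {cell_name: without_kwin_fps / with_kwin_fps}
    threshold: ratio at or above which a cell counts as materially faster
               (hypothetical; to be fixed in Phase 1).
    """
    if not speedups:
        raise ValueError("no cells measured")
    faster = [cell for cell, ratio in speedups.items() if ratio >= threshold]
    if len(faster) == len(speedups):
        return "alpha"  # materially faster everywhere
    if not faster:
        return "beta"   # comparable or only marginally faster
    return "gamma"      # mixed picture per cell

# Made-up ratios for two cells, purely to exercise the function.
print(classify_outcome({"chrf-libva": 1.25, "chrf-sw": 1.02}, threshold=1.10))
```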
Open questions before Phase 1 lock
The hardware-overlay-path mechanism is structurally available on X11 + non-compositing WM. Whether it actually engages for each of the three browsers is browser-specific and currently unknown:
- Brave / Chromium ozone-x11: Chromium has overlay-support code (OverlayProcessor, GpuMemoryBufferManager, DCOMPSurface on Windows; on Linux X11 the path is via XPresent + DMA-BUF + OverlayCandidate). Whether Brave 147 / chromium-fourier 149 actually request hardware-overlay presentation for a windowed video element under X11 is open.
- Firefox: VAAPIVideoDecoder backend produces hardware-decoded NV12 dmabufs that the GL compositor consumes internally. Whether Firefox's X11 backend has a path to hand the dmabuf to the X server for hardware-overlay presentation (rather than internally composing to RGB) is open. Mozilla has a MOZ_X11_EGL hint and a "hardware video overlay" pref but these are not universally engaged.
- Reference clients: mpv with --vo=xv or --vo=gpu --hwdec=auto-copy --gpu-context=x11, or gst-play-1.0 with xvimagesink or glimagesink, are known-good X11 hardware-overlay paths. Adding mpv to the matrix as a reference client would isolate "does the X11 hardware-overlay path work AT ALL on this hardware" from "do browsers actually use it." If mpv hardware-overlays cleanly but browsers don't, the conclusion is "the X11 path is fast, but browsers leave the speedup on the table."
If the operator agrees, Phase 0 inventory should:
- Verify Plane 39's NV12-LINEAR availability is reachable to X11 clients (it is for KWin Wayland; should be for X11 too since Plane 39 is just a DRM resource), and identify which X11 path actually programs it (modesetting Xorg driver + Option "PageFlip" "true", or DRI3-presented buffer ending up on Plane 39 via the X server's plane allocator).
- Inventory Brave's, chromium-fourier's, and Firefox's X11 overlay-presentation paths to see which (if any) request hardware-overlay presentation.
- Add mpv as a reference X11-overlay client to the matrix, so the campaign has a known-good comparison point.
What this question does NOT cover
For clarity, since the predecessor was specifically about the Wayland-overlay-subsurface composite path:
- This campaign is not investigating the wp_subsurface route. The Wayland cell of the matrix (with-KWin) measures whatever the default browser configuration produces under the existing Plasma Wayland session: windowed, default profile, default flags. It's a measurement of the as-shipped Plasma Wayland stack from the user's perspective, not a probe of a specific KWin code path.
- The Δ_present-46 ms finding from the predecessor is testable as a free side-finding under both axes (Wayland and X11) but is not the campaign's primary question.
- Daily-driver fitness (apps that break under X11, touchscreen behavior, multi-monitor edge cases, etc.) is not in scope. The campaign's deliverable is the matrix above; if any cell is decisively faster, daily-driver-fitness becomes a follow-up campaign.
What's NOT in scope (working assumption)
Until the research question is confirmed, the following are treated as out of scope so they don't slip into Phase 1 prematurely:
- Patches to KWin, Xorg, kwin-fourier, qt6-base-fourier, or any other component on ohm. This is research, not patch-development. Per the non-upstreaming default, MR/bug-report filing happens only when explicitly tasked and is not scheduled here.
- The Δ_present-46 ms finding's investigation. It's a known hook from the predecessor; whether this campaign chases it depends on the locked research question.
- Reverting predecessor tooling state. Governor, baloo, qt6-base-fourier, kwin-fourier stay as-is unless the operator decides otherwise.
- Filing a bug for any of the predecessor's three documented candidate findings. Same non-upstreaming default applies.
What Phase 0 will deliver, regardless of framing
Even before the research question is locked, the following are useful Phase 0 deliverables that don't depend on the specific question:
- State snapshot of ohm under current Plasma Wayland captured at campaign start. This is the before photo for any future X11 vs Wayland comparison. Unattended-tractable (just scripted SSH).
- Inventory of available X11 paths on ohm: what packages are installed, what session candidates SDDM advertises, what would need to be installed to enable a Plasma X11 session, what alternate WMs are available. Read-only, unattended-tractable.
- Inventory of measurement instruments that work under X11: xtrace, xprop, xrandr --verbose --query, perf on Xorg PID, frame-timing extraction options. Read-only.
- A1 baseline under current Plasma Wayland: re-run a single rep of the predecessor's kwin_timing_nodebug condition immediately at the start of this campaign, so the comparison Wayland-vs-X11 has a same-session anchor. This is the "set the baseline before instrument changes" discipline from feedback_replicate_baseline_first.md.
These steps are unblocked. They don't commit to a specific research question and they produce evidence that's useful under any of the candidate framings.