Phase 0: motivation locked + mechanism captured

Operator-supplied research question 2026-05-03: "Does cutting out the KWin compositor enable faster video display of Brave, chromium-fourier, and Firefox — for full SW decoding, and for libva decoding (where possible) — on PineTab2 RK3568?"

Operator-supplied mechanism 2026-05-03 (two messages):

1. "hantro emits NV12 which the GPU can't put on a compositeable plane. So that is the real bottleneck of Wayland." Connects directly to predecessor's Phase 1 finding (kwin_overlay_subsurface/phase2_source_findings.md:170-229): rockchip-drm overlay Plane 45 advertises no NV12 modifier; Primary Plane 39 supports NV12 LINEAR but is owned by KWin for its compositor framebuffer. Predecessor named the constraint but not the consequence — the consequence is that NV12 → RGB GL-composite is forced on Wayland-with-KWin regardless of which protocol path the browser uses.

2. "A X11 pipeline would route around that by giving a portion of screen real estate directly to the video pipeline." The X11 hardware-overlay path: with X11 + non-compositing WM, the X server can allocate Plane 39 (NV12 LINEAR) to the video region and Plane 45 (RGB AFBC) to the rest of the desktop. Hardware-blended at scanout. NO GL-composite anywhere — the cost the operator named as "the real bottleneck" is structurally avoided.

This is the X11 hardware-overlay mechanism that historically made X11 desktops good at video playback (Xv → modern DRI3 + XPresent + Composite-redirection-disabled). Wayland-with-monolithic-compositor designs cannot use this freedom: the compositor must own the Primary plane, so the plane-allocation freedom required to put NV12 video on Plane 39 alongside RGB chrome on Plane 45 isn't available.

phase0_findings.md updated with:
- Locked research question + 12-cell experimental matrix (3 browsers × 2 decode paths × 2 sessions; some N/A).
- Three separable cost components the matrix tests for (mandatory NV12→RGB GL conversion if hardware-overlay doesn't engage, fallback GL-composite, per-frame compositor overhead independent of NV12).
- Open questions about whether browsers actually request hardware-overlay presentation under X11, or whether they always internally composite to RGB.
- Recommendation to add mpv as a reference X11-overlay client: distinguishes "X11 path works on this hardware" from "browsers actually use the X11 path."

worklist.md updated:
- Phase 0 motivation + matrix items ticked.
- Pre-Phase-1 inventory broken out: state snapshot, X11 path inventory, browser-overlay-path inventory, mpv reference, X11 measurement-tool inventory, A1 Wayland baseline anchor.
- Phase 1 sketch: binding cells per matrix cell, clear-pass / clear-fail thresholds, measurement protocol mirroring the predecessor's phase3_protocol.md structure.

README banner updated to reflect locked motivation + mechanism summary + matrix shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+249
-55
@@ -99,70 +99,264 @@ If this campaign needs to switch ohm to an X11 session, that
is a session-level operator action (logout, switch via SDDM,
log back in). It cannot be done unattended.

## Research question (provisional — awaits operator confirmation)
## Research question (LOCKED 2026-05-03)

**Candidate framing**, not locked:
> *"Does cutting out the KWin compositor enable faster video
> display of Brave, chromium-fourier, and Firefox — for full
> SW decoding, and for libva decoding (where possible) — on
> PineTab2 RK3568?"*

> *"On PineTab2 RK3568 with the same chromium-fourier binary,
> the same `bbb_1080p30_h264.mp4` 30 fps source clip, and the
> same `brave_drops_test.html` instrumented page, does running
> an X11-session display server (Plasma X11, or an alternative
> X11 desktop) reproduce the drop-inversion phenomenon that
> motivated `kwin_overlay_subsurface`, eliminate it, or
> introduce a different drop characteristic?"*
### Mechanism the question targets

This is the most narrowly relevant question given the
predecessor's close-out. Three plausible outcomes:
Operator-supplied context 2026-05-03:

- **(α)** X11 reproduces low post-warmup drops (matches Phase 0
cage = 0 floor): isolates the dropped-frames mechanism to
the Wayland compositor stack on this hardware. The original
campaign's framing was correct in spirit but the cage
comparison was confounded; X11 becomes the better
comparator.
- **(β)** X11 has comparable or higher post-warmup drops: the
drop phenomenon is hardware/kernel/Mesa-bound and does not
localise to the display-server stack at all. Predecessor's
Phase 8 closure stands; the X11 measurement is the
decisive cross-check.
- **(γ)** X11 has a different failure mode entirely (different
drop pattern, different perf hot symbols, different effective
fps): each finding is its own characterisation; the
campaign becomes "what does running X11 on this hardware
look like end-to-end."
> *"hantro emits NV12 which the GPU can't put on a
> compositeable plane. So that is the real bottleneck of
> Wayland."*

**Alternate framings** the operator may have in mind that
this provisional question doesn't cover:
This connects directly to the predecessor's Phase 1 finding
(`kwin_overlay_subsurface/phase2_source_findings.md`:170-229):

- *Daily-driver fitness*: "Can I use X11 instead of Wayland on
this device for everyday browser/video/desktop work, and
what works/breaks?" — different scope; less measurement-heavy,
more workflow-oriented.
- *Specific X11-only feature investigation*: composite
redirection, XRender, GLAMOR, Xinerama on a single-display
device, etc.
- *XWayland behaviour*: many Linux desktops run X11 clients
under Wayland via XWayland. If an "X11 session" really means
"test under XWayland to compare with native Wayland", the
measurement is fundamentally different.
- *Power consumption / thermal*: X11 vs Wayland on a passively
cooled tablet may differ in idle CPU and thermal envelope.
Different metric set.
- Hantro VPU decodes H.264 video into NV12 dmabufs (`DRM_FORMAT_NV12`,
`DRM_FORMAT_MOD_LINEAR`).
- rockchip-drm's only NV12-LINEAR-capable plane is the
Primary plane (Plane 39 on CRTC 52), which the running KWin
uses for its GL framebuffer.
- The overlay plane (Plane 45) advertises no NV12 in any
modifier in `IN_FORMATS`.
- Therefore **no rockchip-drm scanout plane can accept the
NV12 buffer hantro produces while KWin owns the primary
plane.** Some compositing step must convert NV12 → RGB
before display.

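The plane constraint in the bullets above can be sketched as a set-intersection check. This is a minimal illustration rather than driver code: the `Plane` dataclass, the `candidate_planes` helper, and the string labels for formats and modifiers are hypothetical, and the capability sets encode only what the findings state (Plane 39 accepts NV12 LINEAR; Plane 45 advertises no NV12 and accepts RGB with AFBC). The real `IN_FORMATS` blobs hold more entries than shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    plane_id: int
    kind: str              # "primary" or "overlay"
    in_formats: frozenset  # simplified IN_FORMATS: {(format, modifier), ...}


# Only the capabilities named in the findings; labels are illustrative strings.
PLANES = [
    Plane(39, "primary", frozenset({("NV12", "LINEAR")})),
    Plane(45, "overlay", frozenset({("RGB", "AFBC")})),
]


def candidate_planes(buffer, planes, owned_by_compositor=frozenset()):
    """Return ids of planes that accept `buffer` and are not already claimed."""
    return [
        p.plane_id
        for p in planes
        if buffer in p.in_formats and p.plane_id not in owned_by_compositor
    ]


hantro_buffer = ("NV12", "LINEAR")

# Plasma Wayland: KWin holds the primary plane for its GL framebuffer.
print(candidate_planes(hantro_buffer, PLANES, owned_by_compositor=frozenset({39})))  # []

# X11 without a compositor: nothing claims Plane 39 up front in this model.
print(candidate_planes(hantro_buffer, PLANES))  # [39]
```

An empty result corresponds to the case where some GL step has to convert NV12 to RGB before display. Whether an X server actually places a client buffer on Plane 39 is a separate question that this model does not answer.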
**Operator decision needed before Phase 1**:
The predecessor named the *constraint* (Path B rejected at
the format/modifier intersection) but the *consequence* —
"some component must GL-composite NV12 → RGB on the GPU
because nothing else on this hardware can put NV12 on a
scanout plane" — was not made explicit. That consequence is
this campaign's motivating insight:

1. Which question is in scope? (drop phenomenon, daily-driver,
feature-specific, XWayland-vs-native, power, or something
else).
2. What "X11 session" means specifically: native Xorg + Plasma
X11; native Xorg + lightweight WM (e.g. openbox / i3 / xfwm);
XWayland under the existing Plasma Wayland session; or
another configuration.
3. What the success/failure criteria look like (binding cells,
`metrics.csv` shape).
- **Under Plasma Wayland:** when the browser engages the
Wayland subsurface route (chromium's
`WaylandBufferManagerHost::CommitOverlays`), KWin receives
an NV12 dmabuf and must GL-composite it. **KWin's compositor
is the GL-composite step.** When the browser does NOT
engage the subsurface route (the predecessor's measured
case on `brave_drops_test.html` — zero `wp_subsurface` in
the trace), the browser itself converts NV12 → RGB in its
own GL context and hands KWin only RGB; KWin then composites
the RGB to its primary plane.
- **Under X11 without a compositor:** there is no separate
compositor process. Two paths are open to the client:
  - *RGB-composite path* (browser converts NV12 → RGB in its
    own GL context and presents the RGB result via XPresent/DRI3
    to the X server, which schedules a page-flip on the same
    Primary plane KWin would have used). One fewer hand-off
    than the Wayland-with-subsurface case but the same
    GL-composite cost as the no-subsurface Wayland case.
  - **Hardware-overlay path** (operator-supplied context
    2026-05-03: *"a X11 pipeline would route around that by
    giving a portion of screen real estate directly to the
    video pipeline"*). The X server allocates the Primary
    plane (Plane 39, supports NV12 LINEAR) to the video
    region and the Overlay plane (Plane 45, supports
    RGB/AFBC) to the rest of the desktop. Hardware-blended
    at scanout time. **No GL-composite of NV12 anywhere —
    the cost the operator named as "the real bottleneck"
    is structurally avoided.**

Until those are answered, Phase 0 documents the question space
and Phase 1 does not lock.
This second X11 path is what Wayland compositors as
designed today cannot do on rockchip-drm-class hardware: KWin
Wayland *must* own the Primary plane for its compositor
framebuffer (because the Wayland model is "compositor presents
one merged surface per output"), so it cannot give Plane 39
to a video-region NV12 buffer while putting the rest of the
desktop on Plane 45. X11 + non-compositing WM has no such
constraint — different windows can be assigned to different
planes by the X server's plane allocator.

This is the X11 hardware-overlay mechanism that historically
made X11 desktops good at video playback (Xv from the late
1990s, and the modern equivalents via DRI3 + XPresent +
Composite-redirection-disabled). It is structurally absent
in Wayland-with-monolithic-compositor designs.

### Hypothesis the matrix tests

There are three potentially separable costs:

1. **The mandatory NV12 → RGB GL conversion**, which is
*forced* on Wayland-with-KWin because KWin must own the
only NV12-LINEAR-capable plane on this hardware for its
compositor framebuffer. **This cost is structurally
avoidable** under X11 + non-compositing WM via
hardware-plane-overlay (per the operator-supplied insight
above). Whether browsers can be coaxed to *use* the X11
hardware-overlay path — rather than internally compositing
to RGB before presenting — is browser-specific (see Open
questions below).
2. **The fallback GL-composite cost** when the
hardware-overlay path doesn't engage. Both Wayland and X11
pay this when the buffer shape doesn't match a plane —
it just runs in different processes (KWin under Wayland,
browser under X11).
3. **The per-frame compositor overhead** independent of NV12:
dmabuf import, transaction apply, presentation-feedback
wiring, frame-callback delivery — which the predecessor
measured at ~30-37 % of `kwin_wayland`'s CPU during
steady-state video playback even when KWin only saw RGB
surfaces.

The X11 hypothesis is strongest if cost (1) is dominant on
the matrix's with-KWin cells AND the X11 cells trigger the
hardware-overlay path. The X11 hypothesis is weakest if
cost (1) is small and cost (3) is small — in which case the
"cutting out KWin" experiment would show only marginal
differences.

The matrix below is designed to surface which of (1) (2) (3)
dominates per browser × decode path.

"Faster video display" is operationally **a combination of**:

- **Effective fps actually rendered** (= `getVideoPlaybackQuality().totalVideoFrames / elapsed_s`
for a 30 fps source — the upper bound is 30; the question is
how close).
- **Drop count** over the same 70 s window (`droppedVideoFrames`).
- **End-to-end latency** if testable (commit → present;
testable on Wayland via `wp_presentation_feedback`,
testable on X11 via `XPresent` extension or `RandR` vblank
events; protocol-side measurement under each
display-server).
- **Compositor + browser CPU at steady state** (the cost
saved by cutting the compositor is the upper bound on the
patch-payoff if a future campaign tries to optimise the
compositor instead of removing it).

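The first two metrics above reduce to simple arithmetic on the counters the page reads from `getVideoPlaybackQuality()`. A minimal sketch follows, using the effective-fps formula exactly as written in the first bullet. The `playback_metrics` helper, its field names, and the sample counter values are hypothetical and are not measurements from this campaign.

```python
def playback_metrics(total_video_frames, dropped_video_frames, elapsed_s,
                     source_fps=30.0):
    """Derive per-run figures from raw playback-quality counters."""
    effective_fps = total_video_frames / elapsed_s
    drop_fraction = (
        dropped_video_frames / total_video_frames if total_video_frames else 0.0
    )
    return {
        "effective_fps": effective_fps,
        # 1.0 means the run kept pace with the source clip.
        "fps_ratio": effective_fps / source_fps,
        "dropped": dropped_video_frames,
        "drop_fraction": drop_fraction,
    }


# Made-up counters for one 70 s window, purely to show the shape of the result.
sample = playback_metrics(total_video_frames=2058, dropped_video_frames=42,
                          elapsed_s=70.0)
for name, value in sample.items():
    print(f"{name}: {value:.3f}")
```

Keeping the ratio to the source frame rate alongside the raw drop count makes runs comparable across cells even if a later phase changes the window length.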
### Experimental matrix

Six 2-axis cells (3 browsers × 2 decode paths) × 2
session conditions (with-KWin / without-KWin):

| Browser | Decode | with-KWin (Plasma Wayland) | without-KWin (X11 session, no compositor) |
|---|---|---|---|
| Brave 147 | full SW | C-W-brave-sw | C-X-brave-sw |
| Brave 147 | libva (if it works) | C-W-brave-libva | C-X-brave-libva |
| chromium-fourier 149 (Step 1 + Step 2) | full SW | C-W-chrf-sw | C-X-chrf-sw |
| chromium-fourier 149 | libva (Step 1 enables it) | C-W-chrf-libva | C-X-chrf-libva |
| Firefox | full SW | C-W-ff-sw | C-X-ff-sw |
| Firefox | libva | C-W-ff-libva | C-X-ff-libva |

The "(if it works)" / "where possible" qualifier per the
operator's directive: libva on rockchip-drm RK3568 only works
on chromium-fourier (Step 1 ports `libva-v4l2-request`); for
stock Brave 147 and stock Firefox, libva probably doesn't
engage and those cells are documented N/A. For Firefox, the
Mesa-side `libva-v4l2-request` may make libva work via Mozilla's
VAAPI backend even on stock Firefox — to be verified in
Phase 0 inventory.

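The cell names in the table follow a regular `C-<session>-<browser>-<decode>` pattern, so the full set can be enumerated mechanically, for instance when laying out per-cell result directories. The sketch below is an illustration only: the dictionaries and the `matrix_cells` helper are hypothetical names, and N/A cells are not filtered out.

```python
from itertools import product

# Short codes taken from the cell names in the table above.
BROWSERS = {"brave": "Brave 147", "chrf": "chromium-fourier 149", "ff": "Firefox"}
DECODES = ("sw", "libva")
SESSIONS = {
    "W": "with-KWin (Plasma Wayland)",
    "X": "without-KWin (X11 session, no compositor)",
}


def matrix_cells():
    """Enumerate every cell id, row by row in table order."""
    return [
        f"C-{session}-{browser}-{decode}"
        for browser, decode, session in product(BROWSERS, DECODES, SESSIONS)
    ]


cells = matrix_cells()
print(len(cells))  # 12
print(cells[:4])   # ['C-W-brave-sw', 'C-X-brave-sw', 'C-W-brave-libva', 'C-X-brave-libva']
```

Cells that turn out to be N/A (for example libva on a browser where it does not engage) would still be listed here and marked as such in the results rather than silently dropped.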
### What "cutting out the KWin compositor" means

This campaign uses **X11 session with no compositor in the
display path** as the "without-KWin" cell. Specifically:

- Native Xorg server, NOT XWayland (XWayland would still go
through KWin for display, defeating the purpose).
- Window manager that does NOT composite by default — e.g.
openbox, fluxbox, xfwm4-with-compositing-off, i3, twm.
Plasma X11 uses `kwin_x11` as compositing WM, which is
still a "KWin compositor" — it does not satisfy "cut KWin
out" and is **excluded** from the without-KWin cell.
- Browser windowed (not fullscreen). Even on a non-compositing
WM, fullscreen browsers may engage XPresent direct
presentation paths — testing windowed isolates the
baseline non-compositor windowed display path.

The exact WM choice is a Phase 0 inventory decision (which
WMs are available on ohm, which install cleanly, which
SDDM-advertised sessions exist). Default candidate: openbox.

### Three plausible outcome shapes

- **(α)** Without-KWin is materially faster across all 6
cells: confirms the KWin compositor cost is a real
bottleneck on this hardware, and
X11-session-without-compositor becomes the recommended daily-driver
configuration for video work on PineTab2.
- **(β)** Without-KWin is comparable or only marginally
faster: the compositor isn't the bottleneck; the drop
phenomenon is hardware/kernel/Mesa-bound, and the
predecessor's Phase 8 closure stands.
- **(γ)** Mixed picture per browser × decode path: e.g.
libva paths benefit but SW paths don't; or Firefox benefits
but chromium-class clients don't. Each cell becomes its own
characterisation.

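The three outcome shapes can be read as a decision rule over per-cell gains. The sketch below shows one way to encode it, assuming each browser × decode pair has been reduced to a single relative gain of without-KWin over with-KWin. The `classify_outcome` helper, the gain representation, and the 10 % "material" threshold are all placeholders; the actual clear-pass / clear-fail thresholds are a Phase 1 decision.

```python
def classify_outcome(gains, material=0.10):
    """Map per-cell relative gains to an outcome shape.

    gains: dict of (browser, decode) -> relative gain of the without-KWin
           cell over the with-KWin cell, e.g. 0.25 for 25 % faster.
           N/A cells are simply left out of the dict.
    material: placeholder threshold for "materially faster".
    """
    if not gains:
        raise ValueError("no measured cells")
    flags = [gain >= material for gain in gains.values()]
    if all(flags):
        return "alpha"  # materially faster in every measured cell
    if not any(flags):
        return "beta"   # comparable or only marginally faster everywhere
    return "gamma"      # mixed picture per browser x decode path


# Invented gains, only to exercise the three branches.
print(classify_outcome({("chrf", "sw"): 0.30, ("chrf", "libva"): 0.45}))  # alpha
print(classify_outcome({("chrf", "sw"): 0.02, ("ff", "sw"): -0.01}))      # beta
print(classify_outcome({("chrf", "libva"): 0.40, ("ff", "sw"): 0.01}))    # gamma
```

A single scalar gain per cell hides the fact that "faster" combines fps, drops, latency, and CPU; a real classifier would need a rule for combining those, which this sketch deliberately leaves out.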
### Open questions before Phase 1 lock

The hardware-overlay-path mechanism is structurally available
on X11 + non-compositing WM. Whether it actually engages for
each of the three browsers is browser-specific and currently
unknown:

- **Brave / Chromium ozone-x11**: Chromium has overlay-support
code (`OverlayProcessor`, `GpuMemoryBufferManager`,
`DCOMPSurface` on Windows; on Linux X11 the path is via
XPresent + DMA-BUF + `OverlayCandidate`). Whether Brave
147 / chromium-fourier 149 actually request hardware-overlay
presentation for a windowed video element under X11 is open.
- **Firefox**: VAAPIVideoDecoder backend produces hardware
decoded NV12 dmabufs that the GL compositor consumes
internally. Whether Firefox's X11 backend has a path to
hand the dmabuf to the X server for hardware-overlay
presentation (rather than internally composing to RGB) is
open. Mozilla has a `MOZ_X11_EGL` hint and a "hardware video
overlay" pref but these are not universally engaged.
- **Reference clients**: mpv with `--vo=xv` or
`--vo=gpu --hwdec=auto-copy --gpu-context=x11`, or `gst-play-1.0`
with `xvimagesink` or `glimagesink`, are known-good X11
hardware-overlay paths. **Adding mpv to the matrix as a
reference client** would isolate "does the X11
hardware-overlay path work AT ALL on this hardware" from "do
browsers actually use it." If mpv hardware-overlays cleanly
but browsers don't, the conclusion is "the X11 path is fast,
but browsers leave the speedup on the table."

If the operator agrees, Phase 0 inventory should:

1. Verify Plane 39's NV12-LINEAR availability is reachable to
X11 clients (it is for KWin Wayland; should be for X11 too
since Plane 39 is just a DRM resource), and identify which
X11 path actually programs it (modesetting Xorg driver +
`Option "PageFlip" "true"`, or DRI3-presented buffer ending
up on Plane 39 via the X server's plane allocator).
2. Inventory Brave's, chromium-fourier's, and Firefox's X11
overlay-presentation paths to see which (if any) request
hardware-overlay presentation.
3. Add mpv as a reference X11-overlay client to the matrix,
so the campaign has a known-good comparison point.

### What this question does NOT cover

For clarity, since the predecessor was specifically about
the Wayland-overlay-subsurface composite path:

- This campaign is **not** investigating the wp_subsurface
route. The Wayland-cell of the matrix (with-KWin) measures
whatever browser configuration produces under the existing
Plasma Wayland session — windowed, default profile, default
flags. It's a measurement of the as-shipped Plasma Wayland
stack from the user's perspective, not a probe of a
specific KWin code path.
- The Δ_present-46 ms finding from the predecessor is
testable as a free side-finding under both axes (Wayland
and X11) but is not the campaign's primary question.
- Daily-driver fitness (apps that break under X11, touchscreen
behavior, multi-monitor edge cases, etc.) is **not in
scope**. The campaign's deliverable is the matrix above; if
any cell is decisively faster, daily-driver-fitness becomes
a follow-up campaign.

## What's NOT in scope (working assumption)
