
marfrit e8bae670d3 Campaign-contained data discipline + repo notice

Operator directive 2026-05-03: this campaign acquires all its
own measurement data from scratch. Predecessor numbers are
documented for context but never imported as binding cells,
comparison targets, or success thresholds. The lesson the
predecessor (kwin_overlay_subsurface) closed-without-patch on
is exactly that: phase 1 cells anchored to a single historical
ohm_gl_fix Phase 0 measurement, three weeks of phase planning
on a baseline that didn't reproduce in-session.

The strongest version of feedback_replicate_baseline_first.md:
"don't import predecessor data, acquire it fresh." The discipline
is now documented as a governing rule in three places:

- README.md § "Campaign-contained data discipline"
- phase0_findings.md § "Campaign-contained data discipline
  (governing rule)"
- worklist.md § "Governing rule (every phase)"

Concrete consequences:
- A1 baseline (Phase 0 task) is now mandatory at N=3 reps.
  Single-rep wasn't enough to surface session variance in the
  predecessor; doing 3 up front makes the baseline robust to
  the same kind of session-state drift that ate the
  predecessor's premise.
- Phase 1 thresholds are drawn against the A1 baseline measured
  in this campaign, not against any predecessor number.
- metrics.csv (when it lands) only carries data from this
  campaign's reps. No predecessor rows imported.

README.md additionally:
- Adds the predecessor chain (ohm_gl_fix -> kwin_overlay_subsurface
  -> this campaign) with explicit "what stays valid for source-
  reading" vs "numbers that don't" separation.
- Calls out durable substrate available from predecessors:
  KWin scanout-promotion archaeology, measurement-protocol
  template, WAYLAND_DEBUG parser. All structural; none
  measurement-numerical.
- Carry-over predecessor system state on ohm (governor pin,
  baloo disabled, fourier packages) is explicitly distinguished
  from measurement data. System state inherits; data does not.
- Repository line points to the gitea remote
  ssh://gitea@git.reauktion.de:2222/marfrit/x11-session-research.git

phase0_findings.md additionally:
- Reframes the predecessor-close-out summary section header to
  "(context, not data)" and rephrases past-tense numbers so
  none are stated as "the baseline."
- Adds the discipline lesson narrative in-line before the
  predecessor close-out: a 30-minute N=3 same-session baseline
  on day 1 of the predecessor would have made the campaign
  different — and that's the move this campaign starts with.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 06:59:06 +00:00


Phase 0 — substrate and locked research question

This is the campaign's Phase 0 doc: the locked research question (see § "Research question (LOCKED 2026-05-03)" below), the substrate inherited as context from the predecessor kwin_overlay_subsurface, and the open questions this campaign answers in-session before Phase 1 binding cells lock.

Campaign-contained data discipline (governing rule)

This campaign acquires all its own measurement data from scratch in this session. Predecessor measurement numbers (drop counts, perf percentages, Δ_present medians, kwin %CPU values, threshold values) are documented for context but never imported as binding cells, comparison targets, or success criteria.

Concretely, in this Phase 0 doc:

Numbers from kwin_overlay_subsurface Phase 3 / Phase 3-prime (e.g. "C5' had 32 drops total / 22 post-warmup"; "median Δ_present 45.85 ms"; "kwin %CPU median 36.90") are quoted only as evidence of past measurements that may or may not reproduce. They establish the SHAPE of what's measurable and the fact that measurement infrastructure works on this hardware. They do not establish "this is the baseline the X11 cells will be compared against."
The Phase 0 A1 baseline rep (worklist.md) is the in-session anchor for any with-KWin Wayland measurement this campaign references.
No metrics.csv row in this campaign is populated from predecessor data, even if the predecessor measured an identical condition.
If an X11 cell appears to "match the cage Phase 0 number," that's incidental — the cage Phase 0 number is not the test this campaign is running. The test is "X11 vs in-session Wayland baseline."

The discipline lesson is concrete: the predecessor's Phase 1 binding cell drops_post_warmup == 0 was anchored to a single ohm_gl_fix Phase 0 cage measurement (7 drops total / 0 post-warmup). Three weeks of phase planning ran on the assumption that floor existed. At N=3 in-session replication on the closing day of the predecessor, that floor was missing (cage today: 22 / 26 / 56 post-warmup). The campaign closed without patch. A 30-minute N=3 same-session baseline check on day 1 would have made the campaign different — or made it honestly close earlier. This campaign acts on that lesson.
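The day-1 check described above can be sketched in a few lines. This is a minimal illustration, not the campaign's tooling: the function name `floor_reproduces` and the `min_reps` parameter are hypothetical, and the input values are the predecessor numbers quoted in the paragraph above, used here only to show the shape of the check.

```python
from statistics import median


def floor_reproduces(post_warmup_drops, floor, min_reps=3):
    """Return True only if every in-session rep meets the historical floor.

    Hypothetical helper: refuses to answer with fewer than min_reps reps,
    which is the N=3 discipline this document describes.
    """
    if len(post_warmup_drops) < min_reps:
        raise ValueError(
            f"need at least {min_reps} reps, got {len(post_warmup_drops)}"
        )
    return all(drops <= floor for drops in post_warmup_drops)


# Post-warmup drop counts quoted above for the predecessor's closing cage session.
cage_reps = [22, 26, 56]
# The single historical cage measurement the predecessor anchored on.
historical_floor = 0

reproduces = floor_reproduces(cage_reps, historical_floor)
spread = max(cage_reps) - min(cage_reps)
print(f"floor reproduces: {reproduces}, median={median(cage_reps)}, spread={spread}")
```

With these inputs the check reports that the floor does not reproduce, and the spread between reps (34 drops) is itself larger than the smallest rep, which is the session variance a single rep cannot reveal.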

Predecessor close-out summary (context, not data)

Source: ../kwin_overlay_subsurface/phase8_handover.md (predecessor closed 2026-05-03 without patch). Three independent reasons no patch landed (numbers below are quoted as historical context per the governing rule above):

The predecessor's locked Phase 1 reference floor (drops_post_warmup == 0 from cage) was unreachable in the predecessor's closing N=3 measurement session — same chromium-fourier binary, same hardware, same kernel, same Mesa, same kwin-fourier as the original Phase 0 measurement. KWin direct's number reproduced; cage's 0-floor did not. Numbers quoted in the predecessor's phase3_prime_findings.md; not used here as a baseline for any cell.
The campaign's surface-of-investigation (wp_subsurface overlay route) is not engaged by brave_drops_test.html. Chromium-fourier renders the video element via internal compositing into its main browser window surface — a single-surface case.
The Phase 2 hot-path hypothesis (glEGLImageTargetTexture2DOES dominates kwin_wayland's per-frame cost) was rejected by Phase 3 perf measurement with 100×-margin on the wrong side of the threshold.

The diagnostic loop terminated at "the campaign's premise was N=1 to begin with, and the N=3 in-session re-measurement doesn't replicate it." This is filed as a feedback memory: replicate the N=1 baseline at N=3 in the same session BEFORE building multi-phase infrastructure around it.

What stays valid from the predecessor

Durable substrate listed in kwin_overlay_subsurface/phase8_handover.md § "What's left for a future session to pick up":

Phase 1 scanout-promotion archaeology (rockchip-drm RK3568 plane format/modifier table, KWin v6.6.4 promotion predicate). Plane 39 (Primary, NV12 LINEAR) is the GL framebuffer; Plane 45 (Overlay) does not advertise NV12 in any modifier. Both KWin scanout-promotion paths are structurally rejected for windowed Brave on this DRM driver. This holds regardless of display server.
Phase 2 H1 file:line in kwin_overlay_subsurface/phase2_source_findings.md. Cold per Phase 3 measurement; informational only.
Phase 2-prime Shape C source-read of Display::dispatchEvents and TransactionFence in KWin's src/wayland/. Specific to the Wayland path; not relevant to an X11-session campaign. The X11 path uses different KWin surface plumbing (kwin_x11) and a different per-frame protocol (X11 Composite extension + Damage + XPresent), not Wayland protocol dispatch.
Δ_present-46 ms reproducible side-finding under Plasma Wayland. Across all measured conditions (chromium-fourier on KWin, chromium-fourier in cage, stock Brave on KWin), median Δ_present was 41-46 ms on a 60 Hz panel, i.e. a stable queue depth of roughly 2.5 to 2.8 vsync intervals (16.67 ms each). This finding is independent of the cage breakdown and directly testable under X11 as a comparison point.
Measurement infrastructure: kwin_overlay_subsurface/scripts/wayland_debug_to_csv.py (libwayland 1.21+ format, 17 unit tests passing) + phase3_prime_runs/run_browser.sh orchestrator on ohm (handles WAYLAND_DEBUG=1 capture, perf record, top sampling, drops trajectory extraction, kill-cleanly). The WAYLAND_DEBUG portion does not apply under X11; an X11 equivalent would be different tooling (xtrace, xev, or XCB-debug instrumentation if the client emits any). The perf+top+drops capture portion remains usable under X11 unchanged.
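The vsync queue depth mentioned in the Δ_present item above is simple arithmetic: the present delay divided by the refresh interval. A small sketch, using the predecessor's quoted values purely as context (the helper name `queue_depth_vsyncs` is hypothetical):

```python
REFRESH_HZ = 60.0
VSYNC_MS = 1000.0 / REFRESH_HZ  # about 16.67 ms per refresh on a 60 Hz panel


def queue_depth_vsyncs(delta_present_ms, vsync_ms=VSYNC_MS):
    """Express a commit-to-present delay as a number of refresh intervals."""
    return delta_present_ms / vsync_ms


# 41 and 46 ms bracket the quoted range; 45.85 ms is the quoted median.
for ms in (41.0, 45.85, 46.0):
    print(f"{ms:6.2f} ms -> {queue_depth_vsyncs(ms):.2f} vsync intervals")
```

The same helper applies unchanged to any Δ_present value measured in this campaign, under either display server.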

Current ohm state (carry-over from predecessor)

Per kwin_overlay_subsurface/phase1_evidence/ohm_tooling_revert_log.md, not reverted at predecessor close-out:

qt6-base-fourier 1:6.11.0-3
kwin-fourier 1:6.6.4-3 (Wayland-side compositor; not in the hot path under an X11 session)
mesa 1:26.0.5-1
CPU governor pinned to performance
Baloo permanently disabled
drm-info 2.9.0-1
Active session: startplasma-wayland on tty2, kwin_wayland PID 3927 (as of 2026-05-03 03:05 UTC).
Browser binaries available: /tmp/chromium-ohm-gl-fix-step2/chrome (chromium-fourier, Step 1 + Step 2 patches, 149.0.7812.0), /usr/bin/brave (brave-bin 1:1.89.145-1).

If this campaign needs to switch ohm to an X11 session, that is a session-level operator action (logout, switch via SDDM, log back in). It cannot be done unattended.

Research question (LOCKED 2026-05-03)

"Does cutting out the KWin compositor enable faster video display of Brave, chromium-fourier, and Firefox — for full SW decoding, and for libva decoding (where possible) — on PineTab2 RK3568?"

Mechanism the question targets

Operator-supplied context 2026-05-03:

"hantro emits NV12 which the GPU can't put on a compositeable plane. So that is the real bottleneck of Wayland."

This connects directly to the predecessor's Phase 1 finding (kwin_overlay_subsurface/phase2_source_findings.md:170-229):

Hantro VPU decodes H.264 video into NV12 dmabufs (DRM_FORMAT_NV12, DRM_FORMAT_MOD_LINEAR).
rockchip-drm's only NV12-LINEAR-capable plane is the Primary plane (Plane 39 on CRTC 52), which the running KWin uses for its GL framebuffer.
The overlay plane (Plane 45) advertises no NV12 in any modifier in IN_FORMATS.
Therefore no rockchip-drm scanout plane can accept the NV12 buffer hantro produces while KWin owns the primary plane. Some compositing step must convert NV12 → RGB before display.
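The four-step argument above is a set intersection between what the decoder emits and what each free plane accepts. The sketch below models it with a toy data structure. Only the NV12 LINEAR entry on Plane 39 and its absence on Plane 45 come from this document; the other format entries, the `Plane` class, and `scanout_candidates` are illustrative assumptions, not output of drm_info.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    plane_id: int
    kind: str
    formats: frozenset  # set of (fourcc, modifier) pairs the plane advertises
    owned_by_compositor: bool = False


NV12_LINEAR = ("NV12", "LINEAR")

planes = [
    # Plane 39 advertises NV12 LINEAR (from the text); XR24 is an assumed extra.
    Plane(39, "Primary", frozenset({NV12_LINEAR, ("XR24", "LINEAR")}),
          owned_by_compositor=True),
    # Plane 45 advertises no NV12 at all (from the text); entries are assumed.
    Plane(45, "Overlay", frozenset({("XR24", "LINEAR"), ("XR24", "AFBC")})),
]


def scanout_candidates(planes, buffer_format, compositor_running):
    """Return IDs of planes that could scan out buffer_format directly."""
    return [
        p.plane_id
        for p in planes
        if buffer_format in p.formats
        and not (compositor_running and p.owned_by_compositor)
    ]


print("with compositor:   ", scanout_candidates(planes, NV12_LINEAR, True))
print("without compositor:", scanout_candidates(planes, NV12_LINEAR, False))
```

With the compositor holding the primary plane the candidate list is empty, which is the constraint stated above; with no compositor, Plane 39 becomes the only candidate.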

The predecessor named the constraint (Path B rejected at the format/modifier intersection) but the consequence — "some component must GL-composite NV12 → RGB on the GPU because nothing else on this hardware can put NV12 on a scanout plane" — was not made explicit. That consequence is this campaign's motivating insight:

Under Plasma Wayland: when the browser engages the Wayland subsurface route (chromium's WaylandBufferManagerHost::CommitOverlays), KWin receives an NV12 dmabuf and must GL-composite it. KWin's compositor is the GL-composite step. When the browser does NOT engage the subsurface route (the predecessor's measured case on brave_drops_test.html — zero wp_subsurface in the trace), the browser itself converts NV12 → RGB in its own GL context and hands KWin only RGB; KWin then composites the RGB to its primary plane.
Under X11 without a compositor: there is no separate compositor process. Two paths are open to the client:
- RGB-composite path (browser converts NV12 → RGB in its own GL context and presents the RGB result via XPresent/DRI3 to the X server, which schedules a page-flip on the same Primary plane KWin would have used). One fewer hand-off than the Wayland-with-subsurface case but the same GL-composite cost as the no-subsurface Wayland case.
- Hardware-overlay path (operator-supplied context 2026-05-03: "a X11 pipeline would route around that by giving a portion of screen real estate directly to the video pipeline"). The X server allocates the Primary plane (Plane 39, supports NV12 LINEAR) to the video region and the Overlay plane (Plane 45, supports RGB/AFBC) to the rest of the desktop. Hardware-blended at scanout time. No GL-composite of NV12 anywhere — the cost the operator named as "the real bottleneck" is structurally avoided.

This second X11 path is what Wayland compositors as designed today cannot do on rockchip-drm-class hardware: KWin Wayland must own the Primary plane for its compositor framebuffer (because the Wayland model is "compositor presents one merged surface per output"), so it cannot give Plane 39 to a video-region NV12 buffer while putting the rest of the desktop on Plane 45. X11 + non-compositing WM has no such constraint — different windows can be assigned to different planes by the X server's plane allocator.

This is the X11 hardware-overlay mechanism that historically made X11 desktops good at video playback (Xv from the late 1990s, and the modern equivalents via DRI3 + XPresent + Composite-redirection-disabled). It is structurally absent in Wayland-with-monolithic-compositor designs.

Hypothesis the matrix tests

There are three potentially separable costs:

(1) The mandatory NV12 → RGB GL conversion, which is forced on Wayland-with-KWin because KWin must own the only NV12-LINEAR-capable plane on this hardware for its compositor framebuffer. This cost is structurally avoidable under X11 + non-compositing WM via hardware-plane-overlay (per the operator-supplied insight above). Whether browsers can be coaxed to use the X11 hardware-overlay path, rather than internally compositing to RGB before presenting, is browser-specific (see Open questions below).
(2) The fallback GL-composite cost when the hardware-overlay path doesn't engage. Both Wayland and X11 pay this when the buffer shape doesn't match a plane; it just runs in different processes (under Wayland, KWin when the subsurface route is engaged and the browser otherwise; under X11, the browser).
(3) The per-frame compositor overhead independent of NV12 (dmabuf import, transaction apply, presentation-feedback wiring, frame-callback delivery), which the predecessor measured at ~30-37 %CPU for kwin_wayland during steady-state video playback even when KWin only saw RGB surfaces.

The X11 hypothesis is strongest if cost (1) is dominant on the matrix's with-KWin cells AND the X11 cells trigger the hardware-overlay path. The X11 hypothesis is weakest if cost (1) is small and cost (3) is small — in which case the "cutting out KWin" experiment would show only marginal differences.

The matrix below is designed to surface which of costs (1), (2), and (3) dominates per browser × decode path.

"Faster video display" is operationally a combination of:

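Effective fps actually rendered (= (totalVideoFrames − droppedVideoFrames) / elapsed_s, both counters read from getVideoPlaybackQuality(), for a 30 fps source; totalVideoFrames includes dropped frames, so they are subtracted. The upper bound is 30; the question is how close).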
Drop count over the same 70 s window (droppedVideoFrames).
End-to-end latency if testable (commit → present; testable on Wayland via wp_presentation_feedback, and on X11 via the Present extension's completion events; protocol-side measurement under each display server).
Compositor + browser CPU at steady state (the cost saved by cutting the compositor is the upper bound on the patch-payoff if a future campaign tries to optimise the compositor instead of removing it).
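The first two metrics above can be derived from the two counters that getVideoPlaybackQuality() exposes. The sketch below assumes that totalVideoFrames counts presented plus dropped frames, as the counters are commonly documented; the function name `playback_summary` and the example inputs are hypothetical and are not campaign data.

```python
def playback_summary(total_video_frames, dropped_video_frames, elapsed_s,
                     source_fps=30.0):
    """Summarise one rep from getVideoPlaybackQuality() counters.

    Assumption: total_video_frames includes dropped frames, so presented
    frames are the difference of the two counters.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    if dropped_video_frames > total_video_frames:
        raise ValueError("dropped frames cannot exceed total frames")
    presented = total_video_frames - dropped_video_frames
    effective_fps = presented / elapsed_s
    return {
        "presented": presented,
        "dropped": dropped_video_frames,
        "effective_fps": effective_fps,
        "fraction_of_source": effective_fps / source_fps,
    }


# Made-up counters for a 70 s window of a 30 fps source (illustration only).
summary = playback_summary(total_video_frames=2100, dropped_video_frames=42,
                           elapsed_s=70.0)
print(summary)
```

Keeping the drop count alongside the effective fps matters because the two can diverge: a rep that stalls can show few dropped frames and still fall well short of the source rate.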

Experimental matrix

Six 2-axis cells (3 browsers × 2 decode paths) × 2 session conditions (with-KWin / without-KWin):

| Browser | Decode | with-KWin (Plasma Wayland) | without-KWin (X11 session, no compositor) |
| --- | --- | --- | --- |
| Brave 147 | full SW | C-W-brave-sw | C-X-brave-sw |
| Brave 147 | libva (if it works) | C-W-brave-libva | C-X-brave-libva |
| chromium-fourier 149 (Step 1 + Step 2) | full SW | C-W-chrf-sw | C-X-chrf-sw |
| chromium-fourier 149 | libva (Step 1 enables it) | C-W-chrf-libva | C-X-chrf-libva |
| Firefox | full SW | C-W-ff-sw | C-X-ff-sw |
| Firefox | libva | C-W-ff-libva | C-X-ff-libva |

The "(if it works)" / "where possible" qualifier follows the operator's directive: libva on rockchip-drm RK3568 is currently only known to work on chromium-fourier (Step 1 ports libva-v4l2-request). For stock Brave 147, libva probably doesn't engage; if so, those cells are documented N/A. For stock Firefox, the Mesa-side libva-v4l2-request may make libva work via Mozilla's VAAPI backend; this is to be verified in Phase 0 inventory, and the cells are documented N/A if it does not engage.
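The cell names in the matrix follow a regular pattern, C-<session>-<browser>-<decode>, so the full set of 12 can be generated rather than typed. A minimal sketch; the constant names and the `cell_ids` function are hypothetical, and the short codes are the ones used in the table above.

```python
from itertools import product

# Session code -> description, as used in the matrix column headers.
SESSIONS = {
    "W": "with-KWin (Plasma Wayland)",
    "X": "without-KWin (X11 session, no compositor)",
}
BROWSERS = ["brave", "chrf", "ff"]
DECODES = ["sw", "libva"]


def cell_ids():
    """Return all matrix cell IDs, row by row as in the table."""
    return [
        f"C-{session}-{browser}-{decode}"
        for browser, decode, session in product(BROWSERS, DECODES, SESSIONS)
    ]


cells = cell_ids()
print(len(cells), "cells")
for cell in cells:
    print(cell)
```

Generating the IDs from the three axes keeps any later metrics.csv keys consistent with the table, and makes it obvious when a cell is marked N/A rather than silently missing.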

What "cutting out the KWin compositor" means

This campaign uses X11 session with no compositor in the display path as the "without-KWin" cell. Specifically:

Native Xorg server, NOT XWayland (XWayland would still go through KWin for display, defeating the purpose).
Window manager that does NOT composite by default, e.g. openbox, fluxbox, xfwm4-with-compositing-off, i3, twm. Plasma X11 uses kwin_x11 as a compositing WM, which is still a "KWin compositor"; it does not satisfy "cut KWin out" and is excluded from the without-KWin cell.
Browser windowed (not fullscreen). Even on a non-compositing WM, fullscreen browsers may engage XPresent direct presentation paths — testing windowed isolates the baseline non-compositor windowed display path.

The exact WM choice is a Phase 0 inventory decision (which WMs are available on ohm, which install cleanly, which SDDM-advertised sessions exist). Default candidate: openbox.

Three plausible outcome shapes

(α) Without-KWin is materially faster across all 6 cells: confirms the KWin compositor cost is a real bottleneck on this hardware, and X11-session-without-compositor becomes the recommended daily-driver configuration for video work on PineTab2.
(β) Without-KWin is comparable or only marginally faster: the compositor isn't the bottleneck; the drop phenomenon is hardware/kernel/Mesa-bound, and the predecessor's Phase 8 closure stands.
(γ) Mixed picture per browser × decode path: e.g. libva paths benefit but SW paths don't; or Firefox benefits but chromium-class clients don't. Each cell becomes its own characterisation.
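The three outcome shapes can be stated as a simple decision rule over per-cell results. The sketch below is only an illustration of that rule: the `classify_outcome` function, the notion of a single relative "speedup" per cell, and the 0.10 placeholder threshold are all assumptions. The real thresholds are drawn against the A1 baseline measured in this campaign and are not fixed here.

```python
def classify_outcome(speedups, material=0.10):
    """Map per-cell relative speedups to outcome shape alpha, beta or gamma.

    speedups: dict of cell label -> relative improvement of the without-KWin
    cell over its with-KWin counterpart (hypothetical metric).
    material: placeholder threshold for "materially faster" (assumption).
    """
    if not speedups:
        raise ValueError("no cells measured")
    faster = [cell for cell, gain in speedups.items() if gain >= material]
    if len(faster) == len(speedups):
        return "alpha"  # materially faster across all cells
    if not faster:
        return "beta"   # comparable or only marginally faster everywhere
    return "gamma"      # mixed picture per browser x decode path


# Made-up inputs, one per outcome shape (illustration only).
print(classify_outcome({"brave-sw": 0.25, "ff-sw": 0.30}))
print(classify_outcome({"brave-sw": 0.02, "ff-sw": 0.01}))
print(classify_outcome({"brave-sw": 0.02, "ff-libva": 0.40}))
```

In the mixed case the interesting output is not the label but the list of cells that cleared the threshold, since each of those becomes its own characterisation.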

Open questions before Phase 1 lock

The hardware-overlay-path mechanism is structurally available on X11 + non-compositing WM. Whether it actually engages for each of the three browsers is browser-specific and currently unknown:

Brave / Chromium ozone-x11: Chromium has overlay-support code (OverlayProcessor, GpuMemoryBufferManager, DCOMPSurface on Windows; on Linux X11 the path is via XPresent + DMA-BUF + OverlayCandidate). Whether Brave 147 / chromium-fourier 149 actually request hardware-overlay presentation for a windowed video element under X11 is open.
Firefox: the VAAPIVideoDecoder backend produces hardware-decoded NV12 dmabufs that the GL compositor consumes internally. Whether Firefox's X11 backend has a path to hand the dmabuf to the X server for hardware-overlay presentation (rather than internally compositing to RGB) is open. Mozilla has a MOZ_X11_EGL hint and a "hardware video overlay" pref, but these are not universally engaged.
Reference clients: mpv with --vo=xv, or gst-play-1.0 with xvimagesink, are the candidate X11 Xv reference paths; mpv with --vo=gpu --hwdec=auto-copy --gpu-context=x11, or gst-play-1.0 with glimagesink, render through GL and serve as the GL-composite reference rather than an overlay path. Adding mpv to the matrix as a reference client would isolate "does the X11 hardware-overlay path work AT ALL on this hardware" from "do browsers actually use it." If mpv hardware-overlays cleanly but browsers don't, the conclusion is "the X11 path is fast, but browsers leave the speedup on the table."

If the operator agrees, Phase 0 inventory should:

Verify that Plane 39's NV12-LINEAR capability is reachable by X11 clients (it is for KWin Wayland; it should be for X11 too, since Plane 39 is just a DRM resource), and identify which X11 path actually programs it (modesetting Xorg driver + Option "PageFlip" "true", or DRI3-presented buffer ending up on Plane 39 via the X server's plane allocator).
Inventory Brave's, chromium-fourier's, and Firefox's X11 overlay-presentation paths to see which (if any) request hardware-overlay presentation.
Add mpv as a reference X11-overlay client to the matrix, so the campaign has a known-good comparison point.

What this question does NOT cover

For clarity, since the predecessor was specifically about the Wayland-overlay-subsurface composite path:

This campaign is not investigating the wp_subsurface route. The Wayland cell of the matrix (with-KWin) measures whatever each browser produces under the existing Plasma Wayland session (windowed, default profile, default flags). It's a measurement of the as-shipped Plasma Wayland stack from the user's perspective, not a probe of a specific KWin code path.
The Δ_present-46 ms finding from the predecessor is testable as a free side-finding under both axes (Wayland and X11) but is not the campaign's primary question.
Daily-driver fitness (apps that break under X11, touchscreen behavior, multi-monitor edge cases, etc.) is not in scope. The campaign's deliverable is the matrix above; if any cell is decisively faster, daily-driver-fitness becomes a follow-up campaign.

What's NOT in scope (working assumption)

Until the research question is confirmed, the following are treated as out of scope so they don't slip into Phase 1 prematurely:

Patches to KWin, Xorg, kwin-fourier, qt6-base-fourier, or any other component on ohm. This is research, not patch-development. Per non-upstreaming default, MR/bug-report filing is explicitly tasked and not scheduled here.
The Δ_present-46 ms finding's investigation. It's a known hook from the predecessor; whether this campaign chases it depends on the locked research question.
Reverting predecessor tooling state. Governor, baloo, qt6-base-fourier, kwin-fourier stay as-is unless the operator decides otherwise.
Filing a bug for any of the predecessor's three documented candidate findings. The same non-upstreaming default applies.

What Phase 0 will deliver, regardless of framing

The research question is now locked, but the following Phase 0 deliverables are useful regardless of the specific question:

State snapshot of ohm under current Plasma Wayland captured at campaign start. This is the before photo for any future X11 vs Wayland comparison. Unattended-tractable (just scripted SSH).
Inventory of available X11 paths on ohm: what packages are installed, what session candidates SDDM advertises, what would need to be installed to enable a Plasma X11 session, what alternate WMs are available. Read-only, unattended-tractable.
Inventory of measurement instruments that work under X11: xtrace, xprop, xrandr --verbose --query, perf on Xorg PID, frame-timing extraction options. Read-only.
A1 baseline under current Plasma Wayland: re-run the predecessor's kwin_timing_nodebug condition at N=3 reps immediately at the start of this campaign, so the Wayland-vs-X11 comparison has a same-session anchor. This is the "set the baseline before instrument changes" discipline from feedback_replicate_baseline_first.md.
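The SDDM part of the X11 inventory above is read-only and easy to script. The sketch below lists session entries from the two directories SDDM reads by default, /usr/share/xsessions and /usr/share/wayland-sessions. Whether ohm uses those defaults is an assumption to check against its sddm.conf, and the `list_sessions` helper is hypothetical.

```python
import configparser
from pathlib import Path

# Default SDDM session directories; ohm's sddm.conf may override them.
SESSION_DIRS = {
    "x11": Path("/usr/share/xsessions"),
    "wayland": Path("/usr/share/wayland-sessions"),
}


def list_sessions(session_dirs=SESSION_DIRS):
    """Return (kind, file stem, Name, Exec) for each readable .desktop entry."""
    found = []
    for kind, directory in session_dirs.items():
        if not directory.is_dir():
            continue  # nothing installed for this session type
        for entry in sorted(directory.glob("*.desktop")):
            parser = configparser.ConfigParser(interpolation=None, strict=False)
            try:
                parser.read(entry, encoding="utf-8")
            except (configparser.Error, UnicodeDecodeError):
                continue  # skip malformed files instead of aborting the inventory
            if parser.has_section("Desktop Entry"):
                section = parser["Desktop Entry"]
                found.append((kind, entry.stem,
                              section.get("Name", fallback=""),
                              section.get("Exec", fallback="")))
    return found


for kind, stem, name, exec_line in list_sessions():
    print(f"{kind:8s} {stem:24s} {name!r} -> {exec_line}")
```

An empty x11 list answers the inventory question directly: no X11 session is selectable from SDDM until a WM package that ships a session file is installed.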

These steps are unblocked. They don't commit to a specific research question and they produce evidence that's useful under any of the candidate framings.
