KWIN_PIVOT: Phase-4 done (qt6 patches landed) + weston A/B vindicates KWin theory

Today's deltas:
- qt6-base-fourier built, installed, validated on ohm. Static-idle
  journal shows zero GL_INVALID_VALUE post-relogin; the Qt 6
  GL_ALPHA bug is genuinely fixed.
- chrome v4 under KWin still stalls — at ~6s vs ~3s pre-Qt-fix, so
  the GL_ALPHA churn was contributing some load but wasn't the
  primary cause.
- Clean A/B with weston: same chrome v4 binary, same panfrost,
  same V4L2, same hardware → swapping KWin → weston turns the
  stall off entirely. Chrome plays through with elevated CPU
  (~96 % vs KWin's ~50 % when it isn't stalled) because weston
  falls back to LINEAR composite vs KWin's fast-tile path.
- mpv triangulation:
    --vo=null --hwdec=v4l2request: clean (decode only)
    --vo=drm --hwdec=v4l2request: 0.7 % drops in 19 s (KMS scanout)
    --vo=gpu-next --hwdec=v4l2request under KWin: 76 % drops, slideshow

Decode + display hardware path is fully capable. The wall is
specifically KWin's compositor scheduling/presentation pipeline on
this stack — panfrost ES 3.2 + V4L2 stateless NV12 dmabuf clients.

KWIN_PIVOT.md rewritten:
- Phase 4 (qt6 patch, ship, upstream) marked done.
- New Phase 5 (find the KWin culprit): WAYLAND_DEBUG on chrome +
  KWin to capture the missing wl_buffer.release / wp_presentation
  exchange around the 6 s stall, plus strace-on-kwin and
  effects-disable bisection.
- New Phase 6 (fix and ship): kwin-fourier package pattern, ohm
  validation, bugs.kde.org filing.
This commit is contained in:
2026-04-28 15:35:01 +00:00
parent 1dada5122b
commit 93d7c5c67a
+132 -24
View File
@@ -1,29 +1,137 @@
# KWin pivot — fix the `glTexImage2D(GL_ALPHA)` stall
# KWin pivot — fix the chrome-on-KWin video stall
> **2026-04-28 update — Phase 2 collapsed onto Phase 1: it's not KWin.**
> Source-grep nailed the offender on the first pass. Real culprit:
> Qt 6's `QOpenGLTextureGlyphCache` (`src/opengl/qopengltextureglyphcache.cpp:111-117`)
> and `QRhiGles2::toGlTextureFormat` (`src/gui/rhi/qrhigles2.cpp:1373-1378`).
> **2026-04-28 update part 2 — qt6-base-fourier landed, validated, did
> not fix the chrome stall.** The Qt 6 GL_ALPHA bug (qopengltextureglyphcache.cpp,
> qrhigles2.cpp, qopengltextureuploader.cpp) is real, the patch is
> correct, the journal noise is gone — but the chromium-fourier 149-r4
> playback under KWin still deadlocks at ~6 seconds (vs ~3 seconds
> pre-patch — so the GL_ALPHA churn was contributing some overhead,
> just not the primary cause). Vindication came from a clean weston
> A/B: same chrome v4 binary, same panfrost mesa, same V4L2 driver,
> same hardware → swapping KWin for weston turns the stall off. Chrome
> plays through under weston (with elevated CPU because weston falls
> back to LINEAR composite). KWin has a *second* bug, structurally
> deeper than the Qt one. **Phase 4 below — write/ship/upstream the Qt
> fix — was completed today.** This document now pivots again to the
> remaining KWin investigation.
> **2026-04-28 update part 1 — Phase 2 collapsed onto Phase 1: not KWin
> for the GL_ALPHA part.** Source-grep nailed the offender on the
> first pass. Real culprit: Qt 6's `QOpenGLTextureGlyphCache`
> (`src/opengl/qopengltextureglyphcache.cpp:111-117`) and
> `QRhiGles2::toGlTextureFormat` (`src/gui/rhi/qrhigles2.cpp:1373-1378`).
> KWin's own GL paths use `GL_R8` correctly (`src/opengl/gltexture.cpp:61`,
> `src/scene/shadowitem.cpp:494`). The pivot becomes a **Qt-fourier**
> patch, not a kwin-fourier one. Plan rewritten below; the pre-rewrite
> reproduction/triangulation phases are kept verbatim because they
> still apply to whatever lives downstream of the Qt fix.
>
> Qt's broken logic, in plain English: *"If qtbase was built with
> opengles2, just always use `GL_ALPHA`."* That's correct for an
> OpenGL ES 2.x context. It's wrong for OpenGL ES 3.x, where
> `GL_ALPHA` is no longer a valid `glTexImage2D` internalFormat
> (only sized formats — `GL_R8`, etc.). Mali / panfrost on RK3566
> exposes ES 3.2; KWin requests an ES 3.2 context; Qt picks
> `GL_ALPHA`; mesa returns `GL_INVALID_VALUE`; the texture is
> permanently broken; every dependent draw errors at level 0; the
> compositor's frame-callback path stalls. Affects every Qt 6
> application on Mali-class hardware that ends up rendering text
> through `QOpenGLTextureGlyphCache` (KDE's window decorations,
> Plasma overlays, Qt Quick scenegraph via RHI, ad nauseam) — KWin
> just happens to be the most visible victim because it's the
> compositor and its stall takes everyone else down with it.
> `src/scene/shadowitem.cpp:494`). The Qt-fourier patch series shipped
> as `marfrit-packages/arch/qt6-base-fourier/` and was validated on
> ohm — zero `GL_INVALID_VALUE` in a fresh-session journal.
## Triangulation table after today's work
| Path | Result |
|---|---|
| `ffmpeg -hwaccel v4l2request -f null` | ✓ 36 fps, clean |
| `mpv --vo=null --hwdec=v4l2request` | ✓ decode-only, clean |
| `mpv --vo=drm --hwdec=v4l2request` (KMS scanout, no compositor) | ✓ 0.7% drops in 19 s |
| **chrome v4 under weston** | **✓ plays through; ~96 % CPU** |
| chrome v4 under KWin (post-Qt-fix) | ✗ stall @ ~6 s, ⏸ icon, audio clock advances |
| `mpv --vo=gpu-next --hwdec=v4l2request` under KWin | ✗ ~76 % drops, slideshow |
| **chrome v4 under KWin pre-Qt-fix** | **✗ stall @ ~3 s** (GL_ALPHA spam adds load) |
Decode + display hardware path is fully capable. Wayland *as a
protocol* is fine (weston works). The wall is **specifically KWin's
compositor scheduling and presentation pipeline on this stack** —
panfrost ES 3.2 + V4L2 stateless NV12 dmabuf clients.
The chromium-fourier patch series is correct. The qt6-base-fourier
patch series is correct. The KWin bug is the third independent
problem on this hardware, exposed by the prior two fixes.
## What we know and what we don't
**Known:**
- The stall is *not* in chrome's V4L2VideoDecoder (works under weston).
- The stall is *not* in panfrost's dmabuf import (works under weston).
- The stall is *not* a GL error (no `GL_INVALID_VALUE` after qt6 fix).
- The stall is *not* a thread parked in `vb2`/`v4l2`/`dma_fence` wchan
(kwin/chrome/audio threads all sit in `futex_do_wait` /
`poll_schedule_timeout` / `unix_stream_read_generic`).
- The stall *does* idle the audio output socket (renderer audio
thread blocks reading from the audio service unix socket → audio
drains the last ALSA buffer, static, silence).
- The stall *does* leave the `<video>` element's currentTime
advancing (audio clock keeps running) while no new frames present.
**Unknown (where the next investigation should bisect):**
- Is KWin failing to deliver `wl_buffer.release` events promptly?
Symptom would be: chrome holds dmabuf references → V4L2's 6-buffer
CAPTURE pool exhausts → decoder blocks. Test by tracing
`wl_buffer.release` events on the chrome wl_display via a wayland
proxy / debugger.
- Is KWin failing to deliver `wp_presentation_feedback`? Chrome's
viz compositor uses presentation feedback to pace; if it never
fires, the renderer's frame submission backs up.
- Is KWin's render loop blocking on a vsync wait that never
releases? The 6-second figure roughly matches multiples of
panfrost's idle-power-state wakeup latency, but that's a
speculative correlation.
- Is KWin doing something specific with `wp_subsurface` that chrome's
multi-process IPC tickles in a way mpv's single-process flow does
not? Chrome attaches video to a subsurface inside its main
surface; mpv attaches video to its main surface directly.
## Phase 5 — Find the KWin culprit
Order of cheapness:
1. **Strace KWin during a chrome stall.** What is KWin actually
doing at the moment of stall? (`strace -p <kwin_pid> -f -e trace=poll,read,write,recvmsg,sendmsg -o /tmp/kwin.strace` for ~10 s during the chrome run, then chase a flatlined fd.)
2. **WAYLAND_DEBUG=client + WAYLAND_DEBUG=server.** Run chrome with
`WAYLAND_DEBUG=client`, capture all wayland traffic, look for
the last `wl_buffer.attach` chrome sent before the stall and
whether KWin ever emitted the corresponding `wl_buffer.release`.
This single experiment probably localizes the bug.
3. **Disable KWin effects, scene type, blur, contrast, animation
speed.** `kcmshell6 kwincompositing` toggles. If turning off
*all* effects unsticks chrome, the bug is in an effect plugin.
4. **Bisect KWin git.** v6.6.4 vs v6.5.x or earlier — does the
stall reproduce on v6.5? If not, bisect the offending commit.
(Heavy: KWin builds for ~1 h on boltzmann.)
Step (2) is the headliner. WAYLAND_DEBUG yields a complete
client-server transcript; the missing event after the stall is
usually the bug, and the exchange around it is the call site.
## Phase 6 — Fix and ship
Once we know the call site:
- Write the patch against `kwin/master`, smallest possible diff.
- Local Arch package as `kwin-fourier` under
`marfrit-packages/arch/kwin-fourier/`, pattern matching
`qt6-base-fourier`. Bump epoch to dominate Arch's version.
- Validate on ohm: chrome v4 + bbb sample plays through to EOF at
the 34.7 % CPU number (KWin's fast-tile path is more efficient
than weston's LINEAR fallback, so once unstuck, KWin should beat
weston on CPU).
- File on bugs.kde.org with the WAYLAND_DEBUG transcript, the
weston-vs-KWin A/B table from this document, and the patch.
- Push MR to invent.kde.org/plasma/kwin against `master`.
## Reflection — the spec-shaped void
The user's original Phase-1 hypothesis was "leaked corporate
short-cuts." Today vindicates it twice over:
- **Qt 6**: codified "OpenGL ES" as one thing in 2012, never
re-read the ES 3 spec when GL_ALPHA was deprecated. We patched
it. Three sites, ~9 lines of net change.
- **KWin**: still TBD which spec-or-spec-shaped-omission it tripped
on. The data so far points at a compositor scheduling issue —
most likely a missing or late `wl_buffer.release` /
`wp_presentation_feedback` for the specific case of multiple
IPC-fragmented client surfaces driving a fast video stream.
That's exactly the kind of scenario that gets "we never tested
this combination" treatment in QA.
## What we know