From cd25d02e0148b28ebbfa5c6a45e32190f0cf090a Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Tue, 28 Apr 2026 12:18:25 +0000
Subject: [PATCH] =?UTF-8?q?KWIN=5FPIVOT:=20Phase-2=20findings=20=E2=80=94?=
 =?UTF-8?q?=20bug=20is=20in=20Qt=206,=20not=20KWin?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Source-grep collapsed Phase 1+2 onto a single pass. KWin's own GL
paths use GL_R8 correctly (gltexture.cpp:61, shadowitem.cpp:494). The
glTexImage2D(GL_ALPHA) calls observed in the journal originate from
Qt 6:

- qtbase/src/opengl/qopengltextureglyphcache.cpp:111-117 — text glyph
  cache upload path. The #else branch (active when qtbase is built
  with QT_CONFIG(opengles2)) unconditionally uses GL_ALPHA, with no
  runtime check for ES context major version. Correct on ES 2.x;
  broken on ES 3.x where GL_ALPHA is no longer a valid glTexImage2D
  internalFormat.
- qtbase/src/gui/rhi/qrhigles2.cpp:1373-1378 — Qt-Quick-RHI sibling.
  Same logic, gated only on caps.coreProfile, missing the ES≥3 case.
- qtbase/src/opengl/qopengltextureuploader.cpp:253-257 — QImage→GL
  upload path; same shape.

KWin runs an ES 3.2 context on Mali-G52 panfrost (RK3566), Qt picks
GL_ALPHA, mesa returns GL_INVALID_VALUE, every dependent draw errors
at level 0, the compositor's frame-callback path stalls. KWin is the
visible victim because it's the compositor, but the bug is in Qt.

KWIN_PIVOT.md rewritten: the patch series and packaging now target
qt6-base-fourier instead of kwin-fourier. Three small hunks (~3 lines
each), runtime-safe via existing caps.gles + caps.ctxMajor / surface
format majorVersion checks. Upstream landing path: bugreports.qt.io +
Gerrit change against qtbase dev branch.
---
 arch/chromium-fourier/KWIN_PIVOT.md | 163 ++++++++++++++++++++++------
 1 file changed, 131 insertions(+), 32 deletions(-)

diff --git a/arch/chromium-fourier/KWIN_PIVOT.md b/arch/chromium-fourier/KWIN_PIVOT.md
index 668442fef..d4580b6ea 100644
--- a/arch/chromium-fourier/KWIN_PIVOT.md
+++ b/arch/chromium-fourier/KWIN_PIVOT.md
@@ -1,5 +1,30 @@
 # KWin pivot — fix the `glTexImage2D(GL_ALPHA)` stall
 
+> **2026-04-28 update — Phase 2 collapsed onto Phase 1: it's not KWin.**
+> Source-grep nailed the offender on the first pass. Real culprit:
+> Qt 6's `QOpenGLTextureGlyphCache` (`src/opengl/qopengltextureglyphcache.cpp:111-117`)
+> and `QRhiGles2::toGlTextureFormat` (`src/gui/rhi/qrhigles2.cpp:1373-1378`).
+> KWin's own GL paths use `GL_R8` correctly (`src/opengl/gltexture.cpp:61`,
+> `src/scene/shadowitem.cpp:494`). The pivot becomes a **Qt-fourier**
+> patch, not a kwin-fourier one. Plan rewritten below; the pre-rewrite
+> reproduction/triangulation phases are kept verbatim because they
+> still apply to whatever lives downstream of the Qt fix.
+>
+> Qt's broken logic, in plain English: *"If qtbase was built with
+> opengles2, just always use `GL_ALPHA`."* That's correct for an
+> OpenGL ES 2.x context. It's wrong for OpenGL ES 3.x, where
+> `GL_ALPHA` is no longer a valid `glTexImage2D` internalFormat
+> (only sized formats — `GL_R8`, etc.). Mali / panfrost on RK3566
+> exposes ES 3.2; KWin requests an ES 3.2 context; Qt picks
+> `GL_ALPHA`; mesa returns `GL_INVALID_VALUE`; the texture is
+> permanently broken; every dependent draw errors at level 0; the
+> compositor's frame-callback path stalls. Affects every Qt 6
+> application on Mali-class hardware that ends up rendering text
+> through `QOpenGLTextureGlyphCache` or the GLES2 RHI backend (KDE's
+> window decorations, Plasma overlays, Qt Quick scenegraph via RHI, ad
+> nauseam) — KWin just happens to be the most visible victim because
+> it's the compositor and its stall takes everyone else down with it.
+
 ## What we know
 
 KWin 6.6.4-1 on Arch Linux ARM (Plasma 6.6.4-1, mesa 26.0.5-1, libdrm
@@ -118,50 +143,124 @@ Tooling to identify *which*:
 
 ## Phase 3 — Write the patch (½ evening once Phase 2 is done)
 
-If the offender is a `GL_ALPHA` allocation in a GLES3 context, the
-fix is mechanical:
+The Qt 6 fix is three small changes, runtime-safe, no new dependency.
+
+**Fix #1 — `src/opengl/qopengltextureglyphcache.cpp` lines 111-117:**
 
 ```diff
-- glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, width, height, 0,
--              GL_ALPHA, GL_UNSIGNED_BYTE, data);
-+ glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, width, height, 0,
-+              GL_RED, GL_UNSIGNED_BYTE, data);
+ #if !QT_CONFIG(opengles2)
+     const GLint internalFormat = isCoreProfile() ? GL_R8 : GL_ALPHA;
+     const GLenum format = isCoreProfile() ? GL_RED : GL_ALPHA;
+ #else
+-    const GLint internalFormat = GL_ALPHA;
+-    const GLenum format = GL_ALPHA;
++    // OpenGL ES 3.x deprecated GL_ALPHA as a glTexImage2D
++    // internalFormat; only true ES 2 contexts retain it. Use GL_R8
++    // + the matching swizzle (handled in the fragment shader's .r
++    // sample below) on ES 3+ hardware so Mali / panfrost / panthor
++    // GLES3 contexts stop emitting GL_INVALID_VALUE every frame.
++    const bool useR8 = ctx->format().majorVersion() >= 3;
++    const GLint internalFormat = useR8 ? GL_R8 : GL_ALPHA;
++    const GLenum format = useR8 ? GL_RED : GL_ALPHA;
+ #endif
 ```
 
-…and adjust the consuming shader's swizzle:
+The downstream fragment shader path that samples this texture must
+read `.r` instead of `.a` when `GL_R8` is used. Qt's text-rendering
+fragment program already has both code paths conditioned on context
+core-profile; the ES 3+ branch needs the same treatment. Lines
+214-216 of the same file (the resize / re-upload path) need the
+identical change.
+
+**Fix #2 — `src/gui/rhi/qrhigles2.cpp` lines 1373-1378:**
 
 ```diff
-- gl_FragColor = vec4(texture2D(s, uv).a, …);
-+ gl_FragColor = vec4(texture2D(s, uv).r, …);
+ case QRhiTexture::RED_OR_ALPHA8:
+-    *glintformat = caps.coreProfile ? GL_R8 : GL_ALPHA;
++    *glintformat = (caps.coreProfile || (caps.gles && caps.ctxMajor >= 3))
++        ? GL_R8 : GL_ALPHA;
+     *glsizedintformat = *glintformat;
+-    *glformat = caps.coreProfile ? GL_RED : GL_ALPHA;
++    *glformat = (caps.coreProfile || (caps.gles && caps.ctxMajor >= 3))
++        ? GL_RED : GL_ALPHA;
+     *gltype = GL_UNSIGNED_BYTE;
+     break;
 ```
 
-If the offender is a per-plane fallback in the dmabuf import path
-(suspect #1 above), the patch is larger because the right fix is to
-*not fall through to the broken path* — handle the `external_only`
-case by binding `GL_TEXTURE_EXTERNAL_OES` instead. That mirrors the
-chromium-fourier patch 3/3 done at the chromium layer; symmetry says
-KWin should do the same in its `glTexImage` consumer.
+`caps.gles` and `caps.ctxMajor` are populated at context creation
+(qrhigles2.cpp:804 + :855); the disjunct is free.
+
+**Fix #3 — `src/opengl/qopengltextureuploader.cpp` lines 253-257:**
+
+This is the QImage→GL upload path (used by `QOpenGL2PaintEngineEx`
+and its descendants). Same pattern, same fix shape: extend the
+"core profile or GLES2 fallback" branching to also consider GLES3+
+as needing `GL_R8`.
+
+If we want to be aggressive, we can collapse all three sites onto a
+single `qt_gl_use_r8_for_alpha8(ctx)` helper in `qopenglhelper_p.h`
+so future Qt versions don't drift apart again — but a minimal patch
+should keep the three sites independent so each is reviewable in
+isolation by the relevant Qt module owner.
 
 ## Phase 4 — Ship and upstream (1 evening)
 
-1. **Local Arch package** as `kwin-fourier` under
-   `marfrit-packages/arch/kwin-fourier/`, sibling to chromium-fourier
-   and firefox-fourier. PKGBUILD inherits from `extra/kwin`, drops
-   in our patch, bumps `pkgrel`. Same `provides=kwin conflicts=kwin`
-   pattern.
-2. **Validate on ohm** by running the chromium-fourier 149-r2 build +
-   the bbb sample for a minute uninterrupted. Success = no GL_ALPHA
-   in the journal, no stall, smooth playback at the 34.7 % CPU
-   number from the chromium validation.
+1. **Local Arch package** as `qt6-base-fourier` under
+   `marfrit-packages/arch/qt6-base-fourier/`, sibling to chromium-fourier
+   and firefox-fourier. PKGBUILD inherits from `extra/qt6-base`, drops
+   in the three patches above, bumps `pkgrel`. Same
+   `provides=qt6-base conflicts=qt6-base` pattern. Rebuild is heavy
+   (qtbase compile is ~30 minutes on boltzmann; ohm rebuild is
+   sustained-fan-territory and probably better avoided — boltzmann
+   builds the aarch64 .pkg.tar.zst, then we rsync it to ohm and
+   `pacman -U` there).
+2. **Validate on ohm** by:
+   - `pacman -U` the patched qt6-base.
+   - Restart the Plasma session (logout / login) so the patched Qt 6
+     libraries are mapped into the fresh kwin_wayland.
+   - Re-run `journalctl --user -u plasma-kwin_wayland.service -f` while
+     opening any Qt 6 application that triggers text caching (a
+     terminal, kate, the system tray) — the GL_INVALID_VALUE spam
+     should be **gone**.
+   - Then run chromium-fourier 149-r2 + the bbb sample for a full
+     minute uninterrupted. Success = smooth playback through to EOF
+     at the 34.7 % CPU number, no stall, no audio static, no
+     KWin-side errors in the journal.
 3. **Upstream** via:
-   - File a `kwin` bug on bugs.kde.org with: apitrace fragment, our
-     hardware (Mali-G52 panfrost on RK3566 mainline), exact mesa
-     version, repro steps via `weston-simple-dmabuf-egl` if Phase 1
-     produced one.
-   - Push an MR to invent.kde.org/plasma/kwin against `master`.
-4. **Document** the fix in `chromium-fourier/docs/dmabuf-zero-copy.md`
-   so the next person who lands on the same wall finds the breadcrumb
-   trail.
+   - File on `bugreports.qt.io` against `QtBase: OpenGL`, with: the
+     three diff hunks above, the exact behavior on Mali-G52 panfrost
+     RK3566 mainline 6.19, an excerpt of the journal noise, and
+     mesa 26.0.5 / qt 6.11.0 / kwin 6.6.4 versions.
+   - Push a Gerrit change against the `qtbase` `dev` branch
+     (`codereview.qt-project.org`). Qt won't accept a GitHub MR —
+     they live on Gerrit. Create a Qt account, accept the Qt
+     Contribution Agreement, configure `git-review`, push.
+   - Reference the chromium-fourier project as the discovery site
+     so the next Mali-on-Linux Qt 6 user finds the breadcrumb.
+4. **Document** the fix in
+   `chromium-fourier/docs/dmabuf-zero-copy.md` "Caveat — KWin 6.6.4
+   GLES backend on this hardware" subsection: replace the "to be
+   investigated" wording with "fixed by qt6-base-fourier; see
+   `marfrit-packages/arch/qt6-base-fourier/`. Upstream Qt change
+   pending review at ``."
+
+## Reflection — corporate IT spec leakage, as predicted
+
+The user's Phase-1 hypothesis was that this was the result of code
+written by people who never read the spec they were claiming to
+implement. They were correct, with one nuance: the Qt code did read
+the spec — *the OpenGL ES 2.x spec*, where `GL_ALPHA` is genuinely
+the canonical single-channel format for `glTexImage2D`. What it
+never went back and re-read is the OpenGL ES 3.0 spec
+(section 3.8.3, "Texture Image Specification"), where `GL_ALPHA`
+is moved to the deprecated list and only sized formats are
+retained. The bug is: *Qt 6 was written assuming "OpenGL ES" is
+one thing, and never updated the assumption when ES 3 dropped the
+unsized formats.* That's a corporate-IT-style architectural
+shortcut: codify the world in two boxes (desktop vs ES), call it
+done, ship. The fact that a category had a sub-category which moved
+in 2012 is not the framework's job to track. Until the bug report
+arrives and someone has to extend the boolean to a triple.
 
 ## What success looks like