# KWin pivot: fix the `glTexImage2D(GL_ALPHA)` stall

## What we know

KWin 6.6.4-1 on Arch Linux ARM (Plasma 6.6.4-1, mesa 26.0.5-1, libdrm 2.4.131-1) on ohm (PineTab2 / RK3566 / panfrost) silently corrupts its GL command queue mid-frame whenever a Wayland client posts a video buffer. The journal carries a rolling stream of:

```
kwin_wayland: 0x4: GL_INVALID_VALUE in glTexImage2D(internalFormat=GL_ALPHA)
kwin_wayland: 0x4: GL_INVALID_OPERATION in glTexSubImage2D(invalid texture level 0)   × N
```

Mesa is rejecting `GL_ALPHA` as the `internalFormat` for `glTexImage2D`. `GL_ALPHA` is the legacy single-channel alpha format from GLES 1.x; under **OpenGL ES 3.x** the preferred replacements are sized formats (`GL_R8`, `GL_RG8`, etc.). Note that the GLES 3.0 specification still lists unsized `GL_ALPHA` as legal, and it is core-profile desktop GL that removes it outright, so which context type KWin actually holds is an open question that Phase 1 step 4 should settle.

Once the texture allocation fails, the `glTexSubImage2D` calls that should populate it all error at level 0. KWin keeps retrying the same broken upload every frame, never recovers, and the present-callback path that depends on that texture stops acking client frames. Every Wayland video client deadlocks on the missing ack.

First occurrence in this box's journal: **2026-03-06**, so the bug predates any chromium-fourier work by roughly seven weeks.

## Triangulation already in hand

| Client | Outcome |
|---|---|
| chromium-fourier 149-r2 (with patch 3/3) | plays ~3 s @ 34.7 % CPU, then renderer/GPU park in `futex_do_wait` |
| chromium-fourier 149-r2 (without patch 3/3) | plays ~10 s (slower path delays surfacing), then identical deadlock |
| VLC | `cannot convert decoder/filter output to any format supported by the output` → `could not initialize video chain` |
| mpv `--vo=null --hwdec=v4l2request` | `Could not create device.` (mpv-side bug, separate, unrelated) |
| ffmpeg `-hwaccel v4l2request -i bbb -f null -` | plays through clean at 36 fps; hardware path is healthy |

The decode path is healthy on this hardware. The wall is exclusively the compositor's GL backend.
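To put numbers on the cascade described under "What we know" (one failed allocation followed by a run of failed uploads), the journal lines can be tallied by error kind and GL call. This is a minimal sketch: the regular expression is modelled only on the two journal lines quoted above, and `tally_gl_errors` is a hypothetical helper name, not part of any existing tool.

```python
import re
from collections import Counter

# Pattern modelled on the two journal lines quoted in "What we know".
# Assumption: other KWin/Mesa versions may format the prefix differently.
GL_ERROR_RE = re.compile(r"kwin_wayland: .*?(GL_INVALID_\w+) in (gl\w+)\(")


def tally_gl_errors(lines):
    """Count journal lines per (GL error, GL function) pair."""
    counts = Counter()
    for line in lines:
        match = GL_ERROR_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample input: the quoted lines, with the upload failure repeated twice.
sample = [
    "kwin_wayland: 0x4: GL_INVALID_VALUE in glTexImage2D(internalFormat=GL_ALPHA)",
    "kwin_wayland: 0x4: GL_INVALID_OPERATION in glTexSubImage2D(invalid texture level 0)",
    "kwin_wayland: 0x4: GL_INVALID_OPERATION in glTexSubImage2D(invalid texture level 0)",
]

for (error, call), n in sorted(tally_gl_errors(sample).items()):
    print(f"{error:22} {call:16} {n}")
```

Feeding it the real journal text (one line per list element) should show whether the ratio of `glTexSubImage2D` failures to `glTexImage2D` failures stays constant per frame, which is what the retry-every-frame reading predicts.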
## Constraint: ohm is the only test box on hand

ampere (RK3588 / panthor) is in the boxes-from-Shenzhen pile, currently DOWN. fresnel (RK3399 / Pinebook Pro) is offline. boltzmann (Rock 5 ITX+ build host) doesn't run KWin. We do every step on ohm; we accept the wifi flakiness and the occasional reboot.

## Phase 1: Reproduce outside chrome and bound the trigger (1 evening)

Goal: a deterministic, headless-or-near-headless reproduction that doesn't require launching an 800 MB browser.

1. **Smallest-possible client.** Build a 50-line C Wayland client that creates a buffer via `zwp_linux_dmabuf_v1`, pumps frames at 30 fps, and exits when KWin first errors. Use `weston-simple-dmabuf-egl` from the `weston` package as a starting template: it already does exactly this, but without our specific format/modifier matrix.
2. **Vary the format/modifier matrix.** Run the smallest-possible client with each of: NV12 + LINEAR, NV12 + AFBC, NV12 + AFRC, AR24 + LINEAR, XR24 + LINEAR. We already know NV12 paths trigger; confirming AR24/XR24 do *not* trigger localizes the bug to KWin's YUV import path (vs a generic dmabuf import bug).
3. **Vary the buffer dimensions.** Some KWin texture-cache paths allocate fixed-size internal scratch textures; non-power-of-two, non-multiple-of-16, or specifically odd-aspect cases sometimes trigger paths that common aspect ratios skip. Test 1920×1080, 1280×720, 854×480, 640×360 and a deliberately weird 1366×768.
4. **Vary KWin scene type.** Switch `kwin_wayland --scene-type=opengl` vs `--scene-type=opengl-es` (current default on this hardware). If the bug only fires under GLES, that's a strong signal: the offending site is in a GLES-only fallback.

By the end of Phase 1 we should have a one-line `weston-simple-dmabuf-egl -format=NV12 -modifier=…` invocation that triggers the GL_ALPHA error within seconds, plus a yes/no answer to "does AR24 also trigger".
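Steps 2 and 3 multiply out to a small test matrix, and writing it down up front makes it harder to skip a cell on a flaky box. The sketch below only enumerates the cases named above and flags two properties the steps care about (YUV vs RGB, and whether both dimensions are multiples of 16). The field names and `build_matrix` are hypothetical; it deliberately does not emit client command lines, since the real flags depend on the client we end up building.

```python
from itertools import product

# (fourcc, modifier) pairs exactly as listed in step 2.
FORMATS = [
    ("NV12", "LINEAR"),
    ("NV12", "AFBC"),
    ("NV12", "AFRC"),
    ("AR24", "LINEAR"),
    ("XR24", "LINEAR"),
]

# Buffer dimensions exactly as listed in step 3.
SIZES = [(1920, 1080), (1280, 720), (854, 480), (640, 360), (1366, 768)]


def build_matrix():
    """Return one dict per (format, modifier, size) combination."""
    cases = []
    for (fourcc, modifier), (width, height) in product(FORMATS, SIZES):
        cases.append({
            "fourcc": fourcc,
            "modifier": modifier,
            "width": width,
            "height": height,
            # NV12 is the only YUV format in the list.
            "is_yuv": fourcc == "NV12",
            # True only when both dimensions are multiples of 16.
            "aligned16": width % 16 == 0 and height % 16 == 0,
        })
    return cases


for case in build_matrix():
    print(
        f"{case['fourcc']}+{case['modifier']:6} "
        f"{case['width']}x{case['height']:<5} "
        f"yuv={case['is_yuv']} aligned16={case['aligned16']}"
    )
```

Of the five sizes only 1280×720 is a multiple of 16 in both dimensions (1080 and 360 both leave a remainder of 8), so most of the matrix already exercises the unaligned case.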
## Phase 2: Identify the call site (1–2 evenings)

The crime scene is somewhere in `kwin/src/scene/*` or `kwin/src/effects/*`. Suspects, ranked:

- **`SurfaceItemWayland::createPixmapTexture` → `GLTexture::create` with `GL_ALPHA`.** This is the most likely path: KWin allocates a fallback per-plane texture when the dmabuf import path can't take the buffer whole. NV12 has a Y plane (single-channel) and a CbCr plane (two-channel); historically the Y plane has been allocated as `GL_ALPHA` in software fallbacks. If the EGL dmabuf import returned `EGL_BAD_ATTRIBUTE` for `external_only` modifiers and KWin fell through to per-plane, this is exactly where it would land.
- **`BlurEffect::initBlurTexture` / `BackgroundContrastEffect::*`.** Single-channel noise textures for blur dither. Less likely (these fire on every frame regardless of video clients) but listed for completeness.
- **Window-decoration text glyph cache.** Qt's QGLTexture historically requested `GL_ALPHA` for monochrome glyph atlases. Plasma 6 should have moved to `GL_RED` long ago, but a stale code path in a third-party theme or systray icon could still hit it.
- **Cursor texture upload via `wl_shm_pool` + ARGB8888.** KWin's cursor scene sometimes uploads via `glTexImage2D`, but the format there is `GL_RGBA`, not `GL_ALPHA`. Probably not the suspect.

Tooling to identify *which*:

1. **`apitrace trace --api egl kwin_wayland …`** then `apitrace dump trace.trace | grep -B5 GL_ALPHA`. Apitrace gives us the C++ call stack at the offending site if KWin was built with debug symbols.
2. **`MESA_DEBUG=context KWIN_GL_DEBUG=1 kwin_wayland --replace`**, together with the `glDebugMessageCallback` that KWin's `OpenGLBackend` already installs, will print the source/type/severity for each `GL_INVALID_VALUE`. Whether the message identifies the user-space caller depends on Mesa's debug-extension support; on panfrost it usually includes the GL function name and an ID, but not the C++ source. That is what apitrace adds.
3. **Build kwin from source** (`extra/kwin` PKGBUILD on Arch ARM, patch in `-DDEBUG=ON`, `-DCMAKE_BUILD_TYPE=Debug`) so the call stacks resolve to file:line.

## Phase 3: Write the patch (½ evening once Phase 2 is done)

If the offender is a `GL_ALPHA` allocation in a GLES3 context, the fix is mechanical:

```diff
- glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, width, height, 0,
-              GL_ALPHA, GL_UNSIGNED_BYTE, data);
+ glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, width, height, 0,
+              GL_RED, GL_UNSIGNED_BYTE, data);
```

…and adjust the consuming shader's swizzle:

```diff
- gl_FragColor = vec4(texture2D(s, uv).a, …);
+ gl_FragColor = vec4(texture2D(s, uv).r, …);
```

If the offender is a per-plane fallback in the dmabuf import path (suspect #1 above), the patch is larger because the right fix is to *not fall through to the broken path*: handle the `external_only` case by binding `GL_TEXTURE_EXTERNAL_OES` instead. That mirrors the chromium-fourier patch 3/3 done at the chromium layer; symmetry says KWin should do the same in its `glTexImage` consumer.

## Phase 4: Ship and upstream (1 evening)

1. **Local Arch package** as `kwin-fourier` under `marfrit-packages/arch/kwin-fourier/`, sibling to chromium-fourier and firefox-fourier. PKGBUILD inherits from `extra/kwin`, drops in our patch, bumps `pkgrel`. Same `provides=kwin conflicts=kwin` pattern.
2. **Validate on ohm** by running the chromium-fourier 149-r2 build + the bbb sample for a minute uninterrupted. Success = no GL_ALPHA in the journal, no stall, smooth playback at the 34.7 % CPU number from the chromium validation.
3. **Upstream** via:
   - File a `kwin` bug on bugs.kde.org with: apitrace fragment, our hardware (Mali-G52 panfrost on RK3566 mainline), exact mesa version, repro steps via `weston-simple-dmabuf-egl` if Phase 1 produced one.
   - Push an MR to invent.kde.org/plasma/kwin against `master`.
4. **Document** the fix in `chromium-fourier/docs/dmabuf-zero-copy.md` so the next person who lands on the same wall finds the breadcrumb trail.

## What success looks like

`chromium-fourier-149-r2` on ohm under KWin Wayland plays `bbb_1080p30_h264.mp4` end-to-end at the 34.7 % CPU figure already recorded by the architectural validation, with zero `GL_INVALID_VALUE` in the journal during playback. That number is the goal of the entire chromium-fourier campaign for RK3566; it is currently blocked on a bug that has nothing to do with chromium.

## Scope discipline

We do not turn this into "audit the entire KWin GLES backend." If Phase 2 surfaces additional latent GL_INVALID_* errors that don't matter for video playback, we note them in the bug report and move on. The pivot is explicitly "remove this single wall so the chromium-fourier patch series can ship a working stack on RK3566."
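The success criterion and the scope rule can be folded into one pass/fail check over the journal lines captured during a playback run. This is a sketch under assumptions: `playback_verdict` is a hypothetical name, the line pattern is taken only from the journal excerpt in "What we know", and the CPU figure is not checked here because this document does not define a tolerance for it.

```python
import re

# Line shape taken from the journal excerpt in "What we know".
# Assumption: the "<error> in <function>(" wording is stable across runs.
GL_ERROR_RE = re.compile(r"(GL_INVALID_\w+) in (gl\w+)\(")


def playback_verdict(journal_lines):
    """Return (passed, blocking, noted) for one playback window."""
    blocking, noted = [], []
    for line in journal_lines:
        match = GL_ERROR_RE.search(line)
        if not match:
            continue
        if match.group(1) == "GL_INVALID_VALUE":
            # Success requires zero GL_INVALID_VALUE during playback.
            blocking.append(line)
        else:
            # Other GL_INVALID_* lines go in the bug report, not the gate.
            noted.append(line)
    return (not blocking, blocking, noted)


# The broken state today: both quoted lines appear in the window.
broken_window = [
    "kwin_wayland: 0x4: GL_INVALID_VALUE in glTexImage2D(internalFormat=GL_ALPHA)",
    "kwin_wayland: 0x4: GL_INVALID_OPERATION in glTexSubImage2D(invalid texture level 0)",
]

passed, blocking, noted = playback_verdict(broken_window)
print("passed:", passed, "blocking:", len(blocking), "noted:", len(noted))
```

Keeping `noted` separate from `blocking` is the scope rule in code form: anything that is not the single wall gets recorded for the bug report without failing the run.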