marfrit-packages/arch/chromium-fourier/KWIN_PIVOT.md
marfrit 8756ce38be
chromium-fourier r2 + firefox-fourier 150.0.1 + KWIN_PIVOT.md
chromium-fourier:
- patch 3/3 nv12-external-oes-on-modifier-external-only.patch — adds
  NativePixmapEGLBinding::ModifierRequiresExternalOES helper, extends
  OzoneImageGLTexturesHolder::GetBinding to honor EGL external_only
  flag for NV12 dmabufs on panfrost / panthor. Validated on ohm
  (RK3566 hantro mainline 6.19.10): bbb_1080p30_h264.mp4 plays at
  34.7 % combined CPU vs ~131 % pre-patch baseline (~3.8x).
- PKGBUILD pkgrel 1->2, source array + sha256sums + prepare() hook for
  patch 4, patch numbering 1/2,2/2 -> 1/3,2/3,3/3.
- NEXT.md appended with 2026-04-28 section: patch 4 design, validation
  log, KWin GL_ALPHA bug pinpoint (preexisting since 2026-03-06,
  affects every wayland video client; unrelated to chromium-fourier),
  device-renumbering note (/dev/video1 = encoder post-reboot).
- KWIN_PIVOT.md: 4-phase plan to identify and patch KWin's
  glTexImage2D(internalFormat=GL_ALPHA) site, ohm-only test plan,
  scope discipline.
- patches/ now tracked (compiler-rt-adjust-paths, enable-v4l2,
  wayland-allow-direct-egl-gles2, nv12-external-oes); the dead-end
  chromeos-pipeline-bypass.patch removed.

firefox-fourier:
- 4 patches (gfxinfo v4l2 stateless fourccs, libwrapper hwdevice ctx,
  ffmpegvideo v4l2-request route, prefs v4l2-request default).
- PKGBUILD bumped to firefox 150.0.1, Arch toolchain glue patches
  layered in, mozconfig with --without-wasm-sandboxed-libraries for
  ALARM, package() launcher fix (rm -f symlink before cat > to avoid
  ENOENT through the dangling /usr/local symlink mach install drops).
- 150.0.1-1-aarch64.pkg.tar.zst built on boltzmann (95 MB), pending
  fresnel power-on for V4L2 stateless validation on RK3399.
2026-04-28 12:02:18 +00:00

# KWin pivot — fix the `glTexImage2D(GL_ALPHA)` stall
## What we know
KWin 6.6.4-1 on Arch Linux ARM (Plasma 6.6.4-1, mesa 26.0.5-1, libdrm
2.4.131-1) on ohm (PineTab2 / RK3566 / panfrost) silently corrupts its
GL command queue mid-frame whenever a wayland client posts a video
buffer. The journal carries a rolling stream of:
```
kwin_wayland: 0x4: GL_INVALID_VALUE in glTexImage2D(internalFormat=GL_ALPHA)
kwin_wayland: 0x4: GL_INVALID_OPERATION in glTexSubImage2D(invalid texture level 0) × N
```
`GL_ALPHA` is a legacy unsized format. The desktop OpenGL core profile
removed it outright. **OpenGL ES 3.x** still lists it as an unsized
`internalFormat`, but only paired with `format=GL_ALPHA` and
`type=GL_UNSIGNED_BYTE`, with the sized formats (`GL_R8` for one
channel, `GL_RG8` for two) as the preferred replacement. Which of the
two rules KWin is tripping is not yet established; Phase 1 step 4
settles it. Either way, once the texture allocation fails, the
`glTexSubImage2D` calls that should populate it all error at level 0.
KWin keeps retrying the same broken upload every frame, never recovers,
and the present-callback path that depends on that texture stops acking
client frames. Every wayland video client deadlocks on the missing ack.
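To put a number on the "rolling stream", the journal can be tallied per error and GL entry point. The following is a minimal sketch; the regular expression is inferred from the two log lines quoted above and is an assumption about the message shape, and `count_gl_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Message shape assumed from the two journal lines quoted above, e.g.
#   kwin_wayland: 0x4: GL_INVALID_VALUE in glTexImage2D(internalFormat=GL_ALPHA)
# The prefix is deliberately not matched so any journalctl output mode works.
GL_ERROR_RE = re.compile(r"(GL_INVALID_\w+) in (gl\w+)\(")


def count_gl_errors(lines):
    """Count (error, GL function) pairs in an iterable of journal lines."""
    counts = Counter()
    for line in lines:
        match = GL_ERROR_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts
```

One way to feed it is to save `journalctl -b _COMM=kwin_wayland` to a file and pass the open file object to `count_gl_errors`; a healthy session should return an empty counter.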
First occurrence in this box's journal: **2026-03-06** — the bug
predates any chromium-fourier work by roughly seven weeks.
## Triangulation already in hand
| Client | Outcome |
|---|---|
| chromium-fourier 149-r2 (with patch 3/3) | plays ~3 s @ 34.7 % CPU then renderer/GPU park in `futex_do_wait` |
| chromium-fourier 149-r2 (without patch 3/3) | plays ~10 s (slower path delays surfacing) then identical deadlock |
| VLC | `cannot convert decoder/filter output to any format supported by the output` → `could not initialize video chain` |
| mpv `--vo=null --hwdec=v4l2request` | `Could not create device.` (mpv-side bug, separate, unrelated) |
| ffmpeg `-hwaccel v4l2request -i bbb -f null -` | plays through clean at 36 fps; hardware path is healthy |
Decode path is healthy on this hardware. The wall is exclusively the
compositor's GL backend.
## Constraint: ohm is the only test box on hand
ampere (RK3588 / panthor) is in the boxes-from-Shenzhen pile, currently
DOWN. fresnel (RK3399 / Pinebook Pro) is offline. boltzmann (Rock 5
ITX+ build host) doesn't run KWin. We do every step on ohm; we accept
the wifi flakiness and the occasional reboot.
## Phase 1 — Reproduce outside chrome and bound the trigger (1 evening)
Goal: a deterministic, headless-or-near-headless reproduction that
doesn't require launching an 800 MB browser.
1. **Smallest-possible client.** Build a 50-line C wayland client that
creates a buffer through `zwp_linux_dmabuf_v1`, pumps frames at 30 fps,
and exits when KWin first errors. Use `weston-simple-dmabuf-egl` from
the `weston` package as a starting template: it already does this,
but without our specific format/modifier matrix.
2. **Vary the format/modifier matrix.** Run the smallest-possible
client with each of: NV12 + LINEAR, NV12 + AFBC, NV12 + AFRC,
AR24 + LINEAR, XR24 + LINEAR. We already know NV12 paths trigger;
confirming AR24/XR24 do *not* trigger localizes the bug to KWin's
YUV import path (vs a generic dmabuf import bug).
3. **Vary the buffer dimensions.** Some KWin texture-cache paths
allocate fixed-size internal scratch textures; non-power-of-two,
non-multiple-of-16, or specifically odd-aspect cases sometimes
trigger paths that healthy aspect ratios skip. Test 1920×1080,
1280×720, 854×480, 640×360 and a deliberately weird 1366×768.
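4. **Vary KWin scene type.** Run KWin once on desktop OpenGL and once
on OpenGL ES (the current default on this hardware), for example via
`KWIN_COMPOSE=O2` vs `KWIN_COMPOSE=O2ES` in its environment; the exact
switch should be checked against the KWin 6.6.4 sources. If the bug
only fires under GLES, that's a strong signal that the offending site
is in a GLES-only fallback.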
By the end of Phase 1 we should have a one-line `weston-simple-dmabuf-egl
-format=NV12 -modifier=…` that triggers the GL_ALPHA error within
seconds, plus a yes/no answer to "does AR24 also trigger".
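The matrix from steps 2 and 3 is small enough to enumerate up front, which makes it harder to skip a cell on a flaky evening. This is a sketch only: `fourcc` mirrors the packing done by `fourcc_code()` in `drm_fourcc.h`, the modifier names are kept as labels because the exact AFBC/AFRC modifier values panfrost advertises are not recorded here, and `build_matrix` is a hypothetical helper name.

```python
from itertools import product


def fourcc(code):
    """Pack a four-character code the way drm_fourcc.h's fourcc_code() does."""
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)


# Format/modifier pairs and buffer sizes copied from steps 2 and 3.
FORMAT_MODIFIER_PAIRS = [
    ("NV12", "LINEAR"), ("NV12", "AFBC"), ("NV12", "AFRC"),
    ("AR24", "LINEAR"), ("XR24", "LINEAR"),
]
SIZES = [(1920, 1080), (1280, 720), (854, 480), (640, 360), (1366, 768)]


def build_matrix():
    """Return one dict per (format, modifier, size) cell to test."""
    cases = []
    for (fmt, modifier), (width, height) in product(FORMAT_MODIFIER_PAIRS, SIZES):
        cases.append({
            "format": fmt,
            "fourcc": fourcc(fmt),
            "modifier": modifier,
            "width": width,
            "height": height,
            # Flags the non-multiple-of-16 cases called out in step 3.
            "aligned16": width % 16 == 0 and height % 16 == 0,
        })
    return cases
```

Of the five sizes only 1280×720 is a multiple of 16 in both dimensions, so most of the matrix already exercises the unaligned paths.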
## Phase 2 — Identify the call site (1–2 evenings)
The crime scene is somewhere in `kwin/src/scene/*` or
`kwin/src/effects/*`. Suspects, ranked:
- **`SurfaceItemWayland::createPixmapTexture` → `GLTexture::create`
with `GL_ALPHA`.** This is the most likely path: KWin allocates a
fallback per-plane texture when the dmabuf import path can't take
the buffer whole. NV12 has a Y plane (single-channel) and a CbCr
plane (two-channel); historically the Y plane has been allocated as
`GL_ALPHA` in software fallbacks. If the EGL dmabuf import returned
`EGL_BAD_ATTRIBUTE` for `external_only` modifiers and KWin fell
through to per-plane, this is exactly where it would land.
- **`BlurEffect::initBlurTexture` / `BackgroundContrastEffect::*`.**
Single-channel noise textures for blur dither. Less likely (these
fire on every frame regardless of video clients) but listed for
completeness.
- **Window-decoration text glyph cache.** Qt's OpenGL glyph cache
(`QOpenGLTextureGlyphCache`) historically requested `GL_ALPHA` for
monochrome glyph atlases. Plasma 6 should
have moved to `GL_RED` long ago, but a stale code path in a
third-party theme or systray icon could still hit it.
- **Cursor texture upload via `wl_shm_pool` + ARGB8888.** KWin's
cursor scene sometimes uploads via glTexImage2D — but the format
there is `GL_RGBA`, not `GL_ALPHA`. Probably not the suspect.
Tooling to identify *which*:
1. **`apitrace trace --api egl kwin_wayland …`** then
`apitrace dump trace.trace | grep -B5 GL_ALPHA`. Apitrace gives
us the C++ call stack at the offending site if KWin was built with
debug symbols.
2. **`MESA_DEBUG=context KWIN_GL_DEBUG=1 kwin_wayland --replace`**
plus `glDebugMessageCallback` already installed in KWin's
`OpenGLBackend` will print the source/type/severity for each
`GL_INVALID_VALUE`. Whether the file/line in the message includes
the user-space caller depends on Mesa's debug-extension support;
on panfrost it usually does include the GL function name and an
ID, but not the C++ source — that is what apitrace adds.
3. **Build kwin from source** (`extra/kwin` PKGBUILD on Arch ARM,
patch in `-DDEBUG=ON`, `-DCMAKE_BUILD_TYPE=Debug`) so the call
stacks resolve to file:line.
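A plain `grep` on the dump finds the offending call but discards its arguments, and the texture dimensions are what separate the suspects: a video-sized upload points at the per-plane fallback, a small atlas at the glyph cache. The sketch below pulls both out. It assumes `apitrace dump` prints one call per line as a call number followed by `name = value` arguments (run with colour disabled), which should be verified against a real trace; `find_alpha_uploads` is a hypothetical helper name.

```python
import re

# Assumed shape of one `apitrace dump` line:
#   102 glTexImage2D(target = GL_TEXTURE_2D, level = 0, internalformat = GL_ALPHA, ...)
CALL_RE = re.compile(r"^(\d+)\s+glTexImage2D\((.*)\)")
ARG_RE = re.compile(r"(\w+) = ([^,()]+)")


def find_alpha_uploads(dump_lines, internal_format="GL_ALPHA"):
    """Return (call number, width, height) for each matching glTexImage2D call."""
    hits = []
    for line in dump_lines:
        match = CALL_RE.match(line.strip())
        if not match:
            continue
        args = dict(ARG_RE.findall(match.group(2)))
        if args.get("internalformat") == internal_format:
            hits.append((
                int(match.group(1)),
                int(args.get("width", 0)),
                int(args.get("height", 0)),
            ))
    return hits
```

The call numbers it returns can then be looked up in the trace for their surrounding calls.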
## Phase 3 — Write the patch (½ evening once Phase 2 is done)
If the offender is a `GL_ALPHA` allocation in a GLES3 context, the
fix is mechanical:
```diff
- glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, width, height, 0,
- GL_ALPHA, GL_UNSIGNED_BYTE, data);
+ glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, width, height, 0,
+ GL_RED, GL_UNSIGNED_BYTE, data);
```
…and adjust the consuming shader's swizzle:
```diff
- gl_FragColor = vec4(texture2D(s, uv).a, …);
+ gl_FragColor = vec4(texture2D(s, uv).r, …);
```
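The two diffs above amount to a lookup from a legacy format to a sized one, plus the channel the shader has to read afterwards. A sketch of that lookup follows, using the enum values from the Khronos GL headers. The `GL_LUMINANCE_ALPHA` row is an assumption added for the two-channel CbCr plane mentioned in Phase 2, and `modernize` is a hypothetical name, not a KWin function.

```python
# Enum values as defined in the Khronos GL headers.
GL_RED, GL_ALPHA, GL_LUMINANCE_ALPHA = 0x1903, 0x1906, 0x190A
GL_RG, GL_R8, GL_RG8 = 0x8227, 0x8229, 0x822B

# legacy format -> (sized internalFormat, format, {old shader channel: new channel})
MODERN = {
    # One channel: alpha now arrives in .r, as in the shader diff above.
    GL_ALPHA: (GL_R8, GL_RED, {"a": "r"}),
    # Two channels (assumed CbCr case): luminance in .r, alpha in .g.
    GL_LUMINANCE_ALPHA: (GL_RG8, GL_RG, {"rgb": "rrr", "a": "g"}),
}


def modernize(legacy_format):
    """Return the replacement triple, or None if the format needs no change."""
    return MODERN.get(legacy_format)
```

Keeping the swizzle next to the format makes it harder to change the upload and forget the consuming shader.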
If the offender is a per-plane fallback in the dmabuf import path
(suspect #1 above), the patch is larger because the right fix is to
*not fall through to the broken path* — handle the `external_only`
case by binding `GL_TEXTURE_EXTERNAL_OES` instead. That mirrors the
chromium-fourier patch 3/3 done at the chromium layer; symmetry says
KWin should do the same in its `glTexImage` consumer.
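The decision for the larger patch can be stated compactly: the `external_only` flag that `eglQueryDmaBufModifiersEXT` reports per modifier selects the texture target and sampler type, and an unadvertised modifier is rejected instead of falling through. The sketch below models only that decision, with the query results passed in as a plain dict. `choose_binding` is a hypothetical name and says nothing about how KWin's import code is actually structured.

```python
# Enum values as defined in the Khronos GL / GLES extension headers.
GL_TEXTURE_2D = 0x0DE1
GL_TEXTURE_EXTERNAL_OES = 0x8D65


def choose_binding(modifier, external_only_by_modifier):
    """Pick (texture target, GLSL sampler type) for a dmabuf import.

    external_only_by_modifier maps each modifier the driver advertises for
    the buffer's format to the external_only flag reported alongside it.
    """
    if modifier not in external_only_by_modifier:
        return None  # not advertised: reject the import, no per-plane fallback
    if external_only_by_modifier[modifier]:
        return (GL_TEXTURE_EXTERNAL_OES, "samplerExternalOES")
    return (GL_TEXTURE_2D, "sampler2D")
```

For example, `choose_binding(0, {0: True})` selects the external target for a linear buffer (`DRM_FORMAT_MOD_LINEAR` is 0) that the driver marks `external_only`.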
## Phase 4 — Ship and upstream (1 evening)
1. **Local Arch package** as `kwin-fourier` under
`marfrit-packages/arch/kwin-fourier/`, sibling to chromium-fourier
and firefox-fourier. PKGBUILD inherits from `extra/kwin`, drops
in our patch, bumps `pkgrel`. Same `provides=(kwin)` /
`conflicts=(kwin)` pattern.
2. **Validate on ohm** by running the chromium-fourier 149-r2 build +
the bbb sample for a minute uninterrupted. Success = no GL_ALPHA
in the journal, no stall, smooth playback at the 34.7 % CPU
number from the chromium validation.
3. **Upstream** via:
   - File a `kwin` bug on bugs.kde.org with: apitrace fragment, our
     hardware (Mali-G52 panfrost on RK3566 mainline), exact mesa
     version, repro steps via `weston-simple-dmabuf-egl` if Phase 1
     produced one.
   - Push an MR to invent.kde.org/plasma/kwin against `master`.
4. **Document** the fix in `chromium-fourier/docs/dmabuf-zero-copy.md`
so the next person who lands on the same wall finds the breadcrumb
trail.
## What success looks like
`chromium-fourier-149-r2` on ohm under KWin Wayland plays
`bbb_1080p30_h264.mp4` end-to-end at the 34.7 % CPU figure already
recorded by the architectural validation, with zero `GL_INVALID_VALUE`
in the journal during playback. That number is the goal of the entire
chromium-fourier campaign for RK3566 — it is currently blocked on a
bug that has nothing to do with chromium.
## Scope discipline
We do not turn this into "audit the entire KWin GLES backend." If
Phase 2 surfaces additional latent GL_INVALID_* errors that don't
matter for video playback, we note them in the bug report and move
on. The pivot is explicitly "remove this single wall so the
chromium-fourier patch series can ship a working stack on RK3566."