chromium-fourier r2 + firefox-fourier 150.0.1 + KWIN_PIVOT.md
build and publish packages / distcc-avahi-aarch64 (push) Successful in 46s
build and publish packages / lmcp-any (push) Successful in 9s
build and publish packages / lmcp-debian (push) Successful in 4s
build and publish packages / claude-his-any (push) Successful in 7s
build and publish packages / ffmpeg-v4l2-request-aarch64 (push) Successful in 12m8s
build and publish packages / claude-his-debian (push) Successful in 5s
chromium-fourier:
- patch 3/3 nv12-external-oes-on-modifier-external-only.patch: adds
  NativePixmapEGLBinding::ModifierRequiresExternalOES helper, extends
  OzoneImageGLTexturesHolder::GetBinding to honor EGL external_only flag
  for NV12 dmabufs on panfrost / panthor. Validated on ohm (RK3566 hantro
  mainline 6.19.10): bbb_1080p30_h264.mp4 plays at 34.7 % combined CPU vs
  ~131 % pre-patch baseline (~3.8x).
- PKGBUILD pkgrel 1->2, source array + sha256sums + prepare() hook for
  patch 4, patch numbering 1/2,2/2 -> 1/3,2/3,3/3.
- NEXT.md appended with 2026-04-28 section: patch 4 design, validation
  log, KWin GL_ALPHA bug pinpoint (preexisting since 2026-03-06, affects
  every wayland video client; unrelated to chromium-fourier),
  device-renumbering note (/dev/video1 = encoder post-reboot).
- KWIN_PIVOT.md: 4-phase plan to identify and patch KWin's
  glTexImage2D(internalFormat=GL_ALPHA) site, ohm-only test plan, scope
  discipline.
- patches/ now tracked (compiler-rt-adjust-paths, enable-v4l2,
  wayland-allow-direct-egl-gles2, nv12-external-oes); the dead-end
  chromeos-pipeline-bypass.patch removed.

firefox-fourier:
- 4 patches (gfxinfo v4l2 stateless fourccs, libwrapper hwdevice ctx,
  ffmpegvideo v4l2-request route, prefs v4l2-request default).
- PKGBUILD bumped to firefox 150.0.1, Arch toolchain glue patches layered
  in, mozconfig with --without-wasm-sandboxed-libraries for ALARM,
  package() launcher fix (rm -f symlink before cat > to avoid ENOENT
  through the dangling /usr/local symlink mach install drops).
- 150.0.1-1-aarch64.pkg.tar.zst built on boltzmann (95 MB), pending
  fresnel power-on for V4L2 stateless validation on RK3399.
@@ -0,0 +1,181 @@
# KWin pivot: fix the `glTexImage2D(GL_ALPHA)` stall

## What we know

KWin 6.6.4-1 on Arch Linux ARM (Plasma 6.6.4-1, mesa 26.0.5-1, libdrm
2.4.131-1) on ohm (PineTab2 / RK3566 / panfrost) silently corrupts its
GL command queue mid-frame whenever a wayland client posts a video
buffer. The journal carries a rolling stream of:

```
kwin_wayland: 0x4: GL_INVALID_VALUE in glTexImage2D(internalFormat=GL_ALPHA)
kwin_wayland: 0x4: GL_INVALID_OPERATION in glTexSubImage2D(invalid texture level 0) × N
```

The driver rejects `GL_ALPHA` as the `internalFormat` for
`glTexImage2D`. `GL_ALPHA` is a legacy unsized single-channel format:
desktop OpenGL core profiles removed it, and **OpenGL ES 3.x** keeps it
only for backward compatibility, steering new code toward sized formats
such as `GL_R8` or `GL_RG8`. Once the texture allocation fails, every
`glTexSubImage2D` call that should populate it errors at level 0. KWin
keeps retrying the same broken upload every frame, never recovers, and
the present-callback path that depends on that texture stops acking
client frames. Every wayland video client deadlocks on the missing ack.

First occurrence in this box's journal: **2026-03-06**. The bug
predates any chromium-fourier work by roughly seven weeks.
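
To put numbers on the "rolling stream", a small journal tally is enough. The sketch below is ours, not part of KWin: the function name `summarize_gl_errors` is hypothetical, and the pattern is modelled only on the two journal lines quoted above, so it may need loosening for other message shapes.

```python
import re
from collections import Counter

# Matches the "<error> in <function>(<detail>)" shape of the quoted lines.
GL_ERROR = re.compile(r"(GL_INVALID_\w+) in (gl\w+)\((.*)\)")


def summarize_gl_errors(lines):
    """Count (error, function) pairs in an iterable of journal lines."""
    counts = Counter()
    for line in lines:
        match = GL_ERROR.search(line)
        if match:
            error, function, _detail = match.groups()
            counts[(error, function)] += 1
    return counts


# Sample input: the two messages from the journal plus one unrelated line.
sample = [
    "kwin_wayland: 0x4: GL_INVALID_VALUE in glTexImage2D(internalFormat=GL_ALPHA)",
    "kwin_wayland: 0x4: GL_INVALID_OPERATION in glTexSubImage2D(invalid texture level 0)",
    "kwin_wayland: 0x4: GL_INVALID_OPERATION in glTexSubImage2D(invalid texture level 0)",
    "systemd[1]: Reached target Graphical Interface.",
]

summary = summarize_gl_errors(sample)
for (error, function), count in sorted(summary.items()):
    print(f"{count:4d}  {error} in {function}")
```

Feeding it the output of `journalctl -b` before and after a candidate fix gives a quick before/after comparison without reading the journal by eye.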

## Triangulation already in hand

| Client | Outcome |
|---|---|
| chromium-fourier 149-r2 (with patch 3/3) | plays ~3 s @ 34.7 % CPU, then renderer/GPU park in `futex_do_wait` |
| chromium-fourier 149-r2 (without patch 3/3) | plays ~10 s (slower path delays surfacing), then identical deadlock |
| VLC | `cannot convert decoder/filter output to any format supported by the output` → `could not initialize video chain` |
| mpv `--vo=null --hwdec=v4l2request` | `Could not create device.` (mpv-side bug, separate, unrelated) |
| ffmpeg `-hwaccel v4l2request -i bbb -f null -` | plays through clean at 36 fps; hardware path is healthy |

Decode path is healthy on this hardware. The wall is exclusively the
compositor's GL backend.

## Constraint: ohm is the only test box on hand

ampere (RK3588 / panthor) is in the boxes-from-Shenzhen pile, currently
DOWN. fresnel (RK3399 / Pinebook Pro) is offline. boltzmann (Rock 5
ITX+ build host) doesn't run KWin. We do every step on ohm; we accept
the wifi flakiness and the occasional reboot.

## Phase 1: Reproduce outside chrome and bound the trigger (1 evening)

Goal: a deterministic, headless-or-near-headless reproduction that
doesn't require launching an 800-MB browser.

1. **Smallest-possible client.** Build a 50-line C wayland client that
   creates a buffer through `zwp_linux_dmabuf_v1`, pumps frames at
   30 fps, and exits when KWin first errors. Use
   `weston-simple-dmabuf-egl` from the `weston` package as a starting
   template; it already does exactly this but without our specific
   format/modifier matrix.
2. **Vary the format/modifier matrix.** Run the smallest-possible
   client with each of: NV12 + LINEAR, NV12 + AFBC, NV12 + AFRC,
   AR24 + LINEAR, XR24 + LINEAR. We already know NV12 paths trigger;
   confirming AR24/XR24 do *not* trigger localizes the bug to KWin's
   YUV import path (vs a generic dmabuf import bug).
3. **Vary the buffer dimensions.** Some KWin texture-cache paths
   allocate fixed-size internal scratch textures; non-power-of-two,
   non-multiple-of-16, or specifically odd-aspect cases sometimes
   trigger paths that healthy aspect ratios skip. Test 1920×1080,
   1280×720, 854×480, 640×360 and a deliberately weird 1366×768.
4. **Vary KWin scene type.** Switch
   `kwin_wayland --scene-type=opengl` vs `--scene-type=opengl-es`
   (current default on this hardware). If this build has no such flag,
   the `KWIN_COMPOSE` environment variable (`O2` vs `O2ES`) should
   select the same backends. If the bug only fires under GLES, that's
   a strong signal: the offending site is in a GLES-only fallback.

By the end of Phase 1 we should have a one-line
`weston-simple-dmabuf-egl -format=NV12 -modifier=…` that triggers the
GL_ALPHA error within seconds, plus a yes/no answer to "does AR24 also
trigger".
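
The sweep in steps 2 and 3 is small enough to enumerate up front so no cell gets skipped on a flaky evening. This is a minimal sketch under our own naming (`fourcc`, `build_matrix` are hypothetical helpers). The fourcc packing follows `drm_fourcc.h`; modifiers are carried as labels only, because the numeric modifier values depend on vendor feature bits we have not pinned down here.

```python
from itertools import product


def fourcc(code):
    """Pack a four-character DRM format code the way drm_fourcc.h does."""
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)


# Format/modifier pairs and sizes exactly as listed in steps 2 and 3.
FORMAT_MODIFIERS = [
    ("NV12", "LINEAR"),
    ("NV12", "AFBC"),
    ("NV12", "AFRC"),
    ("AR24", "LINEAR"),
    ("XR24", "LINEAR"),
]
SIZES = [(1920, 1080), (1280, 720), (854, 480), (640, 360), (1366, 768)]


def build_matrix():
    """Return one dict per (format, modifier, size) test case."""
    cases = []
    for (fmt, modifier), (width, height) in product(FORMAT_MODIFIERS, SIZES):
        cases.append({
            "format": fmt,
            "fourcc": fourcc(fmt),
            "modifier": modifier,
            "width": width,
            "height": height,
            # True only when both dimensions are multiples of 16.
            "aligned16": width % 16 == 0 and height % 16 == 0,
        })
    return cases


matrix = build_matrix()
for case in matrix:
    print(f"{case['format']} (0x{case['fourcc']:08x}) {case['modifier']:6s} "
          f"{case['width']}x{case['height']} aligned16={case['aligned16']}")
```

Of the five sizes, only 1280×720 is a multiple of 16 in both dimensions, so the list already leans heavily on the unaligned paths that step 3 is meant to probe.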

## Phase 2: Identify the call site (1–2 evenings)

The crime scene is somewhere in `kwin/src/scene/*` or
`kwin/src/effects/*`. Suspects, ranked:

- **`SurfaceItemWayland::createPixmapTexture` → `GLTexture::create`
  with `GL_ALPHA`.** This is the most likely path: KWin allocates a
  fallback per-plane texture when the dmabuf import path can't take
  the buffer whole. NV12 has a Y plane (single-channel) and a CbCr
  plane (two-channel); historically the Y plane has been allocated as
  `GL_ALPHA` in software fallbacks. If the EGL dmabuf import returned
  `EGL_BAD_ATTRIBUTE` for `external_only` modifiers and KWin fell
  through to per-plane, this is exactly where it would land.
- **`BlurEffect::initBlurTexture` / `BackgroundContrastEffect::*`.**
  Single-channel noise textures for blur dither. Less likely (these
  fire on every frame regardless of video clients) but listed for
  completeness.
- **Window-decoration text glyph cache.** Qt's OpenGL glyph cache
  historically requested `GL_ALPHA` for monochrome glyph atlases.
  Plasma 6 should have moved to `GL_RED` long ago, but a stale code
  path in a third-party theme or systray icon could still hit it.
- **Cursor texture upload via `wl_shm_pool` + ARGB8888.** KWin's
  cursor scene sometimes uploads via `glTexImage2D`, but the format
  there is `GL_RGBA`, not `GL_ALPHA`. Probably not the suspect.
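
For the first suspect it helps to have the per-plane shapes written down, since those are the allocations a per-plane fallback would make. The sketch below is an illustration, not KWin code: `nv12_planes` and `nv12_bytes` are hypothetical names, rows are assumed tightly packed (real dmabufs carry a stride), and the `GL_R8` / `GL_RG8` labels are what we would expect a GLES3-clean upload to use.

```python
def nv12_planes(width, height):
    """Return the texture shape of each plane of an NV12 frame.

    Plane 0 is full-resolution luma, one byte per texel.
    Plane 1 is interleaved CbCr, subsampled 2x2, two bytes per texel.
    """
    chroma_width = (width + 1) // 2
    chroma_height = (height + 1) // 2
    return [
        {"plane": "Y", "width": width, "height": height,
         "channels": 1, "sized_format": "GL_R8"},
        {"plane": "CbCr", "width": chroma_width, "height": chroma_height,
         "channels": 2, "sized_format": "GL_RG8"},
    ]


def nv12_bytes(width, height):
    """Total payload in bytes, assuming tightly packed rows (no stride)."""
    return sum(p["width"] * p["height"] * p["channels"]
               for p in nv12_planes(width, height))


for plane in nv12_planes(1920, 1080):
    print(plane)
print("total bytes:", nv12_bytes(1920, 1080))
```

The single-channel Y plane is the one that would historically have been requested as `GL_ALPHA`, which is why it sits at the top of the suspect list.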

Tooling to identify *which*:

1. **`apitrace trace --api egl kwin_wayland …`** then
   `apitrace dump trace.trace | grep -B5 GL_ALPHA`. Apitrace gives
   us the C++ call stack at the offending site if KWin was built with
   debug symbols.
2. **`MESA_DEBUG=context KWIN_GL_DEBUG=1 kwin_wayland --replace`**
   plus the `glDebugMessageCallback` already installed in KWin's
   `OpenGLBackend` will print the source/type/severity for each
   `GL_INVALID_VALUE`. Whether the message identifies the user-space
   caller depends on Mesa's debug-extension support; on panfrost it
   usually includes the GL function name and an ID, but not the C++
   source. That is what apitrace adds.
3. **Build kwin from source** (`extra/kwin` PKGBUILD on Arch ARM,
   patch in `-DDEBUG=ON`, `-DCMAKE_BUILD_TYPE=Debug`) so the call
   stacks resolve to file:line.
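
The `grep -B5 GL_ALPHA` in step 1 also matches calls where `GL_ALPHA` is only the pixel `format`, so a slightly stricter filter may save time. The sketch below assumes `apitrace dump` prints one call per line as `<number> <function>(<name> = <value>, ...)`; the sample lines are written by hand to that assumed shape, not captured from a real trace, and `find_alpha_uploads` is a hypothetical helper.

```python
import re
from collections import deque

# Assumed dump shape: "<call number> <function>(<name> = <value>, ...)".
CALL = re.compile(r"^(\d+)\s+(gl\w+)\((.*)\)")


def find_alpha_uploads(dump_lines, context=5):
    """Return (call_number, preceding_lines) for each glTexImage2D call
    whose internalformat argument is GL_ALPHA."""
    history = deque(maxlen=context)  # rolling window, like grep -B
    hits = []
    for raw in dump_lines:
        line = raw.rstrip("\n")
        match = CALL.match(line)
        if match and match.group(2) == "glTexImage2D":
            args = dict(
                part.split(" = ", 1)
                for part in match.group(3).split(", ")
                if " = " in part
            )
            if args.get("internalformat") == "GL_ALPHA":
                hits.append((int(match.group(1)), list(history)))
        history.append(line)
    return hits


# Hand-written sample in the assumed format.
sample_dump = [
    "101 glGenTextures(n = 1, textures = &7)",
    "102 glBindTexture(target = GL_TEXTURE_2D, texture = 7)",
    "103 glTexImage2D(target = GL_TEXTURE_2D, level = 0, "
    "internalformat = GL_ALPHA, width = 64, height = 64, border = 0, "
    "format = GL_ALPHA, type = GL_UNSIGNED_BYTE, pixels = NULL)",
    "104 glTexImage2D(target = GL_TEXTURE_2D, level = 0, "
    "internalformat = GL_RGBA, width = 64, height = 64, border = 0, "
    "format = GL_RGBA, type = GL_UNSIGNED_BYTE, pixels = NULL)",
]

hits = find_alpha_uploads(sample_dump)
for call_number, before in hits:
    print(f"call {call_number}, preceded by {len(before)} line(s)")
```

The preceding `glBindTexture` call is usually the useful part of the context: it names the texture object, which can then be traced back to where it was generated.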

## Phase 3: Write the patch (½ evening once Phase 2 is done)

If the offender is a `GL_ALPHA` allocation in a GLES3 context, the
fix is mechanical:

```diff
- glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, width, height, 0,
-              GL_ALPHA, GL_UNSIGNED_BYTE, data);
+ glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, width, height, 0,
+              GL_RED, GL_UNSIGNED_BYTE, data);
```

…and adjust the consuming shader's swizzle:

```diff
- gl_FragColor = vec4(texture2D(s, uv).a, …);
+ gl_FragColor = vec4(texture2D(s, uv).r, …);
```

If the offender is a per-plane fallback in the dmabuf import path
(suspect #1 above), the patch is larger because the right fix is to
*not fall through to the broken path*: handle the `external_only`
case by binding `GL_TEXTURE_EXTERNAL_OES` instead. That mirrors the
chromium-fourier patch 3/3 done at the chromium layer; symmetry says
KWin should do the same in its `glTexImage` consumer.
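
If Phase 2 turns up more than one legacy allocation, the mechanical case reduces to a lookup. The table below is a sketch of that mapping using the enum values from the Khronos GL headers; the `modernize` helper and the shader-swizzle strings are our own illustration, not anything KWin ships.

```python
# Enum values as defined in the Khronos GL headers.
GL_RED = 0x1903
GL_ALPHA = 0x1906
GL_LUMINANCE = 0x1909
GL_LUMINANCE_ALPHA = 0x190A
GL_RG = 0x8227
GL_R8 = 0x8229
GL_RG8 = 0x822B

# legacy internalFormat -> (sized internalFormat, pixel format, shader swizzle)
# The swizzle says which channels of the new texture carry the data that the
# shader used to read from the legacy one.
LEGACY_REPLACEMENTS = {
    GL_ALPHA: (GL_R8, GL_RED, ".r"),               # was sampled via .a
    GL_LUMINANCE: (GL_R8, GL_RED, ".rrr"),         # was sampled via .rgb
    GL_LUMINANCE_ALPHA: (GL_RG8, GL_RG, ".rrrg"),  # was sampled via .rgba
}


def modernize(internal_format):
    """Return the replacement triple for a legacy format, else None."""
    return LEGACY_REPLACEMENTS.get(internal_format)


for legacy, (sized, pixel_format, swizzle) in LEGACY_REPLACEMENTS.items():
    print(f"0x{legacy:04X} -> internalFormat 0x{sized:04X}, "
          f"format 0x{pixel_format:04X}, sample with {swizzle}")
```

On OpenGL ES 3.0 and later the texture swizzle parameters (`GL_TEXTURE_SWIZZLE_R` and friends) could do the remapping on the texture object instead of in the shader. Whether that is less invasive than the shader edit depends on how the Phase 2 call site is structured.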

## Phase 4: Ship and upstream (1 evening)

1. **Local Arch package** as `kwin-fourier` under
   `marfrit-packages/arch/kwin-fourier/`, sibling to chromium-fourier
   and firefox-fourier. PKGBUILD inherits from `extra/kwin`, drops
   in our patch, bumps `pkgrel`. Same `provides=(kwin)` /
   `conflicts=(kwin)` pattern.
2. **Validate on ohm** by running the chromium-fourier 149-r2 build +
   the bbb sample for a minute uninterrupted. Success = no GL_ALPHA
   in the journal, no stall, smooth playback at the 34.7 % CPU
   number from the chromium validation.
3. **Upstream** via:
   - File a `kwin` bug on bugs.kde.org with: apitrace fragment, our
     hardware (Mali-G52 panfrost on RK3566 mainline), exact mesa
     version, repro steps via `weston-simple-dmabuf-egl` if Phase 1
     produced one.
   - Push an MR to invent.kde.org/plasma/kwin against `master`.
4. **Document** the fix in `chromium-fourier/docs/dmabuf-zero-copy.md`
   so the next person who lands on the same wall finds the breadcrumb
   trail.

## What success looks like

`chromium-fourier-149-r2` on ohm under KWin Wayland plays
`bbb_1080p30_h264.mp4` end-to-end at the 34.7 % CPU figure already
recorded by the architectural validation, with zero `GL_INVALID_VALUE`
in the journal during playback. That number is the goal of the entire
chromium-fourier campaign for RK3566; it is currently blocked on a
bug that has nothing to do with chromium.

## Scope discipline

We do not turn this into "audit the entire KWin GLES backend." If
Phase 2 surfaces additional latent GL_INVALID_* errors that don't
matter for video playback, we note them in the bug report and move
on. The pivot is explicitly "remove this single wall so the
chromium-fourier patch series can ship a working stack on RK3566."