Merge branch 'main' into noether/add-linux-pinetab2-danctnix-besser

This commit is contained in:
2026-05-18 19:22:03 +00:00
119 changed files with 69215 additions and 39 deletions
@@ -0,0 +1,408 @@
# KWin pivot — fix the chrome-on-KWin video stall
> **2026-04-28 update part 3 — campaign closed end-to-end.** Three
> patches landed on ohm in sequence: qt6-base-fourier (GL_ALPHA →
> GL_R8), kwin-fourier (watchDmaBuf no-op), chromium-fourier patch
> 4/4 (V4L2 capture pool floor at 16). Each unsticks one layer.
> Together they produce smooth 1080p30 H.264 playback under KDE
> Plasma 6.6.4 Wayland on the box where stock chromium previously
> stalled in 3 seconds. Combined chrome CPU ~81 % steady, KWin ~9 %,
> zero GL_INVALID_VALUE in the journal during playback. Brave's
> YouTube on the same session feels markedly snappier independently
> — kwin-fourier is a general-purpose latency reduction for every
> wp_linux_dmabuf client on this hardware, not a chrome-specific
> fix. **The kernel-side architectural hole — vb2 / hantro / rga not
> populating `dma_resv` exclusive fences for V4L2 producers — is the
> right upstream-correct fix and the planned next move.** kwin-fourier
> in its current shape (blanket bypass) is a working *diagnostic
> instrument*; the upstream MR will be the kernel-side per-driver
> patch (3 commits: vb2 helper API, hantro opt-in, rga opt-in) plus
> a parallel KWin commit using `poll(POLLIN)` directly on the dmabuf
> fd instead of the `EXPORT_SYNC_FILE`+`QSocketNotifier` roundtrip.
>
> **2026-04-28 update part 2 — qt6-base-fourier landed, validated, did
> not fix the chrome stall.** The Qt 6 GL_ALPHA bug (qopengltextureglyphcache.cpp,
> qrhigles2.cpp, qopengltextureuploader.cpp) is real, the patch is
> correct, the journal noise is gone — but the chromium-fourier 149-r4
> playback under KWin still deadlocks at ~6 seconds (vs ~3 seconds
> pre-patch — so the GL_ALPHA churn was contributing some overhead,
> just not the primary cause). Vindication came from a clean weston
> A/B: same chrome v4 binary, same panfrost mesa, same V4L2 driver,
> same hardware → swapping KWin for weston turns the stall off. Chrome
> plays through under weston (with elevated CPU because weston falls
> back to LINEAR composite). KWin has a *second* bug, structurally
> deeper than the Qt one. **Phase 4 below — write/ship/upstream the Qt
> fix — was completed today.** This document now pivots again to the
> remaining KWin investigation.
>
> **2026-04-28 update part 1 — Phase 2 collapsed onto Phase 1: not KWin
> for the GL_ALPHA part.** Source-grep nailed the offender on the
> first pass. Real culprit: Qt 6's `QOpenGLTextureGlyphCache`
> (`src/opengl/qopengltextureglyphcache.cpp:111-117`) and
> `QRhiGles2::toGlTextureFormat` (`src/gui/rhi/qrhigles2.cpp:1373-1378`).
> KWin's own GL paths use `GL_R8` correctly (`src/opengl/gltexture.cpp:61`,
> `src/scene/shadowitem.cpp:494`). The Qt-fourier patch series shipped
> as `marfrit-packages/arch/qt6-base-fourier/` and was validated on
> ohm — zero `GL_INVALID_VALUE` in a fresh-session journal.
## Triangulation table after today's work
| Path | Result |
|---|---|
| `ffmpeg -hwaccel v4l2request -f null` | ✓ 36 fps, clean |
| `mpv --vo=null --hwdec=v4l2request` | ✓ decode-only, clean |
| `mpv --vo=drm --hwdec=v4l2request` (KMS scanout, no compositor) | ✓ 0.7% drops in 19 s |
| **chrome v4 under weston** | **✓ plays through; ~96 % CPU** |
| chrome v4 under KWin (post-Qt-fix) | ✗ stall @ ~6 s, ⏸ icon, audio clock advances |
| `mpv --vo=gpu-next --hwdec=v4l2request` under KWin | ✗ ~76 % drops, slideshow |
| **chrome v4 under KWin pre-Qt-fix** | **✗ stall @ ~3 s** (GL_ALPHA spam adds load) |
Decode + display hardware path is fully capable. Wayland *as a
protocol* is fine (weston works). The wall is **specifically KWin's
compositor scheduling and presentation pipeline on this stack** —
panfrost ES 3.2 + V4L2 stateless NV12 dmabuf clients.
The chromium-fourier patch series is correct. The qt6-base-fourier
patch series is correct. The KWin bug is the third independent
problem on this hardware, exposed by the prior two fixes.
## What we know and what we don't
**Known:**
- The stall is *not* in chrome's V4L2VideoDecoder (works under weston).
- The stall is *not* in panfrost's dmabuf import (works under weston).
- The stall is *not* a GL error (no `GL_INVALID_VALUE` after qt6 fix).
- The stall is *not* a thread parked in `vb2`/`v4l2`/`dma_fence` wchan
(kwin/chrome/audio threads all sit in `futex_do_wait` /
`poll_schedule_timeout` / `unix_stream_read_generic`).
- The stall *does* idle the audio output socket (renderer audio
thread blocks reading from the audio service unix socket → audio
drains the last ALSA buffer, then static, then silence).
- The stall *does* leave the `<video>` element's currentTime
advancing (audio clock keeps running) while no new frames present.
**Unknown (where the next investigation should bisect):**
- Is KWin failing to deliver `wl_buffer.release` events promptly?
Symptom would be: chrome holds dmabuf references → V4L2's 6-buffer
CAPTURE pool exhausts → decoder blocks. Test by tracing
`wl_buffer.release` events on the chrome wl_display via a wayland
proxy / debugger.
- Is KWin failing to deliver `wp_presentation_feedback`? Chrome's
viz compositor uses presentation feedback to pace; if it never
fires, the renderer's frame submission backs up.
- Is KWin's render loop blocking on a vsync wait that never
releases? The 6-second figure roughly matches multiples of
panfrost's idle-power-state wakeup latency, but that's a
speculative correlation.
- Is KWin doing something specific with `wp_subsurface` that chrome's
multi-process IPC tickles in a way mpv's single-process flow does
not? Chrome attaches video to a subsurface inside its main
surface; mpv attaches video to its main surface directly.
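The first hypothesis above (releases withheld until the CAPTURE pool runs dry) can be made concrete with a toy model. This is a minimal sketch, assuming a fixed-size pool, one decoded frame per tick, and a compositor that returns each buffer a fixed number of ticks after it was attached; `release_delay=None` models a compositor that never sends `wl_buffer.release`. The function and parameter names are hypothetical, and the model ignores everything except buffer accounting.

```python
def frames_before_first_block(pool_size, release_delay, max_frames=300):
    """Return how many frames are decoded before the decoder first finds
    no free CAPTURE buffer.

    pool_size      -- CAPTURE buffers available to the decoder
    release_delay  -- ticks between attach and release, or None if the
                      compositor never releases (hypothetical parameter)
    Returns max_frames if the decoder never blocks within the window.
    """
    free = pool_size
    in_flight = []  # tick at which each attached buffer comes back
    for tick in range(max_frames):
        # Hand back every buffer whose release is due by this tick.
        free += sum(1 for due in in_flight if due <= tick)
        in_flight = [due for due in in_flight if due > tick]
        if free == 0:
            return tick  # nothing to decode into: first blocked tick
        free -= 1
        if release_delay is not None:
            in_flight.append(tick + release_delay)
    return max_frames
```

At 30 fps a 6-buffer pool covers only 0.2 s of playback, so in this model a stall at ~3 s or ~6 s would mean releases stop partway through playback, not from the first frame. That is an inference from the toy model, not a measurement.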
## Phase 5 — Find the KWin culprit
Order of cheapness:
1. **Strace KWin during a chrome stall.** What is KWin actually
doing at the moment of stall? (`strace -p <kwin_pid> -f -e trace=poll,read,write,recvmsg,sendmsg -o /tmp/kwin.strace` for ~10 s during the chrome run, then chase a flatlined fd.)
2. **WAYLAND_DEBUG=client + WAYLAND_DEBUG=server.** Run chrome with
`WAYLAND_DEBUG=client`, capture all wayland traffic, look for
the last `wl_surface.attach` chrome sent before the stall and
whether KWin ever emitted the corresponding `wl_buffer.release`.
This single experiment probably localizes the bug.
3. **Disable KWin effects, scene type, blur, contrast, animation
speed.** `kcmshell6 kwincompositing` toggles. If turning off
*all* effects unsticks chrome, the bug is in an effect plugin.
4. **Bisect KWin git.** v6.6.4 vs v6.5.x or earlier — does the
stall reproduce on v6.5? If not, bisect the offending commit.
(Heavy: KWin builds for ~1 h on boltzmann.)
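For step 1, "chase a flatlined fd" can be partly automated. This is a minimal sketch, assuming the line shape strace writes with `-f -o` (a PID column followed by the syscall) and using the line index as a stand-in for time, since the command above records no timestamps. `last_activity` and `flatlined` are hypothetical helpers, and only syscalls whose first argument is the fd (`read`, `write`, `recvmsg`, `sendmsg`) are counted; `poll` is ignored because its fds sit inside a struct array.

```python
import re

# Matches e.g. "1234 recvmsg(37, {...}, 0) = 24". The leading PID is
# optional so the pattern also works on output captured without -f.
_CALL = re.compile(r"^(?:\d+\s+)?(read|write|recvmsg|sendmsg)\((\d+),")

def last_activity(strace_lines):
    """Map each fd to the index of the last line that touched it."""
    last = {}
    for index, line in enumerate(strace_lines):
        match = _CALL.match(line)
        if match:
            last[int(match.group(2))] = index
    return last

def flatlined(strace_lines, quiet_fraction=0.5):
    """Return fds with no activity in the final quiet_fraction of the trace."""
    lines = list(strace_lines)
    cutoff = len(lines) * (1.0 - quiet_fraction)
    return sorted(fd for fd, idx in last_activity(lines).items() if idx < cutoff)
```

An fd that was busy early and silent for the second half of a 10 s capture is a candidate for the stalled socket; mapping it back to a peer still needs `/proc/<kwin_pid>/fd`.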
Step (2) is the headliner. WAYLAND_DEBUG yields a complete
client-server transcript; the missing event after the stall is
usually the bug, and the exchange around it is the call site.
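The transcript check in step 2 can be sketched as a small script. It assumes the usual libwayland debug line shape, `wl_surface@N.attach(wl_buffer@M, x, y)` for attaches and `wl_buffer@M.release()` for releases, and accepts either `@` or `#` before the object id because both separators appear in transcripts from different libwayland versions. `unreleased_buffers` is a hypothetical name; it reports buffers that were attached and not released after their last attach.

```python
import re

_ATTACH = re.compile(r"wl_surface[@#]\d+\.attach\(wl_buffer[@#](\d+)")
_RELEASE = re.compile(r"wl_buffer[@#](\d+)\.release\(\)")

def unreleased_buffers(transcript_lines):
    """Return {buffer_id: line_number_of_last_attach} for buffers that
    have an attach with no later release in the transcript."""
    pending = {}
    for lineno, line in enumerate(transcript_lines, start=1):
        attach = _ATTACH.search(line)
        if attach:
            pending[int(attach.group(1))] = lineno
            continue
        release = _RELEASE.search(line)
        if release:
            # A release clears the outstanding attach, if any.
            pending.pop(int(release.group(1)), None)
    return pending
```

The buffer held by the compositor for the frame currently on screen is expected to show up as pending; the signal is several pending buffers whose attach lines all sit just before the stall.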
## Phase 6 — Fix and ship
Once we know the call site:
- Write the patch against `kwin/master`, smallest possible diff.
- Local Arch package as `kwin-fourier` under
`marfrit-packages/arch/kwin-fourier/`, pattern matching
`qt6-base-fourier`. Bump epoch to dominate Arch's version.
- Validate on ohm: chrome v4 + bbb sample plays through to EOF at
the 34.7 % CPU number (KWin's fast-tile path is more efficient
than weston's LINEAR fallback, so once unstuck, KWin should beat
weston on CPU).
- File on bugs.kde.org with the WAYLAND_DEBUG transcript, the
weston-vs-KWin A/B table from this document, and the patch.
- Push MR to invent.kde.org/plasma/kwin against `master`.
## Reflection — the spec-shaped void
The user's original Phase-1 hypothesis was "leaked corporate
short-cuts." Today vindicates it twice over:
- **Qt 6**: codified "OpenGL ES" as one thing in 2012, never
re-read the ES 3 spec when GL_ALPHA was deprecated. We patched
it. Three sites, ~9 lines of net change.
- **KWin**: still TBD which spec-or-spec-shaped-omission it tripped
on. The data so far points at a compositor scheduling issue —
most likely a missing or late `wl_buffer.release` /
`wp_presentation_feedback` for the specific case of multiple
IPC-fragmented client surfaces driving a fast video stream.
That's exactly the kind of scenario that gets "we never tested
this combination" treatment in QA.
## What we know
KWin 6.6.4-1 on Arch Linux ARM (Plasma 6.6.4-1, mesa 26.0.5-1, libdrm
2.4.131-1) on ohm (PineTab2 / RK3566 / panfrost) silently corrupts its
GL command queue mid-frame whenever a wayland client posts a video
buffer. The journal carries a rolling stream of:
```
kwin_wayland: 0x4: GL_INVALID_VALUE in glTexImage2D(internalFormat=GL_ALPHA)
kwin_wayland: 0x4: GL_INVALID_OPERATION in glTexSubImage2D(invalid texture level 0) × N
```
`GL_ALPHA` is not a valid `internalFormat` for `glTexImage2D` under
**OpenGL ES 3.x** (it was the GLES 1.x/2.x single-channel alpha format;
GLES3 deprecates it for sized formats — `GL_R8`, `GL_LUMINANCE8_ALPHA8`,
etc.). Once the texture allocation fails, the `glTexSubImage2D` calls
that should populate it all error at level 0. KWin keeps retrying the
same broken upload every frame, never recovers, and the present-callback
path that depends on that texture stops acking client frames. Every
wayland video client deadlocks on the missing ack.
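To turn "a rolling stream" into a number that can be compared before and after a fix, the journal lines can be tallied per error and GL entry point. This is a minimal sketch, assuming the message shape shown in the excerpt above (`GL_INVALID_… in gl…(`); `count_gl_errors` is a hypothetical helper and takes any iterable of journal lines.

```python
import re
from collections import Counter

# Captures the error enum and the GL function name from lines such as
# "kwin_wayland: 0x4: GL_INVALID_VALUE in glTexImage2D(internalFormat=GL_ALPHA)".
_GL_ERROR = re.compile(r"(GL_INVALID_[A-Z_]+) in (gl[A-Za-z0-9]+)")

def count_gl_errors(journal_lines):
    """Return a Counter keyed by (error, gl_function)."""
    counts = Counter()
    for line in journal_lines:
        match = _GL_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

An empty counter over a fresh-session journal is the "zero `GL_INVALID_VALUE`" condition used as the pass criterion elsewhere in this document.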
First occurrence in this box's journal: **2026-03-06** — the bug
predates any chromium-fourier work by roughly seven weeks.
## Triangulation already in hand
| Client | Outcome |
|---|---|
| chromium-fourier 149-r2 (with patch 3/3) | plays ~3 s @ 34.7 % CPU then renderer/GPU park in `futex_do_wait` |
| chromium-fourier 149-r2 (without patch 3/3) | plays ~10 s (slower path delays surfacing) then identical deadlock |
| VLC | `cannot convert decoder/filter output to any format supported by the output` → `could not initialize video chain` |
| mpv `--vo=null --hwdec=v4l2request` | `Could not create device.` (mpv-side bug, separate, unrelated) |
| ffmpeg `-hwaccel v4l2request -i bbb -f null -` | plays through clean at 36 fps; hardware path is healthy |
Decode path is healthy on this hardware. The wall is exclusively the
compositor's GL backend.
## Constraint: ohm is the only test box on hand
ampere (RK3588 / panthor) is in the boxes-from-Shenzhen pile, currently
DOWN. fresnel (RK3399 / Pinebook Pro) is offline. boltzmann (Rock 5
ITX+ build host) doesn't run KWin. We do every step on ohm; we accept
the wifi flakiness and the occasional reboot.
## Phase 1 — Reproduce outside chrome and bound the trigger (1 evening)
Goal: a deterministic, headless-or-near-headless reproduction that
doesn't require launching an 800-MB browser.
1. **Smallest-possible client.** Build a 50-line C wayland client that
creates a `wp_linux_dmabuf_v1` buffer, pumps frames at 30 fps, and
exits when KWin first errors. Use `weston-simple-dmabuf-egl` from
the `weston` package as a starting template — already does exactly
this but without our specific format/modifier matrix.
2. **Vary the format/modifier matrix.** Run the smallest-possible
client with each of: NV12 + LINEAR, NV12 + AFBC, NV12 + AFRC,
AR24 + LINEAR, XR24 + LINEAR. We already know NV12 paths trigger;
confirming AR24/XR24 do *not* trigger localizes the bug to KWin's
YUV import path (vs a generic dmabuf import bug).
3. **Vary the buffer dimensions.** Some KWin texture-cache paths
allocate fixed-size internal scratch textures; non-power-of-two,
non-multiple-of-16, or specifically odd-aspect cases sometimes
trigger paths that healthy aspect ratios skip. Test 1920×1080,
1280×720, 854×480, 640×360 and a deliberately weird 1366×768.
4. **Vary KWin scene type.** Switch
`kwin_wayland --scene-type=opengl` vs `--scene-type=opengl-es`
(current default on this hardware). If the bug only fires under
GLES, that's a strong signal — the offending site is in a
GLES-only fallback.
By the end of Phase 1 we should have a one-line `weston-simple-dmabuf-egl
-format=NV12 -modifier=…` that triggers the GL_ALPHA error within
seconds, plus a yes/no answer to "does AR24 also trigger".
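The format/modifier and size sweeps in steps 2 and 3 multiply out to 25 cases, which is easier to keep straight as generated data than as a hand-written list. This is a minimal sketch that only enumerates the cases; it deliberately does not guess at client command-line flags. `fourcc` packs the four-character code the way `drm_fourcc.h` does, `build_matrix` is a hypothetical name, and the modifier is kept as a label because the concrete AFBC/AFRC modifier values depend on the driver.

```python
from itertools import product

def fourcc(code):
    """Pack a four-character code little-endian, as drm_fourcc.h does."""
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)

# Format/modifier pairs from step 2 and buffer sizes from step 3.
FORMAT_MODIFIER_PAIRS = [
    ("NV12", "LINEAR"), ("NV12", "AFBC"), ("NV12", "AFRC"),
    ("AR24", "LINEAR"), ("XR24", "LINEAR"),
]
SIZES = [(1920, 1080), (1280, 720), (854, 480), (640, 360), (1366, 768)]

def build_matrix():
    """Yield one dict per (format, modifier, size) case to run."""
    for (fmt, modifier), (width, height) in product(FORMAT_MODIFIER_PAIRS, SIZES):
        yield {
            "format": fmt,
            "fourcc": fourcc(fmt),
            "modifier": modifier,  # label only, not a DRM modifier value
            "width": width,
            "height": height,
        }
```

Recording a pass/fail per yielded case gives the "does AR24 also trigger" answer directly as a filter over the results.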
## Phase 2 — Identify the call site (1–2 evenings)
The crime scene is somewhere in `kwin/src/scene/*` or
`kwin/src/effects/*`. Suspects, ranked:
- **`SurfaceItemWayland::createPixmapTexture` → `GLTexture::create`
with `GL_ALPHA`.** This is the most likely path: KWin allocates a
fallback per-plane texture when the dmabuf import path can't take
the buffer whole. NV12 has a Y plane (single-channel) and a CbCr
plane (two-channel); historically the Y plane has been allocated as
`GL_ALPHA` in software fallbacks. If the EGL dmabuf import returned
`EGL_BAD_ATTRIBUTE` for `external_only` modifiers and KWin fell
through to per-plane, this is exactly where it would land.
- **`BlurEffect::initBlurTexture` / `BackgroundContrastEffect::*`.**
Single-channel noise textures for blur dither. Less likely (these
fire on every frame regardless of video clients) but listed for
completeness.
- **Window-decoration text glyph cache.** Qt's QGLTexture historically
requested `GL_ALPHA` for monochrome glyph atlases. Plasma 6 should
have moved to `GL_RED` long ago, but a stale code path in a
third-party theme or systray icon could still hit it.
- **Cursor texture upload via `wl_shm_pool` + ARGB8888.** KWin's
cursor scene sometimes uploads via glTexImage2D — but the format
there is `GL_RGBA`, not `GL_ALPHA`. Probably not the suspect.
Tooling to identify *which*:
1. **`apitrace trace --api egl kwin_wayland …`** then
`apitrace dump trace.trace | grep -B5 GL_ALPHA`. Apitrace gives
us the C++ call stack at the offending site if KWin was built with
debug symbols.
2. **`MESA_GL_DEBUG=context KWIN_GL_DEBUG=1 kwin_wayland --replace`**
plus `glDebugMessageCallback` already installed in KWin's
`OpenGLBackend` will print the source/type/severity for each
`GL_INVALID_VALUE`. Whether the file/line in the message includes
the user-space caller depends on Mesa's debug-extension support;
on panfrost it usually does include the GL function name and an
ID, but not the C++ source — that is what apitrace adds.
3. **Build kwin from source** (`extra/kwin` PKGBUILD on Arch ARM,
patch in `-DDEBUG=ON`, `-DCMAKE_BUILD_TYPE=Debug`) so the call
stacks resolve to file:line.
## Phase 3 — Write the patch (½ evening once Phase 2 is done)
The Qt 6 fix is three ~3-line changes, runtime-safe, no new dependency.
**Fix #1 — `src/opengl/qopengltextureglyphcache.cpp` lines 111-117:**
```diff
#if !QT_CONFIG(opengles2)
const GLint internalFormat = isCoreProfile() ? GL_R8 : GL_ALPHA;
const GLenum format = isCoreProfile() ? GL_RED : GL_ALPHA;
#else
- const GLint internalFormat = GL_ALPHA;
- const GLenum format = GL_ALPHA;
+ // OpenGL ES 3.x deprecated GL_ALPHA as a glTexImage2D
+ // internalFormat; only true ES 2 contexts retain it. Use GL_R8
+ // + the matching swizzle (handled in the fragment shader's .r
+ // sample below) on ES 3+ hardware so Mali / panfrost / panthor
+ // GLES3 contexts stop emitting GL_INVALID_VALUE every frame.
+ const bool useR8 = ctx->format().majorVersion() >= 3;
+ const GLint internalFormat = useR8 ? GL_R8 : GL_ALPHA;
+ const GLenum format = useR8 ? GL_RED : GL_ALPHA;
#endif
```
The downstream fragment shader path that samples this texture must
read `.r` instead of `.a` when `GL_R8` is used. Qt's text-rendering
fragment program already has both code paths conditioned on context
core-profile; the ES 3+ branch needs the same treatment. Lines
214-216 of the same file (the resize / re-upload path) need the
identical change.
**Fix #2 — `src/gui/rhi/qrhigles2.cpp` lines 1373-1378:**
```diff
case QRhiTexture::RED_OR_ALPHA8:
- *glintformat = caps.coreProfile ? GL_R8 : GL_ALPHA;
+ *glintformat = (caps.coreProfile || (caps.gles && caps.ctxMajor >= 3))
+ ? GL_R8 : GL_ALPHA;
*glsizedintformat = *glintformat;
- *glformat = caps.coreProfile ? GL_RED : GL_ALPHA;
+ *glformat = (caps.coreProfile || (caps.gles && caps.ctxMajor >= 3))
+ ? GL_RED : GL_ALPHA;
*gltype = GL_UNSIGNED_BYTE;
break;
```
`caps.gles` and `caps.ctxMajor` are populated at context creation
(qrhigles2.cpp:804 + :855); the disjunct is free.
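The predicate in Fix #2 is small enough to check as a truth table. This is a minimal sketch that mirrors the patched condition outside of Qt; `alpha8_formats` is a hypothetical stand-in for the `RED_OR_ALPHA8` case, and the three constants are the standard GL enum values.

```python
GL_RED = 0x1903
GL_ALPHA = 0x1906
GL_R8 = 0x8229

def alpha8_formats(core_profile, gles, ctx_major):
    """Return (internal_format, format) for a single-channel 8-bit texture,
    following the patched predicate: core profile, or GLES with major >= 3."""
    use_r8 = core_profile or (gles and ctx_major >= 3)
    return (GL_R8, GL_RED) if use_r8 else (GL_ALPHA, GL_ALPHA)
```

The only row that changes relative to the unpatched code is GLES with a major version of 3 or higher, which is the panfrost case on ohm; GLES 2 and desktop compatibility contexts keep `GL_ALPHA`.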
**Fix #3 — `src/opengl/qopengltextureuploader.cpp` lines 253-257:**
This is the QImage→GL upload path (used by `QOpenGLPaintEngineEx`
and its descendants). Same pattern, same fix shape: extend the
"core profile or GLES2 fallback" branching to also consider GLES3+
as needing `GL_R8`.
If we want to be aggressive, we can collapse all three sites onto a
single `qt_gl_use_r8_for_alpha8(ctx)` helper in `qopenglhelper_p.h`
so future Qt versions don't drift apart again — but a minimal patch
should keep the three sites independent so each is reviewable in
isolation by the relevant Qt module owner.
## Phase 4 — Ship and upstream (1 evening)
1. **Local Arch package** as `qt6-base-fourier` under
`marfrit-packages/arch/qt6-base-fourier/`, sibling to chromium-fourier
and firefox-fourier. PKGBUILD inherits from `extra/qt6-base`, drops
in the three patches above, bumps `pkgrel`. Same
`provides=qt6-base conflicts=qt6-base` pattern. Rebuild is heavy
(qtbase compile is ~30 minutes on boltzmann; ohm rebuild is
sustained-fan-territory and probably better avoided — boltzmann
builds the aarch64 .pkg.tar.zst, then we rsync it to ohm and
`pacman -U` there).
2. **Validate on ohm** by:
- `pacman -U` the patched qt6-base.
- Restart Plasma session (logout / login) so the new qt6-base.so
is mapped into the fresh kwin_wayland.
- Re-run `journalctl -u plasma-kwin_wayland.service -f` while
opening any Qt 6 application that triggers text caching (a
terminal, kate, the system tray) — the GL_INVALID_VALUE spam
should be **gone**.
- Then run chromium-fourier 149-r2 + the bbb sample for a full
minute uninterrupted. Success = smooth playback through to EOF
at the 34.7 % CPU number, no stall, no audio static, no
KWin-side errors in the journal.
3. **Upstream** via:
- File on `bugreports.qt.io` against `QtBase: OpenGL`, with: the
three diff hunks above, the exact behavior on Mali-G52 panfrost
RK3566 mainline 6.19, an excerpt of the journal noise, and
mesa 26.0.5 / qt 6.11.0 / kwin 6.6.4 versions.
- Push a Gerrit change against `qtbase` `dev` branch
(`codereview.qt-project.org`). Qt won't accept a GitHub MR —
they live on Gerrit. Create a Qt account, configure
`git-review`, push.
- Reference the chromium-fourier project as the discovery site
so the next Mali-on-Linux Qt 6 user finds the breadcrumb.
4. **Document** the fix in
`chromium-fourier/docs/dmabuf-zero-copy.md` "Caveat — KWin 6.6.4
GLES backend on this hardware" subsection: replace the "to be
investigated" wording with "fixed by qt6-base-fourier; see
`marfrit-packages/arch/qt6-base-fourier/`. Upstream Qt change
pending review at `<gerrit-link>`."
## Reflection — corporate IT spec leakage, as predicted
The user's Phase-1 hypothesis was that this was the result of code
written by people who never read the spec they were claiming to
implement. They were correct, with one nuance: the Qt code did read
the spec — *the OpenGL ES 2.x spec*, where `GL_ALPHA` is genuinely
the canonical single-channel format for `glTexImage2D`. What it
never went back and re-read is the OpenGL ES 3.0 spec
(section 3.8.3, "Texture Image Specification"), where `GL_ALPHA`
is moved to the deprecated list and only sized formats are
retained. The bug is: *Qt 6 was written assuming "OpenGL ES" is
one thing, and never updated the assumption when ES 3 dropped the
unsized formats.* That's a corporate-IT-style architectural
shortcut: codify the world in two boxes (desktop vs ES), call it
done, ship. The fact that a category had a sub-category which moved
in 2012 is not the framework's job to track. Until the bug report
arrives and someone has to extend the boolean to a triple.
## What success looks like
`chromium-fourier-149-r2` on ohm under KWin Wayland plays
`bbb_1080p30_h264.mp4` end-to-end at the 34.7 % CPU figure already
recorded by the architectural validation, with zero `GL_INVALID_VALUE`
in the journal during playback. That number is the goal of the entire
chromium-fourier campaign for RK3566 — it is currently blocked on a
bug that has nothing to do with chromium.
## Scope discipline
We do not turn this into "audit the entire KWin GLES backend." If
Phase 2 surfaces additional latent GL_INVALID_* errors that don't
matter for video playback, we note them in the bug report and move
on. The pivot is explicitly "remove this single wall so the
chromium-fourier patch series can ship a working stack on RK3566."
@@ -0,0 +1,428 @@
# chromium-fourier — first-build status (2026-04-26 00:42 UTC)
## Where we are
Build environment: **chromium-builder LXD container on boltzmann**
(8 cores, 28 GB RAM cap, 824 GB NVMe, Beryllium OS rkr3 host kernel).
Source: chromium-147.0.7727.116 release tarball extracted at
`/build/chromium/src` (25 GB extracted).
`gn gen out/Default` **succeeds** with our 7Ji-style args
(`use_v4l2_codec=true use_v4lplugin=true use_linux_v4l2_only=true
use_vaapi=false`, system toolchain via `unbundle:default`, system clang
at `/usr/bin/clang`, version-symlink `/usr/lib/clang/23` →
`/usr/lib/clang/22`, compiler-rt-adjust-paths style suffix patch
manually applied). 28057 targets generated.
`ninja -C out/Default chrome` **fails immediately** with two distinct walls:
### Wall 1 — clang version mismatch (chromium 147 ↔ Arch clang 22)
Chromium 147's compile flags include
`-fno-lifetime-dse` and
`-fsanitize-ignore-for-ubsan-feature=array-bounds`. Both are clang 23+
features. Arch Linux ARM ships **clang 22.1.3** (extra repo). Every
single C++ compile fails with `clang++: error: unknown argument`.
Resolutions, in order of effort:
- **(a)** Wait for Arch ARM to bump clang to 23. Tracking package
upstream — happens whenever LLVM 23 lands in extra. Days to weeks.
- **(b)** Use chromium's bundled clang via
`tools/clang/scripts/update.py`. That hits the same CIPD/gs://
"linux-arm64 isn't a first-class target" issue we saw with
`gclient sync` earlier — chromium's clang prebuilts are x86_64-only
for many platforms.
- **(c)** Fork an older chromium (e.g., 132 or 138) that compiles
cleanly with clang 22. 7Ji's chromium-mpp PKGBUILD targets 132 and
builds clean on Arch ARM today. Loses 15 versions of upstream
chromium evolution but ships fast.
- **(d)** Patch chromium 147 to drop the offending flags
(`build/config/compiler/BUILD.gn` has the cflags lists). 50–200 line
patch, brittle across version bumps but tractable. Fights every
rebase.
### Wall 2 — bundled x86_64 esbuild under qemu
After Wall 1 (or independently for Action targets):
`qemu-x86_64-static: Could not open '/lib64/ld-linux-x86-64.so.2'`
when chromium runs the bundled x86_64 `esbuild` from
`third_party/devtools-frontend/.../scripts/build/typescript/ts_library.py`.
Same shape as the bundled `node-linux-x64` issue we already fixed (we
symlinked system node into that path). `esbuild` needs the same
treatment — install system esbuild via `npm install -g esbuild` and
symlink it into the path chromium expects. Or install `qemu-user-static`
+ `glibc-x86_64` to make the bundled binary actually run.
**Wall 2 is much smaller than Wall 1** — a handful of bundled-x86_64
binaries to identify and replace, vs. fundamental clang version mismatch.
## What worked
- LXD container provisioning on boltzmann via his recommendation —
the host environment is right.
- Tarball-instead-of-gclient approach — sidesteps CIPD-doesn't-have-
linux-arm64 problem for source acquisition, leaves only a few
bundled binary issues at build time.
- Wall 1 / Wall 2 are both **identifiable and bounded**. We're past
the "is this even doable" phase; this is now down to grinding the
patches.
## Options — needs your call
1. **Grind through Wall 1 with patches** — patch
`build/config/compiler/BUILD.gn` to drop flags clang 22 doesn't
know. Iterate per build error. Estimated 5–15 patch-and-retry
cycles to compile clean. Then 6–10 h actual build.
2. **Pin to chromium 132** — match 7Ji's known-working version on
Arch ARM. Drop our STUDY focus on "current upstream Chromium" and
ship a 1-year-old binary. Build should work much sooner.
3. **Pin to chromium 138 or 140** — middle ground. Likely uses clang
22 features and not 23. Some research needed to find the cutover.
4. **Use chromium's bundled clang** — not viable on linux-arm64
without extensive sysroot setup; same CIPD issue as gclient sync.
5. **Wait for Arch ARM clang 23** — passive, days-to-weeks horizon.
Recommended (FWIW): **start with (3)** — find the latest chromium
version that builds clean against clang 22 (probably 138-141 range),
ship that as `chromium-fourier`, then bump as Arch ARM bumps clang.
That gives us a working browser in a few hours rather than days, on
mainline Linux + Wayland + V4L2 unlock — which is the actual goal.
The "current upstream Chromium" requirement was nice-to-have, not
essential.
## State of the build host (preserved)
- Container: `chromium-builder` on boltzmann (running, idle)
- Source: `/build/chromium/src` (extracted tarball, 25 GB)
- Build dir: `/build/chromium/src/out/Default` (gn-gen'd, no artifacts)
- Tools installed: gn, ninja, clang 22, lld, gperf, nodejs (system),
rust, qt5/6, all the gtk/wayland/va/v4l deps from the long pacman
shopping list
- Patches applied to source: `compiler-rt-adjust-paths` style (manual)
- Symlinks: `/usr/lib/clang/23` → `/usr/lib/clang/22`,
`third_party/node/linux/node-linux-x64/bin/node` → `/usr/bin/node`
- Service unit history: `chromium-fetch.service` (one-shot, succeeded
on tarball + extract); `chromium-build.service` (one-shot, three
failed attempts above).
Discard the container and start over with option 2 if you pick that
direction; otherwise iterate from current state.
## Pivot 2026-04-26 — cross-compile from x86_64
After the analysis above, the framing shifted. The real wall isn't
"Arch ARM clang 22 vs LLVM 23" — Arch x86_64 is also on llvm 22.1.3, no
LLVM 23 anywhere in extra/staging. The flags chromium 147 emits
(`-fno-lifetime-dse`, `-fsanitize-ignore-for-ubsan-feature=array-bounds`,
the `/usr/lib/clang/23/...` path) come from **chromium's clang fork**,
not upstream LLVM 23. Chromium ships its own LLVM with chromium-specific
passes; the "23" in the path is chromium-internal versioning.
Implication: PKGBUILD'ing clang 23 is the wrong tree. The right tree is
either pin to an older chromium (option 2 above) or **cross-compile from
an x86_64 host** so chromium's x86_64 bundled clang prebuilt is
reachable and `target_cpu="arm64"` produces the aarch64 binary cleanly.
Cross-compile sidesteps every wall we hit:
- CIPD has full `linux-amd64` prebuilts (the gap was `linux-arm64`)
- Chromium's bundled clang downloads cleanly on x86_64
- No qemu-x86_64-static dance for tools (host IS x86_64)
- `tools/clang/scripts/update.py` works as Google intends
- `gclient sync` works; no DEPS surgery needed
his provisioned a cross-build host for this on 2026-04-26:
- **CT 220 `chromium-builder-x86` on data**, x86_64 Ryzen 7 1700,
14 cores, 32 GiB RAM + 8 GiB swap, 200 GiB ZFS rootfs.
- Reach via `mcp__hub-tools__remote_shell host=hertz` →
`ssh root@192.168.88.30` (data) → `pct exec 220 -- ...`
- `builder` user uid 1001, NOPASSWD sudo, `DisableSandbox` in pacman.conf.
Source fetch started 2026-04-26 05:45 UTC as transient unit
`chromium-fetch.service` on CT 220. Estimated 1-2 h for `fetch
--no-history chromium` over the LAN. Then `tools/clang/scripts/update.py`
to install chromium's bundled clang (x86_64 host, arm64 sysroot), then
`gn gen` with `target_cpu="arm64"` + cross-compile flags, then build.
The boltzmann `chromium-builder` LXD container is preserved as fallback
but no longer the active build host. If cross-compile pans out, that
container can be torn down.
## First runtime validation on ohm — 2026-04-26 22:26 UTC
Cross-compile produced a working aarch64 binary (chrome 647 MB ELF +
chrome_crashpad_handler 4.3 MB + .pak + locales). Tarball
`chromium-fourier-147.tar.gz` (226 MB) transferred CT 220 → hertz → ohm.
Launched in mfritsche's KWin Wayland session (tty2, panfrost render
node) playing `bbb_1080p30_h264.mp4` from file:// with
`LIBVA_DRIVER_NAME=v4l2_request`,
`LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0`,
`--use-gl=egl --ozone-platform=wayland
--enable-features=VaapiVideoDecodeLinuxGL,AcceleratedVideoDecodeLinuxGL
--disable-features=UseChromeOSDirectVideoDecoder
--autoplay-policy=no-user-gesture-required`.
**Result: V4L2 path NOT engaged.** Chrome 147 routes the H.264 stream
through `MojoVideoDecoderService` → `media/filters/ffmpeg_video_decoder.cc`
(software FFmpeg). Renderer pegs at ~92 % CPU, `/dev/video0` is never
opened (`fuser` returns empty), no `V4L2VideoDecoder` /
`VaapiVideoDecoder` log lines appear at `--v=1
--vmodule="*/vaapi/*=2,*/v4l2/*=2,*video_decoder*=2,*media/gpu/*=2"`.
Compositor also fell back to software (`Switching to software
compositing.` even though panfrost render node was picked) — secondary
issue, separate from the codec wall.
**Conclusion**: 7Ji-style gn args (`use_v4l2_codec=true
use_v4lplugin=true use_linux_v4l2_only=true`) alone are insufficient
on chromium 147. The V4L2VideoDecoder factory is still gated behind
`BUILDFLAG(IS_CHROMEOS)`: `media/mojo/services/gpu_mojo_media_client_*.cc`
and `media/gpu/gpu_video_decode_accelerator_factory.cc` only register
the V4L2 path on ChromeOS targets.
## Validation pass 2 — 2026-04-26 22:38 UTC — V4L2VDA proven engaged
Two distinct issues were diagnosed and the codec one was fully resolved
without source surgery beyond a 2-line patch:
### Issue 1 — runtime master gate
`media::kAcceleratedVideoDecodeLinux` (user-visible feature name
"AcceleratedVideoDecoder") is hard-coded in
`media/base/media_switches.cc:750` to `FEATURE_ENABLED_BY_DEFAULT` only
when `BUILDFLAG(USE_VAAPI)` is set. On a USE_V4L2_CODEC-only build it
defaults DISABLED, the linux gpu_mojo_media_client returns
`VideoDecoderType::kUnknown`, and chrome silently falls back to
`media/filters/ffmpeg_video_decoder.cc`.
**Fix**: 2-line patch (now `patches/enable-v4l2-decoder-default.patch`):
```
-#if BUILDFLAG(USE_VAAPI)
+#if BUILDFLAG(USE_VAAPI) || BUILDFLAG(USE_V4L2_CODEC)
```
The placeholder `chromeos-pipeline-bypass.patch` was deleted; PKGBUILD
now references the real patch. **Verified to apply cleanly on the CT 220
tree** (chromium 149 main).
### Issue 2 — bundled GL libs missing from tarball
The first runtime tarball shipped only `chrome` + `.pak` + locales +
`chrome_crashpad_handler`. It omitted `libEGL.so` / `libGLESv2.so`
(ANGLE) plus `libvk_swiftshader.so` and `libvulkan.so.1`. Without these,
the GPU process logs `gl::init::InitializeStaticGLBindingsOneOff failed`
and chrome falls into "Switching to software compositing." mode — which
*also* gates the V4L2 path off because the gpu_mojo_media_client never
gets a chance to dispatch.
Additionally, `--use-gl=egl` is rejected ("Requested GL implementation
gl=egl-gles2,angle=none not found in allowed implementations:
[(gl=egl-angle,angle=opengl|opengles|vulkan)]"): the build only allows
ANGLE-mediated paths. Right launcher invocation:
`--use-gl=angle --use-angle=gles`.
**Fix**: package the four libs alongside `chrome` and update the
launcher flag set. Both will be encoded in the next iteration of the
PKGBUILD's `package()` and a `chromium-fourier` launcher script.
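Until the launcher script exists, the two halves of this fix (libs present, right flags) can be checked together. This is a minimal sketch, assuming the four libraries sit in the same directory as `chrome`; `missing_libs` and `launcher_argv` are hypothetical helpers, and the flag set is only the one recorded in this document, not a complete launcher.

```python
import os

# The four libraries named above as missing from the first tarball.
REQUIRED_LIBS = ("libEGL.so", "libGLESv2.so", "libvk_swiftshader.so", "libvulkan.so.1")

def missing_libs(install_dir):
    """Return the required libraries that are absent from install_dir."""
    return [name for name in REQUIRED_LIBS
            if not os.path.exists(os.path.join(install_dir, name))]

def launcher_argv(install_dir, extra_args=()):
    """Build the chrome command line, refusing to launch without the GL libs."""
    missing = missing_libs(install_dir)
    if missing:
        raise FileNotFoundError("missing bundled GL libs: " + ", ".join(missing))
    return [
        os.path.join(install_dir, "chrome"),
        "--ozone-platform=wayland",
        "--use-gl=angle",
        "--use-angle=gles",
        "--enable-features=AcceleratedVideoDecoder",
        *extra_args,
    ]
```

Failing early on a missing library is the point of the check: without it chrome starts, drops to software compositing, and the V4L2 path is silently gated off, as described above.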
### What we observed once both fixes were in place
With patch + bundled libs + `--enable-features=AcceleratedVideoDecoder`
+ `--use-gl=angle --use-angle=gles`, chrome on RK3566 hantro logs:
```
[gpu]: V4L2VideoDecoder()
[gpu]: Open(): No devices supporting H264 for type: 0 <- type=0 is single-planar; chrome retries multi-planar
[gpu]: InitializeBackend(): Using a stateless API for profile: h264 main and fourcc: S264
[gpu]: SetupInputFormat(): Input (OUTPUT queue) Fourcc: S264
[gpu]: AllocateInputBuffers(): Requesting: 17 OUTPUT buffers of type V4L2_MEMORY_MMAP
[gpu]: SetExtCtrlsInit(): Setting EXT_CTRLS for H264
[gpu]: SetupOutputFormat(): Output (CAPTURE queue) candidate: NV12
[gpu]: ContinueChangeResolution(): Requesting: 6 CAPTURE buffers of type V4L2_MEMORY_MMAP
[renderer]: OnDecoderSelected<video>: V4L2VideoDecoder
MediaEvent: "Selected V4L2VideoDecoder for video decoding,
config: codec: h264, profile: h264 main, [...]
coded size: [1920,1080], visible rect: [0,0,1920,1080]"
```
`fuser /dev/video1 /dev/media0` shows `chrome` (gpu pid) holding both
fds. The hantro stateless decoder is engaged. **First end-to-end
chromium-fourier V4L2 hardware decode validation: PASS** for H.264
1080p Big Buck Bunny on PineTab2.
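The log check can be scripted. This is a sketch only, assuming chrome's log output has been saved to a file (the name `chrome.log` below is hypothetical) and that the MediaEvent wording matches the excerpt above; `v4l2_decoder_selected` is an illustrative name.

```shell
# Hypothetical helper: succeed if the saved chrome log contains the
# MediaEvent line that reports V4L2VideoDecoder as the selected decoder.
v4l2_decoder_selected() {
    grep -qF 'Selected V4L2VideoDecoder for video decoding' "$1"
}
```

For example, `v4l2_decoder_selected chrome.log && echo "hardware decode engaged"`.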
Caveat: the render-side CPU was still ~85% during playback. Subsequent
investigation traced this to a different root cause than initially
guessed (see Pass 3 below).
## Validation pass 3 — 2026-04-26 22:50 UTC — zero-copy diagnosis
The 85 % CPU is **not** caused by software compositing or dmabuf v5
negotiation. The dmabuf-v5 warning ("Binding to zwp_linux_dmabuf_v1
version 4 but version 5 is available") is benign — chrome happily binds
to its supported max (v4). The `WaylandZwpLinuxDmabuf::OnTrancheFlags`
NOTIMPLEMENTED is also benign — KWin sends it, chrome ignores it, but
the substantive feedback (formats + modifiers) lands via
`OnTrancheFormats` / `OnTrancheTargetDevice` regardless.
Real cause: `gpu_feature_info.supports_nv12_gl_native_pixmap` ends up
**false** on this build. With it false, V4L2-decoded NV12 frames go
through the NV12-to-AR24 VPP conversion path (see
`media/mojo/services/gpu_mojo_media_client_linux.cc`
`GetPreferredRenderableFourccs` — without NV12 native pixmap support,
only `Fourcc::AR24` is added to the renderable set, forcing the VPP).
That's where the 85 % is spent.
Why is `supports_nv12_gl_native_pixmap` false?
`GLOzoneEGLWayland::CanImportNativePixmap` (in
`ui/ozone/platform/wayland/gpu/wayland_surface_factory.cc`) requires the
chrome GL display to expose `EGL_EXT_image_dma_buf_import`. With
`--use-gl=angle --use-angle=gles`, chrome's GL display sits behind
ANGLE's EGL, and ANGLE's GLES backend on Linux does not propagate
`EGL_EXT_image_dma_buf_import` from the underlying mesa EGL up to its
clients. Verified directly: `EGL_PLATFORM=surfaceless eglinfo` on ohm
shows panfrost native EGL exposes both
`EGL_EXT_image_dma_buf_import` and `EGL_EXT_image_dma_buf_import_modifiers`;
the capability is there at the panfrost layer, ANGLE just hides it.
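The `eglinfo` check can be made repeatable. A sketch, assuming `eglinfo` prints extension names separated by commas or whitespace; `has_dmabuf_import` is a hypothetical helper name. Whole-word matching matters because the first extension name is a prefix of the second.

```shell
# Reads eglinfo-style text on stdin; succeeds only if both dmabuf import
# extensions appear as whole words (grep -w treats '_' as a word character,
# so the shorter name does not match inside the longer one).
has_dmabuf_import() {
    local text
    text="$(cat)"
    printf '%s\n' "$text" | grep -qw 'EGL_EXT_image_dma_buf_import' &&
        printf '%s\n' "$text" | grep -qw 'EGL_EXT_image_dma_buf_import_modifiers'
}
```

Usage on the target box: `EGL_PLATFORM=surfaceless eglinfo | has_dmabuf_import && echo "native EGL exposes dmabuf import"`.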
We tried `--use-gl=egl` (direct EGL/GLES2, bypass ANGLE) but were
rejected with "Requested GL implementation (gl=egl-gles2,angle=none) not
found in allowed implementations". `WaylandSurfaceFactory::GetAllowedGLImplementations()`
in chromium 149 advertises only ANGLE-mediated impls; the
`kGLImplementationEGLGLES2` slot is missing from the list. The
`CreateViewGLSurface` dispatcher does still handle that impl — only the
*advertisement* was tightened.
### Patch 2/2 — `wayland-allow-direct-egl-gles2.patch`
3-line diff in `ui/ozone/platform/wayland/gpu/wayland_surface_factory.cc`:
```
+ impls.emplace_back(gl::GLImplementationParts(gl::kGLImplementationEGLGLES2));
impls.emplace_back(gl::ANGLEImplementation::kOpenGL);
```
Re-allows the direct EGL/GLES2 path, ahead of the ANGLE entries so
chrome picks it by default. Verified to apply cleanly on the CT 220
tree; staged via `patch -p1` mid-rebuild (ninja's mtime-based rebuild
will pick up the change automatically).
### Outstanding for next pass (revised)
1. Rebuild lands → repackage with all four GL libs +
`chrome_crashpad_handler` + chrome → ship to ohm.
2. Validate via `chrome --use-gl=egl --ozone-platform=wayland`
`--enable-features=AcceleratedVideoDecoder` (no ANGLE shim) and
confirm `chrome://gpu` reports `Native GpuMemoryBuffers: true` and
`supports_nv12_gl_native_pixmap=true`. Target CPU during 1080p30 H.264
playback: under 30 % combined renderer + gpu.
3. If (2) passes, declare V1 of chromium-fourier shippable on ohm.
4. Add a `chromium-fourier` launcher shim under `/usr/bin/` that
defaults to `--use-gl=egl --ozone-platform=wayland`.
5. Sort out the chromium 147 vs 149 confusion: the fetch went to ToT on
main rather than the 147 release branch. Either pin the branch or
accept that we're tracking ToT (probably preferable for V4L2 fixes
that are still in flight upstream).
6. Replicate end-to-end on RK3588 (ampere CoolPi or boltzmann Rock 5
ITX+) once the mainline VDPU381 driver is stable on those — those
boxes use **panthor** for Mali-G610 (Valhall), not panfrost; the
patches should be backend-agnostic but the validation is per-box.
### State of the build host (post pass 3)
- CT 220 `/build/chromium/src` patched with both
`enable-v4l2-decoder-default.patch` and
`wayland-allow-direct-egl-gles2.patch` (applied directly with
`patch -p1` mid-rebuild; ninja picks up the mtime change).
- `chromium-rebuild.service` running as a transient unit, output in
`/tmp/chromium-rebuild.log`. Most of the 93k ninja steps are cache
hits; only the patched files + their downstream objects need
recompiling.
- Tarball still on CT 220 at `/build/chromium-fourier-147.tar.gz`
(misleadingly named: it's actually 149.0.7812.0 from the main fetch,
not the 147 release tarball — separate cleanup for next pass) and on
hertz at `/tmp/chromium-fourier-147.tar.gz`. **Will be replaced by
the post-rebuild tarball once it lands.**
---
## 2026-04-28 — Patch 4 lands, KWin owns the residual stall
### Patch 4 — `nv12-external-oes-on-modifier-external-only.patch`
On panfrost / panthor, every NV12 modifier (LINEAR + AFBC ×2 + AFRC) is
flagged `external_only` in `eglQueryDmaBufModifiersEXT`. Chromium's
`OzoneImageGLTexturesHolder::GetBinding` only picked
`GL_TEXTURE_EXTERNAL_OES` when the SharedImageFormat carried
`PrefersExternalSampler` — which is set for the generic Linux multi-plane
case but **not** for V4L2 producers that arrive via the standard ozone
pixmap path. The frame then took the `GL_TEXTURE_2D` branch, ANGLE's
`validationES.cpp:4894` rejected the YUV EGLImage on a non-EXTERNAL_OES
target, the import failed, and the renderer fell back to the
NV12→AR24 software conversion (~131 % CPU baseline).
Patch closes the gap: also pick `EXTERNAL_OES` when the EGL driver
advertises the pixmap's modifier as `external_only` (cached per
`(fourcc, modifier)` tuple via a function-local
`base::flat_map`+`base::Lock`, so the EGL round-trip stays off the
per-frame hot path). Adds a single static helper
`NativePixmapEGLBinding::ModifierRequiresExternalOES`. ~+90 lines, zero
deletions, no shader changes (Skia Ganesh already handles
`GL_TEXTURE_EXTERNAL_OES` natively via `GrGLTextureInfo.fTarget`).
### Validation on ohm (RK3566 PineTab2 / hantro mainline 6.19.10)
- `bbb_1080p30_h264.mp4` plays clean, no garble, no decoder error
- Steady-state **34.7 % combined CPU** during 1080p30 H.264 (browser 12 +
GPU 9 + net 6 + render 6 + audio 1) vs v3 baseline ~131 % — **~3.8×
reduction**. Risk-1 (ANGLE+EXTERNAL_OES sampling regression on Skia
Ganesh / panfrost) **cleared**.
- `V4L2VideoDecoder()` constructor + `Using a stateless API for profile:
h264 main and fourcc: S264` confirmed in log; 6 CAPTURE buffers
V4L2_MEMORY_MMAP, NV12 output. 19 live dmabuf fds in GPU process
during steady playback — healthy V4L2 rotation + compositor depth, not
a leak.
### KWin 6.6.4 GL_ALPHA bug — separate, preexisting, blocks long playback
Across BOTH v3 (no patch 4) and v4 (with patch 4) chromium-fourier
builds, mid-playback the renderer + GPU process both park in
`futex_do_wait`, `<video>` element keeps its ⏸ icon, currentTime
advances on the audio clock, and audio outputs static (last ALSA buffer
recycled) then silence. No D-state, no v4l2/vb2/dma_fence wchan, no
error in `chrome-v[34].log`.
`journalctl` for `kwin_wayland`:
```
GL_INVALID_VALUE in glTexImage2D(internalFormat=GL_ALPHA)
GL_INVALID_OPERATION in glTexSubImage2D(invalid texture level 0) × N
```
First occurrence on this box: **2026-03-06**. KWin is requesting an
internal format that doesn't exist in modern GLES (`GL_ALPHA` is GLES1.x
legacy, not valid for `glTexImage2D` with GLES3 contexts). The
allocation fails, then every `glTexSubImage2D` to that texture errors at
level 0; KWin keeps retrying the same broken upload every frame, never
recovers. The frame-callback ack to wayland clients stalls → chrome's
renderer parks waiting for the present-feedback that never lands.
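To track whether the error stream is still growing during a playback run, the journal lines can be counted. A sketch, assuming KWin's messages are tagged `_COMM=kwin_wayland` in the journal; `count_gl_errors` is a hypothetical helper name.

```shell
# Reads journal text on stdin and prints how many lines report a GL error.
# grep -c exits non-zero when the count is 0, so the status is normalised.
count_gl_errors() {
    grep -cE 'GL_INVALID_(VALUE|OPERATION|ENUM)' || true
}
```

For example, `journalctl -b _COMM=kwin_wayland | count_gl_errors`, sampled before and after playback; a rising count suggests the broken upload is still being retried.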
Patch 4 looks "guilty" only because of timing: with NV12 zero-copy, the
renderer is fast enough to actually post a v4l2-backed dmabuf within the
window where KWin's broken path runs; v3 was slow enough (NV12→AR24
software conversion) that the bug surfaced 5–10× later. **Triangulation:**
chrome v4 stalls + chrome v3 stalls + VLC `cannot convert decoder/filter
output` + mpv `could not initialize video chain` — every wayland video
client hits it; ffmpeg `-hwaccel v4l2request -f null` plays through
clean (decode path is healthy, the wall is the compositor's GL backend).
### Decoder-stack sanity post-reboot (2026-04-28 ~13:30)
After a reboot the V4L2 device numbering shuffled:
- `/dev/video0` = `rockchip,rk3568-vpu-dec` (hantro DEC, was video1)
- `/dev/video1` = `rockchip,rk3568-vepu-enc` (hantro ENC)
- `/dev/video2` = `rockchip-rga`
- `/dev/media0` = controller for DEC, `/dev/media1` = controller for ENC
Anything that hardcoded `/dev/video1` for decode now talks to the
encoder. Chrome and ffmpeg both handle this transparently (they enumerate
via media-ctl); mpv's `--hwdec=v4l2request` returns `Could not create
device` post-shuffle — separate mpv-side bug, not ours.
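Scripts that need the decoder node can look it up by name instead of hardcoding an index. A sketch, assuming the sysfs `name` attribute under `/sys/class/video4linux/videoN/` carries the same string shown in the list above; `find_video_node` is a hypothetical helper, and the sysfs root is a parameter only so the function can be exercised against a fake tree.

```shell
# Hypothetical helper: print the /dev/videoN path whose sysfs name equals
# the wanted string; returns non-zero if no node matches.
find_video_node() {
    local want="$1" root="${2:-/sys/class/video4linux}" d
    for d in "$root"/video*; do
        [ -r "$d/name" ] || continue
        if [ "$(cat "$d/name")" = "$want" ]; then
            printf '/dev/%s\n' "${d##*/}"
            return 0
        fi
    done
    return 1
}
```

For example, `find_video_node 'rockchip,rk3568-vpu-dec'` would print `/dev/video0` with the post-reboot numbering above, if the name assumption holds.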
### Outstanding (revised, supersedes earlier list)
1. **Patch 4 lands publicly:** bump PKGBUILD `source=` and `prepare()`,
commit + tag a `chromium-fourier-149-r4` release on
`git@github.com:marfrit/chromium-fourier`.
2. **KWin pivot** — see `KWIN_PIVOT.md` (separate doc) for the plan to
identify and patch the `glTexImage2D(GL_ALPHA)` site, since ohm is the
only board on hand and every wayland video client is affected.
3. **Replication on ampere** (RK3588, panthor + rkvdec2 + hantro
multiplanar) — needs ampere woken; currently DOWN.
4. **firefox-fourier 150 build** — `firefox-fourier-150.0.1-1-aarch64.pkg.tar.zst`
is built (95 MB on workstation:/tmp/, sha256 acbf1870…), pending
fresnel power-on for V4L2 stateless validation on RK3399.
@@ -0,0 +1,265 @@
# Maintainer: Markus Fritsche <mfritsche@reauktion.de>
#
# Chromium with V4L2 HW video decode unlocked on Linux for Rockchip
# (RK3566 hantro / RK3588 VDPU381) on **mainline** kernel + Wayland +
# panfrost / panthor — no vendor MPP, no Mali blob, no panfork, no
# 5.10 BSP kernel. Fills the niche that 7Ji's chromium-mpp explicitly
# does not (it forces BSP + X11 + vendor stack); see STUDY.md and
# NEXT.md alongside this PKGBUILD for the full rationale and the
# validation log on PineTab2 (RK3566).
#
# Multi-arch: builds natively on x86_64 and aarch64. The x86_64 path
# is primarily a development / CI host; the runtime target audience is
# aarch64. The two patches are architecture-independent.
pkgname=chromium-fourier
pkgver=147.0.7727.116
pkgrel=2
epoch=1
pkgdesc='Chromium with V4L2VDA HW video decode unlocked for mainline Linux Wayland on Rockchip'
arch=('aarch64' 'x86_64')
url='https://www.chromium.org/Home'
license=('BSD-3-Clause')
depends=(
alsa-lib
at-spi2-core
cairo
cups
dbus
fontconfig
freetype2
gtk3
hicolor-icon-theme
libdrm
libpulse
libva
libxcb
libxkbcommon
mesa
nspr
nss
pango
pciutils
pipewire
ttf-liberation
v4l-utils
wayland
)
makedepends=(
clang
elfutils
gn
gperf
java-runtime-headless
libxslt
lld
ninja
nodejs
npm
python
qt5-base
qt6-base
re2
rust
rust-bindgen
)
optdepends=(
'qt6-base: for Qt6 toolkit integration'
)
provides=(chromium)
conflicts=(chromium)
options=('!lto' '!strip')
# Canonical chromium release tarball (5.7 GB compressed). Versions track
# the chromium release schedule (see https://chromiumdash.appspot.com).
# When bumping pkgver the patches may need their hunk line numbers
# refreshed against the new tree — they are written against
# media/base/media_switches.cc and ui/ozone/platform/wayland/gpu/wayland_surface_factory.cc
# which both move around between minor releases.
source=(
"https://commondatastorage.googleapis.com/chromium-browser-official/chromium-${pkgver}.tar.xz"
'patches/enable-v4l2-decoder-default.patch'
'patches/wayland-allow-direct-egl-gles2.patch'
'patches/nv12-external-oes-on-modifier-external-only.patch'
)
sha256sums=(
'SKIP'
'SKIP'
'SKIP'
'SKIP'
)
prepare() {
cd "${srcdir}/chromium-${pkgver}"
# Fourier patch 1/3: flip kAcceleratedVideoDecodeLinux's default to
# enabled when USE_V4L2_CODEC is the build's HW decode path. Without
# this the runtime master gate stays off on USE_V4L2_CODEC-only builds
# and chrome silently falls back to ffmpeg software decode. See the
# patch header for the validation log on RK3566 hantro.
patch -Np1 -i "${srcdir}/patches/enable-v4l2-decoder-default.patch"
# Fourier patch 2/3: re-allow the direct EGL/GLES2 path in the Wayland
# ozone surface factory so panfrost's EGL_EXT_image_dma_buf_import
# surfaces to chrome's GL display, lighting up the NV12 zero-copy
# native-pixmap pipeline. The launcher defaults to ANGLE (DCHECK in
# gl_context_egl.cc:241 fires on direct EGL with non-official builds);
# this patch keeps the direct path available for users who flip
# is_official_build=true and want the lower-CPU pipeline.
patch -Np1 -i "${srcdir}/patches/wayland-allow-direct-egl-gles2.patch"
# Fourier patch 3/3: pick GL_TEXTURE_EXTERNAL_OES for NV12 dmabufs
# whose DRM modifier is advertised external_only by the EGL driver.
# On panfrost / panthor every NV12 modifier (LINEAR + AFBC + AFRC) is
# external_only; chromium's default OzoneImageGLTexturesHolder picked
# GL_TEXTURE_2D and ANGLE then rejected the YUV EGLImage on a
# non-EXTERNAL_OES target, forcing the NV12->AR24 software conversion
# fallback. This closes that gap and enables the actual zero-copy
# path. Validated on ohm (RK3566 hantro): 1080p30 H.264 drops from
# ~131% combined CPU to ~34.7% (~3.8x). See the patch header for context.
patch -Np1 -i "${srcdir}/patches/nv12-external-oes-on-modifier-external-only.patch"
# Use system node, system java
case "$CARCH" in
aarch64) _node_dir=node-linux-arm64 ;;
x86_64) _node_dir=node-linux-x64 ;;
esac
rm -f "third_party/node/linux/${_node_dir}/bin/node"
mkdir -p "third_party/node/linux/${_node_dir}/bin"
ln -sf /usr/bin/node "third_party/node/linux/${_node_dir}/bin/"
ln -sf /usr/bin/java third_party/jdk/current/bin/ 2>/dev/null || true
}
build() {
cd "${srcdir}/chromium-${pkgver}"
case "$CARCH" in
aarch64) _target_cpu="arm64" ;;
x86_64) _target_cpu="x64" ;;
esac
local _flags=(
"target_cpu=\"${_target_cpu}\""
'is_official_build=false'
'is_debug=false'
# dcheck_always_on defaults to !is_official_build (true here);
# explicitly off so the direct EGL/GLES2 path doesn't FATAL on
# gl_context_egl.cc:241's DCHECK(!global_texture_share_group_).
'dcheck_always_on=false'
'symbol_level=0'
'is_cfi=false'
'treat_warnings_as_errors=false'
'enable_nacl=false'
'enable_widevine=false'
# System toolchain (clang/lld from pacman)
'custom_toolchain="//build/toolchain/linux/unbundle:default"'
'host_toolchain="//build/toolchain/linux/unbundle:default"'
'use_sysroot=false'
'use_custom_libcxx=true'
# The whole point of chromium-fourier — V4L2 HW decode on Linux
'use_v4l2_codec=true'
'use_v4lplugin=true'
'use_linux_v4l2_only=true'
'use_vaapi=false'
# Codec branding for proprietary codec support (H.264 etc.)
'ffmpeg_branding="Chrome"'
'proprietary_codecs=true'
'rtc_use_pipewire=true'
'link_pulseaudio=true'
'use_qt6=true'
'moc_qt6_path="/usr/lib/qt6"'
)
gn gen out/Default --args="${_flags[*]}"
ninja -C out/Default chrome chrome_crashpad_handler
}
package() {
cd "${srcdir}/chromium-${pkgver}"
install -Dm755 out/Default/chrome "${pkgdir}/usr/lib/chromium/chromium"
install -Dm755 out/Default/chrome_crashpad_handler \
"${pkgdir}/usr/lib/chromium/chrome_crashpad_handler"
[ -f out/Default/chrome_sandbox ] && install -Dm4755 out/Default/chrome_sandbox \
"${pkgdir}/usr/lib/chromium/chrome-sandbox"
[ -f out/Default/chromedriver ] && install -Dm755 out/Default/chromedriver \
"${pkgdir}/usr/bin/chromedriver"
# Bundled GL/Vulkan runtime — chrome dlopens these from its own dir,
# not /usr/lib/. Without them GL init fails and chrome falls back to
# software compositing.
for so in libEGL.so libGLESv2.so libvk_swiftshader.so libvulkan.so.1; do
[ -f "out/Default/$so" ] && install -Dm755 "out/Default/$so" \
"${pkgdir}/usr/lib/chromium/$so"
done
# ANGLE and SwiftShader ICD config files
for icd in out/Default/*_icd.json; do
[ -f "$icd" ] && install -Dm644 "$icd" \
"${pkgdir}/usr/lib/chromium/$(basename "$icd")"
done
# Resources / locales / pak files
for f in chrome_100_percent.pak chrome_200_percent.pak resources.pak \
v8_context_snapshot.bin snapshot_blob.bin icudtl.dat \
headless_lib_data.pak headless_lib_strings.pak \
headless_command_resources.pak; do
[ -f "out/Default/$f" ] && install -Dm644 "out/Default/$f" \
"${pkgdir}/usr/lib/chromium/$f"
done
# Locales
if [ -d out/Default/locales ]; then
install -dm755 "${pkgdir}/usr/lib/chromium/locales"
cp -r out/Default/locales/* "${pkgdir}/usr/lib/chromium/locales/"
fi
# Launcher shim — defaults to ANGLE→GLES on Wayland with Vulkan
# disabled. Vulkan is off by default because:
# - panvk on RK3566 (Mali-G52 Bifrost) returns
# VK_ERROR_INCOMPATIBLE_DRIVER on chromium's probe and breaks
# V4L2 dispatch downstream (chrome falls back to FFmpeg software);
# - panthor on RK3588 (Mali-G610 Valhall) is more functional but
# not yet validated end-to-end against this build.
#
# User overrides for development on other Rockchips:
# --enable-features=Vulkan enable Vulkan (panthor / others)
# --use-vulkan=native|swiftshader pick the Vulkan backend
# --disable-features=Vulkan explicit re-disable
# Any of those on the command line short-circuits the launcher's
# default disable, so the user's intent always wins.
install -dm755 "${pkgdir}/usr/bin"
cat > "${pkgdir}/usr/bin/chromium" <<'LAUNCHER'
#!/bin/bash
# chromium-fourier launcher — V4L2 HW decode + Wayland + ANGLE
# Vulkan disabled by default; pass --enable-features=Vulkan or
# --use-vulkan=native to opt in (e.g. RK3588 panthor work).
USER_HANDLES_VULKAN=0
for arg in "$@"; do
case "$arg" in
--use-vulkan*|--enable-features=*Vulkan*|--disable-features=*Vulkan*|--use-angle=vulkan*)
USER_HANDLES_VULKAN=1
break
;;
esac
done
vulkan_default=()
if [ "$USER_HANDLES_VULKAN" = 0 ]; then
vulkan_default=(--disable-features=Vulkan)
fi
exec /usr/lib/chromium/chromium \
--ozone-platform=wayland \
--use-gl=angle --use-angle=gles \
--enable-features=AcceleratedVideoDecoder \
"${vulkan_default[@]}" \
"$@"
LAUNCHER
chmod 0755 "${pkgdir}/usr/bin/chromium"
}
@@ -0,0 +1,165 @@
# chromium-fourier — Chromium with V4L2-stateless / VAAPI HW decode on mainline-kernel ARM Linux
## Goal
Patch upstream Chromium so it can do hardware video decode through
`VaapiVideoDecoder` on a plain Linux Wayland system on Rockchip
(RK3566 / RK3588), via our `marfrit/libva-v4l2-request-fourier` libva
backend. Target consumers: Brave (rebuilt from the same patched
Chromium), upstream Chromium itself for Arch Linux ARM, and any other
Blink-based browser that picks up the patches.
This fills a niche **no shipping fork** currently fills — every existing
ARM-Linux Chromium with HW decode (7Ji's `chromium-mpp`, Ubuntu's
`liujianfeng1994/rockchip-multimedia` PPA, JeffyCN's recipe, etc.) goes
through the **vendor MPP path** on **5.10 BSP kernel + X11 + Mali blob
+ panfork**. We're going for **mainline kernel + V4L2 stateless +
Wayland + panfrost / panthor**, the direction Fourier is heading anyway.
## What's actually broken in stock Chromium 147+
1. **`media/gpu/chromeos/video_decoder_pipeline.cc`** is selected on Linux
when it shouldn't be. It then tries to initialize a V4L2
`ImageProcessor` (a ChromeOS-specific m2m chip block for color
conversion / scaling) that doesn't exist on a plain Linux Wayland
system. Failure surface:
```
PickDecoderOutputFormat(): Initializing ImageProcessor; max buffers: 16
ERROR: failed Initialize()ing the frame pool
```
2. **`VaapiVideoDecoder`** *is* compiled into stock Chromium / brave-bin
(verified: `strings /opt/brave-bin/brave | grep VaapiVideoDecoder`),
but the chromeos pipeline preempts it. Once the pipeline fails, the
decoder selection bails — it never falls back to direct
`VaapiVideoDecoder`.
3. **`V4L2VideoDecoder`** factory is `BUILDFLAG(IS_CHROMEOS)`-gated.
Feature flags `UseChromeOSDirectVideoDecoder` /
`V4L2FlatStatelessVideoDecoder` are no-ops on a non-ChromeOS build —
their flag strings don't even appear in the binary.
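The "is this string in the binary" checks used in items 2 and 3 can be wrapped in one helper. A sketch, assuming a `grep` that supports `-a` (treat binary as text); `binary_mentions` is a hypothetical name.

```shell
# Succeeds if the literal string ($2) occurs anywhere in the file ($1),
# even when the file is a binary.
binary_mentions() {
    grep -qaF -- "$2" "$1"
}
```

For example, `binary_mentions /opt/brave-bin/brave VaapiVideoDecoder` should succeed, while the same call with `V4L2FlatStatelessVideoDecoder` should fail on a non-ChromeOS build if the observation above holds.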
## The patch shape (sketch)
### Patch 1 — bypass the chromeos pipeline on Linux
`media/gpu/chromeos/video_decoder_pipeline.cc` is reached from
`media/gpu/vaapi/vaapi_video_decoder.cc` somewhere in
`ApplyResolutionChangeWithScreenSizes`. Either:
- Skip the chromeos pipeline entirely on Linux non-ChromeOS, going
straight to `VaapiVideoDecoder::CreateContextAndScopedVASurfaces` for
frame allocation, OR
- Replace `PickDecoderOutputFormat`'s ImageProcessor probing with a
no-op that just trusts the VAAPI surface format (since on real
ChromeOS the IP does color conversion that we don't need in our
dmabuf-passthrough path).
Reference: [crbug 40192819] "VaapiVideoDecoder on linux"
(<https://issues.chromium.org/issues/40192819>).
### Patch 2 — un-gate `V4L2VideoDecoder` factory for non-ChromeOS Linux
Optional but useful as a fallback. The class is compiled in; only the
factory registration is `IS_CHROMEOS`-gated. The `igel-oss/meta-browser-hwdecode`
patch <https://github.com/igel-oss/meta-browser-hwdecode/blob/master/recipes-chromium/chromium/files/0001-Add-support-for-V4L2VDA-on-Linux.patch>
shows the historical pattern for the V4L2VDA factory; the modern factory
is in `media/gpu/v4l2/v4l2_video_decoder.{h,cc}`.
This is **plan B** if Patch 1 turns out harder than expected — V4L2VDA
talks to `/dev/videoN` directly without going through libva, which means
we don't depend on the `libva-v4l2-request-fourier` decode-submission
path either. Shorter end-to-end critical path, but a more invasive
factory change.
### Patch 3 — point libva at our backend (build-time)
Already configurable at runtime via `LIBVA_DRIVER_NAME=v4l2_request`,
but a Brave/Chromium binary that defaults to it would be cleaner. Could
patch the GPU process startup to set the env var if not already set.
Cosmetic, optional.
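A lighter alternative to patching the GPU process startup would be to set the default in a launcher script. This is a sketch of that idea only, not the planned patch; `default_libva_driver` is a hypothetical function name.

```shell
# Set LIBVA_DRIVER_NAME to v4l2_request only when the caller has left it
# unset or empty; an existing non-empty value is kept as-is.
default_libva_driver() {
    : "${LIBVA_DRIVER_NAME:=v4l2_request}"
    export LIBVA_DRIVER_NAME
}
```

A launcher would call `default_libva_driver` before `exec`ing the browser, so a user-supplied `LIBVA_DRIVER_NAME` always wins.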
## Reference forks (read these side-by-side with our patch)
- **JeffyCN/meta-rockchip chromium recipe** — the upstream of the V4L2VDA
factory un-gating and `libv4l-rkmpp` shim that every shipping fork
uses. <https://github.com/JeffyCN/meta-rockchip/tree/master/dynamic-layers/recipes-browser/chromium>
Caveat: targets the V4L2VDA path (Plan B), not the modern
`VaapiVideoDecoder` (Plan A) we want.
- **igel-oss/meta-browser-hwdecode** — Yocto layer with the original
`0001-Add-support-for-V4L2VDA-on-Linux.patch`. 2017-vintage but the
pattern still applies. <https://github.com/igel-oss/meta-browser-hwdecode>
- **7Ji-PKGBUILDs/chromium-mpp** — the most recent ALARM-shipping
variant, Chromium 132 with MPP. Useful for the **PKGBUILD shape** and
the patch-set list, even though we're not using MPP.
<https://github.com/7Ji-PKGBUILDs/chromium-mpp>
- **amazingfate/chromium-debian-build** — Debian flavour of the same
approach, for reference. <https://github.com/amazingfate/chromium-debian-build>
## Build environment
- **Source tree**: `gn` / `depot_tools`, hosted on **fermi** (Arch ARM
aarch64 LXC on hertz). Fetch is ~30 GB; full chromium tree is ~100 GB
with build artifacts. Fermi's storage budget needs checking — may
need to bind-mount a hertz path with more headroom.
- **Build acceleration**: `distcc-avahi` is already deployed. Wire the
chromium build to use distcc through CT108 + tesla as compile workers.
Pump mode can shave further; chromium's ninja will accept distcc'd
cc/c++ via wrappers.
- **First build wall time estimate**: 6–10 hours initial on fermi alone;
3–5 hours with distcc-avahi if the network throughput holds. After
the first build, incrementals on small patches are ~10–15 min.
- **Configure flags**: `gn args` — start from Arch's `chromium`
PKGBUILD, add `use_vaapi=true use_v4l2_codec=true is_official_build=false`.
- **Output**: `chromium-fourier-<chromium-ver>-1-aarch64.pkg.tar.zst`
shipping `/usr/bin/chromium`. `provides=(chromium) conflicts=(chromium)`
shape, same as `ffmpeg-v4l2-request-git`'s replacement of stock ffmpeg.
## Validation path
Once a patched binary builds:
1. Launch with the same env we use for mpv-vaapi:
`LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0`
2. Open `chrome://gpu` — should show "Video Decode: Hardware accelerated"
(not "Disabled" or "Software only").
3. Open a 1080p H.264 file:// URL, watch CPU on `/dev/video0` openers,
measure Brave's GPU process CPU% during playback.
4. Cross-reference with `mpv --hwdec=vaapi` numbers once the
libva-v4l2-request-fourier decode-submission path also lands.
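For step 3, the per-process CPU figures can be summed rather than read off by eye. A sketch, assuming input lines of the form `PCPU COMMAND`; `sum_cpu` is a hypothetical helper name. Note that `ps` reports CPU averaged over each process's lifetime, so this is a rough indicator, not a steady-state measurement.

```shell
# Reads "PCPU COMMAND" lines on stdin and prints the summed CPU percentage
# of the rows whose command name equals the first argument.
sum_cpu() {
    awk -v want="$1" '$2 == want { total += $1 } END { printf "%.1f\n", total }'
}
```

For example, `ps -eo pcpu=,comm= | sum_cpu brave` during playback.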
## Order of operations
Phase A (this session): workspace + STUDY.md (this file). **Done.**
Phase B: build environment — install `depot_tools` on fermi, run
`fetch chromium`, get a baseline (unpatched) build to confirm fermi can
build chromium at all. ~one full day.
Phase C: identify the exact line to flip for Patch 1 by reading
`media/gpu/chromeos/video_decoder_pipeline.cc` +
`media/gpu/vaapi/vaapi_video_decoder.cc` against current Chromium
master. Iterate the patch on a real build.
Phase D: package as `chromium-fourier` PKGBUILD, hook into
marfrit-packages CI on fermi (already has the pattern from
ffmpeg-v4l2-request-git).
Phase E: rebase Patch 1 (and Patch 2 if needed) onto Brave's source
tree, ship as `brave-fourier` next to `chromium-fourier`. Brave's tree
adds ~50 patches on top of upstream Chromium; the chromeos-pipeline
seam should be unchanged across that delta, so this should be a
mechanical rebase.
## Out of scope
- HW video **encode** (cameras, webcam streams). RK3566 has a separate
encoder block on `/dev/video2`; not a Fourier priority.
- HEVC / VP9 / AV1 — RK3566 has no HW for these. RK3588 has VDPU381
but our libva backend doesn't speak HEVC yet.
- WebGL / WebGPU performance — separate concern, not part of video
decode.
- Brave-specific features (Shields, Wallet, etc.) — they all live in
Brave's source tree on top of Chromium and are unaffected by our
decode patches.
@@ -0,0 +1,38 @@
From: 7Ji <7Ji@example.com> (originally), adapted for chromium-fourier
Subject: Adjust compiler-rt library path layout for system clang on Arch
Linux ARM, where compiler-rt installs to lib/clang/N/lib/linux/ with
-aarch64 filename suffix instead of chromium's expected
lib/clang/N/lib/aarch64-unknown-linux-gnu/ layout.
diff --git a/build/config/clang/BUILD.gn b/build/config/clang/BUILD.gn
index d4de2e0cca0..57359c32121 100644
--- a/build/config/clang/BUILD.gn
+++ b/build/config/clang/BUILD.gn
@@ -130,12 +130,15 @@ template("clang_lib") {
} else if (is_linux || is_chromeos) {
if (current_cpu == "x64") {
_dir = "x86_64-unknown-linux-gnu"
+ _suffix = "-x86_64"
} else if (current_cpu == "x86") {
_dir = "i386-unknown-linux-gnu"
+ _suffix = "-i386"
} else if (current_cpu == "arm") {
_dir = "armv7-unknown-linux-gnueabihf"
} else if (current_cpu == "arm64") {
_dir = "aarch64-unknown-linux-gnu"
+ _suffix = "-aarch64"
} else {
assert(false) # Unhandled cpu type
}
@@ -166,6 +169,11 @@ template("clang_lib") {
assert(false) # Unhandled target platform
}
+ # Bit of a hack to make this find builtins from compiler-rt >= 16
+ if (is_linux || is_chromeos) {
+ _dir = "linux"
+ }
+
_clang_lib_dir = "$clang_base_path/lib/clang/$clang_version/lib"
_lib_file = "${_prefix}clang_rt.${_libname}${_suffix}.${_ext}"
libs = [ "$_clang_lib_dir/$_dir/$_lib_file" ]
@@ -0,0 +1,55 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: media: default kAcceleratedVideoDecodeLinux to enabled when
USE_V4L2_CODEC is the build's hardware decode path
Date: 2026-04-26
Background
----------
chromium-fourier targets mainline-Linux Wayland on Rockchip (RK3566 hantro,
RK3588 VDPU381) where the only HW video decode path is V4L2 stateless
(via the in-tree media/gpu/v4l2 stack). The build is configured with
use_vaapi = false
use_v4l2_codec = true
use_v4lplugin = true
use_linux_v4l2_only = true
Without this patch, GPU-process V4L2 decode is compiled in but stays
runtime-disabled by default. The runtime master gate
`media::kAcceleratedVideoDecodeLinux` (the user-visible feature name is
"AcceleratedVideoDecoder") is currently flipped to ENABLED_BY_DEFAULT only
when `BUILDFLAG(USE_VAAPI)` is set. On a USE_V4L2_CODEC-only build the
feature stays DISABLED_BY_DEFAULT, the linux gpu_mojo_media_client returns
`VideoDecoderType::kUnknown`, and `<video>` falls all the way back to
`media/filters/ffmpeg_video_decoder.cc` (software).
We confirmed this by hand on the PineTab2 (RK3566 hantro): with
`--enable-features=AcceleratedVideoDecoder` chrome correctly selects
`V4L2VideoDecoder` for h264 main, opens /dev/video1 + /dev/media0,
allocates 17 OUTPUT + 6 CAPTURE NV12 buffers, and runs SetExtCtrlsInit for
H264. Without the runtime flag, none of that happens.
Fix
---
Treat `USE_V4L2_CODEC` symmetrically with `USE_VAAPI` for the runtime
default of the master gate. A user can still disable it via
`--disable-features=AcceleratedVideoDecoder`.
This does NOT touch the `kAcceleratedVideoDecodeLinuxGL` companion gate
(already ENABLED_BY_DEFAULT) or any of the per-decoder selection logic in
`media/mojo/services/gpu_mojo_media_client_linux.cc` -- that file already
dispatches to the V4L2 decoder when `USE_V4L2_CODEC && !USE_VAAPI`, gated
behind the master flag we are flipping here.
diff --git a/media/base/media_switches.cc b/media/base/media_switches.cc
--- a/media/base/media_switches.cc
+++ b/media/base/media_switches.cc
@@ -749,7 +749,7 @@ BASE_FEATURE(kUnifiedAutoplay, base::FEATURE_ENABLED_BY_DEFAULT);
// on chromeos, but needs an experiment on linux.
BASE_FEATURE(kAcceleratedVideoDecodeLinux,
"AcceleratedVideoDecoder",
-#if BUILDFLAG(USE_VAAPI)
+#if BUILDFLAG(USE_VAAPI) || BUILDFLAG(USE_V4L2_CODEC)
base::FEATURE_ENABLED_BY_DEFAULT);
#else
base::FEATURE_DISABLED_BY_DEFAULT);
@@ -0,0 +1,187 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH] gpu/ozone: pick GL_TEXTURE_EXTERNAL_OES for NV12 dmabufs whose
DRM modifier is advertised external_only by the EGL driver
Date: 2026-04-28
Background
----------
On mainline-Linux Mali GPUs (mesa panfrost / panthor on Bifrost / Valhall)
every NV12 modifier exposed by `eglQueryDmaBufModifiersEXT` is flagged
`external_only` — DRM_FORMAT_MOD_LINEAR + ARM AFBC × 2 + ARM AFRC. Mesa's
behavior is spec-correct: GLES sampling of multi-plane formats is
defined only via `samplerExternalOES`, never `sampler2D`. The chromium
NV12 import path at
`gpu/command_buffer/service/shared_image/ozone_image_gl_textures_holder.cc`
already chooses `GL_TEXTURE_EXTERNAL_OES` when the SharedImageFormat is
flagged `PrefersExternalSampler` — but that flag is only set for the
generic "multi-plane on Linux" case in
`media/gpu/chromeos/mailbox_video_frame_converter.cc`. Frames that
arrive with an `external_only`-flagged modifier from a producer that
didn't set the flag (V4L2 hantro NV12 with AFBC/AFRC capture format on
RK3588's rkvdec2, future NativePixmap producers, etc.) hit the
`GL_TEXTURE_2D` path; ANGLE's `validationES.cpp:4894` then rejects YUV
EGLImages on non-EXTERNAL_OES targets, and the import fails.
This patch closes the gap: the texture-target choice in
`OzoneImageGLTexturesHolder::GetBinding` now consults the EGL driver's
`external_only` annotation for the pixmap's actual modifier in addition
to `format.PrefersExternalSampler()`. If either says "external sampler
required", the target switches to `GL_TEXTURE_EXTERNAL_OES`. Skia
Ganesh handles `GL_TEXTURE_EXTERNAL_OES` natively via
`GrGLTextureInfo.fTarget`, so no shader changes are required. Same
infrastructure chromium already uses for Android camera / decoder
dmabufs, retargeted at the Linux ozone layer.
The result is cached per `(fourcc, modifier)` tuple in a function-local
static `base::flat_map`, so the EGL query stays off the per-frame hot
path: it runs once per unique format+modifier combination, after which
the runtime cost is a sorted-vector lookup (`base::flat_map` is not a
hash table) behind a `base::Lock`.
Bug crbug.com/1498703 is the closest existing tracker; framing this
upstream as "make Linux NV12 import path consistent with the
ChromeOS PrefersExternalSampler default" is the right angle.
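As a rough illustration of the decision described in this header, the
plain-C sketch below shows the lookup-and-override logic in isolation.
It is not the chromium code: `modifier_table`,
`modifier_is_external_only` and `needs_external_sampler` are
hypothetical names, and the table merely stands in for the two arrays
that `eglQueryDmaBufModifiersEXT` fills.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the modifier / external_only arrays that the
 * EGL driver reports for one fourcc. */
struct modifier_table {
    const uint64_t *modifiers;
    const bool *external_only;
    size_t count;
};

/* Returns the driver's external_only flag for `modifier`, or false when
 * the modifier is not listed (keep the default texture target). */
static bool modifier_is_external_only(const struct modifier_table *t,
                                      uint64_t modifier)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->modifiers[i] == modifier)
            return t->external_only[i];
    }
    return false;
}

/* Either source of truth is enough to require the external sampler. */
static bool needs_external_sampler(bool prefers_external_sampler,
                                   const struct modifier_table *t,
                                   uint64_t modifier)
{
    return prefers_external_sampler ||
           modifier_is_external_only(t, modifier);
}
```

In this sketch a `true` result corresponds to choosing
`GL_TEXTURE_EXTERNAL_OES` and `false` to `GL_TEXTURE_2D`; the caching and
locking from the patch are left out on purpose.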
diff --git a/gpu/command_buffer/service/shared_image/ozone_image_gl_textures_holder.cc b/gpu/command_buffer/service/shared_image/ozone_image_gl_textures_holder.cc
index 525bdcb0dc..43b0723326 100644
--- a/gpu/command_buffer/service/shared_image/ozone_image_gl_textures_holder.cc
+++ b/gpu/command_buffer/service/shared_image/ozone_image_gl_textures_holder.cc
@@ -16,6 +16,7 @@
#include "ui/gl/gl_bindings.h"
#include "ui/gl/scoped_binders.h"
#include "ui/ozone/public/gl_ozone.h"
+#include "ui/ozone/common/native_pixmap_egl_binding.h"
#include "ui/ozone/public/native_pixmap_gl_binding.h"
#include "ui/ozone/public/ozone_platform.h"
#include "ui/ozone/public/surface_factory_ozone.h"
@@ -82,7 +83,14 @@ std::unique_ptr<ui::NativePixmapGLBinding> GetBinding(
// being multiplanar (if using per-plane sampling of a multiplanar texture,
// the buffer format passed in here must be the single-planar format of the
// plane).
- if (format.PrefersExternalSampler()) {
+ // chromium-fourier: also pick GL_TEXTURE_EXTERNAL_OES whenever the
+ // pixmap's DRM modifier is advertised external_only by the EGL
+ // driver. Mesa panfrost / panthor mark every NV12 modifier
+ // external_only — the PrefersExternalSampler flag alone misses
+ // the AFBC / AFRC tiled paths.
+ if (format.PrefersExternalSampler() ||
+ ui::NativePixmapEGLBinding::ModifierRequiresExternalOES(
+ pixmap.get(), plane_format)) {
target = GL_TEXTURE_EXTERNAL_OES;
} else {
target = GL_TEXTURE_2D;
diff --git a/ui/ozone/common/native_pixmap_egl_binding.cc b/ui/ozone/common/native_pixmap_egl_binding.cc
index 31877f4459..6855c1093e 100644
--- a/ui/ozone/common/native_pixmap_egl_binding.cc
+++ b/ui/ozone/common/native_pixmap_egl_binding.cc
@@ -6,10 +6,13 @@
#include <array>
+#include "base/containers/flat_map.h"
#include "base/logging.h"
#include "base/memory/scoped_refptr.h"
+#include "base/no_destructor.h"
#include "base/notreached.h"
#include "base/numerics/safe_conversions.h"
+#include "base/synchronization/lock.h"
#include "ui/gfx/linux/drm_util_linux.h"
#include "ui/gl/gl_bindings.h"
#include "ui/gl/gl_surface_egl.h"
@@ -56,6 +59,75 @@ bool NativePixmapEGLBinding::IsSharedImageFormatSupported(
viz::SharedImageFormat format) {
return GetFourCCFormatFromSharedImageFormat(format) != DRM_FORMAT_INVALID;
}
+// static
+bool NativePixmapEGLBinding::ModifierRequiresExternalOES(
+ const gfx::NativePixmap* pixmap,
+ viz::SharedImageFormat format) {
+ // chromium-fourier: query the EGL driver for the (fourcc, modifier)
+ // tuple's external_only flag. Cache results — eglQueryDmaBufModifiersEXT
+ // is a synchronous round-trip into the driver and we want it off the
+ // per-frame hot path. The cache lives for the lifetime of the GPU
+ // process (modifier tables don't change after EGL init).
+ if (!pixmap) {
+ return false;
+ }
+ const uint64_t modifier = pixmap->GetFormatModifier();
+ if (modifier == gfx::NativePixmapHandle::kNoModifier) {
+ // Implicit linear — same answer the driver would give for the
+ // matching LINEAR entry, but cheaper not to query.
+ return false;
+ }
+ const uint32_t fourcc = GetFourCCFormatFromSharedImageFormat(format);
+ if (fourcc == DRM_FORMAT_INVALID) {
+ return false;
+ }
+
+ using Key = std::pair<uint32_t, uint64_t>;
+ static base::NoDestructor<base::Lock> cache_lock;
+ static base::NoDestructor<base::flat_map<Key, bool>> cache;
+ {
+ base::AutoLock lock(*cache_lock);
+ auto it = cache->find({fourcc, modifier});
+ if (it != cache->end()) {
+ return it->second;
+ }
+ }
+
+ bool external_only = false;
+ do {
+ auto* display = gl::GLSurfaceEGL::GetGLDisplayEGL();
+ if (!display || !display->ext->b_EGL_EXT_image_dma_buf_import_modifiers) {
+ break;
+ }
+ EGLDisplay egl_display = display->GetDisplay();
+ EGLint num_modifiers = 0;
+ if (!eglQueryDmaBufModifiersEXT(egl_display, fourcc, 0, nullptr, nullptr,
+ &num_modifiers) ||
+ num_modifiers <= 0) {
+ break;
+ }
+ std::vector<EGLuint64KHR> modifiers(num_modifiers);
+ std::vector<EGLBoolean> ext_only(num_modifiers);
+ if (!eglQueryDmaBufModifiersEXT(egl_display, fourcc, num_modifiers,
+ modifiers.data(), ext_only.data(),
+ &num_modifiers)) {
+ break;
+ }
+ for (EGLint i = 0; i < num_modifiers; ++i) {
+ if (modifiers[i] == modifier) {
+ external_only = (ext_only[i] == EGL_TRUE);
+ break;
+ }
+ }
+ } while (0);
+
+ {
+ base::AutoLock lock(*cache_lock);
+ cache->insert_or_assign({fourcc, modifier}, external_only);
+ }
+ return external_only;
+}
+
// static
std::unique_ptr<NativePixmapGLBinding> NativePixmapEGLBinding::Create(
diff --git a/ui/ozone/common/native_pixmap_egl_binding.h b/ui/ozone/common/native_pixmap_egl_binding.h
index 61fb0de77f..ad3ac9ced5 100644
--- a/ui/ozone/common/native_pixmap_egl_binding.h
+++ b/ui/ozone/common/native_pixmap_egl_binding.h
@@ -27,6 +27,17 @@ class NativePixmapEGLBinding : public NativePixmapGLBinding {
static bool IsSharedImageFormatSupported(viz::SharedImageFormat format);
+ // chromium-fourier: returns true when |pixmap|'s DRM format modifier
+ // is advertised by the EGL driver as `external_only` for the given
+ // SharedImage format. Used at SharedImage creation time to override
+ // the default GL_TEXTURE_2D target to GL_TEXTURE_EXTERNAL_OES so that
+ // mesa panfrost / panthor NV12 dmabufs (always external_only) import
+ // cleanly via glEGLImageTargetTexture2DOES + samplerExternalOES.
+ // Result is cached per (fourcc, modifier) tuple — the underlying
+ // eglQueryDmaBufModifiersEXT call is not on the per-frame hot path.
+ static bool ModifierRequiresExternalOES(const gfx::NativePixmap* pixmap,
+ viz::SharedImageFormat format);
+
// Create an EGLImage from a given NativePixmap and plane and bind
// |texture_id| to |target| followed by binding the image to |target|. The
// color space is for the external sampler: When we sample the YUV buffer as
@@ -0,0 +1,57 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: ozone/wayland: re-allow direct EGL/GLES2 path (no ANGLE shim)
Date: 2026-04-26
Background
----------
On Wayland-only ozone builds the surface factory currently advertises only
ANGLE-mediated GL implementations (`kOpenGL`, `kOpenGLES`, `kSwiftShader`,
`kVulkan`). Anything driving `--use-gl=egl` (the historical
`kGLImplementationEGLGLES2`) is rejected at startup with
Requested GL implementation (gl=egl-gles2,angle=none) not found in
allowed implementations: [(gl=egl-angle,angle=opengl|opengles|...)]
The downstream switch already handles `kGLImplementationEGLGLES2` in
`GetGLOzone`, so the dispatcher is wired -- it's the *advertisement* that
got tightened.
The cost of that tightening on RK3566 hantro / panfrost is real: with
ANGLE in the path, chrome's display-side EGL is ANGLE's own EGL. ANGLE's
GLES backend on Linux does not propagate
`EGL_EXT_image_dma_buf_import` through to the chrome GL display, so
`gpu_feature_info.supports_nv12_gl_native_pixmap` ends up false. That in
turn forces the V4L2-decoded NV12 frames through the
NV12-to-AR24 VPP conversion path before they hit the compositor, costing
~85 % CPU at 1080p30 even though the hantro VPU is doing the actual
decode for free.
Allowing the direct EGL/GLES2 path back means chrome's EGL is panfrost's
EGL (via mesa), which exposes the dmabuf-import extensions natively, and
the zero-copy NV12 native pixmap path lights up.
Fix
---
Add `kGLImplementationEGLGLES2` to the head of the allowed list; ANGLE
remains the default fallback and is still selected when the user passes
`--use-gl=angle ...`. The position is deliberate: on a USE_V4L2_CODEC
hardware-decode build the user almost always wants the dmabuf-capable
direct path; ANGLE is still there for browsers that need its conformance
fixups.
This does not affect non-Wayland ozone backends.
diff --git a/ui/ozone/platform/wayland/gpu/wayland_surface_factory.cc b/ui/ozone/platform/wayland/gpu/wayland_surface_factory.cc
--- a/ui/ozone/platform/wayland/gpu/wayland_surface_factory.cc
+++ b/ui/ozone/platform/wayland/gpu/wayland_surface_factory.cc
@@ -223,6 +223,10 @@ std::vector<gl::GLImplementationParts>
WaylandSurfaceFactory::GetAllowedGLImplementations() {
std::vector<gl::GLImplementationParts> impls;
if (egl_implementation_) {
+ // chromium-fourier: keep the direct EGL/GLES2 path available so
+ // panfrost's EGL_EXT_image_dma_buf_import surfaces to chrome's GL
+ // display layer. See patch header for rationale.
+ impls.emplace_back(gl::GLImplementationParts(gl::kGLImplementationEGLGLES2));
impls.emplace_back(gl::ANGLEImplementation::kOpenGL);
impls.emplace_back(gl::ANGLEImplementation::kOpenGLES);
impls.emplace_back(gl::ANGLEImplementation::kSwiftShader);
+7 -10
@@ -3,22 +3,19 @@
# Source of truth: git.reauktion.de/marfrit/claude-his-agent
pkgname=claude-his-agent
-pkgver=0.1.8
+pkgver=0.2.0
pkgrel=1
-pkgdesc="Home Infrastructure Specialist subagent + skill for Claude Code (mfritsche home infra)"
+pkgdesc="Claude Code framework for a private Home Infrastructure Specialist subagent — runbook fetched at install time, not bundled"
arch=('any')
url="https://git.reauktion.de/marfrit/claude-his-agent"
license=('custom')
-depends=('bash')
+depends=('bash' 'rsync' 'openssh')
source=("${pkgname}-${pkgver}.tar.gz::https://git.reauktion.de/marfrit/claude-his-agent/archive/v${pkgver}.tar.gz")
-sha256sums=('d25dca1346085acccc32abd9e8064a712c80c99c79102c2acacf27ed4c238f65')
+sha256sums=('c39dd1a956d303ac2417498dde05ac923bf686f1fc978f78f0d63ca42432b8b8')
package() {
cd "${pkgname}"
-install -Dm644 agents/his.md "${pkgdir}/usr/share/claude-agents/his.md"
-install -Dm644 skills/his/SKILL.md "${pkgdir}/usr/share/claude-skills/his/SKILL.md"
-install -Dm755 scripts/claude-his-install "${pkgdir}/usr/bin/claude-his-install"
-install -Dm755 scripts/repo-inventory.sh "${pkgdir}/usr/bin/repo-inventory.sh"
-install -Dm755 scripts/repo-inventory-nosudo.sh "${pkgdir}/usr/bin/repo-inventory-nosudo.sh"
-install -Dm644 README.md "${pkgdir}/usr/share/doc/${pkgname}/README.md"
+install -Dm755 bin/claude-his-fetch "${pkgdir}/usr/bin/claude-his-fetch"
+install -Dm755 bin/claude-his-install "${pkgdir}/usr/bin/claude-his-install"
+install -Dm644 README.md "${pkgdir}/usr/share/doc/${pkgname}/README.md"
}
+71
@@ -0,0 +1,71 @@
# Maintainer: Markus Fritsche <fritsche.markus@gmail.com>
#
# daedalus-v4l2-dkms — DKMS package for the daedalus_v4l2 out-of-tree
# kernel module (V4L2 stateless decoder shim for Pi 5 / CM5).
#
# Pair to daedalus-v4l2 (userspace daemon). When loaded, the module
# registers /dev/videoNN (V4L2 m2m) + /dev/mediaNN (media controller) +
# /dev/daedalus-v4l2 (chardev to the userspace daemon). Userspace
# clients drive the V4L2 m2m + request API path; the daemon does the
# actual FFmpeg-backed decode on /dev/daedalus-v4l2.
#
# Project: https://git.reauktion.de/reauktion/daedalus-v4l2
# Sibling userspace package: daedalus-v4l2
# Sibling consumer: libva-v4l2-request-fourier
pkgname=daedalus-v4l2-dkms
_module=daedalus_v4l2
# Same pin as arch/daedalus-v4l2 — keep kernel module + daemon
# bit-versioned together so the chardev wire protocol stays in sync.
_commit=f55b2cdab8a8c0bc04e8c1bb1d0b6ca85e7d96d2
pkgver=0.1.0.r16.f55b2cd
pkgrel=1
pkgdesc="V4L2 stateless decoder shim kernel module (DKMS) — Pi 5 / CM5"
arch=('any')
url="https://git.reauktion.de/reauktion/daedalus-v4l2"
license=('GPL-2.0-or-later')
depends=('dkms')
makedepends=('git')
install="${pkgname}.install"
source=("git+https://git.reauktion.de/reauktion/daedalus-v4l2.git#commit=${_commit}"
"dkms.conf"
"${pkgname}.install")
sha256sums=('SKIP'
'SKIP'
'SKIP')
pkgver() {
cd "${srcdir}/daedalus-v4l2"
printf '0.1.0.r%s.%s' \
"$(git rev-list --count HEAD)" \
"$(git rev-parse --short=7 HEAD)"
}
package() {
local _src="${pkgdir}/usr/src/${_module}-${pkgver}"
install -dm755 "${_src}"
# Install the kernel/ subdir and embed the shared proto header in
# the same tree. The in-tree Makefile uses
# `ccflags-y += -I$(src)/../include` (assuming the parent
# daedalus-v4l2 layout); for DKMS we flatten by copying the header
# into kernel/include/ and patching the Makefile to point there.
cp -r "${srcdir}/daedalus-v4l2/kernel/." "${_src}/"
install -Dm644 "${srcdir}/daedalus-v4l2/include/daedalus_v4l2_proto.h" \
"${_src}/include/daedalus_v4l2_proto.h"
sed -i 's|-I\$(src)/\.\./include|-I$(src)/include|' "${_src}/Makefile"
# dkms.conf at the root of the source tree (DKMS convention).
# Substitute #MODULE_VERSION# placeholder with the actual pkgver
# so dkms install/uninstall match what's on disk.
install -Dm644 "${srcdir}/dkms.conf" "${_src}/dkms.conf"
sed -i "s/#MODULE_VERSION#/${pkgver}/" "${_src}/dkms.conf"
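# License: ship the main source file, whose SPDX header serves as the
# license declaration for package-manager-driven license queries.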
install -Dm644 "${srcdir}/daedalus-v4l2/kernel/daedalus_v4l2_main.c" \
"${pkgdir}/usr/share/licenses/${pkgname}/SPDX-HEADER"
}
@@ -0,0 +1,61 @@
# post-install / post-upgrade hook for daedalus-v4l2-dkms.
#
# pacman + the dkms-helpers alpm hook will already attempt
# `dkms install` on its own. This script's job is to emit a
# loud, actionable warning when the module didn't actually
# build for the running kernel — most commonly because the
# kernel headers package isn't installed yet.
#
# Without this you get a silent failure: the package looks
# installed but `modprobe daedalus_v4l2` returns ENOENT.
_check_dkms_built() {
local name=daedalus_v4l2
local ver=$1
local kernelver=$(uname -r)
if ! command -v dkms >/dev/null 2>&1; then
return 1 # the hard-dep should have caught this
fi
local status
status=$(dkms status -m "$name" -v "$ver" -k "$kernelver" 2>/dev/null || true)
if printf '%s\n' "$status" | grep -q -E 'installed|loaded'; then
return 0 # all good
fi
cat >&2 <<EOF
==> daedalus-v4l2-dkms: DKMS build did NOT land for kernel $kernelver.
==> dkms status -m $name -v $ver -k $kernelver:
==> $(printf '%s' "$status" | head -1)
==>
==> Most likely cause: kernel headers package is missing.
==> Arch / ALARM: pacman -S linux-rpi-headers (or linux-rpi5-headers)
==> Raspberry Pi OS: apt install linux-headers-rpi-2712
==>
==> After installing headers, finish the install with:
==> sudo dkms install $name/$ver -k $kernelver
==> sudo modprobe daedalus_v4l2
==>
==> Until then daedalus_v4l2 will NOT be loadable and the
==> userspace daedalus-v4l2 daemon will have nothing to talk to.
EOF
return 1
}
post_install() {
_check_dkms_built "$1" || true
}
post_upgrade() {
# New version pinned by the bump may have built fine, but if
# a kernel-headers package was uninstalled / pruned since the
# last upgrade we'd silently regress. Re-check.
_check_dkms_built "$1" || true
}
pre_remove() {
# The dkms alpm hook handles dkms remove on its own; nothing
# we need to add here.
:
}
+19
@@ -0,0 +1,19 @@
# DKMS configuration for daedalus_v4l2 — V4L2 stateless decoder shim.
#
# Built against /lib/modules/$kernelver/build with the in-tree Makefile.
# The Makefile uses `obj-m := daedalus_v4l2.o` and links
# daedalus_v4l2_main.o + daedalus_v4l2_chardev.o into the final .ko.
PACKAGE_NAME="daedalus_v4l2"
PACKAGE_VERSION="#MODULE_VERSION#"
# Single module produced by the Makefile.
BUILT_MODULE_NAME[0]="daedalus_v4l2"
DEST_MODULE_LOCATION[0]="/updates"
# Use the package's own Makefile — it already does
# `$(MAKE) -C $(KERNELDIR) M=$(PWD) modules`.
MAKE[0]="make KERNELDIR=/lib/modules/${kernelver}/build all"
CLEAN="make KERNELDIR=/lib/modules/${kernelver}/build clean"
AUTOINSTALL="yes"
+105
@@ -0,0 +1,105 @@
# Maintainer: Markus Fritsche <fritsche.markus@gmail.com>
#
# daedalus-v4l2 — userspace daemon + V4L2 m2m test tools.
#
# Pair to daedalus-v4l2-dkms (kernel module). Together they expose
# /dev/videoNN + /dev/mediaNN as a V4L2 stateless decoder shim on Pi 5 /
# CM5, decoding VP9 / AV1 / H.264 via dlopen'd FFmpeg in a single-
# threaded daemon and shipping decoded NV12 / P010 back through dmabuf.
# Consumed end-to-end by libva-v4l2-request-fourier (>= 1.0.0.r376) so
# `ffmpeg -hwaccel vaapi` against vp9_small.ivf produces byte-exact NV12.
#
# Project: https://git.reauktion.de/reauktion/daedalus-v4l2
# Sibling kernel package: daedalus-v4l2-dkms
# Sibling consumer: libva-v4l2-request-fourier
pkgname=daedalus-v4l2
_upstreampkg=daedalus-v4l2
# Pin the daedalus-v4l2 tip. f55b2cd = "Phase 8.13: byte-exact end-to-
# end via libva (consumer target hit)" — first commit where the full
# ffmpeg -hwaccel vaapi → libva → /dev/video0 → daemon path lands a
# pixel-correct decoded frame back in ffmpeg. Promote to a later pin
# only after a future phase closes cleanly.
_commit=f55b2cdab8a8c0bc04e8c1bb1d0b6ca85e7d96d2
# 0.1.0 (pre-1.0) + commit count + short sha. Bump the .Y on each
# Phase 8.x close. pkgver() recomputes at build time.
pkgver=0.1.0.r16.f55b2cd
pkgrel=1
pkgdesc="Userspace daemon for the daedalus-v4l2 V4L2 stateless decoder shim (VP9/AV1/H.264 on Pi 5 / CM5)"
arch=('aarch64')
url="https://git.reauktion.de/reauktion/daedalus-v4l2"
license=('BSD-2-Clause' 'GPL-2.0-or-later')
# Daemon dlopens libavformat.so.61 / libavcodec.so.61 / libavutil.so.59
# at runtime (Option γ — see daemon/src/ffmpeg_loader.h). ffmpeg
# provides those; we don't link them.
depends=('ffmpeg' 'libdrm')
# The libav* headers (shipped in Arch's ffmpeg package) are needed at
# compile time for type-safe function pointer signatures; pkg-config
# locates them.
makedepends=('cmake' 'ninja' 'pkgconf' 'git' 'ffmpeg')
optdepends=('daedalus-v4l2-dkms: kernel module providing /dev/video0 + /dev/daedalus-v4l2'
'libva-v4l2-request-fourier: VA-API consumer routing through this daemon')
source=("git+https://git.reauktion.de/reauktion/daedalus-v4l2.git#commit=${_commit}")
sha256sums=('SKIP')
pkgver() {
cd "${srcdir}/${_upstreampkg}"
printf '0.1.0.r%s.%s' \
"$(git rev-list --count HEAD)" \
"$(git rev-parse --short=7 HEAD)"
}
build() {
cd "${srcdir}/${_upstreampkg}/daemon"
cmake -B build -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr
cmake --build build
cd "${srcdir}/${_upstreampkg}/tools"
make
}
package() {
cd "${srcdir}/${_upstreampkg}"
# Daemon binary
install -Dm755 daemon/build/daedalus_v4l2_daemon \
"${pkgdir}/usr/bin/daedalus_v4l2_daemon"
# Test tools (under /usr/libexec to keep them out of the default PATH
# — they're for verification, not daily use).
install -Dm755 tools/test_chardev_pingpong \
"${pkgdir}/usr/libexec/daedalus-v4l2/test_chardev_pingpong"
install -Dm755 tools/test_m2m_decode \
"${pkgdir}/usr/libexec/daedalus-v4l2/test_m2m_decode"
install -Dm755 tools/test_m2m_stream \
"${pkgdir}/usr/libexec/daedalus-v4l2/test_m2m_stream"
# Shared wire-protocol header (kernel ↔ daemon); useful for
# third-party clients of the chardev.
install -Dm644 include/daedalus_v4l2_proto.h \
"${pkgdir}/usr/include/daedalus_v4l2_proto.h"
# Documentation
install -Dm644 README.md \
"${pkgdir}/usr/share/doc/${pkgname}/README.md"
for d in docs/*.md; do
install -Dm644 "$d" "${pkgdir}/usr/share/doc/${pkgname}/$(basename "$d")"
done
# Licenses: BSD-2-Clause for daemon/tools, GPL for the kernel proto
# header; the SPDX headers in src/ are the canonical declaration but
# ship a short note here for package-manager-driven license queries.
install -dm755 "${pkgdir}/usr/share/licenses/${pkgname}"
cat > "${pkgdir}/usr/share/licenses/${pkgname}/LICENSE" <<'EOF'
daedalus-v4l2 userspace components are BSD-2-Clause licensed.
The shared kernel↔daemon wire protocol header
(/usr/include/daedalus_v4l2_proto.h) is GPL-2.0-or-later WITH
Linux-syscall-note for kernel-side compatibility. See SPDX
headers on individual source files for the canonical
per-file declaration.
EOF
}
+16 -1
@@ -10,7 +10,7 @@
pkgname=distcc-avahi
_pkgname=distcc
pkgver=3.4
-pkgrel=17
+pkgrel=20
pkgdesc="Distributed compilation service for C, C++ and Objective-C (with Avahi/Zeroconf support)"
arch=('x86_64' 'aarch64')
url="https://github.com/distcc/distcc"
@@ -22,12 +22,14 @@ provides=("distcc=${pkgver}")
conflicts=('distcc')
replaces=('distcc')
backup=('etc/conf.d/distccd')
+install=${pkgname}.install
source=(
"${_pkgname}-${pkgver}.tar.gz::https://github.com/distcc/distcc/archive/refs/tags/v${pkgver}.tar.gz"
"distccd.conf"
"distccd.service"
"distcc.tmpfiles"
"fix-gcc-rewrite-fqn-overflow.patch"
+"${pkgname}.install"
)
sha256sums=(
'37a34c9555498a1168fea026b292ab07e7bb394715d87d8403e0c33b16d2d008'
@@ -35,6 +37,7 @@ sha256sums=(
'a4f1d1bb21d61d41f22e918b448cfb852a6d95b0d3b922bd82805090cb2ce41a'
'd8aee2eb895c02a39e0f2b76fd4a5c9dce91405f1c443286ca324628eadbf3f1'
'7ff56af2ea505bfbf65ceeb0c8f752295f73ffb1173c26a6e978840fad04f651'
+'17141d91a5c80235287d16390a588395ab8fa5fbd129679966721625e604d8f4'
)
prepare() {
@@ -76,4 +79,16 @@ package() {
install -Dm644 ../distccd.conf "${pkgdir}/etc/conf.d/distccd"
install -Dm644 ../distccd.service "${pkgdir}/usr/lib/systemd/system/distccd.service"
install -Dm644 ../distcc.tmpfiles "${pkgdir}/usr/lib/tmpfiles.d/distcc.conf"
+# Compiler symlinks — mirror Arch upstream layout so makepkg's
+# buildenv_distcc() picks /usr/lib/distcc/bin up. Without the bin/
+# subdir the BUILDENV=(distcc ...) hook is silently a no-op.
+local _targets=(c++ c89 c99 cc clang clang++ cpp g++ gcc
+"${CHOST}-g++" "${CHOST}-gcc"
+"${CHOST}-gcc-$(gcc -dumpversion)")
+install -d "${pkgdir}/usr/lib/${_pkgname}/bin"
+for _bin in "${_targets[@]}"; do
+ln -sf "../../bin/${_pkgname}" "${pkgdir}/usr/lib/${_pkgname}/${_bin}"
+ln -sf "../../../bin/${_pkgname}" "${pkgdir}/usr/lib/${_pkgname}/bin/${_bin}"
+done
}
+41
@@ -0,0 +1,41 @@
_fix_conf() {
local conf=/etc/conf.d/distccd
if [ -f "$conf" ] && grep -q '^DISTCC_ARGS=' "$conf"; then
cp -a "$conf" "${conf}.pre-distcc-avahi-fix.$(date +%Y%m%d-%H%M%S)"
sed -i 's/^DISTCC_ARGS=/DISTCC_OPTS=/' "$conf"
echo "==> distcc-avahi: renamed DISTCC_ARGS -> DISTCC_OPTS in $conf"
echo " (the systemd unit reads \$DISTCC_OPTS; backup left as ${conf}.pre-distcc-avahi-fix.*)"
fi
}
_clean_legacy_symlinks() {
# Versions <= 3.4-18 shipped no symlinks in the package and instead ran
# /usr/sbin/update-distcc-symlinks at .install time, leaving untracked
# files at /usr/lib/distcc/<compiler>. Remove those so pacman can drop
# the now-tracked symlinks in their place without a file conflict.
local d=/usr/lib/distcc
[ -d "$d" ] || return 0
local f
for f in "$d"/*; do
[ -L "$f" ] || continue
case "$(readlink "$f")" in
../../bin/distcc) rm -f "$f" ;;
esac
done
}
post_install() {
_fix_conf
}
pre_upgrade() {
_clean_legacy_symlinks
}
post_upgrade() {
_fix_conf
if systemctl is-active --quiet distccd 2>/dev/null; then
echo "==> distcc-avahi: distccd.service is active; restart with"
echo " 'sudo systemctl restart distccd' to pick up any conf change"
fi
}
@@ -0,0 +1,166 @@
--- a/libavutil/hwcontext_v4l2request.c
+++ b/libavutil/hwcontext_v4l2request.c
@@ -19,12 +19,13 @@
#include "config.h"
#include <fcntl.h>
#include <linux/dma-buf.h>
#include <linux/media.h>
#include <sys/ioctl.h>
+#include <sys/stat.h>
#include <sys/mman.h>
#include <unistd.h>
#include <drm_fourcc.h>
#include <libudev.h>
@@ -690,12 +691,125 @@
}
udev_enumerate_unref(enumerate);
return ret;
}
+/*
+ * Brute-force fallback used when libudev's scan fails (e.g. inside firefox's
+ * RDD sandbox where Mozilla's broker rejects fd-relative openat used by
+ * systemd's chase() symlink resolver). Iterates /dev/video[0..63], picks the
+ * one whose major/minor matches the requested devnum.
+ */
+static char *v4l2request_devnum_to_video_path_brute(dev_t devnum)
+{
+ char path[32];
+ struct stat st;
+ for (int i = 0; i < 64; i++) {
+ snprintf(path, sizeof(path), "/dev/video%d", i);
+ if (stat(path, &st) < 0)
+ continue;
+ if (st.st_rdev == devnum)
+ return av_strdup(path);
+ }
+ return NULL;
+}
+
+/* Brute-force version of v4l2request_probe_video_devices: replaces the
+ * udev_device_new_from_devnum + udev_device_get_devnode flow with
+ * stat()-based major/minor matching against /dev/video[0..63]. */
+static int v4l2request_probe_video_devices_brute(AVHWFramesContext *hwfc,
+ uint32_t pixelformat,
+ uint32_t buffersize)
+{
+ AVV4L2RequestFramesContext *fctx = hwfc->hwctx;
+ AVV4L2RequestFramesContextInternal *fctxi = fctx->internal;
+ struct media_device_info device_info;
+ struct media_v2_topology topology = {0};
+ struct media_v2_interface *interfaces;
+ char *path;
+ dev_t devnum;
+ int ret;
+
+ if (ioctl(fctxi->media_fd, MEDIA_IOC_DEVICE_INFO, &device_info) < 0)
+ return AVERROR(errno);
+
+ if (ioctl(fctxi->media_fd, MEDIA_IOC_G_TOPOLOGY, &topology) < 0)
+ return AVERROR(errno);
+
+ if (!topology.num_interfaces)
+ return AVERROR(ENOENT);
+
+ interfaces = av_calloc(topology.num_interfaces,
+ sizeof(struct media_v2_interface));
+ if (!interfaces)
+ return AVERROR(ENOMEM);
+
+ topology.ptr_interfaces = (__u64)(uintptr_t)interfaces;
+ if (ioctl(fctxi->media_fd, MEDIA_IOC_G_TOPOLOGY, &topology) < 0) {
+ ret = AVERROR(errno);
+ goto fail;
+ }
+
+ ret = AVERROR(ENOENT);
+ for (unsigned i = 0; i < topology.num_interfaces; i++) {
+ if (interfaces[i].intf_type != MEDIA_INTF_T_V4L_VIDEO)
+ continue;
+
+ devnum = makedev(interfaces[i].devnode.major,
+ interfaces[i].devnode.minor);
+ path = v4l2request_devnum_to_video_path_brute(devnum);
+ if (!path)
+ continue;
+
+ ret = v4l2request_probe_video_device(hwfc, path, pixelformat, buffersize);
+ if (!ret) {
+ av_log(hwfc, AV_LOG_INFO,
+ "Using V4L2 media driver %s (brute-force) for %s\n",
+ device_info.driver, av_fourcc2str(pixelformat));
+ av_free(path);
+ break;
+ }
+ av_free(path);
+ }
+
+fail:
+ av_free(interfaces);
+ return ret;
+}
+
+/* Brute-force fallback for v4l2request_probe_media_devices(). Iterates
+ * /dev/media[0..15], opens each, probes via topology+stat. */
+static int v4l2request_probe_media_devices_brute(AVHWFramesContext *hwfc,
+ uint32_t pixelformat,
+ uint32_t buffersize)
+{
+ AVV4L2RequestFramesContext *fctx = hwfc->hwctx;
+ AVV4L2RequestFramesContextInternal *fctxi = fctx->internal;
+ char path[32];
+ int ret = AVERROR(ENOENT);
+
+ for (int i = 0; i < 16; i++) {
+ snprintf(path, sizeof(path), "/dev/media%d", i);
+
+ fctxi->media_fd = open(path, O_RDWR);
+ if (fctxi->media_fd < 0)
+ continue;
+
+ ret = v4l2request_probe_video_devices_brute(hwfc, pixelformat,
+ buffersize);
+ if (!ret)
+ return 0;
+
+ close(fctxi->media_fd);
+ fctxi->media_fd = -1;
+ }
+
+ return ret;
+}
+
static int v4l2request_open_decoder(AVHWFramesContext *hwfc)
{
AVV4L2RequestFramesContext *fctx = hwfc->hwctx;
uint32_t buffersize;
struct udev *udev;
int ret;
@@ -712,12 +826,23 @@
buffersize = FFMAX(hwfc->width * hwfc->height * 3 / 2, 256 * 1024);
// Probe all media devices (auto-detection)
ret = v4l2request_probe_media_devices(hwfc, udev, fctx->pixelformat, buffersize);
+ // Brute-force fallback when libudev fails. Firefox-fourier hits this
+ // because Mozilla's RDD sandbox blocks fd-relative openat used by
+ // systemd's chase() symlink resolver inside udev_enumerate_scan_devices.
+ if (ret < 0) {
+ av_log(hwfc, AV_LOG_INFO,
+ "libudev probe failed (%d), falling back to brute-force /dev/media*\n",
+ ret);
+ ret = v4l2request_probe_media_devices_brute(hwfc, fctx->pixelformat,
+ buffersize);
+ }
+
udev_unref(udev);
return ret;
}
static AVBufferRef *v4l2request_v4l2_buffer_alloc(AVHWFramesContext *hwfc,
struct v4l2_format *format)
@@ -0,0 +1,178 @@
From 0cd6e669735e453ec8772f111065bbb2f70a5bc6 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Mon, 18 May 2026 07:27:10 +0000
Subject: [PATCH] avutil/hwcontext_v4l2request: unpack NV15 to P010 in
transfer_data_from
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
V4L2_PIX_FMT_NV15 (RK3399/RK3588 rkvdec 10-bit 4:2:0 capture) is mapped to
sw_format = AV_PIX_FMT_YUV420P10 in v4l2request_capture_pixelformats[]. The
existing transfer_get_formats explicitly blanked the format list for that
sw_format, so 'ffmpeg -hwaccel v4l2request -vf hwdownload,format=p010le' on
a Hi10P / Main10 input failed at filter init with EINVAL before reaching
the actual decode (which itself succeeds — 2 frames decoded cleanly).
Expose AV_PIX_FMT_P010 as the transfer target for NV15-backed surfaces and
unpack the packed 10-bit samples into the standard high-bits-of-16 layout
inside transfer_data_from. Luma and chroma share the same packing format
(5 bytes per 4 samples, little endian); chroma plane is W × H/2 samples
for 4:2:0.
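To make the packing concrete, here is a small worked sketch of one
4-sample group. The helpers `pack4_10bit` and `unpack4_to_p010` are
illustrative names that do not exist in the patch, and the sketch uses a
64-bit accumulator instead of the per-byte masks in
transfer_data_from; it assumes the same layout (5 bytes per 4 samples,
little endian, sample 0 in the lowest bits).

```c
#include <stdint.h>

/* Pack four 10-bit samples into five bytes, sample 0 in the low bits. */
static void pack4_10bit(const uint16_t in[4], uint8_t out[5])
{
    uint64_t bits = 0;

    for (int i = 0; i < 4; i++)
        bits |= (uint64_t)(in[i] & 0x3FF) << (10 * i);
    for (int i = 0; i < 5; i++)
        out[i] = (uint8_t)(bits >> (8 * i));
}

/* Unpack five bytes into four P010-style words: each 10-bit sample is
 * shifted into the high bits of a 16-bit container (value << 6). */
static void unpack4_to_p010(const uint8_t in[5], uint16_t out[4])
{
    uint64_t bits = 0;

    for (int i = 0; i < 5; i++)
        bits |= (uint64_t)in[i] << (8 * i);
    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)(((bits >> (10 * i)) & 0x3FF) << 6);
}
```

For the samples 0x3FF, 0x000, 0x155, 0x2AA the packed bytes are
FF 03 50 95 AA, and unpacking yields 0xFFC0, 0x0000, 0x5540, 0xAA80.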
The other 'needs custom unpack' sw_formats (YUV420P / Allwinner NV12_32L32
tiled and YUV422P10 / rkvdec NV20) keep the original ENOSYS path because
they need different unpack code that isn't covered by this patch.
Closes marfrit/marfrit-packages#21.
---
libavutil/hwcontext_v4l2request.c | 111 +++++++++++++++++++++++++++++-
1 file changed, 110 insertions(+), 1 deletion(-)
diff --git a/libavutil/hwcontext_v4l2request.c b/libavutil/hwcontext_v4l2request.c
index b6633d9081..3842160dfb 100644
--- a/libavutil/hwcontext_v4l2request.c
+++ b/libavutil/hwcontext_v4l2request.c
@@ -1073,6 +1073,56 @@ fail:
return ret;
}
+/*
+ * Unpack one NV15-packed 10-bit plane (5 bytes per 4 samples, little endian)
+ * into a P010-style plane (10 bits in the high bits of a 16-bit container).
+ * `dst_stride` is in bytes; `src_stride` is bytes per row of NV15 data.
+ */
+static void v4l2request_nv15_unpack_plane_to_p010(const uint8_t *src,
+ uint16_t *dst,
+ unsigned width,
+ unsigned height,
+ unsigned src_stride,
+ unsigned dst_stride)
+{
+ for (unsigned y = 0; y < height; y++) {
+ const uint8_t *s = src + y * src_stride;
+ uint16_t *d = (uint16_t *)((uint8_t *)dst + y * dst_stride);
+ unsigned x;
+
+ for (x = 0; x + 4 <= width; x += 4) {
+ uint16_t a = (uint16_t)s[0] | ((uint16_t)(s[1] & 0x03) << 8);
+ uint16_t b = ((uint16_t)s[1] >> 2) | ((uint16_t)(s[2] & 0x0F) << 6);
+ uint16_t c = ((uint16_t)s[2] >> 4) | ((uint16_t)(s[3] & 0x3F) << 4);
+ uint16_t e = ((uint16_t)s[3] >> 6) | ((uint16_t)s[4] << 2);
+
+ d[0] = (uint16_t)(a << 6);
+ d[1] = (uint16_t)(b << 6);
+ d[2] = (uint16_t)(c << 6);
+ d[3] = (uint16_t)(e << 6);
+
+ d += 4;
+ s += 5;
+ }
+
+ if (x < width) {
+ unsigned rem = width - x;
+ uint16_t pix[4] = { 0, 0, 0, 0 };
+
+ pix[0] = (uint16_t)s[0] | ((uint16_t)(s[1] & 0x03) << 8);
+ if (rem >= 2)
+ pix[1] = ((uint16_t)s[1] >> 2) | ((uint16_t)(s[2] & 0x0F) << 6);
+ if (rem >= 3)
+ pix[2] = ((uint16_t)s[2] >> 4) | ((uint16_t)(s[3] & 0x3F) << 4);
+ if (rem >= 4)
+ pix[3] = ((uint16_t)s[3] >> 6) | ((uint16_t)s[4] << 2);
+
+ for (unsigned j = 0; j < rem; j++)
+ d[j] = (uint16_t)(pix[j] << 6);
+ }
+ }
+}
+
static int v4l2request_transfer_get_formats(AVHWFramesContext *hwfc,
enum AVHWFrameTransferDirection dir,
enum AVPixelFormat **formats)
@@ -1082,6 +1132,22 @@ static int v4l2request_transfer_get_formats(AVHWFramesContext *hwfc,
if (dir == AV_HWFRAME_TRANSFER_DIRECTION_TO)
return AVERROR(ENOSYS);
+ /*
+ * NV15-backed surfaces (sw_format = YUV420P10) are exposed as P010 to
+ * downstream filters: the unpack below converts the packed 10-bit
+ * samples into the standard high-bits-of-16 layout. Hi10P / Main10
+ * VAAPI/v4l2-request decode reaches userspace through this path.
+ */
+ if (hwfc->sw_format == AV_PIX_FMT_YUV420P10) {
+ fmts = av_malloc_array(2, sizeof(*fmts));
+ if (!fmts)
+ return AVERROR(ENOMEM);
+ fmts[0] = AV_PIX_FMT_P010;
+ fmts[1] = AV_PIX_FMT_NONE;
+ *formats = fmts;
+ return 0;
+ }
+
fmts = av_malloc_array(2, sizeof(*fmts));
if (!fmts)
return AVERROR(ENOMEM);
@@ -1089,8 +1155,13 @@ static int v4l2request_transfer_get_formats(AVHWFramesContext *hwfc,
fmts[0] = hwfc->sw_format;
fmts[1] = AV_PIX_FMT_NONE;
+ /*
+ * Tiled-NV12-32L32 (Allwinner) and NV20 (rkvdec 4:2:2 10-bit) still need
+ * dedicated unpacks before hwdownload can consume them; leave them as
+ * "no transfer formats" so the filter graph reports the limitation
+ * rather than silently producing garbage.
+ */
if (hwfc->sw_format == AV_PIX_FMT_YUV420P ||
- hwfc->sw_format == AV_PIX_FMT_YUV420P10 ||
hwfc->sw_format == AV_PIX_FMT_YUV422P10)
fmts[0] = AV_PIX_FMT_NONE;
@@ -1110,6 +1181,44 @@ static int v4l2request_transfer_data_from(AVHWFramesContext *hwfc,
map = av_frame_alloc();
if (!map)
return AVERROR(ENOMEM);
+
+ /*
+ * For NV15→P010, map the raw NV15 bytes (sw_format) and unpack into
+ * dst's P010 storage. Otherwise fall through to the original byte-copy
+ * path used for 1:1 sw_format matches (NV12, NV16, AFBC handled by DRM).
+ */
+ if (hwfc->sw_format == AV_PIX_FMT_YUV420P10) {
+ /*
+ * Only P010 is advertised by transfer_get_formats for this sw_format;
+ * a caller that bypasses get_formats and asks for anything else would
+ * silently corrupt output via av_frame_copy on NV15-packed bytes.
+ * Reject explicitly.
+ */
+ if (dst->format != AV_PIX_FMT_P010) {
+ ret = AVERROR(ENOSYS);
+ goto fail;
+ }
+
+ map->format = hwfc->sw_format;
+ ret = v4l2request_map_frame(hwfc, map, src);
+ if (ret)
+ goto fail;
+
+ v4l2request_nv15_unpack_plane_to_p010(map->data[0],
+ (uint16_t *)dst->data[0],
+ dst->width, dst->height,
+ map->linesize[0],
+ dst->linesize[0]);
+ /* NV15 chroma plane: W samples wide (UV interleaved), ceil(H/2) rows (4:2:0). */
+ v4l2request_nv15_unpack_plane_to_p010(map->data[1],
+ (uint16_t *)dst->data[1],
+ dst->width, AV_CEIL_RSHIFT(dst->height, 1),
+ map->linesize[1],
+ dst->linesize[1]);
+ ret = 0;
+ goto fail;
+ }
+
map->format = dst->format;
ret = v4l2request_map_frame(hwfc, map, src);
--
2.47.3
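For orientation, the bit layout that the NV15 to P010 unpack in the patch above relies on can be sketched for one complete group of four samples. This is a standalone illustration under stated assumptions, not the patch's function: the name `nv15_group_to_p010` is hypothetical, and it covers only a full 5-byte group, whereas the patch also handles a short tail at the end of a row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack one full NV15 group: 4 samples of 10 bits each, stored
 * little-endian in 5 consecutive bytes. Each sample is then shifted
 * left by 6 so its 10 significant bits sit at the top of a 16-bit
 * word, which is the P010 convention.
 * (Hypothetical helper name; illustration only.)
 */
static void nv15_group_to_p010(const uint8_t s[5], uint16_t d[4])
{
    uint16_t pix[4];

    assert(s != NULL && d != NULL);

    pix[0] =  (uint16_t)s[0]       | ((uint16_t)(s[1] & 0x03) << 8);
    pix[1] = ((uint16_t)s[1] >> 2) | ((uint16_t)(s[2] & 0x0F) << 6);
    pix[2] = ((uint16_t)s[2] >> 4) | ((uint16_t)(s[3] & 0x3F) << 4);
    pix[3] = ((uint16_t)s[3] >> 6) | ((uint16_t)s[4] << 2);

    for (size_t j = 0; j < 4; j++)
        d[j] = (uint16_t)(pix[j] << 6);
}
```

As a worked case, the bytes `FF 03 50 95 AA` hold the 10-bit samples 0x3FF, 0x000, 0x155 and 0x2AA, which come out as 0xFFC0, 0x0000, 0x5540 and 0xAA80 in P010.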
@@ -0,0 +1,160 @@
# Maintainer: Markus Fritsche <mfritsche@reauktion.de>
#
# FFmpeg + V4L2-Request hwaccel (stateless video decode on Rockchip,
# Allwinner, etc) for the Fourier umbrella. Tracks Kwiboo's long-running
# rebase of Jernej Škrabec's v4l2-request patchset onto ffmpeg release
# tags. Pins the branch tip to a known commit for reproducible CI builds;
# bump _commit when upstream picks up a fix we want.
#
# Why this fork instead of AUR ffmpeg-v4l2-request-git:
# - AUR is pinned to 6.1.1 with epoch=2, which is OLDER than Arch's
# stock 2:8.1-3 → installing it downgrades system ffmpeg.
# - AUR pulls X11/AMF/CUDA/FireWire/AviSynth/OpenMPT/Bluray — irrelevant
# for a Wayland + ARM + video-decode fleet.
# - AUR uses #branch=..., no commit pin. CI artifacts are non-reproducible.
#
# Encoders (libx264/libx265/libvpx) and the libdav1d decoder kept per Fourier fleet policy.
# X11, AMF, CUDA, FireWire, AviSynth, OpenMPT, Bluray, OpenMAX, JPEG-XL,
# Theora, XVid, rsvg, soxr, ssh, vidstab, modplug, SDL2, Vulkan, JACK, GSM,
# Speex dropped — not needed on the Fourier fleet. (No SDL2 means no
# `ffplay` binary; mpv covers interactive playback.)
pkgname=ffmpeg-v4l2-request-fourier
_srcname=FFmpeg
_version='8.1'
_commit='b57fbbe50c9b2656fad86a1a7eeabfd2b2a50935' # v4l2-request-n8.1 tip 2026-04-24
pkgver=8.1.r123329.b57fbbe
pkgrel=5
epoch=2
pkgdesc='FFmpeg with V4L2 Request API hwaccel (Rockchip / Allwinner stateless decode)'
arch=('aarch64')
url='https://github.com/Kwiboo/FFmpeg'
license=('GPL-3.0-or-later')
depends=(
alsa-lib
bzip2
fontconfig
fribidi
gmp
gnutls
lame
libass.so
libdav1d.so
libdrm
libfreetype.so
libgl
libpulse
libva.so
libva-drm.so
libvorbis.so
libvorbisenc.so
libvpx.so
libwebp
libx264.so
libx265.so
libxml2
opus
v4l-utils
xz
zlib
)
makedepends=(
git
linux-api-headers
mesa
nasm
)
provides=(
libavcodec.so
libavdevice.so
libavfilter.so
libavformat.so
libavutil.so
libpostproc.so
libswresample.so
libswscale.so
ffmpeg
)
conflicts=(ffmpeg)
replaces=(ffmpeg ffmpeg-v4l2-request-git)
source=("git+https://github.com/Kwiboo/FFmpeg.git#commit=${_commit}"
'0001-libudev-bypass-fallback.patch'
'0002-nv15-to-p010-unpack.patch')
sha256sums=('SKIP' 'SKIP' 'SKIP')
pkgver() {
cd "${_srcname}"
printf '%s.r%s.%s' "${_version}" \
"$(git rev-list --count HEAD)" \
"$(git rev-parse --short=7 HEAD)"
}
prepare() {
cd "${_srcname}"
patch -Np1 -i "${srcdir}/0001-libudev-bypass-fallback.patch"
patch -Np1 -i "${srcdir}/0002-nv15-to-p010-unpack.patch"
}
build() {
cd "${_srcname}"
# FFmpeg's configure resolves the compiler via `which` and bakes the
# absolute path into generated makefiles, bypassing the makepkg
# /usr/lib/distcc/bin shim. Pass it explicitly so `BUILDENV=(distcc ...)`
# actually distributes; otherwise everything compiles locally.
local _ffmpeg_cc=gcc _ffmpeg_cxx=g++
if [[ ":$PATH:" == *":/usr/lib/distcc/bin:"* ]]; then
_ffmpeg_cc='distcc gcc'
_ffmpeg_cxx='distcc g++'
fi
./configure \
--prefix=/usr \
--cc="${_ffmpeg_cc}" \
--cxx="${_ffmpeg_cxx}" \
--disable-debug \
--disable-static \
--disable-doc \
--disable-stripping \
--enable-shared \
--enable-gpl \
--enable-version3 \
--enable-pic \
--enable-neon \
--arch=aarch64 \
\
--enable-libdrm \
--enable-libv4l2 \
--enable-libudev \
--enable-v4l2-request \
--enable-v4l2_m2m \
--enable-vaapi \
--enable-opengl \
\
--enable-gnutls \
--enable-fontconfig \
--enable-libass \
--enable-libfreetype \
--enable-libfribidi \
--enable-libxml2 \
--enable-libpulse \
--enable-libdav1d \
--enable-libopus \
--enable-libvorbis \
--enable-libmp3lame \
--enable-libvpx \
--enable-libx264 \
--enable-libx265 \
--enable-libwebp \
\
--host-cflags='-fPIC'
make
make tools/qt-faststart
}
package() {
cd "${_srcname}"
make DESTDIR="${pkgdir}" install
install -Dm755 tools/qt-faststart "${pkgdir}/usr/bin/qt-faststart"
}
@@ -0,0 +1,75 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH 1/4] widget/gtk: recognize V4L2 stateless fourccs in
GfxInfo prober (S264 / S265 / VP9F)
Date: 2026-04-27
Background
----------
Firefox's V4L2 prober in `widget/gtk/GfxInfo.cpp::V4L2ProbeDevice`
parses `v4l2test`'s `V4L2_OUTPUT_FMTS` line and matches against the
fourccs of stateful V4L2-M2M decoders (`H264`, `VP80`, `VP90`, `HEVC`).
That's correct for Pi4 / Mediatek / vendor-MPP stateful decoders but
silently skips every mainline-Linux Rockchip board: RK3399 `rkvdec`,
RK3566 `hantro` (multiplanar), RK3588 `hantro` and `rkvdec2` all
expose stateless fourccs only — `S264`, `S265`, `VP9F`. The probe
binary itself enumerates these correctly (verified end-to-end on
fresnel / Pinebook Pro / RK3399 with `v4l2test --device /dev/video1`
showing `V4L2_OUTPUT_FMTS: S265 S264 VP9F` and
`V4L2_SUPPORTED: TRUE`); the gap is purely in this string table.
This patch adds the three sibling blocks for the stateless fourccs,
each identical in shape to the existing stateful blocks except for
setting a new `mV4L2IsStateless` member. The follow-up patches in
this series (2/4, 3/4, 4/4) consume that member to route through the
libavcodec v4l2_request hwaccel (`AV_HWDEVICE_TYPE_DRM`) instead of
the v4l2m2m codec wrapper used for stateful boards.
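The classification described above can be sketched in plain C. This is an illustration under assumptions, not Firefox code: `struct v4l2_probe_result`, `has_fourcc` and `classify_output_fmts` are hypothetical names, and the sketch matches whole space-separated tokens, whereas the patch uses a substring `Contains` check on the probe output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; not a Firefox structure. */
struct v4l2_probe_result {
    bool supported;  /* at least one known decode fourcc was seen   */
    bool stateless;  /* at least one stateless (request API) fourcc */
};

/* True if the space-separated list `fmts` contains the exact token `fourcc`. */
static bool has_fourcc(const char *fmts, const char *fourcc)
{
    size_t len = strlen(fourcc);
    const char *p = fmts;

    while ((p = strstr(p, fourcc)) != NULL) {
        bool starts = (p == fmts) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return true;
        p++;
    }
    return false;
}

/* Classify the value of a V4L2_OUTPUT_FMTS line, e.g. "S265 S264 VP9F". */
static struct v4l2_probe_result classify_output_fmts(const char *fmts)
{
    static const char *const stateful[] = { "H264", "VP80", "VP90", "HEVC" };
    static const char *const stateless[] = { "S264", "S265", "VP9F" };
    struct v4l2_probe_result r = { false, false };

    assert(fmts != NULL);

    for (size_t i = 0; i < sizeof(stateful) / sizeof(stateful[0]); i++)
        if (has_fourcc(fmts, stateful[i]))
            r.supported = true;

    for (size_t i = 0; i < sizeof(stateless) / sizeof(stateless[0]); i++)
        if (has_fourcc(fmts, stateless[i])) {
            r.supported = true;
            r.stateless = true;
        }

    return r;
}
```

With the fresnel output quoted above (`S265 S264 VP9F`) the sketch reports both `supported` and `stateless`; a stateful-only list such as `H264 VP80` reports `supported` alone.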
Bug 1969297.
diff --git a/widget/gtk/GfxInfo.h b/widget/gtk/GfxInfo.h
--- a/widget/gtk/GfxInfo.h
+++ b/widget/gtk/GfxInfo.h
@@ -127,6 +127,10 @@
mozilla::Maybe<bool> mIsVAAPISupported;
int mVAAPISupportedCodecs = 0;
mozilla::Maybe<bool> mIsV4L2Supported;
+ // firefox-fourier: true when probe matched at least one stateless
+ // V4L2 fourcc (S264 / S265 / VP9F). Drives libavcodec v4l2_request
+ // hwaccel routing in FFmpegVideoDecoder.cpp.
+ bool mV4L2IsStateless = false;
int mV4L2SupportedCodecs = 0;
static int sGLXTestPipe;
diff --git a/widget/gtk/GfxInfo.cpp b/widget/gtk/GfxInfo.cpp
--- a/widget/gtk/GfxInfo.cpp
+++ b/widget/gtk/GfxInfo.cpp
@@ -852,6 +852,29 @@ void GfxInfo::V4L2ProbeDevice(nsCString& dev) {
media::MCSInfo::AddSupport(media::MediaCodecsSupport::HEVCHardwareDecode);
mV4L2SupportedCodecs |= CODEC_HW_DEC_HEVC;
}
+ // firefox-fourier: V4L2 stateless (request API) fourccs. Mainline
+ // Rockchip rkvdec / hantro / rkvdec2 expose these instead of the
+ // V4L2-M2M-stateful fourccs above. Decoding routes through
+ // libavcodec's v4l2_request hwaccel (AV_HWDEVICE_TYPE_DRM) rather
+ // than the *_v4l2m2m codec wrappers — see FFmpegVideoDecoder.cpp.
+ if (outFormats.Contains("S264")) {
+ mIsV4L2Supported = Some(true);
+ mV4L2IsStateless = true;
+ media::MCSInfo::AddSupport(media::MediaCodecsSupport::H264HardwareDecode);
+ mV4L2SupportedCodecs |= CODEC_HW_DEC_H264;
+ }
+ if (outFormats.Contains("S265")) {
+ mIsV4L2Supported = Some(true);
+ mV4L2IsStateless = true;
+ media::MCSInfo::AddSupport(media::MediaCodecsSupport::HEVCHardwareDecode);
+ mV4L2SupportedCodecs |= CODEC_HW_DEC_HEVC;
+ }
+ if (outFormats.Contains("VP9F")) {
+ mIsV4L2Supported = Some(true);
+ mV4L2IsStateless = true;
+ media::MCSInfo::AddSupport(media::MediaCodecsSupport::VP9HardwareDecode);
+ mV4L2SupportedCodecs |= CODEC_HW_DEC_VP9;
+ }
}
const nsTArray<RefPtr<GfxDriverInfo>>& GfxInfo::GetGfxDriverInfo() {
@@ -0,0 +1,52 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH 2/4] dom/media/platforms/ffmpeg: wrap
av_hwdevice_ctx_create
Date: 2026-04-27
Background
----------
`FFmpegLibWrapper` already wraps `av_hwdevice_ctx_alloc` (no device
path) and `av_hwdevice_ctx_init`, used by the VAAPI codepath which
discovers the DRM device implicitly. The v4l2_request hwaccel needs
the *path-aware* constructor `av_hwdevice_ctx_create`, which lets the
caller pass `"/dev/dri/renderD128"` (or similar) directly when
creating an `AV_HWDEVICE_TYPE_DRM` context. libavcodec then binds the
v4l2_request hwaccel internally based on the codec's `hw_configs`.
This patch adds the function pointer + the `AV_FUNC_OPTION_SILENT`
registration. Same versioning as the other `av_hwdevice_ctx_*`
wrappers (libavutil 58 through 62). No callers yet; patch 3/4
(FFmpegVideoDecoder routing) consumes it.
Bug 1969297.
diff --git a/dom/media/platforms/ffmpeg/FFmpegLibWrapper.h b/dom/media/platforms/ffmpeg/FFmpegLibWrapper.h
--- a/dom/media/platforms/ffmpeg/FFmpegLibWrapper.h
+++ b/dom/media/platforms/ffmpeg/FFmpegLibWrapper.h
@@ -177,6 +177,11 @@
// libavutil >= 58
AVBufferRef* (*av_hwdevice_ctx_alloc)(int);
int (*av_hwdevice_ctx_init)(AVBufferRef* ref);
+ // firefox-fourier: device-path-aware constructor needed to bind a
+ // DRM hwdevice (AV_HWDEVICE_TYPE_DRM) to /dev/dri/renderD* for the
+ // libavcodec v4l2_request hwaccel.
+ int (*av_hwdevice_ctx_create)(AVBufferRef** device_ctx, int type,
+ const char* device, void* opts, int flags);
AVBufferRef* (*av_hwframe_ctx_alloc)(AVBufferRef* device_ctx);
int (*av_hwframe_ctx_init)(AVBufferRef* ref);
AVBufferRef* (*av_buffer_ref)(AVBufferRef* buf);
diff --git a/dom/media/platforms/ffmpeg/FFmpegLibWrapper.cpp b/dom/media/platforms/ffmpeg/FFmpegLibWrapper.cpp
--- a/dom/media/platforms/ffmpeg/FFmpegLibWrapper.cpp
+++ b/dom/media/platforms/ffmpeg/FFmpegLibWrapper.cpp
@@ -293,6 +293,11 @@ FFmpegLibWrapper::LinkResult FFmpegLibWrapper::Link() {
AV_FUNC_AVUTIL_58 | AV_FUNC_AVUTIL_59 |
AV_FUNC_AVUTIL_60 | AV_FUNC_AVUTIL_61 |
AV_FUNC_AVUTIL_62)
+ // firefox-fourier: see comment in FFmpegLibWrapper.h
+ AV_FUNC_OPTION_SILENT(av_hwdevice_ctx_create,
+ AV_FUNC_AVUTIL_58 | AV_FUNC_AVUTIL_59 |
+ AV_FUNC_AVUTIL_60 | AV_FUNC_AVUTIL_61 |
+ AV_FUNC_AVUTIL_62)
AV_FUNC_OPTION_SILENT(
av_buffer_ref, AV_FUNC_AVUTIL_58 | AV_FUNC_AVUTIL_59 | AV_FUNC_AVUTIL_60 |
AV_FUNC_AVUTIL_61 | AV_FUNC_AVUTIL_62)
@@ -0,0 +1,237 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH 3/4] dom/media/platforms/ffmpeg: route through libavcodec
v4l2_request hwaccel for V4L2 stateless boards
Date: 2026-04-27
Background
----------
Firefox's existing V4L2 init (`InitV4L2Decoder`) finds the codec by
suffix lookup (`FindVideoHardwareAVCodec(mLib, mCodecID)`), which on
Linux resolves to the **stateful** V4L2-M2M wrapper codec
(`h264_v4l2m2m` etc). Mainline-Linux Rockchip boards (RK3399 rkvdec,
RK3566/RK3588 hantro, RK3588 rkvdec2) only expose **stateless**
V4L2 fourccs (`S264`, `S265`, `VP9F`); the stateful wrapper codec
fails to open, and Firefox falls all the way through to software.
This patch adds a sibling init path, `InitV4L2RequestDecoder`, that:
* looks up the codec via two complementary mechanisms libavcodec
uses for v4l2_request:
- **named codec** (`h264_v4l2request`, `vp8_v4l2request`, etc.):
the legacy AVCodec-per-hwaccel registration. ALARM, Debian,
and most distros building with --enable-v4l2-request expose
this (avcodec_find_decoder_by_name lookup).
- **generic codec + AV_HWDEVICE_TYPE_DRM** in `hw_configs`:
the modern hwaccel registration on some upstream-only ffmpeg
builds.
Probes named-codec first (explicit, portable) and falls back to
walking the generic codec's `hw_configs` for the DRM device type;
* creates an `AV_HWDEVICE_TYPE_DRM` hwdevice context bound to
`/dev/dri/renderD128` via the new `av_hwdevice_ctx_create` wrapper
(patch 2/4) and attaches it to the codec context;
* reuses the existing `ChooseV4L2PixelFormat` get-format callback
(already returns `AV_PIX_FMT_DRM_PRIME`) and the existing
`apply_cropping = 0` constraint.
`InitV4L2RequestDecoder` is invoked **before** `InitV4L2Decoder` in
`InitHWDecoderIfAllowed`. On Rockchip mainline it succeeds via either
mechanism (ALARM uses the named codec). On Pi4 / Mediatek /
vendor-MPP-stateful boards neither mechanism is registered for the
codec, the function bails out, and the existing stateful
`InitV4L2Decoder` runs as before. No regression of stateful boards.
`mDRMDeviceContext` is released with `av_buffer_unref` in
`ProcessShutdown` (the null check there is defensive; the call is a
no-op on a null pointer). The whole path is gated behind
`media.ffmpeg.v4l2-request.enabled` from patch 4/4.
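The probe order described above (named codec first, then the generic codec if it advertises a DRM hwaccel) can be sketched against a mock registry. Everything here is hypothetical scaffolding for illustration: `enum dev_type`, `struct codec_entry` and the three functions stand in for `AVHWDeviceType`, `AVCodec`, `avcodec_find_decoder_by_name`, `avcodec_find_decoder` and the `hw_configs` walk, and are not libavcodec APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for AVHWDeviceType / AVCodec in this sketch. */
enum dev_type { DEV_NONE = 0, DEV_VAAPI, DEV_DRM };

struct codec_entry {
    const char *name;
    int codec_id;
    enum dev_type hw_configs[4]; /* DEV_NONE-terminated */
};

static const struct codec_entry *
find_by_name(const struct codec_entry *reg, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(reg[i].name, name) == 0)
            return &reg[i];
    return NULL;
}

/* Returns the first entry with a matching codec ID. */
static const struct codec_entry *
find_by_id(const struct codec_entry *reg, size_t n, int codec_id)
{
    for (size_t i = 0; i < n; i++)
        if (reg[i].codec_id == codec_id)
            return &reg[i];
    return NULL;
}

/* Mechanism (a): named entry. Mechanism (b): generic codec with DRM config. */
static const struct codec_entry *
pick_request_codec(const struct codec_entry *reg, size_t n,
                   int codec_id, const char *request_name)
{
    assert(reg != NULL && request_name != NULL);

    const struct codec_entry *c = find_by_name(reg, n, request_name);
    if (c)
        return c;

    c = find_by_id(reg, n, codec_id);
    if (!c)
        return NULL;

    for (size_t i = 0; i < 4 && c->hw_configs[i] != DEV_NONE; i++)
        if (c->hw_configs[i] == DEV_DRM)
            return c;

    return NULL; /* caller falls back to the stateful path */
}
```

Returning `NULL` when neither mechanism matches is what lets a stateful board continue to the existing `InitV4L2Decoder` path unchanged.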
Bug 1969297.
diff --git a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h
--- a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h 2026-03-18 19:22:14.000000000 +0000
+++ b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h 2026-04-27 20:43:39.347992674 +0000
@@ -225,7 +225,12 @@
bool IsLinuxHDR() const;
MediaResult InitVAAPIDecoder();
MediaResult InitV4L2Decoder();
+ // firefox-fourier: V4L2 stateless (request API) decode path. Uses
+ // libavcodec's v4l2_request hwaccel, which it surfaces via
+ // AV_HWDEVICE_TYPE_DRM rather than a dedicated _V4L2REQUEST type.
+ MediaResult InitV4L2RequestDecoder();
bool CreateVAAPIDeviceContext();
+ bool CreateV4L2RequestDeviceContext();
bool GetVAAPISurfaceDescriptor(VADRMPRIMESurfaceDescriptor* aVaDesc);
void AddAcceleratedFormats(nsTArray<AVCodecID>& aCodecList,
AVCodecID aCodecID, AVVAAPIHWConfig* hwconfig);
@@ -239,7 +244,10 @@
void AdjustHWDecodeLogging();
AVBufferRef* mVAAPIDeviceContext = nullptr;
+ // firefox-fourier: DRM hwdevice ctx for the v4l2_request hwaccel.
+ AVBufferRef* mDRMDeviceContext = nullptr;
bool mUsingV4L2 = false;
+ bool mUsingV4L2Request = false;
// If video overlay is used we want to upload SW decoded frames to
// DMABuf and present it as a external texture to rendering pipeline.
bool mUploadSWDecodeToDMABuf = false;
diff --git a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp
--- a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp 2026-04-27 16:09:10.000000000 +0200
+++ b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp 2026-04-29 00:10:00.098884335 +0200
@@ -403,6 +403,129 @@
return NS_OK;
}
+// firefox-fourier: V4L2 stateless (request API) DRM hwdevice context.
+// libavcodec's v4l2_request hwaccel binds via AV_HWDEVICE_TYPE_DRM —
+// no dedicated _V4L2REQUEST type exists upstream.
+bool FFmpegVideoDecoder<LIBAV_VER>::CreateV4L2RequestDeviceContext() {
+ if (!mLib->av_hwdevice_ctx_create) {
+ FFMPEG_LOG(" av_hwdevice_ctx_create not available (libavutil too old)");
+ return false;
+ }
+ const char* drmDevice = "/dev/dri/renderD128";
+ if (mLib->av_hwdevice_ctx_create(&mDRMDeviceContext,
+ AV_HWDEVICE_TYPE_DRM, drmDevice,
+ nullptr, 0) < 0) {
+ FFMPEG_LOG(" av_hwdevice_ctx_create(DRM, %s) failed", drmDevice);
+ return false;
+ }
+ mCodecContext->hw_device_ctx = mLib->av_buffer_ref(mDRMDeviceContext);
+ FFMPEG_LOG(" DRM hwdevice ctx created on %s", drmDevice);
+ return true;
+}
+
+// firefox-fourier: try V4L2 stateless decode via libavcodec's
+// v4l2_request hwaccel. Distinct from InitV4L2Decoder which uses the
+// stateful h264_v4l2m2m wrapper codec. On Rockchip mainline boards
+// (rkvdec / hantro / rkvdec2) only the stateless path exists.
+MediaResult FFmpegVideoDecoder<LIBAV_VER>::InitV4L2RequestDecoder() {
+ FFMPEG_LOG("Initialising V4L2 stateless (request API) FFmpeg decoder");
+
+ StaticMutexAutoLock mon(sMutex);
+
+ // libavcodec exposes V4L2 stateless decoders through one of two
+ // mechanisms depending on the build:
+ // (a) Named AVCodec entry (h264_v4l2request, vp8_v4l2request,
+ // etc.) — the legacy mechanism. Each hwaccel is a standalone
+ // AVCodec, looked up by name. ALARM, Debian, and most distros
+ // building with --enable-v4l2-request expose this.
+ // (b) Modern hwaccel registration: the generic codec advertises
+ // AV_HWDEVICE_TYPE_DRM in its hw_configs array, and setting
+ // hw_device_ctx on the codec context binds v4l2_request
+ // internally. Some upstream-only builds expose this.
+ // Probe (a) first — it is the explicit, distro-portable lookup.
+ // Fall back to (b) when the named entry isn't registered.
+ const char* requestName = nullptr;
+ switch (mCodecID) {
+ case AV_CODEC_ID_H264: requestName = "h264_v4l2request"; break;
+ case AV_CODEC_ID_HEVC: requestName = "hevc_v4l2request"; break;
+ case AV_CODEC_ID_VP8: requestName = "vp8_v4l2request"; break;
+ case AV_CODEC_ID_VP9: requestName = "vp9_v4l2request"; break;
+ case AV_CODEC_ID_AV1: requestName = "av1_v4l2request"; break;
+ case AV_CODEC_ID_MPEG2VIDEO: requestName = "mpeg2_v4l2request"; break;
+ default:
+ FFMPEG_LOG(" no v4l2_request mapping for codec ID %d", mCodecID);
+ return NS_ERROR_NOT_AVAILABLE;
+ }
+
+ AVCodec* codec = mLib->avcodec_find_decoder_by_name(requestName);
+ if (codec) {
+ FFMPEG_LOG(" using named v4l2_request codec %s", requestName);
+ } else {
+ AVCodec* generic = mLib->avcodec_find_decoder(mCodecID);
+ if (generic) {
+ for (int i = 0;; i++) {
+ const AVCodecHWConfig* cfg = mLib->avcodec_get_hw_config(generic, i);
+ if (!cfg) break;
+ if (cfg->device_type == AV_HWDEVICE_TYPE_DRM) {
+ codec = generic;
+ FFMPEG_LOG(" using generic codec %s with DRM hwaccel", codec->name);
+ break;
+ }
+ }
+ }
+ }
+
+ if (!codec) {
+ FFMPEG_LOG(" no v4l2_request path for codec ID %d \u2014 neither named "
+ "codec %s nor generic codec with DRM hwaccel available "
+ "(libavcodec built without --enable-v4l2-request?)",
+ mCodecID, requestName);
+ return NS_ERROR_NOT_AVAILABLE;
+ }
+ FFMPEG_LOG(" V4L2 stateless: codec %s : %s", codec->name, codec->long_name);
+
+ if (!(mCodecContext = mLib->avcodec_alloc_context3(codec))) {
+ FFMPEG_LOG(" couldn't init HW ffmpeg context");
+ return NS_ERROR_OUT_OF_MEMORY;
+ }
+ mCodecContext->opaque = this;
+
+ // Reuse the existing V4L2 init helpers: pixel-format selector returns
+ // AV_PIX_FMT_DRM_PRIME, cropping disabled (FFmpeg can't crop opaque
+ // DRM buffers). Same constraints as the stateful V4L2 path.
+ InitHWCodecContext(ContextType::V4L2);
+ mCodecContext->apply_cropping = 0;
+
+ auto releaseDecoder = MakeScopeExit(
+ [&]() MOZ_NO_THREAD_SAFETY_ANALYSIS { ReleaseCodecContext(); });
+
+ if (!CreateV4L2RequestDeviceContext()) {
+ return NS_ERROR_NOT_AVAILABLE;
+ }
+
+ MediaResult ret = AllocateExtraData();
+ if (NS_FAILED(ret)) {
+ return ret;
+ }
+
+ if (mLib->avcodec_open2(mCodecContext, codec, nullptr) < 0) {
+ FFMPEG_LOG(" Couldn't open V4L2 stateless decoder");
+ return NS_ERROR_DOM_MEDIA_FATAL_ERR;
+ }
+
+ if (mAcceleratedFormats.IsEmpty()) {
+ mAcceleratedFormats.AppendElement(mCodecID);
+ }
+
+ AdjustHWDecodeLogging();
+
+ FFMPEG_LOG(" V4L2 stateless FFmpeg init successful");
+ mUsingV4L2 = true;
+ mUsingV4L2Request = true;
+ releaseDecoder.release();
+ return NS_OK;
+}
+
MediaResult FFmpegVideoDecoder<LIBAV_VER>::InitV4L2Decoder() {
FFMPEG_LOG("Initialising V4L2-DRM FFmpeg decoder");
@@ -656,6 +779,16 @@
# endif // MOZ_ENABLE_VAAPI
# ifdef MOZ_ENABLE_V4L2
+ // firefox-fourier: try V4L2 stateless (request API) first. On
+ // mainline-Linux Rockchip boards (RK3399 rkvdec, RK3566/RK3588
+ // hantro, RK3588 rkvdec2) the kernel exposes only stateless
+ // fourccs, so the stateful path below would fail anyway. On
+ // stateful boards (Pi4 / vendor MPP) this gracefully falls
+ // through (no DRM hwaccel registered for the codec).
+ if (StaticPrefs::media_ffmpeg_v4l2_request_enabled() &&
+ NS_SUCCEEDED(InitV4L2RequestDecoder())) {
+ return;
+ }
// VAAPI didn't work or is disabled, so try V4L2 with DRM
if (NS_SUCCEEDED(InitV4L2Decoder())) {
return;
@@ -2046,6 +2179,11 @@
if (IsHardwareAccelerated()) {
mLib->av_buffer_unref(&mVAAPIDeviceContext);
}
+ // firefox-fourier: release the DRM hwdevice ctx for the v4l2_request
+ // hwaccel. Always safe — av_buffer_unref no-ops on a null pointer.
+ if (mDRMDeviceContext) {
+ mLib->av_buffer_unref(&mDRMDeviceContext);
+ }
#endif
#ifdef MOZ_ENABLE_D3D11VA
if (IsHardwareAccelerated()) {
@@ -0,0 +1,34 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH 4/4] modules/libpref: add media.ffmpeg.v4l2-request.enabled
Date: 2026-04-27
Background
----------
Toggle for the V4L2 stateless (request API) decode path introduced
in patch 3/4. Defaults to on for Linux (GTK) builds and is modelled
on `media.ffmpeg.vaapi.enabled`. Users can set it to false to
force the existing stateful `InitV4L2Decoder` (or VAAPI / software
fallbacks) without rebuilding.
Bug 1969297.
diff --git a/modules/libpref/init/StaticPrefList.yaml b/modules/libpref/init/StaticPrefList.yaml
--- a/modules/libpref/init/StaticPrefList.yaml
+++ b/modules/libpref/init/StaticPrefList.yaml
@@ -12159,6 +12159,16 @@
type: uint32_t
value: 2
mirror: once
+
+# firefox-fourier: route V4L2 stateless (request API) decode through
+# libavcodec's v4l2_request hwaccel (AV_HWDEVICE_TYPE_DRM). Required
+# for mainline-Linux Rockchip rkvdec / hantro / rkvdec2. On stateful
+# boards (Pi4 / vendor MPP) the codec's hw_configs lacks a DRM entry
+# and the path silently falls back to InitV4L2Decoder.
+- name: media.ffmpeg.v4l2-request.enabled
+ type: RelaxedAtomicBool
+ value: true
+ mirror: always
#endif # MOZ_WIDGET_GTK
# Set to true in marionette tests to disable the sanity test
@@ -0,0 +1,236 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH 5/5] security/sandbox/linux: extend V4L2 sandbox carve-out
to /dev/media* and the MEDIA_IOC_* ioctl family for stateless decode
Date: 2026-04-29
Background
----------
The existing V4L2 sandbox carve-out (`AddV4l2Dependencies` in
`SandboxBrokerPolicyFactory.cpp` + the `'V'` ioctl-type allow rule
in `SandboxFilter.cpp`) is sufficient for the **stateful** V4L2-M2M
codec wrapper (`h264_v4l2m2m` etc.), where every operation goes
through `/dev/video*` and uses `VIDIOC_*` ioctls.
Stateless V4L2 decoders (Rockchip rkvdec, hantro on mainline kernel,
Allwinner cedrus, RPi5 codec_request) work differently:
* Per-frame request lifecycle is queued via the *Media Controller*
node (`/dev/media*`), not the video node. `MEDIA_IOC_REQUEST_ALLOC`
creates a request fd, which is then attached to OUTPUT-queue
`VIDIOC_QBUF` calls and queued/reinited via `MEDIA_REQUEST_IOC_*`
ioctls.
* Both ioctl families use type `'|'`. The current RDD seccomp
filter only allows `'d'` (DRM), `'b'` (DMA-Buf), and `'V'`
(V4L2). `'|'` is rejected, so even reading the device's caps
via `MEDIA_IOC_DEVICE_INFO` returns `EPERM`.
* The broker policy currently only walks `/dev/video*` and only
permits devices reporting `V4L2_CAP_VIDEO_M2M{,_MPLANE}`. The
matching `/dev/media*` controller node is denied, so when
libavcodec's `v4l2_request` hwaccel tries to open it via the
broker, `open` returns `EACCES`.
This patch closes both gaps:
1. **Broker policy**: after the existing `/dev/video*` walk,
`AddV4l2Dependencies` now iterates `/dev/media*`. For each node it
queries `MEDIA_IOC_DEVICE_INFO`; if the reported `info.driver`
matches a driver name already permitted via the M2M video-device
pass (i.e. the same driver owns both the video node and the
media controller), the media node is added to the policy too.
Webcams and unrelated media controllers stay denied.
2. **Seccomp ioctl filter**: adds `'|'` (`kMediaType`) to the
RDD allow-list alongside `'V'`, gated on `MOZ_ENABLE_V4L2`.
Validation
----------
On RK3399 / Pinebook Pro / mainline kernel (rkvdec stateless H.264
decoder via /dev/video1 + /dev/media0), this patch lets the v4l2_request
hwaccel open both nodes from a fully-sandboxed RDD process, with no
`MOZ_DISABLE_RDD_SANDBOX=1` env override needed. RDD CPU during 1080p30
H.264 playback drops from ~78% (software) to ~9.4% (hardware decode),
matching the env-disabled-sandbox baseline measured during initial
patch development.
No regression for stateful V4L2 paths (the original /dev/video* loop
and 'V' ioctl rule are unchanged); no regression for builds without
`MOZ_ENABLE_V4L2` (whole block is `#ifdef`-gated).
Bug 1969297.
diff --git a/security/sandbox/linux/SandboxFilter.cpp b/security/sandbox/linux/SandboxFilter.cpp
--- a/security/sandbox/linux/SandboxFilter.cpp 2026-04-27 15:33:08.000000000 +0200
+++ b/security/sandbox/linux/SandboxFilter.cpp 2026-04-29 14:40:10.984593331 +0200
@@ -2064,12 +2064,19 @@
static constexpr unsigned long kDmaBufType =
static_cast<unsigned long>('b') << _IOC_TYPESHIFT;
#ifdef MOZ_ENABLE_V4L2
// Type 'V' for V4L2, used for hw accelerated decode
static constexpr unsigned long kVideoType =
static_cast<unsigned long>('V') << _IOC_TYPESHIFT;
+ // firefox-fourier: type '|' is the Media Controller / Request API
+ // ioctl family (MEDIA_IOC_DEVICE_INFO, MEDIA_IOC_REQUEST_ALLOC,
+ // MEDIA_REQUEST_IOC_QUEUE, ...). Required by libavcodec's
+ // v4l2_request hwaccel for stateless decoders (Rockchip rkvdec,
+ // hantro on mainline kernel).
+ static constexpr unsigned long kMediaType =
+ static_cast<unsigned long>('|') << _IOC_TYPESHIFT;
#endif
// nvidia non-tegra uses some ioctls from this range (but not actual
// fbdev ioctls; nvidia uses values >= 200 for the NR field
// (low 8 bits))
static constexpr unsigned long kFbDevType =
static_cast<unsigned long>('F') << _IOC_TYPESHIFT;
@@ -2085,12 +2092,13 @@
// Allow DRI and DMA-Buf for VA-API. Also allow V4L2 if enabled
return If(shifted_type == kDrmType, Allow())
.ElseIf(shifted_type == kDmaBufType, Allow())
#ifdef MOZ_ENABLE_V4L2
.ElseIf(shifted_type == kVideoType, Allow())
+ .ElseIf(shifted_type == kMediaType, Allow())
#endif
// NVIDIA decoder from Linux4Tegra, this is specific to Tegra ARM64 SoC
#if defined(__aarch64__)
.ElseIf(shifted_type == kNvidiaNvmapType, Allow())
.ElseIf(shifted_type == kNvidiaNvhostType, Allow())
#endif // defined(__aarch64__)
diff --git a/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp b/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp
--- a/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp 2026-04-27 15:33:09.000000000 +0200
+++ b/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp 2026-04-29 14:40:10.984593331 +0200
@@ -45,14 +45,16 @@
# include "mozilla/WidgetUtilsGtk.h"
# include <glib.h>
#endif
#ifdef MOZ_ENABLE_V4L2
# include <linux/videodev2.h>
+# include <linux/media.h>
# include <sys/ioctl.h>
# include <fcntl.h>
+# include <cstring>
#endif // MOZ_ENABLE_V4L2
#include <dirent.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
@@ -870,12 +872,18 @@
DIR* dir = opendir("/dev");
if (!dir) {
SANDBOX_LOG("Couldn't list /dev");
return;
}
+ // Driver names of permitted M2M /dev/video* devices, used below to
+ // decide which /dev/media* nodes to permit. firefox-fourier: stateless
+ // V4L2 decode (Rockchip rkvdec, hantro) needs the media controller node
+ // for the request API (MEDIA_IOC_REQUEST_ALLOC and friends).
+ nsTArray<nsCString> permittedDrivers;
+
struct dirent* dir_entry;
while ((dir_entry = readdir(dir))) {
if (strncmp(dir_entry->d_name, "video", 5)) {
// Not a /dev/video* device, so ignore it
continue;
}
@@ -901,20 +909,99 @@
}
if ((cap.device_caps & V4L2_CAP_VIDEO_M2M) ||
(cap.device_caps & V4L2_CAP_VIDEO_M2M_MPLANE)) {
// This is an M2M device (i.e. not a webcam), so allow access
policy->AddPath(rdwr, path.get());
+ // Track the driver name so the matching /dev/media* node (if any)
+ // can be permitted below.
+ const size_t driverLen =
+ strnlen(reinterpret_cast<const char*>(cap.driver),
+ sizeof(cap.driver));
+ nsCString driverName(reinterpret_cast<const char*>(cap.driver),
+ driverLen);
+ if (!permittedDrivers.Contains(driverName)) {
+ permittedDrivers.AppendElement(std::move(driverName));
+ }
+ }
+
+ close(fd);
+ }
+ rewinddir(dir);
+
+ // firefox-fourier: walk /dev/media* and permit any media controller
+ // bound to a driver we already permitted via /dev/video*. Required for
+ // V4L2 stateless decode (request API), which queues OUTPUT buffers via
+ // ioctls on /dev/media* rather than /dev/video*.
+ while ((dir_entry = readdir(dir))) {
+ if (strncmp(dir_entry->d_name, "media", 5)) {
+ continue;
+ }
+
+ nsCString path = "/dev/"_ns;
+ path += nsDependentCString(dir_entry->d_name);
+
+ int fd = open(path.get(), O_RDWR | O_NONBLOCK, 0);
+ if (fd < 0) {
+ SANDBOX_LOG("Couldn't open media device %s", path.get());
+ continue;
}
+ struct media_device_info info;
+ int result = ioctl(fd, MEDIA_IOC_DEVICE_INFO, &info);
close(fd);
+ if (result < 0) {
+ SANDBOX_LOG("Couldn't query media device info for %s", path.get());
+ continue;
+ }
+
+ const size_t driverLen =
+ strnlen(info.driver, sizeof(info.driver));
+ nsCString driverName(info.driver, driverLen);
+ if (permittedDrivers.Contains(driverName)) {
+ policy->AddPath(rdwr, path.get());
+ }
}
closedir(dir);
// FFmpeg V4L2 needs to list /dev to find V4L2 devices.
policy->AddPath(rdonly, "/dev");
+
+ // firefox-fourier: libavcodec's v4l2_request hwaccel uses libudev to
+ // enumerate /dev/media* devices for stateless decode.
+ // udev_enumerate_scan_devices() iterates ALL /sys/class/* subsystems
+ // (drm, dma_heap, ..., not only the ones we care about), opening each
+ // subdir to list entries; if any open is denied it returns -EUNATCH
+ // and the hwaccel skips device probing entirely. Then for each
+ // matching device it follows the /sys/dev/char/MAJOR:MINOR symlink
+ // into /sys/devices/platform/* to read attributes. All read-only.
+ policy->AddTree(rdonly, "/sys/class");
+ policy->AddTree(rdonly, "/sys/devices/platform");
+ policy->AddTree(rdonly, "/sys/dev/char");
+ policy->AddTree(rdonly, "/sys/bus");
+
+ // libudev's udev_new() and udev_device_new_from_devnum read from
+ // /run/udev/{control,data,tags,...} and /etc/udev/udev.conf. Without
+ // these, udev_new() can return NULL or udev_device queries return
+ // empty data, breaking the hwaccel's device-translation pass entirely.
+ policy->AddTree(rdonly, "/run/udev");
+ policy->AddPath(rdonly, "/etc/udev/udev.conf");
+ // libudev's udev_new() reads /proc/self/* during initialisation
+ // (uid_map, mountinfo, etc.).
+ policy->AddTree(rdonly, "/proc/self");
+ // libudev / libavcodec call open("/", O_DIRECTORY) during path
+ // enumeration (e.g., when resolving /dev/dri/renderD128 via realpath).
+ // Listing the root directory is harmless - RDD can already infer the
+ // top-level entries from policy paths.
+ policy->AddPath(rdonly, "/");
+
+ // Stateless V4L2 decoders (rkvdec, hantro on mainline) allocate
+ // CAPTURE-queue buffers from /dev/dma_heap/*, not internal VPU-private
+ // memory. Without rdwr access here, hwdevice context creation succeeds
+ // but the first CAPTURE buffer allocation from the DMA heap fails.
+ policy->AddTree(rdwr, "/dev/dma_heap");
}
#endif // MOZ_ENABLE_V4L2
/* static */ UniquePtr<SandboxBroker::Policy>
SandboxBrokerPolicyFactory::GetRDDPolicy(int aPid) {
auto policy = MakeUnique<SandboxBroker::Policy>();
@@ -0,0 +1,183 @@
# Maintainer: Markus Fritsche <mfritsche@reauktion.de>
#
# Firefox with V4L2 stateless (request API) hardware video decode
# unlocked for mainline-Linux Rockchip (RK3399 rkvdec, RK3566/RK3588
# hantro multiplanar, RK3588 rkvdec2). Sibling to chromium-fourier;
# same niche. No vendor MPP, no Mali blob, no panfork, no 5.10 BSP.
#
# Patches 0001-0004 add thin shims around upstream firefox (~+169 lines,
# zero deletions); 0005 extends the RDD sandbox carve-out to /dev/media*.
# Architecture: stateless decode rides libavcodec's
# v4l2_request hwaccel (AV_HWDEVICE_TYPE_DRM); no separate Mozilla V4L2
# decoder gets written. See ../../arch/firefox-fourier/PLAN.md for
# the full diagnosis. Mozilla bug 1969297.
pkgname=firefox-fourier
pkgver=150.0.1
pkgrel=5
pkgdesc='Firefox with V4L2 stateless HW video decode unlocked for mainline Linux Rockchip'
arch=('aarch64' 'x86_64')
url='https://www.mozilla.org/firefox'
license=('MPL-2.0')
depends=(
alsa-lib
at-spi2-core
cairo
dbus
ffmpeg
fontconfig
freetype2
gcc-libs
gdk-pixbuf2
glib2
glibc
gtk3
hicolor-icon-theme
libdrm
libpulse
libva
libxcb
libxkbcommon
mesa
nspr
nss
pango
pciutils
ttf-liberation
v4l-utils
)
makedepends=(
cbindgen
clang
imagemagick
inetutils
lld
llvm
mesa
nasm
nodejs
python
rust
unzip
wasi-compiler-rt
wasi-libc
yasm
zip
)
optdepends=(
'hunspell-en_us: spell checking, American English'
'libnotify: send notifications when downloads complete'
'pulseaudio: audio support'
)
provides=(firefox)
conflicts=(firefox)
options=('!emptydirs' '!strip')
source=(
"https://archive.mozilla.org/pub/firefox/releases/${pkgver}/source/firefox-${pkgver}.source.tar.xz"
'mozconfig'
# Arch's official firefox patches — toolchain glue for clang 22 +
# glibc 2.43 + Rust 1.95+. Picked up verbatim because we hit the same
# walls. arch-0001 (install-under-remoting) skipped — our launcher
# ships under /usr/bin/firefox-fourier with our own wrapper.
# https://gitlab.archlinux.org/archlinux/packaging/packages/firefox
'arch-0002-Bug-2033279-Make-enable-rust-simd-work-with-Rust-1.9.patch'
'arch-0003-Patch-glsl-optimizer-to-build-with-glibc-2.43.patch'
'arch-0004-Bug-2023597-Use-wasm32-wasip1-target-for-clang-22.1-.patch'
# firefox-fourier patches — V4L2 stateless decode unlock.
'0001-gfxinfo-v4l2-stateless-fourccs.patch'
'0002-libwrapper-hwdevice-ctx-create.patch'
'0003-ffmpegvideo-v4l2-request-route.patch'
'0004-prefs-v4l2-request.patch'
'0005-rdd-sandbox-v4l2-media-ctl.patch'
# Vendor-default prefs that gate the patched VAAPI path on RK3399 —
# widget.dmabuf.force-enabled etc. See marfrit-packages#8 for evidence.
'rockchip-fourier-defaults.js'
)
sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
prepare() {
cd "${srcdir}/firefox-${pkgver}"
# Toolchain glue (Arch upstream) — apply BEFORE the fourier patches.
patch -Np1 -i "${srcdir}/arch-0002-Bug-2033279-Make-enable-rust-simd-work-with-Rust-1.9.patch"
patch -Np1 -i "${srcdir}/arch-0003-Patch-glsl-optimizer-to-build-with-glibc-2.43.patch"
patch -Np1 -i "${srcdir}/arch-0004-Bug-2023597-Use-wasm32-wasip1-target-for-clang-22.1-.patch"
# Fourier patches — order matters; see ../PLAN.md for rationale.
patch -Np1 -i "${srcdir}/0001-gfxinfo-v4l2-stateless-fourccs.patch"
patch -Np1 -i "${srcdir}/0002-libwrapper-hwdevice-ctx-create.patch"
patch -Np1 -i "${srcdir}/0003-ffmpegvideo-v4l2-request-route.patch"
patch -Np1 -i "${srcdir}/0004-prefs-v4l2-request.patch"
patch -Np1 -i "${srcdir}/0005-rdd-sandbox-v4l2-media-ctl.patch"
cp "${srcdir}/mozconfig" .mozconfig
}
build() {
cd "${srcdir}/firefox-${pkgver}"
# Arch's makepkg.conf injects -fexceptions into CFLAGS/CXXFLAGS by
# default for hardening. Mozilla's STL wrappers refuse to compile
# with exceptions enabled (#error "STL code can only be used with
# -fno-exceptions"). Strip the offender before mach configure picks
# up the env. Same trick the upstream Arch firefox PKGBUILD uses.
CFLAGS="${CFLAGS//-fexceptions/}"
CXXFLAGS="${CXXFLAGS//-fexceptions/}"
export CFLAGS CXXFLAGS
export MOZ_NOSPAM=1
export MOZ_API_KEY_UNUSED=1
export MOZ_TELEMETRY_REPORTING=
export MOZ_REQUIRE_SIGNING=
export MACH_BUILD_PYTHON_NATIVE_PACKAGE_SOURCE=system
export PYTHON=/usr/bin/python
./mach configure
./mach build
}
package() {
cd "${srcdir}/firefox-${pkgver}"
DESTDIR="${pkgdir}" ./mach install
# Move mach's default /usr/local/* layout to /usr/* so we conflict
# with `firefox` cleanly and `provides=firefox` actually works.
# `cp -r` preserves the bin symlink (target lives in /usr/local) —
# delete it before staging the launcher so `cat >` doesn't follow a
# dangling symlink and ENOENT.
if [ -d "${pkgdir}/usr/local" ]; then
cp -r "${pkgdir}/usr/local/." "${pkgdir}/usr/"
rm -rf "${pkgdir}/usr/local"
fi
rm -f "${pkgdir}/usr/bin/firefox-fourier"
# Launcher script. mach's install drops the binary at
# /usr/lib/firefox-fourier/firefox-fourier (a small bash launcher) plus
# firefox-fourier-bin alongside; we exec the launcher.
cat > "${pkgdir}/usr/bin/firefox-fourier" <<'LAUNCHER'
#!/bin/bash
# firefox-fourier launcher — V4L2 stateless HW decode path defaults.
# Patch 0004 already defaults media.ffmpeg.v4l2-request.enabled=true on
# Linux; the env vars below cover the platform-detection bits firefox
# still consults at startup.
export MOZ_ENABLE_WAYLAND="${MOZ_ENABLE_WAYLAND:-1}"
export MOZ_X11_EGL="${MOZ_X11_EGL:-1}"
exec /usr/lib/firefox-fourier/firefox-fourier "$@"
LAUNCHER
chmod 0755 "${pkgdir}/usr/bin/firefox-fourier"
# Vendor-default prefs (RK3399 HW-decode unlock) — closes #8.
# Lower precedence than user prefs / about:config; loaded by Firefox
# at startup from the package install dir. The 0004 patch covers
# media.ffmpeg.v4l2-request.enabled; this file covers the three
# additional prefs that gate the path to the patched code.
# Vendor-prefs install path: /usr/lib/firefox-fourier/defaults/preferences/
# (Mozilla's canonical scan dir for third-party default-pref drops.) The
# browser/defaults/preferences/ alternative looked promising but is NOT a
# vendor-prefs scan location in Firefox 150 — empirically confirmed on
# fresnel: file shipped there, VAAPI never engaged. Same file under
# defaults/preferences/ → MOZ_LOG showed `Requesting pixel format
# VAAPI_VLD` + dmabuf surfaces locking end-to-end.
install -Dm644 "${srcdir}/rockchip-fourier-defaults.js" \
"${pkgdir}/usr/lib/firefox-fourier/defaults/preferences/rockchip-fourier-defaults.js"
}
@@ -0,0 +1,210 @@
# firefox-fourier — V4L2 Stateless Decoder Patch Plan
Plan to extend Firefox 149's V4L2 hardware decode path to cover Rockchip
mainline kernel boards (RK3399 rkvdec, RK3566/RK3588 hantro, RK3588
rkvdec2) by routing stateless `S264`/`S265`/`VP9F` fourccs through
libavcodec's `v4l2_request` hwaccel, which mainline FFmpeg surfaces via
`AV_HWDEVICE_TYPE_DRM` (no dedicated `_V4L2REQUEST` enum exists upstream
— confirmed against `libavutil/hwcontext.h`).
## 1. Files touched (in order)
| # | Path | Change | Lines |
|---|------|--------|-------|
| 1 | `widget/gtk/GfxInfo.cpp` | `V4L2ProbeDevice` (~L1030-1110): add `S264`/`S265`/`VP9F` matches alongside existing `H264`/`HEVC`/`VP90`. Set `mIsV4L2Supported = Some(true)` and OR the same `CODEC_HW_DEC_*` bits. Tag a new bool `mV4L2IsStateless` so downstream can branch. | +35 / -2 |
| 2 | `dom/media/platforms/ffmpeg/FFmpegLibWrapper.h` | Add wrappers for `av_hwdevice_ctx_create` (currently only `_alloc`/`_init` per L173-174) and `av_hwdevice_find_type_by_name`. Needed because stateless wants the *device-path-aware* `_create` form to bind `/dev/dri/renderD128`. | +4 / 0 |
| 3 | `dom/media/platforms/ffmpeg/FFmpegLibWrapper.cpp` | `dlsym` the two new pointers; gate behind `LIBAVUTIL_VERSION_MAJOR >= 56`. | +6 / 0 |
| 4 | `dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h` | Add `AVBufferRef* mDRMDeviceContext = nullptr;` and `bool mUsingV4L2Request = false;` next to existing `mUsingV4L2`. | +3 / 0 |
| 5 | `dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp` | New `CreateV4L2RequestDeviceContext()` modelled on `CreateVAAPIDeviceContext` (current uses `av_hwdevice_ctx_alloc(AV_HWDEVICE_TYPE_VAAPI)`). Wire it in alongside the existing V4L2 init branch (the one feeding `ChooseV4L2PixelFormat` at L~258). `FindVideoHardwareAVCodec` call at L708 stays unchanged for stateless — we want the *generic* `h264`/`hevc`/`vp9` decoder + a DRM hw_device_ctx, not a `_v4l2m2m` codec. | +90 / -5 |
| 6 | `modules/libpref/init/StaticPrefList.yaml` | New pref `media.ffmpeg.v4l2-request.enabled` mirroring `media.ffmpeg.vaapi.enabled` (default `@IS_LINUX@`). | +6 / 0 |
| 7 | `dom/media/ipc/RDDProcessHost.cpp` + sandbox policy file | Whitelist `/dev/media*`, `/dev/dri/renderD*`, `/dev/video*` for the RDD process (already partly done for VAAPI — verify `policy/linux/SandboxBrokerPolicyFactory.cpp`). | +6 / 0 |
Total: roughly **+150 / -10** across 7 files.
## 2. Probe extension — `GfxInfo.cpp::V4L2ProbeDevice`
Existing pattern (~L1075):
```cpp
if (outFormats.Contains("H264")) { mIsV4L2Supported = Some(true);
mV4L2SupportedCodecs |= CODEC_HW_DEC_H264; ... }
```
Add three siblings, each setting an additional `mV4L2IsStateless = true`:
```cpp
if (outFormats.Contains("S264")) { /* CODEC_HW_DEC_H264 | stateless */ }
if (outFormats.Contains("S265")) { /* CODEC_HW_DEC_HEVC | stateless */ }
if (outFormats.Contains("VP9F")) { /* CODEC_HW_DEC_VP9 | stateless */ }
```
Decision: do **not** introduce a separate `mIsV4L2StatelessSupported`.
Collapse under `mIsV4L2Supported` so the existing feature-gate plumbing
(`MediaCodecsSupport`, `gfxFeature::HW_DECODE_VIDEO`) flips identically
— only `mV4L2IsStateless` distinguishes the routing in step 3. Stateful
+ stateless on the same SoC (rare, but RK3588 has both rkvdec2 + hantro
VEPU) gracefully degrades to whichever codec wins enumeration order.
The capture-format gate (`YV12`/`NV12`) needs widening: stateless
decoders frequently expose only `NV12` or `NV15` (10-bit, RK3588 HEVC).
Add `NV15` and `NM12` (multiplanar NV12, hantro). Without this the
prober rejects an otherwise-good device.
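
As a manual cross-check of the fourcc matching described in this section, the sketch below classifies `v4l2-ctl --list-formats-out` text into stateless, stateful, or neither. The function name `classify_v4l2_formats` is hypothetical and not part of the patch; it assumes v4l-utils prints each fourcc in single quotes, and it deliberately lets a stateless match win, which is a choice of this sketch rather than the probe's documented behaviour.

```shell
# Classify OUTPUT-queue fourccs read from stdin.
# Prints "stateless", "stateful", or "none".
classify_v4l2_formats() {
    _fmt_text=$(cat)
    # Stateless (request API) fourccs the plan adds to the probe.
    if printf '%s\n' "$_fmt_text" | grep -Eq "'(S264|S265|VP9F)'"; then
        echo stateless
    # Stateful fourccs the existing probe already matches.
    elif printf '%s\n' "$_fmt_text" | grep -Eq "'(H264|HEVC|VP90)'"; then
        echo stateful
    else
        echo none
    fi
}
```

On a target board this would be fed from `v4l2-ctl -d /dev/video0 --list-formats-out | classify_v4l2_formats`; the device node is the one the test plan names and may differ per board.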
## 3. Decoder routing — `FFmpegVideoDecoder.cpp`
Codec selection happens at **L651** (`AV_HWDEVICE_TYPE_VAAPI` →
`h264_vaapi`) and **L708** (the V4L2 fallback path → `h264_v4l2m2m` via
`FindVideoHardwareAVCodec(mLib, mCodecID)` resolving by suffix). The
stateless route diverges from both: the *codec* must remain the generic
`h264`/`hevc`/`vp9` decoder (libavcodec auto-binds `v4l2_request` from
its `hw_configs` when a DRM hw_device_ctx is attached). Pseudo-patch:
```cpp
if (gfxInfo.mUsingV4L2 && gfxInfo.mV4L2IsStateless) {
AVCodec* codec = mLib->avcodec_find_decoder(mCodecID); // generic
mCodecContext = mLib->avcodec_alloc_context3(codec);
if (!CreateV4L2RequestDeviceContext()) return false;
mCodecContext->get_format = ChooseV4L2PixelFormat; // already returns DRM_PRIME
mUsingV4L2Request = true;
}
```
`CreateV4L2RequestDeviceContext()` body:
```cpp
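// Bind the DRM render node; libavcodec attaches v4l2_request from it.
const char* drm = "/dev/dri/renderD128";
if (mLib->av_hwdevice_ctx_create(&mDRMDeviceContext,
AV_HWDEVICE_TYPE_DRM, drm, nullptr, 0) < 0) return false;
mCodecContext->hw_device_ctx = mLib->av_buffer_ref(mDRMDeviceContext);
// av_buffer_ref returns nullptr on allocation failure.
return mCodecContext->hw_device_ctx != nullptr;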
```
`av_hwdevice_ctx_create` — not currently wrapped — is the entry point.
The codec's internal hwaccel selector then walks
`avcodec_get_hw_config()` and picks the entry whose `device_type ==
AV_HWDEVICE_TYPE_DRM` and `pix_fmt == AV_PIX_FMT_DRM_PRIME`, which is
the v4l2_request hwaccel registered in `libavcodec/v4l2_request_*.c`.
No `av_hwdevice_find_type_by_name("v4l2_request")` needed — stays an
internal libavcodec name.
## 4. Dmabuf / DRM_PRIME reuse
`ChooseV4L2PixelFormat` at L~258-270 already returns
`AV_PIX_FMT_DRM_PRIME` and is the *only* format the v4l2_request
hwaccel produces. The downstream consumer (DMABufSurfaceYUV import in
`FFmpegVideoFramePool.cpp`) is already DRM_PRIME-aware for the
stateful path — same code reads `AVDRMFrameDescriptor` from
`frame->data[0]`. **No new output handling required for NV12/YV12.**
10-bit caveat: RK3588 HEVC outputs `DRM_FORMAT_NV15` / `NV20`
(Mali-tile). Existing `WaylandDMABufSurface::CreateYUVSurface`
modifier list does not include `DRM_FORMAT_MOD_ARM_AFBC` or NV15
fourcc. Either reject 10-bit at probe (capture format gate above) or
extend `gfx/layers/DMABUFSurfaceImage.cpp` — out of scope for v1; gate
to NV12 only.
The SAND format pollution Turner mentioned in bug 1969297 c#3 is
**Pi5-specific**; rkvdec/hantro do not produce SAND. It is safe to
ignore for the Rockchip target.
## 5. Configuration
New pref:
```yaml
- name: media.ffmpeg.v4l2-request.enabled
type: RelaxedAtomicBool
value: @IS_LINUX@
mirror: always
```
No new env var. No `MOZ_X11_EGL`-style kludge. The existing
`MOZ_LOG=PlatformDecoderModule:5` covers diagnostics. Default-on
matches `media.ffmpeg.vaapi.enabled` shape; users get fallback to
software via existing failure paths if `av_hwdevice_ctx_create` fails
(e.g., missing `/dev/media0`).
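
Because a missing device node only shows up as a silent software fallback, a preflight check can save a debugging round. The sketch below lists which of the two nodes named in this plan are absent. `missing_decode_nodes` is a hypothetical helper, the node numbers are assumptions that vary per board, and the optional root-prefix argument exists only so the function can be exercised against a scratch directory.

```shell
# Print each required device node that does not exist.
# $1: optional root prefix (empty on a real system).
missing_decode_nodes() {
    _root=${1:-}
    # Node names taken from the plan; numbering is board-specific.
    for _node in /dev/media0 /dev/dri/renderD128; do
        [ -e "${_root}${_node}" ] || echo "$_node"
    done
}
```

An empty result from `missing_decode_nodes` means both nodes exist; it says nothing about permissions or the RDD sandbox, which are covered separately.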
## 6. Test plan (fresnel — RK3399, KDE Wayland)
1. `ls /dev/video* /dev/media*` — confirm `/dev/video0` (rkvdec
output) and `/dev/media0` exist.
2. `v4l2-ctl -d /dev/video0 --list-formats-out` — expect
`S264`/`S265`/`VP9F`.
3. Start: `MOZ_LOG="PlatformDecoderModule:5,FFmpegVideo:5"
firefox-fourier 2>&1 | tee fx.log`.
4. Open `https://test-videos.co.uk/bigbuckbunny/mp4-h264` 1080p clip.
5. Success markers in `fx.log`:
- `V4L2ProbeDevice: /dev/video0 supports S264 (stateless)`
- `Choosing FFmpeg pixel format for V4L2 video decoding.`
- `Requesting pixel format DRM PRIME`
- `av_hwdevice_ctx_create(DRM, /dev/dri/renderD128) ok`
- **No** `Using preferred software codec h264`.
6. `cat /sys/kernel/debug/clk/clk_summary | grep vdec` — clock should
be active during playback.
7. `top` — CPU < 40% on a single A72 core for 1080p H.264 (stock =
100% on all 6 cores).
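
The log inspection in step 5 can be scripted. The sketch below checks one positive and one negative marker from that list. `check_fx_log` is a hypothetical helper, and the marker strings are copied from this plan, so they are assumptions about what a given build actually logs.

```shell
# Check a MOZ_LOG capture for the hardware-decode markers.
# $1: path to the log file. Prints "ok" or the first failed check.
check_fx_log() {
    _log=$1
    # Positive marker: the DRM PRIME pixel format was requested.
    if ! grep -Fq 'Requesting pixel format DRM PRIME' "$_log"; then
        echo 'missing: DRM PRIME request'
        return 1
    fi
    # Negative marker: the software decoder must not have been chosen.
    if grep -Fq 'Using preferred software codec' "$_log"; then
        echo 'found: software codec fallback'
        return 1
    fi
    echo ok
}
```

Run it as `check_fx_log fx.log` after the playback in step 4 has finished.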
## 7. Build + ship — `firefox-fourier` PKGBUILD
Mirror `chromium-fourier` shape exactly (sibling).
```bash
pkgname=firefox-fourier
pkgver=149.0
arch=('aarch64' 'x86_64')
makedepends=(rust clang lld nodejs python cbindgen nasm yasm wasi-libc-bin
gtk3 mesa libva ffmpeg) # ffmpeg only for headers via system libs
source=(
"https://archive.mozilla.org/pub/firefox/releases/${pkgver}/source/firefox-${pkgver}.source.tar.xz"
patches/0001-gfxinfo-v4l2-stateless-fourccs.patch
patches/0002-libwrapper-hwdevice-ctx-create.patch
patches/0003-ffmpegvideo-v4l2-request-route.patch
patches/0004-prefs-v4l2-request.patch
mozconfig
)
```
`prepare()`: `cd firefox-${pkgver}` → apply patches with `patch -Np1`.
`build()`: `MOZ_NOSPAM=1 ./mach build`. `package()`: `./mach install
DESTDIR=${pkgdir}`. `mozconfig` enables
`--enable-default-toolkit=cairo-gtk3-wayland`, `--with-system-ffmpeg`,
`ac_add_options --disable-tests`. **No** `--enable-media-gpu-process`
— let it default. Tarball is the official Mozilla source release (not
gecko-dev).
Extra makedepends vs. stock firefox PKGBUILD: none — this only
modifies existing C++.
## 8. Risk register (ranked)
1. **libavcodec ABI mismatch.** ALARM ships ffmpeg 7.x; Firefox dlopens
whatever's at `libavcodec.so.61`. If the v4l2_request hwaccel was
compiled out (Arch's ffmpeg has it; ALARM rebuild may not),
`av_hwdevice_ctx_create(DRM, ...)` succeeds but no codec binds —
silent fallback. Mitigation: `ffmpeg -hwaccels` should list `drm`.
2. **Renderer-process sandbox** blocks `/dev/dri/renderD128` open.
VAAPI already brokered this for RDD process; verify
`SandboxBrokerPolicyFactory.cpp` covers `/dev/media*` too — likely
doesn't.
3. **glxtest probe runs in stripped env.** Does `v4l2test` (the
FireTestProcess child) need `cap_sys_admin` for the
`VIDIOC_S_EXT_CTRLS` request API ioctls? No: the request API only
needs `O_RDWR` on `/dev/media*`. Should be fine.
4. **Regression of stateful path.** Adding new fourccs is additive;
the routing branch is gated on `mV4L2IsStateless`. Stateful boards
(Pi4) untouched.
5. **NV15/10-bit on RK3588** — explicitly out-of-scope v1;
gate-rejected.
6. **rkvdec2 driver maturity.** Linux 6.12 mainline rkvdec2 H.264
works; HEVC/VP9 still upstream-pending on some boards. Probe will
skip what kernel doesn't expose.
7. **DMA-BUF modifier negotiation** with panfrost/panthor on Wayland
— already shaken out by chromium-fourier on RK3566; same code
path.
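
The mitigation for risk 1 can be scripted as a one-line filter over `ffmpeg -hwaccels` output. `hwaccels_lists_drm` is a hypothetical helper. Note that it only confirms the `drm` device type is listed, which is the check this plan names; it does not prove the v4l2_request hwaccel itself was compiled in.

```shell
# Succeed if `ffmpeg -hwaccels` output on stdin lists the drm method.
hwaccels_lists_drm() {
    # -x matches whole lines only, so "drm" inside another name is ignored.
    grep -qx 'drm'
}
```

Typical use would be `ffmpeg -hide_banner -hwaccels | hwaccels_lists_drm && echo "drm listed"`.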
## 9. Upstream path (bug 1969297)
Split into 4 reviewable commits matching files 1, 2+3, 5, 6 from the
table. Add a gtest exercising `V4L2ProbeDevice` against a synthetic
v4l2test stdout containing `S264` (no kernel needed). Reach out to
skyevg (D252119 author) for review continuity. r? jya for the
FFmpegVideoDecoder change. The `av_hwdevice_ctx_create` wrapper
addition is a self-contained 6-liner that should land independently.
The 10-bit/SAND concerns Turner raised remain valid for Pi5 —
explicitly scope this series to **stateless DRM_PRIME NV12 only**,
leaving SAND for a follow-up bug.
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -0,0 +1,28 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Alex Hochheiden <ahochheiden@mozilla.com>
Date: Wed, 1 Apr 2026 18:11:37 +0000
Subject: [PATCH] Bug 2023597 - Use `wasm32-wasip1` target for clang >= 22.1
r=firefox-build-system-reviewers,sergesanspaille
https://github.com/llvm/llvm-project/pull/165345
https://releases.llvm.org/22.1.0/tools/clang/docs/ReleaseNotes.html
Differential Revision: https://phabricator.services.mozilla.com/D291023
---
build/moz.configure/toolchain.configure | 3 +++
1 file changed, 3 insertions(+)
diff --git a/build/moz.configure/toolchain.configure b/build/moz.configure/toolchain.configure
index a37ed610cc43..c7d0c8bdf75c 100644
--- a/build/moz.configure/toolchain.configure
+++ b/build/moz.configure/toolchain.configure
@@ -695,6 +695,9 @@ def check_compiler(configure_cache, compiler, language, target, android_version)
# This makes clang define __ANDROID_API__ and use versioned library
# directories from the NDK.
toolchain = "%s%d" % (target.toolchain, android_version)
+ elif target.kernel == "WASI" and info.type == "clang" and info.version >= Version("22.1"):
+ # The wasm32-wasi target was renamed to wasm32-wasip1 in LLVM 22.1.
+ toolchain = "wasm32-wasip1"
else:
toolchain = target.toolchain
@@ -0,0 +1,36 @@
# firefox-fourier mozconfig — minimal, Wayland + system ffmpeg.
ac_add_options --enable-application=browser
ac_add_options --enable-default-toolkit=cairo-gtk3-wayland
ac_add_options --enable-release
ac_add_options --enable-optimize
ac_add_options --enable-rust-simd
ac_add_options --enable-linker=lld
# Arch's 0004 patch updates the wasm32-wasip1 target string but ALARM's
# wasi-libc package doesn't expose the headers at the path Mozilla's
# probe looks for. Disable the wasm sandbox — hardens font/graphics
# parsers only, no impact on V4L2 decode. Revisit when ALARM's
# wasi-libc catches up to Arch x86_64's layout.
ac_add_options --without-wasm-sandboxed-libraries
# Firefox dlopens libavcodec.so at runtime regardless of build flags;
# the v4l2_request hwaccel routing happens via the system libavcodec
# loaded at startup, controlled by media.ffmpeg.enabled (default true).
# No configure-time hook needed.
ac_add_options --disable-tests
ac_add_options --disable-debug
ac_add_options --disable-debug-symbols
ac_add_options --disable-crashreporter
ac_add_options --disable-updater
ac_add_options --disable-default-browser-agent
# Mozilla branding requires a separate signed-build-tooling agreement
# we don't have; ship with the unbranded "firefox-fourier" identity.
ac_add_options --with-app-name=firefox-fourier
ac_add_options --with-app-basename=Firefox
ac_add_options --with-distribution-id=de.reauktion.fourier
# Reduce build memory pressure on aarch64 — parallel link is heavy.
mk_add_options MOZ_PARALLEL_BUILD=8
@@ -0,0 +1,19 @@
// firefox-fourier — RK3399 V4L2-stateless HW-decode default prefs.
//
// The patch series (0001..0004) builds the VAAPI / V4L2-request routing
// path through libavcodec, but the resulting code path is gated by three
// other prefs that are 'false' upstream because the relevant probes don't
// fire on panfrost EGL or trip the Intel-tuned cost heuristic. Without
// these, firefox-fourier silently SW-decodes on a fresh profile despite
// having all the unlock patches applied.
//
// Filed via marfrit/marfrit-packages#8 — see that issue for MOZ_LOG
// evidence on fresnel (Pinebook Pro / RK3399).
//
// These are *vendor* defaults: lower precedence than user.js and
// about:config user prefs. Power users who want to disable HW decode for
// debugging can flip them in user prefs without touching this file.
pref("widget.dmabuf.force-enabled", true);
pref("media.hardware-video-decoding.force-enabled", true);
pref("media.ffvpx-hw.enabled", true);
@@ -0,0 +1,89 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH] transaction: bypass watchDmaBuf implicit-sync fence wait
Date: 2026-04-28
Background
----------
KWin's `Transaction::watchDmaBuf` (src/wayland/transaction.cpp) calls
`DMA_BUF_IOCTL_EXPORT_SYNC_FILE` on every plane of every imported
dmabuf and parks the transaction on a `QSocketNotifier(POLLIN)`
waiting for the resulting sync_file fd to become readable. The intent
is correct in principle — wait for the producer to finish writing
before sampling — but on V4L2 hantro CAPTURE buffers (RK3566 mainline
6.19, panfrost mesa 26.0.5) the resulting fence either signals so
late that chrome's 6-buffer V4L2 capture pool exhausts, or never
signals at all. Symptom (per chromium-fourier KWIN_PIVOT.md):
- chrome v4 attaches a video frame to a wp_subsurface, commits
- KWin's Transaction::commit calls watchDmaBuf, exports a sync_file,
parks on QSocketNotifier
- Sync_file never becomes readable
- Transaction never applies; old surface state never replaced
- wl_buffer.release for the previous video buffer never sent
- chrome's V4L2 capture pool starves at ~6 seconds, decoder blocks,
audio drains, hard stall
mpv with `--vo=gpu-next` on the same KWin session slideshows at 76%
drop rate but does not deadlock — its single-surface attach pattern
hits a different transaction shape than chrome's subsurface flow.
A clean weston A/B with the same chrome v4 binary plays through
end-to-end: the bug is specifically KWin's transaction fence-wait
path on this stack, not Wayland-as-a-protocol.
Fix
---
This experimental patch no-ops `watchDmaBuf` to test the hypothesis.
Implicit-sync correctness in this case is not lost: the V4L2
producer guarantees the buffer's contents are complete before
chrome sends `wl_surface.attach + commit`, and the wp_linux_dmabuf
client is required to do so by spec. The fence-wait was a defensive
optimization for misbehaving clients, not a correctness primitive.
If chrome plays through end-to-end at the recorded 34.7% combined
CPU number under KWin with this patch, the bug is confirmed and the
upstream fix can be refined (timeout, V4L2-source skip, or use the
dmabuf fd directly in the QSocketNotifier instead of an extra
exported sync_file).
diff --git a/src/wayland/transaction.cpp b/src/wayland/transaction.cpp
index 967b22b..e3fbc06 100644
--- a/src/wayland/transaction.cpp
+++ b/src/wayland/transaction.cpp
@@ -263,27 +263,18 @@ static FileDescriptor exportWaitSyncFile(const FileDescriptor &fileDescriptor)
return FileDescriptor{};
}
#endif
void Transaction::watchDmaBuf(TransactionEntry *entry)
{
-#if defined(Q_OS_LINUX)
- const DmaBufAttributes *attributes = entry->buffer->dmabufAttributes();
- if (!attributes) {
- return;
- }
-
- for (int i = 0; i < attributes->planeCount; ++i) {
- const FileDescriptor &fileDescriptor = attributes->fd[i];
- if (fileDescriptor.isReadable()) {
- continue;
- }
-
- auto syncFile = exportWaitSyncFile(fileDescriptor);
- if (syncFile.isValid()) {
- entry->fences.emplace_back(std::make_unique<TransactionFence>(this, std::move(syncFile)));
- }
- }
-#endif
+ // kwin-fourier: no-op the implicit-sync fence wait. On V4L2
+ // hantro CAPTURE buffers (RK3566 mainline 6.19, panfrost mesa
+ // 26.0.5) the DMA_BUF_IOCTL_EXPORT_SYNC_FILE fence either never
+ // signals or signals so late that chrome's V4L2 capture pool
+ // exhausts at ~6s, hard-stalling the decoder. Wayland clients
+ // are required by spec to ensure the buffer's contents are
+ // complete before wl_surface.attach+commit, so this fence-wait
+ // is a belt-and-braces optimization, not a correctness primitive.
+ Q_UNUSED(entry);
}
} // namespace KWin
@@ -0,0 +1,123 @@
From 54e3862be4d2a5b06a48cdcd61065f759a449a61 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Tue, 28 Apr 2026 19:32:03 +0000
Subject: [PATCH] wayland/transaction: poll dmabuf fd directly instead of
EXPORT_SYNC_FILE
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Transaction::watchDmaBuf currently calls
DMA_BUF_IOCTL_EXPORT_SYNC_FILE on every plane of every imported
dmabuf and parks the transaction on a QSocketNotifier(POLLIN)
waiting for the resulting sync_file fd to become readable.
This is correct, but unnecessary. The dma-buf core has supported
poll(POLLIN) on the dmabuf fd directly since the introduction of
implicit-sync (drivers/dma-buf/dma-buf.c, dma_buf_poll). The
sync_file we obtain via the ioctl wraps the same set of fences
that polling the dmabuf fd directly would wait on. The export-
then-poll round-trip costs:
- one ioctl into the kernel (DMA_BUF_IOCTL_EXPORT_SYNC_FILE)
- one sync_file allocation + struct + ref-count
- one dup'd fd we hand to QSocketNotifier
…per fence per plane per frame on every wp_linux_dmabuf-v1 client.
Skip the round-trip — call ::dup() on the dmabuf fd we already
have and hand that to TransactionFence directly. Same wait
semantics, fewer syscalls.
Tested on PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19,
KDE Plasma 6.6.4 Wayland) playing 1080p30 H.264 in chromium.
Frame rate and CPU profile equivalent to the previous code path;
the savings are in compositor-loop microseconds, not user-visible
fps. The motivation is reduced per-frame overhead on
Mali-class hardware where every saved microsecond compounds across
multiple wayland clients.
Side effect: removes the dependency on <linux/dma-buf.h> and
<xf86drm.h> in transaction.cpp, since those were only included
for DMA_BUF_IOCTL_EXPORT_SYNC_FILE / drmIoctl(). The
exportWaitSyncFile() helper is removed for the same reason.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
src/wayland/transaction.cpp | 39 +++++++++++++------------------------
1 file changed, 14 insertions(+), 25 deletions(-)
diff --git a/src/wayland/transaction.cpp b/src/wayland/transaction.cpp
index 967b22b..f55ea16 100644
--- a/src/wayland/transaction.cpp
+++ b/src/wayland/transaction.cpp
@@ -11,11 +11,6 @@
#include "wayland/subcompositor.h"
#include "wayland/surface_p.h"
-#if defined(Q_OS_LINUX)
-#include <linux/dma-buf.h>
-#include <xf86drm.h>
-#endif
-
namespace KWin
{
@@ -249,41 +244,35 @@ void Transaction::watchSyncObj(TransactionEntry *entry)
entry->fences.emplace_back(std::make_unique<TransactionFence>(this, std::move(eventFd)));
}
-#if defined(Q_OS_LINUX)
-static FileDescriptor exportWaitSyncFile(const FileDescriptor &fileDescriptor)
-{
- dma_buf_export_sync_file request{
- .flags = DMA_BUF_SYNC_READ,
- .fd = -1,
- };
- if (drmIoctl(fileDescriptor.get(), DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &request) == 0) {
- return FileDescriptor(request.fd);
- }
-
- return FileDescriptor{};
-}
-#endif
-
void Transaction::watchDmaBuf(TransactionEntry *entry)
{
-#if defined(Q_OS_LINUX)
const DmaBufAttributes *attributes = entry->buffer->dmabufAttributes();
if (!attributes) {
return;
}
+ // The dma-buf core (drivers/dma-buf/dma-buf.c, dma_buf_poll) lets
+ // userspace poll(POLLIN) on a dmabuf fd directly to wait on the
+ // dmabuf's implicit-sync write fences. Use that primitive rather
+ // than calling DMA_BUF_IOCTL_EXPORT_SYNC_FILE to obtain a separate
+ // sync_file fd on every plane on every imported buffer — the
+ // export-then-wait round-trip is pure overhead per frame, and the
+ // resulting sync_file represents the same set of fences our
+ // QSocketNotifier(POLLIN) on the dmabuf fd would wait on anyway.
+ //
+ // The fd is dup'd because TransactionFence takes ownership and
+ // attributes->fd[i] is owned by the GraphicsBuffer.
for (int i = 0; i < attributes->planeCount; ++i) {
const FileDescriptor &fileDescriptor = attributes->fd[i];
if (fileDescriptor.isReadable()) {
continue;
}
- auto syncFile = exportWaitSyncFile(fileDescriptor);
- if (syncFile.isValid()) {
- entry->fences.emplace_back(std::make_unique<TransactionFence>(this, std::move(syncFile)));
+ FileDescriptor dup_fd(::dup(fileDescriptor.get()));
+ if (dup_fd.isValid()) {
+ entry->fences.emplace_back(std::make_unique<TransactionFence>(this, std::move(dup_fd)));
}
}
-#endif
}
} // namespace KWin
--
2.47.3
@@ -0,0 +1,129 @@
# Maintainer: Markus Fritsche <mfritsche@reauktion.de>
# Upstream maintainers: Felix Yan, Antonio Rojas
# Contributor: Andrea Scarpino <andrea@archlinux.org>
#
# kwin-fourier — KWin 6.6.5 with the V4L2-stateless implicit-sync
# transaction wait bypass. Hypothesis: KWin's
# `Transaction::watchDmaBuf` calls DMA_BUF_IOCTL_EXPORT_SYNC_FILE on
# every plane of every imported dmabuf and parks the transaction on a
# QSocketNotifier waiting for the resulting sync_file fd to become
# readable. For V4L2 hantro CAPTURE buffers on RK3566 mainline 6.19,
# that fence either never signals or signals so late that chrome's
# 6-buffer V4L2 capture pool exhausts at ~6 seconds, blocking the
# decoder. mpv (single-surface attach pattern) merely slideshows
# under KWin (76% drop rate); chrome (subsurface attach) deadlocks.
#
# This experimental build no-ops `watchDmaBuf` to test the
# hypothesis. If chrome plays through end-to-end at the recorded
# 34.7% CPU number, the bug is confirmed and the upstream fix can be
# refined (e.g., short timeout, skip-on-V4L2, or use the dmabuf fd
# directly without exporting an extra sync_file). See
# ../chromium-fourier/KWIN_PIVOT.md for the full diagnosis thread.
pkgname=kwin-fourier
pkgver=6.6.5
_dirver=$(echo $pkgver | cut -d. -f1-3)
pkgrel=1
_upname=kwin
epoch=1
arch=(aarch64 x86_64)
url='https://kde.org/plasma-desktop/'
license=(LGPL-2.0-or-later)
depends=(aurorae
breeze
gcc-libs
glibc
iio-sensor-proxy
plasma-activities
kauth
kcmutils
kcolorscheme
kconfig
kcoreaddons
kcrash
kdbusaddons
kdeclarative
kdecoration
kglobalaccel
kglobalacceld
kguiaddons
ki18n
kidletime
kirigami
kitemmodels
knewstuff
knighttime
knotifications
kpackage
kquickcharts
kscreenlocker
kservice
ksvg
kwayland
kwidgetsaddons
kwindowsystem
kxmlgui
lcms2
libcanberra
libdisplay-info
libdrm
libei
libepoxy
libevdev
libinput
libpipewire
libqaccessibilityclient-qt6
libxcb
libxcvt
libxkbcommon
mesa
milou
pipewire-session-manager
libplasma
qt6-5compat
qt6-base
qt6-declarative
qt6-svg
qt6-tools
systemd-libs
wayland
xcb-util-keysyms
xcb-util-wm)
makedepends=(extra-cmake-modules
kdoctools
krunner
plasma-wayland-protocols
python
wayland-protocols
xorg-xwayland)
optdepends=('plasma-keyboard: virtual keyboard')
groups=(plasma)
provides=(kwin)
conflicts=(kwin)
replaces=(kwin)
source=(https://download.kde.org/stable/plasma/$_dirver/$_upname-$pkgver.tar.xz{,.sig}
0001-transaction-bypass-watchDmaBuf-fence-wait.patch)
sha256sums=('6c187ce7a5506090b438ef900103836fa0537674dde8b31e5b497ef321643cb4'
'SKIP'
'SKIP')
validpgpkeys=('E0A3EB202F8E57528E13E72FD7574483BB57B18D' # Jonathan Esk-Riddell <jr@jriddell.org>
'0AAC775BB6437A8D9AF7A3ACFE0784117FBCE11D' # Bhushan Shah <bshah@kde.org>
'D07BD8662C56CB291B316EB2F5675605C74E02CF' # David Edmundson <davidedmundson@kde.org>
'90A968ACA84537CC27B99EAF2C8DF587A6D4AAC1' # Nicolas Fella <nicolas.fella@kde.org>
'1FA881591C26B276D7A5518EEAAF29B42A678C20') # Marco Martin <notmart@gmail.com>
prepare() {
patch -d $_upname-$pkgver -p1 < 0001-transaction-bypass-watchDmaBuf-fence-wait.patch
}
build() {
cmake -B build -S $_upname-$pkgver \
-DCMAKE_INSTALL_LIBEXECDIR=lib \
-DBUILD_TESTING=OFF
cmake --build build
}
package() {
DESTDIR="$pkgdir" cmake --install build
setcap CAP_SYS_NICE=+ep "$pkgdir"/usr/bin/kwin_wayland
}
@@ -0,0 +1,110 @@
# kwin-fourier
A diagnostic patch that no-ops `Transaction::watchDmaBuf` in KWin
6.6.4. **Test fixture, not the upstream-bound shape.**
## Background
KWin's `Transaction::watchDmaBuf`
(`src/wayland/transaction.cpp:265`) calls
`DMA_BUF_IOCTL_EXPORT_SYNC_FILE` on every plane of every imported
dmabuf and parks the transaction on a `QSocketNotifier(POLLIN)`
waiting for the resulting sync_file fd to become readable.
For V4L2-produced dmabufs (hantro / rockchip-rga and any other vb2
driver), that fence either:
- **Stub-signals immediately**, because vb2 doesn't populate
`dma_resv` exclusive fences (see kernel layer in the top-level
README) and `dma_buf_export_sync_file` substitutes
`dma_fence_get_stub()`. Pure latency cost: a synchronous ioctl +
socket-notifier setup per frame, for a fence that signals in
microseconds and represents nothing real.
- **Signals very late or not at all**, on edge cases that we hit
during the chromium-fourier validation campaign. KWin's
transaction parks indefinitely; the previous wl_buffer never gets
released to the client; the client's V4L2 capture pool starves;
hard stall.
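
The export-then-poll pattern described above can be sketched in plain C. This is a minimal illustration, not KWin's code: the helper name `wait_write_fence_via_sync_file` is hypothetical, the `DMA_BUF_SYNC_READ` flag is an assumption about which direction the compositor waits on, and a blocking `poll()` stands in for the `QSocketNotifier`. The fallback definition only matters on kernel headers that predate the export ioctl.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-buf.h>

/* Fallback for older kernel headers that lack the export ioctl. */
#ifndef DMA_BUF_IOCTL_EXPORT_SYNC_FILE
struct dma_buf_export_sync_file {
    uint32_t flags;
    int32_t fd;
};
#define DMA_BUF_IOCTL_EXPORT_SYNC_FILE \
    _IOWR(DMA_BUF_BASE, 2, struct dma_buf_export_sync_file)
#endif

/*
 * Hypothetical helper: export the dmabuf's pending write fences as a
 * sync_file, then poll that sync_file fd for readability.
 * Returns 1 if the fence signalled, 0 on timeout, -1 on error
 * (errno is preserved from the failing call).
 */
int wait_write_fence_via_sync_file(int dmabuf_fd, int timeout_ms)
{
    struct dma_buf_export_sync_file req = {
        .flags = DMA_BUF_SYNC_READ, /* assumed: a reader waits for writers */
        .fd = -1,
    };
    struct pollfd pfd;
    int rc, saved;

    /* Step 1: one ioctl per plane per frame in the scheme described above. */
    if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &req) < 0)
        return -1;

    /* Step 2: wait for the exported sync_file to become readable. */
    pfd.fd = req.fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    saved = errno;
    close(req.fd); /* the sync_file fd is owned by the caller */
    errno = saved;

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

On a vb2-produced dmabuf, per the first bullet above, the call would be expected to return almost immediately because the exported fence is the stub: the ioctl and the poll are paid on every frame for a fence that represents nothing real.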
## Patches
This directory carries **two** patches — the PKGBUILD applies only
`0001` for now (validated on ohm), while `0002` is the
upstream-bound shape staged here for later validation and
submission.
### `0001-transaction-bypass-watchDmaBuf-fence-wait.patch` *(currently shipped)*
No-ops `Transaction::watchDmaBuf` entirely. Every transaction
commits without waiting on implicit-sync fences for the dmabufs
it imports. **Test fixture, validated end-to-end on ohm**: the
patch unblocks chromium-fourier 1080p30 H.264 playback under KDE
Plasma 6.6.4 Wayland on RK3566 + panfrost + mainline 6.19.
### `0002-transaction-poll-dmabuf-fd-directly-upstream-shape.patch` *(unvalidated, upstream-bound)*
Rewrites `Transaction::watchDmaBuf` to call `poll(POLLIN)` on the
dmabuf fd directly via a duplicated fd in a `QSocketNotifier`,
instead of going through `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` plus a
sync_file fd. The dma-buf core has supported polling the dmabuf fd
for implicit-sync write fences since dma-buf poll support was first
added, well before the export ioctl existed; the export-then-poll
round-trip is per-frame syscall overhead with no semantic difference.
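
A minimal sketch of the direct-poll shape, assuming nothing beyond POSIX `poll()` and `dup()`. The helper name `wait_fd_readable` is hypothetical and a blocking `poll()` stands in for the `QSocketNotifier`; it is not the code in `0002`. Passing a dmabuf fd makes `POLLIN` wait on the buffer's pending write fences; any other pollable fd (a pipe, say) exercises the same code path for testing.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll a duplicate of fd for POLLIN, so the wait
 * does not depend on the lifetime of the caller's fd.
 * Returns 1 if readable, 0 on timeout, -1 on error or if poll reports
 * only an error/hangup condition.
 */
int wait_fd_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc, saved;
    int dup_fd = dup(fd);

    if (dup_fd < 0)
        return -1;

    pfd.fd = dup_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    saved = errno;
    close(dup_fd);
    errno = saved;

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

Compared with the export-then-poll route, this drops one ioctl and one extra fd per plane per frame, while the wait itself is still a real wait on whatever fence the producer attached.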
This shape preserves KWin's defense — the wait still actually
*waits* on the producer's fence — while shedding the per-frame
overhead. It is **not validated yet** and is offered here as the
shape upstream review will likely converge on. Validation gates
before swapping the PKGBUILD to apply 0002 instead of 0001:
1. Build kwin-fourier with 0002 instead of 0001 (one PKGBUILD line
change).
2. Install on ohm; restart Plasma session so the new
`kwin_wayland` is mapped.
3. Run chromium-fourier + bbb sample as before. Expectation:
plays through end-to-end at the same ~81 % combined CPU.
Equivalence with 0001 confirms the upstream shape works
without weakening defenses.
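4. Capture before/after `ioctl` counts via `strace -c` on
   `kwin_wayland`. `dma_buf_export_sync_file` is the kernel-side
   handler for `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`, so the calls show up
   as `ioctl` in the summary (the per-frame syscall savings are the
   patch's claimed benefit).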
5. Submit to invent.kde.org/plasma/kwin against `master`.
## Why patch 0001 is *not* the upstream-bound shape
Wayland's security model is "compositor trusts no client";
`watchDmaBuf` is a defense against a misbehaving client that attaches
a buffer the GPU is still writing. The blanket no-op assumes that
every Wayland client honors the spec, an assumption KWin maintainers
are reasonably unwilling to make unconditionally.
**The upstream-correct fix lives in the kernel**: vb2 / hantro /
rga don't populate `dma_resv`; once they do, KWin's wait works
correctly because the fence is real. Once the kernel side lands,
KWin can either keep its current wait (now correct) or migrate to
`poll(POLLIN)` directly on the dmabuf fd, skipping the
`EXPORT_SYNC_FILE` ioctl.
The kwin-fourier patch in this repo is the **diagnostic** that
identified the kernel bug and lets the chromium-fourier validation
proceed today on stock kernel + KWin. It will be rewritten or
removed once the kernel side is upstream.
## Building / installing
```sh
makepkg -si
```
The PKGBUILD is derived from upstream Arch's kwin 6.6.4-1, applies
the single watchDmaBuf bypass, and sets `epoch=1` so the package
outranks the upstream version regardless of pkgrel.
## Side effect
Across the test session, every wp_linux_dmabuf client on the
compositor (chrome, brave, mpv, VLC, …) feels markedly snappier on
Mali-class hardware because the per-frame sync_file roundtrip is
gone. A pleasant accident; the cleaner, kernel-side fix will
preserve the speedup without weakening the defense.
@@ -0,0 +1,89 @@
# Maintainer: Markus Fritsche <fritsche.markus@gmail.com>
#
# libva-v4l2-request-fourier — VA-API backend for V4L2 stateless decoders,
# multiplanar fork. Successor to the predecessor experimental package
# libva-v4l2-request-ohm-gl-fix (tarball + 18-patch stack); this package
# tracks the campaign fork's git history directly, so iteration sweeps
# (DEBUG removal, follow-up bugfixes) land in a clean linear log.
#
# Campaign: ~/src/libva-multiplanar/ (iter8 close 2026-05-06) plus
# ~/src/fresnel-fourier/ which carried the fork to iter38b — multi-device
# probe so a single libva session serves all 5 codecs (rkvdec H.264 +
# HEVC + VP9, hantro MPEG-2 + VP8) plus a bounds-check fix for
# MAX_PROFILES. Pinned below.
# Fork repo: https://git.reauktion.de/marfrit/libva-v4l2-request-fourier
# Bootlin upstream: https://github.com/bootlin/libva-v4l2-request
#
# Build target: fermi LXC actrunner-aarch64-hertz via marfrit-packages
# Gitea Actions (path=arch/libva-v4l2-request-fourier triggers the
# libva-v4l2-request-fourier-aarch64 job — wire up alongside the existing
# ffmpeg-v4l2-request-git job in .gitea/workflows/build.yml).
# Alternative: boltzmann via his subagent + marfrit-publish.
pkgname=libva-v4l2-request-fourier
_upstreampkg=libva-v4l2-request
# Pin the fork tip. de27e95 = "v4l2: log error_idx + failing ctrl id
# on S_EXT_CTRLS failure" — Phase 8.13 diagnostic that surfaced the
# real root cause of the libva→daedalus_v4l2 request-completion
# timeout (turned out the EINVAL libva was logging was a harmless
# H264/HEVC probe; actual VP9 stateless control SET worked all along).
#
# Prior pin (7ac934e) was iter38b — fresnel-fourier multi-device probe
# + MAX_PROFILES bounds-check fix. de27e95 adds the daedalus_v4l2
# probe slot (b5b3acf), the meson option gate (2146341), and the
# S_EXT_CTRLS diagnostic (de27e95 itself). Backward-compatible on
# rkvdec / hantro / cedrus / rpi-hevc-dec hosts — daedalus probe is
# off by default unless the kernel module is present.
_commit=de27e95571b67ef34619c23a12db4698f9b3454e
# Project version from meson.build (1.0.0) + commit count + short sha,
# matching the ffmpeg-v4l2-request-fourier convention. Recomputed at
# build time by pkgver() below; the static value here is a placeholder
# so AUR-style consumers see something coherent before src/ exists.
pkgver=1.0.0.r376.de27e95
pkgrel=1
pkgdesc="VA-API backend for V4L2 stateless decoders (multiplanar fork — fourier umbrella)"
arch=('aarch64')
url="https://git.reauktion.de/marfrit/libva-v4l2-request-fourier"
license=('LGPL2.1' 'MIT')
depends=('libva' 'libdrm' 'systemd-libs')
makedepends=('meson' 'ninja' 'pkgconf' 'git')
provides=("${_upstreampkg}=${pkgver}" 'libva-driver')
conflicts=("${_upstreampkg}" 'libva-v4l2-request-ohm-gl-fix')
replaces=("${_upstreampkg}" 'libva-v4l2-request-ohm-gl-fix')
source=("git+https://git.reauktion.de/marfrit/libva-v4l2-request-fourier.git#commit=${_commit}")
sha256sums=('SKIP')
pkgver() {
cd "${srcdir}/${_upstreampkg}-fourier"
printf '1.0.0.r%s.%s' \
"$(git rev-list --count HEAD)" \
"$(git rev-parse --short=7 HEAD)"
}
build() {
cd "${srcdir}/${_upstreampkg}-fourier"
# meson_options.txt only exposes 'kernel_headers' — leave it empty to
# use system /usr/include kernel UAPI headers. No per-codec toggles.
#
# b_lto=false: override arch-meson's wrapper default of `-D b_lto=true`,
# which the makepkg.conf OPTIONS=(..., !lto, ...) line does NOT actually
# override (arch-meson hard-codes b_lto=true). The hand-built reproducer
# from issue #17 shows: LTO/ICF folds per-codec helpers together, and HEVC's
# multi-control-struct chain (SPS+PPS+DECODE_PARAMS+SLICE_PARAMS) gets the
# wrong helper instance pulled in at vaEndPicture, which segfaults. The 4 other
# codecs (single-control-struct) tolerate the folding by accident.
arch-meson build --buildtype=release -Db_lto=false
meson compile -C build
}
package() {
cd "${srcdir}/${_upstreampkg}-fourier"
meson install -C build --destdir "${pkgdir}"
install -Dm644 COPYING "${pkgdir}/usr/share/licenses/${pkgname}/COPYING"
install -Dm644 COPYING.LGPL "${pkgdir}/usr/share/licenses/${pkgname}/COPYING.LGPL"
install -Dm644 COPYING.MIT "${pkgdir}/usr/share/licenses/${pkgname}/COPYING.MIT"
}
@@ -0,0 +1,70 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] mplane: enable V4L2 multiplanar capture for NV12 on hantro-vpu
Fourier's local patch already wired multiplanar plumbing through
src/v4l2.c (helpers v4l2_type_video_{output,capture}() at lines 59-69,
struct v4l2_plane planes[] threading in QUERYBUF/QBUF/DQBUF, per-plane
EXPBUF loop at line 411) and through src/context.c, src/buffer.c,
src/picture.c via the v4l2_type_video_{output,capture}(video_format
->v4l2_mplane) helper calls.
The remaining gap: the NV12 entry in src/video.c was hardcoded to
v4l2_mplane=false, and the bootstrap path in src/surface.c was
hardcoded to singleplanar literals before video_format is populated.
This patch flips the NV12 entry to v4l2_mplane=true and updates the
two singleplanar literals in src/surface.c to their MPLANE variants:
- src/video.c:42 v4l2_mplane=false -> true (NV12 only;
Sunxi-tiled NV12 left at false for cedrus compatibility)
- src/surface.c:84 output_type = v4l2_type_video_output(true)
- src/surface.c:109 v4l2_find_format(..., CAPTURE_MPLANE, NV12)
Empirically, hantro-vpu (RK3568 mainline) advertises NV12 only under
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; querying the singleplanar type
returns no match (verified via VIDIOC_ENUM_FMT in Phase 3 GStreamer
strace baseline).
Trade-off accepted: legacy sunxi-cedrus singleplanar NV12 paths are
left unchanged via the SUNXI_TILED_NV12 entry (still mplane=false,
__arm__ only). Pure-NV12 cedrus on aarch64 would regress, but the
known userbase here is RK3566/RK3568 hantro.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
src/surface.c | 4 ++--
src/video.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
--- a/src/video.c
+++ b/src/video.c
@@ -39,7 +39,7 @@ static struct video_format formats[] = {
.description = "NV12 YUV",
.v4l2_format = V4L2_PIX_FMT_NV12,
.v4l2_buffers_count = 1,
- .v4l2_mplane = false,
+ .v4l2_mplane = true,
.drm_format = DRM_FORMAT_NV12,
.drm_modifier = DRM_FORMAT_MOD_NONE,
.planes_count = 2,
--- a/src/surface.c
+++ b/src/surface.c
@@ -81,7 +81,7 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
// we declare SET_FORMAT_OF_OUTPUT_ONCE to ensure v4l2_set_format only gets called once
// (in the first RequestCreateSurfaces2 call BEFORE any buffers are created later on)
unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE;
- unsigned int output_type = v4l2_type_video_output(false);
+ unsigned int output_type = v4l2_type_video_output(true);
if (!SET_FORMAT_OF_OUTPUT_ONCE) {
rc = v4l2_set_format(driver_data->video_fd, output_type, pixelformat,
@@ -106,7 +106,7 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
video_format = video_format_find(V4L2_PIX_FMT_SUNXI_TILED_NV12);
found = v4l2_find_format(driver_data->video_fd,
- V4L2_BUF_TYPE_VIDEO_CAPTURE,
+ V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
V4L2_PIX_FMT_NV12);
if (found)
video_format = video_format_find(V4L2_PIX_FMT_NV12);
@@ -0,0 +1,103 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] context: pre-STREAMON device controls and minimum OUTPUT pool
Two related fixes that surfaced during the first hantro-vpu (RK3568)
smoke test of the multiplanar build:
1. **OUTPUT queue must be non-empty at STREAMON.** Hantro's
vb2_start_streaming rejects an empty queue with EINVAL. Some
VA-API callers (notably ffmpeg's vaapi-copy path) call
vaCreateContext with num_render_targets=0 and allocate render
targets lazily. The OUTPUT (bitstream-input) pool must NOT be
sized off surfaces_count alone — it is a request-time resource,
not per-surface. Quick fix: floor the pool to 4 buffers when
the caller passes 0. (A proper decoupling of OUTPUT pool from
surface lifecycle is documented in upstreamable_design.md.)
2. **Device-wide stateless H.264 controls before STREAMON.** The
V4L2 stateless framework requires V4L2_CID_STATELESS_H264_
DECODE_MODE and START_CODE be set on the device fd
(request_fd=-1) before stream start. Per-request controls
(SPS/PPS/SLICE_PARAMS/etc.) attached to a request_fd come
later via h264_set_controls(). hantro-vpu accepts only
DECODE_MODE_FRAME_BASED; START_CODE_ANNEX_B matches what the
existing slice-assembly path emits.
This is set unconditionally for now (errors silently ignored)
to keep cedrus and other backends compatible — they may
default to SLICE_BASED and not expose DECODE_MODE at all.
Probe-then-set via VIDIOC_QUERYCTRL is the upstream-correct
approach (see upstreamable_design.md §3).
After this patch, vainfo still enumerates as before, but the first
mpv vaapi-copy attempt advances past STREAMON and into actual
decode submission.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
src/context.c | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
--- a/src/context.c
+++ b/src/context.c
@@ -64,6 +64,7 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
VAContextID id;
VAStatus status;
unsigned int output_type, capture_type;
+ unsigned int output_buffers_count;
unsigned int index_base;
unsigned int index;
unsigned int i;
@@ -90,8 +91,16 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
}
memset(&context_object->dpb, 0, sizeof(context_object->dpb));
+ /*
+ * The OUTPUT (bitstream-input) queue must be non-empty before
+ * VIDIOC_STREAMON or hantro-class drivers reject with EINVAL.
+ * VA-API callers (e.g. ffmpeg's vaapi-copy path) may invoke
+ * vaCreateContext with num_render_targets=0; allocate a small
+ * minimum pool independent of the caller's surface count.
+ */
+ output_buffers_count = surfaces_count > 0 ? surfaces_count : 4;
rc = v4l2_create_buffers(driver_data->video_fd, output_type,
- surfaces_count, &index_base);
+ output_buffers_count, &index_base);
if (rc < 0) {
status = VA_STATUS_ERROR_ALLOCATION_FAILED;
goto error;
@@ -138,6 +147,33 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
surface_object->source_size = length;
}
+ /*
+ * Stateless H.264 device-wide controls. The kernel V4L2 stateless
+ * framework requires DECODE_MODE and START_CODE be set on the
+ * device fd (request_fd=-1) before VIDIOC_STREAMON; per-request
+ * controls (SPS/PPS/etc.) attached to a request_fd come later.
+ *
+ * hantro-vpu (RK3568) accepts only DECODE_MODE_FRAME_BASED.
+ * START_CODE_ANNEX_B preserves leading 0x00000001 in the slice
+ * payload that h264.c assembles. Errors here are not fatal: not
+ * every backing driver supports both controls (e.g. cedrus may
+ * default to SLICE_BASED without exposing DECODE_MODE).
+ */
+ {
+ struct v4l2_ext_control dev_ctrls[2] = {
+ {
+ .id = V4L2_CID_STATELESS_H264_DECODE_MODE,
+ .value = V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED,
+ },
+ {
+ .id = V4L2_CID_STATELESS_H264_START_CODE,
+ .value = V4L2_STATELESS_H264_START_CODE_ANNEX_B,
+ },
+ };
+ (void)v4l2_set_controls(driver_data->video_fd, -1,
+ dev_ctrls, 2);
+ }
+
rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
if (rc < 0) {
status = VA_STATUS_ERROR_OPERATION_FAILED;
@@ -0,0 +1,145 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] v4l2: add QUERYCTRL/QUERYMENU capability-probe helpers
Pure utility additions, no behaviour change. Three helpers in
src/v4l2.{c,h}:
- v4l2_query_ext_ctrl(): wraps VIDIOC_QUERY_EXT_CTRL by CID.
Returns 0 if the control exists, -1 if not. Caller passes NULL
qec to test existence only.
- v4l2_query_menu(): wraps VIDIOC_QUERYMENU at a given index.
Returns 0 if a menu item exists at that index, -1 otherwise.
- v4l2_ctrl_menu_has_value(): convenience layered on the above.
For a menu/intmenu-type control, walks all menu items between
minimum and maximum and returns true iff `value` is a valid
entry. Used by callers that ask "does this driver accept menu
value X for this CID?" without caring about iteration details.
These unblock commit 3 (request_pool — needs ext-ctrl probing for
codec-ops dispatch) and commit 4 (probe-then-set DECODE_MODE/
START_CODE — replaces 0002's unconditional set with a real probe)
of the upstreamable design's six-commit series.
Forward-declarations in v4l2.h keep the header lean: existing
prototypes already use opaque struct v4l2_ext_control * pointers
without including <linux/videodev2.h>; we follow the same
convention for struct v4l2_query_ext_ctrl and struct v4l2_querymenu.
No call sites added in this commit. Compile-only verification:
the .so links cleanly with three new exports.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
src/v4l2.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
src/v4l2.h | 33 +++++++++++++++++++++++++++++
2 files changed, 93 insertions(+)
--- a/src/v4l2.h
+++ b/src/v4l2.h
@@ -64,4 +64,37 @@ int v4l2_set_control(int video_fd, int request_fd, unsigned int id, void *data,
unsigned int size);
int v4l2_set_stream(int video_fd, unsigned int type, bool enable);
+/*
+ * Capability-probe helpers. These let calling code discover what the
+ * backing kernel driver supports rather than hardcoding assumptions
+ * about specific decoder hardware.
+ */
+
+/*
+ * Query the metadata of an extended control by CID. Fills *qec on
+ * success. Returns 0 if the control exists, -1 (errno=EINVAL) if the
+ * driver does not expose this CID. Pass qec=NULL to test existence
+ * only.
+ */
+struct v4l2_query_ext_ctrl;
+int v4l2_query_ext_ctrl(int video_fd, unsigned int id,
+ struct v4l2_query_ext_ctrl *qec);
+
+/*
+ * Query a single menu item of a menu/intmenu control at the given
+ * index. Fills *qm on success. Returns 0 if the menu item exists at
+ * this index, -1 otherwise.
+ */
+struct v4l2_querymenu;
+int v4l2_query_menu(int video_fd, unsigned int id, unsigned int index,
+ struct v4l2_querymenu *qm);
+
+/*
+ * Convenience: for a menu-type control, return true iff `value` is a
+ * valid menu entry (i.e. the driver accepts it). Walks all menu items
+ * up to the control's maximum to check.
+ */
+bool v4l2_ctrl_menu_has_value(int video_fd, unsigned int id,
+ unsigned int value);
+
#endif
--- a/src/v4l2.c
+++ b/src/v4l2.c
@@ -508,3 +508,63 @@ int v4l2_set_stream(int video_fd, unsigned int type, bool enable)
return 0;
}
+
+int v4l2_query_ext_ctrl(int video_fd, unsigned int id,
+ struct v4l2_query_ext_ctrl *qec)
+{
+ struct v4l2_query_ext_ctrl local;
+ struct v4l2_query_ext_ctrl *target = qec ? qec : &local;
+ int rc;
+
+ memset(target, 0, sizeof(*target));
+ target->id = id;
+
+ rc = ioctl(video_fd, VIDIOC_QUERY_EXT_CTRL, target);
+ if (rc < 0)
+ return -1;
+
+ return 0;
+}
+
+int v4l2_query_menu(int video_fd, unsigned int id, unsigned int index,
+ struct v4l2_querymenu *qm)
+{
+ int rc;
+
+ if (qm == NULL)
+ return -1;
+
+ memset(qm, 0, sizeof(*qm));
+ qm->id = id;
+ qm->index = index;
+
+ rc = ioctl(video_fd, VIDIOC_QUERYMENU, qm);
+ if (rc < 0)
+ return -1;
+
+ return 0;
+}
+
+bool v4l2_ctrl_menu_has_value(int video_fd, unsigned int id,
+ unsigned int value)
+{
+ struct v4l2_query_ext_ctrl qec;
+ struct v4l2_querymenu qm;
+ long long i;
+
+ if (v4l2_query_ext_ctrl(video_fd, id, &qec) < 0)
+ return false;
+
+ if (qec.type != V4L2_CTRL_TYPE_MENU &&
+ qec.type != V4L2_CTRL_TYPE_INTEGER_MENU)
+ return false;
+
+ for (i = qec.minimum; i <= qec.maximum; i += qec.step ? qec.step : 1) {
+ if (v4l2_query_menu(video_fd, id, (unsigned int)i, &qm) < 0)
+ continue;
+ if ((unsigned int)i == value)
+ return true;
+ }
+
+ return false;
+}
@@ -0,0 +1,545 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] context: introduce request_pool, decouple OUTPUT buffers from surfaces
Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5).
Replaces the prior per-surface OUTPUT-buffer ownership model with a
small driver-wide pool sized by codec pipeline depth (4 H.264 frames
in flight), allocated unconditionally regardless of caller's
num_render_targets.
Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg
v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer
v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's
"output_count == surfaces_count" model was a category error: OUTPUT
buffers are request-time bitstream slots, CAPTURE buffers are
picture-time DPB slots; their lifecycles and sizing are independent.
Changes:
* NEW src/request_pool.{c,h} (~200 LoC):
- request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap.
- request_pool_destroy(): munmap all, idempotent.
- request_pool_acquire(): round-robin claim; returns V4L2 buffer
index of an unused slot or -1.
- request_pool_release(): mark slot free for reuse.
- request_pool_slot(): accessor for ptr/size given a buffer index.
* src/request.h: add struct request_pool output_pool to request_data.
* src/context.c::RequestCreateContext: replace the per-surface
OUTPUT loop with a single request_pool_init() call (count=4,
independent of surfaces_count). Drop the now-unused locals
(length, offset, source_data, output_buffers_count, index,
index_base, i, surface_object). DELETES patch 0002's
"output_buffers_count = ... ? ... : 4" hack inline — the pool's
own count parameter supersedes it.
* src/picture.c::RequestBeginPicture: borrow a pool slot at frame
start, write its mmap pointer/size/index into the surface's
transient source_* fields. The fields stay (still useful as
a borrow handle that the existing codec_store_buffer memcpys
target), but no longer represent surface-permanent ownership.
Reset slices_size/slices_count here too (was implicit on first
Render).
* src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns
the OUTPUT buffer, release the pool slot and clear the surface's
borrow handle. Fixes the segv on second-frame submission.
* src/surface.c::RequestDestroySurfaces: remove the munmap of
source_data — pool owns the mmap.
* src/request.c::RequestTerminate: call request_pool_destroy()
before close(video_fd) so no mappings outlive the open device.
* src/meson.build: add request_pool.c and request_pool.h to the
sources/headers lists.
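
For illustration only, the round-robin acquire/release behaviour listed above can be sketched without any V4L2 calls. All `sketch_*` names are hypothetical and this is not the code in src/request_pool.c; it assumes a fixed pool of 4 slots and tracks only the buffer index and the busy flag:

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_POOL_SLOTS 4 /* assumed pipeline depth, as in the message */

struct sketch_slot {
    unsigned int index; /* stand-in for the V4L2 buffer index */
    bool busy;          /* true while borrowed for a request */
};

struct sketch_pool {
    struct sketch_slot slots[SKETCH_POOL_SLOTS];
    unsigned int next;  /* round-robin cursor */
};

/* Number the slots from index_base upward and mark them all free. */
void sketch_pool_init(struct sketch_pool *pool, unsigned int index_base)
{
    unsigned int i;

    for (i = 0; i < SKETCH_POOL_SLOTS; i++) {
        pool->slots[i].index = index_base + i;
        pool->slots[i].busy = false;
    }
    pool->next = 0;
}

/* Claim the first free slot at or after the cursor; -1 if all are busy. */
int sketch_pool_acquire(struct sketch_pool *pool)
{
    unsigned int n;

    for (n = 0; n < SKETCH_POOL_SLOTS; n++) {
        unsigned int i = (pool->next + n) % SKETCH_POOL_SLOTS;

        if (!pool->slots[i].busy) {
            pool->slots[i].busy = true;
            pool->next = (i + 1) % SKETCH_POOL_SLOTS;
            return (int)pool->slots[i].index;
        }
    }
    return -1;
}

/* Free the slot that owns the given buffer index; unknown indices are ignored. */
void sketch_pool_release(struct sketch_pool *pool, unsigned int index)
{
    unsigned int i;

    for (i = 0; i < SKETCH_POOL_SLOTS; i++) {
        if (pool->slots[i].index == index) {
            pool->slots[i].busy = false;
            return;
        }
    }
}
```

In this sketch a fifth acquire fails until some slot is released, which mirrors the Begin/Sync pairing described for picture.c and surface.c.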
This commit removes 0002's OUTPUT-pool hack inline (the
"floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002
remains until commit 4 lands.
Build-verified clean on aarch64.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
--- a/src/request.h 2026-05-01 20:09:57.346428828 +0000
+++ b/src/request.h 2026-05-01 20:17:57.497514185 +0000
@@ -31,6 +31,7 @@
#include "context.h"
#include "object_heap.h"
+#include "request_pool.h"
#include "video.h"
#include <va/va.h>
@@ -55,6 +56,13 @@
int media_fd;
struct video_format *video_format;
+
+ /*
+ * OUTPUT (bitstream-input) buffer pool, decoupled from VA
+ * surfaces. Sized by codec pipeline depth, populated on first
+ * RequestCreateContext, torn down at driver Terminate.
+ */
+ struct request_pool output_pool;
};
VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context);
--- a/src/request.c 2026-05-01 20:09:57.346428828 +0000
+++ b/src/request.c 2026-05-01 20:19:48.091143681 +0000
@@ -205,6 +205,13 @@
struct object_config *config_object;
int iterator;
+ /*
+ * Tear down the OUTPUT buffer pool before closing video_fd. munmap
+ * does not need the fd, but unmapping first means no stale mappings
+ * keep the vb2 buffers alive after the device is closed.
+ */
+ request_pool_destroy(&driver_data->output_pool);
+
close(driver_data->video_fd);
close(driver_data->media_fd);
--- a/src/context.c 2026-05-01 20:09:57.346428828 +0000
+++ b/src/context.c 2026-05-01 20:18:33.738048227 +0000
@@ -54,20 +54,12 @@
{
struct request_data *driver_data = context->pDriverData;
struct object_config *config_object;
- struct object_surface *surface_object;
struct object_context *context_object = NULL;
struct video_format *video_format;
- unsigned int length;
- unsigned int offset;
- void *source_data = MAP_FAILED;
VASurfaceID *ids = NULL;
VAContextID id;
VAStatus status;
unsigned int output_type, capture_type;
- unsigned int output_buffers_count;
- unsigned int index_base;
- unsigned int index;
- unsigned int i;
int rc;
video_format = driver_data->video_format;
@@ -92,15 +84,20 @@
memset(&context_object->dpb, 0, sizeof(context_object->dpb));
/*
- * The OUTPUT (bitstream-input) queue must be non-empty before
- * VIDIOC_STREAMON or hantro-class drivers reject with EINVAL.
- * VA-API callers (e.g. ffmpeg's vaapi-copy path) may invoke
- * vaCreateContext with num_render_targets=0; allocate a small
- * minimum pool independent of the caller's surface count.
+ * Initialize the OUTPUT (bitstream-input) buffer pool. Sized by
+ * codec pipeline depth (4 H.264 frames in flight is sufficient
+ * for current hantro/rkvdec scheduling); independent of caller-
+ * supplied surfaces_count. Pool is owned by driver_data so it
+ * outlives any single context destroy/recreate cycle.
+ *
+ * This replaces the prior per-surface OUTPUT loop, which (a)
+ * created an empty queue when surfaces_count==0 (ffmpeg vaapi-
+ * copy path) and (b) only populated surface->source_* for
+ * surfaces present at vaCreateContext time, NULL-derefing on
+ * surfaces created later.
*/
- output_buffers_count = surfaces_count > 0 ? surfaces_count : 4;
- rc = v4l2_create_buffers(driver_data->video_fd, output_type,
- output_buffers_count, &index_base);
+ rc = request_pool_init(&driver_data->output_pool,
+ driver_data->video_fd, output_type, 4);
if (rc < 0) {
status = VA_STATUS_ERROR_ALLOCATION_FAILED;
goto error;
@@ -111,40 +108,15 @@
* we don't have any indication wrt its life time. Let's make sure
* its life span is under our control.
*/
- ids = malloc(surfaces_count * sizeof(VASurfaceID));
- if (ids == NULL) {
- status = VA_STATUS_ERROR_ALLOCATION_FAILED;
- goto error;
- }
-
- memcpy(ids, surfaces_ids, surfaces_count * sizeof(VASurfaceID));
-
- for (i = 0; i < surfaces_count; i++) {
- index = index_base + i;
-
- surface_object = SURFACE(driver_data, surfaces_ids[i]);
- if (surface_object == NULL) {
- status = VA_STATUS_ERROR_INVALID_SURFACE;
- goto error;
- }
-
- rc = v4l2_query_buffer(driver_data->video_fd, output_type,
- index, &length, &offset, 1);
- if (rc < 0) {
+ if (surfaces_count > 0) {
+ ids = malloc(surfaces_count * sizeof(VASurfaceID));
+ if (ids == NULL) {
status = VA_STATUS_ERROR_ALLOCATION_FAILED;
goto error;
}
- source_data = mmap(NULL, length, PROT_READ | PROT_WRITE,
- MAP_SHARED, driver_data->video_fd, offset);
- if (source_data == MAP_FAILED) {
- status = VA_STATUS_ERROR_ALLOCATION_FAILED;
- goto error;
- }
-
- surface_object->source_index = index;
- surface_object->source_data = source_data;
- surface_object->source_size = length;
+ memcpy(ids, surfaces_ids,
+ surfaces_count * sizeof(VASurfaceID));
}
/*
@@ -200,9 +172,6 @@
goto complete;
error:
- if (source_data != MAP_FAILED)
- munmap(source_data, length);
-
if (ids != NULL)
free(ids);
--- a/src/picture.c 2026-05-01 20:09:57.346428828 +0000
+++ b/src/picture.c 2026-05-01 20:19:10.742593454 +0000
@@ -216,6 +216,8 @@
struct request_data *driver_data = context->pDriverData;
struct object_context *context_object;
struct object_surface *surface_object;
+ struct request_pool_slot *slot;
+ int slot_index;
context_object = CONTEXT(driver_data, context_id);
if (context_object == NULL)
@@ -228,6 +230,31 @@
if (surface_object->status == VASurfaceRendering)
RequestSyncSurface(context, surface_id);
+ /*
+ * Borrow an OUTPUT (bitstream-input) slot from the driver-wide
+ * pool for the duration of this Begin/Render/End cycle. The
+ * surface's source_* fields hold the borrow's mmap pointer/size/
+ * V4L2 buffer index until RequestSyncSurface releases it after
+ * VIDIOC_DQBUF.
+ */
+ slot_index = request_pool_acquire(&driver_data->output_pool);
+ if (slot_index < 0)
+ return VA_STATUS_ERROR_ALLOCATION_FAILED;
+
+ slot = request_pool_slot(&driver_data->output_pool,
+ (unsigned int)slot_index);
+ if (slot == NULL) {
+ request_pool_release(&driver_data->output_pool,
+ (unsigned int)slot_index);
+ return VA_STATUS_ERROR_ALLOCATION_FAILED;
+ }
+
+ surface_object->source_index = slot->index;
+ surface_object->source_data = slot->data;
+ surface_object->source_size = slot->size;
+ surface_object->slices_size = 0;
+ surface_object->slices_count = 0;
+
surface_object->status = VASurfaceRendering;
context_object->render_surface_id = surface_id;
--- a/src/surface.c 2026-05-01 20:09:57.346428828 +0000
+++ b/src/surface.c 2026-05-01 20:19:35.490958060 +0000
@@ -254,10 +254,11 @@
if (surface_object == NULL)
return VA_STATUS_ERROR_INVALID_SURFACE;
- if (surface_object->source_data != NULL &&
- surface_object->source_size > 0)
- munmap(surface_object->source_data,
- surface_object->source_size);
+ /*
+ * source_* are now transient borrows from request_pool, not
+ * surface-owned mappings; the pool owns the underlying mmap.
+ * Nothing to free here.
+ */
for (j = 0; j < surface_object->destination_buffers_count; j++)
if (surface_object->destination_map[j] != NULL &&
@@ -336,6 +337,15 @@
goto error;
}
+ /*
+ * OUTPUT buffer is back from the kernel: return its pool slot
+ * for reuse and clear the surface's transient borrow handle.
+ */
+ request_pool_release(&driver_data->output_pool,
+ surface_object->source_index);
+ surface_object->source_data = NULL;
+ surface_object->source_size = 0;
+
rc = v4l2_dequeue_buffer(driver_data->video_fd, -1, capture_type,
surface_object->destination_index,
surface_object->destination_buffers_count);
--- a/src/meson.build 2026-05-01 20:09:57.346428828 +0000
+++ b/src/meson.build 2026-05-01 20:20:04.775389455 +0000
@@ -44,6 +44,7 @@
'v4l2.c',
'mpeg2.c',
'h264.c',
+ 'request_pool.c',
# 'h265.c'
]
@@ -64,6 +65,7 @@
'v4l2.h',
'mpeg2.h',
'h264.h',
+ 'request_pool.h',
# 'h265.h'
]
--- a/src/request_pool.h 2025-09-03 18:38:22.431999998 +0000
+++ b/src/request_pool.h 2026-05-01 20:17:37.517219722 +0000
@@ -0,0 +1,84 @@
+/*
+ * Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
+ */
+
+#ifndef _REQUEST_POOL_H_
+#define _REQUEST_POOL_H_
+
+#include <stdbool.h>
+
+/*
+ * OUTPUT (bitstream-input) buffer pool, decoupled from caller-allocated
+ * VA surfaces. Sizing is driven by codec pipeline depth (typically 4
+ * for H.264), not by the consumer's surface count.
+ *
+ * The pool owns the V4L2 buffer indices and their mmap pointers. A
+ * decode request "borrows" a slot at vaBeginPicture, fills it across
+ * vaRenderPicture calls, queues it at vaEndPicture, and releases it
+ * after VIDIOC_DQBUF returns.
+ *
+ * This replaces the per-surface OUTPUT-buffer ownership model in the
+ * pre-refactor code, where object_surface.source_* fields permanently
+ * held a single OUTPUT buffer per surface — incorrect because OUTPUT
+ * buffers are request-time resources, not picture-time resources, and
+ * because the per-surface loop in RequestCreateContext only ran when
+ * surfaces_count > 0 (breaking ffmpeg's vaapi-copy num_render_targets=0
+ * convention).
+ */
+
+struct request_pool_slot {
+ unsigned int index; /* V4L2 buffer index in OUTPUT queue */
+ void *data; /* mmap pointer for this slot */
+ unsigned int size; /* mmap size in bytes */
+ bool busy; /* true while borrowed for a request */
+};
+
+struct request_pool {
+ struct request_pool_slot *slots;
+ unsigned int count;
+ unsigned int next; /* round-robin acquire cursor */
+ bool initialized;
+};
+
+/*
+ * Allocate count OUTPUT buffers via VIDIOC_CREATE_BUFS, query and mmap
+ * each, populate pool->slots[]. Caller must have already done
+ * VIDIOC_S_FMT on the OUTPUT queue. Returns 0 on success, -1 on
+ * failure.
+ */
+int request_pool_init(struct request_pool *pool, int video_fd,
+ unsigned int output_type, unsigned int count);
+
+/*
+ * Munmap all slots and free the slots array. Idempotent.
+ */
+void request_pool_destroy(struct request_pool *pool);
+
+/*
+ * Claim the next free slot (round-robin). Returns the slot's V4L2
+ * buffer index on success (pass it to request_pool_slot() to reach
+ * the slot itself), or -1 if all slots are busy.
+ */
+int request_pool_acquire(struct request_pool *pool);
+
+/*
+ * Mark the slot owning V4L2 buffer index `index` free for reuse. Caller
+ * must pass the index returned earlier from request_pool_acquire().
+ */
+void request_pool_release(struct request_pool *pool, unsigned int index);
+
+/*
+ * Look up the pool slot owning a given V4L2 buffer index. Returns
+ * pointer to the slot on success, NULL if no slot owns that index.
+ * The returned pointer is valid until pool destruction; do not free.
+ */
+struct request_pool_slot *request_pool_slot(struct request_pool *pool,
+ unsigned int index);
+
+#endif
--- a/src/request_pool.c 2025-09-03 18:38:22.431999998 +0000
+++ b/src/request_pool.c 2026-05-01 20:17:37.537220017 +0000
@@ -0,0 +1,147 @@
+/*
+ * Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
+ */
+
+#include "request_pool.h"
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+
+#include "utils.h"
+#include "v4l2.h"
+
+int request_pool_init(struct request_pool *pool, int video_fd,
+ unsigned int output_type, unsigned int count)
+{
+ unsigned int index_base;
+ unsigned int length;
+ unsigned int offset;
+ unsigned int i;
+ int rc;
+
+ if (pool == NULL || count == 0)
+ return -1;
+
+ if (pool->initialized)
+ return 0;
+
+ pool->slots = calloc(count, sizeof(*pool->slots));
+ if (pool->slots == NULL)
+ return -1;
+
+ pool->count = count;
+ pool->next = 0;
+
+ rc = v4l2_create_buffers(video_fd, output_type, count, &index_base);
+ if (rc < 0)
+ goto error;
+
+ for (i = 0; i < count; i++) {
+ pool->slots[i].index = index_base + i;
+ pool->slots[i].busy = false;
+
+ rc = v4l2_query_buffer(video_fd, output_type,
+ pool->slots[i].index,
+ &length, &offset, 1);
+ if (rc < 0)
+ goto error;
+
+ pool->slots[i].data = mmap(NULL, length,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED, video_fd, offset);
+ if (pool->slots[i].data == MAP_FAILED) {
+ pool->slots[i].data = NULL;
+ goto error;
+ }
+
+ pool->slots[i].size = length;
+ }
+
+ pool->initialized = true;
+ return 0;
+
+error:
+ request_pool_destroy(pool);
+ return -1;
+}
+
+void request_pool_destroy(struct request_pool *pool)
+{
+ unsigned int i;
+
+ if (pool == NULL || pool->slots == NULL)
+ return;
+
+ for (i = 0; i < pool->count; i++) {
+ if (pool->slots[i].data != NULL && pool->slots[i].size > 0)
+ munmap(pool->slots[i].data, pool->slots[i].size);
+ }
+
+ free(pool->slots);
+ pool->slots = NULL;
+ pool->count = 0;
+ pool->next = 0;
+ pool->initialized = false;
+}
+
+int request_pool_acquire(struct request_pool *pool)
+{
+ unsigned int start;
+ unsigned int i;
+
+ if (pool == NULL || !pool->initialized || pool->count == 0)
+ return -1;
+
+ start = pool->next;
+ for (i = 0; i < pool->count; i++) {
+ unsigned int slot = (start + i) % pool->count;
+
+ if (!pool->slots[slot].busy) {
+ pool->slots[slot].busy = true;
+ pool->next = (slot + 1) % pool->count;
+ return (int)pool->slots[slot].index;
+ }
+ }
+
+ /* All slots busy; caller must wait for an in-flight DQBUF. */
+ return -1;
+}
+
+void request_pool_release(struct request_pool *pool, unsigned int index)
+{
+ unsigned int i;
+
+ if (pool == NULL || pool->slots == NULL)
+ return;
+
+ for (i = 0; i < pool->count; i++) {
+ if (pool->slots[i].index == index) {
+ pool->slots[i].busy = false;
+ return;
+ }
+ }
+}
+
+struct request_pool_slot *request_pool_slot(struct request_pool *pool,
+ unsigned int index)
+{
+ unsigned int i;
+
+ if (pool == NULL || pool->slots == NULL)
+ return NULL;
+
+ for (i = 0; i < pool->count; i++) {
+ if (pool->slots[i].index == index)
+ return &pool->slots[i];
+ }
+
+ return NULL;
+}
@@ -0,0 +1,61 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] h264: submit PRED_WEIGHTS only when WEIGHTED_PRED applies
Per kernel UAPI (include/uapi/linux/v4l2-controls.h),
V4L2_CID_STATELESS_H264_PRED_WEIGHTS is a conditional control:
V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) :=
((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) &&
(slice_type == P || slice_type == SP)) ||
(pps->weighted_bipred_idc == 1 && slice_type == B)
Submitting PRED_WEIGHTS on a frame where the macro evaluates false
triggers VIDIOC_S_EXT_CTRLS to return EINVAL at error_idx=5 (the
6th, last control in the per-request batch) on hantro-vpu and any
other driver that strictly enforces the spec.
Smoke trace from RK3568 hantro on bbb_1080p30 (Main profile, no
weighted prediction): every per-frame batch fails identically, 13
EINVALs over a 10-frame run. Without this fix, ffmpeg's vaapi-copy
falls back to software decode for every frame.
Fix: narrow num_controls to 5 (excluding PRED_WEIGHTS at index 5)
when the macro returns false; keep at 6 when it returns true.
Defect found and fixed via Phase 6 Step 1 ohm smoke testing. Not
part of Sonnet's six-commit upstreamable plan; slotted in as patch
0005 ahead of the planned probe-then-set / FRAME_BASED commits
because it unblocks per-frame submission on every backing driver,
not just hantro.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
--- a/src/h264.c 2026-05-01 20:17:02.108697824 +0000
+++ b/src/h264.c 2026-05-01 20:30:02.632190563 +0000
@@ -559,8 +559,24 @@
}
};
+ /*
+ * PRED_WEIGHTS is conditionally required per kernel UAPI:
+ * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) is only
+ * true when explicit weighted prediction applies (P/SP slice
+ * with WEIGHTED_PRED flag, or B slice with weighted_bipred_idc
+ * == 1). Submitting it unconditionally on a frame that does
+ * not need it triggers EINVAL at error_idx=5 on hantro and
+ * other drivers that strictly enforce the spec.
+ *
+ * controls[5] is PRED_WEIGHTS (last in array); narrow the
+ * submission count to exclude it when not required.
+ */
+ unsigned int num_controls = 6;
+ if (!V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
+ num_controls = 5;
+
rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
- controls, 6);
+ controls, num_controls);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
@@ -0,0 +1,128 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] h264: omit per-slice controls in FRAME_BASED mode
Identified by cross-reference against GStreamer's
gst-plugins-bad/sys/v4l2codecs/gstv4l2codech264dec.c (upstream commit
9e3e775). At lines 1263-1304, GStreamer gates SLICE_PARAMS and
PRED_WEIGHTS submission on is_slice_based(self):
if (is_slice_based (self)) {
control[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
...
control[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
...
}
In V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED, the kernel parses the
bitstream itself from the OUTPUT-queue payload; per-slice controls in
the request trigger cluster-validation EINVAL at error_idx=count
(observed on RK3568 hantro-vpu, kernel 6.19.10).
This patch:
- Reorders controls[] so FRAME_BASED-required entries come first
(SPS, PPS, SCALING_MATRIX, DECODE_PARAMS at indices 0..3) and the
SLICE_BASED-only entries come last (SLICE_PARAMS, PRED_WEIGHTS at
indices 4..5).
- Defaults num_controls=4 (FRAME_BASED), expanding to 5 for
SLICE_BASED and 6 when V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
- Hardcodes slice_based=false for now since patch 0002 sets the
device to FRAME_BASED unconditionally. A TODO marks the spot for
the planned probe-then-set commit, which will populate
context->decode_mode at CreateContext via VIDIOC_QUERYCTRL/
G_EXT_CTRLS and replace the hardcoded false with a runtime check.
Diagnosis chain:
- patch 0005 removed one EINVAL per frame on PRED_WEIGHTS
submission, but cluster-level rejection persisted at error_idx=5
(== count), meaning the kernel walked all 5 controls cleanly but
rejected the request as a whole.
- dmesg silent → rejection in V4L2 core (v4l2-ctrls-request.c /
v4l2-h264.c), not in hantro driver where it could log.
- GStreamer reference confirmed FRAME_BASED contract: only 4
sequence-and-frame-level controls go in the per-request batch.
After this patch the kernel should accept the per-request controls
and actually decode the bitstream into the CAPTURE buffer.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
--- a/src/h264.c 2026-05-01 20:30:02.632190563 +0000
+++ b/src/h264.c 2026-05-01 20:49:46.937497317 +0000
@@ -531,6 +531,21 @@
sps.profile_idc = h264_profile_to_idc(profile);
+ /*
+ * Per-request control batch, ordered so the controls REQUIRED in
+ * V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED come first
+ * (indices 0..3) and the SLICE_BASED-only controls come last
+ * (indices 4..5).
+ *
+ * Cross-reference: GStreamer gst-plugins-bad
+ * sys/v4l2codecs/gstv4l2codech264dec.c (commit 9e3e775,
+ * lines 1263-1304) gates SLICE_PARAMS and PRED_WEIGHTS on
+ * is_slice_based(self); under FRAME_BASED only SPS/PPS/
+ * SCALING_MATRIX/DECODE_PARAMS are submitted. The kernel
+ * parses the bitstream itself in FRAME_BASED mode; submitting
+ * per-slice controls in that mode triggers cluster-validation
+ * EINVAL at error_idx=count.
+ */
struct v4l2_ext_control controls[6] = {
{
.id = V4L2_CID_STATELESS_H264_SPS,
@@ -545,14 +560,14 @@
.p_h264_scaling_matrix = &matrix,
.size = sizeof(matrix),
}, {
- .id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
- .p_h264_slice_params = &slice,
- .size = sizeof(slice),
- }, {
.id = V4L2_CID_STATELESS_H264_DECODE_PARAMS,
.p_h264_decode_params = &decode,
.size = sizeof(decode),
}, {
+ .id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
+ .p_h264_slice_params = &slice,
+ .size = sizeof(slice),
+ }, {
.id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS,
.ptr = &weights,
.size = sizeof(weights),
@@ -560,20 +575,24 @@
};
/*
- * PRED_WEIGHTS is conditionally required per kernel UAPI:
- * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) is only
- * true when explicit weighted prediction applies (P/SP slice
- * with WEIGHTED_PRED flag, or B slice with weighted_bipred_idc
- * == 1). Submitting it unconditionally on a frame that does
- * not need it triggers EINVAL at error_idx=5 on hantro and
- * other drivers that strictly enforce the spec.
+ * Decode-mode dispatch. Patch 0002 unconditionally sets the
+ * device to FRAME_BASED, so we hardcode that here. When the
+ * planned probe-then-set commit lands, slice_based becomes
+ * context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED
+ * with context->decode_mode populated at CreateContext via
+ * VIDIOC_QUERYCTRL/G_EXT_CTRLS.
*
- * controls[5] is PRED_WEIGHTS (last in array); narrow the
- * submission count to exclude it when not required.
+ * FRAME_BASED: 4 controls (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS).
+ * SLICE_BASED: +SLICE_PARAMS (always), +PRED_WEIGHTS (when
+ * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED).
*/
- unsigned int num_controls = 6;
- if (!V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
+ const bool slice_based = false; /* TODO: probe via context->decode_mode */
+ unsigned int num_controls = 4;
+ if (slice_based) {
num_controls = 5;
+ if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
+ num_controls = 6;
+ }
rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
controls, num_controls);
@@ -0,0 +1,59 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] context: enable ANNEX_B start-code emission to match device
Patch 0002 sets V4L2_CID_STATELESS_H264_START_CODE to ANNEX_B on the
device, telling the kernel that OUTPUT-buffer payloads will contain
0x00 0x00 0x01 NAL start codes. picture.c::codec_store_buffer has
the prepend logic guarded by `if (context->h264_start_code)`, but
that boolean is set ONLY inside h264_get_controls() — a function
that exists but is never called.
Result: device expects ANNEX_B, libva-v4l2-request feeds raw NAL
payloads with no start codes, kernel cannot find slice boundaries,
hantro emits a zeroed CAPTURE buffer. mpv reports successful decode
because the V4L2 round-trip succeeds (no EINVAL); the visual output
is a flat dark-green frame (NV12 zero through BT.709).
Identified via:
- Patch 0006 cleared the EINVAL cluster-rejection (128 → 0 on
bbb_1080p30) but visual output remained flat green.
- GStreamer reference (gstv4l2codech264dec.c:1363-1377) confirms
start codes are required when ANNEX_B is selected.
- Source-archaeology of fourier's picture.c:67-74 showed the gate
on context->h264_start_code.
Fix: in context.c::RequestCreateContext, immediately after patch
0002's device-control block, set context_object->h264_start_code =
true to match the ANNEX_B mode we just programmed. Hardcoded for
now (matches 0002's hardcoded set); replaced with a runtime probe
in the planned probe-then-set commit.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
--- a/src/context.c 2026-05-01 20:48:59.884816330 +0000
+++ b/src/context.c 2026-05-01 20:59:54.446340219 +0000
@@ -146,6 +146,23 @@
dev_ctrls, 2);
}
+ /*
+ * Mirror the ANNEX_B start-code mode set on the device above
+ * into context_object->h264_start_code so picture.c::
+ * codec_store_buffer prepends 0x00 0x00 0x01 to each slice
+ * payload it copies into the OUTPUT buffer. Without this, the
+ * kernel — which we just told to expect ANNEX_B — sees a raw
+ * NAL stream with no start codes, fails to find slice
+ * boundaries, and emits a zeroed CAPTURE buffer (visually a
+ * flat dark-green frame).
+ *
+ * h264_get_controls() exists for this purpose but is never
+ * called in the current code path; the planned probe-then-set
+ * commit will replace this hardcoded assignment with a runtime
+ * read of the kernel's accepted START_CODE value.
+ */
+ context_object->h264_start_code = true;
+
rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
if (rc < 0) {
status = VA_STATUS_ERROR_OPERATION_FAILED;
@@ -0,0 +1,87 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] h264: fill DECODE_PARAMS frame_num + field flags from VAAPI
fourier's h264_va_picture_to_v4l2 only populated five fields of the
struct v4l2_ctrl_h264_decode_params: dpb (via h264_fill_dpb),
nal_ref_idc, top_field_order_cnt, bottom_field_order_cnt, and the
IDR_PIC flag. Many other required-by-spec fields were left at zero-
init (frame_num, idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
slice_group_change_cycle, FIELD_PIC and BOTTOM_FIELD flags).
For an IDR (first frame) on hantro-vpu RK3568, the kernel parses
the bitstream from the OUTPUT buffer and uses these fields to drive
its bitstream-element offset tracking. Empirically the kernel
returned a successfully-decoded but ZEROED CAPTURE buffer — flat
dark-green frames in mpv output, no errors logged.
This patch fills every field VAAPI exposes:
- frame_num: from VAPicture->frame_num.
- FIELD_PIC flag: from VAPicture->pic_fields.bits.field_pic_flag.
- BOTTOM_FIELD flag: from
VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD.
Also corrects the IDR_PIC flag to use |= instead of = so the new
field flags don't clobber it.
Fields NOT derivable from VAAPI's pre-parsed structures —
idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
slice_group_change_cycle — require a slice_header() bit-level
parse. libva-v4l2-request does not currently do this. They remain
at zero-init.
Empirical question this patch answers: does hantro tolerate the
bit_size fields being zero for IDR frames, or does it strictly
require them? If post-patch CAPTURE is still zeroed, a slice-header
parser is required. If CAPTURE shows real picture data, hantro
fills in the bit-positions itself when no hint is supplied.
Cross-reference: gstv4l2codech264dec.c::
gst_v4l2_codec_h264_dec_fill_decoder_params (commit 9e3e775,
lines 632-678).
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
--- a/src/h264.c 2026-05-01 20:59:41.710154198 +0000
+++ b/src/h264.c 2026-05-01 21:16:35.712995986 +0000
@@ -243,13 +243,34 @@
h264_fill_dpb(driver_data, context, decode);
- //decode->num_slices = surface->slices_count;
+ /*
+ * Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field we
+ * can derive from VAAPI's pre-parsed VAPictureParameterBufferH264
+ * plus the NAL header byte. Cross-reference: GStreamer
+ * gstv4l2codech264dec.c::gst_v4l2_codec_h264_dec_fill_decoder_params
+ * (lines 632-678).
+ *
+ * Fields not derivable from VAAPI (idr_pic_id, pic_order_cnt_lsb,
+ * delta_pic_order_cnt_*, dec_ref_pic_marking_bit_size,
+ * pic_order_cnt_bit_size, slice_group_change_cycle) require a
+ * full slice_header() bit-level parse, which libva-v4l2-request
+ * does not currently do. They are left at zero-init; the
+ * kernel-side hantro-vpu may compute them itself when scanning
+ * the OUTPUT bitstream. That is a hypothesis, to be tested by
+ * running this patch and inspecting the CAPTURE buffer.
+ */
decode->nal_ref_idc = nal_ref_idc;
- if (nal_unit_type == 5)
- decode->flags = V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
+ decode->frame_num = VAPicture->frame_num;
decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
decode->bottom_field_order_cnt = VAPicture->CurrPic.BottomFieldOrderCnt;
+ if (nal_unit_type == 5)
+ decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
+ if (VAPicture->pic_fields.bits.field_pic_flag)
+ decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC;
+ if (VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD)
+ decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD;
+
pps->weighted_bipred_idc =
VAPicture->pic_fields.bits.weighted_bipred_idc;
pps->pic_init_qs_minus26 = VAPicture->pic_init_qs_minus26;
@@ -0,0 +1,58 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] surface: don't VIDIOC_S_FMT the CAPTURE queue
The hantro stateless decoder derives the CAPTURE format from the
SPS attached to the per-request OUTPUT controls. Calling
VIDIOC_S_FMT on the CAPTURE queue at vaCreateSurfaces2 time can
leave the driver's vb2 state in an inconsistent configuration
where the queue accepts buffers and DQBUF returns successfully but
the kernel never actually writes decoded pixels into them.
Cross-reference: GStreamer's gst-plugins-bad/sys/v4l2codecs/
gstv4l2decoder.c only calls VIDIOC_G_FMT on the CAPTURE side
(via gst_v4l2_decoder_negotiate_src_format and friends). The
same code path produces correctly-decoded NV12 frames on the
same RK3568 hantro-vpu where libva-v4l2-request-with-S_FMT
emits flat-green zeroed CAPTURE buffers.
The v4l2_get_format() call immediately after this block already
gives us the bytesperline / sizes the driver chose; nothing else
in this file consumed the explicit S_FMT side-effects.
Empirical hypothesis test for the lingering "kernel decodes
without errors but emits zeroed CAPTURE" bug. If post-patch
output shows actual picture content, this confirms the
diagnosis: explicit CAPTURE format mutation breaks hantro's
internal state. If output remains flat-green, the bug is
elsewhere and we resume hex-dump-grade instrumentation.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
--- a/src/surface.c 2026-05-01 21:16:19.588759711 +0000
+++ b/src/surface.c 2026-05-01 21:41:12.095146549 +0000
@@ -118,10 +118,20 @@
capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
- rc = v4l2_set_format(driver_data->video_fd, capture_type,
- video_format->v4l2_format, width, height);
- if (rc < 0)
- return VA_STATUS_ERROR_OPERATION_FAILED;
+ /*
+ * Do not VIDIOC_S_FMT on the CAPTURE queue. The hantro
+ * stateless decoder derives the CAPTURE format from the
+ * SPS attached to the OUTPUT request; explicitly setting
+ * it here can put the driver into an inconsistent state.
+ * GStreamer's v4l2slh264dec only G_FMTs CAPTURE (see
+ * gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c::
+ * gst_v4l2_decoder_negotiate_src_format), and that
+ * variant produces correct decoded NV12 on the same
+ * hardware where this driver currently emits zeros.
+ *
+ * v4l2_get_format() below queries the driver's current
+ * state and gives us the bytesperline/sizes we need.
+ */
} else {
video_format = driver_data->video_format;
capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
@@ -0,0 +1,101 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] DEBUG: hex-dump OUTPUT and CAPTURE buffer contents per frame
Diagnostic-only patch (NOT for upstream). Hex-dumps:
- First 32 bytes of OUTPUT buffer at QBUF time in
picture.c::RequestEndPicture (i.e. what we feed the kernel)
- First 32 bytes of CAPTURE Y-plane after DQBUF in
surface.c::RequestSyncSurface (i.e. what kernel returned)
Lets us see whether:
- OUTPUT bitstream begins with valid ANNEX_B start code + NAL
header byte (e.g. `00 00 01 65` for IDR slice)
- CAPTURE Y-plane after decode contains varied luma data
(working) vs. all-zeros / repeating pattern (kernel didn't
write anything).
To be removed once Step 1 decode is verified working. Output goes via
existing request_log() to stderr.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
--- a/src/picture.c 2026-05-01 21:41:00.114969150 +0000
+++ b/src/picture.c 2026-05-01 21:50:11.123117853 +0000
@@ -36,6 +36,7 @@
#include "mpeg2.h"
#include <assert.h>
+#include <stdio.h>
#include <string.h>
#include <errno.h>
@@ -354,6 +355,27 @@
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
+ /*
+ * DEBUG INSTRUMENTATION (0010): hex-dump first 32 bytes of the
+ * OUTPUT buffer at the moment we hand it to the kernel. Helps
+ * pin down whether our bitstream prepend logic is correct.
+ * For a valid ANNEX_B IDR slice the dump should start
+ * 00 00 01 65 ... (00 00 01 = start code; 0x65 = nal_ref_idc=3,
+ * nal_unit_type=5 = IDR slice). Removed once Step 1 decode is
+ * verified working.
+ */
+ {
+ const unsigned char *p = surface_object->source_data;
+ char hex[32 * 3 + 1] = { 0 };
+ unsigned int i, n = surface_object->slices_size < 32 ?
+ surface_object->slices_size : 32;
+ for (i = 0; i < n; i++)
+ snprintf(hex + i * 3, 4, " %02x", p[i]);
+ request_log("OUTPUT[idx=%u, len=%u]:%s\n",
+ surface_object->source_index,
+ surface_object->slices_size, hex);
+ }
+
rc = v4l2_queue_buffer(driver_data->video_fd, request_fd, output_type,
&surface_object->timestamp,
surface_object->source_index,
--- a/src/surface.c 2026-05-01 21:41:12.095146549 +0000
+++ b/src/surface.c 2026-05-01 21:50:15.895188360 +0000
@@ -29,6 +29,7 @@
#include <assert.h>
#include <errno.h>
+#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
@@ -364,6 +365,30 @@
goto error;
}
+ /*
+ * DEBUG INSTRUMENTATION (0010): hex-dump first 32 bytes of the
+ * decoded CAPTURE Y-plane after DQBUF. If the kernel actually
+ * decoded the frame, these should reflect a real Y-luma pattern
+ * (varied bytes). All-zero or all-identical means no decode
+ * landed pixels in the buffer. Removed once Step 1 is verified.
+ */
+ {
+ const unsigned char *p =
+ (unsigned char *)surface_object->destination_map[0];
+ char hex[32 * 3 + 1] = { 0 };
+ unsigned int i;
+ if (p == NULL) {
+ request_log("CAPTURE[idx=%u, plane0]: (NULL)\n",
+ surface_object->destination_index);
+ } else {
+ for (i = 0; i < 32; i++)
+ snprintf(hex + i * 3, 4, " %02x", p[i]);
+ request_log("CAPTURE[idx=%u, plane0]:%s\n",
+ surface_object->destination_index,
+ hex);
+ }
+ }
+
surface_object->status = VASurfaceDisplaying;
status = VA_STATUS_SUCCESS;
@@ -0,0 +1,53 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] DEBUG: sentinel-pattern test for CAPTURE buffer write
Diagnostic-only. Writes 0xab×32 into the CAPTURE buffer's first 32
bytes immediately before VIDIOC_QBUF. The 0010 hex-dump after
DQBUF reveals which case we're in:
- All 0xab → the kernel never wrote to these 32 bytes: wrong
buffer chosen, an aliased mapping, or no decode actually
happened despite bytesused=3655712 being reported. Only the
first 32 bytes are checked, so the rest of the buffer may
still be filled (incomplete decode).
- All zeros → the kernel did write 0x00s (overwriting our
sentinel), and the apparent "no picture" output is the
kernel-side decode actually producing zeros (e.g. the parser
rejected the bitstream).
- Mix of zeros and real luma values → the kernel wrote real
decoded pixels; either the CPU read sees stale-cached zeros
somewhere, or the sentinel area was a header the decoder
zeroed while the rest is real. Need to check more bytes.
To be removed once Step 1 decode is verified.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
--- a/src/picture.c 2026-05-01 21:50:11.123117853 +0000
+++ b/src/picture.c 2026-05-01 22:20:20.589037667 +0000
@@ -349,6 +349,24 @@
if (rc != VA_STATUS_SUCCESS)
return rc;
+ /*
+ * DEBUG INSTRUMENTATION (0011): write a sentinel pattern into
+ * the CAPTURE buffer's first 32 bytes BEFORE QBUF. If after
+ * DQBUF the sentinel survives (per surface.c hex dump), the
+ * kernel never wrote to those bytes. If the sentinel is gone
+ * (replaced by zeros), the kernel did write, and the zeros are
+ * what the decode produced (see the 0011 commit message).
+ */
+ {
+ unsigned char *p = (unsigned char *)
+ surface_object->destination_map[0];
+ if (p != NULL) {
+ unsigned int i;
+ for (i = 0; i < 32; i++)
+ p[i] = 0xab;
+ }
+ }
+
rc = v4l2_queue_buffer(driver_data->video_fd, -1, capture_type, NULL,
surface_object->destination_index, 0,
surface_object->destination_buffers_count);
@@ -0,0 +1,214 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-02
Subject: [PATCH] h264: gate SCALING_MATRIX submission on VAIQMatrixBuffer presence
VAAPI signals "explicit scaling lists are present in the bitstream"
implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a
VAIQMatrixBufferH264 alongside RenderPicture iff
seq_scaling_matrix_present_flag || pic_scaling_matrix_present_flag.
When the bitstream uses default (flat) scaling, no IQMatrixBuffer
arrives and the in-tree h264.matrix struct stays zero-initialised.
fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this
via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set,
h265.iqmatrix_set) — the H.264 path was missing the equivalent flag,
so set_controls always submitted the scaling matrix, including the
zero-initialised case.
Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in
PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed
scaling_list_8x8, producing a zeroed CAPTURE buffer despite a
successful decode round-trip (no V4L2_BUF_FLAG_ERROR,
bytesused=3655712 reported).
An earlier draft of this patch unconditionally omitted SCALING_MATRIX in
FRAME_BASED. That's corpus-correct (bbb has no explicit scaling
lists) but the wrong predicate: the kernel-side gating is by
"matrix-supplied vs. not," not by decode mode. Streams that signal
explicit scaling lists must submit SCALING_MATRIX in either mode.
Contract verification (audit_0008_decode_params_2026-05-01.md +
hantro_h264.c::assemble_scaling_list): the kernel uses the supplied
matrix when SCALING_MATRIX is in the control batch and falls back
to spec-defined defaults when absent. Mode-independent.
This patch:
- surface.h: adds bool matrix_set to params.h264, mirroring
mpeg2.iqmatrix_set / h265.iqmatrix_set.
- picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case):
sets matrix_set = true when the buffer arrives.
- picture.c RequestBeginPicture: resets matrix_set = false at the
start of each Begin/Render/End cycle.
- h264.c h264_set_controls: builds the controls[] array
incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff
matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only
when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved —
kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is
selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall
not be set."
Cross-reference: kernel UAPI section
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX
(matrix supplied iff explicit scaling lists in bitstream) and
hantro_h264.c::assemble_scaling_list (consumes supplied matrix or
falls back to defaults).
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
--- a/src/surface.h
+++ b/src/surface.h
@@ -73,6 +73,7 @@
} mpeg2;
struct {
VAIQMatrixBufferH264 matrix;
+ bool matrix_set;
VAPictureParameterBufferH264 picture;
VASliceParameterBufferH264 slice;
} h264;
--- a/src/picture.c
+++ b/src/picture.c
@@ -153,6 +153,7 @@
memcpy(&surface_object->params.h264.matrix,
buffer_object->data,
sizeof(surface_object->params.h264.matrix));
+ surface_object->params.h264.matrix_set = true;
break;
case VAProfileHEVCMain:
@@ -255,6 +256,7 @@
surface_object->source_size = slot->size;
surface_object->slices_size = 0;
surface_object->slices_count = 0;
+ surface_object->params.h264.matrix_set = false;
surface_object->status = VASurfaceRendering;
context_object->render_surface_id = surface_id;
--- a/src/h264.c
+++ b/src/h264.c
@@ -553,66 +553,68 @@
sps.profile_idc = h264_profile_to_idc(profile);
/*
- * Per-request control batch, ordered so the controls REQUIRED in
- * V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED come first
- * (indices 0..3) and the SLICE_BASED-only controls come last
- * (indices 4..5).
+ * Build the per-request control list incrementally:
+ * - SPS, PPS, DECODE_PARAMS: always required (in either decode
+ * mode).
+ * - SCALING_MATRIX: gated on surface->params.h264.matrix_set,
+ * i.e. the consumer sent a VAIQMatrixBufferH264 this frame.
+ * This matches the H.264 spec: explicit scaling lists are
+ * present iff seq_scaling_matrix_present_flag ||
+ * pic_scaling_matrix_present_flag, in which case VAAPI
+ * consumers send the matrix; otherwise the kernel uses
+ * spec-defined defaults. Independent of FRAME_BASED /
+ * SLICE_BASED.
+ * - SLICE_PARAMS: SLICE_BASED only. Kernel doc
+ * ext-ctrls-codec-stateless.rst (FRAME_BASED entry):
+ * "When this mode is selected, the
+ * V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall not be
+ * set." Submitting it under FRAME_BASED triggers cluster-
+ * validation EINVAL at error_idx=count.
+ * - PRED_WEIGHTS: SLICE_BASED + V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
*
- * Cross-reference: GStreamer gst-plugins-bad
- * sys/v4l2codecs/gstv4l2codech264dec.c (commit 9e3e775,
- * lines 1263-1304) gates SLICE_PARAMS and PRED_WEIGHTS on
- * is_slice_based(self); under FRAME_BASED only SPS/PPS/
- * SCALING_MATRIX/DECODE_PARAMS are submitted. The kernel
- * parses the bitstream itself in FRAME_BASED mode; submitting
- * per-slice controls in that mode triggers cluster-validation
- * EINVAL at error_idx=count.
- */
- struct v4l2_ext_control controls[6] = {
- {
- .id = V4L2_CID_STATELESS_H264_SPS,
- .p_h264_sps = &sps,
- .size = sizeof(sps),
- }, {
- .id = V4L2_CID_STATELESS_H264_PPS,
- .p_h264_pps = &pps,
- .size = sizeof(pps),
- }, {
- .id = V4L2_CID_STATELESS_H264_SCALING_MATRIX,
- .p_h264_scaling_matrix = &matrix,
- .size = sizeof(matrix),
- }, {
- .id = V4L2_CID_STATELESS_H264_DECODE_PARAMS,
- .p_h264_decode_params = &decode,
- .size = sizeof(decode),
- }, {
- .id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
- .p_h264_slice_params = &slice,
- .size = sizeof(slice),
- }, {
- .id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS,
- .ptr = &weights,
- .size = sizeof(weights),
- }
- };
-
- /*
- * Decode-mode dispatch. Patch 0002 unconditionally sets the
- * device to FRAME_BASED, so we hardcode that here. When the
- * planned probe-then-set commit lands, slice_based becomes
- * context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED
- * with context->decode_mode populated at CreateContext via
- * VIDIOC_QUERYCTRL/G_EXT_CTRLS.
- *
- * FRAME_BASED: 4 controls (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS).
- * SLICE_BASED: +SLICE_PARAMS (always), +PRED_WEIGHTS (when
- * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED).
+ * Patch 0002 unconditionally sets the device to FRAME_BASED,
+ * so slice_based is hardcoded false here. When the planned
+ * probe-then-set commit lands, this becomes
+ * context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED.
*/
+ struct v4l2_ext_control controls[6] = { 0 };
+ unsigned int num_controls = 0;
const bool slice_based = false; /* TODO: probe via context->decode_mode */
- unsigned int num_controls = 4;
+
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_SPS;
+ controls[num_controls].p_h264_sps = &sps;
+ controls[num_controls].size = sizeof(sps);
+ num_controls++;
+
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_PPS;
+ controls[num_controls].p_h264_pps = &pps;
+ controls[num_controls].size = sizeof(pps);
+ num_controls++;
+
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_DECODE_PARAMS;
+ controls[num_controls].p_h264_decode_params = &decode;
+ controls[num_controls].size = sizeof(decode);
+ num_controls++;
+
+ if (surface->params.h264.matrix_set) {
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_SCALING_MATRIX;
+ controls[num_controls].p_h264_scaling_matrix = &matrix;
+ controls[num_controls].size = sizeof(matrix);
+ num_controls++;
+ }
+
if (slice_based) {
- num_controls = 5;
- if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
- num_controls = 6;
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
+ controls[num_controls].p_h264_slice_params = &slice;
+ controls[num_controls].size = sizeof(slice);
+ num_controls++;
+
+ if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice)) {
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
+ controls[num_controls].ptr = &weights;
+ controls[num_controls].size = sizeof(weights);
+ num_controls++;
+ }
}
rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
@@ -0,0 +1,86 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-02
Subject: [PATCH] h264: hardcode SPS level_idc = 51 (intentional over-allocation)
fourier's h264_va_picture_to_v4l2 never assigns sps->level_idc; the
field stays at zero-init. level_idc=0 is invalid per the H.264 spec
(lowest legal value is 10, Level 1.0). Hantro and other stateless
H.264 decoders use level_idc to pre-allocate decoder resources (DPB
size, motion-vector buffers); when fed an invalid level the hantro
kernel driver silently skips the decode-hardware dispatch — the V4L2
request completes with no error, DQBUF returns the CAPTURE buffer
reporting bytesused=3655712 and no V4L2_BUF_FLAG_ERROR, but the
buffer is never written.
VAAPI's decode-side VAPictureParameterBufferH264 structurally does
NOT include level_idc — `grep level_idc va/va.h` returns only hits
inside VAEncSequenceParameterBufferH264 (the encode path). The
H.264 SPS NAL is also not included in VASliceDataBuffer because
ffmpeg-vaapi parses it client-side and forwards only slice data
(verified empirically via patch 0010's hex-dump of the OUTPUT
buffer: it contains "00 00 01 65 ..." — i.e. ANNEX_B start code +
IDR slice NAL byte, no SPS NAL). A SPS-NAL byte extractor is
therefore not viable from the bitstream libva-v4l2-request
receives.
Workaround: hardcode level_idc = 51 (= Level 5.1, max for 1080p
and 4K@30 mainstream consumer profiles). This INTENTIONALLY
OVER-ALLOCATES decoder resources but is sufficient for any stream
up to 4K@30. It is corpus-correct, not contract-correct: an 8K
stream (Level 6.x) would under-allocate.
This patch is a known-incomplete intermediate, not a final fix.
The proper upstreamable answer is a level-from-resolution
derivation per H.264 Annex A.3 (max MB rate / max frame size
thresholds). That requires mapping consumer-side framerate which
VAAPI does not expose, so the lookup table is non-trivial. The
TODO is captured inline.
This patch's goal is unblocking decode-hardware engagement on the
ohm_gl_fix corpus while the full level-derivation work proceeds.
Cross-reference: kernel doc
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists
level_idc as a required field with no "kernel-derives" annotation —
i.e., userspace-required.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
--- a/src/h264.c
+++ b/src/h264.c
@@ -553,6 +553,35 @@
sps.profile_idc = h264_profile_to_idc(profile);
/*
+ * VAAPI's decode-side VAPictureParameterBufferH264 does not carry
+ * level_idc — see va.h, the field exists only in
+ * VAEncSequenceParameterBufferH264 on the encode path. The H.264
+ * SPS NAL is also not included in VASliceDataBuffer (ffmpeg-vaapi
+ * parses it client-side and forwards only slice data), so a
+ * SPS-NAL byte extractor is not viable from the bitstream we
+ * receive.
+ *
+ * Hantro and other stateless H.264 decoders use level_idc to
+ * pre-allocate decoder resources (DPB, motion-vector buffers); a
+ * zero-init level_idc=0 is invalid (lowest legal is 10 = Level
+ * 1.0) and causes hantro to silently skip the decode hardware
+ * dispatch.
+ *
+ * Hardcode level_idc = 51 (Level 5.1, max for 1080p/4K@30) as a
+ * known-incomplete intermediate. This INTENTIONALLY OVER-ALLOCATES
+ * decoder resources and is sufficient for any stream up to 4K@30.
+ * It is corpus-correct, not contract-correct.
+ *
+ * TODO: derive level_idc from (VAProfile, picture_width_in_mbs,
+ * picture_height_in_mbs) per H.264 Annex A.3 max-MB-per-second
+ * thresholds. That is a small lookup table but requires also
+ * mapping the consumer's framerate, which VAAPI doesn't provide
+ * directly. For now the over-allocation is the upstreamable
+ * compromise.
+ */
+ sps.level_idc = 51;
+
+ /*
* Build the per-request control list incrementally:
* - SPS, PPS, DECODE_PARAMS: always required (in either decode
* mode).
@@ -0,0 +1,97 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-01
Subject: [PATCH] DEBUG: dump VAPictureH264 raw bytes + decoded fields
Diagnostic-only. Investigating the observed anomaly:
- V4L2 strace shows decode_params.top_field_order_cnt = 65536
on the first IDR frame submitted by mpv+ffmpeg+libva-v4l2-request
- GStreamer's reference path writes 0 (spec-correct: PicOrderCnt=0
for IDR with pic_order_cnt_type=0 / pic_order_cnt_lsb=0)
- Reading FFmpeg source (libavcodec/vaapi_h264.c::fill_vaapi_pic):
va_pic->TopFieldOrderCnt = 0;
if (pic->field_poc[0] != INT_MAX)
va_pic->TopFieldOrderCnt = pic->field_poc[0];
For IDR: ff_h264_init_poc sets field_poc[0] = poc_msb + poc_lsb
= 0 + 0 = 0. So FFmpeg should write 0.
If FFmpeg writes 0 but fourier reads 65536, the mismatch is in the
libva ABI between ffmpeg's writer and our reader. Most likely
suspect: VA_PADDING_LOW size in VAPictureH264 differs between the
libva headers ffmpeg+libva were built against and the headers
fourier was built against, shifting struct field offsets.
This patch dumps:
1. sizeof(VAPictureH264) at our reader's view
2. First 32 raw bytes of VAPicture->CurrPic
3. Field-decoded values via the .picture_id, .frame_idx, .flags,
.TopFieldOrderCnt, .BottomFieldOrderCnt accessors
If the raw bytes show 00 00 01 00 at offset 12 (= 65536 LE), the
field offset is correct and FFmpeg actually wrote 65536 — meaning
either FFmpeg has a bug, or our test scenario triggers a non-spec
code path. If the raw bytes show 00 00 00 00 at offset 12 but
TopFieldOrderCnt accessor returns 65536, the struct ABI is
mismatched and we need to reconcile libva versions.
If sizeof(VAPictureH264) prints as something other than 36 (= 4*5
+ 4*VA_PADDING_LOW assuming VA_PADDING_LOW=4), the struct layout
on this build differs from the documented libva-2.x layout.
To be removed once the source of the 65536 is identified.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
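The offset-12 reasoning above can be checked without libva at all. The
following is a minimal sketch, assuming a hypothetical mirror struct
(sketch_va_picture_h264) that copies the documented libva-2.x layout
with VA_PADDING_LOW = 4; it is not the real va.h type, and
sketch_read_le32 is an illustrative helper, not part of this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors the documented libva-2.x VAPictureH264 layout. */
#define SKETCH_VA_PADDING_LOW 4

struct sketch_va_picture_h264 {
	uint32_t picture_id;           /* offset 0  */
	uint32_t frame_idx;            /* offset 4  */
	uint32_t flags;                /* offset 8  */
	int32_t  TopFieldOrderCnt;     /* offset 12 */
	int32_t  BottomFieldOrderCnt;  /* offset 16 */
	uint32_t va_reserved[SKETCH_VA_PADDING_LOW];
};

/* Decode a little-endian 32-bit value from a raw byte dump. */
static int32_t sketch_read_le32(const unsigned char *bytes, size_t offset)
{
	const uint32_t value = (uint32_t)bytes[offset]
			     | ((uint32_t)bytes[offset + 1] << 8)
			     | ((uint32_t)bytes[offset + 2] << 16)
			     | ((uint32_t)bytes[offset + 3] << 24);

	return (int32_t)value;
}
```

Feeding the logged CurrPic bytes to sketch_read_le32(raw, 12) and
comparing the result with the accessor's TopFieldOrderCnt separates
case (a) from case (b) described in the instrumentation comment.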
--- a/src/h264.c 2026-05-01 22:56:42.656744048 +0000
+++ b/src/h264.c 2026-05-02 00:00:00.000000000 +0000
@@ -28,6 +28,7 @@
#include <assert.h>
#include <limits.h>
#include <string.h>
+#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
@@ -259,6 +259,42 @@
* the OUTPUT bitstream — a hypothesis verified empirically by
* running this patch and inspecting the CAPTURE buffer.
*/
+ /*
+ * DEBUG INSTRUMENTATION (0014): dump the raw bytes of
+ * VAPicture->CurrPic plus sizeof(VAPictureH264) so we can
+ * tell whether the observed TopFieldOrderCnt=65536 anomaly is
+ * (a) at the documented byte-offset 12 (ffmpeg-side bug or
+ * intentional non-spec encoding) or
+ * (b) at a different offset (libva ABI / VA_PADDING_LOW
+ * mismatch between ffmpeg's writer and our reader).
+ *
+ * Documented VAPictureH264 layout (libva-2.x):
+ * offset 0: VASurfaceID picture_id (uint32)
+ * offset 4: uint32 frame_idx
+ * offset 8: uint32 flags
+ * offset 12: int32 TopFieldOrderCnt
+ * offset 16: int32 BottomFieldOrderCnt
+ * offset 20+: uint32 va_reserved[VA_PADDING_LOW]
+ */
+ {
+ const unsigned char *cp = (const unsigned char *)&VAPicture->CurrPic;
+ char hex[32 * 3 + 1] = { 0 };
+ unsigned int i;
+ for (i = 0; i < 32; i++)
+ snprintf(hex + i * 3, 4, " %02x", cp[i]);
+ request_log("VAPictureH264 sizeof=%zu CurrPic[0..31]:%s\n",
+ sizeof(VAPictureH264), hex);
+ request_log("VAPictureH264 CurrPic field reads: "
+ "picture_id=0x%08x frame_idx=%u flags=0x%x "
+ "TopFOC=%d BottomFOC=%d frame_num=%u\n",
+ (unsigned)VAPicture->CurrPic.picture_id,
+ (unsigned)VAPicture->CurrPic.frame_idx,
+ (unsigned)VAPicture->CurrPic.flags,
+ (int)VAPicture->CurrPic.TopFieldOrderCnt,
+ (int)VAPicture->CurrPic.BottomFieldOrderCnt,
+ (unsigned)VAPicture->frame_num);
+ }
+
decode->nal_ref_idc = nal_ref_idc;
decode->frame_num = VAPicture->frame_num;
decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
@@ -0,0 +1,150 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-02
Subject: [PATCH] h264: strip ffmpeg-vaapi POC sentinel before passing to V4L2
ROOT CAUSE for "kernel decodes successfully but produces zeroed
CAPTURE buffers despite no V4L2_BUF_FLAG_ERROR":
ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
0x10000 as a sentinel for "uninitialised":
libavcodec/h264dec.c:301 — global init in ff_h264_decode_init
libavcodec/h264dec.c:444 — IDR reset in idr() helper
ff_h264_init_poc (libavcodec/h264_parse.c:296-305) then computes
pc->poc_msb = pc->prev_poc_msb whenever the slice header's
pic_order_cnt_lsb hasn't wrapped relative to prev_poc_lsb (which
is the typical case for any normal H.264 content with sane POC
ordering). The sentinel leaks into field_poc[] (line 305) and from
there into VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
libavcodec/vaapi_h264.c::fill_vaapi_pic (lines 73-78).
Empirical confirmation via meitner 2026-05-02 ground-truth test:
ran an LD_PRELOAD shim around vaCreateBuffer against an i965
VAAPI backend decoding a 60-frame H.264 Main clip. Every frame
showed TopFieldOrderCnt = (POC | 0x10000):
Frame 1 IDR: raw bytes "00 00 01 00" at offset 12 → TopFOC=65536
Frame 2: raw bytes "06 00 01 00" → TopFOC=65542
Frame 3: "02 00 01 00" → TopFOC=65538
i965 decodes successfully regardless. V4L2 stateless drivers cannot
tolerate the high word: hantro_h264.c::prepare_table feeds the value
directly to tbl->poc[i*2]/[32], and the kernel reflist builder uses
it directly for cur_pic_order_count comparison, so the kernel's
resource sizing math sees POC=65536 for an IDR and breaks.
This patch adds h264_strip_ffmpeg_poc_sentinel() as a small static
inline in src/h264.c. It detects bit 16 set rather than blindly
subtracting, so a future ffmpeg version that fixes the leak
degrades gracefully. The helper is applied at all four POC sites:
1. h264_fill_dpb: dpb->top_field_order_cnt
2. h264_fill_dpb: dpb->bottom_field_order_cnt
3. h264_va_picture_to_v4l2: decode->top_field_order_cnt
4. h264_va_picture_to_v4l2: decode->bottom_field_order_cnt
VA_PICTURE_H264_INVALID DPB slots are short-circuited to POC=0
because libavcodec/vaapi_h264.c::init_vaapi_pic (line 43) already
sets POC=0 there; the sentinel never applies. Zeroing them
explicitly removes a class of "stale POC value in invalidated
slot" foot-guns.
Non-trivial follow-ups identified during the meitner experiment
that are NOT addressed by this patch:
- PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags are
not yet derived from VASliceParameterBufferH264.slice_type. The
bbb corpus is I-only at the start so this hasn't been a
blocker, but a clip with B-frames will need the slice-type
routing patch.
- h264_fill_dpb's pic_num assignment (entry->pic.picture_id) is
almost certainly wrong per the kernel doc — pic_num must equal
the H.264 spec's PicNum / FrameNumWrap, not the VAAPI surface
id. Out of scope here; will surface as a defect on streams
that have multi-frame DPB lookups.
Cross-references:
audit_0008_decode_params_2026-05-01.md — kernel-side consumer
audit confirming POC fields are userspace-required.
api_contract_findings_2026-05-01.md — VAAPI doc gap on POC
semantics; H.264 spec section 8.2.1 is the binding contract.
meitner_2026-05-02_vaapi_idr_groundtruth/ — full empirical
capture of the sentinel pattern across 60 frames.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
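For reference, the helper's behaviour on the three meitner samples can
be reproduced standalone. This is a minimal sketch, assuming
hypothetical names (sketch_strip_poc_sentinel, SKETCH_PICTURE_INVALID);
SKETCH_PICTURE_INVALID stands in for VA_PICTURE_H264_INVALID so the
snippet builds without va.h. It mirrors the bit-16 test used in this
patch and makes no claim about negative POC values.

```c
#include <stdint.h>

/* Assumption: stand-in for VA_PICTURE_H264_INVALID from va.h. */
#define SKETCH_PICTURE_INVALID 0x00000001u
/* The ffmpeg prev_poc_msb sentinel described in the commit message. */
#define SKETCH_POC_SENTINEL    (1 << 16)

static int32_t sketch_strip_poc_sentinel(int32_t poc, uint32_t flags)
{
	/* Invalid DPB slots carry no meaningful POC; force zero. */
	if (flags & SKETCH_PICTURE_INVALID)
		return 0;

	/* Detect the sentinel by bit 16 rather than subtracting blindly. */
	if (poc & SKETCH_POC_SENTINEL)
		return poc - SKETCH_POC_SENTINEL;

	return poc;
}
```

With the captured values, 65536, 65542 and 65538 map to 0, 6 and 2,
while a POC that never carried the sentinel passes through unchanged.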
--- a/src/h264.c
+++ b/src/h264.c
@@ -187,6 +187,43 @@
}
}
+/*
+ * Strip ffmpeg-vaapi's POC sentinel.
+ *
+ * ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
+ * 0x10000 in libavcodec/h264dec.c (lines 301 and 444 of v8.0). After
+ * an IDR the idr() helper resets prev_poc_msb to that same sentinel.
+ * ff_h264_init_poc (libavcodec/h264_parse.c lines 296-305) then
+ * computes pc->poc_msb as prev_poc_msb when the slice header's
+ * poc_lsb hasn't wrapped — which is the typical case for normal
+ * content. The sentinel leaks into field_poc[] and from there into
+ * VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
+ * libavcodec/vaapi_h264.c::fill_vaapi_pic.
+ *
+ * Working VAAPI backends (intel-iHD, i965 verified empirically on
+ * meitner 2026-05-02) tolerate the high word — they either mask it
+ * or treat POCs as relative comparisons. V4L2 stateless H.264
+ * driver-side consumers (hantro_h264.c::prepare_table feeds the
+ * value direct to tbl->poc[]) need the spec value, so we strip the
+ * sentinel here at the libva-v4l2-request boundary.
+ *
+ * Detection by bit-16-set rather than blind subtraction so that a
+ * future ffmpeg version that fixes the sentinel leak degrades
+ * gracefully. POC values for non-degenerate H.264 content rarely
+ * exceed 16 bits; bit 16 set is a strong signal of the sentinel.
+ *
+ * Empty DPB slots (VA_PICTURE_H264_INVALID) carry POC=0 by
+ * libavcodec/vaapi_h264.c::init_vaapi_pic and need no fix-up.
+ */
+static inline int32_t h264_strip_ffmpeg_poc_sentinel(int32_t poc, uint32_t flags)
+{
+ if (flags & VA_PICTURE_H264_INVALID)
+ return 0;
+ if (poc & (1 << 16))
+ return poc - (1 << 16);
+ return poc;
+}
+
static void h264_fill_dpb(struct request_data *data,
struct object_context *context,
struct v4l2_ctrl_h264_decode_params *decode)
@@ -210,8 +247,12 @@
dpb->frame_num = entry->pic.frame_idx;
dpb->pic_num = entry->pic.picture_id;
- dpb->top_field_order_cnt = entry->pic.TopFieldOrderCnt;
- dpb->bottom_field_order_cnt = entry->pic.BottomFieldOrderCnt;
+ dpb->top_field_order_cnt =
+ h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt,
+ entry->pic.flags);
+ dpb->bottom_field_order_cnt =
+ h264_strip_ffmpeg_poc_sentinel(entry->pic.BottomFieldOrderCnt,
+ entry->pic.flags);
dpb->flags = V4L2_H264_DPB_ENTRY_FLAG_VALID;
@@ -298,8 +339,12 @@
decode->nal_ref_idc = nal_ref_idc;
decode->frame_num = VAPicture->frame_num;
- decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
- decode->bottom_field_order_cnt = VAPicture->CurrPic.BottomFieldOrderCnt;
+ decode->top_field_order_cnt =
+ h264_strip_ffmpeg_poc_sentinel(VAPicture->CurrPic.TopFieldOrderCnt,
+ VAPicture->CurrPic.flags);
+ decode->bottom_field_order_cnt =
+ h264_strip_ffmpeg_poc_sentinel(VAPicture->CurrPic.BottomFieldOrderCnt,
+ VAPicture->CurrPic.flags);
if (nal_unit_type == 5)
decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
@@ -0,0 +1,82 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-02
Subject: [PATCH] h264: derive PFRAME / BFRAME flags from VASlice slice_type
v4l2_ctrl_h264_decode_params.flags has PFRAME and BFRAME bits per
ext-ctrls-codec-stateless.rst. fourier never set them; libva-v4l2-
request relied on each backing driver tolerating frame-class
ambiguity.
Kernel survey (linux 6.19.x):
- tegra-vde/h264.c (lines 783-799) consumes both flags to select
the inter-frame decode kernel. Without them the I-frame kernel
runs on P/B content.
- visl-trace-h264.h uses them for decode tracing.
- hantro / rkvdec / cedrus / mediatek / qcom-iris-stateless do
not consume the flags.
Hantro on ohm decoded bbb cleanly without these flags set (see
phase6/step1/ohm_smoke_2026-05-02T060255Z_post_0015/), so this is
an upstreamability fix for cross-driver portability rather than a
correctness fix for hantro.
VAAPI's VASliceParameterBufferH264.slice_type maps directly to the
H.264 slice_header() slice_type field. Per spec 7.4.3:
0=P 1=B 2=I 3=SP 4=SI; 5..9 = "all slices in the picture have
this slice_type." `slice_type % 5` recovers the underlying type
in either encoding form.
In FRAME_BASED mode we only see surface->params.h264.slice from the
most-recent VASliceParameterBuffer. That is acceptable: the flag's
semantic is "this is an inter-coded frame," which holds if any slice
is P or B. Multi-slice frames may mix slice types in some streams,
so using the last-seen slice's type is an approximation rather than
an exact classification of the picture.
Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters
Flags table.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
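The slice_type folding described above can be sketched in isolation.
This is an illustrative snippet only: sketch_frame_flags and the
SKETCH_* names are hypothetical, and the flag bit values are
placeholders; the real V4L2_H264_DECODE_PARAM_FLAG_* values come from
<linux/v4l2-controls.h>.

```c
#include <stdint.h>

/* H.264 slice_type values after folding 5..9 onto 0..4 (spec 7.4.3). */
enum sketch_slice_type {
	SKETCH_SLICE_P  = 0,
	SKETCH_SLICE_B  = 1,
	SKETCH_SLICE_I  = 2,
	SKETCH_SLICE_SP = 3,
	SKETCH_SLICE_SI = 4,
};

/* Assumption: placeholder bits, not the kernel UAPI definitions. */
#define SKETCH_FLAG_PFRAME 0x08u
#define SKETCH_FLAG_BFRAME 0x10u

static uint32_t sketch_frame_flags(uint8_t slice_type)
{
	switch (slice_type % 5) {
	case SKETCH_SLICE_P:
		return SKETCH_FLAG_PFRAME;
	case SKETCH_SLICE_B:
		return SKETCH_FLAG_BFRAME;
	default:
		/* I / SP / SI: no inter-frame flag. */
		return 0;
	}
}
```

Both encodings of a P slice (0 and 5) yield the PFRAME bit, both
encodings of a B slice (1 and 6) yield the BFRAME bit, and every
other value yields no flag.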
--- a/src/h264.c
+++ b/src/h264.c
@@ -587,6 +587,38 @@
&surface->params.h264.slice,
&surface->params.h264.picture, &slice, &weights);
+ /*
+ * Derive PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags
+ * from VASliceParameterBufferH264.slice_type. VAAPI's slice_type
+ * matches the H.264 spec slice_type semantic: 0=P, 1=B, 2=I, 3=SP,
+ * 4=SI; values 5..9 mean "all slices in the picture have this
+ * slice_type" (mod 5 yields the underlying type). VAAPI consumers
+ * (ffmpeg, mpv) populate this for every slice; in FRAME_BASED mode
+ * we only see the most-recent slice's params, but slice_type is
+ * uniform across a single coded picture for our purposes.
+ *
+ * Kernel consumers that read these flags: tegra-vde
+ * (drivers/media/platform/nvidia/tegra-vde/h264.c lines 783-799 of
+ * 6.19.x) selects the inter-frame decode kernel. Hantro / rkvdec /
+ * cedrus / mediatek / qcom-iris-stateless do not consume them.
+ * Setting them keeps the libva-v4l2-request fork upstreamable
+ * across drivers without affecting hantro behaviour.
+ *
+ * Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters
+ * Flags — V4L2_H264_DECODE_PARAM_FLAG_PFRAME / _BFRAME.
+ */
+ switch (surface->params.h264.slice.slice_type % 5) {
+ case H264_SLICE_P:
+ decode.flags |= V4L2_H264_DECODE_PARAM_FLAG_PFRAME;
+ break;
+ case H264_SLICE_B:
+ decode.flags |= V4L2_H264_DECODE_PARAM_FLAG_BFRAME;
+ break;
+ default:
+ /* I / SP / SI: no extra flag. */
+ break;
+ }
+
sps.profile_idc = h264_profile_to_idc(profile);
/*
@@ -0,0 +1,124 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-02
Subject: [PATCH] h264: fill dpb[].pic_num as PicNum/LongTermPicNum, not VAAPI surface id
fourier's h264_fill_dpb assigned `dpb->pic_num = entry->pic.picture_id`
— the VAAPI surface id. Per ext-ctrls-codec-stateless.rst:651-655,
v4l2_h264_dpb_entry.pic_num must equal the H.264 spec PicNum
(equation 8-28) for short-term references or LongTermPicNum
(equation 8-29) for long-term references. The surface id has no
relationship to either.
Kernel-side consumers of pic_num:
- mediatek/decoder/vdec/vdec_h264_req_common.c (line 210):
dst_entry->pic_num = src_entry->pic_num. Used for
field-coded short-term reference disambiguation.
- hantro / rkvdec / cedrus / qcom-iris-stateless: do NOT read
pic_num. They resolve refs via reference_ts (timestamp)
and POC. This is why fourier's wrong value never surfaced
on RK3568 hantro.
This patch makes pic_num spec-correct so the libva-v4l2-request
fork is upstreamable across drivers without depending on each
target's tolerance for non-spec fills.
Computation, derived from H.264 spec section 8.2.4.1:
For frames (not field-coded), PicNum = FrameNumWrap.
FrameNumWrap = (frame_num > cur_frame_num)
? frame_num - max_frame_num
: frame_num
max_frame_num = 1 << (sps.log2_max_frame_num_minus4 + 4)
cur_frame_num = current picture's frame_num
For long-term references:
LongTermPicNum = long_term_frame_idx (when not field-coded).
VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic line 64):
VAPictureH264.frame_idx = long_ref ? pic_id : frame_num
So long-term refs already carry long_term_frame_idx in frame_idx;
we copy it through.
Field-coded streams require an extra factor-of-2 plus a parity
adjustment per spec equations 8-30 through 8-33; this patch does not
handle field-coded content. The ohm corpus is all frame-coded, so
this is a follow-up for later.
Implementation: add VAPicture parameter to h264_fill_dpb so the
function has access to seq_fields.log2_max_frame_num_minus4 and
the current picture's frame_num. Update the single caller in
h264_va_picture_to_v4l2.
Cross-reference: kernel doc ext-ctrls-codec-stateless.rst dpb_entry
table (line 651-655) and mediatek/vdec/vdec_h264_req_common.c
line 210.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
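The frame-coded computation above can be exercised on its own. This is
a minimal sketch under the same assumptions as the patch (frame-coded
content only, long-term refs carrying long_term_frame_idx in
frame_idx); sketch_pic_num is a hypothetical name, not a function in
src/h264.c.

```c
/*
 * Compute dpb[].pic_num for a frame-coded reference.
 * frame_idx is VAPictureH264.frame_idx of the DPB entry.
 */
static int sketch_pic_num(int frame_idx, int is_long_term,
			  int cur_frame_num,
			  unsigned int log2_max_frame_num_minus4)
{
	const int max_frame_num = 1 << (log2_max_frame_num_minus4 + 4);

	/* Long-term: LongTermPicNum = long_term_frame_idx. */
	if (is_long_term)
		return frame_idx;

	/* Short-term: PicNum = FrameNumWrap (frame_num wraparound). */
	return (frame_idx > cur_frame_num) ? frame_idx - max_frame_num
					   : frame_idx;
}
```

With log2_max_frame_num_minus4 = 0 (MaxFrameNum = 16), a short-term
reference with frame_num 15 seen from a current picture with
frame_num 2 wraps to PicNum -1, while a reference with frame_num 3
seen from frame_num 5 stays at 3.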
--- a/src/h264.c
+++ b/src/h264.c
@@ -226,8 +226,12 @@
static void h264_fill_dpb(struct request_data *data,
struct object_context *context,
+ VAPictureParameterBufferH264 *VAPicture,
struct v4l2_ctrl_h264_decode_params *decode)
{
+ const int max_frame_num =
+ 1 << (VAPicture->seq_fields.bits.log2_max_frame_num_minus4 + 4);
+ const int cur_frame_num = (int)VAPicture->frame_num;
int i;
for (i = 0; i < H264_DPB_SIZE; i++) {
@@ -246,7 +250,41 @@
}
dpb->frame_num = entry->pic.frame_idx;
- dpb->pic_num = entry->pic.picture_id;
+
+ /*
+ * Per ext-ctrls-codec-stateless.rst, dpb[].pic_num must
+ * equal the H.264 spec's PicNum (8-28) for short-term refs
+ * or LongTermPicNum (8-29) for long-term refs.
+ *
+ * For frames (not field-coded), PicNum = FrameNumWrap.
+ * FrameNumWrap = (frame_num > cur_frame_num)
+ * ? frame_num - max_frame_num
+ * : frame_num
+ * (per spec section 8.2.4.1, frame_num wraparound).
+ *
+ * VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic
+ * line 64): VAPictureH264.frame_idx holds long_term_frame_idx
+ * for long-term refs and frame_num for short-term refs. So
+ * for long-term entries we copy frame_idx straight through
+ * as LongTermPicNum.
+ *
+ * fourier's previous code set pic_num to picture_id (the
+ * VAAPI surface id) which is unrelated to H.264 PicNum;
+ * mediatek's vdec_h264_req_common.c::dst_entry->pic_num is
+ * one consumer that fails on that. Hantro doesn't read
+ * pic_num at all (uses reference_ts for ref resolution),
+ * which is why fourier's wrong value never surfaced on
+ * RK3568.
+ */
+ if (entry->pic.flags & VA_PICTURE_H264_LONG_TERM_REFERENCE) {
+ dpb->pic_num = entry->pic.frame_idx;
+ } else {
+ int frame_num = (int)entry->pic.frame_idx;
+ dpb->pic_num = (frame_num > cur_frame_num)
+ ? frame_num - max_frame_num
+ : frame_num;
+ }
+
dpb->top_field_order_cnt =
h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt,
entry->pic.flags);
@@ -283,7 +321,7 @@
nal_ref_idc = (b[0] >> 5) & 0x3;
nal_unit_type = b[0] & 0x1f;
- h264_fill_dpb(driver_data, context, decode);
+ h264_fill_dpb(driver_data, context, VAPicture, decode);
/*
* Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field
@@ -0,0 +1,159 @@
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: 2026-05-02
Subject: [PATCH] h264: derive sps.level_idc from H.264 Annex A.3 MaxFS
Replaces patch 0013's hardcoded level_idc = 51 with a small lookup
that picks the smallest level whose MaxFS contains the encoded
frame size. Patch 0013's TODO is resolved by this change.
VAAPI does not expose level_idc on the decode side
(VAPictureParameterBufferH264 has no such field; only
VAEncSequenceParameterBufferH264 carries it). The H.264 SPS NAL is
parsed client-side by ffmpeg-vaapi and only slice data forwards in
VASliceDataBuffer, so a SPS-NAL byte parser is not viable from the
bitstream the libva-v4l2-request layer receives. We therefore
derive level_idc from picture dimensions, which VAAPI does provide
in VAPictureParameterBufferH264.picture_{width,height}_in_mbs_minus1.
Annex A.3 (Table A-1) MaxFS thresholds:
Level 1.0: 99 MBs ( 176×144 = 11×9 = 99 )
Level 1.1: 396 ( 352×288 = 22×18 = 396 )
Level 2.0: 396
Level 2.1: 792 ( 352×576 / 720×288 )
Level 2.2: 1620 ( 720×480 = 45×30 = 1350; 720×576 = 45×36 = 1620 )
Level 3.0: 1620
Level 3.1: 3600 (1280×720 = 80×45 = 3600 )
Level 3.2: 5120
Level 4.0: 8192 (1920×1088 = 120×68 = 8160 fits )
Level 4.1: 8192
Level 4.2: 8704
Level 5.0: 22080
Level 5.1: 36864 (3840×2160 = 240×135 = 32400 fits )
Level 5.2: 36864
Level 6.0: 139264 (8K )
V4L2 control encoding: level_idc = (level major × 10) + (level minor).
Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60.
Picks for typical content:
1080p (1920×1088 = 120×68 = 8160 MBs) → Level 4.1 (level_idc = 41)
4K (3840×2160 = 240×135 = 32400 MBs) → Level 5.1 (level_idc = 51)
8K (7680×4320 = 480×270 = 129600 MBs) → Level 6.0 (level_idc = 60)
The previous hardcode of 51 was over-allocating for 1080p; with
this patch hantro can pre-allocate based on the actual frame size.
For our ohm corpus (1080p) this drops the requested DPB / MV
buffer sizing from level-5.1 generosity to level-4.1 right-sized.
Without VAAPI exposing framerate we cannot also check MaxMBPS /
MaxBR / MaxCPB. The frame-size-based pick is acceptable in
practice: temporally-dense streams almost always also push
spatially-large frames, so MaxFS captures the dominant
resource-sizing signal.
Cross-reference: H.264 spec Annex A, Table A-1 ("Level limits").
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists
level_idc as required-userspace-input, no kernel-derives annotation.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
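The MaxFS lookup can also be written table-driven, which makes the
thresholds easier to audit against Table A-1. This is a sketch of an
equivalent formulation, not the code in this patch (which uses an
if-chain); sketch_level_idc and sketch_levels are hypothetical names,
and the level picks mirror the patch's choices, including its use of
41 for the shared Level 4.0 / 4.1 threshold.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_level {
	unsigned int max_fs;   /* MaxFS in macroblocks (Table A-1) */
	uint8_t level_idc;     /* level major * 10 + level minor */
};

/* Sorted by ascending MaxFS; one entry per distinct threshold. */
static const struct sketch_level sketch_levels[] = {
	{     99, 10 }, {    396, 11 }, {    792, 21 }, {   1620, 22 },
	{   3600, 31 }, {   5120, 32 }, {   8192, 41 }, {   8704, 42 },
	{  22080, 50 }, {  36864, 51 }, { 139264, 60 },
};

static uint8_t sketch_level_idc(unsigned int width_in_mbs,
				unsigned int height_in_mbs)
{
	const unsigned int frame_size_mbs = width_in_mbs * height_in_mbs;
	size_t i;

	/* Pick the first (smallest) level whose MaxFS fits the frame. */
	for (i = 0; i < sizeof(sketch_levels) / sizeof(sketch_levels[0]); i++)
		if (frame_size_mbs <= sketch_levels[i].max_fs)
			return sketch_levels[i].level_idc;

	return 62; /* above the Level 6 ceiling */
}
```

For example, a 1080p stream arrives as 120×68 macroblocks and resolves
to level_idc 41, and a 3840×2160 stream arrives as 240×135 and
resolves to 51.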
--- a/src/h264.c
+++ b/src/h264.c
@@ -638,6 +638,55 @@
}
}
+/*
+ * Derive sps.level_idc from the encoded frame size in macroblocks per
+ * H.264 Annex A.3 (Table A-1) MaxFS thresholds. Each level's MaxFS is
+ * the maximum encoded frame size in MBs the level supports; we pick
+ * the smallest level whose MaxFS contains the actual frame size.
+ *
+ * Level encoding for the V4L2 control: level_idc = level * 10
+ * Level 1.0 → 10, Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60.
+ *
+ * VAAPI does not expose the bitstream's actual level_idc on the
+ * decode side (VAPictureParameterBufferH264 has no such field) — see
+ * va.h. The H.264 SPS NAL is parsed client-side by ffmpeg-vaapi /
+ * mpv and only slice data is forwarded in VASliceDataBuffer, so a
+ * SPS-NAL byte parser is not viable at this layer.
+ *
+ * Without framerate we cannot also check MaxMBPS / MaxBR / MaxCPB.
+ * That gap is acceptable in practice: consumers that push
+ * temporally-dense streams (high MBPS) almost always also push
+ * spatially-large frames (high MaxFS), so MaxFS captures the
+ * dominant resource-sizing signal. A small-frame, high-framerate
+ * stream can still be assigned a lower level than its bitstream
+ * declares.
+ *
+ * Picks for typical content:
+ * 1080p (8160 MBs) → Level 4.1 (level_idc = 41)
+ * 4K (32400 MBs) → Level 5.1 (level_idc = 51)
+ * 8K (129600 MBs) → Level 6.0 (level_idc = 60)
+ *
+ * Replaces the hardcoded level_idc=51 from patch 0013.
+ */
+static inline __u8 h264_derive_level_idc(unsigned int width_in_mbs,
+ unsigned int height_in_mbs)
+{
+ const unsigned int frame_size_mbs = width_in_mbs * height_in_mbs;
+
+ if (frame_size_mbs <= 99) return 10; /* Level 1.0 */
+ if (frame_size_mbs <= 396) return 11; /* Level 1.1 - 2.0 */
+ if (frame_size_mbs <= 792) return 21; /* Level 2.1 */
+ if (frame_size_mbs <= 1620) return 22; /* Level 2.2 - 3.0 */
+ if (frame_size_mbs <= 3600) return 31; /* Level 3.1 */
+ if (frame_size_mbs <= 5120) return 32; /* Level 3.2 */
+ if (frame_size_mbs <= 8192) return 41; /* Level 4.0 - 4.1 */
+ if (frame_size_mbs <= 8704) return 42; /* Level 4.2 */
+ if (frame_size_mbs <= 22080) return 50; /* Level 5.0 */
+ if (frame_size_mbs <= 36864) return 51; /* Level 5.1 - 5.2 */
+ if (frame_size_mbs <= 139264) return 60; /* Level 6.0 - 6.2 */
+ return 62; /* > Level 6 ceiling */
+}
+
int h264_set_controls(struct request_data *driver_data,
struct object_context *context,
VAProfile profile,
@@ -705,33 +754,15 @@
sps.profile_idc = h264_profile_to_idc(profile);
/*
- * VAAPI's decode-side VAPictureParameterBufferH264 does not carry
- * level_idc — see va.h, the field exists only in
- * VAEncSequenceParameterBufferH264 on the encode path. The H.264
- * SPS NAL is also not included in VASliceDataBuffer (ffmpeg-vaapi
- * parses it client-side and forwards only slice data), so a
- * SPS-NAL byte extractor is not viable from the bitstream we
- * receive.
- *
- * Hantro and other stateless H.264 decoders use level_idc to
- * pre-allocate decoder resources (DPB, motion-vector buffers); a
- * zero-init level_idc=0 is invalid (lowest legal is 10 = Level
- * 1.0) and causes hantro to silently skip the decode hardware
- * dispatch.
- *
- * Hardcode level_idc = 51 (Level 5.1, max for 1080p/4K@30) as a
- * known-incomplete intermediate. This INTENTIONALLY OVER-ALLOCATES
- * decoder resources and is sufficient for any stream up to 4K@30.
- * It is corpus-correct, not contract-correct.
- *
- * TODO: derive level_idc from (VAProfile, picture_width_in_mbs,
- * picture_height_in_mbs) per H.264 Annex A.3 max-MB-per-second
- * thresholds. That is a small lookup table but requires also
- * mapping the consumer's framerate, which VAAPI doesn't provide
- * directly. For now the over-allocation is the upstreamable
- * compromise.
+ * Derive level_idc from encoded frame size per H.264 Annex A.3.
+ * VAAPI doesn't expose level_idc on the decode side (see
+ * h264_derive_level_idc()'s docblock for the rationale); we pick
+ * the smallest level whose MaxFS contains the picture dimensions.
+ * Replaces patch 0013's intermediate hardcode of 51.
*/
- sps.level_idc = 51;
+ sps.level_idc = h264_derive_level_idc(
+ (unsigned int)surface->params.h264.picture.picture_width_in_mbs_minus1 + 1u,
+ (unsigned int)surface->params.h264.picture.picture_height_in_mbs_minus1 + 1u);
/*
* Build the per-request control list incrementally:
+262
View File
@@ -0,0 +1,262 @@
# Maintainer: Markus Fritsche <fritsche.markus@gmail.com>
# Campaign: ohm_gl_fix Phase 6 Step 1
#
# DEPRECATED (2026-05-16): superseded by ../libva-v4l2-request-fourier/
# which tracks the campaign fork's git history directly and adds the
# iter38 multi-device probe (single libva session for rkvdec H.264/HEVC/VP9
# + hantro MPEG-2/VP8). The successor declares
# replaces=('libva-v4l2-request-ohm-gl-fix'), so installing it removes
# this package automatically. See README.md for the full deprecation note.
#
# Forks libva-v4l2-request to add hantro-vpu multiplanar + modern
# stateless UAPI support. Conflicts/replaces stock libva-v4l2-request.
#
# Build target: fermi LXD on hertz (Arch ARM aarch64) via marfrit-packages
# Gitea Actions; alternative: boltzmann via his subagent.
pkgname=libva-v4l2-request-ohm-gl-fix
_upstreampkg=libva-v4l2-request
pkgver=1.0.0.r0.ga3c2476
pkgrel=2
pkgdesc="DEPRECATED — use libva-v4l2-request-fourier. VA-API backend for V4L2 stateless decoders, hantro-vpu multiplanar fork"
arch=('aarch64')
url="https://github.com/bootlin/libva-v4l2-request"
license=('LGPL2.1' 'MIT')
depends=('libva' 'libdrm' 'systemd-libs')
makedepends=('meson' 'ninja' 'pkgconf' 'git')
provides=("${_upstreampkg}=${pkgver}" 'libva-driver')
conflicts=("${_upstreampkg}")
replaces=("${_upstreampkg}")
# Bootlin upstream tarball — pinned to last meaningful commit 2019-05-17.
# Use full SHA: github extracts the archive to <repo>-<full-sha>/, so the
# short form would mismatch ${srcdir}/${_upstreampkg}-${_commit}.
_commit=a3c2476de19e6635458273ceeaeceff124fabd63
source=(
"${_upstreampkg}-${_commit}.tar.gz::https://github.com/bootlin/libva-v4l2-request/archive/${_commit}.tar.gz"
"fourier-local.patch"
"0001-mplane-multiplanar-port.patch"
"0002-pre-streamon-controls-and-output-pool.patch"
"0003-v4l2-query-helpers.patch"
"0004-context-request-pool.patch"
"0005-h264-conditional-pred-weights.patch"
"0006-h264-frame-based-omit-slice-controls.patch"
"0007-context-h264-start-code-annex-b.patch"
"0008-h264-fill-decode-params-from-vaapi.patch"
"0009-surface-no-capture-sfmt.patch"
"0010-DEBUG-hex-dump-output-capture.patch"
"0011-DEBUG-sentinel-capture-buffer.patch"
"0012-h264-omit-scaling-matrix-frame-based.patch"
"0013-h264-sps-level-idc.patch"
"0014-DEBUG-vapic-bytes-dump.patch"
"0015-h264-strip-ffmpeg-poc-sentinel.patch"
"0016-h264-derive-pframe-bframe-flags.patch"
"0017-h264-dpb-picnum-correctness.patch"
"0018-h264-level-idc-from-annex-a3.patch"
)
sha256sums=(
'92b523050561d64f7b6016edb53ca00524805f9f31a8b566baf457bbb15716fa'
'1577ff1e2fd7944d2af85bba07658c26b2c54787175c2cc9024174ad2425d3ac'
'7637c8c76f86a4b745516cdfd1ee89484b7fe7ce88425ff38460bd4494e1451e'
'5309582759f260456b15635be610c2fe6fe25cbdf427cf8ad851f74991dc8c6e'
'4e38eacc2b2dc26094cbad38964e8dc8bec19d2ad408a37a3ee21952003e6c38'
'e5b61965921093292912136ae21727c9c792d0417201d86dc90b2e622f1edbb0'
'c7a8e02f2e84c6248586d1ceacf25f4c26578f2f365044c3b4a011080ec016e8'
'5ea1f8b193a3cba21631b00ac3d9cb8c6016754f0af47b33fcccce9e0114b32a'
'092abc79f639ecb7ac698a48fb544edc3b6eff3cb9b711efa2cc452365c17ed0'
'ac81992783f562128f55620dc54507b70026342519bb7c7d3efadf6275387861'
'6b2c26feeeaf253f87e0ef0517191b636c6945e374660a13574f3114331aaed6'
'ef8706062302fd7f13535501b5b2e8aed5325dbb0d4d56a88674bef53ca96eee'
'242a42e10ff09e4e82bcadb8824e036dddab94cf6dc9c5f6c80eb4c2cc5dda50'
'417c39397dfbc86db2cabc6217f54d9072de26dedcacf9a965b909fc998de052'
'472deb316ff3ad282c6be028cfaf033d69ddfee845dcd519c28a0692f298bb6a'
'835378dd0b7c126a6101b8df0c015951d88f5139f9586a618af6b3ee503d67b6'
'380d334a88213185183a05e7e55380de503e388fc29d8b11d96909dafcbbeb65'
'eaf1e363de111ee43d7ca3e4b161d9a3a3f6b1c9ca3d8642871abe70f18fbf95'
'7b6f0f63fdde32a411cf3230cabeb610ef8a6bd09777976a06dbd274daa540c7'
'15a0a40b918988e77e5f36eebf15e9f45b2c13a6628b5640efdac528c57aab80'
)
prepare() {
cd "${srcdir}/${_upstreampkg}-${_commit}"
# Patch 0: fourier's stateless-control modernization.
# - src/h264.c + src/picture.c → V4L2_CID_STATELESS_H264_*
# - include/hevc-ctrls.h → redirect shim to <linux/v4l2-controls.h>
# - src/meson.build: h265.c/h265.h commented out (HEVC excluded)
patch -p1 < "${srcdir}/fourier-local.patch"
# Hygiene: include/h264-ctrls.h is dead post-fourier (no source
# includes it, no install_headers directive). Drop it so the
# built source tree has no stale UAPI carry-over.
rm -f include/h264-ctrls.h
# Patch 1: ohm-gl-fix multiplanar port.
# - V4L2_BUF_TYPE_VIDEO_{OUTPUT,CAPTURE} -> *_MPLANE in src/v4l2.c
# - per-plane VIDIOC_EXPBUF in src/surface.c
# - struct v4l2_plane planes[] threading throughout
# - image.c plane-stride adjustments
patch -p1 < "${srcdir}/0001-mplane-multiplanar-port.patch"
# Patch 2: pre-STREAMON device controls + minimum OUTPUT pool.
# - context.c: floor OUTPUT pool to 4 buffers (not surfaces_count)
# - context.c: set V4L2_CID_STATELESS_H264_{DECODE_MODE,START_CODE}
# device-wide before VIDIOC_STREAMON
# THROWAWAY: superseded inline by patches 4 (request_pool) and 5
# (probe-then-set DECODE_MODE) per upstreamable_design.md §5.
patch -p1 < "${srcdir}/0002-pre-streamon-controls-and-output-pool.patch"
# Patch 3 (commit 2 in revised plan): QUERYCTRL/QUERYMENU helpers.
# Pure utility additions to src/v4l2.{c,h}, no behaviour change.
# Unblocks the request_pool and probe-then-set commits.
patch -p1 < "${srcdir}/0003-v4l2-query-helpers.patch"
# Patch 4 (commit 3 in revised plan): request_pool decoupling.
# NEW src/request_pool.{c,h}; context.c uses pool instead of
# per-surface OUTPUT loop; picture.c borrows on Begin, releases
# on Sync after DQBUF. Deletes 0002's "floor to 4" hunk inline
# — the pool's count parameter supersedes it. 0002's
# set_controls block remains until probe-then-set commit lands.
patch -p1 < "${srcdir}/0004-context-request-pool.patch"
# Patch 5: conditional PRED_WEIGHTS submission. Defect-fix found
# via ohm smoke testing (kernel rejects PRED_WEIGHTS at
# error_idx=5 on Main-profile clips without weighted prediction).
# Not part of Sonnet's planned series, but unblocks per-frame
# decode on every backing driver.
patch -p1 < "${srcdir}/0005-h264-conditional-pred-weights.patch"
# Patch 6: omit per-slice controls in FRAME_BASED mode. Identified
# via cross-reference against GStreamer's gstv4l2codech264dec.c
# (commit 9e3e775). FRAME_BASED requests must contain only
# SPS/PPS/SCALING_MATRIX/DECODE_PARAMS — submitting SLICE_PARAMS
# triggers V4L2 cluster validation EINVAL at error_idx=count.
# Hardcodes slice_based=false for now since 0002 sets FRAME_BASED;
# promotes to runtime probe via context->decode_mode in a
# follow-up commit.
patch -p1 < "${srcdir}/0006-h264-frame-based-omit-slice-controls.patch"
# Patch 7: enable ANNEX_B start-code emission in
# codec_store_buffer to match the device-side START_CODE_ANNEX_B
# set by 0002. Without this, kernel sees a raw NAL stream with
# no 0x00 0x00 0x01 markers, fails to parse slice boundaries,
# and emits a zeroed CAPTURE buffer (flat green frames in mpv).
patch -p1 < "${srcdir}/0007-context-h264-start-code-annex-b.patch"
# Patch 8: fill DECODE_PARAMS frame_num + FIELD_PIC/BOTTOM_FIELD
# flag bits from VAAPI. fourier left these zero-init; under
# FRAME_BASED on hantro the kernel uses them to drive bitstream
# parsing. Empirical question: does hantro tolerate the bit_size
# fields (idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
# dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
# slice_group_change_cycle) being zero, or do we need a
# slice_header() bit-level parser?
patch -p1 < "${srcdir}/0008-h264-fill-decode-params-from-vaapi.patch"
# Patch 9: drop VIDIOC_S_FMT on CAPTURE queue. Hantro derives
# CAPTURE format from per-request SPS; explicit S_FMT here can
# leave the driver in a state where DQBUF returns zeroed
# buffers despite no errors. GStreamer's reference path only
# G_FMTs the CAPTURE side.
patch -p1 < "${srcdir}/0009-surface-no-capture-sfmt.patch"
# Patch 10: DEBUG-only instrumentation. Hex-dumps OUTPUT and
# CAPTURE buffer first 32 bytes per frame via request_log().
# Removed before upstream submission.
patch -p1 < "${srcdir}/0010-DEBUG-hex-dump-output-capture.patch"
# Patch 11: DEBUG-only sentinel write before CAPTURE QBUF.
# Tells us whether kernel wrote to the buffer (sentinel gone)
# or didn't (sentinel survives).
patch -p1 < "${srcdir}/0011-DEBUG-sentinel-capture-buffer.patch"
# Patch 12 (REVISED 2026-05-02): gate SCALING_MATRIX submission
# on a per-surface matrix_set flag mirroring fourier's existing
# mpeg2.iqmatrix_set / h265.iqmatrix_set pattern. The earlier
# draft of this patch unconditionally omitted SCALING_MATRIX in
# FRAME_BASED, which was corpus-correct (bbb has no explicit
# scaling lists) but the wrong predicate — kernel-side gating
# is by "matrix-supplied vs. not," not by decode mode. Streams
# with explicit scaling lists must still submit in either mode.
# Three coordinated changes: surface.h adds bool matrix_set;
# picture.c sets it on VAIQMatrixBuffer arrival and resets it
# in RequestBeginPicture; h264.c builds controls[] incrementally.
# The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is
# preserved (kernel doc is explicit on it).
patch -p1 < "${srcdir}/0012-h264-omit-scaling-matrix-frame-based.patch"
# Patch 13 (REVISED 2026-05-02): hardcode SPS level_idc = 51 as
# an INTENTIONALLY OVER-ALLOCATING known-incomplete intermediate.
# VAAPI's decode-side picture-parameter buffer structurally lacks
# level_idc (only present in encode path). The H.264 SPS NAL is
# not in VASliceDataBuffer either (ffmpeg-vaapi parses it
# client-side and forwards only slice data — verified via the
# 0010 hex dump showing OUTPUT first bytes are "00 00 01 65 ...",
# i.e. start code + IDR slice NAL, no SPS). So a SPS-NAL byte
# extractor is not viable. TODO captured inline for level-from-
# resolution derivation per H.264 Annex A.3.
patch -p1 < "${srcdir}/0013-h264-sps-level-idc.patch"
# Patch 14: DEBUG-only — dump VAPictureH264 raw bytes + decoded
# fields. Used to disambiguate the TopFieldOrderCnt=65536 anomaly
# on ohm in 2026-04-30..2026-05-02 investigation. The dump output
# cross-referenced against meitner ground-truth (i965 backend)
# confirmed +0x10000 is ffmpeg-vaapi convention, not an ohm bug.
# Resolution: patch 0015. This patch stays in the series until
# the 65536 sentinel handling has been validated on ohm; remove
# before upstream submission.
patch -p1 < "${srcdir}/0014-DEBUG-vapic-bytes-dump.patch"
# Patch 15: strip ffmpeg-vaapi's POC sentinel before passing to
# V4L2. Root-cause fix for the "kernel decodes successfully but
# produces zeroed CAPTURE buffers" symptom. ffmpeg's
# H264POCContext initialises prev_poc_msb to (1 << 16) and the
# value leaks through field_poc[] to VAPictureH264. Working
# backends (i965, intel-iHD) tolerate the high word; V4L2
# stateless drivers cannot. Adds h264_strip_ffmpeg_poc_sentinel()
# static inline and applies it at all 4 POC sites (DPB top/bot,
# CurrPic top/bot). Detection by bit-16-set so a future ffmpeg
# version that fixes the leak degrades gracefully.
patch -p1 < "${srcdir}/0015-h264-strip-ffmpeg-poc-sentinel.patch"
# Patch 16: derive PFRAME / BFRAME flags from VAAPI slice_type.
# Upstreamability fix — tegra-vde consumes these flags to choose
# the inter-frame decode kernel; hantro/rkvdec/cedrus/mediatek/
# qcom don't read them but should still see spec-correct values.
patch -p1 < "${srcdir}/0016-h264-derive-pframe-bframe-flags.patch"
# Patch 17: fill dpb[].pic_num as PicNum/LongTermPicNum per H.264
# spec equations 8-28/8-29 instead of fourier's wrong VAAPI
# surface-id assignment. Adds VAPicture parameter to h264_fill_dpb
# so it can compute FrameNumWrap from log2_max_frame_num_minus4 +
# current frame_num. Mediatek consumes pic_num for short-term
# field-coded ref disambiguation; hantro doesn't read it (uses
# reference_ts), which is why fourier's wrong value never
# surfaced on RK3568.
patch -p1 < "${srcdir}/0017-h264-dpb-picnum-correctness.patch"
# Patch 18: derive sps.level_idc from encoded frame size per
# H.264 Annex A.3 (Table A-1) MaxFS thresholds. Replaces patch
# 0013's intermediate hardcode of 51. For typical content:
# 1080p → Level 4.1 (level_idc=41), 4K → 5.1, 8K → 6.0. Hantro
# uses level_idc to size DPB / MV buffers; correct sizing means
# less wasted memory than 0013's blanket over-allocation.
patch -p1 < "${srcdir}/0018-h264-level-idc-from-annex-a3.patch"
}
build() {
cd "${srcdir}/${_upstreampkg}-${_commit}"
# meson_options.txt only exposes 'kernel_headers' — leave it empty to
# use system /usr/include kernel UAPI headers. No per-codec toggles.
arch-meson build --buildtype=release
meson compile -C build
}
package() {
cd "${srcdir}/${_upstreampkg}-${_commit}"
meson install -C build --destdir "${pkgdir}"
install -Dm644 COPYING "${pkgdir}/usr/share/licenses/${pkgname}/COPYING"
install -Dm644 COPYING.LGPL "${pkgdir}/usr/share/licenses/${pkgname}/COPYING.LGPL"
install -Dm644 COPYING.MIT "${pkgdir}/usr/share/licenses/${pkgname}/COPYING.MIT"
}
@@ -0,0 +1,77 @@
# libva-v4l2-request-ohm-gl-fix
> ## ⚠ DEPRECATED — use [`libva-v4l2-request-fourier`](../libva-v4l2-request-fourier/) instead
>
> This package is the **predecessor experimental** build (tarball pin
> + 18 stacked patches) and is no longer maintained as of 2026-05-16.
> Its successor `libva-v4l2-request-fourier` tracks the campaign fork's
> git history directly
> ([git.reauktion.de/marfrit/libva-v4l2-request-fourier](https://git.reauktion.de/marfrit/libva-v4l2-request-fourier))
> so iteration sweeps (DEBUG removal, follow-up bugfixes) land in a clean
> linear log, and adds the iter38 multi-device probe that lets a single
> libva session serve rkvdec H.264/HEVC/VP9 + hantro MPEG-2/VP8 without
> needing `LIBVA_V4L2_REQUEST_VIDEO_PATH` overrides.
>
> `libva-v4l2-request-fourier` declares
> `replaces=('libva-v4l2-request-ohm-gl-fix')`, so installing it will
> remove this package automatically. Kept in-tree as historical reference
> for the ohm_gl_fix Phase 6 audit trail.
---
Bootlin's libva-v4l2-request VA-API backend, with hantro-vpu
multi-planar + chromium-149-era stateless H.264 patches developed
in the [ohm_gl_fix campaign](../../../ohm_gl_fix/) Phase 6 Step 1
(2026-05-01..2026-05-02).
Patches 0001-0018 are contract-correct against the kernel V4L2
stateless H.264 UAPI, validated by inspection against
hantro_h264.c and v4l2_h264_init_reflist_builder() in linux 6.19.x.
See ohm_gl_fix's `phase6/step1/audit_0008_decode_params_2026-05-01.md`
and `phase6/step1/api_contract_findings_2026-05-01.md` for the
audit trail.
## Honest characterisation
This package compiles cleanly, installs cleanly, and `vainfo` with
`LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1`
enumerates all H.264 profiles. It is, however, **not on Brave's
critical decode path** on the ohm_gl_fix Step-1+Step-2 stack —
Brave/Chromium uses its own `V4L2VideoDecoder` in
`media/gpu/v4l2/`, opens `/dev/video1` directly, and never loads
libva. See `phase3_remeasure_2026-05-02/B3_decoder_discovery.md`
in the ohm_gl_fix repo for the strace/maps evidence.
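The enumeration check described above can be wrapped in a small helper. This is only a sketch: `probe_v4l2_request` is a hypothetical name that is not shipped by this package, and `/dev/video1` is simply the decoder node quoted in the text; the node number may differ on other boards.

```sh
# Sketch: run vainfo against the v4l2_request backend.
# probe_v4l2_request is a hypothetical helper, not part of this package.
probe_v4l2_request() {
    # Decoder node to probe; defaults to the path used in this README.
    dev="${1:-/dev/video1}"
    if [ ! -e "$dev" ]; then
        echo "probe: $dev not present" >&2
        return 1
    fi
    # Same two environment variables as in the text above.
    LIBVA_DRIVER_NAME=v4l2_request \
    LIBVA_V4L2_REQUEST_VIDEO_PATH="$dev" \
    vainfo
}
```

Calling `probe_v4l2_request` with no argument probes `/dev/video1`; pass another node as the first argument if the decoder lives elsewhere.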
For consumers that do route through libva (mpv with
`--hwdec=vaapi`, ffmpeg-vaapi, Firefox with
`media.ffmpeg.vaapi.enabled=true`), this backend would in
principle engage hantro hardware decode, but each consumer hits
its own downstream issue:
- mpv `--vo=gpu-next`: blocked by a Mesa-panfrost WSI pitch bug
during EGL dmabuf import (out of scope for this package; see
`phase3_remeasure_2026-05-02/A3_mesa_wsi_pitch.md`).
- mpv `--vo=image`: silently falls back to libavcodec SW decode
rather than engaging the libva session (see
`phase3_remeasure_2026-05-02/A1_morning_pass_disambiguation.md`).
Reason not yet diagnosed.
The most likely use case where this backend cleanly delivers
hardware decode end-to-end is **Firefox via libavcodec-vaapi**, on
a stack that also has the Mesa pitch issue resolved (or that
doesn't hit the EGL import path because Firefox's video element
composites differently). Untested at time of writing.
## DEBUG patches in the series
`0010-DEBUG-hex-dump-output-capture.patch`,
`0011-DEBUG-sentinel-capture-buffer.patch`, and
`0014-DEBUG-vapic-bytes-dump.patch` produce verbose stderr output
useful for diagnosing decode-path issues but excessive for
production. They are intentionally kept in the applied series
during development; remove for cleaner runs.
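As a rough sketch of what "remove for cleaner runs" could look like, the helper below filters the `*-DEBUG-*` patches out of a list of patch names. `list_non_debug_patches` is a hypothetical name, not something the PKGBUILD provides. It is also an untested assumption that the remaining patches still apply once the DEBUG ones are dropped, since later patches in the series were stacked on top of them.

```sh
# Sketch: print every patch name that is not DEBUG instrumentation.
# list_non_debug_patches is a hypothetical helper, not part of the PKGBUILD.
list_non_debug_patches() {
    for p in "$@"; do
        case "$p" in
            *-DEBUG-*) ;;                      # skip instrumentation patches
            *.patch)   printf '%s\n' "$p" ;;   # keep everything else
        esac
    done
}

list_non_debug_patches \
    0009-surface-no-capture-sfmt.patch \
    0010-DEBUG-hex-dump-output-capture.patch \
    0011-DEBUG-sentinel-capture-buffer.patch \
    0012-h264-omit-scaling-matrix-frame-based.patch
```

The `source`, `sha256sums`, and `prepare()` entries for the dropped patches would all need to be removed together.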
## Status
Tagged stable as of 2026-05-02 against chromium-149 / linux-6.19.10
/ libva-2.22.0. Contract-correct; ecosystem-validation-pending.
File diff suppressed because it is too large
@@ -0,0 +1,168 @@
# Maintainer: Markus Fritsche <mfritsche@reauktion.de>
#
# linux-ampere-fourier — CoolPi GenBook (RK3588) kernel built from the
# kernel-agent fleet/ampere.yaml manifest applied to mainline v7.0-rc3.
#
# kafr2 baseline (2026-05-18): mainline v7.0-rc3 + the 10 scope-tagged
# kernel-agent patches under patches/{soc,module,board,driver}/:
# - 1 soc/rk3588 pwm15 pinctrl
# - 6 board/coolpi-cm5-genbook DTS patches (pwm-fan, RK806 power-off,
# speaker, USB-C PD, lid switch + USB3 PHY, microphone)
# - 3 driver/media VP9-on-VDPU381 patches (Sarma's v8 series, imported
# via marfrit/kernel-agent#12 closure and PR #24)
#
# Drops the prior f8f3ad9 baseline ("18 commits ahead") because that tip
# black-screens ampere — kernel-agent's ka-promote produces this 10-patch
# minimal set from fleet/ampere.yaml. End-to-end VP9 + AV1 (av1-vpu-dec
# is mainline-7.0) decode verified bit-exact via kdirect on the
# hand-built tip 48a8c78 before this package iteration was cut.
#
# Coexists with the user's other extlinux labels in
# /boot/firmware/extlinux/extlinux.conf; never edits them. Adds a
# managed `linux-ampere-fourier` label (the user sets `default` manually
# after verifying boot).
#
# Bootloader path: /boot/firmware/ (vfat on mmcblk0p1). Kernel +
# initramfs + DTB land there directly. Reverting = boot a different
# extlinux label (e.g. arch_mainline, ubuntu_mainline).
pkgbase=linux-ampere-fourier
pkgname=("$pkgbase" "$pkgbase-headers")
pkgver=7.0rc3.kafr2
pkgrel=1
pkgdesc='CoolPi GenBook kernel (v7.0-rc3 + kernel-agent fleet/ampere.yaml — 6 board patches + 3 VP9-VDPU381 + 1 pwm15)'
arch=(aarch64)
url='https://git.reauktion.de/marfrit/kernel-agent'
license=(GPL-2.0-only)
makedepends=(
bc cpio gettext kmod libelf pahole perl python tar xz
ccache
uboot-tools dtc
)
options=('!strip')
# Pinned tip of the kernel-agent-managed source tree for ampere.
# 10 commits ahead of v7.0-rc3, exactly mirroring fleet/ampere.yaml's
# manifest under apply order:
# - c57d069 soc/rk3588: pwm15 pinctrl entries
# - 05a915c board/genbook: pwm-fan with thermal cooling
# - d007b90 module/coolpi-cm5: RK806 system-power-controller
# - 3722eab board/genbook: speaker via audio-graph-card
# - 3e42ab6 board/genbook: USB-C PD via FUSB302
# - 7c241f2 board/genbook: lid switch + USB3 PHY lane
# - dd545fa board/genbook: wire internal microphone
# - 9ddcae5 driver/media: rkvdec-vp9 helper rename (Sarma)
# - c5063d9 driver/media: rkvdec move vp9 to common (Sarma)
# - 48a8c78 driver/media: rkvdec VP9 for VDPU381 (Sarma)
#
# This is the same tree state ka-promote ampere produces as cumulative.patch
# (see marfrit/kernel-agent build/ampere/v7.0-rc3/manifest.lock for the
# b2sum + per-patch sha256s).
_commit=48a8c785de7f5320513052a64e544c6310d7b273
source=(
# Local tarball produced by ./prebuild.sh from a local clone of the
# linux-rk3588-marfrit branch. Not fetched from a URL because the
# boltzmann working clone is shallow (gitea push rejects) and the
# 260MB tarball isn't committed to marfrit-packages. Run prebuild.sh
# before makepkg; see README in this dir.
"linux-rk3588-marfrit-${_commit:0:7}.tar.gz"
'config' # snapshot of running ampere kernel's /proc/config.gz (7.0.0-rc3-ARCH+)
'linux-ampere-fourier.preset'
'extlinux-add.hook'
'extlinux-add.sh'
)
sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
# kernelrelease becomes <VERSION>.<PATCHLEVEL>.<SUBLEVEL><EXTRAVERSION><LOCALVERSION>,
# i.e. 7.0.0-rc3-ampere-fourier. Module dir + LOCALVERSION suffix keep
# this disjoint from the hand-managed /boot/firmware/Image-7.0.0-rc3-ARCH+
# that's currently on the host.
_kernver=7.0.0-rc3-ampere-fourier
_srcdir=linux-rk3588-marfrit
prepare() {
cd "${_srcdir}"
echo ":: writing config"
cp "${srcdir}/config" .config
# LOCALVERSION suffix to differentiate from upstream-stock builds.
scripts/config --set-str LOCALVERSION "-ampere-fourier"
scripts/config -d LOCALVERSION_AUTO
echo ":: olddefconfig (accept new symbols sensibly)"
make olddefconfig
make -s kernelrelease > version
echo ":: kernel release: $(<version)"
}
build() {
cd "${_srcdir}"
unset LDFLAGS
# Native build only — no distcc per kernel-agent policy
# (feedback_kernel_agent_no_distcc.md). ccache stays.
export CC="ccache gcc"
export HOSTCC="ccache gcc"
make ${MAKEFLAGS:--j$(nproc)} Image modules dtbs
}
_package() {
pkgdesc='CoolPi GenBook kernel (ampere-fourier baseline)'
depends=(coreutils kmod mkinitcpio uboot-tools)
optdepends=('linux-firmware: firmware images needed for some devices')
backup=("etc/mkinitcpio.d/${pkgbase}.preset")
cd "${_srcdir}"
local _kver
_kver=$(<version)
# Kernel image into the vfat firmware partition (where extlinux looks).
install -Dm644 arch/arm64/boot/Image \
"${pkgdir}/boot/firmware/Image-ampere-fourier"
# Single DTB for the GenBook target — install directly under
# /boot/firmware/ (no subdir, matches existing host convention).
install -Dm644 arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dtb \
"${pkgdir}/boot/firmware/rk3588-coolpi-cm5-genbook.dtb-ampere-fourier"
ZSTD_CLEVEL=19 make INSTALL_MOD_PATH="${pkgdir}/usr" \
INSTALL_MOD_STRIP=1 modules_install
rm -f "${pkgdir}/usr/lib/modules/${_kver}/"{source,build}
install -Dm644 "${srcdir}/${pkgbase}.preset" \
"${pkgdir}/etc/mkinitcpio.d/${pkgbase}.preset"
install -Dm644 "${srcdir}/extlinux-add.hook" \
"${pkgdir}/usr/share/libalpm/hooks/95-${pkgbase}-extlinux.hook"
install -Dm755 "${srcdir}/extlinux-add.sh" \
"${pkgdir}/usr/share/libalpm/scripts/${pkgbase}-extlinux"
}
_package-headers() {
pkgdesc='Headers and scripts for the linux-ampere-fourier kernel'
depends=(pahole)
cd "${_srcdir}"
local _kver _builddir
_kver=$(<version)
_builddir="${pkgdir}/usr/lib/modules/${_kver}/build"
install -Dt "${_builddir}" -m644 .config Makefile Module.symvers System.map vmlinux version
install -Dt "${_builddir}/kernel" -m644 kernel/Makefile
cp -a scripts "${_builddir}"
install -Dt "${_builddir}/arch/arm64" -m644 arch/arm64/Makefile
cp -a arch/arm64/include "${_builddir}/arch/arm64/"
cp -a include "${_builddir}/"
find . -name 'Kbuild' -exec install -Dm644 {} "${_builddir}/{}" \;
find . -name 'Kconfig*' -exec install -Dm644 {} "${_builddir}/{}" \;
install -d "${pkgdir}/usr/src"
ln -sr "${_builddir}" "${pkgdir}/usr/src/${pkgbase}"
}
eval "package_${pkgbase}() { _package; }"
eval "package_${pkgbase}-headers() { _package-headers; }"
@@ -0,0 +1,70 @@
# linux-ampere-fourier
Kernel package for ampere (CoolPi GenBook RK3588). Baselined on
`marfrit/linux-rk3588-marfrit @ 48a8c78` (mainline v7.0-rc3 + the 10
scope-tagged kernel-agent patches: 1 soc, 6 board, 3 VP9-on-VDPU381
driver patches). The earlier `f8f3ad9` baseline (18 commits ahead) was
dropped because that tip black-screens ampere.
See `marfrit/kernel-agent/fleet/ampere.yaml` for the manifest and
`marfrit/kernel-agent/patches/{soc,module,board,driver}/...` for the
scope-tagged patches in the baseline.
## Build
The kernel source isn't on Gitea — boltzmann's working clone is
shallow (Gitea refuses shallow pushes) and a 260MB tarball doesn't
belong in `marfrit-packages`. Stage the source locally from a
clone of the `linux-rk3588-marfrit` branch:
```sh
cd arch/linux-ampere-fourier
./prebuild.sh # produces linux-rk3588-marfrit-48a8c78.tar.gz
makepkg -s --noconfirm # native aarch64 build; no distcc
```
`prebuild.sh` looks at `$LINUX_RK3588_MARFRIT_TREE` (default
`~/src/linux-rockchip`) for the kernel working tree. The tip commit
must be reachable in that clone — fetch the `linux-rk3588-marfrit`
branch first if you cloned from elsewhere.
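Before running `prebuild.sh`, the reachability requirement above can be checked by hand. The sketch below mirrors the `git cat-file -e` test that `prebuild.sh` performs; `tree_has_commit` is a hypothetical helper name, and the commit to pass is whatever `COMMIT` is pinned to in `prebuild.sh`.

```sh
# Sketch: check that a kernel working tree contains a given commit.
# tree_has_commit is a hypothetical helper mirroring prebuild.sh's check.
tree_has_commit() {
    tree="$1"
    commit="$2"
    if [ ! -d "$tree/.git" ]; then
        echo "no git tree at $tree" >&2
        return 2
    fi
    # cat-file -e exits non-zero when the object is missing from the clone.
    git -C "$tree" cat-file -e "${commit}^{commit}" 2>/dev/null
}
```

For example, `tree_has_commit "${LINUX_RK3588_MARFRIT_TREE:-$HOME/src/linux-rockchip}" "$COMMIT"` returns 0 when the pinned commit is present, and a non-zero status means the branch still needs fetching.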
## Build hosts
Native aarch64 only (per kernel-agent `feedback_kernel_agent_no_distcc.md`).
Either ampere itself (8C/2.4GHz, 32GB, native arch) or boltzmann
(Rock 5 ITX+, same uarch). fermi as fallback.
## Install
Adds a managed label to `/boot/firmware/extlinux/extlinux.conf`:
```
label linux-ampere-fourier
menu label linux-ampere-fourier (managed)
kernel /Image-ampere-fourier
initrd /initramfs-ampere-fourier.img
fdt /rk3588-coolpi-cm5-genbook.dtb-ampere-fourier
append <inherited from arch_mainline>
```
Default label is NOT changed. After verifying boot of the managed
label at the u-boot menu, you flip `default` manually. Reverting =
boot a different label (e.g. `arch_mainline`, `ubuntu_mainline`).
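To inspect what the hook actually wrote before flipping `default`, the managed block can be printed on its own. This is a minimal sketch: `show_managed_block` is a hypothetical helper, and the two marker strings are assumed to be the `TAG_BEGIN` / `TAG_END` values used by `extlinux-add.sh`.

```sh
# Sketch: print only the managed linux-ampere-fourier block.
# show_managed_block is a hypothetical helper; the markers are the
# TAG_BEGIN / TAG_END strings from extlinux-add.sh.
show_managed_block() {
    awk -v b='# >>> linux-ampere-fourier (managed) >>>' \
        -v e='# <<< linux-ampere-fourier (managed) <<<' '
        $0 == b { inside = 1; next }   # start after the begin marker
        $0 == e { inside = 0; next }   # stop at the end marker
        inside  { print }
    ' "${1:-/boot/firmware/extlinux/extlinux.conf}"
}
```

With no argument the helper reads `/boot/firmware/extlinux/extlinux.conf`; pass a copy of the file as the first argument to check it offline.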
## Boot path
ampere uses `/boot/firmware/` (vfat on mmcblk0p1, ~1G), distinct
from fresnel's `/boot/` on root partition. The PKGBUILD installs
Image and DTB directly under `/boot/firmware/`, and the mkinitcpio
preset writes the initramfs there as well. No DTB subdir: single
board target.
## Open follow-ups (per kernel-agent issue #6)
- **Ask 2** (VP9 enablement on RK3588 rkvdec): now covered by the
three `driver/media` VP9-on-VDPU381 patches in the kafr2 baseline.
- **Ask 3** (AV1 decoder integration) — backend libva work, not kernel.
- Hosting the source tarball publicly so `prebuild.sh` isn't needed —
candidate: Gitea release asset, or `packages.reauktion.de/sources/`.
- Splitting the 12 non-board cherry-pick commits in the baseline
(4 Shawn Lin phy, 2 Cristian Ciocaltea, etc.) into scope-tagged
patches in kernel-agent — currently they ride along inside the
pinned baseline rather than being explicit `includes:` in the
manifest.
File diff suppressed because it is too large
@@ -0,0 +1,13 @@
[Trigger]
Operation = Install
Operation = Upgrade
Operation = Remove
Type = Path
Target = boot/firmware/Image-ampere-fourier
Target = boot/firmware/rk3588-coolpi-cm5-genbook.dtb-ampere-fourier
Target = boot/firmware/initramfs-ampere-fourier.img
[Action]
Description = Updating extlinux entry for linux-ampere-fourier
When = PostTransaction
Exec = /usr/share/libalpm/scripts/linux-ampere-fourier-extlinux
@@ -0,0 +1,62 @@
#!/bin/sh
# Add / update / remove the linux-ampere-fourier entry in
# /boot/firmware/extlinux/extlinux.conf. Idempotent. Coexists with
# the hand-managed labels in that file; never edits them. Default
# label is NOT touched — user picks at u-boot menu.
#
# ampere boots from a vfat partition (mmcblk0p1) mounted at
# /boot/firmware/, distinct from fresnel's /boot/ on root.
set -eu
CONF="/boot/firmware/extlinux/extlinux.conf"
TAG_BEGIN="# >>> linux-ampere-fourier (managed) >>>"
TAG_END="# <<< linux-ampere-fourier (managed) <<<"
# Copy APPEND from the user's `arch_mainline` label so the managed
# entry inherits the same root= and console= settings the host's
# bootloader already trusts. Falls back to a CoolPi GenBook default
# if no arch_mainline label exists (first-time install on a fresh
# bootloader config).
EXISTING_APPEND=$(awk '
/^[[:space:]]*label[[:space:]]+arch_mainline[[:space:]]*$/ { found=1; next }
found && /^[[:space:]]*append[[:space:]]/ {
sub(/^[[:space:]]*append[[:space:]]+/, "")
print
exit
}
/^[[:space:]]*label[[:space:]]/ { found=0 }
' "$CONF" 2>/dev/null || true)
APPEND="${EXISTING_APPEND:-root=LABEL=arch rw rootwait rootfstype=btrfs rootflags=subvol=@,ssd,discard=async console=ttyS2,1500000 console=tty1 consoleblank=0 loglevel=7 cma=256M coherent_pool=2M}"
ENTRY=$(cat <<EOF
${TAG_BEGIN}
label linux-ampere-fourier
menu label linux-ampere-fourier (managed)
kernel /Image-ampere-fourier
initrd /initramfs-ampere-fourier.img
fdt /rk3588-coolpi-cm5-genbook.dtb-ampere-fourier
append ${APPEND}
${TAG_END}
EOF
)
# Strip any prior managed block, then append fresh
TMP=$(mktemp)
awk -v b="$TAG_BEGIN" -v e="$TAG_END" '
$0==b{skip=1; next}
$0==e{skip=0; next}
!skip{print}
' "$CONF" > "$TMP"
# Post-Remove: kernel files absent → don't re-add the entry
if [ -f "/boot/firmware/Image-ampere-fourier" ] \
&& [ -f "/boot/firmware/rk3588-coolpi-cm5-genbook.dtb-ampere-fourier" ]; then
printf '%s\n' "$ENTRY" >> "$TMP"
echo "linux-ampere-fourier: extlinux entry updated"
else
echo "linux-ampere-fourier: kernel files absent, entry removed"
fi
mv "$TMP" "$CONF"
@@ -0,0 +1,13 @@
# mkinitcpio preset for linux-ampere-fourier
#
# ampere boots from /boot/firmware/ (vfat partition on mmcblk0p1). The
# initramfs lands there too so extlinux can pick it up. Only one PRESET
# because /boot/firmware is ~1G total — no room for a fallback image
# alongside the primary.
ALL_kver="/boot/firmware/Image-ampere-fourier"
ALL_microcode=()
PRESETS=('default')
default_image="/boot/firmware/initramfs-ampere-fourier.img"
@@ -0,0 +1,68 @@
#!/bin/bash
# prebuild — stage the kernel source tarball this PKGBUILD expects.
#
# linux-ampere-fourier's source is a snapshot of marfrit/linux-rk3588-marfrit
# @ 48a8c78 (260MB), too big to commit to marfrit-packages and currently
# unpushable to Gitea (boltzmann's working clone is shallow; gitea push
# refuses shallow updates). Hosting the tarball outside Gitea would need
# infrastructure setup that's not in scope for the first iteration.
#
# So: produce the tarball locally from the kernel working tree just
# before makepkg runs. Idempotent — if an existing tarball matches the
# expected sha256 we skip the archive step.
#
# Run from this directory: cd arch/linux-ampere-fourier && ./prebuild.sh
# Override the kernel-tree location: LINUX_RK3588_MARFRIT_TREE=/path ./prebuild.sh
#
# Default tree location matches the boltzmann/ampere convention from
# kernel-agent issue #6: $HOME/src/linux-rockchip.
set -euo pipefail
TREE="${LINUX_RK3588_MARFRIT_TREE:-${HOME}/src/linux-rockchip}"
COMMIT=48a8c785de7f5320513052a64e544c6310d7b273
# Generated tarball sha varies with gzip version — script warns-not-fails.
# Leave EXPECTED empty for fresh kafr2 builds; first successful build can
# pin a canonical sha here if a reproducibility audit needs it.
SHA256_EXPECTED=
HERE="$(cd "$(dirname "$0")" && pwd)"
OUTPUT="${HERE}/linux-rk3588-marfrit-${COMMIT:0:7}.tar.gz"
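if [ -f "$OUTPUT" ]; then
have=$(sha256sum "$OUTPUT" | cut -d' ' -f1)
if [ -n "$SHA256_EXPECTED" ] && [ "$have" = "$SHA256_EXPECTED" ]; then
echo "prebuild: $OUTPUT already exists with correct sha256"
exit 0
elif [ -z "$SHA256_EXPECTED" ]; then
# No canonical sha pinned: nothing to compare against, so rebuild.
echo "prebuild: no expected sha256 pinned, regenerating $OUTPUT" >&2
rm -f "$OUTPUT"
else
echo "prebuild: existing $OUTPUT sha mismatch (have=$have, want=$SHA256_EXPECTED), regenerating" >&2
rm -f "$OUTPUT"
fi
fi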
if [ ! -d "$TREE/.git" ]; then
echo "prebuild: kernel tree not at $TREE" >&2
echo " set LINUX_RK3588_MARFRIT_TREE=/path/to/linux-rockchip and retry" >&2
exit 2
fi
cd "$TREE"
if ! git cat-file -e "$COMMIT" 2>/dev/null; then
echo "prebuild: commit $COMMIT not found in $TREE" >&2
echo " fetch the linux-rk3588-marfrit branch first:" >&2
echo " git fetch <remote> linux-rk3588-marfrit" >&2
exit 3
fi
echo "prebuild: generating archive from $TREE @ $COMMIT..."
git archive --format=tar.gz --prefix=linux-rk3588-marfrit/ "$COMMIT" -o "$OUTPUT"
# git archive emits a deterministic tar stream but gzip compression may
# vary by version. The sha256 check is informational; warn-don't-fail.
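have=$(sha256sum "$OUTPUT" | cut -d' ' -f1)
# Only compare when a canonical sha is pinned; an empty SHA256_EXPECTED
# would otherwise trigger this warning on every run.
if [ -n "$SHA256_EXPECTED" ] && [ "$have" != "$SHA256_EXPECTED" ]; then
echo "prebuild: WARNING $OUTPUT sha=$have (canonical=$SHA256_EXPECTED)" >&2
echo " probably a gzip-version difference; tar payload should be identical" >&2
fi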
echo "prebuild: wrote $OUTPUT ($(du -h "$OUTPUT" | cut -f1), sha=$have)"
@@ -0,0 +1,114 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH] arm64: dts: rockchip: rk3399-pinebook-pro: extend OPP tables with OC entries (1.704 / 2.184 GHz)
Inline the OPP-V2 cluster0 / cluster1 tables on the Pinebook Pro DTS to add
overclock points: cluster0 (A53) up to 1.704 GHz @ 1.175 V, cluster1 (A72)
up to 2.184 GHz @ 1.275 V.
Original tables (from rk3399.dtsi → cluster0/1_opp) are reused intact for
their points up to 1.416 / 1.800 GHz; the additional opp06 (cluster0) and
opp08 (cluster1) extend the curve. Voltages above the stock max follow the
community-validated PBP OC ladder; thermals on the Pinebook Pro chassis
sustain these without throttling for normal workloads.
Because the inline nodes sit at the same path and carry the same labels
(`cluster0_opp`, `cluster1_opp`) as the parent tables, DTC merges them with
the parent nodes rather than treating them as new nodes.
opp00..opp05/07 have identical content to upstream; opp06/08 add the OC
points.
scope: board/pinebook-pro
fleet: fresnel
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts | 78 ++++++++++++++++++++
1 file changed, 78 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
--- a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
@@ -18,6 +18,84 @@
compatible = "pine64,pinebook-pro", "rockchip,rk3399";
chassis-type = "laptop";
+ cluster0_opp: opp-table-0 {
+ compatible = "operating-points-v2";
+ opp-shared;
+
+ opp00 {
+ opp-hz = /bits/ 64 <408000000>;
+ opp-microvolt = <750000 750000 1150000>;
+ clock-latency-ns = <40000>;
+ };
+ opp01 {
+ opp-hz = /bits/ 64 <600000000>;
+ opp-microvolt = <825000 825000 1150000>;
+ };
+ opp02 {
+ opp-hz = /bits/ 64 <816000000>;
+ opp-microvolt = <850000 850000 1150000>;
+ };
+ opp03 {
+ opp-hz = /bits/ 64 <1008000000>;
+ opp-microvolt = <900000 900000 1150000>;
+ };
+ opp04 {
+ opp-hz = /bits/ 64 <1200000000>;
+ opp-microvolt = <975000 975000 1150000>;
+ };
+ opp05 {
+ opp-hz = /bits/ 64 <1416000000>;
+ opp-microvolt = <1100000 1100000 1150000>;
+ };
+ opp06 {
+ opp-hz = /bits/ 64 <1704000000>;
+ opp-microvolt = <1175000 1175000 1175000>;
+ };
+ };
+
+ cluster1_opp: opp-table-1 {
+ compatible = "operating-points-v2";
+ opp-shared;
+
+ opp00 {
+ opp-hz = /bits/ 64 <408000000>;
+ opp-microvolt = <750000 750000 1250000>;
+ clock-latency-ns = <40000>;
+ };
+ opp01 {
+ opp-hz = /bits/ 64 <600000000>;
+ opp-microvolt = <800000 800000 1250000>;
+ };
+ opp02 {
+ opp-hz = /bits/ 64 <816000000>;
+ opp-microvolt = <825000 825000 1250000>;
+ };
+ opp03 {
+ opp-hz = /bits/ 64 <1008000000>;
+ opp-microvolt = <850000 850000 1250000>;
+ };
+ opp04 {
+ opp-hz = /bits/ 64 <1200000000>;
+ opp-microvolt = <900000 900000 1250000>;
+ };
+ opp05 {
+ opp-hz = /bits/ 64 <1416000000>;
+ opp-microvolt = <975000 975000 1250000>;
+ };
+ opp06 {
+ opp-hz = /bits/ 64 <1608000000>;
+ opp-microvolt = <1050000 1050000 1250000>;
+ };
+ opp07 {
+ opp-hz = /bits/ 64 <1800000000>;
+ opp-microvolt = <1150000 1150000 1250000>;
+ };
+ opp08 {
+ opp-hz = /bits/ 64 <2184000000>;
+ opp-microvolt = <1275000 1275000 1275000>;
+ };
+ };
+
aliases {
mmc0 = &sdio0;
mmc1 = &sdmmc;
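The operating points this patch adds are easier to review as a frequency/voltage table. A rough sketch, assuming a hypothetical helper name `opp_table` and plain DTS text on stdin (no leading `+` diff markers); it relies on the `opp-microvolt` triple being `<target min max>` and prints the target value:

```sh
#!/bin/sh
# opp_table: hypothetical helper, reads DTS text on stdin and prints one
# "<MHz> MHz @ <mV> mV" line per operating point.
opp_table() {
    awk '
        /opp-hz/ {
            gsub(/[<>;]/, " ")          # drop DTS cell delimiters
            hz = $NF                    # last field is the frequency in Hz
        }
        /opp-microvolt/ {
            gsub(/[<>;]/, " ")
            # fields: property name, "=", target, min, max
            printf "%d MHz @ %d mV\n", hz / 1000000, $3 / 1000
        }
    '
}

# The two overclock points from the patch (cluster0 opp06, cluster1 opp08).
opp_table <<'EOF'
opp06 {
	opp-hz = /bits/ 64 <1704000000>;
	opp-microvolt = <1175000 1175000 1175000>;
};
opp08 {
	opp-hz = /bits/ 64 <2184000000>;
	opp-microvolt = <1275000 1275000 1275000>;
};
EOF
```

Feeding the whole `opp-table-0` or `opp-table-1` node through the same helper gives the full ladder for that cluster.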
@@ -0,0 +1,30 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH] arm64: dts: rockchip: rk3399-pinebook-pro: enable hdmi_sound
The Pinebook Pro routes HDMI audio through the rk3399 HDMI block; the audio
path is otherwise functional but the simple-audio-card binding is left
disabled by default in the inherited DTSI. Flip status to "okay" so HDMI
displays expose an ALSA sink.
scope: board/pinebook-pro
fleet: fresnel
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
--- a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
@@ -421,6 +499,10 @@
status = "okay";
};
+&hdmi_sound {
+ status = "okay";
+};
+
&i2c0 {
clock-frequency = <400000>;
i2c-scl-falling-time-ns = <4>;
@@ -0,0 +1,28 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH] arm64: dts: rockchip: rk3399-pinebook-pro: cap spi1 to 10 MHz
The on-board SPI flash on the Pinebook Pro is reachable via spi1; setting
max-freq on the controller below its nominal max keeps the bus stable
across the variants of NOR flash shipped with the laptop. 10 MHz is the
community-validated upper bound that all observed parts handle without
read-back errors during routine boot probes.
scope: board/pinebook-pro
fleet: fresnel
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
--- a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
@@ -947,6 +1029,7 @@
};
&spi1 {
+ max-freq = <10000000>;
status = "okay";
spiflash: flash@0 {
@@ -0,0 +1,356 @@
From a202de1646d4c8f8ee2ebc2e4c100b621975754a Mon Sep 17 00:00:00 2001
In-Reply-To: <20260429195306.239666-1-mfritsche@reauktion.de>
References: <20260429195306.239666-1-mfritsche@reauktion.de>
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Sat, 9 May 2026 16:16:07 +0200
Subject: [PATCH RFC v2] media: videobuf2: add opt-in dma_resv producer fence
helper
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
V4L2 producers historically don't propagate buffer-state-done into
the dmabuf's dma_resv exclusive fence. Userspace consumers that
import V4L2-produced dmabufs and wait on the dmabuf's implicit-sync
fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE,
EGL_LINUX_DMA_BUF_EXT) currently see either zero fences or a stub
fence from dma_fence_get_stub(). This is correct by accident for the
common DQBUF-then-import case but represents a contract gap that
breaks Wayland compositors importing CAPTURE buffers from a stateless
H.264 decoder under continuous playback on implicit-sync GPU stacks
(observed on RK3566 + hantro VPU + Mali-G52 panfrost; manifests as
green frames -- BT.709 limited-range YUV(0,0,0) -> RGB(0,77,0) -- when
the GPU samples the dmabuf before the producer's decode completes).
Add an opt-in API gated by both a per-driver runtime flag
(vb2_queue::supports_release_fences) and a Kconfig
(CONFIG_VIDEOBUF2_RELEASE_FENCES, default n) that lets producers
populate a real dma_resv exclusive write fence on the dmabufs they
export. Drivers call vb2_buffer_attach_release_fence(vb) at a
finite-time-fenced point in their pipeline (typically m2m
device_run, just before the HW kick); vb2_buffer_done() signals and
puts the fence as part of its state transition.
The publish and signal paths are wrapped in
dma_fence_begin_signalling() / dma_fence_end_signalling() so
PROVE_LOCKING can validate that nothing taken in those critical
sections deadlocks against the signal path. dma_resv_lock is
sleepable but not taken on the signal path, so taking it inside the
publish critical section is safe under lockdep.
Skips planes whose vb2_plane.dbuf is NULL -- buffers never exported
via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF) have no
dmabuf for userspace to wait on.
Drivers that don't opt in pay nothing: the helper is a no-op stub
when CONFIG_VIDEOBUF2_RELEASE_FENCES=n, and an early-return check
of supports_release_fences when =y but the flag is unset.
Validated on RK3566 PineTab2 with PROVE_LOCKING enabled: 30s of
bbb_1080p30 H.264 stateless decode + zero-copy panfrost EGL import
via dmabuf-wayland (mpv 0.41 + KWin 6.6.4 + Mesa panfrost 26.0.5)
produces 31,816 dma_fence init/signal pairs across 5,724 vb2 buffer
cycles with zero lockdep splats from videobuf2 / dma_resv code paths.
Subsequent patches in this series opt the hantro and rockchip-rga
drivers in.
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Christian König <christian.koenig@amd.com>
Cc: Nicolas Dufresne <nicolas@ndufresne.ca>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/common/videobuf2/Kconfig | 29 ++++
.../media/common/videobuf2/videobuf2-core.c | 135 ++++++++++++++++++
include/media/videobuf2-core.h | 51 +++++++
3 files changed, 215 insertions(+)
diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
index d2223a12c..bbfa26984 100644
--- a/drivers/media/common/videobuf2/Kconfig
+++ b/drivers/media/common/videobuf2/Kconfig
@@ -30,3 +30,32 @@ config VIDEOBUF2_DMA_SG
config VIDEOBUF2_DVB
tristate
select VIDEOBUF2_CORE
+
+config VIDEOBUF2_RELEASE_FENCES
+ bool "videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports"
+ depends on VIDEOBUF2_CORE
+ depends on DMA_SHARED_BUFFER
+ default n
+ help
+ Enables an opt-in API that lets vb2 producers populate a dma_resv
+ exclusive write fence on the dmabufs they export to userspace.
+ The fence is signalled when the buffer transitions to
+ VB2_BUF_STATE_DONE.
+
+ This gives userspace consumers that import V4L2-produced dmabufs
+ and wait on the dmabuf's implicit-sync fence (poll(POLLIN),
+ DMA_BUF_IOCTL_EXPORT_SYNC_FILE, EGL_LINUX_DMA_BUF_EXT) a real
+ producer fence to wait on, instead of a stub fence from
+ dma_fence_get_stub() that the dma_buf core substitutes when
+ dma_resv is empty.
+
+ Drivers individually opt in by setting
+ vb2_queue::supports_release_fences = true and calling
+ vb2_buffer_attach_release_fence() at the right point in their
+ pipeline (typically m2m device_run, just before HW kick).
+
+ Distributors should leave this off unless targeting Wayland/EGL
+ consumers of V4L2 stateless decoder output on
+ implicit-sync-only GPU stacks (e.g. mainline panfrost).
+
+ If unsure, say N.
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index adf668b21..85d7fddbd 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -26,6 +26,12 @@
#include <linux/freezer.h>
#include <linux/kthread.h>
+#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES
+#include <linux/dma-fence.h>
+#include <linux/dma-resv.h>
+#include <linux/dma-buf.h>
+#endif
+
#include <media/videobuf2-core.h>
#include <media/v4l2-mc.h>
@@ -1173,6 +1179,120 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no)
}
EXPORT_SYMBOL_GPL(vb2_plane_cookie);
+#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES
+/*
+ * dma_resv release-fence integration.
+ *
+ * Optional, opt-in path that lets producers publish a real
+ * dma_fence on their CAPTURE-side dmabufs so userspace consumers
+ * (compositors, EGL importers) get spec-clean implicit-sync
+ * semantics instead of the dma_buf core's stub fence. Drivers
+ * call vb2_buffer_attach_release_fence() at a finite-time-fenced
+ * point (typically m2m device_run) and the fence is signalled by
+ * vb2_buffer_done(). Gated at runtime by
+ * vb2_queue::supports_release_fences and at compile time by
+ * CONFIG_VIDEOBUF2_RELEASE_FENCES.
+ */
+
+static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence)
+{
+ return "videobuf2";
+}
+
+static const char *vb2_dma_resv_get_timeline_name(struct dma_fence *fence)
+{
+ return "vb2-release-fence";
+}
+
+static const struct dma_fence_ops vb2_dma_resv_fence_ops = {
+ .get_driver_name = vb2_dma_resv_get_driver_name,
+ .get_timeline_name = vb2_dma_resv_get_timeline_name,
+};
+
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb)
+{
+ struct vb2_queue *q = vb->vb2_queue;
+ struct dma_fence *fence;
+ unsigned int plane;
+ bool cookie;
+
+ if (!q->supports_release_fences)
+ return 0;
+
+ if (WARN_ON(vb->release_fence))
+ return -EINVAL;
+
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+ if (!fence)
+ return -ENOMEM;
+
+ dma_fence_init(fence, &vb2_dma_resv_fence_ops, &q->dma_resv_fence_lock,
+ q->dma_resv_fence_context,
+ atomic64_inc_return(&q->dma_resv_fence_seqno));
+
+ /*
+ * Annotate the publish-side critical section. Per
+ * Documentation/driver-api/dma-buf.rst, lockdep validates
+ * that nothing taken in this region can deadlock against
+ * the signal path in vb2_buffer_signal_release_fence().
+ * dma_resv_lock is sleepable but is not taken on the signal
+ * path, so taking it inside the critical section is safe.
+ */
+ cookie = dma_fence_begin_signalling();
+ for (plane = 0; plane < vb->num_planes; plane++) {
+ struct dma_buf *dbuf = vb->planes[plane].dbuf;
+
+ if (!dbuf)
+ continue;
+
+ dma_resv_lock(dbuf->resv, NULL);
+ dma_resv_add_fence(dbuf->resv, fence, DMA_RESV_USAGE_WRITE);
+ dma_resv_unlock(dbuf->resv);
+ }
+ dma_fence_end_signalling(cookie);
+
+ /* One reference for the eventual signal in vb2_buffer_done. */
+ vb->release_fence = dma_fence_get(fence);
+
+ /* Each plane's dma_resv now holds its own reference. Drop ours. */
+ dma_fence_put(fence);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vb2_buffer_attach_release_fence);
+
+static void vb2_buffer_signal_release_fence(struct vb2_buffer *vb,
+ enum vb2_buffer_state state)
+{
+ struct dma_fence *fence = vb->release_fence;
+ bool cookie;
+
+ if (!fence)
+ return;
+
+ cookie = dma_fence_begin_signalling();
+ if (state == VB2_BUF_STATE_ERROR)
+ dma_fence_set_error(fence, -EIO);
+ dma_fence_signal(fence);
+ dma_fence_end_signalling(cookie);
+
+ dma_fence_put(fence);
+ vb->release_fence = NULL;
+}
+#else /* !CONFIG_VIDEOBUF2_RELEASE_FENCES */
+
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vb2_buffer_attach_release_fence);
+
+static inline void vb2_buffer_signal_release_fence(struct vb2_buffer *vb,
+ enum vb2_buffer_state state)
+{
+}
+#endif /* CONFIG_VIDEOBUF2_RELEASE_FENCES */
+
void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
{
struct vb2_queue *q = vb->vb2_queue;
@@ -1199,6 +1319,9 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
if (state != VB2_BUF_STATE_QUEUED)
__vb2_buf_mem_finish(vb);
+ if (state != VB2_BUF_STATE_QUEUED)
+ vb2_buffer_signal_release_fence(vb, state);
+
spin_lock_irqsave(&q->done_lock, flags);
if (state == VB2_BUF_STATE_QUEUED) {
vb->state = VB2_BUF_STATE_QUEUED;
@@ -2651,6 +2774,18 @@ int vb2_core_queue_init(struct vb2_queue *q)
mutex_init(&q->mmap_lock);
init_waitqueue_head(&q->done_wq);
+#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES
+ /*
+ * Per-queue dma_resv release-fence context. Drivers that
+ * opt in via supports_release_fences and call
+ * vb2_buffer_attach_release_fence() use these to allocate
+ * fences on a single per-queue timeline.
+ */
+ q->dma_resv_fence_context = dma_fence_context_alloc(1);
+ atomic64_set(&q->dma_resv_fence_seqno, 0);
+ spin_lock_init(&q->dma_resv_fence_lock);
+#endif
+
q->memory = VB2_MEMORY_UNKNOWN;
if (q->buf_struct_size == 0)
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 4424d481d..766ff2194 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -288,6 +288,16 @@ struct vb2_buffer {
unsigned int skip_cache_sync_on_finish:1;
struct vb2_plane planes[VB2_MAX_PLANES];
+#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES
+ /*
+ * Producer release fence published on each plane's
+ * dmabuf->resv when the driver opts in via
+ * vb2_buffer_attach_release_fence(). Signalled and put by
+ * vb2_buffer_done() on transition to DONE/ERROR. NULL when
+ * the driver did not opt in for this buffer.
+ */
+ struct dma_fence *release_fence;
+#endif
struct list_head queued_entry;
struct list_head done_entry;
#ifdef CONFIG_VIDEO_ADV_DEBUG
@@ -648,6 +658,19 @@ struct vb2_queue {
spinlock_t done_lock;
wait_queue_head_t done_wq;
+#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES
+ /*
+ * dma_resv release-fence context. Drivers that set
+ * supports_release_fences and call
+ * vb2_buffer_attach_release_fence() use these to allocate
+ * fences on a per-queue timeline.
+ */
+ u64 dma_resv_fence_context;
+ atomic64_t dma_resv_fence_seqno;
+ spinlock_t dma_resv_fence_lock;
+#endif
+
+ unsigned int supports_release_fences:1;
unsigned int streaming:1;
unsigned int start_streaming_called:1;
unsigned int error:1;
@@ -735,6 +758,34 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no);
*/
void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state);
+/**
+ * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence.
+ * @vb: the buffer being committed to the producer.
+ *
+ * Drivers that have set vb2_queue::supports_release_fences may call
+ * this from any sleepable context where they have committed to
+ * running the operation in finite time -- typically m2m
+ * device_run(), just before the HW kick. The helper allocates a
+ * dma_fence on the queue's per-queue timeline, attaches it as
+ * DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv, and stashes
+ * it in vb->release_fence. vb2_buffer_done() signals and puts the
+ * fence as part of the buffer's state transition.
+ *
+ * Skips planes whose vb2_plane.dbuf is NULL -- buffers never
+ * exported via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF)
+ * have no dmabuf for userspace to wait on.
+ *
+ * No-op when vb2_queue::supports_release_fences is not set
+ * (regardless of CONFIG_VIDEOBUF2_RELEASE_FENCES). When
+ * CONFIG_VIDEOBUF2_RELEASE_FENCES=n, this is a stub that returns 0.
+ *
+ * Returns 0 on success or when the no-op stub is in effect,
+ * -ENOMEM on allocation failure, or -EINVAL if a fence is already
+ * attached to @vb. Best-effort: drivers should ignore the return
+ * value unless they want diagnostics.
+ */
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb);
+
/**
* vb2_discard_done() - discard all buffers marked as DONE.
* @q: pointer to &struct vb2_queue with videobuf2 queue.
--
2.53.0
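The green colour quoted in the commit message above follows from applying the BT.709 limited-range conversion to an all-zero (not yet written) buffer. A small sketch, assuming 8-bit samples, the usual BT.709 coefficients (Kr = 0.2126, Kb = 0.0722) and a hypothetical helper name `yuv709_to_rgb`:

```sh
#!/bin/sh
# yuv709_to_rgb Y CB CR: hypothetical helper, converts one 8-bit
# limited-range BT.709 sample to full-range RGB and prints "R G B".
yuv709_to_rgb() {
    awk -v y="$1" -v cb="$2" -v cr="$3" '
        # round to nearest and clamp to the 0..255 range
        function clamp8(x) { x = int(x + 0.5); return x < 0 ? 0 : (x > 255 ? 255 : x) }
        BEGIN {
            luma = 1.164383 * (y - 16)      # 255/219 scaling of Y
            u = cb - 128                    # centred chroma
            v = cr - 128
            printf "%d %d %d\n",
                clamp8(luma + 1.792741 * v),
                clamp8(luma - 0.213249 * u - 0.532909 * v),
                clamp8(luma + 2.112402 * u)
        }'
}

yuv709_to_rgb 0 0 0        # zero page: prints "0 77 0"
yuv709_to_rgb 16 128 128   # video black: prints "0 0 0"
```

Red and blue clamp to zero because the negative chroma offsets dominate, while the two green chroma terms are positive and leave roughly 77.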
@@ -0,0 +1,95 @@
From 1844c263bde8dd244d7db46f8c508e7c70da459c Mon Sep 17 00:00:00 2001
In-Reply-To: <20260429195306.239666-1-mfritsche@reauktion.de>
References: <20260429195306.239666-1-mfritsche@reauktion.de>
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Sat, 9 May 2026 16:24:01 +0200
Subject: [PATCH RFC v2] media: hantro: attach dma_resv release fence at
device_run
Opt the hantro driver into the new vb2 release-fence helper so its
CAPTURE-side dmabufs carry a real producer fence that Wayland
compositors and other implicit-sync consumers can wait on, instead
of the dma_buf core's stub fence.
Attach point is m2m device_run, immediately after
v4l2_m2m_buf_copy_metadata() and before ctx->codec_ops->run().
Per Nicolas Dufresne's v1 review (lore.kernel.org/linux-media/
3d8deeb15581b754e4c061d4c4a13657aa08bc3c.camel@ndufresne.ca/),
this satisfies the dma_fence finite-time contract: the m2m core
has committed to running the job by this point, codec_ops->run
either kicks the HW (decode-complete signals the fence via
vb2_buffer_done) or fails immediately (job_finish with
VB2_BUF_STATE_ERROR signals with -EIO). PM and clocks are already
up by this point, so no allocation context restrictions.
The CAPTURE queue is opted in with supports_release_fences=true at
queue_init.
Userspace consumers that import hantro CAPTURE dmabufs and wait on
their implicit-sync fence (Wayland zwp_linux_dmabuf_v1 +
panfrost EGL_LINUX_DMA_BUF_EXT) now wait on a real fence
representing the producer's actual completion, fixing green-frame
corruption observed on RK3566 PineTab2 + Mali-G52 panfrost (the
GPU was sampling zero pages because the dmabuf's implicit fence
was the dma_buf core's pre-signalled stub).
Validated end-to-end on PineTab2 (RK3566 / hantro G1 / Mali-G52
mainline panfrost): 30s of bbb_1080p30 H.264 stateless decode +
zero-copy panfrost EGL import via dmabuf-wayland (mpv 0.41 +
KWin 6.6.4 + Mesa panfrost 26.0.5) renders correctly with no
green-frame corruption and no PROVE_LOCKING splats.
Cc: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Nicolas Dufresne <nicolas@ndufresne.ca>
Cc: linux-media@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
.../media/platform/verisilicon/hantro_drv.c | 23 +++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/media/platform/verisilicon/hantro_drv.c b/drivers/media/platform/verisilicon/hantro_drv.c
index 2e81877f6..6a66c47ed 100644
--- a/drivers/media/platform/verisilicon/hantro_drv.c
+++ b/drivers/media/platform/verisilicon/hantro_drv.c
@@ -186,6 +186,22 @@ static void device_run(void *priv)
v4l2_m2m_buf_copy_metadata(src, dst);
+ /*
+ * Attach a producer fence on the CAPTURE-side dmabuf so userspace
+ * importers (e.g. Wayland compositors) get spec-clean implicit-sync
+ * semantics. Called from device_run rather than buf_queue: the
+ * dma_fence finite-time contract requires that once a fence is
+ * published, the producer must signal it in finite time. By the
+ * time we reach device_run, the m2m core has committed to running
+ * this job, and the next hop (codec_ops->run) either kicks the HW
+ * (decode-complete signals the fence via vb2_buffer_done) or
+ * fails immediately (job_finish with VB2_BUF_STATE_ERROR signals
+ * the fence with -EIO). Either path resolves the fence in finite
+ * time. Best-effort: an -ENOMEM here only costs implicit-sync
+ * precision for this frame; it is not a functional regression.
+ */
+ (void)vb2_buffer_attach_release_fence(&dst->vb2_buf);
+
if (ctx->codec_ops->run(ctx))
goto err_cancel_job;
@@ -249,6 +265,13 @@ queue_init(void *priv, struct vb2_queue *src_vq, struct vb2_queue *dst_vq)
dst_vq->lock = &ctx->dev->vpu_mutex;
dst_vq->dev = ctx->dev->v4l2_dev.dev;
+ /*
+ * Opt the CAPTURE queue into vb2 release-fence publishing.
+ * No-op unless CONFIG_VIDEOBUF2_RELEASE_FENCES=y; runtime cost
+ * is one extra fence allocation + dma_resv update per device_run.
+ */
+ dst_vq->supports_release_fences = true;
+
return vb2_queue_init(dst_vq);
}
--
2.53.0
@@ -0,0 +1,117 @@
From 2c63a63bf65739763051dc4ce7ce2ffaf2d514c4 Mon Sep 17 00:00:00 2001
In-Reply-To: <20260429195306.239666-1-mfritsche@reauktion.de>
References: <20260429195306.239666-1-mfritsche@reauktion.de>
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Sat, 9 May 2026 16:50:51 +0200
Subject: [PATCH RFC v2] media: rockchip-rga: attach dma_resv release fence at
device_run
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Opt the rockchip-rga driver into the new vb2 release-fence helper.
Same shape as the hantro patch: attach a producer fence on the
CAPTURE-side dmabuf at m2m device_run, signalled by
vb2_buffer_done() when RGA completes the m2m operation.
Differs from hantro in one mechanical detail: rga's device_run
wraps the entire body in spin_lock_irqsave(&rga->ctrl_lock). Our
helper calls dma_resv_lock(), which is sleepable, so the
buffer-fetch + fence-attach sequence has to run above the spinlock.
Restructure device_run so:
- v4l2_m2m_next_src_buf / next_dst_buf,
- src->sequence increment,
- vb2_buffer_attach_release_fence(&dst->vb2_buf)
run before spin_lock_irqsave; only the rga->curr assignment and
rga_hw_start() (the actual HW kick) remain inside the spinlock.
This is safe under the m2m-job ownership model: by the time
device_run is called, the m2m core has selected this context and
serializes one device_run per context, so v4l2_m2m_next_*_buf
returns stable pointers until the corresponding *_buf_remove in
rga_isr. ctrl_lock was previously protecting per-device state
(rga->curr) and the HW register access, neither of which depends on
the buffer-fetch happening inside the lock.
The CAPTURE queue is opted in with supports_release_fences=true at
queue_init.
Userspace consumers of RGA-produced dmabufs (image-processing
pipelines, screen-rotation servers, gstreamer flows on Rockchip
boards) get spec-clean implicit-sync semantics, matching what
hantro does in the previous patch in this series.
Sven Püschel's ongoing "media: platform: rga: Add RGA3 support"
v5 series (linux-rockchip 2026-04-28) restructures rga.c
substantially. If that lands first, the device_run restructure
here will need a rebase against the new shape; the locking story
itself is invariant.
Cc: Jacob Chen <jacob-chen@iotwrt.com>
Cc: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Cc: Sven Püschel <s.pueschel@pengutronix.de>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: linux-media@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/rockchip/rga/rga.c | 27 +++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)
diff --git a/drivers/media/platform/rockchip/rga/rga.c b/drivers/media/platform/rockchip/rga/rga.c
index fea63b94c..03030c7ea 100644
--- a/drivers/media/platform/rockchip/rga/rga.c
+++ b/drivers/media/platform/rockchip/rga/rga.c
@@ -38,15 +38,28 @@ static void device_run(void *prv)
struct vb2_v4l2_buffer *src, *dst;
unsigned long flags;
- spin_lock_irqsave(&rga->ctrl_lock, flags);
-
- rga->curr = ctx;
-
+ /*
+ * Fetch the next-job buffers and (best-effort) attach a producer
+ * fence on CAPTURE before taking ctrl_lock below.
+ * vb2_buffer_attach_release_fence() takes dma_resv_lock, which is
+ * sleepable; ctrl_lock is taken with spin_lock_irqsave so any
+ * sleepable call must happen above it. Buffer ownership is
+ * already committed at this point: the m2m core has selected
+ * this context for device_run and serializes one device_run per
+ * context, so v4l2_m2m_next_*_buf returns stable pointers until
+ * the corresponding *_buf_remove in rga_isr.
+ */
src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
src->sequence = ctx->osequence++;
dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
+ (void)vb2_buffer_attach_release_fence(&dst->vb2_buf);
+
+ spin_lock_irqsave(&rga->ctrl_lock, flags);
+
+ rga->curr = ctx;
+
rga_hw_start(rga, vb_to_rga(src), vb_to_rga(dst));
spin_unlock_irqrestore(&rga->ctrl_lock, flags);
@@ -123,6 +136,12 @@ queue_init(void *priv, struct vb2_queue *src_vq, struct vb2_queue *dst_vq)
dst_vq->lock = &ctx->rga->mutex;
dst_vq->dev = ctx->rga->v4l2_dev.dev;
+ /*
+ * Opt the CAPTURE queue into vb2 release-fence publishing.
+ * Compile-time gated by CONFIG_VIDEOBUF2_RELEASE_FENCES.
+ */
+ dst_vq->supports_release_fences = true;
+
return vb2_queue_init(dst_vq);
}
--
2.53.0
+134
@@ -0,0 +1,134 @@
# Maintainer: Markus Fritsche <mfritsche@reauktion.de>
#
# linux-fresnel-fourier — overclocked + DTS-tweaked Pinebook Pro kernel
# baselined on mmind/linux-rockchip v7.0.
#
# Coexists with linux-eos-arm; ships its own /boot/{Image,dtbs}-fresnel-fourier
# paths. Extlinux entry adds itself as a parallel boot option; user picks at
# u-boot menu. Reverting = boot the linux-eos-arm entry.
pkgbase=linux-fresnel-fourier
pkgname=("$pkgbase" "$pkgbase-headers")
pkgver=7.0
pkgrel=14
pkgdesc='Pinebook Pro kernel (mmind/linux-rockchip v7.0 + OC OPP + PBP DTS tweaks + vb2_dma_resv RFC v2)'
arch=(aarch64)
url='https://git.reauktion.de/marfrit/kernel-agent'
license=(GPL-2.0-only)
makedepends=(
bc cpio gettext kmod libelf pahole perl python tar xz
ccache
uboot-tools dtc
)
options=('!strip')
source=(
"https://git.kernel.org/torvalds/t/linux-${pkgver}.tar.gz"
# board/pinebook-pro
'0001-arm64-dts-rk3399-pinebook-pro-add-OC-OPP-tables-1704-2184.patch'
'0002-arm64-dts-rk3399-pinebook-pro-enable-hdmi-sound.patch'
'0003-arm64-dts-rk3399-pinebook-pro-spi1-max-freq-10MHz.patch'
# subsystem/media/videobuf2/dma-resv-release-fence (RFC v2, in kernel-agent)
'0004-media-videobuf2-add-opt-in-dma_resv-producer-fence-h.patch'
'0005-media-hantro-attach-dma_resv-release-fence-at-device.patch'
'0006-media-rockchip-rga-attach-dma_resv-release-fence-at-.patch'
'config' # snapshot of fresnel /usr/lib/modules/6.19.10-1-eos-arm/build/.config
'linux-fresnel-fourier.preset'
'extlinux-add.hook'
'extlinux-add.sh'
)
sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
_kernver=${pkgver}.0-fresnel-fourier
_srcdir=linux-${pkgver}
prepare() {
cd "${_srcdir}"
echo ":: applying patches"
for p in "${srcdir}"/*.patch; do
echo " $(basename "$p")"
patch -Np1 -i "$p"
done
echo ":: writing config"
cp "${srcdir}/config" .config
# Force a LOCALVERSION suffix so the kernel release string and module dir
# become ${_kernver}, keeping them disjoint from linux-eos-arm.
scripts/config --set-str LOCALVERSION "-fresnel-fourier"
scripts/config -d LOCALVERSION_AUTO
echo ":: olddefconfig (accept new symbols sensibly)"
make olddefconfig
make -s kernelrelease > version
echo ":: kernel release: $(<version)"
}
build() {
cd "${_srcdir}"
unset LDFLAGS
# Native build only — no distcc per kernel-agent policy
# (feedback_kernel_agent_no_distcc.md). ccache stays.
export CC="ccache gcc"
export HOSTCC="ccache gcc"
make ${MAKEFLAGS:--j$(nproc)} Image modules dtbs
}
_package() {
pkgdesc='Pinebook Pro overclocked kernel (fresnel-fourier baseline)'
depends=(coreutils kmod mkinitcpio uboot-tools)
optdepends=('linux-firmware: firmware images needed for some devices')
backup=("etc/mkinitcpio.d/${pkgbase}.preset")
cd "${_srcdir}"
local _kver
_kver=$(<version)
install -Dm644 arch/arm64/boot/Image \
"${pkgdir}/boot/Image-fresnel-fourier"
# DTBs into a private tree to avoid clobbering linux-eos-arm
install -d "${pkgdir}/boot/dtbs-fresnel-fourier/rockchip"
cp -a arch/arm64/boot/dts/rockchip/*.dtb \
"${pkgdir}/boot/dtbs-fresnel-fourier/rockchip/"
ZSTD_CLEVEL=19 make INSTALL_MOD_PATH="${pkgdir}/usr" \
INSTALL_MOD_STRIP=1 modules_install
rm -f "${pkgdir}/usr/lib/modules/${_kver}/"{source,build}
install -Dm644 "${srcdir}/${pkgbase}.preset" \
"${pkgdir}/etc/mkinitcpio.d/${pkgbase}.preset"
install -Dm644 "${srcdir}/extlinux-add.hook" \
"${pkgdir}/usr/share/libalpm/hooks/95-${pkgbase}-extlinux.hook"
install -Dm755 "${srcdir}/extlinux-add.sh" \
"${pkgdir}/usr/share/libalpm/scripts/${pkgbase}-extlinux"
}
_package-headers() {
pkgdesc='Headers and scripts for the linux-fresnel-fourier kernel'
depends=(pahole)
cd "${_srcdir}"
local _kver _builddir
_kver=$(<version)
_builddir="${pkgdir}/usr/lib/modules/${_kver}/build"
install -Dt "${_builddir}" -m644 .config Makefile Module.symvers System.map vmlinux version
install -Dt "${_builddir}/kernel" -m644 kernel/Makefile
cp -a scripts "${_builddir}"
install -Dt "${_builddir}/arch/arm64" -m644 arch/arm64/Makefile
cp -a arch/arm64/include "${_builddir}/arch/arm64/"
cp -a include "${_builddir}/"
find . -name 'Kbuild' -exec install -Dm644 {} "${_builddir}/{}" \;
find . -name 'Kconfig*' -exec install -Dm644 {} "${_builddir}/{}" \;
install -d "${pkgdir}/usr/src"
ln -sr "${_builddir}" "${pkgdir}/usr/src/${pkgbase}"
}
eval "package_${pkgbase}() { _package; }"
eval "package_${pkgbase}-headers() { _package-headers; }"
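Since `CONFIG_VIDEOBUF2_RELEASE_FENCES` defaults to `n` and `make olddefconfig` takes the default for any symbol missing from the snapshot, the three media patches may build as no-ops unless the symbol is switched on, for example with `scripts/config -e VIDEOBUF2_RELEASE_FENCES` before `make olddefconfig`. This is an assumption: the suppressed `config` file may already set it. A minimal sketch for checking a `.config`, using a hypothetical helper name `kconfig_enabled` and a throwaway file in place of the real one:

```sh
#!/bin/sh
# kconfig_enabled FILE SYMBOL: hypothetical helper, succeeds only if
# CONFIG_<SYMBOL>=y appears as a whole line in FILE.
kconfig_enabled() {
    grep -qx "CONFIG_$2=y" "$1"
}

CFG=$(mktemp)   # stand-in for the kernel tree's .config
printf '%s\n' 'CONFIG_VIDEOBUF2_CORE=m' \
              '# CONFIG_VIDEOBUF2_RELEASE_FENCES is not set' > "$CFG"

if kconfig_enabled "$CFG" VIDEOBUF2_RELEASE_FENCES; then
    echo "release fences: enabled"
else
    echo "release fences: disabled (helper compiles to the no-op stub)"
fi
rm -f "$CFG"
```

Run against the real `.config` at the end of `prepare()`, the same check would flag a build that silently drops the feature.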
File diff suppressed because it is too large
@@ -0,0 +1,13 @@
[Trigger]
Operation = Install
Operation = Upgrade
Operation = Remove
Type = Path
Target = boot/Image-fresnel-fourier
Target = boot/initramfs-fresnel-fourier.img
Target = boot/dtbs-fresnel-fourier/rockchip/rk3399-pinebook-pro.dtb
[Action]
Description = Updating extlinux entry for linux-fresnel-fourier
When = PostTransaction
Exec = /usr/share/libalpm/scripts/linux-fresnel-fourier-extlinux
@@ -0,0 +1,44 @@
#!/bin/sh
# Add / update / remove the linux-fresnel-fourier entry in /boot/extlinux/extlinux.conf
# Idempotent. Coexists with the existing linux-eos-arm entry; never edits it.
set -eu
CONF="/boot/extlinux/extlinux.conf"
TAG_BEGIN="# >>> linux-fresnel-fourier (managed) >>>"
TAG_END="# <<< linux-fresnel-fourier (managed) <<<"
# What the entry should look like. APPEND is copied from the first APPEND
# line in the config (normally linux-eos-arm); otherwise sane PBP default.
EOS_APPEND=$(awk '/^[[:space:]]*APPEND/{print substr($0,index($0,"APPEND")+7); exit}' "$CONF" 2>/dev/null || true)
APPEND="${EOS_APPEND:-console=ttyS2,1500000 root=PARTLABEL=root rw rootwait}"
ENTRY=$(cat <<EOF
${TAG_BEGIN}
LABEL fresnel-fourier
MENU LABEL fresnel-fourier OC kernel ($(uname -m))
LINUX /Image-fresnel-fourier
INITRD /initramfs-fresnel-fourier.img
FDT /dtbs-fresnel-fourier/rockchip/rk3399-pinebook-pro.dtb
APPEND ${APPEND}
${TAG_END}
EOF
)
# Strip any prior managed block first, then append fresh
TMP=$(mktemp)
awk -v b="$TAG_BEGIN" -v e="$TAG_END" '
$0==b{skip=1; next}
$0==e{skip=0; next}
!skip{print}
' "$CONF" > "$TMP"
# If the kernel files no longer exist (post-Remove), don't re-add the entry.
if [ -f "/boot/Image-fresnel-fourier" ] && [ -f "/boot/dtbs-fresnel-fourier/rockchip/rk3399-pinebook-pro.dtb" ]; then
printf '%s\n' "$ENTRY" >> "$TMP"
echo "linux-fresnel-fourier: extlinux entry updated"
else
echo "linux-fresnel-fourier: kernel files absent, entry removed"
fi
mv "$TMP" "$CONF"
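The strip-then-append step is what makes the script idempotent, and it can be exercised without touching `/boot`. A minimal sketch, assuming a hypothetical wrapper name `strip_managed` around the same awk program and a temporary file standing in for `extlinux.conf`:

```sh
#!/bin/sh
# Sketch: run the managed-block strip from extlinux-add.sh against a
# throwaway file instead of /boot/extlinux/extlinux.conf.
set -eu

TAG_BEGIN="# >>> linux-fresnel-fourier (managed) >>>"
TAG_END="# <<< linux-fresnel-fourier (managed) <<<"

# strip_managed FILE: print FILE without the managed block (markers included).
strip_managed() {
    awk -v b="$TAG_BEGIN" -v e="$TAG_END" '
        $0==b{skip=1; next}
        $0==e{skip=0; next}
        !skip{print}
    ' "$1"
}

DEMO=$(mktemp)   # hypothetical stand-in for extlinux.conf
cat > "$DEMO" <<EOF
LABEL eos-arm
    LINUX /Image
${TAG_BEGIN}
LABEL fresnel-fourier
    LINUX /Image-fresnel-fourier
${TAG_END}
EOF

STRIPPED=$(strip_managed "$DEMO")
printf '%s\n' "$STRIPPED"   # only the eos-arm entry remains
rm -f "$DEMO"
```

Because everything between the two marker lines is dropped before the fresh entry is appended, running the script repeatedly leaves exactly one managed block.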
@@ -0,0 +1,11 @@
# mkinitcpio preset for linux-fresnel-fourier
ALL_kver="/boot/Image-fresnel-fourier"
ALL_microcode=()
PRESETS=('default' 'fallback')
default_image="/boot/initramfs-fresnel-fourier.img"
fallback_image="/boot/initramfs-fresnel-fourier-fallback.img"
fallback_options="-S autodetect"
+6 -3
@@ -3,15 +3,18 @@
# Source of truth: git.reauktion.de/marfrit/lmcp
pkgname=lmcp
-pkgver=0.5.4
+pkgver=1.2.1
pkgrel=1
pkgdesc="Lightweight MCP (Model Context Protocol) server in pure Lua"
arch=('any')
url="https://git.reauktion.de/marfrit/lmcp"
license=('MIT')
depends=('lua' 'lua-socket')
-source=("${pkgname}-${pkgver}.tar.gz::https://git.reauktion.de/marfrit/lmcp/archive/v${pkgver}.tar.gz")
-sha256sums=('af72b8c1d88255456b75d2c53cd5c451a8923417e5498ef31858539397e09caf')
+# The _tag back-translation handles both clean releases (no '_') and
+# pre-release pkgvers (e.g. 1.2.0_rc1 → v1.2.0-rc1).
+_tag="v${pkgver//_/-}"
+source=("${pkgname}-${pkgver}.tar.gz::https://git.reauktion.de/marfrit/lmcp/archive/${_tag}.tar.gz")
+sha256sums=('bf9cce1a84c66b1b74c5aec923c5960d60ae33c221afc8d47ce0d74b8f7ee609')
package() {
cd "${pkgname}"
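The `_tag` back-translation in this PKGBUILD can be checked in isolation. A minimal sketch, assuming a hypothetical helper name `pkgver_to_tag`; it uses `tr` rather than the bash-only `${pkgver//_/-}` expansion so that it also runs under plain `sh`:

```sh
#!/bin/sh
# pkgver_to_tag: hypothetical helper mirroring _tag="v${pkgver//_/-}".
# pacman does not allow '-' in pkgver, so pre-release tags are stored
# with '_' and translated back when building the download URL.
pkgver_to_tag() {
    printf 'v%s\n' "$(printf '%s' "$1" | tr '_' '-')"
}

pkgver_to_tag 1.2.1       # clean release: prints "v1.2.1"
pkgver_to_tag 1.2.0_rc1   # pre-release:   prints "v1.2.0-rc1"
```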
@@ -0,0 +1,56 @@
From 9d3bbd3651eb8405b8609e4f5e8c4978056483d0 Mon Sep 17 00:00:00 2001
From: Jonas Karlman <jonas@kwiboo.se>
Date: Sun, 18 Aug 2024 17:42:14 -0700
Subject: [PATCH 1/2] meson: add detection logic for v4l2request support
We will probably adjust this to look for a specific libavutil version after
v4l2request support is merged upstream, but this check is fine for now.
---
meson.build | 11 +++++++++++
meson.options | 1 +
2 files changed, 12 insertions(+)
diff --git a/meson.build b/meson.build
index d4c75a907f..540f279dc7 100644
--- a/meson.build
+++ b/meson.build
@@ -1444,6 +1444,16 @@ if features['ios-gl']
sources += files('video/out/hwdec/hwdec_ios_gl.m')
endif
+v4l2request = get_option('v4l2request').require(
+ cc.has_header_symbol('libavutil/hwcontext.h',
+ 'AV_HWDEVICE_TYPE_V4L2REQUEST',
+ dependencies: libavutil)
+)
+features += {'v4l2request': v4l2request.allowed()}
+if features['v4l2request']
+ sources += files('video/v4l2request.c')
+endif
+
libva = dependency('libva', version: '>= 1.1.0', required: get_option('vaapi'))
vaapi_drm = dependency('libva-drm', version: '>= 1.1.0', required:
@@ -1911,6 +1921,7 @@ summary({'cocoa': features['cocoa'] and features['swift'],
'libmpv': get_option('libmpv'),
'lua': features['lua'],
'opengl': features['gl'],
+ 'v4l2request': features['v4l2request'],
'vulkan': features['vulkan'],
'wayland': features['wayland'],
'x11': features['x11']},
diff --git a/meson.options b/meson.options
index 836d16d03f..54ec2dccfc 100644
--- a/meson.options
+++ b/meson.options
@@ -103,6 +103,7 @@ option('d3d-hwaccel', type: 'feature', value: 'auto', description: 'D3D11VA hwac
option('d3d9-hwaccel', type: 'feature', value: 'auto', description: 'DXVA2 hwaccel')
option('gl-dxinterop-d3d9', type: 'feature', value: 'auto', description: 'OpenGL/DirectX DXVA2 hwaccel')
option('ios-gl', type: 'feature', value: 'auto', description: 'iOS OpenGL ES interop support')
+option('v4l2request', type: 'feature', value: 'auto', description: 'V4L2 Request API hwaccel')
option('videotoolbox-gl', type: 'feature', value: 'auto', description: 'Videotoolbox with OpenGL')
option('videotoolbox-pl', type: 'feature', value: 'auto', description: 'Videotoolbox with libplacebo')
--
2.52.0
@@ -0,0 +1,81 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: Fri, 8 May 2026 23:30:00 +0000
Subject: [PATCH] vo_dmabuf_wayland: explicit DMA_BUF_IOCTL_SYNC on import fds
V4L2 does not attach implicit fences (dma_resv) to CAPTURE buffers
on VIDIOC_DQBUF. When the buffer is forwarded to a Wayland compositor
that imports it via wl_dmabuf and samples in the GPU, the GPU may
read from physical memory before the producer's writes have flushed,
producing all-zero output (manifests as solid green for BT.601
limited-range YUV(0,0,0) -> RGB(0, 135, 0) on the consumer side).
Issue an explicit DMA_BUF_IOCTL_SYNC(SYNC_START|SYNC_RW) +
SYNC_END(SYNC_RW) round-trip on each unique dma_buf fd before
zwp_linux_buffer_params_v1_add(). This invokes the producer driver's
dma_buf_ops->begin_cpu_access / end_cpu_access, which on most ARM
SoCs flushes write buffers and synchronizes coherent memory before
the compositor's GPU import.
This is a userspace workaround. Root cause is the missing implicit
fence on V4L2 CAPTURE DQBUF and is being addressed upstream via
the vb2_dma_resv RFC.
Without this patch, on RK3566 (hantro VPU + Mali-G52 panfrost +
KDE Plasma 6 / KWin 6.6.4), `mpv --hwdec=vaapi --vo=dmabuf-wayland`
shows solid green frames for all hardware-decoded content. With
this patch, decoded frames are presented correctly.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
---
diff --git a/video/out/vo_dmabuf_wayland.c b/video/out/vo_dmabuf_wayland.c
index 6b7c511..16e3d18 100644
--- a/video/out/vo_dmabuf_wayland.c
+++ b/video/out/vo_dmabuf_wayland.c
@@ -27,6 +27,12 @@
#include <va/va_drmcommon.h>
#endif
+/* fourier patch: explicit dma_buf cache sync workaround for missing
+ * implicit-fence on V4L2 stateless CAPTURE buffers. Applies to both
+ * VAAPI and DRMPrime import paths. */
+#include <linux/dma-buf.h>
+#include <sys/ioctl.h>
+
#include "gpu/hwdec.h"
#include "gpu/video.h"
#include "mpv_talloc.h"
@@ -205,6 +211,14 @@ static void vaapi_dmabuf_importer(struct buffer *buf, struct mp_image *src,
buf->drm_format = 0;
goto done;
}
+ /* fourier patch: explicit cache coherency sync on each dma_buf fd
+ * before submitting to the compositor. See top-of-file comment. */
+ for (int obj_no = 0; obj_no < desc.num_objects; obj_no++) {
+ struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW };
+ (void)ioctl(desc.objects[obj_no].fd, DMA_BUF_IOCTL_SYNC, &sync);
+ sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
+ (void)ioctl(desc.objects[obj_no].fd, DMA_BUF_IOCTL_SYNC, &sync);
+ }
for (int plane_no = 0; plane_no < desc.layers[layer_no].num_planes; ++plane_no) {
int object = desc.layers[layer_no].object_index[plane_no];
uint64_t modifier = desc.objects[object].drm_format_modifier;
@@ -258,6 +272,16 @@ static void drmprime_dmabuf_importer(struct buffer *buf, struct mp_image *src,
return;
buf->id = drmprime_surface_id(src);
+
+ /* fourier patch: explicit cache coherency sync on each dma_buf fd
+ * before submitting to the compositor. See top-of-file comment. */
+ for (int obj_no = 0; obj_no < desc->nb_objects; obj_no++) {
+ struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW };
+ (void)ioctl(desc->objects[obj_no].fd, DMA_BUF_IOCTL_SYNC, &sync);
+ sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
+ (void)ioctl(desc->objects[obj_no].fd, DMA_BUF_IOCTL_SYNC, &sync);
+ }
+
for (layer_no = 0; layer_no < desc->nb_layers; layer_no++) {
AVDRMLayerDescriptor layer = desc->layers[layer_no];
--
2.51.0
@@ -0,0 +1,435 @@
From dd1e1fd6fe884d66c49dc26af715e1423c7471a3 Mon Sep 17 00:00:00 2001
From: Philip Langdale <philipl@overt.org>
Date: Sun, 18 Aug 2024 17:43:41 -0700
Subject: [PATCH 2/2] vo: hwdec: drmprime: add separate hwdecs for v4l2request
With all the machinery in place, we can now add the v4l2request hwdecs with a
different hw device type, and a different initialisation path. This applies to
both the drmprime and drmprime_overlay hwdecs.
At the moment, the device initialisation is done in the bare minimum way, but
it can be extended to take a device path (for example) if that makes sense as
we better understand what meaningful configuration will be.
Co-authored-by: Jonas Karlman <jonas@kwiboo.se>
---
video/hwdec.c | 3 +
video/hwdec.h | 1 +
video/out/gpu/hwdec.c | 6 ++
video/out/hwdec/hwdec_drmprime.c | 125 +++++++++++++++++------
video/out/hwdec/hwdec_drmprime_overlay.c | 81 +++++++++++++--
video/out/vo_dmabuf_wayland.c | 1 +
video/v4l2request.c | 34 ++++++
7 files changed, 210 insertions(+), 41 deletions(-)
create mode 100644 video/v4l2request.c
diff --git a/video/hwdec.c b/video/hwdec.c
index deba518e82..de2ffecc40 100644
--- a/video/hwdec.c
+++ b/video/hwdec.c
@@ -125,6 +125,9 @@ static const struct hwcontext_fns *const hwcontext_fns[] = {
#if HAVE_DRM
&hwcontext_fns_drmprime,
#endif
+#if HAVE_V4L2REQUEST
+ &hwcontext_fns_v4l2request,
+#endif
#if HAVE_VAAPI
&hwcontext_fns_vaapi,
#endif
diff --git a/video/hwdec.h b/video/hwdec.h
index e7734e5d7e..bf337389cb 100644
--- a/video/hwdec.h
+++ b/video/hwdec.h
@@ -119,6 +119,7 @@ extern const struct hwcontext_fns hwcontext_fns_cuda;
extern const struct hwcontext_fns hwcontext_fns_d3d11;
extern const struct hwcontext_fns hwcontext_fns_drmprime;
extern const struct hwcontext_fns hwcontext_fns_dxva2;
+extern const struct hwcontext_fns hwcontext_fns_v4l2request;
extern const struct hwcontext_fns hwcontext_fns_vaapi;
extern const struct hwcontext_fns hwcontext_fns_vdpau;
diff --git a/video/out/gpu/hwdec.c b/video/out/gpu/hwdec.c
index be39c507d0..f50b927851 100644
--- a/video/out/gpu/hwdec.c
+++ b/video/out/gpu/hwdec.c
@@ -38,6 +38,8 @@ extern const struct ra_hwdec_driver ra_hwdec_drmprime;
extern const struct ra_hwdec_driver ra_hwdec_drmprime_overlay;
extern const struct ra_hwdec_driver ra_hwdec_aimagereader;
extern const struct ra_hwdec_driver ra_hwdec_vulkan;
+extern const struct ra_hwdec_driver ra_hwdec_v4l2request;
+extern const struct ra_hwdec_driver ra_hwdec_v4l2request_overlay;
const struct ra_hwdec_driver *const ra_hwdec_drivers[] = {
#if HAVE_D3D_HWACCEL
@@ -73,6 +75,10 @@ const struct ra_hwdec_driver *const ra_hwdec_drivers[] = {
&ra_hwdec_drmprime,
&ra_hwdec_drmprime_overlay,
#endif
+#if HAVE_V4L2REQUEST
+ &ra_hwdec_v4l2request,
+ &ra_hwdec_v4l2request_overlay,
+#endif
#if HAVE_ANDROID_MEDIA_NDK
&ra_hwdec_aimagereader,
#endif
diff --git a/video/out/hwdec/hwdec_drmprime.c b/video/out/hwdec/hwdec_drmprime.c
index 7869eb124a..446f63de44 100644
--- a/video/out/hwdec/hwdec_drmprime.c
+++ b/video/out/hwdec/hwdec_drmprime.c
@@ -77,7 +77,7 @@ static const char *forked_pix_fmt_names[] = {
"rpi4_10",
};
-static int init(struct ra_hwdec *hw)
+static int pre_init(struct ra_hwdec *hw)
{
struct priv_owner *p = hw->priv;
@@ -92,36 +92,12 @@ static int init(struct ra_hwdec *hw)
return -1;
}
- /*
- * The drm_params resource is not provided when using X11 or Wayland, but
- * there are extensions that supposedly provide this information from the
- * drivers. Not properly documented. Of course.
- */
- mpv_opengl_drm_params_v2 *params = ra_get_native_resource(hw->ra_ctx->ra,
- "drm_params_v2");
-
- /*
- * Respect drm_device option, so there is a way to control this when not
- * using a DRM gpu context. If drm_params_v2 are present, they will already
- * respect this option.
- */
- void *tmp = talloc_new(NULL);
- struct drm_opts *drm_opts = mp_get_config_group(tmp, hw->global, &drm_conf);
- const char *opt_path = drm_opts->device_path;
-
- const char *device_path = params && params->render_fd > -1 ?
- drmGetRenderDeviceNameFromFd(params->render_fd) :
- opt_path ? opt_path : "/dev/dri/renderD128";
- MP_VERBOSE(hw, "Using DRM device: %s\n", device_path);
+ return 0;
+}
- int ret = av_hwdevice_ctx_create(&p->hwctx.av_device_ref,
- AV_HWDEVICE_TYPE_DRM,
- device_path, NULL, 0);
- talloc_free(tmp);
- if (ret != 0) {
- MP_VERBOSE(hw, "Failed to create hwdevice_ctx: %s\n", av_err2str(ret));
- return -1;
- }
+static int post_init(struct ra_hwdec *hw)
+{
+ struct priv_owner *p = hw->priv;
/*
* At the moment, there is no way to discover compatible formats
@@ -154,6 +130,75 @@ static int init(struct ra_hwdec *hw)
return 0;
}
+static int init_drmprime(struct ra_hwdec *hw)
+{
+ struct priv_owner *p = hw->priv;
+
+ int ret = pre_init(hw);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * The drm_params resource is not provided when using X11 or Wayland, but
+ * there are extensions that supposedly provide this information from the
+ * drivers. Not properly documented. Of course.
+ */
+ mpv_opengl_drm_params_v2 *params = ra_get_native_resource(hw->ra_ctx->ra,
+ "drm_params_v2");
+
+ /*
+ * Respect drm_device option, so there is a way to control this when not
+ * using a DRM gpu context. If drm_params_v2 are present, they will already
+ * respect this option.
+ */
+ void *tmp = talloc_new(NULL);
+ struct drm_opts *drm_opts = mp_get_config_group(tmp, hw->global, &drm_conf);
+ const char *opt_path = drm_opts->device_path;
+
+ const char *device_path = params && params->render_fd > -1 ?
+ drmGetRenderDeviceNameFromFd(params->render_fd) :
+ opt_path ? opt_path : "/dev/dri/renderD128";
+ MP_VERBOSE(hw, "Using DRM device: %s\n", device_path);
+
+ ret = av_hwdevice_ctx_create(&p->hwctx.av_device_ref,
+ AV_HWDEVICE_TYPE_DRM,
+ device_path, NULL, 0);
+ talloc_free(tmp);
+ if (ret < 0) {
+ MP_VERBOSE(hw, "Failed to create hwdevice_ctx: %s\n", av_err2str(ret));
+ return ret;
+ }
+
+ return post_init(hw);
+}
+
+#if HAVE_V4L2REQUEST
+static int init_v4l2request(struct ra_hwdec *hw)
+{
+ struct priv_owner *p = hw->priv;
+
+ int ret = pre_init(hw);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * AVCodecHWConfig contains a combo of a pixel format and hwdevice type,
+ * correct type must be created here or hwaccel will fail.
+ *
+ * FIXME: Create hwdevice based on type in AVCodecHWConfig
+ */
+ ret = av_hwdevice_ctx_create(&p->hwctx.av_device_ref,
+ AV_HWDEVICE_TYPE_V4L2REQUEST,
+ NULL, NULL, 0);
+ if (ret < 0) {
+ MP_VERBOSE(hw, "Failed to create hwdevice_ctx: %s\n", av_err2str(ret));
+ return ret;
+ }
+
+ return post_init(hw);
+}
+#endif
+
static void mapper_unmap(struct ra_hwdec_mapper *mapper)
{
struct priv_owner *p_owner = mapper->owner->priv;
@@ -308,7 +353,7 @@ const struct ra_hwdec_driver ra_hwdec_drmprime = {
.priv_size = sizeof(struct priv_owner),
.imgfmts = {IMGFMT_DRMPRIME, 0},
.device_type = AV_HWDEVICE_TYPE_DRM,
- .init = init,
+ .init = init_drmprime,
.uninit = uninit,
.mapper = &(const struct ra_hwdec_mapper_driver){
.priv_size = sizeof(struct dmabuf_interop_priv),
@@ -318,3 +363,21 @@ const struct ra_hwdec_driver ra_hwdec_drmprime = {
.unmap = mapper_unmap,
},
};
+
+#if HAVE_V4L2REQUEST
+const struct ra_hwdec_driver ra_hwdec_v4l2request = {
+ .name = "v4l2request",
+ .priv_size = sizeof(struct priv_owner),
+ .imgfmts = {IMGFMT_DRMPRIME, 0},
+ .device_type = AV_HWDEVICE_TYPE_V4L2REQUEST,
+ .init = init_v4l2request,
+ .uninit = uninit,
+ .mapper = &(const struct ra_hwdec_mapper_driver){
+ .priv_size = sizeof(struct dmabuf_interop_priv),
+ .init = mapper_init,
+ .uninit = mapper_uninit,
+ .map = mapper_map,
+ .unmap = mapper_unmap,
+ },
+};
+#endif
diff --git a/video/out/hwdec/hwdec_drmprime_overlay.c b/video/out/hwdec/hwdec_drmprime_overlay.c
index 61514f8e89..689e9b04e5 100644
--- a/video/out/hwdec/hwdec_drmprime_overlay.c
+++ b/video/out/hwdec/hwdec_drmprime_overlay.c
@@ -246,7 +246,7 @@ static void uninit(struct ra_hwdec *hw)
}
}
-static int init(struct ra_hwdec *hw)
+static int pre_init(struct ra_hwdec *hw)
{
struct priv *p = hw->priv;
int draw_plane, drmprime_video_plane;
@@ -267,15 +267,15 @@ static int init(struct ra_hwdec *hw)
drm_params->connector_id, draw_plane, drmprime_video_plane);
if (!p->ctx) {
mp_err(p->log, "Failed to retrieve DRM atomic context.\n");
- goto err;
+ return -1;
}
if (!p->ctx->drmprime_video_plane) {
mp_warn(p->log, "No drmprime video plane. You might need to specify it manually using --drm-drmprime-video-plane\n");
- goto err;
+ return -1;
}
} else {
mp_verbose(p->log, "Failed to retrieve DRM fd from native display.\n");
- goto err;
+ return -1;
}
drmModeCrtcPtr crtc;
@@ -289,7 +289,7 @@ static int init(struct ra_hwdec *hw)
uint64_t has_prime;
if (drmGetCap(p->ctx->fd, DRM_CAP_PRIME, &has_prime) < 0) {
MP_ERR(hw, "Card does not support prime handles.\n");
- goto err;
+ return -1;
}
if (has_prime) {
@@ -298,19 +298,67 @@ static int init(struct ra_hwdec *hw)
disable_video_plane(hw);
+ return 0;
+}
+
+static int init_drmprime(struct ra_hwdec *hw)
+{
+ struct priv *p = hw->priv;
+
+ int ret = pre_init(hw);
+ if (ret < 0)
+ goto err;
+
p->hwctx = (struct mp_hwdec_ctx) {
.driver_name = hw->driver->name,
.hw_imgfmt = IMGFMT_DRMPRIME,
};
char *device = drmGetDeviceNameFromFd2(p->ctx->fd);
- int ret = av_hwdevice_ctx_create(&p->hwctx.av_device_ref,
- AV_HWDEVICE_TYPE_DRM, device, NULL, 0);
+ ret = av_hwdevice_ctx_create(&p->hwctx.av_device_ref,
+ AV_HWDEVICE_TYPE_DRM, device, NULL, 0);
if (device)
free(device);
- if (ret != 0) {
+ if (ret < 0) {
+ MP_VERBOSE(hw, "Failed to create hwdevice_ctx: %s\n", av_err2str(ret));
+ goto err;
+ }
+
+ hwdec_devices_add(hw->devs, &p->hwctx);
+
+ return 0;
+
+err:
+ uninit(hw);
+ return ret;
+}
+
+#if HAVE_V4L2REQUEST
+static int init_v4l2request(struct ra_hwdec *hw)
+{
+ struct priv *p = hw->priv;
+
+ int ret = pre_init(hw);
+ if (ret < 0)
+ goto err;
+
+ p->hwctx = (struct mp_hwdec_ctx) {
+ .driver_name = hw->driver->name,
+ .hw_imgfmt = IMGFMT_DRMPRIME,
+ };
+
+ /*
+ * AVCodecHWConfig contains a combo of a pixel format and hwdevice type,
+ * correct type must be created here or hwaccel will fail.
+ *
+ * FIXME: Create hwdevice based on type in AVCodecHWConfig
+ */
+ ret = av_hwdevice_ctx_create(&p->hwctx.av_device_ref,
+ AV_HWDEVICE_TYPE_V4L2REQUEST,
+ NULL, NULL, 0);
+ if (ret < 0) {
MP_VERBOSE(hw, "Failed to create hwdevice_ctx: %s\n", av_err2str(ret));
goto err;
}
@@ -321,15 +369,28 @@ static int init(struct ra_hwdec *hw)
err:
uninit(hw);
- return -1;
+ return ret;
}
+#endif
const struct ra_hwdec_driver ra_hwdec_drmprime_overlay = {
.name = "drmprime-overlay",
.priv_size = sizeof(struct priv),
.imgfmts = {IMGFMT_DRMPRIME, 0},
.device_type = AV_HWDEVICE_TYPE_DRM,
- .init = init,
+ .init = init_drmprime,
+ .overlay_frame = overlay_frame,
+ .uninit = uninit,
+};
+
+#if HAVE_V4L2REQUEST
+const struct ra_hwdec_driver ra_hwdec_v4l2request_overlay = {
+ .name = "v4l2request-overlay",
+ .priv_size = sizeof(struct priv),
+ .imgfmts = {IMGFMT_DRMPRIME, 0},
+ .device_type = AV_HWDEVICE_TYPE_V4L2REQUEST,
+ .init = init_v4l2request,
.overlay_frame = overlay_frame,
.uninit = uninit,
};
+#endif
diff --git a/video/out/vo_dmabuf_wayland.c b/video/out/vo_dmabuf_wayland.c
index 9b06643544..6d62849568 100644
--- a/video/out/vo_dmabuf_wayland.c
+++ b/video/out/vo_dmabuf_wayland.c
@@ -860,6 +860,7 @@ static int preinit(struct vo *vo)
// Initialize all possible hwdec drivers.
ra_hwdec_ctx_init(&p->hwdec_ctx, vo->hwdec_devs, "vaapi", false);
ra_hwdec_ctx_init(&p->hwdec_ctx, vo->hwdec_devs, "drmprime", false);
+ ra_hwdec_ctx_init(&p->hwdec_ctx, vo->hwdec_devs, "v4l2request", false);
p->src = (struct mp_rect){0, 0, 0, 0};
return 0;
diff --git a/video/v4l2request.c b/video/v4l2request.c
new file mode 100644
index 0000000000..2aa4d14fea
--- /dev/null
+++ b/video/v4l2request.c
@@ -0,0 +1,34 @@
+/*
+ * This file is part of mpv.
+ *
+ * mpv is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * mpv is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with mpv. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <libavutil/hwcontext.h>
+
+#include "hwdec.h"
+
+static struct AVBufferRef *v4l2request_create_standalone(struct mpv_global *global,
+ struct mp_log *log, struct hwcontext_create_dev_params *params)
+{
+ AVBufferRef* ref = NULL;
+ av_hwdevice_ctx_create(&ref, AV_HWDEVICE_TYPE_V4L2REQUEST, NULL, NULL, 0);
+
+ return ref;
+}
+
+const struct hwcontext_fns hwcontext_fns_v4l2request = {
+ .av_hwdevice_type = AV_HWDEVICE_TYPE_V4L2REQUEST,
+ .create_dev = v4l2request_create_standalone,
+};
--
2.52.0
+117
@@ -0,0 +1,117 @@
# Maintainer: Markus Fritsche <fritsche.markus@gmail.com>
#
# mpv-fourier — mpv with the fourier-umbrella patches. The first patch
# slot exists for the vo_dmabuf_wayland plane-semantics fix per
# marfrit/dmabuf-modifier-triage#1: mpv currently constructs the
# zwp_linux_buffer_params_v1 wl_protocol message with internally
# inconsistent plane semantics on V4L2 stateless decoder dmabufs (per-
# plane fds combined with single-allocation offset for plane 1), causing
# KWin to read the UV chroma plane past-EOF and render solid dark green
# on ohm (RK3566 + hantro G1 + Mali-G52). iter1 shipped as the explicit
# DMA_BUF_IOCTL_SYNC cache-sync workaround applied in prepare() below.
#
# Build configuration mirrors stock arch's mpv PKGBUILD (which builds
# successfully). Key detail: --auto-features auto is required because
# arch-meson defaults to --auto-features enabled, which makes mpv's
# platform-specific features (win32-threads, etc.) required and fails
# the build on Linux.
#
# Campaign: ~/src/dmabuf-modifier-triage/ (Phase 0 closed 2026-05-08)
# Upstream: https://github.com/mpv-player/mpv
pkgname=mpv-fourier
_upstreampkg=mpv
epoch=1
pkgver=0.41.0
pkgrel=10
pkgdesc='mpv with fourier-umbrella patches (vo_dmabuf_wayland explicit cache-sync workaround + v4l2request hwdec wiring)'
arch=('aarch64')
url='https://mpv.io/'
license=('GPL-2.0-or-later AND LGPL-2.1-or-later')
depends=(
alsa-lib desktop-file-utils ffmpeg-v4l2-request-fourier glibc hicolor-icon-theme
jack lcms2 libarchive libass libbluray libcdio
libcdio-paranoia libdisplay-info libdrm libdvdnav libdvdread libegl libgl
libglvnd libjpeg-turbo libplacebo libpulse libsixel libva
libvdpau libx11 libxext libxkbcommon libxpresent libxrandr
libxss libxv luajit mesa mujs libpipewire rubberband sdl2
openal uchardet vapoursynth vulkan-icd-loader wayland zlib
)
makedepends=(
git meson python-docutils ladspa wayland-protocols
vulkan-headers
)
optdepends=('yt-dlp: for video-sharing websites playback')
provides=("${_upstreampkg}=${pkgver}" 'libmpv.so')
conflicts=("${_upstreampkg}")
replaces=("${_upstreampkg}")
options=('!emptydirs')
source=(
"${_upstreampkg}-${pkgver}.tar.gz::https://github.com/mpv-player/${_upstreampkg}/archive/v${pkgver}/${_upstreampkg}-${pkgver}.tar.gz"
# Kwiboo + Langdale: wire AV_HWDEVICE_TYPE_V4L2REQUEST through mpv's drmprime VO hwdec
'0001-meson-add-detection-logic-for-v4l2request-support.patch'
'0002-vo-hwdec-drmprime-add-separate-hwdecs-for-v4l2reques.patch'
'0001-vo_dmabuf_wayland-explicit-cache-sync-on-import-fd.patch'
)
sha256sums=(
'ee21092a5ee427353392360929dc64645c54479aefdb5babc5cfbb5fad626209'
'302dc2a89f1dc3efb8fd7a035fd70bee4b343acf01fe78b294e8aeca0b221dc2'
'b6a71030017867c84692a8a198f9db143ce8394ab5a6ed3c705ea2d715d525fc'
'6c929bea7636b8d81b63a1275ba1d8a471fe2f249fc23509043ace6cf9b076a7'
)
prepare() {
cd "${_upstreampkg}-${pkgver}"
# Kwiboo + Langdale (2024-08): wire AV_HWDEVICE_TYPE_V4L2REQUEST through
# the drmprime VO hwdec so --hwdec=v4l2request engages on dmabuf-wayland.
# Without these, mpv reports "Could not create device" because the matcher
# filters by lavc_device type and v4l2request != AV_HWDEVICE_TYPE_DRM.
patch -p1 < "${srcdir}/0001-meson-add-detection-logic-for-v4l2request-support.patch"
patch -p1 < "${srcdir}/0002-vo-hwdec-drmprime-add-separate-hwdecs-for-v4l2reques.patch"
# iter1 of dmabuf-modifier-triage — explicit DMA_BUF_IOCTL_SYNC on import
# fds in vaapi_dmabuf_importer + drmprime_dmabuf_importer. Workaround for
# missing implicit-fence on V4L2 stateless CAPTURE buffers; root cause is
# being addressed upstream via the vb2_dma_resv RFC.
patch -p1 < "${srcdir}/0001-vo_dmabuf_wayland-explicit-cache-sync-on-import-fd.patch"
}
build() {
local _meson_options=(
--auto-features auto
-Dv4l2request=enabled
-Dlibmpv=true
-Dgl-x11=enabled
-Dcaca=disabled
-Dcdda=enabled
-Ddrm=enabled
-Ddvbin=enabled
-Ddvdnav=enabled
-Dlibarchive=enabled
-Dopenal=enabled
-Dsdl2-audio=enabled
-Dsdl2-video=enabled
-Dsdl2-gamepad=enabled
)
arch-meson "${_upstreampkg}-${pkgver}" build "${_meson_options[@]}"
meson compile -C build
}
package() {
meson install -C build --destdir "${pkgdir}"
# Drop private linkage entries (they only matter for static linking).
sed -i -e '/Requires.private/d' -e '/Libs.private/d' \
"${pkgdir}/usr/lib/pkgconfig/mpv.pc"
install -m0644 "${_upstreampkg}-${pkgver}/DOCS/encoding.rst" \
"${_upstreampkg}-${pkgver}/DOCS/tech-overview.txt" \
-t "${pkgdir}/usr/share/doc/mpv/"
install -m0644 "${_upstreampkg}-${pkgver}"/TOOLS/{umpv,mpv_identify.sh,stats-conv.py,idet.sh,lua/*} \
-D -t "${pkgdir}/usr/share/mpv/scripts/"
}
+40
@@ -0,0 +1,40 @@
# mpv-fourier
mpv with the fourier-umbrella patches.
## Why
The dmabuf-modifier-triage campaign isolated the green-frames bug on ohm
to mpv's `vo_dmabuf_wayland.c` plane-semantics handling for V4L2 stateless
decoder dmabufs. mpv currently emits a `zwp_linux_buffer_params_v1`
message that mixes per-plane fds (V4L2 MPLANE export) with a single-
allocation offset for plane 1, causing KWin to read the UV chroma plane
past-EOF on the UV-plane fd and render solid dark green.
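Why zeroed video data shows up as green rather than black can be checked with a little arithmetic. The cache-sync patch description in this package gives RGB(0, 135, 0) for an all-zero BT.601 limited-range sample. The sketch below reproduces that figure, assuming the common integer approximation of the conversion (coefficients 298/409/100/208/516). The helper names `clamp8` and `yuv_to_rgb` are made up for illustration, and a real compositor shader may round differently.

```shell
#!/bin/sh
# Illustrative only: integer BT.601 limited-range YUV -> RGB conversion.
# clamp8 and yuv_to_rgb are hypothetical helpers, not mpv or KWin code.

# Clamp an integer to the 0..255 range of an 8-bit channel.
clamp8() (
    v=$1
    if [ "$v" -lt 0 ]; then v=0; fi
    if [ "$v" -gt 255 ]; then v=255; fi
    echo "$v"
)

# Convert one Y/U/V sample to "R G B" (space separated).
yuv_to_rgb() (
    c=$(( $1 - 16 ))    # luma, offset by the limited-range black level
    d=$(( $2 - 128 ))   # U (Cb), centred on zero
    e=$(( $3 - 128 ))   # V (Cr), centred on zero
    r=$(( (298 * c + 409 * e + 128) / 256 ))
    g=$(( (298 * c - 100 * d - 208 * e + 128) / 256 ))
    b=$(( (298 * c + 516 * d + 128) / 256 ))
    echo "$(clamp8 "$r") $(clamp8 "$g") $(clamp8 "$b")"
)

yuv_to_rgb 0 0 0        # all-zero sample: prints "0 135 0"
yuv_to_rgb 16 128 128   # limited-range black: prints "0 0 0"
```

With Y, U and V all zero, the red and blue terms go strongly negative and clamp to 0 while the green term stays positive, which is why an unwritten or zero-filled buffer renders as solid dark green.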
This package is the delivery vehicle for the fix. The PKGBUILD applies
iter1 of the triage campaign, an explicit `DMA_BUF_IOCTL_SYNC` cache-sync
workaround in `vo_dmabuf_wayland.c`, on top of the two v4l2request hwdec
patches.
## Tracker
- Bug: [marfrit/dmabuf-modifier-triage#1](https://git.reauktion.de/marfrit/dmabuf-modifier-triage/issues/1)
- Symptom-tracker: [marfrit/libva-multiplanar#1](https://git.reauktion.de/marfrit/libva-multiplanar/issues/1)
- Acceptance criterion: `~/src/dmabuf-modifier-triage/screenshots/`
## Status
- 2026-05-08: scaffold landed. Built vanilla mpv 0.41.0 with no
fourier patches applied (patch slot empty in `prepare()`), pkgrel=1
pinned to mpv release v0.41.0.
- Current PKGBUILD: pkgrel=10, with three patches listed in `source=()`
and applied in `prepare()` (two v4l2request hwdec patches plus the
iter1 cache-sync workaround).
To add a further patch, bump pkgrel, add the patch to `source=()` and
`sha256sums=()`, and add a matching `patch -p1` line in `prepare()`.
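Every patch listed in `source=()` needs a `sha256sums=()` entry at the same array position. A minimal sketch of producing that entry by hand follows. `sum_entry` is a hypothetical helper, and the temporary file stands in for a real patch. In practice `makepkg -g` or `updpkgsums` (from pacman-contrib) can generate the whole array.

```shell
#!/bin/sh
# Illustrative only: sum_entry is a made-up helper, not part of makepkg.
# It prints the sha256 of one file, quoted the way sha256sums=( ... )
# entries are written in a PKGBUILD.
sum_entry() (
    digest=$(sha256sum "$1" | cut -d' ' -f1)
    printf "'%s'\n" "$digest"
)

# Demo on a throwaway file standing in for a real patch.
demo=$(mktemp)
printf 'example patch body\n' > "$demo"
sum_entry "$demo"
rm -f "$demo"
```

The printed line goes into `sha256sums=()` in the same position the patch occupies in `source=()`. A mismatch in order makes `makepkg` fail its integrity check.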
## Hosts that benefit
Only hosts that exercise the V4L2 stateless dmabuf-wayland path —
ohm (RK3566 + hantro G1) today, fresnel (RK3399 + hantro + rkvdec)
once that campaign reaches its mpv-test phase. Other hosts on
[marfrit] (boltzmann, hertz desktop) won't see any difference, as
the fix is a no-op for non-V4L2-stateless dmabuf paths.
@@ -0,0 +1,43 @@
diff --git a/src/opengl/qopengltextureglyphcache.cpp b/src/opengl/qopengltextureglyphcache.cpp
index 0bab710b..46bad551 100644
--- a/src/opengl/qopengltextureglyphcache.cpp
+++ b/src/opengl/qopengltextureglyphcache.cpp
@@ -108,14 +108,20 @@ void QOpenGLTextureGlyphCache::createTextureData(int width, int height)
for (int i = 0; i < data.size(); ++i)
data[i] = 0;
#if !QT_CONFIG(opengles2)
const GLint internalFormat = isCoreProfile() ? GL_R8 : GL_ALPHA;
const GLenum format = isCoreProfile() ? GL_RED : GL_ALPHA;
#else
- const GLint internalFormat = GL_ALPHA;
- const GLenum format = GL_ALPHA;
+ // qt6-base-fourier: OpenGL ES 3.x removed GL_ALPHA from
+ // glTexImage2D's valid internalFormat list (ES 3 spec
+ // section 3.8.3, table 3.13). Pick GL_R8 + matching format
+ // when the runtime context is ES 3 or newer; only legacy
+ // ES 2 contexts fall through to the historic GL_ALPHA path.
+ const bool useR8 = ctx->format().majorVersion() >= 3;
+ const GLint internalFormat = useR8 ? GL_R8 : GL_ALPHA;
+ const GLenum format = useR8 ? GL_RED : GL_ALPHA;
#endif
funcs->glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, width, height, 0, format, GL_UNSIGNED_BYTE, &data[0]);
}
funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
@@ -210,13 +216,14 @@ static void load_glyph_image_to_texture(QOpenGLContext *ctx,
} else {
// The scanlines in image are 32-bit aligned, even for mono or 8-bit formats. This
// is good because it matches the default of 4 bytes for GL_UNPACK_ALIGNMENT.
#if !QT_CONFIG(opengles2)
const GLenum format = isCoreProfile() ? GL_RED : GL_ALPHA;
#else
- const GLenum format = GL_ALPHA;
+ // qt6-base-fourier: ES 3 path — see createTextureData() above.
+ const GLenum format = ctx->format().majorVersion() >= 3 ? GL_RED : GL_ALPHA;
#endif
funcs->glTexSubImage2D(GL_TEXTURE_2D, 0, tx, ty, imgWidth, imgHeight, format, GL_UNSIGNED_BYTE, img.constBits());
}
}
static void load_glyph_image_region_to_texture(QOpenGLContext *ctx,
@@ -0,0 +1,33 @@
diff --git a/src/gui/rhi/qrhigles2.cpp b/src/gui/rhi/qrhigles2.cpp
index 5208bd4d..a2625949 100644
--- a/src/gui/rhi/qrhigles2.cpp
+++ b/src/gui/rhi/qrhigles2.cpp
@@ -1367,18 +1367,25 @@ static inline void toGlTextureFormat(QRhiTexture::Format format, const QRhiGles2
case QRhiTexture::RG8:
*glintformat = GL_RG8;
*glsizedintformat = *glintformat;
*glformat = GL_RG;
*gltype = GL_UNSIGNED_BYTE;
break;
- case QRhiTexture::RED_OR_ALPHA8:
- *glintformat = caps.coreProfile ? GL_R8 : GL_ALPHA;
+ case QRhiTexture::RED_OR_ALPHA8: {
+ // qt6-base-fourier: GL_ALPHA was removed from the valid
+ // glTexImage2D internalFormat list in OpenGL ES 3.0 (ES 3
+ // spec section 3.8.3). Pick GL_R8 + GL_RED for desktop GL
+ // Core profile and for any ES 3 or newer context; only
+ // legacy ES 2 contexts retain GL_ALPHA.
+ const bool useR8 = caps.coreProfile || (caps.gles && caps.ctxMajor >= 3);
+ *glintformat = useR8 ? GL_R8 : GL_ALPHA;
*glsizedintformat = *glintformat;
- *glformat = caps.coreProfile ? GL_RED : GL_ALPHA;
+ *glformat = useR8 ? GL_RED : GL_ALPHA;
*gltype = GL_UNSIGNED_BYTE;
break;
+ }
case QRhiTexture::RGBA16F:
*glintformat = GL_RGBA16F;
*glsizedintformat = *glintformat;
*glformat = GL_RGBA;
*gltype = GL_HALF_FLOAT;
break;
@@ -0,0 +1,51 @@
diff --git a/src/opengl/qopengltextureuploader.cpp b/src/opengl/qopengltextureuploader.cpp
index 3dca0a43..5d650523 100644
--- a/src/opengl/qopengltextureuploader.cpp
+++ b/src/opengl/qopengltextureuploader.cpp
@@ -246,17 +246,24 @@ qsizetype QOpenGLTextureUploader::textureImage(GLenum target, const QImage &imag
// Always needs conversion
break;
} else if (options & UseRedForAlphaAndLuminanceBindOption) {
externalFormat = internalFormat = GL_RED;
pixelType = GL_UNSIGNED_BYTE;
targetFormat = image.format();
- } else if (context->isOpenGLES() || context->format().profile() != QSurfaceFormat::CoreProfile) {
+ } else if ((context->isOpenGLES() && context->format().majorVersion() < 3)
+ || (!context->isOpenGLES() && context->format().profile() != QSurfaceFormat::CoreProfile
+ && !funcs->hasOpenGLExtension(QOpenGLExtensions::TextureSwizzle))) {
+ // qt6-base-fourier: only true ES 2 (or pre-3.3 desktop
+ // compat without the swizzle extension) reaches the
+ // GL_ALPHA fallback. ES 3+ falls through to the swizzle
+ // path below — TextureSwizzle is core in ES 3.0.
externalFormat = internalFormat = GL_ALPHA;
pixelType = GL_UNSIGNED_BYTE;
targetFormat = image.format();
- } else if (funcs->hasOpenGLExtension(QOpenGLExtensions::TextureSwizzle)) {
+ } else if (funcs->hasOpenGLExtension(QOpenGLExtensions::TextureSwizzle)
+ || (context->isOpenGLES() && context->format().majorVersion() >= 3)) {
funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_R, GL_ALPHA);
funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_G, GL_ZERO);
funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_B, GL_ZERO);
funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_A, GL_ZERO);
externalFormat = internalFormat = GL_RED;
pixelType = GL_UNSIGNED_BYTE;
@@ -268,13 +275,18 @@ qsizetype QOpenGLTextureUploader::textureImage(GLenum target, const QImage &imag
// Always needs conversion
break;
} else if (options & UseRedForAlphaAndLuminanceBindOption) {
externalFormat = internalFormat = GL_RED;
pixelType = GL_UNSIGNED_BYTE;
targetFormat = image.format();
- } else if (context->isOpenGLES() || context->format().profile() != QSurfaceFormat::CoreProfile) {
+ } else if ((context->isOpenGLES() && context->format().majorVersion() < 3)
+ || (!context->isOpenGLES() && context->format().profile() != QSurfaceFormat::CoreProfile
+ && !funcs->hasOpenGLExtension(QOpenGLExtensions::TextureSwizzle))) {
+ // qt6-base-fourier: same reasoning as the Format_Alpha8
+ // branch above — GL_LUMINANCE is also gone from ES 3
+ // glTexImage2D internalFormats.
externalFormat = internalFormat = GL_LUMINANCE;
pixelType = GL_UNSIGNED_BYTE;
targetFormat = image.format();
} else if (funcs->hasOpenGLExtension(QOpenGLExtensions::TextureSwizzle)) {
funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_R, GL_RED);
funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_G, GL_RED);
+199
@@ -0,0 +1,199 @@
# Maintainer: Markus Fritsche <mfritsche@reauktion.de>
# Upstream maintainer: Antonio Rojas <arojas@archlinux.org>
# Upstream contributor: Andrea Scarpino <andrea@archlinux.org>
#
# qt6-base-fourier: Qt 6 base with the GL_ALPHA-on-ES-3 texture format
# fix. Adds three small runtime checks so that
# QOpenGLTextureGlyphCache, QRhiGles2 (RED_OR_ALPHA8 path) and
# QOpenGLTextureUploader (Format_Alpha8 / Format_Grayscale8) pick
# GL_R8 instead of GL_ALPHA when the live OpenGL context advertises
# ES 3.x or newer. Without those patches Qt 6 emits
# `glTexImage2D(internalFormat=GL_ALPHA)` on Mali-class GLES3
# hardware (mesa panfrost / panthor) — every call returns
# GL_INVALID_VALUE, every dependent glTexSubImage2D errors at level
# 0, and KWin's compositor frame-callback path stalls so badly that
# every Wayland video client deadlocks. Discovered while validating
# chromium-fourier patch 3/3 (NV12 zero-copy) on ohm (PineTab2 /
# RK3566 / hantro mainline), when the chrome, VLC and mpv stalls
# turned out to share a Qt root cause. See
# ../chromium-fourier/KWIN_PIVOT.md
# for the diagnosis story and the upstream-targetable patch set.
# pkgbase stays as qt6-base so $_pkgfn (= ${pkgbase/6-/} = "qtbase")
# resolves correctly. The "-fourier" suffix lives only in the
# directory name and the commit history; epoch=1 gives our local
# build strict precedence over upstream pkgrel=N until upstream lands
# the GL_R8/ES3 fix and we can drop the epoch.
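# Sketch of the expansion mentioned above (plain bash pattern
# substitution; illustrative only, makepkg does not need these lines,
# and _pkgfn is in any case assigned literally further down):
#   pkgbase=qt6-base
#   echo "${pkgbase/6-/}"   # prints: qtbase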
pkgbase=qt6-base
pkgname=(qt6-base-fourier
qt6-xcb-private-headers-fourier)
_pkgver=6.11.1
pkgver=${_pkgver/-/}
pkgrel=1
epoch=1
arch=(aarch64 x86_64)
url='https://www.qt.io'
license=(GPL-3.0-only
LGPL-3.0-only
LicenseRef-Qt-Commercial
Qt-GPL-exception-1.0)
pkgdesc='A cross-platform application and UI framework'
depends=(brotli
dbus
double-conversion
fontconfig
freetype2
glib2
glibc
harfbuzz
icu
krb5
libb2
libcups
libdrm
libgcc
libgl
libice
libinput
libjpeg-turbo
libpng
libproxy
libsm
libstdc++
liburing
libx11
libxcb
libxkbcommon
libxkbcommon-x11
md4c
mesa
mtdev
openssl
pcre2
shared-mime-info
sqlite
systemd-libs
tslib
wayland
xcb-util-cursor
xcb-util-image
xcb-util-keysyms
xcb-util-renderutil
xcb-util-wm
xdg-utils
zlib
zstd)
makedepends=(alsa-lib
cmake
cups
freetds
git
gst-plugins-base-libs
gtk3
jemalloc
libfbclient
libpulse
mariadb-libs
ninja
postgresql
renderdoc
unixodbc
vulkan-headers
xmlstarlet)
optdepends=('freetds: MS SQL driver'
'gdk-pixbuf2: GTK platform plugin'
'gtk3: GTK platform plugin'
'libfbclient: Firebird/iBase driver'
'mariadb-libs: MariaDB driver'
'pango: GTK platform plugin'
'perl: for syncqt'
'postgresql-libs: PostgreSQL driver'
'unixodbc: ODBC driver')
groups=(qt6)
_pkgfn=qtbase
source=(git+https://code.qt.io/qt/$_pkgfn#tag=v$_pkgver
qt6-base-cflags.patch
qt6-base-nostrip.patch
0001-qopengltextureglyphcache-pick-GL_R8-on-ES3.patch
0002-qrhigles2-RED_OR_ALPHA8-pick-GL_R8-on-ES3.patch
0003-qopengltextureuploader-pick-GL_R8-on-ES3.patch)
sha256sums=('2eafe504fae873d20f206b5661e2e10506879455cb2d370f42c5bb72ccf7a8a1'
'5411edbe215c24b30448fac69bd0ba7c882f545e8cf05027b2b6e2227abc5e78'
'4b93f6a79039e676a56f9d6990a324a64a36f143916065973ded89adc621e094'
'SKIP'
'SKIP'
'SKIP')
prepare() {
patch -d $_pkgfn -p1 < qt6-base-cflags.patch # Use system CFLAGS
patch -d $_pkgfn -p1 < qt6-base-nostrip.patch # Don't strip binaries with qmake
# 8b54513cdcf6 (qdbus crash fix) cherry-pick removed: landed upstream
# in 6.11.1. Re-add if qdbus regressions re-surface.
# qt6-base-fourier — three small runtime-checks that pick GL_R8 over
# GL_ALPHA when the live GL context is ES 3 or newer. See the
# individual patch headers for per-site diagnosis. The
# chromium-fourier KWIN_PIVOT.md writeup carries the discovery thread.
patch -d $_pkgfn -p1 < 0001-qopengltextureglyphcache-pick-GL_R8-on-ES3.patch
patch -d $_pkgfn -p1 < 0002-qrhigles2-RED_OR_ALPHA8-pick-GL_R8-on-ES3.patch
patch -d $_pkgfn -p1 < 0003-qopengltextureuploader-pick-GL_R8-on-ES3.patch
}
build() {
# Set no_direct_extern_access based on architecture
if [[ $CARCH == "aarch64" || $CARCH == "riscv64" ]]; then
_no_direct_extern_access=OFF
else
_no_direct_extern_access=ON
fi
cmake -B build -S $_pkgfn -G Ninja \
-DCMAKE_INSTALL_PREFIX=/usr \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DINSTALL_BINDIR=lib/qt6/bin \
-DINSTALL_PUBLICBINDIR=bin \
-DINSTALL_LIBEXECDIR=lib/qt6 \
-DINSTALL_DOCDIR=share/doc/qt6 \
-DINSTALL_ARCHDATADIR=lib/qt6 \
-DINSTALL_DATADIR=share/qt6 \
-DINSTALL_INCLUDEDIR=include/qt6 \
-DINSTALL_MKSPECSDIR=lib/qt6/mkspecs \
-DINSTALL_EXAMPLESDIR=share/doc/qt6/examples \
-DFEATURE_journald=ON \
-DFEATURE_libproxy=ON \
-DFEATURE_openssl_linked=ON \
-DFEATURE_system_sqlite=ON \
-DFEATURE_system_xcb_xinput=ON \
-DFEATURE_no_direct_extern_access=$_no_direct_extern_access \
-DFEATURE_mimetype_database=OFF \
-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON \
-DCMAKE_MESSAGE_LOG_LEVEL=STATUS
cmake --build build
}
package_qt6-base-fourier() {
pkgdesc='A cross-platform application and UI framework'
provides=(qt6-base)
conflicts=(qt6-base)
replaces=(qt6-base)
depends+=(qt6-translations)
DESTDIR="$pkgdir" cmake --install build
install -Dm644 $_pkgfn/LICENSES/* -t "$pkgdir"/usr/share/licenses/$pkgbase
}
package_qt6-xcb-private-headers-fourier() {
pkgdesc='Private headers for Qt6 Xcb'
provides=(qt6-xcb-private-headers)
conflicts=(qt6-xcb-private-headers)
replaces=(qt6-xcb-private-headers)
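# Include the epoch: pacman treats a version without one as epoch 0,
# so a bare "=$pkgver" would never match the 1:$pkgver build above.
depends=("qt6-base-fourier=$epoch:$pkgver")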
optdepends=()
groups=()
cd $_pkgfn
install -d -m755 "$pkgdir"/usr/include/qt6xcb-private/gl_integrations
cp -r src/plugins/platforms/xcb/*.h "$pkgdir"/usr/include/qt6xcb-private/
cp -r src/plugins/platforms/xcb/gl_integrations/*.h "$pkgdir"/usr/include/qt6xcb-private/gl_integrations/
}
@@ -0,0 +1,46 @@
diff --git a/mkspecs/common/g++-unix.conf b/mkspecs/common/g++-unix.conf
index a493cd5984..41342f5020 100644
--- a/mkspecs/common/g++-unix.conf
+++ b/mkspecs/common/g++-unix.conf
@@ -10,5 +10,6 @@
include(g++-base.conf)
-QMAKE_LFLAGS_RELEASE += -Wl,-O1
+SYSTEM_LDFLAGS = $$(LDFLAGS)
+!isEmpty(SYSTEM_LDFLAGS) { eval(QMAKE_LFLAGS_RELEASE += $$(LDFLAGS)) } else { QMAKE_LFLAGS_RELEASE += -Wl,-O1 }
QMAKE_LFLAGS_NOUNDEF += -Wl,--no-undefined
diff --git a/mkspecs/common/gcc-base.conf b/mkspecs/common/gcc-base.conf
index 1f919d270a..7ef6046326 100644
--- a/mkspecs/common/gcc-base.conf
+++ b/mkspecs/common/gcc-base.conf
@@ -40,9 +40,11 @@ QMAKE_CFLAGS_OPTIMIZE_SIZE = -Os
QMAKE_CFLAGS_DEPS += -M
QMAKE_CFLAGS_WARN_ON += -Wall -Wextra
QMAKE_CFLAGS_WARN_OFF += -w
-QMAKE_CFLAGS_RELEASE += $$QMAKE_CFLAGS_OPTIMIZE
-QMAKE_CFLAGS_RELEASE_WITH_DEBUGINFO += $$QMAKE_CFLAGS_OPTIMIZE -g
-QMAKE_CFLAGS_DEBUG += -g
+SYSTEM_CFLAGS = $$(CFLAGS)
+SYSTEM_DEBUG_CFLAGS = $$(DEBUG_CFLAGS)
+!isEmpty(SYSTEM_CFLAGS) { eval(QMAKE_CFLAGS_RELEASE += $$(CFLAGS)) } else { QMAKE_CFLAGS_RELEASE += $$QMAKE_CFLAGS_OPTIMIZE }
+!isEmpty(SYSTEM_CFLAGS) { eval(QMAKE_CFLAGS_RELEASE_WITH_DEBUGINFO += -g $$(CFLAGS)) } else { QMAKE_CFLAGS_RELEASE_WITH_DEBUGINFO += $$QMAKE_CFLAGS_OPTIMIZE -g }
+!isEmpty(SYSTEM_DEBUG_CFLAGS) { eval(QMAKE_CFLAGS_DEBUG += $$(DEBUG_CFLAGS)) } else { QMAKE_CFLAGS_DEBUG += -g }
QMAKE_CFLAGS_SHLIB += $$QMAKE_CFLAGS_PIC
QMAKE_CFLAGS_STATIC_LIB += $$QMAKE_CFLAGS_PIC
QMAKE_CFLAGS_APP += $$QMAKE_CFLAGS_PIC
@@ -59,9 +61,11 @@ QMAKE_CXXFLAGS += $$QMAKE_CFLAGS
QMAKE_CXXFLAGS_DEPS += $$QMAKE_CFLAGS_DEPS
QMAKE_CXXFLAGS_WARN_ON += $$QMAKE_CFLAGS_WARN_ON
QMAKE_CXXFLAGS_WARN_OFF += $$QMAKE_CFLAGS_WARN_OFF
-QMAKE_CXXFLAGS_RELEASE += $$QMAKE_CFLAGS_RELEASE
-QMAKE_CXXFLAGS_RELEASE_WITH_DEBUGINFO += $$QMAKE_CFLAGS_RELEASE_WITH_DEBUGINFO
-QMAKE_CXXFLAGS_DEBUG += $$QMAKE_CFLAGS_DEBUG
+SYSTEM_CXXFLAGS = $$(CXXFLAGS)
+SYSTEM_DEBUG_CXXFLAGS = $$(DEBUG_CXXFLAGS)
+!isEmpty(SYSTEM_CXXFLAGS) { eval(QMAKE_CXXFLAGS_RELEASE += $$(CXXFLAGS)) } else { QMAKE_CXXFLAGS_RELEASE += $$QMAKE_CFLAGS_OPTIMIZE }
+!isEmpty(SYSTEM_CXXFLAGS) { eval(QMAKE_CXXFLAGS_RELEASE_WITH_DEBUGINFO += -g $$(CXXFLAGS)) } else { QMAKE_CXXFLAGS_RELEASE_WITH_DEBUGINFO += $$QMAKE_CFLAGS_OPTIMIZE -g }
+!isEmpty(SYSTEM_DEBUG_CXXFLAGS) { eval(QMAKE_CXXFLAGS_DEBUG += $$(DEBUG_CXXFLAGS)) } else { QMAKE_CXXFLAGS_DEBUG += -g }
QMAKE_CXXFLAGS_SHLIB += $$QMAKE_CFLAGS_SHLIB
QMAKE_CXXFLAGS_STATIC_LIB += $$QMAKE_CFLAGS_STATIC_LIB
QMAKE_CXXFLAGS_APP += $$QMAKE_CFLAGS_APP
@@ -0,0 +1,13 @@
diff --git a/mkspecs/common/gcc-base.conf b/mkspecs/common/gcc-base.conf
index 99d77156fd..fc840fe9f6 100644
--- a/mkspecs/common/gcc-base.conf
+++ b/mkspecs/common/gcc-base.conf
@@ -31,6 +31,8 @@
# you can use the manual test in tests/manual/mkspecs.
#
+CONFIG += nostrip
+
QMAKE_CFLAGS_OPTIMIZE = -O2
QMAKE_CFLAGS_OPTIMIZE_FULL = -O3
QMAKE_CFLAGS_OPTIMIZE_DEBUG = -Og
@@ -0,0 +1,110 @@
# Maintainer: Markus Fritsche <mfritsche@reauktion.de>
#
# vulkan-panfrost — Mesa's panvk Vulkan driver, packaged as a
# standalone ICD on Arch Linux ARM. Stock ALARM `mesa` does not build
# with -Dvulkan-drivers=panfrost, so panvk doesn't ship.
#
# Targets both Mali kernel drivers:
# - panfrost (Bifrost: Mali-G31 / G52 / G57) on RK3566 / RK3568 etc.
# - panthor (Valhall: Mali-G610+) on RK3588 / RK3588S etc.
#
# panvk on Mali-G52 r1 (Bifrost-gen2) currently returns
# VK_ERROR_INCOMPATIBLE_DRIVER on probe — that's an upstream mesa
# issue, not a packaging one. The driver lights up cleanly on
# Mali-G610 Valhall (RK3588) which is the immediate target. Install on
# Bifrost boards anyway; future mesa releases may unblock G52 r1
# without re-packaging.
pkgname=vulkan-panfrost-fourier
pkgver=26.0.5
pkgrel=2
epoch=1
pkgdesc='Mesa Vulkan ICD for Mali Bifrost / Valhall (panvk)'
arch=('aarch64')
url='https://gitlab.freedesktop.org/mesa/mesa'
license=('MIT')
depends=(
vulkan-icd-loader
libdrm
zlib
zstd
expat
libelf
wayland
)
makedepends=(
meson
ninja
python-mako
glslang
libxrandr
libxshmfence
libxxf86vm
vulkan-headers
wayland-protocols
rust-bindgen
rust
llvm
llvm-libs
libclc
spirv-tools
spirv-llvm-translator
)
provides=('vulkan-driver')
options=('!lto')
source=("https://archive.mesa3d.org/mesa-${pkgver}.tar.xz")
sha256sums=('SKIP')
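# SKIP leaves the release tarball unverified. A possible way to pin it
# (a suggestion, not part of the original packaging): run `makepkg -g`
# in this directory and paste the printed sha256sums=(...) line over
# the SKIP entry above.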
build() {
cd "${srcdir}/mesa-${pkgver}"
# Strip mesa down to just the panvk Vulkan driver — no Gallium
# drivers, no GL/GLES, no GLX, no EGL, no VAAPI/VDPAU. The host's
# stock `mesa` package keeps providing all of those; this PKGBUILD
# only adds the missing Vulkan ICD next to it.
#
# --auto-features=disabled means features have to be opt-in. Avoids
# mesa's default "enable everything we can find headers for" pulling
# in xlib-lease / gallium-va / etc. that we don't want here.
meson setup . build \
--prefix=/usr \
--libexecdir=lib \
--sbindir=bin \
--buildtype=release \
--auto-features=disabled \
--wrap-mode=nodownload \
-Db_lto=false \
-Db_pie=true \
-Dvulkan-drivers=panfrost \
-Dgallium-drivers= \
-Dplatforms=wayland \
-Dshared-glapi=disabled \
-Dgallium-rusticl=false \
-Dmicrosoft-clc=disabled \
-Dvideo-codecs= \
-Dllvm=enabled \
-Dshared-llvm=enabled \
-Dspirv-tools=enabled \
-Dvulkan-icd-dir=/usr/share/vulkan/icd.d
meson compile -C build
}
package() {
cd "${srcdir}/mesa-${pkgver}"
# Mesa's install rules drop a lot of files we don't want in this
# narrow package — stage to a temp dir, then cherry-pick.
DESTDIR="${srcdir}/staging" meson install -C build --no-rebuild
# The ICD shared object
install -Dm755 "${srcdir}/staging/usr/lib/libvulkan_panfrost.so" \
"${pkgdir}/usr/lib/libvulkan_panfrost.so"
# The Vulkan loader manifest. Mesa installs it as
# `panfrost_icd.<arch>.json` (the userspace driver name). The loader
# scans every *.json manifest in icd.d, so no rename is needed.
install -dm755 "${pkgdir}/usr/share/vulkan/icd.d"
cp -av "${srcdir}/staging/usr/share/vulkan/icd.d/"panfrost_icd*.json \
"${pkgdir}/usr/share/vulkan/icd.d/"
}
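# A possible post-install smoke test, assuming the separate
# vulkan-tools package is installed (it is not a dependency here):
#   ls /usr/share/vulkan/icd.d/panfrost_icd*.json   # manifest present?
#   vulkaninfo --summary                            # devices the loader found
# On Mali-G52 r1 the probe is expected to fail for now, as noted in
# the header comment.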