daedalus_queue_init was wiring both src_vq->lock and dst_vq->lock to
ctx->dev->m2m_lock — a device-wide mutex. That serialises every
vb2 ioctl (S_FMT, REQBUFS, QBUF, DQBUF, STREAMON, ...) across ALL
concurrent clients of /dev/video0. For a single-client consumer
like the test_m2m_* tools it doesn't matter; for Firefox, which
spawns separate content + RDD + GPU processes that each open
/dev/video0 and run libva probe simultaneously, the contention
showed up as EBUSY from one libva session's S_FMT(OUTPUT_MPLANE)
when another session was mid-streamon on the same device.
Observable on higgs (Pi CM5):
    $ MOZ_VA_API_ENABLED=1 LIBVA_DRIVER_NAME=v4l2_request firefox
    ...
    v4l2-request: phase 8.10: opened daedalus_v4l2 at video_fd=32 ...
    v4l2-request: cap_pool_init: 24 slots ready
    v4l2-request: Unable to set format for type 10: Device or resource busy
After this fix, each open() gets its own ctx->vb_mutex and the
per-context vb2_queue locks are independent — Firefox's multi-
process VAAPI clients no longer fight each other. YouTube
playback on higgs runs through daedalus at ~230 fps sustained
(640x368, libavcodec dlopen path), 7× headroom over the 30fps
target.
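The difference between the two wirings can be modelled in userspace with pthread mutexes. This is only an analogy under stated assumptions: `model_dev`, `model_ctx` and `model_contention` are hypothetical names, not driver code, and the EBUSY returned by `pthread_mutex_trylock` merely signals contention in the model rather than reproducing the ioctl path.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the driver. */
struct model_dev {
    pthread_mutex_t m2m_lock;    /* one lock shared by every client */
};

struct model_ctx {
    struct model_dev *dev;
    pthread_mutex_t vb_mutex;    /* one lock per open() */
    pthread_mutex_t *queue_lock; /* what this client's queues are wired to */
};

/* per_ctx == 0 models the old wiring, per_ctx != 0 the new one. */
static void model_ctx_init(struct model_ctx *ctx, struct model_dev *dev,
                           int per_ctx)
{
    ctx->dev = dev;
    pthread_mutex_init(&ctx->vb_mutex, NULL);
    ctx->queue_lock = per_ctx ? &ctx->vb_mutex : &dev->m2m_lock;
}

/*
 * Client a holds its queue lock (as if it were mid-ioctl) while client b
 * tries to take its own. Returns 0 if b got the lock, EBUSY if it could not.
 */
static int model_contention(int per_ctx)
{
    struct model_dev dev;
    struct model_ctx a, b;
    int ret;

    pthread_mutex_init(&dev.m2m_lock, NULL);
    model_ctx_init(&a, &dev, per_ctx);
    model_ctx_init(&b, &dev, per_ctx);

    pthread_mutex_lock(a.queue_lock);
    ret = pthread_mutex_trylock(b.queue_lock);
    if (ret == 0)
        pthread_mutex_unlock(b.queue_lock);
    pthread_mutex_unlock(a.queue_lock);

    pthread_mutex_destroy(&a.vb_mutex);
    pthread_mutex_destroy(&b.vb_mutex);
    pthread_mutex_destroy(&dev.m2m_lock);
    return ret;
}
```

In this model `model_contention(0)` reports EBUSY because both clients point at the same mutex, while `model_contention(1)` returns 0 because each client owns its lock.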
cedrus / rkvdec / hantro all use the per-ctx vb mutex pattern
for the same reason. This mirrors them.
Lifecycle:
- mutex_init in daedalus_open (right after the kzalloc that
  creates ctx, before v4l2_fh_init).
- mutex_destroy in daedalus_release (after v4l2_fh_exit, before
  kfree), and in the err_ctrl unwind path in daedalus_open.
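The ordering in the list above follows the usual init-then-unwind-in-reverse pattern. A minimal userspace sketch of that pattern, assuming pthread mutexes in place of kernel mutexes and hypothetical `model_*` names (the real daedalus_open and daedalus_release are not reproduced here):

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the per-open driver context. */
struct model_ctx {
    pthread_mutex_t vb_mutex;
    int ctrls_ready;
};

/* Stand-in for the control setup step; fails on request. */
static int model_ctrl_setup(struct model_ctx *ctx, int fail)
{
    if (fail)
        return -1;
    ctx->ctrls_ready = 1;
    return 0;
}

/* Returns a ready context, or NULL after unwinding in reverse order. */
static struct model_ctx *model_open(int fail_ctrl)
{
    struct model_ctx *ctx = calloc(1, sizeof(*ctx));

    if (!ctx)
        return NULL;
    /* Initialise the lock right after allocation, before later setup. */
    if (pthread_mutex_init(&ctx->vb_mutex, NULL) != 0)
        goto err_free;
    if (model_ctrl_setup(ctx, fail_ctrl) != 0)
        goto err_ctrl;
    return ctx;

err_ctrl:
    /* A later step failed: destroy the lock before freeing its memory. */
    pthread_mutex_destroy(&ctx->vb_mutex);
err_free:
    free(ctx);
    return NULL;
}

/* Normal teardown: destroy the lock, then free the context. */
static void model_release(struct model_ctx *ctx)
{
    pthread_mutex_destroy(&ctx->vb_mutex);
    free(ctx);
}
```

The point of the sketch is that every path that frees the context first destroys the mutex living inside it, both on normal release and on the error unwind.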
Verified end-to-end on higgs:
- rmmod + modprobe the rebuilt .ko.
- Restart daedalus-v4l2.service.
- Firefox YouTube playback engages VAAPI; the daemon journal shows
  cookie=1..N codec=3 (H.264) REQ_DECODE / decoder:OK pairs
  with unique per-frame fnv1a hashes.
- No EBUSY in either the Firefox stderr or the daemon journal during
  the entire session.
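For reference, a per-frame FNV-1a hash of the kind mentioned above can be computed as follows. This is a sketch only: the 64-bit width and the `fnv1a64` name are assumptions, since the message does not say which FNV-1a variant the daemon logs.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over a byte buffer (the width is an assumption). */
static uint64_t fnv1a64(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV-1a 64-bit offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];                       /* xor the byte in first ... */
        h *= 0x100000001b3ULL;           /* ... then multiply by the FNV prime */
    }
    return h;
}
```

Hashing each decoded frame buffer this way gives a cheap fingerprint: distinct hashes across consecutive frames suggest the decoder is producing new output rather than repeating a stale buffer.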
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
daedalus-v4l2
V4L2 stateless decoder for the Raspberry Pi 5 / CM5, backed by the
daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video
decode kernels on VideoCore VII compute + ARM NEON).
Status: scaffold (2026-05-18). Architecture locked per daedalus-fourier session memory; implementation not yet begun.
What this is
Sibling repo to daedalus-fourier (the kernel library; cycles 1-9 closed).
A two-piece userspace + kernel-module stack that exposes a V4L2
stateless decoder interface (/dev/videoNN) so that
libva-v4l2-request-fourier → firefox-fourier /
chromium-fourier can drive it the same way they drive existing
hardware-decode pipelines on Pi 5 / RK3588.
+-----------------------------------------------------------+
| firefox-fourier / chromium-fourier (existing)             |
+-----------------------------------------------------------+
| VA-API                                                    |
+-----------------------------------------------------------+
| libva-v4l2-request-fourier (existing, sibling project)    |
+-----------------------------------------------------------+
| V4L2 stateless ioctl uAPI                                 |
+-----------------------------------------------------------+
| daedalus-v4l2 kernel module (`kernel/`)                   |
|   - registers /dev/videoNN                                |
|   - parses V4L2 stateless ioctls (VP9/AV1/H.264 controls) |
|   - forwards bitstream + controls to userspace daemon     |
|     via chardev or netlink                                |
+-----------------------------------------------------------+
| daedalus-v4l2 userspace daemon (`daemon/`)                |
|   - takes bitstream blobs + per-slice controls            |
|   - drives FFmpeg parsers via dlopen (Option γ)           |
|   - dispatches per-block ops via daedalus-fourier         |
|     public API (daedalus_dispatch_*)                      |
|   - posts decoded frames back to kernel module            |
+-----------------------------------------------------------+
| daedalus-fourier kernel library (sibling project)         |
|   - exports include/daedalus.h public API                 |
|   - per-kernel CPU NEON + opportunistic V3D QPU dispatch  |
|   - 9 closed cycles across VP9, AV1 CDEF, H.264           |
+-----------------------------------------------------------+
| V3D 7.1 (Mesa userspace v3dv) + ARM NEON (BCM2712)        |
+-----------------------------------------------------------+
Why this architecture (Option B + γ + sibling)
Locked by user 2026-05-18 from 3 options in
daedalus-fourier/docs/phase8_scoping.md:
- Option B over A (userspace v4l2loopback): real /dev/videoNN, proper DRM PRIME / dmabuf for browser zero-copy.
- Option γ: dlopen FFmpeg as parser at runtime. No vendoring, fastest to v1.
- Sibling repo: per project_consumer_target convention, V4L2-side work lives outside daedalus-fourier so the kernel-library has a clean API boundary.
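As an illustration of what Option γ implies at runtime, the sketch below loads a libavcodec shared object with dlopen and resolves one symbol from it. It is a hedged example, not the daemon's loader: `probe_avcodec` is a hypothetical helper, the library name is supplied by the caller because the installed soname varies between FFmpeg releases, and `avcodec_version` is used only because it is a simple, long-standing libavcodec entry point.

```c
#include <dlfcn.h>
#include <stddef.h>

typedef unsigned (*avcodec_version_fn)(void);

/*
 * Try to load a libavcodec shared object by name and query its version.
 * Returns 0 and stores the packed version on success, -1 if the library
 * or the symbol is missing. The handle is closed again here; a long-lived
 * daemon would presumably keep it open instead.
 */
static int probe_avcodec(const char *soname, unsigned *version_out)
{
    void *handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    int ret = -1;

    if (!handle)
        return -1;

    avcodec_version_fn get_version =
        (avcodec_version_fn)dlsym(handle, "avcodec_version");
    if (get_version) {
        *version_out = get_version();
        ret = 0;
    }
    dlclose(handle);
    return ret;
}
```

Resolving symbols this way means a missing or mismatched FFmpeg install shows up as a recoverable runtime error instead of a link failure, which fits the "no vendoring" goal. On older glibc versions the program may need to be linked with `-ldl`.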
Status
Initial scaffold only. See docs/architecture.md for the
deeper design and docs/roadmap.md for the
sub-phase breakdown.
Repo layout
- kernel/: Linux kernel module (V4L2 device registration + ioctl handling + userspace chardev bridge). Out-of-tree.
- daemon/: userspace decoder daemon (links libdaedalus_core.a from sibling daedalus-fourier; uses dlopen for FFmpeg parser).
- include/: shared headers between kernel and daemon.
- docs/: architecture + roadmap.
License
- Kernel module: GPLv2 (required for kernel-tree compatibility).
- Userspace daemon: BSD-2-Clause (matches daedalus-fourier).