marfrit ee42419479 proto: bump PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB (closes #19)
Real H.264 access units routinely exceed the previous 64 KiB cap
on the chardev wire protocol:

  720p worst-case I-frame  ~200 KiB
  1080p worst-case I-frame ~500 KiB

libva-v4l2-request-fourier detects the under-sized OUTPUT-MPLANE
buffer and tries to grow it via VIDIOC_S_FMT to 147456 B, but
daedalus_fill_output_fmt unconditionally pins sizeimage to
DAEDALUS_MAX_BITSTREAM (= 65484) regardless of userspace's
request.  Firefox loses the slice and falls back to software
decode (libmozavcodec) for the rest of the session.

Bumping the wire-protocol cap to 1 MiB lifts the kernel
OUTPUT_MPLANE sizeimage with it (DAEDALUS_MAX_BITSTREAM is derived
from the same #define).  All allocations (kernel kmalloc /
kmemdup, daemon read buffer, vb2 plane backing) are dynamic and
sized per-payload at runtime, so the only growth is the daemon's
startup read buffer (one ~1 MiB allocation per daemon process)
and the V4L2 OUTPUT_MPLANE per-buffer size.  KMALLOC_MAX_SIZE on
aarch64 SLUB is several MiB; 1 MiB is well within bounds.  Other
V4L2 stateless decoders (cedrus, rkvdec, hantro) report 1-4 MiB
OUTPUT_MPLANE sizeimage — this puts daedalus at the conservative
end of normal.
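
The arithmetic behind the two caps can be sketched as below.  The
52-byte overhead is an assumption inferred from the figures quoted
in this message (65536 - 65484 and 1048576 - 1048524); the real
header constant, and the helper names used here, are hypothetical
and not taken from the source tree.

```c
/* Assumed per-message overhead, inferred from 65536 - 65484 = 52.
 * The real constant in the shared header may be named or built differently. */
#define PROTO_OVERHEAD_ASSUMED 52u

#define PROTO_MAX_PAYLOAD_OLD (64u * 1024u)    /* previous cap: 64 KiB */
#define PROTO_MAX_PAYLOAD_NEW (1024u * 1024u)  /* new cap: 1 MiB */

/* Hypothetical helper: bitstream bytes left once the overhead is subtracted. */
static inline unsigned int max_bitstream_for(unsigned int proto_max_payload)
{
    return proto_max_payload - PROTO_OVERHEAD_ASSUMED;
}

/* Returns 1 if an access unit of au_bytes fits under the given wire cap. */
static inline int access_unit_fits(unsigned int au_bytes,
                                   unsigned int proto_max_payload)
{
    return au_bytes <= max_bitstream_for(proto_max_payload);
}
```

Under that assumption, the 147456 B buffer libva asks for does not
fit under the old cap but does fit under the new one, as does a
~500 KiB worst-case 1080p I-frame.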

## Compatibility

#define-only change; struct layout unchanged.  But the
effective cap is the smaller of (kernel cap, daemon cap), so:
- new daemon + stale kernel: still capped at 64 KiB until the
  kernel module rebuilds.
- new kernel + stale daemon: same.
Lock-step install of daedalus-v4l2 + daedalus-v4l2-dkms is
therefore required for the fix to take effect; mirrors the
PR-#7/#8 cadence.
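
A minimal sketch of why a mixed install does not help: whichever
side was built with the smaller cap wins.  The function name is
illustrative only and does not exist in the code base.

```c
/* Hypothetical helper: the usable payload limit is the smaller of the
 * cap compiled into the kernel module and the cap compiled into the daemon. */
static inline unsigned int effective_payload_cap(unsigned int kernel_cap,
                                                 unsigned int daemon_cap)
{
    return kernel_cap < daemon_cap ? kernel_cap : daemon_cap;
}
```

With one side still at 64 KiB the result stays at 65536 whatever
the other side was built with; only when both are rebuilt does it
reach 1048576.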

## NOT changed in this commit

- daedalus_fill_output_fmt still hardcodes sizeimage =
  DAEDALUS_MAX_BITSTREAM regardless of userspace request.
  Acceptable: vb2 will allocate up to that, and libva's resize-
  test now sees the kernel report a sizeimage at-least-as-large
  as what it asked for (147456 < 1048524).  A future cleanup
  could respect userspace's S_FMT.sizeimage clamped to the cap,
  to save memory on tiny streams.
- chardev kmalloc → kvmalloc swap (only matters above
  KMALLOC_MAX_SIZE, not here).
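
For reference, the possible future cleanup mentioned in the first
item could look roughly like this.  It is a sketch only: the
function name is hypothetical, the cap value is the one quoted
above, and treating a zero request as "use the cap" is an assumed
policy rather than current driver behaviour.

```c
/* Cap quoted in this commit message for the 1 MiB protocol limit. */
#define MAX_BITSTREAM_ASSUMED 1048524u

/* Hypothetical replacement for the hardcoded assignment: honour the
 * sizeimage userspace passed to S_FMT, but never exceed the cap.
 * A request of 0 is treated as "no preference" and gets the cap
 * (assumed policy). */
static unsigned int clamp_output_sizeimage(unsigned int requested)
{
    if (requested == 0 || requested > MAX_BITSTREAM_ASSUMED)
        return MAX_BITSTREAM_ASSUMED;
    return requested;
}
```

A request for 147456 B would then be granted as-is instead of
being rounded up to the full cap, which is where the memory saving
on small streams would come from.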

Refs #19.
2026-05-22 20:46:27 +02:00

daedalus-v4l2

V4L2 stateless decoder for the Raspberry Pi 5 / CM5, backed by the daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video decode kernels on VideoCore VII compute + ARM NEON).

Status: scaffold (2026-05-18). Architecture locked per daedalus-fourier session memory; implementation not yet begun.

What this is

Sibling repo to daedalus-fourier (the kernel library; cycles 1-9 closed).

A two-piece userspace + kernel-module stack that exposes a V4L2 stateless decoder interface (/dev/videoNN) so that libva-v4l2-request-fourier, and through it firefox-fourier / chromium-fourier, can drive it the same way they drive existing hardware-decode pipelines on Pi 5 / RK3588.

+-----------------------------------------------------------+
| firefox-fourier / chromium-fourier  (existing)            |
+-----------------------------------------------------------+
| VA-API                                                    |
+-----------------------------------------------------------+
| libva-v4l2-request-fourier  (existing, sibling project)   |
+-----------------------------------------------------------+
| V4L2 stateless ioctl uAPI                                 |
+-----------------------------------------------------------+
| daedalus-v4l2 kernel module  (`kernel/`)                  |
|   - registers /dev/videoNN                                |
|   - parses V4L2 stateless ioctls (VP9/AV1/H.264 controls) |
|   - forwards bitstream + controls to userspace daemon     |
|     via chardev or netlink                                |
+-----------------------------------------------------------+
| daedalus-v4l2 userspace daemon  (`daemon/`)               |
|   - takes bitstream blobs + per-slice controls            |
|   - drives FFmpeg parsers via dlopen (Option γ)           |
|   - dispatches per-block ops via daedalus-fourier         |
|     public API (daedalus_dispatch_*)                      |
|   - posts decoded frames back to kernel module            |
+-----------------------------------------------------------+
| daedalus-fourier kernel library  (sibling project)        |
|   - exports include/daedalus.h public API                 |
|   - per-kernel CPU NEON + opportunistic V3D QPU dispatch  |
|   - 9 closed cycles across VP9, AV1 CDEF, H.264           |
+-----------------------------------------------------------+
| V3D 7.1 (Mesa userspace v3dv) + ARM NEON (BCM2712)        |
+-----------------------------------------------------------+

Why this architecture (Option B + γ + sibling)

Locked by user 2026-05-18 from 3 options in daedalus-fourier/docs/phase8_scoping.md:

  • Option B over A (userspace v4l2loopback): real /dev/videoNN, proper DRM PRIME / dmabuf for browser zero-copy.
  • Option γ: dlopen FFmpeg as parser at runtime. No vendoring, fastest to v1.
  • Sibling repo: per project_consumer_target convention, V4L2-side work lives outside daedalus-fourier so the kernel-library has a clean API boundary.
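
As an illustration of Option γ, the sketch below loads an FFmpeg library at runtime and resolves one symbol from it instead of linking against it. It is not the daemon's actual loader: the function name probe_avcodec is hypothetical, and the library name passed in (for example "libavcodec.so") is an assumption that depends on how FFmpeg is packaged on the target. dlopen, dlsym, dlclose, and FFmpeg's avcodec_version() are real calls.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical probe: returns 0 and stores FFmpeg's packed version number
 * if `soname` loads and exports avcodec_version(); returns -1 otherwise.
 * On older glibc this needs linking with -ldl. */
static int probe_avcodec(const char *soname, unsigned int *version_out)
{
    void *handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -1;                      /* library not installed */

    unsigned int (*get_version)(void);
    /* POSIX-recommended idiom for converting dlsym's void* to a function pointer. */
    *(void **)(&get_version) = dlsym(handle, "avcodec_version");
    if (get_version == NULL) {
        dlclose(handle);
        return -1;                      /* symbol missing */
    }

    *version_out = get_version();
    dlclose(handle);
    return 0;
}
```

A real loader would keep the handle open and resolve the parser entry points it needs; the point here is only that a missing FFmpeg becomes a runtime error the daemon can report, rather than a link-time dependency.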

Status

Initial scaffold only. See docs/architecture.md for the deeper design and docs/roadmap.md for the sub-phase breakdown.

Repo layout

  • kernel/ — Linux kernel module (V4L2 device registration + ioctl handling + userspace chardev bridge). Out-of-tree.
  • daemon/ — userspace decoder daemon (links libdaedalus_core.a from sibling daedalus-fourier; uses dlopen for FFmpeg parser).
  • include/ — shared headers between kernel and daemon.
  • docs/ — architecture + roadmap.

License

Kernel module: GPLv2 (required for kernel-tree compatibility). Userspace daemon: BSD-2-Clause (matches daedalus-fourier).
