marfrit d84efdb125 Phase 8.9: long-form stress + multi-codec HDR + libva scoping
Three verification deliverables; no production code changes
(infrastructure from 8.8 was sufficient).

1. libva-v4l2-request consumer investigation (task 95):
   - bootlin/libva-v4l2-request@master supports MPEG-2 /
     H.264 / HEVC only. No VP9, no AV1.
   - H.264 expects V4L2_PIX_FMT_H264_SLICE_RAW (older
     fourcc); we advertise V4L2_PIX_FMT_H264_SLICE.
   - CAPTURE expects V4L2_PIX_FMT_NV12 (single-plane);
     we advertise NV12M + P010.
   - Real integration = patch libva-v4l2-request to add
     VP9 + AV1 mappings + accept the newer H.264 fourcc.
     Multi-session work — pushed to Phase 8.10.
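
The CAPTURE-format mismatch in item 1 can be sketched as a fourcc intersection. This is an illustration only, written in Python rather than the project's C. The fourcc packing and the NV12 / NM12 / P010 codes follow the V4L2 uAPI headers; the `negotiate` helper and the two format lists are hypothetical stand-ins for what the consumer accepts and what our driver enumerates.

```python
def v4l2_fourcc(code):
    """Pack a 4-character code the way the v4l2_fourcc() C macro does."""
    a, b, c, d = code.encode("ascii")
    return a | (b << 8) | (c << 16) | (d << 24)


# Pixel formats named in the investigation notes.
PIX_FMT_NV12 = v4l2_fourcc("NV12")    # single-plane NV12
PIX_FMT_NV12M = v4l2_fourcc("NM12")   # multi-planar NV12
PIX_FMT_P010 = v4l2_fourcc("P010")    # 10-bit, 16-bit containers


def negotiate(consumer_accepts, driver_advertises):
    """Hypothetical helper: formats both sides agree on, in consumer order."""
    return [fmt for fmt in consumer_accepts if fmt in driver_advertises]


# Assumed lists, taken from the findings above.
consumer_capture = [PIX_FMT_NV12]
driver_capture = [PIX_FMT_NV12M, PIX_FMT_P010]

common = negotiate(consumer_capture, driver_capture)
print("common CAPTURE formats:", [hex(f) for f in common])
```

An empty intersection is why the consumer cannot be pointed at the device unmodified: either side has to learn a format the other already speaks.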

2. Long-form stress test (task 96):
   - Built a 1800-frame (60s @ 30fps) VP9 1080p stream
     by Python concat of vp9_5s.ivf × 12 with PTS
     adjustment and re-muxed IVF header.
   - 1800 / 1800 frames decoded cleanly through
     test_m2m_stream + daemon, fps=120.9 sustained
     across 14.9 s wall, p99=17.3 ms/frame (well inside
     the 33 ms 30fps budget).
   - Daemon alive after 3620 cookies across two
     back-to-back runs, RSS=23 MiB — no leak.
   - No kernel oops/WARN, no fps degradation across
     the long run.
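
A minimal sketch of how such a stream could be assembled; it is not the script actually used. It assumes the standard IVF layout (32-byte little-endian file header, 12-byte per-frame header holding size and PTS), a constant PTS step, and a source clip that starts on a keyframe so that each repeat decodes independently. `parse_ivf` and `repeat_ivf` are illustrative names.

```python
import struct

# signature, version, header length, fourcc, width, height,
# timebase denominator, timebase numerator, frame count, unused
IVF_HEADER = struct.Struct("<4sHH4sHHIIII")   # 32 bytes
FRAME_HEADER = struct.Struct("<IQ")           # payload size, PTS (12 bytes)


def parse_ivf(data):
    """Return (header_fields, [(pts, payload), ...]) for one IVF blob."""
    fields = IVF_HEADER.unpack_from(data, 0)
    if fields[0] != b"DKIF":
        raise ValueError("not an IVF file")
    offset = fields[2]  # header length as stored in the file
    frames = []
    while offset + FRAME_HEADER.size <= len(data):
        size, pts = FRAME_HEADER.unpack_from(data, offset)
        offset += FRAME_HEADER.size
        frames.append((pts, data[offset:offset + size]))
        offset += size
    return fields, frames


def repeat_ivf(data, repeats):
    """Concatenate `repeats` copies of a clip with monotonically rising PTS."""
    fields, frames = parse_ivf(data)
    if not frames:
        raise ValueError("clip has no frames")
    first = frames[0][0]
    step = frames[1][0] - first if len(frames) > 1 else 1
    span = frames[-1][0] - first + step       # PTS length of one copy
    total = len(frames) * repeats
    # Rewrite the header with the new frame count; keep codec and geometry.
    out = bytearray(IVF_HEADER.pack(fields[0], fields[1], IVF_HEADER.size,
                                    *fields[3:8], total, fields[9]))
    for r in range(repeats):
        for pts, payload in frames:
            out += FRAME_HEADER.pack(len(payload), pts + r * span)
            out += payload
    return bytes(out)
```

With a 150-frame input (5 s at 30 fps), `repeat_ivf(clip_bytes, 12)` yields the 1800-frame stream described above.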

3. Multi-codec HDR (task 97):
   - AV1 1080p 10-bit → P010: byte-exact vs ffmpeg
     p010le. fps 17.1 (below 30fps target; AV1 10-bit
     is intrinsically expensive).
   - H.264 1080p 10-bit (high10) → P010: byte-exact
     vs ffmpeg p010le. fps 26.9 (close to target).
   - Combined with 8.8's VP9-10bit P010 result
     (48.8 fps): all three codecs' 10-bit paths
     produce byte-exact P010 output.
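
The byte-exact check can be sketched as a frame-by-frame comparison of two raw dumps. This is an assumed harness, not the project's test code: the helper names are hypothetical, frame dimensions are assumed even, and the reference dump is assumed to come from something like `ffmpeg -i <clip> -pix_fmt p010le -f rawvideo <ref file>`.

```python
def p010_frame_bytes(width, height):
    """Size of one P010 frame: 16-bit luma plane + interleaved 16-bit CbCr."""
    luma = width * height * 2
    chroma = width * (height // 2) * 2   # half-height plane, Cb/Cr interleaved
    return luma + chroma


def first_mismatch(got, ref, frame_bytes):
    """Return None if the dumps are identical, else (frame index, byte offset)."""
    common = min(len(got), len(ref))
    for start in range(0, common, frame_bytes):
        end = min(start + frame_bytes, common)
        a, b = got[start:end], ref[start:end]
        if a != b:
            offset = next(i for i, (x, y) in enumerate(zip(a, b)) if x != y)
            return start // frame_bytes, offset
    if len(got) != len(ref):
        # One dump is a strict prefix of the other: report where it ends.
        return common // frame_bytes, common % frame_bytes
    return None


# Hypothetical usage with two raw dumps already read into memory:
# size = p010_frame_bytes(1920, 1080)
# assert first_mismatch(decoder_bytes, ffmpeg_bytes, size) is None
```

Reporting the first differing frame and offset, rather than a bare pass/fail, makes it easier to tell a plane-layout error (mismatch at a plane boundary) from a decode error (mismatch mid-plane).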

Roadmap update (docs/roadmap.md):
- 8.9 marked closed with the scope-cut explained.
- 8.10 = libva-v4l2-request VP9/AV1 patch + end-to-end
  consumer integration (the actual user-facing loop:
  mpv --hwdec=vaapi → libva-v4l2-request → /dev/video0
  → daemon → decoded frame).

Per correctness-before-speed: characterised the libva
integration scope rigorously rather than starting a
multi-session battle in this phase. The bounded
deliverables (stress test + HDR matrix) ship clean and
prove the existing infrastructure handles real-world
workloads stably.

Phase 8.10 next: build + patch libva-v4l2-request on
hertz; end-to-end with mpv.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:26:42 +00:00

daedalus-v4l2

V4L2 stateless decoder for the Raspberry Pi 5 / CM5, backed by the daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video decode kernels on VideoCore VII compute + ARM NEON).

Status: scaffold (2026-05-18). Architecture locked per daedalus-fourier session memory; implementation not yet begun.

What this is

Sibling repo to daedalus-fourier (the kernel library; cycles 1-9 closed).

A two-piece userspace + kernel-module stack that exposes a V4L2 stateless decoder interface (/dev/videoNN) so that libva-v4l2-request-fourier and, above it, firefox-fourier / chromium-fourier can drive it the same way they drive existing hardware-decode pipelines on Pi 5 / RK3588.

+-----------------------------------------------------------+
| firefox-fourier / chromium-fourier  (existing)            |
+-----------------------------------------------------------+
| VA-API                                                    |
+-----------------------------------------------------------+
| libva-v4l2-request-fourier  (existing, sibling project)   |
+-----------------------------------------------------------+
| V4L2 stateless ioctl uAPI                                 |
+-----------------------------------------------------------+
| daedalus-v4l2 kernel module  (`kernel/`)                  |
|   - registers /dev/videoNN                                |
|   - parses V4L2 stateless ioctls (VP9/AV1/H.264 controls) |
|   - forwards bitstream + controls to userspace daemon     |
|     via chardev or netlink                                |
+-----------------------------------------------------------+
| daedalus-v4l2 userspace daemon  (`daemon/`)               |
|   - takes bitstream blobs + per-slice controls            |
|   - drives FFmpeg parsers via dlopen (Option γ)           |
|   - dispatches per-block ops via daedalus-fourier         |
|     public API (daedalus_dispatch_*)                      |
|   - posts decoded frames back to kernel module            |
+-----------------------------------------------------------+
| daedalus-fourier kernel library  (sibling project)        |
|   - exports include/daedalus.h public API                 |
|   - per-kernel CPU NEON + opportunistic V3D QPU dispatch  |
|   - 9 closed cycles across VP9, AV1 CDEF, H.264           |
+-----------------------------------------------------------+
| V3D 7.1 (Mesa userspace v3dv) + ARM NEON (BCM2712)        |
+-----------------------------------------------------------+

Why this architecture (Option B + γ + sibling)

Locked in by the user on 2026-05-18 from the three options in daedalus-fourier/docs/phase8_scoping.md:

  • Option B over A (userspace v4l2loopback): real /dev/videoNN, proper DRM PRIME / dmabuf for browser zero-copy.
  • Option γ: dlopen FFmpeg as parser at runtime. No vendoring, fastest to v1.
  • Sibling repo: per project_consumer_target convention, V4L2-side work lives outside daedalus-fourier so the kernel-library has a clean API boundary.
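
The runtime-loading idea behind Option γ, sketched in Python with ctypes (which wraps dlopen/dlsym on Linux). The daemon itself is C and its actual loader is not shown in this README; the helper names below are hypothetical, and `avcodec_version` is used only as a harmless symbol to resolve.

```python
import ctypes
import ctypes.util


def load_avcodec():
    """Try to load libavcodec at runtime; return None if it is unavailable."""
    name = ctypes.util.find_library("avcodec")
    if name is None:
        return None
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


def avcodec_major(lib):
    """Resolve avcodec_version() by name and return the major version."""
    fn = lib.avcodec_version
    fn.restype = ctypes.c_uint
    fn.argtypes = []
    return fn() >> 16   # version is packed as major<<16 | minor<<8 | micro


lib = load_avcodec()
if lib is None:
    print("libavcodec not found; parser backend disabled")
else:
    print("libavcodec major version:", avcodec_major(lib))
```

Because nothing links against FFmpeg at build time, a missing or mismatched library becomes a runtime condition to handle rather than a build failure, which is the trade-off the "no vendoring" choice accepts.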

Status

Initial scaffold only. See docs/architecture.md for the deeper design and docs/roadmap.md for the sub-phase breakdown.

Repo layout

  • kernel/ — Linux kernel module (V4L2 device registration + ioctl handling + userspace chardev bridge). Out-of-tree.
  • daemon/ — userspace decoder daemon (links libdaedalus_core.a from sibling daedalus-fourier; uses dlopen for FFmpeg parser).
  • include/ — shared headers between kernel and daemon.
  • docs/ — architecture + roadmap.

License

Kernel module: GPLv2 (required for kernel-tree compatibility). Userspace daemon: BSD-2-Clause (matches daedalus-fourier).
