H.264 streams with B-frames showed visibly pair-swapped output in
mpv / Firefox playback through the libva → daedalus_v4l2 path —
"frames went 2 1 4 3 6 5 instead of 1 2 3 4 5 6". Reproduced in mpv
with --hwdec=vaapi-copy at 720p (bypassing Firefox's compositor),
confirming the bug was in this daemon pipeline, not downstream.
Root cause
----------
libavcodec's H.264 decoder internally reorders output to DISPLAY
order before returning from avcodec_receive_frame. The daemon
previously called send_packet → receive_frame ONCE per REQ_DECODE
and shipped the resulting pixels in a RESP_FRAME tagged with the
SAME cookie. For B-frames this is wrong: the frame returned from
receive_frame may belong to an EARLIER bitstream (libavcodec held
it for display-order release). Cookie N's CAPTURE buffer therefore
got cookie N-2's pixels, while cookie N-2's CAPTURE buffer got
silently marked VB2_BUF_STATE_ERROR (the daemon returned
DAEDALUS_DECODE_NO_FRAME for the cookie whose pixels were held).
Fix shape
---------
Decouple kernel cookie identity (decode-order routing) from
libavcodec's display-ordered output. Wire-protocol changes:
REQ_DECODE + __u64 src_pts (= src_buf->vb2_buf.timestamp)
RESP_FRAME + __u32 flags (HAS_PIXELS | SRC_CONSUMED)
+ __u64 output_src_pts (= frame->pts on drain)
PROTO_VERSION bumped 0 → 1. Lock-step rebuild required.
Kernel
------
device_run now mirrors src_buf->vb2_buf.timestamp into req->src_pts
before sending REQ_DECODE, and stores it on the inflight item so
the completion path can stamp dst_buf.timestamp explicitly when
src/dst lifecycles decouple (V4L2_BUF_FLAG_TIMESTAMP_COPY's auto-
pairing no longer applies).
daedalus_complete_resp_frame splits into:
HAS_PIXELS: pack pixels into THIS cookie's CAPTURE buffer,
stamp dst timestamp from inflight->src_pts,
v4l2_m2m_buf_done(dst, DONE/ERROR).
No job_finish here.
SRC_CONSUMED: release the bound media_request, run
v4l2_m2m_buf_done(src) + v4l2_m2m_job_finish so
the scheduler can dispatch the next REQ. dst_buf
may still be parked at this point.
Inflight entry is removed and freed only when BOTH src_buf and
dst_buf have been cleared. Combined HAS_PIXELS|SRC_CONSUMED RESPs
(steady-state VP9/AV1 with no reorder lag) collapse to the prior
1:1 behaviour for free.
Daemon
------
daedalus_decoder_run_request split into three primitives:
daedalus_decoder_submit — set pkt->pts = req->src_pts,
avcodec_send_packet.
daedalus_decoder_drain_one — avcodec_receive_frame, populate
resp meta + output_src_pts (= the
frame's pts, carried back from
the bitstream that produced it).
daedalus_decoder_pack_current — pack current AVFrame into the
caller-mapped CAPTURE planes.
chardev_client maintains a small (src_pts → cookie, cached_req)
table indexed linearly (≤64 entries; bounded by V4L2 client buffer
pool depth). On each REQ_DECODE:
1. Register (src_pts → cookie) in the table.
2. submit().
3. Drain loop: for each frame returned, look up its owner cookie
via pending_lookup(frame->pts), GET_DMABUF for THAT cookie,
pack pixels, emit RESP_FRAME(owner_cookie, HAS_PIXELS,
output_src_pts=frame->pts). Combine with SRC_CONSUMED when
owner_cookie equals the current REQ's cookie.
4. If the current REQ's cookie wasn't drained inside the loop
(libavcodec held the frame), emit a standalone SRC_CONSUMED
RESP so the kernel runs job_finish + dispatches the next REQ;
dst_buf for this cookie stays parked until a future drain
produces its pixels.
VP9 / AV1 paths are unchanged in behaviour: one frame per REQ,
HAS_PIXELS|SRC_CONSUMED in one combined RESP.
Verified
--------
Builds clean cross-compiled on higgs against 6.18.29+rpt-rpi-2712
(Pi CM5). Frame-size warning in device_run is pre-existing
(unchanged by this commit).
daedalus-v4l2
V4L2 stateless decoder for the Raspberry Pi 5 / CM5, backed by the
daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video
decode kernels on VideoCore VII compute + ARM NEON).
Status: scaffold (2026-05-18). Architecture locked per daedalus-fourier session memory; implementation not yet begun.
What this is
Sibling repo to daedalus-fourier (the kernel library; cycles 1-9 closed).
A two-piece userspace + kernel-module stack that exposes a V4L2
stateless decoder interface (/dev/videoNN) so that
libva-v4l2-request-fourier → firefox-fourier /
chromium-fourier can drive it the same way they drive existing
hardware-decode pipelines on Pi 5 / RK3588.
+-----------------------------------------------------------+
| firefox-fourier / chromium-fourier (existing) |
+-----------------------------------------------------------+
| VA-API |
+-----------------------------------------------------------+
| libva-v4l2-request-fourier (existing, sibling project) |
+-----------------------------------------------------------+
| V4L2 stateless ioctl uAPI |
+-----------------------------------------------------------+
| daedalus-v4l2 kernel module (`kernel/`) |
| - registers /dev/videoNN |
| - parses V4L2 stateless ioctls (VP9/AV1/H.264 controls) |
| - forwards bitstream + controls to userspace daemon |
| via chardev or netlink |
+-----------------------------------------------------------+
| daedalus-v4l2 userspace daemon (`daemon/`) |
| - takes bitstream blobs + per-slice controls |
| - drives FFmpeg parsers via dlopen (Option γ) |
| - dispatches per-block ops via daedalus-fourier |
| public API (daedalus_dispatch_*) |
| - posts decoded frames back to kernel module |
+-----------------------------------------------------------+
| daedalus-fourier kernel library (sibling project) |
| - exports include/daedalus.h public API |
| - per-kernel CPU NEON + opportunistic V3D QPU dispatch |
| - 9 closed cycles across VP9, AV1 CDEF, H.264 |
+-----------------------------------------------------------+
| V3D 7.1 (Mesa userspace v3dv) + ARM NEON (BCM2712) |
+-----------------------------------------------------------+
Why this architecture (Option B + γ + sibling)
Locked by user 2026-05-18 from 3 options in
daedalus-fourier/docs/phase8_scoping.md:
- Option B over A (userspace v4l2loopback): real
/dev/videoNN, proper DRM PRIME / dmabuf for browser zero-copy. - Option γ: dlopen FFmpeg as parser at runtime. No vendoring, fastest to v1.
- Sibling repo: per
project_consumer_targetconvention, V4L2-side work lives outside daedalus-fourier so the kernel-library has a clean API boundary.
Status
Initial scaffold only. See docs/architecture.md for the
deeper design and docs/roadmap.md for the
sub-phase breakdown.
Repo layout
kernel/— Linux kernel module (V4L2 device registration + ioctl handling + userspace chardev bridge). Out-of-tree.daemon/— userspace decoder daemon (linkslibdaedalus_core.afrom sibling daedalus-fourier; uses dlopen for FFmpeg parser).include/— shared headers between kernel and daemon.docs/— architecture + roadmap.
License
Kernel module: GPLv2 (required for kernel-tree compatibility). Userspace daemon: BSD-2-Clause (matches daedalus-fourier).