claude-noether 94be8c3d03 kernel: drain in-flight m2m jobs on daemon disconnect
Fixes issue #146 — daemon-crash (SIGKILL, SEGV, anything that
triggers chardev release) leaves V4L2 consumers in unkillable
TASK_UNINTERRUPTIBLE on /dev/video0 close.

## Root cause

device_run() adds an entry to dev->inflight when it sends a
REQ_DECODE to the daemon, marking the m2m job as "running".
The job is only cleared via v4l2_m2m_buf_done_and_job_finish()
in daedalus_complete_resp_frame(), which only fires on RESP_FRAME.

If the daemon dies (SIGKILL, SEGV, exit) BEFORE writing the
matching RESP_FRAME:
  - the inflight entry is never popped
  - v4l2_m2m_buf_done_and_job_finish is never called
  - the m2m scheduler still thinks a job is running

Later, when the V4L2 consumer calls close() (or is signalled to
exit), v4l2_m2m_ctx_release() → v4l2_m2m_cancel_job() waits
indefinitely for !job_running.  The consumer enters D-state and
survives SIGKILL until reboot.
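
The lifecycle described above can be sketched as a small userspace
model.  This is only an illustration under simplifying assumptions,
not the module's code: struct model_job, the EV_* events and every
model_* function are hypothetical names, and a single job is tracked
for brevity.

```c
/* Userspace model of the pre-fix job lifecycle; NOT the module's code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical events the daemon side can produce in this model. */
enum model_event { EV_RESP_FRAME, EV_DAEMON_DIED };

struct model_job {
    bool job_running;   /* what the m2m scheduler believes */
    int  inflight;      /* stand-in for entries on dev->inflight */
};

/* Stand-in for device_run(): sends REQ_DECODE, marks the job running. */
static void model_device_run(struct model_job *j)
{
    assert(!j->job_running);    /* this model tracks one job only */
    j->inflight++;
    j->job_running = true;
}

/* Pre-fix behaviour: only RESP_FRAME finishes the job. */
static void model_handle_event_pre_fix(struct model_job *j, enum model_event ev)
{
    if (ev == EV_RESP_FRAME && j->inflight > 0) {
        j->inflight--;
        j->job_running = false;
    }
    /* EV_DAEMON_DIED: nothing touches the job state before the fix. */
}

/* Stand-in for the wait in v4l2_m2m_cancel_job(): blocks while running. */
static bool model_close_would_block(const struct model_job *j)
{
    return j->job_running;
}
```

In this model a RESP_FRAME leaves model_close_would_block() false,
while EV_DAEMON_DIED leaves it true with the inflight entry still
counted, which corresponds to the hang seen in the repro below.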

Reproduced on hertz 2026-05-23, kernel 6.12.75+rpt-rpi-2712:

  $ sudo kill -STOP $DAEMON_PID            # block daemon I/O
  $ ./test_m2m_decode keyframe.bin out.nv12 1920 1080 vp9 &
  $ sudo kill -9 $DAEMON_PID               # chardev_release fires
  $ kill -9 $CLIENT_PID                    # ignored — D-state
  # client stack:
  v4l2_m2m_cancel_job+0x14c [v4l2_mem2mem]
  v4l2_m2m_ctx_release+0x20 [v4l2_mem2mem]
  daedalus_release+0x2c [daedalus_v4l2]
  v4l2_release+0x7c [videodev]
  __fput → do_exit → SIGKILL never delivered

## Fix

New API daedalus_drain_inflight_on_disconnect() in main.{c,h}:
walks the in-flight list, marks both src+dst buffers
VB2_BUF_STATE_ERROR via v4l2_m2m_buf_done_and_job_finish(), and
releases the bound media_request if any.  This is the same
completion sequence daedalus_complete_resp_frame() performs on
the success path, just with state = ERROR for every in-flight entry.

chardev_release calls the drain after flushing dev->req_queue
(messages still in req_queue weren't released to the daemon yet,
so they don't need the m2m-job-finish dance — freeing them is
sufficient).  The order matters: queue first (cheap), then m2m
drain (heavier, takes the inflight list).

Locking: list_splice_init under inflight_lock to take the entire
list atomically; lock dropped before iterating because
v4l2_m2m_buf_done_and_job_finish can sleep via vb2's buffer-done
dispatch and can re-enter device_run via the scheduler (which
would need inflight_lock again on the next REQ_DECODE).
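
The splice-then-iterate pattern can be sketched in userspace as
follows.  This is a model under stated assumptions, not the code in
main.c: a pthread mutex stands in for inflight_lock, a singly linked
list stands in for the kernel list_head, and struct inflight_entry,
struct model_dev, finish_job_with_error() and drain_inflight() are
hypothetical names.

```c
/* Userspace model of the splice-then-drain pattern; NOT the module's code. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum buf_state { BUF_ACTIVE, BUF_DONE, BUF_ERROR };  /* stand-in for vb2 states */

struct inflight_entry {                 /* hypothetical in-flight record */
    struct inflight_entry *next;
    enum buf_state src_state, dst_state;
};

struct model_dev {
    pthread_mutex_t inflight_lock;
    struct inflight_entry *inflight;    /* stand-in for a kernel list_head */
};

/* Stand-in for finishing the m2m job with both buffers in the ERROR state. */
static void finish_job_with_error(struct inflight_entry *e)
{
    assert(e != NULL);
    e->src_state = BUF_ERROR;
    e->dst_state = BUF_ERROR;
}

/* Drains every in-flight entry; returns how many were completed. */
static int drain_inflight(struct model_dev *dev)
{
    struct inflight_entry *local, *next;
    int n = 0;

    /* Take the whole list in one step, as list_splice_init() would. */
    pthread_mutex_lock(&dev->inflight_lock);
    local = dev->inflight;
    dev->inflight = NULL;
    pthread_mutex_unlock(&dev->inflight_lock);

    /* Lock is dropped here: completion may re-enter code that takes it. */
    for (; local != NULL; local = next) {
        next = local->next;
        local->next = NULL;
        finish_job_with_error(local);
        n++;
    }
    return n;
}
```

Because the shared list is emptied while the lock is held, a second
call to drain_inflight() finds nothing to do, and any entry added
after the splice lands on the fresh list rather than the private one
being iterated.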

## Verification path

Cannot rmmod the running module on hertz right now — the D-state
corpse from the repro session pins the refcount.  Verification
of the fixed module needs a reboot or fresh test host:

  $ sudo reboot                            # clears hung client
  $ sudo make modules_install              # install new .ko
  $ sudo modprobe daedalus_v4l2
  $ # rerun the repro script — client should die cleanly with
  $ # an -EIO / similar return from poll/DQBUF instead of hanging.

Build: clean on Linux 6.12.75+rpt-rpi-2712, no new warnings.
The pre-existing "frame size 2128 > 2048" warning on
daedalus_device_run is unchanged by this commit.

## Followup not in scope

If a new V4L2 consumer races a REQ_DECODE through device_run
AFTER the drain has spliced the list (but before the daemon
chardev is reopened), the new entry sits in a freshly-empty
inflight list and the same hang can recur for that consumer
when the systemd auto-restart of the daemon either fails or
takes longer than the consumer's patience.  A secondary
safeguard would be to fail-fast in device_run when dev->chardev
is unopened — proposing as a separate ticket if this race
materialises in practice.

Closes #146.
2026-05-23 17:06:06 +02:00

daedalus-v4l2

V4L2 stateless decoder for the Raspberry Pi 5 / CM5, backed by the daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video decode kernels on VideoCore VII compute + ARM NEON).

Status: scaffold (2026-05-18). Architecture locked per daedalus-fourier session memory; implementation not yet begun.

What this is

Sibling repo to daedalus-fourier (the kernel library; cycles 1-9 closed).

A two-piece userspace + kernel-module stack that exposes a V4L2 stateless decoder interface (/dev/videoNN) so that libva-v4l2-request-fourier and firefox-fourier / chromium-fourier can drive it the same way they drive existing hardware-decode pipelines on Pi 5 / RK3588.

+-----------------------------------------------------------+
| firefox-fourier / chromium-fourier  (existing)            |
+-----------------------------------------------------------+
| VA-API                                                    |
+-----------------------------------------------------------+
| libva-v4l2-request-fourier  (existing, sibling project)   |
+-----------------------------------------------------------+
| V4L2 stateless ioctl uAPI                                 |
+-----------------------------------------------------------+
| daedalus-v4l2 kernel module  (`kernel/`)                  |
|   - registers /dev/videoNN                                |
|   - parses V4L2 stateless ioctls (VP9/AV1/H.264 controls) |
|   - forwards bitstream + controls to userspace daemon     |
|     via chardev or netlink                                |
+-----------------------------------------------------------+
| daedalus-v4l2 userspace daemon  (`daemon/`)               |
|   - takes bitstream blobs + per-slice controls            |
|   - drives FFmpeg parsers via dlopen (Option γ)           |
|   - dispatches per-block ops via daedalus-fourier         |
|     public API (daedalus_dispatch_*)                      |
|   - posts decoded frames back to kernel module            |
+-----------------------------------------------------------+
| daedalus-fourier kernel library  (sibling project)        |
|   - exports include/daedalus.h public API                 |
|   - per-kernel CPU NEON + opportunistic V3D QPU dispatch  |
|   - 9 closed cycles across VP9, AV1 CDEF, H.264           |
+-----------------------------------------------------------+
| V3D 7.1 (Mesa userspace v3dv) + ARM NEON (BCM2712)        |
+-----------------------------------------------------------+

Why this architecture (Option B + γ + sibling)

Locked by user 2026-05-18 from 3 options in daedalus-fourier/docs/phase8_scoping.md:

  • Option B over A (userspace v4l2loopback): real /dev/videoNN, proper DRM PRIME / dmabuf for browser zero-copy.
  • Option γ: dlopen FFmpeg as parser at runtime. No vendoring, fastest to v1.
  • Sibling repo: per project_consumer_target convention, V4L2-side work lives outside daedalus-fourier so the kernel-library has a clean API boundary.

Status

Initial scaffold only. See docs/architecture.md for the deeper design and docs/roadmap.md for the sub-phase breakdown.

Repo layout

  • kernel/ — Linux kernel module (V4L2 device registration + ioctl handling + userspace chardev bridge). Out-of-tree.
  • daemon/ — userspace decoder daemon (links libdaedalus_core.a from sibling daedalus-fourier; uses dlopen for FFmpeg parser).
  • include/ — shared headers between kernel and daemon.
  • docs/ — architecture + roadmap.

License

Kernel module: GPLv2 (required for kernel-tree compatibility). Userspace daemon: BSD-2-Clause (matches daedalus-fourier).
