main
5 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
94be8c3d03 |
kernel: drain in-flight m2m jobs on daemon disconnect
Fixes issue #146 — daemon-crash (SIGKILL, SEGV, anything that triggers chardev release) leaves V4L2 consumers in unkillable TASK_UNINTERRUPTIBLE on /dev/video0 close. ## Root cause device_run() adds an entry to dev->inflight when it sends a REQ_DECODE to the daemon, marking the m2m job as "running". The job is only cleared via v4l2_m2m_buf_done_and_job_finish() in daedalus_complete_resp_frame(), which only fires on RESP_FRAME. If the daemon dies (SIGKILL, SEGV, exit) BEFORE writing the matching RESP_FRAME: - the inflight entry is never popped - v4l2_m2m_buf_done_and_job_finish is never called - the m2m scheduler still thinks a job is running Later, when the V4L2 consumer's close() runs (or gets signalled to exit), v4l2_m2m_ctx_release() → v4l2_m2m_cancel_job() waits for !job_running indefinitely. The consumer enters D-state and survives SIGKILL until reboot. Reproduced on hertz 2026-05-23, kernel 6.12.75+rpt-rpi-2712: $ sudo kill -STOP $DAEMON_PID # block daemon I/O $ ./test_m2m_decode keyframe.bin out.nv12 1920 1080 vp9 & $ sudo kill -9 $DAEMON_PID # chardev_release fires $ kill -9 $CLIENT_PID # ignored — D-state # client stack: v4l2_m2m_cancel_job+0x14c [v4l2_mem2mem] v4l2_m2m_ctx_release+0x20 [v4l2_mem2mem] daedalus_release+0x2c [daedalus_v4l2] v4l2_release+0x7c [videodev] __fput → do_exit → SIGKILL never delivered ## Fix New API daedalus_drain_inflight_on_disconnect() in main.{c,h}: walks the in-flight list, marks both src+dst buffers VB2_BUF_STATE_ERROR via v4l2_m2m_buf_done_and_job_finish(), and releases the bound media_request if any. Same completion shape as daedalus_complete_resp_frame() takes on the success path, just with state = ERROR for every in-flight entry. chardev_release calls the drain after flushing dev->req_queue (messages still in req_queue weren't released to the daemon yet, so they don't need the m2m-job-finish dance — freeing them is sufficient). The order matters: queue first (cheap), then m2m drain (heavier, takes the inflight list). Locking: list_splice_init under inflight_lock to take the entire list atomically; lock dropped before iterating because v4l2_m2m_buf_done_and_job_finish can sleep via vb2's buffer-done dispatch and can re-enter device_run via the scheduler (which would need inflight_lock again on the next REQ_DECODE). ## Verification path Cannot rmmod the running module on hertz right now — the D-state corpse from the repro session pins the refcount. Verification of the fixed module needs a reboot or fresh test host: $ sudo reboot # clears hung client $ sudo make modules_install # install new .ko $ sudo modprobe daedalus_v4l2 $ # rerun the repro script — client should die cleanly with $ # an -EIO / similar return from poll/DQBUF instead of hanging. Build: clean on Linux 6.12.75 + rpt-rpi-2712, no new warnings. The pre-existing "frame size 2128 > 2048" warning on daedalus_device_run is unchanged by this commit. ## Followup not in scope If a new V4L2 consumer races a REQ_DECODE through device_run AFTER the drain has spliced the list (but before the daemon chardev is reopened), the new entry sits in a freshly-empty inflight list and the same hang can recur for that consumer when the systemd auto-restart of the daemon either fails or takes longer than the consumer's patience. A secondary safeguard would be to fail-fast in device_run when dev->chardev is unopened — proposing as a separate ticket if this race materialises in practice. Closes #146. |
||
|
|
c7f6fb90cb |
Phase 8.6: dmabuf + AV1 + H.264 + stateless controls
Removes the Phase 8.5 64 KiB frame-size cap by exporting CAPTURE
buffers as dmabuf-fds the daemon mmaps and writes pixels into
directly. Adds AV1 + H.264 codec support, V4L2 stateless control
registration, and the compliance polish that brings the driver
to 47/48 v4l2-compliance pass.
Protocol (include/daedalus_v4l2_proto.h):
- struct daedalus_req_decode grew capture-buffer metadata
(width/height/pix_fmt/num_planes + per-plane size+stride).
- New DAEDALUS_IOC_GET_DMABUF ioctl on the chardev: daemon
asks for a per-plane dmabuf fd, kernel calls vb2_core_expbuf
in daemon task context so the fd lands in the daemon's table.
Kernel m2m driver (kernel/daedalus_v4l2_main.c):
- Both queues switched to vb2_dma_contig_memops. OUTPUT was
vmalloc in 8.5; the switch is needed because vmalloc doesn't
honour V4L2_MEMORY_FLAG_NON_COHERENT and v4l2-compliance's
REQBUFS test rejected the driver because of it. We still
read bitstream via vb2_plane_vaddr (dma_contig gives a
kernel virtual address just like vmalloc did).
- dma_coerce_mask_and_coherent(DMA_BIT_MASK(32)) in probe.
- queue_setup populates alloc_devs[plane] = &pdev->dev for
both queues; allow_cache_hints=1 on both.
- daedalus_export_capture_dmabuf(cookie, plane, flags, *fd):
walks inflight list, calls vb2_core_expbuf on the CAPTURE
buffer in the caller's (daemon's) task context.
- device_run fills the new REQ_DECODE capture fields from
ctx->dst_fmt and maps ctx->src_fmt.pixelformat to
DAEDALUS_CODEC_VP9 / _AV1 / _H264 (was hard-wired to VP9).
- daedalus_complete_resp_frame handles both the 8.5 inline
path (kept for debugging) and the 8.6 dmabuf path (pixels
already in CAPTURE buffer, just set payload from metadata).
- enum_fmt advertises all 3 OUTPUT formats (VP9F, AV1F, S264).
- try_fmt preserves userspace colorspace fields instead of
overwriting with REC709 defaults (fixes 8.5 compliance fail).
- s_fmt propagates OUTPUT colorspace → CAPTURE (stateless
decoder round-trip test at v4l2-test-formats.cpp:958).
- 12 V4L2 stateless controls registered per open (VP9_FRAME,
VP9_COMPRESSED_HDR, H264_SPS/PPS/SCALING/PRED_WEIGHTS/
SLICE_PARAMS/DECODE_PARAMS, AV1_FRAME/SEQUENCE/
TILE_GROUP_ENTRY/FILM_GRAIN). Daemon ignores values (FFmpeg
re-parses); registration is what makes libva-v4l2-request
see us.
Kernel chardev (kernel/daedalus_v4l2_chardev.c):
- New unlocked_ioctl dispatching DAEDALUS_IOC_GET_DMABUF to
daedalus_export_capture_dmabuf.
- debugfs test_decode cookies unified with the m2m cookie
allocator via shared daedalus_next_cookie() — kills the
Phase 8.5 namespace collision.
Daemon (daemon/src/...):
- New dmabuf_capture.{c,h}: GET_DMABUF + mmap each plane on
REQ_DECODE; munmap + close on completion. O_RDWR | O_CLOEXEC
is essential — vb2_core_expbuf extracts O_ACCMODE from flags
and exports read-only by default (caught on first run; mmap
-EACCES on PROT_WRITE).
- decoder.{c,h}: lazily opens AV1 + H.264 AVCodecContexts in
addition to VP9 (dropped the -ENOSYS stubs). pack_nv12_to_planes
writes Y line-by-line into planes[0] with planes[0].stride;
interleaves Cb/Cr into planes[1] with planes[1].stride.
- chardev_client.c handle_req_decode: opens dmabuf planes,
runs decode (pixels land in CAPTURE buffer directly), closes
planes, sends metadata-only RESP_FRAME. No wire-pixel
allocation.
Test harness (tools/test_m2m_decode.c):
- Optional 5th arg `codec` (vp9 | av1 | h264). Same client
drives all three codecs.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
Bit-exact end-to-end vs `ffmpeg -pix_fmt nv12`:
VP9 1920x1080 3,110,400 bytes MATCH
AV1 128x96 18,432 bytes MATCH
H.264 128x96 18,432 bytes MATCH
VP9 1080p went through the full dmabuf path with no chardev
payload bloat — the same chardev that capped at 64 KiB in 8.5
now ferries metadata only and lets the daemon mmap+write a
3.1 MB frame directly into the V4L2 client's buffer.
v4l2-compliance:
Phase 8.1: 44/48
Phase 8.5: 44/48 (different fails after m2m landed)
Phase 8.6: 47/48
Only remaining: VIDIOC_(TRY_)DECODER_CMD (needs media
controller — explicitly Phase 8.7 work).
11 standard compound controls visible:
vp9_frame_decode_parameters, vp9_probabilities_updates,
h264_sequence_parameter_set, h264_picture_parameter_set,
h264_scaling_matrix, h264_prediction_weight_table,
h264_slice_parameters, h264_decode_parameters,
av1_sequence_parameters, av1_frame_parameters,
av1_film_grain (av1_tile_group_entry refused by hdl->error
on this kernel — skipped silently).
Clean SIGTERM + rmmod, no oops/WARN.
Roadmap update (docs/roadmap.md):
- Phase 8.6 marked closed with the closure-doc reference.
- Phase 8.7 reshaped to (1) media controller, (2) perf +
daedalus_dispatch_* substitution, (3) HDR/10-bit, (4)
long-form multi-frame streaming.
Per correctness-before-speed:
- Real V4L2 dmabuf via vb2_core_expbuf (not a sideband
fd-passing hack).
- O_RDWR access mode threaded through correctly.
- Strict pixel-byte comparison against ffmpeg, not "looks
right" eyeballing.
- Each compliance edge documented with the underlying test
source-line + the fix.
- All resource paths cleaned (munmap + close per plane on
every exit, including error paths).
Phase 8.7 next: media controller binding (closes last
compliance fail), per-frame profiling, QPU dispatch
substitution targeting 30fps@1080p from
30fps-floor-is-fine memory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
6f4b580f7c |
Phase 8.5: full V4L2 m2m driver, VP9 decode via QBUF/DQBUF
Replaces the Phase 8.4 debugfs-triggered chardev path with a
real V4L2 m2m driver. Userspace clients now drive decoding the
standard way — S_FMT / REQBUFS / QBUF on the OUTPUT (bitstream)
queue, DQBUF on the CAPTURE (NV12M) queue. Kernel device_run
packs the bitstream into REQ_DECODE; daemon decodes via FFmpeg;
RESP_FRAME's inline NV12 pixel payload lands in the CAPTURE
buffer. Phase 8.6 swaps the inline payload for dmabuf so big
frames stop being capped at 64 KiB.
Kernel (daedalus_v4l2_main.c, rewritten + main.h added):
- Per-open struct daedalus_ctx: v4l2_fh, m2m_ctx, ctrl_handler,
per-queue v4l2_pix_format_mplane.
- Two vb2_queues (vb2_vmalloc_memops for both — no DMA needed
yet; 8.6 switches CAPTURE to dma_contig for dmabuf-export):
OUTPUT = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, VP9_FRAME
CAPTURE = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE, NV12M
- Full v4l2_ioctl_ops table: querycap, enum_fmt, g/s/try_fmt
for both queues, reqbufs/querybuf/qbuf/dqbuf/create_bufs/
prepare_buf/expbuf/streamon/streamoff via v4l2_m2m_ioctl_*
helpers.
- v4l2_m2m_ops.device_run: peeks next OUTPUT buf, builds
REQ_DECODE inline with the bitstream bytes, enqueues with an
auto-incrementing cookie, stores {ctx, src_buf, dst_buf} in
a per-device inflight list. Job stays open until RESP_FRAME.
- daedalus_complete_resp_frame(): pops the inflight entry,
memcpys inline NV12 pixels into the CAPTURE buffer (Y plane
+ interleaved CbCr), finishes via
v4l2_m2m_buf_done_and_job_finish — NOT plain buf_done +
job_finish, which leaves the src buf on the m2m queue and
causes device_run to immediately re-run on the same input
(caught on first run; second REQ_DECODE for same bitstream +
eventual oops in stop_streaming on teardown).
Kernel (daedalus_v4l2_chardev.c):
- RESP_FRAME handler now hands inline pixel payload to
daedalus_complete_resp_frame so it lands in the CAPTURE
vb2 buffer. Existing PONG and debugfs test_decode paths still
work; the latter produces a harmless ratelimited "unknown
cookie" since it bypasses V4L2 m2m.
Daemon (decoder.c, decoder.h):
- daedalus_decoder_run_request signature extended with
(nv12_out, nv12_cap, nv12_used). After the FNV-1a digest the
decoder packs YUV420P into NV12 in the caller's buffer: Y
plane line-by-line stripped of stride padding; Cb/Cr
interleaved into a single chroma plane. Truncation silent —
kernel only memcpys what fits in the CAPTURE plane.
Daemon (chardev_client.c):
- handle_req_decode allocates a response buffer sized for the
full chardev payload, lets decoder fill the pixel area
after the resp_frame struct, sends the full payload via the
existing send_response.
Test client (tools/test_m2m_decode.c, new):
- Minimal V4L2 m2m client: S_FMT both queues, REQBUFS 1 each,
mmap+fill OUTPUT, QBUF both, STREAMON, poll, DQBUF, dump
CAPTURE planes to a raw NV12 file. ~250 LOC; verifies the
whole flow without needing v4l2-ctl framing.
Roadmap update (docs/roadmap.md):
- Phase 8.4 retitled "daemon ↔ kernel decode round-trip"
to reflect what actually shipped (vs. the original V4L2-
ioctl-driven plan which moved here).
- Phase 8.5 retitled "full V4L2 m2m driver" with closure
status.
- Phase 8.6 reshaped to two tracks: dmabuf + AV1/H.264/
stateless controls + media controller. Adds the punch list
of v4l2-compliance failures (DECODER_CMD, S_FMT colorspace)
that 8.6 will fix.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
Kernel + daemon build clean (-Wall -Wextra clean both sides).
Test harness drives one VP9 keyframe end-to-end:
OUTPUT REQBUFS -> 2
CAPTURE REQBUFS -> 2
QBUF OUTPUT[0] bytesused=1566
QBUF CAPTURE[0]; STREAMON both
poll revents=0x5
DQBUF OUTPUT[0] flags=0x4001 (DONE)
DQBUF CAPTURE[0] flags=0x4000 payloads=[12288, 6144]
wrote 12288 Y + 6144 UV bytes to /tmp/out_m2m.nv12
Pixel correctness vs reference:
ffmpeg -i vp9_small.ivf -pix_fmt nv12 -f rawvideo -y ref.nv12
cmp /tmp/out_m2m.nv12 /tmp/ref.nv12 → match ✓
Byte-for-byte identical to FFmpeg's stock CPU decode.
v4l2-compliance: detected as Stateless Decoder; most ioctls
pass; two expected fails documented in closure doc
(DECODER_CMD/media controller, S_FMT colorspace).
Clean teardown: SIGTERM the daemon, rmmod the module, no
oops/WARN in dmesg.
Per correctness-before-speed:
- Real V4L2 ioctl table (not stubs); uses v4l2-core helpers
where they exist instead of reinventing.
- v4l2_m2m_buf_done_and_job_finish (not the manual sequence)
to keep scheduler state consistent.
- Bit-exact reference comparison, not just "looks right."
- Documented every compliance failure with the planned fix.
- All resource paths (kmalloc/kfree, inflight list cleanup,
src/dst buf removal in stop_streaming) handled on every
error branch.
Phase 8.6 next: dmabuf-export for CAPTURE (removes 64 KiB
frame-size cap), add AV1+H.264 codecs, add V4L2 stateless
controls + media controller binding, fix the colorspace +
cookie-namespace compliance issues.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2a449632b9 |
Phase 8.4: daemon ↔ kernel decode round-trip (VP9 end-to-end)
Wires the Phase 8.3 FFmpeg loader through the Phase 8.2 chardev
bridge: kernel injects REQ_DECODE carrying a raw VP9 access unit,
daemon hands the bitstream to libavcodec via dlopen, sends
RESP_FRAME back with a content-dependent FNV-1a digest of the
decoded YUV planes. Pure CPU decode for now — Phase 8.5 swaps in
dmabuf + QPU dispatch.
Protocol (include/daedalus_v4l2_proto.h):
- New REQ_DECODE (kernel→daemon) and RESP_FRAME (daemon→kernel)
message types, with fixed-size payload structs.
- New DAEDALUS_CODEC_VP9/AV1/H264 enum (wire-stable so 8.6's
AV1+H.264 work doesn't move existing values).
- New DAEDALUS_DECODE_* status enum (OK / NO_FRAME / ERR_OPEN /
ERR_SEND / ERR_RECV / ERR_CODEC).
- Converted the prior `enum daedalus_msg_type` to #defines —
high-bit values exceed INT_MAX and tripped -Wpedantic on
userspace; kernel uABI headers use the same idiom.
Kernel (kernel/daedalus_v4l2_chardev.c):
- New debugfs entry /sys/kernel/debug/daedalus_v4l2/test_decode:
writing raw bitstream bytes wraps them in a REQ_DECODE
(codec=VP9 for Phase 8.4) and enqueues with an
auto-incrementing cookie.
- daedalus_chardev_write learned RESP_FRAME: parses the payload
and emits a single pr_info line with decode metadata. Keeps
existing PONG handling on the default arm.
Daemon (daemon/src/...):
- chardev_client.{c,h} — opens /dev/daedalus-v4l2, blocking read
loop, single-buffer write() responses (kernel chardev has only
.write, not .write_iter, so writev lands as -EINVAL —
discovered the hard way during first run).
- decoder.{c,h} — lazily-opened AVCodecContext per codec, shared
AVPacket/AVFrame pair, descriptor-driven plane walker
(av_pix_fmt_desc_get) so the same hash path covers YUV420P,
YUV422P, YUV444P, GBRP and other 8-bit planar layouts.
Generalised after first run decoded testsrc as GBRP (71)
rather than the assumed YUV420P.
- `daemon` command in main.c opens the chardev and runs the loop
until SIGINT/SIGTERM. Cookie correlation handled end-to-end.
- ffmpeg_loader gained av_pix_fmt_desc_get (23 symbols total).
Build:
- CMakeLists adds chardev_client.c + decoder.c; explicit
-I../include for the shared protocol header.
- Still -Wall -Wextra -Wpedantic clean.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
$ ffmpeg ... -pix_fmt yuv420p -c:v libvpx-vp9 -frames:v 1 \
-y /tmp/vp9_test.ivf
$ python3 ... strip IVF framing → vp9_keyframe.bin (3268 B)
$ sudo insmod kernel/daedalus_v4l2.ko
$ daedalus_v4l2_daemon -v daemon &
$ sudo dd if=vp9_keyframe.bin \
of=/sys/kernel/debug/daedalus_v4l2/test_decode
daemon: REQ_DECODE cookie=2 → decoded yuv420p 320x240
fnv1a=0x6ef10d71 luma=76800 chroma=38400
kernel: RESP_FRAME cookie=2 status=0 320x240 pixfmt=0
fnv1a=0x6ef10d71 ← matches daemon ✓
Hash properties verified:
cookie=2 testsrc 3268 B → 0x6ef10d71 (first decode)
cookie=3 red 44 B → 0x7f6e5dc5 (content-dependent ✓)
cookie=4 testsrc 3268 B → 0x6ef10d71 (deterministic ✓)
cookie=5 64 B random → status=101 (ERR_SEND, daemon alive)
Daemon survives bad input (FFmpeg "Invalid sync code" wrapped
into structured ERR_SEND response). Clean SIGTERM shutdown,
clean rmmod.
Phase 8.4 acceptance criteria met:
- ✓ end-to-end kernel→daemon→FFmpeg→kernel round-trip
- ✓ cookie correlation per request/response pair
- ✓ content-dependent + deterministic digest
- ✓ structured error responses (no daemon crash on bad input)
- ✓ clean teardown (SIGTERM + rmmod)
- ✓ builds clean on both kernel kbuild and daemon CMake
Per correctness-before-speed:
- Real chardev I/O (no shortcuts, no select-loop hacks)
- Real FFmpeg AVCodecContext lifecycle (lazily opened, properly
freed on cleanup)
- Descriptor-driven plane walk (generalises across pix_fmts)
- Structured error path (not just log-and-continue)
- All resource paths cleaned up on every error branch
- Documented why FNV-1a digest, why write() not writev(), why
pix_desc walk in docs/phase_8_4_closure.md
Phase 8.5 next: V4L2 m2m queue submits REQ_DECODE from
vidioc_qbuf; dmabuf carries actual pixel data so the chardev's
64 KiB cap doesn't gate frame size; begin substituting
daedalus_dispatch_* into the daemon's decode path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
895f57c63a |
Phase 8.2: kernel ↔ daemon chardev bridge with round-trip test
Adds /dev/daedalus-v4l2 misc chardev to the kernel module. The
chardev is the IPC channel for the future userspace decoder
daemon: kernel enqueues REQ_* messages, daemon read()s them,
processes, write()s RESP_* back.
Wire protocol (pre-1.0, header in include/daedalus_v4l2_proto.h):
- struct daedalus_msg_hdr: magic (D04V) + version + type +
cookie + payload_len + reserved
- Request/response separated by high bit of type field
- Max 64 KiB payload per message
- Cookie correlates request with matching response
Kernel implementation (kernel/daedalus_v4l2_chardev.{c,h}):
- Single-instance chardev (-EBUSY on second open)
- In-kernel FIFO bounded at 64 messages
- Blocking + non-blocking read; poll() with EPOLLIN on queued
- write() parses + validates header, logs response at pr_debug
- Bad magic → -EBADMSG, bad version → -EPROTO, oversize → -EMSGSIZE
- All error paths free resources
Phase 8.2 test trigger via debugfs:
- /sys/kernel/debug/daedalus_v4l2/test_ping — any byte
enqueues a PING with a fixed 24-byte payload. Removed in
Phase 8.4 when real REQ_DECODE from V4L2 path takes over.
Userspace verification tool (tools/test_chardev_pingpong.c):
- Real C program, proper error reporting via strerror
- Validates the 6-step round-trip: open → empty-queue EAGAIN →
trigger ping → read PING → verify all fields → write PONG → close
- Builds with -Wall -Wextra clean
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
$ sudo insmod daedalus_v4l2.ko
$ sudo tools/test_chardev_pingpong
opening /dev/daedalus-v4l2...
non-blocking read on empty queue: EAGAIN ✓
injected PING via debugfs ✓
read PING: magic ✓ version ✓ type=PING ✓ cookie=0x1234 ✓ payload=24 bytes
payload: "DAEDALUS-V4L2-PING-PL"
wrote PONG (cookie=0x1234) ✓
ALL TESTS PASSED.
$ sudo rmmod daedalus_v4l2 # clean
Per correctness-before-speed: full kerneldoc on structs, 8-tab
kernel style, SPDX headers, proper error paths, real test
program (not "I ran it once"), failure-mode coverage documented.
Phase 8.3 next: userspace daemon with dlopen'd FFmpeg parse path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|