Files
daedalus-v4l2/docs/phase_8_4_closure.md
marfrit 2a449632b9 Phase 8.4: daemon ↔ kernel decode round-trip (VP9 end-to-end)
Wires the Phase 8.3 FFmpeg loader through the Phase 8.2 chardev
bridge: kernel injects REQ_DECODE carrying a raw VP9 access unit,
daemon hands the bitstream to libavcodec via dlopen, sends
RESP_FRAME back with a content-dependent FNV-1a digest of the
decoded YUV planes. Pure CPU decode for now — Phase 8.5 swaps in
dmabuf + QPU dispatch.

Protocol (include/daedalus_v4l2_proto.h):
- New REQ_DECODE (kernel→daemon) and RESP_FRAME (daemon→kernel)
  message types, with fixed-size payload structs.
- New DAEDALUS_CODEC_VP9/AV1/H264 enum (wire-stable so 8.6's
  AV1+H.264 work doesn't move existing values).
- New DAEDALUS_DECODE_* status enum (OK / NO_FRAME / ERR_OPEN /
  ERR_SEND / ERR_RECV / ERR_CODEC).
- Converted the prior `enum daedalus_msg_type` to #defines —
  high-bit values exceed INT_MAX and tripped -Wpedantic on
  userspace; kernel uABI headers use the same idiom.

Kernel (kernel/daedalus_v4l2_chardev.c):
- New debugfs entry /sys/kernel/debug/daedalus_v4l2/test_decode:
  writing raw bitstream bytes wraps them in a REQ_DECODE
  (codec=VP9 for Phase 8.4) and enqueues with an
  auto-incrementing cookie.
- daedalus_chardev_write learned RESP_FRAME: parses the payload
  and emits a single pr_info line with decode metadata. Keeps
  existing PONG handling on the default arm.

Daemon (daemon/src/...):
- chardev_client.{c,h} — opens /dev/daedalus-v4l2, blocking read
  loop, single-buffer write() responses (kernel chardev has only
  .write, not .write_iter, so writev lands as -EINVAL —
  discovered the hard way during first run).
- decoder.{c,h} — lazily-opened AVCodecContext per codec, shared
  AVPacket/AVFrame pair, descriptor-driven plane walker
  (av_pix_fmt_desc_get) so the same hash path covers YUV420P,
  YUV422P, YUV444P, GBRP and other 8-bit planar layouts.
  Generalised after first run decoded testsrc as GBRP (71)
  rather than the assumed YUV420P.
- `daemon` command in main.c opens the chardev and runs the loop
  until SIGINT/SIGTERM. Cookie correlation handled end-to-end.
- ffmpeg_loader gained av_pix_fmt_desc_get (23 symbols total).

Build:
- CMakeLists adds chardev_client.c + decoder.c; explicit
  -I../include for the shared protocol header.
- Still -Wall -Wextra -Wpedantic clean.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

  $ ffmpeg ... -pix_fmt yuv420p -c:v libvpx-vp9 -frames:v 1 \
           -y /tmp/vp9_test.ivf
  $ python3 ... strip IVF framing → vp9_keyframe.bin (3268 B)

  $ sudo insmod kernel/daedalus_v4l2.ko
  $ daedalus_v4l2_daemon -v daemon &
  $ sudo dd if=vp9_keyframe.bin \
         of=/sys/kernel/debug/daedalus_v4l2/test_decode

  daemon: REQ_DECODE cookie=2 → decoded yuv420p 320x240
          fnv1a=0x6ef10d71 luma=76800 chroma=38400
  kernel: RESP_FRAME cookie=2 status=0 320x240 pixfmt=0
          fnv1a=0x6ef10d71  ← matches daemon ✓

Hash properties verified:
  cookie=2  testsrc 3268 B → 0x6ef10d71  (first decode)
  cookie=3  red     44 B   → 0x7f6e5dc5  (content-dependent ✓)
  cookie=4  testsrc 3268 B → 0x6ef10d71  (deterministic ✓)
  cookie=5  64 B random    → status=101  (ERR_SEND, daemon alive)

Daemon survives bad input (FFmpeg "Invalid sync code" wrapped
into structured ERR_SEND response). Clean SIGTERM shutdown,
clean rmmod.

Phase 8.4 acceptance criteria met:
- ✓ end-to-end kernel→daemon→FFmpeg→kernel round-trip
- ✓ cookie correlation per request/response pair
- ✓ content-dependent + deterministic digest
- ✓ structured error responses (no daemon crash on bad input)
- ✓ clean teardown (SIGTERM + rmmod)
- ✓ builds clean on both kernel kbuild and daemon CMake

Per correctness-before-speed:
- Real chardev I/O (no shortcuts, no select-loop hacks)
- Real FFmpeg AVCodecContext lifecycle (lazily opened, properly
  freed on cleanup)
- Descriptor-driven plane walk (generalises across pix_fmts)
- Structured error path (not just log-and-continue)
- All resource paths cleaned up on every error branch
- Documented why FNV-1a digest, why write() not writev(), why
  pix_desc walk in docs/phase_8_4_closure.md

Phase 8.5 next: V4L2 m2m queue submits REQ_DECODE from
vidioc_qbuf; dmabuf carries actual pixel data so the chardev's
64 KiB cap doesn't gate frame size; begin substituting
daedalus_dispatch_* into the daemon's decode path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:22:16 +00:00

215 lines
8.4 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 8.4 closure — daemon ↔ kernel decode round-trip (VP9)
**Status:** closed 2026-05-18.
Wires the Phase 8.3 FFmpeg loader through the Phase 8.2 chardev
bridge: kernel injects `REQ_DECODE` carrying a raw VP9 access
unit, daemon hands the bitstream to libavcodec via dlopen, sends
`RESP_FRAME` back with a content-dependent FNV-1a digest of the
decoded YUV planes. Pure CPU decode for now — Phase 8.5 swaps
in dmabuf + QPU dispatch.
## What lands
### Protocol (`include/daedalus_v4l2_proto.h`)
- New message types: `REQ_DECODE` (kernel→daemon) and
`RESP_FRAME` (daemon→kernel). Converted the prior `enum
daedalus_msg_type` to `#define`s — high-bit values exceed
INT_MAX and tripped -Wpedantic on userspace builds; kernel uABI
headers use the same idiom.
- New payload structs `daedalus_req_decode`,
`daedalus_resp_frame`.
- New codec id enum (`DAEDALUS_CODEC_VP9 = 1`); wire-stable so
Phase 8.6's AV1/H.264 additions don't move the existing
values.
- New status enum (`DAEDALUS_DECODE_OK`, `..._NO_FRAME`,
`..._ERR_OPEN`, `..._ERR_SEND`, `..._ERR_RECV`,
`..._ERR_CODEC`).
### Kernel (`kernel/daedalus_v4l2_chardev.c`)
- New debugfs entry `/sys/kernel/debug/daedalus_v4l2/test_decode`
— writing raw bitstream bytes wraps them in a `REQ_DECODE`
(codec hard-wired to VP9 for Phase 8.4) and enqueues for the
daemon. Auto-incrementing cookie per request.
- `daedalus_chardev_write` learned `RESP_FRAME`: parses the
fixed-size payload and emits a single `pr_info` line with the
decode metadata. Keeps the existing PONG path on the default
arm.
### Daemon (`daemon/src/...`)
- `chardev_client.{c,h}` — opens `/dev/daedalus-v4l2`, blocking
read loop dispatching on message type, writes responses via
single contiguous `write()` (kernel chardev has only `.write`,
no `.write_iter`, so `writev` lands as -EINVAL — discovered
the hard way during first end-to-end run).
- `decoder.{c,h}` — encapsulates the AVCodecContext (lazily
opened on first request per codec), shared AVPacket/AVFrame
pair, and an FNV-1a digest of the decoded planes. Plane walk
is descriptor-driven (`av_pix_fmt_desc_get`) so the same code
path covers YUV420P, YUV422P, YUV444P, GBRP and other 8-bit
planar layouts.
- `daemon` command in `main.c` opens the chardev and runs the
loop until SIGINT / SIGTERM.
- `ffmpeg_loader` gained `av_pix_fmt_desc_get` (23 resolved
symbols total).
### Build
- CMakeLists adds `chardev_client.c` and `decoder.c` to the
executable; explicit `-I../include` for the shared protocol
header.
- Still `-Wall -Wextra -Wpedantic` clean.
## Verification
Kernel module built clean against the in-tree headers
(`linux-headers-6.12.75+rpt-rpi-2712`):
```
$ cd /home/mfritsche/src/daedalus-v4l2/kernel && make
CC [M] daedalus_v4l2_chardev.o
LD [M] daedalus_v4l2.ko
```
Daemon built clean:
```
$ cmake --build build/
[100%] Built target daedalus_v4l2_daemon
```
End-to-end:
```
$ ffmpeg -hide_banner -loglevel warning -f lavfi \
-i 'testsrc=duration=0.04:size=320x240:rate=25' \
-pix_fmt yuv420p -c:v libvpx-vp9 -frames:v 1 -y /tmp/vp9_test.ivf
$ python3 -c "..." # strip IVF framing → /tmp/vp9_keyframe.bin
extracted 3268 bytes raw VP9
$ sudo insmod kernel/daedalus_v4l2.ko
$ /tmp/start_daemon.sh # daemon mode, blocks on read
$ sudo dd if=/tmp/vp9_keyframe.bin \
of=/sys/kernel/debug/daedalus_v4l2/test_decode bs=8192 count=1
daemon log:
[INFO] REQ_DECODE cookie=2 codec=1 bitstream=3268 bytes
[INFO] decoder: opened vp9 context
[INFO] decoder: OK 320x240 fmt=0 (yuv420p) fnv1a=0x6ef10d71 luma=76800 chroma=38400
kernel log:
[16199.734667] daedalus_v4l2: REQ_DECODE enqueued cookie=2 codec=VP9 bitstream=3268
[16199.735951] daedalus_v4l2: RESP_FRAME cookie=2 status=0 codec=1 320x240
pixfmt=0 luma=76800 chroma=38400 fnv1a=0x6ef10d71
^^^^^^^^^^
matches the daemon's
```
### Hash properties (sanity)
| Trigger | Bitstream | Hash | Notes |
|---|---|---|---|
| testsrc 320×240 | 3268 B | `0x6ef10d71` | first decode (codec open) |
| color=red 320×240 | 44 B | `0x7f6e5dc5` | hash changes with content ✓ |
| testsrc again | 3268 B | `0x6ef10d71` | deterministic ✓ |
| 64 B `/dev/urandom` | 64 B | n/a | structured error, status=101 |
The garbage-input case is the interesting one: FFmpeg's
`avcodec_send_packet` returned -1094995529 ("Invalid sync code"),
the daemon stayed alive, wrapped that into
`DAEDALUS_DECODE_ERR_SEND`, sent `RESP_FRAME` with status=101 and
zeroed metadata. Kernel logged the response. No daemon crash,
no kernel oops, no stuck request in the FIFO.
### Cleanup
```
$ pkill -TERM -f daedalus_v4l2_daemon # daemon exits cleanly
$ sudo rmmod daedalus_v4l2 # ok, all queued requests drained
```
## Design decisions
### Why FNV-1a, why no pixel data on the wire?
The chardev's wire protocol caps single messages at 64 KiB
(`DAEDALUS_PROTO_MAX_PAYLOAD`). A single 1080p YUV420P frame is
3.1 MB — orders of magnitude larger. Forcing pixel data through
the chardev would require either:
1. Fragmentation across multiple messages (re-assembly state in
kernel, complexity tax for a temporary path).
2. Bumping the limit, which lifts the per-message kmalloc out of
GFP_KERNEL territory.
Neither is the right answer. Phase 8.5 wires dmabuf for actual
frame transfer; the FNV-1a digest is just enough to prove "the
right bytes came out of the decoder" without paying that cost
yet. The digest also stays useful as a cross-host sanity check
(reference vs target).
### Why `write()` not `writev()` in the daemon?
The kernel chardev implements only `.write` in its fops — not
`.write_iter`. Modern Linux does not auto-fallback `writev →
write`; userspace `writev` returns `-EINVAL` directly. Options
were:
1. Implement `.write_iter` in the kernel (slightly more code,
buys nothing functionally for a one-or-two-iovec write).
2. Marshal into a single buffer in the daemon (one
short-lived malloc per response, dead simple).
Picked (2). Response payloads are ≤ 36 B (struct
daedalus_resp_frame) for decode and ≤ 64 KiB for PONG; a
malloc/free per response is invisible at the scale we're
working.
### Why plane walk via `av_pix_fmt_desc_get`?
First end-to-end run decoded `testsrc` (an RGB-native source) as
`AV_PIX_FMT_GBRP` (71), not `YUV420P`. The original hand-rolled
hash hard-coded YUV420P plane geometry and fell back to "plane 0
only" otherwise — fine for a one-time test, but it would silently
miss two-thirds of the pixels on the very first real-world
content variation.
Using `AVPixFmtDescriptor` directly gives us a generic plane
walker: how many planes, which components live in each, and
the chroma subsampling shifts. Now the same hash path correctly
covers planar YUV (any subsampling), GBRP, and similar
8-bit-per-sample layouts. 10/12-bit (P010, YUV420P10LE) needs a
depth-aware variant — that lands when Phase 8.6 starts looking at
HDR.
## What's NOT here (deferred)
- **dmabuf / DRM PRIME**: Phase 8.5. RESP_FRAME today carries
metadata + digest only; actual pixel data goes out-of-band via
dmabuf in the next phase.
- **V4L2 buffer-queue wiring**: REQ_DECODE today is debugfs-
triggered. Phase 8.5+ has the V4L2 m2m queue submit
requests from `vidioc_qbuf`.
- **QPU dispatch**: the daemon decodes on CPU via FFmpeg.
Substituting per-block dispatch into the sibling
daedalus-fourier kernels (cycles 1, 2, 4, 9) lands once the
daemon-side parser can extract block-level metadata — that's a
Phase 8.5/8.6 follow-up.
- **AV1 / H.264**: decoder rejects them with
`DAEDALUS_DECODE_ERR_CODEC` today. Phase 8.6 adds the codec
contexts.
- **10-bit pixel formats**: hash path is 8-bit/sample/plane only.
## Phase 8.5 plan
1. Replace debugfs `test_decode` with V4L2 m2m queue submission:
`vidioc_qbuf` on the OUTPUT queue extracts the bitstream from
the userspace plane and calls `daedalus_chardev_enqueue_req`.
2. dmabuf import on the CAPTURE queue: daemon writes decoded
pixels into a kernel-allocated dmabuf and `RESP_FRAME`
references the buffer index, not raw bytes.
3. Drive a userspace V4L2 client (start with `v4l2-compliance
--stream-options` then a tiny custom test) end-to-end.
4. Begin substituting `daedalus_dispatch_*` calls into the
daemon's decode path for kernels where the QPU implementation
matches the FFmpeg block format.