1e9619afe8c3574f239fc08bd1a7b078952aec24
6 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1e9619afe8 |
daemon: AV1 Sequence Header OBU synthesiser + unit test
V4L2 stateless AV1 passes the sequence header information as a structured control (V4L2_CID_STATELESS_AV1_SEQUENCE) and ships only tile-group bytes in the OUTPUT buffer. libavcodec's AV1 decoder is full-bitstream, so the daemon needs to reconstruct the OBU bytes the consumer parsed out before feeding the assembled stream to libavcodec. This commit lands the Sequence Header OBU half of that reconstruction — av1_synth_sequence_header_obu(). Frame Header / Frame OBU synthesisers + the integration that wires the assembled OBUs into the decode hot path are separate follow-on modules. Module shape mirrors the H.264 NAL synthesiser (PR #1): - Public API: single function returning byte count or 0 on overflow/invalid input. - Wire encoder uses the existing bitstream_writer (bsw_put_u is AV1's f(n); bsw_put_ue is bit-identical to AV1's uvlc; bsw_align_rbsp matches AV1's trailing_bits()). - AV1-specific helpers (leb128 size, min_bits_for, subsampling resolution per §5.5.2) are file-local statics. - No emulation prevention — AV1 uses leb128-sized OBUs for bitstream boundaries, not byte-pattern escapes. Synthesis decisions for fields V4L2 doesn't carry are documented verbatim in the file header (reduced_still_picture_header = 0; single operating point at seq_level_idx = 13 / level 5.1; color_description_present_flag = 0; chroma_sample_position = 0; seq_choose_screen_detection_tools = 1; seq_choose_integer_mv = 1). Rejection cases: - seq_profile > 2 - bit_depth not in {8, 10, 12} - seq_profile = 1 + monochrome (4:4:4 forced colour) - seq_profile = 1 + bit_depth = 12 (only profile 2 allows it) - max_frame_{width,height}_minus_1 requiring > 16 length bits - out_cap too small to hold header + leb128 + payload Each returns 0 to surface the mismatch loudly rather than emit nonsense the libavcodec parser would reject downstream. Unit test (test_av1_obu_synth.c, opt-in via DAEDALUS_BUILD_TESTS=ON) exercises four cases bit-by-bit against a hand-computed reference: 1. profile 0, 1080p, 8-bit, 4:2:0, order_hint on (7 bits), CDEF+restoration on — the common Pi 5 path. 2. profile 0, 720p, 10-bit, monochrome — exercises high_bitdepth and the monochrome short-form color_config. 3. profile 1 + bit_depth 12 → expects 0 (rejected). 4. tiny out_cap → expects 0 (overflow). All four green on hertz (aarch64 Arch, gcc Wall+Wextra+Wpedantic clean). This commit does not change daemon behaviour — av1_obu_synth.c is built into the daemon binary so the symbols are reachable, but no call site is wired yet. Integration goes in the follow-on DAEMON-AV1 patches that also synthesise the Frame Header OBU and bracket the assembled OBUs with a Temporal Delimiter. Refs reauktion/daedalus-v4l2#11 daemon-half; closes daedalus backlog task #144. |
||
|
|
88b2ebfaa9 |
daemon: link daedalus-fourier + log substrate availability at startup
First incremental step toward H.264 daemon-rewrite (daedalus-v4l2#11):
make the daedalus-fourier kernel library available to the daemon
process so subsequent patches can substitute its primitives
(IDCT 4×4, IDCT 8×8, luma vertical deblock, etc.) for libavcodec's
per-MB pixel math.
This patch does NOT yet dispatch any kernels. It only:
- Adds `pkg_check_modules(DAEDALUS_FOURIER REQUIRED daedalus-fourier)`
to the daemon's CMakeLists, with explicit link ordering
(libdaedalus_core.a must precede -lvulkan because the static
archive references vulkan symbols and the linker resolves
left-to-right). We bypass IMPORTED_TARGET because pkg-config's
Requires.private chain leaves CMake's dependency graph reordering
the archive after -lvulkan, breaking the static link.
- Calls daedalus_ctx_create_no_qpu() at daemon startup, logs the
substrate-availability line, destroys the context at exit.
no_qpu mode skips V3D Vulkan probe — proves linkage works
without depending on shader-path resolution (which is a
separate piece of work, since v3d_runner currently loads
.spv files from cwd-relative paths and consumer would need
a search path override).
Sample journal line:
[2026-05-21 17:59:35.271 INFO] daedalus-fourier: linked, ctx alive
(no_qpu mode; has_qpu=0)
Build-test verified on hertz (Pi 5 dev host) against an installed
copy of daedalus-fourier r35+gd87239d (from marfrit/daedalus-fourier
PR #1). Binary links cleanly, --help prints, daemon mode opens
chardev (fails predictably on hertz which has no daedalus_v4l2
kmod; on higgs this is the existing working path).
Follow-up patches per daedalus-v4l2#11:
1. Instrument the existing libavcodec decode path to count
per-frame IDCT blocks / deblock edges / MC tiles so we have
a baseline of what work the daemon dispatches for a typical
YouTube H.264 stream.
2. Substitute daedalus-fourier kernels one at a time, measuring
CPU saved per substitution.
3. Wire shader path resolution into daedalus_ctx_create() for
the QPU substrate (V3D opportunistic helper paths).
Wire protocol unchanged. DAEDALUS_PROTO_VERSION stays at 0.
|
||
|
|
8c1d9960c4 |
DAEMON-PPS: synthesise H.264 SPS/PPS NAL units from V4L2 controls
libva-v4l2-request-fourier (and any V4L2-stateless-API consumer)
passes H.264 SPS/PPS as separate V4L2_CID_STATELESS_H264_{SPS,PPS}
controls; only the slice NAL goes into the OUTPUT buffer. This is
correct per the V4L2 stateless contract. But libavcodec — which
the daedalus daemon uses for actual decode (Option γ) — wants a
self-contained AnnexB stream including SPS+PPS before any slice.
Result on higgs: "non-existing PPS 0 referenced" + decode_slice_
header errors on every H.264 frame, even after LIBVA-1 and -2
routing correctly delivered the request to the daemon.
Fix splits across kernel + daemon, keeping the kernel module as a
thin transport and putting the actual NAL encoding in userspace:
include/daedalus_v4l2_proto.h:
Add struct daedalus_h264_meta (the four v4l2_ctrl_h264_*
structs the kernel collects) and DAEDALUS_REQ_FLAG_H264_META
(set in req.flags when the meta block is present between the
daedalus_req_decode prefix and the slice bitstream).
kernel/daedalus_v4l2_main.c:
Add daedalus_collect_h264_meta() — reads the H.264 ctrl values
from the bound media_request via v4l2_ctrl_find +
ctrl->p_cur.p_h264_*. device_run() calls it on H.264 codec_id,
copies the structs into the REQ_DECODE payload between the
prefix and bitstream, and sets the flag. Payload size is
bounds-checked against DAEDALUS_PROTO_MAX_PAYLOAD so an over-
sized slice + meta fails loud instead of truncating.
daemon/src/bitstream_writer.{c,h}:
New module — MSB-first bit packer with H.264 Exp-Golomb ue(v)
and se(v) coding + rbsp_trailing_bits alignment. Sticky
overflow flag so callers can verify the output buffer wasn't
truncated.
daemon/src/h264_nal_synth.{c,h}:
New module — turns v4l2_ctrl_h264_sps / v4l2_ctrl_h264_pps
into AnnexB-framed NAL units per ITU-T H.264 7.3.2.1 / 7.3.2.2.
Emits emulation prevention bytes (0x03 after every 00 00 in the
EBSP) and the 4-byte start code (0x00000001). Coverage matches
what V4L2 stateless surface gives us: VUI parameters and full
scaling matrices are NOT emitted (V4L2 doesn't carry them — the
seq_scaling_matrix_present_flag is set to 0 and libavcodec uses
flat defaults, which matches the de-facto behaviour of most
H.264 streams libva-v4l2-request drives).
daemon/src/decoder.c:
daedalus_decoder_run_request() now takes an optional
h264_meta parameter. For codec_id == H264 with meta != NULL,
synthesises SPS+PPS NAL units, allocates a combined
[SPS][PPS][slice] buffer (+ AV_INPUT_BUFFER_PADDING_SIZE), and
feeds that to avcodec_send_packet instead of the raw slice.
VP9/AV1 path unchanged (frames are self-contained). Cleanup
now goes through a unified `out:` label so the assembled
buffer is always freed on every exit (including the existing
decoder_open_codec / no-frame / receive_frame failure paths).
daemon/src/chardev_client.c:
handle_req_decode() peels off the optional meta block when the
flag is set, passes it through to the decoder, and updates
the payload-length consistency check (now allows for an extra
sizeof(daedalus_h264_meta) when the flag is on).
Build (boltzmann aarch64): clean compile of all daemon sources,
including bitstream_writer + h264_nal_synth + the refactored
decoder.c. Kernel module compile to be verified via DKMS rebuild
on higgs in the marfrit-packages bump that follows.
Test plan: with this commit + a marfrit-packages daedalus pin
bump, higgs's ffmpeg -hwaccel vaapi -i h264_test.mp4 should
produce a successful decode (vs. the previous "non-existing PPS 0
referenced" failure). The daemon log should show:
decoder: opened h264 context
decoder: h264 prepended SPS=NB PPS=MB slice=KB
decoder: OK 320x240 fmt=0 (yuv420p) fnv1a=0x...
VP9 / AV1 behaviour unchanged — they don't carry meta and the
existing per-frame self-describing path still applies.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
c7f6fb90cb |
Phase 8.6: dmabuf + AV1 + H.264 + stateless controls
Removes the Phase 8.5 64 KiB frame-size cap by exporting CAPTURE
buffers as dmabuf-fds the daemon mmaps and writes pixels into
directly. Adds AV1 + H.264 codec support, V4L2 stateless control
registration, and the compliance polish that brings the driver
to 47/48 v4l2-compliance pass.
Protocol (include/daedalus_v4l2_proto.h):
- struct daedalus_req_decode grew capture-buffer metadata
(width/height/pix_fmt/num_planes + per-plane size+stride).
- New DAEDALUS_IOC_GET_DMABUF ioctl on the chardev: daemon
asks for a per-plane dmabuf fd, kernel calls vb2_core_expbuf
in daemon task context so the fd lands in the daemon's table.
Kernel m2m driver (kernel/daedalus_v4l2_main.c):
- Both queues switched to vb2_dma_contig_memops. OUTPUT was
vmalloc in 8.5; the switch is needed because vmalloc doesn't
honour V4L2_MEMORY_FLAG_NON_COHERENT and v4l2-compliance's
REQBUFS test rejected the driver because of it. We still
read bitstream via vb2_plane_vaddr (dma_contig gives a
kernel virtual address just like vmalloc did).
- dma_coerce_mask_and_coherent(DMA_BIT_MASK(32)) in probe.
- queue_setup populates alloc_devs[plane] = &pdev->dev for
both queues; allow_cache_hints=1 on both.
- daedalus_export_capture_dmabuf(cookie, plane, flags, *fd):
walks inflight list, calls vb2_core_expbuf on the CAPTURE
buffer in the caller's (daemon's) task context.
- device_run fills the new REQ_DECODE capture fields from
ctx->dst_fmt and maps ctx->src_fmt.pixelformat to
DAEDALUS_CODEC_VP9 / _AV1 / _H264 (was hard-wired to VP9).
- daedalus_complete_resp_frame handles both the 8.5 inline
path (kept for debugging) and the 8.6 dmabuf path (pixels
already in CAPTURE buffer, just set payload from metadata).
- enum_fmt advertises all 3 OUTPUT formats (VP9F, AV1F, S264).
- try_fmt preserves userspace colorspace fields instead of
overwriting with REC709 defaults (fixes 8.5 compliance fail).
- s_fmt propagates OUTPUT colorspace → CAPTURE (stateless
decoder round-trip test at v4l2-test-formats.cpp:958).
- 12 V4L2 stateless controls registered per open (VP9_FRAME,
VP9_COMPRESSED_HDR, H264_SPS/PPS/SCALING/PRED_WEIGHTS/
SLICE_PARAMS/DECODE_PARAMS, AV1_FRAME/SEQUENCE/
TILE_GROUP_ENTRY/FILM_GRAIN). Daemon ignores values (FFmpeg
re-parses); registration is what makes libva-v4l2-request
see us.
Kernel chardev (kernel/daedalus_v4l2_chardev.c):
- New unlocked_ioctl dispatching DAEDALUS_IOC_GET_DMABUF to
daedalus_export_capture_dmabuf.
- debugfs test_decode cookies unified with the m2m cookie
allocator via shared daedalus_next_cookie() — kills the
Phase 8.5 namespace collision.
Daemon (daemon/src/...):
- New dmabuf_capture.{c,h}: GET_DMABUF + mmap each plane on
REQ_DECODE; munmap + close on completion. O_RDWR | O_CLOEXEC
is essential — vb2_core_expbuf extracts O_ACCMODE from flags
and exports read-only by default (caught on first run; mmap
-EACCES on PROT_WRITE).
- decoder.{c,h}: lazily opens AV1 + H.264 AVCodecContexts in
addition to VP9 (dropped the -ENOSYS stubs). pack_nv12_to_planes
writes Y line-by-line into planes[0] with planes[0].stride;
interleaves Cb/Cr into planes[1] with planes[1].stride.
- chardev_client.c handle_req_decode: opens dmabuf planes,
runs decode (pixels land in CAPTURE buffer directly), closes
planes, sends metadata-only RESP_FRAME. No wire-pixel
allocation.
Test harness (tools/test_m2m_decode.c):
- Optional 5th arg `codec` (vp9 | av1 | h264). Same client
drives all three codecs.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
Bit-exact end-to-end vs `ffmpeg -pix_fmt nv12`:
VP9 1920x1080 3,110,400 bytes MATCH
AV1 128x96 18,432 bytes MATCH
H.264 128x96 18,432 bytes MATCH
VP9 1080p went through the full dmabuf path with no chardev
payload bloat — the same chardev that capped at 64 KiB in 8.5
now ferries metadata only and lets the daemon mmap+write a
3.1 MB frame directly into the V4L2 client's buffer.
v4l2-compliance:
Phase 8.1: 44/48
Phase 8.5: 44/48 (different fails after m2m landed)
Phase 8.6: 47/48
Only remaining: VIDIOC_(TRY_)DECODER_CMD (needs media
controller — explicitly Phase 8.7 work).
11 standard compound controls visible:
vp9_frame_decode_parameters, vp9_probabilities_updates,
h264_sequence_parameter_set, h264_picture_parameter_set,
h264_scaling_matrix, h264_prediction_weight_table,
h264_slice_parameters, h264_decode_parameters,
av1_sequence_parameters, av1_frame_parameters,
av1_film_grain (av1_tile_group_entry refused by hdl->error
on this kernel — skipped silently).
Clean SIGTERM + rmmod, no oops/WARN.
Roadmap update (docs/roadmap.md):
- Phase 8.6 marked closed with the closure-doc reference.
- Phase 8.7 reshaped to (1) media controller, (2) perf +
daedalus_dispatch_* substitution, (3) HDR/10-bit, (4)
long-form multi-frame streaming.
Per correctness-before-speed:
- Real V4L2 dmabuf via vb2_core_expbuf (not a sideband
fd-passing hack).
- O_RDWR access mode threaded through correctly.
- Strict pixel-byte comparison against ffmpeg, not "looks
right" eyeballing.
- Each compliance edge documented with the underlying test
source-line + the fix.
- All resource paths cleaned (munmap + close per plane on
every exit, including error paths).
Phase 8.7 next: media controller binding (closes last
compliance fail), per-frame profiling, QPU dispatch
substitution targeting 30fps@1080p from
30fps-floor-is-fine memory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2a449632b9 |
Phase 8.4: daemon ↔ kernel decode round-trip (VP9 end-to-end)
Wires the Phase 8.3 FFmpeg loader through the Phase 8.2 chardev
bridge: kernel injects REQ_DECODE carrying a raw VP9 access unit,
daemon hands the bitstream to libavcodec via dlopen, sends
RESP_FRAME back with a content-dependent FNV-1a digest of the
decoded YUV planes. Pure CPU decode for now — Phase 8.5 swaps in
dmabuf + QPU dispatch.
Protocol (include/daedalus_v4l2_proto.h):
- New REQ_DECODE (kernel→daemon) and RESP_FRAME (daemon→kernel)
message types, with fixed-size payload structs.
- New DAEDALUS_CODEC_VP9/AV1/H264 enum (wire-stable so 8.6's
AV1+H.264 work doesn't move existing values).
- New DAEDALUS_DECODE_* status enum (OK / NO_FRAME / ERR_OPEN /
ERR_SEND / ERR_RECV / ERR_CODEC).
- Converted the prior `enum daedalus_msg_type` to #defines —
high-bit values exceed INT_MAX and tripped -Wpedantic on
userspace; kernel uABI headers use the same idiom.
Kernel (kernel/daedalus_v4l2_chardev.c):
- New debugfs entry /sys/kernel/debug/daedalus_v4l2/test_decode:
writing raw bitstream bytes wraps them in a REQ_DECODE
(codec=VP9 for Phase 8.4) and enqueues with an
auto-incrementing cookie.
- daedalus_chardev_write learned RESP_FRAME: parses the payload
and emits a single pr_info line with decode metadata. Keeps
existing PONG handling on the default arm.
Daemon (daemon/src/...):
- chardev_client.{c,h} — opens /dev/daedalus-v4l2, blocking read
loop, single-buffer write() responses (kernel chardev has only
.write, not .write_iter, so writev lands as -EINVAL —
discovered the hard way during first run).
- decoder.{c,h} — lazily-opened AVCodecContext per codec, shared
AVPacket/AVFrame pair, descriptor-driven plane walker
(av_pix_fmt_desc_get) so the same hash path covers YUV420P,
YUV422P, YUV444P, GBRP and other 8-bit planar layouts.
Generalised after first run decoded testsrc as GBRP (71)
rather than the assumed YUV420P.
- `daemon` command in main.c opens the chardev and runs the loop
until SIGINT/SIGTERM. Cookie correlation handled end-to-end.
- ffmpeg_loader gained av_pix_fmt_desc_get (23 symbols total).
Build:
- CMakeLists adds chardev_client.c + decoder.c; explicit
-I../include for the shared protocol header.
- Still -Wall -Wextra -Wpedantic clean.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
$ ffmpeg ... -pix_fmt yuv420p -c:v libvpx-vp9 -frames:v 1 \
-y /tmp/vp9_test.ivf
$ python3 ... strip IVF framing → vp9_keyframe.bin (3268 B)
$ sudo insmod kernel/daedalus_v4l2.ko
$ daedalus_v4l2_daemon -v daemon &
$ sudo dd if=vp9_keyframe.bin \
of=/sys/kernel/debug/daedalus_v4l2/test_decode
daemon: REQ_DECODE cookie=2 → decoded yuv420p 320x240
fnv1a=0x6ef10d71 luma=76800 chroma=38400
kernel: RESP_FRAME cookie=2 status=0 320x240 pixfmt=0
fnv1a=0x6ef10d71 ← matches daemon ✓
Hash properties verified:
cookie=2 testsrc 3268 B → 0x6ef10d71 (first decode)
cookie=3 red 44 B → 0x7f6e5dc5 (content-dependent ✓)
cookie=4 testsrc 3268 B → 0x6ef10d71 (deterministic ✓)
cookie=5 64 B random → status=101 (ERR_SEND, daemon alive)
Daemon survives bad input (FFmpeg "Invalid sync code" wrapped
into structured ERR_SEND response). Clean SIGTERM shutdown,
clean rmmod.
Phase 8.4 acceptance criteria met:
- ✓ end-to-end kernel→daemon→FFmpeg→kernel round-trip
- ✓ cookie correlation per request/response pair
- ✓ content-dependent + deterministic digest
- ✓ structured error responses (no daemon crash on bad input)
- ✓ clean teardown (SIGTERM + rmmod)
- ✓ builds clean on both kernel kbuild and daemon CMake
Per correctness-before-speed:
- Real chardev I/O (no shortcuts, no select-loop hacks)
- Real FFmpeg AVCodecContext lifecycle (lazily opened, properly
freed on cleanup)
- Descriptor-driven plane walk (generalises across pix_fmts)
- Structured error path (not just log-and-continue)
- All resource paths cleaned up on every error branch
- Documented why FNV-1a digest, why write() not writev(), why
pix_desc walk in docs/phase_8_4_closure.md
Phase 8.5 next: V4L2 m2m queue submits REQ_DECODE from
vidioc_qbuf; dmabuf carries actual pixel data so the chardev's
64 KiB cap doesn't gate frame size; begin substituting
daedalus_dispatch_* into the daemon's decode path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
873a04c622 |
Phase 8.3: userspace daemon scaffold + FFmpeg dlopen + parse path
Builds the daemon executable per the locked Phase 8 architecture
(Option γ: dlopen FFmpeg at runtime). Phase 8.3 scope: parse
path validation only — no V4L2 wiring, no decode, no chardev
connection.
Components:
- daemon/CMakeLists.txt — CMake with -Wall -Wextra -Wpedantic
clean. pkg-config for FFmpeg headers; only -ldl + -lpthread
at link time.
- daemon/src/main.c — entry point, signal handlers
(SIGINT/SIGTERM), command dispatcher. Currently `parse <file>`.
- daemon/src/ffmpeg_loader.{c,h} — runtime FFmpeg loader.
dlopens libavformat.so.61, libavcodec.so.61, libavutil.so.59.
Resolves 22 function pointers using POSIX-recommended
*(void**)& dlsym idiom (per POSIX.1-2017 dlsym(3p) Rationale).
- daemon/src/parser.{c,h} — demux loop via avformat_open_input +
av_read_frame. Per-frame logging on -v.
- daemon/src/log.{c,h} — logging facade (stderr Phase 8.3;
syslog/journal planned for 8.5+).
Verification on hertz:
$ ffmpeg -f lavfi -i testsrc=duration=2:size=320x240:rate=30 \
-c:v libvpx-vp9 -y /tmp/testsrc.ivf
$ daedalus_v4l2_daemon parse /tmp/testsrc.ivf
[INFO] FFmpeg loaded: 7.1.3-0+deb13u1+rpt1 (libavformat 61.7.100)
[INFO] video stream #0: codec=vp9 (Google VP9) 320x240, 0/0 fps
[INFO] parse complete: 60 frames (1 key) total 17859 bytes
Error paths verified:
- Missing file → "avformat_open_input(...): code -2", exit 1
- No command → usage message, exit 2
- Bad command → usage message, exit 2
Per correctness-before-speed:
- Real CMake (no Makefile hacks)
- pkg-config for headers
- POSIX-conformant dlsym pattern (no -Wpedantic suppression)
- Real signal handling + proper exit codes
- Real logging with timestamp + level
- Headers included at compile-time for type safety; dlopen
decouples runtime
- All FFmpeg resources freed on every exit path
- Builds clean on -Wall -Wextra -Wpedantic
Phase 8.3 acceptance criteria met:
- ✓ daemon binary builds
- ✓ dlopen FFmpeg at runtime
- ✓ demux a VP9 IVF file end-to-end
- ✓ per-frame metadata logged correctly
- ✓ frame count + keyframe count + byte total accurate
Phase 8.4 next: wire daemon to /dev/daedalus-v4l2 chardev,
add REQ_DECODE / RESP_FRAME handling, drive VP9 decode
end-to-end via daedalus_dispatch_* from daedalus-fourier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|