DAEMON-PPS: synthesise H.264 SPS/PPS NAL units from V4L2 controls #1
DAEMON-PPS — fix H.264 decode after LIBVA-{1,2} routing already works
LIBVA-1 + -2 routing on higgs delivers H.264 requests to the daedalus daemon, but the actual decode fails with "non-existing PPS 0 referenced" and decode_slice_header errors on every H.264 frame.
Root cause
libva-v4l2-request passes H.264 SPS/PPS as separate V4L2_CID_STATELESS_H264_{SPS,PPS} controls — the OUTPUT buffer carries only the slice NAL (correct per V4L2 stateless contract).
daedalus daemon uses libavcodec for the actual decode (Option γ — dlopen FFmpeg). libavcodec wants a self-contained AnnexB stream with SPS+PPS before the slice. The daemon was handing it the slice in isolation. VP9/AV1 worked because their frames are self-describing.
Fix — daemon-side synthesis
Kernel ships SPS/PPS structs to daemon; daemon synthesises AnnexB NAL units and prepends to the slice bitstream.
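As a rough illustration of the AnnexB framing involved (not the daemon's actual code), the sketch below wraps an already-encoded RBSP payload in a 4-byte start code and a NAL header byte, inserting emulation prevention bytes along the way. The function name `annexb_wrap` and its signature are hypothetical. It escapes only when the byte following 00 00 is 0x03 or lower, which is the rule in the H.264 specification; the real module may be more conservative.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: frame an RBSP payload as one AnnexB NAL unit.
 * Returns the number of bytes written, or 0 if dst is too small.
 */
static size_t annexb_wrap(uint8_t *dst, size_t cap, uint8_t nal_header,
                          const uint8_t *rbsp, size_t len)
{
    size_t o = 0;
    size_t zeros = 0;   /* consecutive 0x00 bytes already written */

    if (cap < 5)
        return 0;

    /* 4-byte start code 0x00000001, then the one-byte NAL header. */
    dst[o++] = 0x00;
    dst[o++] = 0x00;
    dst[o++] = 0x00;
    dst[o++] = 0x01;
    dst[o++] = nal_header;   /* 0x67 = SPS, 0x68 = PPS with nal_ref_idc 3 */

    for (size_t i = 0; i < len; i++) {
        /* 00 00 followed by 00..03 would look like a start code: escape it. */
        if (zeros >= 2 && rbsp[i] <= 0x03) {
            if (o >= cap)
                return 0;
            dst[o++] = 0x03;
            zeros = 0;
        }
        if (o >= cap)
            return 0;
        dst[o++] = rbsp[i];
        zeros = (rbsp[i] == 0x00) ? zeros + 1 : 0;
    }
    return o;
}
```

Called once with header 0x67 for the SPS and once with 0x68 for the PPS, the two outputs can be placed directly ahead of the slice data.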
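Wire protocol (include/daedalus_v4l2_proto.h): adds struct daedalus_h264_meta (the four v4l2_ctrl_h264_* structs the kernel collects) and DAEDALUS_REQ_FLAG_H264_META, set in req.flags when the meta block sits between the daedalus_req_decode prefix and the slice bitstream.
Kernel (kernel/daedalus_v4l2_main.c): adds daedalus_collect_h264_meta(), which reads the H.264 control values from the bound media_request. device_run() copies them into the REQ_DECODE payload, sets the flag, and bounds-checks the payload size against DAEDALUS_PROTO_MAX_PAYLOAD.
Daemon (daemon/src/): new bitstream_writer and h264_nal_synth modules; decoder.c prepends the synthesised SPS+PPS to the slice before avcodec_send_packet; chardev_client.c peels off the optional meta block and passes it to the decoder.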
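The payload layout implied by this split (prefix, optional meta block, slice bitstream) can be sketched as below. The real struct layouts are not shown in this PR text, so `demo_req_prefix`, `demo_h264_meta`, `DEMO_FLAG_H264_META` and `demo_split_payload` are all hypothetical stand-ins; only the ordering and the length consistency check are taken from the description.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FLAG_H264_META 0x1u   /* stand-in for DAEDALUS_REQ_FLAG_H264_META */

struct demo_req_prefix {           /* stand-in for the daedalus_req_decode prefix */
    uint32_t flags;
    uint32_t bitstream_len;
};

struct demo_h264_meta {            /* stand-in for struct daedalus_h264_meta */
    uint8_t opaque[64];
};

struct demo_split {
    const uint8_t *meta;           /* NULL when no meta block is present */
    const uint8_t *slice;
    size_t slice_len;
};

/* Split a payload into its parts; returns 0 on success, -1 on a length mismatch. */
static int demo_split_payload(const uint8_t *payload, size_t len,
                              struct demo_split *out)
{
    struct demo_req_prefix pfx;
    size_t off = sizeof(pfx);

    if (len < sizeof(pfx))
        return -1;
    memcpy(&pfx, payload, sizeof(pfx));   /* copy out to avoid unaligned reads */

    out->meta = NULL;
    if (pfx.flags & DEMO_FLAG_H264_META) {
        if (len - off < sizeof(struct demo_h264_meta))
            return -1;
        out->meta = payload + off;
        off += sizeof(struct demo_h264_meta);
    }

    /* Whatever remains must be exactly the advertised slice bitstream. */
    if (len - off != pfx.bitstream_len)
        return -1;

    out->slice = payload + off;
    out->slice_len = pfx.bitstream_len;
    return 0;
}
```

Checking the remainder against the advertised length, rather than trusting either value alone, means a truncated or oversized payload is rejected before anything reaches the decoder.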
Coverage / scope
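What's emitted: AnnexB-framed SPS and PPS NAL units per ITU-T H.264 7.3.2.1 / 7.3.2.2, each with a 4-byte start code (0x00000001) and emulation prevention bytes.
What's deliberately set to 0 (V4L2 ctrls don't carry these): seq_scaling_matrix_present_flag, so libavcodec falls back to flat scaling defaults. VUI parameters are likewise not emitted.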
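The SPS/PPS syntax in H.264 7.3.2.1 / 7.3.2.2 mixes fixed-length fields with Exp-Golomb ue(v)/se(v) codes, so any synthesiser needs a bit-level writer. The following is a minimal sketch of an MSB-first writer with a sticky overflow flag, in the spirit of the bitstream_writer module this PR describes. All names (`struct bitw`, `bitw_put`, `bitw_ue`, `bitw_se`, `bitw_trailing`) are hypothetical and not taken from the daemon sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bitw {
    uint8_t *buf;      /* output buffer, zeroed by bitw_init */
    size_t   cap;      /* capacity in bytes */
    size_t   bitpos;   /* bits written so far */
    int      overflow; /* sticky: stays set once a write does not fit */
};

static void bitw_init(struct bitw *w, uint8_t *buf, size_t cap)
{
    memset(buf, 0, cap);
    w->buf = buf;
    w->cap = cap;
    w->bitpos = 0;
    w->overflow = 0;
}

/* Append the low n bits of v (n <= 32), most significant bit first. */
static void bitw_put(struct bitw *w, uint32_t v, unsigned n)
{
    if (w->overflow || w->bitpos + n > w->cap * 8) {
        w->overflow = 1;
        return;
    }
    while (n--) {
        if ((v >> n) & 1u)
            w->buf[w->bitpos >> 3] |= (uint8_t)(0x80u >> (w->bitpos & 7));
        w->bitpos++;
    }
}

/* ue(v): len zero bits, then (k + 1) written in len + 1 bits. */
static void bitw_ue(struct bitw *w, uint32_t k)
{
    uint64_t x = (uint64_t)k + 1;
    unsigned len = 0;

    if (k == UINT32_MAX) {          /* would need 33 bits; out of scope here */
        w->overflow = 1;
        return;
    }
    while ((x >> (len + 1)) != 0)   /* len = floor(log2(x)) */
        len++;
    bitw_put(w, 0, len);
    bitw_put(w, (uint32_t)x, len + 1);
}

/* se(v): map v > 0 to 2v - 1 and v <= 0 to -2v, then code as ue(v). */
static void bitw_se(struct bitw *w, int32_t v)
{
    uint32_t k = (v > 0) ? (uint32_t)v * 2u - 1u
                         : (uint32_t)(-(int64_t)v) * 2u;
    bitw_ue(w, k);
}

/* rbsp_trailing_bits: a single 1 bit, then zero bits up to a byte boundary. */
static void bitw_trailing(struct bitw *w)
{
    bitw_put(w, 1, 1);
    while (!w->overflow && (w->bitpos & 7))
        bitw_put(w, 0, 1);
}
```

Because the overflow flag is sticky, a caller can emit every field unconditionally and check `overflow` once at the end instead of testing each write.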
Build status
Daemon builds clean on boltzmann (aarch64). Kernel module to be exercised via DKMS rebuild on higgs once a marfrit-packages bump pins this commit.
Test plan
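With this commit + marfrit-packages daedalus pin bump + libva already at the LIBVA-2 commit (9898331), ffmpeg -hwaccel vaapi -i h264_test.mp4 on higgs should decode successfully, with the daemon log showing:

    decoder: opened h264 context
    decoder: h264 prepended SPS=NB PPS=MB slice=KB
    decoder: OK 320x240 fmt=0 (yuv420p) fnv1a=0x...

This replaces the previous "non-existing PPS 0 referenced" failure. VP9 / AV1 behaviour unchanged.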
Generated with Claude Code
libva-v4l2-request-fourier (and any V4L2-stateless-API consumer) passes H.264 SPS/PPS as separate V4L2_CID_STATELESS_H264_{SPS,PPS} controls; only the slice NAL goes into the OUTPUT buffer. This is correct per the V4L2 stateless contract. But libavcodec, which the daedalus daemon uses for actual decode (Option γ), wants a self-contained AnnexB stream including SPS+PPS before any slice. Result on higgs: "non-existing PPS 0 referenced" + decode_slice_header errors on every H.264 frame, even after LIBVA-1 and -2 routing correctly delivered the request to the daemon.

Fix splits across kernel + daemon, keeping the kernel module as a thin transport and putting the actual NAL encoding in userspace:

include/daedalus_v4l2_proto.h: Add struct daedalus_h264_meta (the four v4l2_ctrl_h264_* structs the kernel collects) and DAEDALUS_REQ_FLAG_H264_META (set in req.flags when the meta block is present between the daedalus_req_decode prefix and the slice bitstream).

kernel/daedalus_v4l2_main.c: Add daedalus_collect_h264_meta(), which reads the H.264 ctrl values from the bound media_request via v4l2_ctrl_find + ctrl->p_cur.p_h264_*. device_run() calls it on H.264 codec_id, copies the structs into the REQ_DECODE payload between the prefix and bitstream, and sets the flag. Payload size is bounds-checked against DAEDALUS_PROTO_MAX_PAYLOAD so an oversized slice + meta fails loud instead of truncating.

daemon/src/bitstream_writer.{c,h}: New module. MSB-first bit packer with H.264 Exp-Golomb ue(v) and se(v) coding + rbsp_trailing_bits alignment. Sticky overflow flag so callers can verify the output buffer wasn't truncated.

daemon/src/h264_nal_synth.{c,h}: New module. Turns v4l2_ctrl_h264_sps / v4l2_ctrl_h264_pps into AnnexB-framed NAL units per ITU-T H.264 7.3.2.1 / 7.3.2.2. Emits emulation prevention bytes (0x03 after every 00 00 in the EBSP) and the 4-byte start code (0x00000001). Coverage matches what the V4L2 stateless surface gives us: VUI parameters and full scaling matrices are NOT emitted (V4L2 doesn't carry them; the seq_scaling_matrix_present_flag is set to 0 and libavcodec uses flat defaults, which matches the de-facto behaviour of most H.264 streams libva-v4l2-request drives).

daemon/src/decoder.c: daedalus_decoder_run_request() now takes an optional h264_meta parameter. For codec_id == H264 with meta != NULL, synthesises SPS+PPS NAL units, allocates a combined [SPS][PPS][slice] buffer (+ AV_INPUT_BUFFER_PADDING_SIZE), and feeds that to avcodec_send_packet instead of the raw slice. VP9/AV1 path unchanged (frames are self-contained). Cleanup now goes through a unified `out:` label so the assembled buffer is always freed on every exit (including the existing decoder_open_codec / no-frame / receive_frame failure paths).

daemon/src/chardev_client.c: handle_req_decode() peels off the optional meta block when the flag is set, passes it through to the decoder, and updates the payload-length consistency check (now allows for an extra sizeof(daedalus_h264_meta) when the flag is on).

Build (boltzmann aarch64): clean compile of all daemon sources, including bitstream_writer + h264_nal_synth + the refactored decoder.c. Kernel module compile to be verified via DKMS rebuild on higgs in the marfrit-packages bump that follows.

Test plan: with this commit + a marfrit-packages daedalus pin bump, higgs's ffmpeg -hwaccel vaapi -i h264_test.mp4 should produce a successful decode (vs. the previous "non-existing PPS 0 referenced" failure). The daemon log should show:

    decoder: opened h264 context
    decoder: h264 prepended SPS=NB PPS=MB slice=KB
    decoder: OK 320x240 fmt=0 (yuv420p) fnv1a=0x...

VP9 / AV1 behaviour unchanged: they don't carry meta and the existing per-frame self-describing path still applies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
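For reference, the [SPS][PPS][slice] assembly that the commit message describes for decoder.c might look roughly like the sketch below. It is written without FFmpeg headers so it stands alone: `DEMO_PADDING` is a stand-in for AV_INPUT_BUFFER_PADDING_SIZE (64 in recent FFmpeg releases), `demo_assemble` is a hypothetical name, and the sketch assumes the SPS, PPS and slice inputs already carry their own start codes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PADDING 64   /* stand-in for AV_INPUT_BUFFER_PADDING_SIZE */

/*
 * Concatenate SPS, PPS and slice into one heap buffer with zeroed tail
 * padding. Returns NULL on allocation failure; the caller frees the result.
 */
static uint8_t *demo_assemble(const uint8_t *sps, size_t sps_len,
                              const uint8_t *pps, size_t pps_len,
                              const uint8_t *slice, size_t slice_len,
                              size_t *out_len)
{
    size_t total = sps_len + pps_len + slice_len;
    uint8_t *buf = calloc(1, total + DEMO_PADDING);   /* padding stays zero */

    if (!buf)
        return NULL;

    memcpy(buf, sps, sps_len);
    memcpy(buf + sps_len, pps, pps_len);
    memcpy(buf + sps_len + pps_len, slice, slice_len);

    *out_len = total;   /* packet size excludes the padding */
    return buf;
}
```

The returned length deliberately excludes the padding, since libavcodec expects the padding to exist past the end of the packet data rather than be counted in its size.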