DAEMON-PPS: synthesise H.264 SPS/PPS NAL units from V4L2 controls #1

Merged
marfrit merged 1 commits from noether/daemon-pps-h264-nal-synth into main 2026-05-20 16:51:26 +00:00
Member

DAEMON-PPS — fix H.264 decode after LIBVA-{1,2} routing already works

LIBVA-1 + -2 routing on higgs delivers H.264 requests to the daedalus daemon, but the actual decode fails:

daedalus_v4l2_daemon: decoder: opened h264 context
daedalus_v4l2_daemon: [h264 @ ...] non-existing PPS 0 referenced
daedalus_v4l2_daemon: [h264 @ ...] decode_slice_header error

Root cause

libva-v4l2-request passes H.264 SPS/PPS as separate V4L2_CID_STATELESS_H264_{SPS,PPS} controls — the OUTPUT buffer carries only the slice NAL (correct per V4L2 stateless contract).

daedalus daemon uses libavcodec for the actual decode (Option γ — dlopen FFmpeg). libavcodec wants a self-contained AnnexB stream with SPS+PPS before the slice. The daemon was handing it the slice in isolation. VP9/AV1 worked because their frames are self-describing.

Fix — daemon-side synthesis

Kernel ships SPS/PPS structs to daemon; daemon synthesises AnnexB NAL units and prepends to the slice bitstream.

Wire protocol (include/daedalus_v4l2_proto.h):

  • New DAEDALUS_REQ_FLAG_H264_META bit in req.flags
  • New struct daedalus_h264_meta carrying v4l2_ctrl_h264_{sps,pps,scaling_matrix,decode_params}
  • Wire layout: hdr -> req_decode -> (optional meta) -> bitstream

Kernel (kernel/daedalus_v4l2_main.c):

  • daedalus_collect_h264_meta() reads ctrl->p_cur.p_h264_* from the bound media_request
  • device_run() packages the meta block between prefix and bitstream when codec=H264
  • Payload size bounds-checked against DAEDALUS_PROTO_MAX_PAYLOAD

Daemon (daemon/src/):

  • New bitstream_writer.{c,h}: MSB-first bit packer with H.264 Exp-Golomb ue(v) / se(v) and rbsp_trailing_bits alignment
  • New h264_nal_synth.{c,h}: encodes v4l2_ctrl_h264_sps -> SPS NAL and v4l2_ctrl_h264_pps -> PPS NAL per ITU-T H.264 7.3.2.1 / 7.3.2.2. Emits emulation prevention bytes and the 4-byte AnnexB start code.
  • decoder.c gains an h264_meta parameter and an SPS+PPS+slice assembled-buffer path with goto-based cleanup so the buffer is always freed.
  • chardev_client.c parses the optional meta block and forwards.

Coverage / scope

What's emitted:

  • SPS: profile_idc, constraint_set_flags, level_idc, all the geometry fields (pic_width_in_mbs, pic_height, frame_mbs_only, direct_8x8), pic_order_cnt_type with full type-1 trailer, ref-frame controls. For 'high' profiles (100/110/122/244/etc.), the chroma_format/bit_depth/qpprime block is included too.
  • PPS: pic/seq_parameter_set_id, all the flag bits (entropy_coding_mode, weighted_pred, deblocking_filter_present, constrained_intra_pred, redundant_pic_cnt, transform_8x8), QP-related fields, second_chroma_qp_index_offset trailer when it differs.

What's deliberately set to 0 (V4L2 ctrls don't carry these):

  • frame_cropping_flag — libva already sized the surface; libavcodec output dimensions == encoded MBs
  • vui_parameters_present_flag — V4L2 has no VUI struct
  • seq_scaling_matrix_present_flag / pic_scaling_matrix_present_flag — libavcodec uses default scaling matrices, which matches what most H.264 streams do

Build status

Daemon builds clean on boltzmann (aarch64). Kernel module to be exercised via DKMS rebuild on higgs once a marfrit-packages bump pins this commit.

Test plan

With this commit + marfrit-packages daedalus pin bump + libva already at the LIBVA-2 commit (9898331), ffmpeg -hwaccel vaapi -i h264_test.mp4 on higgs should produce:

decoder: opened h264 context
decoder: h264 prepended SPS=NB PPS=MB slice=KB
decoder: OK WxH fmt=0 (yuv420p) fnv1a=0x...

Replacing the previous PPS-0-referenced failure. VP9 / AV1 behaviour unchanged.

Generated with Claude Code

## DAEMON-PPS — fix H.264 decode after LIBVA-{1,2} routing already works LIBVA-1 + -2 routing on higgs delivers H.264 requests to the daedalus daemon, but the actual decode fails: ``` daedalus_v4l2_daemon: decoder: opened h264 context daedalus_v4l2_daemon: [h264 @ ...] non-existing PPS 0 referenced daedalus_v4l2_daemon: [h264 @ ...] decode_slice_header error ``` ### Root cause libva-v4l2-request passes H.264 SPS/PPS as separate V4L2_CID_STATELESS_H264_{SPS,PPS} controls — the OUTPUT buffer carries only the slice NAL (correct per V4L2 stateless contract). daedalus daemon uses libavcodec for the actual decode (Option γ — dlopen FFmpeg). libavcodec wants a self-contained AnnexB stream with SPS+PPS before the slice. The daemon was handing it the slice in isolation. VP9/AV1 worked because their frames are self-describing. ### Fix — daemon-side synthesis Kernel ships SPS/PPS structs to daemon; daemon synthesises AnnexB NAL units and prepends to the slice bitstream. **Wire protocol** (include/daedalus_v4l2_proto.h): - New DAEDALUS_REQ_FLAG_H264_META bit in req.flags - New struct daedalus_h264_meta carrying v4l2_ctrl_h264_{sps,pps,scaling_matrix,decode_params} - Wire layout: hdr -> req_decode -> (optional meta) -> bitstream **Kernel** (kernel/daedalus_v4l2_main.c): - daedalus_collect_h264_meta() reads ctrl->p_cur.p_h264_* from the bound media_request - device_run() packages the meta block between prefix and bitstream when codec=H264 - Payload size bounds-checked against DAEDALUS_PROTO_MAX_PAYLOAD **Daemon** (daemon/src/): - New bitstream_writer.{c,h}: MSB-first bit packer with H.264 Exp-Golomb ue(v) / se(v) and rbsp_trailing_bits alignment - New h264_nal_synth.{c,h}: encodes v4l2_ctrl_h264_sps -> SPS NAL and v4l2_ctrl_h264_pps -> PPS NAL per ITU-T H.264 7.3.2.1 / 7.3.2.2. Emits emulation prevention bytes and the 4-byte AnnexB start code. - decoder.c gains an h264_meta parameter and an SPS+PPS+slice assembled-buffer path with goto-based cleanup so the buffer is always freed. - chardev_client.c parses the optional meta block and forwards. ### Coverage / scope What's emitted: - SPS: profile_idc, constraint_set_flags, level_idc, all the geometry fields (pic_width_in_mbs, pic_height, frame_mbs_only, direct_8x8), pic_order_cnt_type with full type-1 trailer, ref-frame controls. For 'high' profiles (100/110/122/244/etc.), the chroma_format/bit_depth/qpprime block is included too. - PPS: pic/seq_parameter_set_id, all the flag bits (entropy_coding_mode, weighted_pred, deblocking_filter_present, constrained_intra_pred, redundant_pic_cnt, transform_8x8), QP-related fields, second_chroma_qp_index_offset trailer when it differs. What's deliberately set to 0 (V4L2 ctrls don't carry these): - frame_cropping_flag — libva already sized the surface; libavcodec output dimensions == encoded MBs - vui_parameters_present_flag — V4L2 has no VUI struct - seq_scaling_matrix_present_flag / pic_scaling_matrix_present_flag — libavcodec uses default scaling matrices, which matches what most H.264 streams do ### Build status Daemon builds clean on boltzmann (aarch64). Kernel module to be exercised via DKMS rebuild on higgs once a marfrit-packages bump pins this commit. ### Test plan With this commit + marfrit-packages daedalus pin bump + libva already at the LIBVA-2 commit (9898331), ffmpeg -hwaccel vaapi -i h264_test.mp4 on higgs should produce: ``` decoder: opened h264 context decoder: h264 prepended SPS=NB PPS=MB slice=KB decoder: OK WxH fmt=0 (yuv420p) fnv1a=0x... ``` Replacing the previous PPS-0-referenced failure. VP9 / AV1 behaviour unchanged. Generated with Claude Code
claude-noether added 1 commit 2026-05-20 15:36:18 +00:00
libva-v4l2-request-fourier (and any V4L2-stateless-API consumer)
passes H.264 SPS/PPS as separate V4L2_CID_STATELESS_H264_{SPS,PPS}
controls; only the slice NAL goes into the OUTPUT buffer.  This is
correct per the V4L2 stateless contract.  But libavcodec — which
the daedalus daemon uses for actual decode (Option γ) — wants a
self-contained AnnexB stream including SPS+PPS before any slice.
Result on higgs: "non-existing PPS 0 referenced" + decode_slice_
header errors on every H.264 frame, even after LIBVA-1 and -2
routing correctly delivered the request to the daemon.

Fix splits across kernel + daemon, keeping the kernel module as a
thin transport and putting the actual NAL encoding in userspace:

  include/daedalus_v4l2_proto.h:
    Add struct daedalus_h264_meta (the four v4l2_ctrl_h264_*
    structs the kernel collects) and DAEDALUS_REQ_FLAG_H264_META
    (set in req.flags when the meta block is present between the
    daedalus_req_decode prefix and the slice bitstream).

  kernel/daedalus_v4l2_main.c:
    Add daedalus_collect_h264_meta() — reads the H.264 ctrl values
    from the bound media_request via v4l2_ctrl_find +
    ctrl->p_cur.p_h264_*.  device_run() calls it on H.264 codec_id,
    copies the structs into the REQ_DECODE payload between the
    prefix and bitstream, and sets the flag.  Payload size is
    bounds-checked against DAEDALUS_PROTO_MAX_PAYLOAD so an over-
    sized slice + meta fails loud instead of truncating.

  daemon/src/bitstream_writer.{c,h}:
    New module — MSB-first bit packer with H.264 Exp-Golomb ue(v)
    and se(v) coding + rbsp_trailing_bits alignment.  Sticky
    overflow flag so callers can verify the output buffer wasn't
    truncated.

  daemon/src/h264_nal_synth.{c,h}:
    New module — turns v4l2_ctrl_h264_sps / v4l2_ctrl_h264_pps
    into AnnexB-framed NAL units per ITU-T H.264 7.3.2.1 / 7.3.2.2.
    Emits emulation prevention bytes (0x03 after every 00 00 in the
    EBSP) and the 4-byte start code (0x00000001).  Coverage matches
    what V4L2 stateless surface gives us: VUI parameters and full
    scaling matrices are NOT emitted (V4L2 doesn't carry them — the
    seq_scaling_matrix_present_flag is set to 0 and libavcodec uses
    flat defaults, which matches the de-facto behaviour of most
    H.264 streams libva-v4l2-request drives).

  daemon/src/decoder.c:
    daedalus_decoder_run_request() now takes an optional
    h264_meta parameter.  For codec_id == H264 with meta != NULL,
    synthesises SPS+PPS NAL units, allocates a combined
    [SPS][PPS][slice] buffer (+ AV_INPUT_BUFFER_PADDING_SIZE), and
    feeds that to avcodec_send_packet instead of the raw slice.
    VP9/AV1 path unchanged (frames are self-contained).  Cleanup
    now goes through a unified `out:` label so the assembled
    buffer is always freed on every exit (including the existing
    decoder_open_codec / no-frame / receive_frame failure paths).

  daemon/src/chardev_client.c:
    handle_req_decode() peels off the optional meta block when the
    flag is set, passes it through to the decoder, and updates
    the payload-length consistency check (now allows for an extra
    sizeof(daedalus_h264_meta) when the flag is on).

Build (boltzmann aarch64): clean compile of all daemon sources,
including bitstream_writer + h264_nal_synth + the refactored
decoder.c.  Kernel module compile to be verified via DKMS rebuild
on higgs in the marfrit-packages bump that follows.

Test plan: with this commit + a marfrit-packages daedalus pin
bump, higgs's ffmpeg -hwaccel vaapi -i h264_test.mp4 should
produce a successful decode (vs. the previous "non-existing PPS 0
referenced" failure).  The daemon log should show:
  decoder: opened h264 context
  decoder: h264 prepended SPS=NB PPS=MB slice=KB
  decoder: OK 320x240 fmt=0 (yuv420p) fnv1a=0x...

VP9 / AV1 behaviour unchanged — they don't carry meta and the
existing per-frame self-describing path still applies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
marfrit merged commit 3dd0eb070a into main 2026-05-20 16:51:26 +00:00
marfrit deleted branch noether/daemon-pps-h264-nal-synth 2026-05-20 16:51:27 +00:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: reauktion/daedalus-v4l2#1