AV1 decode through daedalus daemon needs a sequence-header OBU synthesiser #11

Open
opened 2026-05-20 19:00:24 +00:00 by claude-noether · 0 comments
Collaborator

Summary

With libva PR #10 (the AV1 no-op codec_set_controls case) landed, AV1 frames now reach the daedalus_v4l2 daemon — but libdav1d in the daemon can't parse them. Same architectural shape as the H.264 SPS/PPS problem that DAEMON-PPS solved (daedalus-v4l2 PR #1 + PR #2), just for AV1.

Concrete failure on higgs

With the picture.c AV1 no-op in place and the test stream from H.264 verification re-encoded to AV1 via libsvtav1:

$ LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi \
    -hwaccel_device /dev/dri/renderD128 \
    -i av1_test.mkv -f null - 2>&1 | tail -3
...
v4l2-request: phase 8.10: opened daedalus_v4l2 at video_fd=5 media_fd=6
[matroska,webm @ ...] media=...

Daemon journal:

REQ_DECODE cookie=121 codec=2 bitstream=7602 bytes meta=none capture=320x240 1 planes
decoder: opened libdav1d context
[libdav1d @ 0x...] Unknown OBU type 8 of size 7601
REQ_DECODE cookie=122 codec=2 bitstream=2400 bytes meta=none
[libdav1d @ 0x...] Overrun in OBU bit buffer
REQ_DECODE cookie=125 codec=2 bitstream=485 bytes meta=none
[libdav1d @ 0x...] Error parsing OBU data
...

Result: 0 successful decodes across all 60 frames, 40+ libdav1d errors.

Why this is the AV1 analogue of the H.264 SPS/PPS gap

ffmpeg-vaapi (libavcodec/vaapi_av1.c) parses the AV1 bitstream client-side and splits it across multiple VAAPI buffers:

  • VADecPictureParameterBufferAV1 — carries the decoded sequence header fields + per-frame info
  • VASliceParameterBufferAV1 — per-tile-group offsets
  • slice data — only the compressed tile group OBUs (no sequence header, no temporal delimiter, no metadata)

libva-v4l2-request-fourier currently puts only the slice data in the V4L2 OUTPUT buffer (which is correct for the V4L2 stateless AV1 API — the sequence header is supposed to go into V4L2_CID_STATELESS_AV1_SEQUENCE).

The daedalus daemon hands the OUTPUT buffer directly to libavcodec (Option γ — dlopen). libavcodec for AV1 calls libdav1d, which expects a full OBU stream including the sequence header OBU. Without the sequence header OBU prepended, libdav1d sees the tile group OBU's leading bits, mis-interprets the obu_type field, and bails.
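To make the obu_type mis-parse concrete, here is a minimal sketch of how the first byte of an OBU is split into fields, following the one-byte obu_header layout in the AV1 spec (forbidden bit, 4-bit obu_type, extension flag, has_size_field, reserved bit). The struct and function names are hypothetical, and the byte 0x47 in the note below is an illustrative value, not one taken from the failing stream.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the one-byte AV1 OBU header. Names are illustrative only. */
struct obu_header {
    unsigned forbidden_bit;   /* must be 0 in a valid stream */
    unsigned obu_type;        /* 4 bits: 1 = sequence header, 2 = temporal delimiter */
    unsigned extension_flag;  /* 1 if an extension byte follows */
    unsigned has_size_field;  /* 1 if a leb128 obu_size follows */
};

/* Split the first byte of an OBU into its header fields (MSB first). */
static struct obu_header parse_obu_header(uint8_t b)
{
    struct obu_header h;

    h.forbidden_bit  = (b >> 7) & 0x1;
    h.obu_type       = (b >> 3) & 0xF;
    h.extension_flag = (b >> 2) & 0x1;
    h.has_size_field = (b >> 1) & 0x1;
    return h;
}
```

A well-formed sequence header OBU with a size field starts with 0x0A (obu_type 1), and a temporal delimiter starts with 0x12 (obu_type 2). Any payload byte that happens to sit at offset 0 gets the same treatment: 0x47, for example, would be read as obu_type 8, which is the kind of result a parser reports as an unknown OBU type.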

Same as H.264 (libavcodec wants SPS+PPS NALs in the stream; libva-v4l2-request only puts the slice NAL in the OUTPUT buffer). DAEMON-PPS solved that by:

  1. Extending the wire protocol (daedalus_h264_meta struct + DAEDALUS_REQ_FLAG_H264_META bit).
  2. Kernel collects the H.264 V4L2 ctrls and ships them to the daemon.
  3. Daemon synthesises AnnexB SPS+PPS NAL units from the structs and prepends them.

For AV1, the same pattern applies — with the AV1 OBU encoding instead of H.264 NAL encoding.

Scope of the fix

Mechanically similar to DAEMON-PPS, with these specifics:

libva-v4l2-request-fourier side (~150 LoC)

  • Implement av1_set_controls (in a new src/av1.c) that reads VADecPictureParameterBufferAV1 and writes V4L2_CID_STATELESS_AV1_SEQUENCE (struct v4l2_ctrl_av1_sequence). The kernel ctrl framework is already there — the daedalus kernel module registers V4L2_CID_STATELESS_AV1_SEQUENCE at daedalus_stateless_ctrls[] line 188.
  • Replace the no-op case in picture.c::codec_set_controls (from PR #10) with av1_set_controls(...).

daedalus-v4l2 kernel side (~30 LoC)

  • Extend struct daedalus_h264_meta with an av1_sequence field, OR add a new struct daedalus_av1_meta block + DAEDALUS_REQ_FLAG_AV1_META bit. Probably better to be symmetric: separate meta block per codec.
  • daedalus_collect_av1_meta() mirrors daedalus_collect_h264_meta() — reads ctrl->p_cur.p_av1_sequence (with v4l2_ctrl_request_setup already in place from PR #2).
  • device_run packages the AV1 meta block for codec=AV1.

daedalus daemon side (~400 LoC)

  • New daemon/src/av1_obu_synth.{c,h} — AV1 OBU encoder. Smaller than H.264 NAL synth because AV1 OBUs use a leb128 size prefix instead of NAL emulation prevention. Specifically encodes OBU_SEQUENCE_HEADER (type 1) from v4l2_ctrl_av1_sequence per AV1 spec 5.5.
  • Bitstream writer is mostly reusable — AV1 uses f(n) (fixed-width unsigned) and uvlc() (unsigned variable-length coding similar to Exp-Golomb's ue(v)) plus its own leb128() for size fields. Extending daemon/src/bitstream_writer.h to add bsw_put_uvlc() + bsw_put_leb128() covers it.
  • decoder.c::daedalus_decoder_run_request — when codec_id == AV1 && av1_meta != NULL, synthesise a temporal delimiter OBU followed by OBU_SEQUENCE_HEADER (a temporal unit has to start with the temporal delimiter), and prepend both to the slice bitstream before avcodec_send_packet.
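The writer primitives named above are small enough to sketch. What follows is a standalone illustration of f(n), uvlc(), trailing bits, leb128() and OBU framing, assuming an MSB-first bit writer. Every sk_* name is hypothetical: this is not the existing daemon/src/bitstream_writer.h API, and the real bsw_put_uvlc() / bsw_put_leb128() signatures may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MSB-first bit writer; NOT the existing bitstream_writer.h API. */
struct sk_bw {
    uint8_t *buf;
    size_t cap;     /* capacity in bytes */
    size_t bitpos;  /* index of the next bit to write */
};

/* f(n): append the low n bits of v, most significant bit first. */
static int sk_put_bits(struct sk_bw *w, uint32_t v, unsigned n)
{
    if (n > 32 || w->bitpos + n > w->cap * 8)
        return -1;
    while (n--) {
        size_t byte = w->bitpos >> 3;
        unsigned shift = 7 - (unsigned)(w->bitpos & 7);

        if (shift == 7)
            w->buf[byte] = 0;  /* first bit of a fresh byte */
        w->buf[byte] |= (uint8_t)(((v >> n) & 1u) << shift);
        w->bitpos++;
    }
    return 0;
}

/* uvlc(): lz zero bits, then (v + 1) written in lz + 1 bits. */
static int sk_put_uvlc(struct sk_bw *w, uint32_t v)
{
    uint64_t x = (uint64_t)v + 1;
    unsigned lz = 0;

    while ((x >> (lz + 1)) != 0)
        lz++;
    if (lz >= 32)
        return -1;  /* out of range for this sketch */
    if (sk_put_bits(w, 0, lz))
        return -1;
    return sk_put_bits(w, (uint32_t)x, lz + 1);
}

/* trailing_bits(): a single 1 bit, then zeros up to the next byte boundary. */
static int sk_put_trailing_bits(struct sk_bw *w)
{
    if (sk_put_bits(w, 1, 1))
        return -1;
    while (w->bitpos & 7)
        if (sk_put_bits(w, 0, 1))
            return -1;
    return 0;
}

/* leb128(): 7 value bits per byte, low group first, 0x80 set on all but the last. */
static size_t sk_put_leb128(uint8_t *dst, size_t cap, uint32_t v)
{
    size_t n = 0;

    do {
        uint8_t byte = v & 0x7F;

        v >>= 7;
        if (v)
            byte |= 0x80;
        if (n >= cap)
            return 0;
        dst[n++] = byte;
    } while (v);
    return n;
}

/* Frame a payload as one OBU: header byte with has_size_field set, leb128 size, payload. */
static size_t sk_write_obu(uint8_t *dst, size_t cap, unsigned obu_type,
                           const uint8_t *payload, size_t payload_len)
{
    size_t n;

    if (cap < 1 || payload_len > UINT32_MAX)
        return 0;
    dst[0] = (uint8_t)(((obu_type & 0xF) << 3) | 0x02);
    n = sk_put_leb128(dst + 1, cap - 1, (uint32_t)payload_len);
    if (n == 0 || payload_len > cap - 1 - n)
        return 0;
    if (payload_len)
        memcpy(dst + 1 + n, payload, payload_len);
    return 1 + n + payload_len;
}
```

With these, a temporal delimiter is sk_write_obu(dst, cap, 2, NULL, 0), which yields the two bytes 0x12 0x00. The sequence header would be built by writing its fields with sk_put_bits() / sk_put_uvlc(), closing with sk_put_trailing_bits(), and passing the resulting bytes to sk_write_obu() with obu_type 1. The field-by-field sequence header layout itself is deliberately left out of this sketch.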

Wire protocol decision

Two shapes possible:

  1. Per-codec meta blocks — DAEDALUS_REQ_FLAG_H264_META (existing), DAEDALUS_REQ_FLAG_AV1_META (new), each pointing at codec-specific struct. Cleaner separation.
  2. Union'd meta block — single daedalus_codec_meta with codec-tagged contents. Smaller wire size for codecs that need short metadata, slightly more glue.

I'd ship option 1 — matches the existing H.264 layout, easier to read.
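A rough sketch of what option 1 could look like on the wire. The bit positions, the enum values (only codec=2 for AV1 appears in the journal above; the H.264 value is a guess), the helper name and the struct layout are all assumptions. The daedalus_av1_meta fields are modelled on struct v4l2_ctrl_av1_sequence as I recall it from the kernel uAPI, so check them against linux/v4l2-controls.h before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions are assumed; only the flag names come from the proposal. */
#define DAEDALUS_REQ_FLAG_H264_META (1u << 0)
#define DAEDALUS_REQ_FLAG_AV1_META  (1u << 1)

/* Hypothetical codec ids; codec=2 for AV1 matches the daemon journal. */
enum sketch_codec {
    SKETCH_CODEC_H264 = 1,  /* assumed value */
    SKETCH_CODEC_AV1  = 2
};

/* Assumed layout: a plain copy of the AV1 sequence control payload. */
struct daedalus_av1_meta {
    uint32_t flags;
    uint8_t  seq_profile;
    uint8_t  order_hint_bits;
    uint8_t  bit_depth;
    uint8_t  reserved;
    uint16_t max_frame_width_minus_1;
    uint16_t max_frame_height_minus_1;
};

/*
 * A request may carry no meta block at all, or exactly the block that
 * belongs to its codec. Anything else is rejected.
 */
static int sketch_meta_flags_valid(enum sketch_codec codec, uint32_t flags)
{
    uint32_t meta = flags & (DAEDALUS_REQ_FLAG_H264_META |
                             DAEDALUS_REQ_FLAG_AV1_META);

    if (meta == 0)
        return 1;
    if (codec == SKETCH_CODEC_H264)
        return meta == DAEDALUS_REQ_FLAG_H264_META;
    if (codec == SKETCH_CODEC_AV1)
        return meta == DAEDALUS_REQ_FLAG_AV1_META;
    return 0;
}
```

The point of the per-codec flag is that the daemon can reject a mismatched block with one mask-and-compare, and that a future codec only adds a bit and a struct without touching the existing H.264 path.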

Out of scope / non-goals

  • Film grain (V4L2_CID_STATELESS_AV1_FILM_GRAIN). libdav1d handles film grain internally if the OBU stream describes it — we just need the sequence header right, the rest comes via OBUs in the slice buffer.
  • Tile group entry control (V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY). Same — the tile group OBUs are already in the slice data, no struct→OBU synthesis required.
  • True per-frame OBU_FRAME synthesis from V4L2_CID_STATELESS_AV1_FRAME. The slice data ffmpeg-vaapi puts in the OUTPUT buffer already contains the OBU_FRAME_HEADER + OBU_TILE_GROUP — we don't need to re-synthesise these.

Only the sequence header is missing, and it's a per-stream constant (re-encoded once at session start, prepended to every frame). Single-purpose synth.
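Since the header is constant for the stream, the daemon can synthesise the prefix once and reuse it. Below is a minimal sketch of that caching, assuming the meta block arrives as an opaque byte blob and that the prefix is rebuilt only when those bytes change. All sk_* names are hypothetical, and sk_stub_synth is a placeholder that emits only a temporal delimiter where the real sequence header synthesiser would go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cached prefix plus the meta bytes it was built from. Sizes are arbitrary. */
struct sk_prefix_cache {
    uint8_t  meta[32];
    size_t   meta_len;
    uint8_t  prefix[64];
    size_t   prefix_len;
    unsigned rebuilds;   /* how many times the prefix was synthesised */
};

typedef size_t (*sk_synth_fn)(const uint8_t *meta, size_t meta_len,
                              uint8_t *out, size_t cap);

/* Placeholder synthesiser: writes only a temporal delimiter OBU (0x12 0x00). */
static size_t sk_stub_synth(const uint8_t *meta, size_t meta_len,
                            uint8_t *out, size_t cap)
{
    (void)meta;
    (void)meta_len;
    if (cap < 2)
        return 0;
    out[0] = 0x12;
    out[1] = 0x00;
    return 2;
}

/* Rebuild the prefix only if the meta bytes differ from the cached ones. */
static int sk_prefix_refresh(struct sk_prefix_cache *c, const uint8_t *meta,
                             size_t meta_len, sk_synth_fn synth)
{
    size_t n;

    if (meta_len > sizeof c->meta)
        return -1;
    if (c->prefix_len && c->meta_len == meta_len &&
        memcmp(c->meta, meta, meta_len) == 0)
        return 0;  /* unchanged: keep the cached prefix */
    n = synth(meta, meta_len, c->prefix, sizeof c->prefix);
    if (n == 0)
        return -1;
    memcpy(c->meta, meta, meta_len);
    c->meta_len = meta_len;
    c->prefix_len = n;
    c->rebuilds++;
    return 0;
}

/* Copy prefix + slice into out; returns the total length, or 0 if it does not fit. */
static size_t sk_prepend(const struct sk_prefix_cache *c, const uint8_t *slice,
                         size_t slice_len, uint8_t *out, size_t cap)
{
    if (c->prefix_len > cap || slice_len > cap - c->prefix_len)
        return 0;
    memcpy(out, c->prefix, c->prefix_len);
    memcpy(out + c->prefix_len, slice, slice_len);
    return c->prefix_len + slice_len;
}
```

Comparing the meta bytes on every request is cheap and also covers a stream whose sequence parameters do change partway through. In the real decoder path the output buffer handed to avcodec_send_packet would additionally need the usual AV_INPUT_BUFFER_PADDING_SIZE of zeroed padding after the data, which this sketch does not allocate.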

Sequencing

Depends on:

  • libva PR #10 (the no-op case VAProfileAV1Profile0) — landed/in-review.
  • daedalus-v4l2 PR #2 (v4l2_ctrl_request_setup) — landed.

Ready to implement after PR #10 merges.

Reference: marfrit/libva-v4l2-request-fourier#11