daemon: AV_CODEC_FLAG_LOW_DELAY for H.264 — implements #11 part (2) #12

Merged
marfrit merged 1 commits from noether/daemon-low-delay-h264 into main 2026-05-21 15:17:57 +00:00
Owner

What

Single-line change: ctx->flags |= AV_CODEC_FLAG_LOW_DELAY before avcodec_open2, gated on codec_id == DAEDALUS_CODEC_H264.

Forces libavcodec's H.264 decoder to emit pictures in decode order, one per avcodec_send_packet, with no internal display-order reorder queue. Aligns daedalus's output ordering with what every V4L2 stateless decoder produces (cedrus, hantro, rkvdec — they hand CAPTURE buffers to libva in decode order, ffmpeg-vaapi reorders display upstream via per-surface POC).

Closes the design loop from #11. Section (1) — concurrent in-flight requests — turned out to be unnecessary: with LOW_DELAY, each REQ_DECODE produces one CAPTURE completion immediately, and the existing synchronous chardev loop is already fast enough. Section (3) — libva-side reorder — was a misread, never needed (ffmpeg-vaapi handles display reorder upstream of us).

How

Inside libavcodec/h264dec.c, AV_CODEC_FLAG_LOW_DELAY sets h->low_delay = 1. h264_select_output_frame (in h264_picture.c) emits the just-decoded picture immediately instead of routing through the display-order DPB output queue.

Reference-frame DPB management (short_ref / long_ref) is unaffected. B-frame decoding correctness preserved. Only the OUTPUT buffering is bypassed.

VP9 / AV1: flag not set — those codecs don't reorder, so it'd be a no-op but adds no value.

Verified

Field-test on higgs (Pi CM5, 6.18.29+rpt-rpi-2712), test daemon binary hot-swapped:

  • LIBVA_DRIVER_NAME=v4l2_request mpv --hwdec=vaapi-copy --frames=300 --no-config --no-audio --vo=null bbb_720p_h264.mp4
  • 311 REQ_DECODE received, 308 "decoder: OK" — 99.04% steady-state 1:1 delivery.
  • 3 lost frames are at GOP boundaries (libavcodec needing the I-frame's reference data before emitting subsequent P-frame in low-delay mode); no compounding drift.
  • mpv plays to --frames=300 cap and exits cleanly with End of file.
  • All failure modes from #9 are gone: no "Unable to dequeue buffer", no "Failed to end picture decode", no "AVHWFramesContext: Failed to sync surface".

Firefox YouTube playback test pending — needs the bump in marfrit-packages first.

Wire compat

No protocol change. DAEDALUS_PROTO_VERSION stays at 0. Kernel module unchanged. This is a daedalus-v4l2 (userspace) package-only bump.

Refs

  • #11 (design + correction)
  • #9 (mpv stuck pre-playing under PR #7+#8)
  • #10 (revert of #7+#8 — last known-good base)
## What Single-line change: `ctx->flags |= AV_CODEC_FLAG_LOW_DELAY` before `avcodec_open2`, gated on `codec_id == DAEDALUS_CODEC_H264`. Forces libavcodec's H.264 decoder to emit pictures in **decode order**, one per `avcodec_send_packet`, with no internal display-order reorder queue. Aligns daedalus's output ordering with what every V4L2 stateless decoder produces (cedrus, hantro, rkvdec — they hand CAPTURE buffers to libva in decode order, ffmpeg-vaapi reorders display upstream via per-surface POC). Closes the design loop from #11. Section (1) — concurrent in-flight requests — turned out to be unnecessary: with LOW_DELAY, each REQ_DECODE produces one CAPTURE completion immediately, and the existing synchronous chardev loop is already fast enough. Section (3) — libva-side reorder — was a misread, never needed (ffmpeg-vaapi handles display reorder upstream of us). ## How Inside `libavcodec/h264dec.c`, `AV_CODEC_FLAG_LOW_DELAY` sets `h->low_delay = 1`. `h264_select_output_frame` (in `h264_picture.c`) emits the just-decoded picture immediately instead of routing through the display-order DPB output queue. Reference-frame DPB management (short_ref / long_ref) is unaffected. B-frame decoding correctness preserved. Only the OUTPUT buffering is bypassed. VP9 / AV1: flag not set — those codecs don't reorder, so it'd be a no-op but adds no value. ## Verified Field-test on higgs (Pi CM5, 6.18.29+rpt-rpi-2712), test daemon binary hot-swapped: - `LIBVA_DRIVER_NAME=v4l2_request mpv --hwdec=vaapi-copy --frames=300 --no-config --no-audio --vo=null bbb_720p_h264.mp4` - 311 REQ_DECODE received, 308 "decoder: OK" — **99.04% steady-state 1:1 delivery**. - 3 lost frames are at GOP boundaries (libavcodec needing the I-frame's reference data before emitting subsequent P-frame in low-delay mode); no compounding drift. - mpv plays to `--frames=300` cap and exits cleanly with `End of file`. - All failure modes from #9 are gone: no "Unable to dequeue buffer", no "Failed to end picture decode", no "AVHWFramesContext: Failed to sync surface". Firefox YouTube playback test pending — needs the bump in marfrit-packages first. ## Wire compat No protocol change. `DAEDALUS_PROTO_VERSION` stays at 0. Kernel module unchanged. This is a daedalus-v4l2 (userspace) package-only bump. ## Refs - #11 (design + correction) - #9 (mpv stuck pre-playing under PR #7+#8) - #10 (revert of #7+#8 — last known-good base)
marfrit added 1 commit 2026-05-21 15:15:12 +00:00
Force libavcodec's H.264 decoder to emit frames in DECODE order
(one frame per send_packet, no internal display-order reorder
queue).  Single-line addition: ctx->flags |= AV_CODEC_FLAG_LOW_DELAY
before avcodec_open2, gated on codec_id == DAEDALUS_CODEC_H264.

Closes daedalus-v4l2#11 part (2).

Background
----------
PR #7's "parking design" approach to the H.264 display-reorder
problem broke libva-v4l2-request-fourier's 1:1 CAPTURE-completion
contract (see #9 + #10).  After the revert, the visible "2 1 4 3"
pair-swap regressed and the only path forward was to align the
daemon's output ordering with what V4L2 stateless clients expect:
**decode order, one CAPTURE buffer per OUTPUT slice, with display
reorder pushed upstream to ffmpeg-vaapi's per-VAAPI-surface POC
logic** (which it already does correctly for every real H.264
hardware decoder via VAPictureParameterBufferH264).

How LOW_DELAY does this
-----------------------
Inside libavcodec/h264dec.c, the flag sets h->low_delay = 1.
h264_select_output_frame (h264_picture.c) emits the just-decoded
picture immediately instead of routing through the display-order
DPB output queue.  DPB management for reference frames
(short_ref / long_ref) is unaffected — B-frame decoding
correctness is preserved; only the output buffering is bypassed.

Skipped for VP9 / AV1 — those codecs don't reorder internally,
so the flag would be a no-op but adds no value.

Verified
--------
On higgs (Pi CM5, 6.18.29+rpt-rpi-2712), test daemon hot-swapped
into /usr/bin/daedalus_v4l2_daemon, mpv --hwdec=vaapi-copy
--frames=300 against bbb_720p_h264.mp4: 311 REQ_DECODEs received,
308 successful "decoder: OK" responses (99.04% steady-state
delivery — 3 lost at GOP boundaries, no compounding drift).
mpv plays to its --frames cap and exits cleanly with "End of
file".  No "Unable to dequeue buffer", no "Failed to end picture
decode", no "AVHWFramesContext: Failed to sync surface" — all
the failures from #9 are gone.

Builds clean against ffmpeg-v4l2-request-fourier libavcodec.
marfrit merged commit 64b9599e47 into main 2026-05-21 15:17:57 +00:00
marfrit deleted branch noether/daemon-low-delay-h264 2026-05-21 15:17:57 +00:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: reauktion/daedalus-v4l2#12