H.264 B-frame display reorder: daemon binds libavcodec display-ordered output to decode-ordered V4L2 cookies → pair-swapped frames (visible 2-1-4-3-6-5) #6
Symptom
Visible "awful jumping" / pair-swapped frames on every H.264 stream with B-frames, regardless of resolution (verified at 720p / 1080p) and frame rate (30 fps / 60 fps). Reported by user as "frames are 2 1 4 3 6 5 instead of 1 2 3 4 5 6".
Reproduction
r24+gf0d4186.
- LIBVA_DRIVER_NAME=v4l2_request mpv --hwdec=vaapi-copy bbb_720p_h264.mp4 → visible pair-swap. This bypasses Firefox entirely, so the bug is upstream of any browser compositor.
- firefox-fourier YouTube playback (independently observed before the mpv test, but Firefox-induced noise made it harder to localise).

Root cause — cookie / display-order mismatch
Pipeline as it stands today
1. libva-v4l2-request-fourier submits H.264 bitstream chunks via VIDIOC_QBUF(OUTPUT) in decode order (the order libavcodec gave them to libva). Each OUTPUT buffer is bound to a media_request carrying that frame's per-control state (SPS/PPS/DECODE_PARAMS/SLICE_PARAMS).
2. The daedalus_v4l2 kernel's device_run (kernel/daedalus_v4l2_main.c:660-790) pops the next src_buf + dst_buf from the m2m queues, allocates a fresh cookie = daedalus_next_cookie(), packs (cookie, bitstream, h264_meta) into REQ_DECODE, sends it to the daemon, and stores (cookie → {src_buf, dst_buf, req}) in the inflight table.
3. The daemon (daemon/src/decoder.c:495-513) does avcodec_send_packet(pkt) then avcodec_receive_frame(frame) once per REQ_DECODE, and ships the resulting NV12 (or DAEDALUS_DECODE_NO_FRAME if EAGAIN) back as RESP_FRAME with the same cookie.
4. The RESP_FRAME payload lands in dst_buf[cookie]; V4L2_BUF_FLAG_TIMESTAMP_COPY copies src_buf[cookie].timestamp → dst_buf[cookie].timestamp.

Why this is wrong for B-frames
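The mismatch can be reproduced without hardware. The sketch below is a toy model, not libavcodec: its "decoder" holds frames until more than `delay` are buffered and then releases the lowest display index, which is only a stand-in for H.264 DPB bumping. `bind_by_cookie` (a hypothetical name) mimics today's one-receive-per-REQ_DECODE binding and counts cookies whose dst_buf would receive another frame's pixels.

```c
#include <assert.h>

/* Toy model, NOT libavcodec: frames are held until more than `delay` are
 * buffered, then the smallest display index is released. Only a stand-in
 * for H.264 DPB bumping. */
#define MAX_HELD 16

struct toy_decoder {
    int held[MAX_HELD];   /* display indices of frames inside the decoder */
    int count;
    int delay;            /* frames held back before output starts */
};

/* Stand-in for avcodec_send_packet: accept one frame (by display index). */
static void toy_send(struct toy_decoder *d, int display_idx)
{
    assert(d->count < MAX_HELD);
    d->held[d->count++] = display_idx;
}

/* Stand-in for ONE avcodec_receive_frame call: returns the released display
 * index, or -1 for EAGAIN (nothing ready yet). */
static int toy_receive(struct toy_decoder *d)
{
    if (d->count <= d->delay)
        return -1;
    int min_pos = 0;
    for (int i = 1; i < d->count; i++)
        if (d->held[i] < d->held[min_pos])
            min_pos = i;
    int out = d->held[min_pos];
    d->held[min_pos] = d->held[--d->count];   /* unordered removal */
    return out;
}

/* Today's binding: cookie i (the i-th REQ_DECODE, in decode order) gets
 * whatever the single receive after send i returns. Fills shipped[i] with
 * the display index of the pixels sent back (-1 = DAEDALUS_DECODE_NO_FRAME)
 * and returns how many cookies did not get their own frame's pixels. */
static int bind_by_cookie(const int *decode_order, int n, int delay, int *shipped)
{
    struct toy_decoder d = { .count = 0, .delay = delay };
    int mismatches = 0;
    for (int cookie = 0; cookie < n; cookie++) {
        toy_send(&d, decode_order[cookie]);
        shipped[cookie] = toy_receive(&d);
        if (shipped[cookie] != decode_order[cookie])
            mismatches++;
    }
    return mismatches;
}
```

With decode order I₀ P₃ B₁ B₂ P₆ B₄ B₅ (display indices {0, 3, 1, 2, 6, 4, 5}) and `delay = 1`, the model ships NO_FRAME for I₀, I₀'s pixels into P₃'s dst_buf and P₃'s pixels into P₆'s dst_buf, while P₆ itself never leaves the decoder; with `delay = 0` (no reordering) every cookie gets its own pixels. The exact visible pattern depends on the real decoder's output delay, so the model does not claim to reproduce 2-1-4-3-6-5 exactly.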
libavcodec's H.264 decoder internally reorders output to display order before returning from avcodec_receive_frame. Each call returns the oldest display-ready frame in its DPB, not necessarily the frame whose slice data the most recent send_packet contained.

Concrete IBP example, decode order I₀ P₃ B₁ B₂ P₆ B₄ B₅ (subscript = display position):

RESP NO_FRAME → dst_buf[2] = VB2_BUF_STATE_ERROR (P₃'s pixels held inside libavcodec; lost from V4L2's perspective)

Result: pixel content and dst_buf timestamp drift apart. Many src_bufs get marked ERROR (lost frames). When the V4L2 client's compositor presents dst_buf[N], it sees pixels of an earlier or later display frame than its timestamp claims. At a high level, pairs of P/B frames present in inverted order: the user-visible 2-1-4-3-6-5.

The daemon today even ships some frames with an NV12 resolution that differs from the requested CAPTURE buffer size ("decoder: OK 1280x720" shipped into a capture=1920x1088 buffer on the first frame of a resolution change; see journal cookie=15027 on 2026-05-21), which is a related but separate symptom of the same cookie-binding flaw.

Proposed fix — ferry the source PTS through the protocol
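The kernel-side bookkeeping at the heart of this fix can be modeled in userspace. In the sketch below every identifier is hypothetical, vb2 buffers are reduced to integer ids, and a linear scan stands in for the parallel src_pts → inflight hash; it only illustrates the rule that the cookie releases the src_buf while output_src_pts selects the dst_buf and its timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the proposed inflight table. All names are hypothetical;
 * a linear scan replaces the parallel src_pts -> inflight hash. */
#define MAX_INFLIGHT 32

struct inflight {
    bool     used;
    uint32_t cookie;    /* key 1: which REQ_DECODE carried this src_buf */
    uint64_t src_pts;   /* key 2: src_buf->vb2_buf.timestamp */
    int      dst_id;    /* dst_buf paired with this src_buf in device_run */
    bool     src_done;  /* src_buf returned: slice data consumed */
    bool     dst_done;  /* dst_buf returned: pixels delivered */
};

struct inflight_table {
    struct inflight e[MAX_INFLIGHT];
};

/* device_run: remember the src/dst pairing together with the source PTS. */
static void inflight_add(struct inflight_table *t, uint32_t cookie,
                         uint64_t src_pts, int dst_id)
{
    for (int i = 0; i < MAX_INFLIGHT; i++) {
        if (!t->e[i].used) {
            t->e[i] = (struct inflight){ .used = true, .cookie = cookie,
                                         .src_pts = src_pts, .dst_id = dst_id };
            return;
        }
    }
    assert(!"inflight table full");
}

static struct inflight *by_cookie(struct inflight_table *t, uint32_t cookie)
{
    for (int i = 0; i < MAX_INFLIGHT; i++)
        if (t->e[i].used && t->e[i].cookie == cookie)
            return &t->e[i];
    return NULL;
}

/* Only dst_bufs still waiting for pixels are candidates. */
static struct inflight *by_pts(struct inflight_table *t, uint64_t pts)
{
    for (int i = 0; i < MAX_INFLIGHT; i++)
        if (t->e[i].used && !t->e[i].dst_done && t->e[i].src_pts == pts)
            return &t->e[i];
    return NULL;
}

/* Drop an entry once both of its buffers are back with userspace. */
static void retire(struct inflight *f)
{
    if (f && f->src_done && f->dst_done)
        f->used = false;
}

/* RESP_FRAME: the cookie frees the src side, output_src_pts picks the dst.
 * Returns the dst_id that received pixels, or -1 (NO_FRAME or unknown pts);
 * on success *ts_out gets the timestamp stamped explicitly on that dst. */
static int on_resp_frame(struct inflight_table *t, uint32_t cookie,
                         bool src_consumed, bool has_output,
                         uint64_t output_src_pts, uint64_t *ts_out)
{
    struct inflight *req = by_cookie(t, cookie);
    if (req && src_consumed)
        req->src_done = true;

    struct inflight *owner = has_output ? by_pts(t, output_src_pts) : NULL;
    int filled = -1;
    if (owner) {
        owner->dst_done = true;
        *ts_out = output_src_pts;   /* dst_buf.timestamp = output_src_pts */
        filled = owner->dst_id;
    }
    retire(req);
    retire(owner);
    return filled;
}

static int inflight_count(const struct inflight_table *t)
{
    int n = 0;
    for (int i = 0; i < MAX_INFLIGHT; i++)
        n += t->e[i].used;
    return n;
}
```

Fed the responses a one-frame-delay decoder would produce for I₀ P₃ B₁ B₂ P₆ B₄ B₅ (cookies 0-6 paired with dst ids 100-106), the dst_bufs come back out of allocation order (100, 102, 103, 101, 105, 106, 104), but each holds its own frame's pixels and the stamps run 0 through 6 in display order. P₆'s dst_buf stays parked until an end-of-stream drain releases it; reusing the last cookie for that drain response is an assumption of this model, not something the issue specifies.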
Wire protocol
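A hypothetical shape for the new fields follows. Only cookie, src_pts, src_consumed and output_src_pts come from this issue; member order, widths and the explicit padding are assumptions, and the existing members of the real structs are elided. The static assertions pin the sketch's layout so that a kernel/daemon size mismatch fails at build time, the same hazard the protocol version bump guards against.

```c
#include <stdint.h>

/* Hypothetical sketch: only cookie, src_pts, src_consumed and output_src_pts
 * are named in this issue. The real structs in include/daedalus_v4l2_proto.h
 * carry further members (bitstream, h264_meta, NV12 geometry, ...) not shown. */

struct daedalus_req_decode {
    uint64_t src_pts;         /* NEW: src_buf->vb2_buf.timestamp, set in device_run */
    uint32_t cookie;          /* from daedalus_next_cookie() */
    uint32_t reserved;        /* explicit padding: identical layout on both sides */
    /* ... existing bitstream / h264_meta members ... */
};

struct daedalus_resp_frame {
    uint64_t output_src_pts;  /* NEW: src_pts of the bitstream these pixels decode;
                                 meaningless for DAEDALUS_DECODE_NO_FRAME */
    uint32_t cookie;          /* REQ_DECODE whose packet triggered this response */
    uint8_t  src_consumed;    /* NEW: 1 = that request's slice data was consumed */
    uint8_t  reserved[3];     /* explicit padding */
    /* ... existing status / NV12 payload members ... */
};

/* Sizes refer to this sketch only, not to the real structs. */
_Static_assert(sizeof(struct daedalus_req_decode) == 16, "req layout");
_Static_assert(sizeof(struct daedalus_resp_frame) == 16, "resp layout");
```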
Extend struct daedalus_req_decode and struct daedalus_resp_frame in include/daedalus_v4l2_proto.h so that the request carries src_pts and the response carries src_consumed and output_src_pts. Bump DAEDALUS_PROTO_VERSION; both sides must upgrade together.

Kernel side (device_run)

- Fill req->src_pts = src_buf->vb2_buf.timestamp before sending REQ_DECODE. (Already accessible.)
- Inflight lookup needs a secondary key on src_pts → dst_buf, since the dst_buf the daemon ultimately wants to fill is whichever one was paired with a different src_buf whose timestamp matches the daemon's output_src_pts. Simplest: keep the cookie-keyed inflight table plus a parallel hash src_pts → inflight. On RESP_FRAME:
  - (cookie) → free the src_buf (it is done: the slice data was consumed even if no pixels are ready yet).
  - (output_src_pts) → that inflight's dst_buf is where the pixels go.
  - Set dst_buf.timestamp = output_src_pts explicitly (we can no longer rely on V4L2_BUF_FLAG_TIMESTAMP_COPY, because src and dst are no longer paired 1:1).
- For cookies whose RESP_FRAME says src_consumed=1 but no output_src_pts matches an outstanding dst (i.e., the daemon hasn't released a frame for this cookie's bitstream yet), the dst_buf stays parked in the inflight table until a later RESP brings it home.

Daemon side (decoder.c)

Replace the synchronous send_packet → receive_frame_once pattern with a drain loop that keeps calling avcodec_receive_frame after each send_packet until it returns AVERROR(EAGAIN). The daemon may send 0, 1, or N RESP_FRAME messages per REQ_DECODE (typically 1 in steady state). Each carries an output_src_pts identifying which OUTPUT bitstream's pixels these are.

libva-v4l2-request-fourier — no changes
The libva driver already submits OUTPUT buffers with the correct per-frame timestamp (display PTS the application passed in). The dst_buf-side timestamp the kernel now stamps explicitly will match what libva expects, so VAAPI surface ordering on the application side stays correct.
Why this is the right shape
AV_CODEC_FLAG_LOW_DELAY is a hack that gives up display order entirely; we want the opposite: preserve display order but make it visible to V4L2.

Effort estimate
Maybe a day of focused work + a day of soak testing on Pi CM5 with mpv + Firefox on a few stream profiles. Risk: getting the inflight ordering right when src and dst decouple, which needs care under concurrent client load (the PR #3 vb_mutex fix covered the prior cross-client hazard, but this fix introduces in-context ordering complexity).

Out of scope / related
- codec_store_buffer slice-buffer overflow on resolution change (picture.c:112): separate bug, filed elsewhere. mpv crashes on the 1080p source today because of it; this issue was investigated using a 720p source, where the same overflow doesn't trigger.
- The V4L2_BUF_FLAG_TIMESTAMP_COPY flag (kernel/daedalus_v4l2_main.c:536, 553) becomes a misnomer once src/dst are no longer paired. It's still useful for VP9/AV1 (no reorder, src/dst 1:1), so it stays, but the H.264 path effectively overrides it with an explicit stamp.