h264_set_controls overwrites stream's SPS profile/level with session-derived values; max_num_ref_frames not populated #8

Open
opened 2026-05-20 17:46:03 +00:00 by claude-noether · 1 comment
Collaborator

Summary

h264_set_controls in src/h264.c builds the v4l2_ctrl_h264_sps it ships to the kernel by:

  1. Hardcoding sps.profile_idc from the libva session profile (h264_profile_to_idc(profile), h264.c:919) — overwrites the bitstream's actual profile_idc.
  2. Deriving sps.level_idc from the frame size in MBs (h264_derive_level_idc(...), h264.c:928) — overwrites the bitstream's actual level_idc.
  3. Reading sps.max_num_ref_frames from VAPicture->num_ref_frames (h264.c:516), which appears to be 0 for at least some streams ffmpeg-vaapi feeds.

For hardware V4L2 stateless decoders (rkvdec, rpi-hevc-dec) this may work in practice — their firmware re-parses SPS from the slice prefix or tolerates SPS-control mismatch. For the new daedalus_v4l2 / Option γ path (libavcodec via dlopen, in a userspace daemon), it's fatal: the daedalus daemon synthesises a fresh AnnexB SPS NAL from these V4L2 controls, libavcodec parses that synthesised SPS, and the slice data fails to match.

This became visible when DAEMON-PPS landed (daedalus-v4l2 PR #1, commit 3dd0eb0). The wire-protocol + kernel collect + daemon NAL synth + libavcodec integration all work end-to-end; the synthesised SPS is byte-correct for the input data — but the input data is wrong before it ever reaches daedalus.

Concrete reproduction on higgs (Pi CM5, kernel 6.18.29, daedalus_v4l2 + rpi-hevc-dec both loaded)

$ cd /tmp
$ ffmpeg -hide_banner -loglevel error -y \
    -f lavfi -i testsrc=duration=2:size=320x240:rate=30 \
    -c:v libx264 -profile:v baseline -preset ultrafast \
    -g 32 -bf 0 -pix_fmt yuv420p h264_test.mp4

$ LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi \
    -hwaccel_device /dev/dri/renderD128 -i h264_test.mp4 \
    -frames:v 1 -f null - 2>&1 | tail -3
v4l2-request: cap_pool_init: 24 slots ready ...
v4l2-request: Unable to set control(s): Invalid argument (error_idx=2/2 ioctl-level)

The "Invalid argument" above comes from the setup-time DECODE_MODE/START_CODE probe, which is best-effort and ignored. The per-frame S_EXT_CTRLS call succeeds and delivers SPS+PPS to the daedalus_v4l2 request handler.

Daedalus daemon journal then shows what came out of the kernel ctrl handler:

decoder: h264 SPS prof=100 level=41 ref_frames=0 w_mbs=19 h_units=14 poc_type=0 flags=0x10
decoder: h264 PPS spsid=0 ppsid=0 qp-26=0 flags=0x0
decoder: h264 prepended SPS=13B PPS=8B slice=505B

But the stream's actual SPS (extracted via ffmpeg -bsf:v h264_mp4toannexb + raw scan):

SPS NAL bytes: 42 c0 0d da 05 07 ec 04 40 00 00 03 00 40 00 00 0f 23 c5 0a a8
profile_idc=66 (Constrained Baseline)  constraint_flags=0xc0  level_idc=13 (level 1.3)

The libva-shipped values (profile=100/level=41) don't match the stream (profile=66/level=13). max_num_ref_frames=0 is also wrong — libx264 -bf 0 -g 32 writes a stream with at least 1 reference.
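The comparison above can be reproduced mechanically. The following is a minimal sketch, not code from this driver: `sps_peek`, `struct sps_summary`, and the bit-reader helpers are hypothetical names introduced for illustration. It reads the three fixed bytes at the start of the SPS payload (the bytes after the one-byte NAL header) and then walks the Exp-Golomb fields up to max_num_ref_frames. It assumes a Baseline/Main/Extended stream (profile_idc 66, 77 or 88), pic_order_cnt_type 0 or 2, and that no emulation-prevention byte occurs before the fields it reads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not part of the driver. */
struct sps_summary {
    uint8_t profile_idc;
    uint8_t constraint_flags;
    uint8_t level_idc;
    uint32_t max_num_ref_frames;
};

/* Minimal MSB-first bit reader over the SPS payload. */
struct bit_reader {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos;
};

static int read_bit(struct bit_reader *br, uint32_t *out)
{
    if (br->pos >= br->size_bits)
        return -1;
    *out = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
    br->pos++;
    return 0;
}

/* Unsigned Exp-Golomb ue(v): N leading zero bits, a 1, then N suffix bits. */
static int read_ue(struct bit_reader *br, uint32_t *out)
{
    uint32_t bit = 0, suffix = 0;
    int zeros = 0;

    for (;;) {
        if (read_bit(br, &bit) < 0)
            return -1;
        if (bit)
            break;
        if (++zeros > 31)
            return -1;
    }
    for (int i = 0; i < zeros; i++) {
        if (read_bit(br, &bit) < 0)
            return -1;
        suffix = (suffix << 1) | bit;
    }
    *out = ((1u << zeros) - 1u) + suffix;
    return 0;
}

/* Parses the SPS payload that follows the NAL header byte.
 * Returns 0 on success, -1 on truncated or unsupported input. */
int sps_peek(const uint8_t *payload, size_t len, struct sps_summary *out)
{
    struct bit_reader br = { payload, len * 8, 24 }; /* skip 3 fixed bytes */
    uint32_t v, poc_type;

    if (len < 4)
        return -1;
    out->profile_idc = payload[0];
    out->constraint_flags = payload[1];
    out->level_idc = payload[2];

    /* High-family profiles insert chroma/bit-depth/scaling fields here;
     * this sketch deliberately rejects them instead of parsing them. */
    if (out->profile_idc != 66 && out->profile_idc != 77 &&
        out->profile_idc != 88)
        return -1;

    if (read_ue(&br, &v) < 0)          /* seq_parameter_set_id */
        return -1;
    if (read_ue(&br, &v) < 0)          /* log2_max_frame_num_minus4 */
        return -1;
    if (read_ue(&br, &poc_type) < 0)   /* pic_order_cnt_type */
        return -1;
    if (poc_type == 0) {
        if (read_ue(&br, &v) < 0)      /* log2_max_pic_order_cnt_lsb_minus4 */
            return -1;
    } else if (poc_type != 2) {
        return -1;                     /* type 1 has extra fields: not handled */
    }
    return read_ue(&br, &out->max_num_ref_frames);
}
```

Fed the 21 SPS bytes quoted above, this sketch reports profile_idc 66, constraint flags 0xc0, level_idc 13, and max_num_ref_frames 1, which is what the V4L2 control would need to carry for this stream.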

Libavcodec then complains:

[h264 @ ...] number of reference frames (1+1) exceeds max (0; probably corrupt input)
[h264 @ ...] illegal long ref in memory management control operation 4
[h264 @ ...] dquant out of range (-1059)
[h264 @ ...] top block unavailable for requested intra mode
[h264 @ ...] error while decoding MB 4 11

ffmpeg still exits rc=0 with a (corrupted) decoded frame because libavcodec is lenient about returning best-effort output.

Why this hits only the daedalus path

Hardware decoders accept the SPS struct fields as hints; they have their own bitstream parsers in firmware (rkvdec) or accept slice data with embedded SPS prefix (rpi-hevc-dec uses NONE start-code mode). They don't strictly enforce that the V4L2 SPS control matches what's in the slice.

Libavcodec via daedalus_v4l2 has no such tolerance: it parses the SPS NAL we hand it and uses those values to interpret the slice. Mismatch → decode garbage.

Proposed fix

Read actual stream-level SPS fields from the VAAPI parameter buffer (VAPictureParameterBufferH264) and the slice prefix — don't derive from session profile. Concretely:

  • sps.profile_idc: use VAPicture->seq_fields.bits.profile_idc if present, or extract from the SPS NAL the client embedded. Fall back to h264_profile_to_idc(profile) only if neither is available.
  • sps.level_idc: same. Read the real level, and fall back to the size-derived value only when nothing else is available.
  • sps.max_num_ref_frames: if VAPicture->num_ref_frames == 0, derive from VAPicture->seq_fields.bits.log2_max_frame_num_minus4 and reference-count of ReferenceFrames[], or fall back to a safe default (e.g. 1 for baseline, 4 for high) rather than 0.
  • PPS flags: investigate why every flag-bit field reads 0 — possibly pic_fields.bits.entropy_coding_mode_flag etc. is being read off the wrong offset; should be reproducible against any libx264-encoded sample.

Scope / blast radius

Fix affects only what the libva driver writes to V4L2 controls. Existing hardware decoder paths (rkvdec, rpi-hevc-dec) should be unaffected if their firmware was already ignoring the SPS struct; if some path relied on the hardcoded profile_idc=100, that's a worse bug that the fix surfaces. Worth running on rkvdec testbed (RK3399, RK3588) before merging to confirm no regression.

Related

  • DAEMON-PPS (daedalus-v4l2 PR #1, 3dd0eb0) — daemon-side AnnexB SPS+PPS synth from V4L2 ctrls. All works correctly; surfaces this bug.
  • LIBVA-1 (PR #6, c332d34) and LIBVA-2 (PR #7, 9898331) — per-codec dispatch. Required for any decode at all on higgs.
  • Project memory: "Pi 5 (higgs) blocked on rpi-hevc-dec SPS quirks" — possibly the same root cause manifesting differently on rpi-hevc-dec (which silently fails control validation rather than producing garbage).
Author
Collaborator

Triage note 2026-05-20 — proposed fix #1 needs to be repointed; VAAPI doesn't expose profile_idc/level_idc.

Read libva 2.22.0-3's VAPictureParameterBufferH264 struct on higgs (/usr/include/va/va.h:3571-3622):

typedef struct _VAPictureParameterBufferH264 {
    VAPictureH264 CurrPic;
    VAPictureH264 ReferenceFrames[16];
    uint16_t picture_width_in_mbs_minus1;
    uint16_t picture_height_in_mbs_minus1;
    uint8_t bit_depth_luma_minus8;
    uint8_t bit_depth_chroma_minus8;
    uint8_t num_ref_frames;
    union {
        struct {
            uint32_t chroma_format_idc                       : 2;
            uint32_t residual_colour_transform_flag          : 1;
            uint32_t gaps_in_frame_num_value_allowed_flag    : 1;
            uint32_t frame_mbs_only_flag                     : 1;
            uint32_t mb_adaptive_frame_field_flag            : 1;
            uint32_t direct_8x8_inference_flag               : 1;
            uint32_t MinLumaBiPredSize8x8                    : 1;
            uint32_t log2_max_frame_num_minus4               : 4;
            uint32_t pic_order_cnt_type                      : 2;
            uint32_t log2_max_pic_order_cnt_lsb_minus4       : 4;
            uint32_t delta_pic_order_always_zero_flag        : 1;
        } bits;
        ...
    } seq_fields;
    ...
    union {
        struct {
            uint32_t entropy_coding_mode_flag                : 1;
            uint32_t weighted_pred_flag                      : 1;
            uint32_t weighted_bipred_idc                     : 2;
            uint32_t transform_8x8_mode_flag                 : 1;
            uint32_t field_pic_flag                          : 1;
            uint32_t constrained_intra_pred_flag             : 1;
            uint32_t pic_order_present_flag                  : 1;
            uint32_t deblocking_filter_control_present_flag  : 1;
            uint32_t redundant_pic_cnt_present_flag          : 1;
            uint32_t reference_pic_flag                      : 1;
        } bits;
        ...
    } pic_fields;
    uint16_t frame_num;
    ...
} VAPictureParameterBufferH264;

No profile_idc. No level_idc. No constraint_set*_flag. Same family of VAAPI-blindspot as feedback_vaapi_blind_to_some_hevc_sps_fields.

Implication for each proposed fix

| # | Proposed fix | Verdict |
|---|---|---|
| 1 | sps.profile_idc from seq_fields.bits.profile_idc | Not possible: the field doesn't exist in libva. Must either parse the SPS NAL from surface->source_data (slice prefix), add a wire-protocol field daedalus client → daemon, OR keep the current session-derived hardcode (correct for hardware decoders, wrong for daedalus). |
| 2 | sps.level_idc from seq_fields.bits.level_idc | Not possible, for the same reason. Real bitstream level_idc requires SPS-NAL parsing or wire-protocol pass-through. |
| 3 | sps.max_num_ref_frames fallback when 0 | Possible. Practical fallback: count non-INVALID entries in VAPicture->ReferenceFrames[] (which has 16 slots); if still 0, default per-profile (1 for baseline, 4 for main/high). |
| 4 | PPS flags reading 0 | Investigation needed. Code at h264.c:494-514 reads pic_fields.bits.*_flag correctly per the libva struct. If they're 0 after the fact, the cause is upstream (ffmpeg-vaapi didn't populate them) OR daedalus wire-protocol serialization clobbering them. Need a one-line log dumping pic_fields.value (raw uint32) at h264_set_controls entry to disambiguate before/after. |

What's actually fixable today, narrowed scope

Two changes I'd ship as one commit:

/* Fix 3: max_num_ref_frames fallback */
sps->max_num_ref_frames = VAPicture->num_ref_frames;
if (sps->max_num_ref_frames == 0) {
    int n = 0;
    /* flags is a bitmask, so test the INVALID bit instead of comparing
     * the whole field; also skip slots with no surface attached. */
    for (int i = 0; i < 16; i++)
        if (!(VAPicture->ReferenceFrames[i].flags & VA_PICTURE_H264_INVALID) &&
            VAPicture->ReferenceFrames[i].picture_id != VA_INVALID_SURFACE)
            n++;
    if (n > 0) {
        sps->max_num_ref_frames = (uint8_t)n;
    } else {
        /* Last-resort per-profile default (1 for baseline, 4 for
         * main/high); better than 0, which makes libavcodec reject
         * any frame that uses references as exceeding the max. */
        sps->max_num_ref_frames =
            (profile == VAProfileH264ConstrainedBaseline) ? 1 : 4;
    }
}

/* Fix 4 instrumentation — single log line at function entry */
request_log("h264_set_controls: seq_fields=0x%08x pic_fields=0x%08x num_ref_frames=%u\n",
            VAPicture->seq_fields.value,
            VAPicture->pic_fields.value,
            VAPicture->num_ref_frames);

Fix 1+2 are operator decisions on whether to pursue SPS-NAL parsing in this backend. The slice prefix typically contains the SPS+PPS+slice when ffmpeg-vaapi writes it — confirmable by dumping the first 64 bytes of surface->source_data and grepping for NAL 0x00 0x00 0x01 0x67 (SPS) prefix. If the SPS is there for libx264 baseline output, the backend can parse it; if it's not, daedalus needs the wire-protocol pass-through path.
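The "dump and grep" check can also be done in code. This is a minimal sketch under assumptions: `find_sps_nal` is a hypothetical helper, not an existing function in the backend, and it assumes the buffer is Annex B formatted (start-code delimited). It matches on nal_unit_type 7 rather than the literal byte 0x67, since the two nal_ref_idc bits in the header can vary.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the NAL header byte of the first SPS
 * (nal_unit_type 7) that follows a 00 00 01 start code, or -1 if no
 * SPS is found within len bytes. A four-byte 00 00 00 01 start code
 * is matched too, because it ends in 00 00 01. */
long find_sps_nal(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 3 < len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01 &&
            (buf[i + 3] & 0x80) == 0 &&     /* forbidden_zero_bit clear */
            (buf[i + 3] & 0x1f) == 7)       /* nal_unit_type == SPS */
            return (long)(i + 3);
    }
    return -1;
}
```

Calling it on the first 64 bytes of the slice buffer answers the question directly: a non-negative result means the SPS payload starts one byte after the returned offset; -1 means the backend cannot recover profile_idc/level_idc from the slice data and the pass-through route is the remaining option.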

Anomaly to flag: the reproducer log shows level=41 from a 320x240 baseline stream (300 MBs → h264_derive_level_idc returns 11, not 41). So either the daedalus daemon's level field comes from somewhere other than the SPS struct (maybe an older code path / cached state), or the log is misformatted. Worth running the reproducer with a request_log("h264_set_controls: derived level_idc=%u", sps.level_idc); to confirm the value at the libva boundary matches what daedalus prints.
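The arithmetic behind that anomaly, as a sketch. The real h264_derive_level_idc is not shown in this issue, so `smallest_level_for_frame_size` below is a hypothetical stand-in that picks the lowest level whose MaxFS limit (H.264 Table A-1, frame size in macroblocks) covers the frame; level 1b is omitted and the actual function may apply other rules.

```c
#include <stddef.h>
#include <stdint.h>

/* MaxFS (max frame size in macroblocks) per level, H.264 Table A-1. */
struct level_limit {
    uint8_t level_idc;
    uint32_t max_fs;
};

static const struct level_limit level_limits[] = {
    { 10, 99 },    { 11, 396 },   { 12, 396 },   { 13, 396 },
    { 20, 396 },   { 21, 792 },   { 22, 1620 },  { 30, 1620 },
    { 31, 3600 },  { 32, 5120 },  { 40, 8192 },  { 41, 8192 },
    { 42, 8704 },  { 50, 22080 }, { 51, 36864 },
};

/* Frame size in 16x16 macroblocks, rounding partial blocks up. */
uint32_t frame_size_in_mbs(uint32_t width, uint32_t height)
{
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Hypothetical stand-in for a size-only level derivation: the lowest
 * level_idc whose MaxFS is at least mbs, or 0 if none fits. */
uint8_t smallest_level_for_frame_size(uint32_t mbs)
{
    size_t n = sizeof level_limits / sizeof level_limits[0];

    for (size_t i = 0; i < n; i++)
        if (mbs <= level_limits[i].max_fs)
            return level_limits[i].level_idc;
    return 0;
}
```

For 320x240 this gives 20 x 15 = 300 macroblocks and level_idc 11. Under this particular rule 41 is never returned, because level 4.0 has the same MaxFS and sorts first, so a logged level=41 would have to come from a different rule or a different source.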

Recommended next move

Ship Fix 3 + Fix 4 instrumentation now. After Fix 4 log lines come back from higgs, decide whether Fix 1+2 need NAL-parsing (a real ~100 LoC addition) vs wire-protocol pass-through (operator's daedalus design call). I can do the small fix + ship a PR; the SPS-NAL parse work is a separate scoped commit.

Want me to write the patch?

Reference: marfrit/libva-v4l2-request-fourier#8