Files
libva-multiplanar/diff_against_ffmpeg.md
marfrit c6e3f58958 Phase 4 input: diff against FFmpeg + hantro kernel source identifies bug
Read FFmpeg's libavcodec/v4l2_request_h264.c (Kwiboo downstream
v4l2-request-n8.1 head b57fbbe, proven working on hantro hardware) and
Linux mainline drivers/media/platform/verisilicon/hantro_g1_h264_dec.c
(the actual register-write code).

Smoking gun: hantro G1 writes three DECODE_PARAMS fields directly into
hardware MMIO registers:

  dec_ref_pic_marking_bit_size  -> G1_REG_DEC_CTRL5_REFPIC_MK_LEN
  idr_pic_id                    -> G1_REG_DEC_CTRL5_IDR_PIC_ID
  pic_order_cnt_bit_size        -> G1_REG_DEC_CTRL6_POC_LENGTH

When all three are zero (our current state — patch 0008 left them
uninitialized with the open question "does hantro tolerate?"), the
hardware bitstream parser advances by zero bits past slice_type,
lands on garbage, decodes zero pixels into the CAPTURE buffer.

This explains the all-zero CAPTURE output we see across mpv and
Firefox in the live-session real-VO tests. Patch 0008's "Empirical
question" was empirically answered today: NO, hantro does not
tolerate zeros in these fields.

Additional easy fixes identified:
- pps->num_ref_idx_l0_default_active_minus1 (uninitialized;
  VAAPI provides; written to MMIO REFIDX0_ACTIVE)
- pps->num_ref_idx_l1_default_active_minus1 (same)
- pps->pic_parameter_set_id (uninitialized; for single-PPS BBB
  accidentally correct at 0; written to MMIO PPS_ID)
- pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT always
  (FFmpeg comment: "FFmpeg always provide a scaling matrix")
- always send V4L2_CID_STATELESS_H264_SCALING_MATRIX in the request
  (with default flat matrix when no VAIQMatrixBuffer arrives)

The load-bearing fix requires implementing a small slice_header()
bit-parser in libva-v4l2-request — VAAPI doesn't carry the
bit-position fields and hantro requires them. ~50-100 lines of C
in a new helper. This is real Phase 6 work, well-scoped.

Saved feedback memory:
- ../.claude/projects/-home-mfritsche-src/memory/
    feedback_kernel_source_audit_for_uapi_contract.md
  on the broader lesson: read kernel-side driver source for any
  UAPI control fields userspace fills, especially when the kernel
  writes them into hardware MMIO. "Empirical question -- does the
  kernel tolerate?" is not an acceptable resolution; either read
  the source or get a definitive empirical answer.

Reference sources gitignored — separate-repo discipline. Pull
recipes (ffmpeg-kwiboo + linux-mainline sparse clones) preserved
in shell history; can be reconstructed from diff_against_ffmpeg.md
URL/branch references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:08:43 +00:00

10 KiB
Raw Permalink Blame History

Phase 4 input — src/h264.c diff vs FFmpeg + hantro kernel-source ground truth

Reference sources:

  • FFmpeg (proven working on hantro): references/ffmpeg-kwiboo/libavcodec/v4l2_request_h264.c (Kwiboo's downstream v4l2-request-n8.1, head b57fbbe)
  • Linux mainline kernel (the actual hardware-register-write code): references/linux-mainline/drivers/media/platform/verisilicon/hantro_g1_h264_dec.c

Acquired 2026-05-04. Both gitignored from the campaign repo (separate-repo discipline; pull recipes in references/README if/when added).

The smoking gun — DECODE_PARAMS bit-size fields are MMIO-bound on hantro G1

hantro_g1_h264_dec.c::set_params() writes these dec_param fields directly into hardware registers:

dec_param field Register Our value FFmpeg's source
dec_ref_pic_marking_bit_size G1_REG_DEC_CTRL5_REFPIC_MK_LEN 0 (uninitialized) sl->ref_pic_marking_bit_size (bit-parsed from slice header)
idr_pic_id G1_REG_DEC_CTRL5_IDR_PIC_ID 0 (uninitialized) sl->idr_pic_id (bit-parsed from slice header)
pic_order_cnt_bit_size G1_REG_DEC_CTRL6_POC_LENGTH 0 (uninitialized) sl->pic_order_cnt_bit_size (bit-parsed from slice header)

When all three are zero, hantro's hardware bitstream parser thinks the slice header has zero ref_pic_marking bits and zero POC syntax bits, advances the bitstream pointer by zero past slice_type, lands on garbage, decodes zero pixels into the CAPTURE buffer.

Patch 0008's open question — "does hantro tolerate the bit_size fields being zero, or do we need a slice_header() bit-level parser?" — is empirically answered: NO, hantro does not tolerate zeros. The patch left this unresolved; the kernel source resolves it definitively (registers read, hardware enforces). The "contract before code" discipline was violated in the predecessor work — the contract (these fields drive hardware MMIO writes) was not actually verified against kernel source, and "Empirical question" was left as an unaudited TODO.

This explains the all-zero CAPTURE output we see across mpv and Firefox.

All hantro G1 dec_param reads

From hantro_g1_h264_dec.c::set_params():

Field Our coverage Status
dec_param->nal_ref_idc ✓ set from slice byte (line 378) ok
dec_param->frame_num ✓ set from VAPicture->frame_num (line 379) ok
dec_param->top_field_order_cnt ✓ set, with patch-0015 POC sentinel strip ok (verify strip works)
dec_param->bottom_field_order_cnt ✓ set, with patch-0015 POC sentinel strip ok (verify strip works)
dec_param->dec_ref_pic_marking_bit_size zero — written to MMIO REFPIC_MK_LEN FIX REQUIRED
dec_param->idr_pic_id zero — written to MMIO IDR_PIC_ID FIX REQUIRED
dec_param->pic_order_cnt_bit_size zero — written to MMIO POC_LENGTH FIX REQUIRED
dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC ✓ set from VAAPI field_pic_flag ok
dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD ✓ set from VAAPI flags ok
dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC ✓ set when nal_unit_type == 5 ok
dec_param->dpb[].reference_ts populated by patch 0017 verify against FFmpeg's ff_v4l2_request_get_capture_timestamp semantics
dec_param->dpb[].frame_num populated by patch 0017 ok

All hantro G1 sps reads

sps field Our coverage Status
sps->profile_idc ✓ set from VAProfile via h264_profile_to_idc() verify h264_profile_to_idc returns reasonable values
sps->max_num_ref_frames ✓ set from VAPicture->num_ref_frames ok
sps->chroma_format_idc ✓ set from VAPicture->seq_fields ok
sps->log2_max_frame_num_minus4 ✓ set from VAPicture->seq_fields ok
sps->flags (bits: MB_ADAPTIVE_FRAME_FIELD, FRAME_MBS_ONLY, DIRECT_8X8_INFERENCE, etc.) ✓ set from VAAPI seq_fields ok

All hantro G1 pps reads

pps field Our coverage Status
pps->chroma_qp_index_offset ok
pps->second_chroma_qp_index_offset ok
pps->pic_init_qp_minus26 ok
pps->weighted_bipred_idc ok
pps->pic_parameter_set_id zero (uninitialized) — written to MMIO G1_REG_DEC_CTRL6_PPS_ID For BBB single-PPS streams, ID=0 is accidentally correct; FIX (set explicitly to 0 or bit-parse from slice header) for general correctness
pps->num_ref_idx_l0_default_active_minus1 zero (uninitialized) — written to MMIO G1_REG_DEC_CTRL6_REFIDX0_ACTIVE(... + 1) so kernel computes 0 + 1 = 1 FIX REQUIRED — VAAPI VAPictureParameterBufferH264.num_ref_idx_l0_default_active_minus1 exists, just copy
pps->num_ref_idx_l1_default_active_minus1 zero (uninitialized) FIX REQUIRED — same as above
pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT ✗ Patch 0012 sets only when matrix_set is true; FFmpeg always sets it (with a "FFmpeg always provide a scaling matrix" comment) FIX REQUIRED — set unconditionally, and unconditionally send SCALING_MATRIX in the request (with default flat 16/16 matrix when no VAIQMatrixBuffer arrives)
pps->flags other bits ✓ set from VAAPI pic_fields ok

All FFmpeg per-request controls vs ours

FFmpeg's v4l2_request_h264_queue_decode for FRAME_BASED mode sends 4 controls:

  1. V4L2_CID_STATELESS_H264_SPS
  2. V4L2_CID_STATELESS_H264_PPS
  3. V4L2_CID_STATELESS_H264_SCALING_MATRIXalways
  4. V4L2_CID_STATELESS_H264_DECODE_PARAMS

Our src/h264.c for FRAME_BASED sends 3 controls (when matrix_set == false):

  1. SPS
  2. PPS
  3. DECODE_PARAMS (SCALING_MATRIX omitted by patch 0012)

This matters because hantro reads pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT (line 64 of hantro_g1_h264_dec.c) and sets G1_REG_DEC_CTRL2_TYPE1_QUANT_E based on it. If we don't set the flag and don't send the matrix, hantro's TYPE1_QUANT_E register bit stays clear and the hardware skips the per-matrix scaling. That's probably correct behavior actually — but for safety + parity with FFmpeg, set it always.

The bigger ambiguity: does hantro's hardware tolerate SCALING_MATRIX_PRESENT=0 + missing matrix control, OR does it expect the matrix to always be present? FFmpeg's "always provide a scaling matrix" comment suggests the latter is the safer assumption.

Fix priority

TIER 1 — easy, highest leverage, do first:

A. Set dec_param->idr_pic_id, dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size correctly. Requires bit-parsing the slice header. ~50100 lines of careful C in a new helper, e.g. h264_slice_header_extract_bitsizes(). Inputs: slice_data buffer, sps_id, sps fields (log2_max_frame_num_minus4, pic_order_cnt_type, log2_max_pic_order_cnt_lsb_minus4), pps_id, pps fields (bottom_field_pic_order_in_frame_present_flag), nal_unit_type, nal_ref_idc. Outputs: idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_bottom, delta_pic_order_cnt[0..1], pic_order_cnt_bit_size, dec_ref_pic_marking_bit_size. This is the load-bearing fix.

B. Copy pps->num_ref_idx_l0_default_active_minus1 and num_ref_idx_l1_default_active_minus1 from VAAPI. 2-line fix. Cheap insurance.

TIER 2 — easy, parity-with-FFmpeg, safe:

C. Set pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT always. And unconditionally send V4L2_CID_STATELESS_H264_SCALING_MATRIX in the request, with a default flat matrix (all 16/16/...) when no VAIQMatrixBuffer arrived from the consumer.

D. Verify patch 0015's POC sentinel strip is actually applied to the values that reach VIDIOC_S_EXT_CTRLS. Add a post-strip dump or VIDIOC_G_EXT_CTRLS readback — patch 0014's DEBUG dump runs before the strip in the code path, so we never confirmed the strip works.

TIER 3 — diagnostic plumbing (can run in parallel with the above):

E. Fix patch 0011's cache coherency bug (msync MS_INVALIDATE or DMA-BUF cache sync ioctl) so future probes are reliable.

F. Add a VIDIOC_G_EXT_CTRLS readback right before MEDIA_REQUEST_IOC_QUEUE to dump everything actually being submitted. Cross-validates fixes A-D land at the V4L2 layer.

Order of attack

  1. Tier 2C first (1-line fix to patch 0012 + small change to control-set composition) — cheapest possible test that disambiguates "scaling matrix is the only thing missing" from "we have multiple bugs."
  2. Tier 1B (2-line copy of num_ref_idx fields).
  3. Rebuild, reinstall on ohm, real-VO test. Check whether output is still all-zero or starts producing something.
  4. Tier 1A (the slice header bit parser) — the load-bearing fix. Requires care; cross-test against ffmpeg's H264SliceContext::ref_pic_marking_bit_size etc. computations.
  5. Rebuild, real-VO test. Bunny should appear.
  6. Tier 3E + 3F as observability infrastructure for future probes.

Why the predecessor work missed this

The Step 1 patch series was authored with cross-references to FFmpeg's v4l2_request_h264.c per the STUDY.md, but the kernel source-side audit was not performed. Patch 0008's commit message says "Empirical question: does hantro tolerate the bit_size fields being zero?" — the empirical answer was supposedly to be obtained "by running this patch and inspecting the CAPTURE buffer" (per picture.c comment). That inspection was performed via patch 0011's sentinel test, which had a cache-coherency bug (phase0_evidence/2026-05-04-kernel-trace/findings.md), so the empirical answer never landed. Plus "kernel source side audit" wasn't a step in the loop at that point.

The lesson: when the kernel code reads V4L2 control fields directly into MMIO registers, those fields are load-bearing at the contract level. Their values aren't merely "informational" — the hardware parses bitstream based on them. Setting them to zero isn't "passing through unspecified data," it's "lying to the hardware about how many bits to skip."

Adding to memory under feedback_kernel_source_audit_for_uapi_contract.md for the broader Phase 6 "contract before code" discipline.