# Phase 4 input — `src/h264.c` diff vs FFmpeg + hantro kernel-source ground truth Reference sources: - **FFmpeg** (proven working on hantro): `references/ffmpeg-kwiboo/libavcodec/v4l2_request_h264.c` (Kwiboo's downstream `v4l2-request-n8.1`, head `b57fbbe`) - **Linux mainline kernel** (the actual hardware-register-write code): `references/linux-mainline/drivers/media/platform/verisilicon/hantro_g1_h264_dec.c` Acquired 2026-05-04. Both gitignored from the campaign repo (separate-repo discipline; pull recipes in `references/README` if/when added). ## The smoking gun — DECODE_PARAMS bit-size fields are MMIO-bound on hantro G1 `hantro_g1_h264_dec.c::set_params()` writes these dec_param fields **directly into hardware registers**: | dec_param field | Register | Our value | FFmpeg's source | |---|---|---|---| | `dec_ref_pic_marking_bit_size` | `G1_REG_DEC_CTRL5_REFPIC_MK_LEN` | **0** (uninitialized) | `sl->ref_pic_marking_bit_size` (bit-parsed from slice header) | | `idr_pic_id` | `G1_REG_DEC_CTRL5_IDR_PIC_ID` | **0** (uninitialized) | `sl->idr_pic_id` (bit-parsed from slice header) | | `pic_order_cnt_bit_size` | `G1_REG_DEC_CTRL6_POC_LENGTH` | **0** (uninitialized) | `sl->pic_order_cnt_bit_size` (bit-parsed from slice header) | When all three are zero, hantro's hardware bitstream parser thinks the slice header has zero ref_pic_marking bits and zero POC syntax bits, advances the bitstream pointer by zero past `slice_type`, lands on garbage, decodes zero pixels into the CAPTURE buffer. **Patch 0008's open question** — "does hantro tolerate the bit_size fields being zero, or do we need a slice_header() bit-level parser?" — is **empirically answered: NO, hantro does not tolerate zeros**. The patch left this unresolved; the kernel source resolves it definitively (registers read, hardware enforces). The "contract before code" discipline was violated in the predecessor work — the contract (these fields drive hardware MMIO writes) was not actually verified against kernel source, and "Empirical question" was left as an unaudited TODO. This explains the all-zero CAPTURE output we see across mpv and Firefox. ## All hantro G1 dec_param reads From `hantro_g1_h264_dec.c::set_params()`: | Field | Our coverage | Status | |---|---|---| | `dec_param->nal_ref_idc` | ✓ set from slice byte (line 378) | ok | | `dec_param->frame_num` | ✓ set from `VAPicture->frame_num` (line 379) | ok | | `dec_param->top_field_order_cnt` | ✓ set, with patch-0015 POC sentinel strip | ok (verify strip works) | | `dec_param->bottom_field_order_cnt` | ✓ set, with patch-0015 POC sentinel strip | ok (verify strip works) | | `dec_param->dec_ref_pic_marking_bit_size` | ✗ **zero — written to MMIO REFPIC_MK_LEN** | **FIX REQUIRED** | | `dec_param->idr_pic_id` | ✗ **zero — written to MMIO IDR_PIC_ID** | **FIX REQUIRED** | | `dec_param->pic_order_cnt_bit_size` | ✗ **zero — written to MMIO POC_LENGTH** | **FIX REQUIRED** | | `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC` | ✓ set from VAAPI field_pic_flag | ok | | `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD` | ✓ set from VAAPI flags | ok | | `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC` | ✓ set when nal_unit_type == 5 | ok | | `dec_param->dpb[].reference_ts` | populated by patch 0017 | verify against FFmpeg's `ff_v4l2_request_get_capture_timestamp` semantics | | `dec_param->dpb[].frame_num` | populated by patch 0017 | ok | ## All hantro G1 sps reads | sps field | Our coverage | Status | |---|---|---| | `sps->profile_idc` | ✓ set from VAProfile via `h264_profile_to_idc()` | verify h264_profile_to_idc returns reasonable values | | `sps->max_num_ref_frames` | ✓ set from `VAPicture->num_ref_frames` | ok | | `sps->chroma_format_idc` | ✓ set from `VAPicture->seq_fields` | ok | | `sps->log2_max_frame_num_minus4` | ✓ set from `VAPicture->seq_fields` | ok | | `sps->flags` (bits: `MB_ADAPTIVE_FRAME_FIELD`, `FRAME_MBS_ONLY`, `DIRECT_8X8_INFERENCE`, etc.) | ✓ set from VAAPI seq_fields | ok | ## All hantro G1 pps reads | pps field | Our coverage | Status | |---|---|---| | `pps->chroma_qp_index_offset` | ✓ | ok | | `pps->second_chroma_qp_index_offset` | ✓ | ok | | `pps->pic_init_qp_minus26` | ✓ | ok | | `pps->weighted_bipred_idc` | ✓ | ok | | `pps->pic_parameter_set_id` | ✗ **zero (uninitialized)** — written to MMIO `G1_REG_DEC_CTRL6_PPS_ID` | For BBB single-PPS streams, ID=0 is accidentally correct; **FIX (set explicitly to 0 or bit-parse from slice header) for general correctness** | | `pps->num_ref_idx_l0_default_active_minus1` | ✗ **zero (uninitialized)** — written to MMIO `G1_REG_DEC_CTRL6_REFIDX0_ACTIVE(... + 1)` so kernel computes `0 + 1 = 1` | **FIX REQUIRED** — VAAPI `VAPictureParameterBufferH264.num_ref_idx_l0_default_active_minus1` exists, just copy | | `pps->num_ref_idx_l1_default_active_minus1` | ✗ **zero (uninitialized)** | **FIX REQUIRED** — same as above | | `pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` | ✗ Patch 0012 sets only when `matrix_set` is true; **FFmpeg always sets it** (with a "FFmpeg always provide a scaling matrix" comment) | **FIX REQUIRED** — set unconditionally, and unconditionally send SCALING_MATRIX in the request (with default flat 16/16 matrix when no VAIQMatrixBuffer arrives) | | `pps->flags` other bits | ✓ set from VAAPI pic_fields | ok | ## All FFmpeg per-request controls vs ours FFmpeg's `v4l2_request_h264_queue_decode` for FRAME_BASED mode sends 4 controls: 1. `V4L2_CID_STATELESS_H264_SPS` 2. `V4L2_CID_STATELESS_H264_PPS` 3. `V4L2_CID_STATELESS_H264_SCALING_MATRIX` ← **always** 4. `V4L2_CID_STATELESS_H264_DECODE_PARAMS` Our `src/h264.c` for FRAME_BASED sends 3 controls (when `matrix_set == false`): 1. SPS 2. PPS 3. DECODE_PARAMS (SCALING_MATRIX omitted by patch 0012) This matters because hantro reads `pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` (line 64 of `hantro_g1_h264_dec.c`) and sets `G1_REG_DEC_CTRL2_TYPE1_QUANT_E` based on it. If we *don't* set the flag and *don't* send the matrix, hantro's `TYPE1_QUANT_E` register bit stays clear and the hardware skips the per-matrix scaling. That's probably *correct* behavior actually — but for safety + parity with FFmpeg, set it always. The bigger ambiguity: does hantro's hardware tolerate `SCALING_MATRIX_PRESENT=0` + missing matrix control, OR does it expect the matrix to always be present? FFmpeg's "always provide a scaling matrix" comment suggests the latter is the safer assumption. ## Fix priority **TIER 1 — easy, highest leverage, do first**: A. **Set `dec_param->idr_pic_id`, `dec_ref_pic_marking_bit_size`, `pic_order_cnt_bit_size` correctly.** Requires bit-parsing the slice header. ~50–100 lines of careful C in a new helper, e.g. `h264_slice_header_extract_bitsizes()`. Inputs: slice_data buffer, `sps_id`, sps fields (`log2_max_frame_num_minus4`, `pic_order_cnt_type`, `log2_max_pic_order_cnt_lsb_minus4`), `pps_id`, pps fields (`bottom_field_pic_order_in_frame_present_flag`), `nal_unit_type`, `nal_ref_idc`. Outputs: `idr_pic_id`, `pic_order_cnt_lsb`, `delta_pic_order_cnt_bottom`, `delta_pic_order_cnt[0..1]`, `pic_order_cnt_bit_size`, `dec_ref_pic_marking_bit_size`. **This is the load-bearing fix.** B. **Copy `pps->num_ref_idx_l0_default_active_minus1` and `num_ref_idx_l1_default_active_minus1` from VAAPI.** 2-line fix. Cheap insurance. **TIER 2 — easy, parity-with-FFmpeg, safe**: C. **Set `pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` always.** And unconditionally send `V4L2_CID_STATELESS_H264_SCALING_MATRIX` in the request, with a default flat matrix (all 16/16/...) when no VAIQMatrixBuffer arrived from the consumer. D. **Verify patch 0015's POC sentinel strip is actually applied** to the values that reach VIDIOC_S_EXT_CTRLS. Add a post-strip dump or VIDIOC_G_EXT_CTRLS readback — patch 0014's DEBUG dump runs *before* the strip in the code path, so we never confirmed the strip works. **TIER 3 — diagnostic plumbing (can run in parallel with the above)**: E. **Fix patch 0011's cache coherency bug** (msync MS_INVALIDATE or DMA-BUF cache sync ioctl) so future probes are reliable. F. **Add a VIDIOC_G_EXT_CTRLS readback** right before `MEDIA_REQUEST_IOC_QUEUE` to dump everything actually being submitted. Cross-validates fixes A-D land at the V4L2 layer. ## Order of attack 1. **Tier 2C first** (1-line fix to patch 0012 + small change to control-set composition) — cheapest possible test that disambiguates "scaling matrix is the only thing missing" from "we have multiple bugs." 2. **Tier 1B** (2-line copy of num_ref_idx fields). 3. Rebuild, reinstall on ohm, real-VO test. Check whether output is still all-zero or starts producing something. 4. **Tier 1A** (the slice header bit parser) — the load-bearing fix. Requires care; cross-test against ffmpeg's `H264SliceContext::ref_pic_marking_bit_size` etc. computations. 5. Rebuild, real-VO test. Bunny should appear. 6. **Tier 3E + 3F** as observability infrastructure for future probes. ## Why the predecessor work missed this The Step 1 patch series was authored with cross-references to FFmpeg's `v4l2_request_h264.c` per the `STUDY.md`, but the **kernel source-side audit was not performed**. Patch 0008's commit message says "Empirical question: does hantro tolerate the bit_size fields being zero?" — the empirical answer was supposedly to be obtained "by running this patch and inspecting the CAPTURE buffer" (per `picture.c` comment). That inspection was performed via patch 0011's sentinel test, which had a cache-coherency bug (`phase0_evidence/2026-05-04-kernel-trace/findings.md`), so the empirical answer never landed. Plus "kernel source side audit" wasn't a step in the loop at that point. The lesson: when the kernel code reads V4L2 control fields directly into MMIO registers, those fields are **load-bearing** at the contract level. Their values aren't merely "informational" — the hardware parses bitstream based on them. Setting them to zero isn't "passing through unspecified data," it's "lying to the hardware about how many bits to skip." Adding to memory under `feedback_kernel_source_audit_for_uapi_contract.md` for the broader Phase 6 "contract before code" discipline.