c6e3f58958
Read FFmpeg's libavcodec/v4l2_request_h264.c (Kwiboo downstream
v4l2-request-n8.1 head b57fbbe, proven working on hantro hardware) and
Linux mainline drivers/media/platform/verisilicon/hantro_g1_h264_dec.c
(the actual register-write code).
Smoking gun: hantro G1 writes three DECODE_PARAMS fields directly into
hardware MMIO registers:
dec_ref_pic_marking_bit_size -> G1_REG_DEC_CTRL5_REFPIC_MK_LEN
idr_pic_id -> G1_REG_DEC_CTRL5_IDR_PIC_ID
pic_order_cnt_bit_size -> G1_REG_DEC_CTRL6_POC_LENGTH
When all three are zero (our current state — patch 0008 left them
uninitialized with the open question "does hantro tolerate?"), the
hardware bitstream parser advances by zero bits past slice_type,
lands on garbage, decodes zero pixels into the CAPTURE buffer.
This explains the all-zero CAPTURE output we see across mpv and
Firefox in the live-session real-VO tests. Patch 0008's "Empirical
question" was empirically answered today: NO, hantro does not
tolerate zeros in these fields.
Additional easy fixes identified:
- pps->num_ref_idx_l0_default_active_minus1 (uninitialized;
VAAPI provides; written to MMIO REFIDX0_ACTIVE)
- pps->num_ref_idx_l1_default_active_minus1 (same)
- pps->pic_parameter_set_id (uninitialized; for single-PPS BBB
accidentally correct at 0; written to MMIO PPS_ID)
- pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT always
(FFmpeg comment: "FFmpeg always provide a scaling matrix")
- always send V4L2_CID_STATELESS_H264_SCALING_MATRIX in the request
(with default flat matrix when no VAIQMatrixBuffer arrives)
The load-bearing fix requires implementing a small slice_header()
bit-parser in libva-v4l2-request — VAAPI doesn't carry the
bit-position fields and hantro requires them. ~50-100 lines of C
in a new helper. This is real Phase 6 work, well-scoped.
Saved feedback memory:
- ../.claude/projects/-home-mfritsche-src/memory/
feedback_kernel_source_audit_for_uapi_contract.md
on the broader lesson: read kernel-side driver source for any
UAPI control fields userspace fills, especially when the kernel
writes them into hardware MMIO. "Empirical question -- does the
kernel tolerate?" is not an acceptable resolution; either read
the source or get a definitive empirical answer.
Reference sources gitignored — separate-repo discipline. Pull
recipes (ffmpeg-kwiboo + linux-mainline sparse clones) preserved
in shell history; can be reconstructed from diff_against_ffmpeg.md
URL/branch references.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
122 lines
10 KiB
Markdown
122 lines
10 KiB
Markdown
# Phase 4 input — `src/h264.c` diff vs FFmpeg + hantro kernel-source ground truth
|
||
|
||
Reference sources:
|
||
- **FFmpeg** (proven working on hantro): `references/ffmpeg-kwiboo/libavcodec/v4l2_request_h264.c` (Kwiboo's downstream `v4l2-request-n8.1`, head `b57fbbe`)
|
||
- **Linux mainline kernel** (the actual hardware-register-write code): `references/linux-mainline/drivers/media/platform/verisilicon/hantro_g1_h264_dec.c`
|
||
|
||
Acquired 2026-05-04. Both gitignored from the campaign repo (separate-repo discipline; pull recipes in `references/README` if/when added).
|
||
|
||
## The smoking gun — DECODE_PARAMS bit-size fields are MMIO-bound on hantro G1
|
||
|
||
`hantro_g1_h264_dec.c::set_params()` writes these dec_param fields **directly into hardware registers**:
|
||
|
||
| dec_param field | Register | Our value | FFmpeg's source |
|
||
|---|---|---|---|
|
||
| `dec_ref_pic_marking_bit_size` | `G1_REG_DEC_CTRL5_REFPIC_MK_LEN` | **0** (uninitialized) | `sl->ref_pic_marking_bit_size` (bit-parsed from slice header) |
|
||
| `idr_pic_id` | `G1_REG_DEC_CTRL5_IDR_PIC_ID` | **0** (uninitialized) | `sl->idr_pic_id` (bit-parsed from slice header) |
|
||
| `pic_order_cnt_bit_size` | `G1_REG_DEC_CTRL6_POC_LENGTH` | **0** (uninitialized) | `sl->pic_order_cnt_bit_size` (bit-parsed from slice header) |
|
||
|
||
When all three are zero, hantro's hardware bitstream parser thinks the slice header has zero ref_pic_marking bits and zero POC syntax bits, advances the bitstream pointer by zero past `slice_type`, lands on garbage, decodes zero pixels into the CAPTURE buffer.
|
||
|
||
**Patch 0008's open question** — "does hantro tolerate the bit_size fields being zero, or do we need a slice_header() bit-level parser?" — is **empirically answered: NO, hantro does not tolerate zeros**. The patch left this unresolved; the kernel source resolves it definitively (registers read, hardware enforces). The "contract before code" discipline was violated in the predecessor work — the contract (these fields drive hardware MMIO writes) was not actually verified against kernel source, and "Empirical question" was left as an unaudited TODO.
|
||
|
||
This explains the all-zero CAPTURE output we see across mpv and Firefox.
|
||
|
||
## All hantro G1 dec_param reads
|
||
|
||
From `hantro_g1_h264_dec.c::set_params()`:
|
||
|
||
| Field | Our coverage | Status |
|
||
|---|---|---|
|
||
| `dec_param->nal_ref_idc` | ✓ set from slice byte (line 378) | ok |
|
||
| `dec_param->frame_num` | ✓ set from `VAPicture->frame_num` (line 379) | ok |
|
||
| `dec_param->top_field_order_cnt` | ✓ set, with patch-0015 POC sentinel strip | ok (verify strip works) |
|
||
| `dec_param->bottom_field_order_cnt` | ✓ set, with patch-0015 POC sentinel strip | ok (verify strip works) |
|
||
| `dec_param->dec_ref_pic_marking_bit_size` | ✗ **zero — written to MMIO REFPIC_MK_LEN** | **FIX REQUIRED** |
|
||
| `dec_param->idr_pic_id` | ✗ **zero — written to MMIO IDR_PIC_ID** | **FIX REQUIRED** |
|
||
| `dec_param->pic_order_cnt_bit_size` | ✗ **zero — written to MMIO POC_LENGTH** | **FIX REQUIRED** |
|
||
| `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC` | ✓ set from VAAPI field_pic_flag | ok |
|
||
| `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD` | ✓ set from VAAPI flags | ok |
|
||
| `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC` | ✓ set when nal_unit_type == 5 | ok |
|
||
| `dec_param->dpb[].reference_ts` | populated by patch 0017 | verify against FFmpeg's `ff_v4l2_request_get_capture_timestamp` semantics |
|
||
| `dec_param->dpb[].frame_num` | populated by patch 0017 | ok |
|
||
|
||
## All hantro G1 sps reads
|
||
|
||
| sps field | Our coverage | Status |
|
||
|---|---|---|
|
||
| `sps->profile_idc` | ✓ set from VAProfile via `h264_profile_to_idc()` | verify h264_profile_to_idc returns reasonable values |
|
||
| `sps->max_num_ref_frames` | ✓ set from `VAPicture->num_ref_frames` | ok |
|
||
| `sps->chroma_format_idc` | ✓ set from `VAPicture->seq_fields` | ok |
|
||
| `sps->log2_max_frame_num_minus4` | ✓ set from `VAPicture->seq_fields` | ok |
|
||
| `sps->flags` (bits: `MB_ADAPTIVE_FRAME_FIELD`, `FRAME_MBS_ONLY`, `DIRECT_8X8_INFERENCE`, etc.) | ✓ set from VAAPI seq_fields | ok |
|
||
|
||
## All hantro G1 pps reads
|
||
|
||
| pps field | Our coverage | Status |
|
||
|---|---|---|
|
||
| `pps->chroma_qp_index_offset` | ✓ | ok |
|
||
| `pps->second_chroma_qp_index_offset` | ✓ | ok |
|
||
| `pps->pic_init_qp_minus26` | ✓ | ok |
|
||
| `pps->weighted_bipred_idc` | ✓ | ok |
|
||
| `pps->pic_parameter_set_id` | ✗ **zero (uninitialized)** — written to MMIO `G1_REG_DEC_CTRL6_PPS_ID` | For BBB single-PPS streams, ID=0 is accidentally correct; **FIX (set explicitly to 0 or bit-parse from slice header) for general correctness** |
|
||
| `pps->num_ref_idx_l0_default_active_minus1` | ✗ **zero (uninitialized)** — written to MMIO `G1_REG_DEC_CTRL6_REFIDX0_ACTIVE(... + 1)` so kernel computes `0 + 1 = 1` | **FIX REQUIRED** — VAAPI `VAPictureParameterBufferH264.num_ref_idx_l0_default_active_minus1` exists, just copy |
|
||
| `pps->num_ref_idx_l1_default_active_minus1` | ✗ **zero (uninitialized)** | **FIX REQUIRED** — same as above |
|
||
| `pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` | ✗ Patch 0012 sets only when `matrix_set` is true; **FFmpeg always sets it** (with a "FFmpeg always provide a scaling matrix" comment) | **FIX REQUIRED** — set unconditionally, and unconditionally send SCALING_MATRIX in the request (with default flat 16/16 matrix when no VAIQMatrixBuffer arrives) |
|
||
| `pps->flags` other bits | ✓ set from VAAPI pic_fields | ok |
|
||
|
||
## All FFmpeg per-request controls vs ours
|
||
|
||
FFmpeg's `v4l2_request_h264_queue_decode` for FRAME_BASED mode sends 4 controls:
|
||
1. `V4L2_CID_STATELESS_H264_SPS`
|
||
2. `V4L2_CID_STATELESS_H264_PPS`
|
||
3. `V4L2_CID_STATELESS_H264_SCALING_MATRIX` ← **always**
|
||
4. `V4L2_CID_STATELESS_H264_DECODE_PARAMS`
|
||
|
||
Our `src/h264.c` for FRAME_BASED sends 3 controls (when `matrix_set == false`):
|
||
1. SPS
|
||
2. PPS
|
||
3. DECODE_PARAMS
|
||
(SCALING_MATRIX omitted by patch 0012)
|
||
|
||
This matters because hantro reads `pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` (line 64 of `hantro_g1_h264_dec.c`) and sets `G1_REG_DEC_CTRL2_TYPE1_QUANT_E` based on it. If we *don't* set the flag and *don't* send the matrix, hantro's `TYPE1_QUANT_E` register bit stays clear and the hardware skips the per-matrix scaling. That's probably *correct* behavior actually — but for safety + parity with FFmpeg, set it always.
|
||
|
||
The bigger ambiguity: does hantro's hardware tolerate `SCALING_MATRIX_PRESENT=0` + missing matrix control, OR does it expect the matrix to always be present? FFmpeg's "always provide a scaling matrix" comment suggests the latter is the safer assumption.
|
||
|
||
## Fix priority
|
||
|
||
**TIER 1 — easy, highest leverage, do first**:
|
||
|
||
A. **Set `dec_param->idr_pic_id`, `dec_ref_pic_marking_bit_size`, `pic_order_cnt_bit_size` correctly.** Requires bit-parsing the slice header. ~50–100 lines of careful C in a new helper, e.g. `h264_slice_header_extract_bitsizes()`. Inputs: slice_data buffer, `sps_id`, sps fields (`log2_max_frame_num_minus4`, `pic_order_cnt_type`, `log2_max_pic_order_cnt_lsb_minus4`), `pps_id`, pps fields (`bottom_field_pic_order_in_frame_present_flag`), `nal_unit_type`, `nal_ref_idc`. Outputs: `idr_pic_id`, `pic_order_cnt_lsb`, `delta_pic_order_cnt_bottom`, `delta_pic_order_cnt[0..1]`, `pic_order_cnt_bit_size`, `dec_ref_pic_marking_bit_size`. **This is the load-bearing fix.**
|
||
|
||
B. **Copy `pps->num_ref_idx_l0_default_active_minus1` and `num_ref_idx_l1_default_active_minus1` from VAAPI.** 2-line fix. Cheap insurance.
|
||
|
||
**TIER 2 — easy, parity-with-FFmpeg, safe**:
|
||
|
||
C. **Set `pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` always.** And unconditionally send `V4L2_CID_STATELESS_H264_SCALING_MATRIX` in the request, with a default flat matrix (all 16/16/...) when no VAIQMatrixBuffer arrived from the consumer.
|
||
|
||
D. **Verify patch 0015's POC sentinel strip is actually applied** to the values that reach VIDIOC_S_EXT_CTRLS. Add a post-strip dump or VIDIOC_G_EXT_CTRLS readback — patch 0014's DEBUG dump runs *before* the strip in the code path, so we never confirmed the strip works.
|
||
|
||
**TIER 3 — diagnostic plumbing (can run in parallel with the above)**:
|
||
|
||
E. **Fix patch 0011's cache coherency bug** (msync MS_INVALIDATE or DMA-BUF cache sync ioctl) so future probes are reliable.
|
||
|
||
F. **Add a VIDIOC_G_EXT_CTRLS readback** right before `MEDIA_REQUEST_IOC_QUEUE` to dump everything actually being submitted. Cross-validates fixes A-D land at the V4L2 layer.
|
||
|
||
## Order of attack
|
||
|
||
1. **Tier 2C first** (1-line fix to patch 0012 + small change to control-set composition) — cheapest possible test that disambiguates "scaling matrix is the only thing missing" from "we have multiple bugs."
|
||
2. **Tier 1B** (2-line copy of num_ref_idx fields).
|
||
3. Rebuild, reinstall on ohm, real-VO test. Check whether output is still all-zero or starts producing something.
|
||
4. **Tier 1A** (the slice header bit parser) — the load-bearing fix. Requires care; cross-test against ffmpeg's `H264SliceContext::ref_pic_marking_bit_size` etc. computations.
|
||
5. Rebuild, real-VO test. Bunny should appear.
|
||
6. **Tier 3E + 3F** as observability infrastructure for future probes.
|
||
|
||
## Why the predecessor work missed this
|
||
|
||
The Step 1 patch series was authored with cross-references to FFmpeg's `v4l2_request_h264.c` per the `STUDY.md`, but the **kernel source-side audit was not performed**. Patch 0008's commit message says "Empirical question: does hantro tolerate the bit_size fields being zero?" — the empirical answer was supposedly to be obtained "by running this patch and inspecting the CAPTURE buffer" (per `picture.c` comment). That inspection was performed via patch 0011's sentinel test, which had a cache-coherency bug (`phase0_evidence/2026-05-04-kernel-trace/findings.md`), so the empirical answer never landed. Plus "kernel source side audit" wasn't a step in the loop at that point.
|
||
|
||
The lesson: when the kernel code reads V4L2 control fields directly into MMIO registers, those fields are **load-bearing** at the contract level. Their values aren't merely "informational" — the hardware parses bitstream based on them. Setting them to zero isn't "passing through unspecified data," it's "lying to the hardware about how many bits to skip."
|
||
|
||
Adding to memory under `feedback_kernel_source_audit_for_uapi_contract.md` for the broader Phase 6 "contract before code" discipline.
|