Files
libva-multiplanar/diff_against_ffmpeg.md
T
marfrit c6e3f58958 Phase 4 input: diff against FFmpeg + hantro kernel source identifies bug
Read FFmpeg's libavcodec/v4l2_request_h264.c (Kwiboo downstream
v4l2-request-n8.1 head b57fbbe, proven working on hantro hardware) and
Linux mainline drivers/media/platform/verisilicon/hantro_g1_h264_dec.c
(the actual register-write code).

Smoking gun: hantro G1 writes three DECODE_PARAMS fields directly into
hardware MMIO registers:

  dec_ref_pic_marking_bit_size  -> G1_REG_DEC_CTRL5_REFPIC_MK_LEN
  idr_pic_id                    -> G1_REG_DEC_CTRL5_IDR_PIC_ID
  pic_order_cnt_bit_size        -> G1_REG_DEC_CTRL6_POC_LENGTH

When all three are zero (our current state — patch 0008 left them
uninitialized with the open question "does hantro tolerate?"), the
hardware bitstream parser advances by zero bits past slice_type,
lands on garbage, decodes zero pixels into the CAPTURE buffer.

This explains the all-zero CAPTURE output we see across mpv and
Firefox in the live-session real-VO tests. Patch 0008's "Empirical
question" was empirically answered today: NO, hantro does not
tolerate zeros in these fields.

Additional easy fixes identified:
- pps->num_ref_idx_l0_default_active_minus1 (uninitialized;
  VAAPI provides; written to MMIO REFIDX0_ACTIVE)
- pps->num_ref_idx_l1_default_active_minus1 (same)
- pps->pic_parameter_set_id (uninitialized; for single-PPS BBB
  accidentally correct at 0; written to MMIO PPS_ID)
- pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT always
  (FFmpeg comment: "FFmpeg always provide a scaling matrix")
- always send V4L2_CID_STATELESS_H264_SCALING_MATRIX in the request
  (with default flat matrix when no VAIQMatrixBuffer arrives)

The load-bearing fix requires implementing a small slice_header()
bit-parser in libva-v4l2-request — VAAPI doesn't carry the
bit-position fields and hantro requires them. ~50-100 lines of C
in a new helper. This is real Phase 6 work, well-scoped.

Saved feedback memory:
- ../.claude/projects/-home-mfritsche-src/memory/
    feedback_kernel_source_audit_for_uapi_contract.md
  on the broader lesson: read kernel-side driver source for any
  UAPI control fields userspace fills, especially when the kernel
  writes them into hardware MMIO. "Empirical question -- does the
  kernel tolerate?" is not an acceptable resolution; either read
  the source or get a definitive empirical answer.

Reference sources gitignored — separate-repo discipline. Pull
recipes (ffmpeg-kwiboo + linux-mainline sparse clones) preserved
in shell history; can be reconstructed from diff_against_ffmpeg.md
URL/branch references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:08:43 +00:00

122 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 4 input — `src/h264.c` diff vs FFmpeg + hantro kernel-source ground truth
Reference sources:
- **FFmpeg** (proven working on hantro): `references/ffmpeg-kwiboo/libavcodec/v4l2_request_h264.c` (Kwiboo's downstream `v4l2-request-n8.1`, head `b57fbbe`)
- **Linux mainline kernel** (the actual hardware-register-write code): `references/linux-mainline/drivers/media/platform/verisilicon/hantro_g1_h264_dec.c`
Acquired 2026-05-04. Both gitignored from the campaign repo (separate-repo discipline; pull recipes in `references/README` if/when added).
## The smoking gun — DECODE_PARAMS bit-size fields are MMIO-bound on hantro G1
`hantro_g1_h264_dec.c::set_params()` writes these dec_param fields **directly into hardware registers**:
| dec_param field | Register | Our value | FFmpeg's source |
|---|---|---|---|
| `dec_ref_pic_marking_bit_size` | `G1_REG_DEC_CTRL5_REFPIC_MK_LEN` | **0** (uninitialized) | `sl->ref_pic_marking_bit_size` (bit-parsed from slice header) |
| `idr_pic_id` | `G1_REG_DEC_CTRL5_IDR_PIC_ID` | **0** (uninitialized) | `sl->idr_pic_id` (bit-parsed from slice header) |
| `pic_order_cnt_bit_size` | `G1_REG_DEC_CTRL6_POC_LENGTH` | **0** (uninitialized) | `sl->pic_order_cnt_bit_size` (bit-parsed from slice header) |
When all three are zero, hantro's hardware bitstream parser thinks the slice header has zero ref_pic_marking bits and zero POC syntax bits, advances the bitstream pointer by zero past `slice_type`, lands on garbage, decodes zero pixels into the CAPTURE buffer.
**Patch 0008's open question** — "does hantro tolerate the bit_size fields being zero, or do we need a slice_header() bit-level parser?" — is **empirically answered: NO, hantro does not tolerate zeros**. The patch left this unresolved; the kernel source resolves it definitively (registers read, hardware enforces). The "contract before code" discipline was violated in the predecessor work — the contract (these fields drive hardware MMIO writes) was not actually verified against kernel source, and "Empirical question" was left as an unaudited TODO.
This explains the all-zero CAPTURE output we see across mpv and Firefox.
## All hantro G1 dec_param reads
From `hantro_g1_h264_dec.c::set_params()`:
| Field | Our coverage | Status |
|---|---|---|
| `dec_param->nal_ref_idc` | ✓ set from slice byte (line 378) | ok |
| `dec_param->frame_num` | ✓ set from `VAPicture->frame_num` (line 379) | ok |
| `dec_param->top_field_order_cnt` | ✓ set, with patch-0015 POC sentinel strip | ok (verify strip works) |
| `dec_param->bottom_field_order_cnt` | ✓ set, with patch-0015 POC sentinel strip | ok (verify strip works) |
| `dec_param->dec_ref_pic_marking_bit_size` | ✗ **zero — written to MMIO REFPIC_MK_LEN** | **FIX REQUIRED** |
| `dec_param->idr_pic_id` | ✗ **zero — written to MMIO IDR_PIC_ID** | **FIX REQUIRED** |
| `dec_param->pic_order_cnt_bit_size` | ✗ **zero — written to MMIO POC_LENGTH** | **FIX REQUIRED** |
| `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC` | ✓ set from VAAPI field_pic_flag | ok |
| `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD` | ✓ set from VAAPI flags | ok |
| `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC` | ✓ set when nal_unit_type == 5 | ok |
| `dec_param->dpb[].reference_ts` | populated by patch 0017 | verify against FFmpeg's `ff_v4l2_request_get_capture_timestamp` semantics |
| `dec_param->dpb[].frame_num` | populated by patch 0017 | ok |
## All hantro G1 sps reads
| sps field | Our coverage | Status |
|---|---|---|
| `sps->profile_idc` | ✓ set from VAProfile via `h264_profile_to_idc()` | verify h264_profile_to_idc returns reasonable values |
| `sps->max_num_ref_frames` | ✓ set from `VAPicture->num_ref_frames` | ok |
| `sps->chroma_format_idc` | ✓ set from `VAPicture->seq_fields` | ok |
| `sps->log2_max_frame_num_minus4` | ✓ set from `VAPicture->seq_fields` | ok |
| `sps->flags` (bits: `MB_ADAPTIVE_FRAME_FIELD`, `FRAME_MBS_ONLY`, `DIRECT_8X8_INFERENCE`, etc.) | ✓ set from VAAPI seq_fields | ok |
## All hantro G1 pps reads
| pps field | Our coverage | Status |
|---|---|---|
| `pps->chroma_qp_index_offset` | ✓ | ok |
| `pps->second_chroma_qp_index_offset` | ✓ | ok |
| `pps->pic_init_qp_minus26` | ✓ | ok |
| `pps->weighted_bipred_idc` | ✓ | ok |
| `pps->pic_parameter_set_id` | ✗ **zero (uninitialized)** — written to MMIO `G1_REG_DEC_CTRL6_PPS_ID` | For BBB single-PPS streams, ID=0 is accidentally correct; **FIX (set explicitly to 0 or bit-parse from slice header) for general correctness** |
| `pps->num_ref_idx_l0_default_active_minus1` | ✗ **zero (uninitialized)** — written to MMIO `G1_REG_DEC_CTRL6_REFIDX0_ACTIVE(... + 1)` so kernel computes `0 + 1 = 1` | **FIX REQUIRED** — VAAPI `VAPictureParameterBufferH264.num_ref_idx_l0_default_active_minus1` exists, just copy |
| `pps->num_ref_idx_l1_default_active_minus1` | ✗ **zero (uninitialized)** | **FIX REQUIRED** — same as above |
| `pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` | ✗ Patch 0012 sets only when `matrix_set` is true; **FFmpeg always sets it** (with a "FFmpeg always provide a scaling matrix" comment) | **FIX REQUIRED** — set unconditionally, and unconditionally send SCALING_MATRIX in the request (with default flat 16/16 matrix when no VAIQMatrixBuffer arrives) |
| `pps->flags` other bits | ✓ set from VAAPI pic_fields | ok |
## All FFmpeg per-request controls vs ours
FFmpeg's `v4l2_request_h264_queue_decode` for FRAME_BASED mode sends 4 controls:
1. `V4L2_CID_STATELESS_H264_SPS`
2. `V4L2_CID_STATELESS_H264_PPS`
3. `V4L2_CID_STATELESS_H264_SCALING_MATRIX`**always**
4. `V4L2_CID_STATELESS_H264_DECODE_PARAMS`
Our `src/h264.c` for FRAME_BASED sends 3 controls (when `matrix_set == false`):
1. SPS
2. PPS
3. DECODE_PARAMS
(SCALING_MATRIX omitted by patch 0012)
This matters because hantro reads `pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` (line 64 of `hantro_g1_h264_dec.c`) and sets `G1_REG_DEC_CTRL2_TYPE1_QUANT_E` based on it. If we *don't* set the flag and *don't* send the matrix, hantro's `TYPE1_QUANT_E` register bit stays clear and the hardware skips the per-matrix scaling. That's probably *correct* behavior actually — but for safety + parity with FFmpeg, set it always.
The bigger ambiguity: does hantro's hardware tolerate `SCALING_MATRIX_PRESENT=0` + missing matrix control, OR does it expect the matrix to always be present? FFmpeg's "always provide a scaling matrix" comment suggests the latter is the safer assumption.
## Fix priority
**TIER 1 — easy, highest leverage, do first**:
A. **Set `dec_param->idr_pic_id`, `dec_ref_pic_marking_bit_size`, `pic_order_cnt_bit_size` correctly.** Requires bit-parsing the slice header. ~50100 lines of careful C in a new helper, e.g. `h264_slice_header_extract_bitsizes()`. Inputs: slice_data buffer, `sps_id`, sps fields (`log2_max_frame_num_minus4`, `pic_order_cnt_type`, `log2_max_pic_order_cnt_lsb_minus4`), `pps_id`, pps fields (`bottom_field_pic_order_in_frame_present_flag`), `nal_unit_type`, `nal_ref_idc`. Outputs: `idr_pic_id`, `pic_order_cnt_lsb`, `delta_pic_order_cnt_bottom`, `delta_pic_order_cnt[0..1]`, `pic_order_cnt_bit_size`, `dec_ref_pic_marking_bit_size`. **This is the load-bearing fix.**
B. **Copy `pps->num_ref_idx_l0_default_active_minus1` and `num_ref_idx_l1_default_active_minus1` from VAAPI.** 2-line fix. Cheap insurance.
**TIER 2 — easy, parity-with-FFmpeg, safe**:
C. **Set `pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` always.** And unconditionally send `V4L2_CID_STATELESS_H264_SCALING_MATRIX` in the request, with a default flat matrix (all 16/16/...) when no VAIQMatrixBuffer arrived from the consumer.
D. **Verify patch 0015's POC sentinel strip is actually applied** to the values that reach VIDIOC_S_EXT_CTRLS. Add a post-strip dump or VIDIOC_G_EXT_CTRLS readback — patch 0014's DEBUG dump runs *before* the strip in the code path, so we never confirmed the strip works.
**TIER 3 — diagnostic plumbing (can run in parallel with the above)**:
E. **Fix patch 0011's cache coherency bug** (msync MS_INVALIDATE or DMA-BUF cache sync ioctl) so future probes are reliable.
F. **Add a VIDIOC_G_EXT_CTRLS readback** right before `MEDIA_REQUEST_IOC_QUEUE` to dump everything actually being submitted. Cross-validates fixes A-D land at the V4L2 layer.
## Order of attack
1. **Tier 2C first** (1-line fix to patch 0012 + small change to control-set composition) — cheapest possible test that disambiguates "scaling matrix is the only thing missing" from "we have multiple bugs."
2. **Tier 1B** (2-line copy of num_ref_idx fields).
3. Rebuild, reinstall on ohm, real-VO test. Check whether output is still all-zero or starts producing something.
4. **Tier 1A** (the slice header bit parser) — the load-bearing fix. Requires care; cross-test against ffmpeg's `H264SliceContext::ref_pic_marking_bit_size` etc. computations.
5. Rebuild, real-VO test. Bunny should appear.
6. **Tier 3E + 3F** as observability infrastructure for future probes.
## Why the predecessor work missed this
The Step 1 patch series was authored with cross-references to FFmpeg's `v4l2_request_h264.c` per the `STUDY.md`, but the **kernel source-side audit was not performed**. Patch 0008's commit message says "Empirical question: does hantro tolerate the bit_size fields being zero?" — the empirical answer was supposedly to be obtained "by running this patch and inspecting the CAPTURE buffer" (per `picture.c` comment). That inspection was performed via patch 0011's sentinel test, which had a cache-coherency bug (`phase0_evidence/2026-05-04-kernel-trace/findings.md`), so the empirical answer never landed. Plus "kernel source side audit" wasn't a step in the loop at that point.
The lesson: when the kernel code reads V4L2 control fields directly into MMIO registers, those fields are **load-bearing** at the contract level. Their values aren't merely "informational" — the hardware parses bitstream based on them. Setting them to zero isn't "passing through unspecified data," it's "lying to the hardware about how many bits to skip."
Adding to memory under `feedback_kernel_source_audit_for_uapi_contract.md` for the broader Phase 6 "contract before code" discipline.