diff --git a/.gitignore b/.gitignore index 8c50019..8f42705 100644 --- a/.gitignore +++ b/.gitignore @@ -1,6 +1,2 @@ -# The fork lives as its own git repo with its own origin -# (marfrit/libva-v4l2-request-fourier on git.reauktion.de). It's a -# sibling repo nested inside the campaign tree for working-directory -# convenience, not a submodule. Hide it from the campaign's git so -# `git status` here only shows campaign-level findings. libva-v4l2-request-fourier/ +references/ diff --git a/diff_against_ffmpeg.md b/diff_against_ffmpeg.md new file mode 100644 index 0000000..b4699fa --- /dev/null +++ b/diff_against_ffmpeg.md @@ -0,0 +1,121 @@ +# Phase 4 input — `src/h264.c` diff vs FFmpeg + hantro kernel-source ground truth + +Reference sources: +- **FFmpeg** (proven working on hantro): `references/ffmpeg-kwiboo/libavcodec/v4l2_request_h264.c` (Kwiboo's downstream `v4l2-request-n8.1`, head `b57fbbe`) +- **Linux mainline kernel** (the actual hardware-register-write code): `references/linux-mainline/drivers/media/platform/verisilicon/hantro_g1_h264_dec.c` + +Acquired 2026-05-04. Both gitignored from the campaign repo (separate-repo discipline; pull recipes in `references/README` if/when added). + +## The smoking gun — DECODE_PARAMS bit-size fields are MMIO-bound on hantro G1 + +`hantro_g1_h264_dec.c::set_params()` writes these dec_param fields **directly into hardware registers**: + +| dec_param field | Register | Our value | FFmpeg's source | +|---|---|---|---| +| `dec_ref_pic_marking_bit_size` | `G1_REG_DEC_CTRL5_REFPIC_MK_LEN` | **0** (uninitialized) | `sl->ref_pic_marking_bit_size` (bit-parsed from slice header) | +| `idr_pic_id` | `G1_REG_DEC_CTRL5_IDR_PIC_ID` | **0** (uninitialized) | `sl->idr_pic_id` (bit-parsed from slice header) | +| `pic_order_cnt_bit_size` | `G1_REG_DEC_CTRL6_POC_LENGTH` | **0** (uninitialized) | `sl->pic_order_cnt_bit_size` (bit-parsed from slice header) | + +When all three are zero, hantro's hardware bitstream parser thinks the slice header has zero ref_pic_marking bits and zero POC syntax bits, advances the bitstream pointer by zero past `slice_type`, lands on garbage, decodes zero pixels into the CAPTURE buffer. + +**Patch 0008's open question** — "does hantro tolerate the bit_size fields being zero, or do we need a slice_header() bit-level parser?" — is **empirically answered: NO, hantro does not tolerate zeros**. The patch left this unresolved; the kernel source resolves it definitively (registers read, hardware enforces). The "contract before code" discipline was violated in the predecessor work — the contract (these fields drive hardware MMIO writes) was not actually verified against kernel source, and "Empirical question" was left as an unaudited TODO. + +This explains the all-zero CAPTURE output we see across mpv and Firefox. + +## All hantro G1 dec_param reads + +From `hantro_g1_h264_dec.c::set_params()`: + +| Field | Our coverage | Status | +|---|---|---| +| `dec_param->nal_ref_idc` | ✓ set from slice byte (line 378) | ok | +| `dec_param->frame_num` | ✓ set from `VAPicture->frame_num` (line 379) | ok | +| `dec_param->top_field_order_cnt` | ✓ set, with patch-0015 POC sentinel strip | ok (verify strip works) | +| `dec_param->bottom_field_order_cnt` | ✓ set, with patch-0015 POC sentinel strip | ok (verify strip works) | +| `dec_param->dec_ref_pic_marking_bit_size` | ✗ **zero — written to MMIO REFPIC_MK_LEN** | **FIX REQUIRED** | +| `dec_param->idr_pic_id` | ✗ **zero — written to MMIO IDR_PIC_ID** | **FIX REQUIRED** | +| `dec_param->pic_order_cnt_bit_size` | ✗ **zero — written to MMIO POC_LENGTH** | **FIX REQUIRED** | +| `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC` | ✓ set from VAAPI field_pic_flag | ok | +| `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD` | ✓ set from VAAPI flags | ok | +| `dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC` | ✓ set when nal_unit_type == 5 | ok | +| `dec_param->dpb[].reference_ts` | populated by patch 0017 | verify against FFmpeg's `ff_v4l2_request_get_capture_timestamp` semantics | +| `dec_param->dpb[].frame_num` | populated by patch 0017 | ok | + +## All hantro G1 sps reads + +| sps field | Our coverage | Status | +|---|---|---| +| `sps->profile_idc` | ✓ set from VAProfile via `h264_profile_to_idc()` | verify h264_profile_to_idc returns reasonable values | +| `sps->max_num_ref_frames` | ✓ set from `VAPicture->num_ref_frames` | ok | +| `sps->chroma_format_idc` | ✓ set from `VAPicture->seq_fields` | ok | +| `sps->log2_max_frame_num_minus4` | ✓ set from `VAPicture->seq_fields` | ok | +| `sps->flags` (bits: `MB_ADAPTIVE_FRAME_FIELD`, `FRAME_MBS_ONLY`, `DIRECT_8X8_INFERENCE`, etc.) | ✓ set from VAAPI seq_fields | ok | + +## All hantro G1 pps reads + +| pps field | Our coverage | Status | +|---|---|---| +| `pps->chroma_qp_index_offset` | ✓ | ok | +| `pps->second_chroma_qp_index_offset` | ✓ | ok | +| `pps->pic_init_qp_minus26` | ✓ | ok | +| `pps->weighted_bipred_idc` | ✓ | ok | +| `pps->pic_parameter_set_id` | ✗ **zero (uninitialized)** — written to MMIO `G1_REG_DEC_CTRL6_PPS_ID` | For BBB single-PPS streams, ID=0 is accidentally correct; **FIX (set explicitly to 0 or bit-parse from slice header) for general correctness** | +| `pps->num_ref_idx_l0_default_active_minus1` | ✗ **zero (uninitialized)** — written to MMIO `G1_REG_DEC_CTRL6_REFIDX0_ACTIVE(... + 1)` so kernel computes `0 + 1 = 1` | **FIX REQUIRED** — VAAPI `VAPictureParameterBufferH264.num_ref_idx_l0_default_active_minus1` exists, just copy | +| `pps->num_ref_idx_l1_default_active_minus1` | ✗ **zero (uninitialized)** | **FIX REQUIRED** — same as above | +| `pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` | ✗ Patch 0012 sets only when `matrix_set` is true; **FFmpeg always sets it** (with a "FFmpeg always provide a scaling matrix" comment) | **FIX REQUIRED** — set unconditionally, and unconditionally send SCALING_MATRIX in the request (with default flat 16/16 matrix when no VAIQMatrixBuffer arrives) | +| `pps->flags` other bits | ✓ set from VAAPI pic_fields | ok | + +## All FFmpeg per-request controls vs ours + +FFmpeg's `v4l2_request_h264_queue_decode` for FRAME_BASED mode sends 4 controls: +1. `V4L2_CID_STATELESS_H264_SPS` +2. `V4L2_CID_STATELESS_H264_PPS` +3. `V4L2_CID_STATELESS_H264_SCALING_MATRIX` ← **always** +4. `V4L2_CID_STATELESS_H264_DECODE_PARAMS` + +Our `src/h264.c` for FRAME_BASED sends 3 controls (when `matrix_set == false`): +1. SPS +2. PPS +3. DECODE_PARAMS +(SCALING_MATRIX omitted by patch 0012) + +This matters because hantro reads `pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` (line 64 of `hantro_g1_h264_dec.c`) and sets `G1_REG_DEC_CTRL2_TYPE1_QUANT_E` based on it. If we *don't* set the flag and *don't* send the matrix, hantro's `TYPE1_QUANT_E` register bit stays clear and the hardware skips the per-matrix scaling. That's probably *correct* behavior actually — but for safety + parity with FFmpeg, set it always. + +The bigger ambiguity: does hantro's hardware tolerate `SCALING_MATRIX_PRESENT=0` + missing matrix control, OR does it expect the matrix to always be present? FFmpeg's "always provide a scaling matrix" comment suggests the latter is the safer assumption. + +## Fix priority + +**TIER 1 — easy, highest leverage, do first**: + +A. **Set `dec_param->idr_pic_id`, `dec_ref_pic_marking_bit_size`, `pic_order_cnt_bit_size` correctly.** Requires bit-parsing the slice header. ~50–100 lines of careful C in a new helper, e.g. `h264_slice_header_extract_bitsizes()`. Inputs: slice_data buffer, `sps_id`, sps fields (`log2_max_frame_num_minus4`, `pic_order_cnt_type`, `log2_max_pic_order_cnt_lsb_minus4`), `pps_id`, pps fields (`bottom_field_pic_order_in_frame_present_flag`), `nal_unit_type`, `nal_ref_idc`. Outputs: `idr_pic_id`, `pic_order_cnt_lsb`, `delta_pic_order_cnt_bottom`, `delta_pic_order_cnt[0..1]`, `pic_order_cnt_bit_size`, `dec_ref_pic_marking_bit_size`. **This is the load-bearing fix.** + +B. **Copy `pps->num_ref_idx_l0_default_active_minus1` and `num_ref_idx_l1_default_active_minus1` from VAAPI.** 2-line fix. Cheap insurance. + +**TIER 2 — easy, parity-with-FFmpeg, safe**: + +C. **Set `pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` always.** And unconditionally send `V4L2_CID_STATELESS_H264_SCALING_MATRIX` in the request, with a default flat matrix (all 16/16/...) when no VAIQMatrixBuffer arrived from the consumer. + +D. **Verify patch 0015's POC sentinel strip is actually applied** to the values that reach VIDIOC_S_EXT_CTRLS. Add a post-strip dump or VIDIOC_G_EXT_CTRLS readback — patch 0014's DEBUG dump runs *before* the strip in the code path, so we never confirmed the strip works. + +**TIER 3 — diagnostic plumbing (can run in parallel with the above)**: + +E. **Fix patch 0011's cache coherency bug** (msync MS_INVALIDATE or DMA-BUF cache sync ioctl) so future probes are reliable. + +F. **Add a VIDIOC_G_EXT_CTRLS readback** right before `MEDIA_REQUEST_IOC_QUEUE` to dump everything actually being submitted. Cross-validates fixes A-D land at the V4L2 layer. + +## Order of attack + +1. **Tier 2C first** (1-line fix to patch 0012 + small change to control-set composition) — cheapest possible test that disambiguates "scaling matrix is the only thing missing" from "we have multiple bugs." +2. **Tier 1B** (2-line copy of num_ref_idx fields). +3. Rebuild, reinstall on ohm, real-VO test. Check whether output is still all-zero or starts producing something. +4. **Tier 1A** (the slice header bit parser) — the load-bearing fix. Requires care; cross-test against ffmpeg's `H264SliceContext::ref_pic_marking_bit_size` etc. computations. +5. Rebuild, real-VO test. Bunny should appear. +6. **Tier 3E + 3F** as observability infrastructure for future probes. + +## Why the predecessor work missed this + +The Step 1 patch series was authored with cross-references to FFmpeg's `v4l2_request_h264.c` per the `STUDY.md`, but the **kernel source-side audit was not performed**. Patch 0008's commit message says "Empirical question: does hantro tolerate the bit_size fields being zero?" — the empirical answer was supposedly to be obtained "by running this patch and inspecting the CAPTURE buffer" (per `picture.c` comment). That inspection was performed via patch 0011's sentinel test, which had a cache-coherency bug (`phase0_evidence/2026-05-04-kernel-trace/findings.md`), so the empirical answer never landed. Plus "kernel source side audit" wasn't a step in the loop at that point. + +The lesson: when the kernel code reads V4L2 control fields directly into MMIO registers, those fields are **load-bearing** at the contract level. Their values aren't merely "informational" — the hardware parses bitstream based on them. Setting them to zero isn't "passing through unspecified data," it's "lying to the hardware about how many bits to skip." + +Adding to memory under `feedback_kernel_source_audit_for_uapi_contract.md` for the broader Phase 6 "contract before code" discipline.