# Iteration 4 — Phase 2 (situation analysis: kernel V4L2 control validation)

Goal: identify which kernel layer rejects our 11th-frame `VIDIOC_S_EXT_CTRLS` and how to learn WHICH control is bad. iter3 surfaced the EINVAL with `error_idx == num_controls`, and iter4's Phase 1 lock points at this defect as the binding question.

Source: `linux-pinetab2` v6.19.10-danctnix1, cloned to `/build/linux-pinetab2/` on the boltzmann firefox-fourier container. The Hantro driver lives at `drivers/media/platform/verisilicon/` (moved out of staging by 6.19). The generic V4L2 H.264 helpers are at `drivers/media/v4l2-core/v4l2-h264.c`.

## Finding 1 — `error_idx == count` is intentional kernel obfuscation, not under-reporting

`v4l2-ctrls-api.c:629` (within `try_set_ext_ctrls_common`):

```c
ret = prepare_ext_ctrls(hdl, cs, helpers, vdev, false);
if (!ret)
	ret = validate_ctrls(cs, helpers, vdev, set);
if (ret && set)
	cs->error_idx = cs->count;
```

If validation fails AND we're calling `S_EXT_CTRLS` (not `TRY_EXT_CTRLS`), the kernel **deliberately overwrites** `error_idx` with `count`. This forces the caller to bail out cleanly rather than try to partially fix the bad control. `validate_ctrls` knows the actual failing control (it sets `error_idx = i` in its loop), but the S_EXT caller never sees it.

Author comment at lines 209-225 (verbatim):

> "It is all fairly theoretical, though. In practice all you can do is to bail out. **If error_idx == count, then it is an application bug.** ... Note that these rules do not apply to VIDIOC_TRY_EXT_CTRLS: since that never modifies controls the error_idx is just set to whatever control has an invalid value."

So our iter3 Y2 diagnostic logged exactly the value the kernel intended us to see, which is useless for localizing the fault. The escape hatch is `VIDIOC_TRY_EXT_CTRLS`: it never modifies controls, and it DOES report the specific failing control.

## Finding 2 — H.264 stateless control validators (per-control)

`v4l2-ctrls-core.c::std_validate_compound()` validates each compound control individually.
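
Each compound control is checked on its own, so for a failure inside this per-control pass the kernel knows exactly which index failed. Finding 1's `ret && set` rule is what throws that index away. The sketch below is a simplified user-space model of that interaction, not kernel code. `struct model_ctrl`, `struct model_ext_ctrls`, `model_validate()` and `model_try_set()` are hypothetical stand-ins, and a boolean `valid` flag replaces the real type-specific checks in `std_validate_compound()`. Only the final overwrite mirrors the quoted `try_set_ext_ctrls_common` excerpt.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for one entry of the controls array; `valid`
 * replaces the real type-specific check in std_validate_compound(). */
struct model_ctrl {
	unsigned int id;
	bool valid;
};

/* Hypothetical stand-in for the v4l2_ext_controls fields used here. */
struct model_ext_ctrls {
	unsigned int count;
	unsigned int error_idx;
	const struct model_ctrl *controls;
};

/* Per-control pass: each control is checked on its own, and the first
 * failure records its index (the model's version of error_idx = i). */
static int model_validate(struct model_ext_ctrls *cs)
{
	for (unsigned int i = 0; i < cs->count; i++) {
		if (!cs->controls[i].valid) {
			cs->error_idx = i;
			return -EINVAL;
		}
	}
	return 0;
}

/* Models the quoted tail of try_set_ext_ctrls_common(): a failed SET
 * discards the index, a failed TRY keeps it. */
static int model_try_set(struct model_ext_ctrls *cs, bool set)
{
	int ret;

	cs->error_idx = cs->count;      /* no specific control blamed yet */
	ret = model_validate(cs);
	if (ret && set)
		cs->error_idx = cs->count;  /* the S_EXT_CTRLS obfuscation rule */
	return ret;
}
```

Consider a four-control array whose entry at index 2 is invalid. `model_try_set(&cs, true)` returns `-EINVAL` with `error_idx == 4`, which is the S_EXT view. `model_try_set(&cs, false)` returns `-EINVAL` with `error_idx == 2`, which is the TRY view.
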
For our four controls, the per-control validators check the following (from `v4l2-ctrls-core.c:1031-1180`):

**SPS** rejects on:
- `profile_idc < 122 && chroma_format_idc > 1`
- `profile_idc < 244 && chroma_format_idc > 2`
- `chroma_format_idc > 3`
- `bit_depth_luma_minus8 > 6` or `bit_depth_chroma_minus8 > 6`
- `log2_max_frame_num_minus4 > 12`
- `pic_order_cnt_type > 2`
- `log2_max_pic_order_cnt_lsb_minus4 > 12`
- `max_num_ref_frames > V4L2_H264_REF_LIST_LEN`

**PPS** rejects on:
- `num_slice_groups_minus1 > 7`
- `num_ref_idx_l0_default_active_minus1 > (V4L2_H264_REF_LIST_LEN - 1)`

**DECODE_PARAMS** rejects on:
- `nal_ref_idc > 3`

**SCALING_MATRIX**: no reject path (just zero-pad).

Frames 1–10 pass these validators because they carry the same SPS/PPS/decode-mode constants. Frame 11 must therefore either violate one of them OR fail somewhere else, for example a `request_fd` state precondition or cluster-internal coupling.

## Finding 3 — Hantro driver-level validation runs *after* `S_EXT_CTRLS`

`hantro_h264.c::hantro_h264_dec_prepare_run()` is called from the V4L2 m2m worker, after `MEDIA_REQUEST_IOC_QUEUE`. Its EINVAL paths (lines 449/454/459/464) are NULL checks for missing controls. They fire only if the request didn't carry one of the four required H.264 controls. So this is NOT the path behind the EINVAL carried over from iter1–3.

Conclusion: the failure is in the V4L2 control handler's `validate_ctrls` for one of our four compound controls. The per-control validators (Finding 2) don't obviously fit, since frames 1–10 use the same constants. However, `error_idx` was hidden by the obfuscation rule, so we cannot yet tell which control tripped.

## Finding 4 — `VIDIOC_TRY_EXT_CTRLS` is the diagnostic escape hatch

Per the kernel comment, `TRY_EXT_CTRLS` reports the index of the control with the invalid value in `error_idx`. That is exactly the index that `set == true` would have hidden. Path forward:

1. Amend our driver's Y2 instrumentation: when `VIDIOC_S_EXT_CTRLS` returns -EINVAL with `error_idx == num_controls`, **retry the same controls via `VIDIOC_TRY_EXT_CTRLS`** to extract the real failing index.
2. Log the precise failing control and its raw fields.
3. From that, identify the specific field-value combination on frame 11 that the per-control validator rejects.

This is a one-liner in v4l2.c. Add another `ioctl(video_fd, VIDIOC_TRY_EXT_CTRLS, &controls)` call inside the existing EINVAL diagnostic block, then log `controls.error_idx` from the TRY result.

## Implication for Phase 4

Phase 4's plan needs amending: before we can author any driver-side fix, we need the precise failing control and field. Steps:

1. Y2 v3: add a TRY_EXT_CTRLS retry on S_EXT_CTRLS EINVAL.
2. Rebuild our driver on ohm (~30 s via meson+ninja).
3. Re-run autonomous Phase 7 (`/tmp/run_phase7_v2.sh`).
4. Read the new diagnostic, which should now name the specific control.
5. Compare frame-11 field values against frames 1–10 to localize the bad field.
6. Fix in driver.

The most likely culprits are:
- an SPS field that wasn't properly latched after the IDR;
- a PPS field that flipped at frame 5;
- a DECODE_PARAMS DPB entry with a malformed `frame_num`/`pic_num`/`flags`.

iter3's Phase 7 run satisfies Phase 3 (baseline anchor): same rig, same EINVAL, deterministic.

## Stop point

Phase 2 closed. Next: Phase 4 step 1 (Y2 v3 with the TRY_EXT_CTRLS retry diagnostic). Phase 3 is anchored from iter3. Per the "Stop only if user is needed" rule, no user input is required to proceed. Expect a read→speculate→test cycle that may take several iterations.