Reading v4l2-core/v4l2-ctrls-api.c and v4l2-ctrls-core.c on the cloned linux-pinetab2 v6.19.10-danctnix1 source: error_idx == count for S_EXT_CTRLS is intentional kernel obfuscation, not under-reporting. v4l2-ctrls-api.c:629 deliberately overwrites error_idx with cs->count when prepare_ext_ctrls or validate_ctrls fails in set mode. This forces the caller to bail out rather than attempt a partial set. The escape hatch is VIDIOC_TRY_EXT_CTRLS, which "never modifies controls [so] error_idx is just set to whatever control has an invalid value" (quoting v4l2-ctrls-api.c:222-224).

Path forward into Phase 4: amend the Y2 instrumentation to retry with TRY_EXT_CTRLS on S_EXT_CTRLS EINVAL and extract the actual failing control index. From there, narrow down the failing field by comparing frame-11 values against frames 1-10.

Phase 3 baseline anchored from iter3 Phase 7: same rig, same EINVAL, deterministic. No re-acquire needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 4 — Phase 2 (situation analysis: kernel V4L2 control validation)
Goal: identify which kernel layer rejects our 11th-frame VIDIOC_S_EXT_CTRLS and how to learn WHICH control is bad. iter3 surfaced the EINVAL with error_idx == num_controls, and iter4's Phase 1 lock identifies this defect as the binding question.
Source: linux-pinetab2 v6.19.10-danctnix1 cloned to /build/linux-pinetab2/ on the boltzmann firefox-fourier container. Hantro driver lives at drivers/media/platform/verisilicon/ (moved out of staging by 6.19). Generic V4L2 H.264 helpers at drivers/media/v4l2-core/v4l2-h264.c.
Finding 1 — error_idx == count is intentional kernel obfuscation, not under-reporting
v4l2-ctrls-api.c:629 (within try_set_ext_ctrls_common):
```c
ret = prepare_ext_ctrls(hdl, cs, helpers, vdev, false);
if (!ret)
	ret = validate_ctrls(cs, helpers, vdev, set);
if (ret && set)
	cs->error_idx = cs->count;
```
If either prepare_ext_ctrls or validate_ctrls fails (ret holds the error from whichever step failed) AND we're calling S_EXT_CTRLS (set == true, as opposed to TRY_EXT_CTRLS), the kernel deliberately overwrites error_idx with count. This forces the caller to bail out cleanly rather than try to partially fix the bad control. The actual failing control is known at that point, because validate_ctrls sets error_idx = i as it walks the array, but it is hidden from the S_EXT_CTRLS caller.
Author comment at lines 209-225 (verbatim):
"It is all fairly theoretical, though. In practice all you can do is to bail out. If error_idx == count, then it is an application bug. ... Note that these rules do not apply to VIDIOC_TRY_EXT_CTRLS: since that never modifies controls the error_idx is just set to whatever control has an invalid value."
So our iter3 Y2 diagnostic was logging exactly what the kernel intended, which tells us nothing about which control failed. The escape hatch is VIDIOC_TRY_EXT_CTRLS. It never modifies controls, so it leaves error_idx pointing at the control with the invalid value.
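The masking rule fits in a few lines. The sketch below is a simplified userspace model written for this note, not kernel code. model_reported_error_idx() is a hypothetical name, fail_idx stands in for the index the validate loop stopped at, and the request path and the later per-cluster loop are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the error_idx bookkeeping in try_set_ext_ctrls_common()
 * when the prepare/validate step fails:
 *   count    - number of entries in the v4l2_ext_controls array
 *   fail_idx - index the validate loop stopped at (it sets error_idx = i)
 *   set      - true for VIDIOC_S_EXT_CTRLS, false for VIDIOC_TRY_EXT_CTRLS
 */
uint32_t model_reported_error_idx(uint32_t count, uint32_t fail_idx, bool set)
{
    uint32_t error_idx;

    assert(fail_idx < count);
    error_idx = fail_idx;        /* what validate_ctrls() left behind */
    if (set)                     /* if (ret && set)                  */
        error_idx = count;       /*     cs->error_idx = cs->count;   */
    return error_idx;
}
```

With four controls and a bad entry at index 2, the S-mode model reports 4 (the iter3 symptom) and the TRY-mode model reports 2 (what the retry is meant to recover).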
Finding 2 — H.264 stateless control validators (per-control)
v4l2-ctrls-core.c::std_validate_compound() validates each compound control individually. For our four controls, the per-control validators check (from v4l2-ctrls-core.c:1031-1180):
SPS rejects on:
- profile_idc < 122 && chroma_format_idc > 1
- profile_idc < 244 && chroma_format_idc > 2
- chroma_format_idc > 3
- bit_depth_luma_minus8 > 6 or bit_depth_chroma_minus8 > 6
- log2_max_frame_num_minus4 > 12
- pic_order_cnt_type > 2
- log2_max_pic_order_cnt_lsb_minus4 > 12
- max_num_ref_frames > V4L2_H264_REF_LIST_LEN
PPS rejects on:
- num_slice_groups_minus1 > 7
- num_ref_idx_l0_default_active_minus1 > (V4L2_H264_REF_LIST_LEN - 1)
DECODE_PARAMS rejects on:
nal_ref_idc > 3
SCALING_MATRIX: no reject path (just zero-pad).
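The listed checks are plain range tests, so they can be replayed in userspace against the values the driver is about to submit, for example when dumping frame 11. The sketch below mirrors only the conditions listed above, on hypothetical trimmed-down structs rather than the uapi v4l2_ctrl_h264_sps/_pps definitions. It assumes V4L2_H264_REF_LIST_LEN is 32 (2 * 16 DPB entries). The kernel validator may normalize some fields before checking them, or check more than this list, so a pass here does not prove the kernel will accept the control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: uapi defines V4L2_H264_REF_LIST_LEN as 2 * V4L2_H264_NUM_DPB_ENTRIES (16). */
#define SKETCH_H264_REF_LIST_LEN 32

/* Hypothetical trimmed mirrors: only the fields the listed checks read. */
struct sps_fields {
    uint8_t profile_idc;
    uint8_t chroma_format_idc;
    uint8_t bit_depth_luma_minus8;
    uint8_t bit_depth_chroma_minus8;
    uint8_t log2_max_frame_num_minus4;
    uint8_t pic_order_cnt_type;
    uint8_t log2_max_pic_order_cnt_lsb_minus4;
    uint8_t max_num_ref_frames;
};

struct pps_fields {
    uint8_t num_slice_groups_minus1;
    uint8_t num_ref_idx_l0_default_active_minus1;
};

/* Each precheck returns NULL if no listed condition fires, else the first one that does. */
const char *precheck_sps(const struct sps_fields *s)
{
    assert(s != NULL);
    if (s->profile_idc < 122 && s->chroma_format_idc > 1)
        return "profile_idc < 122 && chroma_format_idc > 1";
    if (s->profile_idc < 244 && s->chroma_format_idc > 2)
        return "profile_idc < 244 && chroma_format_idc > 2";
    if (s->chroma_format_idc > 3)
        return "chroma_format_idc > 3";
    if (s->bit_depth_luma_minus8 > 6)
        return "bit_depth_luma_minus8 > 6";
    if (s->bit_depth_chroma_minus8 > 6)
        return "bit_depth_chroma_minus8 > 6";
    if (s->log2_max_frame_num_minus4 > 12)
        return "log2_max_frame_num_minus4 > 12";
    if (s->pic_order_cnt_type > 2)
        return "pic_order_cnt_type > 2";
    if (s->log2_max_pic_order_cnt_lsb_minus4 > 12)
        return "log2_max_pic_order_cnt_lsb_minus4 > 12";
    if (s->max_num_ref_frames > SKETCH_H264_REF_LIST_LEN)
        return "max_num_ref_frames > V4L2_H264_REF_LIST_LEN";
    return NULL;
}

const char *precheck_pps(const struct pps_fields *p)
{
    assert(p != NULL);
    if (p->num_slice_groups_minus1 > 7)
        return "num_slice_groups_minus1 > 7";
    if (p->num_ref_idx_l0_default_active_minus1 > SKETCH_H264_REF_LIST_LEN - 1)
        return "num_ref_idx_l0_default_active_minus1 > V4L2_H264_REF_LIST_LEN - 1";
    return NULL;
}

const char *precheck_decode_params(uint16_t nal_ref_idc)
{
    if (nal_ref_idc > 3)
        return "nal_ref_idc > 3";
    return NULL;
}
```

Logging the returned string for each frame would show at a glance whether frame 11 is the first one to trip a listed condition.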
Frames 1–10 pass these validators (they're the same SPS/PPS/decode-mode constants). Frame 11 must violate one of them OR fail somewhere else (e.g. a request_fd state precondition, cluster-internal coupling).
Finding 3 — Hantro driver-level validation runs after S_EXT_CTRLS
hantro_h264.c::hantro_h264_dec_prepare_run() is called from the V4L2 m2m worker, after MEDIA_REQUEST_IOC_QUEUE. Its EINVAL paths (lines 449/454/459/464) are NULL checks for missing controls. They fire if the request didn't carry one of the four required H.264 controls. Because they run only after the request is queued, they cannot be what makes VIDIOC_S_EXT_CTRLS itself return EINVAL, so this is NOT the iter1+2+3 carryover path.
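Conclusion (tentative): the failure is in the V4L2 control framework's prepare/validate step (prepare_ext_ctrls or validate_ctrls) for one of our four compound controls, before any Hantro code runs. The per-control validators from Finding 2 don't obviously fit, since frames 1–10 carry the same constants, and the masking rule from Finding 1 hides which control tripped. One point should be verified in this tree before trusting this conclusion. In recent kernels, validate_ctrls() appears to skip is_ptr (compound) controls. Those controls are instead validated later, after being copied from userspace, in the per-cluster loop, where error_idx is not reset to count. If 6.19 does the same, error_idx == count with four compound controls would point at prepare_ext_ctrls() or the request lookup rather than at std_validate_compound().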
Finding 4 — VIDIOC_TRY_EXT_CTRLS is the diagnostic escape hatch
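Per the kernel comment, TRY_EXT_CTRLS sets error_idx to the index of the control with the invalid value, where S_EXT_CTRLS (set == true) would have masked it with count. That only helps for control-specific failures. If TRY also comes back with error_idx == count, the EINVAL is being raised before any per-control check (for example while looking up request_fd). Path forward: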
- Amend our driver's Y2 instrumentation: on VIDIOC_S_EXT_CTRLS returning -EINVAL with error_idx == num_controls, retry the same controls via VIDIOC_TRY_EXT_CTRLS to extract the real failing index.
- Log the precise failing control and its raw fields. Dump the payloads before the TRY retry, since TRY can write adjusted values back into the caller's buffers.
- From that, identify the specific field-value combination on frame 11 that the per-control validator rejects.
This is a one-liner in v4l2.c: add another ioctl(video_fd, VIDIOC_TRY_EXT_CTRLS, &controls) call inside the existing EINVAL diagnostic block, then log controls.error_idx from the TRY result.
Implication for Phase 4
Phase 4's plan needs amendment: before any driver-side fix can be authored, we need the precise failing control + field. Steps:
- Y2 v3: add TRY_EXT_CTRLS retry on S_EXT_CTRLS EINVAL.
- Rebuild driver on ohm (~30 sec via meson+ninja).
- Re-run autonomous Phase 7 (/tmp/run_phase7_v2.sh).
- Read the new diagnostic, which should now name the specific control.
- Compare frame-11 field values vs frames 1–10 to localize the bad field.
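- Fix in driver. The most likely culprits are an SPS field that wasn't properly latched after IDR, a PPS field that flipped at frame 5, or a DECODE_PARAMS DPB entry with malformed frame_num/pic_num/flags. Per Finding 2, the DECODE_PARAMS validator only checks nal_ref_idc, so a malformed DPB entry would have to be rejected somewhere other than std_validate_compound.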
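For the comparison step, a byte-level diff between two snapshots of the same control payload is a cheap way to see which offsets changed. For example, compare the frame-10 and frame-11 SPS as the driver built them. offsetof() on the uapi struct then maps offsets back to field names. diff_payload() and print_payload_diff() are hypothetical helpers. The sketch assumes both snapshots were captured from zero-initialized structs, since otherwise padding bytes can show up as spurious differences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_DIFFS 64

/*
 * Report byte offsets where two snapshots of the same control payload differ.
 * Writes up to max_out offsets into out[] and returns the total number of
 * differing bytes, which may exceed max_out.
 */
size_t diff_payload(const void *a, const void *b, size_t len,
                    size_t *out, size_t max_out)
{
    const uint8_t *pa = a;
    const uint8_t *pb = b;
    size_t n = 0;

    assert(a != NULL && b != NULL);
    assert(out != NULL || max_out == 0);
    for (size_t i = 0; i < len; i++) {
        if (pa[i] != pb[i]) {
            if (n < max_out)
                out[n] = i;
            n++;
        }
    }
    return n;
}

/* Print up to MAX_DIFFS differing offsets with old -> new byte values. */
void print_payload_diff(const char *name, const void *a, const void *b, size_t len)
{
    size_t offs[MAX_DIFFS];
    size_t n = diff_payload(a, b, len, offs, MAX_DIFFS);
    const uint8_t *pa = a;
    const uint8_t *pb = b;

    printf("%s: %zu differing byte(s)\n", name, n);
    for (size_t i = 0; i < n && i < MAX_DIFFS; i++)
        printf("  +%zu: 0x%02x -> 0x%02x\n", offs[i], pa[offs[i]], pb[offs[i]]);
}
```

A call such as print_payload_diff("SPS", &sps_frame10, &sps_frame11, sizeof(sps_frame10)) (variable names hypothetical) would list the changed offsets. Each offset can then be matched against offsetof() of the candidate fields.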
Phase 3 (baseline anchor) is satisfied by iter3's Phase 7 run — same rig, same EINVAL, deterministic.
Stop point
Phase 2 closed. Next: Phase 4 step 1 (Y2 v3 with TRY_EXT_CTRLS retry diagnostic). Phase 3 anchored from iter3. Per the "Stop only if user is needed" rule, no user input is required to proceed, but this is a read→speculate→test cycle that may take several iterations.