iter4 Phase 0: HEVC per-frame S_EXT_CTRLS EINVAL substrate
iter3 1-line kernel fix eliminated the OOPS. Now diagnosing why the 5-control batch (SPS PPS SLICE_PARAMS SCALING_MATRIX DECODE_PARAMS) returns EINVAL with error_idx=count=5 → all-zero output.

Locked-in evidence: control sizes match kernel-expected elem_size for every CID. SPS values strace-decoded to (chroma=1, bit_depth=0, 1280x720) all pass validate_sps numerically. coded_fmt from S_FMT trace is 1280x720 S265. validate_new dprintk doesn't fire despite DEV_DEBUG_CTRL=0x20 set → rejection is silent inside try_or_set_cluster's try_ctrl path (rkvdec_hevc_validate_sps).

Phase 1 starts with empirical: pr_warn at validate_sps entry to confirm/refute. Instrumented module already built + ready to load post-reboot.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,49 @@
# Phase 0 — iter4 substrate (HEVC per-frame S_EXT_CTRLS EINVAL)
Opened 2026-05-16 afternoon, immediately following iter3 close. Entry conditions are concrete; this Phase 0 is brief.
## Research question
**Which of the five per-frame HEVC controls (SPS, PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS) causes the kernel to reject the entire `VIDIOC_S_EXT_CTRLS` batch with `-EINVAL`, and what's the minimal backend or kernel fix to make the batch commit?**
## Locked-in evidence carried from iter2 + iter3
| Observation | Source | Status |
|-------------|--------|--------|
| iter3 1-line kernel patch (`run->ext_sps_st_rps = NULL` in preamble) eliminates the prior OOPS | iter3 verification, kernel-agent#14 comment 623 | confirmed |
| ffmpeg exit 0, no `Internal error` in dmesg, `/tmp/o.nv12` = 4147200 bytes (exact NV12 3-frame size) | iter3 close + iter4 reproducer | confirmed |
| Output bytes are **all zero** — decoder fills CAPTURE buffers from zero-initialized control state because no controls commit | iter3 close + iter4 strace | confirmed |
| Per-frame batch: 5 controls in order SPS(40) PPS(64) SLICE_PARAMS(280) SCALING_MATRIX(1000) DECODE_PARAMS(328) | iter4 strace + dmesg | confirmed |
| Backend size fields match kernel-expected `elem_size` for every CID | gcc-compiled `sizeof()` against vendored UAPI headers | confirmed |
| Kernel returns `error_idx=count=5` with `-22` (EINVAL) | dmesg `VIDIOC_S_EXT_CTRLS: error -22` | confirmed |
| `which=0xf010000` = `V4L2_CTRL_WHICH_REQUEST_VAL` | strace + dmesg | confirmed |
| **No `failed to validate control NAME` dprintk fires** despite `dev_debug=0x3f` (bit 5 = V4L2_DEV_DEBUG_CTRL is set) | dmesg | confirmed |
| → The EINVAL is therefore NOT from `validate_new` (type_ops->validate); it's silent inside `try_or_set_cluster`'s `try_ctrl` path | source-read v4l2-ctrls-api.c:680-694 | derived |
| Only `try_ctrl` op for HEVC controls is `rkvdec_hevc_try_ctrl` → `rkvdec_hevc_validate_sps` (PPS/SLICE/SM/DP `try_ctrl` returns 0) | rkvdec-vdpu381-hevc.c:625 | confirmed |
| `rkvdec_hevc_validate_sps` returns `-EINVAL` for: chroma!=1, bit-depth mismatch, bit-depth ∉{0,2}, `sps_width > coded_fmt.width OR sps_height > coded_fmt.height` | rkvdec-vdpu381-hevc.c:515 | confirmed |
| Strace-decoded SPS: `chroma_format_idc=1, bit_depth_luma_minus8=0, bit_depth_chroma_minus8=0, pic_width=1280, pic_height=720, flags=0x188` | iter4 strace S_EXT_CTRLS body | confirmed |
| Strace-decoded coded_fmt (from S_FMT logs): `width=1280, height=720, format=S265` | dmesg VIDIOC_S_FMT | confirmed |
| → All four validate_sps checks numerically PASS for the observed values | source + data | derived (suspicious — bug must be elsewhere) |
| iter2 dummy-SPS pre-seed (context.c:235) submits a synthetic HEVC_SPS during `RequestCreateContext` before cap_pool_init, to avoid the EBUSY-on-fmt-change cluster bug | iter2 source | reusable; still firing pre-frame |
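
The output-size and all-zero rows above can be re-checked offline with a few lines of Python. This is a minimal sketch, not part of the reproducer: the helper names `nv12_size` and `is_all_zero` are hypothetical, and the size formula assumes tightly packed NV12 with no stride or height padding, which is what the 4147200-byte figure implies for 1280x720.

```python
import io

def nv12_size(width, height, frames=1):
    # NV12 = full-resolution Y plane + interleaved half-resolution UV plane,
    # i.e. 1.5 bytes per pixel. Assumes no stride/height padding.
    return width * height * 3 // 2 * frames

def is_all_zero(stream, chunk=1 << 20):
    # Read a binary stream in chunks and report whether every byte is 0x00.
    while True:
        block = stream.read(chunk)
        if not block:
            return True
        if block.count(0) != len(block):
            return False

# 3 frames of 1280x720 NV12, matching the size recorded for /tmp/o.nv12.
print(nv12_size(1280, 720, frames=3))

# In-memory demonstration of the zero check.
print(is_all_zero(io.BytesIO(bytes(16))))
```

Against the real file the call would be `is_all_zero(open("/tmp/o.nv12", "rb"))`; a size match combined with an all-zero result is the signature described in the table.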
## Substrate
- Kernel: ampere `7.0.0-rc3-devices+` with iter3 1-line patch (`rkvdec_hevc_run_preamble` initializes ext_sps fields to NULL).
- Backend: `libva-v4l2-request-fourier` iter3 instrumented build (md5 `404041ea2dcc03c769e0ab8c43ddadd6`) deployed at `/usr/lib/dri/v4l2_request_drv_video.so` on ampere.
- Build host: ampere itself (boltzmann tree is on incompatible branch per iter3 close).
- Diagnostic toggles available: `echo 0x3f > /sys/class/video4linux/videoN/dev_debug` enables V4L2 ioctl + ctrl dprintks.
- Module-deploy: `make M=drivers/.../rkvdec modules` + scp + `sudo install` to `/lib/modules/<rel>/kernel/...` + depmod + reboot (rmmod blocked when prior decode wedged a thread in D-state holding the refcount).
- Reboot caveat: kernel-agent#13 (black-screen on reboot) hit once during iter3 — recovered after waiting.
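
For reference, the `0x3f` mask in the diagnostic toggle can be decoded bit by bit. The sketch below is illustrative only: the bit values are the `V4L2_DEV_DEBUG_*` flags as I recall them from the kernel's `include/media/v4l2-ioctl.h`, and should be checked against the running tree; `decode_dev_debug` is a hypothetical helper name.

```python
# V4L2_DEV_DEBUG_* flag values (assumed from include/media/v4l2-ioctl.h;
# verify against the kernel tree actually in use).
V4L2_DEV_DEBUG_BITS = {
    0x01: "IOCTL",      # ioctl name and error code
    0x02: "IOCTL_ARG",  # ioctl arguments
    0x04: "FOP",        # file operations
    0x08: "STREAMING",  # read/write streaming
    0x10: "POLL",       # poll
    0x20: "CTRL",       # control operations (bit 5)
}

def decode_dev_debug(mask):
    # Return the names of all flags set in the mask, lowest bit first.
    return [name for bit, name in V4L2_DEV_DEBUG_BITS.items() if mask & bit]

print(decode_dev_debug(0x3f))
```

Under these assumed values, `0x3f` sets all six flags including `CTRL`, so a control-validation dprintk would be expected in dmesg if that path were taken.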
## Open questions tabled for Phase 1
1. **Empirical Q1**: Add `pr_warn` at entry of `rkvdec_hevc_validate_sps` (already done in iter4 instrumented build) — confirm whether it fires per-frame and whether any of the four checks reports unexpected values different from what strace shows the backend submitting. If pr_warn fires and all values pass, validate_sps returns 0 and the EINVAL is elsewhere. If pr_warn does not fire, the rejection is upstream (e.g. `prepare_ext_ctrls`).
2. **Q2 (depends on Q1 outcome — branching)**:
   - **2a** if validate_sps DOES fail one check: the SPS the backend submits is wrong; fix backend's `h265_fill_sps` to match what kernel expects (probably some flag bit or bit-depth interpretation).
   - **2b** if validate_sps doesn't fail: the EINVAL is upstream — likely in `prepare_ext_ctrls` (size mismatch — already ruled out — or unknown CID) OR per-control loop checks before user_to_new (read-only, grabbed). Need to instrument try_set_ext_ctrls_common to dump where ret first becomes non-zero.
3. **Q3**: For the request-API path specifically (`try_set_ext_ctrls_request`), there's an extra `media_request_object_find` + `v4l2_ctrl_request_clone` step. Could one of those fail per-frame? Check via `pr_warn` in those helpers.
4. **Q4** (orthogonal): per `feedback_va_st_rps_bits_is_slice_field`, the iter2 backend mapped `short_term_ref_pic_set_size` into `decode_params.short_term_ref_pic_set_size` (later reverted in iter37 to slice_params). Verify the iter2/iter3 build has this correctly placed in slice_params, not decode_params. Decoded slice_params from strace shows the fields explicitly.
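
The pass/fail reasoning behind Q1 and Q2 can be replayed in userspace before the instrumented module is loaded. The following is a rough Python model of the four checks as they are summarized in the evidence table, not a transcription of the kernel function: the `Sps` class and `failed_checks` helper are hypothetical, and applying the {0, 2} membership test to the luma depth is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Sps:
    chroma_format_idc: int
    bit_depth_luma_minus8: int
    bit_depth_chroma_minus8: int
    pic_width: int
    pic_height: int

def failed_checks(sps, coded_width, coded_height):
    """Return names of the modelled checks that would yield -EINVAL."""
    failed = []
    if sps.chroma_format_idc != 1:
        failed.append("chroma")
    if sps.bit_depth_luma_minus8 != sps.bit_depth_chroma_minus8:
        failed.append("bit-depth mismatch")
    # Assumption: the {0, 2} test is applied to the luma depth.
    if sps.bit_depth_luma_minus8 not in (0, 2):
        failed.append("bit-depth range")
    if sps.pic_width > coded_width or sps.pic_height > coded_height:
        failed.append("dimensions")
    return failed

# Strace-decoded SPS values checked against the 1280x720 coded_fmt.
observed = Sps(1, 0, 0, 1280, 720)
print(failed_checks(observed, 1280, 720))
```

An empty result for the observed values matches the "all four checks PASS" row. If the `pr_warn` output shows values that make this model return a non-empty list, that points at branch 2a; otherwise branch 2b applies.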
## Phase 0 close
Substrate locked. Iter3 fix carried in. iter4 instrumented kernel built + scp'd, ready to load post-reboot. Q1 (empirical validate_sps trace) is the cheap gating question — Phase 1 starts there.