ampere-kernel-decoders/iter4_close.md
Markus Fritsche 46c956bd51 iter4 close — second kernel bug: missing HEVC_SLICE_PARAMS registration
Casanova/Collabora v7.0 HEVC series forgot to register
V4L2_CID_STATELESS_HEVC_SLICE_PARAMS in vdpu38x_hevc_ctrl_descs[].
The legacy rkvdec_hevc_ctrl_descs[] (RK3399 path) has it; the new
vdpu381/vdpu383 path doesn't. Every per-frame S_EXT_CTRLS fails
with EINVAL ("cannot find control id 0xa40a92").

Surfaced via dev_debug=0x3f on /sys/class/video4linux/videoN —
prepare_ext_ctrls's "cannot find" dprintk is gated behind
V4L2_DEV_DEBUG_CTRL (bit 0x20), invisible by default.

1-line patch (5 lines with formatting) mirrors the legacy entry:
SLICE_PARAMS as DYNAMIC_ARRAY, dims={600} (HEVC level >6 max).

Verified on ampere: no EINVAL, no dmesg errors, ffmpeg exit 0,
3-frame NV12 output structurally valid. But output bytes are all
Y=16/Cb=Cr=128 (solid black) — separate downstream bitstream-
feeding bug, deferred to iter5.

Iter5 starts with LIBVA_V4L2_DUMP_OUTPUT to confirm whether the
OUTPUT bitstream is reaching the kernel correctly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 11:18:04 +00:00

# Iter4 close — second kernel bug: missing HEVC_SLICE_PARAMS registration
Date: 2026-05-16 (afternoon, immediately following iter3 close)
Branch: `master`
Substrate: ampere `7.0.0-rc3-devices+` with iter3 fix (ext_sps NULL init) carried in.
Backend: iter3 instrumented build, md5 `404041ea2dcc03c769e0ab8c43ddadd6`, deployed at `/usr/lib/dri/`.
## Bottom line
**The Casanova/Collabora v7.0 HEVC series forgot to register `V4L2_CID_STATELESS_HEVC_SLICE_PARAMS` in the new `vdpu38x_hevc_ctrl_descs[]` table.** The legacy `rkvdec_hevc_ctrl_descs[]` (RK3399 path) has it; the new vdpu381/vdpu383 path doesn't. Result: every per-frame `VIDIOC_S_EXT_CTRLS` returns `-EINVAL` ("cannot find control id 0xa40a92"), userspace falls through and queues requests with no controls committed, and the decoder runs on zero-initialized control state. That yields all-zero output or, before the iter3 fix, an OOPS on uninitialized memory.
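To double-check that `0xa40a92` really is the slice-params control, the ID can be decoded against the stateless codec control base. The sketch below hard-codes the constants as I understand them from `include/uapi/linux/v4l2-controls.h`; the helper name `name_hevc_ctrl` is hypothetical and not part of the backend or the kernel.

```python
# Constants copied from include/uapi/linux/v4l2-controls.h (assumed unchanged
# in the 7.0-rc tree used here).
V4L2_CTRL_CLASS_CODEC_STATELESS = 0x00A40000
V4L2_CID_CODEC_STATELESS_BASE = V4L2_CTRL_CLASS_CODEC_STATELESS | 0x900

# Offsets of the stateless HEVC controls relative to the base.
HEVC_STATELESS_OFFSETS = {
    400: "V4L2_CID_STATELESS_HEVC_SPS",
    401: "V4L2_CID_STATELESS_HEVC_PPS",
    402: "V4L2_CID_STATELESS_HEVC_SLICE_PARAMS",
    403: "V4L2_CID_STATELESS_HEVC_SCALING_MATRIX",
    404: "V4L2_CID_STATELESS_HEVC_DECODE_PARAMS",
    405: "V4L2_CID_STATELESS_HEVC_DECODE_MODE",
    406: "V4L2_CID_STATELESS_HEVC_START_CODE",
}


def name_hevc_ctrl(cid: int) -> str:
    """Map a numeric control ID from a dprintk line to its symbolic name."""
    offset = cid - V4L2_CID_CODEC_STATELESS_BASE
    return HEVC_STATELESS_OFFSETS.get(offset, f"unknown (0x{cid:x})")


if __name__ == "__main__":
    print(name_hevc_ctrl(0xA40A92))
```

The offset works out to `0xa40a92 - 0xa40900 = 0x192 = 402`, which is the SLICE_PARAMS slot.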
## Falsifier outcome
F1 (kernel rejects 5-ctrl batch with EINVAL): **TRUE pre-patch** — confirmed by enabling `V4L2_DEV_DEBUG_CTRL` (bit 0x20) on `/sys/class/video4linux/videoN/dev_debug`, which surfaced the previously-silent `prepare_ext_ctrls: cannot find control id 0xa40a92` dprintk.
F2 (registering HEVC_SLICE_PARAMS in `vdpu38x_hevc_ctrl_descs` makes the kernel accept the batch): **FALSE → TRUE**. A one-entry patch (6 source lines as formatted) eliminated the EINVAL. ffmpeg exits 0, and dmesg is free of `S_EXT_CTRLS: error` and `cannot find control id`. The decoder runs.
F3 (decoder produces non-empty output post-patch): **FALSE**. Output `/tmp/o.nv12` is 4147200 bytes, the correct size for three NV12 frames (consistent with 1280×720: 3 × 1382400 bytes), but it contains only Y=16 (luma "video black") and Cb/Cr=128 (neutral chroma), i.e. solid black. The decoder runs but the bitstream isn't being interpreted. This is the bug handed off to iter5.
## Root cause (iter4 Phase 6)
`drivers/media/platform/rockchip/rkvdec/rkvdec.c` has two HEVC ctrl_descs arrays:
| array | line | registers SLICE_PARAMS? |
|-------|------|-------------------------|
| `rkvdec_hevc_ctrl_descs[]` (legacy RK3399 path) | 189 | YES — dynamic array, dims={600} |
| `vdpu38x_hevc_ctrl_descs[]` (Casanova RK3588/RK3576 path) | ~240 | **NO** |
Both paths run through the same `rkvdec_hevc_run_preamble` (rkvdec-hevc-common.c:478), which calls `v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_HEVC_SLICE_PARAMS)`. With the new table, the control isn't in the handler, so the failure chain is: userspace `VIDIOC_S_EXT_CTRLS` for this CID fails in `prepare_ext_ctrls` → the ioctl returns -EINVAL → the kernel sets `error_idx = cs->count` (since `set=true`) → the backend sees `error_idx=5, count=5, err=-22`.
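The `error_idx=5, count=5` pair is itself diagnostic: per the V4L2 extended-controls documentation, `error_idx == count` on `VIDIOC_S_EXT_CTRLS` means the batch was rejected during validation and no control was applied, while a smaller index points at the control that failed while being applied. A minimal sketch of that interpretation follows; the function name is hypothetical and this is not code from the backend.

```python
import errno


def describe_s_ext_ctrls_failure(err: int, error_idx: int, count: int) -> str:
    """Interpret the (err, error_idx, count) triple from VIDIOC_S_EXT_CTRLS."""
    if err == 0:
        return "success: all controls applied"
    # Kernel-style negative errno values are mapped to their symbolic name.
    name = errno.errorcode.get(abs(err), str(err))
    if error_idx == count:
        # Validation failed before anything was written to the handler.
        return f"{name}: rejected during validation, none of the {count} controls were applied"
    if 0 <= error_idx < count:
        # Controls before error_idx may already have been modified.
        return f"{name}: control at index {error_idx} failed while being applied"
    return f"{name}: unexpected error_idx={error_idx} for count={count}"


if __name__ == "__main__":
    # Values observed by the backend in this iteration.
    print(describe_s_ext_ctrls_failure(-22, 5, 5))
```

Because the index equals the count, the return value alone does not say which of the five controls was unknown; that is why the gated dprintk was needed to name `0xa40a92`.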
The reason this stayed silent in earlier debugging: dprintks for `cannot find control id 0x%x` are gated behind `V4L2_DEV_DEBUG_CTRL = 0x20`. Default `/sys/.../dev_debug` is 0 — no ctrl-class dprintks. Setting `0x3f` (all 6 bits) on every video device surfaced the lookup failure immediately.
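For reference, the six `dev_debug` bits can be composed by name. The bit values below are the `V4L2_DEV_DEBUG_*` definitions as I know them from the kernel's V4L2 headers; the helper names and the `sysfs_root` parameter are hypothetical, and writing to sysfs requires root.

```python
from pathlib import Path

# V4L2_DEV_DEBUG_* bit values (assumed to match the running kernel).
V4L2_DEV_DEBUG_BITS = {
    "IOCTL": 0x01,      # log ioctl name and error code
    "IOCTL_ARG": 0x02,  # log ioctl arguments
    "FOP": 0x04,        # log file operations
    "STREAMING": 0x08,  # log streaming ioctls and read/write
    "POLL": 0x10,       # log poll()
    "CTRL": 0x20,       # log control handling, incl. "cannot find control id"
}


def dev_debug_mask(*names: str) -> int:
    """OR together the named debug bits."""
    mask = 0
    for name in names:
        mask |= V4L2_DEV_DEBUG_BITS[name]
    return mask


def set_dev_debug(mask: int, sysfs_root: str = "/sys/class/video4linux") -> list[str]:
    """Write the mask to every videoN/dev_debug node; returns the paths written."""
    written = []
    for node in sorted(Path(sysfs_root).glob("video*/dev_debug")):
        node.write_text(f"{mask}\n")
        written.append(str(node))
    return written


if __name__ == "__main__":
    print(hex(dev_debug_mask(*V4L2_DEV_DEBUG_BITS)))
```

Setting only `CTRL` (`0x20`) would likely have been enough to surface the lookup failure with far less log noise than `0x3f`.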
## Minimal kernel patch (verified working)
```diff
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
@@ -242,6 +242,12 @@ static const struct rkvdec_ctrl_desc vdpu38x_hevc_ctrl_descs[] = {
 	{
 		.cfg.id = V4L2_CID_STATELESS_HEVC_DECODE_PARAMS,
 	},
+	{
+		.cfg.id = V4L2_CID_STATELESS_HEVC_SLICE_PARAMS,
+		.cfg.flags = V4L2_CTRL_FLAG_DYNAMIC_ARRAY,
+		.cfg.type = V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS,
+		.cfg.dims = { 600 },
+	},
 	{
 		.cfg.id = V4L2_CID_STATELESS_HEVC_SPS,
 		.cfg.ops = &rkvdec_ctrl_ops,
```
This mirrors the legacy `rkvdec_hevc_ctrl_descs[]` entry. `600` is the absolute maximum number of slices per frame at the top HEVC levels (level 6 and above), and matches `visl` and legacy rkvdec.
## Verification (on-target empirical)
```
$ ssh ampere 'sudo dmesg --clear; LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i bbb_60s_720p.hevc.mp4 -vf hwdownload,format=nv12 -frames:v 3 -f rawvideo -pix_fmt nv12 /tmp/o.nv12; echo exit=$?'
exit=0
$ ssh ampere 'sudo dmesg | grep -E "rkvdec|cannot find|S_EXT_CTRLS: error"'
[Sat May 16 13:17:08 2026] rkvdec fdc40100.video-codec: missing multi-core support, ignoring this instance
$ ssh ampere 'ls -la /tmp/o.nv12; md5sum /tmp/o.nv12; head -c 4147200 /tmp/o.nv12 | od -An -tu1 -w1 | sort -u'
-rw-r--r-- 1 mfritsche mfritsche 4147200 May 16 13:17 /tmp/o.nv12
25ae521379343783da65b1fc80b1e8e8 /tmp/o.nv12
16
128
```
No dmesg errors, no EINVAL, ffmpeg exit 0, and a 3-frame NV12 output of the correct size. All bytes are 16 (Y) or 128 (Cb/Cr): solid black, but a structurally valid decode (no OOPS, no truncation).
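The `od | sort -u` check only shows which byte values occur anywhere in the file; it does not confirm that 16 is confined to the luma plane and 128 to the chroma plane. A per-plane check could look like the sketch below. It assumes 1280×720 frames (inferred from the `720p` file name and the 4147200-byte size, not stated elsewhere), and the function names are hypothetical.

```python
def nv12_frame_size(width: int, height: int) -> int:
    """NV12 = full-resolution Y plane + half-size interleaved CbCr plane."""
    return width * height * 3 // 2


def summarize_nv12(data: bytes, width: int, height: int) -> list[tuple[list[int], list[int]]]:
    """Return (distinct luma values, distinct chroma values) for each frame."""
    frame_size = nv12_frame_size(width, height)
    if len(data) % frame_size:
        raise ValueError("buffer is not a whole number of NV12 frames")
    luma_size = width * height
    summary = []
    for start in range(0, len(data), frame_size):
        frame = data[start:start + frame_size]
        luma = sorted(set(frame[:luma_size]))
        chroma = sorted(set(frame[luma_size:]))
        summary.append((luma, chroma))
    return summary


def is_solid_black(data: bytes, width: int, height: int) -> bool:
    """True if every frame is limited-range black: Y=16, Cb=Cr=128."""
    return all(luma == [16] and chroma == [128]
               for luma, chroma in summarize_nv12(data, width, height))


if __name__ == "__main__":
    # Three 1280x720 frames should account for the observed file size.
    print(nv12_frame_size(1280, 720) * 3)
```

Running `is_solid_black(open("/tmp/o.nv12", "rb").read(), 1280, 720)` on the target would confirm the per-plane picture, under the resolution assumption above.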
## Why output is still black (deferred to iter5)
Possible causes for iter5 to investigate:
1. **OUTPUT bitstream not reaching hardware** — backend assembles slice NALs into `source_data`, but maybe slices_size or QBUF length is wrong → hardware reads empty buffer → produces blank frame.
2. **Slice header field mismatch** — backend's `h265_fill_slice_params` may put bit_size/data_byte_offset/slice_segment_addr in fields the kernel doesn't expect. Strace shows `bit_size=0x1038` (519 bytes), `data_byte_offset=17` — plausible but unverified against the actual NAL.
3. **start_code prefix handling** — backend prepends Annex-B `00 00 00 01` when `h264_start_code=true`. For HEVC under DECODE_MODE_FRAME_BASED + START_CODE_ANNEX_B (both registered in vdpu38x_hevc_ctrl_descs), this should match — but the iter2 backend used `h264_start_code` as a profile-independent flag (per `feedback_unconditional_codec_state`); verify it gates correctly for HEVC.
4. **DECODE_PARAMS dpb/poc fields** — for IDR frame 1, dpb should be empty (num_active_dpb_entries=0), num_poc_st_curr_before/after/lt_curr=0. If backend sets non-zero, kernel may interpret as needing references that don't exist.
iter5 starts by enabling `LIBVA_V4L2_DUMP_OUTPUT=<dir>` to capture the per-frame OUTPUT bitstream bytes, then diffing them against the raw NALs of the input HEVC stream to confirm the bitstream is being forwarded correctly. From there, branch into (2)/(3)/(4) depending on the findings.
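For the dump-versus-input comparison, a small Annex-B splitter is enough to line up NAL units and read their HEVC types. This is a sketch only: it assumes each dumped buffer is a plain Annex-B byte stream (the on-disk layout produced by `LIBVA_V4L2_DUMP_OUTPUT` is not verified here), and the function names are hypothetical.

```python
def split_annex_b(buf: bytes) -> list[bytes]:
    """Split an Annex-B byte stream into NAL units with start codes removed."""
    nals = []
    i = buf.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = buf.find(b"\x00\x00\x01", start)
        end = len(buf) if nxt == -1 else nxt
        # Trailing zeros belong to the next 4-byte start code or to padding;
        # a valid NAL unit never ends in 0x00.
        nal = buf[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        i = nxt
    return nals


def hevc_nal_type(nal: bytes) -> int:
    """nal_unit_type is bits 1..6 of the first HEVC NAL header byte."""
    return (nal[0] >> 1) & 0x3F


def first_mismatch(dumped: list[bytes], reference: list[bytes]):
    """Index of the first NAL that differs, or None if the lists are identical."""
    for idx, (a, b) in enumerate(zip(dumped, reference)):
        if a != b:
            return idx
    if len(dumped) != len(reference):
        return min(len(dumped), len(reference))
    return None


if __name__ == "__main__":
    sample = b"\x00\x00\x00\x01\x40\x01\xaa\x00\x00\x01\x26\x01\xbb"
    print([hevc_nal_type(n) for n in split_annex_b(sample)])
```

Since the input here is an MP4 container, the reference NALs would first need extracting to Annex-B (for example with ffmpeg's `hevc_mp4toannexb` bitstream filter) before the two lists can be compared.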
## Phase 6 question completion (iter4)
| Q | Answer |
|---|--------|
| Q1 — empirical: validate_sps fires per-frame? | NO — fires twice (CreateContext dummy + rkvdec_hevc_start), NOT per-frame. Rules out validate_sps as the EINVAL source. |
| Q2a/b — which check fails | Neither validate_sps nor validate_new. Failure is in `prepare_ext_ctrls`'s `find_ref_lock` for `0xa40a92` (HEVC_SLICE_PARAMS) which isn't registered. |
| Q3 — request-API extra steps | Not the issue. The clone path replicates whichever ctrls are registered in master, so missing SLICE_PARAMS propagates. |
| Q4 — st_rps_bits field mapping | Not relevant to this iteration — iter4's bug is upstream of EXT_SPS_*_RPS handling. iter5 may revisit. |
## Substrate state at close
- Backend `.so`: unchanged (md5 `404041ea2dcc03c769e0ab8c43ddadd6`)
- Kernel module: includes both iter3 fix (`run->ext_sps_st_rps/lt_rps = NULL` in preamble) AND iter4 fix (HEVC_SLICE_PARAMS registered in vdpu38x_hevc_ctrl_descs)
- diagnostic `pr_warn` from iter4 Phase 6 still present in `rkvdec_hevc_validate_sps` — harmless, fires twice per session
- Both kernel fixes need filing as separate kernel-agent issues against Casanova v7.0 series (iter3 → kernel-agent#14 (filed); iter4 → kernel-agent#15 (TBD))
- diagnostic `0x3f` on `/sys/.../dev_debug` should be reset to 0 for production (`echo 0 | sudo tee /sys/class/video4linux/video*/dev_debug`)
## Iter4 takeaway
The 8-phase loop's Phase 6 question-driven instrumentation (Q1 empirical validate_sps trace) worked again: pr_warn falsified the assumed culprit immediately, redirecting attention to the dprintk-gated `prepare_ext_ctrls: cannot find control id` log that revealed the actual missing registration. Total iter4 wall-clock: ~30 min from Phase 0 lock-in to verified fix.
Iter5 picks up the "decoder runs but output is solid black" downstream bug.