## Iteration 24 — Phase 8 (close)

Closes 2026-05-14. iter24 = kernel printk logging `req_to_new` and `try_or_set_cluster` return values. FULL close. **ROOT CAUSE IDENTIFIED.**

### Method

`linux-fresnel-fourier 7.0-8` (pkgrel 7→8). Added `pr_info` after each kernel framework call in `v4l2_ctrl_request_setup`'s cluster-process block:

```c
ret = req_to_new(r);
pr_info("iter24_req_to_new: id=0x%x ret=%d p_req_valid=%d p_req_elems=%u\n",
	master->cluster[i]->id, ret, r->p_req_valid, r->p_req_elems);
...
ret = try_or_set_cluster(NULL, master, true, 0);
pr_info("iter24_try_or_set: master_id=0x%x ret=%d\n", master->id, ret);
```

### Result — definitive

**libva HEVC** (all 10+ setups, identical pattern):

```
iter24_req_to_new: id=0xa40a90 ret=0 p_req_valid=1 p_req_elems=1
iter24_try_or_set: master_id=0xa40a90 ret=-16
iter24_loop_break: at master_id=0xa40a90 ret=-16
iter24_loop_done: final ret=-16
```

`-16` is `-EBUSY`. `req_to_new` succeeds. `try_or_set_cluster` returns -EBUSY for HEVC_SPS, **exiting the setup loop**.

**kdirect HEVC**: continues processing all 5 staged controls successfully (ret=0 throughout).

### Source localization

The only -EBUSY path in `try_or_set_cluster` is `call_op(master, s_ctrl)` for HEVC_SPS, which dispatches to `rkvdec_s_ctrl` in `drivers/media/platform/rockchip/rkvdec/rkvdec.c:149`:

```c
static int rkvdec_s_ctrl(struct v4l2_ctrl *ctrl)
{
	struct rkvdec_ctx *ctx = container_of(ctrl->handler, struct rkvdec_ctx, ctrl_hdl);
	const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc;
	enum rkvdec_image_fmt image_fmt;
	struct vb2_queue *vq;
	...
	/* Check if this change requires a capture format reset */
	if (!desc->ops->get_image_fmt)
		return 0;

	image_fmt = desc->ops->get_image_fmt(ctx, ctrl);
	if (rkvdec_image_fmt_changed(ctx, image_fmt)) {
		vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
				     V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE);
		if (vb2_is_busy(vq))
			return -EBUSY;  // ← THIS

		ctx->image_fmt = image_fmt;
		rkvdec_reset_decoded_fmt(ctx);
	}

	return 0;
}
```

### Root cause

When the first HEVC_SPS arrives, rkvdec needs to determine the output image format from SPS fields (chroma_format_idc, bit_depth_luma/chroma_minus8). If the format differs from the previous/default (which it does at first frame, because `ctx->image_fmt` starts at the default), rkvdec wants to reset the CAPTURE format. But it can only do that if the CAPTURE queue has NO buffers allocated. `vb2_is_busy(vq)` returns true if `vq->num_buffers > 0`.

**libva pre-allocates 24 CAPTURE buffers at CreateContext (iter5b-β design)**. By the time the first per-frame S_EXT_CTRLS(HEVC_SPS, REQUEST_VAL) fires, CAPTURE is already full → vb2_is_busy=true → -EBUSY → setup loop exits → SPS never committed → all-zero in ctx->ctrl_hdl → rkvdec_hevc_run reads zero.

**kdirect (ffmpeg-v4l2request)** allocates CAPTURE buffers AFTER the SPS-driven format is known. So when its first S_EXT_CTRLS fires, CAPTURE is EMPTY → vb2_is_busy=false → format reset succeeds → s_ctrl returns 0 → SPS commits correctly.

### This is THE Bug 5 root cause

After 24 iterations of investigation, including 8 wire-byte hypothesis eliminations, 4 mechanism eliminations, and 5 kernel-side printk iterations:

**Bug 5 (HEVC libva = all-zero CAPTURE) is caused by libva pre-allocating CAPTURE buffers before the first SPS-set, blocking rkvdec's format-reset.**

Bug 4 (H264 libva = keyframe partial) is likely the same root cause: H264_SPS triggers the same image_fmt check via rkvdec_h264_fmt_ops's get_image_fmt.
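
The ordering dependence in the root-cause analysis can be reproduced in a small user-space toy model. This is a sketch, not kernel code: `toy_ctx`, `toy_s_ctrl_sps`, `toy_alloc_then_sps`, and `toy_sps_then_alloc` are hypothetical names invented for illustration, and the model keeps only two facts from the excerpt, namely that the SPS-derived format is compared against the stored one and that a non-empty CAPTURE queue blocks the reset. Committing the SPS inside the toy `s_ctrl` is a simplification; in the kernel the control framework does that after `s_ctrl` returns 0.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; not the real rkvdec/vb2 types. */
enum toy_image_fmt { TOY_FMT_DEFAULT, TOY_FMT_420_8BIT };

struct toy_ctx {
	enum toy_image_fmt image_fmt;  /* models ctx->image_fmt */
	unsigned int capture_buffers;  /* models the CAPTURE queue's buffer count */
	bool sps_committed;            /* models whether the SPS value was applied */
};

/* Models vb2_is_busy(): busy as soon as any buffer is allocated. */
static bool toy_queue_is_busy(const struct toy_ctx *ctx)
{
	return ctx->capture_buffers > 0;
}

/* Models the tail of rkvdec_s_ctrl for an SPS control. */
static int toy_s_ctrl_sps(struct toy_ctx *ctx, enum toy_image_fmt from_sps)
{
	if (ctx->image_fmt != from_sps) {
		if (toy_queue_is_busy(ctx))
			return -EBUSY;         /* format reset refused */
		ctx->image_fmt = from_sps;     /* format reset allowed */
	}
	ctx->sps_committed = true;  /* toy shortcut: commit only on success */
	return 0;
}

/* libva-style ordering: allocate the CAPTURE pool, then set the SPS. */
static int toy_alloc_then_sps(struct toy_ctx *ctx)
{
	ctx->capture_buffers = 24;
	return toy_s_ctrl_sps(ctx, TOY_FMT_420_8BIT);
}

/* kdirect-style ordering: set the SPS, then allocate the CAPTURE pool. */
static int toy_sps_then_alloc(struct toy_ctx *ctx)
{
	int ret = toy_s_ctrl_sps(ctx, TOY_FMT_420_8BIT);

	if (ret == 0)
		ctx->capture_buffers = 24;
	return ret;
}
```

Starting from a context whose `image_fmt` is `TOY_FMT_DEFAULT` and whose queue is empty, `toy_alloc_then_sps` returns `-EBUSY` and leaves `sps_committed` false, while `toy_sps_then_alloc` returns 0 and ends with the new format stored and the pool allocated.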

### Why VP9 works through libva

VP9 (`rkvdec_vp9_ctrls`) might NOT have a `get_image_fmt` op (vp9_frame is the only control, and chroma+bit_depth come from the frame header, not a separate SPS). Or VP9's frame parameters always resolve to the same image_fmt as the default. Either way, no format-reset attempt → no -EBUSY.

### Mechanism status — RESOLVED

| # | Mechanism | Status |
|---|---|---|
| ALL prior | various | DISPROVED iter17-23 |
| **iter24** | **rkvdec_s_ctrl returns -EBUSY for HEVC_SPS because CAPTURE queue is busy with libva's pre-allocated pool** | **CONFIRMED — ROOT CAUSE** |

### Fix candidates

**Option A** (libva backend fix): Defer libva's CAPTURE pool allocation until AFTER the first per-frame SPS is set. Concretely:

- At CreateContext: skip cap_pool_init.
- On first BeginPicture/EndPicture: after first S_EXT_CTRLS(SPS) succeeds, then REQBUFS+QUERYBUF+MMAP the CAPTURE pool.
- Risk: changes the iter5b-β "permanent CAPTURE pool" model, may regress VP9/MPEG-2.

**Option B** (libva backend fix, narrower): Use S_FMT(CAPTURE) BEFORE allocating CAPTURE buffers, with the same image_fmt the SPS will request. This way, ctx->image_fmt is already correct when SPS arrives → rkvdec_image_fmt_changed returns false → no reset attempt → no -EBUSY.

**Option C** (kernel fix, upstream): Change rkvdec_s_ctrl to silently no-op the format-reset if the image_fmt is already correct, even if get_image_fmt returns a value that triggered the check. This is risky because it changes upstream rkvdec semantics.

**Option B is preferred**: minimal libva change, aligns with kdirect's pattern (set S_FMT(CAPTURE) before allocating).

### iter25 candidate

Implement Option B in libva backend's CreateContext: explicit `v4l2_set_format(CAPTURE, V4L2_PIX_FMT_NV12, fixture_w, fixture_h)` BEFORE `cap_pool_init`. Set the expected format from BBB's parameters (chroma 4:2:0, 8-bit → NV12).

The iter25 change builds on iter15's α-19, which already adds an explicit S_FMT(CAPTURE) call, but verify that it ACTUALLY runs before cap_pool_init in the libva CreateContext flow.

### Substrate state at iter24 close

- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged).
- Fork tip `e109306`: unchanged.
- Kernel `linux-fresnel-fourier 7.0-8` with iter17 + iter20-24 printks.
- 5-codec anchors: unchanged.

### Lesson

8 iterations of wire-byte and ioctl-sequence analysis (iter11-iter18) chased an empirical illusion. Once kernel-side printk landed (iter17), 4 more iterations (iter20-23) walked the symptom down to one function call returning one specific error code. **The bug was in a 5-line kernel function we'd never read.** Now we have the right diagnosis and a clear forward path.