iter20-23: kernel printk in rkvdec_hevc_run + v4l2_ctrl_request_setup
iter24: pinpointed rkvdec_s_ctrl returning -EBUSY for HEVC_SPS due
to vb2_is_busy(CAPTURE) — libva pre-allocates 24 CAPTURE bufs
before first per-frame S_EXT_CTRLS, blocking image_fmt reset
iter25 α-25: synthetic SPS injection before cap_pool_init seeds
ctx->image_fmt to RKVDEC_IMG_FMT_420_8BIT while CAPTURE is
still empty. H264 Bug 4 fully fixed (byte-equal kdirect).
HEVC Bug 5 frame 1 fixed (byte-equal kdirect).
iter26 α-26: populate decode_params.short_term_ref_pic_set_size from
picture->st_rps_bits (VAAPI does expose it). Bytes 4-5 of
dp now match kdirect. HEVC frame 2+ still diverges
(separate bug, likely DPB entry mapping).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 24 — Phase 8 (close)
Closed 2026-05-14. iter24 added kernel printk logging of the req_to_new and try_or_set_cluster return values. FULL close. ROOT CAUSE IDENTIFIED.
Method
linux-fresnel-fourier 7.0-8 (pkgrel 7→8). Added pr_info after each kernel framework call in v4l2_ctrl_request_setup's cluster-process block:
ret = req_to_new(r);
pr_info("iter24_req_to_new: id=0x%x ret=%d p_req_valid=%d p_req_elems=%u\n",
	master->cluster[i]->id, ret, r->p_req_valid, r->p_req_elems);
...
ret = try_or_set_cluster(NULL, master, true, 0);
pr_info("iter24_try_or_set: master_id=0x%x ret=%d\n", master->id, ret);
Result — definitive
libva HEVC (all 10+ setups, identical pattern):
iter24_req_to_new: id=0xa40a90 ret=0 p_req_valid=1 p_req_elems=1
iter24_try_or_set: master_id=0xa40a90 ret=-16
iter24_loop_break: at master_id=0xa40a90 ret=-16
iter24_loop_done: final ret=-16
-16 is -EBUSY. req_to_new succeeds. try_or_set_cluster returns -EBUSY for HEVC_SPS, exiting the setup loop.
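As a quick cross-check of that errno mapping, here is a small userspace sketch (not part of the kernel patch). The helper names `kernel_ret_to_str` and `print_kernel_ret` are hypothetical and exist only for illustration. It relies on the kernel convention of returning a negated errno value, so the value is negated again before being passed to `strerror`.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: kernel functions return -errno on failure, so negate
 * the value before handing it to strerror(). Returns "ok" for ret >= 0. */
const char *kernel_ret_to_str(int ret)
{
	return ret < 0 ? strerror(-ret) : "ok";
}

/* Print one decoded line, e.g. for the ret=-16 seen in the iter24 log. */
void print_kernel_ret(int ret)
{
	printf("ret=%d (%s)\n", ret, kernel_ret_to_str(ret));
}
```

Calling `print_kernel_ret(-16)` on a Linux host prints the libc description of EBUSY, which confirms the reading of the log above without consulting the headers by hand.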
kdirect HEVC: continues processing all 5 staged controls successfully (ret=0 throughout).
Source localization
Aside from the V4L2_CTRL_FLAG_GRABBED check at the top of the function, the only -EBUSY path in try_or_set_cluster is call_op(master, s_ctrl) for HEVC_SPS, which dispatches to rkvdec_s_ctrl in drivers/media/platform/rockchip/rkvdec/rkvdec.c:149:
static int rkvdec_s_ctrl(struct v4l2_ctrl *ctrl)
{
	struct rkvdec_ctx *ctx = container_of(ctrl->handler, struct rkvdec_ctx, ctrl_hdl);
	const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc;
	enum rkvdec_image_fmt image_fmt;
	struct vb2_queue *vq;
	...
	/* Check if this change requires a capture format reset */
	if (!desc->ops->get_image_fmt)
		return 0;
	image_fmt = desc->ops->get_image_fmt(ctx, ctrl);
	if (rkvdec_image_fmt_changed(ctx, image_fmt)) {
		vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
				     V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE);
		if (vb2_is_busy(vq))
			return -EBUSY; // ← THIS
		ctx->image_fmt = image_fmt;
		rkvdec_reset_decoded_fmt(ctx);
	}
	return 0;
}
Root cause
When the first HEVC_SPS arrives, rkvdec needs to determine the output image format from SPS fields (chroma_format_idc, bit_depth_luma/chroma_minus8). If the format differs from the previous/default — which it does at first-frame because ctx->image_fmt starts at the default — rkvdec wants to reset the CAPTURE format.
But it can only do that if the CAPTURE queue has NO buffers allocated. vb2_is_busy(vq) returns true as soon as the queue has any buffers allocated, whether or not they are queued or streaming.
libva pre-allocates 24 CAPTURE buffers at CreateContext (iter5b-β design). By the time the first per-frame S_EXT_CTRLS(HEVC_SPS, REQUEST_VAL) fires, the CAPTURE queue already holds buffers → vb2_is_busy=true → -EBUSY → setup loop exits → SPS never committed → the SPS in ctx->ctrl_hdl stays all-zero → rkvdec_hevc_run reads zero.
kdirect (ffmpeg-v4l2request) allocates CAPTURE buffers AFTER the SPS-driven format is known. So when its first S_EXT_CTRLS fires, CAPTURE is EMPTY → vb2_is_busy=false → format reset succeeds → s_ctrl returns 0 → SPS commits correctly.
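The ordering dependency described above can be reproduced in a toy userspace model. This is only a sketch of the logic as described in this note, not the driver code. Every name in it (`model_ctx`, `model_set_sps`, `MODEL_FMT_ANY`, and so on) is made up for illustration. The default format value and the buffer count used for the kdirect path are assumptions.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins for the driver state; names are illustrative, not rkvdec's. */
enum model_image_fmt { MODEL_FMT_ANY = 0, MODEL_FMT_420_8BIT };

struct model_ctx {
	enum model_image_fmt image_fmt;	/* plays the role of ctx->image_fmt */
	unsigned int capture_bufs;	/* buffers allocated on CAPTURE */
	bool sps_committed;		/* did the SPS get past the handler? */
};

/* Models the "queue has buffers allocated" condition from the text. */
static bool model_queue_busy(const struct model_ctx *ctx)
{
	return ctx->capture_bufs > 0;
}

/* Models REQBUFS on the CAPTURE queue. */
void model_reqbufs(struct model_ctx *ctx, unsigned int count)
{
	ctx->capture_bufs = count;
}

/* Models the format check in the s_ctrl handler for an SPS control. */
int model_set_sps(struct model_ctx *ctx, enum model_image_fmt sps_fmt)
{
	if (ctx->image_fmt != sps_fmt) {
		if (model_queue_busy(ctx))
			return -EBUSY;	/* format reset refused */
		ctx->image_fmt = sps_fmt;
	}
	ctx->sps_committed = true;
	return 0;
}

/* libva order: allocate the pool first, then set the SPS. */
int model_libva_order(struct model_ctx *ctx)
{
	model_reqbufs(ctx, 24);
	return model_set_sps(ctx, MODEL_FMT_420_8BIT);
}

/* kdirect order: set the SPS first, then allocate the pool.
 * The count of 24 is an assumption chosen to match the libva path. */
int model_kdirect_order(struct model_ctx *ctx)
{
	int ret = model_set_sps(ctx, MODEL_FMT_420_8BIT);

	if (!ret)
		model_reqbufs(ctx, 24);
	return ret;
}
```

Starting from a zeroed `model_ctx`, `model_libva_order` returns -EBUSY and leaves `sps_committed` false, while `model_kdirect_order` returns 0. Once `image_fmt` already matches, a later `model_set_sps` succeeds even with buffers allocated, which is why seeding the format before pool allocation is enough in this model.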
This is THE Bug 5 root cause
After 24 iterations of investigation, including 8 wire-byte hypothesis eliminations, 4 mechanism eliminations, and 5 kernel-side printk iterations:
Bug 5 (HEVC libva = all-zero CAPTURE) is caused by libva pre-allocating CAPTURE buffers before the first SPS-set, blocking rkvdec's format-reset.
Bug 4 (H264 libva = keyframe partial) is likely the same root cause — H264_SPS triggers the same image_fmt check via rkvdec_h264_fmt_ops's get_image_fmt.
Why VP9 works through libva
VP9 (rkvdec_vp9_ctrls) might NOT have a get_image_fmt op (vp9_frame is the only control, and chroma+bit_depth come from frame header, not a separate SPS). Or VP9's frame parameters always resolve to the same image_fmt as the default. Either way, no format-reset attempt → no -EBUSY.
Mechanism status — RESOLVED
| # | Mechanism | Status |
|---|---|---|
| ALL prior | various | DISPROVED iter17-23 |
| iter24 | rkvdec_s_ctrl returns -EBUSY for HEVC_SPS because CAPTURE queue is busy with libva's pre-allocated pool | CONFIRMED — ROOT CAUSE |
Fix candidates
Option A (libva backend fix): Defer libva's CAPTURE pool allocation until AFTER the first per-frame SPS is set. Concretely:
- At CreateContext: skip cap_pool_init.
- On first BeginPicture/EndPicture: after first S_EXT_CTRLS(SPS) succeeds, then REQBUFS+QUERYBUF+MMAP the CAPTURE pool.
- Risk: changes the iter5b-β "permanent CAPTURE pool" model, may regress VP9/MPEG-2.
Option B (libva backend fix, narrower): Use S_FMT(CAPTURE) BEFORE allocating CAPTURE buffers, with the same image_fmt the SPS will request. This way, ctx->image_fmt is already correct when SPS arrives → rkvdec_image_fmt_changed returns false → no reset attempt → no -EBUSY.
Option C (kernel fix, upstream): Change rkvdec_s_ctrl to skip the format reset, rather than fail with -EBUSY, when the CAPTURE format already configured is compatible with the image_fmt that get_image_fmt returns, even though rkvdec_image_fmt_changed reports a change. This is risky because it changes upstream rkvdec semantics.
Option B is preferred — minimal libva change, aligns with kdirect's pattern (set S_FMT(CAPTURE) before allocating).
iter25 candidate
Implement Option B in libva backend's CreateContext: explicit v4l2_set_format(CAPTURE, V4L2_PIX_FMT_NV12, fixture_w, fixture_h) BEFORE cap_pool_init. Set the expected format from BBB's parameters (chroma 4:2:0, 8-bit → NV12).
This builds on iter15's α-19 which already adds an explicit S_FMT(CAPTURE) call — but verify it ACTUALLY runs before cap_pool_init in the libva CreateContext flow.
Substrate state at iter24 close
- Backend SHA on fresnel: c1d4bb53… (iter15 stable, unchanged).
- Fork tip e109306: unchanged.
- Kernel linux-fresnel-fourier 7.0-8 with iter17 + iter20-24 printks.
- 5-codec anchors: unchanged.
Lesson
8 iterations of wire-byte and ioctl-sequence analysis (iter11-iter18) chased an empirical illusion. Once kernel-side printk landed (iter17), 4 more iterations (iter20-23) walked the symptom down to one function call returning one specific error code. The bug was in a 5-line kernel function we'd never read. Now we have the right diagnosis and a clear forward path.