fresnel-fourier/phase4_iter21_plan.md
marfrit bf67900cd8 iter20-26: kernel-side root-cause localization, α-25/α-26 fix Bug 4, partial Bug 5
iter20-23: kernel printk in rkvdec_hevc_run + v4l2_ctrl_request_setup
iter24:    pinpointed rkvdec_s_ctrl returning -EBUSY for HEVC_SPS due
           to vb2_is_busy(CAPTURE) — libva pre-allocates 24 CAPTURE bufs
           before first per-frame S_EXT_CTRLS, blocking image_fmt reset
iter25 α-25: synthetic SPS injection before cap_pool_init seeds
           ctx->image_fmt to RKVDEC_IMG_FMT_420_8BIT while CAPTURE is
           still empty. H264 Bug 4 fully fixed (byte-equal kdirect).
           HEVC Bug 5 frame 1 fixed (byte-equal kdirect).
iter26 α-26: populate decode_params.short_term_ref_pic_set_size from
           picture->st_rps_bits (VAAPI does expose it). Bytes 4-5 of
           dp now match kdirect. HEVC frame 2+ still diverges
           (separate bug, likely DPB entry mapping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 10:10:56 +00:00


Iteration 21 — Phase 4 (plan)

Opens 2026-05-14. Continues iter20's localization: rkvdec sees an all-zero SPS in ctx->ctrl_hdl for libva but real bytes for kdirect. The break therefore lies between userspace's S_EXT_CTRLS and rkvdec's read of ctx->ctrl_hdl[SPS].p_cur.p.

Locked research question (iter21)

"At the v4l2_ctrl_request_setup() entry for libva's per-frame request_fd, is the V4L2 control-handler object (obj) found? For each control_ref in the request's hdl->ctrl_refs, is p_req_valid == true?"

What this narrows

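v4l2_ctrl_request_setup(req, main_hdl) runs after the request has been queued. rkvdec calls it from its per-job run path (rkvdec_run_preamble), not from the MEDIA_REQUEST_IOC_QUEUE ioctl itself. It iterates hdl->ctrl_refs and applies a control's cluster only if ref->p_req_valid == true for at least one member. The bit is set by the staging path in try_set_ext_ctrls_common (called from try_set_ext_ctrls_request) when which=V4L2_CTRL_WHICH_REQUEST_VAL.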

If libva's S_EXT_CTRLS staged correctly, p_req_valid is true for every control libva submitted. If staging failed silently, p_req_valid is false and v4l2_ctrl_request_setup skips those controls. ctx->ctrl_hdl then stays at zero, which matches the iter20 evidence.
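The skip can be modelled in userspace C. This sketch keeps only the p_req_valid gating of the v4l2_ctrl_request_setup() loop and assumes upstream behavior:

- A cluster is applied if any member has p_req_valid.
- Within an applied cluster, members without p_req_valid keep their current value.
- A cluster with no staged member is skipped with no error.

Locking, req_to_new() failures and volatile autoclusters are left out. struct model_ref and model_request_setup() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct v4l2_ctrl_ref. */
struct model_ref {
	unsigned int id;  /* control id */
	int cluster;      /* refs sharing this index form one cluster */
	bool p_req_valid; /* set only by the request staging path */
	int req_val;      /* value staged in the request (p_req) */
	int cur_val;      /* value the driver later reads (p_cur) */
	bool req_done;    /* cluster already handled in this pass */
};

/*
 * Model of the setup loop. Returns the number of controls that took a
 * value from the request; skipped clusters keep cur_val unchanged.
 */
static int model_request_setup(struct model_ref *refs, size_t n)
{
	int applied = 0;

	assert(refs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		refs[i].req_done = false;

	for (size_t i = 0; i < n; i++) {
		bool have_new_data = false;

		if (refs[i].req_done)
			continue;
		/* Scan the cluster for any staged member. */
		for (size_t j = 0; j < n; j++)
			if (refs[j].cluster == refs[i].cluster && refs[j].p_req_valid)
				have_new_data = true;
		if (!have_new_data)
			continue; /* silent skip: no error, cur_val untouched */

		for (size_t j = 0; j < n; j++) {
			if (refs[j].cluster != refs[i].cluster)
				continue;
			if (refs[j].p_req_valid) { /* req_to_new() copies p_req */
				refs[j].cur_val = refs[j].req_val;
				applied++;
			}
			refs[j].req_done = true;
		}
	}
	return applied;
}
```

With cur_val starting at zero, a control whose ref was never staged leaves the loop still zero. That is the iter20 symptom.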

Approach (α-24 inert → kernel path)

α-24 (libva G_EXT_CTRLS readback after S_EXT_CTRLS) was implemented in 1547a5d+a9c897f. It returned EACCES for all 13 libva HEVC frames and was reverted in e109306. The kernel rejects G_EXT_CTRLS against a request that has not yet completed, so userspace cannot read back the staged values (ref->p_req in the request's handler). Telling the failure mechanisms apart requires kernel printk.

iter21 patches v4l2_ctrl_request_setup with two printk lines:

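```c
/* Placement (assumed): right after media_request_object_find(), before the !obj early return. */
pr_info("iter21_setup: req=%p main_hdl=%p obj=%p\n",
        req, main_hdl, obj);

/* Placement (assumed): in the hdl->ctrl_refs loop, after the cluster scan
 * sets have_new_data and before the !have_new_data skip. */
pr_info("iter21_setup_ref: ctrl_id=0x%x p_req_valid=%d have_new=%d\n",
        ctrl->id, ref->p_req_valid, have_new_data);
```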

Build linux-fresnel-fourier 7.0-5 (pkgrel 4→5), deploy, reboot, run libva-HEVC + kdirect-HEVC, capture dmesg.
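The per-ref lines in a dmesg capture can be extracted with a small parser. This is a minimal sketch that assumes the exact iter21_setup_ref format string above. parse_setup_ref() and struct setup_ref_line are hypothetical. The parser searches for the tag rather than anchoring at the start of the line, so the dmesg timestamp and any pr_fmt prefix are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed "iter21_setup_ref:" line. */
struct setup_ref_line {
	unsigned int ctrl_id;
	int p_req_valid;
	int have_new;
};

/* Returns true if line contains a complete iter21_setup_ref record. */
static bool parse_setup_ref(const char *line, struct setup_ref_line *out)
{
	const char *tag;

	assert(line != NULL && out != NULL);
	tag = strstr(line, "iter21_setup_ref:");
	if (!tag)
		return false;
	return sscanf(tag,
		      "iter21_setup_ref: ctrl_id=0x%x p_req_valid=%d have_new=%d",
		      &out->ctrl_id, &out->p_req_valid, &out->have_new) == 3;
}
```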

Outcome interpretation

| obj at setup entry | p_req_valid (libva run) | Diagnosis |
|---|---|---|
| NULL | n/a | req has no v4l2_ctrl_handler bound at queue time. libva's S_EXT_CTRLS never staged. Bug in libva's request lifecycle. |
| non-NULL | all false | obj found, but the staging path never set p_req_valid. Bug in try_set_ext_ctrls_common for libva's invocation. |
| non-NULL | true for SPS | Staging worked, but req_to_new / try_or_set_cluster failed silently. Bug in the apply path; needs another printk after req_to_new. |
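The table reduces to a small decision function. The names in this sketch (enum iter21_diag, classify_iter21()) are hypothetical. Its inputs are what the two printk lines report for one libva frame. Anything outside the three rows, for example some refs valid but not SPS, comes back as unclassified rather than being forced into a row.

```c
#include <assert.h>
#include <stdbool.h>

/* One value per table row, plus a catch-all. */
enum iter21_diag {
	DIAG_NO_REQ_OBJ,   /* obj NULL: libva's S_EXT_CTRLS never staged */
	DIAG_NOT_STAGED,   /* obj found, p_req_valid false for every ref */
	DIAG_APPLY_PATH,   /* p_req_valid true for SPS: apply path failed */
	DIAG_UNCLASSIFIED, /* observation fits none of the rows */
};

/*
 * obj_found: iter21_setup printed a non-NULL obj.
 * n_valid:   number of iter21_setup_ref lines with p_req_valid=1.
 * sps_valid: the SPS control's line had p_req_valid=1.
 */
static enum iter21_diag classify_iter21(bool obj_found, unsigned int n_valid,
					bool sps_valid)
{
	assert(!sps_valid || n_valid > 0); /* SPS valid implies n_valid >= 1 */
	if (!obj_found)
		return DIAG_NO_REQ_OBJ;
	if (sps_valid)
		return DIAG_APPLY_PATH;
	if (n_valid == 0)
		return DIAG_NOT_STAGED;
	return DIAG_UNCLASSIFIED;
}
```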

iter21 finishes when one of these rows is confirmed for libva. Compare against the kdirect baseline, which should always show p_req_valid=true for SPS.

Substrate state at iter21 open

  • Kernel linux-fresnel-fourier 7.0-5 building on boltzmann (PID 1584834, log /tmp/iter21-kbuild.log).
  • Backend SHA c1d4bb53… (iter15 stable), unchanged since iter15.
  • Fork tip e109306 (α-24 reverted).
  • 5-codec anchors unchanged; zero regressions.

Phase 5 review note

Diagnostic-only kernel patch: two pr_info calls in a core V4L2 framework function, with no behavior change. Phase 5 review is skipped per the iter17 precedent for diagnostic-only kernel work.

Phase 7 plan

After 7.0-5 deploys:

  1. Reboot fresnel; sddm autologin reseats mfritsche.
  2. sudo dmesg -C.
  3. Run libva HEVC; capture rkvdec_iter20 + iter21_setup lines.
  4. sudo dmesg -C.
  5. Run kdirect HEVC; capture same.
  6. Diff the two captures; localize the bug to one of the three diagnoses in the table above.
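Step 6 can be sketched as a per-control comparison. The sketch assumes each capture has already been reduced to (ctrl_id, p_req_valid) pairs; struct ref_obs and diff_runs() are hypothetical. A control counts as staged in a run if any of its lines, across all frames, shows p_req_valid=1. The function reports each control that kdirect staged but libva did not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One iter21_setup_ref observation: control id and its p_req_valid bit. */
struct ref_obs {
	unsigned int ctrl_id;
	bool p_req_valid;
};

/* True if any of the first n observations of ctrl_id had p_req_valid set. */
static bool staged_in_run(const struct ref_obs *run, size_t n,
			  unsigned int ctrl_id)
{
	for (size_t i = 0; i < n; i++)
		if (run[i].ctrl_id == ctrl_id && run[i].p_req_valid)
			return true;
	return false;
}

/*
 * Print (to out, if non-NULL) each ctrl_id staged by kdirect but never by
 * libva, once per ctrl_id. Returns the number of such controls.
 */
static size_t diff_runs(const struct ref_obs *libva, size_t nl,
			const struct ref_obs *kdirect, size_t nk, FILE *out)
{
	size_t missing = 0;

	assert((libva != NULL || nl == 0) && (kdirect != NULL || nk == 0));
	for (size_t i = 0; i < nk; i++) {
		if (!kdirect[i].p_req_valid)
			continue;
		/* Only handle the first staged occurrence of each ctrl_id. */
		if (staged_in_run(kdirect, i, kdirect[i].ctrl_id))
			continue;
		if (staged_in_run(libva, nl, kdirect[i].ctrl_id))
			continue;
		if (out)
			fprintf(out, "ctrl_id=0x%x: staged by kdirect, not by libva\n",
				kdirect[i].ctrl_id);
		missing++;
	}
	return missing;
}
```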