ampere-kernel-decoders/phase0_findings_iter3.md
marfrit dfebd8017f iter3 phase0: HEVC kernel-side investigation substrate
Entry condition: iter2 F1 closed with deterministic x1=0x51a0
evidence + 'our new controls don't reach the kernel' strace.

Substrate:
- kernel source ampere:~/src/linux-rockchip @ ampere-minimal-devices
  (same tree as boltzmann's linux-rk3588-marfrit branch)
- module-only rebuild path: rockchip_vdec.ko, ~30s on boltzmann
  16-core, deploy via scp + rmmod/insmod cycle (no reboot needed)

5 open questions for Phase 1:
  Q1 decode 0x51a0 (candidate: 261*80=sizeof × count?)
  Q2 where does ctrl->p_cur.p = 0x51a0 happen? (printk every
     assignment)
  Q3 is ctx->has_sps_st_rps true even w/o backend S_EXT_CTRLS?
  Q4 (CHEAPEST) why don't our new CIDs reach the kernel — log
     h265_populate_ext_sps_rps_cache return path. NO KERNEL REBUILD.
     Q4 first; informs all others.
  Q5 RK3588 routes through vdpu381-hevc.c or vdpu383-hevc.c?

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 09:48:57 +00:00


Phase 0 — iter3 substrate (HEVC kernel-side investigation)

Opened 2026-05-16 evening, immediately following iter2's F1 close. Entry conditions are already concrete; this Phase 0 is brief.

Research question

What kernel-side state causes run->ext_sps_st_rps to deterministically equal 0x51a0 in rkvdec_hevc_prepare_hw_st_rps on ampere, and what's the minimal kernel patch that makes the kernel's HEVC RPS preparation safe against the userspace inputs ampere's libva backend actually supplies?

Locked-in evidence carried from iter2

| Observation | Source | Status |
|---|---|---|
| OOPS at __pi_memcmp+0x10/0x110 called from rkvdec_hevc_prepare_hw_st_rps+0x38/0x300 | ampere dmesg, 3 captures | reproducible 100 % |
| Faulting argument: x1 = 0x51a0 (run->ext_sps_st_rps), pgd=0 (no page-table mapping) | dmesg register dump | deterministic across reboots |
| x0 = ffff000… (valid kernel heap, the cache arg), x2 = 0x48 (72 bytes) | same | normal-looking |
| Backend's S_EXT_CTRLS for 0xa40a98 (HEVC_EXT_SPS_ST_RPS) + 0xa40a99 (_LT_RPS) never appear in ioctl trace | iter2 strace (/tmp/iter2_after.strace.* on ampere) | confirmed |
| Backend's standard 5-control submission returns EINVAL with error_idx=5 | same strace | kernel rejects whole batch |
| Kernel ctx->has_sps_st_rps only goes true via \|= !!(ctrl->has_changed) in rkvdec.c::rkvdec_s_ctrl | source-read | gate path identified |
| Kernel control descriptor for EXT_SPS_*_RPS declares .cfg.dims = { 65 } (dimensional-array, not plain compound) | rkvdec.c::vdpu38x_hevc_ctrl_descs[] | dynamic-array protocol semantics |
| Backend infrastructure landed: vendored GStreamer 1.28.2 parser, UAPI shim, per-fd probe, h265 set_controls gate. Build clean, install clean. | iter2 commits f91c3f5..1a2c958 | reusable |
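
The "kernel rejects whole batch" reading of the EINVAL row follows from how V4L2 reports error_idx for VIDIOC_S_EXT_CTRLS: a value equal to count means the failure came from the validation step and no control was applied, while a smaller value names the control that failed. A minimal Python sketch of that triage rule follows; the function name and return strings are illustrative only and are not part of the backend or the kernel.

```python
import errno


def classify_ext_ctrls_failure(err: int, error_idx: int, count: int) -> str:
    """Interpret the outcome of a VIDIOC_S_EXT_CTRLS call.

    err        -- errno reported for the ioctl (0 on success)
    error_idx  -- v4l2_ext_controls.error_idx after the call
    count      -- v4l2_ext_controls.count that was submitted
    """
    if err == 0:
        return "ok"
    if error_idx == count:
        # Validation failed before anything was written:
        # no control in the batch took effect.
        return "batch rejected in validation"
    if 0 <= error_idx < count:
        # The error is tied to one specific control in the array.
        return f"control #{error_idx} failed"
    return "unexpected error_idx"


# The iter2 strace row: 5 controls submitted, EINVAL, error_idx=5.
print(classify_ext_ctrls_failure(errno.EINVAL, error_idx=5, count=5))
```

For the iter2 capture this prints "batch rejected in validation", which is why error_idx=5 alone does not say which of the five controls the kernel objected to.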

Substrate

  • Kernel source: ampere:~/src/linux-rockchip branch ampere-minimal-devices, tip 7c241f2e2835. Identical mirror also at boltzmann:~/src/linux-rockchip @ linux-rk3588-marfrit (per ampere iter1 phase0).
  • Target build artefact: drivers/media/platform/rockchip/rkvdec/rockchip_vdec.ko only — module-incremental rebuild, NOT full kernel. ~30 s on boltzmann's 16-core after first full pass.
  • Module-deploy path: scp built .ko to ampere, sudo rmmod rockchip_vdec; sudo insmod /tmp/rockchip_vdec.ko. Avoids reboot (cheap iteration).
  • Build invocation: kernel-agent dispatch OR hand-build via make M=drivers/media/platform/rockchip/rkvdec modules against pre-configured tree.
  • dmesg capture path: sudo dmesg --time-format=ctime | grep rkvdec post-test.

Open questions tabled for Phase 1

  1. What concretely is 0x51a0? Three observations so far, plus one follow-up:
    • 0x51a0 = 20896 = 261 × 80 + 16 (where 80 is sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps) per our header). 261 × 80 is 20880 = 0x5190, so the value is close to, but not exactly, a whole number of structs.
    • 0x51a0 mod 8 = 0, mod 16 = 0: aligned, which argues against a "random heap fragment" but does not strictly rule it out.
    • 0x51a0 ÷ 4 = 0x1468 (5224). Doesn't map to anything obvious yet.
    • Follow-up: look for any kernel literal 0x51a0 or struct field that would be at that offset in v4l2_ctrl or rkvdec_ctx.
  2. Where does the ctrl->p_cur.p = 0x51a0 assignment happen? Trace via printk at three points: (a) every rkvdec_s_ctrl call (does our backend's S_EXT_CTRLS hit this?); (b) every v4l2_ctrl_handler_init + v4l2_ctrl_new_custom for the EXT_SPS_*_RPS controls (during driver probe); (c) every assignment to ctrl->p_cur.p for these controls.
  3. Is ctx->has_sps_st_rps ever observably true on a backend that doesn't set these controls? Phase 1 hypothesis if yes: there's a synthetic has_changed=true set during ctrl_handler init for dimensional-array controls. If no, then we're hitting a different code path entirely (maybe an alternate prepare_hw_st_rps call site we haven't found).
  4. Why does our backend's S_EXT_CTRLS for the new CIDs not appear in strace? Cheap to diagnose: add request_log inside h265_populate_ext_sps_rps_cache to print return code + source_data SPS-NAL-found status. Doesn't require kernel rebuild. Do this FIRST in Phase 6 — answers a question that's orthogonal to the kernel-side instrumentation but informs the eventual fix path.
  5. What other rkvdec drivers exist in this kernel source that could be the actual run-target? ampere has rkvdec-vdpu381-hevc.c AND rkvdec-vdpu383-hevc.c; both call rkvdec_hevc_assemble_hw_rps. Which one fires on RK3588, and which of the two does the CoolPi GenBook use? Phase 2 source-read.
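
The Q1 arithmetic can be checked mechanically. The sketch below assumes the 80-byte struct size quoted from our header (not verified against the kernel tree here) and uses a hypothetical helper name. It reports the quotient and remainder against that size, the largest power of two dividing the value, and its prime factorisation.

```python
def describe_value(value: int, elem_size: int) -> dict:
    """Summarise how `value` relates to an assumed element size."""
    quotient, remainder = divmod(value, elem_size)

    # Largest power of two that divides value (its natural alignment).
    alignment = value & -value

    # Prime factorisation by trial division; fine for small values.
    factors, n, p = [], value, 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)

    return {
        "decimal": value,
        "quotient": quotient,
        "remainder": remainder,
        "alignment": alignment,
        "factors": factors,
    }


# 80 is the sizeof quoted in Q1; treat it as an assumption.
info = describe_value(0x51A0, 80)
print(info)
```

For 0x51a0 this yields a quotient of 261 with remainder 16, a 32-byte alignment, and factors 2^5 × 653, so the value is not a whole multiple of 80.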

Phase 0 close

Substrate locked. iter2's evidence is the binding-cell starting condition. Five open questions for Phase 1 to lock. Q4 (cheap backend log) gates the other questions and goes first in Phase 6.