iter3 phase0: HEVC kernel-side investigation substrate

Entry condition: iter2 F1 closed with deterministic x1=0x51a0
evidence + 'our new controls don't reach the kernel' strace.

Substrate:
- kernel source ampere:~/src/linux-rockchip @ ampere-minimal-devices
  (same tree as boltzmann's linux-rk3588-marfrit branch)
- module-only rebuild path: rockchip_vdec.ko, ~30s on boltzmann
  16-core, deploy via scp + rmmod/insmod cycle (no reboot needed)

5 open questions for Phase 1:
  Q1 decode 0x51a0 (= 20896; candidate sizeof × count: 261*80 = 20880,
     which is 16 short?)
  Q2 where does ctrl->p_cur.p = 0x51a0 happen? (printk every
     assignment)
  Q3 is ctx->has_sps_st_rps true even w/o backend S_EXT_CTRLS?
  Q4 (CHEAPEST) why don't our new CIDs reach the kernel — log
     h265_populate_ext_sps_rps_cache return path. NO KERNEL REBUILD.
     Q4 first; informs all others.
  Q5 RK3588 routes through vdpu381-hevc.c or vdpu383-hevc.c?

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 09:48:57 +00:00
parent 17aa443f8f
commit dfebd8017f
+44
@@ -0,0 +1,44 @@
# Phase 0 — iter3 substrate (HEVC kernel-side investigation)
Opened 2026-05-16 evening, immediately following iter2's F1 close. Entry conditions are already concrete; this Phase 0 is brief.
## Research question
**What kernel-side state causes `run->ext_sps_st_rps` to deterministically equal `0x51a0` in `rkvdec_hevc_prepare_hw_st_rps` on ampere, and what's the minimal kernel patch that makes the kernel's HEVC RPS preparation safe against the userspace inputs ampere's libva backend actually supplies?**
## Locked-in evidence carried from iter2
| Observation | Source | Status |
|-------------|--------|--------|
| OOPS at `__pi_memcmp+0x10/0x110` called from `rkvdec_hevc_prepare_hw_st_rps+0x38/0x300` | ampere dmesg, 3 captures | reproducible 100 % |
| Faulting argument: `x1 = 0x51a0` (`run->ext_sps_st_rps`), `pgd=0` (no page-table mapping) | dmesg register dump | deterministic across reboots |
| `x0 = ffff000…` (valid kernel heap, the `cache` arg), `x2 = 0x48` (72 bytes) | same | normal-looking |
| Backend's S_EXT_CTRLS for `0xa40a98` (HEVC_EXT_SPS_ST_RPS) + `0xa40a99` (_LT_RPS) never appear in ioctl trace | iter2 strace (`/tmp/iter2_after.strace.*` on ampere) | confirmed |
| Backend's standard 5-control submission returns `EINVAL` with `error_idx=5` | same strace | kernel rejects whole batch |
| Kernel `ctx->has_sps_st_rps` only goes true via `\|= !!(ctrl->has_changed)` in `rkvdec.c::rkvdec_s_ctrl` | source-read | gate path identified |
| Kernel control descriptor for `EXT_SPS_*_RPS` declares `.cfg.dims = { 65 }` (dimensional-array, not plain compound) | `rkvdec.c::vdpu38x_hevc_ctrl_descs[]` | dynamic-array protocol semantics |
| Backend infrastructure landed: vendored GStreamer 1.28.2 parser, UAPI shim, per-fd probe, h265 set_controls gate. Build clean, install clean. | iter2 commits `f91c3f5..1a2c958` | reusable |
## Substrate
- Kernel source: `ampere:~/src/linux-rockchip` branch `ampere-minimal-devices`, tip `7c241f2e2835`. Identical mirror also at `boltzmann:~/src/linux-rockchip @ linux-rk3588-marfrit` (per ampere iter1 phase0).
- Target build artefact: `drivers/media/platform/rockchip/rkvdec/rockchip_vdec.ko` only — module-incremental rebuild, NOT full kernel. ~30 s on boltzmann's 16-core after first full pass.
- Module-deploy path: scp built `.ko` to ampere, `sudo rmmod rockchip_vdec; sudo insmod /tmp/rockchip_vdec.ko`. Avoids reboot (cheap iteration).
- Build invocation: kernel-agent dispatch OR hand-build via `make M=drivers/media/platform/rockchip/rkvdec modules` against pre-configured tree.
- dmesg capture path: `sudo dmesg --time-format=ctime | grep rkvdec` post-test.
## Open questions tabled for Phase 1
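1. **What concretely is `0x51a0`?** Candidate decompositions and checks:
- `0x51a0` = 20896. The nearest `sizeof × count` fit is `261 × 80` = 20880 = `0x5190` (where 80 is `sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps)` per our header), which leaves 16 bytes unaccounted for, so this reading only works with an extra 16-byte offset.
- `0x51a0` mod 8 = 0, mod 16 = 0: the value is aligned, which argues against a "random heap fragment" but does not rule one out on its own.
- `0x51a0` ÷ 4 = 0x1468 (5224). Doesn't map to anything obvious yet.
- Look for any kernel literal `0x51a0` or struct field that would be at that offset in `v4l2_ctrl` or `rkvdec_ctx`.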
2. **Where does the `ctrl->p_cur.p = 0x51a0` assignment happen?** Trace via printk: every `rkvdec_s_ctrl` call (does our backend's S_EXT_CTRLS hit this?), every `v4l2_ctrl_handler_init` + `v4l2_ctrl_new_custom` for the EXT_SPS_*_RPS controls (during driver probe), every assignment to `ctrl->p_cur.p` for these controls.
3. **Is `ctx->has_sps_st_rps` ever observably true on a backend that doesn't set these controls?** Phase 1 hypothesis if yes: there's a synthetic `has_changed=true` set during ctrl_handler init for dimensional-array controls. If no, then we're hitting a different code path entirely (maybe an alternate `prepare_hw_st_rps` call site we haven't found).
4. **Why does our backend's S_EXT_CTRLS for the new CIDs not appear in strace?** Cheap to diagnose: add `request_log` inside `h265_populate_ext_sps_rps_cache` to print return code + source_data SPS-NAL-found status. Doesn't require kernel rebuild. **Do this FIRST in Phase 6** — answers a question that's orthogonal to the kernel-side instrumentation but informs the eventual fix path.
5. **What other rkvdec drivers exist in this kernel source that could be the actual run-target?** ampere has `rkvdec-vdpu381-hevc.c` AND `rkvdec-vdpu383-hevc.c`, and both call `rkvdec_hevc_assemble_hw_rps`. Which one fires on RK3588, i.e. which one does the CoolPi GenBook use? Phase 2 source-read.
## Phase 0 close
Substrate locked. iter2's evidence is the binding-cell starting condition. Five open questions for Phase 1 to lock; Q4 (cheap backend log) gates the other questions and goes first in Phase 6.