ampere-kernel-decoders/phase0_findings_iter3.md
marfrit dfebd8017f iter3 phase0: HEVC kernel-side investigation substrate
Entry condition: iter2 F1 closed with deterministic x1=0x51a0
evidence + 'our new controls don't reach the kernel' strace.

Substrate:
- kernel source ampere:~/src/linux-rockchip @ ampere-minimal-devices
  (same tree as boltzmann's linux-rk3588-marfrit branch)
- module-only rebuild path: rockchip_vdec.ko, ~30s on boltzmann
  16-core, deploy via scp + rmmod/insmod cycle (no reboot needed)

5 open questions for Phase 1:
  Q1 decode 0x51a0 (candidate: 261*80=sizeof × count?)
  Q2 where does ctrl->p_cur.p = 0x51a0 happen? (printk every
     assignment)
  Q3 is ctx->has_sps_st_rps true even w/o backend S_EXT_CTRLS?
  Q4 (CHEAPEST) why don't our new CIDs reach the kernel — log
     h265_populate_ext_sps_rps_cache return path. NO KERNEL REBUILD.
     Q4 first; informs all others.
  Q5 RK3588 routes through vdpu381-hevc.c or vdpu383-hevc.c?

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 09:48:57 +00:00

# Phase 0 — iter3 substrate (HEVC kernel-side investigation)
Opened 2026-05-16 evening, immediately following iter2's F1 close. Entry conditions are already concrete; this Phase 0 is brief.
## Research question
**What kernel-side state causes `run->ext_sps_st_rps` to deterministically equal `0x51a0` in `rkvdec_hevc_prepare_hw_st_rps` on ampere, and what's the minimal kernel patch that makes the kernel's HEVC RPS preparation safe against the userspace inputs ampere's libva backend actually supplies?**
## Locked-in evidence carried from iter2
| Observation | Source | Status |
|-------------|--------|--------|
| OOPS at `__pi_memcmp+0x10/0x110` called from `rkvdec_hevc_prepare_hw_st_rps+0x38/0x300` | ampere dmesg, 3 captures | reproducible 100 % |
| Faulting argument: `x1 = 0x51a0` (`run->ext_sps_st_rps`), `pgd=0` (no page-table mapping) | dmesg register dump | deterministic across reboots |
| `x0 = ffff000…` (valid kernel heap, the `cache` arg), `x2 = 0x48` (72 bytes) | same | normal-looking |
| Backend's S_EXT_CTRLS for `0xa40a98` (HEVC_EXT_SPS_ST_RPS) + `0xa40a99` (_LT_RPS) never appear in ioctl trace | iter2 strace (`/tmp/iter2_after.strace.*` on ampere) | confirmed |
| Backend's standard 5-control submission returns `EINVAL` with `error_idx=5` | same strace | kernel rejects whole batch |
| Kernel `ctx->has_sps_st_rps` only goes true via `\|= !!(ctrl->has_changed)` in `rkvdec.c::rkvdec_s_ctrl` | source-read | gate path identified |
| Kernel control descriptor for `EXT_SPS_*_RPS` declares `.cfg.dims = { 65 }` (dimensional-array, not plain compound) | `rkvdec.c::vdpu38x_hevc_ctrl_descs[]` | dynamic-array protocol semantics |
| Backend infrastructure landed: vendored GStreamer 1.28.2 parser, UAPI shim, per-fd probe, h265 set_controls gate. Build clean, install clean. | iter2 commits `f91c3f5..1a2c958` | reusable |
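The register rows above can be sanity-checked mechanically. The following is a minimal sketch, assuming an arm64 kernel with 48-bit virtual addresses (kernel pointers have the top 16 bits set); `classify_pointer` and its 4 GiB "low" threshold are hypothetical helpers for illustration, not anything taken from the kernel or the backend.

```python
# Assumption: arm64, 48-bit VA. Kernel pointers have bits 63..48 all set,
# user pointers have them all clear. Other VA sizes would need other masks.
KERNEL_TOP_BITS = 0xFFFF
LOW_LIMIT = 1 << 32  # hypothetical cut-off for "looks like an offset or size"


def classify_pointer(value: int) -> str:
    """Roughly classify a 64-bit register value from an oops dump."""
    if not 0 <= value < (1 << 64):
        raise ValueError("not a 64-bit register value")
    top = value >> 48
    if top == KERNEL_TOP_BITS:
        return "kernel"
    if top == 0:
        # User-range value; small ones read more like an offset or a size
        # than like a pointer the kernel could dereference.
        return "low" if value < LOW_LIMIT else "user"
    return "non-canonical"


if __name__ == "__main__":
    # x1 and x2 are the values from the table; x0 is truncated in the dump,
    # so it is deliberately not reconstructed here.
    for name, value in (("x1", 0x51A0), ("x2", 0x48)):
        print(name, hex(value), classify_pointer(value))
```

Under these assumptions `x1 = 0x51a0` lands in the "low" bucket, which fits the `pgd=0` observation: it is not a kernel heap pointer at all, so `memcmp` faults on its first load.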
## Substrate
- Kernel source: `ampere:~/src/linux-rockchip` branch `ampere-minimal-devices`, tip `7c241f2e2835`. Identical mirror also at `boltzmann:~/src/linux-rockchip @ linux-rk3588-marfrit` (per ampere iter1 phase0).
- Target build artefact: `drivers/media/platform/rockchip/rkvdec/rockchip_vdec.ko` only — module-incremental rebuild, NOT full kernel. ~30 s on boltzmann's 16-core after first full pass.
- Module-deploy path: scp built `.ko` to ampere, `sudo rmmod rockchip_vdec; sudo insmod /tmp/rockchip_vdec.ko`. Avoids reboot (cheap iteration).
- Build invocation: kernel-agent dispatch OR hand-build via `make M=drivers/media/platform/rockchip/rkvdec modules` against pre-configured tree.
- dmesg capture path: `sudo dmesg --time-format=ctime | grep rkvdec` post-test.
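The build, deploy and capture steps above can be strung together as a dry-run script. This is a sketch only: it assumes it is run from the root of the kernel tree on the build host and that `ampere` is reachable as an ssh host alias. The function names are hypothetical; the commands themselves are the ones listed in the substrate.

```python
import shlex

MODULE = "rockchip_vdec"
MODULE_DIR = "drivers/media/platform/rockchip/rkvdec"


def build_command() -> list:
    """Module-only rebuild against the pre-configured tree."""
    return ["make", f"M={MODULE_DIR}", "modules"]


def deploy_commands(host: str = "ampere") -> list:
    """Copy the .ko to the target and swap the module without a reboot."""
    remote_ko = f"/tmp/{MODULE}.ko"
    return [
        ["scp", f"{MODULE_DIR}/{MODULE}.ko", f"{host}:{remote_ko}"],
        ["ssh", host, f"sudo rmmod {MODULE}; sudo insmod {remote_ko}"],
    ]


def dmesg_command(host: str = "ampere") -> list:
    """Post-test capture of rkvdec lines from the kernel log."""
    return ["ssh", host, "sudo dmesg --time-format=ctime | grep rkvdec"]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in [build_command(), *deploy_commands(), dmesg_command()]:
        print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run anywhere; feeding each list to `subprocess.run(argv, check=True)` would be the obvious next step once the host alias is confirmed.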
## Open questions tabled for Phase 1
1. **What concretely is `0x51a0`?** Candidate decompositions, plus one follow-up check:
   - `0x51a0` = 20896 = `261 × 80 + 16` (where 80 is `sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps)` per our header). It is close to, but not an exact multiple of, the struct size: `261 × 80` = 20880 = `0x5190`.
   - `0x51a0` mod 8 = 0, mod 16 = 0: the value is aligned, which argues against (but does not strictly rule out) a "random heap fragment".
   - `0x51a0` ÷ 4 = 0x1468 (5224). Doesn't map to anything obvious yet.
   - Follow-up: look for any kernel literal `0x51a0` or struct field that would be at that offset in `v4l2_ctrl` or `rkvdec_ctx`.
2. **Where does the `ctrl->p_cur.p = 0x51a0` assignment happen?** Trace via printk: every `rkvdec_s_ctrl` call (does our backend's S_EXT_CTRLS hit this?), every `v4l2_ctrl_handler_init` + `v4l2_ctrl_new_custom` for the EXT_SPS_*_RPS controls (during driver probe), every assignment to `ctrl->p_cur.p` for these controls.
3. **Is `ctx->has_sps_st_rps` ever observably true on a backend that doesn't set these controls?** Phase 1 hypothesis if yes: there's a synthetic `has_changed=true` set during ctrl_handler init for dimensional-array controls. If no, then we're hitting a different code path entirely (maybe an alternate `prepare_hw_st_rps` call site we haven't found).
4. **Why does our backend's S_EXT_CTRLS for the new CIDs not appear in strace?** Cheap to diagnose: add `request_log` inside `h265_populate_ext_sps_rps_cache` to print return code + source_data SPS-NAL-found status. Doesn't require kernel rebuild. **Do this FIRST in Phase 6** — answers a question that's orthogonal to the kernel-side instrumentation but informs the eventual fix path.
5. **Which rkvdec HEVC backend in this kernel source is the actual run-target?** The tree on ampere has both `rkvdec-vdpu381-hevc.c` and `rkvdec-vdpu383-hevc.c`, and both call `rkvdec_hevc_assemble_hw_rps`. Which one fires on RK3588, and which variant is the CoolPi GenBook? Phase 2 source-read.
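The arithmetic behind question 1 is easy to get wrong by hand, so here is a short check. It takes the struct size of 80 from the note above without re-verifying it against the header, and also tries the `memcmp` length `0x48` from the register dump as a second unit; `decompose` and `alignment` are illustrative helper names.

```python
VALUE = 0x51A0
STRUCT_SIZE = 80   # sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps), per the note
MEMCMP_LEN = 0x48  # x2 from the register dump (72 bytes)


def decompose(value: int, unit: int) -> tuple:
    """Return (count, remainder) such that value == count * unit + remainder."""
    return divmod(value, unit)


def alignment(value: int) -> int:
    """Largest power of two that divides value (lowest set bit)."""
    return value & -value


if __name__ == "__main__":
    print(VALUE)                          # 20896
    print(decompose(VALUE, STRUCT_SIZE))  # (261, 16)
    print(decompose(VALUE, MEMCMP_LEN))   # (290, 16)
    print(hex(261 * STRUCT_SIZE))         # 0x5190
    print(alignment(VALUE))               # 32
    print(hex(VALUE // 4))                # 0x1468
```

The remainder of 16 in both cases means `0x51a0` is not a clean `sizeof × count` product for either unit; whether the 16-byte offset is meaningful is left to Phase 1.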
## Phase 0 close
Substrate locked. iter2's evidence is the binding-cell starting condition. Phase 1 has five open questions to lock; Q4 (cheap backend log) gates the other questions and goes first in Phase 6.