From dfebd8017f6f00aa00a6a5b5b52287cba67a8016 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Sat, 16 May 2026 09:48:57 +0000
Subject: [PATCH] iter3 phase0: HEVC kernel-side investigation substrate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Entry condition: iter2 F1 closed with deterministic x1=0x51a0 evidence +
'our new controls don't reach the kernel' strace.

Substrate:
- kernel source ampere:~/src/linux-rockchip @ ampere-minimal-devices
  (same tree as boltzmann's linux-rk3588-marfrit branch)
- module-only rebuild path: rockchip_vdec.ko, ~30s on boltzmann 16-core,
  deploy via scp + rmmod/insmod cycle (no reboot needed)

5 open questions for Phase 1:
Q1 decode 0x51a0 (candidate: sizeof × count? 261*80 = 0x5190,
   16 bytes short of 0x51a0)
Q2 where does ctrl->p_cur.p = 0x51a0 happen? (printk every assignment)
Q3 is ctx->has_sps_st_rps true even w/o backend S_EXT_CTRLS?
Q4 (CHEAPEST) why don't our new CIDs reach the kernel: log
   h265_populate_ext_sps_rps_cache return path. NO KERNEL REBUILD.
   Q4 first; informs all others.
Q5 RK3588 routes through vdpu381-hevc.c or vdpu383-hevc.c?

Co-Authored-By: Claude Opus 4.7
---
 phase0_findings_iter3.md | 44 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 phase0_findings_iter3.md

diff --git a/phase0_findings_iter3.md b/phase0_findings_iter3.md
new file mode 100644
index 0000000..af752b9
--- /dev/null
+++ b/phase0_findings_iter3.md
@@ -0,0 +1,44 @@
+# Phase 0: iter3 substrate (HEVC kernel-side investigation)
+
+Opened 2026-05-16 evening, immediately following iter2's F1 close. Entry conditions are already concrete; this Phase 0 is brief.
+
+## Research question
+
+**What kernel-side state causes `run->ext_sps_st_rps` to deterministically equal `0x51a0` in `rkvdec_hevc_prepare_hw_st_rps` on ampere, and what's the minimal kernel patch that makes the kernel's HEVC RPS preparation safe against the userspace inputs ampere's libva backend actually supplies?**
+
+## Locked-in evidence carried from iter2
+
+| Observation | Source | Status |
+|-------------|--------|--------|
+| OOPS at `__pi_memcmp+0x10/0x110` called from `rkvdec_hevc_prepare_hw_st_rps+0x38/0x300` | ampere dmesg, 3 captures | reproducible 100 % |
+| Faulting argument: `x1 = 0x51a0` (`run->ext_sps_st_rps`), `pgd=0` (no page-table mapping) | dmesg register dump | deterministic across reboots |
+| `x0 = ffff000…` (valid kernel heap, the `cache` arg), `x2 = 0x48` (72 bytes) | same | normal-looking |
+| Backend's S_EXT_CTRLS for `0xa40a98` (HEVC_EXT_SPS_ST_RPS) + `0xa40a99` (_LT_RPS) never appear in ioctl trace | iter2 strace (`/tmp/iter2_after.strace.*` on ampere) | confirmed |
+| Backend's standard 5-control submission returns `EINVAL` with `error_idx=5` | same strace | kernel rejects whole batch |
+| Kernel `ctx->has_sps_st_rps` only goes true via `\|= !!(ctrl->has_changed)` in `rkvdec.c::rkvdec_s_ctrl` | source-read | gate path identified |
+| Kernel control descriptor for `EXT_SPS_*_RPS` declares `.cfg.dims = { 65 }` (dimensional-array, not plain compound) | `rkvdec.c::vdpu38x_hevc_ctrl_descs[]` | dynamic-array protocol semantics |
+| Backend infrastructure landed: vendored GStreamer 1.28.2 parser, UAPI shim, per-fd probe, h265 set_controls gate. Build clean, install clean. | iter2 commits `f91c3f5..1a2c958` | reusable |
+
+## Substrate
+
+- Kernel source: `ampere:~/src/linux-rockchip` branch `ampere-minimal-devices`, tip `7c241f2e2835`. Identical mirror also at `boltzmann:~/src/linux-rockchip @ linux-rk3588-marfrit` (per ampere iter1 phase0).
+- Target build artefact: `drivers/media/platform/rockchip/rkvdec/rockchip_vdec.ko` only. This is a module-incremental rebuild, NOT a full kernel build. ~30 s on boltzmann's 16-core after the first full pass.
+- Module-deploy path: scp the built `.ko` to ampere, then `sudo rmmod rockchip_vdec; sudo insmod /tmp/rockchip_vdec.ko`. Avoids a reboot (cheap iteration).
+- Build invocation: kernel-agent dispatch OR hand-build via `make M=drivers/media/platform/rockchip/rkvdec modules` against the pre-configured tree.
+- dmesg capture path: `sudo dmesg --time-format=ctime | grep rkvdec` post-test.
+
+## Open questions tabled for Phase 1
+
+1. **What concretely is `0x51a0`?** Candidate decompositions and checks:
+   - `0x51a0` = 20896 = `261 × 80 + 16` (where 80 is `sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps)` per our header). `261 × 80` itself is 20880 = `0x5190`, so the value is not an exact `sizeof × count` multiple.
+   - `0x51a0` mod 8 = 0, mod 16 = 0. The value is aligned, which argues against a "random heap fragment".
+   - `0x51a0` ÷ 4 = 0x1468 (5224). Doesn't map to anything obvious yet.
+   - Look for any kernel literal `0x51a0` or struct field that would be at that offset in `v4l2_ctrl` or `rkvdec_ctx`.
+2. **Where does the `ctrl->p_cur.p = 0x51a0` assignment happen?** Trace via printk: every `rkvdec_s_ctrl` call (does our backend's S_EXT_CTRLS hit this?), every `v4l2_ctrl_handler_init` + `v4l2_ctrl_new_custom` for the EXT_SPS_*_RPS controls (during driver probe), every assignment to `ctrl->p_cur.p` for these controls.
+3. **Is `ctx->has_sps_st_rps` ever observably true on a backend that doesn't set these controls?** Phase 1 hypothesis if yes: there's a synthetic `has_changed=true` set during ctrl_handler init for dimensional-array controls. If no, then we're hitting a different code path entirely (maybe an alternate `prepare_hw_st_rps` call site we haven't found).
+4. **Why does our backend's S_EXT_CTRLS for the new CIDs not appear in strace?** Cheap to diagnose: add `request_log` inside `h265_populate_ext_sps_rps_cache` to print return code + source_data SPS-NAL-found status. Doesn't require a kernel rebuild. **Do this FIRST in Phase 6**: it answers a question that's orthogonal to the kernel-side instrumentation but informs the eventual fix path.
+5. **What other rkvdec drivers exist in this kernel source that could be the actual run-target?** ampere has `rkvdec-vdpu381-hevc.c` AND `rkvdec-vdpu383-hevc.c`; both call `rkvdec_hevc_assemble_hw_rps`. Which one fires on RK3588 (and which of the two applies to the CoolPi GenBook)? Phase 2 source-read.
+
+## Phase 0 close
+
+Substrate locked. iter2's evidence is the binding-cell starting condition. Five open questions for Phase 1 to lock; Q4 (cheap backend log) gates the other questions and goes first in Phase 6.
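
The Q1 arithmetic on `0x51a0` can be checked mechanically. The sketch below is illustrative only: it takes the 80-byte struct size quoted in Q1 as given (it is not verified against the kernel header here), reuses the `x2 = 0x48` memcmp length from the register dump, and `decompose` is a hypothetical helper name.

```python
# Sanity-check the Q1 candidate decompositions of the faulting pointer.
FAULT_PTR = 0x51A0    # x1 from the dmesg register dump
STRUCT_SIZE = 80      # assumed sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps), per Q1
MEMCMP_LEN = 0x48     # x2 from the same register dump (72 bytes)


def decompose(value: int, unit: int) -> tuple[int, int]:
    """Return (count, remainder) such that value == count * unit + remainder."""
    return divmod(value, unit)


if __name__ == "__main__":
    print(f"{FAULT_PTR:#x} = {FAULT_PTR}")
    for label, unit in (("struct size", STRUCT_SIZE), ("memcmp length", MEMCMP_LEN)):
        count, rem = decompose(FAULT_PTR, unit)
        print(f"{label} {unit}: {count} * {unit} + {rem}")
    print(f"mod 8 = {FAULT_PTR % 8}, mod 16 = {FAULT_PTR % 16}")
    print(f"/ 4 = {FAULT_PTR // 4:#x}")
```

Under these assumptions the remainder is 16 for both units (`261 × 80` and `290 × 72` both equal 20880), so `0x51a0` is not an exact multiple of either size.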