Files
claude-noether a0d79b79ee Phase 2.1 close: all 5 register-packing TODOs implemented, clean build
Commit 63c4db0095ca on boltzmann vp9-enablement-iter1 branch adds the
full register-packing path: RCB addresses, block clock-gating defaults,
frame-area timeout threshold (with rkvdec_schedule_watchdog), 8-segment
packing helper, ref/mode-deltas bit-packing into 28/14-bit combined
fields, and first-cut probability storage aliasing.

Branch is now at 4 commits, 1390 LoC across 4 files. Module compiles
clean against 7.0.0-rc3 ARM64 kernel.

What remains potentially-needed for first-light is in reg103_frame_flags
(prob_update_en, ref/mode/single/comp refresh enables, etc.) — currently
zero-init; will tune in Phase 6 if byte-compare diverges from SW.

Phase 3 (hardware install + first decode) is the natural inflection
point. Module artifact is at boltzmann:~/src/linux-rockchip/drivers/media
/platform/rockchip/rkvdec/rockchip-vdec.ko, ready to install on ampere
after backing up the current sibling-campaign module.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 23:28:27 +00:00

64 lines
4.5 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 2.1 close — 5 register-packing TODOs implemented
Date: 2026-05-17 ~03:10. Phase 2.1 closed in the same session as Phase 2 — module builds clean with the full register-packing path.
## Commit on `boltzmann:~/src/linux-rockchip:vp9-enablement-iter1`
`63c4db0095ca` — media: rkvdec: VP9 (vdpu381): implement Phase 2.1 register-packing TODOs (+128, -10)
Stacks on Phase 2's 3 commits (`47431635801d`, `da8482271938`, `71cc0d96d212`). Total branch delta now 1390 LoC across 4 files.
## What changed
| TODO | Implementation |
|---|---|
| 1. RCB addresses | `for (i = 0; i < rkvdec_rcb_buf_count(ctx); i++) regs->common_addr.rcb_base[i] = rkvdec_rcb_buf_dma_addr(ctx, i);` — direct mirror of vdpu381-HEVC pattern |
| 2. Block clock-gating defaults | All `reg026_block_gating_en.*` fields set from HEVC defaults (busifd_auto_gating_e = 0 per pattern, rest = 1) |
| 3. Frame-area timeout threshold | RKVDEC_TIMEOUT_1080p/_4K/_8K/_MAX based on pixel count; `rkvdec_schedule_watchdog()` replaces the fixed 2-second `schedule_delayed_work` |
| 4. Seg-register packing | `vdpu381_config_seg_register(vp9_ctx, segid)` helper called 8× — full port of legacy `config_seg_registers` to `reg067_074_segid[]` |
| 5. Loop-filter bit-packing | `ref_deltas[0..3]` → 28-bit packed into `reg094.ref_deltas_lastframe`; `mode_deltas[0..1]` → 14-bit packed into `reg075.mode_deltas_lastframe` (using `(x & 0x7f) << (i*7)`) |
| Bonus: probability storage | `reg160_delta_prob_base`, `reg162_last_prob_base`, `reg172_update_prob_wr_base` aliased to `priv_tbl.probs` for first-cut; per-slot rotation deferred to Phase 6 if adaptive entropy diverges |
| Bonus: prob_idx | `reg028.vp9_rd_prob_idx = vp9_wr_prob_idx = 0` first-cut; BSP rotates across frames |
## Build verification
```
boltzmann$ PATH=/home/mfritsche/bin:$PATH make -j$(nproc) \
ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
drivers/media/platform/rockchip/rkvdec/
CC [M] drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-vp9.o
LD [M] drivers/media/platform/rockchip/rkvdec/rockchip-vdec.o
```
Clean — zero warnings, zero errors.
## What remains potentially-needed for first-light decode
| Field | Risk | Mitigation |
|---|---|---|
| `reg103_frame_flags.txfmmode_rfsh_en`, `ref_mode_rfsh_en`, `single_ref_rfsh_en`, `comp_ref_rfsh_en`, `inter_coef_rfsh_flag`, `last_key_frame_flag`, `prob_update_en`, `prob_save_en` | Frame may decode but adaptive entropy may diverge from SW after a few frames | Currently zero-init. Verify via byte-compare on frames 1-5 vs SW reference. If divergence, mirror BSP `vp9d_vdpu382_gen_regs` setting logic |
| Probability slot rotation | First-cut single-slot may corrupt frame_context across multiple inter frames | Watch dmesg + byte-compare frames 2+ vs SW reference |
| RCB sizing | HEVC-tuned for 8K, oversized for 4K VP9 | Wastes SRAM but doesn't fail; tune in Phase 6 |
## Phase 3 entry point (unchanged, hardware-domain)
Phase 3 needs hardware:
1. Retry `git push gitea vp9-enablement-iter1` (or push `linux-rk3588-marfrit` base first)
2. Backup ampere's current `/lib/modules/7.0.0-rc3-devices+/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko` (with off-device archive per [feedback_backup_before_module_replace](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_backup_before_module_replace.md))
3. `scp` boltzmann's `rockchip-vdec.ko` to ampere; `modprobe -r/modprobe` cycle (no reboot needed — kernel decoder module hot-swappable)
4. Sanity: `v4l2-ctl --device=/dev/video0 --list-formats-out` shows `VP9F` alongside H264_SLICE + HEVC_SLICE
5. Pre-flight IRQ diagnostic: `pr_warn("vp9 irq status=0x%x\n", status)` in `vdpu381_irq_handler` for first 5 decodes (Janet Amendment C)
6. First decode attempt: `LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi -i bbb_30s_720p.vp9.webm -vf hwdownload,format=nv12 -frames:v 5 -f rawvideo /tmp/hw-vp9.nv12`
7. Watch dmesg for IRQ status, watchdog firing, error patterns
8. If first-light works: SW byte-compare on frame 100 against `ffmpeg -c:v vp9` reference
## Tonight's campaign summary
Started: ampere-vp9-enablement campaign opened at ~01:00, sibling to ampere-kernel-decoders close.
Closed: 4 commits on branch + 7 docs in campaign repo, module compiles clean on RK3588 target kernel.
Wall-time: ~2h15m for Phase 0 (substrate) + Phase 1 (plan + 2 architect reviews) + Phase 2 (1400 LoC implementation) + Phase 2.1 (full register-packing). All architecturally-correct work; bit-level register correctness verified next session against hardware.
Next session continues at the hardware testing inflection point.