Files
claude-noether a0d79b79ee Phase 2.1 close: all 5 register-packing TODOs implemented, clean build
Commit 63c4db0095ca on boltzmann vp9-enablement-iter1 branch adds the
full register-packing path: RCB addresses, block clock-gating defaults,
frame-area timeout threshold (with rkvdec_schedule_watchdog), 8-segment
packing helper, ref/mode-deltas bit-packing into 28/14-bit combined
fields, and first-cut probability storage aliasing.

Branch is now at 4 commits, 1390 LoC across 4 files. Module compiles
clean against 7.0.0-rc3 ARM64 kernel.

What remains potentially-needed for first-light is in reg103_frame_flags
(prob_update_en, ref/mode/single/comp refresh enables, etc.) — currently
zero-init; will tune in Phase 6 if byte-compare diverges from SW.

Phase 3 (hardware install + first decode) is the natural inflection
point. Module artifact is at boltzmann:~/src/linux-rockchip/drivers/media
/platform/rockchip/rkvdec/rockchip-vdec.ko, ready to install on ampere
after backing up the current sibling-campaign module.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 23:28:27 +00:00

4.5 KiB
Raw Permalink Blame History

Phase 2.1 close — 5 register-packing TODOs implemented

Date: 2026-05-17 ~03:10. Phase 2.1 closed in the same session as Phase 2 — module builds clean with the full register-packing path.

Commit on boltzmann:~/src/linux-rockchip:vp9-enablement-iter1

63c4db0095ca — media: rkvdec: VP9 (vdpu381): implement Phase 2.1 register-packing TODOs (+128, -10)

Stacks on Phase 2's 3 commits (47431635801d, da8482271938, 71cc0d96d212). Total branch delta now 1390 LoC across 4 files.

What changed

TODO Implementation
1. RCB addresses for (i = 0; i < rkvdec_rcb_buf_count(ctx); i++) regs->common_addr.rcb_base[i] = rkvdec_rcb_buf_dma_addr(ctx, i); — direct mirror of vdpu381-HEVC pattern
2. Block clock-gating defaults All reg026_block_gating_en.* fields set from HEVC defaults (busifd_auto_gating_e = 0 per pattern, rest = 1)
3. Frame-area timeout threshold RKVDEC_TIMEOUT_1080p/_4K/_8K/_MAX based on pixel count; rkvdec_schedule_watchdog() replaces the fixed 2-second schedule_delayed_work
4. Seg-register packing vdpu381_config_seg_register(vp9_ctx, segid) helper called 8× — full port of legacy config_seg_registers to reg067_074_segid[]
5. Loop-filter bit-packing ref_deltas[0..3] → 28-bit packed into reg094.ref_deltas_lastframe; mode_deltas[0..1] → 14-bit packed into reg075.mode_deltas_lastframe (using (x & 0x7f) << (i*7))
Bonus: probability storage reg160_delta_prob_base, reg162_last_prob_base, reg172_update_prob_wr_base aliased to priv_tbl.probs for first-cut; per-slot rotation deferred to Phase 6 if adaptive entropy diverges
Bonus: prob_idx reg028.vp9_rd_prob_idx = vp9_wr_prob_idx = 0 first-cut; BSP rotates across frames

Build verification

boltzmann$ PATH=/home/mfritsche/bin:$PATH make -j$(nproc) \
    ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
    drivers/media/platform/rockchip/rkvdec/

  CC [M]  drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-vp9.o
  LD [M]  drivers/media/platform/rockchip/rkvdec/rockchip-vdec.o

Clean — zero warnings, zero errors.

What remains potentially-needed for first-light decode

Field Risk Mitigation
reg103_frame_flags.txfmmode_rfsh_en, ref_mode_rfsh_en, single_ref_rfsh_en, comp_ref_rfsh_en, inter_coef_rfsh_flag, last_key_frame_flag, prob_update_en, prob_save_en Frame may decode but adaptive entropy may diverge from SW after a few frames Currently zero-init. Verify via byte-compare on frames 1-5 vs SW reference. If divergence, mirror BSP vp9d_vdpu382_gen_regs setting logic
Probability slot rotation First-cut single-slot may corrupt frame_context across multiple inter frames Watch dmesg + byte-compare frames 2+ vs SW reference
RCB sizing HEVC-tuned for 8K, oversized for 4K VP9 Wastes SRAM but doesn't fail; tune in Phase 6

Phase 3 entry point (unchanged, hardware-domain)

Phase 3 needs hardware:

  1. Retry git push gitea vp9-enablement-iter1 (or push linux-rk3588-marfrit base first)
  2. Backup ampere's current /lib/modules/7.0.0-rc3-devices+/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko (with off-device archive per feedback_backup_before_module_replace)
  3. scp boltzmann's rockchip-vdec.ko to ampere; modprobe -r/modprobe cycle (no reboot needed — kernel decoder module hot-swappable)
  4. Sanity: v4l2-ctl --device=/dev/video0 --list-formats-out shows VP9F alongside H264_SLICE + HEVC_SLICE
  5. Pre-flight IRQ diagnostic: pr_warn("vp9 irq status=0x%x\n", status) in vdpu381_irq_handler for first 5 decodes (Janet Amendment C)
  6. First decode attempt: LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi -i bbb_30s_720p.vp9.webm -vf hwdownload,format=nv12 -frames:v 5 -f rawvideo /tmp/hw-vp9.nv12
  7. Watch dmesg for IRQ status, watchdog firing, error patterns
  8. If first-light works: SW byte-compare on frame 100 against ffmpeg -c:v vp9 reference

Tonight's campaign summary

Started: ampere-vp9-enablement campaign opened at ~01:00, sibling to ampere-kernel-decoders close. Closed: 4 commits on branch + 7 docs in campaign repo, module compiles clean on RK3588 target kernel.

Wall-time: ~2h15m for Phase 0 (substrate) + Phase 1 (plan + 2 architect reviews) + Phase 2 (1400 LoC implementation) + Phase 2.1 (full register-packing). All architecturally-correct work; bit-level register correctness verified next session against hardware.

Next session continues at the hardware testing inflection point.