Per iter16 close (Bug 4/5/6 confirmed kernel-side, libva byte-correct),
add a single pr_info at rkvdec_hevc_run entry dumping key state values
from run->sps / pps / slices_params[0] / decode_params. Build 7.0-3,
deploy, reboot, run libva-HEVC + kdirect-HEVC, diff dmesg output.
Outcome interpretations:
identical -> bug is in rkvdec assemble_hw_*/config_registers/HW path
different -> libva somehow leaks different struct contents via non-
ioctl path despite identical V4L2 ioctls
Build running on boltzmann via kernel-agent workflow; pkgrel 7.0-2 -> 7.0-3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 17 — Phase 0 + Phase 6 (kernel-side investigation)
Opens 2026-05-14 after iter16's confirmation that Bug 4 + Bug 5 + Bug 6 are kernel-side. Per feedback_libva_byte_correct_kernel_bug.md, libva-side hypothesis space is exhausted. Productive direction is kernel investigation via kernel-agent workflow.
Locked research question (iter17)
"At the rkvdec_hevc_run() entry on RK3399, are the struct values pointed to by
run->sps, run->pps, run->slices_params[0], run->decode_params byte-equal between libva-triggered and kdirect-triggered HEVC decodes? If yes, the divergence is inside rkvdec's per-call hardware setup (assemble_hw_*, config_registers, IRQ handler). If no, libva is somehow passing different struct contents despite ioctls being byte-equal."
Approach
Kernel printk diagnostic at rkvdec_hevc_run entry (after rkvdec_hevc_run_preamble that populates run from V4L2 control state):
pr_info("rkvdec_hevc_run: sps_id=%u dpb_buf=%u reorder=%u"
" w=%u h=%u bd_l=%u bd_c=%u chroma=%u"
" num_short_st=%u num_long_lt=%u"
" slices=%u nal_unit_type=%u slice_type=%u"
" decode_flags=0x%x\n", ...);
Build linux-fresnel-fourier 7.0-3 with this printk, deploy, reboot, run libva-HEVC + kdirect-HEVC each, capture dmesg, diff.
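Parsing the printk output into key/value pairs allows a field-by-field comparison instead of a line-by-line one. The following is a minimal sketch, assuming the lines appear in dmesg laid out exactly as the format string above (space-separated key=value pairs after the rkvdec_hevc_run: prefix). The helper name parse_run_line and all sample values are hypothetical, not captured output.

```python
import re

# Matches key=value pairs such as "w=1920" or "decode_flags=0x3".
_FIELD_RE = re.compile(r"(\w+)=(0x[0-9a-fA-F]+|\d+)")
PREFIX = "rkvdec_hevc_run:"


def parse_run_line(line):
    """Return {field: int} for one rkvdec_hevc_run printk line, or None.

    Assumes the key=value layout of the format string in this section.
    """
    _, sep, payload = line.partition(PREFIX)
    if not sep:
        return None  # not one of the diagnostic lines
    # int(value, 0) accepts both decimal ("8") and hex ("0x3") spellings.
    return {key: int(value, 0) for key, value in _FIELD_RE.findall(payload)}


# Hypothetical sample line; the values are illustrative only.
sample = ("[   42.000001] rkvdec_hevc_run: sps_id=0 dpb_buf=4 reorder=2"
          " w=1920 h=1080 bd_l=8 bd_c=8 chroma=1"
          " num_short_st=0 num_long_lt=0"
          " slices=1 nal_unit_type=19 slice_type=2"
          " decode_flags=0x3")
fields = parse_run_line(sample)
```

With dictionaries like this for each frame, a mismatch can be reported by field name (for example "bd_c differs") rather than as an opaque changed line.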
Substrate state at iter17 open
- Fork tip 111f8ba on noether + fresnel + gitea (α-20 reverted).
- Backend SHA 80e65c5a… on fresnel.
- Kernel 7.0-2 currently. iter17 will deploy 7.0-3 with the diagnostic printk.
- Pkgrel bumped to 3 in PKGBUILD on boltzmann.
Why this is the right next investigation
iter11–iter16 cumulative empirical findings:
- libva submits byte-identical V4L2 controls (every rkvdec-read field).
- libva submits byte-identical OUTPUT bitstream (HEVC frame 1: 96890 bytes match input exactly; VP8: 300614 bytes match input exactly minus header).
- libva ioctl sequence has been brought to structural near-parity with kdirect (S_FMT CAPTURE, DMA_BUF_IOCTL_SYNC, IRAP/IDR flags, POC strip, timestamp counter).
At the V4L2 ioctl layer, libva == kdirect for every observable byte. The kernel must therefore see different state internally — either through some non-ioctl path (vb2 allocator difference, buffer alignment, request_fd ordering) OR within rkvdec's per-call processing (assemble_hw_*, config_registers).
The printk at rkvdec_hevc_run entry shows the kernel's view of run->* struct contents. If those are byte-equal between libva and kdirect, the bug is in assemble/config_registers/IRQ. If they differ, libva is somehow leaking state through a path we haven't traced.
Phase 5 review acknowledgment
This is a diagnostic-only kernel patch (single printk, no behavior change). No Phase 5 review needed for the patch itself; the methodology is direct empirical comparison.
If the printk reveals a real bug, the actual fix will require Phase 5 architectural review per CLAUDE.md "reviews never skippable."
Phase 7 plan
After 7.0-3 deploys:
- Reboot fresnel (sddm autologin kicks in).
- sudo dmesg -C to clear.
- Run libva HEVC; capture dmesg; expect 3 frames worth of printk lines.
- sudo dmesg -C again.
- Run kdirect HEVC; capture dmesg.
- Diff the two dmesg outputs.
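One practical wrinkle in the diff step: dmesg prefixes each line with a boot-relative timestamp, which will differ between the two runs even when the printed struct values match, so a raw diff would flag every line. The sketch below keeps only the diagnostic lines, strips the timestamp, and diffs what remains. It assumes the default dmesg "[ seconds.micros ]" prefix; the function names are hypothetical.

```python
import difflib
import re

# Default dmesg prefix, e.g. "[   42.000001] ".
_TS_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
MARKER = "rkvdec_hevc_run:"


def normalize(dmesg_text):
    """Keep only the diagnostic lines and drop their timestamps."""
    return [_TS_RE.sub("", line).strip()
            for line in dmesg_text.splitlines()
            if MARKER in line]


def diff_captures(libva_text, kdirect_text):
    """Return unified-diff lines; an empty list means the captures match."""
    return list(difflib.unified_diff(
        normalize(libva_text), normalize(kdirect_text),
        fromfile="libva", tofile="kdirect", lineterm=""))
```

Feeding it the two saved captures (read from whatever files the dmesg output was redirected to) yields an empty list in the "identical" case and a conventional unified diff otherwise.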
What outcomes mean
- Identical: kernel sees the same run->* data from both backends. Bug 4/5/6 is in rkvdec's assemble/config_registers/HW. Next step: instrument those.
- Different: libva somehow gets different struct contents into rkvdec despite identical ioctls. Investigate.