diff --git a/phase0_findings_iter17.md b/phase0_findings_iter17.md
new file mode 100644
index 0000000..c46f0de
--- /dev/null
+++ b/phase0_findings_iter17.md
@@ -0,0 +1,60 @@
+# Iteration 17: Phase 0 + Phase 6 (kernel-side investigation)
+
+Opens 2026-05-14 after iter16's confirmation that Bug 4 + Bug 5 + Bug 6 are kernel-side. Per `feedback_libva_byte_correct_kernel_bug.md`, the libva-side hypothesis space is exhausted. The productive direction is kernel investigation via the kernel-agent workflow.
+
+## Locked research question (iter17)
+
+> *"At the rkvdec_hevc_run() entry on RK3399, are the struct values pointed to by `run->sps`, `run->pps`, `run->slices_params[0]`, `run->decode_params` byte-equal between libva-triggered and kdirect-triggered HEVC decodes? If yes, the divergence is inside rkvdec's per-call hardware setup (assemble_hw_*, config_registers, IRQ handler). If no, libva is somehow passing different struct contents despite ioctls being byte-equal."*
+
+## Approach
+
+Kernel printk diagnostic at `rkvdec_hevc_run()` entry (after `rkvdec_hevc_run_preamble()`, which populates `run` from V4L2 control state):
+
+```c
+pr_info("rkvdec_hevc_run: sps_id=%u dpb_buf=%u reorder=%u"
+        " w=%u h=%u bd_l=%u bd_c=%u chroma=%u"
+        " num_short_st=%u num_long_lt=%u"
+        " slices=%u nal_unit_type=%u slice_type=%u"
+        " decode_flags=0x%x\n", ...);
+```
+
+Build linux-fresnel-fourier 7.0-3 with this printk, deploy, reboot, run libva-HEVC and kdirect-HEVC in turn, capture dmesg for each, and diff.
+
+## Substrate state at iter17 open
+
+- Fork tip `111f8ba` on noether + fresnel + gitea (α-20 reverted).
+- Backend SHA `80e65c5a…` on fresnel.
+- Kernel is currently `7.0-2`; iter17 will deploy `7.0-3` with the diagnostic printk.
+- Pkgrel bumped to 3 in PKGBUILD on boltzmann.
+
+## Why this is the right next investigation
+
+iter11–iter16 cumulative empirical findings:
+- libva submits byte-identical V4L2 controls (every rkvdec-read field).
+- libva submits byte-identical OUTPUT bitstream (HEVC frame 1: 96890 bytes match input exactly; VP8: 300614 bytes match input exactly minus header).
+- libva ioctl sequence has been brought to structural near-parity with kdirect (S_FMT CAPTURE, DMA_BUF_IOCTL_SYNC, IRAP/IDR flags, POC strip, timestamp counter).
+
+At the V4L2 ioctl layer, libva == kdirect for every observable byte. The kernel must therefore see different state internally, either through some non-ioctl path (vb2 allocator difference, buffer alignment, request_fd ordering) or within rkvdec's per-call processing (assemble_hw_*, config_registers).
+
+The printk at `rkvdec_hevc_run` entry shows the kernel's view of the `run->*` struct contents. If the printed fields are equal between libva and kdirect, the bug is in assemble/config_registers/IRQ. If they differ, libva is leaking state through a path we haven't traced.
+
+## Phase 5 review acknowledgment
+
+This is a diagnostic-only kernel patch (single printk, no behavior change). No Phase 5 review is needed for the patch itself; the methodology is direct empirical comparison.
+
+If the printk reveals a real bug, the actual fix will require Phase 5 architectural review per CLAUDE.md ("reviews never skippable").
+
+## Phase 7 plan
+
+After 7.0-3 deploys:
+1. Reboot fresnel (sddm autologin kicks in).
+2. `sudo dmesg -C` to clear.
+3. Run libva HEVC; capture dmesg; expect 3 frames' worth of printk lines.
+4. `sudo dmesg -C` again.
+5. Run kdirect HEVC; capture dmesg.
+6. Diff the two dmesg outputs.
+
+## What outcomes mean
+
+- **Identical**: the kernel sees the same `run->*` data from both backends, at least for the printed fields. Bugs 4/5/6 are in rkvdec's assemble/config_registers/HW. Next step: instrument those.
+- **Different**: libva somehow gets different struct contents into rkvdec despite identical ioctls. Next step: investigate the non-ioctl paths (vb2 allocator difference, buffer alignment, request_fd ordering).
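
Review note on Phase 7 step 6: a raw `diff` of the two dmesg captures will likely flag every line, because the default `[seconds.micros]` timestamp prefix differs between runs. The following is a minimal user-space sketch of a comparison that ignores everything before the `rkvdec_hevc_run:` tag and skips unrelated kernel messages. The helper names `rkvdec_payload` and `first_divergence` are hypothetical, and the sketch assumes each capture has already been split into lines.

```c
#include <stddef.h>
#include <string.h>

/* Tag emitted by the diagnostic printk in this plan. */
#define RKVDEC_TAG "rkvdec_hevc_run:"

/*
 * Return a pointer to the tagged payload inside a dmesg line, or NULL if
 * the line does not come from the diagnostic printk. Anything before the
 * tag (timestamp, log prefix) is skipped because it differs between runs.
 */
static const char *rkvdec_payload(const char *line)
{
    return strstr(line, RKVDEC_TAG);
}

/*
 * Compare two captures, looking only at tagged lines.
 * Returns -1 when both captures contain the same tagged lines in the same
 * order, otherwise the zero-based index of the first tagged line that
 * differs (or that exists in only one capture).
 */
static long first_divergence(const char *const *a, size_t na,
                             const char *const *b, size_t nb)
{
    size_t i = 0, j = 0;
    long frame = 0;

    for (;;) {
        const char *pa = NULL, *pb = NULL;

        /* Advance each side to its next tagged line, if any. */
        while (i < na && (pa = rkvdec_payload(a[i])) == NULL)
            i++;
        while (j < nb && (pb = rkvdec_payload(b[j])) == NULL)
            j++;

        if (pa == NULL && pb == NULL)
            return -1;              /* both exhausted, no difference */
        if (pa == NULL || pb == NULL)
            return frame;           /* one capture has extra lines */
        if (strcmp(pa, pb) != 0)
            return frame;           /* printed fields differ */

        i++;
        j++;
        frame++;
    }
}
```

A return of `-1` corresponds to the **Identical** outcome for the printed fields only; it says nothing about struct bytes that the format string does not cover. Any other return value is the index of the first `rkvdec_hevc_run` call worth inspecting by hand.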