fresnel-fourier/phase0_findings_iter17.md
marfrit 57051b665c iter17 Phase 0: kernel-side rkvdec_hevc_run diagnostic printk
Per iter16 close (Bug 4/5/6 confirmed kernel-side, libva byte-correct),
add a single pr_info at rkvdec_hevc_run entry dumping key state values
from run->sps / pps / slices_params[0] / decode_params. Build 7.0-3,
deploy, reboot, run libva-HEVC + kdirect-HEVC, diff dmesg output.

Outcome interpretations:
  identical -> bug is in rkvdec assemble_hw_*/config_registers/HW path
  different -> libva somehow leaks different struct contents via non-
                ioctl path despite identical V4L2 ioctls

Build running on boltzmann via kernel-agent workflow; pkgrel 7.0-2 -> 7.0-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:44:57 +00:00

# Iteration 17 — Phase 0 + Phase 6 (kernel-side investigation)
Opens 2026-05-14 after iter16 confirmed that Bug 4, Bug 5, and Bug 6 are kernel-side. Per `feedback_libva_byte_correct_kernel_bug.md`, the libva-side hypothesis space is exhausted. The productive direction is kernel investigation via the kernel-agent workflow.
## Locked research question (iter17)
> *"At the rkvdec_hevc_run() entry on RK3399, are the struct values pointed to by `run->sps`, `run->pps`, `run->slices_params[0]`, `run->decode_params` byte-equal between libva-triggered and kdirect-triggered HEVC decodes? If yes, the divergence is inside rkvdec's per-call hardware setup (assemble_hw_*, config_registers, IRQ handler). If no, libva is somehow passing different struct contents despite ioctls being byte-equal."*
## Approach
Add a kernel printk diagnostic at `rkvdec_hevc_run` entry, placed after the `rkvdec_hevc_run_preamble` call that populates `run` from V4L2 control state:
```c
pr_info("rkvdec_hevc_run: sps_id=%u dpb_buf=%u reorder=%u"
        " w=%u h=%u bd_l=%u bd_c=%u chroma=%u"
        " num_short_st=%u num_long_lt=%u"
        " slices=%u nal_unit_type=%u slice_type=%u"
        " decode_flags=0x%x\n", ...);
```
Build linux-fresnel-fourier 7.0-3 with this printk, deploy, reboot, run libva-HEVC + kdirect-HEVC each, capture dmesg, diff.
## Substrate state at iter17 open
- Fork tip `111f8ba` on noether + fresnel + gitea (α-20 reverted).
- Backend SHA `80e65c5a…` on fresnel.
- Kernel `7.0-2` currently. iter17 will deploy `7.0-3` with the diagnostic printk.
- Pkgrel bumped to 3 in PKGBUILD on boltzmann.
## Why this is the right next investigation
Cumulative empirical findings from iter11 through iter16:
- libva submits byte-identical V4L2 controls (every rkvdec-read field).
- libva submits byte-identical OUTPUT bitstream (HEVC frame 1: 96890 bytes match input exactly; VP8: 300614 bytes match input exactly minus header).
- libva ioctl sequence has been brought to structural near-parity with kdirect (S_FMT CAPTURE, DMA_BUF_IOCTL_SYNC, IRAP/IDR flags, POC strip, timestamp counter).
At the V4L2 ioctl layer, libva == kdirect for every observable byte. The kernel must therefore see different state internally — either through some non-ioctl path (vb2 allocator difference, buffer alignment, request_fd ordering) OR within rkvdec's per-call processing (assemble_hw_*, config_registers).
The printk at `rkvdec_hevc_run` entry shows the kernel's view of the logged `run->*` fields. If those match between libva and kdirect, the bug is most likely in assemble/config_registers/IRQ (fields not covered by the printk remain unchecked). If they differ, libva is somehow leaking state through a path we haven't traced.
## Phase 5 review acknowledgment
This is a diagnostic-only kernel patch (single printk, no behavior change). No Phase 5 review needed for the patch itself; the methodology is direct empirical comparison.
If the printk reveals a real bug, the actual fix will require Phase 5 architectural review per CLAUDE.md "reviews never skippable."
## Phase 7 plan
After 7.0-3 deploys:
1. Reboot fresnel (sddm autologin kicks in).
2. `sudo dmesg -C` to clear.
3. Run libva HEVC; capture dmesg; expect 3 frames' worth of printk lines.
4. `sudo dmesg -C` again.
5. Run kdirect HEVC; capture dmesg.
6. Diff the two dmesg outputs.
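
A raw diff of the two captures would flag every line, because each dmesg line starts with a boot-relative timestamp that differs between runs. The sketch below is one way to normalise the captures before diffing. It assumes the default `[  seconds.micros]` timestamp prefix; the helper names `rkvdec_lines` and `compare_captures` and the capture file names are hypothetical, not part of the existing workflow.

```shell
# Keep only the diagnostic lines and drop the leading "[  123.456789] "
# timestamp, which differs between runs even when the payload is identical.
rkvdec_lines() {
    sed -n -E '/rkvdec_hevc_run:/{s/^\[ *[0-9]+\.[0-9]+\] *//;p;}'
}

# Compare two saved captures. Exit status 0 means the printk payloads match,
# 1 means they differ (diff prints the differing lines).
compare_captures() {
    cc_a=$(mktemp) || return 2
    cc_b=$(mktemp) || { rm -f "$cc_a"; return 2; }
    rkvdec_lines < "$1" > "$cc_a"
    rkvdec_lines < "$2" > "$cc_b"
    diff -u "$cc_a" "$cc_b" && cc_status=0 || cc_status=$?
    rm -f "$cc_a" "$cc_b"
    return "$cc_status"
}

# Example (hypothetical file names):
#   sudo dmesg > libva.dmesg      # after step 3
#   sudo dmesg > kdirect.dmesg    # after step 5
#   compare_captures libva.dmesg kdirect.dmesg
```

An exit status of 0 corresponds to the "identical" outcome and 1 to the "different" outcome. Filtering on the `rkvdec_hevc_run:` tag also keeps unrelated kernel messages out of the comparison.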
## What outcomes mean
- **Identical**: kernel sees the same `run->*` data from both backends. Bug 4/5/6 is in rkvdec's assemble/config_registers/HW. Next step: instrument those.
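- **Different**: libva somehow gets different struct contents into rkvdec despite identical ioctls. Next step: trace the non-ioctl paths (vb2 allocator, buffer alignment, request_fd ordering) to find where the contents diverge.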