fresnel-fourier/phase0_findings_iter17.md
marfrit 57051b665c iter17 Phase 0: kernel-side rkvdec_hevc_run diagnostic printk
Per iter16 close (Bug 4/5/6 confirmed kernel-side, libva byte-correct),
add a single pr_info at rkvdec_hevc_run entry dumping key state values
from run->sps / pps / slices_params[0] / decode_params. Build 7.0-3,
deploy, reboot, run libva-HEVC + kdirect-HEVC, diff dmesg output.

Outcome interpretations:
  identical -> bug is in rkvdec assemble_hw_*/config_registers/HW path
  different -> libva somehow leaks different struct contents via non-
                ioctl path despite identical V4L2 ioctls

Build running on boltzmann via kernel-agent workflow; pkgrel 7.0-2 -> 7.0-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:44:57 +00:00


Iteration 17 — Phase 0 + Phase 6 (kernel-side investigation)

Opens 2026-05-14 after iter16's confirmation that Bug 4 + Bug 5 + Bug 6 are kernel-side. Per feedback_libva_byte_correct_kernel_bug.md, libva-side hypothesis space is exhausted. Productive direction is kernel investigation via kernel-agent workflow.

Locked research question (iter17)

"At the rkvdec_hevc_run() entry on RK3399, are the struct values pointed to by run->sps, run->pps, run->slices_params[0], and run->decode_params byte-equal between libva-triggered and kdirect-triggered HEVC decodes? If yes, the divergence is inside rkvdec's per-call hardware setup (assemble_hw_*, config_registers, IRQ handler). If no, libva is somehow passing different struct contents despite the ioctls being byte-equal."

Approach

Add a single kernel printk diagnostic at rkvdec_hevc_run entry, placed after the rkvdec_hevc_run_preamble call that populates run from V4L2 control state:

pr_info("rkvdec_hevc_run: sps_id=%u dpb_buf=%u reorder=%u"
        " w=%u h=%u bd_l=%u bd_c=%u chroma=%u"
        " num_short_st=%u num_long_lt=%u"
        " slices=%u nal_unit_type=%u slice_type=%u"
        " decode_flags=0x%x\n", ...);

Build linux-fresnel-fourier 7.0-3 with this printk, deploy, reboot, run libva-HEVC + kdirect-HEVC each, capture dmesg, diff.
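
Once the lines are in dmesg, comparing them field by field is easier than reading raw text. A minimal sketch of a parser, assuming the printk emits space-separated key=value tokens exactly as in the format string above. `parse_run_line` is a hypothetical helper name, and the sample values are invented for illustration, not captured output:

```python
import re

MARKER = "rkvdec_hevc_run:"
# key=value tokens; values are decimal (%u) or 0x-prefixed hex (0x%x).
FIELD_RE = re.compile(r"(\w+)=(0x[0-9a-fA-F]+|\d+)")


def parse_run_line(line):
    """Return the fields of one diagnostic line as {name: int}, or None if the marker is absent."""
    _, marker, payload = line.partition(MARKER)
    if not marker:
        return None
    # int(text, 0) accepts both "1920" and "0x3".
    return {name: int(text, 0) for name, text in FIELD_RE.findall(payload)}


# Invented values, for illustration only (not real dmesg output).
SAMPLE = ("[   42.123456] rkvdec_hevc_run: sps_id=0 dpb_buf=5 reorder=2"
          " w=1920 h=1080 bd_l=8 bd_c=8 chroma=1"
          " num_short_st=4 num_long_lt=0"
          " slices=1 nal_unit_type=19 slice_type=2"
          " decode_flags=0x3")

fields = parse_run_line(SAMPLE)
print(fields["w"], fields["h"], hex(fields["decode_flags"]))
```

Parsing into integers rather than comparing strings means a later change in padding or hex case in the format string would not show up as a false difference.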

Substrate state at iter17 open

  • Fork tip 111f8ba on noether + fresnel + gitea (α-20 reverted).
  • Backend SHA 80e65c5a… on fresnel.
  • Kernel 7.0-2 currently. iter17 will deploy 7.0-3 with the diagnostic printk.
  • Pkgrel bumped to 3 in PKGBUILD on boltzmann.

Why this is the right next investigation

Cumulative empirical findings from iter11 through iter16:

  • libva submits byte-identical V4L2 controls (every rkvdec-read field).
  • libva submits byte-identical OUTPUT bitstream (HEVC frame 1: 96890 bytes match input exactly; VP8: 300614 bytes match input exactly minus header).
  • libva ioctl sequence has been brought to structural near-parity with kdirect (S_FMT CAPTURE, DMA_BUF_IOCTL_SYNC, IRAP/IDR flags, POC strip, timestamp counter).

At the V4L2 ioctl layer, libva == kdirect for every byte observed so far. The kernel must therefore see different state internally, either through some non-ioctl path (vb2 allocator difference, buffer alignment, request_fd ordering) or within rkvdec's per-call processing (assemble_hw_*, config_registers).

The printk at rkvdec_hevc_run entry shows the kernel's view of run->* struct contents. If those are byte-equal between libva and kdirect, the bug is in assemble/config_registers/IRQ. If they differ, libva is somehow leaking state through a path we haven't traced.

Phase 5 review acknowledgment

This is a diagnostic-only kernel patch (single printk, no behavior change). No Phase 5 review needed for the patch itself; the methodology is direct empirical comparison.

If the printk reveals a real bug, the actual fix will require Phase 5 architectural review per CLAUDE.md "reviews never skippable."

Phase 7 plan

After 7.0-3 deploys:

  1. Reboot fresnel (sddm autologin kicks in).
  2. sudo dmesg -C to clear.
  3. Run libva HEVC; capture dmesg; expect three frames' worth of printk lines.
  4. sudo dmesg -C again.
  5. Run kdirect HEVC; capture dmesg.
  6. Diff the two dmesg outputs.
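
The two captures from the steps above will always differ in their timestamps, so a raw diff would never come back empty. A minimal sketch of the comparison, assuming the default `[seconds.micros]` dmesg prefix and that each capture is available as text. `normalise` and `diff_captures` are hypothetical helper names:

```python
import difflib
import re

MARKER = "rkvdec_hevc_run:"
# Default dmesg prefix, e.g. "[   42.123456] ".
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalise(dmesg_text):
    """Keep only the diagnostic lines and strip their leading timestamp."""
    return [TIMESTAMP_RE.sub("", line)
            for line in dmesg_text.splitlines()
            if MARKER in line]


def diff_captures(libva_text, kdirect_text):
    """Return unified-diff lines; an empty list means the captures match."""
    return list(difflib.unified_diff(normalise(libva_text),
                                     normalise(kdirect_text),
                                     fromfile="libva", tofile="kdirect",
                                     lineterm=""))
```

Filtering on the marker also drops unrelated kernel messages (for example from the display manager after autologin) that would otherwise pollute the diff.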

What outcomes mean

  • Identical: kernel sees the same run->* data from both backends. Bugs 4/5/6 are in rkvdec's assemble/config_registers/HW path. Next step: instrument those.
  • Different: libva somehow gets different struct contents into rkvdec despite identical ioctls. Next step: trace the non-ioctl paths (vb2 allocator, buffer alignment, request_fd ordering).
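
The two outcomes can be sketched as a small decision helper. This is an illustration only: it assumes each capture has already been parsed into one dict of field values per frame, and `differing_fields` and `verdict` are hypothetical names, not part of any existing tooling:

```python
def differing_fields(libva_frames, kdirect_frames):
    """List (frame_index, field, libva_value, kdirect_value) for every mismatch."""
    diffs = []
    for idx, (a, b) in enumerate(zip(libva_frames, kdirect_frames)):
        for key in sorted(a.keys() | b.keys()):
            if a.get(key) != b.get(key):
                diffs.append((idx, key, a.get(key), b.get(key)))
    return diffs


def verdict(libva_frames, kdirect_frames):
    """Map the comparison onto the outcomes described above."""
    if len(libva_frames) != len(kdirect_frames):
        return "inconclusive: frame counts differ"
    if not libva_frames:
        return "inconclusive: no printk lines captured"
    if differing_fields(libva_frames, kdirect_frames):
        return "different: trace how libva gets other struct contents into rkvdec"
    return "identical: instrument assemble/config_registers/HW next"
```

An "identical" verdict here only covers the fields the printk actually prints; struct members left out of the format string could still differ, so the result narrows the search without fully proving byte-equality.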