Pre-Compact Handoff — Session 2026-05-14 (updated post iter31)

Use this doc to resume the fresnel-fourier campaign after Claude context compaction.

TL;DR (read first)

  • Bug 4 (H.264 keyframe-partial): FIXED iter25 α-25 — H.264 10F byte-equal to SW reference.
  • Bug 5 (HEVC libva all-zero / frame 2+ divergence): FULLY FIXED — frame 1 via α-25, frames 2+ via iter31 α-29. HEVC 10F byte-equal to SW.
  • VP9: unchanged (HW=SW byte-equal, no regression from α-29).
  • MPEG-2 / VP8: untestable through libva on current kernel boot (pre-existing libva single-device profile-probe limitation; auto-select picks rkvdec which doesn't expose those profiles).

Final score on rkvdec-routed anchors: 3/3 PASS. MPEG-2/VP8 path orthogonal to Bug 4/5.

Substrate state (where things live)

| Component | Location | Tip |
|---|---|---|
| Campaign repo (this) | /home/mfritsche/src/fresnel-fourier/ | c1f9738 on gitea master |
| Libva backend fork (noether) | /home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/ | 23eb1bd on gitea master |
| Libva backend (fresnel deploy) | /home/mfritsche/src/libva-v4l2-request-fourier/ | sync to gitea master, ninja -C build |
| Kernel source (boltzmann) | ~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/ | pkgrel=10 with iter17/20/21/22/23/27/31 diag printks |
| Kernel running on fresnel | linux-fresnel-fourier 7.0-10 | diagnostic build; revert to clean 7.0-N before any production work |
| Test fixtures (fresnel) | /home/mfritsche/fourier-test/bbb_*.{mp4,ts,webm} | 5 codecs at 720p10s or 1080p30 |
| Anchors (fresnel) | /tmp/iter31/{libva,sw}_{h264,hevc,vp9}_10f.yuv | per-frame SHA match SW |
| Memory | ~/.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/ | new: feedback_va_st_rps_bits_is_slice_field.md |

Identity for gitea pushes

All git.reauktion.de interactions use claude-noether identity (per memory feedback_gitea_as_claude_noether.md). Backend remote URL: ssh://gitea@git.reauktion.de.claude-noether/marfrit/libva-v4l2-request-fourier.git.

Backend commits delivered (chronological, this campaign)

23eb1bd iter31 α-29: slice_params.short_term_ref_pic_set_size = picture->st_rps_bits   ← Bug 5 remainder fix
68dbbdd iter30 DIAG: LIBVA_TS_SCALE env-gated timestamp multiplier (env-gated, no-op default)
0eca3ff iter29 DIAG: env-gated dump of HEVC slice_data trailing 80 bytes
6646b16 Revert iter28b DIAG (universal trim=40 broke IDR)
cd286d9 iter28 α-28: bit_size = (slice_data_size - data_byte_offset) * 8 for HEVC
754be1d iter27 diag: env-gated VAAPI slice fields dump
719d813 iter27 α-27: populate slice_params.num_entry_point_offsets (no-op, rkvdec ignores)
66ef848 iter26 α-26: populate decode_params.short_term_ref_pic_set_size (mis-routed; rkvdec ignores)
d062fec iter25 α-25 fix: add FRAME_MBS_ONLY to H264 dummy SPS
db0b7f9 iter25 α-25: inject synthetic SPS before cap_pool_init to seed image_fmt   ← Bug 4 + Bug 5 frame 1 fix

Campaign repo commits delivered

c1f9738 iter31 α-29 close: HEVC Bug 5 remainder FIXED — 3/3 PASS
422ecaf Add pre-compact handoff doc for session resumption
c15fc6c iter28b DIAG documented: universal trim=40 breaks IDR (reverted)
8b17bf7 Final session summary: H264 + VP9 + HEVC frame 1 byte-equal to SW
02c4192 iter27/28: probe HEVC frame 2+ divergence; α-27/α-28 no-op
bf67900 iter20-26: kernel-side root-cause localization, α-25/α-26 fix Bug 4, partial Bug 5

Phase docs (chronological): phase4_iter21_plan.md, phase4_iter22_plan.md, phase8_iteration20_close.md, phase8_iteration27_close.md, phase8_iteration31_close.md, CAMPAIGN_SESSION_2026_05_14.md.

How to verify the current state

Run on fresnel after git pull + ninja -C build in ~/src/libva-v4l2-request-fourier:

for codec in h264:bbb_1080p30_h264.mp4 hevc:bbb_720p10s_hevc.mp4 vp9:bbb_720p10s_vp9.webm ; do
    name="${codec%%:*}"; fixture="${codec#*:}"
    env LIBVA_DRIVER_NAME=v4l2_request \
        LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
        LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
        LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
        ffmpeg -hide_banner -loglevel error -y \
        -hwaccel vaapi -hwaccel_output_format vaapi \
        -i "/home/mfritsche/fourier-test/$fixture" \
        -vf "hwdownload,format=nv12" -frames:v 10 \
        -f rawvideo -pix_fmt nv12 "/tmp/libva_${name}.yuv"
    ffmpeg -hide_banner -loglevel error -y \
        -i "/home/mfritsche/fourier-test/$fixture" \
        -frames:v 10 -f rawvideo -pix_fmt nv12 "/tmp/sw_${name}.yuv"
    if cmp -s "/tmp/libva_${name}.yuv" "/tmp/sw_${name}.yuv"; then
        echo "$name: PASS"
    else
        echo "$name: FAIL"
    fi
done

Expect: 3× PASS.
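
When a codec reports FAIL, it helps to know which frame diverges first (frame 1 versus frame 2+ was the split between the α-25 and α-29 fixes). The helper below is a minimal sketch, not part of the campaign scripts: `first_diff_frame` is a hypothetical name, and it assumes both dumps are raw NV12 at a known resolution, so one frame occupies width × height × 3/2 bytes.

```shell
# Print the 0-based index of the first frame that differs between two raw
# NV12 dumps, "none" if they are identical, or "truncated" if one file is a
# strict prefix of the other.
# Usage: first_diff_frame A.yuv B.yuv WIDTH HEIGHT
first_diff_frame() {
    local a="$1" b="$2" width="$3" height="$4"
    # NV12: full-size Y plane plus a half-size interleaved UV plane.
    local frame_bytes=$(( width * height * 3 / 2 ))
    local offset
    # cmp -l prints "<1-based byte offset> <octal> <octal>" per differing byte;
    # only the first line is needed.
    offset="$(cmp -l "$a" "$b" 2>/dev/null | head -n 1 | awk '{print $1}' || true)"
    if [ -z "$offset" ]; then
        if cmp -s "$a" "$b"; then
            echo "none"
        else
            echo "truncated"
        fi
        return 0
    fi
    echo $(( (offset - 1) / frame_bytes ))
}
```

For the 720p fixtures this would be called as `first_diff_frame /tmp/libva_hevc.yuv /tmp/sw_hevc.yuv 1280 720`, assuming the decoded output is 1280×720.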

Root cause summary

Bug 4 (H.264) + Bug 5 frame 1 (HEVC IDR): rkvdec_s_ctrl returned -EBUSY when the first SPS set tried to reset image_fmt on a busy CAPTURE queue, because libva pre-allocates CAPTURE buffers in CreateContext (iter5b-β design), before the per-frame S_EXT_CTRLS. Fix: inject a synthetic SPS before cap_pool_init so the reset succeeds while the queue is still empty. Source: db0b7f9 + d062fec.

Bug 5 frame 2+ (HEVC non-IDR): the libva backend set slice_params->short_term_ref_pic_set_size = 0 (with a stale "VAAPI doesn't expose" comment). rkvdec's assemble_sw_rps (rkvdec-hevc.c:386-389) reads this field to compute the long-term-RPS bit offset; when it is zero AND num_short_term_ref_pic_sets <= 1, the offset falls back to 0, so the HW entropy decoder consumes slice-header bits as long-term-RPS and ends up with garbage state for every non-IDR slice. IDR slices are gated out (!IDR_PIC flag), so frame 1 is unaffected. Fix: slice_params->short_term_ref_pic_set_size = picture->st_rps_bits. The VAAPI doc says st_rps_bits IS the slice-header bit count; α-26 mis-routed it into decode_params, which has a field of the same name but different semantics. Source: 23eb1bd.

Open items (deferred)

1. Kernel substrate cleanup

linux-fresnel-fourier 7.0-10 has 5+ accumulated pr_info diagnostic patches in:

  • drivers/media/v4l2-core/v4l2-ctrls-request.c (iter21-24 setup/clone/loop traces)
  • drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c (iter17/20/27/31 SPS/DP/slice dumps)

Before any production work, revert to clean 7.0-N (i.e., apply only the 3 PBP DTS patches + RFC v2 fence series, without diagnostics). Bump pkgrel to 11 and ship clean.
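
One way to sanity-check a clean build after a test decode is to count leftover diagnostic lines in the kernel log. This is only a sketch: `count_diag_printks` is a hypothetical helper, and the `rkvdec_iter<N>` pattern is an assumption modelled on the rkvdec_iter20/27/31 tags grepped in the commands section; the tags used by the v4l2-ctrls-request.c traces are not recorded in this doc and would need to be added.

```shell
# Count diagnostic printk lines in a kernel log read from stdin.
# ASSUMPTION: diagnostic lines carry a tag of the form rkvdec_iter<N>.
count_diag_printks() {
    # grep -c prints 0 and exits 1 when nothing matches; keep the status clean.
    grep -cE 'rkvdec_iter[0-9]+' || true
}
```

Usage would be along the lines of `ssh fresnel 'sudo dmesg' | count_diag_printks`, expecting 0 on a clean kernel.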

2. MPEG-2 / VP8 untestable through libva on current kernel boot

Libva backend's find_codec_device (src/request.c:427) selects ONE device for the entire session. On RK3399 with both rkvdec (/dev/media0+/dev/video1) and hantro (/dev/media1+/dev/video2+/dev/video3), the backend picks rkvdec — which exposes H264/HEVC/VP9 only.

Override with LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 to force hantro for MPEG-2/VP8 testing. But that disables H264/HEVC/VP9 simultaneously, and the unconditional HEVC DECODE_MODE/START_CODE controls libva sets at CreateContext (context.c:343-379) fail on hantro with Unable to set control(s): Invalid argument — pre-existing, orthogonal to Bug 4/5.

Fix would require either:

  • Libva backend multi-device probe + per-codec dispatch (~200-400 LOC, called out in phase0_findings_iter7.md).
  • Conditional codec-init controls (skip controls hantro doesn't support).
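
Until either fix lands, per-codec routing can only be approximated on the caller side. The sketch below wraps the override variables described above; `codec_device_paths` and `run_with_codec_device` are hypothetical helper names, and the node numbers are the ones listed in this section for the current boot. It covers the routing half only: the hantro path still fails on the unconditional HEVC controls until those are made conditional.

```shell
# Map a codec name to the "VIDEO_PATH MEDIA_PATH" pair listed above.
# Returns 1 for a codec neither decoder is documented to expose.
codec_device_paths() {
    case "$1" in
        h264|hevc|vp9) echo "/dev/video1 /dev/media0" ;;   # rkvdec
        mpeg2|vp8)     echo "/dev/video3 /dev/media1" ;;   # hantro
        *)             return 1 ;;
    esac
}

# Run a command with the libva override variables set for the given codec.
# Usage: run_with_codec_device CODEC COMMAND [ARGS...]
run_with_codec_device() {
    local codec="$1"; shift
    local paths
    paths="$(codec_device_paths "$codec")" || return 1
    env LIBVA_V4L2_REQUEST_VIDEO_PATH="${paths% *}" \
        LIBVA_V4L2_REQUEST_MEDIA_PATH="${paths#* }" \
        "$@"
}
```

For example, `run_with_codec_device vp8 ffmpeg ...` would select the hantro nodes without editing the smoke script.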

3. iter29/iter30 env-gated diagnostics in backend

LIBVA_HEVC_DUMP_SLICE_TAIL=1 and LIBVA_TS_SCALE=N are present in the backend but env-gated (no behavior change without env set). Could clean up to keep ship-ready source minimal. Or leave them — useful for future regression debugging. Low priority either way.

4. α-26 dead code

decode_params->short_term_ref_pic_set_size = picture->st_rps_bits was mis-routed (right value to wrong field). rkvdec doesn't use decode_params's same-named field. Could revert α-26 to set 0 (which is correct per V4L2 spec when SPS-defined RPS bit count is unknown). Cosmetic.

Memory entries (this session arc)

  • New: feedback_va_st_rps_bits_is_slice_field.md — VAAPI's picture->st_rps_bits belongs in slice_params, not decode_params. Same field name, different semantics.
  • Updated: feedback_rkvdec_image_fmt_pre_seed.md — note Bug 5 remainder is now fixed (not via image_fmt; see new entry).
  • Updated: feedback_libva_byte_correct_kernel_bug.md — FULLY OVERTURNED (both Bug 4 and Bug 5 are libva-side fixes).

Key commands quick reference

# Sync backend on fresnel + rebuild
ssh fresnel 'cd ~/src/libva-v4l2-request-fourier && git fetch && git reset --hard origin/master && ninja -C build'

# 3-codec smoke (above script). Each codec ~5s.

# Run libva HEVC + capture rkvdec kernel iter27/31 printk
ssh fresnel 'sudo dmesg -C; env LIBVA_DRIVER_NAME=v4l2_request \
    LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
    LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
    LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
    ffmpeg -hide_banner -loglevel error -y -hwaccel vaapi -hwaccel_output_format vaapi \
    -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
    -vf "hwdownload,format=nv12" -frames:v 3 -f rawvideo -pix_fmt nv12 /tmp/x.yuv;
    sudo dmesg | grep -E "rkvdec_iter2[07]|rkvdec_iter31"'

# kdirect (ffmpeg-v4l2request) reference
ssh fresnel 'ffmpeg -hide_banner -loglevel error -y -hwaccel v4l2request \
    -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
    -frames:v 10 -f null -'  # decode-only, dmesg has iter27/31 entries

# Reboot fresnel (sddm autologin reseats mfritsche per /etc/sddm.conf.d/20-autologin.conf)
ssh fresnel 'sudo systemctl reboot'; sleep 60

What's safe to do without user confirmation

  • Read/grep on noether, boltzmann, fresnel.
  • Push to gitea (claude-noether identity).
  • Reboot fresnel (sddm autologin restores session).
  • Build kernel on boltzmann via makepkg -e --noconfirm in ~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/.
  • Deploy kernel via scp + sudo pacman -U.
  • Run ffmpeg/cmp tests on fresnel.

What needs user confirmation

  • Significant rebuild (~25-30 min CPU time on boltzmann, e.g., ffmpeg or fresh kernel build).
  • Reverting kernel-substrate diagnostics to ship a clean kernel (mechanical but heavy).
  • Architectural change to libva multi-device probe (Item 2) — affects libva backend design.