# Pre-Compact Handoff — Session 2026-05-14 (updated post iter31)
Use this doc to resume the fresnel-fourier campaign after Claude context compaction.
## TL;DR (read first)
- **Bug 4 (H.264 keyframe-partial): FIXED iter25 α-25**: H.264 10-frame decode byte-equal to SW reference.
- **Bug 5 (HEVC libva all-zero / frame 2+ divergence): FULLY FIXED**: frame 1 via α-25, frames 2+ via iter31 α-29. HEVC 10-frame decode byte-equal to SW.
- **VP9**: unchanged (HW=SW byte-equal, no regression from α-29).
- **MPEG-2 / VP8**: untestable through libva on the current kernel boot (pre-existing libva single-device profile-probe limitation: auto-select picks rkvdec, which doesn't expose those profiles).
Final score on rkvdec-routed anchors: **3/3 PASS**. The MPEG-2/VP8 issue is orthogonal to Bug 4/5.
## Substrate state (where things live)
| Component | Location | Tip |
|---|---|---|
| Campaign repo (this) | `/home/mfritsche/src/fresnel-fourier/` | `c1f9738` on gitea master |
| Libva backend fork (noether) | `/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/` | `23eb1bd` on gitea master |
| Libva backend (fresnel deploy) | `/home/mfritsche/src/libva-v4l2-request-fourier/` | sync to gitea master, `ninja -C build` |
| Kernel source (boltzmann) | `~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/` | pkgrel=10 with iter17/20/21/22/23/27/31 diag printks |
| Kernel running on fresnel | `linux-fresnel-fourier 7.0-10` | diagnostic build; revert to clean 7.0-N before any production work |
| Test fixtures (fresnel) | `/home/mfritsche/fourier-test/bbb_*.{mp4,ts,webm}` | 5 codecs at 720p10s or 1080p30 |
| Anchors (fresnel) | `/tmp/iter31/{libva,sw}_{h264,hevc,vp9}_10f.yuv` | per-frame SHA match SW |
| Memory | `~/.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/` | new: `feedback_va_st_rps_bits_is_slice_field.md` |
## Identity for gitea pushes
All `git.reauktion.de` interactions use `claude-noether` identity (per memory `feedback_gitea_as_claude_noether.md`). Backend remote URL: `ssh://gitea@git.reauktion.de.claude-noether/marfrit/libva-v4l2-request-fourier.git`.
## Backend commits delivered (chronological, this campaign)
```
23eb1bd iter31 α-29: slice_params.short_term_ref_pic_set_size = picture->st_rps_bits ← Bug 5 remainder fix
68dbbdd iter30 DIAG: LIBVA_TS_SCALE env-gated timestamp multiplier (env-gated, no-op default)
0eca3ff iter29 DIAG: env-gated dump of HEVC slice_data trailing 80 bytes
6646b16 Revert iter28b DIAG (universal trim=40 broke IDR)
cd286d9 iter28 α-28: bit_size = (slice_data_size - data_byte_offset) * 8 for HEVC
754be1d iter27 diag: env-gated VAAPI slice fields dump
719d813 iter27 α-27: populate slice_params.num_entry_point_offsets (no-op, rkvdec ignores)
66ef848 iter26 α-26: populate decode_params.short_term_ref_pic_set_size (mis-routed; rkvdec ignores)
d062fec iter25 α-25 fix: add FRAME_MBS_ONLY to H264 dummy SPS
db0b7f9 iter25 α-25: inject synthetic SPS before cap_pool_init to seed image_fmt ← Bug 4 + Bug 5 frame 1 fix
```
## Campaign repo commits delivered
```
c1f9738 iter31 α-29 close: HEVC Bug 5 remainder FIXED — 3/3 PASS
422ecaf Add pre-compact handoff doc for session resumption
c15fc6c iter28b DIAG documented: universal trim=40 breaks IDR (reverted)
8b17bf7 Final session summary: H264 + VP9 + HEVC frame 1 byte-equal to SW
02c4192 iter27/28: probe HEVC frame 2+ divergence; α-27/α-28 no-op
bf67900 iter20-26: kernel-side root-cause localization, α-25/α-26 fix Bug 4, partial Bug 5
```
Phase docs (chronological): `phase4_iter21_plan.md`, `phase4_iter22_plan.md`, `phase8_iteration20_close.md` through `phase8_iteration27_close.md`, `phase8_iteration31_close.md`, `CAMPAIGN_SESSION_2026_05_14.md`.
## How to verify the current state
Run on fresnel after `git pull` + `ninja -C build` in `~/src/libva-v4l2-request-fourier`:
```bash
for codec in h264:bbb_1080p30_h264.mp4 hevc:bbb_720p10s_hevc.mp4 vp9:bbb_720p10s_vp9.webm ; do
name="${codec%%:*}"; fixture="${codec#*:}"
env LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
ffmpeg -hide_banner -loglevel error -y \
-hwaccel vaapi -hwaccel_output_format vaapi \
-i "/home/mfritsche/fourier-test/$fixture" \
-vf "hwdownload,format=nv12" -frames:v 10 \
-f rawvideo -pix_fmt nv12 "/tmp/libva_${name}.yuv"
ffmpeg -hide_banner -loglevel error -y \
-i "/home/mfritsche/fourier-test/$fixture" \
-frames:v 10 -f rawvideo -pix_fmt nv12 "/tmp/sw_${name}.yuv"
if cmp -s "/tmp/libva_${name}.yuv" "/tmp/sw_${name}.yuv"; then
echo "$name: PASS"
else
echo "$name: FAIL"
fi
done
```
Expect: 3× PASS.
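When a codec prints FAIL, the first diverging frame narrows things down quickly (frame 1 points at IDR/SPS setup as with Bug 4; frames 2+ point at inter-frame state as with Bug 5). A minimal sketch of a per-frame SHA comparison, assuming both dumps are raw NV12 at the same resolution, as the loop above produces; `nv12_frame_bytes`, `frame_sha` and `frame_diff` are hypothetical helper names:

```bash
# nv12_frame_bytes W H: bytes per NV12 frame (full-res Y plane + half-res interleaved UV)
nv12_frame_bytes() {
  echo $(( $1 * $2 * 3 / 2 ))
}

# frame_sha FILE FRAME_BYTES INDEX: SHA-256 of the 0-based frame INDEX in FILE
frame_sha() {
  dd if="$1" bs="$2" skip="$3" count=1 2>/dev/null | sha256sum | cut -d' ' -f1
}

# frame_diff A B W H: print "frame N: match" or "frame N: DIFF" (1-based) for every
# whole frame in A, compared against the same frame in B
frame_diff() {
  local a=$1 b=$2 fb n i
  fb=$(nv12_frame_bytes "$3" "$4")
  n=$(( $(stat -c %s "$a") / fb ))
  for (( i = 0; i < n; i++ )); do
    if [[ $(frame_sha "$a" "$fb" "$i") == "$(frame_sha "$b" "$fb" "$i")" ]]; then
      echo "frame $(( i + 1 )): match"
    else
      echo "frame $(( i + 1 )): DIFF"
    fi
  done
}
```

Usage, assuming the 720p fixtures decode to 1280x720: `frame_diff /tmp/libva_hevc.yuv /tmp/sw_hevc.yuv 1280 720` (use `1920 1080` for the 1080p H.264 fixture).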
## Root cause summary
**Bug 4 (H.264) + Bug 5 frame 1 (HEVC IDR)**: `rkvdec_s_ctrl` returned -EBUSY when the first SPS set tried to reset `image_fmt` on a busy CAPTURE queue. libva pre-allocates the CAPTURE buffers in CreateContext (iter5b-β design), i.e. before the first per-frame S_EXT_CTRLS, so the queue is already busy when that SPS arrives. Fix: inject a synthetic SPS before cap_pool_init so the reset succeeds while the queue is still empty. Source: `db0b7f9` + `d062fec`.
**Bug 5 frame 2+ (HEVC non-IDR)**: the libva backend set `slice_params->short_term_ref_pic_set_size = 0` (with a stale "VAAPI doesn't expose" comment). rkvdec's `assemble_sw_rps` (rkvdec-hevc.c:386-389) reads this field to compute the long-term-RPS bit offset; when it is zero AND `num_short_term_ref_pic_sets <= 1`, the offset falls back to 0. The HW entropy decoder then consumes slice-header bits as long-term RPS, leaving garbage state for every non-IDR slice. IDR slices are gated out (`!IDR_PIC` flag), so frame 1 is unaffected. Fix: `slice_params->short_term_ref_pic_set_size = picture->st_rps_bits`. The VAAPI doc says `st_rps_bits` IS the slice-header bit count; α-26 had mis-routed it into `decode_params`, whose field has the same name but different semantics. Source: `23eb1bd`.
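To confirm that a given backend checkout carries the α-29 routing, and to see whether the α-26 `decode_params` assignment is still there, a source grep is enough. A sketch, assuming the assignments are spelled as quoted above and live in `.c` files under the backend's `src/` directory (GNU grep for `--include`); `st_rps_routing` and `has_alpha29` are hypothetical helper names:

```bash
# st_rps_routing SRC_DIR: list every assignment to a short_term_ref_pic_set_size field,
# so both the slice_params (α-29) and decode_params (α-26) routings show up
st_rps_routing() {
  grep -rnE --include='*.c' 'short_term_ref_pic_set_size[[:space:]]*=' "$1"
}

# has_alpha29 SRC_DIR: succeed only if the slice-level assignment from 23eb1bd is present
has_alpha29() {
  grep -rqE --include='*.c' \
    'slice_params->short_term_ref_pic_set_size[[:space:]]*=[[:space:]]*picture->st_rps_bits' "$1"
}
```

Usage: `has_alpha29 ~/src/libva-v4l2-request-fourier/src && echo "α-29 present"`.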
## Open items (deferred)
### 1. Kernel substrate cleanup
`linux-fresnel-fourier 7.0-10` has 5+ accumulated `pr_info` diagnostic patches in:
- `drivers/media/v4l2-core/v4l2-ctrls-request.c` (iter21-24 setup/clone/loop traces)
- `drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c` (iter17/20/27/31 SPS/DP/slice dumps)
Before any production work, revert to clean 7.0-N (i.e., apply only the 3 PBP DTS patches + RFC v2 fence series, without diagnostics). Bump pkgrel to 11 and ship clean.
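Finding the diagnostic patches and bumping pkgrel can be partly scripted. A sketch, assuming (hypothetically) that each diagnostic change lives in its own `*.patch` file next to the PKGBUILD; `list_diag_patches` and `bump_pkgrel` are hypothetical helpers, and any match still needs manual review, since a legitimate patch could also add a `pr_info`. If `source=` changes, the checksums need regenerating too (e.g. `updpkgsums` from pacman-contrib). Note that `makepkg -e` (`--noextract`) reuses the existing `$srcdir` without running `prepare()`, so it would rebuild the already-patched tree; a clean build needs a fresh extract such as `makepkg -C`, which is the heavy rebuild listed under user confirmation.

```bash
# list_diag_patches DIR: patch files in DIR whose *added* lines contain a pr_info( call
list_diag_patches() {
  grep -lE '^\+.*pr_info\(' "$1"/*.patch 2>/dev/null || true
}

# bump_pkgrel PKGBUILD NEW_REL: rewrite the pkgrel= line in place
bump_pkgrel() {
  sed -i -E 's/^pkgrel=[0-9]+$/pkgrel='"$2"'/' "$1"
}

# Usage on boltzmann (ask the user first: this is the heavy rebuild):
#   cd ~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier
#   list_diag_patches .      # review, then drop them from source= / prepare()
#   bump_pkgrel PKGBUILD 11
#   makepkg -C --noconfirm   # -C: fresh $srcdir; -e would reuse the patched tree
```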
### 2. MPEG-2 / VP8 untestable through libva on current kernel boot
Libva backend's `find_codec_device` (`src/request.c:427`) selects ONE device for the entire session. On RK3399 with both rkvdec (`/dev/media0`+`/dev/video1`) and hantro (`/dev/media1`+`/dev/video2`+`/dev/video3`), the backend picks rkvdec — which exposes H264/HEVC/VP9 only.
Override with `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1` to force hantro for MPEG-2/VP8 testing. However, that disables H264/HEVC/VP9 for the same session, and the HEVC DECODE_MODE/START_CODE controls that libva sets unconditionally at CreateContext (`context.c:343-379`) fail on hantro with `Unable to set control(s): Invalid argument`. This failure is pre-existing and orthogonal to Bug 4/5.
Fix would require either:
- Libva backend multi-device probe + per-codec dispatch (~200-400 LOC, called out in `phase0_findings_iter7.md`).
- Conditional codec-init controls (skip controls hantro doesn't support).
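Before choosing between these options, it may help to see which codecs each video node actually advertises. A stateless decoder lists its accepted bitstream formats on its OUTPUT queue, which `v4l2-ctl --list-formats-out` (v4l-utils) prints. A sketch that maps the stateless fourccs (`S264`, `S265`, `VP9F`, `VP8F`, `MG2S`) to codec names; the helper names are hypothetical, and the parsing assumes v4l2-ctl's usual `[N]: 'FOURCC' (description)` line format:

```bash
# fourcc_to_codec FOURCC: map a V4L2 stateless bitstream fourcc to a short codec name
fourcc_to_codec() {
  case "$1" in
    S264) echo h264 ;;
    S265) echo hevc ;;
    VP9F) echo vp9 ;;
    VP8F) echo vp8 ;;
    MG2S) echo mpeg2 ;;
    *) return 1 ;;   # not one of the stateless bitstream formats listed above
  esac
}

# codecs_from_formats: read `v4l2-ctl --list-formats-out` text on stdin and
# print one codec name per recognised 'FOURCC' entry
codecs_from_formats() {
  grep -oE "'[A-Z0-9]{4}'" | tr -d "'" | while read -r fcc; do
    fourcc_to_codec "$fcc" || true
  done
}

# probe_decoders: print the codecs advertised on every /dev/video* node
probe_decoders() {
  local dev
  for dev in /dev/video*; do
    [[ -e $dev ]] || continue
    echo "$dev: $(v4l2-ctl -d "$dev" --list-formats-out 2>/dev/null | codecs_from_formats | tr '\n' ' ')"
  done
}
```

If the device mapping described above holds, `probe_decoders` on fresnel should show h264/hevc/vp9 on `/dev/video1` and mpeg2/vp8 on the hantro decoder node, which is the per-codec table a multi-device probe would need.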
### 3. iter29/iter30 env-gated diagnostics in backend
`LIBVA_HEVC_DUMP_SLICE_TAIL=1` and `LIBVA_TS_SCALE=N` are still present in the backend but env-gated (no behavior change unless the variable is set). They could be removed to keep the ship-ready source minimal, or left in as aids for future regression debugging. Low priority either way.
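If the switches stay, a small wrapper around the libva invocation used throughout this doc makes it easy to enable them only when wanted. A sketch; `libva_decode` is a hypothetical helper, and it assumes nothing about where the slice-tail dump is written or what values of `LIBVA_TS_SCALE` mean, since this doc does not say:

```bash
# libva_decode FIXTURE FRAMES OUT [NAME=VALUE ...]
# Decode FRAMES frames of FIXTURE through the v4l2_request VA driver into raw NV12 at OUT.
# Extra NAME=VALUE pairs are exported only inside a subshell for this call, so the
# env-gated diagnostics stay off unless requested. They are exported after the defaults,
# so they can also override them (e.g. LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3).
libva_decode() {
  local fixture=$1 frames=$2 out=$3
  shift 3
  (
    export LIBVA_DRIVER_NAME=v4l2_request \
      LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
      LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
      LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0
    if (( $# )); then export "$@"; fi
    ffmpeg -hide_banner -loglevel error -y \
      -hwaccel vaapi -hwaccel_output_format vaapi \
      -i "/home/mfritsche/fourier-test/$fixture" \
      -vf "hwdownload,format=nv12" -frames:v "$frames" \
      -f rawvideo -pix_fmt nv12 "$out"
  )
}
```

Usage: `libva_decode bbb_720p10s_hevc.mp4 3 /tmp/x.yuv LIBVA_HEVC_DUMP_SLICE_TAIL=1`.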
### 4. α-26 dead-code
`decode_params->short_term_ref_pic_set_size = picture->st_rps_bits` was mis-routed (right value to wrong field). rkvdec doesn't use decode_params's same-named field. Could revert α-26 to set 0 (which is correct per V4L2 spec when SPS-defined RPS bit count is unknown). Cosmetic.
## Memory entries (this session arc)
- **New**: `feedback_va_st_rps_bits_is_slice_field.md` — VAAPI's `picture->st_rps_bits` belongs in `slice_params`, not `decode_params`. Same field name, different semantics.
- **Updated**: `feedback_rkvdec_image_fmt_pre_seed.md` — note Bug 5 remainder is now fixed (not via image_fmt; see new entry).
- **Updated**: `feedback_libva_byte_correct_kernel_bug.md` — FULLY OVERTURNED (both Bug 4 and Bug 5 are libva-side fixes).
## Key commands quickreference
```bash
# Sync backend on fresnel + rebuild
ssh fresnel 'cd ~/src/libva-v4l2-request-fourier && git fetch && git reset --hard origin/master && ninja -C build'
# 3-codec smoke (above script). Each codec ~5s.
# Run libva HEVC + capture rkvdec kernel iter27/31 printk
ssh fresnel 'sudo dmesg -C; env LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
ffmpeg -hide_banner -loglevel error -y -hwaccel vaapi -hwaccel_output_format vaapi \
-i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
-vf "hwdownload,format=nv12" -frames:v 3 -f rawvideo -pix_fmt nv12 /tmp/x.yuv;
sudo dmesg | grep -E "rkvdec_iter2[07]|rkvdec_iter31"'
# kdirect (ffmpeg-v4l2request) reference
ssh fresnel 'ffmpeg -hide_banner -loglevel error -y -hwaccel v4l2request \
-i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
-frames:v 10 -f null -' # decode-only, dmesg has iter27/31 entries
# Reboot fresnel (sddm autologin reseats mfritsche per /etc/sddm.conf.d/20-autologin.conf)
ssh fresnel 'sudo systemctl reboot'; sleep 60
```
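The fixed `sleep 60` after a reboot either wastes time or returns too early on a slow boot. A polling sketch; `wait_for_host` is a hypothetical helper, and a successful probe only shows that sshd answers, not that sddm autologin has already reseated the session:

```bash
# wait_for_host HOST [TRIES] [INTERVAL]: poll until `ssh HOST true` succeeds.
# BatchMode avoids hanging on a password prompt; ConnectTimeout bounds each attempt.
wait_for_host() {
  local host=$1 tries=${2:-40} interval=${3:-5} i
  for (( i = 1; i <= tries; i++ )); do
    if ssh -o ConnectTimeout=5 -o BatchMode=yes "$host" true 2>/dev/null; then
      echo "$host up after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "$host still unreachable after $tries attempts" >&2
  return 1
}
```

Usage: `ssh fresnel 'sudo systemctl reboot'; sleep 10; wait_for_host fresnel`. The short initial sleep keeps the first probe from reaching sshd before it shuts down.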
## What's safe to do without user confirmation
- Read/grep on noether, boltzmann, fresnel.
- Push to gitea (claude-noether identity).
- Reboot fresnel (sddm autologin restores session).
- Build kernel on boltzmann via `makepkg -e --noconfirm` in `~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/`.
- Deploy kernel via `scp` + `sudo pacman -U`.
- Run ffmpeg/cmp tests on fresnel.
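The deploy step above can be wrapped in one helper. A sketch, assuming it runs on a machine that holds the built package(s) and reaches fresnel over ssh as `fresnel`, with passwordless sudo as in the commands above; `deploy_pkgs` is hypothetical, and the package file names are whatever makepkg produced:

```bash
# deploy_pkgs HOST PKG...: copy built packages to HOST:/tmp, then install them with pacman -U
deploy_pkgs() {
  local host=$1
  shift
  local names=() p
  for p in "$@"; do
    names+=("/tmp/$(basename "$p")")   # remote path of each copied package
  done
  scp "$@" "$host:/tmp/" || return 1
  ssh "$host" sudo pacman -U --noconfirm "${names[@]}"
}
```

Usage from the package directory: `deploy_pkgs fresnel linux-fresnel-fourier-*.pkg.tar.*`, then reboot fresnel to boot the new kernel.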
## What needs user confirmation
- Significant rebuild (~25-30 min CPU time on boltzmann, e.g., ffmpeg or fresh kernel build).
- Reverting kernel-substrate diagnostics to ship a clean kernel (mechanical but heavy).
- Architectural change to libva multi-device probe (Item 2) — affects libva backend design.