# Pre-Compact Handoff: Session 2026-05-14 (updated post-iter31)

Use this doc to resume the fresnel-fourier campaign after Claude context compaction.

## TL;DR (read first)

- **Bug 4 (H.264 keyframe-partial): FIXED in iter25 (α-25).** H.264 10-frame output is byte-equal to the SW reference.
- **Bug 5 (HEVC libva all-zero output / frame 2+ divergence): FULLY FIXED.** Frame 1 was fixed by α-25 and frames 2+ by iter31 α-29. HEVC 10-frame output is byte-equal to SW.
- **VP9**: unchanged (HW = SW byte-equal, no regression from α-29).
- **MPEG-2 / VP8**: untestable through libva on the current kernel boot. This is a pre-existing limitation: the libva backend probes profiles on a single device, and auto-select picks rkvdec, which doesn't expose those profiles.

Final score on rkvdec-routed anchors: **3/3 PASS**. The MPEG-2/VP8 path is orthogonal to Bug 4/5.

## Substrate state (where things live)

| Component | Location | Tip |
|---|---|---|
| Campaign repo (this) | `/home/mfritsche/src/fresnel-fourier/` | `c1f9738` on gitea master |
| Libva backend fork (noether) | `/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/` | `23eb1bd` on gitea master |
| Libva backend (fresnel deploy) | `/home/mfritsche/src/libva-v4l2-request-fourier/` | synced to gitea master, rebuilt with `ninja -C build` |
| Kernel source (boltzmann) | `~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/` | pkgrel=10 with iter17/20/21/22/23/27/31 diagnostic printks |
| Kernel running on fresnel | `linux-fresnel-fourier 7.0-10` | diagnostic build; revert to a clean 7.0-N before any production work |
| Test fixtures (fresnel) | `/home/mfritsche/fourier-test/bbb_*.{mp4,ts,webm}` | 5 codecs at 720p10s or 1080p30 |
| Anchors (fresnel) | `/tmp/iter31/{libva,sw}_{h264,hevc,vp9}_10f.yuv` | per-frame SHA matches SW |
| Memory | `~/.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/` | new: `feedback_va_st_rps_bits_is_slice_field.md` |

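When resuming, you can check the Tip column mechanically before relying on anything else in this doc. Below is a minimal sketch. It assumes `git` is available on each host. `check_tip` is a hypothetical helper and is not part of either repo.

```bash
# Hypothetical helper: compare a checkout's HEAD against the short SHA recorded
# in the substrate table. Returns 0 on match, 1 on mismatch, 2 if git fails.
check_tip() {
    local repo="$1" expected="$2" head
    head="$(git -C "$repo" rev-parse HEAD)" || return 2
    # Prefix match, so the 7-character SHAs from the table work as-is
    if [[ "$head" == "$expected"* ]]; then
        echo "OK   $repo @ ${head:0:7}"
    else
        echo "DIFF $repo @ ${head:0:7} (expected $expected)"
        return 1
    fi
}
```

For example, run `check_tip /home/mfritsche/src/fresnel-fourier c1f9738`. Once synced, the fresnel deploy checkout should presumably pass `check_tip ~/src/libva-v4l2-request-fourier 23eb1bd`, the same tip as the noether fork.
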
## Identity for gitea pushes

All `git.reauktion.de` interactions use the `claude-noether` identity (per memory `feedback_gitea_as_claude_noether.md`). Backend remote URL: `ssh://gitea@git.reauktion.de.claude-noether/marfrit/libva-v4l2-request-fourier.git`.

## Backend commits delivered (this campaign, newest first)

```
23eb1bd iter31 α-29: slice_params.short_term_ref_pic_set_size = picture->st_rps_bits ← Bug 5 remainder fix
68dbbdd iter30 DIAG: LIBVA_TS_SCALE env-gated timestamp multiplier (env-gated, no-op default)
0eca3ff iter29 DIAG: env-gated dump of HEVC slice_data trailing 80 bytes
6646b16 Revert iter28b DIAG (universal trim=40 broke IDR)
cd286d9 iter28 α-28: bit_size = (slice_data_size - data_byte_offset) * 8 for HEVC
754be1d iter27 diag: env-gated VAAPI slice fields dump
719d813 iter27 α-27: populate slice_params.num_entry_point_offsets (no-op, rkvdec ignores)
66ef848 iter26 α-26: populate decode_params.short_term_ref_pic_set_size (mis-routed; rkvdec ignores)
d062fec iter25 α-25 fix: add FRAME_MBS_ONLY to H264 dummy SPS
db0b7f9 iter25 α-25: inject synthetic SPS before cap_pool_init to seed image_fmt ← Bug 4 + Bug 5 frame 1 fix
```

## Campaign repo commits delivered (newest first)

```
c1f9738 iter31 α-29 close: HEVC Bug 5 remainder FIXED — 3/3 PASS
422ecaf Add pre-compact handoff doc for session resumption
c15fc6c iter28b DIAG documented: universal trim=40 breaks IDR (reverted)
8b17bf7 Final session summary: H264 + VP9 + HEVC frame 1 byte-equal to SW
02c4192 iter27/28: probe HEVC frame 2+ divergence; α-27/α-28 no-op
bf67900 iter20-26: kernel-side root-cause localization, α-25/α-26 fix Bug 4, partial Bug 5
```

Phase docs (chronological): `phase4_iter21_plan.md`, `phase4_iter22_plan.md`, `phase8_iteration20_close.md` … `phase8_iteration27_close.md`, `phase8_iteration31_close.md`, `CAMPAIGN_SESSION_2026_05_14.md`.

## How to verify the current state

Run on fresnel after `git pull` + `ninja -C build` in `~/src/libva-v4l2-request-fourier`:

```bash
# Decode the first 10 frames of each rkvdec-routed fixture twice (libva HW path
# vs. ffmpeg's SW decoder) and byte-compare the raw NV12 output.
for codec in h264:bbb_1080p30_h264.mp4 hevc:bbb_720p10s_hevc.mp4 vp9:bbb_720p10s_vp9.webm ; do
    name="${codec%%:*}"; fixture="${codec#*:}"

    # HW path: libva v4l2_request backend, pinned to rkvdec (/dev/video1 + /dev/media0)
    env LIBVA_DRIVER_NAME=v4l2_request \
        LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
        LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
        LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
        ffmpeg -hide_banner -loglevel error -y \
            -hwaccel vaapi -hwaccel_output_format vaapi \
            -i "/home/mfritsche/fourier-test/$fixture" \
            -vf "hwdownload,format=nv12" -frames:v 10 \
            -f rawvideo -pix_fmt nv12 "/tmp/libva_${name}.yuv"

    # SW reference: same fixture, frame count and pixel format, no hwaccel
    ffmpeg -hide_banner -loglevel error -y \
        -i "/home/mfritsche/fourier-test/$fixture" \
        -frames:v 10 -f rawvideo -pix_fmt nv12 "/tmp/sw_${name}.yuv"

    if cmp -s "/tmp/libva_${name}.yuv" "/tmp/sw_${name}.yuv"; then
        echo "$name: PASS"
    else
        echo "$name: FAIL"
    fi
done
```

Expect: 3× PASS.

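`cmp` only reports whether the 10-frame dump differs as a whole. When a codec FAILs, you also want to know which frame diverges first, because that distinction is what separated Bug 5 frame 1 from frames 2+. Below is a minimal sketch. It assumes GNU coreutils on fresnel and 8-bit NV12 output as produced by the script. `nv12_frame_diff` is a hypothetical helper. An NV12 frame holds `w*h` luma bytes plus `w*h/2` interleaved chroma bytes, so each frame is `w*h*3/2` bytes: 1,382,400 at 1280x720 and 3,110,400 at 1920x1080.

```bash
# Hypothetical helper: compare two raw NV12 dumps frame by frame (SHA-256 per
# frame) and report the first divergent frame. Frames are numbered from 1.
# Returns 0 if every frame matches and the file sizes are equal, 1 otherwise.
nv12_frame_diff() {
    local a="$1" b="$2" w="$3" h="$4"
    local fsize=$(( w * h * 3 / 2 ))   # w*h luma + w*h/2 interleaved CbCr
    local asize bsize nframes i ha hb first=0
    asize=$(wc -c < "$a") || return 2
    bsize=$(wc -c < "$b") || return 2
    nframes=$(( (asize < bsize ? asize : bsize) / fsize ))
    for (( i = 0; i < nframes; i++ )); do
        # dd's skip= counts in bs-sized blocks, so this extracts exactly frame i
        ha=$(dd if="$a" bs="$fsize" skip="$i" count=1 2>/dev/null | sha256sum | cut -d' ' -f1)
        hb=$(dd if="$b" bs="$fsize" skip="$i" count=1 2>/dev/null | sha256sum | cut -d' ' -f1)
        if [[ "$ha" == "$hb" ]]; then
            echo "frame $(( i + 1 )): match"
        else
            echo "frame $(( i + 1 )): DIFF"
            (( first == 0 )) && first=$(( i + 1 ))
        fi
    done
    (( asize != bsize )) && echo "size mismatch: $asize vs $bsize bytes"
    if (( first > 0 )); then
        echo "first divergent frame: $first"
        return 1
    fi
    (( asize == bsize ))
}
```

Example: `nv12_frame_diff /tmp/libva_hevc.yuv /tmp/sw_hevc.yuv 1280 720`. Use `1920 1080` for the H.264 fixture.
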
## Root cause summary

**Bug 4 (H.264) + Bug 5 frame 1 (HEVC IDR)**: `rkvdec_s_ctrl` returned `-EBUSY` when the first SPS control tried to reset `image_fmt` while the CAPTURE queue was busy. Under the iter5b-β design, libva pre-allocates CAPTURE buffers in CreateContext, before the first per-frame `S_EXT_CTRLS`, so the queue is already busy when the real SPS arrives. Fix: inject a synthetic SPS before `cap_pool_init`, so the reset succeeds while the queue is still empty (`d062fec` adds `FRAME_MBS_ONLY` to the H.264 dummy SPS). Source: `db0b7f9` + `d062fec`.

**Bug 5 frame 2+ (HEVC non-IDR)**: the libva backend set `slice_params->short_term_ref_pic_set_size = 0`, with a stale "VAAPI doesn't expose" comment. rkvdec's `assemble_sw_rps` (`rkvdec-hevc.c:386-389`) reads this field to compute the bit offset of the long-term RPS. When the field is zero and `num_short_term_ref_pic_sets <= 1`, the offset falls back to 0. The HW entropy decoder then consumes slice-header bits as long-term-RPS data, which leaves it in a garbage state for every non-IDR slice. IDR slices are gated out by the `!IDR_PIC` check, which is why frame 1 was unaffected. Fix: `slice_params->short_term_ref_pic_set_size = picture->st_rps_bits`. Per the VAAPI docs, `st_rps_bits` is exactly this slice-header bit count. α-26 had mis-routed it into `decode_params`, whose field has the same name but different semantics. Source: `23eb1bd`.

## Open items (deferred)

### 1. Kernel substrate cleanup

`linux-fresnel-fourier 7.0-10` carries 5+ accumulated `pr_info` diagnostic patches in:

- `drivers/media/v4l2-core/v4l2-ctrls-request.c` (iter21-24 setup/clone/loop traces)
- `drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c` (iter17/20/27/31 SPS/DP/slice dumps)

Before any production work, revert to a clean 7.0-N: apply only the 3 PBP DTS patches plus the RFC v2 fence series, without the diagnostics. Bump pkgrel to 11 and ship the clean build.

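Before bumping pkgrel, confirm that no diagnostic printks survived the revert. Below is a rough sketch with two hypothetical helpers. It assumes each diagnostic `pr_info` carries an `iterNN` tag on the same line, as the `rkvdec_iter20/27/31` lines grepped from dmesg suggest. It also assumes the PKGBUILD has a single plain integer `pkgrel=` line.

```bash
# Hypothetical helper 1: list diagnostic printks left in a kernel tree or patch
# directory. Heuristic: assumes each diagnostic pr_info() has an "iterNN" tag
# on the same line. Prints file:line:text; returns non-zero when nothing matches.
list_diag_printks() {
    grep -rnE --include='*.c' --include='*.patch' 'pr_info\(.*iter[0-9]+' "$1"
}

# Hypothetical helper 2: increment an integer pkgrel in a PKGBUILD in place
# (e.g. 10 -> 11). Uses GNU sed -i.
bump_pkgrel() {
    local pkgbuild="$1" rel
    rel=$(sed -n 's/^pkgrel=\([0-9][0-9]*\)$/\1/p' "$pkgbuild")
    [[ "$rel" =~ ^[0-9]+$ ]] || { echo "no single integer pkgrel in $pkgbuild" >&2; return 1; }
    sed -i "s/^pkgrel=$rel\$/pkgrel=$(( rel + 1 ))/" "$pkgbuild"
    echo "pkgrel $rel -> $(( rel + 1 ))"
}
```

Point `list_diag_printks` at the extracted kernel tree or at the patch directory. Once the diagnostics are gone it prints nothing and returns non-zero. Then run `bump_pkgrel PKGBUILD` in the package directory. Because the grep is a heuristic, a printk whose tag sits on a continuation line would slip through.
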
### 2. MPEG-2 / VP8 untestable through libva on the current kernel boot

The libva backend's `find_codec_device` (`src/request.c:427`) selects ONE device for the entire session. RK3399 has both rkvdec (`/dev/media0` + `/dev/video1`) and hantro (`/dev/media1` + `/dev/video2` + `/dev/video3`). The backend picks rkvdec, which exposes only H.264/HEVC/VP9.

You can force hantro for MPEG-2/VP8 testing with `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1`. This override has two problems:

- It disables H.264/HEVC/VP9 for that session.
- libva sets the HEVC DECODE_MODE/START_CODE controls unconditionally at CreateContext (`context.c:343-379`). On hantro these fail with `Unable to set control(s): Invalid argument`.

The control failure is pre-existing and orthogonal to Bug 4/5.

A fix would require either:

- a libva backend multi-device probe with per-codec dispatch (~200-400 LOC, called out in `phase0_findings_iter7.md`), or
- conditional codec-init controls (skip the controls hantro doesn't support).

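Until the backend gains multi-device probing, one script-level stand-in for per-codec dispatch is to pick the device pair before launching each decode. Below is a minimal sketch. `libva_env_for_codec` is hypothetical. The node numbers are the ones listed above for the current boot and may change across kernels. The function only routes the device: on hantro, the unconditional HEVC controls at CreateContext still fail as described above.

```bash
# Hypothetical wrapper: choose the v4l2_request device pair per codec, since the
# backend itself binds one device per session. Node numbers are assumptions
# taken from the current fresnel boot (rkvdec: video1/media0, hantro: video3/media1).
libva_env_for_codec() {
    case "$1" in
        h264|hevc|vp9)
            echo "LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0" ;;
        mpeg2|vp8)
            echo "LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1" ;;
        *)
            echo "unknown codec: $1" >&2
            return 1 ;;
    esac
}
```

Example: `env LIBVA_DRIVER_NAME=v4l2_request $(libva_env_for_codec vp8) ffmpeg -hwaccel vaapi -i <fixture> ...`. The `$(...)` is deliberately left unquoted, so the two assignments split into separate arguments to `env`.
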
### 3. iter29/iter30 env-gated diagnostics in the backend

`LIBVA_HEVC_DUMP_SLICE_TAIL=1` and `LIBVA_TS_SCALE=N` are still present in the backend. They are env-gated, so behavior is unchanged unless the variable is set. You could remove them to keep the ship-ready source minimal, or keep them for future regression debugging. Low priority either way.

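If you take the cleanup route, first audit every env knob the backend reads. Below is a minimal sketch. It assumes the knobs are read with literal `getenv("LIBVA_...")` calls in `src/*.c`. `list_env_knobs` is hypothetical. Its output would presumably also list the legitimate `LIBVA_V4L2_REQUEST_*_PATH` knobs, which should stay.

```bash
# Hypothetical helper: list every literal getenv("LIBVA_...") call in the backend
# sources as file:line:match, so env-gated knobs can be audited before cleanup.
list_env_knobs() {
    grep -rnoE --include='*.c' 'getenv\("LIBVA_[A-Z0-9_]+"\)' "$1/src" | sort -u
}
```

Example: `list_env_knobs ~/src/libva-v4l2-request-fourier`.
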
### 4. α-26 dead code

α-26 set `decode_params->short_term_ref_pic_set_size = picture->st_rps_bits`, which puts the right value in the wrong field. rkvdec never reads the same-named `decode_params` field, so this is dead code. α-26 could be reverted to set 0, which is correct per the V4L2 spec when the SPS-defined RPS bit count is unknown. Cosmetic.

## Memory entries (this session arc)

- **New**: `feedback_va_st_rps_bits_is_slice_field.md`: VAAPI's `picture->st_rps_bits` belongs in `slice_params`, not `decode_params`. Same field name, different semantics.
- **Updated**: `feedback_rkvdec_image_fmt_pre_seed.md`: notes that the Bug 5 remainder is now fixed (not via `image_fmt`; see the new entry).
- **Updated**: `feedback_libva_byte_correct_kernel_bug.md`: FULLY OVERTURNED (both Bug 4 and Bug 5 turned out to be libva-side fixes).

## Key commands quickreference

```bash
# Sync backend on fresnel + rebuild
ssh fresnel 'cd ~/src/libva-v4l2-request-fourier && git fetch && git reset --hard origin/master && ninja -C build'

# 3-codec smoke (above script). Each codec ~5s.

# Run libva HEVC + capture rkvdec kernel iter27/31 printk
ssh fresnel 'sudo dmesg -C; env LIBVA_DRIVER_NAME=v4l2_request \
    LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
    LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
    LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
    ffmpeg -hide_banner -loglevel error -y -hwaccel vaapi -hwaccel_output_format vaapi \
    -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
    -vf "hwdownload,format=nv12" -frames:v 3 -f rawvideo -pix_fmt nv12 /tmp/x.yuv;
    sudo dmesg | grep -E "rkvdec_iter2[07]|rkvdec_iter31"'

# kdirect (ffmpeg-v4l2request) reference
ssh fresnel 'ffmpeg -hide_banner -loglevel error -y -hwaccel v4l2request \
    -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
    -frames:v 10 -f null -' # decode-only, dmesg has iter27/31 entries

# Reboot fresnel (sddm autologin reseats mfritsche per /etc/sddm.conf.d/20-autologin.conf)
ssh fresnel 'sudo systemctl reboot'; sleep 60
```

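A fixed `sleep 60` after rebooting may be too short after a kernel deploy and longer than needed otherwise. Below is a sketch of polling instead. It assumes key-based, non-interactive SSH to `fresnel`. `retry_until` is a hypothetical helper. Note that sshd answering does not by itself prove the sddm autologin has finished.

```bash
# Hypothetical helper: run a command up to <tries> times, sleeping <interval>
# seconds between attempts. Returns 0 on the first success, 1 if all fail.
retry_until() {
    local tries="$1" interval="$2" i
    shift 2
    for (( i = 1; i <= tries; i++ )); do
        "$@" && return 0
        (( i < tries )) && sleep "$interval"
    done
    return 1
}

# Reboot, give the host a moment to actually go down, then poll sshd for up
# to ~3 minutes (18 tries x 10 s, plus connect timeouts):
# ssh fresnel 'sudo systemctl reboot'; sleep 10
# retry_until 18 10 ssh -o ConnectTimeout=5 -o BatchMode=yes fresnel true
```
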
## What's safe to do without user confirmation

- Read/grep on noether, boltzmann, fresnel.
- Push to gitea (claude-noether identity).
- Reboot fresnel (sddm autologin restores the session).
- Incremental kernel builds on boltzmann via `makepkg -e --noconfirm` in `~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/` (`-e` reuses the already-extracted `src/` tree instead of re-extracting sources).
- Deploy the kernel via `scp` + `sudo pacman -U`.
- Run ffmpeg/cmp tests on fresnel.

## What needs user confirmation

- Significant rebuilds (~25-30 min CPU time on boltzmann, e.g. ffmpeg or a fresh kernel build).
- Reverting the kernel-substrate diagnostics to ship a clean kernel (mechanical but heavy).
- Architectural change to a libva multi-device probe (Item 2); this affects the libva backend design.