fresnel-fourier/PRE_COMPACT_HANDOFF.md
marfrit 7d8d720631 iter39 Phase 7 CLOSE: vainfo + iter38 baseline PASS; Hi10P kernel/HW gap on RK3399
Phase 7 verification on fresnel (kernel 7.0-14 / linux-fresnel-fourier).

C1 vainfo enumeration: PASS — VAProfileH264High10 + VAProfileHEVCMain10
both listed; iter38 baseline 10 profiles intact at 12 total.

C5 iter38 5/5 baseline preserved: PASS — H.264 / HEVC / VP9 / VP8 /
MPEG-2 all libva == kdirect bit-exact, no regression from iter39
backend changes.

C2 Hi10P bit-exact vs kdirect: N/A — kdirect ALSO fails with EINVAL
(0 bytes output). The kernel ctrl table advertises Hi10P + NV15
CAPTURE but RK3399 HW doesn't actually decode 10-bit H264. Verified:
S_FMT(CAPTURE, NV15) succeeds; decode submits cleanly; CAPTURE buffer
returns all-zero. xxd 64 bytes of 0x00. SW reference has 222 unique
luma bytes.

C3 Main10 bit-exact vs kdirect: untested — system x265 is 8-bit-only
build, no kvazaar/x265-hbd in Arch repos, no Main10 sample downloaded
successfully. Same kernel-vs-HW caveat may apply.

Two backend fixes landed during Phase 7 (both pushed to gitea master):

  a13215d — skip pre-S_FMT NV15 CAPTURE format probe (rkvdec only
            advertises NV15 AFTER S_FMT(OUTPUT) + S_EXT_CTRLS(SPS))
  63fed87 — advertise P010 unconditionally in QueryImageFormats
            (ffmpeg-vaapi queries before CreateContext fires; gating
            on is_10bit hid the format from early consumers)

Without these the 10-bit decode pipeline can't even start. With them
it reaches the kernel cleanly.

Memory entry filed:
  feedback_rk3399_h264_hi10p_advertised_not_functional.md
  (kernel ctrl table necessary but NOT sufficient — always cross-check
   with kdirect before treating a profile as truly HW-supported)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 16:40:57 +00:00

# Pre-Compact Handoff — Session 2026-05-17 (iter39 sub-profile work landed, pending fresnel test)
Use this doc to resume the fresnel-fourier campaign after Claude context compaction. **Iter38 close still holds (5/5 PASS, single libva session). Iter39 sub-profile work (H264 Hi10P + HEVC Main10) was committed at backend `662f887`; Phase 7 validation on fresnel has since closed with a kernel/HW caveat at backend `63fed87` (see the sub-profile row in the TL;DR).**
## TL;DR
| Bug / Item | Status | Fix iter |
|---|---|---|
| Bug 4 (H.264 keyframe-partial) | **FIXED** | iter25 α-25 (rkvdec image_fmt pre-seed via synthetic SPS at CreateContext) |
| Bug 5 (HEVC libva all-zero CAPTURE) | **FIXED** | iter25 α-25 (frame 1) + iter31 α-29 (frames 2+: slice_params.short_term_ref_pic_set_size from VAAPI st_rps_bits) |
| VP8 wrong output through libva | **FIXED** | iter33 α-30 (prepend 10/3 byte VP8 uncompressed header to OUTPUT — ffmpeg-vaapi strips it) |
| MPEG-2 HW differs from SW | **NOT A BUG** | hantro IDCT precision (≤3 LSB / pixel, SSIM > 0.9999); libva == kdirect bit-exact |
| Kernel diagnostic printks | **CLEANED** | iter32 (7.0-11) + iter34 (7.0-14) |
| Env-gated DIAG probes (iter29/30/33/35) | **CLEANED** | iter36 (-131 / +7 LOC) |
| α-26 mis-routed cosmetic | **REVERTED** | iter37 (1-line; rkvdec never read that field) |
| Libva multi-device probe | **DONE** | iter38 (single session serves all 5 codecs; no env override needed) |
| H264 Hi10P + HEVC Main10 sub-profile | **CLOSED 2026-05-17 with kernel/HW caveat** | iter39 α-31 (backend `63fed87`): vainfo enumeration ✓, iter38 5/5 baseline preserved ✓, Hi10P decode path reaches kernel cleanly but RK3399 HW produces all-zero CAPTURE (kdirect fails equivalently — kernel-side gap, not backend). Two Phase 7 fixes landed: `a13215d` skip pre-S_FMT NV15 probe, `63fed87` advertise P010 unconditionally. Main10 untested (no fixture). See `phase7_iter39_close.md` + memory [[feedback_rk3399_h264_hi10p_advertised_not_functional]]. |

| Codec | libva 10-frame SHA-256 (first 16 hex) | kdirect 10-frame SHA-256 | SW 10-frame SHA-256 | libva == kdirect | libva == SW |
|---|---|---|---|---|---|
| H.264 | dd4f5f2d552c07bc | same | same | ✓ | ✓ |
| HEVC | 108f925bb6cbb6c9 | same | same | ✓ | ✓ |
| VP9 | cf35908ae0f9ab60 | same | same | ✓ | ✓ |
| VP8 | d3231e5b6c0ee10b | same | same | ✓ | ✓ |
| MPEG-2| 95c5905890c937d4 | same | 933b744134e47ba4 | ✓ | ~ (≤3 LSB IDCT precision) |
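
**5/5 PASS** on the libva-vs-kdirect bit-exact correctness contract. 4/5 are also bit-equal to the SW decode.

`vainfo` with NO env override enumerates the union of profiles from rkvdec + hantro. The listing below is the iter38 baseline of 10 profiles; the iter39 Phase 7 check additionally lists `VAProfileH264High10` and `VAProfileHEVCMain10`, for 12 total: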
```
v4l2-request: auto-selected codec device: /dev/video3 + /dev/media1
v4l2-request: iter38: also opened hantro-vpu decoder at /dev/video2 + /dev/media0
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264High : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264MultiviewHigh : VAEntrypointVLD
VAProfileH264StereoHigh : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointVLD
VAProfileVP8Version0_3 : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
```
## Substrate state
| Component | Location | Tip |
|---|---|---|
| Campaign repo (this) | `/home/mfritsche/src/fresnel-fourier/` | `ba4b6fd` on gitea master |
| Libva backend fork (noether) | `/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/` | `662f887` on gitea master (iter39 α-31; iter38b is `7ac934e`) |
| Libva backend (fresnel deploy) | `/home/mfritsche/src/libva-v4l2-request-fourier/` | sync to gitea master, `ninja -C build` |
| Kernel source (boltzmann) | `~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/` | pkgrel=14 clean |
| Kernel running on fresnel | `linux-fresnel-fourier 7.0-14` | clean shipping kernel, no diagnostic printks |
| Test fixtures (fresnel) | `/home/mfritsche/fourier-test/bbb_*.{mp4,ts,webm}` | 5 codecs at 720p10s or 1080p30 |
| Memory | `~/.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/` | see entries below |
## Identity for gitea pushes
All `git.reauktion.de` interactions use the `claude-noether` identity (per memory `feedback_gitea_as_claude_noether.md`). Backend remote URL: `ssh://gitea@git.reauktion.de.claude-noether/marfrit/libva-v4l2-request-fourier.git`.
## Device map on 7.0-14
`/dev/video*` and `/dev/media*` numbers SHIFT between kernel boots based on probe order. On the current 7.0-14 boot:
| Driver | /dev/videoN | /dev/mediaN |
|---|---|---|
| rockchip-rga | video0 | n/a |
| rk3399-vpu-enc | video1 | (shared) |
| rk3399-vpu-dec (hantro) | **video2** | **media0** |
| rkvdec | **video3** | **media1** |

Run `v4l2-ctl --info` and `media-ctl -p` if the mapping is uncertain on a fresh boot. Iter38 makes this irrelevant for typical use, because libva auto-probes both decoders.
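
If the mapping ever needs re-deriving in a script, a sketch along these lines classifies a device from its card-type string. `decoder_role` is a hypothetical helper, and the match patterns assume the card type carries the driver names from the table above; adjust them if `v4l2-ctl --info` reports something different on a given kernel.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a card-type string to a decoder role.
# Assumption: the string contains one of the driver names from the device map.
decoder_role() {
    case "$1" in
        *rkvdec*)  echo "rkvdec" ;;
        *vpu-dec*) echo "hantro" ;;
        *)         echo "other"  ;;
    esac
}

# Usage on fresnel (needs v4l2-ctl and the real devices):
#   for v in /dev/video*; do
#       card=$(v4l2-ctl -d "$v" --info 2>/dev/null | grep "Card type" | head -1)
#       echo "$v $(decoder_role "$card")"
#   done
```

The encoder (`rk3399-vpu-enc`) and RGA nodes deliberately fall through to `other`, since only the two decoders matter for the libva backend.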
## Backend commits delivered (chronological, this campaign day)
```
7ac934e iter38b: bounds check uses MAX_PROFILES (11), not MAX_CONFIG_ATTRIBUTES (10)
c56a77b iter38: multi-device probe — single libva session serves all 5 codecs ← architectural close
25d3e5f iter37: revert α-26 — decode_params.short_term_ref_pic_set_size back to 0
7db15a5 iter36: remove env-gated DIAG probes (iter29/30/33/35)
48fd028 iter35 DIAG: env-gated dump of v4l2_ctrl_mpeg2_* contents (removed iter36)
7e0848d iter33 α-30: prepend VP8 uncompressed frame header to OUTPUT buffer ← VP8 fix
bf3e3d8 iter33: extend VP8 DIAG to dump VAAPI probability struct directly (removed iter36)
4b3c21b iter33 DIAG: env-gated dump of v4l2_ctrl_vp8_frame contents (removed iter36)
23eb1bd iter31 α-29: slice_params.short_term_ref_pic_set_size = picture->st_rps_bits ← HEVC fix
68dbbdd iter30 DIAG: LIBVA_TS_SCALE env-gated timestamp multiplier (removed iter36)
0eca3ff iter29 DIAG: env-gated dump of HEVC slice_data trailing 80 bytes (removed iter36)
6646b16 Revert iter28b DIAG: trim=40 universal-trim broke IDR frame 1
cd286d9 iter28 α-28: bit_size = (slice_data_size - data_byte_offset) * 8 for HEVC
754be1d iter27 diag: env-gated VAAPI slice fields dump
719d813 iter27 α-27: populate slice_params.num_entry_point_offsets (no-op)
66ef848 iter26 α-26: decode_params.short_term_ref_pic_set_size from VAAPI (reverted iter37)
d062fec iter25 α-25 fix: FRAME_MBS_ONLY flag for H264 dummy SPS
db0b7f9 iter25 α-25: inject synthetic SPS before cap_pool_init to seed image_fmt ← H264+HEVC frame 1 fix
```
Load-bearing commits: `db0b7f9 + d062fec` (α-25), `23eb1bd` (α-29), `7e0848d` (α-30), `c56a77b + 7ac934e` (iter38 multi-device).
## Campaign repo commits delivered (today's arc)
```
ba4b6fd iter38 close: multi-device probe — 5/5 codecs in one libva session
7e3eadf iter36 close: env-gated DIAG removed, 5/5 PASS retained
7c06c51 iter35 close: MPEG-2 verified libva-correct; HW IDCT precision intrinsic
70ddbd6 iter34 close: kernel 7.0-14 CLEAN ship — 5/5 codecs PASS
cd2d077 iter33: MPEG-2 closed (libva==kdirect bit-exact) — 5/5 codecs PASS
51eee19 iter33 α-30 close: VP8 FIXED — 4/5 codecs PASS
acacf3d iter32 close: kernel substrate cleanup landed → 7.0-11 SHIPPING
85cc178 Update campaign session doc: full-day arc closes at 3/3 PASS
fde8a25 Update handoff doc: HEVC Bug 5 fully fixed (3/3 PASS)
c1f9738 iter31 α-29 close: HEVC Bug 5 remainder FIXED — 3/3 PASS
422ecaf Add pre-compact handoff doc for session resumption
… earlier in day: c15fc6c, 8b17bf7, 02c4192, bf67900 (iter20-28 chain)
```
## How to verify the current state
Run on fresnel (post-7.0-14 boot, no env override needed):
```bash
for codec in h264:bbb_1080p30_h264.mp4 hevc:bbb_720p10s_hevc.mp4 vp9:bbb_720p10s_vp9.webm vp8:bbb_720p10s_vp8.webm mpeg2:bbb_720p10s_mpeg2.ts; do
  name="${codec%%:*}"; fixture="${codec#*:}"
  # L: decode the first 10 frames through the libva backend
  env LIBVA_DRIVER_NAME=v4l2_request \
      LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
    ffmpeg -hide_banner -loglevel error -y \
      -hwaccel vaapi -hwaccel_output_format vaapi \
      -i "/home/mfritsche/fourier-test/$fixture" \
      -vf "hwdownload,format=nv12" -frames:v 10 \
      -f rawvideo -pix_fmt nv12 "/tmp/L_${name}.yuv"
  # K: same frames through the kdirect (ffmpeg v4l2request) reference path
  ffmpeg -hide_banner -loglevel error -y -hwaccel v4l2request -hwaccel_output_format drm_prime \
    -i "/home/mfritsche/fourier-test/$fixture" -vf "hwdownload,format=nv12" \
    -frames:v 10 -f rawvideo -pix_fmt nv12 "/tmp/K_${name}.yuv"
  # PASS when the truncated SHA-256 of both outputs matches
  L=$(sha256sum "/tmp/L_${name}.yuv" | cut -c1-16)
  K=$(sha256sum "/tmp/K_${name}.yuv" | cut -c1-16)
  [ "$L" = "$K" ] && echo "$name: PASS" || echo "$name: FAIL"
done
```
Expect: 5× PASS.
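
When a codec reports FAIL, it helps to know which frame diverges first. A minimal sketch, assuming raw NV12 output as written by the loop above (frame size = width × height × 3 / 2). `nv12_frame_bytes` and `first_diff_frame` are hypothetical helpers, not part of the test rig.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for triaging a FAIL from the 5-codec loop.

# Bytes per raw NV12 frame: full-size luma plane plus half-size interleaved chroma.
nv12_frame_bytes() {
    local width="$1" height="$2"
    echo $(( width * height * 3 / 2 ))
}

# Print the 0-based index of the first frame that differs between two raw
# files, or "none" if no byte differs within their common length.
first_diff_frame() {
    local a="$1" b="$2" frame_bytes="$3" off
    # cmp -l lists differing bytes as "<1-based offset> <octal> <octal>"
    off=$(cmp -l "$a" "$b" 2>/dev/null | head -n 1 | awk '{print $1}')
    if [ -z "$off" ]; then
        echo "none"
    else
        echo $(( (off - 1) / frame_bytes ))
    fi
}
```

Example, assuming the 720p fixtures decode to 1280x720: `first_diff_frame /tmp/L_vp8.yuv /tmp/K_vp8.yuv "$(nv12_frame_bytes 1280 720)"`.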
## Root cause summary
**Bug 4 + Bug 5 frame 1 (iter25 α-25)**: `rkvdec_s_ctrl` returns -EBUSY when the first SPS triggers an image_fmt reset on a busy CAPTURE queue. libva had pre-allocated 24 CAPTURE buffers at CreateContext (iter5b-β design) before the per-frame S_EXT_CTRLS. Fix: inject a synthetic SPS at CreateContext, before cap_pool_init, while CAPTURE is still empty.

**Bug 5 frame 2+ (iter31 α-29)**: the libva backend set `slice_params->short_term_ref_pic_set_size = 0` (under a stale "VAAPI doesn't expose" comment). rkvdec's `assemble_sw_rps` (rkvdec-hevc.c:386-389) reads this field; when it is zero with `num_short_term_ref_pic_sets <= 1`, it falls back to 0, so the entropy decoder consumes slice-header bits as long-term RPS and produces garbage for every non-IDR slice. The read is gated by `!IDR_PIC`, so the IDR frame 1 was unaffected. Fix: `slice_params->short_term_ref_pic_set_size = picture->st_rps_bits` (VAAPI's field IS the slice-header bit count, per the `va_dec_hevc.h` doc). α-26 had mis-routed this value into `decode_params` (same field name in V4L2, different semantics: SPS-side bit count); that was reverted in iter37.

**VP8 (iter33 α-30)**: ffmpeg-vaapi strips the VP8 uncompressed frame header (3 bytes for interframes, 10 bytes for keyframes) before submitting via VAAPI; ffmpeg-v4l2request keeps it. Hantro hard-codes `first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3` and uses it for both `mb_offset_bits` and `dct_part_offset`. Without the prepended header in libva's OUTPUT, hantro's offset arithmetic lands inside the compressed bitstream and the entropy decoder produces garbage. Fix: in `codec_store_buffer`, prepend `header_size` zero bytes to OUTPUT for the VP8 profile (hantro skips these bytes for actual parsing and uses the ctrl-struct values instead).
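
The shape of the α-30 fix can be reproduced on a file for offline experiments. This is only a shell sketch of the idea: the real change is C code in `codec_store_buffer`, and `vp8_header_size` / `vp8_prepend_header` are hypothetical names. The keyframe flag is passed in by the caller; how the backend derives it is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring the alpha-30 idea outside the backend.

# Size of the VP8 uncompressed frame header: 10 bytes on keyframes, 3 otherwise.
vp8_header_size() {
    local is_keyframe="$1"   # 1 = keyframe, 0 = interframe
    if [ "$is_keyframe" -eq 1 ]; then echo 10; else echo 3; fi
}

# Write <header_size zero bytes> followed by the stripped payload to "$out",
# so offsets computed from the header size land on the right payload byte.
vp8_prepend_header() {
    local payload="$1" out="$2" is_keyframe="$3" n
    n=$(vp8_header_size "$is_keyframe")
    { head -c "$n" /dev/zero; cat "$payload"; } > "$out"
}
```

Zero bytes are enough here because, per the root cause above, hantro only uses the header length for offset arithmetic and takes the actual header values from the control struct.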

**Multi-device probe (iter38)**: VA_DRIVER_INIT opens BOTH the rkvdec and hantro fds. `RequestCreateConfig` retargets `driver_data->{video,media}_fd` to the right device per profile (tearing down pools on switch). `RequestQueryConfigProfiles` unions across all open fds. iter38b fixed a latent off-by-one: the bounds checks used `MAX_CONFIG_ATTRIBUTES` (10) but the profile array is sized by `MAX_PROFILES` (11). Pre-iter38 never returned more than 9 profiles, so the bug never bit.
## Open items (low priority, optional polish)
1. **Multi-context simultaneously** — current design supports only one decode context at a time across devices (device switch tears down pools). Could be expanded to per-context pools to support simultaneous mixed-codec decode. Not requested.
2. ~~**Sub-profile support**~~ *CLOSED 2026-05-17 with HW caveat (backend `63fed87`)*. H264 Hi10P + HEVC Main10 are wired through the backend with NV15→P010 userspace unpack. VP9 Profile 2 is explicitly excluded (the RK3399 rkvdec kernel ctrl caps at PROFILE_0). PRIME-side P010 emission is deferred. Phase 7 verified vainfo enumeration and that the iter38 5/5 baseline is preserved. Actual Hi10P decode produces all-zero output on RK3399 HW; kdirect fails equivalently, so this is a kernel-side gap. Memory entry [[feedback_rk3399_h264_hi10p_advertised_not_functional]]. Main10 untested (no fixture). Full details: `phase7_iter39_close.md`.
## Resumption sequence — iter39 Phase 7 (when fresnel is up)
```bash
# 1. Sync + build backend on fresnel
ssh fresnel 'cd ~/src/libva-v4l2-request-fourier && \
git fetch && git reset --hard origin/master && \
ninja -C build && \
sudo install -m644 build/src/v4l2_request_drv_video.so /usr/lib/dri/'
# 2. Push test rig + run
scp ~/src/fresnel-fourier/phase7_iter39_test_rig.sh fresnel:/tmp/
ssh fresnel 'bash /tmp/phase7_iter39_test_rig.sh'
# Expected pass criteria:
# 1. vainfo lists VAProfileH264High10 + VAProfileHEVCMain10
# 2. libva.P010 SHA == kdirect.P010 SHA for Hi10P and Main10 fixtures
# (both paths use -vf hwdownload,format=p010le to normalize NV15)
# 3. SSIM_Y vs libavcodec SW (yuv420p10le) >= 0.999
# 4. iter38 5/5 PASS baseline still holds on H264/HEVC/VP9/VP8/MPEG-2
```
## Iter39 internals — pre-Phase 7 verification done
- **Self-test** of `nv15_unpack_plane_to_p010` (`tests/test_nv15_unpack.c` in backend): zero / all-max / 8 known vectors / remainder widths {1,2,3,7} / multi-row stride-padding / chroma-shape — ALL PASS on noether x86_64.
- **Compile-test**: aarch64 native build on boltzmann clean (gcc 15.2.1 / libva 1.23.0 / libdrm 2.4.133), .so produced, 0 new warnings.
- **Self-review of commit 662f887** vs Phase 5 amendments: APPROVED. All 3 mandatory amendments + MAX_PROFILES bump + guard updates + NV15-stride source confirmed present.
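
The packing exercised by the self-test can be illustrated on a single 5-byte group. This is a sketch, not the backend's `nv15_unpack_plane_to_p010`: it assumes the NV15 layout from the V4L2 pixel-format documentation (four 10-bit samples packed little-endian into 5 bytes) and P010's convention of keeping each sample in the upper 10 bits of a 16-bit word. `nv15_group_to_p010` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of one NV15 group -> four P010 words.
# Arguments: the 5 packed bytes as integers (0..255), lowest address first.
nv15_group_to_p010() {
    local b0="$1" b1="$2" b2="$3" b3="$4" b4="$5"
    # Each sample is 10 bits; consecutive samples start 10 bits apart.
    local p0=$(( b0        | ((b1 & 0x03) << 8) ))
    local p1=$(( (b1 >> 2) | ((b2 & 0x0f) << 6) ))
    local p2=$(( (b2 >> 4) | ((b3 & 0x3f) << 4) ))
    local p3=$(( (b3 >> 6) | (b4 << 2) ))
    # P010 stores the sample in the high bits, so shift left by 6.
    echo "$(( p0 << 6 )) $(( p1 << 6 )) $(( p2 << 6 )) $(( p3 << 6 ))"
}
```

An all-`0xFF` group yields four words of 65472 (`0xFFC0`), and an all-zero group yields four zeros, which matches the zero / all-max cases named in the self-test list.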
## Iter39 design notes (load-bearing)
- `driver_data->is_10bit` is the per-session flag (request.h). Set in `RequestCreateContext` from `config_object->profile`, cleared in `RequestDestroyContext`. Drives image.c P010 reporting/unpack and context.c CAPTURE pix_fmt.
- `video_format` cache invalidated on bit-depth transition (sibling to iter38's device-switch invalidation in `request_switch_device_for_profile`). Same session can now alternate Main → Main10 contexts.
- Synthetic SPS pre-seed (α-25 lineage) extended for 10-bit: `bit_depth_luma_minus8 = 2`. Image_fmt resolution in `rkvdec-h264-common.c:196` + `rkvdec-hevc-common.c:467` dispatches on bit_depth_luma_minus8 only — profile_idc ignored, `v4l2_ctrl_hevc_sps` has no profile_idc field at all.
- NV15 stride = V4L2-reported `destination_bytesperlines[i]` (kernel may pad above `ceil(width/4)*5`). NEVER assume `width*2`.
- VP9 Profile 2 NOT in any path. Added comment in config.c near VAProfileVP9Profile0 case to deter future "completeness" PRs.
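
The NV15 stride rule above can be written down as a quick sanity check. A minimal sketch with hypothetical helper names: `nv15_min_stride` computes the `ceil(width/4)*5` lower bound, and `nv15_stride_ok` only confirms that a kernel-reported stride is at least that large. The reported value itself still has to come from V4L2.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the backend reads the real stride from V4L2.

# Minimum NV15 row size in bytes: every 4 pixels occupy 5 bytes.
nv15_min_stride() {
    local width="$1"
    echo $(( (width + 3) / 4 * 5 ))
}

# Succeeds when the reported stride can hold a full row (padding is allowed).
nv15_stride_ok() {
    local width="$1" reported="$2"
    [ "$reported" -ge "$(nv15_min_stride "$width")" ]
}
```

For a 1920-pixel row the lower bound is 2400 bytes, well below the 3840 that a `width*2` assumption would give, which is why the reported stride must be used when walking the source plane.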
## Memory entries (full campaign set)
- `feedback_rkvdec_image_fmt_pre_seed.md`: α-25 (Bug 4 + Bug 5 frame 1)
- `feedback_va_st_rps_bits_is_slice_field.md`: α-29 (Bug 5 frame 2+)
- `feedback_vaapi_strips_vp8_uncompressed_header.md`: α-30 (VP8)
- `feedback_mpeg2_hw_sw_idct_precision.md`: MPEG-2 PASS criterion = libva==kdirect (HW vs SW gap intrinsic per spec)
- `feedback_multi_device_probe_design.md`: iter38 dual-fd architecture + MAX_PROFILES bounds gotcha
- `feedback_libva_byte_correct_kernel_bug.md`: **FULLY OVERTURNED** (both Bug 4 + Bug 5 are libva-side fixes)
- `reference_fresnel_kernel_substrate.md`: 7.0-14 clean, device-enumeration-shift caveat
- MEMORY.md index updated
## Key commands quick reference
```bash
# Sync backend on fresnel + rebuild
ssh fresnel 'cd ~/src/libva-v4l2-request-fourier && git fetch && git reset --hard origin/master && ninja -C build'
# 5-codec smoke (above script). Each codec ~5s.
# Identify which video device is rkvdec vs hantro after a fresh boot
ssh fresnel 'for v in /dev/video*; do v4l2-ctl -d $v --info 2>/dev/null | grep "Card type" | head -1 | awk -v dev=$v "{print dev,\$0}"; done'
# vainfo (auto-detects + opens both decoders since iter38)
ssh fresnel 'env LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
vainfo'
# kdirect reference (works for any codec; hwaccel auto-routes)
ssh fresnel 'ffmpeg -hide_banner -loglevel error -y -hwaccel v4l2request -hwaccel_output_format drm_prime \
-i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
-vf "hwdownload,format=nv12" -frames:v 10 -f rawvideo -pix_fmt nv12 /tmp/y.yuv'
# Force single-device mode (skip iter38 alt-probe)
env LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 ...
# Reboot fresnel (sddm autologin reseats mfritsche)
ssh fresnel 'sudo systemctl reboot'; sleep 60
```
## Safe vs needs-confirmation actions
**Safe (no confirmation needed)**:
- Read/grep on noether, boltzmann, fresnel
- Push to gitea (claude-noether identity)
- Reboot fresnel (sddm autologin restores session)
- Build kernel on boltzmann via `makepkg -ef --skipinteg --noconfirm`
- Deploy kernel via `scp` + `sudo pacman -U`
- Run ffmpeg/cmp tests on fresnel

**Needs confirmation**:
- Significant rebuild (~25-30 min CPU on boltzmann, e.g. ffmpeg full rebuild or fresh kernel build)
- Per-context pool refactor (item 1 — would allow simultaneous mixed-codec decode but is invasive)
- Sub-profile rollout (item 2)