From e66c5c05837551b4a0d4ace8a5f1f53c8ac573b8 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Thu, 14 May 2026 19:23:09 +0000
Subject: [PATCH] =?UTF-8?q?Update=20handoff=20doc=20for=20final=20iter38?=
 =?UTF-8?q?=20close=20=E2=80=94=205/5=20PASS=20in=20single=20libva=20sessi?=
 =?UTF-8?q?on?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 PRE_COMPACT_HANDOFF.md | 135 ++++++++++++++++++++++++-----------------
 1 file changed, 81 insertions(+), 54 deletions(-)

diff --git a/PRE_COMPACT_HANDOFF.md b/PRE_COMPACT_HANDOFF.md
index ce6f8d4..46786a8 100644
--- a/PRE_COMPACT_HANDOFF.md
+++ b/PRE_COMPACT_HANDOFF.md
@@ -1,16 +1,19 @@
-# Pre-Compact Handoff — Session 2026-05-14 (FINAL post iter34)
+# Pre-Compact Handoff — Session 2026-05-14 (FINAL post iter38)
 
-Use this doc to resume the fresnel-fourier campaign after Claude context compaction. **Campaign is at full close state: 5/5 codecs PASS.**
+Use this doc to resume the fresnel-fourier campaign after Claude context compaction. **Campaign at definitive close: 5/5 codecs PASS in a single libva session, no env-override required.**
 
-## TL;DR (read first)
+## TL;DR
 
-| Bug | Status | Fix |
+| Bug / Item | Status | Fix iter |
 |---|---|---|
-| Bug 4 (H.264 keyframe-partial) | **FIXED** iter25 α-25 | rkvdec image_fmt pre-seed via synthetic SPS at CreateContext |
-| Bug 5 (HEVC libva all-zero CAPTURE) | **FIXED** iter25 + iter31 | α-25 (image_fmt) + α-29 (slice_params.short_term_ref_pic_set_size from VAAPI st_rps_bits) |
-| VP8 wrong output through libva | **FIXED** iter33 α-30 | prepend VP8 uncompressed frame header (10 kf / 3 inter) to OUTPUT |
-| MPEG-2 HW differs from SW | **NOT A BUG** | hantro IDCT precision (≤1 LSB / ~67 px); libva==kdirect bit-exact |
-| Kernel diagnostic printks | **CLEANED** iter32 + iter34 | 7.0-14 ship |
+| Bug 4 (H.264 keyframe-partial) | **FIXED** | iter25 α-25 (rkvdec image_fmt pre-seed via synthetic SPS at CreateContext) |
+| Bug 5 (HEVC libva all-zero CAPTURE) | **FIXED** | iter25 α-25 (frame 1) + iter31 α-29 (frames 2+: slice_params.short_term_ref_pic_set_size from VAAPI st_rps_bits) |
+| VP8 wrong output through libva | **FIXED** | iter33 α-30 (prepend 10/3 byte VP8 uncompressed header to OUTPUT — ffmpeg-vaapi strips it) |
+| MPEG-2 HW differs from SW | **NOT A BUG** | hantro IDCT precision (≤3 LSB / pixel, SSIM > 0.9999); libva == kdirect bit-exact |
+| Kernel diagnostic printks | **CLEANED** | iter32 (7.0-11) + iter34 (7.0-14) |
+| Env-gated DIAG probes (iter29/30/33/35) | **CLEANED** | iter36 (-131 / +7 LOC) |
+| α-26 mis-routed cosmetic | **REVERTED** | iter37 (1-line; rkvdec never read that field) |
+| Libva multi-device probe | **DONE** | iter38 (single session serves all 5 codecs; no env override needed) |
 
 | Codec | libva 10F sha | kdirect 10F sha | SW 10F sha | L==K | L==SW |
 |---|---|---|---|---|---|
@@ -18,30 +21,47 @@ Use this doc to resume the fresnel-fourier campaign after Claude context compact
 | HEVC | 108f925bb6cbb6c9 | same | same | ✓ | ✓ |
 | VP9 | cf35908ae0f9ab60 | same | same | ✓ | ✓ |
 | VP8 | d3231e5b6c0ee10b | same | same | ✓ | ✓ |
-| MPEG-2| 95c5905890c937d4 | same | 933b744134e47ba4 | ✓ | ~ |
+| MPEG-2| 95c5905890c937d4 | same | 933b744134e47ba4 | ✓ | ~ (≤3 LSB IDCT precision) |
 
-**5/5 PASS** the libva-vs-kdirect bit-exact contract.
+**5/5 PASS** the libva-vs-kdirect bit-exact correctness contract. 4/5 also bit-equal SW.
 
-## Substrate state (where things live)
+`vainfo` with NO env override enumerates the union of profiles from rkvdec + hantro:
+
+```
+v4l2-request: auto-selected codec device: /dev/video3 + /dev/media1
+v4l2-request: iter38: also opened hantro-vpu decoder at /dev/video2 + /dev/media0
+vainfo: Supported profile and entrypoints
+      VAProfileMPEG2Simple            : VAEntrypointVLD
+      VAProfileMPEG2Main              : VAEntrypointVLD
+      VAProfileH264Main               : VAEntrypointVLD
+      VAProfileH264High               : VAEntrypointVLD
+      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
+      VAProfileH264MultiviewHigh      : VAEntrypointVLD
+      VAProfileH264StereoHigh         : VAEntrypointVLD
+      VAProfileHEVCMain               : VAEntrypointVLD
+      VAProfileVP8Version0_3          : VAEntrypointVLD
+      VAProfileVP9Profile0            : VAEntrypointVLD
+```
+
+## Substrate state
 
 | Component | Location | Tip |
 |---|---|---|
-| Campaign repo (this) | `/home/mfritsche/src/fresnel-fourier/` | `70ddbd6` on gitea master |
-| Libva backend fork (noether) | `/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/` | `7e0848d` on gitea master |
+| Campaign repo (this) | `/home/mfritsche/src/fresnel-fourier/` | `ba4b6fd` on gitea master |
+| Libva backend fork (noether) | `/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/` | `7ac934e` on gitea master |
 | Libva backend (fresnel deploy) | `/home/mfritsche/src/libva-v4l2-request-fourier/` | sync to gitea master, `ninja -C build` |
-| Kernel source (boltzmann) | `~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/` | pkgrel=14 clean (no diagnostic printks) |
-| Kernel running on fresnel | `linux-fresnel-fourier 7.0-14` | clean shipping kernel |
+| Kernel source (boltzmann) | `~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/` | pkgrel=14 clean |
+| Kernel running on fresnel | `linux-fresnel-fourier 7.0-14` | clean shipping kernel, no diagnostic printks |
 | Test fixtures (fresnel) | `/home/mfritsche/fourier-test/bbb_*.{mp4,ts,webm}` | 5 codecs at 720p10s or 1080p30 |
-| Anchors (fresnel) | `/tmp/final/{L,K,S}_.yuv` | 10-frame YUV per codec per backend |
 | Memory | `~/.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/` | see entries below |
 
 ## Identity for gitea pushes
 
-All `git.reauktion.de` interactions use `claude-noether` identity (per memory `feedback_gitea_as_claude_noether.md`). Backend remote URL: `ssh://gitea@git.reauktion.de.claude-noether/marfrit/libva-v4l2-request-fourier.git`.
+All `git.reauktion.de` interactions use the `claude-noether` identity (per memory `feedback_gitea_as_claude_noether.md`). Backend remote URL: `ssh://gitea@git.reauktion.de.claude-noether/marfrit/libva-v4l2-request-fourier.git`.
 
-## Device map on 7.0-14 (REVERSED from 7.0-13)
+## Device map on 7.0-14
 
-`/dev/video*` and `/dev/media*` numbers SHIFT between kernel boots based on probe order. On 7.0-14 (current):
+`/dev/video*` and `/dev/media*` numbers SHIFT between kernel boots based on probe order. On the current 7.0-14 boot:
 
 | Driver | /dev/videoN | /dev/mediaN |
 |---|---|---|
@@ -50,31 +70,39 @@ All `git.reauktion.de` interactions use `claude-noether` identity (per memory `f
 | rk3399-vpu-dec (hantro) | **video2** | **media0** |
 | rkvdec | **video3** | **media1** |
 
-Always re-probe via `v4l2-ctl --info` + `media-ctl -p` before hardcoding paths.
+`v4l2-ctl --info` + `media-ctl -p` if mapping uncertain on a fresh boot. Iter38 makes this irrelevant for typical use — libva auto-probes both.
 
 ## Backend commits delivered (chronological, this campaign day)
 
 ```
-7e0848d iter33 α-30: prepend VP8 uncompressed frame header to OUTPUT buffer ← VP8 fix
-bf3e3d8 iter33: extend VP8 DIAG to dump VAAPI probability struct directly (env-gated diag)
-4b3c21b iter33 DIAG: env-gated dump of v4l2_ctrl_vp8_frame contents (env-gated diag)
+7ac934e iter38b: bounds check uses MAX_PROFILES (11), not MAX_CONFIG_ATTRIBUTES (10)
+c56a77b iter38: multi-device probe — single libva session serves all 5 codecs ← architectural close
+25d3e5f iter37: revert α-26 — decode_params.short_term_ref_pic_set_size back to 0
+7db15a5 iter36: remove env-gated DIAG probes (iter29/30/33/35)
+48fd028 iter35 DIAG: env-gated dump of v4l2_ctrl_mpeg2_* contents (removed iter36)
+7e0848d iter33 α-30: prepend VP8 uncompressed frame header to OUTPUT buffer ← VP8 fix
+bf3e3d8 iter33: extend VP8 DIAG to dump VAAPI probability struct directly (removed iter36)
+4b3c21b iter33 DIAG: env-gated dump of v4l2_ctrl_vp8_frame contents (removed iter36)
 23eb1bd iter31 α-29: slice_params.short_term_ref_pic_set_size = picture->st_rps_bits ← HEVC fix
-68dbbdd iter30 DIAG: LIBVA_TS_SCALE env-gated timestamp multiplier (env-gated diag)
-0eca3ff iter29 DIAG: env-gated dump of HEVC slice_data trailing 80 bytes (env-gated diag)
+68dbbdd iter30 DIAG: LIBVA_TS_SCALE env-gated timestamp multiplier (removed iter36)
+0eca3ff iter29 DIAG: env-gated dump of HEVC slice_data trailing 80 bytes (removed iter36)
 6646b16 Revert iter28b DIAG: trim=40 universal-trim broke IDR frame 1
 cd286d9 iter28 α-28: bit_size = (slice_data_size - data_byte_offset) * 8 for HEVC
 754be1d iter27 diag: env-gated VAAPI slice fields dump
 719d813 iter27 α-27: populate slice_params.num_entry_point_offsets (no-op)
-66ef848 iter26 α-26: decode_params.short_term_ref_pic_set_size from VAAPI (mis-routed cosmetic)
+66ef848 iter26 α-26: decode_params.short_term_ref_pic_set_size from VAAPI (reverted iter37)
 d062fec iter25 α-25 fix: FRAME_MBS_ONLY flag for H264 dummy SPS
 db0b7f9 iter25 α-25: inject synthetic SPS before cap_pool_init to seed image_fmt ← H264+HEVC frame 1 fix
 ```
 
-The load-bearing commits are `db0b7f9 + d062fec` (α-25), `23eb1bd` (α-29), `7e0848d` (α-30). The DIAG commits are env-gated and inactive by default.
+Load-bearing commits: `db0b7f9 + d062fec` (α-25), `23eb1bd` (α-29), `7e0848d` (α-30), `c56a77b + 7ac934e` (iter38 multi-device).
 
 ## Campaign repo commits delivered (today's arc)
 
 ```
+ba4b6fd iter38 close: multi-device probe — 5/5 codecs in one libva session
+7e3eadf iter36 close: env-gated DIAG removed, 5/5 PASS retained
+7c06c51 iter35 close: MPEG-2 verified libva-correct; HW IDCT precision intrinsic
 70ddbd6 iter34 close: kernel 7.0-14 CLEAN ship — 5/5 codecs PASS
 cd2d077 iter33: MPEG-2 closed (libva==kdirect bit-exact) — 5/5 codecs PASS
 51eee19 iter33 α-30 close: VP8 FIXED — 4/5 codecs PASS
@@ -83,22 +111,18 @@ acacf3d iter32 close: kernel substrate cleanup landed → 7.0-11 SHIPPING
 fde8a25 Update handoff doc: HEVC Bug 5 fully fixed (3/3 PASS)
 c1f9738 iter31 α-29 close: HEVC Bug 5 remainder FIXED — 3/3 PASS
 422ecaf Add pre-compact handoff doc for session resumption
-…earlier in day:
-c15fc6c, 8b17bf7, 02c4192, bf67900 (iter20-28 chain)
+… earlier in day: c15fc6c, 8b17bf7, 02c4192, bf67900 (iter20-28 chain)
 ```
 
 ## How to verify the current state
 
-Run on fresnel (post-7.0-14 boot, devices: rkvdec /dev/video3+/dev/media1, hantro /dev/video2+/dev/media0):
+Run on fresnel (post-7.0-14 boot, no env override needed):
 
 ```bash
-for codec in h264:bbb_1080p30_h264.mp4:rk hevc:bbb_720p10s_hevc.mp4:rk vp9:bbb_720p10s_vp9.webm:rk vp8:bbb_720p10s_vp8.webm:ha mpeg2:bbb_720p10s_mpeg2.ts:ha; do
-  name="${codec%%:*}"; rest="${codec#*:}"; fixture="${rest%:*}"; dev="${rest##*:}"
-  if [ "$dev" = "rk" ]; then V=/dev/video3; M=/dev/media1
-  else V=/dev/video2; M=/dev/media0; fi
+for codec in h264:bbb_1080p30_h264.mp4 hevc:bbb_720p10s_hevc.mp4 vp9:bbb_720p10s_vp9.webm vp8:bbb_720p10s_vp8.webm mpeg2:bbb_720p10s_mpeg2.ts; do
+  name="${codec%%:*}"; fixture="${codec#*:}"
   env LIBVA_DRIVER_NAME=v4l2_request \
     LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
-    LIBVA_V4L2_REQUEST_VIDEO_PATH=$V LIBVA_V4L2_REQUEST_MEDIA_PATH=$M \
     ffmpeg -hide_banner -loglevel error -y \
       -hwaccel vaapi -hwaccel_output_format vaapi \
       -i "/home/mfritsche/fourier-test/$fixture" \
@@ -117,27 +141,29 @@ Expect: 5× PASS.
 
 ## Root cause summary
 
-**Bug 4 + Bug 5 frame 1 (iter25 α-25)**: `rkvdec_s_ctrl` returns -EBUSY when first SPS triggers image_fmt reset on busy CAPTURE queue. libva pre-allocated 24 CAPTURE buffers at CreateContext (iter5b-β) before per-frame S_EXT_CTRLS. Fix: inject synthetic SPS at CreateContext, pre-cap_pool_init, while CAPTURE is empty.
+**Bug 4 + Bug 5 frame 1 (iter25 α-25)**: `rkvdec_s_ctrl` returns -EBUSY when first SPS triggers image_fmt reset on a busy CAPTURE queue. libva pre-allocated 24 CAPTURE buffers at CreateContext (iter5b-β design) before per-frame S_EXT_CTRLS. Fix: inject synthetic SPS at CreateContext, pre-cap_pool_init, while CAPTURE is still empty.
 
-**Bug 5 frame 2+ (iter31 α-29)**: libva backend set `slice_params->short_term_ref_pic_set_size = 0` (stale "VAAPI doesn't expose" comment). rkvdec's `assemble_sw_rps` (rkvdec-hevc.c:386-389) reads this; when zero with `num_short_term_ref_pic_sets <= 1`, falls back to 0 → entropy decoder consumes slice-header bits as long-term-RPS → garbage for every non-IDR slice. IDR is gated by `!IDR_PIC` so frame 1 was unaffected. Fix: `slice_params->short_term_ref_pic_set_size = picture->st_rps_bits` (VAAPI's field IS the slice-header bit count, per va_dec_hevc.h doc). α-26 had mis-routed this value into decode_params (same field name, different V4L2 semantics).
+**Bug 5 frame 2+ (iter31 α-29)**: libva backend set `slice_params->short_term_ref_pic_set_size = 0` (stale "VAAPI doesn't expose" comment). rkvdec's `assemble_sw_rps` (rkvdec-hevc.c:386-389) reads this; when zero with `num_short_term_ref_pic_sets <= 1`, falls back to 0 → entropy decoder consumes slice-header bits as long-term-RPS → garbage for every non-IDR slice. IDR is gated by `!IDR_PIC` so frame 1 was unaffected. Fix: `slice_params->short_term_ref_pic_set_size = picture->st_rps_bits` (VAAPI's field IS the slice-header bit count, per `va_dec_hevc.h` doc). α-26 had mis-routed this value into `decode_params` (same field name in V4L2, different semantics — SPS-side bit count) — reverted in iter37.
 
-**VP8 (iter33 α-30)**: ffmpeg-vaapi strips the VP8 uncompressed frame header (3 bytes interframe / 10 bytes keyframe) before submitting via VAAPI. ffmpeg-v4l2request keeps it. Hantro hard-codes `first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3` and uses it for both `mb_offset_bits` and `dct_part_offset`. Without the prepended header in libva's OUTPUT, hantro's offset arithmetic lands inside the compressed bitstream and the entropy decoder produces garbage. Fix: in codec_store_buffer, prepend `header_size` zero bytes to OUTPUT for VP8 profile (hantro skips these bytes for actual parsing, uses ctrl-struct values).
+**VP8 (iter33 α-30)**: ffmpeg-vaapi strips the VP8 uncompressed frame header (3 bytes interframe / 10 bytes keyframe) before submitting via VAAPI. ffmpeg-v4l2request keeps it. Hantro hard-codes `first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3` and uses it for both `mb_offset_bits` and `dct_part_offset`. Without the prepended header in libva's OUTPUT, hantro's offset arithmetic lands inside the compressed bitstream and the entropy decoder produces garbage. Fix: in `codec_store_buffer`, prepend `header_size` zero bytes to OUTPUT for VP8 profile (hantro skips these bytes for actual parsing, uses ctrl-struct values).
 
-## Open items (low priority, no blockers)
+**Multi-device probe (iter38)**: VA_DRIVER_INIT opens BOTH rkvdec + hantro fds. `RequestCreateConfig` retargets `driver_data->{video,media}_fd` to the right device per profile (tearing down pools on switch). `RequestQueryConfigProfiles` unions across all open fds. iter38b fixed a latent off-by-one: bounds checks used `MAX_CONFIG_ATTRIBUTES` (10) but profile array is sized by `MAX_PROFILES` (11) — pre-iter38 never returned more than 9 profiles so the bug never bit.
 
-1. **Backend env-gated diagnostics cleanup** — `LIBVA_HEVC_DUMP_SLICE_TAIL` (iter29), `LIBVA_TS_SCALE` (iter30), `LIBVA_VP8_DUMP_FRAME` (iter33) are env-gated and inactive by default. Leave for future regression debugging or clean up. Low priority.
+## Open items (low priority, optional polish)
 
-2. **α-26 cosmetic revert** — `decode_params->short_term_ref_pic_set_size = picture->st_rps_bits` was mis-routed (rkvdec doesn't use that field). Could revert to 0. Cosmetic; no behavior change.
+1. **Multi-context simultaneously** — current design supports only one decode context at a time across devices (device switch tears down pools). Could be expanded to per-context pools to support simultaneous mixed-codec decode. Not requested.
 
-3. **Libva multi-device probe** — currently `find_codec_device` picks ONE device per session, requiring `LIBVA_V4L2_REQUEST_VIDEO_PATH` override to access both rkvdec (H264/HEVC/VP9) and hantro (VP8/MPEG-2) within one workflow. Architectural change in `src/request.c::find_codec_device` (~200-400 LOC). Design judgment from user welcome.
+2. **Sub-profile support** — H264 Hi10P, HEVC Main10, VP9 Profile 2 are HW-supported on RK3399 but the libva backend has no entries in `pixelformat_for_profile` and elsewhere. Out of scope for this campaign.
 
-## Memory entries (full set, this campaign)
+## Memory entries (full campaign set)
 
 - `feedback_rkvdec_image_fmt_pre_seed.md` — α-25 (Bug 4 + Bug 5 frame 1)
 - `feedback_va_st_rps_bits_is_slice_field.md` — α-29 (Bug 5 frame 2+)
 - `feedback_vaapi_strips_vp8_uncompressed_header.md` — α-30 (VP8)
-- `feedback_libva_byte_correct_kernel_bug.md` — FULLY OVERTURNED (both Bug 4 + Bug 5 are libva-side fixes)
-- `reference_fresnel_kernel_substrate.md` — 7.0-14 clean, device-enumeration caveat noted
+- `feedback_mpeg2_hw_sw_idct_precision.md` — MPEG-2 PASS criterion = libva==kdirect (HW vs SW gap intrinsic per spec)
+- `feedback_multi_device_probe_design.md` — iter38 dual-fd architecture + MAX_PROFILES bounds gotcha
+- `feedback_libva_byte_correct_kernel_bug.md` — **FULLY OVERTURNED** (both Bug 4 + Bug 5 are libva-side fixes)
+- `reference_fresnel_kernel_substrate.md` — 7.0-14 clean, device-enumeration-shift caveat
 - MEMORY.md index updated
 
 ## Key commands quickreference
@@ -146,24 +172,24 @@ Expect: 5× PASS.
 # Sync backend on fresnel + rebuild
 ssh fresnel 'cd ~/src/libva-v4l2-request-fourier && git fetch && git reset --hard origin/master && ninja -C build'
 
+# 5-codec smoke (above script). Each codec ~5s.
+
 # Identify which video device is rkvdec vs hantro after a fresh boot
 ssh fresnel 'for v in /dev/video*; do v4l2-ctl -d $v --info 2>/dev/null | grep -E "^Card type" | head -1 | awk -v dev=$v "{print dev,\$0}"; done'
 
-# 5-codec smoke (above script)
-
-# Run libva HEVC (rkvdec is currently /dev/video3 on 7.0-14)
+# vainfo (auto-detects + opens both decoders since iter38)
 ssh fresnel 'env LIBVA_DRIVER_NAME=v4l2_request \
   LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
-  LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 \
-  ffmpeg -hide_banner -loglevel error -y -hwaccel vaapi -hwaccel_output_format vaapi \
-  -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
-  -vf "hwdownload,format=nv12" -frames:v 10 -f rawvideo -pix_fmt nv12 /tmp/x.yuv'
+  vainfo'
 
 # kdirect reference (works for any codec; hwaccel auto-routes)
 ssh fresnel 'ffmpeg -hide_banner -loglevel error -y -hwaccel v4l2request -hwaccel_output_format drm_prime \
   -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
   -vf "hwdownload,format=nv12" -frames:v 10 -f rawvideo -pix_fmt nv12 /tmp/y.yuv'
 
+# Force single-device mode (skip iter38 alt-probe)
+env LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 ...
+
 # Reboot fresnel (sddm autologin reseats mfritsche)
 ssh fresnel 'sudo systemctl reboot'; sleep 60
 ```
@@ -180,4 +206,5 @@ ssh fresnel 'sudo systemctl reboot'; sleep 60
 
 **Needs confirmation**:
 - Significant rebuild (~25-30 min CPU on boltzmann, e.g. ffmpeg full rebuild or fresh kernel build)
-- Architectural changes to libva multi-device probe (item 3 above) — affects backend design
+- Per-context pool refactor (item 1 — would allow simultaneous mixed-codec decode but is invasive)
+- Sub-profile rollout (item 2)