fresnel-fourier/PRE_COMPACT_HANDOFF.md
marfrit 407c7c56e1 iter39 Phase 4-6 LANDED on backend — Phase 7 awaiting fresnel power-on
Adds the iter39 sub-profile (H264 Hi10P + HEVC Main10) FR landing
materials and resumption sequence to the campaign repo.

- phase4_iter39_subprofile_plan.md: full Phase 4 plan with Phase 5
  sonnet-architect review amendments folded in. Documents the
  Option A/B/C/D scope tree, the locked Option C choice (full NV15→P010
  userspace unpack), the LOC breakdown (~180), and the test plan.
- phase7_iter39_test_rig.sh: end-to-end test script for fresnel. Encodes
  Hi10P + Main10 fixtures, runs libva vs kdirect bit-exact comparison
  (both via `-vf hwdownload,format=p010le` to normalize the NV15 stride
  difference between paths), SSIM_Y check vs SW reference, and verifies
  the iter38 5/5 baseline still holds.
- PRE_COMPACT_HANDOFF.md: TL;DR table row for iter39 (committed
  pending validation), Phase 7 resumption sequence, internals-summary
  for future-session resumption.

Backend tip: `662f887` (iter39 α-31) + `8746690` (unpack self-test) on
gitea master. Self-test passes on noether x86_64; compile-test clean on
boltzmann aarch64 native; self-review of commit vs Phase 5 amendments
APPROVED. Phase 7 actual decode test blocked on fresnel power-on.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:22:34 +00:00


Pre-Compact Handoff — Session 2026-05-17 (iter39 sub-profile work landed, pending fresnel test)

Use this doc to resume the fresnel-fourier campaign after Claude context compaction. Iter38 close still holds (5/5 PASS, single libva session). Iter39 sub-profile work (H264 Hi10P + HEVC Main10) committed at backend 662f887 and awaiting Phase 7 validation on fresnel.

TL;DR

| Bug / Item | Status | Fix iter |
|---|---|---|
| Bug 4 (H.264 keyframe-partial) | FIXED | iter25 α-25 (rkvdec image_fmt pre-seed via synthetic SPS at CreateContext) |
| Bug 5 (HEVC libva all-zero CAPTURE) | FIXED | iter25 α-25 (frame 1) + iter31 α-29 (frames 2+: slice_params.short_term_ref_pic_set_size from VAAPI st_rps_bits) |
| VP8 wrong output through libva | FIXED | iter33 α-30 (prepend 10/3 byte VP8 uncompressed header to OUTPUT; ffmpeg-vaapi strips it) |
| MPEG-2 HW differs from SW | NOT A BUG | hantro IDCT precision (≤3 LSB / pixel, SSIM > 0.9999); libva == kdirect bit-exact |
| Kernel diagnostic printks | CLEANED | iter32 (7.0-11) + iter34 (7.0-14) |
| Env-gated DIAG probes (iter29/30/33/35) | CLEANED | iter36 (-131 / +7 LOC) |
| α-26 mis-routed cosmetic | REVERTED | iter37 (1-line; rkvdec never read that field) |
| Libva multi-device probe | DONE | iter38 (single session serves all 5 codecs; no env override needed) |
| H264 Hi10P + HEVC Main10 sub-profile | CODE LANDED, Phase 7 PENDING | iter39 α-31 (backend 662f887): NV15 CAPTURE pix_fmt, synthetic-SPS bit_depth=2, NV15→P010 userspace unpack in copy_surface_to_image, P010 reporting in DeriveImage/QueryImageFormats. Self-tested (test_nv15_unpack passes on noether). Awaiting fresnel power-on for vainfo enumeration + libva.P010==kdirect.P010 bit-exact verification. |

| Codec | libva 10F sha | kdirect 10F sha | SW 10F sha | L==K | L==SW |
|---|---|---|---|---|---|
| H.264 | dd4f5f2d552c07bc | same | same | yes | yes |
| HEVC | 108f925bb6cbb6c9 | same | same | yes | yes |
| VP9 | cf35908ae0f9ab60 | same | same | yes | yes |
| VP8 | d3231e5b6c0ee10b | same | same | yes | yes |
| MPEG-2 | 95c5905890c937d4 | same | 933b744134e47ba4 | yes | ~ (≤3 LSB IDCT precision) |

All 5 codecs PASS the libva-vs-kdirect bit-exact correctness contract. 4 of 5 are also bit-identical to the SW decode.

vainfo with NO env override enumerates the union of profiles from rkvdec + hantro:

v4l2-request: auto-selected codec device: /dev/video3 + /dev/media1
v4l2-request: iter38: also opened hantro-vpu decoder at /dev/video2 + /dev/media0
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264MultiviewHigh      : VAEntrypointVLD
      VAProfileH264StereoHigh         : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD

Substrate state

| Component | Location | Tip |
|---|---|---|
| Campaign repo (this) | /home/mfritsche/src/fresnel-fourier/ | ba4b6fd on gitea master |
| Libva backend fork (noether) | /home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/ | 662f887 on gitea master (iter39 α-31; iter38b is 7ac934e) |
| Libva backend (fresnel deploy) | /home/mfritsche/src/libva-v4l2-request-fourier/ | sync to gitea master, ninja -C build |
| Kernel source (boltzmann) | ~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/ | pkgrel=14 clean |
| Kernel running on fresnel | linux-fresnel-fourier 7.0-14 | clean shipping kernel, no diagnostic printks |
| Test fixtures (fresnel) | /home/mfritsche/fourier-test/bbb_*.{mp4,ts,webm} | 5 codecs at 720p10s or 1080p30 |
| Memory | ~/.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/ | see entries below |

Identity for gitea pushes

All git.reauktion.de interactions use the claude-noether identity (per memory feedback_gitea_as_claude_noether.md). Backend remote URL: ssh://gitea@git.reauktion.de.claude-noether/marfrit/libva-v4l2-request-fourier.git.

Device map on 7.0-14

/dev/video* and /dev/media* numbers SHIFT between kernel boots based on probe order. On the current 7.0-14 boot:

| Driver | /dev/videoN | /dev/mediaN |
|---|---|---|
| rockchip-rga | video0 | n/a |
| rk3399-vpu-enc | video1 | (shared) |
| rk3399-vpu-dec (hantro) | video2 | media0 |
| rkvdec | video3 | media1 |

If the mapping is uncertain after a fresh boot, check it with v4l2-ctl --info and media-ctl -p. Iter38 makes this irrelevant for typical use, because libva auto-probes both decoders.
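
Because the numbering shifts between boots, a script may prefer to resolve a node by driver name instead of hard-coding /dev/video3. The sketch below is one way to do that. The helper names are hypothetical, and it assumes that `v4l2-ctl --info` prints a `Driver name : <name>` line and that the reported names match the Driver column above; both are worth confirming on fresnel.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the campaign repo.

# Extract the driver name from `v4l2-ctl --info` text on stdin.
driver_name_from_info() {
    sed -n 's/^[[:space:]]*Driver name[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Print the first /dev/videoN whose driver name equals $1 (e.g. rkvdec).
find_video_by_driver() {
    for dev in /dev/video*; do
        [ -e "$dev" ] || continue
        name=$(v4l2-ctl -d "$dev" --info 2>/dev/null | driver_name_from_info)
        if [ "$name" = "$1" ]; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}
```

Usage would look like `find_video_by_driver rkvdec`, with the result passed to LIBVA_V4L2_REQUEST_VIDEO_PATH when single-device mode is wanted.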

Backend commits delivered (chronological, this campaign day)

7ac934e iter38b: bounds check uses MAX_PROFILES (11), not MAX_CONFIG_ATTRIBUTES (10)
c56a77b iter38: multi-device probe — single libva session serves all 5 codecs   ← architectural close
25d3e5f iter37: revert α-26 — decode_params.short_term_ref_pic_set_size back to 0
7db15a5 iter36: remove env-gated DIAG probes (iter29/30/33/35)
48fd028 iter35 DIAG: env-gated dump of v4l2_ctrl_mpeg2_* contents               (removed iter36)
7e0848d iter33 α-30: prepend VP8 uncompressed frame header to OUTPUT buffer     ← VP8 fix
bf3e3d8 iter33: extend VP8 DIAG to dump VAAPI probability struct directly       (removed iter36)
4b3c21b iter33 DIAG: env-gated dump of v4l2_ctrl_vp8_frame contents             (removed iter36)
23eb1bd iter31 α-29: slice_params.short_term_ref_pic_set_size = picture->st_rps_bits  ← HEVC fix
68dbbdd iter30 DIAG: LIBVA_TS_SCALE env-gated timestamp multiplier              (removed iter36)
0eca3ff iter29 DIAG: env-gated dump of HEVC slice_data trailing 80 bytes        (removed iter36)
6646b16 Revert iter28b DIAG: trim=40 universal-trim broke IDR frame 1
cd286d9 iter28 α-28: bit_size = (slice_data_size - data_byte_offset) * 8 for HEVC
754be1d iter27 diag: env-gated VAAPI slice fields dump
719d813 iter27 α-27: populate slice_params.num_entry_point_offsets             (no-op)
66ef848 iter26 α-26: decode_params.short_term_ref_pic_set_size from VAAPI       (reverted iter37)
d062fec iter25 α-25 fix: FRAME_MBS_ONLY flag for H264 dummy SPS
db0b7f9 iter25 α-25: inject synthetic SPS before cap_pool_init to seed image_fmt  ← H264+HEVC frame 1 fix

Load-bearing commits: db0b7f9 + d062fec (α-25), 23eb1bd (α-29), 7e0848d (α-30), c56a77b + 7ac934e (iter38 multi-device).

Campaign repo commits delivered (today's arc)

ba4b6fd iter38 close: multi-device probe — 5/5 codecs in one libva session
7e3eadf iter36 close: env-gated DIAG removed, 5/5 PASS retained
7c06c51 iter35 close: MPEG-2 verified libva-correct; HW IDCT precision intrinsic
70ddbd6 iter34 close: kernel 7.0-14 CLEAN ship — 5/5 codecs PASS
cd2d077 iter33: MPEG-2 closed (libva==kdirect bit-exact) — 5/5 codecs PASS
51eee19 iter33 α-30 close: VP8 FIXED — 4/5 codecs PASS
acacf3d iter32 close: kernel substrate cleanup landed → 7.0-11 SHIPPING
85cc178 Update campaign session doc: full-day arc closes at 3/3 PASS
fde8a25 Update handoff doc: HEVC Bug 5 fully fixed (3/3 PASS)
c1f9738 iter31 α-29 close: HEVC Bug 5 remainder FIXED — 3/3 PASS
422ecaf Add pre-compact handoff doc for session resumption
… earlier in day: c15fc6c, 8b17bf7, 02c4192, bf67900 (iter20-28 chain)

How to verify the current state

Run on fresnel (post-7.0-14 boot, no env override needed):

for codec in h264:bbb_1080p30_h264.mp4 hevc:bbb_720p10s_hevc.mp4 vp9:bbb_720p10s_vp9.webm vp8:bbb_720p10s_vp8.webm mpeg2:bbb_720p10s_mpeg2.ts; do
    name="${codec%%:*}"; fixture="${codec#*:}"
    env LIBVA_DRIVER_NAME=v4l2_request \
        LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
        ffmpeg -hide_banner -loglevel error -y \
        -hwaccel vaapi -hwaccel_output_format vaapi \
        -i "/home/mfritsche/fourier-test/$fixture" \
        -vf "hwdownload,format=nv12" -frames:v 10 \
        -f rawvideo -pix_fmt nv12 "/tmp/L_${name}.yuv"
    ffmpeg -hide_banner -loglevel error -y -hwaccel v4l2request -hwaccel_output_format drm_prime \
        -i "/home/mfritsche/fourier-test/$fixture" -vf "hwdownload,format=nv12" \
        -frames:v 10 -f rawvideo -pix_fmt nv12 "/tmp/K_${name}.yuv"
    L=$(sha256sum "/tmp/L_${name}.yuv" | cut -c1-16)
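    K=$(sha256sum "/tmp/K_${name}.yuv" | cut -c1-16)
    # Require non-empty libva output so a decode that failed on both paths cannot compare equal.
    if [ -s "/tmp/L_${name}.yuv" ] && [ "$L" = "$K" ]; then echo "$name: PASS"; else echo "$name: FAIL"; fi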
done

Expect: 5× PASS.
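
As an optional extra, the libva outputs can also be checked against the 16-character prefixes recorded in the TL;DR table. This is a sketch only. The helper names are hypothetical, and it assumes the table values were produced the same way as the loop above (SHA-256 over the first 10 NV12 frames of the same fixtures), which the table itself does not state.

```shell
#!/bin/sh
# Hypothetical helpers; expected values are the "libva 10F sha" column
# of the TL;DR table.

# First 16 hex characters of a file's SHA-256.
sha16() {
    sha256sum "$1" | cut -c1-16
}

# Recorded libva prefix for a codec name as used by the loop above.
expected_sha16() {
    case "$1" in
        h264)  echo dd4f5f2d552c07bc ;;
        hevc)  echo 108f925bb6cbb6c9 ;;
        vp9)   echo cf35908ae0f9ab60 ;;
        vp8)   echo d3231e5b6c0ee10b ;;
        mpeg2) echo 95c5905890c937d4 ;;
        *)     return 1 ;;
    esac
}

# Compare /tmp/L_<codec>.yuv against the recorded prefix.
check_baseline() {
    want=$(expected_sha16 "$1") || { echo "$1: unknown codec"; return 2; }
    got=$(sha16 "/tmp/L_$1.yuv")
    if [ "$got" = "$want" ]; then
        echo "$1: baseline match"
    else
        echo "$1: baseline drift (got $got, want $want)"
        return 1
    fi
}
```

A mismatch here with L==K still passing would point at a fixture or ffmpeg change rather than a backend regression.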

Root cause summary

Bug 4 + Bug 5 frame 1 (iter25 α-25): rkvdec_s_ctrl returns -EBUSY when first SPS triggers image_fmt reset on a busy CAPTURE queue. libva pre-allocated 24 CAPTURE buffers at CreateContext (iter5b-β design) before per-frame S_EXT_CTRLS. Fix: inject synthetic SPS at CreateContext, pre-cap_pool_init, while CAPTURE is still empty.

Bug 5 frame 2+ (iter31 α-29): libva backend set slice_params->short_term_ref_pic_set_size = 0 (stale "VAAPI doesn't expose" comment). rkvdec's assemble_sw_rps (rkvdec-hevc.c:386-389) reads this; when zero with num_short_term_ref_pic_sets <= 1, falls back to 0 → entropy decoder consumes slice-header bits as long-term-RPS → garbage for every non-IDR slice. IDR is gated by !IDR_PIC so frame 1 was unaffected. Fix: slice_params->short_term_ref_pic_set_size = picture->st_rps_bits (VAAPI's field IS the slice-header bit count, per va_dec_hevc.h doc). α-26 had mis-routed this value into decode_params (same field name in V4L2, different semantics — SPS-side bit count) — reverted in iter37.

VP8 (iter33 α-30): ffmpeg-vaapi strips the VP8 uncompressed frame header (3 bytes interframe / 10 bytes keyframe) before submitting via VAAPI. ffmpeg-v4l2request keeps it. Hantro hard-codes first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3 and uses it for both mb_offset_bits and dct_part_offset. Without the prepended header in libva's OUTPUT, hantro's offset arithmetic lands inside the compressed bitstream and the entropy decoder produces garbage. Fix: in codec_store_buffer, prepend header_size zero bytes to OUTPUT for VP8 profile (hantro skips these bytes for actual parsing, uses ctrl-struct values).
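
The padding rule can be illustrated outside the backend. The following is a minimal sketch with hypothetical helper names; the real fix lives in the backend's codec_store_buffer (C), and the keyframe flag is assumed to be known to the caller, as it is from the VAAPI picture parameters.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the α-30 padding; not backend code.

# Size of the VP8 uncompressed frame header: 10 bytes on keyframes, 3 otherwise.
# $1 = 1 for a keyframe, 0 for an interframe.
vp8_header_size() {
    if [ "$1" -eq 1 ]; then echo 10; else echo 3; fi
}

# Write header_size zero bytes followed by the payload file.
# $1 = keyframe flag, $2 = payload file, $3 = output file.
vp8_pad_payload() {
    n=$(vp8_header_size "$1")
    { head -c "$n" /dev/zero; cat "$2"; } > "$3"
}
```

With the pad in place, an offset of 10 or 3 from the start of OUTPUT lands on the first compressed byte again, which is what hantro's first_part_offset arithmetic expects.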

Multi-device probe (iter38): VA_DRIVER_INIT opens BOTH rkvdec + hantro fds. RequestCreateConfig retargets driver_data->{video,media}_fd to the right device per profile (tearing down pools on switch). RequestQueryConfigProfiles unions across all open fds. iter38b fixed a latent off-by-one: bounds checks used MAX_CONFIG_ATTRIBUTES (10) but profile array is sized by MAX_PROFILES (11) — pre-iter38 never returned more than 9 profiles so the bug never bit.

Open items (low priority, optional polish)

  1. Simultaneous multi-context decode: the current design supports only one decode context at a time across devices (a device switch tears down pools). It could be extended to per-context pools to support simultaneous mixed-codec decode. Not requested.

  2. Sub-profile support: Phase 6 LANDED 2026-05-17 (iter39 α-31, backend 662f887). H264 Hi10P + HEVC Main10 wired through the backend with NV15→P010 userspace unpack. VP9 Profile 2 explicitly excluded (RK3399 rkvdec kernel ctrl caps at PROFILE_0). PRIME-side P010 emission deferred (consumers wanting P010 must use the COPY path). Phase 7 test rig at phase7_iter39_test_rig.sh; awaiting fresnel.

Resumption sequence — iter39 Phase 7 (when fresnel is up)

# 1. Sync + build backend on fresnel
ssh fresnel 'cd ~/src/libva-v4l2-request-fourier && \
    git fetch && git reset --hard origin/master && \
    ninja -C build && \
    sudo install -m644 build/src/v4l2_request_drv_video.so /usr/lib/dri/'

# 2. Push test rig + run
scp ~/src/fresnel-fourier/phase7_iter39_test_rig.sh fresnel:/tmp/
ssh fresnel 'bash /tmp/phase7_iter39_test_rig.sh'

# Expected pass criteria:
#   1. vainfo lists VAProfileH264High10 + VAProfileHEVCMain10
#   2. libva.P010 SHA == kdirect.P010 SHA for Hi10P and Main10 fixtures
#      (both paths use -vf hwdownload,format=p010le to normalize NV15)
#   3. SSIM_Y vs libavcodec SW (yuv420p10le) >= 0.999
#   4. iter38 5/5 PASS baseline still holds on H264/HEVC/VP9/VP8/MPEG-2
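
Criterion 3 can be checked mechanically from ffmpeg's log. The sketch below is not taken from phase7_iter39_test_rig.sh; the helper name is hypothetical, and it assumes the ssim filter's summary line has the form `SSIM Y:<value> (...) U:... V:... All:...`.

```shell
#!/bin/sh
# Hypothetical helper; not taken from phase7_iter39_test_rig.sh.

# Read ffmpeg log text on stdin, pull the luma score from the last
# "SSIM Y:<value>" line, and succeed only if it is >= the threshold in $1.
# Returns 2 if no SSIM line was found.
ssim_y_at_least() {
    y=$(sed -n 's/.*SSIM Y:\([0-9.]*\).*/\1/p' | tail -n 1)
    [ -n "$y" ] || return 2
    awk -v y="$y" -v t="$1" 'BEGIN { exit !(y + 0 >= t + 0) }'
}
```

A typical call pipes the filter output in, for example `ffmpeg -i hw.mkv -i sw.mkv -lavfi ssim -f null - 2>&1 | ssim_y_at_least 0.999`, where the two input names are placeholders.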

Iter39 internals — pre-Phase 7 verification done

  • Self-test of nv15_unpack_plane_to_p010 (tests/test_nv15_unpack.c in backend): zero / all-max / 8 known vectors / remainder widths {1,2,3,7} / multi-row stride-padding / chroma-shape — ALL PASS on noether x86_64.
  • Compile-test: aarch64 native build on boltzmann clean (gcc 15.2.1 / libva 1.23.0 / libdrm 2.4.133), .so produced, 0 new warnings.
  • Self-review of commit 662f887 vs Phase 5 amendments: APPROVED. All 3 mandatory amendments + MAX_PROFILES bump + guard updates + NV15-stride source confirmed present.
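
For reference when reading the self-test vectors, the per-group arithmetic can be sketched as follows. This is an illustration under assumptions, not the backend's code: it takes the NV15 layout to be four 10-bit samples packed little-endian into 5 bytes, and P010 to hold each sample in the top 10 bits of a 16-bit word. The function name is hypothetical, and nv15_unpack_plane_to_p010 may be structured differently.

```shell
#!/bin/sh
# Hypothetical sketch of the per-group arithmetic; not backend code.

# Unpack one NV15 group: 5 bytes (decimal or 0x-prefixed) holding four
# 10-bit little-endian samples. Prints the four samples as P010 words,
# i.e. each 10-bit value shifted left by 6 into a 16-bit word.
nv15_group_to_p010() {
    b0=$(( $1 )); b1=$(( $2 )); b2=$(( $3 )); b3=$(( $4 )); b4=$(( $5 ))
    s0=$((  b0       | ((b1 & 0x03) << 8) ))
    s1=$(( (b1 >> 2) | ((b2 & 0x0f) << 6) ))
    s2=$(( (b2 >> 4) | ((b3 & 0x3f) << 4) ))
    s3=$(( (b3 >> 6) |  (b4 << 2) ))
    printf '0x%04x 0x%04x 0x%04x 0x%04x\n' \
        $(( s0 << 6 )) $(( s1 << 6 )) $(( s2 << 6 )) $(( s3 << 6 ))
}
```

For example, the bytes `0x01 0x08 0x30 0x00 0x01` encode the samples 1, 2, 3, 4, so the call prints `0x0040 0x0080 0x00c0 0x0100`.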

Iter39 design notes (load-bearing)

  • driver_data->is_10bit is the per-session flag (request.h). Set in RequestCreateContext from config_object->profile, cleared in RequestDestroyContext. Drives image.c P010 reporting/unpack and context.c CAPTURE pix_fmt.
  • video_format cache invalidated on bit-depth transition (sibling to iter38's device-switch invalidation in request_switch_device_for_profile). Same session can now alternate Main → Main10 contexts.
  • Synthetic SPS pre-seed (α-25 lineage) extended for 10-bit: bit_depth_luma_minus8 = 2. Image_fmt resolution in rkvdec-h264-common.c:196 + rkvdec-hevc-common.c:467 dispatches on bit_depth_luma_minus8 only — profile_idc ignored, v4l2_ctrl_hevc_sps has no profile_idc field at all.
  • NV15 stride = V4L2-reported destination_bytesperlines[i] (kernel may pad above ceil(width/4)*5). NEVER assume width*2.
  • VP9 Profile 2 NOT in any path. Added comment in config.c near VAProfileVP9Profile0 case to deter future "completeness" PRs.
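
The stride note above can be made concrete with a little arithmetic. The helpers below are hypothetical and only compute the packed lower bound; the authoritative value remains whatever V4L2 reports in destination_bytesperlines[i].

```shell
#!/bin/sh
# Hypothetical helpers; the authoritative stride is the V4L2-reported one.

# Minimum bytes per NV15 luma row: ceil(width / 4) groups of 5 bytes.
nv15_min_stride() {
    echo $(( ( ($1 + 3) / 4 ) * 5 ))
}

# Succeed if a reported stride ($2) can hold a row of the given width ($1).
nv15_stride_ok() {
    [ "$2" -ge "$(nv15_min_stride "$1")" ]
}
```

For a 1920-wide plane the packed minimum is 2400 bytes, whereas width*2 (the P010 row size) would be 3840, so using the latter as the source stride would read the wrong rows.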

Memory entries (full campaign set)

  • feedback_rkvdec_image_fmt_pre_seed.md: α-25 (Bug 4 + Bug 5 frame 1)
  • feedback_va_st_rps_bits_is_slice_field.md: α-29 (Bug 5 frame 2+)
  • feedback_vaapi_strips_vp8_uncompressed_header.md: α-30 (VP8)
  • feedback_mpeg2_hw_sw_idct_precision.md: MPEG-2 PASS criterion = libva==kdirect (HW vs SW gap intrinsic per spec)
  • feedback_multi_device_probe_design.md: iter38 dual-fd architecture + MAX_PROFILES bounds gotcha
  • feedback_libva_byte_correct_kernel_bug.md: FULLY OVERTURNED (both Bug 4 + Bug 5 are libva-side fixes)
  • reference_fresnel_kernel_substrate.md: 7.0-14 clean, device-enumeration-shift caveat
  • MEMORY.md index updated

Key commands quick reference

# Sync backend on fresnel + rebuild
ssh fresnel 'cd ~/src/libva-v4l2-request-fourier && git fetch && git reset --hard origin/master && ninja -C build'

# 5-codec smoke (above script). Each codec ~5s.

# Identify which video device is rkvdec vs hantro after a fresh boot
ssh fresnel 'for v in /dev/video*; do v4l2-ctl -d $v --info 2>/dev/null | grep "Card type" | head -1 | awk -v dev=$v "{print dev,\$0}"; done'

# vainfo (auto-detects + opens both decoders since iter38)
ssh fresnel 'env LIBVA_DRIVER_NAME=v4l2_request \
    LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
    vainfo'

# kdirect reference (works for any codec; hwaccel auto-routes)
ssh fresnel 'ffmpeg -hide_banner -loglevel error -y -hwaccel v4l2request -hwaccel_output_format drm_prime \
    -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
    -vf "hwdownload,format=nv12" -frames:v 10 -f rawvideo -pix_fmt nv12 /tmp/y.yuv'

# Force single-device mode (skip iter38 alt-probe)
env LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 ...

# Reboot fresnel (sddm autologin reseats mfritsche)
ssh fresnel 'sudo systemctl reboot'; sleep 60

Safe vs needs-confirmation actions

Safe (no confirmation needed):

  • Read/grep on noether, boltzmann, fresnel
  • Push to gitea (claude-noether identity)
  • Reboot fresnel (sddm autologin restores session)
  • Build kernel on boltzmann via makepkg -ef --skipinteg --noconfirm
  • Deploy kernel via scp + sudo pacman -U
  • Run ffmpeg/cmp tests on fresnel

Needs confirmation:

  • Significant rebuild (~25-30 min CPU on boltzmann, e.g. ffmpeg full rebuild or fresh kernel build)
  • Per-context pool refactor (item 1 — would allow simultaneous mixed-codec decode but is invasive)
  • Sub-profile rollout (item 2)