Pre-Compact Handoff — Session 2026-05-14

Use this doc to resume the fresnel-fourier campaign after Claude context compaction.

TL;DR (read first)

  • Bug 4 (H.264 keyframe-partial): FIXED — H.264 10F byte-equal to SW reference.
  • Bug 5 (HEVC libva all-zero): PARTIAL — frame 1 byte-equal to SW; frame 2+ diverges (separate ffmpeg-vaapi slice_data inflation bug, deferred).
  • VP9: unchanged (HW=SW byte-equal).
  • MPEG-2 / VP8: untestable through libva on current kernel boot (pre-existing libva single-device profile-probe limitation; auto-select picks rkvdec which doesn't expose those profiles).
  • Root cause identified after 6 kernel-printk iterations: rkvdec_s_ctrl returns -EBUSY when first SPS triggers image_fmt reset on a busy CAPTURE queue. Fixed by synthetic SPS injection at libva CreateContext.

Substrate state (where things live)

| Component | Location | Tip |
|---|---|---|
| Campaign repo (this) | /home/mfritsche/src/fresnel-fourier/ | c15fc6c on gitea master |
| Libva backend fork (noether) | /home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/ | 6646b16 on gitea master |
| Libva backend (fresnel deploy) | /home/mfritsche/src/libva-v4l2-request-fourier/ | sync to gitea master, run ninja -C build |
| Kernel source (boltzmann) | ~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/ | pkgrel=9 with iter17/20/21/22/23/27 diag printks |
| Kernel running on fresnel | linux-fresnel-fourier 7.0-9 | diagnostic build; revert to clean 7.0-X before any production work |
| Test fixtures (fresnel) | /home/mfritsche/fourier-test/bbb_*.{mp4,ts,webm} | 5 codecs at 720p10s or 1080p30 |
| OUTPUT-buffer dumps (fresnel) | /tmp/out_dump/output_*.bin | from α-16 env LIBVA_V4L2_DUMP_OUTPUT |
| Memory | /home/mfritsche/.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/ | feedback_rkvdec_image_fmt_pre_seed.md is the key new entry |

Identity for gitea pushes

All git.reauktion.de interactions use claude-noether identity (per memory feedback_gitea_as_claude_noether.md). Backend remote URL: ssh://gitea@git.reauktion.de.claude-noether/marfrit/libva-v4l2-request-fourier.git.

Backend commits delivered this session

6646b16 Revert iter28b DIAG: trim=40 universal-trim breaks IDR frame 1
c555788 iter28b DIAG: env-gated trim of HEVC slice_data trailing N bytes (reverted)
cd286d9 iter28 α-28: bit_size = (slice_data_size - data_byte_offset) * 8 for HEVC (no-op, rkvdec ignores)
754be1d iter27 diag: env-gated VAAPI slice fields dump
c9bfa21 iter27: remove request_log diag
719d813 iter27 α-27: populate slice_params.num_entry_point_offsets from VAAPI (no-op)
66ef848 iter26 α-26: populate decode_params.short_term_ref_pic_set_size from VAAPI st_rps_bits
d062fec iter25 α-25 fix: add FRAME_MBS_ONLY to H264 dummy SPS
db0b7f9 iter25 α-25: inject synthetic SPS before cap_pool_init to seed image_fmt   ← THE FIX

Campaign repo commits delivered

c15fc6c iter28b DIAG documented: universal trim=40 breaks IDR (reverted)
8b17bf7 Final session summary: H264 + VP9 + HEVC frame 1 byte-equal to SW
02c4192 iter27/28: probe HEVC frame 2+ divergence; α-27/α-28 no-op; ffmpeg-vaapi slice_data inflation localized
bf67900 iter20-26: kernel-side root-cause localization, α-25/α-26 fix Bug 4, partial Bug 5

Phase docs (chronological): phase4_iter21_plan.md, phase4_iter22_plan.md, phase8_iteration20_close.md through phase8_iteration27_close.md, CAMPAIGN_SESSION_2026_05_14.md.

How to verify the current state

Run on fresnel after git pull + ninja -C build in ~/src/libva-v4l2-request-fourier:

# H.264 — should be byte-equal to SW (Bug 4 fixed)
env LIBVA_DRIVER_NAME=v4l2_request \
    LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
    ffmpeg -hide_banner -loglevel error -y -hwaccel vaapi -hwaccel_output_format vaapi \
    -i /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 \
    -vf "hwdownload,format=nv12,crop=1920:1080:0:0" \
    -frames:v 10 -f rawvideo -pix_fmt nv12 /tmp/libva_h264.yuv
ffmpeg -hide_banner -loglevel error -y \
    -i /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 \
    -frames:v 10 -f rawvideo -pix_fmt nv12 /tmp/sw_h264.yuv
cmp /tmp/libva_h264.yuv /tmp/sw_h264.yuv  # SHOULD print nothing (equal)

# HEVC — frame 1 byte-equal, frames 2+ differ
env LIBVA_DRIVER_NAME=v4l2_request \
    LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
    ffmpeg -hide_banner -loglevel error -y -hwaccel vaapi -hwaccel_output_format vaapi \
    -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
    -vf "hwdownload,format=nv12,crop=1280:720:0:0" \
    -frames:v 1 -f rawvideo -pix_fmt nv12 /tmp/libva_hevc1.yuv
ffmpeg -hide_banner -loglevel error -y \
    -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
    -frames:v 1 -f rawvideo -pix_fmt nv12 /tmp/sw_hevc1.yuv
cmp /tmp/libva_hevc1.yuv /tmp/sw_hevc1.yuv  # SHOULD print nothing
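
When a multi-frame comparison fails, cmp reports only a byte offset (1-based). The helper below is a minimal sketch, not part of the campaign repo, that maps the first differing byte to a frame index, assuming tightly packed NV12 output as produced by the commands above. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes in one NV12 frame: a full-resolution Y plane plus a half-height
 * interleaved UV plane, i.e. width * height * 3 / 2. */
static size_t nv12_frame_size(unsigned width, unsigned height)
{
    return (size_t)width * height * 3 / 2;
}

/* Return the 1-based index of the first frame in which two raw NV12
 * streams differ, or 0 if the first `len` bytes are identical.
 * Byte indices here are 0-based (cmp prints 1-based offsets). */
static size_t first_diff_frame(const uint8_t *hw, const uint8_t *sw,
                               size_t len, unsigned width, unsigned height)
{
    size_t frame = nv12_frame_size(width, height);

    if (frame == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        if (hw[i] != sw[i])
            return i / frame + 1;
    }
    return 0;
}
```

For the 720p HEVC fixture one frame is 1382400 bytes, so a first difference at or beyond that offset means frame 1 is still byte-equal and the divergence starts at frame 2 or later.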

Root cause (saved to memory)

rkvdec_s_ctrl on first HEVC_SPS / H264_SPS:

image_fmt = desc->ops->get_image_fmt(ctx, ctrl);
if (rkvdec_image_fmt_changed(ctx, image_fmt)) {
    vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE);
    if (vb2_is_busy(vq))
        return -EBUSY;     // ← THIS
    ctx->image_fmt = image_fmt;
    rkvdec_reset_decoded_fmt(ctx);
}

ctx->image_fmt defaults to RKVDEC_IMG_FMT_ANY at open. The first per-frame SPS resolves to a concrete value (e.g., RKVDEC_IMG_FMT_420_8BIT); since ANY ≠ concrete, rkvdec_image_fmt_changed returns true and rkvdec_s_ctrl attempts the format reset. vb2_is_busy returns true (libva pre-allocated 24 CAPTURE buffers at CreateContext), so rkvdec_s_ctrl returns -EBUSY and the control setup loop breaks. The SPS is therefore never committed to ctx->ctrl_hdl, rkvdec_hevc_run reads a zeroed SPS, and the CAPTURE buffer comes back all-zero.
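
The ordering dependence can be reproduced in a small userspace model. This is a sketch under simplifying assumptions, not kernel code: the struct, enum, and function names are invented for illustration, and the format comparison is reduced to a plain inequality.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for the driver state (hypothetical names). */
enum img_fmt { IMG_FMT_ANY = 0, IMG_FMT_420_8BIT };

struct model_ctx {
    enum img_fmt image_fmt;   /* IMG_FMT_ANY right after open */
    bool capture_busy;        /* true once CAPTURE buffers are allocated */
    bool sps_committed;       /* true once an SPS was accepted */
};

/* Models the s_ctrl check: a format change on a busy CAPTURE queue fails. */
static int model_s_ctrl_sps(struct model_ctx *ctx, enum img_fmt sps_fmt)
{
    if (ctx->image_fmt != sps_fmt) {
        if (ctx->capture_busy)
            return -EBUSY;
        ctx->image_fmt = sps_fmt;
    }
    ctx->sps_committed = true;
    return 0;
}

/* Returns the result of the first per-frame SPS. When sps_before_pool is
 * set, one SPS is submitted while the CAPTURE queue is still empty. */
static int model_first_frame_sps(bool sps_before_pool)
{
    struct model_ctx ctx = { .image_fmt = IMG_FMT_ANY };

    if (sps_before_pool)
        model_s_ctrl_sps(&ctx, IMG_FMT_420_8BIT);
    ctx.capture_busy = true;   /* CAPTURE pool allocated at CreateContext */
    return model_s_ctrl_sps(&ctx, IMG_FMT_420_8BIT);
}
```

In this model the first per-frame SPS fails with -EBUSY only when no SPS reached the driver before the CAPTURE pool was allocated.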

The fix (α-25)

In src/context.c::RequestCreateContext, BEFORE cap_pool_init (line ~215), inject one synthetic SPS via non-request v4l2_set_controls:

  • VAProfileHEVCMain: synthetic v4l2_ctrl_hevc_sps with chroma_format_idc=1 (4:2:0), bit_depth=0 (8-bit), width/height from caller.
  • VAProfileH264*: synthetic v4l2_ctrl_h264_sps with same chroma+bit_depth + V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY flag set (else rkvdec_h264_validate_sps doubles height and rejects).

At this point CAPTURE is empty → vb2_is_busy=false → rkvdec_s_ctrl succeeds → ctx->image_fmt = RKVDEC_IMG_FMT_420_8BIT. From then on per-frame SPS finds image_fmt_changed=false → skip reset → commits successfully.
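
The FRAME_MBS_ONLY requirement on the H.264 dummy SPS is easiest to see with numbers. The function below is a model of the size check as described above (height doubled when the flag is clear, then compared against the coded format); it is an assumption about the shape of rkvdec_h264_validate_sps, not a copy of it, and the parameter names coded_w and coded_h are illustrative.

```c
#include <errno.h>
#include <stdbool.h>

/* Model of the SPS size check. Field names follow the H.264 SPS syntax;
 * coded_w and coded_h stand for the coded format size on the context. */
static int model_validate_h264_sps(unsigned pic_width_in_mbs_minus1,
                                   unsigned pic_height_in_map_units_minus1,
                                   bool frame_mbs_only,
                                   unsigned coded_w, unsigned coded_h)
{
    unsigned width = (pic_width_in_mbs_minus1 + 1) * 16;
    unsigned height = (pic_height_in_map_units_minus1 + 1) * 16;

    /* Without the flag, map units count field-sized rows, so the frame
     * height is twice the map-unit height. */
    if (!frame_mbs_only)
        height *= 2;
    if (width > coded_w || height > coded_h)
        return -EINVAL;
    return 0;
}
```

For a 1920x1080 stream coded as 1920x1088 (120 x 68 macroblocks), the flag-set case yields height 1088 and passes, while the flag-clear case yields 2176 and is rejected.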

Source: see commit db0b7f9 for the full diff.

Open items (deferred)

1. HEVC frame 2+ divergence

For non-IDR HEVC frames, libva's slice_data_size from VAAPI is consistently 40 bytes larger than ffmpeg-v4l2request's size parameter (= the nal->raw_size value libavcodec dispatches). The 40 extra bytes inflate libva's OUTPUT buffer → rkvdec reads past the slice payload → wrong reference → frame 2+ visual garbage.

Evidence (from iter27 kernel printk):

libva frame 2 OUTPUT = 5552 bytes (3 prefix + 5549 slice_data)
kdirect frame 2 OUTPUT = 5512 bytes (3 prefix + 5509)
diff = 40 bytes per slice, P/B-frame specific (IDR is correct)

Universal trim=40 tested as iter28b → broke IDR (frame 1) which was correct → reverted. Real fix requires:

Option A: Rebuild ffmpeg with fprintf(stderr, "size=%u\n", size) at the top of v4l2_request_hevc_decode_slice (libavcodec/v4l2_request_hevc.c:564) to confirm what size libavcodec actually dispatches. The probe was added during the session, but the build was killed mid-link due to context length. To redo: the source is at /home/mfritsche/src/aur/ffmpeg-git/src/FFmpeg/ on fresnel, restored to a clean state. Add the probe, run nohup make -j4 ffmpeg > /tmp/log 2>&1 &, wait ~25 min, then run libva HEVC and read the size values libavcodec dispatches.

Option B: Write a libva-side HEVC slice trailing-bits parser to find the rbsp_stop_one_bit position dynamically. Scan slice_data buffer backwards, identify the byte containing the stop bit (pattern: data...1 0...0), trim slices_size to that position. Complicated by the fact that the 40 trailing bytes for BBB frame 2 look like real entropy data (not zeros), so simple "trim trailing zeros" doesn't work.

Option C: Patch ffmpeg-vaapi to ensure slice_data_size matches nal->raw_size from libavcodec exactly (suspect there's some internal inflation in vaapi_hevc_decode_slice/ff_vaapi_decode_make_slice_buffer path). Upstream ffmpeg work.
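
For Option B, the backward scan for rbsp_stop_one_bit might look like the sketch below. It assumes the buffer ends with the standard trailing-bits pattern (a single 1 bit followed by zero bits, then optional zero padding bytes) and does not handle emulation-prevention bytes or cabac_zero_words. As noted under Option B, the 40 extra bytes on BBB frame 2 look like entropy data, so this scan alone would likely not locate the true slice end; the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the number of bytes up to and including the byte that holds
 * rbsp_stop_one_bit, or 0 if the buffer contains no set bit.
 * Trailing all-zero bytes are treated as padding. */
static size_t rbsp_payload_bytes(const uint8_t *buf, size_t len)
{
    while (len > 0 && buf[len - 1] == 0)
        len--;
    return len;
}

/* Number of payload bits before rbsp_stop_one_bit. The stop bit is the
 * lowest set bit of the last non-zero byte. */
static size_t rbsp_payload_bits(const uint8_t *buf, size_t len)
{
    size_t bytes = rbsp_payload_bytes(buf, len);
    unsigned trailing = 0;

    if (bytes == 0)
        return 0;
    /* The last byte is non-zero, so this loop terminates within 8 steps. */
    while (((buf[bytes - 1] >> trailing) & 1) == 0)
        trailing++;
    /* All bits in the kept bytes, minus the stop bit and the zeros after it. */
    return bytes * 8 - (trailing + 1);
}
```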

2. MPEG-2 / VP8 untestable through libva on current kernel boot

Libva backend's find_codec_device (in src/request.c:427) selects ONE device for the entire session. On RK3399 with both rkvdec (/dev/media0+/dev/video1 this boot) and hantro (/dev/media1+/dev/video2+/dev/video3), the backend picks rkvdec — which exposes H264/HEVC/VP9 only, not MPEG-2/VP8.

Override with LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 to force hantro for MPEG-2/VP8 testing. But that disables H264/HEVC/VP9 simultaneously, and the unconditional HEVC DECODE_MODE/START_CODE controls libva sets at CreateContext (context.c:343-379) fail on hantro with Unable to set control(s): Invalid argument — pre-existing, not iter25 regression.

Fix would require either:

  • Libva backend multi-device probe + per-codec dispatch (~200-400 LOC, called out in phase0_findings_iter7.md).
  • Conditional codec-init controls (skip controls hantro doesn't support).
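
The first option amounts to replacing the single selected device with a per-codec lookup. The sketch below shows only that lookup; every type and function name in it is hypothetical and not taken from the backend. The example table hard-codes the device nodes observed on this boot and lists only the codecs this document attributes to each decoder, whereas a real probe would enumerate formats from the devices.

```c
#include <stddef.h>

enum codec { CODEC_MPEG2, CODEC_VP8, CODEC_H264, CODEC_HEVC, CODEC_VP9 };

/* One stateless decoder: its device nodes and the codecs it exposes. */
struct decoder_dev {
    const char *video_path;
    const char *media_path;
    unsigned codecs;   /* bitmask of (1u << enum codec) */
};

/* Illustrative table only; node numbering can change between boots. */
static const struct decoder_dev example_devs[] = {
    { "/dev/video1", "/dev/media0",   /* rkvdec */
      (1u << CODEC_H264) | (1u << CODEC_HEVC) | (1u << CODEC_VP9) },
    { "/dev/video3", "/dev/media1",   /* hantro */
      (1u << CODEC_MPEG2) | (1u << CODEC_VP8) },
};

/* Return the first device that exposes codec c, or NULL if none does. */
static const struct decoder_dev *pick_device(const struct decoder_dev *devs,
                                             size_t n, enum codec c)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].codecs & (1u << c))
            return &devs[i];
    }
    return NULL;
}
```

With a lookup of this kind, the codec-specific init controls could also be issued only on the device chosen for that codec, which overlaps with the second option.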

3. Kernel substrate cleanup

linux-fresnel-fourier 7.0-9 has 5+ accumulated pr_info diagnostic patches in drivers/media/v4l2-core/v4l2-ctrls-request.c and drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c. Before any production work, revert to clean 7.0-X (i.e., apply only the 3 PBP DTS patches + RFC v2 fence series, without diagnostics). Or just bump to 7.0-X and ship without diagnostics.

Memory entries this session

  • New: feedback_rkvdec_image_fmt_pre_seed.md — root cause + α-25 fix summary.
  • Updated: feedback_libva_byte_correct_kernel_bug.md flagged as partially overturned (the byte-correctness claim was right; the kernel-side bug claim was misleading — actual bug was libva-side CAPTURE-pool timing interacting with kernel state).

Key commands quick reference

# Sync backend on fresnel + rebuild
ssh fresnel 'cd ~/src/libva-v4l2-request-fourier && git fetch && git reset --hard origin/master && ninja -C build'

# Run libva HEVC + capture rkvdec kernel printk
ssh fresnel 'sudo dmesg -C; env LIBVA_DRIVER_NAME=v4l2_request \
    LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
    ffmpeg -hide_banner -loglevel error -y -hwaccel vaapi -hwaccel_output_format vaapi \
    -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
    -vf "hwdownload,format=nv12,crop=1280:720:0:0" -frames:v 3 -f rawvideo -pix_fmt nv12 /tmp/x.yuv;
    sudo dmesg | grep -E "rkvdec|iter2[0-9]_"'

# kdirect (ffmpeg-v4l2request) reference
ssh fresnel 'ffmpeg -hide_banner -loglevel error -y \
    -hwaccel v4l2request -hwaccel_output_format drm_prime \
    -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
    -vf "hwdownload,format=nv12,crop=1280:720:0:0" -frames:v 3 -f rawvideo -pix_fmt nv12 /tmp/y.yuv'

# Force hantro path (untested with backend, see open-item 2)
env LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 ...

# Reboot fresnel (sddm autologin reseats mfritsche per /etc/sddm.conf.d/20-autologin.conf)
ssh fresnel 'sudo systemctl reboot'; sleep 60

What's safe to do without user confirmation

  • Read/grep on noether, boltzmann, fresnel.
  • Push to gitea (claude-noether identity).
  • Reboot fresnel (sddm autologin restores session).
  • Build kernel on boltzmann via makepkg -e --noconfirm in ~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/.
  • Deploy kernel via scp + sudo pacman -U.
  • Run ffmpeg/cmp tests on fresnel.

What needs user confirmation

  • Significant ffmpeg rebuild (~25 min CPU time).
  • Reverting kernel-substrate diagnostics to ship a clean kernel.
  • Decisions on whether to invest in HEVC frame 2+ fix or MPEG-2/VP8 multi-device probe.