daedalus-decoder

Author	SHA1	Message	Date
claude-noether	1d651c95ec	wip: PR-A6 deblock validation	2026-05-26 11:47:09 +02:00
claude-noether	44e92fa3dc	Stage 2 PR-A3b: real H.264 coefficients through daedalus-decoder, byte-exact

Final option-A deliverable. CLI now extracts real per-MB coefficients from libavcodec via the inspection callback + side-buffer (marfrit-packages 0016 + 0017), reconstructs the pre-residual predicted samples P via inverse-of-IDCT-add, and feeds daedalus-decoder with real (P, C, no edges).

Daedalus output BYTE-EXACT against libavcodec's pre-deblock AVFrame across 5 frames at 320x240 and 3 frames at 1920x1088, all three substrates (auto / cpu / qpu).

Path summary
------------
  avctx->thread_count = 1 (single-threaded decode; 0017's side buffer is per-H264Context, so multi-threaded decode would race)
  avctx->skip_loop_filter = AVDISCARD_ALL (AVFrame stays pre-deblock so the P-recovery subtraction is exact)
  ff_h264_set_mb_inspect_cb (registers the callback)

  Inspection callback (per MB, fires post-hl_decode_mb):
    - Gate on IS_INTRA4x4 && !IS_8x8DCT && !IS_INTRA_PCM (skipped MBs fall back to identity-passthrough in the main loop)
    - Snapshot pre-deblock pixels from h->cur_pic.f->data[0]
    - Read coefficients from h->mb_inspect_coeffs (= sl->mb copy, the 0017 side buffer)
    - For each 4x4 block (16/MB in raster order, indexed via raster_to_zscan[] to find its slot in the z-scan-ordered side buffer): compute IDCT(C) using a transcribed H.264 C reference, derive P = clip(pre_deblock - ((IDCT + 32) >> 6))
    - Stash per-MB capture (P + C) for the main loop

  Main loop:
    - Default identity-passthrough (predicted = AVFrame pixels, coeffs = 0)
    - For real-coeffs-valid MBs: override luma with captured P + C
    - flush_frame, byte-exact compare against AVFrame

A diagnostic also asserts (silently when passing) that the callback's pre_deblock snapshot equals AVFrame at each real-coeffs MB position, i.e. h->cur_pic.f IS the eventual AVFrame buffer under skip_loop_filter=AVDISCARD_ALL with thread_count=1.

Bug hunted in this PR
---------------------
Initial implementation transposed the coefficients from row-major (sl->mb) to "column-major" (the layout that daedalus_decoder.h's mb_input.coeffs docstring describes). This caused ~0.2% Y pixel divergence on real streams (~150/frame at 320x240).

Root cause identified via a standalone /tmp/idct_compare.c harness running daedalus's C ref IDCT and FFmpeg's reference C IDCT on identical int16[16] inputs: outputs IDENTICAL. The two functions implement the spec H.264 IDCT on the array regardless of layout interpretation; the "column-major" label is decoration. Removed the transpose; PR is now byte-exact.

Follow-up task #184: clarify daedalus_decoder.h's mb_input.coeffs docstring so future integrators don't repeat this transpose mistake.

Result on hertz (Pi 5 V3D 7.1)
------------------------------
testsrc2 I-only via libx264 -bf 0 -g 1:
  320x240, 5 frames, substrate=auto: Y diff 0/76800, UV diff 0/38400 PASS
  320x240, 5 frames, substrate=cpu: Y diff 0/76800, UV diff 0/38400 PASS
  320x240, 5 frames, substrate=qpu: Y diff 0/76800, UV diff 0/38400 PASS
  1920x1088, 3 frames, substrate=auto: Y diff 0/2088960, UV diff 0/1044480 PASS

Real-coeffs path engaged for 77-95 MBs per 320x240 frame and 598-643 MBs per 1080p frame (testsrc2 is mostly flat → many Intra_16x16 MBs that fall back to identity passthrough; richer content streams would engage real-coeffs more).

Followups
---------
- PR-A4: extend the gate to Intra_16x16 (chroma DC Hadamard + Intra_16x16 luma DC Hadamard pre-pass); currently ~30-60% of MBs fall back to identity-passthrough due to this.
- PR-A5: extend to 8x8 transform (separate IDCT 8x8 dispatch path on the daedalus-decoder side, similar plumbing).
- PR-A6: enable libavcodec's deblock (skip_loop_filter=AVDISCARD_NONE) and have daedalus's deblock produce the post-deblock output that matches AVFrame. Closes the loop on the full I-only pipeline.
- Task #184: daedalus_decoder.h coeffs docstring clarification.	2026-05-26 11:19:11 +02:00
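The P-recovery step described in the PR-A3b commit above (P = clip(pre_deblock - ((IDCT + 32) >> 6))) can be sketched in C roughly as follows. This is an illustrative sketch only, not the code from daedalus-decoder or libavcodec: the function names `idct4x4_residual`, `recover_predicted`, and `clip_u8` are hypothetical, and the flat 16-entry indexing is an assumption. The butterflies follow the standard H.264 4x4 inverse-transform structure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp an int to the 8-bit sample range. */
uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Sketch of the H.264 4x4 inverse transform. Writes the rounded residual
 * ((x + 32) >> 6) for each of the 16 samples into r[]. Relies on arithmetic
 * right shift of negative ints, as mainstream compilers provide. */
void idct4x4_residual(const int16_t c[16], int r[16])
{
    int t[16];
    assert(c && r);
    for (int i = 0; i < 4; i++) {            /* first 1-D pass, stride 4 */
        int z0 = c[i] + c[i + 8];
        int z1 = c[i] - c[i + 8];
        int z2 = (c[i + 4] >> 1) - c[i + 12];
        int z3 = c[i + 4] + (c[i + 12] >> 1);
        t[i]      = z0 + z3;
        t[i + 4]  = z1 + z2;
        t[i + 8]  = z1 - z2;
        t[i + 12] = z0 - z3;
    }
    for (int i = 0; i < 4; i++) {            /* second 1-D pass, stride 1 */
        const int *q = &t[4 * i];
        int z0 = q[0] + q[2];
        int z1 = q[0] - q[2];
        int z2 = (q[1] >> 1) - q[3];
        int z3 = q[1] + (q[3] >> 1);
        r[4 * i + 0] = (z0 + z3 + 32) >> 6;
        r[4 * i + 1] = (z1 + z2 + 32) >> 6;
        r[4 * i + 2] = (z1 - z2 + 32) >> 6;
        r[4 * i + 3] = (z0 - z3 + 32) >> 6;
    }
}

/* Recover predicted samples from reconstructed (pre-deblock) samples:
 * P = clip(recon - residual). */
void recover_predicted(const uint8_t recon[16], const int16_t c[16],
                       uint8_t p[16])
{
    int r[16];
    assert(recon && p);
    idct4x4_residual(c, r);
    for (int i = 0; i < 16; i++)
        p[i] = clip_u8((int)recon[i] - r[i]);
}
```

Under these assumptions, a DC-only block with c[0] = 640 yields a residual of 10 at every sample, so reconstructed samples of 100 recover P = 90. When the decoder's own add was clipped, the recovered P may differ from the decoder's internal prediction, but clip(P + residual) still reproduces the reconstructed sample.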
claude-noether	86a28d2a3b	Stage 2 PR-A2: per-MB inspection callback wiring + invariant checks

Validates marfrit-packages patch 0016 (PR #106) end-to-end against the daedalus_decode_h264 CLI. Callback fires once per macroblock in coded order; this PR checks the count + uniqueness invariants WITHOUT yet driving daedalus-decoder differently (that's PR-A3).

Infrastructure landed
---------------------
CMake gains DAEDALUS_FFMPEG_PREFIX option pointing at a private FFmpeg install carrying patch 0016. When set, the CLI links against it (static .a's from $prefix/lib) and the inspection codepath is compiled in (DAEDALUS_HAVE_H264_MB_INSPECT_CB). When unset, the CLI falls back to the pkg-config-discovered system FFmpeg and behaves as PR-A1b did (identity-passthrough only, no callback).

The H264Context struct stays opaque (forward-decl only; its real definition lives in libavcodec's internal h264dec.h which isn't installed). Real per-MB state extraction (sl->mb coeffs, mb_type, intra modes, deblock params) will land in PR-A3 alongside an internal-header include path.

The callback's only job in this PR: assert (mb_x, mb_y) lies in the coded grid, mark "seen" in a per-frame bitmap, count invocations. At end-of-frame: assert seen-count == mb_w * mb_h, 0 duplicates, 0 out-of-bounds.

Per-frame mb-grid init goes BEFORE first avcodec_send_packet (callbacks fire from inside send_packet, before the first receive_frame ever returns; lazy init from AVFrame would miss all of frame 0). Dims come from codecpar->width/height rounded up to 16-mod (H.264 codes 1080 display as 1088 coded).

Raster-order check considered but dropped: libavcodec uses MB-level threading in some configs so callbacks fire out of raster order. The contract is "each MB exactly once", not "in raster order"; the bitmap check captures that.

Result on hertz (Pi 5, patched FFmpeg at /tmp/ffmpeg-inspect-prefix)
-------------------------------------------------------------------
320x240 I-only, 3 frames:
  mb-grid 20x15
  callback invocations: 900 (= 3 * 300)
  missing/duplicates/oob: 0/0/0
  identity-passthrough Y diff 0/230400, UV diff 0/115200 PASS

1920x1088 I-only, 3 frames:
  mb-grid 120x68
  callback invocations: 24480 (= 3 * 8160)
  missing/duplicates/oob: 0/0/0
  identity-passthrough Y diff 0/6266880, UV diff 0/3133440 PASS

Followups
---------
- PR-A3: include libavcodec/h264dec.h via -I to access H264Context internals; extract sl->mb coefficients in the callback, compute P = pre-deblock pixels - IDCT(C) using a transcribed C reference; feed daedalus_decoder with REAL (P, C, edges) instead of identity. Use avctx->skip_loop_filter = AVDISCARD_ALL to make libavcodec output pre-deblock so the subtraction is exact.
- PR-A4 onwards: extend to P/B frames + chroma DC + intra prediction coverage.	2026-05-26 07:06:31 +02:00
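The "each MB exactly once" contract from the PR-A2 commit above can be sketched as a small seen-bitmap in C. This is a hedged illustration, not the CLI's actual code: the `mb_grid` type and the `mb_grid_*` function names are hypothetical, and the sketch assumes the callback simply hands over (mb_x, mb_y).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-frame bookkeeping for the inspection callback. */
typedef struct {
    int mb_w, mb_h;            /* coded grid size in macroblocks */
    uint8_t *seen;             /* one flag per macroblock */
    unsigned calls, dups, oob; /* invocation / duplicate / out-of-bounds counts */
} mb_grid;

/* Size the grid from display dimensions rounded up to a multiple of 16
 * (e.g. 1080 display rows become 1088 coded rows, i.e. 68 MB rows). */
int mb_grid_init(mb_grid *g, int width, int height)
{
    assert(g && width > 0 && height > 0);
    g->mb_w = (width + 15) / 16;
    g->mb_h = (height + 15) / 16;
    g->calls = g->dups = g->oob = 0;
    g->seen = calloc((size_t)g->mb_w * (size_t)g->mb_h, 1);
    return g->seen ? 0 : -1;
}

/* Called once per callback invocation; order of calls does not matter. */
void mb_grid_mark(mb_grid *g, int mb_x, int mb_y)
{
    g->calls++;
    if (mb_x < 0 || mb_x >= g->mb_w || mb_y < 0 || mb_y >= g->mb_h) {
        g->oob++;
        return;
    }
    uint8_t *s = &g->seen[(size_t)mb_y * (size_t)g->mb_w + (size_t)mb_x];
    if (*s)
        g->dups++;
    *s = 1;
}

/* End-of-frame check: how many macroblocks were never reported. */
unsigned mb_grid_missing(const mb_grid *g)
{
    unsigned missing = 0;
    for (int i = 0; i < g->mb_w * g->mb_h; i++)
        missing += !g->seen[i];
    return missing;
}

void mb_grid_free(mb_grid *g)
{
    free(g->seen);
    g->seen = NULL;
}
```

A frame passes when `mb_grid_missing` returns 0 and both `dups` and `oob` are 0. Because the bitmap is order-independent, the check holds even if callbacks arrive out of raster order.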
claude-noether	56f8498057	Stage 2 PR-A1b: tools/daedalus_decode_h264 (H.264 standalone test harness)

Option A's standalone end-to-end gate against real H.264 streams. First iteration: identity-passthrough validation. daedalus-decoder produces output byte-exact to libavcodec's AVFrame when fed the reconstructed pixels as `predicted`, zero coeffs, no deblock edges.

Validates: daedalus-decoder data path (append_mb + flush_frame + NV12 output + coded-vs-display dim handling) at real-stream frame sizes (320x240 and 1920x1088) with real H.264-decoded predicted-sample distributions, not the random patterns the existing test_idct_bitexact + test_deblock_smoke synthesize.

Identity-passthrough math:
  - mb_input.predicted = AVFrame pixels at MB raster position
  - mb_input.coeffs = 384 int16's, all zero
  - mb_input.edges = NULL, n_edges = 0

  flush_frame:
    scratch_y/_uv pre-fill from predicted (= AVFrame pixels)
    IDCT dispatches with all-zero coeffs add 0 (no-op compute)
    No deblock dispatches (no edges)
    copy-out → caller's NV12 planes

  Result MUST equal AVFrame pixels byte-for-byte.

Build
-----
New cmake option DAEDALUS_BUILD_TOOLS (default OFF). When enabled, pkg-checks libavcodec / libavformat / libavutil and builds the daedalus_decode_h264 binary against the system FFmpeg.

Stock libavcodec is sufficient for THIS PR (identity passthrough reads from AVFrame after avcodec_receive_frame; no per-MB internal state extraction needed). Follow-up PRs (A2+) will use the per-MB inspection callback added in marfrit-packages patch 0016 (PR #106) to feed REAL per-MB state (pre-residual predicted samples, residual coeffs, deblock edges) for actual non-trivial daedalus-decoder validation.

Usage
-----
  daedalus_decode_h264 [--substrate cpu|qpu|auto] [--max-frames N] <input.h264> <output_dadec.yuv> <output_ref.yuv>

  Exit codes:
    0 = byte-exact match across all frames
    1 = argument / setup error
    2 = decode error from libavcodec
    3 = daedalus-decoder error (ctx, append, flush)
    4 = bit-exact comparison failed

Result on hertz (Pi 5 V3D 7.1)
------------------------------
I-only test clip via ffmpeg testsrc2 + libx264 -bf 0 -g 1:

  320x240, 5 frames:
    substrate=auto: Y diff 0/76800 UV diff 0/38400 PASS
    substrate=cpu: Y diff 0/76800 UV diff 0/38400 PASS
    substrate=qpu: Y diff 0/76800 UV diff 0/38400 PASS

  1920x1088 (coded; 1080 display), 3 frames:
    substrate=auto: Y diff 0/2088960 UV diff 0/1044480 PASS

Followups
---------
- PR-A2: wire the per-MB inspection callback (marfrit-packages 0016) so per-MB state, i.e. coeffs (sl->mb), predicted-before-residual (from prediction kernels), bS/alpha/beta, flows into mb_input instead of zeros, and IDCT / deblock dispatches do real GPU work. At that point we're decoding real H.264 streams through daedalus-decoder for real.
- PR-A3: extend to P/B frames once MC dispatch lands.	2026-05-26 06:12:51 +02:00
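The byte-exact comparison behind the "Y diff 0/76800" lines in the PR-A1b commit above can be sketched as a strided plane diff in C. This is an assumption-laden illustration rather than the tool's implementation: `plane_diff`, `exit_code_for_diff`, and the `DAEDALUS_EXIT_*` names are hypothetical. Only the numeric exit-code values come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exit-code values as listed in the commit message; the names are invented. */
enum {
    DAEDALUS_EXIT_MATCH    = 0, /* byte-exact match across all frames */
    DAEDALUS_EXIT_SETUP    = 1, /* argument / setup error */
    DAEDALUS_EXIT_DECODE   = 2, /* decode error from libavcodec */
    DAEDALUS_EXIT_DAEDALUS = 3, /* daedalus-decoder error */
    DAEDALUS_EXIT_MISMATCH = 4  /* bit-exact comparison failed */
};

/* Count differing bytes between two planes of w x h visible samples.
 * Each plane has its own stride, so padding bytes are never compared. */
size_t plane_diff(const uint8_t *a, size_t a_stride,
                  const uint8_t *b, size_t b_stride, int w, int h)
{
    size_t diff = 0;
    assert(a && b && w >= 0 && h >= 0);
    for (int y = 0; y < h; y++) {
        const uint8_t *ra = a + (size_t)y * a_stride;
        const uint8_t *rb = b + (size_t)y * b_stride;
        for (int x = 0; x < w; x++)
            diff += (ra[x] != rb[x]);
    }
    return diff;
}

/* Map accumulated luma/chroma diff counts to a process exit code. */
int exit_code_for_diff(size_t y_diff, size_t uv_diff)
{
    return (y_diff == 0 && uv_diff == 0) ? DAEDALUS_EXIT_MATCH
                                         : DAEDALUS_EXIT_MISMATCH;
}
```

For a 320x240 frame the luma plane contributes 320 * 240 = 76800 compared bytes, which matches the denominator in the reported results. Comparing row by row with separate strides keeps AVFrame line padding out of the count.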

4 Commits