86a28d2a3bf56fca420e57dd3576134eae4875a1
Validates marfrit-packages patch 0016 (PR #106) end-to-end against the daedalus_decode_h264 CLI. The callback fires once per macroblock in coded order; this PR checks the count and uniqueness invariants WITHOUT yet driving daedalus-decoder differently (that's PR-A3).

Infrastructure landed
---------------------
CMake gains a DAEDALUS_FFMPEG_PREFIX option pointing at a private FFmpeg install carrying patch 0016. When set, the CLI links against it (static .a's from $prefix/lib) and the inspection codepath is compiled in (DAEDALUS_HAVE_H264_MB_INSPECT_CB). When unset, the CLI falls back to the pkg-config-discovered system FFmpeg and behaves as PR-A1b did (identity-passthrough only, no callback).

The H264Context struct stays opaque (forward declaration only; its real definition lives in libavcodec's internal h264dec.h, which isn't installed). Real per-MB state extraction (sl->mb coeffs, mb_type, intra modes, deblock params) will land in PR-A3 alongside an internal-header include path.

The callback's only job in this PR: assert (mb_x, mb_y) lies in the coded grid, mark "seen" in a per-frame bitmap, count invocations. At end-of-frame: assert seen-count == mb_w*mb_h, 0 duplicates, 0 out-of-bounds.

Per-frame mb-grid init goes BEFORE the first avcodec_send_packet: callbacks fire from inside send_packet, before the first receive_frame ever returns, so lazy init from AVFrame would miss all of frame 0. Dims come from codecpar->width/height rounded up to a multiple of 16 (H.264 codes 1080 display as 1088 coded).

Raster-order check considered but dropped: libavcodec uses MB-level threading in some configs, so callbacks can fire out of raster order. The contract is "each MB exactly once", not "in raster order"; the bitmap check captures that.
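The per-frame check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the MbGrid type and the mb_grid_* helpers are hypothetical names, the sketch uses one byte per macroblock instead of a packed bitmap for simplicity, and it omits the libavcodec callback registration that patch 0016 provides.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-frame tracker; names are illustrative, not from the PR. */
typedef struct MbGrid {
    int mb_w, mb_h;        /* coded grid size in macroblocks */
    uint8_t *seen;         /* one byte per MB: 0 = not seen, 1 = seen */
    unsigned invocations;  /* total callback calls this frame */
    unsigned duplicates;   /* in-grid MBs reported more than once */
    unsigned oob;          /* calls with (mb_x, mb_y) outside the grid */
} MbGrid;

/* Size the grid from display dims rounded up to a multiple of 16.
 * Must run before the first packet is sent. Returns 0 on success. */
static int mb_grid_init(MbGrid *g, int width, int height)
{
    memset(g, 0, sizeof(*g));
    if (width <= 0 || height <= 0)
        return -1;
    g->mb_w = (width + 15) / 16;
    g->mb_h = (height + 15) / 16;
    g->seen = calloc((size_t)g->mb_w * (size_t)g->mb_h, 1);
    return g->seen ? 0 : -1;
}

/* Body of the per-MB callback: bounds check, mark, count. */
static void mb_grid_mark(MbGrid *g, int mb_x, int mb_y)
{
    g->invocations++;
    if (mb_x < 0 || mb_x >= g->mb_w || mb_y < 0 || mb_y >= g->mb_h) {
        g->oob++;
        return;
    }
    uint8_t *cell = &g->seen[(size_t)mb_y * (size_t)g->mb_w + (size_t)mb_x];
    if (*cell)
        g->duplicates++;
    else
        *cell = 1;
}

/* End-of-frame: number of MBs that were never reported. */
static unsigned mb_grid_missing(const MbGrid *g)
{
    unsigned missing = 0;
    size_t n = (size_t)g->mb_w * (size_t)g->mb_h;
    for (size_t i = 0; i < n; i++)
        missing += (g->seen[i] == 0);
    return missing;
}

/* Clear per-frame state so the same grid can track the next frame. */
static void mb_grid_reset(MbGrid *g)
{
    memset(g->seen, 0, (size_t)g->mb_w * (size_t)g->mb_h);
    g->invocations = g->duplicates = g->oob = 0;
}

static void mb_grid_free(MbGrid *g)
{
    free(g->seen);
    g->seen = NULL;
}
```

Because the check only records which cells were hit, it is independent of the order in which callbacks arrive, which matches the "each MB exactly once" contract. A frame passes when mb_grid_missing() returns 0 and both duplicates and oob are 0.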
Result on hertz (Pi 5, patched FFmpeg at /tmp/ffmpeg-inspect-prefix)
-------------------------------------------------------------------
320x240 I-only, 3 frames:
  mb-grid 20x15
  callback invocations: 900 (= 3 * 300)
  missing/duplicates/oob: 0/0/0
  identity-passthrough Y diff 0/230400, UV diff 0/115200
  PASS

1920x1088 I-only, 3 frames:
  mb-grid 120x68
  callback invocations: 24480 (= 3 * 8160)
  missing/duplicates/oob: 0/0/0
  identity-passthrough Y diff 0/6266880, UV diff 0/3133440
  PASS

Followups
---------
- PR-A3: include libavcodec/h264dec.h via -I to access H264Context internals; extract sl->mb coefficients in the callback, compute P = pre-deblock pixels - IDCT(C) using a transcribed C reference; feed daedalus_decoder with REAL (P, C, edges) instead of identity. Use avctx->skip_loop_filter = AVDISCARD_ALL to make libavcodec output pre-deblock so the subtraction is exact.
- PR-A4 onwards: extend to P/B frames + chroma DC + intra prediction coverage.
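The totals reported in the results can be re-derived from the clip dimensions. The sketch below assumes that the Y and UV denominators are sample counts summed over all three frames and that the clips are 4:2:0 with even dimensions; both are inferences from the numbers, not something the log states. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Macroblocks along one axis: pixels rounded up to a multiple of 16. */
static uint32_t mb_count(uint32_t pixels)
{
    return (pixels + 15u) / 16u;
}

/* Expected callback invocations: one per macroblock per frame. */
static uint64_t expected_callbacks(uint32_t w, uint32_t h, uint32_t frames)
{
    return (uint64_t)mb_count(w) * mb_count(h) * frames;
}

/* Luma samples compared across all frames. */
static uint64_t luma_samples(uint32_t w, uint32_t h, uint32_t frames)
{
    return (uint64_t)w * h * frames;
}

/* Chroma samples across all frames, assuming 4:2:0:
 * two planes, each half resolution in both dimensions. */
static uint64_t chroma_samples_420(uint32_t w, uint32_t h, uint32_t frames)
{
    return 2u * (uint64_t)(w / 2u) * (h / 2u) * frames;
}
```

Under those assumptions, 320x240 over 3 frames gives 900 callbacks, 230400 luma samples and 115200 chroma samples, and 1920x1088 gives 24480, 6266880 and 3133440, matching the figures in the log.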
daedalus-decoder
Frame-level GPU H.264 decoder for Raspberry Pi 5 / V3D7. Design phase — not implemented yet.
The objective: build the NVDEC-equivalent shape on Pi 5. One Vulkan submit per frame, one fence wait per frame, encoded H.264 bitstream in, NV12 frame out. Reuses daedalus-fourier's V3D compute primitives at the right granularity — not the per-block-call granularity that the kernel-substitution prototype exposed as architecturally wrong.
Sibling projects:
- daedalus-fourier — V3D + NEON kernel pack (IDCT, MC, deblock primitives). Stays as research/microbench artifact.
- daedalus-v4l2 — V4L2 stateless decoder shim + userspace daemon for Pi 5. The eventual consumer of this decoder.
- libva-v4l2-request-fourier — VAAPI ↔ V4L2 stateless bridge. End consumer.
See DESIGN.md for the architecture sketch.
Description
Frame-level GPU H.264 decoder for Raspberry Pi 5 V3D7. NVDEC-shaped pipeline (encoded bitstream in, NV12 out, one Vulkan submit per frame) built on daedalus-fourier's V3D compute primitives. Phase 1 design exploration.