claude-noether 86a28d2a3b Stage 2 PR-A2: per-MB inspection callback wiring + invariant checks

Validates marfrit-packages patch 0016 (PR #106) end-to-end against
the daedalus_decode_h264 CLI.  The callback fires once per
macroblock during decode; this PR checks only the count and
uniqueness invariants, WITHOUT yet driving daedalus-decoder
differently (that's PR-A3).

Infrastructure landed
---------------------

CMake gains a DAEDALUS_FFMPEG_PREFIX option pointing at a private
FFmpeg install carrying patch 0016.  When set, the CLI links
against it (static .a archives from $prefix/lib) and the inspection
codepath is compiled in (DAEDALUS_HAVE_H264_MB_INSPECT_CB).  When
unset, the CLI falls back to the pkg-config-discovered system
FFmpeg and behaves as it did in PR-A1b (identity-passthrough only,
no callback).

The H264Context struct stays opaque (forward declaration only; its
real definition lives in libavcodec's internal h264dec.h, which
isn't installed).  Real per-MB state extraction (sl->mb coeffs,
mb_type, intra modes, deblock params) will land in PR-A3
alongside an internal-header include path.

The callback's only job in this PR: assert that (mb_x, mb_y) lies
within the coded grid, mark it "seen" in a per-frame bitmap, and
count invocations.  At end of frame: assert seen-count ==
mb_w*mb_h, 0 duplicates, 0 out-of-bounds.
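
A minimal sketch of that bookkeeping, assuming callbacks for one
frame are serialized (with threaded decode the counters would need
atomics or per-thread state).  The struct and function names are
hypothetical, not the CLI's actual code; mb_grid_visit is what the
callback body would call with its (mb_x, mb_y).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical checker state; counters accumulate across frames. */
typedef struct {
    int mb_w, mb_h;
    unsigned char *seen;   /* per-frame bitmap, one byte per MB */
    long invocations;
    long missing;
    long duplicates;
    long out_of_bounds;
} mb_grid_check;

static int mb_grid_init(mb_grid_check *g, int mb_w, int mb_h)
{
    memset(g, 0, sizeof *g);
    g->mb_w = mb_w;
    g->mb_h = mb_h;
    g->seen = calloc((size_t)mb_w * (size_t)mb_h, 1);
    return g->seen ? 0 : -1;
}

/* Callback body: count, bounds-check, then mark (mb_x, mb_y) as seen. */
static void mb_grid_visit(mb_grid_check *g, int mb_x, int mb_y)
{
    assert(g->seen != NULL);
    g->invocations++;
    if (mb_x < 0 || mb_x >= g->mb_w || mb_y < 0 || mb_y >= g->mb_h) {
        g->out_of_bounds++;
        return;
    }
    unsigned char *s = &g->seen[(size_t)mb_y * (size_t)g->mb_w + (size_t)mb_x];
    if (*s)
        g->duplicates++;
    *s = 1;
}

/* End of frame: count MBs never visited, clear the bitmap for the next frame.
 * Returns 1 if the run so far satisfies all three invariants. */
static int mb_grid_end_frame(mb_grid_check *g)
{
    size_t n = (size_t)g->mb_w * (size_t)g->mb_h;
    for (size_t i = 0; i < n; i++)
        g->missing += !g->seen[i];
    memset(g->seen, 0, n);
    return g->missing == 0 && g->duplicates == 0 && g->out_of_bounds == 0;
}
```

Since the check is "each MB exactly once", a frame visited
bottom-up passes just like one visited in raster order.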

The per-frame MB-grid init goes BEFORE the first
avcodec_send_packet: callbacks fire from inside send_packet,
before the first receive_frame ever returns, so lazy init from the
AVFrame would miss all of frame 0.  Dims come from
codecpar->width/height rounded up to a multiple of 16 (H.264 codes
a 1080-line display as 1088 coded lines).
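
A sketch of the dimension rounding, assuming progressive
(frame_mbs_only) streams; interlaced streams round the height to a
multiple of 32 instead.  The names are hypothetical.  In the CLI
the call would sit after the decoder is opened and before the
first avcodec_send_packet(), fed from codecpar->width and
codecpar->height.

```c
#include <assert.h>

/* Hypothetical helper type: coded macroblock grid. */
typedef struct {
    int mb_w, mb_h;
} mb_grid_dims;

/* Round a display dimension up to the 16-pixel macroblock size. */
static int round_up_16(int v)
{
    return (v + 15) & ~15;
}

/* Derive the coded MB grid from display width/height (e.g. codecpar->width/height).
 * Must run before the first avcodec_send_packet(): the per-MB callback fires
 * inside send_packet, before any avcodec_receive_frame() returns. */
static mb_grid_dims mb_grid_from_display(int width, int height)
{
    assert(width > 0 && height > 0);
    mb_grid_dims d = { round_up_16(width) / 16, round_up_16(height) / 16 };
    return d;
}
```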

A raster-order check was considered but dropped: in some threaded
configurations libavcodec fires the callbacks out of raster order.
The contract is "each MB exactly once", not "in raster order"; the
bitmap check captures that.

Result on hertz (Pi 5, patched FFmpeg at /tmp/ffmpeg-inspect-prefix)
-------------------------------------------------------------------

  320x240 I-only, 3 frames:
    mb-grid 20x15
    callback invocations: 900 (= 3 * 300)
    missing/duplicates/oob: 0/0/0
    identity-passthrough Y diff 0/230400, UV diff 0/115200
    PASS

  1920x1088 I-only, 3 frames:
    mb-grid 120x68
    callback invocations: 24480 (= 3 * 8160)
    missing/duplicates/oob: 0/0/0
    identity-passthrough Y diff 0/6266880, UV diff 0/3133440
    PASS
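
The denominators above follow from the coded size alone.  A
hypothetical helper (not part of the CLI) that reproduces them,
assuming 8-bit NV12 output, where the interleaved UV plane is half
the size of Y:

```c
#include <assert.h>

/* Expected totals for an N-frame 8-bit NV12 run at coded size w x h. */
typedef struct {
    long callbacks;   /* frames * (w/16) * (h/16) */
    long y_bytes;     /* frames * w * h */
    long uv_bytes;    /* frames * w * (h/2): one interleaved Cb/Cr plane */
} expected_totals;

static expected_totals expected_for(int w, int h, int frames)
{
    assert(w % 16 == 0 && h % 16 == 0);   /* coded (MB-aligned) dimensions */
    expected_totals t;
    t.callbacks = (long)frames * (w / 16) * (h / 16);
    t.y_bytes = (long)frames * w * h;
    t.uv_bytes = (long)frames * w * (h / 2);
    return t;
}
```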

Followups
---------

  - PR-A3: include libavcodec/h264dec.h via -I to access H264Context
    internals; extract sl->mb coefficients in the callback, compute
    P = pre-deblock pixels - IDCT(C) using a transcribed C reference,
    and feed daedalus_decoder with REAL (P, C, edges) instead of
    identity.  Set avctx->skip_loop_filter = AVDISCARD_ALL so that
    libavcodec outputs pre-deblock pixels and the subtraction is exact.
  - PR-A4 onwards: extend to P/B frames + chroma DC + intra prediction
    coverage.
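
A hedged sketch of the PR-A3 subtraction for a single 4x4 luma
residual block.  h264_idct4x4 transcribes the standard 4x4 inverse
transform from the H.264 spec on coefficients assumed to be already
dequantized (it is not libavcodec's code); recover_prediction and
the row-major coefficient layout are hypothetical.  8x8 transforms
and the Intra16x16/chroma DC paths are out of scope here.

```c
#include <assert.h>
#include <stdint.h>

/* H.264 4x4 inverse integer transform on dequantized coefficients,
 * row-major in[4*row + col] (ITU-T H.264 8.5.12.2). Assumes >> on negative
 * values is an arithmetic shift, as on GCC and Clang. */
static void h264_idct4x4(const int32_t in[16], int32_t out[16])
{
    int32_t tmp[16];
    for (int i = 0; i < 4; i++) {            /* horizontal pass over rows */
        const int32_t *d = &in[4 * i];
        int32_t e0 = d[0] + d[2], e1 = d[0] - d[2];
        int32_t e2 = (d[1] >> 1) - d[3], e3 = d[1] + (d[3] >> 1);
        tmp[4 * i + 0] = e0 + e3;
        tmp[4 * i + 1] = e1 + e2;
        tmp[4 * i + 2] = e1 - e2;
        tmp[4 * i + 3] = e0 - e3;
    }
    for (int j = 0; j < 4; j++) {            /* vertical pass, then (x + 32) >> 6 */
        int32_t f0 = tmp[j], f1 = tmp[4 + j], f2 = tmp[8 + j], f3 = tmp[12 + j];
        int32_t g0 = f0 + f2, g1 = f0 - f2;
        int32_t g2 = (f1 >> 1) - f3, g3 = f1 + (f3 >> 1);
        out[j]      = (g0 + g3 + 32) >> 6;
        out[4 + j]  = (g1 + g2 + 32) >> 6;
        out[8 + j]  = (g1 - g2 + 32) >> 6;
        out[12 + j] = (g0 - g3 + 32) >> 6;
    }
}

/* Hypothetical PR-A3 step for one 4x4 block: P = pre-deblock pixels - IDCT(C).
 * The decoder stored Clip1(P + r), so this recovers P exactly only where that
 * clip did not saturate (recon samples at 0 or 255 are suspect). */
static void recover_prediction(const uint8_t *recon, int stride,
                               const int32_t coeffs[16], int32_t pred[16])
{
    int32_t res[16];
    assert(stride >= 4);
    h264_idct4x4(coeffs, res);
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            pred[4 * y + x] = (int32_t)recon[y * stride + x] - res[4 * y + x];
}
```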

2026-05-26 07:06:31 +02:00

Files (name, date of last commit, subject of last commit):

  include          2026-05-25 23:14:24 +02:00  wip: deblock dispatch
  src              2026-05-25 23:30:37 +02:00  Stage 2 PR-b: deblock dispatch in flush_frame (luma + chroma, up to 8 submits)
  tests            2026-05-25 23:30:37 +02:00  Stage 2 PR-b: deblock dispatch in flush_frame (luma + chroma, up to 8 submits)
  tools            2026-05-26 07:06:31 +02:00  Stage 2 PR-A2: per-MB inspection callback wiring + invariant checks
  .gitignore       2026-05-24 22:08:46 +02:00  scaffold: CMake + API skeleton + smoke test
  CMakeLists.txt   2026-05-26 07:06:31 +02:00  Stage 2 PR-A2: per-MB inspection callback wiring + invariant checks
  DESIGN.md        2026-05-24 19:58:41 +00:00  Merge pull request 'design: §9 open questions → Phase 1 decisions (user confirmed 2026-05-24)' (#1) from noether/design-decisions into main
  LICENSE          2026-05-24 22:08:46 +02:00  scaffold: CMake + API skeleton + smoke test
  README.md        2026-05-23 22:44:03 +02:00  initial design doc: frame-level GPU H.264 decoder for V3D7

README.md

daedalus-decoder

Frame-level GPU H.264 decoder for Raspberry Pi 5 / V3D7. Design phase: not implemented yet.

The objective is to build the NVDEC-equivalent shape on the Pi 5: one Vulkan submit per frame, one fence wait per frame, encoded H.264 bitstream in, NV12 frame out. It reuses daedalus-fourier's V3D compute primitives at the right granularity, not at the per-block-call granularity that the kernel-substitution prototype showed to be architecturally wrong.

Sibling projects:

  - daedalus-fourier: V3D + NEON kernel pack (IDCT, MC, deblock primitives). Stays as a research/microbench artifact.
  - daedalus-v4l2: V4L2 stateless decoder shim + userspace daemon for Pi 5. The eventual consumer of this decoder.
  - libva-v4l2-request-fourier: VAAPI ↔ V4L2 stateless bridge. End consumer.

See DESIGN.md for the architecture sketch.