claude-noether 56f8498057 Stage 2 PR-A1b: tools/daedalus_decode_h264 — H.264 standalone test harness
Adds Option A's standalone end-to-end gate against real H.264 streams.
First iteration: identity-passthrough validation.  daedalus-decoder
produces output byte-identical to libavcodec's AVFrame when fed the
reconstructed pixels as `predicted`, zero coeffs and no deblock edges.

Validates: daedalus-decoder data path (append_mb + flush_frame +
NV12 output + coded-vs-display dim handling) at real-stream frame
sizes (320x240 and 1920x1088) with real H.264-decoded predicted-sample
distributions, rather than the random patterns that the existing
test_idct_bitexact and test_deblock_smoke tests synthesize.

Identity-passthrough math:
  - mb_input.predicted = AVFrame pixels at MB raster position
  - mb_input.coeffs    = 384 int16 values (256 luma + 2x64 chroma), all zero
  - mb_input.edges     = NULL, n_edges = 0
  flush_frame:
    scratch_y/_uv pre-fill from predicted (= AVFrame pixels)
    IDCT dispatches with all-zero coeffs add 0 (no-op compute)
    No deblock dispatches (no edges)
    copy-out → caller's NV12 planes
  Result MUST equal AVFrame pixels byte-for-byte.
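A minimal CPU-side C model of the passthrough math above, assuming a
luma-only per-MB struct.  mb_input_model, flush_mb_model and
passthrough_luma are hypothetical names, not daedalus-decoder's API.
It uses the standard H.264 4x4 integer inverse transform, so an
all-zero coefficient block adds exactly 0 and the copy-out equals the
input:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MB 16

/* Hypothetical luma-only model of one mb_input; the real struct in
 * daedalus-decoder may be laid out differently (and carries chroma). */
typedef struct {
    uint8_t predicted[MB * MB];  /* predicted samples = AVFrame pixels */
    int16_t coeffs[16][16];      /* sixteen 4x4 luma coefficient blocks */
    const void *edges;           /* deblock edges: NULL in passthrough */
    int n_edges;                 /* 0 => no deblock dispatch */
} mb_input_model;

/* H.264 4x4 integer inverse transform (row pass, column pass,
 * (x + 32) >> 6), added onto dst with clipping to [0, 255]. */
static void idct4x4_add(uint8_t *dst, int stride, const int16_t c[16]) {
    int t[16];
    for (int i = 0; i < 4; i++) {                 /* horizontal pass */
        const int16_t *r = &c[i * 4];
        int e = r[0] + r[2], f = r[0] - r[2];
        int g = (r[1] >> 1) - r[3], h = r[1] + (r[3] >> 1);
        t[i * 4 + 0] = e + h; t[i * 4 + 1] = f + g;
        t[i * 4 + 2] = f - g; t[i * 4 + 3] = e - h;
    }
    for (int j = 0; j < 4; j++) {                 /* vertical pass */
        int e = t[j] + t[8 + j], f = t[j] - t[8 + j];
        int g = (t[4 + j] >> 1) - t[12 + j], h = t[4 + j] + (t[12 + j] >> 1);
        int res[4] = { e + h, f + g, f - g, e - h };
        for (int i = 0; i < 4; i++) {
            int v = dst[i * stride + j] + ((res[i] + 32) >> 6);
            dst[i * stride + j] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
    }
}

/* Model of flush_frame for one MB: pre-fill scratch from predicted,
 * add each 4x4 residual, skip deblock, copy out to the caller's plane. */
static void flush_mb_model(const mb_input_model *in, uint8_t *out, int stride) {
    uint8_t scratch[MB * MB];
    memcpy(scratch, in->predicted, sizeof scratch);
    for (int b = 0; b < 16; b++)          /* 4x4 blocks in raster order (assumed) */
        idct4x4_add(&scratch[(b / 4) * 4 * MB + (b % 4) * 4], MB, in->coeffs[b]);
    assert(in->n_edges == 0);             /* deblock is not modelled here */
    for (int y = 0; y < MB; y++)
        memcpy(&out[y * stride], &scratch[y * MB], MB);
}

/* Identity passthrough over a luma plane whose dims are MB multiples. */
static void passthrough_luma(const uint8_t *ref, uint8_t *out, int w, int h) {
    assert(w % MB == 0 && h % MB == 0);
    for (int my = 0; my < h / MB; my++)
        for (int mx = 0; mx < w / MB; mx++) {
            mb_input_model in = { .edges = NULL, .n_edges = 0 };  /* coeffs = 0 */
            for (int y = 0; y < MB; y++)
                memcpy(&in.predicted[y * MB], &ref[(my * MB + y) * w + mx * MB], MB);
            flush_mb_model(&in, &out[my * MB * w + mx * MB], w);
        }
}
```

A single DC coefficient of 64 in place of zeros raises every sample of
that 4x4 block by 1, so the same check is sensitive to any non-zero
residual.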

Build
-----

New cmake option DAEDALUS_BUILD_TOOLS (default OFF).  When enabled,
the build checks for libavcodec / libavformat / libavutil via
pkg-config and builds the daedalus_decode_h264 binary against the
system FFmpeg.

Stock libavcodec is sufficient for THIS PR (identity passthrough
reads from AVFrame after avcodec_receive_frame; no per-MB internal
state extraction needed).  Follow-up PRs (A2+) will use the per-MB
inspection callback added in marfrit-packages patch 0016 (PR #106)
to feed REAL per-MB state (pre-residual predicted samples, residual
coeffs, deblock edges) for actual non-trivial daedalus-decoder
validation.

Usage
-----

  daedalus_decode_h264 [--substrate cpu|qpu|auto]
                       [--max-frames N]
                       <input.h264> <output_dadec.yuv> <output_ref.yuv>

Exit codes:
  0 = byte-exact match across all frames
  1 = argument / setup error
  2 = decode error from libavcodec
  3 = daedalus-decoder error (ctx, append, flush)
  4 = bit-exact comparison failed
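A sketch of the CLI contract above in C.  The enum and struct names,
the auto default for --substrate and the "no limit" default for
--max-frames are assumptions, not taken from the tool's source:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Exit codes as documented; the names are hypothetical. */
enum dd_exit {
    DD_EXIT_MATCH = 0,     /* byte-exact across all frames */
    DD_EXIT_USAGE = 1,     /* argument / setup error */
    DD_EXIT_DECODE = 2,    /* libavcodec decode error */
    DD_EXIT_DADEC = 3,     /* daedalus-decoder ctx/append/flush error */
    DD_EXIT_MISMATCH = 4,  /* bit-exact comparison failed */
};

typedef enum { SUB_AUTO, SUB_CPU, SUB_QPU } substrate_t;

typedef struct {
    substrate_t substrate;
    long max_frames;       /* -1 = no limit (assumed default) */
    const char *input, *out_dadec, *out_ref;
} dd_args;

/* Returns DD_EXIT_MATCH on success, DD_EXIT_USAGE on any argument error. */
static int parse_args(int argc, char **argv, dd_args *a) {
    const char *pos[3];
    int npos = 0;
    assert(a != NULL);
    a->substrate = SUB_AUTO;   /* assumed default */
    a->max_frames = -1;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--substrate") == 0 && i + 1 < argc) {
            const char *s = argv[++i];
            if (strcmp(s, "cpu") == 0) a->substrate = SUB_CPU;
            else if (strcmp(s, "qpu") == 0) a->substrate = SUB_QPU;
            else if (strcmp(s, "auto") == 0) a->substrate = SUB_AUTO;
            else return DD_EXIT_USAGE;
        } else if (strcmp(argv[i], "--max-frames") == 0 && i + 1 < argc) {
            char *end;
            a->max_frames = strtol(argv[++i], &end, 10);
            if (*end != '\0' || a->max_frames <= 0) return DD_EXIT_USAGE;
        } else if (argv[i][0] == '-') {
            return DD_EXIT_USAGE;  /* unknown option or missing value */
        } else if (npos < 3) {
            pos[npos++] = argv[i];
        } else {
            return DD_EXIT_USAGE;  /* too many positional arguments */
        }
    }
    if (npos != 3) return DD_EXIT_USAGE;
    a->input = pos[0];
    a->out_dadec = pos[1];
    a->out_ref = pos[2];
    return DD_EXIT_MATCH;
}
```

Keeping argument errors on code 1 leaves 2..4 free to say which stage
(libavcodec, daedalus-decoder, comparison) failed.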

Result on hertz (Pi 5 V3D 7.1)
------------------------------

I-frame-only test clips generated with ffmpeg's testsrc2 source and libx264 (-bf 0 -g 1):

  320x240, 5 frames:
    substrate=auto:  Y diff 0/76800   UV diff 0/38400   PASS
    substrate=cpu:   Y diff 0/76800   UV diff 0/38400   PASS
    substrate=qpu:   Y diff 0/76800   UV diff 0/38400   PASS

  1920x1088 (coded; 1080 display), 3 frames:
    substrate=auto:  Y diff 0/2088960 UV diff 0/1044480 PASS
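The denominators above are NV12 plane sizes at coded dimensions: the Y
plane is w*h bytes and the interleaved CbCr plane is w*h/2 bytes, so
1920x1088 gives 2088960 and 1044480, and the 1080p comparison covers
the full coded height.  A small C sketch of that arithmetic and of a
per-plane diff count (the helper names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a display dimension up to whole 16x16 macroblocks (coded size). */
static int coded_dim(int display) { return (display + 15) & ~15; }

/* NV12: full-resolution Y plane, then one interleaved CbCr plane at
 * half horizontal and half vertical resolution => w*h/2 bytes. */
static size_t nv12_y_bytes(int w, int h)  { return (size_t)w * h; }
static size_t nv12_uv_bytes(int w, int h) { return (size_t)w * h / 2; }

/* Count differing bytes between two equally sized buffers. */
static size_t count_diffs(const uint8_t *a, const uint8_t *b, size_t n) {
    size_t d = 0;
    for (size_t i = 0; i < n; i++) d += a[i] != b[i];
    return d;
}

typedef struct { size_t y_diff, uv_diff; } nv12_diff;

/* Compare two contiguous NV12 frames (Y plane followed by CbCr plane). */
static nv12_diff compare_nv12(const uint8_t *a, const uint8_t *b, int w, int h) {
    assert(w % 2 == 0 && h % 2 == 0);  /* 4:2:0 needs even dims */
    size_t ny = nv12_y_bytes(w, h);
    nv12_diff r = { count_diffs(a, b, ny),
                    count_diffs(a + ny, b + ny, nv12_uv_bytes(w, h)) };
    return r;
}
```

Usage: coded_dim(1080) is 1088, and for 320x240 the plane sizes come
out as 76800 and 38400, matching the rows above.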

Followups
---------

  - PR-A2: wire the per-MB inspection callback (marfrit-packages
    0016) so per-MB state — coeffs (sl->mb), predicted-before-
    residual (from prediction kernels), bS/alpha/beta — flows into
    mb_input instead of zeros, and IDCT / deblock dispatches do
    real GPU work.  At that point daedalus-decoder is actually decoding
    real H.264 streams, not just passing pixels through.
  - PR-A3: extend to P/B frames once MC dispatch lands.
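For reference, the bS/alpha/beta values mentioned for PR-A2 gate the
H.264 deblocking filter on each line of samples across an edge: the
samples are filtered only when bS is non-zero and |p0 - q0| < alpha,
|p1 - p0| < beta and |q1 - q0| < beta.  A C sketch of that gate; the
edge_params struct is a hypothetical stand-in for whatever the
inspection callback will deliver:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-edge parameters of the kind PR-A2 would put in mb_input. */
typedef struct {
    int bS;     /* boundary strength, 0..4 */
    int alpha;  /* threshold on the step across the edge */
    int beta;   /* threshold on the gradient on each side */
} edge_params;

/* H.264 filterSamplesFlag: p1,p0 are on one side of the edge, q0,q1 on
 * the other.  Returns 1 if this line of samples gets filtered. */
static int edge_filtered(const edge_params *e, int p1, int p0, int q0, int q1) {
    assert(e->bS >= 0 && e->bS <= 4);
    return e->bS != 0
        && abs(p0 - q0) < e->alpha
        && abs(p1 - p0) < e->beta
        && abs(q1 - q0) < e->beta;
}
```

The alpha test is what keeps genuine image edges (a large step between
p0 and q0) from being smoothed away.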
2026-05-26 06:12:51 +02:00
2026-05-25 23:14:24 +02:00

daedalus-decoder

Frame-level GPU H.264 decoder for Raspberry Pi 5 / V3D7. Design phase — not implemented yet.

The objective is to build an NVDEC-equivalent pipeline on Pi 5: one Vulkan submit per frame, one fence wait per frame, encoded H.264 bitstream in, NV12 frame out. It reuses daedalus-fourier's V3D compute primitives at the right granularity, not the per-block-call granularity that the kernel-substitution prototype showed to be architecturally wrong.
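To make the per-frame shape concrete, a C sketch that only counts GPU round trips. The functions are hypothetical stand-ins, not daedalus-decoder's API, and the per-block figure assumes one submit per 4x4 residual block purely for illustration; it is not a measurement of the kernel-substitution prototype.

```c
#include <assert.h>

/* Hypothetical counters modelling the frame-level shape. */
typedef struct {
    long pending_mbs;  /* MBs recorded since the last flush */
    long submits;      /* modelled Vulkan queue submits */
    long fence_waits;  /* modelled fence waits */
} frame_model;

/* Recording an MB only appends to the batch: no GPU interaction. */
static void model_append_mb(frame_model *f) { f->pending_mbs++; }

/* Flushing a frame costs exactly one submit and one fence wait,
 * regardless of how many MBs were recorded. */
static void model_flush_frame(frame_model *f) {
    assert(f->pending_mbs > 0);
    f->submits++;
    f->fence_waits++;
    f->pending_mbs = 0;
}

/* Run `frames` frames of w x h (coded, MB-aligned) through the model. */
static void model_decode(frame_model *f, int w, int h, int frames) {
    assert(w % 16 == 0 && h % 16 == 0);
    long mbs = (long)(w / 16) * (h / 16);
    for (int n = 0; n < frames; n++) {
        for (long mb = 0; mb < mbs; mb++)
            model_append_mb(f);
        model_flush_frame(f);
    }
}

/* For contrast: round trips per frame if every 4x4 residual block of a
 * 4:2:0 MB (16 luma + 8 chroma) were its own submit (assumption). */
static long per_block_round_trips(int w, int h) {
    return (long)(w / 16) * (h / 16) * 24;
}
```

For 3 frames at 1920x1088 (8160 MBs each) the frame-level model needs 3 submits and 3 fence waits, against 195840 round trips per frame under the per-block assumption.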

Sibling projects:

  • daedalus-fourier — V3D + NEON kernel pack (IDCT, MC, deblock primitives). Stays as research/microbench artifact.
  • daedalus-v4l2 — V4L2 stateless decoder shim + userspace daemon for Pi 5. The eventual consumer of this decoder.
  • libva-v4l2-request-fourier — VAAPI ↔ V4L2 stateless bridge. End consumer.

See DESIGN.md for the architecture sketch.

Description
Frame-level GPU H.264 decoder for Raspberry Pi 5 V3D7. NVDEC-shaped pipeline (encoded bitstream in, NV12 out, one Vulkan submit per frame) built on daedalus-fourier's V3D compute primitives. Phase 1 design exploration.