Commit Graph

2 Commits

Author SHA1 Message Date
claude-noether 948697ef0d phase1/stage1: bit-exact gate for the frame-scaled luma IDCT 4x4
Adds test_idct_bitexact that exercises daedalus_decoder_flush_frame
end-to-end with random coefficients and compares every output byte
against an inline C reference of the H.264 §8.5.12.1 1D butterfly.
Closes the validation gap from the previous PR ("dispatch succeeds"
becomes "dispatch is bit-exact").

What's tested:

  - 320×240 coded frame (300 MBs, 4800 luma 4×4 blocks), enough to cover
    multiple workgroups of the V3D shader (16 blocks/WG → 300 WGs)
  - Per-MB → flat-raster block layout consistent with flush_frame
  - Random coeffs in [-512, 511] (same range as daedalus-fourier
    cycle-6 M1 gate)
  - Inline C reference: H.264 §8.5.12.1 butterfly with column-major
    block layout, +32 rounding, >>6, add-to-predicted (=0), clip255 —
    mirrors daedalus-fourier tests/h264_idct4_ref.c
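
For readers without daedalus-fourier checked out, the reference described
in the last bullet can be sketched roughly as follows.  This is a minimal
illustration, not the contents of tests/h264_idct4_ref.c: the names
butterfly4, clip255 and idct4x4_add_ref are hypothetical, and the
column-major indexing (row r, column c stored at index c*4 + r) is an
assumption taken from the bullet above.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a reconstructed sample to the 8-bit range. */
static uint8_t clip255(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* H.264 4x4 1D inverse butterfly, in place, on v[0], v[s], v[2s], v[3s].
 * Relies on arithmetic right shift of negative ints, which mainstream
 * compilers provide. */
static void butterfly4(int *v, int s)
{
    int e0 = v[0] + v[2 * s];
    int e1 = v[0] - v[2 * s];
    int e2 = (v[s] >> 1) - v[3 * s];
    int e3 = v[s] + (v[3 * s] >> 1);
    v[0]     = e0 + e3;
    v[s]     = e1 + e2;
    v[2 * s] = e1 - e2;
    v[3 * s] = e0 - e3;
}

/* Hypothetical reference: inverse-transform one 4x4 block and add it to
 * the predicted pixels already in dst (all zero in this test).
 * Assumed layout: coeff[c * 4 + r] holds row r, column c. */
void idct4x4_add_ref(const int16_t coeff[16], uint8_t *dst, int dst_stride)
{
    int t[16];
    assert(dst_stride >= 4);
    for (int i = 0; i < 16; i++)
        t[i] = coeff[i];
    for (int r = 0; r < 4; r++)      /* horizontal pass over each row */
        butterfly4(t + r, 4);
    for (int c = 0; c < 4; c++)      /* vertical pass over each column */
        butterfly4(t + c * 4, 1);
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++) {
            int px = dst[r * dst_stride + c] + ((t[c * 4 + r] + 32) >> 6);
            dst[r * dst_stride + c] = clip255(px);
        }
}
```

With a DC-only block (coeff[0] = 6400, everything else zero) and zeroed
predicted pixels, every output pixel is (6400 + 32) >> 6 = 100.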

Verified on hertz (Pi 5 / V3D 7.1 / daedalus-fourier 0.1.0):

  $ ctest --test-dir build --output-on-failure
    Start 1: smoke
  1/2 Test #1: smoke ............................   Passed    0.16 sec
    Start 2: idct_bitexact
  2/2 Test #2: idct_bitexact ....................   Passed    0.03 sec

  100% tests passed, 0 tests failed out of 2

Bit-exact PASS first try — daedalus-fourier's V3D IDCT 4x4 shader
produces identical pixels to the C reference for all 4800 blocks in
the test frame.  Validates BOTH the shader correctness AND the
frame-batched-dispatch correctness (this is the first time
n_blocks > ~30 has been exercised at the recipe-dispatch layer; the
substitution arc only ever called with n_blocks=1).
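
The block and workgroup counts quoted in this message follow from the
frame size.  A small sketch of the arithmetic, assuming 16 luma 4×4 blocks
per macroblock and the 16 blocks/WG figure above; luma_block_count and
workgroup_count are illustrative names, not functions in this repo:

```c
#include <assert.h>

/* Number of luma 4x4 blocks in a coded frame (dimensions must be mod-16). */
unsigned luma_block_count(unsigned coded_w, unsigned coded_h)
{
    assert(coded_w % 16 == 0 && coded_h % 16 == 0);
    unsigned mbs = (coded_w / 16) * (coded_h / 16);
    return mbs * 16;   /* 16 luma 4x4 blocks per macroblock */
}

/* Workgroups needed to cover n_blocks, rounding up. */
unsigned workgroup_count(unsigned n_blocks, unsigned blocks_per_wg)
{
    assert(blocks_per_wg > 0);
    return (n_blocks + blocks_per_wg - 1) / blocks_per_wg;
}
```

For the test frame, luma_block_count(320, 240) is 4800 and
workgroup_count(4800, 16) is 300, whereas the earlier n_blocks=1 calls
needed a single workgroup.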

What is NOT tested by this PR (deferred to follow-ons):

  - Non-zero predicted pixels — flush_frame zero-initialises scratch_y,
    so the IDCT-ADD reduces to clip255(IDCT).  Real predicted pixels
    come from Stage 2a intra prediction.
  - Z-scan permutation between FFmpeg's per-MB coeffs layout and our
    per-MB → flat raster — the test uses its own coefficient generator
    that already matches our layout, so it doesn't exercise the
    permutation.  The libavcodec-intercept patch is where the
    permutation lands and gets validated against real H.264 streams.
  - Chroma 4×4 IDCT.
  - IDCT 8×8 (High profile).
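
To make the deferred Z-scan item concrete: H.264 numbers the sixteen luma
4×4 blocks of a macroblock in Z order (four 8×8 quadrants, four blocks
within each), not in raster order.  A minimal sketch of the index-to-
position mapping follows; blk4x4_origin is a hypothetical helper, and how
the libavcodec-intercept patch will actually apply the permutation is not
decided by this commit.

```c
#include <assert.h>

/* Map an H.264 luma 4x4 block index (0..15, Z-scan order) to the pixel
 * offset of the block's top-left corner inside its 16x16 macroblock. */
void blk4x4_origin(int blk_idx, int *x, int *y)
{
    assert(blk_idx >= 0 && blk_idx < 16);
    int quad = blk_idx >> 2;   /* which 8x8 quadrant */
    int sub  = blk_idx & 3;    /* which 4x4 block inside the quadrant */
    *x = (quad & 1) * 8 + (sub & 1) * 4;
    *y = (quad >> 1) * 8 + (sub >> 1) * 4;
}
```

Block 5, for example, sits at (12, 0) and block 10 at (0, 12), so a raster
walk over the macroblock visits indices 0 1 4 5 / 2 3 6 7 / 8 9 12 13 /
10 11 14 15.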

Stacked on noether/phase1-stage1-idct (PR #3, the frame-scaled
dispatch).  Rebase on main after #3 lands; the diff is purely additive
(one new test file + 5 lines of CMake).
2026-05-24 22:20:21 +02:00
claude-noether 08080f062c scaffold: CMake + API skeleton + smoke test
First code on daedalus-decoder per the Phase 1 decisions merged 2026-05-24.
Repo skeleton only — no Vulkan pipeline yet, no shaders, no libavcodec
intercept.  Establishes the build shape so subsequent work has a place
to land.

Layout:

  LICENSE                          BSD-2-Clause (matches daedalus-fourier)
  .gitignore                       build/, CMake artefacts, *.spv
  CMakeLists.txt                   top-level — finds daedalus-fourier
                                   ≥0.1.0 via pkg-config (per §9.6
                                   decision: find_package, pinned to
                                   tagged release; .pc consumed via
                                   pkg_check_modules until we ship a
                                   CMake config), Vulkan via
                                   find_package, builds static lib
                                   + smoke test, GNUInstallDirs install
  include/daedalus_decoder.h       public API surface:
                                     - daedalus_decoder_{create,destroy,
                                                         version,has_qpu}
                                     - daedalus_decoder_set_output_format
                                       (NV12 default, RGBA opt-in per §5)
                                     - daedalus_decoder_append_mb +
                                       struct daedalus_decoder_mb_input
                                       (matches §3 per-MB descriptor)
                                     - daedalus_decoder_flush_frame
                                       (per-frame submit + wait)
                                     - daedalus_decoder_export_dmabuf
                                       (Vulkan-native VkImage export per
                                       §9.4 decision)
                                   Dimensions are CODED frame size
                                   (mod-16), not displayed — caller
                                   translates from SPS + crop offsets.
  src/internal.h                   internal mb_desc struct (matches
                                   shader std430 layout, to be nailed
                                   down once shaders exist) + per-ctx
                                   state
  src/daedalus_decoder.c           stub bodies:
                                     - create/destroy with proper resource
                                       lifecycle
                                     - append_mb validates + writes CPU
                                       staging buffers (no GPU yet)
                                     - flush_frame returns -2 (not
                                       implemented) — Phase 1 work
                                     - export_dmabuf returns -1
                                     - has_qpu / version diagnostics
  tests/test_smoke.c               link + lifecycle test: bad dims
                                   reject, OOB MB reject, null inputs
                                   reject, raster-order enforcement,
                                   mid-frame format-change reject,
                                   incomplete-frame flush reject.
                                   On hosts without V3D7 Vulkan,
                                   SKIPs gracefully (returns 0).

Verified on hertz (Pi 5 / V3D 7.1 / Mesa V3DV via daedalus-fourier
0.1.0):

  $ cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
  $ cmake --build build
  $ ctest --test-dir build --output-on-failure
  Test #1: smoke ... Passed

  $ ./build/test_smoke
  daedalus-decoder version: 0.0.1
  ctx created: 1920x1088, has_qpu=1
  smoke OK

Note the coded-vs-displayed dims trap: 1080p H.264 has coded height
1088 with 8 rows cropped via the SPS frame_crop_*_offset fields.  Header docstring
on daedalus_decoder_create() spells this out so future callers don't
hit the multiple-of-16 reject (smoke test caught it during scaffold
write).
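
A caller-side sketch of the translation, assuming progressive content
(frame_mbs_only_flag = 1) so that both dimensions round up to a multiple
of 16; coded_dim is an illustrative helper, not part of the
daedalus_decoder API:

```c
#include <assert.h>

/* Round a displayed dimension up to the next multiple of 16 (one MB). */
unsigned coded_dim(unsigned displayed)
{
    assert(displayed > 0);
    return (displayed + 15u) & ~15u;
}
```

coded_dim(1080) is 1088, leaving 8 rows to crop, while coded_dim(1920)
stays 1920, which matches the 1920x1088 context in the smoke output above.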

Next: Phase 1 implementation begins — IDCT 4×4 / 8×8 frame-scaled
dispatch (reusing daedalus-fourier shaders per Appendix A), intra
prediction wavefront, reconstruct stage, NV12 output via dmabuf
export.  Smoke test grows from "ctx lifecycle works" to
"I-frame-only Baseline decode bit-exact vs FFmpeg reference".
2026-05-24 22:08:46 +02:00