phase1: add deployment-scale bit-exact ctest (1080p) #7

Merged
marfrit merged 1 commit from noether/phase1-bitexact-1080p into main 2026-05-24 20:50:42 +00:00

Adds a second ctest entry that runs the existing bit-exact test binary against a full coded 1080p frame (1920x1088, 8160 MBs, 146,880 IDCTs total: 4080 4x4 MBs + 4080 8x8 MBs + chroma). The test binary is unchanged; the only change is an `add_test(... 1920 1088)` line.

Passes on hertz in 0.06s (warm pool) / 0.22s (cold). Catches index arithmetic bugs that the 320x240 test is too small to surface, including the 1088 coded height trap (displayed 1080 + 8 cropped rows).

marfrit added 1 commit 2026-05-24 20:49:19 +00:00
The existing 320x240 bit-exact test (300 MBs) is the fast inner-loop
gate, but it's small enough that index arithmetic bugs that only
surface above 16-bit boundaries would slip through.  This adds a
second ctest entry that runs the same binary against a full coded
1080p frame (1920x1088, 8160 MBs):

  - 4080 MBs at transform_8x8=0 → 65,280 luma 4x4 blocks
  - 4080 MBs at transform_8x8=1 → 16,320 luma 8x8 blocks
  - 65,280 chroma 4x4 blocks (32,640 Cb + 32,640 Cr)
  - 146,880 IDCTs total across three separate dispatches (luma_4x4,
    luma_8x8, chroma), each compared bit-exact against the in-test C
    reference.
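
The counts above follow from the frame size alone. A minimal sketch of
that arithmetic, assuming 4:2:0 chroma (4 Cb + 4 Cr 4x4 blocks per MB);
`idct_counts` and `count_idcts` are hypothetical names for illustration
and are not part of the test binary:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: per-frame IDCT block counts. */
typedef struct {
    uint32_t mbs;        /* total 16x16 macroblocks in the coded frame */
    uint32_t luma_4x4;   /* 4x4 luma blocks from transform_8x8=0 MBs   */
    uint32_t luma_8x8;   /* 8x8 luma blocks from transform_8x8=1 MBs   */
    uint32_t chroma_4x4; /* Cb + Cr 4x4 blocks (4:2:0 assumed)         */
    uint32_t total;      /* sum of the three block counts              */
} idct_counts;

/* Derive block counts from coded dimensions and the number of MBs
 * that use the 8x8 transform. Dimensions must be MB-aligned. */
static idct_counts count_idcts(uint32_t width, uint32_t height,
                               uint32_t mbs_8x8)
{
    idct_counts c;

    assert(width % 16 == 0 && height % 16 == 0);
    c.mbs = (width / 16) * (height / 16);
    assert(mbs_8x8 <= c.mbs);

    c.luma_4x4   = (c.mbs - mbs_8x8) * 16; /* 16 4x4 blocks per MB  */
    c.luma_8x8   = mbs_8x8 * 4;            /* 4 8x8 blocks per MB   */
    c.chroma_4x4 = c.mbs * 8;              /* 4 Cb + 4 Cr per MB    */
    c.total      = c.luma_4x4 + c.luma_8x8 + c.chroma_4x4;
    return c;
}
```

Calling `count_idcts(1920, 1088, 4080)` gives 8160 MBs and 146,880
blocks in total; `count_idcts(320, 240, 150)` gives the 300 MBs of the
small test.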

No code change to the test binary itself — it already accepted
width/height as argv[1..2].  Just a second `add_test` in
CMakeLists.txt that invokes it with `1920 1088`.
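
For readers unfamiliar with the binary, the argv handling could look
roughly like the sketch below. This is an illustration only: the
function name `parse_dims`, the 320x240 default, the 16384 upper bound,
and the MB-alignment check are all assumptions, not taken from the
actual test source.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: read "width height" from argv[1..2]. Falls back to an
 * assumed 320x240 default when no dimensions are given.
 * Returns 0 on success, -1 on malformed or non-MB-aligned input. */
static int parse_dims(int argc, char **argv, uint32_t *w, uint32_t *h)
{
    char *end;
    unsigned long pw, ph;

    *w = 320;
    *h = 240;
    if (argc < 3)
        return 0;

    errno = 0;
    pw = strtoul(argv[1], &end, 10);
    if (errno != 0 || end == argv[1] || *end != '\0')
        return -1;
    ph = strtoul(argv[2], &end, 10);
    if (errno != 0 || end == argv[2] || *end != '\0')
        return -1;

    /* Coded dimensions must be whole macroblocks (multiples of 16). */
    if (pw == 0 || ph == 0 || pw % 16 != 0 || ph % 16 != 0)
        return -1;
    if (pw > 16384 || ph > 16384) /* assumed sanity bound */
        return -1;

    *w = (uint32_t)pw;
    *h = (uint32_t)ph;
    return 0;
}
```

Under these assumptions `1920 1088` is accepted, while `1920 1080`
would be rejected because 1080 is not a multiple of 16.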

Coverage rationale:

  - dst_off is uint32_t in daedalus_h264_block_meta; at 1920x1088
    the max offset is ~2.1 MiB, still well within uint32 range, but
    the test exercises the largest stride math we'll see in production
    (per-MB chroma offset = mb_y*8 + cb_plane_size = up to 1.06 MiB).
  - flush_frame partitions 8160 MBs by transform mode → exercises the
    bi4 == 4080*16 and bi8 == 4080*4 accumulators at frame scale.
  - Verifies the 1088 coded height handling (the displayed 1080 +
    8 cropped rows trap that catches Pi 5 H.264 integrations).
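
Two of the points above can be checked with a few lines of arithmetic.
A minimal sketch, assuming tightly packed 4:2:0 planes (chroma stride =
width / 2); `coded_dim`, `chroma_mb_offset`, and `fits_u16` are
hypothetical helpers, and the offset formula is an illustration rather
than the layout daedalus_h264_block_meta actually uses:

```c
#include <assert.h>
#include <stdint.h>

/* Round a display dimension up to the 16-pixel macroblock grid,
 * e.g. displayed 1080 rows -> 1088 coded rows. */
static uint32_t coded_dim(uint32_t display)
{
    assert(display <= UINT32_MAX - 15u); /* guard the +15 below */
    return (display + 15u) & ~15u;
}

/* Byte offset of the top-left sample of MB (mb_x, mb_y) inside one
 * chroma plane. Each MB covers 8x8 chroma samples in 4:2:0. */
static uint32_t chroma_mb_offset(uint32_t mb_x, uint32_t mb_y,
                                 uint32_t chroma_stride)
{
    return mb_y * 8u * chroma_stride + mb_x * 8u;
}

/* Nonzero if the offset would survive being stored in 16 bits. */
static int fits_u16(uint32_t off)
{
    return off <= UINT16_MAX;
}
```

Under these assumptions, the last MB of a 320x240 frame (mb 19,14,
stride 160) sits at chroma offset 18,072, which still fits in 16 bits,
so a truncating index would go unnoticed. The last MB of a 1920x1088
frame (mb 119,67, stride 960) sits at 515,512, which does not.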

Verified on hertz (Pi 5 / V3D 7.1 / daedalus-fourier 0.1.0):

  $ ctest --test-dir build --output-on-failure
      Start 1: smoke
  1/3 Test #1: smoke ............................   Passed    0.09 sec
      Start 2: idct_bitexact
  2/3 Test #2: idct_bitexact ....................   Passed    0.03 sec
      Start 3: idct_bitexact_1080p
  3/3 Test #3: idct_bitexact_1080p ..............   Passed    0.06 sec

  100% tests passed, 0 tests failed out of 3

  $ ./build/test_idct_bitexact 1920 1088
  test_idct_bitexact: 1920x1088 (8160 MBs), seed=0xfeedface5a5a5a5a
  MB mix: 4080 4x4 MBs, 4080 8x8 MBs
  Y bytes total:  2088960
  Y bytes diff:   0 (0.0000%)
  Cb bytes total: 522240  diff: 0 (0.0000%)
  Cr bytes total: 522240  diff: 0 (0.0000%)
  BIT-EXACT PASS (Y + Cb + Cr)

(0.06 s when shader pool warm; ~0.2 s cold via the standalone
invocation above — the 1080p run happens after smoke, so pool is
already primed by the time it runs in ctest.)
marfrit merged commit 4c5c7a33ce into main 2026-05-24 20:50:42 +00:00
marfrit deleted branch noether/phase1-bitexact-1080p 2026-05-24 20:50:43 +00:00

Reference: marfrit/daedalus-decoder#7