From 045553ccaf04d2acc9807695a7fcc6f5eecdd770 Mon Sep 17 00:00:00 2001
From: claude-noether
Date: Sun, 24 May 2026 22:49:01 +0200
Subject: [PATCH] phase1: add deployment-scale bit-exact ctest (1080p, 8160 MBs)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The existing 320x240 bit-exact test (300 MBs) is the fast inner-loop
gate, but it's small enough that index arithmetic bugs that only
surface above 16-bit boundaries would slip through. This adds a second
ctest entry that runs the same binary against a full coded 1080p frame
(1920x1088, 8160 MBs):

  - 4080 MBs at transform_8x8=0 → 65,280 luma 4x4 blocks
  - 4080 MBs at transform_8x8=1 → 16,320 luma 8x8 blocks
  - 65,280 chroma 4x4 blocks (32,640 Cb + 32,640 Cr)
  - 146,880 IDCTs total across three separate dispatches (luma_4x4,
    luma_8x8, chroma), each compared bit-exactly against the in-test
    C reference.

No code change to the test binary itself: it already accepted
width/height as argv[1..2]. The only change is a second `add_test` in
CMakeLists.txt that invokes it with `1920 1088`.

Coverage rationale:

  - dst_off is uint32_t in daedalus_h264_block_meta; at 1920x1088 the
    max offset is ~2.1 MiB, still well within uint32 range, but the
    test exercises the largest stride math we'll see in production
    (per-MB chroma offset = mb_y*8 + cb_plane_size = up to 1.06 MiB).
  - flush_frame partitions 8160 MBs by transform mode → exercises the
    bi4 == 4080*16 and bi8 == 4080*4 accumulators at frame scale.
  - Verifies handling of the 1088 coded height (displayed 1080 + 8
    cropped rows, a trap that catches Pi 5 H.264 integrations).

Verified on hertz (Pi 5 / V3D 7.1 / daedalus-fourier 0.1.0):

    $ ctest --test-dir build --output-on-failure
        Start 1: smoke
    1/3 Test #1: smoke ............................   Passed    0.09 sec
        Start 2: idct_bitexact
    2/3 Test #2: idct_bitexact ....................   Passed    0.03 sec
        Start 3: idct_bitexact_1080p
    3/3 Test #3: idct_bitexact_1080p ..............   Passed    0.06 sec

    100% tests passed, 0 tests failed out of 3

    $ ./build/test_idct_bitexact 1920 1088
    test_idct_bitexact: 1920x1088 (8160 MBs), seed=0xfeedface5a5a5a5a
    MB mix: 4080 4x4 MBs, 4080 8x8 MBs
    Y bytes total: 2088960
    Y bytes diff: 0 (0.0000%)
    Cb bytes total: 522240 diff: 0 (0.0000%)
    Cr bytes total: 522240 diff: 0 (0.0000%)
    BIT-EXACT PASS (Y + Cb + Cr)

(0.06 s when the shader pool is warm; ~0.2 s cold via the standalone
invocation above. In ctest the 1080p run happens after smoke, so the
pool is already primed by the time it runs.)
---
 CMakeLists.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/CMakeLists.txt b/CMakeLists.txt
index b7ac6df..97b98f0 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -117,8 +117,17 @@ add_test(NAME smoke COMMAND test_smoke)
 add_executable(test_idct_bitexact tests/test_idct_bitexact.c)
 target_link_libraries(test_idct_bitexact PRIVATE daedalus_decoder)
 target_compile_options(test_idct_bitexact PRIVATE -O2)
+
+# 320x240 QVGA — fast inner-loop test (300 MBs, sub-second).
 add_test(NAME idct_bitexact COMMAND test_idct_bitexact)
 
+# 1920x1088 1080p — deployment-scale test (8160 MBs, ~0.25 s on hertz).
+# Validates the per-MB block index + pixel offset math at full coded
+# height (1088, not 1080 — see daedalus_decoder.h on H.264 coded vs
+# displayed dims). Cheap enough to run unconditionally; if it ever
+# gets slow we'll split into a CTest LABEL for opt-in.
+add_test(NAME idct_bitexact_1080p COMMAND test_idct_bitexact 1920 1088)
+
 # ---- Install ------------------------------------------------------
 #
 # Library + public header. Stage 2/3 will add a pkg-config file and