Phase 8: wire LPF wd=4 + wd=8 QPU through public API
Mirror the IDCT pattern (lazy pipeline + per-call SSBO alloc + dispatch + readback) for cycles 2 (LPF wd=4) and 4 (LPF wd=8). Important caught-empirically bug: the two LPF shaders disagree on push-constant slot order — wd=4 puts dst_stride_u8 at slot 1, wd=8 puts it at slot 2 (with unused blocks_per_row at slot 1). Initial single-struct attempt silently corrupted wd=8 output (1958/2048 = 95.6 % bit-exact on test_api_lpf). Fixed by keeping separate lpf4_pc and lpf8_pc struct definitions. dst-window calc handles both kernels (same -4..+3 byte footprint per row). test_api_lpf exercises both kernels in CPU / QPU / AUTO modes against the C reference. All 6 mode/kernel combinations pass 2048/2048 bit-exact (32 edges × 8 rows × 8 bytes/edge). Phase 8 status after this commit: 3 of 5 kernels wired through API for QPU dispatch (IDCT, LPF wd=4, LPF wd=8 — i.e., all 3 QPU-default kernels per recipe). Cycle 3 MC and cycle 5 CDEF still need wiring for opportunistic-override mode but aren't needed for recipe-AUTO path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -305,6 +305,14 @@ add_executable(test_api_idct
|
||||
target_link_libraries(test_api_idct PRIVATE daedalus_core)
|
||||
target_compile_options(test_api_idct PRIVATE -O2)
|
||||
|
||||
add_executable(test_api_lpf
|
||||
tests/test_api_lpf.c
|
||||
tests/vp9_lpf_ref.c
|
||||
tests/vp9_lpf8_ref.c
|
||||
)
|
||||
target_link_libraries(test_api_lpf PRIVATE daedalus_core)
|
||||
target_compile_options(test_api_lpf PRIVATE -O2)
|
||||
|
||||
if (DAEDALUS_BUILD_VULKAN)
|
||||
# (re-open the conditional so the closing endif() below balances)
|
||||
|
||||
|
||||
Reference in New Issue
Block a user