Phase 8: wire LPF wd=4 + wd=8 QPU through public API

Mirror the IDCT pattern (lazy pipeline + per-call SSBO alloc +
dispatch + readback) for cycles 2 (LPF wd=4) and 4 (LPF wd=8).

Important caught-empirically bug: the two LPF shaders disagree
on push-constant slot order — wd=4 puts dst_stride_u8 at slot 1,
wd=8 puts it at slot 2 (with unused blocks_per_row at slot 1).
Initial single-struct attempt silently corrupted wd=8 output
(1958/2048 = 95.6 % bit-exact on test_api_lpf). Fixed by keeping
separate lpf4_pc and lpf8_pc struct definitions.

dst-window calc handles both kernels (same -4..+3 byte footprint
per row).

test_api_lpf exercises both kernels in CPU / QPU / AUTO modes
against the C reference. All 6 mode/kernel combinations pass
2048/2048 bit-exact (32 edges × 8 rows × 8 bytes/edge).

Phase 8 status after this commit: 3 of 5 kernels wired through
API for QPU dispatch (IDCT, LPF wd=4, LPF wd=8 — i.e., all 3
QPU-default kernels per recipe). Cycle 3 MC and cycle 5 CDEF
still need wiring for opportunistic-override mode but aren't
needed for recipe-AUTO path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-18 13:57:25 +00:00
parent 1085c5699c
commit eb5cfb34c4
3 changed files with 262 additions and 5 deletions
+8
View File
@@ -305,6 +305,14 @@ add_executable(test_api_idct
target_link_libraries(test_api_idct PRIVATE daedalus_core)
target_compile_options(test_api_idct PRIVATE -O2)
add_executable(test_api_lpf
tests/test_api_lpf.c
tests/vp9_lpf_ref.c
tests/vp9_lpf8_ref.c
)
target_link_libraries(test_api_lpf PRIVATE daedalus_core)
target_compile_options(test_api_lpf PRIVATE -O2)
if (DAEDALUS_BUILD_VULKAN)
# (re-open the conditional so the closing endif() below balances)