Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper
Phase 4 plan with 3 Phase-5 REDs applied inline:
- meta layout: m.z=tmp_off, m.w=dir
- sec_shift clamped to >=0 (NEON uqsub semantics)
- directions table as const ivec2[14], not OR-packed

Phase 6 deliverable: v3d_cdef.comp (387 inst, 2 threads, no spills).
3-way M1 (QPU vs C ref vs NEON) PASS 4096/4096.
M2: 0.443 Mblock/s -> R5 = 0.116 ORANGE (predicted 0.02-0.05 RED).
M4 same-kernel: NEON-3+QPU 8.46 < NEON-4 alone ~10 (negative).
M4 mixed (NEON-3 MC + QPU CDEF): CPU 34.17 Mblock/s MC, QPU 0.42
Mblock/s CDEF helper. CPU side higher than the Issue 003 NEON-fallback
proxy suggested - cross-substrate contention is gentler than same-side
NEON contention.

Verdict: CDEF stays on CPU; QPU dispatch path exists for opportunistic
use. Deployment recipe table updated for all 5 cycles.

Phase 9 lessons: linear extrapolation across cycles is too pessimistic;
CDEF is bandwidth-bound on NEON despite high per-block ns;
real-substrate-cross contention < NEON-proxy contention.

- src/v3d_cdef.comp: cycle 5 QPU shader
- tests/bench_v3d_cdef.c: 3-way M1, M2 bench
- tests/bench_concurrent_mixed.c: K_CDEF on both sides
- tests/cdef_ref.c + bench_neon_cdef.c: sec_shift clamp + expanded
  damping range to exercise the edge case
- CMakeLists.txt: v3d_cdef.spv + bench_v3d_cdef wiring
- docs/k5_cdef_phase4.md updated with Phase 5 review applied
- docs/k5_cdef_phase7.md: closure doc with full verdict matrix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
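The sec_shift clamp mentioned in the message can be illustrated with a small C sketch. It assumes the shift is derived as damping minus floor(log2(strength)), which is how CDEF shifts are commonly computed; the helper names `ulog2_sketch` and `clamped_shift` are hypothetical and are not taken from this commit. The only point is that an unsigned saturating subtract (the behaviour of NEON `uqsub`) floors at 0 where a plain signed subtract would go negative.

```c
/* Hypothetical helper: floor(log2(v)) for v > 0. */
static int ulog2_sketch(unsigned v) {
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Hypothetical helper, assuming shift = damping - floor(log2(strength)).
 * Clamping at 0 matches what an unsigned saturating subtract produces,
 * so a scalar reference and a saturating SIMD path agree on the edge case. */
static int clamped_shift(int damping, unsigned strength) {
    int shift = damping - ulog2_sketch(strength);
    return shift < 0 ? 0 : shift;
}
```

For example, `clamped_shift(6, 4)` gives 4, while `clamped_shift(3, 16)` gives 0 rather than -1. Small damping values paired with large strengths are presumably the edge case the expanded damping range in the tests is meant to reach.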
+23
-1
@@ -207,7 +207,18 @@ if (DAEDALUS_BUILD_VULKAN)
     VERBATIM
 )
 
-add_custom_target(daedalus_shaders ALL DEPENDS ${NOOP_SPV} ${IDCT8_SPV} ${LPF_SPV} ${MC_SPV} ${LPF8_SPV})
+set(CDEF_SPV ${CMAKE_BINARY_DIR}/v3d_cdef.spv)
+add_custom_command(
+    OUTPUT ${CDEF_SPV}
+    COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
+            -o ${CDEF_SPV}
+            ${CMAKE_SOURCE_DIR}/src/v3d_cdef.comp
+    DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_cdef.comp
+    COMMENT "glslang: v3d_cdef.comp -> v3d_cdef.spv"
+    VERBATIM
+)
+
+add_custom_target(daedalus_shaders ALL DEPENDS ${NOOP_SPV} ${IDCT8_SPV} ${LPF_SPV} ${MC_SPV} ${LPF8_SPV} ${CDEF_SPV})
 
 # v3d_runner — reusable Vulkan plumbing.
 add_library(v3d_runner STATIC src/v3d_runner.c)
@@ -255,6 +266,17 @@ if (DAEDALUS_BUILD_VULKAN)
 target_link_libraries(bench_v3d_lpf8 PRIVATE v3d_runner Vulkan::Vulkan)
 target_compile_options(bench_v3d_lpf8 PRIVATE -O2)
 
+# Cycle 5 — QPU CDEF bench (3-way M1 against NEON + C ref).
+add_executable(bench_v3d_cdef
+    tests/bench_v3d_cdef.c
+    tests/cdef_ref.c
+    ${DAV1D_CDEF_ASM_SOURCES}
+    ${DAV1D_CDEF_C_SOURCES}
+)
+add_dependencies(bench_v3d_cdef daedalus_shaders)
+target_link_libraries(bench_v3d_cdef PRIVATE v3d_runner Vulkan::Vulkan)
+target_compile_options(bench_v3d_cdef PRIVATE -O2)
+
 # M4 — concurrent CPU(NEON) + QPU bench. Links the FFmpeg NEON
 # snapshot so we can run real NEON kernels on pinned CPU cores
 # while the QPU runs its dispatch loop concurrently.