Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper

Phase 4 plan with 3 Phase-5 REDs applied inline:
  - meta layout: m.z=tmp_off, m.w=dir
  - sec_shift clamped to >=0 (NEON uqsub semantics)
  - directions table as const ivec2[14], not OR-packed

Phase 6 deliverable: v3d_cdef.comp (387 inst, 2 threads, no spills).
3-way M1 (QPU vs C ref vs NEON) PASS 4096/4096.

M2: 0.443 Mblock/s -> R5 = 0.116 ORANGE (predicted 0.02-0.05 RED).
M4 same-kernel: NEON-3+QPU 8.46 < NEON-4 alone ~10 (negative).
M4 mixed (NEON-3 MC + QPU CDEF): CPU 34.17 Mblock/s MC,
  QPU 0.42 Mblock/s CDEF helper. CPU side higher than the
  Issue 003 NEON-fallback proxy suggested - cross-substrate
  contention is gentler than same-side NEON contention.

Verdict: CDEF stays on CPU; QPU dispatch path exists for
opportunistic use. Deployment recipe table updated for all 5
cycles. Phase 9 lessons: linear extrapolation across cycles is
too pessimistic; CDEF is bandwidth-bound on NEON despite high
per-block ns; real-substrate-cross contention < NEON-proxy
contention.

- src/v3d_cdef.comp: cycle 5 QPU shader
- tests/bench_v3d_cdef.c: 3-way M1, M2 bench
- tests/bench_concurrent_mixed.c: K_CDEF on both sides
- tests/cdef_ref.c + bench_neon_cdef.c: sec_shift clamp +
  expanded damping range to exercise the edge case
- CMakeLists.txt: v3d_cdef.spv + bench_v3d_cdef wiring
- docs/k5_cdef_phase4.md updated with Phase 5 review applied
- docs/k5_cdef_phase7.md: closure doc with full verdict matrix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-18 13:52:46 +00:00
parent 1740e7c165
commit 5223d3cb3f
8 changed files with 849 additions and 36 deletions
+23 -1
View File
@@ -207,7 +207,18 @@ if (DAEDALUS_BUILD_VULKAN)
VERBATIM
)
add_custom_target(daedalus_shaders ALL DEPENDS ${NOOP_SPV} ${IDCT8_SPV} ${LPF_SPV} ${MC_SPV} ${LPF8_SPV})
set(CDEF_SPV ${CMAKE_BINARY_DIR}/v3d_cdef.spv)
add_custom_command(
OUTPUT ${CDEF_SPV}
COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
-o ${CDEF_SPV}
${CMAKE_SOURCE_DIR}/src/v3d_cdef.comp
DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_cdef.comp
COMMENT "glslang: v3d_cdef.comp -> v3d_cdef.spv"
VERBATIM
)
add_custom_target(daedalus_shaders ALL DEPENDS ${NOOP_SPV} ${IDCT8_SPV} ${LPF_SPV} ${MC_SPV} ${LPF8_SPV} ${CDEF_SPV})
# v3d_runner — reusable Vulkan plumbing.
add_library(v3d_runner STATIC src/v3d_runner.c)
@@ -255,6 +266,17 @@ if (DAEDALUS_BUILD_VULKAN)
target_link_libraries(bench_v3d_lpf8 PRIVATE v3d_runner Vulkan::Vulkan)
target_compile_options(bench_v3d_lpf8 PRIVATE -O2)
# Cycle 5 — QPU CDEF bench (3-way M1 against NEON + C ref).
add_executable(bench_v3d_cdef
tests/bench_v3d_cdef.c
tests/cdef_ref.c
${DAV1D_CDEF_ASM_SOURCES}
${DAV1D_CDEF_C_SOURCES}
)
add_dependencies(bench_v3d_cdef daedalus_shaders)
target_link_libraries(bench_v3d_cdef PRIVATE v3d_runner Vulkan::Vulkan)
target_compile_options(bench_v3d_cdef PRIVATE -O2)
# M4 — concurrent CPU(NEON) + QPU bench. Links the FFmpeg NEON
# snapshot so we can run real NEON kernels on pinned CPU cores
# while the QPU runs its dispatch loop concurrently.