Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper
Phase 4 plan with 3 Phase-5 REDs applied inline:
- meta layout: m.z=tmp_off, m.w=dir
- sec_shift clamped to >=0 (NEON uqsub semantics)
- directions table as const ivec2[14], not OR-packed

Phase 6 deliverable: v3d_cdef.comp (387 inst, 2 threads, no spills).

3-way M1 (QPU vs C ref vs NEON): PASS 4096/4096.

M2: 0.443 Mblock/s -> R5 = 0.116 ORANGE (predicted 0.02-0.05 RED).

M4 same-kernel: NEON-3+QPU 8.46 < NEON-4 alone ~10 (negative).
M4 mixed (NEON-3 MC + QPU CDEF): CPU 34.17 Mblock/s MC, QPU 0.42 Mblock/s CDEF helper. CPU-side throughput is higher than the Issue 003 NEON-fallback proxy suggested, so cross-substrate contention is gentler than same-side NEON contention.

Verdict: CDEF stays on CPU; the QPU dispatch path exists for opportunistic use. Deployment recipe table updated for all 5 cycles.

Phase 9 lessons:
- linear extrapolation across cycles is too pessimistic
- CDEF is bandwidth-bound on NEON despite high per-block ns
- real cross-substrate contention is lower than NEON-proxy contention

- src/v3d_cdef.comp: cycle 5 QPU shader
- tests/bench_v3d_cdef.c: 3-way M1, M2 bench
- tests/bench_concurrent_mixed.c: K_CDEF on both sides
- tests/cdef_ref.c + bench_neon_cdef.c: sec_shift clamp + expanded damping range to exercise the edge case
- CMakeLists.txt: v3d_cdef.spv + bench_v3d_cdef wiring
- docs/k5_cdef_phase4.md: updated with Phase 5 review applied
- docs/k5_cdef_phase7.md: closure doc with full verdict matrix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+4 -1
@@ -98,7 +98,10 @@ void daedalus_cdef_filter_8x8_pri_sec_ref(
 {
     const int pri_tap = 4 - (pri_strength & 1);
     const int pri_shift = imax(0, damping - ulog2((unsigned) pri_strength));
-    const int sec_shift = damping - ulog2((unsigned) sec_strength);
+    /* Cycle 5 phase 5 RED-2: NEON `uqsub` saturates to 0. Mirror it
+     * here so the C ref is bit-exact against NEON for damping-light
+     * cases (which the original bench param gen didn't exercise). */
+    const int sec_shift = imax(0, damping - ulog2((unsigned) sec_strength));
 
     /* Walk into the center 8x8 region of the 12×16 padded buffer. */
     tmp = tmp + 2 * TMP_STRIDE + 2;