Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper
Phase 4 plan with 3 Phase-5 REDs applied inline:
- meta layout: m.z=tmp_off, m.w=dir
- sec_shift clamped to >=0 (NEON uqsub semantics)
- directions table as const ivec2[14], not OR-packed

Phase 6 deliverable: v3d_cdef.comp (387 inst, 2 threads, no spills).
3-way M1 (QPU vs C ref vs NEON) PASS 4096/4096.
M2: 0.443 Mblock/s -> R5 = 0.116 ORANGE (predicted 0.02-0.05 RED).
M4 same-kernel: NEON-3+QPU 8.46 < NEON-4 alone ~10 (negative).
M4 mixed (NEON-3 MC + QPU CDEF): CPU 34.17 Mblock/s MC, QPU 0.42 Mblock/s CDEF helper.
CPU side higher than the Issue 003 NEON-fallback proxy suggested - cross-substrate contention is gentler than same-side NEON contention.

Verdict: CDEF stays on CPU; QPU dispatch path exists for opportunistic use.
Deployment recipe table updated for all 5 cycles.

Phase 9 lessons: linear extrapolation across cycles is too pessimistic; CDEF is bandwidth-bound on NEON despite high per-block ns; real-substrate-cross contention < NEON-proxy contention.

- src/v3d_cdef.comp: cycle 5 QPU shader
- tests/bench_v3d_cdef.c: 3-way M1, M2 bench
- tests/bench_concurrent_mixed.c: K_CDEF on both sides
- tests/cdef_ref.c + bench_neon_cdef.c: sec_shift clamp + expanded damping range to exercise the edge case
- CMakeLists.txt: v3d_cdef.spv + bench_v3d_cdef wiring
- docs/k5_cdef_phase4.md updated with Phase 5 review applied
- docs/k5_cdef_phase7.md: closure doc with full verdict matrix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -79,12 +79,17 @@ static void gen_filter_params(int *pri, int *sec, int *dir, int *damping)
      * pri_strength: 1..7 (non-zero for combined path)
      * sec_strength: 1..4
      * dir: 0..7
-     * damping: 3..6
+     * damping: 1..6 — extended down to 1 (was 3..6) per
+     * cycle 5 phase 5 RED-2: include cases where
+     * sec_shift = damping - ulog2(sec) goes negative
+     * (e.g. damping=1, sec=4 → sec_shift = -1).
+     * Both NEON (uqsub) and C ref (now max(0,...))
+     * saturate to 0 here; the bench should exercise it.
      */
     *pri = (int)(xs() % 7) + 1;
     *sec = (int)(xs() % 4) + 1;
     *dir = (int)(xs() & 7);
-    *damping = (int)(xs() % 4) + 3;
+    *damping = (int)(xs() % 6) + 1;
 }
 
 static double now_seconds(void)
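For reviewers, the edge case the widened damping range targets can be sketched in isolation. This is a minimal illustration, not the code in tests/cdef_ref.c: the `ulog2` body below is an assumed floor-log2 implementation, and `sec_shift_raw`, `sec_shift_clamped`, and `count_negative_raw` are hypothetical names introduced only for this sketch. It follows the formula in the comment above (`sec_shift = damping - ulog2(sec)`) and the `max(0, ...)` clamp described for the C reference.

```c
#include <assert.h>

/* Assumed implementation: floor(log2(v)) for v >= 1. */
static int ulog2(unsigned v)
{
    assert(v >= 1);
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Unclamped shift, as written in the diff comment. Can go negative. */
static int sec_shift_raw(int damping, int sec)
{
    return damping - ulog2((unsigned)sec);
}

/* Clamped shift: saturates at 0, mirroring the max(0, ...) fix. */
static int sec_shift_clamped(int damping, int sec)
{
    int s = sec_shift_raw(damping, sec);
    return s < 0 ? 0 : s;
}

/* Hypothetical helper: counts (damping, sec) pairs in the bench's new
 * ranges (damping 1..6, sec 1..4) whose raw shift is negative. */
static int count_negative_raw(void)
{
    int count = 0;
    for (int damping = 1; damping <= 6; damping++)
        for (int sec = 1; sec <= 4; sec++)
            if (sec_shift_raw(damping, sec) < 0)
                count++;
    return count;
}
```

Under these assumptions, only damping=1 with sec=4 produces a negative raw shift within the new ranges, which is why the old 3..6 damping range never reached the clamp.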