Files
daedalus-fourier/docs
marfrit 9c0bd72e70 Cycle 5 Phase 3 closed: M1 PASS via bench pointer-convention fix
The previous "layout mismatch" deferral was a one-line bench bug:
NEON expects the caller to pass tmp pointing at the 8x8 block
origin (after the 2*16+2 padding skip), but the bench passed the
raw padded-buffer origin. C ref does the advance internally, so it
filtered the correct block; NEON filtered a (+2 rows, +2 cols)
shifted region. Diagonal-shift trace in the partial doc was
exactly that.

Fix: tmps + i*TMP_INTS + (2*TMP_W + 2) for NEON calls.

Results:
  M1: 10000/10000 bit-exact (100.0000%), all 8 dirs balanced
  M3: 3.809 Mblock/s (consistent with 3.923 from longer window)

Phase 4 unblocked; predicted R5 = 0.02-0.05 (deep RED) per earlier
analysis. Will build QPU CDEF anyway for cycle-completeness +
V4L2 dispatch-path existence.

- tests/bench_neon_cdef.c: 3-line tmp pointer fix
- docs/k5_cdef_phase3.md: supersedes k5_cdef_phase3_partial.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 13:46:50 +00:00
..