marfrit 436a5c4f74 Cycle 8 Phase 3 closed: H.264 deblock NEON = 92 Medge/s
M1: 10000/10000 bit-exact (after orientation fix: ff_h264_v_loop_filter
performs vertical filtering across a horizontal edge, not filtering of
a vertical edge; the 16 columns span the edge horizontally, each with
8 rows of vertical context).
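
A minimal sketch of an M1-style bit-exactness loop, assuming both
implementations share one signature. The names deblock_fn,
count_bit_exact and standin_filter are hypothetical, as are the seed
handling and the parameter ranges (picked to cover alpha 0..255,
beta 0..18, tc0 -1..25); the actual bench may differ on all of these.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical common signature for the C reference and the NEON routine. */
typedef void (*deblock_fn)(uint8_t *pix, ptrdiff_t stride,
                           int alpha, int beta, const int8_t tc0[4]);

/* 16 columns wide, 8 rows of context, edge between rows 3 and 4. */
enum { W = 16, ROWS = 8, EDGE_ROW = 4 };

/* Stand-in so the sketch links on its own. This is NOT the H.264 filter. */
void standin_filter(uint8_t *pix, ptrdiff_t stride,
                    int alpha, int beta, const int8_t tc0[4])
{
    (void)alpha; (void)beta;
    for (int x = 0; x < W; x++)
        if (tc0[x >> 2] >= 0)   /* negative tc0 skips the 4-column group */
            pix[x] = (uint8_t)((pix[x - stride] + pix[x] + 1) >> 1);
}

/* Run both filters on identical random input; count identical outputs. */
int count_bit_exact(deblock_fn ref, deblock_fn opt, int trials, unsigned seed)
{
    assert(ref && opt && trials >= 0);
    srand(seed);
    int matches = 0;
    for (int t = 0; t < trials; t++) {
        uint8_t a[ROWS * W], b[ROWS * W];
        int8_t tc0[4];
        for (size_t i = 0; i < sizeof a; i++)
            a[i] = (uint8_t)(rand() & 0xFF);
        memcpy(b, a, sizeof a);
        for (int i = 0; i < 4; i++)
            tc0[i] = (int8_t)(rand() % 27 - 1);   /* -1..25 */
        int alpha = rand() % 256;                  /* 0..255 */
        int beta  = rand() % 19;                   /* 0..18  */
        ref(a + EDGE_ROW * W, W, alpha, beta, tc0);
        opt(b + EDGE_ROW * W, W, alpha, beta, tc0);
        matches += (memcmp(a, b, sizeof a) == 0);  /* whole block, not just edge rows */
    }
    return matches;
}
```

Comparing the whole 8x16 block rather than only the rows next to the
edge also catches writes to the wrong rows or columns, which is how an
orientation mix-up would show up.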

M3: 91.947 Medge/s per core, i.e. 10.9 ns per edge. That is 11x the
worst-case 1080p30 floor and 30x the realistic floor. Filter triggers
on 25% of edges (random alpha/beta/tc0 covers both gating paths).
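
The M3 arithmetic can be checked with a small helper. This is only a
sketch: ns_per_edge and implied_floor_medge are hypothetical names, and
the floors it returns are back-computed from the 11x and 30x ratios
stated above, not measured.

```c
#include <assert.h>

/* Convert throughput in millions of edges per second to ns per edge:
 * 1e9 ns/s divided by (medge_per_s * 1e6 edges/s). */
double ns_per_edge(double medge_per_s)
{
    assert(medge_per_s > 0.0);
    return 1e3 / medge_per_s;
}

/* Throughput floor (Medge/s) implied by a stated headroom factor. */
double implied_floor_medge(double medge_per_s, double headroom)
{
    assert(headroom > 0.0);
    return medge_per_s / headroom;
}
```

With the measured 91.947 Medge/s, ns_per_edge gives about 10.88 ns, and
the 11x and 30x ratios imply floors of roughly 8.4 and 3.1 Medge/s.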

Cycle 8 Phase 9 lesson: H.264/FFmpeg "v_loop_filter" naming refers to
the filter DIRECTION (vertical), not the edge orientation. The edge is
horizontal; the filter operates vertically across it. This is distinct
from cycle 6's column-major-block lesson but follows a related
discovery pattern. Encoded for future cycles.
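
To make the orientation concrete, here is a scalar sketch of the
normal-strength (bS < 4) luma filter from the H.264 standard, applied
vertically across a horizontal edge. It is not the repository's
tests/h264_deblock_ref.c; v_loop_filter_luma_sketch and its helpers
are hypothetical names. It assumes 8-bit samples and an arithmetic
right shift of negative ints, and it reads only three rows on each
side of the edge (p2..p0 above, q0..q2 below).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int clip3(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

static uint8_t clip_u8(int v) { return (uint8_t)clip3(v, 0, 255); }

/* pix points at row q0, the first row BELOW the horizontal edge.
 * The loop walks 16 columns along the edge; within each column the
 * samples are read and written vertically, one stride apart. */
void v_loop_filter_luma_sketch(uint8_t *pix, ptrdiff_t stride,
                               int alpha, int beta, const int8_t tc0[4])
{
    assert(stride > 0);
    for (int x = 0; x < 16; x++) {
        int t0 = tc0[x >> 2];            /* one tc0 per 4-column group */
        if (t0 < 0)
            continue;                    /* group not filtered */
        uint8_t *c = pix + x;
        int p2 = c[-3 * stride], p1 = c[-2 * stride], p0 = c[-stride];
        int q0 = c[0], q1 = c[stride], q2 = c[2 * stride];

        /* Gating: skip edges that look like real image detail. */
        if (abs(p0 - q0) >= alpha || abs(p1 - p0) >= beta ||
            abs(q1 - q0) >= beta)
            continue;

        int tc = t0;
        int avg = (p0 + q0 + 1) >> 1;
        if (abs(p2 - p0) < beta) {       /* smooth above: also adjust p1 */
            if (t0)
                c[-2 * stride] = (uint8_t)(p1 + clip3(((p2 + avg) >> 1) - p1, -t0, t0));
            tc++;
        }
        if (abs(q2 - q0) < beta) {       /* smooth below: also adjust q1 */
            if (t0)
                c[stride] = (uint8_t)(q1 + clip3(((q2 + avg) >> 1) - q1, -t0, t0));
            tc++;
        }
        int delta = clip3(((q0 - p0) * 4 + (p1 - q1) + 4) >> 3, -tc, tc);
        c[-stride] = clip_u8(p0 + delta);
        c[0]       = clip_u8(q0 - delta);
    }
}
```

Swapping the two address steps (walk columns with stride, read
neighbours one byte apart) gives the h_loop_filter case, i.e.
horizontal filtering across a vertical edge.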

R8 prediction revised: 0.09-0.14 ORANGE (down from Phase 1's
0.3-0.8 estimate). H.264 deblock is 2x faster on NEON than VP9
LPF wd=4 (cycle 2), but its extra per-edge branches hurt the QPU
more. Worth building anyway:
- ORANGE in cycle 1's "M4 may rescue" band
- Mixed-kernel deployment helper value (Issue 003) matters more
  than isolation R
- 25%-trigger rate gives 4x effective contribution multiplier
  on QPU side

- tests/h264_deblock_ref.c (column-walking C ref per row segment)
- tests/bench_neon_h264deblock.c (M1 + M3 bench)
- CMakeLists.txt: cycle 8 NEON bench wiring + h264dsp_neon.S
- docs/k8_h264deblock_phase3.md (closure)

Next: Phase 4 (plan QPU shader), Phase 5 (Sonnet review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 14:39:36 +00:00