daedalus-fourier/tests at 436a5c4f74d3c9cdd60b4c25ff7c5131e41bcf76 - daedalus-fourier - marfrit's space

marfrit/daedalus-fourier

marfrit 436a5c4f74 Cycle 8 Phase 3 closed: H.264 deblock NEON = 92 Medge/s

M1: 10000/10000 bit-exact (after orientation fix:
ff_h264_v_loop_filter is "vertical filtering of horizontal edges",
not "vertical edge"; 16 columns process the edge horizontally with
8 rows of vertical context).

M3: 91.947 Medge/s per core. Per-edge 10.9 ns. 11x worst-case
1080p30 floor, 30x realistic floor. Filter triggers on 25 % of
edges (random alpha/beta/tc0 covers both gating paths).
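
The M3 arithmetic can be checked with a small sketch. The helper names
below are hypothetical and not part of the repository. The commit does
not state the 1080p30 floors themselves, so the second helper only
back-derives them from the quoted 11x and 30x headroom.

```c
/* Hypothetical helpers for sanity-checking the M3 figures above. */

/* Convert a throughput in Medge/s into nanoseconds per edge:
 * 1e9 ns/s divided by (M * 1e6 edges/s) = 1000 / M ns per edge. */
static double ns_per_edge(double medge_per_s)
{
    return 1000.0 / medge_per_s;
}

/* Back-derive the throughput floor (in Medge/s) implied by a quoted
 * headroom multiple. Assumption: headroom = measured / floor. */
static double implied_floor_medge(double medge_per_s, double headroom)
{
    return medge_per_s / headroom;
}
```

With the measured 91.947 Medge/s, ns_per_edge gives roughly 10.9 ns,
and the implied floors come out near 8.4 Medge/s (11x) and 3.1 Medge/s
(30x).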

Cycle 8 Phase 9 lesson: H.264/FFmpeg "v_loop_filter" naming uses
filter DIRECTION (vertical) not edge orientation. Edge is
horizontal; filter operates vertically across it. Distinct from
cycle 6's column-major-block lesson but related discovery
pattern. Encoded for future cycles.
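
A minimal sketch of the addressing this lesson implies, assuming an
8-bit plane with a row stride in bytes. The function name and the
p3..q3 output ordering are illustrative only and are not taken from
tests/h264_deblock_ref.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: for a HORIZONTAL edge, gather the samples the
 * "v" (vertical-direction) filter sees in column x.
 * `edge` points at the first sample of the row just below the edge
 * (q0 of column 0). The walk is vertical: each step moves one row.
 * out[0..3] = p3, p2, p1, p0 (rows -4..-1, above the edge)
 * out[4..7] = q0, q1, q2, q3 (rows 0..3, below the edge) */
static void gather_across_horizontal_edge(const uint8_t *edge,
                                          ptrdiff_t stride,
                                          int x,
                                          uint8_t out[8])
{
    for (int k = 0; k < 8; k++)
        out[k] = edge[(k - 4) * stride + x];
}
```

Calling this for x = 0..15 covers the 16 columns along the edge, each
with its 8 rows of vertical context.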

R8 prediction revised: 0.09-0.14 ORANGE (down from Phase 1's
0.3-0.8 estimate). H.264 deblock is 2x faster on NEON than VP9
LPF wd=4 (cycle 2) but H.264 deblock has more per-edge branches
that hurt QPU more. Worth building anyway:
- ORANGE in cycle 1's "M4 may rescue" band
- Mixed-kernel deployment helper value (Issue 003) matters more
  than isolation R
- 25%-trigger rate gives 4x effective contribution multiplier
  on QPU side

- tests/h264_deblock_ref.c (column-walking C ref per row segment)
- tests/bench_neon_h264deblock.c (M1 + M3 bench)
- CMakeLists.txt: cycle 8 NEON bench wiring + h264dsp_neon.S
- docs/k8_h264deblock_phase3.md (closure)

Next: Phase 4 plan QPU shader, Phase 5 Sonnet review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 14:39:36 +00:00

.gitkeep

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00

bench_concurrent_lpf8.c

Cycle 4 (LPF wd=8) closure: M1=100%, R=0.34, M4=+4.1%, PASS

2026-05-18 12:56:25 +00:00

bench_concurrent_lpf.c

Cycle 2 (LPF) closure: M1''=100%, R''=0.41, M4''=+6.9%, PASS

2026-05-18 12:39:26 +00:00

bench_concurrent_mc.c

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

bench_concurrent_mixed.c

Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper

2026-05-18 13:52:46 +00:00

bench_concurrent.c

Phase 7 M4: mixed CPU+QPU beats pure 4-core NEON; project continues

2026-05-18 12:18:36 +00:00

bench_neon_cdef.c

Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper

2026-05-18 13:52:46 +00:00

bench_neon_h264deblock.c

Cycle 8 Phase 3 closed: H.264 deblock NEON = 92 Medge/s

2026-05-18 14:39:36 +00:00

bench_neon_h264idct4.c

Cycle 6 (H.264) opened — IDCT 4x4 Phase 1+3, M3 = 175 Mblock/s

2026-05-18 14:14:43 +00:00

bench_neon_h264idct8.c

Cycle 7 closed: H.264 IDCT 8x8 = 151 Mblock/s NEON, Phase 4 deferred

2026-05-18 14:16:42 +00:00

bench_neon_idct.c

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00

bench_neon_lpf8.c

Cycle 4 (LPF wd=8) closure: M1=100%, R=0.34, M4=+4.1%, PASS

2026-05-18 12:56:25 +00:00

bench_neon_lpf.c

Cycle 2 (deblocking) Phase 1-3: M3'' = 48.285 Medge/s baseline

2026-05-18 12:28:57 +00:00

bench_neon_mc.c

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

bench_v3d_cdef.c

Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper

2026-05-18 13:52:46 +00:00

bench_v3d_idct.c

Phase 6 (v1+v4 production) + Phase 7 closure: R = 0.92 ± 0.03 on hertz

2026-05-18 12:09:00 +00:00

bench_v3d_lpf8.c

Cycle 4 (LPF wd=8) closure: M1=100%, R=0.34, M4=+4.1%, PASS

2026-05-18 12:56:25 +00:00

bench_v3d_lpf.c

Cycle 2 (LPF) closure: M1''=100%, R''=0.41, M4''=+6.9%, PASS

2026-05-18 12:39:26 +00:00

bench_v3d_mc.c

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

bench_vulkan_dispatch.c

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00

cdef_ref.c

Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper

2026-05-18 13:52:46 +00:00

h264_deblock_ref.c

Cycle 8 Phase 3 closed: H.264 deblock NEON = 92 Medge/s

2026-05-18 14:39:36 +00:00

h264_idct4_ref.c

Cycle 6 (H.264) opened — IDCT 4x4 Phase 1+3, M3 = 175 Mblock/s

2026-05-18 14:14:43 +00:00

h264_idct8_ref.c

Cycle 7 closed: H.264 IDCT 8x8 = 151 Mblock/s NEON, Phase 4 deferred

2026-05-18 14:16:42 +00:00

test_api_idct.c

Phase 8: wire IDCT QPU dispatch through public API

2026-05-18 13:55:55 +00:00

test_api_lpf.c

Phase 8: wire LPF wd=4 + wd=8 QPU through public API

2026-05-18 13:57:25 +00:00

vp9_idct8_ref.c

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00

vp9_lpf8_ref.c

Cycle 4 (LPF wd=8) closure: M1=100%, R=0.34, M4=+4.1%, PASS

2026-05-18 12:56:25 +00:00

vp9_lpf_ref.c

Cycle 2 (deblocking) Phase 1-3: M3'' = 48.285 Medge/s baseline

2026-05-18 12:28:57 +00:00

vp9_mc_ref.c

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00