daedalus-fourier/docs at db2205d0e352df29a9c3761a81256a822edb7d01

marfrit/daedalus-fourier

marfrit db2205d0e3 Cycle 7 closed: H.264 IDCT 8x8 = 151 Mblock/s NEON, Phase 4 deferred

M1: 10000/10000 bit-exact first try (column-major-block lesson
from cycle 6 carried over cleanly).

M3: 151.2 Mblock/s per core, i.e. 6.6 ns per block. That is 155x the
1080p30 floor (0.972 Mblock/s required).
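
The M3 figures can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical and not part of the repository, and it assumes the 1080p30 floor counts every 8x8 block of a 1920x1080 frame (worst case, all blocks transformed), which reproduces the 0.972 Mblock/s quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Blocks per second needed at a given resolution and frame rate.
 * Assumption: every block_size x block_size block needs a transform. */
static uint64_t blocks_per_second(uint32_t width, uint32_t height,
                                  uint32_t fps, uint32_t block_size)
{
    assert(block_size != 0);
    uint64_t blocks_per_frame =
        ((uint64_t)width * height) / ((uint64_t)block_size * block_size);
    return blocks_per_frame * fps;
}

/* Convert a throughput in Mblock/s into nanoseconds per block. */
static double ns_per_block(double mblock_per_s)
{
    assert(mblock_per_s > 0.0);
    return 1000.0 / mblock_per_s;
}

/* Ratio of measured throughput to the required floor. */
static double headroom(double measured_mblock_per_s,
                       uint64_t required_blocks_per_s)
{
    assert(required_blocks_per_s != 0);
    return measured_mblock_per_s * 1e6 / (double)required_blocks_per_s;
}
```

Calling `blocks_per_second(1920, 1080, 30, 8)` gives 972000 blocks/s, `ns_per_block(151.2)` gives roughly 6.6 ns, and `headroom(151.2, 972000)` gives roughly 155.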

Phase-1 prediction of R7 = 0.5-0.9 YELLOW/GREEN was WRONG. H.264
IDCT 8x8 is dramatically lighter than VP9 IDCT 8x8 (18.5x faster
on NEON):

  VP9 IDCT 8x8: 122 ns/block (Q14 trig + COSPI multiplies)
  H.264 IDCT 8x8: 6.6 ns/block (pure integer butterfly + shifts)
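
To make "pure integer butterfly + shifts" concrete, here is a minimal scalar sketch of the H.264 8x8 inverse transform as it is commonly implemented in open-source decoders. It is not copied from tests/h264_idct8_ref.c, the function names are hypothetical, and the row-then-column order here is a plain row-major assumption that does not model the column-major block layout mentioned above. Note that no multiplies appear, only adds, subtracts and shifts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 1-D pass of the H.264 8x8 inverse transform.
 * Relies on arithmetic right shift of negative values, which is
 * implementation-defined in C but holds on mainstream compilers. */
static void h264_idct8_1d(const int32_t in[8], int32_t out[8])
{
    /* Even half */
    int32_t a0 = in[0] + in[4];
    int32_t a2 = in[0] - in[4];
    int32_t a4 = (in[2] >> 1) - in[6];
    int32_t a6 = in[2] + (in[6] >> 1);
    int32_t b0 = a0 + a6, b2 = a2 + a4, b4 = a2 - a4, b6 = a0 - a6;

    /* Odd half */
    int32_t a1 = -in[3] + in[5] - in[7] - (in[7] >> 1);
    int32_t a3 =  in[1] + in[7] - in[3] - (in[3] >> 1);
    int32_t a5 = -in[1] + in[7] + in[5] + (in[5] >> 1);
    int32_t a7 =  in[3] + in[5] + in[1] + (in[1] >> 1);
    int32_t b1 = (a7 >> 2) + a1;
    int32_t b3 = a3 + (a5 >> 2);
    int32_t b5 = (a3 >> 2) - a5;
    int32_t b7 = a7 - (a1 >> 2);

    /* Final butterfly */
    out[0] = b0 + b7;  out[7] = b0 - b7;
    out[1] = b2 + b5;  out[6] = b2 - b5;
    out[2] = b4 + b3;  out[5] = b4 - b3;
    out[3] = b6 + b1;  out[4] = b6 - b1;
}

/* 2-D transform in place: rows, then columns, then (x + 32) >> 6 rounding.
 * The result is the residual, before it is added to the prediction. */
static void h264_idct8_2d(int32_t block[64])
{
    assert(block != NULL);
    int32_t in[8], out[8];

    for (int r = 0; r < 8; r++) {
        for (int i = 0; i < 8; i++) in[i] = block[r * 8 + i];
        h264_idct8_1d(in, out);
        for (int i = 0; i < 8; i++) block[r * 8 + i] = out[i];
    }
    for (int c = 0; c < 8; c++) {
        for (int i = 0; i < 8; i++) in[i] = block[i * 8 + c];
        h264_idct8_1d(in, out);
        for (int i = 0; i < 8; i++) block[i * 8 + c] = (out[i] + 32) >> 6;
    }
}
```

A DC-only block (coefficient 64 at index 0, zeros elsewhere) comes out as a flat residual of 1 in all 64 positions, which is a quick sanity check for any implementation of this shape.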

Phase 4 deferred on the cycle 6 lightweight-kernel rationale:
NEON per-block time is far below the QPU dispatch floor, so
offload cannot help.

Phase 9 lesson updated: H.264 transforms (both 4x4 and 8x8) are
NEON-trivial. Skip ALL H.264 transform cycles for QPU. Target
compute-heavy H.264 kernels only (deblock = cycle 8 next; MC
likely RED).

Cycle 7 is the 2nd consecutive "predicted GREEN, measured CPU-only"
result. This forces a sharper view of which kernels the QPU can
actually help with: deblock and possibly some VP9 cases.

- tests/h264_idct8_ref.c (column-major C ref)
- tests/bench_neon_h264idct8.c (M1 + M3 bench)
- CMakeLists.txt: cycle 7 bench wiring
- docs/k7_h264idct8_phase3_and_4.md (closure)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 14:16:42 +00:00

..

Issue 003 closed: mixed-kernel M4 validates V4 deployment shape

2026-05-18 13:44:08 +00:00

dev_process.md

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00

k2_deblock_phase1.md

Cycle 2 (deblocking) Phase 1-3: M3'' = 48.285 Medge/s baseline

2026-05-18 12:28:57 +00:00

k2_deblock_phase2.md

Cycle 2 (deblocking) Phase 1-3: M3'' = 48.285 Medge/s baseline

2026-05-18 12:28:57 +00:00

k2_deblock_phase3.md

Cycle 2 (deblocking) Phase 1-3: M3'' = 48.285 Medge/s baseline

2026-05-18 12:28:57 +00:00

k2_deblock_phase4.md

Cycle 2 (LPF) closure: M1''=100%, R''=0.41, M4''=+6.9%, PASS

2026-05-18 12:39:26 +00:00

k2_deblock_phase5.md

Cycle 2 (LPF) closure: M1''=100%, R''=0.41, M4''=+6.9%, PASS

2026-05-18 12:39:26 +00:00

k2_deblock_phase7.md

Cycle 2 (LPF) closure: M1''=100%, R''=0.41, M4''=+6.9%, PASS

2026-05-18 12:39:26 +00:00

k3_mc_phase1.md

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

k3_mc_phase2.md

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

k3_mc_phase3.md

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

k3_mc_phase4.md

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

k3_mc_phase5.md

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

k3_mc_phase7.md

Issue 003 closed: mixed-kernel M4 validates V4 deployment shape

2026-05-18 13:44:08 +00:00

k4_lpf8_phase1_3.md

Cycle 4 (LPF wd=8) closure: M1=100%, R=0.34, M4=+4.1%, PASS

2026-05-18 12:56:25 +00:00

k4_lpf8_phase4_7.md

Cycle 4 (LPF wd=8) closure: M1=100%, R=0.34, M4=+4.1%, PASS

2026-05-18 12:56:25 +00:00

k5_cdef_phase1_2.md

Cycle 5 setup (Phase 1+2): vendor dav1d 1.4.3 CDEF sources

2026-05-18 13:12:25 +00:00

k5_cdef_phase3_partial.md

Issue 003 closed: mixed-kernel M4 validates V4 deployment shape

2026-05-18 13:44:08 +00:00

k5_cdef_phase3.md

Cycle 5 Phase 3 closed: M1 PASS via bench pointer-convention fix

2026-05-18 13:46:50 +00:00

k5_cdef_phase4.md

Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper

2026-05-18 13:52:46 +00:00

k5_cdef_phase7.md

Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper

2026-05-18 13:52:46 +00:00

k6_h264idct4_phase1.md

Cycle 6 (H.264) opened — IDCT 4x4 Phase 1+3, M3 = 175 Mblock/s

2026-05-18 14:14:43 +00:00

k6_h264idct4_phase3.md

Cycle 6 (H.264) opened — IDCT 4x4 Phase 1+3, M3 = 175 Mblock/s

2026-05-18 14:14:43 +00:00

k6_h264idct4_phase4.md

Cycle 6 closed (deferred Phase 4): IDCT 4x4 too small for QPU

2026-05-18 14:15:25 +00:00

k7_h264idct8_phase1.md

Cycle 7 (H.264 IDCT 8x8) opened — Phase 1 goal doc

2026-05-18 14:15:37 +00:00

k7_h264idct8_phase3_and_4.md

Cycle 7 closed: H.264 IDCT 8x8 = 151 Mblock/s NEON, Phase 4 deferred

2026-05-18 14:16:42 +00:00

phase0.md

Phase 4 plan + Phase 5 second-model review (PASS-WITH-REVISIONS)

2026-05-18 11:47:03 +00:00

phase1.md

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00

phase2.md

Phase 4 plan + Phase 5 second-model review (PASS-WITH-REVISIONS)

2026-05-18 11:47:03 +00:00

phase3.md

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00

phase4.md

Phase 4 plan + Phase 5 second-model review (PASS-WITH-REVISIONS)

2026-05-18 11:47:03 +00:00

phase5.md

Phase 4 plan + Phase 5 second-model review (PASS-WITH-REVISIONS)

2026-05-18 11:47:03 +00:00

phase7_M4.md

Phase 7 M4: mixed CPU+QPU beats pure 4-core NEON; project continues

2026-05-18 12:18:36 +00:00

phase7.md

Phase 6 (v1+v4 production) + Phase 7 closure: R = 0.92 ± 0.03 on hertz

2026-05-18 12:09:00 +00:00

phase8_scoping.md

Phase 8 skeleton: public C API + first end-to-end smoke test

2026-05-18 13:54:43 +00:00

vulkaninfo_v3d_7_1_7_hertz.txt

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00