Commit Graph

1 Commits

Author SHA1 Message Date
marfrit 7288473d79 Cycle 6 closed (deferred Phase 4): IDCT 4x4 too small for QPU
Phase 4 QPU shader DEFERRED (not RED-by-build, but predicted-RED
and not worth building):
- NEON delivers 175 Mblock/s (5.7 ns/block) on a single core
- QPU per-block floor ~250 ns (from cycle 1 scaling) → R6 = 0.022
- Mixed-kernel helper contribution would be ~1-2 Mblock/s, about 1%
  of NEON capacity
- 30fps@1080p worst case = 5.85 Mblock/s; NEON delivers 30x that
  on ONE core. No need for QPU help.
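
The arithmetic behind these bullets can be re-derived with a short sketch.
The two measured inputs (175 Mblock/s and ~250 ns) are taken from this
message. Everything else is an assumption for illustration: a 1920x1080
frame, 4:2:0 chroma (1.5x the luma sample count), every 4x4 block coded,
and R6 read as NEON time per block divided by QPU time per block. Under
those assumptions the worst-case load comes out near 5.83 Mblock/s, close
to the 5.85 quoted above; the small gap presumably reflects a different
frame-size convention.

```python
# Back-of-envelope check of the cycle 6 figures (illustrative only).

NEON_MBLOCKS_PER_S = 175.0  # measured, from this commit message
QPU_NS_PER_BLOCK = 250.0    # estimated floor, from this commit message


def ns_per_block(mblocks_per_s: float) -> float:
    """Convert a throughput in Mblock/s to a cost in ns per block."""
    return 1e3 / mblocks_per_s  # 1e9 ns/s divided by (M * 1e6 blocks/s)


def worst_case_blocks_per_s(width: int, height: int, fps: int,
                            chroma_factor: float = 1.5,
                            block_size: int = 4) -> float:
    """4x4 blocks per second if every block in every frame is coded.

    chroma_factor=1.5 assumes 4:2:0 sampling (hypothetical default).
    """
    luma_blocks = (width * height) / (block_size * block_size)
    return luma_blocks * chroma_factor * fps


neon_ns = ns_per_block(NEON_MBLOCKS_PER_S)
# Assumed reading of R6: NEON cost over QPU cost (below 1 means QPU loses).
r6 = neon_ns / QPU_NS_PER_BLOCK
required = worst_case_blocks_per_s(1920, 1080, 30)
headroom = (NEON_MBLOCKS_PER_S * 1e6) / required

print(f"NEON: {neon_ns:.1f} ns/block")
print(f"R6 (assumed definition): {r6:.3f}")
print(f"1080p30 worst case: {required / 1e6:.2f} Mblock/s")
print(f"NEON headroom on one core: {headroom:.0f}x")
```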

Phase 9 lesson: for any cycle with NEON per-block < ~30ns, predict
deep RED and defer Phase 4 unless there's a specific structural
QPU advantage. Shapes future cycle selection: prefer compute-heavy
kernels (cycle 7 H.264 IDCT 8x8 next; cycle 9 luma qpel MC; cycle
10 deblock).
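
The Phase 9 rule of thumb can be written down as a small predicate. This
is a sketch only: the function name, the boolean flag, and the hard 30 ns
cutoff are hypothetical stand-ins for what the lesson states as an
approximate threshold plus a judgment call.

```python
# Hypothetical encoding of the Phase 9 lesson; not part of the project code.

NEON_NS_THRESHOLD = 30.0  # the "~30ns" figure from the lesson above


def should_defer_phase4(neon_ns_per_block: float,
                        structural_qpu_advantage: bool = False) -> bool:
    """Return True when Phase 4 (the QPU shader) should be deferred.

    Defer if NEON is already under the threshold, unless the kernel has
    a specific structural QPU advantage.
    """
    if structural_qpu_advantage:
        return False
    return neon_ns_per_block < NEON_NS_THRESHOLD


# Cycle 6 (IDCT 4x4) at 5.7 ns/block on NEON falls under the threshold.
print(should_defer_phase4(5.7))
```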

Cycle 6 phase tally: Phase 1 ✓, Phase 2 implicit, Phase 3 ✓
(M1 + M3), Phase 4 DEFERRED, Phase 5-7 N/A, Phase 8 trivial
CPU-only (recipe = stay CPU), Phase 9 ✓.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 14:15:25 +00:00