marfrit
|
f2ba08e1cf
|
Cycle 8 Phase 4: H.264 deblock QPU shader plan
Per-column dispatch (16 cols/edge, 16 edges/WG, 256 inv/WG).
Follows cycle 2 LPF wd=4 template with H.264-specific
adjustments: alpha/beta gating + tc0 per-4-col-segment + ap/aq
side conditions for conditional p1/q1 writes.
Predicted shaderdb: ~150-200 inst, 2-3 threads. Predicted R8 =
0.09-0.14 ORANGE (per Phase 3 closure).
7 Phase 5 review items flagged for Sonnet audit:
- tc0 sign-extension semantics
- Multiple early-return safety (no barrier follows — safe)
- abs() on int operands
- clamp vs clip3 equivalence
- per-segment tc0 LUT extraction tradeoff
- alpha=0/beta=0 outer precondition
- dst_off arithmetic
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-18 14:40:07 +00:00 |
|