f2ba08e1cf
Per-column dispatch (16 cols/edge, 16 edges/WG, 256 inv/WG). Follows cycle 2 LPF wd=4 template with H.264-specific adjustments: alpha/beta gating + tc0 per-4-col-segment + ap/aq side conditions for conditional p1/q1 writes. Predicted shaderdb: ~150-200 inst, 2-3 threads. Predicted R8 = 0.09-0.14 ORANGE (per Phase 3 closure). 7 Phase 5 review items flagged for Sonnet audit: - tc0 sign-extension semantics - Multiple early-return safety (no barrier follows — safe) - abs() on int operands - clamp vs clip3 equivalence - per-segment tc0 LUT extraction tradeoff - alpha=0/beta=0 outer precondition - dst_off arithmetic Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>