Targets the one H.264 kernel most likely to be QPU-worthy:
in-loop deblock. Cycles 6 and 7 (IDCT 4x4 and 8x8) both came in
as CPU-only because H.264 transforms are NEON-trivial. H.264
deblock has a structure analogous to VP9 LPF (cycles 2+4, both
GREEN), so the predicted R8 is ORANGE/YELLOW.

This commit:
- Vendors ff_h264_*_loop_filter_*_neon from h264dsp_neon.S
  (1076 lines; the file contains the v/h luma and chroma filters,
  their intra variants, and weight/biweight)
- Updates PROVENANCE.md with the new vendored file
- Captures the full plan in the Phase 1 doc: start with luma
  vertical non-intra (the most common case) and defer Phase 3+
  to the next session

The H.264 deblock C ref is scoped at ~2 hours (per-row branching,
per-4-row-segment tc0, ap/aq side conditions, alpha/beta
thresholds; much more complex than the single-branch filter of
VP9 LPF wd=4). Deferring it to the next session with fresh
attention rather than rushing it now.
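
To make that branching concrete, here is a minimal scalar sketch of
the bS < 4 (non-intra) luma path for 8-bit samples, written from the
H.264 spec's normal-filter equations. It is not the C ref this cycle
will produce, and every function name is hypothetical. It assumes the
caller has already derived alpha, beta and the per-segment tc0 from
the QP tables, and that a negative tc0 marks a bS == 0 segment to skip
(the same convention FFmpeg's C template uses).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Clamp v into [lo, hi] (the spec's Clip3). */
static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Hypothetical scalar reference: bS < 4 luma filter for ONE line of
 * 8-bit samples across an edge. pix points at q0; xstride steps across
 * the edge (p0 = pix[-xstride], q1 = pix[xstride]).
 */
static void h264_luma_line_normal(uint8_t *pix, ptrdiff_t xstride,
                                  int alpha, int beta, int tc0)
{
    const int p2 = pix[-3 * xstride], p1 = pix[-2 * xstride];
    const int p0 = pix[-xstride], q0 = pix[0];
    const int q1 = pix[xstride], q2 = pix[2 * xstride];

    assert(tc0 >= 0 && tc0 <= 25); /* 8-bit tC0 table range */

    /* alpha/beta gate: a large step across the edge is kept as real detail. */
    if (abs(p0 - q0) >= alpha || abs(p1 - p0) >= beta || abs(q1 - q0) >= beta)
        return;

    const int avg = (p0 + q0 + 1) >> 1;
    int tc = tc0;

    /* ap/aq side conditions: each enables a p1/q1 update and widens tc by 1. */
    if (abs(p2 - p0) < beta) {
        pix[-2 * xstride] = (uint8_t)(p1 + clip3(-tc0, tc0, ((p2 + avg) >> 1) - p1));
        tc++;
    }
    if (abs(q2 - q0) < beta) {
        pix[xstride] = (uint8_t)(q1 + clip3(-tc0, tc0, ((q2 + avg) >> 1) - q1));
        tc++;
    }

    /* >> on a negative int is arithmetic on GCC/Clang (implementation-defined in ISO C). */
    const int delta = clip3(-tc, tc, ((q0 - p0) * 4 + (p1 - q1) + 4) >> 3);
    pix[-xstride] = (uint8_t)clip3(0, 255, p0 + delta);
    pix[0] = (uint8_t)clip3(0, 255, q0 - delta);
}

/*
 * Hypothetical 16-line edge driver: four 4-line segments, each with its
 * own tc0. A negative tc0[seg] marks a bS == 0 segment, which is skipped.
 * ystride steps along the edge from one line to the next.
 */
static void h264_luma_edge16_normal(uint8_t *pix, ptrdiff_t xstride,
                                    ptrdiff_t ystride, int alpha, int beta,
                                    const int8_t tc0[4])
{
    for (int seg = 0; seg < 4; seg++) {
        if (tc0[seg] < 0)
            continue;
        for (int line = 0; line < 4; line++)
            h264_luma_line_normal(pix + (seg * 4 + line) * ystride, xstride,
                                  alpha, beta, tc0[seg]);
    }
}
```

For a vertical edge in a row-major block, xstride = 1 and ystride is
the row stride; for a horizontal edge (pixels filtered vertically)
the two strides swap.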

Once cycle 8 closes, the H.264 QPU surface will be well-characterised,
and the cycles 1-8 inventory will drive the substrate-routing recipe
for the Phase 8 V4L2 wrapper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>