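# FFmpeg source snapshot

Verbatim subset of FFmpeg source pinned for use as reference
implementations and NEON baselines. The snapshot was first vendored
for the VP9 8×8 inverse DCT (Phase 1 target of `daedalus-fourier`)
and has since grown to cover the VP9 loop-filter and
motion-compensation kernels and the H.264 IDCT and in-loop deblock
kernels. See `../../docs/phase2.md §2` and `§5` for the rationale.
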
## Upstream pin

- **Repository**: https://github.com/FFmpeg/FFmpeg
- **Tag**: `n7.1.3` (matches `libavcodec61 8:7.1.3-0+deb13u1+rpt1`
  shipping in Debian Trixie on the dev host `hertz`)
- **Annotated tag object**: `0a9a757e96fdf053697084bbd1f620edeac9d084`
- **Commit object (tag target)**: `f46e514491172d15bd74b4abb1814cd2f05a763e`
- **Snapshot fetched**: 2026-05-18 (UTC), via
  `https://raw.githubusercontent.com/FFmpeg/FFmpeg/n7.1.3/<path>`

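The pin tracks the Debian package on `hertz`, so the tag can be derived from the installed package version. A minimal sketch, assuming the standard Debian `[epoch:]upstream_version[-debian_revision]` layout and FFmpeg's `n<version>` tag naming; `deb_to_ffmpeg_tag` is a hypothetical helper, not part of this repo:

```sh
# Hypothetical helper: map a Debian libavcodec package version to the
# FFmpeg release tag it was presumably built from.
# Assumes [epoch:]upstream[-revision] and FFmpeg's "n<upstream>" tags.
deb_to_ffmpeg_tag() {
    v=${1#*:}    # drop the epoch (e.g. "8:") if present
    v=${v%-*}    # drop the Debian revision after the last hyphen
    printf 'n%s\n' "$v"
}
```

For the version above, `deb_to_ffmpeg_tag 8:7.1.3-0+deb13u1+rpt1` prints `n7.1.3`; on `hertz` the input could come from `dpkg-query -W -f='${Version}' libavcodec61`. Repackaged builds may not map this cleanly, so treat the result as a hint rather than proof.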
## Files in this snapshot

All files are byte-for-byte copies of the upstream source at the
tagged commit, with no modifications, except
`libavcodec/vp9_subpel_filters_table.c`, which is hand-extracted
(see its row below).

| Path | Lines | Bytes | SHA-256 | Notes |
|---|---|---|---|---|
| `libavcodec/vp9dsp_template.c` | 2578 | 89045 | `41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f` | |
| `libavcodec/aarch64/vp9itxfm_neon.S` | 1580 | 63534 | `82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6` | |
| `libavcodec/aarch64/vp9lpf_neon.S` | 1334 | not recorded | `384e49e7a6e838d9e38aedc00838ed4aebfa6c5bdb343ecaf23ef639bc10fbb7` | |
| `libavcodec/aarch64/vp9mc_neon.S` | 665 | not recorded | `6b1d50f9821742584fdd47758057f810644aff3a008faaa774ff5b9cac4d1fef` | |
| `libavcodec/aarch64/h264idct_neon.S` | 415 | 16269 | `963ffe5f31b5a6a422e13b0d394cf5630126927abfb23aa214f7cbe83d60683f` | H.264 IDCT 4×4/8×8/DC NEON kernels for cycle 6+ |
| `libavcodec/aarch64/h264dsp_neon.S` | 1076 | not recorded | `978e076f0020e688b40c6dd827708c3d53e17c64a99fd0052e43d983536ce638` | H.264 in-loop deblock and weight/biweight kernels for cycle 8+ |
| `libavcodec/vp9_subpel_filters_table.c` | not recorded | not recorded | not recorded | Hand-extracted from `libavcodec/vp9dsp.c` at the same n7.1.3 pin; provides `ff_vp9_subpel_filters` for `vp9mc_neon.S` to link against without dragging in `vp9dsp.c`'s full init machinery |
| `libavcodec/aarch64/neon.S` | 173 | 7496 | `72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538` | |
| `libavutil/aarch64/asm.S` | 260 | 8069 | `c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3` | |
| `COPYING.LGPLv2.1` | 502 | not recorded | `b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe` | |

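The re-vendoring procedure ends with updating this table by hand; the row values can instead be computed from the files on disk. A minimal sketch, assuming GNU coreutils (`wc`, `sha256sum`, `cut`); `provenance_row` is a hypothetical helper name:

```sh
# Hypothetical helper: print one table row
# ("| Path | Lines | Bytes | SHA-256 | Notes |") for FILE, Notes left empty.
# Run it from external/ffmpeg-snapshot so the path matches the table.
provenance_row() {
    f=$1
    # wc -l counts newline characters; tr drops any padding wc adds.
    lines=$(wc -l < "$f" | tr -d '[:space:]')
    bytes=$(wc -c < "$f" | tr -d '[:space:]')
    # sha256sum prints "<hash>  <file>"; keep only the hash.
    sha=$(sha256sum "$f" | cut -d' ' -f1)
    printf '| `%s` | %s | %s | `%s` | |\n' "$f" "$lines" "$bytes" "$sha"
}
```

For example, `for f in libavcodec/aarch64/*.S; do provenance_row "$f"; done` prints rows for every vendored NEON file, ready to paste in and annotate.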
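Verify with the following (`libavcodec/vp9_subpel_filters_table.c`
is hand-extracted and has no recorded hash, so it is not checked):

```sh
( cd external/ffmpeg-snapshot && sha256sum -c <<'EOF'
41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f  libavcodec/vp9dsp_template.c
82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6  libavcodec/aarch64/vp9itxfm_neon.S
384e49e7a6e838d9e38aedc00838ed4aebfa6c5bdb343ecaf23ef639bc10fbb7  libavcodec/aarch64/vp9lpf_neon.S
6b1d50f9821742584fdd47758057f810644aff3a008faaa774ff5b9cac4d1fef  libavcodec/aarch64/vp9mc_neon.S
963ffe5f31b5a6a422e13b0d394cf5630126927abfb23aa214f7cbe83d60683f  libavcodec/aarch64/h264idct_neon.S
978e076f0020e688b40c6dd827708c3d53e17c64a99fd0052e43d983536ce638  libavcodec/aarch64/h264dsp_neon.S
72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538  libavcodec/aarch64/neon.S
c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3  libavutil/aarch64/asm.S
b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe  COPYING.LGPLv2.1
EOF
)
```
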
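The hash list in the verify block is maintained separately from the table, so the two can drift apart. One way to avoid that is to derive the `sha256sum -c` input from the table itself. A sketch, assuming every hashed row keeps its backticked path in the first cell and its backticked hash in the fourth; `provenance_manifest` is a hypothetical helper:

```sh
# Hypothetical helper: turn the Markdown table in FILE into
# sha256sum -c input ("<hash>  <path>"), skipping rows with no hash.
provenance_manifest() {
    awk -F'|' '
        # Only data rows whose first cell is a backticked path.
        $2 ~ /^ *`/ {
            path = $2; gsub(/[` ]/, "", path)
            # The hash is the first backticked token in the SHA-256 cell.
            n = split($5, parts, "`")
            hash = (n >= 3) ? parts[2] : ""
            if (length(hash) == 64 && hash ~ /^[0-9a-f]+$/)
                printf "%s  %s\n", hash, path
        }' "$1"
}
```

Assuming this file lives at `external/ffmpeg-snapshot/PROVENANCE.md`, the check becomes `( cd external/ffmpeg-snapshot && provenance_manifest PROVENANCE.md | sha256sum -c - )`. Rows without a 64-hex-digit hash, such as the hand-extracted subpel table, are skipped rather than failing the check.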
## License

LGPL-2.1-or-later. See `COPYING.LGPLv2.1`. Original copyright
holders include the FFmpeg authors and Google Inc. (2016) for the
aarch64 VP9 NEON paths; each vendored file's header lists its own
copyright holders. The snapshot inherits FFmpeg's license in full.

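## Why each file is in this snapshot

- `libavcodec/vp9dsp_template.c`: contains `idct_idct_8x8_add_c`,
  the bit-exact C reference for the Phase 1 kernel under test (M1).
- `libavcodec/aarch64/vp9itxfm_neon.S`: contains
  `ff_vp9_idct_idct_8x8_add_neon`, the NEON throughput baseline
  (M3). Also defines `idct8`, `dmbutterfly0`, `dmbutterfly`,
  `dmbutterfly_l`, `butterfly_8h`, and the `idct_coeffs` constant
  table.
- `libavcodec/aarch64/vp9lpf_neon.S`: VP9 loop-filter NEON kernels
  (cycles 2 and 4).
- `libavcodec/aarch64/vp9mc_neon.S`: VP9 motion-compensation NEON
  kernels; links against `ff_vp9_subpel_filters`.
- `libavcodec/vp9_subpel_filters_table.c`: supplies
  `ff_vp9_subpel_filters` to `vp9mc_neon.S` without the rest of
  `vp9dsp.c`.
- `libavcodec/aarch64/h264idct_neon.S`: H.264 IDCT 4×4/8×8/DC NEON
  kernels (cycle 6+).
- `libavcodec/aarch64/h264dsp_neon.S`: H.264 in-loop deblock and
  weight/biweight NEON kernels (cycle 8+).
- `libavcodec/aarch64/neon.S`: defines `transpose_8x8H` used by
  `vp9itxfm_neon.S`.
- `libavutil/aarch64/asm.S`: defines `function`, `endfunc`,
  `movrel`, `const`, `endconst`, and other assembly preamble
  macros required to assemble the above NEON files.
- `COPYING.LGPLv2.1`: the license text the snapshot is distributed
  under.
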
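Several of the names listed above are assembler macros, so if anything outside the snapshot relies on them, a re-vendor that renames one would only surface at assembly time. A quick pre-check, assuming each macro is declared on a line of the form `.macro NAME ...` (GNU `as` syntax); `has_macro` and `check_macros` are hypothetical helpers:

```sh
# Hypothetical helper: succeed if FILE declares an assembler macro NAME,
# i.e. has a line like ".macro NAME" or ".macro NAME arg, ...".
has_macro() {
    grep -Eq "^[[:space:]]*\.macro[[:space:]]+$2([[:space:],]|\$)" "$1"
}

# Check the file:macro pairs listed above; run from external/ffmpeg-snapshot.
check_macros() {
    status=0
    for pair in \
        libavcodec/aarch64/vp9itxfm_neon.S:idct8 \
        libavcodec/aarch64/vp9itxfm_neon.S:dmbutterfly0 \
        libavcodec/aarch64/vp9itxfm_neon.S:dmbutterfly \
        libavcodec/aarch64/vp9itxfm_neon.S:dmbutterfly_l \
        libavcodec/aarch64/vp9itxfm_neon.S:butterfly_8h \
        libavcodec/aarch64/neon.S:transpose_8x8H \
        libavutil/aarch64/asm.S:function \
        libavutil/aarch64/asm.S:endfunc \
        libavutil/aarch64/asm.S:movrel \
        libavutil/aarch64/asm.S:const \
        libavutil/aarch64/asm.S:endconst
    do
        file=${pair%%:*}
        name=${pair#*:}
        if ! has_macro "$file" "$name"; then
            echo "missing .macro $name in $file" >&2
            status=1
        fi
    done
    return "$status"
}
```

The trailing `([[:space:],]|$)` keeps `dmbutterfly` from matching `dmbutterfly0` or `dmbutterfly_l`. `idct_coeffs` is deliberately left out, since it is a data table rather than a macro.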
## Re-vendoring procedure

If the upstream pin needs to change (e.g., `hertz` updates to a
newer libavcodec), re-fetch the upstream files at the new tag:

```sh
TAG=nX.Y.Z
BASE=https://raw.githubusercontent.com/FFmpeg/FFmpeg/$TAG
cd external/ffmpeg-snapshot
# Files copied verbatim from upstream (paths contain no spaces, so the
# unquoted $FILES below splits into one word per path).
FILES="libavcodec/vp9dsp_template.c
       libavcodec/aarch64/vp9itxfm_neon.S
       libavcodec/aarch64/vp9lpf_neon.S
       libavcodec/aarch64/vp9mc_neon.S
       libavcodec/aarch64/h264idct_neon.S
       libavcodec/aarch64/h264dsp_neon.S
       libavcodec/aarch64/neon.S
       libavutil/aarch64/asm.S
       COPYING.LGPLv2.1"
for f in $FILES; do
    curl -sSf --create-dirs -o "$f" "$BASE/$f"
done
sha256sum $FILES
# libavcodec/vp9_subpel_filters_table.c is hand-extracted from
# libavcodec/vp9dsp.c, not fetched: re-extract it by hand at $TAG.
# Then update this PROVENANCE.md with the new tag, commit hash, and hashes.
```

After re-vendoring, re-run the bit-exact gate (M1) and throughput
baseline (M3): both can shift across FFmpeg versions even when the
VP9 and H.264 specs don't change (e.g., NEON micro-optimizations).
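To see which vendored files actually changed between pins, one option is to save the output of the `sha256sum` step before re-vendoring and compare it with the output afterwards. A sketch, assuming both manifests are in `sha256sum` text-mode format (`<hash>  <path>`); `changed_files` is a hypothetical helper. It narrows down where to look but does not replace re-running both gates:

```sh
# Hypothetical helper: list paths whose SHA-256 differs between two
# sha256sum-format manifests, including paths present in only one.
changed_files() {
    awk '
        FILENAME == ARGV[1] { old[$2] = $1; next }   # pass 1: old manifest
        {
            seen[$2] = 1
            if (!($2 in old) || old[$2] != $1) print $2   # new or modified
        }
        END { for (p in old) if (!(p in seen)) print p }  # removed
    ' "$1" "$2"
}
```

For example, `changed_files before.sha256 after.sha256` printing only `libavcodec/aarch64/h264dsp_neon.S` would point at the deblock kernels as the likely source of any shift.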