H.264 scope added 2026-05-18 per user direction. Pi 5's VideoCore VII has no hardware H.264 decoder block (only HEVC), so a QPU-accelerated H.264 path fills the most impactful codec gap.

Cycle 6 = first H.264 kernel (4x4 IDCT + add, smallest H.264 transform, simplest first cycle).

Phase 1: goal doc + 1080p30 floor analysis (5.85 Mblock/s worst-case, 2.0 Mblock/s realistic since most MBs use 8x8 or P-skip).

Phase 3: NEON M3 baseline captured. ff_h264_idct_add_neon on hertz delivers 175 Mblock/s (5.7 ns per block) = 30x worst-case floor margin. H.264 IDCT 4x4 is dramatically lighter than VP9 IDCT 8x8 (21x faster per block).

Phase 3 closure also caught the key Phase 9 lesson: H.264/FFmpeg blocks are COLUMN-MAJOR (block[c*4 + r] = (row=r, col=c)). NEON ld1 with 4 registers interleaves loading, and the FFmpeg C ref indexing makes this convention explicit. Initial C ref assumed row-major, M1 was 5% bit-exact; after fix, M1 = 100%. Convention encoded for all subsequent H.264 cycles (cycle 7+).

- external/ffmpeg-snapshot/libavcodec/aarch64/h264idct_neon.S (vendored verbatim from FFmpeg n7.1.3, 415 lines)
- external/ffmpeg-snapshot/PROVENANCE.md: updated
- tests/h264_idct4_ref.c: column-major C ref
- tests/bench_neon_h264idct4.c: M1 + M3 bench
- CMakeLists.txt: cycle 6 NEON bench wiring
- docs/k6_h264idct4_phase1.md, phase3.md

Phase 4 next: QPU shader for cycle 6. Predicted R6 = 0.01 (deep RED: kernel too small relative to QPU dispatch overhead) but worth building for cycle-completeness + the opportunistic-helper hypothesis (cycle 6 may stay CPU per recipe).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FFmpeg source snapshot
Verbatim subset of FFmpeg source pinned for use as reference
implementations of the VP9 8×8 inverse DCT (Phase 1 target of
daedalus-fourier). See ../../docs/phase2.md §2 and §5 for
the rationale.
Upstream pin
- Repository: https://github.com/FFmpeg/FFmpeg
- Tag: `n7.1.3` (matches `libavcodec61 8:7.1.3-0+deb13u1+rpt1` shipping in Debian Trixie on the dev host `hertz`)
- Annotated tag object: `0a9a757e96fdf053697084bbd1f620edeac9d084`
- Commit object (tag target): `f46e514491172d15bd74b4abb1814cd2f05a763e`
- Snapshot fetched: 2026-05-18 (UTC), via `https://raw.githubusercontent.com/FFmpeg/FFmpeg/n7.1.3/<path>`
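
The relationship between the two hashes above can be checked mechanically. For an annotated tag, `git ls-remote --tags` prints the tag object on the `refs/tags/<tag>` line and the commit it points at on the peeled `refs/tags/<tag>^{}` line. The following is a minimal sketch, assuming a hypothetical helper named `check_pin` (not part of this repository) that reads that output on standard input:

```sh
# Hypothetical helper: reads `git ls-remote --tags` output on stdin and
# succeeds only if both the tag object and the peeled commit match the pin.
check_pin() {
    tag=$1; want_tag_obj=$2; want_commit=$3
    awk -v ref="refs/tags/$tag" -v t="$want_tag_obj" -v c="$want_commit" '
        $2 == ref           { got_t = $1 "" }   # annotated tag object
        $2 == (ref "^{}")   { got_c = $1 "" }   # commit the tag points at
        END { exit !(got_t == (t "") && got_c == (c "")) }
    '
}
```

For example, `git ls-remote --tags https://github.com/FFmpeg/FFmpeg | check_pin n7.1.3 0a9a757e96fdf053697084bbd1f620edeac9d084 f46e514491172d15bd74b4abb1814cd2f05a763e` should exit with status 0 as long as the upstream tag has not been moved.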
Files in this snapshot
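All files are byte-for-byte copies of the upstream source at the tagged commit, with no modifications. The one exception is `libavcodec/vp9_subpel_filters_table.c`, which is hand-extracted from upstream as described in its entry in the table below.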
| Path | Lines | Bytes | SHA-256 |
|---|---|---|---|
| `libavcodec/vp9dsp_template.c` | 2578 | 89045 | `41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f` |
| `libavcodec/aarch64/vp9itxfm_neon.S` | 1580 | 63534 | `82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6` |
| `libavcodec/aarch64/vp9lpf_neon.S` | 1334 | — | `384e49e7a6e838d9e38aedc00838ed4aebfa6c5bdb343ecaf23ef639bc10fbb7` |
| `libavcodec/aarch64/vp9mc_neon.S` | 665 | — | `6b1d50f9821742584fdd47758057f810644aff3a008faaa774ff5b9cac4d1fef` |
| `libavcodec/aarch64/h264idct_neon.S` | 415 | 16269 | `963ffe5f31b5a6a422e13b0d394cf5630126927abfb23aa214f7cbe83d60683f` (H.264 IDCT 4×4/8×8/DC NEON kernels for cycle 6+) |
| `libavcodec/vp9_subpel_filters_table.c` | — | — | hand-extracted from `libavcodec/vp9dsp.c` at the same `n7.1.3` pin; provides `ff_vp9_subpel_filters` for `vp9mc_neon.S` to link against without dragging in `vp9dsp.c`'s full init machinery |
| `libavcodec/aarch64/neon.S` | 173 | 7496 | `72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538` |
| `libavutil/aarch64/asm.S` | 260 | 8069 | `c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3` |
| `COPYING.LGPLv2.1` | 502 | — | `b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe` |
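
The Lines and Bytes columns can be re-derived locally as a cheap sanity check before hashing. The sketch below assumes the counts were produced with `wc -l` and `wc -c` (this file does not say how they were measured), and `check_size` is a hypothetical helper name:

```sh
# Hypothetical helper: succeed only if PATH has the expected number of
# newline-terminated lines and the expected size in bytes.
check_size() {
    path=$1; want_lines=$2; want_bytes=$3
    [ -f "$path" ] || return 1
    # tr strips the padding some wc implementations add around the count.
    got_lines=$(wc -l < "$path" | tr -d ' ')
    got_bytes=$(wc -c < "$path" | tr -d ' ')
    [ "$got_lines" -eq "$want_lines" ] && [ "$got_bytes" -eq "$want_bytes" ]
}
```

Run from `external/ffmpeg-snapshot`, a call such as `check_size libavcodec/aarch64/neon.S 173 7496` would be expected to succeed if the assumption about `wc` holds.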
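Verify with:

    ( cd external/ffmpeg-snapshot && sha256sum -c <<'EOF'
    41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f  libavcodec/vp9dsp_template.c
    82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6  libavcodec/aarch64/vp9itxfm_neon.S
    384e49e7a6e838d9e38aedc00838ed4aebfa6c5bdb343ecaf23ef639bc10fbb7  libavcodec/aarch64/vp9lpf_neon.S
    6b1d50f9821742584fdd47758057f810644aff3a008faaa774ff5b9cac4d1fef  libavcodec/aarch64/vp9mc_neon.S
    963ffe5f31b5a6a422e13b0d394cf5630126927abfb23aa214f7cbe83d60683f  libavcodec/aarch64/h264idct_neon.S
    72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538  libavcodec/aarch64/neon.S
    c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3  libavutil/aarch64/asm.S
    b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe  COPYING.LGPLv2.1
    EOF
    )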
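
`sha256sum -c` only checks the files named in its input, so a file that is present in the snapshot but missing from the list goes unverified without any warning. As a rough sketch, a hypothetical helper `unlisted_files` (not part of this repository) could print every file under a directory that a `sha256sum`-style list does not mention. It assumes paths contain no newlines:

```sh
# Hypothetical helper: print files under DIR that LIST (sha256sum format)
# does not cover. LIST is resolved relative to the caller's directory.
unlisted_files() {
    dir=$1; list=$2
    ( cd "$dir" && find . -type f ) | sed 's|^\./||' | sort |
    awk -v list="$list" '
        BEGIN {
            # Strip the hash and separator, keeping only the listed path.
            while ((getline line < list) > 0) {
                sub(/^[0-9a-f]+ +\*?/, "", line)
                seen[line] = 1
            }
        }
        !($0 in seen)   # print paths that never appeared in the list
    '
}
```

Given the checksum list saved to a file, `unlisted_files external/ffmpeg-snapshot checksums.txt` would be expected to report at least this `PROVENANCE.md` and the hand-extracted `libavcodec/vp9_subpel_filters_table.c`, neither of which has a recorded hash.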
License
LGPL-2.1-or-later. See COPYING.LGPLv2.1. Original copyright
holders include the FFmpeg authors and Google Inc. (2016) for
the aarch64 NEON paths. The snapshot inherits FFmpeg's license
in full.
Why each file is in this snapshot
- `libavcodec/vp9dsp_template.c`: contains `idct_idct_8x8_add_c`, the bit-exact C reference for the Phase 1 kernel under test (M1).
- `libavcodec/aarch64/vp9itxfm_neon.S`: contains `ff_vp9_idct_idct_8x8_add_neon`, the NEON throughput baseline (M3). Also defines `idct8`, `dmbutterfly0`, `dmbutterfly`, `dmbutterfly_l`, `butterfly_8h`, and the `idct_coeffs` constant table.
- `libavcodec/aarch64/neon.S`: defines `transpose_8x8H` used by `vp9itxfm_neon.S`.
- `libavutil/aarch64/asm.S`: defines `function`, `endfunc`, `movrel`, `const`, `endconst`, and other assembly preamble macros required to assemble the above NEON files.
Re-vendoring procedure
If the upstream pin needs to change (e.g., hertz updates to a newer libavcodec):
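    TAG=nX.Y.Z
    BASE=https://raw.githubusercontent.com/FFmpeg/FFmpeg/$TAG
    cd external/ffmpeg-snapshot
    # libavcodec/vp9_subpel_filters_table.c is hand-extracted from
    # libavcodec/vp9dsp.c and is not fetched here; regenerate it separately.
    for f in libavcodec/vp9dsp_template.c \
             libavcodec/aarch64/vp9itxfm_neon.S \
             libavcodec/aarch64/vp9lpf_neon.S \
             libavcodec/aarch64/vp9mc_neon.S \
             libavcodec/aarch64/h264idct_neon.S \
             libavcodec/aarch64/neon.S \
             libavutil/aarch64/asm.S \
             COPYING.LGPLv2.1; do
        curl -sSf -o "$f" "$BASE/$f"
    done
    sha256sum libavcodec/vp9dsp_template.c \
              libavcodec/aarch64/vp9itxfm_neon.S \
              libavcodec/aarch64/vp9lpf_neon.S \
              libavcodec/aarch64/vp9mc_neon.S \
              libavcodec/aarch64/h264idct_neon.S \
              libavcodec/aarch64/neon.S \
              libavutil/aarch64/asm.S \
              COPYING.LGPLv2.1
    # update this PROVENANCE.md with the new tag, commit hash, and hashes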
After re-vendoring, re-run the bit-exact gate (M1) and throughput baseline (M3) — both can shift across FFmpeg versions even when the VP9 spec doesn't change (e.g., NEON micro-optimizations).
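
To see which files actually changed across a re-vendor, and therefore which gates are most likely to move, the `sha256sum` output from before and after can be compared. This is a minimal sketch using a hypothetical helper `changed_files`. It assumes both listings are non-empty, in the plain `hash  path` format, and that paths contain no spaces:

```sh
# Hypothetical helper: given two sha256sum listings (OLD, then NEW), print
# every path in NEW whose hash differs from OLD or that OLD does not contain.
changed_files() {
    old_list=$1; new_list=$2
    awk '
        NR == FNR { old[$2] = $1 ""; next }   # first file: remember old hashes
        old[$2] != ($1 "") { print $2 }       # second file: report differences
    ' "$old_list" "$new_list"
}
```

One way to use it is to save the `sha256sum` output to a file before fetching and again afterwards, then run `changed_files before.txt after.txt`. An empty result would suggest the new tag left every vendored file untouched.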