dcbbc77038
This is a from-scratch initial commit on a fresh .git. The original
scaffold commit (7510b56) and the earlier session's working-tree
docs were lost in a 2026-05-18 10:25 working-tree wipe; the corrupted
.git is preserved at .git-broken-2026-05-18/ (gitignored) for
forensic inspection.

Scope re-anchored from Path A (custom VPU firmware on VC7 scalar
cores; blocked by BCM2712 silicon-RoT mask-ROM signature check)
to Path B (QPU compute kernels via Mesa v3d / Vulkan compute or
direct DRM, on stock signed Pi 5 / CM5). See README.md and
docs/phase0.md for the substrate audit that closed Path A.

Phases closed:

  Phase 0: substrate audit; Path A blocked, Path B open;
           codec-back-end-fits-QPU finding (docs/phase0.md)
  Phase 1: first kernel locked (VP9 / AV1 8x8 inverse DCT) with
           publish-before-measure R = M2/M3 decision rules
           (docs/phase1.md)
  Phase 2: reference impls mapped; FFmpeg n7.1.3 source vendored
           under external/ffmpeg-snapshot/ (PROVENANCE.md pins
           commit f46e514 + per-file SHA-256s) (docs/phase2.md)
  Phase 3: real baseline measurements on hertz (docs/phase3.md):
             M1  bit-exact               100.0000 % (10000/10000)
             M3  NEON IDCT8 single       8.171 Mblock/s (122.4 ns/block)
             M5a empty Vulkan submit     22.66 us
             M5b 1-WG noop dispatch      55.60 us
             M5  delta                   32.95 us/dispatch
           => per-dispatch overhead is ~455x per-NEON-block cost;
              Phase 4 must batch at frame level or close to it.

Build harness in place: CMakeLists.txt + tests/{bench_neon_idct.c,
vp9_idct8_ref.c, bench_vulkan_dispatch.c, shaders/noop.comp} +
external/ffmpeg-snapshot/config.h shim (7 defines + EXTERN_ASM).
Builds clean on Debian Trixie aarch64 with cmake 3.31, ninja 1.12,
libvulkan-dev 1.4.309, glslang-tools 15.1.0. Vendored FFmpeg .S
assembles via the config.h shim.

Next: Phase 4 (plan first QPU IDCT kernel under the M5 batching
constraint) -> Phase 5 second-model review -> Phase 6 implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
93 lines
3.7 KiB
Markdown
# FFmpeg source snapshot

Verbatim subset of FFmpeg source pinned for use as reference
implementations of the VP9 8×8 inverse DCT (Phase 1 target of
`daedalus-fourier`). See `../../docs/phase2.md §2` and `§5` for
the rationale.

## Upstream pin

- **Repository**: https://github.com/FFmpeg/FFmpeg
- **Tag**: `n7.1.3` (matches `libavcodec61 8:7.1.3-0+deb13u1+rpt1`
  shipping in Debian Trixie on the dev host `hertz`)
- **Annotated tag object**: `0a9a757e96fdf053697084bbd1f620edeac9d084`
- **Commit object (tag target)**: `f46e514491172d15bd74b4abb1814cd2f05a763e`
- **Snapshot fetched**: 2026-05-18 (UTC), via
  `https://raw.githubusercontent.com/FFmpeg/FFmpeg/n7.1.3/<path>`

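The pairing of annotated tag object and commit object can be re-checked against the remote. The following is a minimal sketch, not part of this repository: `peel_tag` is a hypothetical helper that reads `git ls-remote` output on stdin and relies on git advertising a peeled `refs/tags/<tag>^{}` entry for annotated tags.

```sh
# Hypothetical helper (an assumption, not shipped with this snapshot):
# read `git ls-remote` output ("<sha>\t<ref>" per line) on stdin and
# report the tag object and the commit it peels to for one tag name.
peel_tag() {
    tag=$1
    awk -v ref="refs/tags/$tag" '
        # The plain ref line carries the annotated tag object ID.
        $2 == ref            { print "tag-object " $1 }
        # The "^{}" line carries the object the tag ultimately points at.
        $2 == (ref "^{}")    { print "commit " $1 }
    '
}
```

One possible invocation is `git ls-remote --tags https://github.com/FFmpeg/FFmpeg n7.1.3 'n7.1.3^{}' | peel_tag n7.1.3`; if the pin is still accurate, the two IDs printed should match the list above. A lightweight tag would produce only the first line, whose ID would then already be the commit.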
## Files in this snapshot

All files are byte-for-byte copies of the upstream source at the
tagged commit, no modifications.

| Path | Lines | Bytes | SHA-256 |
|---|---|---|---|
| `libavcodec/vp9dsp_template.c` | 2578 | 89045 | `41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f` |
| `libavcodec/aarch64/vp9itxfm_neon.S` | 1580 | 63534 | `82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6` |
| `libavcodec/aarch64/neon.S` | 173 | 7496 | `72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538` |
| `libavutil/aarch64/asm.S` | 260 | 8069 | `c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3` |
| `COPYING.LGPLv2.1` | 502 | - | `b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe` |

Verify with:

```sh
( cd external/ffmpeg-snapshot && sha256sum -c <<'EOF'
41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f  libavcodec/vp9dsp_template.c
82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6  libavcodec/aarch64/vp9itxfm_neon.S
72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538  libavcodec/aarch64/neon.S
c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3  libavutil/aarch64/asm.S
b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe  COPYING.LGPLv2.1
EOF
)
```

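The hash check does not cover the Lines and Bytes columns of the table. As a rough sketch, they could be checked with `wc`; `check_size` is a hypothetical helper, and it assumes the Lines column counts newline characters the way `wc -l` does, which this document does not state.

```sh
# Hypothetical helper (an assumption, not shipped with this snapshot):
# compare a file's line and byte counts against recorded values.
# Prints "OK <file>" on success; returns 1 and reports on stderr otherwise.
check_size() {
    file=$1
    want_lines=$2
    want_bytes=$3
    # Arithmetic expansion strips any padding some wc builds emit.
    have_lines=$(( $(wc -l < "$file") ))
    have_bytes=$(( $(wc -c < "$file") ))
    if [ "$have_lines" -ne "$want_lines" ] || [ "$have_bytes" -ne "$want_bytes" ]; then
        echo "MISMATCH $file: $have_lines lines, $have_bytes bytes" >&2
        return 1
    fi
    echo "OK $file"
}
```

From inside `external/ffmpeg-snapshot`, a call such as `check_size libavcodec/aarch64/neon.S 173 7496` would then exercise one row of the table.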
## License

LGPL-2.1-or-later. See `COPYING.LGPLv2.1`. Original copyright
holders include the FFmpeg authors and Google Inc. (2016) for
the aarch64 NEON paths. The snapshot inherits FFmpeg's license
in full.

## Why each file is in this snapshot

- `libavcodec/vp9dsp_template.c`: contains `idct_idct_8x8_add_c`,
  the bit-exact C reference for the Phase 1 kernel under test (M1).
- `libavcodec/aarch64/vp9itxfm_neon.S`: contains
  `ff_vp9_idct_idct_8x8_add_neon`, the NEON throughput baseline
  (M3). Also defines `idct8`, `dmbutterfly0`, `dmbutterfly`,
  `dmbutterfly_l`, `butterfly_8h`, and the `idct_coeffs` constant
  table.
- `libavcodec/aarch64/neon.S`: defines `transpose_8x8H` used by
  `vp9itxfm_neon.S`.
- `libavutil/aarch64/asm.S`: defines `function`, `endfunc`,
  `movrel`, `const`, `endconst`, and other assembly preamble
  macros required to assemble the above NEON files.

## Re-vendoring procedure

If the upstream pin needs to change (e.g., hertz updates to a
newer libavcodec):

```sh
TAG=nX.Y.Z
BASE=https://raw.githubusercontent.com/FFmpeg/FFmpeg/$TAG
cd external/ffmpeg-snapshot
for f in libavcodec/vp9dsp_template.c \
         libavcodec/aarch64/vp9itxfm_neon.S \
         libavcodec/aarch64/neon.S \
         libavutil/aarch64/asm.S \
         COPYING.LGPLv2.1; do
    curl -sSf -o "$f" "$BASE/$f"
done
sha256sum libavcodec/vp9dsp_template.c \
          libavcodec/aarch64/vp9itxfm_neon.S \
          libavcodec/aarch64/neon.S \
          libavutil/aarch64/asm.S \
          COPYING.LGPLv2.1
# update this PROVENANCE.md with the new tag, commit hash, and hashes
```

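Updating the file table by hand after a re-vendor is easy to get wrong. One way to generate the rows is sketched below; `provenance_row` is a hypothetical helper, and the row layout simply mirrors the Path / Lines / Bytes / SHA-256 table in this file.

```sh
# Hypothetical helper (an assumption, not shipped with this snapshot):
# print one Markdown table row for a vendored file, in the column order
# Path | Lines | Bytes | SHA-256.
provenance_row() {
    path=$1
    lines=$(( $(wc -l < "$path") ))    # newline count, as reported by wc -l
    bytes=$(( $(wc -c < "$path") ))
    # sha256sum prints "<hash>  <path>"; keep only the hash field.
    hash=$(sha256sum "$path" | cut -d ' ' -f 1)
    printf '| `%s` | %s | %s | `%s` |\n' "$path" "$lines" "$bytes" "$hash"
}
```

Run from `external/ffmpeg-snapshot` once per vendored path (for example `provenance_row libavcodec/aarch64/neon.S`), the output can be pasted over the corresponding table row.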
After re-vendoring, re-run the bit-exact gate (M1) and throughput
baseline (M3); both can shift across FFmpeg versions even when
the VP9 spec doesn't change (e.g., NEON micro-optimizations).
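When interpreting a shifted measurement, it may help to know which vendored files actually changed between pins. The sketch below is one possible approach, assuming the `sha256sum` output from before and after the re-vendor was saved to two files; `changed_paths` and those file names are hypothetical.

```sh
# Hypothetical helper (an assumption, not shipped with this snapshot):
# given two saved `sha256sum` listings (old pin, new pin), print every
# path whose hash differs or that appears in only one listing.
# Assumes text-mode "<hash>  <path>" lines and paths without spaces.
changed_paths() {
    awk '
        # First file: remember the old hash for each path.
        NR == FNR { old[$2] = $1; next }
        # Second file: report new or modified paths.
        { seen[$2] = 1; if (old[$2] != $1) print $2 }
        # Finally, report paths that disappeared in the new listing.
        END { for (p in old) if (!(p in seen)) print p }
    ' "$1" "$2"
}
```

For instance, `changed_paths old.sha256 new.sha256` printing only `libavcodec/aarch64/vp9itxfm_neon.S` would suggest that any M3 shift comes from the NEON source rather than from the C reference.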