Path B pivot + Phase 0-3 closed with first baseline numbers

This is a from-scratch initial commit on a fresh .git. The original scaffold
commit (7510b56) and the earlier session's working-tree docs were lost in a
2026-05-18 10:25 working-tree wipe; the corrupted .git is preserved at
.git-broken-2026-05-18/ (gitignored) for forensic inspection.

Scope re-anchored from Path A (custom VPU firmware on VC7 scalar cores;
blocked by the BCM2712 silicon-RoT mask-ROM signature check) to Path B (QPU
compute kernels via Mesa v3d / Vulkan compute or direct DRM, on stock signed
Pi 5 / CM5). See README.md and docs/phase0.md for the substrate audit that
closed Path A.

Phases closed:
  Phase 0: substrate audit; Path A blocked, Path B open;
           codec-back-end-fits-QPU finding (docs/phase0.md)
  Phase 1: first kernel locked (VP9 / AV1 8x8 inverse DCT) with
           publish-before-measure R = M2/M3 decision rules (docs/phase1.md)
  Phase 2: reference impls mapped; FFmpeg n7.1.3 source vendored under
           external/ffmpeg-snapshot/ (PROVENANCE.md pins commit f46e514 +
           per-file SHA-256s) (docs/phase2.md)
  Phase 3: real baseline measurements on hertz (docs/phase3.md):
           M1  bit-exact             100.0000 % (10000/10000)
           M3  NEON IDCT8 single     8.171 Mblock/s (122.4 ns/block)
           M5a empty Vulkan submit   22.66 us
           M5b 1-WG noop dispatch    55.60 us
           M5  delta (M5b - M5a)     32.95 us/dispatch
           => one 1-WG dispatch (M5b, 55.60 us) costs ~455x one NEON block
              (122.4 ns); the M5 delta alone is ~269x. Phase 4 must batch
              at frame level or close to it.

Build harness in place: CMakeLists.txt + tests/{bench_neon_idct.c,
vp9_idct8_ref.c, bench_vulkan_dispatch.c, shaders/noop.comp} +
external/ffmpeg-snapshot/config.h shim (7 defines + EXTERN_ASM). Builds clean
on Debian Trixie aarch64 with cmake 3.31, ninja 1.12, libvulkan-dev 1.4.309,
glslang-tools 15.1.0. Vendored FFmpeg .S assembles via the config.h shim.

Next: Phase 4 (plan first QPU IDCT kernel under the M5 batching constraint)
-> Phase 5 second-model review -> Phase 6 implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 11:30:12 +00:00
commit dcbbc77038
22 changed files with 9030 additions and 0 deletions
@@ -0,0 +1,92 @@
+# FFmpeg source snapshot
+
+Verbatim subset of FFmpeg source pinned for use as reference
+implementations of the VP9 8×8 inverse DCT (Phase 1 target of
+`daedalus-fourier`). See §2 and §5 of `../../docs/phase2.md` for
+the rationale.
+
+## Upstream pin
+
+- **Repository**: https://github.com/FFmpeg/FFmpeg
+- **Tag**: `n7.1.3` (matches `libavcodec61 8:7.1.3-0+deb13u1+rpt1`
+  shipping in Debian Trixie on the dev host `hertz`)
+- **Annotated tag object**: `0a9a757e96fdf053697084bbd1f620edeac9d084`
+- **Commit object (tag target)**: `f46e514491172d15bd74b4abb1814cd2f05a763e`
+- **Snapshot fetched**: 2026-05-18 (UTC), via
+  `https://raw.githubusercontent.com/FFmpeg/FFmpeg/n7.1.3/<path>`
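The two object IDs above can be cross-checked against upstream without cloning. This is a minimal sketch, assuming `git` is installed and GitHub is reachable; `resolve_ref` and `check_pin` are hypothetical helper names, not part of this repository. For an annotated tag, `git ls-remote` lists the tag object under `refs/tags/<tag>` and the commit it points to under the peeled ref `refs/tags/<tag>^{}`.

```sh
REPO=https://github.com/FFmpeg/FFmpeg
TAG=n7.1.3
EXPECTED_TAG_OBJECT=0a9a757e96fdf053697084bbd1f620edeac9d084
EXPECTED_COMMIT=f46e514491172d15bd74b4abb1814cd2f05a763e

# Hypothetical helper: print the object ID upstream advertises for one ref.
# $1 = full ref name, e.g. refs/tags/n7.1.3 or refs/tags/n7.1.3^{}
resolve_ref() {
  git ls-remote "$REPO" "refs/tags/$TAG*" |
    awk -v ref="$1" '$2 == ref { print $1 }'
}

# Hypothetical helper: succeed only if both pinned IDs match upstream.
check_pin() {
  [ "$(resolve_ref "refs/tags/$TAG")" = "$EXPECTED_TAG_OBJECT" ] &&
  [ "$(resolve_ref "refs/tags/$TAG^{}")" = "$EXPECTED_COMMIT" ]
}
```

Usage: `check_pin && echo "pin matches upstream"`. The block only defines variables and functions, so nothing touches the network until `check_pin` is called.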
+
+## Files in this snapshot
+
+All files are byte-for-byte copies of the upstream source at the
+tagged commit, with no modifications.
+
+| Path | Lines | Bytes | SHA-256 |
+|---|---|---|---|
+| `libavcodec/vp9dsp_template.c` | 2578 | 89045 | `41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f` |
+| `libavcodec/aarch64/vp9itxfm_neon.S` | 1580 | 63534 | `82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6` |
+| `libavcodec/aarch64/neon.S` | 173 | 7496 | `72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538` |
+| `libavutil/aarch64/asm.S` | 260 | 8069 | `c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3` |
+| `COPYING.LGPLv2.1` | 502 | — | `b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe` |
+
+Verify with:
+
+```sh
+( cd external/ffmpeg-snapshot && sha256sum -c <<'EOF'
+41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f  libavcodec/vp9dsp_template.c
+82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6  libavcodec/aarch64/vp9itxfm_neon.S
+72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538  libavcodec/aarch64/neon.S
+c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3  libavutil/aarch64/asm.S
+b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe  COPYING.LGPLv2.1
+EOF
+)
+```
+
+## License
+
+LGPL-2.1-or-later. See `COPYING.LGPLv2.1`. Original copyright
+holders include the FFmpeg authors and Google Inc. (2016) for
+the aarch64 NEON paths. The snapshot inherits FFmpeg's license
+in full.
+
+## Why each file is in this snapshot
+
+- `libavcodec/vp9dsp_template.c` — contains `idct_idct_8x8_add_c`,
+  the bit-exact C reference for the Phase 1 kernel under test (M1).
+- `libavcodec/aarch64/vp9itxfm_neon.S` — contains
+  `ff_vp9_idct_idct_8x8_add_neon`, the NEON throughput baseline
+  (M3). Also defines `idct8`, `dmbutterfly0`, `dmbutterfly`,
+  `dmbutterfly_l`, `butterfly_8h`, and the `idct_coeffs` constant
+  table.
+- `libavcodec/aarch64/neon.S` — defines `transpose_8x8H` used by
+  `vp9itxfm_neon.S`.
+- `libavutil/aarch64/asm.S` — defines `function`, `endfunc`,
+  `movrel`, `const`, `endconst`, and other assembly preamble
+  macros required to assemble the above NEON files.
+
+## Re-vendoring procedure
+
+If the upstream pin needs to change (e.g., hertz updates to a
+newer libavcodec):
+
+```sh
+TAG=nX.Y.Z
+BASE=https://raw.githubusercontent.com/FFmpeg/FFmpeg/$TAG
+(
+  # Abort on the first failed download so stale files are never hashed.
+  set -eu
+  cd external/ffmpeg-snapshot
+  for f in libavcodec/vp9dsp_template.c \
+           libavcodec/aarch64/vp9itxfm_neon.S \
+           libavcodec/aarch64/neon.S \
+           libavutil/aarch64/asm.S \
+           COPYING.LGPLv2.1; do
+    curl -sSf -o "$f" "$BASE/$f"
+  done
+  sha256sum libavcodec/vp9dsp_template.c \
+            libavcodec/aarch64/vp9itxfm_neon.S \
+            libavcodec/aarch64/neon.S \
+            libavutil/aarch64/asm.S \
+            COPYING.LGPLv2.1
+)
+# update this PROVENANCE.md with the new tag, commit hash, and hashes
+```
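Updating the file table by hand after a re-vendor is error-prone, since each row carries a line count, a byte count, and a hash. The sketch below regenerates one row per file in the table's existing column order. It assumes GNU coreutils (`wc`, `sha256sum`); `provenance_row` is a hypothetical helper name, not something this repository ships.

```sh
# Hypothetical helper: print one Markdown table row for a vendored file.
# $1 = path relative to the snapshot root
provenance_row() {
  f=$1
  lines=$(wc -l < "$f" | tr -d ' ')            # newline count, as wc reports it
  bytes=$(wc -c < "$f" | tr -d ' ')            # size in bytes
  sha=$(sha256sum "$f" | cut -d' ' -f1)        # hex digest only
  printf '| `%s` | %s | %s | `%s` |\n' "$f" "$lines" "$bytes" "$sha"
}
```

Run it from inside `external/ffmpeg-snapshot/`, for example `provenance_row libavutil/aarch64/asm.S`, and paste the output over the corresponding table row.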
+
+After re-vendoring, re-run the bit-exact gate (M1) and the throughput
+baseline (M3). Both can shift across FFmpeg versions even when
+the VP9 spec doesn't change (e.g., through NEON micro-optimizations).