Path B pivot + Phase 0-3 closed with first baseline numbers

This is a from-scratch initial commit on a fresh .git. The original
scaffold commit (7510b56) and the earlier session's working-tree
docs were lost in a 2026-05-18 10:25 working-tree wipe; the corrupted
.git is preserved at .git-broken-2026-05-18/ (gitignored) for
forensic inspection.

Scope re-anchored from Path A (custom VPU firmware on VC7 scalar
cores; blocked by BCM2712 silicon-RoT mask-ROM signature check)
to Path B (QPU compute kernels via Mesa v3d / Vulkan compute or
direct DRM, on stock signed Pi 5 / CM5). See README.md and
docs/phase0.md for the substrate audit that closed Path A.

Phases closed:
  Phase 0 — substrate audit; Path A blocked, Path B open;
            codec-back-end-fits-QPU finding (docs/phase0.md)
  Phase 1 — first kernel locked (VP9 / AV1 8x8 inverse DCT) with
            publish-before-measure R = M2/M3 decision rules
            (docs/phase1.md)
  Phase 2 — reference impls mapped; FFmpeg n7.1.3 source vendored
            under external/ffmpeg-snapshot/ (PROVENANCE.md pins
            commit f46e514 + per-file SHA-256s) (docs/phase2.md)
  Phase 3 — real baseline measurements on hertz (docs/phase3.md):
              M1 bit-exact            100.0000 % (10000/10000)
              M3 NEON IDCT8 single    8.171 Mblock/s (122.4 ns/block)
              M5a empty Vulkan submit 22.66 us
              M5b 1-WG noop dispatch  55.60 us
              M5 delta                32.95 us/dispatch
            => per-dispatch overhead is ~455x per-NEON-block cost;
               Phase 4 must batch at frame level or close to it.

Build harness in place: CMakeLists.txt + tests/{bench_neon_idct.c,
vp9_idct8_ref.c, bench_vulkan_dispatch.c, shaders/noop.comp} +
external/ffmpeg-snapshot/config.h shim (7 defines + EXTERN_ASM).
Builds clean on Debian Trixie aarch64 with cmake 3.31, ninja 1.12,
libvulkan-dev 1.4.309, glslang-tools 15.1.0. Vendored FFmpeg .S
assembles via the config.h shim.

Next: Phase 4 (plan first QPU IDCT kernel under the M5 batching
constraint) -> Phase 5 second-model review -> Phase 6 implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-18 11:30:12 +00:00
commit dcbbc77038
22 changed files with 9030 additions and 0 deletions
+92
View File
@@ -0,0 +1,92 @@
# FFmpeg source snapshot
Verbatim subset of FFmpeg source pinned for use as reference
implementations of the VP9 8×8 inverse DCT (Phase 1 target of
`daedalus-fourier`). See `../../docs/phase2.md §2` and `§5` for
the rationale.
## Upstream pin
- **Repository**: https://github.com/FFmpeg/FFmpeg
- **Tag**: `n7.1.3` (matches `libavcodec61 8:7.1.3-0+deb13u1+rpt1`
shipping in Debian Trixie on the dev host `hertz`)
- **Annotated tag object**: `0a9a757e96fdf053697084bbd1f620edeac9d084`
- **Commit object (tag target)**: `f46e514491172d15bd74b4abb1814cd2f05a763e`
- **Snapshot fetched**: 2026-05-18 (UTC), via
`https://raw.githubusercontent.com/FFmpeg/FFmpeg/n7.1.3/<path>`
## Files in this snapshot
All files are byte-for-byte copies of the upstream source at the
tagged commit, no modifications.
| Path | Lines | Bytes | SHA-256 |
|---|---|---|---|
| `libavcodec/vp9dsp_template.c` | 2578 | 89045 | `41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f` |
| `libavcodec/aarch64/vp9itxfm_neon.S` | 1580 | 63534 | `82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6` |
| `libavcodec/aarch64/neon.S` | 173 | 7496 | `72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538` |
| `libavutil/aarch64/asm.S` | 260 | 8069 | `c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3` |
| `COPYING.LGPLv2.1` | 502 | — | `b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe` |
Verify with:
```sh
( cd external/ffmpeg-snapshot && sha256sum -c <<'EOF'
41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f libavcodec/vp9dsp_template.c
82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6 libavcodec/aarch64/vp9itxfm_neon.S
72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538 libavcodec/aarch64/neon.S
c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3 libavutil/aarch64/asm.S
b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe COPYING.LGPLv2.1
EOF
)
```
## License
LGPL-2.1-or-later. See `COPYING.LGPLv2.1`. Original copyright
holders include the FFmpeg authors and Google Inc. (2016) for
the aarch64 NEON paths. The snapshot inherits FFmpeg's license
in full.
## Why each file is in this snapshot
- `libavcodec/vp9dsp_template.c` — contains `idct_idct_8x8_add_c`,
the bit-exact C reference for the Phase 1 kernel under test (M1).
- `libavcodec/aarch64/vp9itxfm_neon.S` — contains
`ff_vp9_idct_idct_8x8_add_neon`, the NEON throughput baseline
(M3). Also defines `idct8`, `dmbutterfly0`, `dmbutterfly`,
`dmbutterfly_l`, `butterfly_8h`, and the `idct_coeffs` constant
table.
- `libavcodec/aarch64/neon.S` — defines `transpose_8x8H` used by
`vp9itxfm_neon.S`.
- `libavutil/aarch64/asm.S` — defines `function`, `endfunc`,
`movrel`, `const`, `endconst`, and other assembly preamble
macros required to assemble the above NEON files.
## Re-vendoring procedure
If the upstream pin needs to change (e.g., hertz updates to a
newer libavcodec):
```sh
TAG=nX.Y.Z
BASE=https://raw.githubusercontent.com/FFmpeg/FFmpeg/$TAG
cd external/ffmpeg-snapshot
for f in libavcodec/vp9dsp_template.c \
libavcodec/aarch64/vp9itxfm_neon.S \
libavcodec/aarch64/neon.S \
libavutil/aarch64/asm.S \
COPYING.LGPLv2.1; do
curl -sSf -o "$f" "$BASE/$f"
done
sha256sum libavcodec/vp9dsp_template.c \
libavcodec/aarch64/vp9itxfm_neon.S \
libavcodec/aarch64/neon.S \
libavutil/aarch64/asm.S \
COPYING.LGPLv2.1
# update this PROVENANCE.md with the new tag, commit hash, and hashes
```
After re-vendoring, re-run the bit-exact gate (M1) and throughput
baseline (M3) — both can shift across FFmpeg versions even when
the VP9 spec doesn't change (e.g., NEON micro-optimizations).