Path B pivot + Phase 0-3 closed with first baseline numbers

This is a from-scratch initial commit on a fresh .git. The original
scaffold commit (7510b56) and the earlier session's working-tree
docs were lost in a 2026-05-18 10:25 working-tree wipe; the corrupted
.git is preserved at .git-broken-2026-05-18/ (gitignored) for
forensic inspection.

Scope re-anchored from Path A (custom VPU firmware on VC7 scalar
cores; blocked by BCM2712 silicon-RoT mask-ROM signature check)
to Path B (QPU compute kernels via Mesa v3d / Vulkan compute or
direct DRM, on stock signed Pi 5 / CM5). See README.md and
docs/phase0.md for the substrate audit that closed Path A.

Phases closed:
  Phase 0: substrate audit; Path A blocked, Path B open;
           codec-back-end-fits-QPU finding (docs/phase0.md)
  Phase 1: first kernel locked (VP9 / AV1 8x8 inverse DCT) with
           publish-before-measure R = M2/M3 decision rules
           (docs/phase1.md)
  Phase 2: reference impls mapped; FFmpeg n7.1.3 source vendored
           under external/ffmpeg-snapshot/ (PROVENANCE.md pins
           commit f46e514 + per-file SHA-256s) (docs/phase2.md)
  Phase 3: real baseline measurements on hertz (docs/phase3.md):
             M1  bit-exact              100.0000 % (10000/10000)
             M3  NEON IDCT8 single      8.171 Mblock/s (122.4 ns/block)
             M5a empty Vulkan submit    22.66 us
             M5b 1-WG noop dispatch     55.60 us
             M5  delta                  32.95 us/dispatch
           => one 1-WG noop dispatch (M5b, 55.60 us) costs ~455x
              one NEON block (M3, 122.4 ns); Phase 4 must batch at
              frame level or close to it.

Build harness in place: CMakeLists.txt + tests/{bench_neon_idct.c,
vp9_idct8_ref.c, bench_vulkan_dispatch.c, shaders/noop.comp} +
external/ffmpeg-snapshot/config.h shim (7 defines + EXTERN_ASM).
Builds clean on Debian Trixie aarch64 with cmake 3.31, ninja 1.12,
libvulkan-dev 1.4.309, glslang-tools 15.1.0. Vendored FFmpeg .S
assembles via the config.h shim.

Next: Phase 4 (plan first QPU IDCT kernel under the M5 batching
constraint) -> Phase 5 second-model review -> Phase 6 implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,92 @@
# FFmpeg source snapshot

Verbatim subset of FFmpeg source pinned for use as reference
implementations of the VP9 8×8 inverse DCT (Phase 1 target of
`daedalus-fourier`). See `../../docs/phase2.md §2` and `§5` for
the rationale.

## Upstream pin

- **Repository**: https://github.com/FFmpeg/FFmpeg
- **Tag**: `n7.1.3` (matches `libavcodec61 8:7.1.3-0+deb13u1+rpt1`
  shipping in Debian Trixie on the dev host `hertz`)
- **Annotated tag object**: `0a9a757e96fdf053697084bbd1f620edeac9d084`
- **Commit object (tag target)**: `f46e514491172d15bd74b4abb1814cd2f05a763e`
- **Snapshot fetched**: 2026-05-18 (UTC), via
  `https://raw.githubusercontent.com/FFmpeg/FFmpeg/n7.1.3/<path>`

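The two hashes in the pin list can be cross-checked against the remote. The following is a minimal sketch, assuming `git` and `awk` are installed and that `git ls-remote` lists an annotated tag as two lines: the tag object, then a peeled entry whose ref name ends in `^{}` and whose hash is the tag's target commit. `peeled_commit` is a hypothetical helper name, not something defined in this repository.

```sh
# Hypothetical helper: read `git ls-remote --tags` output on stdin and
# print the hash from the peeled entry (the ref name ending in "^{}"),
# which is the commit an annotated tag points at.
peeled_commit() {
    awk 'length($2) >= 3 && substr($2, length($2) - 2) == "^{}" { print $1 }'
}

# Usage (needs network access):
#   git ls-remote --tags https://github.com/FFmpeg/FFmpeg 'refs/tags/n7.1.3*' \
#       | peeled_commit
```

If the pin is still accurate, the printed hash should equal the commit object listed above, and the unpeeled line should carry the annotated tag object hash.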
## Files in this snapshot

All files are byte-for-byte copies of the upstream source at the
tagged commit, no modifications.

| Path | Lines | Bytes | SHA-256 |
|---|---|---|---|
| `libavcodec/vp9dsp_template.c` | 2578 | 89045 | `41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f` |
| `libavcodec/aarch64/vp9itxfm_neon.S` | 1580 | 63534 | `82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6` |
| `libavcodec/aarch64/neon.S` | 173 | 7496 | `72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538` |
| `libavutil/aarch64/asm.S` | 260 | 8069 | `c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3` |
| `COPYING.LGPLv2.1` | 502 | n/a | `b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe` |

Verify with:

```sh
( cd external/ffmpeg-snapshot && sha256sum -c <<'EOF'
41b21f667a6c497b620aa1637d8269badc45d1ac7e621d694441c5bf39356e4f  libavcodec/vp9dsp_template.c
82ee3ceed4735c63576bafdcee28e2215652743ade55a9eab46a16d9530369f6  libavcodec/aarch64/vp9itxfm_neon.S
72d36ce6c3fcc5e53de869cfe10fda16225ebe580c32891bccc240a30a85a538  libavcodec/aarch64/neon.S
c0d03143b1bc5a9e358222d08d2d449d595271844fe7a3dc23bffb91abe8b0e3  libavutil/aarch64/asm.S
b634ab5640e258563c536e658cad87080553df6f34f62269a21d554844e58bfe  COPYING.LGPLv2.1
EOF
)
```

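The hash check does not look at the Lines and Bytes columns of the table. A minimal sketch of a size check follows. It assumes the Lines column was produced the way `wc -l` counts (number of newline characters), which this file does not state, and `check_size` is a hypothetical helper name.

```sh
# Hypothetical helper: succeed only if FILE has exactly WANT_LINES lines
# (as counted by `wc -l`) and WANT_BYTES bytes (as counted by `wc -c`).
check_size() {
    file=$1
    want_lines=$2
    want_bytes=$3
    # tr strips the leading padding some wc implementations emit
    got_lines=$(wc -l < "$file" | tr -d ' ')
    got_bytes=$(wc -c < "$file" | tr -d ' ')
    if [ "$got_lines" != "$want_lines" ] || [ "$got_bytes" != "$want_bytes" ]; then
        echo "$file: got $got_lines lines / $got_bytes bytes," \
             "expected $want_lines / $want_bytes" >&2
        return 1
    fi
}

# Usage, run from external/ffmpeg-snapshot with values from the table:
#   check_size libavcodec/vp9dsp_template.c 2578 89045
#   check_size libavcodec/aarch64/neon.S 173 7496
```

A matching SHA-256 already implies matching sizes, so this is mainly useful for catching a stale table entry after re-vendoring.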
## License

LGPL-2.1-or-later. See `COPYING.LGPLv2.1`. Original copyright
holders include the FFmpeg authors and Google Inc. (2016) for
the aarch64 NEON paths. The snapshot inherits FFmpeg's license
in full.

## Why each file is in this snapshot

- `libavcodec/vp9dsp_template.c`: contains `idct_idct_8x8_add_c`,
  the bit-exact C reference for the Phase 1 kernel under test (M1).
- `libavcodec/aarch64/vp9itxfm_neon.S`: contains
  `ff_vp9_idct_idct_8x8_add_neon`, the NEON throughput baseline
  (M3). Also defines `idct8`, `dmbutterfly0`, `dmbutterfly`,
  `dmbutterfly_l`, `butterfly_8h`, and the `idct_coeffs` constant
  table.
- `libavcodec/aarch64/neon.S`: defines `transpose_8x8H` used by
  `vp9itxfm_neon.S`.
- `libavutil/aarch64/asm.S`: defines `function`, `endfunc`,
  `movrel`, `const`, `endconst`, and other assembly preamble
  macros required to assemble the above NEON files.

## Re-vendoring procedure

If the upstream pin needs to change (e.g., hertz updates to a
newer libavcodec):

```sh
TAG=nX.Y.Z
BASE=https://raw.githubusercontent.com/FFmpeg/FFmpeg/$TAG
cd external/ffmpeg-snapshot
for f in libavcodec/vp9dsp_template.c \
         libavcodec/aarch64/vp9itxfm_neon.S \
         libavcodec/aarch64/neon.S \
         libavutil/aarch64/asm.S \
         COPYING.LGPLv2.1; do
    curl -sSf -o "$f" "$BASE/$f"
done
sha256sum libavcodec/vp9dsp_template.c \
          libavcodec/aarch64/vp9itxfm_neon.S \
          libavcodec/aarch64/neon.S \
          libavutil/aarch64/asm.S \
          COPYING.LGPLv2.1
# update this PROVENANCE.md with the new tag, commit hash, and hashes
```

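Updating the table by hand after re-vendoring is easy to get wrong. As a sketch only, the helper below prints one table row per file in the Path, Lines, Bytes, SHA-256 column order used in this file. `provenance_row` is a hypothetical name, and the line count again assumes `wc -l` semantics.

```sh
# Hypothetical helper: print one Markdown table row for FILE with its
# path, line count (wc -l), byte count (wc -c), and SHA-256 digest.
provenance_row() {
    f=$1
    lines=$(wc -l < "$f" | tr -d ' ')
    bytes=$(wc -c < "$f" | tr -d ' ')
    # sha256sum prints "<digest>  <path>"; keep only the digest
    sum=$(sha256sum "$f" | cut -d ' ' -f 1)
    printf '| `%s` | %s | %s | `%s` |\n' "$f" "$lines" "$bytes" "$sum"
}

# Usage, run from external/ffmpeg-snapshot:
#   for f in libavcodec/vp9dsp_template.c libavcodec/aarch64/neon.S; do
#       provenance_row "$f"
#   done
```

The printed rows can be pasted over the existing ones; the tag, tag object, and commit hashes still have to be updated separately.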
After re-vendoring, re-run the bit-exact gate (M1) and throughput
baseline (M3). Both can shift across FFmpeg versions even when
the VP9 spec doesn't change (e.g., NEON micro-optimizations).