2cd2258a7b
First AV1 kernel cycle and first dav1d-vendored sources. Phase 1+2
docs lay out the structural complexity (CDEF needs pre-padded 12x12
working buffer + external edge context + direction lookup +
constraint function — meaningfully more complex than cycles 1-4).
Phase 3+ deferred to next session — CDEF is the first cycle that
doesn't fit cleanly into a single autonomous run.
Vendored from dav1d 1.4.3 (BSD-2-Clause, cleaner license than
FFmpeg's LGPL-2.1+):

  src/arm/64/cdef.S          520 lines       NEON impl
  src/arm/64/util.S          278 lines       NEON helpers
  src/arm/asm.S              335 lines       GAS preamble
  src/cdef_tmpl.c            331 lines       C reference (templated)
  include/common/intops.h     84 lines       utility helpers
  src/tables_cdef_subset.c   hand-extracted  dav1d_cdef_directions only
                                             (avoids dragging full
                                             1013-line tables.c +
                                             transitive includes)

Discovery from Phase 2 analysis:
  - Filter type and shape: dav1d_cdef_filter8_pri_sec_8bpc_neon takes
    (dst, dst_stride, tmp, pri_strength, sec_strength, dir, damping, h).
    The 'tmp' arg is the pre-padded 12x12 buffer constructed externally
    by the dav1d C-side padding() function.
  - Tap weights are inline-computed (not table): pri_tap = 4 or 3
    (based on pri_strength bit), sec_tap = 2 or 1. Only
    dav1d_cdef_directions[12][2] is an external table.
  - Constraint function: constrain(diff, threshold, shift) =
    apply_sign(min(abs(diff), max(0, threshold - (abs(diff) >> shift))),
    diff)

Predicted R5 band: 0.15-0.30 (ORANGE). CDEF is compute-heavier than
LPF (per-pixel min/max conditional logic), so likely worse R than
cycle 2/4 but better than cycle 3 MC. M4 gate likely required.
What Phase 3+ needs (next session):
  1. config.h shim for dav1d's asm preamble (defines TBD on first build)
  2. Standalone C reference for cdef_filter_block_8x8_c
     (cdef_tmpl.c references several dav1d private headers; cleaner to
     transcribe to a self-contained tests/cdef_ref.c)
  3. tests/bench_neon_cdef.c: M1+M3 bench
  4. Phase 4 plan, Phase 5 review (mandatory), Phase 6 shader, Phase 7 measure

PROVENANCE.md documents pin + per-file role + re-vendoring procedure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
110 lines
4.2 KiB
Markdown
# dav1d source snapshot

Verbatim subset of dav1d source pinned for use as reference
implementations of AV1 CDEF (cycle 5 of `daedalus-fourier`) and
potentially future AV1 kernels. dav1d is the canonical AV1 decoder
library (BSD-2-Clause, maintained by VideoLAN).

See `../../docs/k5_cdef_phase1_2.md` for the cycle 5 scope and
rationale.

## Upstream pin

- **Repository**: https://github.com/videolan/dav1d (canonical mirror
  of https://code.videolan.org/videolan/dav1d)
- **Tag**: `1.4.3` (last stable release in the 1.4.x line as of
  2026-05-18; pinned for reproducibility)
- **Snapshot fetched**: 2026-05-18 (UTC), via
  `https://raw.githubusercontent.com/videolan/dav1d/1.4.3/<path>`

## Files in this snapshot

All files are byte-for-byte copies of the upstream source at the
tagged commit, except `tables_cdef_subset.c` which is a hand-extracted
single-table copy from `src/tables.c` (see §"Why each file" below).

| Path | Lines | SHA-256 |
|---|---|---|
| `src/arm/64/cdef.S` | 520 | `88d048cbed93f168...` (TODO full hash) |
| `src/arm/64/util.S` | 278 | `582acd8e2b74a1e8...` |
| `src/arm/asm.S` | 335 | `6a22def2799876c4...` |
| `src/cdef_tmpl.c` | 331 | `26a7a5f9fda65c58...` |
| `include/common/intops.h` | 84 | `c1e7d52b421d6417...` |
| `src/tables_cdef_subset.c` | hand-extracted | n/a |

Full SHA-256s (regenerated by `phase 3` setup):

```sh
( cd external/dav1d-snapshot && sha256sum \
    src/arm/64/cdef.S src/arm/64/util.S src/arm/asm.S \
    src/cdef_tmpl.c include/common/intops.h )
```

## License

BSD-2-Clause. Copyright (c) 2018 VideoLAN and dav1d authors; (c) 2019
Martin Storsjö (NEON aarch64). Original copyright headers are preserved
in each vendored file.

BSD-2-Clause is more permissive than the LGPL-2.1+ that covers the
FFmpeg snapshot: it allows distribution of binaries without the LGPL's
relinking requirements. For daedalus-fourier benches that link only
this snapshot, the binary inherits BSD-2-Clause. Benches that
combine both snapshots (none currently) inherit LGPL-2.1+ via
FFmpeg's stronger terms.

## Why each file

- **`src/arm/64/cdef.S`**: the NEON aarch64 implementation. Provides
  `dav1d_cdef_filter8_pri_sec_8bpc_neon` and pri-only / sec-only
  variants. The Phase 3 NEON baseline (M3₅) measures this symbol.

- **`src/arm/64/util.S`**: helper macros (`load_px_8`,
  `handle_pixel_8`, etc.) referenced by cdef.S.

- **`src/arm/asm.S`**: top-level GAS preamble (function/endfunc,
  movrel, register macros). dav1d's own version is similar to FFmpeg's
  but with different defines (PRIVATE_PREFIX dav1d_ etc.); Phase 3
  setup will identify the config.h shim needed for standalone
  assembly.

- **`src/cdef_tmpl.c`**: the C reference (templated; the
  `cdef_filter_block_c` core function is in here, expanded to
  `cdef_filter_block_8x8_c` via `cdef_fn(8, 8)`).

- **`include/common/intops.h`**: utility helpers (apply_sign,
  imin, imax, iclip, umin) used by cdef_tmpl.c.

- **`src/tables_cdef_subset.c`**: hand-extracted `dav1d_cdef_directions`
  table from `src/tables.c` (lines 400-414). Provides the only
  table symbol both `cdef.S` and `cdef_tmpl.c` reference externally.
  Pulling in the full `src/tables.c` (1013 lines) would chain-include
  the entire dav1d decoder, which is overkill for our purposes.
  See `tables_cdef_subset.c` header comment for line-range
  reference back to upstream.

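To make the role of the `intops.h` helpers concrete, here is a minimal sketch of the constraint function quoted in the commit message (`constrain(diff, threshold, shift)`). The helper bodies are local stand-ins written from their usual meaning, not copies of the vendored header, and `pri_tap_8bpc` is a hypothetical name that assumes the "pri_strength bit" is bit 0 for 8 bpc.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the intops.h helpers, written from their usual meaning
 * (not copied from the vendored header). */
static inline int imin(const int a, const int b) { return a < b ? a : b; }
static inline int imax(const int a, const int b) { return a > b ? a : b; }
/* Give the non-negative value v the sign of s. */
static inline int apply_sign(const int v, const int s) { return s < 0 ? -v : v; }

/* constrain(diff, threshold, shift) as written in the commit message:
 * small differences pass through, large ones are attenuated towards 0. */
static inline int constrain(const int diff, const int threshold, const int shift) {
    assert(threshold >= 0 && shift >= 0);
    const int adiff = abs(diff);
    return apply_sign(imin(adiff, imax(0, threshold - (adiff >> shift))), diff);
}

/* ASSUMPTION: for 8 bpc the selecting bit is bit 0 of pri_strength,
 * giving a primary tap of 4 (even strength) or 3 (odd strength).
 * The function name is hypothetical. */
static inline int pri_tap_8bpc(const int pri_strength) {
    return 4 - (pri_strength & 1);
}
```

With `threshold = 4` and `shift = 1`, a difference of 5 is limited to 2, a difference of -3 passes through unchanged, and a difference of 10 is suppressed to 0.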
## Re-vendoring procedure

Same as for the FFmpeg snapshot; see `../ffmpeg-snapshot/PROVENANCE.md`.

```sh
# Placeholder: set TAG to the upstream release being vendored.
TAG=1.x.y
BASE=https://raw.githubusercontent.com/videolan/dav1d/$TAG
cd external/dav1d-snapshot
for f in src/arm/64/cdef.S src/arm/64/util.S src/arm/asm.S \
         src/cdef_tmpl.c include/common/intops.h; do
    # -f fails on HTTP errors; --create-dirs makes missing parent directories
    curl -sSf --create-dirs -o "$f" "$BASE/$f"
done
# tables_cdef_subset.c needs manual re-extraction from
# upstream src/tables.c: search for "dav1d_cdef_directions"
```

## Pending work (Phase 3+, next session)

- config.h shim for assembling cdef.S standalone (dav1d's defines
  differ from FFmpeg's; will identify exact list on first build)
- Standalone C reference for `cdef_filter_block_8x8_c` (this snapshot's
  `cdef_tmpl.c` references several private headers, so it is easier to
  transcribe to a self-contained `tests/cdef_ref.c`)
- `tests/bench_neon_cdef.c` to capture M3₅ baseline
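
As a possible starting point for the standalone C reference, the pre-padded 12x12 working buffer mentioned in the commit message could be built roughly as follows. This is a sketch under stated assumptions, not dav1d's `padding()` function: the name `pad_block_8x8`, the four `have_*` flags, and reading border pixels straight from the same plane are all simplifications introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BLK = 8, PAD = 2, TMP_STRIDE = BLK + 2 * PAD }; /* 12x12 buffer */

/* Hypothetical helper: copy an 8x8 block plus a 2-pixel border into tmp.
 * src points at the top-left pixel of the block inside a larger plane.
 * Border pixels on an unavailable side are replaced by `fill`, so the
 * filter can recognise and skip them.
 * ASSUMPTION: available border pixels are addressable relative to src. */
static void pad_block_8x8(int16_t tmp[TMP_STRIDE * TMP_STRIDE],
                          const uint8_t *src, const ptrdiff_t src_stride,
                          const int have_left, const int have_right,
                          const int have_top, const int have_bottom,
                          const int16_t fill)
{
    assert(tmp != NULL && src != NULL);
    for (int y = -PAD; y < BLK + PAD; y++) {
        const int row_ok = (y >= 0 || have_top) && (y < BLK || have_bottom);
        for (int x = -PAD; x < BLK + PAD; x++) {
            const int col_ok = (x >= 0 || have_left) && (x < BLK || have_right);
            tmp[(y + PAD) * TMP_STRIDE + (x + PAD)] =
                (row_ok && col_ok) ? (int16_t)src[y * src_stride + x] : fill;
        }
    }
}
```

The filter would then index the buffer from `tmp + PAD * TMP_STRIDE + PAD`, so that offsets of up to two pixels in any direction stay inside the 12x12 area.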