2cd2258a7b
First AV1 kernel cycle and first dav1d-vendored sources. Phase 1+2
docs lay out the structural complexity (CDEF needs pre-padded 12x12
working buffer + external edge context + direction lookup +
constraint function — meaningfully more complex than cycles 1-4).
Phase 3+ deferred to next session — CDEF is the first cycle that
doesn't fit cleanly into a single autonomous run.
Vendored from dav1d 1.4.3 (BSD-2-Clause, more permissive than
FFmpeg's LGPL-2.1+):
  src/arm/64/cdef.S          520 lines       NEON impl
  src/arm/64/util.S          278 lines       NEON helpers
  src/arm/asm.S              335 lines       GAS preamble
  src/cdef_tmpl.c            331 lines       C reference (templated)
  include/common/intops.h     84 lines       utility helpers
  src/tables_cdef_subset.c   hand-extracted  dav1d_cdef_directions only
                                             (avoids dragging in the full
                                             1013-line tables.c + transitive
                                             includes)
Discovery from Phase 2 analysis:
- Filter type and shape: dav1d_cdef_filter8_pri_sec_8bpc_neon takes
  (dst, dst_stride, tmp, pri_strength, sec_strength, dir, damping, h).
  The 'tmp' arg is the pre-padded 12x12 buffer constructed externally
  by the dav1d C-side padding() function.
- Tap weights are computed inline (not read from a table): pri_tap = 4
  or 3 (based on pri_strength bit), sec_tap = 2 or 1. Only
  dav1d_cdef_directions[12][2] is an external table.
- Constraint function: constrain(diff, threshold, shift) =
    apply_sign(min(abs(diff), max(0, threshold - (abs(diff) >> shift))),
               diff)
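The constraint function and inline tap weights from the list above can be sketched in standalone C. This is a minimal sketch, not the vendored code: the helper names apply_sign, imin, imax, pri_tap_for and sec_tap_for are illustrative, and choosing the primary tap from the low bit of pri_strength is an assumption that only holds for 8 bpc.

```c
#include <stdlib.h>

/* Illustrative helpers; names are assumptions, not the vendored API. */
static inline int imin(int a, int b) { return a < b ? a : b; }
static inline int imax(int a, int b) { return a > b ? a : b; }

/* Returns v carrying the sign of s (v is expected to be >= 0). */
static inline int apply_sign(int v, int s) { return s < 0 ? -v : v; }

/* constrain(): clamps |diff| against a threshold that shrinks as
 * |diff| grows, then restores the sign of diff. Large differences
 * (likely real edges) therefore contribute nothing to the filter. */
static inline int constrain(int diff, int threshold, int shift) {
    const int adiff = abs(diff);
    return apply_sign(imin(adiff, imax(0, threshold - (adiff >> shift))),
                      diff);
}

/* Assumption: 8 bpc, so the selecting bit is the low bit of
 * pri_strength. Yields 4 for even strengths, 3 for odd ones. */
static inline int pri_tap_for(int pri_strength) {
    return 4 - (pri_strength & 1);
}

/* Secondary tap weight for tap distance index k (0 or 1): 2 then 1. */
static inline int sec_tap_for(int k) {
    return 2 - k;
}
```

With threshold 4 and shift 1, a difference of 2 passes through unchanged, a difference of 6 is clamped to 1, and a difference of 10 is suppressed to 0.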
Predicted R5 band: 0.15-0.30 (ORANGE). CDEF is compute-heavier than
LPF (per-pixel min/max conditional logic), so likely worse R than
cycle 2/4 but better than cycle 3 MC. M4 gate likely required.
What Phase 3+ needs (next session):
1. config.h shim for dav1d's asm preamble (defines TBD on first build)
2. Standalone C reference for cdef_filter_block_8x8_c
(cdef_tmpl.c references several dav1d private headers; cleaner to
transcribe to a self-contained tests/cdef_ref.c)
3. tests/bench_neon_cdef.c — M1+M3 bench
4. Phase 4 plan, Phase 5 review (mandatory), Phase 6 shader, Phase 7 measure
PROVENANCE.md documents pin + per-file role + re-vendoring procedure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
33 lines
1.3 KiB
C
/*
 * dav1d_cdef_directions: verbatim transcription of the CDEF
 * directions table from dav1d/src/tables.c (1.4.3, lines 400-414).
 * Provided as a standalone .c so the vendored cdef.S has the
 * symbol to link against without pulling in dav1d's full tables.c
 * (which is 1013 lines and chain-references the entire decoder).
 *
 * Used by both the C reference (cdef_tmpl.c) and the NEON
 * implementation (cdef.S).
 *
 * The table has 12 entries (2 + 8 + 2) because direction indexing
 * wraps modulo 8 with ±2 lookahead for secondary taps; the leading
 * and trailing 2 entries are the wrap-around prefixes/suffixes.
 *
 * License: BSD-2-Clause (matches dav1d upstream).
 */
#include <stdint.h>

const int8_t dav1d_cdef_directions[2 + 8 + 2][2] = {
    {  1 * 12 + 0,  2 * 12 + 0 }, // 6 (wrap prefix)
    {  1 * 12 + 0,  2 * 12 - 1 }, // 7 (wrap prefix)
    { -1 * 12 + 1, -2 * 12 + 2 }, // 0
    {  0 * 12 + 1, -1 * 12 + 2 }, // 1
    {  0 * 12 + 1,  0 * 12 + 2 }, // 2
    {  0 * 12 + 1,  1 * 12 + 2 }, // 3
    {  1 * 12 + 1,  2 * 12 + 2 }, // 4
    {  1 * 12 + 0,  2 * 12 + 1 }, // 5
    {  1 * 12 + 0,  2 * 12 + 0 }, // 6
    {  1 * 12 + 0,  2 * 12 - 1 }, // 7
    { -1 * 12 + 1, -2 * 12 + 2 }, // 0 (wrap suffix)
    {  0 * 12 + 1, -1 * 12 + 2 }, // 1 (wrap suffix)
};
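The wrap-around layout described in the header comment can be illustrated with a small lookup sketch. It assumes that each entry encodes an offset of the form dy * 12 + dx into the 12-wide padded buffer, and that the primary taps index the table at dir + 2 while the two secondary taps index at dir + 4 and dir + 0. The table copy cdef_dirs, the struct cdef_offsets and the helper cdef_lookup are hypothetical names introduced only so the sketch compiles on its own.

```c
#include <stdint.h>

/* Local copy of the directions table so this sketch is standalone. */
static const int8_t cdef_dirs[2 + 8 + 2][2] = {
    {  1 * 12 + 0,  2 * 12 + 0 }, /* 6 (wrap prefix) */
    {  1 * 12 + 0,  2 * 12 - 1 }, /* 7 (wrap prefix) */
    { -1 * 12 + 1, -2 * 12 + 2 }, /* 0 */
    {  0 * 12 + 1, -1 * 12 + 2 }, /* 1 */
    {  0 * 12 + 1,  0 * 12 + 2 }, /* 2 */
    {  0 * 12 + 1,  1 * 12 + 2 }, /* 3 */
    {  1 * 12 + 1,  2 * 12 + 2 }, /* 4 */
    {  1 * 12 + 0,  2 * 12 + 1 }, /* 5 */
    {  1 * 12 + 0,  2 * 12 + 0 }, /* 6 */
    {  1 * 12 + 0,  2 * 12 - 1 }, /* 7 */
    { -1 * 12 + 1, -2 * 12 + 2 }, /* 0 (wrap suffix) */
    {  0 * 12 + 1, -1 * 12 + 2 }, /* 1 (wrap suffix) */
};

/* Hypothetical container for the three offsets at one tap distance. */
typedef struct {
    int pri;    /* offset along direction dir */
    int sec_hi; /* offset along direction (dir + 2) mod 8 */
    int sec_lo; /* offset along direction (dir - 2) mod 8 */
} cdef_offsets;

/* dir is 0..7, k is the tap distance index (0 or 1). The +2 bias
 * skips the two prefix rows, so dir + 0 and dir + 4 never need an
 * explicit modulo: the prefix and suffix rows already hold the
 * wrapped directions. */
static cdef_offsets cdef_lookup(int dir, int k) {
    cdef_offsets o;
    o.pri    = cdef_dirs[dir + 2][k];
    o.sec_hi = cdef_dirs[dir + 4][k];
    o.sec_lo = cdef_dirs[dir + 0][k];
    return o;
}
```

The padding rows trade 8 extra bytes of table for branch-free indexing, which is why the prefix and suffix rows repeat directions 6, 7, 0 and 1.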