marfrit 2cd2258a7b Cycle 5 setup (Phase 1+2): vendor dav1d 1.4.3 CDEF sources
First AV1 kernel cycle and first dav1d-vendored sources. The Phase 1+2
docs lay out the structural complexity: CDEF needs a pre-padded 12x12
working buffer, external edge context, a direction lookup, and a
constraint function, which makes it meaningfully more complex than
cycles 1-4.

Phase 3+ deferred to next session — CDEF is the first cycle that
doesn't fit cleanly into a single autonomous run.

Vendored from dav1d 1.4.3 (BSD-2-Clause, more permissive than
FFmpeg's LGPL-2.1+):

  src/arm/64/cdef.S            520 lines — NEON impl
  src/arm/64/util.S            278 lines — NEON helpers
  src/arm/asm.S                335 lines — GAS preamble
  src/cdef_tmpl.c              331 lines — C reference (templated)
  include/common/intops.h       84 lines — utility helpers
  src/tables_cdef_subset.c      hand-extracted — dav1d_cdef_directions
                                only (avoids dragging full 1013-line
                                tables.c + transitive includes)

Findings from Phase 2 analysis:
- Filter signature: dav1d_cdef_filter8_pri_sec_8bpc_neon takes
  (dst, dst_stride, tmp, pri_strength, sec_strength, dir, damping, h).
  The 'tmp' arg is the pre-padded 12x12 buffer that dav1d's C-side
  padding() function builds before the call.
- Tap weights are computed inline rather than read from a table:
  pri_tap = 4 or 3 (selected by a pri_strength bit), sec_tap = 2 or 1.
  The only external table is dav1d_cdef_directions[12][2].
- Constraint function: constrain(diff, threshold, shift) =
  apply_sign(min(abs(diff), max(0, threshold - (abs(diff) >> shift))),
             diff)

Predicted R5 band: 0.15-0.30 (ORANGE). CDEF is more compute-heavy than
LPF (per-pixel min/max conditional logic), so R is likely worse than in
cycles 2 and 4 but better than in cycle 3 (MC). An M4 gate is likely
required.

What Phase 3+ needs (next session):
1. config.h shim for dav1d's asm preamble (required defines to be
   determined on the first build)
2. Standalone C reference for cdef_filter_block_8x8_c
   (cdef_tmpl.c references several dav1d private headers; cleaner to
   transcribe to a self-contained tests/cdef_ref.c)
3. tests/bench_neon_cdef.c — M1+M3 bench
4. Phase 4 plan, Phase 5 review (mandatory), Phase 6 shader, Phase 7 measure

PROVENANCE.md documents pin + per-file role + re-vendoring procedure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 13:12:25 +00:00

85 lines, 2.6 KiB, C

/*
* Copyright © 2018, VideoLAN and dav1d authors
* Copyright © 2018, Two Orioles, LLC
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
* ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef DAV1D_COMMON_INTOPS_H
#define DAV1D_COMMON_INTOPS_H
#include <stdint.h>
#include "common/attributes.h"
static inline int imax(const int a, const int b) {
    return a > b ? a : b;
}

static inline int imin(const int a, const int b) {
    return a < b ? a : b;
}

static inline unsigned umax(const unsigned a, const unsigned b) {
    return a > b ? a : b;
}

static inline unsigned umin(const unsigned a, const unsigned b) {
    return a < b ? a : b;
}

static inline int iclip(const int v, const int min, const int max) {
    return v < min ? min : v > max ? max : v;
}

static inline int iclip_u8(const int v) {
    return iclip(v, 0, 255);
}

static inline int apply_sign(const int v, const int s) {
    return s < 0 ? -v : v;
}

static inline int apply_sign64(const int v, const int64_t s) {
    return s < 0 ? -v : v;
}

static inline int ulog2(const unsigned v) {
    return 31 - clz(v);
}

static inline int u64log2(const uint64_t v) {
    return 63 - clzll(v);
}

static inline unsigned inv_recenter(const unsigned r, const unsigned v) {
    if (v > (r << 1))
        return v;
    else if ((v & 1) == 0)
        return (v >> 1) + r;
    else
        return r - ((v + 1) >> 1);
}

#endif /* DAV1D_COMMON_INTOPS_H */
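
A few worked values may help when transcribing these helpers into a standalone reference. The sketch below copies three of them and is not part of the vendored header. It assumes a GCC or Clang toolchain with a 32-bit `unsigned`, where `__builtin_clz` stands in for the `clz` macro that the header takes from common/attributes.h. `intops_selfcheck` is a hypothetical name.

```c
/* Standalone copies of selected intops.h helpers (sketch only). */
static inline int iclip(const int v, const int min, const int max) {
    return v < min ? min : v > max ? max : v;
}

/* Assumption: __builtin_clz replaces dav1d's clz(); v must be nonzero,
 * because the builtin's result is undefined for 0. */
static inline int ulog2(const unsigned v) {
    return 31 - __builtin_clz(v);
}

static inline unsigned inv_recenter(const unsigned r, const unsigned v) {
    if (v > (r << 1))
        return v;
    else if ((v & 1) == 0)
        return (v >> 1) + r;
    else
        return r - ((v + 1) >> 1);
}

/* Hypothetical self-check: returns the number of mismatches found. */
static int intops_selfcheck(void) {
    /* For r = 3, v = 0..6 zig-zags around r; v = 7 exceeds 2*r and is
     * returned unchanged. */
    static const unsigned expect[8] = { 3, 2, 4, 1, 5, 0, 6, 7 };
    int bad = 0;
    for (unsigned v = 0; v < 8; v++)
        bad += inv_recenter(3, v) != expect[v];
    bad += ulog2(1) != 0;
    bad += ulog2(8) != 3;
    bad += ulog2(255) != 7;
    bad += iclip(300, 0, 255) != 255;
    bad += iclip(-7, 0, 255) != 0;
    return bad;
}
```

Calling `intops_selfcheck()` from a test harness and checking for a return value of 0 gives a quick sanity check that a transcription of these helpers still matches the vendored behavior for these inputs.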