8bc6d27ea7

Adds the High-profile Intra_8x8 luma primitive set. Per H.264
§8.3.2.1, this differs from Intra_4x4 in two ways:

  1. REFERENCE SAMPLE PRE-FILTER (§8.3.2.1.1). The 25 raw neighbour
     samples (16 top, 8 left, 1 top-left; written t[], l[], tl below)
     are smoothed with a 1-2-1 filter BEFORE prediction. The spec
     defines the handling at the corner and at the two far ends:
       - top-left filt'd:  (t[0] + 2*tl + l[0] + 2) >> 2
       - top[0] filt'd:    (tl + 2*t[0] + t[1] + 2) >> 2
       - top[i], i=1..14:  (t[i-1] + 2*t[i] + t[i+1] + 2) >> 2
       - top[15] filt'd:   (t[14] + 3*t[15] + 2) >> 2   ← 3× boundary
       - left analogous, with l[7] using the 3× boundary.

  2. SCALE. All 9 prediction modes operate at 8x8 on the filtered
     samples (Intra_4x4 is 4x4 on raw samples).
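
As a rough illustration of the pre-filter rules listed above, the
sketch below applies them to the neighbours of one block. The nbr8x8
struct and the filter_nbr8x8 name are hypothetical (they are not taken
from tests/h264_intra_pred_8x8_luma_ref.c), and the sketch assumes all
25 neighbours are available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the 25 neighbours of one 8x8 luma block. */
typedef struct {
    uint8_t tl;       /* top-left corner sample                   */
    uint8_t top[16];  /* row above, including 8 top-right samples */
    uint8_t left[8];  /* column to the left                       */
} nbr8x8;

/* Apply the 1-2-1 smoothing with the corner and end rules listed above.
 * Assumption: every neighbour is available (no picture-edge cases). */
void filter_nbr8x8(const nbr8x8 *in, nbr8x8 *out)
{
    int i;

    assert(in != NULL && out != NULL && in != out);

    /* Corner: centred on tl, flanked by top[0] and left[0]. */
    out->tl = (uint8_t)((in->top[0] + 2 * in->tl + in->left[0] + 2) >> 2);

    /* Top row: tl is the left-hand tap of top[0]; top[15] uses the 3x end rule. */
    out->top[0] = (uint8_t)((in->tl + 2 * in->top[0] + in->top[1] + 2) >> 2);
    for (i = 1; i <= 14; i++)
        out->top[i] = (uint8_t)((in->top[i - 1] + 2 * in->top[i] +
                                 in->top[i + 1] + 2) >> 2);
    out->top[15] = (uint8_t)((in->top[14] + 3 * in->top[15] + 2) >> 2);

    /* Left column: same shape, with left[7] as the 3x end. */
    out->left[0] = (uint8_t)((in->tl + 2 * in->left[0] + in->left[1] + 2) >> 2);
    for (i = 1; i <= 6; i++)
        out->left[i] = (uint8_t)((in->left[i - 1] + 2 * in->left[i] +
                                  in->left[i + 1] + 2) >> 2);
    out->left[7] = (uint8_t)((in->left[6] + 3 * in->left[7] + 2) >> 2);
}
```

Writing into a separate output struct keeps every tap reading raw
samples, which is what "BEFORE prediction" requires; filtering in place
would feed already-smoothed values into the next tap.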

This PR ships the pre-filter + the 3 simple modes (V, H, DC):

  - Mode 0 Vertical   (§8.3.2.1.2): pred[r,c] = filt_top[c]
  - Mode 1 Horizontal (§8.3.2.1.3): pred[r,c] = filt_left[r]
  - Mode 2 DC         (§8.3.2.1.4): ((sum_filt_top[0..7] + sum_filt_left[0..7]
                                     + 8) >> 4) broadcast

The 6 directional modes (DDL, DDR, VR, HD, VL, HU at 8x8 per
§8.3.2.1.5..§8.3.2.1.10) follow in a separate PR. They use the
same filtered samples; only the per-cell formula differs.
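
The three simple modes can be sketched as follows, taking the already
filtered samples as plain arrays. The pred8x8l_* names and signatures
are hypothetical and only mirror the formulas above; the DC variant
shown assumes both the top and left neighbours are available.

```c
#include <stdint.h>

/* Mode 0 (Vertical): every row copies the filtered top row. */
void pred8x8l_vertical(uint8_t pred[8][8], const uint8_t filt_top[8])
{
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            pred[r][c] = filt_top[c];
}

/* Mode 1 (Horizontal): every column copies the filtered left column. */
void pred8x8l_horizontal(uint8_t pred[8][8], const uint8_t filt_left[8])
{
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            pred[r][c] = filt_left[r];
}

/* Mode 2 (DC): rounded mean of the 16 filtered samples, broadcast. */
void pred8x8l_dc(uint8_t pred[8][8],
                 const uint8_t filt_top[8], const uint8_t filt_left[8])
{
    int sum = 8; /* rounding term from the formula above */

    for (int i = 0; i < 8; i++)
        sum += filt_top[i] + filt_left[i];

    const uint8_t dc = (uint8_t)(sum >> 4);
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            pred[r][c] = dc;
}
```

Only filt_top[0..7] and filt_left[0..7] are read here; the filtered
top-right samples matter only to the directional modes.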

Test design (tests/test_intra_pred_8x8_luma.c):

  - 3 uniform-context tests, one per mode (sanity).
  - 2 gradient tests that exercise the pre-filter's interior +
    boundary cases:
      * Vertical with top = 0..15: the 1-2-1 filter maps a linear
        gradient to itself on the interior, (4*c + 2) >> 2 = c, and
        the boundary formulas give the same values, so filtered
        top[c] = c for c in 0..7. Test expects pred[r,c] = c.
      * Horizontal with left = 0..7: same arithmetic chain on the
        left col. Test expects pred[r,c] = r.
Verified on hertz:
$ ./build/test_intra_pred_8x8_luma
Vertical (mode 0, uniform top) PASS
Horizontal (mode 1, uniform left) PASS
DC (mode 2, uniform) PASS
Vertical (mode 0, gradient) PASS (filtered gradient)
Horizontal (mode 1, gradient) PASS (filtered gradient)
ALL Intra_8x8 luma PASS (3 modes — V, H, DC)
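
The pre-filter being right first try is meaningful: the end
samples use a 3× weight rather than 2× (filtered top[15] = (t[14] +
3*t[15] + 2) >> 2), which is easy to forget when transcribing. The
horizontal gradient test would have surfaced a wrong l[7] weight
immediately, since row 7 of the prediction reads it directly.
Filtered top[8..15], including the top[15] end, is not read by V, H
or DC, so these three modes do not exercise that end.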
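
To make the sensitivity of the gradient input concrete, the sketch
below filters left = 0..7 with a configurable end weight and counts
how many filtered samples differ from the identity. The helper names
are hypothetical, tl is assumed to be 0, and weight 2 stands in for
the transcription slip described above.

```c
#include <stdint.h>

/* End tap of the pre-filter: (prev + w*last + 2) >> 2.
 * w = 3 is the weight quoted above; w = 2 models the slip. */
uint8_t filt_end(uint8_t prev, uint8_t last, int w)
{
    return (uint8_t)((prev + w * last + 2) >> 2);
}

/* Filter left = 0..7 (tl assumed 0) and count samples where the
 * filtered value differs from the row index r. */
int gradient_left_mismatches(int end_weight)
{
    uint8_t l[8], f[8];
    int bad = 0;

    for (int i = 0; i < 8; i++)
        l[i] = (uint8_t)i;

    f[0] = (uint8_t)((0 + 2 * l[0] + l[1] + 2) >> 2);
    for (int i = 1; i <= 6; i++)
        f[i] = (uint8_t)((l[i - 1] + 2 * l[i] + l[i + 1] + 2) >> 2);
    f[7] = filt_end(l[6], l[7], end_weight);

    for (int r = 0; r < 8; r++)
        if (f[r] != r)
            bad++;
    return bad;
}
```

With end weight 3 the count is 0. With end weight 2 the last sample
comes out as (6 + 14 + 2) >> 2 = 5 instead of 7, so one row of a
horizontal prediction would mismatch.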

Combined intra-prediction primitive coverage after this PR:

  Intra_4x4 luma    ✓ (9 modes, PR #12)
  Intra_16x16 luma  ✓ (4 modes, PR #13)
  Intra_8x8 chroma  ✓ (4 modes, PR #14)
  Intra_8x8 luma    △ (3 of 9 modes: V, H, DC ✓; DDL/DDR/VR/HD/VL/HU pending)

The 6 remaining Intra_8x8 luma directional modes are spec-mechanical
follow-ups; each is a ~30-line formula per §8.3.2.1.5+.

646 lines
23 KiB
CMake
# daedalus-fourier - Phase 3 baseline + (later) Phase 6 implementation.
#
# Builds:
#   bench_neon_idct       - NEON throughput baseline (Phase 3 M3) +
#                           bit-exact correctness gate (Phase 1 M1).
#   bench_vulkan_dispatch - Vulkan compute dispatch-overhead baseline (M5).
#
# Linkage note: bench_neon_idct statically links the vendored
# FFmpeg n7.1.3 NEON snapshot (LGPL-2.1+); see
# external/ffmpeg-snapshot/PROVENANCE.md.

cmake_minimum_required(VERSION 3.20)
project(daedalus-fourier C ASM)

set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED ON)

if (NOT CMAKE_BUILD_TYPE)
    set(CMAKE_BUILD_TYPE Release)
endif()

if (NOT CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
    message(FATAL_ERROR
        "daedalus-fourier targets aarch64 (Pi 5 / BCM2712). "
        "Cross-compile not yet wired.")
endif()

add_compile_options(-Wall -Wextra -Wno-unused-parameter)

# ---- Vendored FFmpeg snapshot (LGPL-2.1+) -----------------------------------

set(FFSNAP ${CMAKE_SOURCE_DIR}/external/ffmpeg-snapshot)

# Assembly preamble (config.h shim + FFmpeg's asm helpers) used by the
# vendored .S files. -I flags expose:
#   - FFSNAP/                    so `#include "config.h"` finds our shim and
#                                `#include "libavutil/aarch64/asm.S"` resolves
#                                against the vendored copy
#   - FFSNAP/libavcodec/aarch64/ so `#include "neon.S"` finds the helper
set(FFASM_FLAGS
    -I${FFSNAP}
    -I${FFSNAP}/libavcodec/aarch64
)

# ---- Vendored dav1d snapshot (BSD-2-Clause) - cycle 5+ ----------------------

set(DAV1DSNAP ${CMAKE_SOURCE_DIR}/external/dav1d-snapshot)

# dav1d's asm preamble expects "src/arm/asm.S" and "cdef_tmpl.S" / "util.S"
# (the latter two as bare basenames from within src/arm/64/). Include paths:
set(DAV1D_ASM_FLAGS
    -I${DAV1DSNAP}              # for config.h shim + src/arm/asm.S
    -I${DAV1DSNAP}/src/arm/64   # for util.S, cdef_tmpl.S
)

set(DAV1D_CDEF_ASM_SOURCES
    ${DAV1DSNAP}/src/arm/64/cdef.S
)
set(DAV1D_CDEF_C_SOURCES
    ${DAV1DSNAP}/src/tables_cdef_subset.c
)
set_source_files_properties(${DAV1D_CDEF_ASM_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${DAV1D_ASM_FLAGS}"
    LANGUAGE ASM)

set(FFASM_SOURCES
    ${FFSNAP}/libavcodec/aarch64/vp9itxfm_neon.S
)

# Cycle 6 - H.264 IDCT 4x4 + 8x8 NEON (vendored 2026-05-18).
set(FFASM_H264IDCT_SOURCES
    ${FFSNAP}/libavcodec/aarch64/h264idct_neon.S
)
set_source_files_properties(${FFASM_H264IDCT_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

# Cycle 2 - VP9 loop filter NEON source (vendored 2026-05-18).
set(FFASM_LPF_SOURCES
    ${FFSNAP}/libavcodec/aarch64/vp9lpf_neon.S
)
set_source_files_properties(${FFASM_LPF_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

# Cycle 3 - VP9 MC interpolation NEON source + filter coefficient table
# (vendored 2026-05-18). The .c table provides ff_vp9_subpel_filters
# symbol which vp9mc_neon.S references via movrel.
set(FFASM_MC_SOURCES
    ${FFSNAP}/libavcodec/aarch64/vp9mc_neon.S
)
set(FFC_MC_SOURCES
    ${FFSNAP}/libavcodec/vp9_subpel_filters_table.c
)
set_source_files_properties(${FFASM_MC_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

# Tell CMake/gas to preprocess .S sources.
set_source_files_properties(${FFASM_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

# ---- NEON baseline microbenches --------------------------------------------

# Cycle 6 - H.264 IDCT 4x4 NEON M3 baseline bench.
add_executable(bench_neon_h264idct4
    tests/bench_neon_h264idct4.c
    tests/h264_idct4_ref.c
    ${FFASM_H264IDCT_SOURCES}
)
target_compile_options(bench_neon_h264idct4 PRIVATE -O3 -march=armv8-a+simd)

# Cycle 7 - H.264 IDCT 8x8 NEON M3 baseline bench.
add_executable(bench_neon_h264idct8
    tests/bench_neon_h264idct8.c
    tests/h264_idct8_ref.c
    ${FFASM_H264IDCT_SOURCES}
)
target_compile_options(bench_neon_h264idct8 PRIVATE -O3 -march=armv8-a+simd)

# Cycle 8 - H.264 luma vertical deblock NEON M3 baseline bench.
set(FFASM_H264DSP_SOURCES
    ${FFSNAP}/libavcodec/aarch64/h264dsp_neon.S
)
set_source_files_properties(${FFASM_H264DSP_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

# Cycle 9 - H.264 luma qpel MC NEON.
set(FFASM_H264QPEL_SOURCES
    ${FFSNAP}/libavcodec/aarch64/h264qpel_neon.S
)
set_source_files_properties(${FFASM_H264QPEL_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

add_executable(bench_neon_h264deblock
    tests/bench_neon_h264deblock.c
    tests/h264_deblock_ref.c
    ${FFASM_H264DSP_SOURCES}
)
target_compile_options(bench_neon_h264deblock PRIVATE -O3 -march=armv8-a+simd)

# Cycle 9 - H.264 luma qpel mc20 NEON M3 baseline.
add_executable(bench_neon_h264qpel_mc20
    tests/bench_neon_h264qpel_mc20.c
    tests/h264_qpel8_mc20_ref.c
    ${FFASM_H264QPEL_SOURCES}
)
target_compile_options(bench_neon_h264qpel_mc20 PRIVATE -O3 -march=armv8-a+simd)

add_executable(bench_neon_idct
    tests/bench_neon_idct.c
    tests/vp9_idct8_ref.c
    ${FFASM_SOURCES}
)
target_compile_options(bench_neon_idct PRIVATE -O3 -march=armv8-a+simd)

# Cycle 2 - VP9 loop filter NEON baseline.
add_executable(bench_neon_lpf
    tests/bench_neon_lpf.c
    tests/vp9_lpf_ref.c
    ${FFASM_LPF_SOURCES}
)
target_compile_options(bench_neon_lpf PRIVATE -O3 -march=armv8-a+simd)

# Cycle 3 - VP9 MC interpolation NEON baseline.
add_executable(bench_neon_mc
    tests/bench_neon_mc.c
    tests/vp9_mc_ref.c
    ${FFASM_MC_SOURCES}
    ${FFC_MC_SOURCES}
)
target_compile_options(bench_neon_mc PRIVATE -O3 -march=armv8-a+simd)

# Cycle 4 - VP9 LPF wd=8 NEON baseline (same vendored .S as cycle 2).
add_executable(bench_neon_lpf8
    tests/bench_neon_lpf8.c
    tests/vp9_lpf8_ref.c
    ${FFASM_LPF_SOURCES}
)
target_compile_options(bench_neon_lpf8 PRIVATE -O3 -march=armv8-a+simd)

# Cycle 5 - AV1 CDEF NEON baseline (dav1d snapshot).
add_executable(bench_neon_cdef
    tests/bench_neon_cdef.c
    tests/cdef_ref.c
    ${DAV1D_CDEF_ASM_SOURCES}
    ${DAV1D_CDEF_C_SOURCES}
)
target_compile_options(bench_neon_cdef PRIVATE -O3 -march=armv8-a+simd)
# bench_neon_idct doesn't need vulkan/drm - pure CPU baseline.

# ---- Vulkan / QPU microbenches ----------------------------------------------
# Enabled by default. Configure with -DDAEDALUS_BUILD_VULKAN=OFF to skip the
# shader compilation and the Vulkan/QPU bench targets.

option(DAEDALUS_BUILD_VULKAN "Build Vulkan compute-dispatch microbench" ON)

if (DAEDALUS_BUILD_VULKAN)
    find_package(Vulkan REQUIRED)

    # Compile GLSL compute shaders to SPIR-V via glslangValidator.
    # The binary loads them at runtime from the build dir (cwd-relative).
    find_program(GLSLANG_VALIDATOR
        NAMES glslangValidator glslang
        REQUIRED)

    set(NOOP_SPV ${CMAKE_BINARY_DIR}/noop.spv)
    add_custom_command(
        OUTPUT ${NOOP_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V -o ${NOOP_SPV}
                ${CMAKE_SOURCE_DIR}/tests/shaders/noop.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/tests/shaders/noop.comp
        COMMENT "glslang: noop.comp -> noop.spv"
        VERBATIM
    )

    set(IDCT8_SPV ${CMAKE_BINARY_DIR}/v3d_idct8.spv)
    add_custom_command(
        OUTPUT ${IDCT8_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${IDCT8_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_idct8.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_idct8.comp
        COMMENT "glslang: v3d_idct8.comp -> v3d_idct8.spv"
        VERBATIM
    )

    set(LPF_SPV ${CMAKE_BINARY_DIR}/v3d_lpf_h_4_8.spv)
    add_custom_command(
        OUTPUT ${LPF_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${LPF_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_4_8.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_4_8.comp
        COMMENT "glslang: v3d_lpf_h_4_8.comp -> v3d_lpf_h_4_8.spv"
        VERBATIM
    )

    set(MC_SPV ${CMAKE_BINARY_DIR}/v3d_mc_8h.spv)
    add_custom_command(
        OUTPUT ${MC_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${MC_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_mc_8h.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_mc_8h.comp
        COMMENT "glslang: v3d_mc_8h.comp -> v3d_mc_8h.spv"
        VERBATIM
    )

    set(LPF8_SPV ${CMAKE_BINARY_DIR}/v3d_lpf_h_8_8.spv)
    add_custom_command(
        OUTPUT ${LPF8_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${LPF8_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_8_8.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_8_8.comp
        COMMENT "glslang: v3d_lpf_h_8_8.comp -> v3d_lpf_h_8_8.spv"
        VERBATIM
    )

    set(CDEF_SPV ${CMAKE_BINARY_DIR}/v3d_cdef.spv)
    add_custom_command(
        OUTPUT ${CDEF_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${CDEF_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_cdef.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_cdef.comp
        COMMENT "glslang: v3d_cdef.comp -> v3d_cdef.spv"
        VERBATIM
    )

    set(H264DEBLOCK_SPV ${CMAKE_BINARY_DIR}/v3d_h264deblock.spv)
    add_custom_command(
        OUTPUT ${H264DEBLOCK_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${H264DEBLOCK_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_h264deblock.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264deblock.comp
        COMMENT "glslang: v3d_h264deblock.comp -> v3d_h264deblock.spv"
        VERBATIM
    )

    set(H264_IDCT4_SPV ${CMAKE_BINARY_DIR}/v3d_h264_idct4.spv)
    add_custom_command(
        OUTPUT ${H264_IDCT4_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${H264_IDCT4_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_h264_idct4.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264_idct4.comp
        COMMENT "glslang: v3d_h264_idct4.comp -> v3d_h264_idct4.spv"
        VERBATIM
    )

    set(H264_IDCT8_SPV ${CMAKE_BINARY_DIR}/v3d_h264_idct8.spv)
    add_custom_command(
        OUTPUT ${H264_IDCT8_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${H264_IDCT8_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_h264_idct8.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264_idct8.comp
        COMMENT "glslang: v3d_h264_idct8.comp -> v3d_h264_idct8.spv"
        VERBATIM
    )

    set(H264_QPEL_MC20_SPV ${CMAKE_BINARY_DIR}/v3d_h264_qpel_mc20.spv)
    add_custom_command(
        OUTPUT ${H264_QPEL_MC20_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${H264_QPEL_MC20_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_h264_qpel_mc20.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264_qpel_mc20.comp
        COMMENT "glslang: v3d_h264_qpel_mc20.comp -> v3d_h264_qpel_mc20.spv"
        VERBATIM
    )

    add_custom_target(daedalus_shaders ALL DEPENDS
        ${NOOP_SPV}
        ${IDCT8_SPV}
        ${LPF_SPV}
        ${MC_SPV}
        ${LPF8_SPV}
        ${CDEF_SPV}
        ${H264DEBLOCK_SPV}
        ${H264_IDCT4_SPV}
        ${H264_IDCT8_SPV}
        ${H264_QPEL_MC20_SPV})

    # v3d_runner - reusable Vulkan plumbing.
    add_library(v3d_runner STATIC src/v3d_runner.c)
    target_include_directories(v3d_runner PUBLIC src)
    target_link_libraries(v3d_runner PUBLIC Vulkan::Vulkan)
    target_compile_options(v3d_runner PRIVATE -O2)

    add_executable(bench_vulkan_dispatch tests/bench_vulkan_dispatch.c)
    add_dependencies(bench_vulkan_dispatch daedalus_shaders)
    target_link_libraries(bench_vulkan_dispatch PRIVATE Vulkan::Vulkan)
    target_compile_options(bench_vulkan_dispatch PRIVATE -O2)

    add_executable(bench_v3d_idct
        tests/bench_v3d_idct.c
        tests/vp9_idct8_ref.c
    )
    add_dependencies(bench_v3d_idct daedalus_shaders)
    target_link_libraries(bench_v3d_idct PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_idct PRIVATE -O2)

    # Cycle 2 - QPU LPF bench.
    add_executable(bench_v3d_lpf
        tests/bench_v3d_lpf.c
        tests/vp9_lpf_ref.c
    )
    add_dependencies(bench_v3d_lpf daedalus_shaders)
    target_link_libraries(bench_v3d_lpf PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_lpf PRIVATE -O2)

    # Cycle 3 - QPU MC bench.
    add_executable(bench_v3d_mc
        tests/bench_v3d_mc.c
        tests/vp9_mc_ref.c
    )
    add_dependencies(bench_v3d_mc daedalus_shaders)
    target_link_libraries(bench_v3d_mc PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_mc PRIVATE -O2)

    # Cycle 4 - QPU LPF wd=8 bench.
    add_executable(bench_v3d_lpf8
        tests/bench_v3d_lpf8.c
        tests/vp9_lpf8_ref.c
    )
    add_dependencies(bench_v3d_lpf8 daedalus_shaders)
    target_link_libraries(bench_v3d_lpf8 PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_lpf8 PRIVATE -O2)

    # Cycle 5 - QPU CDEF bench (3-way M1 against NEON + C ref).
    add_executable(bench_v3d_cdef
        tests/bench_v3d_cdef.c
        tests/cdef_ref.c
        ${DAV1D_CDEF_ASM_SOURCES}
        ${DAV1D_CDEF_C_SOURCES}
    )
    add_dependencies(bench_v3d_cdef daedalus_shaders)
    target_link_libraries(bench_v3d_cdef PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_cdef PRIVATE -O2)

    # Cycle 8 - QPU H.264 deblock bench (3-way).
    add_executable(bench_v3d_h264deblock
        tests/bench_v3d_h264deblock.c
        tests/h264_deblock_ref.c
        ${FFASM_H264DSP_SOURCES}
    )
    add_dependencies(bench_v3d_h264deblock daedalus_shaders)
    target_link_libraries(bench_v3d_h264deblock PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_h264deblock PRIVATE -O2)
endif()

# ---- Phase 8 - public C API library + smoke test ---------------------------

# daedalus_core always compiles src/v3d_runner.c and links Vulkan::Vulkan, so
# the Vulkan package must be found even when DAEDALUS_BUILD_VULKAN is OFF
# (that option only gates the shaders and the Vulkan/QPU benches).
if (NOT TARGET Vulkan::Vulkan)
    find_package(Vulkan REQUIRED)
endif()

add_library(daedalus_core STATIC
    src/daedalus_core.c
    src/v3d_runner.c
    ${FFASM_SOURCES}
    ${FFASM_LPF_SOURCES}
    ${FFASM_MC_SOURCES}
    ${FFC_MC_SOURCES}
    ${FFASM_H264IDCT_SOURCES}
    ${FFASM_H264DSP_SOURCES}
    ${FFASM_H264QPEL_SOURCES}
    ${DAV1D_CDEF_ASM_SOURCES}
    ${DAV1D_CDEF_C_SOURCES}
)
target_include_directories(daedalus_core PUBLIC include)
target_include_directories(daedalus_core PRIVATE src)
target_link_libraries(daedalus_core PUBLIC Vulkan::Vulkan)
target_compile_options(daedalus_core PRIVATE -O2)
if (DAEDALUS_BUILD_VULKAN)
    add_dependencies(daedalus_core daedalus_shaders)
endif()

# ---- Install rules for sibling consumers (Phase 8 V4L2 daemon, etc.) -------
#
# Installs:
#   - libdaedalus_core.a   → ${CMAKE_INSTALL_LIBDIR}
#   - include/daedalus.h   → ${CMAKE_INSTALL_INCLUDEDIR}
#   - daedalus-fourier.pc  → ${CMAKE_INSTALL_LIBDIR}/pkgconfig
#   - V3D SPIR-V shaders   → ${CMAKE_INSTALL_DATADIR}/daedalus-fourier/shaders
#     (only when DAEDALUS_BUILD_VULKAN is ON; consumers using
#     daedalus_ctx_create_no_qpu() don't need them)
#
# pkg-config tells consumers what to link; the static-archive
# dependencies (Vulkan, pthread, and the vendored asm symbols)
# are surfaced through Requires.private + Libs.private so a
# consumer doing `pkg-config --libs daedalus-fourier` gets the
# right transitive link line.

include(GNUInstallDirs)

install(TARGETS daedalus_core
    ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
)

install(FILES include/daedalus.h
    DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
)

if (DAEDALUS_BUILD_VULKAN)
    install(FILES
        ${NOOP_SPV}
        ${IDCT8_SPV}
        ${LPF_SPV}
        ${MC_SPV}
        ${LPF8_SPV}
        ${CDEF_SPV}
        ${H264DEBLOCK_SPV}
        ${H264_IDCT4_SPV}
        ${H264_IDCT8_SPV}
        ${H264_QPEL_MC20_SPV}
        DESTINATION ${CMAKE_INSTALL_DATADIR}/daedalus-fourier/shaders
    )
endif()

# pkg-config file. Vulkan goes in Requires.private (consumer's
# pkg-config call gets it via --static). pthread + dl are needed
# by the static archive's runtime helpers.
#
# `prefix` is derived from ${pcfiledir} so the .pc is relocatable:
# pkg-config substitutes ${pcfiledir} with the directory holding the
# .pc at lookup time, and the relative path from
# <prefix>/<libdir>/pkgconfig back to <prefix> tells pkg-config the
# install prefix without baking it in. This is why
# `cmake --install build --prefix /foo` produces a .pc that correctly
# resolves `prefix=/foo` instead of baking whatever CMAKE_INSTALL_PREFIX
# was at *configure* time (default /usr/local). DESTDIR-staged
# installs work too: at runtime pkg-config sees the .pc at its real
# install path and computes the right prefix.
#
# Relative-path depth is computed from CMAKE_INSTALL_LIBDIR (and
# whatever multiarch tuple GNUInstallDirs adds) so Debian-style
# `lib/aarch64-linux-gnu/pkgconfig/...` resolves with the right number
# of `..` components. Layouts where libdir is *not* under prefix are
# not supported by this scheme; if a packager overrides libdir to an
# absolute path the relative-path machinery falls back to the absolute
# value (CMake's file(RELATIVE_PATH) prepends `..` until they meet),
# which is also relocatable but no longer prefix-agnostic.
file(RELATIVE_PATH PKGCONFIG_PCDIR_TO_PREFIX
    "${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}/pkgconfig"
    "${CMAKE_INSTALL_PREFIX}")

set(PKGCONFIG_OUT ${CMAKE_CURRENT_BINARY_DIR}/daedalus-fourier.pc)
file(WRITE ${PKGCONFIG_OUT}
"prefix=\${pcfiledir}/${PKGCONFIG_PCDIR_TO_PREFIX}
exec_prefix=\${prefix}
libdir=\${prefix}/${CMAKE_INSTALL_LIBDIR}
includedir=\${prefix}/${CMAKE_INSTALL_INCLUDEDIR}
shadersdir=\${prefix}/${CMAKE_INSTALL_DATADIR}/daedalus-fourier/shaders

Name: daedalus-fourier
Description: VP9/AV1/H.264 back-end kernels for VC VII (V3D 7.1) + ARM NEON
Version: 0.1.0
Libs: -L\${libdir} -ldaedalus_core
Libs.private: -lpthread -ldl -lm
Requires.private: vulkan
Cflags: -I\${includedir}
")
install(FILES ${PKGCONFIG_OUT}
    DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig
)

add_executable(test_api_idct
    tests/test_api_idct.c
    tests/vp9_idct8_ref.c
)
target_link_libraries(test_api_idct PRIVATE daedalus_core)
target_compile_options(test_api_idct PRIVATE -O2)

add_executable(test_api_lpf
    tests/test_api_lpf.c
    tests/vp9_lpf_ref.c
    tests/vp9_lpf8_ref.c
)
target_link_libraries(test_api_lpf PRIVATE daedalus_core)
target_compile_options(test_api_lpf PRIVATE -O2)

add_executable(test_api_h264
    tests/test_api_h264.c
    tests/h264_idct4_ref.c
    tests/h264_idct8_ref.c
    tests/h264_deblock_ref.c
    tests/h264_h_loop_filter_luma_ref.c
    tests/h264_chroma_loop_filter_ref.c
    tests/h264_intra_loop_filter_ref.c
    tests/h264_qpel8_mc20_ref.c
    tests/h264_qpel8_mc02_ref.c
    tests/h264_qpel8_mc22_ref.c
    tests/h264_qpel8_quarter_axis_ref.c
    tests/h264_qpel8_diag_ref.c
    tests/h264_qpel8_avg_anchors_ref.c
    tests/h264_qpel8_avg_rest_ref.c
)
target_link_libraries(test_api_h264 PRIVATE daedalus_core)
target_compile_options(test_api_h264 PRIVATE -O2)

add_executable(test_api_opportunistic_qpu tests/test_api_opportunistic_qpu.c)
target_link_libraries(test_api_opportunistic_qpu PRIVATE daedalus_core)
target_compile_options(test_api_opportunistic_qpu PRIVATE -O2)

# H.264 Intra_4x4 luma prediction (9 modes) - reference + tests.
# Pure CPU + spec-derived; no daedalus_core dependency yet (this is
# the bit-exact gate for the eventual shader / dispatch wiring).
add_executable(test_intra_pred_4x4
    tests/test_intra_pred_4x4.c
    tests/h264_intra_pred_4x4_ref.c
)
target_compile_options(test_intra_pred_4x4 PRIVATE -O2)

# H.264 Intra_16x16 luma prediction (4 modes: V, H, DC, Plane) -
# reference + tests. Same spec-gate role as the 4x4 sibling.
add_executable(test_intra_pred_16x16
    tests/test_intra_pred_16x16.c
    tests/h264_intra_pred_16x16_ref.c
)
target_compile_options(test_intra_pred_16x16 PRIVATE -O2)

# H.264 Intra_8x8 chroma prediction (4 modes: DC, H, V, Plane) -
# reference + tests. DC is per-quadrant (asymmetric); Plane uses
# slope coefficient 34 instead of luma's 5.
add_executable(test_intra_pred_chroma8x8
    tests/test_intra_pred_chroma8x8.c
    tests/h264_intra_pred_chroma8x8_ref.c
)
target_compile_options(test_intra_pred_chroma8x8 PRIVATE -O2)

# H.264 Intra_8x8 luma prediction (High profile, 9 modes + 1-2-1
# reference-sample pre-filter). This PR ships the pre-filter + the
# 3 simple modes (V, H, DC); the 6 directional modes follow.
add_executable(test_intra_pred_8x8_luma
    tests/test_intra_pred_8x8_luma.c
    tests/h264_intra_pred_8x8_luma_ref.c
)
target_compile_options(test_intra_pred_8x8_luma PRIVATE -O2)

add_executable(bench_pool_overhead tests/bench_pool_overhead.c)
target_link_libraries(bench_pool_overhead PRIVATE daedalus_core)
target_compile_options(bench_pool_overhead PRIVATE -O2)

if (DAEDALUS_BUILD_VULKAN)
    # The concurrent CPU + QPU benches below link v3d_runner, which only
    # exists when DAEDALUS_BUILD_VULKAN is ON, so the conditional is re-opened.

    # M4 - concurrent CPU(NEON) + QPU bench. Links the FFmpeg NEON
    # snapshot so we can run real NEON kernels on pinned CPU cores
    # while the QPU runs its dispatch loop concurrently.
    add_executable(bench_concurrent
        tests/bench_concurrent.c
        ${FFASM_SOURCES}
    )
    add_dependencies(bench_concurrent daedalus_shaders)
    target_link_libraries(bench_concurrent PRIVATE v3d_runner Vulkan::Vulkan pthread)
    target_compile_options(bench_concurrent PRIVATE -O3 -march=armv8-a+simd)

    # Cycle 2 M4'' - concurrent LPF.
    add_executable(bench_concurrent_lpf
        tests/bench_concurrent_lpf.c
        ${FFASM_LPF_SOURCES}
    )
    add_dependencies(bench_concurrent_lpf daedalus_shaders)
    target_link_libraries(bench_concurrent_lpf PRIVATE v3d_runner Vulkan::Vulkan pthread)
    target_compile_options(bench_concurrent_lpf PRIVATE -O3 -march=armv8-a+simd)

    # Cycle 3 M4''' - concurrent MC.
    add_executable(bench_concurrent_mc
        tests/bench_concurrent_mc.c
        ${FFASM_MC_SOURCES}
        ${FFC_MC_SOURCES}
    )
    add_dependencies(bench_concurrent_mc daedalus_shaders)
    target_link_libraries(bench_concurrent_mc PRIVATE v3d_runner Vulkan::Vulkan pthread)
    target_compile_options(bench_concurrent_mc PRIVATE -O3 -march=armv8-a+simd)

    # Cycle 4 M4'''' - concurrent LPF wd=8.
    add_executable(bench_concurrent_lpf8
        tests/bench_concurrent_lpf8.c
        ${FFASM_LPF_SOURCES}
    )
    add_dependencies(bench_concurrent_lpf8 daedalus_shaders)
    target_link_libraries(bench_concurrent_lpf8 PRIVATE v3d_runner Vulkan::Vulkan pthread)
    target_compile_options(bench_concurrent_lpf8 PRIVATE -O3 -march=armv8-a+simd)

    # Issue 003 - mixed-kernel M4 bench (NEON-N kernel A + QPU kernel B).
    # Links the FFmpeg + dav1d NEON sources listed below (the H.264 IDCT
    # and qpel sources are not included).
    add_executable(bench_concurrent_mixed
        tests/bench_concurrent_mixed.c
        ${FFASM_SOURCES}
        ${FFASM_LPF_SOURCES}
        ${FFASM_MC_SOURCES}
        ${FFC_MC_SOURCES}
        ${FFASM_H264DSP_SOURCES}
        ${DAV1D_CDEF_ASM_SOURCES}
        ${DAV1D_CDEF_C_SOURCES}
    )
    add_dependencies(bench_concurrent_mixed daedalus_shaders)
    target_link_libraries(bench_concurrent_mixed PRIVATE v3d_runner Vulkan::Vulkan pthread)
    target_compile_options(bench_concurrent_mixed PRIVATE -O3 -march=armv8-a+simd)
endif()

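# ---- Summary ----------------------------------------------------------------

# message() does not evaluate generator expressions, so the optional
# target is appended with a plain if() instead of $<BOOL:...>.
set(DAEDALUS_TARGETS_SUMMARY "bench_neon_idct")
if (DAEDALUS_BUILD_VULKAN)
    string(APPEND DAEDALUS_TARGETS_SUMMARY "; bench_vulkan_dispatch")
endif()

message(STATUS "daedalus-fourier build configured for ${CMAKE_SYSTEM_PROCESSOR}")
message(STATUS " FFmpeg snapshot: ${FFSNAP}")
message(STATUS " Build type: ${CMAKE_BUILD_TYPE}")
message(STATUS " Targets: ${DAEDALUS_TARGETS_SUMMARY}")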