ce6703a862
Lays the bit-exact gate for H.264 §8.3.1.4 Intra_4x4 luma prediction.
Spec-derived C reference covering all 9 modes; standalone test
exercises each against hand-computed expected 4x4 patterns.
Why fourier (not the decoder) gets this: it's a reusable spec-level
primitive — both daedalus-decoder (Phase 1 Stage 2a intra prediction)
and any future shader work will need the same bit-exact reference.
Putting it in fourier alongside the IDCT / deblock refs keeps the
"spec implementations" library cohesive.
Why CPU C reference, not NEON or QPU: the vendored FFmpeg snapshot
(external/ffmpeg-snapshot/libavcodec/aarch64/) has h264dsp/idct/qpel
but NOT h264pred. Vendoring h264pred_neon.S would expand the snapshot
surface; deferring that pending real perf data. The cycle 9 NEON
benches measured ~5 ns per 8x8 qpel block; assuming a similar ~5 ns
per 4x4 block, intra prediction costs 16 blocks/MB × 8160 MBs × 5 ns
= ~650 us/frame at 1080p. That is well inside budget at NEON-class
speed and leaves headroom for a plain-C reference that runs several
times slower. Not the critical-path concern.
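The estimate above can be reproduced with a few lines of C. This is a back-of-envelope sketch only: the 5 ns per 4x4 block figure is the assumption carried over from the qpel benches, not a measurement of intra prediction, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 16x16 macroblocks covering a frame (dimensions rounded up). */
static uint32_t sketch_mb_count(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Estimated Intra_4x4 prediction cost per frame in nanoseconds, assuming
 * every MB is coded as Intra_4x4 (16 blocks per MB) and each block costs
 * ns_per_block (an assumed figure, not a benchmark result). */
static uint64_t sketch_intra4x4_ns_per_frame(uint32_t width, uint32_t height,
                                             uint32_t ns_per_block)
{
    return (uint64_t)sketch_mb_count(width, height) * 16u * ns_per_block;
}
```

For 1920x1080 this gives 120 × 68 = 8160 MBs and 652,800 ns, which matches the ~650 us/frame quoted above.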
Scope:
- tests/h264_intra_pred_4x4_ref.c: 9 prediction modes per
  H.264 spec §8.3.1.4 sub-clauses, FFmpeg-style interface:
      void daedalus_h264_pred_4x4_<name>_ref(uint8_t *dst, ptrdiff_t stride);
  Reads top/top-right/left/top-left neighbours from dst[-stride/-1]
  offsets, writes 4×4 output at dst[0..3][0..3]. Assumes all 13
  neighbour bytes are valid (interior-MB case; availability
  fallbacks are caller-side per spec).
- tests/test_intra_pred_4x4.c: 10 cases:
  * 9 uniform-context degenerate tests (one per mode), establishing
    that nothing is structurally broken (all output cells must
    equal the uniform input value).
  * 1 asymmetric Vertical_Right sanity test with 16 distinct
    expected cells hand-computed from spec §8.3.1.4.6, the
    "really exercise orientation + row/col arithmetic" gate.
- CMakeLists.txt: new test_intra_pred_4x4 binary (no daedalus_core
  dependency; pure-CPU library doesn't need a context to construct).
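To make the interface convention concrete, here is a minimal sketch of the two simplest modes written against the same (dst, stride) signature. The sketch_* names are hypothetical stand-ins, not the functions in tests/h264_intra_pred_4x4_ref.c, and the code assumes the interior-MB case where the top row and left column are valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* dst points at the top-left sample of the 4x4 block inside a larger
 * buffer; neighbours are read at negative offsets relative to it. */

/* Vertical: every row is a copy of the 4 samples directly above the block. */
static void sketch_pred_4x4_vertical(uint8_t *dst, ptrdiff_t stride)
{
    assert(dst != NULL && stride >= 4);
    const uint8_t *top = dst - stride;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = top[x];
}

/* Horizontal: every column is a copy of the 4 samples left of the block. */
static void sketch_pred_4x4_horizontal(uint8_t *dst, ptrdiff_t stride)
{
    assert(dst != NULL && stride >= 4);
    for (int y = 0; y < 4; y++) {
        const uint8_t left = dst[y * stride - 1];
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = left;
    }
}
```

A caller would embed the block in a buffer with at least one row above and one column to the left, fill those neighbours, then call the function with dst pointing one row and one column in.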
Verified on hertz:
  $ ./build/test_intra_pred_4x4
    Vertical       (mode 0)  PASS
    Horizontal     (mode 1)  PASS
    DC             (mode 2)  PASS
    DiagDownLeft   (mode 3)  PASS
    DiagDownRight  (mode 4)  PASS
    VerticalRight  (mode 5)  PASS
    HorizontalDown (mode 6)  PASS
    VerticalLeft   (mode 7)  PASS
    HorizontalUp   (mode 8)  PASS
    VR asym (sanity)         PASS
  ALL 10 intra-4x4 mode references PASS
The VR asym test passed first try; the DC test failed on the first
attempt because my test expectation was miscomputed (I wrote 4; the
correct value is 2 = (16+4)>>3). Fixed in the test. The reference
itself never had the bug.
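The DC rounding that tripped the test can be checked in isolation. The sketch below assumes all eight neighbours (4 top, 4 left) are available and uses a hypothetical helper name; a sum of 16 is consistent with eight neighbours of value 2, which is an inference from the arithmetic above rather than something read from the test source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Intra_4x4 DC with all neighbours available: the rounded mean of the
 * 4 samples above and the 4 samples to the left of the block. */
static uint8_t sketch_dc_4x4_value(const uint8_t *dst, ptrdiff_t stride)
{
    assert(dst != NULL && stride >= 4);
    unsigned sum = 0;
    for (int i = 0; i < 4; i++)
        sum += dst[i - stride] + dst[i * stride - 1];
    return (uint8_t)((sum + 4) >> 3);   /* +4 rounds to nearest before /8 */
}
```

With a uniform neighbourhood of 2 the sum is 16, so the result is (16+4)>>3 = 2, the same value as the input.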
What this does NOT cover (next-step backlog):
- Intra_16x16 luma prediction (4 modes per H.264 §8.3.2): vertical,
  horizontal, DC, plane.
- Intra_8x8 chroma prediction (4 modes per H.264 §8.3.3): DC,
  horizontal, vertical, plane.
- Intra_8x8 luma prediction (High profile, 9 modes per §8.3.2.1):
  these are the High-profile siblings of the modes in this PR with
  the 1-2-1 smoothing pre-filter. Different but well-defined.
- Neighbour availability fallback (top-edge MB, left-edge MB,
  slice-boundary, top-right unavailable in some positions).
- Dispatch wrappers: these refs aren't surfaced through
  daedalus_dispatch_*(). Whether to do that depends on the
  daedalus-decoder Stage 2a architecture (per-block CPU vs
  per-diagonal GPU wavefront, TBD).
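As an illustration of what the neighbour-availability backlog item involves, here is a sketch of the DC mode with the caller passing availability flags. It follows the DC fallback rules as I understand them from the H.264 spec for 8-bit samples (both sides, left only, top only, or the mid-grey constant 128). The function name and flag parameters are hypothetical, and this is not the planned implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DC value for a 4x4 block when some neighbours may be unavailable.
 * Unavailable neighbours are never read. */
static uint8_t sketch_dc_4x4_with_availability(const uint8_t *dst,
                                               ptrdiff_t stride,
                                               int top_avail, int left_avail)
{
    assert(dst != NULL);
    unsigned top = 0, left = 0;
    if (top_avail)
        for (int i = 0; i < 4; i++)
            top += dst[i - stride];
    if (left_avail)
        for (int i = 0; i < 4; i++)
            left += dst[i * stride - 1];

    if (top_avail && left_avail)
        return (uint8_t)((top + left + 4) >> 3);  /* mean of 8 samples */
    if (left_avail)
        return (uint8_t)((left + 2) >> 2);        /* mean of 4 left samples */
    if (top_avail)
        return (uint8_t)((top + 2) >> 2);         /* mean of 4 top samples */
    return 128;                                   /* 1 << (8 - 1) for 8-bit */
}
```

The other eight modes need their own per-mode handling (for example, substituting for missing top-right samples), which is why this is listed as separate work.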
614 lines
22 KiB
CMake
# daedalus-fourier: Phase 3 baseline + (later) Phase 6 implementation.
#
# Builds:
#   bench_neon_idct       - NEON throughput baseline (Phase 3 M3) +
#                           bit-exact correctness gate (Phase 1 M1).
#   bench_vulkan_dispatch - Vulkan compute dispatch-overhead baseline (M5).
#
# Linkage note: bench_neon_idct statically links the vendored
# FFmpeg n7.1.3 NEON snapshot (LGPL-2.1+); see
# external/ffmpeg-snapshot/PROVENANCE.md.

cmake_minimum_required(VERSION 3.20)
project(daedalus-fourier C ASM)

set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED ON)

if (NOT CMAKE_BUILD_TYPE)
    set(CMAKE_BUILD_TYPE Release)
endif()

if (NOT CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
    message(FATAL_ERROR
        "daedalus-fourier targets aarch64 (Pi 5 / BCM2712). "
        "Cross-compile not yet wired.")
endif()

add_compile_options(-Wall -Wextra -Wno-unused-parameter)

# ---- Vendored FFmpeg snapshot (LGPL-2.1+) -----------------------------------

set(FFSNAP ${CMAKE_SOURCE_DIR}/external/ffmpeg-snapshot)

# Assembly preamble (config.h shim + FFmpeg's asm helpers) used by the
# vendored .S file. -I flags expose:
#   - FFSNAP/                     so `#include "config.h"` finds our shim
#   - FFSNAP/libavcodec/aarch64/  so `#include "neon.S"` finds the helper
#   - FFSNAP/                     so `#include "libavutil/aarch64/asm.S"`
#                                 resolves against the vendored copy
set(FFASM_FLAGS
    -I${FFSNAP}
    -I${FFSNAP}/libavcodec/aarch64
    -I${FFSNAP}
)

# ---- Vendored dav1d snapshot (BSD-2-Clause), cycle 5+ -----------------------

set(DAV1DSNAP ${CMAKE_SOURCE_DIR}/external/dav1d-snapshot)

# dav1d's asm preamble expects "src/arm/asm.S" and "cdef_tmpl.S" / "util.S"
# (the latter two as bare basenames from within src/arm/64/). Include paths:
set(DAV1D_ASM_FLAGS
    -I${DAV1DSNAP}             # for config.h shim + src/arm/asm.S
    -I${DAV1DSNAP}/src/arm/64  # for util.S, cdef_tmpl.S
)

set(DAV1D_CDEF_ASM_SOURCES
    ${DAV1DSNAP}/src/arm/64/cdef.S
)
set(DAV1D_CDEF_C_SOURCES
    ${DAV1DSNAP}/src/tables_cdef_subset.c
)
set_source_files_properties(${DAV1D_CDEF_ASM_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${DAV1D_ASM_FLAGS}"
    LANGUAGE ASM)

set(FFASM_SOURCES
    ${FFSNAP}/libavcodec/aarch64/vp9itxfm_neon.S
)

# Cycle 6: H.264 IDCT 4x4 + 8x8 NEON (vendored 2026-05-18).
set(FFASM_H264IDCT_SOURCES
    ${FFSNAP}/libavcodec/aarch64/h264idct_neon.S
)
set_source_files_properties(${FFASM_H264IDCT_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

# Cycle 2: VP9 loop filter NEON source (vendored 2026-05-18).
set(FFASM_LPF_SOURCES
    ${FFSNAP}/libavcodec/aarch64/vp9lpf_neon.S
)
set_source_files_properties(${FFASM_LPF_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

# Cycle 3: VP9 MC interpolation NEON source + filter coefficient table
# (vendored 2026-05-18). The .c table provides ff_vp9_subpel_filters
# symbol which vp9mc_neon.S references via movrel.
set(FFASM_MC_SOURCES
    ${FFSNAP}/libavcodec/aarch64/vp9mc_neon.S
)
set(FFC_MC_SOURCES
    ${FFSNAP}/libavcodec/vp9_subpel_filters_table.c
)
set_source_files_properties(${FFASM_MC_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

# Tell CMake/gas to preprocess .S sources.
set_source_files_properties(${FFASM_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

# ---- NEON baseline microbenches --------------------------------------------

# Cycle 6: H.264 IDCT 4x4 NEON M3 baseline bench.
add_executable(bench_neon_h264idct4
    tests/bench_neon_h264idct4.c
    tests/h264_idct4_ref.c
    ${FFASM_H264IDCT_SOURCES}
)
target_compile_options(bench_neon_h264idct4 PRIVATE -O3 -march=armv8-a+simd)

# Cycle 7: H.264 IDCT 8x8 NEON M3 baseline bench.
add_executable(bench_neon_h264idct8
    tests/bench_neon_h264idct8.c
    tests/h264_idct8_ref.c
    ${FFASM_H264IDCT_SOURCES}
)
target_compile_options(bench_neon_h264idct8 PRIVATE -O3 -march=armv8-a+simd)

# Cycle 8: H.264 luma vertical deblock NEON M3 baseline bench.
set(FFASM_H264DSP_SOURCES
    ${FFSNAP}/libavcodec/aarch64/h264dsp_neon.S
)
set_source_files_properties(${FFASM_H264DSP_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

# Cycle 9: H.264 luma qpel MC NEON.
set(FFASM_H264QPEL_SOURCES
    ${FFSNAP}/libavcodec/aarch64/h264qpel_neon.S
)
set_source_files_properties(${FFASM_H264QPEL_SOURCES} PROPERTIES
    COMPILE_OPTIONS "${FFASM_FLAGS}"
    LANGUAGE ASM)

add_executable(bench_neon_h264deblock
    tests/bench_neon_h264deblock.c
    tests/h264_deblock_ref.c
    ${FFASM_H264DSP_SOURCES}
)
target_compile_options(bench_neon_h264deblock PRIVATE -O3 -march=armv8-a+simd)

# Cycle 9: H.264 luma qpel mc20 NEON M3 baseline.
add_executable(bench_neon_h264qpel_mc20
    tests/bench_neon_h264qpel_mc20.c
    tests/h264_qpel8_mc20_ref.c
    ${FFASM_H264QPEL_SOURCES}
)
target_compile_options(bench_neon_h264qpel_mc20 PRIVATE -O3 -march=armv8-a+simd)

add_executable(bench_neon_idct
    tests/bench_neon_idct.c
    tests/vp9_idct8_ref.c
    ${FFASM_SOURCES}
)
target_compile_options(bench_neon_idct PRIVATE -O3 -march=armv8-a+simd)

# Cycle 2: VP9 loop filter NEON baseline.
add_executable(bench_neon_lpf
    tests/bench_neon_lpf.c
    tests/vp9_lpf_ref.c
    ${FFASM_LPF_SOURCES}
)
target_compile_options(bench_neon_lpf PRIVATE -O3 -march=armv8-a+simd)

# Cycle 3: VP9 MC interpolation NEON baseline.
add_executable(bench_neon_mc
    tests/bench_neon_mc.c
    tests/vp9_mc_ref.c
    ${FFASM_MC_SOURCES}
    ${FFC_MC_SOURCES}
)
target_compile_options(bench_neon_mc PRIVATE -O3 -march=armv8-a+simd)

# Cycle 4: VP9 LPF wd=8 NEON baseline (same vendored .S as cycle 2).
add_executable(bench_neon_lpf8
    tests/bench_neon_lpf8.c
    tests/vp9_lpf8_ref.c
    ${FFASM_LPF_SOURCES}
)
target_compile_options(bench_neon_lpf8 PRIVATE -O3 -march=armv8-a+simd)

# Cycle 5: AV1 CDEF NEON baseline (dav1d snapshot).
add_executable(bench_neon_cdef
    tests/bench_neon_cdef.c
    tests/cdef_ref.c
    ${DAV1D_CDEF_ASM_SOURCES}
    ${DAV1D_CDEF_C_SOURCES}
)
target_compile_options(bench_neon_cdef PRIVATE -O3 -march=armv8-a+simd)
# bench_neon_idct doesn't need vulkan/drm; pure CPU baseline.

# ---- Vulkan dispatch-overhead microbench (next chunk) ----------------------
# Stub: written in a follow-up step. Toggle ON with -DDAEDALUS_BUILD_VULKAN=ON
# once tests/bench_vulkan_dispatch.c exists.

option(DAEDALUS_BUILD_VULKAN "Build Vulkan compute-dispatch microbench" ON)

if (DAEDALUS_BUILD_VULKAN)
    find_package(Vulkan REQUIRED)

    # Compile GLSL compute shaders to SPIR-V via glslangValidator.
    # The binary loads them at runtime from the build dir (cwd-relative).
    find_program(GLSLANG_VALIDATOR
        NAMES glslangValidator glslang
        REQUIRED)

    set(NOOP_SPV ${CMAKE_BINARY_DIR}/noop.spv)
    add_custom_command(
        OUTPUT  ${NOOP_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V -o ${NOOP_SPV}
                ${CMAKE_SOURCE_DIR}/tests/shaders/noop.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/tests/shaders/noop.comp
        COMMENT "glslang: noop.comp -> noop.spv"
        VERBATIM
    )

    set(IDCT8_SPV ${CMAKE_BINARY_DIR}/v3d_idct8.spv)
    add_custom_command(
        OUTPUT  ${IDCT8_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${IDCT8_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_idct8.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_idct8.comp
        COMMENT "glslang: v3d_idct8.comp -> v3d_idct8.spv"
        VERBATIM
    )

    set(LPF_SPV ${CMAKE_BINARY_DIR}/v3d_lpf_h_4_8.spv)
    add_custom_command(
        OUTPUT  ${LPF_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${LPF_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_4_8.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_4_8.comp
        COMMENT "glslang: v3d_lpf_h_4_8.comp -> v3d_lpf_h_4_8.spv"
        VERBATIM
    )

    set(MC_SPV ${CMAKE_BINARY_DIR}/v3d_mc_8h.spv)
    add_custom_command(
        OUTPUT  ${MC_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${MC_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_mc_8h.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_mc_8h.comp
        COMMENT "glslang: v3d_mc_8h.comp -> v3d_mc_8h.spv"
        VERBATIM
    )

    set(LPF8_SPV ${CMAKE_BINARY_DIR}/v3d_lpf_h_8_8.spv)
    add_custom_command(
        OUTPUT  ${LPF8_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${LPF8_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_8_8.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_8_8.comp
        COMMENT "glslang: v3d_lpf_h_8_8.comp -> v3d_lpf_h_8_8.spv"
        VERBATIM
    )

    set(CDEF_SPV ${CMAKE_BINARY_DIR}/v3d_cdef.spv)
    add_custom_command(
        OUTPUT  ${CDEF_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${CDEF_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_cdef.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_cdef.comp
        COMMENT "glslang: v3d_cdef.comp -> v3d_cdef.spv"
        VERBATIM
    )

    set(H264DEBLOCK_SPV ${CMAKE_BINARY_DIR}/v3d_h264deblock.spv)
    add_custom_command(
        OUTPUT  ${H264DEBLOCK_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${H264DEBLOCK_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_h264deblock.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264deblock.comp
        COMMENT "glslang: v3d_h264deblock.comp -> v3d_h264deblock.spv"
        VERBATIM
    )

    set(H264_IDCT4_SPV ${CMAKE_BINARY_DIR}/v3d_h264_idct4.spv)
    add_custom_command(
        OUTPUT  ${H264_IDCT4_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${H264_IDCT4_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_h264_idct4.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264_idct4.comp
        COMMENT "glslang: v3d_h264_idct4.comp -> v3d_h264_idct4.spv"
        VERBATIM
    )

    set(H264_IDCT8_SPV ${CMAKE_BINARY_DIR}/v3d_h264_idct8.spv)
    add_custom_command(
        OUTPUT  ${H264_IDCT8_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${H264_IDCT8_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_h264_idct8.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264_idct8.comp
        COMMENT "glslang: v3d_h264_idct8.comp -> v3d_h264_idct8.spv"
        VERBATIM
    )

    set(H264_QPEL_MC20_SPV ${CMAKE_BINARY_DIR}/v3d_h264_qpel_mc20.spv)
    add_custom_command(
        OUTPUT  ${H264_QPEL_MC20_SPV}
        COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
                -o ${H264_QPEL_MC20_SPV}
                ${CMAKE_SOURCE_DIR}/src/v3d_h264_qpel_mc20.comp
        DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264_qpel_mc20.comp
        COMMENT "glslang: v3d_h264_qpel_mc20.comp -> v3d_h264_qpel_mc20.spv"
        VERBATIM
    )

    add_custom_target(daedalus_shaders ALL DEPENDS ${NOOP_SPV} ${IDCT8_SPV} ${LPF_SPV} ${MC_SPV} ${LPF8_SPV} ${CDEF_SPV} ${H264DEBLOCK_SPV} ${H264_IDCT4_SPV} ${H264_IDCT8_SPV} ${H264_QPEL_MC20_SPV})

    # v3d_runner: reusable Vulkan plumbing.
    add_library(v3d_runner STATIC src/v3d_runner.c)
    target_include_directories(v3d_runner PUBLIC src)
    target_link_libraries(v3d_runner PUBLIC Vulkan::Vulkan)
    target_compile_options(v3d_runner PRIVATE -O2)

    add_executable(bench_vulkan_dispatch tests/bench_vulkan_dispatch.c)
    add_dependencies(bench_vulkan_dispatch daedalus_shaders)
    target_link_libraries(bench_vulkan_dispatch PRIVATE Vulkan::Vulkan)
    target_compile_options(bench_vulkan_dispatch PRIVATE -O2)

    add_executable(bench_v3d_idct
        tests/bench_v3d_idct.c
        tests/vp9_idct8_ref.c
    )
    add_dependencies(bench_v3d_idct daedalus_shaders)
    target_link_libraries(bench_v3d_idct PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_idct PRIVATE -O2)

    # Cycle 2: QPU LPF bench.
    add_executable(bench_v3d_lpf
        tests/bench_v3d_lpf.c
        tests/vp9_lpf_ref.c
    )
    add_dependencies(bench_v3d_lpf daedalus_shaders)
    target_link_libraries(bench_v3d_lpf PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_lpf PRIVATE -O2)

    # Cycle 3: QPU MC bench.
    add_executable(bench_v3d_mc
        tests/bench_v3d_mc.c
        tests/vp9_mc_ref.c
    )
    add_dependencies(bench_v3d_mc daedalus_shaders)
    target_link_libraries(bench_v3d_mc PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_mc PRIVATE -O2)

    # Cycle 4: QPU LPF wd=8 bench.
    add_executable(bench_v3d_lpf8
        tests/bench_v3d_lpf8.c
        tests/vp9_lpf8_ref.c
    )
    add_dependencies(bench_v3d_lpf8 daedalus_shaders)
    target_link_libraries(bench_v3d_lpf8 PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_lpf8 PRIVATE -O2)

    # Cycle 5: QPU CDEF bench (3-way M1 against NEON + C ref).
    add_executable(bench_v3d_cdef
        tests/bench_v3d_cdef.c
        tests/cdef_ref.c
        ${DAV1D_CDEF_ASM_SOURCES}
        ${DAV1D_CDEF_C_SOURCES}
    )
    add_dependencies(bench_v3d_cdef daedalus_shaders)
    target_link_libraries(bench_v3d_cdef PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_cdef PRIVATE -O2)

    # Cycle 8: QPU H.264 deblock bench (3-way).
    add_executable(bench_v3d_h264deblock
        tests/bench_v3d_h264deblock.c
        tests/h264_deblock_ref.c
        ${FFASM_H264DSP_SOURCES}
    )
    add_dependencies(bench_v3d_h264deblock daedalus_shaders)
    target_link_libraries(bench_v3d_h264deblock PRIVATE v3d_runner Vulkan::Vulkan)
    target_compile_options(bench_v3d_h264deblock PRIVATE -O2)
endif()

# ---- Phase 8: public C API library + smoke test ----------------------------

add_library(daedalus_core STATIC
    src/daedalus_core.c
    src/v3d_runner.c
    ${FFASM_SOURCES}
    ${FFASM_LPF_SOURCES}
    ${FFASM_MC_SOURCES}
    ${FFC_MC_SOURCES}
    ${FFASM_H264IDCT_SOURCES}
    ${FFASM_H264DSP_SOURCES}
    ${FFASM_H264QPEL_SOURCES}
    ${DAV1D_CDEF_ASM_SOURCES}
    ${DAV1D_CDEF_C_SOURCES}
)
target_include_directories(daedalus_core PUBLIC include)
target_include_directories(daedalus_core PRIVATE src)
target_link_libraries(daedalus_core PUBLIC Vulkan::Vulkan)
target_compile_options(daedalus_core PRIVATE -O2)
if (DAEDALUS_BUILD_VULKAN)
    add_dependencies(daedalus_core daedalus_shaders)
endif()

# ---- Install rules for sibling consumers (Phase 8 V4L2 daemon, etc.) -------
#
# Installs:
#   - libdaedalus_core.a   → ${CMAKE_INSTALL_LIBDIR}
#   - include/daedalus.h   → ${CMAKE_INSTALL_INCLUDEDIR}
#   - daedalus-fourier.pc  → ${CMAKE_INSTALL_LIBDIR}/pkgconfig
#   - V3D SPIR-V shaders   → ${CMAKE_INSTALL_DATADIR}/daedalus-fourier/shaders
#     (only when DAEDALUS_BUILD_VULKAN is ON; consumers using
#     daedalus_ctx_create_no_qpu() don't need them)
#
# pkg-config tells consumers what to link; the static-archive
# dependencies (Vulkan, pthread, and the vendored asm symbols)
# are surfaced through Requires.private + Libs.private so a
# consumer doing `pkg-config --libs daedalus-fourier` gets the
# right transitive link line.

include(GNUInstallDirs)

install(TARGETS daedalus_core
    ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
)

install(FILES include/daedalus.h
    DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
)

if (DAEDALUS_BUILD_VULKAN)
    install(FILES
        ${NOOP_SPV}
        ${IDCT8_SPV}
        ${LPF_SPV}
        ${MC_SPV}
        ${LPF8_SPV}
        ${CDEF_SPV}
        ${H264DEBLOCK_SPV}
        ${H264_IDCT4_SPV}
        ${H264_IDCT8_SPV}
        ${H264_QPEL_MC20_SPV}
        DESTINATION ${CMAKE_INSTALL_DATADIR}/daedalus-fourier/shaders
    )
endif()

# pkg-config file. Vulkan goes in Requires.private (consumer's
# pkg-config call gets it via --static). pthread + dl are needed
# by the static archive's runtime helpers.
#
# `prefix` is derived from ${pcfiledir} so the .pc is relocatable:
# pkg-config substitutes ${pcfiledir} with the directory holding the
# .pc at lookup time, and the relative path from
# <prefix>/<libdir>/pkgconfig back to <prefix> tells pkg-config the
# install prefix without baking it in. This is why
# `cmake --install build --prefix /foo` produces a .pc that correctly
# resolves `prefix=/foo` instead of baking whatever CMAKE_INSTALL_PREFIX
# was at *configure* time (default /usr/local). DESTDIR-staged
# installs work too: at runtime pkg-config sees the .pc at its real
# install path and computes the right prefix.
#
# Relative-path depth is computed from CMAKE_INSTALL_LIBDIR (and
# whatever multiarch tuple GNUInstallDirs adds) so Debian-style
# `lib/aarch64-linux-gnu/pkgconfig/...` resolves with the right number
# of `..` components. Layouts where libdir is *not* under prefix are
# not supported by this scheme; if a packager overrides libdir to an
# absolute path the relative-path machinery falls back to the absolute
# value (CMake's file(RELATIVE_PATH) prepends `..` until they meet),
# which is also relocatable but no longer prefix-agnostic.
file(RELATIVE_PATH PKGCONFIG_PCDIR_TO_PREFIX
    "${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}/pkgconfig"
    "${CMAKE_INSTALL_PREFIX}")

set(PKGCONFIG_OUT ${CMAKE_CURRENT_BINARY_DIR}/daedalus-fourier.pc)
file(WRITE ${PKGCONFIG_OUT}
"prefix=\${pcfiledir}/${PKGCONFIG_PCDIR_TO_PREFIX}
exec_prefix=\${prefix}
libdir=\${prefix}/${CMAKE_INSTALL_LIBDIR}
includedir=\${prefix}/${CMAKE_INSTALL_INCLUDEDIR}
shadersdir=\${prefix}/${CMAKE_INSTALL_DATADIR}/daedalus-fourier/shaders

Name: daedalus-fourier
Description: VP9/AV1/H.264 back-end kernels for VC VII (V3D 7.1) + ARM NEON
Version: 0.1.0
Libs: -L\${libdir} -ldaedalus_core
Libs.private: -lpthread -ldl -lm
Requires.private: vulkan
Cflags: -I\${includedir}
")
install(FILES ${PKGCONFIG_OUT}
    DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig
)

add_executable(test_api_idct
    tests/test_api_idct.c
    tests/vp9_idct8_ref.c
)
target_link_libraries(test_api_idct PRIVATE daedalus_core)
target_compile_options(test_api_idct PRIVATE -O2)

add_executable(test_api_lpf
    tests/test_api_lpf.c
    tests/vp9_lpf_ref.c
    tests/vp9_lpf8_ref.c
)
target_link_libraries(test_api_lpf PRIVATE daedalus_core)
target_compile_options(test_api_lpf PRIVATE -O2)

add_executable(test_api_h264
    tests/test_api_h264.c
    tests/h264_idct4_ref.c
    tests/h264_idct8_ref.c
    tests/h264_deblock_ref.c
    tests/h264_h_loop_filter_luma_ref.c
    tests/h264_chroma_loop_filter_ref.c
    tests/h264_intra_loop_filter_ref.c
    tests/h264_qpel8_mc20_ref.c
)
target_link_libraries(test_api_h264 PRIVATE daedalus_core)
target_compile_options(test_api_h264 PRIVATE -O2)

add_executable(test_api_opportunistic_qpu tests/test_api_opportunistic_qpu.c)
target_link_libraries(test_api_opportunistic_qpu PRIVATE daedalus_core)
target_compile_options(test_api_opportunistic_qpu PRIVATE -O2)

# H.264 Intra_4x4 luma prediction (9 modes): reference + tests.
# Pure CPU + spec-derived; no daedalus_core dependency yet (this is
# the bit-exact gate for the eventual shader / dispatch wiring).
add_executable(test_intra_pred_4x4
    tests/test_intra_pred_4x4.c
    tests/h264_intra_pred_4x4_ref.c
)
target_compile_options(test_intra_pred_4x4 PRIVATE -O2)

add_executable(bench_pool_overhead tests/bench_pool_overhead.c)
target_link_libraries(bench_pool_overhead PRIVATE daedalus_core)
target_compile_options(bench_pool_overhead PRIVATE -O2)

if (DAEDALUS_BUILD_VULKAN)
    # (re-open the conditional so the closing endif() below balances)

    # M4: concurrent CPU(NEON) + QPU bench. Links the FFmpeg NEON
    # snapshot so we can run real NEON kernels on pinned CPU cores
    # while the QPU runs its dispatch loop concurrently.
    add_executable(bench_concurrent
        tests/bench_concurrent.c
        ${FFASM_SOURCES}
    )
    add_dependencies(bench_concurrent daedalus_shaders)
    target_link_libraries(bench_concurrent PRIVATE v3d_runner Vulkan::Vulkan pthread)
    target_compile_options(bench_concurrent PRIVATE -O3 -march=armv8-a+simd)

    # Cycle 2 M4'': concurrent LPF.
    add_executable(bench_concurrent_lpf
        tests/bench_concurrent_lpf.c
        ${FFASM_LPF_SOURCES}
    )
    add_dependencies(bench_concurrent_lpf daedalus_shaders)
    target_link_libraries(bench_concurrent_lpf PRIVATE v3d_runner Vulkan::Vulkan pthread)
    target_compile_options(bench_concurrent_lpf PRIVATE -O3 -march=armv8-a+simd)

    # Cycle 3 M4''': concurrent MC.
    add_executable(bench_concurrent_mc
        tests/bench_concurrent_mc.c
        ${FFASM_MC_SOURCES}
        ${FFC_MC_SOURCES}
    )
    add_dependencies(bench_concurrent_mc daedalus_shaders)
    target_link_libraries(bench_concurrent_mc PRIVATE v3d_runner Vulkan::Vulkan pthread)
    target_compile_options(bench_concurrent_mc PRIVATE -O3 -march=armv8-a+simd)

    # Cycle 4 M4'''': concurrent LPF wd=8.
    add_executable(bench_concurrent_lpf8
        tests/bench_concurrent_lpf8.c
        ${FFASM_LPF_SOURCES}
    )
    add_dependencies(bench_concurrent_lpf8 daedalus_shaders)
    target_link_libraries(bench_concurrent_lpf8 PRIVATE v3d_runner Vulkan::Vulkan pthread)
    target_compile_options(bench_concurrent_lpf8 PRIVATE -O3 -march=armv8-a+simd)

    # Issue 003: mixed-kernel M4 bench (NEON-N kernel A + QPU kernel B).
    # Links all FFmpeg + dav1d NEON sources we have (cycles 1-8).
    add_executable(bench_concurrent_mixed
        tests/bench_concurrent_mixed.c
        ${FFASM_SOURCES}
        ${FFASM_LPF_SOURCES}
        ${FFASM_MC_SOURCES}
        ${FFC_MC_SOURCES}
        ${FFASM_H264DSP_SOURCES}
        ${DAV1D_CDEF_ASM_SOURCES}
        ${DAV1D_CDEF_C_SOURCES}
    )
    add_dependencies(bench_concurrent_mixed daedalus_shaders)
    target_link_libraries(bench_concurrent_mixed PRIVATE v3d_runner Vulkan::Vulkan pthread)
    target_compile_options(bench_concurrent_mixed PRIVATE -O3 -march=armv8-a+simd)
endif()

# ---- Summary ----------------------------------------------------------------

message(STATUS "daedalus-fourier build configured for ${CMAKE_SYSTEM_PROCESSOR}")
message(STATUS " FFmpeg snapshot: ${FFSNAP}")
message(STATUS " Build type: ${CMAKE_BUILD_TYPE}")
# message() runs at configure time and does not evaluate generator
# expressions, so branch on the option instead of using $<BOOL:...>.
if (DAEDALUS_BUILD_VULKAN)
    message(STATUS " Targets: bench_neon_idct; bench_vulkan_dispatch")
else()
    message(STATUS " Targets: bench_neon_idct")
endif()