daedalus-fourier/CMakeLists.txt
claude-noether 8bc6d27ea7 h264: Intra_8x8 luma prediction (High profile) — pre-filter + 3 modes
Adds the High-profile Intra_8x8 luma primitive set.  Per H.264
§8.3.2.2, this is distinct from Intra_4x4 in two ways:

  1. REFERENCE SAMPLE PRE-FILTER (§8.3.2.2.1).  The 25 raw neighbour
     samples (16 top incl. top-right, 8 left, 1 top-left) are smoothed
     with a 1-2-1 filter BEFORE prediction.  Spec-defined boundary
     handling at the corner and at the far ends of each edge
     (t = top, l = left, tl = top-left):
       - filtered tl:              (t[0] + 2*tl + l[0] + 2) >> 2
       - filtered t[0]:            (tl + 2*t[0] + t[1] + 2) >> 2
       - filtered t[i], i = 1..14: (t[i-1] + 2*t[i] + t[i+1] + 2) >> 2
       - filtered t[15]:           (t[14] + 3*t[15] + 2) >> 2  ← 3× boundary
       - left analogous: l[0] uses tl, l[7] uses the 3× boundary.

  2. SCALE.  All 9 prediction modes operate at 8x8 on the filtered
     samples (Intra_4x4 is 4x4 on raw samples).
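
A minimal C sketch of the pre-filter formulas listed in item 1.  The
struct layout and the names ref8x8_t / filter_ref8x8 are hypothetical
(not taken from tests/h264_intra_pred_8x8_luma_ref.c), and all 25
neighbours are assumed available; the spec's variants for missing
neighbours are left out.

```c
#include <stdint.h>

/* Hypothetical layout: top[0..15] = 8 above + 8 above-right samples,
 * left[0..7] = left column, tl = top-left corner. */
typedef struct {
    uint8_t top[16];
    uint8_t left[8];
    uint8_t tl;
} ref8x8_t;

/* 1-2-1 smoothing of the raw neighbours; `in` and `out` must not alias. */
static void filter_ref8x8(const ref8x8_t *in, ref8x8_t *out)
{
    const uint8_t *t = in->top;
    const uint8_t *l = in->left;

    /* Corner sees one top and one left neighbour. */
    out->tl = (uint8_t)((t[0] + 2 * in->tl + l[0] + 2) >> 2);

    /* Top row: corner on the near side, 3x weight at the far end. */
    out->top[0] = (uint8_t)((in->tl + 2 * t[0] + t[1] + 2) >> 2);
    for (int i = 1; i <= 14; i++)
        out->top[i] = (uint8_t)((t[i - 1] + 2 * t[i] + t[i + 1] + 2) >> 2);
    out->top[15] = (uint8_t)((t[14] + 3 * t[15] + 2) >> 2);

    /* Left column: same shape, 8 samples. */
    out->left[0] = (uint8_t)((in->tl + 2 * l[0] + l[1] + 2) >> 2);
    for (int i = 1; i <= 6; i++)
        out->left[i] = (uint8_t)((l[i - 1] + 2 * l[i] + l[i + 1] + 2) >> 2);
    out->left[7] = (uint8_t)((l[6] + 3 * l[7] + 2) >> 2);
}
```

Writing into a separate output struct keeps every tap reading raw
samples; filtering in place would feed already-smoothed values into
the next tap.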

This PR ships the pre-filter + the 3 simple modes (V, H, DC):

  - Mode 0 Vertical (§8.3.2.2.2): pred[r,c] = filt_top[c]
  - Mode 1 Horizontal (§8.3.2.2.3): pred[r,c] = filt_left[r]
  - Mode 2 DC (§8.3.2.2.4): pred[r,c] = (sum(filt_top[0..7]) +
                             sum(filt_left[0..7]) + 8) >> 4, broadcast
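
The three formulas above could be sketched in C as follows.  The
function names and the pred[8][8] row-major layout are assumptions made
for illustration, and the DC variant covers only the case written
above, where both the top row and the left column are available.

```c
#include <stdint.h>

/* ft = filtered top row, fl = filtered left column; only the first
 * 8 entries of each are read.  pred is the 8x8 output block. */
static void pred8x8_vertical(const uint8_t *ft, uint8_t pred[8][8])
{
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            pred[r][c] = ft[c];          /* copy the top row downwards */
}

static void pred8x8_horizontal(const uint8_t *fl, uint8_t pred[8][8])
{
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            pred[r][c] = fl[r];          /* copy the left column rightwards */
}

static void pred8x8_dc(const uint8_t *ft, const uint8_t *fl,
                       uint8_t pred[8][8])
{
    int sum = 8;                         /* rounding term for >> 4 */
    for (int i = 0; i < 8; i++)
        sum += ft[i] + fl[i];
    uint8_t dc = (uint8_t)(sum >> 4);    /* mean of the 16 samples */

    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            pred[r][c] = dc;
}
```

With ft = 0..7 and fl = 8..15 the DC value is (28 + 92 + 8) >> 4 = 8.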

The 6 directional modes (DDL, DDR, VR, HD, VL, HU at 8x8 per
§8.3.2.2.5..§8.3.2.2.10) follow in a separate PR.  They use the
same filtered samples; only the per-cell formula differs.

Test design (tests/test_intra_pred_8x8_luma.c):

  - 3 uniform-context tests, one per mode (sanity).
  - 2 gradient tests that exercise the pre-filter's interior and
    boundary cases:
      * Vertical with top = 0..15: the interior taps give
        (4c + 2) >> 2 = c, so the ramp passes through the 1-2-1
        filter unchanged and filtered top[c] = c for c in 0..7
        (the boundary formulas check out arithmetically too).
        Test expects pred[r,c] = c.
      * Horizontal with left = 0..7: same arithmetic chain on the
        left column, including the 3× boundary at l[7]
        ((6 + 21 + 2) >> 2 = 7).  Test expects pred[r,c] = r.
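
The gradient arithmetic can be checked mechanically.  The helper below
is a hypothetical sketch, not the repository's test: it applies the
edge formulas to a ramp 0..n-1 and reports whether the output equals
the input.  The corner value tl is a parameter because the commit
message does not state what the test uses; the near-end tap
(tl + 2*0 + 1 + 2) >> 2 is 0 only when tl = 0.

```c
#include <stdint.h>

/* Returns 1 if the ramp 0..n-1 passes through the edge filter
 * unchanged (n = 16 for the top row, n = 8 for the left column). */
static int ramp_is_fixed_point(int n, int tl)
{
    uint8_t raw[16], filt[16];

    if (n < 2 || n > 16)
        return 0;
    for (int i = 0; i < n; i++)
        raw[i] = (uint8_t)i;

    /* Near end: the corner sample stands in for raw[-1]. */
    filt[0] = (uint8_t)((tl + 2 * raw[0] + raw[1] + 2) >> 2);

    /* Interior: (i-1) + 2i + (i+1) + 2 = 4i + 2, and (4i + 2) >> 2 = i. */
    for (int i = 1; i < n - 1; i++)
        filt[i] = (uint8_t)((raw[i - 1] + 2 * raw[i] + raw[i + 1] + 2) >> 2);

    /* Far end: (n-2) + 3(n-1) + 2 = 4(n-1) + 1, and that >> 2 = n-1. */
    filt[n - 1] = (uint8_t)((raw[n - 2] + 3 * raw[n - 1] + 2) >> 2);

    for (int i = 0; i < n; i++)
        if (filt[i] != raw[i])
            return 0;
    return 1;
}
```

ramp_is_fixed_point(16, 0) and ramp_is_fixed_point(8, 0) both hold;
with tl = 1 the first sample becomes 1 and the check fails.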

Verified on hertz:

  $ ./build/test_intra_pred_8x8_luma
    Vertical (mode 0, uniform top) PASS
    Horizontal (mode 1, uniform left) PASS
    DC (mode 2, uniform)           PASS
    Vertical (mode 0, gradient)    PASS (filtered gradient)
    Horizontal (mode 1, gradient)  PASS (filtered gradient)

  ALL Intra_8x8 luma PASS (3 modes — V, H, DC)

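The pre-filter being right first try is meaningful: the far-end
samples use a 3× weight rather than 2× (filtered top[15] = (t[14] +
3*t[15] + 2) >> 2), which is easy to forget when transcribing.  The
Horizontal gradient test would surface a wrong l[7] weight
immediately (2× gives 5, not 7).  Filtered top[15] is not read by
V, H, or DC, so these tests do not observe that end; it is covered
by the arithmetic check until the directional modes land.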
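
Why the far-end weight has to be 3×: the taps must sum to 4 for the
>> 2 normalisation to hold, and with only one neighbour present the
missing tap is folded into the centre (1 + 3 = 4).  A small worked
sketch, with a hypothetical helper name and the weight as a parameter
so both variants can be compared:

```c
/* Far-end tap of an edge: (prev + w*last + 2) >> 2.
 * w = 3 is the spec formula; w = 2 is the transcription mistake. */
static int edge_end_filter(int prev, int last, int w)
{
    return (prev + w * last + 2) >> 2;
}
```

On the gradients, edge_end_filter(14, 15, 3) is 15 and
edge_end_filter(6, 7, 3) is 7, while the 2× variant yields 11 and 5.
On a uniform edge of 100 the 2× variant yields 75, so a sample that
is actually read by a mode would fail even the uniform tests.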

Combined intra-prediction primitive coverage after this PR:
  Intra_4x4 luma   ✓ (9 modes, PR #12)
  Intra_16x16 luma ✓ (4 modes, PR #13)
  Intra_8x8 chroma ✓ (4 modes, PR #14)
  Intra_8x8 luma   △ (3 of 9 modes — V, H, DC ✓; DDL/DDR/VR/HD/VL/HU pending)

The 6 remaining Intra_8x8 luma directional modes are spec-mechanical
follow-ups; each is a ~30-line formula per §8.3.2.2.5+.
2026-05-25 09:35:49 +02:00

646 lines, 23 KiB, CMake

# daedalus-fourier — Phase 3 baseline + (later) Phase 6 implementation.
#
# Builds:
#   bench_neon_idct:       NEON throughput baseline (Phase 3 M3) +
#                          bit-exact correctness gate (Phase 1 M1).
#   bench_vulkan_dispatch: Vulkan compute dispatch-overhead baseline (M5).
#
# Linkage note: bench_neon_idct statically links the vendored
# FFmpeg n7.1.3 NEON snapshot (LGPL-2.1+); see
# external/ffmpeg-snapshot/PROVENANCE.md.
cmake_minimum_required(VERSION 3.20)
project(daedalus-fourier C ASM)
set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED ON)
if (NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE Release)
endif()
if (NOT CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
  message(FATAL_ERROR
    "daedalus-fourier targets aarch64 (Pi 5 / BCM2712). "
    "Cross-compile not yet wired.")
endif()
add_compile_options(-Wall -Wextra -Wno-unused-parameter)
# ---- Vendored FFmpeg snapshot (LGPL-2.1+) -----------------------------------
set(FFSNAP ${CMAKE_SOURCE_DIR}/external/ffmpeg-snapshot)
# Assembly preamble (config.h shim + FFmpeg's asm helpers) used by the
# vendored .S files. -I flags expose:
#   - FFSNAP/                     so `#include "config.h"` finds our shim and
#                                 `#include "libavutil/aarch64/asm.S"`
#                                 resolves against the vendored copy
#   - FFSNAP/libavcodec/aarch64/  so `#include "neon.S"` finds the helper
set(FFASM_FLAGS
  -I${FFSNAP}
  -I${FFSNAP}/libavcodec/aarch64
)
# ---- Vendored dav1d snapshot (BSD-2-Clause) — cycle 5+ ----------------------
set(DAV1DSNAP ${CMAKE_SOURCE_DIR}/external/dav1d-snapshot)
# dav1d's asm preamble expects "src/arm/asm.S" and "cdef_tmpl.S" / "util.S"
# (the latter two as bare basenames from within src/arm/64/). Include paths:
set(DAV1D_ASM_FLAGS
-I${DAV1DSNAP} # for config.h shim + src/arm/asm.S
-I${DAV1DSNAP}/src/arm/64 # for util.S, cdef_tmpl.S
)
set(DAV1D_CDEF_ASM_SOURCES
${DAV1DSNAP}/src/arm/64/cdef.S
)
set(DAV1D_CDEF_C_SOURCES
${DAV1DSNAP}/src/tables_cdef_subset.c
)
set_source_files_properties(${DAV1D_CDEF_ASM_SOURCES} PROPERTIES
COMPILE_OPTIONS "${DAV1D_ASM_FLAGS}"
LANGUAGE ASM)
set(FFASM_SOURCES
${FFSNAP}/libavcodec/aarch64/vp9itxfm_neon.S
)
# Cycle 6 — H.264 IDCT 4x4 + 8x8 NEON (vendored 2026-05-18).
set(FFASM_H264IDCT_SOURCES
${FFSNAP}/libavcodec/aarch64/h264idct_neon.S
)
set_source_files_properties(${FFASM_H264IDCT_SOURCES} PROPERTIES
COMPILE_OPTIONS "${FFASM_FLAGS}"
LANGUAGE ASM)
# Cycle 2 — VP9 loop filter NEON source (vendored 2026-05-18).
set(FFASM_LPF_SOURCES
${FFSNAP}/libavcodec/aarch64/vp9lpf_neon.S
)
set_source_files_properties(${FFASM_LPF_SOURCES} PROPERTIES
COMPILE_OPTIONS "${FFASM_FLAGS}"
LANGUAGE ASM)
# Cycle 3 — VP9 MC interpolation NEON source + filter coefficient table
# (vendored 2026-05-18). The .c table provides ff_vp9_subpel_filters
# symbol which vp9mc_neon.S references via movrel.
set(FFASM_MC_SOURCES
${FFSNAP}/libavcodec/aarch64/vp9mc_neon.S
)
set(FFC_MC_SOURCES
${FFSNAP}/libavcodec/vp9_subpel_filters_table.c
)
set_source_files_properties(${FFASM_MC_SOURCES} PROPERTIES
COMPILE_OPTIONS "${FFASM_FLAGS}"
LANGUAGE ASM)
# Tell CMake/gas to preprocess .S sources.
set_source_files_properties(${FFASM_SOURCES} PROPERTIES
COMPILE_OPTIONS "${FFASM_FLAGS}"
LANGUAGE ASM)
# ---- NEON baseline microbenches --------------------------------------------
# Cycle 6 — H.264 IDCT 4x4 NEON M3 baseline bench.
add_executable(bench_neon_h264idct4
tests/bench_neon_h264idct4.c
tests/h264_idct4_ref.c
${FFASM_H264IDCT_SOURCES}
)
target_compile_options(bench_neon_h264idct4 PRIVATE -O3 -march=armv8-a+simd)
# Cycle 7 — H.264 IDCT 8x8 NEON M3 baseline bench.
add_executable(bench_neon_h264idct8
tests/bench_neon_h264idct8.c
tests/h264_idct8_ref.c
${FFASM_H264IDCT_SOURCES}
)
target_compile_options(bench_neon_h264idct8 PRIVATE -O3 -march=armv8-a+simd)
# Cycle 8 — H.264 luma vertical deblock NEON M3 baseline bench.
set(FFASM_H264DSP_SOURCES
${FFSNAP}/libavcodec/aarch64/h264dsp_neon.S
)
set_source_files_properties(${FFASM_H264DSP_SOURCES} PROPERTIES
COMPILE_OPTIONS "${FFASM_FLAGS}"
LANGUAGE ASM)
# Cycle 9 — H.264 luma qpel MC NEON.
set(FFASM_H264QPEL_SOURCES
${FFSNAP}/libavcodec/aarch64/h264qpel_neon.S
)
set_source_files_properties(${FFASM_H264QPEL_SOURCES} PROPERTIES
COMPILE_OPTIONS "${FFASM_FLAGS}"
LANGUAGE ASM)
add_executable(bench_neon_h264deblock
tests/bench_neon_h264deblock.c
tests/h264_deblock_ref.c
${FFASM_H264DSP_SOURCES}
)
target_compile_options(bench_neon_h264deblock PRIVATE -O3 -march=armv8-a+simd)
# Cycle 9 — H.264 luma qpel mc20 NEON M3 baseline.
add_executable(bench_neon_h264qpel_mc20
tests/bench_neon_h264qpel_mc20.c
tests/h264_qpel8_mc20_ref.c
${FFASM_H264QPEL_SOURCES}
)
target_compile_options(bench_neon_h264qpel_mc20 PRIVATE -O3 -march=armv8-a+simd)
add_executable(bench_neon_idct
tests/bench_neon_idct.c
tests/vp9_idct8_ref.c
${FFASM_SOURCES}
)
target_compile_options(bench_neon_idct PRIVATE -O3 -march=armv8-a+simd)
# Cycle 2 — VP9 loop filter NEON baseline.
add_executable(bench_neon_lpf
tests/bench_neon_lpf.c
tests/vp9_lpf_ref.c
${FFASM_LPF_SOURCES}
)
target_compile_options(bench_neon_lpf PRIVATE -O3 -march=armv8-a+simd)
# Cycle 3 — VP9 MC interpolation NEON baseline.
add_executable(bench_neon_mc
tests/bench_neon_mc.c
tests/vp9_mc_ref.c
${FFASM_MC_SOURCES}
${FFC_MC_SOURCES}
)
target_compile_options(bench_neon_mc PRIVATE -O3 -march=armv8-a+simd)
# Cycle 4 — VP9 LPF wd=8 NEON baseline (same vendored .S as cycle 2).
add_executable(bench_neon_lpf8
tests/bench_neon_lpf8.c
tests/vp9_lpf8_ref.c
${FFASM_LPF_SOURCES}
)
target_compile_options(bench_neon_lpf8 PRIVATE -O3 -march=armv8-a+simd)
# Cycle 5 — AV1 CDEF NEON baseline (dav1d snapshot).
add_executable(bench_neon_cdef
tests/bench_neon_cdef.c
tests/cdef_ref.c
${DAV1D_CDEF_ASM_SOURCES}
${DAV1D_CDEF_C_SOURCES}
)
target_compile_options(bench_neon_cdef PRIVATE -O3 -march=armv8-a+simd)
# bench_neon_idct doesn't need vulkan/drm — pure CPU baseline.
# ---- Vulkan dispatch-overhead microbench ------------------------------------
# Defaults to ON; pass -DDAEDALUS_BUILD_VULKAN=OFF to leave out the shader
# build and the bench targets inside this block.
option(DAEDALUS_BUILD_VULKAN "Build Vulkan compute-dispatch microbench" ON)
if (DAEDALUS_BUILD_VULKAN)
find_package(Vulkan REQUIRED)
# Compile GLSL compute shaders to SPIR-V via glslangValidator.
# The binary loads them at runtime from the build dir (cwd-relative).
find_program(GLSLANG_VALIDATOR
NAMES glslangValidator glslang
REQUIRED)
set(NOOP_SPV ${CMAKE_BINARY_DIR}/noop.spv)
add_custom_command(
OUTPUT ${NOOP_SPV}
COMMAND ${GLSLANG_VALIDATOR} -V -o ${NOOP_SPV}
${CMAKE_SOURCE_DIR}/tests/shaders/noop.comp
DEPENDS ${CMAKE_SOURCE_DIR}/tests/shaders/noop.comp
COMMENT "glslang: noop.comp -> noop.spv"
VERBATIM
)
set(IDCT8_SPV ${CMAKE_BINARY_DIR}/v3d_idct8.spv)
add_custom_command(
OUTPUT ${IDCT8_SPV}
COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
-o ${IDCT8_SPV}
${CMAKE_SOURCE_DIR}/src/v3d_idct8.comp
DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_idct8.comp
COMMENT "glslang: v3d_idct8.comp -> v3d_idct8.spv"
VERBATIM
)
set(LPF_SPV ${CMAKE_BINARY_DIR}/v3d_lpf_h_4_8.spv)
add_custom_command(
OUTPUT ${LPF_SPV}
COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
-o ${LPF_SPV}
${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_4_8.comp
DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_4_8.comp
COMMENT "glslang: v3d_lpf_h_4_8.comp -> v3d_lpf_h_4_8.spv"
VERBATIM
)
set(MC_SPV ${CMAKE_BINARY_DIR}/v3d_mc_8h.spv)
add_custom_command(
OUTPUT ${MC_SPV}
COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
-o ${MC_SPV}
${CMAKE_SOURCE_DIR}/src/v3d_mc_8h.comp
DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_mc_8h.comp
COMMENT "glslang: v3d_mc_8h.comp -> v3d_mc_8h.spv"
VERBATIM
)
set(LPF8_SPV ${CMAKE_BINARY_DIR}/v3d_lpf_h_8_8.spv)
add_custom_command(
OUTPUT ${LPF8_SPV}
COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
-o ${LPF8_SPV}
${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_8_8.comp
DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_lpf_h_8_8.comp
COMMENT "glslang: v3d_lpf_h_8_8.comp -> v3d_lpf_h_8_8.spv"
VERBATIM
)
set(CDEF_SPV ${CMAKE_BINARY_DIR}/v3d_cdef.spv)
add_custom_command(
OUTPUT ${CDEF_SPV}
COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
-o ${CDEF_SPV}
${CMAKE_SOURCE_DIR}/src/v3d_cdef.comp
DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_cdef.comp
COMMENT "glslang: v3d_cdef.comp -> v3d_cdef.spv"
VERBATIM
)
set(H264DEBLOCK_SPV ${CMAKE_BINARY_DIR}/v3d_h264deblock.spv)
add_custom_command(
OUTPUT ${H264DEBLOCK_SPV}
COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
-o ${H264DEBLOCK_SPV}
${CMAKE_SOURCE_DIR}/src/v3d_h264deblock.comp
DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264deblock.comp
COMMENT "glslang: v3d_h264deblock.comp -> v3d_h264deblock.spv"
VERBATIM
)
set(H264_IDCT4_SPV ${CMAKE_BINARY_DIR}/v3d_h264_idct4.spv)
add_custom_command(
OUTPUT ${H264_IDCT4_SPV}
COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
-o ${H264_IDCT4_SPV}
${CMAKE_SOURCE_DIR}/src/v3d_h264_idct4.comp
DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264_idct4.comp
COMMENT "glslang: v3d_h264_idct4.comp -> v3d_h264_idct4.spv"
VERBATIM
)
set(H264_IDCT8_SPV ${CMAKE_BINARY_DIR}/v3d_h264_idct8.spv)
add_custom_command(
OUTPUT ${H264_IDCT8_SPV}
COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
-o ${H264_IDCT8_SPV}
${CMAKE_SOURCE_DIR}/src/v3d_h264_idct8.comp
DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264_idct8.comp
COMMENT "glslang: v3d_h264_idct8.comp -> v3d_h264_idct8.spv"
VERBATIM
)
set(H264_QPEL_MC20_SPV ${CMAKE_BINARY_DIR}/v3d_h264_qpel_mc20.spv)
add_custom_command(
OUTPUT ${H264_QPEL_MC20_SPV}
COMMAND ${GLSLANG_VALIDATOR} -V --target-env vulkan1.3
-o ${H264_QPEL_MC20_SPV}
${CMAKE_SOURCE_DIR}/src/v3d_h264_qpel_mc20.comp
DEPENDS ${CMAKE_SOURCE_DIR}/src/v3d_h264_qpel_mc20.comp
COMMENT "glslang: v3d_h264_qpel_mc20.comp -> v3d_h264_qpel_mc20.spv"
VERBATIM
)
add_custom_target(daedalus_shaders ALL
  DEPENDS
    ${NOOP_SPV} ${IDCT8_SPV} ${LPF_SPV} ${MC_SPV} ${LPF8_SPV} ${CDEF_SPV}
    ${H264DEBLOCK_SPV} ${H264_IDCT4_SPV} ${H264_IDCT8_SPV}
    ${H264_QPEL_MC20_SPV})
# v3d_runner — reusable Vulkan plumbing.
add_library(v3d_runner STATIC src/v3d_runner.c)
target_include_directories(v3d_runner PUBLIC src)
target_link_libraries(v3d_runner PUBLIC Vulkan::Vulkan)
target_compile_options(v3d_runner PRIVATE -O2)
add_executable(bench_vulkan_dispatch tests/bench_vulkan_dispatch.c)
add_dependencies(bench_vulkan_dispatch daedalus_shaders)
target_link_libraries(bench_vulkan_dispatch PRIVATE Vulkan::Vulkan)
target_compile_options(bench_vulkan_dispatch PRIVATE -O2)
add_executable(bench_v3d_idct
tests/bench_v3d_idct.c
tests/vp9_idct8_ref.c
)
add_dependencies(bench_v3d_idct daedalus_shaders)
target_link_libraries(bench_v3d_idct PRIVATE v3d_runner Vulkan::Vulkan)
target_compile_options(bench_v3d_idct PRIVATE -O2)
# Cycle 2 — QPU LPF bench.
add_executable(bench_v3d_lpf
tests/bench_v3d_lpf.c
tests/vp9_lpf_ref.c
)
add_dependencies(bench_v3d_lpf daedalus_shaders)
target_link_libraries(bench_v3d_lpf PRIVATE v3d_runner Vulkan::Vulkan)
target_compile_options(bench_v3d_lpf PRIVATE -O2)
# Cycle 3 — QPU MC bench.
add_executable(bench_v3d_mc
tests/bench_v3d_mc.c
tests/vp9_mc_ref.c
)
add_dependencies(bench_v3d_mc daedalus_shaders)
target_link_libraries(bench_v3d_mc PRIVATE v3d_runner Vulkan::Vulkan)
target_compile_options(bench_v3d_mc PRIVATE -O2)
# Cycle 4 — QPU LPF wd=8 bench.
add_executable(bench_v3d_lpf8
tests/bench_v3d_lpf8.c
tests/vp9_lpf8_ref.c
)
add_dependencies(bench_v3d_lpf8 daedalus_shaders)
target_link_libraries(bench_v3d_lpf8 PRIVATE v3d_runner Vulkan::Vulkan)
target_compile_options(bench_v3d_lpf8 PRIVATE -O2)
# Cycle 5 — QPU CDEF bench (3-way M1 against NEON + C ref).
add_executable(bench_v3d_cdef
tests/bench_v3d_cdef.c
tests/cdef_ref.c
${DAV1D_CDEF_ASM_SOURCES}
${DAV1D_CDEF_C_SOURCES}
)
add_dependencies(bench_v3d_cdef daedalus_shaders)
target_link_libraries(bench_v3d_cdef PRIVATE v3d_runner Vulkan::Vulkan)
target_compile_options(bench_v3d_cdef PRIVATE -O2)
# Cycle 8 — QPU H.264 deblock bench (3-way).
add_executable(bench_v3d_h264deblock
tests/bench_v3d_h264deblock.c
tests/h264_deblock_ref.c
${FFASM_H264DSP_SOURCES}
)
add_dependencies(bench_v3d_h264deblock daedalus_shaders)
target_link_libraries(bench_v3d_h264deblock PRIVATE v3d_runner Vulkan::Vulkan)
target_compile_options(bench_v3d_h264deblock PRIVATE -O2)
endif()
# ---- Phase 8 — public C API library + smoke test ---------------------------
add_library(daedalus_core STATIC
src/daedalus_core.c
src/v3d_runner.c
${FFASM_SOURCES}
${FFASM_LPF_SOURCES}
${FFASM_MC_SOURCES}
${FFC_MC_SOURCES}
${FFASM_H264IDCT_SOURCES}
${FFASM_H264DSP_SOURCES}
${FFASM_H264QPEL_SOURCES}
${DAV1D_CDEF_ASM_SOURCES}
${DAV1D_CDEF_C_SOURCES}
)
target_include_directories(daedalus_core PUBLIC include)
target_include_directories(daedalus_core PRIVATE src)
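# daedalus_core compiles src/v3d_runner.c and links Vulkan unconditionally,
# so resolve the package here as well; otherwise Vulkan::Vulkan is undefined
# when DAEDALUS_BUILD_VULKAN is OFF and the block above was skipped.
if (NOT TARGET Vulkan::Vulkan)
  find_package(Vulkan REQUIRED)
endif()
target_link_libraries(daedalus_core PUBLIC Vulkan::Vulkan)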
target_compile_options(daedalus_core PRIVATE -O2)
if (DAEDALUS_BUILD_VULKAN)
add_dependencies(daedalus_core daedalus_shaders)
endif()
# ---- Install rules for sibling consumers (Phase 8 V4L2 daemon, etc.) -------
#
# Installs:
# - libdaedalus_core.a → ${CMAKE_INSTALL_LIBDIR}
# - include/daedalus.h → ${CMAKE_INSTALL_INCLUDEDIR}
# - daedalus-fourier.pc → ${CMAKE_INSTALL_LIBDIR}/pkgconfig
# - V3D SPIR-V shaders → ${CMAKE_INSTALL_DATADIR}/daedalus-fourier/shaders
# (only when DAEDALUS_BUILD_VULKAN is ON; consumers using
# daedalus_ctx_create_no_qpu() don't need them)
#
# pkg-config tells consumers what to link; the static-archive
# dependencies (Vulkan, pthread, and the vendored asm symbols)
# are surfaced through Requires.private + Libs.private so a
# consumer doing `pkg-config --libs daedalus-fourier` gets the
# right transitive link line.
include(GNUInstallDirs)
install(TARGETS daedalus_core
ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
)
install(FILES include/daedalus.h
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
)
if (DAEDALUS_BUILD_VULKAN)
install(FILES
${NOOP_SPV}
${IDCT8_SPV}
${LPF_SPV}
${MC_SPV}
${LPF8_SPV}
${CDEF_SPV}
${H264DEBLOCK_SPV}
${H264_IDCT4_SPV}
${H264_IDCT8_SPV}
${H264_QPEL_MC20_SPV}
DESTINATION ${CMAKE_INSTALL_DATADIR}/daedalus-fourier/shaders
)
endif()
# pkg-config file. Vulkan goes in Requires.private (consumer's
# pkg-config call gets it via --static). pthread + dl are needed
# by the static archive's runtime helpers.
#
# `prefix` is derived from ${pcfiledir} so the .pc is relocatable:
# pkg-config substitutes ${pcfiledir} with the directory holding the
# .pc at lookup time, and the relative path from
# <prefix>/<libdir>/pkgconfig back to <prefix> tells pkg-config the
# install prefix without baking it in. This is why
# `cmake --install build --prefix /foo` produces a .pc that correctly
# resolves `prefix=/foo` instead of baking whatever CMAKE_INSTALL_PREFIX
# was at *configure* time (default /usr/local). DESTDIR-staged
# installs work too: at runtime pkg-config sees the .pc at its real
# install path and computes the right prefix.
#
# Relative-path depth is computed from CMAKE_INSTALL_LIBDIR (and
# whatever multiarch tuple GNUInstallDirs adds) so Debian-style
# `lib/aarch64-linux-gnu/pkgconfig/...` resolves with the right number
# of `..` components. Layouts where libdir is *not* under prefix are
# not supported by this scheme; if a packager overrides libdir to an
# absolute path the relative-path machinery falls back to the absolute
# value (CMake's file(RELATIVE_PATH) prepends `..` until they meet),
# which is also relocatable but no longer prefix-agnostic.
file(RELATIVE_PATH PKGCONFIG_PCDIR_TO_PREFIX
"${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}/pkgconfig"
"${CMAKE_INSTALL_PREFIX}")
set(PKGCONFIG_OUT ${CMAKE_CURRENT_BINARY_DIR}/daedalus-fourier.pc)
file(WRITE ${PKGCONFIG_OUT}
"prefix=\${pcfiledir}/${PKGCONFIG_PCDIR_TO_PREFIX}
exec_prefix=\${prefix}
libdir=\${prefix}/${CMAKE_INSTALL_LIBDIR}
includedir=\${prefix}/${CMAKE_INSTALL_INCLUDEDIR}
shadersdir=\${prefix}/${CMAKE_INSTALL_DATADIR}/daedalus-fourier/shaders
Name: daedalus-fourier
Description: VP9/AV1/H.264 back-end kernels for VC VII (V3D 7.1) + ARM NEON
Version: 0.1.0
Libs: -L\${libdir} -ldaedalus_core
Libs.private: -lpthread -ldl -lm
Requires.private: vulkan
Cflags: -I\${includedir}
")
install(FILES ${PKGCONFIG_OUT}
DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig
)
add_executable(test_api_idct
tests/test_api_idct.c
tests/vp9_idct8_ref.c
)
target_link_libraries(test_api_idct PRIVATE daedalus_core)
target_compile_options(test_api_idct PRIVATE -O2)
add_executable(test_api_lpf
tests/test_api_lpf.c
tests/vp9_lpf_ref.c
tests/vp9_lpf8_ref.c
)
target_link_libraries(test_api_lpf PRIVATE daedalus_core)
target_compile_options(test_api_lpf PRIVATE -O2)
add_executable(test_api_h264
tests/test_api_h264.c
tests/h264_idct4_ref.c
tests/h264_idct8_ref.c
tests/h264_deblock_ref.c
tests/h264_h_loop_filter_luma_ref.c
tests/h264_chroma_loop_filter_ref.c
tests/h264_intra_loop_filter_ref.c
tests/h264_qpel8_mc20_ref.c
tests/h264_qpel8_mc02_ref.c
tests/h264_qpel8_mc22_ref.c
tests/h264_qpel8_quarter_axis_ref.c
tests/h264_qpel8_diag_ref.c
tests/h264_qpel8_avg_anchors_ref.c
tests/h264_qpel8_avg_rest_ref.c
)
target_link_libraries(test_api_h264 PRIVATE daedalus_core)
target_compile_options(test_api_h264 PRIVATE -O2)
add_executable(test_api_opportunistic_qpu tests/test_api_opportunistic_qpu.c)
target_link_libraries(test_api_opportunistic_qpu PRIVATE daedalus_core)
target_compile_options(test_api_opportunistic_qpu PRIVATE -O2)
# H.264 Intra_4x4 luma prediction (9 modes) — reference + tests.
# Pure CPU + spec-derived; no daedalus_core dependency yet (this is
# the bit-exact gate for the eventual shader / dispatch wiring).
add_executable(test_intra_pred_4x4
tests/test_intra_pred_4x4.c
tests/h264_intra_pred_4x4_ref.c
)
target_compile_options(test_intra_pred_4x4 PRIVATE -O2)
# H.264 Intra_16x16 luma prediction (4 modes: V, H, DC, Plane) —
# reference + tests. Same spec-gate role as the 4x4 sibling.
add_executable(test_intra_pred_16x16
tests/test_intra_pred_16x16.c
tests/h264_intra_pred_16x16_ref.c
)
target_compile_options(test_intra_pred_16x16 PRIVATE -O2)
# H.264 Intra_8x8 chroma prediction (4 modes: DC, H, V, Plane) —
# reference + tests. DC is per-quadrant (asymmetric); Plane uses
# slope coefficient 34 instead of luma's 5.
add_executable(test_intra_pred_chroma8x8
tests/test_intra_pred_chroma8x8.c
tests/h264_intra_pred_chroma8x8_ref.c
)
target_compile_options(test_intra_pred_chroma8x8 PRIVATE -O2)
# H.264 Intra_8x8 luma prediction (High profile, 9 modes + 1-2-1
# reference-sample pre-filter). This PR ships the pre-filter + the
# 3 simple modes (V, H, DC); the 6 directional modes follow.
add_executable(test_intra_pred_8x8_luma
tests/test_intra_pred_8x8_luma.c
tests/h264_intra_pred_8x8_luma_ref.c
)
target_compile_options(test_intra_pred_8x8_luma PRIVATE -O2)
add_executable(bench_pool_overhead tests/bench_pool_overhead.c)
target_link_libraries(bench_pool_overhead PRIVATE daedalus_core)
target_compile_options(bench_pool_overhead PRIVATE -O2)
if (DAEDALUS_BUILD_VULKAN)
# (re-open the conditional so the closing endif() below balances)
# M4 — concurrent CPU(NEON) + QPU bench. Links the FFmpeg NEON
# snapshot so we can run real NEON kernels on pinned CPU cores
# while the QPU runs its dispatch loop concurrently.
add_executable(bench_concurrent
tests/bench_concurrent.c
${FFASM_SOURCES}
)
add_dependencies(bench_concurrent daedalus_shaders)
target_link_libraries(bench_concurrent PRIVATE v3d_runner Vulkan::Vulkan pthread)
target_compile_options(bench_concurrent PRIVATE -O3 -march=armv8-a+simd)
# Cycle 2 M4'' — concurrent LPF.
add_executable(bench_concurrent_lpf
tests/bench_concurrent_lpf.c
${FFASM_LPF_SOURCES}
)
add_dependencies(bench_concurrent_lpf daedalus_shaders)
target_link_libraries(bench_concurrent_lpf PRIVATE v3d_runner Vulkan::Vulkan pthread)
target_compile_options(bench_concurrent_lpf PRIVATE -O3 -march=armv8-a+simd)
# Cycle 3 M4''' — concurrent MC.
add_executable(bench_concurrent_mc
tests/bench_concurrent_mc.c
${FFASM_MC_SOURCES}
${FFC_MC_SOURCES}
)
add_dependencies(bench_concurrent_mc daedalus_shaders)
target_link_libraries(bench_concurrent_mc PRIVATE v3d_runner Vulkan::Vulkan pthread)
target_compile_options(bench_concurrent_mc PRIVATE -O3 -march=armv8-a+simd)
# Cycle 4 M4'''' — concurrent LPF wd=8.
add_executable(bench_concurrent_lpf8
tests/bench_concurrent_lpf8.c
${FFASM_LPF_SOURCES}
)
add_dependencies(bench_concurrent_lpf8 daedalus_shaders)
target_link_libraries(bench_concurrent_lpf8 PRIVATE v3d_runner Vulkan::Vulkan pthread)
target_compile_options(bench_concurrent_lpf8 PRIVATE -O3 -march=armv8-a+simd)
# Issue 003 — mixed-kernel M4 bench (NEON-N kernel A + QPU kernel B).
# Links all FFmpeg + dav1d NEON sources we have (cycles 1-8).
add_executable(bench_concurrent_mixed
tests/bench_concurrent_mixed.c
${FFASM_SOURCES}
${FFASM_LPF_SOURCES}
${FFASM_MC_SOURCES}
${FFC_MC_SOURCES}
${FFASM_H264DSP_SOURCES}
${DAV1D_CDEF_ASM_SOURCES}
${DAV1D_CDEF_C_SOURCES}
)
add_dependencies(bench_concurrent_mixed daedalus_shaders)
target_link_libraries(bench_concurrent_mixed PRIVATE v3d_runner Vulkan::Vulkan pthread)
target_compile_options(bench_concurrent_mixed PRIVATE -O3 -march=armv8-a+simd)
endif()
# ---- Summary ----------------------------------------------------------------
message(STATUS "daedalus-fourier build configured for ${CMAKE_SYSTEM_PROCESSOR}")
message(STATUS " FFmpeg snapshot: ${FFSNAP}")
message(STATUS " Build type: ${CMAKE_BUILD_TYPE}")
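# Generator expressions are not evaluated by message(), so build the
# target list at configure time instead.
set(DAEDALUS_SUMMARY_TARGETS "bench_neon_idct")
if (DAEDALUS_BUILD_VULKAN)
  string(APPEND DAEDALUS_SUMMARY_TARGETS "; bench_vulkan_dispatch")
endif()
message(STATUS " Targets: ${DAEDALUS_SUMMARY_TARGETS}")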