Files
daedalus-decoder/CMakeLists.txt
T
claude-noether 352373a9be phase1: add IDCT-layer throughput benchmark (bench_flush_frame)
Establishes a steady-state baseline for the Path C frame-level
dispatch architecture.  Times daedalus_decoder_flush_frame at a
configurable coded resolution with random coefficients, reporting
per-frame latency stats and fps.

NOT a ctest — produces wall-time numbers, doesn't pass/fail.  Run
manually:

  ./build/bench_flush_frame [width] [height] [iters] [warmup]

Defaults to 1920x1088, 100 iters, 5-frame warmup (excludes shader-
pipeline-pool materialisation cost from the timing average).

Measured on hertz (Pi 5 / V3D 7.1 / daedalus-fourier 0.1.0):

  $ ./build/bench_flush_frame
  bench_flush_frame: 1920x1088 (8160 MBs), 100 iters (5 warmup)
  ctx has_qpu=1

  flush_frame (post-warmup, 95 samples):
    min    =   9.699 ms
    median =   9.905 ms
    mean   =  10.014 ms
    p99    =  12.011 ms
    max    =  12.011 ms

  throughput (steady-state, IDCT only — NO intra/MC/deblock):
    mean   = 99.9 fps
    median = 101.0 fps
    target = 30.0 fps (project_30fps_floor_is_fine.md)
    status = MEETS target (with 3.4x headroom for intra/MC/deblock)

Interpretation:

  Per-frame work measured:
    - CPU partition + flat-pack of 8160 MBs into luma_4x4, luma_8x8,
      chroma meta+coeffs buffers
    - 3 GPU dispatches (luma 4x4, luma 8x8, chroma 4x4) with their
      respective vkQueueSubmit + vkQueueWaitIdle round-trips
    - CPU NV12 interleave (chroma planar → UV)
    - calloc/free for scratch_y / coeffs / meta buffers

  Doing all of that in ~10 ms means the architecture pays back the
  Path C design bet: ONE Vulkan submit per dispatch (cycle 8b buffer
  pool keeps amortised cost low) is the right granularity.  The
  per-block dispatch fail-mode that motivated Path C (~6500 ms/frame
  from the libavcodec substitution arc) is 600x slower than this.

  3.4x headroom from 101 fps → 30 fps target gives a budget of
  ~23 ms/frame for the remaining decode work (intra prediction
  wavefront, MC, deblock).  Each of those needs to fit inside that
  budget at steady state for the end-to-end decoder to hit 30 fps
  at 1080p.

  p99 latency 12 ms means even worst-case frames clear the 33-ms
  deadline (30 fps) easily; tail latency isn't a concern at this
  stage.

What this number does NOT validate:
  - Intra prediction shader dispatch overhead (likely per-anti-diagonal
    or per-MB-wavefront; dispatch count goes up)
  - MC dispatch (per qpel-block; up to several per MB)
  - Deblock dispatch (4 edges per MB; per-edge meta entries)
  - Real H.264 streams (random coeffs ≠ real residuals; perf shape
    of memory access is content-independent, but cache pressure may
    differ at scale).
2026-05-24 22:53:49 +02:00

153 lines
5.6 KiB
CMake

# SPDX-License-Identifier: BSD-2-Clause
#
# daedalus-decoder — frame-level GPU H.264 decoder for V3D7 (Pi 5).
# Phase 1 scaffold; see DESIGN.md for architecture.
#
# Build dependencies:
# - daedalus-fourier ≥ 0.1.0 (kernel pack, V3D primitives + recipe layer)
# resolved via pkg-config; install via the daedalus-fourier upstream
# `cmake --install` rule (PR #5 made the .pc relocatable, so any
# install prefix works as long as $PKG_CONFIG_PATH is set).
# - Vulkan headers + libvulkan (pulled in transitively via
# daedalus-fourier, listed here explicitly for the link order).
#
# Build:
# cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
# cmake --build build
# ctest --test-dir build
cmake_minimum_required(VERSION 3.20)
project(daedalus-decoder
VERSION 0.0.1
DESCRIPTION "Frame-level GPU H.264 decoder for Raspberry Pi 5 / V3D7"
LANGUAGES C)
set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED ON)
set(CMAKE_C_EXTENSIONS OFF)
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release)
endif()
# Pi 5 is the only supported target. Other aarch64 SoCs (Pi 4 V3D4,
# RK3588 Mali, …) might work but would need explicit substrate +
# shader-pack validation per the daedalus-fourier architecture
# backlog. Don't pretend to support what we haven't validated.
if(NOT CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
message(WARNING
"daedalus-decoder is designed for aarch64 (Pi 5 BCM2712 / V3D7). "
"Build will proceed but is unlikely to function.")
endif()
add_compile_options(-Wall -Wextra -Wno-unused-parameter)
# ---- Dependencies --------------------------------------------------
find_package(PkgConfig REQUIRED)
# daedalus-fourier — find_package via pkg-config per the Phase 1
# decision §9.6. Minimum version 0.1.0 (the cycle 6-9 shaders + pool
# + recipe-flip baseline). PKG_CONFIG_PATH should point at the
# directory holding daedalus-fourier.pc (e.g. /usr/local/lib/pkgconfig
# or a custom install prefix).
pkg_check_modules(DAEDALUS_FOURIER REQUIRED daedalus-fourier>=0.1.0)
# Vulkan — daedalus-fourier already depends on this; we add it
# explicitly so the link order stays correct (daedalus-fourier static
# archive contains undefined vk* symbols that the loader resolves).
find_package(Vulkan REQUIRED)
# ---- Version string baked into the library ------------------------
# git rev tagged onto the version string for traceability; degrades
# gracefully to bare semver if git isn't available.
execute_process(
COMMAND git -C ${CMAKE_CURRENT_SOURCE_DIR} rev-parse --short=7 HEAD
OUTPUT_VARIABLE DAEDALUS_DECODER_GITREV
OUTPUT_STRIP_TRAILING_WHITESPACE
ERROR_QUIET)
if(DAEDALUS_DECODER_GITREV)
set(DAEDALUS_DECODER_VERSION "${PROJECT_VERSION}+g${DAEDALUS_DECODER_GITREV}")
else()
set(DAEDALUS_DECODER_VERSION "${PROJECT_VERSION}")
endif()
message(STATUS "daedalus-decoder version: ${DAEDALUS_DECODER_VERSION}")
# ---- Library ------------------------------------------------------
add_library(daedalus_decoder STATIC
src/daedalus_decoder.c
)
target_include_directories(daedalus_decoder
PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
$<INSTALL_INTERFACE:include>
PRIVATE
src
${DAEDALUS_FOURIER_INCLUDE_DIRS}
)
target_link_directories(daedalus_decoder
PUBLIC
${DAEDALUS_FOURIER_LIBRARY_DIRS}
)
target_link_libraries(daedalus_decoder
PUBLIC
# Order matters: daedalus-fourier static archive references
# vulkan symbols; the loader needs daedalus-fourier first then
# vulkan to resolve them.
${DAEDALUS_FOURIER_LIBRARIES}
Vulkan::Vulkan
)
target_compile_definitions(daedalus_decoder
PRIVATE
DAEDALUS_DECODER_VERSION="${DAEDALUS_DECODER_VERSION}"
)
target_compile_options(daedalus_decoder PRIVATE -O2)
# ---- Smoke test ---------------------------------------------------
enable_testing()
add_executable(test_smoke tests/test_smoke.c)
target_link_libraries(test_smoke PRIVATE daedalus_decoder)
target_compile_options(test_smoke PRIVATE -O2)
add_test(NAME smoke COMMAND test_smoke)
add_executable(test_idct_bitexact tests/test_idct_bitexact.c)
target_link_libraries(test_idct_bitexact PRIVATE daedalus_decoder)
target_compile_options(test_idct_bitexact PRIVATE -O2)
# 320x240 QVGA — fast inner-loop test (300 MBs, sub-second).
add_test(NAME idct_bitexact COMMAND test_idct_bitexact)
# 1920x1088 1080p — deployment-scale test (8160 MBs, ~0.25 s on hertz).
# Validates the per-MB block index + pixel offset math at full coded
# height (1088, not 1080 — see daedalus_decoder.h on H.264 coded vs
# displayed dims). Cheap enough to run unconditionally; if it ever
# gets slow we'll split into a CTest LABEL for opt-in.
add_test(NAME idct_bitexact_1080p COMMAND test_idct_bitexact 1920 1088)
# ---- Benchmarks (not gated by ctest) ------------------------------
#
# Build-time only; user runs them by hand when checking perf. Adding
# them as ctest would make every CI run slow and the numbers would
# get drowned in pass/fail noise. See the header of each .c for what
# they measure.
add_executable(bench_flush_frame tests/bench_flush_frame.c)
target_link_libraries(bench_flush_frame PRIVATE daedalus_decoder)
target_compile_options(bench_flush_frame PRIVATE -O2)
# ---- Install ------------------------------------------------------
#
# Library + public header. Stage 2/3 will add a pkg-config file and
# CMake config exports once the API stabilises; pre-0.1 the scaffold
# install just gives the static archive a home.
include(GNUInstallDirs)
install(TARGETS daedalus_decoder
ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR})
install(FILES include/daedalus_decoder.h
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})