045553ccaf
The existing 320x240 bit-exact test (300 MBs) is the fast inner-loop
gate, but it's small enough that index arithmetic bugs that only
surface above 16-bit boundaries would slip through. This adds a
second ctest entry that runs the same binary against a full coded
1080p frame (1920x1088, 8160 MBs):
- 4080 MBs at transform_8x8=0 → 65,280 luma 4x4 blocks
- 4080 MBs at transform_8x8=1 → 16,320 luma 8x8 blocks
- 65,280 chroma 4x4 blocks (32,640 Cb + 32,640 Cr)
- 146,880 IDCTs total across three separate dispatches (luma_4x4,
luma_8x8, chroma), each compared bit-exact against the in-test C
reference.
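The counts in the list above follow directly from the macroblock grid. A minimal C sketch of that arithmetic, assuming 4:2:0 chroma (consistent with the Cb/Cr byte totals in the test output) and using a hypothetical `count_blocks` helper and `frame_counts` struct that are not part of the test binary:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only; these names do not come from the test source. */
typedef struct {
    uint32_t mbs;          /* macroblocks in the coded frame */
    uint32_t luma_4x4;     /* 16 per MB with transform_8x8=0 */
    uint32_t luma_8x8;     /* 4 per MB with transform_8x8=1 */
    uint32_t chroma_4x4;   /* 4 Cb + 4 Cr per MB (assumes 4:2:0) */
    uint32_t idcts_total;  /* sum of the three block counts */
} frame_counts;

static frame_counts count_blocks(uint32_t width, uint32_t height,
                                 uint32_t mbs_8x8)
{
    frame_counts c;

    /* Coded dimensions must be whole macroblocks (16x16 luma samples). */
    assert(width % 16 == 0 && height % 16 == 0);
    c.mbs = (width / 16) * (height / 16);
    assert(mbs_8x8 <= c.mbs);

    c.luma_8x8 = mbs_8x8 * 4;
    c.luma_4x4 = (c.mbs - mbs_8x8) * 16;
    c.chroma_4x4 = c.mbs * 8;
    c.idcts_total = c.luma_4x4 + c.luma_8x8 + c.chroma_4x4;
    return c;
}
```

Calling `count_blocks(1920, 1088, 4080)` yields 8160 MBs, 65,280 luma 4x4 blocks, 16,320 luma 8x8 blocks, 65,280 chroma blocks, and 146,880 IDCTs in total.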
No code change to the test binary itself — it already accepted
width/height as argv[1..2]. Just a second `add_test` in
CMakeLists.txt that invokes it with `1920 1088`.
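For readers without the test source at hand, the argv handling might look roughly like the sketch below. This is an assumption about its shape, not the binary's actual code: `parse_dim`, `parse_dims`, the multiple-of-16 check, and the 16384 upper bound are all hypothetical. Only the 320x240 default is taken from the existing no-argument test entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical: parse one dimension, requiring a positive multiple of 16. */
static int parse_dim(const char *s, unsigned *out)
{
    char *end = NULL;
    unsigned long v;

    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;                      /* not a clean decimal number */
    if (v == 0 || v > 16384 || v % 16 != 0)
        return -1;                      /* 16384 is an arbitrary sanity bound */
    *out = (unsigned)v;
    return 0;
}

/* Hypothetical: no arguments selects 320x240; otherwise expect "<width> <height>".
 * Returns 0 on success, -1 on malformed input. */
static int parse_dims(int argc, char **argv, unsigned *w, unsigned *h)
{
    unsigned pw, ph;

    assert(argv != NULL && w != NULL && h != NULL);
    *w = 320;
    *h = 240;
    if (argc == 1)
        return 0;
    if (argc != 3)
        return -1;
    if (parse_dim(argv[1], &pw) != 0 || parse_dim(argv[2], &ph) != 0)
        return -1;
    *w = pw;
    *h = ph;
    return 0;
}
```

Under these assumptions `1920 1088` is accepted while `1920 1080` is rejected, since 1080 is not a whole number of macroblock rows.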
Coverage rationale:
- dst_off is uint32_t in daedalus_h264_block_meta; at 1920x1088
the max offset is ~2.1 MB (the Y plane is 2,088,960 bytes), still
well within uint32 range, but the test exercises the largest stride
math we'll see in production
(per-MB chroma offset = mb_y*8 + cb_plane_size = up to 1.06 MiB).
- flush_frame partitions 8160 MBs by transform mode → exercises the
bi4 == 4080*16 and bi8 == 4080*4 accumulators at frame scale.
- Verifies handling of the 1088 coded height (1080 displayed rows +
8 cropped rows), the trap that catches Pi 5 H.264 integrations.
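The coded-height and offset points above can be sketched in a few lines of C. The helpers below are hypothetical and assume a tightly packed 8-bit Y plane with stride equal to width; the real layout behind `dst_off` may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Round a displayed dimension up to whole 16-pixel macroblocks. */
static uint32_t coded_dim(uint32_t displayed)
{
    return (displayed + 15u) & ~15u;
}

/* Bytes in one 8-bit sample plane. */
static uint32_t plane_bytes(uint32_t width, uint32_t height)
{
    return width * height;
}

/* Byte offset of the top-left luma sample of macroblock (mb_x, mb_y).
 * Assumed layout: packed Y plane, stride given in bytes. */
static uint32_t luma_mb_offset(uint32_t mb_x, uint32_t mb_y, uint32_t stride)
{
    assert(stride % 16 == 0 && mb_x < stride / 16);
    return mb_y * 16u * stride + mb_x * 16u;
}

/* Nonzero if the offset survives a round trip through uint16_t. */
static int fits_u16(uint32_t off)
{
    return (uint32_t)(uint16_t)off == off;
}
```

`coded_dim(1080)` gives 1088, and `plane_bytes(1920, 1088)` and `plane_bytes(960, 544)` reproduce the 2,088,960 and 522,240 byte totals in the test output. For the last macroblock of the frame, `luma_mb_offset(119, 67, 1920)` is 2,060,144, which `fits_u16` rejects: a 16-bit intermediate would silently wrap it.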
Verified on hertz (Pi 5 / V3D 7.1 / daedalus-fourier 0.1.0):
    $ ctest --test-dir build --output-on-failure
    Start 1: smoke
    1/3 Test #1: smoke ............................ Passed 0.09 sec
    Start 2: idct_bitexact
    2/3 Test #2: idct_bitexact .................... Passed 0.03 sec
    Start 3: idct_bitexact_1080p
    3/3 Test #3: idct_bitexact_1080p .............. Passed 0.06 sec
    100% tests passed, 0 tests failed out of 3

    $ ./build/test_idct_bitexact 1920 1088
    test_idct_bitexact: 1920x1088 (8160 MBs), seed=0xfeedface5a5a5a5a
    MB mix: 4080 4x4 MBs, 4080 8x8 MBs
    Y bytes total: 2088960
    Y bytes diff: 0 (0.0000%)
    Cb bytes total: 522240 diff: 0 (0.0000%)
    Cr bytes total: 522240 diff: 0 (0.0000%)
    BIT-EXACT PASS (Y + Cb + Cr)
(0.06 s with a warm shader pool; ~0.2 s cold via the standalone
invocation above. In ctest the 1080p run comes after smoke, so the
pool is already primed by the time it runs.)
142 lines
5.2 KiB
CMake
# SPDX-License-Identifier: BSD-2-Clause
#
# daedalus-decoder — frame-level GPU H.264 decoder for V3D7 (Pi 5).
# Phase 1 scaffold; see DESIGN.md for architecture.
#
# Build dependencies:
#   - daedalus-fourier ≥ 0.1.0 (kernel pack, V3D primitives + recipe layer)
#     resolved via pkg-config; install via the daedalus-fourier upstream
#     `cmake --install` rule (PR #5 made the .pc relocatable, so any
#     install prefix works as long as $PKG_CONFIG_PATH is set).
#   - Vulkan headers + libvulkan (pulled in transitively via
#     daedalus-fourier, listed here explicitly for the link order).
#
# Build:
#   cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
#   cmake --build build
#   ctest --test-dir build

cmake_minimum_required(VERSION 3.20)
project(daedalus-decoder
    VERSION 0.0.1
    DESCRIPTION "Frame-level GPU H.264 decoder for Raspberry Pi 5 / V3D7"
    LANGUAGES C)

set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED ON)
set(CMAKE_C_EXTENSIONS OFF)

if(NOT CMAKE_BUILD_TYPE)
    set(CMAKE_BUILD_TYPE Release)
endif()

# Pi 5 is the only supported target. Other aarch64 SoCs (Pi 4 V3D4,
# RK3588 Mali, …) might work but would need explicit substrate +
# shader-pack validation per the daedalus-fourier architecture
# backlog. Don't pretend to support what we haven't validated.
if(NOT CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64")
    message(WARNING
        "daedalus-decoder is designed for aarch64 (Pi 5 BCM2712 / V3D7). "
        "Build will proceed but is unlikely to function.")
endif()

add_compile_options(-Wall -Wextra -Wno-unused-parameter)

# ---- Dependencies --------------------------------------------------

find_package(PkgConfig REQUIRED)

# daedalus-fourier — find_package via pkg-config per the Phase 1
# decision §9.6. Minimum version 0.1.0 (the cycle 6-9 shaders + pool
# + recipe-flip baseline). PKG_CONFIG_PATH should point at the
# directory holding daedalus-fourier.pc (e.g. /usr/local/lib/pkgconfig
# or a custom install prefix).
pkg_check_modules(DAEDALUS_FOURIER REQUIRED daedalus-fourier>=0.1.0)

# Vulkan — daedalus-fourier already depends on this; we add it
# explicitly so the link order stays correct (daedalus-fourier static
# archive contains undefined vk* symbols that the loader resolves).
find_package(Vulkan REQUIRED)

# ---- Version string baked into the library ------------------------

# git rev tagged onto the version string for traceability; degrades
# gracefully to bare semver if git isn't available.
execute_process(
    COMMAND git -C ${CMAKE_CURRENT_SOURCE_DIR} rev-parse --short=7 HEAD
    OUTPUT_VARIABLE DAEDALUS_DECODER_GITREV
    OUTPUT_STRIP_TRAILING_WHITESPACE
    ERROR_QUIET)
if(DAEDALUS_DECODER_GITREV)
    set(DAEDALUS_DECODER_VERSION "${PROJECT_VERSION}+g${DAEDALUS_DECODER_GITREV}")
else()
    set(DAEDALUS_DECODER_VERSION "${PROJECT_VERSION}")
endif()
message(STATUS "daedalus-decoder version: ${DAEDALUS_DECODER_VERSION}")

# ---- Library ------------------------------------------------------

add_library(daedalus_decoder STATIC
    src/daedalus_decoder.c
)
target_include_directories(daedalus_decoder
    PUBLIC
        $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
        $<INSTALL_INTERFACE:include>
    PRIVATE
        src
        ${DAEDALUS_FOURIER_INCLUDE_DIRS}
)
target_link_directories(daedalus_decoder
    PUBLIC
        ${DAEDALUS_FOURIER_LIBRARY_DIRS}
)
target_link_libraries(daedalus_decoder
    PUBLIC
        # Order matters: daedalus-fourier static archive references
        # vulkan symbols; the loader needs daedalus-fourier first then
        # vulkan to resolve them.
        ${DAEDALUS_FOURIER_LIBRARIES}
        Vulkan::Vulkan
)
target_compile_definitions(daedalus_decoder
    PRIVATE
        DAEDALUS_DECODER_VERSION="${DAEDALUS_DECODER_VERSION}"
)
target_compile_options(daedalus_decoder PRIVATE -O2)

# ---- Smoke test ---------------------------------------------------

enable_testing()

add_executable(test_smoke tests/test_smoke.c)
target_link_libraries(test_smoke PRIVATE daedalus_decoder)
target_compile_options(test_smoke PRIVATE -O2)
add_test(NAME smoke COMMAND test_smoke)

add_executable(test_idct_bitexact tests/test_idct_bitexact.c)
target_link_libraries(test_idct_bitexact PRIVATE daedalus_decoder)
target_compile_options(test_idct_bitexact PRIVATE -O2)

# 320x240 QVGA — fast inner-loop test (300 MBs, sub-second).
add_test(NAME idct_bitexact COMMAND test_idct_bitexact)

# 1920x1088 1080p — deployment-scale test (8160 MBs, ~0.25 s on hertz).
# Validates the per-MB block index + pixel offset math at full coded
# height (1088, not 1080 — see daedalus_decoder.h on H.264 coded vs
# displayed dims). Cheap enough to run unconditionally; if it ever
# gets slow we'll split into a CTest LABEL for opt-in.
add_test(NAME idct_bitexact_1080p COMMAND test_idct_bitexact 1920 1088)

# ---- Install ------------------------------------------------------
#
# Library + public header. Stage 2/3 will add a pkg-config file and
# CMake config exports once the API stabilises; pre-0.1 the scaffold
# install just gives the static archive a home.

include(GNUInstallDirs)
install(TARGETS daedalus_decoder
    ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR})
install(FILES include/daedalus_decoder.h
    DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})