phase1: substrate selector API + cross-substrate bit-exact ctest
Surfaces daedalus-fourier's substrate-override capability at the
decoder boundary. Lets tests run on CPU-only hosts (CI runners,
x86 dev boxes) AND cross-checks V3D shader output against NEON
reference on hosts that have both.
API additions (pre-0.1 ABI, additive):
  - daedalus_decoder_substrate enum { AUTO, CPU, QPU }
    (mirrors daedalus_substrate; isolated for ABI reasons).
  - daedalus_decoder_set_substrate(dec, sub) setter, same
    mid-frame-change restrictions as set_output_format.
  - Default remains AUTO, the only sensible choice for production.
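A minimal sketch of how a setter with these semantics might validate its input. Only the enum mirrors the header added in this commit; the `sketch_decoder` struct, its `in_frame` flag, and the 0 / -1 return codes are assumptions for illustration, since the real decoder state and error values are not shown here.

```c
#include <stddef.h>

/* Mirrors the enum added to the public header in this commit. */
typedef enum {
    DAEDALUS_DECODER_SUBSTRATE_AUTO = 0,
    DAEDALUS_DECODER_SUBSTRATE_CPU = 1,
    DAEDALUS_DECODER_SUBSTRATE_QPU = 2,
} daedalus_decoder_substrate;

/* HYPOTHETICAL decoder state; the real struct is opaque and not shown. */
typedef struct {
    int in_frame;                         /* nonzero while a frame is open */
    daedalus_decoder_substrate substrate; /* used by subsequent flushes */
} sketch_decoder;

/* Returns 0 on success, -1 on rejection (both codes are assumptions). */
static int sketch_set_substrate(sketch_decoder *dec,
                                daedalus_decoder_substrate sub)
{
    if (dec == NULL)
        return -1;
    if (dec->in_frame)
        return -1; /* assumed form of the mid-frame-change restriction */

    switch (sub) {
    case DAEDALUS_DECODER_SUBSTRATE_AUTO:
    case DAEDALUS_DECODER_SUBSTRATE_CPU:
    case DAEDALUS_DECODER_SUBSTRATE_QPU:
        dec->substrate = sub;
        return 0;
    default:
        return -1; /* bogus value: leave the current substrate untouched */
    }
}
```

This is the shape the "valid + bogus" smoke checks would exercise: a known value is accepted and stored, an out-of-range value is rejected without changing state.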
Internal:
  - flush_frame now calls daedalus_dispatch_h264_idct{4,8} with an
    explicit substrate instead of daedalus_recipe_dispatch_*. The
    decoder-level enum is mapped to the daedalus-fourier one via a
    small map_substrate() helper. No perf delta on AUTO (the recipe
    layer was doing the same dispatch under the hood).
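For illustration, a helper like map_substrate() could look roughly like the sketch below. The `sketch_fourier_substrate` enum is a stand-in for daedalus-fourier's `daedalus_substrate`, whose real enumerator names and values are not shown in this commit; the function name and the -1 error code are likewise hypothetical.

```c
/* Mirrors the enum added to the decoder's public header in this commit. */
typedef enum {
    DAEDALUS_DECODER_SUBSTRATE_AUTO = 0,
    DAEDALUS_DECODER_SUBSTRATE_CPU = 1,
    DAEDALUS_DECODER_SUBSTRATE_QPU = 2,
} daedalus_decoder_substrate;

/* STAND-IN for daedalus-fourier's daedalus_substrate (names and values
 * here are assumptions, deliberately different from the decoder enum). */
typedef enum {
    SKETCH_FOURIER_AUTO = 10,
    SKETCH_FOURIER_CPU = 11,
    SKETCH_FOURIER_QPU = 12,
} sketch_fourier_substrate;

/* Translate explicitly instead of casting, so the two enums can change
 * independently. Returns 0 on success, -1 for an unknown input. */
static int sketch_map_substrate(daedalus_decoder_substrate in,
                                sketch_fourier_substrate *out)
{
    switch (in) {
    case DAEDALUS_DECODER_SUBSTRATE_AUTO: *out = SKETCH_FOURIER_AUTO; return 0;
    case DAEDALUS_DECODER_SUBSTRATE_CPU:  *out = SKETCH_FOURIER_CPU;  return 0;
    case DAEDALUS_DECODER_SUBSTRATE_QPU:  *out = SKETCH_FOURIER_QPU;  return 0;
    default:                              return -1;
    }
}
```

An explicit switch rather than a cast is what makes "isolated for ABI reasons" hold: renumbering one enum cannot silently change the meaning of the other.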
Test changes:
  - test_smoke: new EXPECTs for set_substrate (valid + bogus).
  - test_idct_bitexact: new argv[4] takes "auto" (default), "cpu", or
    "qpu" to force the substrate.
  - CMakeLists.txt: new ctest entry `idct_bitexact_cpu` re-runs the
    QVGA case forcing the CPU path. Catches silent drift between
    the V3D shader and the NEON reference; both must produce
    identical output for the same coefficient input (and they do;
    see the ctest log below).
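The argv[4] handling could be sketched as below. The helper name `sketch_parse_substrate` and its -1 error code are hypothetical; only the three accepted strings and the "auto" default come from this commit.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the enum added to the public header in this commit. */
typedef enum {
    DAEDALUS_DECODER_SUBSTRATE_AUTO = 0,
    DAEDALUS_DECODER_SUBSTRATE_CPU = 1,
    DAEDALUS_DECODER_SUBSTRATE_QPU = 2,
} daedalus_decoder_substrate;

/* Parse the optional substrate argument. A NULL arg means the argument
 * was omitted and selects AUTO. Returns 0 on success, -1 on an
 * unrecognised string (in which case *out is left unchanged). */
static int sketch_parse_substrate(const char *arg,
                                  daedalus_decoder_substrate *out)
{
    if (arg == NULL || strcmp(arg, "auto") == 0) {
        *out = DAEDALUS_DECODER_SUBSTRATE_AUTO;
        return 0;
    }
    if (strcmp(arg, "cpu") == 0) {
        *out = DAEDALUS_DECODER_SUBSTRATE_CPU;
        return 0;
    }
    if (strcmp(arg, "qpu") == 0) {
        *out = DAEDALUS_DECODER_SUBSTRATE_QPU;
        return 0;
    }
    return -1;
}
```

A test main would call it as `sketch_parse_substrate(argc > 4 ? argv[4] : NULL, &sub)` and fail fast on -1, so a typo in the ctest entry cannot silently fall back to AUTO.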
Verified on hertz (Pi 5 / V3D 7.1 / daedalus-fourier 0.1.0):
    $ ctest --test-dir build --output-on-failure
        Start 1: smoke
    1/4 Test #1: smoke ............................ Passed 0.10 sec
        Start 2: idct_bitexact
    2/4 Test #2: idct_bitexact .................... Passed 0.03 sec
        Start 3: idct_bitexact_cpu
    3/4 Test #3: idct_bitexact_cpu ................ Passed 0.03 sec
        Start 4: idct_bitexact_1080p
    4/4 Test #4: idct_bitexact_1080p .............. Passed 0.06 sec
    100% tests passed, 0 tests failed out of 4
The CPU substrate produces byte-identical Y, Cb, and Cr planes against
the same C reference that the AUTO/QPU path matches. Since both paths
match one reference, the V3D shaders and the daedalus-fourier NEON
path also agree with each other.
Why we plumbed the lower-level dispatch instead of leaving recipe in
place: recipe is just a thin wrapper that calls dispatch with
AUTO. Once we needed substrate control, the wrapper became a
liability (would have required adding a parallel recipe API for each
substrate); going direct is simpler and the AUTO path is unchanged.
Coverage note: idct_bitexact_cpu runs at QVGA (300 MBs) only, not at
1080p. The CPU path's wall time scales linearly with block count, and
a 1080p CPU run takes ~0.5s on hertz: fine standalone, but slow enough
in ctest that it would tempt opt-in gating. The bit-exact content is
the same regardless of frame size; the 1080p variant exists only to
catch index-arithmetic bugs that surface above small int boundaries.
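As a worked example of the frame-size arithmetic, the sketch below counts macroblocks and 4x4 blocks per frame. It assumes 4:2:0 content (24 4x4 blocks per macroblock) and picks a 16-bit index as one example of a "small int boundary"; the commit states neither, and all function names are hypothetical.

```c
#include <stdint.h>

/* Macroblocks cover 16x16 luma samples; partial rows/columns round up. */
static uint32_t sketch_mb_count(uint32_t width, uint32_t height)
{
    uint32_t mb_w = (width + 15u) / 16u;
    uint32_t mb_h = (height + 15u) / 16u;
    return mb_w * mb_h;
}

/* ASSUMPTION: 4:2:0, so each macroblock holds 16 luma + 4 Cb + 4 Cr
 * = 24 blocks of 4x4 coefficients. */
static uint32_t sketch_block4x4_count(uint32_t width, uint32_t height)
{
    return sketch_mb_count(width, height) * 24u;
}

/* Nonzero if the frame has more 4x4 blocks than a 16-bit index can
 * address (the 16-bit width is an illustrative choice). */
static int sketch_exceeds_u16(uint32_t width, uint32_t height)
{
    return sketch_block4x4_count(width, height) > UINT16_MAX;
}
```

Under these assumptions QVGA (320x240) gives 300 macroblocks and 7,200 blocks, while 1080p (1920x1080, padded to 1088 rows) gives 8,160 macroblocks and 195,840 blocks, so only the larger frame would push a 16-bit index past its range.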
@@ -99,6 +99,33 @@ typedef enum {
     DAEDALUS_DECODER_OUTPUT_RGBA = 1, /* Stage 5 opt-in */
 } daedalus_decoder_output_format;
 
+/* -------------------------------------------------------------------
+ * Substrate selector. Determines which backend daedalus-fourier
+ * dispatches the per-frame compute through.
+ *
+ * AUTO is the only sensible choice for production — it picks per the
+ * recipe table baked into daedalus-fourier (post 2026-05-23 decree:
+ * QPU when a V3D shader exists, CPU NEON otherwise). The explicit
+ * options exist for testing:
+ *
+ *   - CPU forces the dispatch onto the NEON path even when V3D7 is
+ *     available. Lets the bit-exact ctests run on hosts without a
+ *     working Vulkan/V3D stack (CI runners, dev x86 boxes via
+ *     cross-build), and lets us cross-check the V3D shader output
+ *     against the NEON reference path on hosts that DO have V3D.
+ *   - QPU is the dual — force QPU even on a CPU-preferred kernel.
+ *     Useful for benchmarking specific QPU paths in isolation.
+ *
+ * A non-AUTO selection on a host that can't satisfy it
+ * (DAEDALUS_DECODER_SUBSTRATE_QPU on an x86 dev box) propagates a
+ * dispatch failure back through flush_frame as -3.
+ * ----------------------------------------------------------------- */
+typedef enum {
+    DAEDALUS_DECODER_SUBSTRATE_AUTO = 0,
+    DAEDALUS_DECODER_SUBSTRATE_CPU = 1,
+    DAEDALUS_DECODER_SUBSTRATE_QPU = 2,
+} daedalus_decoder_substrate;
+
 /* -------------------------------------------------------------------
  * Lifecycle
  * ----------------------------------------------------------------- */
@@ -128,6 +155,12 @@ void daedalus_decoder_destroy(daedalus_decoder *dec);
 int daedalus_decoder_set_output_format(daedalus_decoder *dec,
                                        daedalus_decoder_output_format fmt);
 
+/* Override the dispatch substrate for subsequent flush_frame calls.
+ * Default is AUTO. Same mid-frame-change restriction as
+ * set_output_format. */
+int daedalus_decoder_set_substrate(daedalus_decoder *dec,
+                                   daedalus_decoder_substrate sub);
+
 /* -------------------------------------------------------------------
  * Per-frame submission
  * ----------------------------------------------------------------- */