phase1: substrate selector API + cross-substrate bit-exact ctest

Surfaces daedalus-fourier's substrate-override capability at the
decoder boundary. Lets tests run on CPU-only hosts (CI runners,
x86 dev boxes) AND cross-checks V3D shader output against the NEON
reference on hosts that have both.

API additions (pre-0.1 ABI, additive):
  - daedalus_decoder_substrate enum { AUTO, CPU, QPU }
    (mirrors daedalus_substrate; isolated for ABI reasons).
  - daedalus_decoder_set_substrate(dec, sub) setter, same
    mid-frame-change restrictions as set_output_format.
  - Default remains AUTO, the only sensible choice for production.
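
A minimal sketch of the setter contract described above, for illustration only. The enumerator names come from this commit; the numeric values, the `sketch_decoder` struct, its `frame_open` flag, and the choice of -1 for a mid-frame change are assumptions, not the real implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Enumerator names are from the commit; the numeric values are assumed. */
typedef enum {
    DAEDALUS_DECODER_SUBSTRATE_AUTO = 0,
    DAEDALUS_DECODER_SUBSTRATE_CPU  = 1,
    DAEDALUS_DECODER_SUBSTRATE_QPU  = 2
} daedalus_decoder_substrate;

/* Hypothetical stand-in for the opaque decoder context. */
struct sketch_decoder {
    daedalus_decoder_substrate substrate;
    int frame_open; /* assumed flag: nonzero while a frame is being built */
};

/* Returns 0 on success, -1 on NULL, an unknown value, or a mid-frame change. */
static int sketch_set_substrate(struct sketch_decoder *dec,
                                daedalus_decoder_substrate sub)
{
    if (dec == NULL)
        return -1;
    switch (sub) {
    case DAEDALUS_DECODER_SUBSTRATE_AUTO:
    case DAEDALUS_DECODER_SUBSTRATE_CPU:
    case DAEDALUS_DECODER_SUBSTRATE_QPU:
        break;
    default:
        return -1; /* e.g. the (daedalus_decoder_substrate) 99 case in test_smoke */
    }
    if (dec->frame_open)
        return -1; /* assumed: no substrate change while a frame is in flight */
    dec->substrate = sub;
    return 0;
}

/* Usage check mirroring the EXPECTs added to test_smoke. */
static void sketch_selfcheck(void)
{
    struct sketch_decoder dec = { DAEDALUS_DECODER_SUBSTRATE_AUTO, 0 };
    assert(sketch_set_substrate(&dec, DAEDALUS_DECODER_SUBSTRATE_CPU) == 0);
    assert(sketch_set_substrate(&dec, (daedalus_decoder_substrate)99) == -1);
    assert(dec.substrate == DAEDALUS_DECODER_SUBSTRATE_CPU);
}
```

Validating with a `switch` rather than a range check keeps the sketch correct even if the enumerator values are not contiguous.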

Internal:
  - flush_frame now calls daedalus_dispatch_h264_idct{4,8} with an
    explicit substrate instead of daedalus_recipe_dispatch_*. The
    decoder-level enum is mapped to daedalus_substrate via a small
    map_substrate() helper. No perf delta on AUTO (the recipe layer
    was just doing the same dispatch under the hood).
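
The mapping helper could look roughly like the sketch below. The decoder enumerator names are from this commit; `sketch_fourier_substrate` is a hypothetical stand-in for daedalus-fourier's `daedalus_substrate`, whose real enumerator names and values are not shown here. The values are deliberately different on the two sides to show why an explicit mapping, rather than a cast, keeps the two ABIs isolated.

```c
#include <assert.h>

/* Decoder-side enum; names from the commit, values assumed. */
typedef enum {
    DAEDALUS_DECODER_SUBSTRATE_AUTO = 0,
    DAEDALUS_DECODER_SUBSTRATE_CPU  = 1,
    DAEDALUS_DECODER_SUBSTRATE_QPU  = 2
} daedalus_decoder_substrate;

/* Hypothetical stand-in for daedalus_substrate (names and values assumed). */
typedef enum {
    SKETCH_FOURIER_AUTO = 10,
    SKETCH_FOURIER_CPU  = 11,
    SKETCH_FOURIER_QPU  = 12
} sketch_fourier_substrate;

/* Translate the decoder enum to the dispatch-layer enum.
 * Unknown values fall back to AUTO; this is an assumption, and the
 * setter is expected to have rejected them before this point. */
static sketch_fourier_substrate
sketch_map_substrate(daedalus_decoder_substrate sub)
{
    switch (sub) {
    case DAEDALUS_DECODER_SUBSTRATE_CPU:
        return SKETCH_FOURIER_CPU;
    case DAEDALUS_DECODER_SUBSTRATE_QPU:
        return SKETCH_FOURIER_QPU;
    case DAEDALUS_DECODER_SUBSTRATE_AUTO:
    default:
        return SKETCH_FOURIER_AUTO;
    }
}

/* Usage check: each decoder value lands on its dispatch-layer twin. */
static void sketch_selfcheck(void)
{
    assert(sketch_map_substrate(DAEDALUS_DECODER_SUBSTRATE_AUTO) == SKETCH_FOURIER_AUTO);
    assert(sketch_map_substrate(DAEDALUS_DECODER_SUBSTRATE_CPU)  == SKETCH_FOURIER_CPU);
    assert(sketch_map_substrate(DAEDALUS_DECODER_SUBSTRATE_QPU)  == SKETCH_FOURIER_QPU);
}
```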

Test changes:
  - test_smoke: new EXPECTs for set_substrate (valid + bogus).
  - test_idct_bitexact: new argv[4] takes "auto" (default), "cpu", or
    "qpu" to force the substrate.
  - CMakeLists.txt: new ctest entry `idct_bitexact_cpu` re-runs the
    QVGA case forcing the CPU path. Catches silent drift between
    the V3D shader and the NEON reference; both must produce
    identical output for the same coefficient input (and they do;
    see the ctest log below).

Verified on hertz (Pi 5 / V3D 7.1 / daedalus-fourier 0.1.0):

  $ ctest --test-dir build --output-on-failure
      Start 1: smoke
  1/4 Test #1: smoke ............................ Passed 0.10 sec
      Start 2: idct_bitexact
  2/4 Test #2: idct_bitexact .................... Passed 0.03 sec
      Start 3: idct_bitexact_cpu
  3/4 Test #3: idct_bitexact_cpu ................ Passed 0.03 sec
      Start 4: idct_bitexact_1080p
  4/4 Test #4: idct_bitexact_1080p .............. Passed 0.06 sec
  100% tests passed, 0 tests failed out of 4

The CPU substrate produces byte-identical Y, Cb, and Cr planes against
the same C reference that the AUTO/QPU path matches, confirming that
the V3D shaders and the daedalus-fourier NEON path agree at the spec
level.
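
For illustration, a byte-identical plane check of the kind described above can be sketched as follows. This is not the test's actual comparison code; `sketch_plane_diff` and its parameters are hypothetical, and the only assumption about layout is that each plane is row-major with a stride that may exceed its width.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare two row-major byte planes over width x height.
 * Strides may differ and may include padding, which is ignored.
 * Returns 0 if every pixel matches; otherwise returns 1 and stores
 * the position of the first mismatch in *bad_x / *bad_y. */
static int sketch_plane_diff(const uint8_t *a, size_t a_stride,
                             const uint8_t *b, size_t b_stride,
                             size_t width, size_t height,
                             size_t *bad_x, size_t *bad_y)
{
    for (size_t y = 0; y < height; y++) {
        const uint8_t *ra = a + y * a_stride;
        const uint8_t *rb = b + y * b_stride;
        if (memcmp(ra, rb, width) == 0)
            continue; /* fast path: the whole row matches */
        for (size_t x = 0; x < width; x++) {
            if (ra[x] != rb[x]) {
                *bad_x = x;
                *bad_y = y;
                return 1;
            }
        }
    }
    return 0;
}
```

Running the same check once per plane (Y, Cb, Cr) and reporting the first mismatching coordinate makes a drift between substrates easier to localize than a single whole-buffer comparison.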

Why we plumbed the lower-level dispatch instead of leaving recipe in
place: recipe is just a thin wrapper that calls dispatch with
AUTO. Once we needed substrate control, the wrapper became a
liability (would have required adding a parallel recipe API for each
substrate); going direct is simpler and the AUTO path is unchanged.

Coverage note: idct_bitexact_cpu runs at QVGA (300 MBs) only, not
also at 1080p. The CPU path's wall time scales linearly with block
count, and a 1080p CPU run is ~0.5s on hertz: fine standalone, but it
slows ctest enough that it would tempt opt-in gating. The bit-exact
content is the same regardless of frame size; the 1080p variant only
exists to catch index-arithmetic bugs that surface once counts exceed
small-integer boundaries.
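
As a worked example of the size argument, the sketch below counts macroblocks and luma 4×4 blocks for both frame sizes, using the same `width / 16` and `height / 16` integer division as the test and the 16 luma blocks per MB it mentions. The 1920x1088 dimensions and the choice of a 16-bit boundary are assumptions; the commit does not say which boundary the 1080p case is meant to cross, so this is one plausible reading.

```c
#include <assert.h>
#include <stdint.h>

/* 16 luma 4x4 blocks per macroblock, as in the test's per-MB inputs. */
enum { SKETCH_LUMA_BLOCKS_PER_MB = 16 };

/* Macroblock count with the same integer division the test uses. */
static long sketch_mb_count(int width, int height)
{
    return (long)(width / 16) * (long)(height / 16);
}

/* Total luma 4x4 blocks in a frame. */
static long sketch_luma_block_count(int width, int height)
{
    return sketch_mb_count(width, height) * SKETCH_LUMA_BLOCKS_PER_MB;
}

/* Would this count survive being stored in a 16-bit unsigned index? */
static int sketch_fits_u16(long n)
{
    return n >= 0 && n <= UINT16_MAX;
}

/* Usage check: QVGA stays under 65535 blocks, 1080p does not. */
static void sketch_selfcheck(void)
{
    assert(sketch_mb_count(320, 240) == 300);
    assert(sketch_luma_block_count(320, 240) == 4800);
    assert(sketch_fits_u16(sketch_luma_block_count(320, 240)));
    assert(sketch_mb_count(1920, 1088) == 8160);
    assert(sketch_luma_block_count(1920, 1088) == 130560);
    assert(!sketch_fits_u16(sketch_luma_block_count(1920, 1088)));
}
```

Under these assumptions a block index held in a 16-bit type would pass at QVGA and wrap at 1080p, which is the class of bug a larger frame exposes and a small one hides.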
@@ -166,6 +166,23 @@ int main(int argc, char **argv)
     uint64_t seed = argc > 3 ? strtoull(argv[3], NULL, 0) : 0xfeedface5a5a5a5aULL;
     xs64_state = seed;
 
+    /* Optional 4th argv: "auto" (default) / "cpu" / "qpu" to pin the
+     * dispatch substrate. Both substrates must produce IDENTICAL
+     * output (the V3D shaders are bit-exact gates against the same
+     * spec the NEON path implements); the ctest suite runs the QVGA
+     * test once per substrate to catch any silent drift. */
+    daedalus_decoder_substrate sub = DAEDALUS_DECODER_SUBSTRATE_AUTO;
+    const char *sub_name = "auto";
+    if (argc > 4) {
+        if (!strcmp(argv[4], "cpu")) { sub = DAEDALUS_DECODER_SUBSTRATE_CPU; sub_name = "cpu"; }
+        else if (!strcmp(argv[4], "qpu")) { sub = DAEDALUS_DECODER_SUBSTRATE_QPU; sub_name = "qpu"; }
+        else if (!strcmp(argv[4], "auto")) { /* default */ }
+        else {
+            fprintf(stderr, "unknown substrate '%s' (want auto/cpu/qpu)\n", argv[4]);
+            return 1;
+        }
+    }
+
     int mb_w = width / 16;
     int mb_h = height / 16;
     int n_mbs = mb_w * mb_h;
@@ -177,6 +194,11 @@ int main(int argc, char **argv)
         fprintf(stderr, "SKIP: ctx create failed (Vulkan / V3D7 unavailable)\n");
         return 0;
     }
+    if (daedalus_decoder_set_substrate(dec, sub) != 0) {
+        fprintf(stderr, "set_substrate(%s) failed\n", sub_name);
+        return 1;
+    }
+    printf("substrate: %s\n", sub_name);
 
     /* Build the per-MB inputs. Each MB gets 16 luma 4×4 blocks of
      * random coeffs in [-512, 511] — same range as the daedalus-fourier
@@ -52,6 +52,16 @@ int main(void)
     EXPECT(daedalus_decoder_set_output_format(dec, DAEDALUS_DECODER_OUTPUT_NV12) == 0,
            "switch back to NV12");
 
+    /* Substrate setter — same lifecycle rules. */
+    EXPECT(daedalus_decoder_set_substrate(dec, DAEDALUS_DECODER_SUBSTRATE_CPU) == 0,
+           "force CPU substrate on virgin ctx");
+    EXPECT(daedalus_decoder_set_substrate(dec, DAEDALUS_DECODER_SUBSTRATE_QPU) == 0,
+           "force QPU substrate on virgin ctx");
+    EXPECT(daedalus_decoder_set_substrate(dec, DAEDALUS_DECODER_SUBSTRATE_AUTO) == 0,
+           "back to AUTO");
+    EXPECT(daedalus_decoder_set_substrate(dec, (daedalus_decoder_substrate) 99) == -1,
+           "bogus substrate rejects");
+
     /* Append rejects out-of-bounds + null inputs. */
     int16_t coeffs[384] = {0};
     struct daedalus_decoder_mb_input mb = {0};