72ee154b36a2ca92e3ace616d5f7c8de2f48e3d1
2 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
adaabb1f63 |
phase1: IDCT 8x8 dispatch (High profile transform_8x8_size_flag)
Adds the High-profile 8x8 luma transform path alongside the existing
4x4 dispatch. flush_frame now partitions macroblocks by each MB's
transform_8x8 flag and issues a separate luma dispatch per partition:
- mb.transform_8x8 == 0 (Baseline/Main) → coeffs[0..256) interpreted
as 16 4x4 blocks, fed to daedalus_recipe_dispatch_h264_idct4
(existing behaviour, unchanged).
- mb.transform_8x8 == 1 (High) → coeffs[0..256) interpreted
as 4 8x8 blocks (64 int16 each, column-major), fed to the new
daedalus_recipe_dispatch_h264_idct8 call.
Both luma partitions can be non-empty in the same frame (FFmpeg sets
the flag per-MB). Each non-empty partition costs one
vkQueueSubmit + vkQueueWaitIdle; empty partitions are skipped
(common case: Baseline streams skip the 8x8 dispatch entirely).
Chroma is unchanged — 4:2:0 chroma always uses the 4x4 transform.
API surface:
- New uint8_t `transform_8x8` field in `struct daedalus_decoder_mb_input`
(after deblock_*). Backwards-compatible at the source level
because the field defaults to 0 with C99 designated initialisers
or {0} struct zeroing, both of which select the existing 4x4
path. ABI is pre-0.1 (per the header doc) so structural change
is fine.
- Mirrored in `struct daedalus_decoder_mb_desc` (internal layout).
Test changes:
- test_idct_bitexact now exercises a mixed-mode frame: every odd
raster MB uses 8x8, every even uses 4x4 (so flush_frame's
partitioning is also under test, not just the underlying shaders).
- 8x8 C reference (h264_idct8_butterfly + ref_idct8_add)
transcribed from daedalus-fourier tests/h264_idct8_ref.c per
H.264 §8.5.13.2. Block layout column-major; +32 >> 6 rounding;
add-to-predicted; clip255.
- Reference luma compute branches per MB on the same mb_8x8[]
array used to set the input flag.
Verified on hertz (Pi 5 / V3D 7.1 / daedalus-fourier 0.1.0):
$ ./build/test_idct_bitexact
test_idct_bitexact: 320x240 (300 MBs), seed=0xfeedface5a5a5a5a
MB mix: 150 4x4 MBs, 150 8x8 MBs
Y bytes total: 76800
Y bytes diff: 0 (0.0000%)
Cb bytes total: 19200 diff: 0 (0.0000%)
Cr bytes total: 19200 diff: 0 (0.0000%)
BIT-EXACT PASS (Y + Cb + Cr)
$ ctest --test-dir build
100% tests passed, 0 tests failed out of 2
Bit-exact PASS first try for the 8x8 path — 150 8x8 MBs × 4 blocks =
600 8x8 IDCTs against the spec C reference, identical output.
Validates both the daedalus-fourier IDCT 8x8 shader (already gated
by its own cycle-7 bit-exact test, now also gated end-to-end through
our flush_frame), and our 8x8 layout assumptions (column-major coeffs,
raster sb_y*2+sb_x block order, top-left = mb*16 + sb*8).
What's NOT covered yet (deferred):
- Z-scan permutation for FFmpeg compatibility (libavcodec intercept
patch's concern; both 4x4 and 8x8 z-scans differ).
- Chroma DC / luma Intra16x16 DC Hadamard pre-pass.
- Mixed intra/inter MB handling — currently all MBs treated as
residual-only (predicted=0).
Closes the "IDCT 8x8 (High profile)" item from PR #3's deferred list.
|
||
|
|
08080f062c |
scaffold: CMake + API skeleton + smoke test
First code on daedalus-decoder per the Phase 1 decisions merged 2026-05-24.
Repo skeleton only — no Vulkan pipeline yet, no shaders, no libavcodec
intercept. Establishes the build shape so subsequent work has a place
to land.
Layout:
LICENSE BSD-2-Clause (matches daedalus-fourier)
.gitignore build/, CMake artefacts, *.spv
CMakeLists.txt top-level — finds daedalus-fourier
≥0.1.0 via pkg-config (per §9.6
decision: find_package, pinned to
tagged release; .pc consumed via
pkg_check_modules until we ship a
CMake config), Vulkan via
find_package, builds static lib
+ smoke test, GNUInstallDirs install
include/daedalus_decoder.h public API surface:
- daedalus_decoder_{create,destroy,
version,has_qpu}
- daedalus_decoder_set_output_format
(NV12 default, RGBA opt-in per §5)
- daedalus_decoder_append_mb +
struct daedalus_decoder_mb_input
(matches §3 per-MB descriptor)
- daedalus_decoder_flush_frame
(per-frame submit + wait)
- daedalus_decoder_export_dmabuf
(Vulkan-native VkImage export per
§9.4 decision)
Dimensions are CODED frame size
(mod-16), not displayed — caller
translates from SPS + crop offsets.
src/internal.h internal mb_desc struct (matches
shader std430 layout, to be nailed
down once shaders exist) + per-ctx
state
src/daedalus_decoder.c stub bodies:
- create/destroy with proper resource
lifecycle
- append_mb validates + writes CPU
staging buffers (no GPU yet)
- flush_frame returns -2 (not
implemented) — Phase 1 work
- export_dmabuf returns -1
- has_qpu / version diagnostics
tests/test_smoke.c link + lifecycle test: bad dims
reject, OOB MB reject, null inputs
reject, raster-order enforcement,
mid-frame format-change reject,
incomplete-frame flush reject.
On hosts without V3D7 Vulkan,
SKIPs gracefully (returns 0).
Verified on hertz (Pi 5 / V3D 7.1 / Mesa V3DV via daedalus-fourier
0.1.0):
$ cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
$ cmake --build build
$ ctest --test-dir build --output-on-failure
Test #1: smoke ... Passed
$ ./build/test_smoke
daedalus-decoder version: 0.0.1
ctx created: 1920x1088, has_qpu=1
smoke OK
Note the coded-vs-displayed dims trap: 1080p H.264 has coded height
1088 with 8 rows cropped via SPS frame_cropping_*. Header docstring
on daedalus_decoder_create() spells this out so future callers don't
hit the multiple-of-16 reject (smoke test caught it during scaffold
write).
Next: Phase 1 implementation begins — IDCT 4×4 / 8×8 frame-scaled
dispatch (reusing daedalus-fourier shaders per Appendix A), intra
prediction wavefront, reconstruct stage, NV12 output via dmabuf
export. Smoke test grows from "ctx lifecycle works" to
"I-frame-only Baseline decode bit-exact vs FFmpeg reference".
|