phase1: IDCT 8x8 dispatch (High profile transform_8x8_size_flag)
Adds the High-profile 8x8 luma transform path alongside the existing
4x4 dispatch. flush_frame now partitions macroblocks by each MB's
transform_8x8 flag and issues a separate luma dispatch per partition:
- mb.transform_8x8 == 0 (Baseline/Main) → coeffs[0..256) interpreted
as 16 4x4 blocks, fed to daedalus_recipe_dispatch_h264_idct4
(existing behaviour, unchanged).
- mb.transform_8x8 == 1 (High) → coeffs[0..256) interpreted
as 4 8x8 blocks (64 int16 each, column-major), fed to the new
daedalus_recipe_dispatch_h264_idct8 call.
Both luma partitions can be non-empty in the same frame (FFmpeg sets
the flag per-MB). Each non-empty partition costs one
vkQueueSubmit + vkQueueWaitIdle; empty partitions are skipped
(common case: Baseline streams skip the 8x8 dispatch entirely).
Chroma is unchanged — 4:2:0 chroma always uses the 4x4 transform.
API surface:
- New uint8_t `transform_8x8` field in `struct daedalus_decoder_mb_input`
(after deblock_*). Backwards-compatible at the source level
because the field defaults to 0 with C99 designated initialisers
or {0} struct zeroing, both of which select the existing 4x4
path. ABI is pre-0.1 (per the header doc) so structural change
is fine.
- Mirrored in `struct daedalus_decoder_mb_desc` (internal layout).
Test changes:
- test_idct_bitexact now exercises a mixed-mode frame: every odd
raster MB uses 8x8, every even uses 4x4 (so flush_frame's
partitioning is also under test, not just the underlying shaders).
- 8x8 C reference (h264_idct8_butterfly + ref_idct8_add)
transcribed from daedalus-fourier tests/h264_idct8_ref.c per
H.264 §8.5.13.2. Block layout column-major; +32 >> 6 rounding;
add-to-predicted; clip255.
- Reference luma compute branches per MB on the same mb_8x8[]
array used to set the input flag.
Verified on hertz (Pi 5 / V3D 7.1 / daedalus-fourier 0.1.0):
$ ./build/test_idct_bitexact
test_idct_bitexact: 320x240 (300 MBs), seed=0xfeedface5a5a5a5a
MB mix: 150 4x4 MBs, 150 8x8 MBs
Y bytes total: 76800
Y bytes diff: 0 (0.0000%)
Cb bytes total: 19200 diff: 0 (0.0000%)
Cr bytes total: 19200 diff: 0 (0.0000%)
BIT-EXACT PASS (Y + Cb + Cr)
$ ctest --test-dir build
100% tests passed, 0 tests failed out of 2
Bit-exact PASS first try for the 8x8 path — 150 8x8 MBs × 4 blocks =
600 8x8 IDCTs against the spec C reference, identical output.
Validates both the daedalus-fourier IDCT 8x8 shader (already gated
by its own cycle-7 bit-exact test, now also gated end-to-end through
our flush_frame), and our 8x8 layout assumptions (column-major coeffs,
raster sb_y*2+sb_x block order, top-left = mb*16 + sb*8).
What's NOT covered yet (deferred):
- Z-scan permutation for FFmpeg compatibility (libavcodec intercept
patch's concern; both 4x4 and 8x8 z-scans differ).
- Chroma DC / luma Intra16x16 DC Hadamard pre-pass.
- Mixed intra/inter MB handling — currently all MBs treated as
residual-only (predicted=0).
Closes the "IDCT 8x8 (High profile)" item from PR #3's deferred list.
This commit is contained in:
@@ -75,9 +75,19 @@ struct daedalus_decoder_mb_input {
|
||||
int8_t deblock_alpha_c0;
|
||||
int8_t deblock_beta;
|
||||
|
||||
/* Transform coefficients — 256 luma (4x4 x 16) + 64 cb + 64 cr,
|
||||
* column-major within each 4x4 block (matches FFmpeg convention).
|
||||
* Caller-owned; copied during append. */
|
||||
/* High-profile 8x8 transform selector.
|
||||
* 0 = the 256-int16 luma section of coeffs[] holds 16 4x4 blocks
|
||||
* (16 coeffs each, raster sb_y*4+sb_x); the chroma section is
|
||||
* always 4x4.
|
||||
* 1 = the 256-int16 luma section holds 4 8x8 blocks (64 coeffs
|
||||
* each, raster sb_y*2+sb_x). Set per H.264's
|
||||
* transform_8x8_size_flag. Chroma remains 4x4 (4:2:0).
|
||||
*/
|
||||
uint8_t transform_8x8;
|
||||
|
||||
/* Transform coefficients — 256 luma + 64 cb + 64 cr int16, all
|
||||
* column-major within each 4x4 or 8x8 block (matches FFmpeg
|
||||
* convention). Caller-owned; copied during append. */
|
||||
const int16_t *coeffs; /* points at exactly 384 int16_t */
|
||||
};
|
||||
|
||||
|
||||
Reference in New Issue
Block a user