Stage 2 PR-a: predicted samples plumbing — caller-supplied per-MB pixels
First concrete deliverable on the daedalus-decoder Stage 2 path after
the 2026-05-25 architecture re-pin (memory: dejavu / frame-major UMA).
Q2 decision: CPU intra prediction. libavcodec's existing NEON intra
prediction kernels generate predicted samples per MB; daedalus-decoder
accepts those samples through the API and uses them as the IDCT-add
starting state. FFmpeg's `idct_add` computes dst = clip255(dst +
idct(coeffs)), so seeding dst with the predicted samples folds
DESIGN.md's Stage 3 reconstruction into the existing Stage 1 IDCT
dispatch at no extra cost. No new GPU work.
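
The clip-add contract can be sketched in a few lines of C. This is an
illustration only: `clip255` and `add_residual_4x4` are hypothetical
names, not functions from FFmpeg or daedalus-decoder, and the residual
is assumed to be already inverse-transformed and stored row-major.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate sum to the 8-bit sample range. */
static uint8_t clip255(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical helper: add a 4x4 residual (assumed already
 * inverse-transformed, row-major) onto dst in place. On entry dst
 * holds predicted samples; on return it holds reconstructed pixels. */
static void add_residual_4x4(uint8_t *dst, ptrdiff_t stride,
                             const int16_t residual[16])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] =
                clip255(dst[y * stride + x] + residual[y * 4 + x]);
}
```

With dst zeroed, the same routine yields the residual-only output that
the IDCT-isolation tests expect.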
API change
----------
`daedalus_decoder_mb_input` gains a `const uint8_t *predicted` field
pointing at exactly 384 bytes in planar order:
    predicted[  0 .. 256): 16×16 luma, row-major raster
    predicted[256 .. 320): 8×8 Cb, row-major raster
    predicted[320 .. 384): 8×8 Cr, row-major raster
NULL is legal and equivalent to all-zero predicted samples, which
preserves the existing IDCT-isolation test contract.
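
For a caller, filling that layout might look like the sketch below.
`pack_predicted_mb` and the `MB_*` constants are hypothetical and not
part of the daedalus-decoder API; the source planes and their strides
are assumed to come from the caller's own intra-prediction output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MB_LUMA_BYTES = 256, MB_CHROMA_BYTES = 64, MB_PRED_BYTES = 384 };

/* Hypothetical helper: gather one macroblock's predicted samples from
 * strided planes into the 384-byte planar buffer (Y, then Cb, then Cr),
 * each component in row-major raster order. */
static void pack_predicted_mb(uint8_t out[MB_PRED_BYTES],
                              const uint8_t *y, ptrdiff_t y_stride,
                              const uint8_t *cb, const uint8_t *cr,
                              ptrdiff_t c_stride)
{
    uint8_t *out_cb = out + MB_LUMA_BYTES;
    uint8_t *out_cr = out_cb + MB_CHROMA_BYTES;

    for (int r = 0; r < 16; r++)
        memcpy(out + r * 16, y + r * y_stride, 16);

    for (int r = 0; r < 8; r++) {
        memcpy(out_cb + r * 8, cb + r * c_stride, 8);
        memcpy(out_cr + r * 8, cr + r * c_stride, 8);
    }
}
```

The packed buffer would then be handed over via `mb.predicted = out`
before the MB is appended.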
Internal changes
----------------
- `daedalus_decoder` gains predicted_y (W×H) and predicted_uv (planar
  Cb||Cr, W×H/2) buffers, allocated at create and zeroed at the end of
  every flush_frame, so a NULL `mb->predicted` is indistinguishable
  from explicit zeros in every frame.
- `append_mb` splats mb->predicted into predicted_y/_uv at raster
  (mb_y*16, mb_x*16) for luma and (mb_y*8, mb_x*8) for each chroma
  component.
- `flush_frame` replaces `calloc(scratch_y)` and `calloc(scratch_uv)`
  with `malloc + memcpy from predicted_y/_uv`. The IDCT dispatch then
  writes the residual on top, clip-adding to the predicted samples
  in place.
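
The splat step can be sketched as follows. This is not the code from
this commit: `splat_predicted` is a hypothetical stand-in for the
relevant part of `append_mb`, and it assumes tightly packed planes
(stride == width), dimensions that are multiples of 16, and a Cr plane
that directly follows the Cb plane inside predicted_uv.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy one MB's 384 predicted bytes into the
 * frame-sized predicted_y / predicted_uv buffers. */
static void splat_predicted(uint8_t *pred_y, uint8_t *pred_uv,
                            int width, int height,
                            int mb_x, int mb_y,
                            const uint8_t *predicted)
{
    assert(width % 16 == 0 && height % 16 == 0);

    const int cw = width / 2;
    const int ch = height / 2;
    uint8_t *cb = pred_uv;                   /* Cb plane first */
    uint8_t *cr = pred_uv + (size_t)cw * ch; /* Cr plane right after */

    /* NULL means all-zero prediction; the buffers are already zeroed. */
    if (!predicted)
        return;

    for (int r = 0; r < 16; r++)
        memcpy(pred_y + (size_t)(mb_y * 16 + r) * width + mb_x * 16,
               predicted + r * 16, 16);

    for (int r = 0; r < 8; r++) {
        size_t off = (size_t)(mb_y * 8 + r) * cw + mb_x * 8;
        memcpy(cb + off, predicted + 256 + r * 8, 8);
        memcpy(cr + off, predicted + 320 + r * 8, 8);
    }
}
```

At flush time the scratch planes would then start as a plain memcpy of
these buffers, so the IDCT pass adds residual onto the prediction.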
Test
----
`test_idct_bitexact` extended:
- Generates random predicted samples (uint8_t) per MB alongside the
existing random coeffs.
- Pre-fills the reference ref_y / ref_cb / ref_cr planes with those
same predicted samples at the corresponding raster positions
BEFORE applying ref_idct4_add / ref_idct8_add per block.
- Compares GPU output to reference byte-for-byte.
Result on hertz (Pi 5 V3D 7.1), all three substrates:
test_idct_bitexact 320 240 0xfeedface5a5a5a5a {cpu, qpu, auto}
Y bytes diff: 0/76800 (0.0000%)
Cb bytes diff: 0/19200 (0.0000%)
Cr bytes diff: 0/19200 (0.0000%)
BIT-EXACT PASS on all three substrates
This catches any silent drift between substrates and any
predicted-samples plumbing mistake on either the API or the dispatch
side.
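
The byte-for-byte comparison behind those counts can be sketched as
below. `count_byte_diffs` and `report_plane` are hypothetical names,
not the helpers used in `test_idct_bitexact`; only the output format
mirrors the result lines above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Count positions where the decoder output differs from the reference. */
static size_t count_byte_diffs(const uint8_t *got, const uint8_t *ref,
                               size_t n)
{
    size_t diffs = 0;
    for (size_t i = 0; i < n; i++)
        diffs += (got[i] != ref[i]);
    return diffs;
}

/* Print one plane's result line; returns 1 when the plane is bit-exact. */
static int report_plane(const char *name, const uint8_t *got,
                        const uint8_t *ref, size_t n)
{
    size_t d = count_byte_diffs(got, ref, n);
    double pct = n ? 100.0 * (double)d / (double)n : 0.0;

    printf("%s bytes diff: %zu/%zu (%.4f%%)\n", name, d, n, pct);
    return d == 0;
}
```

For the 320×240 run above, n is 76800 for Y and 19200 for each of Cb
and Cr.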
Followups
---------
- Stage 2 PR-b: deblock dispatch in flush_frame.
- Stage 2 daemon refactor (parallel, daedalus-v4l2 daemon): replace
avcodec_send_packet/receive_frame with a libavcodec-parser-only
path that drives daedalus_decoder_append_mb in raster order +
flush_frame at slice boundary.
@@ -89,6 +89,26 @@ struct daedalus_decoder_mb_input {
      * column-major within each 4x4 or 8x8 block (matches FFmpeg
      * convention). Caller-owned; copied during append. */
     const int16_t *coeffs; /* points at exactly 384 int16_t */
+
+    /* Reconstructed predicted samples for this MB, planar order:
+     * [ 0 .. 256) — 16×16 luma, ROW-MAJOR raster (row 0 cols 0..15,
+     * row 1 cols 0..15, ..., row 15 cols 0..15)
+     * [256 .. 320) — 8×8 Cb, ROW-MAJOR raster
+     * [320 .. 384) — 8×8 Cr, ROW-MAJOR raster
+     *
+     * The caller (libavcodec's CPU intra-prediction kernels for Phase 1
+     * I-frames; MC fallback for Phase 2 P-frames before GPU MC lands)
+     * populates this from neighbour samples per H.264 §8.3 / §8.4.
+     * `flush_frame()`'s reconstruction step is `clip255(predicted +
+     * idct(coeffs))` — the IDCT shader reads dst, adds the inverse
+     * transform, writes clipped — so a non-zero `predicted` here makes
+     * the output pixel a valid H.264 reconstruction; zero means
+     * residual-only (used by IDCT-isolation tests).
+     *
+     * NULL is legal and means "all-zero predicted samples" for this MB
+     * (the per-frame predicted buffer is zeroed at flush time so a NULL
+     * is indistinguishable from explicit zeros). */
+    const uint8_t *predicted; /* NULL or exactly 384 uint8_t */
 };
 
 /* -------------------------------------------------------------------