Stage 2 PR-a: predicted samples plumbing — caller-supplied per-MB pixels #11
First concrete deliverable on the daedalus-decoder Stage 2 path post the 2026-05-25 architecture re-pin (memory: dejavu / frame-major UMA).

Q2 decision: CPU intra prediction. libavcodec's existing NEON intra prediction kernels generate predicted samples per MB; daedalus-decoder accepts those samples through the API and uses them as the IDCT-add starting state. FFmpeg's `idct_add` semantics (`dst += idct(coeffs); clip255`) fold DESIGN.md's Stage 3 reconstruction into the existing Stage 1 IDCT dispatch for free. No new GPU work.

API change
----------

`daedalus_decoder_mb_input` gains a `const uint8_t *predicted` field:

    predicted [  0 .. 256)   16×16 luma, row-major raster
    predicted [256 .. 320)   8×8 Cb, row-major raster
    predicted [320 .. 384)   8×8 Cr, row-major raster

NULL is legal and equivalent to all-zero predicted samples, which preserves the existing IDCT-isolation test contract.

Internal changes
----------------

- `daedalus_decoder` gains `predicted_y` (W×H) and `predicted_uv` (planar Cb||Cr, W×H/2) buffers, allocated at create and zeroed at the end of every `flush_frame`, so a NULL `mb->predicted` is indistinguishable from explicit zeros from one frame to the next.
- `append_mb` splats `mb->predicted` into `predicted_y`/`_uv` at raster (mb_y*16, mb_x*16) for luma and (mb_y*8, mb_x*8) for each chroma component.
- `flush_frame` replaces `calloc(scratch_y)` and `calloc(scratch_uv)` with `malloc + memcpy from predicted_y/_uv`; the IDCT dispatch then writes residual on top, clip-adding to the predicted samples in place.

Test
----

`test_idct_bitexact` extended:

- Generates random predicted samples (uint8_t) per MB alongside the existing random coeffs.
- Pre-fills the reference `ref_y` / `ref_cb` / `ref_cr` planes with those same predicted samples at the corresponding raster positions BEFORE applying `ref_idct4_add` / `ref_idct8_add` per block.
- Compares GPU output to reference byte-for-byte.

Result on hertz (Pi 5 V3D 7.1), all three substrates:

    test_idct_bitexact 320 240 0xfeedface5a5a5a5a {cpu, qpu, auto}
      Y  bytes diff: 0/76800 (0.0000%)
      Cb bytes diff: 0/19200 (0.0000%)
      Cr bytes diff: 0/19200 (0.0000%)
      BIT-EXACT PASS on all three substrates

Catches any silent drift between substrates and any predicted-samples plumbing mistake on either the API or the dispatch side.

Followups
---------

- Stage 2 PR-b: deblock dispatch in `flush_frame`.
- Stage 2 daemon refactor (parallel, `daedalus-v4l2` daemon): replace `avcodec_send_packet`/`receive_frame` with a libavcodec-parser-only path that drives `daedalus_decoder_append_mb` in raster order + `flush_frame` at slice boundary.
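For reviewers who want the `predicted` layout and the `append_mb` splat in concrete form, here is a minimal sketch. It is not the code in this PR: `struct frame_planes`, `copy_block`, and `splat_predicted` are hypothetical names, and the plane strides are assumed to equal the plane widths. Only the 384-byte layout, the raster positions, the planar Cb-then-Cr ordering, and the NULL convention are taken from the description.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Offsets into the 384-byte per-MB predicted buffer (layout from the PR text). */
enum { PRED_Y_OFF = 0, PRED_CB_OFF = 256, PRED_CR_OFF = 320, PRED_SIZE = 384 };

/* Hypothetical frame-sized staging planes. Cb and Cr share one planar buffer,
 * Cb first. Width and height are assumed to be multiples of 16. */
struct frame_planes {
    int width, height;
    uint8_t *predicted_y;   /* width * height bytes */
    uint8_t *predicted_uv;  /* width * height / 2 bytes: Cb plane, then Cr plane */
};

/* Copy one n x n row-major block into a plane at pixel position (y, x). */
static void copy_block(uint8_t *plane, int stride, int y, int x,
                       const uint8_t *src, int n)
{
    for (int r = 0; r < n; r++)
        memcpy(plane + (size_t)(y + r) * stride + x, src + (size_t)r * n, (size_t)n);
}

/* Splat one macroblock's predicted samples into the staging planes.
 * A NULL pointer leaves the already-zeroed planes untouched. */
static void splat_predicted(struct frame_planes *f, int mb_x, int mb_y,
                            const uint8_t *predicted)
{
    if (!predicted)
        return;
    int cw = f->width / 2, ch = f->height / 2;
    uint8_t *cb = f->predicted_uv;
    uint8_t *cr = f->predicted_uv + (size_t)cw * ch;
    copy_block(f->predicted_y, f->width, mb_y * 16, mb_x * 16,
               predicted + PRED_Y_OFF, 16);
    copy_block(cb, cw, mb_y * 8, mb_x * 8, predicted + PRED_CB_OFF, 8);
    copy_block(cr, cw, mb_y * 8, mb_x * 8, predicted + PRED_CR_OFF, 8);
}
```

Because the planes are zeroed after every flush, skipping the copy for a NULL pointer gives the same result as copying 384 zero bytes.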
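The `dst += idct(coeffs); clip255` step is what lets the predicted samples act as the IDCT starting state, and it can be illustrated in isolation. The sketch below takes an already inverse-transformed 4×4 residual as input and leaves the transform itself out; `clip_u8` and `clip_add_4x4` are hypothetical helpers written for this illustration, not FFmpeg or daedalus-decoder functions.

```c
#include <stdint.h>
#include <stddef.h>

/* Clamp an int to the 8-bit sample range [0, 255]. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Add a 4x4 residual (already inverse-transformed) onto dst in place.
 * dst initially holds the predicted samples; an all-zero dst models the
 * predicted == NULL case. */
static void clip_add_4x4(uint8_t *dst, ptrdiff_t stride, const int16_t residual[16])
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            dst[r * stride + c] =
                clip_u8(dst[r * stride + c] + residual[r * 4 + c]);
}
```

With an all-zero `dst` the output is just the clipped residual, which is why a NULL `predicted` pointer keeps the earlier IDCT-isolation behaviour unchanged.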