Files
daedalus-decoder/include
claude-noether a7a0d56ecd Stage 2 PR-a: predicted samples plumbing — caller-supplied per-MB pixels
First concrete deliverable on the daedalus-decoder Stage 2 path post
the 2026-05-25 architecture re-pin (memory: dejavu / frame-major UMA).

Q2 decision: CPU intra prediction.  libavcodec's existing NEON intra
prediction kernels generate predicted samples per MB; daedalus-decoder
accepts those samples through the API and uses them as the IDCT-add
starting state.  FFmpeg's `idct_add` semantics — dst += idct(coeffs);
clip255 — fold DESIGN.md's Stage 3 reconstruction into the existing
Stage 1 IDCT dispatch for free.  No new GPU work.

API change
----------

`daedalus_decoder_mb_input` gains a `const uint8_t *predicted` field:

    predicted [  0 .. 256) — 16×16 luma, row-major raster
    predicted [256 .. 320) — 8×8  Cb,   row-major raster
    predicted [320 .. 384) — 8×8  Cr,   row-major raster

NULL is legal and equivalent to all-zero predicted samples — preserves
the existing IDCT-isolation test contract.

Internal changes
----------------

  - `daedalus_decoder` gains predicted_y (W×H) and predicted_uv (planar
    Cb||Cr, W×H/2) buffers allocated at create, zeroed at end of every
    flush_frame so NULL `mb->predicted` is indistinguishable from
    explicit zeros from one frame to the next.
  - `append_mb` splats mb->predicted into predicted_y/_uv at raster
    (mb_y*16, mb_x*16) for luma and (mb_y*8, mb_x*8) for each chroma
    component.
  - `flush_frame` replaces `calloc(scratch_y)` and `calloc(scratch_uv)`
    with `malloc + memcpy from predicted_y/_uv` — the IDCT dispatch
    then writes residual on top, clip-adding to the predicted samples
    in place.

Test
----

`test_idct_bitexact` extended:

  - Generates random predicted samples (uint8_t) per MB alongside the
    existing random coeffs.
  - Pre-fills the reference ref_y / ref_cb / ref_cr planes with those
    same predicted samples at the corresponding raster positions
    BEFORE applying ref_idct4_add / ref_idct8_add per block.
  - Compares GPU output to reference byte-for-byte.

Result on hertz (Pi 5 V3D 7.1), all three substrates:

  test_idct_bitexact 320 240 0xfeedface5a5a5a5a {cpu, qpu, auto}
    Y bytes diff:  0/76800 (0.0000%)
    Cb bytes diff: 0/19200 (0.0000%)
    Cr bytes diff: 0/19200 (0.0000%)
  BIT-EXACT PASS on all three substrates

Catches any silent drift between substrates and any predicted-samples
plumbing mistake on either the API or the dispatch side.

Followups
---------

  - Stage 2 PR-b: deblock dispatch in flush_frame.
  - Stage 2 daemon refactor (parallel, daedalus-v4l2 daemon): replace
    avcodec_send_packet/receive_frame with a libavcodec-parser-only
    path that drives daedalus_decoder_append_mb in raster order +
    flush_frame at slice boundary.
2026-05-25 23:01:20 +02:00
..