2 Commits

Author SHA1 Message Date
claude-noether df9e1c9d78 h264: promote Intra_4x4 luma prediction (9 modes) to public API
PR #12 added the 9 Intra_4x4 luma intra prediction modes as test-only
spec references in tests/.  This PR promotes them to public src/
symbols so consumers (the eventual marfrit-packages substitution-arc
patch 0014) can link against them.

  Moved: tests/h264_intra_pred_4x4_ref.c → src/h264_intra_pred_4x4.c
  Renamed: daedalus_h264_pred_4x4_<mode>_ref → daedalus_h264_pred_4x4_<mode>
           (9 functions: vertical/horizontal/dc/ddl/ddr/vr/hd/vl/hu)

The src/ implementation is byte-for-byte the same code as the
test-only ref; this PR is plain plumbing.  The test binary now
links against daedalus_core to pull in the public symbols (instead
of compiling the ref file directly), exercising the path that real
consumers will use.

Same promotion shape as PR #25 (chroma DC Hadamard).

Verified on hertz:

  $ ./build/test_intra_pred_4x4
    Vertical (mode 0)          PASS
    Horizontal (mode 1)        PASS
    DC (mode 2)                PASS
    DiagDownLeft (mode 3)      PASS
    DiagDownRight (mode 4)     PASS
    VerticalRight (mode 5)     PASS
    HorizontalDown (mode 6)    PASS
    VerticalLeft (mode 7)      PASS
    HorizontalUp (mode 8)      PASS
    VR asym (sanity)           PASS

  ALL 10 intra-4x4 mode references PASS

  $ nm -g build/libdaedalus_core.a | grep "T daedalus_h264_pred_4x4"
  (9 symbols exported)

Follow-ups (same promotion pattern, can land in parallel):
  - Intra_16x16 luma (4 modes, PR #13)
  - Intra_8x8 chroma (4 modes, PR #14)
  - Intra_8x8 luma (9 modes, PRs #21 + #22)

Once all 26 intra modes are in the public API, the marfrit-packages
substitution arc can route H264PredContext's pred function pointer
tables through daedalus alongside the IDCT / deblock / qpel / DC
Hadamard substitutions already in place.
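
A rough sketch of what "routing the pointer table" could look like once
the symbols are public.  Everything here is hypothetical: the struct,
the helper and the placeholder predictor are made up for illustration,
and the real H264PredContext entries may use a different signature that
needs a thin adapter.  Only the (dst, stride) interface and the mode
numbering (0 = Vertical ... 8 = HorizontalUp) come from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interface used by the promoted functions: predict a 4x4 block in place. */
typedef void (*pred_4x4_fn)(uint8_t *dst, ptrdiff_t stride);

enum { SKETCH_PRED4X4_MODES = 9 };   /* modes 0..8, Vertical..HorizontalUp */

/* Hypothetical stand-in for a decoder-side table of per-mode predictors. */
struct sketch_pred_table {
    pred_4x4_fn pred4x4[SKETCH_PRED4X4_MODES];
};

/* Placeholder predictor, present only so the sketch has something to route. */
static void sketch_noop_pred(uint8_t *dst, ptrdiff_t stride)
{
    (void)dst;
    (void)stride;
}

/* Swap one mode's predictor and hand back the previous one, so the
 * substitution can be undone or compared against the original. */
static pred_4x4_fn sketch_route_pred4x4(struct sketch_pred_table *t,
                                        int mode, pred_4x4_fn replacement)
{
    assert(t != NULL && replacement != NULL);
    assert(mode >= 0 && mode < SKETCH_PRED4X4_MODES);
    pred_4x4_fn previous = t->pred4x4[mode];
    t->pred4x4[mode] = replacement;
    return previous;
}
```

Returning the previous pointer keeps the original predictor reachable,
which is handy for A/B comparison while a substitution is being vetted.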
2026-05-25 14:53:37 +02:00
claude-noether ce6703a862 h264: Intra_4x4 luma prediction — 9-mode C reference + spec gates
Lays the bit-exact gate for H.264 §8.3.1.2 Intra_4x4 luma prediction.
Spec-derived C reference covering all 9 modes; standalone test
exercises each against hand-computed expected 4x4 patterns.

Why fourier (not the decoder) gets this: it's a reusable spec-level
primitive — both daedalus-decoder (Phase 1 Stage 2a intra prediction)
and any future shader work will need the same bit-exact reference.
Putting it in fourier alongside the IDCT / deblock refs keeps the
"spec implementations" library cohesive.

Why CPU C reference, not NEON or QPU: the vendored FFmpeg snapshot
(external/ffmpeg-snapshot/libavcodec/aarch64/) has h264dsp/idct/qpel
but NOT h264pred.  Vendoring h264pred_neon.S would expand the snapshot
surface; deferring that pending real perf data.  Taking the cycle 9
NEON benches (~5 ns per 8x8 qpel block) as a yardstick, intra
prediction at ~5 ns per 4x4 block × 16 blocks/MB × 8160 MBs comes to
~650 us/frame at 1080p: well inside budget at NEON-class speed, and
plain C should stay inside it even at several times that cost.  Not
the critical-path concern.
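
The estimate above, spelled out as a small sketch.  The function names
are hypothetical, and the model assumes the worst case where every
macroblock is coded as sixteen Intra_4x4 blocks; the ~5 ns per-block
cost is the figure borrowed from the qpel benches, not a measurement of
intra prediction itself.

```c
#include <assert.h>
#include <stdint.h>

/* Macroblocks in a frame, rounding each dimension up to a multiple of 16. */
static uint32_t mbs_per_frame(uint32_t width, uint32_t height)
{
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Estimated intra-4x4 prediction time per frame in nanoseconds,
 * assuming every macroblock is sixteen Intra_4x4 blocks. */
static uint64_t intra4x4_ns_per_frame(uint32_t width, uint32_t height,
                                      uint32_t ns_per_block)
{
    assert(width > 0 && height > 0);
    return (uint64_t)mbs_per_frame(width, height) * 16u * ns_per_block;
}
```

For 1920x1080 this gives 120 × 68 = 8160 MBs and, at 5 ns per block,
652800 ns, i.e. the ~650 us/frame quoted above.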

Scope:
  - tests/h264_intra_pred_4x4_ref.c — 9 prediction modes per
    H.264 spec §8.3.1.2 sub-clauses, FFmpeg-style interface:
      void daedalus_h264_pred_4x4_<name>_ref(uint8_t *dst, ptrdiff_t stride);
    Reads top/top-right/left/top-left neighbours from dst[-stride/-1]
    offsets, writes 4×4 output at dst[0..3][0..3].  Assumes all 13
    neighbour bytes are valid (interior-MB case; availability
    fallbacks are caller-side per spec).
  - tests/test_intra_pred_4x4.c — 10 cases:
      * 9 uniform-context degenerate tests (one per mode), establishing
        that nothing is structurally broken (all output cells must
        equal the uniform input value).
      * 1 asymmetric Vertical_Right sanity test with 16 distinct
        expected cells hand-computed from spec §8.3.1.2.6 — the
        "really exercise orientation + row/col arithmetic" gate.
  - CMakeLists.txt — new test_intra_pred_4x4 binary (no daedalus_core
    dependency; pure-CPU library doesn't need a context to construct).
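
To make the interface and the uniform-context gate concrete, here is a
minimal sketch of the two simplest modes plus a uniform-context check.
It is not the code in tests/: the sketch_* names and the helper are
hypothetical, and like the refs it assumes the interior-MB case where
all neighbours are valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*pred_4x4_fn)(uint8_t *dst, ptrdiff_t stride);

/* Vertical (mode 0): copy the four top neighbours down every row. */
static void sketch_pred_4x4_vertical(uint8_t *dst, ptrdiff_t stride)
{
    assert(stride >= 5);              /* room for the left column */
    const uint8_t *top = dst - stride;
    for (int y = 0; y < 4; y++)
        memcpy(dst + y * stride, top, 4);
}

/* Horizontal (mode 1): fill each row with its left neighbour. */
static void sketch_pred_4x4_horizontal(uint8_t *dst, ptrdiff_t stride)
{
    assert(stride >= 5);
    for (int y = 0; y < 4; y++)
        memset(dst + y * stride, dst[y * stride - 1], 4);
}

/* Uniform-context gate: set every neighbour to v, poison the 4x4 block,
 * run the predictor, and report whether all 16 cells came back as v. */
static int uniform_context_ok(pred_4x4_fn pred, uint8_t v)
{
    enum { STRIDE = 16 };
    uint8_t buf[5 * STRIDE];
    uint8_t *dst = buf + STRIDE + 1;  /* one row above, one column left */

    memset(buf, v, sizeof buf);
    for (int y = 0; y < 4; y++)
        memset(dst + y * STRIDE, (uint8_t)~v, 4);

    pred(dst, STRIDE);

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            if (dst[y * STRIDE + x] != v)
                return 0;
    return 1;
}
```

A uniform context cannot catch transposed or mirrored output, which is
why the asymmetric Vertical_Right case exists alongside it.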

Verified on hertz:

  $ ./build/test_intra_pred_4x4
    Vertical (mode 0)          PASS
    Horizontal (mode 1)        PASS
    DC (mode 2)                PASS
    DiagDownLeft (mode 3)      PASS
    DiagDownRight (mode 4)     PASS
    VerticalRight (mode 5)     PASS
    HorizontalDown (mode 6)    PASS
    VerticalLeft (mode 7)      PASS
    HorizontalUp (mode 8)      PASS
    VR asym (sanity)           PASS

  ALL 10 intra-4x4 mode references PASS

The VR asym test passed first try; the DC test failed on the first
attempt because my test expectation miscomputed the rounding shift
(I wrote 4; the actual value is 2 = (16+4)>>3).  Fixed in the test.
The reference itself never had the bug.
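
For reference, the DC rounding as a minimal sketch (hypothetical helper
name, all-neighbours-available case only): the four top and four left
samples are summed, 4 is added as half the divisor, and the result is
shifted right by 3.  With eight neighbours of value 2 the sum is 16 and
(16+4)>>3 = 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DC value for a 4x4 block when top and left neighbours are all valid:
 * rounded mean of the 4 top and 4 left samples. */
static uint8_t sketch_dc_4x4_value(const uint8_t *dst, ptrdiff_t stride)
{
    assert(stride >= 5);              /* room for the left column */
    unsigned sum = 0;
    for (int i = 0; i < 4; i++)
        sum += dst[-stride + i] + dst[i * stride - 1];
    return (uint8_t)((sum + 4) >> 3);
}
```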

What this does NOT cover (next-step backlog):
  - Intra_16x16 luma prediction (4 modes per H.264 §8.3.3): vertical,
    horizontal, DC, plane.
  - Intra_8x8 chroma prediction (4 modes per H.264 §8.3.4): DC,
    horizontal, vertical, plane.
  - Intra_8x8 luma prediction (High profile, 9 modes per §8.3.2) —
    these are the High-profile siblings of the modes in this PR with
    the 1-2-1 smoothing pre-filter.  Different but well-defined.
  - Neighbour availability fallback (top-edge MB, left-edge MB,
    slice-boundary, top-right unavailable in some positions).
  - Dispatch wrappers — these refs aren't surfaced through
    daedalus_dispatch_*().  Whether to do that depends on the
    daedalus-decoder Stage 2a architecture (per-block CPU vs
    per-diagonal GPU wavefront — TBD).
2026-05-25 00:14:51 +02:00