Phase 8c: H.264 luma qpel mc20 through public API

Extends daedalus-fourier with daedalus_recipe_dispatch_h264_qpel_mc20
so libavcodec.so can route H264QpelContext.put_h264_qpel_pixels_tab[1][2]
through the recipe layer instead of ff_put_h264_qpel8_mc20_neon directly.

API additions (header + library):
  - daedalus_h264_qpel_meta { dst_off, src_off }
  - daedalus_dispatch_h264_qpel_mc20(ctx, sub, dst, src, stride,
                                     n_blocks, meta)
  - daedalus_recipe_dispatch_h264_qpel_mc20(...)  (AUTO wrapper)
  - DAEDALUS_KERNEL_H264_QPEL_MC20 = 9 in the recipe-query enum
  - daedalus_recipe_substrate_for() returns CPU NEON for cycle 9
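
As a rough illustration of how a batched call with per-block metadata
could be walked, here is a minimal sketch. Everything prefixed sketch_
is hypothetical: the field types of the meta struct, the one-entry-per-
block layout and the stand-in copy kernel are assumptions, not the
library's actual definitions. Only the kernel signature is borrowed
from FFmpeg's qpel_mc_func (dst, src, stride).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of daedalus_h264_qpel_meta; field types are assumed. */
typedef struct {
    uint32_t dst_off;  /* byte offset of the block's top-left pixel in dst */
    uint32_t src_off;  /* byte offset of the block's output column 0 in src */
} sketch_qpel_meta;

/* Same shape as FFmpeg's qpel_mc_func: one stride shared by dst and src. */
typedef void (*sketch_qpel_fn)(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride);

/* Stand-in kernel for the sketch: copies an 8x8 block, no filtering. */
static void sketch_copy8x8(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = src[x];
        dst += stride;
        src += stride;
    }
}

/* Run the kernel once per meta entry, applying the per-block offsets. */
static void sketch_dispatch_blocks(sketch_qpel_fn fn, uint8_t *dst,
                                   const uint8_t *src, ptrdiff_t stride,
                                   size_t n_blocks,
                                   const sketch_qpel_meta *meta)
{
    for (size_t i = 0; i < n_blocks; i++)
        fn(dst + meta[i].dst_off, src + meta[i].src_off, stride);
}
```

With this shape a caller only fills in offsets; the base pointers and
the shared stride are passed through unchanged.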

The 6-tap horizontal half-pel filter signature matches FFmpeg's
H264QpelContext convention exactly: dst and src share a single stride
and src already points at output column 0 (filter reads cols -2..+3).
The single-stride API makes the marfrit-packages FFmpeg shim a
straight pointer-pass with no buffer rearrangement.
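
For readers unfamiliar with the filter, the following is a minimal
scalar sketch of an 8x8 "put" at the horizontal half-pel position,
using the standard H.264 6-tap kernel (1, -5, 20, 20, -5, 1) with
rounding and a shift by 5. The function name is hypothetical and this
is not the contents of tests/h264_qpel8_mc20_ref.c; it only shows the
pointer convention described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Round-shift by 5 and clamp to 0..255 (v already includes the +16). */
static uint8_t sketch_clip_shift5(int v)
{
    if (v < 0)
        return 0;
    v >>= 5;
    return v > 255 ? 255 : (uint8_t)v;
}

/* src points at output column 0; each row reads columns -2..+3 around x. */
static void sketch_h264_qpel8_mc20_ref(uint8_t *dst, const uint8_t *src,
                                       ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int v = (src[x]     + src[x + 1]) * 20
                  - (src[x - 1] + src[x + 2]) * 5
                  + (src[x - 2] + src[x + 3]);
            dst[x] = sketch_clip_shift5(v + 16);
        }
        dst += stride;  /* dst and src advance by the same stride */
        src += stride;
    }
}
```

The caller must guarantee 2 readable bytes to the left and 3 to the
right of each 8-pixel row, which is why src is passed pre-offset
rather than pointing at the start of the padded row.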

Verdict per docs/k9_h264qpel_mc20.md: CPU NEON.  A per-block cost of
7.6 ns gives a 135x margin over 30 fps 1080p; the QPU dispatch floor
of ~250 ns makes any V3D shader strictly worse.  The recipe table
reflects that: the recipe_dispatch entry is a one-line forward to the
CPU path.
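
One way to reproduce a margin of roughly 135x is sketched below. It
assumes a worst case in which every 8x8 luma block of a 1920x1080
frame takes exactly one mc20 call; that reading of the figure is an
assumption, and docs/k9_h264qpel_mc20.md remains the authoritative
derivation. The helper name is hypothetical.

```c
/* Real-time margin: how many times faster than required the kernel is,
 * assuming one kernel call per 8x8 block of every frame. */
static double sketch_mc20_margin(double ns_per_block, int width, int height,
                                 double fps)
{
    double blocks_per_frame = (double)width * height / 64.0; /* 8x8 pixels */
    double busy_s_per_s = blocks_per_frame * fps * ns_per_block * 1e-9;
    return 1.0 / busy_s_per_s;
}
```

Calling sketch_mc20_margin(7.6, 1920, 1080, 30.0) works through
32400 blocks per frame, 972000 blocks per second and about 7.4 ms of
kernel time per second of video.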

CMakeLists changes:
  - h264qpel_neon.S added to the daedalus_core static lib (only the
    bench targets owned it before; now the public API needs it too)
  - tests/h264_qpel8_mc20_ref.c added to the test_api_h264 target

The Phase 8a/8b smoke test gains a 4th case (test_qpel_mc20):
1024/1024 bytes bit-exact via daedalus_recipe_dispatch_h264_qpel_mc20.
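
A bit-exact check of this kind can be as simple as counting matching
bytes between the dispatched output and a C reference. The helper
below is a hypothetical sketch, not the code in the test_api_h264
target.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the positions where the dispatched output equals the reference. */
static size_t sketch_count_matching(const uint8_t *got, const uint8_t *ref,
                                    size_t n)
{
    size_t ok = 0;
    for (size_t i = 0; i < n; i++)
        ok += (got[i] == ref[i]);
    return ok;
}
```

A case passes only when the returned count equals n, which is what a
report such as 1024/1024 expresses.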

Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 9.
2026-05-23 03:25:24 +02:00
parent d87239d817
commit 8fdef27a7d
4 changed files with 116 additions and 0 deletions
@@ -365,6 +365,7 @@ add_library(daedalus_core STATIC
 ${FFC_MC_SOURCES}
 ${FFASM_H264IDCT_SOURCES}
 ${FFASM_H264DSP_SOURCES}
+${FFASM_H264QPEL_SOURCES}
 ${DAV1D_CDEF_ASM_SOURCES}
 ${DAV1D_CDEF_C_SOURCES}
 )
@@ -458,6 +459,7 @@ add_executable(test_api_h264
 tests/h264_idct4_ref.c
 tests/h264_idct8_ref.c
 tests/h264_deblock_ref.c
+tests/h264_qpel8_mc20_ref.c
 )
 target_link_libraries(test_api_h264 PRIVATE daedalus_core)
 target_compile_options(test_api_h264 PRIVATE -O2)