Phase 8c: H.264 luma qpel mc20 through public API
Extends daedalus-fourier with daedalus_recipe_dispatch_h264_qpel_mc20
so libavcodec.so can route H264QpelContext.put_h264_qpel_pixels_tab[1][2]
through the recipe layer instead of ff_put_h264_qpel8_mc20_neon directly.
API additions (header + library):
- daedalus_h264_qpel_meta { dst_off, src_off }
- daedalus_dispatch_h264_qpel_mc20(ctx, sub, dst, src, stride,
n_blocks, meta)
- daedalus_recipe_dispatch_h264_qpel_mc20(...) (AUTO wrapper)
- DAEDALUS_KERNEL_H264_QPEL_MC20 = 9 in the recipe-query enum
- daedalus_recipe_substrate_for() returns CPU NEON for cycle 9
The 6-tap horizontal half-pel filter signature matches FFmpeg's
H264QpelContext convention exactly: dst and src share a single stride
and src already points at output column 0 (filter reads cols -2..+3).
Single-stride API to make the marfrit-packages FFmpeg shim a
straight pointer-pass; no buffer rearrangement.
Verdict per docs/k9_h264qpel_mc20.md: CPU NEON. Per-block 7.6 ns
gives 135x margin over 30 fps 1080p; QPU dispatch floor at ~250 ns
makes any V3D shader strictly worse. Recipe table reflects that —
the recipe_dispatch entry is a one-line forward to the CPU path.
CMakeLists changes:
- h264qpel_neon.S added to the daedalus_core static lib (only the
bench targets owned it before; now the public API needs it too)
- tests/h264_qpel8_mc20_ref.c added to the test_api_h264 target
Phase 8a/8b smoke gains a 4th case (test_qpel_mc20): 1024/1024
bytes bit-exact via daedalus_recipe_dispatch_h264_qpel_mc20.
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 9.
This commit is contained in:
@@ -263,6 +263,39 @@ int daedalus_dispatch_h264_deblock_luma_v(daedalus_ctx *ctx, daedalus_substrate
|
||||
uint8_t *dst, size_t dst_stride,
|
||||
size_t n_edges, const daedalus_h264_deblock_meta *meta);
|
||||
|
||||
/* -------------------------------------------------------------------
|
||||
* H.264 luma qpel mc20 (8×8, horizontal half-pel) — cycle 9
|
||||
* (CPU by recipe; per-block 7.6 ns NEON, QPU not viable — see
|
||||
* docs/k9_h264qpel_mc20.md for the R-band rationale).
|
||||
*
|
||||
* Per H.264 §8.4.2.2.1, horizontal half-pel luma 6-tap filter:
|
||||
* dst[r,c] = clip255((s[r,c-2] - 5*s[r,c-1] + 20*s[r,c]
|
||||
* + 20*s[r,c+1] - 5*s[r,c+2] + s[r,c+3]
|
||||
* + 16) >> 5)
|
||||
*
|
||||
* Single-stride: dst and src share `stride`; this matches FFmpeg's
|
||||
* H264QpelContext.put_h264_qpel_pixels_tab[][] convention and the
|
||||
* vendored ff_put_h264_qpel8_mc20_neon signature.
|
||||
*
|
||||
* `src + src_off` points at the leftmost OUTPUT column (col 0); the
|
||||
* filter reads cols -2..+3, so the caller must guarantee src has at
|
||||
* least 2 pixels of left context and 3 pixels of right context per
|
||||
* row. (FFmpeg already maintains an edge-emulated buffer for the
|
||||
* frame boundary; this matches that contract.)
|
||||
* ----------------------------------------------------------------- */
|
||||
typedef struct {
|
||||
uint32_t dst_off; /* byte offset into dst (block top-left) */
|
||||
uint32_t src_off; /* byte offset into src (col 0, row 0) */
|
||||
} daedalus_h264_qpel_meta;
|
||||
|
||||
int daedalus_recipe_dispatch_h264_qpel_mc20(daedalus_ctx *ctx,
|
||||
uint8_t *dst, const uint8_t *src, size_t stride,
|
||||
size_t n_blocks, const daedalus_h264_qpel_meta *meta);
|
||||
|
||||
int daedalus_dispatch_h264_qpel_mc20(daedalus_ctx *ctx, daedalus_substrate sub,
|
||||
uint8_t *dst, const uint8_t *src, size_t stride,
|
||||
size_t n_blocks, const daedalus_h264_qpel_meta *meta);
|
||||
|
||||
/* -------------------------------------------------------------------
|
||||
* Recipe query — what does the API recommend for each kernel?
|
||||
* ----------------------------------------------------------------- */
|
||||
@@ -275,6 +308,7 @@ typedef enum {
|
||||
DAEDALUS_KERNEL_H264_IDCT4 = 6,
|
||||
DAEDALUS_KERNEL_H264_IDCT8 = 7,
|
||||
DAEDALUS_KERNEL_H264_DEBLOCK_LV = 8,
|
||||
DAEDALUS_KERNEL_H264_QPEL_MC20 = 9,
|
||||
} daedalus_kernel;
|
||||
|
||||
daedalus_substrate daedalus_recipe_substrate_for(daedalus_kernel k);
|
||||
|
||||
Reference in New Issue
Block a user