h264: qpel avg anchors (avg_mc20/02/22, biprediction support)

Begins the avg_ qpel buildout for B-slice biprediction. Each avg_
form computes the same half-pel formula as its put_ sibling, then
L2-averages the result with the existing dst contents using the
rounding average (dst + pred + 1) >> 1. The caller pre-loads dst
with the list0 prediction; the avg_ call then averages in list1
per H.264 §8.4.2.3.1.
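
For readers unfamiliar with the avg_ semantics, here is a minimal scalar
sketch of what an 8x8 avg_mc20 block computes. It is not the vendored
NEON kernel and not the reference in tests/; the names
ref_avg_mc20_8x8 and round_shift5_clip are hypothetical, and the sketch
assumes src has 2 readable samples to the left and 3 to the right of
every row.

```c
#include <stddef.h>
#include <stdint.h>

/* Round, shift by 5 and clamp to 8 bits: the normalisation step of the
 * H.264 6-tap luma filter. Negative sums are clamped before shifting so
 * the code never right-shifts a negative value. */
static uint8_t round_shift5_clip(int v)
{
    v += 16;
    if (v < 0)
        return 0;
    v >>= 5;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Hypothetical scalar sketch of avg_mc20 for one 8x8 block.
 * dst must already hold the list0 prediction on entry. */
static void ref_avg_mc20_8x8(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            const uint8_t *s = src + x;
            /* Horizontal half-pel taps: (1, -5, 20, 20, -5, 1). */
            int sum = s[-2] - 5 * s[-1] + 20 * s[0]
                    + 20 * s[1] - 5 * s[2] + s[3];
            int put = round_shift5_clip(sum);   /* what put_mc20 would store */
            /* Rounding average of the list1 sample with the list0 sample. */
            dst[x] = (uint8_t)((dst[x] + put + 1) >> 1);
        }
        dst += stride;
        src += stride;
    }
}
```

On a flat source of value 100 the filter returns 100, so a dst
pre-loaded with 50 ends up at (50 + 100 + 1) >> 1 = 75.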

Scope (3 anchors, setting the pattern for the remaining 13 avg_
variants):
- 3 new kernel enum values (AVG_MC20=31, AVG_MC02=32, AVG_MC22=33) → CPU.
- 3 NEON externs for the vendored ff_avg_h264_qpel8_{mc20,mc02,mc22}_neon.
- 3 CPU dispatches via the existing DEFINE_QPEL_CPU_DISPATCH macro
  (the macro is type-agnostic, so it needed no changes for avg_).
- 3 public dispatches via the DEFINE_QPEL_DISPATCH macro.
- 3 recipe wrappers via the DEFINE_QPEL_RECIPE macro.
- tests/h264_qpel8_avg_anchors_ref.c: per-cell helpers + L2 avg.
- Test harness: run_avg_qpel() seeds dst with random content so
  the L2 averaging is actually exercised, rather than a put_-style
  overwrite passing silently.
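
To illustrate why the seeding matters, the sketch below compares a
correct averaging step against a buggy one that overwrites dst like
put_. It is not the actual run_avg_qpel(); every name in it (lcg_next,
seed_bytes, blend_avg, blend_overwrite, count_mismatches) is
hypothetical. When dst happens to equal the prediction, the overwrite
bug is invisible, because (p + p + 1) >> 1 == p; once dst holds
unrelated bytes, the mismatch shows up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_N 256

/* Small deterministic LCG so the seeded dst is reproducible. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 24;            /* keep the top 8 bits */
}

/* Fill buf with pseudo-random bytes derived from seed. */
static void seed_bytes(uint8_t *buf, size_t n, uint32_t seed)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)lcg_next(&seed);
}

typedef void (*blend_fn)(uint8_t *dst, const uint8_t *pred, size_t n);

/* Correct avg_ behaviour: rounding average into dst. */
static void blend_avg(uint8_t *dst, const uint8_t *pred, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((dst[i] + pred[i] + 1) >> 1);
}

/* Buggy avg_ that behaves like put_: ignores dst and overwrites it. */
static void blend_overwrite(uint8_t *dst, const uint8_t *pred, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = pred[i];
}

/* Run the reference and the function under test from the same initial
 * dst contents and count the bytes on which they disagree. */
static size_t count_mismatches(blend_fn under_test, const uint8_t *dst_init,
                               const uint8_t *pred, size_t n)
{
    uint8_t want[MAX_N], got[MAX_N];
    size_t bad = 0;

    if (n > MAX_N)
        n = MAX_N;
    memcpy(want, dst_init, n);
    memcpy(got, dst_init, n);
    blend_avg(want, pred, n);       /* reference result */
    under_test(got, pred, n);       /* candidate result */
    for (size_t i = 0; i < n; i++)
        bad += (want[i] != got[i]);
    return bad;
}
```

A harness in this style would call seed_bytes() on dst before each run
and fail if count_mismatches() returns anything other than zero.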

Verified on hertz:
    $ ./build/test_api_h264 | tail -3
    H.264 qpel avg_mc20: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc02: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc22: 2048/2048 bytes bit-exact (100.0000%)
All 3 anchors bit-exact PASS first try.

Why anchors only in this PR: the avg_ pattern is uniform across all
16 positions (each is just the put_ result L2-averaged with dst).
Landing the anchors first confirms that the macro pattern works for
both put_ and avg_; the remaining 13 (avg_mc10/30/01/03 +
avg_mc11..33) follow the same template in a follow-up PR.

State of the qpel matrix after this PR:
    put_ : 15 of 16 positions ✓ (mc00 is integer copy, no wrapper)
    avg_ :  3 of 16 positions ✓ (mc20, mc02, mc22 anchors)
           13 follow-up positions

@@ -511,6 +511,27 @@ DECLARE_QPEL_DIAG(mc33)
 
 #undef DECLARE_QPEL_DIAG
 
+/* H.264 luma qpel avg_ biprediction anchors — 3 half-pel positions
+ * (the put_ result is L2-averaged into the existing dst buffer per
+ * H.264 §8.4.2.3.1). Caller is responsible for pre-loading dst with
+ * the list0 prediction; the avg_ call adds list1.
+ *
+ * Same single-stride convention as put_; CPU NEON only for now.
+ */
+#define DECLARE_QPEL_AVG(name) \
+    int daedalus_recipe_dispatch_h264_qpel_ ## name(daedalus_ctx *ctx, \
+        uint8_t *dst, const uint8_t *src, size_t stride, \
+        size_t n_blocks, const daedalus_h264_qpel_meta *meta); \
+    int daedalus_dispatch_h264_qpel_ ## name(daedalus_ctx *ctx, daedalus_substrate sub, \
+        uint8_t *dst, const uint8_t *src, size_t stride, \
+        size_t n_blocks, const daedalus_h264_qpel_meta *meta);
+
+DECLARE_QPEL_AVG(avg_mc20)
+DECLARE_QPEL_AVG(avg_mc02)
+DECLARE_QPEL_AVG(avg_mc22)
+
+#undef DECLARE_QPEL_AVG
+
 /* -------------------------------------------------------------------
  * Recipe query — what does the API recommend for each kernel?
  * ----------------------------------------------------------------- */
@@ -545,6 +566,9 @@ typedef enum {
     DAEDALUS_KERNEL_H264_QPEL_MC31 = 28,
     DAEDALUS_KERNEL_H264_QPEL_MC32 = 29,
     DAEDALUS_KERNEL_H264_QPEL_MC33 = 30,
+    DAEDALUS_KERNEL_H264_QPEL_AVG_MC20 = 31,
+    DAEDALUS_KERNEL_H264_QPEL_AVG_MC02 = 32,
+    DAEDALUS_KERNEL_H264_QPEL_AVG_MC22 = 33,
 } daedalus_kernel;
 
 daedalus_substrate daedalus_recipe_substrate_for(daedalus_kernel k);