h264: qpel avg anchors (avg_mc20/02/22, biprediction support) #19

Merged
marfrit merged 1 commit from noether/h264-qpel-avg-anchors into main 2026-05-25 06:45:37 +00:00
Owner

Begins the avg_ qpel buildout for B-slice biprediction. Each avg_ form computes the same half-pel sample as its put_ sibling, then L2-averages it (rounded mean, (a + b + 1) >> 1) with the existing dst contents per H.264 §8.4.2.3.1. The caller pre-loads dst with the list0 prediction; the avg_ call then blends in list1.
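For reference, a scalar model of what an avg_ anchor computes might look like the sketch below. It is not the vendored NEON kernel and not the contents of tests/h264_qpel8_avg_anchors_ref.c; the names `halfpel_h` and `ref_avg_mc20_8x8` are hypothetical. It assumes 8-bit samples, the standard H.264 6-tap half-pel filter (1, -5, 20, 20, -5, 1), and one stride shared by src and dst.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Horizontal half-pel sample between src[0] and src[1]:
 * 6-tap filter (1, -5, 20, 20, -5, 1), +16 rounding, >> 5, clamped to 0..255. */
static uint8_t halfpel_h(const uint8_t *src)
{
    int v = src[-2] - 5 * src[-1] + 20 * src[0]
          + 20 * src[1] - 5 * src[2] + src[3] + 16;
    if (v < 0)
        return 0;
    v >>= 5;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Hypothetical scalar model of avg_mc20 on an 8x8 block.
 * dst must already hold the list0 prediction; each src row needs
 * 2 readable samples to the left and 3 to the right of the block. */
static void ref_avg_mc20_8x8(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    assert(stride >= 8);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int p1 = halfpel_h(src + x);                /* list1 half-pel */
            dst[x] = (uint8_t)((dst[x] + p1 + 1) >> 1); /* rounded mean   */
        }
        dst += stride;
        src += stride;
    }
}
```

A put_ model would differ only in the store: `dst[x] = p1` instead of the rounded mean.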

Adds 3 new kernel enums (AVG_MC20=31, AVG_MC02=32, AVG_MC22=33). The existing macros (DEFINE_QPEL_CPU_DISPATCH, DEFINE_QPEL_DISPATCH, DEFINE_QPEL_RECIPE) handle both put_ and avg_ without changes.
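To illustrate why a macro of this kind can serve both families unchanged, here is a minimal sketch. It does not reproduce the real DEFINE_QPEL_DISPATCH; the macro `DEMO_DEFINE_QPEL_DISPATCH`, the stand-in kernels, and the function signature are all assumptions. The point is only that the macro pastes a name and forwards the call, so whether the kernel overwrites or averages is invisible to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in 8x8 kernels (hypothetical; the real ones are vendored NEON). */
static void demo_put_kernel(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] = src[y * stride + x];
}

static void demo_avg_kernel(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] =
                (uint8_t)((dst[y * stride + x] + src[y * stride + x] + 1) >> 1);
}

/* Hypothetical dispatch macro: it only builds an identifier from `name`
 * and forwards the arguments, so it never needs to know put_ from avg_. */
#define DEMO_DEFINE_QPEL_DISPATCH(name, kernel)                        \
    static void demo_dispatch_##name(uint8_t *dst, const uint8_t *src, \
                                     ptrdiff_t stride)                 \
    {                                                                  \
        assert(dst != NULL && src != NULL);                            \
        kernel(dst, src, stride);                                      \
    }

DEMO_DEFINE_QPEL_DISPATCH(put_mc20, demo_put_kernel)
DEMO_DEFINE_QPEL_DISPATCH(avg_mc20, demo_avg_kernel)
```

Under these assumptions, adding an avg_ position is one more macro invocation with a different kernel symbol.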

The new test harness run_avg_qpel() seeds dst with random content so that the L2 averaging is actually exercised, rather than letting a put_-style overwrite pass silently.
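The idea behind seeding dst can be sketched as follows. This is not run_avg_qpel() itself; `count_matches`, `ref_avg`, `bad_overwrite`, and the small LCG are hypothetical helpers, and the block is simplified to 64 contiguous bytes. Both dst buffers start from the same pseudo-random bytes, so a kernel that ignores the prior dst contents diverges from the reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 64 /* one 8x8 block stored contiguously */

typedef void (*avg_fn)(uint8_t *dst, const uint8_t *pred);

/* Small deterministic LCG; returns the top 8 bits of the new state. */
static uint32_t lcg_next(uint32_t *s)
{
    *s = *s * 1664525u + 1013904223u;
    return *s >> 24;
}

/* Reference: rounded mean of the new prediction and the existing dst. */
static void ref_avg(uint8_t *dst, const uint8_t *pred)
{
    for (int i = 0; i < N; i++)
        dst[i] = (uint8_t)((dst[i] + pred[i] + 1) >> 1);
}

/* Deliberately wrong kernel: put_-style overwrite that ignores dst. */
static void bad_overwrite(uint8_t *dst, const uint8_t *pred)
{
    for (int i = 0; i < N; i++)
        dst[i] = pred[i];
}

/* Seed both dst buffers identically, run reference and kernel under
 * test, and return the number of bytes that match. */
static int count_matches(avg_fn dut, uint32_t seed)
{
    uint8_t pred[N], dst_ref[N], dst_dut[N];
    int same = 0;

    assert(dut != NULL);
    for (int i = 0; i < N; i++) {
        pred[i] = (uint8_t)lcg_next(&seed);
        dst_ref[i] = dst_dut[i] = (uint8_t)lcg_next(&seed);
    }
    ref_avg(dst_ref, pred);
    dut(dst_dut, pred);
    for (int i = 0; i < N; i++)
        same += (dst_ref[i] == dst_dut[i]);
    return same;
}
```

In this sketch, `count_matches(ref_avg, seed)` returns N, while the overwriting kernel falls short of N as soon as any seeded dst byte differs from the prediction by more than the rounding step.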

All 3 anchors pass bit-exact (2048/2048 bytes each) on the first try.

Qpel matrix state:

  • put_: 15 of 16 positions ✓ (mc00 is integer copy)
  • avg_: 3 of 16 positions ✓ (this PR — anchors)
  • 13 avg_ follow-ups: avg_mc10/30/01/03 + avg_mc11..33 (same template)
marfrit added 1 commit 2026-05-25 06:35:39 +00:00
Begins the avg_ qpel buildout for B-slice biprediction.  Each avg_
form computes the same half-pel formula as its put_ sibling, then
L2-averages the result with the existing dst contents — the caller
pre-loads dst with the list0 prediction; the avg_ call adds list1
per H.264 §8.4.2.3.1.

Scope (3 anchors, sets the pattern for the remaining 13 avg_
variants):
  - 3 new kernel enums (AVG_MC20=31, AVG_MC02=32, AVG_MC22=33) → CPU.
  - 3 NEON externs for the vendored ff_avg_h264_qpel8_{mc20,mc02,mc22}_neon.
  - 3 CPU dispatches via existing DEFINE_QPEL_CPU_DISPATCH macro
    (the macro is type-agnostic so it didn't need changes for avg_).
  - 3 public dispatches via DEFINE_QPEL_DISPATCH macro.
  - 3 recipe wrappers via DEFINE_QPEL_RECIPE macro.
  - tests/h264_qpel8_avg_anchors_ref.c — per-cell helpers + L2 avg.
  - Test harness: run_avg_qpel() seeds dst with random content so
    the L2 averaging is actually exercised (not just put_-style
    overwrite that would silently pass).

Verified on hertz:

  $ ./build/test_api_h264 | tail -3
    H.264 qpel avg_mc20: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc02: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc22: 2048/2048 bytes bit-exact (100.0000%)

  All 3 anchors bit-exact PASS first try.

Why anchors only in this PR: the avg_ pattern is uniform across all
16 positions (each is just "put_ result + L2 with dst").  Landing
the anchors first confirms the macro pattern works for both put_
and avg_; the remaining 13 (avg_mc10/30/01/03 + avg_mc11..33) follow
the same template in a follow-up PR.

State of the qpel matrix after this PR:
  put_ : 15 of 16 positions ✓ (mc00 is integer copy, no wrapper)
  avg_ :  3 of 16 positions ✓ (mc20, mc02, mc22 anchors)
        13 follow-up positions
marfrit merged commit 1cc0990c9f into main 2026-05-25 06:45:37 +00:00
marfrit deleted branch noether/h264-qpel-avg-anchors 2026-05-25 06:45:42 +00:00

Reference: marfrit/daedalus-fourier#19