h264: qpel avg — 12 remaining variants (closes the matrix) #20

Merged
marfrit merged 1 commits from noether/h264-qpel-avg-rest into main 2026-05-25 07:33:06 +00:00
Owner

Closes the H.264 8x8 qpel buildout. Adds the 12 remaining avg_ biprediction positions (4 quarter-axis + 8 diagonals) via the established macro template.

12 new kernel enums (AVG_MC10..MC33 = 34..45). All wiring via existing DEFINE_QPEL_* macros from prior PRs — no new infrastructure. References via two parametric macros (DEFINE_AVG_QUARTER / DEFINE_AVG_DIAG) collapse ~300 LOC of boilerplate into ~80.

All 12 positions PASS 2048/2048 bytes bit-exact first try (after one self-inflicted macro typo caught at compile time).

Final qpel matrix state: put_ 15/16 and avg_ 15/16 (mc00 excluded — integer copy). Every B-slice biprediction case is now serviceable. QPU shaders remain mc20-only (cycle 9); the other 29 positions are CPU NEON via vendored FFmpeg.

Closes the H.264 8x8 qpel buildout. Adds the 12 remaining avg_ biprediction positions (4 quarter-axis + 8 diagonals) via the established macro template. 12 new kernel enums (AVG_MC10..MC33 = 34..45). All wiring via existing DEFINE_QPEL_* macros from prior PRs — no new infrastructure. References via two parametric macros (DEFINE_AVG_QUARTER / DEFINE_AVG_DIAG) collapse ~300 LOC of boilerplate into ~80. **All 12 positions PASS 2048/2048 bytes bit-exact first try** (after one self-inflicted macro typo caught at compile time). **Final qpel matrix state**: put_ 15/16 and avg_ 15/16 (mc00 excluded — integer copy). Every B-slice biprediction case is now serviceable. QPU shaders remain mc20-only (cycle 9); the other 29 positions are CPU NEON via vendored FFmpeg.
marfrit added 1 commit 2026-05-25 06:49:57 +00:00
Closes the H.264 8x8 qpel buildout.  Adds the remaining 12 avg_
biprediction positions:
  4 quarter-axis: avg_mc{10,30,01,03}
  8 diagonals  : avg_mc{11,12,13,21,23,31,32,33}

Each follows the established pattern: same half-pel formula as the
put_ sibling, then L2 average with the existing dst contents per
H.264 §8.4.2.3.1.

Scope:
  - 12 new kernel enums (MC10..MC33 avg_ = 34..45) → CPU.
  - 12 NEON externs for the vendored ff_avg_h264_qpel8_mc*_neon.
  - 12 CPU dispatches via existing DEFINE_QPEL_CPU_DISPATCH macro.
  - 12 public dispatches via DEFINE_QPEL_DISPATCH macro.
  - 12 recipe wrappers via DEFINE_QPEL_RECIPE macro.
  - 12 header decls via DECLARE_QPEL_AVG macro.
  - tests/h264_qpel8_avg_rest_ref.c — references via two parametric
    macros: DEFINE_AVG_QUARTER for the 4 ¼-pel L2 forms,
    DEFINE_AVG_DIAG for the 8 two-half-pel-avg forms.
  - Test harness extended with a RUN(MC) sub-macro that derives both
    the ref name and dispatch name from the bare mcXX.  (The ref
    is daedalus_avg_h264_qpel8_<mc>_ref; the dispatch is
    daedalus_recipe_dispatch_h264_qpel_avg_<mc>.  Macro had a typo
    on first try that duplicated "avg_" in the ref name — caught at
    compile, fixed.)

Verified on hertz:

  $ ./build/test_api_h264 | tail -12
    H.264 qpel avg_mc10: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc30: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc01: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc03: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc11: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc12: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc13: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc21: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc23: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc31: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc32: 2048/2048 bytes bit-exact (100.0000%)
    H.264 qpel avg_mc33: 2048/2048 bytes bit-exact (100.0000%)

  All 12 new positions bit-exact PASS first try.

Final qpel matrix state:
  put_:  mc00 (none — integer copy)
         mc01 ✓  mc02 ✓  mc03 ✓
         mc10 ✓  mc11 ✓  mc12 ✓  mc13 ✓
         mc20 ✓ (QPU+CPU)  mc21 ✓  mc22 ✓  mc23 ✓
         mc30 ✓  mc31 ✓  mc32 ✓  mc33 ✓
  avg_:  same 15-of-16 coverage, all CPU.

Every B-slice biprediction case the libavcodec intercept can throw
at us is now serviceable.  QPU shaders remain mc20-only (cycle 9);
the other 29 positions are CPU NEON.  Whether to write more QPU
shaders depends on real perf measurement — at NEON ~10 ns per
8x8 block, full qpel coverage at 1080p is ~2-3 ms of total work,
well inside budget.
marfrit merged commit 1ee8b1c0ab into main 2026-05-25 07:33:06 +00:00
marfrit deleted branch noether/h264-qpel-avg-rest 2026-05-25 07:33:06 +00:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marfrit/daedalus-fourier#20