Adds the "j position" 2D half-pel via cascaded H + V 6-tap lowpass
with intermediate 16-bit precision per H.264 §8.4.2.2.1. One of the
most common qpel positions in real H.264 streams — many encoders
emit 1/2-1/2 motion vectors as their best-RD choice.
Algorithmically distinct from the 1D mc20/mc02 siblings:
- Horizontal 6-tap produces 13 rows of int16 intermediate (no
per-stage clip/round — full precision retained).
- Vertical 6-tap on the intermediate, then +512 >> 10 (the
double-shift compensates for both 6-tap scalings) + clip255.
The intermediate-precision requirement means the C reference can't
just be "call mc20 then mc02" — that would double-clip and produce
the wrong result. The 13-row int16 tmp[] buffer is the central
invariant.
Scope (same pattern as mc02 PR #15):
- Public API: daedalus_dispatch_h264_qpel_mc22 + recipe wrapper.
- Internal: dispatch_h264_qpel_mc22_cpu calling
ff_put_h264_qpel8_mc22_neon.
- Recipe table: DAEDALUS_KERNEL_H264_QPEL_MC22 = 18 → CPU.
- C reference: tests/h264_qpel8_mc22_ref.c — explicit tmp[13][8]
int16 staging buffer; spec-derived shifts and rounding.
- Test: test_qpel_mc22 in test_api_h264, 8 tiles at 16×16 with
output positioned at (SRC_ROW=3, SRC_COL=3) so the kernel's
[-2 .. +10] read window stays in-tile.
Verified on hertz:
$ ./build/test_api_h264 | tail -5
H.264 deblock chroma v intra: 256/256 bytes bit-exact (100.0000%)
H.264 deblock chroma h intra: 256/256 bytes bit-exact (100.0000%)
H.264 qpel mc20: 1024/1024 bytes bit-exact (100.0000%)
H.264 qpel mc02: 2048/2048 bytes bit-exact (100.0000%)
H.264 qpel mc22: 2048/2048 bytes bit-exact (100.0000%)
All 13 H.264 kernels in api_smoke now bit-exact PASS.
mc22 being right first try is meaningful — the +512 >> 10 scaling
+ int16 intermediate sequence has multiple sign/shift/clip pitfalls
and any of them would surface on random inputs immediately.
Coverage matrix update:
put_ mc20 ✓ (QPU+CPU) put_ mc02 ✓ (CPU) put_ mc22 ✓ (CPU)
→ 12 single put_ positions still missing (¼/¾ + HV combos with
L2 averaging).