854bdeda20
Adds the H.264 §8.5.11.1 chroma DC Hadamard transform. In 4:2:0
chroma, the four DC coefficients of a macroblock (one from each 4x4
chroma block) go through a 2x2 Hadamard transform. The result is then
quant-scaled and written back to each block's [0,0] coefficient
before the 4x4 IDCT.

This PR ships the pure Hadamard transform:

    f[0,0] = c[0,0] + c[0,1] + c[1,0] + c[1,1]
    f[0,1] = c[0,0] - c[0,1] + c[1,0] - c[1,1]
    f[1,0] = c[0,0] + c[0,1] - c[1,0] - c[1,1]
    f[1,1] = c[0,0] - c[0,1] - c[1,0] + c[1,1]

implemented as the 2-stage row+col butterfly (1:1 with the NEON
SIMD shape upstream). Operates in-place on int16[4].
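
To make the equivalence between the four equations and the butterfly concrete, here is a minimal sketch that evaluates both and compares them. All function names are hypothetical and chosen for illustration; the butterfly follows the shape described above rather than quoting the shipped file.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: direct evaluation of the four f[i,j] equations. */
static void hadamard_2x2_direct(const int16_t c[4], int16_t f[4])
{
    f[0] = (int16_t)(c[0] + c[1] + c[2] + c[3]);
    f[1] = (int16_t)(c[0] - c[1] + c[2] - c[3]);
    f[2] = (int16_t)(c[0] + c[1] - c[2] - c[3]);
    f[3] = (int16_t)(c[0] - c[1] - c[2] + c[3]);
}

/* Hypothetical name: two-stage butterfly (rows, then columns), in place. */
static void hadamard_2x2_butterfly(int16_t c[4])
{
    int t0 = c[0] + c[1], t1 = c[0] - c[1];
    int t2 = c[2] + c[3], t3 = c[2] - c[3];
    c[0] = (int16_t)(t0 + t2);
    c[1] = (int16_t)(t1 + t3);
    c[2] = (int16_t)(t0 - t2);
    c[3] = (int16_t)(t1 - t3);
}

/* Returns 1 when both forms produce the same output for this input. */
static int forms_agree(const int16_t in[4])
{
    int16_t direct[4], bfly[4];
    hadamard_2x2_direct(in, direct);
    memcpy(bfly, in, sizeof bfly);
    hadamard_2x2_butterfly(bfly);
    return memcmp(direct, bfly, sizeof direct) == 0;
}

/* Small self-check over a few inputs, including negative values. */
static void forms_self_check(void)
{
    const int16_t cases[3][4] = {
        { 1, 2, 3, 4 }, { -5, 5, -5, 5 }, { 10, 0, 0, 10 }
    };
    for (int i = 0; i < 3; i++)
        assert(forms_agree(cases[i]));
}
```

Calling forms_self_check() from a test main is enough to exercise it. The butterfly needs 8 additions where the direct form needs 12, which is part of why it maps well to SIMD.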

What this does NOT do (deferred to caller-side composition):

  - QP-dependent scaling per §8.5.11.2. The scale depends on QP_C
    (derived from the luma QP plus a chroma QP index offset), so the
    formula has branches (>=6 vs <6) and looks up LevelScale4x4
    table values. The libavcodec intercept patch composes Hadamard +
    scale + shift itself, since the scale shape varies with
    codec-level context (slice-level QP, plus the PPS
    chroma_qp_index_offset for Cb and second_chroma_qp_index_offset
    for Cr).
  - A separate inverse transform. The Hadamard applied at decode time
    is the same matrix as the encoder's forward transform up to
    scaling; the spec treats the two as conceptually distinct in
    §8.5.11, but we expose only the matrix.
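
For orientation only, a rough sketch of what the caller-side scaling could look like, following the ">=6 vs <6" branch shape described above. Everything here is an assumption: the function and table names are hypothetical, flat scaling lists are assumed, and the table is a stand-in for the LevelScale4x4 lookup at position (0,0) with the constant flat weight folded into the shift. Rounding and table contents should be checked against §8.5.11.2 of the spec edition being targeted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: flat scaling lists, so only the per-(QP % 6) factor for
 * coefficient position (0,0) matters. Stand-in for LevelScale4x4. */
static const int dc_scale_flat[6] = { 10, 11, 13, 14, 16, 18 };

/* Hypothetical helper: scale one Hadamard output f for a non-negative
 * chroma QP qpc, using the two-branch form. */
static int chroma_dc_scale_sketch(int f, int qpc)
{
    int v = f * dc_scale_flat[qpc % 6];
    if (qpc >= 6)
        return v * (1 << (qpc / 6 - 1)); /* multiply: no left shift of a negative */
    return v >> 1; /* assumes arithmetic right shift when v is negative */
}

/* Hand-computed spot checks of the arithmetic above. */
static void scale_self_check(void)
{
    assert(chroma_dc_scale_sketch(20, 0) == 100);  /* (20 * 10) >> 1 */
    assert(chroma_dc_scale_sketch(20, 6) == 200);  /* (20 * 10) * 1  */
    assert(chroma_dc_scale_sketch(20, 13) == 440); /* (20 * 11) * 2  */
}
```

Keeping this outside the primitive means the Hadamard itself stays free of QP state, which is the split this PR makes.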

Test design (tests/test_chroma_dc_hadamard.c):

7 cases, all spec-derived hand-computations:

  - all-uniform 5                 → [20, 0, 0, 0]
  - col gradient [0,10,0,10]      → [20, -20, 0, 0]
  - row gradient [0,0,10,10]      → [20, 0, -20, 0]
  - anti-diagonal [10,0,0,10]     → [20, 0, 0, 20]
  - asymmetric [1,2,3,4]          → [10, -2, -4, 0]
  - sign-alternating [-5,5,-5,5]  → [0, -20, 0, 0]
  - double-Hadamard invariant: H·H = 4·I, so applying twice
    gives [4*c[0], 4*c[1], 4*c[2], 4*c[3]] for any input that
    stays within int16 range.

The double-Hadamard test is the strongest correctness gate: any
single sign error in the butterfly breaks the H·H = 4·I algebraic
property and surfaces immediately. It cannot detect f[0,1] and f[1,0]
being swapped, since that variant also squares to 4·I; the row and
column gradient cases cover that. All 7 passed on the first run.
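
A minimal sketch of the double-application check, written against a function pointer so it can be run on more than one implementation. The names are hypothetical and the two transforms are stand-ins written for this example, not the code under test in the repository. The second stand-in deliberately swaps f[0,1] and f[1,0] to show which kind of bug the invariant does not see.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*hadamard_fn)(int16_t c[4]);

/* Stand-in for a correct in-place 2x2 Hadamard. */
static void hadamard_ok(int16_t c[4])
{
    int t0 = c[0] + c[1], t1 = c[0] - c[1];
    int t2 = c[2] + c[3], t3 = c[2] - c[3];
    c[0] = (int16_t)(t0 + t2);
    c[1] = (int16_t)(t1 + t3);
    c[2] = (int16_t)(t0 - t2);
    c[3] = (int16_t)(t1 - t3);
}

/* Stand-in with a deliberate bug: outputs f[0,1] and f[1,0] swapped. */
static void hadamard_swapped(int16_t c[4])
{
    int t0 = c[0] + c[1], t1 = c[0] - c[1];
    int t2 = c[2] + c[3], t3 = c[2] - c[3];
    c[0] = (int16_t)(t0 + t2);
    c[1] = (int16_t)(t0 - t2); /* should be t1 + t3 */
    c[2] = (int16_t)(t1 + t3); /* should be t0 - t2 */
    c[3] = (int16_t)(t1 - t3);
}

/* Returns 1 when applying fn twice yields 4 * in. Inputs must be small
 * enough that 4 * in[i] fits in int16_t. */
static int double_apply_is_4x(hadamard_fn fn, const int16_t in[4])
{
    int16_t c[4] = { in[0], in[1], in[2], in[3] };
    fn(c);
    fn(c);
    for (int i = 0; i < 4; i++)
        if (c[i] != 4 * in[i])
            return 0;
    return 1;
}

/* Returns 1 when fn maps the column gradient to its expected output. */
static int col_gradient_ok(hadamard_fn fn)
{
    int16_t c[4] = { 0, 10, 0, 10 };
    fn(c);
    return c[0] == 20 && c[1] == -20 && c[2] == 0 && c[3] == 0;
}

static void invariant_self_check(void)
{
    const int16_t in[4] = { 1, 2, 3, 4 };
    assert(double_apply_is_4x(hadamard_ok, in));
    assert(col_gradient_ok(hadamard_ok));
    assert(double_apply_is_4x(hadamard_swapped, in)); /* invariant still holds */
    assert(!col_gradient_ok(hadamard_swapped));       /* fixed vector catches it */
}
```

The invariant and the hand-computed vectors are complementary: the first guards the algebra, the second pins down output ordering.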

Verified on hertz:

    $ ./build/test_chroma_dc_hadamard
    all-uniform 5 PASS
    col gradient [0,10,0,10] PASS
    row gradient [0,0,10,10] PASS
    anti-diagonal [10,0,0,10] PASS
    asymmetric [1,2,3,4] PASS
    sign-alternating [-5,5,-5,5] PASS
    double-Hadamard = 4*orig PASS
    ALL chroma DC Hadamard tests PASS

With this primitive the H.264 8-bit 4:2:0 pixel-math primitive
matrix is complete in fourier:

  - IDCT 4x4 (luma + chroma) ✓
  - IDCT 8x8 (luma, High profile) ✓
  - Chroma DC Hadamard 2x2 ✓ (this PR)
  - Deblock (8 variants) ✓
  - Intra prediction (26 modes) ✓
  - MC qpel (30 dispatches) ✓

What remains for the libavcodec intercept patch: CABAC/CAVLC entropy
decode, SPS/PPS parsing, slice header parsing, MB type / QP / CBP /
intra mode prediction. All of that lives at the intercept layer
(it's spec-derived from the bitstream syntax, not pixel-math); the
intercept patch will call into these fourier primitives once the
metadata is decoded.
54 lines
1.9 KiB
C
/*
 * Standalone bit-exact C reference for the H.264 chroma DC 2x2
 * Hadamard transform (per H.264 §8.5.11.1).
 *
 * In 4:2:0 chroma, the four DC coefficients (one from each chroma
 * 4x4 AC block within an MB) are arranged into a 2x2 block:
 *
 *     c[0,0]  c[0,1]      block (0,0) DC   block (0,1) DC
 *     c[1,0]  c[1,1]      block (1,0) DC   block (1,1) DC
 *
 * The 2x2 Hadamard transform:
 *
 *     f[0,0] = c[0,0] + c[0,1] + c[1,0] + c[1,1]
 *     f[0,1] = c[0,0] - c[0,1] + c[1,0] - c[1,1]
 *     f[1,0] = c[0,0] + c[0,1] - c[1,0] - c[1,1]
 *     f[1,1] = c[0,0] - c[0,1] - c[1,0] + c[1,1]
 *
 * Equivalently expressed as 2-stage butterflies (row then col), which
 * the NEON impl uses for SIMD friendliness; we present that form
 * here too so the QPU/NEON ports are 1:1.
 *
 * Output f[] replaces the input c[]. The QP-dependent scaling per
 * §8.5.11.2 happens AFTER this primitive: the intercept patch
 * composes Hadamard + LevelScale + shift itself, since the scaling
 * shape depends on QP and on whether we're in the chroma_qp_offset
 * adjustment regime.
 *
 * Input/output layout:
 *     c[0..3] in row-major order: [c[0,0], c[0,1], c[1,0], c[1,1]]
 *
 * License: BSD-2-Clause. Algorithm is in the H.264 spec.
 */
#include <stdint.h>

void daedalus_h264_chroma_dc_hadamard_2x2_ref(int16_t c[4])
{
    /* Stage 1: butterfly along rows.
     *   t[0] = c[0,0] + c[0,1] = c[0] + c[1]
     *   t[1] = c[0,0] - c[0,1] = c[0] - c[1]
     *   t[2] = c[1,0] + c[1,1] = c[2] + c[3]
     *   t[3] = c[1,0] - c[1,1] = c[2] - c[3]
     */
    int t0 = c[0] + c[1];
    int t1 = c[0] - c[1];
    int t2 = c[2] + c[3];
    int t3 = c[2] - c[3];

    /* Stage 2: butterfly along cols. */
    c[0] = (int16_t)(t0 + t2);   /* f[0,0] = t0+t2 = sum of all 4 */
    c[1] = (int16_t)(t1 + t3);   /* f[0,1] = (c0-c1) + (c2-c3) */
    c[2] = (int16_t)(t0 - t2);   /* f[1,0] = (c0+c1) - (c2+c3) */
    c[3] = (int16_t)(t1 - t3);   /* f[1,1] = (c0-c1) - (c2-c3) */
}