Stage 2 PR-A3b: real H.264 coefficients through daedalus-decoder, byte-exact
Final option-A deliverable. CLI now extracts real per-MB
coefficients from libavcodec via the inspection callback +
side-buffer (marfrit-packages 0016 + 0017), reconstructs the
pre-residual predicted samples P via inverse-of-IDCT-add, and
feeds daedalus-decoder with real (P, C, no edges). Daedalus
output BYTE-EXACT against libavcodec's pre-deblock AVFrame
across 5 frames at 320x240 and 3 frames at 1920x1088, all three
substrates (auto / cpu / qpu).

Path summary
------------
  avctx->thread_count = 1                  single-threaded decode; 0017's side
                                           buffer is per-H264Context, so
                                           multi-threaded decode would race
  avctx->skip_loop_filter = AVDISCARD_ALL  AVFrame stays pre-deblock so the
                                           P-recovery subtraction is exact
  ff_h264_set_mb_inspect_cb                registers the callback

Inspection callback (per MB, fires post-hl_decode_mb):
  - Gate on IS_INTRA4x4 && !IS_8x8DCT && !IS_INTRA_PCM (skipped MBs
    fall back to identity-passthrough in the main loop)
  - Snapshot pre-deblock pixels from h->cur_pic.f->data[0]
  - Read coefficients from h->mb_inspect_coeffs (= sl->mb copy, the
    0017 side buffer)
  - For each 4x4 block (16/MB in raster order, indexed via
    raster_to_zscan[] to find its slot in the z-scan-ordered side
    buffer): compute IDCT(C) using a transcribed H.264 C reference,
    derive P = clip(pre_deblock - ((IDCT + 32) >> 6))
  - Stash per-MB capture (P + C) for the main loop
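
The raster-to-z-scan lookup used in the callback can be cross-checked with a small sketch. The table is the one from this commit; `zscan_from_xy` and `coeff_offset` are hypothetical helpers introduced only for illustration, and the 16-coefficients-per-block stride is taken from the callback code.

```c
#include <assert.h>
#include <stdint.h>

/* Same table as the commit's raster_to_zscan[]: raster block index
 * (sb_y * 4 + sb_x) -> slot in the z-scan-ordered coefficient buffer. */
static const uint8_t raster_to_zscan[16] = {
    0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15
};

/* Hypothetical helper: derive the z-scan slot from block coordinates.
 * The high bits of (sb_y, sb_x) select the 8x8 quadrant, the low bits
 * select the 4x4 block inside that quadrant. */
static int zscan_from_xy(int sb_x, int sb_y)
{
    assert(sb_x >= 0 && sb_x < 4 && sb_y >= 0 && sb_y < 4);
    return ((sb_y >> 1) << 3) | ((sb_x >> 1) << 2)
         | ((sb_y & 1) << 1) | (sb_x & 1);
}

/* Hypothetical helper: offset (in int16_t elements) of a raster-indexed
 * block inside the side buffer, assuming 16 coefficients per block. */
static int coeff_offset(int r_block)
{
    assert(r_block >= 0 && r_block < 16);
    return raster_to_zscan[r_block] * 16;
}
```

The bit-interleaving form and the table agree for all 16 blocks, and applying the table twice returns the original index, which is why the same array serves for both directions.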
Main loop:
- Default identity-passthrough (predicted = AVFrame pixels, coeffs = 0)
- For real-coeffs-valid MBs: override luma with captured P + C
- flush_frame, byte-exact compare against AVFrame
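
The byte-exact compare relies on clip(P + residual) reproducing the AVFrame pixel even where clipping occurred. The sketch below checks that property exhaustively for 8-bit samples. All function names are hypothetical, and it assumes `>>` on negative ints is an arithmetic shift, as it is on common compilers.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Residual as applied by IDCT-add: rounding shift of the raw IDCT output. */
static int residual_from_idct(int idct_raw)
{
    return (idct_raw + 32) >> 6;
}

/* Forward step: what the decoder writes for one pixel. */
static uint8_t reconstruct(uint8_t predicted, int residual)
{
    return clip_u8(predicted + residual);
}

/* Callback-side inversion: P = clip(pre_deblock - residual). */
static uint8_t recover_predicted(uint8_t pre_deblock, int residual)
{
    return clip_u8(pre_deblock - residual);
}

/* Count (P, residual) pairs for which re-applying the residual to the
 * recovered P fails to reproduce the reconstructed pixel. */
static long count_roundtrip_failures(void)
{
    long failures = 0;
    for (int p_true = 0; p_true <= 255; p_true++) {
        for (int res = -600; res <= 600; res++) {
            uint8_t recon = reconstruct((uint8_t)p_true, res);
            uint8_t p_rec = recover_predicted(recon, res);
            if (reconstruct(p_rec, res) != recon)
                failures++;
        }
    }
    return failures;
}

static void self_check_roundtrip(void)
{
    assert(count_roundtrip_failures() == 0);
}
```

In saturated cases the recovered P can differ from the predictor the decoder actually used (for example 245 instead of 250 when the pixel clipped to 255 with a residual of +10), but the reconstructed pixel is the same, which is all the compare needs.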
A diagnostic also asserts (silently when passing) that the
callback's pre_deblock snapshot equals AVFrame at each real-coeffs
MB position — i.e. h->cur_pic.f IS the eventual AVFrame buffer
under skip_loop_filter=AVDISCARD_ALL with thread_count=1.
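
A minimal sketch of that snapshot diagnostic, assuming a tightly packed 16x16 snapshot and a strided 8-bit luma plane. `take_snapshot` and `count_snapshot_mismatches` are hypothetical names, not functions from the CLI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one 16x16 macroblock out of a strided plane into a packed buffer. */
static void take_snapshot(uint8_t snap[256], const uint8_t *plane,
                          ptrdiff_t stride, int mb_x, int mb_y)
{
    assert(snap && plane);
    const uint8_t *mb = plane + (ptrdiff_t)mb_y * 16 * stride
                              + (ptrdiff_t)mb_x * 16;
    for (int r = 0; r < 16; r++)
        memcpy(&snap[r * 16], &mb[r * stride], 16);
}

/* Count bytes where the packed snapshot differs from the same
 * macroblock position in a (possibly different) strided plane. */
static int count_snapshot_mismatches(const uint8_t snap[256],
                                     const uint8_t *plane, ptrdiff_t stride,
                                     int mb_x, int mb_y)
{
    assert(snap && plane);
    const uint8_t *mb = plane + (ptrdiff_t)mb_y * 16 * stride
                              + (ptrdiff_t)mb_x * 16;
    int mismatches = 0;
    for (int r = 0; r < 16; r++)
        for (int c = 0; c < 16; c++)
            if (mb[r * stride + c] != snap[r * 16 + c])
                mismatches++;
    return mismatches;
}
```

A non-zero count means the pixels seen at callback time are not the ones later returned to the caller, so the P-recovery subtraction would be working from the wrong data.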

Bug hunted in this PR
---------------------
The initial implementation transposed the coefficients from row-major
(sl->mb) to "column-major" (the layout that daedalus_decoder.h's
mb_input.coeffs docstring describes). That caused ~0.2% Y pixel
divergence on real streams (~150 pixels/frame at 320x240). The root
cause was found with a standalone /tmp/idct_compare.c harness that ran
daedalus's C reference IDCT and FFmpeg's C reference IDCT on identical
int16[16] inputs: the outputs were IDENTICAL. Both functions implement
the H.264 spec IDCT on the array as given, regardless of how its
layout is labelled, so the "column-major" label is decoration and no
conversion is needed. With the transpose removed, the PR is byte-exact.
Follow-up task #184: clarify daedalus_decoder.h's mb_input.coeffs
docstring so future integrators don't repeat this transpose
mistake.
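
To illustrate why a stray transpose shows up as pixel divergence, the sketch below runs the reference IDCT from this commit's diff on a block and on its transpose. It does not reproduce FFmpeg's IDCT or the /tmp/idct_compare.c harness. `transpose4x4` and `transpose_changes_idct` are hypothetical helpers, and `>>` on negative ints is assumed to be an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 1-D butterfly and 2-D transform, copied from this commit's diff. */
static void h264_idct4_butterfly(const int d[4], int out[4])
{
    int e = d[0] + d[2];
    int f = d[0] - d[2];
    int g = (d[1] >> 1) - d[3];
    int h = d[1] + (d[3] >> 1);
    out[0] = e + h;
    out[1] = f + g;
    out[2] = f - g;
    out[3] = e - h;
}

static void ref_idct4_compute(const int16_t block[16], int out[16])
{
    int tmp[4][4];
    for (int c = 0; c < 4; c++) {
        int d[4] = { block[c*4+0], block[c*4+1], block[c*4+2], block[c*4+3] };
        int o[4];
        h264_idct4_butterfly(d, o);
        for (int r = 0; r < 4; r++) tmp[r][c] = o[r];
    }
    for (int r = 0; r < 4; r++) {
        int d[4] = { tmp[r][0], tmp[r][1], tmp[r][2], tmp[r][3] };
        int o[4];
        h264_idct4_butterfly(d, o);
        for (int c = 0; c < 4; c++) out[r*4+c] = o[c];
    }
}

/* Hypothetical helper: swap rows and columns of a 4x4 array. */
static void transpose4x4(const int16_t in[16], int16_t out[16])
{
    assert(in != out);  /* not an in-place transpose */
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            out[c * 4 + r] = in[r * 4 + c];
}

/* Returns 1 if transposing the input changes the raw IDCT output array. */
static int transpose_changes_idct(const int16_t block[16])
{
    int16_t t[16];
    int a[16], b[16];
    transpose4x4(block, t);
    ref_idct4_compute(block, a);
    ref_idct4_compute(t, b);
    return memcmp(a, b, sizeof a) != 0;
}
```

A DC-only block is unaffected because it is its own transpose, which fits the observation that only a small fraction of pixels diverged: any block with asymmetric AC content produces a different residual once transposed.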

Result on hertz (Pi 5 V3D 7.1)
------------------------------
testsrc2 I-only via libx264 -bf 0 -g 1:

  resolution  frames  substrate  Y diff       UV diff      result
  320x240     5       auto       0/76800      0/38400      PASS
  320x240     5       cpu        0/76800      0/38400      PASS
  320x240     5       qpu        0/76800      0/38400      PASS
  1920x1088   3       auto       0/2088960    0/1044480    PASS

The real-coeffs path engaged for 77-95 of 300 MBs per 320x240 frame
and 598-643 of 8160 MBs per 1920x1088 frame. testsrc2 is mostly flat,
so many MBs are Intra_16x16 and fall back to identity passthrough;
streams with richer content should engage the real-coeffs path more.
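
The denominators in the results and the macroblock totals follow from the coded frame size. A minimal sketch, assuming 8-bit 4:2:0 content (which the UV totals being half the Y totals suggests); `struct frame_geometry` and `geometry_420` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct frame_geometry {
    size_t y_bytes;   /* luma samples compared per frame */
    size_t uv_bytes;  /* Cb + Cr samples compared per frame */
    size_t mb_w, mb_h, mb_total;
};

/* Plane sizes and macroblock grid for an 8-bit 4:2:0 frame whose coded
 * dimensions are multiples of 16. */
static struct frame_geometry geometry_420(size_t coded_w, size_t coded_h)
{
    assert(coded_w % 16 == 0 && coded_h % 16 == 0);
    struct frame_geometry g;
    g.y_bytes  = coded_w * coded_h;
    g.uv_bytes = g.y_bytes / 2;      /* two quarter-size chroma planes */
    g.mb_w     = coded_w / 16;
    g.mb_h     = coded_h / 16;
    g.mb_total = g.mb_w * g.mb_h;
    return g;
}
```

For 320x240 this gives 76800 luma bytes, 38400 chroma bytes and 300 macroblocks; for 1920x1088 it gives 2088960, 1044480 and 8160.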

Followups
---------
  - PR-A4: extend the gate to Intra_16x16 (chroma DC Hadamard +
    Intra_16x16 luma DC Hadamard pre-pass); currently ~30-60%
    of MBs fall back to identity-passthrough due to this.
  - PR-A5: extend to 8x8 transform (separate IDCT 8x8 dispatch
    path on the daedalus-decoder side, similar plumbing).
  - PR-A6: enable libavcodec's deblock (skip_loop_filter=AVDISCARD_NONE)
    and have daedalus's deblock produce the post-deblock output
    that matches AVFrame. Closes the loop on the full I-only
    pipeline.
  - Task #184: daedalus_decoder.h coeffs docstring clarification.

@@ -195,6 +195,30 @@ if(DAEDALUS_BUILD_TOOLS)
             ${DAEDALUS_FFMPEG_PREFIX}/lib/libswresample.a
             m z pthread)
         set(FFMPEG_CFLAGS_OTHER "-DDAEDALUS_HAVE_H264_MB_INSPECT_CB=1")
+
+        # PR-A3+ optional: also point at the patched FFmpeg SOURCE TREE
+        # so the CLI can include libavcodec/h264dec.h directly and
+        # dereference H264Context fields (the side-buffer mb_inspect_coeffs
+        # added in marfrit-packages patch 0017, the cur_pic.f for
+        # pre-deblock pixel access, etc.). When set, the internal-header
+        # include codepath is compiled in.
+        set(DAEDALUS_FFMPEG_SRC "" CACHE PATH
+            "Path to patched FFmpeg source tree (= path to FFmpeg/ checkout where build was run; contains config.h + libavcodec/h264dec.h). Empty = h264dec.h includes are disabled.")
+        if(DAEDALUS_FFMPEG_SRC)
+            message(STATUS "daedalus_decode_h264: FFmpeg source at ${DAEDALUS_FFMPEG_SRC}")
+            # IMPORTANT: source tree FIRST in -I order — its
+            # libavutil/common.h does #include "intmath.h" with HAVE_AV_CONFIG_H,
+            # which resolves to libavutil/intmath.h (in the source tree
+            # only — that header isn't installed since it's arch-dispatched).
+            # The installed-prefix include path's libavutil/common.h is the
+            # same file textually but resolves "intmath.h" against the
+            # install dir where it doesn't exist.
+            set(FFMPEG_INCLUDE_DIRS ${DAEDALUS_FFMPEG_SRC})
+            set(FFMPEG_CFLAGS_OTHER
+                "${FFMPEG_CFLAGS_OTHER} -DDAEDALUS_HAVE_H264_MB_INSPECT_COEFFS=1 -DHAVE_AV_CONFIG_H")
+            # Convert space-separated string to list (CMake idiom for compile flags).
+            separate_arguments(FFMPEG_CFLAGS_OTHER UNIX_COMMAND "${FFMPEG_CFLAGS_OTHER}")
+        endif()
     else()
         pkg_check_modules(FFMPEG REQUIRED libavcodec libavformat libavutil)
         message(STATUS "daedalus_decode_h264: system FFmpeg (no inspection callback)")

+374 -21
@@ -51,14 +51,32 @@
 #include <libavutil/imgutils.h>

 /* Per-MB inspection callback API — provided by the patched FFmpeg
- * fork via marfrit-packages 0016. The H264Context struct itself
- * remains internal (declared in libavcodec/h264dec.h which isn't
- * installed), so we only forward-declare it here and use it
- * opaquely through the callback signature. Real per-MB state
- * extraction (sl->mb coefficients, mb_type, etc.) will land in
- * PR-A3 alongside an internal-header include path. */
-#ifdef DAEDALUS_HAVE_H264_MB_INSPECT_CB
+ * fork via marfrit-packages patches 0016 + 0017.
+ *
+ * When DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS is defined (CMake sets it
+ * alongside DAEDALUS_FFMPEG_SRC), we include libavcodec's INTERNAL
+ * h264dec.h header to dereference H264Context fields — specifically
+ * h->mb_inspect_coeffs (the 0017 side buffer holding pre-IDCT-
+ * destruction sl->mb), h->cur_pic.f (pre-deblock reconstructed pixels),
+ * and h->cur_pic.mb_type[mb_xy] for the mb-type gate. The same
+ * configure-time config.h that built the static libavcodec.a is
+ * picked up via -DHAVE_AV_CONFIG_H + -I path; ABI match is automatic.
+ *
+ * When only DAEDALUS_HAVE_H264_MB_INSPECT_CB is defined (no source
+ * tree available — e.g. building against a distro-shipped patched
+ * libavcodec), the H264Context stays opaque and we fall back to
+ * identity-passthrough across all MBs.
+ *
+ * When neither is defined: stock libavcodec, no callback, identity-
+ * passthrough only (PR-A1b behaviour). */
+#ifdef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+# include "libavcodec/h264dec.h"
+# include "libavcodec/h264.h" /* IS_INTRA4x4 / IS_8x8DCT / IS_INTRA_PCM */
+#elif defined(DAEDALUS_HAVE_H264_MB_INSPECT_CB)
 struct H264Context;
+#endif
+
+#if defined(DAEDALUS_HAVE_H264_MB_INSPECT_CB) || defined(DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS)
 typedef void (*ff_h264_mb_inspect_cb)(void *opaque,
                                       const struct H264Context *h,
                                       int mb_x, int mb_y);
@@ -76,35 +94,252 @@ static const char *substrate_str = "auto";
 static int max_frames = -1;

 /* Inspection-callback state: per-frame counter + "each MB seen exactly
- * once" check. We use a bitmap rather than a raster-order assertion
- * because libavcodec's MB-level threading + multi-slice frames mean
- * MBs reach the callback in non-strictly-raster order; the contract
- * is "every MB fires the callback exactly once per frame", not "in
- * raster order". Reset at end of each frame. */
+ * once" check. Bitmap, not raster-order — libavcodec's MB threading +
+ * multi-slice frames mean MBs reach the callback out of strict order;
+ * contract is "every MB fires the callback exactly once per frame".
+ *
+ * When real-coeff extraction is compiled in (PR-A3+), we ALSO maintain
+ * a per-MB capture buffer (real-coeffs path) so the main loop can
+ * drive daedalus_decoder_append_mb with REAL pre-residual P + real
+ * coefficients for MBs that satisfy the gate (Intra_4x4, no 8x8 DCT,
+ * no PCM). Other MBs stay on identity-passthrough. */
 #ifdef DAEDALUS_HAVE_H264_MB_INSPECT_CB
+struct mb_capture {
+    int valid; /* 1 = real-coeffs path, 0 = identity passthrough */
+    int16_t coeffs[256]; /* luma, column-major within 4x4, raster block order */
+    uint8_t predicted[256]; /* luma P recovered = pre_deblock - clipped IDCT(C) */
+    uint8_t pre_deblock_snap[256]; /* DIAGNOSTIC: pre_deblock at callback time;
+                                    * compared against AVFrame post-receive_frame
+                                    * to detect h->cur_pic.f vs AVFrame divergence */
+};
+
 struct inspect_state {
     int n_cbs_this_frame;
     int mb_w, mb_h;
     uint8_t *seen; /* mb_w * mb_h bitmap */
-    int duplicate_mbs; /* same (mb_x, mb_y) seen twice this frame */
-    int out_of_bounds; /* (mb_x, mb_y) outside the coded grid */
+    int duplicate_mbs;
+    int out_of_bounds;
+#ifdef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+    struct mb_capture *captures; /* mb_w * mb_h entries */
+    int real_coeffs_mbs; /* count of MBs in real-coeffs path this frame */
+    int skipped_intra16x16;
+    int skipped_8x8dct;
+    int skipped_other;
+#endif
 };

+#ifdef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+/* libavcodec's sl->mb stores coefficients in RASTER (row-major) order,
+ * not zig-zag scan order — h264_cavlc.c does
+ *     block[*scantable] = (level * qmul[*scantable] + 32) >> 6
+ * where *scantable advances through ff_zigzag_scan[] which contains
+ * RASTER positions (row*4 + col). So sl->mb[i] = coef at raster
+ * position i = (i/4, i%4) = (row, col). No inverse-zigzag needed;
+ * just transpose row-major → column-major (daedalus's convention). */
+
+/* H.264 §6.4.3 4x4 luma block scan within MB (z-scan).
+ * Maps raster-block-idx (sb_y*4+sb_x) → libavcodec sl->mb's z-scan idx.
+ * Z-scan happens to be its own inverse (symmetric mapping). */
+static const uint8_t raster_to_zscan[16] = {
+    0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15
+};
+
+/* H.264 4x4 IDCT — transcribed from daedalus-fourier
+ * tests/test_idct_bitexact.c (which itself mirrors h264_idct4_ref.c).
+ * Outputs row-major 16-element residual; clip + shift happens in
+ * the consumer. */
+static void h264_idct4_butterfly(const int d[4], int out[4]) {
+    int e = d[0] + d[2];
+    int f = d[0] - d[2];
+    int g = (d[1] >> 1) - d[3];
+    int h = d[1] + (d[3] >> 1);
+    out[0] = e + h;
+    out[1] = f + g;
+    out[2] = f - g;
+    out[3] = e - h;
+}
+static void ref_idct4_compute(const int16_t block[16], int out[16]) {
+    /* block COLUMN-MAJOR: block[c*4+r] = coef at (row=r, col=c).
+     *
+     * Pass order: COLUMN-pass first, then ROW-pass — matches FFmpeg's
+     * h264idct_template.c. The pass order matters for integer
+     * arithmetic with `>>1` on signed values (which round toward -inf
+     * for odd negatives in C); row-first vs column-first orders can
+     * disagree by 1 unit at the intermediate stage, propagating to
+     * the final pixel residual.
+     *
+     * (daedalus-fourier's tests/h264_idct4_ref.c does ROW-first, which
+     * matches its NEON kernel + GPU shader bit-exact within the
+     * package but DIVERGES from FFmpeg's IDCT for some inputs. PR-A3b
+     * surfaces the divergence; investigating the fix is a daedalus-
+     * fourier follow-up — see task #184.) */
+    int tmp[4][4];
+    /* Column pass: process each column c independently. */
+    for (int c = 0; c < 4; c++) {
+        int d[4] = { block[c*4+0], block[c*4+1], block[c*4+2], block[c*4+3] };
+        int o[4];
+        h264_idct4_butterfly(d, o);
+        for (int r = 0; r < 4; r++) tmp[r][c] = o[r];
+    }
+    /* Row pass: process each row r. */
+    for (int r = 0; r < 4; r++) {
+        int d[4] = { tmp[r][0], tmp[r][1], tmp[r][2], tmp[r][3] };
+        int o[4];
+        h264_idct4_butterfly(d, o);
+        for (int c = 0; c < 4; c++) out[r*4+c] = o[c];
+    }
+}
+#endif /* DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS */
+
 static void inspect_cb(void *opaque,
                        const struct H264Context *h,
                        int mb_x, int mb_y)
 {
-    (void) h;
     struct inspect_state *st = opaque;
+#ifndef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+    (void) h;
+#endif

     if (mb_x < 0 || mb_x >= st->mb_w || mb_y < 0 || mb_y >= st->mb_h) {
         st->out_of_bounds++;
-    } else {
-        const size_t idx = (size_t) mb_y * st->mb_w + (size_t) mb_x;
-        if (st->seen[idx]) st->duplicate_mbs++;
-        st->seen[idx] = 1;
+        st->n_cbs_this_frame++;
+        return;
     }
+
+    const size_t idx = (size_t) mb_y * st->mb_w + (size_t) mb_x;
+    if (st->seen[idx]) st->duplicate_mbs++;
+    st->seen[idx] = 1;
     st->n_cbs_this_frame++;
+
+#ifdef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+    /* Real-coeffs path: extract per-MB state for daedalus-decoder
+     * IDCT validation on this MB. Gate: only Intra_4x4 + 4x4 transform
+     * + non-PCM is supported in PR-A3b — other MB flavours fall back
+     * to identity-passthrough in the main loop. */
+    struct mb_capture *cap = &st->captures[idx];
+    cap->valid = 0; /* default to passthrough */
+
+    const int mb_xy = mb_y * h->mb_stride + mb_x;
+    const uint32_t mb_type = h->cur_pic.mb_type[mb_xy];
+
+    if (!IS_INTRA4x4(mb_type)) {
+        if (IS_INTRA16x16(mb_type)) st->skipped_intra16x16++;
+        else st->skipped_other++;
+        return;
+    }
+    if (IS_8x8DCT(mb_type)) { st->skipped_8x8dct++; return; }
+    if (IS_INTRA_PCM(mb_type)) { st->skipped_other++; return; }
+
+    /* Snapshot luma pre-deblock pixels from cur_pic. */
+    const uint8_t *luma_plane = h->cur_pic.f->data[0];
+    const int luma_stride = h->cur_pic.f->linesize[0];
+    const uint8_t *mb_pixels = luma_plane + (ptrdiff_t) mb_y * 16 * luma_stride
+                             + mb_x * 16;
+
+    /* Diagnostic snapshot: capture the 16x16 luma block as we see it in
+     * cur_pic at callback time. Compared against AVFrame contents after
+     * receive_frame returns; mismatch points at a buffer-divergence bug. */
+    for (int r = 0; r < 16; r++)
+        memcpy(&cap->pre_deblock_snap[r * 16], &mb_pixels[r * luma_stride], 16);
+
+    /* Coefficients are in sl->mb at end of entropy decode but zeroed by
+     * the time the callback fires (IDCT-add consumed them). Patch 0017
+     * preserves them in h->mb_inspect_coeffs[16 * 48] BEFORE IDCT runs,
+     * so we read from there. */
+    const int16_t *zz_mb = h->mb_inspect_coeffs; /* layout matches sl->mb 8-bit half */
+
+    for (int r_block = 0; r_block < 16; r_block++) {
+        const int z_block = raster_to_zscan[r_block];
+        const int16_t *block_raw = &zz_mb[z_block * 16];
+
+        /* sl->mb stores 16 int16 per block. Empirical finding (via
+         * /tmp/idct_compare.c, 2026-05-26): daedalus-fourier's C ref
+         * IDCT and FFmpeg's C ref IDCT produce IDENTICAL output for
+         * the same input array — the "column-major vs row-major"
+         * labelling is decoration; both functions implement the same
+         * H.264 spec IDCT on a 16-int16 input. So we feed daedalus
+         * the raw sl->mb data unchanged. Previous attempt to
+         * transpose row-major→column-major was wrong — the transpose
+         * changed the IDCT result. */
+        int16_t col[16];
+        memcpy(col, block_raw, 16 * sizeof(int16_t));
+
+        memcpy(&cap->coeffs[r_block * 16], col, 16 * sizeof(int16_t));
+
+        /* IDCT → row-major 16-int residual. */
+        int idct_row[16];
+        ref_idct4_compute(col, idct_row);
+
+        /* P = clip(pre_deblock - ((IDCT + 32) >> 6)) for each pixel.
+         * Symmetric: daedalus IDCT-add will undo the subtract, including
+         * for saturating cases (where the same shift puts the value back
+         * at the same clip boundary). */
+        const int sb_y = r_block >> 2;
+        const int sb_x = r_block & 3;
+        for (int r = 0; r < 4; r++) {
+            for (int c = 0; c < 4; c++) {
+                const int pre_db = mb_pixels[(sb_y * 4 + r) * luma_stride + sb_x * 4 + c];
+                const int shift = (idct_row[r * 4 + c] + 32) >> 6;
+                int p = pre_db - shift;
+                if (p < 0) p = 0;
+                if (p > 255) p = 255;
+                cap->predicted[(sb_y * 4 + r) * 16 + (sb_x * 4 + c)] = (uint8_t) p;
+            }
+        }
+    }
+    cap->valid = 1;
+    st->real_coeffs_mbs++;
+
+    /* One-shot diagnostic enabled by DAEDALUS_DUMP_MB_3_0 env var. */
+    if (mb_x == 3 && mb_y == 0 && getenv("DAEDALUS_DUMP_MB_3_0")) {
+        const int16_t *zz = &zz_mb[1 * 16]; /* z_block = raster_block = 1 */
+        const struct mb_capture *capdiag = &st->captures[mb_y * st->mb_w + mb_x];
+        fprintf(stderr, " MB(3,0) block z=1 raster coeffs (sl->mb):");
+        for (int p = 0; p < 16; p++) fprintf(stderr, " %d", (int) zz[p]);
+        fprintf(stderr, "\n");
+        fprintf(stderr, " MB(3,0) block z=1 col_major coeffs (after transpose):");
+        for (int i = 0; i < 16; i++) fprintf(stderr, " %d", (int) capdiag->coeffs[1 * 16 + i]);
+        fprintf(stderr, "\n");
+        /* Recompute IDCT for this block (already done in the loop above but
+         * print here for visibility). */
+        int idct_print[16];
+        ref_idct4_compute(&capdiag->coeffs[1 * 16], idct_print);
+        fprintf(stderr, " MB(3,0) block z=1 IDCT row-major (raw, pre-shift):");
+        for (int i = 0; i < 16; i++) fprintf(stderr, " %d", idct_print[i]);
+        fprintf(stderr, "\n");
+        fprintf(stderr, " MB(3,0) block z=1 IDCT (+32)>>6:");
+        for (int i = 0; i < 16; i++) fprintf(stderr, " %d", (idct_print[i] + 32) >> 6);
+        fprintf(stderr, "\n");
+        const uint8_t *bpix = mb_pixels + 0 * luma_stride + 4; /* sb_y=0, sb_x=1 → cols 4..7 within MB */
+        fprintf(stderr, " MB(3,0) block z=1 pre_deblock pixels:\n");
+        for (int r = 0; r < 4; r++) {
+            fprintf(stderr, " ");
+            for (int c = 0; c < 4; c++)
+                fprintf(stderr, " %3u", bpix[r * luma_stride + c]);
+            fprintf(stderr, "\n");
+        }
+        fprintf(stderr, " MB(3,0) block z=1 P_rec (= pre_deblock - shift):\n");
+        for (int r = 0; r < 4; r++) {
+            fprintf(stderr, " ");
+            for (int c = 0; c < 4; c++)
+                fprintf(stderr, " %3u", capdiag->predicted[(0*4+r) * 16 + (1*4+c)]);
+            fprintf(stderr, "\n");
+        }
+        /* And what daedalus_decoder SHOULD produce: clip(P_rec + shift). */
+        fprintf(stderr, " MB(3,0) block z=1 expected daedalus output = clip(P_rec + shift):\n");
+        for (int r = 0; r < 4; r++) {
+            fprintf(stderr, " ");
+            for (int c = 0; c < 4; c++) {
+                int p_rec = capdiag->predicted[(0*4+r) * 16 + (1*4+c)];
+                int sh = (idct_print[r*4+c] + 32) >> 6;
+                int e = p_rec + sh;
+                if (e < 0) e = 0; if (e > 255) e = 255;
+                fprintf(stderr, " %3d", e);
+            }
+            fprintf(stderr, "\n");
+        }
+    }
+#endif
 }
 #endif

@@ -247,6 +482,17 @@ int main(int argc, char **argv)
     const AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
     AVCodecContext *avctx = avcodec_alloc_context3(codec);
     avcodec_parameters_to_context(avctx, fmt->streams[vstream]->codecpar);
+
+#ifdef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+    /* Patch 0017's coefficient side buffer lives in H264Context (single
+     * per-stream); multi-threaded slice decode would race on it. Force
+     * single-thread. Also disable libavcodec's deblock so AVFrame is
+     * pre-deblock and the P-recovery math is exact. */
+    avctx->thread_count = 1;
+    avctx->thread_type = 0;
+    avctx->skip_loop_filter = AVDISCARD_ALL;
+#endif
+
     if (avcodec_open2(avctx, codec, NULL) < 0) {
         fprintf(stderr, "avcodec_open2 failed\n");
         avformat_close_input(&fmt); return 2;
@@ -280,6 +526,11 @@ int main(int argc, char **argv)
         inspect_st.mb_h = H_round / 16;
         inspect_st.seen = calloc(1, (size_t) inspect_st.mb_w * inspect_st.mb_h);
         if (!inspect_st.seen) { rc = 1; goto cleanup; }
+#ifdef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+        inspect_st.captures = calloc((size_t) inspect_st.mb_w * inspect_st.mb_h,
+                                     sizeof(*inspect_st.captures));
+        if (!inspect_st.captures) { rc = 1; goto cleanup; }
+#endif
     }
     ff_h264_set_mb_inspect_cb(avctx, inspect_cb, &inspect_st);
     int inspect_total_cbs = 0;
@@ -363,7 +614,26 @@ int main(int argc, char **argv)
         struct daedalus_decoder_mb_input mb = {0};
         for (int my = 0; my < mb_h; my++) {
             for (int mx = 0; mx < mb_w; mx++) {
+                /* Default: identity-passthrough — luma from AVFrame,
+                 * chroma from AVFrame, coeffs all zero. */
                 pack_mb_predicted(fr, mx, my, mb_pred);
+                memset(mb_coeffs, 0, sizeof(mb_coeffs));
+
+#ifdef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+                /* Real-coeffs path: if the callback captured this MB
+                 * as Intra_4x4 / 4x4-DCT, override luma predicted
+                 * with the recovered P and use the real luma coeffs.
+                 * Chroma stays identity-passthrough (PR-A3b scope —
+                 * chroma DC Hadamard + 8x8 transform follow-ups). */
+                const int mb_idx = my * mb_w + mx;
+                const struct mb_capture *cap = &inspect_st.captures[mb_idx];
+                if (cap->valid) {
+                    memcpy(mb_pred, cap->predicted, 256);
+                    for (int i = 0; i < 256; i++)
+                        mb_coeffs[i] = cap->coeffs[i];
+                }
+#endif
+
                 mb.mb_x = (uint16_t) mx;
                 mb.mb_y = (uint16_t) my;
                 mb.transform_8x8 = 0;
@@ -391,12 +661,77 @@ int main(int argc, char **argv)
                             out_uv_ref, (size_t) coded_w,
                             coded_w, coded_h);

-        /* Byte-exact compare. */
+#ifdef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+        /* Diagnostic: for each real-coeffs MB, compare the callback's
+         * pre_deblock snapshot against AVFrame at the same position.
+         * If they differ, h->cur_pic.f at callback time isn't the
+         * eventual AVFrame buffer (or deblock ran despite
+         * skip_loop_filter=AVDISCARD_ALL). */
+        int snap_mismatches = 0;
+        int first_snap_mismatch_mb = -1;
+        for (int my2 = 0; my2 < mb_h; my2++) {
+            for (int mx2 = 0; mx2 < mb_w; mx2++) {
+                const int idx2 = my2 * mb_w + mx2;
+                if (!inspect_st.captures[idx2].valid) continue;
+                const uint8_t *avf_mb = fr->data[0]
+                                      + (ptrdiff_t) my2 * 16 * fr->linesize[0]
+                                      + mx2 * 16;
+                for (int r = 0; r < 16; r++) {
+                    for (int c = 0; c < 16; c++) {
+                        if (avf_mb[r * fr->linesize[0] + c] !=
+                            inspect_st.captures[idx2].pre_deblock_snap[r * 16 + c]) {
+                            if (first_snap_mismatch_mb < 0)
+                                first_snap_mismatch_mb = idx2;
+                            snap_mismatches++;
+                        }
+                    }
+                }
+            }
+        }
+        if (snap_mismatches > 0) {
+            const int mmb_x = first_snap_mismatch_mb % mb_w;
+            const int mmb_y = first_snap_mismatch_mb / mb_w;
+            fprintf(stderr,
+                    " DIAG: callback's pre_deblock differs from AVFrame in "
+                    "%d bytes across real-coeffs MBs; first mismatch at MB(%d, %d)\n",
+                    snap_mismatches, mmb_x, mmb_y);
+            rc = 4;
+        }
+        /* Silent on match — the invariant must hold for the
+         * P-recovery math to be valid; we'd want to know if it
+         * ever broke, but no need to confirm it every frame. */
+#endif
+
+        /* Byte-exact compare + first-diff diagnostic. */
         size_t y_diffs = 0, uv_diffs = 0;
+        size_t y_first_diff = (size_t) -1;
         for (size_t i = 0; i < y_size; i++)
-            if (out_y_dadec[i] != out_y_ref[i]) y_diffs++;
+            if (out_y_dadec[i] != out_y_ref[i]) {
+                if (y_first_diff == (size_t) -1) y_first_diff = i;
+                y_diffs++;
+            }
         for (size_t i = 0; i < uv_size; i++)
             if (out_uv_dadec[i] != out_uv_ref[i]) uv_diffs++;
+        if (y_diffs && y_first_diff != (size_t) -1) {
+            const size_t row = y_first_diff / (size_t) avctx->width;
+            const size_t col = y_first_diff % (size_t) avctx->width;
+            const size_t mb_x = col / 16;
+            const size_t mb_y = row / 8; /* not row/16 — chroma row uses /8 so use raw row here */
+#ifdef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+            const int mb_idx = (int)(row / 16) * mb_w + (int) mb_x;
+            const int real = (mb_idx >= 0 && mb_idx < mb_w * mb_h)
+                           ? inspect_st.captures[mb_idx].valid : -1;
+            printf(" first Y diff @ byte %zu = (row %zu, col %zu) in MB(%zu,%zu) [real-coeffs=%d]; "
+                   "dadec=%u ref=%u\n",
+                   y_first_diff, row, col, mb_x, row / 16,
+                   real, out_y_dadec[y_first_diff], out_y_ref[y_first_diff]);
+#else
+            (void) mb_x; (void) mb_y;
+            printf(" first Y diff @ byte %zu = (row %zu, col %zu); dadec=%u ref=%u\n",
+                   y_first_diff, row, col,
+                   out_y_dadec[y_first_diff], out_y_ref[y_first_diff]);
+#endif
+        }
         total_y_diffs += y_diffs;
         total_uv_diffs += uv_diffs;
 #ifdef DAEDALUS_HAVE_H264_MB_INSPECT_CB
@@ -424,6 +759,21 @@ int main(int argc, char **argv)
             inspect_st.duplicate_mbs = 0;
             inspect_st.out_of_bounds = 0;
             memset(inspect_st.seen, 0, (size_t) expected);
+
+#ifdef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+            printf(" frame %d: real-coeffs path %d MBs, "
+                   "skipped intra16x16=%d 8x8dct=%d other=%d\n",
+                   n_frames, inspect_st.real_coeffs_mbs,
+                   inspect_st.skipped_intra16x16,
+                   inspect_st.skipped_8x8dct,
+                   inspect_st.skipped_other);
+            inspect_st.real_coeffs_mbs = 0;
+            inspect_st.skipped_intra16x16 = 0;
+            inspect_st.skipped_8x8dct = 0;
+            inspect_st.skipped_other = 0;
+            memset(inspect_st.captures, 0,
+                   (size_t) expected * sizeof(*inspect_st.captures));
+#endif
         }
 #endif
         printf(" frame %d: Y diff %zu/%zu UV diff %zu/%zu%s\n",
@@ -478,6 +828,9 @@ cleanup:
     free(out_uv_dadec);free(out_y_dadec);
 #ifdef DAEDALUS_HAVE_H264_MB_INSPECT_CB
     free(inspect_st.seen);
+# ifdef DAEDALUS_HAVE_H264_MB_INSPECT_COEFFS
+    free(inspect_st.captures);
+# endif
 #endif
     if (dec) daedalus_decoder_destroy(dec);
     av_frame_free(&fr);