Stage 2 PR-A3b: real H.264 coefficients through daedalus-decoder, byte-exact
Final option-A deliverable. CLI now extracts real per-MB
coefficients from libavcodec via the inspection callback +
side-buffer (marfrit-packages 0016 + 0017), reconstructs the
pre-residual predicted samples P via inverse-of-IDCT-add, and
feeds daedalus-decoder with real (P, C, no edges). Daedalus
output BYTE-EXACT against libavcodec's pre-deblock AVFrame
across 5 frames at 320x240 and 3 frames at 1920x1088, all three
substrates (auto / cpu / qpu).
Path summary
------------
avctx->thread_count = 1 (single-threaded decode — 0017's
side buffer is per-H264Context;
multi-threaded would race)
avctx->skip_loop_filter = AVDISCARD_ALL (AVFrame stays pre-deblock so the
P-recovery subtraction is exact)
ff_h264_set_mb_inspect_cb (registers the callback)
Inspection callback (per MB, fires post-hl_decode_mb):
- Gate on IS_INTRA4x4 && !IS_8x8DCT && !IS_INTRA_PCM (skipped MBs
fall back to identity-passthrough in the main loop)
- Snapshot pre-deblock pixels from h->cur_pic.f->data[0]
- Read coefficients from h->mb_inspect_coeffs (= sl->mb copy, the
0017 side buffer)
- For each 4x4 block (16/MB in raster order, indexed via
raster_to_zscan[] to find its slot in the z-scan-ordered side
buffer): compute IDCT(C) using a transcribed H.264 C reference,
derive P = clip(pre_deblock - ((IDCT + 32) >> 6))
- Stash per-MB capture (P + C) for the main loop
Main loop:
- Default identity-passthrough (predicted = AVFrame pixels, coeffs = 0)
- For real-coeffs-valid MBs: override luma with captured P + C
- flush_frame, byte-exact compare against AVFrame
A diagnostic also asserts (silently when passing) that the
callback's pre_deblock snapshot equals AVFrame at each real-coeffs
MB position — i.e. h->cur_pic.f IS the eventual AVFrame buffer
under skip_loop_filter=AVDISCARD_ALL with thread_count=1.
Bug hunted in this PR
---------------------
Initial implementation transposed the coefficients from row-major
(sl->mb) to "column-major" (the layout that daedalus_decoder.h's
mb_input.coeffs docstring describes). This caused ~0.2% Y pixel
divergence on real streams (~150/frame at 320x240). Root cause
identified via a standalone /tmp/idct_compare.c harness running
daedalus's C ref IDCT and FFmpeg's reference C IDCT on identical
int16[16] inputs: outputs IDENTICAL. The two functions implement
the spec H.264 IDCT on the array regardless of layout
interpretation; the "column-major" label is decoration. Removed
the transpose; PR is now byte-exact.
Follow-up task #184: clarify daedalus_decoder.h's mb_input.coeffs
docstring so future integrators don't repeat this transpose
mistake.
Result on hertz (Pi 5 V3D 7.1)
------------------------------
testsrc2 I-only via libx264 -bf 0 -g 1:
320x240, 5 frames, substrate=auto: Y diff 0/76800, UV diff 0/38400 PASS
320x240, 5 frames, substrate=cpu: Y diff 0/76800, UV diff 0/38400 PASS
320x240, 5 frames, substrate=qpu: Y diff 0/76800, UV diff 0/38400 PASS
1920x1088, 3 frames, substrate=auto: Y diff 0/2088960, UV diff 0/1044480 PASS
Real-coeffs path engaged for 77-95 MBs per 320x240 frame and
598-643 MBs per 1080p frame (testsrc2 is mostly flat → many
Intra_16x16 MBs that fall back to identity passthrough; richer
content streams would engage real-coeffs more).
Followups
---------
- PR-A4: extend the gate to Intra_16x16 (chroma DC Hadamard +
Intra_16x16 luma DC Hadamard pre-pass) — currently ~30-60%
of MBs fall back to identity-passthrough due to this.
- PR-A5: extend to 8x8 transform (separate IDCT 8x8 dispatch
path on the daedalus-decoder side, similar plumbing).
- PR-A6: enable libavcodec's deblock (skip_loop_filter=AVDISCARD_NONE)
and have daedalus's deblock produce the post-deblock output
that matches AVFrame. Closes the loop on the full I-only
pipeline.
- Task #184: daedalus_decoder.h coeffs docstring clarification.
This commit is contained in:
@@ -195,6 +195,30 @@ if(DAEDALUS_BUILD_TOOLS)
|
||||
${DAEDALUS_FFMPEG_PREFIX}/lib/libswresample.a
|
||||
m z pthread)
|
||||
set(FFMPEG_CFLAGS_OTHER "-DDAEDALUS_HAVE_H264_MB_INSPECT_CB=1")
|
||||
|
||||
# PR-A3+ optional: also point at the patched FFmpeg SOURCE TREE
|
||||
# so the CLI can include libavcodec/h264dec.h directly and
|
||||
# dereference H264Context fields (the side-buffer mb_inspect_coeffs
|
||||
# added in marfrit-packages patch 0017, the cur_pic.f for
|
||||
# pre-deblock pixel access, etc.). When set, the internal-header
|
||||
# include codepath is compiled in.
|
||||
set(DAEDALUS_FFMPEG_SRC "" CACHE PATH
|
||||
"Path to patched FFmpeg source tree (= path to FFmpeg/ checkout where build was run; contains config.h + libavcodec/h264dec.h). Empty = h264dec.h includes are disabled.")
|
||||
if(DAEDALUS_FFMPEG_SRC)
|
||||
message(STATUS "daedalus_decode_h264: FFmpeg source at ${DAEDALUS_FFMPEG_SRC}")
|
||||
# IMPORTANT: source tree FIRST in -I order — its
|
||||
# libavutil/common.h does #include "intmath.h" with HAVE_AV_CONFIG_H,
|
||||
# which resolves to libavutil/intmath.h (in the source tree
|
||||
# only — that header isn't installed since it's arch-dispatched).
|
||||
# The installed-prefix include path's libavutil/common.h is the
|
||||
# same file textually but resolves "intmath.h" against the
|
||||
# install dir where it doesn't exist.
|
||||
set(FFMPEG_INCLUDE_DIRS ${DAEDALUS_FFMPEG_SRC})
|
||||
set(FFMPEG_CFLAGS_OTHER
|
||||
"${FFMPEG_CFLAGS_OTHER} -DDAEDALUS_HAVE_H264_MB_INSPECT_COEFFS=1 -DHAVE_AV_CONFIG_H")
|
||||
# Convert space-separated string to list (CMake idiom for compile flags).
|
||||
separate_arguments(FFMPEG_CFLAGS_OTHER UNIX_COMMAND "${FFMPEG_CFLAGS_OTHER}")
|
||||
endif()
|
||||
else()
|
||||
pkg_check_modules(FFMPEG REQUIRED libavcodec libavformat libavutil)
|
||||
message(STATUS "daedalus_decode_h264: system FFmpeg (no inspection callback)")
|
||||
|
||||
Reference in New Issue
Block a user