1d651c95ec40e72de728dd90f8ea2a2ba761488f
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1d651c95ec | wip: PR-A6 deblock validation | ||
|
|
44e92fa3dc |
Stage 2 PR-A3b: real H.264 coefficients through daedalus-decoder, byte-exact
Final option-A deliverable. CLI now extracts real per-MB
coefficients from libavcodec via the inspection callback +
side-buffer (marfrit-packages 0016 + 0017), reconstructs the
pre-residual predicted samples P via inverse-of-IDCT-add, and
feeds daedalus-decoder with real (P, C, no edges). Daedalus
output BYTE-EXACT against libavcodec's pre-deblock AVFrame
across 5 frames at 320x240 and 3 frames at 1920x1088, all three
substrates (auto / cpu / qpu).
Path summary
------------
- avctx->thread_count = 1: single-threaded decode; 0017's side buffer
  is per-H264Context, so multi-threaded decode would race
- avctx->skip_loop_filter = AVDISCARD_ALL: AVFrame stays pre-deblock so
  the P-recovery subtraction is exact
- ff_h264_set_mb_inspect_cb: registers the callback
Inspection callback (per MB, fires post-hl_decode_mb):
- Gate on IS_INTRA4x4 && !IS_8x8DCT && !IS_INTRA_PCM (skipped MBs
  fall back to identity-passthrough in the main loop)
- Snapshot pre-deblock pixels from h->cur_pic.f->data[0]
- Read coefficients from h->mb_inspect_coeffs (= sl->mb copy, the
  0017 side buffer)
- For each 4x4 block (16/MB in raster order, indexed via
  raster_to_zscan[] to find its slot in the z-scan-ordered side
  buffer): compute IDCT(C) using a transcribed H.264 C reference,
  derive P = clip(pre_deblock - ((IDCT(C) + 32) >> 6)), where clip
  clamps to the 8-bit range [0, 255]
- Stash per-MB capture (P + C) for the main loop
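The per-block indexing and the P-recovery subtraction in the callback steps above can be sketched as follows. This is a minimal illustration, not the CLI's code: `raster_to_zscan_slot`, `clip_u8` and `recover_predicted_mb` are hypothetical names, and the sketch assumes the rounded residual `(IDCT(C) + 32) >> 6` has already been computed per block and stored in the same z-scan slot order as the side buffer (16 values per slot, row-major within the block).

```c
#include <stdint.h>

/* z-scan slot of the 4x4 luma block at raster position (bx, by),
 * 0 <= bx, by < 4. Interleaves the bits of bx and by, which gives the
 * H.264 luma 4x4 block order 0,1,4,5 / 2,3,6,7 / 8,9,12,13 / 10,11,14,15. */
static int raster_to_zscan_slot(int bx, int by)
{
    return (bx & 1) | ((by & 1) << 1) | ((bx & 2) << 1) | ((by & 2) << 2);
}

/* Clamp to the 8-bit sample range. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* recon:  top-left luma sample of the MB in the pre-deblock plane
 * stride: bytes per row of that plane
 * resid:  256 rounded residuals, laid out as resid[slot * 16 + y * 4 + x]
 *         (assumed layout, see the lead-in)
 * pred:   16x16 output, row-major, receives P = clip(recon - resid) */
static void recover_predicted_mb(const uint8_t *recon, int stride,
                                 const int16_t *resid, uint8_t pred[256])
{
    for (int by = 0; by < 4; by++) {
        for (int bx = 0; bx < 4; bx++) {
            const int16_t *r = resid + raster_to_zscan_slot(bx, by) * 16;
            for (int y = 0; y < 4; y++) {
                for (int x = 0; x < 4; x++) {
                    int py = by * 4 + y;
                    int px = bx * 4 + x;
                    pred[py * 16 + px] =
                        clip_u8(recon[py * stride + px] - r[y * 4 + x]);
                }
            }
        }
    }
}
```

Because the decoder produced recon = clip(P + residual), adding the residual back onto the recovered P and clipping reproduces recon even for samples that saturated, which is what makes a byte-exact comparison possible.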
Main loop:
- Default identity-passthrough (predicted = AVFrame pixels, coeffs = 0)
- For real-coeffs-valid MBs: override luma with captured P + C
- flush_frame, byte-exact compare against AVFrame
A diagnostic also asserts (silently when passing) that the
callback's pre_deblock snapshot equals AVFrame at each real-coeffs
MB position — i.e. h->cur_pic.f IS the eventual AVFrame buffer
under skip_loop_filter=AVDISCARD_ALL with thread_count=1.
Bug hunted in this PR
---------------------
Initial implementation transposed the coefficients from row-major
(sl->mb) to "column-major" (the layout that daedalus_decoder.h's
mb_input.coeffs docstring describes). This caused ~0.2% Y pixel
divergence on real streams (~150/frame at 320x240). Root cause
identified via a standalone /tmp/idct_compare.c harness running
daedalus's C ref IDCT and FFmpeg's reference C IDCT on identical
int16[16] inputs: outputs IDENTICAL. The two functions implement
the spec H.264 IDCT on the array regardless of layout
interpretation; the "column-major" label is decoration. Removed
the transpose; PR is now byte-exact.
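To see why an extra transpose cannot be a no-op, here is a minimal sketch of the 4x4 inverse transform in the form the H.264 spec gives it (rows first, then columns, then `(x + 32) >> 6`), plus a transpose helper. It is not daedalus's or FFmpeg's code and not the /tmp/idct_compare.c harness; the function names and the row-major reading of the `int16[16]` array are assumptions made for illustration.

```c
#include <stdint.h>

/* One 1-D butterfly of the H.264 4x4 inverse transform.
 * Relies on >> being an arithmetic shift for negative values,
 * which holds on mainstream compilers. */
static void idct4_1d(const int d[4], int f[4])
{
    int e0 = d[0] + d[2];
    int e1 = d[0] - d[2];
    int e2 = (d[1] >> 1) - d[3];
    int e3 = d[1] + (d[3] >> 1);
    f[0] = e0 + e3;
    f[1] = e1 + e2;
    f[2] = e1 - e2;
    f[3] = e0 - e3;
}

/* c: 16 coefficients read as row-major (assumption).
 * r: 16 rounded residuals, row-major. */
static void idct4x4_residual(const int16_t c[16], int r[16])
{
    int tmp[16];

    /* Horizontal pass over each row. */
    for (int i = 0; i < 4; i++) {
        int d[4] = { c[4 * i], c[4 * i + 1], c[4 * i + 2], c[4 * i + 3] };
        idct4_1d(d, &tmp[4 * i]);
    }
    /* Vertical pass over each column, then round and scale. */
    for (int j = 0; j < 4; j++) {
        int d[4] = { tmp[j], tmp[4 + j], tmp[8 + j], tmp[12 + j] };
        int f[4];
        idct4_1d(d, f);
        for (int i = 0; i < 4; i++)
            r[4 * i + j] = (f[i] + 32) >> 6;
    }
}

/* Transpose a 4x4 block, the step that turned out to be unnecessary. */
static void transpose4x4(const int16_t in[16], int16_t out[16])
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            out[4 * j + i] = in[4 * i + j];
}
```

With a single coefficient of 64 at index 1, every row of the residual comes out as 1, 1, 0, -1. Transposing the input first moves that coefficient to index 4, and the residual becomes the transposed pattern (rows of 1, 1, 0, -1 from top to bottom), so any block that is not symmetric produces different pixels once the transpose is inserted.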
Follow-up task #184: clarify daedalus_decoder.h's mb_input.coeffs
docstring so future integrators don't repeat this transpose
mistake.
Result on hertz (Pi 5 V3D 7.1)
------------------------------
testsrc2 I-only via libx264 -bf 0 -g 1:
320x240, 5 frames, substrate=auto: Y diff 0/76800, UV diff 0/38400 PASS
320x240, 5 frames, substrate=cpu: Y diff 0/76800, UV diff 0/38400 PASS
320x240, 5 frames, substrate=qpu: Y diff 0/76800, UV diff 0/38400 PASS
1920x1088, 3 frames, substrate=auto: Y diff 0/2088960, UV diff 0/1044480 PASS
Real-coeffs path engaged for 77-95 MBs per 320x240 frame and
598-643 MBs per 1080p frame (testsrc2 is mostly flat → many
Intra_16x16 MBs that fall back to identity passthrough; richer
content streams would engage real-coeffs more).
Followups
---------
- PR-A4: extend the gate to Intra_16x16 (chroma DC Hadamard +
Intra_16x16 luma DC Hadamard pre-pass) — currently ~30-60%
of MBs fall back to identity-passthrough due to this.
- PR-A5: extend to 8x8 transform (separate IDCT 8x8 dispatch
path on the daedalus-decoder side, similar plumbing).
- PR-A6: enable libavcodec's deblock (skip_loop_filter=AVDISCARD_NONE)
and have daedalus's deblock produce the post-deblock output
that matches AVFrame. Closes the loop on the full I-only
pipeline.
- Task #184: daedalus_decoder.h coeffs docstring clarification.
|
||
|
|
86a28d2a3b |
Stage 2 PR-A2: per-MB inspection callback wiring + invariant checks
Validates marfrit-packages patch 0016 (PR #106) end-to-end against the
daedalus_decode_h264 CLI. Callback fires once per macroblock in coded
order; this PR checks the count + uniqueness invariants WITHOUT yet
driving daedalus-decoder differently (that's PR-A3).
Infrastructure landed
---------------------
CMake gains DAEDALUS_FFMPEG_PREFIX option pointing at a private FFmpeg
install carrying patch 0016. When set, the CLI links against it (static
.a's from $prefix/lib) and the inspection codepath is compiled in
(DAEDALUS_HAVE_H264_MB_INSPECT_CB). When unset, the CLI falls back to
the pkg-config-discovered system FFmpeg and behaves as PR-A1b did
(identity-passthrough only, no callback).
The H264Context struct stays opaque (forward-decl only; its real
definition lives in libavcodec's internal h264dec.h which isn't
installed). Real per-MB state extraction (sl->mb coeffs, mb_type, intra
modes, deblock params) will land in PR-A3 alongside an internal-header
include path.
The callback's only job in this PR: assert (mb_x, mb_y) lies in the
coded grid, mark "seen" in a per-frame bitmap, count invocations. At
end-of-frame: assert seen-count == mb_w*mb_h, 0 duplicates, 0
out-of-bounds.
Per-frame mb-grid init goes BEFORE first avcodec_send_packet (callbacks
fire from inside send_packet, before the first receive_frame ever
returns; lazy init from AVFrame would miss all of frame 0). Dims come
from codecpar->width/height rounded up to 16-mod (H.264 codes 1080
display as 1088 coded).
Raster-order check considered but dropped: libavcodec uses MB-level
threading in some configs so callbacks fire out of raster order. The
contract is "each MB exactly once", not "in raster order"; the bitmap
check captures that.
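The grid sizing and the "each MB exactly once" bookkeeping described above could look roughly like this. It is a sketch under stated assumptions, not the CLI's implementation: the `mb_grid` type and its functions are hypothetical names, and the "bitmap" is kept as one byte per macroblock for simplicity.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-frame bookkeeping for the inspection callback. */
typedef struct {
    int mb_w, mb_h;       /* coded size in macroblocks */
    uint8_t *seen;        /* one flag per MB, reset every frame */
    long calls;           /* total callback invocations */
    long duplicates;      /* MBs reported more than once in a frame */
    long out_of_bounds;   /* (mb_x, mb_y) outside the coded grid */
} mb_grid;

/* Size the grid from display dimensions rounded up to a multiple of 16
 * (1080 display rows become 68 MB rows, i.e. 1088 coded). Must run
 * before the first packet is sent, since callbacks fire during decode. */
static int mb_grid_init(mb_grid *g, int width, int height)
{
    g->mb_w = (width + 15) / 16;
    g->mb_h = (height + 15) / 16;
    g->calls = g->duplicates = g->out_of_bounds = 0;
    g->seen = calloc((size_t)g->mb_w * (size_t)g->mb_h, 1);
    return g->seen ? 0 : -1;
}

/* Called once per callback invocation; order does not matter. */
static void mb_grid_mark(mb_grid *g, int mb_x, int mb_y)
{
    g->calls++;
    if (mb_x < 0 || mb_y < 0 || mb_x >= g->mb_w || mb_y >= g->mb_h) {
        g->out_of_bounds++;
        return;
    }
    uint8_t *slot = &g->seen[(size_t)mb_y * (size_t)g->mb_w + (size_t)mb_x];
    if (*slot)
        g->duplicates++;
    *slot = 1;
}

/* Number of MBs never reported in the current frame. */
static long mb_grid_missing(const mb_grid *g)
{
    long missing = 0;
    size_t n = (size_t)g->mb_w * (size_t)g->mb_h;
    for (size_t i = 0; i < n; i++)
        missing += !g->seen[i];
    return missing;
}

/* Clear the flags at end-of-frame, after the invariants were checked. */
static void mb_grid_next_frame(mb_grid *g)
{
    memset(g->seen, 0, (size_t)g->mb_w * (size_t)g->mb_h);
}

static void mb_grid_free(mb_grid *g)
{
    free(g->seen);
    g->seen = NULL;
}
```

Since the check is purely set-based, it stays valid whether callbacks arrive in raster order or not.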
Result on hertz (Pi 5, patched FFmpeg at /tmp/ffmpeg-inspect-prefix)
-------------------------------------------------------------------
320x240 I-only, 3 frames:
mb-grid 20x15
callback invocations: 900 (= 3 * 300)
missing/duplicates/oob: 0/0/0
identity-passthrough Y diff 0/230400, UV diff 0/115200 PASS
1920x1088 I-only, 3 frames:
mb-grid 120x68
callback invocations: 24480 (= 3 * 8160)
missing/duplicates/oob: 0/0/0
identity-passthrough Y diff 0/6266880, UV diff 0/3133440 PASS
Followups
---------
- PR-A3: include libavcodec/h264dec.h via -I to access H264Context
  internals; extract sl->mb coefficients in the callback, compute
  P = pre-deblock pixels - IDCT(C) using a transcribed C reference;
  feed daedalus_decoder with REAL (P, C, edges) instead of identity.
  Use avctx->skip_loop_filter = AVDISCARD_ALL to make libavcodec
  output pre-deblock so the subtraction is exact.
- PR-A4 onwards: extend to P/B frames + chroma DC + intra prediction
  coverage.
|
||
|
|
56f8498057 |
Stage 2 PR-A1b: tools/daedalus_decode_h264 — H.264 standalone test harness
Option A's standalone end-to-end gate against real H.264 streams.
First iteration: identity-passthrough validation — daedalus-decoder
produces output byte-exact to libavcodec's AVFrame when fed the
reconstructed pixels as `predicted`, zero coeffs, no deblock edges.
Validates: daedalus-decoder data path (append_mb + flush_frame +
NV12 output + coded-vs-display dim handling) at real-stream frame
sizes (320x240 and 1920x1088) with real H.264-decoded predicted-
sample distributions — not the random patterns the existing
test_idct_bitexact + test_deblock_smoke synthesize.
Identity-passthrough math:
- mb_input.predicted = AVFrame pixels at MB raster position
- mb_input.coeffs = 384 int16 values, all zero
- mb_input.edges = NULL, n_edges = 0
flush_frame:
scratch_y/_uv pre-fill from predicted (= AVFrame pixels)
IDCT dispatches with all-zero coeffs add 0 (no-op compute)
No deblock dispatches (no edges)
copy-out → caller's NV12 planes
Result MUST equal AVFrame pixels byte-for-byte.
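The byte-for-byte check behind the "Y diff 0/76800" style of report can be sketched as a stride-aware plane comparison. The helper name `count_plane_diff` is hypothetical and the actual tool may compare differently; the point is that only the visible `width` bytes of each row are compared, so row padding never counts as a mismatch.

```c
#include <stddef.h>
#include <stdint.h>

/* Count differing samples between two planes of width x height bytes.
 * Each plane has its own stride (bytes per row), which may exceed width. */
static size_t count_plane_diff(const uint8_t *a, ptrdiff_t a_stride,
                               const uint8_t *b, ptrdiff_t b_stride,
                               int width, int height)
{
    size_t diff = 0;
    for (int y = 0; y < height; y++) {
        const uint8_t *ra = a + (ptrdiff_t)y * a_stride;
        const uint8_t *rb = b + (ptrdiff_t)y * b_stride;
        for (int x = 0; x < width; x++)
            diff += (ra[x] != rb[x]);
    }
    return diff;
}
```

For NV12 output the same helper would apply to the interleaved UV plane by passing its byte width (equal to the luma width) and half the luma height.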
Build
-----
New cmake option DAEDALUS_BUILD_TOOLS (default OFF). When enabled,
CMake looks up libavcodec / libavformat / libavutil via pkg-config and
builds the daedalus_decode_h264 binary against the system FFmpeg.
Stock libavcodec is sufficient for THIS PR (identity passthrough
reads from AVFrame after avcodec_receive_frame; no per-MB internal
state extraction needed). Follow-up PRs (A2+) will use the per-MB
inspection callback added in marfrit-packages patch 0016 (PR #106)
to feed REAL per-MB state (pre-residual predicted samples, residual
coeffs, deblock edges) for actual non-trivial daedalus-decoder
validation.
Usage
-----
daedalus_decode_h264 [--substrate cpu|qpu|auto]
[--max-frames N]
<input.h264> <output_dadec.yuv> <output_ref.yuv>
Exit codes:
0 = byte-exact match across all frames
1 = argument / setup error
2 = decode error from libavcodec
3 = daedalus-decoder error (ctx, append, flush)
4 = byte-exact comparison failed
Result on hertz (Pi 5 V3D 7.1)
------------------------------
I-only test clip via ffmpeg testsrc2 + libx264 -bf 0 -g 1:
320x240, 5 frames:
substrate=auto: Y diff 0/76800 UV diff 0/38400 PASS
substrate=cpu: Y diff 0/76800 UV diff 0/38400 PASS
substrate=qpu: Y diff 0/76800 UV diff 0/38400 PASS
1920x1088 (coded; 1080 display), 3 frames:
substrate=auto: Y diff 0/2088960 UV diff 0/1044480 PASS
Followups
---------
- PR-A2: wire the per-MB inspection callback (marfrit-packages
0016) so per-MB state — coeffs (sl->mb), predicted-before-
residual (from prediction kernels), bS/alpha/beta — flows into
mb_input instead of zeros, and IDCT / deblock dispatches do
real GPU work. At that point we're decoding real H.264 streams
through daedalus-decoder for real.
- PR-A3: extend to P/B frames once MC dispatch lands.
|