From ea99dc8e27b576fffa6d71734a379b375c677647 Mon Sep 17 00:00:00 2001
From: claude-noether
Date: Tue, 26 May 2026 09:46:10 +0200
Subject: [PATCH] ffmpeg-v4l2-request-fourier: preserve sl->mb for inspection callback (0017)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Companion to 0016 (PR #106). Adds a coefficient side buffer in
H264Context, populated at the start of ff_h264_hl_decode_mb with a
single memcpy from sl->mb BEFORE IDCT-add zeros it. The existing
post-pixel-work callback (still in 0016) can now read:
  - h->mb_inspect_coeffs = pre-IDCT coefficients (this patch)
  - h->cur_pic.f->data   = post-pixel-work pre-deblock reconstruction
and derive P = pixels − IDCT(C) for daedalus-decoder's frame-major
dispatch in PR-A3+.

Memcpy gated on (h->mb_inspect_cb != NULL) and on the 8-bit path
(!h->pixel_shift). No copy is made when no consumer is registered; the
only remaining cost is one pointer test per MB.

Side buffer = 16 * 48 int16 = 1536 bytes, i.e. the part of sl->mb's
declared int16_t[16 * 48 * 2] that the 8-bit path uses. High-bit-depth
decode stores 32-bit coefficients across the full array and is not
preserved here, since the daedalus-decoder consumer is 8-bit-only.

Single-threaded decode assumed at the consumer side
(avctx->thread_count = 1). Multi-slice / multi-threaded streams would
race on the single side buffer. This is an explicit limitation of the
inspection mechanism; a future extension would put per-slice buffers in
H264SliceContext.

Verified: patches 0016 + 0017 apply cleanly and build in sequence
against the Kwiboo v4l2-request-n8.1 fork at the pinned commit
b57fbbe5. ff_h264_set_mb_inspect_cb symbol exported as before.

Wired into arch PKGBUILD + debian build-deb.sh patch sequence.
pkgrel bumped 13 → 14.

Refs reauktion/daedalus-decoder!14 (PR-A2 callback wiring complete,
PR-A3 coefficient extraction is the next consumer).
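The snapshot-then-subtract flow described above can be sketched in isolation. This is a toy model under stated assumptions, not the FFmpeg code: `ToyCtx`, `toy_decode_block`, `recover_pred` and `demo_roundtrip` are hypothetical names, the block is a single 4x4 with row-major coefficients (FFmpeg's own coefficient layout differs), and the residual uses the standard H.264 4x4 inverse-transform butterflies. It also shows a caveat: P = pixels − IDCT(C) is only exact where the IDCT-add did not clip to [0, 255].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type; stands in for the 0016 inspection hook. */
typedef void (*inspect_cb)(void *opaque, const int16_t *coeffs,
                           const uint8_t *pixels);

typedef struct ToyCtx {
    inspect_cb cb;
    void *opaque;
    int16_t side[16];   /* plays the role of mb_inspect_coeffs */
} ToyCtx;

/* H.264 4x4 inverse transform residual, row-major input (assumption).
 * Relies on arithmetic right shift of negative ints, as common compilers do. */
static void residual4x4(const int16_t c[16], int res[16])
{
    int t[16];
    for (int i = 0; i < 4; i++) {               /* horizontal pass */
        const int16_t *r = c + 4 * i;
        int z0 = r[0] + r[2], z1 = r[0] - r[2];
        int z2 = (r[1] >> 1) - r[3], z3 = r[1] + (r[3] >> 1);
        t[4 * i + 0] = z0 + z3; t[4 * i + 1] = z1 + z2;
        t[4 * i + 2] = z1 - z2; t[4 * i + 3] = z0 - z3;
    }
    for (int j = 0; j < 4; j++) {               /* vertical pass + rounding */
        int z0 = t[j] + t[8 + j], z1 = t[j] - t[8 + j];
        int z2 = (t[4 + j] >> 1) - t[12 + j], z3 = t[4 + j] + (t[12 + j] >> 1);
        res[j]      = (z0 + z3 + 32) >> 6;
        res[4 + j]  = (z1 + z2 + 32) >> 6;
        res[8 + j]  = (z1 - z2 + 32) >> 6;
        res[12 + j] = (z0 - z3 + 32) >> 6;
    }
}

static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* On entry pixels[] holds the prediction P; on exit it holds clip(P + IDCT(C)). */
static void toy_decode_block(ToyCtx *h, int16_t coeffs[16], uint8_t pixels[16])
{
    int res[16];
    if (h->cb)                                  /* gated snapshot, before IDCT-add */
        memcpy(h->side, coeffs, sizeof(h->side));
    residual4x4(coeffs, res);
    for (int i = 0; i < 16; i++)
        pixels[i] = clip_u8(pixels[i] + res[i]);
    memset(coeffs, 0, 16 * sizeof(*coeffs));    /* IDCT-add zeroes the live buffer */
    if (h->cb)
        h->cb(h->opaque, h->side, pixels);
}

/* Consumer: recompute the residual from the snapshot and subtract it. */
static void recover_pred(void *opaque, const int16_t *coeffs,
                         const uint8_t *pixels)
{
    int res[16], *pred = opaque;
    residual4x4(coeffs, res);
    for (int i = 0; i < 16; i++)
        pred[i] = pixels[i] - res[i];
}

/* Returns the number of prediction samples that were not recovered exactly. */
int demo_roundtrip(void)
{
    ToyCtx h = { recover_pred, NULL, { 0 } };
    int recovered[16], mismatches = 0;
    int16_t coeffs[16] = { 0 };
    uint8_t pixels[16];

    h.opaque = recovered;
    coeffs[0] = 640;                            /* values chosen so nothing clips */
    coeffs[1] = -128;
    for (int i = 0; i < 16; i++)
        pixels[i] = (uint8_t)(100 + i);
    toy_decode_block(&h, coeffs, pixels);
    for (int i = 0; i < 16; i++) {
        assert(coeffs[i] == 0);                 /* live buffer is gone by callback time */
        mismatches += recovered[i] != 100 + i;
    }
    return mismatches;
}
```

Calling `demo_roundtrip()` from a small `main` should report zero mismatches for these inputs. With a prediction near 0 or 255 and a large residual, the clip would discard information and the subtraction would no longer return P exactly, so a real consumer may want to detect or tolerate that case.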
---
 .../0017-h264-mb-coeffs-side-buffer.patch    | 88 +++++++++++++++++++
 arch/ffmpeg-v4l2-request-fourier/PKGBUILD    |  8 +-
 .../0017-h264-mb-coeffs-side-buffer.patch    | 88 +++++++++++++++++++
 .../ffmpeg-v4l2-request-fourier/build-deb.sh |  3 +-
 4 files changed, 183 insertions(+), 4 deletions(-)
 create mode 100644 arch/ffmpeg-v4l2-request-fourier/0017-h264-mb-coeffs-side-buffer.patch
 create mode 100644 debian/ffmpeg-v4l2-request-fourier/0017-h264-mb-coeffs-side-buffer.patch

diff --git a/arch/ffmpeg-v4l2-request-fourier/0017-h264-mb-coeffs-side-buffer.patch b/arch/ffmpeg-v4l2-request-fourier/0017-h264-mb-coeffs-side-buffer.patch
new file mode 100644
index 000000000..b20484626
--- /dev/null
+++ b/arch/ffmpeg-v4l2-request-fourier/0017-h264-mb-coeffs-side-buffer.patch
@@ -0,0 +1,88 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Markus Fritsche
+Date: Tue, 26 May 2026 07:30:00 +0200
+Subject: [PATCH] avcodec/h264: preserve sl->mb coefficients for the inspection
+ callback (companion to 0016)
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Patch 0016 adds a per-MB inspection callback fired at the end of
+ff_h264_hl_decode_mb. By that time the IDCT-add path has already
+zeroed sl->mb (FFmpeg's convention — see ff_h264_idct_add_neon and
+friends), so consumers reading coefficients from the callback get
+zeros.
+
+Add a coefficient side buffer in H264Context, populated at the
+START of ff_h264_hl_decode_mb (before any IDCT runs) with a single
+memcpy from sl->mb. The post-pixel-work callback (still in 0016)
+can then read both:
+  - the side-buffer coefficients (= just-entropy-decoded, pre-IDCT)
+  - the reconstructed pixels in h->cur_pic.f->data (= P + IDCT(C),
+    pre-deblock for this MB)
+and the consumer can derive P = pixels − IDCT(C) for daedalus-
+decoder's frame-major dispatch.
+
+Memcpy is gated on (h->mb_inspect_cb != NULL) — zero overhead when
+no consumer is registered. Buffer size = sizeof(int16_t) * 16 * 48
+= 1536 bytes per H264Context (fits in one cache line family;
+allocated once at H264Context lifetime, reused per MB).
+
+8-bit path only. High-bit-depth H.264 uses the upper half of
+sl->mb (int16_t[16 * 48 * 2] declared; the * 2 reserves space for
+the high-depth case); preserving the high-depth coefficients
+correctly would need a wider side buffer. Punted for now — the
+daedalus-decoder consumer is 8-bit-only.
+
+Single-threaded decode assumed at the consumer side (avctx->
+thread_count = 1). Multi-slice / multi-threaded streams would
+race on the single side buffer — that's an explicit limitation of
+the inspection mechanism, documented in 0016's comment block.
+Future extension: per-H264SliceContext side buffers.
+
+Used by:
+  - daedalus-decoder/tools/daedalus_decode_h264 PR-A3+ (CLI test
+    harness extracts coefficients here for daedalus-decoder
+    IDCT validation on real H.264 streams).
+
+Refs reauktion/daedalus-decoder!14 (PR-A2 callback wiring).
+---
+ libavcodec/h264_mb.c | 9 +++++++++
+ libavcodec/h264dec.h | 8 ++++++++
+ 2 files changed, 17 insertions(+)
+
+--- a/libavcodec/h264dec.h
++++ b/libavcodec/h264dec.h
+@@ -593,6 +593,14 @@
+     /* Per-MB inspection hook — set via ff_h264_set_mb_inspect_cb. */
+     ff_h264_mb_inspect_cb mb_inspect_cb;
+     void *mb_inspect_opaque;
++
++    /* Per-MB coefficient side buffer — populated at the start of
++     * ff_h264_hl_decode_mb so the post-pixel-work inspection callback
++     * can read the just-entropy-decoded coefficients before IDCT-add
++     * zeros sl->mb. 16 blocks × 48 int16 = libavcodec sl->mb size
++     * (matches DECLARE_ALIGNED(16, int16_t, mb)[16 * 48 * 2] for the
++     * 8-bit half; high-bit-depth paths skip this — see h264_mb.c). */
++    DECLARE_ALIGNED(16, int16_t, mb_inspect_coeffs)[16 * 48];
+ } H264Context;
+ 
+ extern const uint16_t ff_h264_mb_sizes[4];
+--- a/libavcodec/h264_mb.c
++++ b/libavcodec/h264_mb.c
+@@ -801,6 +801,15 @@
+ {
+     const int mb_xy = sl->mb_xy;
+     const int mb_type = h->cur_pic.mb_type[mb_xy];
++
++    /* Snapshot just-entropy-decoded coefficients before IDCT-add
++     * destroys them. Only when an inspection callback is registered
++     * — zero cost otherwise. 8-bit path only (high-bit-depth uses
++     * the upper half of sl->mb which we don't preserve here). */
++    if (h->mb_inspect_cb && !h->pixel_shift)
++        memcpy((int16_t *) (uintptr_t) h->mb_inspect_coeffs, sl->mb,
++               sizeof(((H264Context *) NULL)->mb_inspect_coeffs));
++
+     int is_complex = CONFIG_SMALL || sl->is_complex ||
+                      IS_INTRA_PCM(mb_type) || sl->qscale == 0;
+ 
diff --git a/arch/ffmpeg-v4l2-request-fourier/PKGBUILD b/arch/ffmpeg-v4l2-request-fourier/PKGBUILD
index e9bc3907b..6380351ac 100644
--- a/arch/ffmpeg-v4l2-request-fourier/PKGBUILD
+++ b/arch/ffmpeg-v4l2-request-fourier/PKGBUILD
@@ -24,7 +24,7 @@ _srcname=FFmpeg
 _version='8.1'
 _commit='b57fbbe50c9b2656fad86a1a7eeabfd2b2a50935' # v4l2-request-n8.1 tip 2026-04-24
 pkgver=8.1.r123329.b57fbbe
-pkgrel=13 # pkgrel=13 — per-MB inspection callback (0016) for daedalus-decoder CLI test harness; observation-only, no behaviour change to existing decode path
+pkgrel=14 # pkgrel=14 — per-MB coefficient side buffer (0017) extending 0016 for daedalus-decoder CLI IDCT validation; observation-only, no behaviour change to existing decode path
 epoch=2
 
 # daedalus-fourier pin. 209a421 = PR #2 merge (Phase 8c — public API
@@ -103,8 +103,9 @@ source=("git+https://github.com/Kwiboo/FFmpeg.git#commit=${_commit}"
         '0013-h264-deblock-chroma-intra-daedalus-fourier.patch'
         '0014-h264-ctx-qpu-capable.patch'
         '0015-h264-ctx-revert-to-no-qpu.patch'
-        '0016-h264-mb-inspect-callback.patch')
-sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
+        '0016-h264-mb-inspect-callback.patch'
+        '0017-h264-mb-coeffs-side-buffer.patch')
+sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
 
 pkgver() {
   cd "${_srcname}"
@@ -131,6 +132,7 @@ prepare() {
   patch -Np1 -i "${srcdir}/0014-h264-ctx-qpu-capable.patch"
   patch -Np1 -i "${srcdir}/0015-h264-ctx-revert-to-no-qpu.patch"
   patch -Np1 -i "${srcdir}/0016-h264-mb-inspect-callback.patch"
+  patch -Np1 -i "${srcdir}/0017-h264-mb-coeffs-side-buffer.patch"
 }
 
 build() {
diff --git a/debian/ffmpeg-v4l2-request-fourier/0017-h264-mb-coeffs-side-buffer.patch b/debian/ffmpeg-v4l2-request-fourier/0017-h264-mb-coeffs-side-buffer.patch
new file mode 100644
index 000000000..b20484626
--- /dev/null
+++ b/debian/ffmpeg-v4l2-request-fourier/0017-h264-mb-coeffs-side-buffer.patch
@@ -0,0 +1,88 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Markus Fritsche
+Date: Tue, 26 May 2026 07:30:00 +0200
+Subject: [PATCH] avcodec/h264: preserve sl->mb coefficients for the inspection
+ callback (companion to 0016)
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Patch 0016 adds a per-MB inspection callback fired at the end of
+ff_h264_hl_decode_mb. By that time the IDCT-add path has already
+zeroed sl->mb (FFmpeg's convention — see ff_h264_idct_add_neon and
+friends), so consumers reading coefficients from the callback get
+zeros.
+
+Add a coefficient side buffer in H264Context, populated at the
+START of ff_h264_hl_decode_mb (before any IDCT runs) with a single
+memcpy from sl->mb. The post-pixel-work callback (still in 0016)
+can then read both:
+  - the side-buffer coefficients (= just-entropy-decoded, pre-IDCT)
+  - the reconstructed pixels in h->cur_pic.f->data (= P + IDCT(C),
+    pre-deblock for this MB)
+and the consumer can derive P = pixels − IDCT(C) for daedalus-
+decoder's frame-major dispatch.
+
+Memcpy is gated on (h->mb_inspect_cb != NULL) — zero overhead when
+no consumer is registered. Buffer size = sizeof(int16_t) * 16 * 48
+= 1536 bytes per H264Context (fits in one cache line family;
+allocated once at H264Context lifetime, reused per MB).
+
+8-bit path only. High-bit-depth H.264 uses the upper half of
+sl->mb (int16_t[16 * 48 * 2] declared; the * 2 reserves space for
+the high-depth case); preserving the high-depth coefficients
+correctly would need a wider side buffer. Punted for now — the
+daedalus-decoder consumer is 8-bit-only.
+
+Single-threaded decode assumed at the consumer side (avctx->
+thread_count = 1). Multi-slice / multi-threaded streams would
+race on the single side buffer — that's an explicit limitation of
+the inspection mechanism, documented in 0016's comment block.
+Future extension: per-H264SliceContext side buffers.
+
+Used by:
+  - daedalus-decoder/tools/daedalus_decode_h264 PR-A3+ (CLI test
+    harness extracts coefficients here for daedalus-decoder
+    IDCT validation on real H.264 streams).
+
+Refs reauktion/daedalus-decoder!14 (PR-A2 callback wiring).
+---
+ libavcodec/h264_mb.c | 9 +++++++++
+ libavcodec/h264dec.h | 8 ++++++++
+ 2 files changed, 17 insertions(+)
+
+--- a/libavcodec/h264dec.h
++++ b/libavcodec/h264dec.h
+@@ -593,6 +593,14 @@
+     /* Per-MB inspection hook — set via ff_h264_set_mb_inspect_cb. */
+     ff_h264_mb_inspect_cb mb_inspect_cb;
+     void *mb_inspect_opaque;
++
++    /* Per-MB coefficient side buffer — populated at the start of
++     * ff_h264_hl_decode_mb so the post-pixel-work inspection callback
++     * can read the just-entropy-decoded coefficients before IDCT-add
++     * zeros sl->mb. 16 blocks × 48 int16 = libavcodec sl->mb size
++     * (matches DECLARE_ALIGNED(16, int16_t, mb)[16 * 48 * 2] for the
++     * 8-bit half; high-bit-depth paths skip this — see h264_mb.c). */
++    DECLARE_ALIGNED(16, int16_t, mb_inspect_coeffs)[16 * 48];
+ } H264Context;
+ 
+ extern const uint16_t ff_h264_mb_sizes[4];
+--- a/libavcodec/h264_mb.c
++++ b/libavcodec/h264_mb.c
+@@ -801,6 +801,15 @@
+ {
+     const int mb_xy = sl->mb_xy;
+     const int mb_type = h->cur_pic.mb_type[mb_xy];
++
++    /* Snapshot just-entropy-decoded coefficients before IDCT-add
++     * destroys them. Only when an inspection callback is registered
++     * — zero cost otherwise. 8-bit path only (high-bit-depth uses
++     * the upper half of sl->mb which we don't preserve here). */
++    if (h->mb_inspect_cb && !h->pixel_shift)
++        memcpy((int16_t *) (uintptr_t) h->mb_inspect_coeffs, sl->mb,
++               sizeof(((H264Context *) NULL)->mb_inspect_coeffs));
++
+     int is_complex = CONFIG_SMALL || sl->is_complex ||
+                      IS_INTRA_PCM(mb_type) || sl->qscale == 0;
+ 
diff --git a/debian/ffmpeg-v4l2-request-fourier/build-deb.sh b/debian/ffmpeg-v4l2-request-fourier/build-deb.sh
index 19d85ad18..d56a92c4f 100755
--- a/debian/ffmpeg-v4l2-request-fourier/build-deb.sh
+++ b/debian/ffmpeg-v4l2-request-fourier/build-deb.sh
@@ -33,7 +33,7 @@ FFMPEG_VERSION=8.1
 # epoch 2 matches Debian's stock ffmpeg (currently 7:7.1.x in trixie);
 # +rfourier suffix to avoid colliding with upstream/Debian rebuilds.
 PKGVER=2:${FFMPEG_VERSION}+rfourier+gb57fbbe
-PKGREL=13 # pkgrel=13 — per-MB inspection callback (0016) for daedalus-decoder CLI test harness; observation-only, no behaviour change to existing decode path
+PKGREL=14 # pkgrel=14 — per-MB coefficient side buffer (0017) extending 0016 for daedalus-decoder CLI IDCT validation; observation-only, no behaviour change to existing decode path
 # (cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes
 # the libavcodec.so substitution sequence 6 IDCT4 / 7 IDCT8 /
 # 8 luma-v deblock / 9 qpel mc20). Pulls daedalus-fourier PR #2
@@ -83,6 +83,7 @@ patch -Np1 -i "$HERE/0013-h264-deblock-chroma-intra-daedalus-fourier.patch"
 patch -Np1 -i "$HERE/0014-h264-ctx-qpu-capable.patch"
 patch -Np1 -i "$HERE/0015-h264-ctx-revert-to-no-qpu.patch"
 patch -Np1 -i "$HERE/0016-h264-mb-inspect-callback.patch"
+patch -Np1 -i "$HERE/0017-h264-mb-coeffs-side-buffer.patch"
 
 # --- daedalus-fourier: fetch + build static .a with PIC, install to a
 # per-build prefix; libavcodec.so links it into the shared object so
-- 
2.47.3