claude-noether ea99dc8e27 ffmpeg-v4l2-request-fourier: preserve sl->mb for inspection callback (0017)
Companion to 0016 (PR #106).  Adds a coefficient side buffer in
H264Context, populated at the start of ff_h264_hl_decode_mb with a
single memcpy from sl->mb BEFORE IDCT-add zeros it.  The existing
post-pixel-work callback (still in 0016) can now read:
  - h->mb_inspect_coeffs  = pre-IDCT coefficients (this patch)
  - h->cur_pic.f->data    = post-pixel-work pre-deblock reconstruction

and derive P = pixels − IDCT(C) for daedalus-decoder's frame-major
dispatch in PR-A3+.
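
A minimal consumer-side sketch of that derivation for one 4x4 luma
block follows.  It is not code from daedalus-decoder: the function
names, the row-major coefficient layout and the assumption that the
side buffer holds already-dequantized coefficients are all
illustrative and would need checking against the fork.  It also makes
one caveat explicit: the decoder clips P + IDCT(C) to [0, 255], so
P = pixels − IDCT(C) is only exact where no clipping occurred.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Residual of one 4x4 block, using the H.264 4x4 inverse integer
 * transform butterflies with (x + 32) >> 6 rounding.  Assumes row-major
 * coefficients and an arithmetic right shift for negative values. */
static void residual_4x4(const int16_t coeffs[16], int residual[16])
{
    int t[16];

    for (int i = 0; i < 16; i++)
        t[i] = coeffs[i];
    t[0] += 32;                     /* rounding term, reaches all outputs */

    for (int i = 0; i < 4; i++) {   /* first 1-D pass, over columns */
        const int z0 = t[i] + t[i + 8];
        const int z1 = t[i] - t[i + 8];
        const int z2 = (t[i + 4] >> 1) - t[i + 12];
        const int z3 = t[i + 4] + (t[i + 12] >> 1);
        t[i]      = z0 + z3;
        t[i + 4]  = z1 + z2;
        t[i + 8]  = z1 - z2;
        t[i + 12] = z0 - z3;
    }
    for (int i = 0; i < 4; i++) {   /* second 1-D pass, over rows */
        const int *r = t + 4 * i;
        const int z0 = r[0] + r[2];
        const int z1 = r[0] - r[2];
        const int z2 = (r[1] >> 1) - r[3];
        const int z3 = r[1] + (r[3] >> 1);
        residual[4 * i + 0] = (z0 + z3) >> 6;
        residual[4 * i + 1] = (z1 + z2) >> 6;
        residual[4 * i + 2] = (z1 - z2) >> 6;
        residual[4 * i + 3] = (z0 - z3) >> 6;
    }
}

/* pred = recon - residual for one 4x4 block.  Returns the number of
 * pixels sitting at 0 or 255, where clipping may have occurred and the
 * recovered prediction is therefore not guaranteed to be exact. */
static int recover_pred_4x4(const uint8_t *recon, ptrdiff_t stride,
                            const int16_t coeffs[16], int pred[16])
{
    int residual[16];
    int maybe_clipped = 0;

    assert(stride >= 4);
    residual_4x4(coeffs, residual);
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            const int px = recon[y * stride + x];
            pred[4 * y + x] = px - residual[4 * y + x];
            if (px == 0 || px == 255)
                maybe_clipped++;
        }
    return maybe_clipped;
}
```

A consumer could treat a non-zero return value as a signal to fall
back to another way of obtaining P for that block.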

Memcpy gated on (h->mb_inspect_cb != NULL) and on the 8-bit path
(!h->pixel_shift).  No copy is made when no consumer is registered.
Side buffer = 16 * 48 int16 = 1536 bytes, matching the 8-bit layout
of sl->mb.  sl->mb is declared int16_t[16 * 48 * 2] because
high-bit-depth coefficients are int32_t and fill the whole array;
those are not preserved here since the daedalus-decoder consumer is
8-bit-only.
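
The gating and the size arithmetic can be modelled outside FFmpeg as
below.  This is a standalone sketch, not the patch itself: MockCtx,
MockSlice, inspect_cb and noop_cb are hypothetical stand-ins for
H264Context, H264SliceContext and the 0016 callback type, whose real
signature is not shown here.

```c
#include <stdint.h>
#include <string.h>

#define MB_COEFFS (16 * 48)             /* int16_t entries, 8-bit layout */

typedef void (*inspect_cb)(void *opaque);   /* hypothetical signature */

typedef struct MockSlice {
    int16_t mb[MB_COEFFS * 2];          /* mirrors sl->mb's declared size */
} MockSlice;

typedef struct MockCtx {
    inspect_cb cb;                      /* NULL when no consumer is registered */
    void *opaque;
    int pixel_shift;                    /* 0 for 8-bit, 1 for high bit depth */
    int16_t inspect_coeffs[MB_COEFFS];  /* side buffer: 768 * 2 = 1536 bytes */
} MockCtx;

/* Placeholder consumer so the gate can be opened in a test. */
static void noop_cb(void *opaque)
{
    (void) opaque;
}

/* Copy the 8-bit coefficient layout aside before IDCT-add clears it.
 * Returns 1 if a snapshot was taken, 0 if the gate skipped the copy. */
static int snapshot_coeffs(MockCtx *h, const MockSlice *sl)
{
    if (!h->cb || h->pixel_shift)
        return 0;
    memcpy(h->inspect_coeffs, sl->mb, sizeof(h->inspect_coeffs));
    return 1;
}
```

With no callback registered the function only evaluates the branch, so
the per-MB overhead for ordinary decoding is a single pointer test.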

Single-threaded decode assumed at the consumer side
(avctx->thread_count = 1).  Multi-slice / multi-threaded streams
would race on the single side buffer.  This is an explicit limitation
of the inspection mechanism; a future extension would put per-slice
buffers in H264SliceContext.

Verified: patches 0016 + 0017 apply cleanly and build in sequence
against the Kwiboo v4l2-request-n8.1 fork at the pinned commit
b57fbbe5.  ff_h264_set_mb_inspect_cb symbol exported as before.

Wired into arch PKGBUILD + debian build-deb.sh patch sequence.
pkgrel bumped 13 → 14.

Refs reauktion/daedalus-decoder!14 (PR-A2 callback wiring complete,
PR-A3 coefficient extraction is the next consumer).
2026-05-26 09:46:10 +02:00

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Tue, 26 May 2026 07:30:00 +0200
Subject: [PATCH] avcodec/h264: preserve sl->mb coefficients for the inspection
callback (companion to 0016)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Patch 0016 adds a per-MB inspection callback fired at the end of
ff_h264_hl_decode_mb. By that time the IDCT-add path has already
zeroed sl->mb (FFmpeg's convention; see ff_h264_idct_add_neon and
friends), so consumers reading coefficients from the callback get
zeros.

Add a coefficient side buffer in H264Context, populated at the
START of ff_h264_hl_decode_mb (before any IDCT runs) with a single
memcpy from sl->mb. The post-pixel-work callback (still in 0016)
can then read both:
  - the side-buffer coefficients (= just-entropy-decoded, pre-IDCT)
  - the reconstructed pixels in h->cur_pic.f->data (= P + IDCT(C),
    pre-deblock for this MB)

and the consumer can derive P = pixels − IDCT(C) for
daedalus-decoder's frame-major dispatch.

Memcpy is gated on (h->mb_inspect_cb != NULL), so no copy is made
when no consumer is registered. Buffer size = sizeof(int16_t) * 16 * 48
= 1536 bytes per H264Context (roughly 24 cache lines of 64 bytes;
the array is embedded in H264Context, so it lives as long as the
context and is reused per MB).

8-bit path only. High-bit-depth H.264 stores coefficients as int32_t
and fills the whole of sl->mb (int16_t[16 * 48 * 2] declared; the * 2
reserves space for the high-depth case); preserving the high-depth
coefficients correctly would need a wider side buffer. Deferred for
now, since the daedalus-decoder consumer is 8-bit-only.

Single-threaded decode assumed at the consumer side
(avctx->thread_count = 1). Multi-slice / multi-threaded streams would
race on the single side buffer. That is an explicit limitation of
the inspection mechanism, documented in 0016's comment block.
Future extension: per-H264SliceContext side buffers.

Used by:
  - daedalus-decoder/tools/daedalus_decode_h264 PR-A3+ (CLI test
    harness extracts coefficients here for daedalus-decoder
    IDCT validation on real H.264 streams).

Refs reauktion/daedalus-decoder!14 (PR-A2 callback wiring).
---
libavcodec/h264_mb.c | 9 +++++++++
libavcodec/h264dec.h | 8 ++++++++
2 files changed, 17 insertions(+)
--- a/libavcodec/h264dec.h
+++ b/libavcodec/h264dec.h
@@ -593,6 +593,14 @@
     /* Per-MB inspection hook — set via ff_h264_set_mb_inspect_cb. */
     ff_h264_mb_inspect_cb mb_inspect_cb;
     void *mb_inspect_opaque;
+
+    /* Per-MB coefficient side buffer, populated at the start of
+     * ff_h264_hl_decode_mb so the post-pixel-work inspection callback
+     * can read the just-entropy-decoded coefficients before IDCT-add
+     * zeros sl->mb.  16 * 48 int16_t = the 8-bit layout of sl->mb
+     * (DECLARE_ALIGNED(16, int16_t, mb)[16 * 48 * 2]); high-bit-depth
+     * paths, which fill the whole array, skip this (see h264_mb.c). */
+    DECLARE_ALIGNED(16, int16_t, mb_inspect_coeffs)[16 * 48];
 } H264Context;
 extern const uint16_t ff_h264_mb_sizes[4];
--- a/libavcodec/h264_mb.c
+++ b/libavcodec/h264_mb.c
@@ -801,6 +801,15 @@
 {
     const int mb_xy = sl->mb_xy;
     const int mb_type = h->cur_pic.mb_type[mb_xy];
+
+    /* Snapshot just-entropy-decoded coefficients before IDCT-add
+     * destroys them.  Only when an inspection callback is registered,
+     * so nothing is copied otherwise.  8-bit path only (high-bit-depth
+     * fills all of sl->mb with int32_t; not preserved here). */
+    if (h->mb_inspect_cb && !h->pixel_shift)
+        memcpy((int16_t *) (uintptr_t) h->mb_inspect_coeffs, sl->mb,
+               sizeof(((H264Context *) NULL)->mb_inspect_coeffs));
+
     int is_complex = CONFIG_SMALL || sl->is_complex ||
                      IS_INTRA_PCM(mb_type) || sl->qscale == 0;