From 493c762967b05ec3d9da5410a91a3af5939cd054 Mon Sep 17 00:00:00 2001
From: claude-noether
Date: Fri, 22 May 2026 10:20:27 +0200
Subject: [PATCH] =?UTF-8?q?ffmpeg-v4l2-request-fourier:=20substitute=20H.2?=
 =?UTF-8?q?64=20IDCT=208=C3=978=20=E2=86=92=20daedalus-fourier?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cycle 7 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11
step 2). H264DSPContext.idct8_add — called per 8×8 block from the
High-profile intra-8×8-DCT decode path in libavcodec/h264_mb.c — now
dispatches through daedalus_recipe_dispatch_h264_idct8 instead of
ff_h264_idct8_add_neon.

## What

- Add 0004-h264-idct8-daedalus-fourier.patch (in both arch/ and debian/
  ffmpeg-v4l2-request-fourier/). Extends libavcodec/aarch64/
  h264_idct_daedalus.c (introduced by 0003) with
  ff_h264_idct8_add_daedalus and a daedalus_recipe_dispatch_h264_idct8
  call; patches libavcodec/aarch64/h264dsp_init_aarch64.c to wire
  c->idct8_add to the new shim.
- arch/PKGBUILD + debian/build-deb.sh: append the new patch to the
  apply list; bump pkgrel/PKGREL to 7.
- No new build-deps, no Depends change, no daedalus-fourier rev — the
  d87239d pin already exposes daedalus_recipe_dispatch_h264_idct8.

## Why

The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8×8) the
recipe is CPU NEON, so this is effectively a NEON-to-NEON substitution
layered on top of cycle 6.

Production validation of cycle 6 on higgs Firefox YouTube: 3040 frames
decoded cleanly, avg_decode_us=3388 (no regression vs the
pre-substitution ~4 ms baseline). Cycle 7 inherits the same shim's
pthread_once context.

Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
green; FFmpeg 8×8 block storage block[r + 8*c] matches daedalus
column-major convention).
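
For illustration only, the `block[r + 8*c]` indexing rule cited above can be sketched in C. The helper names `coeff_index` and `coeff_at_8x8` are hypothetical and are not FFmpeg or daedalus-fourier symbols; the only thing taken from this message is the column-major rule itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an FFmpeg or daedalus-fourier symbol):
 * offset of the coefficient at (row r, col c) in an n x n block stored
 * column-major, i.e. block[r + n*c]. */
static size_t coeff_index(size_t n, size_t r, size_t c)
{
    assert(r < n && c < n);
    return r + n * c;
}

/* Hypothetical helper: read one coefficient from an 8x8 block of
 * 64 int16_t values laid out as described above. */
static int16_t coeff_at_8x8(const int16_t block[64], size_t r, size_t c)
{
    return block[coeff_index(8, r, c)];
}
```

Under this rule, moving down a column advances the offset by 1 and moving along a row advances it by 8, so both sides of the shim can share one buffer without a transpose, provided the convention holds as stated.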

## Scope NOT covered (deferred)

- Bulk c->idct8_add4 (inter 8×8-DCT macroblocks) stays on the in-tree
  NEON .S code; batched substitution with n_blocks>1 lands later
  alongside the cycle-6 bulk-paths work.
- High-bit-depth (10-bit) path untouched.
- Cycles 8/9 — separate PRs.

## SONAME

Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.

## Refs

- reauktion/daedalus-v4l2 issue #11 (substitution arc):
  https://git.reauktion.de/reauktion/daedalus-v4l2/issues/11
- marfrit-packages PR #76 (cycle 6 IDCT 4×4)
- marfrit-packages PR #78 (libxml2 ABI-skew workaround)
- marfrit/daedalus-fourier cycle 7 close (H.264 IDCT 8×8 NEON green)
---
 .../0004-h264-idct8-daedalus-fourier.patch    | 107 ++++++++++++++++++
 arch/ffmpeg-v4l2-request-fourier/PKGBUILD     |   8 +-
 .../0004-h264-idct8-daedalus-fourier.patch    | 107 ++++++++++++++++++
 .../ffmpeg-v4l2-request-fourier/build-deb.sh  |  13 +--
 .../debian/changelog                          |  18 +++
 5 files changed, 243 insertions(+), 10 deletions(-)
 create mode 100644 arch/ffmpeg-v4l2-request-fourier/0004-h264-idct8-daedalus-fourier.patch
 create mode 100644 debian/ffmpeg-v4l2-request-fourier/0004-h264-idct8-daedalus-fourier.patch

diff --git a/arch/ffmpeg-v4l2-request-fourier/0004-h264-idct8-daedalus-fourier.patch b/arch/ffmpeg-v4l2-request-fourier/0004-h264-idct8-daedalus-fourier.patch
new file mode 100644
index 000000000..fd358f4cd
--- /dev/null
+++ b/arch/ffmpeg-v4l2-request-fourier/0004-h264-idct8-daedalus-fourier.patch
@@ -0,0 +1,107 @@
+From 1b286ddb4efaca26ec9b9e290e989fec77dc1c77 Mon Sep 17 00:00:00 2001
+From: Markus Fritsche
+Date: Fri, 22 May 2026 10:18:21 +0200
+Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 8x8 IDCT through
+ daedalus-fourier
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+H264DSPContext.idct8_add (called per 8x8 block from the High-profile
+intra-8x8-DCT decode path in h264_mb.c) now dispatches through
+daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.
+
+The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8x8)
+the recipe is CPU NEON, so this is effectively a NEON-to-NEON
+substitution layered on top of the cycle-6 IDCT 4x4 wiring. Same
+pthread_once global context, same destructive-zero semantics; FFmpeg
+column-major 8x8 storage block[r + 8*c] matches daedalus's convention.
+
+Bulk path c->idct8_add4 (used for inter 8x8-DCT macroblocks) remains
+on the in-tree NEON .S code and will be batched through
+daedalus_recipe_dispatch_h264_idct8 with n_blocks>1 in a follow-up.
+
+Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
+green).
+
+Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 7.
+---
+ libavcodec/aarch64/h264_idct_daedalus.c   | 29 ++++++++++++++++-------
+ libavcodec/aarch64/h264dsp_init_aarch64.c |  3 ++-
+ 2 files changed, 23 insertions(+), 9 deletions(-)
+
+diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
+index 538d223..cbb98af 100644
+--- a/libavcodec/aarch64/h264_idct_daedalus.c
++++ b/libavcodec/aarch64/h264_idct_daedalus.c
+@@ -1,14 +1,16 @@
+ /*
+- * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
++ * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
+  *
+- * Routes H264DSPContext.idct_add through
+- * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
+- * The recipe layer picks the substrate (CPU NEON by default for
+- * cycle 6; future cycles may dispatch to V3D opportunistically).
++ * Routes H264DSPContext.idct_add  → daedalus_recipe_dispatch_h264_idct4
++ *        H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
++ * instead of the in-tree ff_h264_idct{,8}_add_neon assembly. The
++ * recipe layer picks the substrate (CPU NEON by default for cycles
++ * 6 + 7; future cycles may dispatch to V3D opportunistically).
+  *
+- * FFmpeg's 4x4 block memory layout matches daedalus's column-major
+- * convention: block[r + 4*c] = coefficient at (row r, col c). Both
+- * sides destructively zero the block after the transform.
++ * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
++ * column-major convention: block[r + N*c] = coefficient at
++ * (row r, col c) for N ∈ {4, 8}. Both sides destructively zero the
++ * block after the transform.
+  *
+  * The library context is process-global and lazily initialised under
+  * pthread_once. We pick the no-QPU constructor here because
+@@ -37,6 +39,7 @@ static void daedalus_ctx_init_once(void)
+ }
+ 
+ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
++void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+ 
+ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
+ {
+@@ -47,3 +50,13 @@ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
+     daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
+                                         block, 1, &meta);
+ }
++
++void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
++{
++    static const daedalus_h264_block_meta meta = { .dst_off = 0 };
++
++    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
++
++    daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
++                                        block, 1, &meta);
++}
+diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
+index b993df2..741e551 100644
+--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
++++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
+@@ -79,6 +79,7 @@ void ff_h264_idct_add8_neon(uint8_t **dest, const int *block_offset,
+                             const uint8_t nnzc[15 * 8]);
+ 
+ void ff_h264_idct8_add_neon(uint8_t *dst, int16_t *block, int stride);
++void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+ void ff_h264_idct8_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
+ void ff_h264_idct8_add4_neon(uint8_t *dst, const int *block_offset,
+                              int16_t *block, int stride,
+@@ -146,7 +147,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
+         c->idct_add16intra = ff_h264_idct_add16intra_neon;
+         if (chroma_format_idc <= 1)
+             c->idct_add8   = ff_h264_idct_add8_neon;
+-        c->idct8_add       = ff_h264_idct8_add_neon;
++        c->idct8_add       = ff_h264_idct8_add_daedalus;
+         c->idct8_dc_add    = ff_h264_idct8_dc_add_neon;
+         c->idct8_add4      = ff_h264_idct8_add4_neon;
+     } else if (have_neon(cpu_flags) && bit_depth == 10) {
+-- 
+2.47.3
+
diff --git a/arch/ffmpeg-v4l2-request-fourier/PKGBUILD b/arch/ffmpeg-v4l2-request-fourier/PKGBUILD
index e74f2d591..6670cfadd 100644
--- a/arch/ffmpeg-v4l2-request-fourier/PKGBUILD
+++ b/arch/ffmpeg-v4l2-request-fourier/PKGBUILD
@@ -24,7 +24,7 @@ _srcname=FFmpeg
 _version='8.1'
 _commit='b57fbbe50c9b2656fad86a1a7eeabfd2b2a50935' # v4l2-request-n8.1 tip 2026-04-24
 pkgver=8.1.r123329.b57fbbe
-pkgrel=6 # pkgrel=6 — H.264 IDCT 4x4 daedalus-fourier substitution (2026-05-21)
+pkgrel=7 # pkgrel=7 — H.264 IDCT 8x8 daedalus-fourier substitution (cycle 7, 2026-05-22)
 epoch=2
 
 # daedalus-fourier pin — first kernel substitution in libavcodec
@@ -90,8 +90,9 @@ source=("git+https://github.com/Kwiboo/FFmpeg.git#commit=${_commit}"
         "daedalus-fourier-${_daedalus_fourier_commit}.tar.gz::https://git.reauktion.de/marfrit/daedalus-fourier/archive/${_daedalus_fourier_commit}.tar.gz"
         '0001-libudev-bypass-fallback.patch'
         '0002-nv15-to-p010-unpack.patch'
-        '0003-h264-idct4-daedalus-fourier.patch')
-sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
+        '0003-h264-idct4-daedalus-fourier.patch'
+        '0004-h264-idct8-daedalus-fourier.patch')
+sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
 
 pkgver() {
   cd "${_srcname}"
@@ -105,6 +106,7 @@ prepare() {
   patch -Np1 -i "${srcdir}/0001-libudev-bypass-fallback.patch"
   patch -Np1 -i "${srcdir}/0002-nv15-to-p010-unpack.patch"
   patch -Np1 -i "${srcdir}/0003-h264-idct4-daedalus-fourier.patch"
+  patch -Np1 -i "${srcdir}/0004-h264-idct8-daedalus-fourier.patch"
 }
 
 build() {
diff --git a/debian/ffmpeg-v4l2-request-fourier/0004-h264-idct8-daedalus-fourier.patch b/debian/ffmpeg-v4l2-request-fourier/0004-h264-idct8-daedalus-fourier.patch
new file mode 100644
index 000000000..fd358f4cd
--- /dev/null
+++ b/debian/ffmpeg-v4l2-request-fourier/0004-h264-idct8-daedalus-fourier.patch
@@ -0,0 +1,107 @@
+From 1b286ddb4efaca26ec9b9e290e989fec77dc1c77 Mon Sep 17 00:00:00 2001
+From: Markus Fritsche
+Date: Fri, 22 May 2026 10:18:21 +0200
+Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 8x8 IDCT through
+ daedalus-fourier
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+H264DSPContext.idct8_add (called per 8x8 block from the High-profile
+intra-8x8-DCT decode path in h264_mb.c) now dispatches through
+daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.
+
+The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8x8)
+the recipe is CPU NEON, so this is effectively a NEON-to-NEON
+substitution layered on top of the cycle-6 IDCT 4x4 wiring. Same
+pthread_once global context, same destructive-zero semantics; FFmpeg
+column-major 8x8 storage block[r + 8*c] matches daedalus's convention.
+
+Bulk path c->idct8_add4 (used for inter 8x8-DCT macroblocks) remains
+on the in-tree NEON .S code and will be batched through
+daedalus_recipe_dispatch_h264_idct8 with n_blocks>1 in a follow-up.
+
+Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
+green).
+
+Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 7.
+---
+ libavcodec/aarch64/h264_idct_daedalus.c   | 29 ++++++++++++++++-------
+ libavcodec/aarch64/h264dsp_init_aarch64.c |  3 ++-
+ 2 files changed, 23 insertions(+), 9 deletions(-)
+
+diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
+index 538d223..cbb98af 100644
+--- a/libavcodec/aarch64/h264_idct_daedalus.c
++++ b/libavcodec/aarch64/h264_idct_daedalus.c
+@@ -1,14 +1,16 @@
+ /*
+- * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
++ * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
+  *
+- * Routes H264DSPContext.idct_add through
+- * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
+- * The recipe layer picks the substrate (CPU NEON by default for
+- * cycle 6; future cycles may dispatch to V3D opportunistically).
++ * Routes H264DSPContext.idct_add  → daedalus_recipe_dispatch_h264_idct4
++ *        H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
++ * instead of the in-tree ff_h264_idct{,8}_add_neon assembly. The
++ * recipe layer picks the substrate (CPU NEON by default for cycles
++ * 6 + 7; future cycles may dispatch to V3D opportunistically).
+  *
+- * FFmpeg's 4x4 block memory layout matches daedalus's column-major
+- * convention: block[r + 4*c] = coefficient at (row r, col c). Both
+- * sides destructively zero the block after the transform.
++ * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
++ * column-major convention: block[r + N*c] = coefficient at
++ * (row r, col c) for N ∈ {4, 8}. Both sides destructively zero the
++ * block after the transform.
+  *
+  * The library context is process-global and lazily initialised under
+  * pthread_once. We pick the no-QPU constructor here because
+@@ -37,6 +39,7 @@ static void daedalus_ctx_init_once(void)
+ }
+ 
+ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
++void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+ 
+ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
+ {
+@@ -47,3 +50,13 @@ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
+     daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
+                                         block, 1, &meta);
+ }
++
++void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
++{
++    static const daedalus_h264_block_meta meta = { .dst_off = 0 };
++
++    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
++
++    daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
++                                        block, 1, &meta);
++}
+diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
+index b993df2..741e551 100644
+--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
++++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
+@@ -79,6 +79,7 @@ void ff_h264_idct_add8_neon(uint8_t **dest, const int *block_offset,
+                             const uint8_t nnzc[15 * 8]);
+ 
+ void ff_h264_idct8_add_neon(uint8_t *dst, int16_t *block, int stride);
++void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+ void ff_h264_idct8_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
+ void ff_h264_idct8_add4_neon(uint8_t *dst, const int *block_offset,
+                              int16_t *block, int stride,
+@@ -146,7 +147,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
+         c->idct_add16intra = ff_h264_idct_add16intra_neon;
+         if (chroma_format_idc <= 1)
+             c->idct_add8   = ff_h264_idct_add8_neon;
+-        c->idct8_add       = ff_h264_idct8_add_neon;
++        c->idct8_add       = ff_h264_idct8_add_daedalus;
+         c->idct8_dc_add    = ff_h264_idct8_dc_add_neon;
+         c->idct8_add4      = ff_h264_idct8_add4_neon;
+     } else if (have_neon(cpu_flags) && bit_depth == 10) {
+-- 
+2.47.3
+
diff --git a/debian/ffmpeg-v4l2-request-fourier/build-deb.sh b/debian/ffmpeg-v4l2-request-fourier/build-deb.sh
index 7b52860e3..398b710ba 100755
--- a/debian/ffmpeg-v4l2-request-fourier/build-deb.sh
+++ b/debian/ffmpeg-v4l2-request-fourier/build-deb.sh
@@ -33,13 +33,11 @@ FFMPEG_VERSION=8.1
 # epoch 2 matches Debian's stock ffmpeg (currently 7:7.1.x in trixie);
 # +rfourier suffix to avoid colliding with upstream/Debian rebuilds.
 PKGVER=2:${FFMPEG_VERSION}+rfourier+gb57fbbe
-PKGREL=6 # pkgrel=6 — drop --enable-libxml2 to avoid runner/target libxml2
-         # SOVERSION skew (runner has libxml2 ≥ 2.14 = SONAME 16; trixie
-         # has 2.12 = SONAME 2; -5 .deb dlopens libavformat → fails on
-         # "libxml2.so.16: cannot open shared object"). Neither the
-         # daedalus-v4l2 daemon (direct AVPacket feed) nor mpv-fourier
-         # nor firefox-fourier consumers need FFmpeg's DASH demuxer.
-         # (2026-05-21)
+PKGREL=7 # pkgrel=7 — H.264 IDCT 8x8 daedalus-fourier substitution
+         # (cycle 7). Stacks on top of cycle-6 IDCT 4x4 (PR #76) and
+         # the libxml2-drop ABI-skew workaround (PR #78). Wires
+         # H264DSPContext.idct8_add through
+         # daedalus_recipe_dispatch_h264_idct8. (2026-05-22)
 
 # daedalus-fourier pin — first kernel substitution in libavcodec (cycle 6
 # H.264 IDCT 4x4). Same SHA as the daedalus-v4l2 daemon already ships
@@ -69,6 +67,7 @@ fi
 patch -Np1 -i "$HERE/0001-libudev-bypass-fallback.patch"
 patch -Np1 -i "$HERE/0002-nv15-to-p010-unpack.patch"
 patch -Np1 -i "$HERE/0003-h264-idct4-daedalus-fourier.patch"
+patch -Np1 -i "$HERE/0004-h264-idct8-daedalus-fourier.patch"
 
 # --- daedalus-fourier: fetch + build static .a with PIC, install to a
 # per-build prefix; libavcodec.so links it into the shared object so
diff --git a/debian/ffmpeg-v4l2-request-fourier/debian/changelog b/debian/ffmpeg-v4l2-request-fourier/debian/changelog
index 6af7309dc..977a692c8 100644
--- a/debian/ffmpeg-v4l2-request-fourier/debian/changelog
+++ b/debian/ffmpeg-v4l2-request-fourier/debian/changelog
@@ -1,3 +1,21 @@
+ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-7) bookworm trixie; urgency=medium
+
+  * Add 0004-h264-idct8-daedalus-fourier.patch — H264DSPContext.idct8_add
+    (per-block 8x8 IDCT, called from the High-profile intra-8x8-DCT
+    macroblock path in libavcodec/h264_mb.c) now dispatches through
+    daedalus_recipe_dispatch_h264_idct8 instead of
+    ff_h264_idct8_add_neon. Cycle 7 of the daedalus-v4l2#11 step 2
+    substitution arc — NEON-by-recipe, same pthread_once context the
+    cycle-6 IDCT 4x4 shim already owns.
+  * Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
+    green; FFmpeg 8x8 block storage block[r + 8*c] matches daedalus
+    column-major convention).
+  * Bulk c->idct8_add4 (inter 8x8-DCT macroblocks) stays on the
+    in-tree NEON .S code; batched substitution lands later.
+  * No SONAME change, no Depends change.
+
+ -- Markus Fritsche Fri, 22 May 2026 10:30:00 +0000
+
 ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-6) bookworm trixie; urgency=medium
 
   * Drop --enable-libxml2 + libxml2 Depends — the Gitea