ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 8×8 → daedalus-fourier

Cycle 7 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11
step 2).  H264DSPContext.idct8_add — called per 8×8 block from the
High-profile intra-8×8-DCT decode path in libavcodec/h264_mb.c — now
dispatches through daedalus_recipe_dispatch_h264_idct8 instead of
ff_h264_idct8_add_neon.

## What

- Add 0004-h264-idct8-daedalus-fourier.patch (in both arch/ and
  debian/ffmpeg-v4l2-request-fourier/).  It extends
  libavcodec/aarch64/h264_idct_daedalus.c (introduced by 0003) with
  ff_h264_idct8_add_daedalus, a shim that calls
  daedalus_recipe_dispatch_h264_idct8, and patches
  libavcodec/aarch64/h264dsp_init_aarch64.c to point c->idct8_add at
  the new shim.
- arch/PKGBUILD + debian/build-deb.sh: append the new patch to the
  patch list (in the PKGBUILD also to source= and sha256sums=); bump
  pkgrel/PKGREL to 7.
- No new build-deps, no Depends change, and no daedalus-fourier pin
  bump: the d87239d pin already exposes daedalus_recipe_dispatch_h264_idct8.

## Why

The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8×8)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution layered on top of cycle 6.  Cycle 6 was validated in
production on higgs (YouTube in Firefox): 3040 frames decoded
cleanly at avg_decode_us=3388, with no regression against the
pre-substitution baseline of ~4 ms.  Cycle 7 reuses the same shim's
pthread_once-initialised global context.
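The shared-context arrangement, where the 4×4 and 8×8 shims funnel through one pthread_once guard, can be sketched in standalone C. demo_ctx, ctx_init_once, get_ctx, idct4_shim, idct8_shim and run_workers are hypothetical names for illustration only; the real shim calls a daedalus-fourier constructor that this commit does not show, and the dispatch calls are elided.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the library context; the real daedalus-fourier
 * context type and constructor are not part of this commit. */
typedef struct demo_ctx {
    int ready;
} demo_ctx;

static demo_ctx g_ctx_storage;
static demo_ctx *g_ctx;                          /* process-global context */
static pthread_once_t g_ctx_once = PTHREAD_ONCE_INIT;
static int g_init_calls;                         /* constructor runs, for illustration */

/* Runs exactly once per process, whichever shim gets there first. */
static void ctx_init_once(void)
{
    g_ctx_storage.ready = 1;
    g_ctx = &g_ctx_storage;
    g_init_calls++;
}

static demo_ctx *get_ctx(void)
{
    pthread_once(&g_ctx_once, ctx_init_once);
    return g_ctx;
}

/* Both shims share one once-guard, so the 8x8 path reuses the context the
 * 4x4 path created (or vice versa). The actual dispatch is elided. */
static void idct4_shim(void) { (void)get_ctx(); }
static void idct8_shim(void) { (void)get_ctx(); }

static void *worker(void *arg)
{
    (void)arg;
    idct8_shim();
    idct4_shim();
    return NULL;
}

/* Call both shims from up to 16 threads at once; returns 0 if all started. */
static int run_workers(int n)
{
    pthread_t tids[16];
    int started = 0;

    if (n > 16)
        n = 16;
    while (started < n && pthread_create(&tids[started], NULL, worker, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == n ? 0 : -1;
}
```

However many threads race into the shims, pthread_once runs ctx_init_once exactly once and every other caller waits until it has finished, so after run_workers(8) the constructor count is still 1.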

Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
green).  FFmpeg's 8×8 block storage, block[r + 8*c] = coefficient at
(row r, col c), matches daedalus's column-major convention.
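For reference, the operation both implementations have to agree on can be written as a scalar C sketch: the 8-point butterfly from the H.264 specification's residual 8×8 transformation process, applied horizontally to each row and then vertically to each column, followed by (x + 32) >> 6, add-to-prediction with 8-bit clipping, and the destructive zero. idct8_add_ref and idct8_1d are illustrative names; this is neither the daedalus-fourier kernel nor FFmpeg's NEON code, and it assumes a conforming stream (intermediates within 16 bits) and an arithmetic right shift on negative ints, as GCC and Clang provide on aarch64.

```c
#include <stdint.h>
#include <string.h>

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* One 8-point H.264 inverse transform over in[0], in[step], ..., in[7*step]. */
static void idct8_1d(const int *in, int step, int out[8])
{
    const int d0 = in[0 * step], d1 = in[1 * step], d2 = in[2 * step], d3 = in[3 * step];
    const int d4 = in[4 * step], d5 = in[5 * step], d6 = in[6 * step], d7 = in[7 * step];

    /* Even part. */
    const int a0 = d0 + d4, a2 = d0 - d4;
    const int a4 = (d2 >> 1) - d6, a6 = d2 + (d6 >> 1);
    const int b0 = a0 + a6, b2 = a2 + a4, b4 = a2 - a4, b6 = a0 - a6;

    /* Odd part. */
    const int a1 = -d3 + d5 - d7 - (d7 >> 1);
    const int a3 =  d1 + d7 - d3 - (d3 >> 1);
    const int a5 = -d1 + d7 + d5 + (d5 >> 1);
    const int a7 =  d3 + d5 + d1 + (d1 >> 1);
    const int b1 = a1 + (a7 >> 2), b3 = a3 + (a5 >> 2);
    const int b5 = (a3 >> 2) - a5, b7 = a7 - (a1 >> 2);

    out[0] = b0 + b7; out[7] = b0 - b7;
    out[1] = b2 + b5; out[6] = b2 - b5;
    out[2] = b4 + b3; out[5] = b4 - b3;
    out[3] = b6 + b1; out[4] = b6 - b1;
}

/* Reference 8x8 IDCT + add. block is column-major: block[r + 8*c] holds the
 * coefficient at (row r, col c). The block is zeroed afterwards. */
static void idct8_add_ref(uint8_t *dst, int16_t *block, int stride)
{
    int coef[64], out[8];

    for (int i = 0; i < 64; i++)
        coef[i] = block[i];
    coef[0] += 32;   /* DC bias reaches every output, folding in the (x + 32) >> 6 rounding */

    /* Pass 1: horizontal transform of each row r (elements r + 8*c). */
    for (int r = 0; r < 8; r++) {
        idct8_1d(coef + r, 8, out);
        for (int c = 0; c < 8; c++)
            coef[r + 8 * c] = out[c];
    }

    /* Pass 2: vertical transform of each column c, then scale, add and clip. */
    for (int c = 0; c < 8; c++) {
        idct8_1d(coef + 8 * c, 1, out);
        for (int r = 0; r < 8; r++)
            dst[r * stride + c] = clip_u8(dst[r * stride + c] + (out[r] >> 6));
    }

    /* Destructive zero: leave the coefficient block cleared. */
    memset(block, 0, 64 * sizeof(*block));
}
```

With only block[0] = 64 set, every pixel rises by (64 + 32) >> 6 = 1. Setting block[8] instead (row 0, col 1) produces a left-to-right ramp along each row, which makes a row/column layout mismatch easy to spot.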

## Scope NOT covered (deferred)

- Bulk c->idct8_add4 (inter 8×8-DCT macroblocks) stays on the
  in-tree NEON .S code; batched substitution with n_blocks>1 lands
  later alongside the cycle-6 bulk-paths work.
- High-bit-depth (10-bit) path untouched.
- Cycles 8/9 — separate PRs.
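One possible shape for the deferred idct8_add4 batching is to gather the non-empty 8×8 blocks of a macroblock and hand them over in a single call. Everything beyond FFmpeg's argument list is an assumption here: batch_meta, idct8_batch_fn, idct8_add4_batched and recording_backend are hypothetical, the 4-entry nnz array simplifies FFmpeg's nnzc[scan8[...]] lookup, and the real n_blocks>1 interface of daedalus_recipe_dispatch_h264_idct8 may look quite different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block descriptor for a batched call. */
typedef struct batch_meta {
    ptrdiff_t dst_off;   /* offset of the block's top-left pixel from dst */
    int16_t  *coef;      /* this block's 64 coefficients */
} batch_meta;

/* Hypothetical batched back end; not a real daedalus-fourier signature. */
typedef void (*idct8_batch_fn)(uint8_t *dst, int stride,
                               const batch_meta *meta, int n_blocks);

/* Gather the non-empty 8x8 luma blocks of one macroblock (4x4-block indices
 * 0, 4, 8, 12 in block_offset) and dispatch them in one call.
 * Returns the number of blocks handed to the back end. */
static int idct8_add4_batched(uint8_t *dst, const int *block_offset,
                              int16_t *block, int stride,
                              const uint8_t nnz[4], idct8_batch_fn fn)
{
    batch_meta meta[4];
    int n = 0;

    for (int k = 0; k < 4; k++) {
        if (!nnz[k])
            continue;                       /* no coefficients: nothing to add */
        meta[n].dst_off = block_offset[4 * k];
        meta[n].coef    = block + 64 * k;   /* 8x8 blocks are 64 coefficients apart */
        n++;
    }
    if (n > 0)
        fn(dst, stride, meta, n);
    return n;
}

/* Demo back end that only records what it received. */
static int g_last_n;
static ptrdiff_t g_last_off[4];
static int16_t *g_last_coef[4];

static void recording_backend(uint8_t *dst, int stride,
                              const batch_meta *meta, int n_blocks)
{
    (void)dst;
    (void)stride;
    g_last_n = n_blocks;
    for (int i = 0; i < n_blocks; i++) {
        g_last_off[i]  = meta[i].dst_off;
        g_last_coef[i] = meta[i].coef;
    }
}
```

The sketch does not keep a separate DC-only path: for a block whose only nonzero coefficient is the DC term, the full 8×8 transform adds the same (DC + 32) >> 6 to every pixel that a dedicated DC add would, because the DC term passes unchanged through both 1-D stages. Whether dropping that fast path pays off is a measurement question.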

## SONAME

Unchanged.  libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.

## Refs

- reauktion/daedalus-v4l2#11 (substitution arc)
- marfrit-packages PR #76 (cycle 6 IDCT 4×4)
- marfrit-packages PR #78 (libxml2 ABI-skew workaround)
- marfrit/daedalus-fourier cycle 7 close (H.264 IDCT 8×8 NEON green)
2026-05-22 10:20:27 +02:00
parent 360e8eb6bf
commit 493c762967
5 changed files with 243 additions and 10 deletions
@@ -0,0 +1,107 @@
From 1b286ddb4efaca26ec9b9e290e989fec77dc1c77 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Fri, 22 May 2026 10:18:21 +0200
Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 8x8 IDCT through
daedalus-fourier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

H264DSPContext.idct8_add (called per 8x8 block from the High-profile
intra-8x8-DCT decode path in h264_mb.c) now dispatches through
daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.

The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8x8)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution layered on top of the cycle-6 IDCT 4x4 wiring. Same
pthread_once global context, same destructive-zero semantics; FFmpeg
column-major 8x8 storage block[r + 8*c] matches daedalus's convention.

Bulk path c->idct8_add4 (used for inter 8x8-DCT macroblocks) remains
on the in-tree NEON .S code and will be batched through
daedalus_recipe_dispatch_h264_idct8 with n_blocks>1 in a follow-up.

Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
green).

Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 7.
---
libavcodec/aarch64/h264_idct_daedalus.c | 29 ++++++++++++++++-------
libavcodec/aarch64/h264dsp_init_aarch64.c | 3 ++-
2 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
index 538d223..cbb98af 100644
--- a/libavcodec/aarch64/h264_idct_daedalus.c
+++ b/libavcodec/aarch64/h264_idct_daedalus.c
@@ -1,14 +1,16 @@
/*
- * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
+ * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
*
- * Routes H264DSPContext.idct_add through
- * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
- * The recipe layer picks the substrate (CPU NEON by default for
- * cycle 6; future cycles may dispatch to V3D opportunistically).
+ * Routes H264DSPContext.idct_add → daedalus_recipe_dispatch_h264_idct4
+ * H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
+ * instead of the in-tree ff_h264_idct{,8}_add_neon assembly. The
+ * recipe layer picks the substrate (CPU NEON by default for cycles
+ * 6 + 7; future cycles may dispatch to V3D opportunistically).
*
- * FFmpeg's 4x4 block memory layout matches daedalus's column-major
- * convention: block[r + 4*c] = coefficient at (row r, col c). Both
- * sides destructively zero the block after the transform.
+ * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
+ * column-major convention: block[r + N*c] = coefficient at
+ * (row r, col c) for N ∈ {4, 8}. Both sides destructively zero the
+ * block after the transform.
*
* The library context is process-global and lazily initialised under
* pthread_once. We pick the no-QPU constructor here because
@@ -37,6 +39,7 @@ static void daedalus_ctx_init_once(void)
}
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
{
@@ -47,3 +50,13 @@ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
block, 1, &meta);
}
+
+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
+{
+ static const daedalus_h264_block_meta meta = { .dst_off = 0 };
+
+ pthread_once(&g_dctx_once, daedalus_ctx_init_once);
+
+ daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
+ block, 1, &meta);
+}
diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
index b993df2..741e551 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -79,6 +79,7 @@ void ff_h264_idct_add8_neon(uint8_t **dest, const int *block_offset,
const uint8_t nnzc[15 * 8]);
void ff_h264_idct8_add_neon(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct8_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct8_add4_neon(uint8_t *dst, const int *block_offset,
int16_t *block, int stride,
@@ -146,7 +147,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
c->idct_add16intra = ff_h264_idct_add16intra_neon;
if (chroma_format_idc <= 1)
c->idct_add8 = ff_h264_idct_add8_neon;
- c->idct8_add = ff_h264_idct8_add_neon;
+ c->idct8_add = ff_h264_idct8_add_daedalus;
c->idct8_dc_add = ff_h264_idct8_dc_add_neon;
c->idct8_add4 = ff_h264_idct8_add4_neon;
} else if (have_neon(cpu_flags) && bit_depth == 10) {
--
2.47.3
+5 -3
@@ -24,7 +24,7 @@ _srcname=FFmpeg
_version='8.1'
_commit='b57fbbe50c9b2656fad86a1a7eeabfd2b2a50935' # v4l2-request-n8.1 tip 2026-04-24
pkgver=8.1.r123329.b57fbbe
-pkgrel=6 # pkgrel=6 — H.264 IDCT 4x4 daedalus-fourier substitution (2026-05-21)
+pkgrel=7 # pkgrel=7 — H.264 IDCT 8x8 daedalus-fourier substitution (cycle 7, 2026-05-22)
epoch=2
# daedalus-fourier pin — first kernel substitution in libavcodec
@@ -90,8 +90,9 @@ source=("git+https://github.com/Kwiboo/FFmpeg.git#commit=${_commit}"
"daedalus-fourier-${_daedalus_fourier_commit}.tar.gz::https://git.reauktion.de/marfrit/daedalus-fourier/archive/${_daedalus_fourier_commit}.tar.gz"
'0001-libudev-bypass-fallback.patch'
'0002-nv15-to-p010-unpack.patch'
-'0003-h264-idct4-daedalus-fourier.patch')
-sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
+'0003-h264-idct4-daedalus-fourier.patch'
+'0004-h264-idct8-daedalus-fourier.patch')
+sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
pkgver() {
cd "${_srcname}"
@@ -105,6 +106,7 @@ prepare() {
patch -Np1 -i "${srcdir}/0001-libudev-bypass-fallback.patch"
patch -Np1 -i "${srcdir}/0002-nv15-to-p010-unpack.patch"
patch -Np1 -i "${srcdir}/0003-h264-idct4-daedalus-fourier.patch"
+patch -Np1 -i "${srcdir}/0004-h264-idct8-daedalus-fourier.patch"
}
build() {