ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 4×4 → daedalus-fourier #76
First cycle of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11 step 2). `H264DSPContext.idct_add`, called per 4×4 block from the intra-4×4 decode path in `libavcodec/h264_mb.c`, now dispatches through `daedalus_recipe_dispatch_h264_idct4` instead of `ff_h264_idct_add_neon`.

What
- `0003-h264-idct4-daedalus-fourier.patch` (in both arch/ and debian/ ffmpeg-v4l2-request-fourier/). Creates `libavcodec/aarch64/h264_idct_daedalus.c` (`ff_h264_idct_add_daedalus` shim + lazy `pthread_once` context init via `daedalus_ctx_create_no_qpu`), patches `libavcodec/aarch64/h264dsp_init_aarch64.c` to wire `c->idct_add` to the shim, and adds the new .o to `libavcodec/aarch64/Makefile`.
- `arch/PKGBUILD` + `debian/build-deb.sh`: fetch + build daedalus-fourier (pinned at `d87239d`, in lockstep with the daedalus-v4l2 daemon's inline build) with `-DCMAKE_POSITION_INDEPENDENT_CODE=ON` into a per-build temp prefix, then pass `--extra-cflags=-I.../include --extra-ldflags=-L.../lib --extra-libs="-ldaedalus_core -lvulkan -lpthread"` to FFmpeg configure. `daedalus_core.a` is static-linked into `libavcodec.so.62`.
- `debian/control` Depends gains `libvulkan1` (daedalus_core PUBLIC-links Vulkan::Vulkan for the queryable QPU substrate; the no-QPU constructor still works at runtime but the loader needs `libvulkan.so.1` present to dlopen libavcodec.so.62). `arch/PKGBUILD` depends gains `vulkan-icd-loader`, makedepends gains `cmake` / `ninja` / `vulkan-headers`.

Why
The recipe layer picks the substrate; for cycle 6 (H.264 IDCT 4×4) the recipe is CPU NEON, so this is effectively a NEON-to-NEON substitution with one extra dispatch call and recipe-table lookup. The point of this first cycle isn't perf wins — it's plumbing. Once the path is wired and stable, follow-up patches batch through the bulk paths (idct_add16 / idct_add16intra / idct_add8) and stack cycles 7/8/9 (IDCT 8×8, luma-v deblock, qpel mc20).
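For reviewers who want a picture of the extra hop described above, here is a minimal sketch of a shim with lazy `pthread_once` context init followed by a recipe-table lookup. It is not the patch itself: the `daedalus_ctx` layout, the signatures of `daedalus_ctx_create_no_qpu` and `daedalus_recipe_dispatch_h264_idct4`, the one-entry recipe table, and the placeholder kernel are all assumptions made for illustration, since this description does not spell them out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* ASSUMPTION: every daedalus_* definition below is a hypothetical stand-in.
 * The real daedalus-fourier types and signatures may differ. */
typedef void (*idct4_fn)(uint8_t *dst, int16_t *block, int stride);

enum { RECIPE_H264_IDCT4 = 0, RECIPE_COUNT };

typedef struct daedalus_ctx {
    idct4_fn recipe_table[RECIPE_COUNT]; /* one kernel per recipe slot */
} daedalus_ctx;

static int g_create_calls; /* how many times the constructor ran */
static int g_kernel_calls; /* how many blocks reached the kernel */

/* Placeholder for the CPU NEON kernel the recipe would select. */
static void cpu_idct4_placeholder(uint8_t *dst, int16_t *block, int stride)
{
    (void)dst; (void)block; (void)stride;
    g_kernel_calls++;
}

static daedalus_ctx *daedalus_ctx_create_no_qpu(void)
{
    daedalus_ctx *ctx = calloc(1, sizeof *ctx);
    if (ctx)
        ctx->recipe_table[RECIPE_H264_IDCT4] = cpu_idct4_placeholder;
    g_create_calls++;
    return ctx;
}

/* One library entry plus one table lookup per 4x4 block. */
static void daedalus_recipe_dispatch_h264_idct4(daedalus_ctx *ctx, uint8_t *dst,
                                                int16_t *block, int stride)
{
    ctx->recipe_table[RECIPE_H264_IDCT4](dst, block, stride);
}

static pthread_once_t g_once = PTHREAD_ONCE_INIT;
static daedalus_ctx *g_ctx;

static void init_ctx(void)
{
    g_ctx = daedalus_ctx_create_no_qpu();
}

/* Shim with an idct_add-style signature (assumed here). */
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
{
    /* pthread_once runs init_ctx exactly once, even with concurrent callers. */
    pthread_once(&g_once, init_ctx);
    assert(g_ctx != NULL); /* sketch only: no fallback path is modelled */
    daedalus_recipe_dispatch_h264_idct4(g_ctx, dst, block, stride);
}
```

Calling the shim for two blocks should construct the context once and reach the kernel twice. A production shim would presumably also need a defined behaviour when context creation fails; this description does not say how the patch handles that case.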
Bit-exact against `ff_h264_idct_add_neon` (daedalus-fourier cycle 6 green; FFmpeg's 4×4 block storage matches daedalus's column-major convention).

Scope NOT covered (deferred to follow-ups)
- This PR substitutes only the `c->idct_add` path; intra-4×4-only macroblocks are a minority. Batched substitution lands in a follow-up.

Risk
Dispatch overhead per single-block call (one library entry + recipe-table lookup) means a possible per-call regression compared to the in-tree NEON `.S`. Intra-4×4 is a minority of MBs in typical streams, so net daemon `decode_us` impact should be in noise. The daedalus-v4l2 daemon's `decoder stats` summary (added in r39+g3bc0da1) is the measurement instrument.

SONAME
Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60. No daedalus-v4l2-dkms or daedalus-v4l2 bump required.
Refs