ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 4×4 → daedalus-fourier #76

Merged
marfrit merged 3 commits from claude-noether/marfrit-packages:noether/ffmpeg-fourier-idct4-daedalus into main 2026-05-21 19:57:32 +00:00
Owner

First cycle of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11 step 2). H264DSPContext.idct_add — called per 4×4 block from the intra-4×4 decode path in libavcodec/h264_mb.c — now dispatches through daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.

What

  • Add 0003-h264-idct4-daedalus-fourier.patch (in both arch/ and debian/ ffmpeg-v4l2-request-fourier/). Creates libavcodec/aarch64/h264_idct_daedalus.c (ff_h264_idct_add_daedalus shim + lazy pthread_once context init via daedalus_ctx_create_no_qpu), patches libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct_add to the shim, adds the new .o to libavcodec/aarch64/Makefile.
  • arch/PKGBUILD + debian/build-deb.sh: fetch + build daedalus-fourier (pinned at d87239d — lockstep with the daedalus-v4l2 daemon's inline build) with -DCMAKE_POSITION_INDEPENDENT_CODE=ON into a per-build temp prefix, then pass --extra-cflags=-I.../include --extra-ldflags=-L.../lib --extra-libs="-ldaedalus_core -lvulkan -lpthread" to FFmpeg configure. daedalus_core.a is static-linked into libavcodec.so.62.
  • debian/control Depends gains libvulkan1 (daedalus_core PUBLIC-links Vulkan::Vulkan for the queryable QPU substrate; the no-QPU constructor still works at runtime but the loader needs libvulkan.so.1 present to dlopen libavcodec.so.62).
  • arch/PKGBUILD depends gains vulkan-icd-loader, makedepends gains cmake / ninja / vulkan-headers.

Why

The recipe layer picks the substrate; for cycle 6 (H.264 IDCT 4×4) the recipe is CPU NEON, so this is effectively a NEON-to-NEON substitution with one extra dispatch call and recipe-table lookup. The point of this first cycle isn't perf wins — it's plumbing. Once the path is wired and stable, follow-up patches batch through the bulk paths (idct_add16 / idct_add16intra / idct_add8) and stack cycles 7/8/9 (IDCT 8×8, luma-v deblock, qpel mc20).

Bit-exact against ff_h264_idct_add_neon (daedalus-fourier cycle 6 green; FFmpeg's 4×4 block storage matches daedalus's column-major convention).

Scope NOT covered (deferred to follow-ups)

  • Bulk paths (idct_add16 / idct_add16intra / idct_add8) — most IDCT 4×4 calls in real H.264 streams go through these, not the per-block c->idct_add path; intra-4×4-only macroblocks are a minority. Batched substitution lands in a follow-up.
  • High-bit-depth (10-bit) path — not touched; 8-bit only.
  • Cycles 7/8/9 — separate PRs.

Risk

Dispatch overhead per single-block call (one library entry + recipe-table lookup) means a possible per-call regression compared to the in-tree NEON .S. Intra-4×4 is a minority of MBs in typical streams, so net daemon decode_us impact should be in noise. The daedalus-v4l2 daemon's decoder stats summary (added in r39+g3bc0da1) is the measurement instrument.

SONAME

Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60. No daedalus-v4l2-dkms or daedalus-v4l2 bump required.

Refs

  • reauktion/daedalus-v4l2 issue #11 (substitution arc): reauktion/daedalus-v4l2#11
  • marfrit/daedalus-fourier cycle 6 close (H.264 IDCT 4×4 NEON green)
First cycle of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11 step 2). H264DSPContext.idct_add — called per 4×4 block from the intra-4×4 decode path in libavcodec/h264_mb.c — now dispatches through `daedalus_recipe_dispatch_h264_idct4` instead of `ff_h264_idct_add_neon`. ## What - Add `0003-h264-idct4-daedalus-fourier.patch` (in both arch/ and debian/ ffmpeg-v4l2-request-fourier/). Creates `libavcodec/aarch64/h264_idct_daedalus.c` (`ff_h264_idct_add_daedalus` shim + lazy `pthread_once` context init via `daedalus_ctx_create_no_qpu`), patches `libavcodec/aarch64/h264dsp_init_aarch64.c` to wire `c->idct_add` to the shim, adds the new .o to `libavcodec/aarch64/Makefile`. - `arch/PKGBUILD` + `debian/build-deb.sh`: fetch + build daedalus-fourier (pinned at `d87239d` — lockstep with the daedalus-v4l2 daemon's inline build) with `-DCMAKE_POSITION_INDEPENDENT_CODE=ON` into a per-build temp prefix, then pass `--extra-cflags=-I.../include --extra-ldflags=-L.../lib --extra-libs="-ldaedalus_core -lvulkan -lpthread"` to FFmpeg configure. `daedalus_core.a` is static-linked into `libavcodec.so.62`. - `debian/control` Depends gains `libvulkan1` (daedalus_core PUBLIC-links Vulkan::Vulkan for the queryable QPU substrate; the no-QPU constructor still works at runtime but the loader needs `libvulkan.so.1` present to dlopen libavcodec.so.62). - `arch/PKGBUILD` depends gains `vulkan-icd-loader`, makedepends gains `cmake` / `ninja` / `vulkan-headers`. ## Why The recipe layer picks the substrate; for cycle 6 (H.264 IDCT 4×4) the recipe is CPU NEON, so this is effectively a NEON-to-NEON substitution with one extra dispatch call and recipe-table lookup. The point of this first cycle isn't perf wins — it's plumbing. Once the path is wired and stable, follow-up patches batch through the bulk paths (idct_add16 / idct_add16intra / idct_add8) and stack cycles 7/8/9 (IDCT 8×8, luma-v deblock, qpel mc20). Bit-exact against `ff_h264_idct_add_neon` (daedalus-fourier cycle 6 green; FFmpeg's 4×4 block storage matches daedalus's column-major convention). ## Scope NOT covered (deferred to follow-ups) - Bulk paths (idct_add16 / idct_add16intra / idct_add8) — most IDCT 4×4 calls in real H.264 streams go through these, not the per-block `c->idct_add` path; intra-4×4-only macroblocks are a minority. Batched substitution lands in a follow-up. - High-bit-depth (10-bit) path — not touched; 8-bit only. - Cycles 7/8/9 — separate PRs. ## Risk Dispatch overhead per single-block call (one library entry + recipe-table lookup) means a possible per-call regression compared to the in-tree NEON `.S`. Intra-4×4 is a minority of MBs in typical streams, so net daemon `decode_us` impact should be in noise. The daedalus-v4l2 daemon's `decoder stats` summary (added in r39+g3bc0da1) is the measurement instrument. ## SONAME Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60. No daedalus-v4l2-dkms or daedalus-v4l2 bump required. ## Refs - reauktion/daedalus-v4l2 issue #11 (substitution arc): https://git.reauktion.de/reauktion/daedalus-v4l2/issues/11 - marfrit/daedalus-fourier cycle 6 close (H.264 IDCT 4×4 NEON green)
marfrit added 1 commit 2026-05-21 19:45:24 +00:00
First cycle of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11
step 2).  H264DSPContext.idct_add — called per 4×4 block from the
intra-4×4 decode path in libavcodec/h264_mb.c — now dispatches through
daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.

## What

- Add 0003-h264-idct4-daedalus-fourier.patch (in both arch/ and
  debian/ ffmpeg-v4l2-request-fourier/).  Creates
  libavcodec/aarch64/h264_idct_daedalus.c (ff_h264_idct_add_daedalus
  shim + lazy pthread_once context init via
  daedalus_ctx_create_no_qpu), patches
  libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct_add to
  the shim, adds the new .o to libavcodec/aarch64/Makefile.
- arch/PKGBUILD + debian/build-deb.sh: fetch + build
  daedalus-fourier (pinned at d87239d — lockstep with the
  daedalus-v4l2 daemon's inline build) with
  -DCMAKE_POSITION_INDEPENDENT_CODE=ON into a per-build temp prefix,
  then pass --extra-cflags=-I.../include --extra-ldflags=-L.../lib
  --extra-libs="-ldaedalus_core -lvulkan -lpthread" to FFmpeg
  configure.  daedalus_core.a is static-linked into libavcodec.so.62.
- debian/control Depends gains libvulkan1 (daedalus_core PUBLIC-links
  Vulkan::Vulkan for the queryable QPU substrate; the no-QPU
  constructor still works at runtime but the loader needs
  libvulkan.so.1 present to dlopen libavcodec.so.62).
- arch/PKGBUILD depends gains vulkan-icd-loader, makedepends gains
  cmake / ninja / vulkan-headers.

## Why

The recipe layer picks the substrate; for cycle 6 (H.264 IDCT 4×4)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution with one extra dispatch call and recipe-table lookup.
The point of this first cycle isn't perf wins — it's plumbing.  Once
the path is wired and stable, follow-up patches batch through the
bulk paths (idct_add16 / idct_add16intra / idct_add8) and stack
cycles 7/8/9 (IDCT 8×8, luma-v deblock, qpel mc20).

Bit-exact against ff_h264_idct_add_neon (daedalus-fourier cycle 6
green; FFmpeg's 4×4 block storage matches daedalus's column-major
convention).

## Scope NOT covered

- Bulk paths (idct_add16 / idct_add16intra / idct_add8) — most IDCT
  4×4 calls in real H.264 streams go through these, not the per-
  block c->idct_add path; intra-4×4-only macroblocks are a minority.
  Batched substitution lands in a follow-up.
- High-bit-depth (10-bit) path — not touched; 8-bit only.
- Cycles 7/8/9 — separate PRs.

## SONAME

Unchanged.  libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.
No daedalus-v4l2-dkms or daedalus-v4l2 bump required.

## Refs

- reauktion/daedalus-v4l2 issue #11 (substitution arc): reauktion/daedalus-v4l2#11
- marfrit/daedalus-fourier cycle 6 close (H.264 IDCT 4×4 NEON green)
claude-noether added 1 commit 2026-05-21 19:47:51 +00:00
The ffmpeg-v4l2-request-debian job now needs to build daedalus-fourier
into a temp prefix before configuring FFmpeg (substitution patch
0003-h264-idct4-daedalus-fourier.patch links libdaedalus_core.a into
libavcodec.so).  Mirror the build-deps the daedalus-v4l2-debian job
already declared for the same reason.

No-op on Arch — makepkg --syncdeps auto-installs cmake/ninja/
vulkan-headers from the PKGBUILD makedepends.
claude-noether added 1 commit 2026-05-21 19:56:50 +00:00
marfrit merged commit a536e20218 into main 2026-05-21 19:57:32 +00:00
marfrit deleted branch noether/ffmpeg-fourier-idct4-daedalus 2026-05-21 19:57:32 +00:00
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marfrit/marfrit-packages#76