ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 8×8 → daedalus-fourier #85

Merged
marfrit merged 1 commit from claude-noether/marfrit-packages:noether/ffmpeg-fourier-idct8-daedalus into main 2026-05-22 08:32:15 +00:00

Cycle 7 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11 step 2). H264DSPContext.idct8_add — called per 8×8 block from the High-profile intra-8×8-DCT decode path in libavcodec/h264_mb.c — now dispatches through daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.

What

  • Add 0004-h264-idct8-daedalus-fourier.patch (in both arch/ and debian/ ffmpeg-v4l2-request-fourier/). Extends libavcodec/aarch64/h264_idct_daedalus.c (introduced by 0003) with ff_h264_idct8_add_daedalus and a daedalus_recipe_dispatch_h264_idct8 call; patches libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct8_add to the new shim.
  • arch/PKGBUILD + debian/build-deb.sh: append the new patch to the apply list; bump pkgrel/PKGREL to 7.
  • No new build-deps, no Depends change, no daedalus-fourier rev — the d87239d pin already exposes daedalus_recipe_dispatch_h264_idct8.
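The wiring described above can be sketched roughly as follows. This is a minimal illustration, not the contents of the 0004 patch: the struct is a one-field mock using the field name from this PR text, `mock_recipe_dispatch_h264_idct8` is a hypothetical stand-in (the real signature of `daedalus_recipe_dispatch_h264_idct8` is not shown here), and the 8-bit gate is an assumption drawn from the note that the high-bit-depth path is untouched.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock of the single DSP slot this PR rewires. The field name follows the
 * PR text; it is not copied from FFmpeg's header. */
typedef struct H264DSPContext {
    void (*idct8_add)(uint8_t *dst, int16_t *block, int stride);
} H264DSPContext;

/* HYPOTHETICAL stand-in for daedalus_recipe_dispatch_h264_idct8. The real
 * signature is an assumption; this stub only counts dispatched blocks. */
static int g_dispatch_calls;

static int mock_recipe_dispatch_h264_idct8(uint8_t *dst, int16_t *block,
                                           ptrdiff_t stride, int n_blocks)
{
    (void)dst;
    (void)block;
    (void)stride;
    g_dispatch_calls += n_blocks;
    return 0;
}

/* Shim with the DSP slot's signature: forwards exactly one 8x8 block. */
static void idct8_add_daedalus_sketch(uint8_t *dst, int16_t *block, int stride)
{
    mock_recipe_dispatch_h264_idct8(dst, block, stride, 1);
}

/* Init-time wiring in the spirit of h264dsp_init_aarch64.c. Only the 8-bit
 * slot is replaced (assumption); other depths keep whatever was set before. */
static void h264dsp_init_sketch(H264DSPContext *c, int bit_depth)
{
    if (bit_depth == 8)
        c->idct8_add = idct8_add_daedalus_sketch;
}
```

Keeping the shim's signature identical to the slot it replaces means callers in the decode path need no change; only the init function differs.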

Why

The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8×8) the recipe is CPU NEON, so this is effectively a NEON-to-NEON substitution layered on top of cycle 6.

Production validation of cycle 6 on higgs Firefox YouTube (post-PR #78): 3040 frames decoded cleanly with avg_decode_us=3388 (about 3.4 ms), against a pre-substitution baseline of roughly 4 ms, so no regression. Cycle 7 reuses the pthread_once-initialized context of the cycle-6 shim.
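For readers unfamiliar with the pattern, sharing one pthread_once context between two shims can look roughly like this. It is a sketch under assumptions: `ctx_init`, `ensure_ctx`, and both `*_sketch` entry points are hypothetical names, and the real shim's context setup is not shown in this PR.

```c
#include <pthread.h>
#include <stdint.h>

/* One process-wide guard shared by every shim entry point. */
static pthread_once_t g_ctx_once = PTHREAD_ONCE_INIT;
static int g_ctx_ready;  /* set by ctx_init on success */
static int g_init_runs;  /* instrumentation for this sketch only */

/* Placeholder for whatever one-time setup the real shim performs. */
static void ctx_init(void)
{
    g_init_runs++;
    g_ctx_ready = 1;
}

/* Safe to call from any thread; ctx_init runs at most once. */
static int ensure_ctx(void)
{
    pthread_once(&g_ctx_once, ctx_init);
    return g_ctx_ready;
}

/* Cycle-6-style entry point (4x4): makes sure the context exists first. */
static void idct4_add_sketch(uint8_t *dst, int16_t *block, int stride)
{
    (void)dst;
    (void)block;
    (void)stride;
    if (!ensure_ctx())
        return;
    /* dispatch to the recipe layer would go here */
}

/* Cycle-7-style entry point (8x8): reuses the same guard, no second init. */
static void idct8_add_sketch(uint8_t *dst, int16_t *block, int stride)
{
    (void)dst;
    (void)block;
    (void)stride;
    if (!ensure_ctx())
        return;
    /* dispatch to the recipe layer would go here */
}
```

Whichever entry point is hit first pays the initialization cost; every later call, from either function, only passes through the `pthread_once` check.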

Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7 green; FFmpeg 8×8 block storage block[r + 8*c] matches daedalus column-major convention).
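A bit-exactness check of this kind can be sketched as below. The harness and the two toy kernels are hypothetical and are not the daedalus-fourier test suite; the toy kernels only add a DC term, so they stand in for a real 8×8 IDCT. `coeff_at` spells out the `block[r + 8*c]` indexing quoted above. Comparing the coefficient buffer as well as the pixels assumes both implementations leave the block in the same state afterwards.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void (*idct8_add_fn)(uint8_t *dst, int16_t *block, int stride);

/* Coefficient at row r, column c under the block[r + 8*c] convention. */
static inline int16_t coeff_at(const int16_t *block, int r, int c)
{
    return block[r + 8 * c];
}

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Shared body of the toy kernels: add a DC-derived value plus a bias to
 * every pixel, then clear the coefficients. Not a real IDCT. */
static void toy_add(uint8_t *dst, int16_t *block, int stride, int bias)
{
    int dc = block[0] / 64 + bias;
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            dst[r * stride + c] = clip_u8(dst[r * stride + c] + dc);
    memset(block, 0, 64 * sizeof(*block));
}

static void toy_dc_add(uint8_t *dst, int16_t *block, int stride)
{
    toy_add(dst, block, stride, 0);
}

/* Deliberately wrong variant, used to show that the harness detects drift. */
static void toy_dc_add_biased(uint8_t *dst, int16_t *block, int stride)
{
    toy_add(dst, block, stride, 1);
}

/* Returns 1 if ref and test produce identical pixels and identical block
 * contents for `iters` pseudo-random inputs, 0 on the first mismatch. */
static int bit_exact(idct8_add_fn ref, idct8_add_fn test,
                     unsigned seed, int iters)
{
    enum { STRIDE = 16 }; /* wider than 8 so stray writes show up too */
    srand(seed);
    for (int i = 0; i < iters; i++) {
        int16_t blk_ref[64], blk_test[64];
        uint8_t dst_ref[8 * STRIDE], dst_test[8 * STRIDE];
        for (int k = 0; k < 64; k++)
            blk_ref[k] = (int16_t)(rand() % 1024 - 512);
        for (int k = 0; k < 8 * STRIDE; k++)
            dst_ref[k] = (uint8_t)(rand() & 0xff);
        memcpy(blk_test, blk_ref, sizeof(blk_ref));
        memcpy(dst_test, dst_ref, sizeof(dst_ref));
        ref(dst_ref, blk_ref, STRIDE);
        test(dst_test, blk_test, STRIDE);
        if (memcmp(dst_ref, dst_test, sizeof(dst_ref)) != 0 ||
            memcmp(blk_ref, blk_test, sizeof(blk_ref)) != 0)
            return 0;
    }
    return 1;
}
```

In a real run, `ref` would be the in-tree NEON routine and `test` the substituted shim; the padding columns beyond the 8 active pixels are compared as well, which catches writes outside the block.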

Scope NOT covered (deferred)

  • Bulk c->idct8_add4 (inter 8×8-DCT macroblocks) stays on the in-tree NEON .S code; batched substitution with n_blocks>1 lands later alongside the cycle-6 bulk-paths work.
  • High-bit-depth (10-bit) path untouched.
  • Cycles 8/9 — separate PRs.

SONAME

Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.

Refs

  • reauktion/daedalus-v4l2 issue #11 (substitution arc): https://git.reauktion.de/reauktion/daedalus-v4l2/issues/11
  • marfrit-packages PR #76 (cycle 6 IDCT 4×4): https://git.reauktion.de/marfrit/marfrit-packages/pulls/76
  • marfrit-packages PR #78 (libxml2 ABI-skew workaround): https://git.reauktion.de/marfrit/marfrit-packages/pulls/78

marfrit added 1 commit 2026-05-22 08:20:50 +00:00
Cycle 7 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11
step 2).  H264DSPContext.idct8_add — called per 8×8 block from the
High-profile intra-8×8-DCT decode path in libavcodec/h264_mb.c — now
dispatches through daedalus_recipe_dispatch_h264_idct8 instead of
ff_h264_idct8_add_neon.

## What

- Add 0004-h264-idct8-daedalus-fourier.patch (in both arch/ and debian/
  ffmpeg-v4l2-request-fourier/).  Extends libavcodec/aarch64/
  h264_idct_daedalus.c (introduced by 0003) with ff_h264_idct8_add_daedalus
  and a daedalus_recipe_dispatch_h264_idct8 call; patches
  libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct8_add to
  the new shim.
- arch/PKGBUILD + debian/build-deb.sh: append the new patch to the
  apply list; bump pkgrel/PKGREL to 7.
- No new build-deps, no Depends change, no daedalus-fourier rev — the
  d87239d pin already exposes daedalus_recipe_dispatch_h264_idct8.

## Why

The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8×8)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution layered on top of cycle 6.  Production validation of
cycle 6 on higgs Firefox YouTube: 3040 frames decoded cleanly,
avg_decode_us=3388 (no regression vs the pre-substitution ~4 ms
baseline).  Cycle 7 inherits the same shim's pthread_once context.

Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
green; FFmpeg 8×8 block storage block[r + 8*c] matches daedalus
column-major convention).

## Scope NOT covered (deferred)

- Bulk c->idct8_add4 (inter 8×8-DCT macroblocks) stays on the
  in-tree NEON .S code; batched substitution with n_blocks>1 lands
  later alongside the cycle-6 bulk-paths work.
- High-bit-depth (10-bit) path untouched.
- Cycles 8/9 — separate PRs.

## SONAME

Unchanged.  libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.

## Refs

- reauktion/daedalus-v4l2 issue #11 (substitution arc): reauktion/daedalus-v4l2#11
- marfrit-packages PR #76 (cycle 6 IDCT 4×4)
- marfrit-packages PR #78 (libxml2 ABI-skew workaround)
- marfrit/daedalus-fourier cycle 7 close (H.264 IDCT 8×8 NEON green)
marfrit merged commit 510a31622c into main 2026-05-22 08:32:15 +00:00
marfrit deleted branch noether/ffmpeg-fourier-idct8-daedalus 2026-05-22 08:32:16 +00:00
Reference: marfrit/marfrit-packages#85