claude-noether 9c70ffffe7 ffmpeg-v4l2-request-fourier: flip libavcodec daedalus ctx no_qpu → qpu-capable (0014)
(Renumbered from 0013 — PR #102 landed 0013-h264-deblock-chroma-intra
while this PR was open, so the next free slot is 0014.)

Patches 0003 (IDCT 4x4) and 0007 (qpel mc20) created the libavcodec.so
process-global daedalus_ctx via daedalus_ctx_create_no_qpu().  Rationale
at the time: cycle 6/9 had only CPU NEON paths, so a QPU-capable ctx
would have meant pointless Vulkan init in every host process.

Two things changed since:

  1. Every H.264 hot-path primitive now has a V3D7 compute shader.
     IDCT 4x4/8x8 + 8 deblock variants (luma+chroma × V+H × inter+intra)
     + 30 qpel positions.  See daedalus-fourier PRs #28-#35.

  2. Dispatch overhead has been hammered down — buffer pool in
     v3d_runner + persistent command buffer.  daedalus-fourier PR #36
     bench on hertz (Pi 5 V3D 7.1, 30 iters x 5 warmup):

       1080p worst-case sum (IDCT4 + deblock luma + qpel mc22):
         CPU NEON only:  5.57 ms
         QPU only:       1.30 ms   (CPU/QPU sum ratio = 4.30x)
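Item 2 credits a buffer pool in v3d_runner with part of the overhead reduction. Below is a minimal sketch of the idea. It assumes a size-checked free list over fixed slots. buf_pool, pool_acquire, pool_release and pool_destroy are hypothetical names. Heap memory stands in for whatever buffers v3d_runner actually pools, and this is not its real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch: recycle blocks so repeated dispatches reuse buffers
 * instead of allocating on every call. Heap memory stands in for the
 * device buffers a real runner would pool. */
#define POOL_SLOTS 8

typedef struct {
    void  *ptr[POOL_SLOTS];
    size_t size[POOL_SLOTS];
    int    in_use[POOL_SLOTS];
    int    n;            /* slots populated so far */
    int    allocations;  /* fresh allocations performed (measures reuse) */
} buf_pool;

/* Return a free block of at least `size` bytes, allocating only on a miss. */
static void *pool_acquire(buf_pool *p, size_t size)
{
    for (int i = 0; i < p->n; i++) {
        if (!p->in_use[i] && p->size[i] >= size) {
            p->in_use[i] = 1;
            return p->ptr[i];
        }
    }
    if (p->n == POOL_SLOTS)
        return NULL;               /* pool exhausted */
    void *mem = malloc(size);
    if (!mem)
        return NULL;
    p->ptr[p->n] = mem;
    p->size[p->n] = size;
    p->in_use[p->n] = 1;
    p->n++;
    p->allocations++;
    return mem;
}

/* Mark a block free again; it stays allocated for the next acquire. */
static void pool_release(buf_pool *p, void *mem)
{
    for (int i = 0; i < p->n; i++) {
        if (p->ptr[i] == mem) {
            p->in_use[i] = 0;
            return;
        }
    }
    assert(!"pool_release: pointer not owned by pool");
}

/* Free every block at teardown. */
static void pool_destroy(buf_pool *p)
{
    for (int i = 0; i < p->n; i++)
        free(p->ptr[i]);
    p->n = 0;
}
```

In a steady-state acquire, dispatch, release loop, this kind of pool allocates once per slot rather than once per dispatch.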

PR #10's CPU-4x-faster-than-QPU verdict (which justified the original
no_qpu ctx choice) is reversed by ~17x: the QPU goes from ~0.25x CPU
speed to 4.30x.
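The benchmark arithmetic can be checked with a few lines of C. From the rounded millisecond values, 5.57 / 1.30 is about 4.28. The quoted 4.30x was presumably computed from unrounded timings. Reading PR #10's verdict as "CPU about 4x faster", the old QPU speed-up was 0.25, so 4.30 / 0.25 is about 17.2. The helper names are illustrative only.

```c
#include <assert.h>

/* Timings quoted above: 1080p worst-case sum, in milliseconds. */
static const double cpu_neon_ms = 5.57;
static const double qpu_ms      = 1.30;

/* How many times faster the QPU path is than the CPU NEON path. */
static double qpu_speedup(double cpu_ms, double gpu_ms)
{
    assert(gpu_ms > 0.0);
    return cpu_ms / gpu_ms;
}

/* PR #10 found the CPU about `old_cpu_advantage` times faster, i.e. an old
 * QPU speed-up of 1/old_cpu_advantage; return how far the speed-up moved. */
static double reversal_factor(double new_speedup, double old_cpu_advantage)
{
    double old_speedup = 1.0 / old_cpu_advantage;
    return new_speedup / old_speedup;
}
```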

This commit adds 0014-h264-ctx-qpu-capable.patch which flips both H.264
TUs (h264_idct_daedalus.c, h264_qpel_daedalus.c) from
daedalus_ctx_create_no_qpu() to daedalus_ctx_create().
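Per TU, the change amounts to swapping one constructor call. The sketch below shows what an accessor for the process-global ctx might look like. It assumes a pthread_once-guarded lazy init, since the patch's actual init mechanism is not described here. The daedalus_ctx layout, the stub constructor, init_ctx and get_ctx are hypothetical stand-ins, and the real daedalus_ctx_create() signature may differ.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the real ctx type from daedalus-fourier. */
typedef struct daedalus_ctx {
    int qpu_capable;   /* 1 if V3D compute dispatch is available */
} daedalus_ctx;

static daedalus_ctx ctx_storage;

/* Stub: pretend the Vulkan probe succeeded, as on a Pi 5. */
static daedalus_ctx *daedalus_ctx_create(void)
{
    ctx_storage.qpu_capable = 1;
    return &ctx_storage;
}

/* Process-global ctx for this TU, created exactly once on first use. */
static daedalus_ctx *g_ctx;
static pthread_once_t g_ctx_once = PTHREAD_ONCE_INIT;

static void init_ctx(void)
{
    /* Before 0014: g_ctx = daedalus_ctx_create_no_qpu(); */
    g_ctx = daedalus_ctx_create();
}

static daedalus_ctx *get_ctx(void)
{
    pthread_once(&g_ctx_once, init_ctx);
    assert(g_ctx != NULL);
    return g_ctx;
}
```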

daedalus_ctx_create() probes for a usable Vulkan device and falls back
to no_qpu mode if unavailable, so this is safe on hosts without V3D
(x86 build runners, Debian aarch64 builders without renderD, etc.).
Hosts WITH V3D (Pi 5 deployment targets) now route the H.264 hot-path
through V3D compute instead of CPU NEON.
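The probe-then-fall-back behaviour can be sketched as follows. This assumes the probe can be modelled as a yes/no callback. ctx_sketch, backend, vulkan_probe_fn and the two example probes are hypothetical. The real daedalus_ctx_create() does its own Vulkan device probing internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: where the H.264 hot-path dispatches are routed. */
typedef enum { BACKEND_CPU, BACKEND_V3D_COMPUTE } backend;

typedef struct {
    backend hot_path;   /* IDCT / deblock / qpel routing decision */
} ctx_sketch;

/* Stand-in for "is there a usable Vulkan device?". */
typedef bool (*vulkan_probe_fn)(void);

static ctx_sketch ctx_create_with_fallback(vulkan_probe_fn probe)
{
    assert(probe != NULL);
    ctx_sketch c;
    /* No usable device (x86 runner, builder without renderD):
     * behave like the old no_qpu ctx and stay on the CPU paths. */
    c.hot_path = probe() ? BACKEND_V3D_COMPUTE : BACKEND_CPU;
    return c;
}

/* Example probes standing in for a Pi 5 and a headless build runner. */
static bool probe_pi5_v3d(void)   { return true; }
static bool probe_no_render(void) { return false; }
```

Injecting the probe keeps the fallback path testable on hosts that have no V3D at all.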

Wired into both the Arch PKGBUILD (source[] + prepare()) and the Debian
build-deb.sh; pkgrel bumped 10 → 11 in both.

Refs reauktion/daedalus-fourier!36.
2026-05-25 21:18:18 +02:00