(Renumbered from 0013 — PR #102 landed 0013-h264-deblock-chroma-intra
while this PR was open, so the next free slot is 0014.)
Patches 0003 (IDCT 4x4) and 0007 (qpel mc20) created the libavcodec.so
process-global daedalus_ctx via daedalus_ctx_create_no_qpu(). Rationale
at the time: cycle 6/9 had only CPU NEON paths, so a QPU-capable ctx
would have meant pointless Vulkan init in every host process.
Two things changed since:
1. Every H.264 hot-path primitive now has a V3D7 compute shader:
   IDCT 4x4/8x8 + 8 deblock variants (luma+chroma × V+H × inter+intra)
   + 30 qpel positions. See daedalus-fourier PRs #28-#35.
2. Dispatch overhead has been hammered down: buffer pool in
   v3d_runner + persistent command buffer. daedalus-fourier PR #36
   bench on hertz (Pi 5 V3D 7.1, 30 iters x 5 warmup):
     1080p worst-case sum (IDCT4 + deblock luma + qpel mc22):
       CPU NEON only:  5.57 ms
       QPU only:       1.30 ms  (CPU/QPU sum ratio = 4.30x)
PR #10's verdict that the CPU was 4x faster than the QPU (which justified
the original no_qpu ctx choice) is now reversed: the QPU is 4.30x faster,
a relative swing of ~17x (4 × 4.30).
This commit adds 0014-h264-ctx-qpu-capable.patch, which flips both H.264
TUs (h264_idct_daedalus.c, h264_qpel_daedalus.c) from
daedalus_ctx_create_no_qpu() to daedalus_ctx_create().
daedalus_ctx_create() probes for a usable Vulkan device and falls back
to no_qpu mode if unavailable, so this is safe on hosts without V3D
(x86 build runners, Debian aarch64 builders without renderD, etc.).
Hosts WITH V3D (Pi 5 deployment targets) now route the H.264 hot-path
through V3D compute instead of CPU NEON.
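Rough illustration of the probe-then-fall-back behaviour described above.
This is a sketch under assumptions, not the daedalus-fourier code: every
sketch_* name is hypothetical, the real daedalus_ctx layout and function
signatures are not shown in this commit, and the Vulkan device probe is
replaced by an injected callback so the example stays self-contained.

```c
/* Sketch only: all sketch_* names are hypothetical stand-ins,
 * not the daedalus-fourier API. */
#include <stdbool.h>
#include <stdlib.h>

typedef struct sketch_ctx {
    bool has_qpu; /* true when dispatch may go to V3D compute */
} sketch_ctx;

/* Stand-in for the Vulkan device probe (assumption: the real probe
 * reports whether a usable device exists). */
typedef bool (*sketch_probe_fn)(void);

/* CPU-only context: never touches the GPU stack. */
static sketch_ctx *sketch_ctx_create_no_qpu(void)
{
    /* calloc zeroes the struct, so has_qpu starts out false. */
    return calloc(1, sizeof(sketch_ctx));
}

/* QPU-capable context: starts as CPU-only and upgrades only when the
 * probe succeeds, so a host without V3D gets the same ctx as before. */
static sketch_ctx *sketch_ctx_create(sketch_probe_fn probe)
{
    sketch_ctx *ctx = sketch_ctx_create_no_qpu();
    if (ctx && probe && probe())
        ctx->has_qpu = true;
    return ctx;
}

/* Two canned probes: a host without a render node, and a Pi 5. */
static bool sketch_probe_absent(void)  { return false; }
static bool sketch_probe_present(void) { return true; }

/* Which backend a hot-path primitive would pick for this ctx. */
static const char *sketch_backend(const sketch_ctx *ctx)
{
    return (ctx && ctx->has_qpu) ? "qpu" : "neon";
}
```

With sketch_probe_absent the ctx reports the "neon" backend, i.e. the
same behaviour the no_qpu ctx gave; with sketch_probe_present it
reports "qpu".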
Wired into both the Arch PKGBUILD (source[] + prepare()) and the Debian
build-deb.sh; pkgrel bumped 10 → 11 in both.
Refs reauktion/daedalus-fourier!36.