forked from marfrit/marfrit-packages
ffmpeg-v4l2-request-fourier: substitute H.264 qpel mc20 → daedalus-fourier
H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma horizontal half-pel, 6-tap "put" — the canonical representative of the H.264 luma motion-compensation family) now dispatches through daedalus_recipe_dispatch_h264_qpel_mc20 instead of ff_put_h264_qpel8_mc20_neon.

Cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes the 4-cycle libavcodec.so substitution sequence:

  cycle 6 (PR #76)  H.264 IDCT 4x4        done
  cycle 7 (PR #85)  H.264 IDCT 8x8        done
  cycle 8 (PR #86)  H.264 luma-v deblock  done
  cycle 9 (this)    H.264 qpel mc20

Bumps daedalus-fourier pin d87239d → 209a421 (PR #2 — public API gains daedalus_recipe_dispatch_h264_qpel_mc20 + DAEDALUS_KERNEL_H264_QPEL_MC20).

Verdict per docs/k9_h264qpel_mc20.md: CPU NEON. Per-block 7.6 ns at 131 Mblock/s gives 135× margin over 30 fps 1080p; QPU dispatch floor at ~250 ns makes any V3D shader strictly worse.

Substitution is plumbing-only — same daedalus_ctx_create_no_qpu pthread_once shape the cycles 6/7/8 shims already own (kept SEPARATE from the H264DSP shim's ctx because H264QPEL is its own libavcodec Makefile module and link order does not guarantee a single .o owns the ctx symbol; one extra ~µs init per process, paid lazily on first MC call).

Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the 16x16 size tier stay on the in-tree NEON .S code per the cycle-9 phase-1 rationale (mc20 8x8 is representative; remaining variants would multiply recipe-lookup overhead without changing the substrate verdict).

Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier cycle 9 green; 10000/10000 random blocks bit-exact, M3 = 131 Mblock/s).

No SONAME change, no Depends change. PKGREL 9 → 10.

Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 9.
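For orientation, the mc20 case is the horizontal half-sample position, which H.264 interpolates with the 6-tap filter (1, -5, 20, 20, -5, 1), a rounding offset of 16 and a right shift by 5. The block below is a minimal scalar sketch of the 8x8 "put" variant, meant only to show what a bit-exact replacement has to reproduce. The names round_clip_u8 and ref_put_h264_qpel8_mc20 are hypothetical; this is neither the in-tree NEON routine nor the daedalus-fourier recipe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the rounding offset, shift by 5 and clamp to the 8-bit range.
 * Negative sums are clamped before the shift so the result does not
 * depend on how the compiler shifts negative values. */
static uint8_t round_clip_u8(int sum)
{
    sum += 16;
    if (sum < 0)
        return 0;
    sum >>= 5;
    return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Hypothetical scalar reference for the 8x8 "put" horizontal half-pel
 * case. Each row of src must have 2 readable samples to the left of
 * column 0 and 3 to the right of column 7. dst and src share one stride. */
static void ref_put_h264_qpel8_mc20(uint8_t *dst, const uint8_t *src,
                                    ptrdiff_t stride)
{
    assert(dst != NULL && src != NULL);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            /* 6-tap filter centred between src[x] and src[x + 1]. */
            int sum = src[x - 2] - 5 * src[x - 1]
                    + 20 * src[x] + 20 * src[x + 1]
                    - 5 * src[x + 2] + src[x + 3];
            dst[x] = round_clip_u8(sum);
        }
        dst += stride;
        src += stride;
    }
}
```

The taps sum to 32, so a flat region passes through unchanged. A bit-exactness check like the 10000-block run mentioned above would compare a reference of this kind against the substituted path byte for byte.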
@@ -1,3 +1,37 @@
+ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-10) bookworm trixie; urgency=medium
+
+  * Add 0007-h264-qpel-mc20-daedalus-fourier.patch —
+    H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma
+    horizontal half-pel, 6-tap "put" — the canonical representative
+    of the H.264 luma motion-compensation family) now dispatches
+    through daedalus_recipe_dispatch_h264_qpel_mc20 instead of
+    ff_put_h264_qpel8_mc20_neon. Cycle 9 of the daedalus-v4l2#11
+    step 2 substitution arc; closes the 4-cycle libavcodec.so
+    substitution sequence (6 IDCT4 / 7 IDCT8 / 8 luma-v deblock /
+    9 qpel mc20).
+  * Bumps daedalus-fourier pin d87239d → 209a421 (PR #2 — public
+    API extended with daedalus_recipe_dispatch_h264_qpel_mc20 +
+    DAEDALUS_KERNEL_H264_QPEL_MC20).
+  * Cycle 9 is "CPU primary; QPU pointless" per
+    docs/k9_h264qpel_mc20.md. Per-block 7.6 ns at 131 Mblock/s
+    gives 135x margin over 30 fps 1080p; QPU dispatch floor at
+    ~250 ns makes any V3D shader strictly worse. Substitution
+    is plumbing-only, NEON-by-recipe — same
+    daedalus_ctx_create_no_qpu pthread_once shape the cycles 6/7/8
+    shims already own (kept SEPARATE from the H264DSP shim's ctx
+    because H264QPEL is its own libavcodec Makefile module and
+    link order does not guarantee a single .o owns the ctx symbol;
+    one extra ~µs init per process, paid lazily on first MC call).
+  * Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the
+    16x16 size tier stay on the in-tree NEON .S code. Per the
+    cycle-9 phase-1 rationale, mc20 8x8 is representative of the
+    whole family's per-block cost.
+  * Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier
+    cycle 9 green; 10000/10000 random blocks).
+  * No SONAME change, no Depends change.
+
+ -- Markus Fritsche <mfritsche@reauktion.de>  Sat, 23 May 2026 12:00:00 +0000
+
 ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-9) bookworm trixie; urgency=medium
 
   * Add 0006-h264-restore-low-delay.patch — restore the documented
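The throughput figures quoted in the changelog can be cross-checked with a few lines of arithmetic. The sketch below assumes, as one plausible reading that the source does not spell out, that "30 fps 1080p" means every 8x8 luma block of a 1920x1080 frame passes through this kernel once per frame. The helper names are hypothetical.

```c
#include <assert.h>

/* 8x8 blocks per second needed to cover a width x height luma plane
 * once per frame (assumption: one kernel call per 8x8 block). */
static double blocks_per_second(int width, int height, double fps)
{
    return (width / 8.0) * (height / 8.0) * fps;
}

/* Convert a throughput in blocks per second to nanoseconds per block. */
static double ns_per_block(double blocks_per_s)
{
    assert(blocks_per_s > 0.0);
    return 1e9 / blocks_per_s;
}

/* How many times faster the kernel runs than the workload requires. */
static double margin(double kernel_blocks_per_s, double required_blocks_per_s)
{
    assert(required_blocks_per_s > 0.0);
    return kernel_blocks_per_s / required_blocks_per_s;
}
```

Under that assumption, blocks_per_second(1920, 1080, 30.0) is 972000, ns_per_block(131e6) is about 7.63 ns, and margin(131e6, 972000.0) is about 134.8, which matches the quoted 7.6 ns and 135x. By the same numbers, the ~250 ns QPU dispatch floor costs roughly as much as 33 CPU-side blocks before any shader work starts.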
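The "pthread_once shape" that the message refers to is a lazily created, file-local context that each shim owns privately. The following is a rough sketch of that pattern only. The real daedalus_ctx_create_no_qpu signature and context type are not shown in this commit, so struct demo_ctx, demo_ctx_create_no_qpu, and the qpel_* names are hypothetical stand-ins.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the library context type. */
struct demo_ctx {
    int unused;
};

/* Counts creator invocations so the once-only behaviour is observable. */
static int demo_create_calls;

/* Hypothetical stand-in for daedalus_ctx_create_no_qpu. */
static struct demo_ctx *demo_ctx_create_no_qpu(void)
{
    demo_create_calls++;
    return calloc(1, sizeof(struct demo_ctx));
}

/* File-local state: static linkage keeps this ctx private to the shim's
 * object file, so no other module can end up owning the same symbol. */
static pthread_once_t qpel_ctx_once = PTHREAD_ONCE_INIT;
static struct demo_ctx *qpel_ctx;

static void qpel_ctx_init(void)
{
    qpel_ctx = demo_ctx_create_no_qpu();
}

/* Returns the shim's context, creating it on the first call only.
 * Safe to call concurrently from several decoder threads. */
static struct demo_ctx *qpel_get_ctx(void)
{
    int rc = pthread_once(&qpel_ctx_once, qpel_ctx_init);
    assert(rc == 0);
    (void)rc;
    return qpel_ctx;
}
```

Giving each shim its own static once-guard and pointer trades one extra initialisation per process for independence from link order, which is the trade-off the message describes for keeping the H264QPEL context separate from the H264DSP one.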