0bfc4ab03e
H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma horizontal half-pel, 6-tap "put"; the canonical representative of the H.264 luma motion-compensation family) now dispatches through daedalus_recipe_dispatch_h264_qpel_mc20 instead of ff_put_h264_qpel8_mc20_neon.

Cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes the 4-cycle libavcodec.so substitution sequence:

  cycle 6 (PR #76)  H.264 IDCT 4x4         done
  cycle 7 (PR #85)  H.264 IDCT 8x8         done
  cycle 8 (PR #86)  H.264 luma-v deblock   done
  cycle 9 (this)    H.264 qpel mc20

Bumps daedalus-fourier pin d87239d → 209a421 (PR #2: public API gains daedalus_recipe_dispatch_h264_qpel_mc20 + DAEDALUS_KERNEL_H264_QPEL_MC20).

Verdict per docs/k9_h264qpel_mc20.md: CPU NEON. Per-block 7.6 ns at 131 Mblock/s gives 135× margin over 30 fps 1080p; the QPU dispatch floor at ~250 ns makes any V3D shader strictly worse.

Substitution is plumbing-only: the same daedalus_ctx_create_no_qpu pthread_once shape the cycles 6/7/8 shims already own (kept SEPARATE from the H264DSP shim's ctx because H264QPEL is its own libavcodec Makefile module and link order does not guarantee a single .o owns the ctx symbol; one extra ~µs init per process, paid lazily on first MC call).

Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the 16x16 size tier stay on the in-tree NEON .S code per the cycle-9 phase-1 rationale (mc20 8x8 is representative; remaining variants would multiply recipe-lookup overhead without changing the substrate verdict).

Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier cycle 9 green; 10000/10000 random blocks bit-exact, M3 = 131 Mblock/s).

No SONAME change, no Depends change. PKGREL 9 → 10.

Refs reauktion/daedalus-v4l2#11: substitution arc step 2 cycle 9.
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-10) bookworm trixie; urgency=medium

  * Add 0007-h264-qpel-mc20-daedalus-fourier.patch:
    H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma
    horizontal half-pel, 6-tap "put"; the canonical representative
    of the H.264 luma motion-compensation family) now dispatches
    through daedalus_recipe_dispatch_h264_qpel_mc20 instead of
    ff_put_h264_qpel8_mc20_neon. Cycle 9 of the daedalus-v4l2#11
    step 2 substitution arc; closes the 4-cycle libavcodec.so
    substitution sequence (6 IDCT4 / 7 IDCT8 / 8 luma-v deblock /
    9 qpel mc20).
  * Bump the daedalus-fourier pin d87239d → 209a421 (PR #2: public
    API extended with daedalus_recipe_dispatch_h264_qpel_mc20 and
    DAEDALUS_KERNEL_H264_QPEL_MC20).
  * Cycle 9 is "CPU primary; QPU pointless" per
    docs/k9_h264qpel_mc20.md. Per-block 7.6 ns at 131 Mblock/s
    gives a 135x margin over 30 fps 1080p; the QPU dispatch floor of
    ~250 ns makes any V3D shader strictly worse. The substitution
    is plumbing-only, NEON-by-recipe, using the same
    daedalus_ctx_create_no_qpu pthread_once shape the cycle 6/7/8
    shims already own. The context is kept SEPARATE from the
    H264DSP shim's because H264QPEL is its own libavcodec Makefile
    module and link order does not guarantee that a single .o owns
    the ctx symbol; the cost is one extra ~µs init per process, paid
    lazily on the first MC call.
  * Other H.264 luma MC variants (mc02, mc11, mc22, etc.) and the
    16x16 size tier stay on the in-tree NEON .S code. Per the
    cycle-9 phase-1 rationale, mc20 8x8 is representative of the
    whole family's per-block cost.
  * Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier
    cycle 9 green; 10000/10000 random blocks).
  * No SONAME change, no Depends change.
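For readers unfamiliar with the kernel being substituted: mc20 is the horizontal half-sample position, computed with the H.264 6-tap luma filter (1, -5, 20, 20, -5, 1), rounded with +16, shifted right by 5 and clipped to 8 bits. The C below is a plain scalar sketch of that arithmetic for one 8x8 block, meant only as a readable reference next to the NEON and daedalus versions; the function name and argument order are illustrative assumptions, not FFmpeg's or daedalus-fourier's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit sample range [0, 255]. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Illustrative scalar 8x8 luma horizontal half-pel "put" (mc20).
 * Hypothetical helper, not FFmpeg's or daedalus-fourier's function.
 * src points at the full-pel sample left of dst[0]'s half-pel position.
 * Each output reads src[x - 2] .. src[x + 3], so every source row needs
 * 2 samples of margin on the left and 3 on the right of the 8 columns.
 */
static void qpel8_mc20_sketch(uint8_t *dst, ptrdiff_t dst_stride,
                              const uint8_t *src, ptrdiff_t src_stride)
{
    assert(dst != NULL && src != NULL);

    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            const uint8_t *s = src + x;
            /* 6-tap filter (1, -5, 20, 20, -5, 1); the taps sum to 32. */
            int sum = (s[-2] + s[3]) - 5 * (s[-1] + s[2]) + 20 * (s[0] + s[1]);
            /* Round (+16), scale back (>> 5), clip to 8 bits. */
            dst[x] = clip_u8((sum + 16) >> 5);
        }
        dst += dst_stride;
        src += src_stride;
    }
}
```

Because the taps sum to 32, a flat input passes through unchanged, and a linear ramp lands exactly on the midpoint between neighbouring samples, which makes both convenient sanity checks for any reimplementation.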
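The 135x figure can be reproduced under one assumption (ours, not stated in docs/k9_h264qpel_mc20.md): one 8x8 luma mc20 call for every 8x8 block of the 1920x1080 displayed area, every frame, with chroma ignored.

```latex
\begin{aligned}
N_{\text{frame}} &= \frac{1920}{8} \cdot \frac{1080}{8} = 240 \cdot 135 = 32\,400\ \text{blocks/frame} \\
R_{\text{needed}} &= 30\ \text{fps} \cdot 32\,400 = 972\,000\ \text{blocks/s} \\
\text{margin} &= \frac{131 \times 10^{6}\ \text{blocks/s}}{972\,000\ \text{blocks/s}} \approx 134.8 \approx 135 \\
t_{\text{block}} &= \frac{1}{131 \times 10^{6}\ \text{s}^{-1}} \approx 7.6\ \text{ns},
\qquad \frac{250\ \text{ns}}{7.6\ \text{ns}} \approx 33
\end{aligned}
```

Counting the coded height of 1088 rows instead gives 32,640 blocks per frame and a margin of roughly 134x, so the verdict does not change. The last ratio says that, if each MC call became its own QPU dispatch, the dispatch floor alone would cost about as much as 33 NEON blocks.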

 -- Markus Fritsche <mfritsche@reauktion.de>  Sat, 23 May 2026 12:00:00 +0000

ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-9) bookworm trixie; urgency=medium

  * Add 0006-h264-restore-low-delay.patch: restore the documented
    AV_CODEC_FLAG_LOW_DELAY semantics in the H.264 decoder. FFmpeg
    8.x dropped the H.264 low_delay code path entirely; setting the
    flag at avcodec_open2 no longer prevents the display-order DPB
    output queue from running. Visible in Firefox on YouTube as the
    2-1-4-3 B-frame pair-swap, silently re-introduced by the
    SONAME 61→62 jump in daedalus-v4l2 PR #16.
  * h264_select_output_frame: exit early when LOW_DELAY is set;
    emit the just-decoded picture as next_output_pic, mirror the
    corruption / recovery-point tracking, and skip delayed_pic[] and
    the POC reorder machinery entirely.
  * h264_field_start: suppress the SPS-driven
    has_b_frames = sps->num_reorder_frames clobber when LOW_DELAY
    is set; without this, the per-slice bitstream_restriction_flag
    re-pickup would reintroduce a nonzero reorder buffer mid-stream.
  * This restores the one-frame-per-send_packet contract that the
    daedalus-v4l2 daemon's decoder.c already relies on (the flag
    is set unconditionally for H.264). No daemon-side change.
  * No SONAME change, no Depends change.
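The difference between the two output paths can be illustrated with a deliberately simplified, hypothetical model. It is not FFmpeg's h264_select_output_frame and ignores IDR, MMCO and field handling: without LOW_DELAY, pictures wait in a buffer of has_b_frames entries and leave in POC order; with LOW_DELAY, each decoded picture is emitted at once, in decode order. All names below are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REORDER 16

/* Remove and return the smallest POC held in buf[0 .. *count - 1]. */
static int pop_min_poc(int *buf, size_t *count)
{
    size_t min = 0;
    for (size_t j = 1; j < *count; j++)
        if (buf[j] < buf[min])
            min = j;
    int poc = buf[min];
    buf[min] = buf[--*count];
    return poc;
}

/*
 * Hypothetical, simplified model of decoder output ordering.
 * poc[] lists pictures in decode order; out[] receives them in output
 * order. emitted_after[i] (optional) records how many pictures had been
 * output once input i was decoded. Returns the total number emitted.
 */
static size_t model_output(const int *poc, size_t n, int has_b_frames,
                           bool low_delay, int *out, size_t *emitted_after)
{
    int buf[MAX_REORDER + 1];
    size_t held = 0, emitted = 0;

    assert(has_b_frames >= 0 && has_b_frames <= MAX_REORDER);

    for (size_t i = 0; i < n; i++) {
        if (low_delay) {
            /* LOW_DELAY: the just-decoded picture goes out immediately. */
            out[emitted++] = poc[i];
        } else {
            /* Reorder path: hold pictures and bump the lowest POC once
             * more than has_b_frames of them are waiting. */
            buf[held++] = poc[i];
            if (held > (size_t)has_b_frames)
                out[emitted++] = pop_min_poc(buf, &held);
        }
        if (emitted_after != NULL)
            emitted_after[i] = emitted;
    }
    /* End of stream: drain whatever is still held, in POC order. */
    while (held > 0)
        out[emitted++] = pop_min_poc(buf, &held);
    return emitted;
}
```

With has_b_frames = 1 and decode-order POCs 0, 4, 2, 8, 6, the buffered path emits 0, 2, 4, 6, 8 but nothing for the first input; the LOW_DELAY path emits 0, 4, 2, 8, 6, one picture per input. In this model, a has_b_frames value re-read from the SPS mid-stream delays output by that many pictures, which is the behaviour the h264_field_start change suppresses.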

 -- Markus Fritsche <mfritsche@reauktion.de>  Fri, 22 May 2026 13:30:00 +0000

ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-8) bookworm trixie; urgency=medium

  * Add 0005-h264-deblock-luma-v-daedalus-fourier.patch:
    H264DSPContext.v_loop_filter_luma (non-intra bS<4 vertical luma
    deblock, called per macroblock-row edge from the slice deblock
    loop in libavcodec/h264_loopfilter.c) now dispatches through
    daedalus_recipe_dispatch_h264_deblock_luma_v instead of
    ff_h264_v_loop_filter_luma_neon. Cycle 8 of the daedalus-v4l2#11
    step 2 substitution arc.
  * Cycle 8 is marked "CPU primary; QPU opportunistic" in
    daedalus-fourier, but the libavcodec.so context here uses
    daedalus_ctx_create_no_qpu (process-global pthread_once,
    shared with cycles 6/7). Opportunistic QPU is deferred to a
    separate change that gates Vulkan init on a feature flag, to
    avoid implicit Vulkan init in arbitrary host processes. For
    now, cycle 8 is plumbing-only: NEON-by-recipe.
  * The intra (bS=4) loop filter c->v_loop_filter_luma_intra stays on
    the in-tree NEON .S code; daedalus-fourier's
    daedalus_h264_deblock_meta only covers the non-intra path, per
    its API docstring.
  * Bit-exact against ff_h264_v_loop_filter_luma_neon (daedalus-fourier
    cycle 8 green).
  * No SONAME change, no Depends change.
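For reference, the non-intra (bS<4) luma filter this entry substitutes is the H.264 specification's normal-strength edge filter. The sketch below applies it across a horizontal edge in the shape of the v_loop_filter_luma callback: pixel pointer at the first row below the edge, stride, alpha, beta, and four tc0 values, one per 4-column group. It is an illustrative scalar version written from the spec, not FFmpeg's or daedalus-fourier's code; in the decoder, alpha, beta and tc0 come from QP- and bS-indexed tables, which are passed in directly here, and the "negative tc0 means skip" convention is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Normal-strength (bS < 4) luma filter for one column of samples
 * crossing a horizontal edge. pix points at q0; p0 is pix[-stride].
 */
static void luma_bs_lt4_sketch(uint8_t *pix, ptrdiff_t stride,
                               int alpha, int beta, int tc0)
{
    const int p2 = pix[-3 * stride], p1 = pix[-2 * stride], p0 = pix[-stride];
    const int q0 = pix[0], q1 = pix[stride], q2 = pix[2 * stride];

    assert(tc0 >= 0);
    /* Only filter edges that look like blocking artifacts. */
    if (abs(p0 - q0) >= alpha || abs(p1 - p0) >= beta || abs(q1 - q0) >= beta)
        return;

    int tc = tc0;
    if (abs(p2 - p0) < beta) { /* smooth p side: also adjust p1 */
        pix[-2 * stride] = (uint8_t)(p1 + clip3(-tc0, tc0,
                               (p2 + ((p0 + q0 + 1) >> 1) - (p1 << 1)) >> 1));
        tc++;
    }
    if (abs(q2 - q0) < beta) { /* smooth q side: also adjust q1 */
        pix[stride] = (uint8_t)(q1 + clip3(-tc0, tc0,
                          (q2 + ((p0 + q0 + 1) >> 1) - (q1 << 1)) >> 1));
        tc++;
    }
    /* p0/q0 update uses the original (unfiltered) p1 and q1. */
    const int delta = clip3(-tc, tc, ((q0 - p0) * 4 + (p1 - q1) + 4) >> 3);
    pix[-stride] = (uint8_t)clip3(0, 255, p0 + delta);
    pix[0] = (uint8_t)clip3(0, 255, q0 - delta);
}

/*
 * Hypothetical edge in the shape of a v_loop_filter_luma callback:
 * 16 columns, tc0[i] covers columns 4*i .. 4*i + 3. In this sketch a
 * negative tc0[i] marks a group with bS == 0, which is left untouched.
 */
static void v_loop_filter_luma_sketch(uint8_t *pix, ptrdiff_t stride,
                                      int alpha, int beta, const int8_t tc0[4])
{
    for (int i = 0; i < 4; i++) {
        if (tc0[i] < 0)
            continue;
        for (int d = 0; d < 4; d++)
            luma_bs_lt4_sketch(pix + 4 * i + d, stride, alpha, beta, tc0[i]);
    }
}
```

Note that p1 and q1 are adjusted only when their side of the edge is smooth (|p2 - p0| < beta or |q2 - q0| < beta), and each such adjustment widens the clipping range tc for the p0/q0 update by one. For example, a 100 | 104 step with tc0 = 1, alpha = 20 and beta = 6 becomes 100 101 102 | 102 103 104.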
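The "process-global pthread_once" context shape referred to above can be sketched as follows. demo_ctx and demo_ctx_create_no_qpu are hypothetical stand-ins for the daedalus context type and daedalus_ctx_create_no_qpu, whose real signatures are not shown in this changelog; only the pattern is the point: a static pthread_once_t guards a static context pointer, so the first shim call in the process pays the creation cost and every later call, from any thread, gets the same pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the daedalus context type. */
struct demo_ctx {
    int generation;
};

/* Counts constructor calls so the once-only behaviour can be checked. */
static int demo_ctx_creations;

/* Hypothetical stand-in for daedalus_ctx_create_no_qpu(). */
static struct demo_ctx *demo_ctx_create_no_qpu(void)
{
    struct demo_ctx *c = malloc(sizeof *c);
    if (c != NULL)
        c->generation = ++demo_ctx_creations;
    return c; /* NULL on allocation failure; callers must handle it */
}

/* One guard + one pointer per module: process-global, created lazily. */
static pthread_once_t shim_ctx_once = PTHREAD_ONCE_INIT;
static struct demo_ctx *shim_ctx;

static void shim_ctx_init(void)
{
    shim_ctx = demo_ctx_create_no_qpu();
}

/* Called at the top of every shimmed DSP function. */
static struct demo_ctx *shim_get_ctx(void)
{
    pthread_once(&shim_ctx_once, shim_ctx_init);
    return shim_ctx;
}

/* Thread body used to simulate concurrent first calls. */
static void *shim_worker(void *arg)
{
    (void)arg;
    return shim_get_ctx();
}
```

Each libavcodec module that owns such a static guard/pointer pair gets its own context, which is why a separately linked module such as H264QPEL costs one extra lazy init per process.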

 -- Markus Fritsche <mfritsche@reauktion.de>  Fri, 22 May 2026 12:30:00 +0000

ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-7) bookworm trixie; urgency=medium

  * Add 0004-h264-idct8-daedalus-fourier.patch: H264DSPContext.idct8_add
    (per-block 8x8 IDCT, called from the High-profile intra-8x8-DCT
    macroblock path in libavcodec/h264_mb.c) now dispatches through
    daedalus_recipe_dispatch_h264_idct8 instead of
    ff_h264_idct8_add_neon. Cycle 7 of the daedalus-v4l2#11 step 2
    substitution arc: NEON-by-recipe, reusing the pthread_once context
    the cycle-6 IDCT 4x4 shim already owns.
  * Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
    green; FFmpeg's 8x8 block storage block[r + 8*c] matches the
    daedalus column-major convention).
  * Bulk c->idct8_add4 (inter 8x8-DCT macroblocks) stays on the
    in-tree NEON .S code; batched substitution lands later.
  * No SONAME change, no Depends change.
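To make the storage note concrete: in the column-major convention, the element at row r and column c of an 8x8 block lives at index r + 8*c, so consecutive indices walk down a column. A kernel that expected row-major input (8*r + c) would need a transpose first, as in this small sketch; the helper names are hypothetical, and the "no repack needed" conclusion is the one the entry above states for FFmpeg and daedalus.

```c
#include <assert.h>
#include <stdint.h>

/* Column-major: element (row r, column c) of an 8x8 block sits at r + 8*c. */
static int colmajor_index(int r, int c)
{
    return r + 8 * c;
}

/* Row-major: element (row r, column c) sits at 8*r + c. */
static int rowmajor_index(int r, int c)
{
    return 8 * r + c;
}

/* Hypothetical repack from row-major into column-major order (a transpose). */
static void repack_rowmajor_to_colmajor(int16_t dst[64], const int16_t src[64])
{
    assert(dst != src); /* an in-place repack would overwrite unread entries */
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            dst[colmajor_index(r, c)] = src[rowmajor_index(r, c)];
}
```

Since the mapping is a transpose of a square block, applying the repack twice returns the original order.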

 -- Markus Fritsche <mfritsche@reauktion.de>  Fri, 22 May 2026 10:30:00 +0000

ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-6) bookworm trixie; urgency=medium

  * Drop --enable-libxml2 and the libxml2 Depends: the Gitea
    debian-aarch64 runner ships libxml2 ≥ 2.14 (SONAME 16) while
    Debian trixie targets 2.12 (SONAME 2). -5 built fine, then
    failed to load on higgs (trixie):
      dlopen(libavformat.so.62): libxml2.so.16:
      cannot open shared object file
    None of the consumers uses FFmpeg's libxml2-backed DASH demuxer:
    the daedalus-v4l2 daemon feeds AVPackets directly (libavformat
    is used only for the in-tree v4l2request hwaccel glue),
    mpv-fourier handles DASH/HLS through Lua + ytdlp + mpv's own
    stream code, and firefox-fourier uses the gecko-media DASH
    demuxer. Dropping it is therefore feature-neutral. Mirrors the
    libva trixie/runner ABI-skew workaround documented in PR #62.
  * CI workflow build-deps lose libxml2-dev for the same reason.
  * No source code change beyond configure flags + Depends.
    Substitution stays as PRs #76/#77 landed.

 -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 23:30:00 +0000

ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-5) bookworm trixie; urgency=medium

  * PKGREL-only bump (3 → 5) to force a rebuild of the H.264 IDCT 4x4
    daedalus-fourier substitution that landed in marfrit-packages PR
    #76. An orphan -4 .deb already sat in the apt pool (dated
    2026-05-19, no matching source commit in main); CI's
    check-already-published.sh compares with `dpkg --compare-versions
    pool_ver ge source_full`, which short-circuited the -3 build from
    PR #76. Skipping past -4 lets the CI workflow actually publish the
    substitution.
  * No source code change beyond PKGREL and this changelog entry.
    Substitution + control + build-deb.sh wiring stay as PR #76 left
    them.
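The short-circuit is easier to see with the Debian version ordering spelled out. The sketch below is a compact C reimplementation written for illustration, modeled on dpkg's verrevcmp and the Debian Policy rules: epochs compare numerically, then the upstream version, then the Debian revision, each as alternating non-digit runs (character by character, '~' before everything, letters before other symbols) and digit runs (numerically). It skips the input validation dpkg performs; CI itself calls dpkg --compare-versions, and the helper names here are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sort weight of one character in a non-digit run ('\0' = end of run). */
static int order(int c)
{
    if (isdigit(c))
        return 0;
    if (isalpha(c))
        return c;
    if (c == '~')
        return -1;
    return c ? c + 256 : 0;
}

/* Compare two upstream-version or revision strings. */
static int verrevcmp(const char *a, const char *b)
{
    while (*a || *b) {
        int first_diff = 0;

        /* Non-digit run: character by character using order(). */
        while ((*a && !isdigit((unsigned char)*a)) ||
               (*b && !isdigit((unsigned char)*b))) {
            int ac = order((unsigned char)*a);
            int bc = order((unsigned char)*b);
            if (ac != bc)
                return ac - bc;
            a++;
            b++;
        }
        /* Digit run: numeric comparison, ignoring leading zeros. */
        while (*a == '0')
            a++;
        while (*b == '0')
            b++;
        while (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            if (!first_diff)
                first_diff = *a - *b;
            a++;
            b++;
        }
        if (isdigit((unsigned char)*a))
            return 1; /* a has more digits: larger number */
        if (isdigit((unsigned char)*b))
            return -1;
        if (first_diff)
            return first_diff;
    }
    return 0;
}

/* Split "[epoch:]upstream[-revision]"; the revision follows the last '-'. */
static void split_version(const char *v, long *epoch, char *up, size_t up_size,
                          const char **rev)
{
    const char *colon = strchr(v, ':');
    *epoch = colon ? strtol(v, NULL, 10) : 0;
    const char *start = colon ? colon + 1 : v;
    const char *dash = strrchr(start, '-');
    size_t len = dash ? (size_t)(dash - start) : strlen(start);
    if (len >= up_size)
        abort(); /* sketch: no support for very long versions */
    memcpy(up, start, len);
    up[len] = '\0';
    *rev = dash ? dash + 1 : "";
}

/* <0, 0, >0 like strcmp; "a ge b" holds when the result is >= 0. */
static int debian_version_cmp(const char *a, const char *b)
{
    long ea, eb;
    char ua[128], ub[128];
    const char *ra, *rb;

    split_version(a, &ea, ua, sizeof ua, &ra);
    split_version(b, &eb, ub, sizeof ub, &rb);
    if (ea != eb)
        return ea < eb ? -1 : 1;
    int r = verrevcmp(ua, ub);
    return r ? r : verrevcmp(ra, rb);
}
```

With an orphan -4 already in the pool, the check sees 2:8.1+rfourier+gb57fbbe-4 ge 2:8.1+rfourier+gb57fbbe-3 as true and skips publishing; bumping the source to -5 makes the comparison false again.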

 -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 21:30:00 +0000

ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-3) bookworm trixie; urgency=medium

  * Add 0003-h264-idct4-daedalus-fourier.patch: H264DSPContext.idct_add
    (per-block 4x4 IDCT, called from the intra-4x4 decode path in
    libavcodec/h264_mb.c) now dispatches through
    daedalus_recipe_dispatch_h264_idct4 instead of
    ff_h264_idct_add_neon. First end-to-end exercise of the
    daedalus-fourier kernel pack inside libavcodec.so on the
    production decode hot path (daedalus-v4l2#11 step 2, cycle 6:
    H.264 IDCT 4x4, NEON-by-recipe).
  * build-deb.sh: fetches and builds daedalus-fourier (pinned at
    d87239d, in lockstep with the daemon's static link) with
    -fPIC into a per-build temp prefix, then passes
    --extra-cflags=-I.../include --extra-ldflags=-L.../lib
    --extra-libs="-ldaedalus_core -lvulkan -lpthread" to FFmpeg
    configure. Statically linked into libavcodec.so.62.
  * Bulk paths (idct_add16 / idct_add16intra / idct_add8) remain on
    the stock NEON .S code and will be batched through
    daedalus_recipe_dispatch_h264_idct4 with n_blocks>1 in a
    follow-up. Cycles 7/8/9 (IDCT 8x8 / luma-v deblock / qpel mc20)
    land in subsequent patches.
  * Depends gains libvulkan1: daedalus_core PUBLIC-links Vulkan
    (queryable QPU substrate), so although the no-QPU constructor
    still works, the loader refuses libavcodec.so.62 at dlopen time
    without libvulkan.so.1 present.
  * No ABI change; SONAMEs stay 62/62/60.
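For reference, the 4x4 transform behind idct_add is the H.264 integer inverse transform: a 1-D butterfly along each dimension (with >>1 half-weights on the odd terms), then (x + 32) >> 6 rounding and a clipped add into the prediction. The sketch below is a scalar C version of that arithmetic, arranged like FFmpeg's C fallback: the rounding term is folded into the DC coefficient, coefficients use the transposed block[r + 4*c] layout, and they are zeroed after use. The function name is illustrative, and this is not daedalus-fourier's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clamp to the 8-bit sample range [0, 255]. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Illustrative H.264 4x4 inverse transform + add (idct_add shape).
 * block[r + 4*c] holds the coefficient for vertical frequency r and
 * horizontal frequency c; it is consumed and zeroed, like the decoder's.
 */
static void idct4_add_sketch(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
{
    int tmp[16];

    assert(dst != NULL && block != NULL);
    for (int i = 0; i < 16; i++)
        tmp[i] = block[i];
    tmp[0] += 32; /* rounding term for the final >> 6 */

    /* First pass: 1-D transform along the stride-4 (horizontal) index. */
    for (int i = 0; i < 4; i++) {
        int z0 = tmp[i] + tmp[i + 8];
        int z1 = tmp[i] - tmp[i + 8];
        int z2 = (tmp[i + 4] >> 1) - tmp[i + 12];
        int z3 = tmp[i + 4] + (tmp[i + 12] >> 1);
        tmp[i]      = z0 + z3;
        tmp[i + 4]  = z1 + z2;
        tmp[i + 8]  = z1 - z2;
        tmp[i + 12] = z0 - z3;
    }
    /* Second pass: along the contiguous (vertical) index, then scale,
     * add to the prediction in column i, and clip. */
    for (int i = 0; i < 4; i++) {
        const int *t = tmp + 4 * i;
        int z0 = t[0] + t[2];
        int z1 = t[0] - t[2];
        int z2 = (t[1] >> 1) - t[3];
        int z3 = t[1] + (t[3] >> 1);
        dst[i + 0 * stride] = clip_u8(dst[i + 0 * stride] + ((z0 + z3) >> 6));
        dst[i + 1 * stride] = clip_u8(dst[i + 1 * stride] + ((z1 + z2) >> 6));
        dst[i + 2 * stride] = clip_u8(dst[i + 2 * stride] + ((z1 - z2) >> 6));
        dst[i + 3 * stride] = clip_u8(dst[i + 3 * stride] + ((z0 - z3) >> 6));
    }
    memset(block, 0, 16 * sizeof(block[0]));
}
```

Folding 32 into the DC term works because the DC coefficient contributes with weight 1 to every output sample, so after both passes each sample carries exactly the +32 needed before the final >> 6.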

 -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 20:00:00 +0000

ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-1) bookworm trixie; urgency=medium

  * Initial Debian packaging for the Kwiboo FFmpeg fork with V4L2
    Request API hwaccel patches.
  * Mirror of arch/ffmpeg-v4l2-request-fourier (same pin b57fbbe,
    same configure flags, same 2 patches: libudev-bypass-fallback +
    nv15-to-p010-unpack).
  * Drop-in replacement for Debian's stock ffmpeg + libav*; takes
    epoch 2 to win the apt version comparison.
  * Required by mpv-fourier and firefox-fourier; not strictly
    required for the VAAPI-only path on daedalus-v4l2 hosts (stock
    libva + Debian ffmpeg work there).

 -- Markus Fritsche <mfritsche@reauktion.de>  Mon, 18 May 2026 23:00:00 +0000