marfrit-packages

Author	SHA1	Message	Date
claude-noether	2732a022f8	ffmpeg-v4l2-request-fourier: route remaining H.264 qpel 8x8 positions through daedalus-fourier (0012) Closes the H.264 qpel substitution. Extends 0007 (which routed only mc20 put_) to ALL 15 useful positions in BOTH the put_ and avg_ tables, skipping mc00 (integer copy / pointer-only fast path). 29 substitutions total: 14 new put_ + 15 avg_. Each wraps a single daedalus_recipe_dispatch_h264_qpel_{avg_,}mcXY call (the dispatches landed in daedalus-fourier PRs #15-#20). Collapsed via a single DEFINE_QPEL_WRAPPER macro on the libavcodec shim side so the diff is uniform. All recipe-table entries route AUTO to CPU NEON — no QPU shaders for any qpel position other than mc20 yet. Plumbing-only NEON-to-NEON via the daedalus recipe layer; bit-exact against the in-tree ff__h264_qpel8__neon path (each daedalus dispatch is already bit-exact-gated by the corresponding fourier PR's test). 16x16 qpel tables ([0][...]) stay on the in-tree NEON. daedalus only exposes 8x8 today; 16x16 substitution can land once fourier provides those variants. Verified the patch applies cleanly on top of 0001-0011 against the pinned upstream commit b57fbbe5 on hertz.	2026-05-25 14:05:56 +02:00
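The single-macro wrapper approach described in 0012 can be sketched as follows. This is an illustration only: `sketch_dispatch`, `sketch_install`, and the `sketch_*` wrapper names are hypothetical stand-ins, not the daedalus-fourier API, and the macro body is a guess at the shape rather than the patch's actual DEFINE_QPEL_WRAPPER. The table index `x + 4*y` is an assumption consistent with mc20 sitting at index 2 of the 8x8 table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the libavcodec qpel function-pointer type. */
typedef void (*qpel_mc_func)(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);

/* Hypothetical stand-in for a daedalus dispatch call: it only records
 * which position was requested (avg entries are offset by 16). */
static int g_last_pos = -1;

static void sketch_dispatch(int is_avg, int x, int y,
                            uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    (void)dst; (void)src; (void)stride;
    g_last_pos = (is_avg ? 16 : 0) + x + 4 * y;
}

/* One macro generates every wrapper, so each substitution is one line. */
#define DEFINE_QPEL_WRAPPER(OP, IS_AVG, X, Y)                               \
    static void sketch_##OP##_qpel8_mc##X##Y(uint8_t *dst,                  \
                                             const uint8_t *src,            \
                                             ptrdiff_t stride)              \
    {                                                                       \
        sketch_dispatch(IS_AVG, X, Y, dst, src, stride);                    \
    }

DEFINE_QPEL_WRAPPER(put, 0, 2, 0)
DEFINE_QPEL_WRAPPER(avg, 1, 2, 0)
DEFINE_QPEL_WRAPPER(put, 0, 1, 3)

/* Install a few wrappers into 16-entry 8x8 tables; index 0 (mc00) is
 * deliberately left untouched, mirroring the "skip mc00" rule. */
static void sketch_install(qpel_mc_func put_tab[16], qpel_mc_func avg_tab[16])
{
    assert(put_tab != NULL && avg_tab != NULL);
    put_tab[2 + 4 * 0] = sketch_put_qpel8_mc20;
    avg_tab[2 + 4 * 0] = sketch_avg_qpel8_mc20;
    put_tab[1 + 4 * 3] = sketch_put_qpel8_mc13;
}
```

Only three positions are instantiated here to keep the sketch short; the commit itself covers 15 positions in each of the put_ and avg_ tables.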
claude-noether	d8aa3aae8d	ffmpeg-v4l2-request-fourier: route H.264 chroma DC Hadamard through daedalus-fourier (0011)
Substitutes H264DSPContext.chroma_dc_dequant_idct in the 4:2:0 / bit_depth=8 init path with a wrapper that composes the daedalus chroma DC Hadamard primitive (daedalus-fourier PR #25) with the qmul scaling that FFmpeg's reference performs in one fused function (h264idct_template.c::ff_h264_chroma_dc_dequant_idct).
Algorithm per H.264 §8.5.11.1 / §8.5.11.2:
1. Extract 4 DCs from the scattered positions in the per-MB coefficient buffer (stride=32, xStride=16)
2. 2x2 Hadamard transform (daedalus primitive)
3. qmul scale + >> 7, write back to original positions
Bit-exact against ff_h264_chroma_dc_dequant_idct_8_c. The Hadamard itself is gated by the fourier PR #23 7-case test suite (including the H·H = 4·I algebraic invariant), and the public-API parity test added in PR #25 confirms the src/ symbol matches the test ref.
4:2:2 chroma stays on the in-tree ff_h264_chroma422_dc_dequant_idct_c path, with the same chroma_format_idc<=1 gating shape as the 0009 chroma deblock.
Pin bump: _daedalus_fourier_commit / DAEDALUS_FOURIER_COMMIT bumped to b9f9ff2a (post-PR #25) so the build picks up the public daedalus_h264_chroma_dc_hadamard_2x2 symbol.
Verified the patch applies cleanly on top of 0001-0010 against the pinned upstream commit b57fbbe5 on hertz.	2026-05-25 13:39:54 +02:00
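The three algorithm steps in 0011 can be sketched in plain C. This is a minimal illustration, not the patch: `hadamard_2x2` is a hypothetical stand-in for the daedalus primitive (whose real signature is not shown in this log), and `chroma_dc_dequant_idct_sketch` is an invented name. The stride values come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the daedalus 2x2 Hadamard primitive.
 * Input and output are row-major: {top-left, top-right, bottom-left,
 * bottom-right}. */
static void hadamard_2x2(const int32_t in[4], int32_t out[4])
{
    int32_t s0 = in[0] + in[1], d0 = in[0] - in[1];  /* top row    */
    int32_t s1 = in[2] + in[3], d1 = in[2] - in[3];  /* bottom row */
    out[0] = s0 + s1;
    out[1] = d0 + d1;
    out[2] = s0 - s1;
    out[3] = d0 - d1;
}

/* Gather 4 DCs, transform, scale by qmul, scatter back. */
static void chroma_dc_dequant_idct_sketch(int16_t *block, int qmul)
{
    enum { STRIDE = 32, XSTRIDE = 16 };
    static const int pos[4] = { 0, XSTRIDE, STRIDE, STRIDE + XSTRIDE };
    int32_t dc[4], t[4];

    assert(block != NULL);

    /* Step 1: extract the scattered DC coefficients. */
    for (int i = 0; i < 4; i++)
        dc[i] = block[pos[i]];

    /* Step 2: 2x2 Hadamard transform. */
    hadamard_2x2(dc, t);

    /* Step 3: qmul scale, >> 7, write back. The right shift of a negative
     * value assumes an arithmetic shift, as on the usual compilers. */
    for (int i = 0; i < 4; i++)
        block[pos[i]] = (int16_t)((t[i] * qmul) >> 7);
}
```

Applying `hadamard_2x2` twice returns four times the input, which is the H·H = 4·I invariant the commit mentions.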
claude-noether	45be17fbdf	ffmpeg-v4l2-request-fourier: route H.264 luma intra deblock through daedalus-fourier (0010) Adds the bS=4 intra-strength variants of the already-substituted luma_v / luma_h deblock (0005, 0008). Intra MBs and certain inter-MB edges (4x4 transform boundaries inside an Intra_NxN neighbour) force boundary strength to 4 per H.264 §8.7.2.1. H264DSPContext.v_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_v_intra H264DSPContext.h_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_h_intra Both kernels landed in daedalus-fourier PR #11. Recipe → CPU NEON (no intra QPU shaders yet); plumbing-only NEON-to-NEON via daedalus. Signature differs from bS<4: no tc0 argument. Wrapper passes daedalus_h264_deblock_meta with alpha/beta set; tc0[] is ignored by the intra dispatch (bS=4 hardcodes the strength). Chroma intra variants are deferred to a follow-up because the chroma init has a 4:2:0 / 4:2:2 split (chroma_format_idc gating) — the daedalus dispatch is 4:2:0-only and needs explicit conditional substitution to avoid running on 4:2:2 chroma. Verified the patch applies cleanly on top of 0001-0009 against the pinned upstream commit b57fbbe5 on hertz.	2026-05-25 13:21:00 +02:00
claude-noether	babb280410	ffmpeg-v4l2-request-fourier: route H.264 chroma v/h deblock through daedalus-fourier (0009) Chroma siblings of 0005 (luma_v) and 0008 (luma_h). Same NEON-to-NEON pattern via the daedalus recipe layer: H264DSPContext.v_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_v H264DSPContext.h_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_h Both kernels landed in daedalus-fourier PR #10. Recipe table routes AUTO to CPU NEON (no chroma QPU shaders yet), so this is plumbing-only and stays bit-exact against the in-tree NEON. Intra chroma (bS=4) loop filters remain on in-tree NEON; daedalus_h264_deblock_meta covers the non-intra (bS<4) path. Verified the patch applies cleanly on top of 0001-0008 against the pinned upstream commit b57fbbe5 on hertz. Wires the new patch into both arch/PKGBUILD and debian/build-deb.sh.	2026-05-25 13:16:45 +02:00
claude-noether	624f83e877	ffmpeg-v4l2-request-fourier: route H.264 luma-h deblock through daedalus-fourier (0008) Adds patch 0008 to the substitution arc, mirroring 0005's V variant for H.264 non-intra bS<4 horizontal luma deblock. H264DSPContext.h_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_h The H kernel was added to daedalus-fourier in PR #9 (vendored ff_h264_h_loop_filter_luma_neon, wired through the same CPU-dispatch pattern as V). Recipe table routes AUTO to CPU NEON (no QPU shader for H yet), so this is a NEON-to-NEON substitution via the daedalus recipe layer — same shape as 0005. The libavcodec.so ctx remains no-QPU (daedalus_ctx_create_no_qpu), matching the existing 0003/0004/0005/0007 patches. Higher-cycle QPU init waits for a feature-flag gating change in a separate PR. Intra (bS=4) h_loop_filter_luma_intra stays on the in-tree NEON .S code; daedalus_h264_deblock_meta covers the non-intra path only. A follow-up can route intra once daedalus-fourier exposes the intra-h dispatch (the kernel already exists internally per fourier PR #11). Wires the new patch into both arch/PKGBUILD and debian/build-deb.sh sequences. Verified the patch applies cleanly on top of 0001-0007 against the pinned upstream commit b57fbbe5 on hertz.	2026-05-25 13:10:05 +02:00
claude-noether	0bfc4ab03e	ffmpeg-v4l2-request-fourier: substitute H.264 qpel mc20 → daedalus-fourier
H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma horizontal half-pel, 6-tap "put", the canonical representative of the H.264 luma motion-compensation family) now dispatches through daedalus_recipe_dispatch_h264_qpel_mc20 instead of ff_put_h264_qpel8_mc20_neon.
Cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes the 4-cycle libavcodec.so substitution sequence:
cycle 6 (PR #76) H.264 IDCT 4x4 done
cycle 7 (PR #85) H.264 IDCT 8x8 done
cycle 8 (PR #86) H.264 luma-v deblock done
cycle 9 (this) H.264 qpel mc20
Bumps daedalus-fourier pin d87239d → 209a421 (PR #2: public API gains daedalus_recipe_dispatch_h264_qpel_mc20 + DAEDALUS_KERNEL_H264_QPEL_MC20).
Verdict per docs/k9_h264qpel_mc20.md: CPU NEON. Per-block 7.6 ns at 131 Mblock/s gives 135× margin over 30 fps 1080p; QPU dispatch floor at ~250 ns makes any V3D shader strictly worse.
Substitution is plumbing-only: same daedalus_ctx_create_no_qpu pthread_once shape the cycles 6/7/8 shims already own (kept SEPARATE from the H264DSP shim's ctx because H264QPEL is its own libavcodec Makefile module and link order does not guarantee a single .o owns the ctx symbol; one extra ~µs init per process, paid lazily on first MC call).
Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the 16x16 size tier stay on the in-tree NEON .S code per the cycle-9 phase-1 rationale (mc20 8x8 is representative; remaining variants would multiply recipe-lookup overhead without changing the substrate verdict).
Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier cycle 9 green; 10000/10000 random blocks bit-exact, M3 = 131 Mblock/s).
No SONAME change, no Depends change. PKGREL 9 → 10.
Refs reauktion/daedalus-v4l2#11 (substitution arc step 2 cycle 9).	2026-05-23 03:32:29 +02:00
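The margin figure quoted in the mc20 verdict can be reproduced with a short calculation. This sketch assumes one 8x8 luma call per 8x8 block of every frame and a 1920x1080 frame; both are assumptions made here for illustration, and the function names are invented.

```c
#include <assert.h>

/* Nanoseconds spent per block at a given throughput (blocks per second). */
static double ns_per_block(double blocks_per_sec)
{
    assert(blocks_per_sec > 0.0);
    return 1e9 / blocks_per_sec;
}

/* How many times faster the kernel runs than the stream requires.
 * Assumption: one 8x8 luma call per 8x8 block of every frame. */
static double throughput_margin(double blocks_per_sec,
                                int width, int height, int fps)
{
    assert(width > 0 && height > 0 && fps > 0);
    double blocks_per_frame = (double)width * (double)height / 64.0;
    return blocks_per_sec / (blocks_per_frame * (double)fps);
}
```

Calling `throughput_margin(131e6, 1920, 1080, 30)` lands close to the 135× quoted above, and `ns_per_block(131e6)` close to the quoted 7.6 ns, so the two numbers in the verdict are mutually consistent under these assumptions.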
marfrit	5c69460722	ffmpeg-v4l2-request-fourier: restore AV_CODEC_FLAG_LOW_DELAY in H.264 decoder
FFmpeg 8.x dropped the H.264 decoder's low_delay code path: AV_CODEC_FLAG_LOW_DELAY no longer prevents h264_select_output_frame from running the display-order DPB output queue. The daedalus-v4l2 daemon's `ctx->flags |= AV_CODEC_FLAG_LOW_DELAY` at daemon/src/decoder.c:202 has been a silent no-op since the SONAME 61→62 jump landed in reauktion/daedalus-v4l2 PR #16; on Firefox YouTube this re-introduced the 2-1-4-3 B-frame pair-swap that PR #12's daemon flag was supposed to prevent.
Fix lives in libavcodec, not the daemon: restore the documented LOW_DELAY semantics so the daemon (and any other V4L2-stateless-style consumer) keeps the one-frame-per-send_packet decode-order output contract it already declares.
## Patch
0006-h264-restore-low-delay.patch touches libavcodec/h264_slice.c:
- h264_select_output_frame: early-exit when LOW_DELAY is set. Emit the just-decoded picture as next_output_pic, mirror the corruption / recovery-point tracking the main path performs, skip delayed_pic[] / POC reorder machinery entirely.
- h264_field_start: suppress the SPS-driven `has_b_frames = sps->num_reorder_frames` clobber when LOW_DELAY is set. Without this the per-slice bitstream_restriction_flag re-pickup would reintroduce a nonzero reorder buffer mid-stream even after the daemon set has_b_frames=0 at avcodec_open2.
## Why not daemon-side
A daemon SPS-rewrite (`num_reorder_frames=0`) was considered but rejected: it works only for the daemon's reconstructed SPS NAL, not for any in-band SPS the daemon dlopens libavformat to parse in other code paths. Restoring documented FFmpeg flag semantics is the smaller, more durable change and keeps the daemon interface stable.
## Packaging
- PKGREL/pkgrel bump to 9.
- No new build-deps, no Depends change.
- Substitution arc cycles 6/7/8 unchanged.
## Refs
- reauktion/daedalus-v4l2#11 / #12 (LOW_DELAY half-measure on daemon side, originally landed against FFmpeg 7.x).
- daemon/src/decoder.c:202 (`ctx->flags |= AV_CODEC_FLAG_LOW_DELAY` for H.264 only; unchanged, but now actually has effect again).	2026-05-22 14:20:37 +02:00
marfrit	29e0852d11	ffmpeg-v4l2-request-fourier: substitute H.264 luma-v deblock → daedalus-fourier
Cycle 8 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11 step 2). H264DSPContext.v_loop_filter_luma (non-intra bS<4 vertical luma deblock, called per macroblock-row edge from the slice deblock loop in libavcodec/h264_loopfilter.c) now dispatches through daedalus_recipe_dispatch_h264_deblock_luma_v instead of ff_h264_v_loop_filter_luma_neon.
## What
- Add 0005-h264-deblock-luma-v-daedalus-fourier.patch (in both arch/ and debian/ ffmpeg-v4l2-request-fourier/). Extends libavcodec/aarch64/h264_idct_daedalus.c with ff_h264_v_loop_filter_luma_daedalus (constructs a daedalus_h264_deblock_meta from FFmpeg's (alpha, beta, tc0[4]) and calls daedalus_recipe_dispatch_h264_deblock_luma_v with n_edges=1). Patches libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->v_loop_filter_luma to the new shim.
- arch/PKGBUILD + debian/build-deb.sh: append patch + bump pkgrel/PKGREL to 8.
- No new build-deps, no Depends change, no daedalus-fourier rev: the d87239d pin already exposes daedalus_recipe_dispatch_h264_deblock_luma_v.
## Why
Cycle 8 is marked "CPU primary; QPU opportunistic" in the daedalus-fourier API docstring. Per the hybrid substrate philosophy ("if there's a coprocessor, use it") we eventually want the QPU opportunism active here. But the libavcodec.so context is process-global and shared with cycles 6/7 via pthread_once, and it uses daedalus_ctx_create_no_qpu deliberately to avoid implicit Vulkan init in arbitrary host processes (Firefox content, mpv-fourier, ffmpeg-fourier CLI, ...). Switching to daedalus_ctx_create here without a feature flag would be a footgun. So cycle 8 lands as plumbing-only NEON-by-recipe substitution for now; opportunistic QPU enablement is a separate follow-up that adds a DAEDALUS_FOURIER_ENABLE_QPU env var or equivalent.
## Scope NOT covered
- Intra (bS=4) loop filter c->v_loop_filter_luma_intra: daedalus's daedalus_h264_deblock_meta only covers the non-intra path.
- Horizontal-edge variant c->h_loop_filter_luma: separate kernel (not yet in daedalus-fourier API).
- Chroma loop filters: separate kernels.
- Bulk batching: single-edge dispatch wastes the kernel's n_edges>1 amortization. Same caveat as cycles 6/7; follow-up.
- QPU opportunism: see "Why" above.
## SONAME
Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.
## Refs
- reauktion/daedalus-v4l2 issue #11: reauktion/daedalus-v4l2#11
- marfrit-packages PR #76 (cycle 6 IDCT 4×4)
- marfrit-packages PR #85 (cycle 7 IDCT 8×8)
- marfrit/daedalus-fourier cycle 8 close (deblock luma-v NEON green)	2026-05-22 12:17:14 +02:00
marfrit	493c762967	ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 8×8 → daedalus-fourier
Cycle 7 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11 step 2). H264DSPContext.idct8_add (called per 8×8 block from the High-profile intra-8×8-DCT decode path in libavcodec/h264_mb.c) now dispatches through daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.
## What
- Add 0004-h264-idct8-daedalus-fourier.patch (in both arch/ and debian/ ffmpeg-v4l2-request-fourier/). Extends libavcodec/aarch64/h264_idct_daedalus.c (introduced by 0003) with ff_h264_idct8_add_daedalus and a daedalus_recipe_dispatch_h264_idct8 call; patches libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct8_add to the new shim.
- arch/PKGBUILD + debian/build-deb.sh: append the new patch to the apply list; bump pkgrel/PKGREL to 7.
- No new build-deps, no Depends change, no daedalus-fourier rev: the d87239d pin already exposes daedalus_recipe_dispatch_h264_idct8.
## Why
The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8×8) the recipe is CPU NEON, so this is effectively a NEON-to-NEON substitution layered on top of cycle 6. Production validation of cycle 6 on higgs Firefox YouTube: 3040 frames decoded cleanly, avg_decode_us=3388 (no regression vs the pre-substitution ~4 ms baseline). Cycle 7 inherits the same shim's pthread_once context.
Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7 green; FFmpeg 8×8 block storage block[r + 8*c] matches daedalus column-major convention).
## Scope NOT covered (deferred)
- Bulk c->idct8_add4 (inter 8×8-DCT macroblocks) stays on the in-tree NEON .S code; batched substitution with n_blocks>1 lands later alongside the cycle-6 bulk-paths work.
- High-bit-depth (10-bit) path untouched.
- Cycles 8/9: separate PRs.
## SONAME
Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.
## Refs
- reauktion/daedalus-v4l2 issue #11 (substitution arc): reauktion/daedalus-v4l2#11
- marfrit-packages PR #76 (cycle 6 IDCT 4×4)
- marfrit-packages PR #78 (libxml2 ABI-skew workaround)
- marfrit/daedalus-fourier cycle 7 close (H.264 IDCT 8×8 NEON green)	2026-05-22 10:20:27 +02:00
marfrit	e641d679d3	ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 4×4 → daedalus-fourier
First cycle of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11 step 2). H264DSPContext.idct_add (called per 4×4 block from the intra-4×4 decode path in libavcodec/h264_mb.c) now dispatches through daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
## What
- Add 0003-h264-idct4-daedalus-fourier.patch (in both arch/ and debian/ ffmpeg-v4l2-request-fourier/). Creates libavcodec/aarch64/h264_idct_daedalus.c (ff_h264_idct_add_daedalus shim + lazy pthread_once context init via daedalus_ctx_create_no_qpu), patches libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct_add to the shim, adds the new .o to libavcodec/aarch64/Makefile.
- arch/PKGBUILD + debian/build-deb.sh: fetch + build daedalus-fourier (pinned at d87239d, lockstep with the daedalus-v4l2 daemon's inline build) with -DCMAKE_POSITION_INDEPENDENT_CODE=ON into a per-build temp prefix, then pass --extra-cflags=-I.../include --extra-ldflags=-L.../lib --extra-libs="-ldaedalus_core -lvulkan -lpthread" to FFmpeg configure. daedalus_core.a is static-linked into libavcodec.so.62.
- debian/control Depends gains libvulkan1 (daedalus_core PUBLIC-links Vulkan::Vulkan for the queryable QPU substrate; the no-QPU constructor still works at runtime but the loader needs libvulkan.so.1 present to dlopen libavcodec.so.62).
- arch/PKGBUILD depends gains vulkan-icd-loader, makedepends gains cmake / ninja / vulkan-headers.
## Why
The recipe layer picks the substrate; for cycle 6 (H.264 IDCT 4×4) the recipe is CPU NEON, so this is effectively a NEON-to-NEON substitution with one extra dispatch call and recipe-table lookup. The point of this first cycle isn't perf wins; it's plumbing. Once the path is wired and stable, follow-up patches batch through the bulk paths (idct_add16 / idct_add16intra / idct_add8) and stack cycles 7/8/9 (IDCT 8×8, luma-v deblock, qpel mc20).
Bit-exact against ff_h264_idct_add_neon (daedalus-fourier cycle 6 green; FFmpeg's 4×4 block storage matches daedalus's column-major convention).
## Scope NOT covered
- Bulk paths (idct_add16 / idct_add16intra / idct_add8): most IDCT 4×4 calls in real H.264 streams go through these, not the per-block c->idct_add path; intra-4×4-only macroblocks are a minority. Batched substitution lands in a follow-up.
- High-bit-depth (10-bit) path: not touched; 8-bit only.
- Cycles 7/8/9: separate PRs.
## SONAME
Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60. No daedalus-v4l2-dkms or daedalus-v4l2 bump required.
## Refs
- reauktion/daedalus-v4l2 issue #11 (substitution arc): reauktion/daedalus-v4l2#11
- marfrit/daedalus-fourier cycle 6 close (H.264 IDCT 4×4 NEON green)	2026-05-21 21:44:35 +02:00
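The lazy pthread_once context init used by the shim can be illustrated with a small standalone sketch. Everything named `sketch_*` and the `g_*` globals are hypothetical; the real shim calls daedalus_ctx_create_no_qpu, whose signature is not shown in this log, so a trivial struct stands in for the context here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical context type standing in for the daedalus context. */
typedef struct sketch_ctx {
    int use_qpu;
} sketch_ctx;

static sketch_ctx g_ctx_storage;
static sketch_ctx *g_ctx;
static int g_init_calls;  /* counts how often the initializer actually ran */
static pthread_once_t g_ctx_once = PTHREAD_ONCE_INIT;

/* Runs at most once per process, no matter how many threads race here. */
static void sketch_ctx_init(void)
{
    g_ctx_storage.use_qpu = 0;  /* mirrors the "no QPU" constructor choice */
    g_ctx = &g_ctx_storage;
    g_init_calls++;
}

/* Called from the hot path; the first caller pays the init cost. */
static sketch_ctx *sketch_get_ctx(void)
{
    int rc = pthread_once(&g_ctx_once, sketch_ctx_init);
    assert(rc == 0);
    (void)rc;
    return g_ctx;
}
```

Because the once-control and the context pointer are file-local statics, each object file that carries such a shim gets its own context, which matches the later note that the H264QPEL shim keeps a separate ctx from the H264DSP shim. On older toolchains this needs `-lpthread` at link time.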
Markus Fritsche	9e9447502e	ffmpeg-v4l2-request-fourier: patch NV15→P010 unpack for Hi10P / Main10
The n8.1 pin's hwcontext_v4l2request.c deliberately blanks the transfer-formats list for AV_PIX_FMT_YUV420P10 sw_format (the mapping target for V4L2_PIX_FMT_NV15), so `ffmpeg -hwaccel v4l2request -vf hwdownload,format=p010le` on a Hi10P / Main10 input failed at filter-init with -22 EINVAL, even though kernel-side decode succeeded.
0002-nv15-to-p010-unpack.patch adds an inline NV15→P010 unpack (5 bytes per 4 samples, little-endian → high-10-of-16) inside v4l2request_transfer_data_from, exposes AV_PIX_FMT_P010 in transfer_get_formats for that sw_format, and rejects non-P010 destinations explicitly with ENOSYS instead of silently corrupting output via av_frame_copy on NV15-packed bytes.
Verified on fresnel (RK3399, linux-fresnel-fourier 7.0-14):
- 5-frame smoke test from issue #21 → exit 0, 13.8MB output
- 20-frame mid-fixture decode → bit-exact HW==SW sha256 7d9b66d48d8f17b2281da1881c663ecc31722bb218aba1ae23bf28d07aa66b08
- 8-bit baseline (bbb_60s_720p.h264.mp4) still bit-exact HW==SW (no regression in the existing NV12 path)
- Cross-device repro of original EINVAL on unpatched ampere (RK3588) pkgrel=4, confirming the bug is upstream-FFmpeg-side, not RK3399-specific
Patch is upstream-able to Kwiboo's v4l2-request-n8.1 branch. Closes #21.	2026-05-18 08:35:19 +00:00
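The "5 bytes per 4 samples, little-endian → high-10-of-16" unpack can be sketched as below. This is not the code from 0002; the function name is invented, and the bit layout assumes samples are packed LSB-first across the 5-byte group, which is an assumption about NV15 stated here rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack n_groups groups of 5 packed bytes into 4 samples each.
 * Assumed layout: four 10-bit samples stored LSB-first in 40 bits.
 * Each output sample is shifted left by 6 so the 10 significant bits
 * occupy the high bits of the 16-bit word, as P010 expects. */
static void nv15_unpack_to_p010(const uint8_t *src, uint16_t *dst,
                                size_t n_groups)
{
    assert(src != NULL && dst != NULL);

    for (size_t g = 0; g < n_groups; g++) {
        const uint8_t *b = src + 5 * g;
        uint16_t s0 = (uint16_t)(b[0]        | ((b[1] & 0x03) << 8));
        uint16_t s1 = (uint16_t)((b[1] >> 2) | ((b[2] & 0x0F) << 6));
        uint16_t s2 = (uint16_t)((b[2] >> 4) | ((b[3] & 0x3F) << 4));
        uint16_t s3 = (uint16_t)((b[3] >> 6) | (b[4] << 2));

        dst[4 * g + 0] = (uint16_t)(s0 << 6);
        dst[4 * g + 1] = (uint16_t)(s1 << 6);
        dst[4 * g + 2] = (uint16_t)(s2 << 6);
        dst[4 * g + 3] = (uint16_t)(s3 << 6);
    }
}
```

A real plane copy would also have to honor the source and destination line strides and handle widths that are not a multiple of 4; the sketch covers only the per-group bit shuffling.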
claude-noether	d6c4260eb8	ffmpeg-v4l2-request-git → ffmpeg-v4l2-request-fourier: rename directory PKGBUILD already renamed itself (pkgname=ffmpeg-v4l2-request-fourier, replaces=(ffmpeg ffmpeg-v4l2-request-git)) but the containing directory was never moved. This commit completes the rename to align the path with the package identity and the rest of the -fourier umbrella (libva, mpv, kwin, qt6-base, chromium, linux-). CI workflow path-trigger is wildcard (arch/*), unaffected. Step names + cp source path updated to the new directory.	2026-05-17 01:04:37 +02:00

12 Commits