15 Commits

Author SHA1 Message Date
marfrit 91022b390e mesa-panvk-bifrost{,-video}: fix url= to real Gitea repo
Both PKGBUILDs referenced url=https://github.com/marfrit/panvk-bifrost,
which was a hallucinated URL — no such repo existed. The campaign's
real source-of-truth home was just created at
https://git.reauktion.de/marfrit/panvk-bifrost (mfritsche, 2026-05-23).

Point both PKGBUILDs at the real URL so `pacman -Si` and any consumer
reading package metadata follows a working link.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:17:41 +02:00
marfrit b736dd0529 Merge pull request 'ffmpeg-v4l2-request-fourier: substitute H.264 qpel mc20 → daedalus-fourier' (#90) from claude-noether/marfrit-packages:noether/ffmpeg-fourier-qpel-mc20-daedalus into main
Reviewed-on: marfrit/marfrit-packages#90
2026-05-23 01:34:04 +00:00
claude-noether 0bfc4ab03e ffmpeg-v4l2-request-fourier: substitute H.264 qpel mc20 → daedalus-fourier
H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma horizontal
half-pel, 6-tap "put" — the canonical representative of the H.264
luma motion-compensation family) now dispatches through
daedalus_recipe_dispatch_h264_qpel_mc20 instead of
ff_put_h264_qpel8_mc20_neon.

Cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes the
4-cycle libavcodec.so substitution sequence:

  cycle 6 (PR #76)  H.264 IDCT 4x4         done
  cycle 7 (PR #85)  H.264 IDCT 8x8         done
  cycle 8 (PR #86)  H.264 luma-v deblock   done
  cycle 9 (this)    H.264 qpel mc20

Bumps daedalus-fourier pin d87239d → 209a421 (PR #2 — public API
gains daedalus_recipe_dispatch_h264_qpel_mc20 +
DAEDALUS_KERNEL_H264_QPEL_MC20).

Verdict per docs/k9_h264qpel_mc20.md: CPU NEON.  A per-block cost of
7.6 ns (131 Mblock/s) gives a 135× margin over 30 fps 1080p, and the
QPU dispatch floor of ~250 ns makes any V3D shader strictly worse.  Substitution is
plumbing-only — same daedalus_ctx_create_no_qpu pthread_once shape
the cycles 6/7/8 shims already own (kept SEPARATE from the H264DSP
shim's ctx because H264QPEL is its own libavcodec Makefile module
and link order does not guarantee a single .o owns the ctx symbol;
one extra ~µs init per process, paid lazily on first MC call).
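
The margin arithmetic in the verdict can be reproduced with a short sketch. It assumes a worst case the commit message does not spell out: one mc20 call per 8x8 luma block of every frame. The function names are illustrative and are not part of daedalus-fourier.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 8x8 luma blocks per second for a given frame size and rate.
 * Assumption: width and height are multiples of 8 (1920x1080 is). */
uint64_t blocks_per_second(uint32_t width, uint32_t height, uint32_t fps)
{
    assert(width % 8 == 0 && height % 8 == 0);
    return (uint64_t)(width / 8) * (height / 8) * fps;
}

/* Throughput margin: measured kernel rate divided by the required rate. */
double throughput_margin(double measured_blocks_per_s, uint64_t required)
{
    return measured_blocks_per_s / (double)required;
}
```

With the figures quoted above, `blocks_per_second(1920, 1080, 30)` is 972000 and `throughput_margin(131e6, 972000)` is about 134.8, which rounds to the stated 135×.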

Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the 16x16
size tier stay on the in-tree NEON .S code per the cycle-9 phase-1
rationale (mc20 8x8 is representative; remaining variants would
multiply recipe-lookup overhead without changing the substrate
verdict).

Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier
cycle 9 green; 10000/10000 random blocks bit-exact, M3 = 131 Mblock/s).

No SONAME change, no Depends change.  PKGREL 9 → 10.

Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 9.
2026-05-23 03:32:29 +02:00
marfrit 8729c2db92 Merge pull request 'daedalus-v4l2 + daedalus-v4l2-dkms: bump to 872eec5 — PROTO_MAX_PAYLOAD 1 MiB (#20)' (#89) from claude-noether/marfrit-packages:noether/daedalus-bump-872eec5-1mib-payload into main
Reviewed-on: marfrit/marfrit-packages#89
2026-05-22 18:52:53 +00:00
marfrit d449ec1073 daedalus-v4l2 + daedalus-v4l2-dkms: bump to 872eec5 — PROTO_MAX_PAYLOAD 1 MiB (#20)
Picks up reauktion/daedalus-v4l2 PR #20 (closes #19): wire-protocol
cap DAEDALUS_PROTO_MAX_PAYLOAD raised from 64 KiB to 1 MiB.
DAEDALUS_MAX_BITSTREAM follows; daedalus_fill_output_fmt now reports
OUTPUT_MPLANE sizeimage = ~1 MiB.

Fixes the Firefox YouTube avc1 SW-fallback observed on higgs when
any H.264 slice exceeded 64 KiB (routine on 720p+ streams).
libva-v4l2-request-fourier's S_FMT-driven OUTPUT-pool resize was
clamping back to 65484 and Firefox lost the slice; now the kernel
honours the larger sizeimage.

Both packages bumped to 0.1.0+r45+g872eec5-1:

  - daedalus-v4l2 (daemon): r43 -> r45.  Daemon-side allocations
    are dynamic, so the only growth is one ~1 MiB read buffer per
    daemon process at startup.
  - daedalus-v4l2-dkms (kernel module): r33 -> r45.  Skips the
    daemon-only bumps r37/r39/r41/r43 (no kernel/include change in
    that range) and lands the PROTO_MAX_PAYLOAD bump.

LOCK-STEP INSTALL REQUIRED: effective cap is min(kernel, daemon).
A stale kernel with a new daemon (or vice versa) still rejects
>64 KiB payloads.  apt/pacman should pick both up in one
transaction since they share the same upstream pin.
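
The lock-step rule can be sketched as a min() over the two sides' compile-time caps. This is only an illustration of the stated rule; the helper names are hypothetical, and the real constant lives in include/daedalus_v4l2_proto.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Old and new cap values as quoted in the commit message. */
#define CAP_64_KIB ((size_t)64 * 1024)
#define CAP_1_MIB  ((size_t)1024 * 1024)

static_assert(CAP_64_KIB < CAP_1_MIB, "new cap must exceed the old one");

/* The usable cap is whatever the more restrictive side allows. */
size_t effective_payload_cap(size_t kernel_cap, size_t daemon_cap)
{
    return kernel_cap < daemon_cap ? kernel_cap : daemon_cap;
}

/* A payload gets through only if it fits the effective cap. */
bool payload_accepted(size_t payload_len, size_t kernel_cap, size_t daemon_cap)
{
    return payload_len <= effective_payload_cap(kernel_cap, daemon_cap);
}
```

A 100000-byte slice is rejected when either side still carries the 64 KiB cap and accepted only once both sides carry 1 MiB.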

Wire-protocol value-only change in include/daedalus_v4l2_proto.h;
struct layout unchanged.  DAEDALUS_PROTO_VERSION stays at 0.
2026-05-22 20:50:04 +02:00
marfrit 9d30c34be9 Merge pull request 'daedalus-v4l2: 6e6dfa1 -> 1d8f5af — pause-time tiny-bitstream filter (#18)' (#88) from claude-noether/marfrit-packages:noether/daedalus-bump-1d8f5af-pause-filter into main
Reviewed-on: marfrit/marfrit-packages#88
2026-05-22 16:20:14 +00:00
marfrit 1ca18ac130 daedalus-v4l2: 6e6dfa1 -> 1d8f5af — pause-time tiny-bitstream filter (#18)
Picks up reauktion/daedalus-v4l2 PR #18 (closes #17): daemon drops
degenerate (<4 byte) bitstreams at REQ_DECODE entry instead of
letting avcodec_send_packet emit AVERROR_INVALIDDATA, replies
RESP_FRAME NO_FRAME so libva's V4L2 surface pool stays alive.
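
A minimal sketch of the entry-point filter described above, assuming the threshold is a plain length check. The enum and function names are hypothetical stand-ins, not the daemon's actual identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of the REQ_DECODE entry check. */
enum decode_action {
    ACTION_DECODE,          /* hand the bitstream to the decoder */
    ACTION_REPLY_NO_FRAME   /* drop as a no-op and reply "no frame" */
};

/* Bitstreams shorter than this are treated as degenerate (<4 bytes). */
#define MIN_BITSTREAM_BYTES 4

enum decode_action classify_bitstream(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    return len < MIN_BITSTREAM_BYTES ? ACTION_REPLY_NO_FRAME : ACTION_DECODE;
}
```

The 3-byte stub flushed at a pause boundary maps to `ACTION_REPLY_NO_FRAME`, so the decoder never sees it and no decode error is reported back.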

Fixes the Firefox YouTube avc1 pause→resume regression observed on
higgs: libva-v4l2-request-fourier flushes a 3-byte stub into
OUTPUT_MPLANE at the pause boundary; the old daemon path turned
that into a decode failure, Firefox marked H.264-via-VAAPI as
broken for the session, and routed every subsequent frame to
libmozavcodec SW.  After this bump the daemon logs 'tiny bitstream
3 bytes — dropping as no-op' and the next real REQ_DECODE
proceeds normally.

Wire protocol unchanged.  daedalus-v4l2-dkms bump not needed.
2026-05-22 18:16:33 +02:00
marfrit cf9eef6cfa Merge pull request 'ffmpeg-v4l2-request-fourier: restore AV_CODEC_FLAG_LOW_DELAY in H.264 decoder' (#87) from claude-noether/marfrit-packages:noether/ffmpeg-fourier-restore-low-delay into main
Reviewed-on: marfrit/marfrit-packages#87
2026-05-22 14:27:43 +00:00
marfrit 5c69460722 ffmpeg-v4l2-request-fourier: restore AV_CODEC_FLAG_LOW_DELAY in H.264 decoder
FFmpeg 8.x dropped the H.264 decoder's low_delay code path —
AV_CODEC_FLAG_LOW_DELAY no longer prevents h264_select_output_frame
from running the display-order DPB output queue.  The daedalus-v4l2
daemon's `ctx->flags |= AV_CODEC_FLAG_LOW_DELAY` at
daemon/src/decoder.c:202 has been a silent no-op since the SONAME
61→62 jump landed in reauktion/daedalus-v4l2 PR #16; on Firefox
YouTube this re-introduced the 2-1-4-3 B-frame pair-swap that PR
#12's daemon flag was supposed to prevent.

Fix lives in libavcodec, not the daemon: restore the documented
LOW_DELAY semantics so the daemon (and any other V4L2-stateless-
style consumer) keeps the one-frame-per-send_packet decode-order
output contract it already declares.

## Patch

0006-h264-restore-low-delay.patch touches libavcodec/h264_slice.c:

- h264_select_output_frame: early-exit when LOW_DELAY is set.
  Emit the just-decoded picture as next_output_pic, mirror the
  corruption / recovery-point tracking the main path performs,
  skip delayed_pic[] / POC reorder machinery entirely.

- h264_field_start: suppress the SPS-driven
  `has_b_frames = sps->num_reorder_frames` clobber when LOW_DELAY
  is set.  Without this the per-slice bitstream_restriction_flag
  re-pickup would reintroduce a nonzero reorder buffer mid-stream
  even after the daemon set has_b_frames=0 at avcodec_open2.
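
The difference between the two output contracts can be modeled in isolation. The sketch below is a simplified reorder queue, not FFmpeg's h264_select_output_frame: it holds at most `depth` pictures and always releases the smallest POC first. A depth of 0 stands in for the LOW_DELAY early-exit; the POC values in the usage note are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QUEUE_MAX 16

/* Remove the smallest POC from queue[0..*qn) and append it to out[on]. */
static size_t emit_smallest(int *queue, size_t *qn, int *out, size_t on)
{
    size_t min_i = 0;
    for (size_t i = 1; i < *qn; i++)
        if (queue[i] < queue[min_i])
            min_i = i;
    out[on] = queue[min_i];
    queue[min_i] = queue[--*qn];   /* order inside the queue is irrelevant */
    return on + 1;
}

/* Feed pocs[] in decode order through a queue of at most `depth`
 * pictures; writes the resulting output order to out[] (capacity n)
 * and returns the number of pictures written. */
size_t model_output_order(const int *pocs, size_t n, size_t depth, int *out)
{
    int queue[MODEL_QUEUE_MAX];
    size_t qn = 0, on = 0;

    assert(depth < MODEL_QUEUE_MAX);
    for (size_t i = 0; i < n; i++) {
        queue[qn++] = pocs[i];
        while (qn > depth)
            on = emit_smallest(queue, &qn, out, on);
    }
    while (qn > 0)                 /* flush at end of stream */
        on = emit_smallest(queue, &qn, out, on);
    return on;
}
```

For a decode order of POCs 0, 2, 1, 4, 3, depth 0 returns the pictures unchanged in decode order, one per input, while depth 1 returns them sorted into display order 0, 1, 2, 3, 4.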

## Why not daemon-side

A daemon-side SPS rewrite (`num_reorder_frames=0`) was considered but
rejected: it covers only the SPS NAL the daemon reconstructs itself,
not in-band SPS NALs parsed in other code paths, where the daemon
dlopens libavformat.  Restoring the documented FFmpeg flag semantics
is the smaller, more durable change and keeps the daemon
interface stable.

## Packaging

- PKGREL/pkgrel bump to 9.
- No new build-deps, no Depends change.
- Substitution arc cycles 6/7/8 unchanged.

## Refs

- reauktion/daedalus-v4l2#11 / #12 (LOW_DELAY half-measure on
  daemon side, originally landed against FFmpeg 7.x).
- daemon/src/decoder.c:202 (`ctx->flags |= AV_CODEC_FLAG_LOW_DELAY`
  for H.264 only — unchanged, but now actually has effect again).
2026-05-22 14:20:37 +02:00
marfrit d11a52405d Merge pull request 'ffmpeg-v4l2-request-fourier: substitute H.264 luma-v deblock → daedalus-fourier' (#86) from claude-noether/marfrit-packages:noether/ffmpeg-fourier-deblock-luma-v-daedalus into main
Reviewed-on: marfrit/marfrit-packages#86
2026-05-22 10:29:09 +00:00
marfrit 29e0852d11 ffmpeg-v4l2-request-fourier: substitute H.264 luma-v deblock → daedalus-fourier
Cycle 8 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11
step 2).  H264DSPContext.v_loop_filter_luma — non-intra bS<4 vertical
luma deblock, called per macroblock-row edge from the slice deblock
loop in libavcodec/h264_loopfilter.c — now dispatches through
daedalus_recipe_dispatch_h264_deblock_luma_v instead of
ff_h264_v_loop_filter_luma_neon.

## What

- Add 0005-h264-deblock-luma-v-daedalus-fourier.patch (in both arch/
  and debian/ ffmpeg-v4l2-request-fourier/).  Extends
  libavcodec/aarch64/h264_idct_daedalus.c with
  ff_h264_v_loop_filter_luma_daedalus (constructs a
  daedalus_h264_deblock_meta from FFmpeg's (alpha, beta, tc0[4]) and
  calls daedalus_recipe_dispatch_h264_deblock_luma_v with n_edges=1).
  Patches libavcodec/aarch64/h264dsp_init_aarch64.c to wire
  c->v_loop_filter_luma to the new shim.
- arch/PKGBUILD + debian/build-deb.sh: append patch + bump pkgrel/PKGREL
  to 8.
- No new build-deps, no Depends change, no daedalus-fourier rev — the
  d87239d pin already exposes daedalus_recipe_dispatch_h264_deblock_luma_v.

## Why

Cycle 8 is marked "CPU primary; QPU opportunistic" in the daedalus-
fourier API docstring.  Per the hybrid substrate philosophy
("if there's a coprocessor, use it") we eventually want the QPU
opportunism active here.  But the libavcodec.so context is
process-global and shared with cycles 6/7 via pthread_once, and it
uses daedalus_ctx_create_no_qpu deliberately to avoid implicit
Vulkan init in arbitrary host processes (Firefox content, mpv-fourier,
ffmpeg-fourier CLI, ...).  Switching to daedalus_ctx_create here
without a feature flag would be a footgun.

So cycle 8 lands as plumbing-only NEON-by-recipe substitution for
now; opportunistic QPU enablement is a separate follow-up that adds
a DAEDALUS_FOURIER_ENABLE_QPU env var or equivalent.
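
One possible shape for that follow-up is sketched below, under stated assumptions: the accepted value "1" and every `shim_*` name are hypothetical, and the enum merely stands in for the choice between daedalus_ctx_create_no_qpu and daedalus_ctx_create. Only pthread_once, getenv, and strcmp are real APIs here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for which constructor the shim would call. */
typedef enum { SHIM_CTX_NO_QPU, SHIM_CTX_WITH_QPU } shim_ctx_kind;

static pthread_once_t g_once = PTHREAD_ONCE_INIT;
static shim_ctx_kind g_kind;
static int g_init_calls;

/* Opt in only on the exact value "1"; unset or anything else keeps the
 * QPU off, so arbitrary host processes never trigger Vulkan init. */
int shim_qpu_opt_in(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

static void shim_init_once(void)
{
    g_init_calls++;
    g_kind = shim_qpu_opt_in(getenv("DAEDALUS_FOURIER_ENABLE_QPU"))
                 ? SHIM_CTX_WITH_QPU : SHIM_CTX_NO_QPU;
}

/* Lazy, process-global init: the first caller pays, later calls are cheap. */
shim_ctx_kind shim_get_ctx_kind(void)
{
    int rc = pthread_once(&g_once, shim_init_once);
    assert(rc == 0);
    (void)rc;
    return g_kind;
}

int shim_init_call_count(void)
{
    return g_init_calls;
}
```

Reading the variable inside the once-routine means the decision is fixed for the lifetime of the process, which matches the process-global context the existing shims share.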

## Scope NOT covered

- Intra (bS=4) loop filter c->v_loop_filter_luma_intra — daedalus's
  daedalus_h264_deblock_meta only covers the non-intra path.
- Horizontal-edge variant c->h_loop_filter_luma — separate kernel
  (not yet in daedalus-fourier API).
- Chroma loop filters — separate kernels.
- Bulk batching — single-edge dispatch wastes the kernel's n_edges>1
  amortization.  Same caveat as cycles 6/7; follow-up.
- QPU opportunism — see "Why" above.

## SONAME

Unchanged.  libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.

## Refs

- reauktion/daedalus-v4l2 issue #11: reauktion/daedalus-v4l2#11
- marfrit-packages PR #76 (cycle 6 IDCT 4×4)
- marfrit-packages PR #85 (cycle 7 IDCT 8×8)
- marfrit/daedalus-fourier cycle 8 close (deblock luma-v NEON green)
2026-05-22 12:17:14 +02:00
marfrit 510a31622c Merge pull request 'ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 8×8 → daedalus-fourier' (#85) from claude-noether/marfrit-packages:noether/ffmpeg-fourier-idct8-daedalus into main
Reviewed-on: marfrit/marfrit-packages#85
2026-05-22 08:32:15 +00:00
marfrit db9ae16da9 Merge pull request 'mesa-panvk-bifrost-video: regenerate 0005 patch from POST-review snapshot' (#84) from claude-noether/marfrit-packages:noether/mesa-panvk-bifrost-video-retrigger into main
Reviewed-on: marfrit/marfrit-packages#84
2026-05-22 08:20:34 +00:00
marfrit 493c762967 ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 8×8 → daedalus-fourier
Cycle 7 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11
step 2).  H264DSPContext.idct8_add — called per 8×8 block from the
High-profile intra-8×8-DCT decode path in libavcodec/h264_mb.c — now
dispatches through daedalus_recipe_dispatch_h264_idct8 instead of
ff_h264_idct8_add_neon.

## What

- Add 0004-h264-idct8-daedalus-fourier.patch (in both arch/ and debian/
  ffmpeg-v4l2-request-fourier/).  Extends libavcodec/aarch64/
  h264_idct_daedalus.c (introduced by 0003) with ff_h264_idct8_add_daedalus
  and a daedalus_recipe_dispatch_h264_idct8 call; patches
  libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct8_add to
  the new shim.
- arch/PKGBUILD + debian/build-deb.sh: append the new patch to the
  apply list; bump pkgrel/PKGREL to 7.
- No new build-deps, no Depends change, no daedalus-fourier rev — the
  d87239d pin already exposes daedalus_recipe_dispatch_h264_idct8.

## Why

The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8×8)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution layered on top of cycle 6.  Production validation of
cycle 6 on higgs Firefox YouTube: 3040 frames decoded cleanly,
avg_decode_us=3388 (no regression vs the pre-substitution ~4 ms
baseline).  Cycle 7 inherits the same shim's pthread_once context.

Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
green; FFmpeg 8×8 block storage block[r + 8*c] matches daedalus
column-major convention).
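
The storage convention quoted above, block[r + 8*c], can be written as a small index helper. This is a sketch of the stated layout only; `coeff_index` is an illustrative name, not a function from FFmpeg or daedalus-fourier.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major index of the coefficient at (row, col) in an n x n block:
 * walking down a column moves by 1, stepping to the next column by n. */
size_t coeff_index(size_t row, size_t col, size_t n)
{
    assert(row < n && col < n);
    return row + n * col;
}
```

For an 8×8 block, (row 1, col 0) lands at index 1 and (row 0, col 1) at index 8; the same helper covers the 4×4 case with n = 4.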

## Scope NOT covered (deferred)

- Bulk c->idct8_add4 (inter 8×8-DCT macroblocks) stays on the
  in-tree NEON .S code; batched substitution with n_blocks>1 lands
  later alongside the cycle-6 bulk-paths work.
- High-bit-depth (10-bit) path untouched.
- Cycles 8/9 — separate PRs.

## SONAME

Unchanged.  libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.

## Refs

- reauktion/daedalus-v4l2 issue #11 (substitution arc): reauktion/daedalus-v4l2#11
- marfrit-packages PR #76 (cycle 6 IDCT 4×4)
- marfrit-packages PR #78 (libxml2 ABI-skew workaround)
- marfrit/daedalus-fourier cycle 7 close (H.264 IDCT 8×8 NEON green)
2026-05-22 10:20:27 +02:00
marfrit 7ecbcb3c1b mesa-panvk-bifrost-video: regenerate 0005 patch from POST-review snapshot
The original 0005 patch was generated from the pre-Phase-5-review source
snapshot (phase5_review_input_2026-05-21.tgz), missing the four
load-bearing review fixes that landed in the post-review snapshot:
  - probe_hantro gate on KHR_video_* extension advertisement
  - per-session ts_counter (was process-global static)
  - panvk_v4l2_session_finish full unwind (munmap + STREAMOFF + REQBUFS=0)
  - MIN2(rb.count, 18) clamp on num_*_buffers

Run #162 (job 17032) failed in prepare() because the PKGBUILD sanity
check 'grep -q "KHR_video_queue = PAN_ARCH < 9 && panvk_v4l2_probe_hantro()"'
didn't match the actual patched output (which still had the pre-review
'KHR_video_queue = PAN_ARCH < 9,').

This patch (regenerated from phase5_post_review_2026-05-21.tgz) carries
all four review fixes. Validated locally: vanilla mesa-26.0.6 + r1..r4 +
this patch reproduces prepare()-OK byte-for-byte.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 10:18:11 +02:00
20 changed files with 1231 additions and 67 deletions
+3 -3
@@ -23,10 +23,10 @@ _module=daedalus_v4l2
# content-equivalent to f0d4186 plus PR #4 (cosmetic menu ctrls).
# PROTO_VERSION drops 1 → 0; lock-step install with
# daedalus-v4l2 0.1.0.r33.5d8b436 REQUIRED.
-_commit=5d8b4369e58ab947d1c56b1f718293c57c6065b5
-pkgver=0.1.0.r33.5d8b436
-pkgrel=1 # reset for new upstream pin (5d8b436 — revert parking design)
+_commit=872eec505eb91b561892d02a0526749348ddc121
+pkgver=0.1.0.r45.872eec5
+pkgrel=1 # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2 0.1.0.r45.872eec5 REQUIRED
pkgdesc="V4L2 stateless decoder shim kernel module (DKMS) — Pi 5 / CM5"
arch=('any')
url="https://git.reauktion.de/reauktion/daedalus-v4l2"
+3 -3
@@ -23,12 +23,12 @@ _upstreampkg=daedalus-v4l2
# (daedalus-v4l2#11). Daemon still needs daedalus-fourier at
# build time (Arch packaging for that is a follow-up; Debian side
# fetches inline via build-deb.sh).
-_commit=6e6dfa144da7bc7fa8be50c8da91d7d1c6132a2c
+_commit=872eec505eb91b561892d02a0526749348ddc121
# 0.1.0 (pre-1.0) + commit count + short sha. Bump the .Y on each
# Phase 8.x close. pkgver() recomputes at build time.
-pkgver=0.1.0.r41.6e6dfa1
-pkgrel=1 # reset for new upstream pin (6e6dfa1 — soname 62 via /opt/fourier)
+pkgver=0.1.0.r45.872eec5
+pkgrel=1 # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2-dkms 0.1.0.r45.872eec5 REQUIRED
pkgdesc="Userspace daemon for the daedalus-v4l2 V4L2 stateless decoder shim (VP9/AV1/H.264 on Pi 5 / CM5)"
arch=('aarch64')
url="https://git.reauktion.de/reauktion/daedalus-v4l2"
@@ -0,0 +1,107 @@
From 1b286ddb4efaca26ec9b9e290e989fec77dc1c77 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Fri, 22 May 2026 10:18:21 +0200
Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 8x8 IDCT through
daedalus-fourier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
H264DSPContext.idct8_add (called per 8x8 block from the High-profile
intra-8x8-DCT decode path in h264_mb.c) now dispatches through
daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.
The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8x8)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution layered on top of the cycle-6 IDCT 4x4 wiring. Same
pthread_once global context, same destructive-zero semantics; FFmpeg
column-major 8x8 storage block[r + 8*c] matches daedalus's convention.
Bulk path c->idct8_add4 (used for inter 8x8-DCT macroblocks) remains
on the in-tree NEON .S code and will be batched through
daedalus_recipe_dispatch_h264_idct8 with n_blocks>1 in a follow-up.
Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
green).
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 7.
---
libavcodec/aarch64/h264_idct_daedalus.c | 29 ++++++++++++++++-------
libavcodec/aarch64/h264dsp_init_aarch64.c | 3 ++-
2 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
index 538d223..cbb98af 100644
--- a/libavcodec/aarch64/h264_idct_daedalus.c
+++ b/libavcodec/aarch64/h264_idct_daedalus.c
@@ -1,14 +1,16 @@
/*
- * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
+ * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
*
- * Routes H264DSPContext.idct_add through
- * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
- * The recipe layer picks the substrate (CPU NEON by default for
- * cycle 6; future cycles may dispatch to V3D opportunistically).
+ * Routes H264DSPContext.idct_add → daedalus_recipe_dispatch_h264_idct4
+ * H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
+ * instead of the in-tree ff_h264_idct{,8}_add_neon assembly. The
+ * recipe layer picks the substrate (CPU NEON by default for cycles
+ * 6 + 7; future cycles may dispatch to V3D opportunistically).
*
- * FFmpeg's 4x4 block memory layout matches daedalus's column-major
- * convention: block[r + 4*c] = coefficient at (row r, col c). Both
- * sides destructively zero the block after the transform.
+ * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
+ * column-major convention: block[r + N*c] = coefficient at
+ * (row r, col c) for N ∈ {4, 8}. Both sides destructively zero the
+ * block after the transform.
*
* The library context is process-global and lazily initialised under
* pthread_once. We pick the no-QPU constructor here because
@@ -37,6 +39,7 @@ static void daedalus_ctx_init_once(void)
}
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
{
@@ -47,3 +50,13 @@ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
block, 1, &meta);
}
+
+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
+{
+ static const daedalus_h264_block_meta meta = { .dst_off = 0 };
+
+ pthread_once(&g_dctx_once, daedalus_ctx_init_once);
+
+ daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
+ block, 1, &meta);
+}
diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
index b993df2..741e551 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -79,6 +79,7 @@ void ff_h264_idct_add8_neon(uint8_t **dest, const int *block_offset,
const uint8_t nnzc[15 * 8]);
void ff_h264_idct8_add_neon(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct8_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct8_add4_neon(uint8_t *dst, const int *block_offset,
int16_t *block, int stride,
@@ -146,7 +147,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
c->idct_add16intra = ff_h264_idct_add16intra_neon;
if (chroma_format_idc <= 1)
c->idct_add8 = ff_h264_idct_add8_neon;
- c->idct8_add = ff_h264_idct8_add_neon;
+ c->idct8_add = ff_h264_idct8_add_daedalus;
c->idct8_dc_add = ff_h264_idct8_dc_add_neon;
c->idct8_add4 = ff_h264_idct8_add4_neon;
} else if (have_neon(cpu_flags) && bit_depth == 10) {
--
2.47.3
@@ -0,0 +1,121 @@
From 68731c41d7ea68be0e912b128cb4e71fb56e8263 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Fri, 22 May 2026 12:15:16 +0200
Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 luma-v deblock through
daedalus-fourier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
H264DSPContext.v_loop_filter_luma (non-intra bS<4 vertical luma
deblock, called per macroblock-row edge from the slice deblock
loop) now dispatches through
daedalus_recipe_dispatch_h264_deblock_luma_v instead of
ff_h264_v_loop_filter_luma_neon.
The recipe layer picks the substrate; for cycle 8 the daedalus
docstring marks the kernel "CPU primary; QPU opportunistic", but
the libavcodec.so context here is built with
daedalus_ctx_create_no_qpu — process-global pthread_once init,
shared with cycles 6/7. QPU opportunism stays gated off until a
follow-up adds an explicit feature flag (no implicit Vulkan init
in arbitrary host processes). In the meantime cycle 8 is a
plumbing-only substitution, NEON-to-NEON via the daedalus recipe.
Intra (bS=4) loop filter — c->v_loop_filter_luma_intra — stays on
the in-tree NEON .S code; daedalus's daedalus_h264_deblock_meta
only covers the non-intra path per its docstring.
FFmpeg `int alpha/beta/int8_t tc0[4]` → daedalus_h264_deblock_meta
(int32_t alpha/beta + inline int8_t tc0[4]). pix already points
to row 0 of the bottom block per FFmpeg's deblock convention,
satisfying daedalus's `dst_off >= 4 * dst_stride` constraint.
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8.
---
libavcodec/aarch64/h264_idct_daedalus.c | 36 +++++++++++++++++++----
libavcodec/aarch64/h264dsp_init_aarch64.c | 4 ++-
2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
index cbb98af..92365fa 100644
--- a/libavcodec/aarch64/h264_idct_daedalus.c
+++ b/libavcodec/aarch64/h264_idct_daedalus.c
@@ -1,11 +1,14 @@
/*
- * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
+ * H.264 4x4 / 8x8 IDCT + luma-v deblock — daedalus-fourier substitution shims.
*
- * Routes H264DSPContext.idct_add → daedalus_recipe_dispatch_h264_idct4
- * H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
- * instead of the in-tree ff_h264_idct{,8}_add_neon assembly. The
- * recipe layer picks the substrate (CPU NEON by default for cycles
- * 6 + 7; future cycles may dispatch to V3D opportunistically).
+ * Routes H264DSPContext.idct_add → daedalus_recipe_dispatch_h264_idct4
+ * H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
+ * H264DSPContext.v_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_v
+ * instead of the in-tree ff_h264_*_neon assembly. The recipe layer
+ * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
+ * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
+ * so cycle 8 stays on the CPU NEON path until a separate change
+ * gates QPU init on a daedalus-fourier feature flag).
*
* FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
* column-major convention: block[r + N*c] = coefficient at
@@ -40,6 +43,8 @@ static void daedalus_ctx_init_once(void)
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
+ int alpha, int beta, int8_t *tc0);
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
{
@@ -60,3 +65,22 @@ void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
block, 1, &meta);
}
+
+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
+ int alpha, int beta, int8_t *tc0)
+{
+ daedalus_h264_deblock_meta meta = {
+ .dst_off = 0,
+ .alpha = alpha,
+ .beta = beta,
+ };
+ meta.tc0[0] = tc0[0];
+ meta.tc0[1] = tc0[1];
+ meta.tc0[2] = tc0[2];
+ meta.tc0[3] = tc0[3];
+
+ pthread_once(&g_dctx_once, daedalus_ctx_init_once);
+
+ daedalus_recipe_dispatch_h264_deblock_luma_v(g_dctx, pix, (size_t)stride,
+ 1, &meta);
+}
diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
index 741e551..85ac381 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -27,6 +27,8 @@
void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
int beta, int8_t *tc0);
+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
+ int alpha, int beta, int8_t *tc0);
void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
int beta, int8_t *tc0);
void ff_h264_v_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
@@ -114,7 +116,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
int cpu_flags = av_get_cpu_flags();
if (have_neon(cpu_flags) && bit_depth == 8) {
- c->v_loop_filter_luma = ff_h264_v_loop_filter_luma_neon;
+ c->v_loop_filter_luma = ff_h264_v_loop_filter_luma_daedalus;
c->h_loop_filter_luma = ff_h264_h_loop_filter_luma_neon;
c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
--
2.47.3
@@ -0,0 +1,82 @@
From 0d1292ea99bc4e5fa2da438259fa01a2374e3e04 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Fri, 22 May 2026 14:18:25 +0200
Subject: [PATCH] avcodec/h264: restore AV_CODEC_FLAG_LOW_DELAY semantics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
FFmpeg 8.x dropped the H.264 decoder's low_delay path —
AV_CODEC_FLAG_LOW_DELAY no longer prevents
h264_select_output_frame from running the display-order DPB
output queue. V4L2-stateless-style consumers (daedalus-v4l2
daemon, libva-v4l2-request-fourier) that set the flag end up
seeing the 2-1-4-3 pair-swap pattern on B-frame streams again.
Restore the documented semantics:
- Early-exit at the top of h264_select_output_frame when the
flag is set: emit the just-decoded picture immediately as
next_output_pic, mirror the corruption / recovery-point
tracking the main path performs, and skip the entire
delayed_pic[] / POC reorder machinery.
- Suppress the SPS-driven has_b_frames clobber in
h264_field_start when the flag is set, so the per-slice
bitstream_restriction_flag re-pickup cannot reintroduce a
nonzero reorder buffer mid-stream.
This is a fork-only change required by the daedalus-v4l2 daemon's
one-frame-per-send_packet contract; upstream FFmpeg consumers that
expect display-order output remain untouched (flag default = off).
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 deblock
+ flag-restoration follow-up.
---
libavcodec/h264_slice.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
index 97fab70..a7bfbd6 100644
--- a/libavcodec/h264_slice.c
+++ b/libavcodec/h264_slice.c
@@ -1308,6 +1308,28 @@ static int h264_select_output_frame(H264Context *h)
cur->mmco_reset = h->mmco_reset;
h->mmco_reset = 0;
+ /* AV_CODEC_FLAG_LOW_DELAY restore (FFmpeg 8.x dropped the H.264
+ * decoder's low_delay path). Bypass the display-order DPB
+ * output queue: emit the just-decoded picture immediately, in
+ * decode order, one per send_packet. V4L2-stateless-style
+ * consumers (daedalus-v4l2 daemon, libva-v4l2-request-fourier)
+ * do their own POC-based reorder downstream and require this
+ * behaviour. */
+ if (h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) {
+ h->next_output_pic = cur;
+ h->next_outputed_poc = cur->poc;
+ h->frame_recovered |= cur->recovered;
+ cur->recovered |= h->frame_recovered & FRAME_RECOVERED_SEI;
+ if (!cur->recovered) {
+ if (!(h->avctx->flags & AV_CODEC_FLAG_OUTPUT_CORRUPT) &&
+ !(h->avctx->flags2 & AV_CODEC_FLAG2_SHOW_ALL))
+ h->next_output_pic = NULL;
+ else
+ cur->f->flags |= AV_FRAME_FLAG_CORRUPT;
+ }
+ return 0;
+ }
+
if (sps->bitstream_restriction_flag ||
h->avctx->strict_std_compliance >= FF_COMPLIANCE_STRICT) {
h->avctx->has_b_frames = FFMAX(h->avctx->has_b_frames, sps->num_reorder_frames);
@@ -1415,6 +1437,7 @@ static int h264_field_start(H264Context *h, const H264SliceContext *sl,
sps = h->ps.sps;
if (sps->bitstream_restriction_flag &&
+ !(h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) &&
h->avctx->has_b_frames < sps->num_reorder_frames) {
h->avctx->has_b_frames = sps->num_reorder_frames;
}
--
2.47.3
@@ -0,0 +1,139 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Sat, 23 May 2026 12:00:00 +0200
Subject: [PATCH] avcodec/aarch64/h264qpel: route 8x8 mc20 through
daedalus-fourier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma horizontal
half-pel, 6-tap "put" variant — the canonical representative of the
H.264 luma motion-compensation family) now dispatches through
daedalus_recipe_dispatch_h264_qpel_mc20 instead of
ff_put_h264_qpel8_mc20_neon.
Cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes the
4-cycle libavcodec.so substitution sequence (6 IDCT 4x4 / 7 IDCT 8x8 /
8 luma-v deblock / 9 qpel mc20).
The recipe layer picks the substrate. Per docs/k9_h264qpel_mc20.md
the verdict is CPU NEON: per-block 7.6 ns at 131 Mblock/s gives 135x
margin over 30 fps 1080p, and the QPU dispatch floor (~250 ns)
makes any V3D shader strictly worse. Substitution is plumbing-only,
NEON-by-recipe — same daedalus_ctx_create_no_qpu pthread_once
context shape the cycles 6/7/8 shims already own (kept SEPARATE
from the H264DSP shim's ctx because H264QPEL is its own libavcodec
Makefile module and link order does not guarantee a single .o
owns the ctx symbol; one extra ~µs init per process, paid lazily).
Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the 16x16
size tier stay on the in-tree NEON .S code. Per the cycle-9 phase-1
rationale, mc20 8x8 is representative of the whole family's per-block
cost — extending the substitution to other variants would multiply
recipe-lookup overhead without changing the substrate verdict.
Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier
cycle 9 green; M1 = 100% bit-exact across 10000 random blocks).
No SONAME change, no Depends change.
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 9.
---
libavcodec/aarch64/Makefile | 3 +-
libavcodec/aarch64/h264_qpel_daedalus.c | 50 ++++++++++++++++++++++
libavcodec/aarch64/h264qpel_init_aarch64.c | 4 +-
3 files changed, 55 insertions(+), 2 deletions(-)
create mode 100644 libavcodec/aarch64/h264_qpel_daedalus.c
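The 135x margin quoted in the commit message can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes a 1920x1080 luma plane split into 8x8 blocks with one mc20 call per block (a deliberate worst case, since real streams spread blocks across many qpel positions), and the helper names are hypothetical, not part of daedalus-fourier or FFmpeg.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of 8x8 luma blocks per second for a given
 * picture size and frame rate, assuming one mc20 call per block. */
static uint64_t blocks_per_second(uint32_t width, uint32_t height, uint32_t fps)
{
    /* The block grid only divides evenly for multiples of 8. */
    assert(width % 8 == 0 && height % 8 == 0);
    return (uint64_t)(width / 8) * (height / 8) * fps;
}

/* Hypothetical helper: headroom factor, i.e. sustained kernel throughput
 * divided by the rate the stream actually needs. */
static double headroom(double available_blocks_per_sec, uint64_t needed_blocks_per_sec)
{
    assert(needed_blocks_per_sec > 0);
    return available_blocks_per_sec / (double)needed_blocks_per_sec;
}
```

Under these assumptions, `blocks_per_second(1920, 1080, 30)` is 972000, and dividing the quoted 131 Mblock/s by that rate gives a factor of roughly 135, in line with the commit message.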
diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -7,7 +7,8 @@ OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_init_aarch64.o \
aarch64/h264_idct_daedalus.o
OBJS-$(CONFIG_HUFFYUVDSP) += aarch64/huffyuvdsp_init_aarch64.o
OBJS-$(CONFIG_H264PRED) += aarch64/h264pred_init.o
-OBJS-$(CONFIG_H264QPEL) += aarch64/h264qpel_init_aarch64.o
+OBJS-$(CONFIG_H264QPEL) += aarch64/h264qpel_init_aarch64.o \
+ aarch64/h264_qpel_daedalus.o
OBJS-$(CONFIG_HPELDSP) += aarch64/hpeldsp_init_aarch64.o
OBJS-$(CONFIG_IDCTDSP) += aarch64/idctdsp_init_aarch64.o
OBJS-$(CONFIG_ME_CMP) += aarch64/me_cmp_init_aarch64.o
diff --git a/libavcodec/aarch64/h264_qpel_daedalus.c b/libavcodec/aarch64/h264_qpel_daedalus.c
new file mode 100644
--- /dev/null
+++ b/libavcodec/aarch64/h264_qpel_daedalus.c
@@ -0,0 +1,50 @@
+/*
+ * H.264 luma qpel mc20 (8x8, horizontal half-pel, 6-tap "put")
+ * — daedalus-fourier substitution shim.
+ *
+ * Routes H264QpelContext.put_h264_qpel_pixels_tab[1][2] through
+ * daedalus_recipe_dispatch_h264_qpel_mc20 instead of
+ * ff_put_h264_qpel8_mc20_neon. The recipe layer picks the substrate
+ * (CPU NEON for cycle 9; QPU not viable — per-block 7.6 ns vs
+ * ~250 ns QPU dispatch floor, see docs/k9_h264qpel_mc20.md).
+ *
+ * Sibling to libavcodec/aarch64/h264_idct_daedalus.c. We keep a
+ * SEPARATE process-global pthread_once context here instead of
+ * sharing the H264DSP one because H264QPEL is its own libavcodec
+ * Makefile module and link order does not guarantee a single .o
+ * owns the ctx symbol. The cost is one extra
+ * daedalus_ctx_create_no_qpu (~µs) per process; daemon and host
+ * processes pay this lazily on first MC call.
+ *
+ * FFmpeg H264QpelContext convention: both dst and src use a SINGLE
+ * stride and `src` already points at the leftmost OUTPUT column
+ * (col 0); the 6-tap filter reads cols -2..+3. This matches
+ * daedalus_recipe_dispatch_h264_qpel_mc20's documented contract
+ * directly, so dst_off = src_off = 0.
+ */
+
+#include <pthread.h>
+#include <stddef.h>
+#include <stdint.h>
+
+#include <daedalus.h>
+
+#include "libavutil/attributes.h"
+
+static daedalus_ctx *g_dctx;
+static pthread_once_t g_dctx_once = PTHREAD_ONCE_INIT;
+
+static void daedalus_ctx_init_once(void)
+{
+ g_dctx = daedalus_ctx_create_no_qpu();
+}
+
+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
+
+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
+{
+ static const daedalus_h264_qpel_meta meta = { .dst_off = 0, .src_off = 0 };
+ pthread_once(&g_dctx_once, daedalus_ctx_init_once);
+ daedalus_recipe_dispatch_h264_qpel_mc20(g_dctx, dst, src, (size_t)stride,
+ 1, &meta);
+}
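The lazy, process-global context used by the shim above follows the usual `pthread_once` pattern. The sketch below shows that pattern in isolation; `demo_ctx` and the `demo_*` functions are hypothetical stand-ins (the real type and constructor live in `<daedalus.h>`), so treat it as an illustration of the shape rather than the library's API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for daedalus_ctx. */
typedef struct demo_ctx { int unused; } demo_ctx;

static demo_ctx      *g_demo_ctx;
static pthread_once_t g_demo_once = PTHREAD_ONCE_INIT;
static int            g_demo_init_calls;   /* instrumentation for the demo only */

static void demo_ctx_init_once(void)
{
    /* pthread_once guarantees this body runs at most once per process. */
    assert(g_demo_ctx == NULL);
    g_demo_init_calls++;
    g_demo_ctx = calloc(1, sizeof(*g_demo_ctx));
}

/* Returns the shared context, creating it on first use.
 * May return NULL if the one-time allocation failed. */
static demo_ctx *demo_ctx_get(void)
{
    pthread_once(&g_demo_once, demo_ctx_init_once);
    return g_demo_ctx;
}
```

Every caller goes through `demo_ctx_get()`, so the first call pays the initialisation cost and later calls only pay the `pthread_once` fast path. The sketch returns NULL on allocation failure; whether the real constructor can fail, and how the dispatch call treats a NULL context, is not stated in the patch.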
diff --git a/libavcodec/aarch64/h264qpel_init_aarch64.c b/libavcodec/aarch64/h264qpel_init_aarch64.c
--- a/libavcodec/aarch64/h264qpel_init_aarch64.c
+++ b/libavcodec/aarch64/h264qpel_init_aarch64.c
@@ -47,6 +47,8 @@ void ff_put_h264_qpel8_mc00_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t str
void ff_put_h264_qpel8_mc10_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
void ff_put_h264_qpel8_mc20_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
void ff_put_h264_qpel8_mc30_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src,
+ ptrdiff_t stride);
void ff_put_h264_qpel8_mc01_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
void ff_put_h264_qpel8_mc11_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
void ff_put_h264_qpel8_mc21_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
@@ -184,7 +186,7 @@ av_cold void ff_h264qpel_init_aarch64(H264QpelContext *c, int bit_depth)
c->put_h264_qpel_pixels_tab[1][ 0] = ff_put_h264_qpel8_mc00_neon;
c->put_h264_qpel_pixels_tab[1][ 1] = ff_put_h264_qpel8_mc10_neon;
- c->put_h264_qpel_pixels_tab[1][ 2] = ff_put_h264_qpel8_mc20_neon;
+ c->put_h264_qpel_pixels_tab[1][ 2] = ff_put_h264_qpel8_mc20_daedalus;
c->put_h264_qpel_pixels_tab[1][ 3] = ff_put_h264_qpel8_mc30_neon;
c->put_h264_qpel_pixels_tab[1][ 4] = ff_put_h264_qpel8_mc01_neon;
c->put_h264_qpel_pixels_tab[1][ 5] = ff_put_h264_qpel8_mc11_neon;
--
2.47.3
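For readers unfamiliar with what the substituted table entry computes, the following is a minimal scalar reference for the 8x8 "put" mc20 case: the H.264 horizontal half-pel filter with taps (1, -5, 20, 20, -5, 1), rounding by 16 and a shift by 5. It follows the pointer convention described in the shim comment (single stride, `src` at output column 0, reads from column -2 to +3 around each output pixel). The function name is hypothetical and this is not the NEON or daedalus-fourier implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar reference for the 8x8 horizontal half-pel "put".
 * The caller must provide 2 readable pixels left and 3 right of each row. */
static void put_h264_qpel8_mc20_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    assert(dst != NULL && src != NULL);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int sum = src[x - 2] - 5 * src[x - 1] + 20 * src[x]
                    + 20 * src[x + 1] - 5 * src[x + 2] + src[x + 3];
            int rounded = sum + 16;
            /* Clamp negatives before shifting to stay strictly portable. */
            dst[x] = rounded < 0 ? 0 : clip_u8(rounded >> 5);
        }
        dst += stride;
        src += stride;
    }
}
```

The taps sum to 32, so a flat input passes through unchanged, which makes a convenient smoke test when comparing against another implementation.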
+15 -7

@@ -24,13 +24,13 @@ _srcname=FFmpeg
 _version='8.1'
 _commit='b57fbbe50c9b2656fad86a1a7eeabfd2b2a50935' # v4l2-request-n8.1 tip 2026-04-24
 pkgver=8.1.r123329.b57fbbe
-pkgrel=6 # pkgrel=6 — H.264 IDCT 4x4 daedalus-fourier substitution (2026-05-21)
+pkgrel=10 # pkgrel=10 — H.264 luma qpel mc20 daedalus-fourier substitution (cycle 9, 2026-05-23)
 epoch=2
-# daedalus-fourier pin — first kernel substitution in libavcodec
-# (cycle 6 H.264 IDCT 4x4). Same SHA as the daedalus-v4l2 daemon's
-# inline build; lockstep with that until the public API rolls.
-_daedalus_fourier_commit='d87239d8172307d9a1b93c95cbed116d175b85cc'
+# daedalus-fourier pin. 209a421 = PR #2 merge (Phase 8c — public API
+# gains daedalus_recipe_dispatch_h264_qpel_mc20 + DAEDALUS_KERNEL_H264_QPEL_MC20).
+# Cycle 9 closes the libavcodec.so substitution arc started at cycle 6.
+_daedalus_fourier_commit='209a4218bcb98b91c04f07ad61513bb04adb13ad'
 pkgdesc='FFmpeg with V4L2 Request API hwaccel (Rockchip / Allwinner stateless decode)'
 arch=('aarch64')
 url='https://github.com/Kwiboo/FFmpeg'
@@ -90,8 +90,12 @@ source=("git+https://github.com/Kwiboo/FFmpeg.git#commit=${_commit}"
 "daedalus-fourier-${_daedalus_fourier_commit}.tar.gz::https://git.reauktion.de/marfrit/daedalus-fourier/archive/${_daedalus_fourier_commit}.tar.gz"
 '0001-libudev-bypass-fallback.patch'
 '0002-nv15-to-p010-unpack.patch'
-'0003-h264-idct4-daedalus-fourier.patch')
-sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
+'0003-h264-idct4-daedalus-fourier.patch'
+'0004-h264-idct8-daedalus-fourier.patch'
+'0005-h264-deblock-luma-v-daedalus-fourier.patch'
+'0006-h264-restore-low-delay.patch'
+'0007-h264-qpel-mc20-daedalus-fourier.patch')
+sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
 pkgver() {
 cd "${_srcname}"
@@ -105,6 +109,10 @@ prepare() {
 patch -Np1 -i "${srcdir}/0001-libudev-bypass-fallback.patch"
 patch -Np1 -i "${srcdir}/0002-nv15-to-p010-unpack.patch"
 patch -Np1 -i "${srcdir}/0003-h264-idct4-daedalus-fourier.patch"
+patch -Np1 -i "${srcdir}/0004-h264-idct8-daedalus-fourier.patch"
+patch -Np1 -i "${srcdir}/0005-h264-deblock-luma-v-daedalus-fourier.patch"
+patch -Np1 -i "${srcdir}/0006-h264-restore-low-delay.patch"
+patch -Np1 -i "${srcdir}/0007-h264-qpel-mc20-daedalus-fourier.patch"
 }
 build() {
@@ -1,6 +1,6 @@
 diff -urN a/src/panfrost/vulkan/jm/panvk_cmd_buffer.h b/src/panfrost/vulkan/jm/panvk_cmd_buffer.h
 --- a/src/panfrost/vulkan/jm/panvk_cmd_buffer.h 2026-05-21 22:46:57.477785029 +0200
-+++ b/src/panfrost/vulkan/jm/panvk_cmd_buffer.h 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/jm/panvk_cmd_buffer.h 2026-05-22 10:17:41.214043265 +0200
 @@ -88,8 +88,18 @@
 struct panvk_cmd_compute_state compute;
 struct panvk_push_constant_state push_constants;
@@ -22,7 +22,7 @@ diff -urN a/src/panfrost/vulkan/jm/panvk_cmd_buffer.h b/src/panfrost/vulkan/jm/p
 diff -urN a/src/panfrost/vulkan/meson.build b/src/panfrost/vulkan/meson.build
 --- a/src/panfrost/vulkan/meson.build 2026-05-21 22:46:59.277811484 +0200
-+++ b/src/panfrost/vulkan/meson.build 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/meson.build 2026-05-22 10:17:41.214043265 +0200
 @@ -41,6 +41,10 @@
 'panvk_device_memory.c',
 'panvk_host_copy.c',
@@ -36,7 +36,7 @@ diff -urN a/src/panfrost/vulkan/meson.build b/src/panfrost/vulkan/meson.build
 'panvk_physical_device.c',
 diff -urN a/src/panfrost/vulkan/panvk_buffer.c b/src/panfrost/vulkan/panvk_buffer.c
 --- a/src/panfrost/vulkan/panvk_buffer.c 2026-05-21 22:46:57.485785147 +0200
-+++ b/src/panfrost/vulkan/panvk_buffer.c 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_buffer.c 2026-05-22 10:17:41.214043265 +0200
 @@ -88,6 +88,8 @@
 *bind_status->pResult = VK_SUCCESS;
@@ -48,7 +48,7 @@ diff -urN a/src/panfrost/vulkan/panvk_buffer.c b/src/panfrost/vulkan/panvk_buffe
 }
 diff -urN a/src/panfrost/vulkan/panvk_buffer.h b/src/panfrost/vulkan/panvk_buffer.h
 --- a/src/panfrost/vulkan/panvk_buffer.h 2026-05-21 22:46:57.485785147 +0200
-+++ b/src/panfrost/vulkan/panvk_buffer.h 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_buffer.h 2026-05-22 10:17:41.214043265 +0200
 @@ -14,8 +14,14 @@
 struct panvk_priv_bo;
@@ -66,7 +66,7 @@ diff -urN a/src/panfrost/vulkan/panvk_buffer.h b/src/panfrost/vulkan/panvk_buffe
 VK_DEFINE_NONDISP_HANDLE_CASTS(panvk_buffer, vk.base, VkBuffer,
 diff -urN a/src/panfrost/vulkan/panvk_device.h b/src/panfrost/vulkan/panvk_device.h
 --- a/src/panfrost/vulkan/panvk_device.h 2026-05-21 22:46:57.489785206 +0200
-+++ b/src/panfrost/vulkan/panvk_device.h 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_device.h 2026-05-22 10:17:41.214043265 +0200
 @@ -45,6 +45,8 @@
 enum panvk_queue_family {
 PANVK_QUEUE_FAMILY_GPU,
@@ -102,7 +102,7 @@ diff -urN a/src/panfrost/vulkan/panvk_device.h b/src/panfrost/vulkan/panvk_devic
 struct {
 diff -urN a/src/panfrost/vulkan/panvk_physical_device.c b/src/panfrost/vulkan/panvk_physical_device.c
 --- a/src/panfrost/vulkan/panvk_physical_device.c 2026-05-21 22:46:57.497785323 +0200
-+++ b/src/panfrost/vulkan/panvk_physical_device.c 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_physical_device.c 2026-05-22 10:17:41.214043265 +0200
 @@ -577,12 +577,22 @@
 .queueFlags = VK_QUEUE_SPARSE_BINDING_BIT,
 .queueCount = 1,
@@ -234,8 +234,8 @@ diff -urN a/src/panfrost/vulkan/panvk_physical_device.c b/src/panfrost/vulkan/pa
 +}
 diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 --- a/src/panfrost/vulkan/panvk_v4l2.c 1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_v4l2.c 2026-05-21 22:47:09.189957157 +0200
-@@ -0,0 +1,569 @@
++++ b/src/panfrost/vulkan/panvk_v4l2.c 2026-05-22 10:17:41.214043265 +0200
+@@ -0,0 +1,615 @@
 +/*
 + * panvk-bifrost-video Phase 4 commit 3:
 + *
@@ -250,6 +250,7 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 +#include "panvk_video_decode.h"
 +#include "panvk_device.h"
 +
++#include "util/macros.h"
 +#include "vk_alloc.h"
 +#include "vk_log.h"
 +
@@ -417,7 +418,9 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 + mesa_loge("panvk_v4l2: REQBUFS OUTPUT failed: %s", strerror(errno));
 + return -errno;
 + }
-+ vs->num_output_buffers = rb.count;
++ /* REQBUFS may round up the count above the request — clamp to our
++ * fixed-size mmap arrays (Phase 5 review: prevents output_map OOB). */
++ vs->num_output_buffers = MIN2(rb.count, 18);
 + vs->output_next = 0;
 +
 + /* CAPTURE: MMAP — kernel-allocated, mmap to CPU for copy-out path. */
@@ -430,7 +433,7 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 + mesa_loge("panvk_v4l2: REQBUFS CAPTURE failed: %s", strerror(errno));
 + return -errno;
 + }
-+ vs->num_capture_buffers = rb.count;
++ vs->num_capture_buffers = MIN2(rb.count, 18);
 + vs->capture_next = 0;
 +
 + return 0;
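The clamp introduced in the two hunks above guards fixed-size arrays against a driver that grants more buffers than were requested, which VIDIOC_REQBUFS is allowed to do. The sketch below isolates that invariant without any V4L2 headers; `DEMO_MAX_MMAP_BUFFERS` and the `demo_queue` type are hypothetical names chosen for illustration (the patch itself spells the capacity as the literal 18).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the array capacity the patch writes as 18. */
#define DEMO_MAX_MMAP_BUFFERS 18u

struct demo_queue {
    void    *map[DEMO_MAX_MMAP_BUFFERS];
    uint32_t num_buffers;
};

/* Store the driver-reported count, clamped to the array capacity. */
static void demo_queue_set_count(struct demo_queue *q, uint32_t driver_count)
{
    q->num_buffers = driver_count < DEMO_MAX_MMAP_BUFFERS
                   ? driver_count : DEMO_MAX_MMAP_BUFFERS;
}

/* Record one mapping; the assert documents the invariant the clamp protects. */
static void demo_queue_set_map(struct demo_queue *q, uint32_t idx, void *ptr)
{
    assert(idx < q->num_buffers && idx < DEMO_MAX_MMAP_BUFFERS);
    q->map[idx] = ptr;
}
```

A named constant shared by the array declarations, the clamp, and the teardown loop would keep the three uses of 18 from drifting apart. The cost of clamping is that any buffers the driver allocated beyond the capacity simply go unused.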
@@ -788,6 +791,49 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 + struct vk_device *vk_dev,
 + const VkAllocationCallbacks *alloc)
 +{
++ /* Unwind in reverse order of session_init. Each step is guarded by
++ * "have we got far enough to need this" so the function is safe to
++ * call on partially-initialised sessions (the session_init failure
++ * paths jump here via `goto fail`). */
++
++ /* munmap CAPTURE + OUTPUT (no-op for entries left at NULL by an
++ * earlier-failed mmap loop). */
++ for (unsigned i = 0; i < 18; i++) {
++ if (vs->capture_map[i]) {
++ munmap(vs->capture_map[i], vs->capture_map_size[i]);
++ vs->capture_map[i] = NULL;
++ vs->capture_map_size[i] = 0;
++ }
++ if (vs->output_map[i]) {
++ munmap(vs->output_map[i], vs->output_map_size[i]);
++ vs->output_map[i] = NULL;
++ vs->output_map_size[i] = 0;
++ }
++ }
++
++ if (vs->video_fd >= 0) {
++ /* STREAMOFF (safe to call even if STREAMON never ran — kernel
++ * returns EINVAL which we ignore). */
++ enum v4l2_buf_type t;
++ t = vs->mplane ? V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE
++ : V4L2_BUF_TYPE_VIDEO_OUTPUT;
++ (void) ioctl(vs->video_fd, VIDIOC_STREAMOFF, &t);
++ t = vs->mplane ? V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE
++ : V4L2_BUF_TYPE_VIDEO_CAPTURE;
++ (void) ioctl(vs->video_fd, VIDIOC_STREAMOFF, &t);
++
++ /* Release the kernel buffer queues via REQBUFS count=0. */
++ struct v4l2_requestbuffers rb;
++ memset(&rb, 0, sizeof(rb));
++ rb.memory = V4L2_MEMORY_MMAP;
++ rb.type = vs->mplane ? V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE
++ : V4L2_BUF_TYPE_VIDEO_OUTPUT;
++ (void) ioctl(vs->video_fd, VIDIOC_REQBUFS, &rb);
++ rb.type = vs->mplane ? V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE
++ : V4L2_BUF_TYPE_VIDEO_CAPTURE;
++ (void) ioctl(vs->video_fd, VIDIOC_REQBUFS, &rb);
++ }
++
 + if (vs->request_fds) {
 + for (unsigned i = 0; i < vs->num_request_fds; i++)
 + if (vs->request_fds[i] >= 0)
@@ -807,7 +853,7 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 +}
 diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264.c b/src/panfrost/vulkan/panvk_v4l2_h264.c
 --- a/src/panfrost/vulkan/panvk_v4l2_h264.c 1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_v4l2_h264.c 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_v4l2_h264.c 2026-05-22 10:17:41.214043265 +0200
 @@ -0,0 +1,478 @@
 +/*
 + * panvk-bifrost-video Phase 4: Vulkan StdVideo H.264 → V4L2 stateless H.264
@@ -1289,7 +1335,7 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264.c b/src/panfrost/vulkan/panvk_v4
 +}
 diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c
 --- a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c 1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c 2026-05-22 10:17:41.214043265 +0200
 @@ -0,0 +1,314 @@
 +/*
 + * H.264 slice header bit-parser implementation.
@@ -1607,7 +1653,7 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c b/src/panfrost/vu
 +}
 diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h
 --- a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h 1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h 2026-05-22 10:17:41.214043265 +0200
 @@ -0,0 +1,94 @@
 +/*
 + * H.264 slice header bit-parser for panvk-bifrost-video / V4L2 stateless
@@ -1705,17 +1751,35 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h b/src/panfrost/vu
 +#endif /* PANVK_V4L2_H264_SLICE_HEADER_H */
 diff -urN a/src/panfrost/vulkan/panvk_video_decode.c b/src/panfrost/vulkan/panvk_video_decode.c
 --- a/src/panfrost/vulkan/panvk_video_decode.c 1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_video_decode.c 2026-05-21 22:47:09.189957157 +0200
-@@ -0,0 +1,362 @@
++++ b/src/panfrost/vulkan/panvk_video_decode.c 2026-05-22 10:17:41.214043265 +0200
+@@ -0,0 +1,380 @@
 +/*
-+ * panvk-bifrost-video Phase 4 commit 7b:
-+ * Vulkan-side decode dispatch wired to V4L2 hantro via dmabuf.
++ * panvk-bifrost-video: Vulkan video decode entrypoints (H.264).
 + *
-+ * Phase 1 simplification: cmd_buffer state tracking via DEVICE-level
-+ * active_video struct (under a mutex). Per-cmdbuf state hand-off is
-+ * Phase >>1 once arch-agnostic source can access per-arch cmd_buffer
-+ * structs without the include-path gymnastics. This works for
-+ * single-session decode workloads (mpv, ffmpeg, vk-video-samples).
++ * Drives the V4L2 stateless hantro VPU backend (panvk_v4l2.c) from
++ * Vulkan vkCmdDecodeVideoKHR. Decode is synchronous at record time —
++ * the full V4L2 ioctl dance runs to completion inside the command-
++ * recording call before returning to the application. The queue-side
++ * `driver_submit` is a no-op signal-everything (see panvk_vX_device.c).
++ *
++ * Phase 1 simplifications worth knowing about:
++ *
++ * - Cmd-buffer state lives at the DEVICE level (`active_video`) under
++ * a single mutex, NOT per-cmd-buffer. Concurrent video sessions on
++ * the same device clobber each other. Sufficient for current single-
++ * session consumers (mpv-fourier, ffmpeg-vulkan-h264, vk-video-
++ * samples). Spec-compliant multi-session is a Phase >>1 follow-up.
++ *
++ * - Source bitstream is read via `src_buf->mem->addr.host`, i.e. the
++ * bound VkDeviceMemory's CPU mapping. Works because panvk-bifrost
++ * only exposes HOST_VISIBLE memory types; an app that bound the
++ * bitstream buffer to non-HOST_VISIBLE memory would get a logged
++ * error and a silent decode skip (CmdDecodeVideoKHR is void, so we
++ * have no clean error-return path). VkPhysicalDeviceVideo*
++ * constraints would be the right place to make this contractual.
++ *
++ * - Requires `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1` (mesa-upstream gate
++ * on panvk-on-Bifrost which is not conformant).
 + *
 + * SPDX-License-Identifier: MIT
 + */
@@ -1929,10 +1993,10 @@ diff -urN a/src/panfrost/vulkan/panvk_video_decode.c b/src/panfrost/vulkan/panvk
 + * `tv_sec * 1e9 + tv_usec * 1e3`). Sub-microsecond bits are dropped, so
 + * any high-resolution stamp (e.g. a 64-bit pointer cast) makes the
 + * lookup miss and P/B frames decode against zero references. Use a
-+ * monotonic per-session counter in microseconds (i.e. * 1000 ns).
++ * per-session monotonic counter in microseconds (i.e. * 1000 ns) so
++ * concurrent sessions sharing /dev/video1 don't collide on stamp.
 + */
-+ static uint32_t panvk_video_ts_counter = 0;
-+ const uint64_t output_ts = ((uint64_t)++panvk_video_ts_counter) * 1000ULL;
++ const uint64_t output_ts = ((uint64_t)++vs->ts_counter) * 1000ULL;
 + uint32_t dst_dpb_slot = pDecodeInfo->pSetupReferenceSlot
 + ? (uint32_t) pDecodeInfo->pSetupReferenceSlot->slotIndex : 0u;
 +
@@ -2071,8 +2135,8 @@ diff -urN a/src/panfrost/vulkan/panvk_video_decode.c b/src/panfrost/vulkan/panvk
 +}
 diff -urN a/src/panfrost/vulkan/panvk_video_decode.h b/src/panfrost/vulkan/panvk_video_decode.h
 --- a/src/panfrost/vulkan/panvk_video_decode.h 1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_video_decode.h 2026-05-21 22:47:09.189957157 +0200
-@@ -0,0 +1,114 @@
++++ b/src/panfrost/vulkan/panvk_video_decode.h 2026-05-22 10:17:41.214043265 +0200
+@@ -0,0 +1,124 @@
 +/*
 + * panvk-bifrost-video Phase 4 commit 3: extended for V4L2 state.
 + *
@@ -2103,12 +2167,22 @@ diff -urN a/src/panfrost/vulkan/panvk_video_decode.h b/src/panfrost/vulkan/panvk
 + struct v4l2_format fmt_output;
 + struct v4l2_format fmt_capture;
 +
-+ /* Request fd pool. PANVK_V4L2_REQUEST_FD_COUNT entries. */
++ /* Request fd pool. PANVK_V4L2_REQUEST_FD_COUNT entries.
++ * Size of request_fd_used[] is bounded by the same compile-time max;
++ * keep them coupled to avoid silent overflow if the pool grows. */
++#define PANVK_VIDEO_REQUEST_FD_MAX 32
 + int *request_fds;
-+ bool request_fd_used[32]; /* tracks per-fd "ever queued" → REINIT before reuse */
++ bool request_fd_used[PANVK_VIDEO_REQUEST_FD_MAX];
 + unsigned num_request_fds;
 + uint32_t request_fd_next; /* round-robin index */
 +
++ /* Per-session V4L2 buffer-identity counter. Multiplied by 1000 ns at
++ * QBUF time so the stamp round-trips losslessly through (tv_sec,
++ * tv_usec) — hantro's reflist builder matches dpb[i].reference_ts
++ * against the kernel-side OUTPUT timestamp. Per-session (not process-
++ * global) so concurrent sessions sharing /dev/video1 don't collide. */
++ uint32_t ts_counter;
++
 + /* DPB slotIndex → V4L2 reference_ts mapping (Phase 1 D5) */
 + struct {
 + bool valid;
@@ -2189,7 +2263,7 @@ diff -urN a/src/panfrost/vulkan/panvk_video_decode.h b/src/panfrost/vulkan/panvk
 +#endif /* PANVK_VIDEO_DECODE_H */
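The reason the counter is multiplied by 1000 ns becomes clearer with a worked round trip. The sketch below mimics the conversion described in the comments (`tv_sec * 1e9 + tv_usec * 1e3`) with a local struct instead of the kernel's `struct timeval`; all `demo_*` names are hypothetical and the code is an illustration of the arithmetic, not the driver's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for struct timeval, to keep the sketch header-free. */
struct demo_timeval {
    int64_t tv_sec;
    int64_t tv_usec;
};

/* Nanoseconds to (sec, usec); sub-microsecond bits are dropped here. */
static struct demo_timeval demo_ns_to_timeval(uint64_t ns)
{
    struct demo_timeval tv;
    tv.tv_sec  = (int64_t)(ns / 1000000000ULL);
    tv.tv_usec = (int64_t)((ns % 1000000000ULL) / 1000ULL);
    return tv;
}

/* (sec, usec) back to nanoseconds, as described in the patch comment. */
static uint64_t demo_timeval_to_ns(struct demo_timeval tv)
{
    assert(tv.tv_sec >= 0 && tv.tv_usec >= 0 && tv.tv_usec < 1000000);
    return (uint64_t)tv.tv_sec * 1000000000ULL + (uint64_t)tv.tv_usec * 1000ULL;
}

/* A counter scaled to whole microseconds survives the round trip. */
static uint64_t demo_stamp_from_counter(uint32_t counter)
{
    return (uint64_t)counter * 1000ULL;
}
```

Any stamp that is a whole number of microseconds comes back unchanged, while a stamp such as 1234567 ns returns as 1234000 ns and would no longer match a stored `reference_ts`.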
 diff -urN a/src/panfrost/vulkan/panvk_vX_device.c b/src/panfrost/vulkan/panvk_vX_device.c
 --- a/src/panfrost/vulkan/panvk_vX_device.c 2026-05-21 22:46:57.505785441 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_device.c 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_vX_device.c 2026-05-22 10:17:41.214043265 +0200
 @@ -203,6 +203,27 @@
 }
 }
@@ -2372,14 +2446,27 @@ diff -urN a/src/panfrost/vulkan/panvk_vX_device.c b/src/panfrost/vulkan/panvk_vX
 qf->queues =
 diff -urN a/src/panfrost/vulkan/panvk_vX_physical_device.c b/src/panfrost/vulkan/panvk_vX_physical_device.c
 --- a/src/panfrost/vulkan/panvk_vX_physical_device.c 2026-05-21 22:46:59.273811425 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_physical_device.c 2026-05-21 22:47:09.189957157 +0200
-@@ -170,6 +170,9 @@
++++ b/src/panfrost/vulkan/panvk_vX_physical_device.c 2026-05-22 10:17:41.214043265 +0200
+@@ -12,6 +12,7 @@
+ #include <sys/sysmacros.h>
+ #include "git_sha1.h"
++#include "panvk_video_decode.h"
+ #include "vk_android.h"
+ #include "vk_device.h"
+@@ -170,6 +171,14 @@
 .EXT_queue_family_foreign = true,
 .EXT_robustness2 = true,
 .EXT_transform_feedback = PAN_ARCH < 9, /* iter13: JM-class only for now */
-+ .KHR_video_queue = PAN_ARCH < 9, /* panvk-bifrost-video Phase 4 commit 1 */
-+ .KHR_video_decode_queue = PAN_ARCH < 9, /* hantro V4L2-stateless backend */
-+ .KHR_video_decode_h264 = PAN_ARCH < 9, /* H.264 only initially */
++ /* Video extensions are advertised only when (a) we're on a Bifrost
++ * arch (PAN_ARCH < 9) AND (b) a hantro VPU is reachable on the
++ * expected V4L2 nodes — otherwise CreateVideoSessionKHR would
++ * succeed at the panvk layer and then fail at v4l2_open_fds, giving
++ * the app a misleading capability claim. */
++ .KHR_video_queue = PAN_ARCH < 9 && panvk_v4l2_probe_hantro(),
++ .KHR_video_decode_queue = PAN_ARCH < 9 && panvk_v4l2_probe_hantro(),
++ .KHR_video_decode_h264 = PAN_ARCH < 9 && panvk_v4l2_probe_hantro(),
 .EXT_sampler_filter_minmax = PAN_ARCH >= 10,
 .EXT_scalar_block_layout = true,
 .EXT_separate_stencil_usage = true,
+1 -1
@@ -45,7 +45,7 @@ pkgver=26.0.6.r5.video1
 pkgrel=1
 pkgdesc="Patched Mesa libvulkan_panfrost.so adding VK_KHR_video_decode_h264 on Bifrost SBCs (sibling of mesa-panvk-bifrost-r4)"
 arch=('aarch64')
-url="https://github.com/marfrit/panvk-bifrost"
+url="https://git.reauktion.de/marfrit/panvk-bifrost"
 license=('MIT')
 depends=(
+1 -1
@@ -34,7 +34,7 @@ pkgver=26.0.6.r4
 pkgrel=1
 pkgdesc="Patched Mesa libvulkan_panfrost.so exposing Bifrost-gen Mali to Vulkan apps (panvk-bifrost campaign)"
 arch=('aarch64')
-url="https://github.com/marfrit/panvk-bifrost"
+url="https://git.reauktion.de/marfrit/panvk-bifrost"
 license=('MIT')
 # We co-install at /usr/lib/panvk-bifrost/ so no conflicts with stock mesa.
+3 -3
@@ -14,9 +14,9 @@
 # Sibling userspace package: ../daedalus-v4l2/build-deb.sh
 set -euo pipefail
-UPSTREAM_COMMIT=5d8b4369e58ab947d1c56b1f718293c57c6065b5
-PKGVER=0.1.0+r33+g5d8b436
-PKGREL=1 # reset for new upstream pin (5d8b436 — revert parking design); still carries the #64 multi-kernel postinst fix
+UPSTREAM_COMMIT=872eec505eb91b561892d02a0526749348ddc121
+PKGVER=0.1.0+r45+g872eec5
+PKGREL=1 # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2 0.1.0+r45+g872eec5 REQUIRED
 MODULE_NAME=daedalus_v4l2
 HERE=$(dirname "$(readlink -f "$0")")
+21
@@ -1,3 +1,24 @@
+daedalus-v4l2-dkms (0.1.0+r45+g872eec5-1) bookworm trixie; urgency=medium
+
+  * Bump to 872eec5 — picks up daedalus-v4l2 PR #20 (closes #19).
+    Wire-protocol cap DAEDALUS_PROTO_MAX_PAYLOAD raised from 64 KiB
+    to 1 MiB in include/daedalus_v4l2_proto.h. The kernel module
+    inherits the larger DAEDALUS_MAX_BITSTREAM via the same #define
+    and daedalus_fill_output_fmt now reports OUTPUT_MPLANE
+    sizeimage = ~1 MiB instead of 65484.
+  * Skips the r33 -> r45 commit range — between 5d8b436 and 872eec5
+    only one kernel/include change landed (the PROTO_MAX_PAYLOAD
+    bump above). The intervening daemon-only bumps (r37 / r39 /
+    r41 / r43) didn't touch kernel/ or include/ at all.
+  * Effective wire cap is min(kernel, daemon) — lock-step install
+    WITH daedalus-v4l2 0.1.0+r45+g872eec5 REQUIRED.
+  * Allocations (kmemdup / kmalloc on payload, vb2 plane backing)
+    are dynamic and sized per-payload at runtime; the bump only
+    sets the ceiling. KMALLOC_MAX_SIZE on aarch64 SLUB is several
+    MiB so 1 MiB is well within bounds.
+
+ -- Markus Fritsche <mfritsche@reauktion.de>  Fri, 22 May 2026 21:00:00 +0000
+
 daedalus-v4l2-dkms (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium

   * Bump to 5d8b436 — reverts daedalus-v4l2 PRs #7 + #8. Kernel
+3 -3
@@ -19,9 +19,9 @@ set -euo pipefail
 # source tree we own in marfrit-packages. Headers + .pc files
 # come from ffmpeg-v4l2-request-fourier (installed by the CI
 # workflow before this script runs; see PKG_CONFIG_PATH below).
-UPSTREAM_COMMIT=6e6dfa144da7bc7fa8be50c8da91d7d1c6132a2c
-PKGVER=0.1.0+r41+g6e6dfa1
-PKGREL=1 # reset for new upstream pin (6e6dfa1 — soname 62 via /opt/fourier)
+UPSTREAM_COMMIT=872eec505eb91b561892d02a0526749348ddc121
+PKGVER=0.1.0+r45+g872eec5
+PKGREL=1 # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2-dkms 0.1.0+r45+g872eec5 REQUIRED
 # daedalus-fourier pin. d87239d = marfrit/daedalus-fourier PR #1 merge
 # (install rules + pkg-config, enables this consumer to find_package
+43
@@ -1,3 +1,46 @@
daedalus-v4l2 (0.1.0+r45+g872eec5-1) bookworm trixie; urgency=medium
* Bump to 872eec5 — picks up daedalus-v4l2 PR #20 (closes #19).
Wire-protocol cap DAEDALUS_PROTO_MAX_PAYLOAD raised from 64 KiB
to 1 MiB. DAEDALUS_MAX_BITSTREAM follows; daedalus_fill_output_fmt
now reports OUTPUT_MPLANE sizeimage = ~1 MiB instead of 65484.
libva-v4l2-request-fourier's S_FMT-driven OUTPUT-pool resize
finally succeeds; Firefox no longer falls off to libmozavcodec
SW when an H.264 slice exceeds 64 KiB (routine on any
720p+ stream).
* #define-only change in include/daedalus_v4l2_proto.h; struct
layout unchanged. But effective cap is min(kernel, daemon) —
lock-step install of this package WITH
daedalus-v4l2-dkms 0.1.0+r45+g872eec5 REQUIRED.
* Daemon-side allocations are dynamic (malloc-on-payload), so
the practical growth is one ~1 MiB read buffer per daemon
process at startup. Negligible on Pi 5 / 8 GB.
* Lands on the same r45+g872eec5 pin as daedalus-v4l2-dkms: this
package moves r43 -> r45, while the dkms package jumps r33 -> r45
(it had been stuck at r33+g5d8b436 since the parking-design
revert because the kernel module didn't change in r37/r39/r41/r43).
-- Markus Fritsche <mfritsche@reauktion.de> Fri, 22 May 2026 21:00:00 +0000
daedalus-v4l2 (0.1.0+r43+g1d8f5af-1) bookworm trixie; urgency=medium
* Bump to 1d8f5af — picks up daedalus-v4l2 PR #18 (closes #17).
Daemon now drops degenerate (<4 byte) bitstreams at the REQ_DECODE
entry instead of letting avcodec_send_packet return
AVERROR_INVALIDDATA. Reply RESP_FRAME with status=
DAEDALUS_DECODE_NO_FRAME so libva's V4L2 surface pool stays
healthy.
* Fixes the Firefox YouTube avc1 pause→resume regression observed
on higgs: libva-v4l2-request-fourier flushes a 3-byte stub
(presumably a bare NAL start code) into OUTPUT_MPLANE at the
pause boundary; the old INVALIDDATA error path made Firefox
fall off to libmozavcodec SW for the rest of the session. With
this filter the daemon logs the sentinel as 'tiny bitstream 3
bytes — dropping as no-op' and the next real REQ_DECODE
proceeds normally.
* Wire protocol unchanged. No daedalus-v4l2-dkms bump needed.
-- Markus Fritsche <mfritsche@reauktion.de> Fri, 22 May 2026 17:30:00 +0000
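The tiny-bitstream filter described in this entry amounts to a length check ahead of the decoder. A minimal sketch, assuming hypothetical names (`MIN_BITSTREAM_BYTES`, `decode_action`, `classify_bitstream`); the daemon's real REQ_DECODE handler and its reply plumbing are not shown here:

```c
#include <stddef.h>

/* Threshold taken from the changelog text ("<4 byte" is degenerate).
 * The macro name is hypothetical. */
#define MIN_BITSTREAM_BYTES 4

/* Hypothetical outcome of the pre-decode check. */
enum decode_action {
    DECODE_FORWARD,       /* hand the payload to the decoder          */
    DECODE_DROP_NO_FRAME  /* skip the decoder, reply "no frame" as-is */
};

/* Decide what to do with a payload of len bytes before decoding. */
static enum decode_action classify_bitstream(size_t len)
{
    return len < MIN_BITSTREAM_BYTES ? DECODE_DROP_NO_FRAME : DECODE_FORWARD;
}
```

Under this sketch the 3-byte stub seen at the pause boundary maps to `DECODE_DROP_NO_FRAME`, and the next full-sized payload goes through unchanged.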
daedalus-v4l2 (0.1.0+r41+g6e6dfa1-1) bookworm trixie; urgency=medium
* Bump to 6e6dfa1 — daedalus-v4l2 PR #16. Daemon dlopens Kwiboo
@@ -0,0 +1,107 @@
From 1b286ddb4efaca26ec9b9e290e989fec77dc1c77 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Fri, 22 May 2026 10:18:21 +0200
Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 8x8 IDCT through
daedalus-fourier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
H264DSPContext.idct8_add (called per 8x8 block from the High-profile
intra-8x8-DCT decode path in h264_mb.c) now dispatches through
daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.
The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8x8)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution layered on top of the cycle-6 IDCT 4x4 wiring. Same
pthread_once global context, same destructive-zero semantics; FFmpeg
column-major 8x8 storage block[r + 8*c] matches daedalus's convention.
Bulk path c->idct8_add4 (used for inter 8x8-DCT macroblocks) remains
on the in-tree NEON .S code and will be batched through
daedalus_recipe_dispatch_h264_idct8 with n_blocks>1 in a follow-up.
Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
green).
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 7.
---
libavcodec/aarch64/h264_idct_daedalus.c | 29 ++++++++++++++++-------
libavcodec/aarch64/h264dsp_init_aarch64.c | 3 ++-
2 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
index 538d223..cbb98af 100644
--- a/libavcodec/aarch64/h264_idct_daedalus.c
+++ b/libavcodec/aarch64/h264_idct_daedalus.c
@@ -1,14 +1,16 @@
/*
- * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
+ * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
*
- * Routes H264DSPContext.idct_add through
- * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
- * The recipe layer picks the substrate (CPU NEON by default for
- * cycle 6; future cycles may dispatch to V3D opportunistically).
+ * Routes H264DSPContext.idct_add → daedalus_recipe_dispatch_h264_idct4
+ * H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
+ * instead of the in-tree ff_h264_idct{,8}_add_neon assembly. The
+ * recipe layer picks the substrate (CPU NEON by default for cycles
+ * 6 + 7; future cycles may dispatch to V3D opportunistically).
*
- * FFmpeg's 4x4 block memory layout matches daedalus's column-major
- * convention: block[r + 4*c] = coefficient at (row r, col c). Both
- * sides destructively zero the block after the transform.
+ * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
+ * column-major convention: block[r + N*c] = coefficient at
+ * (row r, col c) for N ∈ {4, 8}. Both sides destructively zero the
+ * block after the transform.
*
* The library context is process-global and lazily initialised under
* pthread_once. We pick the no-QPU constructor here because
@@ -37,6 +39,7 @@ static void daedalus_ctx_init_once(void)
}
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
{
@@ -47,3 +50,13 @@ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
block, 1, &meta);
}
+
+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
+{
+ static const daedalus_h264_block_meta meta = { .dst_off = 0 };
+
+ pthread_once(&g_dctx_once, daedalus_ctx_init_once);
+
+ daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
+ block, 1, &meta);
+}
diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
index b993df2..741e551 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -79,6 +79,7 @@ void ff_h264_idct_add8_neon(uint8_t **dest, const int *block_offset,
const uint8_t nnzc[15 * 8]);
void ff_h264_idct8_add_neon(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct8_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct8_add4_neon(uint8_t *dst, const int *block_offset,
int16_t *block, int stride,
@@ -146,7 +147,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
c->idct_add16intra = ff_h264_idct_add16intra_neon;
if (chroma_format_idc <= 1)
c->idct_add8 = ff_h264_idct_add8_neon;
- c->idct8_add = ff_h264_idct8_add_neon;
+ c->idct8_add = ff_h264_idct8_add_daedalus;
c->idct8_dc_add = ff_h264_idct8_dc_add_neon;
c->idct8_add4 = ff_h264_idct8_add4_neon;
} else if (have_neon(cpu_flags) && bit_depth == 10) {
--
2.47.3
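The block[r + N*c] layout named in the patch above can be spelled out with a small accessor. This is only an illustration of the indexing rule as the patch describes it; `coeff_index` and `coeff_at` are hypothetical helpers, not FFmpeg or daedalus-fourier API:

```c
#include <stddef.h>
#include <stdint.h>

/* Linear index of the coefficient at (row r, col c) in an N x N block
 * stored as block[r + N*c], the convention described in the patch. */
static size_t coeff_index(size_t n, size_t r, size_t c)
{
    return r + n * c;
}

/* Read one coefficient from an N x N block using that convention. */
static int16_t coeff_at(const int16_t *block, size_t n, size_t r, size_t c)
{
    return block[coeff_index(n, r, c)];
}
```

For an 8x8 block, (row 3, col 2) lands at index 19; for a 4x4 block, (row 1, col 3) lands at index 13.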
@@ -0,0 +1,121 @@
From 68731c41d7ea68be0e912b128cb4e71fb56e8263 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Fri, 22 May 2026 12:15:16 +0200
Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 luma-v deblock through
daedalus-fourier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
H264DSPContext.v_loop_filter_luma (non-intra bS<4 vertical luma
deblock, called per macroblock-row edge from the slice deblock
loop) now dispatches through
daedalus_recipe_dispatch_h264_deblock_luma_v instead of
ff_h264_v_loop_filter_luma_neon.
The recipe layer picks the substrate; for cycle 8 the daedalus
docstring marks the kernel "CPU primary; QPU opportunistic", but
the libavcodec.so context here is built with
daedalus_ctx_create_no_qpu — process-global pthread_once init,
shared with cycles 6/7. QPU opportunism stays gated off until a
follow-up adds an explicit feature flag (no implicit Vulkan init
in arbitrary host processes). In the meantime cycle 8 is a
plumbing-only substitution, NEON-to-NEON via the daedalus recipe.
Intra (bS=4) loop filter — c->v_loop_filter_luma_intra — stays on
the in-tree NEON .S code; daedalus's daedalus_h264_deblock_meta
only covers the non-intra path per its docstring.
FFmpeg's `int alpha`, `int beta` and `int8_t tc0[4]` arguments map
onto daedalus_h264_deblock_meta (int32_t alpha/beta + inline
int8_t tc0[4]). pix already points to row 0 of the bottom block
per FFmpeg's deblock convention, satisfying daedalus's
`dst_off >= 4 * dst_stride` constraint.
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8.
---
libavcodec/aarch64/h264_idct_daedalus.c | 36 +++++++++++++++++++----
libavcodec/aarch64/h264dsp_init_aarch64.c | 4 ++-
2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
index cbb98af..92365fa 100644
--- a/libavcodec/aarch64/h264_idct_daedalus.c
+++ b/libavcodec/aarch64/h264_idct_daedalus.c
@@ -1,11 +1,14 @@
/*
- * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
+ * H.264 4x4 / 8x8 IDCT + luma-v deblock — daedalus-fourier substitution shims.
*
- * Routes H264DSPContext.idct_add → daedalus_recipe_dispatch_h264_idct4
- * H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
- * instead of the in-tree ff_h264_idct{,8}_add_neon assembly. The
- * recipe layer picks the substrate (CPU NEON by default for cycles
- * 6 + 7; future cycles may dispatch to V3D opportunistically).
+ * Routes H264DSPContext.idct_add → daedalus_recipe_dispatch_h264_idct4
+ * H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
+ * H264DSPContext.v_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_v
+ * instead of the in-tree ff_h264_*_neon assembly. The recipe layer
+ * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
+ * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
+ * so cycle 8 stays on the CPU NEON path until a separate change
+ * gates QPU init on a daedalus-fourier feature flag).
*
* FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
* column-major convention: block[r + N*c] = coefficient at
@@ -40,6 +43,8 @@ static void daedalus_ctx_init_once(void)
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
+ int alpha, int beta, int8_t *tc0);
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
{
@@ -60,3 +65,22 @@ void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
block, 1, &meta);
}
+
+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
+ int alpha, int beta, int8_t *tc0)
+{
+ daedalus_h264_deblock_meta meta = {
+ .dst_off = 0,
+ .alpha = alpha,
+ .beta = beta,
+ };
+ meta.tc0[0] = tc0[0];
+ meta.tc0[1] = tc0[1];
+ meta.tc0[2] = tc0[2];
+ meta.tc0[3] = tc0[3];
+
+ pthread_once(&g_dctx_once, daedalus_ctx_init_once);
+
+ daedalus_recipe_dispatch_h264_deblock_luma_v(g_dctx, pix, (size_t)stride,
+ 1, &meta);
+}
diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
index 741e551..85ac381 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -27,6 +27,8 @@
void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
int beta, int8_t *tc0);
+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
+ int alpha, int beta, int8_t *tc0);
void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
int beta, int8_t *tc0);
void ff_h264_v_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
@@ -114,7 +116,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
int cpu_flags = av_get_cpu_flags();
if (have_neon(cpu_flags) && bit_depth == 8) {
- c->v_loop_filter_luma = ff_h264_v_loop_filter_luma_neon;
+ c->v_loop_filter_luma = ff_h264_v_loop_filter_luma_daedalus;
c->h_loop_filter_luma = ff_h264_h_loop_filter_luma_neon;
c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
--
2.47.3
@@ -0,0 +1,82 @@
From 0d1292ea99bc4e5fa2da438259fa01a2374e3e04 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Fri, 22 May 2026 14:18:25 +0200
Subject: [PATCH] avcodec/h264: restore AV_CODEC_FLAG_LOW_DELAY semantics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
FFmpeg 8.x dropped the H.264 decoder's low_delay path —
AV_CODEC_FLAG_LOW_DELAY no longer prevents
h264_select_output_frame from running the display-order DPB
output queue. V4L2-stateless-style consumers (daedalus-v4l2
daemon, libva-v4l2-request-fourier) that set the flag end up
seeing the 2-1-4-3 pair-swap pattern on B-frame streams again.
Restore the documented semantics:
- Early-exit at the top of h264_select_output_frame when the
flag is set: emit the just-decoded picture immediately as
next_output_pic, mirror the corruption / recovery-point
tracking the main path performs, and skip the entire
delayed_pic[] / POC reorder machinery.
- Suppress the SPS-driven has_b_frames clobber in
h264_field_start when the flag is set, so the per-slice
bitstream_restriction_flag re-pickup cannot reintroduce a
nonzero reorder buffer mid-stream.
This is a fork-only change required by the daedalus-v4l2 daemon's
one-frame-per-send_packet contract; upstream FFmpeg consumers that
expect display-order output remain untouched (flag default = off).
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 deblock
+ flag-restoration follow-up.
---
libavcodec/h264_slice.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
index 97fab70..a7bfbd6 100644
--- a/libavcodec/h264_slice.c
+++ b/libavcodec/h264_slice.c
@@ -1308,6 +1308,28 @@ static int h264_select_output_frame(H264Context *h)
cur->mmco_reset = h->mmco_reset;
h->mmco_reset = 0;
+ /* AV_CODEC_FLAG_LOW_DELAY restore (FFmpeg 8.x dropped the H.264
+ * decoder's low_delay path). Bypass the display-order DPB
+ * output queue: emit the just-decoded picture immediately, in
+ * decode order, one per send_packet. V4L2-stateless-style
+ * consumers (daedalus-v4l2 daemon, libva-v4l2-request-fourier)
+ * do their own POC-based reorder downstream and require this
+ * behaviour. */
+ if (h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) {
+ h->next_output_pic = cur;
+ h->next_outputed_poc = cur->poc;
+ h->frame_recovered |= cur->recovered;
+ cur->recovered |= h->frame_recovered & FRAME_RECOVERED_SEI;
+ if (!cur->recovered) {
+ if (!(h->avctx->flags & AV_CODEC_FLAG_OUTPUT_CORRUPT) &&
+ !(h->avctx->flags2 & AV_CODEC_FLAG2_SHOW_ALL))
+ h->next_output_pic = NULL;
+ else
+ cur->f->flags |= AV_FRAME_FLAG_CORRUPT;
+ }
+ return 0;
+ }
+
if (sps->bitstream_restriction_flag ||
h->avctx->strict_std_compliance >= FF_COMPLIANCE_STRICT) {
h->avctx->has_b_frames = FFMAX(h->avctx->has_b_frames, sps->num_reorder_frames);
@@ -1415,6 +1437,7 @@ static int h264_field_start(H264Context *h, const H264SliceContext *sl,
sps = h->ps.sps;
if (sps->bitstream_restriction_flag &&
+ !(h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) &&
h->avctx->has_b_frames < sps->num_reorder_frames) {
h->avctx->has_b_frames = sps->num_reorder_frames;
}
--
2.47.3
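With the low-delay path restored, frames leave the decoder in decode order and the consumer is expected to do its own POC-based reorder. A minimal sketch of such a reorder, assuming a hypothetical `struct frame` and sorting a whole group at once; a real consumer would more likely use a bounded window and reset at IDR boundaries:

```c
#include <stdlib.h>

/* Hypothetical frame record: position in decode order plus its POC. */
struct frame {
    int decode_idx;
    int poc;
};

/* qsort comparator: ascending picture order count. */
static int cmp_poc(const void *a, const void *b)
{
    const struct frame *fa = a, *fb = b;
    return (fa->poc > fb->poc) - (fa->poc < fb->poc);
}

/* Put n frames, received in decode order, into display (POC) order. */
static void reorder_by_poc(struct frame *f, size_t n)
{
    qsort(f, n, sizeof *f, cmp_poc);
}
```

For a decode-order sequence with POCs 0, 4, 2, 8, 6, the display order comes out as decode indices 0, 2, 1, 4, 3.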
@@ -0,0 +1,139 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Sat, 23 May 2026 12:00:00 +0200
Subject: [PATCH] avcodec/aarch64/h264qpel: route 8x8 mc20 through
daedalus-fourier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma horizontal
half-pel, 6-tap "put" variant — the canonical representative of the
H.264 luma motion-compensation family) now dispatches through
daedalus_recipe_dispatch_h264_qpel_mc20 instead of
ff_put_h264_qpel8_mc20_neon.
Cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes the
4-cycle libavcodec.so substitution sequence (6 IDCT 4x4 / 7 IDCT 8x8 /
8 luma-v deblock / 9 qpel mc20).
The recipe layer picks the substrate. Per docs/k9_h264qpel_mc20.md
the verdict is CPU NEON: per-block 7.6 ns at 131 Mblock/s gives 135x
margin over 30 fps 1080p, and the QPU dispatch floor (~250 ns)
makes any V3D shader strictly worse. Substitution is plumbing-only,
NEON-by-recipe — same daedalus_ctx_create_no_qpu pthread_once
context shape the cycles 6/7/8 shims already own (kept SEPARATE
from the H264DSP shim's ctx because H264QPEL is its own libavcodec
Makefile module and link order does not guarantee a single .o
owns the ctx symbol; one extra ~µs init per process, paid lazily).
Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the 16x16
size tier stay on the in-tree NEON .S code. Per the cycle-9 phase-1
rationale, mc20 8x8 is representative of the whole family's per-block
cost — extending the substitution to other variants would multiply
recipe-lookup overhead without changing the substrate verdict.
Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier
cycle 9 green; M1 = 100% bit-exact across 10000 random blocks).
No SONAME change, no Depends change.
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 9.
---
libavcodec/aarch64/Makefile | 3 +-
libavcodec/aarch64/h264_qpel_daedalus.c | 50 ++++++++++++++++++++++
libavcodec/aarch64/h264qpel_init_aarch64.c | 4 +-
3 files changed, 55 insertions(+), 2 deletions(-)
create mode 100644 libavcodec/aarch64/h264_qpel_daedalus.c
diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -7,7 +7,8 @@ OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_init_aarch64.o \
aarch64/h264_idct_daedalus.o
OBJS-$(CONFIG_HUFFYUVDSP) += aarch64/huffyuvdsp_init_aarch64.o
OBJS-$(CONFIG_H264PRED) += aarch64/h264pred_init.o
-OBJS-$(CONFIG_H264QPEL) += aarch64/h264qpel_init_aarch64.o
+OBJS-$(CONFIG_H264QPEL) += aarch64/h264qpel_init_aarch64.o \
+ aarch64/h264_qpel_daedalus.o
OBJS-$(CONFIG_HPELDSP) += aarch64/hpeldsp_init_aarch64.o
OBJS-$(CONFIG_IDCTDSP) += aarch64/idctdsp_init_aarch64.o
OBJS-$(CONFIG_ME_CMP) += aarch64/me_cmp_init_aarch64.o
diff --git a/libavcodec/aarch64/h264_qpel_daedalus.c b/libavcodec/aarch64/h264_qpel_daedalus.c
new file mode 100644
--- /dev/null
+++ b/libavcodec/aarch64/h264_qpel_daedalus.c
@@ -0,0 +1,50 @@
+/*
+ * H.264 luma qpel mc20 (8x8, horizontal half-pel, 6-tap "put")
+ * — daedalus-fourier substitution shim.
+ *
+ * Routes H264QpelContext.put_h264_qpel_pixels_tab[1][2] through
+ * daedalus_recipe_dispatch_h264_qpel_mc20 instead of
+ * ff_put_h264_qpel8_mc20_neon. The recipe layer picks the substrate
+ * (CPU NEON for cycle 9; QPU not viable — per-block 7.6 ns vs
+ * ~250 ns QPU dispatch floor, see docs/k9_h264qpel_mc20.md).
+ *
+ * Sibling to libavcodec/aarch64/h264_idct_daedalus.c. We keep a
+ * SEPARATE process-global pthread_once context here instead of
+ * sharing the H264DSP one because H264QPEL is its own libavcodec
+ * Makefile module and link order does not guarantee a single .o
+ * owns the ctx symbol. The cost is one extra
+ * daedalus_ctx_create_no_qpu (~µs) per process; daemon and host
+ * processes pay this lazily on first MC call.
+ *
+ * FFmpeg H264QpelContext convention: both dst and src use a SINGLE
+ * stride and `src` already points at the leftmost OUTPUT column
+ * (col 0); the 6-tap filter reads cols -2..+3. This matches
+ * daedalus_recipe_dispatch_h264_qpel_mc20's documented contract
+ * directly, so dst_off = src_off = 0.
+ */
+
+#include <pthread.h>
+#include <stddef.h>
+#include <stdint.h>
+
+#include <daedalus.h>
+
+#include "libavutil/attributes.h"
+
+static daedalus_ctx *g_dctx;
+static pthread_once_t g_dctx_once = PTHREAD_ONCE_INIT;
+
+static void daedalus_ctx_init_once(void)
+{
+ g_dctx = daedalus_ctx_create_no_qpu();
+}
+
+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
+
+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
+{
+ static const daedalus_h264_qpel_meta meta = { .dst_off = 0, .src_off = 0 };
+ pthread_once(&g_dctx_once, daedalus_ctx_init_once);
+ daedalus_recipe_dispatch_h264_qpel_mc20(g_dctx, dst, src, (size_t)stride,
+ 1, &meta);
+}
diff --git a/libavcodec/aarch64/h264qpel_init_aarch64.c b/libavcodec/aarch64/h264qpel_init_aarch64.c
--- a/libavcodec/aarch64/h264qpel_init_aarch64.c
+++ b/libavcodec/aarch64/h264qpel_init_aarch64.c
@@ -47,6 +47,8 @@ void ff_put_h264_qpel8_mc00_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t str
void ff_put_h264_qpel8_mc10_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
void ff_put_h264_qpel8_mc20_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
void ff_put_h264_qpel8_mc30_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src,
+ ptrdiff_t stride);
void ff_put_h264_qpel8_mc01_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
void ff_put_h264_qpel8_mc11_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
void ff_put_h264_qpel8_mc21_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
@@ -184,7 +186,7 @@ av_cold void ff_h264qpel_init_aarch64(H264QpelContext *c, int bit_depth)
c->put_h264_qpel_pixels_tab[1][ 0] = ff_put_h264_qpel8_mc00_neon;
c->put_h264_qpel_pixels_tab[1][ 1] = ff_put_h264_qpel8_mc10_neon;
- c->put_h264_qpel_pixels_tab[1][ 2] = ff_put_h264_qpel8_mc20_neon;
+ c->put_h264_qpel_pixels_tab[1][ 2] = ff_put_h264_qpel8_mc20_daedalus;
c->put_h264_qpel_pixels_tab[1][ 3] = ff_put_h264_qpel8_mc30_neon;
c->put_h264_qpel_pixels_tab[1][ 4] = ff_put_h264_qpel8_mc01_neon;
c->put_h264_qpel_pixels_tab[1][ 5] = ff_put_h264_qpel8_mc11_neon;
--
2.47.3
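For reference, the operation the mc20 slot performs can be written as a scalar loop. This sketch uses the standard H.264 6-tap luma half-pel filter (1, -5, 20, 20, -5, 1) with +16 rounding and a shift by 5; `put_qpel8_mc20_ref` and `clip_u8` are hypothetical names, and this is neither the NEON kernel nor the daedalus-fourier implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit sample range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar 8x8 "put" for the horizontal half-pel position (mc20).
 * src points at output column 0; each output reads columns -2..+3.
 * dst and src share a single stride, as in the shim above.
 * Relies on arithmetic right shift for negative sums, as FFmpeg's
 * C reference code does. */
static void put_qpel8_mc20_ref(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int v = src[x - 2] - 5 * src[x - 1] + 20 * src[x]
                  + 20 * src[x + 1] - 5 * src[x + 2] + src[x + 3];
            dst[x] = clip_u8((v + 16) >> 5);
        }
        dst += stride;
        src += stride;
    }
}
```

A flat input of value 100 reproduces 100 everywhere, and a hard 0-to-255 edge between columns 0 and 1 yields 128 at the half-pel position, which is the kind of check a bit-exactness harness would run against the NEON routine.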
+16 -11
@@ -33,18 +33,19 @@ FFMPEG_VERSION=8.1
# epoch 2 matches Debian's stock ffmpeg (currently 7:7.1.x in trixie);
# +rfourier suffix to avoid colliding with upstream/Debian rebuilds.
PKGVER=2:${FFMPEG_VERSION}+rfourier+gb57fbbe
-PKGREL=6 # pkgrel=6: drop --enable-libxml2 to avoid runner/target libxml2
-# SOVERSION skew (runner has libxml2 ≥ 2.14 = SONAME 16; trixie
-# has 2.12 = SONAME 2; -5 .deb dlopens libavformat → fails on
-# "libxml2.so.16: cannot open shared object"). Neither the
-# daedalus-v4l2 daemon (direct AVPacket feed) nor mpv-fourier
-# nor firefox-fourier consumers need FFmpeg's DASH demuxer.
-# (2026-05-21)
+PKGREL=10 # pkgrel=10: H.264 luma qpel mc20 daedalus-fourier substitution
+# (cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes
+# the libavcodec.so substitution sequence 6 IDCT4 / 7 IDCT8 /
+# 8 luma-v deblock / 9 qpel mc20). Pulls daedalus-fourier PR #2
+# which extends the public API with
+# daedalus_recipe_dispatch_h264_qpel_mc20. (2026-05-23)
-# daedalus-fourier pin — first kernel substitution in libavcodec (cycle 6
-# H.264 IDCT 4x4). Same SHA as the daedalus-v4l2 daemon already ships
-# inline; rev in lockstep with the daemon when the public API rolls.
-DAEDALUS_FOURIER_COMMIT=d87239d8172307d9a1b93c95cbed116d175b85cc
+# daedalus-fourier pin. 209a421 = daedalus-fourier PR #2 merge — public
+# API now exposes daedalus_recipe_dispatch_h264_qpel_mc20 +
+# DAEDALUS_KERNEL_H264_QPEL_MC20. Cycle 9 plumbs the last H.264 NEON
+# kernel through the recipe layer. Daemon-side build (debian/daedalus-v4l2)
+# can bump in a follow-up; this PR only changes the libavcodec.so consumer.
+DAEDALUS_FOURIER_COMMIT=209a4218bcb98b91c04f07ad61513bb04adb13ad
HERE=$(dirname "$(readlink -f "$0")")
@@ -69,6 +70,10 @@ fi
patch -Np1 -i "$HERE/0001-libudev-bypass-fallback.patch"
patch -Np1 -i "$HERE/0002-nv15-to-p010-unpack.patch"
patch -Np1 -i "$HERE/0003-h264-idct4-daedalus-fourier.patch"
+patch -Np1 -i "$HERE/0004-h264-idct8-daedalus-fourier.patch"
+patch -Np1 -i "$HERE/0005-h264-deblock-luma-v-daedalus-fourier.patch"
+patch -Np1 -i "$HERE/0006-h264-restore-low-delay.patch"
+patch -Np1 -i "$HERE/0007-h264-qpel-mc20-daedalus-fourier.patch"
# --- daedalus-fourier: fetch + build static .a with PIC, install to a
# per-build prefix; libavcodec.so links it into the shared object so
+102
@@ -1,3 +1,105 @@
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-10) bookworm trixie; urgency=medium
* Add 0007-h264-qpel-mc20-daedalus-fourier.patch —
H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma
horizontal half-pel, 6-tap "put" — the canonical representative
of the H.264 luma motion-compensation family) now dispatches
through daedalus_recipe_dispatch_h264_qpel_mc20 instead of
ff_put_h264_qpel8_mc20_neon. Cycle 9 of the daedalus-v4l2#11
step 2 substitution arc; closes the 4-cycle libavcodec.so
substitution sequence (6 IDCT4 / 7 IDCT8 / 8 luma-v deblock /
9 qpel mc20).
* Bumps daedalus-fourier pin d87239d → 209a421 (PR #2 — public
API extended with daedalus_recipe_dispatch_h264_qpel_mc20 +
DAEDALUS_KERNEL_H264_QPEL_MC20).
* Cycle 9 is "CPU primary; QPU pointless" per
docs/k9_h264qpel_mc20.md. Per-block 7.6 ns at 131 Mblock/s
gives 135x margin over 30 fps 1080p; QPU dispatch floor at
~250 ns makes any V3D shader strictly worse. Substitution
is plumbing-only, NEON-by-recipe — same
daedalus_ctx_create_no_qpu pthread_once shape the cycles 6/7/8
shims already own (kept SEPARATE from the H264DSP shim's ctx
because H264QPEL is its own libavcodec Makefile module and
link order does not guarantee a single .o owns the ctx symbol;
one extra ~µs init per process, paid lazily on first MC call).
* Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the
16x16 size tier stay on the in-tree NEON .S code. Per the
cycle-9 phase-1 rationale, mc20 8x8 is representative of the
whole family's per-block cost.
* Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier
cycle 9 green; 10000/10000 random blocks).
* No SONAME change, no Depends change.
-- Markus Fritsche <mfritsche@reauktion.de> Sat, 23 May 2026 12:00:00 +0000
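The throughput and margin figures quoted in this entry can be cross-checked with a few lines of arithmetic. The sketch below assumes a worst case of one mc20 call per 8x8 luma block per frame, which is a simplification chosen here for illustration; docs/k9_h264qpel_mc20.md may derive the margin differently, and all function names are hypothetical:

```c
/* Blocks per second implied by a per-block cost in nanoseconds. */
static double throughput_from_ns(double ns_per_block)
{
    return 1e9 / ns_per_block;
}

/* 8x8 blocks per second needed to touch every luma sample of a
 * w x h stream once per frame (assumed worst case). */
static double blocks_per_second(int w, int h, int fps)
{
    return (double)w * h / 64.0 * fps;
}

/* Headroom factor of a kernel over that requirement. */
static double margin(double kernel_blocks_per_s, int w, int h, int fps)
{
    return kernel_blocks_per_s / blocks_per_second(w, h, fps);
}
```

Under these assumptions 7.6 ns per block gives roughly 131.6 Mblock/s, 1080p at 30 fps needs 972000 blocks/s, and 131 Mblock/s divided by that is about 135, consistent with the quoted margin.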
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-9) bookworm trixie; urgency=medium
* Add 0006-h264-restore-low-delay.patch — restore the documented
AV_CODEC_FLAG_LOW_DELAY semantics in the H.264 decoder. FFmpeg
8.x dropped the H.264 low_delay code path entirely; setting the
flag at avcodec_open2 no longer prevents the display-order DPB
output queue from running. Visible on Firefox YouTube as the
2-1-4-3 B-frame pair-swap, re-introduced silently by the
SONAME 61→62 jump in daedalus-v4l2 PR #16.
* h264_select_output_frame: early-exit when LOW_DELAY is set;
emit the just-decoded picture as next_output_pic, mirror the
corruption / recovery-point tracking, skip delayed_pic[] and
the POC reorder machinery entirely.
* h264_field_start: suppress the SPS-driven
has_b_frames = sps->num_reorder_frames clobber when LOW_DELAY
is set — without this the per-slice bitstream_restriction_flag
re-pickup would reintroduce a nonzero reorder buffer mid-
stream.
* Restores the same one-frame-per-send_packet contract the
daedalus-v4l2 daemon's decoder.c already relies on (the flag
is set unconditionally for H.264). No daemon side change.
* No SONAME change, no Depends change.
-- Markus Fritsche <mfritsche@reauktion.de> Fri, 22 May 2026 13:30:00 +0000
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-8) bookworm trixie; urgency=medium
* Add 0005-h264-deblock-luma-v-daedalus-fourier.patch —
H264DSPContext.v_loop_filter_luma (non-intra bS<4 vertical luma
deblock, called per macroblock-row edge from the slice deblock
loop in libavcodec/h264_loopfilter.c) now dispatches through
daedalus_recipe_dispatch_h264_deblock_luma_v instead of
ff_h264_v_loop_filter_luma_neon. Cycle 8 of the daedalus-v4l2#11
step 2 substitution arc.
* Cycle 8 is marked "CPU primary; QPU opportunistic" in
daedalus-fourier, but the libavcodec.so context here uses
daedalus_ctx_create_no_qpu (process-global pthread_once,
shared with cycles 6/7). Opportunistic QPU is deferred to a
separate change that gates Vulkan init on a feature flag, to
avoid implicit Vulkan init in arbitrary host processes. For
now cycle 8 is plumbing-only — NEON-by-recipe.
* Intra (bS=4) loop filter c->v_loop_filter_luma_intra stays on
the in-tree NEON .S code; daedalus's daedalus_h264_deblock_meta
only covers the non-intra path per its API docstring.
* Bit-exact against ff_h264_v_loop_filter_luma_neon (daedalus-fourier
cycle 8 green).
* No SONAME change, no Depends change.
-- Markus Fritsche <mfritsche@reauktion.de> Fri, 22 May 2026 12:30:00 +0000
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-7) bookworm trixie; urgency=medium
* Add 0004-h264-idct8-daedalus-fourier.patch — H264DSPContext.idct8_add
(per-block 8x8 IDCT, called from the High-profile intra-8x8-DCT
macroblock path in libavcodec/h264_mb.c) now dispatches through
daedalus_recipe_dispatch_h264_idct8 instead of
ff_h264_idct8_add_neon. Cycle 7 of the daedalus-v4l2#11 step 2
substitution arc — NEON-by-recipe, same pthread_once context the
cycle-6 IDCT 4x4 shim already owns.
* Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
green; FFmpeg 8x8 block storage block[r + 8*c] matches daedalus
column-major convention).
* Bulk c->idct8_add4 (inter 8x8-DCT macroblocks) stays on the
in-tree NEON .S code; batched substitution lands later.
* No SONAME change, no Depends change.
-- Markus Fritsche <mfritsche@reauktion.de> Fri, 22 May 2026 10:30:00 +0000
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-6) bookworm trixie; urgency=medium
* Drop --enable-libxml2 + libxml2 Depends — the Gitea