forked from marfrit/marfrit-packages
Compare commits
10 Commits
| Author | SHA1 | Date |
|---|---|---|
| | d449ec1073 | |
| | 9d30c34be9 | |
| | 1ca18ac130 | |
| | cf9eef6cfa | |
| | 5c69460722 | |
| | d11a52405d | |
| | 29e0852d11 | |
| | 510a31622c | |
| | db9ae16da9 | |
| | 7ecbcb3c1b | |
@@ -23,10 +23,10 @@ _module=daedalus_v4l2
 # content-equivalent to f0d4186 plus PR #4 (cosmetic menu ctrls).
 # PROTO_VERSION drops 1 → 0; lock-step install with
 # daedalus-v4l2 0.1.0.r33.5d8b436 REQUIRED.
-_commit=5d8b4369e58ab947d1c56b1f718293c57c6065b5
+_commit=872eec505eb91b561892d02a0526749348ddc121
 
-pkgver=0.1.0.r33.5d8b436
-pkgrel=1 # reset for new upstream pin (5d8b436 — revert parking design)
+pkgver=0.1.0.r45.872eec5
+pkgrel=1 # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2 0.1.0.r45.872eec5 REQUIRED
 pkgdesc="V4L2 stateless decoder shim kernel module (DKMS) — Pi 5 / CM5"
 arch=('any')
 url="https://git.reauktion.de/reauktion/daedalus-v4l2"
@@ -23,12 +23,12 @@ _upstreampkg=daedalus-v4l2
 # (daedalus-v4l2#11). Daemon still needs daedalus-fourier at
 # build time (Arch packaging for that is a follow-up; Debian side
 # fetches inline via build-deb.sh).
-_commit=6e6dfa144da7bc7fa8be50c8da91d7d1c6132a2c
+_commit=872eec505eb91b561892d02a0526749348ddc121
 
 # 0.1.0 (pre-1.0) + commit count + short sha. Bump the .Y on each
 # Phase 8.x close. pkgver() recomputes at build time.
-pkgver=0.1.0.r41.6e6dfa1
-pkgrel=1 # reset for new upstream pin (6e6dfa1 — soname 62 via /opt/fourier)
+pkgver=0.1.0.r45.872eec5
+pkgrel=1 # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2-dkms 0.1.0.r45.872eec5 REQUIRED
 pkgdesc="Userspace daemon for the daedalus-v4l2 V4L2 stateless decoder shim (VP9/AV1/H.264 on Pi 5 / CM5)"
 arch=('aarch64')
 url="https://git.reauktion.de/reauktion/daedalus-v4l2"
@@ -0,0 +1,121 @@
+From 68731c41d7ea68be0e912b128cb4e71fb56e8263 Mon Sep 17 00:00:00 2001
+From: Markus Fritsche <mfritsche@reauktion.de>
+Date: Fri, 22 May 2026 12:15:16 +0200
+Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 luma-v deblock through
+ daedalus-fourier
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+H264DSPContext.v_loop_filter_luma (non-intra bS<4 vertical luma
+deblock, called per macroblock-row edge from the slice deblock
+loop) now dispatches through
+daedalus_recipe_dispatch_h264_deblock_luma_v instead of
+ff_h264_v_loop_filter_luma_neon.
+
+The recipe layer picks the substrate; for cycle 8 the daedalus
+docstring marks the kernel "CPU primary; QPU opportunistic", but
+the libavcodec.so context here is built with
+daedalus_ctx_create_no_qpu — process-global pthread_once init,
+shared with cycles 6/7. QPU opportunism stays gated off until a
+follow-up adds an explicit feature flag (no implicit Vulkan init
+in arbitrary host processes). In the meantime cycle 8 is a
+plumbing-only substitution, NEON-to-NEON via the daedalus recipe.
+
+Intra (bS=4) loop filter — c->v_loop_filter_luma_intra — stays on
+the in-tree NEON .S code; daedalus's daedalus_h264_deblock_meta
+only covers the non-intra path per its docstring.
+
+FFmpeg `int alpha/beta/int8_t tc0[4]` → daedalus_h264_deblock_meta
+(int32_t alpha/beta + inline int8_t tc0[4]). pix already points
+to row 0 of the bottom block per FFmpeg's deblock convention,
+satisfying daedalus's `dst_off >= 4 * dst_stride` constraint.
+
+Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8.
+---
+ libavcodec/aarch64/h264_idct_daedalus.c | 36 +++++++++++++++++++----
+ libavcodec/aarch64/h264dsp_init_aarch64.c | 4 ++-
+ 2 files changed, 33 insertions(+), 7 deletions(-)
+
+diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
+index cbb98af..92365fa 100644
+--- a/libavcodec/aarch64/h264_idct_daedalus.c
++++ b/libavcodec/aarch64/h264_idct_daedalus.c
+@@ -1,11 +1,14 @@
+ /*
+- * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
++ * H.264 4x4 / 8x8 IDCT + luma-v deblock — daedalus-fourier substitution shims.
+  *
+- * Routes H264DSPContext.idct_add → daedalus_recipe_dispatch_h264_idct4
+- * H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
+- * instead of the in-tree ff_h264_idct{,8}_add_neon assembly. The
+- * recipe layer picks the substrate (CPU NEON by default for cycles
+- * 6 + 7; future cycles may dispatch to V3D opportunistically).
++ * Routes H264DSPContext.idct_add → daedalus_recipe_dispatch_h264_idct4
++ * H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
++ * H264DSPContext.v_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_v
++ * instead of the in-tree ff_h264_*_neon assembly. The recipe layer
++ * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
++ * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
++ * so cycle 8 stays on the CPU NEON path until a separate change
++ * gates QPU init on a daedalus-fourier feature flag).
+  *
+  * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
+  * column-major convention: block[r + N*c] = coefficient at
+@@ -40,6 +43,8 @@ static void daedalus_ctx_init_once(void)
+
+ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+ void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
++void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
++                                         int alpha, int beta, int8_t *tc0);
+
+ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
+ {
+@@ -60,3 +65,22 @@ void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
+     daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
+                                         block, 1, &meta);
+ }
++
++void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
++                                         int alpha, int beta, int8_t *tc0)
++{
++    daedalus_h264_deblock_meta meta = {
++        .dst_off = 0,
++        .alpha = alpha,
++        .beta = beta,
++    };
++    meta.tc0[0] = tc0[0];
++    meta.tc0[1] = tc0[1];
++    meta.tc0[2] = tc0[2];
++    meta.tc0[3] = tc0[3];
++
++    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
++
++    daedalus_recipe_dispatch_h264_deblock_luma_v(g_dctx, pix, (size_t)stride,
++                                                 1, &meta);
++}
+diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
+index 741e551..85ac381 100644
+--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
++++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
+@@ -27,6 +27,8 @@
+
+ void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                      int beta, int8_t *tc0);
++void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
++                                         int alpha, int beta, int8_t *tc0);
+ void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                      int beta, int8_t *tc0);
+ void ff_h264_v_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
+@@ -114,7 +116,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
+     int cpu_flags = av_get_cpu_flags();
+
+     if (have_neon(cpu_flags) && bit_depth == 8) {
+-        c->v_loop_filter_luma = ff_h264_v_loop_filter_luma_neon;
++        c->v_loop_filter_luma = ff_h264_v_loop_filter_luma_daedalus;
+         c->h_loop_filter_luma = ff_h264_h_loop_filter_luma_neon;
+         c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
+         c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
+--
+2.47.3
+
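Review note: the shim in this patch combines two steps, packing FFmpeg-style arguments into a metadata struct and creating the shared context lazily through `pthread_once`. The following is a minimal sketch of that shape only, under the assumption that the dispatch call takes a context, a pixel pointer, a stride, a count and a metadata pointer as in the hunk. Every `sketch_*` name, the struct layout and the recording stub are hypothetical stand-ins and are not the daedalus-fourier API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the daedalus-fourier types; the real
 * definitions live in that library's headers and may differ. */
typedef struct { int unused; } sketch_ctx;
typedef struct {
    size_t  dst_off;
    int32_t alpha;
    int32_t beta;
    int8_t  tc0[4];
} sketch_deblock_meta;

static sketch_ctx          g_ctx_storage;
static sketch_ctx         *g_ctx;
static pthread_once_t      g_ctx_once = PTHREAD_ONCE_INIT;
static int                 g_init_calls;  /* instrumentation for the sketch */
static sketch_deblock_meta g_last_meta;   /* what the stub dispatch last saw */

/* Runs at most once per process, however many threads race into the shim. */
static void sketch_ctx_init_once(void)
{
    g_init_calls++;
    g_ctx = &g_ctx_storage;
}

/* Stub dispatch: records the metadata instead of filtering pixels. */
static void sketch_dispatch_deblock_luma_v(sketch_ctx *ctx, uint8_t *pix,
                                           size_t stride, int n,
                                           const sketch_deblock_meta *meta)
{
    (void)pix; (void)stride; (void)n;
    assert(ctx != NULL);
    g_last_meta = *meta;
}

/* Same shape as the shim in the patch: FFmpeg-style arguments in,
 * packed metadata out, context created lazily exactly once. */
void sketch_v_loop_filter_luma(uint8_t *pix, ptrdiff_t stride,
                               int alpha, int beta, const int8_t *tc0)
{
    sketch_deblock_meta meta = { .dst_off = 0, .alpha = alpha, .beta = beta };
    memcpy(meta.tc0, tc0, sizeof(meta.tc0));

    pthread_once(&g_ctx_once, sketch_ctx_init_once);
    sketch_dispatch_deblock_luma_v(g_ctx, pix, (size_t)stride, 1, &meta);
}
```

Calling `sketch_v_loop_filter_luma` repeatedly should leave `g_init_calls` at 1, which is the property the commit message relies on when it says the context is shared with cycles 6/7.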
@@ -0,0 +1,82 @@
+From 0d1292ea99bc4e5fa2da438259fa01a2374e3e04 Mon Sep 17 00:00:00 2001
+From: Markus Fritsche <mfritsche@reauktion.de>
+Date: Fri, 22 May 2026 14:18:25 +0200
+Subject: [PATCH] avcodec/h264: restore AV_CODEC_FLAG_LOW_DELAY semantics
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+FFmpeg 8.x dropped the H.264 decoder's low_delay path —
+AV_CODEC_FLAG_LOW_DELAY no longer prevents
+h264_select_output_frame from running the display-order DPB
+output queue. V4L2-stateless-style consumers (daedalus-v4l2
+daemon, libva-v4l2-request-fourier) that set the flag end up
+seeing the 2-1-4-3 pair-swap pattern on B-frame streams again.
+
+Restore the documented semantics:
+
+- Early-exit at the top of h264_select_output_frame when the
+  flag is set: emit the just-decoded picture immediately as
+  next_output_pic, mirror the corruption / recovery-point
+  tracking the main path performs, and skip the entire
+  delayed_pic[] / POC reorder machinery.
+
+- Suppress the SPS-driven has_b_frames clobber in
+  h264_field_start when the flag is set, so the per-slice
+  bitstream_restriction_flag re-pickup cannot reintroduce a
+  nonzero reorder buffer mid-stream.
+
+This is a fork-only change required by the daedalus-v4l2 daemon's
+one-frame-per-send_packet contract; upstream FFmpeg consumers that
+expect display-order output remain untouched (flag default = off).
+
+Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 deblock
++ flag-restoration follow-up.
+---
+ libavcodec/h264_slice.c | 23 +++++++++++++++++++++++
+ 1 file changed, 23 insertions(+)
+
+diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
+index 97fab70..a7bfbd6 100644
+--- a/libavcodec/h264_slice.c
++++ b/libavcodec/h264_slice.c
+@@ -1308,6 +1308,28 @@ static int h264_select_output_frame(H264Context *h)
+     cur->mmco_reset = h->mmco_reset;
+     h->mmco_reset = 0;
+
++    /* AV_CODEC_FLAG_LOW_DELAY restore (FFmpeg 8.x dropped the H.264
++     * decoder's low_delay path). Bypass the display-order DPB
++     * output queue: emit the just-decoded picture immediately, in
++     * decode order, one per send_packet. V4L2-stateless-style
++     * consumers (daedalus-v4l2 daemon, libva-v4l2-request-fourier)
++     * do their own POC-based reorder downstream and require this
++     * behaviour. */
++    if (h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) {
++        h->next_output_pic = cur;
++        h->next_outputed_poc = cur->poc;
++        h->frame_recovered |= cur->recovered;
++        cur->recovered |= h->frame_recovered & FRAME_RECOVERED_SEI;
++        if (!cur->recovered) {
++            if (!(h->avctx->flags & AV_CODEC_FLAG_OUTPUT_CORRUPT) &&
++                !(h->avctx->flags2 & AV_CODEC_FLAG2_SHOW_ALL))
++                h->next_output_pic = NULL;
++            else
++                cur->f->flags |= AV_FRAME_FLAG_CORRUPT;
++        }
++        return 0;
++    }
++
+     if (sps->bitstream_restriction_flag ||
+         h->avctx->strict_std_compliance >= FF_COMPLIANCE_STRICT) {
+         h->avctx->has_b_frames = FFMAX(h->avctx->has_b_frames, sps->num_reorder_frames);
+@@ -1415,6 +1437,7 @@ static int h264_field_start(H264Context *h, const H264SliceContext *sl,
+     sps = h->ps.sps;
+
+     if (sps->bitstream_restriction_flag &&
++        !(h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) &&
+         h->avctx->has_b_frames < sps->num_reorder_frames) {
+         h->avctx->has_b_frames = sps->num_reorder_frames;
+     }
+--
+2.47.3
+
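Review note: the early exit added to `h264_select_output_frame` boils down to a three-way decision per decoded picture: emit it, emit it flagged as corrupt, or drop it. The sketch below models only that decision so it can be checked in isolation. The `sk_*` types and flag values are hypothetical (FFmpeg's real constants are not reproduced), and the SEI recovery-point propagation from the patch is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for the sketch; FFmpeg's real values live in
 * libavcodec/avcodec.h and are not reproduced here. */
enum {
    SK_FLAG_LOW_DELAY      = 1 << 0,
    SK_FLAG_OUTPUT_CORRUPT = 1 << 1,
    SK_FLAG2_SHOW_ALL      = 1 << 0,  /* separate flags2 word */
};

typedef struct {
    int  poc;
    bool recovered;
    bool corrupt;   /* set when emitted despite a missing recovery point */
} sk_picture;

typedef struct {
    unsigned    flags, flags2;
    bool        frame_recovered;
    sk_picture *next_output_pic;
} sk_decoder;

/* Returns true when the low-delay early exit handled the picture,
 * false when the caller should fall through to POC-based reordering. */
bool sk_select_output_low_delay(sk_decoder *d, sk_picture *cur)
{
    assert(d != NULL && cur != NULL);
    if (!(d->flags & SK_FLAG_LOW_DELAY))
        return false;

    d->next_output_pic = cur;            /* decode order, no delay queue */
    if (cur->recovered)
        d->frame_recovered = true;

    if (!cur->recovered) {
        if (!(d->flags & SK_FLAG_OUTPUT_CORRUPT) &&
            !(d->flags2 & SK_FLAG2_SHOW_ALL))
            d->next_output_pic = NULL;   /* drop the unrecovered picture */
        else
            cur->corrupt = true;         /* emit, but flagged */
    }
    return true;
}
```

With the flag clear the function returns `false` without touching the decoder state, which mirrors the commit message's claim that consumers expecting display-order output are unaffected.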
@@ -24,7 +24,7 @@ _srcname=FFmpeg
 _version='8.1'
 _commit='b57fbbe50c9b2656fad86a1a7eeabfd2b2a50935' # v4l2-request-n8.1 tip 2026-04-24
 pkgver=8.1.r123329.b57fbbe
-pkgrel=7 # pkgrel=7 — H.264 IDCT 8x8 daedalus-fourier substitution (cycle 7, 2026-05-22)
+pkgrel=9 # pkgrel=9 — restore AV_CODEC_FLAG_LOW_DELAY for H.264 (2026-05-22)
 epoch=2
 
 # daedalus-fourier pin — first kernel substitution in libavcodec
@@ -91,8 +91,10 @@ source=("git+https://github.com/Kwiboo/FFmpeg.git#commit=${_commit}"
         '0001-libudev-bypass-fallback.patch'
         '0002-nv15-to-p010-unpack.patch'
         '0003-h264-idct4-daedalus-fourier.patch'
-        '0004-h264-idct8-daedalus-fourier.patch')
-sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
+        '0004-h264-idct8-daedalus-fourier.patch'
+        '0005-h264-deblock-luma-v-daedalus-fourier.patch'
+        '0006-h264-restore-low-delay.patch')
+sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
 
 pkgver() {
   cd "${_srcname}"
@@ -107,6 +109,8 @@ prepare() {
   patch -Np1 -i "${srcdir}/0002-nv15-to-p010-unpack.patch"
   patch -Np1 -i "${srcdir}/0003-h264-idct4-daedalus-fourier.patch"
   patch -Np1 -i "${srcdir}/0004-h264-idct8-daedalus-fourier.patch"
+  patch -Np1 -i "${srcdir}/0005-h264-deblock-luma-v-daedalus-fourier.patch"
+  patch -Np1 -i "${srcdir}/0006-h264-restore-low-delay.patch"
 }
 
 build() {
+122
-35
@@ -1,6 +1,6 @@
 diff -urN a/src/panfrost/vulkan/jm/panvk_cmd_buffer.h b/src/panfrost/vulkan/jm/panvk_cmd_buffer.h
 --- a/src/panfrost/vulkan/jm/panvk_cmd_buffer.h 2026-05-21 22:46:57.477785029 +0200
-+++ b/src/panfrost/vulkan/jm/panvk_cmd_buffer.h 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/jm/panvk_cmd_buffer.h 2026-05-22 10:17:41.214043265 +0200
 @@ -88,8 +88,18 @@
  struct panvk_cmd_compute_state compute;
  struct panvk_push_constant_state push_constants;
@@ -22,7 +22,7 @@ diff -urN a/src/panfrost/vulkan/jm/panvk_cmd_buffer.h b/src/panfrost/vulkan/jm/p
 
 diff -urN a/src/panfrost/vulkan/meson.build b/src/panfrost/vulkan/meson.build
 --- a/src/panfrost/vulkan/meson.build 2026-05-21 22:46:59.277811484 +0200
-+++ b/src/panfrost/vulkan/meson.build 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/meson.build 2026-05-22 10:17:41.214043265 +0200
 @@ -41,6 +41,10 @@
  'panvk_device_memory.c',
  'panvk_host_copy.c',
@@ -36,7 +36,7 @@ diff -urN a/src/panfrost/vulkan/meson.build b/src/panfrost/vulkan/meson.build
  'panvk_physical_device.c',
 diff -urN a/src/panfrost/vulkan/panvk_buffer.c b/src/panfrost/vulkan/panvk_buffer.c
 --- a/src/panfrost/vulkan/panvk_buffer.c 2026-05-21 22:46:57.485785147 +0200
-+++ b/src/panfrost/vulkan/panvk_buffer.c 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_buffer.c 2026-05-22 10:17:41.214043265 +0200
 @@ -88,6 +88,8 @@
  *bind_status->pResult = VK_SUCCESS;
 
@@ -48,7 +48,7 @@ diff -urN a/src/panfrost/vulkan/panvk_buffer.c b/src/panfrost/vulkan/panvk_buffe
  }
 diff -urN a/src/panfrost/vulkan/panvk_buffer.h b/src/panfrost/vulkan/panvk_buffer.h
 --- a/src/panfrost/vulkan/panvk_buffer.h 2026-05-21 22:46:57.485785147 +0200
-+++ b/src/panfrost/vulkan/panvk_buffer.h 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_buffer.h 2026-05-22 10:17:41.214043265 +0200
 @@ -14,8 +14,14 @@
 
  struct panvk_priv_bo;
@@ -66,7 +66,7 @@ diff -urN a/src/panfrost/vulkan/panvk_buffer.h b/src/panfrost/vulkan/panvk_buffe
  VK_DEFINE_NONDISP_HANDLE_CASTS(panvk_buffer, vk.base, VkBuffer,
 diff -urN a/src/panfrost/vulkan/panvk_device.h b/src/panfrost/vulkan/panvk_device.h
 --- a/src/panfrost/vulkan/panvk_device.h 2026-05-21 22:46:57.489785206 +0200
-+++ b/src/panfrost/vulkan/panvk_device.h 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_device.h 2026-05-22 10:17:41.214043265 +0200
 @@ -45,6 +45,8 @@
  enum panvk_queue_family {
  PANVK_QUEUE_FAMILY_GPU,
@@ -102,7 +102,7 @@ diff -urN a/src/panfrost/vulkan/panvk_device.h b/src/panfrost/vulkan/panvk_devic
  struct {
 diff -urN a/src/panfrost/vulkan/panvk_physical_device.c b/src/panfrost/vulkan/panvk_physical_device.c
 --- a/src/panfrost/vulkan/panvk_physical_device.c 2026-05-21 22:46:57.497785323 +0200
-+++ b/src/panfrost/vulkan/panvk_physical_device.c 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_physical_device.c 2026-05-22 10:17:41.214043265 +0200
 @@ -577,12 +577,22 @@
  .queueFlags = VK_QUEUE_SPARSE_BINDING_BIT,
  .queueCount = 1,
@@ -234,8 +234,8 @@ diff -urN a/src/panfrost/vulkan/panvk_physical_device.c b/src/panfrost/vulkan/pa
 +}
 diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 --- a/src/panfrost/vulkan/panvk_v4l2.c 1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_v4l2.c 2026-05-21 22:47:09.189957157 +0200
-@@ -0,0 +1,569 @@
++++ b/src/panfrost/vulkan/panvk_v4l2.c 2026-05-22 10:17:41.214043265 +0200
+@@ -0,0 +1,615 @@
 +/*
 + * panvk-bifrost-video Phase 4 commit 3:
 + *
@@ -250,6 +250,7 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 +#include "panvk_video_decode.h"
 +#include "panvk_device.h"
 +
++#include "util/macros.h"
 +#include "vk_alloc.h"
 +#include "vk_log.h"
 +
@@ -417,7 +418,9 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 +      mesa_loge("panvk_v4l2: REQBUFS OUTPUT failed: %s", strerror(errno));
 +      return -errno;
 +   }
-+   vs->num_output_buffers = rb.count;
++   /* REQBUFS may round up the count above the request — clamp to our
++    * fixed-size mmap arrays (Phase 5 review: prevents output_map OOB). */
++   vs->num_output_buffers = MIN2(rb.count, 18);
 +   vs->output_next = 0;
 +
 +   /* CAPTURE: MMAP — kernel-allocated, mmap to CPU for copy-out path. */
@@ -430,7 +433,7 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 +      mesa_loge("panvk_v4l2: REQBUFS CAPTURE failed: %s", strerror(errno));
 +      return -errno;
 +   }
-+   vs->num_capture_buffers = rb.count;
++   vs->num_capture_buffers = MIN2(rb.count, 18);
 +   vs->capture_next = 0;
 +
 +   return 0;
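Review note: the `MIN2(rb.count, 18)` change guards fixed-size per-buffer arrays against a driver that grants more buffers than were requested. A minimal sketch of that clamp follows, assuming the arrays hold 18 entries (inferred from the literal in the hunk, not confirmed from the struct definition). The `sk_*` names are hypothetical, and a named constant is used here in place of the repeated literal.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAX_BUFFERS 18u   /* assumed array size, from the literal 18 above */

struct sk_session {
    void    *output_map[SK_MAX_BUFFERS];
    uint32_t num_output_buffers;
};

/* Clamp the driver-granted count so later loops over output_map[]
 * never index past the fixed-size array. */
static uint32_t sk_clamp_buffer_count(uint32_t granted)
{
    return granted < SK_MAX_BUFFERS ? granted : SK_MAX_BUFFERS;
}

/* Record the result of a (hypothetical) REQBUFS call on the session. */
void sk_record_reqbufs_result(struct sk_session *s, uint32_t granted)
{
    s->num_output_buffers = sk_clamp_buffer_count(granted);
    assert(s->num_output_buffers <= SK_MAX_BUFFERS);
}
```

A shared constant would also keep the clamp and any teardown loop over the same arrays in step if the array size ever changes.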
@@ -788,6 +791,49 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 + struct vk_device *vk_dev,
 + const VkAllocationCallbacks *alloc)
 +{
++   /* Unwind in reverse order of session_init. Each step is guarded by
++    * "have we got far enough to need this" so the function is safe to
++    * call on partially-initialised sessions (the session_init failure
++    * paths jump here via `goto fail`). */
++
++   /* munmap CAPTURE + OUTPUT (no-op for entries left at NULL by an
++    * earlier-failed mmap loop). */
++   for (unsigned i = 0; i < 18; i++) {
++      if (vs->capture_map[i]) {
++         munmap(vs->capture_map[i], vs->capture_map_size[i]);
++         vs->capture_map[i] = NULL;
++         vs->capture_map_size[i] = 0;
++      }
++      if (vs->output_map[i]) {
++         munmap(vs->output_map[i], vs->output_map_size[i]);
++         vs->output_map[i] = NULL;
++         vs->output_map_size[i] = 0;
++      }
++   }
++
++   if (vs->video_fd >= 0) {
++      /* STREAMOFF (safe to call even if STREAMON never ran — kernel
++       * returns EINVAL which we ignore). */
++      enum v4l2_buf_type t;
++      t = vs->mplane ? V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE
++                     : V4L2_BUF_TYPE_VIDEO_OUTPUT;
++      (void) ioctl(vs->video_fd, VIDIOC_STREAMOFF, &t);
++      t = vs->mplane ? V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE
++                     : V4L2_BUF_TYPE_VIDEO_CAPTURE;
++      (void) ioctl(vs->video_fd, VIDIOC_STREAMOFF, &t);
++
++      /* Release the kernel buffer queues via REQBUFS count=0. */
++      struct v4l2_requestbuffers rb;
++      memset(&rb, 0, sizeof(rb));
++      rb.memory = V4L2_MEMORY_MMAP;
++      rb.type = vs->mplane ? V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE
++                           : V4L2_BUF_TYPE_VIDEO_OUTPUT;
++      (void) ioctl(vs->video_fd, VIDIOC_REQBUFS, &rb);
++      rb.type = vs->mplane ? V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE
++                           : V4L2_BUF_TYPE_VIDEO_CAPTURE;
++      (void) ioctl(vs->video_fd, VIDIOC_REQBUFS, &rb);
++   }
++
 +   if (vs->request_fds) {
 +      for (unsigned i = 0; i < vs->num_request_fds; i++)
 +         if (vs->request_fds[i] >= 0)
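Review note: the teardown added here is meant to be callable on a partially-initialised session, which works only if every slot starts out NULL and is reset after release. The sketch below isolates that guarded-unwind pattern using anonymous `mmap` regions instead of V4L2 buffers. The `sk_*` names and the slot count are hypothetical, and zero-initialisation of the struct by the caller is an assumption the pattern depends on.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define SK_SLOTS 18

struct sk_maps {
    void  *map[SK_SLOTS];   /* NULL means "never mapped or already released" */
    size_t size[SK_SLOTS];
};

/* Map the first `count` slots. On failure, slots not yet reached stay NULL,
 * so sk_maps_fini() can unwind whatever was mapped so far. */
int sk_maps_init(struct sk_maps *m, unsigned count, size_t len)
{
    assert(len > 0);
    for (unsigned i = 0; i < count && i < SK_SLOTS; i++) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1;
        m->map[i]  = p;
        m->size[i] = len;
    }
    return 0;
}

/* Safe on zeroed, partially-initialised, or already-torn-down state. */
void sk_maps_fini(struct sk_maps *m)
{
    for (unsigned i = 0; i < SK_SLOTS; i++) {
        if (m->map[i]) {
            munmap(m->map[i], m->size[i]);
            m->map[i]  = NULL;
            m->size[i] = 0;
        }
    }
}
```

Resetting each slot to NULL after `munmap` is what makes a second call a no-op, so a `goto fail` path and a later normal destroy cannot double-unmap.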
@@ -807,7 +853,7 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2.c b/src/panfrost/vulkan/panvk_v4l2.c
 +}
 diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264.c b/src/panfrost/vulkan/panvk_v4l2_h264.c
 --- a/src/panfrost/vulkan/panvk_v4l2_h264.c 1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_v4l2_h264.c 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_v4l2_h264.c 2026-05-22 10:17:41.214043265 +0200
 @@ -0,0 +1,478 @@
 +/*
 + * panvk-bifrost-video Phase 4: Vulkan StdVideo H.264 → V4L2 stateless H.264
@@ -1289,7 +1335,7 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264.c b/src/panfrost/vulkan/panvk_v4
 +}
 diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c
 --- a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c 1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c 2026-05-22 10:17:41.214043265 +0200
 @@ -0,0 +1,314 @@
 +/*
 + * H.264 slice header bit-parser implementation.
@@ -1607,7 +1653,7 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c b/src/panfrost/vu
 +}
 diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h
 --- a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h 1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h 2026-05-21 22:47:09.189957157 +0200
++++ b/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h 2026-05-22 10:17:41.214043265 +0200
 @@ -0,0 +1,94 @@
 +/*
 + * H.264 slice header bit-parser for panvk-bifrost-video / V4L2 stateless
@@ -1705,17 +1751,35 @@ diff -urN a/src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h b/src/panfrost/vu
 +#endif /* PANVK_V4L2_H264_SLICE_HEADER_H */
 diff -urN a/src/panfrost/vulkan/panvk_video_decode.c b/src/panfrost/vulkan/panvk_video_decode.c
 --- a/src/panfrost/vulkan/panvk_video_decode.c 1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_video_decode.c 2026-05-21 22:47:09.189957157 +0200
-@@ -0,0 +1,362 @@
++++ b/src/panfrost/vulkan/panvk_video_decode.c 2026-05-22 10:17:41.214043265 +0200
+@@ -0,0 +1,380 @@
 +/*
-+ * panvk-bifrost-video Phase 4 commit 7b:
-+ * Vulkan-side decode dispatch wired to V4L2 hantro via dmabuf.
++ * panvk-bifrost-video: Vulkan video decode entrypoints (H.264).
 + *
-+ * Phase 1 simplification: cmd_buffer state tracking via DEVICE-level
-+ * active_video struct (under a mutex). Per-cmdbuf state hand-off is
-+ * Phase >>1 once arch-agnostic source can access per-arch cmd_buffer
-+ * structs without the include-path gymnastics. This works for
-+ * single-session decode workloads (mpv, ffmpeg, vk-video-samples).
++ * Drives the V4L2 stateless hantro VPU backend (panvk_v4l2.c) from
++ * Vulkan vkCmdDecodeVideoKHR. Decode is synchronous at record time —
++ * the full V4L2 ioctl dance runs to completion inside the command-
++ * recording call before returning to the application. The queue-side
++ * `driver_submit` is a no-op signal-everything (see panvk_vX_device.c).
++ *
++ * Phase 1 simplifications worth knowing about:
++ *
++ * - Cmd-buffer state lives at the DEVICE level (`active_video`) under
++ *   a single mutex, NOT per-cmd-buffer. Concurrent video sessions on
++ *   the same device clobber each other. Sufficient for current single-
++ *   session consumers (mpv-fourier, ffmpeg-vulkan-h264, vk-video-
++ *   samples). Spec-compliant multi-session is a Phase >>1 follow-up.
++ *
++ * - Source bitstream is read via `src_buf->mem->addr.host`, i.e. the
++ *   bound VkDeviceMemory's CPU mapping. Works because panvk-bifrost
++ *   only exposes HOST_VISIBLE memory types; an app that bound the
++ *   bitstream buffer to non-HOST_VISIBLE memory would get a logged
++ *   error and a silent decode skip (CmdDecodeVideoKHR is void, so we
++ *   have no clean error-return path). VkPhysicalDeviceVideo*
++ *   constraints would be the right place to make this contractual.
++ *
++ * - Requires `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1` (mesa-upstream gate
++ *   on panvk-on-Bifrost which is not conformant).
 + *
 + * SPDX-License-Identifier: MIT
 + */
@@ -1929,10 +1993,10 @@ diff -urN a/src/panfrost/vulkan/panvk_video_decode.c b/src/panfrost/vulkan/panvk
 +    * `tv_sec * 1e9 + tv_usec * 1e3`). Sub-microsecond bits are dropped, so
 +    * any high-resolution stamp (e.g. a 64-bit pointer cast) makes the
 +    * lookup miss and P/B frames decode against zero references. Use a
-+    * monotonic per-session counter in microseconds (i.e. * 1000 ns).
++    * per-session monotonic counter in microseconds (i.e. * 1000 ns) so
++    * concurrent sessions sharing /dev/video1 don't collide on stamp.
 +    */
-+   static uint32_t panvk_video_ts_counter = 0;
-+   const uint64_t output_ts = ((uint64_t)++panvk_video_ts_counter) * 1000ULL;
++   const uint64_t output_ts = ((uint64_t)++vs->ts_counter) * 1000ULL;
 +   uint32_t dst_dpb_slot = pDecodeInfo->pSetupReferenceSlot
 +      ? (uint32_t) pDecodeInfo->pSetupReferenceSlot->slotIndex : 0u;
 +
@@ -2071,8 +2135,8 @@ diff -urN a/src/panfrost/vulkan/panvk_video_decode.c b/src/panfrost/vulkan/panvk
|
|||||||
+}
|
+}
|
||||||
diff -urN a/src/panfrost/vulkan/panvk_video_decode.h b/src/panfrost/vulkan/panvk_video_decode.h
|
diff -urN a/src/panfrost/vulkan/panvk_video_decode.h b/src/panfrost/vulkan/panvk_video_decode.h
|
||||||
--- a/src/panfrost/vulkan/panvk_video_decode.h 1970-01-01 01:00:00.000000000 +0100
|
--- a/src/panfrost/vulkan/panvk_video_decode.h 1970-01-01 01:00:00.000000000 +0100
|
||||||
+++ b/src/panfrost/vulkan/panvk_video_decode.h 2026-05-21 22:47:09.189957157 +0200
|
+++ b/src/panfrost/vulkan/panvk_video_decode.h 2026-05-22 10:17:41.214043265 +0200
|
||||||
@@ -0,0 +1,114 @@
|
@@ -0,0 +1,124 @@
|
||||||
+/*
|
+/*
|
||||||
+ * panvk-bifrost-video Phase 4 commit 3: extended for V4L2 state.
|
+ * panvk-bifrost-video Phase 4 commit 3: extended for V4L2 state.
|
||||||
+ *
|
+ *
|
||||||
@@ -2103,12 +2167,22 @@ diff -urN a/src/panfrost/vulkan/panvk_video_decode.h b/src/panfrost/vulkan/panvk
|
|||||||
+ struct v4l2_format fmt_output;
|
+ struct v4l2_format fmt_output;
|
||||||
+ struct v4l2_format fmt_capture;
|
+ struct v4l2_format fmt_capture;
|
||||||
+
|
+
|
||||||
+ /* Request fd pool. PANVK_V4L2_REQUEST_FD_COUNT entries. */
|
+ /* Request fd pool. PANVK_V4L2_REQUEST_FD_COUNT entries.
|
||||||
|
+ * Size of request_fd_used[] is bounded by the same compile-time max;
|
||||||
|
+ * keep them coupled to avoid silent overflow if the pool grows. */
|
||||||
|
+#define PANVK_VIDEO_REQUEST_FD_MAX 32
|
||||||
+ int *request_fds;
|
+ int *request_fds;
|
||||||
+ bool request_fd_used[32]; /* tracks per-fd "ever queued" → REINIT before reuse */
|
+ bool request_fd_used[PANVK_VIDEO_REQUEST_FD_MAX];
|
||||||
+ unsigned num_request_fds;
|
+ unsigned num_request_fds;
|
||||||
+ uint32_t request_fd_next; /* round-robin index */
|
+ uint32_t request_fd_next; /* round-robin index */
|
||||||
+
|
+
|
||||||
|
+ /* Per-session V4L2 buffer-identity counter. Multiplied by 1000 ns at
|
||||||
|
+ * QBUF time so the stamp round-trips losslessly through (tv_sec,
|
||||||
|
+ * tv_usec) — hantro's reflist builder matches dpb[i].reference_ts
|
||||||
|
+ * against the kernel-side OUTPUT timestamp. Per-session (not process-
|
||||||
|
+ * global) so concurrent sessions sharing /dev/video1 don't collide. */
|
||||||
|
+ uint32_t ts_counter;
|
||||||
|
+
|
||||||
+ /* DPB slotIndex → V4L2 reference_ts mapping (Phase 1 D5) */
|
+ /* DPB slotIndex → V4L2 reference_ts mapping (Phase 1 D5) */
|
||||||
+ struct {
|
+ struct {
|
||||||
+ bool valid;
|
+ bool valid;
|
||||||
@@ -2189,7 +2263,7 @@ diff -urN a/src/panfrost/vulkan/panvk_video_decode.h b/src/panfrost/vulkan/panvk
|
|||||||
+#endif /* PANVK_VIDEO_DECODE_H */
|
+#endif /* PANVK_VIDEO_DECODE_H */
|
||||||
diff -urN a/src/panfrost/vulkan/panvk_vX_device.c b/src/panfrost/vulkan/panvk_vX_device.c
|
diff -urN a/src/panfrost/vulkan/panvk_vX_device.c b/src/panfrost/vulkan/panvk_vX_device.c
|
||||||
--- a/src/panfrost/vulkan/panvk_vX_device.c 2026-05-21 22:46:57.505785441 +0200
|
--- a/src/panfrost/vulkan/panvk_vX_device.c 2026-05-21 22:46:57.505785441 +0200
|
||||||
+++ b/src/panfrost/vulkan/panvk_vX_device.c 2026-05-21 22:47:09.189957157 +0200
|
+++ b/src/panfrost/vulkan/panvk_vX_device.c 2026-05-22 10:17:41.214043265 +0200
|
||||||
@@ -203,6 +203,27 @@
|
@@ -203,6 +203,27 @@
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -2372,14 +2446,27 @@ diff -urN a/src/panfrost/vulkan/panvk_vX_device.c b/src/panfrost/vulkan/panvk_vX
|
|||||||
qf->queues =
|
qf->queues =
|
||||||
diff -urN a/src/panfrost/vulkan/panvk_vX_physical_device.c b/src/panfrost/vulkan/panvk_vX_physical_device.c
|
diff -urN a/src/panfrost/vulkan/panvk_vX_physical_device.c b/src/panfrost/vulkan/panvk_vX_physical_device.c
|
||||||
--- a/src/panfrost/vulkan/panvk_vX_physical_device.c 2026-05-21 22:46:59.273811425 +0200
|
--- a/src/panfrost/vulkan/panvk_vX_physical_device.c 2026-05-21 22:46:59.273811425 +0200
|
||||||
+++ b/src/panfrost/vulkan/panvk_vX_physical_device.c 2026-05-21 22:47:09.189957157 +0200
|
+++ b/src/panfrost/vulkan/panvk_vX_physical_device.c 2026-05-22 10:17:41.214043265 +0200
|
||||||
@@ -170,6 +170,9 @@
|
@@ -12,6 +12,7 @@
|
||||||
|
#include <sys/sysmacros.h>
|
||||||
|
|
||||||
|
#include "git_sha1.h"
|
||||||
|
+#include "panvk_video_decode.h"
|
||||||
|
|
||||||
|
#include "vk_android.h"
|
||||||
|
#include "vk_device.h"
|
||||||
|
@@ -170,6 +171,14 @@
|
||||||
.EXT_queue_family_foreign = true,
|
.EXT_queue_family_foreign = true,
|
||||||
.EXT_robustness2 = true,
|
.EXT_robustness2 = true,
|
||||||
.EXT_transform_feedback = PAN_ARCH < 9, /* iter13: JM-class only for now */
|
.EXT_transform_feedback = PAN_ARCH < 9, /* iter13: JM-class only for now */
|
||||||
+ .KHR_video_queue = PAN_ARCH < 9, /* panvk-bifrost-video Phase 4 commit 1 */
|
+ /* Video extensions are advertised only when (a) we're on a Bifrost
|
||||||
+ .KHR_video_decode_queue = PAN_ARCH < 9, /* hantro V4L2-stateless backend */
|
+ * arch (PAN_ARCH < 9) AND (b) a hantro VPU is reachable on the
|
||||||
+ .KHR_video_decode_h264 = PAN_ARCH < 9, /* H.264 only initially */
|
+ * expected V4L2 nodes — otherwise CreateVideoSessionKHR would
|
||||||
|
+ * succeed at the panvk layer and then fail at v4l2_open_fds, giving
|
||||||
|
+ * the app a misleading capability claim. */
|
||||||
|
+ .KHR_video_queue = PAN_ARCH < 9 && panvk_v4l2_probe_hantro(),
|
||||||
|
+ .KHR_video_decode_queue = PAN_ARCH < 9 && panvk_v4l2_probe_hantro(),
|
||||||
|
+ .KHR_video_decode_h264 = PAN_ARCH < 9 && panvk_v4l2_probe_hantro(),
|
||||||
.EXT_sampler_filter_minmax = PAN_ARCH >= 10,
|
.EXT_sampler_filter_minmax = PAN_ARCH >= 10,
|
||||||
.EXT_scalar_block_layout = true,
|
.EXT_scalar_block_layout = true,
|
||||||
.EXT_separate_stencil_usage = true,
|
.EXT_separate_stencil_usage = true,
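The hunk above evaluates `panvk_v4l2_probe_hantro()` once per video extension, three times in total. Whether that function caches its result is not visible in this diff. If it does not, a cached wrapper along the following lines would keep the three answers consistent and probe the device only once. This is a sketch under that assumption: `expensive_probe`, `probe_cached` and the tri-state encoding are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = not probed yet, 1 = device absent, 2 = device present. */
static atomic_int probe_state;

/* Counts how often the underlying probe really ran (for illustration). */
static unsigned probe_calls;

/* Hypothetical placeholder for the real V4L2 device probe. */
static bool expensive_probe(void)
{
    probe_calls++;
    return true;
}

/* Returns the cached answer; only the first caller pays for the probe.
 * Two racing threads may both probe once, but they store the same
 * value, so the result stays consistent. */
static bool probe_cached(void)
{
    int state = atomic_load(&probe_state);
    if (state == 0) {
        state = expensive_probe() ? 2 : 1;
        atomic_store(&probe_state, state);
    }
    assert(state == 1 || state == 2);
    return state == 2;
}
```

With a wrapper like this, the three extension entries cannot disagree if the device node appears or disappears between evaluations.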
|
||||||
|
|||||||
+3
-3
@@ -14,9 +14,9 @@
|
|||||||
# Sibling userspace package: ../daedalus-v4l2/build-deb.sh
|
# Sibling userspace package: ../daedalus-v4l2/build-deb.sh
|
||||||
set -euo pipefail
|
set -euo pipefail
|
||||||
|
|
||||||
UPSTREAM_COMMIT=5d8b4369e58ab947d1c56b1f718293c57c6065b5
|
UPSTREAM_COMMIT=872eec505eb91b561892d02a0526749348ddc121
|
||||||
PKGVER=0.1.0+r33+g5d8b436
|
PKGVER=0.1.0+r45+g872eec5
|
||||||
PKGREL=1 # reset for new upstream pin (5d8b436 — revert parking design); still carries the #64 multi-kernel postinst fix
|
PKGREL=1 # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2 0.1.0+r45+g872eec5 REQUIRED
|
||||||
MODULE_NAME=daedalus_v4l2
|
MODULE_NAME=daedalus_v4l2
|
||||||
|
|
||||||
HERE=$(dirname "$(readlink -f "$0")")
|
HERE=$(dirname "$(readlink -f "$0")")
|
||||||
|
|||||||
+21
@@ -1,3 +1,24 @@
|
|||||||
|
daedalus-v4l2-dkms (0.1.0+r45+g872eec5-1) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
|
* Bump to 872eec5 — picks up daedalus-v4l2 PR #20 (closes #19).
|
||||||
|
Wire-protocol cap DAEDALUS_PROTO_MAX_PAYLOAD raised from 64 KiB
|
||||||
|
to 1 MiB in include/daedalus_v4l2_proto.h. The kernel module
|
||||||
|
inherits the larger DAEDALUS_MAX_BITSTREAM via the same #define
|
||||||
|
and daedalus_fill_output_fmt now reports OUTPUT_MPLANE
|
||||||
|
sizeimage = ~1 MiB instead of 65484.
|
||||||
|
* Skips the r33 -> r45 commit range — between 5d8b436 and 872eec5
|
||||||
|
only one kernel/include change landed (the PROTO_MAX_PAYLOAD
|
||||||
|
bump above). The intervening daemon-only bumps (r37 / r39 /
|
||||||
|
r41 / r43) didn't touch kernel/ or include/ at all.
|
||||||
|
* Effective wire cap is min(kernel, daemon) — lock-step install
|
||||||
|
WITH daedalus-v4l2 0.1.0+r45+g872eec5 REQUIRED.
|
||||||
|
* Allocations (kmemdup / kmalloc on payload, vb2 plane backing)
|
||||||
|
are dynamic and sized per-payload at runtime; the bump only
|
||||||
|
sets the ceiling. KMALLOC_MAX_SIZE on aarch64 SLUB is several
|
||||||
|
MiB so 1 MiB is well within bounds.
|
||||||
|
|
||||||
|
-- Markus Fritsche <mfritsche@reauktion.de> Fri, 22 May 2026 21:00:00 +0000
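The changelog entry above states that the effective wire cap is min(kernel, daemon), which is why the two packages must be installed in lock-step. A minimal sketch of that rule follows. The macro and function names are hypothetical, not the identifiers in `include/daedalus_v4l2_proto.h`; only the 64 KiB and 1 MiB figures come from the entry.

```c
#include <assert.h>
#include <stddef.h>

#define CAP_OLD (64u * 1024u)    /* 64 KiB, the cap before 872eec5 */
#define CAP_NEW (1024u * 1024u)  /* 1 MiB, the cap from 872eec5 on */

/* A payload must be accepted by both ends, so the smaller cap wins. */
static size_t effective_cap(size_t kernel_cap, size_t daemon_cap)
{
    return kernel_cap < daemon_cap ? kernel_cap : daemon_cap;
}

/* Returns nonzero when a payload of `len` bytes passes both ends. */
static int payload_fits(size_t len, size_t kernel_cap, size_t daemon_cap)
{
    size_t cap = effective_cap(kernel_cap, daemon_cap);
    assert(cap > 0);
    return len <= cap;
}
```

For example, a 100000-byte slice is still rejected when only the kernel module has been upgraded (`payload_fits(100000, CAP_NEW, CAP_OLD)` is 0) and is accepted once both sides carry the new cap.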
|
||||||
|
|
||||||
daedalus-v4l2-dkms (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium
|
daedalus-v4l2-dkms (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
* Bump to 5d8b436 — reverts daedalus-v4l2 PRs #7 + #8. Kernel
|
* Bump to 5d8b436 — reverts daedalus-v4l2 PRs #7 + #8. Kernel
|
||||||
|
|||||||
Vendored
+3
-3
@@ -19,9 +19,9 @@ set -euo pipefail
|
|||||||
# source tree we own in marfrit-packages. Headers + .pc files
|
# source tree we own in marfrit-packages. Headers + .pc files
|
||||||
# come from ffmpeg-v4l2-request-fourier (installed by the CI
|
# come from ffmpeg-v4l2-request-fourier (installed by the CI
|
||||||
# workflow before this script runs; see PKG_CONFIG_PATH below).
|
# workflow before this script runs; see PKG_CONFIG_PATH below).
|
||||||
UPSTREAM_COMMIT=6e6dfa144da7bc7fa8be50c8da91d7d1c6132a2c
|
UPSTREAM_COMMIT=872eec505eb91b561892d02a0526749348ddc121
|
||||||
PKGVER=0.1.0+r41+g6e6dfa1
|
PKGVER=0.1.0+r45+g872eec5
|
||||||
PKGREL=1 # reset for new upstream pin (6e6dfa1 — soname 62 via /opt/fourier)
|
PKGREL=1 # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2-dkms 0.1.0+r45+g872eec5 REQUIRED
|
||||||
|
|
||||||
# daedalus-fourier pin. d87239d = marfrit/daedalus-fourier PR #1 merge
|
# daedalus-fourier pin. d87239d = marfrit/daedalus-fourier PR #1 merge
|
||||||
# (install rules + pkg-config, enables this consumer to find_package
|
# (install rules + pkg-config, enables this consumer to find_package
|
||||||
|
|||||||
+43
@@ -1,3 +1,46 @@
|
|||||||
|
daedalus-v4l2 (0.1.0+r45+g872eec5-1) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
|
* Bump to 872eec5 — picks up daedalus-v4l2 PR #20 (closes #19).
|
||||||
|
Wire-protocol cap DAEDALUS_PROTO_MAX_PAYLOAD raised from 64 KiB
|
||||||
|
to 1 MiB. DAEDALUS_MAX_BITSTREAM follows; daedalus_fill_output_fmt
|
||||||
|
now reports OUTPUT_MPLANE sizeimage = ~1 MiB instead of 65484.
|
||||||
|
libva-v4l2-request-fourier's S_FMT-driven OUTPUT-pool resize
|
||||||
|
finally succeeds; Firefox no longer falls off to libmozavcodec
|
||||||
|
SW when an H.264 slice exceeds 64 KiB (routine on any
|
||||||
|
720p+ stream).
|
||||||
|
* #define-only change in include/daedalus_v4l2_proto.h; struct
|
||||||
|
layout unchanged. But effective cap is min(kernel, daemon) —
|
||||||
|
lock-step install of this package WITH
|
||||||
|
daedalus-v4l2-dkms 0.1.0+r45+g872eec5 REQUIRED.
|
||||||
|
* Daemon-side allocations are dynamic (malloc-on-payload), so
|
||||||
|
the practical growth is one ~1 MiB read buffer per daemon
|
||||||
|
process at startup. Negligible on Pi 5 / 8 GB.
|
||||||
|
* Picks up the same r43 -> r45 transition as daedalus-v4l2-dkms
|
||||||
|
(which had been stuck at r33+g5d8b436 since the parking-design
|
||||||
|
revert because the kernel module didn't change in r37/r39/r41/r43).
|
||||||
|
|
||||||
|
-- Markus Fritsche <mfritsche@reauktion.de> Fri, 22 May 2026 21:00:00 +0000
|
||||||
|
|
||||||
|
daedalus-v4l2 (0.1.0+r43+g1d8f5af-1) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
|
* Bump to 1d8f5af — picks up daedalus-v4l2 PR #18 (closes #17).
|
||||||
|
Daemon now drops degenerate (<4 byte) bitstreams at the REQ_DECODE
|
||||||
|
entry instead of letting avcodec_send_packet return
|
||||||
|
AVERROR_INVALIDDATA. Reply RESP_FRAME with status=
|
||||||
|
DAEDALUS_DECODE_NO_FRAME so libva's V4L2 surface pool stays
|
||||||
|
healthy.
|
||||||
|
* Fixes the Firefox YouTube avc1 pause→resume regression observed
|
||||||
|
on higgs: libva-v4l2-request-fourier flushes a 3-byte stub
|
||||||
|
(presumably a bare NAL start code) into OUTPUT_MPLANE at the
|
||||||
|
pause boundary; the old INVALIDDATA error path made Firefox
|
||||||
|
fall off to libmozavcodec SW for the rest of the session. With
|
||||||
|
this filter the daemon logs the sentinel as 'tiny bitstream 3
|
||||||
|
bytes — dropping as no-op' and the next real REQ_DECODE
|
||||||
|
proceeds normally.
|
||||||
|
* Wire protocol unchanged. No daedalus-v4l2-dkms bump needed.
|
||||||
|
|
||||||
|
-- Markus Fritsche <mfritsche@reauktion.de> Fri, 22 May 2026 17:30:00 +0000
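The r43 entry above describes the daemon dropping degenerate (<4 byte) bitstreams at the REQ_DECODE entry instead of handing them to the decoder. The guard can be sketched as below. The function, enum and counter are hypothetical: the real handler lives in the daemon, the numeric value of `DAEDALUS_DECODE_NO_FRAME` is not given in this diff, and the counter merely stands in for the call to `avcodec_send_packet`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values; only the NO_FRAME name is from the entry. */
enum decode_status { DECODE_OK = 0, DECODE_NO_FRAME = 1 };

#define MIN_BITSTREAM_BYTES 4u

/* Drops tiny bitstreams as a no-op before they reach the decoder.
 * `decoder_calls` is incremented where the real decoder call would be. */
static enum decode_status handle_req_decode(const uint8_t *buf, size_t len,
                                            unsigned *decoder_calls)
{
    assert(decoder_calls != NULL);
    if (buf == NULL || len < MIN_BITSTREAM_BYTES)
        return DECODE_NO_FRAME;   /* reply without touching the decoder */
    (*decoder_calls)++;           /* stand-in for the real decode path */
    return DECODE_OK;
}
```

A 3-byte stub such as the one flushed at the pause boundary returns `DECODE_NO_FRAME` without reaching the decoder, and the next full-size request proceeds normally.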
|
||||||
|
|
||||||
daedalus-v4l2 (0.1.0+r41+g6e6dfa1-1) bookworm trixie; urgency=medium
|
daedalus-v4l2 (0.1.0+r41+g6e6dfa1-1) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
* Bump to 6e6dfa1 — daedalus-v4l2 PR #16. Daemon dlopens Kwiboo
|
* Bump to 6e6dfa1 — daedalus-v4l2 PR #16. Daemon dlopens Kwiboo
|
||||||
|
|||||||
+121
@@ -0,0 +1,121 @@
|
|||||||
|
From 68731c41d7ea68be0e912b128cb4e71fb56e8263 Mon Sep 17 00:00:00 2001
|
||||||
|
From: Markus Fritsche <mfritsche@reauktion.de>
|
||||||
|
Date: Fri, 22 May 2026 12:15:16 +0200
|
||||||
|
Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 luma-v deblock through
|
||||||
|
daedalus-fourier
|
||||||
|
MIME-Version: 1.0
|
||||||
|
Content-Type: text/plain; charset=UTF-8
|
||||||
|
Content-Transfer-Encoding: 8bit
|
||||||
|
|
||||||
|
H264DSPContext.v_loop_filter_luma (non-intra bS<4 vertical luma
|
||||||
|
deblock, called per macroblock-row edge from the slice deblock
|
||||||
|
loop) now dispatches through
|
||||||
|
daedalus_recipe_dispatch_h264_deblock_luma_v instead of
|
||||||
|
ff_h264_v_loop_filter_luma_neon.
|
||||||
|
|
||||||
|
The recipe layer picks the substrate; for cycle 8 the daedalus
|
||||||
|
docstring marks the kernel "CPU primary; QPU opportunistic", but
|
||||||
|
the libavcodec.so context here is built with
|
||||||
|
daedalus_ctx_create_no_qpu — process-global pthread_once init,
|
||||||
|
shared with cycles 6/7. QPU opportunism stays gated off until a
|
||||||
|
follow-up adds an explicit feature flag (no implicit Vulkan init
|
||||||
|
in arbitrary host processes). In the meantime cycle 8 is a
|
||||||
|
plumbing-only substitution, NEON-to-NEON via the daedalus recipe.
|
||||||
|
|
||||||
|
Intra (bS=4) loop filter — c->v_loop_filter_luma_intra — stays on
|
||||||
|
the in-tree NEON .S code; daedalus's daedalus_h264_deblock_meta
|
||||||
|
only covers the non-intra path per its docstring.
|
||||||
|
|
||||||
|
FFmpeg `int alpha/beta/int8_t tc0[4]` → daedalus_h264_deblock_meta
|
||||||
|
(int32_t alpha/beta + inline int8_t tc0[4]). pix already points
|
||||||
|
to row 0 of the bottom block per FFmpeg's deblock convention,
|
||||||
|
satisfying daedalus's `dst_off >= 4 * dst_stride` constraint.
|
||||||
|
|
||||||
|
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8.
|
||||||
|
---
|
||||||
|
libavcodec/aarch64/h264_idct_daedalus.c | 36 +++++++++++++++++++----
|
||||||
|
libavcodec/aarch64/h264dsp_init_aarch64.c | 4 ++-
|
||||||
|
2 files changed, 33 insertions(+), 7 deletions(-)
|
||||||
|
|
||||||
|
diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
|
||||||
|
index cbb98af..92365fa 100644
|
||||||
|
--- a/libavcodec/aarch64/h264_idct_daedalus.c
|
||||||
|
+++ b/libavcodec/aarch64/h264_idct_daedalus.c
|
||||||
|
@@ -1,11 +1,14 @@
|
||||||
|
/*
|
||||||
|
- * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
|
||||||
|
+ * H.264 4x4 / 8x8 IDCT + luma-v deblock — daedalus-fourier substitution shims.
|
||||||
|
*
|
||||||
|
- * Routes H264DSPContext.idct_add → daedalus_recipe_dispatch_h264_idct4
|
||||||
|
- * H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
|
||||||
|
- * instead of the in-tree ff_h264_idct{,8}_add_neon assembly. The
|
||||||
|
- * recipe layer picks the substrate (CPU NEON by default for cycles
|
||||||
|
- * 6 + 7; future cycles may dispatch to V3D opportunistically).
|
||||||
|
+ * Routes H264DSPContext.idct_add → daedalus_recipe_dispatch_h264_idct4
|
||||||
|
+ * H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
|
||||||
|
+ * H264DSPContext.v_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_v
|
||||||
|
+ * instead of the in-tree ff_h264_*_neon assembly. The recipe layer
|
||||||
|
+ * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
|
||||||
|
+ * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
|
||||||
|
+ * so cycle 8 stays on the CPU NEON path until a separate change
|
||||||
|
+ * gates QPU init on a daedalus-fourier feature flag).
|
||||||
|
*
|
||||||
|
* FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
|
||||||
|
* column-major convention: block[r + N*c] = coefficient at
|
||||||
|
@@ -40,6 +43,8 @@ static void daedalus_ctx_init_once(void)
|
||||||
|
|
||||||
|
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
|
||||||
|
void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
|
||||||
|
+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
|
||||||
|
+ int alpha, int beta, int8_t *tc0);
|
||||||
|
|
||||||
|
void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
|
||||||
|
{
|
||||||
|
@@ -60,3 +65,22 @@ void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
|
||||||
|
daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
|
||||||
|
block, 1, &meta);
|
||||||
|
}
|
||||||
|
+
|
||||||
|
+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
|
||||||
|
+ int alpha, int beta, int8_t *tc0)
|
||||||
|
+{
|
||||||
|
+ daedalus_h264_deblock_meta meta = {
|
||||||
|
+ .dst_off = 0,
|
||||||
|
+ .alpha = alpha,
|
||||||
|
+ .beta = beta,
|
||||||
|
+ };
|
||||||
|
+ meta.tc0[0] = tc0[0];
|
||||||
|
+ meta.tc0[1] = tc0[1];
|
||||||
|
+ meta.tc0[2] = tc0[2];
|
||||||
|
+ meta.tc0[3] = tc0[3];
|
||||||
|
+
|
||||||
|
+ pthread_once(&g_dctx_once, daedalus_ctx_init_once);
|
||||||
|
+
|
||||||
|
+ daedalus_recipe_dispatch_h264_deblock_luma_v(g_dctx, pix, (size_t)stride,
|
||||||
|
+ 1, &meta);
|
||||||
|
+}
|
||||||
|
diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
|
||||||
|
index 741e551..85ac381 100644
|
||||||
|
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
|
||||||
|
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
|
||||||
|
@@ -27,6 +27,8 @@
|
||||||
|
|
||||||
|
void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
|
||||||
|
int beta, int8_t *tc0);
|
||||||
|
+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
|
||||||
|
+ int alpha, int beta, int8_t *tc0);
|
||||||
|
void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
|
||||||
|
int beta, int8_t *tc0);
|
||||||
|
void ff_h264_v_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
|
||||||
|
@@ -114,7 +116,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
|
||||||
|
int cpu_flags = av_get_cpu_flags();
|
||||||
|
|
||||||
|
if (have_neon(cpu_flags) && bit_depth == 8) {
|
||||||
|
- c->v_loop_filter_luma = ff_h264_v_loop_filter_luma_neon;
|
||||||
|
+ c->v_loop_filter_luma = ff_h264_v_loop_filter_luma_daedalus;
|
||||||
|
c->h_loop_filter_luma = ff_h264_h_loop_filter_luma_neon;
|
||||||
|
c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
|
||||||
|
c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
|
||||||
|
--
|
||||||
|
2.47.3
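The commit message of the patch above describes mapping FFmpeg's `int alpha/beta` and `int8_t tc0[4]` arguments onto `daedalus_h264_deblock_meta`. The sketch below shows that mapping with a single `memcpy` in place of the four element assignments. The struct is a hypothetical mirror: its field names follow the patch, but the type of `dst_off` and the field order are assumptions, and the authoritative layout is the one in daedalus-fourier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the metadata struct named in the patch. */
struct deblock_meta {
    size_t  dst_off;   /* assumed type; the patch passes 0 */
    int32_t alpha;
    int32_t beta;
    int8_t  tc0[4];    /* inline copy, not a pointer into caller memory */
};

/* Builds the metadata from FFmpeg-style loop-filter arguments. */
static struct deblock_meta make_deblock_meta(int alpha, int beta,
                                             const int8_t *tc0)
{
    struct deblock_meta meta = { .dst_off = 0, .alpha = alpha, .beta = beta };
    assert(tc0 != NULL);
    memcpy(meta.tc0, tc0, sizeof meta.tc0);  /* copies exactly 4 bytes */
    return meta;
}
```

Copying `tc0` by value means the metadata stays valid even if the caller's array is reused before the dispatch completes.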
|
||||||
|
|
||||||
@@ -0,0 +1,82 @@
|
|||||||
|
From 0d1292ea99bc4e5fa2da438259fa01a2374e3e04 Mon Sep 17 00:00:00 2001
|
||||||
|
From: Markus Fritsche <mfritsche@reauktion.de>
|
||||||
|
Date: Fri, 22 May 2026 14:18:25 +0200
|
||||||
|
Subject: [PATCH] avcodec/h264: restore AV_CODEC_FLAG_LOW_DELAY semantics
|
||||||
|
MIME-Version: 1.0
|
||||||
|
Content-Type: text/plain; charset=UTF-8
|
||||||
|
Content-Transfer-Encoding: 8bit
|
||||||
|
|
||||||
|
FFmpeg 8.x dropped the H.264 decoder's low_delay path —
|
||||||
|
AV_CODEC_FLAG_LOW_DELAY no longer prevents
|
||||||
|
h264_select_output_frame from running the display-order DPB
|
||||||
|
output queue. V4L2-stateless-style consumers (daedalus-v4l2
|
||||||
|
daemon, libva-v4l2-request-fourier) that set the flag end up
|
||||||
|
seeing the 2-1-4-3 pair-swap pattern on B-frame streams again.
|
||||||
|
|
||||||
|
Restore the documented semantics:
|
||||||
|
|
||||||
|
- Early-exit at the top of h264_select_output_frame when the
|
||||||
|
flag is set: emit the just-decoded picture immediately as
|
||||||
|
next_output_pic, mirror the corruption / recovery-point
|
||||||
|
tracking the main path performs, and skip the entire
|
||||||
|
delayed_pic[] / POC reorder machinery.
|
||||||
|
|
||||||
|
- Suppress the SPS-driven has_b_frames clobber in
|
||||||
|
h264_field_start when the flag is set, so the per-slice
|
||||||
|
bitstream_restriction_flag re-pickup cannot reintroduce a
|
||||||
|
nonzero reorder buffer mid-stream.
|
||||||
|
|
||||||
|
This is a fork-only change required by the daedalus-v4l2 daemon's
|
||||||
|
one-frame-per-send_packet contract; upstream FFmpeg consumers that
|
||||||
|
expect display-order output remain untouched (flag default = off).
|
||||||
|
|
||||||
|
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 deblock
|
||||||
|
+ flag-restoration follow-up.
|
||||||
|
---
|
||||||
|
libavcodec/h264_slice.c | 23 +++++++++++++++++++++++
|
||||||
|
1 file changed, 23 insertions(+)
|
||||||
|
|
||||||
|
diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
|
||||||
|
index 97fab70..a7bfbd6 100644
|
||||||
|
--- a/libavcodec/h264_slice.c
|
||||||
|
+++ b/libavcodec/h264_slice.c
|
||||||
|
@@ -1308,6 +1308,28 @@ static int h264_select_output_frame(H264Context *h)
|
||||||
|
cur->mmco_reset = h->mmco_reset;
|
||||||
|
h->mmco_reset = 0;
|
||||||
|
|
||||||
|
+ /* AV_CODEC_FLAG_LOW_DELAY restore (FFmpeg 8.x dropped the H.264
|
||||||
|
+ * decoder's low_delay path). Bypass the display-order DPB
|
||||||
|
+ * output queue: emit the just-decoded picture immediately, in
|
||||||
|
+ * decode order, one per send_packet. V4L2-stateless-style
|
||||||
|
+ * consumers (daedalus-v4l2 daemon, libva-v4l2-request-fourier)
|
||||||
|
+ * do their own POC-based reorder downstream and require this
|
||||||
|
+ * behaviour. */
|
||||||
|
+ if (h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) {
|
||||||
|
+ h->next_output_pic = cur;
|
||||||
|
+ h->next_outputed_poc = cur->poc;
|
||||||
|
+ h->frame_recovered |= cur->recovered;
|
||||||
|
+ cur->recovered |= h->frame_recovered & FRAME_RECOVERED_SEI;
|
||||||
|
+ if (!cur->recovered) {
|
||||||
|
+ if (!(h->avctx->flags & AV_CODEC_FLAG_OUTPUT_CORRUPT) &&
|
||||||
|
+ !(h->avctx->flags2 & AV_CODEC_FLAG2_SHOW_ALL))
|
||||||
|
+ h->next_output_pic = NULL;
|
||||||
|
+ else
|
||||||
|
+ cur->f->flags |= AV_FRAME_FLAG_CORRUPT;
|
||||||
|
+ }
|
||||||
|
+ return 0;
|
||||||
|
+ }
|
||||||
|
+
|
||||||
|
if (sps->bitstream_restriction_flag ||
|
||||||
|
h->avctx->strict_std_compliance >= FF_COMPLIANCE_STRICT) {
|
||||||
|
h->avctx->has_b_frames = FFMAX(h->avctx->has_b_frames, sps->num_reorder_frames);
|
||||||
|
@@ -1415,6 +1437,7 @@ static int h264_field_start(H264Context *h, const H264SliceContext *sl,
|
||||||
|
sps = h->ps.sps;
|
||||||
|
|
||||||
|
if (sps->bitstream_restriction_flag &&
|
||||||
|
+ !(h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) &&
|
||||||
|
h->avctx->has_b_frames < sps->num_reorder_frames) {
|
||||||
|
h->avctx->has_b_frames = sps->num_reorder_frames;
|
||||||
|
}
|
||||||
|
--
|
||||||
|
2.47.3
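The early-exit added to `h264_select_output_frame` in the patch above can be read as a small decision rule: emit the current picture at once, unless it is unrecovered and the caller has not asked for corrupt output. The toy model below restates that rule outside libavcodec so it can be followed in isolation. All `toy_` names and the bit value of `TOY_RECOVERED_SEI` are hypothetical; the control flow is copied from the patch, and the POC bookkeeping is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RECOVERED_SEI 0x2  /* hypothetical bit value */

struct toy_pic {
    int  poc;
    int  recovered;  /* recovery flags of this picture */
    bool corrupt;    /* stands in for AV_FRAME_FLAG_CORRUPT */
};

struct toy_dec {
    int  frame_recovered;
    bool output_corrupt;  /* stands in for AV_CODEC_FLAG_OUTPUT_CORRUPT */
    bool show_all;        /* stands in for AV_CODEC_FLAG2_SHOW_ALL */
    struct toy_pic *next_output_pic;
};

/* Low-delay path: output in decode order, one picture per call. */
static void toy_select_output_low_delay(struct toy_dec *d, struct toy_pic *cur)
{
    assert(d != NULL && cur != NULL);
    d->next_output_pic = cur;
    d->frame_recovered |= cur->recovered;
    cur->recovered |= d->frame_recovered & TOY_RECOVERED_SEI;
    if (!cur->recovered) {
        if (!d->output_corrupt && !d->show_all)
            d->next_output_pic = NULL;  /* suppress unrecovered picture */
        else
            cur->corrupt = true;        /* emit it, but mark it corrupt */
    }
}
```

No reorder buffer appears anywhere in this path, so output order equals decode order and any POC-based reordering is left to the downstream consumer, as the patch comment states.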
|
||||||
|
|
||||||
+9
-5
@@ -33,11 +33,13 @@ FFMPEG_VERSION=8.1
|
|||||||
# epoch 2 matches Debian's stock ffmpeg (currently 7:7.1.x in trixie);
|
# epoch 2 matches Debian's stock ffmpeg (currently 7:7.1.x in trixie);
|
||||||
# +rfourier suffix to avoid colliding with upstream/Debian rebuilds.
|
# +rfourier suffix to avoid colliding with upstream/Debian rebuilds.
|
||||||
PKGVER=2:${FFMPEG_VERSION}+rfourier+gb57fbbe
|
PKGVER=2:${FFMPEG_VERSION}+rfourier+gb57fbbe
|
||||||
PKGREL=7 # pkgrel=7 — H.264 IDCT 8x8 daedalus-fourier substitution
|
PKGREL=9 # pkgrel=9 — restore AV_CODEC_FLAG_LOW_DELAY semantics in the
|
||||||
# (cycle 7). Stacks on top of cycle-6 IDCT 4x4 (PR #76) and
|
# H.264 decoder (FFmpeg 8.x dropped them). Fixes the 2-1-4-3
|
||||||
# the libxml2-drop ABI-skew workaround (PR #78). Wires
|
# B-frame pair-swap that re-appeared in Firefox YouTube after
|
||||||
# H264DSPContext.idct8_add through
|
# the SONAME 61→62 jump (PR #75) silently neutered the
|
||||||
# daedalus_recipe_dispatch_h264_idct8. (2026-05-22)
|
# daemon's ctx->flags |= AV_CODEC_FLAG_LOW_DELAY at
|
||||||
|
# daemon/src/decoder.c:202. Substitution arc unchanged.
|
||||||
|
# (2026-05-22)
|
||||||
|
|
||||||
# daedalus-fourier pin — first kernel substitution in libavcodec (cycle 6
|
# daedalus-fourier pin — first kernel substitution in libavcodec (cycle 6
|
||||||
# H.264 IDCT 4x4). Same SHA as the daedalus-v4l2 daemon already ships
|
# H.264 IDCT 4x4). Same SHA as the daedalus-v4l2 daemon already ships
|
||||||
@@ -68,6 +70,8 @@ patch -Np1 -i "$HERE/0001-libudev-bypass-fallback.patch"
|
|||||||
patch -Np1 -i "$HERE/0002-nv15-to-p010-unpack.patch"
|
patch -Np1 -i "$HERE/0002-nv15-to-p010-unpack.patch"
|
||||||
patch -Np1 -i "$HERE/0003-h264-idct4-daedalus-fourier.patch"
|
patch -Np1 -i "$HERE/0003-h264-idct4-daedalus-fourier.patch"
|
||||||
patch -Np1 -i "$HERE/0004-h264-idct8-daedalus-fourier.patch"
|
patch -Np1 -i "$HERE/0004-h264-idct8-daedalus-fourier.patch"
|
||||||
|
patch -Np1 -i "$HERE/0005-h264-deblock-luma-v-daedalus-fourier.patch"
|
||||||
|
patch -Np1 -i "$HERE/0006-h264-restore-low-delay.patch"
|
||||||
|
|
||||||
# --- daedalus-fourier: fetch + build static .a with PIC, install to a
|
# --- daedalus-fourier: fetch + build static .a with PIC, install to a
|
||||||
# per-build prefix; libavcodec.so links it into the shared object so
|
# per-build prefix; libavcodec.so links it into the shared object so
|
||||||
|
|||||||
@@ -1,3 +1,53 @@
|
|||||||
|
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-9) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
|
* Add 0006-h264-restore-low-delay.patch — restore the documented
|
||||||
|
AV_CODEC_FLAG_LOW_DELAY semantics in the H.264 decoder. FFmpeg
|
||||||
|
8.x dropped the H.264 low_delay code path entirely; setting the
|
||||||
|
flag at avcodec_open2 no longer prevents the display-order DPB
|
||||||
|
output queue from running. Visible on Firefox YouTube as the
|
||||||
|
2-1-4-3 B-frame pair-swap, re-introduced silently by the
|
||||||
|
SONAME 61→62 jump in daedalus-v4l2 PR #16.
|
||||||
|
* h264_select_output_frame: early-exit when LOW_DELAY is set;
|
||||||
|
emit the just-decoded picture as next_output_pic, mirror the
|
||||||
|
corruption / recovery-point tracking, skip delayed_pic[] and
|
||||||
|
the POC reorder machinery entirely.
|
||||||
|
* h264_field_start: suppress the SPS-driven
|
||||||
|
has_b_frames = sps->num_reorder_frames clobber when LOW_DELAY
|
||||||
|
is set — without this the per-slice bitstream_restriction_flag
|
||||||
|
re-pickup would reintroduce a nonzero reorder buffer mid-
|
||||||
|
stream.
|
||||||
|
* Restores the same one-frame-per-send_packet contract the
|
||||||
|
daedalus-v4l2 daemon's decoder.c already relies on (the flag
|
||||||
|
is set unconditionally for H.264). No daemon side change.
|
||||||
|
* No SONAME change, no Depends change.
|
||||||
|
|
||||||
|
-- Markus Fritsche <mfritsche@reauktion.de> Fri, 22 May 2026 13:30:00 +0000
|
||||||
|
|
||||||
|
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-8) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
|
* Add 0005-h264-deblock-luma-v-daedalus-fourier.patch —
|
||||||
|
H264DSPContext.v_loop_filter_luma (non-intra bS<4 vertical luma
|
||||||
|
deblock, called per macroblock-row edge from the slice deblock
|
||||||
|
loop in libavcodec/h264_loopfilter.c) now dispatches through
|
||||||
|
daedalus_recipe_dispatch_h264_deblock_luma_v instead of
|
||||||
|
ff_h264_v_loop_filter_luma_neon. Cycle 8 of the daedalus-v4l2#11
|
||||||
|
step 2 substitution arc.
|
||||||
|
* Cycle 8 is marked "CPU primary; QPU opportunistic" in
|
||||||
|
daedalus-fourier, but the libavcodec.so context here uses
|
||||||
|
daedalus_ctx_create_no_qpu (process-global pthread_once,
|
||||||
|
shared with cycles 6/7). Opportunistic QPU is deferred to a
|
||||||
|
separate change that gates Vulkan init on a feature flag, to
|
||||||
|
avoid implicit Vulkan init in arbitrary host processes. For
|
||||||
|
now cycle 8 is plumbing-only — NEON-by-recipe.
|
||||||
|
* Intra (bS=4) loop filter c->v_loop_filter_luma_intra stays on
|
||||||
|
the in-tree NEON .S code; daedalus's daedalus_h264_deblock_meta
|
||||||
|
only covers the non-intra path per its API docstring.
|
||||||
|
* Bit-exact against ff_h264_v_loop_filter_luma_neon (daedalus-fourier
|
||||||
|
cycle 8 green).
|
||||||
|
* No SONAME change, no Depends change.
|
||||||
|
|
||||||
|
-- Markus Fritsche <mfritsche@reauktion.de> Fri, 22 May 2026 12:30:00 +0000
|
||||||
|
|
||||||
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-7) bookworm trixie; urgency=medium
|
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-7) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
* Add 0004-h264-idct8-daedalus-fourier.patch — H264DSPContext.idct8_add
|
* Add 0004-h264-idct8-daedalus-fourier.patch — H264DSPContext.idct8_add
|
||||||
|
|||||||