forked from marfrit/marfrit-packages
Compare commits
11 Commits
9301894997
...
1414dfeac2
| SHA1 |
|---|
| 1414dfeac2 |
| 41c1e0b6b9 |
| c9a4b82f2c |
| 736b6da176 |
| 34972ae9c1 |
| a9f1b833b9 |
| 83e8eca56d |
| 1c8c186681 |
| a0be2dcc9f |
| eb89f12c3e |
| ce2fff1a4f |
@@ -1166,9 +1166,18 @@ jobs:
 # daemon never link-binds against libav (Option γ — dlopen
 # at runtime), so any header set with the right struct
 # definitions works.
+# libvulkan-dev + glslang-tools: needed by the in-build
+# daedalus-fourier fetch (build-deb.sh fetches the sibling
+# library, cmake-builds it into a temp prefix, then the
+# daedalus daemon static-links against it via pkg-config).
+# Without these, daedalus-fourier's find_package(Vulkan)
+# and glslangValidator find_program both fail at configure
+# time. See marfrit/daedalus-fourier PR #1 +
+# reauktion/daedalus-v4l2 PR #13.
 retry apt-get install -y --no-install-recommends \
   build-essential cmake ninja-build pkg-config git \
   libavcodec-dev libavformat-dev libavutil-dev libdrm-dev \
+  libvulkan-dev glslang-tools \
   linux-libc-dev \
   curl ca-certificates openssh-client rsync dpkg-dev
 
@@ -18,12 +18,15 @@ _module=daedalus_v4l2
 
 # Same pin as arch/daedalus-v4l2 — keep kernel module + daemon
 # bit-versioned together so the chardev wire protocol stays in sync.
-# PROTO_VERSION 0 → 1 at this pin (H.264 B-frame reorder fix); must
-# install both packages atomically.
-_commit=79256dc7ef41f83873ca9c23db20f5888858e65d
+# 5d8b436 reverts PRs #7 + #8 (parking design that broke libva's
+# 1:1 contract — see daedalus-v4l2#9 + #10). Tree is
+# content-equivalent to f0d4186 plus PR #4 (cosmetic menu ctrls).
+# PROTO_VERSION drops 1 → 0; lock-step install with
+# daedalus-v4l2 0.1.0.r33.5d8b436 REQUIRED.
+_commit=5d8b4369e58ab947d1c56b1f718293c57c6065b5
 
-pkgver=0.1.0.r28.79256dc
-pkgrel=1 # reset for new upstream pin (79256dc — H.264 B-frame reorder fix)
+pkgver=0.1.0.r33.5d8b436
+pkgrel=1 # reset for new upstream pin (5d8b436 — revert parking design)
 pkgdesc="V4L2 stateless decoder shim kernel module (DKMS) — Pi 5 / CM5"
 arch=('any')
 url="https://git.reauktion.de/reauktion/daedalus-v4l2"
@@ -16,18 +16,21 @@
 pkgname=daedalus-v4l2
 _upstreampkg=daedalus-v4l2
 
-# Pin the daedalus-v4l2 tip. 79256dc = "kernel + daemon: H.264 B-frame
-# display reorder fix (closes #6)" — adds the wire-protocol src_pts /
-# output_src_pts / RESP_FRAME flags split that lets H.264 streams with
-# B-frames preserve display order through libva → kernel → daemon.
-# PROTO_VERSION bumps 0 → 1; lock-step userspace + kernel rebuild
-# REQUIRED (daedalus-v4l2-dkms PKGBUILD pinned to the same commit).
-_commit=79256dc7ef41f83873ca9c23db20f5888858e65d
+# 77e14e5 = post-revert state plus daedalus-v4l2 PRs #12 (LOW_DELAY
+# half-measure for the H.264 display-reorder visual) and #13 (daemon
+# now links daedalus-fourier and logs substrate availability at
+# startup). Daemon now needs `daedalus-fourier` at build time —
+# Arch packaging for that sibling library is a follow-up; until it
+# lands as an AUR-style PKGBUILD, this Arch build expects
+# daedalus-fourier installed to /usr/local (or equivalent) by the
+# user. See debian/daedalus-v4l2/build-deb.sh for the Debian-side
+# in-build fetch-and-install of daedalus-fourier.
+_commit=77e14e5a192f0eef0b41dd1140205e29d13d4d58
 
 # 0.1.0 (pre-1.0) + commit count + short sha. Bump the .Y on each
 # Phase 8.x close. pkgver() recomputes at build time.
-pkgver=0.1.0.r28.79256dc
-pkgrel=1 # reset for new upstream pin (79256dc — H.264 B-frame reorder fix)
+pkgver=0.1.0.r37.77e14e5
+pkgrel=1 # reset for new upstream pin (77e14e5 — daedalus-fourier linkage)
 pkgdesc="Userspace daemon for the daedalus-v4l2 V4L2 stateless decoder shim (VP9/AV1/H.264 on Pi 5 / CM5)"
 arch=('aarch64')
 url="https://git.reauktion.de/reauktion/daedalus-v4l2"
@@ -24,28 +24,29 @@ pkgname=libva-v4l2-request-fourier
 epoch=1
 _upstreampkg=libva-v4l2-request
 
-# Pin the fork tip. 2860d75 = PR #14 merge "picture: bounds-check
-# codec_store_buffer slice writes against source_size (#13)" — addresses
-# marfrit/libva-v4l2-request-fourier#13. Guards the three append sites
-# in codec_store_buffer's VASliceDataBufferType branch (H.264 Annex-B
-# start code, VP8 uncompressed-header pad, slice payload) against the
-# OUTPUT pool slot's fixed sizeimage; returns
-# VA_STATUS_ERROR_ALLOCATION_FAILED with a request_log line instead of
-# memcpy'ing past the mmap on a resolution upshift mid-stream (SIGSEGV
-# in mpv --hwdec=vaapi-copy, heap-corruption hazard in Firefox RDD).
-# The root-cause refactor (re-init OUTPUT pool / re-create surfaces on
-# resolution change, or grow source_data on demand) is tracked as the
-# follow-up backlog item; this PR is the memory-safety floor.
+# Pin the fork tip. c454618 = PR #16 merge "picture, request_pool:
+# transparent OUTPUT-pool resize on bitstream overrun (#15)" —
+# follow-up root-cause fix to #13/#14. On a mid-stream bitstream-
+# budget overrun (typical cause: SPS-driven resolution upshift in an
+# adaptive-bitrate stream), codec_store_buffer now snapshots the in-
+# flight surface's accumulated bytes, releases its OUTPUT pool slot,
+# calls request_pool_resize (STREAMOFF → REQBUFS(0) → S_FMT with
+# 2×sizeimage hint, capped at 1 GiB, page-aligned → CREATE_BUFS →
+# mmap → media_request_alloc → STREAMON), re-acquires a slot, re-
+# mirrors the surface's source_{data,size,request_fd}, restores the
+# bytes, and continues. The frame survives instead of being dropped
+# back to libavcodec for surface recreation. CAPTURE side untouched
+# (per-queue V4L2 streaming independence).
 #
-# Prior pin (77f9236) = PR #12 merge — av1_set_controls
-# (V4L2_CID_STATELESS_AV1_SEQUENCE for the daedalus daemon track).
-_commit=2860d75afe6a8e34df6afb508ff85c822bf9e908
+# Prior pin (2860d75) = PR #14 merge — codec_store_buffer bounds-
+# check floor (#13).
+_commit=c454618ae11addce2e17b560f4deeacbed067d98
 
 # Project version from meson.build (1.0.0) + commit count + short sha,
 # matching the ffmpeg-v4l2-request-fourier convention. Recomputed at
 # build time by pkgver() below; the static value here is a placeholder
 # so AUR-style consumers see something coherent before src/ exists.
-pkgver=1.0.0.r388.2860d75
+pkgver=1.0.0.r390.c454618
 pkgrel=1
 pkgdesc="VA-API backend for V4L2 stateless decoders (multiplanar fork — fourier umbrella)"
 arch=('aarch64')
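The size hint named in the new pin comment (2×sizeimage, capped at 1 GiB, page-aligned) can be sketched as follows. This is a hedged illustration, not the fork's code: the name `output_pool_size_hint`, the order of the cap and alignment steps, and passing the page size as a parameter are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_SIZE_CAP (UINT64_C(1) << 30) /* the 1 GiB cap from the comment */

/* Hypothetical helper: double the current sizeimage, clamp it to the cap,
 * then round up to a whole number of pages. page_size must be a power of
 * two no larger than the cap (true for 4 KiB and 16 KiB pages). */
static uint64_t
output_pool_size_hint(uint64_t cur_sizeimage, uint64_t page_size)
{
   assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
   assert(page_size <= POOL_SIZE_CAP);

   uint64_t hint = cur_sizeimage * 2;
   if (hint > POOL_SIZE_CAP)
      hint = POOL_SIZE_CAP;

   /* Round up to the next page boundary. The result cannot exceed the
    * cap because the cap is itself a multiple of page_size. */
   return (hint + page_size - 1) & ~(page_size - 1);
}
```

With a 3,000,000-byte slot and 4 KiB pages this yields 6,000,640 bytes; a slot already at 1 GiB stays at 1 GiB.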
@@ -0,0 +1,629 @@
diff -urN a/src/panfrost/vulkan/meson.build b/src/panfrost/vulkan/meson.build
--- a/src/panfrost/vulkan/meson.build 2026-05-21 14:04:02.529474145 +0200
+++ b/src/panfrost/vulkan/meson.build 2026-05-21 14:04:04.106755486 +0200
@@ -123,6 +123,7 @@
   'panvk_vX_nir_lower_input_attachment_loads.c',
   'panvk_vX_sampler.c',
   'panvk_vX_shader.c',
+  'panvk_vX_xfb_lower.c',
   sha1_h,
 ]
 
diff -urN a/src/panfrost/vulkan/panvk_shader.h b/src/panfrost/vulkan/panvk_shader.h
--- a/src/panfrost/vulkan/panvk_shader.h 2026-05-21 14:04:02.525251986 +0200
+++ b/src/panfrost/vulkan/panvk_shader.h 2026-05-21 14:04:04.084251800 +0200
@@ -154,6 +154,8 @@
       /* aligned_u64 attribute below inserts the 4-byte alignment gap
        * after num_vertices automatically — no explicit pad needed. */
       aligned_u64 xfb_address[4]; /* iter13: 4 transform feedback buffer base addresses */
+      uint32_t xfb_topology; /* iter17: panvk_xfb_topology enum value */
+      uint32_t xfb_output_count; /* iter17: per-instance output verts after decomp */
 #endif
       int32_t first_vertex;
       int32_t base_instance;
@@ -569,4 +571,76 @@
    struct pan_compute_dim local_size, const void *bin_ptr, size_t bin_size,
    struct panvk_shader **shader_out);
 
+
+#if PAN_ARCH < 9
+/* iter17: encoding for vs.xfb_topology sysval. Maps VkPrimitiveTopology values
+ * we need to distinguish at shader runtime for XFB capture. LIST topologies
+ * use the iter13 single-store fast path; non-LIST need per-vertex decomposition. */
+enum panvk_xfb_topology {
+   PANVK_XFB_TOPO_LIST = 0,
+   PANVK_XFB_TOPO_LINE_STRIP = 1,
+   PANVK_XFB_TOPO_TRI_STRIP = 2,
+   PANVK_XFB_TOPO_TRI_FAN = 3,
+   PANVK_XFB_TOPO_LINE_LIST_ADJ = 4,
+   PANVK_XFB_TOPO_LINE_STRIP_ADJ = 5,
+   PANVK_XFB_TOPO_TRI_LIST_ADJ = 6,
+   PANVK_XFB_TOPO_TRI_STRIP_ADJ = 7,
+};
+
+#include "panvk_macros.h"
+struct nir_shader;
+bool panvk_per_arch(nir_lower_xfb)(struct nir_shader *nir);
+
+/* Map VkPrimitiveTopology to panvk_xfb_topology enum (driver-side helper). */
+static inline uint32_t
+panvk_vk_topology_to_xfb_enum(VkPrimitiveTopology topo)
+{
+   switch (topo) {
+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
+      return PANVK_XFB_TOPO_LINE_STRIP;
+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
+      return PANVK_XFB_TOPO_TRI_STRIP;
+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
+      return PANVK_XFB_TOPO_TRI_FAN;
+   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
+      return PANVK_XFB_TOPO_LINE_LIST_ADJ;
+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
+      return PANVK_XFB_TOPO_LINE_STRIP_ADJ;
+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
+      return PANVK_XFB_TOPO_TRI_LIST_ADJ;
+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
+      return PANVK_XFB_TOPO_TRI_STRIP_ADJ;
+   case VK_PRIMITIVE_TOPOLOGY_POINT_LIST:
+   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST:
+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST:
+   default:
+      return PANVK_XFB_TOPO_LIST;
+   }
+}
+
+/* Compute the per-instance output vertex count for a given (topology, input count). */
+static inline uint32_t
+panvk_xfb_output_count(VkPrimitiveTopology topo, uint32_t input_count)
+{
+   switch (topo) {
+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
+      return input_count >= 1 ? 2u * (input_count - 1u) : 0u;
+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
+      return input_count >= 2 ? 3u * (input_count - 2u) : 0u;
+   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
+      return (input_count / 4u) * 2u;
+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
+      return input_count >= 3 ? 2u * (input_count - 3u) : 0u;
+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
+      return (input_count / 6u) * 3u;
+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
+      return input_count >= 6 ? 3u * (input_count / 2u - 2u) : 0u;
+   default:
+      return input_count; /* LIST topologies: 1:1 mapping */
+   }
+}
+#endif
+
+
 #endif
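The counts returned by `panvk_xfb_output_count` above follow from "primitives assembled × vertices per primitive". The sketch below restates that arithmetic in standalone form so the numbers can be checked without Vulkan headers. The `enum topo` values and the three function names are hypothetical stand-ins, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the non-LIST VkPrimitiveTopology values. */
enum topo {
   TOPO_LINE_STRIP,
   TOPO_TRI_STRIP,
   TOPO_TRI_FAN,
   TOPO_LINE_LIST_ADJ,
   TOPO_LINE_STRIP_ADJ,
   TOPO_TRI_LIST_ADJ,
   TOPO_TRI_STRIP_ADJ,
};

/* Number of primitives assembled from n input vertices. */
static uint32_t
prim_count(enum topo t, uint32_t n)
{
   switch (t) {
   case TOPO_LINE_STRIP:     return n >= 1 ? n - 1 : 0;
   case TOPO_TRI_STRIP:
   case TOPO_TRI_FAN:        return n >= 2 ? n - 2 : 0;
   case TOPO_LINE_LIST_ADJ:  return n / 4;
   case TOPO_LINE_STRIP_ADJ: return n >= 3 ? n - 3 : 0;
   case TOPO_TRI_LIST_ADJ:   return n / 6;
   case TOPO_TRI_STRIP_ADJ:  return n >= 6 ? n / 2 - 2 : 0;
   }
   return 0;
}

/* Vertices captured per primitive: 2 for lines, 3 for triangles.
 * Adjacency vertices are never captured. */
static uint32_t
verts_per_prim(enum topo t)
{
   switch (t) {
   case TOPO_LINE_STRIP:
   case TOPO_LINE_LIST_ADJ:
   case TOPO_LINE_STRIP_ADJ:
      return 2;
   default:
      return 3;
   }
}

static uint32_t
xfb_output_count(enum topo t, uint32_t n)
{
   uint32_t prims = prim_count(t, n);
   uint32_t vpp = verts_per_prim(t);

   /* Assumption of this sketch: the product fits in 32 bits. */
   assert(prims <= UINT32_MAX / vpp);
   return prims * vpp;
}
```

For example, a 5-vertex triangle strip assembles 3 triangles and captures 9 vertices, while a 2-vertex line strip with adjacency assembles nothing and captures 0.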
diff -urN a/src/panfrost/vulkan/panvk_vX_cmd_draw.c b/src/panfrost/vulkan/panvk_vX_cmd_draw.c
--- a/src/panfrost/vulkan/panvk_vX_cmd_draw.c 2026-05-21 14:04:02.528576354 +0200
+++ b/src/panfrost/vulkan/panvk_vX_cmd_draw.c 2026-05-21 14:04:04.091357598 +0200
@@ -727,6 +727,20 @@
    /* iter13: VK_EXT_transform_feedback sysvals — always set (per draw),
     * reflect bound XFB state. set_gfx_sysval is a no-op if value unchanged. */
    set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, info->vertex.count);
+
+   /* iter17: XFB primitive-decomposition sysvals.
+    * xfb_topology = enum value for the current bound topology.
+    * xfb_output_count = per-instance output vertex count after decomposition.
+    * For LIST topologies, output_count == input vertex count and the shader
+    * takes the iter13 single-store fast path. */
+   {
+      VkPrimitiveTopology vk_topo =
+         cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology;
+      uint32_t topo_enum = panvk_vk_topology_to_xfb_enum(vk_topo);
+      uint32_t out_count = panvk_xfb_output_count(vk_topo, info->vertex.count);
+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_topology, topo_enum);
+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_output_count, out_count);
+   }
    {
       const struct panvk_cmd_graphics_state *_gfx = &cmdbuf->state.gfx;
       /* iter13: default each XFB buffer address to PAN_SHADER_OOB_ADDRESS
diff -urN a/src/panfrost/vulkan/panvk_vX_shader.c b/src/panfrost/vulkan/panvk_vX_shader.c
--- a/src/panfrost/vulkan/panvk_vX_shader.c 2026-05-21 14:04:02.527576494 +0200
+++ b/src/panfrost/vulkan/panvk_vX_shader.c 2026-05-21 14:04:04.098356619 +0200
@@ -895,7 +895,10 @@
        nir->info.has_transform_feedback_varyings) {
       NIR_PASS(_, nir, nir_opt_constant_folding);
       NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
-      NIR_PASS(_, nir, pan_nir_lower_xfb);
+      /* iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
+       * primitive decomposition for non-LIST topologies. Single-store LIST
+       * fast path matches iter13 behavior. */
+      NIR_PASS(_, nir, panvk_per_arch(nir_lower_xfb));
    }
 #endif
 }
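The primitive decomposition that this replacement pass performs is a per-vertex scatter: each vertex invocation writes its captured value into every (primitive, slot) position it occupies after the strip is expanded into independent triangles. A CPU-side sketch for TRIANGLE_STRIP follows, assuming the Vulkan ordering in which odd-numbered triangles swap their last two vertices. The function name `tri_strip_scatter` and the `int` payload are illustrative only and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scatter form: vertex v of an n-vertex triangle strip writes itself into
 * each (prim, slot) it belongs to. out must hold 3 * (n - 2) entries.
 * Nothing is written when n < 3, which also keeps n - 2 from wrapping. */
static void
tri_strip_scatter(const int *in, uint32_t n, int *out)
{
   assert(in != NULL && out != NULL);
   if (n < 3)
      return;

   for (uint32_t v = 0; v < n; v++) {
      /* Prim v, slot 0: needs two more vertices after v. */
      if (v < n - 2)
         out[3 * v + 0] = in[v];

      /* Prim v-1: slot 1 if that prim is even, slot 2 if odd. */
      if (v >= 1 && v < n - 1) {
         uint32_t p = v - 1;
         out[3 * p + 1 + (p & 1)] = in[v];
      }

      /* Prim v-2: slot 2 if that prim is even, slot 1 if odd. */
      if (v >= 2) {
         uint32_t p = v - 2;
         out[3 * p + 2 - (p & 1)] = in[v];
      }
   }
}
```

The early return for `n < 3` plays the same role as gating non-LIST emission on a non-zero output count: without it, the unsigned `n - 2` comparison would wrap and enable stores that should never fire.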
diff -urN a/src/panfrost/vulkan/panvk_vX_xfb_lower.c b/src/panfrost/vulkan/panvk_vX_xfb_lower.c
--- a/src/panfrost/vulkan/panvk_vX_xfb_lower.c 1970-01-01 01:00:00.000000000 +0100
+++ b/src/panfrost/vulkan/panvk_vX_xfb_lower.c 2026-05-21 14:04:04.115354242 +0200
@@ -0,0 +1,486 @@
+/*
+ * Copyright © 2026 mfritsche / claude-noether
+ * SPDX-License-Identifier: MIT
+ *
+ * iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
+ * primitive decomposition for transform_feedback on non-LIST topologies
+ * (TRIANGLE_STRIP/FAN, LINE_STRIP, *_WITH_ADJACENCY).
+ *
+ * Approach: emit a topology dispatch at the start of each store_output
+ * lowering. The shader reads vs.xfb_topology sysval at runtime and branches
+ * into per-topology emission logic. For each affected topology, the lowered
+ * code emits guarded conditional stores — one per primitive this vertex
+ * contributes to, computing the output buffer position via primitive index
+ * and slot within the decomposed primitive.
+ *
+ * For LIST topologies (POINT/LINE/TRIANGLE LIST), takes a fast path that
+ * matches iter13's single-store behavior.
+ *
+ * For TRIANGLE_FAN, the central vertex (v=0) contributes to ALL primitives
+ * as slot 2 — handled via a NIR loop bounded by num_vertices.
+ *
+ * See ~/src/panvk-bifrost/iter17/phase{0,1,2}_*.md for full design context.
+ */
+
+#include "panvk_macros.h"
+
+#if PAN_ARCH < 9
+
+#include "panvk_shader.h"
+
+#include "compiler/nir/nir_builder.h"
+#include "pan_nir.h"
+
+#include <vulkan/vulkan_core.h>
+
+/* ----- Address arithmetic ----- */
+
+static nir_def *
+xfb_store_addr(nir_builder *b, nir_def *buf, nir_def *out_idx,
+               uint16_t stride, uint16_t offset_bytes)
+{
+   nir_def *byte_off = nir_iadd_imm(b,
+      nir_imul_imm(b, out_idx, stride), offset_bytes);
+   return nir_iadd(b, buf, nir_u2u64(b, byte_off));
+}
+
+static void
+emit_list_store(nir_builder *b, nir_def *buf, nir_def *output_count,
+                nir_def *instance_id, nir_def *raw_vid, nir_def *value,
+                uint16_t stride, uint16_t offset_bytes)
+{
+   nir_def *out_idx = nir_iadd(b,
+      nir_imul(b, instance_id, output_count), raw_vid);
+   nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
+   nir_store_global(b, value, addr);
+}
+
+static void
+emit_prim_store(nir_builder *b, nir_def *buf, nir_def *output_count,
+                nir_def *instance_id, nir_def *eligible,
+                nir_def *prim_idx, nir_def *slot,
+                uint32_t verts_per_prim,
+                nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+   nir_push_if(b, eligible);
+   {
+      nir_def *out_idx = nir_iadd(b,
+         nir_imul(b, instance_id, output_count),
+         nir_iadd(b, nir_imul_imm(b, prim_idx, verts_per_prim), slot));
+      nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
+      nir_store_global(b, value, addr);
+   }
+   nir_pop_if(b, NULL);
+}
+
+/* ----- Per-topology emission ----- */
+
+/* TRIANGLE_STRIP: vertex v contributes to prims v, v-1, v-2 (per eligibility). */
+static void
+emit_tri_strip(nir_builder *b, nir_def *v, nir_def *N,
+               nir_def *buf, nir_def *output_count, nir_def *instance_id,
+               nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
+
+   /* Prim v, slot 0: v < N-2 */
+   emit_prim_store(b, buf, output_count, instance_id,
+                   nir_ult(b, v, Nm2),
+                   v, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
+
+   /* Prim v-1, slot = 1 if prim even else 2: 1 <= v < N-1 */
+   {
+      nir_def *prim = nir_iadd_imm(b, v, -1);
+      nir_def *parity = nir_iand_imm(b, prim, 1u);
+      nir_def *slot = nir_iadd_imm(b, parity, 1);
+      nir_def *eligible = nir_iand(b,
+         nir_uge(b, v, nir_imm_int(b, 1)),
+         nir_ult(b, v, Nm1));
+      emit_prim_store(b, buf, output_count, instance_id, eligible,
+                      prim, slot, 3, value, stride, offset_bytes);
+   }
+
+   /* Prim v-2, slot = 2 if prim even else 1: 2 <= v < N */
+   {
+      nir_def *prim = nir_iadd_imm(b, v, -2);
+      nir_def *parity = nir_iand_imm(b, prim, 1u);
+      nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity);
+      nir_def *eligible = nir_iand(b,
+         nir_uge(b, v, nir_imm_int(b, 2)),
+         nir_ult(b, v, N));
+      emit_prim_store(b, buf, output_count, instance_id, eligible,
+                      prim, slot, 3, value, stride, offset_bytes);
+   }
+}
+
+/* LINE_STRIP: vertex v contributes to prim v slot 0 + prim v-1 slot 1. */
+static void
+emit_line_strip(nir_builder *b, nir_def *v, nir_def *N,
+                nir_def *buf, nir_def *output_count, nir_def *instance_id,
+                nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
+
+   /* Prim v, slot 0: v < N-1 */
+   emit_prim_store(b, buf, output_count, instance_id,
+                   nir_ult(b, v, Nm1),
+                   v, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
+
+   /* Prim v-1, slot 1: 1 <= v < N */
+   {
+      nir_def *prim = nir_iadd_imm(b, v, -1);
+      nir_def *eligible = nir_iand(b,
+         nir_uge(b, v, nir_imm_int(b, 1)),
+         nir_ult(b, v, N));
+      emit_prim_store(b, buf, output_count, instance_id, eligible,
+                      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
+   }
+}
+
+/* TRIANGLE_FAN: prim p emits {p+1, p+2, 0}.
+ * vertex v=0: contributes to ALL prims as slot 2 (loop required)
+ * vertex v>=1: contributes to prim v-1 as slot 0 (if 1 <= v <= N-2)
+ * vertex v>=2: contributes to prim v-2 as slot 1 (if 2 <= v <= N-1)
+ */
+static void
+emit_tri_fan(nir_builder *b, nir_def *v, nir_def *N,
+             nir_def *buf, nir_def *output_count, nir_def *instance_id,
+             nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
+   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
+
+   /* Prim v-1, slot 0: 1 <= v < N-1 */
+   {
+      nir_def *prim = nir_iadd_imm(b, v, -1);
+      nir_def *eligible = nir_iand(b,
+         nir_uge(b, v, nir_imm_int(b, 1)),
+         nir_ult(b, v, Nm1));
+      emit_prim_store(b, buf, output_count, instance_id, eligible,
+                      prim, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
+   }
+
+   /* Prim v-2, slot 1: 2 <= v < N */
+   {
+      nir_def *prim = nir_iadd_imm(b, v, -2);
+      nir_def *eligible = nir_iand(b,
+         nir_uge(b, v, nir_imm_int(b, 2)),
+         nir_ult(b, v, N));
+      emit_prim_store(b, buf, output_count, instance_id, eligible,
+                      prim, nir_imm_int(b, 1), 3, value, stride, offset_bytes);
+   }
+
+   /* Central vertex (v == 0): loop over all prims, write to slot 2. */
+   nir_push_if(b, nir_ieq_imm(b, v, 0));
+   {
+      nir_variable *p_var = nir_local_variable_create(b->impl,
+         glsl_uint_type(), "fan_p");
+      nir_store_var(b, p_var, nir_imm_int(b, 0), 0x1);
+      nir_push_loop(b);
+      {
+         nir_def *p = nir_load_var(b, p_var);
+         nir_push_if(b, nir_uge(b, p, Nm2));
+         {
+            nir_jump(b, nir_jump_break);
+         }
+         nir_pop_if(b, NULL);
+
+         nir_def *out_idx = nir_iadd(b,
+            nir_imul(b, instance_id, output_count),
+            nir_iadd_imm(b, nir_imul_imm(b, p, 3), 2));
+         nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
+         nir_store_global(b, value, addr);
+
+         nir_store_var(b, p_var, nir_iadd_imm(b, p, 1), 0x1);
+      }
+      nir_pop_loop(b, NULL);
+   }
+   nir_pop_if(b, NULL);
+}
+
+/* LINE_LIST_WITH_ADJACENCY: 4-vertex groups [4i..4i+3]; output {4i+1, 4i+2}.
+ * v contributes if v%4 == 1: prim v/4 slot 0
+ * v contributes if v%4 == 2: prim v/4 slot 1
+ */
+static void
+emit_line_list_adj(nir_builder *b, nir_def *v, nir_def *N,
+                   nir_def *buf, nir_def *output_count, nir_def *instance_id,
+                   nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+   (void)N; /* eligibility is mod-based, not range-based */
+   nir_def *vmod4 = nir_iand_imm(b, v, 3u);
+   nir_def *prim = nir_ushr_imm(b, v, 2); /* v / 4 */
+
+   emit_prim_store(b, buf, output_count, instance_id,
+                   nir_ieq_imm(b, vmod4, 1),
+                   prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
+
+   emit_prim_store(b, buf, output_count, instance_id,
+                   nir_ieq_imm(b, vmod4, 2),
+                   prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
+}
+
+/* LINE_STRIP_WITH_ADJACENCY: N-3 prims (0..N-4); prim p emits {p+1, p+2}.
+ * v contributes to prim v-1 slot 0 (1 <= v <= N-3)
+ * v contributes to prim v-2 slot 1 (2 <= v <= N-2)
+ */
+static void
+emit_line_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
+                    nir_def *buf, nir_def *output_count, nir_def *instance_id,
+                    nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
+   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
+
+   /* Prim v-1, slot 0: last prim is N-4, so 1 <= v <= N-3
+    * ⇔ v >= 1 AND v < N-2 */
+   {
+      nir_def *prim = nir_iadd_imm(b, v, -1);
+      nir_def *eligible = nir_iand(b,
+         nir_uge(b, v, nir_imm_int(b, 1)),
+         nir_ult(b, v, Nm2));
+      emit_prim_store(b, buf, output_count, instance_id, eligible,
+                      prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
+   }
+
+   /* Prim v-2, slot 1: 2 <= v <= N-2 ⇔ v >= 2 AND v < N-1 */
+   {
+      nir_def *prim = nir_iadd_imm(b, v, -2);
+      nir_def *eligible = nir_iand(b,
+         nir_uge(b, v, nir_imm_int(b, 2)),
+         nir_ult(b, v, Nm1));
+      emit_prim_store(b, buf, output_count, instance_id, eligible,
+                      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
+   }
+}
+
+/* TRIANGLE_LIST_WITH_ADJACENCY: 6-vertex groups; output {6i, 6i+2, 6i+4}.
+ * v contributes if v%6 == 0: prim v/6 slot 0
+ * v contributes if v%6 == 2: prim v/6 slot 1
+ * v contributes if v%6 == 4: prim v/6 slot 2
+ */
+static void
+emit_tri_list_adj(nir_builder *b, nir_def *v, nir_def *N,
+                  nir_def *buf, nir_def *output_count, nir_def *instance_id,
+                  nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+   (void)N;
+   nir_def *vmod6 = nir_umod_imm(b, v, 6);
+   nir_def *prim = nir_udiv_imm(b, v, 6);
+
+   for (uint32_t slot = 0; slot < 3; slot++) {
+      emit_prim_store(b, buf, output_count, instance_id,
+                      nir_ieq_imm(b, vmod6, slot * 2),
+                      prim, nir_imm_int(b, slot), 3, value, stride, offset_bytes);
+   }
+}
+
+/* TRIANGLE_STRIP_WITH_ADJACENCY: prim i emits:
+ * even i: {2i, 2i+2, 2i+4} (slots 0, 1, 2 ← input indices 2i, 2i+2, 2i+4)
+ * odd i: {2i, 2i+4, 2i+2} (slots 0, 1, 2 ← input indices 2i, 2i+4, 2i+2)
+ *
+ * Only EVEN input vertices contribute (since all output indices are 2*something).
+ * For even input v:
+ * prim v/2 slot 0 (always, if v/2 < N/2-2)
+ * prim (v-2)/2 slot 1 if (v-2)/2 even, slot 2 if odd (when v >= 2)
+ * prim (v-4)/2 slot 2 if (v-4)/2 even, slot 1 if odd (when v >= 4)
+ */
+static void
+emit_tri_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
+                   nir_def *buf, nir_def *output_count, nir_def *instance_id,
+                   nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+   /* Bail for odd input vertices — they never contribute. */
+   nir_def *v_is_even = nir_ieq_imm(b, nir_iand_imm(b, v, 1u), 0);
+   nir_push_if(b, v_is_even);
+   {
+      nir_def *N_half = nir_ushr_imm(b, N, 1);
+      nir_def *max_prim = nir_iadd_imm(b, N_half, -2); /* N/2 - 2 */
+      nir_def *v_half = nir_ushr_imm(b, v, 1);
+
+      /* Prim v/2 slot 0: v/2 < N/2 - 2 */
+      emit_prim_store(b, buf, output_count, instance_id,
+                      nir_ult(b, v_half, max_prim),
+                      v_half, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
+
+      /* Prim (v-2)/2 = v/2 - 1: v >= 2 AND prim < N/2-2 */
+      {
+         nir_def *prim = nir_iadd_imm(b, v_half, -1);
+         nir_def *parity = nir_iand_imm(b, prim, 1u);
+         nir_def *slot = nir_iadd_imm(b, parity, 1); /* even→1, odd→2 */
+         nir_def *eligible = nir_iand(b,
+            nir_uge(b, v, nir_imm_int(b, 2)),
+            nir_ult(b, prim, max_prim));
+         emit_prim_store(b, buf, output_count, instance_id, eligible,
+                         prim, slot, 3, value, stride, offset_bytes);
+      }
+
+      /* Prim (v-4)/2 = v/2 - 2: v >= 4 AND prim < N/2-2 */
+      {
+         nir_def *prim = nir_iadd_imm(b, v_half, -2);
+         nir_def *parity = nir_iand_imm(b, prim, 1u);
+         nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity); /* even→2, odd→1 */
+         nir_def *eligible = nir_iand(b,
+            nir_uge(b, v, nir_imm_int(b, 4)),
+            nir_ult(b, prim, max_prim));
+         emit_prim_store(b, buf, output_count, instance_id, eligible,
+                         prim, slot, 3, value, stride, offset_bytes);
+      }
+   }
+   nir_pop_if(b, NULL);
+}
+
+/* ----- Main lowering: per store_output XFB channel ----- */
+
+static void
+lower_xfb_output_iter17(nir_builder *b, nir_intrinsic_instr *intr,
+                        unsigned channel_idx, unsigned num_components,
+                        unsigned buffer, unsigned offset_words)
+{
+   assert(buffer < MAX_XFB_BUFFERS);
+   assert(nir_intrinsic_component(intr) == 0);
+
+   uint16_t stride = b->shader->info.xfb_stride[buffer] * 4;
+   assert(stride != 0);
+   uint16_t offset_bytes = offset_words * 4;
+
+   BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_VERTEX_ID_ZERO_BASE);
+   BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_INSTANCE_ID);
+
+   nir_def *topology = load_sysval(b, graphics, 32, vs.xfb_topology);
+   nir_def *out_count = load_sysval(b, graphics, 32, vs.xfb_output_count);
+   nir_def *N = nir_load_num_vertices(b);
+   nir_def *v = nir_load_raw_vertex_id_pan(b);
+   nir_def *instance = nir_load_instance_id(b);
+   nir_def *buf = nir_load_xfb_address(b, 64, .base = buffer);
+
+   nir_def *src = intr->src[0].ssa;
+   nir_component_mask_t mask = nir_component_mask(num_components);
+   nir_def *value = nir_channels(b, src, mask << channel_idx);
+
+   /* Topology dispatch ladder. LIST first (fast path). */
+   nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LIST));
+   {
+      emit_list_store(b, buf, out_count, instance, v, value,
+                      stride, offset_bytes);
+   }
+   nir_push_else(b, NULL);
+   {
+      /* iter17 Janet Finding 3: gate all non-LIST emission on
+       * output_count > 0. For degenerate input counts (N < min required
+       * for the topology), output_count is 0 and we must emit NO stores
+       * — otherwise N-2 / N-3 / etc. arithmetic underflows in the
+       * eligibility predicates and we falsely fire stores. */
+      nir_push_if(b, nir_ult(b, nir_imm_int(b, 0), out_count));
+      {
+         nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_STRIP));
+         {
+            emit_tri_strip(b, v, N, buf, out_count, instance, value,
+                           stride, offset_bytes);
+         }
+         nir_push_else(b, NULL);
+         {
+            nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP));
|
||||||
|
+ {
|
||||||
|
+ emit_line_strip(b, v, N, buf, out_count, instance, value,
|
||||||
|
+ stride, offset_bytes);
|
||||||
|
+ }
|
||||||
|
+ nir_push_else(b, NULL);
|
||||||
|
+ {
|
||||||
|
+ nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_FAN));
|
||||||
|
+ {
|
||||||
|
+ emit_tri_fan(b, v, N, buf, out_count, instance, value,
|
||||||
|
+ stride, offset_bytes);
|
||||||
|
+ }
|
||||||
|
+ nir_push_else(b, NULL);
|
||||||
|
+ {
|
||||||
|
+ nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_LIST_ADJ));
|
||||||
|
+ {
|
||||||
|
+ emit_line_list_adj(b, v, N, buf, out_count, instance, value,
|
||||||
|
+ stride, offset_bytes);
|
||||||
|
+ }
|
||||||
|
+ nir_push_else(b, NULL);
|
||||||
|
+ {
|
||||||
|
+ nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP_ADJ));
|
||||||
|
+ {
|
||||||
|
+ emit_line_strip_adj(b, v, N, buf, out_count, instance, value,
|
||||||
|
+ stride, offset_bytes);
|
||||||
|
+ }
|
||||||
|
+ nir_push_else(b, NULL);
|
||||||
|
+ {
|
||||||
|
+ nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_LIST_ADJ));
|
||||||
|
+ {
|
||||||
|
+ emit_tri_list_adj(b, v, N, buf, out_count, instance, value,
|
||||||
|
+ stride, offset_bytes);
|
||||||
|
+ }
|
||||||
|
+ nir_push_else(b, NULL);
|
||||||
|
+ {
|
||||||
|
+ /* TRI_STRIP_ADJ — last case */
|
||||||
|
+ emit_tri_strip_adj(b, v, N, buf, out_count, instance, value,
|
||||||
|
+ stride, offset_bytes);
|
||||||
|
+ }
|
||||||
|
+ nir_pop_if(b, NULL);
|
||||||
|
+ }
|
||||||
|
+ nir_pop_if(b, NULL);
|
||||||
|
+ }
|
||||||
|
+ nir_pop_if(b, NULL);
|
||||||
|
+ }
|
||||||
|
+ nir_pop_if(b, NULL);
|
||||||
|
+ }
|
||||||
|
+ nir_pop_if(b, NULL);
|
||||||
|
+ }
|
||||||
|
+ nir_pop_if(b, NULL);
|
||||||
|
+ }
|
||||||
|
+ nir_pop_if(b, NULL); /* Janet Finding 3: close output_count > 0 guard */
|
||||||
|
+ }
|
||||||
|
+ nir_pop_if(b, NULL);
|
||||||
|
+}
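The reason for the `output_count > 0` guard can be illustrated with a small shell model of 32-bit unsigned arithmetic. This is a sketch under stated assumptions: the predicate "v is below N - 2" is a hypothetical stand-in for the eligibility tests in the emit helpers (which are not shown here), and the shell is assumed to use 64-bit arithmetic, as bash and dash do.

```sh
#!/bin/sh
# Truncate a value to 32 bits, the way a uint32 subtraction wraps.
u32() {
    echo $(( $1 & 4294967295 ))
}

# Hypothetical unguarded predicate: v is below N - 2, computed unsigned.
# For N of 0 or 1 the subtraction wraps to a huge limit.
naive_eligible() {
    v=$1 n=$2
    limit=$(u32 $(( n - 2 )))
    [ "$v" -lt "$limit" ]
}

# Same predicate behind the output_count > 0 gate.
guarded_eligible() {
    v=$1 n=$2 out_count=$3
    [ "$out_count" -gt 0 ] && naive_eligible "$v" "$n"
}

# N = 1 is degenerate: no primitive exists, yet the naive test passes.
if naive_eligible 0 1; then
    echo "naive:   vertex 0 of a 1-vertex draw looks eligible (wrapped limit)"
fi
if ! guarded_eligible 0 1 0; then
    echo "guarded: no store emitted when output_count is 0"
fi
```

Gating once on `output_count` keeps each per-topology predicate simple, instead of every helper having to re-check its own minimum vertex count.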
|
||||||
|
+
|
||||||
|
+/* Mirror of pan_nir_lower_xfb's lower_xfb: load_vertex_id rewrite +
|
||||||
|
+ * dispatch store_output through our topology-aware emission. */
|
||||||
|
+static bool
|
||||||
|
+lower_xfb_iter17(nir_builder *b, nir_intrinsic_instr *intr,
|
||||||
|
+ UNUSED void *data)
|
||||||
|
+{
|
||||||
|
+ if (intr->intrinsic == nir_intrinsic_load_vertex_id) {
|
||||||
|
+ b->cursor = nir_instr_remove(&intr->instr);
|
||||||
|
+ nir_def *repl = nir_iadd(b, nir_load_raw_vertex_id_pan(b),
|
||||||
|
+ nir_load_raw_vertex_offset_pan(b));
|
||||||
|
+ nir_def_rewrite_uses(&intr->def, repl);
|
||||||
|
+ return true;
|
||||||
|
+ }
|
||||||
|
+
|
||||||
|
+ if (intr->intrinsic != nir_intrinsic_store_output)
|
||||||
|
+ return false;
|
||||||
|
+
|
||||||
|
+ bool progress = false;
|
||||||
|
+ b->cursor = nir_before_instr(&intr->instr);
|
||||||
|
+
|
||||||
|
+ /* io_xfb has only out[0,1]; the other 2 channels are in io_xfb2.
|
||||||
|
+ * Outer loop selects which annotation; inner picks which channel. */
|
||||||
|
+ for (unsigned i = 0; i < 2; ++i) {
|
||||||
|
+ nir_io_xfb xfb = i ? nir_intrinsic_io_xfb2(intr)
|
||||||
|
+ : nir_intrinsic_io_xfb(intr);
|
||||||
|
+ for (unsigned j = 0; j < 2; ++j) {
|
||||||
|
+ if (!xfb.out[j].num_components)
|
||||||
|
+ continue;
|
||||||
|
+ lower_xfb_output_iter17(b, intr, i * 2 + j, xfb.out[j].num_components,
|
||||||
|
+ xfb.out[j].buffer, xfb.out[j].offset);
|
||||||
|
+ progress = true;
|
||||||
|
+ }
|
||||||
|
+ }
|
||||||
|
+
|
||||||
|
+ if (progress)
|
||||||
|
+ nir_instr_remove(&intr->instr);
|
||||||
|
+ return progress;
|
||||||
|
+}
|
||||||
|
+
|
||||||
|
+bool
|
||||||
|
+panvk_per_arch(nir_lower_xfb)(nir_shader *nir)
|
||||||
|
+{
|
||||||
|
+ return nir_shader_intrinsics_pass(
|
||||||
|
+ nir, lower_xfb_iter17, nir_metadata_control_flow, NULL);
|
||||||
|
+}
|
||||||
|
+
|
||||||
|
+#endif /* PAN_ARCH < 9 */
|
||||||
@@ -30,7 +30,7 @@
|
|||||||
|
|
||||||
pkgname=mesa-panvk-bifrost
|
pkgname=mesa-panvk-bifrost
|
||||||
_mesaver=26.0.6
|
_mesaver=26.0.6
|
||||||
pkgver=26.0.6.r3
|
pkgver=26.0.6.r4
|
||||||
pkgrel=1
|
pkgrel=1
|
||||||
pkgdesc="Patched Mesa libvulkan_panfrost.so exposing Bifrost-gen Mali to Vulkan apps (panvk-bifrost campaign)"
|
pkgdesc="Patched Mesa libvulkan_panfrost.so exposing Bifrost-gen Mali to Vulkan apps (panvk-bifrost campaign)"
|
||||||
arch=('aarch64')
|
arch=('aarch64')
|
||||||
@@ -80,6 +80,7 @@ source=(
|
|||||||
"0001-panvk-expose-robustness2-nullDescriptor-bifrost.patch"
|
"0001-panvk-expose-robustness2-nullDescriptor-bifrost.patch"
|
||||||
"0002-panvk-expose-vulkan-1.1-1.2-on-bifrost.patch"
|
"0002-panvk-expose-vulkan-1.1-1.2-on-bifrost.patch"
|
||||||
"0003-panvk-bifrost-vk-ext-transform-feedback.patch"
|
"0003-panvk-bifrost-vk-ext-transform-feedback.patch"
|
||||||
|
"0004-panvk-bifrost-xfb-primitive-decomposition.patch"
|
||||||
"brave-vulkan"
|
"brave-vulkan"
|
||||||
"icd.json"
|
"icd.json"
|
||||||
)
|
)
|
||||||
@@ -90,6 +91,7 @@ sha256sums=(
|
|||||||
'SKIP'
|
'SKIP'
|
||||||
'SKIP'
|
'SKIP'
|
||||||
'SKIP'
|
'SKIP'
|
||||||
|
'SKIP'
|
||||||
)
|
)
|
||||||
|
|
||||||
prepare() {
|
prepare() {
|
||||||
@@ -116,6 +118,15 @@ prepare() {
|
|||||||
# reports "Hardware accelerated" across the board for the affected paths).
|
# reports "Hardware accelerated" across the board for the affected paths).
|
||||||
patch -p1 < "${srcdir}/0003-panvk-bifrost-vk-ext-transform-feedback.patch"
|
patch -p1 < "${srcdir}/0003-panvk-bifrost-vk-ext-transform-feedback.patch"
|
||||||
|
|
||||||
|
# iter17: XFB primitive decomposition for non-LIST topologies (TRI_STRIP,
|
||||||
|
# TRI_FAN, LINE_STRIP, *_WITH_ADJACENCY). Replacement panvk-specific
|
||||||
|
# NIR pass (panvk_per_arch(nir_lower_xfb)) substituted for upstream
|
||||||
|
# pan_nir_lower_xfb. Closes the 162 dEQP-VK winding_* failures from
|
||||||
|
# iter15 (958 P / 81 F / 0 Crash on full XFB CTS — remaining 81 fails
|
||||||
|
# are by-design resume_* tests, transformFeedbackDraw=false).
|
||||||
|
# Phase-doc context: ~/src/panvk-bifrost/iter17/phase{0,1,2,4,5,6,8}_*.md.
|
||||||
|
patch -p1 < "${srcdir}/0004-panvk-bifrost-xfb-primitive-decomposition.patch"
|
||||||
|
|
||||||
# Sanity-check the patches landed.
|
# Sanity-check the patches landed.
|
||||||
grep -q "KHR_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
|
grep -q "KHR_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
|
||||||
grep -q "EXT_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
|
grep -q "EXT_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
|
||||||
@@ -124,8 +135,12 @@ prepare() {
|
|||||||
grep -q "has_vk1_2 = true;" src/panfrost/vulkan/panvk_vX_physical_device.c
|
grep -q "has_vk1_2 = true;" src/panfrost/vulkan/panvk_vX_physical_device.c
|
||||||
# iter13 sanity:
|
# iter13 sanity:
|
||||||
grep -q "EXT_transform_feedback = PAN_ARCH < 9," src/panfrost/vulkan/panvk_vX_physical_device.c
|
grep -q "EXT_transform_feedback = PAN_ARCH < 9," src/panfrost/vulkan/panvk_vX_physical_device.c
|
||||||
grep -q "pan_nir_lower_xfb" src/panfrost/vulkan/panvk_vX_shader.c
|
|
||||||
test -f src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c
|
test -f src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c
|
||||||
|
# iter17 sanity: pan_nir_lower_xfb call site has been replaced; new file present.
|
||||||
|
grep -q "panvk_per_arch(nir_lower_xfb)" src/panfrost/vulkan/panvk_vX_shader.c
|
||||||
|
grep -q "xfb_topology" src/panfrost/vulkan/panvk_shader.h
|
||||||
|
grep -q "panvk_xfb_topology" src/panfrost/vulkan/panvk_shader.h
|
||||||
|
test -f src/panfrost/vulkan/panvk_vX_xfb_lower.c
|
||||||
}
|
}
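A bare `grep -q` makes `prepare()` fail without saying which check tripped. A minimal sketch of a wrapper that names the failing pattern follows; `require_pattern` is a hypothetical helper, not something the PKGBUILD defines, and it matches fixed strings rather than regular expressions.

```sh
#!/bin/sh
# Hypothetical helper: behaves like `grep -qF`, but reports which
# sanity check failed before returning non-zero.
require_pattern() {
    pattern=$1 file=$2
    if ! grep -qF -- "$pattern" "$file"; then
        echo "sanity check failed: '$pattern' not found in $file" >&2
        return 1
    fi
}

# Example against a scratch file standing in for a patched source file.
scratch=$(mktemp)
printf 'has_vk1_2 = true;\n' > "$scratch"
require_pattern "has_vk1_2 = true;" "$scratch" && echo "check passed"
rm -f "$scratch"
```

Using `-F` avoids surprises with patterns such as `panvk_per_arch(nir_lower_xfb)`, where the parentheses are meant literally.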
|
||||||
|
|
||||||
build() {
|
build() {
|
||||||
|
|||||||
+3
-3
@@ -14,9 +14,9 @@
|
|||||||
# Sibling userspace package: ../daedalus-v4l2/build-deb.sh
|
# Sibling userspace package: ../daedalus-v4l2/build-deb.sh
|
||||||
set -euo pipefail
|
set -euo pipefail
|
||||||
|
|
||||||
UPSTREAM_COMMIT=79256dc7ef41f83873ca9c23db20f5888858e65d
|
UPSTREAM_COMMIT=5d8b4369e58ab947d1c56b1f718293c57c6065b5
|
||||||
PKGVER=0.1.0+r28+g79256dc
|
PKGVER=0.1.0+r33+g5d8b436
|
||||||
PKGREL=1 # reset for new upstream pin (79256dc — H.264 B-frame reorder fix); still carries the #64 multi-kernel postinst fix
|
PKGREL=1 # reset for new upstream pin (5d8b436 — revert parking design); still carries the #64 multi-kernel postinst fix
|
||||||
MODULE_NAME=daedalus_v4l2
|
MODULE_NAME=daedalus_v4l2
|
||||||
|
|
||||||
HERE=$(dirname "$(readlink -f "$0")")
|
HERE=$(dirname "$(readlink -f "$0")")
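Since `UPSTREAM_COMMIT` and `PKGVER` are bumped by hand, a cheap consistency check may catch a half-updated pin. The sketch below assumes the convention visible in this diff, where `PKGVER` ends in `+g` followed by the first seven hex digits of the commit; `pkgver_matches_commit` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical check: succeed only if pkgver ends in "+g" followed by
# the first seven characters of the pinned commit hash.
pkgver_matches_commit() {
    pkgver=$1 commit=$2
    short=$(printf '%.7s' "$commit")
    case $pkgver in
        *"+g$short") return 0 ;;
        *) return 1 ;;
    esac
}

UPSTREAM_COMMIT=5d8b4369e58ab947d1c56b1f718293c57c6065b5
PKGVER=0.1.0+r33+g5d8b436
if pkgver_matches_commit "$PKGVER" "$UPSTREAM_COMMIT"; then
    echo "pin and PKGVER agree"
else
    echo "PKGVER does not match UPSTREAM_COMMIT" >&2
fi
```

The `rNN` commit count cannot be verified this way without a git checkout, so the check covers only the hash suffix.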
|
||||||
|
|||||||
+36
@@ -1,3 +1,39 @@
|
|||||||
|
daedalus-v4l2-dkms (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
|
* Bump to 5d8b436 — reverts daedalus-v4l2 PRs #7 + #8. Kernel
|
||||||
|
module returns to the pre-#7 buf_done_and_job_finish completion
|
||||||
|
model: no src/dst lifecycle decoupling, no parked dst_bufs, no
|
||||||
|
1:1-contract violation against libva-v4l2-request-fourier
|
||||||
|
(closes daedalus-v4l2#9 + #10 as won't-fix at this layer; proper
|
||||||
|
fix tracked at daedalus-v4l2#11).
|
||||||
|
* Wire-protocol drops 1 → 0; lock-step install with daedalus-v4l2
|
||||||
|
0.1.0+r33+g5d8b436 REQUIRED.
|
||||||
|
* Carries forward the #64 multi-kernel postinst fix.
|
||||||
|
|
||||||
|
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 14:50:00 +0000
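The lock-step requirement in the entry above could be checked mechanically by comparing the top versions of the two changelogs. This is a minimal sketch: the function names are hypothetical, and the parsing assumes the standard first-line shape of a Debian changelog entry, "package (version) distributions; urgency=...".

```sh
#!/bin/sh
# Extract the version from the first line of a Debian changelog.
changelog_top_version() {
    sed -n '1s/^[^ ]* (\([^)]*\)).*/\1/p' "$1"
}

# Strip the trailing Debian revision ("-1") from a version string.
upstream_part() {
    printf '%s\n' "${1%-*}"
}

# Hypothetical check: succeed when both changelogs carry the same
# upstream version, i.e. kernel module and daemon share one pin.
lockstep_ok() {
    a=$(upstream_part "$(changelog_top_version "$1")")
    b=$(upstream_part "$(changelog_top_version "$2")")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Demonstration with two scratch changelogs.
demo=$(mktemp -d)
printf 'daedalus-v4l2-dkms (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium\n' > "$demo/dkms"
printf 'daedalus-v4l2 (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium\n' > "$demo/daemon"
lockstep_ok "$demo/dkms" "$demo/daemon" && echo "dkms and daemon are in lock-step"
rm -rf "$demo"
```

A mismatch is not always an error: when the wire protocol is unchanged, the daemon can move ahead of the DKMS package, so a check like this is only meaningful for bumps that change `DAEDALUS_PROTO_VERSION`.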
|
||||||
|
|
||||||
|
daedalus-v4l2-dkms (0.1.0+r30+g6ffe92b-1) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
|
* Bump to 6ffe92b — fixes the kernel panic regression introduced
|
||||||
|
by 79256dc's split-completion design (closes daedalus-v4l2#8).
|
||||||
|
`device_run` now removes both src + dst from `m2m_ctx`'s
|
||||||
|
rdy_queue at pickup time, not at `buf_done` time. Without
|
||||||
|
this, after `SRC_CONSUMED`'s `job_finish` released the m2m
|
||||||
|
scheduler, the NEXT `device_run` saw the still-queued parked
|
||||||
|
dst_buf and paired it with a fresh src — two inflight entries
|
||||||
|
referencing the same vb2_buffer, the later `HAS_PIXELS`
|
||||||
|
triggered list_del on an already-detached list_head, smashing
|
||||||
|
the rdy_queue → hard reboot on Pi CM5 during `mpv vaapi-copy`
|
||||||
|
playback of 720p H.264 (2026-05-21).
|
||||||
|
* Wire protocol unchanged — DAEDALUS_PROTO_VERSION stays at 1.
|
||||||
|
Daemon (userspace daedalus-v4l2 package) need NOT bump in
|
||||||
|
lockstep with this DKMS update; the existing
|
||||||
|
daedalus-v4l2 0.1.0+r28+g79256dc is wire-compatible with
|
||||||
|
daedalus-v4l2-dkms 0.1.0+r30+g6ffe92b.
|
||||||
|
* Carries forward the #64 multi-kernel postinst fix.
|
||||||
|
|
||||||
|
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 14:00:00 +0000
|
||||||
|
|
||||||
daedalus-v4l2-dkms (0.1.0+r28+g79256dc-1) bookworm trixie; urgency=medium
|
daedalus-v4l2-dkms (0.1.0+r28+g79256dc-1) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
* Bump to 79256dc — H.264 B-frame display reorder fix (closes
|
* Bump to 79256dc — H.264 B-frame display reorder fix (closes
|
||||||
|
|||||||
Vendored
+43
-11
@@ -11,16 +11,25 @@
|
|||||||
# Upstream repo: https://git.reauktion.de/reauktion/daedalus-v4l2
|
# Upstream repo: https://git.reauktion.de/reauktion/daedalus-v4l2
|
||||||
set -euo pipefail
|
set -euo pipefail
|
||||||
|
|
||||||
# Same pin as the Arch PKGBUILD. 79256dc = "kernel + daemon: H.264
|
# 77e14e5 = post-revert state plus daedalus-v4l2 PRs #12 (LOW_DELAY
|
||||||
# B-frame display reorder fix (closes #6)" — adds the wire-protocol
|
# half-measure for the H.264 display-reorder visual) and #13 (daemon
|
||||||
# src_pts / output_src_pts / RESP_FRAME flags split that lets H.264
|
# now links daedalus-fourier and logs substrate availability at
|
||||||
# streams with B-frames preserve display order through libva → kernel
|
# startup). PROTO_VERSION stays at 0; daedalus-v4l2-dkms only needs
|
||||||
# → daemon. PROTO_VERSION bumps 0 → 1; lock-step userspace + kernel
|
# bumping when the kernel module changes (no kmod changes here).
|
||||||
# rebuild REQUIRED (daedalus-v4l2-dkms build-deb.sh pinned to the same
|
#
|
||||||
# commit).
|
# New build-dep: daedalus-fourier kernel library. Fetched + built +
|
||||||
UPSTREAM_COMMIT=79256dc7ef41f83873ca9c23db20f5888858e65d
|
# installed to a per-build temp prefix below, exposed to the daemon
|
||||||
PKGVER=0.1.0+r28+g79256dc
|
# cmake via PKG_CONFIG_PATH. Static-linked into the daemon binary,
|
||||||
PKGREL=1 # reset for new upstream pin (79256dc — H.264 B-frame reorder fix)
|
# so the resulting .deb has no runtime dep on daedalus-fourier.
|
||||||
|
UPSTREAM_COMMIT=77e14e5a192f0eef0b41dd1140205e29d13d4d58
|
||||||
|
PKGVER=0.1.0+r37+g77e14e5
|
||||||
|
PKGREL=1 # reset for new upstream pin (77e14e5 — daedalus-fourier linkage)
|
||||||
|
|
||||||
|
# daedalus-fourier pin. d87239d = marfrit/daedalus-fourier PR #1 merge
|
||||||
|
# (install rules + pkg-config, enables this consumer to find_package
|
||||||
|
# + link). Bump in lockstep with the upstream daemon when daedalus-
|
||||||
|
# fourier's API or installed shaders are changed by a new consumer.
|
||||||
|
DAEDALUS_FOURIER_COMMIT=d87239d8172307d9a1b93c95cbed116d175b85cc
|
||||||
|
|
||||||
HERE=$(dirname "$(readlink -f "$0")")
|
HERE=$(dirname "$(readlink -f "$0")")
|
||||||
|
|
||||||
@@ -30,14 +39,37 @@ export SOURCE_DATE_EPOCH=1779231600
|
|||||||
work=$(mktemp -d)
|
work=$(mktemp -d)
|
||||||
trap 'rm -rf "$work"' EXIT
|
trap 'rm -rf "$work"' EXIT
|
||||||
|
|
||||||
|
# --- daedalus-fourier: fetch + build + install to per-build prefix ---
|
||||||
|
#
|
||||||
|
# Static-linked into the daemon, so the temp prefix is only for the
|
||||||
|
# duration of this build script. Requires libvulkan-dev + glslang-tools
|
||||||
|
# on the runner (already needed for the daedalus-fourier benches).
|
||||||
|
FOURIER_PREFIX=$work/fourier-prefix
|
||||||
|
mkdir -p "$FOURIER_PREFIX"
|
||||||
|
|
||||||
|
cd "$work"
|
||||||
|
curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-fourier.tar.gz \
|
||||||
|
"https://git.reauktion.de/marfrit/daedalus-fourier/archive/${DAEDALUS_FOURIER_COMMIT}.tar.gz"
|
||||||
|
tar xzf daedalus-fourier.tar.gz
|
||||||
|
cd daedalus-fourier
|
||||||
|
cmake -B build -G Ninja \
|
||||||
|
-DCMAKE_BUILD_TYPE=Release \
|
||||||
|
-DCMAKE_INSTALL_PREFIX="$FOURIER_PREFIX"
|
||||||
|
cmake --build build --target daedalus_core
|
||||||
|
cmake --install build
|
||||||
|
|
||||||
|
# --- daedalus-v4l2: fetch + build daemon against installed daedalus-fourier ---
|
||||||
|
|
||||||
cd "$work"
|
cd "$work"
|
||||||
curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-v4l2.tar.gz \
|
curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-v4l2.tar.gz \
|
||||||
"https://git.reauktion.de/reauktion/daedalus-v4l2/archive/${UPSTREAM_COMMIT}.tar.gz"
|
"https://git.reauktion.de/reauktion/daedalus-v4l2/archive/${UPSTREAM_COMMIT}.tar.gz"
|
||||||
tar xzf daedalus-v4l2.tar.gz
|
tar xzf daedalus-v4l2.tar.gz
|
||||||
SRCDIR=daedalus-v4l2
|
SRCDIR=daedalus-v4l2
|
||||||
|
|
||||||
# Build daemon (CMake)
|
# Build daemon (CMake) — point pkg-config at the daedalus-fourier
|
||||||
|
# temp prefix so pkg_check_modules(DAEDALUS_FOURIER …) resolves to it.
|
||||||
cd "$SRCDIR/daemon"
|
cd "$SRCDIR/daemon"
|
||||||
|
PKG_CONFIG_PATH="$FOURIER_PREFIX/lib/pkgconfig" \
|
||||||
cmake -B build -G Ninja \
|
cmake -B build -G Ninja \
|
||||||
-DCMAKE_BUILD_TYPE=Release \
|
-DCMAKE_BUILD_TYPE=Release \
|
||||||
-DCMAKE_INSTALL_PREFIX=/usr
|
-DCMAKE_INSTALL_PREFIX=/usr
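The configure step above hard-codes `lib/pkgconfig` under the temp prefix. If the library's install rules ever place the `.pc` file elsewhere (for example under `share/pkgconfig`), the lookup would fail at configure time. A minimal sketch of discovering the directories instead follows; `pc_dirs_under` is a hypothetical helper, not part of build-deb.sh.

```sh
#!/bin/sh
# Hypothetical helper: print a colon-separated list of every directory
# under the given prefix that contains at least one .pc file.
pc_dirs_under() {
    prefix=$1
    find "$prefix" -type f -name '*.pc' 2>/dev/null \
        | sed 's|/[^/]*$||' \
        | sort -u \
        | paste -sd: -
}

# Demonstration with a scratch prefix and a placeholder .pc file.
demo=$(mktemp -d)
mkdir -p "$demo/lib/pkgconfig"
: > "$demo/lib/pkgconfig/example.pc"
echo "PKG_CONFIG_PATH candidate: $(pc_dirs_under "$demo")"
rm -rf "$demo"
```

In the build script this could be used as `PKG_CONFIG_PATH="$(pc_dirs_under "$FOURIER_PREFIX")${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"`, which also preserves any path the runner already exports.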
|
||||||
|
|||||||
+56
@@ -1,3 +1,59 @@
|
|||||||
|
daedalus-v4l2 (0.1.0+r37+g77e14e5-1) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
|
* Bump to 77e14e5 — picks up daedalus-v4l2 PRs #12 + #13.
|
||||||
|
* #12 (LOW_DELAY half-measure): the daemon now sets
|
||||||
|
AV_CODEC_FLAG_LOW_DELAY on the H.264 AVCodecContext so libavcodec
|
||||||
|
emits frames in decode order ~99% of the time (a few stragglers
|
||||||
|
at GOP boundaries when the stream's SPS num_reorder_frames
|
||||||
|
overrides the flag). Visible improvement vs the 2-1-4-3
|
||||||
|
pair-swap on Firefox YouTube + mpv playback; not a permanent
|
||||||
|
fix (see #11 for the architectural plan).
|
||||||
|
* #13 (daedalus-fourier linkage): the daemon now pkg-config-links
|
||||||
|
against the daedalus-fourier kernel library (marfrit/
|
||||||
|
daedalus-fourier) and logs substrate availability at startup.
|
||||||
|
No kernels dispatched yet — this is the build-time / link-time
|
||||||
|
foundation for the H.264 daemon-rewrite plan in #11
|
||||||
|
(substituting daedalus-fourier IDCT 4×4 / IDCT 8×8 / luma
|
||||||
|
deblock primitives for libavcodec's per-MB pixel math, one
|
||||||
|
cycle at a time, measuring CPU saved per substitution).
|
||||||
|
* build-deb.sh now fetches + builds + installs daedalus-fourier
|
||||||
|
(pinned at d87239d, marfrit/daedalus-fourier PR #1) into a
|
||||||
|
per-build temp prefix, then builds the daemon with
|
||||||
|
PKG_CONFIG_PATH pointing at it. daedalus-fourier is
|
||||||
|
statically linked into the daemon binary, so the resulting
|
||||||
|
.deb has no new runtime deps. Requires libvulkan-dev +
|
||||||
|
glslang-tools on the CI runner (the daedalus-fourier benches
|
||||||
|
already needed those).
|
||||||
|
* Wire protocol unchanged — DAEDALUS_PROTO_VERSION stays at 0.
|
||||||
|
No daedalus-v4l2-dkms bump needed.
|
||||||
|
|
||||||
|
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 16:30:00 +0000
|
||||||
|
|
||||||
|
daedalus-v4l2 (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
|
* Bump to 5d8b436 — reverts daedalus-v4l2 PRs #7 + #8 (the parking
|
||||||
|
design that broke libva-v4l2-request-fourier's 1:1 CAPTURE
|
||||||
|
contract; see daedalus-v4l2#9 + #10). After daemon-r28+g79256dc
|
||||||
|
landed, mpv (--hwdec=vaapi-copy) failed before playback started with
|
||||||
|
"Unable to dequeue buffer: Resource temporarily unavailable" /
|
||||||
|
"Failed to end picture decode" because the daemon parked CAPTURE
|
||||||
|
buffers waiting for libavcodec to release H.264 B-frames in
|
||||||
|
display order — violating the V4L2 stateless 1:1 contract.
|
||||||
|
Firefox tolerated the mess (visible "2 1 4 3" pair-swap); mpv
|
||||||
|
bailed.
|
||||||
|
* This bump restores f0d4186-equivalent behaviour, plus PR #4
|
||||||
|
(cosmetic H.264 DECODE_MODE / START_CODE menu controls). PR #7
|
||||||
|
+ PR #8 wire-protocol additions (src_pts / output_src_pts /
|
||||||
|
RESP_FRAME flags) are reverted — DAEDALUS_PROTO_VERSION drops
|
||||||
|
back from 1 → 0. Lock-step install with daedalus-v4l2-dkms
|
||||||
|
0.1.0+r33+g5d8b436 REQUIRED.
|
||||||
|
* Visible regression: H.264 B-frame streams in Firefox revert to
|
||||||
|
the original "2 1 4 3 6 5" pair-swap visual. The proper fix
|
||||||
|
(concurrent in-flight requests in daemon + display-order reorder
|
||||||
|
in libva-v4l2-request-fourier) is tracked at daedalus-v4l2#11.
|
||||||
|
|
||||||
|
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 14:50:00 +0000
|
||||||
|
|
||||||
daedalus-v4l2 (0.1.0+r28+g79256dc-1) bookworm trixie; urgency=medium
|
daedalus-v4l2 (0.1.0+r28+g79256dc-1) bookworm trixie; urgency=medium
|
||||||
|
|
||||||
* Bump to 79256dc — H.264 B-frame display reorder fix (closes
|
* Bump to 79256dc — H.264 B-frame display reorder fix (closes
|
||||||
|
|||||||
+18
-16
@@ -10,23 +10,25 @@
|
|||||||
# Upstream fork: https://git.reauktion.de/marfrit/libva-v4l2-request-fourier
|
# Upstream fork: https://git.reauktion.de/marfrit/libva-v4l2-request-fourier
|
||||||
set -euo pipefail
|
set -euo pipefail
|
||||||
|
|
||||||
# Same pin as the Arch PKGBUILD. 2860d75 = PR #14 merge "picture:
|
# Same pin as the Arch PKGBUILD. c454618 = PR #16 merge "picture,
|
||||||
# bounds-check codec_store_buffer slice writes against source_size
|
# request_pool: transparent OUTPUT-pool resize on bitstream overrun
|
||||||
# (#13)" — guards the three append sites in codec_store_buffer's
|
# (#15)" — follow-up root-cause fix to #13/#14. On a mid-stream
|
||||||
# VASliceDataBufferType branch (H.264 Annex-B start code, VP8
|
# bitstream-budget overrun (typical cause: SPS-driven resolution
|
||||||
# uncompressed-header pad, slice payload) against the OUTPUT pool
|
# upshift in an adaptive-bitrate stream), codec_store_buffer now
|
||||||
# slot's fixed sizeimage; returns VA_STATUS_ERROR_ALLOCATION_FAILED
|
# snapshots the in-flight surface's accumulated bytes, releases its
|
||||||
# with a request_log line instead of memcpy'ing past the mmap on a
|
# OUTPUT pool slot, calls request_pool_resize (STREAMOFF →
|
||||||
# resolution upshift mid-stream. Fixes a SIGSEGV in mpv
|
# REQBUFS(0) → S_FMT with 2×sizeimage hint, capped at 1 GiB, page-
|
||||||
# --hwdec=vaapi-copy and a heap-corruption hazard in Firefox RDD.
|
# aligned → CREATE_BUFS → mmap → media_request_alloc → STREAMON),
|
||||||
# The root-cause refactor (re-init OUTPUT pool / re-create surfaces
|
# re-acquires a slot, re-mirrors the surface's source_{data,size,
|
||||||
# on resolution change, or grow source_data on demand) is tracked
|
# request_fd}, restores the bytes, and continues. The frame
|
||||||
# as the follow-up backlog item; this is the memory-safety floor.
|
# survives instead of being dropped back to libavcodec for surface
|
||||||
|
# recreation. CAPTURE side untouched (per-queue V4L2 streaming
|
||||||
|
# independence).
|
||||||
#
|
#
|
||||||
# Prior pin (77f9236) = PR #12 merge — av1_set_controls
|
# Prior pin (2860d75) = PR #14 merge — codec_store_buffer bounds-
|
||||||
# (V4L2_CID_STATELESS_AV1_SEQUENCE for the daedalus daemon track).
|
# check floor (#13).
|
||||||
UPSTREAM_COMMIT=2860d75afe6a8e34df6afb508ff85c822bf9e908
|
UPSTREAM_COMMIT=c454618ae11addce2e17b560f4deeacbed067d98
|
||||||
PKGVER=1.0.0+r388+g2860d75
|
PKGVER=1.0.0+r390+gc454618
|
||||||
PKGREL=1
|
PKGREL=1
|
||||||
|
|
||||||
HERE=$(dirname "$(readlink -f "$0")")
|
HERE=$(dirname "$(readlink -f "$0")")
|
||||||
|
|||||||