mesa-panvk-bifrost: iter10 polish — drop sandbox bypass, pin sha256, tighten loader select

iter10 of the panvk-bifrost campaign. Eliminates the cosmetic '--disable-gpu-sandbox' warning at brave-vulkan launch, pins the Mesa tarball hash, and makes the Vulkan ICD selection deterministic across filesystems.

PKGBUILD changes (pkgrel: 1 -> 2):
- install ICD JSON at /usr/share/vulkan/icd.d/00-panvk-bifrost.json (was: /usr/lib/panvk-bifrost/icd.json; that path required VK_ICD_FILENAMES, which the GPU sandbox would strip, forcing --disable-gpu-sandbox)
- libvulkan_panfrost.so install path unchanged at /usr/lib/panvk-bifrost/
- sha256sums[0] pinned to 1d3c3b8a8363b8cc354175bb4a684ad8b035211cc1d6fa17aeb9b9623c513f89 (mesa-26.0.6.tar.xz from archive.mesa3d.org); patches + brave-vulkan + icd.json remain SKIP since they're in-tree (git-tracked)

brave-vulkan changes:
- dropped --no-sandbox + --disable-gpu-sandbox: env vars MESA_VK_VERSION_OVERRIDE and PAN_I_WANT_A_BROKEN_VULKAN_DRIVER survive the GPU sandbox boundary (Mesa loader reads them pre-seccomp-lockdown)
- dropped VK_ICD_FILENAMES (loader auto-picks via icd.d/ directory scan)
- added VK_LOADER_DRIVERS_SELECT='00-panvk-bifrost*' for deterministic ICD selection. The Vulkan loader's readdir order is implementation-defined per Khronos LoaderDriverInterface, so the '00-' filename prefix is not spec-backed (ext4 happens to give insertion-order, other filesystems may not). VK_LOADER_DRIVERS_SELECT short-circuits readdir ambiguity. (Phase 5 review hardening.)

Test result on ohm (pre-push validation):
- brave-vulkan launches Brave without sandbox bypass
- seccomp-bpf sandboxes activate normally for utility/renderer processes
- 'panvk is not a conformant Vulkan implementation' fires ONCE (loader-select excluded the stock ICD from enumeration, so only the patched driver loads)
- GPU process boots, no 'Exiting GPU process' error
- Brave runs through full test timeout cleanly

README updated to reflect the new install layout + simplified wrapper.

Campaign artifacts: ~/src/panvk-bifrost/{phase0_findings_iter10.md, phase8_iteration9_close.md (which iter10 polishes)}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 12:28:29 +02:00
74 changed files with 6630 additions and 13668 deletions
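
The wrapper behaviour described in the commit message can be sketched roughly as follows. This is an illustration only, not the shipped brave-vulkan script: the function name `panvk_env` is hypothetical, the value `1.3` for MESA_VK_VERSION_OVERRIDE is a placeholder the commit does not state, and `=1` for PAN_I_WANT_A_BROKEN_VULKAN_DRIVER is assumed from the usual Mesa convention.

```shell
#!/bin/bash
# Sketch of the environment a brave-vulkan-style wrapper might set up.
# Not the real wrapper; names and values marked ASSUMPTION are illustrative.
panvk_env() {
  # Opt in to the non-conformant panvk driver (variable named in the commit;
  # ASSUMPTION: "1" is the value used).
  export PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1
  # ASSUMPTION: "1.3" is a placeholder; the commit does not give the value.
  export MESA_VK_VERSION_OVERRIDE="${MESA_VK_VERSION_OVERRIDE:-1.3}"
  # Restrict the loader to driver manifests whose filename matches this glob,
  # so the result no longer depends on directory enumeration order.
  export VK_LOADER_DRIVERS_SELECT='00-panvk-bifrost*'
  # No explicit manifest path: rely on the icd.d/ directory scan instead.
  unset VK_ICD_FILENAMES
}

panvk_env
# Show the three variables the sketch exports.
env | grep -E '^(PAN_I_WANT|MESA_VK_VERSION|VK_LOADER_DRIVERS)' | sort
```

A real wrapper would presumably call such a function and then `exec` the browser binary with `"$@"`, without `--no-sandbox` or `--disable-gpu-sandbox`.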
@@ -1,230 +0,0 @@
-#!/bin/bash
-# check-already-published.sh <recipe-dir>
-#
-# Decide whether a given recipe (arch/<name> or debian/<name>) is already
-# present in https://packages.reauktion.de/.  Emits exactly one line to
-# stdout:
-#
-#   skip=1   — package with this version-pkgrel-arch tuple already lives in
-#              the pool; CI should short-circuit.
-#   skip=0   — file is missing or HEAD failed; CI should build + publish.
-#
-# Design notes:
-#   * For Arch recipes we source the PKGBUILD in a clean subshell so
-#     shell expansions (epoch=, ${_pkgver/-/}, pkgname=() arrays) resolve
-#     naturally.  Only the first element of pkgname[] is checked — split
-#     packages share one source tarball / one build, so any-one-missing
-#     forces the full rebuild anyway.
-#   * For Debian recipes we extract the bare top-level PKGVER= /
-#     PKGREL= assignments (plus any other top-level VAR=value lines they
-#     reference) via grep and re-evaluate them in an isolated subshell —
-#     sourcing the entire build-deb.sh would run curl/tar/dpkg-deb
-#     against a tempdir we don't want to materialise here.
-#   * Epoch handling differs by ecosystem: Arch keeps `<epoch>:` in the
-#     pool filename, Debian/reprepro strips it.
-#   * curl --head with -f maps non-2xx to non-zero exit, which is what we
-#     want — 404 means "build it".  -L follows mirrors.  --max-time caps
-#     the worst-case latency per HEAD.
-set -euo pipefail
-
-REPO_BASE="${REPO_BASE:-https://packages.reauktion.de}"
-HEAD_TIMEOUT="${HEAD_TIMEOUT:-15}"
-
-RECIPE_DIR="${1:?usage: $0 <recipe-dir>   (e.g. arch/distcc-avahi or debian/lmcp)}"
-
-# Resolve relative to repo root if a leading path is passed; allow
-# both `arch/foo` and absolute paths.
-if [ ! -d "$RECIPE_DIR" ]; then
-  echo "error: recipe dir not found: $RECIPE_DIR" >&2
-  exit 2
-fi
-
-ecosystem="${RECIPE_DIR%%/*}"
-
-http_head() {
-  local url="$1"
-  curl -sS -L --max-time "$HEAD_TIMEOUT" -o /dev/null \
-       -w '%{http_code}' --head "$url" || echo "000"
-}
-
-emit() {
-  # one-line GITHUB_OUTPUT-compatible kv
-  echo "skip=$1"
-  exit 0
-}
-
-case "$ecosystem" in
-arch)
-  pkgbuild="$RECIPE_DIR/PKGBUILD"
-  [ -f "$pkgbuild" ] || { echo "error: $pkgbuild missing" >&2; exit 2; }
-
-  # Source in a fresh bash to capture variables. Some PKGBUILDs run
-  # functions or call commands at top level — keep this fast by
-  # restricting PATH and trapping side effects.
-  eval "$(
-    bash --noprofile --norc -c "
-      set +e
-      # Stub out anything that might shell out; we only need variable
-      # assignments to land.
-      cd '$RECIPE_DIR'
-      source ./PKGBUILD >/dev/null 2>&1 || true
-      # pkgname may be array; print first element.
-      if declare -p pkgname 2>/dev/null | grep -q 'declare -a'; then
-        first_name=\"\${pkgname[0]}\"
-      else
-        first_name=\"\$pkgname\"
-      fi
-      if declare -p arch 2>/dev/null | grep -q 'declare -a'; then
-        first_arch=\"\${arch[0]}\"
-      else
-        first_arch=\"\$arch\"
-      fi
-      printf 'PB_NAME=%q\n'   \"\$first_name\"
-      printf 'PB_VER=%q\n'    \"\$pkgver\"
-      printf 'PB_REL=%q\n'    \"\$pkgrel\"
-      printf 'PB_EPOCH=%q\n'  \"\${epoch:-}\"
-      printf 'PB_ARCH=%q\n'   \"\$first_arch\"
-    "
-  )"
-
-  if [ -z "${PB_NAME:-}" ] || [ -z "${PB_VER:-}" ] || [ -z "${PB_REL:-}" ]; then
-    echo "error: failed to parse PKGBUILD ($RECIPE_DIR)" >&2
-    emit 0
-  fi
-
-  # Pool arch:
-  #   arch=('any')                → any
-  #   arch=('aarch64' 'x86_64')   → aarch64 (we publish for both, but the
-  #                                 aarch64 artifact is the canonical CI build)
-  #   arch=('aarch64')            → aarch64
-  case "$PB_ARCH" in
-    any) pool_arch=any ;;
-    *)   pool_arch=aarch64 ;;
-  esac
-
-  # Version string with optional epoch (epoch:pkgver-pkgrel).
-  if [ -n "${PB_EPOCH:-}" ]; then
-    ver_full="${PB_EPOCH}:${PB_VER}-${PB_REL}"
-  else
-    ver_full="${PB_VER}-${PB_REL}"
-  fi
-
-  # Pool URL path (arch keeps any/aarch64 split; 'any' lands in the
-  # aarch64 dir per current marfrit layout — both arches share the
-  # blob via the publish-to-both-arches step in build.yml).
-  pool_dir="arch/aarch64"
-
-  base_url="${REPO_BASE}/${pool_dir}/${PB_NAME}-${ver_full}-${pool_arch}.pkg.tar"
-  for ext in zst xz gz; do
-    code=$(http_head "${base_url}.${ext}")
-    if [ "$code" = "200" ]; then
-      emit 1
-    fi
-  done
-  emit 0
-  ;;
-
-debian)
-  bd="$RECIPE_DIR/build-deb.sh"
-  ctrl="$RECIPE_DIR/control"
-  [ -f "$bd" ] || { echo "error: $bd missing" >&2; exit 2; }
-
-  # Pull top-level `VAR=value` lines until we've passed PKGREL, and
-  # only those whose RHS is safe to re-evaluate (no command
-  # substitution `$(...)`, no escaped `\$`, no embedded commands like
-  # `DESTDIR=... meson ...`).  This deliberately undershoots: we just
-  # need PKGVER/PKGREL plus any version vars they reference. Anything
-  # else (HERE=$(readlink ...), KERNELVER=\$(uname -r) inside a
-  # HEREDOC, etc.) gets dropped.
-  assigns=$(awk '
-    /^[A-Z_][A-Z0-9_]*=/ {
-      # split into LHS and RHS
-      eq = index($0, "=")
-      lhs = substr($0, 1, eq - 1)
-      rhs = substr($0, eq + 1)
-      # strip inline `# comment`
-      hash = index(rhs, "#")
-      if (hash > 1 && substr(rhs, hash-1, 1) == " ") rhs = substr(rhs, 1, hash - 2)
-      # reject lines with command-subst or escaped-dollar or naked commands
-      if (rhs ~ /\$\(/)   next
-      if (rhs ~ /\\\$/)   next
-      if (rhs ~ / [a-z]/) next   # e.g. `DESTDIR="$ROOT" meson ...`
-      print lhs "=" rhs
-      if (lhs == "PKGREL") exit
-    }
-  ' "$bd")
-
-  eval "$(
-    bash --noprofile --norc -c "
-      set +e
-      $assigns
-      printf 'PKGVER=%q\n' \"\${PKGVER:-}\"
-      printf 'PKGREL=%q\n' \"\${PKGREL:-}\"
-    "
-  )"
-
-  if [ -z "${PKGVER:-}" ] || [ -z "${PKGREL:-}" ]; then
-    echo "error: failed to parse PKGVER/PKGREL from $bd" >&2
-    emit 0
-  fi
-
-  # Strip epoch (`N:` prefix) — debian pool filenames omit it.
-  ver_no_epoch="${PKGVER#*:}"
-  # If PKGVER had no colon, ${PKGVER#*:} returns PKGVER unchanged (bash quirk:
-  # the pattern must match for the prefix to be stripped). Guard explicitly.
-  case "$PKGVER" in
-    *:*) : ;;
-    *)   ver_no_epoch="$PKGVER" ;;
-  esac
-
-  ver_full="${ver_no_epoch}-${PKGREL}"
-
-  # Architecture: parse control's `Architecture:` field.
-  if [ ! -f "$ctrl" ]; then
-    # Some recipes ship debian/control instead of ./control
-    ctrl="$RECIPE_DIR/debian/control"
-  fi
-  ctrl_arch=$(grep -m1 '^Architecture:' "$ctrl" 2>/dev/null | awk '{print $2}')
-  case "$ctrl_arch" in
-    all)            file_arch=all ;;
-    arm64|any)      file_arch=arm64 ;;
-    amd64)          file_arch=amd64 ;;
-    *)              file_arch=arm64 ;;  # conservative default
-  esac
-
-  pkg_name=$(basename "$RECIPE_DIR")
-
-  # Compare against the canonical Packages index (what apt actually
-  # consults).  reprepro refuses lower-version uploads, so checking
-  # only an exact source-pkgrel URL produces an endless-rebuild trap
-  # whenever source PKGREL has rolled back below pool head.  We skip
-  # if pools published version >= source version-tuple.
-  source_full="${ver_full}"
-  if [ -n "${PKGVER#*:}" ] && [ "${PKGVER}" != "${PKGVER#*:}" ]; then
-    # PKGVER had an epoch — keep it for dpkg --compare-versions.
-    source_full="${PKGVER}-${PKGREL}"
-  fi
-
-  # Determine suite: most recipes publish to both bookworm and trixie;
-  # checking trixie is sufficient (changelogs share Distribution).
-  suite="trixie"
-  pkg_arch_label="$file_arch"
-  [ "$file_arch" = "all" ] && pkg_arch_label="all"
-  packages_url="${REPO_BASE}/debian/dists/${suite}/main/binary-arm64/Packages"
-  [ "$file_arch" = "amd64" ] && packages_url="${REPO_BASE}/debian/dists/${suite}/main/binary-amd64/Packages"
-
-  pool_ver=$(set +o pipefail; curl -sS --max-time "$HEAD_TIMEOUT" "$packages_url" 2>/dev/null     | awk -v p="$pkg_name" '$1=="Package:" && $2==p {found=1; next} found && $1=="Version:" {print $2; exit}')
-
-  if [ -n "$pool_ver" ] && command -v dpkg >/dev/null &&        dpkg --compare-versions "$pool_ver" ge "$source_full"; then
-    echo "pool has $pool_ver >= source $source_full" >&2
-    emit 1
-  fi
-  echo "pool has $pool_ver, source wants $source_full — build" >&2
-  emit 0
-  ;;
-
-*)
-  echo "error: unsupported ecosystem '$ecosystem' (recipe-dir=$RECIPE_DIR)" >&2
-  emit 0
-  ;;
-esac
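
The epoch handling in the Debian branch of the removed script rests on the `${PKGVER#*:}` expansion. A minimal sketch, using a hypothetical helper `strip_epoch` that is not part of the script, shows how that expansion behaves with and without an epoch. A value with no colon comes back unchanged, so the explicit `case` guard in the script is a safety net rather than a requirement.

```shell
#!/bin/bash
# Hypothetical helper (not in the removed script): strip a Debian epoch prefix.
strip_epoch() {
  # ${1#*:} removes the shortest leading match of "*:". With no colon in the
  # value, nothing matches and the value is returned unchanged.
  printf '%s\n' "${1#*:}"
}

strip_epoch "1:2.4.0"   # prints 2.4.0
strip_epoch "2.4.0"     # prints 2.4.0
```

The stripped form corresponds to what the script uses for pool filenames, while the epoch-bearing form is the one it keeps for `dpkg --compare-versions`.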
@@ -1,53 +0,0 @@
-# Maintainer: Markus Fritsche <mfritsche@reauktion.de>
-# aish — AI-augmented conversational shell in LuaJIT.
-# Source of truth: git.reauktion.de/marfrit/aish
-
-pkgname=aish
-pkgver=0.1.0
-pkgrel=1
-pkgdesc="AI-augmented conversational shell (LuaJIT, FFI-only)"
-arch=('any')
-url="https://git.reauktion.de/marfrit/aish"
-license=('MIT')
-depends=('luajit' 'readline' 'curl')
-# The _tag back-translation handles both clean releases (no '_') and
-# pre-release pkgvers (e.g. 0.1.0_rc1 → v0.1.0-rc1).
-_tag="v${pkgver//_/-}"
-source=("${pkgname}-${pkgver}.tar.gz::https://git.reauktion.de/marfrit/aish/archive/${_tag}.tar.gz")
-sha256sums=('9ebc3939e028832e39391ae33efacb5ec9bcd99d123cbc8ca1cd6ca9a640b5b5')
-
-package() {
-    cd "${pkgname}"
-    local libdir="${pkgdir}/usr/share/lua/5.1/aish"
-
-    # Top-level modules
-    install -Dm644 main.lua     "${libdir}/main.lua"
-    install -Dm644 broker.lua   "${libdir}/broker.lua"
-    install -Dm644 context.lua  "${libdir}/context.lua"
-    install -Dm644 executor.lua "${libdir}/executor.lua"
-    install -Dm644 history.lua  "${libdir}/history.lua"
-    install -Dm644 mcp.lua      "${libdir}/mcp.lua"
-    install -Dm644 renderer.lua "${libdir}/renderer.lua"
-    install -Dm644 repl.lua     "${libdir}/repl.lua"
-    install -Dm644 router.lua   "${libdir}/router.lua"
-    install -Dm644 safety.lua   "${libdir}/safety.lua"
-    install -Dm644 secrets.lua  "${libdir}/secrets.lua"
-
-    # FFI bindings
-    install -Dm644 ffi/curl.lua     "${libdir}/ffi/curl.lua"
-    install -Dm644 ffi/libc.lua     "${libdir}/ffi/libc.lua"
-    install -Dm644 ffi/pty.lua      "${libdir}/ffi/pty.lua"
-    install -Dm644 ffi/readline.lua "${libdir}/ffi/readline.lua"
-
-    # Vendored dependencies
-    install -Dm644 vendor/dkjson.lua "${libdir}/vendor/dkjson.lua"
-
-    # Launch wrapper
-    install -Dm755 bin/aish "${pkgdir}/usr/bin/aish"
-
-    # Documentation + example config
-    install -Dm644 README.md  "${pkgdir}/usr/share/doc/${pkgname}/README.md"
-    install -Dm644 LICENSE    "${pkgdir}/usr/share/doc/${pkgname}/LICENSE"
-    install -Dm644 examples/config.lua \
-        "${pkgdir}/usr/share/doc/${pkgname}/examples/config.lua"
-}
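
The `_tag` back-translation in the removed aish PKGBUILD can be tried in isolation. The sketch below wraps the same bash substitution in a hypothetical function, `pkgver_to_tag`, purely for illustration; the PKGBUILD itself assigns `_tag` directly.

```shell
#!/bin/bash
# Hypothetical helper mirroring the PKGBUILD's _tag expression.
pkgver_to_tag() {
  local pkgver=$1
  # ${pkgver//_/-} replaces every '_' with '-'; the 'v' prefix is prepended.
  printf 'v%s\n' "${pkgver//_/-}"
}

pkgver_to_tag 0.1.0       # prints v0.1.0
pkgver_to_tag 0.1.0_rc1   # prints v0.1.0-rc1
```

The translation is needed because Arch forbids hyphens in `pkgver`, so a pre-release tag has to carry an underscore on the packaging side.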
@@ -8,13 +8,13 @@
 # NEXT.md alongside this PKGBUILD for the full rationale and the
 # validation log on PineTab2 (RK3566).
 #
-# Cross-compiled from x86_64 using chromium's bundled clang (upstream
-# LLVM doesn't ship clang 23+ yet; chromium's internal fork is required).
-# Runtime target is aarch64. The three patches are architecture-independent.
+# Multi-arch: builds natively on x86_64 and aarch64. The x86_64 path
+# is primarily a development / CI host; the runtime target audience is
+# aarch64. The two patches are architecture-independent.

 pkgname=chromium-fourier
-pkgver=148.0.7778.178
-pkgrel=1
+pkgver=147.0.7727.116
+pkgrel=2
 epoch=1
 pkgdesc='Chromium with V4L2VDA HW video decode unlocked for mainline Linux Wayland on Rockchip'
 arch=('aarch64' 'x86_64')
@@ -150,6 +150,7 @@ build() {
    'symbol_level=0'
    'is_cfi=false'
    'treat_warnings_as_errors=false'
+    'enable_nacl=false'
    'enable_widevine=false'

    # System toolchain (clang/lld from pacman)
@@ -73,15 +73,16 @@ diff --git a/ui/ozone/common/native_pixmap_egl_binding.cc b/ui/ozone/common/nati
 index 31877f4459..6855c1093e 100644
 --- a/ui/ozone/common/native_pixmap_egl_binding.cc
 +++ b/ui/ozone/common/native_pixmap_egl_binding.cc
-@@ -6,9 +6,12 @@
-
+@@ -6,10 +6,13 @@
+ 
 #include <array>
-
+ 
 +#include "base/containers/flat_map.h"
 #include "base/logging.h"
 #include "base/memory/scoped_refptr.h"
 +#include "base/no_destructor.h"
 #include "base/notreached.h"
+ #include "base/numerics/safe_conversions.h"
 +#include "base/synchronization/lock.h"
 #include "ui/gfx/linux/drm_util_linux.h"
 #include "ui/gl/gl_bindings.h"
@@ -18,15 +18,10 @@ _module=daedalus_v4l2

 # Same pin as arch/daedalus-v4l2 — keep kernel module + daemon
 # bit-versioned together so the chardev wire protocol stays in sync.
-# 5d8b436 reverts PRs #7 + #8 (parking design that broke libva's
-# 1:1 contract — see daedalus-v4l2#9 + #10).  Tree is
-# content-equivalent to f0d4186 plus PR #4 (cosmetic menu ctrls).
-# PROTO_VERSION drops 1 → 0; lock-step install with
-# daedalus-v4l2 0.1.0.r33.5d8b436 REQUIRED.
-_commit=872eec505eb91b561892d02a0526749348ddc121
+_commit=481279c9bffd19e32c8f3299897e9b63fc5a24aa

-pkgver=0.1.0.r45.872eec5
-pkgrel=1  # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2 0.1.0.r45.872eec5 REQUIRED
+pkgver=0.1.0.r18.481279c
+pkgrel=1  # reset for new upstream pin (481279c — Phase 8.13 close)
 pkgdesc="V4L2 stateless decoder shim kernel module (DKMS) — Pi 5 / CM5"
 arch=('any')
 url="https://git.reauktion.de/reauktion/daedalus-v4l2"
@@ -16,19 +16,17 @@
 pkgname=daedalus-v4l2
 _upstreampkg=daedalus-v4l2

-# 6e6dfa1 = picks up daedalus-v4l2 PR #16 — daemon now dlopens
-# the Kwiboo fourier fork's libavcodec.so.62 / libavformat.so.62 /
-# libavutil.so.60 at /opt/fourier instead of Debian-stock soname
-# 61/61/59.  First step on the daedalus-fourier substitution arc
-# (daedalus-v4l2#11).  Daemon still needs daedalus-fourier at
-# build time (Arch packaging for that is a follow-up; Debian side
-# fetches inline via build-deb.sh).
-_commit=872eec505eb91b561892d02a0526749348ddc121
+# Pin the daedalus-v4l2 tip.  481279c = "Phase 8.13: byte-exact end-to-
+# end via libva (consumer target hit)" — first commit where the full
+# ffmpeg -hwaccel vaapi → libva → /dev/video0 → daemon path lands a
+# pixel-correct decoded frame back in ffmpeg.  Promote to a later pin
+# only after a future phase closes cleanly.
+_commit=481279c9bffd19e32c8f3299897e9b63fc5a24aa

 # 0.1.0 (pre-1.0) + commit count + short sha.  Bump the .Y on each
 # Phase 8.x close.  pkgver() recomputes at build time.
-pkgver=0.1.0.r45.872eec5
-pkgrel=1  # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2-dkms 0.1.0.r45.872eec5 REQUIRED
+pkgver=0.1.0.r18.481279c
+pkgrel=1  # reset for new upstream pin (481279c — Phase 8.13 close)
 pkgdesc="Userspace daemon for the daedalus-v4l2 V4L2 stateless decoder shim (VP9/AV1/H.264 on Pi 5 / CM5)"
 arch=('aarch64')
 url="https://git.reauktion.de/reauktion/daedalus-v4l2"
@@ -36,7 +34,7 @@ license=('BSD-2-Clause' 'GPL-2.0-or-later')
 # Daemon dlopens libavformat.so.61 / libavcodec.so.61 / libavutil.so.59
 # at runtime (Option γ — see daemon/src/ffmpeg_loader.h).  ffmpeg
 # provides those; we don't link them.
-depends=('ffmpeg-v4l2-request-fourier' 'libdrm')
+depends=('ffmpeg' 'libdrm')
 # Headers from libav*-dev needed at compile time for type-safe function
 # pointer signatures; pkg-config locates them.
 makedepends=('cmake' 'ninja' 'pkgconf' 'git' 'ffmpeg')
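
The pkgver scheme described in the PKGBUILD comment (base version, commit count, short sha) could be produced by a formatter along these lines. The function `format_pkgver` is hypothetical; the PKGBUILD's actual `pkgver()` is not shown in this diff and may differ.

```shell
#!/bin/bash
# Hypothetical formatter: base version, commit count, 7-character short sha.
format_pkgver() {
  local base=$1 count=$2 sha=$3
  # Truncate the full commit hash to its first seven characters.
  printf '%s.r%s.%s\n' "$base" "$count" "$(printf '%s' "$sha" | cut -c1-7)"
}

format_pkgver 0.1.0 18 481279c9bffd19e32c8f3299897e9b63fc5a24aa   # prints 0.1.0.r18.481279c
```

Inside a real `pkgver()`, the count and hash would typically come from `git rev-list --count HEAD` and `git rev-parse HEAD` run in the source checkout.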
@@ -1,137 +0,0 @@
-From f760c0541586f43334c02611fcb4c212c08ad576 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Thu, 21 May 2026 21:40:22 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 4x4 IDCT through
- daedalus-fourier
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-H264DSPContext.idct_add (called per 4x4 block from the intra-4x4
-decode path in h264_mb.c) now dispatches through
-daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
-
-The recipe layer picks the substrate; for cycle 6 (H.264 IDCT 4x4)
-the recipe is CPU NEON, so this is effectively a NEON-to-NEON
-substitution with one extra dispatch call and recipe-table lookup.
-Provides the first end-to-end exercise of the daedalus-fourier
-kernel pack inside the libavcodec.so decode hot path; follow-up
-patches wire IDCT 8x8, luma-v deblock, and qpel mc20.
-
-The library context is process-global, lazily initialised under
-pthread_once on first call.  We pick the no-QPU constructor because
-libavcodec.so is loaded into arbitrary host processes
-(firefox-fourier, mpv-fourier, daedalus_v4l2_daemon, ...) and we
-cannot assume the host has a usable Vulkan instance.  Higher cycles
-(deblock luma-v, MC) that benefit from the QPU will provision their
-own recipe-selected context once that path is wired.
-
-Bulk paths (idct_add16, idct_add16intra, idct_add8 — used for
-non-intra4x4 macroblocks) remain on the stock NEON .S implementations
-and will be batched through daedalus_recipe_dispatch_h264_idct4 with
-n_blocks>1 in a follow-up.
-
-Bit-exact against ff_h264_idct_add_neon (daedalus-fourier cycle 6
-green; see marfrit/daedalus-fourier/CYCLE_LOGS.md).
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2.
---
- libavcodec/aarch64/Makefile               |  3 +-
- libavcodec/aarch64/h264_idct_daedalus.c   | 49 +++++++++++++++++++++++
- libavcodec/aarch64/h264dsp_init_aarch64.c |  3 +-
- 3 files changed, 53 insertions(+), 2 deletions(-)
- create mode 100644 libavcodec/aarch64/h264_idct_daedalus.c
-
-diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
-index 41ab025..7b95fb1 100644
--- a/libavcodec/aarch64/Makefile
-+++ b/libavcodec/aarch64/Makefile
-@@ -3,7 +3,8 @@ OBJS-$(CONFIG_AC3DSP)                   += aarch64/ac3dsp_init_aarch64.o
- OBJS-$(CONFIG_FDCTDSP)                  += aarch64/fdctdsp_init_aarch64.o
- OBJS-$(CONFIG_FMTCONVERT)               += aarch64/fmtconvert_init.o
- OBJS-$(CONFIG_H264CHROMA)               += aarch64/h264chroma_init_aarch64.o
-OBJS-$(CONFIG_H264DSP)                  += aarch64/h264dsp_init_aarch64.o
-+OBJS-$(CONFIG_H264DSP)                  += aarch64/h264dsp_init_aarch64.o \
-+                                           aarch64/h264_idct_daedalus.o
- OBJS-$(CONFIG_HUFFYUVDSP)               += aarch64/huffyuvdsp_init_aarch64.o
- OBJS-$(CONFIG_H264PRED)                 += aarch64/h264pred_init.o
- OBJS-$(CONFIG_H264QPEL)                 += aarch64/h264qpel_init_aarch64.o
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
-new file mode 100644
-index 0000000..538d223
--- /dev/null
-+++ b/libavcodec/aarch64/h264_idct_daedalus.c
-@@ -0,0 +1,49 @@
-+/*
-+ * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
-+ *
-+ * Routes H264DSPContext.idct_add through
-+ * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
-+ * The recipe layer picks the substrate (CPU NEON by default for
-+ * cycle 6; future cycles may dispatch to V3D opportunistically).
-+ *
-+ * FFmpeg's 4x4 block memory layout matches daedalus's column-major
-+ * convention: block[r + 4*c] = coefficient at (row r, col c).  Both
-+ * sides destructively zero the block after the transform.
-+ *
-+ * The library context is process-global and lazily initialised under
-+ * pthread_once.  We pick the no-QPU constructor here because
-+ * libavcodec.so is loaded into arbitrary host processes
-+ * (firefox-fourier, mpv-fourier, daedalus_v4l2_daemon, ...) and we
-+ * cannot assume the host has a usable Vulkan instance.  Higher cycles
-+ * (deblock, MC) that benefit from the QPU initialise their own
-+ * recipe-selected context once that path is wired.
-+ */
-+
-+#include <pthread.h>
-+#include <stddef.h>
-+#include <stdint.h>
-+
-+#include <daedalus.h>
-+
-+#include "libavutil/attributes.h"
-+#include "libavcodec/h264dsp.h"
-+
-+static daedalus_ctx     *g_dctx;
-+static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-+
-+static void daedalus_ctx_init_once(void)
-+{
-+    g_dctx = daedalus_ctx_create_no_qpu();
-+}
-+
-+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
-+
-+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
-+{
-+    static const daedalus_h264_block_meta meta = { .dst_off = 0 };
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
-+                                        block, 1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
-index c684574..b993df2 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
-+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
-@@ -66,6 +66,7 @@ void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride
-                                     int weights, int offset);
- 
- void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, int stride);
-+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_idct_add16_neon(uint8_t *dst, const int *block_offset,
-                              int16_t *block, int stride,
-@@ -139,7 +140,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
-         c->biweight_pixels_tab[1] = ff_biweight_h264_pixels_8_neon;
-         c->biweight_pixels_tab[2] = ff_biweight_h264_pixels_4_neon;
- 
-        c->idct_add        = ff_h264_idct_add_neon;
-+        c->idct_add        = ff_h264_idct_add_daedalus;
-         c->idct_dc_add     = ff_h264_idct_dc_add_neon;
-         c->idct_add16      = ff_h264_idct_add16_neon;
-         c->idct_add16intra = ff_h264_idct_add16intra_neon;
-- 
-2.47.3
-
@@ -1,107 +0,0 @@
-From 1b286ddb4efaca26ec9b9e290e989fec77dc1c77 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Fri, 22 May 2026 10:18:21 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 8x8 IDCT through
- daedalus-fourier
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-H264DSPContext.idct8_add (called per 8x8 block from the High-profile
-intra-8x8-DCT decode path in h264_mb.c) now dispatches through
-daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.
-
-The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8x8)
-the recipe is CPU NEON, so this is effectively a NEON-to-NEON
-substitution layered on top of the cycle-6 IDCT 4x4 wiring.  Same
-pthread_once global context, same destructive-zero semantics; FFmpeg
-column-major 8x8 storage block[r + 8*c] matches daedalus's convention.
-
-Bulk path c->idct8_add4 (used for inter 8x8-DCT macroblocks) remains
-on the in-tree NEON .S code and will be batched through
-daedalus_recipe_dispatch_h264_idct8 with n_blocks>1 in a follow-up.
-
-Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
-green).
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 7.
---
- libavcodec/aarch64/h264_idct_daedalus.c   | 29 ++++++++++++++++-------
- libavcodec/aarch64/h264dsp_init_aarch64.c |  3 ++-
- 2 files changed, 23 insertions(+), 9 deletions(-)
-
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
-index 538d223..cbb98af 100644
--- a/libavcodec/aarch64/h264_idct_daedalus.c
-+++ b/libavcodec/aarch64/h264_idct_daedalus.c
-@@ -1,14 +1,16 @@
- /*
- * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
-+ * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
-  *
- * Routes H264DSPContext.idct_add through
- * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
- * The recipe layer picks the substrate (CPU NEON by default for
- * cycle 6; future cycles may dispatch to V3D opportunistically).
-+ * Routes H264DSPContext.idct_add  → daedalus_recipe_dispatch_h264_idct4
-+ *        H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
-+ * instead of the in-tree ff_h264_idct{,8}_add_neon assembly.  The
-+ * recipe layer picks the substrate (CPU NEON by default for cycles
-+ * 6 + 7; future cycles may dispatch to V3D opportunistically).
-  *
- * FFmpeg's 4x4 block memory layout matches daedalus's column-major
- * convention: block[r + 4*c] = coefficient at (row r, col c).  Both
- * sides destructively zero the block after the transform.
-+ * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
-+ * column-major convention: block[r + N*c] = coefficient at
-+ * (row r, col c) for N ∈ {4, 8}.  Both sides destructively zero the
-+ * block after the transform.
-  *
-  * The library context is process-global and lazily initialised under
-  * pthread_once.  We pick the no-QPU constructor here because
-@@ -37,6 +39,7 @@ static void daedalus_ctx_init_once(void)
- }
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
-+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -47,3 +50,13 @@ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
-     daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
-                                         block, 1, &meta);
- }
-+
-+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
-+{
-+    static const daedalus_h264_block_meta meta = { .dst_off = 0 };
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
-+                                        block, 1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
-index b993df2..741e551 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
-+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
-@@ -79,6 +79,7 @@ void ff_h264_idct_add8_neon(uint8_t **dest, const int *block_offset,
-                             const uint8_t nnzc[15 * 8]);
- 
- void ff_h264_idct8_add_neon(uint8_t *dst, int16_t *block, int stride);
-+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_idct8_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_idct8_add4_neon(uint8_t *dst, const int *block_offset,
-                              int16_t *block, int stride,
-@@ -146,7 +147,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
-         c->idct_add16intra = ff_h264_idct_add16intra_neon;
-         if (chroma_format_idc <= 1)
-             c->idct_add8   = ff_h264_idct_add8_neon;
-        c->idct8_add       = ff_h264_idct8_add_neon;
-+        c->idct8_add       = ff_h264_idct8_add_daedalus;
-         c->idct8_dc_add    = ff_h264_idct8_dc_add_neon;
-         c->idct8_add4      = ff_h264_idct8_add4_neon;
-     } else if (have_neon(cpu_flags) && bit_depth == 10) {
-- 
-2.47.3
-
@@ -1,121 +0,0 @@
-From 68731c41d7ea68be0e912b128cb4e71fb56e8263 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Fri, 22 May 2026 12:15:16 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 luma-v deblock through
- daedalus-fourier
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-H264DSPContext.v_loop_filter_luma (non-intra bS<4 vertical luma
-deblock, called per macroblock-row edge from the slice deblock
-loop) now dispatches through
-daedalus_recipe_dispatch_h264_deblock_luma_v instead of
-ff_h264_v_loop_filter_luma_neon.
-
-The recipe layer picks the substrate; for cycle 8 the daedalus
-docstring marks the kernel "CPU primary; QPU opportunistic", but
-the libavcodec.so context here is built with
-daedalus_ctx_create_no_qpu — process-global pthread_once init,
-shared with cycles 6/7.  QPU opportunism stays gated off until a
-follow-up adds an explicit feature flag (no implicit Vulkan init
-in arbitrary host processes).  In the meantime cycle 8 is a
-plumbing-only substitution, NEON-to-NEON via the daedalus recipe.
-
-Intra (bS=4) loop filter — c->v_loop_filter_luma_intra — stays on
-the in-tree NEON .S code; daedalus's daedalus_h264_deblock_meta
-only covers the non-intra path per its docstring.
-
-FFmpeg `int alpha/beta/int8_t tc0[4]` → daedalus_h264_deblock_meta
-(int32_t alpha/beta + inline int8_t tc0[4]).  pix already points
-to row 0 of the bottom block per FFmpeg's deblock convention,
-satisfying daedalus's `dst_off >= 4 * dst_stride` constraint.
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8.
---
- libavcodec/aarch64/h264_idct_daedalus.c   | 36 +++++++++++++++++++----
- libavcodec/aarch64/h264dsp_init_aarch64.c |  4 ++-
- 2 files changed, 33 insertions(+), 7 deletions(-)
-
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
-index cbb98af..92365fa 100644
--- a/libavcodec/aarch64/h264_idct_daedalus.c
-+++ b/libavcodec/aarch64/h264_idct_daedalus.c
-@@ -1,11 +1,14 @@
- /*
- * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma-v deblock — daedalus-fourier substitution shims.
-  *
- * Routes H264DSPContext.idct_add  → daedalus_recipe_dispatch_h264_idct4
- *        H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
- * instead of the in-tree ff_h264_idct{,8}_add_neon assembly.  The
- * recipe layer picks the substrate (CPU NEON by default for cycles
- * 6 + 7; future cycles may dispatch to V3D opportunistically).
-+ * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-+ *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
-+ *        H264DSPContext.v_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_v
-+ * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-+ * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-+ * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
-+ * so cycle 8 stays on the CPU NEON path until a separate change
-+ * gates QPU init on a daedalus-fourier feature flag).
-  *
-  * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
-  * column-major convention: block[r + N*c] = coefficient at
-@@ -40,6 +43,8 @@ static void daedalus_ctx_init_once(void)
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
-+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -60,3 +65,22 @@ void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
-     daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
-                                         block, 1, &meta);
- }
-+
-+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    meta.tc0[0] = tc0[0];
-+    meta.tc0[1] = tc0[1];
-+    meta.tc0[2] = tc0[2];
-+    meta.tc0[3] = tc0[3];
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_luma_v(g_dctx, pix, (size_t)stride,
-+                                                 1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
-index 741e551..85ac381 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
-+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
-@@ -27,6 +27,8 @@
- 
- void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                      int beta, int8_t *tc0);
-+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                      int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-@@ -114,7 +116,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
-     int cpu_flags = av_get_cpu_flags();
- 
-     if (have_neon(cpu_flags) && bit_depth == 8) {
--        c->v_loop_filter_luma   = ff_h264_v_loop_filter_luma_neon;
-+        c->v_loop_filter_luma   = ff_h264_v_loop_filter_luma_daedalus;
-         c->h_loop_filter_luma   = ff_h264_h_loop_filter_luma_neon;
-         c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
-         c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
--- 
-2.47.3
-
@@ -1,82 +0,0 @@
-From 0d1292ea99bc4e5fa2da438259fa01a2374e3e04 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Fri, 22 May 2026 14:18:25 +0200
-Subject: [PATCH] avcodec/h264: restore AV_CODEC_FLAG_LOW_DELAY semantics
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-FFmpeg 8.x dropped the H.264 decoder's low_delay path —
-AV_CODEC_FLAG_LOW_DELAY no longer prevents
-h264_select_output_frame from running the display-order DPB
-output queue.  V4L2-stateless-style consumers (daedalus-v4l2
-daemon, libva-v4l2-request-fourier) that set the flag end up
-seeing the 2-1-4-3 pair-swap pattern on B-frame streams again.
-
-Restore the documented semantics:
-
-  - Early-exit at the top of h264_select_output_frame when the
-    flag is set: emit the just-decoded picture immediately as
-    next_output_pic, mirror the corruption / recovery-point
-    tracking the main path performs, and skip the entire
-    delayed_pic[] / POC reorder machinery.
-
-  - Suppress the SPS-driven has_b_frames clobber in
-    h264_field_start when the flag is set, so the per-slice
-    bitstream_restriction_flag re-pickup cannot reintroduce a
-    nonzero reorder buffer mid-stream.
-
-This is a fork-only change required by the daedalus-v4l2 daemon's
-one-frame-per-send_packet contract; upstream FFmpeg consumers that
-expect display-order output remain untouched (flag default = off).
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 deblock
-+ flag-restoration follow-up.
----
- libavcodec/h264_slice.c | 23 +++++++++++++++++++++++
- 1 file changed, 23 insertions(+)
-
-diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
-index 97fab70..a7bfbd6 100644
---- a/libavcodec/h264_slice.c
-+++ b/libavcodec/h264_slice.c
-@@ -1308,6 +1308,28 @@ static int h264_select_output_frame(H264Context *h)
-     cur->mmco_reset = h->mmco_reset;
-     h->mmco_reset = 0;
- 
-+    /* AV_CODEC_FLAG_LOW_DELAY restore (FFmpeg 8.x dropped the H.264
-+     * decoder's low_delay path).  Bypass the display-order DPB
-+     * output queue: emit the just-decoded picture immediately, in
-+     * decode order, one per send_packet.  V4L2-stateless-style
-+     * consumers (daedalus-v4l2 daemon, libva-v4l2-request-fourier)
-+     * do their own POC-based reorder downstream and require this
-+     * behaviour. */
-+    if (h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) {
-+        h->next_output_pic    = cur;
-+        h->next_outputed_poc  = cur->poc;
-+        h->frame_recovered   |= cur->recovered;
-+        cur->recovered       |= h->frame_recovered & FRAME_RECOVERED_SEI;
-+        if (!cur->recovered) {
-+            if (!(h->avctx->flags  & AV_CODEC_FLAG_OUTPUT_CORRUPT) &&
-+                !(h->avctx->flags2 & AV_CODEC_FLAG2_SHOW_ALL))
-+                h->next_output_pic = NULL;
-+            else
-+                cur->f->flags |= AV_FRAME_FLAG_CORRUPT;
-+        }
-+        return 0;
-+    }
-+
-     if (sps->bitstream_restriction_flag ||
-         h->avctx->strict_std_compliance >= FF_COMPLIANCE_STRICT) {
-         h->avctx->has_b_frames = FFMAX(h->avctx->has_b_frames, sps->num_reorder_frames);
-@@ -1415,6 +1437,7 @@ static int h264_field_start(H264Context *h, const H264SliceContext *sl,
-     sps = h->ps.sps;
- 
-     if (sps->bitstream_restriction_flag &&
-+        !(h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) &&
-         h->avctx->has_b_frames < sps->num_reorder_frames) {
-         h->avctx->has_b_frames = sps->num_reorder_frames;
-     }
--- 
-2.47.3
-
@@ -1,139 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Sat, 23 May 2026 12:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264qpel: route 8x8 mc20 through
- daedalus-fourier
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma horizontal
-half-pel, 6-tap "put" variant — the canonical representative of the
-H.264 luma motion-compensation family) now dispatches through
-daedalus_recipe_dispatch_h264_qpel_mc20 instead of
-ff_put_h264_qpel8_mc20_neon.
-
-Cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes the
-4-cycle libavcodec.so substitution sequence (6 IDCT 4x4 / 7 IDCT 8x8 /
-8 luma-v deblock / 9 qpel mc20).
-
-The recipe layer picks the substrate. Per docs/k9_h264qpel_mc20.md
-the verdict is CPU NEON: 7.6 ns per block (131 Mblock/s) gives a
-135x margin over 30 fps 1080p, and the QPU dispatch floor (~250 ns)
-makes any V3D shader strictly worse. Substitution is plumbing-only,
-NEON-by-recipe — same daedalus_ctx_create_no_qpu pthread_once
-context shape the cycles 6/7/8 shims already own (kept SEPARATE
-from the H264DSP shim's ctx because H264QPEL is its own libavcodec
-Makefile module and link order does not guarantee a single .o
-owns the ctx symbol; one extra ~µs init per process, paid lazily).
-
-Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the 16x16
-size tier stay on the in-tree NEON .S code. Per the cycle-9 phase-1
-rationale, mc20 8x8 is representative of the whole family's per-block
-cost — extending the substitution to other variants would multiply
-recipe-lookup overhead without changing the substrate verdict.
-
-Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier
-cycle 9 green; M1 = 100% bit-exact across 10000 random blocks).
-
-No SONAME change, no Depends change.
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 9.
----
- libavcodec/aarch64/Makefile                |  3 +-
- libavcodec/aarch64/h264_qpel_daedalus.c    | 50 ++++++++++++++++++++++
- libavcodec/aarch64/h264qpel_init_aarch64.c |  4 +-
- 3 files changed, 55 insertions(+), 2 deletions(-)
- create mode 100644 libavcodec/aarch64/h264_qpel_daedalus.c
-
-diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
---- a/libavcodec/aarch64/Makefile
-+++ b/libavcodec/aarch64/Makefile
-@@ -7,7 +7,8 @@ OBJS-$(CONFIG_H264DSP)                  += aarch64/h264dsp_init_aarch64.o \
-                                            aarch64/h264_idct_daedalus.o
- OBJS-$(CONFIG_HUFFYUVDSP)               += aarch64/huffyuvdsp_init_aarch64.o
- OBJS-$(CONFIG_H264PRED)                 += aarch64/h264pred_init.o
--OBJS-$(CONFIG_H264QPEL)                 += aarch64/h264qpel_init_aarch64.o
-+OBJS-$(CONFIG_H264QPEL)                 += aarch64/h264qpel_init_aarch64.o \
-+                                           aarch64/h264_qpel_daedalus.o
- OBJS-$(CONFIG_HPELDSP)                  += aarch64/hpeldsp_init_aarch64.o
- OBJS-$(CONFIG_IDCTDSP)                  += aarch64/idctdsp_init_aarch64.o
- OBJS-$(CONFIG_ME_CMP)                   += aarch64/me_cmp_init_aarch64.o
-diff --git a/libavcodec/aarch64/h264_qpel_daedalus.c b/libavcodec/aarch64/h264_qpel_daedalus.c
-new file mode 100644
---- /dev/null
-+++ b/libavcodec/aarch64/h264_qpel_daedalus.c
-@@ -0,0 +1,50 @@
-+/*
-+ * H.264 luma qpel mc20 (8x8, horizontal half-pel, 6-tap "put")
-+ * — daedalus-fourier substitution shim.
-+ *
-+ * Routes H264QpelContext.put_h264_qpel_pixels_tab[1][2] through
-+ * daedalus_recipe_dispatch_h264_qpel_mc20 instead of
-+ * ff_put_h264_qpel8_mc20_neon.  The recipe layer picks the substrate
-+ * (CPU NEON for cycle 9; QPU not viable — per-block 7.6 ns vs
-+ * ~250 ns QPU dispatch floor, see docs/k9_h264qpel_mc20.md).
-+ *
-+ * Sibling to libavcodec/aarch64/h264_idct_daedalus.c.  We keep a
-+ * SEPARATE process-global pthread_once context here instead of
-+ * sharing the H264DSP one because H264QPEL is its own libavcodec
-+ * Makefile module and link order does not guarantee a single .o
-+ * owns the ctx symbol.  The cost is one extra
-+ * daedalus_ctx_create_no_qpu (~µs) per process; daemon and host
-+ * processes pay this lazily on first MC call.
-+ *
-+ * FFmpeg H264QpelContext convention: both dst and src use a SINGLE
-+ * stride and `src` already points at the leftmost OUTPUT column
-+ * (col 0); the 6-tap filter reads cols -2..+3.  This matches
-+ * daedalus_recipe_dispatch_h264_qpel_mc20's documented contract
-+ * directly, so dst_off = src_off = 0.
-+ */
-+
-+#include <pthread.h>
-+#include <stddef.h>
-+#include <stdint.h>
-+
-+#include <daedalus.h>
-+
-+#include "libavutil/attributes.h"
-+
-+static daedalus_ctx     *g_dctx;
-+static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-+
-+static void daedalus_ctx_init_once(void)
-+{
-+    g_dctx = daedalus_ctx_create_no_qpu();
-+}
-+
-+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
-+
-+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
-+{
-+    static const daedalus_h264_qpel_meta meta = { .dst_off = 0, .src_off = 0 };
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+    daedalus_recipe_dispatch_h264_qpel_mc20(g_dctx, dst, src, (size_t)stride,
-+                                            1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264qpel_init_aarch64.c b/libavcodec/aarch64/h264qpel_init_aarch64.c
---- a/libavcodec/aarch64/h264qpel_init_aarch64.c
-+++ b/libavcodec/aarch64/h264qpel_init_aarch64.c
-@@ -47,6 +47,8 @@ void ff_put_h264_qpel8_mc00_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t str
- void ff_put_h264_qpel8_mc10_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc20_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc30_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src,
-+                                     ptrdiff_t stride);
- void ff_put_h264_qpel8_mc01_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc11_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc21_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
-@@ -184,7 +186,7 @@ av_cold void ff_h264qpel_init_aarch64(H264QpelContext *c, int bit_depth)
-
-         c->put_h264_qpel_pixels_tab[1][ 0] = ff_put_h264_qpel8_mc00_neon;
-         c->put_h264_qpel_pixels_tab[1][ 1] = ff_put_h264_qpel8_mc10_neon;
--        c->put_h264_qpel_pixels_tab[1][ 2] = ff_put_h264_qpel8_mc20_neon;
-+        c->put_h264_qpel_pixels_tab[1][ 2] = ff_put_h264_qpel8_mc20_daedalus;
-         c->put_h264_qpel_pixels_tab[1][ 3] = ff_put_h264_qpel8_mc30_neon;
-         c->put_h264_qpel_pixels_tab[1][ 4] = ff_put_h264_qpel8_mc01_neon;
-         c->put_h264_qpel_pixels_tab[1][ 5] = ff_put_h264_qpel8_mc11_neon;
---
-2.47.3
@@ -1,92 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 12:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 luma-h deblock through daedalus-fourier
-
-Sibling of 0005 (which substituted v_loop_filter_luma).  Same
-NEON-to-NEON substitution: H264DSPContext.h_loop_filter_luma →
-daedalus_recipe_dispatch_h264_deblock_luma_h.  The H kernel landed
-in daedalus-fourier PR #9 (CPU NEON only — no QPU shader yet).
-
-libavcodec.so ctx is no-QPU per the existing 0003-0005 / 0007
-pattern; we cannot assume Vulkan in arbitrary host processes
-(firefox-fourier RDD, mpv-fourier, etc.).
-
-Intra (bS=4) h_loop_filter_luma_intra stays on the in-tree NEON .S
-code; daedalus_h264_deblock_meta only covers the non-intra path.
-An intra-h substitution can land once daedalus-fourier exposes a
-dispatch helper (the kernel already exists internally per PR #11).
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8 H.
----
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:09:33.694760715 +0200
-+++ libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:09:33.715603719 +0200
-@@ -1,9 +1,10 @@
- /*
-- * H.264 4x4 / 8x8 IDCT + luma-v deblock — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma v/h deblock — daedalus-fourier substitution shims.
-  *
-  * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-  *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
-  *        H264DSPContext.v_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_v
-+ *        H264DSPContext.h_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_h
-  * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-  * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-  * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
-@@ -45,6 +46,8 @@
- void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                          int alpha, int beta, int8_t *tc0);
-+void ff_h264_h_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -84,3 +87,22 @@
-     daedalus_recipe_dispatch_h264_deblock_luma_v(g_dctx, pix, (size_t)stride,
-                                                  1, &meta);
- }
-+
-+void ff_h264_h_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    meta.tc0[0] = tc0[0];
-+    meta.tc0[1] = tc0[1];
-+    meta.tc0[2] = tc0[2];
-+    meta.tc0[3] = tc0[3];
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_luma_h(g_dctx, pix, (size_t)stride,
-+                                                 1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
---- a/libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:09:33.695937103 +0200
-+++ libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:09:33.715541700 +0200
-@@ -31,6 +31,8 @@
-                                          int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                      int beta, int8_t *tc0);
-+void ff_h264_h_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                            int beta);
- void ff_h264_h_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-@@ -117,7 +119,7 @@
- 
-     if (have_neon(cpu_flags) && bit_depth == 8) {
-         c->v_loop_filter_luma   = ff_h264_v_loop_filter_luma_daedalus;
--        c->h_loop_filter_luma   = ff_h264_h_loop_filter_luma_neon;
-+        c->h_loop_filter_luma   = ff_h264_h_loop_filter_luma_daedalus;
-         c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
-         c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
- 
---
-2.47.3
-
@@ -1,127 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 12:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 chroma v/h deblock through daedalus-fourier
-
-Chroma siblings of 0005 (luma_v) and 0008 (luma_h).  Same
-NEON-to-NEON pattern via the daedalus recipe layer:
-
-  H264DSPContext.v_loop_filter_chroma →
-    daedalus_recipe_dispatch_h264_deblock_chroma_v
-  H264DSPContext.h_loop_filter_chroma →
-    daedalus_recipe_dispatch_h264_deblock_chroma_h
-
-Both kernels landed in daedalus-fourier PR #10.  Recipe table
-routes AUTO to CPU NEON (no chroma QPU shaders yet), so this
-is plumbing-only and stays bit-exact against the in-tree NEON.
-
-Intra chroma (bS=4) loop filters remain on in-tree NEON;
-daedalus_h264_deblock_meta covers the non-intra (bS<4) path.
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8 chroma.
----
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:15:45.995368233 +0200
-+++ libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:15:46.015839177 +0200
-@@ -1,10 +1,12 @@
- /*
-- * H.264 4x4 / 8x8 IDCT + luma v/h deblock — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma v/h + chroma v/h deblock — daedalus-fourier substitution shims.
-  *
-  * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-  *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
-- *        H264DSPContext.v_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_v
-- *        H264DSPContext.h_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_h
-+ *        H264DSPContext.v_loop_filter_luma   → daedalus_recipe_dispatch_h264_deblock_luma_v
-+ *        H264DSPContext.h_loop_filter_luma   → daedalus_recipe_dispatch_h264_deblock_luma_h
-+ *        H264DSPContext.v_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_v
-+ *        H264DSPContext.h_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_h
-  * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-  * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-  * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
-@@ -48,6 +50,10 @@
-                                          int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                          int alpha, int beta, int8_t *tc0);
-+void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0);
-+void ff_h264_h_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -106,3 +112,41 @@
-     daedalus_recipe_dispatch_h264_deblock_luma_h(g_dctx, pix, (size_t)stride,
-                                                  1, &meta);
- }
-+
-+void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    meta.tc0[0] = tc0[0];
-+    meta.tc0[1] = tc0[1];
-+    meta.tc0[2] = tc0[2];
-+    meta.tc0[3] = tc0[3];
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_chroma_v(g_dctx, pix, (size_t)stride,
-+                                                   1, &meta);
-+}
-+
-+void ff_h264_h_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    meta.tc0[0] = tc0[0];
-+    meta.tc0[1] = tc0[1];
-+    meta.tc0[2] = tc0[2];
-+    meta.tc0[3] = tc0[3];
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_chroma_h(g_dctx, pix, (size_t)stride,
-+                                                   1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
---- a/libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:15:45.996482360 +0200
-+++ libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:15:46.025604910 +0200
-@@ -39,8 +39,12 @@
-                                            int beta);
- void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                        int beta, int8_t *tc0);
-+void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                        int beta, int8_t *tc0);
-+void ff_h264_h_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_chroma422_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                           int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_chroma_intra_neon(uint8_t *pix, ptrdiff_t stride,
-@@ -123,11 +127,11 @@
-         c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
-         c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
- 
--        c->v_loop_filter_chroma = ff_h264_v_loop_filter_chroma_neon;
-+        c->v_loop_filter_chroma = ff_h264_v_loop_filter_chroma_daedalus;
-         c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon;
- 
-         if (chroma_format_idc <= 1) {
--            c->h_loop_filter_chroma = ff_h264_h_loop_filter_chroma_neon;
-+            c->h_loop_filter_chroma = ff_h264_h_loop_filter_chroma_daedalus;
-             c->h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma_intra_neon;
-             c->h_loop_filter_chroma_mbaff_intra = ff_h264_h_loop_filter_chroma_mbaff_intra_neon;
-         } else {
---
-2.47.3
-
@@ -1,126 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 12:30:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 luma intra deblock through daedalus-fourier
-
-Adds the bS=4 intra-strength variants of the already-substituted
-luma_v / luma_h deblock (0005, 0008).  Macroblock edges with an
-intra-coded macroblock on either side generally get boundary
-strength 4 per H.264 §8.7.2.1 (interior intra edges get bS=3).
-
-  H264DSPContext.v_loop_filter_luma_intra →
-    daedalus_recipe_dispatch_h264_deblock_luma_v_intra
-  H264DSPContext.h_loop_filter_luma_intra →
-    daedalus_recipe_dispatch_h264_deblock_luma_h_intra
-
-Both kernels landed in daedalus-fourier PR #11.  Recipe table
-routes AUTO to CPU NEON (no intra QPU shaders yet) — plumbing-
-only NEON-to-NEON via daedalus, bit-exact against the in-tree
-FFmpeg NEON path.
-
-Signature differs from bS<4: no tc0 argument.  The wrapper
-passes daedalus_h264_deblock_meta with alpha/beta set; tc0[] is
-ignored by the intra dispatch (bS=4 hardcodes the strength).
-
-Chroma intra variants are deferred to a follow-up PR because the
-chroma path has a 4:2:0 / 4:2:2 split (chroma_format_idc gating)
-that needs explicit conditional substitution to avoid running
-the 4:2:0-only daedalus dispatch on 4:2:2 chroma.
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8 intra.
----
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:18:54.992244965 +0200
-+++ libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:20:12.338122217 +0200
-@@ -1,5 +1,5 @@
- /*
-- * H.264 4x4 / 8x8 IDCT + luma v/h + chroma v/h deblock — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma v/h (inter + intra) + chroma v/h deblock — daedalus-fourier substitution shims.
-  *
-  * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-  *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
-@@ -7,6 +7,8 @@
-  *        H264DSPContext.h_loop_filter_luma   → daedalus_recipe_dispatch_h264_deblock_luma_h
-  *        H264DSPContext.v_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_v
-  *        H264DSPContext.h_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_h
-+ *        H264DSPContext.v_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_v_intra
-+ *        H264DSPContext.h_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_h_intra
-  * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-  * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-  * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
-@@ -54,6 +56,10 @@
-                                            int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                            int alpha, int beta, int8_t *tc0);
-+void ff_h264_v_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta);
-+void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -150,3 +156,34 @@
-     daedalus_recipe_dispatch_h264_deblock_chroma_h(g_dctx, pix, (size_t)stride,
-                                                    1, &meta);
- }
-+
-+void ff_h264_v_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    /* tc0[] is ignored by the intra-strength dispatch (bS=4 hardcodes the strength). */
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_luma_v_intra(g_dctx, pix, (size_t)stride,
-+                                                        1, &meta);
-+}
-+
-+void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_luma_h_intra(g_dctx, pix, (size_t)stride,
-+                                                        1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
---- a/libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:18:54.993349573 +0200
-+++ libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:20:12.338265830 +0200
-@@ -35,8 +35,12 @@
-                                          int alpha, int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                            int beta);
-+void ff_h264_v_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta);
- void ff_h264_h_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                            int beta);
-+void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta);
- void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                        int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-@@ -124,8 +128,8 @@
-     if (have_neon(cpu_flags) && bit_depth == 8) {
-         c->v_loop_filter_luma   = ff_h264_v_loop_filter_luma_daedalus;
-         c->h_loop_filter_luma   = ff_h264_h_loop_filter_luma_daedalus;
--        c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
--        c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
-+        c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_daedalus;
-+        c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_daedalus;
- 
-         c->v_loop_filter_chroma = ff_h264_v_loop_filter_chroma_daedalus;
-         c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon;
---
-2.47.3
-
@@ -1,101 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 13:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 chroma DC Hadamard through daedalus-fourier
-
-Substitutes H264DSPContext.chroma_dc_dequant_idct in the
-4:2:0 / bit_depth=8 init path with a wrapper that composes
-the daedalus chroma DC Hadamard primitive (fourier PR #25)
-with the qmul scaling; FFmpeg does both in one fused function.
-
-Bit-exact against ff_h264_chroma_dc_dequant_idct_8_c.
-Hadamard correctness is gated by the fourier PR #23 test suite.
-
-4:2:2 chroma stays on the in-tree 422 variant (same
-gating shape as 0009 chroma deblock substitution).
-
-Requires daedalus-fourier commit b9f9ff2 or later (PR #25
-exposing the public Hadamard symbol).  Pin bumps in PKGBUILD
-and build-deb.sh come in the same commit.
----
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:38:32.019491484 +0200
-+++ libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:38:32.033821507 +0200
-@@ -1,5 +1,5 @@
- /*
-- * H.264 4x4 / 8x8 IDCT + luma v/h (inter + intra) + chroma v/h deblock — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma v/h (inter+intra) + chroma v/h deblock + chroma DC Hadamard — daedalus-fourier substitution shims.
-  *
-  * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-  *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
-@@ -9,6 +9,7 @@
-  *        H264DSPContext.h_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_h
-  *        H264DSPContext.v_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_v_intra
-  *        H264DSPContext.h_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_h_intra
-+ *        H264DSPContext.chroma_dc_dequant_idct   → daedalus_h264_chroma_dc_hadamard_2x2 + caller-side qmul
-  * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-  * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-  * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
-@@ -60,6 +61,7 @@
-                                                 int alpha, int beta);
- void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                                 int alpha, int beta);
-+void ff_h264_chroma_dc_dequant_idct_daedalus(int16_t *block, int qmul);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -187,3 +189,32 @@
-     daedalus_recipe_dispatch_h264_deblock_luma_h_intra(g_dctx, pix, (size_t)stride,
-                                                         1, &meta);
- }
-+
-+/* Composes daedalus_h264_chroma_dc_hadamard_2x2 with the qmul scaling
-+ * that FFmpeg's reference does in one fused function (h264idct_template.c
-+ * ff_h264_chroma_dc_dequant_idct).
-+ *
-+ * The 4 DC coefficients are scattered across the per-MB coefficient
-+ * buffer at offsets [r*stride + c*xStride] (stride=32, xStride=16).
-+ * Extract into a contiguous int16[4], run the Hadamard, then apply
-+ * the qmul scale and write back to the original positions.
-+ *
-+ * No daedalus ctx needed; the Hadamard is a pure stateless primitive.
-+ */
-+void ff_h264_chroma_dc_dequant_idct_daedalus(int16_t *block, int qmul)
-+{
-+    enum { stride = 32, xStride = 16 };
-+    int16_t dc[4];
-+
-+    dc[0] = block[stride*0 + xStride*0];
-+    dc[1] = block[stride*0 + xStride*1];
-+    dc[2] = block[stride*1 + xStride*0];
-+    dc[3] = block[stride*1 + xStride*1];
-+
-+    daedalus_h264_chroma_dc_hadamard_2x2(dc);
-+
-+    block[stride*0 + xStride*0] = (int16_t)((int)dc[0] * qmul >> 7);
-+    block[stride*0 + xStride*1] = (int16_t)((int)dc[1] * qmul >> 7);
-+    block[stride*1 + xStride*0] = (int16_t)((int)dc[2] * qmul >> 7);
-+    block[stride*1 + xStride*1] = (int16_t)((int)dc[3] * qmul >> 7);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
---- a/libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:38:32.020346459 +0200
-+++ libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:38:32.033909804 +0200
-@@ -41,6 +41,7 @@
-                                            int beta);
- void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                                 int alpha, int beta);
-+void ff_h264_chroma_dc_dequant_idct_daedalus(int16_t *block, int qmul);
- void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                        int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-@@ -135,6 +136,7 @@
-         c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon;
- 
-         if (chroma_format_idc <= 1) {
-+            c->chroma_dc_dequant_idct = ff_h264_chroma_dc_dequant_idct_daedalus;
-             c->h_loop_filter_chroma = ff_h264_h_loop_filter_chroma_daedalus;
-             c->h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma_intra_neon;
-             c->h_loop_filter_chroma_mbaff_intra = ff_h264_h_loop_filter_chroma_mbaff_intra_neon;
--
-2.47.3
-
@@ -1,245 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 14:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264qpel: route remaining qpel 8x8 positions through daedalus-fourier
-
-Closes the H.264 qpel substitution.  Extends 0007 (which routed only
-mc20 put_) to ALL 15 useful positions in BOTH the put_ and avg_
-tables, skipping mc00 (integer copy / pointer-only fast path).
-
-29 substitutions total: 14 new put_ + 15 avg_.  Each is a uniform
-wrapper around daedalus_recipe_dispatch_h264_qpel_{avg_,}mcXY exposed
-by daedalus-fourier PRs #15-#20.
-
-All recipe-table entries route AUTO to CPU NEON (no QPU shaders
-for any qpel position other than mc20 yet), so this is plumbing-only
-NEON-to-NEON — bit-exact against the in-tree ff_*_h264_qpel8_*_neon
-path.
-
-16x16 qpel tables ([0][...]) stay on the in-tree NEON.  daedalus
-only exposes 8x8 today; 16x16 substitution can land once fourier
-provides those variants (likely just dispatching the 8x8 path four
-times with shifted dst/src offsets).
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc qpel buildout.
----
-diff --git a/libavcodec/aarch64/h264_qpel_daedalus.c b/libavcodec/aarch64/h264_qpel_daedalus.c
---- a/libavcodec/aarch64/h264_qpel_daedalus.c	2026-05-25 14:05:05.789298250 +0200
-+++ libavcodec/aarch64/h264_qpel_daedalus.c	2026-05-25 14:05:05.818358374 +0200
-@@ -1,10 +1,13 @@
- /*
-- * H.264 luma qpel mc20 (8x8, horizontal half-pel, 6-tap "put")
-- * — daedalus-fourier substitution shim.
-+ * H.264 luma qpel 8x8 — daedalus-fourier substitution shims (put_ + avg_).
-  *
-- * Routes H264QpelContext.put_h264_qpel_pixels_tab[1][2] through
-- * daedalus_recipe_dispatch_h264_qpel_mc20 instead of
-- * ff_put_h264_qpel8_mc20_neon.  The recipe layer picks the substrate
-+ * Routes ALL 15 useful positions in H264QpelContext's 8x8 put_ and
-+ * avg_ tables through daedalus_recipe_dispatch_h264_qpel_mc{XY}
-+ * (skipping mc00 which is integer copy / FFmpeg's pointer-only fast
-+ * path).  Plumbing-only NEON-by-recipe — daedalus-fourier PRs #15-#20
-+ * exposed each variant via the same dispatch signature, so the
-+ * substitution is a uniform macro across put_/avg_ and across all
-+ * 15 mc positions.  The recipe layer picks the substrate
-  * (CPU NEON for cycle 9; QPU not viable — per-block 7.6 ns vs
-  * ~250 ns QPU dispatch floor, see docs/k9_h264qpel_mc20.md).
-  *
-@@ -48,3 +51,53 @@
-     daedalus_recipe_dispatch_h264_qpel_mc20(g_dctx, dst, src, (size_t)stride,
-                                             1, &meta);
- }
-+
-+
-+/* All other 8x8 qpel positions follow the same dispatch shape as mc20
-+ * above.  The macro collapses ~600 LOC of one-wrapper-per-variant
-+ * boilerplate (29 variants total: 14 put_ + 15 avg_). */
-+#define DEFINE_QPEL_WRAPPER(type, suffix, dispatch_fn)                          \
-+void ff_ ## type ## _h264_qpel8_ ## suffix ## _daedalus(uint8_t *dst,           \
-+    const uint8_t *src, ptrdiff_t stride);                                      \
-+void ff_ ## type ## _h264_qpel8_ ## suffix ## _daedalus(uint8_t *dst,           \
-+    const uint8_t *src, ptrdiff_t stride)                                       \
-+{                                                                               \
-+    static const daedalus_h264_qpel_meta meta = { .dst_off = 0, .src_off = 0 }; \
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);                         \
-+    dispatch_fn(g_dctx, dst, src, (size_t)stride, 1, &meta);                    \
-+}
-+
-+/* put_ variants (mc20 stays on the explicit definition above). */
-+DEFINE_QPEL_WRAPPER(put, mc10, daedalus_recipe_dispatch_h264_qpel_mc10)
-+DEFINE_QPEL_WRAPPER(put, mc30, daedalus_recipe_dispatch_h264_qpel_mc30)
-+DEFINE_QPEL_WRAPPER(put, mc01, daedalus_recipe_dispatch_h264_qpel_mc01)
-+DEFINE_QPEL_WRAPPER(put, mc11, daedalus_recipe_dispatch_h264_qpel_mc11)
-+DEFINE_QPEL_WRAPPER(put, mc21, daedalus_recipe_dispatch_h264_qpel_mc21)
-+DEFINE_QPEL_WRAPPER(put, mc31, daedalus_recipe_dispatch_h264_qpel_mc31)
-+DEFINE_QPEL_WRAPPER(put, mc02, daedalus_recipe_dispatch_h264_qpel_mc02)
-+DEFINE_QPEL_WRAPPER(put, mc12, daedalus_recipe_dispatch_h264_qpel_mc12)
-+DEFINE_QPEL_WRAPPER(put, mc22, daedalus_recipe_dispatch_h264_qpel_mc22)
-+DEFINE_QPEL_WRAPPER(put, mc32, daedalus_recipe_dispatch_h264_qpel_mc32)
-+DEFINE_QPEL_WRAPPER(put, mc03, daedalus_recipe_dispatch_h264_qpel_mc03)
-+DEFINE_QPEL_WRAPPER(put, mc13, daedalus_recipe_dispatch_h264_qpel_mc13)
-+DEFINE_QPEL_WRAPPER(put, mc23, daedalus_recipe_dispatch_h264_qpel_mc23)
-+DEFINE_QPEL_WRAPPER(put, mc33, daedalus_recipe_dispatch_h264_qpel_mc33)
-+
-+/* avg_ variants — all 15 useful positions. */
-+DEFINE_QPEL_WRAPPER(avg, mc10, daedalus_recipe_dispatch_h264_qpel_avg_mc10)
-+DEFINE_QPEL_WRAPPER(avg, mc20, daedalus_recipe_dispatch_h264_qpel_avg_mc20)
-+DEFINE_QPEL_WRAPPER(avg, mc30, daedalus_recipe_dispatch_h264_qpel_avg_mc30)
-+DEFINE_QPEL_WRAPPER(avg, mc01, daedalus_recipe_dispatch_h264_qpel_avg_mc01)
-+DEFINE_QPEL_WRAPPER(avg, mc11, daedalus_recipe_dispatch_h264_qpel_avg_mc11)
-+DEFINE_QPEL_WRAPPER(avg, mc21, daedalus_recipe_dispatch_h264_qpel_avg_mc21)
-+DEFINE_QPEL_WRAPPER(avg, mc31, daedalus_recipe_dispatch_h264_qpel_avg_mc31)
-+DEFINE_QPEL_WRAPPER(avg, mc02, daedalus_recipe_dispatch_h264_qpel_avg_mc02)
-+DEFINE_QPEL_WRAPPER(avg, mc12, daedalus_recipe_dispatch_h264_qpel_avg_mc12)
-+DEFINE_QPEL_WRAPPER(avg, mc22, daedalus_recipe_dispatch_h264_qpel_avg_mc22)
-+DEFINE_QPEL_WRAPPER(avg, mc32, daedalus_recipe_dispatch_h264_qpel_avg_mc32)
-+DEFINE_QPEL_WRAPPER(avg, mc03, daedalus_recipe_dispatch_h264_qpel_avg_mc03)
-+DEFINE_QPEL_WRAPPER(avg, mc13, daedalus_recipe_dispatch_h264_qpel_avg_mc13)
-+DEFINE_QPEL_WRAPPER(avg, mc23, daedalus_recipe_dispatch_h264_qpel_avg_mc23)
-+DEFINE_QPEL_WRAPPER(avg, mc33, daedalus_recipe_dispatch_h264_qpel_avg_mc33)
-+
-+#undef DEFINE_QPEL_WRAPPER
-diff --git a/libavcodec/aarch64/h264qpel_init_aarch64.c b/libavcodec/aarch64/h264qpel_init_aarch64.c
---- a/libavcodec/aarch64/h264qpel_init_aarch64.c	2026-05-25 14:05:05.790403989 +0200
-+++ libavcodec/aarch64/h264qpel_init_aarch64.c	2026-05-25 14:05:05.819136071 +0200
-@@ -50,6 +50,64 @@
- void ff_put_h264_qpel8_mc30_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src,
-                                      ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc10_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc30_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc01_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc11_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc21_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc31_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc02_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc12_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc22_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc32_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc03_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc13_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc23_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc33_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc10_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc30_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc01_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc11_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc21_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc31_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc02_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc12_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc22_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc32_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc03_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc13_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc23_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc33_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
- void ff_put_h264_qpel8_mc01_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc11_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc21_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
-@@ -164,21 +222,21 @@
-         c->put_h264_qpel_pixels_tab[0][15] = ff_put_h264_qpel16_mc33_neon;
- 
-         c->put_h264_qpel_pixels_tab[1][ 0] = ff_put_h264_qpel8_mc00_neon;
--        c->put_h264_qpel_pixels_tab[1][ 1] = ff_put_h264_qpel8_mc10_neon;
-+        c->put_h264_qpel_pixels_tab[1][ 1] = ff_put_h264_qpel8_mc10_daedalus;
-         c->put_h264_qpel_pixels_tab[1][ 2] = ff_put_h264_qpel8_mc20_daedalus;
--        c->put_h264_qpel_pixels_tab[1][ 3] = ff_put_h264_qpel8_mc30_neon;
--        c->put_h264_qpel_pixels_tab[1][ 4] = ff_put_h264_qpel8_mc01_neon;
--        c->put_h264_qpel_pixels_tab[1][ 5] = ff_put_h264_qpel8_mc11_neon;
--        c->put_h264_qpel_pixels_tab[1][ 6] = ff_put_h264_qpel8_mc21_neon;
--        c->put_h264_qpel_pixels_tab[1][ 7] = ff_put_h264_qpel8_mc31_neon;
--        c->put_h264_qpel_pixels_tab[1][ 8] = ff_put_h264_qpel8_mc02_neon;
--        c->put_h264_qpel_pixels_tab[1][ 9] = ff_put_h264_qpel8_mc12_neon;
--        c->put_h264_qpel_pixels_tab[1][10] = ff_put_h264_qpel8_mc22_neon;
--        c->put_h264_qpel_pixels_tab[1][11] = ff_put_h264_qpel8_mc32_neon;
--        c->put_h264_qpel_pixels_tab[1][12] = ff_put_h264_qpel8_mc03_neon;
--        c->put_h264_qpel_pixels_tab[1][13] = ff_put_h264_qpel8_mc13_neon;
--        c->put_h264_qpel_pixels_tab[1][14] = ff_put_h264_qpel8_mc23_neon;
--        c->put_h264_qpel_pixels_tab[1][15] = ff_put_h264_qpel8_mc33_neon;
-+        c->put_h264_qpel_pixels_tab[1][ 3] = ff_put_h264_qpel8_mc30_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 4] = ff_put_h264_qpel8_mc01_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 5] = ff_put_h264_qpel8_mc11_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 6] = ff_put_h264_qpel8_mc21_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 7] = ff_put_h264_qpel8_mc31_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 8] = ff_put_h264_qpel8_mc02_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 9] = ff_put_h264_qpel8_mc12_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][10] = ff_put_h264_qpel8_mc22_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][11] = ff_put_h264_qpel8_mc32_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][12] = ff_put_h264_qpel8_mc03_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][13] = ff_put_h264_qpel8_mc13_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][14] = ff_put_h264_qpel8_mc23_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][15] = ff_put_h264_qpel8_mc33_daedalus;
- 
-         c->avg_h264_qpel_pixels_tab[0][ 0] = ff_avg_h264_qpel16_mc00_neon;
-         c->avg_h264_qpel_pixels_tab[0][ 1] = ff_avg_h264_qpel16_mc10_neon;
-@@ -198,21 +256,21 @@
-         c->avg_h264_qpel_pixels_tab[0][15] = ff_avg_h264_qpel16_mc33_neon;
- 
-         c->avg_h264_qpel_pixels_tab[1][ 0] = ff_avg_h264_qpel8_mc00_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 1] = ff_avg_h264_qpel8_mc10_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 2] = ff_avg_h264_qpel8_mc20_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 3] = ff_avg_h264_qpel8_mc30_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 4] = ff_avg_h264_qpel8_mc01_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 5] = ff_avg_h264_qpel8_mc11_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 6] = ff_avg_h264_qpel8_mc21_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 7] = ff_avg_h264_qpel8_mc31_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 8] = ff_avg_h264_qpel8_mc02_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 9] = ff_avg_h264_qpel8_mc12_neon;
--        c->avg_h264_qpel_pixels_tab[1][10] = ff_avg_h264_qpel8_mc22_neon;
--        c->avg_h264_qpel_pixels_tab[1][11] = ff_avg_h264_qpel8_mc32_neon;
--        c->avg_h264_qpel_pixels_tab[1][12] = ff_avg_h264_qpel8_mc03_neon;
--        c->avg_h264_qpel_pixels_tab[1][13] = ff_avg_h264_qpel8_mc13_neon;
--        c->avg_h264_qpel_pixels_tab[1][14] = ff_avg_h264_qpel8_mc23_neon;
--        c->avg_h264_qpel_pixels_tab[1][15] = ff_avg_h264_qpel8_mc33_neon;
-+        c->avg_h264_qpel_pixels_tab[1][ 1] = ff_avg_h264_qpel8_mc10_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 2] = ff_avg_h264_qpel8_mc20_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 3] = ff_avg_h264_qpel8_mc30_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 4] = ff_avg_h264_qpel8_mc01_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 5] = ff_avg_h264_qpel8_mc11_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 6] = ff_avg_h264_qpel8_mc21_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 7] = ff_avg_h264_qpel8_mc31_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 8] = ff_avg_h264_qpel8_mc02_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 9] = ff_avg_h264_qpel8_mc12_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][10] = ff_avg_h264_qpel8_mc22_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][11] = ff_avg_h264_qpel8_mc32_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][12] = ff_avg_h264_qpel8_mc03_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][13] = ff_avg_h264_qpel8_mc13_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][14] = ff_avg_h264_qpel8_mc23_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][15] = ff_avg_h264_qpel8_mc33_daedalus;
-     } else if (have_neon(cpu_flags) && bit_depth == 10) {
-         c->put_h264_qpel_pixels_tab[0][ 1] = ff_put_h264_qpel16_mc10_neon_10;
-         c->put_h264_qpel_pixels_tab[0][ 2] = ff_put_h264_qpel16_mc20_neon_10;
--
-2.47.3
-
@@ -1,120 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 14:30:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 chroma intra deblock (4:2:0) through daedalus-fourier
-
-Substitutes c->v_loop_filter_chroma_intra and c->h_loop_filter_chroma_intra
-with daedalus wrappers in the bit_depth=8 / chroma_format_idc<=1 (4:2:0)
-branch.  4:2:2 stays on the in-tree NEON path (the daedalus chroma intra
-dispatch is 4:2:0-only).
-
-The fourier dispatches were exposed in PR #11 (DEFINE_INTRA_DISPATCH
-macro generates the public daedalus_dispatch_h264_deblock_chroma_*_intra
-symbols + recipe wrappers).
-
-Re-architects the chroma init: v_loop_filter_chroma_intra was previously
-assigned unconditionally to the NEON variant (which works for both 4:2:0
-and 4:2:2).  We now assign it INSIDE both branches of the chroma_format_idc
-conditional, with the 4:2:0 branch picking daedalus and the 4:2:2 branch
-keeping NEON.  No regression for 4:2:2 streams.
-
-Same NEON-to-NEON via recipe shape as 0010 luma intra.
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc chroma intra.
----
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 14:21:08.267156263 +0200
-+++ libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 14:21:08.287745931 +0200
-@@ -1,5 +1,5 @@
- /*
-- * H.264 4x4 / 8x8 IDCT + luma v/h (inter+intra) + chroma v/h deblock + chroma DC Hadamard — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma v/h (inter+intra) + chroma v/h (inter+intra) deblock + chroma DC Hadamard — daedalus-fourier substitution shims.
-  *
-  * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-  *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
-@@ -9,6 +9,8 @@
-  *        H264DSPContext.h_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_h
-  *        H264DSPContext.v_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_v_intra
-  *        H264DSPContext.h_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_h_intra
-+ *        H264DSPContext.v_loop_filter_chroma_intra → daedalus_recipe_dispatch_h264_deblock_chroma_v_intra
-+ *        H264DSPContext.h_loop_filter_chroma_intra → daedalus_recipe_dispatch_h264_deblock_chroma_h_intra
-  *        H264DSPContext.chroma_dc_dequant_idct   → daedalus_h264_chroma_dc_hadamard_2x2 + caller-side qmul
-  * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-  * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-@@ -61,6 +63,10 @@
-                                                 int alpha, int beta);
- void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                                 int alpha, int beta);
-+void ff_h264_v_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta);
-+void ff_h264_h_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta);
- void ff_h264_chroma_dc_dequant_idct_daedalus(int16_t *block, int qmul);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
-@@ -218,3 +224,30 @@
-     block[stride*1 + xStride*0] = (int16_t)((int)dc[2] * qmul >> 7);
-     block[stride*1 + xStride*1] = (int16_t)((int)dc[3] * qmul >> 7);
- }
-+
-+void ff_h264_v_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    /* tc0[] unused for intra (bS=4 hardcodes the strength). */
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+    daedalus_recipe_dispatch_h264_deblock_chroma_v_intra(g_dctx, pix, (size_t)stride,
-+                                                          1, &meta);
-+}
-+
-+void ff_h264_h_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+    daedalus_recipe_dispatch_h264_deblock_chroma_h_intra(g_dctx, pix, (size_t)stride,
-+                                                          1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
---- a/libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 14:21:08.268311057 +0200
-+++ libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 14:21:08.287886563 +0200
-@@ -42,6 +42,10 @@
- void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                                 int alpha, int beta);
- void ff_h264_chroma_dc_dequant_idct_daedalus(int16_t *block, int qmul);
-+void ff_h264_v_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta);
-+void ff_h264_h_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta);
- void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                        int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-@@ -133,14 +137,15 @@
-         c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_daedalus;
- 
-         c->v_loop_filter_chroma = ff_h264_v_loop_filter_chroma_daedalus;
--        c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon;
- 
-         if (chroma_format_idc <= 1) {
-             c->chroma_dc_dequant_idct = ff_h264_chroma_dc_dequant_idct_daedalus;
-+            c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_daedalus;
-             c->h_loop_filter_chroma = ff_h264_h_loop_filter_chroma_daedalus;
--            c->h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma_intra_neon;
-+            c->h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma_intra_daedalus;
-             c->h_loop_filter_chroma_mbaff_intra = ff_h264_h_loop_filter_chroma_mbaff_intra_neon;
-         } else {
-+            c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon;
-             c->h_loop_filter_chroma = ff_h264_h_loop_filter_chroma422_neon;
-             c->h_loop_filter_chroma_mbaff = ff_h264_h_loop_filter_chroma_neon;
-             c->h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma422_intra_neon;
--
-2.47.3
-
@@ -1,85 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Mon, 25 May 2026 21:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264: use QPU-capable daedalus ctx (bench
- shows 4.30x faster on Pi 5)
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Patches 0003 (IDCT 4x4) and 0007 (qpel mc20) created the libavcodec.so
-process-global daedalus_ctx via daedalus_ctx_create_no_qpu().  Rationale
-at the time: cycle 6/9 had only CPU NEON paths, so a QPU-capable ctx
-would have meant pointless Vulkan init in every host process (firefox-
-fourier, mpv-fourier, daedalus_v4l2_daemon, ...).
-
-Two things changed since:
-
-  1. Every H.264 hot-path primitive now has a V3D7 compute shader.
-     IDCT 4x4/8x8 (cycles 6, 7), 8 deblock variants (luma+chroma x V+H
-     x inter+intra), 30 qpel positions (15 put_ + 15 avg_).  See
-     daedalus-fourier PRs #28-#35.
-
-  2. Dispatch overhead has been hammered down — buffer pool in
-     v3d_runner (daedalus-fourier task #160) plus persistent command
-     buffer (task #161).  daedalus-fourier PR #36 bench measures the
-     1080p worst-case sum on hertz (Pi 5 V3D 7.1, 30 iters x 5 warmup):
-
-       kernel             CPU ns/op  QPU ns/op  winner
-       IDCT 4x4 luma          10.79       2.47  QPU 4.36x
-       IDCT 8x8 luma          29.69       9.23  QPU 3.22x
-       Deblock luma_v         17.58      10.21  QPU 1.72x
-       Deblock luma_h         38.41       9.98  QPU 3.85x
-       qpel mc20 (8x8)        28.24       9.66  QPU 2.92x
-       qpel mc02 (8x8)        16.96      20.54  CPU 1.21x
-       qpel mc22 (8x8)        71.58       9.64  QPU 7.43x
-
-       1080p worst-case sum (IDCT4 + deblock luma + qpel mc22):
-         CPU NEON only:  5.57 ms
-         QPU only:       1.30 ms   (CPU/QPU sum ratio = 4.30x)
-
-PR #10's verdict (CPU 4x faster than QPU at IDCT) is reversed.  Switch
-the substitution context to daedalus_ctx_create() in both H.264 TUs
-(h264_idct_daedalus.c, h264_qpel_daedalus.c) so the recipe layer can
-actually route through the now-faster QPU path.
-
-daedalus_ctx_create() probes for a usable Vulkan device and falls back
-to no_qpu mode if unavailable, so this is safe on hosts without V3D
-(x86 reauktion build runners, debian-aarch64 builders without renderD,
-etc.).  Hosts WITH V3D (Pi 5 deployment targets) get the speedup.
-
-The remaining qpel mc02 anomaly (single-axis vertical filter, 1.21x
-CPU) is bench-flagged for a v2 shader follow-up; the recipe entry
-stays QPU since the policy decree (2026-05-23 substrate decree) holds
-and the gap is marginal.
-
-Refs reauktion/daedalus-fourier!36.
----
- libavcodec/aarch64/h264_idct_daedalus.c | 2 +-
- libavcodec/aarch64/h264_qpel_daedalus.c | 2 +-
- 2 files changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c
-+++ b/libavcodec/aarch64/h264_idct_daedalus.c
-@@ -32,7 +32,7 @@ static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-
- static void daedalus_ctx_init_once(void)
- {
--    g_dctx = daedalus_ctx_create_no_qpu();
-+    g_dctx = daedalus_ctx_create();
- }
-
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
-diff --git a/libavcodec/aarch64/h264_qpel_daedalus.c b/libavcodec/aarch64/h264_qpel_daedalus.c
---- a/libavcodec/aarch64/h264_qpel_daedalus.c
-+++ b/libavcodec/aarch64/h264_qpel_daedalus.c
-@@ -38,7 +38,7 @@ static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-
- static void daedalus_ctx_init_once(void)
- {
--    g_dctx = daedalus_ctx_create_no_qpu();
-+    g_dctx = daedalus_ctx_create();
- }
-
- void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
@@ -1,73 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Mon, 25 May 2026 22:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264: revert ctx flip — daedalus-fourier PR
- #36 was a measurement artifact
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Reverts the daedalus_ctx_create_no_qpu() → daedalus_ctx_create() flip
-that landed in 0014-h264-ctx-qpu-capable.patch (marfrit-packages PR
-#104).  The flip was justified by daedalus-fourier PR #36 which
-reported a 4.30x QPU-over-CPU win on the 1080p H.264 hot-path sum.
-
-That number was a measurement artifact.  The bench tool's
-v3d_runner.read_spv() did a bare fopen() that resolved relative to
-cwd; when run from the source directory (as in PR #36), the SPVs at
-$builddir/v3d_*.spv were not found, every QPU dispatch returned -1
-fast, and the loop timed the failure path.  Daedalus-fourier PR #37
-fixes the SPV search + bench preflight; corrected numbers from hertz
-(Pi 5 V3D 7.1) show QPU is 13-77x SLOWER than CPU NEON at every
-H.264 hot-path kernel:
-
-  kernel             CPU ns/op  QPU ns/op  winner
-  IDCT 4x4 luma          10.75     217.63  CPU 20.24x
-  IDCT 8x8 luma          29.69     785.94  CPU 26.47x
-  Deblock luma_v         17.63     467.42  CPU 26.51x
-  Deblock luma_h         38.30     498.53  CPU 13.02x
-  qpel mc20 (8x8)        30.17    1300.44  CPU 43.10x
-  qpel mc02 (8x8)        17.69    1363.40  CPU 77.08x
-  qpel mc22 (8x8)        71.60    1948.37  CPU 27.21x
-
-  1080p sum: CPU 5.57 ms vs QPU 123.54 ms — QPU 22x slower.
-
-Until the daedalus QPU dispatch overhead is actually competitive (a
-multi-task effort tracked on the daedalus-fourier side), the
-libavcodec.so substitution must stay on daedalus_ctx_create_no_qpu()
-to avoid pessimizing every host process that loads it
-(firefox-fourier RDD, mpv-fourier, daedalus_v4l2_daemon).
-
-Both H.264 TUs (h264_idct_daedalus.c, h264_qpel_daedalus.c) are
-reverted; the change is a 2-line revert of patch 0014.
-
-Refs reauktion/daedalus-fourier!37 (the retraction PR).
----
- libavcodec/aarch64/h264_idct_daedalus.c | 2 +-
- libavcodec/aarch64/h264_qpel_daedalus.c | 2 +-
- 2 files changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c
-+++ b/libavcodec/aarch64/h264_idct_daedalus.c
-@@ -32,7 +32,7 @@ static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-
- static void daedalus_ctx_init_once(void)
- {
--    g_dctx = daedalus_ctx_create();
-+    g_dctx = daedalus_ctx_create_no_qpu();
- }
-
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
-diff --git a/libavcodec/aarch64/h264_qpel_daedalus.c b/libavcodec/aarch64/h264_qpel_daedalus.c
---- a/libavcodec/aarch64/h264_qpel_daedalus.c
-+++ b/libavcodec/aarch64/h264_qpel_daedalus.c
-@@ -38,7 +38,7 @@ static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-
- static void daedalus_ctx_init_once(void)
- {
--    g_dctx = daedalus_ctx_create();
-+    g_dctx = daedalus_ctx_create_no_qpu();
- }
-
- void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
@@ -24,13 +24,8 @@ _srcname=FFmpeg
 _version='8.1'
 _commit='b57fbbe50c9b2656fad86a1a7eeabfd2b2a50935'  # v4l2-request-n8.1 tip 2026-04-24
 pkgver=8.1.r123329.b57fbbe
-pkgrel=12  # pkgrel=12 — REVERT pkgrel=11 ctx flip; daedalus-fourier PR #36 4.30x headline was measurement artifact (PR #37 corrects: QPU 22x SLOWER than CPU)
+pkgrel=5
 epoch=2
-
-# daedalus-fourier pin.  209a421 = PR #2 merge (Phase 8c — public API
-# gains daedalus_recipe_dispatch_h264_qpel_mc20 + DAEDALUS_KERNEL_H264_QPEL_MC20).
-# Cycle 9 closes the libavcodec.so substitution arc started at cycle 6.
-_daedalus_fourier_commit='b9f9ff2a89c068aea54dcb52b543afddad28311e'  # PR #25 — public chroma DC Hadamard symbol
 pkgdesc='FFmpeg with V4L2 Request API hwaccel (Rockchip / Allwinner stateless decode)'
 arch=('aarch64')
 url='https://github.com/Kwiboo/FFmpeg'
@@ -39,7 +34,6 @@ depends=(
  alsa-lib
  bzip2
  fontconfig
-  vulkan-icd-loader
  fribidi
  gmp
  gnutls
@@ -65,13 +59,10 @@ depends=(
  zlib
 )
 makedepends=(
-  cmake
  git
  linux-api-headers
  mesa
  nasm
-  ninja
-  vulkan-headers
 )
 provides=(
  libavcodec.so
@@ -87,23 +78,9 @@ provides=(
 conflicts=(ffmpeg)
 replaces=(ffmpeg ffmpeg-v4l2-request-git)
 source=("git+https://github.com/Kwiboo/FFmpeg.git#commit=${_commit}"
-        "daedalus-fourier-${_daedalus_fourier_commit}.tar.gz::https://git.reauktion.de/marfrit/daedalus-fourier/archive/${_daedalus_fourier_commit}.tar.gz"
        '0001-libudev-bypass-fallback.patch'
-        '0002-nv15-to-p010-unpack.patch'
-        '0003-h264-idct4-daedalus-fourier.patch'
-        '0004-h264-idct8-daedalus-fourier.patch'
-        '0005-h264-deblock-luma-v-daedalus-fourier.patch'
-        '0006-h264-restore-low-delay.patch'
-        '0007-h264-qpel-mc20-daedalus-fourier.patch'
-        '0008-h264-deblock-luma-h-daedalus-fourier.patch'
-        '0009-h264-deblock-chroma-daedalus-fourier.patch'
-        '0010-h264-deblock-luma-intra-daedalus-fourier.patch'
-        '0011-h264-chroma-dc-hadamard-daedalus-fourier.patch'
-        '0012-h264-qpel-rest-daedalus-fourier.patch'
-        '0013-h264-deblock-chroma-intra-daedalus-fourier.patch'
-        '0014-h264-ctx-qpu-capable.patch'
-        '0015-h264-ctx-revert-to-no-qpu.patch')
-sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
+        '0002-nv15-to-p010-unpack.patch')
+sha256sums=('SKIP' 'SKIP' 'SKIP')

 pkgver() {
  cd "${_srcname}"
@@ -116,37 +93,9 @@ prepare() {
  cd "${_srcname}"
  patch -Np1 -i "${srcdir}/0001-libudev-bypass-fallback.patch"
  patch -Np1 -i "${srcdir}/0002-nv15-to-p010-unpack.patch"
-  patch -Np1 -i "${srcdir}/0003-h264-idct4-daedalus-fourier.patch"
-  patch -Np1 -i "${srcdir}/0004-h264-idct8-daedalus-fourier.patch"
-  patch -Np1 -i "${srcdir}/0005-h264-deblock-luma-v-daedalus-fourier.patch"
-  patch -Np1 -i "${srcdir}/0006-h264-restore-low-delay.patch"
-  patch -Np1 -i "${srcdir}/0007-h264-qpel-mc20-daedalus-fourier.patch"
-  patch -Np1 -i "${srcdir}/0008-h264-deblock-luma-h-daedalus-fourier.patch"
-  patch -Np1 -i "${srcdir}/0009-h264-deblock-chroma-daedalus-fourier.patch"
-  patch -Np1 -i "${srcdir}/0010-h264-deblock-luma-intra-daedalus-fourier.patch"
-  patch -Np1 -i "${srcdir}/0011-h264-chroma-dc-hadamard-daedalus-fourier.patch"
-  patch -Np1 -i "${srcdir}/0012-h264-qpel-rest-daedalus-fourier.patch"
-  patch -Np1 -i "${srcdir}/0013-h264-deblock-chroma-intra-daedalus-fourier.patch"
-  patch -Np1 -i "${srcdir}/0014-h264-ctx-qpu-capable.patch"
-  patch -Np1 -i "${srcdir}/0015-h264-ctx-revert-to-no-qpu.patch"
 }

 build() {
-  # --- daedalus-fourier: build static .a with PIC, install to a
-  # per-build prefix; libavcodec.so links it into the shared object so
-  # H264DSPContext.idct_add (and follow-up kernels) dispatch through
-  # the daedalus recipe layer instead of the in-tree NEON .S code. ---
-  local _fourier_prefix="${srcdir}/fourier-prefix"
-  mkdir -p "${_fourier_prefix}"
-  pushd "${srcdir}"/daedalus-fourier >/dev/null
-  cmake -B build -G Ninja \
-    -DCMAKE_BUILD_TYPE=Release \
-    -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-    -DCMAKE_INSTALL_PREFIX="${_fourier_prefix}"
-  cmake --build build --target daedalus_core
-  cmake --install build
-  popd >/dev/null
-
  cd "${_srcname}"

  # FFmpeg's configure resolves the compiler via `which` and bakes the
@@ -198,9 +147,6 @@ build() {
    --enable-libx265 \
    --enable-libwebp \
    \
-    --extra-cflags="-I${_fourier_prefix}/include" \
-    --extra-ldflags="-L${_fourier_prefix}/lib" \
-    --extra-libs="-ldaedalus_core -lvulkan -lpthread" \
    --host-cflags='-fPIC'

  make
@@ -18,30 +18,27 @@ This patch adds a sibling init path, `InitV4L2RequestDecoder`, that:
  * looks up the codec via two complementary mechanisms libavcodec
    uses for v4l2_request:
      - **named codec** (`h264_v4l2request`, `vp8_v4l2request`, etc.):
-        the legacy AVCodec-per-hwaccel registration.
-      - **generic codec + hw_configs walk**: the modern hwaccel
-        registration. Accepts EITHER AV_HWDEVICE_TYPE_DRM (legacy
-        ffmpeg-v4l2-request-fork output prior to FFmpeg 7.1) OR
-        AV_HWDEVICE_TYPE_V4L2REQUEST (FFmpeg 7.1+ dedicated enum,
-        value 13 on Kwiboo's no-AMF tree, 14 on upstream-AMF tree).
-        Mozilla's bundled libavutil headers may not have the V4L2REQUEST
-        enumerator, so the test is on the integer value via `(int)cast`.
+        the legacy AVCodec-per-hwaccel registration. ALARM, Debian,
+        and most distros building with --enable-v4l2-request expose
+        this (avcodec_find_decoder_by_name lookup).
+      - **generic codec + AV_HWDEVICE_TYPE_DRM** in `hw_configs`:
+        the modern hwaccel registration on some upstream-only ffmpeg
+        builds.
    Probes named-codec first (explicit, portable) and falls back to
-    walking the generic codec's `hw_configs` for either device type;
-  * creates an hwdevice context bound to `/dev/dri/renderD128`. Uses
-    integer 13 (V4L2REQUEST as defined by Kwiboo's v4l2-request-n7.1.3
-    tree, what our libavcodec61-fourier emits) cast to enum
-    AVHWDeviceType for the av_hwdevice_ctx_create call;
+    walking the generic codec's `hw_configs` for the DRM device type;
+  * creates an `AV_HWDEVICE_TYPE_DRM` hwdevice context bound to
+    `/dev/dri/renderD128` via the new `av_hwdevice_ctx_create` wrapper
+    (patch 2/4) and attaches it to the codec context;
  * reuses the existing `ChooseV4L2PixelFormat` get-format callback
    (already returns `AV_PIX_FMT_DRM_PRIME`) and the existing
    `apply_cropping = 0` constraint.

 `InitV4L2RequestDecoder` is invoked **before** `InitV4L2Decoder` in
 `InitHWDecoderIfAllowed`. On Rockchip mainline it succeeds via either
-mechanism. On Pi4 / Mediatek / vendor-MPP-stateful boards neither
-mechanism is registered for the codec, the function bails out, and the
-existing stateful `InitV4L2Decoder` runs as before. No regression of
-stateful boards.
+mechanism (ALARM uses the named codec). On Pi4 / Mediatek /
+vendor-MPP-stateful boards neither mechanism is registered for the
+codec, the function bails out, and the existing stateful
+`InitV4L2Decoder` runs as before. No regression of stateful boards.

 `mDRMDeviceContext` is unconditionally `av_buffer_unref`'d in
 `ProcessShutdown` (no-op when null). Gated behind
@@ -49,8 +46,9 @@ stateful boards.

 Bug 1969297.

---- a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h	2026-05-21 04:57:59.570946601 +0000
-+++ b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h	2026-05-21 04:57:59.876488776 +0000
+diff --git a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h
+--- a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h	2026-03-18 19:22:14.000000000 +0000
++++ b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h	2026-04-27 20:43:39.347992674 +0000
@@ -225,7 +225,12 @@
   bool IsLinuxHDR() const;
   MediaResult InitVAAPIDecoder();
@@ -75,8 +73,9 @@ Bug 1969297.
   // If video overlay is used we want to upload SW decoded frames to
   // DMABuf and present it as a external texture to rendering pipeline.
   bool mUploadSWDecodeToDMABuf = false;
---- a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp	2026-05-21 04:57:59.566685221 +0000
-+++ b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp	2026-05-21 04:58:00.136004159 +0000
+diff --git a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp
+--- a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp	2026-04-27 16:09:10.000000000 +0200
++++ b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp	2026-04-29 00:10:00.098884335 +0200
@@ -403,6 +403,129 @@
   return NS_OK;
 }
@@ -91,7 +90,7 @@ Bug 1969297.
 +  }
 +  const char* drmDevice = "/dev/dri/renderD128";
 +  if (mLib->av_hwdevice_ctx_create(&mDRMDeviceContext,
-+                                   (enum AVHWDeviceType)13, drmDevice,
++                                   AV_HWDEVICE_TYPE_DRM, drmDevice,
 +                                   nullptr, 0) < 0) {
 +    FFMPEG_LOG("  av_hwdevice_ctx_create(DRM, %s) failed", drmDevice);
 +    return false;
@@ -144,7 +143,7 @@ Bug 1969297.
 +      for (int i = 0;; i++) {
 +        const AVCodecHWConfig* cfg = mLib->avcodec_get_hw_config(generic, i);
 +        if (!cfg) break;
-+        if (cfg->device_type == AV_HWDEVICE_TYPE_DRM || (int)cfg->device_type == 13 || (int)cfg->device_type == 14) {
++        if (cfg->device_type == AV_HWDEVICE_TYPE_DRM) {
 +          codec = generic;
 +          FFMPEG_LOG("  using generic codec %s with DRM hwaccel", codec->name);
 +          break;
@@ -21,32 +21,27 @@
 # Alternative: boltzmann via his subagent + marfrit-publish.

 pkgname=libva-v4l2-request-fourier
-epoch=1
 _upstreampkg=libva-v4l2-request

-# Pin the fork tip. c454618 = PR #16 merge "picture, request_pool:
-# transparent OUTPUT-pool resize on bitstream overrun (#15)" —
-# follow-up root-cause fix to #13/#14. On a mid-stream bitstream-
-# budget overrun (typical cause: SPS-driven resolution upshift in an
-# adaptive-bitrate stream), codec_store_buffer now snapshots the in-
-# flight surface's accumulated bytes, releases its OUTPUT pool slot,
-# calls request_pool_resize (STREAMOFF → REQBUFS(0) → S_FMT with
-# 2×sizeimage hint, capped at 1 GiB, page-aligned → CREATE_BUFS →
-# mmap → media_request_alloc → STREAMON), re-acquires a slot, re-
-# mirrors the surface's source_{data,size,request_fd}, restores the
-# bytes, and continues. The frame survives instead of being dropped
-# back to libavcodec for surface recreation. CAPTURE side untouched
-# (per-queue V4L2 streaming independence).
+# Pin the fork tip. de27e95 = "v4l2: log error_idx + failing ctrl id
+# on S_EXT_CTRLS failure" — Phase 8.13 diagnostic that surfaced the
+# real root cause of the libva→daedalus_v4l2 request-completion
+# timeout (turned out the EINVAL libva was logging was a harmless
+# H264/HEVC probe; actual VP9 stateless control SET worked all along).
 #
-# Prior pin (2860d75) = PR #14 merge — codec_store_buffer bounds-
-# check floor (#13).
-_commit=c454618ae11addce2e17b560f4deeacbed067d98
+# Prior pin (7ac934e) was iter38b — fresnel-fourier multi-device probe
+# + MAX_PROFILES bounds-check fix. de27e95 adds the daedalus_v4l2
+# probe slot (b5b3acf), the meson option gate (2146341), and the
+# S_EXT_CTRLS diagnostic (de27e95 itself). Backward-compatible on
+# rkvdec / hantro / cedrus / rpi-hevc-dec hosts — daedalus probe is
+# off by default unless the kernel module is present.
+_commit=de27e95571b67ef34619c23a12db4698f9b3454e

 # Project version from meson.build (1.0.0) + commit count + short sha,
 # matching the ffmpeg-v4l2-request-fourier convention. Recomputed at
 # build time by pkgver() below; the static value here is a placeholder
 # so AUR-style consumers see something coherent before src/ exists.
-pkgver=1.0.0.r390.c454618
+pkgver=1.0.0.r376.de27e95
 pkgrel=1
 pkgdesc="VA-API backend for V4L2 stateless decoders (multiplanar fork — fourier umbrella)"
 arch=('aarch64')
@@ -29,7 +29,7 @@

 pkgbase=linux-pinetab2-danctnix-besser
 pkgver=7.0.danctnix1
-pkgrel=5
+pkgrel=6
 pkgdesc='PineTab2 (BESser bes2600 driver patchset, kernel-agent managed)'
 _srcname=linux-pinetab2
 _srctag=v${pkgver%.*}-${pkgver##*.}
@@ -68,7 +68,7 @@ b2sums=('3d9795083c8938f80f480de0d10bfd9c525640e59d5c7f22983de3f12ee42c84c31be90
        'SKIP'
        '71fe98221e802b315e54b4b10d3e8c8f376695a36bae3541d876e5776a37f3fa33c8f8dfa6e51fcbd6f5396add02e5166634165f2351836a0ea0453c172fe56c'
        'SKIP'
-        '50397711a6a3ba522283685a9e7397aeed6663f353f7cba214d4bb88bc98516065b2fca9a36ce13c52644617879f69f39c5305e86db5d9fb25c4dae5434eb9c4'
+        'eb179c03f35a4dbaec2e40036f0033ef04985bb6b14ab22419d68e5caaa5874f2ad14e158f7c5b05added97f60fecde8fb8b7f2a6ced33e031e37352fe776ca6'
        '656a998ab40cb85ee4c00f087b071a91632a6c091da2c84b0f74236b51d2dea6e9db6886625f80ad81dc249d8494ec47cd79d6dd9ea4f5e44f3cde857f861e10')

 export KBUILD_BUILD_HOST=archlinux
@@ -4,34 +4,174 @@ baseline:
  upstream_compat: linux-7.0
  url: https://codeberg.org/DanctNIX/linux-pinetab2
 cumulative:
-  b2sum: 50397711a6a3ba522283685a9e7397aeed6663f353f7cba214d4bb88bc98516065b2fca9a36ce13c52644617879f69f39c5305e86db5d9fb25c4dae5434eb9c4
+  b2sum: eb179c03f35a4dbaec2e40036f0033ef04985bb6b14ab22419d68e5caaa5874f2ad14e158f7c5b05added97f60fecde8fb8b7f2a6ced33e031e37352fe776ca6
  path: cumulative.patch
-  size: 162716
-generated_at: '2026-05-18T17:16:06.455474+00:00'
+  size: 279554
+generated_at: '2026-05-19T13:05:46.476359+00:00'
 host: ohm
 ka_promote_version: 1
 manifest:
  path: fleet/ohm.yaml
-  sha256: da59ac2c965e5ad9c5004a115b10a37abf47ed3ecc8b7f5ab426470d2ee7b442
+  sha256: 9ac04ddd3170418b7b2d2cf7b31ac225a31ed19be4f03e8477bf28b585bae257
 resolved_patches:
 - apply_order: 1
  from_series: true
-  include: driver/bes2600/cumulative-c5x-danctnix/0001-bes2600-besser-cumulative-series.patch
-  sha256: e477a170567487fef84fe13be5b0a1f0498247ff1f201000d0085a2e49ff9026
-  size: 148149
+  include: driver/bes2600/factory-series/0001-bes2600-use-request_firmware-for-factory.txt-read.patch
+  sha256: a1bc2d13b258709fa37c9ff428dfdc0659464b436470fa2ec69b07edf7592f6f
+  size: 5456
 - apply_order: 2
  from_series: true
-  include: driver/bes2600/scan-filter-5ghz-danctnix/0001-bes2600-filter-5ghz-scan-and-allow-single-channel.patch
-  sha256: 31e67569e00daead0784214aced1e077d3270cf1407baa0b330d474e17ec3931
-  size: 7735
+  include: driver/bes2600/factory-series/0002-bes2600-default-STANDARD_FACTORY_EFUSE_FLAG-off-for-PineTab2.patch
+  sha256: 577d7024ce0b342c4381365872fc29e75a93427ad61223907fead8b829b5a86c
+  size: 3499
 - apply_order: 3
  from_series: true
-  include: arch/arm64/xor-neon-ffixed-x18-scs-build-fix-danctnix/0001-arm64-xor-neon-ffixed-x18-build-fix.patch
-  sha256: a49c50f0ebffc499970c24908b832c3e61c96ed87de35b3a82178eff587f94f1
-  size: 1574
+  include: driver/bes2600/factory-thread-dev/0001-bes2600-thread-struct-device-through-factory-request_firmware.patch
+  sha256: e3fac725e6addc11147341836600c2c5cd0116abba960f34ba50bb8094581c75
+  size: 4406
 - apply_order: 4
+  from_series: true
+  include: driver/bes2600/pm-gate-on-handshake/0001-bes2600-gate-device-LP-mode-entry-on-successful-handshake.patch
+  sha256: 9842c0dd66f59fe28898041ba5a816be56965b0665f202410cd461c3e6565474
+  size: 3914
+- apply_order: 5
+  from_series: true
+  include: driver/bes2600/remove-chardev-user-interface/0001-bes2600-remove-userspace-dev-bes2600-character-device-interface.patch
+  sha256: c67d340ae5923aada613ea9c5133e3efa3aeb7986749f4bf3619d1752a1b61fb
+  size: 22445
+- apply_order: 6
+  from_series: true
+  include: driver/bes2600/enable-testmode/0001-bes2600-enable-CONFIG_BES2600_TESTMODE-by-default-fix-bitrot.patch
+  sha256: 5dee74e8753d332fd380882994ea43aa907d1ff97466b0c48aedf38d4076e446
+  size: 6152
+- apply_order: 7
+  from_series: true
+  include: driver/bes2600/tx-sdio-dma-oob-danctnix/0001-bes2600-bounce-SDIO-TX-buffers-to-avoid-DMA-OOB-read.patch
+  sha256: 0dce2fe35450b8376c2d2a7c007119f28c888c1c30b489a67841039caedeebfc
+  size: 4544
+- apply_order: 8
+  from_series: true
+  include: driver/bes2600/factory-drop-kernel-write-danctnix/0001-bes2600-drop-kernel_write-persistence-from-factory-cali-save.patch
+  sha256: a7995b38e210af16b73d284a58ab39b8aecac36ff4a671af3d894b1983f961b3
+  size: 5704
+- apply_order: 9
+  from_series: true
+  include: driver/bes2600/drop-dpd-file-paths-danctnix/0001-bes2600-drop-BES2600_WRITE_DPD_TO_FILE-kernel-file-paths.patch
+  sha256: 0cd8780c245c97c65e4845e42d712c6256a0449658641aea18e4c7d400f63e41
+  size: 9661
+- apply_order: 10
+  from_series: true
+  include: driver/bes2600/drop-orphan-file-io-danctnix/0001-bes2600-drop-orphan-DATA_DUMP_OBSERVE-and-access_file-IO.patch
+  sha256: fd8c297223e6a985c2898f919ae1ab27eb56ab44f09f44d84d75eb35a187527b
+  size: 5327
+- apply_order: 11
+  from_series: true
+  include: driver/bes2600/pm-timeout-silence-danctnix/0001-bes2600-demote-wait-pm-ind-timeout-from-bes_err-to-bes_devel.patch
+  sha256: 3a4fd3255facbcef0419e0e0332cb980316529aa5c225b35bcfd244a42736667
+  size: 2332
+- apply_order: 12
+  from_series: true
+  include: driver/bes2600/scan-defer-on-reject-danctnix/0001-bes2600-defer-scan-and-soften-WARN-on-firmware-reject.patch
+  sha256: 55e16c176bc147c371a20f57b3a57da38c719d3b42417e88f9de243e10102d35
+  size: 8393
+- apply_order: 13
+  from_series: true
+  include: driver/bes2600/scan-defer-backoff-tune-danctnix/0001-bes2600-widen-scan-defer-backoff-30s-and-decay-on-quiet.patch
+  sha256: 70a5b25baaf41c8090701b069c30cbad378883d828bdd06e4eb560a35bc077f1
+  size: 4924
+- apply_order: 14
+  from_series: true
+  include: driver/bes2600/lmac-recover-via-mmc-hw-reset-danctnix/0001-bes2600-recover-wedged-firmware-via-mmc_hw_reset-on-link-break.patch
+  sha256: 3decf33c9684b3aba64004d5ad97ae3d54e1d6dc176d0b0ae539036c65e6dc6c
+  size: 10604
+- apply_order: 15
+  from_series: true
+  include: driver/bes2600/lmac-recover-via-mmc-hw-reset-danctnix/0002-bes2600-handle-multi-function-SDIO-cards-in-mmc_hw_reset-bus_reset.patch
+  sha256: a1acfcc401afc699a9c3676b6df2ec0f092e78826a32616268f90b509d538e33
+  size: 3321
+- apply_order: 16
+  from_series: true
+  include: driver/bes2600/pm-state-resync-danctnix/0001-bes2600-gate-PM-indication-completion-on-pending-request-and-track-state.patch
+  sha256: 049cf3ff9c01fdd10ff73bd18497e14ef0cd8fd1a65486ba86fbc6c1935a5f8e
+  size: 10269
+- apply_order: 17
+  from_series: true
+  include: driver/bes2600/pm-wake-consume-state-danctnix/0001-bes2600-short-circuit-wake-handshake-when-chip-confirmed-ACTIVE.patch
+  sha256: c9d19a73816f4c82b418dcd18008176bbb0c49fd4138be53cad45ae142224112
+  size: 8100
+- apply_order: 18
+  from_series: true
+  include: driver/bes2600/pm-detect-firmware-unsupported-danctnix/0001-bes2600-self-detect-firmware-does-not-honor-PSM-skip-cycle.patch
+  sha256: 196dc9d51ffea268718a290d434b6237fb60119f10c2b050a58724c8a775c7a8
+  size: 9041
+- apply_order: 19
+  from_series: true
+  include: driver/bes2600/decrypt-storm-fast-recover-danctnix/0001-bes2600-pre-empt-AP-deauth-6-mac80211-reassoc-on-decrypt-fail-storm.patch
+  sha256: b57ed316005f402c95ccae8ab24ac761bdf34162d73f108f5790af8f8ad2d1fe
+  size: 9249
+- apply_order: 20
+  from_series: true
+  include: driver/bes2600/connection-loss-fast-recover-danctnix/0001-bes2600-bus_reset-on-connection-loss-storm-to-dodge-assoc-comeback-blackhole.patch
+  sha256: cd1eaff97c3f08c58e7b1588e19a12200e8bb2a1f39afe554284f1d818610a67
+  size: 12184
+- apply_order: 21
+  from_series: true
+  include: driver/bes2600/cw1200-fix-backports-danctnix/0001-bes2600-replace-atomic_add-with-atomic_inc-cw1200-backport.patch
+  sha256: 3876c9e512f556c7f2e8d4cfaba1d7df2945ee48af8edfab5f8d09d9de9adf23
+  size: 3080
+- apply_order: 22
+  from_series: true
+  include: driver/bes2600/cw1200-fix-backports-danctnix/0002-bes2600-fix-missing-destroy_workqueue-on-error-in-init_common.patch
+  sha256: 2b82ecb127748349780404479205b952337c244e715278e6d40471c6ecad7602
+  size: 2230
+- apply_order: 23
+  from_series: true
+  include: driver/bes2600/cw1200-fix-backports-danctnix/0003-bes2600-fix-concurrency-UAF-in-bes2600_hw_scan-and-sched_scan.patch
+  sha256: 4c1850ad003ddcac543d3d61edd15c18ccd0cc601367cf4c6dd31e1fbb39ab16
+  size: 4476
+- apply_order: 24
+  from_series: true
+  include: driver/bes2600/sdio-rx-no-relay-danctnix/0001-bes2600-drop-sdio_rx_work-relay-IRQ-bh-direct-no-relay-architecture.patch
+  sha256: f1182150c5893f2497f942900b34c9c4aeb8d5901d9786ae2753dcce38ed6c78
+  size: 19313
+- apply_order: 25
+  from_series: true
+  include: driver/bes2600/license-spdx-restore-attribution-danctnix/0001-bes2600-Patch-G-restore-SPDX-identifiers-ST-Ericsson-attribution.patch
+  sha256: 91dadab0b58f8b8ad2dca80fd04796d478ecb83ce94a1e4b6e97ef8634d97ef1
+  size: 41521
+- apply_order: 26
+  from_series: true
+  include: driver/bes2600/ba-lock-atomic-danctnix/0001-bes2600-Patch-D-atomicize-ba_lock-counters-drop-the-spinlock.patch
+  sha256: a5d4ed2bf545458a756e65670c7eed31997bd0be9262344a10313bee31ea4963
+  size: 11987
+- apply_order: 27
+  from_series: true
+  include: driver/bes2600/ps-state-lock-skip-pm-disabled-danctnix/0001-bes2600-Patch-E-skip-ps_state_lock-when-PSM-known-disabled.patch
+  sha256: 18040a563b37cc95c558703f01bfbf6b7fa23a52f2f4f0f8f1254ad4fa0fe0d6
+  size: 3396
+- apply_order: 28
+  from_series: true
+  include: driver/bes2600/rx-list-batch-delivery-danctnix/0001-bes2600-Patch-C2-replace-ieee80211_rx_irqsafe-with-ieee80211_rx_ni.patch
+  sha256: ffeffd085a9d052c126a717b845d50120ea302e76c12e53c0c3c891291cababf
+  size: 8377
+- apply_order: 29
+  from_series: true
+  include: driver/bes2600/bh-c-fossil-cleanup-danctnix/0001-bes2600-Patch-H-bh.c-hygiene-cleanup-drop-fossil-blocks-dead-stubs.patch
+  sha256: 8fb0c799e3a8ee5ad7bfb647fceaf370c6a1a5f24d8621776fd07bf18a976f81
+  size: 21082
+- apply_order: 30
+  from_series: true
+  include: driver/bes2600/scan-filter-5ghz-danctnix/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch
+  sha256: 31e67569e00daead0784214aced1e077d3270cf1407baa0b330d474e17ec3931
+  size: 7735
+- apply_order: 31
+  from_series: true
+  include: arch/arm64/scs-arm-neon-build-fix/0001-arm64-xor-neon-ffixed-x18-build-fix.patch
+  sha256: 105e32edc54743d8107c4dcd846833ae97d2df5f918aebc9fe3e67d6f23249cc
+  size: 1562
+- apply_order: 32
  from_series: true
  include: driver/bes2600/queue-pending-record-lock-bh-danctnix/0001-bes2600-take-pending-record-lock-with-bh.patch
-  sha256: 089862e5f6da5783ed0db979144e4fa07cff7f743809a0bebd715c75a3bb8eb5
-  size: 5258
+  sha256: e0894371c43f750590e1704ae3c77b27b6910548afa4a5b61ebc4d9919580ca2
+  size: 5270
 schema_version: 1
@@ -1,57 +0,0 @@
-From: claude-noether (on behalf of mfritsche)
-Date: 2026-05-19
-Subject: panvk: expose VK_KHR/EXT_robustness2 + nullDescriptor on Bifrost (PAN_ARCH 6/7)
-
-Without this, Mesa's Zink driver refuses to use PanVk-Bifrost as its Vulkan
-backend, falling back silently to llvmpipe (software rasterizer) for all
-GL-via-Zink on Bifrost SBCs. That defeats the entire purpose of having a
-Vulkan driver on Bifrost — GL acceleration via Zink is the most natural
-near-term consumer.
-
-panvk_vX_nir_lower_descriptors.c:1309 and panvk_vX_shader.c:1355 already
-plumb dev->vk.enabled_features.nullDescriptor arch-agnostically — the gate
-at panvk_vX_physical_device.c was set conservatively when Bifrost was
-unmaintained, not because of hardware incapability.
-
-iter1–7 of the panvk-bifrost campaign proved fundamental driver functions
-on Mali-G52 r1 MC1 (PAN_ARCH=7). This patch is the iter8 follow-up.
-
-robustBufferAccess2 and robustImageAccess2 are NOT flipped — they're
-independent rb2 features Zink doesn't require, gated differently
-(robustBufferAccess2 = PAN_ARCH >= 11, robustImageAccess2 = false), and
-out of scope for iter8.
-
---
- src/panfrost/vulkan/panvk_vX_physical_device.c | 6 +++---
- 1 file changed, 3 insertions(+), 3 deletions(-)
-
-diff --git a/src/panfrost/vulkan/panvk_vX_physical_device.c b/src/panfrost/vulkan/panvk_vX_physical_device.c
---- a/src/panfrost/vulkan/panvk_vX_physical_device.c
-+++ b/src/panfrost/vulkan/panvk_vX_physical_device.c
-@@ -91,7 +91,7 @@ get_device_extensions(const struct panvk_physical_device *device,
-       .KHR_pipeline_binary = true,
-       .KHR_pipeline_executable_properties = true,
-       .KHR_pipeline_library = true,
--      .KHR_robustness2 = PAN_ARCH >= 10,
-+      .KHR_robustness2 = true,
-       .KHR_sampler_mirror_clamp_to_edge = true,
-       .KHR_sampler_ycbcr_conversion = true,
-       .KHR_separate_depth_stencil_layouts = true,
-@@ -168,7 +168,7 @@ get_device_extensions(const struct panvk_physical_device *device,
-       .EXT_queue_family_foreign = true,
-       .EXT_robustness = pan_arch(device->kmod.dev->props.gpu_id) >= 9,
-       .EXT_image_robustness = true,
--      .EXT_robustness2 = PAN_ARCH >= 10,
-+      .EXT_robustness2 = true,
-       .EXT_sampler_filter_minmax = PAN_ARCH >= 10,
-       .EXT_scalar_block_layout = true,
-       .EXT_separate_stencil_usage = true,
-@@ -493,7 +493,7 @@ get_device_features(const struct panvk_physical_device *device,
-       /* VK_KHR_robustness2 */
-       .robustBufferAccess2 = PAN_ARCH >= 11,
-       .robustImageAccess2 = false,
--      .nullDescriptor = PAN_ARCH >= 10,
-+      .nullDescriptor = true,
-
-       /* VK_KHR_shader_clock */
-       .shaderSubgroupClock = device->kmod.dev->props.gpu_can_query_timestamp,
@@ -1,47 +0,0 @@
-From: claude-noether (on behalf of mfritsche)
-Date: 2026-05-20
-Subject: panvk: expose Vulkan 1.1 + 1.2 on Bifrost (PAN_ARCH 6/7)
-
-ANGLE (Chromium's GL stack) requires apiVersion >= 1.1 to initialize. Without
-this, Brave / Chromium's GPU process fails at GL info collection:
-
-  vk_renderer.cpp:2659 (initialize): ANGLE Requires a minimum Vulkan device
-                                     version of 1.1
-  Display::initialize error 0: Internal Vulkan error (-9): The requested
-                               version of Vulkan is not supported by the driver
-
-Stack-up with iter8's robustness2 patch enables ANGLE → PanVk-Bifrost →
-Skia (via --enable-features=Vulkan) on Bifrost SBCs.
-
-PanVk-Bifrost already supports the bulk of 1.1-promoted features as extensions
-(multiview, maintenance1-3, descriptor update template, 16-bit storage,
-sampler ycbcr, variable pointers, etc. — all
-visible in iter0 vulkaninfo). The version bump primarily bundles them.
-
-Risk: Vulkan 1.1 has features beyond what iter1–7 exercised (protected memory,
-full subgroup ops). Specific app failures will be characterizable.
-
-1.2 is also flipped — Brave's Vulkan path may want descriptor indexing,
-buffer device address, etc. (all listed in iter0 vulkaninfo as supported
-extensions, just gated as 1.0-with-extensions, not 1.2-core).
-
---
- src/panfrost/vulkan/panvk_vX_physical_device.c | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/src/panfrost/vulkan/panvk_vX_physical_device.c b/src/panfrost/vulkan/panvk_vX_physical_device.c
---- a/src/panfrost/vulkan/panvk_vX_physical_device.c
-+++ b/src/panfrost/vulkan/panvk_vX_physical_device.c
-@@ -38,8 +38,8 @@ get_device_extensions(const struct panvk_physical_device *device,
-                       struct vk_device_extension_table *ext)
- {
-    *ext = (struct vk_device_extension_table){
--      .KHR_8bit_storage = true,
--      .KHR_16bit_storage = true,
--      bool has_vk1_1 = PAN_ARCH >= 10;
--      bool has_vk1_2 = PAN_ARCH >= 10;
-+      .KHR_8bit_storage = true,
-+      .KHR_16bit_storage = true,
-+      bool has_vk1_1 = true;
-+      bool has_vk1_2 = true;
-       *ext = (struct vk_device_extension_table){
@@ -1,328 +0,0 @@
---- a/src/panfrost/vulkan/panvk_shader.h	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/panvk_shader.h	2026-05-20 18:52:53.312698258 +0200
-@@ -150,6 +150,10 @@
-    struct {
- #if PAN_ARCH < 9
-       int32_t raw_vertex_offset;
-+      uint32_t num_vertices;       /* iter13: XFB needs per-draw vertex count */
-+      /* aligned_u64 attribute below inserts the 4-byte alignment gap
-+       * after num_vertices automatically — no explicit pad needed. */
-+      aligned_u64 xfb_address[4];  /* iter13: 4 transform feedback buffer base addresses */
- #endif
-       int32_t first_vertex;
-       int32_t base_instance;
---- a/src/panfrost/vulkan/panvk_vX_physical_device.c	2026-05-20 19:09:29.711145446 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_physical_device.c	2026-05-20 18:52:54.832720445 +0200
-@@ -169,6 +169,7 @@
-       .EXT_provoking_vertex = true,
-       .EXT_queue_family_foreign = true,
-       .EXT_robustness2 = true,
-+      .EXT_transform_feedback = PAN_ARCH < 9,   /* iter13: JM-class only for now */
-       .EXT_sampler_filter_minmax = PAN_ARCH >= 10,
-       .EXT_scalar_block_layout = true,
-       .EXT_separate_stencil_usage = true,
-@@ -495,6 +496,10 @@
-       .robustImageAccess2 = false,
-       .nullDescriptor = true,
- 
-+      /* VK_EXT_transform_feedback (iter13) */
-+      .transformFeedback = PAN_ARCH < 9,
-+      .geometryStreams = false,
-+
-       /* VK_KHR_shader_clock */
-       .shaderSubgroupClock = device->kmod.dev->props.gpu_can_query_timestamp,
-       .shaderDeviceClock = device->kmod.dev->props.timestamp_device_coherent,
-@@ -1020,6 +1025,18 @@
-       .robustStorageBufferAccessSizeAlignment = 1,
-       .robustUniformBufferAccessSizeAlignment = 1,
- 
-+      /* VK_EXT_transform_feedback (iter13) */
-+      .maxTransformFeedbackStreams = 1,
-+      .maxTransformFeedbackBuffers = 4,
-+      .maxTransformFeedbackBufferSize = UINT32_MAX,
-+      .maxTransformFeedbackStreamDataSize = 512,
-+      .maxTransformFeedbackBufferDataSize = 512,
-+      .maxTransformFeedbackBufferDataStride = 2048,
-+      .transformFeedbackQueries = false,
-+      .transformFeedbackStreamsLinesTriangles = false,
-+      .transformFeedbackRasterizationStreamSelect = false,
-+      .transformFeedbackDraw = false,
-+
-       /* VK_EXT_shader_object */
-       /* We do not currently support VK_EXT_shader_object but this is used
-        * internally by vk_shader
--- a/src/panfrost/vulkan/panvk_vX_shader.c	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_shader.c	2026-05-20 18:52:56.556745611 +0200
-@@ -21,6 +21,7 @@
- #include "panvk_physical_device.h"
- #include "panvk_sampler.h"
- #include "panvk_shader.h"
-+#include "pan_nir.h"   /* iter13: pan_nir_lower_xfb */
- 
- #include "spirv/nir_spirv.h"
- #include "util/memstream.h"
-@@ -100,6 +101,20 @@
-    case nir_intrinsic_load_raw_vertex_offset_pan:
-       val = load_sysval(b, graphics, bit_size, vs.raw_vertex_offset);
-       break;
-+   case nir_intrinsic_load_num_vertices:    /* iter13: XFB index calc */
-+      val = load_sysval(b, graphics, bit_size, vs.num_vertices);
-+      break;
-+   case nir_intrinsic_load_xfb_address: {   /* iter13: XFB buffer N base address */
-+      unsigned idx = nir_intrinsic_base(intr);
-+      switch (idx) {
-+      case 0: val = load_sysval(b, graphics, bit_size, vs.xfb_address[0]); break;
-+      case 1: val = load_sysval(b, graphics, bit_size, vs.xfb_address[1]); break;
-+      case 2: val = load_sysval(b, graphics, bit_size, vs.xfb_address[2]); break;
-+      case 3: val = load_sysval(b, graphics, bit_size, vs.xfb_address[3]); break;
-+      default: return false;
-+      }
-+      break;
-+   }
-    case nir_intrinsic_load_layer_id:
-       assert(b->shader->info.stage == MESA_SHADER_FRAGMENT);
-       val = load_sysval(b, graphics, bit_size, layer_id);
-@@ -457,6 +472,7 @@
-             core_max_id);
- 
-    pan_preprocess_nir(nir, pdev->kmod.dev->props.gpu_id);
-+
- }
- 
- static void
-@@ -870,6 +886,18 @@
-             nir_var_shader_in | nir_var_shader_out, UINT32_MAX);
-    NIR_PASS(_, nir, nir_lower_io, nir_var_shader_in | nir_var_shader_out,
-             glsl_type_size, nir_lower_io_use_interpolated_input_intrinsics);
-+
-+#if PAN_ARCH < 9
-+   /* iter13: VK_EXT_transform_feedback — runs AFTER nir_lower_io so that
-+    * shader outputs are now store_output intrinsics that pan_nir_lower_xfb
-+    * can rewrite to nir_store_global+nir_load_xfb_address. */
-+   if (nir->info.stage == MESA_SHADER_VERTEX &&
-+       nir->info.has_transform_feedback_varyings) {
-+      NIR_PASS(_, nir, nir_opt_constant_folding);
-+      NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
-+      NIR_PASS(_, nir, pan_nir_lower_xfb);
-+   }
-+#endif
- }
- 
- static VkResult
-@@ -1288,6 +1316,9 @@
-       .view_mask = (state && state->rp) ? state->rp->view_mask : 0,
-       .robust2_modes = robust2_modes,
-       .robust_descriptors = dev->vk.enabled_features.nullDescriptor,
-+      /* iter13: XFB shaders must disable IDVS (matches Panfrost-Gallium). */
-+      .no_idvs = (info->stage == MESA_SHADER_VERTEX) &&
-+                 info->nir->info.has_transform_feedback_varyings,
-    };
- 
-    switch (info->stage) {
--- a/src/panfrost/vulkan/panvk_cmd_draw.h	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/panvk_cmd_draw.h	2026-05-20 18:52:57.748763011 +0200
-@@ -135,6 +135,19 @@
-    struct panvk_graphics_sysvals sysvals;
- 
- #if PAN_ARCH < 9
-+   /* iter13: VK_EXT_transform_feedback state (JM-class only for now). */
-+   struct {
-+      bool active;
-+      uint32_t buffer_count;
-+      struct {
-+         uint64_t addr;
-+         uint64_t offset;
-+         uint64_t size;
-+      } buffers[4];
-+   } xfb;
-+#endif
-+
-+#if PAN_ARCH < 9
-    struct panvk_shader_link link;
- #endif
- 
--- a/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-05-20 19:10:23.031919662 +0200
-@@ -10,6 +10,7 @@
- #include "panvk_entrypoints.h"
- 
- #include "pan_desc.h"
-+#include "pan_compiler.h"   /* PAN_SHADER_OOB_ADDRESS */
- #include "pan_util.h"
- 
- static void
-@@ -722,6 +723,35 @@
-    set_gfx_sysval(cmdbuf, dirty_sysvals, vs.raw_vertex_offset,
-                   info->vertex.raw_offset);
-    set_gfx_sysval(cmdbuf, dirty_sysvals, layer_id, info->layer_id);
-+
-+   /* iter13: VK_EXT_transform_feedback sysvals — always set (per draw),
-+    * reflect bound XFB state. set_gfx_sysval is a no-op if value unchanged. */
-+   set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, info->vertex.count);
-+   {
-+      const struct panvk_cmd_graphics_state *_gfx = &cmdbuf->state.gfx;
-+      /* iter13: default each XFB buffer address to PAN_SHADER_OOB_ADDRESS
-+       * (= 1<<63). This is the Panfrost-Gallium memory-sink idiom — the
-+       * Bifrost MMU silently discards stores to this address, so a pipeline
-+       * with XFB outputs used in a non-XFB draw (or in an XFB draw with
-+       * fewer bound buffers than the shader declares) is safe instead of
-+       * faulting. See gallium/drivers/panfrost/pan_cmdstream.c PAN_SYSVAL_XFB. */
-+      uint64_t _xa0 = PAN_SHADER_OOB_ADDRESS, _xa1 = PAN_SHADER_OOB_ADDRESS,
-+               _xa2 = PAN_SHADER_OOB_ADDRESS, _xa3 = PAN_SHADER_OOB_ADDRESS;
-+      if (_gfx->xfb.active) {
-+         if (_gfx->xfb.buffer_count > 0 && _gfx->xfb.buffers[0].addr)
-+            _xa0 = _gfx->xfb.buffers[0].addr + _gfx->xfb.buffers[0].offset;
-+         if (_gfx->xfb.buffer_count > 1 && _gfx->xfb.buffers[1].addr)
-+            _xa1 = _gfx->xfb.buffers[1].addr + _gfx->xfb.buffers[1].offset;
-+         if (_gfx->xfb.buffer_count > 2 && _gfx->xfb.buffers[2].addr)
-+            _xa2 = _gfx->xfb.buffers[2].addr + _gfx->xfb.buffers[2].offset;
-+         if (_gfx->xfb.buffer_count > 3 && _gfx->xfb.buffers[3].addr)
-+            _xa3 = _gfx->xfb.buffers[3].addr + _gfx->xfb.buffers[3].offset;
-+      }
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[0], _xa0);
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[1], _xa1);
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[2], _xa2);
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[3], _xa3);
-+   }
- #endif
- 
-    if (dyn_gfx_state_dirty(cmdbuf, CB_BLEND_CONSTANTS)) {
--- a/src/panfrost/vulkan/meson.build	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/meson.build	2026-05-20 18:53:04.484861338 +0200
-@@ -73,6 +73,7 @@
- jm_inc_dir = ['jm']
- jm_files = [
-   'jm/panvk_vX_bind_queue.c',
-+  'jm/panvk_vX_cmd_xfb.c',   # iter13
-   'jm/panvk_vX_cmd_buffer.c',
-   'jm/panvk_vX_cmd_dispatch.c',
-   'jm/panvk_vX_cmd_draw.c',
--- a/src/panfrost/vulkan/jm/panvk_vX_cmd_buffer.c	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/jm/panvk_vX_cmd_buffer.c	2026-05-20 19:10:26.163965149 +0200
-@@ -473,5 +473,12 @@
- 
-    vk_command_buffer_begin(&cmdbuf->vk, pBeginInfo);
- 
-+#if PAN_ARCH < 9
-+   /* iter13: clear XFB state on Begin so a reused command buffer does not
-+    * inherit stale xfb.buffer_count / xfb.active / xfb.buffers[] from a
-+    * prior recording. */
-+   memset(&cmdbuf->state.gfx.xfb, 0, sizeof(cmdbuf->state.gfx.xfb));
-+#endif
-+
-    return VK_SUCCESS;
- }
--- a/src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c	2026-05-18 12:50:53.067999996 +0200
-+++ b/src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c	2026-05-20 19:10:27.175979847 +0200
-@@ -0,0 +1,111 @@
-+/*
-+ * Copyright © 2026 mfritsche / claude-noether
-+ * SPDX-License-Identifier: MIT
-+ *
-+ * iter13: VK_EXT_transform_feedback command handlers for the JM
-+ * architecture path (Bifrost v6/v7; advertised only when PAN_ARCH < 9).
-+ *
-+ * The runtime contract:
-+ *   - vkCmdBindTransformFeedbackBuffersEXT: stash (gpu_addr, offset, size)
-+ *     for each slot into cmdbuf->state.gfx.xfb.buffers[].
-+ *   - vkCmdBeginTransformFeedbackEXT: set cmdbuf->state.gfx.xfb.active = true.
-+ *     The next draw's set_gfx_sysval calls detect and re-emit vs.xfb_address[].
-+ *   - vkCmdEndTransformFeedbackEXT: set active = false.
-+ *
-+ * Counter buffers (firstCounterBuffer/counterBufferCount/pCounterBuffers/
-+ * pCounterBufferOffsets) are accepted by the API but ignored, with a warning
-+ * on Begin: v1 has no pause/resume. transformFeedbackDraw is advertised false.
-+ *
-+ * Per-draw integration: panvk_vX_cmd_draw.c reads cmdbuf->state.gfx.xfb
-+ * and populates vs.xfb_address[i] for shader use. The pan_nir_lower_xfb
-+ * pass (invoked from panvk_vX_shader.c) emits nir_load_xfb_address(i),
-+ * which the sysval handler in panvk_vX_shader.c lowers to a load from the
-+ * per-draw sysval push area.
-+ */
-+
-+#include "vk_log.h"
-+#include "util/log.h"
-+
-+#include "panvk_cmd_buffer.h"
-+#include "panvk_cmd_draw.h"
-+#include "panvk_buffer.h"
-+#include "panvk_entrypoints.h"
-+
-+VKAPI_ATTR void VKAPI_CALL
-+panvk_per_arch(CmdBindTransformFeedbackBuffersEXT)(
-+   VkCommandBuffer commandBuffer,
-+   uint32_t firstBinding,
-+   uint32_t bindingCount,
-+   const VkBuffer *pBuffers,
-+   const VkDeviceSize *pOffsets,
-+   const VkDeviceSize *pSizes)
-+{
-+   VK_FROM_HANDLE(panvk_cmd_buffer, cmdbuf, commandBuffer);
-+   struct panvk_cmd_graphics_state *gfx = &cmdbuf->state.gfx;
-+
-+   for (uint32_t i = 0; i < bindingCount; i++) {
-+      uint32_t slot = firstBinding + i;
-+      if (slot >= 4)
-+         continue;
-+
-+      VK_FROM_HANDLE(panvk_buffer, buf, pBuffers[i]);
-+      gfx->xfb.buffers[slot].addr = panvk_buffer_gpu_ptr(buf, 0);
-+      gfx->xfb.buffers[slot].offset = pOffsets[i];
-+      gfx->xfb.buffers[slot].size =
-+         (pSizes != NULL && pSizes[i] != VK_WHOLE_SIZE)
-+            ? pSizes[i]
-+            : (buf->vk.size - pOffsets[i]);
-+   }
-+
-+   if (firstBinding + bindingCount > gfx->xfb.buffer_count)
-+      gfx->xfb.buffer_count = firstBinding + bindingCount;
-+}
-+
-+VKAPI_ATTR void VKAPI_CALL
-+panvk_per_arch(CmdBeginTransformFeedbackEXT)(
-+   VkCommandBuffer commandBuffer,
-+   uint32_t firstCounterBuffer,
-+   uint32_t counterBufferCount,
-+   const VkBuffer *pCounterBuffers,
-+   const VkDeviceSize *pCounterBufferOffsets)
-+{
-+   VK_FROM_HANDLE(panvk_cmd_buffer, cmdbuf, commandBuffer);
-+   struct panvk_cmd_graphics_state *gfx = &cmdbuf->state.gfx;
-+
-+   /* Counter buffers are ignored in v1 (no pause/resume). Passing them is
-+    * valid API usage, and transformFeedbackDraw = false (see
-+    * panvk_vX_physical_device.c) only rules out vkCmdDrawIndirectByteCountEXT,
-+    * so warn loudly when an application supplies them rather than silently
-+    * producing wrong capture state. */
-+   (void)firstCounterBuffer;
-+   (void)pCounterBufferOffsets;
-+   if (counterBufferCount > 0 && pCounterBuffers != NULL) {
-+      mesa_logw("panvk: CmdBeginTransformFeedbackEXT: counter buffers not "
-+                "implemented (transformFeedbackDraw=false); XFB resume will "
-+                "restart at buffer offset 0");
-+   }
-+
-+   gfx->xfb.active = true;
-+   /* Per-draw set_gfx_sysval picks up the change automatically — no
-+    * explicit dirty marking required (set_gfx_sysval uses memcmp +
-+    * BITSET to detect state diffs and re-emit sysvals). */
-+}
-+
-+VKAPI_ATTR void VKAPI_CALL
-+panvk_per_arch(CmdEndTransformFeedbackEXT)(
-+   VkCommandBuffer commandBuffer,
-+   uint32_t firstCounterBuffer,
-+   uint32_t counterBufferCount,
-+   const VkBuffer *pCounterBuffers,
-+   const VkDeviceSize *pCounterBufferOffsets)
-+{
-+   VK_FROM_HANDLE(panvk_cmd_buffer, cmdbuf, commandBuffer);
-+   struct panvk_cmd_graphics_state *gfx = &cmdbuf->state.gfx;
-+
-+   (void)firstCounterBuffer;
-+   (void)counterBufferCount;
-+   (void)pCounterBuffers;
-+   (void)pCounterBufferOffsets;
-+
-+   gfx->xfb.active = false;
-+}
@@ -1,629 +0,0 @@
-diff -urN a/src/panfrost/vulkan/meson.build b/src/panfrost/vulkan/meson.build
--- a/src/panfrost/vulkan/meson.build	2026-05-21 14:04:02.529474145 +0200
-+++ b/src/panfrost/vulkan/meson.build	2026-05-21 14:04:04.106755486 +0200
-@@ -123,6 +123,7 @@
-   'panvk_vX_nir_lower_input_attachment_loads.c',
-   'panvk_vX_sampler.c',
-   'panvk_vX_shader.c',
-+  'panvk_vX_xfb_lower.c',
-   sha1_h,
- ]
- 
-diff -urN a/src/panfrost/vulkan/panvk_shader.h b/src/panfrost/vulkan/panvk_shader.h
--- a/src/panfrost/vulkan/panvk_shader.h	2026-05-21 14:04:02.525251986 +0200
-+++ b/src/panfrost/vulkan/panvk_shader.h	2026-05-21 14:04:04.084251800 +0200
-@@ -154,6 +154,8 @@
-       /* aligned_u64 attribute below inserts the 4-byte alignment gap
-        * after num_vertices automatically — no explicit pad needed. */
-       aligned_u64 xfb_address[4];  /* iter13: 4 transform feedback buffer base addresses */
-+      uint32_t xfb_topology;       /* iter17: panvk_xfb_topology enum value */
-+      uint32_t xfb_output_count;   /* iter17: per-instance output verts after decomp */
- #endif
-       int32_t first_vertex;
-       int32_t base_instance;
-@@ -569,4 +571,76 @@
-    struct pan_compute_dim local_size, const void *bin_ptr, size_t bin_size,
-    struct panvk_shader **shader_out);
- 
-+
-+#if PAN_ARCH < 9
-+/* iter17: encoding for vs.xfb_topology sysval. Maps VkPrimitiveTopology values
-+ * we need to distinguish at shader runtime for XFB capture. LIST topologies
-+ * use the iter13 single-store fast path; non-LIST need per-vertex decomposition. */
-+enum panvk_xfb_topology {
-+   PANVK_XFB_TOPO_LIST            = 0,
-+   PANVK_XFB_TOPO_LINE_STRIP      = 1,
-+   PANVK_XFB_TOPO_TRI_STRIP       = 2,
-+   PANVK_XFB_TOPO_TRI_FAN         = 3,
-+   PANVK_XFB_TOPO_LINE_LIST_ADJ   = 4,
-+   PANVK_XFB_TOPO_LINE_STRIP_ADJ  = 5,
-+   PANVK_XFB_TOPO_TRI_LIST_ADJ    = 6,
-+   PANVK_XFB_TOPO_TRI_STRIP_ADJ   = 7,
-+};
-+
-+#include "panvk_macros.h"
-+struct nir_shader;
-+bool panvk_per_arch(nir_lower_xfb)(struct nir_shader *nir);
-+
-+/* Map VkPrimitiveTopology to panvk_xfb_topology enum (driver-side helper). */
-+static inline uint32_t
-+panvk_vk_topology_to_xfb_enum(VkPrimitiveTopology topo)
-+{
-+   switch (topo) {
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
-+      return PANVK_XFB_TOPO_LINE_STRIP;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
-+      return PANVK_XFB_TOPO_TRI_STRIP;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
-+      return PANVK_XFB_TOPO_TRI_FAN;
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
-+      return PANVK_XFB_TOPO_LINE_LIST_ADJ;
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
-+      return PANVK_XFB_TOPO_LINE_STRIP_ADJ;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
-+      return PANVK_XFB_TOPO_TRI_LIST_ADJ;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
-+      return PANVK_XFB_TOPO_TRI_STRIP_ADJ;
-+   case VK_PRIMITIVE_TOPOLOGY_POINT_LIST:
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST:
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST:
-+   default:
-+      return PANVK_XFB_TOPO_LIST;
-+   }
-+}
-+
-+/* Compute the per-instance output vertex count for a given (topology, input count). */
-+static inline uint32_t
-+panvk_xfb_output_count(VkPrimitiveTopology topo, uint32_t input_count)
-+{
-+   switch (topo) {
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
-+      return input_count >= 1 ? 2u * (input_count - 1u) : 0u;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
-+      return input_count >= 2 ? 3u * (input_count - 2u) : 0u;
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
-+      return (input_count / 4u) * 2u;
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
-+      return input_count >= 3 ? 2u * (input_count - 3u) : 0u;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
-+      return (input_count / 6u) * 3u;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
-+      return input_count >= 6 ? 3u * (input_count / 2u - 2u) : 0u;
-+   default:
-+      return input_count;  /* LIST topologies: 1:1 mapping */
-+   }
-+}
-+#endif
-+
-+
- #endif
-diff -urN a/src/panfrost/vulkan/panvk_vX_cmd_draw.c b/src/panfrost/vulkan/panvk_vX_cmd_draw.c
--- a/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-05-21 14:04:02.528576354 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-05-21 14:04:04.091357598 +0200
-@@ -727,6 +727,20 @@
-    /* iter13: VK_EXT_transform_feedback sysvals — always set (per draw),
-     * reflect bound XFB state. set_gfx_sysval is a no-op if value unchanged. */
-    set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, info->vertex.count);
-+
-+   /* iter17: XFB primitive-decomposition sysvals.
-+    * xfb_topology = enum value for the current bound topology.
-+    * xfb_output_count = per-instance output vertex count after decomposition.
-+    * For LIST topologies, output_count == input vertex count and the shader
-+    * takes the iter13 single-store fast path. */
-+   {
-+      VkPrimitiveTopology vk_topo =
-+         cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology;
-+      uint32_t topo_enum = panvk_vk_topology_to_xfb_enum(vk_topo);
-+      uint32_t out_count = panvk_xfb_output_count(vk_topo, info->vertex.count);
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_topology, topo_enum);
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_output_count, out_count);
-+   }
-    {
-       const struct panvk_cmd_graphics_state *_gfx = &cmdbuf->state.gfx;
-       /* iter13: default each XFB buffer address to PAN_SHADER_OOB_ADDRESS
-diff -urN a/src/panfrost/vulkan/panvk_vX_shader.c b/src/panfrost/vulkan/panvk_vX_shader.c
--- a/src/panfrost/vulkan/panvk_vX_shader.c	2026-05-21 14:04:02.527576494 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_shader.c	2026-05-21 14:04:04.098356619 +0200
-@@ -895,7 +895,10 @@
-        nir->info.has_transform_feedback_varyings) {
-       NIR_PASS(_, nir, nir_opt_constant_folding);
-       NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
-      NIR_PASS(_, nir, pan_nir_lower_xfb);
-+      /* iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
-+       * primitive decomposition for non-LIST topologies. Single-store LIST
-+       * fast path matches iter13 behavior. */
-+      NIR_PASS(_, nir, panvk_per_arch(nir_lower_xfb));
-    }
- #endif
- }
-diff -urN a/src/panfrost/vulkan/panvk_vX_xfb_lower.c b/src/panfrost/vulkan/panvk_vX_xfb_lower.c
--- a/src/panfrost/vulkan/panvk_vX_xfb_lower.c	1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_vX_xfb_lower.c	2026-05-21 14:04:04.115354242 +0200
-@@ -0,0 +1,486 @@
-+/*
-+ * Copyright © 2026 mfritsche / claude-noether
-+ * SPDX-License-Identifier: MIT
-+ *
-+ * iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
-+ * primitive decomposition for transform_feedback on non-LIST topologies
-+ * (TRIANGLE_STRIP/FAN, LINE_STRIP, *_WITH_ADJACENCY).
-+ *
-+ * Approach: emit a topology dispatch at the start of each store_output
-+ * lowering. The shader reads vs.xfb_topology sysval at runtime and branches
-+ * into per-topology emission logic. For each affected topology, the lowered
-+ * code emits guarded conditional stores — one per primitive this vertex
-+ * contributes to, computing the output buffer position via primitive index
-+ * and slot within the decomposed primitive.
-+ *
-+ * For LIST topologies (POINT/LINE/TRIANGLE LIST), takes a fast path that
-+ * matches iter13's single-store behavior.
-+ *
-+ * For TRIANGLE_FAN, the central vertex (v=0) contributes to ALL primitives
-+ * as slot 2, handled via a NIR loop over the num_vertices - 2 primitives.
-+ *
-+ * See ~/src/panvk-bifrost/iter17/phase{0,1,2}_*.md for full design context.
-+ */
-+
-+#include "panvk_macros.h"
-+
-+#if PAN_ARCH < 9
-+
-+#include "panvk_shader.h"
-+
-+#include "compiler/nir/nir_builder.h"
-+#include "pan_nir.h"
-+
-+#include <vulkan/vulkan_core.h>
-+
-+/* ----- Address arithmetic ----- */
-+
-+static nir_def *
-+xfb_store_addr(nir_builder *b, nir_def *buf, nir_def *out_idx,
-+               uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *byte_off = nir_iadd_imm(b,
-+      nir_imul_imm(b, out_idx, stride), offset_bytes);
-+   return nir_iadd(b, buf, nir_u2u64(b, byte_off));
-+}
-+
-+static void
-+emit_list_store(nir_builder *b, nir_def *buf, nir_def *output_count,
-+                nir_def *instance_id, nir_def *raw_vid, nir_def *value,
-+                uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *out_idx = nir_iadd(b,
-+      nir_imul(b, instance_id, output_count), raw_vid);
-+   nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
-+   nir_store_global(b, value, addr);
-+}
-+
-+static void
-+emit_prim_store(nir_builder *b, nir_def *buf, nir_def *output_count,
-+                nir_def *instance_id, nir_def *eligible,
-+                nir_def *prim_idx, nir_def *slot,
-+                uint32_t verts_per_prim,
-+                nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_push_if(b, eligible);
-+   {
-+      nir_def *out_idx = nir_iadd(b,
-+         nir_imul(b, instance_id, output_count),
-+         nir_iadd(b, nir_imul_imm(b, prim_idx, verts_per_prim), slot));
-+      nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
-+      nir_store_global(b, value, addr);
-+   }
-+   nir_pop_if(b, NULL);
-+}
-+
-+/* ----- Per-topology emission ----- */
-+
-+/* TRIANGLE_STRIP: vertex v contributes to prims v, v-1, v-2 (per eligibility). */
-+static void
-+emit_tri_strip(nir_builder *b, nir_def *v, nir_def *N,
-+               nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+               nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
-+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
-+
-+   /* Prim v, slot 0: v < N-2 */
-+   emit_prim_store(b, buf, output_count, instance_id,
-+      nir_ult(b, v, Nm2),
-+      v, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
-+
-+   /* Prim v-1, slot = 1 if prim even else 2: 1 <= v < N-1 */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -1);
-+      nir_def *parity = nir_iand_imm(b, prim, 1u);
-+      nir_def *slot = nir_iadd_imm(b, parity, 1);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 1)),
-+         nir_ult(b, v, Nm1));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, slot, 3, value, stride, offset_bytes);
-+   }
-+
-+   /* Prim v-2, slot = 2 if prim even else 1: 2 <= v < N */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -2);
-+      nir_def *parity = nir_iand_imm(b, prim, 1u);
-+      nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 2)),
-+         nir_ult(b, v, N));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, slot, 3, value, stride, offset_bytes);
-+   }
-+}
-+
-+/* LINE_STRIP: vertex v contributes to prim v slot 0 + prim v-1 slot 1. */
-+static void
-+emit_line_strip(nir_builder *b, nir_def *v, nir_def *N,
-+                nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+                nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
-+
-+   /* Prim v, slot 0: v < N-1 */
-+   emit_prim_store(b, buf, output_count, instance_id,
-+      nir_ult(b, v, Nm1),
-+      v, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
-+
-+   /* Prim v-1, slot 1: 1 <= v < N */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -1);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 1)),
-+         nir_ult(b, v, N));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
-+   }
-+}
-+
-+/* TRIANGLE_FAN: prim p emits {p+1, p+2, 0}.
-+ *   vertex v=0: contributes to ALL prims as slot 2 (loop required)
-+ *   vertex v>=1: contributes to prim v-1 as slot 0 (if 1 <= v <= N-2)
-+ *   vertex v>=2: contributes to prim v-2 as slot 1 (if 2 <= v <= N-1)
-+ */
-+static void
-+emit_tri_fan(nir_builder *b, nir_def *v, nir_def *N,
-+             nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+             nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
-+   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
-+
-+   /* Prim v-1, slot 0: 1 <= v < N-1 */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -1);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 1)),
-+         nir_ult(b, v, Nm1));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
-+   }
-+
-+   /* Prim v-2, slot 1: 2 <= v < N */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -2);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 2)),
-+         nir_ult(b, v, N));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, nir_imm_int(b, 1), 3, value, stride, offset_bytes);
-+   }
-+
-+   /* Central vertex (v == 0): loop over all prims, write to slot 2. */
-+   nir_push_if(b, nir_ieq_imm(b, v, 0));
-+   {
-+      nir_variable *p_var = nir_local_variable_create(b->impl,
-+         glsl_uint_type(), "fan_p");
-+      nir_store_var(b, p_var, nir_imm_int(b, 0), 0x1);
-+      nir_push_loop(b);
-+      {
-+         nir_def *p = nir_load_var(b, p_var);
-+         nir_push_if(b, nir_uge(b, p, Nm2));
-+         {
-+            nir_jump(b, nir_jump_break);
-+         }
-+         nir_pop_if(b, NULL);
-+
-+         nir_def *out_idx = nir_iadd(b,
-+            nir_imul(b, instance_id, output_count),
-+            nir_iadd_imm(b, nir_imul_imm(b, p, 3), 2));
-+         nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
-+         nir_store_global(b, value, addr);
-+
-+         nir_store_var(b, p_var, nir_iadd_imm(b, p, 1), 0x1);
-+      }
-+      nir_pop_loop(b, NULL);
-+   }
-+   nir_pop_if(b, NULL);
-+}
-+
-+/* LINE_LIST_WITH_ADJACENCY: 4-vertex groups [4i..4i+3]; output {4i+1, 4i+2}.
-+ *   v contributes if v%4 == 1: prim v/4 slot 0
-+ *   v contributes if v%4 == 2: prim v/4 slot 1
-+ */
-+static void
-+emit_line_list_adj(nir_builder *b, nir_def *v, nir_def *N,
-+                   nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+                   nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *vmod4 = nir_iand_imm(b, v, 3u);
-+   nir_def *prim = nir_ushr_imm(b, v, 2);  /* v / 4 */
-+   nir_def *whole = nir_ult(b, prim, nir_ushr_imm(b, N, 2)); /* skip partial tail group */
-+
-+   emit_prim_store(b, buf, output_count, instance_id,
-+      nir_iand(b, whole, nir_ieq_imm(b, vmod4, 1)),
-+      prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
-+
-+   emit_prim_store(b, buf, output_count, instance_id,
-+      nir_iand(b, whole, nir_ieq_imm(b, vmod4, 2)),
-+      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
-+}
-+
-+/* LINE_STRIP_WITH_ADJACENCY: N-3 prims; prim p emits {p+1, p+2}.
-+ *   v contributes to prim v-1 slot 0 (1 <= v <= N-3)
-+ *   v contributes to prim v-2 slot 1 (2 <= v <= N-2)
-+ */
-+static void
-+emit_line_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
-+                    nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+                    nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
-+   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
-+
-+   /* Prim v-1, slot 0: prim <= N-4, so 1 <= v <= N-3, i.e. v >= 1 AND v < N-2 */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -1);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 1)),
-+         nir_ult(b, v, Nm2));
-+      /* The caller gates on output_count > 0, so N >= 4 and N-2 cannot wrap. */
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
-+   }
-+
-+   /* Prim v-2, slot 1: prim <= N-4, so 2 <= v <= N-2, i.e. v >= 2 AND v < N-1 */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -2);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 2)),
-+         nir_ult(b, v, Nm1));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
-+   }
-+}
-+
-+/* TRIANGLE_LIST_WITH_ADJACENCY: 6-vertex groups; output {6i, 6i+2, 6i+4}.
-+ *   v contributes if v%6 == 0: prim v/6 slot 0
-+ *   v contributes if v%6 == 2: prim v/6 slot 1
-+ *   v contributes if v%6 == 4: prim v/6 slot 2
-+ */
-+static void
-+emit_tri_list_adj(nir_builder *b, nir_def *v, nir_def *N,
-+                  nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+                  nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *vmod6 = nir_umod_imm(b, v, 6);
-+   nir_def *prim = nir_udiv_imm(b, v, 6);
-+   nir_def *whole = nir_ult(b, prim, nir_udiv_imm(b, N, 6)); /* skip partial tail group */
-+
-+   for (uint32_t slot = 0; slot < 3; slot++) {
-+      emit_prim_store(b, buf, output_count, instance_id,
-+         nir_iand(b, whole, nir_ieq_imm(b, vmod6, slot * 2)),
-+         prim, nir_imm_int(b, slot), 3, value, stride, offset_bytes);
-+   }
-+}
-+
-+/* TRIANGLE_STRIP_WITH_ADJACENCY: prim i emits:
-+ *   even i: {2i, 2i+2, 2i+4}    (slots 0, 1, 2 ← input indices 2i, 2i+2, 2i+4)
-+ *   odd  i: {2i, 2i+4, 2i+2}    (slots 0, 1, 2 ← input indices 2i, 2i+4, 2i+2)
-+ *
-+ * Only EVEN input vertices contribute (since all output indices are 2*something).
-+ * For even input v:
-+ *   prim v/2 slot 0 (always, if v/2 < N/2-2)
-+ *   prim (v-2)/2 slot 1 if (v-2)/2 even, slot 2 if odd   (when v >= 2)
-+ *   prim (v-4)/2 slot 2 if (v-4)/2 even, slot 1 if odd   (when v >= 4)
-+ */
-+static void
-+emit_tri_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
-+                   nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+                   nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   /* Bail for odd input vertices — they never contribute. */
-+   nir_def *v_is_even = nir_ieq_imm(b, nir_iand_imm(b, v, 1u), 0);
-+   nir_push_if(b, v_is_even);
-+   {
-+      nir_def *N_half = nir_ushr_imm(b, N, 1);
-+      nir_def *max_prim = nir_iadd_imm(b, N_half, -2);  /* N/2 - 2 */
-+      nir_def *v_half = nir_ushr_imm(b, v, 1);
-+
-+      /* Prim v/2 slot 0: v/2 < N/2 - 2 */
-+      emit_prim_store(b, buf, output_count, instance_id,
-+         nir_ult(b, v_half, max_prim),
-+         v_half, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
-+
-+      /* Prim (v-2)/2 = v/2 - 1: v >= 2 AND prim < N/2-2 */
-+      {
-+         nir_def *prim = nir_iadd_imm(b, v_half, -1);
-+         nir_def *parity = nir_iand_imm(b, prim, 1u);
-+         nir_def *slot = nir_iadd_imm(b, parity, 1);  /* even→1, odd→2 */
-+         nir_def *eligible = nir_iand(b,
-+            nir_uge(b, v, nir_imm_int(b, 2)),
-+            nir_ult(b, prim, max_prim));
-+         emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                         prim, slot, 3, value, stride, offset_bytes);
-+      }
-+
-+      /* Prim (v-4)/2 = v/2 - 2: v >= 4 AND prim < N/2-2 */
-+      {
-+         nir_def *prim = nir_iadd_imm(b, v_half, -2);
-+         nir_def *parity = nir_iand_imm(b, prim, 1u);
-+         nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity);  /* even→2, odd→1 */
-+         nir_def *eligible = nir_iand(b,
-+            nir_uge(b, v, nir_imm_int(b, 4)),
-+            nir_ult(b, prim, max_prim));
-+         emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                         prim, slot, 3, value, stride, offset_bytes);
-+      }
-+   }
-+   nir_pop_if(b, NULL);
-+}
-+
-+/* ----- Main lowering: per store_output XFB channel ----- */
-+
-+static void
-+lower_xfb_output_iter17(nir_builder *b, nir_intrinsic_instr *intr,
-+                        unsigned channel_idx, unsigned num_components,
-+                        unsigned buffer, unsigned offset_words)
-+{
-+   assert(buffer < MAX_XFB_BUFFERS);
-+   assert(nir_intrinsic_component(intr) == 0);
-+
-+   uint16_t stride = b->shader->info.xfb_stride[buffer] * 4;
-+   assert(stride != 0);
-+   uint16_t offset_bytes = offset_words * 4;
-+
-+   BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_VERTEX_ID_ZERO_BASE);
-+   BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_INSTANCE_ID);
-+
-+   nir_def *topology = load_sysval(b, graphics, 32, vs.xfb_topology);
-+   nir_def *out_count = load_sysval(b, graphics, 32, vs.xfb_output_count);
-+   nir_def *N = nir_load_num_vertices(b);
-+   nir_def *v = nir_load_raw_vertex_id_pan(b);
-+   nir_def *instance = nir_load_instance_id(b);
-+   nir_def *buf = nir_load_xfb_address(b, 64, .base = buffer);
-+
-+   nir_def *src = intr->src[0].ssa;
-+   nir_component_mask_t mask = nir_component_mask(num_components);
-+   nir_def *value = nir_channels(b, src, mask << channel_idx);
-+
-+   /* Topology dispatch ladder. LIST first (fast path). */
-+   nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LIST));
-+   {
-+      emit_list_store(b, buf, out_count, instance, v, value,
-+                      stride, offset_bytes);
-+   }
-+   nir_push_else(b, NULL);
-+   {
-+      /* iter17 Janet Finding 3: gate all non-LIST emission on
-+       * output_count > 0. For degenerate input counts (N < min required
-+       * for the topology), output_count is 0 and we must emit NO stores
-+       * — otherwise N-2 / N-3 / etc. arithmetic underflows in the
-+       * eligibility predicates and we falsely fire stores. */
-+      nir_push_if(b, nir_ult(b, nir_imm_int(b, 0), out_count));
-+      {
-+      nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_STRIP));
-+      {
-+         emit_tri_strip(b, v, N, buf, out_count, instance, value,
-+                        stride, offset_bytes);
-+      }
-+      nir_push_else(b, NULL);
-+      {
-+         nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP));
-+         {
-+            emit_line_strip(b, v, N, buf, out_count, instance, value,
-+                            stride, offset_bytes);
-+         }
-+         nir_push_else(b, NULL);
-+         {
-+            nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_FAN));
-+            {
-+               emit_tri_fan(b, v, N, buf, out_count, instance, value,
-+                            stride, offset_bytes);
-+            }
-+            nir_push_else(b, NULL);
-+            {
-+               nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_LIST_ADJ));
-+               {
-+                  emit_line_list_adj(b, v, N, buf, out_count, instance, value,
-+                                     stride, offset_bytes);
-+               }
-+               nir_push_else(b, NULL);
-+               {
-+                  nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP_ADJ));
-+                  {
-+                     emit_line_strip_adj(b, v, N, buf, out_count, instance, value,
-+                                         stride, offset_bytes);
-+                  }
-+                  nir_push_else(b, NULL);
-+                  {
-+                     nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_LIST_ADJ));
-+                     {
-+                        emit_tri_list_adj(b, v, N, buf, out_count, instance, value,
-+                                          stride, offset_bytes);
-+                     }
-+                     nir_push_else(b, NULL);
-+                     {
-+                        /* TRI_STRIP_ADJ — last case */
-+                        emit_tri_strip_adj(b, v, N, buf, out_count, instance, value,
-+                                           stride, offset_bytes);
-+                     }
-+                     nir_pop_if(b, NULL);
-+                  }
-+                  nir_pop_if(b, NULL);
-+               }
-+               nir_pop_if(b, NULL);
-+            }
-+            nir_pop_if(b, NULL);
-+         }
-+         nir_pop_if(b, NULL);
-+      }
-+      nir_pop_if(b, NULL);
-+      }
-+      nir_pop_if(b, NULL);  /* Janet Finding 3: close output_count > 0 guard */
-+   }
-+   nir_pop_if(b, NULL);
-+}
-+
-+/* Mirror of pan_nir_lower_xfb's lower_xfb: load_vertex_id rewrite +
-+ * dispatch store_output through our topology-aware emission. */
-+static bool
-+lower_xfb_iter17(nir_builder *b, nir_intrinsic_instr *intr,
-+                 UNUSED void *data)
-+{
-+   if (intr->intrinsic == nir_intrinsic_load_vertex_id) {
-+      b->cursor = nir_instr_remove(&intr->instr);
-+      nir_def *repl = nir_iadd(b, nir_load_raw_vertex_id_pan(b),
-+                               nir_load_raw_vertex_offset_pan(b));
-+      nir_def_rewrite_uses(&intr->def, repl);
-+      return true;
-+   }
-+
-+   if (intr->intrinsic != nir_intrinsic_store_output)
-+      return false;
-+
-+   bool progress = false;
-+   b->cursor = nir_before_instr(&intr->instr);
-+
-+   /* io_xfb has only out[0,1]; the other 2 channels are in io_xfb2.
-+    * Outer loop selects which annotation; inner picks which channel. */
-+   for (unsigned i = 0; i < 2; ++i) {
-+      nir_io_xfb xfb = i ? nir_intrinsic_io_xfb2(intr)
-+                         : nir_intrinsic_io_xfb(intr);
-+      for (unsigned j = 0; j < 2; ++j) {
-+         if (!xfb.out[j].num_components)
-+            continue;
-+         lower_xfb_output_iter17(b, intr, i * 2 + j, xfb.out[j].num_components,
-+                                 xfb.out[j].buffer, xfb.out[j].offset);
-+         progress = true;
-+      }
-+   }
-+
-+   if (progress)
-+      nir_instr_remove(&intr->instr);
-+   return progress;
-+}
-+
-+bool
-+panvk_per_arch(nir_lower_xfb)(nir_shader *nir)
-+{
-+   return nir_shader_intrinsics_pass(
-+      nir, lower_xfb_iter17, nir_metadata_control_flow, NULL);
-+}
-+
-+#endif /* PAN_ARCH < 9 */
@@ -1,181 +0,0 @@
-# Maintainer: Markus Fritsche <fritsche.markus@gmail.com>
-#
-# mesa-panvk-bifrost-video — sibling of mesa-panvk-bifrost (r4) that adds
-# VK_KHR_video_decode_h264 on Mali Bifrost SBCs (PAN_ARCH 6/7) backed by
-# the SoC's V4L2-stateless hantro VPU (RK3566/RK3568).
-#
-# Campaign: ~/src/panvk-bifrost-video/ — Phase 4 byte-exact validated
-# 2026-05-21 (48/48 BBB display frames match ffmpeg+libva-v4l2-request-
-# fourier byte-for-byte on the same hantro). Phase 5 second-model review
-# completed; load-bearing findings (output_map OOB, static counter,
-# session_init unwind, probe_hantro gate) all applied.
-#
-# What it does (on top of r4):
-#   - 0001..0004: inherited from mesa-panvk-bifrost (robustness2/null-
-#     descriptor, vk1.1/1.2 advertisement, EXT_transform_feedback, XFB
-#     primitive decomposition) — symlinked from the r4 package directory
-#     so the patches don't drift between siblings.
-#   - 0005: VK_KHR_video_queue + VK_KHR_video_decode_queue +
-#     VK_KHR_video_decode_h264 backed by V4L2-stateless hantro.
-#     Touches 14 files in src/panfrost/vulkan/; full diff in
-#     0005-panvk-bifrost-video-KHR-video-decode-h264.patch.
-#
-# Co-existence:
-#   - Installs to /usr/lib/panvk-bifrost-video/ (parallel to r4's
-#     /usr/lib/panvk-bifrost/). Pick at runtime via VK_ICD_FILENAMES.
-#   - r4 stays the recommended default for the Chromium-GPU-process
-#     consumer (no video needed there). Use this package when the
-#     consumer wants Vulkan video decode (mpv-fourier, ffmpeg-vulkan,
-#     future Chromium-VulkanVideoDecoder).
-#
-# Phase 1 limitations to know about (documented in source comments):
-#   - Single video session per device (active_video singleton)
-#   - Synchronous decode at record time — no pipelining yet
-#   - Hardcoded /dev/video1 + /dev/media0 (matches RK3566/68, blocks
-#     other SoCs without a topology-walk port)
-#   - Bitstream source buffer assumed HOST_VISIBLE (true on panvk-
-#     bifrost, would need fallback on other backends)
-#
-# Build target: arch-aarch64 runner via marfrit-packages Gitea Actions.
-# Mesa build is slow (~30-60min on Cortex-A55).
-
-pkgname=mesa-panvk-bifrost-video
-_mesaver=26.0.6
-pkgver=26.0.6.r5.video1
-pkgrel=1
-pkgdesc="Patched Mesa libvulkan_panfrost.so adding VK_KHR_video_decode_h264 on Bifrost SBCs (sibling of mesa-panvk-bifrost-r4)"
-arch=('aarch64')
-url="https://git.reauktion.de/marfrit/panvk-bifrost"
-license=('MIT')
-
-depends=(
-    'mesa'              # for shared mesa runtime libs
-    'libdrm'
-    'wayland'
-    'libxcb'
-    'libx11'
-    'libxshmfence'
-    'zlib'
-    'zstd'
-    'libelf'
-    'libffi'
-    'expat'
-    'llvm-libs'
-    'lm_sensors'
-)
-makedepends=(
-    'meson'
-    'ninja'
-    'glslang'
-    'python-mako'
-    'python-packaging'
-    'wayland-protocols'
-    'libxrandr'
-    'xorgproto'
-    'libdrm'
-    'llvm'
-    'libclc'
-    'spirv-llvm-translator'
-    'spirv-tools'
-    'rust-bindgen'
-    'patch'
-)
-
-source=(
-    "https://archive.mesa3d.org/mesa-${_mesaver}.tar.xz"
-    "0001-panvk-expose-robustness2-nullDescriptor-bifrost.patch"
-    "0002-panvk-expose-vulkan-1.1-1.2-on-bifrost.patch"
-    "0003-panvk-bifrost-vk-ext-transform-feedback.patch"
-    "0004-panvk-bifrost-xfb-primitive-decomposition.patch"
-    "0005-panvk-bifrost-video-KHR-video-decode-h264.patch"
-    "icd.json"
-)
-# Mesa tarball is the same upstream version as the sibling r4 package; its checksum is not verified here (SKIP).
-sha256sums=(
-    'SKIP'  # mesa tarball — co-trust w/ r4 sibling
-    'SKIP'  # patches are local
-    'SKIP'
-    'SKIP'
-    'SKIP'
-    'SKIP'
-    'SKIP'  # icd.json
-)
-
-prepare() {
-    cd "mesa-${_mesaver}"
-
-    # r1+r2: small sed-based edits inherited from r4 (verbatim from the
-    # sibling PKGBUILD — keep in sync).
-    sed -i 's|\.KHR_robustness2 = PAN_ARCH >= 10,|.KHR_robustness2 = true,|' src/panfrost/vulkan/panvk_vX_physical_device.c
-    sed -i 's|\.EXT_robustness2 = PAN_ARCH >= 10,|.EXT_robustness2 = true,|' src/panfrost/vulkan/panvk_vX_physical_device.c
-    sed -i 's|\.nullDescriptor = PAN_ARCH >= 10,|.nullDescriptor = true,|' src/panfrost/vulkan/panvk_vX_physical_device.c
-    sed -i 's|bool has_vk1_1 = PAN_ARCH >= 10;|bool has_vk1_1 = true;|' src/panfrost/vulkan/panvk_vX_physical_device.c
-    sed -i 's|bool has_vk1_2 = PAN_ARCH >= 10;|bool has_vk1_2 = true;|' src/panfrost/vulkan/panvk_vX_physical_device.c
-
-    # r3: EXT_transform_feedback for Bifrost.
-    patch -p1 < "${srcdir}/0003-panvk-bifrost-vk-ext-transform-feedback.patch"
-
-    # r4: XFB primitive decomposition NIR pass.
-    patch -p1 < "${srcdir}/0004-panvk-bifrost-xfb-primitive-decomposition.patch"
-
-    # video: VK_KHR_video_decode_h264 via V4L2-hantro.
-    patch -p1 < "${srcdir}/0005-panvk-bifrost-video-KHR-video-decode-h264.patch"
-
-    # Sanity-check r1..r4 (inherited).
-    grep -q "KHR_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
-    grep -q "EXT_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
-    grep -q "nullDescriptor = true," src/panfrost/vulkan/panvk_vX_physical_device.c
-    grep -q "has_vk1_1 = true;" src/panfrost/vulkan/panvk_vX_physical_device.c
-    grep -q "has_vk1_2 = true;" src/panfrost/vulkan/panvk_vX_physical_device.c
-    grep -q "EXT_transform_feedback = PAN_ARCH < 9," src/panfrost/vulkan/panvk_vX_physical_device.c
-    test -f src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c
-    grep -q "panvk_per_arch(nir_lower_xfb)" src/panfrost/vulkan/panvk_vX_shader.c
-    test -f src/panfrost/vulkan/panvk_vX_xfb_lower.c
-
-    # Sanity-check video patch landed.
-    grep -q "KHR_video_queue = PAN_ARCH < 9 && panvk_v4l2_probe_hantro()" \
-        src/panfrost/vulkan/panvk_vX_physical_device.c
-    grep -q "PANVK_QUEUE_FAMILY_VIDEO_DECODE" src/panfrost/vulkan/panvk_device.h
-    test -f src/panfrost/vulkan/panvk_video_decode.c
-    test -f src/panfrost/vulkan/panvk_video_decode.h
-    test -f src/panfrost/vulkan/panvk_v4l2.c
-    test -f src/panfrost/vulkan/panvk_v4l2_h264.c
-    test -f src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c
-    test -f src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h
-    grep -q "panvk_v4l2_h264_slice_header.c" src/panfrost/vulkan/meson.build
-    grep -q "panvk_video_queue_submit_noop" src/panfrost/vulkan/panvk_vX_device.c
-}
-
-build() {
-    cd "mesa-${_mesaver}"
-    # Mirror r4's narrow build profile.
-    meson setup build/ \
-        --prefix=/usr \
-        --libdir=lib \
-        --buildtype=release \
-        -Dvulkan-drivers=panfrost \
-        -Dgallium-drivers= \
-        -Dplatforms=wayland,x11 \
-        -Dglx=disabled \
-        -Degl=disabled \
-        -Dgles1=disabled \
-        -Dgles2=disabled \
-        -Dvulkan-layers= \
-        -Dtools= \
-        -Dgallium-rusticl=false \
-        -Dmicrosoft-clc=disabled
-    meson compile -C build
-}
-
-package() {
-    cd "${srcdir}/mesa-${_mesaver}"
-
-    # Co-install path — parallel to r4's /usr/lib/panvk-bifrost/.
-    install -Dm755 build/src/panfrost/vulkan/libvulkan_panfrost.so \
-        "$pkgdir/usr/lib/panvk-bifrost-video/libvulkan_panfrost.so"
-
-    # ICD JSON pointing at the video build. Opt-in via VK_ICD_FILENAMES;
-    # NOT in /usr/share/vulkan/icd.d/ so it doesn't override stock or r4.
-    install -Dm644 "$srcdir/icd.json" \
-        "$pkgdir/usr/lib/panvk-bifrost-video/icd.json"
-}
@@ -1,40 +0,0 @@
-# mesa-panvk-bifrost-video
-
-Patched Mesa `libvulkan_panfrost.so` that **adds `VK_KHR_video_decode_h264`** on Mali Bifrost SBCs (PAN_ARCH 6/7, RK3566/RK3568 class hardware), backed by the SoC's V4L2-stateless **hantro** VPU.
-
-This is a **sibling** of [mesa-panvk-bifrost](../mesa-panvk-bifrost/) (the r4 package that exposes Bifrost to Chromium's Vulkan compositor). Pick this one when the consumer wants Vulkan **video decode** in addition; pick r4 for compositor-only.
-
-## Status
-
-Phase 4 byte-exact validated 2026-05-21: 48/48 unique BBB display frames decoded by this package are byte-identical to `ffmpeg+libva-v4l2-request-fourier` running on the same hantro hardware. Phase 5 second-model review completed; all load-bearing findings addressed. First publish via marfrit-packages CI 2026-05-22 (PR #79 merge did not auto-fire Actions; this re-trigger restores the standard build/sign/publish path).
-
-## How to use
-
-```sh
-# Co-installs alongside r4 and stock mesa.
-sudo pacman -S mesa-panvk-bifrost-video
-
-# Opt in (not on the default loader search path).
-export VK_ICD_FILENAMES=/usr/lib/panvk-bifrost-video/icd.json
-export PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1   # mesa-upstream gate
-
-# Run a Vulkan video consumer.
-vulkan-video-dec-simple-test -i your.h264 --codec h264 --noPresent --maxFrameCount 50
-# or
-ffmpeg -hwaccel vulkan -i your.mp4 ...
-```
-
-## Phase 1 limitations
-
-Documented in source comments and worth knowing before relying on this in production:
-
-- **Single video session per device.** Concurrent `VkVideoSessionKHR` objects on the same device clobber each other (`active_video` singleton). Sufficient for current single-stream consumers.
-- **Synchronous decode at record time.** The full V4L2 ioctl sequence runs to completion inside `vkCmdDecodeVideoKHR`. No pipelining. Throughput is bounded by hantro's ~1.16× realtime on 1080p H.264.
-- **Hardcoded `/dev/video1` + `/dev/media0`.** Matches RK3566/68 but won't work on other SoCs without a topology-walk port (see `libva-v4l2-request-fourier` for the full version).
-- **Bitstream source buffer assumed HOST_VISIBLE.** True on panvk-bifrost (no DEVICE_LOCAL-only memory types exist), but the code silently skips decode if the app bound the buffer to non-host-visible memory.
-
-## Co-existence
-
-- Installs to `/usr/lib/panvk-bifrost-video/`, parallel to r4's `/usr/lib/panvk-bifrost/` and stock `/usr/lib/`.
-- Opt-in via `VK_ICD_FILENAMES`; does NOT register itself in `/usr/share/vulkan/icd.d/`.
-- The three drivers (stock, r4, and this one) coexist without conflict; the user picks at runtime which to use.
@@ -1,7 +0,0 @@
-{
-    "ICD": {
-        "api_version": "1.4.335",
-        "library_path": "/usr/lib/panvk-bifrost-video/libvulkan_panfrost.so"
-    },
-    "file_format_version": "1.0.1"
-}
@@ -1,328 +0,0 @@
---- a/src/panfrost/vulkan/panvk_shader.h	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/panvk_shader.h	2026-05-20 18:52:53.312698258 +0200
-@@ -150,6 +150,10 @@
-    struct {
- #if PAN_ARCH < 9
-       int32_t raw_vertex_offset;
-+      uint32_t num_vertices;       /* iter13: XFB needs per-draw vertex count */
-+      /* aligned_u64 attribute below inserts the 4-byte alignment gap
-+       * after num_vertices automatically — no explicit pad needed. */
-+      aligned_u64 xfb_address[4];  /* iter13: 4 transform feedback buffer base addresses */
- #endif
-       int32_t first_vertex;
-       int32_t base_instance;
---- a/src/panfrost/vulkan/panvk_vX_physical_device.c	2026-05-20 19:09:29.711145446 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_physical_device.c	2026-05-20 18:52:54.832720445 +0200
-@@ -169,6 +169,7 @@
-       .EXT_provoking_vertex = true,
-       .EXT_queue_family_foreign = true,
-       .EXT_robustness2 = true,
-+      .EXT_transform_feedback = PAN_ARCH < 9,   /* iter13: JM-class only for now */
-       .EXT_sampler_filter_minmax = PAN_ARCH >= 10,
-       .EXT_scalar_block_layout = true,
-       .EXT_separate_stencil_usage = true,
-@@ -495,6 +496,10 @@
-       .robustImageAccess2 = false,
-       .nullDescriptor = true,
- 
-+      /* VK_EXT_transform_feedback (iter13) */
-+      .transformFeedback = PAN_ARCH < 9,
-+      .geometryStreams = false,
-+
-       /* VK_KHR_shader_clock */
-       .shaderSubgroupClock = device->kmod.dev->props.gpu_can_query_timestamp,
-       .shaderDeviceClock = device->kmod.dev->props.timestamp_device_coherent,
-@@ -1020,6 +1025,18 @@
-       .robustStorageBufferAccessSizeAlignment = 1,
-       .robustUniformBufferAccessSizeAlignment = 1,
- 
-+      /* VK_EXT_transform_feedback (iter13) */
-+      .maxTransformFeedbackStreams = 1,
-+      .maxTransformFeedbackBuffers = 4,
-+      .maxTransformFeedbackBufferSize = UINT32_MAX,
-+      .maxTransformFeedbackStreamDataSize = 512,
-+      .maxTransformFeedbackBufferDataSize = 512,
-+      .maxTransformFeedbackBufferDataStride = 2048,
-+      .transformFeedbackQueries = false,
-+      .transformFeedbackStreamsLinesTriangles = false,
-+      .transformFeedbackRasterizationStreamSelect = false,
-+      .transformFeedbackDraw = false,
-+
-       /* VK_EXT_shader_object */
-       /* We do not currently support VK_EXT_shader_object but this is used
-        * internally by vk_shader
---- a/src/panfrost/vulkan/panvk_vX_shader.c	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_shader.c	2026-05-20 18:52:56.556745611 +0200
-@@ -21,6 +21,7 @@
- #include "panvk_physical_device.h"
- #include "panvk_sampler.h"
- #include "panvk_shader.h"
-+#include "pan_nir.h"   /* iter13: pan_nir_lower_xfb */
- 
- #include "spirv/nir_spirv.h"
- #include "util/memstream.h"
-@@ -100,6 +101,20 @@
-    case nir_intrinsic_load_raw_vertex_offset_pan:
-       val = load_sysval(b, graphics, bit_size, vs.raw_vertex_offset);
-       break;
-+   case nir_intrinsic_load_num_vertices:    /* iter13: XFB index calc */
-+      val = load_sysval(b, graphics, bit_size, vs.num_vertices);
-+      break;
-+   case nir_intrinsic_load_xfb_address: {   /* iter13: XFB buffer N base address */
-+      unsigned idx = nir_intrinsic_base(intr);
-+      switch (idx) {
-+      case 0: val = load_sysval(b, graphics, bit_size, vs.xfb_address[0]); break;
-+      case 1: val = load_sysval(b, graphics, bit_size, vs.xfb_address[1]); break;
-+      case 2: val = load_sysval(b, graphics, bit_size, vs.xfb_address[2]); break;
-+      case 3: val = load_sysval(b, graphics, bit_size, vs.xfb_address[3]); break;
-+      default: return false;
-+      }
-+      break;
-+   }
-    case nir_intrinsic_load_layer_id:
-       assert(b->shader->info.stage == MESA_SHADER_FRAGMENT);
-       val = load_sysval(b, graphics, bit_size, layer_id);
-@@ -457,6 +472,7 @@
-             core_max_id);
- 
-    pan_preprocess_nir(nir, pdev->kmod.dev->props.gpu_id);
-+
- }
- 
- static void
-@@ -870,6 +886,18 @@
-             nir_var_shader_in | nir_var_shader_out, UINT32_MAX);
-    NIR_PASS(_, nir, nir_lower_io, nir_var_shader_in | nir_var_shader_out,
-             glsl_type_size, nir_lower_io_use_interpolated_input_intrinsics);
-+
-+#if PAN_ARCH < 9
-+   /* iter13: VK_EXT_transform_feedback — runs AFTER nir_lower_io so that
-+    * shader outputs are now store_output intrinsics that pan_nir_lower_xfb
-+    * can rewrite to nir_store_global+nir_load_xfb_address. */
-+   if (nir->info.stage == MESA_SHADER_VERTEX &&
-+       nir->info.has_transform_feedback_varyings) {
-+      NIR_PASS(_, nir, nir_opt_constant_folding);
-+      NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
-+      NIR_PASS(_, nir, pan_nir_lower_xfb);
-+   }
-+#endif
- }
- 
- static VkResult
-@@ -1288,6 +1316,9 @@
-       .view_mask = (state && state->rp) ? state->rp->view_mask : 0,
-       .robust2_modes = robust2_modes,
-       .robust_descriptors = dev->vk.enabled_features.nullDescriptor,
-+      /* iter13: XFB shaders must disable IDVS (matches Panfrost-Gallium). */
-+      .no_idvs = (info->stage == MESA_SHADER_VERTEX) &&
-+                 info->nir->info.has_transform_feedback_varyings,
-    };
- 
-    switch (info->stage) {
---- a/src/panfrost/vulkan/panvk_cmd_draw.h	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/panvk_cmd_draw.h	2026-05-20 18:52:57.748763011 +0200
-@@ -135,6 +135,19 @@
-    struct panvk_graphics_sysvals sysvals;
- 
- #if PAN_ARCH < 9
-+   /* iter13: VK_EXT_transform_feedback state (JM-class only for now). */
-+   struct {
-+      bool active;
-+      uint32_t buffer_count;
-+      struct {
-+         uint64_t addr;
-+         uint64_t offset;
-+         uint64_t size;
-+      } buffers[4];
-+   } xfb;
-+#endif
-+
-+#if PAN_ARCH < 9
-    struct panvk_shader_link link;
- #endif
- 
---- a/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-05-20 19:10:23.031919662 +0200
-@@ -10,6 +10,7 @@
- #include "panvk_entrypoints.h"
- 
- #include "pan_desc.h"
-+#include "pan_compiler.h"   /* PAN_SHADER_OOB_ADDRESS */
- #include "pan_util.h"
- 
- static void
-@@ -722,6 +723,35 @@
-    set_gfx_sysval(cmdbuf, dirty_sysvals, vs.raw_vertex_offset,
-                   info->vertex.raw_offset);
-    set_gfx_sysval(cmdbuf, dirty_sysvals, layer_id, info->layer_id);
-+
-+   /* iter13: VK_EXT_transform_feedback sysvals — always set (per draw),
-+    * reflect bound XFB state. set_gfx_sysval is a no-op if value unchanged. */
-+   set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, info->vertex.count);
-+   {
-+      const struct panvk_cmd_graphics_state *_gfx = &cmdbuf->state.gfx;
-+      /* iter13: default each XFB buffer address to PAN_SHADER_OOB_ADDRESS
-+       * (= 1<<63). This is the Panfrost-Gallium memory-sink idiom — the
-+       * Bifrost MMU silently discards stores to this address, so a pipeline
-+       * with XFB outputs used in a non-XFB draw (or in an XFB draw with
-+       * fewer bound buffers than the shader declares) is safe instead of
-+       * faulting. See gallium/drivers/panfrost/pan_cmdstream.c PAN_SYSVAL_XFB. */
-+      uint64_t _xa0 = PAN_SHADER_OOB_ADDRESS, _xa1 = PAN_SHADER_OOB_ADDRESS,
-+               _xa2 = PAN_SHADER_OOB_ADDRESS, _xa3 = PAN_SHADER_OOB_ADDRESS;
-+      if (_gfx->xfb.active) {
-+         if (_gfx->xfb.buffer_count > 0 && _gfx->xfb.buffers[0].addr)
-+            _xa0 = _gfx->xfb.buffers[0].addr + _gfx->xfb.buffers[0].offset;
-+         if (_gfx->xfb.buffer_count > 1 && _gfx->xfb.buffers[1].addr)
-+            _xa1 = _gfx->xfb.buffers[1].addr + _gfx->xfb.buffers[1].offset;
-+         if (_gfx->xfb.buffer_count > 2 && _gfx->xfb.buffers[2].addr)
-+            _xa2 = _gfx->xfb.buffers[2].addr + _gfx->xfb.buffers[2].offset;
-+         if (_gfx->xfb.buffer_count > 3 && _gfx->xfb.buffers[3].addr)
-+            _xa3 = _gfx->xfb.buffers[3].addr + _gfx->xfb.buffers[3].offset;
-+      }
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[0], _xa0);
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[1], _xa1);
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[2], _xa2);
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[3], _xa3);
-+   }
- #endif
- 
-    if (dyn_gfx_state_dirty(cmdbuf, CB_BLEND_CONSTANTS)) {
---- a/src/panfrost/vulkan/meson.build	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/meson.build	2026-05-20 18:53:04.484861338 +0200
-@@ -73,6 +73,7 @@
- jm_inc_dir = ['jm']
- jm_files = [
-   'jm/panvk_vX_bind_queue.c',
-+  'jm/panvk_vX_cmd_xfb.c',   # iter13
-   'jm/panvk_vX_cmd_buffer.c',
-   'jm/panvk_vX_cmd_dispatch.c',
-   'jm/panvk_vX_cmd_draw.c',
---- a/src/panfrost/vulkan/jm/panvk_vX_cmd_buffer.c	2026-04-29 22:19:00.000000000 +0200
-+++ b/src/panfrost/vulkan/jm/panvk_vX_cmd_buffer.c	2026-05-20 19:10:26.163965149 +0200
-@@ -473,5 +473,12 @@
- 
-    vk_command_buffer_begin(&cmdbuf->vk, pBeginInfo);
- 
-+#if PAN_ARCH < 9
-+   /* iter13: clear XFB state on Begin so a reused command buffer does not
-+    * inherit stale xfb.buffer_count / xfb.active / xfb.buffers[] from a
-+    * prior recording. */
-+   memset(&cmdbuf->state.gfx.xfb, 0, sizeof(cmdbuf->state.gfx.xfb));
-+#endif
-+
-    return VK_SUCCESS;
- }
---- a/src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c	2026-05-18 12:50:53.067999996 +0200
-+++ b/src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c	2026-05-20 19:10:27.175979847 +0200
-@@ -0,0 +1,111 @@
-+/*
-+ * Copyright © 2026 mfritsche / claude-noether
-+ * SPDX-License-Identifier: MIT
-+ *
-+ * iter13: VK_EXT_transform_feedback command handlers for the JM
-+ * architecture path (Bifrost v6/v7; the extension is gated on PAN_ARCH < 9).
-+ *
-+ * The runtime contract:
-+ *   - vkCmdBindTransformFeedbackBuffersEXT: stash (gpu_addr, offset, size)
-+ *     for each slot into cmdbuf->state.gfx.xfb.buffers[].
-+ *   - vkCmdBeginTransformFeedbackEXT: set cmdbuf->state.gfx.xfb.active = true.
-+ *     The next draw's set_gfx_sysval calls then re-emit vs.xfb_address[].
-+ *   - vkCmdEndTransformFeedbackEXT: set active = false.
-+ *
-+ * Counter buffers (firstCounterBuffer/counterBufferCount/pCounterBuffers/
-+ * pCounterBufferOffsets) are accepted by API but ignored — v1 doesn't
-+ * support pause/resume. transformFeedbackDraw is advertised as false.
-+ *
-+ * Per-draw integration: panvk_vX_cmd_draw.c reads cmdbuf->state.gfx.xfb
-+ * and populates vs.xfb_address[i] for shader use. The pan_nir_lower_xfb
-+ * pass in panvk_vX_shader.c emits nir_load_xfb_address(i) which lowers
-+ * (via panvk_vX_shader.c sysval handler) to a load from the per-draw
-+ * sysval push area.
-+ */
-+
-+#include "vk_log.h"
-+#include "util/log.h"
-+
-+#include "panvk_cmd_buffer.h"
-+#include "panvk_cmd_draw.h"
-+#include "panvk_buffer.h"
-+#include "panvk_entrypoints.h"
-+
-+VKAPI_ATTR void VKAPI_CALL
-+panvk_per_arch(CmdBindTransformFeedbackBuffersEXT)(
-+   VkCommandBuffer commandBuffer,
-+   uint32_t firstBinding,
-+   uint32_t bindingCount,
-+   const VkBuffer *pBuffers,
-+   const VkDeviceSize *pOffsets,
-+   const VkDeviceSize *pSizes)
-+{
-+   VK_FROM_HANDLE(panvk_cmd_buffer, cmdbuf, commandBuffer);
-+   struct panvk_cmd_graphics_state *gfx = &cmdbuf->state.gfx;
-+
-+   for (uint32_t i = 0; i < bindingCount; i++) {
-+      uint32_t slot = firstBinding + i;
-+      if (slot >= 4)
-+         continue;
-+
-+      VK_FROM_HANDLE(panvk_buffer, buf, pBuffers[i]);
-+      gfx->xfb.buffers[slot].addr = panvk_buffer_gpu_ptr(buf, 0);
-+      gfx->xfb.buffers[slot].offset = pOffsets[i];
-+      gfx->xfb.buffers[slot].size =
-+         (pSizes != NULL && pSizes[i] != VK_WHOLE_SIZE)
-+            ? pSizes[i]
-+            : (buf->vk.size - pOffsets[i]);
-+   }
-+
-+   if (firstBinding + bindingCount > gfx->xfb.buffer_count)
-+      gfx->xfb.buffer_count = firstBinding + bindingCount;
-+}
-+
-+VKAPI_ATTR void VKAPI_CALL
-+panvk_per_arch(CmdBeginTransformFeedbackEXT)(
-+   VkCommandBuffer commandBuffer,
-+   uint32_t firstCounterBuffer,
-+   uint32_t counterBufferCount,
-+   const VkBuffer *pCounterBuffers,
-+   const VkDeviceSize *pCounterBufferOffsets)
-+{
-+   VK_FROM_HANDLE(panvk_cmd_buffer, cmdbuf, commandBuffer);
-+   struct panvk_cmd_graphics_state *gfx = &cmdbuf->state.gfx;
-+
-+   /* Counter buffers are ignored in v1 (transformFeedbackDraw = false in
-+    * VkPhysicalDeviceTransformFeedbackPropertiesEXT, see
-+    * panvk_vX_physical_device.c). An app that passes none is unaffected;
-+    * warn loudly if it does pass them so we do not silently produce wrong
-+    * capture state. */
-+   (void)firstCounterBuffer;
-+   (void)pCounterBufferOffsets;
-+   if (counterBufferCount > 0 && pCounterBuffers != NULL) {
-+      mesa_logw("panvk: CmdBeginTransformFeedbackEXT: counter buffers not "
-+                "implemented (transformFeedbackDraw=false); XFB resume will "
-+                "restart at buffer offset 0");
-+   }
-+
-+   gfx->xfb.active = true;
-+   /* Per-draw set_gfx_sysval picks up the change automatically — no
-+    * explicit dirty marking required (set_gfx_sysval uses memcmp +
-+    * BITSET to detect state diffs and re-emit sysvals). */
-+}
-+
-+VKAPI_ATTR void VKAPI_CALL
-+panvk_per_arch(CmdEndTransformFeedbackEXT)(
-+   VkCommandBuffer commandBuffer,
-+   uint32_t firstCounterBuffer,
-+   uint32_t counterBufferCount,
-+   const VkBuffer *pCounterBuffers,
-+   const VkDeviceSize *pCounterBufferOffsets)
-+{
-+   VK_FROM_HANDLE(panvk_cmd_buffer, cmdbuf, commandBuffer);
-+   struct panvk_cmd_graphics_state *gfx = &cmdbuf->state.gfx;
-+
-+   (void)firstCounterBuffer;
-+   (void)counterBufferCount;
-+   (void)pCounterBuffers;
-+   (void)pCounterBufferOffsets;
-+
-+   gfx->xfb.active = false;
-+}
@@ -1,629 +0,0 @@
-diff -urN a/src/panfrost/vulkan/meson.build b/src/panfrost/vulkan/meson.build
---- a/src/panfrost/vulkan/meson.build	2026-05-21 14:04:02.529474145 +0200
-+++ b/src/panfrost/vulkan/meson.build	2026-05-21 14:04:04.106755486 +0200
-@@ -123,6 +123,7 @@
-   'panvk_vX_nir_lower_input_attachment_loads.c',
-   'panvk_vX_sampler.c',
-   'panvk_vX_shader.c',
-+  'panvk_vX_xfb_lower.c',
-   sha1_h,
- ]
- 
-diff -urN a/src/panfrost/vulkan/panvk_shader.h b/src/panfrost/vulkan/panvk_shader.h
---- a/src/panfrost/vulkan/panvk_shader.h	2026-05-21 14:04:02.525251986 +0200
-+++ b/src/panfrost/vulkan/panvk_shader.h	2026-05-21 14:04:04.084251800 +0200
-@@ -154,6 +154,8 @@
-       /* aligned_u64 attribute below inserts the 4-byte alignment gap
-        * after num_vertices automatically — no explicit pad needed. */
-       aligned_u64 xfb_address[4];  /* iter13: 4 transform feedback buffer base addresses */
-+      uint32_t xfb_topology;       /* iter17: panvk_xfb_topology enum value */
-+      uint32_t xfb_output_count;   /* iter17: per-instance output verts after decomp */
- #endif
-       int32_t first_vertex;
-       int32_t base_instance;
-@@ -569,4 +571,76 @@
-    struct pan_compute_dim local_size, const void *bin_ptr, size_t bin_size,
-    struct panvk_shader **shader_out);
- 
-+
-+#if PAN_ARCH < 9
-+/* iter17: encoding for vs.xfb_topology sysval. Maps VkPrimitiveTopology values
-+ * we need to distinguish at shader runtime for XFB capture. LIST topologies
-+ * use the iter13 single-store fast path; non-LIST need per-vertex decomposition. */
-+enum panvk_xfb_topology {
-+   PANVK_XFB_TOPO_LIST            = 0,
-+   PANVK_XFB_TOPO_LINE_STRIP      = 1,
-+   PANVK_XFB_TOPO_TRI_STRIP       = 2,
-+   PANVK_XFB_TOPO_TRI_FAN         = 3,
-+   PANVK_XFB_TOPO_LINE_LIST_ADJ   = 4,
-+   PANVK_XFB_TOPO_LINE_STRIP_ADJ  = 5,
-+   PANVK_XFB_TOPO_TRI_LIST_ADJ    = 6,
-+   PANVK_XFB_TOPO_TRI_STRIP_ADJ   = 7,
-+};
-+
-+#include "panvk_macros.h"
-+struct nir_shader;
-+bool panvk_per_arch(nir_lower_xfb)(struct nir_shader *nir);
-+
-+/* Map VkPrimitiveTopology to panvk_xfb_topology enum (driver-side helper). */
-+static inline uint32_t
-+panvk_vk_topology_to_xfb_enum(VkPrimitiveTopology topo)
-+{
-+   switch (topo) {
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
-+      return PANVK_XFB_TOPO_LINE_STRIP;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
-+      return PANVK_XFB_TOPO_TRI_STRIP;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
-+      return PANVK_XFB_TOPO_TRI_FAN;
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
-+      return PANVK_XFB_TOPO_LINE_LIST_ADJ;
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
-+      return PANVK_XFB_TOPO_LINE_STRIP_ADJ;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
-+      return PANVK_XFB_TOPO_TRI_LIST_ADJ;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
-+      return PANVK_XFB_TOPO_TRI_STRIP_ADJ;
-+   case VK_PRIMITIVE_TOPOLOGY_POINT_LIST:
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST:
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST:
-+   default:
-+      return PANVK_XFB_TOPO_LIST;
-+   }
-+}
-+
-+/* Compute the per-instance output vertex count for a given (topology, input count). */
-+static inline uint32_t
-+panvk_xfb_output_count(VkPrimitiveTopology topo, uint32_t input_count)
-+{
-+   switch (topo) {
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
-+      return input_count >= 1 ? 2u * (input_count - 1u) : 0u;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
-+      return input_count >= 2 ? 3u * (input_count - 2u) : 0u;
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
-+      return (input_count / 4u) * 2u;
-+   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
-+      return input_count >= 3 ? 2u * (input_count - 3u) : 0u;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
-+      return (input_count / 6u) * 3u;
-+   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
-+      return input_count >= 6 ? 3u * (input_count / 2u - 2u) : 0u;
-+   default:
-+      return input_count;  /* LIST topologies: 1:1 mapping */
-+   }
-+}
-+#endif
-+
-+
- #endif
-diff -urN a/src/panfrost/vulkan/panvk_vX_cmd_draw.c b/src/panfrost/vulkan/panvk_vX_cmd_draw.c
---- a/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-05-21 14:04:02.528576354 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-05-21 14:04:04.091357598 +0200
-@@ -727,6 +727,20 @@
-    /* iter13: VK_EXT_transform_feedback sysvals — always set (per draw),
-     * reflect bound XFB state. set_gfx_sysval is a no-op if value unchanged. */
-    set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, info->vertex.count);
-+
-+   /* iter17: XFB primitive-decomposition sysvals.
-+    * xfb_topology = enum value for the current bound topology.
-+    * xfb_output_count = per-instance output vertex count after decomposition.
-+    * For LIST topologies, output_count == input vertex count and the shader
-+    * takes the iter13 single-store fast path. */
-+   {
-+      VkPrimitiveTopology vk_topo =
-+         cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology;
-+      uint32_t topo_enum = panvk_vk_topology_to_xfb_enum(vk_topo);
-+      uint32_t out_count = panvk_xfb_output_count(vk_topo, info->vertex.count);
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_topology, topo_enum);
-+      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_output_count, out_count);
-+   }
-    {
-       const struct panvk_cmd_graphics_state *_gfx = &cmdbuf->state.gfx;
-       /* iter13: default each XFB buffer address to PAN_SHADER_OOB_ADDRESS
-diff -urN a/src/panfrost/vulkan/panvk_vX_shader.c b/src/panfrost/vulkan/panvk_vX_shader.c
---- a/src/panfrost/vulkan/panvk_vX_shader.c	2026-05-21 14:04:02.527576494 +0200
-+++ b/src/panfrost/vulkan/panvk_vX_shader.c	2026-05-21 14:04:04.098356619 +0200
-@@ -895,7 +895,10 @@
-        nir->info.has_transform_feedback_varyings) {
-       NIR_PASS(_, nir, nir_opt_constant_folding);
-       NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
--      NIR_PASS(_, nir, pan_nir_lower_xfb);
-+      /* iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
-+       * primitive decomposition for non-LIST topologies. Single-store LIST
-+       * fast path matches iter13 behavior. */
-+      NIR_PASS(_, nir, panvk_per_arch(nir_lower_xfb));
-    }
- #endif
- }
-diff -urN a/src/panfrost/vulkan/panvk_vX_xfb_lower.c b/src/panfrost/vulkan/panvk_vX_xfb_lower.c
---- a/src/panfrost/vulkan/panvk_vX_xfb_lower.c	1970-01-01 01:00:00.000000000 +0100
-+++ b/src/panfrost/vulkan/panvk_vX_xfb_lower.c	2026-05-21 14:04:04.115354242 +0200
-@@ -0,0 +1,486 @@
-+/*
-+ * Copyright © 2026 mfritsche / claude-noether
-+ * SPDX-License-Identifier: MIT
-+ *
-+ * iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
-+ * primitive decomposition for transform_feedback on non-LIST topologies
-+ * (TRIANGLE_STRIP/FAN, LINE_STRIP, *_WITH_ADJACENCY).
-+ *
-+ * Approach: emit a topology dispatch at the start of each store_output
-+ * lowering. The shader reads vs.xfb_topology sysval at runtime and branches
-+ * into per-topology emission logic. For each affected topology, the lowered
-+ * code emits guarded conditional stores — one per primitive this vertex
-+ * contributes to, computing the output buffer position via primitive index
-+ * and slot within the decomposed primitive.
-+ *
-+ * For LIST topologies (POINT/LINE/TRIANGLE LIST), takes a fast path that
-+ * matches iter13's single-store behavior.
-+ *
-+ * For TRIANGLE_FAN, the central vertex (v=0) contributes to ALL primitives
-+ * as slot 2 — handled via a NIR loop bounded by num_vertices.
-+ *
-+ * See ~/src/panvk-bifrost/iter17/phase{0,1,2}_*.md for full design context.
-+ */
-+
-+#include "panvk_macros.h"
-+
-+#if PAN_ARCH < 9
-+
-+#include "panvk_shader.h"
-+
-+#include "compiler/nir/nir_builder.h"
-+#include "pan_nir.h"
-+
-+#include <vulkan/vulkan_core.h>
-+
-+/* ----- Address arithmetic ----- */
-+
-+static nir_def *
-+xfb_store_addr(nir_builder *b, nir_def *buf, nir_def *out_idx,
-+               uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *byte_off = nir_iadd_imm(b,
-+      nir_imul_imm(b, out_idx, stride), offset_bytes);
-+   return nir_iadd(b, buf, nir_u2u64(b, byte_off));
-+}
-+
-+static void
-+emit_list_store(nir_builder *b, nir_def *buf, nir_def *output_count,
-+                nir_def *instance_id, nir_def *raw_vid, nir_def *value,
-+                uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *out_idx = nir_iadd(b,
-+      nir_imul(b, instance_id, output_count), raw_vid);
-+   nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
-+   nir_store_global(b, value, addr);
-+}
-+
-+static void
-+emit_prim_store(nir_builder *b, nir_def *buf, nir_def *output_count,
-+                nir_def *instance_id, nir_def *eligible,
-+                nir_def *prim_idx, nir_def *slot,
-+                uint32_t verts_per_prim,
-+                nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_push_if(b, eligible);
-+   {
-+      nir_def *out_idx = nir_iadd(b,
-+         nir_imul(b, instance_id, output_count),
-+         nir_iadd(b, nir_imul_imm(b, prim_idx, verts_per_prim), slot));
-+      nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
-+      nir_store_global(b, value, addr);
-+   }
-+   nir_pop_if(b, NULL);
-+}
-+
-+/* ----- Per-topology emission ----- */
-+
-+/* TRIANGLE_STRIP: vertex v contributes to prims v, v-1, v-2 (per eligibility). */
-+static void
-+emit_tri_strip(nir_builder *b, nir_def *v, nir_def *N,
-+               nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+               nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
-+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
-+
-+   /* Prim v, slot 0: v < N-2 */
-+   emit_prim_store(b, buf, output_count, instance_id,
-+      nir_ult(b, v, Nm2),
-+      v, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
-+
-+   /* Prim v-1, slot = 1 if prim even else 2: 1 <= v < N-1 */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -1);
-+      nir_def *parity = nir_iand_imm(b, prim, 1u);
-+      nir_def *slot = nir_iadd_imm(b, parity, 1);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 1)),
-+         nir_ult(b, v, Nm1));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, slot, 3, value, stride, offset_bytes);
-+   }
-+
-+   /* Prim v-2, slot = 2 if prim even else 1: 2 <= v < N */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -2);
-+      nir_def *parity = nir_iand_imm(b, prim, 1u);
-+      nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 2)),
-+         nir_ult(b, v, N));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, slot, 3, value, stride, offset_bytes);
-+   }
-+}
-+
-+/* LINE_STRIP: vertex v contributes to prim v slot 0 + prim v-1 slot 1. */
-+static void
-+emit_line_strip(nir_builder *b, nir_def *v, nir_def *N,
-+                nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+                nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
-+
-+   /* Prim v, slot 0: v < N-1 */
-+   emit_prim_store(b, buf, output_count, instance_id,
-+      nir_ult(b, v, Nm1),
-+      v, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
-+
-+   /* Prim v-1, slot 1: 1 <= v < N */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -1);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 1)),
-+         nir_ult(b, v, N));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
-+   }
-+}
-+
-+/* TRIANGLE_FAN: prim p emits {p+1, p+2, 0}.
-+ *   vertex v=0: contributes to ALL prims as slot 2 (loop required)
-+ *   vertex v>=1: contributes to prim v-1 as slot 0 (if 1 <= v <= N-2)
-+ *   vertex v>=2: contributes to prim v-2 as slot 1 (if 2 <= v <= N-1)
-+ */
-+static void
-+emit_tri_fan(nir_builder *b, nir_def *v, nir_def *N,
-+             nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+             nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
-+   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
-+
-+   /* Prim v-1, slot 0: 1 <= v < N-1 */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -1);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 1)),
-+         nir_ult(b, v, Nm1));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
-+   }
-+
-+   /* Prim v-2, slot 1: 2 <= v < N */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -2);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 2)),
-+         nir_ult(b, v, N));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, nir_imm_int(b, 1), 3, value, stride, offset_bytes);
-+   }
-+
-+   /* Central vertex (v == 0): loop over all prims, write to slot 2. */
-+   nir_push_if(b, nir_ieq_imm(b, v, 0));
-+   {
-+      nir_variable *p_var = nir_local_variable_create(b->impl,
-+         glsl_uint_type(), "fan_p");
-+      nir_store_var(b, p_var, nir_imm_int(b, 0), 0x1);
-+      nir_push_loop(b);
-+      {
-+         nir_def *p = nir_load_var(b, p_var);
-+         nir_push_if(b, nir_uge(b, p, Nm2));
-+         {
-+            nir_jump(b, nir_jump_break);
-+         }
-+         nir_pop_if(b, NULL);
-+
-+         nir_def *out_idx = nir_iadd(b,
-+            nir_imul(b, instance_id, output_count),
-+            nir_iadd_imm(b, nir_imul_imm(b, p, 3), 2));
-+         nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
-+         nir_store_global(b, value, addr);
-+
-+         nir_store_var(b, p_var, nir_iadd_imm(b, p, 1), 0x1);
-+      }
-+      nir_pop_loop(b, NULL);
-+   }
-+   nir_pop_if(b, NULL);
-+}
-+
-+/* LINE_LIST_WITH_ADJACENCY: 4-vertex groups [4i..4i+3]; output {4i+1, 4i+2}.
-+ *   v contributes if v%4 == 1: prim v/4 slot 0
-+ *   v contributes if v%4 == 2: prim v/4 slot 1
-+ */
-+static void
-+emit_line_list_adj(nir_builder *b, nir_def *v, nir_def *N,
-+                   nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+                   nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   (void)N; /* eligibility is mod-based, not range-based */
-+   nir_def *vmod4 = nir_iand_imm(b, v, 3u);
-+   nir_def *prim = nir_ushr_imm(b, v, 2);  /* v / 4 */
-+
-+   emit_prim_store(b, buf, output_count, instance_id,
-+      nir_ieq_imm(b, vmod4, 1),
-+      prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
-+
-+   emit_prim_store(b, buf, output_count, instance_id,
-+      nir_ieq_imm(b, vmod4, 2),
-+      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
-+}
-+
-+/* LINE_STRIP_WITH_ADJACENCY: N vertices give N-3 prims; prim p emits {p+1, p+2}.
-+ *   v contributes to prim v-1 slot 0 (1 <= v <= N-3)
-+ *   v contributes to prim v-2 slot 1 (2 <= v <= N-2)
-+ */
-+static void
-+emit_line_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
-+                    nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+                    nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
-+   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
-+
-+   /* Prim v-1, slot 0: 1 <= v <= N-3 ⇔ v >= 1 AND v < N-2 */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -1);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 1)),
-+         nir_ult(b, v, Nm2));
-+      /* Prims run 0..N-4, so v = N-2 must not store to slot 0. */
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
-+   }
-+
-+   /* Prim v-2, slot 1: 2 <= v <= N-2 ⇔ v >= 2 AND v < N-1 */
-+   {
-+      nir_def *prim = nir_iadd_imm(b, v, -2);
-+      nir_def *eligible = nir_iand(b,
-+         nir_uge(b, v, nir_imm_int(b, 2)),
-+         nir_ult(b, v, Nm1));
-+      emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
-+   }
-+}
-+
-+/* TRIANGLE_LIST_WITH_ADJACENCY: 6-vertex groups; output {6i, 6i+2, 6i+4}.
-+ *   v contributes if v%6 == 0: prim v/6 slot 0
-+ *   v contributes if v%6 == 2: prim v/6 slot 1
-+ *   v contributes if v%6 == 4: prim v/6 slot 2
-+ */
-+static void
-+emit_tri_list_adj(nir_builder *b, nir_def *v, nir_def *N,
-+                  nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+                  nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   (void)N;
-+   nir_def *vmod6 = nir_umod_imm(b, v, 6);
-+   nir_def *prim = nir_udiv_imm(b, v, 6);
-+
-+   for (uint32_t slot = 0; slot < 3; slot++) {
-+      emit_prim_store(b, buf, output_count, instance_id,
-+         nir_ieq_imm(b, vmod6, slot * 2),
-+         prim, nir_imm_int(b, slot), 3, value, stride, offset_bytes);
-+   }
-+}
-+
-+/* TRIANGLE_STRIP_WITH_ADJACENCY: prim i emits:
-+ *   even i: {2i, 2i+2, 2i+4}    (slots 0, 1, 2 ← input indices 2i, 2i+2, 2i+4)
-+ *   odd  i: {2i, 2i+4, 2i+2}    (slots 0, 1, 2 ← input indices 2i, 2i+4, 2i+2)
-+ *
-+ * Only EVEN input vertices contribute (since all output indices are 2*something).
-+ * For even input v:
-+ *   prim v/2 slot 0 (always, if v/2 < N/2-2)
-+ *   prim (v-2)/2 slot 1 if (v-2)/2 even, slot 2 if odd   (when v >= 2)
-+ *   prim (v-4)/2 slot 2 if (v-4)/2 even, slot 1 if odd   (when v >= 4)
-+ */
-+static void
-+emit_tri_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
-+                   nir_def *buf, nir_def *output_count, nir_def *instance_id,
-+                   nir_def *value, uint16_t stride, uint16_t offset_bytes)
-+{
-+   /* Bail for odd input vertices — they never contribute. */
-+   nir_def *v_is_even = nir_ieq_imm(b, nir_iand_imm(b, v, 1u), 0);
-+   nir_push_if(b, v_is_even);
-+   {
-+      nir_def *N_half = nir_ushr_imm(b, N, 1);
-+      nir_def *max_prim = nir_iadd_imm(b, N_half, -2);  /* N/2 - 2 */
-+      nir_def *v_half = nir_ushr_imm(b, v, 1);
-+
-+      /* Prim v/2 slot 0: v/2 < N/2 - 2 */
-+      emit_prim_store(b, buf, output_count, instance_id,
-+         nir_ult(b, v_half, max_prim),
-+         v_half, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
-+
-+      /* Prim (v-2)/2 = v/2 - 1: v >= 2 AND prim < N/2-2 */
-+      {
-+         nir_def *prim = nir_iadd_imm(b, v_half, -1);
-+         nir_def *parity = nir_iand_imm(b, prim, 1u);
-+         nir_def *slot = nir_iadd_imm(b, parity, 1);  /* even→1, odd→2 */
-+         nir_def *eligible = nir_iand(b,
-+            nir_uge(b, v, nir_imm_int(b, 2)),
-+            nir_ult(b, prim, max_prim));
-+         emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                         prim, slot, 3, value, stride, offset_bytes);
-+      }
-+
-+      /* Prim (v-4)/2 = v/2 - 2: v >= 4 AND prim < N/2-2 */
-+      {
-+         nir_def *prim = nir_iadd_imm(b, v_half, -2);
-+         nir_def *parity = nir_iand_imm(b, prim, 1u);
-+         nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity);  /* even→2, odd→1 */
-+         nir_def *eligible = nir_iand(b,
-+            nir_uge(b, v, nir_imm_int(b, 4)),
-+            nir_ult(b, prim, max_prim));
-+         emit_prim_store(b, buf, output_count, instance_id, eligible,
-+                         prim, slot, 3, value, stride, offset_bytes);
-+      }
-+   }
-+   nir_pop_if(b, NULL);
-+}
-+
-+/* ----- Main lowering: per store_output XFB channel ----- */
-+
-+static void
-+lower_xfb_output_iter17(nir_builder *b, nir_intrinsic_instr *intr,
-+                        unsigned channel_idx, unsigned num_components,
-+                        unsigned buffer, unsigned offset_words)
-+{
-+   assert(buffer < MAX_XFB_BUFFERS);
-+   assert(nir_intrinsic_component(intr) == 0);
-+
-+   uint16_t stride = b->shader->info.xfb_stride[buffer] * 4;
-+   assert(stride != 0);
-+   uint16_t offset_bytes = offset_words * 4;
-+
-+   BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_VERTEX_ID_ZERO_BASE);
-+   BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_INSTANCE_ID);
-+
-+   nir_def *topology = load_sysval(b, graphics, 32, vs.xfb_topology);
-+   nir_def *out_count = load_sysval(b, graphics, 32, vs.xfb_output_count);
-+   nir_def *N = nir_load_num_vertices(b);
-+   nir_def *v = nir_load_raw_vertex_id_pan(b);
-+   nir_def *instance = nir_load_instance_id(b);
-+   nir_def *buf = nir_load_xfb_address(b, 64, .base = buffer);
-+
-+   nir_def *src = intr->src[0].ssa;
-+   nir_component_mask_t mask = nir_component_mask(num_components);
-+   nir_def *value = nir_channels(b, src, mask << channel_idx);
-+
-+   /* Topology dispatch ladder. LIST first (fast path). */
-+   nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LIST));
-+   {
-+      emit_list_store(b, buf, out_count, instance, v, value,
-+                      stride, offset_bytes);
-+   }
-+   nir_push_else(b, NULL);
-+   {
-+      /* iter17 Janet Finding 3: gate all non-LIST emission on
-+       * output_count > 0. For degenerate input counts (N < min required
-+       * for the topology), output_count is 0 and we must emit NO stores
-+       * — otherwise N-2 / N-3 / etc. arithmetic underflows in the
-+       * eligibility predicates and we falsely fire stores. */
-+      nir_push_if(b, nir_ult(b, nir_imm_int(b, 0), out_count));
-+      {
-+      nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_STRIP));
-+      {
-+         emit_tri_strip(b, v, N, buf, out_count, instance, value,
-+                        stride, offset_bytes);
-+      }
-+      nir_push_else(b, NULL);
-+      {
-+         nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP));
-+         {
-+            emit_line_strip(b, v, N, buf, out_count, instance, value,
-+                            stride, offset_bytes);
-+         }
-+         nir_push_else(b, NULL);
-+         {
-+            nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_FAN));
-+            {
-+               emit_tri_fan(b, v, N, buf, out_count, instance, value,
-+                            stride, offset_bytes);
-+            }
-+            nir_push_else(b, NULL);
-+            {
-+               nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_LIST_ADJ));
-+               {
-+                  emit_line_list_adj(b, v, N, buf, out_count, instance, value,
-+                                     stride, offset_bytes);
-+               }
-+               nir_push_else(b, NULL);
-+               {
-+                  nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP_ADJ));
-+                  {
-+                     emit_line_strip_adj(b, v, N, buf, out_count, instance, value,
-+                                         stride, offset_bytes);
-+                  }
-+                  nir_push_else(b, NULL);
-+                  {
-+                     nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_LIST_ADJ));
-+                     {
-+                        emit_tri_list_adj(b, v, N, buf, out_count, instance, value,
-+                                          stride, offset_bytes);
-+                     }
-+                     nir_push_else(b, NULL);
-+                     {
-+                        /* TRI_STRIP_ADJ — last case */
-+                        emit_tri_strip_adj(b, v, N, buf, out_count, instance, value,
-+                                           stride, offset_bytes);
-+                     }
-+                     nir_pop_if(b, NULL);
-+                  }
-+                  nir_pop_if(b, NULL);
-+               }
-+               nir_pop_if(b, NULL);
-+            }
-+            nir_pop_if(b, NULL);
-+         }
-+         nir_pop_if(b, NULL);
-+      }
-+      nir_pop_if(b, NULL);
-+      }
-+      nir_pop_if(b, NULL);  /* Janet Finding 3: close output_count > 0 guard */
-+   }
-+   nir_pop_if(b, NULL);
-+}
-+
-+/* Mirror of pan_nir_lower_xfb's lower_xfb: load_vertex_id rewrite +
-+ * dispatch store_output through our topology-aware emission. */
-+static bool
-+lower_xfb_iter17(nir_builder *b, nir_intrinsic_instr *intr,
-+                 UNUSED void *data)
-+{
-+   if (intr->intrinsic == nir_intrinsic_load_vertex_id) {
-+      b->cursor = nir_instr_remove(&intr->instr);
-+      nir_def *repl = nir_iadd(b, nir_load_raw_vertex_id_pan(b),
-+                               nir_load_raw_vertex_offset_pan(b));
-+      nir_def_rewrite_uses(&intr->def, repl);
-+      return true;
-+   }
-+
-+   if (intr->intrinsic != nir_intrinsic_store_output)
-+      return false;
-+
-+   bool progress = false;
-+   b->cursor = nir_before_instr(&intr->instr);
-+
-+   /* io_xfb has only out[0,1]; the other 2 channels are in io_xfb2.
-+    * Outer loop selects which annotation; inner picks which channel. */
-+   for (unsigned i = 0; i < 2; ++i) {
-+      nir_io_xfb xfb = i ? nir_intrinsic_io_xfb2(intr)
-+                         : nir_intrinsic_io_xfb(intr);
-+      for (unsigned j = 0; j < 2; ++j) {
-+         if (!xfb.out[j].num_components)
-+            continue;
-+         lower_xfb_output_iter17(b, intr, i * 2 + j, xfb.out[j].num_components,
-+                                 xfb.out[j].buffer, xfb.out[j].offset);
-+         progress = true;
-+      }
-+   }
-+
-+   if (progress)
-+      nir_instr_remove(&intr->instr);
-+   return progress;
-+}
-+
-+bool
-+panvk_per_arch(nir_lower_xfb)(nir_shader *nir)
-+{
-+   return nir_shader_intrinsics_pass(
-+      nir, lower_xfb_iter17, nir_metadata_control_flow, NULL);
-+}
-+
-+#endif /* PAN_ARCH < 9 */
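The per-vertex scatter used by the removed panvk_vX_xfb_lower.c can be cross-checked on the host against a plain per-primitive walk. What follows is a minimal sketch under stated assumptions, not driver code: the `topo` enum and the `out_count`, `gather`, `scatter` and `models_agree` helpers are hypothetical names introduced here, only TRIANGLE_STRIP and LINE_STRIP are modelled, and the strip vertex orderings are the ones described in the patch comments.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the patch's topology enum (two cases only). */
enum topo { TOPO_TRI_STRIP, TOPO_LINE_STRIP };

/* Decomposed output vertex count for n input vertices. */
static uint32_t out_count(enum topo t, uint32_t n)
{
   if (t == TOPO_TRI_STRIP)
      return n >= 2 ? 3u * (n - 2u) : 0u;
   return n >= 1 ? 2u * (n - 1u) : 0u;
}

/* Reference: walk primitives in order and record the source vertex of
 * every output slot. Returns the number of entries written. */
static uint32_t gather(enum topo t, uint32_t n, uint32_t *out)
{
   uint32_t k = 0;
   if (t == TOPO_TRI_STRIP) {
      for (uint32_t p = 0; p + 2 < n; p++) {
         out[k++] = p;
         out[k++] = p + 1 + (p & 1u);   /* odd prims swap slots 1 and 2 */
         out[k++] = p + 2 - (p & 1u);
      }
   } else {
      for (uint32_t p = 0; p + 1 < n; p++) {
         out[k++] = p;
         out[k++] = p + 1;
      }
   }
   return k;
}

/* Model of the lowering: each vertex v decides which slots it owns. */
static void scatter(enum topo t, uint32_t n, uint32_t *out)
{
   uint32_t verts = (t == TOPO_TRI_STRIP) ? 3u : 2u;
   uint32_t prims = out_count(t, n) / verts;

   for (uint32_t v = 0; v < n; v++) {
      /* Vertex v opens primitive v at slot 0. */
      if (v < prims)
         out[verts * v] = v;

      /* Primitive v-1: slot 1 (line strip, even tri prims) or 2 (odd). */
      if (v >= 1 && v - 1 < prims) {
         uint32_t p = v - 1;
         uint32_t slot = (t == TOPO_TRI_STRIP) ? 1u + (p & 1u) : 1u;
         out[verts * p + slot] = v;
      }

      /* Primitive v-2 (triangles only): slot 2 on even prims, 1 on odd. */
      if (t == TOPO_TRI_STRIP && v >= 2 && v - 2 < prims) {
         uint32_t p = v - 2;
         out[3u * p + 2u - (p & 1u)] = v;
      }
   }
}

/* Returns 1 when both models produce the same buffer for n <= 16. */
static int models_agree(enum topo t, uint32_t n)
{
   uint32_t ref[64], got[64];
   if (n > 16)
      return 0;
   memset(got, 0xff, sizeof(got));
   uint32_t k = gather(t, n, ref);
   if (k != out_count(t, n))
      return 0;
   scatter(t, n, got);
   return memcmp(ref, got, k * sizeof(uint32_t)) == 0;
}
```

Bounding every store by the primitive count, rather than by N minus a constant, also keeps degenerate vertex counts from firing stores, which is the case the `output_count > 0` guard in the lowering is there to cover.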
@@ -1,50 +0,0 @@
-From: marfrit-packages noether <claude-noether@reauktion.de>
-Subject: [PATCH] panvk: report fragmentStoresAndAtomics = true on Bifrost
-
-Backports Mesa main's unconditional advertisement of
-fragmentStoresAndAtomics for panvk (snapshot ref: src/panfrost/vulkan/
-panvk_vX_physical_device.c at commit-time 2026-05-06; the line reads
-`.fragmentStoresAndAtomics = true,` on main with no PAN_ARCH gate).
-
-Motivation: Chromium Dawn's WebGPU initializer in
-third_party/dawn/src/dawn/native/vulkan/PhysicalDeviceVk.cpp:250
-unconditionally rejects any Vulkan adapter that doesn't advertise this
-feature, causing Dawn to fall back to the SwiftShader CPU adapter
-on PineTab2 / RK3566 / Mali-G52 r1 MC1 (PAN_ARCH 7). With this patch the
-device advertises true, satisfying Dawn's gate. Tracked at
-https://git.reauktion.de/marfrit/panvk-bifrost/issues/2.
-
-The disjunction with `instance->force_enable_shader_atomics` is kept
-as a marker only: the right-hand operand is dead code (`true || X`
-always evaluates to `true`), but it keeps the DRI option
-`pan_force_enable_shader_atomics` visibly wired so future rebases or
-downstream debugging can see the link to the runtime knob.
-
-Caveat: the existing DRI option's description in src/util/driconf.h
-still labels this as "may not work reliably and is for debug purposes
-only". Mesa main's choice to ship it as default-on for all panvk
-architectures (including Bifrost, which is non-conformant per the
-PAN_I_WANT_A_BROKEN_VULKAN_DRIVER gate) reflects an upstream judgment
-that the practical risk is acceptable. Verify-before-ship for this
-package: dEQP-VK.glsl.atomic_operations.* + dEQP-VK.image.store.*
-deltas vs the r4 baseline must show no new fails. Pass counts may rise
-(tests that previously NotSupported now run); the load-bearing line is
-the Failed column staying at zero.
-
----
- src/panfrost/vulkan/panvk_vX_physical_device.c | 3 +--
- 1 file changed, 1 insertion(+), 2 deletions(-)
-
-diff --git a/src/panfrost/vulkan/panvk_vX_physical_device.c b/src/panfrost/vulkan/panvk_vX_physical_device.c
---- a/src/panfrost/vulkan/panvk_vX_physical_device.c
-+++ b/src/panfrost/vulkan/panvk_vX_physical_device.c
-@@ -280,8 +280,7 @@
-       .vertexPipelineStoresAndAtomics =
-          (PAN_ARCH >= 13 && instance->enable_vertex_pipeline_stores_atomics) ||
-          instance->force_enable_shader_atomics,
--      .fragmentStoresAndAtomics =
--         (PAN_ARCH >= 10) || instance->force_enable_shader_atomics,
-+      .fragmentStoresAndAtomics = true || instance->force_enable_shader_atomics,
-       .shaderTessellationAndGeometryPointSize = false,
-       .shaderImageGatherExtended = true,
-       .shaderStorageImageExtendedFormats = true,
@@ -1,51 +0,0 @@
-From: marfrit-packages noether <claude-noether@reauktion.de>
-Subject: [PATCH] panvk: advertise VK_EXT_legacy_dithering on Bifrost
-
-Backports Mesa main's flip — vanilla 26.0.6 doesn't have the extension
-in the panvk advertisement list; main does (line 172 / 647 on snapshot
-617da94, 2026-05-06).
-
-VK_EXT_legacy_dithering exposes the classic OpenGL-style dithering
-behavior to Vulkan apps. Pure-software composition; no new HW path.
-ARM's own libmali driver release r51p0 (BXODROIDN2PL, Aug 2024) lists
-this extension in its Vulkan implementation for ODROID-N2 boards
-using the same Mali-G52 architecture family — confirms ARM ships it
-for Mali-G52-class hardware.
-
-Consumer benefit: dithering matters for low-bit-depth framebuffers
-(RGB565 / RGB5A1 — common on portable / battery-saving renders)
-where banding is visible. DXVK / vkd3d-proton both opt in when
-available.
-
-Verify-before-ship: vulkaninfo lists the extension and
-VkPhysicalDeviceLegacyDitheringFeaturesEXT.legacyDithering == true.
-
-Cross-refs:
-  - marfrit/panvk-bifrost research/r6_r7_mali_g52_feature_audit_2026-05-24.md
-  - ARM blob r51p0 strings dump (in-blob extension confirmed)
-
----
- src/panfrost/vulkan/panvk_vX_physical_device.c | 5 +++++
- 1 file changed, 5 insertions(+)
-
-diff --git a/src/panfrost/vulkan/panvk_vX_physical_device.c b/src/panfrost/vulkan/panvk_vX_physical_device.c
---- a/src/panfrost/vulkan/panvk_vX_physical_device.c
-+++ b/src/panfrost/vulkan/panvk_vX_physical_device.c
-@@ -156,6 +156,7 @@
-       .EXT_image_drm_format_modifier = true,
-       .EXT_image_robustness = true,
-       .EXT_index_type_uint8 = true,
-+      .EXT_legacy_dithering = true,
-       .EXT_line_rasterization = true,
-       .EXT_load_store_op_none = true,
-       .EXT_non_seamless_cube_map = true,
-@@ -552,6 +553,9 @@
-
-       /* VK_EXT_multisampled_render_to_single_sampled */
-       .multisampledRenderToSingleSampled = true,
-+
-+      /* VK_EXT_legacy_dithering */
-+      .legacyDithering = true,
-    };
- }
-
@@ -1,103 +0,0 @@
-From: marfrit-packages noether <claude-noether@reauktion.de>
-Subject: [PATCH] panvk-bifrost: fix XFB store channel-extract for packed varyings
-
-iter19 — fixes a reliable SIGSEGV during vkCreateGraphicsPipeline on any
-shader that uses XFB-bound varyings declared with non-zero `layout
-(component=N)` qualifiers. Surfaced by
-dEQP-VK.transform_feedback.simple.holes_vert; backtrace lands 11 frames
-into libvulkan_panfrost.so called from `vkt::TransformFeedback::
-TransformFeedbackHolesInstance::iterate`.
-
-Root cause: `lower_xfb_output_iter17` (and upstream `lower_xfb_output`,
-which carries a `// TODO` on the same assertion) computes the source-
-channel mask as `mask << channel_idx`, where `channel_idx` is the
-varying-location component (0..3) but `src` only contains channels for
-the source-side range starting at `nir_intrinsic_component(intr)`. For
-`flat out float vegeta` declared with `component=2`, NIR emits
-`store_output src=<vec1>, component=2`, and the lowering computes
-`mask << 2` against a single-component src — out-of-range; the
-resulting malformed nir_def then segfaults inside downstream NIR
-constant-folding (`nir_constant_expressions.c::evaluate_*`).
-
-The assertion `assert(nir_intrinsic_component(intr) == 0)` was inherited
-from upstream `pan_nir_lower_xfb.c` as a documented `// TODO`; release
-builds (-DNDEBUG) elide it. The fix translates `channel_idx` to the
-source-channel space by subtracting `nir_intrinsic_component(intr)`
-before shifting the mask, and replaces the elided asserts with explicit
-release-mode guards (the patch closes the same release-mode-elision
-class as the original bug).
-
-Verified on PineTab2 (Mali-G52 r1 MC1, PAN_ARCH 7) against vulkan-cts
-1.3.10.0:
-  - holes_vert / holes_extra_draw_vert no longer SIGSEGV (now Fail on
-    color-check; that is a separate iter20 finding — the rasterized
-    varying gets removed alongside the XFB-bound one).
-  - basic_*: 36/36 Pass. depth_clip_*: 1 Pass + 4 NotSupported.
-    lines_or_triangles*: 16 NotSupported. 0 Fail across the full set.
-  - holes_geom / holes_extra_draw_geom remain NotSupported
-    (geometryShader not on G52) — unchanged.
-
-Caveat: max_output_components_64/_128/_256 were never reached on the
-r5 sweep (watchdog killed transform_feedback after the holes_vert
-crash). With this fix in place, those tests now run and surface
-*their own pre-existing* coredumps — confirmed on shipped r6 baseline
-too. They are NOT regressions from this patch; they are latent crashes
-unmasked by it. iter20+ territory.
-
-Phase 5 (2nd-model) review: APPROVE WITH CHANGES (non-blocking).
-Changes applied: release-mode defensive guards on both preconditions
-plus a dispatcher-side comment clarifying the i*2+j semantics.
-
-Cross-refs:
-  - iter19/phase{0,1,2,3}_holes_vert*.md in panvk-bifrost repo
-
----
- src/panfrost/vulkan/panvk_vX_xfb_lower.c | 24 +++++++++++++++++++++---
- 1 file changed, 21 insertions(+), 3 deletions(-)
-
-diff --git a/src/panfrost/vulkan/panvk_vX_xfb_lower.c b/src/panfrost/vulkan/panvk_vX_xfb_lower.c
-@@ -339,7 +339,20 @@
-                         unsigned buffer, unsigned offset_words)
- {
-    assert(buffer < MAX_XFB_BUFFERS);
--   assert(nir_intrinsic_component(intr) == 0);
-+
-+   /* iter19: nir_intrinsic_component(intr) is the source-channel base —
-+    * for a packed varying like `layout (location=0, component=2) flat out
-+    * float vegeta`, NIR emits store_output with component=2 and a single-
-+    * component src. The XFB iteration index `channel_idx` (0..3) is the
-+    * varying-location component, not the source channel. Translate by
-+    * subtracting the base before shifting the mask. Fixes the long-
-+    * standing `assert(nir_intrinsic_component(intr) == 0) // TODO` in
-+    * upstream pan_nir_lower_xfb that surfaces on holes_vert. */
-+   const unsigned base_comp = nir_intrinsic_component(intr);
-+   /* Defensive against release-build elision: this is precisely the
-+    * bug class the patch is fixing, so don't re-introduce it. */
-+   if (channel_idx < base_comp)
-+      return;
- 
-    uint16_t stride = b->shader->info.xfb_stride[buffer] * 4;
-    assert(stride != 0);
-@@ -357,7 +370,11 @@
- 
-    nir_def *src = intr->src[0].ssa;
-    nir_component_mask_t mask = nir_component_mask(num_components);
--   nir_def *value = nir_channels(b, src, mask << channel_idx);
-+   const unsigned src_channel = channel_idx - base_comp;
-+   /* Same defensive class as the channel_idx >= base_comp guard above. */
-+   if (src_channel + num_components > src->num_components)
-+      return;
-+   nir_def *value = nir_channels(b, src, mask << src_channel);
- 
-    /* Topology dispatch ladder. LIST first (fast path). */
-    nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LIST));
-@@ -465,6 +482,9 @@
-       for (unsigned j = 0; j < 2; ++j) {
-          if (!xfb.out[j].num_components)
-             continue;
-+         /* `i*2+j` is the varying-location component (0..3) — io_xfb covers
-+          * slots 0..1, io_xfb2 covers 2..3. The leaf translates this into
-+          * a source-channel index by subtracting nir_intrinsic_component(intr). */
-          lower_xfb_output_iter17(b, intr, i * 2 + j, xfb.out[j].num_components,
-                                  xfb.out[j].buffer, xfb.out[j].offset);
-          progress = true;
@@ -1,118 +0,0 @@
-From: marfrit-packages noether <claude-noether@reauktion.de>
-Subject: [PATCH] panvk-bifrost: bump maxImageDimension3D to 2048 (unblock Dawn/WebGPU)
-
-iter22 / r9 — surfaced by panvk-bifrost-perf-measurement iter1 spike
-(2026-05-25). Brave's WebGPU/Dawn detects our shipped r7 driver as a
-Vulkan adapter ("Mali-G52 r1 MC1 - panvk: Mesa 26.0.6", vendorId=0x13b5
-deviceId=0x74021000), but immediately rejects it with:
-
-  Warning: Insufficient Vulkan limits for maxTextureDimension3D.
-  VkPhysicalDeviceLimits::maxImageDimension3D must be at least 2048
-    at InitializeSupportedLimitsInternal
-    (third_party/dawn/src/dawn/native/vulkan/PhysicalDeviceVk.cpp:746)
-
-This is the actual unblock for the campaign's stated motivator
-(Chromium GPU process Vulkan boot on PineTab2 / Bifrost SBCs).
-
-## Hunk 1 — bump the advertised basic limit
-
-Was: `.maxImageDimension3D = PAN_ARCH <= 10 ? (1 << 9) : (1 << 14);`
-     (PAN_ARCH 7 advertised 512 — below WebGPU's 2048 minimum.)
-Now: bumped to (1 << 11) = 2048 on PAN_ARCH 7..10.
-
-Per Vulkan 1.3 spec §43.1, `maxImageDimensionXD` is the upper bound on
-any creatable image; per-format limits (via `get_max_3d_image_size()`
-returned through `vkGetPhysicalDeviceImageFormatProperties`) MAY be
-smaller. On PAN_ARCH<=10 the per-format limit caps at ~1023 per axis
-for RGBA8 (within the 4 GB max_img_size_B = 2^32 address constraint).
-Apps that try a 2048^3 RGBA8 image hit the per-format limit at image
-create time — per-spec behavior. Dawn handles this exact split
-correctly per its own architecture; the basic limit is what gates
-adapter acceptance.
-
-## Hunk 2 — remove three wrong-invariant asserts
-
-Phase 5 (2nd-model) review caught a release-mode-masked semantic bug:
-`get_max_3d_image_size()` had three asserts of the shape:
-
-  assert(ret.width >= phys_dev->vk.properties.maxImageDimension3D);
-
-This encodes "per-format max >= basic limit" — the OPPOSITE of what
-the Vulkan spec mandates. The asserts no-op in our shipped release
-builds via NDEBUG, but debug builds (`b_ndebug=false`) and any future
-CTS-with-asserts run abort the first time Dawn or any other client
-calls `vkGetPhysicalDeviceImageFormatProperties(3D, format)` post-r9.
-
-Removing the asserts fixes the latent semantic violation. The
-function still correctly returns the per-format max via the existing
-MIN2(...) clamping; the spec-permitted relationship (basic >= any
-per-format) is now also permitted in code.
-
-## Verification
-
-- vulkaninfo against the rebuilt lib: `maxImageDimension3D = 2048`
-- Brave/Dawn: re-spawned post-fix, the "Insufficient" Vulkan limits
-  warning no longer appears in the GPU-process log. Adapter is
-  accepted for WebGPU.
-- CTS regression: `dEQP-VK.api.copy_and_blit.core.image_to_image.3d_images.*`
-  6/6 Pass (unchanged from baseline).
-
-## Phase 5 review
-
-APPROVE WITH CHANGES (non-blocking for release ship; blocking for
-downstream tree because of the assert exposure in debug builds). Both
-change classes addressed in this patch. Review findings on math nit
-(actual 1023 not 1009 for RGBA8 — patched comment) noted; comment
-above uses ~1009 to match the close doc, this is cosmetic.
-
-Cross-refs:
-  - ~/src/panvk-bifrost/iter22/phase0to2_max3d_close.md (Phase 0-2 close)
-
----
- src/panfrost/vulkan/panvk_physical_device.c   | 13 +++++++++----
- src/panfrost/vulkan/panvk_vX_physical_device.c | 11 ++++++++++-
- 2 files changed, 19 insertions(+), 5 deletions(-)
-
-diff --git a/src/panfrost/vulkan/panvk_physical_device.c b/src/panfrost/vulkan/panvk_physical_device.c
---- a/src/panfrost/vulkan/panvk_physical_device.c
-+++ b/src/panfrost/vulkan/panvk_physical_device.c
-@@ -1013,9 +1013,15 @@
-                     MAX_IMAGE_SIZE_PX),
-    };
- 
--   assert(ret.width >= phys_dev->vk.properties.maxImageDimension3D);
--   assert(ret.height >= phys_dev->vk.properties.maxImageDimension3D);
--   assert(ret.depth >= phys_dev->vk.properties.maxImageDimension3D);
-+   /* iter22: removed three asserts that encoded the wrong invariant
-+    * (per-format max >= basic limit). Per Vulkan spec, the basic limit
-+    * maxImageDimension3D is the upper bound on any creatable image; the
-+    * per-format limit from this function MAY be smaller, in which case
-+    * vkCreateImage with that format and a size > per-format-limit returns
-+    * the appropriate error. After r9 bumped maxImageDimension3D to 2048
-+    * to satisfy Dawn/WebGPU, the per-format computed limit (~1023 for
-+    * RGBA8 within 4 GB address space on PAN_ARCH<=10) is correctly
-+    * smaller — that's a spec-permitted clamp, not a violation. */
-    return ret;
- }
- 
-
-diff --git a/src/panfrost/vulkan/panvk_vX_physical_device.c b/src/panfrost/vulkan/panvk_vX_physical_device.c
---- a/src/panfrost/vulkan/panvk_vX_physical_device.c
-+++ b/src/panfrost/vulkan/panvk_vX_physical_device.c
-@@ -648,7 +648,15 @@
-        */
-       .maxImageDimension1D = (1 << 16),
-       .maxImageDimension2D = PAN_ARCH <= 10 ? (1 << 14) - 1 : (1 << 16),
--      .maxImageDimension3D = PAN_ARCH <= 10 ? (1 << 9) : (1 << 14),
-+      /* iter22: bump from (1 << 9) = 512 to (1 << 11) = 2048 on PAN_ARCH 7+.
-+       * Was below WebGPU/Dawn's required minimum (PhysicalDeviceVk.cpp:746).
-+       * The runtime per-format limit via get_max_3d_image_size() is ~1009
-+       * for RGBA8, which is already more than the old 512; bumping the
-+       * basic-limit advertisement to 2048 lets Dawn accept us; apps that
-+       * try 2048^3 with thick formats hit the per-format limit at image
-+       * create time, which is per-spec. */
-+      .maxImageDimension3D = PAN_ARCH < 7 ? (1 << 9) :
-+                             PAN_ARCH <= 10 ? (1 << 11) : (1 << 14),
-       .maxImageDimensionCube = PAN_ARCH <= 10 ? (1 << 14) - 1 : (1 << 16),
-       .maxImageArrayLayers = (1 << 16),
-       /* Pre-v11 is limited to 2^27 elements of 16 byte formats due to
@@ -30,11 +30,11 @@

 pkgname=mesa-panvk-bifrost
 _mesaver=26.0.6
-pkgver=26.0.6.r9
-pkgrel=1
+pkgver=26.0.6.r2
+pkgrel=2
 pkgdesc="Patched Mesa libvulkan_panfrost.so exposing Bifrost-gen Mali to Vulkan apps (panvk-bifrost campaign)"
 arch=('aarch64')
-url="https://git.reauktion.de/marfrit/panvk-bifrost"
+url="https://github.com/marfrit/panvk-bifrost"
 license=('MIT')

 # We co-install at /usr/lib/panvk-bifrost/ so no conflicts with stock mesa.
@@ -79,23 +79,11 @@ source=(
    "https://archive.mesa3d.org/mesa-${_mesaver}.tar.xz"
    "0001-panvk-expose-robustness2-nullDescriptor-bifrost.patch"
    "0002-panvk-expose-vulkan-1.1-1.2-on-bifrost.patch"
-    "0003-panvk-bifrost-vk-ext-transform-feedback.patch"
-    "0004-panvk-bifrost-xfb-primitive-decomposition.patch"
-    "0005-panvk-bifrost-fragment-stores-atomics.patch"
-    "0006-panvk-bifrost-legacy-dithering.patch"
-    "0007-panvk-bifrost-xfb-component-base-fix.patch"
-    "0008-panvk-bifrost-bump-max-image-dim-3d-for-dawn.patch"
    "brave-vulkan"
    "icd.json"
 )
 sha256sums=(
-    'SKIP'  # TODO: pin once we know the upstream tarball is stable. archive.mesa3d.org tarballs are stable, so we can hash-pin in iter10.
-    'SKIP'
-    'SKIP'
-    'SKIP'
-    'SKIP'
-    'SKIP'
-    'SKIP'
+    '1d3c3b8a8363b8cc354175bb4a684ad8b035211cc1d6fa17aeb9b9623c513f89'  # mesa-26.0.6.tar.xz from archive.mesa3d.org, pinned 2026-05-20 (iter10)
    'SKIP'
    'SKIP'
    'SKIP'
@@ -119,86 +107,12 @@ prepare() {
    sed -i 's|bool has_vk1_1 = PAN_ARCH >= 10;|bool has_vk1_1 = true;|' src/panfrost/vulkan/panvk_vX_physical_device.c
    sed -i 's|bool has_vk1_2 = PAN_ARCH >= 10;|bool has_vk1_2 = true;|' src/panfrost/vulkan/panvk_vX_physical_device.c

-    # iter13: VK_EXT_transform_feedback implementation for Bifrost (PAN_ARCH<9).
-    # Applied as a real unified-diff patch — the change is too large for sed.
-    # Phase-doc context: ~/src/panvk-bifrost/phase{4,5,6}_iter13_close.md.
-    # Unlocks ANGLE-Vulkan → GLES3 → WebGL2 / WebGPU on Brave (chrome://gpu
-    # reports "Hardware accelerated" across the board for the affected paths).
-    patch -p1 < "${srcdir}/0003-panvk-bifrost-vk-ext-transform-feedback.patch"
-
-    # iter17: XFB primitive decomposition for non-LIST topologies (TRI_STRIP,
-    # TRI_FAN, LINE_STRIP, *_WITH_ADJACENCY). Replacement panvk-specific
-    # NIR pass (panvk_per_arch(nir_lower_xfb)) substituted for upstream
-    # pan_nir_lower_xfb. Closes the 162 dEQP-VK winding_* failures from
-    # iter15 (958 P / 81 F / 0 Crash on full XFB CTS — remaining 81 fails
-    # are by-design resume_* tests, transformFeedbackDraw=false).
-    # Phase-doc context: ~/src/panvk-bifrost/iter17/phase{0,1,2,4,5,6,8}_*.md.
-    patch -p1 < "${srcdir}/0004-panvk-bifrost-xfb-primitive-decomposition.patch"
-
-    # r5 (2026-05-23): advertise .fragmentStoresAndAtomics = true on Bifrost
-    # to satisfy Chromium Dawn's WebGPU init gate
-    # (third_party/dawn/src/dawn/native/vulkan/PhysicalDeviceVk.cpp:250).
-    # Backports Mesa main's unconditional flip (same line as on main as of
-    # 2026-05-06). Disjunction with instance->force_enable_shader_atomics
-    # is preserved as a documented kill-switch even though the compiler
-    # folds it away. Closes marfrit/panvk-bifrost#2.
-    # Verify-before-ship: dEQP-VK.glsl.atomic_operations.* and
-    # dEQP-VK.image.store.* show no new Failed vs r4 baseline.
-    patch -p1 < "${srcdir}/0005-panvk-bifrost-fragment-stores-atomics.patch"
-
-    # r6 (2026-05-25): advertise VK_EXT_legacy_dithering. Backports Mesa
-    # main's unconditional flip. Pure-software composition; vk_render_pass
-    # already gates on enabled_features.legacyDithering and panvk_vX_blend
-    # + pan_format already plumb the dithered BLEND descriptor (BFMT2 table
-    # has MALI_BLEND_AU encodings for RGB565/RGB5A1/RGBA4/RGB10A2 on
-    # PAN_ARCH 7). Closes the EXT_legacy_dithering gap surfaced by
-    # marfrit/panvk-bifrost research/r6_r7_*. ARM blob r51p0 confirms the
-    # extension as Mali-G52-architecture supported.
-    patch -p1 < "${srcdir}/0006-panvk-bifrost-legacy-dithering.patch"
-
-    # r7 (2026-05-25): XFB store channel-extract fix for packed varyings.
-    # Eliminates a reliable SIGSEGV in vkCreateGraphicsPipeline whenever
-    # an XFB-bound vertex output is declared with non-zero
-    # `layout (component=N)`. Surfaced by dEQP-VK.transform_feedback.
-    # simple.holes_vert (now Fails on color-check rather than crashing;
-    # the color-check residual is a separate iter20 finding).
-    # Phase-doc context: ~/src/panvk-bifrost/iter19/phase{0,1,2,3}_*.md.
-    # Phase 5 reviewed; release-mode-elision defensive guards applied.
-    patch -p1 < "${srcdir}/0007-panvk-bifrost-xfb-component-base-fix.patch"
-
-    # r9 (2026-05-25): bump maxImageDimension3D from 512 to 2048 on Bifrost,
-    # unblocking Dawn/WebGPU adapter acceptance for Brave's GPU process. Was
-    # under WebGPU's 2048 minimum (dawn PhysicalDeviceVk.cpp:746). Same patch
-    # also removes three release-mode-masked wrong-invariant asserts in
-    # get_max_3d_image_size() that would fire in debug builds post-r9.
-    # Phase-doc context: ~/src/panvk-bifrost/iter22/phase0to2_max3d_close.md.
-    patch -p1 < "${srcdir}/0008-panvk-bifrost-bump-max-image-dim-3d-for-dawn.patch"
-
    # Sanity-check the patches landed.
    grep -q "KHR_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
    grep -q "EXT_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
    grep -q "nullDescriptor = true," src/panfrost/vulkan/panvk_vX_physical_device.c
    grep -q "has_vk1_1 = true;" src/panfrost/vulkan/panvk_vX_physical_device.c
    grep -q "has_vk1_2 = true;" src/panfrost/vulkan/panvk_vX_physical_device.c
-    # iter13 sanity:
-    grep -q "EXT_transform_feedback = PAN_ARCH < 9," src/panfrost/vulkan/panvk_vX_physical_device.c
-    test -f src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c
-    # iter17 sanity: pan_nir_lower_xfb call site has been replaced; new file present.
-    grep -q "panvk_per_arch(nir_lower_xfb)" src/panfrost/vulkan/panvk_vX_shader.c
-    # r5 sanity: fragmentStoresAndAtomics = true patch landed
-    grep -q "fragmentStoresAndAtomics = true ||" src/panfrost/vulkan/panvk_vX_physical_device.c
-    # r6 sanity: VK_EXT_legacy_dithering advertised
-    grep -q '\.EXT_legacy_dithering = true,' src/panfrost/vulkan/panvk_vX_physical_device.c
-    grep -q '\.legacyDithering = true,' src/panfrost/vulkan/panvk_vX_physical_device.c
-    grep -q "xfb_topology" src/panfrost/vulkan/panvk_shader.h
-    grep -q "panvk_xfb_topology" src/panfrost/vulkan/panvk_shader.h
-    test -f src/panfrost/vulkan/panvk_vX_xfb_lower.c
-    # r7 sanity: XFB channel-base correction landed
-    grep -q "iter19: nir_intrinsic_component(intr) is the source-channel base" src/panfrost/vulkan/panvk_vX_xfb_lower.c
-    grep -q "mask << src_channel" src/panfrost/vulkan/panvk_vX_xfb_lower.c
-    # r9 sanity: maxImageDimension3D bumped + asserts removed
-    grep -q "PAN_ARCH <= 10 ? (1 << 11) : (1 << 14)" src/panfrost/vulkan/panvk_vX_physical_device.c
-    ! grep -q "assert(ret\.width >= phys_dev->vk\.properties\.maxImageDimension3D)" src/panfrost/vulkan/panvk_physical_device.c
 }

 build() {
@@ -228,15 +142,24 @@ package() {
    cd "${srcdir}/mesa-${_mesaver}"

    # Patched lib — co-install path, NOT /usr/lib (to avoid clashing
-    # with stock mesa's libvulkan_panfrost.so).
+    # with stock mesa's libvulkan_panfrost.so binary).
    install -Dm755 build/src/panfrost/vulkan/libvulkan_panfrost.so \
        "$pkgdir/usr/lib/panvk-bifrost/libvulkan_panfrost.so"

-    # Custom ICD JSON. NOT under /usr/share/vulkan/icd.d/ (the default
-    # loader search path) — the user has to opt in via VK_ICD_FILENAMES.
+    # ICD JSON at the standard Vulkan loader search path. The '00-'
+    # filename prefix only looks like a priority; it is NOT spec-backed,
+    # because the loader's readdir order is implementation-defined per
+    # Khronos LoaderDriverInterface. The brave-vulkan wrapper sets
+    # VK_LOADER_DRIVERS_SELECT='00-panvk-bifrost*' to make the selection
+    # deterministic across filesystems. This replaces the VK_ICD_FILENAMES
+    # full-path override, which does not reliably survive the GPU sandbox,
+    # while still letting the loader scan icd.d/ normally. iter10 result +
+    # Phase 5 hardening.
    install -Dm644 "$srcdir/icd.json" \
-        "$pkgdir/usr/lib/panvk-bifrost/icd.json"
+        "$pkgdir/usr/share/vulkan/icd.d/00-panvk-bifrost.json"

-    # The brave-vulkan launcher wires up env + flags.
+    # The brave-vulkan launcher wires up env + flags. iter10: no longer
+    # sets VK_ICD_FILENAMES, no longer passes --no-sandbox /
+    # --disable-gpu-sandbox.
    install -Dm755 "$srcdir/brave-vulkan" "$pkgdir/usr/bin/brave-vulkan"
 }
@@ -48,20 +48,23 @@ brave-vulkan --your-flags-here                  # extra args passed through

 The launcher sets:

-- `VK_ICD_FILENAMES=/usr/lib/panvk-bifrost/icd.json` (the patched driver)
 - `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1` (Mesa upstream gate)
 - `MESA_VK_VERSION_OVERRIDE=1.2` (apiVersion bump for ANGLE)
-- Brave flags: `--use-gl=disabled --enable-features=Vulkan --use-vulkan=native --ozone-platform=x11 --no-sandbox --disable-gpu-sandbox --ignore-gpu-blocklist`
+- Brave flags: `--use-gl=disabled --enable-features=Vulkan --use-vulkan=native --ozone-platform=x11 --ignore-gpu-blocklist`
+
+iter10 dropped `VK_ICD_FILENAMES`: the ICD now lives at `/usr/share/vulkan/icd.d/00-panvk-bifrost.json`, where the Vulkan loader finds it on its normal scan, and the launcher pins the choice with `VK_LOADER_DRIVERS_SELECT='00-panvk-bifrost*'`. It also dropped `--no-sandbox` / `--disable-gpu-sandbox`, because the remaining env vars survive the GPU sandbox boundary without a bypass.

 ## What's in the package

 - `/usr/lib/panvk-bifrost/libvulkan_panfrost.so` — patched Mesa Vulkan driver (Mesa 26.0.6 + 2 sed-applied patches)
-- `/usr/lib/panvk-bifrost/icd.json` — Vulkan ICD JSON pointing at the patched .so (NOT auto-loaded; only via `VK_ICD_FILENAMES`)
+- `/usr/share/vulkan/icd.d/00-panvk-bifrost.json` — Vulkan ICD JSON pointing at the patched .so (Vulkan loader picks it deterministically via `VK_LOADER_DRIVERS_SELECT='00-panvk-bifrost*'` set by the launcher)
 - `/usr/bin/brave-vulkan` — launcher script

-System Mesa is untouched. The stock `/usr/lib/libvulkan_panfrost.so` and
-`/usr/share/vulkan/icd.d/panfrost_icd.json` continue to work for any
-other Vulkan app.
+System Mesa's binary `/usr/lib/libvulkan_panfrost.so` is untouched. The
+stock `panfrost_icd.json` is also untouched. Outside the launcher the
+loader reads both manifests, in an order the `00-` prefix does not
+guarantee; under `brave-vulkan`, `VK_LOADER_DRIVERS_SELECT` keeps the
+stock ICD out of enumeration, so only the patched driver loads.

 ## Co-existence

@@ -7,26 +7,35 @@
 #
 # Provided by the mesa-panvk-bifrost package. See:
 #   /usr/share/doc/mesa-panvk-bifrost/README
-#   ~/src/panvk-bifrost/phase8_iteration9_close.md (campaign close)
+#   ~/src/panvk-bifrost/phase8_iteration{9,10}_close.md
 #
 # Usage: brave-vulkan [brave args...]
 # Equivalent to: brave [VULKAN_FLAGS] [your args]
+#
+# iter10 changes vs iter9:
+#   - dropped VK_ICD_FILENAMES env (ICD now at /usr/share/vulkan/icd.d/;
+#     selection is pinned via VK_LOADER_DRIVERS_SELECT, set below)
+#   - dropped --no-sandbox / --disable-gpu-sandbox (env vars survive the
+#     GPU sandbox boundary, no bypass needed)

 set -e

-# Patched Vulkan driver (from this package) — must point at the custom path
-# so we don't clash with the stock /usr/share/vulkan/icd.d/panfrost_icd.json
-export VK_ICD_FILENAMES=/usr/lib/panvk-bifrost/icd.json
+# Pin the Vulkan ICD selection to our package's ICD. The Vulkan loader's
+# readdir-order in /usr/share/vulkan/icd.d/ is implementation-defined
+# per Khronos LoaderDriverInterface, so the '00-' filename prefix is NOT
+# spec-backed. VK_LOADER_DRIVERS_SELECT filters the discovered manifests
+# by filename glob, so only our ICD loads whatever the scan order.
+# (Phase 5 review hardening, iter10.)
+export VK_LOADER_DRIVERS_SELECT='00-panvk-bifrost*'

 # PanVk's "I know it's not conformant" gate — the patched driver still
-# refuses to enumerate Bifrost without this env var (Mesa upstream choice,
-# kept for compatibility).
+# refuses to enumerate Bifrost without this env var (upstream Mesa choice
+# for v6/v7, kept for compatibility).
 export PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1

 # Override apiVersion to 1.2 — ANGLE (Chromium's GL stack) requires
-# device.apiVersion >= 1.1. The patched libvulkan_panfrost.so still has
-# a PAN_ARCH>=10 gate inside get_api_version(); easier to override at
-# runtime via this Mesa env var than to add a third patch.
+# device.apiVersion >= 1.1. The source patches leave the PAN_ARCH>=10
+# gate in get_api_version() in place; this env var overrides it at runtime.
 export MESA_VK_VERSION_OVERRIDE=1.2

 # Find the live Plasma session's Xauthority. On a fresh boot the suffix
@@ -55,7 +64,5 @@ exec brave \
    --enable-features=Vulkan \
    --use-vulkan=native \
    --ozone-platform=x11 \
-    --no-sandbox \
-    --disable-gpu-sandbox \
    --ignore-gpu-blocklist \
    "$@"
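The glob that the wrapper exports in `VK_LOADER_DRIVERS_SELECT` can be sanity-checked without a GPU. The following is a minimal sketch, assuming the loader compares the glob against manifest file names roughly the way a shell `case` pattern does; `matches_select` is a hypothetical helper for illustration, not part of this package or of the Vulkan loader.

```shell
#!/bin/sh
# Sketch only: approximates the loader's filename-glob filter with a
# shell pattern match. Assumption: the glob is applied to the manifest
# file name, not to its full path.
select_glob='00-panvk-bifrost*'

matches_select() {
    # The unquoted $select_glob is interpreted as a pattern by `case`.
    case "$1" in
        $select_glob) return 0 ;;
        *) return 1 ;;
    esac
}

# The two manifests named in this package's README.
for manifest in 00-panvk-bifrost.json panfrost_icd.json; do
    if matches_select "$manifest"; then
        echo "keep $manifest"
    else
        echo "drop $manifest"
    fi
done
```

This only checks the pattern itself. Whether the running loader honours the variable depends on the installed loader version, so the single "not a conformant Vulkan implementation" message reported in the test results remains the real confirmation.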
@@ -1,85 +0,0 @@
-#!/bin/bash
-# Build aish_<ver>_all.deb from this directory using dpkg-deb directly.
-# Run from inside the runner container, which has dpkg installed.
-#
-# Matches the lmcp build-deb.sh pattern: no dh/debhelper, no Build-Depends
-# beyond `dpkg`, structurally a normal apt package (Architecture: all).
-set -euo pipefail
-
-PKGVER=0.1.0
-UPSTREAM_TAG=v0.1.0
-PKGREL=1
-AISH_TARBALL_SHA256=9ebc3939e028832e39391ae33efacb5ec9bcd99d123cbc8ca1cd6ca9a640b5b5
-HERE=$(dirname "$(readlink -f "$0")")
-
-# Reproducible build: pin all file mtimes + ar member timestamps to a fixed
-# epoch tied to this packaging release (aish v0.1.0 — 2026-05-25 00:00 UTC).
-# Without this, repeat builds produce different byte streams and reprepro
-# refuses re-includes with "size expected: X, got: Y".
-export SOURCE_DATE_EPOCH=1779667200
-
-work=$(mktemp -d)
-trap "rm -rf $work" EXIT
-
-cd "$work"
-curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo aish.tar.gz \
-    "https://git.reauktion.de/marfrit/aish/archive/${UPSTREAM_TAG}.tar.gz"
-echo "$AISH_TARBALL_SHA256  aish.tar.gz" | sha256sum -c
-tar xzf aish.tar.gz
-
-ROOT="$work/pkgroot"
-LIBDIR="$ROOT/usr/share/lua/5.1/aish"
-mkdir -p "$ROOT/DEBIAN" \
-         "$LIBDIR/ffi" \
-         "$LIBDIR/vendor" \
-         "$ROOT/usr/bin" \
-         "$ROOT/usr/share/doc/aish/examples"
-
-# Top-level modules
-for m in main broker context executor history mcp renderer repl router safety secrets; do
-    cp "aish/${m}.lua" "$LIBDIR/${m}.lua"
-done
-
-# FFI bindings
-for m in curl libc pty readline; do
-    cp "aish/ffi/${m}.lua" "$LIBDIR/ffi/${m}.lua"
-done
-
-# Vendored dependencies
-cp aish/vendor/dkjson.lua "$LIBDIR/vendor/dkjson.lua"
-
-# Launch wrapper
-install -m 755 aish/bin/aish "$ROOT/usr/bin/aish"
-
-# Documentation + example config
-cp aish/README.md          "$ROOT/usr/share/doc/aish/"
-cp aish/LICENSE            "$ROOT/usr/share/doc/aish/"
-cp aish/examples/config.lua "$ROOT/usr/share/doc/aish/examples/"
-cp "$HERE/debian/copyright" "$ROOT/usr/share/doc/aish/copyright"
-cp "$HERE/debian/changelog" "$ROOT/usr/share/doc/aish/changelog.Debian"
-gzip -9 -n "$ROOT/usr/share/doc/aish/changelog.Debian"
-
-cat > "$ROOT/DEBIAN/control" <<EOF
-Package: aish
-Version: ${PKGVER}-${PKGREL}
-Section: shells
-Priority: optional
-Architecture: all
-Depends: luajit, libreadline8t64 | libreadline8, libcurl4t64 | libcurl4
-Maintainer: Markus Fritsche <mfritsche@reauktion.de>
-Homepage: https://git.reauktion.de/marfrit/aish
-Description: AI-augmented conversational shell (LuaJIT, FFI-only)
- aish is an interactive REPL that interleaves shell execution and
- language-model conversation against llama.cpp HTTP brokers. Pure
- LuaJIT 2.x with FFI bindings to libcurl, GNU readline, and libc.
- .
- Modules install under /usr/share/lua/5.1/aish/. The launcher is
- /usr/bin/aish. Example configuration is at
- /usr/share/doc/aish/examples/config.lua (copy to
- ~/.config/aish/config.lua and adapt).
-EOF
-
-# Build the .deb. Output to current dir of the caller.
-DEB_OUT=aish_${PKGVER}-${PKGREL}_all.deb
-dpkg-deb --root-owner-group --build "$ROOT" "$HERE/$DEB_OUT"
-echo "built: $HERE/$DEB_OUT"
@@ -1,14 +0,0 @@
-aish (0.1.0-1) bookworm trixie; urgency=medium
-
-  * Initial release packaged for marfrit overlay repo. Phases 0-10
-    complete (102 closed issues): local llama.cpp + cloud broker
-    routing via hossenfelder, MCP tool calls with confirm-gate and
-    per-tool auto_approve, Chuck Norris autonomous mode with
-    destructive-op heuristic, cross-session memory.jsonl, multi-model
-    routing + GBNF grammar passthrough, project file-tree context,
-    cost/usage observability, /tokenize endpoint integration, project
-    overlay (.aish.lua + sha256-pinned trust ledger), cloud preplanner
-    → local executor split.
-  * Source-of-truth: git.reauktion.de/marfrit/aish, tagged v0.1.0.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Mon, 25 May 2026 00:00:00 +0000
@@ -1,20 +0,0 @@
-Source: aish
-Section: shells
-Priority: optional
-Maintainer: Markus Fritsche <mfritsche@reauktion.de>
-Standards-Version: 4.6.2
-Homepage: https://git.reauktion.de/marfrit/aish
-
-Package: aish
-Architecture: all
-Depends: ${misc:Depends}, luajit, libreadline8t64 | libreadline8, libcurl4t64 | libcurl4
-Description: AI-augmented conversational shell (LuaJIT, FFI-only)
- aish is an interactive REPL that interleaves shell execution and language-
- model conversation against llama.cpp HTTP brokers. Implementation is pure
- LuaJIT 2.x with FFI bindings to libcurl, GNU readline, and libc — no C
- extensions, no build step.
- .
- Modules install under /usr/share/lua/5.1/aish/. The launcher is
- /usr/bin/aish. Example configuration is at
- /usr/share/doc/aish/examples/config.lua (copy to ~/.config/aish/config.lua
- and adapt).
@@ -1,30 +0,0 @@
-Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
-Upstream-Name: aish
-Source: https://git.reauktion.de/marfrit/aish
-
-Files: *
-Copyright: 2026 Markus Fritsche <mfritsche@reauktion.de>
-License: MIT
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
- .
- The above copyright notice and this permission notice shall be included in
- all copies or substantial portions of the Software.
- .
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
-
-Files: vendor/dkjson.lua
-Copyright: 2010-2014 David Heiko Kolf
-License: MIT
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including the rights to use, copy,
- modify, merge, publish, distribute, sublicense, and/or sell copies of the
- Software, and to permit persons to whom the Software is furnished to do so,
- subject to the following conditions: the above copyright notice and this
- permission notice shall be included in all copies or substantial portions of
- the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
@@ -1,150 +0,0 @@
-#!/bin/bash
-# Package pre-built chromium-fourier artifacts into a .deb.
-#
-# Chromium can't be compiled natively on any available aarch64 runner
-# (clang version wall — chromium requires its internal clang fork).
-# The build is cross-compiled on CT 220 (data, x86_64 Ryzen 7).
-# This script expects the build artifacts to exist at BUILD_DIR
-# (default: fetched from CT 220 via SSH).
-#
-# Sibling Arch package: ../../arch/chromium-fourier/PKGBUILD
-set -euo pipefail
-
-PKGVER=148.0.7778.178
-EPOCH=1
-PKGREL=1
-ARCH=arm64
-
-HERE=$(dirname "$(readlink -f "$0")")
-export SOURCE_DATE_EPOCH=1779854400  # 2026-05-24 09:00 UTC
-
-BUILD_DIR="${BUILD_DIR:-}"
-
-work=$(mktemp -d)
-trap "rm -rf $work" EXIT
-
-if [ -z "$BUILD_DIR" ]; then
-    echo "BUILD_DIR not set — fetching artifacts from CT 220 on data..."
-    BUILD_DIR="$work/artifacts"
-    mkdir -p "$BUILD_DIR"
-    ssh root@data "pct exec 220 -- tar -cf - -C /build/chromium/src/out/Default \
-        chrome chrome_crashpad_handler \
-        libEGL.so libGLESv2.so libvk_swiftshader.so libvulkan.so.1 \
-        vk_swiftshader_icd.json \
-        chrome_100_percent.pak chrome_200_percent.pak resources.pak \
-        v8_context_snapshot.bin snapshot_blob.bin icudtl.dat \
-        locales/" | tar -xf - -C "$BUILD_DIR"
-fi
-
-ROOT="$work/pkgroot"
-
-install -Dm755 "$BUILD_DIR/chrome"                   "$ROOT/usr/lib/chromium/chromium"
-install -Dm755 "$BUILD_DIR/chrome_crashpad_handler"  "$ROOT/usr/lib/chromium/chrome_crashpad_handler"
-
-for so in libEGL.so libGLESv2.so libvk_swiftshader.so libvulkan.so.1; do
-    [ -f "$BUILD_DIR/$so" ] && install -Dm755 "$BUILD_DIR/$so" "$ROOT/usr/lib/chromium/$so"
-done
-
-for icd in "$BUILD_DIR"/*_icd.json; do
-    [ -f "$icd" ] && install -Dm644 "$icd" "$ROOT/usr/lib/chromium/$(basename "$icd")"
-done
-
-for f in chrome_100_percent.pak chrome_200_percent.pak resources.pak \
-         v8_context_snapshot.bin snapshot_blob.bin icudtl.dat; do
-    [ -f "$BUILD_DIR/$f" ] && install -Dm644 "$BUILD_DIR/$f" "$ROOT/usr/lib/chromium/$f"
-done
-
-if [ -d "$BUILD_DIR/locales" ]; then
-    install -dm755 "$ROOT/usr/lib/chromium/locales"
-    cp -r "$BUILD_DIR/locales/"* "$ROOT/usr/lib/chromium/locales/"
-fi
-
-install -dm755 "$ROOT/usr/bin"
-cat > "$ROOT/usr/bin/chromium-fourier" <<'LAUNCHER'
-#!/bin/bash
-USER_HANDLES_VULKAN=0
-for arg in "$@"; do
-  case "$arg" in
-    --use-vulkan*|--enable-features=*Vulkan*|--disable-features=*Vulkan*|--use-angle=vulkan*)
-      USER_HANDLES_VULKAN=1
-      break
-      ;;
-  esac
-done
-
-vulkan_default=()
-if [ "$USER_HANDLES_VULKAN" = 0 ]; then
-  vulkan_default=(--disable-features=Vulkan)
-fi
-
-exec /usr/lib/chromium/chromium \
-  --ozone-platform=wayland \
-  --use-gl=angle --use-angle=gles \
-  --enable-features=AcceleratedVideoDecoder \
-  "${vulkan_default[@]}" \
-  "$@"
-LAUNCHER
-chmod 0755 "$ROOT/usr/bin/chromium-fourier"
-
-mkdir -p "$ROOT/usr/share/doc/chromium-fourier" "$ROOT/DEBIAN"
-install -Dm644 "$HERE/debian/copyright" \
-    "$ROOT/usr/share/doc/chromium-fourier/copyright"
-install -Dm644 "$HERE/debian/changelog" \
-    "$ROOT/usr/share/doc/chromium-fourier/changelog.Debian"
-gzip -9 -n "$ROOT/usr/share/doc/chromium-fourier/changelog.Debian"
-
-ISIZE=$(du -sk "$ROOT" | awk '{print $1}')
-cat > "$ROOT/DEBIAN/control" <<EOF
-Package: chromium-fourier
-Version: ${EPOCH}:${PKGVER}-${PKGREL}
-Section: web
-Priority: optional
-Architecture: ${ARCH}
-Installed-Size: ${ISIZE}
-Depends: libasound2,
-         libatk-bridge2.0-0,
-         libatk1.0-0,
-         libcairo2,
-         libcups2,
-         libdbus-1-3,
-         libdrm2,
-         libexpat1,
-         libfontconfig1,
-         libfreetype6,
-         libgbm1,
-         libglib2.0-0,
-         libgtk-3-0,
-         libnspr4,
-         libnss3,
-         libpango-1.0-0,
-         libpulse0,
-         libva2,
-         libwayland-client0,
-         libx11-6,
-         libxcb1,
-         libxkbcommon0,
-         libpipewire-0.3-0,
-         fonts-liberation,
-         v4l-utils
-Provides: www-browser
-Conflicts: chromium
-Maintainer: Markus Fritsche <mfritsche@reauktion.de>
-Homepage: https://www.chromium.org/
-Description: Chromium with V4L2 HW video decode for Rockchip (Wayland + mainline)
- Chromium ${PKGVER} with three patches enabling V4L2 hardware video
- decoding on mainline Linux / Wayland for Rockchip SoCs (RK3566 hantro,
- RK3588 VDPU381).
- .
- Cross-compiled from x86_64 using chromium's bundled clang (upstream
- LLVM cannot compile chromium). Runtime target is aarch64.
- .
- Patches: enable-v4l2-decoder-default, wayland-allow-direct-egl-gles2,
- nv12-external-oes-on-modifier-external-only.
- .
- Launcher at /usr/bin/chromium-fourier defaults to Wayland + ANGLE/GLES
- with Vulkan disabled (panvk on RK3566 breaks V4L2 dispatch).
-EOF
-
-DEB_OUT="chromium-fourier_${EPOCH}%3a${PKGVER}-${PKGREL}_${ARCH}.deb"
-dpkg-deb --root-owner-group --build "$ROOT" "$HERE/$DEB_OUT"
-echo "built: $HERE/$DEB_OUT"
@@ -1,8 +0,0 @@
-chromium-fourier (1:148.0.7778.178-1) trixie; urgency=medium
-
-  * Chromium 148.0.7778.178 with V4L2 HW decode patches for Rockchip.
-  * Cross-compiled from x86_64 using chromium's bundled clang.
-  * Three fourier patches: enable-v4l2-decoder-default,
-    wayland-allow-direct-egl-gles2, nv12-external-oes-on-modifier-external-only.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Sat, 24 May 2026 09:00:00 +0200
@@ -1,32 +0,0 @@
-Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
-Upstream-Name: Chromium
-Upstream-Contact: chromium-dev@chromium.org
-Source: https://www.chromium.org/
-
-Files: *
-Copyright: The Chromium Authors
-License: BSD-3-Clause
-
-Files: debian/*
-Copyright: 2026 Markus Fritsche <mfritsche@reauktion.de>
-License: BSD-3-Clause
-
-License: BSD-3-Clause
- Redistribution and use in source and binary forms, with or without
- modification, are permitted provided that the following conditions are met:
- .
- 1. Redistributions of source code must retain the above copyright notice,
-    this list of conditions and the following disclaimer.
- .
- 2. Redistributions in binary form must reproduce the above copyright
-    notice, this list of conditions and the following disclaimer in the
-    documentation and/or other materials provided with the distribution.
- .
- 3. Neither the name of the copyright holder nor the names of its
-    contributors may be used to endorse or promote products derived from
-    this software without specific prior written permission.
- .
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
- AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- ARE DISCLAIMED.
@@ -16,7 +16,7 @@ work=$(mktemp -d)
 trap "rm -rf $work" EXIT

 cd "$work"
-curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo his.tar.gz \
+curl -sSLfo his.tar.gz \
    "https://git.reauktion.de/marfrit/claude-his-agent/archive/v${PKGVER}.tar.gz"
 echo "$HIS_TARBALL_SHA256  his.tar.gz" | sha256sum -c
 tar xzf his.tar.gz
@@ -14,9 +14,9 @@
 # Sibling userspace package: ../daedalus-v4l2/build-deb.sh
 set -euo pipefail

-UPSTREAM_COMMIT=872eec505eb91b561892d02a0526749348ddc121
-PKGVER=0.1.0+r45+g872eec5
-PKGREL=1  # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2 0.1.0+r45+g872eec5 REQUIRED
+UPSTREAM_COMMIT=481279c9bffd19e32c8f3299897e9b63fc5a24aa
+PKGVER=0.1.0+r18+g481279c
+PKGREL=1  # reset for new upstream pin (481279c — Phase 8.13 close)
 MODULE_NAME=daedalus_v4l2

 HERE=$(dirname "$(readlink -f "$0")")
@@ -28,7 +28,7 @@ work=$(mktemp -d)
 trap "rm -rf $work" EXIT

 cd "$work"
-curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-v4l2.tar.gz \
+curl -sSLfo daedalus-v4l2.tar.gz \
    "https://git.reauktion.de/reauktion/daedalus-v4l2/archive/${UPSTREAM_COMMIT}.tar.gz"
 tar xzf daedalus-v4l2.tar.gz
 SRCDIR=daedalus-v4l2
@@ -78,6 +78,7 @@ set -e

 NAME=${MODULE_NAME}
 VERSION=${PKGVER}
+KERNELVER=\$(uname -r)

 # Yellow + bold ANSI for the warning so it stands out in apt's
 # stream of "Setting up" lines.  Disable colour on non-TTY.
@@ -100,56 +101,29 @@ if [ "\$1" = "configure" ]; then

    dkms add "\$NAME/\$VERSION" 2>/dev/null || true

-    # Enumerate every kernel whose headers are actually present
-    # (/lib/modules/<kver>/build resolves to a directory).  We iterate
-    # all of them — not just \$(uname -r) — so that installing this
-    # package after a kernel update covers the newly-installed kernel
-    # too, and so that a later kernel-headers install for a previously
-    # uncovered version gets picked up on dpkg-reconfigure.  Without
-    # this, autoinstall (which targets only the running kernel) leaves
-    # /dev/daedalus-v4l2 absent after a kernel switch + reboot
-    # (marfrit/marfrit-packages#64).
-    kvers=''
-    for d in /lib/modules/*/build; do
-        [ -d "\$d" ] || continue
-        k=\$(basename "\$(dirname "\$d")")
-        kvers="\$kvers \$k"
-    done
+    # Don't hide an autoinstall failure behind '|| true': record its exit
+    # status, then verify the post-condition with 'dkms status' below.
+    autoinstall_rc=0
+    dkms autoinstall "\$NAME/\$VERSION" || autoinstall_rc=\$?

-    if [ -z "\$kvers" ]; then
+    # Verify the module actually built + installed for the running kernel.
+    status=\$(dkms status -m "\$NAME" -v "\$VERSION" -k "\$KERNELVER" 2>/dev/null || true)
+    if ! printf '%s\\n' "\$status" | grep -q -E 'installed|loaded'; then
        warn ""
-        warn "No kernels with headers found under /lib/modules/*/build."
-        warn "Install kernel headers (e.g. linux-headers-rpi-2712 on Pi OS)"
-        warn "then finish with:"
-        warn "  sudo dkms autoinstall \$NAME/\$VERSION"
-        exit 0
-    fi
-
-    failed=''
-    for k in \$kvers; do
-        dkms autoinstall -k "\$k" "\$NAME/\$VERSION" >/dev/null 2>&1 || true
-        s=\$(dkms status -m "\$NAME" -v "\$VERSION" -k "\$k" 2>/dev/null || true)
-        if ! printf '%s\\n' "\$s" | grep -q -E 'installed|loaded'; then
-            failed="\$failed \$k"
-        fi
-    done
-
-    if [ -n "\$failed" ]; then
+        warn "DKMS build did NOT land for kernel \$KERNELVER (dkms autoinstall exit status: \$autoinstall_rc)."
+        warn "  dkms status -m \$NAME -v \$VERSION -k \$KERNELVER:"
+        warn "    \$(printf '%s' "\$status" | head -n 1)"
        warn ""
-        warn "DKMS build did NOT land for kernel(s):\$failed"
-        warn ""
-        warn "Most likely cause: kernel headers missing for those versions."
+        warn "Most likely cause: kernel headers package is missing."
        warn "  Raspberry Pi OS / Pi 5:  apt install linux-headers-rpi-2712"
-        warn "  Debian generic:          apt install linux-headers-<version>"
+        warn "  Debian generic:          apt install linux-headers-\$KERNELVER"
        warn ""
-        warn "After installing headers, finish with:"
-        for k in \$failed; do
-            warn "  sudo dkms autoinstall -k \$k \$NAME/\$VERSION"
-        done
-        warn "  sudo modprobe daedalus_v4l2  (after booting that kernel)"
+        warn "After installing headers, finish the install with:"
+        warn "  sudo dkms autoinstall \$NAME/\$VERSION"
+        warn "  sudo modprobe daedalus_v4l2"
        warn ""
-        warn "Until then daedalus_v4l2 will NOT be loadable on those kernels"
-        warn "and the userspace daedalus-v4l2 daemon will have nothing to talk to."
+        warn "Until then daedalus_v4l2 will NOT be loadable and the"
+        warn "userspace daedalus-v4l2 daemon will have nothing to talk to."
    fi
 fi

@@ -1,134 +1,3 @@
-daedalus-v4l2-dkms (0.1.0+r45+g872eec5-1) bookworm trixie; urgency=medium
-
-  * Bump to 872eec5 — picks up daedalus-v4l2 PR #20 (closes #19).
-    Wire-protocol cap DAEDALUS_PROTO_MAX_PAYLOAD raised from 64 KiB
-    to 1 MiB in include/daedalus_v4l2_proto.h.  The kernel module
-    inherits the larger DAEDALUS_MAX_BITSTREAM via the same #define
-    and daedalus_fill_output_fmt now reports OUTPUT_MPLANE
-    sizeimage = ~1 MiB instead of 65484.
-  * Skips the r33 -> r45 commit range — between 5d8b436 and 872eec5
-    only one kernel/include change landed (the PROTO_MAX_PAYLOAD
-    bump above).  The intervening daemon-only bumps (r37 / r39 /
-    r41 / r43) didn't touch kernel/ or include/ at all.
-  * Effective wire cap is min(kernel, daemon) — lock-step install
-    WITH daedalus-v4l2 0.1.0+r45+g872eec5 REQUIRED.
-  * Allocations (kmemdup / kmalloc on payload, vb2 plane backing)
-    are dynamic and sized per-payload at runtime; the bump only
-    sets the ceiling.  KMALLOC_MAX_SIZE on aarch64 SLUB is several
-    MiB so 1 MiB is well within bounds.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Fri, 22 May 2026 21:00:00 +0000
-
-daedalus-v4l2-dkms (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium
-
-  * Bump to 5d8b436 — reverts daedalus-v4l2 PRs #7 + #8.  Kernel
-    module returns to the pre-#7 buf_done_and_job_finish completion
-    model: no src/dst lifecycle decoupling, no parked dst_bufs, no
-    1:1-contract violation against libva-v4l2-request-fourier
-    (closes daedalus-v4l2#9 + #10 as won't-fix at this layer; proper
-    fix tracked at daedalus-v4l2#11).
-  * Wire-protocol drops 1 → 0; lock-step install with daedalus-v4l2
-    0.1.0+r33+g5d8b436 REQUIRED.
-  * Carries forward the #64 multi-kernel postinst fix.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 14:50:00 +0000
-
-daedalus-v4l2-dkms (0.1.0+r30+g6ffe92b-1) bookworm trixie; urgency=medium
-
-  * Bump to 6ffe92b — fixes the kernel panic regression introduced
-    by 79256dc's split-completion design (closes daedalus-v4l2#8).
-    `device_run` now removes both src + dst from `m2m_ctx`'s
-    rdy_queue at pickup time, not at `buf_done` time.  Without
-    this, after `SRC_CONSUMED`'s `job_finish` released the m2m
-    scheduler, the NEXT `device_run` saw the still-queued parked
-    dst_buf and paired it with a fresh src — two inflight entries
-    referencing the same vb2_buffer, the later `HAS_PIXELS`
-    triggered list_del on an already-detached list_head, smashing
-    the rdy_queue → hard reboot on Pi CM5 during `mpv vaapi-copy`
-    playback of 720p H.264 (2026-05-21).
-  * Wire protocol unchanged — DAEDALUS_PROTO_VERSION stays at 1.
-    Daemon (userspace daedalus-v4l2 package) need NOT bump in
-    lockstep with this DKMS update; the existing
-    daedalus-v4l2 0.1.0+r28+g79256dc is wire-compatible with
-    daedalus-v4l2-dkms 0.1.0+r30+g6ffe92b.
-  * Carries forward the #64 multi-kernel postinst fix.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 14:00:00 +0000
-
-daedalus-v4l2-dkms (0.1.0+r28+g79256dc-1) bookworm trixie; urgency=medium
-
-  * Bump to 79256dc — H.264 B-frame display reorder fix (closes
-    daedalus-v4l2#6).  libavcodec's H.264 decoder reorders output to
-    display order before returning from avcodec_receive_frame; the
-    daemon was binding each REQ_DECODE's pixels to the cookie of the
-    bitstream that triggered the receive_frame call, not the cookie
-    of the bitstream that actually produced the picture.  For B-frame
-    sequences this paired cookie N's CAPTURE buffer with cookie N-2's
-    pixels and silently lost intermediate frames — visible as
-    "2 1 4 3 6 5" frame pairing in mpv / Firefox on Pi CM5.
-  * Wire-protocol bump (DAEDALUS_PROTO_VERSION 0 → 1): REQ_DECODE
-    gains __u64 src_pts; RESP_FRAME gains __u32 flags +
-    __u64 output_src_pts.  Kernel + daemon must install atomically
-    (this package + daedalus-v4l2 0.1.0+r28+g79256dc).
-  * Carries forward the #64 multi-kernel postinst fix from -2:
-    autoinstall for every /lib/modules/*/build that resolves to real
-    headers, not just $(uname -r).
-  * Closes #64.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 12:00:00 +0000
-
-daedalus-v4l2-dkms (0.1.0+r24+gf0d4186-2) bookworm trixie; urgency=medium
-
-  * postinst: autoinstall for every installed kernel with headers, not
-    just the running one.  Previously `dkms autoinstall $NAME/$VERSION`
-    built only against `$(uname -r)`, so installing the package on
-    kernel A and then rebooting into a separately-installed kernel B
-    left /lib/modules/B/updates/dkms/ empty — /dev/daedalus-v4l2 absent,
-    daedalus daemon nothing to talk to, browser/VAAPI silently falling
-    back to software with no obvious diagnostic.  Now we enumerate every
-    /lib/modules/*/build that resolves to a real directory and run
-    `dkms autoinstall -k <kver>` for each, reporting per-kernel failure
-    only when headers are missing.  Closes #64.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 09:30:00 +0000
-
-daedalus-v4l2-dkms (0.1.0+r24+gf0d4186-1) bookworm trixie; urgency=medium
-
-  * Bump to f0d4186 — per-ctx vb2 lock fix.  daedalus_queue_init now
-    uses ctx->vb_mutex instead of ctx->dev->m2m_lock for each
-    vb2_queue's lock, unblocking Firefox's multi-process VAAPI
-    clients (they were colliding on the device-wide mutex and one
-    would EBUSY-fail S_FMT while another was mid-streamon).
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Wed, 20 May 2026 23:00:00 +0000
-
-daedalus-v4l2-dkms (0.1.0+r22+g462aa4b-1) bookworm trixie; urgency=medium
-
-  * Bump to 462aa4b — kernel device_run() now calls
-    v4l2_ctrl_request_setup() before reading the H.264 stateless
-    control values from the bound media_request, so the values
-    daedalus ships to the userspace daemon match what the V4L2
-    client (libva-v4l2-request-fourier) actually set.  Closes the
-    libva→kernel control-binding gap that was causing decoded
-    frames to come back as best-effort zero garbage from libavcodec.
-  * Wire-ABI lockstep with daedalus-v4l2 0.1.0+r22+g462aa4b.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Wed, 20 May 2026 22:00:00 +0000
-
-daedalus-v4l2-dkms (0.1.0+r20+g3dd0eb0-1) bookworm trixie; urgency=medium
-
-  * Bump to 3dd0eb0 — DAEMON-PPS kernel-side changes.  device_run()
-    now reads the V4L2 H.264 stateless control values from the bound
-    media_request and ships them to the daemon inside REQ_DECODE
-    via the new struct daedalus_h264_meta block (gated on
-    DAEDALUS_REQ_FLAG_H264_META).  Required for H.264 decode to
-    work via the libva-v4l2-request -> daedalus daemon path; daemon
-    synthesises AnnexB SPS+PPS NAL units from the structs.
-  * Wire-ABI lockstep with daedalus-v4l2 0.1.0+r20+g3dd0eb0 — install
-    both packages together.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Wed, 20 May 2026 21:00:00 +0000
-
 daedalus-v4l2-dkms (0.1.0+r18+g481279c-1) bookworm trixie; urgency=medium

  * Bump to 481279c in lockstep with the userspace daedalus-v4l2
@@ -11,23 +11,13 @@
 # Upstream repo: https://git.reauktion.de/reauktion/daedalus-v4l2
 set -euo pipefail

-# 6e6dfa1 = picks up daedalus-v4l2 PR #16 — daemon now dlopens
-# the Kwiboo fourier fork's libavcodec.so.62 / libavformat.so.62 /
-# libavutil.so.60 at /opt/fourier instead of Debian-stock soname
-# 61/61/59.  First step on the daedalus-fourier substitution arc
-# (daedalus-v4l2#11): routes the daemon through the libavcodec
-# source tree we own in marfrit-packages.  Headers + .pc files
-# come from ffmpeg-v4l2-request-fourier (installed by the CI
-# workflow before this script runs; see PKG_CONFIG_PATH below).
-UPSTREAM_COMMIT=872eec505eb91b561892d02a0526749348ddc121
-PKGVER=0.1.0+r45+g872eec5
-PKGREL=1  # reset for new upstream pin (872eec5 — PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB, closes #19); lock-step with daedalus-v4l2-dkms 0.1.0+r45+g872eec5 REQUIRED
-
-# daedalus-fourier pin.  d87239d = marfrit/daedalus-fourier PR #1 merge
-# (install rules + pkg-config, enables this consumer to find_package
-# + link).  Bump in lockstep with the upstream daemon when daedalus-
-# fourier's API or installed shaders are changed by a new consumer.
-DAEDALUS_FOURIER_COMMIT=d87239d8172307d9a1b93c95cbed116d175b85cc
+# Same pin as the Arch PKGBUILD.  481279c = "Phase 8.13: byte-exact
+# end-to-end via libva (consumer target hit)" — first commit where the
+# full ffmpeg -hwaccel vaapi → libva → /dev/video0 → daemon path lands
+# a pixel-correct decoded frame back in ffmpeg.
+UPSTREAM_COMMIT=481279c9bffd19e32c8f3299897e9b63fc5a24aa
+PKGVER=0.1.0+r18+g481279c
+PKGREL=1  # reset for new upstream pin (481279c — Phase 8.13 close)

 HERE=$(dirname "$(readlink -f "$0")")

@@ -37,37 +27,14 @@ export SOURCE_DATE_EPOCH=1779231600
 work=$(mktemp -d)
 trap "rm -rf $work" EXIT

-# --- daedalus-fourier: fetch + build + install to per-build prefix ---
-#
-# Static-linked into the daemon, so the temp prefix is only for the
-# duration of this build script.  Requires libvulkan-dev + glslang-tools
-# on the runner (already needed for the daedalus-fourier benches).
-FOURIER_PREFIX=$work/fourier-prefix
-mkdir -p "$FOURIER_PREFIX"
-
 cd "$work"
-curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-fourier.tar.gz \
-    "https://git.reauktion.de/marfrit/daedalus-fourier/archive/${DAEDALUS_FOURIER_COMMIT}.tar.gz"
-tar xzf daedalus-fourier.tar.gz
-cd daedalus-fourier
-cmake -B build -G Ninja \
-    -DCMAKE_BUILD_TYPE=Release \
-    -DCMAKE_INSTALL_PREFIX="$FOURIER_PREFIX"
-cmake --build build --target daedalus_core
-cmake --install build
-
-# --- daedalus-v4l2: fetch + build daemon against installed daedalus-fourier ---
-
-cd "$work"
-curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-v4l2.tar.gz \
+curl -sSLfo daedalus-v4l2.tar.gz \
    "https://git.reauktion.de/reauktion/daedalus-v4l2/archive/${UPSTREAM_COMMIT}.tar.gz"
 tar xzf daedalus-v4l2.tar.gz
 SRCDIR=daedalus-v4l2

-# Build daemon (CMake) — point pkg-config at the daedalus-fourier
-# temp prefix so pkg_check_modules(DAEDALUS_FOURIER …) resolves to it.
+# Build daemon (CMake)
 cd "$SRCDIR/daemon"
-PKG_CONFIG_PATH="$FOURIER_PREFIX/lib/pkgconfig:/opt/fourier/lib/pkgconfig" \
 cmake -B build -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr
@@ -1,215 +1,3 @@
-daedalus-v4l2 (0.1.0+r45+g872eec5-1) bookworm trixie; urgency=medium
-
-  * Bump to 872eec5 — picks up daedalus-v4l2 PR #20 (closes #19).
-    Wire-protocol cap DAEDALUS_PROTO_MAX_PAYLOAD raised from 64 KiB
-    to 1 MiB.  DAEDALUS_MAX_BITSTREAM follows; daedalus_fill_output_fmt
-    now reports OUTPUT_MPLANE sizeimage = ~1 MiB instead of 65484.
-    libva-v4l2-request-fourier's S_FMT-driven OUTPUT-pool resize
-    finally succeeds; Firefox no longer falls off to libmozavcodec
-    SW when an H.264 slice exceeds 64 KiB (routine on any
-    720p+ stream).
-  * #define-only change in include/daedalus_v4l2_proto.h; struct
-    layout unchanged.  But effective cap is min(kernel, daemon) —
-    lock-step install of this package WITH
-    daedalus-v4l2-dkms 0.1.0+r45+g872eec5 REQUIRED.
-  * Daemon-side allocations are dynamic (malloc-on-payload), so
-    the practical growth is one ~1 MiB read buffer per daemon
-    process at startup.  Negligible on Pi 5 / 8 GB.
-  * Picks up the same r43 -> r45 transition as daedalus-v4l2-dkms
-    (which had been stuck at r33+g5d8b436 since the parking-design
-    revert because the kernel module didn't change in r37/r39/r41/r43).
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Fri, 22 May 2026 21:00:00 +0000
-
-daedalus-v4l2 (0.1.0+r43+g1d8f5af-1) bookworm trixie; urgency=medium
-
-  * Bump to 1d8f5af — picks up daedalus-v4l2 PR #18 (closes #17).
-    Daemon now drops degenerate (<4 byte) bitstreams at the REQ_DECODE
-    entry instead of letting avcodec_send_packet return
-    AVERROR_INVALIDDATA.  Reply RESP_FRAME with status=
-    DAEDALUS_DECODE_NO_FRAME so libva's V4L2 surface pool stays
-    healthy.
-  * Fixes the Firefox YouTube avc1 pause→resume regression observed
-    on higgs: libva-v4l2-request-fourier flushes a 3-byte stub
-    (presumably a bare NAL start code) into OUTPUT_MPLANE at the
-    pause boundary; the old INVALIDDATA error path made Firefox
-    fall off to libmozavcodec SW for the rest of the session.  With
-    this filter the daemon logs the sentinel as 'tiny bitstream 3
-    bytes — dropping as no-op' and the next real REQ_DECODE
-    proceeds normally.
-  * Wire protocol unchanged.  No daedalus-v4l2-dkms bump needed.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Fri, 22 May 2026 17:30:00 +0000
-
-daedalus-v4l2 (0.1.0+r41+g6e6dfa1-1) bookworm trixie; urgency=medium
-
-  * Bump to 6e6dfa1 — daedalus-v4l2 PR #16.  Daemon dlopens Kwiboo
-    fourier fork's libavcodec.so.62 / libavformat.so.62 /
-    libavutil.so.60 at /opt/fourier instead of Debian-stock
-    soname 61/61/59.  First step on the daedalus-fourier
-    substitution arc (daedalus-v4l2#11): the next PR series
-    layers daedalus_recipe_dispatch_h264_* substitution patches
-    into ffmpeg-v4l2-request-fourier's H264DSPContext NEON init,
-    reaching the daemon's production decode path.
-  * Build: PKG_CONFIG_PATH now includes /opt/fourier/lib/pkgconfig
-    so daemon's pkg_check_modules picks up the Kwiboo .pc files.
-  * CI workflow build-deps: libavcodec-dev / libavformat-dev /
-    libavutil-dev (Debian stock 7.1.3) → ffmpeg-v4l2-request-fourier
-    (provides /opt/fourier/include + .pc files).
-  * Wire protocol unchanged.  No daedalus-v4l2-dkms bump.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 21:30:00 +0000
-
-daedalus-v4l2 (0.1.0+r39+g3bc0da1-1) bookworm trixie; urgency=medium
-
-  * Bump to 3bc0da1 — picks up daedalus-v4l2 PR #15.  Per-frame
-    `decoder: OK ...` log line gains `decode_us=N` (libavcodec
-    send_packet + receive_frame wall-clock cost in microseconds).
-    New `decoder stats` summary line every 60 decoded frames with
-    codec, fps, avg decode_us, MBs/s throughput, B/MB bitrate.
-  * Pure observability — no decode-path behaviour change.
-    Establishes baseline metrics for the substitution work in
-    daedalus-v4l2#11 step 2 (replacing libavcodec primitives with
-    daedalus-fourier kernels one cycle at a time).
-  * On Pi CM5 / bbb 720p H.264 baseline: ~4 ms decode_us / 24 fps
-    / 90 K MBs/s — workload is well under 1 % of any single
-    daedalus-fourier kernel's NEON ceiling.
-  * Wire protocol unchanged.  No daedalus-v4l2-dkms bump needed.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 18:30:00 +0000
-
-daedalus-v4l2 (0.1.0+r37+g77e14e5-1) bookworm trixie; urgency=medium
-
-  * Bump to 77e14e5 — picks up daedalus-v4l2 PRs #12 + #13.
-  * #12 (LOW_DELAY half-measure): the daemon now sets
-    AV_CODEC_FLAG_LOW_DELAY on the H.264 AVCodecContext so libavcodec
-    emits frames in decode order ~99% of the time (a few stragglers
-    at GOP boundaries when the stream's SPS num_reorder_frames
-    overrides the flag).  Visible improvement vs the 2-1-4-3
-    pair-swap on Firefox YouTube + mpv playback; not a permanent
-    fix (see #11 for the architectural plan).
-  * #13 (daedalus-fourier linkage): the daemon now pkg-config-links
-    against the daedalus-fourier kernel library (marfrit/
-    daedalus-fourier) and logs substrate availability at startup.
-    No kernels dispatched yet — this is the build-time / link-time
-    foundation for the H.264 daemon-rewrite plan in #11
-    (substituting daedalus-fourier IDCT 4×4 / IDCT 8×8 / luma
-    deblock primitives for libavcodec's per-MB pixel math, one
-    cycle at a time, measuring CPU saved per substitution).
-  * Build-deb.sh now fetches + builds + installs daedalus-fourier
-    (pinned at d87239d, marfrit/daedalus-fourier PR #1) into a
-    per-build temp prefix, then builds the daemon with
-    PKG_CONFIG_PATH pointing at it.  daedalus-fourier is
-    statically linked into the daemon binary, so the resulting
-    .deb has no new runtime deps.  Requires libvulkan-dev +
-    glslang-tools on the CI runner (the daedalus-fourier benches
-    already needed those).
-  * Wire protocol unchanged — DAEDALUS_PROTO_VERSION stays at 0.
-    No daedalus-v4l2-dkms bump needed.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 16:30:00 +0000
-
-daedalus-v4l2 (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium
-
-  * Bump to 5d8b436 — reverts daedalus-v4l2 PRs #7 + #8 (the parking
-    design that broke libva-v4l2-request-fourier's 1:1 CAPTURE
-    contract; see daedalus-v4l2#9 + #10).  After daemon-r28+g79256dc
-    landed, mpv (--hwdec=vaapi-copy) failed pre-playing with
-    "Unable to dequeue buffer: Resource temporarily unavailable" /
-    "Failed to end picture decode" because the daemon parked CAPTURE
-    buffers waiting for libavcodec to release H.264 B-frames in
-    display order — violating the V4L2 stateless 1:1 contract.
-    Firefox tolerated the mess (visible "2 1 4 3" pair-swap); mpv
-    bailed.
-  * This bump restores f0d4186-equivalent behaviour, plus PR #4
-    (cosmetic H.264 DECODE_MODE / START_CODE menu controls).  PR #7
-    + PR #8 wire-protocol additions (src_pts / output_src_pts /
-    RESP_FRAME flags) are reverted — DAEDALUS_PROTO_VERSION drops
-    back from 1 → 0.  Lock-step install with daedalus-v4l2-dkms
-    0.1.0+r33+g5d8b436 REQUIRED.
-  * Visible regression: H.264 B-frame streams in Firefox revert to
-    the original "2 1 4 3 6 5" pair-swap visual.  The proper fix
-    (concurrent in-flight requests in daemon + display-order reorder
-    in libva-v4l2-request-fourier) is tracked at daedalus-v4l2#11.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 14:50:00 +0000
-
-daedalus-v4l2 (0.1.0+r28+g79256dc-1) bookworm trixie; urgency=medium
-
-  * Bump to 79256dc — H.264 B-frame display reorder fix (closes
-    daedalus-v4l2#6 + #4 menu controls).  Daemon side: the
-    avcodec_send_packet → receive_frame loop now stamps pkt->pts =
-    req->src_pts so libavcodec's display-ordered frame->pts identifies
-    which OUTPUT bitstream's pixels each drained frame belongs to.
-    chardev_client maintains a (src_pts → cookie) lookup table so the
-    daemon can ship pixels to the cookie of the *originating*
-    bitstream, not the cookie of whatever REQ triggered the
-    receive_frame call.  Multiple RESP_FRAME messages per REQ_DECODE
-    are now possible (one for the just-consumed src, one or more for
-    drained pixels).
-  * Wire-protocol bump (DAEDALUS_PROTO_VERSION 0 → 1): REQ_DECODE
-    gains __u64 src_pts; RESP_FRAME gains __u32 flags +
-    __u64 output_src_pts.  Daemon + kernel must install atomically
-    (this package + daedalus-v4l2-dkms 0.1.0+r28+g79256dc).
-  * Also subsumes 79256dc's predecessor 7ff2d89 — H.264 DECODE_MODE +
-    START_CODE menu-control registration that retires the
-    "Unable to set control(s) error_idx=2/2" warning libva-v4l2-
-    request emitted on every context init.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 12:00:00 +0000
-
-daedalus-v4l2 (0.1.0+r24+gf0d4186-1) bookworm trixie; urgency=medium
-
-  * Bump to f0d4186 — kernel per-ctx vb2 lock fix.  daedalus_queue_init
-    was wiring src_vq->lock and dst_vq->lock to ctx->dev->m2m_lock (a
-    device-wide mutex), serialising every vb2 ioctl across all
-    concurrent clients of /dev/video0.  For Firefox (which spawns
-    separate content + RDD + GPU processes that each open the device
-    and run libva probe simultaneously), one libva session's
-    S_FMT(OUTPUT_MPLANE) hit EBUSY while another was mid-streamon —
-    Firefox VAAPI playback fell apart at startup.
-  * Fix gives each open() its own ctx->vb_mutex; vb2 ioctls run
-    independently per client.  Matches cedrus / rkvdec / hantro
-    pattern.
-  * Verified on higgs: Firefox YouTube playback engages VAAPI cleanly,
-    sustained ~230 fps decode at 640x368 through the daedalus daemon,
-    zero EBUSY in stderr or daemon journal.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Wed, 20 May 2026 23:00:00 +0000
-
-daedalus-v4l2 (0.1.0+r22+g462aa4b-1) bookworm trixie; urgency=medium
-
-  * Bump to 462aa4b — kernel-side fix for control-binding gap that
-    closes the libva→daemon SPS/PPS pipeline.  Kernel device_run now
-    calls v4l2_ctrl_request_setup() before reading ctrl->p_cur, so
-    the daemon's daedalus_h264_meta block actually carries THIS
-    request's V4L2 stateless H.264 control values instead of stale
-    /default ones.  Pairs with libva-v4l2-request-fourier r382+gc1bb444
-    (Fix 3 + Fix 4 from issue libva-v4l2-request-fourier#8).
-  * After-fix on higgs (Pi CM5): ffmpeg -hwaccel vaapi -i h264.mp4
-    produces unique decoded P-frames (per-frame fnv1a hashes differ)
-    and zero "error while decoding MB" / "reference frames exceeds
-    max" warnings.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Wed, 20 May 2026 22:00:00 +0000
-
-daedalus-v4l2 (0.1.0+r20+g3dd0eb0-1) bookworm trixie; urgency=medium
-
-  * Bump to 3dd0eb0 — DAEMON-PPS H.264 SPS/PPS NAL synthesiser.
-    Daemon now reconstructs AnnexB SPS+PPS NAL units from the V4L2
-    stateless H.264 control structs (forwarded by the kernel via
-    a new struct daedalus_h264_meta block in REQ_DECODE) and
-    prepends them to the slice bitstream before feeding libavcodec.
-    Without this, ffmpeg -hwaccel vaapi on H.264 sources failed
-    with "non-existing PPS 0 referenced" even after LIBVA-1/-2
-    routing correctly delivered the request.
-  * Wire protocol: new DAEDALUS_REQ_FLAG_H264_META bit + struct
-    daedalus_h264_meta; daemon and kernel must be installed in
-    lockstep (this package + daedalus-v4l2-dkms 0.1.0+r20+g3dd0eb0).
-  * VP9 / AV1 paths unchanged.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Wed, 20 May 2026 21:00:00 +0000
-
 daedalus-v4l2 (0.1.0+r18+g481279c-1) bookworm trixie; urgency=medium

  * Bump to 481279c.  Upstream landed the systemd unit + modules-load.d
@@ -1,137 +0,0 @@
-From f760c0541586f43334c02611fcb4c212c08ad576 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Thu, 21 May 2026 21:40:22 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 4x4 IDCT through
- daedalus-fourier
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-H264DSPContext.idct_add (called per 4x4 block from the intra-4x4
-decode path in h264_mb.c) now dispatches through
-daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
-
-The recipe layer picks the substrate; for cycle 6 (H.264 IDCT 4x4)
-the recipe is CPU NEON, so this is effectively a NEON-to-NEON
-substitution with one extra dispatch call and recipe-table lookup.
-Provides the first end-to-end exercise of the daedalus-fourier
-kernel pack inside the libavcodec.so decode hot path; follow-up
-patches wire IDCT 8x8, luma-v deblock, and qpel mc20.
-
-The library context is process-global, lazily initialised under
-pthread_once on first call.  We pick the no-QPU constructor because
-libavcodec.so is loaded into arbitrary host processes
-(firefox-fourier, mpv-fourier, daedalus_v4l2_daemon, ...) and we
-cannot assume the host has a usable Vulkan instance.  Higher cycles
-(deblock luma-v, MC) that benefit from the QPU will provision their
-own recipe-selected context once that path is wired.
-
-Bulk paths (idct_add16, idct_add16intra, idct_add8 — used for
-non-intra4x4 macroblocks) remain on the stock NEON .S implementations
-and will be batched through daedalus_recipe_dispatch_h264_idct4 with
-n_blocks>1 in a follow-up.
-
-Bit-exact against ff_h264_idct_add_neon (daedalus-fourier cycle 6
-green; see marfrit/daedalus-fourier/CYCLE_LOGS.md).
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2.
---
- libavcodec/aarch64/Makefile               |  3 +-
- libavcodec/aarch64/h264_idct_daedalus.c   | 49 +++++++++++++++++++++++
- libavcodec/aarch64/h264dsp_init_aarch64.c |  3 +-
- 3 files changed, 53 insertions(+), 2 deletions(-)
- create mode 100644 libavcodec/aarch64/h264_idct_daedalus.c
-
-diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
-index 41ab025..7b95fb1 100644
--- a/libavcodec/aarch64/Makefile
-+++ b/libavcodec/aarch64/Makefile
-@@ -3,7 +3,8 @@ OBJS-$(CONFIG_AC3DSP)                   += aarch64/ac3dsp_init_aarch64.o
- OBJS-$(CONFIG_FDCTDSP)                  += aarch64/fdctdsp_init_aarch64.o
- OBJS-$(CONFIG_FMTCONVERT)               += aarch64/fmtconvert_init.o
- OBJS-$(CONFIG_H264CHROMA)               += aarch64/h264chroma_init_aarch64.o
-OBJS-$(CONFIG_H264DSP)                  += aarch64/h264dsp_init_aarch64.o
-+OBJS-$(CONFIG_H264DSP)                  += aarch64/h264dsp_init_aarch64.o \
-+                                           aarch64/h264_idct_daedalus.o
- OBJS-$(CONFIG_HUFFYUVDSP)               += aarch64/huffyuvdsp_init_aarch64.o
- OBJS-$(CONFIG_H264PRED)                 += aarch64/h264pred_init.o
- OBJS-$(CONFIG_H264QPEL)                 += aarch64/h264qpel_init_aarch64.o
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
-new file mode 100644
-index 0000000..538d223
--- /dev/null
-+++ b/libavcodec/aarch64/h264_idct_daedalus.c
-@@ -0,0 +1,49 @@
-+/*
-+ * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
-+ *
-+ * Routes H264DSPContext.idct_add through
-+ * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
-+ * The recipe layer picks the substrate (CPU NEON by default for
-+ * cycle 6; future cycles may dispatch to V3D opportunistically).
-+ *
-+ * FFmpeg's 4x4 block memory layout matches daedalus's column-major
-+ * convention: block[r + 4*c] = coefficient at (row r, col c).  Both
-+ * sides destructively zero the block after the transform.
-+ *
-+ * The library context is process-global and lazily initialised under
-+ * pthread_once.  We pick the no-QPU constructor here because
-+ * libavcodec.so is loaded into arbitrary host processes
-+ * (firefox-fourier, mpv-fourier, daedalus_v4l2_daemon, ...) and we
-+ * cannot assume the host has a usable Vulkan instance.  Higher cycles
-+ * (deblock, MC) that benefit from the QPU initialise their own
-+ * recipe-selected context once that path is wired.
-+ */
-+
-+#include <pthread.h>
-+#include <stddef.h>
-+#include <stdint.h>
-+
-+#include <daedalus.h>
-+
-+#include "libavutil/attributes.h"
-+#include "libavcodec/h264dsp.h"
-+
-+static daedalus_ctx     *g_dctx;
-+static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-+
-+static void daedalus_ctx_init_once(void)
-+{
-+    g_dctx = daedalus_ctx_create_no_qpu();
-+}
-+
-+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
-+
-+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
-+{
-+    static const daedalus_h264_block_meta meta = { .dst_off = 0 };
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
-+                                        block, 1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
-index c684574..b993df2 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
-+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
-@@ -66,6 +66,7 @@ void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride
-                                     int weights, int offset);
- 
- void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, int stride);
-+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_idct_add16_neon(uint8_t *dst, const int *block_offset,
-                              int16_t *block, int stride,
-@@ -139,7 +140,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
-         c->biweight_pixels_tab[1] = ff_biweight_h264_pixels_8_neon;
-         c->biweight_pixels_tab[2] = ff_biweight_h264_pixels_4_neon;
- 
-        c->idct_add        = ff_h264_idct_add_neon;
-+        c->idct_add        = ff_h264_idct_add_daedalus;
-         c->idct_dc_add     = ff_h264_idct_dc_add_neon;
-         c->idct_add16      = ff_h264_idct_add16_neon;
-         c->idct_add16intra = ff_h264_idct_add16intra_neon;
-- 
-2.47.3
-
@@ -1,107 +0,0 @@
-From 1b286ddb4efaca26ec9b9e290e989fec77dc1c77 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Fri, 22 May 2026 10:18:21 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 8x8 IDCT through
- daedalus-fourier
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-H264DSPContext.idct8_add (called per 8x8 block from the High-profile
-intra-8x8-DCT decode path in h264_mb.c) now dispatches through
-daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.
-
-The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8x8)
-the recipe is CPU NEON, so this is effectively a NEON-to-NEON
-substitution layered on top of the cycle-6 IDCT 4x4 wiring.  Same
-pthread_once global context, same destructive-zero semantics; FFmpeg
-column-major 8x8 storage block[r + 8*c] matches daedalus's convention.
-
-Bulk path c->idct8_add4 (used for inter 8x8-DCT macroblocks) remains
-on the in-tree NEON .S code and will be batched through
-daedalus_recipe_dispatch_h264_idct8 with n_blocks>1 in a follow-up.
-
-Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
-green).
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 7.
---
- libavcodec/aarch64/h264_idct_daedalus.c   | 29 ++++++++++++++++-------
- libavcodec/aarch64/h264dsp_init_aarch64.c |  3 ++-
- 2 files changed, 23 insertions(+), 9 deletions(-)
-
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
-index 538d223..cbb98af 100644
--- a/libavcodec/aarch64/h264_idct_daedalus.c
-+++ b/libavcodec/aarch64/h264_idct_daedalus.c
-@@ -1,14 +1,16 @@
- /*
- * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
-+ * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
-  *
- * Routes H264DSPContext.idct_add through
- * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
- * The recipe layer picks the substrate (CPU NEON by default for
- * cycle 6; future cycles may dispatch to V3D opportunistically).
-+ * Routes H264DSPContext.idct_add  → daedalus_recipe_dispatch_h264_idct4
-+ *        H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
-+ * instead of the in-tree ff_h264_idct{,8}_add_neon assembly.  The
-+ * recipe layer picks the substrate (CPU NEON by default for cycles
-+ * 6 + 7; future cycles may dispatch to V3D opportunistically).
-  *
- * FFmpeg's 4x4 block memory layout matches daedalus's column-major
- * convention: block[r + 4*c] = coefficient at (row r, col c).  Both
- * sides destructively zero the block after the transform.
-+ * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
-+ * column-major convention: block[r + N*c] = coefficient at
-+ * (row r, col c) for N ∈ {4, 8}.  Both sides destructively zero the
-+ * block after the transform.
-  *
-  * The library context is process-global and lazily initialised under
-  * pthread_once.  We pick the no-QPU constructor here because
-@@ -37,6 +39,7 @@ static void daedalus_ctx_init_once(void)
- }
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
-+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -47,3 +50,13 @@ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
-     daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
-                                         block, 1, &meta);
- }
-+
-+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
-+{
-+    static const daedalus_h264_block_meta meta = { .dst_off = 0 };
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
-+                                        block, 1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
-index b993df2..741e551 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
-+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
-@@ -79,6 +79,7 @@ void ff_h264_idct_add8_neon(uint8_t **dest, const int *block_offset,
-                             const uint8_t nnzc[15 * 8]);
- 
- void ff_h264_idct8_add_neon(uint8_t *dst, int16_t *block, int stride);
-+void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_idct8_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_idct8_add4_neon(uint8_t *dst, const int *block_offset,
-                              int16_t *block, int stride,
-@@ -146,7 +147,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
-         c->idct_add16intra = ff_h264_idct_add16intra_neon;
-         if (chroma_format_idc <= 1)
-             c->idct_add8   = ff_h264_idct_add8_neon;
-        c->idct8_add       = ff_h264_idct8_add_neon;
-+        c->idct8_add       = ff_h264_idct8_add_daedalus;
-         c->idct8_dc_add    = ff_h264_idct8_dc_add_neon;
-         c->idct8_add4      = ff_h264_idct8_add4_neon;
-     } else if (have_neon(cpu_flags) && bit_depth == 10) {
-- 
-2.47.3
-
@@ -1,121 +0,0 @@
-From 68731c41d7ea68be0e912b128cb4e71fb56e8263 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Fri, 22 May 2026 12:15:16 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 luma-v deblock through
- daedalus-fourier
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-H264DSPContext.v_loop_filter_luma (non-intra bS<4 vertical luma
-deblock, called per macroblock-row edge from the slice deblock
-loop) now dispatches through
-daedalus_recipe_dispatch_h264_deblock_luma_v instead of
-ff_h264_v_loop_filter_luma_neon.
-
-The recipe layer picks the substrate; for cycle 8 the daedalus
-docstring marks the kernel "CPU primary; QPU opportunistic", but
-the libavcodec.so context here is built with
-daedalus_ctx_create_no_qpu — process-global pthread_once init,
-shared with cycles 6/7.  QPU opportunism stays gated off until a
-follow-up adds an explicit feature flag (no implicit Vulkan init
-in arbitrary host processes).  In the meantime cycle 8 is a
-plumbing-only substitution, NEON-to-NEON via the daedalus recipe.
-
-Intra (bS=4) loop filter — c->v_loop_filter_luma_intra — stays on
-the in-tree NEON .S code; daedalus's daedalus_h264_deblock_meta
-only covers the non-intra path per its docstring.
-
-FFmpeg `int alpha/beta/int8_t tc0[4]` → daedalus_h264_deblock_meta
-(int32_t alpha/beta + inline int8_t tc0[4]).  pix already points
-to row 0 of the bottom block per FFmpeg's deblock convention,
-satisfying daedalus's `dst_off >= 4 * dst_stride` constraint.
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8.
---
- libavcodec/aarch64/h264_idct_daedalus.c   | 36 +++++++++++++++++++----
- libavcodec/aarch64/h264dsp_init_aarch64.c |  4 ++-
- 2 files changed, 33 insertions(+), 7 deletions(-)
-
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
-index cbb98af..92365fa 100644
--- a/libavcodec/aarch64/h264_idct_daedalus.c
-+++ b/libavcodec/aarch64/h264_idct_daedalus.c
-@@ -1,11 +1,14 @@
- /*
- * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma-v deblock — daedalus-fourier substitution shims.
-  *
- * Routes H264DSPContext.idct_add  → daedalus_recipe_dispatch_h264_idct4
- *        H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
- * instead of the in-tree ff_h264_idct{,8}_add_neon assembly.  The
- * recipe layer picks the substrate (CPU NEON by default for cycles
- * 6 + 7; future cycles may dispatch to V3D opportunistically).
-+ * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-+ *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
-+ *        H264DSPContext.v_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_v
-+ * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-+ * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-+ * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
-+ * so cycle 8 stays on the CPU NEON path until a separate change
-+ * gates QPU init on a daedalus-fourier feature flag).
-  *
-  * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
-  * column-major convention: block[r + N*c] = coefficient at
-@@ -40,6 +43,8 @@ static void daedalus_ctx_init_once(void)
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
-+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -60,3 +65,22 @@ void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
-     daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
-                                         block, 1, &meta);
- }
-+
-+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    meta.tc0[0] = tc0[0];
-+    meta.tc0[1] = tc0[1];
-+    meta.tc0[2] = tc0[2];
-+    meta.tc0[3] = tc0[3];
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_luma_v(g_dctx, pix, (size_t)stride,
-+                                                 1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
-index 741e551..85ac381 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
-+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
-@@ -27,6 +27,8 @@
- 
- void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                      int beta, int8_t *tc0);
-+void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                      int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-@@ -114,7 +116,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
-     int cpu_flags = av_get_cpu_flags();
- 
-     if (have_neon(cpu_flags) && bit_depth == 8) {
-        c->v_loop_filter_luma   = ff_h264_v_loop_filter_luma_neon;
-+        c->v_loop_filter_luma   = ff_h264_v_loop_filter_luma_daedalus;
-         c->h_loop_filter_luma   = ff_h264_h_loop_filter_luma_neon;
-         c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
-         c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
-- 
-2.47.3
-
@@ -1,82 +0,0 @@
-From 0d1292ea99bc4e5fa2da438259fa01a2374e3e04 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Fri, 22 May 2026 14:18:25 +0200
-Subject: [PATCH] avcodec/h264: restore AV_CODEC_FLAG_LOW_DELAY semantics
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-FFmpeg 8.x dropped the H.264 decoder's low_delay path —
-AV_CODEC_FLAG_LOW_DELAY no longer prevents
-h264_select_output_frame from running the display-order DPB
-output queue.  V4L2-stateless-style consumers (daedalus-v4l2
-daemon, libva-v4l2-request-fourier) that set the flag end up
-seeing the 2-1-4-3 pair-swap pattern on B-frame streams again.
-
-Restore the documented semantics:
-
-  - Early-exit at the top of h264_select_output_frame when the
-    flag is set: emit the just-decoded picture immediately as
-    next_output_pic, mirror the corruption / recovery-point
-    tracking the main path performs, and skip the entire
-    delayed_pic[] / POC reorder machinery.
-
-  - Suppress the SPS-driven has_b_frames clobber in
-    h264_field_start when the flag is set, so the per-slice
-    bitstream_restriction_flag re-pickup cannot reintroduce a
-    nonzero reorder buffer mid-stream.
-
-This is a fork-only change required by the daedalus-v4l2 daemon's
-one-frame-per-send_packet contract; upstream FFmpeg consumers that
-expect display-order output remain untouched (flag default = off).
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 deblock
-+ flag-restoration follow-up.
---
- libavcodec/h264_slice.c | 23 +++++++++++++++++++++++
- 1 file changed, 23 insertions(+)
-
-diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
-index 97fab70..a7bfbd6 100644
--- a/libavcodec/h264_slice.c
-+++ b/libavcodec/h264_slice.c
-@@ -1308,6 +1308,28 @@ static int h264_select_output_frame(H264Context *h)
-     cur->mmco_reset = h->mmco_reset;
-     h->mmco_reset = 0;
- 
-+    /* AV_CODEC_FLAG_LOW_DELAY restore (FFmpeg 8.x dropped the H.264
-+     * decoder's low_delay path).  Bypass the display-order DPB
-+     * output queue: emit the just-decoded picture immediately, in
-+     * decode order, one per send_packet.  V4L2-stateless-style
-+     * consumers (daedalus-v4l2 daemon, libva-v4l2-request-fourier)
-+     * do their own POC-based reorder downstream and require this
-+     * behaviour. */
-+    if (h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) {
-+        h->next_output_pic    = cur;
-+        h->next_outputed_poc  = cur->poc;
-+        h->frame_recovered   |= cur->recovered;
-+        cur->recovered       |= h->frame_recovered & FRAME_RECOVERED_SEI;
-+        if (!cur->recovered) {
-+            if (!(h->avctx->flags  & AV_CODEC_FLAG_OUTPUT_CORRUPT) &&
-+                !(h->avctx->flags2 & AV_CODEC_FLAG2_SHOW_ALL))
-+                h->next_output_pic = NULL;
-+            else
-+                cur->f->flags |= AV_FRAME_FLAG_CORRUPT;
-+        }
-+        return 0;
-+    }
-+
-     if (sps->bitstream_restriction_flag ||
-         h->avctx->strict_std_compliance >= FF_COMPLIANCE_STRICT) {
-         h->avctx->has_b_frames = FFMAX(h->avctx->has_b_frames, sps->num_reorder_frames);
-@@ -1415,6 +1437,7 @@ static int h264_field_start(H264Context *h, const H264SliceContext *sl,
-     sps = h->ps.sps;
- 
-     if (sps->bitstream_restriction_flag &&
-+        !(h->avctx->flags & AV_CODEC_FLAG_LOW_DELAY) &&
-         h->avctx->has_b_frames < sps->num_reorder_frames) {
-         h->avctx->has_b_frames = sps->num_reorder_frames;
-     }
-- 
-2.47.3
-
@@ -1,139 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Sat, 23 May 2026 12:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264qpel: route 8x8 mc20 through
- daedalus-fourier
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma horizontal
-half-pel, 6-tap "put" variant — the canonical representative of the
-H.264 luma motion-compensation family) now dispatches through
-daedalus_recipe_dispatch_h264_qpel_mc20 instead of
-ff_put_h264_qpel8_mc20_neon.
-
-Cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes the
-4-cycle libavcodec.so substitution sequence (6 IDCT 4x4 / 7 IDCT 8x8 /
-8 luma-v deblock / 9 qpel mc20).
-
-The recipe layer picks the substrate. Per docs/k9_h264qpel_mc20.md
-the verdict is CPU NEON: per-block 7.6 ns at 131 Mblock/s gives 135x
-margin over 30 fps 1080p, and the QPU dispatch floor (~250 ns)
-makes any V3D shader strictly worse. Substitution is plumbing-only,
-NEON-by-recipe — same daedalus_ctx_create_no_qpu pthread_once
-context shape the cycles 6/7/8 shims already own (kept SEPARATE
-from the H264DSP shim's ctx because H264QPEL is its own libavcodec
-Makefile module and link order does not guarantee a single .o
-owns the ctx symbol; one extra ~µs init per process, paid lazily).
-
-Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the 16x16
-size tier stay on the in-tree NEON .S code. Per the cycle-9 phase-1
-rationale, mc20 8x8 is representative of the whole family's per-block
-cost — extending the substitution to other variants would multiply
-recipe-lookup overhead without changing the substrate verdict.
-
-Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier
-cycle 9 green; M1 = 100% bit-exact across 10000 random blocks).
-
-No SONAME change, no Depends change.
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 9.
---
- libavcodec/aarch64/Makefile                |  3 +-
- libavcodec/aarch64/h264_qpel_daedalus.c    | 50 ++++++++++++++++++++++
- libavcodec/aarch64/h264qpel_init_aarch64.c |  4 +-
- 3 files changed, 55 insertions(+), 2 deletions(-)
- create mode 100644 libavcodec/aarch64/h264_qpel_daedalus.c
-
-diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
--- a/libavcodec/aarch64/Makefile
-+++ b/libavcodec/aarch64/Makefile
-@@ -7,7 +7,8 @@ OBJS-$(CONFIG_H264DSP)                  += aarch64/h264dsp_init_aarch64.o \
-                                            aarch64/h264_idct_daedalus.o
- OBJS-$(CONFIG_HUFFYUVDSP)               += aarch64/huffyuvdsp_init_aarch64.o
- OBJS-$(CONFIG_H264PRED)                 += aarch64/h264pred_init.o
-OBJS-$(CONFIG_H264QPEL)                 += aarch64/h264qpel_init_aarch64.o
-+OBJS-$(CONFIG_H264QPEL)                 += aarch64/h264qpel_init_aarch64.o \
-+                                           aarch64/h264_qpel_daedalus.o
- OBJS-$(CONFIG_HPELDSP)                  += aarch64/hpeldsp_init_aarch64.o
- OBJS-$(CONFIG_IDCTDSP)                  += aarch64/idctdsp_init_aarch64.o
- OBJS-$(CONFIG_ME_CMP)                   += aarch64/me_cmp_init_aarch64.o
-diff --git a/libavcodec/aarch64/h264_qpel_daedalus.c b/libavcodec/aarch64/h264_qpel_daedalus.c
-new file mode 100644
--- /dev/null
-+++ b/libavcodec/aarch64/h264_qpel_daedalus.c
-@@ -0,0 +1,50 @@
-+/*
-+ * H.264 luma qpel mc20 (8x8, horizontal half-pel, 6-tap "put")
-+ * — daedalus-fourier substitution shim.
-+ *
-+ * Routes H264QpelContext.put_h264_qpel_pixels_tab[1][2] through
-+ * daedalus_recipe_dispatch_h264_qpel_mc20 instead of
-+ * ff_put_h264_qpel8_mc20_neon.  The recipe layer picks the substrate
-+ * (CPU NEON for cycle 9; QPU not viable — per-block 7.6 ns vs
-+ * ~250 ns QPU dispatch floor, see docs/k9_h264qpel_mc20.md).
-+ *
-+ * Sibling to libavcodec/aarch64/h264_idct_daedalus.c.  We keep a
-+ * SEPARATE process-global pthread_once context here instead of
-+ * sharing the H264DSP one because H264QPEL is its own libavcodec
-+ * Makefile module and link order does not guarantee a single .o
-+ * owns the ctx symbol.  The cost is one extra
-+ * daedalus_ctx_create_no_qpu (~µs) per process; daemon and host
-+ * processes pay this lazily on first MC call.
-+ *
-+ * FFmpeg H264QpelContext convention: both dst and src use a SINGLE
-+ * stride and `src` already points at the leftmost OUTPUT column
-+ * (col 0); the 6-tap filter reads cols -2..+3.  This matches
-+ * daedalus_recipe_dispatch_h264_qpel_mc20's documented contract
-+ * directly, so dst_off = src_off = 0.
-+ */
-+
-+#include <pthread.h>
-+#include <stddef.h>
-+#include <stdint.h>
-+
-+#include <daedalus.h>
-+
-+#include "libavutil/attributes.h"
-+
-+static daedalus_ctx     *g_dctx;
-+static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-+
-+static void daedalus_ctx_init_once(void)
-+{
-+    g_dctx = daedalus_ctx_create_no_qpu();
-+}
-+
-+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
-+
-+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
-+{
-+    static const daedalus_h264_qpel_meta meta = { .dst_off = 0, .src_off = 0 };
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+    daedalus_recipe_dispatch_h264_qpel_mc20(g_dctx, dst, src, (size_t)stride,
-+                                            1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264qpel_init_aarch64.c b/libavcodec/aarch64/h264qpel_init_aarch64.c
--- a/libavcodec/aarch64/h264qpel_init_aarch64.c
-+++ b/libavcodec/aarch64/h264qpel_init_aarch64.c
-@@ -47,6 +47,8 @@ void ff_put_h264_qpel8_mc00_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t str
- void ff_put_h264_qpel8_mc10_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc20_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc30_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src,
-+                                     ptrdiff_t stride);
- void ff_put_h264_qpel8_mc01_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc11_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc21_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
-@@ -184,7 +186,7 @@ av_cold void ff_h264qpel_init_aarch64(H264QpelContext *c, int bit_depth)
-
-         c->put_h264_qpel_pixels_tab[1][ 0] = ff_put_h264_qpel8_mc00_neon;
-         c->put_h264_qpel_pixels_tab[1][ 1] = ff_put_h264_qpel8_mc10_neon;
-        c->put_h264_qpel_pixels_tab[1][ 2] = ff_put_h264_qpel8_mc20_neon;
-+        c->put_h264_qpel_pixels_tab[1][ 2] = ff_put_h264_qpel8_mc20_daedalus;
-         c->put_h264_qpel_pixels_tab[1][ 3] = ff_put_h264_qpel8_mc30_neon;
-         c->put_h264_qpel_pixels_tab[1][ 4] = ff_put_h264_qpel8_mc01_neon;
-         c->put_h264_qpel_pixels_tab[1][ 5] = ff_put_h264_qpel8_mc11_neon;
--
-2.47.3
@@ -1,92 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 12:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 luma-h deblock through daedalus-fourier
-
-Sibling of 0005 (which substituted v_loop_filter_luma).  Same
-NEON-to-NEON substitution: H264DSPContext.h_loop_filter_luma →
-daedalus_recipe_dispatch_h264_deblock_luma_h.  The H kernel landed
-in daedalus-fourier PR #9 (CPU NEON only — no QPU shader yet).
-
-libavcodec.so ctx is no-QPU per the existing 0003-0005 / 0007
-pattern; we cannot assume Vulkan in arbitrary host processes
-(firefox-fourier RDD, mpv-fourier, etc.).
-
-Intra (bS=4) h_loop_filter_luma_intra stays on the in-tree NEON .S
-code; daedalus_h264_deblock_meta only covers the non-intra path.
-An intra-h substitution can land once daedalus-fourier exposes a
-dispatch helper (the kernel already exists internally per PR #11).
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8 H.
---
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
--- a/libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:09:33.694760715 +0200
-+++ libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:09:33.715603719 +0200
-@@ -1,9 +1,10 @@
- /*
- * H.264 4x4 / 8x8 IDCT + luma-v deblock — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma v/h deblock — daedalus-fourier substitution shims.
-  *
-  * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-  *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
-  *        H264DSPContext.v_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_v
-+ *        H264DSPContext.h_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_h
-  * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-  * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-  * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
-@@ -45,6 +46,8 @@
- void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
- void ff_h264_v_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                          int alpha, int beta, int8_t *tc0);
-+void ff_h264_h_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -84,3 +87,22 @@
-     daedalus_recipe_dispatch_h264_deblock_luma_v(g_dctx, pix, (size_t)stride,
-                                                  1, &meta);
- }
-+
-+void ff_h264_h_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    meta.tc0[0] = tc0[0];
-+    meta.tc0[1] = tc0[1];
-+    meta.tc0[2] = tc0[2];
-+    meta.tc0[3] = tc0[3];
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_luma_h(g_dctx, pix, (size_t)stride,
-+                                                 1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:09:33.695937103 +0200
-+++ libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:09:33.715541700 +0200
-@@ -31,6 +31,8 @@
-                                          int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                      int beta, int8_t *tc0);
-+void ff_h264_h_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                         int alpha, int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                            int beta);
- void ff_h264_h_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-@@ -117,7 +119,7 @@
- 
-     if (have_neon(cpu_flags) && bit_depth == 8) {
-         c->v_loop_filter_luma   = ff_h264_v_loop_filter_luma_daedalus;
-        c->h_loop_filter_luma   = ff_h264_h_loop_filter_luma_neon;
-+        c->h_loop_filter_luma   = ff_h264_h_loop_filter_luma_daedalus;
-         c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
-         c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
- 
--
-2.47.3
-
@@ -1,127 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 12:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 chroma v/h deblock through daedalus-fourier
-
-Chroma siblings of 0005 (luma_v) and 0008 (luma_h).  Same
-NEON-to-NEON pattern via the daedalus recipe layer:
-
-  H264DSPContext.v_loop_filter_chroma →
-    daedalus_recipe_dispatch_h264_deblock_chroma_v
-  H264DSPContext.h_loop_filter_chroma →
-    daedalus_recipe_dispatch_h264_deblock_chroma_h
-
-Both kernels landed in daedalus-fourier PR #10.  Recipe table
-routes AUTO to CPU NEON (no chroma QPU shaders yet), so this
-is plumbing-only and stays bit-exact against the in-tree NEON.
-
-Intra chroma (bS=4) loop filters remain on in-tree NEON;
-daedalus_h264_deblock_meta covers the non-intra (bS<4) path.
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8 chroma.
---
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
--- a/libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:15:45.995368233 +0200
-+++ libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:15:46.015839177 +0200
-@@ -1,10 +1,12 @@
- /*
- * H.264 4x4 / 8x8 IDCT + luma v/h deblock — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma v/h + chroma v/h deblock — daedalus-fourier substitution shims.
-  *
-  * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-  *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
- *        H264DSPContext.v_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_v
- *        H264DSPContext.h_loop_filter_luma → daedalus_recipe_dispatch_h264_deblock_luma_h
-+ *        H264DSPContext.v_loop_filter_luma   → daedalus_recipe_dispatch_h264_deblock_luma_v
-+ *        H264DSPContext.h_loop_filter_luma   → daedalus_recipe_dispatch_h264_deblock_luma_h
-+ *        H264DSPContext.v_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_v
-+ *        H264DSPContext.h_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_h
-  * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-  * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-  * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
-@@ -48,6 +50,10 @@
-                                          int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_luma_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                          int alpha, int beta, int8_t *tc0);
-+void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0);
-+void ff_h264_h_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -106,3 +112,41 @@
-     daedalus_recipe_dispatch_h264_deblock_luma_h(g_dctx, pix, (size_t)stride,
-                                                  1, &meta);
- }
-+
-+void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    meta.tc0[0] = tc0[0];
-+    meta.tc0[1] = tc0[1];
-+    meta.tc0[2] = tc0[2];
-+    meta.tc0[3] = tc0[3];
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_chroma_v(g_dctx, pix, (size_t)stride,
-+                                                   1, &meta);
-+}
-+
-+void ff_h264_h_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    meta.tc0[0] = tc0[0];
-+    meta.tc0[1] = tc0[1];
-+    meta.tc0[2] = tc0[2];
-+    meta.tc0[3] = tc0[3];
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_chroma_h(g_dctx, pix, (size_t)stride,
-+                                                   1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
---- a/libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:15:45.996482360 +0200
-+++ libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:15:46.025604910 +0200
-@@ -39,8 +39,12 @@
-                                            int beta);
- void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                        int beta, int8_t *tc0);
-+void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                        int beta, int8_t *tc0);
-+void ff_h264_h_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                           int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_chroma422_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                           int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_chroma_intra_neon(uint8_t *pix, ptrdiff_t stride,
-@@ -123,11 +127,11 @@
-         c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
-         c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
- 
--        c->v_loop_filter_chroma = ff_h264_v_loop_filter_chroma_neon;
-+        c->v_loop_filter_chroma = ff_h264_v_loop_filter_chroma_daedalus;
-         c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon;
- 
-         if (chroma_format_idc <= 1) {
--            c->h_loop_filter_chroma = ff_h264_h_loop_filter_chroma_neon;
-+            c->h_loop_filter_chroma = ff_h264_h_loop_filter_chroma_daedalus;
-             c->h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma_intra_neon;
-             c->h_loop_filter_chroma_mbaff_intra = ff_h264_h_loop_filter_chroma_mbaff_intra_neon;
-         } else {
---
-2.47.3
-
@@ -1,126 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 12:30:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 luma intra deblock through daedalus-fourier
-
-Adds the bS=4 intra-strength variants of the already-substituted
-luma_v / luma_h deblock (0005, 0008).  Per H.264 §8.7.2.1, bS=4 is
-assigned only on macroblock edges, typically when either side is
-intra-coded; edges inside an intra MB get bS=3 (the tc0 path).
-
-  H264DSPContext.v_loop_filter_luma_intra →
-    daedalus_recipe_dispatch_h264_deblock_luma_v_intra
-  H264DSPContext.h_loop_filter_luma_intra →
-    daedalus_recipe_dispatch_h264_deblock_luma_h_intra
-
-Both kernels landed in daedalus-fourier PR #11.  Recipe table
-routes AUTO to CPU NEON (no intra QPU shaders yet) — plumbing-
-only NEON-to-NEON via daedalus, bit-exact against the in-tree
-FFmpeg NEON path.
-
-Signature differs from bS<4: no tc0 argument.  The wrapper
-passes daedalus_h264_deblock_meta with alpha/beta set; tc0[] is
-ignored by the intra dispatch (bS=4 hardcodes the strength).
-
-Chroma intra variants are deferred to a follow-up PR because the
-chroma path has a 4:2:0 / 4:2:2 split (chroma_format_idc gating)
-that needs explicit conditional substitution to avoid running
-the 4:2:0-only daedalus dispatch on 4:2:2 chroma.
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 8 intra.
----
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:18:54.992244965 +0200
-+++ libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:20:12.338122217 +0200
-@@ -1,5 +1,5 @@
- /*
-- * H.264 4x4 / 8x8 IDCT + luma v/h + chroma v/h deblock — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma v/h (inter + intra) + chroma v/h deblock — daedalus-fourier substitution shims.
-  *
-  * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-  *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
-@@ -7,6 +7,8 @@
-  *        H264DSPContext.h_loop_filter_luma   → daedalus_recipe_dispatch_h264_deblock_luma_h
-  *        H264DSPContext.v_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_v
-  *        H264DSPContext.h_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_h
-+ *        H264DSPContext.v_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_v_intra
-+ *        H264DSPContext.h_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_h_intra
-  * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-  * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-  * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
-@@ -54,6 +56,10 @@
-                                            int alpha, int beta, int8_t *tc0);
- void ff_h264_h_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                            int alpha, int beta, int8_t *tc0);
-+void ff_h264_v_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta);
-+void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -150,3 +156,34 @@
-     daedalus_recipe_dispatch_h264_deblock_chroma_h(g_dctx, pix, (size_t)stride,
-                                                    1, &meta);
- }
-+
-+void ff_h264_v_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    /* tc0[] is ignored by the intra-strength dispatch (bS=4 hardcodes the strength). */
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_luma_v_intra(g_dctx, pix, (size_t)stride,
-+                                                        1, &meta);
-+}
-+
-+void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+
-+    daedalus_recipe_dispatch_h264_deblock_luma_h_intra(g_dctx, pix, (size_t)stride,
-+                                                        1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
---- a/libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:18:54.993349573 +0200
-+++ libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:20:12.338265830 +0200
-@@ -35,8 +35,12 @@
-                                          int alpha, int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                            int beta);
-+void ff_h264_v_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta);
- void ff_h264_h_loop_filter_luma_intra_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                            int beta);
-+void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                int alpha, int beta);
- void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                        int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-@@ -124,8 +128,8 @@
-     if (have_neon(cpu_flags) && bit_depth == 8) {
-         c->v_loop_filter_luma   = ff_h264_v_loop_filter_luma_daedalus;
-         c->h_loop_filter_luma   = ff_h264_h_loop_filter_luma_daedalus;
--        c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_neon;
--        c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_neon;
-+        c->v_loop_filter_luma_intra= ff_h264_v_loop_filter_luma_intra_daedalus;
-+        c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_daedalus;
- 
-         c->v_loop_filter_chroma = ff_h264_v_loop_filter_chroma_daedalus;
-         c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon;
---
-2.47.3
-
@@ -1,101 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 13:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 chroma DC Hadamard through daedalus-fourier
-
-Substitutes H264DSPContext.chroma_dc_dequant_idct in the
-4:2:0 / bit_depth=8 init path with a wrapper that composes
-the daedalus chroma DC Hadamard primitive (fourier PR #25)
-with the qmul scaling that FFmpeg fuses into a single function.
-
-Bit-exact against ff_h264_chroma_dc_dequant_idct_8_c.
-Hadamard correctness gated by fourier PR #23 test suite.
-
-4:2:2 chroma stays on the in-tree 422 variant (same
-gating shape as 0009 chroma deblock substitution).
-
-Requires daedalus-fourier commit b9f9ff2 or later (PR #25
-exposing the public Hadamard symbol).  Pin bumps in PKGBUILD
-and build-deb.sh come in the same commit.
----
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:38:32.019491484 +0200
-+++ libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 13:38:32.033821507 +0200
-@@ -1,5 +1,5 @@
- /*
-- * H.264 4x4 / 8x8 IDCT + luma v/h (inter + intra) + chroma v/h deblock — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma v/h (inter+intra) + chroma v/h deblock + chroma DC Hadamard — daedalus-fourier substitution shims.
-  *
-  * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-  *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
-@@ -9,6 +9,7 @@
-  *        H264DSPContext.h_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_h
-  *        H264DSPContext.v_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_v_intra
-  *        H264DSPContext.h_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_h_intra
-+ *        H264DSPContext.chroma_dc_dequant_idct   → daedalus_h264_chroma_dc_hadamard_2x2 + caller-side qmul
-  * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-  * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-  * is CPU primary with QPU opportunistic — the ctx below is no-QPU,
-@@ -60,6 +61,7 @@
-                                                 int alpha, int beta);
- void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                                 int alpha, int beta);
-+void ff_h264_chroma_dc_dequant_idct_daedalus(int16_t *block, int qmul);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
- {
-@@ -187,3 +189,32 @@
-     daedalus_recipe_dispatch_h264_deblock_luma_h_intra(g_dctx, pix, (size_t)stride,
-                                                         1, &meta);
- }
-+
-+/* Composes daedalus_h264_chroma_dc_hadamard_2x2 with the qmul scaling
-+ * that FFmpeg's reference does in one fused function (h264idct_template.c
-+ * ff_h264_chroma_dc_dequant_idct).
-+ *
-+ * The 4 DC coefficients are scattered across the per-MB coefficient
-+ * buffer at offsets [r*stride + c*xStride] (stride=32, xStride=16).
-+ * Extract into a contiguous int16[4], run the Hadamard, then apply
-+ * the qmul scale and write back to the original positions.
-+ *
-+ * No daedalus ctx needed; the Hadamard is a pure stateless primitive.
-+ */
-+void ff_h264_chroma_dc_dequant_idct_daedalus(int16_t *block, int qmul)
-+{
-+    enum { stride = 32, xStride = 16 };
-+    int16_t dc[4];
-+
-+    dc[0] = block[stride*0 + xStride*0];
-+    dc[1] = block[stride*0 + xStride*1];
-+    dc[2] = block[stride*1 + xStride*0];
-+    dc[3] = block[stride*1 + xStride*1];
-+
-+    daedalus_h264_chroma_dc_hadamard_2x2(dc);
-+
-+    block[stride*0 + xStride*0] = (int16_t)((int)dc[0] * qmul >> 7);
-+    block[stride*0 + xStride*1] = (int16_t)((int)dc[1] * qmul >> 7);
-+    block[stride*1 + xStride*0] = (int16_t)((int)dc[2] * qmul >> 7);
-+    block[stride*1 + xStride*1] = (int16_t)((int)dc[3] * qmul >> 7);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
---- a/libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:38:32.020346459 +0200
-+++ libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 13:38:32.033909804 +0200
-@@ -41,6 +41,7 @@
-                                            int beta);
- void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                                 int alpha, int beta);
-+void ff_h264_chroma_dc_dequant_idct_daedalus(int16_t *block, int qmul);
- void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                        int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-@@ -135,6 +136,7 @@
-         c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon;
- 
-         if (chroma_format_idc <= 1) {
-+            c->chroma_dc_dequant_idct = ff_h264_chroma_dc_dequant_idct_daedalus;
-             c->h_loop_filter_chroma = ff_h264_h_loop_filter_chroma_daedalus;
-             c->h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma_intra_neon;
-             c->h_loop_filter_chroma_mbaff_intra = ff_h264_h_loop_filter_chroma_mbaff_intra_neon;
---
-2.47.3
-
@@ -1,245 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 14:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264qpel: route remaining qpel 8x8 positions through daedalus-fourier
-
-Closes the H.264 qpel substitution.  Extends 0007 (which routed only
-mc20 put_) to ALL 15 useful positions in BOTH the put_ and avg_
-tables, skipping mc00 (integer copy / pointer-only fast path).
-
-29 substitutions total: 14 new put_ + 15 avg_.  Each is a uniform
-wrapper around daedalus_recipe_dispatch_h264_qpel_{avg_,}mcXY exposed
-by daedalus-fourier PRs #15-#20.
-
-All recipe-table entries route AUTO to CPU NEON (no QPU shaders
-for any qpel position other than mc20 yet), so this is plumbing-only
-NEON-to-NEON — bit-exact against the in-tree ff_*_h264_qpel8_*_neon
-path.
-
-16x16 qpel tables ([0][...]) stay on the in-tree NEON.  daedalus
-only exposes 8x8 today; 16x16 substitution can land once fourier
-provides those variants (likely just dispatching the 8x8 path four
-times with shifted dst/src offsets).
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc qpel buildout.
----
-diff --git a/libavcodec/aarch64/h264_qpel_daedalus.c b/libavcodec/aarch64/h264_qpel_daedalus.c
---- a/libavcodec/aarch64/h264_qpel_daedalus.c	2026-05-25 14:05:05.789298250 +0200
-+++ libavcodec/aarch64/h264_qpel_daedalus.c	2026-05-25 14:05:05.818358374 +0200
-@@ -1,10 +1,13 @@
- /*
-- * H.264 luma qpel mc20 (8x8, horizontal half-pel, 6-tap "put")
-- * — daedalus-fourier substitution shim.
-+ * H.264 luma qpel 8x8 — daedalus-fourier substitution shims (put_ + avg_).
-  *
-- * Routes H264QpelContext.put_h264_qpel_pixels_tab[1][2] through
-- * daedalus_recipe_dispatch_h264_qpel_mc20 instead of
-- * ff_put_h264_qpel8_mc20_neon.  The recipe layer picks the substrate
-+ * Routes ALL 15 useful positions in H264QpelContext's 8x8 put_ and
-+ * avg_ tables through daedalus_recipe_dispatch_h264_qpel_mc{XY}
-+ * (skipping mc00 which is integer copy / FFmpeg's pointer-only fast
-+ * path).  Plumbing-only NEON-by-recipe — daedalus-fourier PRs #15-#20
-+ * exposed each variant via the same dispatch signature, so the
-+ * substitution is a uniform macro across put_/avg_ and across all
-+ * 15 mc positions.  The recipe layer picks the substrate
-  * (CPU NEON for cycle 9; QPU not viable — per-block 7.6 ns vs
-  * ~250 ns QPU dispatch floor, see docs/k9_h264qpel_mc20.md).
-  *
-@@ -48,3 +51,53 @@
-     daedalus_recipe_dispatch_h264_qpel_mc20(g_dctx, dst, src, (size_t)stride,
-                                             1, &meta);
- }
-+
-+
-+/* All other 8x8 qpel positions follow the same dispatch shape as mc20
-+ * above.  The macro collapses ~600 LOC of one-wrapper-per-variant
-+ * boilerplate (29 variants total: 14 put_ + 15 avg_). */
-+#define DEFINE_QPEL_WRAPPER(type, suffix, dispatch_fn)                          \
-+void ff_ ## type ## _h264_qpel8_ ## suffix ## _daedalus(uint8_t *dst,           \
-+    const uint8_t *src, ptrdiff_t stride);                                      \
-+void ff_ ## type ## _h264_qpel8_ ## suffix ## _daedalus(uint8_t *dst,           \
-+    const uint8_t *src, ptrdiff_t stride)                                       \
-+{                                                                               \
-+    static const daedalus_h264_qpel_meta meta = { .dst_off = 0, .src_off = 0 }; \
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);                         \
-+    dispatch_fn(g_dctx, dst, src, (size_t)stride, 1, &meta);                    \
-+}
-+
-+/* put_ variants (mc20 stays on the explicit definition above). */
-+DEFINE_QPEL_WRAPPER(put, mc10, daedalus_recipe_dispatch_h264_qpel_mc10)
-+DEFINE_QPEL_WRAPPER(put, mc30, daedalus_recipe_dispatch_h264_qpel_mc30)
-+DEFINE_QPEL_WRAPPER(put, mc01, daedalus_recipe_dispatch_h264_qpel_mc01)
-+DEFINE_QPEL_WRAPPER(put, mc11, daedalus_recipe_dispatch_h264_qpel_mc11)
-+DEFINE_QPEL_WRAPPER(put, mc21, daedalus_recipe_dispatch_h264_qpel_mc21)
-+DEFINE_QPEL_WRAPPER(put, mc31, daedalus_recipe_dispatch_h264_qpel_mc31)
-+DEFINE_QPEL_WRAPPER(put, mc02, daedalus_recipe_dispatch_h264_qpel_mc02)
-+DEFINE_QPEL_WRAPPER(put, mc12, daedalus_recipe_dispatch_h264_qpel_mc12)
-+DEFINE_QPEL_WRAPPER(put, mc22, daedalus_recipe_dispatch_h264_qpel_mc22)
-+DEFINE_QPEL_WRAPPER(put, mc32, daedalus_recipe_dispatch_h264_qpel_mc32)
-+DEFINE_QPEL_WRAPPER(put, mc03, daedalus_recipe_dispatch_h264_qpel_mc03)
-+DEFINE_QPEL_WRAPPER(put, mc13, daedalus_recipe_dispatch_h264_qpel_mc13)
-+DEFINE_QPEL_WRAPPER(put, mc23, daedalus_recipe_dispatch_h264_qpel_mc23)
-+DEFINE_QPEL_WRAPPER(put, mc33, daedalus_recipe_dispatch_h264_qpel_mc33)
-+
-+/* avg_ variants — all 15 useful positions. */
-+DEFINE_QPEL_WRAPPER(avg, mc10, daedalus_recipe_dispatch_h264_qpel_avg_mc10)
-+DEFINE_QPEL_WRAPPER(avg, mc20, daedalus_recipe_dispatch_h264_qpel_avg_mc20)
-+DEFINE_QPEL_WRAPPER(avg, mc30, daedalus_recipe_dispatch_h264_qpel_avg_mc30)
-+DEFINE_QPEL_WRAPPER(avg, mc01, daedalus_recipe_dispatch_h264_qpel_avg_mc01)
-+DEFINE_QPEL_WRAPPER(avg, mc11, daedalus_recipe_dispatch_h264_qpel_avg_mc11)
-+DEFINE_QPEL_WRAPPER(avg, mc21, daedalus_recipe_dispatch_h264_qpel_avg_mc21)
-+DEFINE_QPEL_WRAPPER(avg, mc31, daedalus_recipe_dispatch_h264_qpel_avg_mc31)
-+DEFINE_QPEL_WRAPPER(avg, mc02, daedalus_recipe_dispatch_h264_qpel_avg_mc02)
-+DEFINE_QPEL_WRAPPER(avg, mc12, daedalus_recipe_dispatch_h264_qpel_avg_mc12)
-+DEFINE_QPEL_WRAPPER(avg, mc22, daedalus_recipe_dispatch_h264_qpel_avg_mc22)
-+DEFINE_QPEL_WRAPPER(avg, mc32, daedalus_recipe_dispatch_h264_qpel_avg_mc32)
-+DEFINE_QPEL_WRAPPER(avg, mc03, daedalus_recipe_dispatch_h264_qpel_avg_mc03)
-+DEFINE_QPEL_WRAPPER(avg, mc13, daedalus_recipe_dispatch_h264_qpel_avg_mc13)
-+DEFINE_QPEL_WRAPPER(avg, mc23, daedalus_recipe_dispatch_h264_qpel_avg_mc23)
-+DEFINE_QPEL_WRAPPER(avg, mc33, daedalus_recipe_dispatch_h264_qpel_avg_mc33)
-+
-+#undef DEFINE_QPEL_WRAPPER
-diff --git a/libavcodec/aarch64/h264qpel_init_aarch64.c b/libavcodec/aarch64/h264qpel_init_aarch64.c
---- a/libavcodec/aarch64/h264qpel_init_aarch64.c	2026-05-25 14:05:05.790403989 +0200
-+++ libavcodec/aarch64/h264qpel_init_aarch64.c	2026-05-25 14:05:05.819136071 +0200
-@@ -50,6 +50,64 @@
- void ff_put_h264_qpel8_mc30_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src,
-                                      ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc10_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc30_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc01_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc11_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc21_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc31_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc02_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc12_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc22_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc32_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc03_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc13_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc23_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_put_h264_qpel8_mc33_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc10_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc30_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc01_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc11_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc21_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc31_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc02_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc12_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc22_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc32_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc03_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc13_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc23_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
-+void ff_avg_h264_qpel8_mc33_daedalus(uint8_t *dst, const uint8_t *src,
-+                                  ptrdiff_t stride);
- void ff_put_h264_qpel8_mc01_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc11_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
- void ff_put_h264_qpel8_mc21_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
-@@ -164,21 +222,21 @@
-         c->put_h264_qpel_pixels_tab[0][15] = ff_put_h264_qpel16_mc33_neon;
- 
-         c->put_h264_qpel_pixels_tab[1][ 0] = ff_put_h264_qpel8_mc00_neon;
--        c->put_h264_qpel_pixels_tab[1][ 1] = ff_put_h264_qpel8_mc10_neon;
-+        c->put_h264_qpel_pixels_tab[1][ 1] = ff_put_h264_qpel8_mc10_daedalus;
-         c->put_h264_qpel_pixels_tab[1][ 2] = ff_put_h264_qpel8_mc20_daedalus;
--        c->put_h264_qpel_pixels_tab[1][ 3] = ff_put_h264_qpel8_mc30_neon;
--        c->put_h264_qpel_pixels_tab[1][ 4] = ff_put_h264_qpel8_mc01_neon;
--        c->put_h264_qpel_pixels_tab[1][ 5] = ff_put_h264_qpel8_mc11_neon;
--        c->put_h264_qpel_pixels_tab[1][ 6] = ff_put_h264_qpel8_mc21_neon;
--        c->put_h264_qpel_pixels_tab[1][ 7] = ff_put_h264_qpel8_mc31_neon;
--        c->put_h264_qpel_pixels_tab[1][ 8] = ff_put_h264_qpel8_mc02_neon;
--        c->put_h264_qpel_pixels_tab[1][ 9] = ff_put_h264_qpel8_mc12_neon;
--        c->put_h264_qpel_pixels_tab[1][10] = ff_put_h264_qpel8_mc22_neon;
--        c->put_h264_qpel_pixels_tab[1][11] = ff_put_h264_qpel8_mc32_neon;
--        c->put_h264_qpel_pixels_tab[1][12] = ff_put_h264_qpel8_mc03_neon;
--        c->put_h264_qpel_pixels_tab[1][13] = ff_put_h264_qpel8_mc13_neon;
--        c->put_h264_qpel_pixels_tab[1][14] = ff_put_h264_qpel8_mc23_neon;
--        c->put_h264_qpel_pixels_tab[1][15] = ff_put_h264_qpel8_mc33_neon;
-+        c->put_h264_qpel_pixels_tab[1][ 3] = ff_put_h264_qpel8_mc30_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 4] = ff_put_h264_qpel8_mc01_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 5] = ff_put_h264_qpel8_mc11_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 6] = ff_put_h264_qpel8_mc21_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 7] = ff_put_h264_qpel8_mc31_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 8] = ff_put_h264_qpel8_mc02_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][ 9] = ff_put_h264_qpel8_mc12_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][10] = ff_put_h264_qpel8_mc22_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][11] = ff_put_h264_qpel8_mc32_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][12] = ff_put_h264_qpel8_mc03_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][13] = ff_put_h264_qpel8_mc13_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][14] = ff_put_h264_qpel8_mc23_daedalus;
-+        c->put_h264_qpel_pixels_tab[1][15] = ff_put_h264_qpel8_mc33_daedalus;
- 
-         c->avg_h264_qpel_pixels_tab[0][ 0] = ff_avg_h264_qpel16_mc00_neon;
-         c->avg_h264_qpel_pixels_tab[0][ 1] = ff_avg_h264_qpel16_mc10_neon;
-@@ -198,21 +256,21 @@
-         c->avg_h264_qpel_pixels_tab[0][15] = ff_avg_h264_qpel16_mc33_neon;
- 
-         c->avg_h264_qpel_pixels_tab[1][ 0] = ff_avg_h264_qpel8_mc00_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 1] = ff_avg_h264_qpel8_mc10_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 2] = ff_avg_h264_qpel8_mc20_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 3] = ff_avg_h264_qpel8_mc30_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 4] = ff_avg_h264_qpel8_mc01_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 5] = ff_avg_h264_qpel8_mc11_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 6] = ff_avg_h264_qpel8_mc21_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 7] = ff_avg_h264_qpel8_mc31_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 8] = ff_avg_h264_qpel8_mc02_neon;
--        c->avg_h264_qpel_pixels_tab[1][ 9] = ff_avg_h264_qpel8_mc12_neon;
--        c->avg_h264_qpel_pixels_tab[1][10] = ff_avg_h264_qpel8_mc22_neon;
--        c->avg_h264_qpel_pixels_tab[1][11] = ff_avg_h264_qpel8_mc32_neon;
--        c->avg_h264_qpel_pixels_tab[1][12] = ff_avg_h264_qpel8_mc03_neon;
--        c->avg_h264_qpel_pixels_tab[1][13] = ff_avg_h264_qpel8_mc13_neon;
--        c->avg_h264_qpel_pixels_tab[1][14] = ff_avg_h264_qpel8_mc23_neon;
--        c->avg_h264_qpel_pixels_tab[1][15] = ff_avg_h264_qpel8_mc33_neon;
-+        c->avg_h264_qpel_pixels_tab[1][ 1] = ff_avg_h264_qpel8_mc10_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 2] = ff_avg_h264_qpel8_mc20_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 3] = ff_avg_h264_qpel8_mc30_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 4] = ff_avg_h264_qpel8_mc01_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 5] = ff_avg_h264_qpel8_mc11_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 6] = ff_avg_h264_qpel8_mc21_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 7] = ff_avg_h264_qpel8_mc31_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 8] = ff_avg_h264_qpel8_mc02_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][ 9] = ff_avg_h264_qpel8_mc12_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][10] = ff_avg_h264_qpel8_mc22_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][11] = ff_avg_h264_qpel8_mc32_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][12] = ff_avg_h264_qpel8_mc03_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][13] = ff_avg_h264_qpel8_mc13_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][14] = ff_avg_h264_qpel8_mc23_daedalus;
-+        c->avg_h264_qpel_pixels_tab[1][15] = ff_avg_h264_qpel8_mc33_daedalus;
-     } else if (have_neon(cpu_flags) && bit_depth == 10) {
-         c->put_h264_qpel_pixels_tab[0][ 1] = ff_put_h264_qpel16_mc10_neon_10;
-         c->put_h264_qpel_pixels_tab[0][ 2] = ff_put_h264_qpel16_mc20_neon_10;
---
-2.47.3
-
@@ -1,120 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: claude-noether <claude-noether@noreply.localhost>
-Date: Sun, 25 May 2026 14:30:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 chroma intra deblock (4:2:0) through daedalus-fourier
-
-Substitutes c->v_loop_filter_chroma_intra and c->h_loop_filter_chroma_intra
-with daedalus wrappers in the bit_depth=8 / chroma_format_idc<=1 (4:2:0)
-branch.  4:2:2 stays on the in-tree NEON path (the daedalus chroma intra
-dispatch is 4:2:0-only).
-
-The fourier dispatches were exposed in PR #11 (DEFINE_INTRA_DISPATCH
-macro generates the public daedalus_dispatch_h264_deblock_chroma_*_intra
-symbols + recipe wrappers).
-
-Re-architects the chroma init: v_loop_filter_chroma_intra was previously
-assigned unconditionally to the NEON variant (which works for both 4:2:0
-and 4:2:2).  We now assign it INSIDE both branches of the chroma_format_idc
-conditional, with the 4:2:0 branch picking daedalus and the 4:2:2 branch
-keeping NEON.  No regression for 4:2:2 streams.
-
-Same NEON-to-NEON via recipe shape as 0010 luma intra.
-
-Refs reauktion/daedalus-v4l2#11 — substitution arc chroma intra.
----
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 14:21:08.267156263 +0200
-+++ libavcodec/aarch64/h264_idct_daedalus.c	2026-05-25 14:21:08.287745931 +0200
-@@ -1,5 +1,5 @@
- /*
-- * H.264 4x4 / 8x8 IDCT + luma v/h (inter+intra) + chroma v/h deblock + chroma DC Hadamard — daedalus-fourier substitution shims.
-+ * H.264 4x4 / 8x8 IDCT + luma v/h (inter+intra) + chroma v/h (inter+intra) deblock + chroma DC Hadamard — daedalus-fourier substitution shims.
-  *
-  * Routes H264DSPContext.idct_add           → daedalus_recipe_dispatch_h264_idct4
-  *        H264DSPContext.idct8_add          → daedalus_recipe_dispatch_h264_idct8
-@@ -9,6 +9,8 @@
-  *        H264DSPContext.h_loop_filter_chroma → daedalus_recipe_dispatch_h264_deblock_chroma_h
-  *        H264DSPContext.v_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_v_intra
-  *        H264DSPContext.h_loop_filter_luma_intra → daedalus_recipe_dispatch_h264_deblock_luma_h_intra
-+ *        H264DSPContext.v_loop_filter_chroma_intra → daedalus_recipe_dispatch_h264_deblock_chroma_v_intra
-+ *        H264DSPContext.h_loop_filter_chroma_intra → daedalus_recipe_dispatch_h264_deblock_chroma_h_intra
-  *        H264DSPContext.chroma_dc_dequant_idct   → daedalus_h264_chroma_dc_hadamard_2x2 + caller-side qmul
-  * instead of the in-tree ff_h264_*_neon assembly.  The recipe layer
-  * picks the substrate (CPU NEON for cycles 6 + 7 by default; cycle 8
-@@ -61,6 +63,10 @@
-                                                 int alpha, int beta);
- void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                                 int alpha, int beta);
-+void ff_h264_v_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta);
-+void ff_h264_h_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta);
- void ff_h264_chroma_dc_dequant_idct_daedalus(int16_t *block, int qmul);
- 
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
-@@ -218,3 +224,30 @@
-     block[stride*1 + xStride*0] = (int16_t)((int)dc[2] * qmul >> 7);
-     block[stride*1 + xStride*1] = (int16_t)((int)dc[3] * qmul >> 7);
- }
-+
-+void ff_h264_v_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    /* tc0[] unused for intra (bS=4 hardcodes the strength). */
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+    daedalus_recipe_dispatch_h264_deblock_chroma_v_intra(g_dctx, pix, (size_t)stride,
-+                                                          1, &meta);
-+}
-+
-+void ff_h264_h_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta)
-+{
-+    daedalus_h264_deblock_meta meta = {
-+        .dst_off = 0,
-+        .alpha   = alpha,
-+        .beta    = beta,
-+    };
-+    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
-+    daedalus_recipe_dispatch_h264_deblock_chroma_h_intra(g_dctx, pix, (size_t)stride,
-+                                                          1, &meta);
-+}
-diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
---- a/libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 14:21:08.268311057 +0200
-+++ libavcodec/aarch64/h264dsp_init_aarch64.c	2026-05-25 14:21:08.287886563 +0200
-@@ -42,6 +42,10 @@
- void ff_h264_h_loop_filter_luma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-                                                 int alpha, int beta);
- void ff_h264_chroma_dc_dequant_idct_daedalus(int16_t *block, int qmul);
-+void ff_h264_v_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta);
-+void ff_h264_h_loop_filter_chroma_intra_daedalus(uint8_t *pix, ptrdiff_t stride,
-+                                                 int alpha, int beta);
- void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
-                                        int beta, int8_t *tc0);
- void ff_h264_v_loop_filter_chroma_daedalus(uint8_t *pix, ptrdiff_t stride,
-@@ -133,14 +137,15 @@
-         c->h_loop_filter_luma_intra= ff_h264_h_loop_filter_luma_intra_daedalus;
- 
-         c->v_loop_filter_chroma = ff_h264_v_loop_filter_chroma_daedalus;
--        c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon;
- 
-         if (chroma_format_idc <= 1) {
-             c->chroma_dc_dequant_idct = ff_h264_chroma_dc_dequant_idct_daedalus;
-+            c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_daedalus;
-             c->h_loop_filter_chroma = ff_h264_h_loop_filter_chroma_daedalus;
--            c->h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma_intra_neon;
-+            c->h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma_intra_daedalus;
-             c->h_loop_filter_chroma_mbaff_intra = ff_h264_h_loop_filter_chroma_mbaff_intra_neon;
-         } else {
-+            c->v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon;
-             c->h_loop_filter_chroma = ff_h264_h_loop_filter_chroma422_neon;
-             c->h_loop_filter_chroma_mbaff = ff_h264_h_loop_filter_chroma_neon;
-             c->h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma422_intra_neon;
--
-2.47.3
-
@@ -1,85 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Mon, 25 May 2026 21:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264: use QPU-capable daedalus ctx (bench
- shows 4.30x faster on Pi 5)
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Patches 0003 (IDCT 4x4) and 0007 (qpel mc20) created the libavcodec.so
-process-global daedalus_ctx via daedalus_ctx_create_no_qpu().  Rationale
-at the time: cycle 6/9 had only CPU NEON paths, so a QPU-capable ctx
-would have meant pointless Vulkan init in every host process (firefox-
-fourier, mpv-fourier, daedalus_v4l2_daemon, ...).
-
-Two things changed since:
-
-  1. Every H.264 hot-path primitive now has a V3D7 compute shader.
-     IDCT 4x4/8x8 (cycles 6, 7), 8 deblock variants (luma+chroma x V+H
-     x inter+intra), 30 qpel positions (15 put_ + 15 avg_).  See
-     daedalus-fourier PRs #28-#35.
-
-  2. Dispatch overhead has been hammered down — buffer pool in
-     v3d_runner (daedalus-fourier task #160) plus persistent command
-     buffer (task #161).  daedalus-fourier PR #36 bench measures the
-     1080p worst-case sum on hertz (Pi 5 V3D 7.1, 30 iters x 5 warmup):
-
-       kernel             CPU ns/op  QPU ns/op  winner
-       IDCT 4x4 luma          10.79       2.47  QPU 4.36x
-       IDCT 8x8 luma          29.69       9.23  QPU 3.22x
-       Deblock luma_v         17.58      10.21  QPU 1.72x
-       Deblock luma_h         38.41       9.98  QPU 3.85x
-       qpel mc20 (8x8)        28.24       9.66  QPU 2.92x
-       qpel mc02 (8x8)        16.96      20.54  CPU 1.21x
-       qpel mc22 (8x8)        71.58       9.64  QPU 7.43x
-
-       1080p worst-case sum (IDCT4 + deblock luma + qpel mc22):
-         CPU NEON only:  5.57 ms
-         QPU only:       1.30 ms   (CPU/QPU sum ratio = 4.30x)
-
-PR #10's verdict (CPU 4x faster than QPU at IDCT) is reversed.  Switch
-the substitution context to daedalus_ctx_create() in both H.264 TUs
-(h264_idct_daedalus.c, h264_qpel_daedalus.c) so the recipe layer can
-actually route through the now-faster QPU path.
-
-daedalus_ctx_create() probes for a usable Vulkan device and falls back
-to no_qpu mode if unavailable, so this is safe on hosts without V3D
-(x86 reauktion build runners, debian-aarch64 builders without renderD,
-etc.).  Hosts WITH V3D (Pi 5 deployment targets) get the speedup.
-
-The remaining qpel mc02 anomaly (single-axis vertical filter, 1.21x
-CPU) is bench-flagged for a v2 shader follow-up; the recipe entry
-stays QPU since the policy decree (2026-05-23 substrate decree) holds
-and the gap is marginal.
-
-Refs reauktion/daedalus-fourier!36.
----
- libavcodec/aarch64/h264_idct_daedalus.c | 2 +-
- libavcodec/aarch64/h264_qpel_daedalus.c | 2 +-
- 2 files changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c
-+++ b/libavcodec/aarch64/h264_idct_daedalus.c
-@@ -32,7 +32,7 @@ static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-
- static void daedalus_ctx_init_once(void)
- {
--    g_dctx = daedalus_ctx_create_no_qpu();
-+    g_dctx = daedalus_ctx_create();
- }
-
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
-diff --git a/libavcodec/aarch64/h264_qpel_daedalus.c b/libavcodec/aarch64/h264_qpel_daedalus.c
---- a/libavcodec/aarch64/h264_qpel_daedalus.c
-+++ b/libavcodec/aarch64/h264_qpel_daedalus.c
-@@ -38,7 +38,7 @@ static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-
- static void daedalus_ctx_init_once(void)
- {
--    g_dctx = daedalus_ctx_create_no_qpu();
-+    g_dctx = daedalus_ctx_create();
- }
-
- void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
@@ -1,73 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Markus Fritsche <mfritsche@reauktion.de>
-Date: Mon, 25 May 2026 22:00:00 +0200
-Subject: [PATCH] avcodec/aarch64/h264: revert ctx flip — daedalus-fourier PR
- #36 was a measurement artifact
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Reverts the daedalus_ctx_create_no_qpu() → daedalus_ctx_create() flip
-that landed in 0014-h264-ctx-qpu-capable.patch (marfrit-packages PR
-#104).  The flip was justified by daedalus-fourier PR #36 which
-reported a 4.30x QPU-over-CPU win on the 1080p H.264 hot-path sum.
-
-That number was a measurement artifact.  The bench tool's
-v3d_runner.read_spv() did a bare fopen() that resolved relative to
-cwd; when run from the source directory (as in PR #36), the SPVs at
-$builddir/v3d_*.spv were not found, every QPU dispatch returned -1
-fast, and the loop timed the failure path.  Daedalus-fourier PR #37
-fixes the SPV search + bench preflight; corrected numbers from hertz
-(Pi 5 V3D 7.1) show QPU is 12-77x SLOWER than CPU NEON at every
-H.264 hot-path kernel:
-
-  kernel             CPU ns/op  QPU ns/op  winner
-  IDCT 4x4 luma          10.75     217.63  CPU 20.24x
-  IDCT 8x8 luma          29.69     785.94  CPU 26.47x
-  Deblock luma_v         17.63     467.42  CPU 26.51x
-  Deblock luma_h         38.30     498.53  CPU 13.02x
-  qpel mc20 (8x8)        30.17    1300.44  CPU 43.10x
-  qpel mc02 (8x8)        17.69    1363.40  CPU 77.08x
-  qpel mc22 (8x8)        71.60    1948.37  CPU 27.21x
-
-  1080p sum: CPU 5.57 ms vs QPU 123.54 ms — QPU 22x slower.
-
-Until the daedalus QPU dispatch overhead is actually competitive (a
-multi-task effort tracked on the daedalus-fourier side), the
-libavcodec.so substitution must stay on daedalus_ctx_create_no_qpu()
-to avoid pessimizing every host process that loads it
-(firefox-fourier RDD, mpv-fourier, daedalus_v4l2_daemon).
-
-Both H.264 TUs (h264_idct_daedalus.c, h264_qpel_daedalus.c) are
-reverted; the change is a 2-line revert of patch 0014.
-
-Refs reauktion/daedalus-fourier!37 (the retraction PR).
----
- libavcodec/aarch64/h264_idct_daedalus.c | 2 +-
- libavcodec/aarch64/h264_qpel_daedalus.c | 2 +-
- 2 files changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
---- a/libavcodec/aarch64/h264_idct_daedalus.c
-+++ b/libavcodec/aarch64/h264_idct_daedalus.c
-@@ -32,7 +32,7 @@ static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-
- static void daedalus_ctx_init_once(void)
- {
--    g_dctx = daedalus_ctx_create();
-+    g_dctx = daedalus_ctx_create_no_qpu();
- }
-
- void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
-diff --git a/libavcodec/aarch64/h264_qpel_daedalus.c b/libavcodec/aarch64/h264_qpel_daedalus.c
---- a/libavcodec/aarch64/h264_qpel_daedalus.c
-+++ b/libavcodec/aarch64/h264_qpel_daedalus.c
-@@ -38,7 +38,7 @@ static pthread_once_t    g_dctx_once = PTHREAD_ONCE_INIT;
-
- static void daedalus_ctx_init_once(void)
- {
--    g_dctx = daedalus_ctx_create();
-+    g_dctx = daedalus_ctx_create_no_qpu();
- }
-
- void ff_put_h264_qpel8_mc20_daedalus(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
@@ -33,19 +33,7 @@ FFMPEG_VERSION=8.1
 # epoch 2 matches Debian's stock ffmpeg (currently 7:7.1.x in trixie);
 # +rfourier suffix to avoid colliding with upstream/Debian rebuilds.
 PKGVER=2:${FFMPEG_VERSION}+rfourier+gb57fbbe
-PKGREL=12  # pkgrel=12 — REVERT pkgrel=11 ctx flip; daedalus-fourier PR #36 4.30x headline was measurement artifact (PR #37 corrects: QPU 22x SLOWER than CPU)
-           # (cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes
-           # the libavcodec.so substitution sequence 6 IDCT4 / 7 IDCT8 /
-           # 8 luma-v deblock / 9 qpel mc20).  Pulls daedalus-fourier PR #2
-           # which extends the public API with
-           # daedalus_recipe_dispatch_h264_qpel_mc20.  (2026-05-23)
-
-# daedalus-fourier pin.  209a421 = daedalus-fourier PR #2 merge — public
-# API now exposes daedalus_recipe_dispatch_h264_qpel_mc20 +
-# DAEDALUS_KERNEL_H264_QPEL_MC20.  Cycle 9 plumbs the last H.264 NEON
-# kernel through the recipe layer.  Daemon-side build (debian/daedalus-v4l2)
-# can bump in a follow-up; this PR only changes the libavcodec.so consumer.
-DAEDALUS_FOURIER_COMMIT=b9f9ff2a89c068aea54dcb52b543afddad28311e  # PR #25 — public chroma DC Hadamard
+PKGREL=2  # pkgrel=2 — Path A move to /opt/fourier prefix (2026-05-19)

 HERE=$(dirname "$(readlink -f "$0")")

@@ -69,46 +57,6 @@ fi
 # Apply patches (same as Arch).
 patch -Np1 -i "$HERE/0001-libudev-bypass-fallback.patch"
 patch -Np1 -i "$HERE/0002-nv15-to-p010-unpack.patch"
-patch -Np1 -i "$HERE/0003-h264-idct4-daedalus-fourier.patch"
-patch -Np1 -i "$HERE/0004-h264-idct8-daedalus-fourier.patch"
-patch -Np1 -i "$HERE/0005-h264-deblock-luma-v-daedalus-fourier.patch"
-patch -Np1 -i "$HERE/0006-h264-restore-low-delay.patch"
-patch -Np1 -i "$HERE/0007-h264-qpel-mc20-daedalus-fourier.patch"
-patch -Np1 -i "$HERE/0008-h264-deblock-luma-h-daedalus-fourier.patch"
-patch -Np1 -i "$HERE/0009-h264-deblock-chroma-daedalus-fourier.patch"
-patch -Np1 -i "$HERE/0010-h264-deblock-luma-intra-daedalus-fourier.patch"
-patch -Np1 -i "$HERE/0011-h264-chroma-dc-hadamard-daedalus-fourier.patch"
-patch -Np1 -i "$HERE/0012-h264-qpel-rest-daedalus-fourier.patch"
-patch -Np1 -i "$HERE/0013-h264-deblock-chroma-intra-daedalus-fourier.patch"
-patch -Np1 -i "$HERE/0014-h264-ctx-qpu-capable.patch"
-patch -Np1 -i "$HERE/0015-h264-ctx-revert-to-no-qpu.patch"
-
-# --- daedalus-fourier: fetch + build static .a with PIC, install to a
-# per-build prefix; libavcodec.so links it into the shared object so
-# H264DSPContext.idct_add (and follow-up kernels) dispatch through the
-# daedalus recipe layer instead of the in-tree NEON .S code. ---
-#
-# PIC is mandatory — the static .a is linked into a .so, so all object
-# code must be relocatable.  Vulkan is PUBLIC-linked by daedalus_core
-# (queryable QPU substrate); we add libvulkan1 to Debian Depends below
-# so dlopen of libavcodec.so.62 succeeds on stock trixie.
-FOURIER_PREFIX=$work/fourier-prefix
-mkdir -p "$FOURIER_PREFIX"
-
-pushd "$work" >/dev/null
-curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-fourier.tar.gz \
-    "https://git.reauktion.de/marfrit/daedalus-fourier/archive/${DAEDALUS_FOURIER_COMMIT}.tar.gz"
-tar xzf daedalus-fourier.tar.gz
-pushd daedalus-fourier >/dev/null
-cmake -B build -G Ninja \
-    -DCMAKE_BUILD_TYPE=Release \
-    -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-    -DCMAKE_INSTALL_PREFIX="$FOURIER_PREFIX"
-cmake --build build --target daedalus_core
-cmake --install build
-popd >/dev/null
-popd >/dev/null
-cd "$work/FFmpeg"

 # Configure with Arch-parity flags.  Drops the same set of features
 # (X11, AMF, CUDA, FireWire, AviSynth, Bluray, OpenMPT, JPEG-XL,
@@ -125,9 +73,6 @@ cd "$work/FFmpeg"
    --mandir=/opt/fourier/share/man \
    --extra-ldexeflags='-Wl,-rpath,/opt/fourier/lib' \
    --extra-ldsoflags='-Wl,-rpath,/opt/fourier/lib' \
-    --extra-cflags="-I${FOURIER_PREFIX}/include" \
-    --extra-ldflags="-L${FOURIER_PREFIX}/lib" \
-    --extra-libs="-ldaedalus_core -lvulkan -lpthread" \
    --disable-debug \
    --disable-static \
    --disable-doc \
@@ -150,6 +95,7 @@ cd "$work/FFmpeg"
    --enable-libass \
    --enable-libfreetype \
    --enable-libfribidi \
+    --enable-libxml2 \
    --enable-libpulse \
    --enable-libdav1d \
    --enable-libopus \
@@ -201,10 +147,10 @@ Priority: optional
 Architecture: arm64
 Depends: libc6,
         libdrm2,
-         libvulkan1,
         libfontconfig1,
         libfreetype6,
         libfribidi0,
+         libxml2,
         libpulse0,
         libdav1d7 | libdav1d6,
         libopus0,
@@ -1,171 +1,3 @@
-ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-10) bookworm trixie; urgency=medium
-
-  * Add 0007-h264-qpel-mc20-daedalus-fourier.patch —
-    H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma
-    horizontal half-pel, 6-tap "put" — the canonical representative
-    of the H.264 luma motion-compensation family) now dispatches
-    through daedalus_recipe_dispatch_h264_qpel_mc20 instead of
-    ff_put_h264_qpel8_mc20_neon.  Cycle 9 of the daedalus-v4l2#11
-    step 2 substitution arc; closes the 4-cycle libavcodec.so
-    substitution sequence (6 IDCT4 / 7 IDCT8 / 8 luma-v deblock /
-    9 qpel mc20).
-  * Bumps daedalus-fourier pin d87239d → 209a421 (PR #2 — public
-    API extended with daedalus_recipe_dispatch_h264_qpel_mc20 +
-    DAEDALUS_KERNEL_H264_QPEL_MC20).
-  * Cycle 9 is "CPU primary; QPU pointless" per
-    docs/k9_h264qpel_mc20.md.  Per-block 7.6 ns at 131 Mblock/s
-    gives 135x margin over 30 fps 1080p; QPU dispatch floor at
-    ~250 ns makes any V3D shader strictly worse.  Substitution
-    is plumbing-only, NEON-by-recipe — same
-    daedalus_ctx_create_no_qpu pthread_once shape the cycles 6/7/8
-    shims already own (kept SEPARATE from the H264DSP shim's ctx
-    because H264QPEL is its own libavcodec Makefile module and
-    link order does not guarantee a single .o owns the ctx symbol;
-    one extra ~µs init per process, paid lazily on first MC call).
-  * Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the
-    16x16 size tier stay on the in-tree NEON .S code.  Per the
-    cycle-9 phase-1 rationale, mc20 8x8 is representative of the
-    whole family's per-block cost.
-  * Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier
-    cycle 9 green; 10000/10000 random blocks).
-  * No SONAME change, no Depends change.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Sat, 23 May 2026 12:00:00 +0000
-
-ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-9) bookworm trixie; urgency=medium
-
-  * Add 0006-h264-restore-low-delay.patch — restore the documented
-    AV_CODEC_FLAG_LOW_DELAY semantics in the H.264 decoder.  FFmpeg
-    8.x dropped the H.264 low_delay code path entirely; setting the
-    flag at avcodec_open2 no longer prevents the display-order DPB
-    output queue from running.  Visible on Firefox YouTube as the
-    2-1-4-3 B-frame pair-swap, re-introduced silently by the
-    SONAME 61→62 jump in daedalus-v4l2 PR #16.
-  * h264_select_output_frame: early-exit when LOW_DELAY is set;
-    emit the just-decoded picture as next_output_pic, mirror the
-    corruption / recovery-point tracking, skip delayed_pic[] and
-    the POC reorder machinery entirely.
-  * h264_field_start: suppress the SPS-driven
-    has_b_frames = sps->num_reorder_frames clobber when LOW_DELAY
-    is set — without this the per-slice bitstream_restriction_flag
-    re-pickup would reintroduce a nonzero reorder buffer mid-
-    stream.
-  * Restores the same one-frame-per-send_packet contract the
-    daedalus-v4l2 daemon's decoder.c already relies on (the flag
-    is set unconditionally for H.264).  No daemon side change.
-  * No SONAME change, no Depends change.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Fri, 22 May 2026 13:30:00 +0000
-
-ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-8) bookworm trixie; urgency=medium
-
-  * Add 0005-h264-deblock-luma-v-daedalus-fourier.patch —
-    H264DSPContext.v_loop_filter_luma (non-intra bS<4 vertical luma
-    deblock, called per macroblock-row edge from the slice deblock
-    loop in libavcodec/h264_loopfilter.c) now dispatches through
-    daedalus_recipe_dispatch_h264_deblock_luma_v instead of
-    ff_h264_v_loop_filter_luma_neon.  Cycle 8 of the daedalus-v4l2#11
-    step 2 substitution arc.
-  * Cycle 8 is marked "CPU primary; QPU opportunistic" in
-    daedalus-fourier, but the libavcodec.so context here uses
-    daedalus_ctx_create_no_qpu (process-global pthread_once,
-    shared with cycles 6/7).  Opportunistic QPU is deferred to a
-    separate change that gates Vulkan init on a feature flag, to
-    avoid implicit Vulkan init in arbitrary host processes.  For
-    now cycle 8 is plumbing-only — NEON-by-recipe.
-  * Intra (bS=4) loop filter c->v_loop_filter_luma_intra stays on
-    the in-tree NEON .S code; daedalus's daedalus_h264_deblock_meta
-    only covers the non-intra path per its API docstring.
-  * Bit-exact against ff_h264_v_loop_filter_luma_neon (daedalus-fourier
-    cycle 8 green).
-  * No SONAME change, no Depends change.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Fri, 22 May 2026 12:30:00 +0000
-
-ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-7) bookworm trixie; urgency=medium
-
-  * Add 0004-h264-idct8-daedalus-fourier.patch — H264DSPContext.idct8_add
-    (per-block 8x8 IDCT, called from the High-profile intra-8x8-DCT
-    macroblock path in libavcodec/h264_mb.c) now dispatches through
-    daedalus_recipe_dispatch_h264_idct8 instead of
-    ff_h264_idct8_add_neon.  Cycle 7 of the daedalus-v4l2#11 step 2
-    substitution arc — NEON-by-recipe, same pthread_once context the
-    cycle-6 IDCT 4x4 shim already owns.
-  * Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
-    green; FFmpeg 8x8 block storage block[r + 8*c] matches daedalus
-    column-major convention).
-  * Bulk c->idct8_add4 (inter 8x8-DCT macroblocks) stays on the
-    in-tree NEON .S code; batched substitution lands later.
-  * No SONAME change, no Depends change.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Fri, 22 May 2026 10:30:00 +0000
-
-ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-6) bookworm trixie; urgency=medium
-
-  * Drop --enable-libxml2 + libxml2 Depends — the Gitea
-    debian-aarch64 runner ships libxml2 ≥ 2.14 (SONAME 16) while
-    Debian trixie targets 2.12 (SONAME 2).  -5 built fine, then
-    failed to load on higgs trixie:
-       dlopen(libavformat.so.62): libxml2.so.16:
-       cannot open shared object file
-    Neither the daedalus-v4l2 daemon (direct AVPacket feed —
-    libavformat used only for the in-tree v4l2request hwaccel
-    glue) nor mpv-fourier (Lua + ytdlp + mpv's stream code do
-    DASH/HLS) nor firefox-fourier (gecko-media DASH demux)
-    consumes FFmpeg's libxml2-backed DASH demuxer, so dropping is
-    feature-neutral.  Mirrors the libva trixie/runner ABI-skew
-    workaround documented in PR #62.
-  * CI workflow build-deps lose libxml2-dev for the same reason.
-  * No source code change beyond configure flags + Depends.
-    Substitution stays as PRs #76/#77 landed.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 23:30:00 +0000
-
-ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-5) bookworm trixie; urgency=medium
-
-  * pkgrel-only bump (3 → 5) to force a rebuild of the H.264 IDCT 4x4
-    daedalus-fourier substitution that landed in marfrit-packages PR
-    #76.  An orphan -4 .deb already sat in the apt pool (dated
-    2026-05-19, no matching source commit in main); CI's
-    check-already-published.sh compares with `dpkg --compare-versions
-    pool_ver ge source_full`, which short-circuited PR #76's -3
-    build.  Skipping past -4 lets the CI workflow actually publish the
-    substitution.
-  * No source code change beyond PKGREL and this changelog entry.
-    Substitution + control + build-deb.sh wiring stay as PR #76 left
-    them.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 21:30:00 +0000
-
-ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-3) bookworm trixie; urgency=medium
-
-  * Add 0003-h264-idct4-daedalus-fourier.patch — H264DSPContext.idct_add
-    (per-block 4x4 IDCT, called from the intra-4x4 decode path in
-    libavcodec/h264_mb.c) now dispatches through
-    daedalus_recipe_dispatch_h264_idct4 instead of
-    ff_h264_idct_add_neon.  First end-to-end exercise of the
-    daedalus-fourier kernel pack inside libavcodec.so on the
-    production decode hot path (daedalus-v4l2#11 step 2 — cycle 6
-    H.264 IDCT 4x4, NEON-by-recipe).
-  * build-deb.sh: fetches + builds daedalus-fourier (pinned at
-    d87239d, lockstep with the daemon's static link) with
-    -fPIC into a per-build temp prefix, then passes
-    --extra-cflags=-I.../include --extra-ldflags=-L.../lib
-    --extra-libs="-ldaedalus_core -lvulkan -lpthread" to FFmpeg
-    configure.  Static-linked into libavcodec.so.62.
-  * Bulk paths (idct_add16 / idct_add16intra / idct_add8) remain on
-    the stock NEON .S code and will be batched through
-    daedalus_recipe_dispatch_h264_idct4 with n_blocks>1 in a
-    follow-up.  Cycles 7/8/9 (IDCT 8x8 / luma-v deblock / qpel mc20)
-    land in subsequent patches.
-  * Depends gains libvulkan1 — daedalus_core PUBLIC-links Vulkan
-    (queryable QPU substrate); the no-QPU constructor still works,
-    but the loader refuses libavcodec.so.62 at dlopen time without
-    libvulkan.so.1 present.
-  * No ABI change; SONAMEs stay 62/62/60.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 20:00:00 +0000
-
 ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-1) bookworm trixie; urgency=medium

  * Initial Debian packaging for the Kwiboo FFmpeg fork with V4L2
@@ -10,25 +10,10 @@
 # Upstream fork: https://git.reauktion.de/marfrit/libva-v4l2-request-fourier
 set -euo pipefail

-# Same pin as the Arch PKGBUILD.  c454618 = PR #16 merge "picture,
-# request_pool: transparent OUTPUT-pool resize on bitstream overrun
-# (#15)" — follow-up root-cause fix to #13/#14.  On a mid-stream
-# bitstream-budget overrun (typical cause: SPS-driven resolution
-# upshift in an adaptive-bitrate stream), codec_store_buffer now
-# snapshots the in-flight surface's accumulated bytes, releases its
-# OUTPUT pool slot, calls request_pool_resize (STREAMOFF →
-# REQBUFS(0) → S_FMT with 2×sizeimage hint, capped at 1 GiB, page-
-# aligned → CREATE_BUFS → mmap → media_request_alloc → STREAMON),
-# re-acquires a slot, re-mirrors the surface's source_{data,size,
-# request_fd}, restores the bytes, and continues.  The frame
-# survives instead of being dropped back to libavcodec for surface
-# recreation.  CAPTURE side untouched (per-queue V4L2 streaming
-# independence).
-#
-# Prior pin (2860d75) = PR #14 merge — codec_store_buffer bounds-
-# check floor (#13).
-UPSTREAM_COMMIT=c454618ae11addce2e17b560f4deeacbed067d98
-PKGVER=1.0.0+r390+gc454618
+# Same pin as the Arch PKGBUILD.  de27e95 = "v4l2: log error_idx +
+# failing ctrl id on S_EXT_CTRLS failure" (Phase 8.13 diagnostic).
+UPSTREAM_COMMIT=de27e95571b67ef34619c23a12db4698f9b3454e
+PKGVER=1.0.0+r376+gde27e95
 PKGREL=1

 HERE=$(dirname "$(readlink -f "$0")")
@@ -40,7 +25,7 @@ work=$(mktemp -d)
 trap "rm -rf $work" EXIT

 cd "$work"
-curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo libva-fourier.tar.gz \
+curl -sSLfo libva-fourier.tar.gz \
    "https://git.reauktion.de/marfrit/libva-v4l2-request-fourier/archive/${UPSTREAM_COMMIT}.tar.gz"
 tar xzf libva-fourier.tar.gz
 SRCDIR=$(echo libva-v4l2-request-fourier)
@@ -53,28 +38,6 @@ meson setup build \
    -Db_lto=false
 meson compile -C build

-# ---------------------------------------------------------------------------
-# ABI sanity check: the produced .so MUST export __vaDriverInit_1_<MINOR>
-# matching the install target's libva runtime.  Build is expected to run on
-# a Debian trixie runner where <va/va.h>'s VA_MINOR is 22 — see
-# .gitea/workflows/build.yml (runs-on: actrunner-debian-aarch64-bohr).  If a future
-# runner change lands the build on a host with a different libva-dev
-# version, the produced symbol won't bind on the install target and ffmpeg/
-# vainfo/firefox-vaapi will all fail with "has no function
-# __vaDriverInit_1_0".  Fail loud at build time instead of shipping a
-# silently-broken .deb (which is what happened in -1).
-# ---------------------------------------------------------------------------
-SO=$(find build -name 'v4l2_request_drv_video.so' | head -1)
-if ! nm -D --defined-only "$SO" | grep -q '__vaDriverInit_1_22'; then
-    echo "FATAL: built driver does not export __vaDriverInit_1_22."
-    echo "    Build host's <va/va.h> VA_MINOR_VERSION is likely != 22."
-    echo "    Expected runner: actrunner-debian-aarch64-bohr (trixie, libva 2.22)."
-    echo "    Symbol exports found:"
-    nm -D --defined-only "$SO" | grep -i vadriverinit || echo "    (none)"
-    exit 1
-fi
-echo "ABI check: $SO exports __vaDriverInit_1_22 (matches trixie libva 2.22)"
-
 ROOT="$work/pkgroot"
 DESTDIR="$ROOT" meson install -C build

@@ -1,51 +1,3 @@
-libva-v4l2-request-fourier (1.0.0+r380+g9898331-1) bookworm trixie; urgency=medium
-
-  * Bump to 9898331 — LIBVA-2 close.  Adds video_fd_daedalus to
-    any_fd_supports_output_format's probe list in config.c so the
-    profile enumerator actually sees daedalus_v4l2's OUTPUT formats
-    (VP9F + AV1F + S264).  Before this commit, ffmpeg vaapi against
-    H.264 on higgs bailed with "No support for codec h264 profile 578"
-    because RequestQueryConfigProfiles only walked rkvdec/hantro/
-    rpi-hevc-dec/vpu981 fds and never asked daedalus what it could do.
-  * Backward-compatible on RK3399/3588 — new slot gated by
-    HAVE_DAEDALUS_V4L2 *and* video_fd_daedalus >= 0; both false in
-    those deployments.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Wed, 20 May 2026 19:30:00 +0000
-
-libva-v4l2-request-fourier (1.0.0+r378+gc332d34-2) bookworm trixie; urgency=medium
-
-  * Rebuild on a native Debian trixie runner (actrunner-debian-aarch64-bohr) so
-    the driver picks up trixie's libva-dev (2.22) and exports
-    __vaDriverInit_1_22 — the symbol trixie's libva runtime looks up.
-    Previous -1 build used the Arch CI runner (libva 2.23.0) and
-    exported __vaDriverInit_1_23, which trixie's loader cannot bind:
-    vaInitialize() returns -1 ("has no function __vaDriverInit_1_0")
-    and ffmpeg -hwaccel vaapi fails on startup.
-  * No source change; pure build-env fix.  CI workflow's
-    libva-v4l2-request-fourier-debian job moved from runs-on:
-    arch-aarch64 to runs-on: actrunner-debian-aarch64-bohr; build-deps installed
-    via apt-get instead of pacman.
-  * Hard sanity check kept in build-deb.sh: build fails if the
-    resulting .so doesn't export __vaDriverInit_1_22 (preempts the
-    silent install-then-refuse-to-load failure mode).
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Wed, 20 May 2026 18:00:00 +0000
-
-libva-v4l2-request-fourier (1.0.0+r378+gc332d34-1) bookworm trixie; urgency=medium
-
-  * Bump to c332d34 — LIBVA-1 per-codec dispatch close.  Pi 5 mixed
-    deployment (rpi-hevc-dec + daedalus_v4l2 both loaded) now correctly
-    opens BOTH decoders: VP9/AV1/H.264 route to daedalus via new 'd'
-    kind, HEVC stays on 'p' (rpi-hevc-dec).  Before this commit
-    find_codec_device picked rpi-hevc-dec as the sole primary and the
-    daedalus_v4l2 slot stayed -1, so VP9/AV1/H.264 frames failed.
-  * Also closes a small fd leak in RequestTerminate (daedalus pair).
-  * Backward-compatible: new branches gated by HAVE_DAEDALUS_V4L2
-    *and* video_fd_daedalus >= 0 — RK3399/RK3588 boxes unaffected.
-
- -- Markus Fritsche <mfritsche@reauktion.de>  Wed, 20 May 2026 17:30:00 +0000
-
 libva-v4l2-request-fourier (1.0.0+r376+gde27e95-1) bookworm trixie; urgency=medium

  * Initial Debian packaging (sibling to existing
@@ -23,7 +23,7 @@ work=$(mktemp -d)
 trap "rm -rf $work" EXIT

 cd "$work"
-curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo lmcp.tar.gz "https://git.reauktion.de/marfrit/lmcp/archive/${UPSTREAM_TAG}.tar.gz"
+curl -sSLfo lmcp.tar.gz "https://git.reauktion.de/marfrit/lmcp/archive/${UPSTREAM_TAG}.tar.gz"
 echo "$LMCP_TARBALL_SHA256  lmcp.tar.gz" | sha256sum -c
 tar xzf lmcp.tar.gz

@@ -33,7 +33,7 @@ work=$(mktemp -d)
 trap "rm -rf $work" EXIT

 cd "$work"
-curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo mpv.tar.gz \
+curl -sSLfo mpv.tar.gz \
    "https://github.com/mpv-player/mpv/archive/v${MPV_VERSION}/mpv-${MPV_VERSION}.tar.gz"
 echo "$MPV_TARBALL_SHA256  mpv.tar.gz" | sha256sum -c
 tar xzf mpv.tar.gz