ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 8×8 → daedalus-fourier

Cycle 7 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11 step 2). H264DSPContext.idct8_add, called per 8×8 block from the High-profile intra-8×8-DCT decode path in libavcodec/h264_mb.c, now dispatches through daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.

## What

- Add 0004-h264-idct8-daedalus-fourier.patch (in both arch/ and debian/ffmpeg-v4l2-request-fourier/). Extends libavcodec/aarch64/h264_idct_daedalus.c (introduced by 0003) with ff_h264_idct8_add_daedalus and a daedalus_recipe_dispatch_h264_idct8 call; patches libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct8_add to the new shim.
- arch/PKGBUILD + debian/build-deb.sh: append the new patch to the apply list; bump pkgrel/PKGREL to 7.
- No new build-deps, no Depends change, no daedalus-fourier rev: the d87239d pin already exposes daedalus_recipe_dispatch_h264_idct8.

## Why

The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8×8) the recipe is CPU NEON, so this is effectively a NEON-to-NEON substitution layered on top of cycle 6.

Production validation of cycle 6 on higgs Firefox YouTube: 3040 frames decoded cleanly, avg_decode_us=3388 (no regression vs the pre-substitution ~4 ms baseline). Cycle 7 inherits the same shim's pthread_once context.

Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7 green; FFmpeg 8×8 block storage block[r + 8*c] matches daedalus column-major convention).

## Scope NOT covered (deferred)

- Bulk c->idct8_add4 (inter 8×8-DCT macroblocks) stays on the in-tree NEON .S code; batched substitution with n_blocks>1 lands later alongside the cycle-6 bulk-paths work.
- High-bit-depth (10-bit) path untouched.
- Cycles 8/9: separate PRs.

## SONAME

Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.

## Refs

- reauktion/daedalus-v4l2 issue #11 (substitution arc): reauktion/daedalus-v4l2#11
- marfrit-packages PR #76 (cycle 6 IDCT 4×4)
- marfrit-packages PR #78 (libxml2 ABI-skew workaround)
- marfrit/daedalus-fourier cycle 7 close (H.264 IDCT 8×8 NEON green)
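
The shim pattern described under "Why" (a lazily initialised process-global context under pthread_once, column-major 8×8 addressing block[r + 8*c], and a block that is zeroed after the transform) can be illustrated with a minimal sketch. Everything prefixed demo_ is hypothetical and is not part of daedalus-fourier or FFmpeg. The placeholder dispatch applies only the rounded DC term rather than the real 8×8 inverse transform, so it shows the calling convention, not bit-exact output.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the process-global library context. */
typedef struct { int ready; } demo_ctx;

static demo_ctx       g_demo_ctx;
static pthread_once_t g_demo_once = PTHREAD_ONCE_INIT;
static int            g_demo_init_calls;  /* how many times init actually ran */

static void demo_ctx_init_once(void)
{
    g_demo_ctx.ready = 1;
    g_demo_init_calls++;
}

/* Column-major 8x8 addressing: coefficient at (row r, col c). */
static int16_t demo_coeff_at(const int16_t *block, int r, int c)
{
    return block[r + 8 * c];
}

static uint8_t demo_clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Placeholder for the real dispatch call (assumption: NOT the real
 * transform). Adds the rounded DC term to every pixel of the 8x8
 * destination, then zeroes the coefficient block to mimic the
 * destructive-zero semantics. */
static void demo_dispatch_idct8(const demo_ctx *ctx, uint8_t *dst,
                                size_t stride, int16_t *block)
{
    assert(ctx->ready);
    int dc = (demo_coeff_at(block, 0, 0) + 32) >> 6;

    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            dst[r * stride + c] = demo_clip_u8(dst[r * stride + c] + dc);

    memset(block, 0, 64 * sizeof(*block));
}

/* Same signature shape as the idct8_add slot: (dst, block, stride). */
void demo_idct8_add_shim(uint8_t *dst, int16_t *block, int stride)
{
    /* First caller initialises the context; later callers skip init. */
    pthread_once(&g_demo_once, demo_ctx_init_once);
    demo_dispatch_idct8(&g_demo_ctx, dst, (size_t)stride, block);
}
```

Calling demo_idct8_add_shim repeatedly runs demo_ctx_init_once only once, which is why a second per-block entry point can share the context introduced for the first without adding its own initialisation path.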
Merge pull request 'mesa-panvk-bifrost-video: r1-r4 patches as real files (symlinks broke CI)' (#83) from claude-noether/marfrit-packages:noether/mesa-panvk-bifrost-video-retrigger into main
2026-05-22 10:20:27 +02:00 · 2026-05-22 07:55:59 +00:00 · 2026-05-22 09:49:59 +02:00 · 2026-05-22 07:30:37 +00:00 · 2026-05-22 09:15:18 +02:00 · 2026-05-22 06:32:42 +00:00
14 changed files with 4075 additions and 10 deletions
@@ -935,7 +935,7 @@ jobs:
              libfontconfig-dev libfribidi-dev libgmp-dev libgnutls28-dev \
              libmp3lame-dev libass-dev libdav1d-dev libdrm-dev \
              libfreetype-dev libpulse-dev libva-dev libvorbis-dev libvpx-dev \
-              libwebp-dev libx264-dev libx265-dev libxml2-dev libopus-dev \
+              libwebp-dev libx264-dev libx265-dev libopus-dev \
              libvulkan-dev glslang-tools \
              v4l-utils liblzma-dev zlib1g-dev \
              curl ca-certificates openssh-client rsync dpkg-dev
@@ -1417,6 +1417,142 @@ jobs:
            -e 'ssh -i /root/.ssh/id_ed25519' \
            ./ mfritsche@nc.reauktion.de:arch/aarch64/
      - name: wipe secrets
        if: always()
        run: |
          rm -f /root/repo_pass /root/.ssh/id_ed25519
          rm -f /root/.ssh/id_ed25519_hertz
  # -------------------------------------------------------------------------
  # mesa-panvk-bifrost-video (aarch64 only) — sibling adding VK_KHR_video_decode_h264
  # via the V4L2 hantro VPU. Phase 4 byte-exact validated 2026-05-21.
  # Co-installs at /usr/lib/panvk-bifrost-video/ (parallel to r4); opt-in
  # via VK_ICD_FILENAMES (no launcher shipped — uses standard Vulkan loader).
  #
  # Build is slow (~30-60min on actrunner-aarch64): full Mesa-from-source.
  # Standalone job — no `needs:` since it doesn't depend on the fourier
  # codec stack. continue-on-error so a build hiccup doesn't block other
  # jobs in the same workflow run.
  # -------------------------------------------------------------------------
  mesa-panvk-bifrost-video-aarch64:
    runs-on: arch-aarch64
    continue-on-error: true
    steps:
      - uses: actions/checkout@v4
      - name: skip if already published
        id: skip-check
        run: |
          set -e
          result=$(./.gitea/scripts/check-already-published.sh arch/mesa-panvk-bifrost-video)
          echo "$result" >> "$GITHUB_OUTPUT"
          echo "decision: $result"
      - name: bootstrap runner (idempotent)
        if: steps.skip-check.outputs.skip != '1'
        run: |
          set -e
          retry() { for i in 1 2 3; do "$@" && return 0; rc=$?; echo "retry $i (exit=$rc)" >&2; sleep $((i*5)); done; return 1; }
          retry pacman -Syu --noconfirm --needed base-devel git rsync gnupg openssh sudo
      - name: import signing key
        if: steps.skip-check.outputs.skip != '1'
        env:
          PRIV: ${{ secrets.MARFRIT_REPO_PRIVATE_KEY }}
          PASS: ${{ secrets.MARFRIT_REPO_PASSPHRASE }}
        run: |
          set -e
          gpgconf --homedir /root/.gnupg --kill all 2>/dev/null || true
          rm -rf /root/.gnupg /root/repo_pass
          mkdir -m700 -p /root/.gnupg
          printf '%s' "$PASS" > /root/repo_pass
          chmod 600 /root/repo_pass
          printf '%s\n' "$PRIV" | gpg --batch --import
          echo "92D5E96D8F63C75E4116AA1FF5C8C4603D0D250C:6:" | gpg --import-ownertrust
      - name: install deploy ssh key
        if: steps.skip-check.outputs.skip != '1'
        env:
          KEY: ${{ secrets.MARFRIT_REPO_DEPLOY_KEY }}
        run: |
          mkdir -m700 -p /root/.ssh
          printf '%s\n' "$KEY" > /root/.ssh/id_ed25519
          chmod 600 /root/.ssh/id_ed25519
          ssh-keyscan -t ed25519 nc.reauktion.de > /root/.ssh/known_hosts 2>/dev/null
      - name: makepkg mesa-panvk-bifrost-video
        if: steps.skip-check.outputs.skip != '1'
        run: |
          set -e
          rm -rf /tmp/build-mesa-panvk-bifrost-video
          cp -r arch/mesa-panvk-bifrost-video /tmp/build-mesa-panvk-bifrost-video
          chown -R builder:builder /tmp/build-mesa-panvk-bifrost-video
          cd /tmp/build-mesa-panvk-bifrost-video
          # MAKEFLAGS for parallel build; runner is multi-core.
          # --skipinteg because sha256sums=SKIP in PKGBUILD (matches the
          # fourier-fork PKGBUILD convention).
          sudo -u builder -H env MAKEFLAGS="-j60" \
            makepkg --nocheck --noconfirm --syncdeps --cleanbuild --skipinteg
          ls -la *.pkg.tar.* | grep -v "\.sig$"
      - name: sign mesa-panvk-bifrost-video
        if: steps.skip-check.outputs.skip != '1'
        run: |
          set -e
          cd /tmp/build-mesa-panvk-bifrost-video
          for f in *.pkg.tar.xz *.pkg.tar.zst *.pkg.tar.gz; do
            [ -f "$f" ] || continue
            gpg --batch --pinentry-mode loopback --passphrase-file /root/repo_pass \
                --detach-sign --yes -u 92D5E96D8F63C75E4116AA1FF5C8C4603D0D250C "$f"
          done
      - name: update aarch64 repo db
        if: steps.skip-check.outputs.skip != '1'
        run: |
          set -e
          mkdir -p /tmp/arch-stage-mesa-panvk-video
          cd /tmp/arch-stage-mesa-panvk-video
          rm -f *
          for f in marfrit.db.tar.gz marfrit.db.tar.gz.sig marfrit.files.tar.gz marfrit.files.tar.gz.sig; do
            curl -sSLf "https://packages.reauktion.de/arch/aarch64/$f" -o "$f" || rm -f "$f"
          done
          for ext in xz zst gz; do
            ls /tmp/build-mesa-panvk-bifrost-video/*.pkg.tar.$ext 2>/dev/null && \
              mv /tmp/build-mesa-panvk-bifrost-video/*.pkg.tar.$ext /tmp/build-mesa-panvk-bifrost-video/*.pkg.tar.$ext.sig .
          done || true
          export GNUPGHOME=/root/.gnupg
          printf 'pinentry-mode loopback\npassphrase-file /root/repo_pass\n' > /root/.gnupg/gpg.conf
          printf 'allow-loopback-pinentry\n' > /root/.gnupg/gpg-agent.conf
          gpg-connect-agent reloadagent /bye
          pkgs=()
          for ext in xz zst gz; do
            for f in *.pkg.tar.$ext; do [ -f "$f" ] && pkgs+=("$f"); done
          done
          if [ -f marfrit.db.tar.gz ]; then
            for f in "${pkgs[@]}"; do
              name=$(echo "$f" | sed -E 's/-[0-9].*//')
              repo-remove --sign --key 92D5E96D8F63C75E4116AA1FF5C8C4603D0D250C \
                marfrit.db.tar.gz "$name" 2>/dev/null || true
            done
          fi
          repo-add --new --sign --key 92D5E96D8F63C75E4116AA1FF5C8C4603D0D250C \
            --verify marfrit.db.tar.gz "${pkgs[@]}"
          ln -sf marfrit.db.tar.gz        marfrit.db
          ln -sf marfrit.files.tar.gz     marfrit.files
          ln -sf marfrit.db.tar.gz.sig    marfrit.db.sig
          rm -f marfrit.files.sig
      - name: publish to aarch64
        if: steps.skip-check.outputs.skip != '1'
        run: |
          set -e
          retry() { for i in 1 2 3; do "$@" && return 0; rc=$?; echo "retry $i (exit=$rc)" >&2; sleep $((i*5)); done; return 1; }
          cd /tmp/arch-stage-mesa-panvk-video
          retry rsync -avL --copy-unsafe-links \
            -e 'ssh -i /root/.ssh/id_ed25519' \
            ./ mfritsche@nc.reauktion.de:arch/aarch64/
      - name: wipe secrets
        if: always()
        run: rm -f /root/repo_pass /root/.ssh/id_ed25519
@@ -0,0 +1,107 @@
 From 1b286ddb4efaca26ec9b9e290e989fec77dc1c77 Mon Sep 17 00:00:00 2001
 From: Markus Fritsche <mfritsche@reauktion.de>
 Date: Fri, 22 May 2026 10:18:21 +0200
 Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 8x8 IDCT through
 daedalus-fourier
 MIME-Version: 1.0
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 H264DSPContext.idct8_add (called per 8x8 block from the High-profile
 intra-8x8-DCT decode path in h264_mb.c) now dispatches through
 daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.
 The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8x8)
 the recipe is CPU NEON, so this is effectively a NEON-to-NEON
 substitution layered on top of the cycle-6 IDCT 4x4 wiring.  Same
 pthread_once global context, same destructive-zero semantics; FFmpeg
 column-major 8x8 storage block[r + 8*c] matches daedalus's convention.
 Bulk path c->idct8_add4 (used for inter 8x8-DCT macroblocks) remains
 on the in-tree NEON .S code and will be batched through
 daedalus_recipe_dispatch_h264_idct8 with n_blocks>1 in a follow-up.
 Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
 green).
 Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 7.
 ---
 libavcodec/aarch64/h264_idct_daedalus.c   | 29 ++++++++++++++++-------
 libavcodec/aarch64/h264dsp_init_aarch64.c |  3 ++-
 2 files changed, 23 insertions(+), 9 deletions(-)
 diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
 index 538d223..cbb98af 100644
 --- a/libavcodec/aarch64/h264_idct_daedalus.c
 +++ b/libavcodec/aarch64/h264_idct_daedalus.c
@@ -1,14 +1,16 @@
 /*
 - * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
 + * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
  *
 - * Routes H264DSPContext.idct_add through
 - * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
 - * The recipe layer picks the substrate (CPU NEON by default for
 - * cycle 6; future cycles may dispatch to V3D opportunistically).
 + * Routes H264DSPContext.idct_add  → daedalus_recipe_dispatch_h264_idct4
 + *        H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
 + * instead of the in-tree ff_h264_idct{,8}_add_neon assembly.  The
 + * recipe layer picks the substrate (CPU NEON by default for cycles
 + * 6 + 7; future cycles may dispatch to V3D opportunistically).
  *
 - * FFmpeg's 4x4 block memory layout matches daedalus's column-major
 - * convention: block[r + 4*c] = coefficient at (row r, col c).  Both
 - * sides destructively zero the block after the transform.
 + * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
 + * column-major convention: block[r + N*c] = coefficient at
 + * (row r, col c) for N ∈ {4, 8}.  Both sides destructively zero the
 + * block after the transform.
  *
  * The library context is process-global and lazily initialised under
  * pthread_once.  We pick the no-QPU constructor here because
@@ -37,6 +39,7 @@ static void daedalus_ctx_init_once(void)
 }
 void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
 +void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
 void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
 {
@@ -47,3 +50,13 @@ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
     daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
                                         block, 1, &meta);
 }
 +
 +void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
 +{
 +    static const daedalus_h264_block_meta meta = { .dst_off = 0 };
 +
 +    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
 +
 +    daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
 +                                        block, 1, &meta);
 +}
 diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
 index b993df2..741e551 100644
 --- a/libavcodec/aarch64/h264dsp_init_aarch64.c
 +++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -79,6 +79,7 @@ void ff_h264_idct_add8_neon(uint8_t **dest, const int *block_offset,
                             const uint8_t nnzc[15 * 8]);
 void ff_h264_idct8_add_neon(uint8_t *dst, int16_t *block, int stride);
 +void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
 void ff_h264_idct8_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
 void ff_h264_idct8_add4_neon(uint8_t *dst, const int *block_offset,
                              int16_t *block, int stride,
@@ -146,7 +147,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
         c->idct_add16intra = ff_h264_idct_add16intra_neon;
         if (chroma_format_idc <= 1)
             c->idct_add8   = ff_h264_idct_add8_neon;
 -        c->idct8_add       = ff_h264_idct8_add_neon;
 +        c->idct8_add       = ff_h264_idct8_add_daedalus;
         c->idct8_dc_add    = ff_h264_idct8_dc_add_neon;
         c->idct8_add4      = ff_h264_idct8_add4_neon;
     } else if (have_neon(cpu_flags) && bit_depth == 10) {
 -- 
 2.47.3
@@ -24,7 +24,7 @@ _srcname=FFmpeg
 _version='8.1'
 _commit='b57fbbe50c9b2656fad86a1a7eeabfd2b2a50935'  # v4l2-request-n8.1 tip 2026-04-24
 pkgver=8.1.r123329.b57fbbe
-pkgrel=6   # pkgrel=6 — H.264 IDCT 4x4 daedalus-fourier substitution (2026-05-21)
+pkgrel=7   # pkgrel=7 — H.264 IDCT 8x8 daedalus-fourier substitution (cycle 7, 2026-05-22)
 epoch=2
 # daedalus-fourier pin — first kernel substitution in libavcodec
@@ -90,8 +90,9 @@ source=("git+https://github.com/Kwiboo/FFmpeg.git#commit=${_commit}"
        "daedalus-fourier-${_daedalus_fourier_commit}.tar.gz::https://git.reauktion.de/marfrit/daedalus-fourier/archive/${_daedalus_fourier_commit}.tar.gz"
        '0001-libudev-bypass-fallback.patch'
        '0002-nv15-to-p010-unpack.patch'
-        '0003-h264-idct4-daedalus-fourier.patch')
-sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
+        '0003-h264-idct4-daedalus-fourier.patch'
+        '0004-h264-idct8-daedalus-fourier.patch')
+sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
 pkgver() {
  cd "${_srcname}"
@@ -105,6 +106,7 @@ prepare() {
  patch -Np1 -i "${srcdir}/0001-libudev-bypass-fallback.patch"
  patch -Np1 -i "${srcdir}/0002-nv15-to-p010-unpack.patch"
  patch -Np1 -i "${srcdir}/0003-h264-idct4-daedalus-fourier.patch"
+ patch -Np1 -i "${srcdir}/0004-h264-idct8-daedalus-fourier.patch"
 }
 build() {
@@ -0,0 +1,57 @@
 From: claude-noether (on behalf of mfritsche)
 Date: 2026-05-19
 Subject: panvk: expose VK_KHR/EXT_robustness2 + nullDescriptor on Bifrost (PAN_ARCH 6/7)
 Without this, Mesa's Zink driver refuses to use PanVk-Bifrost as its Vulkan
 backend, falling back silently to llvmpipe (software rasterizer) for all
 GL-via-Zink on Bifrost SBCs. That defeats the entire purpose of having a
 Vulkan driver on Bifrost — GL acceleration via Zink is the most natural
 near-term consumer.
 panvk_vX_nir_lower_descriptors.c:1309 and panvk_vX_shader.c:1355 already
 plumb dev->vk.enabled_features.nullDescriptor arch-agnostically — the gate
 at panvk_vX_physical_device.c was set conservatively when Bifrost was
 unmaintained, not because of hardware incapability.
 iter1–7 of the panvk-bifrost campaign proved fundamental driver functions
 on Mali-G52 r1 MC1 (PAN_ARCH=7). This patch is the iter8 follow-up.
 robustBufferAccess2 and robustImageAccess2 are NOT flipped — they're
 independent rb2 features Zink doesn't require, gated differently
 (robustBufferAccess2 = PAN_ARCH >= 11, robustImageAccess2 = false), and
 out of scope for iter8.
 ---
 src/panfrost/vulkan/panvk_vX_physical_device.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
 diff --git a/src/panfrost/vulkan/panvk_vX_physical_device.c b/src/panfrost/vulkan/panvk_vX_physical_device.c
 --- a/src/panfrost/vulkan/panvk_vX_physical_device.c
 +++ b/src/panfrost/vulkan/panvk_vX_physical_device.c
@@ -91,7 +91,7 @@ get_device_extensions(const struct panvk_physical_device *device,
       .KHR_pipeline_binary = true,
       .KHR_pipeline_executable_properties = true,
       .KHR_pipeline_library = true,
 -      .KHR_robustness2 = PAN_ARCH >= 10,
 +      .KHR_robustness2 = true,
       .KHR_sampler_mirror_clamp_to_edge = true,
       .KHR_sampler_ycbcr_conversion = true,
       .KHR_separate_depth_stencil_layouts = true,
@@ -168,7 +168,7 @@ get_device_extensions(const struct panvk_physical_device *device,
       .EXT_queue_family_foreign = true,
       .EXT_robustness = pan_arch(device->kmod.dev->props.gpu_id) >= 9,
       .EXT_image_robustness = true,
 -      .EXT_robustness2 = PAN_ARCH >= 10,
 +      .EXT_robustness2 = true,
       .EXT_sampler_filter_minmax = PAN_ARCH >= 10,
       .EXT_scalar_block_layout = true,
       .EXT_separate_stencil_usage = true,
@@ -493,7 +493,7 @@ get_device_features(const struct panvk_physical_device *device,
       /* VK_KHR_robustness2 */
       .robustBufferAccess2 = PAN_ARCH >= 11,
       .robustImageAccess2 = false,
 -      .nullDescriptor = PAN_ARCH >= 10,
 +      .nullDescriptor = true,
       /* VK_KHR_shader_clock */
       .shaderSubgroupClock = device->kmod.dev->props.gpu_can_query_timestamp,
@@ -0,0 +1,47 @@
 From: claude-noether (on behalf of mfritsche)
 Date: 2026-05-20
 Subject: panvk: expose Vulkan 1.1 + 1.2 on Bifrost (PAN_ARCH 6/7)
 ANGLE (Chromium's GL stack) requires apiVersion >= 1.1 to initialize. Without
 this, Brave / Chromium's GPU process fails at GL info collection:
  vk_renderer.cpp:2659 (initialize): ANGLE Requires a minimum Vulkan device
                                     version of 1.1
  Display::initialize error 0: Internal Vulkan error (-9): The requested
                               version of Vulkan is not supported by the driver
 Stack-up with iter8's robustness2 patch enables ANGLE → PanVk-Bifrost →
 Skia (via --enable-features=Vulkan) on Bifrost SBCs.
 PanVk-Bifrost already supports the bulk of 1.1-promoted features as extensions
 (multiview, maintenance1-3, descriptor update template, 16-bit storage,
 sampler ycbcr, variable pointers, etc.; all
 visible in iter0 vulkaninfo). The version bump primarily bundles them.
 Risk: Vulkan 1.1 has features beyond what iter1–7 exercised (protected memory,
 full subgroup ops). Specific app failures will be characterizable.
 1.2 is also flipped — Brave's Vulkan path may want descriptor indexing,
 buffer device address, etc. (all listed in iter0 vulkaninfo as supported
 extensions, just gated as 1.0-with-extensions, not 1.2-core).
 ---
 src/panfrost/vulkan/panvk_vX_physical_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
 diff --git a/src/panfrost/vulkan/panvk_vX_physical_device.c b/src/panfrost/vulkan/panvk_vX_physical_device.c
 --- a/src/panfrost/vulkan/panvk_vX_physical_device.c
 +++ b/src/panfrost/vulkan/panvk_vX_physical_device.c
@@ -38,8 +38,8 @@ get_device_extensions(const struct panvk_physical_device *device,
                       struct vk_device_extension_table *ext)
 {
    *ext = (struct vk_device_extension_table){
 -      .KHR_8bit_storage = true,
 -      .KHR_16bit_storage = true,
 -      bool has_vk1_1 = PAN_ARCH >= 10;
 -      bool has_vk1_2 = PAN_ARCH >= 10;
 +      .KHR_8bit_storage = true,
 +      .KHR_16bit_storage = true,
 +      bool has_vk1_1 = true;
 +      bool has_vk1_2 = true;
       *ext = (struct vk_device_extension_table){
@@ -0,0 +1,328 @@
 --- a/src/panfrost/vulkan/panvk_shader.h	2026-04-29 22:19:00.000000000 +0200
 +++ b/src/panfrost/vulkan/panvk_shader.h	2026-05-20 18:52:53.312698258 +0200
@@ -150,6 +150,10 @@
    struct {
 #if PAN_ARCH < 9
       int32_t raw_vertex_offset;
 +      uint32_t num_vertices;       /* iter13: XFB needs per-draw vertex count */
 +      /* aligned_u64 attribute below inserts the 4-byte alignment gap
 +       * after num_vertices automatically — no explicit pad needed. */
 +      aligned_u64 xfb_address[4];  /* iter13: 4 transform feedback buffer base addresses */
 #endif
       int32_t first_vertex;
       int32_t base_instance;
 --- a/src/panfrost/vulkan/panvk_vX_physical_device.c	2026-05-20 19:09:29.711145446 +0200
 +++ b/src/panfrost/vulkan/panvk_vX_physical_device.c	2026-05-20 18:52:54.832720445 +0200
@@ -169,6 +169,7 @@
       .EXT_provoking_vertex = true,
       .EXT_queue_family_foreign = true,
       .EXT_robustness2 = true,
 +      .EXT_transform_feedback = PAN_ARCH < 9,   /* iter13: JM-class only for now */
       .EXT_sampler_filter_minmax = PAN_ARCH >= 10,
       .EXT_scalar_block_layout = true,
       .EXT_separate_stencil_usage = true,
@@ -495,6 +496,10 @@
       .robustImageAccess2 = false,
       .nullDescriptor = true,
 +      /* VK_EXT_transform_feedback (iter13) */
 +      .transformFeedback = PAN_ARCH < 9,
 +      .geometryStreams = false,
 +
       /* VK_KHR_shader_clock */
       .shaderSubgroupClock = device->kmod.dev->props.gpu_can_query_timestamp,
       .shaderDeviceClock = device->kmod.dev->props.timestamp_device_coherent,
@@ -1020,6 +1025,18 @@
       .robustStorageBufferAccessSizeAlignment = 1,
       .robustUniformBufferAccessSizeAlignment = 1,
 +      /* VK_EXT_transform_feedback (iter13) */
 +      .maxTransformFeedbackStreams = 1,
 +      .maxTransformFeedbackBuffers = 4,
 +      .maxTransformFeedbackBufferSize = UINT32_MAX,
 +      .maxTransformFeedbackStreamDataSize = 512,
 +      .maxTransformFeedbackBufferDataSize = 512,
 +      .maxTransformFeedbackBufferDataStride = 2048,
 +      .transformFeedbackQueries = false,
 +      .transformFeedbackStreamsLinesTriangles = false,
 +      .transformFeedbackRasterizationStreamSelect = false,
 +      .transformFeedbackDraw = false,
 +
       /* VK_EXT_shader_object */
       /* We do not currently support VK_EXT_shader_object but this is used
        * internally by vk_shader
 --- a/src/panfrost/vulkan/panvk_vX_shader.c	2026-04-29 22:19:00.000000000 +0200
 +++ b/src/panfrost/vulkan/panvk_vX_shader.c	2026-05-20 18:52:56.556745611 +0200
@@ -21,6 +21,7 @@
 #include "panvk_physical_device.h"
 #include "panvk_sampler.h"
 #include "panvk_shader.h"
 +#include "pan_nir.h"   /* iter13: pan_nir_lower_xfb */
 #include "spirv/nir_spirv.h"
 #include "util/memstream.h"
@@ -100,6 +101,20 @@
    case nir_intrinsic_load_raw_vertex_offset_pan:
       val = load_sysval(b, graphics, bit_size, vs.raw_vertex_offset);
       break;
 +   case nir_intrinsic_load_num_vertices:    /* iter13: XFB index calc */
 +      val = load_sysval(b, graphics, bit_size, vs.num_vertices);
 +      break;
 +   case nir_intrinsic_load_xfb_address: {   /* iter13: XFB buffer N base address */
 +      unsigned idx = nir_intrinsic_base(intr);
 +      switch (idx) {
 +      case 0: val = load_sysval(b, graphics, bit_size, vs.xfb_address[0]); break;
 +      case 1: val = load_sysval(b, graphics, bit_size, vs.xfb_address[1]); break;
 +      case 2: val = load_sysval(b, graphics, bit_size, vs.xfb_address[2]); break;
 +      case 3: val = load_sysval(b, graphics, bit_size, vs.xfb_address[3]); break;
 +      default: return false;
 +      }
 +      break;
 +   }
    case nir_intrinsic_load_layer_id:
       assert(b->shader->info.stage == MESA_SHADER_FRAGMENT);
       val = load_sysval(b, graphics, bit_size, layer_id);
@@ -457,6 +472,7 @@
             core_max_id);
    pan_preprocess_nir(nir, pdev->kmod.dev->props.gpu_id);
 +
 }
 static void
@@ -870,6 +886,18 @@
             nir_var_shader_in | nir_var_shader_out, UINT32_MAX);
    NIR_PASS(_, nir, nir_lower_io, nir_var_shader_in | nir_var_shader_out,
             glsl_type_size, nir_lower_io_use_interpolated_input_intrinsics);
 +
 +#if PAN_ARCH < 9
 +   /* iter13: VK_EXT_transform_feedback — runs AFTER nir_lower_io so that
 +    * shader outputs are now store_output intrinsics that pan_nir_lower_xfb
 +    * can rewrite to nir_store_global+nir_load_xfb_address. */
 +   if (nir->info.stage == MESA_SHADER_VERTEX &&
 +       nir->info.has_transform_feedback_varyings) {
 +      NIR_PASS(_, nir, nir_opt_constant_folding);
 +      NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
 +      NIR_PASS(_, nir, pan_nir_lower_xfb);
 +   }
 +#endif
 }
 static VkResult
@@ -1288,6 +1316,9 @@
       .view_mask = (state && state->rp) ? state->rp->view_mask : 0,
       .robust2_modes = robust2_modes,
       .robust_descriptors = dev->vk.enabled_features.nullDescriptor,
 +      /* iter13: XFB shaders must disable IDVS (matches Panfrost-Gallium). */
 +      .no_idvs = (info->stage == MESA_SHADER_VERTEX) &&
 +                 info->nir->info.has_transform_feedback_varyings,
    };
    switch (info->stage) {
 --- a/src/panfrost/vulkan/panvk_cmd_draw.h	2026-04-29 22:19:00.000000000 +0200
 +++ b/src/panfrost/vulkan/panvk_cmd_draw.h	2026-05-20 18:52:57.748763011 +0200
@@ -135,6 +135,19 @@
    struct panvk_graphics_sysvals sysvals;
 #if PAN_ARCH < 9
 +   /* iter13: VK_EXT_transform_feedback state (JM-class only for now). */
 +   struct {
 +      bool active;
 +      uint32_t buffer_count;
 +      struct {
 +         uint64_t addr;
 +         uint64_t offset;
 +         uint64_t size;
 +      } buffers[4];
 +   } xfb;
 +#endif
 +
 +#if PAN_ARCH < 9
    struct panvk_shader_link link;
 #endif
 --- a/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-04-29 22:19:00.000000000 +0200
 +++ b/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-05-20 19:10:23.031919662 +0200
@@ -10,6 +10,7 @@
 #include "panvk_entrypoints.h"
 #include "pan_desc.h"
 +#include "pan_compiler.h"   /* PAN_SHADER_OOB_ADDRESS */
 #include "pan_util.h"
 static void
@@ -722,6 +723,35 @@
    set_gfx_sysval(cmdbuf, dirty_sysvals, vs.raw_vertex_offset,
                   info->vertex.raw_offset);
    set_gfx_sysval(cmdbuf, dirty_sysvals, layer_id, info->layer_id);
 +
 +   /* iter13: VK_EXT_transform_feedback sysvals — always set (per draw),
 +    * reflect bound XFB state. set_gfx_sysval is a no-op if value unchanged. */
 +   set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, info->vertex.count);
 +   {
 +      const struct panvk_cmd_graphics_state *_gfx = &cmdbuf->state.gfx;
 +      /* iter13: default each XFB buffer address to PAN_SHADER_OOB_ADDRESS
 +       * (= 1<<63). This is the Panfrost-Gallium memory-sink idiom — the
 +       * Bifrost MMU silently discards stores to this address, so a pipeline
 +       * with XFB outputs used in a non-XFB draw (or in an XFB draw with
 +       * fewer bound buffers than the shader declares) is safe instead of
 +       * faulting. See gallium/drivers/panfrost/pan_cmdstream.c PAN_SYSVAL_XFB. */
 +      uint64_t _xa0 = PAN_SHADER_OOB_ADDRESS, _xa1 = PAN_SHADER_OOB_ADDRESS,
 +               _xa2 = PAN_SHADER_OOB_ADDRESS, _xa3 = PAN_SHADER_OOB_ADDRESS;
 +      if (_gfx->xfb.active) {
 +         if (_gfx->xfb.buffer_count > 0 && _gfx->xfb.buffers[0].addr)
 +            _xa0 = _gfx->xfb.buffers[0].addr + _gfx->xfb.buffers[0].offset;
 +         if (_gfx->xfb.buffer_count > 1 && _gfx->xfb.buffers[1].addr)
 +            _xa1 = _gfx->xfb.buffers[1].addr + _gfx->xfb.buffers[1].offset;
 +         if (_gfx->xfb.buffer_count > 2 && _gfx->xfb.buffers[2].addr)
 +            _xa2 = _gfx->xfb.buffers[2].addr + _gfx->xfb.buffers[2].offset;
 +         if (_gfx->xfb.buffer_count > 3 && _gfx->xfb.buffers[3].addr)
 +            _xa3 = _gfx->xfb.buffers[3].addr + _gfx->xfb.buffers[3].offset;
 +      }
 +      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[0], _xa0);
 +      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[1], _xa1);
 +      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[2], _xa2);
 +      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[3], _xa3);
 +   }
 #endif
    if (dyn_gfx_state_dirty(cmdbuf, CB_BLEND_CONSTANTS)) {
 --- a/src/panfrost/vulkan/meson.build	2026-04-29 22:19:00.000000000 +0200
 +++ b/src/panfrost/vulkan/meson.build	2026-05-20 18:53:04.484861338 +0200
@@ -73,6 +73,7 @@
 jm_inc_dir = ['jm']
 jm_files = [
   'jm/panvk_vX_bind_queue.c',
 +  'jm/panvk_vX_cmd_xfb.c',   # iter13
   'jm/panvk_vX_cmd_buffer.c',
   'jm/panvk_vX_cmd_dispatch.c',
   'jm/panvk_vX_cmd_draw.c',
 --- a/src/panfrost/vulkan/jm/panvk_vX_cmd_buffer.c	2026-04-29 22:19:00.000000000 +0200
 +++ b/src/panfrost/vulkan/jm/panvk_vX_cmd_buffer.c	2026-05-20 19:10:26.163965149 +0200
@@ -473,5 +473,12 @@
    vk_command_buffer_begin(&cmdbuf->vk, pBeginInfo);
 +#if PAN_ARCH < 9
 +   /* iter13: clear XFB state on Begin so a reused command buffer does not
 +    * inherit stale xfb.buffer_count / xfb.active / xfb.buffers[] from a
 +    * prior recording. */
 +   memset(&cmdbuf->state.gfx.xfb, 0, sizeof(cmdbuf->state.gfx.xfb));
 +#endif
 +
    return VK_SUCCESS;
 }
 --- a/src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c	2026-05-18 12:50:53.067999996 +0200
 +++ b/src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c	2026-05-20 19:10:27.175979847 +0200
@@ -0,0 +1,111 @@
 +/*
 + * Copyright © 2026 mfritsche / claude-noether
 + * SPDX-License-Identifier: MIT
 + *
 + * iter13: VK_EXT_transform_feedback command handlers for the JM
 + * architecture path (Bifrost v6/v7 + Valhall-JM v9).
 + *
 + * The runtime contract:
 + *   - vkCmdBindTransformFeedbackBuffersEXT: stash (gpu_addr, offset, size)
 + *     for each slot into cmdbuf->state.gfx.xfb.buffers[].
 + *   - vkCmdBeginTransformFeedbackEXT: set cmdbuf->state.gfx.xfb.active = true.
 + *     The next draw's set_gfx_sysval calls re-emit vs.xfb_address[].
 + *   - vkCmdEndTransformFeedbackEXT: set active = false.
 + *
 + * Counter buffers (firstCounterBuffer/counterBufferCount/pCounterBuffers/
 + * pCounterBufferOffsets) are accepted by API but ignored — v1 doesn't
 + * support pause/resume. transformFeedbackDraw is advertised as false.
 + *
 + * Per-draw integration: jm/panvk_vX_cmd_draw.c reads cmdbuf->state.gfx.xfb
 + * and populates vs.xfb_address[i] for shader use. The pan_nir_lower_xfb
 + * pass in panvk_vX_shader.c emits nir_load_xfb_address(i) which lowers
 + * (via panvk_vX_shader.c sysval handler) to a load from the per-draw
 + * sysval push area.
 + */
 +
 +#include "vk_log.h"
 +#include "util/log.h"
 +
 +#include "panvk_cmd_buffer.h"
 +#include "panvk_cmd_draw.h"
 +#include "panvk_buffer.h"
 +#include "panvk_entrypoints.h"
 +
 +VKAPI_ATTR void VKAPI_CALL
 +panvk_per_arch(CmdBindTransformFeedbackBuffersEXT)(
 +   VkCommandBuffer commandBuffer,
 +   uint32_t firstBinding,
 +   uint32_t bindingCount,
 +   const VkBuffer *pBuffers,
 +   const VkDeviceSize *pOffsets,
 +   const VkDeviceSize *pSizes)
 +{
 +   VK_FROM_HANDLE(panvk_cmd_buffer, cmdbuf, commandBuffer);
 +   struct panvk_cmd_graphics_state *gfx = &cmdbuf->state.gfx;
 +
 +   for (uint32_t i = 0; i < bindingCount; i++) {
 +      uint32_t slot = firstBinding + i;
 +      if (slot >= 4)
 +         continue;
 +
 +      VK_FROM_HANDLE(panvk_buffer, buf, pBuffers[i]);
 +      gfx->xfb.buffers[slot].addr = panvk_buffer_gpu_ptr(buf, 0);
 +      gfx->xfb.buffers[slot].offset = pOffsets[i];
 +      gfx->xfb.buffers[slot].size =
 +         (pSizes != NULL && pSizes[i] != VK_WHOLE_SIZE)
 +            ? pSizes[i]
 +            : (buf->vk.size - pOffsets[i]);
 +   }
 +
 +   if (firstBinding + bindingCount > gfx->xfb.buffer_count)
 +      gfx->xfb.buffer_count = firstBinding + bindingCount;
 +}
 +
 +VKAPI_ATTR void VKAPI_CALL
 +panvk_per_arch(CmdBeginTransformFeedbackEXT)(
 +   VkCommandBuffer commandBuffer,
 +   uint32_t firstCounterBuffer,
 +   uint32_t counterBufferCount,
 +   const VkBuffer *pCounterBuffers,
 +   const VkDeviceSize *pCounterBufferOffsets)
 +{
 +   VK_FROM_HANDLE(panvk_cmd_buffer, cmdbuf, commandBuffer);
 +   struct panvk_cmd_graphics_state *gfx = &cmdbuf->state.gfx;
 +
 +   /* Counter buffers are ignored in v1: panvk_vX_physical_device.c reports
 +    * VkPhysicalDeviceTransformFeedbackPropertiesEXT.transformFeedbackDraw =
 +    * false. An app that passes no counter buffers gets correct capture
 +    * state. If an app does pass them, warn loudly so we do not silently
 +    * produce wrong capture state. */
 +   (void)firstCounterBuffer;
 +   (void)pCounterBufferOffsets;
 +   if (counterBufferCount > 0 && pCounterBuffers != NULL) {
 +      mesa_logw("panvk: CmdBeginTransformFeedbackEXT: counter buffers not "
 +                "implemented (transformFeedbackDraw=false); XFB resume will "
 +                "restart at buffer offset 0");
 +   }
 +
 +   gfx->xfb.active = true;
 +   /* Per-draw set_gfx_sysval picks up the change automatically — no
 +    * explicit dirty marking required (set_gfx_sysval uses memcmp +
 +    * BITSET to detect state diffs and re-emit sysvals). */
 +}
 +
 +VKAPI_ATTR void VKAPI_CALL
 +panvk_per_arch(CmdEndTransformFeedbackEXT)(
 +   VkCommandBuffer commandBuffer,
 +   uint32_t firstCounterBuffer,
 +   uint32_t counterBufferCount,
 +   const VkBuffer *pCounterBuffers,
 +   const VkDeviceSize *pCounterBufferOffsets)
 +{
 +   VK_FROM_HANDLE(panvk_cmd_buffer, cmdbuf, commandBuffer);
 +   struct panvk_cmd_graphics_state *gfx = &cmdbuf->state.gfx;
 +
 +   (void)firstCounterBuffer;
 +   (void)counterBufferCount;
 +   (void)pCounterBuffers;
 +   (void)pCounterBufferOffsets;
 +
 +   gfx->xfb.active = false;
 +}
@@ -0,0 +1,629 @@
 diff -urN a/src/panfrost/vulkan/meson.build b/src/panfrost/vulkan/meson.build
 --- a/src/panfrost/vulkan/meson.build	2026-05-21 14:04:02.529474145 +0200
 +++ b/src/panfrost/vulkan/meson.build	2026-05-21 14:04:04.106755486 +0200
@@ -123,6 +123,7 @@
   'panvk_vX_nir_lower_input_attachment_loads.c',
   'panvk_vX_sampler.c',
   'panvk_vX_shader.c',
 +  'panvk_vX_xfb_lower.c',
   sha1_h,
 ]
 diff -urN a/src/panfrost/vulkan/panvk_shader.h b/src/panfrost/vulkan/panvk_shader.h
 --- a/src/panfrost/vulkan/panvk_shader.h	2026-05-21 14:04:02.525251986 +0200
 +++ b/src/panfrost/vulkan/panvk_shader.h	2026-05-21 14:04:04.084251800 +0200
@@ -154,6 +154,8 @@
       /* aligned_u64 attribute below inserts the 4-byte alignment gap
        * after num_vertices automatically — no explicit pad needed. */
       aligned_u64 xfb_address[4];  /* iter13: 4 transform feedback buffer base addresses */
 +      uint32_t xfb_topology;       /* iter17: panvk_xfb_topology enum value */
 +      uint32_t xfb_output_count;   /* iter17: per-instance output verts after decomp */
 #endif
       int32_t first_vertex;
       int32_t base_instance;
@@ -569,4 +571,76 @@
    struct pan_compute_dim local_size, const void *bin_ptr, size_t bin_size,
    struct panvk_shader **shader_out);
 +
 +#if PAN_ARCH < 9
 +/* iter17: encoding for vs.xfb_topology sysval. Maps VkPrimitiveTopology values
 + * we need to distinguish at shader runtime for XFB capture. LIST topologies
 + * use the iter13 single-store fast path; non-LIST need per-vertex decomposition. */
 +enum panvk_xfb_topology {
 +   PANVK_XFB_TOPO_LIST            = 0,
 +   PANVK_XFB_TOPO_LINE_STRIP      = 1,
 +   PANVK_XFB_TOPO_TRI_STRIP       = 2,
 +   PANVK_XFB_TOPO_TRI_FAN         = 3,
 +   PANVK_XFB_TOPO_LINE_LIST_ADJ   = 4,
 +   PANVK_XFB_TOPO_LINE_STRIP_ADJ  = 5,
 +   PANVK_XFB_TOPO_TRI_LIST_ADJ    = 6,
 +   PANVK_XFB_TOPO_TRI_STRIP_ADJ   = 7,
 +};
 +
 +#include "panvk_macros.h"
 +struct nir_shader;
 +bool panvk_per_arch(nir_lower_xfb)(struct nir_shader *nir);
 +
 +/* Map VkPrimitiveTopology to panvk_xfb_topology enum (driver-side helper). */
 +static inline uint32_t
 +panvk_vk_topology_to_xfb_enum(VkPrimitiveTopology topo)
 +{
 +   switch (topo) {
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
 +      return PANVK_XFB_TOPO_LINE_STRIP;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
 +      return PANVK_XFB_TOPO_TRI_STRIP;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
 +      return PANVK_XFB_TOPO_TRI_FAN;
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
 +      return PANVK_XFB_TOPO_LINE_LIST_ADJ;
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
 +      return PANVK_XFB_TOPO_LINE_STRIP_ADJ;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
 +      return PANVK_XFB_TOPO_TRI_LIST_ADJ;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
 +      return PANVK_XFB_TOPO_TRI_STRIP_ADJ;
 +   case VK_PRIMITIVE_TOPOLOGY_POINT_LIST:
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST:
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST:
 +   default:
 +      return PANVK_XFB_TOPO_LIST;
 +   }
 +}
 +
 +/* Compute the per-instance output vertex count for a given (topology, input count). */
 +static inline uint32_t
 +panvk_xfb_output_count(VkPrimitiveTopology topo, uint32_t input_count)
 +{
 +   switch (topo) {
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
 +      return input_count >= 1 ? 2u * (input_count - 1u) : 0u;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
 +      return input_count >= 2 ? 3u * (input_count - 2u) : 0u;
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
 +      return (input_count / 4u) * 2u;
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
 +      return input_count >= 3 ? 2u * (input_count - 3u) : 0u;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
 +      return (input_count / 6u) * 3u;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
 +      return input_count >= 6 ? 3u * (input_count / 2u - 2u) : 0u;
 +   default:
 +      return input_count;  /* LIST topologies: 1:1 mapping */
 +   }
 +}
 +#endif
 +
 +
 #endif
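The output-count arithmetic in panvk_xfb_output_count can be sanity-checked on the host. The following is a minimal sketch, not part of the patch: the `sketch_` enum and function are hypothetical stand-ins, used so the file builds without Vulkan headers, and the formulas are copied from the helper above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the VkPrimitiveTopology values handled above. */
enum sketch_topo {
   SKETCH_LIST,
   SKETCH_LINE_STRIP,
   SKETCH_TRI_STRIP,
   SKETCH_TRI_FAN,
   SKETCH_LINE_LIST_ADJ,
   SKETCH_LINE_STRIP_ADJ,
   SKETCH_TRI_LIST_ADJ,
   SKETCH_TRI_STRIP_ADJ,
};

/* Per-instance output vertex count after primitive decomposition.
 * n is the input vertex count of the draw. */
static uint32_t
sketch_xfb_output_count(enum sketch_topo topo, uint32_t n)
{
   switch (topo) {
   case SKETCH_LINE_STRIP:
      return n >= 1 ? 2u * (n - 1u) : 0u;      /* n-1 lines, 2 verts each */
   case SKETCH_TRI_STRIP:
   case SKETCH_TRI_FAN:
      return n >= 2 ? 3u * (n - 2u) : 0u;      /* n-2 triangles */
   case SKETCH_LINE_LIST_ADJ:
      return (n / 4u) * 2u;                    /* one line per 4-vertex group */
   case SKETCH_LINE_STRIP_ADJ:
      return n >= 3 ? 2u * (n - 3u) : 0u;      /* n-3 lines */
   case SKETCH_TRI_LIST_ADJ:
      return (n / 6u) * 3u;                    /* one triangle per 6-vertex group */
   case SKETCH_TRI_STRIP_ADJ:
      return n >= 6 ? 3u * (n / 2u - 2u) : 0u; /* n/2 - 2 triangles */
   default:
      return n;                                /* LIST: 1:1 mapping */
   }
}
```

For example, a 5-vertex triangle strip decomposes into 3 triangles, so `sketch_xfb_output_count(SKETCH_TRI_STRIP, 5)` yields 9 captured vertices per instance.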
 diff -urN a/src/panfrost/vulkan/panvk_vX_cmd_draw.c b/src/panfrost/vulkan/panvk_vX_cmd_draw.c
 --- a/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-05-21 14:04:02.528576354 +0200
 +++ b/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-05-21 14:04:04.091357598 +0200
@@ -727,6 +727,20 @@
    /* iter13: VK_EXT_transform_feedback sysvals — always set (per draw),
     * reflect bound XFB state. set_gfx_sysval is a no-op if value unchanged. */
    set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, info->vertex.count);
 +
 +   /* iter17: XFB primitive-decomposition sysvals.
 +    * xfb_topology = enum value for the current bound topology.
 +    * xfb_output_count = per-instance output vertex count after decomposition.
 +    * For LIST topologies, output_count == input vertex count and the shader
 +    * takes the iter13 single-store fast path. */
 +   {
 +      VkPrimitiveTopology vk_topo =
 +         cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology;
 +      uint32_t topo_enum = panvk_vk_topology_to_xfb_enum(vk_topo);
 +      uint32_t out_count = panvk_xfb_output_count(vk_topo, info->vertex.count);
 +      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_topology, topo_enum);
 +      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_output_count, out_count);
 +   }
    {
       const struct panvk_cmd_graphics_state *_gfx = &cmdbuf->state.gfx;
       /* iter13: default each XFB buffer address to PAN_SHADER_OOB_ADDRESS
 diff -urN a/src/panfrost/vulkan/panvk_vX_shader.c b/src/panfrost/vulkan/panvk_vX_shader.c
 --- a/src/panfrost/vulkan/panvk_vX_shader.c	2026-05-21 14:04:02.527576494 +0200
 +++ b/src/panfrost/vulkan/panvk_vX_shader.c	2026-05-21 14:04:04.098356619 +0200
@@ -895,7 +895,10 @@
        nir->info.has_transform_feedback_varyings) {
       NIR_PASS(_, nir, nir_opt_constant_folding);
       NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
 -      NIR_PASS(_, nir, pan_nir_lower_xfb);
 +      /* iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
 +       * primitive decomposition for non-LIST topologies. Single-store LIST
 +       * fast path matches iter13 behavior. */
 +      NIR_PASS(_, nir, panvk_per_arch(nir_lower_xfb));
    }
 #endif
 }
 diff -urN a/src/panfrost/vulkan/panvk_vX_xfb_lower.c b/src/panfrost/vulkan/panvk_vX_xfb_lower.c
 --- a/src/panfrost/vulkan/panvk_vX_xfb_lower.c	1970-01-01 01:00:00.000000000 +0100
 +++ b/src/panfrost/vulkan/panvk_vX_xfb_lower.c	2026-05-21 14:04:04.115354242 +0200
@@ -0,0 +1,486 @@
 +/*
 + * Copyright © 2026 mfritsche / claude-noether
 + * SPDX-License-Identifier: MIT
 + *
 + * iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
 + * primitive decomposition for transform_feedback on non-LIST topologies
 + * (TRIANGLE_STRIP/FAN, LINE_STRIP, *_WITH_ADJACENCY).
 + *
 + * Approach: emit a topology dispatch at the start of each store_output
 + * lowering. The shader reads vs.xfb_topology sysval at runtime and branches
 + * into per-topology emission logic. For each affected topology, the lowered
 + * code emits guarded conditional stores — one per primitive this vertex
 + * contributes to, computing the output buffer position via primitive index
 + * and slot within the decomposed primitive.
 + *
 + * For LIST topologies (POINT/LINE/TRIANGLE LIST), takes a fast path that
 + * matches iter13's single-store behavior.
 + *
 + * For TRIANGLE_FAN, the central vertex (v=0) contributes to ALL primitives
 + * as slot 2, handled via a NIR loop that runs num_vertices - 2 times.
 + *
 + * See ~/src/panvk-bifrost/iter17/phase{0,1,2}_*.md for full design context.
 + */
 +
 +#include "panvk_macros.h"
 +
 +#if PAN_ARCH < 9
 +
 +#include "panvk_shader.h"
 +
 +#include "compiler/nir/nir_builder.h"
 +#include "pan_nir.h"
 +
 +#include <vulkan/vulkan_core.h>
 +
 +/* ----- Address arithmetic ----- */
 +
 +static nir_def *
 +xfb_store_addr(nir_builder *b, nir_def *buf, nir_def *out_idx,
 +               uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *byte_off = nir_iadd_imm(b,
 +      nir_imul_imm(b, out_idx, stride), offset_bytes);
 +   return nir_iadd(b, buf, nir_u2u64(b, byte_off));
 +}
 +
 +static void
 +emit_list_store(nir_builder *b, nir_def *buf, nir_def *output_count,
 +                nir_def *instance_id, nir_def *raw_vid, nir_def *value,
 +                uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *out_idx = nir_iadd(b,
 +      nir_imul(b, instance_id, output_count), raw_vid);
 +   nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
 +   nir_store_global(b, value, addr);
 +}
 +
 +static void
 +emit_prim_store(nir_builder *b, nir_def *buf, nir_def *output_count,
 +                nir_def *instance_id, nir_def *eligible,
 +                nir_def *prim_idx, nir_def *slot,
 +                uint32_t verts_per_prim,
 +                nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_push_if(b, eligible);
 +   {
 +      nir_def *out_idx = nir_iadd(b,
 +         nir_imul(b, instance_id, output_count),
 +         nir_iadd(b, nir_imul_imm(b, prim_idx, verts_per_prim), slot));
 +      nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
 +      nir_store_global(b, value, addr);
 +   }
 +   nir_pop_if(b, NULL);
 +}
 +
 +/* ----- Per-topology emission ----- */
 +
 +/* TRIANGLE_STRIP: vertex v contributes to prims v, v-1, v-2 (per eligibility). */
 +static void
 +emit_tri_strip(nir_builder *b, nir_def *v, nir_def *N,
 +               nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +               nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
 +   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
 +
 +   /* Prim v, slot 0: v < N-2 */
 +   emit_prim_store(b, buf, output_count, instance_id,
 +      nir_ult(b, v, Nm2),
 +      v, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
 +
 +   /* Prim v-1, slot = 1 if prim even else 2: 1 <= v < N-1 */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -1);
 +      nir_def *parity = nir_iand_imm(b, prim, 1u);
 +      nir_def *slot = nir_iadd_imm(b, parity, 1);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 1)),
 +         nir_ult(b, v, Nm1));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, slot, 3, value, stride, offset_bytes);
 +   }
 +
 +   /* Prim v-2, slot = 2 if prim even else 1: 2 <= v < N */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -2);
 +      nir_def *parity = nir_iand_imm(b, prim, 1u);
 +      nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 2)),
 +         nir_ult(b, v, N));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, slot, 3, value, stride, offset_bytes);
 +   }
 +}
 +
 +/* LINE_STRIP: vertex v contributes to prim v slot 0 + prim v-1 slot 1. */
 +static void
 +emit_line_strip(nir_builder *b, nir_def *v, nir_def *N,
 +                nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +                nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
 +
 +   /* Prim v, slot 0: v < N-1 */
 +   emit_prim_store(b, buf, output_count, instance_id,
 +      nir_ult(b, v, Nm1),
 +      v, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
 +
 +   /* Prim v-1, slot 1: 1 <= v < N */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -1);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 1)),
 +         nir_ult(b, v, N));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
 +   }
 +}
 +
 +/* TRIANGLE_FAN: prim p emits {p+1, p+2, 0}.
 + *   vertex v=0: contributes to ALL prims as slot 2 (loop required)
 + *   vertex v>=1: contributes to prim v-1 as slot 0 (if 1 <= v <= N-2)
 + *   vertex v>=2: contributes to prim v-2 as slot 1 (if 2 <= v <= N-1)
 + */
 +static void
 +emit_tri_fan(nir_builder *b, nir_def *v, nir_def *N,
 +             nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +             nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
 +   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
 +
 +   /* Prim v-1, slot 0: 1 <= v < N-1 */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -1);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 1)),
 +         nir_ult(b, v, Nm1));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
 +   }
 +
 +   /* Prim v-2, slot 1: 2 <= v < N */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -2);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 2)),
 +         nir_ult(b, v, N));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, nir_imm_int(b, 1), 3, value, stride, offset_bytes);
 +   }
 +
 +   /* Central vertex (v == 0): loop over all prims, write to slot 2. */
 +   nir_push_if(b, nir_ieq_imm(b, v, 0));
 +   {
 +      nir_variable *p_var = nir_local_variable_create(b->impl,
 +         glsl_uint_type(), "fan_p");
 +      nir_store_var(b, p_var, nir_imm_int(b, 0), 0x1);
 +      nir_push_loop(b);
 +      {
 +         nir_def *p = nir_load_var(b, p_var);
 +         nir_push_if(b, nir_uge(b, p, Nm2));
 +         {
 +            nir_jump(b, nir_jump_break);
 +         }
 +         nir_pop_if(b, NULL);
 +
 +         nir_def *out_idx = nir_iadd(b,
 +            nir_imul(b, instance_id, output_count),
 +            nir_iadd_imm(b, nir_imul_imm(b, p, 3), 2));
 +         nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
 +         nir_store_global(b, value, addr);
 +
 +         nir_store_var(b, p_var, nir_iadd_imm(b, p, 1), 0x1);
 +      }
 +      nir_pop_loop(b, NULL);
 +   }
 +   nir_pop_if(b, NULL);
 +}
 +
 +/* LINE_LIST_WITH_ADJACENCY: 4-vertex groups [4i..4i+3]; output {4i+1, 4i+2}.
 + *   v contributes if v%4 == 1: prim v/4 slot 0
 + *   v contributes if v%4 == 2: prim v/4 slot 1
 + */
 +static void
 +emit_line_list_adj(nir_builder *b, nir_def *v, nir_def *N,
 +                   nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +                   nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *vmod4 = nir_iand_imm(b, v, 3u);
 +   nir_def *prim = nir_ushr_imm(b, v, 2);  /* v / 4 */
 +   /* Skip a trailing partial group so no store lands past output_count. */
 +   nir_def *whole = nir_ult(b, prim, nir_ushr_imm(b, N, 2));
 +   emit_prim_store(b, buf, output_count, instance_id,
 +      nir_iand(b, whole, nir_ieq_imm(b, vmod4, 1)),
 +      prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
 +
 +   emit_prim_store(b, buf, output_count, instance_id,
 +      nir_iand(b, whole, nir_ieq_imm(b, vmod4, 2)),
 +      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
 +}
 +
 +/* LINE_STRIP_WITH_ADJACENCY: N-3 prims; prim p emits {p+1, p+2}.
 + *   v contributes to prim v-1 slot 0 (1 <= v <= N-3)
 + *   v contributes to prim v-2 slot 1 (2 <= v <= N-2)
 + */
 +static void
 +emit_line_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
 +                    nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +                    nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
 +   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
 +
 +   /* Prim v-1, slot 0: prim index must stay below N-3, so
 +    * 1 <= v <= N-3 ⇔ v >= 1 AND v < N-2 */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -1);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 1)),
 +         nir_ult(b, v, Nm2));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
 +   }
 +
 +   /* Prim v-2, slot 1: 2 <= v <= N-2 ⇔ v >= 2 AND v < N-1 */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -2);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 2)),
 +         nir_ult(b, v, Nm1));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
 +   }
 +}
 +
 +/* TRIANGLE_LIST_WITH_ADJACENCY: 6-vertex groups; output {6i, 6i+2, 6i+4}.
 + *   v contributes if v%6 == 0: prim v/6 slot 0
 + *   v contributes if v%6 == 2: prim v/6 slot 1
 + *   v contributes if v%6 == 4: prim v/6 slot 2
 + */
 +static void
 +emit_tri_list_adj(nir_builder *b, nir_def *v, nir_def *N,
 +                  nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +                  nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *vmod6 = nir_umod_imm(b, v, 6);
 +   nir_def *prim = nir_udiv_imm(b, v, 6);
 +   nir_def *whole = nir_ult(b, prim, nir_udiv_imm(b, N, 6)); /* skip partial group */
 +
 +   for (uint32_t slot = 0; slot < 3; slot++) {
 +      emit_prim_store(b, buf, output_count, instance_id,
 +         nir_iand(b, whole, nir_ieq_imm(b, vmod6, slot * 2)),
 +         prim, nir_imm_int(b, slot), 3, value, stride, offset_bytes);
 +   }
 +}
 +
 +/* TRIANGLE_STRIP_WITH_ADJACENCY: prim i emits:
 + *   even i: {2i, 2i+2, 2i+4}    (slots 0, 1, 2 ← input indices 2i, 2i+2, 2i+4)
 + *   odd  i: {2i, 2i+4, 2i+2}    (slots 0, 1, 2 ← input indices 2i, 2i+4, 2i+2)
 + *
 + * Only EVEN input vertices contribute (since all output indices are 2*something).
 + * For even input v:
 + *   prim v/2 slot 0 (always, if v/2 < N/2-2)
 + *   prim (v-2)/2 slot 1 if (v-2)/2 even, slot 2 if odd   (when v >= 2)
 + *   prim (v-4)/2 slot 2 if (v-4)/2 even, slot 1 if odd   (when v >= 4)
 + */
 +static void
 +emit_tri_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
 +                   nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +                   nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   /* Bail for odd input vertices — they never contribute. */
 +   nir_def *v_is_even = nir_ieq_imm(b, nir_iand_imm(b, v, 1u), 0);
 +   nir_push_if(b, v_is_even);
 +   {
 +      nir_def *N_half = nir_ushr_imm(b, N, 1);
 +      nir_def *max_prim = nir_iadd_imm(b, N_half, -2);  /* N/2 - 2 */
 +      nir_def *v_half = nir_ushr_imm(b, v, 1);
 +
 +      /* Prim v/2 slot 0: v/2 < N/2 - 2 */
 +      emit_prim_store(b, buf, output_count, instance_id,
 +         nir_ult(b, v_half, max_prim),
 +         v_half, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
 +
 +      /* Prim (v-2)/2 = v/2 - 1: v >= 2 AND prim < N/2-2 */
 +      {
 +         nir_def *prim = nir_iadd_imm(b, v_half, -1);
 +         nir_def *parity = nir_iand_imm(b, prim, 1u);
 +         nir_def *slot = nir_iadd_imm(b, parity, 1);  /* even→1, odd→2 */
 +         nir_def *eligible = nir_iand(b,
 +            nir_uge(b, v, nir_imm_int(b, 2)),
 +            nir_ult(b, prim, max_prim));
 +         emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                         prim, slot, 3, value, stride, offset_bytes);
 +      }
 +
 +      /* Prim (v-4)/2 = v/2 - 2: v >= 4 AND prim < N/2-2 */
 +      {
 +         nir_def *prim = nir_iadd_imm(b, v_half, -2);
 +         nir_def *parity = nir_iand_imm(b, prim, 1u);
 +         nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity);  /* even→2, odd→1 */
 +         nir_def *eligible = nir_iand(b,
 +            nir_uge(b, v, nir_imm_int(b, 4)),
 +            nir_ult(b, prim, max_prim));
 +         emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                         prim, slot, 3, value, stride, offset_bytes);
 +      }
 +   }
 +   nir_pop_if(b, NULL);
 +}
 +
 +/* ----- Main lowering: per store_output XFB channel ----- */
 +
 +static void
 +lower_xfb_output_iter17(nir_builder *b, nir_intrinsic_instr *intr,
 +                        unsigned channel_idx, unsigned num_components,
 +                        unsigned buffer, unsigned offset_words)
 +{
 +   assert(buffer < MAX_XFB_BUFFERS);
 +   assert(nir_intrinsic_component(intr) == 0);
 +
 +   uint16_t stride = b->shader->info.xfb_stride[buffer] * 4;
 +   assert(stride != 0);
 +   uint16_t offset_bytes = offset_words * 4;
 +
 +   BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_VERTEX_ID_ZERO_BASE);
 +   BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_INSTANCE_ID);
 +
 +   nir_def *topology = load_sysval(b, graphics, 32, vs.xfb_topology);
 +   nir_def *out_count = load_sysval(b, graphics, 32, vs.xfb_output_count);
 +   nir_def *N = nir_load_num_vertices(b);
 +   nir_def *v = nir_load_raw_vertex_id_pan(b);
 +   nir_def *instance = nir_load_instance_id(b);
 +   nir_def *buf = nir_load_xfb_address(b, 64, .base = buffer);
 +
 +   nir_def *src = intr->src[0].ssa;
 +   nir_component_mask_t mask = nir_component_mask(num_components);
 +   nir_def *value = nir_channels(b, src, mask << channel_idx);
 +
 +   /* Topology dispatch ladder. LIST first (fast path). */
 +   nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LIST));
 +   {
 +      emit_list_store(b, buf, out_count, instance, v, value,
 +                      stride, offset_bytes);
 +   }
 +   nir_push_else(b, NULL);
 +   {
 +      /* iter17 Janet Finding 3: gate all non-LIST emission on
 +       * output_count > 0. For degenerate input counts (N < min required
 +       * for the topology), output_count is 0 and we must emit NO stores
 +       * — otherwise N-2 / N-3 / etc. arithmetic underflows in the
 +       * eligibility predicates and we falsely fire stores. */
 +      nir_push_if(b, nir_ult(b, nir_imm_int(b, 0), out_count));
 +      {
 +      nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_STRIP));
 +      {
 +         emit_tri_strip(b, v, N, buf, out_count, instance, value,
 +                        stride, offset_bytes);
 +      }
 +      nir_push_else(b, NULL);
 +      {
 +         nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP));
 +         {
 +            emit_line_strip(b, v, N, buf, out_count, instance, value,
 +                            stride, offset_bytes);
 +         }
 +         nir_push_else(b, NULL);
 +         {
 +            nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_FAN));
 +            {
 +               emit_tri_fan(b, v, N, buf, out_count, instance, value,
 +                            stride, offset_bytes);
 +            }
 +            nir_push_else(b, NULL);
 +            {
 +               nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_LIST_ADJ));
 +               {
 +                  emit_line_list_adj(b, v, N, buf, out_count, instance, value,
 +                                     stride, offset_bytes);
 +               }
 +               nir_push_else(b, NULL);
 +               {
 +                  nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP_ADJ));
 +                  {
 +                     emit_line_strip_adj(b, v, N, buf, out_count, instance, value,
 +                                         stride, offset_bytes);
 +                  }
 +                  nir_push_else(b, NULL);
 +                  {
 +                     nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_LIST_ADJ));
 +                     {
 +                        emit_tri_list_adj(b, v, N, buf, out_count, instance, value,
 +                                          stride, offset_bytes);
 +                     }
 +                     nir_push_else(b, NULL);
 +                     {
 +                        /* TRI_STRIP_ADJ — last case */
 +                        emit_tri_strip_adj(b, v, N, buf, out_count, instance, value,
 +                                           stride, offset_bytes);
 +                     }
 +                     nir_pop_if(b, NULL);
 +                  }
 +                  nir_pop_if(b, NULL);
 +               }
 +               nir_pop_if(b, NULL);
 +            }
 +            nir_pop_if(b, NULL);
 +         }
 +         nir_pop_if(b, NULL);
 +      }
 +      nir_pop_if(b, NULL);
 +      }
 +      nir_pop_if(b, NULL);  /* Janet Finding 3: close output_count > 0 guard */
 +   }
 +   nir_pop_if(b, NULL);
 +}
 +
 +/* Mirror of pan_nir_lower_xfb's lower_xfb: load_vertex_id rewrite +
 + * dispatch store_output through our topology-aware emission. */
 +static bool
 +lower_xfb_iter17(nir_builder *b, nir_intrinsic_instr *intr,
 +                 UNUSED void *data)
 +{
 +   if (intr->intrinsic == nir_intrinsic_load_vertex_id) {
 +      b->cursor = nir_instr_remove(&intr->instr);
 +      nir_def *repl = nir_iadd(b, nir_load_raw_vertex_id_pan(b),
 +                               nir_load_raw_vertex_offset_pan(b));
 +      nir_def_rewrite_uses(&intr->def, repl);
 +      return true;
 +   }
 +
 +   if (intr->intrinsic != nir_intrinsic_store_output)
 +      return false;
 +
 +   bool progress = false;
 +   b->cursor = nir_before_instr(&intr->instr);
 +
 +   /* io_xfb has only out[0,1]; the other 2 channels are in io_xfb2.
 +    * Outer loop selects which annotation; inner picks which channel. */
 +   for (unsigned i = 0; i < 2; ++i) {
 +      nir_io_xfb xfb = i ? nir_intrinsic_io_xfb2(intr)
 +                         : nir_intrinsic_io_xfb(intr);
 +      for (unsigned j = 0; j < 2; ++j) {
 +         if (!xfb.out[j].num_components)
 +            continue;
 +         lower_xfb_output_iter17(b, intr, i * 2 + j, xfb.out[j].num_components,
 +                                 xfb.out[j].buffer, xfb.out[j].offset);
 +         progress = true;
 +      }
 +   }
 +
 +   if (progress)
 +      nir_instr_remove(&intr->instr);
 +   return progress;
 +}
 +
 +bool
 +panvk_per_arch(nir_lower_xfb)(nir_shader *nir)
 +{
 +   return nir_shader_intrinsics_pass(
 +      nir, lower_xfb_iter17, nir_metadata_none, NULL);
 +}
 +
 +#endif /* PAN_ARCH < 9 */
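The per-vertex scatter used by emit_tri_strip can be cross-checked against a straightforward per-primitive gather. The sketch below is a host-side illustration only, not part of the patch: all `sketch_` names are hypothetical, the "value" stored is simply the vertex index, and it assumes a single instance and at most 23 input vertices so the fixed-size arrays suffice.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_OUT 64 /* enough for n <= 23 input vertices */

/* Reference: walk primitives in order and gather their three vertices.
 * Triangle i is {i, i+1, i+2} for even i and {i, i+2, i+1} for odd i. */
static uint32_t
sketch_strip_gather(uint32_t n, uint32_t *out)
{
   uint32_t count = 0;
   for (uint32_t i = 0; i + 2 < n; i++) {
      uint32_t odd = i & 1u;
      out[count++] = i;
      out[count++] = i + 1u + odd;
      out[count++] = i + 2u - odd;
   }
   return count;
}

/* Scatter: each vertex v writes itself into up to three primitives,
 * mirroring the three guarded stores in emit_tri_strip. */
static uint32_t
sketch_strip_scatter(uint32_t n, uint32_t *out)
{
   if (n < 3)
      return 0; /* same role as the output_count > 0 guard */

   for (uint32_t v = 0; v < n; v++) {
      if (v < n - 2u)                  /* prim v, slot 0 */
         out[3u * v] = v;
      if (v >= 1u && v < n - 1u) {     /* prim v-1, slot 1 (even) or 2 (odd) */
         uint32_t p = v - 1u;
         out[3u * p + 1u + (p & 1u)] = v;
      }
      if (v >= 2u) {                   /* prim v-2, slot 2 (even) or 1 (odd) */
         uint32_t p = v - 2u;
         out[3u * p + 2u - (p & 1u)] = v;
      }
   }
   return 3u * (n - 2u);
}

/* Returns 1 when both orderings agree for an n-vertex strip. */
static int
sketch_strip_matches(uint32_t n)
{
   uint32_t a[SKETCH_MAX_OUT], b[SKETCH_MAX_OUT];
   uint32_t na = sketch_strip_gather(n, a);
   uint32_t nb = sketch_strip_scatter(n, b);
   return na == nb && memcmp(a, b, na * sizeof(uint32_t)) == 0;
}
```

For a 4-vertex strip both paths produce the sequence 0, 1, 2, 1, 3, 2. The same gather-versus-scatter comparison could be repeated for the other topologies to catch off-by-one errors in the eligibility ranges.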
@@ -0,0 +1,181 @@
 # Maintainer: Markus Fritsche <fritsche.markus@gmail.com>
 #
 # mesa-panvk-bifrost-video — sibling of mesa-panvk-bifrost (r4) that adds
 # VK_KHR_video_decode_h264 on Mali Bifrost SBCs (PAN_ARCH 6/7) backed by
 # the SoC's V4L2-stateless hantro VPU (RK3566/RK3568).
 #
 # Campaign: ~/src/panvk-bifrost-video/ — Phase 4 byte-exact validated
 # 2026-05-21 (48/48 BBB display frames match ffmpeg+libva-v4l2-request-
 # fourier byte-for-byte on the same hantro). Phase 5 second-model review
 # completed; load-bearing findings (output_map OOB, static counter,
 # session_init unwind, probe_hantro gate) all applied.
 #
 # What it does (on top of r4):
 #   - 0001..0004: inherited from mesa-panvk-bifrost (robustness2/null-
 #     descriptor, vk1.1/1.2 advertisement, EXT_transform_feedback, XFB
 #     primitive decomposition) — symlinked from the r4 package directory
 #     so the patches don't drift between siblings.
 #   - 0005: VK_KHR_video_queue + VK_KHR_video_decode_queue +
 #     VK_KHR_video_decode_h264 backed by V4L2-stateless hantro.
 #     Touches 14 files in src/panfrost/vulkan/; full diff in
 #     0005-panvk-bifrost-video-KHR-video-decode-h264.patch.
 #
 # Co-existence:
 #   - Installs to /usr/lib/panvk-bifrost-video/ (parallel to r4's
 #     /usr/lib/panvk-bifrost/). Pick at runtime via VK_ICD_FILENAMES.
 #   - r4 stays the recommended default for the Chromium-GPU-process
 #     consumer (no video needed there). Use this package when the
 #     consumer wants Vulkan video decode (mpv-fourier, ffmpeg-vulkan,
 #     future Chromium-VulkanVideoDecoder).
 #
 # Phase 1 limitations to know about (documented in source comments):
 #   - Single video session per device (active_video singleton)
 #   - Synchronous decode at record time — no pipelining yet
 #   - Hardcoded /dev/video1 + /dev/media0 (matches RK3566/68, blocks
 #     other SoCs without a topology-walk port)
 #   - Bitstream source buffer assumed HOST_VISIBLE (true on panvk-
 #     bifrost, would need fallback on other backends)
 #
 # Build target: arch-aarch64 runner via marfrit-packages Gitea Actions.
 # Mesa build is slow (~30-60min on Cortex-A55).
 pkgname=mesa-panvk-bifrost-video
 _mesaver=26.0.6
 pkgver=26.0.6.r5.video1
 pkgrel=1
 pkgdesc="Patched Mesa libvulkan_panfrost.so adding VK_KHR_video_decode_h264 on Bifrost SBCs (sibling of mesa-panvk-bifrost-r4)"
 arch=('aarch64')
 url="https://github.com/marfrit/panvk-bifrost"
 license=('MIT')
 depends=(
    'mesa'              # for shared mesa runtime libs
    'libdrm'
    'wayland'
    'libxcb'
    'libx11'
    'libxshmfence'
    'zlib'
    'zstd'
    'libelf'
    'libffi'
    'expat'
    'llvm-libs'
    'lm_sensors'
 )
 makedepends=(
    'meson'
    'ninja'
    'glslang'
    'python-mako'
    'python-packaging'
    'wayland-protocols'
    'libxrandr'
    'xorgproto'
    'libdrm'
    'llvm'
    'libclc'
    'spirv-llvm-translator'
    'spirv-tools'
    'rust-bindgen'
    'patch'
 )
 source=(
    "https://archive.mesa3d.org/mesa-${_mesaver}.tar.xz"
    "0001-panvk-expose-robustness2-nullDescriptor-bifrost.patch"
    "0002-panvk-expose-vulkan-1.1-1.2-on-bifrost.patch"
    "0003-panvk-bifrost-vk-ext-transform-feedback.patch"
    "0004-panvk-bifrost-xfb-primitive-decomposition.patch"
    "0005-panvk-bifrost-video-KHR-video-decode-h264.patch"
    "icd.json"
 )
 # Mesa tarball checksum matches the sibling r4 package — same upstream version.
 sha256sums=(
    'SKIP'  # mesa tarball — co-trust w/ r4 sibling
    'SKIP'  # patches are local
    'SKIP'
    'SKIP'
    'SKIP'
    'SKIP'
    'SKIP'  # icd.json
 )
 prepare() {
    cd "mesa-${_mesaver}"
    # r1+r2: small sed-based edits inherited from r4 (verbatim from the
    # sibling PKGBUILD — keep in sync).
    sed -i 's|\.KHR_robustness2 = PAN_ARCH >= 10,|.KHR_robustness2 = true,|' src/panfrost/vulkan/panvk_vX_physical_device.c
    sed -i 's|\.EXT_robustness2 = PAN_ARCH >= 10,|.EXT_robustness2 = true,|' src/panfrost/vulkan/panvk_vX_physical_device.c
    sed -i 's|\.nullDescriptor = PAN_ARCH >= 10,|.nullDescriptor = true,|' src/panfrost/vulkan/panvk_vX_physical_device.c
    sed -i 's|bool has_vk1_1 = PAN_ARCH >= 10;|bool has_vk1_1 = true;|' src/panfrost/vulkan/panvk_vX_physical_device.c
    sed -i 's|bool has_vk1_2 = PAN_ARCH >= 10;|bool has_vk1_2 = true;|' src/panfrost/vulkan/panvk_vX_physical_device.c
    # r3: EXT_transform_feedback for Bifrost.
    patch -p1 < "${srcdir}/0003-panvk-bifrost-vk-ext-transform-feedback.patch"
    # r4: XFB primitive decomposition NIR pass.
    patch -p1 < "${srcdir}/0004-panvk-bifrost-xfb-primitive-decomposition.patch"
    # video: VK_KHR_video_decode_h264 via V4L2-hantro.
    patch -p1 < "${srcdir}/0005-panvk-bifrost-video-KHR-video-decode-h264.patch"
    # Sanity-check r1..r4 (inherited).
    grep -q "KHR_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
    grep -q "EXT_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
    grep -q "nullDescriptor = true," src/panfrost/vulkan/panvk_vX_physical_device.c
    grep -q "has_vk1_1 = true;" src/panfrost/vulkan/panvk_vX_physical_device.c
    grep -q "has_vk1_2 = true;" src/panfrost/vulkan/panvk_vX_physical_device.c
    grep -q "EXT_transform_feedback = PAN_ARCH < 9," src/panfrost/vulkan/panvk_vX_physical_device.c
    test -f src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c
    grep -q "panvk_per_arch(nir_lower_xfb)" src/panfrost/vulkan/panvk_vX_shader.c
    test -f src/panfrost/vulkan/panvk_vX_xfb_lower.c
    # Sanity-check video patch landed.
    grep -q "KHR_video_queue = PAN_ARCH < 9 && panvk_v4l2_probe_hantro()" \
        src/panfrost/vulkan/panvk_vX_physical_device.c
    grep -q "PANVK_QUEUE_FAMILY_VIDEO_DECODE" src/panfrost/vulkan/panvk_device.h
    test -f src/panfrost/vulkan/panvk_video_decode.c
    test -f src/panfrost/vulkan/panvk_video_decode.h
    test -f src/panfrost/vulkan/panvk_v4l2.c
    test -f src/panfrost/vulkan/panvk_v4l2_h264.c
    test -f src/panfrost/vulkan/panvk_v4l2_h264_slice_header.c
    test -f src/panfrost/vulkan/panvk_v4l2_h264_slice_header.h
    grep -q "panvk_v4l2_h264_slice_header.c" src/panfrost/vulkan/meson.build
    grep -q "panvk_video_queue_submit_noop" src/panfrost/vulkan/panvk_vX_device.c
 }
 build() {
    cd "mesa-${_mesaver}"
    # Mirror r4's narrow build profile.
    meson setup build/ \
        --prefix=/usr \
        --libdir=lib \
        --buildtype=release \
        -Dvulkan-drivers=panfrost \
        -Dgallium-drivers= \
        -Dplatforms=wayland,x11 \
        -Dglx=disabled \
        -Degl=disabled \
        -Dgles1=disabled \
        -Dgles2=disabled \
        -Dvulkan-layers= \
        -Dtools= \
        -Dgallium-rusticl=false \
        -Dmicrosoft-clc=disabled
    meson compile -C build
 }
 package() {
    cd "${srcdir}/mesa-${_mesaver}"
    # Co-install path — parallel to r4's /usr/lib/panvk-bifrost/.
    install -Dm755 build/src/panfrost/vulkan/libvulkan_panfrost.so \
        "$pkgdir/usr/lib/panvk-bifrost-video/libvulkan_panfrost.so"
    # ICD JSON pointing at the video build. Opt-in via VK_ICD_FILENAMES;
    # NOT in /usr/share/vulkan/icd.d/ so it doesn't override stock or r4.
    install -Dm644 "$srcdir/icd.json" \
        "$pkgdir/usr/lib/panvk-bifrost-video/icd.json"
 }
@@ -0,0 +1,40 @@
 # mesa-panvk-bifrost-video
 Patched Mesa `libvulkan_panfrost.so` that **adds `VK_KHR_video_decode_h264`** on Mali Bifrost SBCs (PAN_ARCH 6/7, RK3566/RK3568 class hardware), backed by the SoC's V4L2-stateless **hantro** VPU.
 This is a **sibling** of [mesa-panvk-bifrost](../mesa-panvk-bifrost/) (the r4 package that exposes Bifrost to Chromium's Vulkan compositor). Pick this one when the consumer wants Vulkan **video decode** in addition; pick r4 for compositor-only.
 ## Status
 Phase 4 byte-exact validated 2026-05-21: 48/48 unique BBB display frames decoded by this package are byte-identical to `ffmpeg+libva-v4l2-request-fourier` running on the same hantro hardware. Phase 5 second-model review completed; all load-bearing findings addressed. First publish via marfrit-packages CI 2026-05-22 (PR #79 merge did not auto-fire Actions; this re-trigger restores the standard build/sign/publish path).
 ## How to use
 ```sh
 # Co-installs alongside r4 and stock mesa.
 sudo pacman -S mesa-panvk-bifrost-video
 # Opt in (not on the default loader search path).
 export VK_ICD_FILENAMES=/usr/lib/panvk-bifrost-video/icd.json
 export PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1   # mesa-upstream gate
 # Run a Vulkan video consumer.
 vulkan-video-dec-simple-test -i your.h264 --codec h264 --noPresent --maxFrameCount 50
 # or
 ffmpeg -hwaccel vulkan -i your.mp4 ...
 ```
 ## Phase 1 limitations
 Documented in source comments and worth knowing before relying on this in production:
 - **Single video session per device.** Concurrent `VkVideoSessionKHR` objects on the same device clobber each other (`active_video` singleton). Sufficient for current single-stream consumers.
 - **Synchronous decode at record time.** The full V4L2 ioctl dance runs to completion inside `vkCmdDecodeVideoKHR`. No pipelining. Throughput is bounded by hantro's ~1.16× realtime on 1080p H.264.
 - **Hardcoded `/dev/video1` + `/dev/media0`.** Matches RK3566/68 but won't work on other SoCs without a topology-walk port (see `libva-v4l2-request-fourier` for the full version).
 - **Bitstream source buffer assumed HOST_VISIBLE.** True on panvk-bifrost (no DEVICE_LOCAL-only memory types exist), but the code silently skips decode if the app bound the buffer to non-host-visible memory.
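The hardcoded-node limitation can be checked before opting in. The following is a minimal sketch, assuming only that the two device paths named above must exist as character devices. `check_nodes` is a hypothetical helper, not something this package ships, and it does not verify that the video node is actually the hantro decoder.

```sh
# Hypothetical helper (not shipped by this package): succeed only if every
# argument is an existing character device node.
check_nodes() {
    for node in "$@"; do
        if [ ! -c "$node" ]; then
            echo "missing device node: $node" >&2
            return 1
        fi
    done
    return 0
}

# The two paths this build hardcodes (see the limitation above).
if check_nodes /dev/video1 /dev/media0; then
    echo "hardcoded V4L2 nodes present"
else
    echo "hardcoded V4L2 nodes absent; video decode is not expected to work here"
fi
```

A passing check only shows that the nodes exist. On a board with a different media topology the same paths may belong to another device.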
 ## Co-existence
 - Installs to `/usr/lib/panvk-bifrost-video/` — parallel to r4's `/usr/lib/panvk-bifrost/` and stock `/usr/lib/`.
 - Opt-in via `VK_ICD_FILENAMES`; does NOT register itself in `/usr/share/vulkan/icd.d/`.
 - Three drivers coexist without conflict; the user picks at runtime which to use.
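Because the ICD is opt-in, it can help to confirm that the manifest resolves to an installed library before exporting `VK_ICD_FILENAMES`. This is a sketch under stated assumptions: `icd_library_path` is a hypothetical helper that expects `library_path` and its value on a single line, as in this package's `icd.json`. It is not a general JSON parser.

```sh
# Hypothetical helper: print the "library_path" value from an ICD manifest.
# Assumes the key and its value share one line, as in this package's icd.json.
icd_library_path() {
    sed -n 's/.*"library_path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

manifest=/usr/lib/panvk-bifrost-video/icd.json
if [ -r "$manifest" ] && [ -e "$(icd_library_path "$manifest")" ]; then
    # Manifest is readable and points at an existing file: opt in.
    export VK_ICD_FILENAMES="$manifest"
    echo "using $manifest"
else
    echo "video ICD not installed; leaving VK_ICD_FILENAMES unchanged" >&2
fi
```

Leaving the variable untouched on failure keeps the loader on whichever driver it would have picked anyway (stock Mesa or r4).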
@@ -0,0 +1,7 @@
 {
    "ICD": {
        "api_version": "1.4.335",
        "library_path": "/usr/lib/panvk-bifrost-video/libvulkan_panfrost.so"
    },
    "file_format_version": "1.0.1"
 }
@@ -0,0 +1,107 @@
 From 1b286ddb4efaca26ec9b9e290e989fec77dc1c77 Mon Sep 17 00:00:00 2001
 From: Markus Fritsche <mfritsche@reauktion.de>
 Date: Fri, 22 May 2026 10:18:21 +0200
 Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 8x8 IDCT through
 daedalus-fourier
 MIME-Version: 1.0
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 H264DSPContext.idct8_add (called per 8x8 block from the High-profile
 intra-8x8-DCT decode path in h264_mb.c) now dispatches through
 daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon.
 The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8x8)
 the recipe is CPU NEON, so this is effectively a NEON-to-NEON
 substitution layered on top of the cycle-6 IDCT 4x4 wiring.  Same
 pthread_once global context, same destructive-zero semantics; FFmpeg
 column-major 8x8 storage block[r + 8*c] matches daedalus's convention.
 Bulk path c->idct8_add4 (used for inter 8x8-DCT macroblocks) remains
 on the in-tree NEON .S code and will be batched through
 daedalus_recipe_dispatch_h264_idct8 with n_blocks>1 in a follow-up.
 Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
 green).
 Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 7.
 ---
 libavcodec/aarch64/h264_idct_daedalus.c   | 29 ++++++++++++++++-------
 libavcodec/aarch64/h264dsp_init_aarch64.c |  3 ++-
 2 files changed, 23 insertions(+), 9 deletions(-)
 diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
 index 538d223..cbb98af 100644
 --- a/libavcodec/aarch64/h264_idct_daedalus.c
 +++ b/libavcodec/aarch64/h264_idct_daedalus.c
@@ -1,14 +1,16 @@
 /*
 - * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
 + * H.264 4x4 / 8x8 IDCT + add — daedalus-fourier substitution shims.
  *
 - * Routes H264DSPContext.idct_add through
 - * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
 - * The recipe layer picks the substrate (CPU NEON by default for
 - * cycle 6; future cycles may dispatch to V3D opportunistically).
 + * Routes H264DSPContext.idct_add  → daedalus_recipe_dispatch_h264_idct4
 + *        H264DSPContext.idct8_add → daedalus_recipe_dispatch_h264_idct8
 + * instead of the in-tree ff_h264_idct{,8}_add_neon assembly.  The
 + * recipe layer picks the substrate (CPU NEON by default for cycles
 + * 6 + 7; future cycles may dispatch to V3D opportunistically).
  *
 - * FFmpeg's 4x4 block memory layout matches daedalus's column-major
 - * convention: block[r + 4*c] = coefficient at (row r, col c).  Both
 - * sides destructively zero the block after the transform.
 + * FFmpeg's 4x4 and 8x8 block memory layouts match daedalus's
 + * column-major convention: block[r + N*c] = coefficient at
 + * (row r, col c) for N ∈ {4, 8}.  Both sides destructively zero the
 + * block after the transform.
  *
  * The library context is process-global and lazily initialised under
  * pthread_once.  We pick the no-QPU constructor here because
@@ -37,6 +39,7 @@ static void daedalus_ctx_init_once(void)
 }
 void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
 +void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
 void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
 {
@@ -47,3 +50,13 @@ void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
     daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
                                         block, 1, &meta);
 }
 +
 +void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride)
 +{
 +    static const daedalus_h264_block_meta meta = { .dst_off = 0 };
 +
 +    pthread_once(&g_dctx_once, daedalus_ctx_init_once);
 +
 +    daedalus_recipe_dispatch_h264_idct8(g_dctx, dst, (size_t)stride,
 +                                        block, 1, &meta);
 +}
 diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
 index b993df2..741e551 100644
 --- a/libavcodec/aarch64/h264dsp_init_aarch64.c
 +++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -79,6 +79,7 @@ void ff_h264_idct_add8_neon(uint8_t **dest, const int *block_offset,
                             const uint8_t nnzc[15 * 8]);
 void ff_h264_idct8_add_neon(uint8_t *dst, int16_t *block, int stride);
 +void ff_h264_idct8_add_daedalus(uint8_t *dst, int16_t *block, int stride);
 void ff_h264_idct8_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
 void ff_h264_idct8_add4_neon(uint8_t *dst, const int *block_offset,
                              int16_t *block, int stride,
@@ -146,7 +147,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
         c->idct_add16intra = ff_h264_idct_add16intra_neon;
         if (chroma_format_idc <= 1)
             c->idct_add8   = ff_h264_idct_add8_neon;
 -        c->idct8_add       = ff_h264_idct8_add_neon;
 +        c->idct8_add       = ff_h264_idct8_add_daedalus;
         c->idct8_dc_add    = ff_h264_idct8_dc_add_neon;
         c->idct8_add4      = ff_h264_idct8_add4_neon;
     } else if (have_neon(cpu_flags) && bit_depth == 10) {
 -- 
 2.47.3
@@ -33,10 +33,11 @@ FFMPEG_VERSION=8.1
 # epoch 2 matches Debian's stock ffmpeg (currently 7:7.1.x in trixie);
 # +rfourier suffix to avoid colliding with upstream/Debian rebuilds.
 PKGVER=2:${FFMPEG_VERSION}+rfourier+gb57fbbe
-PKGREL=5  # pkgrel=5 — H.264 IDCT 4x4 daedalus-fourier substitution; skip past
-          # an orphan -4 .deb sitting in the apt pool that made
-          # check-already-published.sh's `pool_ver ge source_full` short-
-          # circuit the previous -3 build (PR #76).  (2026-05-21)
+PKGREL=7  # pkgrel=7 — H.264 IDCT 8x8 daedalus-fourier substitution
+          # (cycle 7).  Stacks on top of cycle-6 IDCT 4x4 (PR #76) and
+          # the libxml2-drop ABI-skew workaround (PR #78).  Wires
+          # H264DSPContext.idct8_add through
+          # daedalus_recipe_dispatch_h264_idct8.  (2026-05-22)
 # daedalus-fourier pin — first kernel substitution in libavcodec (cycle 6
 # H.264 IDCT 4x4).  Same SHA as the daedalus-v4l2 daemon already ships
@@ -66,6 +67,7 @@ fi
 patch -Np1 -i "$HERE/0001-libudev-bypass-fallback.patch"
 patch -Np1 -i "$HERE/0002-nv15-to-p010-unpack.patch"
 patch -Np1 -i "$HERE/0003-h264-idct4-daedalus-fourier.patch"
+patch -Np1 -i "$HERE/0004-h264-idct8-daedalus-fourier.patch"
 # --- daedalus-fourier: fetch + build static .a with PIC, install to a
 # per-build prefix; libavcodec.so links it into the shared object so
@@ -134,7 +136,6 @@ cd "$work/FFmpeg"
    --enable-libass \
    --enable-libfreetype \
    --enable-libfribidi \
-   --enable-libxml2 \
    --enable-libpulse \
    --enable-libdav1d \
    --enable-libopus \
@@ -190,7 +191,6 @@ Depends: libc6,
         libfontconfig1,
         libfreetype6,
         libfribidi0,
-        libxml2,
         libpulse0,
         libdav1d7 | libdav1d6,
         libopus0,
@@ -1,3 +1,42 @@
 ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-7) bookworm trixie; urgency=medium
  * Add 0004-h264-idct8-daedalus-fourier.patch — H264DSPContext.idct8_add
    (per-block 8x8 IDCT, called from the High-profile intra-8x8-DCT
    macroblock path in libavcodec/h264_mb.c) now dispatches through
    daedalus_recipe_dispatch_h264_idct8 instead of
    ff_h264_idct8_add_neon.  Cycle 7 of the daedalus-v4l2#11 step 2
    substitution arc — NEON-by-recipe, same pthread_once context the
    cycle-6 IDCT 4x4 shim already owns.
  * Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
    green; FFmpeg 8x8 block storage block[r + 8*c] matches daedalus
    column-major convention).
  * Bulk c->idct8_add4 (inter 8x8-DCT macroblocks) stays on the
    in-tree NEON .S code; batched substitution lands later.
  * No SONAME change, no Depends change.
 -- Markus Fritsche <mfritsche@reauktion.de>  Fri, 22 May 2026 10:30:00 +0000
 ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-6) bookworm trixie; urgency=medium
  * Drop --enable-libxml2 + libxml2 Depends — the Gitea
    debian-aarch64 runner ships libxml2 ≥ 2.14 (SONAME 16) while
    Debian trixie targets 2.12 (SONAME 2).  -5 built fine, then
    failed to load on higgs trixie:
       dlopen(libavformat.so.62): libxml2.so.16:
       cannot open shared object file
    Neither the daedalus-v4l2 daemon (direct AVPacket feed —
    libavformat used only for the in-tree v4l2request hwaccel
    glue) nor mpv-fourier (Lua + ytdlp + mpv's stream code do
    DASH/HLS) nor firefox-fourier (gecko-media DASH demux)
    consumes FFmpeg's libxml2-backed DASH demuxer, so dropping is
    feature-neutral.  Mirrors the libva trixie/runner ABI-skew
    workaround documented in PR #62.
  * CI workflow build-deps lose libxml2-dev for the same reason.
  * No source code change beyond configure flags + Depends.
    Substitution stays as PRs #76/#77 landed.
 -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 23:30:00 +0000
 ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-5) bookworm trixie; urgency=medium
  * pkgrel-only bump (3 → 5) to force a rebuild of the H.264 IDCT 4x4
Author	SHA1	Message	Date
marfrit	493c762967	ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 8×8 → daedalus-fourier Cycle 7 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11 step 2). H264DSPContext.idct8_add — called per 8×8 block from the High-profile intra-8×8-DCT decode path in libavcodec/h264_mb.c — now dispatches through daedalus_recipe_dispatch_h264_idct8 instead of ff_h264_idct8_add_neon. ## What - Add 0004-h264-idct8-daedalus-fourier.patch (in both arch/ and debian/ ffmpeg-v4l2-request-fourier/). Extends libavcodec/aarch64/ h264_idct_daedalus.c (introduced by 0003) with ff_h264_idct8_add_daedalus and a daedalus_recipe_dispatch_h264_idct8 call; patches libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct8_add to the new shim. - arch/PKGBUILD + debian/build-deb.sh: append the new patch to the apply list; bump pkgrel/PKGREL to 7. - No new build-deps, no Depends change, no daedalus-fourier rev — the d87239d pin already exposes daedalus_recipe_dispatch_h264_idct8. ## Why The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8×8) the recipe is CPU NEON, so this is effectively a NEON-to-NEON substitution layered on top of cycle 6. Production validation of cycle 6 on higgs Firefox YouTube: 3040 frames decoded cleanly, avg_decode_us=3388 (no regression vs the pre-substitution ~4 ms baseline). Cycle 7 inherits the same shim's pthread_once context. Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7 green; FFmpeg 8×8 block storage block[r + 8*c] matches daedalus column-major convention). ## Scope NOT covered (deferred) - Bulk c->idct8_add4 (inter 8×8-DCT macroblocks) stays on the in-tree NEON .S code; batched substitution with n_blocks>1 lands later alongside the cycle-6 bulk-paths work. - High-bit-depth (10-bit) path untouched. - Cycles 8/9 — separate PRs. ## SONAME Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60. 
## Refs - reauktion/daedalus-v4l2 issue #11 (substitution arc): reauktion/daedalus-v4l2#11 - marfrit-packages PR #76 (cycle 6 IDCT 4×4) - marfrit-packages PR #78 (libxml2 ABI-skew workaround) - marfrit/daedalus-fourier cycle 7 close (H.264 IDCT 8×8 NEON green)	2026-05-22 10:20:27 +02:00
marfrit	360e8eb6bf	Merge pull request 'mesa-panvk-bifrost-video: r1-r4 patches as real files (symlinks broke CI)' (#83) from claude-noether/marfrit-packages:noether/mesa-panvk-bifrost-video-retrigger into main Reviewed-on: marfrit/marfrit-packages#83	2026-05-22 07:55:59 +00:00
marfrit	4db64917bc	mesa-panvk-bifrost-video: r1-r4 patches as real files (symlinks broke CI) The original PR #79 used symlinks for 0001..0004 patches (pointing into ../mesa-panvk-bifrost/) to avoid drift between siblings. CI's "cp -r arch/mesa-panvk-bifrost-video /tmp/build-..." preserves the symlinks, but the destination /tmp/build-... has no sibling dir to resolve them against, so makepkg errors with: ==> ERROR: 0001-panvk-expose-robustness2-nullDescriptor-bifrost.patch was not found in the build directory and is not a URL. Each Arch PKGBUILD owns its source files per convention; the duplication risk is low because r1..r4 are closed-release patches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 09:49:59 +02:00
marfrit	6288536223	Merge pull request 'ci: fix duplicate `run:` key in build.yml wipe-secrets step (unblocks all builds since 2026-05-21)' (#82) from claude-noether/marfrit-packages:noether/fix-build-yaml-duplicate-run into main Reviewed-on: marfrit/marfrit-packages#82	2026-05-22 07:30:37 +00:00
claude-noether	09d8813507	ci: fix duplicate `run:` key in build.yml wipe-secrets step PR #79 (`6ee8f2748`, mesa-panvk-bifrost-video) added a second `run:` mapping key on the next line of the same step: - name: wipe secrets if: always() run: rm -f /root/repo_pass /root/.ssh/id_ed25519 run: rm -f /root/.ssh/id_ed25519_hertz ← duplicate `run:` key YAML doesn't allow two mappings with the same key in one node, so Gitea's workflow parser rejected the entire file: actions/workflows.go:124:DetectWorkflows() [W] ignore invalid workflow "build.yml": yaml: unmarshal errors: line 1423: mapping key "run" already defined at line 1422 Result: every push to main since `6ee8f2748` (2026-05-21 23:14 CEST) silently failed to enqueue ANY action run. PR #80's "re-trigger by README touch" had no chance — workflow file was invalid before #80 even existed. Runs #161-163 do not exist; #160 (pre-#79) is the last successful enqueue. Fix: merge the two single-line `run:` invocations into one literal block. Functionally identical, YAML-valid. Post-merge: workflow file becomes valid again, new push to main triggers a fresh build run covering the backlog (#79's mesa-panvk-bifrost-video build that #80 wanted re-triggered). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 09:15:18 +02:00
marfrit	8a3186b53c	Merge pull request 'mesa-panvk-bifrost-video: re-trigger Actions for PR #79' (#80) from claude-noether/marfrit-packages:noether/mesa-panvk-bifrost-video-retrigger into main Reviewed-on: marfrit/marfrit-packages#80	2026-05-22 06:32:42 +00:00
marfrit	b81e2251c2	mesa-panvk-bifrost-video: re-trigger Actions for PR #79 The merge commit for PR #79 (`e7cc22e42`) did not auto-fire the Gitea Actions workflow despite touching paths matched by the build.yml filter (arch/ + .gitea/workflows/). No run row exists between #160 (PR #78 merge) and now. This README touch is a no-op content change to force a fresh workflow_dispatch through the standard push trigger. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 07:51:01 +02:00
marfrit	e7cc22e42d	Merge pull request 'mesa-panvk-bifrost-video: sibling package adding VK_KHR_video_decode_h264' (#79) from claude-noether/marfrit-packages:noether/mesa-panvk-bifrost-video-phase8 into main Reviewed-on: marfrit/marfrit-packages#79	2026-05-21 21:33:53 +00:00
marfrit	62b6b0a700	Merge pull request 'ffmpeg-v4l2-request-fourier (debian): drop --enable-libxml2 (runner SONAME skew)' (#78) from claude-noether/marfrit-packages:noether/ffmpeg-fourier-drop-libxml2 into main Reviewed-on: marfrit/marfrit-packages#78	2026-05-21 21:24:07 +00:00
marfrit	a8f4a70887	ffmpeg-v4l2-request-fourier (debian): drop --enable-libxml2 (runner SONAME skew) The Gitea debian-aarch64 runner has been upgraded past Debian trixie and now ships libxml2 ≥ 2.14 (SONAME 16) while higgs (and any other trixie target) still has libxml2 2.12 (SONAME 2). -5 built cleanly, but on higgs the daedalus-v4l2 daemon's dlopen of libavformat.so.62 fails: dlopen(libavformat.so.62): libxml2.so.16: cannot open shared object file: No such file or directory Drop --enable-libxml2 from the Debian configure invocation; remove the libxml2 entry from Depends; remove libxml2-dev from the CI build-deps. FFmpeg's libxml2-backed DASH demuxer is unused on the Fourier fleet — daedalus-v4l2 daemon feeds AVPackets straight to avcodec_send_packet (no demux); mpv-fourier uses ytdlp + mpv's own stream code; firefox-fourier uses gecko-media's DASH demux. Bumps PKGREL 5 → 6. No source code or substitution-patch change. Mirrors the libva trixie/runner ABI-skew workaround pattern (marfrit-packages PR #62). Arch PKGBUILD unaffected — Arch runner + Arch consumers both rolling, libxml2 SONAMEs match. After this lands, re-deploy on higgs via: sudo apt update && sudo apt install -y ffmpeg-v4l2-request-fourier sudo systemctl restart daedalus-v4l2	2026-05-21 23:18:00 +02:00
marfrit	6ee8f2748e	mesa-panvk-bifrost-video: sibling package adding VK_KHR_video_decode_h264 panvk-bifrost-video campaign close. Phase 4 byte-exact validated 2026-05-21 on RK3566/PineTab2 (Mali-G52 r1 MC1 + hantro VPU): 48/48 unique BBB display frames decoded by this driver are byte-identical to ffmpeg+libva-v4l2-request-fourier on the same hantro hardware (frame 42 Y md5 = 54b9b396e6cd377256eb4bce0efc0bed both ways). Phase 5 second-model review passed; load-bearing findings applied. Co-installs at /usr/lib/panvk-bifrost-video/ parallel to the r4 sibling at /usr/lib/panvk-bifrost/; opt-in via VK_ICD_FILENAMES. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-21 23:14:01 +02:00
marfrit	711a921e66	Merge pull request 'ffmpeg-v4l2-request-fourier: PKGREL 3 → 5 (force rebuild past orphan -4 .deb)' (#77) from claude-noether/marfrit-packages:noether/ffmpeg-fourier-debian-pkgrel-5 into main Reviewed-on: marfrit/marfrit-packages#77	2026-05-21 20:36:15 +00:00