daedalus-v4l2: 77e14e5 -> 3bc0da1 — decode_us + periodic stats (#15)

Merge pull request 'ci: add libvulkan-dev + glslang-tools for daedalus-fourier build dep' (#73) from claude-noether/marfrit-packages:noether/ci-fourier-build-deps into main
Reviewed-on: marfrit/marfrit-packages#73
2026-05-21 20:29:07 +02:00 · 2026-05-21 18:05:59 +00:00 · 2026-05-21 19:58:19 +02:00 · 2026-05-21 17:15:12 +00:00 · 2026-05-21 18:39:22 +02:00 · 2026-05-21 14:54:18 +00:00
9 changed files with 823 additions and 30 deletions
@@ -1166,9 +1166,18 @@ jobs:
          # daemon never link-binds against libav (Option γ — dlopen
          # at runtime), so any header set with the right struct
          # definitions works.
          # libvulkan-dev + glslang-tools: needed by the in-build
          # daedalus-fourier fetch (build-deb.sh fetches the sibling
          # library, cmake-builds it into a temp prefix, then the
          # daedalus daemon static-links against it via pkg-config).
          # Without these, daedalus-fourier's find_package(Vulkan)
          # and glslangValidator find_program both fail at configure
          # time.  See marfrit/daedalus-fourier PR #1 +
          # reauktion/daedalus-v4l2 PR #13.
          retry apt-get install -y --no-install-recommends \
              build-essential cmake ninja-build pkg-config git \
              libavcodec-dev libavformat-dev libavutil-dev libdrm-dev \
              libvulkan-dev glslang-tools \
              linux-libc-dev \
              curl ca-certificates openssh-client rsync dpkg-dev
@@ -18,12 +18,15 @@ _module=daedalus_v4l2
 # Same pin as arch/daedalus-v4l2 — keep kernel module + daemon
 # bit-versioned together so the chardev wire protocol stays in sync.
-# PROTO_VERSION 0 → 1 at this pin (H.264 B-frame reorder fix); must
-# install both packages atomically.
-_commit=79256dc7ef41f83873ca9c23db20f5888858e65d
+# 5d8b436 reverts PRs #7 + #8 (parking design that broke libva's
+# 1:1 contract — see daedalus-v4l2#9 + #10).  Tree is
+# content-equivalent to f0d4186 plus PR #4 (cosmetic menu ctrls).
 # PROTO_VERSION drops 1 → 0; lock-step install with
 # daedalus-v4l2 0.1.0.r33.5d8b436 REQUIRED.
 _commit=5d8b4369e58ab947d1c56b1f718293c57c6065b5
-pkgver=0.1.0.r28.79256dc
+pkgver=0.1.0.r33.5d8b436
-pkgrel=1  # reset for new upstream pin (79256dc — H.264 B-frame reorder fix)
+pkgrel=1  # reset for new upstream pin (5d8b436 — revert parking design)
 pkgdesc="V4L2 stateless decoder shim kernel module (DKMS) — Pi 5 / CM5"
 arch=('any')
 url="https://git.reauktion.de/reauktion/daedalus-v4l2"
@@ -16,18 +16,18 @@
 pkgname=daedalus-v4l2
 _upstreampkg=daedalus-v4l2
-# Pin the daedalus-v4l2 tip.  79256dc = "kernel + daemon: H.264 B-frame
-# display reorder fix (closes #6)" — adds the wire-protocol src_pts /
-# output_src_pts / RESP_FRAME flags split that lets H.264 streams with
-# B-frames preserve display order through libva → kernel → daemon.
-# PROTO_VERSION bumps 0 → 1; lock-step userspace + kernel rebuild
-# REQUIRED (daedalus-v4l2-dkms PKGBUILD pinned to the same commit).
-_commit=79256dc7ef41f83873ca9c23db20f5888858e65d
+# 3bc0da1 = picks up daedalus-v4l2 PR #15 — per-frame `decode_us=N`
+# in the `decoder: OK` log line + a periodic `decoder stats` summary
+# every 60 frames.  Pure observability — baseline for the
+# substitution work in daedalus-v4l2#11 step 2.  Daemon still needs
+# daedalus-fourier at build time (Arch packaging for that is a
+# follow-up; Debian side fetches inline via build-deb.sh).
+_commit=3bc0da168cc0aa2271bfb6bc2864b49c48291185
 # 0.1.0 (pre-1.0) + commit count + short sha.  Bump the .Y on each
 # Phase 8.x close.  pkgver() recomputes at build time.
-pkgver=0.1.0.r28.79256dc
+pkgver=0.1.0.r39.3bc0da1
-pkgrel=1  # reset for new upstream pin (79256dc — H.264 B-frame reorder fix)
+pkgrel=1  # reset for new upstream pin (3bc0da1 — decode_us + stats)
 pkgdesc="Userspace daemon for the daedalus-v4l2 V4L2 stateless decoder shim (VP9/AV1/H.264 on Pi 5 / CM5)"
 arch=('aarch64')
 url="https://git.reauktion.de/reauktion/daedalus-v4l2"
@@ -0,0 +1,629 @@
 diff -urN a/src/panfrost/vulkan/meson.build b/src/panfrost/vulkan/meson.build
 --- a/src/panfrost/vulkan/meson.build	2026-05-21 14:04:02.529474145 +0200
 +++ b/src/panfrost/vulkan/meson.build	2026-05-21 14:04:04.106755486 +0200
@@ -123,6 +123,7 @@
   'panvk_vX_nir_lower_input_attachment_loads.c',
   'panvk_vX_sampler.c',
   'panvk_vX_shader.c',
 +  'panvk_vX_xfb_lower.c',
   sha1_h,
 ]
 diff -urN a/src/panfrost/vulkan/panvk_shader.h b/src/panfrost/vulkan/panvk_shader.h
 --- a/src/panfrost/vulkan/panvk_shader.h	2026-05-21 14:04:02.525251986 +0200
 +++ b/src/panfrost/vulkan/panvk_shader.h	2026-05-21 14:04:04.084251800 +0200
@@ -154,6 +154,8 @@
       /* aligned_u64 attribute below inserts the 4-byte alignment gap
        * after num_vertices automatically — no explicit pad needed. */
       aligned_u64 xfb_address[4];  /* iter13: 4 transform feedback buffer base addresses */
 +      uint32_t xfb_topology;       /* iter17: panvk_xfb_topology enum value */
 +      uint32_t xfb_output_count;   /* iter17: per-instance output verts after decomp */
 #endif
       int32_t first_vertex;
       int32_t base_instance;
@@ -569,4 +571,76 @@
    struct pan_compute_dim local_size, const void *bin_ptr, size_t bin_size,
    struct panvk_shader **shader_out);
 +
 +#if PAN_ARCH < 9
 +/* iter17: encoding for vs.xfb_topology sysval. Maps VkPrimitiveTopology values
 + * we need to distinguish at shader runtime for XFB capture. LIST topologies
 + * use the iter13 single-store fast path; non-LIST need per-vertex decomposition. */
 +enum panvk_xfb_topology {
 +   PANVK_XFB_TOPO_LIST            = 0,
 +   PANVK_XFB_TOPO_LINE_STRIP      = 1,
 +   PANVK_XFB_TOPO_TRI_STRIP       = 2,
 +   PANVK_XFB_TOPO_TRI_FAN         = 3,
 +   PANVK_XFB_TOPO_LINE_LIST_ADJ   = 4,
 +   PANVK_XFB_TOPO_LINE_STRIP_ADJ  = 5,
 +   PANVK_XFB_TOPO_TRI_LIST_ADJ    = 6,
 +   PANVK_XFB_TOPO_TRI_STRIP_ADJ   = 7,
 +};
 +
 +#include "panvk_macros.h"
 +struct nir_shader;
 +bool panvk_per_arch(nir_lower_xfb)(struct nir_shader *nir);
 +
 +/* Map VkPrimitiveTopology to panvk_xfb_topology enum (driver-side helper). */
 +static inline uint32_t
 +panvk_vk_topology_to_xfb_enum(VkPrimitiveTopology topo)
 +{
 +   switch (topo) {
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
 +      return PANVK_XFB_TOPO_LINE_STRIP;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
 +      return PANVK_XFB_TOPO_TRI_STRIP;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
 +      return PANVK_XFB_TOPO_TRI_FAN;
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
 +      return PANVK_XFB_TOPO_LINE_LIST_ADJ;
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
 +      return PANVK_XFB_TOPO_LINE_STRIP_ADJ;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
 +      return PANVK_XFB_TOPO_TRI_LIST_ADJ;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
 +      return PANVK_XFB_TOPO_TRI_STRIP_ADJ;
 +   case VK_PRIMITIVE_TOPOLOGY_POINT_LIST:
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST:
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST:
 +   default:
 +      return PANVK_XFB_TOPO_LIST;
 +   }
 +}
 +
 +/* Compute the per-instance output vertex count for a given (topology, input count). */
 +static inline uint32_t
 +panvk_xfb_output_count(VkPrimitiveTopology topo, uint32_t input_count)
 +{
 +   switch (topo) {
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
 +      return input_count >= 1 ? 2u * (input_count - 1u) : 0u;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
 +      return input_count >= 2 ? 3u * (input_count - 2u) : 0u;
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
 +      return (input_count / 4u) * 2u;
 +   case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
 +      return input_count >= 3 ? 2u * (input_count - 3u) : 0u;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
 +      return (input_count / 6u) * 3u;
 +   case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
 +      return input_count >= 6 ? 3u * (input_count / 2u - 2u) : 0u;
 +   default:
 +      return input_count;  /* LIST topologies: 1:1 mapping */
 +   }
 +}
 +#endif
 +
 +
 #endif
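
As a worked example of the count formulas in panvk_xfb_output_count above, the sketch below restates them without any Vulkan or Mesa headers so they can be checked in isolation. The enum `topo` and the names `xfb_output_count` and `xfb_output_count_selfcheck` are hypothetical stand-ins introduced here for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the VkPrimitiveTopology values the patch handles. */
enum topo {
   TOPO_LIST,
   TOPO_LINE_STRIP,
   TOPO_TRI_STRIP,
   TOPO_TRI_FAN,
   TOPO_LINE_LIST_ADJ,
   TOPO_LINE_STRIP_ADJ,
   TOPO_TRI_LIST_ADJ,
   TOPO_TRI_STRIP_ADJ,
};

/* Output vertices written per instance after decomposing n input vertices.
 * Same arithmetic as the helper in the patch, restated on the local enum. */
static uint32_t
xfb_output_count(enum topo t, uint32_t n)
{
   switch (t) {
   case TOPO_LINE_STRIP:     return n >= 1 ? 2u * (n - 1u) : 0u;
   case TOPO_TRI_STRIP:
   case TOPO_TRI_FAN:        return n >= 2 ? 3u * (n - 2u) : 0u;
   case TOPO_LINE_LIST_ADJ:  return (n / 4u) * 2u;
   case TOPO_LINE_STRIP_ADJ: return n >= 3 ? 2u * (n - 3u) : 0u;
   case TOPO_TRI_LIST_ADJ:   return (n / 6u) * 3u;
   case TOPO_TRI_STRIP_ADJ:  return n >= 6 ? 3u * (n / 2u - 2u) : 0u;
   default:                  return n; /* LIST: one output per input */
   }
}

/* Spot checks for an 8-vertex draw plus a few degenerate inputs. */
void
xfb_output_count_selfcheck(void)
{
   assert(xfb_output_count(TOPO_LIST, 8) == 8);
   assert(xfb_output_count(TOPO_LINE_STRIP, 8) == 14);     /* 7 lines */
   assert(xfb_output_count(TOPO_TRI_STRIP, 8) == 18);      /* 6 triangles */
   assert(xfb_output_count(TOPO_TRI_FAN, 8) == 18);        /* 6 triangles */
   assert(xfb_output_count(TOPO_LINE_LIST_ADJ, 8) == 4);   /* 2 lines */
   assert(xfb_output_count(TOPO_LINE_STRIP_ADJ, 8) == 10); /* 5 lines */
   assert(xfb_output_count(TOPO_TRI_LIST_ADJ, 8) == 3);    /* 1 triangle */
   assert(xfb_output_count(TOPO_TRI_STRIP_ADJ, 8) == 6);   /* 2 triangles */
   /* Too few vertices for a single primitive: nothing is captured. */
   assert(xfb_output_count(TOPO_LINE_STRIP, 1) == 0);
   assert(xfb_output_count(TOPO_TRI_STRIP, 2) == 0);
   assert(xfb_output_count(TOPO_TRI_STRIP_ADJ, 5) == 0);
}
```

Calling `xfb_output_count_selfcheck()` from a small test program aborts if any formula was restated incorrectly. The zero results for degenerate inputs are the values the shader-side `output_count > 0` guard relies on.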
 diff -urN a/src/panfrost/vulkan/panvk_vX_cmd_draw.c b/src/panfrost/vulkan/panvk_vX_cmd_draw.c
 --- a/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-05-21 14:04:02.528576354 +0200
 +++ b/src/panfrost/vulkan/panvk_vX_cmd_draw.c	2026-05-21 14:04:04.091357598 +0200
@@ -727,6 +727,20 @@
    /* iter13: VK_EXT_transform_feedback sysvals — always set (per draw),
     * reflect bound XFB state. set_gfx_sysval is a no-op if value unchanged. */
    set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, info->vertex.count);
 +
 +   /* iter17: XFB primitive-decomposition sysvals.
 +    * xfb_topology = enum value for the current bound topology.
 +    * xfb_output_count = per-instance output vertex count after decomposition.
 +    * For LIST topologies, output_count == input vertex count and the shader
 +    * takes the iter13 single-store fast path. */
 +   {
 +      VkPrimitiveTopology vk_topo =
 +         cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology;
 +      uint32_t topo_enum = panvk_vk_topology_to_xfb_enum(vk_topo);
 +      uint32_t out_count = panvk_xfb_output_count(vk_topo, info->vertex.count);
 +      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_topology, topo_enum);
 +      set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_output_count, out_count);
 +   }
    {
       const struct panvk_cmd_graphics_state *_gfx = &cmdbuf->state.gfx;
       /* iter13: default each XFB buffer address to PAN_SHADER_OOB_ADDRESS
 diff -urN a/src/panfrost/vulkan/panvk_vX_shader.c b/src/panfrost/vulkan/panvk_vX_shader.c
 --- a/src/panfrost/vulkan/panvk_vX_shader.c	2026-05-21 14:04:02.527576494 +0200
 +++ b/src/panfrost/vulkan/panvk_vX_shader.c	2026-05-21 14:04:04.098356619 +0200
@@ -895,7 +895,10 @@
        nir->info.has_transform_feedback_varyings) {
       NIR_PASS(_, nir, nir_opt_constant_folding);
       NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
 -      NIR_PASS(_, nir, pan_nir_lower_xfb);
 +      /* iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
 +       * primitive decomposition for non-LIST topologies. Single-store LIST
 +       * fast path matches iter13 behavior. */
 +      NIR_PASS(_, nir, panvk_per_arch(nir_lower_xfb));
    }
 #endif
 }
 diff -urN a/src/panfrost/vulkan/panvk_vX_xfb_lower.c b/src/panfrost/vulkan/panvk_vX_xfb_lower.c
 --- a/src/panfrost/vulkan/panvk_vX_xfb_lower.c	1970-01-01 01:00:00.000000000 +0100
 +++ b/src/panfrost/vulkan/panvk_vX_xfb_lower.c	2026-05-21 14:04:04.115354242 +0200
@@ -0,0 +1,486 @@
 +/*
 + * Copyright © 2026 mfritsche / claude-noether
 + * SPDX-License-Identifier: MIT
 + *
 + * iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
 + * primitive decomposition for transform_feedback on non-LIST topologies
 + * (TRIANGLE_STRIP/FAN, LINE_STRIP, *_WITH_ADJACENCY).
 + *
 + * Approach: emit a topology dispatch at the start of each store_output
 + * lowering. The shader reads vs.xfb_topology sysval at runtime and branches
 + * into per-topology emission logic. For each affected topology, the lowered
 + * code emits guarded conditional stores — one per primitive this vertex
 + * contributes to, computing the output buffer position via primitive index
 + * and slot within the decomposed primitive.
 + *
 + * For LIST topologies (POINT/LINE/TRIANGLE LIST), takes a fast path that
 + * matches iter13's single-store behavior.
 + *
 + * For TRIANGLE_FAN, the central vertex (v=0) contributes to ALL primitives
 + * as slot 2 — handled via a NIR loop bounded by num_vertices.
 + *
 + * See ~/src/panvk-bifrost/iter17/phase{0,1,2}_*.md for full design context.
 + */
 +
 +#include "panvk_macros.h"
 +
 +#if PAN_ARCH < 9
 +
 +#include "panvk_shader.h"
 +
 +#include "compiler/nir/nir_builder.h"
 +#include "pan_nir.h"
 +
 +#include <vulkan/vulkan_core.h>
 +
 +/* ----- Address arithmetic ----- */
 +
 +static nir_def *
 +xfb_store_addr(nir_builder *b, nir_def *buf, nir_def *out_idx,
 +               uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *byte_off = nir_iadd_imm(b,
 +      nir_imul_imm(b, out_idx, stride), offset_bytes);
 +   return nir_iadd(b, buf, nir_u2u64(b, byte_off));
 +}
 +
 +static void
 +emit_list_store(nir_builder *b, nir_def *buf, nir_def *output_count,
 +                nir_def *instance_id, nir_def *raw_vid, nir_def *value,
 +                uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *out_idx = nir_iadd(b,
 +      nir_imul(b, instance_id, output_count), raw_vid);
 +   nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
 +   nir_store_global(b, value, addr);
 +}
 +
 +static void
 +emit_prim_store(nir_builder *b, nir_def *buf, nir_def *output_count,
 +                nir_def *instance_id, nir_def *eligible,
 +                nir_def *prim_idx, nir_def *slot,
 +                uint32_t verts_per_prim,
 +                nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_push_if(b, eligible);
 +   {
 +      nir_def *out_idx = nir_iadd(b,
 +         nir_imul(b, instance_id, output_count),
 +         nir_iadd(b, nir_imul_imm(b, prim_idx, verts_per_prim), slot));
 +      nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
 +      nir_store_global(b, value, addr);
 +   }
 +   nir_pop_if(b, NULL);
 +}
 +
 +/* ----- Per-topology emission ----- */
 +
 +/* TRIANGLE_STRIP: vertex v contributes to prims v, v-1, v-2 (per eligibility). */
 +static void
 +emit_tri_strip(nir_builder *b, nir_def *v, nir_def *N,
 +               nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +               nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
 +   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
 +
 +   /* Prim v, slot 0: v < N-2 */
 +   emit_prim_store(b, buf, output_count, instance_id,
 +      nir_ult(b, v, Nm2),
 +      v, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
 +
 +   /* Prim v-1, slot = 1 if prim even else 2: 1 <= v < N-1 */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -1);
 +      nir_def *parity = nir_iand_imm(b, prim, 1u);
 +      nir_def *slot = nir_iadd_imm(b, parity, 1);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 1)),
 +         nir_ult(b, v, Nm1));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, slot, 3, value, stride, offset_bytes);
 +   }
 +
 +   /* Prim v-2, slot = 2 if prim even else 1: 2 <= v < N */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -2);
 +      nir_def *parity = nir_iand_imm(b, prim, 1u);
 +      nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 2)),
 +         nir_ult(b, v, N));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, slot, 3, value, stride, offset_bytes);
 +   }
 +}
 +
 +/* LINE_STRIP: vertex v contributes to prim v slot 0 + prim v-1 slot 1. */
 +static void
 +emit_line_strip(nir_builder *b, nir_def *v, nir_def *N,
 +                nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +                nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
 +
 +   /* Prim v, slot 0: v < N-1 */
 +   emit_prim_store(b, buf, output_count, instance_id,
 +      nir_ult(b, v, Nm1),
 +      v, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
 +
 +   /* Prim v-1, slot 1: 1 <= v < N */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -1);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 1)),
 +         nir_ult(b, v, N));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
 +   }
 +}
 +
 +/* TRIANGLE_FAN: prim p emits {p+1, p+2, 0}.
 + *   vertex v=0: contributes to ALL prims as slot 2 (loop required)
 + *   vertex v>=1: contributes to prim v-1 as slot 0 (if 1 <= v <= N-2)
 + *   vertex v>=2: contributes to prim v-2 as slot 1 (if 2 <= v <= N-1)
 + */
 +static void
 +emit_tri_fan(nir_builder *b, nir_def *v, nir_def *N,
 +             nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +             nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
 +   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
 +
 +   /* Prim v-1, slot 0: 1 <= v < N-1 */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -1);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 1)),
 +         nir_ult(b, v, Nm1));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
 +   }
 +
 +   /* Prim v-2, slot 1: 2 <= v < N */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -2);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 2)),
 +         nir_ult(b, v, N));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, nir_imm_int(b, 1), 3, value, stride, offset_bytes);
 +   }
 +
 +   /* Central vertex (v == 0): loop over all prims, write to slot 2. */
 +   nir_push_if(b, nir_ieq_imm(b, v, 0));
 +   {
 +      nir_variable *p_var = nir_local_variable_create(b->impl,
 +         glsl_uint_type(), "fan_p");
 +      nir_store_var(b, p_var, nir_imm_int(b, 0), 0x1);
 +      nir_push_loop(b);
 +      {
 +         nir_def *p = nir_load_var(b, p_var);
 +         nir_push_if(b, nir_uge(b, p, Nm2));
 +         {
 +            nir_jump(b, nir_jump_break);
 +         }
 +         nir_pop_if(b, NULL);
 +
 +         nir_def *out_idx = nir_iadd(b,
 +            nir_imul(b, instance_id, output_count),
 +            nir_iadd_imm(b, nir_imul_imm(b, p, 3), 2));
 +         nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
 +         nir_store_global(b, value, addr);
 +
 +         nir_store_var(b, p_var, nir_iadd_imm(b, p, 1), 0x1);
 +      }
 +      nir_pop_loop(b, NULL);
 +   }
 +   nir_pop_if(b, NULL);
 +}
 +
 +/* LINE_LIST_WITH_ADJACENCY: 4-vertex groups [4i..4i+3]; output {4i+1, 4i+2}.
 + *   v contributes if v%4 == 1: prim v/4 slot 0
 + *   v contributes if v%4 == 2: prim v/4 slot 1
 + */
 +static void
 +emit_line_list_adj(nir_builder *b, nir_def *v, nir_def *N,
 +                   nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +                   nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   (void)N; /* eligibility is mod-based, not range-based */
 +   nir_def *vmod4 = nir_iand_imm(b, v, 3u);
 +   nir_def *prim = nir_ushr_imm(b, v, 2);  /* v / 4 */
 +
 +   emit_prim_store(b, buf, output_count, instance_id,
 +      nir_ieq_imm(b, vmod4, 1),
 +      prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
 +
 +   emit_prim_store(b, buf, output_count, instance_id,
 +      nir_ieq_imm(b, vmod4, 2),
 +      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
 +}
 +
 +/* LINE_STRIP_WITH_ADJACENCY: prim p emits {p+1, p+2}.
 + *   v contributes to prim v-1 slot 0 (1 <= v <= N-2)
 + *   v contributes to prim v-2 slot 1 (2 <= v <= N-1)
 + */
 +static void
 +emit_line_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
 +                    nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +                    nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   nir_def *Nm1 = nir_iadd_imm(b, N, -1);
 +   nir_def *Nm2 = nir_iadd_imm(b, N, -2);
 +
 +   /* Prim v-1, slot 0: 1 <= v <= N-2 ⇔ v >= 1 AND v <= N-2 ⇔ v >= 1 AND v < N-1 */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -1);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 1)),
 +         nir_ult(b, v, Nm1));
 +      (void)Nm2;
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
 +   }
 +
 +   /* Prim v-2, slot 1: 2 <= v <= N-1 ⇔ v >= 2 AND v < N */
 +   {
 +      nir_def *prim = nir_iadd_imm(b, v, -2);
 +      nir_def *eligible = nir_iand(b,
 +         nir_uge(b, v, nir_imm_int(b, 2)),
 +         nir_ult(b, v, N));
 +      emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                      prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
 +   }
 +}
 +
 +/* TRIANGLE_LIST_WITH_ADJACENCY: 6-vertex groups; output {6i, 6i+2, 6i+4}.
 + *   v contributes if v%6 == 0: prim v/6 slot 0
 + *   v contributes if v%6 == 2: prim v/6 slot 1
 + *   v contributes if v%6 == 4: prim v/6 slot 2
 + */
 +static void
 +emit_tri_list_adj(nir_builder *b, nir_def *v, nir_def *N,
 +                  nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +                  nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   (void)N;
 +   nir_def *vmod6 = nir_umod_imm(b, v, 6);
 +   nir_def *prim = nir_udiv_imm(b, v, 6);
 +
 +   for (uint32_t slot = 0; slot < 3; slot++) {
 +      emit_prim_store(b, buf, output_count, instance_id,
 +         nir_ieq_imm(b, vmod6, slot * 2),
 +         prim, nir_imm_int(b, slot), 3, value, stride, offset_bytes);
 +   }
 +}
 +
 +/* TRIANGLE_STRIP_WITH_ADJACENCY: prim i emits:
 + *   even i: {2i, 2i+2, 2i+4}    (slots 0, 1, 2 ← input indices 2i, 2i+2, 2i+4)
 + *   odd  i: {2i, 2i+4, 2i+2}    (slots 0, 1, 2 ← input indices 2i, 2i+4, 2i+2)
 + *
 + * Only EVEN input vertices contribute (since all output indices are 2*something).
 + * For even input v:
 + *   prim v/2 slot 0 (always, if v/2 < N/2-2)
 + *   prim (v-2)/2 slot 1 if (v-2)/2 even, slot 2 if odd   (when v >= 2)
 + *   prim (v-4)/2 slot 2 if (v-4)/2 even, slot 1 if odd   (when v >= 4)
 + */
 +static void
 +emit_tri_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
 +                   nir_def *buf, nir_def *output_count, nir_def *instance_id,
 +                   nir_def *value, uint16_t stride, uint16_t offset_bytes)
 +{
 +   /* Bail for odd input vertices — they never contribute. */
 +   nir_def *v_is_even = nir_ieq_imm(b, nir_iand_imm(b, v, 1u), 0);
 +   nir_push_if(b, v_is_even);
 +   {
 +      nir_def *N_half = nir_ushr_imm(b, N, 1);
 +      nir_def *max_prim = nir_iadd_imm(b, N_half, -2);  /* N/2 - 2 */
 +      nir_def *v_half = nir_ushr_imm(b, v, 1);
 +
 +      /* Prim v/2 slot 0: v/2 < N/2 - 2 */
 +      emit_prim_store(b, buf, output_count, instance_id,
 +         nir_ult(b, v_half, max_prim),
 +         v_half, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
 +
 +      /* Prim (v-2)/2 = v/2 - 1: v >= 2 AND prim < N/2-2 */
 +      {
 +         nir_def *prim = nir_iadd_imm(b, v_half, -1);
 +         nir_def *parity = nir_iand_imm(b, prim, 1u);
 +         nir_def *slot = nir_iadd_imm(b, parity, 1);  /* even→1, odd→2 */
 +         nir_def *eligible = nir_iand(b,
 +            nir_uge(b, v, nir_imm_int(b, 2)),
 +            nir_ult(b, prim, max_prim));
 +         emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                         prim, slot, 3, value, stride, offset_bytes);
 +      }
 +
 +      /* Prim (v-4)/2 = v/2 - 2: v >= 4 AND prim < N/2-2 */
 +      {
 +         nir_def *prim = nir_iadd_imm(b, v_half, -2);
 +         nir_def *parity = nir_iand_imm(b, prim, 1u);
 +         nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity);  /* even→2, odd→1 */
 +         nir_def *eligible = nir_iand(b,
 +            nir_uge(b, v, nir_imm_int(b, 4)),
 +            nir_ult(b, prim, max_prim));
 +         emit_prim_store(b, buf, output_count, instance_id, eligible,
 +                         prim, slot, 3, value, stride, offset_bytes);
 +      }
 +   }
 +   nir_pop_if(b, NULL);
 +}
 +
 +/* ----- Main lowering: per store_output XFB channel ----- */
 +
 +static void
 +lower_xfb_output_iter17(nir_builder *b, nir_intrinsic_instr *intr,
 +                        unsigned channel_idx, unsigned num_components,
 +                        unsigned buffer, unsigned offset_words)
 +{
 +   assert(buffer < MAX_XFB_BUFFERS);
 +   assert(nir_intrinsic_component(intr) == 0);
 +
 +   uint16_t stride = b->shader->info.xfb_stride[buffer] * 4;
 +   assert(stride != 0);
 +   uint16_t offset_bytes = offset_words * 4;
 +
 +   BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_VERTEX_ID_ZERO_BASE);
 +   BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_INSTANCE_ID);
 +
 +   nir_def *topology = load_sysval(b, graphics, 32, vs.xfb_topology);
 +   nir_def *out_count = load_sysval(b, graphics, 32, vs.xfb_output_count);
 +   nir_def *N = nir_load_num_vertices(b);
 +   nir_def *v = nir_load_raw_vertex_id_pan(b);
 +   nir_def *instance = nir_load_instance_id(b);
 +   nir_def *buf = nir_load_xfb_address(b, 64, .base = buffer);
 +
 +   nir_def *src = intr->src[0].ssa;
 +   nir_component_mask_t mask = nir_component_mask(num_components);
 +   nir_def *value = nir_channels(b, src, mask << channel_idx);
 +
 +   /* Topology dispatch ladder. LIST first (fast path). */
 +   nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LIST));
 +   {
 +      emit_list_store(b, buf, out_count, instance, v, value,
 +                      stride, offset_bytes);
 +   }
 +   nir_push_else(b, NULL);
 +   {
 +      /* iter17 Janet Finding 3: gate all non-LIST emission on
 +       * output_count > 0. For degenerate input counts (N < min required
 +       * for the topology), output_count is 0 and we must emit NO stores
 +       * — otherwise N-2 / N-3 / etc. arithmetic underflows in the
 +       * eligibility predicates and we falsely fire stores. */
 +      nir_push_if(b, nir_ult(b, nir_imm_int(b, 0), out_count));
 +      {
 +      nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_STRIP));
 +      {
 +         emit_tri_strip(b, v, N, buf, out_count, instance, value,
 +                        stride, offset_bytes);
 +      }
 +      nir_push_else(b, NULL);
 +      {
 +         nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP));
 +         {
 +            emit_line_strip(b, v, N, buf, out_count, instance, value,
 +                            stride, offset_bytes);
 +         }
 +         nir_push_else(b, NULL);
 +         {
 +            nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_FAN));
 +            {
 +               emit_tri_fan(b, v, N, buf, out_count, instance, value,
 +                            stride, offset_bytes);
 +            }
 +            nir_push_else(b, NULL);
 +            {
 +               nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_LIST_ADJ));
 +               {
 +                  emit_line_list_adj(b, v, N, buf, out_count, instance, value,
 +                                     stride, offset_bytes);
 +               }
 +               nir_push_else(b, NULL);
 +               {
 +                  nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP_ADJ));
 +                  {
 +                     emit_line_strip_adj(b, v, N, buf, out_count, instance, value,
 +                                         stride, offset_bytes);
 +                  }
 +                  nir_push_else(b, NULL);
 +                  {
 +                     nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_LIST_ADJ));
 +                     {
 +                        emit_tri_list_adj(b, v, N, buf, out_count, instance, value,
 +                                          stride, offset_bytes);
 +                     }
 +                     nir_push_else(b, NULL);
 +                     {
 +                        /* TRI_STRIP_ADJ — last case */
 +                        emit_tri_strip_adj(b, v, N, buf, out_count, instance, value,
 +                                           stride, offset_bytes);
 +                     }
 +                     nir_pop_if(b, NULL);
 +                  }
 +                  nir_pop_if(b, NULL);
 +               }
 +               nir_pop_if(b, NULL);
 +            }
 +            nir_pop_if(b, NULL);
 +         }
 +         nir_pop_if(b, NULL);
 +      }
 +      nir_pop_if(b, NULL);
 +      }
 +      nir_pop_if(b, NULL);  /* Janet Finding 3: close output_count > 0 guard */
 +   }
 +   nir_pop_if(b, NULL);
 +}
 +
 +/* Mirror of pan_nir_lower_xfb's lower_xfb: load_vertex_id rewrite +
 + * dispatch store_output through our topology-aware emission. */
 +static bool
 +lower_xfb_iter17(nir_builder *b, nir_intrinsic_instr *intr,
 +                 UNUSED void *data)
 +{
 +   if (intr->intrinsic == nir_intrinsic_load_vertex_id) {
 +      b->cursor = nir_instr_remove(&intr->instr);
 +      nir_def *repl = nir_iadd(b, nir_load_raw_vertex_id_pan(b),
 +                               nir_load_raw_vertex_offset_pan(b));
 +      nir_def_rewrite_uses(&intr->def, repl);
 +      return true;
 +   }
 +
 +   if (intr->intrinsic != nir_intrinsic_store_output)
 +      return false;
 +
 +   bool progress = false;
 +   b->cursor = nir_before_instr(&intr->instr);
 +
 +   /* io_xfb has only out[0,1]; the other 2 channels are in io_xfb2.
 +    * Outer loop selects which annotation; inner picks which channel. */
 +   for (unsigned i = 0; i < 2; ++i) {
 +      nir_io_xfb xfb = i ? nir_intrinsic_io_xfb2(intr)
 +                         : nir_intrinsic_io_xfb(intr);
 +      for (unsigned j = 0; j < 2; ++j) {
 +         if (!xfb.out[j].num_components)
 +            continue;
 +         lower_xfb_output_iter17(b, intr, i * 2 + j, xfb.out[j].num_components,
 +                                 xfb.out[j].buffer, xfb.out[j].offset);
 +         progress = true;
 +      }
 +   }
 +
 +   if (progress)
 +      nir_instr_remove(&intr->instr);
 +   return progress;
 +}
 +
 +bool
 +panvk_per_arch(nir_lower_xfb)(nir_shader *nir)
 +{
 +   return nir_shader_intrinsics_pass(
 +      nir, lower_xfb_iter17, nir_metadata_control_flow, NULL);
 +}
 +
 +#endif /* PAN_ARCH < 9 */
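
The per-vertex "scatter" used by emit_tri_strip can be checked on the CPU against the usual per-primitive "gather" decomposition. The following is a minimal sketch under stated assumptions: it models a single instance, stores vertex indices instead of varying data, and all names (`strip_gather`, `strip_scatter`, `strip_scatter_matches_gather`, `MAX_VERTS`) are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_VERTS 64u

/* Reference: walk the primitives and gather their vertices. Odd strip
 * triangles swap their last two vertices to keep the winding consistent. */
static uint32_t
strip_gather(uint32_t n, uint32_t *out)
{
   uint32_t prims = n >= 3 ? n - 2 : 0;
   for (uint32_t p = 0; p < prims; p++) {
      out[3 * p + 0] = p;
      out[3 * p + 1] = p + 1 + (p & 1);
      out[3 * p + 2] = p + 2 - (p & 1);
   }
   return 3 * prims;
}

/* Scatter: each vertex v writes itself into every primitive it belongs to,
 * mirroring the three guarded stores of emit_tri_strip. */
static uint32_t
strip_scatter(uint32_t n, uint32_t *out)
{
   if (n < 3)
      return 0; /* plays the role of the output_count > 0 guard */

   for (uint32_t v = 0; v < n; v++) {
      if (v < n - 2)                 /* prim v, slot 0 */
         out[3 * v + 0] = v;
      if (v >= 1 && v < n - 1) {     /* prim v-1, slot 1 if even else 2 */
         uint32_t p = v - 1;
         out[3 * p + 1 + (p & 1)] = v;
      }
      if (v >= 2) {                  /* prim v-2, slot 2 if even else 1 */
         uint32_t p = v - 2;
         out[3 * p + 2 - (p & 1)] = v;
      }
   }
   return 3 * (n - 2);
}

/* Returns 1 when both decompositions produce identical output buffers. */
int
strip_scatter_matches_gather(uint32_t n)
{
   uint32_t a[3 * MAX_VERTS], b[3 * MAX_VERTS];

   assert(n <= MAX_VERTS);
   memset(a, 0xff, sizeof(a));
   memset(b, 0xff, sizeof(b));

   uint32_t na = strip_gather(n, a);
   uint32_t nb = strip_scatter(n, b);
   return na == nb && memcmp(a, b, sizeof(a)) == 0;
}
```

A caller can loop `strip_scatter_matches_gather(n)` over a range of vertex counts, including 0 through 2, to confirm that degenerate draws write nothing. The scatter form matters on the GPU because each vertex-shader invocation only knows its own vertex index, so it has to compute every output slot it owns rather than iterate over primitives.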
@@ -30,7 +30,7 @@
 pkgname=mesa-panvk-bifrost
 _mesaver=26.0.6
-pkgver=26.0.6.r3
+pkgver=26.0.6.r4
 pkgrel=1
 pkgdesc="Patched Mesa libvulkan_panfrost.so exposing Bifrost-gen Mali to Vulkan apps (panvk-bifrost campaign)"
 arch=('aarch64')
@@ -80,6 +80,7 @@ source=(
    "0001-panvk-expose-robustness2-nullDescriptor-bifrost.patch"
    "0002-panvk-expose-vulkan-1.1-1.2-on-bifrost.patch"
    "0003-panvk-bifrost-vk-ext-transform-feedback.patch"
    "0004-panvk-bifrost-xfb-primitive-decomposition.patch"
    "brave-vulkan"
    "icd.json"
 )
@@ -90,6 +91,7 @@ sha256sums=(
    'SKIP'
    'SKIP'
    'SKIP'
    'SKIP'
 )
 prepare() {
@@ -116,6 +118,15 @@ prepare() {
    # reports "Hardware accelerated" across the board for the affected paths).
    patch -p1 < "${srcdir}/0003-panvk-bifrost-vk-ext-transform-feedback.patch"
    # iter17: XFB primitive decomposition for non-LIST topologies (TRI_STRIP,
    # TRI_FAN, LINE_STRIP, *_WITH_ADJACENCY). Replacement panvk-specific
    # NIR pass (panvk_per_arch(nir_lower_xfb)) substituted for upstream
    # pan_nir_lower_xfb. Closes the 162 dEQP-VK winding_* failures from
    # iter15 (958 P / 81 F / 0 Crash on full XFB CTS — remaining 81 fails
    # are by-design resume_* tests, transformFeedbackDraw=false).
    # Phase-doc context: ~/src/panvk-bifrost/iter17/phase{0,1,2,4,5,6,8}_*.md.
    patch -p1 < "${srcdir}/0004-panvk-bifrost-xfb-primitive-decomposition.patch"
    # Sanity-check the patches landed.
    grep -q "KHR_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
    grep -q "EXT_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
@@ -124,8 +135,12 @@ prepare() {
    grep -q "has_vk1_2 = true;" src/panfrost/vulkan/panvk_vX_physical_device.c
    # iter13 sanity:
    grep -q "EXT_transform_feedback = PAN_ARCH < 9," src/panfrost/vulkan/panvk_vX_physical_device.c
-    grep -q "pan_nir_lower_xfb" src/panfrost/vulkan/panvk_vX_shader.c
     test -f src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c
+    # iter17 sanity: pan_nir_lower_xfb call site has been replaced; new file present.
+    grep -q "panvk_per_arch(nir_lower_xfb)" src/panfrost/vulkan/panvk_vX_shader.c
+    grep -q "xfb_topology" src/panfrost/vulkan/panvk_shader.h
+    grep -q "panvk_xfb_topology" src/panfrost/vulkan/panvk_shader.h
+    test -f src/panfrost/vulkan/panvk_vX_xfb_lower.c
 }
 build() {
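The `grep -q` / `test -f` lines in `prepare()` stop the build when a patch did not land, but they do not say which check failed. A minimal sketch of the same idea with a diagnostic message is shown here. The helper name `require_pattern` is hypothetical and not part of the PKGBUILD, and it uses `grep -F` (fixed-string matching), which is an assumption of this sketch rather than what the PKGBUILD does.

```shell
# Hypothetical helper (not in the PKGBUILD): fail with a message when a
# patched file does not contain the expected fixed string.
require_pattern() {
    pattern=$1
    file=$2
    # -F treats the pattern as a literal string, so "(" and ")" need no escaping.
    if ! grep -qF -- "$pattern" "$file"; then
        echo "sanity check failed: '$pattern' not found in $file" >&2
        return 1
    fi
}
```

Inside `prepare()` this could be called as `require_pattern "panvk_per_arch(nir_lower_xfb)" src/panfrost/vulkan/panvk_vX_shader.c`; a non-zero return aborts the build just as the bare `grep -q` does.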
@@ -14,9 +14,9 @@
 # Sibling userspace package: ../daedalus-v4l2/build-deb.sh
 set -euo pipefail
-UPSTREAM_COMMIT=79256dc7ef41f83873ca9c23db20f5888858e65d
+UPSTREAM_COMMIT=5d8b4369e58ab947d1c56b1f718293c57c6065b5
-PKGVER=0.1.0+r28+g79256dc
+PKGVER=0.1.0+r33+g5d8b436
-PKGREL=1  # reset for new upstream pin (79256dc — H.264 B-frame reorder fix); still carries the #64 multi-kernel postinst fix
+PKGREL=1  # reset for new upstream pin (5d8b436 — revert parking design); still carries the #64 multi-kernel postinst fix
 MODULE_NAME=daedalus_v4l2
 HERE=$(dirname "$(readlink -f "$0")")
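The `PKGVER` strings in this hunk appear to follow a `<base>+r<N>+g<short hash>` pattern, where the short hash is the first seven hex digits of `UPSTREAM_COMMIT`. The sketch below reproduces that pattern. The function name `make_pkgver` is hypothetical, and the meaning of `r<N>` (presumably an upstream revision count) is an assumption, since the diff does not show how it is computed.

```shell
# Assumed convention, inferred from the pins shown in this diff:
#   <base>+r<revision count>+g<first 7 hex digits of the commit>
make_pkgver() {
    base=$1
    revcount=$2
    commit=$3
    short=$(printf '%s' "$commit" | cut -c1-7)
    printf '%s+r%s+g%s\n' "$base" "$revcount" "$short"
}

make_pkgver 0.1.0 33 5d8b4369e58ab947d1c56b1f718293c57c6065b5   # prints 0.1.0+r33+g5d8b436
```

Deriving the suffix from the full hash rather than typing it by hand keeps `PKGVER` and `UPSTREAM_COMMIT` from drifting apart on the next bump.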
@@ -1,3 +1,39 @@
 daedalus-v4l2-dkms (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium
  * Bump to 5d8b436 — reverts daedalus-v4l2 PRs #7 + #8.  Kernel
    module returns to the pre-#7 buf_done_and_job_finish completion
    model: no src/dst lifecycle decoupling, no parked dst_bufs, no
    1:1-contract violation against libva-v4l2-request-fourier
    (closes daedalus-v4l2#9 + #10 as won't-fix at this layer; proper
    fix tracked at daedalus-v4l2#11).
  * Wire-protocol drops 1 → 0; lock-step install with daedalus-v4l2
    0.1.0+r33+g5d8b436 REQUIRED.
  * Carries forward the #64 multi-kernel postinst fix.
 -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 14:50:00 +0000
 daedalus-v4l2-dkms (0.1.0+r30+g6ffe92b-1) bookworm trixie; urgency=medium
  * Bump to 6ffe92b — fixes the kernel panic regression introduced
    by 79256dc's split-completion design (closes daedalus-v4l2#8).
    `device_run` now removes both src + dst from `m2m_ctx`'s
    rdy_queue at pickup time, not at `buf_done` time.  Without
    this, after `SRC_CONSUMED`'s `job_finish` released the m2m
    scheduler, the NEXT `device_run` saw the still-queued parked
    dst_buf and paired it with a fresh src — two inflight entries
    referencing the same vb2_buffer, the later `HAS_PIXELS`
    triggered list_del on an already-detached list_head, smashing
    the rdy_queue → hard reboot on Pi CM5 during `mpv vaapi-copy`
    playback of 720p H.264 (2026-05-21).
  * Wire protocol unchanged — DAEDALUS_PROTO_VERSION stays at 1.
    Daemon (userspace daedalus-v4l2 package) need NOT bump in
    lockstep with this DKMS update; the existing
    daedalus-v4l2 0.1.0+r28+g79256dc is wire-compatible with
    daedalus-v4l2-dkms 0.1.0+r30+g6ffe92b.
  * Carries forward the #64 multi-kernel postinst fix.
 -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 14:00:00 +0000
 daedalus-v4l2-dkms (0.1.0+r28+g79256dc-1) bookworm trixie; urgency=medium
  * Bump to 79256dc — H.264 B-frame display reorder fix (closes
@@ -11,16 +11,20 @@
 # Upstream repo: https://git.reauktion.de/reauktion/daedalus-v4l2
 set -euo pipefail
-# Same pin as the Arch PKGBUILD.  79256dc = "kernel + daemon: H.264
-# B-frame display reorder fix (closes #6)" — adds the wire-protocol
-# src_pts / output_src_pts / RESP_FRAME flags split that lets H.264
-# streams with B-frames preserve display order through libva → kernel
-# → daemon.  PROTO_VERSION bumps 0 → 1; lock-step userspace + kernel
-# rebuild REQUIRED (daedalus-v4l2-dkms build-deb.sh pinned to the same
-# commit).
-UPSTREAM_COMMIT=79256dc7ef41f83873ca9c23db20f5888858e65d
-PKGVER=0.1.0+r28+g79256dc
-PKGREL=1  # reset for new upstream pin (79256dc — H.264 B-frame reorder fix)
+# 3bc0da1 = picks up daedalus-v4l2 PR #15 — per-frame `decode_us=N`
+# in the `decoder: OK` log line + a periodic `decoder stats` summary
+# every 60 frames (codec, fps, avg decode_us, MBs/s, B/MB).  Pure
+# observability — no behaviour change.  Baseline metrics for the
+# substitution work in daedalus-v4l2#11 step 2.
+UPSTREAM_COMMIT=3bc0da168cc0aa2271bfb6bc2864b49c48291185
+PKGVER=0.1.0+r39+g3bc0da1
+PKGREL=1  # reset for new upstream pin (3bc0da1 — decode_us + stats)
+
+# daedalus-fourier pin.  d87239d = marfrit/daedalus-fourier PR #1 merge
+# (install rules + pkg-config, enables this consumer to find_package
+# + link).  Bump in lockstep with the upstream daemon when daedalus-
+# fourier's API or installed shaders are changed by a new consumer.
+DAEDALUS_FOURIER_COMMIT=d87239d8172307d9a1b93c95cbed116d175b85cc
 HERE=$(dirname "$(readlink -f "$0")")
@@ -30,14 +34,37 @@ export SOURCE_DATE_EPOCH=1779231600
 work=$(mktemp -d)
 trap "rm -rf $work" EXIT
 # --- daedalus-fourier: fetch + build + install to per-build prefix ---
 #
 # Static-linked into the daemon, so the temp prefix is only for the
 # duration of this build script.  Requires libvulkan-dev + glslang-tools
 # on the runner (already needed for the daedalus-fourier benches).
 FOURIER_PREFIX=$work/fourier-prefix
 mkdir -p "$FOURIER_PREFIX"
 cd "$work"
 curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-fourier.tar.gz \
    "https://git.reauktion.de/marfrit/daedalus-fourier/archive/${DAEDALUS_FOURIER_COMMIT}.tar.gz"
 tar xzf daedalus-fourier.tar.gz
 cd daedalus-fourier
 cmake -B build -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX="$FOURIER_PREFIX"
 cmake --build build --target daedalus_core
 cmake --install build
 # --- daedalus-v4l2: fetch + build daemon against installed daedalus-fourier ---
 cd "$work"
 curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-v4l2.tar.gz \
    "https://git.reauktion.de/reauktion/daedalus-v4l2/archive/${UPSTREAM_COMMIT}.tar.gz"
 tar xzf daedalus-v4l2.tar.gz
 SRCDIR=daedalus-v4l2
-# Build daemon (CMake)
+# Build daemon (CMake) — point pkg-config at the daedalus-fourier
+# temp prefix so pkg_check_modules(DAEDALUS_FOURIER …) resolves to it.
 cd "$SRCDIR/daemon"
+PKG_CONFIG_PATH="$FOURIER_PREFIX/lib/pkgconfig" \
 cmake -B build -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr
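The script above does all its work in a `mktemp -d` scratch directory and removes it with an `EXIT` trap. A minimal sketch of that pattern as a reusable function follows. The name `run_in_tempdir` is hypothetical, and the single-quoted trap body (so `$work` is expanded and quoted when the trap fires, not when it is defined) is a choice made for this sketch, not what build-deb.sh does.

```shell
# Hypothetical helper: run a command inside a fresh scratch directory and
# remove the directory afterwards.  The function body is a subshell, so the
# EXIT trap and the `cd` stay local to this call.
run_in_tempdir() (
    work=$(mktemp -d)
    # Single quotes defer expansion until the trap fires, and the inner
    # double quotes keep a path containing spaces intact.
    trap 'rm -rf -- "$work"' EXIT
    cd "$work"
    "$@"
)
```

For example, `run_in_tempdir pwd` prints the scratch path, and the directory is gone once the call returns.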
@@ -1,3 +1,77 @@
 daedalus-v4l2 (0.1.0+r39+g3bc0da1-1) bookworm trixie; urgency=medium
  * Bump to 3bc0da1 — picks up daedalus-v4l2 PR #15.  Per-frame
    `decoder: OK ...` log line gains `decode_us=N` (libavcodec
    send_packet + receive_frame wall-clock cost in microseconds).
    New `decoder stats` summary line every 60 decoded frames with
    codec, fps, avg decode_us, MBs/s throughput, B/MB bitrate.
  * Pure observability — no decode-path behaviour change.
    Establishes baseline metrics for the substitution work in
    daedalus-v4l2#11 step 2 (replacing libavcodec primitives with
    daedalus-fourier kernels one cycle at a time).
  * On Pi CM5 / bbb 720p H.264 baseline: ~4 ms decode_us / 24 fps
    / 90 K MBs/s — workload is well under 1 % of any single
    daedalus-fourier kernel's NEON ceiling.
  * Wire protocol unchanged.  No daedalus-v4l2-dkms bump needed.
 -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 18:30:00 +0000
 daedalus-v4l2 (0.1.0+r37+g77e14e5-1) bookworm trixie; urgency=medium
  * Bump to 77e14e5 — picks up daedalus-v4l2 PRs #12 + #13.
  * #12 (LOW_DELAY half-measure): the daemon now sets
    AV_CODEC_FLAG_LOW_DELAY on the H.264 AVCodecContext so libavcodec
    emits frames in decode order ~99% of the time (a few stragglers
    at GOP boundaries when the stream's SPS num_reorder_frames
    overrides the flag).  Visible improvement vs the 2-1-4-3
    pair-swap on Firefox YouTube + mpv playback; not a permanent
    fix (see #11 for the architectural plan).
  * #13 (daedalus-fourier linkage): the daemon now pkg-config-links
    against the daedalus-fourier kernel library (marfrit/
    daedalus-fourier) and logs substrate availability at startup.
    No kernels dispatched yet — this is the build-time / link-time
    foundation for the H.264 daemon-rewrite plan in #11
    (substituting daedalus-fourier IDCT 4×4 / IDCT 8×8 / luma
    deblock primitives for libavcodec's per-MB pixel math, one
    cycle at a time, measuring CPU saved per substitution).
  * Build-deb.sh now fetches + builds + installs daedalus-fourier
    (pinned at d87239d, marfrit/daedalus-fourier PR #1) into a
    per-build temp prefix, then builds the daemon with
    PKG_CONFIG_PATH pointing at it.  daedalus-fourier is
    statically linked into the daemon binary, so the resulting
    .deb has no new runtime deps.  Requires libvulkan-dev +
    glslang-tools on the CI runner (the daedalus-fourier benches
    already needed those).
  * Wire protocol unchanged — DAEDALUS_PROTO_VERSION stays at 0.
    No daedalus-v4l2-dkms bump needed.
 -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 16:30:00 +0000
 daedalus-v4l2 (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium
  * Bump to 5d8b436 — reverts daedalus-v4l2 PRs #7 + #8 (the parking
    design that broke libva-v4l2-request-fourier's 1:1 CAPTURE
    contract; see daedalus-v4l2#9 + #10).  After daemon-r28+g79256dc
    landed, mpv (--hwdec=vaapi-copy) failed pre-playing with
    "Unable to dequeue buffer: Resource temporarily unavailable" /
    "Failed to end picture decode" because the daemon parked CAPTURE
    buffers waiting for libavcodec to release H.264 B-frames in
    display order — violating the V4L2 stateless 1:1 contract.
    Firefox tolerated the mess (visible "2 1 4 3" pair-swap); mpv
    bailed.
  * This bump restores f0d4186-equivalent behaviour, plus PR #4
    (cosmetic H.264 DECODE_MODE / START_CODE menu controls).  PR #7
    + PR #8 wire-protocol additions (src_pts / output_src_pts /
    RESP_FRAME flags) are reverted — DAEDALUS_PROTO_VERSION drops
    back from 1 → 0.  Lock-step install with daedalus-v4l2-dkms
    0.1.0+r33+g5d8b436 REQUIRED.
  * Visible regression: H.264 B-frame streams in Firefox revert to
    the original "2 1 4 3 6 5" pair-swap visual.  The proper fix
    (concurrent in-flight requests in daemon + display-order reorder
    in libva-v4l2-request-fourier) is tracked at daedalus-v4l2#11.
 -- Markus Fritsche <mfritsche@reauktion.de>  Thu, 21 May 2026 14:50:00 +0000
 daedalus-v4l2 (0.1.0+r28+g79256dc-1) bookworm trixie; urgency=medium
  * Bump to 79256dc — H.264 B-frame display reorder fix (closes
Author	SHA1	Message	Date
claude-noether	9146e83710	daedalus-v4l2: 77e14e5 -> 3bc0da1 — decode_us + periodic stats (#15)	2026-05-21 20:29:07 +02:00
marfrit	abf8fb3077	Merge pull request 'ci: add libvulkan-dev + glslang-tools for daedalus-fourier build dep' (#73) from claude-noether/marfrit-packages:noether/ci-fourier-build-deps into main Reviewed-on: marfrit/marfrit-packages#73	2026-05-21 18:05:59 +00:00
claude-noether	1414dfeac2	.gitea/workflows: add libvulkan-dev + glslang-tools to daedalus-v4l2 Debian build deps The daedalus-v4l2 build-deb.sh (post marfrit-packages#72) now fetches + cmake-builds daedalus-fourier into a per-build temp prefix before building the daemon, so the static-archive can be linked in. daedalus-fourier's CMakeLists requires Vulkan headers and glslangValidator (for SPIR-V compilation of the .comp compute shaders). Without them the configure step on the debian-aarch64 runner fails with: CMake Error at FindPackageHandleStandardArgs.cmake:233 (message): Could NOT find Vulkan (missing: Vulkan_LIBRARY Vulkan_INCLUDE_DIR) (Observed on Gitea Actions run 1056.) Add `libvulkan-dev` and `glslang-tools` to the apt-get install line so the in-build daedalus-fourier compile succeeds and the daemon can link.	2026-05-21 19:58:19 +02:00
marfrit	41c1e0b6b9	Merge pull request 'daedalus-v4l2: 5d8b436 -> 77e14e5 — #12 (LOW_DELAY) + #13 (daedalus-fourier linkage)' (#72) from claude-noether/marfrit-packages:noether/daedalus-bump-77e14e5-with-fourier into main Reviewed-on: marfrit/marfrit-packages#72	2026-05-21 17:15:12 +00:00
claude-noether	c9a4b82f2c	daedalus-v4l2: 5d8b436 -> 77e14e5 — picks up #12 (LOW_DELAY) + #13 (daedalus-fourier linkage) Daemon-only bump (no daedalus-v4l2-dkms change needed; PROTO_VERSION stays at 0). #12 (LOW_DELAY half-measure): daemon sets AV_CODEC_FLAG_LOW_DELAY on the H.264 AVCodecContext so libavcodec emits frames in decode order ~99% of the time (a few stragglers at GOP boundaries when the stream's SPS num_reorder_frames overrides the flag). Visible improvement vs the 2-1-4-3 pair-swap on Firefox + mpv playback; not the permanent fix — see daedalus-v4l2#11 for the architectural plan to substitute daedalus-fourier kernels for libavcodec's pixel math one cycle at a time. #13 (daedalus-fourier linkage): daemon now pkg-config-links against the daedalus-fourier kernel library (marfrit/daedalus-fourier) and logs substrate availability at startup. No kernels dispatched yet — this is the build-time foundation for the substitution work. build-deb.sh updated to fetch + build + install daedalus-fourier (pinned at d87239d, marfrit/daedalus-fourier PR #1) into a per-build temp prefix before invoking the daemon's cmake, exposing it via PKG_CONFIG_PATH. Static-linked, so the resulting .deb has no new runtime deps. Requires libvulkan-dev + glslang-tools on the CI runner. Arch PKGBUILD bumped to the same upstream commit but Arch packaging for daedalus-fourier itself is a follow-up; until that lands the Arch build expects daedalus-fourier installed by the user (AUR-style). Debian-side is end-to-end self-contained via build-deb.sh. Refs: * reauktion/daedalus-v4l2#12 * reauktion/daedalus-v4l2#13 * reauktion/daedalus-v4l2#11 * marfrit/daedalus-fourier#1	2026-05-21 18:39:22 +02:00
marfrit	736b6da176	Merge pull request 'daedalus-v4l2{,-dkms}: 79256dc/6ffe92b -> 5d8b436 — revert parking design' (#71) from claude-noether/marfrit-packages:noether/daedalus-revert-bump-5d8b436 into main Reviewed-on: marfrit/marfrit-packages#71	2026-05-21 14:54:18 +00:00
claude-noether	34972ae9c1	daedalus-v4l2{,-dkms}: 79256dc/6ffe92b -> 5d8b436 — revert parking design Lock-step downgrade of both packages to the revert tip of daedalus-v4l2 (PR #10 closed PRs #7 + #8). After 0.1.0+r28+g79256dc-1 / 0.1.0+r30+g6ffe92b-1 landed in production, mpv (--hwdec=vaapi-copy) failed pre-playing with "Unable to dequeue buffer: Resource temporarily unavailable" because the daemon parked CAPTURE buffers waiting for libavcodec's display-order reorder, violating libva's V4L2 stateless 1:1 contract. See daedalus-v4l2#9 for the diagnostic, #10 for the revert PR. DAEDALUS_PROTO_VERSION drops 1 → 0; install both .debs in the same apt transaction. Userspace ABI returns to the f0d4186-equivalent behaviour, plus PR #4 (cosmetic H.264 menu controls). The daedalus-v4l2-dkms #64 multi-kernel postinst behaviour stays in build-deb.sh. Visible regression: H.264 B-frame streams in Firefox return to the "2 1 4 3 6 5" pair-swap visual. Proper fix (concurrent in-flight requests in daemon + display-order reorder moved into libva-v4l2-request-fourier) tracked at daedalus-v4l2#11. Refs: * reauktion/daedalus-v4l2#9 * reauktion/daedalus-v4l2#10 (merged) * reauktion/daedalus-v4l2#11	2026-05-21 15:42:03 +02:00
marfrit	a9f1b833b9	Merge pull request 'mesa-panvk-bifrost: r3 -> r4 — iter17 XFB primitive decomposition' (#70) from claude-noether/marfrit-packages:noether/mesa-panvk-bifrost-r4-iter17-xfb-decomp into main Reviewed-on: marfrit/marfrit-packages#70	2026-05-21 12:18:23 +00:00
marfrit	83e8eca56d	mesa-panvk-bifrost: r3 -> r4 — iter17 XFB primitive decomposition iter17 closes the 162 winding_* CTS failures from iter15's baseline by replacing the upstream pan_nir_lower_xfb call with a panvk-specific NIR pass (panvk_per_arch(nir_lower_xfb)) that handles per-primitive decomposition for non-LIST topologies (LINE_STRIP, TRIANGLE_STRIP, TRIANGLE_FAN, and the four _WITH_ADJACENCY variants). Topology + per-instance output vertex count are threaded as new sysvals (vs.xfb_topology + vs.xfb_output_count) so the NIR pass can dispatch per-topology at runtime without compiling 7+ shader variants. dEQP-VK.transform_feedback.simple.* result (133596 cases total): iter15 baseline -> iter17 Pass: 796 958 (+162) Fail: 243 81 (-162; resume_* by-design only) NotSupported: 132551 132551 Fatal-skip: 6 6 Pass rate of runnable: 76.2% -> 91.7% (+15.5pp) 100% of the iter15 winding-fail cluster closed. The remaining 81 fails are all resume_* (pause/resume XFB, by design — we advertise transformFeedbackDraw=false). Second-model review (janet) produced 3 findings; Findings 1+2 were already fixed in the in-tree applied state (stale applied_state/ snapshot read by reviewer), Finding 3 (degenerate N underflow on N<2) addressed by gating non-LIST emission on `output_count > 0` predicate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-21 14:07:00 +02:00
marfrit	1c8c186681	Merge pull request 'daedalus-v4l2-dkms: 79256dc -> 6ffe92b — fix kernel panic regression from #67' (#69) from claude-noether/marfrit-packages:noether/daedalus-dkms-bump-6ffe92b into main Reviewed-on: marfrit/marfrit-packages#69	2026-05-21 12:00:15 +00:00
claude-noether	a0be2dcc9f	daedalus-v4l2-dkms: 79256dc -> 6ffe92b — fix kernel panic from #7 Kernel-only bump. Fixes the hard-reboot regression introduced by the daedalus-v4l2#7 split-completion design and observed on higgs (Pi CM5) during the first mpv vaapi-copy playback of 720p H.264: device_run now removes src + dst from m2m_ctx's rdy_queue at the moment it picks them up, not at buf_done time. Without this, a parked dst_buf (waiting for libavcodec's display-order release) stayed in the rdy_queue and got re-picked by the next device_run after SRC_CONSUMED's job_finish released the scheduler — two inflight entries on the same vb2_buffer, later HAS_PIXELS calls list_del on an already-detached list_head, panic. DAEDALUS_PROTO_VERSION stays at 1 — daemon (userspace daedalus-v4l2) need NOT bump in lockstep with this DKMS update. The existing daedalus-v4l2 0.1.0+r28+g79256dc is wire-compatible with daedalus-v4l2-dkms 0.1.0+r30+g6ffe92b. Refs: * reauktion/daedalus-v4l2#8	2026-05-21 13:56:42 +02:00
marfrit	eb89f12c3e	Merge pull request 'libva-v4l2-request-fourier: bump pin to c454618 (#15 transparent resize)' (#68) from claude-noether/marfrit-packages:bump-libva-fourier-c454618-issue-15 into main Reviewed-on: marfrit/marfrit-packages#68	2026-05-21 11:25:39 +00:00
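The 3bc0da1 changelog entry quotes a 720p H.264 baseline of roughly 90 K MBs/s at 24 fps and adds a per-frame `decode_us=N` field to the `decoder: OK` log line. The sketch below shows how those figures can be cross-checked offline. Both function names are hypothetical. The only thing assumed about the log format is that each line carries a whitespace-separated `decode_us=<integer>` token, because the full line layout is not shown in this diff.

```shell
# Macroblocks per second for a codec with 16x16 macroblocks (H.264).
# Dimensions are rounded up to whole macroblocks.
mbs_per_sec() {
    width=$1 height=$2 fps=$3
    mb_w=$(( (width + 15) / 16 ))
    mb_h=$(( (height + 15) / 16 ))
    echo $(( mb_w * mb_h * fps ))
}

# Average the decode_us=N tokens found on stdin (integer microseconds).
# Assumes each token is its own whitespace-separated field.
avg_decode_us() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (match($i, /^decode_us=[0-9]+/)) {
                # "decode_us=" is 10 characters; the digits follow it.
                sum += substr($i, 11, RLENGTH - 10)
                n++
            }
    } END { if (n) printf "%d\n", sum / n }'
}

mbs_per_sec 1280 720 24   # prints 86400
```

1280x720 is 80 x 45 = 3600 macroblocks per frame, so 24 fps gives 86400 MBs/s, which is consistent with the "~90 K MBs/s" figure in the changelog. A captured daemon log could be piped through `avg_decode_us` to compare against the daemon's own `decoder stats` summary.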