# Phase 4 close: iter13 VK_EXT_transform_feedback implementation

**Result:** GREEN. PanVk-Bifrost now implements VK_EXT_transform_feedback end-to-end.

## Probe outcome

```
[info] VK_EXT_transform_feedback present on device
[info] transformFeedback=1 geometryStreams=0
[info] vertex 0: (0.000000, 0.000000, 4660.000000, 51966.000000)
[info] vertex 1: (1.000000, 0.000000, 4660.000000, 51966.000000)
[info] vertex 2: (2.000000, 0.000000, 4660.000000, 51966.000000)
[PASS] PanVk-Bifrost transform feedback: 3 vertices captured correctly.
```

Byte-exact match against the expected `vec4(vertex_id, instance_id=0, 0x1234, 0xcafe)` for each of the 3 vertices (0x1234 = 4660, 0xcafe = 51966). The output buffer was pre-filled with a `0xDEADBEEF` sentinel, which verifies that the GPU actually wrote real data and the test is not reading back a stale init pattern.

## Source landings on ohm (Mesa 26.0.6)

Files modified (1 new + 6 edited):

| File | Change |
|---|---|
| `src/panfrost/vulkan/panvk_shader.h` | sysval struct: + `uint32_t num_vertices`, `uint64_t xfb_address[4]` (under `PAN_ARCH < 9`) |
| `src/panfrost/vulkan/panvk_vX_physical_device.c` | extension + feature + properties exposure (`PAN_ARCH < 9` gate) |
| `src/panfrost/vulkan/panvk_vX_shader.c` | (1) `#include "pan_nir.h"`; (2) sysval lowering cases for `load_num_vertices` + `load_xfb_address[0..3]`; (3) the 3-pass XFB lowering (`nir_opt_constant_folding` → `nir_io_add_intrinsic_xfb_info` → `pan_nir_lower_xfb`), inserted **after `nir_lower_io`** in `panvk_lower_nir`; (4) `inputs.no_idvs` set to true for XFB-bearing vertex shaders |
| `src/panfrost/vulkan/panvk_cmd_draw.h` | + `xfb` substruct in `panvk_cmd_graphics_state` (active flag + buffer_count + 4× buffers) |
| `src/panfrost/vulkan/panvk_vX_cmd_draw.c` | per-draw `set_gfx_sysval` for `vs.num_vertices` + `vs.xfb_address[0..3]` |
| `src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c` | NEW: `CmdBind/Begin/EndTransformFeedbackEXT` entry points |
| `src/panfrost/vulkan/meson.build` | + `'jm/panvk_vX_cmd_xfb.c'` in `jm_files` |

## Key learnings (vs Phase 1 source map)

1. **Pass placement matters.** Phase 1's plan put `pan_nir_lower_xfb` inside `panvk_preprocess_nir`. That was wrong: at that point the shader still has var-based `store_deref` intrinsics. `nir_lower_io`, which converts variable stores into `store_output` intrinsics, only runs later, inside `panvk_lower_nir`. The pass must run **right after `nir_lower_io`**, mirroring Panfrost-Gallium's flow, where `nir_lower_io` precedes the XFB block in `pan_create_shader_state`.
2. **`nir_io_add_intrinsic_xfb_info` is mandatory.** Phase 1 assumed `nir->xfb_info` was the gate. Also wrong: Mesa's pass that converts SPIR-V XFB decorations into intrinsic-attached `io_xfb` info has to run first. The correct trigger is `nir->info.has_transform_feedback_varyings`, which SPIR-V→NIR sets for XFB-decorated outputs.
3. **`no_idvs` is non-negotiable.** Phase 1 noted that Panfrost-Gallium sets `inputs.no_idvs = has_transform_feedback_varyings` but framed it as optional. It isn't. IDVS splits vertex shading into a position path and a varying path, and in the JM job model the varying path doesn't run for raster-discarded draws. A single non-IDVS vertex job is required for XFB.
4. **The sysval dirty mechanism does work for array fields.** `set_gfx_sysval(..., vs.xfb_address[0], _xa0)` expands correctly via the `offsetof(struct, vs.xfb_address[0])` and `sizeof(uint64_t)` macros. Confirmed empirically: the FAU upload triggered as expected and the shader read the correct address.

## What the working shader looks like

After all passes, the vertex shader effectively does:

```
store_global(addr  = xfb_address[0] + (instance_id * num_vertices + vertex_id) * stride,
             value = (vertex_id_as_float, instance_id_as_float, 4660.0, 51966.0))
```

Here `xfb_address[0]` is a 64-bit FAU sysval populated per draw from `cmdbuf->state.gfx.xfb.buffers[0].addr + offset`.

## Phase 4 artifact snapshot

The working state of all 7 source files is captured in `iter13/applied_state/` for replication.
## Next: Phase 5

Per CLAUDE.md ("Reviews are never skippable"), the next step is a second-model review of the implementation.