# Phase 1: source map for iter13 (VK_EXT_transform_feedback in PanVk)

Closed **2026-05-20**.

## Headline

The implementation surface is **much smaller than the initial estimate suggested**. Mesa already has the hardware-side abstraction (`pan_nir_lower_xfb`) and PanVk has a clean sysval-injection pattern (`load_sysval(b, graphics, bit_size, FIELD)`). Total new code: ~220-300 lines + a probe.

## The `pan_nir_lower_xfb` contract (oracle)

`src/panfrost/compiler/pan_nir_lower_xfb.c` (85 lines, Collabora 2022) does:

```
For every nir_store_output with XFB metadata:
  Replace with nir_store_global at address:
    buf  = nir_load_xfb_address(b, 64, .base = buffer_slot)
    idx  = nir_load_instance_id * nir_load_num_vertices + nir_load_raw_vertex_id_pan
    addr = buf + (idx * stride) + offset
```

The pass also replaces `nir_load_vertex_id` with `nir_load_raw_vertex_id_pan + nir_load_raw_vertex_offset_pan` (XFB buffer indexing needs the zero-based raw vertex ID, so the shader-visible `vertex_id` is rebuilt from the raw ID plus the raw offset).

The intrinsics the pass uses, and PanVk's current handling:

| Intrinsic | PanVk handles? | Notes |
|---|---|---|
| `nir_load_xfb_address(buffer=N)` | ❌ **NEW** | per-buffer base address |
| `nir_load_num_vertices` | ❌ **NEW** | per-draw vertex count |
| `nir_load_raw_vertex_id_pan` | ✅ (panvk_vX_shader.c:211) | already wired |
| `nir_load_raw_vertex_offset_pan` | ✅ (panvk_vX_shader.c:101, JM path) | already wired |
| `nir_load_instance_id` | ✅ standard Mesa | always available |

Only two new intrinsic handlers are needed.

## PanVk's sysval injection pattern (the wiring mechanism)

The driver-shader contract is `panvk_graphics_sysvals`: a struct that the driver writes per draw and the shader reads via the FAU (Fast Access Uniform) push-constant area. Definition: `src/panfrost/vulkan/panvk_shader.h:133-175`.
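To make the contract concrete, here is a minimal host-side sketch of the idea: one struct shared by both sides, a driver-side setter that tracks dirty words, and a shader-side read expressed as a load at a fixed byte offset. Everything prefixed `demo_`/`DEMO_` is hypothetical and only models the shape of `set_gfx_sysval` / `load_sysval`; the real macros, field layout, and dirty-tracking granularity in PanVk may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for panvk_graphics_sysvals. */
struct demo_sysvals {
   struct {
      int32_t first_vertex;  /* byte offset 0 */
      int32_t base_instance; /* byte offset 4 */
   } vs;
};

/* Driver side: store the value and flag its 32-bit word as dirty,
 * but only when the value actually changes. One dirty bit per word
 * is an assumption made for this sketch. */
#define DEMO_SET_SYSVAL(sv, dirty, field, value)                       \
   do {                                                                \
      if ((sv)->field != (value)) {                                    \
         (sv)->field = (value);                                        \
         *(dirty) |= 1u << (offsetof(struct demo_sysvals, field) / 4); \
      }                                                                \
   } while (0)

/* Shader side: a sysval read is a load at a compile-time byte offset
 * into the same struct. */
static uint32_t
demo_load_word(const struct demo_sysvals *sv, size_t byte_offset)
{
   uint32_t word;

   assert(byte_offset + sizeof(word) <= sizeof(*sv));
   memcpy(&word, (const uint8_t *)sv + byte_offset, sizeof(word));
   return word;
}

#define DEMO_LOAD_SYSVAL(sv, field) \
   demo_load_word((sv), offsetof(struct demo_sysvals, field))
```

Adding a new sysval in this model means adding one struct field, one setter call on the driver side, and one load on the shader side, which is the same three-step shape the new XFB fields follow.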
Existing pattern (for `vs.first_vertex`):

- **Struct field** (panvk_shader.h:154): `int32_t first_vertex;`
- **Shader lowering** (panvk_vX_shader.c:87-88):
  ```c
  case nir_intrinsic_load_first_vertex:
     val = load_sysval(b, graphics, bit_size, vs.first_vertex);
     break;
  ```
- **Driver populates** (jm/panvk_vX_cmd_draw.c:824):
  ```c
  set_gfx_sysval(cmdbuf, dirty_sysvals, vs.first_vertex, info->vertex.base);
  ```

Mirror this exactly for the two new fields:

- `vs.num_vertices` (uint32_t)
- `vs.xfb_address[4]` (aligned_u64 array; the Vulkan spec requires maxTransformFeedbackBuffers ≥ 1, and 4 is recommended)

## Implementation skeleton

### A. Extension + feature exposure (panvk_vX_physical_device.c)

Around line 91 (KHR_robustness2 block):

```c
.EXT_transform_feedback = PAN_ARCH < 9,  // JM-class only for now
```

At feature block (~line 491):

```c
/* VK_EXT_transform_feedback */
.transformFeedback = PAN_ARCH < 9,
.geometryStreams = false,  /* No GS support yet */
```

At properties block (~line 1019):

```c
/* VK_EXT_transform_feedback */
.maxTransformFeedbackStreams = 1,  /* Up the limit if multi-stream needed; 1 is GLES3 baseline */
.maxTransformFeedbackBuffers = 4,
.maxTransformFeedbackBufferSize = UINT32_MAX,
.maxTransformFeedbackStreamDataSize = 512,
.maxTransformFeedbackBufferDataSize = 512,
.maxTransformFeedbackBufferDataStride = 2048,
.transformFeedbackQueries = false,  /* Start without; defer to follow-up iter */
.transformFeedbackStreamsLinesTriangles = false,
.transformFeedbackRasterizationStreamSelect = false,
.transformFeedbackDraw = false,  /* No vkCmdDrawIndirectByteCountEXT yet */
```

### B. Sysval struct fields (panvk_shader.h)

Add to the `vs` substruct at line 150-157, only for `PAN_ARCH < 9`:

```c
struct {
#if PAN_ARCH < 9
   int32_t raw_vertex_offset;
   uint32_t num_vertices;       /* NEW iter13: XFB needs per-draw vertex count */
   aligned_u64 xfb_address[4];  /* NEW iter13: 4 transform feedback buffer base addresses */
#endif
   int32_t first_vertex;
   int32_t base_instance;
   uint32_t noperspective_varyings;
} vs;
```

(Use `#if PAN_ARCH < 9` since we're not yet supporting Valhall-CSF; can extend later.)

### C. Shader-side intrinsic lowering (panvk_vX_shader.c)

Add cases ~line 103 (inside `PAN_ARCH < 9` block):

```c
#if PAN_ARCH < 9
case nir_intrinsic_load_num_vertices:
   val = load_sysval(b, graphics, bit_size, vs.num_vertices);
   break;
case nir_intrinsic_load_xfb_address: {
   /* The buffer slot is carried in the intrinsic's base index. */
   unsigned idx = nir_intrinsic_base(intr);
   assert(idx < 4);
   val = load_sysval(b, graphics, bit_size, vs.xfb_address[idx]);
   break;
}
#endif
```

### D. NIR lowering chain integration (panvk_vX_shader.c, somewhere in pipeline-compile path)

Run `nir_io_add_intrinsic_xfb_info` followed by `pan_nir_lower_xfb`, both BEFORE the panvk descriptor lowering:

```c
if (nir->info.stage == MESA_SHADER_VERTEX &&
    nir->info.has_transform_feedback_varyings) {
   NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
   NIR_PASS(_, nir, pan_nir_lower_xfb);
}
```

Place this near the existing pan_preprocess_nir() call (panvk_vX_shader.c:509).

### E. Per-draw sysval population (jm/panvk_vX_cmd_draw.c)

After existing vs.first_vertex / vs.raw_vertex_offset sets (line ~828):

```c
/* Unpadded vertex count, not padded_vertex_count: see open question 2.
 * The exact field name must be confirmed during implementation. */
set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices,
               draw->info.vertex.count);

const struct panvk_xfb_state *xfb = &cmdbuf->state.gfx.xfb;
for (unsigned i = 0; i < 4; i++) {
   /* Unbound or inactive slots get a NULL address: see open question 5. */
   uint64_t addr = (xfb->active && i < xfb->buffer_count)
      ? (xfb->buffers[i].addr + xfb->buffers[i].offset)
      : 0;
   set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[i], addr);
}
```

### F. Command buffer state (panvk_cmd_draw.h or new file)

Add to the per-cmdbuf graphics state:

```c
struct panvk_xfb_state {
   bool active;            /* Between vkCmdBeginTransformFeedbackEXT and vkCmdEndTransformFeedbackEXT */
   unsigned buffer_count;  /* From vkCmdBindTransformFeedbackBuffersEXT */
   struct {
      uint64_t addr;    /* gpu_va of the buffer base */
      uint64_t offset;  /* user-supplied offset */
      uint64_t size;    /* user-supplied size, or VK_WHOLE_SIZE */
   } buffers[4];
};
```

### G. Vulkan command handlers (new file: jm/panvk_vX_cmd_xfb.c)

```c
VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdBindTransformFeedbackBuffersEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstBinding, uint32_t bindingCount,
   const VkBuffer *pBuffers, const VkDeviceSize *pOffsets,
   const VkDeviceSize *pSizes)
{
   /* Stash addresses/offsets/sizes in cmdbuf->state.gfx.xfb.buffers[] */
}

VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdBeginTransformFeedbackEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstCounterBuffer,
   uint32_t counterBufferCount, const VkBuffer *pCounterBuffers,
   const VkDeviceSize *pCounterBufferOffsets)
{
   /* Set cmdbuf->state.gfx.xfb.active = true; mark sysvals dirty;
    * if counter buffers supplied, read them and adjust internal byte counter
    * (resume case) */
}

VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdEndTransformFeedbackEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstCounterBuffer,
   uint32_t counterBufferCount, const VkBuffer *pCounterBuffers,
   const VkDeviceSize *pCounterBufferOffsets)
{
   /* Set active = false; if counter buffers supplied, write the byte counter
    * back (pause case) */
}
```

### H. meson.build registration

Add `jm/panvk_vX_cmd_xfb.c` to the JM file list in `src/panfrost/vulkan/meson.build`.

### I. rasterizerDiscardEnable

Honor `VkPipelineRasterizationStateCreateInfo.rasterizerDiscardEnable` if it is not honored already, since apps doing pure-XFB capture set it. Skip the rasterizer + frag job emission when it is set. Check the existing PanVk JM pipeline code; this may already work.

## Open questions / risks

1. **Counter buffer semantics.** The counter buffers passed to vkCmdBeginTransformFeedbackEXT / vkCmdEndTransformFeedbackEXT let apps PAUSE/RESUME XFB across command buffers. Initial implementation: ignore them. Note that `transformFeedbackDraw = false` only disables vkCmdDrawIndirectByteCountEXT; it does not signal that resume is unsupported, so ignoring counter buffers is a known gap against the spec. Add later if needed.

2. **Padded vertex count vs actual vertex count.** PanVk uses `padded_vertex_count` for buffer sizing because of attribute alignment requirements. For XFB the conceptual "num_vertices" is the actual vertex count of the draw call, not the padded one. Need to make sure `vs.num_vertices = draw->info.vertex.count` (or an equivalent unpadded value), not padded_vertex_count. CHECK THIS in implementation.

3. **`maxTransformFeedbackStreams = 1` is tight.** GLES3 needs only 1 stream; multi-stream is GL 4.0+ and ANGLE may not require it. Confirm via ANGLE's required-features list.

4. **NIR pass ordering.** `pan_nir_lower_xfb` must run on the shader BEFORE the panvk descriptor lowering (which assumes only certain intrinsics survive). Put it right after `nir_lower_system_values`.

5. **Shader compilation: single variant or two?** Panfrost-Gallium compiles two variants (regular + xfb). In PanVk, if the lowering runs on the only variant of a shader that declares XFB outputs, the XFB stores execute even when the pipeline is bound for non-XFB draws. In that case the cmdbuf state has `xfb.active=false`, every `xfb_address[i]` is 0, and the global stores to NULL would fault. So we NEED to either (a) compile two variants like Gallium does, or (b) guard the stores in the shader at draw time. It would be simpler to require that no draw uses the XFB-lowered shader while `xfb.active=false`, but Vulkan allows binding an XFB pipeline outside an XFB block. **Resolution**: probably compile two variants. Defer to Phase 2 design check.

6. **Coverage probe.** Phase 3 probe should exercise: single buffer write, single stream, single vertex, single triangle, verify byte-exact output.
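Open question 2 can be made concrete with a small host-side model of the address formula quoted from `pan_nir_lower_xfb`. This is only a sketch: `xfb_store_addr` is a hypothetical helper, the 64-bit index arithmetic is an assumption (the real pass may compute the index at 32 bits), and the padded count of 4 for a 3-vertex draw is an illustrative value, not what the hardware would necessarily choose.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of:
 *   idx  = instance_id * num_vertices + raw_vertex_id
 *   addr = buf + idx * stride + offset
 */
static uint64_t
xfb_store_addr(uint64_t buf, uint32_t instance_id, uint32_t num_vertices,
               uint32_t raw_vertex_id, uint32_t stride, uint32_t offset)
{
   uint64_t idx = (uint64_t)instance_id * num_vertices + raw_vertex_id;

   return buf + idx * stride + offset;
}

/* Byte gap between the last vertex of instance 0 and the first vertex of
 * instance 1 when num_vertices is set to count_used for a draw that really
 * has vertex_count vertices. Zero means the captured data is tightly packed. */
static uint64_t
xfb_instance_gap(uint32_t vertex_count, uint32_t count_used, uint32_t stride)
{
   uint64_t end_of_inst0 =
      xfb_store_addr(0, 0, count_used, vertex_count - 1, stride, 0) + stride;
   uint64_t start_of_inst1 = xfb_store_addr(0, 1, count_used, 0, stride, 0);

   return start_of_inst1 - end_of_inst0;
}
```

With a 16-byte stride and a 3-vertex draw, `xfb_instance_gap(3, 3, 16)` is 0, while `xfb_instance_gap(3, 4, 16)` is 16: feeding a padded count into `vs.num_vertices` would leave one vertex-sized hole per instance in the captured buffer, which a byte-exact probe would catch only if it draws more than one instance.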
## Files-list summary

| Change | File | Lines (est) |
|---|---|---|
| Expose extension | `src/panfrost/vulkan/panvk_vX_physical_device.c` | +15 |
| Sysval struct | `src/panfrost/vulkan/panvk_shader.h` | +6 |
| Shader lowering | `src/panfrost/vulkan/panvk_vX_shader.c` | +15 |
| NIR pass wiring | `src/panfrost/vulkan/panvk_vX_shader.c` | +6 |
| Cmd state | `src/panfrost/vulkan/panvk_cmd_draw.h` | +15 |
| Sysval populate | `src/panfrost/vulkan/jm/panvk_vX_cmd_draw.c` | +15 |
| New cmd handlers | `src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c` (NEW) | +150 |
| Meson | `src/panfrost/vulkan/meson.build` | +1 |
| **Total Mesa side** | | **~220 lines** |
| Probe | `iter13/probe_xfb.c` (NEW in campaign) | +400 |
| Probe shader | `iter13/probe_xfb.vert` (NEW) | +20 |
| **Total probe side** | | **~420 lines** |

## Phase 1 verdict

Implementation scope is **bounded and tractable**: well-defined surface, all building blocks present, no Bifrost RE needed.

Phase 2 (situation analysis) should validate:

1. The single-variant-vs-two-variants question (open question #5 above)
2. The padded_vertex_count question (open question #2)
3. Spec compliance check on the property values (open question #3)

Then Phase 3 writes the probe, Phase 4 implements.

## Reference

- pan_nir_lower_xfb.c (85 lines, full read above)
- panvk_shader.h:133-175 (graphics_sysvals struct)
- panvk_vX_shader.c:87-103 (sysval lowering pattern)
- jm/panvk_vX_cmd_draw.c:824-830 (per-draw sysval population)
- Panfrost-Gallium oracle: src/gallium/drivers/panfrost/pan_shader.c:125-130, 593-603