# Phase 2: situation analysis / design lock for iter13

Closed **2026-05-20**. Resolves the 3 open questions from [phase1_iter13_source_map.md](phase1_iter13_source_map.md).

## Q1: Single shader variant or two?

Phase 1 noted that if the XFB-lowered shader contains `nir_store_global` instructions and we leave them unconditional, an XFB-inactive draw with `xfb_address[i] = 0` would fault the GPU on a NULL address. Two options to resolve this:

- **(B) Two compiled variants per shader** (Panfrost-Gallium's approach): a non-XFB variant plus an XFB variant, selected at draw time based on cmdbuf state.
- **(C) Single variant with runtime guard**: wrap the stores in `if (xfb_address[i] != 0)`. Adds predictable branches.

**Decision: (B), two compiled variants.**

Rationale:

- Matches Panfrost-Gallium's well-validated pattern (the oracle for the entire approach).
- Safer against application misuse (binding an XFB pipeline outside a Begin/End block; the Vulkan spec forbids it, but we don't want a GPU fault for buggy apps).
- Zero runtime overhead (no branches in the hot path).
- Cost: ~2× shader compilation time and ~2× shader cache memory for XFB-bearing pipelines. Negligible, since it only affects shaders that declare XFB outputs, which are a small subset of all pipelines.

Implementation: in `panvk_vX_shader.c`, when compiling a vertex shader, check `shader->info.has_transform_feedback_varyings`. If set, compile twice:

1. Without `pan_nir_lower_xfb` → store in `panvk_shader::regular_variant`.
2. With the standard `nir_io_add_intrinsic_xfb_info` + `pan_nir_lower_xfb` passes applied → store in `panvk_shader::xfb_variant`.

At draw time in `jm/panvk_vX_cmd_draw.c`, select the variant based on `cmdbuf->state.gfx.xfb.active`. Lifetime and memory management for the second variant mirror the first.

## Q2: `num_vertices` value: padded or actual?

Phase 1 noted ambiguity between PanVk's `padded_vertex_count` (used for attribute buffer sizing) and the Vulkan-spec'd actual vertex count for XFB.

**Decision: `vs.num_vertices = draw->info.vertex.count`** (the unpadded, actual draw call count).

Rationale: per the Vulkan spec, XFB output index = `instance_id * vertex_count + vertex_id`, where `vertex_count` is the draw call's vertex count (the `vertexCount` argument of `vkCmdDraw`), NOT the internal padded count. Apps reading back the XFB buffer expect packed output with no padding holes.

The `pan_nir_lower_xfb` pass uses `nir_load_num_vertices()` directly in the index calculation (lines 24-25 of `pan_nir_lower_xfb.c`), so whatever the driver provides is what the shader uses. We provide the unpadded value.

## Q3: Property struct values for `VkPhysicalDeviceTransformFeedbackPropertiesEXT`

Phase 1 sketched conservative values. Reviewing them against the spec and ANGLE's actual requirements:

| Property | Decision | Reason |
|---|---|---|
| `maxTransformFeedbackStreams` | **1** | GLES3 needs 1; multi-stream is GL 4.0+; ANGLE only requires 1 for GLES3 emulation. Bump later if a real workload needs it. |
| `maxTransformFeedbackBuffers` | **4** | 4 separate XFB buffers, the count GLES3 guarantees for separate-attribute capture; the Vulkan spec itself only requires 1. |
| `maxTransformFeedbackBufferSize` | **(1ULL << 32) - 1** | Conservative cap just under 4 GiB; matches PanVk's general buffer size limits. |
| `maxTransformFeedbackStreamDataSize` | **512** | Conservative; per-stream maximum bytes of XFB output per vertex. |
| `maxTransformFeedbackBufferDataSize` | **512** | Same as above, but per buffer. |
| `maxTransformFeedbackBufferDataStride` | **2048** | Generous; maximum stride between consecutive vertex captures in a buffer. |
| `transformFeedbackQueries` | **false** | Defer query support (`VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT`) to a follow-up iter. Not needed for ANGLE-GLES3 emulation. |
| `transformFeedbackStreamsLinesTriangles` | **false** | Don't claim emit-from-GS support; we have no GS anyway. |
| `transformFeedbackRasterizationStreamSelect` | **false** | Multi-stream-specific; meaningless with 1 stream. |
| `transformFeedbackDraw` | **false** | `vkCmdDrawIndirectByteCountEXT` not implemented in v1. Apps that don't need pause/resume don't need this. |

Plus feature flags:

- `transformFeedback = true`
- `geometryStreams = false` (matches `transformFeedbackStreamsLinesTriangles = false`)

## Side-effect: `rasterizerDiscardEnable`

When an app does pure-XFB capture (no fragment output), it sets `VkPipelineRasterizationStateCreateInfo.rasterizerDiscardEnable = VK_TRUE`. PanVk needs to honor this by skipping tiler / frag job emission. Phase 4 should check the current handling and wire it up if absent.

## Locked design: implementation can begin

The 220-line implementation estimate from Phase 1 is unchanged.

## Phase 3 next

Write `iter13/probe_xfb.c`, a minimal Vulkan probe that does the following:

1. Create a vertex buffer with 3 vertices (just for the draw call shape; vertex inputs are ignored).
2. Create a vertex shader with one XFB output (e.g., `layout(xfb_buffer=0, xfb_offset=0) out vec4 captured;`).
3. Have the shader write a `gl_VertexIndex`-derived value to `captured`.
4. Create a pipeline with `rasterizerDiscardEnable = VK_TRUE` (no rasterization).
5. Bind the XFB buffer, then begin/draw/end.
6. Read back the buffer.
7. Verify: 3 vec4s with the expected values.

If this passes on patched Mesa, the iter13 implementation is correct.
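
The verification in step 7 could be sketched on the host side roughly as follows. This is a minimal sketch under stated assumptions, not the probe itself: the helper names `xfb_output_index`, `expected_captured`, and `verify_xfb_readback` are hypothetical, and the expected value assumes the probe shader writes `captured = vec4(float(gl_VertexIndex), 0.0, 0.0, 1.0)`, which this plan does not pin down. The slot calculation follows the packed layout from the Q2 decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Packed XFB output slot for one vertex, per the Q2 decision:
 * instance_id * vertex_count + vertex_id, with the UNPADDED vertex count. */
static uint32_t
xfb_output_index(uint32_t instance_id, uint32_t vertex_count,
                 uint32_t vertex_id)
{
   return instance_id * vertex_count + vertex_id;
}

/* ASSUMPTION (hypothetical shader): the probe's vertex shader writes
 * captured = vec4(float(gl_VertexIndex), 0.0, 0.0, 1.0). */
static void
expected_captured(uint32_t vertex_id, float out[4])
{
   out[0] = (float)vertex_id;
   out[1] = 0.0f;
   out[2] = 0.0f;
   out[3] = 1.0f;
}

/* Checks a read-back XFB buffer holding one vec4 per captured vertex.
 * Returns true only if every slot holds the expected value, i.e. the
 * output is tightly packed with no padding holes. */
static bool
verify_xfb_readback(const float *data, uint32_t vertex_count,
                    uint32_t instance_count)
{
   assert(data != NULL);

   for (uint32_t inst = 0; inst < instance_count; inst++) {
      for (uint32_t v = 0; v < vertex_count; v++) {
         const float *got =
            data + 4 * (size_t)xfb_output_index(inst, vertex_count, v);
         float want[4];
         expected_captured(v, want);

         /* Exact comparison is fine here: the expected values are small
          * integers, which float represents exactly. */
         for (int c = 0; c < 4; c++) {
            if (got[c] != want[c])
               return false;
         }
      }
   }
   return true;
}
```

For the 3-vertex, single-instance draw described above, the probe would call `verify_xfb_readback(mapped, 3, 1)` on the mapped read-back memory. If the driver had used a padded vertex count for the index, multi-instance output would land in the wrong slots and this check would fail.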