
# Phase 2 — situation analysis / design lock for iter13

Closed **2026-05-20**. Resolves the 3 open questions from [phase1_iter13_source_map.md](phase1_iter13_source_map.md).

## Q1: Single shader variant or two?

Phase 1 noted that the XFB-lowered shader contains `nir_store_global` instructions. If those stores stay unconditional, a draw with XFB inactive (`xfb_address[i] = 0`) would write to addresses near NULL and fault the GPU. Two options to resolve this:

- **(B) Two compiled variants per shader** (Panfrost-Gallium's approach): non-XFB variant + XFB variant. Select at draw time based on cmdbuf state.
- **(C) Single variant with runtime guard**: wrap each store in `if (xfb_address[i] != 0)`. Adds a uniform (non-divergent) branch around every XFB store.
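For comparison, option (C)'s guard and the fault it prevents can be modeled on the CPU. This is a minimal sketch: `struct xfb_sysvals` and both helpers are hypothetical stand-ins, not Mesa API, and the real pass would express the guard as NIR control flow rather than C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU model of one lowered XFB store; nothing here is Mesa API. */
#define XFB_MAX_BUFFERS 4

struct xfb_sysvals {
   uint64_t xfb_address[XFB_MAX_BUFFERS]; /* 0 when XFB is inactive */
};

/* Option (C): should the store be issued at all? The condition depends only
 * on draw-level state, so every vertex in the draw takes the same path. */
static bool
option_c_should_store(const struct xfb_sysvals *sv, unsigned buffer)
{
   assert(buffer < XFB_MAX_BUFFERS);
   return sv->xfb_address[buffer] != 0;
}

/* Address an unguarded store would target: buffer base + byte offset.
 * With base == 0 the result is a small address near NULL, hence the fault. */
static uint64_t
xfb_store_address(const struct xfb_sysvals *sv, unsigned buffer,
                  uint64_t byte_offset)
{
   assert(buffer < XFB_MAX_BUFFERS);
   return sv->xfb_address[buffer] + byte_offset;
}
```

Option (B) removes the question entirely: the variant used for inactive-XFB draws simply has no stores to guard.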

**Decision: (B) — two compiled variants.**

Rationale:
- Matches Panfrost-Gallium's well-validated pattern (oracle for the entire approach).
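- Handles draws that use an XFB-declaring pipeline while XFB is inactive. Vulkan permits this (outside a `vkCmdBeginTransformFeedbackEXT`/`vkCmdEndTransformFeedbackEXT` pair, outputs are simply not captured), and the regular variant contains no XFB stores, so there is nothing that could fault.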
- Zero runtime overhead (no branches in the hot path).
- Cost: roughly 2× shader compilation time and 2× shader-cache memory, but only for shaders that declare XFB outputs, which are a small subset of all pipelines. Negligible overall.

Implementation: in `panvk_vX_shader.c`, when compiling a vertex shader, detect `shader->info.has_transform_feedback_varyings`. If set, compile twice:
1. Without `pan_nir_lower_xfb` → store in `panvk_shader::regular_variant`.
2. With the standard `nir_io_add_intrinsic_xfb_info` + `pan_nir_lower_xfb` passes applied → store in `panvk_shader::xfb_variant`.

At draw time in `jm/panvk_vX_cmd_draw.c`, select the variant based on `cmdbuf->state.gfx.xfb.active`. The lifetime + memory management for the second variant mirrors the first.
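The compile-twice / select-at-draw flow can be sketched as below. This is a minimal model under stated assumptions: `struct panvk_shader_sketch`, `compile_vs_variants` and `select_vs_variant` are hypothetical names, "compiling" is reduced to setting a flag, and only the `regular_variant` / `xfb_variant` field names come from the plan above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real panvk_shader layout will differ. */
struct shader_variant {
   bool xfb_lowered; /* true if pan_nir_lower_xfb was applied */
};

struct panvk_shader_sketch {
   struct shader_variant regular_variant;
   struct shader_variant xfb_variant;
   bool has_xfb; /* mirrors info.has_transform_feedback_varyings */
};

/* Compile time: always build the regular variant; build the XFB variant
 * only when the shader declares XFB outputs. */
static void
compile_vs_variants(struct panvk_shader_sketch *s, bool has_xfb_varyings)
{
   assert(s != NULL);
   s->has_xfb = has_xfb_varyings;
   s->regular_variant.xfb_lowered = false;
   if (has_xfb_varyings)
      s->xfb_variant.xfb_lowered = true; /* xfb_info + pan_nir_lower_xfb */
}

/* Draw time: use the XFB variant only when the shader has one AND XFB is
 * active; every other draw gets the variant with no XFB stores. */
static const struct shader_variant *
select_vs_variant(const struct panvk_shader_sketch *s, bool xfb_active)
{
   if (s->has_xfb && xfb_active)
      return &s->xfb_variant;
   return &s->regular_variant;
}
```

In the driver, `xfb_active` would come from `cmdbuf->state.gfx.xfb.active`; checking `has_xfb` as well keeps shaders without XFB outputs on the regular variant even inside a Begin/End block.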

## Q2: `num_vertices` value — padded or actual?

Phase 1 noted an ambiguity between PanVk's `padded_vertex_count` (used for attribute buffer sizing) and the actual draw vertex count, which is what Vulkan defines the XFB layout in terms of.

**Decision: `vs.num_vertices = draw->info.vertex.count`** (the unpadded actual draw call count).

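Rationale: Vulkan writes captured vertices to the XFB buffer back to back, in draw order. For a non-indexed draw with a list topology, the output index is therefore `instance_id * vertex_count + vertex_id`, where `vertex_count` is the draw call's vertex count (the `vertexCount` arg of `vkCmdDraw`), NOT the internal padded count, and `vertex_id` is relative to `firstVertex`. Apps reading back the XFB buffer expect packed output with no padding holes.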

The `pan_nir_lower_xfb` pass uses `nir_load_num_vertices()` directly in the index calculation (lines 24-25 of pan_nir_lower_xfb.c), so the shader uses whatever value the driver provides. We provide the unpadded value.
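A small CPU-side model makes the padded-vs-unpadded difference concrete. This sketch assumes the pass computes `instance_id * num_vertices + vertex_id_zero_base`, scaled by the buffer stride, plus the output's `xfb_offset`; `xfb_byte_offset` is a hypothetical helper, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of one captured output in its XFB buffer:
 *   index  = instance_id * num_vertices + vertex_id_zero_base
 *   offset = index * stride_bytes + xfb_offset_bytes
 * num_vertices is whatever the driver feeds nir_load_num_vertices(). */
static uint64_t
xfb_byte_offset(uint32_t instance_id, uint32_t num_vertices,
                uint32_t vertex_id_zero_base, uint32_t stride_bytes,
                uint32_t xfb_offset_bytes)
{
   assert(vertex_id_zero_base < num_vertices);
   uint64_t index = (uint64_t)instance_id * num_vertices + vertex_id_zero_base;
   return index * stride_bytes + xfb_offset_bytes;
}
```

With `vertexCount = 3`, two instances and a 16-byte stride, instance 0's last vertex lands at byte 32 and instance 1's first vertex at byte 48, immediately after it. If the driver instead supplied a (hypothetical) padded count of 4, instance 1 would start at byte 64, leaving a 16-byte hole the app does not expect.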

## Q3: Property struct values for `VkPhysicalDeviceTransformFeedbackPropertiesEXT`

Phase 1 sketched conservative values. Reviewed here against the spec's required minimums and ANGLE's actual requirements:

| Property | Decision | Reason |
|---|---|---|
| `maxTransformFeedbackStreams` | **1** | GLES3 needs 1; multi-stream is GL 4.0+; ANGLE only requires 1 for GLES3 emulation. Bump later if a real workload needs it. |
| `maxTransformFeedbackBuffers` | **4** | The spec only requires 1, but GLES3 separate-attribs mode needs 4 (`GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS` ≥ 4), and NIR's `MAX_XFB_BUFFERS` is 4. |
| `maxTransformFeedbackBufferSize` | **(1ULL << 32) - 1** | Conservative 4 GiB cap; matches PanVk's general buffer size limits. |
| `maxTransformFeedbackStreamDataSize` | **512** | Spec minimum; max bytes of XFB output per vertex, per stream. |
| `maxTransformFeedbackBufferDataSize` | **512** | Spec minimum; max bytes of XFB output per vertex, per buffer. |
| `maxTransformFeedbackBufferDataStride` | **2048** | Generous (spec minimum is 512); max stride (`XfbStride`) between consecutive vertices in one buffer. |
| `transformFeedbackQueries` | **false** | Defer query support (VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_PRIMITIVES_WRITTEN_EXT) to a follow-up iter. Not needed for ANGLE-GLES3 emulation. |
| `transformFeedbackStreamsLinesTriangles` | **false** | Whether geometry streams may emit lines/triangles rather than only points; irrelevant since we have no GS. |
| `transformFeedbackRasterizationStreamSelect` | **false** | Multi-stream-specific; meaningless with 1 stream. |
| `transformFeedbackDraw` | **false** | `vkCmdDrawIndirectByteCountEXT` not implemented in v1. It exists to emulate `glDrawTransformFeedback` (GL 4.0+), which GLES3 lacks; pause/resume uses counter buffers and does not need it. |

Plus feature flags:
- `transformFeedback = true`
- `geometryStreams = false` (no GS, and `maxTransformFeedbackStreams = 1`)
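The decisions above can be captured as a fill-in plus a self-check. This is a sketch under assumptions: the structs mirror only the relevant `VkPhysicalDeviceTransformFeedback*EXT` fields using plain C types (so it builds without Vulkan headers), all function names are hypothetical, and the minimums are our reading of the spec's Required Limits table, to be double-checked against the current spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the relevant properties/features fields. */
struct xfb_props_sketch {
   uint32_t maxTransformFeedbackStreams;
   uint32_t maxTransformFeedbackBuffers;
   uint64_t maxTransformFeedbackBufferSize;
   uint32_t maxTransformFeedbackStreamDataSize;
   uint32_t maxTransformFeedbackBufferDataSize;
   uint32_t maxTransformFeedbackBufferDataStride;
   bool transformFeedbackQueries;
   bool transformFeedbackStreamsLinesTriangles;
   bool transformFeedbackRasterizationStreamSelect;
   bool transformFeedbackDraw;
};

struct xfb_features_sketch {
   bool transformFeedback;
   bool geometryStreams;
};

static struct xfb_props_sketch
iter13_xfb_props(void)
{
   return (struct xfb_props_sketch){
      .maxTransformFeedbackStreams = 1,
      .maxTransformFeedbackBuffers = 4,
      .maxTransformFeedbackBufferSize = (1ULL << 32) - 1,
      .maxTransformFeedbackStreamDataSize = 512,
      .maxTransformFeedbackBufferDataSize = 512,
      .maxTransformFeedbackBufferDataStride = 2048,
      /* all four booleans left false, per the decision table */
   };
}

static struct xfb_features_sketch
iter13_xfb_features(void)
{
   return (struct xfb_features_sketch){
      .transformFeedback = true,
      .geometryStreams = false,
   };
}

/* True if the values meet the assumed spec minimums and the single-stream
 * choices are internally consistent. */
static bool
xfb_caps_consistent(const struct xfb_props_sketch *p,
                    const struct xfb_features_sketch *f)
{
   bool meets_minimums =
      p->maxTransformFeedbackStreams >= 1 &&
      p->maxTransformFeedbackBuffers >= 1 &&
      p->maxTransformFeedbackBufferSize >= (1ULL << 27) &&
      p->maxTransformFeedbackStreamDataSize >= 512 &&
      p->maxTransformFeedbackBufferDataSize >= 512 &&
      p->maxTransformFeedbackBufferDataStride >= 512;
   /* With one stream, multi-stream features must stay off. */
   bool single_stream_ok =
      p->maxTransformFeedbackStreams > 1 ||
      (!f->geometryStreams && !p->transformFeedbackRasterizationStreamSelect);
   return f->transformFeedback && meets_minimums && single_stream_ok;
}
```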

## Side-effect: `rasterizerDiscardEnable`

When an app captures via XFB without rendering (no fragment output), it sets `VkPipelineRasterizationStateCreateInfo.rasterizerDiscardEnable = VK_TRUE`. PanVk needs to honor this by skipping tiler/fragment job emission, while still running the vertex job that performs the XFB stores. Phase 4 should check current handling and wire it up if absent.
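The per-draw decision can be sketched as a job mask. This is a deliberately simplified, hypothetical model of JM job emission (the enum and function are illustrative, not PanVk code; fragment work is per render pass rather than per draw, so it is left out). Note that `rasterizerDiscardEnable` can also be dynamic state (`VK_DYNAMIC_STATE_RASTERIZER_DISCARD_ENABLE`), so the flag should come from whichever state is current at draw time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-draw job mask; names are illustrative only. */
enum jm_draw_job {
   JM_JOB_VERTEX = 1u << 0,
   JM_JOB_TILER  = 1u << 1,
};

static unsigned
jm_jobs_for_draw(bool rasterizer_discard)
{
   /* The vertex job always runs: it executes the VS, including XFB stores. */
   unsigned jobs = JM_JOB_VERTEX;
   /* With rasterizer discard no primitives reach the tiler, so skip it. */
   if (!rasterizer_discard)
      jobs |= JM_JOB_TILER;
   return jobs;
}
```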

## Locked design — implementation can begin

The 220-line implementation estimate from Phase 1 is unchanged.

## Phase 3 next

Write `iter13/probe_xfb.c` — minimal Vulkan probe doing:
1. Create vertex buffer with 3 vertices (just for the draw call shape; vertex inputs ignored).
2. Create vertex shader with one XFB output (e.g., `layout(location = 0, xfb_buffer = 0, xfb_offset = 0) out vec4 captured;`; GLSL compiled to SPIR-V needs an explicit `location` on user outputs).
3. Shader writes `gl_VertexIndex`-derived value to `captured`.
4. Create pipeline with `rasterizerDiscardEnable = VK_TRUE` (no rasterization).
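5. Create the XFB buffer with `VK_BUFFER_USAGE_TRANSFORM_FEEDBACK_BUFFER_BIT_EXT`, bind it with `vkCmdBindTransformFeedbackBuffersEXT`, then `vkCmdBeginTransformFeedbackEXT` / `vkCmdDraw` / `vkCmdEndTransformFeedbackEXT`.
6. Read back the buffer after a barrier from `VK_PIPELINE_STAGE_TRANSFORM_FEEDBACK_BIT_EXT` / `VK_ACCESS_TRANSFORM_FEEDBACK_WRITE_BIT_EXT` to `VK_PIPELINE_STAGE_HOST_BIT` / `VK_ACCESS_HOST_READ_BIT`.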
7. Verify: 3 vec4s with the expected values.
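Step 7 can be sketched as a host-side check. This assumes a hypothetical shader body (the plan only says "`gl_VertexIndex`-derived"), `firstVertex = 0`, and that the probe fills the buffer with a sentinel before the draw so that overruns from a wrong stride or index are also caught; all names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe shader:
 *   layout(location = 0, xfb_buffer = 0, xfb_offset = 0) out vec4 captured;
 *   void main() {
 *      captured = vec4(float(gl_VertexIndex)) + vec4(0.0, 10.0, 20.0, 30.0);
 *   }
 */
#define PROBE_VERTS 3
#define PROBE_FLOATS ((size_t)PROBE_VERTS * 4)
#define SENTINEL (-1.0f) /* written to the whole buffer before the draw */

/* Expected component c of vertex v for the shader above. Small integers are
 * exact in float, so exact comparison is safe. */
static float
probe_expected(unsigned v, unsigned c)
{
   return (float)v + 10.0f * (float)c;
}

/* First PROBE_VERTS vec4s must match; everything after must be untouched. */
static bool
probe_verify(const float *buf, size_t num_floats)
{
   if (num_floats < PROBE_FLOATS)
      return false;
   for (unsigned v = 0; v < PROBE_VERTS; v++)
      for (unsigned c = 0; c < 4; c++)
         if (buf[v * 4 + c] != probe_expected(v, c))
            return false;
   for (size_t i = PROBE_FLOATS; i < num_floats; i++)
      if (buf[i] != SENTINEL)
         return false;
   return true;
}
```

Sizing the XFB buffer larger than 3 vec4s (16 floats, say) is what gives the sentinel check something to look at.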

If this passes on patched Mesa, the basic iter13 XFB path is working.