a4e7d8ab90
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
- mesa-panvk-bifrost/ — r1..r4 era phase docs (iter1..iter18)
(libmali stub blobs at iter18/blob/ excluded
— 109MB of RE artifacts replaced with a README
pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/ — frozen .tgz source snapshots at each milestone
(basis for the 0005 patch diff generation)
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
77 lines
5.1 KiB
Markdown
77 lines
5.1 KiB
Markdown
# Phase 2 — situation analysis / design lock for iter13
|
||
|
||
Closed **2026-05-20**. Resolves the 3 open questions from [phase1_iter13_source_map.md](phase1_iter13_source_map.md).
|
||
|
||
## Q1: Single shader variant or two?
|
||
|
||
Phase 1 noted that if the XFB-lowered shader has `nir_store_global` instructions, and we leave them unconditional, an XFB-inactive draw with `xfb_address[i] = 0` would NULL-fault the GPU. Two options to resolve:
|
||
|
||
- **(B) Two compiled variants per shader** (Panfrost-Gallium's approach): non-XFB variant + XFB variant. Select at draw time based on cmdbuf state.
|
||
- **(C) Single variant with runtime guard**: wrap stores in `if (xfb_address[i] != 0)`. Adds predictable branches.
|
||
|
||
**Decision: (B) — two compiled variants.**
|
||
|
||
Rationale:
|
||
- Matches Panfrost-Gallium's well-validated pattern (oracle for the entire approach).
|
||
- Safer against application misuse (binding XFB pipeline outside Begin/End block — the Vulkan spec forbids it, but we don't want a GPU fault for buggy apps).
|
||
- Zero runtime overhead (no branches in the hot path).
|
||
- Cost: ~2× shader compilation time + ~2× shader cache memory for XFB-bearing pipelines. Negligible — only affects shaders that declare XFB outputs, which is a small subset of all pipelines.
|
||
|
||
Implementation: in `panvk_vX_shader.c`, when compiling a vertex shader, detect `shader->info.has_transform_feedback_varyings`. If set, compile twice:
|
||
1. Without `pan_nir_lower_xfb` → store in `panvk_shader::regular_variant`.
|
||
2. With the standard `nir_io_add_intrinsic_xfb_info` + `pan_nir_lower_xfb` passes applied → store in `panvk_shader::xfb_variant`.
|
||
|
||
At draw time in `jm/panvk_vX_cmd_draw.c`, select the variant based on `cmdbuf->state.gfx.xfb.active`. The lifetime + memory management for the second variant mirrors the first.
|
||
|
||
## Q2: `num_vertices` value — padded or actual?
|
||
|
||
Phase 1 noted ambiguity between PanVk's `padded_vertex_count` (used for attribute buffer sizing) and the Vulkan-spec'd actual vertex count for XFB.
|
||
|
||
**Decision: `vs.num_vertices = draw->info.vertex.count`** (the unpadded actual draw call count).
|
||
|
||
Rationale: Per Vulkan spec, XFB output index = `instance_id * vertex_count + vertex_id`, where `vertex_count` is the draw call's vertex count (the `vertexCount` arg of `vkCmdDraw`). NOT the internal padded count. Apps reading back the XFB buffer expect packed output, no padding holes.
|
||
|
||
The `pan_nir_lower_xfb` pass uses `nir_load_num_vertices()` directly in the index calculation (line 24-25 of pan_nir_lower_xfb.c), so whatever the driver provides is what the shader uses. We provide the unpadded value.
|
||
|
||
## Q3: Property struct values for `VkPhysicalDeviceTransformFeedbackPropertiesEXT`
|
||
|
||
Phase 1 sketched conservative values. Reviewing per spec + ANGLE's actual requirements:
|
||
|
||
| Property | Decision | Reason |
|
||
|---|---|---|
|
||
| `maxTransformFeedbackStreams` | **1** | GLES3 needs 1; multi-stream is GL 4.0+; ANGLE only requires 1 for GLES3 emulation. Bump later if a real workload needs it. |
|
||
| `maxTransformFeedbackBuffers` | **4** | Vulkan spec maximum is 4 separate XFB buffers; align with that. |
|
||
| `maxTransformFeedbackBufferSize` | **(1ULL << 32) - 1** | Conservative 4 GiB cap; matches PanVk's general buffer size limits. |
|
||
| `maxTransformFeedbackStreamDataSize` | **512** | Conservative; per-stream max bytes of XFB output per vertex. |
|
||
| `maxTransformFeedbackBufferDataSize` | **512** | Same as above; per-buffer. |
|
||
| `maxTransformFeedbackBufferDataStride` | **2048** | Generous; per-stream stride between vertices in a buffer. |
|
||
| `transformFeedbackQueries` | **false** | Defer query support (VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_PRIMITIVES_WRITTEN_EXT) to a follow-up iter. Not needed for ANGLE-GLES3 emulation. |
|
||
| `transformFeedbackStreamsLinesTriangles` | **false** | Don't claim emit-from-GS support; we have no GS anyway. |
|
||
| `transformFeedbackRasterizationStreamSelect` | **false** | Multi-stream-specific; meaningless with 1 stream. |
|
||
| `transformFeedbackDraw` | **false** | `vkCmdDrawIndirectByteCountEXT` not implemented in v1. Apps that don't need pause/resume don't need this. |
|
||
|
||
Plus feature flags:
|
||
- `transformFeedback = true`
|
||
- `geometryStreams = false` (matches `transformFeedbackStreamsLinesTriangles = false`)
|
||
|
||
## Side-effect: `rasterizerDiscardEnable`
|
||
|
||
When an app does pure-XFB capture (no fragment output), it sets `VkPipelineRasterizationStateCreateInfo.rasterizerDiscardEnable = VK_TRUE`. PanVk needs to honor this — skip the tiler / frag job emission. Phase 4 should check current handling and wire it if absent.
|
||
|
||
## Locked design — implementation can begin
|
||
|
||
The 220-line implementation estimate from Phase 1 is unchanged.
|
||
|
||
## Phase 3 next
|
||
|
||
Write `iter13/probe_xfb.c` — minimal Vulkan probe doing:
|
||
1. Create vertex buffer with 3 vertices (just for the draw call shape; vertex inputs ignored).
|
||
2. Create vertex shader with one XFB output (e.g., `layout(xfb_buffer=0, xfb_offset=0) out vec4 captured;`).
|
||
3. Shader writes `gl_VertexIndex`-derived value to `captured`.
|
||
4. Create pipeline with `rasterizerDiscardEnable = VK_TRUE` (no rasterization).
|
||
5. Bind XFB buffer + begin/draw/end.
|
||
6. Read back buffer.
|
||
7. Verify: 3 vec4s with the expected values.
|
||
|
||
If this passes on patched Mesa, iter13 implementation is correct.
|