panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
- mesa-panvk-bifrost/ — r1..r4 era phase docs (iter1..iter18)
(libmali stub blobs at iter18/blob/ excluded
— 109MB of RE artifacts replaced with a README
pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/ — frozen .tgz source snapshots at each milestone
(basis for the 0005 patch diff generation)
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
5.1 KiB
Phase 2 — situation analysis / design lock for iter13
Closed 2026-05-20. Resolves the 3 open questions from phase1_iter13_source_map.md.
Q1: Single shader variant or two?
Phase 1 noted that if the XFB-lowered shader has nir_store_global instructions, and we leave them unconditional, an XFB-inactive draw with xfb_address[i] = 0 would NULL-fault the GPU. Two options to resolve:
- (B) Two compiled variants per shader (Panfrost-Gallium's approach): non-XFB variant + XFB variant. Select at draw time based on cmdbuf state.
- (C) Single variant with runtime guard: wrap stores in
if (xfb_address[i] != 0). Adds predictable branches.
Decision: (B) — two compiled variants.
Rationale:
- Matches Panfrost-Gallium's well-validated pattern (oracle for the entire approach).
- Safer against application misuse (binding XFB pipeline outside Begin/End block — the Vulkan spec forbids it, but we don't want a GPU fault for buggy apps).
- Zero runtime overhead (no branches in the hot path).
- Cost: ~2× shader compilation time + ~2× shader cache memory for XFB-bearing pipelines. Negligible — only affects shaders that declare XFB outputs, which is a small subset of all pipelines.
Implementation: in panvk_vX_shader.c, when compiling a vertex shader, detect shader->info.has_transform_feedback_varyings. If set, compile twice:
- Without
pan_nir_lower_xfb→ store inpanvk_shader::regular_variant. - With the standard
nir_io_add_intrinsic_xfb_info+pan_nir_lower_xfbpasses applied → store inpanvk_shader::xfb_variant.
At draw time in jm/panvk_vX_cmd_draw.c, select the variant based on cmdbuf->state.gfx.xfb.active. The lifetime + memory management for the second variant mirrors the first.
Q2: num_vertices value — padded or actual?
Phase 1 noted ambiguity between PanVk's padded_vertex_count (used for attribute buffer sizing) and the Vulkan-spec'd actual vertex count for XFB.
Decision: vs.num_vertices = draw->info.vertex.count (the unpadded actual draw call count).
Rationale: Per Vulkan spec, XFB output index = instance_id * vertex_count + vertex_id, where vertex_count is the draw call's vertex count (the vertexCount arg of vkCmdDraw). NOT the internal padded count. Apps reading back the XFB buffer expect packed output, no padding holes.
The pan_nir_lower_xfb pass uses nir_load_num_vertices() directly in the index calculation (line 24-25 of pan_nir_lower_xfb.c), so whatever the driver provides is what the shader uses. We provide the unpadded value.
Q3: Property struct values for VkPhysicalDeviceTransformFeedbackPropertiesEXT
Phase 1 sketched conservative values. Reviewing per spec + ANGLE's actual requirements:
| Property | Decision | Reason |
|---|---|---|
maxTransformFeedbackStreams |
1 | GLES3 needs 1; multi-stream is GL 4.0+; ANGLE only requires 1 for GLES3 emulation. Bump later if a real workload needs it. |
maxTransformFeedbackBuffers |
4 | Vulkan spec maximum is 4 separate XFB buffers; align with that. |
maxTransformFeedbackBufferSize |
(1ULL << 32) - 1 | Conservative 4 GiB cap; matches PanVk's general buffer size limits. |
maxTransformFeedbackStreamDataSize |
512 | Conservative; per-stream max bytes of XFB output per vertex. |
maxTransformFeedbackBufferDataSize |
512 | Same as above; per-buffer. |
maxTransformFeedbackBufferDataStride |
2048 | Generous; per-stream stride between vertices in a buffer. |
transformFeedbackQueries |
false | Defer query support (VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_PRIMITIVES_WRITTEN_EXT) to a follow-up iter. Not needed for ANGLE-GLES3 emulation. |
transformFeedbackStreamsLinesTriangles |
false | Don't claim emit-from-GS support; we have no GS anyway. |
transformFeedbackRasterizationStreamSelect |
false | Multi-stream-specific; meaningless with 1 stream. |
transformFeedbackDraw |
false | vkCmdDrawIndirectByteCountEXT not implemented in v1. Apps that don't need pause/resume don't need this. |
Plus feature flags:
transformFeedback = truegeometryStreams = false(matchestransformFeedbackStreamsLinesTriangles = false)
Side-effect: rasterizerDiscardEnable
When an app does pure-XFB capture (no fragment output), it sets VkPipelineRasterizationStateCreateInfo.rasterizerDiscardEnable = VK_TRUE. PanVk needs to honor this — skip the tiler / frag job emission. Phase 4 should check current handling and wire it if absent.
Locked design — implementation can begin
The 220-line implementation estimate from Phase 1 is unchanged.
Phase 3 next
Write iter13/probe_xfb.c — minimal Vulkan probe doing:
- Create vertex buffer with 3 vertices (just for the draw call shape; vertex inputs ignored).
- Create vertex shader with one XFB output (e.g.,
layout(xfb_buffer=0, xfb_offset=0) out vec4 captured;). - Shader writes
gl_VertexIndex-derived value tocaptured. - Create pipeline with
rasterizerDiscardEnable = VK_TRUE(no rasterization). - Bind XFB buffer + begin/draw/end.
- Read back buffer.
- Verify: 3 vec4s with the expected values.
If this passes on patched Mesa, iter13 implementation is correct.