Files
marfrit a4e7d8ab90 initial seed: retrofit campaign lineage from local working trees
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:
- mesa-panvk-bifrost/   — r1..r4 era phase docs (iter1..iter18)
                          (libmali stub blobs at iter18/blob/ excluded
                          — 109MB of RE artifacts replaced with a README
                          pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/             — frozen .tgz source snapshots at each milestone
                          (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00

5.1 KiB
Raw Permalink Blame History

Phase 2 — situation analysis / design lock for iter13

Closed 2026-05-20. Resolves the 3 open questions from phase1_iter13_source_map.md.

Q1: Single shader variant or two?

Phase 1 noted that if the XFB-lowered shader has nir_store_global instructions, and we leave them unconditional, an XFB-inactive draw with xfb_address[i] = 0 would NULL-fault the GPU. Two options to resolve:

  • (B) Two compiled variants per shader (Panfrost-Gallium's approach): non-XFB variant + XFB variant. Select at draw time based on cmdbuf state.
  • (C) Single variant with runtime guard: wrap stores in if (xfb_address[i] != 0). Adds predictable branches.

Decision: (B) — two compiled variants.

Rationale:

  • Matches Panfrost-Gallium's well-validated pattern (oracle for the entire approach).
  • Safer against application misuse (binding XFB pipeline outside Begin/End block — the Vulkan spec forbids it, but we don't want a GPU fault for buggy apps).
  • Zero runtime overhead (no branches in the hot path).
  • Cost: ~2× shader compilation time + ~2× shader cache memory for XFB-bearing pipelines. Negligible — only affects shaders that declare XFB outputs, which is a small subset of all pipelines.

Implementation: in panvk_vX_shader.c, when compiling a vertex shader, detect shader->info.has_transform_feedback_varyings. If set, compile twice:

  1. Without pan_nir_lower_xfb → store in panvk_shader::regular_variant.
  2. With the standard nir_io_add_intrinsic_xfb_info + pan_nir_lower_xfb passes applied → store in panvk_shader::xfb_variant.

At draw time in jm/panvk_vX_cmd_draw.c, select the variant based on cmdbuf->state.gfx.xfb.active. The lifetime + memory management for the second variant mirrors the first.

Q2: num_vertices value — padded or actual?

Phase 1 noted ambiguity between PanVk's padded_vertex_count (used for attribute buffer sizing) and the Vulkan-spec'd actual vertex count for XFB.

Decision: vs.num_vertices = draw->info.vertex.count (the unpadded actual draw call count).

Rationale: Per Vulkan spec, XFB output index = instance_id * vertex_count + vertex_id, where vertex_count is the draw call's vertex count (the vertexCount arg of vkCmdDraw). NOT the internal padded count. Apps reading back the XFB buffer expect packed output, no padding holes.

The pan_nir_lower_xfb pass uses nir_load_num_vertices() directly in the index calculation (line 24-25 of pan_nir_lower_xfb.c), so whatever the driver provides is what the shader uses. We provide the unpadded value.

Q3: Property struct values for VkPhysicalDeviceTransformFeedbackPropertiesEXT

Phase 1 sketched conservative values. Reviewing per spec + ANGLE's actual requirements:

Property Decision Reason
maxTransformFeedbackStreams 1 GLES3 needs 1; multi-stream is GL 4.0+; ANGLE only requires 1 for GLES3 emulation. Bump later if a real workload needs it.
maxTransformFeedbackBuffers 4 Vulkan spec maximum is 4 separate XFB buffers; align with that.
maxTransformFeedbackBufferSize (1ULL << 32) - 1 Conservative 4 GiB cap; matches PanVk's general buffer size limits.
maxTransformFeedbackStreamDataSize 512 Conservative; per-stream max bytes of XFB output per vertex.
maxTransformFeedbackBufferDataSize 512 Same as above; per-buffer.
maxTransformFeedbackBufferDataStride 2048 Generous; per-stream stride between vertices in a buffer.
transformFeedbackQueries false Defer query support (VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_PRIMITIVES_WRITTEN_EXT) to a follow-up iter. Not needed for ANGLE-GLES3 emulation.
transformFeedbackStreamsLinesTriangles false Don't claim emit-from-GS support; we have no GS anyway.
transformFeedbackRasterizationStreamSelect false Multi-stream-specific; meaningless with 1 stream.
transformFeedbackDraw false vkCmdDrawIndirectByteCountEXT not implemented in v1. Apps that don't need pause/resume don't need this.

Plus feature flags:

  • transformFeedback = true
  • geometryStreams = false (matches transformFeedbackStreamsLinesTriangles = false)

Side-effect: rasterizerDiscardEnable

When an app does pure-XFB capture (no fragment output), it sets VkPipelineRasterizationStateCreateInfo.rasterizerDiscardEnable = VK_TRUE. PanVk needs to honor this — skip the tiler / frag job emission. Phase 4 should check current handling and wire it if absent.

Locked design — implementation can begin

The 220-line implementation estimate from Phase 1 is unchanged.

Phase 3 next

Write iter13/probe_xfb.c — minimal Vulkan probe doing:

  1. Create vertex buffer with 3 vertices (just for the draw call shape; vertex inputs ignored).
  2. Create vertex shader with one XFB output (e.g., layout(xfb_buffer=0, xfb_offset=0) out vec4 captured;).
  3. Shader writes gl_VertexIndex-derived value to captured.
  4. Create pipeline with rasterizerDiscardEnable = VK_TRUE (no rasterization).
  5. Bind XFB buffer + begin/draw/end.
  6. Read back buffer.
  7. Verify: 3 vec4s with the expected values.

If this passes on patched Mesa, iter13 implementation is correct.