Files
marfrit a4e7d8ab90 initial seed: retrofit campaign lineage from local working trees
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:
- mesa-panvk-bifrost/   — r1..r4 era phase docs (iter1..iter18)
                          (libmali stub blobs at iter18/blob/ excluded
                          — 109MB of RE artifacts replaced with a README
                          pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/             — frozen .tgz source snapshots at each milestone
                          (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00

77 lines
5.1 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 2 — situation analysis / design lock for iter13
Closed **2026-05-20**. Resolves the 3 open questions from [phase1_iter13_source_map.md](phase1_iter13_source_map.md).
## Q1: Single shader variant or two?
Phase 1 noted that if the XFB-lowered shader has `nir_store_global` instructions, and we leave them unconditional, an XFB-inactive draw with `xfb_address[i] = 0` would NULL-fault the GPU. Two options to resolve:
- **(B) Two compiled variants per shader** (Panfrost-Gallium's approach): non-XFB variant + XFB variant. Select at draw time based on cmdbuf state.
- **(C) Single variant with runtime guard**: wrap stores in `if (xfb_address[i] != 0)`. Adds predictable branches.
**Decision: (B) — two compiled variants.**
Rationale:
- Matches Panfrost-Gallium's well-validated pattern (oracle for the entire approach).
- Safer against application misuse (binding XFB pipeline outside Begin/End block — the Vulkan spec forbids it, but we don't want a GPU fault for buggy apps).
- Zero runtime overhead (no branches in the hot path).
- Cost: ~2× shader compilation time + ~2× shader cache memory for XFB-bearing pipelines. Negligible — only affects shaders that declare XFB outputs, which is a small subset of all pipelines.
Implementation: in `panvk_vX_shader.c`, when compiling a vertex shader, detect `shader->info.has_transform_feedback_varyings`. If set, compile twice:
1. Without `pan_nir_lower_xfb` → store in `panvk_shader::regular_variant`.
2. With the standard `nir_io_add_intrinsic_xfb_info` + `pan_nir_lower_xfb` passes applied → store in `panvk_shader::xfb_variant`.
At draw time in `jm/panvk_vX_cmd_draw.c`, select the variant based on `cmdbuf->state.gfx.xfb.active`. The lifetime + memory management for the second variant mirrors the first.
## Q2: `num_vertices` value — padded or actual?
Phase 1 noted ambiguity between PanVk's `padded_vertex_count` (used for attribute buffer sizing) and the Vulkan-spec'd actual vertex count for XFB.
**Decision: `vs.num_vertices = draw->info.vertex.count`** (the unpadded actual draw call count).
Rationale: Per Vulkan spec, XFB output index = `instance_id * vertex_count + vertex_id`, where `vertex_count` is the draw call's vertex count (the `vertexCount` arg of `vkCmdDraw`). NOT the internal padded count. Apps reading back the XFB buffer expect packed output, no padding holes.
The `pan_nir_lower_xfb` pass uses `nir_load_num_vertices()` directly in the index calculation (line 24-25 of pan_nir_lower_xfb.c), so whatever the driver provides is what the shader uses. We provide the unpadded value.
## Q3: Property struct values for `VkPhysicalDeviceTransformFeedbackPropertiesEXT`
Phase 1 sketched conservative values. Reviewing per spec + ANGLE's actual requirements:
| Property | Decision | Reason |
|---|---|---|
| `maxTransformFeedbackStreams` | **1** | GLES3 needs 1; multi-stream is GL 4.0+; ANGLE only requires 1 for GLES3 emulation. Bump later if a real workload needs it. |
| `maxTransformFeedbackBuffers` | **4** | Vulkan spec maximum is 4 separate XFB buffers; align with that. |
| `maxTransformFeedbackBufferSize` | **(1ULL << 32) - 1** | Conservative 4 GiB cap; matches PanVk's general buffer size limits. |
| `maxTransformFeedbackStreamDataSize` | **512** | Conservative; per-stream max bytes of XFB output per vertex. |
| `maxTransformFeedbackBufferDataSize` | **512** | Same as above; per-buffer. |
| `maxTransformFeedbackBufferDataStride` | **2048** | Generous; per-stream stride between vertices in a buffer. |
| `transformFeedbackQueries` | **false** | Defer query support (VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_PRIMITIVES_WRITTEN_EXT) to a follow-up iter. Not needed for ANGLE-GLES3 emulation. |
| `transformFeedbackStreamsLinesTriangles` | **false** | Don't claim emit-from-GS support; we have no GS anyway. |
| `transformFeedbackRasterizationStreamSelect` | **false** | Multi-stream-specific; meaningless with 1 stream. |
| `transformFeedbackDraw` | **false** | `vkCmdDrawIndirectByteCountEXT` not implemented in v1. Apps that don't need pause/resume don't need this. |
Plus feature flags:
- `transformFeedback = true`
- `geometryStreams = false` (matches `transformFeedbackStreamsLinesTriangles = false`)
## Side-effect: `rasterizerDiscardEnable`
When an app does pure-XFB capture (no fragment output), it sets `VkPipelineRasterizationStateCreateInfo.rasterizerDiscardEnable = VK_TRUE`. PanVk needs to honor this — skip the tiler / frag job emission. Phase 4 should check current handling and wire it if absent.
## Locked design — implementation can begin
The 220-line implementation estimate from Phase 1 is unchanged.
## Phase 3 next
Write `iter13/probe_xfb.c` — minimal Vulkan probe doing:
1. Create vertex buffer with 3 vertices (just for the draw call shape; vertex inputs ignored).
2. Create vertex shader with one XFB output (e.g., `layout(xfb_buffer=0, xfb_offset=0) out vec4 captured;`).
3. Shader writes `gl_VertexIndex`-derived value to `captured`.
4. Create pipeline with `rasterizerDiscardEnable = VK_TRUE` (no rasterization).
5. Bind XFB buffer + begin/draw/end.
6. Read back buffer.
7. Verify: 3 vec4s with the expected values.
If this passes on patched Mesa, iter13 implementation is correct.