marfrit a4e7d8ab90 initial seed: retrofit campaign lineage from local working trees
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:
- mesa-panvk-bifrost/   — r1..r4 era phase docs (iter1..iter18)
                          (libmali stub blobs at iter18/blob/ excluded
                          — 109MB of RE artifacts replaced with a README
                          pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/             — frozen .tgz source snapshots at each milestone
                          (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00


# Phase 1 — source map for iter13 (VK_EXT_transform_feedback in PanVk)
Closed **2026-05-20**.
## Headline
The implementation surface is **much smaller than the initial estimate suggested**. Mesa already has the NIR lowering that turns XFB outputs into plain global stores (`pan_nir_lower_xfb`), and PanVk has a clean sysval-injection pattern (`load_sysval(b, graphics, bit_size, FIELD)`). Total new code: ~250-300 lines plus a probe.
## The `pan_nir_lower_xfb` contract (oracle)
`src/panfrost/compiler/pan_nir_lower_xfb.c` (85 lines, Collabora 2022) does:
```
For every nir_store_output with XFB metadata:
   Replace with nir_store_global at address:
      buf  = nir_load_xfb_address(b, 64, .base = buffer_slot)
      idx  = nir_load_instance_id * nir_load_num_vertices + nir_load_raw_vertex_id_pan
      addr = buf + (idx * stride) + offset
```
Plus: it replaces `nir_load_vertex_id` with `nir_load_raw_vertex_id_pan + nir_load_raw_vertex_offset_pan`. XFB programs index the buffer with a zero-based vertex ID, so the API-visible vertex ID is rebuilt by adding the draw's vertex offset back.
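As a sanity check on the indexing, the address computation can be modelled on the host. This is a sketch, not the pass itself: `xfb_store_address` and its parameters are hypothetical names, and it assumes `stride` and `offset` are already expressed in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of the address the lowered store targets for one
 * captured output. Names are illustrative, not Mesa API.
 *   buf_base     - value of load_xfb_address(.base = buffer slot)
 *   instance_id  - load_instance_id
 *   num_vertices - load_num_vertices (per-draw vertex count)
 *   vertex_id    - zero-based vertex id (load_raw_vertex_id_pan)
 *   stride       - per-vertex record stride of this buffer, in bytes
 *   offset       - byte offset of this output inside the record */
static uint64_t
xfb_store_address(uint64_t buf_base, uint32_t instance_id,
                  uint32_t num_vertices, uint32_t vertex_id,
                  uint32_t stride, uint32_t offset)
{
   /* Linear record index across all instances of the draw. */
   uint64_t idx = (uint64_t)instance_id * num_vertices + vertex_id;
   return buf_base + idx * stride + offset;
}
```

For example, with 3 vertices per instance and a 16-byte stride, instance 1's first vertex lands three records (48 bytes) past the buffer base, plus the output's own offset.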
The intrinsics the pass emits, and whether PanVk already handles them:

| Intrinsic | PanVk handles? | Notes |
|---|---|---|
| `nir_load_xfb_address(buffer=N)` | ❌ **NEW** | per-stream base address |
| `nir_load_num_vertices` | ❌ **NEW** | per-draw vertex count |
| `nir_load_raw_vertex_id_pan` | ✅ (panvk_vX_shader.c:211) | already wired |
| `nir_load_raw_vertex_offset_pan` | ✅ (panvk_vX_shader.c:101 — JM path) | already wired |
| `nir_load_instance_id` | ✅ standard Mesa | always available |

Only two new intrinsic handlers are needed.
## PanVk's sysval injection pattern (the wiring mechanism)
The driver-shader contract is `panvk_graphics_sysvals`: a struct the driver writes per draw and the shader reads through the FAU (Fast Access Uniform) push-constant area.
Definition: `src/panfrost/vulkan/panvk_shader.h:133-175`.
Existing pattern (for `vs.first_vertex`):
- **Struct field** (panvk_shader.h:154): `int32_t first_vertex;`
- **Shader lowering** (panvk_vX_shader.c:87-88):
  ```c
  case nir_intrinsic_load_first_vertex:
     val = load_sysval(b, graphics, bit_size, vs.first_vertex);
     break;
  ```
- **Driver populates** (jm/panvk_vX_cmd_draw.c:824):
  ```c
  set_gfx_sysval(cmdbuf, dirty_sysvals, vs.first_vertex, info->vertex.base);
  ```
Mirror this exactly for the two new fields:
- `vs.num_vertices` (uint32_t)
- `vs.xfb_address[4]` (aligned_u64 array; Vulkan only requires maxTransformFeedbackBuffers ≥ 1, but 4 matches the GLES 3.0 minimum for `GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS`)
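The driver half of the pattern can be modelled as "write the field, mark what changed". The sketch below is hypothetical: `model_sysvals` is a trimmed mirror (not the real `panvk_graphics_sysvals`), and tracking dirtiness per 64-bit word is an assumption about how `set_gfx_sysval` uses `dirty_sysvals`, not something this source map confirms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed mirror of the vs sysvals discussed here. */
struct model_sysvals {
   struct {
      int32_t raw_vertex_offset;
      uint32_t num_vertices;
      _Alignas(8) uint64_t xfb_address[4]; /* stands in for aligned_u64 */
      int32_t first_vertex;
      int32_t base_instance;
   } vs;
};

/* Mark every 64-bit word overlapped by [offset, offset + size) as dirty.
 * Word granularity is an assumption of this model. size must be > 0. */
static void
mark_dirty_words(uint64_t *dirty_words, size_t offset, size_t size)
{
   for (size_t w = offset / 8; w <= (offset + size - 1) / 8; w++)
      *dirty_words |= UINT64_C(1) << w;
}

/* Write a field only if its value changed, then flag the words it covers
 * so only those get re-uploaded. */
#define MODEL_SET_SYSVAL(sv, dirty, field, value)                           \
   do {                                                                     \
      if ((sv)->field != (value)) {                                         \
         (sv)->field = (value);                                             \
         mark_dirty_words((dirty), offsetof(struct model_sysvals, field),   \
                          sizeof((sv)->field));                             \
      }                                                                     \
   } while (0)
```

In this model the shader and the driver agree only through the field's byte offset in the shared struct, which is why mirroring the existing pattern needs no other plumbing for the two new fields.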
## Implementation skeleton
### A. Extension + feature exposure (panvk_vX_physical_device.c)
Around line 91 (KHR_robustness2 block):
```c
.EXT_transform_feedback = PAN_ARCH < 9, // JM-class only for now
```
At feature block (~line 491):
```c
/* VK_EXT_transform_feedback */
.transformFeedback = PAN_ARCH < 9,
.geometryStreams = false, /* No GS support yet */
```
At properties block (~line 1019):
```c
/* VK_EXT_transform_feedback */
.maxTransformFeedbackStreams = 1, /* Up the limit if multi-stream needed; 1 is GLES3 baseline */
.maxTransformFeedbackBuffers = 4,
.maxTransformFeedbackBufferSize = UINT32_MAX,
.maxTransformFeedbackStreamDataSize = 512,
.maxTransformFeedbackBufferDataSize = 512,
.maxTransformFeedbackBufferDataStride = 2048,
.transformFeedbackQueries = false, /* Start without; defer to follow-up iter */
.transformFeedbackStreamsLinesTriangles = false,
.transformFeedbackRasterizationStreamSelect = false,
.transformFeedbackDraw = false, /* No vkCmdDrawIndirectByteCountEXT yet */
```
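Open question #3 (spec compliance of these values) can be partly mechanised. The sketch checks the proposed numbers against the minimums as we read them in the Vulkan spec's Required Limits table for VK_EXT_transform_feedback (1, 1, 2^27, 512, 512, 512). Treat those minimums as an assumption to re-verify; `proposed_xfb_props` is a hand-made mirror, not the Vulkan properties struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values proposed in section A, copied by hand. */
struct proposed_xfb_props {
   uint32_t max_streams;
   uint32_t max_buffers;
   uint64_t max_buffer_size;
   uint32_t max_stream_data_size;
   uint32_t max_buffer_data_size;
   uint32_t max_buffer_data_stride;
};

/* Minimums as read from the spec's Required Limits table (assumption:
 * re-check against the current spec revision before relying on this). */
static bool
meets_required_minimums(const struct proposed_xfb_props *p)
{
   return p->max_streams >= 1 &&
          p->max_buffers >= 1 &&
          p->max_buffer_size >= (UINT64_C(1) << 27) &&
          p->max_stream_data_size >= 512 &&
          p->max_buffer_data_size >= 512 &&
          p->max_buffer_data_stride >= 512;
}
```

Note that `maxTransformFeedbackStreamDataSize` and `maxTransformFeedbackBufferDataSize` sit exactly at the minimum, so there is no headroom if ANGLE or a test needs more.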
### B. Sysval struct fields (panvk_shader.h)
Add to the `vs` substruct (lines 150-157), only for `PAN_ARCH < 9`:
```c
struct {
#if PAN_ARCH < 9
   int32_t raw_vertex_offset;
   uint32_t num_vertices;      /* NEW iter13: XFB needs per-draw vertex count */
   aligned_u64 xfb_address[4]; /* NEW iter13: 4 transform feedback buffer base addresses */
#endif
   int32_t first_vertex;
   int32_t base_instance;
   uint32_t noperspective_varyings;
} vs;
```
(Use `#if PAN_ARCH < 9` since we're not yet supporting Valhall-CSF; can extend later.)
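A quick layout check for the proposed `PAN_ARCH < 9` substruct. It assumes `aligned_u64` means an 8-byte-aligned `uint64_t` (modelled here with `_Alignas(8)`); the struct is a standalone mirror, not the header itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the proposed vs substruct (PAN_ARCH < 9 case only). */
struct model_vs_sysvals {
   int32_t raw_vertex_offset;
   uint32_t num_vertices;
   _Alignas(8) uint64_t xfb_address[4]; /* assumption: = aligned_u64 */
   int32_t first_vertex;
   int32_t base_instance;
   uint32_t noperspective_varyings;
};

/* num_vertices fills the 4 bytes after raw_vertex_offset, so the u64 array
 * starts on an 8-byte boundary with no hidden padding. */
static_assert(offsetof(struct model_vs_sysvals, num_vertices) == 4, "");
static_assert(offsetof(struct model_vs_sysvals, xfb_address) == 8, "");
static_assert(offsetof(struct model_vs_sysvals, first_vertex) == 40, "");
static_assert(sizeof(struct model_vs_sysvals) == 56, "");
```

On this model the new fields occupy bytes 4 to 40, and every later `vs` field shifts. That is harmless as long as the shader lowering and the driver both address fields through the same struct definition.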
### C. Shader-side intrinsic lowering (panvk_vX_shader.c)
Add cases ~line 103 (inside `PAN_ARCH < 9` block):
```c
#if PAN_ARCH < 9
case nir_intrinsic_load_num_vertices:
   val = load_sysval(b, graphics, bit_size, vs.num_vertices);
   break;
case nir_intrinsic_load_xfb_address: {
   unsigned idx = nir_intrinsic_base(intr);
   assert(idx < 4);
   val = load_sysval(b, graphics, bit_size, vs.xfb_address[idx]);
   break;
}
#endif
```
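`load_sysval(b, graphics, bit_size, vs.xfb_address[idx])` puts a runtime `idx` inside the field path. If, as the field-path argument suggests, `load_sysval` resolves fields with `offsetof`, this relies on GCC and Clang accepting a non-constant array index in `offsetof`, which ISO C does not guarantee. A portable way to get the same byte offset is the constant base plus `idx` times the element size; the helper and mirror struct below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fields that matter for the offset math. */
struct model_vs_sysvals {
   int32_t raw_vertex_offset;
   uint32_t num_vertices;
   _Alignas(8) uint64_t xfb_address[4];
   int32_t first_vertex;
};

/* Byte offset of xfb_address[idx] using only a constant offsetof. */
static size_t
xfb_address_sysval_offset(unsigned idx)
{
   assert(idx < 4);
   return offsetof(struct model_vs_sysvals, xfb_address) +
          idx * sizeof(((struct model_vs_sysvals *)NULL)->xfb_address[0]);
}
```

The `sizeof` operand is never evaluated, so the NULL cast is only used for its type.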
### D. NIR lowering chain integration (panvk_vX_shader.c, somewhere in pipeline-compile path)
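Once I/O has been lowered, run `nir_io_add_intrinsic_xfb_info` (which attaches the XFB layout to each `store_output`) and then `pan_nir_lower_xfb`. Both must run BEFORE the PanVk descriptor lowering, and before the sysval lowering from section C, since that lowering is what resolves the `load_xfb_address` / `load_num_vertices` intrinsics the pass emits:
```c
if (nir->info.stage == MESA_SHADER_VERTEX &&
    nir->info.has_transform_feedback_varyings) {
   NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
   NIR_PASS(_, nir, pan_nir_lower_xfb);
}
```
Place this near the existing pan_preprocess_nir() call (panvk_vX_shader.c:509).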
### E. Per-draw sysval population (jm/panvk_vX_cmd_draw.c)
After existing vs.first_vertex / vs.raw_vertex_offset sets (line ~828):
```c
set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices,
               info->vertex.count); /* unpadded count: see open question #2 */
const struct panvk_xfb_state *xfb = &cmdbuf->state.gfx.xfb;
for (unsigned i = 0; i < 4; i++) {
   uint64_t addr = (xfb->active && i < xfb->buffer_count)
                      ? (xfb->buffers[i].addr + xfb->buffers[i].offset)
                      : 0;
   set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[i], addr);
}
```
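The loop above treats bindings as `0..buffer_count-1`, but vkCmdBindTransformFeedbackBuffersEXT takes a `firstBinding`, and draws inside one Begin/End block append to the buffers rather than restart at byte 0. Below is a sketch of state that covers both, using a bound mask and a running byte count per buffer. All `model_*` names are hypothetical, and CPU-side tracking like this only works when the draw's vertex count is known at record time (direct draws, no resume from counter buffers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_XFB_BUFFERS 4

/* Hypothetical variant of panvk_xfb_state: a bound mask instead of a
 * count, plus a running byte count per buffer. */
struct model_xfb_state {
   bool active;
   uint32_t bound_mask; /* bit i set => buffers[i] is bound */
   struct {
      uint64_t addr;    /* buffer base GPU VA */
      uint64_t offset;  /* bind-time offset */
      uint64_t written; /* bytes appended since Begin */
   } buffers[MODEL_MAX_XFB_BUFFERS];
};

/* Value to feed vs.xfb_address[i] for the next draw; 0 when unused. */
static uint64_t
model_xfb_draw_address(const struct model_xfb_state *xfb, unsigned i)
{
   if (!xfb->active || !(xfb->bound_mask & (1u << i)))
      return 0;
   return xfb->buffers[i].addr + xfb->buffers[i].offset +
          xfb->buffers[i].written;
}

/* After a draw, advance every bound buffer by vertices * stride so the
 * next draw in the same Begin/End block appends instead of overwriting. */
static void
model_xfb_advance(struct model_xfb_state *xfb, uint32_t vertices_captured,
                  const uint32_t stride[MODEL_MAX_XFB_BUFFERS])
{
   if (!xfb->active)
      return;
   for (unsigned i = 0; i < MODEL_MAX_XFB_BUFFERS; i++) {
      if (xfb->bound_mask & (1u << i))
         xfb->buffers[i].written += (uint64_t)vertices_captured * stride[i];
   }
}
```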
### F. Command buffer state (panvk_cmd_draw.h or new file)
Add to the per-cmdbuf graphics state:
```c
struct panvk_xfb_state {
   bool active;           /* Between vkCmdBeginTransformFeedbackEXT and vkCmdEndTransformFeedbackEXT */
   unsigned buffer_count; /* From vkCmdBindTransformFeedbackBuffersEXT */
   struct {
      uint64_t addr;   /* gpu_va of the buffer base */
      uint64_t offset; /* user-supplied offset */
      uint64_t size;   /* user-supplied size, or VK_WHOLE_SIZE */
   } buffers[4];
};
```
### G. Vulkan command handlers (new file: jm/panvk_vX_cmd_xfb.c)
```c
VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdBindTransformFeedbackBuffersEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstBinding, uint32_t bindingCount,
   const VkBuffer *pBuffers, const VkDeviceSize *pOffsets,
   const VkDeviceSize *pSizes)
{
   /* Stash addresses/offsets/sizes in cmdbuf->state.gfx.xfb.buffers[],
    * starting at slot firstBinding. pSizes may be NULL, in which case each
    * binding extends to the end of its buffer (same as VK_WHOLE_SIZE). */
}

VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdBeginTransformFeedbackEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstCounterBuffer,
   uint32_t counterBufferCount,
   const VkBuffer *pCounterBuffers,
   const VkDeviceSize *pCounterBufferOffsets)
{
   /* Set cmdbuf->state.gfx.xfb.active = true; mark sysvals dirty;
    * if counter buffers are supplied, read them and adjust the internal
    * byte counter (resume case). The counter values are GPU-written, so
    * this read has to happen at execution time, not at record time. */
}

VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdEndTransformFeedbackEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstCounterBuffer,
   uint32_t counterBufferCount,
   const VkBuffer *pCounterBuffers,
   const VkDeviceSize *pCounterBufferOffsets)
{
   /* Set active = false; if counter buffers are supplied, write the byte
    * counter back (pause case). */
}
```
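The Bind handler's bookkeeping is simple enough to model outside the driver. This sketch uses hypothetical `model_*` types (`MODEL_WHOLE_SIZE` has the same value as `VK_WHOLE_SIZE`), places bindings at `firstBinding + i`, and resolves a NULL `pSizes` or a `VK_WHOLE_SIZE` entry to the rest of the buffer after the bind offset, which is what the Vulkan spec prescribes for those cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_XFB_BUFFERS 4
#define MODEL_WHOLE_SIZE (~0ULL) /* same value as VK_WHOLE_SIZE */

/* Hypothetical stand-in for a bound VkBuffer. */
struct model_buffer {
   uint64_t gpu_va;
   uint64_t size;
};

struct model_xfb_bindings {
   uint32_t bound_mask;
   struct {
      uint64_t addr, offset, size;
   } buffers[MODEL_MAX_XFB_BUFFERS];
};

/* Model of the Bind handler body. */
static void
model_bind_xfb_buffers(struct model_xfb_bindings *s, uint32_t first_binding,
                       uint32_t binding_count,
                       const struct model_buffer *const *buffers,
                       const uint64_t *offsets, const uint64_t *sizes)
{
   assert(first_binding + binding_count <= MODEL_MAX_XFB_BUFFERS);
   for (uint32_t i = 0; i < binding_count; i++) {
      uint32_t slot = first_binding + i;
      uint64_t size = sizes ? sizes[i] : MODEL_WHOLE_SIZE;
      if (size == MODEL_WHOLE_SIZE)
         size = buffers[i]->size - offsets[i]; /* rest of the buffer */
      s->buffers[slot].addr = buffers[i]->gpu_va;
      s->buffers[slot].offset = offsets[i];
      s->buffers[slot].size = size;
      s->bound_mask |= 1u << slot;
   }
}
```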
### H. meson.build registration
Add `jm/panvk_vX_cmd_xfb.c` to the JM file list in `src/panfrost/vulkan/meson.build`.
### I. rasterizerDiscardEnable
Honor `VkPipelineRasterizationStateCreateInfo.rasterizerDiscardEnable` if not already — apps doing pure-XFB capture set this. Skip the rasterizer + frag job emission when set. Check existing PanVk JM pipeline code; this may already work.
## Open questions / risks
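1. **Counter buffer semantics.** The counter buffers passed to vkCmdBeginTransformFeedbackEXT / vkCmdEndTransformFeedbackEXT let apps PAUSE/RESUME XFB across command buffers. Initial implementation: ignore them. Note that `transformFeedbackDraw = false` only tells apps that vkCmdDrawIndirectByteCountEXT is unavailable; counter buffers in Begin/End are not gated by any feature bit, so ignoring them is a known gap to close in a follow-up, not something the property lets us opt out of.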
2. **Padded vertex count vs actual vertex count.** PanVk computes a `padded_vertex_count` because the hardware's instanced addressing needs a padded per-instance vertex count. For XFB, the conceptual "num_vertices" is the actual draw-call count, not the padded one: the record index is `instance_id * num_vertices + vertex_id`, so a padded value would leave gaps between instances. Need to make sure `vs.num_vertices = draw->info.vertex.count` (or an equivalent unpadded value), not `padded_vertex_count`. CHECK THIS in implementation.
3. **`maxTransformFeedbackStreams = 1` is tight.** GLES3 needs only 1 stream; multi-stream is GL 4.0+ and ANGLE may not require it. Confirm via ANGLE's required-features list.
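4. **NIR pass ordering.** `pan_nir_lower_xfb` must run after I/O lowering (it rewrites `store_output` intrinsics) and BEFORE the PanVk descriptor lowering, which assumes only certain intrinsics survive. It must also precede the sysval lowering that resolves the `load_xfb_address` / `load_num_vertices` intrinsics it emits. Put it right after `nir_lower_system_values`.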
5. **Shader compilation: single variant or two?** Panfrost-Gallium compiles two variants (regular + XFB). A single XFB-lowered variant has two problems in PanVk. First, the XFB stores run even when the pipeline is used for non-XFB draws: with `xfb.active = false` every `xfb_address[i]` is 0, so the global stores would hit NULL and fault, and Vulkan allows binding an XFB pipeline outside an XFB block, so this case is reachable. Second, if the pass removes the original `store_output` (as "Replace with" above implies), the lowered shader no longer feeds varyings to the rasterizer. Options: (a) compile two variants like Gallium does, or (b) guard the stores in the shader (e.g. skip them when the buffer address is 0), which only addresses the first problem. **Resolution**: probably compile two variants. Defer to Phase 2 design check.
6. **Coverage probe.** Phase 3 probe should exercise: single buffer write, single stream, single vertex, single triangle, verify byte-exact output.
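Open question #2 can be illustrated with a small host-side model of the record index `instance_id * num_vertices + vertex_id`. With 3 vertices and 2 instances, feeding a padded count (4 here, a hypothetical value; PanVk's real padding rule is not modelled) leaves slot 3 empty and pushes the last vertex past the 6 records the app sized the buffer for. The model assumes only the draw's real vertices execute the store.

```c
#include <assert.h>
#include <stdint.h>

/* Record index the lowered shader computes. */
static uint32_t
xfb_index(uint32_t instance_id, uint32_t num_vertices, uint32_t vertex_id)
{
   return instance_id * num_vertices + vertex_id;
}

/* Number of the first count * instances record slots left unwritten when
 * the shader is fed num_vertices_sysval as its per-instance count. */
static uint32_t
unwritten_slots(uint32_t count, uint32_t instances,
                uint32_t num_vertices_sysval)
{
   uint8_t written[64] = {0};
   uint32_t total = count * instances;
   assert(total <= 64);
   for (uint32_t inst = 0; inst < instances; inst++) {
      for (uint32_t v = 0; v < count; v++) {
         uint32_t idx = xfb_index(inst, num_vertices_sysval, v);
         if (idx < total) /* records past the end are lost or overflow */
            written[idx] = 1;
      }
   }
   uint32_t holes = 0;
   for (uint32_t i = 0; i < total; i++)
      holes += !written[i];
   return holes;
}
```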
## Files-list summary
| Change | File | Lines (est) |
|---|---|---|
| Expose extension | `src/panfrost/vulkan/panvk_vX_physical_device.c` | +15 |
| Sysval struct | `src/panfrost/vulkan/panvk_shader.h` | +6 |
| Shader lowering | `src/panfrost/vulkan/panvk_vX_shader.c` | +15 |
| NIR pass wiring | `src/panfrost/vulkan/panvk_vX_shader.c` | +6 |
| Cmd state | `src/panfrost/vulkan/panvk_cmd_draw.h` | +15 |
| Sysval populate | `src/panfrost/vulkan/jm/panvk_vX_cmd_draw.c` | +15 |
| New cmd handlers | `src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c` (NEW) | +150 |
| Meson | `src/panfrost/vulkan/meson.build` | +1 |
| **Total Mesa side** | | **~220 lines** |
| Probe | `iter13/probe_xfb.c` (NEW in campaign) | +400 |
| Probe shader | `iter13/probe_xfb.vert` (NEW) | +20 |
| **Total probe side** | | **~420 lines** |
## Phase 1 verdict
Implementation scope is **bounded and tractable** — well-defined surface, all building blocks present, no Bifrost RE needed. Phase 2 (situation analysis) should validate:
1. The single-variant-vs-two-variants question (open question #5 above)
2. The padded_vertex_count question (open question #2)
3. Spec compliance check on the property values (open question #3)
Then Phase 3 writes the probe, Phase 4 implements.
## Reference
- pan_nir_lower_xfb.c (85 lines, full read above)
- panvk_shader.h:133-175 (graphics_sysvals struct)
- panvk_vX_shader.c:87-103 (sysval lowering pattern)
- jm/panvk_vX_cmd_draw.c:824-830 (per-draw sysval population)
- Panfrost-Gallium oracle: src/gallium/drivers/panfrost/pan_shader.c:125-130, 593-603