initial seed: retrofit campaign lineage from local working trees
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
  - mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18);
    libmali stub blobs at iter18/blob/ excluded (109MB of RE
    artifacts replaced with a README pointer)
  - mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
  - evidence/: frozen .tgz source snapshots at each milestone
    (basis for the 0005 patch diff generation)
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,257 @@
# Phase 1: source map for iter13 (VK_EXT_transform_feedback in PanVk)

Closed **2026-05-20**.

## Headline

The implementation surface is **much smaller than the initial estimate suggested**. Mesa already has the hardware-side abstraction (`pan_nir_lower_xfb`) and PanVk has a clean sysval-injection pattern (`load_sysval(b, graphics, bit_size, FIELD)`). Total new code: ~250-300 lines + a probe.

## The `pan_nir_lower_xfb` contract (oracle)

`src/panfrost/compiler/pan_nir_lower_xfb.c` (85 lines, Collabora 2022) does:

```
For every nir_store_output with XFB metadata:
  Replace with nir_store_global at address:
    buf  = nir_load_xfb_address(b, 64, .base = buffer_slot)
    idx  = nir_load_instance_id * nir_load_num_vertices + nir_load_raw_vertex_id_pan
    addr = buf + (idx * stride) + offset
```

Plus: replaces `nir_load_vertex_id` with `nir_load_raw_vertex_id_pan + nir_load_raw_vertex_offset_pan` (XFB programs run with a zero-based raw vertex ID so that buffer indexing is correct, so the API-visible vertex ID has to be rebuilt by adding the offset back).
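
The address arithmetic above can be checked on the CPU with a small model. This is a minimal sketch, not the NIR pass itself: `xfb_draw_inputs` and `xfb_store_address` are hypothetical names, and doing the index math in 64 bits is an assumption made here to keep the illustration free of overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU-side mirror of the two per-draw values the lowered
 * shader reads; not a Mesa type. */
struct xfb_draw_inputs {
   uint64_t buf;          /* value behind nir_load_xfb_address for one slot */
   uint32_t num_vertices; /* value behind nir_load_num_vertices */
};

/* Address of one captured output, following the pseudocode:
 *   idx  = instance_id * num_vertices + raw_vertex_id
 *   addr = buf + idx * stride + offset
 */
static uint64_t
xfb_store_address(const struct xfb_draw_inputs *in, uint32_t instance_id,
                  uint32_t raw_vertex_id, uint32_t stride, uint32_t offset)
{
   uint64_t idx = (uint64_t)instance_id * in->num_vertices + raw_vertex_id;

   return in->buf + idx * stride + offset;
}
```

With `num_vertices = 3` and a 16-byte stride, vertex 0 of instance 1 lands 48 bytes past the buffer base, directly after the three records of instance 0.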

The intrinsics the pass uses, and PanVk's current handling:

| Intrinsic | PanVk handles? | Notes |
|---|---|---|
| `nir_load_xfb_address` (`.base = N`) | ❌ **NEW** | per-buffer base address |
| `nir_load_num_vertices` | ❌ **NEW** | per-draw vertex count |
| `nir_load_raw_vertex_id_pan` | ✅ (panvk_vX_shader.c:211) | already wired |
| `nir_load_raw_vertex_offset_pan` | ✅ (panvk_vX_shader.c:101, JM path) | already wired |
| `nir_load_instance_id` | ✅ standard Mesa | always available |

Only 2 new intrinsic handlers are needed.

## PanVk's sysval injection pattern (the wiring mechanism)

The driver-shader contract is `panvk_graphics_sysvals`: a struct that the driver writes per draw and the shader reads via the FAU (Fast Access Uniform) push-constant area.

Definition: `src/panfrost/vulkan/panvk_shader.h:133-175`.

Existing pattern (for `vs.first_vertex`):

- **Struct field** (panvk_shader.h:154): `int32_t first_vertex;`
- **Shader lowering** (panvk_vX_shader.c:87-88):
  ```c
  case nir_intrinsic_load_first_vertex:
     val = load_sysval(b, graphics, bit_size, vs.first_vertex);
     break;
  ```
- **Driver populates** (jm/panvk_vX_cmd_draw.c:824):
  ```c
  set_gfx_sysval(cmdbuf, dirty_sysvals, vs.first_vertex, info->vertex.base);
  ```

Mirror this exactly for the two new fields:

- `vs.num_vertices` (uint32_t)
- `vs.xfb_address[4]` (aligned_u64 array; the Vulkan spec requires maxTransformFeedbackBuffers ≥ 1, recommended 4)
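
The struct-field / shader-load pairing can be modelled in plain C. The sketch below assumes the shader-side load boils down to "read N bytes at the field's byte offset in the push area"; that reading of `load_sysval`, the simplified `demo_graphics_sysvals` layout, and every `demo_*` name are assumptions for illustration, not PanVk code. It also assumes a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for panvk_graphics_sysvals. */
struct demo_graphics_sysvals {
   struct {
      int32_t raw_vertex_offset;
      uint32_t num_vertices;   /* proposed new field */
      uint64_t xfb_address[4]; /* proposed new field (aligned_u64 in Mesa) */
      int32_t first_vertex;
   } vs;
};

/* Byte offset of a field, the key shared by the driver and shader sides. */
#define DEMO_SYSVAL_OFFSET(field) offsetof(struct demo_graphics_sysvals, field)

/* Offset of one XFB base address, usable with a runtime slot index. */
static size_t
demo_xfb_address_offset(unsigned slot)
{
   return DEMO_SYSVAL_OFFSET(vs.xfb_address) + slot * sizeof(uint64_t);
}

/* Model of the shader-side read: `size` bytes at `offset` in the push area.
 * Copying into the low bytes of a zeroed uint64_t assumes little-endian. */
static uint64_t
demo_load_sysval(const void *push_area, size_t offset, size_t size)
{
   uint64_t value = 0;

   memcpy(&value, (const uint8_t *)push_area + offset, size);
   return value;
}
```

Placing the 64-bit array after two 32-bit fields keeps it 8-byte aligned in this model, which is the property the proposed layout in section B relies on.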

## Implementation skeleton

### A. Extension + feature exposure (panvk_vX_physical_device.c)

Around line 91 (KHR_robustness2 block):
```c
   .EXT_transform_feedback = PAN_ARCH < 9, // JM-class only for now
```

At feature block (~line 491):
```c
   /* VK_EXT_transform_feedback */
   .transformFeedback = PAN_ARCH < 9,
   .geometryStreams = false, /* No GS support yet */
```

At properties block (~line 1019):
```c
   /* VK_EXT_transform_feedback */
   .maxTransformFeedbackStreams = 1, /* Up the limit if multi-stream needed; 1 is GLES3 baseline */
   .maxTransformFeedbackBuffers = 4,
   .maxTransformFeedbackBufferSize = UINT32_MAX,
   .maxTransformFeedbackStreamDataSize = 512,
   .maxTransformFeedbackBufferDataSize = 512,
   .maxTransformFeedbackBufferDataStride = 2048,
   .transformFeedbackQueries = false, /* Start without; defer to follow-up iter */
   .transformFeedbackStreamsLinesTriangles = false,
   .transformFeedbackRasterizationStreamSelect = false,
   .transformFeedbackDraw = false, /* No vkCmdDrawIndirectByteCountEXT yet */
```

### B. Sysval struct fields (panvk_shader.h)

Add to the `vs` substruct at line 150-157, only for `PAN_ARCH < 9`:
```c
struct {
#if PAN_ARCH < 9
   int32_t raw_vertex_offset;
   uint32_t num_vertices;      /* NEW iter13: XFB needs per-draw vertex count */
   aligned_u64 xfb_address[4]; /* NEW iter13: 4 transform feedback buffer base addresses */
#endif
   int32_t first_vertex;
   int32_t base_instance;
   uint32_t noperspective_varyings;
} vs;
```

(Use `#if PAN_ARCH < 9` since we're not yet supporting Valhall-CSF; can extend later.)

### C. Shader-side intrinsic lowering (panvk_vX_shader.c)

Add cases ~line 103 (inside `PAN_ARCH < 9` block):
```c
#if PAN_ARCH < 9
   case nir_intrinsic_load_num_vertices:
      val = load_sysval(b, graphics, bit_size, vs.num_vertices);
      break;
   case nir_intrinsic_load_xfb_address: {
      unsigned idx = nir_intrinsic_base(intr);
      assert(idx < 4);
      val = load_sysval(b, graphics, bit_size, vs.xfb_address[idx]);
      break;
   }
#endif
```

### D. NIR lowering chain integration (panvk_vX_shader.c, somewhere in pipeline-compile path)

Run the standard `nir_io_add_intrinsic_xfb_info` pass followed by `pan_nir_lower_xfb`, BEFORE the panvk descriptor lowering:
```c
if (nir->info.stage == MESA_SHADER_VERTEX &&
    nir->info.has_transform_feedback_varyings) {
   NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
   NIR_PASS(_, nir, pan_nir_lower_xfb);
}
```

Place this near the existing pan_preprocess_nir() call (panvk_vX_shader.c:509).

### E. Per-draw sysval population (jm/panvk_vX_cmd_draw.c)

After existing vs.first_vertex / vs.raw_vertex_offset sets (line ~828):
```c
/* See open question 2: this probably needs the unpadded vertex count. */
set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, draw->padded_vertex_count);

const struct panvk_xfb_state *xfb = &cmdbuf->state.gfx.xfb;
for (unsigned i = 0; i < 4; i++) {
   uint64_t addr = (xfb->active && i < xfb->buffer_count)
                      ? (xfb->buffers[i].addr + xfb->buffers[i].offset)
                      : 0;
   set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[i], addr);
}
```
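
The per-slot address selection in that loop can be isolated into a small pure function, which makes the "inactive or unbound slot yields 0" rule easy to test. This is a sketch under assumptions: `demo_xfb_state` copies the layout proposed in section F, and `demo_xfb_sysval_address` is a hypothetical helper, not an existing PanVk function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_XFB_BUFFERS 4

/* Hypothetical copy of the state struct proposed in section F. */
struct demo_xfb_state {
   bool active;
   unsigned buffer_count;
   struct {
      uint64_t addr;   /* GPU VA of the buffer base */
      uint64_t offset; /* user-supplied offset */
      uint64_t size;   /* user-supplied size */
   } buffers[DEMO_MAX_XFB_BUFFERS];
};

/* Value to push into vs.xfb_address[slot]: base + offset for a bound slot
 * while XFB is active, 0 otherwise. */
static uint64_t
demo_xfb_sysval_address(const struct demo_xfb_state *xfb, unsigned slot)
{
   if (!xfb->active || slot >= xfb->buffer_count)
      return 0;

   return xfb->buffers[slot].addr + xfb->buffers[slot].offset;
}
```

A zero address is only a marker here; as open question 5 notes, a shader that still stores through it would fault.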

### F. Command buffer state (panvk_cmd_draw.h or new file)

Add to the per-cmdbuf graphics state:
```c
struct panvk_xfb_state {
   bool active;           /* Between vkCmdBeginTransformFeedbackEXT and vkCmdEndTransformFeedbackEXT */
   unsigned buffer_count; /* From vkCmdBindTransformFeedbackBuffersEXT */
   struct {
      uint64_t addr;   /* gpu_va of the buffer base */
      uint64_t offset; /* user-supplied offset */
      uint64_t size;   /* user-supplied size, or VK_WHOLE_SIZE */
   } buffers[4];
};
```

### G. Vulkan command handlers (new file: jm/panvk_vX_cmd_xfb.c)

```c
VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdBindTransformFeedbackBuffersEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstBinding, uint32_t bindingCount,
   const VkBuffer *pBuffers, const VkDeviceSize *pOffsets,
   const VkDeviceSize *pSizes)
{
   /* Stash addresses/offsets/sizes in cmdbuf->state.gfx.xfb.buffers[] */
}

VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdBeginTransformFeedbackEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstCounterBuffer,
   uint32_t counterBufferCount,
   const VkBuffer *pCounterBuffers,
   const VkDeviceSize *pCounterBufferOffsets)
{
   /* Set cmdbuf->state.gfx.xfb.active = true; mark sysvals dirty;
    * if counter buffers supplied, read them and adjust internal byte counter
    * (resume case) */
}

VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdEndTransformFeedbackEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstCounterBuffer,
   uint32_t counterBufferCount,
   const VkBuffer *pCounterBuffers,
   const VkDeviceSize *pCounterBufferOffsets)
{
   /* Set active = false; if counter buffers supplied, write the byte counter
    * back (pause case) */
}
```
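
The "stash addresses/offsets/sizes" step of the bind handler might look roughly like the model below. It is a sketch that deliberately avoids Vulkan headers: `demo_buffer`, `demo_xfb_bindings` and `demo_bind_xfb_buffers` are hypothetical stand-ins, and tracking `buffer_count` as a high-water mark is an assumption of this sketch. The two spec behaviours it does rely on are that `pSizes` may be NULL and that a NULL array or a `VK_WHOLE_SIZE` entry means "buffer size minus offset".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_XFB_BUFFERS 4
#define DEMO_WHOLE_SIZE (~0ULL) /* same bit pattern as VK_WHOLE_SIZE */

/* Hypothetical stand-in for a driver buffer object. */
struct demo_buffer {
   uint64_t gpu_va;
   uint64_t size;
};

struct demo_xfb_bindings {
   unsigned buffer_count;
   struct {
      uint64_t addr, offset, size;
   } buffers[DEMO_MAX_XFB_BUFFERS];
};

/* Record bindings [first, first + count) and resolve whole-size requests. */
static void
demo_bind_xfb_buffers(struct demo_xfb_bindings *st, uint32_t first,
                      uint32_t count, const struct demo_buffer *const *bufs,
                      const uint64_t *offsets, const uint64_t *sizes)
{
   assert(first + count <= DEMO_MAX_XFB_BUFFERS);

   for (uint32_t i = 0; i < count; i++) {
      /* A NULL sizes array behaves like VK_WHOLE_SIZE for every binding. */
      uint64_t size = sizes ? sizes[i] : DEMO_WHOLE_SIZE;

      st->buffers[first + i].addr = bufs[i]->gpu_va;
      st->buffers[first + i].offset = offsets[i];
      st->buffers[first + i].size =
         size == DEMO_WHOLE_SIZE ? bufs[i]->size - offsets[i] : size;
   }

   /* Assumption: buffer_count is the highest slot bound so far, plus one. */
   if (first + count > st->buffer_count)
      st->buffer_count = first + count;
}
```

Resolving the size at bind time keeps the per-draw path free of buffer-object lookups; whether PanVk would prefer to resolve it lazily is left open here.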

### H. meson.build registration

Add `jm/panvk_vX_cmd_xfb.c` to the JM file list in `src/panfrost/vulkan/meson.build`.

### I. rasterizerDiscardEnable

Honor `VkPipelineRasterizationStateCreateInfo.rasterizerDiscardEnable` if it is not already honored; apps doing pure-XFB capture set this. Skip the rasterizer + frag job emission when set. Check existing PanVk JM pipeline code; this may already work.

## Open questions / risks

1. **Counter buffer semantics.** The counter buffers passed to `vkCmdBeginTransformFeedbackEXT` / `vkCmdEndTransformFeedbackEXT` let apps PAUSE/RESUME XFB across command buffers. Initial implementation: ignore them. Note that `transformFeedbackDraw = false` only tells apps that `vkCmdDrawIndirectByteCountEXT` is unavailable; it does not by itself signal missing resume support, so apps that pass counter buffers may see wrong results until this is implemented. Add later if needed.

2. **Padded vertex count vs actual vertex count.** PanVk uses `padded_vertex_count` for buffer sizing because of attribute alignment requirements. For XFB the conceptual "num_vertices" is the draw call's actual vertex count, not the padded one. Need to make sure `vs.num_vertices = draw->info.vertex.count` (or equivalent unpadded value), not padded_vertex_count. CHECK THIS in implementation.

3. **`maxTransformFeedbackStreams = 1` is tight.** GLES3 needs only 1 stream; multi-stream is GL 4.0+ and ANGLE may not require it. Confirm via ANGLE's required-features list.

4. **NIR pass ordering.** `pan_nir_lower_xfb` must run on the shader BEFORE the panvk descriptor lowering (which assumes only certain intrinsics survive). Put it right after `nir_lower_system_values`.

5. **Shader compilation: single variant or two?** Panfrost-Gallium compiles two variants (regular + xfb). For PanVk, if a pipeline's shader declares XFB outputs, the lowering could run on the only variant, but then the XFB writes happen even when the pipeline is bound for non-XFB draws: with `xfb.active=false` every `xfb_address[i]` is 0, and the global stores to NULL would fault. So we NEED to either (a) compile two variants like Gallium does, or (b) guard the stores in the shader at draw time. The simpler rule, that no draw using the XFB-lowered shader is in flight while `xfb.active=false`, cannot be relied on, because Vulkan allows binding an XFB pipeline outside an XFB block. **Resolution**: probably compile two variants. Defer to Phase 2 design check.

6. **Coverage probe.** Phase 3 probe should exercise: single buffer write, single stream, single vertex, single triangle, verify byte-exact output.
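
Open question 2 can be made concrete with a little arithmetic. The sketch below reuses the index formula from the `pan_nir_lower_xfb` contract; the helper names are hypothetical, and the padded count of 4 for a 3-vertex draw is an invented value chosen only to show the effect.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of one vertex record inside the XFB buffer, using
 * idx = instance_id * num_vertices + raw_vertex_id. */
static uint64_t
demo_xfb_record_offset(uint32_t instance_id, uint32_t raw_vertex_id,
                       uint32_t num_vertices, uint32_t stride)
{
   return ((uint64_t)instance_id * num_vertices + raw_vertex_id) * stride;
}

/* Bytes from the buffer base to the end of the last record written by a
 * draw of vertex_count vertices and instance_count instances. */
static uint64_t
demo_xfb_bytes_spanned(uint32_t instance_count, uint32_t vertex_count,
                       uint32_t num_vertices, uint32_t stride)
{
   return demo_xfb_record_offset(instance_count - 1, vertex_count - 1,
                                 num_vertices, stride) + stride;
}
```

For 3 vertices, 2 instances and a 16-byte stride, the unpadded count packs all six records into 96 bytes. Feeding a padded count of 4 instead moves instance 1 from byte 48 to byte 64, leaving a 16-byte hole and stretching the span to 112 bytes, which a byte-exact probe would flag.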

## Files-list summary

| Change | File | Lines (est) |
|---|---|---|
| Expose extension | `src/panfrost/vulkan/panvk_vX_physical_device.c` | +15 |
| Sysval struct | `src/panfrost/vulkan/panvk_shader.h` | +6 |
| Shader lowering | `src/panfrost/vulkan/panvk_vX_shader.c` | +15 |
| NIR pass wiring | `src/panfrost/vulkan/panvk_vX_shader.c` | +6 |
| Cmd state | `src/panfrost/vulkan/panvk_cmd_draw.h` | +15 |
| Sysval populate | `src/panfrost/vulkan/jm/panvk_vX_cmd_draw.c` | +15 |
| New cmd handlers | `src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c` (NEW) | +150 |
| Meson | `src/panfrost/vulkan/meson.build` | +1 |
| **Total Mesa side** | | **~220 lines** |
| Probe | `iter13/probe_xfb.c` (NEW in campaign) | +400 |
| Probe shader | `iter13/probe_xfb.vert` (NEW) | +20 |
| **Total probe side** | | **~420 lines** |

## Phase 1 verdict

Implementation scope is **bounded and tractable**: well-defined surface, all building blocks present, no Bifrost RE needed. Phase 2 (situation analysis) should validate:

1. The single-variant-vs-two-variants question (open question #5 above)
2. The padded_vertex_count question (open question #2)
3. Spec compliance check on the property values (open question #3)

Then Phase 3 writes the probe, Phase 4 implements.
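
As input to the Phase 2 check on open question 5, option (a) reduces to a draw-time choice between two compiled variants. The sketch below shows only that decision; the enum and function are hypothetical names, and nothing here is a claim about how Panfrost-Gallium or PanVk structure their variant caches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical variant identifiers for one vertex shader. */
enum demo_vs_variant {
   DEMO_VS_VARIANT_REGULAR, /* compiled without pan_nir_lower_xfb */
   DEMO_VS_VARIANT_XFB,     /* compiled with the XFB stores lowered in */
};

/* Pick the XFB-lowered variant only when the shader declares XFB outputs
 * and the draw is recorded inside an active transform feedback block, so
 * the lowered stores never run with zeroed buffer addresses. */
static enum demo_vs_variant
demo_select_vs_variant(bool shader_has_xfb_outputs, bool xfb_active)
{
   return (shader_has_xfb_outputs && xfb_active) ? DEMO_VS_VARIANT_XFB
                                                 : DEMO_VS_VARIANT_REGULAR;
}
```

This covers the case the open question worries about: an XFB-capable pipeline bound outside a begin/end block falls back to the regular variant.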

## Reference

- pan_nir_lower_xfb.c (85 lines, full read above)
- panvk_shader.h:133-175 (graphics_sysvals struct)
- panvk_vX_shader.c:87-103 (sysval lowering pattern)
- jm/panvk_vX_cmd_draw.c:824-830 (per-draw sysval population)
- Panfrost-Gallium oracle: src/gallium/drivers/panfrost/pan_shader.c:125-130, 593-603