panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18)
  (libmali stub blobs at iter18/blob/ excluded;
  109MB of RE artifacts replaced with a README pointer)
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone
  (basis for the 0005 patch diff generation)
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 1 — source map for iter13 (VK_EXT_transform_feedback in PanVk)
Closed 2026-05-20.
Headline
The implementation surface is much smaller than the initial estimate suggested. Mesa already has the hardware-side abstraction (pan_nir_lower_xfb) and PanVk has a clean sysval-injection pattern (load_sysval(b, graphics, bit_size, FIELD)). Total new code: ~250-300 lines + a probe.
The pan_nir_lower_xfb contract (oracle)
src/panfrost/compiler/pan_nir_lower_xfb.c (85 lines, Collabora 2022) does:
For every nir_store_output with XFB metadata:
Replace with nir_store_global at address:
buf = nir_load_xfb_address(b, 64, .base = buffer_slot)
idx = nir_load_instance_id * nir_load_num_vertices + nir_load_raw_vertex_id_pan
addr = buf + (idx * stride) + offset
Plus: replaces nir_load_vertex_id with nir_load_raw_vertex_id_pan + nir_load_raw_vertex_offset_pan (XFB programs need zero-based vertex_id for correct buffer indexing).
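The address arithmetic in the contract above can be modelled on the CPU as a minimal sketch. The function name and parameter list are illustrative assumptions, not Mesa API; only the formula itself comes from the description of pan_nir_lower_xfb.

```c
#include <assert.h>
#include <stdint.h>

/* CPU-side model of the address the lowered store writes to.
 * Hypothetical helper for illustration, not part of Mesa. */
static uint64_t
xfb_store_address(uint64_t buf_base, uint32_t instance_id,
                  uint32_t num_vertices, uint32_t raw_vertex_id,
                  uint32_t stride, uint32_t offset)
{
   /* The pass relies on a zero-based vertex id within the draw. */
   assert(raw_vertex_id < num_vertices);

   /* One record per (instance, vertex) pair, laid out instance-major. */
   uint64_t idx = (uint64_t)instance_id * num_vertices + raw_vertex_id;

   return buf_base + idx * stride + offset;
}
```

For example, with a 16-byte stride and 3 vertices per instance, vertex 2 of instance 1 lands at record index 5, so a varying at offset 4 is written 84 bytes past the buffer base.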
The intrinsics the pass uses, and PanVk's current handling:
| Intrinsic | PanVk handles? | Notes |
|---|---|---|
| nir_load_xfb_address(buffer=N) | ❌ NEW | per-stream base address |
| nir_load_num_vertices | ❌ NEW | per-draw vertex count |
| nir_load_raw_vertex_id_pan | ✅ (panvk_vX_shader.c:211) | already wired |
| nir_load_raw_vertex_offset_pan | ✅ (panvk_vX_shader.c:101, JM path) | already wired |
| nir_load_instance_id | ✅ standard Mesa | always available |
Only 2 new intrinsic handlers needed.
PanVk's sysval injection pattern (the wiring mechanism)
The driver-shader contract is panvk_graphics_sysvals: a struct that the driver writes per draw and the shader reads via the FAU (Fast Access Uniforms) push-constant area.
Definition: src/panfrost/vulkan/panvk_shader.h:133-175.
Existing pattern (for vs.first_vertex):
- Struct field (panvk_shader.h:154):
  int32_t first_vertex;
- Shader lowering (panvk_vX_shader.c:87-88):
  case nir_intrinsic_load_first_vertex:
     val = load_sysval(b, graphics, bit_size, vs.first_vertex);
     break;
- Driver populates (jm/panvk_vX_cmd_draw.c:824):
  set_gfx_sysval(cmdbuf, dirty_sysvals, vs.first_vertex, info->vertex.base);
Mirror this exactly for the two new fields:
- vs.num_vertices (uint32_t)
- vs.xfb_address[4] (aligned_u64 array: Vulkan spec maxTransformFeedbackBuffers ≥ 1, recommended 4)
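The write-by-field, read-by-offset idea behind this pattern can be sketched standalone. Everything below is a hypothetical model: the struct is a trimmed stand-in for panvk_graphics_sysvals, and the MODEL_* macros only imitate the shape of set_gfx_sysval and load_sysval, whose real definitions are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for panvk_graphics_sysvals. */
struct model_graphics_sysvals {
   struct {
      int32_t first_vertex;
      uint32_t num_vertices;   /* proposed new field */
      uint64_t xfb_address[4]; /* proposed new field */
   } vs;
};

/* Driver side: write one field of the per-draw block. */
#define MODEL_SET_SYSVAL(sysvals, field, value) ((sysvals)->field = (value))

/* Shader side: read a 32- or 64-bit value at a fixed byte offset. */
static uint64_t
model_load_sysval(const struct model_graphics_sysvals *sysvals,
                  size_t offset, size_t size)
{
   const uint8_t *p = (const uint8_t *)sysvals + offset;

   assert(size == sizeof(uint32_t) || size == sizeof(uint64_t));
   assert(offset + size <= sizeof(*sysvals));

   if (size == sizeof(uint32_t)) {
      uint32_t v32;
      memcpy(&v32, p, sizeof(v32));
      return v32;
   }

   uint64_t v64;
   memcpy(&v64, p, sizeof(v64));
   return v64;
}

/* The field name alone determines both the offset and the load width. */
#define MODEL_LOAD_SYSVAL(sysvals, field)                                 \
   model_load_sysval((sysvals),                                           \
                     offsetof(struct model_graphics_sysvals, field),      \
                     sizeof((sysvals)->field))
```

The point of the design is that adding a sysval is a struct edit plus one case on each side; no separate offset table has to be kept in sync.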
Implementation skeleton
A. Extension + feature exposure (panvk_vX_physical_device.c)
Around line 91 (KHR_robustness2 block):
.EXT_transform_feedback = PAN_ARCH < 9, // JM-class only for now
At feature block (~line 491):
/* VK_EXT_transform_feedback */
.transformFeedback = PAN_ARCH < 9,
.geometryStreams = false, /* No GS support yet */
At properties block (~line 1019):
/* VK_EXT_transform_feedback */
.maxTransformFeedbackStreams = 1, /* Up the limit if multi-stream needed; 1 is GLES3 baseline */
.maxTransformFeedbackBuffers = 4,
.maxTransformFeedbackBufferSize = UINT32_MAX,
.maxTransformFeedbackStreamDataSize = 512,
.maxTransformFeedbackBufferDataSize = 512,
.maxTransformFeedbackBufferDataStride = 2048,
.transformFeedbackQueries = false, /* Start without; defer to follow-up iter */
.transformFeedbackStreamsLinesTriangles = false,
.transformFeedbackRasterizationStreamSelect = false,
.transformFeedbackDraw = false, /* No vkCmdDrawIndirectByteCountEXT yet */
B. Sysval struct fields (panvk_shader.h)
Add to the vs substruct at lines 150-157, only for PAN_ARCH < 9:
struct {
#if PAN_ARCH < 9
int32_t raw_vertex_offset;
uint32_t num_vertices; /* NEW iter13: XFB needs per-draw vertex count */
aligned_u64 xfb_address[4]; /* NEW iter13: 4 transform feedback buffer base addresses */
#endif
int32_t first_vertex;
int32_t base_instance;
uint32_t noperspective_varyings;
} vs;
(Use #if PAN_ARCH < 9 since we're not yet supporting Valhall-CSF; can extend later.)
C. Shader-side intrinsic lowering (panvk_vX_shader.c)
Add cases ~line 103 (inside PAN_ARCH < 9 block):
#if PAN_ARCH < 9
case nir_intrinsic_load_num_vertices:
val = load_sysval(b, graphics, bit_size, vs.num_vertices);
break;
case nir_intrinsic_load_xfb_address: {
unsigned idx = nir_intrinsic_base(intr);
assert(idx < 4);
val = load_sysval(b, graphics, bit_size, vs.xfb_address[idx]);
break;
}
#endif
D. NIR lowering chain integration (panvk_vX_shader.c, somewhere in pipeline-compile path)
Run nir_io_add_intrinsic_xfb_info first, then pan_nir_lower_xfb, both BEFORE the panvk descriptor lowering:
if (nir->info.stage == MESA_SHADER_VERTEX &&
nir->info.has_transform_feedback_varyings) {
NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
NIR_PASS(_, nir, pan_nir_lower_xfb);
}
Place this near the existing pan_preprocess_nir() call (panvk_vX_shader.c:509).
E. Per-draw sysval population (jm/panvk_vX_cmd_draw.c)
After existing vs.first_vertex / vs.raw_vertex_offset sets (line ~828):
/* Placeholder value: open question 2 below asks whether this must be the unpadded count. */
set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, draw->padded_vertex_count);
const struct panvk_xfb_state *xfb = &cmdbuf->state.gfx.xfb;
for (unsigned i = 0; i < 4; i++) {
uint64_t addr = (xfb->active && i < xfb->buffer_count)
? (xfb->buffers[i].addr + xfb->buffers[i].offset)
: 0;
set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[i], addr);
}
F. Command buffer state (panvk_cmd_draw.h or new file)
Add to the per-cmdbuf graphics state:
struct panvk_xfb_state {
bool active; /* Between vkCmdBeginTransformFeedbackEXT and vkCmdEndTransformFeedbackEXT */
unsigned buffer_count; /* From vkCmdBindTransformFeedbackBuffers */
struct {
uint64_t addr; /* gpu_va of the buffer base */
uint64_t offset; /* user-supplied offset */
uint64_t size; /* user-supplied size, or VK_WHOLE_SIZE */
} buffers[4];
};
G. Vulkan command handlers (new file: jm/panvk_vX_cmd_xfb.c)
VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdBindTransformFeedbackBuffersEXT)(
VkCommandBuffer cmdBuf, uint32_t firstBinding, uint32_t bindingCount,
const VkBuffer *pBuffers, const VkDeviceSize *pOffsets,
const VkDeviceSize *pSizes)
{
/* Stash addresses/offsets/sizes in cmdbuf->state.gfx.xfb.buffers[] */
}
VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdBeginTransformFeedbackEXT)(
VkCommandBuffer cmdBuf, uint32_t firstCounterBuffer,
uint32_t counterBufferCount,
const VkBuffer *pCounterBuffers,
const VkDeviceSize *pCounterBufferOffsets)
{
/* Set cmdbuf->state.gfx.xfb.active = true; mark sysvals dirty;
* if counter buffers supplied, read them and adjust internal byte counter
* (resume case) */
}
VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdEndTransformFeedbackEXT)(
VkCommandBuffer cmdBuf, uint32_t firstCounterBuffer,
uint32_t counterBufferCount,
const VkBuffer *pCounterBuffers,
const VkDeviceSize *pCounterBufferOffsets)
{
/* Set active = false; if counter buffers supplied, write the byte counter
* back (pause case) */
}
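The stash logic left as comments in the handlers above could look roughly like the following standalone model. The sketch_* types and functions are hypothetical mocks (no Vulkan or PanVk headers are used), counter buffers are ignored as in the initial plan, and tracking buffer_count as "highest bound slot + 1" is an assumption rather than something the skeleton specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_WHOLE_SIZE UINT64_MAX /* stands in for VK_WHOLE_SIZE */
#define SKETCH_MAX_XFB_BUFFERS 4

/* Hypothetical mock of a VkBuffer: GPU address plus total size. */
struct sketch_buffer {
   uint64_t gpu_va;
   uint64_t size;
};

struct sketch_xfb_state {
   bool active;
   unsigned buffer_count;
   struct {
      uint64_t addr;
      uint64_t offset;
      uint64_t size;
   } buffers[SKETCH_MAX_XFB_BUFFERS];
};

/* Models CmdBindTransformFeedbackBuffersEXT: stash base, offset and size. */
static void
sketch_bind_xfb_buffers(struct sketch_xfb_state *xfb, uint32_t first_binding,
                        uint32_t binding_count,
                        const struct sketch_buffer *buffers,
                        const uint64_t *offsets, const uint64_t *sizes)
{
   assert(first_binding + binding_count <= SKETCH_MAX_XFB_BUFFERS);

   for (uint32_t i = 0; i < binding_count; i++) {
      uint32_t slot = first_binding + i;
      /* A NULL size array is treated like VK_WHOLE_SIZE for every entry. */
      uint64_t size = sizes ? sizes[i] : SKETCH_WHOLE_SIZE;

      /* Whole-size means "from the offset to the end of the buffer". */
      if (size == SKETCH_WHOLE_SIZE)
         size = buffers[i].size - offsets[i];

      xfb->buffers[slot].addr = buffers[i].gpu_va;
      xfb->buffers[slot].offset = offsets[i];
      xfb->buffers[slot].size = size;
   }

   /* Assumption: buffer_count is the highest bound slot plus one. */
   if (first_binding + binding_count > xfb->buffer_count)
      xfb->buffer_count = first_binding + binding_count;
}

/* Models Begin/End with counter buffers ignored. */
static void sketch_begin_xfb(struct sketch_xfb_state *xfb) { xfb->active = true; }
static void sketch_end_xfb(struct sketch_xfb_state *xfb) { xfb->active = false; }
```

A real handler would also mark the sysvals dirty so the next draw re-uploads vs.xfb_address[]; that step is omitted because it depends on PanVk internals.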
H. meson.build registration
Add jm/panvk_vX_cmd_xfb.c to the JM file list in src/panfrost/vulkan/meson.build.
I. rasterizerDiscardEnable
Honor VkPipelineRasterizationStateCreateInfo.rasterizerDiscardEnable if not already — apps doing pure-XFB capture set this. Skip the rasterizer + frag job emission when set. Check existing PanVk JM pipeline code; this may already work.
Open questions / risks
1. Counter buffer semantics. vkCmdBeginTransformFeedback's counter buffers let apps PAUSE/RESUME XFB across command buffers. Initial implementation: ignore them (advertise transformFeedbackDraw = false so apps don't expect resume support). Add later if needed.
2. Padded vertex count vs actual vertex count. PanVk uses padded_vertex_count for buffer sizing because of attribute alignment requirements. For XFB the conceptual "num_vertices" is the actual draw call count, not padded. Need to make sure vs.num_vertices = draw->info.vertex.count (or equivalent unpadded value), not padded_vertex_count. CHECK THIS in implementation.
3. maxTransformFeedbackStreams = 1 is tight. GLES3 needs only 1 stream; multi-stream is GL 4.0+ and ANGLE may not require it. Confirm via ANGLE's required-features list.
4. NIR pass ordering. pan_nir_lower_xfb must run on the shader BEFORE the panvk descriptor lowering (which assumes only certain intrinsics survive). Put it right after nir_lower_system_values.
5. Shader compilation: single variant or two? Panfrost-Gallium compiles two variants (regular + xfb). For PanVk, if a pipeline has XFB outputs declared in the shader, the lowering can run on the only variant, but then the XFB writes happen even when the pipeline is bound for non-XFB draws (cmdbuf state's xfb.active=false makes all xfb_address[i]=0, and the global stores at NULL would fault). So: NEED to either (a) compile two variants like Gallium does, or (b) at draw time guard the stores at the shader level. Simpler: when xfb.active=false, no draw should be in flight that uses the XFB-lowered shader. But Vulkan allows binding an XFB pipeline outside an XFB block. Resolution: probably compile two variants. Defer to Phase 2 design check.
6. Coverage probe. Phase 3 probe should exercise: single buffer write, single stream, single vertex, single triangle, verify byte-exact output.
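The padded-versus-actual question can be made concrete with a small worked model. The helper names are hypothetical and the padding of 3 vertices to 4 is an assumed example value; only the index formula is taken from the pan_nir_lower_xfb description.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of one captured record, using the lowered index formula
 * with a zero per-varying offset. Illustrative helper, not Mesa API. */
static uint64_t
record_offset(uint32_t instance_id, uint32_t num_vertices,
              uint32_t vertex_id, uint32_t stride)
{
   return ((uint64_t)instance_id * num_vertices + vertex_id) * stride;
}

/* Bytes skipped between consecutive instances when num_vertices is fed
 * the padded count instead of the actual one. */
static uint64_t
instance_gap_bytes(uint32_t actual_count, uint32_t padded_count,
                   uint32_t stride)
{
   assert(padded_count >= actual_count);

   return record_offset(1, padded_count, 0, stride) -
          record_offset(1, actual_count, 0, stride);
}
```

With 3 real vertices, an assumed padded count of 4 and a 16-byte stride, instance 1 starts at byte 64 instead of byte 48, leaving a 16-byte hole per instance. Single-instance draws are unaffected because instance_id is 0, so a probe would likely need an instanced draw to catch this.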
Files-list summary
| Change | File | Lines (est) |
|---|---|---|
| Expose extension | src/panfrost/vulkan/panvk_vX_physical_device.c | +15 |
| Sysval struct | src/panfrost/vulkan/panvk_shader.h | +6 |
| Shader lowering | src/panfrost/vulkan/panvk_vX_shader.c | +15 |
| NIR pass wiring | src/panfrost/vulkan/panvk_vX_shader.c | +6 |
| Cmd state | src/panfrost/vulkan/panvk_cmd_draw.h | +15 |
| Sysval populate | src/panfrost/vulkan/jm/panvk_vX_cmd_draw.c | +15 |
| New cmd handlers | src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c (NEW) | +150 |
| Meson | src/panfrost/vulkan/meson.build | +1 |
| Total Mesa side | | ~220 lines |
| Probe | iter13/probe_xfb.c (NEW in campaign) | +400 |
| Probe shader | iter13/probe_xfb.vert (NEW) | +20 |
| Total probe side | | ~420 lines |
Phase 1 verdict
Implementation scope is bounded and tractable — well-defined surface, all building blocks present, no Bifrost RE needed. Phase 2 (situation analysis) should validate:
- The single-variant-vs-two-variants question (open question #5 above)
- The padded_vertex_count question (open question #2)
- Spec compliance check on the property values (open question #3)
Then Phase 3 writes the probe, Phase 4 implements.
Reference
- pan_nir_lower_xfb.c (85 lines, full read above)
- panvk_shader.h:133-175 (graphics_sysvals struct)
- panvk_vX_shader.c:87-103 (sysval lowering pattern)
- jm/panvk_vX_cmd_draw.c:824-830 (per-draw sysval population)
- Panfrost-Gallium oracle: src/gallium/drivers/panfrost/pan_shader.c:125-130, 593-603