panvk-bifrost/mesa-panvk-bifrost/phase1_iter13_source_map.md
marfrit a4e7d8ab90 initial seed: retrofit campaign lineage from local working trees
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:
- mesa-panvk-bifrost/   — r1..r4 era phase docs (iter1..iter18)
                          (libmali stub blobs at iter18/blob/ excluded
                          — 109MB of RE artifacts replaced with a README
                          pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/             — frozen .tgz source snapshots at each milestone
                          (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00


Phase 1 — source map for iter13 (VK_EXT_transform_feedback in PanVk)

Closed 2026-05-20.

Headline

The implementation surface is much smaller than the initial estimate suggested. Mesa already has the hardware-side abstraction (pan_nir_lower_xfb) and PanVk has a clean sysval-injection pattern (load_sysval(b, graphics, bit_size, FIELD)). Total new code: about 220 lines on the Mesa side (itemized in the files-list summary), plus a probe.

The pan_nir_lower_xfb contract (oracle)

src/panfrost/compiler/pan_nir_lower_xfb.c (85 lines, Collabora 2022) does:

For every nir_store_output with XFB metadata:
   Replace with nir_store_global at address:
      buf  = nir_load_xfb_address(b, 64, .base = buffer_slot)
      idx  = nir_load_instance_id * nir_load_num_vertices + nir_load_raw_vertex_id_pan
      addr = buf + (idx * stride) + offset

Plus: replaces nir_load_vertex_id with nir_load_raw_vertex_id_pan + nir_load_raw_vertex_offset_pan (XFB programs need zero-based vertex_id for correct buffer indexing).
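
The address arithmetic above can be modeled on the CPU to sanity-check expected buffer layouts. This is a minimal sketch and not the pass itself: the function name xfb_store_addr and its parameter names are hypothetical, and the formula is copied from the contract as summarized here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU-side model of the address the lowered store targets.
 * buf:    base address of the XFB buffer (what nir_load_xfb_address yields)
 * stride: bytes captured per vertex for this buffer
 * offset: byte offset of this output within one vertex record */
static uint64_t
xfb_store_addr(uint64_t buf, uint32_t instance_id, uint32_t num_vertices,
               uint32_t raw_vertex_id, uint32_t stride, uint32_t offset)
{
   /* The raw vertex id is zero-based, so it stays below the per-draw count. */
   assert(raw_vertex_id < num_vertices);

   /* Widen before multiplying so large draws do not overflow 32 bits. */
   uint64_t idx = (uint64_t)instance_id * num_vertices + raw_vertex_id;
   return buf + idx * stride + offset;
}
```

For example, with a 16-byte stride and 3 vertices per instance, vertex 2 of instance 1 is record 5, so an output at offset 4 lands at buf + 5 * 16 + 4.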

The intrinsics the pass uses, and PanVk's current handling:

| Intrinsic | PanVk handles? | Notes |
|---|---|---|
| nir_load_xfb_address(buffer=N) | NEW | per-buffer base address |
| nir_load_num_vertices | NEW | per-draw vertex count |
| nir_load_raw_vertex_id_pan | yes (panvk_vX_shader.c:211) | already wired |
| nir_load_raw_vertex_offset_pan | yes (panvk_vX_shader.c:101, JM path) | already wired |
| nir_load_instance_id | standard Mesa | always available |

Only 2 new intrinsic handlers needed.

PanVk's sysval injection pattern (the wiring mechanism)

The driver-shader contract is panvk_graphics_sysvals, a struct that the driver writes per draw and the shader reads via the FAU (Fast Access Uniform) push-constant area.

Definition: src/panfrost/vulkan/panvk_shader.h:133-175.

Existing pattern (for vs.first_vertex):

  • Struct field (panvk_shader.h:154): int32_t first_vertex;
  • Shader lowering (panvk_vX_shader.c:87-88):
    case nir_intrinsic_load_first_vertex:
       val = load_sysval(b, graphics, bit_size, vs.first_vertex);
       break;
    
  • Driver populates (jm/panvk_vX_cmd_draw.c:824):
    set_gfx_sysval(cmdbuf, dirty_sysvals, vs.first_vertex, info->vertex.base);
    

Mirror this exactly for the two new fields:

  • vs.num_vertices (uint32_t)
  • vs.xfb_address[4] (aligned_u64 array — Vulkan spec maxTransformFeedbackBuffers ≥ 1, recommended 4)
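
To make the layout implications of the two new fields concrete, here is a small standalone sketch. It assumes the sysval helper ultimately resolves a field name to a byte offset within the struct; that is an assumption, since the helper's definition is not quoted here. Every mock_ and MOCK_ name is hypothetical, and the struct models only the fields discussed, not the real panvk_graphics_sysvals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_XFB_BUFFERS 4

/* Hypothetical stand-in for the vs part of the sysval struct. */
struct mock_graphics_sysvals {
   struct {
      int32_t raw_vertex_offset;
      uint32_t num_vertices;
      /* 8-byte alignment keeps each 64-bit address naturally aligned. */
      _Alignas(8) uint64_t xfb_address[MOCK_MAX_XFB_BUFFERS];
      int32_t first_vertex;
   } vs;
};

/* Byte offset of a named field, the quantity an offset-based load needs. */
#define MOCK_SYSVAL_OFFSET(field) offsetof(struct mock_graphics_sysvals, field)

/* Byte offset of xfb_address[idx], computed without indexing inside offsetof. */
static size_t
mock_xfb_address_offset(unsigned idx)
{
   assert(idx < MOCK_MAX_XFB_BUFFERS);
   return MOCK_SYSVAL_OFFSET(vs.xfb_address) + idx * sizeof(uint64_t);
}
```

In this mock layout the two 32-bit fields fill the first 8 bytes, so the address array starts at an 8-byte boundary with no padding. The real struct may differ, so the offsets are illustrative only.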

Implementation skeleton

A. Extension + feature exposure (panvk_vX_physical_device.c)

Around line 91 (KHR_robustness2 block):

.EXT_transform_feedback = PAN_ARCH < 9,   // JM-class only for now

At feature block (~line 491):

/* VK_EXT_transform_feedback */
.transformFeedback = PAN_ARCH < 9,
.geometryStreams = false,   /* No GS support yet */

At properties block (~line 1019):

/* VK_EXT_transform_feedback */
.maxTransformFeedbackStreams = 1,                 /* Up the limit if multi-stream needed; 1 is GLES3 baseline */
.maxTransformFeedbackBuffers = 4,
.maxTransformFeedbackBufferSize = UINT32_MAX,
.maxTransformFeedbackStreamDataSize = 512,
.maxTransformFeedbackBufferDataSize = 512,
.maxTransformFeedbackBufferDataStride = 2048,
.transformFeedbackQueries = false,                /* Start without; defer to follow-up iter */
.transformFeedbackStreamsLinesTriangles = false,
.transformFeedbackRasterizationStreamSelect = false,
.transformFeedbackDraw = false,                   /* No vkCmdDrawIndirectByteCountEXT yet */
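
A quick standalone check of the numeric limits above against the extension's required minimums could look like the sketch below. The minimum values are an assumption recalled from the required-limits table of the Vulkan specification and should be verified there before relying on them. The mock_ names are hypothetical and do not exist in Mesa.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the numeric limits proposed in this section. */
struct mock_xfb_limits {
   uint32_t max_streams;
   uint32_t max_buffers;
   uint64_t max_buffer_size;
   uint32_t max_stream_data_size;
   uint32_t max_buffer_data_size;
   uint32_t max_buffer_data_stride;
};

/* ASSUMPTION: minimums as recalled from the spec's required-limits table;
 * confirm against the current Vulkan specification. */
static bool
mock_xfb_limits_meet_minimums(const struct mock_xfb_limits *l)
{
   assert(l != NULL);
   return l->max_streams >= 1 &&
          l->max_buffers >= 1 &&
          l->max_buffer_size >= (UINT64_C(1) << 27) &&
          l->max_stream_data_size >= 512 &&
          l->max_buffer_data_size >= 512 &&
          l->max_buffer_data_stride >= 512;
}

/* The values proposed for the properties block. */
static const struct mock_xfb_limits mock_proposed_limits = {
   .max_streams = 1,
   .max_buffers = 4,
   .max_buffer_size = UINT32_MAX,
   .max_stream_data_size = 512,
   .max_buffer_data_size = 512,
   .max_buffer_data_stride = 2048,
};
```

Under those assumed minimums the proposed values pass. The check is a convenience for the Phase 2 compliance review, not a substitute for reading the spec.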

B. Sysval struct fields (panvk_shader.h)

Add to the vs substruct at line 150-157, only for PAN_ARCH < 9:

struct {
#if PAN_ARCH < 9
   int32_t raw_vertex_offset;
   uint32_t num_vertices;        /* NEW iter13: XFB needs per-draw vertex count */
   aligned_u64 xfb_address[4];   /* NEW iter13: 4 transform feedback buffer base addresses */
#endif
   int32_t first_vertex;
   int32_t base_instance;
   uint32_t noperspective_varyings;
} vs;

(Use #if PAN_ARCH < 9 since we're not yet supporting Valhall-CSF; can extend later.)

C. Shader-side intrinsic lowering (panvk_vX_shader.c)

Add cases ~line 103 (inside PAN_ARCH < 9 block):

#if PAN_ARCH < 9
case nir_intrinsic_load_num_vertices:
   val = load_sysval(b, graphics, bit_size, vs.num_vertices);
   break;
case nir_intrinsic_load_xfb_address: {
   unsigned idx = nir_intrinsic_base(intr);
   assert(idx < 4);
   val = load_sysval(b, graphics, bit_size, vs.xfb_address[idx]);
   break;
}
#endif

D. NIR lowering chain integration (panvk_vX_shader.c, somewhere in pipeline-compile path)

Run nir_io_add_intrinsic_xfb_info and then pan_nir_lower_xfb, both BEFORE the panvk descriptor lowering:

if (nir->info.stage == MESA_SHADER_VERTEX &&
    nir->info.has_transform_feedback_varyings) {
   NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
   NIR_PASS(_, nir, pan_nir_lower_xfb);
}

Place this near the existing pan_preprocess_nir() call (panvk_vX_shader.c:509).

E. Per-draw sysval population (jm/panvk_vX_cmd_draw.c)

After existing vs.first_vertex / vs.raw_vertex_offset sets (line ~828):

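/* Open question 2: this likely needs the unpadded per-draw vertex count
 * rather than padded_vertex_count. Verify before landing. */
set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, draw->padded_vertex_count);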

const struct panvk_xfb_state *xfb = &cmdbuf->state.gfx.xfb;
for (unsigned i = 0; i < 4; i++) {
   uint64_t addr = (xfb->active && i < xfb->buffer_count)
                       ? (xfb->buffers[i].addr + xfb->buffers[i].offset)
                       : 0;
   set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_address[i], addr);
}
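
The selection logic in that loop can be exercised in isolation. The sketch below is a standalone model under stated assumptions: the mock_ types only mirror the fields used here, and writing into a plain output array stands in for the set_gfx_sysval calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MAX_XFB_BUFFERS 4

/* Hypothetical mirror of the per-cmdbuf XFB state used by the loop. */
struct mock_xfb_state {
   bool active;
   unsigned buffer_count;
   struct {
      uint64_t addr;
      uint64_t offset;
      uint64_t size;
   } buffers[MOCK_MAX_XFB_BUFFERS];
};

/* Resolve the address each sysval slot would receive: base plus offset for
 * bound buffers while XFB is active, and 0 for every other slot. */
static void
mock_resolve_xfb_addresses(const struct mock_xfb_state *xfb,
                           uint64_t out[MOCK_MAX_XFB_BUFFERS])
{
   assert(xfb->buffer_count <= MOCK_MAX_XFB_BUFFERS);

   for (unsigned i = 0; i < MOCK_MAX_XFB_BUFFERS; i++) {
      bool bound = xfb->active && i < xfb->buffer_count;
      out[i] = bound ? xfb->buffers[i].addr + xfb->buffers[i].offset : 0;
   }
}
```

Note that an inactive state yields all-zero addresses, which is exactly the case open question 5 worries about when an XFB-lowered shader still issues stores.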

F. Command buffer state (panvk_cmd_draw.h or new file)

Add to the per-cmdbuf graphics state:

struct panvk_xfb_state {
   bool active;                          /* Between vkCmdBeginTransformFeedbackEXT and vkCmdEndTransformFeedbackEXT */
   unsigned buffer_count;                /* From vkCmdBindTransformFeedbackBuffers */
   struct {
      uint64_t addr;                     /* gpu_va of the buffer base */
      uint64_t offset;                   /* user-supplied offset */
      uint64_t size;                     /* user-supplied size, or VK_WHOLE_SIZE */
   } buffers[4];
};

G. Vulkan command handlers (new file: jm/panvk_vX_cmd_xfb.c)

VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdBindTransformFeedbackBuffersEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstBinding, uint32_t bindingCount,
   const VkBuffer *pBuffers, const VkDeviceSize *pOffsets,
   const VkDeviceSize *pSizes)
{
   /* Stash addresses/offsets/sizes in cmdbuf->state.gfx.xfb.buffers[] */
}

VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdBeginTransformFeedbackEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstCounterBuffer,
   uint32_t counterBufferCount,
   const VkBuffer *pCounterBuffers,
   const VkDeviceSize *pCounterBufferOffsets)
{
   /* Set cmdbuf->state.gfx.xfb.active = true; mark sysvals dirty;
    * if counter buffers supplied, read them and adjust internal byte counter
    * (resume case) */
}

VKAPI_ATTR void VKAPI_CALL
panvk_per_arch(CmdEndTransformFeedbackEXT)(
   VkCommandBuffer cmdBuf, uint32_t firstCounterBuffer,
   uint32_t counterBufferCount,
   const VkBuffer *pCounterBuffers,
   const VkDeviceSize *pCounterBufferOffsets)
{
   /* Set active = false; if counter buffers supplied, write the byte counter
    * back (pause case) */
}
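
As an illustration of what the bind handler's body might do, here is a standalone sketch using mock types in place of the driver's buffer and command-buffer objects. All mock_ names are hypothetical. Treating a NULL sizes array or the whole-size sentinel as "buffer size minus offset" follows the extension's description of pSizes. Tracking buffer_count as a high-water mark is a design assumption of this sketch, not something the skeleton above specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_XFB_BUFFERS 4
#define MOCK_WHOLE_SIZE UINT64_MAX /* stands in for VK_WHOLE_SIZE */

struct mock_buffer {
   uint64_t gpu_va; /* GPU virtual address of the buffer base */
   uint64_t size;   /* total buffer size in bytes */
};

struct mock_xfb_binding {
   uint64_t addr;
   uint64_t offset;
   uint64_t size;
};

/* Stash bindings [first, first + count) and resolve whole-size requests. */
static void
mock_bind_xfb_buffers(struct mock_xfb_binding *slots, unsigned *buffer_count,
                      uint32_t first, uint32_t count,
                      const struct mock_buffer *bufs,
                      const uint64_t *offsets, const uint64_t *sizes)
{
   assert(first + count <= MOCK_MAX_XFB_BUFFERS);

   for (uint32_t i = 0; i < count; i++) {
      struct mock_xfb_binding *slot = &slots[first + i];
      uint64_t requested = sizes != NULL ? sizes[i] : MOCK_WHOLE_SIZE;

      slot->addr = bufs[i].gpu_va;
      slot->offset = offsets[i];
      slot->size = requested == MOCK_WHOLE_SIZE ? bufs[i].size - offsets[i]
                                                : requested;
   }

   /* Assumption: remember the highest slot ever bound. */
   if (first + count > *buffer_count)
      *buffer_count = first + count;
}
```

A real handler would also need to mark the XFB sysvals dirty so the next draw re-uploads the addresses; that step is omitted here because it depends on driver internals not shown in this document.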

H. meson.build registration

Add jm/panvk_vX_cmd_xfb.c to the JM file list in src/panfrost/vulkan/meson.build.

I. rasterizerDiscardEnable

Honor VkPipelineRasterizationStateCreateInfo.rasterizerDiscardEnable if not already — apps doing pure-XFB capture set this. Skip the rasterizer + frag job emission when set. Check existing PanVk JM pipeline code; this may already work.

Open questions / risks

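  1. Counter buffer semantics. The counter buffers passed to vkCmdBeginTransformFeedbackEXT and vkCmdEndTransformFeedbackEXT let apps pause and resume XFB across command buffers. Initial implementation: ignore them. Note that transformFeedbackDraw = false only rules out vkCmdDrawIndirectByteCountEXT; it does not by itself tell apps that resume is unsupported, so ignoring counter buffers is a known gap. Add later if needed.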

  2. Padded vertex count vs actual vertex count. PanVk uses padded_vertex_count for buffer sizing because of attribute alignment requirements. For XFB the conceptual "num_vertices" is the draw's actual vertex count, not the padded one. Need to make sure vs.num_vertices = draw->info.vertex.count (or equivalent unpadded value), not padded_vertex_count. CHECK THIS in implementation.

  3. maxTransformFeedbackStreams = 1 is tight. GLES3 needs only 1 stream; multi-stream is GL 4.0+ and ANGLE may not require it. Confirm via ANGLE's required-features list.

  4. NIR pass ordering. pan_nir_lower_xfb must run on the shader BEFORE the panvk descriptor lowering (which assumes only certain intrinsics survive). Put it right after nir_lower_system_values.

  5. Shader compilation: single variant or two? Panfrost-Gallium compiles two variants (regular + xfb). For PanVk, running the lowering on a single variant means the XFB stores execute even when the pipeline is bound for non-XFB draws: with xfb.active=false every xfb_address[i] is 0, so the global stores would target NULL and fault. Vulkan allows binding an XFB pipeline outside an XFB block, so this case cannot be ruled out. Options: (a) compile two variants like Gallium does, or (b) guard the stores at the shader level at draw time. Resolution: probably compile two variants. Defer to Phase 2 design check.

  6. Coverage probe. Phase 3 probe should exercise: single buffer write, single stream, single vertex, single triangle, verify byte-exact output.
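
Open question 2 can be made concrete with a small worked model of the record index used by the lowering. The sketch is standalone and the mock_ function names are hypothetical. The padded count of 4 for a 3-vertex draw is an illustrative value, not one taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Record index as computed by the lowered store:
 * instance_id * num_vertices + raw_vertex_id. */
static uint32_t
mock_xfb_record_index(uint32_t instance_id, uint32_t num_vertices,
                      uint32_t raw_vertex_id)
{
   assert(raw_vertex_id < num_vertices);
   return instance_id * num_vertices + raw_vertex_id;
}

/* Number of records left unwritten between the first and last captured
 * record when the padded count is used for an instanced draw. */
static uint32_t
mock_xfb_gap_records(uint32_t instance_count, uint32_t actual_vertices,
                     uint32_t padded_vertices)
{
   assert(instance_count >= 1 && padded_vertices >= actual_vertices);
   return (instance_count - 1) * (padded_vertices - actual_vertices);
}
```

With the unpadded count of 3, vertex 0 of instance 1 is record 3 and the output is tightly packed. With the illustrative padded count of 4 it becomes record 4, leaving record 3 unwritten, which a byte-exact probe comparison on an instanced draw would catch.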

Files-list summary

| Change | File | Lines (est) |
|---|---|---|
| Expose extension | src/panfrost/vulkan/panvk_vX_physical_device.c | +15 |
| Sysval struct | src/panfrost/vulkan/panvk_shader.h | +6 |
| Shader lowering | src/panfrost/vulkan/panvk_vX_shader.c | +15 |
| NIR pass wiring | src/panfrost/vulkan/panvk_vX_shader.c | +6 |
| Cmd state | src/panfrost/vulkan/panvk_cmd_draw.h | +15 |
| Sysval populate | src/panfrost/vulkan/jm/panvk_vX_cmd_draw.c | +15 |
| New cmd handlers | src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c (NEW) | +150 |
| Meson | src/panfrost/vulkan/meson.build | +1 |
| Total Mesa side | | ~220 lines |
| Probe | iter13/probe_xfb.c (NEW in campaign) | +400 |
| Probe shader | iter13/probe_xfb.vert (NEW) | +20 |
| Total probe side | | ~420 lines |

Phase 1 verdict

Implementation scope is bounded and tractable — well-defined surface, all building blocks present, no Bifrost RE needed. Phase 2 (situation analysis) should validate:

  1. The single-variant-vs-two-variants question (open question #5 above)
  2. The padded_vertex_count question (open question #2)
  3. Spec compliance check on the property values (open question #3)

Then Phase 3 writes the probe, Phase 4 implements.

Reference

  • pan_nir_lower_xfb.c (85 lines, full read above)
  • panvk_shader.h:133-175 (graphics_sysvals struct)
  • panvk_vX_shader.c:87-103 (sysval lowering pattern)
  • jm/panvk_vX_cmd_draw.c:824-830 (per-draw sysval population)
  • Panfrost-Gallium oracle: src/gallium/drivers/panfrost/pan_shader.c:125-130, 593-603