panvk-bifrost/mesa-panvk-bifrost/phase4_iter13_close.md
marfrit a4e7d8ab90 initial seed: retrofit campaign lineage from local working trees
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:
- mesa-panvk-bifrost/   — r1..r4 era phase docs (iter1..iter18)
                          (libmali stub blobs at iter18/blob/ excluded
                          — 109MB of RE artifacts replaced with a README
                          pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/             — frozen .tgz source snapshots at each milestone
                          (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00


Phase 4 close — iter13 VK_EXT_transform_feedback implementation

Result: GREEN. PanVk-Bifrost now implements VK_EXT_transform_feedback end-to-end.

Probe outcome

[info] VK_EXT_transform_feedback present on device
[info] transformFeedback=1 geometryStreams=0
[info] vertex 0: (0.000000, 0.000000, 4660.000000, 51966.000000)
[info] vertex 1: (1.000000, 0.000000, 4660.000000, 51966.000000)
[info] vertex 2: (2.000000, 0.000000, 4660.000000, 51966.000000)
[PASS] PanVk-Bifrost transform feedback: 3 vertices captured correctly.

Byte-exact match against the expected vec4(vertex_id, instance_id=0, 0x1234, 0xcafe) for each of the 3 vertices (0x1234 = 4660, 0xcafe = 51966). The output buffer was pre-filled with a 0xDEADBEEF sentinel, so the match confirms that the GPU actually wrote the data and the probe did not read back a stale init pattern.
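The sentinel-then-compare check described above can be sketched in a few lines of host-side C. This is a minimal illustration, not the probe's actual source: the helper names `xfb_prefill` and `xfb_check` are hypothetical, and the capture buffer is assumed to be tightly packed vec4 floats (16 bytes per vertex).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pre-fill pattern named in the probe notes. */
#define XFB_SENTINEL 0xDEADBEEFu

/* Hypothetical helper: fill the capture buffer with the sentinel before the draw. */
static void xfb_prefill(uint32_t *words, size_t count)
{
    assert(words != NULL);
    for (size_t i = 0; i < count; i++)
        words[i] = XFB_SENTINEL;
}

/* Hypothetical helper: check that vertex v holds vec4(v, 0, 0x1234, 0xcafe).
 * The comparison is bit-for-bit (memcmp), so untouched sentinel words fail. */
static bool xfb_check(const uint32_t *words, uint32_t num_vertices)
{
    assert(words != NULL);
    for (uint32_t v = 0; v < num_vertices; v++) {
        const float expected[4] = { (float)v, 0.0f, (float)0x1234, (float)0xcafe };
        if (memcmp(&words[(size_t)v * 4], expected, sizeof(expected)) != 0)
            return false;
    }
    return true;
}
```

A buffer that still holds the sentinel fails `xfb_check`, which is what makes the pre-fill useful: a pass can only come from data written after the fill.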

Source landings on ohm (mesa 26.0.6)

Files modified (1 NEW + 6 edited):

| File | Change |
| --- | --- |
| src/panfrost/vulkan/panvk_shader.h | sysval struct: + uint32_t num_vertices, uint64_t xfb_address[4] (under PAN_ARCH < 9) |
| src/panfrost/vulkan/panvk_vX_physical_device.c | extension + feature + properties exposure (PAN_ARCH < 9 gate) |
| src/panfrost/vulkan/panvk_vX_shader.c | (1) #include "pan_nir.h" (2) sysval lowering cases for load_num_vertices + load_xfb_address[0..3] (3) the 3-pass XFB lowering (nir_opt_constant_folding → nir_io_add_intrinsic_xfb_info → pan_nir_lower_xfb) inserted AFTER nir_lower_io in panvk_lower_nir (4) inputs.no_idvs = true for XFB-bearing vertex shaders |
| src/panfrost/vulkan/panvk_cmd_draw.h | + xfb substruct in panvk_cmd_graphics_state (active flag + buffer_count + 4× buffers) |
| src/panfrost/vulkan/panvk_vX_cmd_draw.c | per-draw set_gfx_sysval for vs.num_vertices + vs.xfb_address[0..3] |
| src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c | NEW: CmdBind/Begin/EndTransformFeedbackEXT entry points |
| src/panfrost/vulkan/meson.build | + 'jm/panvk_vX_cmd_xfb.c' in jm_files |

Key learnings (vs Phase 1 source map)

  1. Pass placement matters. Phase 1's plan put pan_nir_lower_xfb inside panvk_preprocess_nir. Wrong — at that point the shader still has store_deref (var-based) intrinsics. nir_lower_io (which converts var-stores → store_output intrinsics) runs later inside panvk_lower_nir. The pass must run right after nir_lower_io, mirroring Panfrost-Gallium's flow where nir_lower_io precedes the XFB block in pan_create_shader_state.

  2. nir_io_add_intrinsic_xfb_info is mandatory. Phase 1 assumed nir->xfb_info was the gate. Wrong — Mesa's pass that converts SPV xfb decorations into intrinsic-attached io_xfb info needs to run first. Gating on nir->info.has_transform_feedback_varyings instead (set by SPV→NIR for XFB-decorated outputs) is the correct trigger.

  3. no_idvs is non-negotiable. Phase 1 noted Panfrost-Gallium sets inputs.no_idvs = has_transform_feedback_varyings but framed it as optional. It isn't: IDVS splits vertex shading into position and varying paths, but in the JM job model the varying path doesn't run for raster-discarded draws. A single non-IDVS vertex job is required for XFB.

  4. The sysval dirty mechanism does work for array fields. set_gfx_sysval(..., vs.xfb_address[0], _xa0) expands correctly via offsetof(struct, vs.xfb_address[0]) + sizeof(uint64_t) macros. Confirmed empirically — the FAU upload triggered as expected and the shader read the correct address.
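Learning 4 rests on `offsetof` accepting a member designator with an array subscript, so that a setter macro can derive both the offset and the size of a single array element. The sketch below illustrates that pattern only. `demo_sysvals`, `demo_state`, `demo_mark_dirty` and `demo_set_sysval` are hypothetical stand-ins, not panvk's real struct layout or its `set_gfx_sysval` macro, and the one-dirty-bit-per-32-bit-word scheme is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the graphics sysval block; NOT panvk's layout. */
struct demo_sysvals {
    struct {
        uint32_t num_vertices;
        uint64_t xfb_address[4];
    } vs;
};

struct demo_state {
    struct demo_sysvals sysvals;
    uint64_t dirty_words; /* assumption: one dirty bit per 32-bit word */
};

/* Mark every 32-bit word covered by [offset, offset + size) as dirty. */
static void demo_mark_dirty(struct demo_state *st, size_t offset, size_t size)
{
    assert(offset % 4 == 0 && size % 4 == 0);
    assert(offset + size <= sizeof(st->sysvals));
    for (size_t w = offset / 4; w < (offset + size) / 4; w++)
        st->dirty_words |= UINT64_C(1) << w;
}

/* The field argument may include an array subscript, e.g. vs.xfb_address[1]:
 * offsetof and sizeof both resolve to that single element. */
#define demo_set_sysval(st, field, value)                              \
    do {                                                               \
        (st)->sysvals.field = (value);                                 \
        demo_mark_dirty((st), offsetof(struct demo_sysvals, field),    \
                        sizeof((st)->sysvals.field));                  \
    } while (0)
```

Calling `demo_set_sysval(&st, vs.xfb_address[1], addr)` marks exactly the two 32-bit words backing that element, which matches the behaviour the learning reports for the real macro.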

What the working shader looks like

After all passes, the vertex shader does:

store_global(addr = xfb_address[0] + (instance_id * num_vertices + vertex_id) * stride,
             value = (vertex_id_as_float, instance_id_as_float, 4660.0, 51966.0))

Where xfb_address[0] is a 64-bit FAU sysval populated per-draw from cmdbuf->state.gfx.xfb.buffers[0].addr + offset.
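The address arithmetic in the store above can be checked on the host with a small helper. This is a sketch under stated assumptions: `xfb_store_address` is a hypothetical name, and widening the index to 64 bits before multiplying by the stride is a choice made here, not something the lowered shader is confirmed to do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring
 *   addr = xfb_address[0] + (instance_id * num_vertices + vertex_id) * stride
 * with the index widened to 64 bits (an assumption of this sketch). */
static uint64_t xfb_store_address(uint64_t base, uint32_t instance_id,
                                  uint32_t num_vertices, uint32_t vertex_id,
                                  uint32_t stride)
{
    assert(vertex_id < num_vertices);
    uint64_t index = (uint64_t)instance_id * num_vertices + vertex_id;
    return base + index * stride;
}
```

For the probe's case (one instance, 3 vertices, a vec4 stride of 16 bytes) the three stores land at base, base + 16 and base + 32, so a second instance would start at base + 48.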

Phase 4 artifact snapshot

Working state of all 7 source files captured in iter13/applied_state/ for replication.

Next: Phase 5

Per CLAUDE.md "Reviews are never skippable" — second-model review of the implementation.