panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18).
  The libmali stub blobs at iter18/blob/ are excluded: 109 MB of RE
  artifacts replaced with a README pointer.
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe.
- evidence/: frozen .tgz source snapshots at each milestone
  (basis for the 0005 patch diff generation).
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 4 close — iter13 VK_EXT_transform_feedback implementation
Result: GREEN. PanVk-Bifrost now implements VK_EXT_transform_feedback end-to-end.
Probe outcome
[info] VK_EXT_transform_feedback present on device
[info] transformFeedback=1 geometryStreams=0
[info] vertex 0: (0.000000, 0.000000, 4660.000000, 51966.000000)
[info] vertex 1: (1.000000, 0.000000, 4660.000000, 51966.000000)
[info] vertex 2: (2.000000, 0.000000, 4660.000000, 51966.000000)
[PASS] PanVk-Bifrost transform feedback: 3 vertices captured correctly.
Byte-exact match against expected vec4(vertex_id, instance_id=0, 0x1234, 0xcafe) for each of 3 vertices. Output buffer was pre-filled with 0xDEADBEEF sentinel — verified GPU actually wrote real data, not a stale init pattern.
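The check described above can be sketched in host-side C. This is a minimal illustration, not the probe's actual source: the helper names `xfb_fill_sentinel` and `xfb_check_capture` are hypothetical, and it assumes the capture buffer holds tightly packed `vec4` floats, one per vertex, with `instance_id` fixed at 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XFB_SENTINEL 0xDEADBEEFu

/* Pre-fill the capture buffer so words the GPU never wrote stay recognisable. */
static void xfb_fill_sentinel(uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        words[i] = XFB_SENTINEL;
}

/* Compare each captured vec4 byte-for-byte against
 * vec4(vertex_id, instance_id = 0, 0x1234, 0xcafe).
 * A buffer still holding the sentinel pattern fails this comparison. */
static bool xfb_check_capture(const void *buf, uint32_t num_vertices)
{
    assert(buf != NULL);
    for (uint32_t v = 0; v < num_vertices; v++) {
        const float expected[4] = {
            (float)v, 0.0f, (float)0x1234, (float)0xcafe
        };
        const uint8_t *got = (const uint8_t *)buf + (size_t)v * sizeof(expected);
        if (memcmp(got, expected, sizeof(expected)) != 0)
            return false;
    }
    return true;
}
```

Using `memcmp` rather than float equality keeps the comparison byte-exact, which is what the probe result claims; `0x1234` and `0xcafe` convert to the `4660.0` and `51966.0` seen in the log.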
Source landings on ohm (mesa 26.0.6)
Files modified (1 NEW + 6 edited):
| File | Change |
|---|---|
| `src/panfrost/vulkan/panvk_shader.h` | sysval struct: + `uint32_t num_vertices`, `uint64_t xfb_address[4]` (under `PAN_ARCH < 9`) |
| `src/panfrost/vulkan/panvk_vX_physical_device.c` | extension + feature + properties exposure (`PAN_ARCH < 9` gate) |
| `src/panfrost/vulkan/panvk_vX_shader.c` | (1) `#include "pan_nir.h"` (2) sysval lowering cases for `load_num_vertices` + `load_xfb_address[0..3]` (3) the 3-pass XFB lowering (`nir_opt_constant_folding` → `nir_io_add_intrinsic_xfb_info` → `pan_nir_lower_xfb`) inserted AFTER `nir_lower_io` in `panvk_lower_nir` (4) `inputs.no_idvs` true for XFB-bearing vertex shaders |
| `src/panfrost/vulkan/panvk_cmd_draw.h` | + `xfb` substruct in `panvk_cmd_graphics_state` (active flag + `buffer_count` + 4× buffers) |
| `src/panfrost/vulkan/panvk_vX_cmd_draw.c` | per-draw `set_gfx_sysval` for `vs.num_vertices` + `vs.xfb_address[0..3]` |
| `src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c` | NEW: `CmdBind/Begin/EndTransformFeedbackEXT` entry points |
| `src/panfrost/vulkan/meson.build` | + `'jm/panvk_vX_cmd_xfb.c'` in `jm_files` |
Key learnings (vs Phase 1 source map)
- Pass placement matters. Phase 1's plan put `pan_nir_lower_xfb` inside `panvk_preprocess_nir`. Wrong: at that point the shader still has `store_deref` (var-based) intrinsics. `nir_lower_io` (which converts var-stores → `store_output` intrinsics) runs later inside `panvk_lower_nir`. The pass must run right after `nir_lower_io`, mirroring Panfrost-Gallium's flow where `nir_lower_io` precedes the XFB block in `pan_create_shader_state`.
- `nir_io_add_intrinsic_xfb_info` is mandatory. Phase 1 assumed `nir->xfb_info` was the gate. Wrong: Mesa's pass that converts SPV xfb decorations into intrinsic-attached `io_xfb` info needs to run first. Gating on `nir->info.has_transform_feedback_varyings` instead (set by SPV→NIR for XFB-decorated outputs) is the correct trigger.
- `no_idvs` is non-negotiable. Phase 1 noted Panfrost-Gallium sets `inputs.no_idvs = has_transform_feedback_varyings` but framed it as optional. It isn't: IDVS splits vertex shading into position + varying paths, but the JM job model for the varying path doesn't run for raster-discarded draws. A single non-IDVS vertex job is required for XFB.
- The sysval dirty mechanism does work for array fields. `set_gfx_sysval(..., vs.xfb_address[0], _xa0)` expands correctly via the `offsetof(struct, vs.xfb_address[0])` + `sizeof(uint64_t)` macros. Confirmed empirically: the FAU upload triggered as expected and the shader read the correct address.
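The last point, that an `offsetof`/`sizeof` macro copes with an array-element member designator, can be illustrated in isolation. The sketch below is a hypothetical stand-in and not PanVk's `set_gfx_sysval`: the struct layout, the `demo_` names, and the one-dirty-bit-per-8-byte-word scheme are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sysval block; NOT the real PanVk layout. */
struct demo_sysvals {
    struct {
        uint32_t num_vertices;
        uint32_t pad;
        uint64_t xfb_address[4];
    } vs;
};

struct demo_state {
    struct demo_sysvals sysvals;
    uint64_t dirty_words; /* assumed scheme: one bit per 8-byte word */
};

/* Mark every 8-byte word overlapped by [offset, offset + size) as dirty. */
static void demo_mark_dirty(struct demo_state *st, size_t offset, size_t size)
{
    assert(size > 0 && offset + size <= sizeof(st->sysvals));
    for (size_t w = offset / 8; w <= (offset + size - 1) / 8; w++)
        st->dirty_words |= UINT64_C(1) << w;
}

/* Write a field only when its value changes, then flag the words it covers.
 * The member designator may carry a constant array subscript, e.g.
 * vs.xfb_address[2], which GCC and Clang accept inside offsetof. */
#define DEMO_SET_SYSVAL(st, member, value)                                 \
    do {                                                                   \
        if ((st)->sysvals.member != (value)) {                             \
            (st)->sysvals.member = (value);                                \
            demo_mark_dirty((st), offsetof(struct demo_sysvals, member),   \
                            sizeof((st)->sysvals.member));                 \
        }                                                                  \
    } while (0)
```

With this layout, `DEMO_SET_SYSVAL(&st, vs.xfb_address[2], addr)` resolves to byte offset 24 and size 8, so only word 3 is flagged; setting the same value again flags nothing.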
What the working shader looks like
After all passes, the vertex shader does:
store_global(addr = xfb_address[0] + (instance_id * num_vertices + vertex_id) * stride,
value = (vertex_id_as_float, instance_id_as_float, 4660.0, 51966.0))
Where xfb_address[0] is a 64-bit FAU sysval populated per-draw from cmdbuf->state.gfx.xfb.buffers[0].addr + offset.
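The address arithmetic in that store can be checked on the host with a few lines of C. This is a sketch of the formula only: the function name `xfb_store_address` is hypothetical, and widening the index to 64 bits before multiplying is an assumption of the example, not a statement about the arithmetic the lowered shader performs.

```c
#include <assert.h>
#include <stdint.h>

/* Byte address written by one vertex invocation:
 *   base + (instance_id * num_vertices + vertex_id) * stride
 * where base corresponds to xfb_address[0]. */
static uint64_t xfb_store_address(uint64_t base, uint32_t instance_id,
                                  uint32_t num_vertices, uint32_t vertex_id,
                                  uint32_t stride)
{
    assert(vertex_id < num_vertices);
    /* Widen first so the linear index cannot wrap in 32 bits. */
    uint64_t index = (uint64_t)instance_id * num_vertices + vertex_id;
    return base + index * stride;
}
```

For the probe's draw (3 vertices, one instance, 16-byte `vec4` stride), vertex 2 lands 32 bytes past the base; a second instance would start at byte offset 48.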
Phase 4 artifact snapshot
Working state of all 7 source files captured in iter13/applied_state/ for replication.
Next: Phase 5
Per CLAUDE.md "Reviews are never skippable" — second-model review of the implementation.