panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
- mesa-panvk-bifrost/ — r1..r4 era phase docs (iter1..iter18)
(libmali stub blobs at iter18/blob/ excluded
— 109MB of RE artifacts replaced with a README
pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/ — frozen .tgz source snapshots at each milestone
(basis for the 0005 patch diff generation)
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 4 progress (incomplete) — iter16
Status: WIP. Probe-correct, infrastructure-in-place, integration-blocked.
What works
- panvk_vX_winding.c (new file) compiles cleanly and builds into the v6/v7 archives as the panvk_v6_get_winding_table / panvk_v7_get_winding_table symbols. Tables for 7 topologies are verified against the Phase 3 probe expectations.
- The injection point in jm/panvk_vX_cmd_draw.c::CmdDraw correctly detects xfb.active + non-LIST topology, looks up the winding table, builds the synthetic index buffer with the correct decomposition pattern (0 1 2 1 3 2 2 3 4 3 5 4 4 5 6 5 7 6 for an 8-vertex triangle strip), and builds the VkDrawIndexedIndirectCommand with indexCount = 18.
- The vs.num_vertices sysval override correctly uses decomposed_count (18) instead of info->vertex.count (0 for indexed draws).
- IB and topology state overrides and dirty bits are set correctly.
What's broken
- After panvk_cmd_draw_indirect(cmdbuf, &draw) returns, the captured XFB output shows 8 entries, 0,1,2,3,4,5,6,7, identical to the iter13 baseline non-indexed dispatch. Expected: 18 entries (0,1,2,1,3,2,...).
- Entries 8..63 of the capture buffer are 0xDEADBEEF (sentinels), so the dispatch was 8 invocations, with gl_VertexIndex values consistent with a non-indexed draw at firstVertex=0.
- The fall-through trace "[iter16] FALL-THROUGH to non-indexed CmdDraw" does not print, confirming that the return from the injection block fires correctly.
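The inference in the second bullet can be made mechanical. The helpers below are hypothetical and are not part of probe_winding.c or probe_idx.c. They assume the capture buffer is pre-filled with 0xDEADBEEF and that each VS invocation writes exactly one slot. The first counts the slots written before the first sentinel. The second checks whether every captured value equals its slot index, which is what a non-indexed draw with firstVertex = 0 produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed pre-fill value written to every capture slot before the draw. */
#define CAPTURE_SENTINEL 0xDEADBEEFu

/* Number of slots written before the first sentinel. If each VS
 * invocation writes exactly one slot, this is the invocation count. */
static size_t
count_captured(const uint32_t *buf, size_t len)
{
   size_t n = 0;

   assert(buf != NULL);
   while (n < len && buf[n] != CAPTURE_SENTINEL)
      n++;
   return n;
}

/* 1 if captured entry i equals i for every slot, i.e. the gl_VertexIndex
 * sequence of a non-indexed draw with firstVertex = 0; 0 otherwise. */
static int
looks_non_indexed(const uint32_t *buf, size_t captured)
{
   for (size_t i = 0; i < captured; i++) {
      if (buf[i] != (uint32_t)i)
         return 0;
   }
   return 1;
}
```

For the iter16 capture, count_captured returns 8 and looks_non_indexed returns 1. A correct decomposed capture would fail the second check at slot 3, which should hold 1 rather than 3.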
What's been verified to NOT be the cause
- Probe correctness: a parallel sanity probe (probe_idx.c) calls vkCmdBindIndexBuffer + vkCmdDrawIndexed (6 indices, [10..15]) and correctly captures 10,11,12,13,14,15 via XFB. So:
  - iter13's XFB implementation handles indexed draws correctly via the public CmdDrawIndexed entry.
  - The patched library doesn't regress indexed XFB.
- IB-state dirty marking: added gfx_state_set_dirty(cmdbuf, IB) after the override (matching CmdBindIndexBuffer2). No effect.
- Topology dynamic-state dirty bit: added BITSET_SET(...dirty, MESA_VK_DYNAMIC_IA_PRIMITIVE_TOPOLOGY). No effect.
Hypothesis (untested)
The difference between "my injection inside CmdDraw" and "the public CmdDrawIndexed entry" must lie in implicit state setup that happens BETWEEN the bind and the draw. Specifically, it must be setup that requires the bind to have been a real vkCmd call rather than a direct state mutation. Possibilities:
- BO tracking: when CmdBindIndexBuffer2 registers the VkBuffer with the batch, that may add the underlying BO to the batch's BO list for kernel mapping. My synthetic IB, allocated via panvk_cmd_alloc_dev_mem, should be tied to the cmdbuf but may need explicit BO-list registration.
- Vertex-job descriptor cached pre-draw: an earlier point in command recording may have emitted a vertex-job descriptor based on the topology and IB-bound state at that time. My runtime override doesn't trigger a re-emission because the dirty-bit flow doesn't reach the descriptor cache.
- Render-pass-scope state snapshot: pBeginRendering may have captured topology/IB into batch-local copies that my mutation doesn't update.
Resolving any of these requires one of the following: deep panvk-internals expertise; GPU-side debugging tools (RGP / Mali Graph Profiler); or restructuring the iter16 fix to operate at a different layer (e.g. NIR-pass-level decomposition, or a state-restore pattern that goes through pBindIB).
Consulted Sonnet architect 2026-05-21 — verdict + outcome
Architect picked Path B: call panvk_per_arch(CmdDrawIndexed) from inside the injection instead of constructing the indirect command and calling panvk_cmd_draw_indirect manually. Diagnosis: draw->info.index.size = 0 somewhere; using the public entry should fix it.
Tested. Same failure: the capture holds 8 entries, 0,1,2,3,4,5,6,7 (the non-indexed pattern). The architect's diagnosis didn't apply, because my code already sets .index.size = cmdbuf->state.gfx.ib.index_size = 4. The bug isn't in that struct field.
Additional test: a sanity probe that calls vkCmdBindIndexBuffer AFTER pBeginRendering but before BindPipeline works perfectly (it captures the bound indices via XFB). So render-pass scope itself isn't the gap. The gap is specifically state mutation from within CmdDraw versus a separate vkCmdBindIndexBuffer call recorded as its own vkCmd. Possibly:
- pipeline-bind-time descriptor emission captures IB-bound state at that moment
- some BO-list registration happens in CmdBindIndexBuffer2 (via VK_FROM_HANDLE(panvk_buffer) path) that direct state writes skip
- Mali JM-specific dirty-tracking that needs explicit invalidation we're missing
Architect's Path C (NIR-pass-level decomposition) is the remaining structural option: 200-400 LoC in pan_nir_lower_xfb to emit multiple store_global writes per VS invocation. It bypasses dispatch entirely, but it is a multi-day investment in Mesa internals.
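The sketch below shows the per-invocation mapping that a Path C lowering would need. It assumes a triangle strip decomposed as (t, t + 1 + t % 2, t + 2 - t % 2) per triangle t. The function name is hypothetical, and the function computes slot indices on the CPU. A real pass would emit the equivalent arithmetic as NIR in pan_nir_lower_xfb. Vertex v can belong only to triangles v - 2, v - 1 and v, so each VS invocation needs at most three stores.

```c
#include <assert.h>
#include <stdint.h>

/* For vertex v of an n-vertex triangle strip, writes the positions in the
 * decomposed list (0 1 2 1 3 2 ...) where v appears into slots[], in
 * ascending order, and returns how many there are (at most 3). Under a
 * Path C style lowering, each VS invocation would issue one store per
 * returned slot. Hypothetical helper, not Mesa code. */
static uint32_t
tri_strip_slots_for_vertex(uint32_t v, uint32_t n, uint32_t slots[3])
{
   uint32_t count = 0;

   assert(v < n);
   if (n < 3)
      return 0;                 /* no complete triangle */
   uint32_t last_tri = n - 3;

   /* Candidate triangles t = v - 2, v - 1, v (k = 2, 1, 0). */
   for (uint32_t k = 3; k-- > 0;) {
      if (v < k)
         continue;              /* t would be negative */
      uint32_t t = v - k;
      if (t > last_tri)
         continue;              /* past the last triangle */
      uint32_t d = v - t;       /* 0, 1 or 2 within (t, t + 1, t + 2) */
      /* Odd triangles are emitted as (t, t + 2, t + 1): swap the tail. */
      uint32_t offset = ((t & 1) && d != 0) ? 3 - d : d;
      slots[count++] = 3 * t + offset;
   }
   return count;
}
```

For an 8-vertex strip, vertex 2 maps to slots 2, 5 and 6. The per-vertex counts (1, 2, 3, 3, 3, 3, 2, 1) sum to the same 18 entries that the dispatch-level approach tries to produce.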
Recommended next attempts (in order)
- Path D, defer iter16 (chosen 2026-05-21): documentary close. The campaign's iter13/iter15 deliverables are unchanged, and the 162 winding failures remain known and categorized.
- Path C, NIR-pass decomposition: pursue when bandwidth allows. It bypasses the dispatch-level mystery entirely by doing the decomposition at shader-compile time. Pure Mesa work; it could land upstream alongside iter13's transform_feedback patches.
- Path B, deep debug: revisit with Mali Graph Profiler / RGP to see which GPU descriptors are actually committed at dispatch. Likely 1-2 more days of driver-internals work to isolate the BO-or-cache divergence.
Files modified on ohm (for resume)
- src/panfrost/vulkan/panvk_cmd_draw.h: extended xfb substruct + winding_table struct + per-arch decls
- src/panfrost/vulkan/panvk_vX_cmd_draw.c: vs.num_vertices override + debug fprintf (remove before commit)
- src/panfrost/vulkan/jm/panvk_vX_cmd_draw.c: CmdDraw injection + debug fprintfs (remove before commit)
- src/panfrost/vulkan/panvk_vX_winding.c: NEW
- src/panfrost/vulkan/meson.build: register winding.c
Probe state
/home/mfritsche/src/panvk-bifrost/iter16/probe_winding.c works as a regression test. It is verified to FAIL on the iter13 r3 baseline (captures 8 entries, not 18, for triangle_strip) and should PASS once the fix lands. The pre-iter16 baseline and the iter16 WIP both fail identically. That makes the probe useful for confirming whether the fix has changed anything observable yet.
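The probe's PASS/FAIL rule could look like the hypothetical helper below. It is a sketch, not the actual probe_winding.c logic. It compares the first N captured entries with the expected decomposition and then requires everything past N to still hold the sentinel. Under this rule, the iter13 baseline capture (0..7) fails at slot 3, which holds 3 instead of the expected 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed pre-fill value for unwritten capture slots. */
#define WINDING_SENTINEL 0xDEADBEEFu

/* Hypothetical verdict helper: returns 1 (PASS) only if the first
 * expected_len captured entries match the expected decomposition and every
 * later slot still holds the sentinel; otherwise reports and returns 0. */
static int
winding_verdict(const uint32_t *capture, size_t capture_len,
                const uint32_t *expected, size_t expected_len)
{
   assert(capture != NULL && expected != NULL);
   assert(expected_len <= capture_len);

   for (size_t i = 0; i < expected_len; i++) {
      if (capture[i] != expected[i]) {
         fprintf(stderr, "FAIL: slot %zu holds %u, expected %u\n",
                 i, (unsigned)capture[i], (unsigned)expected[i]);
         return 0;
      }
   }
   /* Extra writes past the expected count also mean the dispatch is wrong. */
   for (size_t i = expected_len; i < capture_len; i++) {
      if (capture[i] != WINDING_SENTINEL) {
         fprintf(stderr, "FAIL: unexpected write at slot %zu\n", i);
         return 0;
      }
   }
   return 1;
}
```

Checking the trailing sentinels as well as the expected prefix means a fix that over-dispatches is not mistaken for a PASS.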
— claude-noether, 2026-05-21