# Phase 8 close — iter16: DEFERRED

**Result:** iter16 closes as **Path D: investigation complete, fix deferred**. The 162 winding-order CTS fails categorized in iter15 remain known and documented; the campaign's iter13 + iter15 deliverables are unchanged.

## What was attempted

Driver-side primitive decomposition for transform_feedback on non-LIST topologies (TRIANGLE_STRIP / LINE_STRIP / TRIANGLE_FAN / `*_WITH_ADJACENCY`). Plan: inside `panvk_per_arch(CmdDraw)`, when XFB is active and the topology is non-LIST, build a synthetic index buffer encoding the spec-required decomposition and dispatch it as indexed-LIST.

**Infrastructure built (all working, tested):**

- `panvk_vX_winding.c`: topology decomposition tables for 7 topologies
- `panvk_winding_table` struct + `panvk_per_arch(get_winding_table)` API
- `cmdbuf->state.gfx.xfb.decomposed_count` field + sysval override for `vs.num_vertices`
- IB + topology state save/restore around the synthetic dispatch
- IB dirty bit + `MESA_VK_DYNAMIC_IA_PRIMITIVE_TOPOLOGY` dirty bit set
- Regression probe (`iter16/probe_winding.c`) parametrized for 3+ topologies

**What didn't work (Path A & Path B both):**

- Path A: calling `panvk_cmd_draw_indirect` directly with a manually constructed `VkDrawIndexedIndirectCommand`
- Path B (the architect's recommendation): calling `panvk_per_arch(CmdDrawIndexed)` from inside the injection after state mutation

Both produce the same 8-entry non-indexed output (`0,1,2,3,4,5,6,7` for an 8-vertex triangle strip), not the expected 18-entry decomposed output (`0,1,2,1,3,2,...`).

## What was definitively isolated

- iter13 XFB + `vkCmdDrawIndexed` via public entry points: **works**, confirmed by `iter16/probe_idx.c`. The 6 indices `[10,11,12,13,14,15]` were captured exactly.
- Render-pass scope isn't the issue: `vkCmdBindIndexBuffer` AFTER `vkCmdBeginRendering` works fine if it's a real `vkCmd` call.
- `info.index.size` being zero isn't the issue (the architect's diagnosis): my draw construction set it correctly to 4.
- The mystery: **state-mutation-from-within-CmdDraw doesn't reproduce what a separate `vkCmdBindIndexBuffer2` call sets up.** Hypotheses still on the table:
  - Pipeline-bind-time descriptor emission captures IB-bound state at that moment
  - `VK_FROM_HANDLE(panvk_buffer)` in CmdBindIndexBuffer2 registers BO with batch in a way direct state writes skip
  - Mali JM dirty-tracking needs explicit invalidation we're missing
- Resolving requires either Mali Graph Profiler / RGP (we don't have) or significantly more time in driver internals.

## What ships from iter16

- All Phase 0-3 docs in `iter16/` (substrate, source map, design lock, probe + Makefile)
- The full WIP code in `iter16/applied_state/` (`panvk_vX_winding.c` plus the modifications to `panvk_cmd_draw.h`, `panvk_vX_cmd_draw.c`, `jm/panvk_vX_cmd_draw.c`, `meson.build`), applied on ohm but reverted from any published package
- `iter16/probe_winding.c` + `probe_idx.c`: both probes work as regression tests if iter16 resumes
- `iter16/phase4_progress.md`: detailed status for resumer, including the architect consultation outcome
- `iter16/phase8_close.md`: this doc

## What does NOT ship from iter16

- No code changes to the published `mesa-panvk-bifrost-26.0.6.r3` package
- No CTS rerun (the 162 winding fails remain, same as iter15's measurement)
- No upstream Mesa MR

## Why deferred and not "Path C — NIR-pass decomposition"

Path C is the remaining structural option and probably the right long-term fix (200-400 LoC in `pan_nir_lower_xfb` to emit multiple `nir_store_global` calls per VS invocation, one per primitive each vertex contributes to). It would bypass the dispatch-level mystery entirely. But:

- It's multi-day Mesa-internals work (NIR builder + shader-cache invalidation + per-topology lowering rules).
- Real-world impact is approximately zero: **ANGLE on Vulkan (the iter13/Brave motivator) doesn't trigger this path** because ANGLE pre-decomposes strip topologies before issuing the Vulkan call (mirroring OpenGL's own decomposition rules).
- The standing iter13 + iter15 campaign deliverables (Vulkan-on-Brave + 75.7% transform_feedback CTS pass rate) are NOT affected by leaving this open.

Path C remains the right move if someone returns to iter16 with time and motivation.

## ohm state cleanup

The WIP iter16 patches are still applied on ohm at `/home/mfritsche/mesa-build/mesa-26.0.6/`. They build clean. The patched lib is in `/home/mfritsche/panvk-patched-libs/libvulkan_panfrost.so`, but **the system-installed `/usr/lib/panvk-bifrost/` is still untouched r3**, so the campaign's published-package behavior is unchanged.

To fully revert ohm to a clean iter13-only source state (if needed for a future iter), use the patches in `iter16/applied_state/`. They are easy to identify (all marked with `iter16:` comments) and to reverse-patch.

## Bottom line

iter16 = investigation closed. Path D (defer) was chosen because Path B (the architect's pick) didn't pan out and Path C (NIR pass) wasn't worth a multi-day investment given zero real-world impact on the iter9/iter13 ANGLE-on-Vulkan campaign target. Anyone resuming iter16 should start from `iter16/phase4_progress.md` and the listed hypotheses.

— claude-noether, 2026-05-21
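
## Appendix: triangle-strip decomposition sketch

For anyone resuming, the decomposition that the synthetic index buffer was meant to encode can be sketched in a few lines. This is a minimal illustration for TRIANGLE_STRIP only, assuming the Vulkan rule that strip primitive `i` uses vertices `{i, i+(1+i%2), i+(2-i%2)}`. The function name `decompose_triangle_strip` and its signature are hypothetical; this is not the `panvk_winding_table` code in `iter16/applied_state/`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of panvk): expand a triangle strip of
 * `vertex_count` vertices into a triangle-LIST index buffer, swapping the
 * last two vertices of every odd triangle so winding order is preserved.
 *
 * Returns the number of indices written, or 0 if the strip has fewer than
 * 3 vertices or `out_cap` is too small.
 */
static uint32_t
decompose_triangle_strip(uint32_t vertex_count, uint32_t *out, uint32_t out_cap)
{
    assert(out != NULL);

    if (vertex_count < 3)
        return 0;

    uint32_t prim_count = vertex_count - 2;
    if (prim_count > out_cap / 3)
        return 0;

    for (uint32_t i = 0; i < prim_count; i++) {
        uint32_t odd = i & 1u;          /* 1 for odd-numbered triangles */
        out[3 * i + 0] = i;
        out[3 * i + 1] = i + 1 + odd;   /* i+1 (even) or i+2 (odd) */
        out[3 * i + 2] = i + 2 - odd;   /* i+2 (even) or i+1 (odd) */
    }

    return prim_count * 3;
}
```

Calling it with `vertex_count = 8` and an 18-entry output array yields 18 indices beginning `0,1,2,1,3,2`, which is the decomposed output the probes expected instead of the 8-entry `0,1,2,3,4,5,6,7`.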