initial seed: retrofit campaign lineage from local working trees

panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan video decode) shipped before this repo existed. The deliverable patches live in marfrit-packages, but the reasoning chain, phase docs, and source-state evidence lived only in local working trees on the development host.

This retrofit imports:
- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18); the libmali stub blobs at iter18/blob/ are excluded (109MB of RE artifacts replaced with a README pointer)
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is a commit rather than a snapshot. See [[feedback-session-local-process-pins]] for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00
parent 430d0da278
commit a4e7d8ab90
124 changed files with 22551 additions and 1 deletions
@@ -0,0 +1,74 @@
+# Phase 1 — source map for iter16
+
+An Explore agent ran on 2026-05-21 against `/home/mfritsche/src/mesa-ref/mesa/src/panfrost/vulkan/`. The mirrored tree on ohm is at `/home/mfritsche/mesa-build/mesa-26.0.6/`.
+
+## Injection points
+
+### Entry points (jm/panvk_vX_cmd_draw.c)
+
+| Function | Lines | Notes |
+|---|---|---|
+| `panvk_per_arch(CmdDraw)` | 1796–1827 | sets `draw.info.vertex.count = vertexCount`; calls `panvk_cmd_draw(cmdbuf, &draw)` |
+| `panvk_per_arch(CmdDrawIndexed)` | 1830–1868 | builds `VkDrawIndexedIndirectCommand` on the fly; calls `panvk_cmd_draw_indirect()` |
+| `panvk_per_arch(CmdDrawIndirect)` | (similar) | GPU-side; **out of iter16 scope** |
+
+Both paths end in `prepare_draw()`. With `info.vs.idvs=false` (the iter13 XFB path), dispatch goes through `panvk_draw_prepare_vertex_job` plus an optional tiler job.
+
+### Pipeline topology
+
+The topology is stored in the **Vulkan dynamic graphics state** as `cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology` and read in `panvk_emit_tiler_primitive()` at line 917 via `translate_prim_topology(ia->primitive_topology)`.
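
A minimal sketch of the classification behind the `needs_decomposition()` check in the Phase 2 design, plus the decomposed vertex count that the `vs.num_vertices` sysval would then have to carry. Assumptions: a local `enum topo` whose values mirror `VkPrimitiveTopology` (so it builds without Vulkan headers); only line strips, triangle strips and triangle fans are handled, with adjacency and patch topologies left out; `decomposed_count()` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in: values mirror VkPrimitiveTopology (non-adjacency subset). */
enum topo {
   TOPO_POINT_LIST = 0,
   TOPO_LINE_LIST = 1,
   TOPO_LINE_STRIP = 2,
   TOPO_TRIANGLE_LIST = 3,
   TOPO_TRIANGLE_STRIP = 4,
   TOPO_TRIANGLE_FAN = 5,
};

/* XFB records strips and fans as separate primitives, so these
 * topologies must be split into LIST primitives before capture. */
static bool
needs_decomposition(enum topo t)
{
   return t == TOPO_LINE_STRIP || t == TOPO_TRIANGLE_STRIP ||
          t == TOPO_TRIANGLE_FAN;
}

/* Vertices emitted by the decomposed LIST draw for `n` input vertices. */
static uint32_t
decomposed_count(enum topo t, uint32_t n)
{
   switch (t) {
   case TOPO_LINE_STRIP:
      return n >= 2 ? 2 * (n - 1) : 0; /* n-1 lines, 2 vertices each */
   case TOPO_TRIANGLE_STRIP:
   case TOPO_TRIANGLE_FAN:
      return n >= 3 ? 3 * (n - 2) : 0; /* n-2 triangles, 3 vertices each */
   default:
      return n; /* list topologies pass through unchanged */
   }
}
```

For example, a 5-vertex triangle strip decomposes into 3 triangles, so `decomposed_count(TOPO_TRIANGLE_STRIP, 5)` is 9 rather than the 5 carried in `info->vertex.count`.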
+
+### Index buffer state
+
+`cmdbuf->state.gfx.ib`:
+- `.dev_addr` — GPU VA
+- `.size` — byte count
+- `.index_size` — 1/2/4 bytes per index
+
+Bound by `vkCmdBindIndexBuffer2` at line 1010 (in `panvk_vX_cmd_draw.c`, not the jm/ variant).
+
+### Scratch BO allocator
+
+`panvk_cmd_alloc_dev_mem(cmdbuf, pool_type, size, alignment)` returns `struct pan_ptr { void *cpu; uint64_t gpu; }`, with its lifetime tied to the command buffer. It is used at line 1844 for the synthetic `VkDrawIndexedIndirectCommand` and at line 459 for varying buffers.
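
To make the `{ cpu, gpu }` pairing concrete, here is a hypothetical bump-arena stand-in. It is not panvk's allocator; `struct scratch_arena`, `scratch_alloc()` and `scratch_reset()` are illustrative names. It shows the two properties the synthetic index buffer relies on: bytes written through `.cpu` are what the GPU reads at `.gpu`, and everything is released together when the command buffer's scratch is reset.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as struct pan_ptr described above. */
struct pan_ptr {
   void *cpu;
   uint64_t gpu;
};

/* Hypothetical command-buffer-scoped arena: one CPU mapping of memory the
 * GPU sees at gpu_base. Both bases are assumed aligned to the largest
 * alignment ever requested. */
struct scratch_arena {
   uint8_t *cpu_base;
   uint64_t gpu_base;
   size_t size;
   size_t offset;
};

/* Bump-allocate `size` bytes at `alignment` (a power of two). The CPU and
 * GPU views advance by the same offset. Returns {NULL, 0} when full. */
static struct pan_ptr
scratch_alloc(struct scratch_arena *a, size_t size, size_t alignment)
{
   struct pan_ptr p = {NULL, 0};
   size_t start = (a->offset + alignment - 1) & ~(alignment - 1);

   if (start > a->size || size > a->size - start)
      return p;

   p.cpu = a->cpu_base + start;
   p.gpu = a->gpu_base + start;
   a->offset = start + size;
   return p;
}

/* Lifetime tied to the command buffer: one reset frees every allocation. */
static void
scratch_reset(struct scratch_arena *a)
{
   a->offset = 0;
}
```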
+
+### XFB sysval injection
+
+Sysvals are set in `cmd_prepare_draw_sysvals` (line 813 in `panvk_vX_cmd_draw.c`). iter13 added `set_gfx_sysval(...vs.xfb_address[N], ...)` and `set_gfx_sysval(...vs.num_vertices, info->vertex.count)`.
+
+## Phase 2 design implications
+
+Cleanest injection sequence (in `panvk_cmd_draw`, before the prepare_draw call):
+
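+```c
+const struct vk_dynamic_graphics_state *dyns =
+    &cmdbuf->vk.dynamic_graphics_state;
+
+if (cmdbuf->state.gfx.xfb.active &&
+    needs_decomposition(dyns->ia.primitive_topology)) {
+    /* Save the user's topology + index buffer bind state first */
+    /* Compute decomposed count + build synthetic index buffer */
+    /* Override draw's topology + index buffer in the existing state */
+    /* ... prepare_draw() ..., then restore the saved bind state */
+}
+```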
+
+The save/restore is critical: the application may issue more draws with the same topology after the XFB-active one, and we must not corrupt its bind state.
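
The ordering the save/restore needs can be sketched with a hypothetical mirror of the bind state (the `ib` fields listed under *Index buffer state* plus the topology): snapshot first, override, emit, then restore. `struct draw_state`, `override_for_xfb()` and `restore_after_xfb()` are illustrative names, and the sketch ignores any dirty-state tracking the real driver may need after the override.

```c
#include <stdint.h>

/* Hypothetical mirror of cmdbuf->state.gfx.ib plus the dynamic topology. */
struct ib_state {
   uint64_t dev_addr;  /* GPU VA */
   uint64_t size;      /* bytes */
   uint8_t index_size; /* 1, 2 or 4 */
};

struct draw_state {
   struct ib_state ib;
   uint32_t topology; /* VkPrimitiveTopology value */
};

/* Snapshot the user's bind state, then install the synthetic index buffer
 * and LIST topology. The caller emits the draw and restores afterwards. */
static struct draw_state
override_for_xfb(struct draw_state *st, struct ib_state synth_ib,
                 uint32_t list_topology)
{
   struct draw_state saved = *st;

   st->ib = synth_ib;
   st->topology = list_topology;
   return saved;
}

/* Put the user's state back so later draws see what they bound. */
static void
restore_after_xfb(struct draw_state *st, const struct draw_state *saved)
{
   *st = *saved;
}
```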
+
+Three sub-paths in implementation:
+1. **CmdDraw + non-LIST topology + XFB active**: easiest. Synthetic index buffer is just `{decomp_idx(0), decomp_idx(1), ...}`. Convert draw to indexed.
+2. **CmdDrawIndexed + non-LIST + XFB**: must resolve through the user's index buffer. CPU-side, that means mapping the user's index buffer; vkMapMemory is not an option, since at this point we only hold the GPU VA and would need a host-coherent mapping of it. The alternative, a synthetic index buffer holding **positions in the user's index buffer**, does not work because Bifrost has no double-indirect indexing. So we need CPU resolution.
+3. **CmdDrawIndirect + non-LIST + XFB**: GPU compute pass to fill the synthetic index buffer. **Out of iter16 scope.**
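
Path 1 can be sketched with a closed-form `decomp_idx(k)`. Each output index is computed independently, which is also the shape a GPU-side fill would need. The vertex ordering is an assumption taken from the Vulkan primitive-topology definitions under the default first-vertex provoking convention; the enum values mirror `VkPrimitiveTopology`, and `build_synthetic_ib()` is a hypothetical helper.

```c
#include <stdint.h>

/* Local stand-in: values mirror VkPrimitiveTopology. */
enum topo {
   TOPO_LINE_STRIP = 2,
   TOPO_TRIANGLE_STRIP = 4,
   TOPO_TRIANGLE_FAN = 5,
};

/* Vertex (relative to the draw's first vertex) used by output slot k of the
 * decomposed LIST draw. Assumed ordering, first-vertex provoking:
 *   line strip:     line i     = (i, i+1)
 *   triangle strip: triangle i = (i, i+1+i%2, i+2-i%2)
 *   triangle fan:   triangle i = (i+1, i+2, 0)
 */
static uint32_t
decomp_idx(enum topo t, uint32_t k)
{
   if (t == TOPO_LINE_STRIP)
      return k / 2 + k % 2;

   uint32_t i = k / 3; /* triangle number */
   uint32_t c = k % 3; /* corner within the triangle */

   if (t == TOPO_TRIANGLE_FAN)
      return c == 2 ? 0 : i + 1 + c;

   /* triangle strip: odd triangles swap their last two corners */
   if (c == 0)
      return i;
   return c == 1 ? i + 1 + i % 2 : i + 2 - i % 2;
}

/* Path 1: fill `count` 32-bit indices, where count is the decomposed
 * vertex count (2*(n-1) for line strips, 3*(n-2) for strips and fans). */
static void
build_synthetic_ib(enum topo t, uint32_t count, uint32_t *out)
{
   for (uint32_t k = 0; k < count; k++)
      out[k] = decomp_idx(t, k);
}
```

Because these indices are relative to the draw's first vertex, the original `firstVertex` would presumably become the converted indexed draw's `vertexOffset`.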
+
+For path 2, the user's index buffer is host-mappable only if it is bound to memory allocated with `VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT`; it may instead live in memory that is not host-visible. Covering that case needs an extra transfer step that copies the indices into a host-visible buffer first.
+
+**Simpler path 2 alternative:** dispatch a compute shader that reads the user's index buffer (GPU-side) and writes the synthetic decomposed index buffer (GPU-side). Compute shader code is straightforward (~30 lines GLSL). This avoids the host-visible-buffer requirement entirely.
+
+Still, the CPU resolve for path 2 has the cleaner code shape if we accept host-visible index buffers only, as a known limitation. Most CTS tests use host-visible index buffers, and XFB combined with indexed draws is uncommon in real-world usage, so the limitation matches how the feature is actually used.
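
A minimal sketch of that CPU resolve, assuming the user's index buffer is already CPU-mapped (the host-visible restriction above). It composes a path-1 style pattern of decomposed positions with the user's indices. `resolve_through_user_ib()` is a hypothetical name. It always writes 32-bit indices so the synthetic buffer can use `index_size` 4, assumes host and GPU share byte order, and does not handle primitive restart, which would split strips and change the pattern.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Path 2 CPU resolve: out[k] = user_ib[first_index + pattern[k]].
 * `pattern` holds decomposed positions relative to firstIndex (the same
 * values a path-1 synthetic buffer would hold); `user_ib` is the CPU-mapped
 * index buffer with 1-, 2- or 4-byte indices. Output is always 32-bit. */
static void
resolve_through_user_ib(const void *user_ib, unsigned index_size,
                        uint32_t first_index, const uint32_t *pattern,
                        uint32_t count, uint32_t *out)
{
   const uint8_t *base = user_ib;

   for (uint32_t k = 0; k < count; k++) {
      const uint8_t *p =
         base + (size_t)(first_index + pattern[k]) * index_size;

      if (index_size == 1) {
         out[k] = p[0];
      } else if (index_size == 2) {
         uint16_t v;
         memcpy(&v, p, sizeof(v)); /* assumes host byte order == GPU's */
         out[k] = v;
      } else {
         uint32_t v;
         memcpy(&v, p, sizeof(v));
         out[k] = v;
      }
   }
}
```

The resolved values are the user's own index values, so the draw's `vertexOffset` would still apply on top of them unchanged.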
+
+## Counts of code touched
+
+- `jm/panvk_vX_cmd_draw.c`: ~150 LoC of new decomposition + dispatch override
+- `panvk_vX_cmd_draw.c`: ~30 LoC for sysval `vs.num_vertices` update
+- `panvk_cmd_draw.h`: ~20 LoC for new helper macros / topology classification
+- NEW file `iter16/winding_lower.c` (or inline): ~100 LoC for the 7 topology-specific decomposition tables
+- Probe: ~250 LoC (Phase 3)
+
+**Total estimated: ~300 LoC (150 + 30 + 20 + 100) + 250 LoC probe = 550 LoC**, in line with the Phase 0 estimate.
+
+— claude-noether, 2026-05-21