marfrit a4e7d8ab90 initial seed: retrofit campaign lineage from local working trees
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:
- mesa-panvk-bifrost/   — r1..r4 era phase docs (iter1..iter18)
                          (libmali stub blobs at iter18/blob/ excluded
                          — 109MB of RE artifacts replaced with a README
                          pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/             — frozen .tgz source snapshots at each milestone
                          (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00

# Phase 1 — source map for iter16
Explore agent ran 2026-05-21 on `/home/mfritsche/src/mesa-ref/mesa/src/panfrost/vulkan/`. Mirror state on ohm at `/home/mfritsche/mesa-build/mesa-26.0.6/`.
## Injection points
### Entry points (jm/panvk_vX_cmd_draw.c)
| Function | Lines | Notes |
|---|---|---|
| `panvk_per_arch(CmdDraw)` | 1796-1827 | sets `draw.info.vertex.count = vertexCount`; calls `panvk_cmd_draw(cmdbuf, &draw)` |
| `panvk_per_arch(CmdDrawIndexed)` | 1830-1868 | builds `VkDrawIndexedIndirectCommand` on the fly; calls `panvk_cmd_draw_indirect()` |
| `panvk_per_arch(CmdDrawIndirect)` | (similar) | GPU-side; **out of iter16 scope** |
The two in-scope entry points (`CmdDraw` and `CmdDrawIndexed`) both terminate in `prepare_draw()`. For `info.vs.idvs=false` (the iter13-XFB path), the dispatch goes through `panvk_draw_prepare_vertex_job` + optional tiler.
### Pipeline topology
Stored in **Vulkan dynamic graphics state** as `cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology`. Accessed in `panvk_emit_tiler_primitive()` at line 917 via `translate_prim_topology(ia->primitive_topology)`.
### Index buffer state
`cmdbuf->state.gfx.ib`:
- `.dev_addr` — GPU VA
- `.size` — byte count
- `.index_size` — 1/2/4 bytes per index
Bound by `vkCmdBindIndexBuffer2` at line 1010 (in `panvk_vX_cmd_draw.c`, not the jm/ variant).
### Scratch BO allocator
`panvk_cmd_alloc_dev_mem(cmdbuf, pool_type, size, alignment)` returns `struct pan_ptr { void *cpu; uint64_t gpu; }`. Lifetime tied to command buffer. Used at line 1844 for the synthetic `VkDrawIndexedIndirectCommand`, at line 459 for varying buffers.
### XFB sysval injection
`cmd_prepare_draw_sysvals` (line 813 in `panvk_vX_cmd_draw.c`). iter13 added `set_gfx_sysval(...vs.xfb_address[N], ...)` and `set_gfx_sysval(...vs.num_vertices, info->vertex.count)`.
## Phase 2 design implications
Cleanest injection sequence (in `panvk_cmd_draw`, before the prepare_draw call):
```
if (cmdbuf->state.gfx.xfb.active &&
    needs_decomposition(dyns->ia.primitive_topology)) {
   /* Compute decomposed count + build synthetic index buffer */
   /* Override draw's topology + index buffer in the existing state */
   /* Save/restore so user's actual bind state isn't trashed */
}
```
The save/restore is critical — the user might issue more draws with the same topology after the XFB-active one. We don't want to corrupt their state.
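The save/restore discipline can be sketched as below. This is a minimal illustration, not panvk code: `struct ib_state` only mirrors the three `cmdbuf->state.gfx.ib` fields listed above, while `struct draw_state`, `draw_with_override`, `emit_fn` and `record_draw` are hypothetical names, and the topology is a plain `int` placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cmdbuf->state.gfx.ib (fields as listed above). */
struct ib_state {
   uint64_t dev_addr;  /* GPU VA */
   uint64_t size;      /* byte count */
   uint8_t index_size; /* 1, 2 or 4 bytes per index */
};

/* Hypothetical bundle of the state the override touches. */
struct draw_state {
   struct ib_state ib;
   int topology; /* placeholder for the primitive topology value */
};

/* Hypothetical callback standing in for the real draw emission. */
typedef void (*emit_fn)(const struct draw_state *state, void *data);

/* Swap in the synthetic index buffer and list topology for exactly one
 * draw, then put the user's bind state back. */
static void
draw_with_override(struct draw_state *state,
                   const struct ib_state *synthetic_ib, int list_topology,
                   emit_fn emit, void *data)
{
   assert(synthetic_ib->index_size == 1 || synthetic_ib->index_size == 2 ||
          synthetic_ib->index_size == 4);

   const struct draw_state saved = *state; /* save */

   state->ib = *synthetic_ib;              /* override */
   state->topology = list_topology;
   emit(state, data);

   *state = saved;                         /* restore */
}

/* Demo callback: records the state the draw was emitted with. */
static void
record_draw(const struct draw_state *state, void *data)
{
   *(struct draw_state *)data = *state;
}
```

Copying the whole struct keeps the restore trivially correct if more fields are overridden later, at the cost of a few extra bytes per decomposed draw.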
Three sub-paths in implementation:
1. **CmdDraw + non-LIST topology + XFB active**: easiest. Synthetic index buffer is just `{decomp_idx(0), decomp_idx(1), ...}`. Convert draw to indexed.
2. **CmdDrawIndexed + non-LIST + XFB**: must resolve through the user's index buffer. Doing this CPU-side means reading the user's indices, but we only hold the GPU VA, so `vkMapMemory` is not an option and we would need a host-coherent mapping of the buffer. The alternative, a synthetic index buffer whose entries point at **positions in the user's index buffer**, does not work because Bifrost doesn't do double-indirect. So we need CPU resolution.
3. **CmdDrawIndirect + non-LIST + XFB**: GPU compute pass to fill the synthetic index buffer. **Out of iter16 scope.**
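For path 1, the synthetic index buffer could be generated as sketched below for three of the non-LIST topologies. The vertex ordering follows the primitive numbering in the Vulkan specification (odd strip triangles swap their last two vertices to keep winding; fan triangles end on vertex 0). Whether the iter16 tables use exactly this ordering is an assumption, and `enum decomp_topology`, `decomposed_index_count` and `decompose_to_list` are hypothetical names. Adjacency topologies are not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the topologies that need decomposition. */
enum decomp_topology {
   DECOMP_LINE_STRIP,
   DECOMP_TRIANGLE_STRIP,
   DECOMP_TRIANGLE_FAN,
};

/* Number of list indices produced for vertex_count input vertices. */
static uint32_t
decomposed_index_count(enum decomp_topology topo, uint32_t vertex_count)
{
   switch (topo) {
   case DECOMP_LINE_STRIP:
      return vertex_count < 2 ? 0 : (vertex_count - 1) * 2;
   case DECOMP_TRIANGLE_STRIP:
   case DECOMP_TRIANGLE_FAN:
      return vertex_count < 3 ? 0 : (vertex_count - 2) * 3;
   }
   return 0;
}

/* Write the list-topology indices into out[] and return how many were
 * written. out must hold decomposed_index_count() entries. */
static uint32_t
decompose_to_list(enum decomp_topology topo, uint32_t vertex_count,
                  uint32_t *out)
{
   uint32_t n = 0;

   switch (topo) {
   case DECOMP_LINE_STRIP:
      for (uint32_t i = 0; i + 1 < vertex_count; i++) {
         out[n++] = i;
         out[n++] = i + 1;
      }
      break;
   case DECOMP_TRIANGLE_STRIP:
      for (uint32_t i = 0; i + 2 < vertex_count; i++) {
         uint32_t odd = i & 1; /* odd triangles swap the last two vertices */
         out[n++] = i;
         out[n++] = i + 1 + odd;
         out[n++] = i + 2 - odd;
      }
      break;
   case DECOMP_TRIANGLE_FAN:
      for (uint32_t i = 0; i + 2 < vertex_count; i++) {
         out[n++] = i + 1;
         out[n++] = i + 2;
         out[n++] = 0; /* shared fan centre */
      }
      break;
   }

   assert(n == decomposed_index_count(topo, vertex_count));
   return n;
}
```

The count helper doubles as the value a decomposed draw would report as its vertex count, for example when sizing the scratch allocation for the synthetic buffer.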
For path 2, the user's index buffer is host-mappable if it was created with `HOST_VISIBLE`, but it may also be device-local. We'd need to add a transfer step to copy device-local indices into a host-visible buffer first.
**Simpler path 2 alternative:** dispatch a compute shader that reads the user's index buffer (GPU-side) and writes the synthetic decomposed index buffer (GPU-side). Compute shader code is straightforward (~30 lines GLSL). This avoids the host-visible-buffer requirement entirely.
But path 2's CPU resolve has the cleaner code shape if we restrict to host-visible index buffers as a known limitation. Most CTS tests use host-visible index buffers; the limitation matches real-world usage of XFB+indexed (uncommon).
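A minimal sketch of the CPU resolve for path 2, assuming the user's index buffer is already available through a host-visible CPU pointer. `read_user_index` and `resolve_through_user_ib` are hypothetical helpers, `positions` is the decomposed position list for the topology, and primitive restart and `vertexOffset` are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read index i from a host-visible index buffer holding 1-, 2- or 4-byte
 * indices. ib_size is the buffer's byte count, used only for bounds checks. */
static uint32_t
read_user_index(const void *ib_cpu, uint64_t ib_size, uint8_t index_size,
                uint32_t i)
{
   const uint8_t *base = ib_cpu;

   assert(((uint64_t)i + 1) * index_size <= ib_size);

   switch (index_size) {
   case 1:
      return base[i];
   case 2: {
      uint16_t v;
      memcpy(&v, base + (size_t)i * 2, sizeof(v)); /* alignment-safe load */
      return v;
   }
   case 4: {
      uint32_t v;
      memcpy(&v, base + (size_t)i * 4, sizeof(v));
      return v;
   }
   default:
      assert(!"unsupported index size");
      return 0;
   }
}

/* Compose the decomposition with the user's indices:
 * out[k] = user_ib[first_index + positions[k]], widened to 32 bits. */
static void
resolve_through_user_ib(const void *ib_cpu, uint64_t ib_size,
                        uint8_t index_size, uint32_t first_index,
                        const uint32_t *positions, uint32_t count,
                        uint32_t *out)
{
   for (uint32_t k = 0; k < count; k++) {
      out[k] = read_user_index(ib_cpu, ib_size, index_size,
                               first_index + positions[k]);
   }
}
```

Widening everything to 32-bit indices keeps the synthetic buffer format uniform regardless of the user's `index_size`, at the cost of a larger scratch allocation for 8- and 16-bit inputs.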
## Counts of code touched
- `jm/panvk_vX_cmd_draw.c`: ~150 LoC of new decomposition + dispatch override
- `panvk_vX_cmd_draw.c`: ~30 LoC for sysval `vs.num_vertices` update
- `panvk_cmd_draw.h`: ~20 LoC for new helper macros / topology classification
- NEW file `iter16/winding_lower.c` (or inline): ~100 LoC for the 7 topology-specific decomposition tables
- Probe: ~250 LoC (Phase 3)
**Total estimated: ~300 LoC + 250 LoC probe = 550 LoC.** In line with Phase 0 estimate.
— claude-noether, 2026-05-21