
marfrit a4e7d8ab90 initial seed: retrofit campaign lineage from local working trees

panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:
- mesa-panvk-bifrost/   — r1..r4 era phase docs (iter1..iter18)
                          (libmali stub blobs at iter18/blob/ excluded
                          — 109MB of RE artifacts replaced with a README
                          pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/             — frozen .tgz source snapshots at each milestone
                          (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-23 05:25:37 +02:00

4.2 KiB


Phase 1 — source map for iter16

An Explore agent ran on 2026-05-21 against /home/mfritsche/src/mesa-ref/mesa/src/panfrost/vulkan/. The mirrored tree on ohm is at /home/mfritsche/mesa-build/mesa-26.0.6/.

Injection points

Entry points (jm/panvk_vX_cmd_draw.c)

Function	Lines	Notes
`panvk_per_arch(CmdDraw)`	1796–1827	sets `draw.info.vertex.count = vertexCount`; calls `panvk_cmd_draw(cmdbuf, &draw)`
`panvk_per_arch(CmdDrawIndexed)`	1830–1868	builds `VkDrawIndexedIndirectCommand` on the fly; calls `panvk_cmd_draw_indirect()`
`panvk_per_arch(CmdDrawIndirect)`	(similar)	GPU-side; out of iter16 scope

Both paths end in prepare_draw(). When info.vs.idvs is false (the iter13 XFB path), dispatch goes through panvk_draw_prepare_vertex_job, followed by an optional tiler job.

Pipeline topology

The topology is stored in the Vulkan dynamic graphics state as cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology. It is read in panvk_emit_tiler_primitive() at line 917 via translate_prim_topology(ia->primitive_topology).
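The Phase 2 snippet calls a needs_decomposition() classifier on this topology value. Below is a minimal sketch of that classifier together with the decomposed index count. It assumes a hypothetical enum topo that mirrors the VkPrimitiveTopology numbering, so the sketch compiles without the Vulkan headers. It leaves out the adjacency and patch topologies. needs_decomposition and decomposed_count are illustrative names, not existing panvk helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the first six VkPrimitiveTopology values
 * (same numbering as vulkan_core.h). */
enum topo {
   TOPO_POINT_LIST = 0,
   TOPO_LINE_LIST = 1,
   TOPO_LINE_STRIP = 2,
   TOPO_TRIANGLE_LIST = 3,
   TOPO_TRIANGLE_STRIP = 4,
   TOPO_TRIANGLE_FAN = 5,
};

/* Hypothetical helper: true for the non-LIST topologies whose XFB output
 * must be decomposed into independent primitives before capture. */
static bool
needs_decomposition(enum topo t)
{
   return t == TOPO_LINE_STRIP || t == TOPO_TRIANGLE_STRIP ||
          t == TOPO_TRIANGLE_FAN;
}

/* Index count after decomposing a draw of vertex_count vertices into LIST
 * form. A strip or fan too short to form one primitive yields nothing. */
static uint32_t
decomposed_count(enum topo t, uint32_t vertex_count)
{
   switch (t) {
   case TOPO_LINE_STRIP:
      return vertex_count >= 2 ? 2 * (vertex_count - 1) : 0;
   case TOPO_TRIANGLE_STRIP:
   case TOPO_TRIANGLE_FAN:
      return vertex_count >= 3 ? 3 * (vertex_count - 2) : 0;
   default:
      /* list topologies pass through unchanged (incomplete list
       * primitives are not trimmed in this sketch) */
      return vertex_count;
   }
}
```

For example, a 5-vertex triangle strip decomposes into 3 triangles, which is 9 indices.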

Index buffer state

cmdbuf->state.gfx.ib:

.dev_addr — GPU VA
.size — byte count
.index_size — 1/2/4 bytes per index

It is bound by vkCmdBindIndexBuffer2 at line 1010 of panvk_vX_cmd_draw.c (the common file, not the jm/ variant).

Scratch BO allocator

panvk_cmd_alloc_dev_mem(cmdbuf, pool_type, size, alignment) returns a struct pan_ptr { void *cpu; uint64_t gpu; }, which is a CPU mapping plus the matching GPU VA. The allocation's lifetime is tied to the command buffer. It is used at line 1844 for the synthetic VkDrawIndexedIndirectCommand and at line 459 for varying buffers.
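Each call to this allocator returns a CPU pointer and a GPU VA that address the same bytes. The memory behind them lives exactly as long as the command buffer. The code below is a mock bump allocator that models that contract; it is not panvk's implementation. struct mock_pool, mock_alloc_dev_mem and the fixed gpu_base are hypothetical. Alignment is applied to the offset within the pool, which assumes the pool base is itself sufficiently aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the pan_ptr described above: CPU mapping + GPU VA. */
struct pan_ptr {
   void *cpu;
   uint64_t gpu;
};

/* Hypothetical stand-in for a command-buffer pool: one CPU-mapped block
 * whose GPU VA is gpu_base, released together with the command buffer. */
struct mock_pool {
   uint8_t *cpu_base;
   uint64_t gpu_base;
   size_t size;
   size_t offset;
};

/* Bump-allocate size bytes at a power-of-two alignment. cpu and gpu are
 * derived from the same offset, so they always alias the same bytes. */
static struct pan_ptr
mock_alloc_dev_mem(struct mock_pool *pool, size_t size, size_t alignment)
{
   struct pan_ptr ptr = {NULL, 0};
   size_t start = (pool->offset + alignment - 1) & ~(alignment - 1);

   if (start + size > pool->size)
      return ptr; /* out of space: caller must check for NULL */

   pool->offset = start + size;
   ptr.cpu = pool->cpu_base + start;
   ptr.gpu = pool->gpu_base + start;
   return ptr;
}
```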

XFB sysval injection

Sysvals are set in cmd_prepare_draw_sysvals (line 813 in panvk_vX_cmd_draw.c). iter13 added set_gfx_sysval(...vs.xfb_address[N], ...) and set_gfx_sysval(...vs.num_vertices, info->vertex.count) there.

Phase 2 design implications

The cleanest injection point is in panvk_cmd_draw, just before the prepare_draw call:

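const struct vk_dynamic_graphics_state *dyns =
    &cmdbuf->vk.dynamic_graphics_state;

/* needs_decomposition(): new topology-classification helper (Phase 2) */
if (cmdbuf->state.gfx.xfb.active &&
    needs_decomposition(dyns->ia.primitive_topology)) {
    /* Compute the decomposed count and build the synthetic index buffer */
    /* Override the draw's topology and index buffer in the existing state */
    /* Save/restore so the user's actual bind state isn't trashed */
}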

The save/restore step is critical. The application may issue more draws with the same topology after the XFB-active one, and we must not corrupt the state it bound.
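One way to structure the save/restore is to snapshot exactly the fields the override touches and write them back after the draw is emitted. Below is a minimal sketch under some assumptions. The structs ib_state, gfx_state and xfb_override_save are hypothetical, trimmed-down stand-ins, with the ib fields named after cmdbuf->state.gfx.ib. The synthetic buffer is assumed to hold 32-bit indices.

```c
#include <stdint.h>

/* Hypothetical trimmed-down copy of the bound index buffer state. */
struct ib_state {
   uint64_t dev_addr; /* GPU VA */
   uint64_t size;     /* byte count */
   uint8_t index_size; /* 1/2/4 bytes per index */
};

/* Hypothetical subset of the graphics state the override rewrites. */
struct gfx_state {
   struct ib_state ib;
   uint32_t topology;
};

/* Snapshot of the user's state, taken before the override. */
struct xfb_override_save {
   struct ib_state ib;
   uint32_t topology;
};

/* Save the user's bind state, then point the draw at the synthetic
 * 32-bit index buffer with a LIST topology. */
static void
xfb_override_begin(struct gfx_state *gfx, struct xfb_override_save *save,
                   uint64_t synth_ib_va, uint64_t synth_ib_size,
                   uint32_t list_topology)
{
   save->ib = gfx->ib;
   save->topology = gfx->topology;

   gfx->ib.dev_addr = synth_ib_va;
   gfx->ib.size = synth_ib_size;
   gfx->ib.index_size = 4;
   gfx->topology = list_topology;
}

/* Put the user's bind state back once the draw has been emitted. */
static void
xfb_override_end(struct gfx_state *gfx, const struct xfb_override_save *save)
{
   gfx->ib = save->ib;
   gfx->topology = save->topology;
}
```

A real implementation would likely also need to mark the topology and index-buffer state dirty on both override and restore, so that the next draw re-emits them. That bookkeeping is left out here.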

The implementation splits into three sub-paths:

1. CmdDraw + non-LIST topology + XFB active: the easiest case. The synthetic index buffer is just {decomp_idx(0), decomp_idx(1), ...}, and the draw is converted to an indexed draw.
2. CmdDrawIndexed + non-LIST topology + XFB active: must resolve through the user's index buffer. Resolving on the CPU means reading that buffer. vkMapMemory doesn't apply here, because the bound state only holds the GPU VA, so we would need a host-coherent mapping. The alternative is a synthetic index buffer holding positions within the user's index buffer, but that requires double-indirect indexing, which Bifrost doesn't support. So we need CPU resolution.
3. CmdDrawIndirect + non-LIST topology + XFB active: needs a GPU compute pass to fill the synthetic index buffer. Out of iter16 scope.
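For path 1, the synthetic index buffer can be generated on the CPU at record time. The sketch below assumes the Vulkan spec's primitive definitions with the first vertex as provoking vertex:

- triangle strip i = v_i, v_{i+1+i%2}, v_{i+2-i%2}
- triangle fan i = v_{i+1}, v_{i+2}, v_0

This ordering keeps strip winding consistent. Check these formulas against the spec, and against the last-vertex variants if VK_EXT_provoking_vertex is exposed. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical tags for the non-LIST cases handled here; values match
 * the corresponding VkPrimitiveTopology enumerants. */
enum topo {
   TOPO_LINE_STRIP = 2,
   TOPO_TRIANGLE_STRIP = 4,
   TOPO_TRIANGLE_FAN = 5,
};

/* Writes LIST-form indices for a non-indexed draw of vertex_count
 * vertices starting at first_vertex. Returns the number of indices
 * written; out must have room for that many. */
static uint32_t
build_decomposed_indices(enum topo t, uint32_t first_vertex,
                         uint32_t vertex_count, uint32_t *out)
{
   uint32_t n = 0;

   switch (t) {
   case TOPO_LINE_STRIP:
      /* line i: (v_i, v_{i+1}) */
      for (uint32_t i = 0; i + 1 < vertex_count; i++) {
         out[n++] = first_vertex + i;
         out[n++] = first_vertex + i + 1;
      }
      break;
   case TOPO_TRIANGLE_STRIP:
      /* triangle i: (v_i, v_{i+1+i%2}, v_{i+2-i%2}); odd triangles swap
       * their last two vertices so every triangle keeps the same winding */
      for (uint32_t i = 0; i + 2 < vertex_count; i++) {
         out[n++] = first_vertex + i;
         out[n++] = first_vertex + i + 1 + (i % 2);
         out[n++] = first_vertex + i + 2 - (i % 2);
      }
      break;
   case TOPO_TRIANGLE_FAN:
      /* triangle i: (v_{i+1}, v_{i+2}, v_0) */
      for (uint32_t i = 0; i + 2 < vertex_count; i++) {
         out[n++] = first_vertex + i + 1;
         out[n++] = first_vertex + i + 2;
         out[n++] = first_vertex;
      }
      break;
   default:
      break;
   }
   return n;
}
```

For a 5-vertex strip starting at vertex 0, this yields 0,1,2 / 1,3,2 / 2,3,4.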

For path 2, the user's index buffer is host-mappable only if it is bound to memory from a HOST_VISIBLE memory type. It may just as well be device-local, and then we'd need an extra transfer step that first copies the indices into a host-visible buffer.

A simpler path 2 alternative is to dispatch a compute shader that reads the user's index buffer and writes the decomposed synthetic index buffer, both on the GPU. The shader is straightforward (~30 lines of GLSL) and avoids the host-visible-buffer requirement entirely.

Still, the CPU resolve gives path 2 the cleaner code shape if we support only host-visible index buffers and document that as a known limitation. Most CTS tests use host-visible index buffers, and XFB combined with indexed draws is uncommon in real-world usage, so the limitation should rarely matter.
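Under the host-visible restriction, the path 2 CPU resolve composes the path 1 decomposition with a lookup into the user's index buffer. The sketch below makes several assumptions:

- The buffer is already mapped, and the mapping is host-coherent or has been invalidated before reading.
- Its contents do not change between recording and submission, because the read happens at record time.
- Primitive restart is not handled.
- vertexOffset is baked into the output indices instead of being kept on the rewritten draw.

read_user_index and resolve_indexed_decomposition are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Reads index i from a host mapping of the user's index buffer, honouring
 * index_size (1, 2 or 4 bytes, as in cmdbuf->state.gfx.ib.index_size).
 * memcpy avoids unaligned loads; assumes a little-endian host. */
static uint32_t
read_user_index(const void *map, uint32_t index_size, uint32_t i)
{
   const uint8_t *p = (const uint8_t *)map + (size_t)i * index_size;

   switch (index_size) {
   case 1:
      return p[0];
   case 2: {
      uint16_t v;
      memcpy(&v, p, sizeof(v));
      return v;
   }
   default: {
      uint32_t v;
      memcpy(&v, p, sizeof(v));
      return v;
   }
   }
}

/* decomp holds positions relative to firstIndex (a path 1 table built with
 * first_vertex = 0). Each position is looked up in the user's buffer and
 * vertexOffset is added with wrapping unsigned arithmetic, which gives the
 * same result as signed addition for in-range values. */
static void
resolve_indexed_decomposition(const void *user_map, uint32_t index_size,
                              uint32_t first_index, int32_t vertex_offset,
                              const uint32_t *decomp, uint32_t decomp_count,
                              uint32_t *out)
{
   for (uint32_t k = 0; k < decomp_count; k++) {
      uint32_t idx = read_user_index(user_map, index_size,
                                     first_index + decomp[k]);
      out[k] = idx + (uint32_t)vertex_offset;
   }
}
```

Worked example: take a 16-bit user buffer {7, 8, 9, 10, 11}, firstIndex 1, vertexOffset 100, and the 4-vertex strip table {0, 1, 2, 1, 3, 2}. The output is {108, 109, 110, 109, 111, 110}.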

Estimated code touched

jm/panvk_vX_cmd_draw.c: ~150 LoC of new decomposition + dispatch override
panvk_vX_cmd_draw.c: ~30 LoC for sysval vs.num_vertices update
panvk_cmd_draw.h: ~20 LoC for new helper macros / topology classification
NEW file iter16/winding_lower.c (or inline): ~100 LoC for the 7 topology-specific decomposition tables
Probe: ~250 LoC (Phase 3)

Total estimated: ~300 LoC of driver changes + ~250 LoC of probe = ~550 LoC, in line with the Phase 0 estimate.

— claude-noether, 2026-05-21
