initial seed: retrofit campaign lineage from local working trees

panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:

- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18);
  libmali stub blobs at iter18/blob/ are excluded (109 MB of RE
  artifacts replaced with a README pointer)
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone
  (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

# Phase 2 — design lock for iter16

## Decisions

### Q1: Where does decomposition happen, on the CPU or the GPU?

**Decision: CPU-side index buffer construction.**

Per-draw CPU cost: building a decomposed index buffer for a 4K-vertex triangle strip takes 3(N-2), about 12K, 32-bit index writes, which is microseconds of work and negligible against the per-frame budget. The alternative (a compute shader) adds shader-compile and dispatch overhead per draw, which is worse for small draws. For huge meshes (>100K vertices) the calculation flips, but XFB on strip topologies is uncommon in real-world apps, and apps that do hit it can be handled by a future GPU-path optimization without an ABI change.

### Q2: Path 2 (CmdDrawIndexed + non-LIST + XFB): what's the strategy?

**Decision: deferred to a follow-up iter.** iter16 handles only CmdDraw (non-indexed) + non-LIST + XFB.

Rationale: CTS's `winding_*` tests use **non-indexed draws**. The 162 fails categorized in iter15 all come from non-indexed paths, so fixing those gets us the parity number we promised the operator. CmdDrawIndexed + non-LIST + XFB is a real case but isn't in the CTS subset we measured; adding it would expand scope without moving the measured pass rate that is the campaign artifact.

For iter16, we **detect** CmdDrawIndexed + non-LIST + XFB, emit a `mesa_loge` warning, and still capture (with wrong winding). That's a known soft gap. A future iter (iter17) can add the compute-shader path if needed.

### Q3: How to save/restore user's bind state?

**Decision: snapshot before override, restore after `panvk_cmd_draw_indirect` returns.**

```c
/* Before override */
struct panvk_cmd_index_buffer_state ib_save = cmdbuf->state.gfx.ib;
VkPrimitiveTopology topo_save = cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology;

/* Override + dispatch */
cmdbuf->state.gfx.ib.dev_addr = synthetic_buf.gpu;
cmdbuf->state.gfx.ib.size = decomposed_count * 4;
cmdbuf->state.gfx.ib.index_size = 4;
cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology = list_equiv(topo_save);
/* Dispatch as indexed-LIST */
panvk_cmd_draw_indirect(cmdbuf, &draw_with_decomposed_count);

/* Restore */
cmdbuf->state.gfx.ib = ib_save;
cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology = topo_save;
```

The dirty-tracking mechanism will re-mark IB and topology dirty on the next user-issued draw, so the synthetic state is correctly invalidated.

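The snapshot/override/restore sequence can be exercised in isolation. The sketch below is standalone and uses hypothetical stand-ins (`struct gfx_state`, `struct ib_state`, `enum topo`, the `draw_fn` callback, `record_draw`) rather than the real panvk structures. It also folds in the `decomposed_count` set/reset from Q5, on the assumption that the field follows the same lifetime as the overridden bindings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for driver state; not the real panvk structs. */
enum topo { TOPO_TRIANGLE_LIST, TOPO_TRIANGLE_STRIP };

struct ib_state {
   uint64_t dev_addr;
   uint64_t size;
   uint8_t index_size;
};

struct gfx_state {
   struct ib_state ib;
   enum topo topology;
   uint32_t decomposed_count; /* non-zero only while the override is live */
};

/* Callback standing in for the real draw dispatch. */
typedef void (*draw_fn)(const struct gfx_state *state, void *data);

/* Example callback: records what the draw observed. */
struct seen {
   uint64_t addr;
   enum topo topology;
   uint32_t count;
};

static void
record_draw(const struct gfx_state *state, void *data)
{
   struct seen *s = data;
   s->addr = state->ib.dev_addr;
   s->topology = state->topology;
   s->count = state->decomposed_count;
}

static void
draw_with_synthetic_ib(struct gfx_state *state, uint64_t synthetic_addr,
                       uint32_t decomposed_count, enum topo list_topo,
                       draw_fn draw, void *data)
{
   assert(decomposed_count > 0);

   /* Snapshot the user's bindings. */
   const struct ib_state ib_save = state->ib;
   const enum topo topo_save = state->topology;

   /* Override with the synthetic 32-bit index buffer and LIST topology. */
   state->ib.dev_addr = synthetic_addr;
   state->ib.size = (uint64_t)decomposed_count * sizeof(uint32_t);
   state->ib.index_size = sizeof(uint32_t);
   state->topology = list_topo;
   state->decomposed_count = decomposed_count;

   draw(state, data);

   /* Restore, so later user draws see their own bindings again. */
   state->ib = ib_save;
   state->topology = topo_save;
   state->decomposed_count = 0;
}
```

Keeping the restore in the same function as the override means no early-return path can leak the synthetic bindings into a later user draw.
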
### Q4: Where does the decomposition table live?

**Decision: a small static-data table in a new file `panvk_vX_winding.c` (under a `PAN_ARCH < 9` gate).**

Per-topology entries:

- `vertices_per_primitive_after_decomp` (2 or 3)
- `primitive_count(input_vert_count)` function pointer
- `decompose_vertex(prim_idx, vert_in_prim) → input_vert_index` function pointer
- `equivalent_list_topology` enum

API:

```c
struct panvk_winding_table {
   uint32_t verts_per_prim;                                     /* 2 or 3 */
   uint32_t (*prim_count)(uint32_t in_count);                   /* primitives produced by in_count vertices */
   uint32_t (*decompose)(uint32_t prim_idx, uint32_t vert_idx); /* returns the input vertex index */
   VkPrimitiveTopology list_equiv;
};

/* Returns NULL for topologies that don't need decomposition
 * (POINT_LIST and the non-adjacency LIST variants). */
const struct panvk_winding_table *panvk_get_winding_table(VkPrimitiveTopology);
```

Caller:

```c
const struct panvk_winding_table *wt = panvk_get_winding_table(topo);
if (wt && cmdbuf->state.gfx.xfb.active) {
   uint32_t n_prim = wt->prim_count(input_vert_count);
   uint32_t out_count = n_prim * wt->verts_per_prim;
   struct pan_ptr buf = panvk_cmd_alloc_dev_mem(cmdbuf, desc, out_count * 4, 8);
   uint32_t *idx = buf.cpu;
   /* One 32-bit index per output vertex, in primitive order */
   for (uint32_t p = 0; p < n_prim; p++)
      for (uint32_t v = 0; v < wt->verts_per_prim; v++)
         *idx++ = wt->decompose(p, v);
   /* Override IB + topology + draw as indexed-LIST */
}
```

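To make the table shape concrete, here is a minimal standalone sketch of one entry, for TRIANGLE_STRIP, together with the fill loop. The names (`struct winding_entry`, `tri_strip_prim_count`, `tri_strip_decompose`, `build_list_indices`) are hypothetical, and the Vulkan enum field is left out so the block compiles on its own. The odd-triangle ordering follows the Vulkan specification's triangle-strip definition; whether iter16 uses exactly this ordering for XFB capture is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the table entry, minus the Vulkan enum (kept standalone). */
struct winding_entry {
   uint32_t verts_per_prim;
   uint32_t (*prim_count)(uint32_t in_count);
   uint32_t (*decompose)(uint32_t prim_idx, uint32_t vert_idx);
};

static uint32_t
tri_strip_prim_count(uint32_t in_count)
{
   /* Guard against unsigned wrap-around for degenerate strips. */
   return in_count >= 3 ? in_count - 2 : 0;
}

/* Triangle i of a strip is {v[i], v[i + 1 + i%2], v[i + 2 - i%2]}: odd
 * triangles swap their last two vertices so all triangles share a winding. */
static uint32_t
tri_strip_decompose(uint32_t prim_idx, uint32_t vert_idx)
{
   const uint32_t odd = prim_idx & 1;

   assert(vert_idx < 3);
   switch (vert_idx) {
   case 0:
      return prim_idx;
   case 1:
      return prim_idx + 1 + odd;
   default:
      return prim_idx + 2 - odd;
   }
}

static const struct winding_entry tri_strip_entry = {
   .verts_per_prim = 3,
   .prim_count = tri_strip_prim_count,
   .decompose = tri_strip_decompose,
};

/* Fills out[] with LIST indices. Returns the number of indices written,
 * or 0 if the buffer is too small or the draw produces no primitives. */
static size_t
build_list_indices(const struct winding_entry *wt, uint32_t in_count,
                   uint32_t *out, size_t cap)
{
   const uint32_t n_prim = wt->prim_count(in_count);
   const size_t needed = (size_t)n_prim * wt->verts_per_prim;
   size_t n = 0;

   if (needed > cap)
      return 0;

   for (uint32_t p = 0; p < n_prim; p++)
      for (uint32_t v = 0; v < wt->verts_per_prim; v++)
         out[n++] = wt->decompose(p, v);

   return n;
}
```

For a 5-vertex strip this writes nine indices, `0 1 2`, `1 3 2`, `2 3 4`. The middle triangle is the one a naive `i, i+1, i+2` expansion would capture with flipped winding.
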
### Q5: How does `vs.num_vertices` sysval track decomposed count?

**Decision: at sysval upload time, check `cmdbuf->state.gfx.xfb.decomposed_count != 0` and use it instead of `info->vertex.count`.**

Add a field `uint32_t decomposed_count` to `cmdbuf->state.gfx.xfb`. Set in the new decomposition path. Reset to 0 after restore.

In `cmd_prepare_draw_sysvals` (around the existing iter13 `set_gfx_sysval(... vs.num_vertices, info->vertex.count)` line):

```c
uint32_t nv = cmdbuf->state.gfx.xfb.decomposed_count
                 ? cmdbuf->state.gfx.xfb.decomposed_count
                 : info->vertex.count;
set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, nv);
```

### Q6: Topology classification: which topologies need decomposition?

**Decision:**

| Topology | Decomposed? | Output verts (N = input vertex count) | List equiv |
|---|---|---|---|
| POINT_LIST | No | N | (same) |
| LINE_LIST | No | N | (same) |
| LINE_STRIP | **Yes** | 2(N-1) | LINE_LIST |
| TRIANGLE_LIST | No | N | (same) |
| TRIANGLE_STRIP | **Yes** | 3(N-2) | TRIANGLE_LIST |
| TRIANGLE_FAN | **Yes** | 3(N-2) | TRIANGLE_LIST |
| LINE_LIST_WITH_ADJACENCY | **Yes** | N/2 | LINE_LIST (drop adjacency verts) |
| LINE_STRIP_WITH_ADJACENCY | **Yes** | 2(N-3) | LINE_LIST |
| TRIANGLE_LIST_WITH_ADJACENCY | **Yes** | N/2 | TRIANGLE_LIST |
| TRIANGLE_STRIP_WITH_ADJACENCY | **Yes** | 3(N/2-2) | TRIANGLE_LIST |
| PATCH_LIST | N/A (tess not advertised) | n/a | n/a |

Seven topologies need decomposition tables. Each needs a small decompose function plus a count formula.

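The count column can be checked with a few lines of C. This is a sketch under stated assumptions: `enum topo`, `sub_or_zero`, and `decomposed_vertex_count` are hypothetical names (the driver would switch on `VkPrimitiveTopology`), and the clamping for vertex counts too small to form a primitive is an addition the table does not spell out. The adjacency LIST cases are written as 2(N/4) and 3(N/6) with integer division, which equals N/2 for well-formed inputs and rounds down for a trailing incomplete primitive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local enum; the driver would switch on VkPrimitiveTopology. */
enum topo {
   TOPO_POINT_LIST,
   TOPO_LINE_LIST,
   TOPO_LINE_STRIP,
   TOPO_TRIANGLE_LIST,
   TOPO_TRIANGLE_STRIP,
   TOPO_TRIANGLE_FAN,
   TOPO_LINE_LIST_ADJ,
   TOPO_LINE_STRIP_ADJ,
   TOPO_TRIANGLE_LIST_ADJ,
   TOPO_TRIANGLE_STRIP_ADJ,
   TOPO_COUNT,
};

/* n - k, clamped at zero so short inputs do not wrap around. */
static uint32_t
sub_or_zero(uint32_t n, uint32_t k)
{
   return n > k ? n - k : 0;
}

/* Number of vertices in the equivalent LIST draw for n input vertices. */
static uint32_t
decomposed_vertex_count(enum topo t, uint32_t n)
{
   assert(t < TOPO_COUNT);

   switch (t) {
   case TOPO_LINE_STRIP:
      return 2 * sub_or_zero(n, 1);
   case TOPO_TRIANGLE_STRIP:
   case TOPO_TRIANGLE_FAN:
      return 3 * sub_or_zero(n, 2);
   case TOPO_LINE_LIST_ADJ:
      return 2 * (n / 4); /* 4 input verts per line, 2 kept */
   case TOPO_LINE_STRIP_ADJ:
      return 2 * sub_or_zero(n, 3);
   case TOPO_TRIANGLE_LIST_ADJ:
      return 3 * (n / 6); /* 6 input verts per triangle, 3 kept */
   case TOPO_TRIANGLE_STRIP_ADJ:
      return 3 * sub_or_zero(n / 2, 2);
   default:
      return n; /* POINT_LIST and plain LIST topologies pass through */
   }
}
```

With these formulas an 8-vertex TRIANGLE_STRIP yields 18 output vertices, and a 4096-vertex strip yields 12282, which is the "about 12K writes" figure used in Q1.
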
### Q7: When does the iter16 path NOT activate?

- XFB not active: no-op (fast path unchanged)
- POINT_LIST or plain (non-adjacency) LIST topology: no-op
- CmdDrawIndexed (any topology): falls through; the non-LIST + XFB case logs a warning (Q2)
- Tessellation (PATCH_LIST): not exposed, never hit
- Geometry shaders: not exposed, never hit

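The activation rules reduce to a small classification function. The sketch below is illustrative only: `enum topo`, `enum draw_kind`, `enum iter16_action`, and `classify_draw` are hypothetical names. It encodes one reading of Q2 and Q7, namely that the warning fires only for indexed draws that would otherwise need decomposition, and that indirect draws keep their existing behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local enums; none of these names exist in the driver. */
enum topo {
   TOPO_POINT_LIST,
   TOPO_LINE_LIST,
   TOPO_LINE_STRIP,
   TOPO_TRIANGLE_LIST,
   TOPO_TRIANGLE_STRIP,
   TOPO_TRIANGLE_FAN,
   TOPO_LINE_LIST_ADJ,
   TOPO_LINE_STRIP_ADJ,
   TOPO_TRIANGLE_LIST_ADJ,
   TOPO_TRIANGLE_STRIP_ADJ,
};

enum draw_kind { DRAW_DIRECT, DRAW_INDEXED, DRAW_INDIRECT };

enum iter16_action {
   ACT_FAST_PATH, /* unchanged behavior */
   ACT_DECOMPOSE, /* build the synthetic index buffer */
   ACT_WARN_ONLY, /* log and capture with wrong winding (soft gap) */
};

static bool
topo_needs_decomposition(enum topo t)
{
   switch (t) {
   case TOPO_POINT_LIST:
   case TOPO_LINE_LIST:
   case TOPO_TRIANGLE_LIST:
      return false;
   default:
      return true; /* strips, fans, and all *_WITH_ADJACENCY variants */
   }
}

static enum iter16_action
classify_draw(bool xfb_active, enum draw_kind kind, enum topo t)
{
   if (!xfb_active || !topo_needs_decomposition(t))
      return ACT_FAST_PATH;
   if (kind == DRAW_INDEXED)
      return ACT_WARN_ONLY;
   if (kind == DRAW_INDIRECT)
      return ACT_FAST_PATH;

   assert(kind == DRAW_DIRECT);
   return ACT_DECOMPOSE;
}
```

Checking XFB and topology first keeps the common case (no XFB, or a LIST draw) a two-comparison early-out, which matches the "fast path unchanged" requirement.
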
## Scope confirmation

- **In:** `vkCmdDraw` + LINE_STRIP / TRIANGLE_STRIP / TRIANGLE_FAN / `*_WITH_ADJACENCY` topologies + XFB active → driver-side decomposition
- **Out:** indexed draws (`vkCmdDrawIndexed`): warning only
- **Out:** indirect draws (`vkCmdDrawIndirect`): unchanged behavior
- **Expected CTS delta:** all 162 winding fails → Pass (since they all use non-indexed strip/fan draws)
- **Expected CTS new fails:** none

## Phase 3 next

Write `probe_winding.c` that exercises XFB + TRIANGLE_STRIP with 8 vertices, captures, and verifies the expected 18-vertex decomposed output (3(N-2) with N = 8). Same probe scaffolding as iter13's `probe_xfb.c`.

— claude-noether, 2026-05-21