a4e7d8ab90
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18)
  (libmali stub blobs at iter18/blob/ excluded; 109MB of RE
  artifacts replaced with a README pointer)
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone
  (basis for the 0005 patch diff generation)
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
145 lines
6.2 KiB
Markdown
# Phase 1 — source map for iter17
## `pan_nir_lower_xfb.c` (80 LoC)

Anatomy:

| Lines | Function | What it does |
|---|---|---|
| 9-40 | `lower_xfb_output` | Per (output, channel) → emit ONE `store_global` |
| 42-77 | `lower_xfb` | Per intrinsic: handle `load_vertex_id` rewrite + dispatch to `lower_xfb_output` for each non-zero channel in the `nir_io_xfb` annotation |
| 79-84 | `pan_nir_lower_xfb` | Top-level wrapper calling `nir_shader_intrinsics_pass` |

### Core formula (lines 23-34)

```c
/* Linear output-vertex index: each instance owns num_vertices slots. */
nir_def *index = nir_iadd(b,
   nir_imul(b, nir_load_instance_id(b), nir_load_num_vertices(b)),
   nir_load_raw_vertex_id_pan(b));

/* Shorthand, not literal builder calls: the pass computes this sum with NIR
 * ALU ops on the 64-bit XFB buffer address (stride/offset in bytes), then
 * issues one store_global for the output's channels. */
nir_def *addr = xfb_address[buffer] + index * stride + offset_bytes;
nir_store_global(b, value, addr);
```

**Critical observation:** `nir_load_num_vertices(b)` is a sysval that iter13 already provides as `panvk_graphics_sysvals.vs.num_vertices`. iter16's design added a second sysval (`xfb.decomposed_count`) for the override case. iter17 doesn't need that one: we keep the input vertex count in `num_vertices` and do the decomposition arithmetic in the shader using a *third* sysval, `vs.xfb_topology`.

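Because `num_vertices` keeps the input count, the shader has to derive the captured (decomposed) vertex count per instance itself. Below is a minimal CPU-side sketch of that arithmetic. It assumes the `panvk_xfb_topology` enum values this doc proposes for `vs.xfb_topology` and the primitive counts implied by the Vulkan topology definitions. `xfb_prim_count`, `xfb_verts_per_prim` and `xfb_decomposed_vertex_count` are hypothetical helper names, not existing panvk functions.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the proposed enum panvk_xfb_topology (values 0..7). */
enum panvk_xfb_topology {
   PANVK_XFB_TOPO_LIST = 0,
   PANVK_XFB_TOPO_LINE_STRIP = 1,
   PANVK_XFB_TOPO_TRI_STRIP = 2,
   PANVK_XFB_TOPO_TRI_FAN = 3,
   PANVK_XFB_TOPO_LINE_LIST_ADJ = 4,
   PANVK_XFB_TOPO_LINE_STRIP_ADJ = 5,
   PANVK_XFB_TOPO_TRI_LIST_ADJ = 6,
   PANVK_XFB_TOPO_TRI_STRIP_ADJ = 7,
};

/* Vertices captured per primitive: 2 for lines, 3 for triangles.
 * LIST is pass-through (not decomposed), so it reports 0. */
static uint32_t xfb_verts_per_prim(enum panvk_xfb_topology topo)
{
   switch (topo) {
   case PANVK_XFB_TOPO_LINE_STRIP:
   case PANVK_XFB_TOPO_LINE_LIST_ADJ:
   case PANVK_XFB_TOPO_LINE_STRIP_ADJ:
      return 2;
   case PANVK_XFB_TOPO_TRI_STRIP:
   case PANVK_XFB_TOPO_TRI_FAN:
   case PANVK_XFB_TOPO_TRI_LIST_ADJ:
   case PANVK_XFB_TOPO_TRI_STRIP_ADJ:
      return 3;
   default:
      return 0;
   }
}

/* Primitives produced by n input vertices (0 when there are too few). */
static uint32_t xfb_prim_count(enum panvk_xfb_topology topo, uint32_t n)
{
   switch (topo) {
   case PANVK_XFB_TOPO_LINE_STRIP:
      return n >= 2 ? n - 1 : 0;
   case PANVK_XFB_TOPO_TRI_STRIP:
   case PANVK_XFB_TOPO_TRI_FAN:
      return n >= 3 ? n - 2 : 0;
   case PANVK_XFB_TOPO_LINE_LIST_ADJ:
      return n / 4;
   case PANVK_XFB_TOPO_LINE_STRIP_ADJ:
      return n >= 4 ? n - 3 : 0;
   case PANVK_XFB_TOPO_TRI_LIST_ADJ:
      return n / 6;
   case PANVK_XFB_TOPO_TRI_STRIP_ADJ:
      return n >= 6 ? (n - 4) / 2 : 0; /* 2 * (prims + 2) (+1 spare) verts */
   default:
      return 0; /* LIST */
   }
}

/* Captured vertices per instance, i.e. the per-instance stride that the
 * decomposed store address uses instead of num_vertices. */
static uint32_t xfb_decomposed_vertex_count(enum panvk_xfb_topology topo,
                                            uint32_t n)
{
   return xfb_prim_count(topo, n) * xfb_verts_per_prim(topo);
}
```

For example, a 5-vertex TRI_FAN captures 3 triangles (9 vertices) per instance while `num_vertices` stays 5.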
## NIR builder pattern we'll use

For our panvk-specific replacement pass, the existing single store becomes:

```c
nir_def *topology = load_sysval(b, vs.xfb_topology); /* uint32 */

/* Branch per topology family. The topology is uniform across the draw, so
 * these branches never diverge. Each branch emits 1-3 conditional stores per
 * VS invocation (more for TRI_FAN, whose central vertex needs a loop). */
nir_if *tri_strip =
   nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_STRIP));
{
   emit_tri_strip_stores(b, /* contribution arithmetic */);
}
nir_push_else(b, tri_strip);
{
   nir_if *line_strip =
      nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP));
   {
      emit_line_strip_stores(b, /* contribution arithmetic */);
   }
   nir_push_else(b, line_strip);
   {
      /* ... etc per topology ... */
   }
   nir_pop_if(b, line_strip);
}
nir_pop_if(b, tri_strip);
```

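Each guarded store in the `emit_*_stores` helpers needs an address. The sketch below is a minimal model of that arithmetic. It assumes that one instance's captured vertices are packed primitive by primitive (index `p * k + slot`, with `k` vertices per primitive) and that each instance occupies its decomposed vertex count. With one slot per vertex (`k = 1`, `p = v`) it collapses to the iter13 LIST formula. `xfb_list_addr` and `xfb_decomposed_addr` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* iter13 / LIST formula: slot index = instance * num_vertices + v. */
static uint64_t xfb_list_addr(uint64_t base, uint32_t stride, uint32_t offset,
                              uint32_t instance, uint32_t num_vertices,
                              uint32_t v)
{
   uint64_t index = (uint64_t)instance * num_vertices + v;
   return base + index * stride + offset;
}

/* Decomposed formula for slot `slot` of primitive p.
 *   per_instance: captured vertices per instance (prims * k)
 *   k:            vertices per captured primitive (2 or 3)
 * stride and offset are in bytes, as in the LIST formula. */
static uint64_t xfb_decomposed_addr(uint64_t base, uint32_t stride,
                                    uint32_t offset, uint32_t instance,
                                    uint32_t per_instance, uint32_t p,
                                    uint32_t k, uint32_t slot)
{
   uint64_t index = (uint64_t)instance * per_instance + (uint64_t)p * k + slot;
   return base + index * stride + offset;
}
```

Keeping the LIST case as the `k = 1` special case is what lets the pass fall back to the single-store path without a separate address formula.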
## Per-vertex contribution map

For each affected topology, **input vertex v** contributes to a small set of `(primitive_idx, slot)` pairs.

### TRIANGLE_STRIP (canonical case)

Decomposition: prim p emits `{p, p+1+p%2, p+2-p%2}`, i.e. `{p, p+1, p+2}` for even p and `{p, p+2, p+1}` for odd p (the winding flip).

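The forward decomposition can be written as a plain C helper. This is a CPU-side sketch for checking the arithmetic before it becomes NIR, and `tri_strip_vertex` is a hypothetical name. For p = 1 it yields `{1, 3, 2}`.

```c
#include <assert.h>
#include <stdint.h>

/* Input vertex captured in `slot` (0..2) of triangle-strip primitive p:
 * {p, p+1+p%2, p+2-p%2}; odd primitives swap the last two vertices. */
static uint32_t tri_strip_vertex(uint32_t p, uint32_t slot)
{
   assert(slot < 3);
   switch (slot) {
   case 0:
      return p;
   case 1:
      return p + 1 + p % 2;
   default:
      return p + 2 - p % 2;
   }
}
```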
Inverse: an input vertex v on a strip of N vertices contributes to:

| Primitive | Eligibility | Slot |
|---|---|---|
| p = v | 0 ≤ v ≤ N−3 | 0 |
| p = v − 1 | 1 ≤ v ≤ N−2 | 1 if (v−1) even, else 2 |
| p = v − 2 | 2 ≤ v ≤ N−1 | 2 if (v−2) even, else 1 |

That is up to 3 stores per VS invocation, each guarded by its eligibility predicate.

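The inverse table can be checked mechanically against the forward decomposition. The sketch below encodes the table as `tri_strip_contribs` and verifies it by brute force: every reported `(p, slot)` must map back to v, and the total must equal 3·(N−2). Together these mean every slot of every triangle is written exactly once. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct xfb_contrib {
   uint32_t prim; /* primitive index p */
   uint32_t slot; /* position of v inside primitive p */
};

/* Fills out[] with every (p, slot) fed by vertex v of an n-vertex strip and
 * returns how many there are (0..3), following the inverse table. */
static unsigned tri_strip_contribs(uint32_t v, uint32_t n,
                                   struct xfb_contrib out[3])
{
   unsigned count = 0;
   if (n < 3 || v >= n)
      return 0;
   if (v <= n - 3) /* p = v */
      out[count++] = (struct xfb_contrib){v, 0};
   if (v >= 1 && v <= n - 2) /* p = v - 1 */
      out[count++] = (struct xfb_contrib){v - 1, (v - 1) % 2 == 0 ? 1 : 2};
   if (v >= 2) /* p = v - 2 (v <= n - 1 already holds) */
      out[count++] = (struct xfb_contrib){v - 2, (v - 2) % 2 == 0 ? 2 : 1};
   return count;
}

/* Forward decomposition {p, p+1+p%2, p+2-p%2}, used only for the check. */
static uint32_t tri_strip_vertex(uint32_t p, uint32_t slot)
{
   return slot == 0 ? p : slot == 1 ? p + 1 + p % 2 : p + 2 - p % 2;
}

/* Each reported pair must map back to its vertex (so no pair is claimed
 * twice), and the count must cover all 3 * (n - 2) slots. */
static bool tri_strip_inverse_ok(uint32_t n)
{
   unsigned total = 0;
   for (uint32_t v = 0; v < n; v++) {
      struct xfb_contrib c[3];
      unsigned count = tri_strip_contribs(v, n, c);
      for (unsigned i = 0; i < count; i++) {
         if (tri_strip_vertex(c[i].prim, c[i].slot) != v)
            return false;
      }
      total += count;
   }
   return total == (n >= 3 ? 3 * (n - 2) : 0);
}
```

The same forward/inverse cross-check applies to every topology in this doc, so it could double as a unit test for the contribution tables before they are lowered to NIR.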
### LINE_STRIP

Decomposition: prim p emits `{p, p+1}`. Vertex v contributes to:

| Primitive | Eligibility | Slot |
|---|---|---|
| p = v | 0 ≤ v ≤ N−2 | 0 |
| p = v − 1 | 1 ≤ v ≤ N−1 | 1 |

Up to 2 stores.

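The LINE_STRIP table fits in a few lines of C. This is a sketch under the same conventions as the table, and `line_strip_contribs` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Vertex v of an n-vertex line strip is slot 0 of line v (if v <= n-2) and
 * slot 1 of line v-1 (if v >= 1). Writes the pairs into prims[]/slots[]
 * and returns how many there are (0..2). */
static unsigned line_strip_contribs(uint32_t v, uint32_t n,
                                    uint32_t prims[2], uint32_t slots[2])
{
   unsigned count = 0;
   if (n < 2 || v >= n)
      return 0;
   if (v <= n - 2) {
      prims[count] = v;
      slots[count++] = 0;
   }
   if (v >= 1) {
      prims[count] = v - 1;
      slots[count++] = 1;
   }
   return count;
}
```

Because LINE_STRIP_WITH_ADJACENCY prim p emits `{p+1, p+2}`, the same helper would cover it as `line_strip_contribs(v - 1, n - 2, ...)` for 1 ≤ v ≤ n−2. The first and last vertices are adjacency-only.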
### TRIANGLE_FAN — the awkward case
Decomposition: prim p emits `{p+1, p+2, 0}` for 0 ≤ p ≤ N−3. Vertex v contributes to:

| Primitive | Eligibility | Slot |
|---|---|---|
| p = v − 1 | 1 ≤ v ≤ N−2 | 0 |
| p = v − 2 | 2 ≤ v ≤ N−1 | 1 |
| **any p in [0, N−2)** | **v == 0** | **2** |

The **central vertex (v = 0)** contributes to ALL N−2 primitives as slot 2. That's O(N) stores from a single VS invocation, requiring a **NIR loop** bounded by `num_vertices`.

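A CPU model of the fan's contribution map makes the O(N) case explicit. It is a sketch: `tri_fan_contribs` and `fan_emit` are hypothetical names. The `v == 0` loop is the part that the shader would express with a NIR loop and a conditional break.

```c
#include <assert.h>
#include <stdint.h>

/* Records one (p, slot) pair if there is room; always bumps the count. */
static void fan_emit(uint32_t *prims, uint32_t *slots, uint32_t cap,
                     uint32_t *count, uint32_t p, uint32_t slot)
{
   if (*count < cap) {
      prims[*count] = p;
      slots[*count] = slot;
   }
   (*count)++;
}

/* Vertex v of an n-vertex fan (prim p = {p+1, p+2, 0}, n - 2 prims).
 * Writes up to `cap` pairs and returns the total number v contributes. */
static uint32_t tri_fan_contribs(uint32_t v, uint32_t n, uint32_t *prims,
                                 uint32_t *slots, uint32_t cap)
{
   uint32_t count = 0;
   if (n < 3 || v >= n)
      return 0;
   if (v == 0) {
      /* Central vertex: slot 2 of every triangle, O(n) stores. */
      for (uint32_t p = 0; p < n - 2; p++)
         fan_emit(prims, slots, cap, &count, p, 2);
      return count;
   }
   if (v <= n - 2)
      fan_emit(prims, slots, cap, &count, v - 1, 0);
   if (v >= 2)
      fan_emit(prims, slots, cap, &count, v - 2, 1);
   return count;
}
```

For a 6-vertex fan, vertex 0 issues 4 stores and the others issue 1 or 2. That gives 12 stores in total, which matches 4 triangles × 3 slots.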
### Adjacency variants

- LINE_LIST_WITH_ADJACENCY: prim p emits `{4p+1, 4p+2}`. Vertex v contributes only if v%4 ∈ {1, 2}, so O(1) stores.
- LINE_STRIP_WITH_ADJACENCY: prim p emits `{p+1, p+2}`. Equivalent to LINE_STRIP over vertices 1..N−2. O(1) stores.
- TRIANGLE_LIST_WITH_ADJACENCY: prim p emits `{6p, 6p+2, 6p+4}`. Vertex v contributes only if v%6 ∈ {0, 2, 4}, so O(1) stores.
- TRIANGLE_STRIP_WITH_ADJACENCY: prim p emits `{2p, 2p+2, 2p+4}` for even p and `{2p+2, 2p, 2p+4}` for odd p (the vertex order in the Vulkan spec's triangle-strips-with-adjacency table). Only even v contribute; O(1) stores per vertex.

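The forward mappings for the four adjacency topologies are sketched below as small C helpers, and all the function names are hypothetical. For odd primitives, the TRIANGLE_STRIP_WITH_ADJACENCY helper uses the vertex order from the Vulkan spec's triangle-strips-with-adjacency table, `{2p+2, 2p, 2p+4}`. This assumes the default first-vertex provoking convention.

```c
#include <assert.h>
#include <stdint.h>

/* Input vertex captured in `slot` of primitive p. Adjacency-only vertices
 * are never captured. */
static uint32_t line_list_adj_vertex(uint32_t p, uint32_t slot)
{
   assert(slot < 2);
   return 4 * p + 1 + slot; /* {4p+1, 4p+2} */
}

static uint32_t line_strip_adj_vertex(uint32_t p, uint32_t slot)
{
   assert(slot < 2);
   return p + 1 + slot; /* {p+1, p+2} */
}

static uint32_t tri_list_adj_vertex(uint32_t p, uint32_t slot)
{
   assert(slot < 3);
   return 6 * p + 2 * slot; /* {6p, 6p+2, 6p+4} */
}

static uint32_t tri_strip_adj_vertex(uint32_t p, uint32_t slot)
{
   /* Even p: {2p, 2p+2, 2p+4}; odd p: {2p+2, 2p, 2p+4}. */
   static const uint32_t even[3] = {0, 2, 4};
   static const uint32_t odd[3] = {2, 0, 4};
   assert(slot < 3);
   return 2 * p + (p % 2 ? odd[slot] : even[slot]);
}
```

Each inverse is then a divisibility or parity test on v plus at most three candidate primitives, which is why these branches stay O(1).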
## Implications for Phase 2

- **6 of 7 affected topologies have O(1) contributions per VS invocation**: straightforward `nir_push_if` + emit.
- **TRIANGLE_FAN's central vertex needs a NIR loop**: `nir_push_loop` with a conditional break (`nir_jump(b, nir_jump_break)` under a `nir_push_if`) once the primitive counter reaches the bound derived from `num_vertices`.
- **The runtime topology switch** is a 7-way branch on the `vs.xfb_topology` sysval (plus a pass-through for LIST topologies). NIR generates clean conditional code, and since the condition is uniform per draw the Bifrost backend should handle it without divergence overhead.

## What the sysval `vs.xfb_topology` looks like

An 8-bit integer in the `panvk_graphics_sysvals` struct. Enum values:

```c
enum panvk_xfb_topology {
   PANVK_XFB_TOPO_LIST = 0, /* pass-through; current iter13 formula */
   PANVK_XFB_TOPO_LINE_STRIP = 1,
   PANVK_XFB_TOPO_TRI_STRIP = 2,
   PANVK_XFB_TOPO_TRI_FAN = 3,
   PANVK_XFB_TOPO_LINE_LIST_ADJ = 4,
   PANVK_XFB_TOPO_LINE_STRIP_ADJ = 5,
   PANVK_XFB_TOPO_TRI_LIST_ADJ = 6,
   PANVK_XFB_TOPO_TRI_STRIP_ADJ = 7,
};
```

The driver maps `VkPrimitiveTopology` → `panvk_xfb_topology` at draw time and sets the sysval via `set_gfx_sysval(cmdbuf, dirty, vs.xfb_topology, val)`.

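The draw-time mapping is a pure function, so it can be sketched and tested on its own. To keep the sketch buildable without `vulkan_core.h`, it uses a local copy of the `VkPrimitiveTopology` values from the Vulkan spec (the `VK_TOPO_*` names are local stand-ins). `xfb_topology_from_vk` is a hypothetical name. Sending PATCH_LIST to the pass-through value is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for VkPrimitiveTopology; values match the Vulkan spec. */
enum vk_topology {
   VK_TOPO_POINT_LIST = 0,
   VK_TOPO_LINE_LIST = 1,
   VK_TOPO_LINE_STRIP = 2,
   VK_TOPO_TRIANGLE_LIST = 3,
   VK_TOPO_TRIANGLE_STRIP = 4,
   VK_TOPO_TRIANGLE_FAN = 5,
   VK_TOPO_LINE_LIST_WITH_ADJACENCY = 6,
   VK_TOPO_LINE_STRIP_WITH_ADJACENCY = 7,
   VK_TOPO_TRIANGLE_LIST_WITH_ADJACENCY = 8,
   VK_TOPO_TRIANGLE_STRIP_WITH_ADJACENCY = 9,
   VK_TOPO_PATCH_LIST = 10,
};

enum panvk_xfb_topology {
   PANVK_XFB_TOPO_LIST = 0,
   PANVK_XFB_TOPO_LINE_STRIP = 1,
   PANVK_XFB_TOPO_TRI_STRIP = 2,
   PANVK_XFB_TOPO_TRI_FAN = 3,
   PANVK_XFB_TOPO_LINE_LIST_ADJ = 4,
   PANVK_XFB_TOPO_LINE_STRIP_ADJ = 5,
   PANVK_XFB_TOPO_TRI_LIST_ADJ = 6,
   PANVK_XFB_TOPO_TRI_STRIP_ADJ = 7,
};

/* Value stored in the 8-bit vs.xfb_topology sysval. Point/line/triangle
 * lists (and, by assumption, patch lists) take the pass-through path. */
static uint8_t xfb_topology_from_vk(enum vk_topology t)
{
   switch (t) {
   case VK_TOPO_LINE_STRIP:                    return PANVK_XFB_TOPO_LINE_STRIP;
   case VK_TOPO_TRIANGLE_STRIP:                return PANVK_XFB_TOPO_TRI_STRIP;
   case VK_TOPO_TRIANGLE_FAN:                  return PANVK_XFB_TOPO_TRI_FAN;
   case VK_TOPO_LINE_LIST_WITH_ADJACENCY:      return PANVK_XFB_TOPO_LINE_LIST_ADJ;
   case VK_TOPO_LINE_STRIP_WITH_ADJACENCY:     return PANVK_XFB_TOPO_LINE_STRIP_ADJ;
   case VK_TOPO_TRIANGLE_LIST_WITH_ADJACENCY:  return PANVK_XFB_TOPO_TRI_LIST_ADJ;
   case VK_TOPO_TRIANGLE_STRIP_WITH_ADJACENCY: return PANVK_XFB_TOPO_TRI_STRIP_ADJ;
   default:                                    return PANVK_XFB_TOPO_LIST;
   }
}
```

The returned value is what would be passed as `val` to `set_gfx_sysval(cmdbuf, dirty, vs.xfb_topology, val)`.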
## Risk: shader complexity

The lowered shader after iter17 will have:

- 1 sysval load
- 7 conditional branches
- 1-3 conditional stores per branch (except TRI_FAN, which has a loop)
- per-store address arithmetic

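The per-branch store counts can be read off the contribution tables above. The sketch below tallies them as upper bounds per VS invocation and per XFB output for an n-vertex draw. `xfb_store_bound` is a hypothetical name, and the enum mirrors the one this doc proposes.

```c
#include <assert.h>
#include <stdint.h>

enum panvk_xfb_topology {
   PANVK_XFB_TOPO_LIST = 0,
   PANVK_XFB_TOPO_LINE_STRIP = 1,
   PANVK_XFB_TOPO_TRI_STRIP = 2,
   PANVK_XFB_TOPO_TRI_FAN = 3,
   PANVK_XFB_TOPO_LINE_LIST_ADJ = 4,
   PANVK_XFB_TOPO_LINE_STRIP_ADJ = 5,
   PANVK_XFB_TOPO_TRI_LIST_ADJ = 6,
   PANVK_XFB_TOPO_TRI_STRIP_ADJ = 7,
};

/* Upper bound on stores one VS invocation issues for one XFB output. */
static uint32_t xfb_store_bound(enum panvk_xfb_topology topo, uint32_t n)
{
   switch (topo) {
   case PANVK_XFB_TOPO_LIST:
   case PANVK_XFB_TOPO_LINE_LIST_ADJ:
   case PANVK_XFB_TOPO_TRI_LIST_ADJ:
      return 1; /* each vertex feeds at most one primitive */
   case PANVK_XFB_TOPO_LINE_STRIP:
   case PANVK_XFB_TOPO_LINE_STRIP_ADJ:
      return 2;
   case PANVK_XFB_TOPO_TRI_STRIP:
   case PANVK_XFB_TOPO_TRI_STRIP_ADJ:
      return 3;
   case PANVK_XFB_TOPO_TRI_FAN:
      return n >= 3 ? n - 2 : 0; /* central vertex feeds every triangle */
   default:
      return 0;
   }
}
```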
That's a lot for what was a single `store_global`. On Bifrost (an in-order architecture) the branches themselves are cheap: `vs.xfb_topology` is uniform across the draw, so they never diverge. The increased instruction count and register pressure are what could hurt throughput.

Mitigation: most XFB workloads are tiny (per-frame, dozens to thousands of vertices), so the throughput cost is irrelevant for the CTS-driven correctness target. Real-world XFB-heavy workloads (rare on Bifrost) might prefer iter13's single-store path. They are largely unaffected by iter17's correctness fix, because the LIST topology still takes the fast path (topology == PANVK_XFB_TOPO_LIST → emit single store). They pay only for the sysval load, one uniform branch, and whatever extra registers the other branches may force on the compiled shader.

## What to write in Phase 4

NEW file: `src/panfrost/vulkan/panvk_vX_xfb_lower.c`, a panvk-specific replacement for `pan_nir_lower_xfb`. It calls into pieces of `pan_nir_lower_xfb` for the LIST case (or re-implements its minimal logic) and adds the per-topology contribution branches for the others. Exposed as `panvk_per_arch(nir_lower_xfb)(nir_shader *)`.

MODIFIED: `panvk_vX_shader.c`: replace the `NIR_PASS(_, nir, pan_nir_lower_xfb)` call with `NIR_PASS(_, nir, panvk_per_arch(nir_lower_xfb))`.

MODIFIED: `panvk_shader.h`: add `vs.xfb_topology` to the sysval struct.

MODIFIED: `panvk_vX_cmd_draw.c::cmd_prepare_draw_sysvals`: at draw time, map the current topology to the enum and call `set_gfx_sysval(..., vs.xfb_topology, mapped)`.

Phase 4 LoC estimate: ~250 (replacement pass) + 30 (sysval threading + draw-time topology map) ≈ 280 LoC.

— claude-noether, 2026-05-21