initial seed: retrofit campaign lineage from local working trees

panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan video decode) shipped before this repo existed; the deliverable patches live in marfrit-packages, but the reasoning chain, phase docs, and source-state evidence lived only in local working trees on the development host.

This retrofit imports:
- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18) (libmali stub blobs at iter18/blob/ excluded: 109MB of RE artifacts replaced with a README pointer)
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is a commit rather than a snapshot. See [[feedback-session-local-process-pins]] for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00
parent 430d0da278
commit a4e7d8ab90
124 changed files with 22551 additions and 1 deletions
@@ -0,0 +1,68 @@
+# Phase 0 — substrate lock for iter17
+
+**Goal:** close the 162 `winding_*` CTS failures from iter15 via **NIR-pass-level primitive decomposition** in (a panvk-specific replacement of) `pan_nir_lower_xfb`. iter16 attempted dispatch-level decomposition and hit an opaque wall; this iter bypasses that entire surface.
+
+Operator framing 2026-05-21: "2 it is" — picked Path C from iter16's deferred-close architect consultation.
+
+## What changed since iter16
+
+- iter16's WIP patches REVERTED on ohm. Source tree at `/home/mfritsche/mesa-build/mesa-26.0.6/` is back to clean iter13 r3 state (iter8+iter9 sed-applied + iter13 unified-diff applied).
+- Verification: probe_winding.c against the rebuilt iter13-only lib captures 8 entries for TRIANGLE_STRIP — matches the pre-iter16 baseline.
+- `panvk_vX_winding.c` is left on disk as an orphan (not in meson). It may be reused as a reference for the per-topology mapping logic when porting to NIR builder form, or deleted in Phase 4 if unused.
+
+## What iter17 needs (NIR-pass approach)
+
+Currently `pan_nir_lower_xfb` at `src/panfrost/compiler/pan_nir_lower_xfb.c` (80 LoC) emits ONE `nir_store_global` per VS invocation:
+
+```
+index = instance_id * num_vertices + raw_vertex_id_pan
+addr  = xfb_address[buffer] + index * stride + offset
+store_global(addr, captured_value)
+```
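
The single-store addressing above can be modeled on the CPU as a minimal sketch. The function and parameter names are illustrative assumptions, not Mesa identifiers; the arithmetic follows the pseudocode line by line.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU-side mirror of the pseudocode above. Names are
 * illustrative only and do not exist in Mesa. */
static uint64_t
xfb_store_addr(uint64_t buffer_base, uint32_t instance_id,
               uint32_t num_vertices, uint32_t raw_vertex_id,
               uint32_t stride, uint32_t offset)
{
   /* Input-vertex indexing: exactly one slot per VS invocation. */
   uint64_t index = (uint64_t)instance_id * num_vertices + raw_vertex_id;

   assert(raw_vertex_id < num_vertices);
   return buffer_base + index * stride + offset;
}
```

With a 16-byte stride, vertex 2 of instance 0 lands 32 bytes past the buffer base. Each input vertex gets exactly one slot, which is why this scheme cannot express a vertex that belongs to several output primitives.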
+
+For strip/fan/adjacency topologies, the spec wants OUTPUT-VERTEX indexing, not INPUT-vertex indexing. iter17's approach: emit MULTIPLE store_globals per VS invocation, one for each primitive this vertex contributes to. For TRIANGLE_STRIP with input vertex v on a strip of N vertices:
+- Contributes to prim (v−2) if v ≥ 2: slot 2 if (v−2)%2==0 else slot 1
+- Contributes to prim (v−1) if v ≥ 1 and v+1 < N: slot 1 if (v−1)%2==0 else slot 2
+- Contributes to prim v if v+2 < N: slot 0
+
+For each contribution, compute the XFB output position (`prim_idx * verts_per_prim + slot`) and emit a guarded store. All seven affected topologies have similar contribution maps.
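
The TRIANGLE_STRIP contribution map above can be sketched as a plain C model before it is ported to NIR builders. This is an illustration under stated assumptions, not the eventual pass: `tri_strip_xfb_slots` and `VERTS_PER_TRI` are hypothetical names, and the function only computes output positions rather than emitting stores.

```c
#include <assert.h>
#include <stdint.h>

#define VERTS_PER_TRI 3u

/* Hypothetical helper: fills out[] with the XFB output positions
 * (prim_idx * VERTS_PER_TRI + slot) written by input vertex v of an
 * n-vertex triangle strip. Returns how many positions were written (0..3). */
static unsigned
tri_strip_xfb_slots(uint32_t v, uint32_t n, uint32_t out[3])
{
   unsigned count = 0;

   assert(v < n);

   /* Last strip vertex of prim v-2: slot 2 on even prims, slot 1 on odd. */
   if (v >= 2) {
      uint32_t p = v - 2;
      out[count++] = p * VERTS_PER_TRI + ((p % 2 == 0) ? 2u : 1u);
   }

   /* Middle strip vertex of prim v-1: slot 1 on even prims, slot 2 on odd. */
   if (v >= 1 && v + 1 < n) {
      uint32_t p = v - 1;
      out[count++] = p * VERTS_PER_TRI + ((p % 2 == 0) ? 1u : 2u);
   }

   /* First strip vertex of prim v: always slot 0. */
   if (v + 2 < n)
      out[count++] = v * VERTS_PER_TRI;

   return count;
}
```

For a 5-vertex strip (3 triangles, 9 output positions), vertex 2 writes positions 2, 5 and 6, and iterating over all five vertices covers positions 0..8 exactly once. In the NIR version each `if` would become a guarded store rather than an array write.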
+
+## Topology must be available at NIR-pass time
+
+Pipeline compilation doesn't currently know the draw topology — that's draw-state. Two options:
+
+| Approach | Cost | Notes |
+|---|---|---|
+| Variant explosion: compile 1 shader per (XFB-bearing × topology) combo | 7 extra variants per XFB shader on top of iter13's 1 (1+7 = 8 total). Modest shader-cache bloat but no runtime overhead. | Pipeline knows topology at draw-bind time → select variant. |
+| Sysval `vs.xfb_topology` + runtime switch in shader | 1 variant per XFB shader. Single shader with switch on the topology sysval, branches to per-topology contribution logic. | Slight per-VS-invocation overhead from the switch; cleaner cache. |
+
+**Lean: sysval approach** (Phase 2 will lock it). Variant explosion is wasteful when ANGLE (the only real consumer) pre-decomposes anyway and the workload here is purely for raw-Vulkan-app compliance with CTS.
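
As a rough illustration of the sysval option, the runtime switch could look like the C model below. Everything here is an assumption: the enum encoding is hypothetical (the real sysval format is a Phase 2 decision), none of the names exist in Mesa, and the vertex counts assume that adjacency vertices are not captured, so adjacency topologies emit plain lines or triangles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding of the topology sysval. XFB_TOPO_DIRECT stands for
 * the list topologies that keep the existing single-store path. */
enum xfb_topology {
   XFB_TOPO_DIRECT = 0,
   XFB_TOPO_LINE_STRIP,
   XFB_TOPO_TRIANGLE_STRIP,
   XFB_TOPO_TRIANGLE_FAN,
   XFB_TOPO_LINE_LIST_ADJ,
   XFB_TOPO_LINE_STRIP_ADJ,
   XFB_TOPO_TRIANGLE_LIST_ADJ,
   XFB_TOPO_TRIANGLE_STRIP_ADJ,
};

/* Returns the vertices per captured output primitive for the seven
 * decomposed topologies, or 0 when no decomposition is needed. */
static unsigned
xfb_verts_per_output_prim(uint32_t topology_sysval)
{
   switch (topology_sysval) {
   case XFB_TOPO_LINE_STRIP:
   case XFB_TOPO_LINE_LIST_ADJ:
   case XFB_TOPO_LINE_STRIP_ADJ:
      return 2; /* captured as lines */
   case XFB_TOPO_TRIANGLE_STRIP:
   case XFB_TOPO_TRIANGLE_FAN:
   case XFB_TOPO_TRIANGLE_LIST_ADJ:
   case XFB_TOPO_TRIANGLE_STRIP_ADJ:
      return 3; /* captured as triangles */
   case XFB_TOPO_DIRECT:
   default:
      return 0; /* keep the single-store path */
   }
}
```

In the shader, the same switch would select which per-topology contribution logic runs. That is the per-invocation cost the table mentions, traded against compiling one variant per topology.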
+
+## Out-of-scope failure modes
+
+- `pan_nir_lower_xfb` is **upstream Mesa code shared with Panfrost-Gallium**. Modifying it directly would affect Gallium GL XFB on Bifrost+Valhall: same hardware, but a different consumer of the same pass. Per [[feedback-no-upstream-proposals]] we won't upstream, and for safety we won't disturb the Gallium consumers either.
+- **Decision (locked here):** instead of modifying `pan_nir_lower_xfb`, write a **panvk-specific replacement pass** in `src/panfrost/vulkan/panvk_vX_xfb_lower.c` (or similar) that does what `pan_nir_lower_xfb` does AND the multi-store decomposition. iter13's call to `pan_nir_lower_xfb` in `panvk_vX_shader.c` is replaced with our new pass. Gallium consumers stay untouched.
+
+## Time / complexity estimate
+
+- Phase 1 source map (read pan_nir_lower_xfb.c, understand NIR builders): 1-2h
+- Phase 2 design lock (sysval format, per-topology contribution logic): 1-2h
+- Phase 3 probe: already exists (iter16/probe_winding.c) — just reuse
+- Phase 4 implementation: 1-3 days (write panvk_vX_xfb_lower.c, wire into panvk_vX_shader.c, fix until probe passes)
+- Phase 5 review: spawn janet/Plan reviewer
+- Phase 6 CTS rerun: ~2h
+- Phase 8 PKGBUILD + close: standard
+
+Total estimate: 3-5 working days for the full cycle, comparable to iter16's plan.
+
+## Risk
+
+The iter17 approach trades dispatch-level surface (which broke in iter16) for NIR-pass surface. The NIR-pass is more concentrated and testable in isolation, but Mesa's NIR API is complex. Failure modes for iter17:
+
+- NIR builders for per-vertex contribution logic might not compose correctly with iter13's existing `pan_nir_lower_xfb` structure
+- Topology sysval threading might run into the same "shader compile doesn't know topology" issue at a slightly different layer
+- Bifrost compiler might not optimize the multi-store pattern well, raising register pressure and stalling the GPU
+
+If iter17 hits a wall as deep as iter16's, the campaign retreats with TWO documented attempt-and-defer iterations on the winding problem. That's still useful — clear documentation that this corner is hard.
+
+— claude-noether, 2026-05-21