iter20: holes_vert color-fail + max_output_components_* compile-time blowup (DEFERRED) #5
Status: DEFERRED
Phase 0 complete (2026-05-25). Naive fix tested, rejected. Deferred behind r7 ship — the crash that was load-bearing is already fixed.
Symptom 1: holes_vert color check
dEQP-VK.transform_feedback.simple.holes_vert and holes_extra_draw_vert Fail the fragment-color check on shipped r7 (mesa-panvk-bifrost-26.0.6.r7-1). Image diff: (0, 0, 1, 0), blue channel wrong. The fragment shader checks goku == 10.0 && trunks == 20.0 && vegeta == 30.0 and outputs black instead of the expected blue. At least one of the three rasterized varyings is undefined in the FS.
Symptom 2: max_output_components_* compile-time blowup
Independently, max_output_components_{64,128,256} are broken across r5/r6/r7: they coredumped on r6, then hung on r7 once iter19's crash-fix let the runner reach them at all.
Architectural root cause
lower_xfb_iter17 (and, identically, upstream pan_nir_lower_xfb) calls nir_instr_remove(&intr->instr) on any store_output whose channels were XFB-lowered. This removes the rasterizer-side path along with the XFB-bound channels. For outputs that are BOTH XFB-bound AND read by the FS (the holes_vert case), the FS now sees undef.
Same design pattern in src/gallium/drivers/panfrost/pan_shader.c for the GL driver: no compensating pass re-emits store_output writes from io_xfb annotations. Upstream Mesa is silent on this; presumably holes_vert just Fails on Valhall too without anyone noticing.
Phase 0 measurement
Tested the obvious naive fix: skip the nir_instr_remove, keep both paths. With iter19's channel-mask correction already in place, the probe was rebuilt on ohm and tested:
The probe is functionally correct but compile-time scales super-linearly: each store_output emits a 5-way topology dispatch ladder, so N store_outputs gives ~5N basic blocks; Bifrost's register-alloc / scheduler chokes. Going from 64 → 128 outputs is ≥5× the wall time.
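A toy model of the two behaviours, for orientation only. This is plain Python, not Mesa code: StoreOutput, lower_xfb and undefined_in_fs are hypothetical names, which of the three varyings are XFB-bound is an assumption, and so is the choice to charge one 5-block ladder per XFB-lowered store in both modes.

```python
from dataclasses import dataclass

LADDER_ARMS = 5  # the 5-way topology dispatch described above


@dataclass(frozen=True)
class StoreOutput:
    """Toy stand-in for a NIR store_output intrinsic (not the real struct)."""
    name: str
    xfb_bound: bool   # at least one channel is captured by transform feedback
    read_by_fs: bool  # the fragment shader consumes this varying


def lower_xfb(stores, keep_raster_store):
    """Return (varyings still written for the rasterizer, added basic blocks)."""
    raster_writes = []
    added_blocks = 0
    for s in stores:
        if not s.xfb_bound:
            raster_writes.append(s.name)  # the pass leaves this store alone
            continue
        added_blocks += LADDER_ARMS       # assumed: one ladder per lowered store
        if keep_raster_store:
            raster_writes.append(s.name)  # naive fix: skip the removal
    return raster_writes, added_blocks


def undefined_in_fs(stores, raster_writes):
    """Varyings the FS reads but the lowered vertex shader no longer writes."""
    return [s.name for s in stores
            if s.read_by_fs and s.name not in raster_writes]


# Assumed binding: goku and trunks captured by XFB, vegeta raster-only.
holes_vert = [
    StoreOutput("goku", xfb_bound=True, read_by_fs=True),
    StoreOutput("trunks", xfb_bound=True, read_by_fs=True),
    StoreOutput("vegeta", xfb_bound=False, read_by_fs=True),
]

for keep in (False, True):
    writes, blocks = lower_xfb(holes_vert, keep_raster_store=keep)
    print(keep, undefined_in_fs(holes_vert, writes), blocks)
```

With keep_raster_store=False the model reports the XFB-bound varyings as undefined in the FS, which is the Symptom 1 shape. With keep_raster_store=True nothing is undefined, but the block count still grows as 5 per lowered store, which is the part the measurement shows Bifrost's backend cannot absorb.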
5 min compile for a moderate XFB shader is already bad UX in Brave; 25 min is effectively a freeze. Not shippable.
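A back-of-envelope reading of those numbers, assuming the 5 min and 25 min figures correspond to the 64- and 128-output cases and that wall time follows a simple power law in output count (both are assumptions, and growth_exponent is a hypothetical helper). The 256-output value is an extrapolation from that model, not a measurement.

```python
import math


def growth_exponent(size_ratio, time_ratio):
    """Exponent k in time ~ N**k implied by one (size, time) ratio pair."""
    return math.log(time_ratio) / math.log(size_ratio)


# Assumed pairing: 64 outputs -> 5 min, 128 outputs -> 25 min.
k = growth_exponent(128 / 64, 25 / 5)
print(f"implied exponent k: {k:.2f}")

# Extrapolation only: the same power law applied to one more doubling.
projected_256_min = 25 * (256 / 128) ** k
print(f"256 outputs under the same power law: about {projected_256_min:.0f} min")
```

A 5× cost for a 2× input corresponds to an exponent of log2(5), a little above 2.3, so the backend is behaving worse than quadratically in block count over this range. Since the observation is "≥5×", the true exponent may be higher.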
Phase 2 options (when re-entering)
Option A — hoist topology dispatch. Compute topology + per-vertex output positions ONCE at shader entry; each store references the precomputed SSA values. O(1) ladders per shader regardless of output count. Probably 2-3 day iteration. Recommended path forward when this becomes worth doing.
Option B — accept holes_vert color-fail as a permanent CTS waiver. Only 2 of 7,853 transform_feedback tests check fragment color of packed-component XFB outputs (= 0.025% pass-rate hit). iter19's crash-fix was the load-bearing correctness gain. Defensible end state.
Option C — full XFB-path re-architecture via a single per-vertex-output emission table populated at shader entry. Multi-week iteration. Matches what upstream Mesa would probably do if they cared about packed-component XFB on panfrost.
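To make Option A's "O(1) ladders per shader" claim concrete, here is a minimal sketch that only counts basic blocks. It is a toy in plain Python, not a NIR pass: emit_naive, emit_hoisted and the %xfb_vertex_index value name are hypothetical, and it assumes each store becomes straight-line code once the topology-dependent values are precomputed.

```python
LADDER_ARMS = 5  # 5-way topology dispatch, as in the Phase 0 probe


def emit_naive(num_stores):
    """One dispatch ladder per store: block count grows with the store count."""
    blocks = []
    for i in range(num_stores):
        blocks += [f"store{i}.topo{arm}" for arm in range(LADDER_ARMS)]
    return blocks


def emit_hoisted(num_stores):
    """Option A: a single ladder at shader entry defines shared SSA values."""
    entry_blocks = [f"entry.topo{arm}" for arm in range(LADDER_ARMS)]
    # Each store reads the precomputed value, so it adds instructions
    # but (by assumption) no new basic blocks.
    body = [f"store{i} uses %xfb_vertex_index" for i in range(num_stores)]
    return entry_blocks, body


for n in (64, 128, 256):
    naive_blocks = len(emit_naive(n))
    hoisted_blocks = len(emit_hoisted(n)[0])
    print(n, naive_blocks, hoisted_blocks)
```

In this model the naive form reaches 5N blocks while the hoisted form stays at 5 regardless of N. Whether that is enough to bring compile time back to something shippable would still need measuring on the real backend.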
Why deferred
References
~/src/panvk-bifrost/iter20/phase0_holes_vert_color_close.md: full Phase 0 close