iter20: holes_vert color-fail + max_output_components_* compile-time blowup (DEFERRED) #5

Open
opened 2026-05-25 12:21:56 +00:00 by marfrit · 0 comments
Owner

Status: DEFERRED

Phase 0 complete (2026-05-25). Naive fix tested, rejected. Deferred behind r7 ship — the crash that was load-bearing is already fixed.

Symptom 1: holes_vert color check

dEQP-VK.transform_feedback.simple.holes_vert and holes_extra_draw_vert Fail the fragment-color check on shipped r7 (mesa-panvk-bifrost-26.0.6.r7-1).

Image diff: (0, 0, 1, 0), i.e. only the blue channel is wrong. The fragment shader checks goku == 10.0 && trunks == 20.0 && vegeta == 30.0 and outputs black instead of the expected blue, so at least one of the three rasterized varyings is undefined in the FS.
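
A minimal sketch of why one undefined varying is enough to produce that diff. It is a toy model, not the CTS shader: the alpha value of 1.0 and the use of NaN to stand in for an undefined varying are assumptions made for illustration (real hardware would deliver some arbitrary value).

```python
# Toy model of the holes_vert fragment check (assumed colors: opaque blue / opaque black).
BLUE = (0.0, 0.0, 1.0, 1.0)
BLACK = (0.0, 0.0, 0.0, 1.0)

# Stand-in for an undefined varying; NaN never compares equal to anything.
UNDEF = float("nan")


def fragment_color(goku, trunks, vegeta):
    """Mirror the check quoted above: blue only if all three varyings match."""
    ok = goku == 10.0 and trunks == 20.0 and vegeta == 30.0
    return BLUE if ok else BLACK


def image_diff(expected, actual):
    """Per-channel absolute difference, in RGBA order."""
    return tuple(abs(e - a) for e, a in zip(expected, actual))


if __name__ == "__main__":
    actual = fragment_color(10.0, UNDEF, 30.0)  # one varying lost
    print(image_diff(BLUE, actual))
```

Any single varying failing its comparison flips the whole result to black, which is why the diff alone cannot say which of the three was lost.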

Symptom 2: max_output_components_* compile-time blowup

Independently, max_output_components_{64,128,256} are broken across r5/r6/r7 — they coredumped on r6, then hung on r7 once iter19's crash-fix let the runner reach them at all.

Architectural root cause

lower_xfb_iter17 (and identically upstream pan_nir_lower_xfb) calls nir_instr_remove(&intr->instr) on any store_output whose channels were XFB-lowered. This removes the rasterizer-side path along with the XFB-bound channels. For outputs that are BOTH XFB-bound AND read by the FS (the holes_vert case), the FS now sees undef.

The GL driver follows the same design pattern in src/gallium/drivers/panfrost/pan_shader.c: no compensating pass re-emits store_output writes from io_xfb annotations. Upstream Mesa is silent on this; presumably holes_vert just Fails on Valhall too without anyone noticing.
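
The removal pattern can be modelled outside the compiler. The sketch below is a toy, not the NIR API: the StoreOutput record, the mask values, and which of the three varyings are XFB-bound are all hypothetical. It only shows that dropping a store because some of its channels were captured also drops the varying the FS reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreOutput:
    """Hypothetical stand-in for a vertex-shader store_output."""
    name: str
    write_mask: int   # components the VS writes
    xfb_mask: int     # components captured by transform feedback
    read_by_fs: bool  # whether the fragment shader consumes this output


def lower_xfb(stores, remove_lowered):
    """Return (xfb_writes, names of varying stores that survive lowering)."""
    xfb_writes, varyings = [], []
    for s in stores:
        captured = s.write_mask & s.xfb_mask
        if captured:
            xfb_writes.append((s.name, captured))
        if captured and remove_lowered:
            # Models the nir_instr_remove call: the rasterizer-side store
            # disappears together with the XFB-bound channels.
            continue
        varyings.append(s.name)
    return xfb_writes, varyings


def undefined_fs_inputs(stores, varyings):
    """Outputs the FS reads that no surviving store writes."""
    return [s.name for s in stores if s.read_by_fs and s.name not in varyings]


# Assumed layout: two of the three varyings are also captured by XFB.
STORES = [
    StoreOutput("goku", 0b1, 0b1, True),
    StoreOutput("trunks", 0b1, 0b0, True),
    StoreOutput("vegeta", 0b1, 0b1, True),
]

if __name__ == "__main__":
    for remove in (True, False):
        _, varyings = lower_xfb(STORES, remove_lowered=remove)
        print(remove, undefined_fs_inputs(STORES, varyings))
```

With remove_lowered=True the model reports the XFB-bound varyings as undefined in the FS; with remove_lowered=False, the analogue of keeping both paths, nothing is lost.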

Phase 0 measurement

Tested the obvious naive fix: skip the nir_instr_remove, keep both paths. With iter19's channel-mask correction already in place, the probe was rebuilt on ohm and tested:

| Test | Probe result | Shipped r7 |
|---|---|---|
| holes_vert | **PASS** | Fail (color) |
| holes_extra_draw_vert | **PASS** | Fail (color) |
| basic_* (36 tests) | 36/36 Pass | 36/36 Pass |
| max_output_components_64 | PASS in **5m23s** | hang |
| max_output_components_128 | **TIMEOUT at 25m00s** | hang |

The probe is functionally correct, but compile time scales super-linearly. Each store_output emits a 5-way topology dispatch ladder, so N store_outputs produce ~5N basic blocks, and Bifrost's register allocator / scheduler chokes on them. Going from 64 → 128 outputs multiplies wall time by at least ~4.6× (5m23s to more than 25m00s); the true factor is unknown because the 128 run timed out.
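
The arithmetic behind that scaling claim, as a small sketch. It uses only the two wall times from the table; the assumption that compile time follows a power law in the output count is mine, and the 128 figure is a timeout, so everything derived from it is a lower bound.

```python
import math


def to_seconds(minutes, seconds):
    return 60 * minutes + seconds


def dispatch_blocks(n_stores, ways=5):
    """~5N basic blocks: one 5-way ladder per store_output."""
    return ways * n_stores


t64 = to_seconds(5, 23)       # 323 s, test passed
t128_min = to_seconds(25, 0)  # 1500 s, run timed out (lower bound)

# Doubling the output count costs at least this factor in wall time.
ratio = t128_min / t64

# If time ~ N**k, doubling N multiplies time by 2**k, so k >= log2(ratio).
exponent_lower_bound = math.log2(ratio)

if __name__ == "__main__":
    print(f"blocks: {dispatch_blocks(64)} -> {dispatch_blocks(128)}")
    print(f"wall-time ratio >= {ratio:.2f}")
    print(f"implied exponent >= {exponent_lower_bound:.2f}")
```

The block count only doubles (320 to 640 under the ~5N estimate), while wall time grows by more than 4×. That is consistent with a backend cost that is at least quadratic in the number of blocks.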

5 min compile for a moderate XFB shader is already bad UX in Brave; 25 min is effectively a freeze. Not shippable.

Phase 2 options (when re-entering)

  1. Option A — hoist topology dispatch. Compute topology + per-vertex output positions ONCE at shader entry; each store references the precomputed SSA values. O(1) ladders per shader regardless of output count. Probably 2-3 day iteration. Recommended path forward when this becomes worth doing.

  2. Option B — accept holes_vert color-fail as a permanent CTS waiver. Only 2 of 7,853 transform_feedback tests check fragment color of packed-component XFB outputs (= 0.025% pass-rate hit). iter19's crash-fix was the load-bearing correctness gain. Defensible end state.

  3. Option C — full XFB-path re-architecture via a single per-vertex-output emission table populated at shader entry. Multi-week iteration. Matches what upstream Mesa would probably do if they cared about packed-component XFB on panfrost.
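
A rough sketch of what Option A changes, counting blocks in a toy emitter rather than in NIR. The five topology names are placeholders for the 5-way dispatch (the issue does not list them), and the block and store strings are purely illustrative.

```python
# Hypothetical names for the five dispatch cases.
TOPOLOGIES = ("points", "lines", "line_strip", "triangles", "triangle_strip")


def emit_per_store_dispatch(n_stores):
    """Probe behaviour: every store carries its own dispatch ladder."""
    blocks = []
    for i in range(n_stores):
        for topo in TOPOLOGIES:
            blocks.append(f"store{i}:{topo}")
    return blocks


def emit_hoisted_dispatch(n_stores):
    """Option A: one ladder at shader entry, stores reuse its result."""
    entry_blocks = [f"entry:{topo}" for topo in TOPOLOGIES]
    # Each store is straight-line code referencing the precomputed value.
    stores = [f"store{i} <- precomputed_position" for i in range(n_stores)]
    return entry_blocks, stores


if __name__ == "__main__":
    for n in (64, 128, 256):
        naive = len(emit_per_store_dispatch(n))
        hoisted, _ = emit_hoisted_dispatch(n)
        print(n, naive, len(hoisted))
```

In this model the per-store variant grows as 5N blocks while the hoisted variant stays at five regardless of N, which is the O(1) property Option A is after. Whether the real backend's compile time follows the block count this closely is not something the sketch can show.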

Why deferred

  • iter19 (r7) fixed the SIGSEGV which was the load-bearing stability bug
  • The residual is a correctness gap on 2 / 7,853 CTS tests (0.025% pass-rate impact)
  • No Brave / Chromium real-world workload uses packed-component XFB
  • Phase 2 needs ~2-3 days of focused NIR-pass work for Option A; Option C is multi-week
  • Better to surface this as a known limit than to ship a slow compile

References

  • marfrit-packages PR #96 — iter19 ship (the crash-fix that this issue's residual sits behind)
  • ~/src/panvk-bifrost/iter20/phase0_holes_vert_color_close.md — full Phase 0 close
  • Per-shader probe lib kept at ohm:/tmp/iter20_probe_lib/ for Phase 2 use
marfrit added the deferred label 2026-05-25 12:21:56 +00:00
Reference: marfrit/panvk-bifrost#5