From 02b212b29bb22dd0e03454e236d4a4a3c89b0f4e Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Mon, 25 May 2026 11:55:53 +0200
Subject: [PATCH] iter19 close + r5 full sweep scoreboard

Adds iter19 campaign-close doc (mesa-panvk-bifrost r7 ship, XFB
packed-varying channel-extract fix). Also lands the r5 full dEQP-VK
sweep scoreboard (2.26M tests, 97.65% runnable pass rate) that
surfaced the holes_vert crash in the first place.

Co-Authored-By: Claude Opus 4.7
---
 cts-results/r5_full_sweep_2026-05-25.md | 78 ++++++++++++++++++++
 iter19_campaign_close_2026-05-25.md     | 98 +++++++++++++++++++++++++
 2 files changed, 176 insertions(+)
 create mode 100644 cts-results/r5_full_sweep_2026-05-25.md
 create mode 100644 iter19_campaign_close_2026-05-25.md

diff --git a/cts-results/r5_full_sweep_2026-05-25.md b/cts-results/r5_full_sweep_2026-05-25.md
new file mode 100644
index 0000000..a21301e
--- /dev/null
+++ b/cts-results/r5_full_sweep_2026-05-25.md
@@ -0,0 +1,78 @@
+# r5 full dEQP-VK sweep — 2026-05-25
+
+Driver under test: `mesa-panvk-bifrost-26.0.6.r5-1` (fragmentStoresAndAtomics flip)
+Hardware: PineTab2 / RK3566 / Mali-G52 r1 MC1 (PAN_ARCH 7)
+CTS: vulkan-cts-1.3.10.0 (commit 7aa3551)
+Run host: ohm
+Run window: 2026-05-24 ~22:00 → 2026-05-25 02:48 CEST (~5h wall)
+
+## Corrected scoreboard
+
+| Metric                | Count        | % of total |
+|-----------------------|--------------|------------|
+| Total tests           | 2,258,378    | 100.00     |
+| Passed                | 553,447      | 24.51      |
+| Failed                | 13,318       | 0.59       |
+| NotSupported          | 1,691,613    | 74.90      |
+| **Pass rate (runnable)** | -         | **97.65%** |
+
+| Group status                       | Count |
+|------------------------------------|-------|
+| Clean groups (0 failures)          | 41/53 |
+| Groups with failures               | 12/53 |
+| Watchdog-killed (data recovered)   | 3/53  |
+
+The 3 watchdog-killed groups (`api`, `subgroups`, `transform_feedback`) hit dEQP's
+internal watchdog timer and got SIGKILL'd (exit=137), but the per-group `.qpa`
+files contain all completed test results — totals were re-derived from those
+qpa files directly. The wrapper's `elapsed=3s` field is misleading; the real
+runs went the full distance.
+
+## Failed groups (sorted by fail count)
+
+| Group               | Fails  | Notes |
+|---------------------|--------|--------------------------------------------------------------|
+| image               | 10,445 | 78% of all fails. Single biggest target. |
+| subgroups           | 2,342  | Bifrost warps are 4 or 8 wide; many fails are likely arch gaps |
+| glsl                | 306    | |
+| memory_model        | 106    | Bifrost memory ordering — expected partial coverage |
+| draw                | 68     | |
+| transform_feedback  | 27     | All `resume_*` — **documented iter17 by-design gap** |
+| compute             | 8      | |
+| pipeline            | 5      | |
+| descriptor_indexing | 4      | |
+| api                 | 3      | |
+| spirv_assembly      | 3      | |
+| info                | 1      | |
+
+## Watchdog-stop tests (last test attempted in each group)
+
+| Group              | Last test | Verdict |
+|--------------------|-----------------------------------------------------------------------------|------------------|
+| api                | `dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic` | watchdog (total time) |
+| subgroups          | `dEQP-VK.subgroups.clustered.graphics.subgroupclusteredadd_ivec3` | watchdog (touch interval) |
+| transform_feedback | `dEQP-VK.transform_feedback.simple.holes_vert` | **userspace coredump, reproducible** — see #107 |
+
+## transform_feedback 27 fails — all known by-design
+
+Every fail has the shape `dEQP-VK.transform_feedback.simple.resume_*`:
+3 stream-count cells × {`resume`, `resume_beginqueryindexed_streamid_0`,
+`resume_endqueryindexed_streamid_0`} × {256, 512, 131072}. All fail at
+`vktTransformFeedbackSimpleTests.cpp:949` with `received:N expected:0`.
+These match the iter17 closeout note ("remaining 81 fails are by-design
+resume_* tests, transformFeedbackDraw=false"). The set here is a strict
+subset of the 81; this run only hit the 27 cells that this sweep configuration
+exercised. **No r5 regression on iter17 XFB.**
+
+## Next actions
+
+- **Phase 0 on `holes_vert` coredump** — userspace-only (dmesg clean), reproducible in isolation against r6 too. Tracked as task #107.
+- **r7 candidate sort** — the `image` group's 10,445 fails are the largest mass; needs a Phase 0 evidence sort to see whether they share one root cause replicated across format/dim cells or have many.
+
+## Provenance
+
+- Raw per-group qpa + run.log: `ohm:/home/mfritsche/cts-results/r5_full/`
+- Summary log: `ohm:/home/mfritsche/cts-results/r5_full/summary.log`
+- Driver: `ohm:/usr/lib/panvk-bifrost/libvulkan_panfrost.so` (r5 then r6)
+
+Aggregation script: see git history of this file.
diff --git a/iter19_campaign_close_2026-05-25.md b/iter19_campaign_close_2026-05-25.md
new file mode 100644
index 0000000..4bc6e6a
--- /dev/null
+++ b/iter19_campaign_close_2026-05-25.md
@@ -0,0 +1,98 @@
+# iter19 — XFB store channel-extract fix for packed varyings (campaign close)
+
+Shipped: 2026-05-25
+PR: https://git.reauktion.de/marfrit/marfrit-packages/pulls/96
+Merge commit: 902de73a02d9
+Package: `mesa-panvk-bifrost-26.0.6.r7-1-aarch64`
+
+## What shipped
+
+A single 21+/3- line patch to `src/panfrost/vulkan/panvk_vX_xfb_lower.c`
+(`0007-panvk-bifrost-xfb-component-base-fix.patch`). Eliminates a
+reproducible SIGSEGV in `vkCreateGraphicsPipelines` for any shader with
+XFB-bound varyings declared at non-zero `layout (component=N)`.
+
+## How it surfaced
+
+The r5 full dEQP-VK sweep (2,258,378 tests, 97.65% runnable pass rate)
+on 2026-05-24/25 included 27 fails in `transform_feedback` plus a
+SIGKILL on the chunk when `transform_feedback.simple.holes_vert` hit
+`FATAL ERROR: Test program crashed`. An isolated repro produced a userspace
+SIGSEGV with no kernel GPU fault (dmesg clean), pointing at a pure
+libvulkan_panfrost bug.
+
+## Root cause
+
+`lower_xfb_output_iter17` (and, identically, upstream
+`pan_nir_lower_xfb.c::lower_xfb_output`, which carries a `// TODO`)
+computed the source-channel mask as `mask << channel_idx`. `channel_idx`
+is the varying-location component (0..3), but `src` only contains
+channels starting at `nir_intrinsic_component(intr)`. For a scalar
+declared `layout (component=2) flat out float vegeta`, NIR emits
+`store_output src=, component=2`, and the lowering computed
+`mask << 2` against a 1-component src, selecting a channel that src does
+not have; the malformed nir_def then segfaulted during downstream NIR
+constant-folding in `nir_constant_expressions.c::evaluate_*`.
+
+The `assert(nir_intrinsic_component(intr) == 0)` precondition was
+inherited from upstream Mesa as a documented `// TODO`; release builds
+(`-DNDEBUG`) elided it, turning the precondition violation directly
+into a SIGSEGV.
+
+## Fix
+
+1. Compute `src_channel = channel_idx - nir_intrinsic_component(intr)`
+   and use `mask << src_channel` instead.
+2. Convert both elided asserts to explicit release-mode early-return
+   guards (closes the same elision class as the original bug).
+3. Add a dispatcher-side comment explaining why `i*2+j` is the
+   varying-location component index.
+
+## Verification
+
+| Family                                            | Result                 |
+|----------------------------------------------|-------------------|
+| `transform_feedback.simple.holes_vert`            | Crash → Fail (color)   |
+| `transform_feedback.simple.holes_extra_draw_vert` | Crash → Fail (color)   |
+| `transform_feedback.simple.basic_*`               | 36/36 Pass             |
+| `transform_feedback.simple.depth_clip_*`          | 1 Pass + 4 NotSupp     |
+| `transform_feedback.simple.lines_or_triangles*`   | 16 NotSupp             |
+| `transform_feedback.simple.holes_geom*`           | NotSupp (no GS on G52) |
+
+Zero new regressions on previously passing tests.
+
+## Process
+
+Full 8-step bugfix process followed:
+
+- Phase 0 — characterize: `iter19/phase0_holes_vert_close.md`
+- Phase 1 — source map: `iter19/phase1_holes_vert_situation.md`
+- Phase 2 — root-cause + ranked options: `iter19/phase2_holes_vert_situation.md`
+- Phase 3 — implement + inner-test: `iter19/phase3_holes_vert_close.md`
+- Phase 4 — wider verify (focused subset)
+- Phase 5 — 2nd-model review: **APPROVE WITH CHANGES (non-blocking)** → defensive guards added
+- Phase 7 — final retest: clean
+- Phase 8 — ship: r7 PR → CI green → marfrit repo published → ohm `pacman -S` upgraded → smoke test confirms Crash→Fail
+
+## Three-point ship-check verdict
+
+1. ✓ PR #96 merged at `902de73a02d9`
+2. ✓ CI runs #1380 (`mesa-panvk-bifrost-aarch64`) and #1381 (`mesa-panvk-bifrost-video-aarch64`) both `success`
+3. ✓ `pacman -Q mesa-panvk-bifrost` on ohm reports `26.0.6.r7-1`; installed lib BuildID `9f8dacfc...`; holes_vert no longer SIGSEGVs
+
+## Open follow-ups (not blocking r7 ship)
+
+- **iter20: holes_vert color-check residual** — when `panvk_per_arch(nir_lower_xfb)` removes the original store_output after XFB lowering, varyings that are *both* XFB-bound *and* read by the fragment shader lose their rasterizer-side path. holes_vert's fragment shader therefore sees `goku`/`vegeta` as undef and outputs black instead of the expected blue. A naive "keep store_output" probe fixed the color but blew up compile-time on `max_output_components_128`; this needs a more nuanced fix.
+- **iter20+: pre-existing latent crashes** unmasked by r7 — `dEQP-VK.transform_feedback.simple.max_output_components_{64,128,256}` also coredump on the shipped r6 baseline (confirmed independently); they were never reached during the r5 sweep because the watchdog killed transform_feedback after holes_vert.
+
+## Lineage
+
+| rev    | What                                        | When           |
+|-----|-----------------------------------------------|------------|
+| r1     | KHR_robustness2 + nullDescriptor on Bifrost | iter8          |
+| r2     | has_vk1_1/has_vk1_2 = true on Bifrost       | iter9          |
+| r3     | VK_EXT_transform_feedback (iter13)          | iter13         |
+| r4     | XFB primitive decomposition (iter17)        | iter17         |
+| r5     | fragmentStoresAndAtomics = true             | 2026-05-23     |
+| r6     | VK_EXT_legacy_dithering                     | 2026-05-25     |
+| **r7** | **XFB packed-varying channel-extract fix**  | **2026-05-25** |