Compare commits

...

2 Commits

Author SHA1 Message Date
marfrit 02b212b29b iter19 close + r5 full sweep scoreboard
Adds iter19 campaign-close doc (mesa-panvk-bifrost r7 ship, XFB
packed-varying channel-extract fix). Also lands the r5 full
dEQP-VK sweep scoreboard (2.26M tests, 97.65% runnable pass rate)
that surfaced the holes_vert crash in the first place.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 11:55:53 +02:00
marfrit 6019cce7d1 research: r6/r7 Mali-G52 r1 MC1 feature audit (multi-language sourcing)
Multi-language web research (EN/CN/RU/KO/JA/Bavarian) on the actual
hardware feature set of Mali-G52 r1 MC1 vs what Mesa 26.0.6 panvk
advertises. Goal: identify candidate downstream patches in the same
shape as r1-r5.

Top-3 r6/r7 candidates surfaced:

  1. r6 = VK_EXT_pipeline_robustness — 1-line flip, composes on top of
     our r1 KHR_robustness2, real consumer value (DXVK/vkd3d/Wine).

  2. r6.5 = small-bundle (depth_clip_control, depth_clip_enable,
     provoking_vertex, load_store_op_none, pageable_device_local_memory,
     memory_priority) — each individually small, together meaningfully
     widens the D3D-to-Vulkan translation matrix.

  3. r7 = FB-fetch + dynamic_rendering_local_read paired — real
     engineering iteration, multi-week. Bifrost TBDR tile memory
     supports this; Panfrost GL already implements FB fetch
     (Mesa MR !5755). PanVK port needed.

Confirmed not-candidates: sparseResidency*, subgroupSize ≥ 16, mesh /
RT / FSR / 64-bit atomics — silicon-absent on G52.

Source archaeology: no leaked ARM Mali-G52 Software Developer Manual
turned up in the wild (the multi-language search came up dry). Mesa source is
the authoritative reference; iter18 already confirmed 0 Vulkan symbols
in vendor libmali-bifrost-g52-*.so. Panfrost is and will be the only
Vulkan driver this hardware ever has.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 23:38:43 +02:00
3 changed files with 279 additions and 0 deletions
+78
@@ -0,0 +1,78 @@
# r5 full dEQP-VK sweep — 2026-05-25
Driver under test: `mesa-panvk-bifrost-26.0.6.r5-1` (fragmentStoresAndAtomics flip)
Hardware: PineTab2 / RK3566 / Mali-G52 r1 MC1 (PAN_ARCH 7)
CTS: vulkan-cts-1.3.10.0 (commit 7aa3551)
Run host: ohm
Run window: 2026-05-24 ~22:00 → 2026-05-25 02:48 CEST (~5h wall)
## Corrected scoreboard
| Metric | Count | % of total |
|-----------------------|--------------|------------|
| Total tests | 2,258,378 | 100.00 |
| Passed | 553,447 | 24.51 |
| Failed | 13,318 | 0.59 |
| NotSupported | 1,691,613 | 74.90 |
| **Pass rate (runnable = Passed + Failed)** | - | **97.65%** |

| Group status | Count |
|------------------------------------|-------|
| Clean groups (0 failures) | 41/53 |
| Groups with failures | 12/53 |
| Watchdog-killed (data recovered)   | 3/53  |

The 3 watchdog-killed groups (`api`, `subgroups`, `transform_feedback`) also
appear among the 12 groups with failures. They hit dEQP's internal watchdog
timer and were killed with SIGKILL (exit=137), but their per-group `.qpa`
files contain all completed test results, so totals were re-derived from those
qpa files directly. The wrapper's `elapsed=3s` field is misleading; the real
runs went the full distance.
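As a sketch of how totals can be re-derived from partial `.qpa` logs, the code below assumes the usual dEQP log layout in which each finished case carries a `<Result StatusCode="...">` element. It is not the aggregation script referenced under Provenance, and cases cut off by the watchdog (which have no `<Result>` line) are simply not counted. The runnable pass rate is Pass / (Pass + Fail), with NotSupported excluded.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: each finished case in a dEQP .qpa log contains a line like
#   <Result StatusCode="Pass">Pass</Result>
# Cases interrupted by the watchdog have no such line and are not counted.
RESULT_RE = re.compile(r'<Result StatusCode="([A-Za-z]+)"')


def count_statuses(qpa_text: str) -> Counter:
    """Count StatusCode values in the text of one .qpa file."""
    return Counter(RESULT_RE.findall(qpa_text))


def aggregate(result_dir: Path) -> Counter:
    """Sum status counts over every *.qpa file in a results directory."""
    total = Counter()
    for qpa in sorted(result_dir.glob("*.qpa")):
        total += count_statuses(qpa.read_text(errors="replace"))
    return total


def runnable_pass_rate(counts: Counter) -> float:
    """Pass / (Pass + Fail) in percent; NotSupported is excluded."""
    runnable = counts["Pass"] + counts["Fail"]
    return 100.0 * counts["Pass"] / runnable if runnable else 0.0


if __name__ == "__main__":
    # Scoreboard numbers from the table above.
    scoreboard = Counter(Pass=553_447, Fail=13_318, NotSupported=1_691_613)
    print(f"{runnable_pass_rate(scoreboard):.2f}%")  # 97.65%
```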
## Failed groups (sorted by fail count)
| Group | Fails | Notes |
|---------------------|--------|--------------------------------------------------------------|
| image | 10,445 | 78% of all fails. Single biggest target. |
| subgroups           | 2,342  | Bifrost warps are 4 or 8 wide; many fails are likely arch gaps |
| glsl | 306 | |
| memory_model | 106 | Bifrost memory ordering — expected partial coverage |
| draw | 68 | |
| transform_feedback  | 27     | All `resume_*`: **documented iter17 by-design gap**          |
| compute | 8 | |
| pipeline | 5 | |
| descriptor_indexing | 4 | |
| api | 3 | |
| spirv_assembly | 3 | |
| info | 1 | |
## Watchdog-stop tests (last test attempted in each group)
| Group | Last test | Verdict |
|--------------------|-----------------------------------------------------------------------------|------------------|
| api | `dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic` | watchdog (total time) |
| subgroups | `dEQP-VK.subgroups.clustered.graphics.subgroupclusteredadd_ivec3` | watchdog (touch interval) |
| transform_feedback | `dEQP-VK.transform_feedback.simple.holes_vert` | **userspace coredump, reproducible** — see #107 |
## transform_feedback 27 fails — all known by-design
Every failure has the shape `dEQP-VK.transform_feedback.simple.resume_*`:
3 stream-count cells × {`resume`, `resume_beginqueryindexed_streamid_0`,
`resume_endqueryindexed_streamid_0`} × {256, 512, 131072} = 27 cases. All fail at
`vktTransformFeedbackSimpleTests.cpp:949` with `received:N expected:0`.
These match the iter17 closeout note ("remaining 81 fails are by-design
resume_* tests, transformFeedbackDraw=false"). The set here is a strict
subset of the 81; this run only hit the 27 cells that this sweep configuration
exercised. **No r5 regression on iter17 XFB.**
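The "strict subset of the 81" check can be scripted as a pattern filter over failing case names. This is a sketch: the single glob pattern comes from the text, but recording the iter17 by-design list as glob patterns is an assumption, and `unexplained_failures` is a hypothetical helper. An empty result means every failure is already accounted for.

```python
from fnmatch import fnmatchcase

# Assumption: known by-design gaps are kept as glob patterns over case paths.
BY_DESIGN_PATTERNS = ("dEQP-VK.transform_feedback.simple.resume_*",)


def unexplained_failures(failed_cases, patterns=BY_DESIGN_PATTERNS):
    """Return failed cases (sorted) that match none of the by-design patterns."""
    return sorted(
        case for case in failed_cases
        if not any(fnmatchcase(case, pat) for pat in patterns)
    )
```

Feeding it the 27 `resume_*` failures should return an empty list; anything left over (such as `holes_vert`) is a candidate regression.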
## Next actions
- **Phase 0 on `holes_vert` coredump** — userspace-only (dmesg clean), reproducible in isolation against r6 too. Tracked as task #107.
- **r7 candidate sort** — `image` group's 10,445 fails are the largest mass; needs Phase 0 evidence sort to see if it's one root cause replicated across format/dim cells or many.
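One way to start that evidence sort is to collapse the format component of each failing case name and count the shapes that remain: if a handful of shapes cover most of the 10,445 failures, a shared root cause becomes more likely. The regex below is a heuristic assumption about how dEQP embeds Vulkan format names, not an exhaustive format list, and `bucket_by_shape` is a hypothetical helper.

```python
import re
from collections import Counter

# Heuristic (assumption): format tokens look like r8g8b8a8_unorm, r32_sfloat,
# b10g11r11_ufloat_pack32, d16_unorm, ...; the lookbehind keeps the match from
# starting in the middle of a word or right after a digit (e.g. "2d").
FORMAT_RE = re.compile(
    r"(?<![a-z0-9])[rgbade]\d+(?:[rgbadxs]\d+)*"
    r"_(?:unorm|snorm|uscaled|sscaled|uint|sint|ufloat|sfloat|srgb)"
    r"(?:_pack\d+)?"
)


def bucket_by_shape(failed_cases):
    """Replace format tokens with <fmt> and count failures per remaining shape."""
    return Counter(FORMAT_RE.sub("<fmt>", case) for case in failed_cases)
```

`bucket_by_shape(fails).most_common(10)` then shows whether the failures concentrate in a few dimension/operation combinations or spread evenly.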
## Provenance
- Raw per-group qpa + run.log: `ohm:/home/mfritsche/cts-results/r5_full/`
- Summary log: `ohm:/home/mfritsche/cts-results/r5_full/summary.log`
- Driver: `ohm:/usr/lib/panvk-bifrost/libvulkan_panfrost.so` (r5 then r6)
- Aggregation script: see the git history of this file.
+98
@@ -0,0 +1,98 @@
# iter19 — XFB store channel-extract fix for packed varyings (campaign close)
Shipped: 2026-05-25
PR: https://git.reauktion.de/marfrit/marfrit-packages/pulls/96
Merge commit: 902de73a02d9
Package: `mesa-panvk-bifrost-26.0.6.r7-1-aarch64`
## What shipped
Single 21+/3- line patch to `src/panfrost/vulkan/panvk_vX_xfb_lower.c`
(`0007-panvk-bifrost-xfb-component-base-fix.patch`). Eliminates a
reproducible SIGSEGV in `vkCreateGraphicsPipeline` for any shader with
XFB-bound varyings declared at non-zero `layout (component=N)`.
## How surfaced
The r5 full dEQP-VK sweep (2,258,378 tests, 97.65% runnable pass rate)
on 2026-05-24/25 included 27 fails in `transform_feedback`, and the
`transform_feedback` chunk was SIGKILLed after `transform_feedback.simple.holes_vert`
hit `FATAL ERROR: Test program crashed`. An isolated repro produced a userspace
SIGSEGV with no kernel GPU fault (dmesg clean), pointing at a pure
libvulkan_panfrost bug.
## Root cause
`lower_xfb_output_iter17` (and identically upstream
`pan_nir_lower_xfb.c::lower_xfb_output`, which carries a `// TODO`)
computed the source-channel mask as `mask << channel_idx`. `channel_idx`
is the varying-location component (0..3) but `src` only contains
channels starting at `nir_intrinsic_component(intr)`. For a scalar
declared `layout (component=2) flat out float vegeta`, NIR emits
`store_output src=<vec1>, component=2`, and the lowering computed
`mask << 2` against a 1-component src — out-of-range; the malformed
nir_def then segfaulted during downstream NIR constant-folding in
`nir_constant_expressions.c::evaluate_*`.
The `assert(nir_intrinsic_component(intr) == 0)` precondition was
inherited from upstream Mesa as a documented `// TODO`; release builds
(`-DNDEBUG`) elided it, turning the precondition violation directly
into a SIGSEGV.
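The shift arithmetic can be modeled outside NIR. The Python below is a toy model, not Mesa code: `src` is a plain list standing in for the `nir_def`, `mask` is assumed to be a contiguous channel mask for the XFB output, and the function names are hypothetical. For `vegeta` (a vec1 stored at `component=2`) the unfixed shift selects channel 2 of a one-element source, while subtracting the component base selects channel 0. Where Python raises `IndexError`, the C lowering silently built a malformed def.

```python
def src_channels(mask: int, channel_idx: int, base: int, fixed: bool) -> list[int]:
    """Source channels selected by the shifted mask (toy model, hypothetical)."""
    # Buggy: shift by the varying-location component index (0..3).
    # Fixed: shift by the offset into src, which starts at `base`
    # (the store's nir_intrinsic_component value). Assumes channel_idx >= base.
    shift = channel_idx - base if fixed else channel_idx
    shifted = mask << shift
    return [bit for bit in range(shifted.bit_length()) if (shifted >> bit) & 1]


def extract(src: list, mask: int, channel_idx: int, base: int, fixed: bool) -> list:
    """Read the selected channels; out-of-range reads raise IndexError here."""
    return [src[c] for c in src_channels(mask, channel_idx, base, fixed)]


if __name__ == "__main__":
    # layout(component = 2) flat out float vegeta: vec1 src, component base 2
    print(src_channels(0b1, 2, 2, fixed=False))  # [2]: past the end of a vec1
    print(src_channels(0b1, 2, 2, fixed=True))   # [0]
```

For stores at `component=0` both variants agree, which is why the bug only shows up with packed varyings.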
## Fix
1. Compute `src_channel = channel_idx - nir_intrinsic_component(intr)`
and use `mask << src_channel` instead.
2. Convert both elided asserts to explicit release-mode early-return
guards (closes the same elision class as the original bug).
3. Add a dispatcher-side comment explaining why `i*2+j` is the
varying-location component index.
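Step 2 follows a general pattern: a precondition checked only by `assert` disappears in release builds (`-DNDEBUG` in C; `python -O` strips Python asserts the same way), so it has to become an ordinary branch. A toy Python model of guarded extraction follows; `extract_guarded` and its `None`-means-skip convention are hypothetical, and the guards in the actual patch may check different conditions.

```python
def extract_guarded(src, mask, channel_idx, base):
    """Channel extraction with explicit guards instead of assert.

    Returns None (caller skips the store) when a precondition fails, so the
    check survives optimized builds, unlike a bare assert.
    """
    if channel_idx < base:
        return None  # channel lies before the first channel src provides
    shifted = mask << (channel_idx - base)
    channels = [b for b in range(shifted.bit_length()) if (shifted >> b) & 1]
    if channels and channels[-1] >= len(src):
        return None  # mask reaches past the end of src
    return [src[c] for c in channels]
```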
## Verification
| Family | Result |
|----------------------------------------------|-------------------|
| `transform_feedback.simple.holes_vert` | Crash → Fail (color) |
| `transform_feedback.simple.holes_extra_draw_vert` | Crash → Fail (color) |
| `transform_feedback.simple.basic_*` | 36/36 Pass |
| `transform_feedback.simple.depth_clip_*` | 1 Pass + 4 NotSupp |
| `transform_feedback.simple.lines_or_triangles*` | 16 NotSupp |
| `transform_feedback.simple.holes_geom*`      | NotSupp (no GS on G52) |

Zero new regressions on previously passing tests.
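The Crash → Fail rows and the zero-regression claim both come down to comparing per-case status between two runs of the same case list. A minimal sketch, assuming each run has already been reduced to a `{case_path: status}` dict; `diff_runs` is a hypothetical helper, and cases present in only one run are ignored here rather than reported.

```python
PASS = "Pass"


def diff_runs(before: dict[str, str], after: dict[str, str]):
    """Compare two {case_path: status} maps.

    Returns (transitions, regressions): transitions maps each case whose
    status changed to (old, new); regressions lists (sorted) cases that
    passed before and no longer pass.
    """
    transitions = {
        case: (before[case], after[case])
        for case in before.keys() & after.keys()
        if before[case] != after[case]
    }
    regressions = sorted(
        case for case, (old, new) in transitions.items()
        if old == PASS and new != PASS
    )
    return transitions, regressions
```

A Crash → Fail entry shows up in `transitions` but not in `regressions`; the ship gate is an empty `regressions` list.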
## Process
Full 8-step bugfix process followed:
- Phase 0 — characterize: `iter19/phase0_holes_vert_close.md`
- Phase 1 — source map: `iter19/phase1_holes_vert_situation.md`
- Phase 2 — root-cause + ranked options: `iter19/phase2_holes_vert_situation.md`
- Phase 3 — implement + inner-test: `iter19/phase3_holes_vert_close.md`
- Phase 4 — wider verify (focused subset)
- Phase 5 — 2nd-model review: **APPROVE WITH CHANGES (non-blocking)** → defensive-guards added
- Phase 7 — final retest: clean
- Phase 8 — ship: r7 PR → CI green → marfrit repo published → ohm `pacman -S` upgraded → smoke test confirms Crash→Fail
## Three-point ship-check verdict
1. ✓ PR #96 merged at `902de73a02d9`
2. ✓ CI runs #1380 (`mesa-panvk-bifrost-aarch64`) and #1381 (`mesa-panvk-bifrost-video-aarch64`) both `success`
3.`pacman -Q mesa-panvk-bifrost` on ohm reports `26.0.6.r7-1`; installed lib BuildID `9f8dacfc...`; holes_vert no longer SIGSEGVs
## Open follow-ups (not blocking r7 ship)
- **iter20: holes_vert color-check residual** — when `panvk_per_arch(nir_lower_xfb)` removes the original store_output post-XFB-lowering, varyings that are *both* XFB-bound *and* read by the fragment shader lose their rasterizer-side path. holes_vert's fragment shader sees `goku`/`vegeta` as undef → outputs black instead of expected blue. Naive "keep store_output" probe fixed color but blew up compile-time on `max_output_components_128` — needs a more nuanced fix.
- **iter20+: pre-existing latent crashes** unmasked by r7 — `dEQP-VK.transform_feedback.simple.max_output_components_{64,128,256}` coredump on shipped r6 baseline too (confirmed independently); they were never reached during the r5 sweep because the watchdog killed transform_feedback after holes_vert.
## Lineage
| rev | What | Date |
|-----|-----------------------------------------------|------------|
| r1  | KHR_robustness2 + nullDescriptor on Bifrost | iter8 |
| r2  | has_vk1_1/has_vk1_2 = true on Bifrost | iter9 |
| r3  | VK_EXT_transform_feedback | iter13 |
| r4  | XFB primitive decomposition | iter17 |
| r5 | fragmentStoresAndAtomics = true | 2026-05-23 |
| r6 | VK_EXT_legacy_dithering | 2026-05-25 |
| **r7** | **XFB packed-varying channel-extract fix** | **2026-05-25** |
@@ -0,0 +1,103 @@
# Mali-G52 r1 MC1 Feature Delta — r6/r7 Candidate Audit (2026-05-24)
Multi-language research (EN/CN/RU/KO/JA/Bavarian) into HW capabilities of
Mali-G52 r1 MC1 vs what Mesa 26.0.6 panvk advertises. Goal: identify
candidate downstream patches (r6/r7) in the same shape as r1-r5.
## Hardware
- ARM Mali-G52 r1 MC1 (Bifrost gen-2, PAN_ARCH 7)
- Single shader core, 800 MHz peak
- Shipped in RK3566 (PineTab2, PineNote, Quartz64-B)
- TBDR architecture with tile memory (~16 KB per shader core, 8 KB per
pixel per ARM developer docs)
## Authoritative sources
- ARM developer docs:
  - "Bifrost Shader Core": developer.arm.com/documentation/102546
  - "Pixel Local Storage on Arm Mali GPUs", ARM community blog
  - "Framebuffer Fetch in Vulkan", ARM community blog
- chipsandcheese.com Bifrost-G52 teardown (May 2025)
- Mesa 26.0 / 26.1 release notes
- Christian Gmeiner's "PanVK Extension Sprint" blog post (Apr 2026)
- Rockchip RK3566 datasheet (boardcon mirror)
## Confirmed HW-supported features under-exposed by upstream Mesa (PAN_ARCH < 9 gates)
These are candidate flips in the same shape as r1..r5.
### High-confidence pure-software flips (small-scope r6 candidates)
| Feature | Why HW-doable | Why panvk hides it |
|---|---|---|
| `VK_EXT_pipeline_robustness` | Software-level robustness selector; composes on top of our r1 KHR_robustness2 flip | Not advertised on PAN_ARCH<10. No HW dep. |
| `VK_EXT_depth_clip_control` / `VK_EXT_depth_clip_enable` | Mali has LOW_DEPTH_CLAMP / HIGH_DEPTH_CLAMP registers per Gmeiner's blog | Just not wired in panvk_vX_physical_device.c |
| `VK_EXT_provoking_vertex` | Panfrost GL already supports it on Bifrost; just a selector | Not wired |
| `VK_EXT_load_store_op_none` / `VK_KHR_load_store_op_none` | Pure Vulkan spec relaxation; no HW change | Not advertised |
| `VK_EXT_pageable_device_local_memory` / `VK_EXT_memory_priority` | Pure software; UMA-friendly | Not advertised |
### Medium-confidence HW-touch flips (need NIR plumbing, not just flag)
| Feature | HW support evidence | Effort |
|---|---|---|
| `shaderStorageImageMultisample` | Bifrost ALU + tile memory can do multisampled storage-image stores per ARM blob exposure | NIR dirty-bit work |
| `shaderStorageImageReadWithoutFormat` / `WriteWithoutFormat` | Bifrost LD/ST has typed + untyped paths; Mesa 25.1 flipped `shaderStorageImageExtendedFormats` already | NIR pass refinement |
| `VK_EXT_extended_dynamic_state3` (subset) | Each piece is dynamic-state plumbing, not new HW | Per-piece evaluation |
### High-confidence HW-real flips (multi-week r7 territory)
| Feature | HW evidence | Engineering scope |
|---|---|---|
| `VK_EXT_rasterization_order_attachment_access` | TBDR tile memory exists; Panfrost GL implements FB fetch (Mesa MR !5755) | Real PanVK FB-fetch plumbing — port from GL path |
| `VK_KHR_dynamic_rendering_local_read` | Maps to FB fetch / tile memory which G52 has | Pairs with the above; design together |
| Tile-image / pixel-local-storage Vulkan exposure | Mali tile memory ~16 KB per SC; ARM exposes PLS GLES-only natively | Substantive driver feature |
## Confirmed HW limitations (NOT candidates)
These are silicon-absent on G52, not patch surface:
- `sparseResidency*` — Bifrost MMU lacks sparse residency model (Mesa gates to v10+)
- `subgroupSize` ≥ 16 / advanced subgroup ops — Bifrost is 4- or 8-wide warps
- `VK_EXT_nested_command_buffer` — needs CSF (v10+)
- `VK_KHR_shader_untyped_pointers` — explicit "Bifrost has issues" per Gmeiner blog
- Video codec extensions — Mali has no video silicon (hantro VPU territory; covered by mesa-panvk-bifrost-video sibling)
- `shaderInt64`, 64-bit atomics, mesh shaders, hardware ray tracing, fragment shading rate — all silicon-absent
## Needs HW probing (uncertain)
- `sampleRateShading`: optional in core Vulkan 1.0 but widely relied on by applications; if currently false on v7 it's likely a flip candidate
- `VK_EXT_sample_locations` — Bifrost rasterizer has programmable sample positions per ARM docs
- `dualSrcBlend` on v7 — Panfrost GL supports dual-source per Icecream95 notes; PanVK status unclear
- `VK_EXT_filter_cubic` — unclear if Bifrost texturing has cubic sampler natively
Test path: run targeted dEQP-VK subsets against the existing r5 driver and observe `NotSupported` vs `Fail` distribution; cross-reference with Panfrost GL coverage of the same hardware paths.
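Reading that distribution can be as simple as counting statuses per case-path prefix. In the sketch below, results are assumed to be `(case_path, status)` pairs already extracted from a run, and `status_by_prefix` is a hypothetical helper. A prefix dominated by NotSupported points at a feature that is not advertised (a possible flip candidate); one dominated by Fail points at a feature that is advertised but broken.

```python
from collections import Counter, defaultdict


def status_by_prefix(results, depth=1):
    """Count statuses per case-path prefix below the dEQP-VK root.

    results: iterable of (case_path, status) pairs.
    depth=1 groups by top-level group; larger values split further.
    """
    groups = defaultdict(Counter)
    for case, status in results:
        parts = case.split(".")
        prefix = ".".join(parts[1:1 + depth]) or case
        groups[prefix][status] += 1
    return dict(groups)
```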
## Multi-language source notes
- 🇨🇳 Chinese: Rockchip RK3566 datasheet (boardcon mirror) confirms Mali-G52 2EE, Vulkan 1.1, OpenCL 2.0, AFBC/ASTC. Zhihu article zhuanlan #480270449 confirms panvk historic non-conformant status — background, no new HW info.
- 🇷🇺 Russian: opennet.ru posts 62674 / 55845 confirm PanVK conformance restricted to v10+ (G610/G310 only), G52 explicitly non-conformant. Matches English sources.
- 🇯🇵 Japanese: 0 useful Mali-G52-specific hits across Qiita / Hatena.
- 🇰🇷 Korean: 0 hits. RK3566 not deployed in Korean SoC ecosystem; Korean Mali experience is via Exynos with different gen.
- 🇩🇪 Bavarian: 0 Stammtisch (pub regulars' table) threads on Mali-G52. Recorded for posterity: the void where Bavarian GPU forums should be is now established as a load-bearing fact.
## The leaked ARM Software Developer's Manual
**Not found in the wild.** Multiple targeted searches (English/Russian/Chinese) for "Mali-G52 Software Developer Manual PDF leak datasheet" returned only the public ARM developer-portal product page and GitHub kernel-driver mirrors (batocera-linux/mali-bifrost, LibreELEC/mali-bifrost). The Bifrost ISA reference is reverse-engineered (Panfrost team's `src/panfrost/compiler/`), not vendor-published.
Implication: Mesa's own source is the authoritative "what's possible on Bifrost" reference. For future iterations, grep `src/panfrost/compiler/bi_test_*.c` and panvk feature gates (`PAN_ARCH < N` checks) before chasing leaks.
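A scripted version of that grep might look like the following. It is a sketch that assumes gates are written as literal `PAN_ARCH <op> N` comparisons; macros or helper functions that wrap the architecture check will not be found, and `find_arch_gates` is a hypothetical helper.

```python
import re
from pathlib import Path

# Matches comparisons such as "PAN_ARCH < 10" or "PAN_ARCH >= 9".
# Two-character operators are listed first so "<=" is not read as "<".
GATE_RE = re.compile(r"PAN_ARCH\s*(<=|>=|==|!=|<|>)\s*(\d+)")


def find_arch_gates(root: Path, pattern: str = "*.c"):
    """Yield (path, line_no, op, version, line) for each PAN_ARCH comparison."""
    for path in sorted(root.rglob(pattern)):
        text = path.read_text(errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for m in GATE_RE.finditer(line):
                yield path, line_no, m.group(1), int(m.group(2)), line.strip()
```

Run from a Mesa checkout, `find_arch_gates(Path("src/panfrost/vulkan"))` lists every literal gate, which can then be filtered for `PAN_ARCH < 10`-style exclusions that cover v7.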
Confirms iter18's earlier finding: 0 Vulkan symbols in the vendor libmali-bifrost-g52-*.so. There is no proprietary Vulkan implementation for mali-bifrost-g52 anywhere, so Panfrost is the only Vulkan driver this hardware will ever have.
## Top 3 ranked recommendations
1. **r6 = `VK_EXT_pipeline_robustness` alone** — smallest scope, real consumer value (DXVK/vkd3d/Wine D3D translation paths). 1-line panvk_vX_physical_device.c flip.
2. **r6.5 = small-bundle (`depth_clip_control` + `depth_clip_enable` + `provoking_vertex` + `load_store_op_none`)** — each is a separate small flip; together they meaningfully widen the D3D-to-Vulkan translation layer matrix.
3. **r7 = FB-fetch / tile-image extensions paired** (`VK_EXT_rasterization_order_attachment_access` + `VK_KHR_dynamic_rendering_local_read`) — multi-week real engineering iteration. Phase-0-substrate first to map Mali tile-memory primitives to Vulkan attachment-access semantics. Unlocks deferred-shading paths without bouncing through main memory.
## Out of scope
- Upstreaming any of these — per [[feedback-no-upstream-proposals]] our channel is marfrit-packages downstream.
- Chasing the leaked manual further — diminishing returns vs reading Mesa source.
🤖 Research compiled by Claude Opus 4.7 (general-purpose subagent + main thread) on 2026-05-24.