# Phase 2: situation analysis for iter8

Opened **2026-05-19** following the RED result in iter8 ([phase0_findings_iter8.md](phase0_findings_iter8.md)).

## What we tested

Per iter8 lock: run `eglinfo` and other GL clients via Zink-on-PanVk on ohm, force GL → Vulkan translation, verify Zink picks up PanVk-Bifrost (not llvmpipe).

## What happened

Zink refused to load on top of PanVk-Bifrost. The error log:

```
MESA: error: Zink requires the nullDescriptor feature of KHR/EXT robustness2.
```

(Emitted twice: Zink probes twice during EGL setup.)

Mesa silently fell back to **llvmpipe** (the LLVM-based software rasterizer). EGL/GL still works, but every pixel is rendered on the CPU. For a workload like TuxRacer this would be unusably slow (single-digit FPS at best on the Cortex-A55s in RK3566).

## Root cause (Mesa source)

`src/panfrost/vulkan/panvk_vX_physical_device.c` (Mesa main):

```c
/* line 94: extension advertisement (KHR) */
.KHR_robustness2 = PAN_ARCH >= 10,
/* line 194: extension advertisement (EXT) */
.EXT_robustness2 = PAN_ARCH >= 10,
/* line 590: feature bit */
.nullDescriptor = PAN_ARCH >= 10,
```

Three lines gate the entire robustness2 path on Mali architectures **strictly newer than Valhall-JM**. PAN_ARCH values:

- 4/5: Midgard
- 6/7: Bifrost ← Mali-G52 r1 on ohm is 7
- 9: Valhall (JM)
- 10+: Valhall (CSF) and fifth-gen

The gate `>= 10` means **only CSF-class Valhall and fifth-gen get robustness2**. Bifrost is denied even though the underlying NIR/shader plumbing is already arch-agnostic:

```c
/* panvk_vX_nir_lower_descriptors.c:1309 */
.null_descriptor_support = dev->vk.enabled_features.nullDescriptor,
/* panvk_vX_shader.c:1355 */
.robust_descriptors = dev->vk.enabled_features.nullDescriptor,
```

If the feature were *exposed* on Bifrost, these per-arch code paths would handle it. The gate appears to be conservative ("haven't tested on v6/v7/v9") rather than reflecting hardware incapability.
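The effect of the gate can be sketched as a small truth table. This is a minimal illustration, not Mesa code: `robustness2_current` mirrors the `PAN_ARCH >= 10` expression quoted above, while `robustness2_widened`, `arch_name`, and `print_gate_table` are hypothetical helpers, and `>= 6` is only one candidate threshold. In PanVk, `PAN_ARCH` is a compile-time constant for each per-arch build; it is a runtime argument here purely so the values can be compared side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Mirrors the upstream expression quoted above (PAN_ARCH >= 10). */
bool robustness2_current(int pan_arch) { return pan_arch >= 10; }

/* Hypothetical widened gate: Bifrost and newer, Midgard still excluded. */
bool robustness2_widened(int pan_arch) { return pan_arch >= 6; }

/* Family names follow the PAN_ARCH list in this document. */
const char *arch_name(int pan_arch)
{
    if (pan_arch == 4 || pan_arch == 5) return "Midgard";
    if (pan_arch == 6 || pan_arch == 7) return "Bifrost";
    if (pan_arch == 9)                  return "Valhall (JM)";
    if (pan_arch >= 10)                 return "Valhall (CSF) / fifth-gen";
    return "unlisted";
}

/* Prints one row per architecture value mentioned in this document. */
void print_gate_table(FILE *out)
{
    static const int archs[] = { 4, 5, 6, 7, 9, 10 };

    assert(out != NULL);
    fprintf(out, "%-4s %-26s %-8s %s\n", "arch", "family", "current", "widened");
    for (size_t i = 0; i < sizeof(archs) / sizeof(archs[0]); i++) {
        int a = archs[i];
        fprintf(out, "%-4d %-26s %-8s %s\n", a, arch_name(a),
                robustness2_current(a) ? "yes" : "no",
                robustness2_widened(a) ? "yes" : "no");
    }
}
```

Calling `print_gate_table(stdout)` from a `main` lists the archs side by side. Under the widened threshold, arch 7 (the Mali-G52 on ohm) flips from denied to exposed while Midgard stays excluded.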
## Why the gate exists

Speculation, but informed by [iter1's findings](phase8_iteration1_close.md): the entire Bifrost+Valhall-JM path was set to "not well-tested"; see the [arch gate](phase0_findings.md) at `panvk_physical_device.c:413` that requires `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1`. The robustness2 gate is part of the same defensive crouch: don't advertise features that haven't been bench-tested on these archs.

iter1–7 proved that the *fundamentals* of the Bifrost driver work. Specifically, iter4 ([phase8_iteration4_close.md](phase8_iteration4_close.md)) showed `COMBINED_IMAGE_SAMPLER` descriptors work end-to-end. The risk that "null descriptor" specifically fails on Bifrost is real but bounded: null descriptor means "shader can attempt to read from an unbound descriptor binding without faulting", which is mostly a question of whether the descriptor table has a defined zero entry. PanVk-Bifrost's `bifrost/panvk_vX_meta_desc_copy.c` exists specifically for descriptor table manipulation, so the building blocks are there.

## Why this matters

Without `nullDescriptor`:

- Zink refuses to use PanVk-Bifrost ⇒ fallback to llvmpipe ⇒ no GPU acceleration for any GL app on Bifrost.
- TuxRacer-via-Zink (the [README operator-level motivation](README.md)) is **blocked**.
- Many other modern Vulkan apps that opt into robustness2 (it's a popular extension; conformance tests use it) will likely break as well.

This is the campaign's **first real driver gap**. Everything before iter8 was "the gate is defensive but the driver works." This is "the gate genuinely blocks an end-user workload."

## Proposed Phase 4 fix

**Minimal patch:** flip the three `PAN_ARCH >= 10` to a wider range that includes Bifrost:

```diff
- .KHR_robustness2 = PAN_ARCH >= 10,
+ .KHR_robustness2 = true, /* or PAN_ARCH >= 6 if we want to keep Midgard out */
- .EXT_robustness2 = PAN_ARCH >= 10,
+ .EXT_robustness2 = true,
- .nullDescriptor = PAN_ARCH >= 10,
+ .nullDescriptor = true,
```

Risk register:

1. **Bifrost's descriptor table may handle null-binding reads differently from Valhall-CSF.** If the NIR `null_descriptor_support` path emits Bifrost ISA that returns zero on null reads (which is the spec'd behavior for `nullDescriptor`), this works. If Bifrost requires a different sequence and the lowering code doesn't have a v6/v7 branch, we'd get either wrong values or a GPU fault on shaders that read null descriptors.
2. **The KHR/EXT robustness2 extension also exposes the `robustBufferAccess2` and `robustImageAccess2` features.** The feature-bit gate quoted above covers only `nullDescriptor`, but the extension's other features may have their own code paths. Need to check the per-feature gate code.
3. **Untested code paths in `panvk_vX_meta_desc_copy.c`.** The Bifrost-specific descriptor copy meta path was last touched in 2024 (per iter0 file header) and may have bit-rotted.

Mitigations:

- Build the patch as a custom `libvulkan_panfrost.so`, install it side-by-side via `LD_LIBRARY_PATH`, and don't overwrite system Mesa. Easy rollback.
- Validate stepwise: first `vulkaninfo` (confirms ext list), then `eglinfo` (confirms Zink picks PanVk), then `es2_info` (GL context creates), then a simple GL workload.
- Keep the validation layer enabled throughout.

## What this needs from the operator

Building Mesa from source on the workstation (or a beefier compile host: `boltzmann`, `data`, or the distcc cluster) and shipping the patched `libvulkan_panfrost.so` to ohm. That's a **substantial action** the operator should approve:

- **Compile time:** Mesa is a big project; expect 30–90 min on a normal aarch64 builder, less with distcc or x86_64 cross-compile.
- **Install path:** `LD_LIBRARY_PATH=/home/mfritsche/panvk-patched-libs PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1 ...` keeps it isolated. No system files modified.
- **If it works:** publish via marfrit-packages eventually (per the libva-multiplanar fork model), feed Collabora the patch upstream (or carry out-of-tree per `feedback_no_upstream`).
- **If it doesn't:** fall back to system Mesa, document what failed.

## Status

iter8 is **RED, characterized.** Awaiting operator approval to proceed to Phase 4 (the build + patch step).

## Reference

- Phase 0 lock: [phase0_findings_iter8.md](phase0_findings_iter8.md)
- Evidence: [phase0_evidence/iter8_zink_failure.txt](phase0_evidence/iter8_zink_failure.txt)
- Prior cumulative state: [phase8_iteration7_close.md](phase8_iteration7_close.md)
- Mesa source paths (local clone): `~/src/mesa-ref/mesa/src/panfrost/vulkan/`