a4e7d8ab90
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18);
  libmali stub blobs at iter18/blob/ excluded (109MB of RE artifacts
  replaced with a README pointer)
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone
  (basis for the 0005 patch diff generation)
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
109 lines
6.4 KiB
Markdown
# Phase 2: situation analysis for iter8

Opened **2026-05-19** following the RED result in iter8 ([phase0_findings_iter8.md](phase0_findings_iter8.md)).

## What we tested

Per the iter8 lock: run `eglinfo` and other GL clients via Zink-on-PanVk on ohm, forcing GL → Vulkan translation, and verify that Zink picks up PanVk-Bifrost (not llvmpipe).

## What happened

Zink refused to load on top of PanVk-Bifrost. The error log:

```
MESA: error: Zink requires the nullDescriptor feature of KHR/EXT robustness2.
```

(Emitted twice: Zink probes twice during EGL setup.)

Mesa silently fell back to **llvmpipe** (the LLVM-based software rasterizer). EGL/GL still works, but every pixel is rendered on the CPU. For a workload like TuxRacer this would be unusably slow (single-digit FPS at best on the Cortex-A55 cores in the RK3566).
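
Because the fallback is silent, it helps to check the renderer string explicitly rather than trust that GL "works". A minimal sketch in C, assuming the renderer string has already been captured (for example from `eglinfo` output or `glGetString(GL_RENDERER)`). The names `renderer_kind` and `classify_renderer` are hypothetical, and the substring matches assume that the software rasterizer reports itself as `llvmpipe` and that Zink's renderer string contains `zink`:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a GL renderer string. */
enum renderer_kind {
    RENDERER_UNKNOWN,  /* NULL or unrecognized string */
    RENDERER_LLVMPIPE, /* CPU rasterizer: the silent fallback seen in iter8 */
    RENDERER_ZINK      /* GL translated to Vulkan by Zink */
};

/* Classify by substring. llvmpipe is checked first so that a software
 * fallback is never mistaken for an accelerated path. */
enum renderer_kind classify_renderer(const char *renderer)
{
    if (renderer == NULL)
        return RENDERER_UNKNOWN;
    if (strstr(renderer, "llvmpipe") != NULL)
        return RENDERER_LLVMPIPE;
    if (strstr(renderer, "zink") != NULL)
        return RENDERER_ZINK;
    return RENDERER_UNKNOWN;
}
```

A `RENDERER_ZINK` result alone does not show which Vulkan driver Zink sits on; the device name inside the renderer string still has to be read to confirm it is the Mali GPU.
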

## Root cause (Mesa source)

`src/panfrost/vulkan/panvk_vX_physical_device.c` (Mesa main):

```c
/* line 94 */  .KHR_robustness2 = PAN_ARCH >= 10, /* extension advertisement (KHR) */
/* line 194 */ .EXT_robustness2 = PAN_ARCH >= 10, /* extension advertisement (EXT) */
/* line 590 */ .nullDescriptor = PAN_ARCH >= 10,  /* feature bit */
```

Three lines gate the entire robustness2 path on Mali architectures **strictly newer than Valhall-JM**. PAN_ARCH values:

- 4/5: Midgard
- 6/7: Bifrost ← Mali-G52 r1 on ohm is 7
- 9: Valhall (JM)
- 10+: Valhall (CSF) and fifth-gen

The gate `>= 10` means **only CSF-class Valhall and fifth-gen get robustness2**. Bifrost is denied even though the underlying NIR/shader plumbing is already arch-agnostic:

```c
/* panvk_vX_nir_lower_descriptors.c:1309 */
.null_descriptor_support = dev->vk.enabled_features.nullDescriptor,

/* panvk_vX_shader.c:1355 */
.robust_descriptors = dev->vk.enabled_features.nullDescriptor,
```

If the feature were *exposed* on Bifrost, these per-arch code paths would handle it. The gate appears to be conservative ("haven't tested on v6/v7/v9") rather than reflecting hardware incapability.
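
The effect of the gate on each architecture can be sketched as a small C model. This is an illustration only: in Mesa, `PAN_ARCH` is a compile-time macro, whereas the sketch takes the arch number as a runtime parameter, and the function names `pan_arch_family` and `upstream_exposes_robustness2` are hypothetical.

```c
#include <stdbool.h>

/* Map a PAN_ARCH value to the family names used in the PAN_ARCH list. */
const char *pan_arch_family(int arch)
{
    switch (arch) {
    case 4:
    case 5:
        return "Midgard";
    case 6:
    case 7:
        return "Bifrost";
    case 9:
        return "Valhall (JM)";
    default:
        /* 10 and above: CSF-class Valhall and fifth-gen. */
        return arch >= 10 ? "Valhall (CSF) or fifth-gen" : "unknown";
    }
}

/* The upstream gate as quoted from the three source lines:
 * robustness2 is exposed only when PAN_ARCH >= 10. */
bool upstream_exposes_robustness2(int arch)
{
    return arch >= 10;
}
```

For the Mali-G52 on ohm (arch 7) the model reports "Bifrost" and `false`, which matches the Zink error seen in iter8.
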

## Why the gate exists

Speculation, but informed by [iter1's findings](phase8_iteration1_close.md): the entire Bifrost+Valhall-JM path was marked "not well-tested". See the [arch gate](phase0_findings.md) at `panvk_physical_device.c:413`, which requires `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1`. The robustness2 gate is part of the same defensive crouch: don't advertise features that haven't been bench-tested on these archs.

iter1–7 proved that the *fundamentals* of the Bifrost driver work. Specifically, iter4 ([phase8_iteration4_close.md](phase8_iteration4_close.md)) showed that `COMBINED_IMAGE_SAMPLER` descriptors work end-to-end. The risk that "null descriptor" specifically fails on Bifrost is real but bounded: null descriptor means "a shader can attempt to read from an unbound descriptor binding without faulting", which is mostly a question of whether the descriptor table has a defined zero entry. PanVk-Bifrost's `bifrost/panvk_vX_meta_desc_copy.c` exists specifically for descriptor table manipulation, so the building blocks are there.
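
To make the "defined zero entry" idea concrete, here is a toy model in C of a descriptor table whose unbound slots read as zero instead of faulting. Everything in it (`toy_descriptor`, `toy_table_read`, the table layout) is hypothetical and is not how PanVk or the Mali hardware represent descriptors; it only illustrates the read behavior that `nullDescriptor` promises.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy descriptor: a NULL data pointer stands for an unbound (null) binding. */
struct toy_descriptor {
    const uint32_t *data;
    size_t count; /* number of 32-bit elements behind data */
};

/* Read element idx through table slot `slot`.
 * A null descriptor reads as 0 rather than faulting. The slot and index
 * bounds checks only keep the toy memory-safe; they are not a model of
 * the separate robust buffer access features. */
uint32_t toy_table_read(const struct toy_descriptor *table, size_t slots,
                        size_t slot, size_t idx)
{
    if (slot >= slots)
        return 0;
    const struct toy_descriptor *d = &table[slot];
    if (d->data == NULL || idx >= d->count)
        return 0;
    return d->data[idx];
}
```

The open question for Bifrost is whether the real lowering path gives the same "reads as zero" result on v6/v7, which is what the stepwise validation is meant to find out.
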

## Why this matters

Without `nullDescriptor`:
- Zink refuses to use PanVk-Bifrost ⇒ fallback to llvmpipe ⇒ no GPU acceleration for any GL app on Bifrost.
- TuxRacer-via-Zink (the [README operator-level motivation](README.md)) is **blocked**.
- Other modern Vulkan apps that require `nullDescriptor` from robustness2 (it's a popular extension; conformance tests use it) will likely fail too.

This is the campaign's **first real driver gap**. Everything before iter8 was "the gate is defensive but the driver works." This is "the gate genuinely blocks an end-user workload."

## Proposed Phase 4 fix

**Minimal patch:** widen the three `PAN_ARCH >= 10` gates so that they include Bifrost:

```diff
- .KHR_robustness2 = PAN_ARCH >= 10,
+ .KHR_robustness2 = true, /* or PAN_ARCH >= 6 if we want to keep Midgard out */

- .EXT_robustness2 = PAN_ARCH >= 10,
+ .EXT_robustness2 = true,

- .nullDescriptor = PAN_ARCH >= 10,
+ .nullDescriptor = true,
```

Risk register:
1. **Bifrost's descriptor table may handle null-binding reads differently from Valhall-CSF.** If the NIR `null_descriptor_support` path emits Bifrost ISA that returns zero on null reads (the spec'd behavior for `nullDescriptor`), this works. If Bifrost requires a different sequence and the lowering code doesn't have a v6/v7 branch, we'd get either wrong values or a GPU fault on shaders that read null descriptors.
2. **KHR/EXT robustness2 also defines the `robustImageAccess2` and `robustBufferAccess2` features.** The three gated lines only cover the extension advertisement and `nullDescriptor`; the other two features may have their own gates and code paths. Need to check the per-feature gate code.
3. **Untested code paths in `panvk_vX_meta_desc_copy.c`.** The Bifrost-specific descriptor copy meta path was last touched in 2024 (per iter0 file header). May have bit-rotted.

Mitigations:
- Build the patch as a custom `libvulkan_panfrost.so`, install it side-by-side via `LD_LIBRARY_PATH`, and don't overwrite system Mesa. Easy rollback.
- Validate stepwise: first `vulkaninfo` (confirms the extension list), then `eglinfo` (confirms Zink picks PanVk), then `es2_info` (GL context creates), then a simple GL workload.
- Keep the validation layer enabled throughout.
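
The stepwise validation can be sketched as a small C helper that runs commands in order and stops at the first failure. This is a sketch under assumptions: `run_steps` is a hypothetical name, the commands are passed to the shell via the standard `system()` call, and a non-zero return from `system()` is treated as failure.

```c
#include <stddef.h>
#include <stdlib.h>

/* Run each command in order via the shell.
 * Returns the index of the first command that fails (non-zero status
 * from system()), or n if every command succeeded. */
size_t run_steps(const char *const cmds[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (system(cmds[i]) != 0)
            return i;
    }
    return n;
}
```

Called with `vulkaninfo`, `eglinfo`, and `es2_info` in that order, the returned index says which stage broke first. An exit status of zero only shows that the tool ran; whether the extension list contains robustness2, or whether Zink actually picked PanVk, still has to be read from each tool's output.
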

## What this needs from the operator

Building Mesa from source on the workstation (or a beefier compile host: `boltzmann`, `data`, distcc cluster) and shipping the patched `libvulkan_panfrost.so` to ohm. That's a **substantial action** the operator should approve:

- **Compile time:** Mesa is a big project; expect 30–90 min on a normal aarch64 builder, less with distcc or x86_64 cross-compile.
- **Install path:** `LD_LIBRARY_PATH=/home/mfritsche/panvk-patched-libs PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1 ...` keeps it isolated. No system files modified.
- **If it works:** publish via marfrit-packages eventually (per the libva-multiplanar fork model), feed Collabora the patch upstream (or carry out-of-tree per `feedback_no_upstream`).
- **If it doesn't:** fall back to system Mesa, document what failed.

## Status

iter8 is **RED, characterized.** Awaiting operator approval to proceed to Phase 4 (the build + patch step).

## Reference

- Phase 0 lock: [phase0_findings_iter8.md](phase0_findings_iter8.md)
- Evidence: [phase0_evidence/iter8_zink_failure.txt](phase0_evidence/iter8_zink_failure.txt)
- Prior cumulative state: [phase8_iteration7_close.md](phase8_iteration7_close.md)
- Mesa source paths (local clone): `~/src/mesa-ref/mesa/src/panfrost/vulkan/`