panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18);
  libmali stub blobs at iter18/blob/ are excluded (109 MB of RE
  artifacts replaced with a README pointer)
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone
  (basis for the 0005 patch diff generation)
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Iteration 1 close — GREEN
Closed 2026-05-19 by mfritsche + claude-noether.
Locked question
(From phase0_findings.md)
Get a minimal Vulkan compute workload to execute end-to-end on PanVk-Bifrost on ohm (PineTab2, Mali-G52 r1 MC1) with
PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1: write a known value to a host-visible storage buffer from a single-invocation compute shader, fence-wait, read back, verify. No GPU faults in dmesg, no validation errors with VK_LAYER_KHRONOS_validation if installable, no submit timeout.
Result: GREEN
PanVk-Bifrost on Mali-G52 r1 MC1 (RK3566, kernel 7.0.0-danctnix1-6, Mesa 26.0.6) executed the minimal compute probe end-to-end on the first try, 6/6 runs in a row, including 1 run with VK_LAYER_KHRONOS_validation active. Every step in the probe trace passed:
- vkCreateInstance (Vulkan 1.0 core, no extensions)
- vkEnumeratePhysicalDevices → "Mali-G52 r1 MC1"
- vkCreateDevice (1 queue from family 0, flags=GRAPHICS|COMPUTE|TRANSFER)
- vkCreateBuffer + vkAllocateMemory (memoryType 1: DEVICE_LOCAL|HOST_VISIBLE|HOST_CACHED) + vkBindBufferMemory
- vkMapMemory (pre-fill 0xDEADBEEF sentinel)
- Descriptor set layout + pool + allocate + update (1 STORAGE_BUFFER binding)
- vkCreateShaderModule (560-byte SPV from glslangValidator -V)
- vkCreatePipelineLayout + vkCreateComputePipelines
- Command buffer record: bind pipeline + bind descriptor sets + vkCmdDispatch(1,1,1) + memory barrier (SHADER_WRITE → HOST_READ)
- vkQueueSubmit + vkWaitForFences (5s timeout, completes immediately)
- vkInvalidateMappedMemoryRanges + readback
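The memory-type step in the trace above can be sketched in plain C. This is a minimal illustration, not the probe's actual code: the function names and the `MEM_*` constants are hypothetical stand-ins (the bit values mirror `VkMemoryPropertyFlagBits`), redefined locally so the sketch compiles without vulkan.h. It also shows why the trace ends with vkInvalidateMappedMemoryRanges: a type that is HOST_CACHED but not HOST_COHERENT needs an explicit invalidate before the CPU reads GPU writes.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values mirror VkMemoryPropertyFlagBits; redefined here (hypothetical
 * names) only so the sketch builds without vulkan.h. */
enum {
    MEM_DEVICE_LOCAL  = 0x1,
    MEM_HOST_VISIBLE  = 0x2,
    MEM_HOST_COHERENT = 0x4,
    MEM_HOST_CACHED   = 0x8
};

/* Return the first memory type index permitted by type_bits whose property
 * flags contain every bit in required, or -1 if no type qualifies. */
static int find_memory_type(uint32_t type_bits, const uint32_t *type_flags,
                            uint32_t type_count, uint32_t required)
{
    assert(type_count <= 32u); /* Vulkan exposes at most 32 memory types */
    for (uint32_t i = 0; i < type_count; i++) {
        int allowed = (int)((type_bits >> i) & 1u);
        if (allowed && (type_flags[i] & required) == required)
            return (int)i;
    }
    return -1;
}

/* Host-visible memory without HOST_COHERENT must be invalidated before the
 * CPU reads data written by the GPU. */
static int needs_invalidate(uint32_t flags)
{
    return (flags & MEM_HOST_VISIBLE) && !(flags & MEM_HOST_COHERENT);
}
```

With a two-entry table whose second entry is DEVICE_LOCAL|HOST_VISIBLE|HOST_CACHED (0xB), asking for HOST_VISIBLE returns index 1, matching the memoryType 1 reported in the trace. The first entry of such a table is made up for illustration.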
buffer[0] = 0xcafebabe (expected 0xcafebabe) — no sentinel left behind, no zero, no garbage. The GPU executed the shader and wrote correctly.
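The readback outcomes named above (expected value, sentinel left behind, zero, garbage) can be told apart with a small classifier. This is a sketch under stated assumptions: the enum and function names are hypothetical and are not taken from probe_compute.c. Only the two constants come from the run described here.

```c
#include <stdint.h>

#define SENTINEL 0xDEADBEEFu /* value pre-filled through the mapping */
#define EXPECTED 0xCAFEBABEu /* value the compute shader stores */

typedef enum {
    READBACK_OK,       /* shader ran and wrote the expected value */
    READBACK_SENTINEL, /* buffer untouched: dispatch never wrote */
    READBACK_ZERO,     /* buffer cleared, but not by the shader */
    READBACK_GARBAGE   /* something wrote, but not what was expected */
} readback_t;

/* Map the first word of the readback buffer to a diagnostic category. */
static readback_t classify_readback(uint32_t value)
{
    if (value == EXPECTED) return READBACK_OK;
    if (value == SENTINEL) return READBACK_SENTINEL;
    if (value == 0u)       return READBACK_ZERO;
    return READBACK_GARBAGE;
}
```

Keeping the sentinel case separate from zero and garbage matters for triage: a surviving sentinel points at the dispatch or submit path, while the other two point at the shader or at cache visibility.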
Evidence: phase0_evidence/iter1_compute_probe_run.txt.
No GPU faults, no MMU faults, no kernel-side panfrost messages logged across 6 runs.
What the close tells us
Three of the four hypotheses in phase0_findings.md are not blockers for the minimal compute path:
| Hypothesis | Blocker at iter1? |
|---|---|
| H1: vkCreateDevice / queue / sync init gap | ✗ no — device + queue + fence + barrier all work |
| H2: Command buffer recording / cmd_dispatch | ✗ no — single dispatch records + submits cleanly |
| H3: Shader compilation / NIR lowering | ✗ no — trivial compute shader compiles and runs correctly |
| H4: WSI / swapchain (deferred out-of-scope) | unchanged — iter1 didn't touch it |
This doesn't mean PanVk-Bifrost is universally functional; it means only that the minimum-viable compute path works. Failures are still expected as we add complexity (multiple workgroups, larger buffers, complex shaders, descriptor indexing, real graphics, WSI).
The Mesa upstream "not well-tested on v7" gate (panvk_physical_device.c:413–425) reads as conservative rather than reflecting hard breakage on this minimal path.
iter1 in-tree artifacts
- iter1/probe_compute.c: pure Vulkan 1.0 core probe (~270 LoC)
- iter1/probe_compute.comp: 4-line GLSL shader
- iter1/Makefile: make builds, make run / make run-validation runs
Deferred to iter2+ (not in iter1 scope)
- Multi-workgroup / multi-invocation compute. iter1 was 1 workgroup × 1 invocation.
- Real graphics workload. iter1 was compute-only. iter2 lock will pivot to graphics.
- WSI / swapchain. iter1 used host-visible readback, no display.
- Larger buffers. iter1 was 16 bytes nominal / 64 bytes allocated (memReq alignment).
- Complex shaders. iter1's shader was a single store: no arithmetic, no atomics, no nontrivial control flow.
- TuxRacer / Zink-on-PanVk. Still the end-goal, still many iters away.
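The "16 bytes nominal / 64 bytes allocated" figure in the larger-buffers item is consistent with rounding the requested size up to a 64-byte boundary. The sketch below shows that arithmetic. It assumes a 64-byte granularity, since the note does not say which field of the memory requirements produced the 64, and `align_up` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of alignment. Vulkan reports
 * alignments as powers of two, which the bit trick below relies on. */
static uint64_t align_up(uint64_t size, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

Under that assumption, `align_up(16, 64)` yields 64, and a 65-byte request would occupy 128 bytes. This is worth keeping in mind when later iterations size buffers near a boundary.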
Next iter — iter2 lock proposal
Smallest viable graphics workload that exercises the non-compute pipeline parts on PanVk-Bifrost. Proposed pattern:
Allocate a 4×4 VK_FORMAT_R8G8B8A8_UNORM image (COLOR_ATTACHMENT | TRANSFER_SRC | TRANSFER_DST; vkCmdClearColorImage requires TRANSFER_DST usage), transition UNDEFINED → TRANSFER_DST, vkCmdClearColorImage to a known color (0x11223344), transition TRANSFER_DST → TRANSFER_SRC, vkCmdCopyImageToBuffer to a host-visible buffer, fence-wait, verify all 16 pixels match. No rasterizer, no vertex/fragment shaders, no render pass: just exercise image creation, layout transitions, clear, and image-to-buffer copy.
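The verification step of this proposal could look roughly like the following. It is a sketch only, with hypothetical function names. It assumes the copy lands tightly packed (4 bytes per pixel, no row padding) and that 0x11223344 is read as R=0x11, G=0x22, B=0x33, A=0x44 in memory order, which the proposal does not specify. The second helper shows the UNORM conversion needed to pass those bytes as a float clear color.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count pixels in a tightly packed RGBA8 buffer that differ from want. */
static size_t count_mismatched_pixels(const uint8_t *buf, size_t pixel_count,
                                      const uint8_t want[4])
{
    assert(buf != NULL && want != NULL);
    size_t bad = 0;
    for (size_t i = 0; i < pixel_count; i++) {
        const uint8_t *p = buf + 4 * i; /* 4 bytes per RGBA8 pixel */
        if (p[0] != want[0] || p[1] != want[1] ||
            p[2] != want[2] || p[3] != want[3])
            bad++;
    }
    return bad;
}

/* Convert an 8-bit UNORM channel to the float a clear color expects. */
static float unorm8_to_float(uint8_t v)
{
    return (float)v / 255.0f;
}
```

For the 4×4 image, calling `count_mismatched_pixels(buf, 16, want)` and requiring a result of 0 corresponds to "verify all 16 pixels match". Comparing bytes rather than a packed 32-bit word sidesteps host endianness.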
If that passes, iter3 adds: render pass + vertex/fragment pipeline + a single full-screen triangle that paints a constant color (still no vertex data — fullscreen triangle via gl_VertexIndex).
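For reference, a common way to build a fullscreen triangle from gl_VertexIndex alone is to derive clip-space positions from the index bits. The C sketch below reproduces that arithmetic on the CPU. It illustrates a widely used idiom and is not necessarily the shader iter3 will use; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { float x, y; } vec2_t;

/* Clip-space position for vertex 0, 1 or 2 of a single oversized triangle
 * that covers the whole [-1, 1] x [-1, 1] viewport. */
static vec2_t fullscreen_triangle_vertex(uint32_t vertex_index)
{
    assert(vertex_index < 3u);
    /* uv takes the values (0,0), (2,0), (0,2) for indices 0, 1, 2. */
    float u = (float)((vertex_index << 1) & 2u);
    float v = (float)(vertex_index & 2u);
    vec2_t pos = { u * 2.0f - 1.0f, v * 2.0f - 1.0f };
    return pos;
}
```

The three results are (-1, -1), (3, -1) and (-1, 3). The hypotenuse passes through (1, 1), so the viewport is fully covered with no vertex buffer bound.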
Pacing: iter cadence follows the libva-multiplanar 8-phase loop. The iter2 phase 0 substrate lock happens when the operator opens the next iter.