# Iteration 1 close — GREEN

Closed **2026-05-19** by mfritsche + claude-noether.

## Locked question

(From [phase0_findings.md](phase0_findings.md))

> Get a minimal Vulkan compute workload to execute end-to-end on PanVk-Bifrost on ohm (PineTab2, Mali-G52 r1 MC1) with `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1`: write a known value to a host-visible storage buffer from a single-invocation compute shader, fence-wait, read back, verify. No GPU faults in dmesg, no validation errors with `VK_LAYER_KHRONOS_validation` if installable, no submit timeout.

## Result: GREEN

PanVk-Bifrost on Mali-G52 r1 MC1 (RK3566, kernel 7.0.0-danctnix1-6, Mesa 26.0.6) executed the minimal compute probe **end-to-end on the first try**, 6/6 runs in a row, including 1 run with `VK_LAYER_KHRONOS_validation` active.

Every step in the probe trace passed:

- `vkCreateInstance` (Vulkan 1.0 core, no extensions)
- `vkEnumeratePhysicalDevices` → "Mali-G52 r1 MC1"
- `vkCreateDevice` (1 queue from family 0, flags=`GRAPHICS|COMPUTE|TRANSFER`)
- `vkCreateBuffer` + `vkAllocateMemory` (memoryType 1: DEVICE_LOCAL|HOST_VISIBLE|HOST_CACHED) + `vkBindBufferMemory`
- `vkMapMemory` (pre-fill 0xDEADBEEF sentinel)
- Descriptor set layout + pool + allocate + update (1 STORAGE_BUFFER binding)
- `vkCreateShaderModule` (560-byte SPV from `glslangValidator -V`)
- `vkCreatePipelineLayout` + `vkCreateComputePipelines`
- Command buffer record: bind pipeline + bind descriptor sets + `vkCmdDispatch(1,1,1)` + memory barrier (SHADER_WRITE → HOST_READ)
- `vkQueueSubmit` + `vkWaitForFences` (5s timeout, completes immediately)
- `vkInvalidateMappedMemoryRanges` + readback

**`buffer[0] = 0xcafebabe` (expected 0xcafebabe)** — no sentinel left behind, no zero, no garbage. The GPU executed the shader and wrote correctly.

Evidence: [`phase0_evidence/iter1_compute_probe_run.txt`](phase0_evidence/iter1_compute_probe_run.txt).

No GPU faults, no MMU faults, no kernel-side panfrost messages logged across 6 runs.
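
The readback check above distinguishes four outcomes: expected value, sentinel left behind, zero, and garbage. A minimal sketch of that classification in C follows. It is not copied from `iter1/probe_compute.c`; the enum, macro, and function names are hypothetical, and only the two constants (0xDEADBEEF pre-fill, 0xCAFEBABE expected) come from the trace. It assumes the pointer refers to the mapped buffer after the fence wait and `vkInvalidateMappedMemoryRanges`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_SENTINEL 0xDEADBEEFu /* host pre-fill value from the probe trace */
#define PROBE_EXPECTED 0xCAFEBABEu /* value the compute shader is expected to store */

/* Hypothetical result codes, introduced here for illustration only. */
enum probe_result {
    PROBE_OK,            /* buffer[0] holds the expected value */
    PROBE_SENTINEL_LEFT, /* buffer[0] still holds the host pre-fill */
    PROBE_ZERO,          /* buffer[0] reads back as zero */
    PROBE_GARBAGE        /* buffer[0] holds any other value */
};

/* Classify the first 32-bit word of the mapped storage buffer. */
enum probe_result classify_readback(const uint32_t *mapped)
{
    assert(mapped != NULL);
    uint32_t v = mapped[0];

    if (v == PROBE_EXPECTED)
        return PROBE_OK;
    if (v == PROBE_SENTINEL)
        return PROBE_SENTINEL_LEFT;
    if (v == 0u)
        return PROBE_ZERO;
    return PROBE_GARBAGE;
}
```

Keeping the sentinel case separate from zero and garbage is useful for triage: a surviving sentinel suggests the store never became visible to the host, while the other two suggest something was written but not the intended value.
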
## What the close tells us

Three of the four hypotheses in [phase0_findings.md](phase0_findings.md) are **not blockers** for the minimal compute path:

| Hypothesis | Status at iter1 |
|---|---|
| H1: vkCreateDevice / queue / sync init gap | ✗ no — device + queue + fence + barrier all work |
| H2: Command buffer recording / cmd_dispatch | ✗ no — single dispatch records + submits cleanly |
| H3: Shader compilation / NIR lowering | ✗ no — trivial compute shader compiles and runs correctly |
| H4: WSI / swapchain (deferred out-of-scope) | unchanged — iter1 didn't touch it |

This **doesn't** mean PanVk-Bifrost is universally functional — it means the *minimum-viable compute path* works. Failures still expected when we add complexity (multiple workgroups, larger buffers, complex shaders, descriptor indexing, real graphics, WSI).

The Mesa upstream "not well-tested on v7" gate (panvk_physical_device.c:413–425) reads as **conservative** rather than reflecting hard breakage on this minimal path.

## iter1 in-tree artifacts

- [`iter1/probe_compute.c`](iter1/probe_compute.c) — pure Vulkan 1.0 core probe (~270 LoC)
- [`iter1/probe_compute.comp`](iter1/probe_compute.comp) — 4-line GLSL shader
- [`iter1/Makefile`](iter1/Makefile) — `make` builds, `make run` / `make run-validation` runs

## Deferred to iter2+ (not in iter1 scope)

- **Multi-workgroup / multi-invocation compute.** iter1 was 1 workgroup × 1 invocation.
- **Real graphics workload.** iter1 was compute-only. iter2 lock will pivot to graphics.
- **WSI / swapchain.** iter1 used host-visible readback, no display.
- **Larger buffers.** iter1 was 16 bytes nominal / 64 bytes allocated (memReq alignment).
- **Complex shaders.** iter1's shader was a single store; no math, no math+atomic, no nontrivial control flow.
- **TuxRacer / Zink-on-PanVk.** Still the end-goal, still many iters away.

## Next iter — iter2 lock proposal

Smallest viable graphics workload that exercises the **non-compute** pipeline parts on PanVk-Bifrost.
Proposed pattern:

> **Allocate a 4×4 `VK_FORMAT_R8G8B8A8_UNORM` image (COLOR_ATTACHMENT | TRANSFER_DST | TRANSFER_SRC), transition UNDEFINED → TRANSFER_DST, `vkCmdClearColorImage` to a known color (0x11223344), transition TRANSFER_DST → TRANSFER_SRC, `vkCmdCopyImageToBuffer` to a host-visible buffer, fence-wait, verify all 16 pixels match. No rasterizer, no vertex/fragment shaders, no render pass — just exercise image creation + layout transitions + clear + image-to-buffer copy.**

If that passes, iter3 adds: render pass + vertex/fragment pipeline + a single full-screen triangle that paints a constant color (still no vertex data; the fullscreen triangle is generated from `gl_VertexIndex`).

Pacing: iter cadence per libva-multiplanar 8-phase loop. iter2 phase 0 substrate lock when the operator opens the next iter.
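
The "verify all 16 pixels match" step of the iter2 proposal could be sketched in C as below. This is illustration only, not the iter2 probe: all names are hypothetical, and it rests on two assumptions that the proposal does not pin down. First, 0x11223344 is read as R=0x11, G=0x22, B=0x33, A=0x44, stored in that byte order. Second, the readback buffer is tightly packed (no row padding), so the 4×4 image occupies exactly 64 bytes. The float helpers show how each 8-bit channel would map to the `float32` member of `VkClearColorValue` for a UNORM format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { IMG_W = 4, IMG_H = 4, BYTES_PER_PIXEL = 4 };

/* ASSUMPTION: 0x11223344 means R=0x11, G=0x22, B=0x33, A=0x44 in memory order. */
static const uint8_t k_expected_rgba[BYTES_PER_PIXEL] = {0x11, 0x22, 0x33, 0x44};

/* Map an 8-bit UNORM channel to the [0, 1] float used for a clear value. */
float unorm8_to_float(uint8_t c)
{
    return (float)c / 255.0f;
}

/* Inverse mapping: clamp to [0, 1], scale, round to nearest. */
uint8_t float_to_unorm8(float f)
{
    if (f < 0.0f) f = 0.0f;
    if (f > 1.0f) f = 1.0f;
    return (uint8_t)(f * 255.0f + 0.5f);
}

/* Count pixels in a tightly packed 4x4 RGBA8 readback that differ from
 * the expected clear color. A return value of 0 means the check passes. */
int count_bad_pixels(const uint8_t *buf, size_t size)
{
    assert(buf != NULL);
    assert(size >= (size_t)IMG_W * IMG_H * BYTES_PER_PIXEL);

    int bad = 0;
    for (int p = 0; p < IMG_W * IMG_H; ++p) {
        const uint8_t *px = buf + (size_t)p * BYTES_PER_PIXEL;
        for (int ch = 0; ch < BYTES_PER_PIXEL; ++ch) {
            if (px[ch] != k_expected_rgba[ch]) {
                ++bad;
                break; /* count each pixel at most once */
            }
        }
    }
    return bad;
}
```

Returning a mismatch count rather than a boolean keeps partial failures visible in the run log, for example a clear that only reached some pixels.
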