# Iteration 1 close — GREEN

Closed **2026-05-19** by mfritsche + claude-noether.

## Locked question

(From [phase0_findings.md](phase0_findings.md))

> Get a minimal Vulkan compute workload to execute end-to-end on PanVk-Bifrost on ohm (PineTab2, Mali-G52 r1 MC1) with `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1`: write a known value to a host-visible storage buffer from a single-invocation compute shader, fence-wait, read back, verify. No GPU faults in dmesg, no validation errors with `VK_LAYER_KHRONOS_validation` if installable, no submit timeout.

## Result: GREEN

PanVk-Bifrost on Mali-G52 r1 MC1 (RK3566, kernel 7.0.0-danctnix1-6, Mesa 26.0.6) executed the minimal compute probe **end-to-end on the first try**, 6/6 runs in a row, including 1 run with `VK_LAYER_KHRONOS_validation` active. Every step in the probe trace passed:

- `vkCreateInstance` (Vulkan 1.0 core, no extensions)
- `vkEnumeratePhysicalDevices` → "Mali-G52 r1 MC1"
- `vkCreateDevice` (1 queue from family 0, flags=`GRAPHICS|COMPUTE|TRANSFER`)
- `vkCreateBuffer` + `vkAllocateMemory` (memoryType 1: DEVICE_LOCAL|HOST_VISIBLE|HOST_CACHED) + `vkBindBufferMemory`
- `vkMapMemory` (pre-fill 0xDEADBEEF sentinel)
- Descriptor set layout + pool + allocate + update (1 STORAGE_BUFFER binding)
- `vkCreateShaderModule` (560-byte SPV from `glslangValidator -V`)
- `vkCreatePipelineLayout` + `vkCreateComputePipelines`
- Command buffer record: bind pipeline + bind descriptor sets + `vkCmdDispatch(1,1,1)` + memory barrier (SHADER_WRITE → HOST_READ)
- `vkQueueSubmit` + `vkWaitForFences` (5s timeout, completes immediately)
- `vkInvalidateMappedMemoryRanges` + readback
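
The memory type used above is `HOST_CACHED` without `HOST_COHERENT`, which is why the trace ends with `vkInvalidateMappedMemoryRanges`. A minimal sketch of that selection logic follows. It is an illustration, not the code in `probe_compute.c`: the helper names are hypothetical, and the flag constants are local mirrors of the `VkMemoryPropertyFlagBits` values so the block compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of VkMemoryPropertyFlagBits (values match the Vulkan spec). */
enum {
    MEM_DEVICE_LOCAL  = 0x1, /* VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT  */
    MEM_HOST_VISIBLE  = 0x2, /* VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT  */
    MEM_HOST_COHERENT = 0x4, /* VK_MEMORY_PROPERTY_HOST_COHERENT_BIT */
    MEM_HOST_CACHED   = 0x8  /* VK_MEMORY_PROPERTY_HOST_CACHED_BIT   */
};

/* Hypothetical helper: return the first memory type index that is allowed by
 * type_bits (as in VkMemoryRequirements::memoryTypeBits) and whose property
 * flags include every bit in `required`. Returns -1 if none qualifies. */
int pick_memory_type(const uint32_t *type_flags, uint32_t type_count,
                     uint32_t type_bits, uint32_t required)
{
    assert(type_count <= 32); /* Vulkan caps the memory type count at 32 */
    for (uint32_t i = 0; i < type_count; i++) {
        if ((type_bits & (UINT32_C(1) << i)) &&
            (type_flags[i] & required) == required)
            return (int)i;
    }
    return -1;
}

/* Hypothetical helper: a mapping that is not HOST_COHERENT needs an explicit
 * invalidate before the host reads data written by the GPU. */
int needs_invalidate(uint32_t flags)
{
    return (flags & MEM_HOST_COHERENT) == 0;
}
```

With a two-entry table whose index 1 carries `DEVICE_LOCAL|HOST_VISIBLE|HOST_CACHED`, asking for `MEM_HOST_VISIBLE` returns 1 and `needs_invalidate` reports that the invalidate is required. The flags given to index 0 in any such table are an assumption; the trace only records type 1.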

**`buffer[0] = 0xcafebabe` (expected 0xcafebabe)** — no sentinel left behind, no zero, no garbage. The GPU executed the shader and wrote correctly.
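
The sentinel pre-fill is what lets the readback separate "shader never wrote" from "shader wrote the wrong thing". A minimal sketch of that classification, assuming the probe compares only `buffer[0]`; the enum and function names are hypothetical and not taken from `probe_compute.c`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_SENTINEL UINT32_C(0xDEADBEEF) /* host pre-fill value */
#define PROBE_EXPECTED UINT32_C(0xCAFEBABE) /* value the shader stores */

typedef enum {
    READBACK_OK,       /* GPU wrote the expected value */
    READBACK_SENTINEL, /* pre-fill survived: the store never landed */
    READBACK_ZERO,     /* buffer was zeroed instead of written */
    READBACK_GARBAGE   /* something wrote, but not the expected value */
} readback_status;

/* Classify the first word of the mapped buffer after the fence wait and
 * the invalidate of the mapped range. */
readback_status classify_readback(const uint32_t *mapped)
{
    assert(mapped != NULL);
    uint32_t v = mapped[0];
    if (v == PROBE_EXPECTED) return READBACK_OK;
    if (v == PROBE_SENTINEL) return READBACK_SENTINEL;
    if (v == 0)              return READBACK_ZERO;
    return READBACK_GARBAGE;
}
```

Each non-OK status points at a different failure class (dispatch not executed or not visible, cleared allocation, miscompiled store), which is why the sentinel is worth the extra `vkMapMemory` write.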

Evidence: [`phase0_evidence/iter1_compute_probe_run.txt`](phase0_evidence/iter1_compute_probe_run.txt).

No GPU faults, no MMU faults, no kernel-side panfrost messages logged across 6 runs.

## What the close tells us

Three of the four hypotheses in [phase0_findings.md](phase0_findings.md) are **not blockers** for the minimal compute path:

| Hypothesis | Status at iter1 |
|---|---|
| H1: vkCreateDevice / queue / sync init gap | ✗ no — device + queue + fence + barrier all work |
| H2: Command buffer recording / cmd_dispatch | ✗ no — single dispatch records + submits cleanly |
| H3: Shader compilation / NIR lowering | ✗ no — trivial compute shader compiles and runs correctly |
| H4: WSI / swapchain (deferred out-of-scope) | unchanged — iter1 didn't touch it |

This **doesn't** mean PanVk-Bifrost is universally functional; it means the *minimum-viable compute path* works. Failures are still expected when we add complexity (multiple workgroups, larger buffers, complex shaders, descriptor indexing, real graphics, WSI).

The Mesa upstream "not well-tested on v7" gate (panvk_physical_device.c:413–425) reads as **conservative** rather than reflecting hard breakage on this minimal path.

## iter1 in-tree artifacts

- [`iter1/probe_compute.c`](iter1/probe_compute.c) — pure Vulkan 1.0 core probe (~270 LoC)
- [`iter1/probe_compute.comp`](iter1/probe_compute.comp) — 4-line GLSL shader
- [`iter1/Makefile`](iter1/Makefile) — `make` builds, `make run` / `make run-validation` runs

## Deferred to iter2+ (not in iter1 scope)

- **Multi-workgroup / multi-invocation compute.** iter1 was 1 workgroup × 1 invocation.
- **Real graphics workload.** iter1 was compute-only. iter2 lock will pivot to graphics.
- **WSI / swapchain.** iter1 used host-visible readback, no display.
- **Larger buffers.** iter1 was 16 bytes nominal / 64 bytes allocated (memReq alignment).
- **Complex shaders.** iter1's shader was a single store; no arithmetic, no atomics, no nontrivial control flow.
- **TuxRacer / Zink-on-PanVk.** Still the end-goal, still many iters away.
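
The 16-byte nominal / 64-byte allocated figure above is consistent with the usual round-up-to-alignment arithmetic. A small sketch, assuming a 64-byte alignment (the source only says "memReq alignment", so the value 64 is an inference, and `align_up` is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of alignment.
 * alignment must be a nonzero power of two, as Vulkan alignments are. */
uint64_t align_up(uint64_t size, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

Under that assumption `align_up(16, 64)` yields 64. In the real probe the allocation size should still come from `VkMemoryRequirements::size` as reported by the driver rather than from a hand-computed value.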

## Next iter — iter2 lock proposal

Smallest viable graphics workload that exercises the **non-compute** pipeline parts on PanVk-Bifrost. Proposed pattern:

> **Allocate a 4×4 `VK_FORMAT_R8G8B8A8_UNORM` image (usage COLOR_ATTACHMENT | TRANSFER_DST | TRANSFER_SRC; `vkCmdClearColorImage` requires TRANSFER_DST), transition UNDEFINED → TRANSFER_DST_OPTIMAL, `vkCmdClearColorImage` to a known color (0x11223344), transition TRANSFER_DST_OPTIMAL → TRANSFER_SRC_OPTIMAL, `vkCmdCopyImageToBuffer` to a host-visible buffer, fence-wait, verify all 16 pixels match. No rasterizer, no vertex/fragment shaders, no render pass; this exercises only image creation, layout transitions, clear, and image-to-buffer copy.**
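
A sketch of the host-side arithmetic this proposal implies: turning the packed color into the normalized floats a `VkClearColorValue` carries, and checking the copied-back pixels. It assumes 0x11223344 is read as R=0x11, G=0x22, B=0x33, A=0x44 (the proposal does not fix the channel order) and that the readback buffer is tightly packed; both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a packed color into normalized floats in R, G, B, A order, the form
 * used for the float32 member of VkClearColorValue.
 * ASSUMPTION: the most significant byte of `packed` is red. */
void unpack_clear_color(uint32_t packed, float out_rgba[4])
{
    assert(out_rgba != NULL);
    out_rgba[0] = (float)((packed >> 24) & 0xFF) / 255.0f;
    out_rgba[1] = (float)((packed >> 16) & 0xFF) / 255.0f;
    out_rgba[2] = (float)((packed >> 8) & 0xFF) / 255.0f;
    out_rgba[3] = (float)(packed & 0xFF) / 255.0f;
}

/* Compare tightly packed R8G8B8A8 pixels against the packed color.
 * Returns the index of the first mismatching pixel, or -1 if all match. */
int first_mismatch(const uint8_t *pixels, size_t pixel_count, uint32_t packed)
{
    assert(pixels != NULL);
    const uint8_t want[4] = {
        (uint8_t)(packed >> 24), (uint8_t)(packed >> 16),
        (uint8_t)(packed >> 8),  (uint8_t)packed
    };
    for (size_t i = 0; i < pixel_count; i++) {
        for (size_t c = 0; c < 4; c++) {
            if (pixels[i * 4 + c] != want[c])
                return (int)i;
        }
    }
    return -1;
}
```

Comparing byte by byte rather than as a `uint32_t` keeps the check independent of host endianness: R8G8B8A8 stores R at the lowest address, so a little-endian 32-bit load of a correct pixel would read 0x44332211 under the assumed channel order.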

If that passes, iter3 adds: render pass + vertex/fragment pipeline + a single full-screen triangle that paints a constant color (still no vertex data — fullscreen triangle via `gl_VertexIndex`).
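
For reference, one common way to derive a full-screen triangle from the vertex index alone, mirrored here in C so the arithmetic can be checked on the host. The iter3 shader does not exist yet, so this is only a sketch of a widely used formulation, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the clip-space XY a vertex shader could emit for gl_VertexIndex
 * 0, 1, 2 with no vertex buffer bound. The three points form one triangle
 * that covers the whole [-1, 1] x [-1, 1] viewport. */
void fullscreen_triangle_vertex(uint32_t vertex_index, float out_xy[2])
{
    assert(vertex_index < 3);
    out_xy[0] = (float)((vertex_index << 1) & 2) * 2.0f - 1.0f;
    out_xy[1] = (float)(vertex_index & 2) * 2.0f - 1.0f;
}
```

Indices 0, 1, 2 map to (-1, -1), (3, -1), (-1, 3). The two vertices outside clip space are clipped away, leaving exactly the viewport covered.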

Pacing: iter cadence follows the libva-multiplanar 8-phase loop. The iter2 phase 0 substrate lock happens when the operator opens the next iter.