# Phase 0: substrate for iter2

Opened **2026-05-19** by mfritsche + claude-noether, immediately after iter1 closed GREEN ([phase8_iteration1_close.md](phase8_iteration1_close.md)).

## Locked research question: iter2

> **Get a minimal Vulkan image-side workload to execute end-to-end on PanVk-Bifrost (ohm / Mali-G52 r1 MC1 / `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1`): create a 4×4 `VK_FORMAT_R8G8B8A8_UNORM` image with `TRANSFER_DST|TRANSFER_SRC` usage on device-local memory, transition UNDEFINED → TRANSFER_DST_OPTIMAL, `vkCmdClearColorImage` to color 0x11223344 (R=0x11 G=0x22 B=0x33 A=0x44), transition TRANSFER_DST_OPTIMAL → TRANSFER_SRC_OPTIMAL, `vkCmdCopyImageToBuffer` to host-visible staging, fence-wait, verify all 16 pixels read back as 0x44332211 (little-endian uint32). No GPU faults in dmesg, no validation errors.**
>
> If GREEN: lock iter3 against a first-triangle graphics pipeline (vertex + fragment shader, fullscreen triangle via `gl_VertexIndex`, render pass or dynamic rendering, single draw, readback).
>
> If RED: characterize the first failure point and fix or work around in iter2.

## Why this shape

iter1 collapsed three of four phase0 hypotheses on the compute side (device init, cmd-buffer recording, shader compilation).
iter2 bridges from compute to graphics by adding **only image-handling machinery**, keeping the same submit/sync skeleton:

- `VkImage` create + bind (new)
- Image layout transitions via `VkImageMemoryBarrier` (new)
- `vkCmdClearColorImage` (new, but it's a transfer op, not a real graphics pipeline)
- `vkCmdCopyImageToBuffer` (new, but again a transfer op)
- Optimal tiling (new; Bifrost arranges tiles differently from Valhall in some cases)

Notably **not** in iter2:

- Render pass / dynamic rendering
- Vertex + fragment shaders
- Graphics pipeline state (rasterizer, viewport, blend, depth)
- Vertex buffers / index buffers
- Framebuffer

So if iter2 fails, the failure points to **image/layout/transfer machinery**, not "graphics pipeline" in general. That's a usefully narrow target.

## Hypothesis space (where iter2 may fail)

1. **Image creation + memory binding.** Bifrost has specific tiling layouts (e.g. block-based tiling). `vkGetImageMemoryRequirements` may report a size and alignment Mesa's PanVk-Bifrost path can't satisfy, or the allocator may pick a memory type that's not actually usable for an optimal-tiled image.
2. **Layout transitions via image barriers.** The Bifrost cache / tiler invalidation hooks may not be wired into the JM submit path consistently. Specifically: UNDEFINED → TRANSFER_DST and TRANSFER_DST → TRANSFER_SRC transitions need to flush L2 / invalidate tile caches, and that's per-arch code that may have rotted in the v6/v7 paths.
3. **`vkCmdClearColorImage` lowering.** PanVk may lower `vkCmdClearColorImage` to either a real hardware clear (tile-level) or a compute shader (meta clear). Bifrost-specific paths exist (the lone `bifrost/panvk_vX_meta_desc_copy.c` is descriptor-copy meta; a similar clear-meta path may or may not work).
4. **`vkCmdCopyImageToBuffer` + tiling decode.** Bifrost optimal tiling is non-linear: a copy from optimal-tiled image to a linear buffer needs the tiler to detile correctly.
If detile is wrong, the readback will show pixel-shuffled output (recognizable from the pattern of 0x11/0x22/0x33/0x44 bytes).

The clear color 0x11223344 was chosen specifically: each pixel byte is distinct, so a pixel-shuffle bug will show up as wrong-byte-order rather than all-zeros (which would mean clear didn't fire at all).

## Phase 0 deliverables

This document. iter2's substrate is lighter than iter1's because iter1 already proved out the broader environment.

## In-scope (LOCKED 2026-05-19 for iter2)

- Hardware: ohm only.
- Format: R8G8B8A8_UNORM, optimal tiling, 4×4, 1 mip, 1 layer.
- Operations: image create + bind + 2 layout transitions + clear + image-to-buffer copy.
- Verification: all 16 pixels = 0x44332211.

## Out-of-scope (LOCKED 2026-05-19 for iter2)

- Render pass / dynamic rendering (iter3).
- Vertex / fragment shaders (iter3).
- Graphics pipeline state (iter3).
- WSI / swapchain (iter4+).
- Larger / multi-mip / multi-layer / multi-sample images.
- Other formats (R5G6B5, R32G32B32A32_SFLOAT, depth/stencil, ASTC). Add later if it makes sense to exercise per-format codepaths.
- Sub-region clears, scissored copies.
- Compute path (proven in iter1; not revisited).
- Upstreaming.

## Reference

- [phase0_findings.md](phase0_findings.md): campaign substrate.
- [phase8_iteration1_close.md](phase8_iteration1_close.md): iter1 close.
- [iter1/](iter1/): compute probe (reusable skeleton for iter2).
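
## Appendix: readback check sketch

The verification step in the locked research question and the failure signatures under hypothesis 4 can be sketched in plain C with no Vulkan calls. This is a minimal sketch, not code from the iter1 probe: every helper name (`clear_color_floats`, `classify_readback`, and so on) is hypothetical, and it assumes the staging buffer holds the 16 pixels tightly packed (64 bytes, no row padding). It also shows that the clear color reaches `vkCmdClearColorImage` as four floats in [0, 1] for a UNORM format, not as a packed integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical, for illustration only. */
#define PIXEL_COUNT 16u /* 4x4 image, assumed tightly packed in staging */

static const uint8_t CLEAR_RGBA[4] = { 0x11, 0x22, 0x33, 0x44 };

enum readback_verdict {
    READBACK_OK,           /* every pixel is 11 22 33 44 in memory */
    READBACK_ALL_ZERO,     /* nothing was written: clear or copy never fired */
    READBACK_BYTE_SHUFFLE, /* right bytes, wrong order inside each pixel */
    READBACK_OTHER         /* anything else */
};

/* Values for VkClearColorValue.float32 when the format is UNORM. */
static void clear_color_floats(float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (float)CLEAR_RGBA[i] / 255.0f;
}

/* Float to 8-bit UNORM, clamped and rounded to nearest. */
static uint8_t unorm8_from_float(float f)
{
    if (f < 0.0f) f = 0.0f;
    if (f > 1.0f) f = 1.0f;
    return (uint8_t)(f * 255.0f + 0.5f);
}

/* R is stored at the lowest address, so a little-endian uint32 load
 * of one pixel gives 0xAABBGGRR. */
static uint32_t expected_pixel_le(void)
{
    return (uint32_t)CLEAR_RGBA[0]
         | (uint32_t)CLEAR_RGBA[1] << 8
         | (uint32_t)CLEAR_RGBA[2] << 16
         | (uint32_t)CLEAR_RGBA[3] << 24;
}

/* True if the four bytes are some ordering of 0x11, 0x22, 0x33, 0x44. */
static int is_permutation_of_clear(const uint8_t px[4])
{
    unsigned seen = 0;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            if (px[i] == CLEAR_RGBA[j])
                seen |= 1u << j;
    return seen == 0xFu;
}

/* Classify the mapped staging buffer byte by byte (host-endianness neutral). */
static enum readback_verdict classify_readback(const uint8_t *staging)
{
    int all_ok = 1, all_zero = 1, all_perm = 1;

    assert(staging != NULL);
    for (size_t p = 0; p < PIXEL_COUNT; p++) {
        const uint8_t *px = staging + 4 * p;
        int match = 1, zero = 1;
        for (int i = 0; i < 4; i++) {
            if (px[i] != CLEAR_RGBA[i]) match = 0;
            if (px[i] != 0) zero = 0;
        }
        all_ok &= match;
        all_zero &= zero;
        all_perm &= is_permutation_of_clear(px);
    }
    if (all_ok)   return READBACK_OK;
    if (all_zero) return READBACK_ALL_ZERO;
    if (all_perm) return READBACK_BYTE_SHUFFLE;
    return READBACK_OTHER;
}
```

In a probe, `classify_readback` would be called on the mapped host-visible staging memory after the fence wait. One caveat on what this can detect: because all 16 pixels are cleared to the same color, the check exposes byte-level misordering within pixels and missing writes, but a detile bug that only moves whole pixels around would still read back as `READBACK_OK`.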