marfrit a4e7d8ab90 initial seed: retrofit campaign lineage from local working trees
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:
- mesa-panvk-bifrost/   — r1..r4 era phase docs (iter1..iter18)
                          (libmali stub blobs at iter18/blob/ excluded
                          — 109MB of RE artifacts replaced with a README
                          pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/             — frozen .tgz source snapshots at each milestone
                          (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00


Iteration 1 close — GREEN

Closed 2026-05-19 by mfritsche + claude-noether.

Locked question

(From phase0_findings.md)

Get a minimal Vulkan compute workload to execute end-to-end on PanVk-Bifrost on ohm (PineTab2, Mali-G52 r1 MC1) with PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1: write a known value to a host-visible storage buffer from a single-invocation compute shader, fence-wait, read back, verify. No GPU faults in dmesg, no validation errors with VK_LAYER_KHRONOS_validation if installable, no submit timeout.

Result: GREEN

PanVk-Bifrost on Mali-G52 r1 MC1 (RK3566, kernel 7.0.0-danctnix1-6, Mesa 26.0.6) executed the minimal compute probe end-to-end on the first try, 6/6 runs in a row, including 1 run with VK_LAYER_KHRONOS_validation active. Every step in the probe trace passed:

  • vkCreateInstance (Vulkan 1.0 core, no extensions)
  • vkEnumeratePhysicalDevices → "Mali-G52 r1 MC1"
  • vkCreateDevice (1 queue from family 0, flags=GRAPHICS|COMPUTE|TRANSFER)
  • vkCreateBuffer + vkAllocateMemory (memoryType 1: DEVICE_LOCAL|HOST_VISIBLE|HOST_CACHED) + vkBindBufferMemory
  • vkMapMemory (pre-fill 0xDEADBEEF sentinel)
  • Descriptor set layout + pool + allocate + update (1 STORAGE_BUFFER binding)
  • vkCreateShaderModule (560-byte SPV from glslangValidator -V)
  • vkCreatePipelineLayout + vkCreateComputePipelines
  • Command buffer record: bind pipeline + bind descriptor sets + vkCmdDispatch(1,1,1) + memory barrier (SHADER_WRITE → HOST_READ)
  • vkQueueSubmit + vkWaitForFences (5s timeout, completes immediately)
  • vkInvalidateMappedMemoryRanges + readback
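The last step of the trace follows from the memory type chosen earlier: memoryType 1 is listed as DEVICE_LOCAL|HOST_VISIBLE|HOST_CACHED, with no HOST_COHERENT bit, so the host has to invalidate the mapped range before reading what the GPU wrote. A minimal sketch of that check follows. The helper name is hypothetical, and the flag constants are local copies of the `VkMemoryPropertyFlagBits` values so the snippet compiles without the Vulkan headers; this is not the probe's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of VkMemoryPropertyFlagBits values, redefined here only so
 * the sketch builds without vulkan.h. */
enum {
    MEM_DEVICE_LOCAL  = 0x1, /* VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT  */
    MEM_HOST_VISIBLE  = 0x2, /* VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT  */
    MEM_HOST_COHERENT = 0x4, /* VK_MEMORY_PROPERTY_HOST_COHERENT_BIT */
    MEM_HOST_CACHED   = 0x8  /* VK_MEMORY_PROPERTY_HOST_CACHED_BIT   */
};

/* Hypothetical helper: returns true when a mapped range of this memory type
 * must be invalidated (vkInvalidateMappedMemoryRanges) before the host reads
 * data written by the device. Only host-visible, non-coherent memory
 * qualifies. */
static bool needs_invalidate_before_read(uint32_t property_flags)
{
    return (property_flags & MEM_HOST_VISIBLE) &&
           !(property_flags & MEM_HOST_COHERENT);
}
```

For the flags reported for memoryType 1 the helper returns true, which matches the explicit invalidate in the trace. A HOST_COHERENT type would not need it.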

buffer[0] = 0xcafebabe (expected 0xcafebabe) — no sentinel left behind, no zero, no garbage. The GPU executed the shader and wrote correctly.
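The readback check distinguishes four outcomes: the expected value, the untouched sentinel (shader never wrote), zero, and anything else. A small sketch of that classification is below. The enum and function names are illustrative assumptions; only the two constants come from the probe description above.

```c
#include <stdint.h>

#define PROBE_SENTINEL 0xDEADBEEFu /* pre-filled by the host before submit */
#define PROBE_EXPECTED 0xCAFEBABEu /* value the compute shader should store */

/* Illustrative outcome categories for buffer[0] after the fence wait. */
typedef enum {
    READBACK_OK,       /* shader ran and stored the expected value       */
    READBACK_SENTINEL, /* buffer untouched: dispatch did not write       */
    READBACK_ZERO,     /* something cleared the buffer, but not the shader */
    READBACK_GARBAGE   /* any other value                                */
} readback_status;

static readback_status classify_readback(uint32_t value)
{
    if (value == PROBE_EXPECTED) return READBACK_OK;
    if (value == PROBE_SENTINEL) return READBACK_SENTINEL;
    if (value == 0u)             return READBACK_ZERO;
    return READBACK_GARBAGE;
}
```

Keeping the sentinel distinct from zero is what lets a failed run separate "the dispatch never executed" from "memory was reset or mis-mapped".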

Evidence: phase0_evidence/iter1_compute_probe_run.txt.

No GPU faults, no MMU faults, no kernel-side panfrost messages logged across 6 runs.

What the close tells us

Three of the four hypotheses in phase0_findings.md are not blockers for the minimal compute path:

| Hypothesis | Status at iter1 |
|---|---|
| H1: vkCreateDevice / queue / sync init gap | ✗ no: device + queue + fence + barrier all work |
| H2: Command buffer recording / cmd_dispatch | ✗ no: single dispatch records + submits cleanly |
| H3: Shader compilation / NIR lowering | ✗ no: trivial compute shader compiles and runs correctly |
| H4: WSI / swapchain (deferred out-of-scope) | unchanged: iter1 didn't touch it |

This doesn't mean PanVk-Bifrost is universally functional; it means the minimum-viable compute path works. Failures are still expected once we add complexity (multiple workgroups, larger buffers, complex shaders, descriptor indexing, real graphics, WSI).

The Mesa upstream "not well-tested on v7" gate (panvk_physical_device.c:413-425) reads as conservative rather than reflecting hard breakage on this minimal path.

iter1 in-tree artifacts

Deferred to iter2+ (not in iter1 scope)

  • Multi-workgroup / multi-invocation compute. iter1 was 1 workgroup × 1 invocation.
  • Real graphics workload. iter1 was compute-only. iter2 lock will pivot to graphics.
  • WSI / swapchain. iter1 used host-visible readback, no display.
  • Larger buffers. iter1 was 16 bytes nominal / 64 bytes allocated (memReq alignment).
  • Complex shaders. iter1's shader was a single store; no math, no math+atomic, no nontrivial control flow.
  • TuxRacer / Zink-on-PanVk. Still the end-goal, still many iters away.
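On the "Larger buffers" item: the gap between 16 bytes nominal and 64 bytes allocated is consistent with rounding the requested size up to an alignment boundary. The sketch below shows that arithmetic, assuming a 64-byte alignment. That figure is inferred from the two sizes above and is not a recorded `VkMemoryRequirements` value. The function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of alignment.
 * alignment must be a nonzero power of two, as Vulkan guarantees for
 * VkMemoryRequirements::alignment. */
static uint64_t align_up(uint64_t size, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

Under the assumed alignment, `align_up(16, 64)` gives 64, matching the iter1 numbers. Larger-buffer iterations would want sizes that cross several alignment multiples rather than fitting inside one.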

Next iter — iter2 lock proposal

Smallest viable graphics workload that exercises the non-compute pipeline parts on PanVk-Bifrost. Proposed pattern:

Allocate a 4×4 VK_FORMAT_R8G8B8A8_UNORM image (COLOR_ATTACHMENT | TRANSFER_DST | TRANSFER_SRC; vkCmdClearColorImage requires the TRANSFER_DST usage bit), transition UNDEFINED → TRANSFER_DST, vkCmdClearColorImage to a known color (0x11223344), transition TRANSFER_DST → TRANSFER_SRC, vkCmdCopyImageToBuffer to a host-visible buffer, fence-wait, verify all 16 pixels match. No rasterizer, no vertex/fragment shaders, no render pass: this only exercises image creation + layout transitions + clear + image-to-buffer copy.
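The host-side verification for this proposal could look like the sketch below. It assumes the color 0x11223344 maps to R=0x11, G=0x22, B=0x33, A=0x44 (the proposal does not state the byte order) and that the copy is tightly packed, so each pixel occupies four consecutive bytes in R, G, B, A order. All names are illustrative, not part of any existing probe.

```c
#include <stddef.h>
#include <stdint.h>

#define IMG_W 4
#define IMG_H 4

/* Assumed channel mapping for the "known color" 0x11223344: R, G, B, A. */
static const uint8_t k_expected_rgba[4] = { 0x11, 0x22, 0x33, 0x44 };

/* Convert one UNORM8 channel to the float form that a
 * VkClearColorValue.float32 entry would carry. */
static float unorm8_to_float(uint8_t v)
{
    return (float)v / 255.0f;
}

/* Scan a tightly packed RGBA8 readback buffer.
 * Returns the index of the first pixel that differs from the expected
 * color, or -1 when every pixel matches. */
static int first_mismatch(const uint8_t *pixels, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; i++) {
        for (int c = 0; c < 4; c++) {
            if (pixels[i * 4 + c] != k_expected_rgba[c])
                return (int)i;
        }
    }
    return -1;
}
```

Reporting the first mismatching pixel index, rather than a bare pass/fail, would help tell a fully failed clear from a partial one such as a tiling or stride problem.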

If that passes, iter3 adds: render pass + vertex/fragment pipeline + a single full-screen triangle that paints a constant color (still no vertex data — fullscreen triangle via gl_VertexIndex).
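One common way to get a full-screen triangle from gl_VertexIndex alone is to derive clip-space positions with two bit operations. The C sketch below mirrors that arithmetic so the three positions can be checked on the host. It illustrates the general trick and is not a statement of what the iter3 shader will contain; the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct { float x, y; } vec2;

/* Mirrors the GLSL idiom
 *   vec2((gl_VertexIndex << 1) & 2, gl_VertexIndex & 2) * 2.0 - 1.0
 * for vertex indices 0..2. The resulting triangle is larger than the
 * [-1, 1] x [-1, 1] clip square and covers it completely. */
static vec2 fullscreen_triangle_vertex(uint32_t vertex_index)
{
    vec2 p;
    p.x = (float)((vertex_index << 1) & 2u) * 2.0f - 1.0f;
    p.y = (float)(vertex_index & 2u) * 2.0f - 1.0f;
    return p;
}
```

Indices 0, 1, 2 yield (-1, -1), (3, -1), and (-1, 3). Whether that winding counts as front-facing depends on the pipeline's front-face and cull settings, so a first attempt may want culling disabled.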

Pacing: iter cadence follows the libva-multiplanar 8-phase loop. The iter2 phase 0 substrate lock takes place when the operator opens the next iter.