Rosenblatt: project scaffold for RK3588 NPU on mainline

Codename: Frank Rosenblatt (Mark I Perceptron, 1958, the first hardware neural network). This project lights up the RK3588 NPU on mainline Linux so the OSS world finally owns the silicon side of inference on that chip.

Phase-1 scope: a small LLM running on a CPU + NPU mix on boltzmann (Rock 5 ITX+). Backend: llama.cpp with a new rknpu ggml backend that offloads INT8 GEMM (attention + FFN matmuls) to the NPU's tile-MAC array and leaves dequant / RoPE / softmax / sampling / embedding on A76 NEON. Target model: qwen2.5-1.5B-instruct Q4_K_M GGUF.

Scaffold layout:
- README.md (frame + 9+1-phase plan)
- TODO.md (rolling punch-list)
- docs/{npu-mainline-status,architecture}.md
- kernel/ for DT bindings + driver tweaks
- userspace/{npu-probe,llm-runtime}/
- fleet/boltzmann.yaml

Next: Phase-1 substrate audit. Fill the TBDs in docs/npu-mainline-status.md with the actual state of Tomeu Vizoso's rknpu / DRM-accel work on the boltzmann-running kernel.
2026-05-19 11:57:48 +00:00
commit 24adc74812
8 changed files with 578 additions and 0 deletions
@@ -0,0 +1,55 @@
+# rosenblatt fleet manifest — boltzmann (Rock 5 ITX+, RK3588)
+#
+# Phase-1 audit host. Always-on, 32 GB DDR4, NVMe rootfs. NPU silicon
+# present + accessible via Rockchip-BSP vendor module today; mainline
+# path TBD (see docs/npu-mainline-status.md).
+
+host: boltzmann
+arch: arm64
+soc: rockchip/rk3588
+board: rock-5-itx-plus
+distro: archlinuxarm  # ALARM aarch64; boltzmann is the umbrella RK3588 host
+role: primary-development  # not yet primary-target (laptop targets land later)
+
+hardware:
+  cpu: 4×Cortex-A76 (2.4 GHz) + 4×Cortex-A55 (1.8 GHz)
+  ram: 32 GB DDR4-2666
+  storage: NVMe (rootfs) + microSD (recovery)
+  npu:
+    cores: 3
+    tops_int8_per_core: 2  # ~2 TOPS INT8 per core, 6 TOPS aggregate (theoretical peak)
+    local_sram_per_core_mib: 2
+    power_domain: pd_npu
+
+# Phase-1 audit fills these (pending boltzmann inspection)
+kernel:
+  running_version: TBD  # uname -r snapshot at audit time
+  source: TBD          # mainline torvalds / mmind-rockchip / custom
+  npu_driver: TBD      # vendor rknpu (Rockchip BSP) / mainline rocket (drivers/accel) / none
+
+userspace:
+  rknn_vendor_runtime_installed: false  # commitment: stay mainline-clean
+  llama_cpp_installed: TBD              # via marfrit-packages or built-from-source
+
+baseline_measurement:
+  pending: true
+  target: |
+    llama.cpp pure-CPU tok/s on qwen2.5-1.5b-instruct-q4_k_m.gguf:
+    3 runs, report the median (wall-clock). Use llama-bench from llama.cpp/build/bin.
+  ground_truth_file: benchmarks/2026-XX-XX_boltzmann_qwen1.5b_cpu_baseline.json
+
+bringup_sequence:
+  1: substrate audit (docs/npu-mainline-status.md table filled)
+  2: npu-probe runs successfully (open device → 64×64 INT8 matmul → bit-match CPU ref)
+  3: llama.cpp pure-CPU baseline captured
+  4: rknpu ggml backend skeleton compiles
+  5: first llama.cpp matmul offload working on a single layer
+  6: full forward pass via NPU for one decode step
+  7: tok/s vs baseline measured
+
+backup_host: ampere  # CoolPi GenBook — port-validation target. Phase-2+ scope.
+
+reverse_dependencies:
+  - Quark (boltzmann UEFI) — must stay bootable across kernel-rev experiments
+  - Neutron (boltzmann kernel build) — provides the kernel we tweak for rknpu
+  - Volta (boltzmann umbrella) — Rosenblatt is the third Volta-child after Quark + Neutron
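
Bring-up step 2 in the manifest asks npu-probe to run a 64×64 INT8 matmul and bit-match it against a CPU reference. A minimal sketch of what such a reference could look like follows. Everything in it is an assumption for illustration: the function names (`int8_matmul_ref`, `first_mismatch`, `random_int8_matrix`) are hypothetical and not part of npu-probe, INT32 accumulation is assumed because the commit does not say what the NPU returns, and the "NPU output" is a copy of the reference standing in for a buffer read back from the device.

```python
import random

INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def random_int8_matrix(rows, cols, seed):
    """Build a reproducible rows×cols matrix of INT8 values (list of lists)."""
    rng = random.Random(seed)
    return [[rng.randint(INT8_MIN, INT8_MAX) for _ in range(cols)] for _ in range(rows)]


def int8_matmul_ref(a, b):
    """CPU reference: (M×K INT8) @ (K×N INT8) with exact integer accumulation.

    Python ints do not overflow, so the explicit range check emulates the
    INT32 accumulator assumed here for the hardware path.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    for mat in (a, b):
        for row in mat:
            for v in row:
                if not INT8_MIN <= v <= INT8_MAX:
                    raise ValueError(f"{v} is outside the INT8 range")
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            if not INT32_MIN <= acc <= INT32_MAX:
                raise OverflowError("accumulator exceeds INT32")
            out[i][j] = acc
    return out


def first_mismatch(expected, actual):
    """Return (row, col) of the first differing element, or None on an exact match."""
    for i, (exp_row, act_row) in enumerate(zip(expected, actual)):
        for j, (e, v) in enumerate(zip(exp_row, act_row)):
            if e != v:
                return (i, j)
    return None


if __name__ == "__main__":
    a = random_int8_matrix(64, 64, seed=1)
    b = random_int8_matrix(64, 64, seed=2)
    ref = int8_matmul_ref(a, b)
    # Placeholder: replace with the result buffer read back from the NPU.
    npu_out = [row[:] for row in ref]
    mismatch = first_mismatch(ref, npu_out)
    print("bit-match" if mismatch is None else f"mismatch at {mismatch}")
```

Fixed seeds keep the inputs identical across runs, so a mismatch position reported on one kernel revision can be compared against the next. Integer-only arithmetic is what makes an exact bit-match a meaningful pass criterion; a floating-point reference would need a tolerance instead.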