Rosenblatt: project scaffold for RK3588 NPU on mainline

Codename: after Frank Rosenblatt, whose Mark I Perceptron (1958) was
one of the first hardware neural networks.  This project lights up the
RK3588 NPU on mainline Linux so the OSS world finally owns the silicon
side of inference on that chip.

Phase-1 scope: a small LLM running on a CPU + NPU mix on boltzmann
(Rock 5 ITX+).  Backend: llama.cpp with a new rknpu ggml backend that
offloads INT8 GEMM (attention + FFN matmuls) to the NPU's tile-MAC
array while leaving dequant / RoPE / softmax / sampling / embedding on
the A76 cores (NEON).
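
The offload split only works if the NPU's INT8 GEMM returns exactly the
integers the CPU would compute.  A minimal CPU-side reference is sketched
below, assuming INT32 accumulation (a common convention for INT8 MAC
arrays; the source does not confirm it for this NPU).  The names
int8_gemm_ref and random_int8_matrix are hypothetical, not part of the
scaffold.

```python
import random

INT8_MIN, INT8_MAX = -128, 127


def int8_gemm_ref(a, b):
    """Return C = A @ B for int8 matrices given as lists of rows.

    Python integers are exact, so the sums equal what an INT32
    accumulator holds as long as K * 128 * 128 stays below 2**31.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a) or any(len(row) != n for row in b):
        raise ValueError("shape mismatch")
    for row in a + b:
        if any(not INT8_MIN <= v <= INT8_MAX for v in row):
            raise ValueError("operand outside int8 range")
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def random_int8_matrix(rows, cols, rng):
    """Fill a rows x cols matrix with uniformly drawn int8 values."""
    return [[rng.randint(INT8_MIN, INT8_MAX) for _ in range(cols)]
            for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so NPU and CPU see identical inputs
    a = random_int8_matrix(64, 64, rng)
    b = random_int8_matrix(64, 64, rng)
    expected = int8_gemm_ref(a, b)
    # An NPU result would be compared element by element: npu_out == expected
    print(len(expected), len(expected[0]))
```

With K = 64 the largest possible magnitude is 64 * 128 * 128 = 1048576,
far inside the INT32 range, so a 64x64 probe cannot overflow the
accumulator under this assumption.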

Target model: qwen2.5-1.5B-instruct Q4_K_M GGUF.

Scaffold layout: README.md (frame + 9+1-phase plan), TODO.md (rolling
punch-list), docs/{npu-mainline-status,architecture}.md, kernel/ for
DT bindings + driver tweaks, userspace/{npu-probe,llm-runtime}/,
fleet/boltzmann.yaml.

Next: Phase-1 substrate audit, i.e. fill the TBDs in
docs/npu-mainline-status.md with the actual state of Tomeu Vizoso's
rknpu / DRM-accel work on the kernel boltzmann is currently running.
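
The audit could start from a small snapshot script along these lines.
This is only a sketch: the candidate module names are guesses to be
adjusted during the audit, audit_snapshot is a hypothetical helper, and
the only kernel interfaces assumed are /sys/module/<name> for loaded
modules and /dev/accel/accel<N> for DRM accel device nodes.

```python
import glob
import json
import os
import platform

# Assumption: plausible module names for the vendor and mainline drivers.
# Adjust once the audit shows what the running kernel actually ships.
CANDIDATE_MODULES = ("rknpu", "rocket")


def audit_snapshot(root="/"):
    """Collect kernel version, accel device nodes and a loaded NPU module."""
    pattern = os.path.join(root, "dev/accel/accel*")
    nodes = sorted(glob.glob(pattern))
    loaded = [name for name in CANDIDATE_MODULES
              if os.path.isdir(os.path.join(root, "sys/module", name))]
    return {
        "running_version": platform.release(),  # same string as uname -r
        "accel_nodes": ["/" + os.path.relpath(p, root) for p in nodes],
        "npu_driver": loaded[0] if loaded else "none",
    }


if __name__ == "__main__":
    print(json.dumps(audit_snapshot(), indent=2))
```

The root parameter exists only so the function can be pointed at a fake
directory tree when testing it off-target.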
2026-05-19 11:57:48 +00:00
commit 24adc74812
8 changed files with 578 additions and 0 deletions
@@ -0,0 +1,55 @@
# rosenblatt fleet manifest — boltzmann (Rock 5 ITX+, RK3588)
#
# Phase-1 audit host. Always-on, 32 GB DDR4, NVMe rootfs. NPU silicon
# present + accessible via Rockchip-BSP vendor module today; mainline
# path TBD (see docs/npu-mainline-status.md).
host: boltzmann
arch: arm64
soc: rockchip/rk3588
board: rock-5-itx-plus
distro: archlinuxarm # ALARM aarch64; boltzmann is the umbrella RK3588 host
role: primary-development # not yet primary-target (laptop targets land later)
hardware:
  cpu: 4×Cortex-A76 (2.4 GHz) + 4×Cortex-A55 (1.8 GHz)
  ram: 32 GB DDR4-2666
  storage: NVMe (rootfs) + microSD (recovery)
  npu:
    cores: 3
    tops_int8_per_core: 2 # ~2 TOPS INT8 per core, 6 TOPS aggregate (theoretical peak)
    local_sram_per_core_mib: 2
    power_domain: pd_npu
# Phase-1 audit fills these (pending boltzmann inspection)
kernel:
  running_version: TBD # uname -r snapshot at audit time
  source: TBD # mainline torvalds / mmind-rockchip / custom
  npu_driver: TBD # vendor rockchip-npu / mainline rknpu / none
userspace:
  rknn_vendor_runtime_installed: false # commitment: stay mainline-clean
  llama_cpp_installed: TBD # via marfrit-packages or built-from-source
baseline_measurement:
  pending: true
  target: |
    llama.cpp pure-CPU tok/s on qwen2.5-1.5b-instruct-q4_k_m.gguf,
    3 runs, median wallclock. Use llama-bench from llama.cpp/build/bin.
  ground_truth_file: benchmarks/2026-XX-XX_boltzmann_qwen1.5b_cpu_baseline.json
bringup_sequence:
  1: substrate audit (docs/npu-mainline-status.md table filled)
  2: npu-probe runs successfully (open device → 64×64 INT8 matmul → bit-match CPU ref)
  3: llama.cpp pure-CPU baseline captured
  4: rknpu ggml backend skeleton compiles
  5: first llama.cpp matmul offload working on a single layer
  6: full forward pass via NPU for one decode step
  7: tok/s vs baseline measured
backup_host: ampere # CoolPi GenBook — port-validation target. Phase-2+ scope.
reverse_dependencies:
- Quark (boltzmann UEFI) — must stay bootable across kernel-rev experiments
- Neutron (boltzmann kernel build) — provides the kernel we tweak for rknpu
- Volta (boltzmann umbrella) — Rosenblatt is the third Volta-child after Quark + Neutron
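
The baseline_measurement target above (3 runs, median) could be reduced
to a ground-truth record roughly as follows. This is a sketch only: the
record layout and the summarize_runs helper are hypothetical, and the
tok/s values are expected to come from whatever llama-bench reports on
boltzmann, not from this script.

```python
import json
import statistics


def summarize_runs(tok_per_s_runs, host, model):
    """Build a baseline record from exactly three tok/s measurements."""
    runs = [float(v) for v in tok_per_s_runs]
    if len(runs) != 3:
        raise ValueError("the manifest asks for exactly 3 runs")
    if any(v <= 0 for v in runs):
        raise ValueError("tok/s must be positive")
    return {
        "host": host,
        "model": model,
        "runs_tok_per_s": runs,
        # Median rather than mean, so one throttled run does not skew it.
        "median_tok_per_s": statistics.median(runs),
    }


def write_record(record, path):
    """Serialize the record as the JSON ground-truth file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
        fh.write("\n")
```

Taking the median of three keeps the middle measurement, which makes the
baseline robust against a single outlier in either direction.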