Rosenblatt: project scaffold for RK3588 NPU on mainline
Codename: Frank Rosenblatt, whose Mark I Perceptron (1958) was one of
the first hardware neural networks. This project lights up the RK3588
NPU on mainline Linux so the OSS world finally owns the silicon side of
inference on that chip.
Phase-1 scope: small LLM running CPU + NPU mix on boltzmann (Rock 5
ITX+). Backend: llama.cpp with a new rknpu ggml backend offloading
INT8 GEMM (attention + FFN matmuls) to the NPU's tile-MAC array while
leaving dequant / RoPE / softmax / sampling / embedding on A76 NEON.
Target model: qwen2.5-1.5B-instruct Q4_K_M GGUF.
Scaffold layout: README.md (frame + 9+1-phase plan), TODO.md (rolling
punch-list), docs/{npu-mainline-status,architecture}.md, kernel/ for
DT bindings + driver tweaks, userspace/{npu-probe,llm-runtime}/,
fleet/boltzmann.yaml.
Next: Phase-1 substrate audit — fill the TBDs in docs/npu-mainline-status.md
with the actual state of Tomeu Vizoso's rknpu / DRM-accel work on
the boltzmann-running kernel.
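
The INT8 GEMM offload and the bit-match check in the manifest's bring-up step 2 both rest on one property: INT8 × INT8 products accumulated as integers are exact, so a tiled accumulation order should give the same bits as a naive CPU reference. Below is a minimal sketch of that idea in plain Python. The function names and the tile edge of 16 are hypothetical; this commit does not state the NPU's real tile shape, and the code does not touch the device.

```python
import random

N = 64     # matrix edge from bring-up step 2 (64x64 INT8 matmul)
TILE = 16  # hypothetical tile edge; the real NPU tile shape is not given here


def random_int8_matrix(n, rng):
    """Build an n x n matrix of values in the signed INT8 range."""
    return [[rng.randint(-128, 127) for _ in range(n)] for _ in range(n)]


def matmul_ref(a, b):
    """Naive CPU reference: one full dot product per output element."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def matmul_tiled(a, b, tile=TILE):
    """Same product, accumulated tile by tile along i, j, and k."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


def bit_match(a, b):
    """True when the tiled result equals the reference exactly."""
    return matmul_tiled(a, b) == matmul_ref(a, b)
```

For K = 64 the worst-case accumulator magnitude is 64 × 128 × 128 = 2^20, well inside INT32, which is why an exact comparison (rather than a tolerance) is a reasonable pass criterion for the probe.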
@@ -0,0 +1,55 @@
# rosenblatt fleet manifest — boltzmann (Rock 5 ITX+, RK3588)
#
# Phase-1 audit host. Always-on, 32 GB DDR4, NVMe rootfs. NPU silicon
# present + accessible via Rockchip-BSP vendor module today; mainline
# path TBD (see docs/npu-mainline-status.md).

host: boltzmann
arch: arm64
soc: rockchip/rk3588
board: rock-5-itx-plus
distro: archlinuxarm # ALARM aarch64; boltzmann is the umbrella RK3588 host
role: primary-development # not yet primary-target (laptop targets land later)

hardware:
  cpu: 4×Cortex-A76 (2.4 GHz) + 4×Cortex-A55 (1.8 GHz)
  ram: 32 GB DDR4-2666
  storage: NVMe (rootfs) + microSD (recovery)
  npu:
    cores: 3
    tops_int8_per_core: 2 # ~2 TOPS INT8 per core, 6 TOPS aggregate (theoretical peak)
    local_sram_per_core_mib: 2
    power_domain: pd_npu

# Phase-1 audit fills these (pending boltzmann inspection)
kernel:
  running_version: TBD # uname -r snapshot at audit time
  source: TBD # mainline torvalds / mmind-rockchip / custom
  npu_driver: TBD # vendor rockchip-npu / mainline rknpu / none

userspace:
  rknn_vendor_runtime_installed: false # commitment: stay mainline-clean
  llama_cpp_installed: TBD # via marfrit-packages or built-from-source

baseline_measurement:
  pending: true
  target: |
    llama.cpp pure-CPU tok/s on qwen2.5-1.5b-instruct-q4_k_m.gguf,
    3 runs, median wallclock. Use llama-bench from llama.cpp/build/bin.
  ground_truth_file: benchmarks/2026-XX-XX_boltzmann_qwen1.5b_cpu_baseline.json

bringup_sequence:
  1: substrate audit (docs/npu-mainline-status.md table filled)
  2: npu-probe runs successfully (open device → 64×64 INT8 matmul → bit-match CPU ref)
  3: llama.cpp pure-CPU baseline captured
  4: rknpu ggml backend skeleton compiles
  5: first llama.cpp matmul offload working on a single layer
  6: full forward pass via NPU for one decode step
  7: tok/s vs baseline measured

backup_host: ampere # CoolPi GenBook — port-validation target. Phase-2+ scope.

reverse_dependencies:
  - Quark (boltzmann UEFI) — must stay bootable across kernel-rev experiments
  - Neutron (boltzmann kernel build) — provides the kernel we tweak for rknpu
  - Volta (boltzmann umbrella) — Rosenblatt is the third Volta-child after Quark + Neutron