c9a3f5c600
Phase-1 audit closes with a substantively different picture than the original scaffold's TBDs:

- Tomeu Vizoso's RK3588 NPU work merged in Linux 6.18 (Nov 2025) under codename `rocket` (NOT `rknpu`). All references updated.
- Boltzmann's `linux-rk3588-marfrit-A1` (7.0.0-rc3-ARCH+) already ships `drivers/accel/rocket/rocket.ko` as a built-but-not-loaded module.
- DT bindings + per-core nodes (`npu@fdab/c/d_0000`, compatible `rockchip,rk3588-rknn-core`) in mainline since 6.18 but ship `status = "disabled"`; board enable is the Phase-2 unblock, not a driver port.
- Mesa 25.3 ships Rocket Gallium + Teflon TFLite delegate as the authoritative userspace reference for the uAPI shape.
- Op coverage today is conv-centric (MobileNet-class); transformer matmul needs the conv-1×1 shoehorn (RKNPU2 BSP precedent) or rocket op-set additions. Surfaced as Phase-2-load-bearing risk.
- IOMMU v1.0 hazard: 32 GB host needs `mem=4G` or local `rockchip,rk3568-iommu-v1` discriminator patches before the first NPU job, to avoid DMA-window faults.

Files:

- docs/npu-mainline-status.md: full audit table with upstream pointers (kernel.org / Mesa docs / dri-devel patch URLs / Tomeu's "we are in mainline" blog post).
- docs/phases.md: per-phase log entry for Phase-1 closeout.
- docs/op-coverage.md: matmul-vs-conv-vs-rocket-op-set framing.
- fleet/boltzmann.yaml: audited kernel + npu_driver + dt_npu_nodes state.
- kernel/dt-overlays/rk3588-rosenblatt-npu-enable.dtso: overlay to flip the three rknn-core nodes to "okay" (+ matching mmu nodes), carries the IOMMU-mitigation warning inline.
- kernel/README.md: kernel-agent scope wiring + anticipated local carry patches.
- README.md: phase-status table + "rknpu → rocket" rename note.
- TODO.md: Phase-2 unblock concrete steps + standing upstream-watch items.
77 lines
3.0 KiB
YAML
# rosenblatt fleet manifest: boltzmann (Rock 5 ITX+, RK3588)
#
# Phase-1 audit host. Always-on, 32 GB DDR4, NVMe rootfs. NPU silicon
# present; the mainline `rocket` driver is built as a module but not yet
# loaded, and the DT NPU nodes ship disabled (see npu_driver and
# dt_npu_nodes below, and docs/npu-mainline-status.md).

host: boltzmann
arch: arm64
soc: rockchip/rk3588
board: rock-5-itx-plus
distro: archlinuxarm  # ALARM aarch64; boltzmann is the umbrella RK3588 host
role: primary-development  # not yet primary-target (laptop targets land later)

hardware:
  cpu: 4×Cortex-A76 (2.4 GHz) + 4×Cortex-A55 (1.8 GHz)
  ram: 32 GB DDR4-2666
  storage: NVMe (rootfs) + microSD (recovery)
  npu:
    cores: 3
    tops_int8_per_core: 2  # ~2 TOPS INT8 per core, 6 TOPS aggregate (theoretical peak)
    local_sram_per_core_mib: 2
    power_domain: pd_npu

# Phase-1 audit (captured 2026-05-19)
kernel:
  running_version: 7.0.0-rc3-ARCH+  # `uname -r`; built Apr 29 2026, custom marfrit-A1
  image: linux-rk3588-marfrit-A1  # from BOOT_IMAGE in /proc/cmdline
  source: marfrit/neutron (mainline-rc3 base, RK3588 patches)
  npu_driver:
    name: rocket  # drivers/accel/rocket, author Tomeu Vizoso
    state: built-as-module-not-loaded
    module_path: /lib/modules/7.0.0-rc3-ARCH+/kernel/drivers/accel/rocket/rocket.ko
    config:
      CONFIG_DRM_ACCEL: 'y'
      CONFIG_DRM_ACCEL_ROCKET: m
    binds_compatible: rockchip,rk3588-rknn-core
    depends: [gpu-sched, drm_shmem_helper]
  dt_npu_nodes:
    - addr: 0xfdab0000
      compatible: rockchip,rk3588-rknn-core
      status: disabled
    - addr: 0xfdac0000
      compatible: rockchip,rk3588-rknn-core
      status: disabled
    - addr: 0xfdad0000
      compatible: rockchip,rk3588-rknn-core
      status: disabled
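  # Sketch, not audited state: the Phase-1 commit describes the overlay
  # kernel/dt-overlays/rk3588-rosenblatt-npu-enable.dtso as flipping the three
  # rknn-core nodes to "okay". Assuming it applies cleanly, a post-overlay
  # re-audit would be expected to record each entry along these lines:
  #   - addr: 0xfdab0000
  #     compatible: rockchip,rk3588-rknn-core
  #     status: okay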
  dev_accel: /dev/accel  # not present; major 261 registered in /proc/devices but no device bound
  dmesg_npu_lines: 0  # zero npu/accel/rknpu lines since boot

userspace:
  rknn_vendor_runtime_installed: false  # commitment: stay mainline-clean
  llama_cpp_installed: TBD  # via marfrit-packages or built-from-source

baseline_measurement:
  pending: true
  target: |
    llama.cpp pure-CPU tok/s on qwen2.5-1.5b-instruct-q4_k_m.gguf,
    3 runs, median wallclock. Use llama-bench from llama.cpp/build/bin.
  ground_truth_file: benchmarks/2026-XX-XX_boltzmann_qwen1.5b_cpu_baseline.json
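  # Sketch only: `command` is a hypothetical key, not part of the audited
  # manifest. Assuming llama-bench's -m (model), -r (repetitions) and
  # -o (output format) options, and a model file in the working directory,
  # the target above could be pinned as an explicit invocation:
  #   command: llama-bench -m qwen2.5-1.5b-instruct-q4_k_m.gguf -r 3 -o json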

bringup_sequence:
  1: substrate audit (docs/npu-mainline-status.md table filled)
  2: npu-probe runs successfully (open device → 64×64 INT8 matmul → bit-match CPU ref)
  3: llama.cpp pure-CPU baseline captured
  4: rocket ggml backend skeleton compiles
  5: first llama.cpp matmul offload working on a single layer
  6: full forward pass via NPU for one decode step
  7: tok/s vs baseline measured

backup_host: ampere  # CoolPi GenBook; port-validation target. Phase-2+ scope.

reverse_dependencies:
  - Quark (boltzmann UEFI) - must stay bootable across kernel-rev experiments
  - Neutron (boltzmann kernel build) - provides the kernel we tweak for the rocket NPU driver
  - Volta (boltzmann umbrella) - Rosenblatt is the third Volta-child after Quark + Neutron