24adc74812
Codename: Frank Rosenblatt — Mark I Perceptron 1958, the first
hardware neural network. This project lights up the RK3588 NPU on
mainline Linux so the OSS world finally owns the silicon-side of
inference on that chip.
Phase-1 scope: small LLM running CPU + NPU mix on boltzmann (Rock 5
ITX+). Backend: llama.cpp with a new rknpu ggml backend offloading
INT8 GEMM (attention + FFN matmuls) to the NPU's tile-MAC array while
leaving dequant / RoPE / softmax / sampling / embedding on A76 NEON.
Target model: qwen2.5-1.5B-instruct Q4_K_M GGUF.
Scaffold layout: README.md (frame + 9+1-phase plan), TODO.md (rolling
punch-list), docs/{npu-mainline-status,architecture}.md, kernel/ for
DT bindings + driver tweaks, userspace/{npu-probe,llm-runtime}/,
fleet/boltzmann.yaml.
Next: Phase-1 substrate audit — fill the TBDs in docs/npu-mainline-status.md
with the actual state of Tomeu Vizoso's rknpu / DRM-accel work on
the boltzmann-running kernel.
76 lines
3.0 KiB
Markdown
# TODO — Rosenblatt

Rolling punch-list. Older items at bottom (move done → DONE.md when noisy).

---

## Phase 1 — substrate audit

- [ ] On boltzmann: `uname -r` → record in `fleet/boltzmann.yaml:kernel.running_version`
- [ ] `find / -path '*accel*' -name '*.ko*' 2>/dev/null`: check if accel framework is built
      (`*.ko*` also matches compressed `.ko.xz` / `.ko.zst` modules)
- [ ] `ls /dev/accel/ /dev/dri/`: what's exposed?
- [ ] `lsmod | grep -iE 'rknpu|accel'`: what's loaded?
- [ ] `dmesg | grep -iE 'rknpu|npu|accel'` since boot: driver bringup log
- [ ] Tomeu's rknpu series: find on lore.kernel.org/dri-devel, capture latest
      patch-set version + state (merged / in-review / dropped) → fill table in
      `docs/npu-mainline-status.md`
- [ ] Check `drivers/accel/` in current torvalds tree: list in-tree
      accelerators, confirm rknpu's mainline state
- [ ] Check DT bindings: `Documentation/devicetree/bindings/npu/rockchip,*.yaml`
- [ ] Inspect `arch/arm64/boot/dts/rockchip/rk3588.dtsi` (and the files it
      includes) for `npu` node
- [ ] If a userspace shim exists (rkneural?), capture repo URL + try
      hello-world against the running kernel
- [ ] Spec-extract from BSP vendor `rockchip-npu` source: register map,
      DMA descriptor format, IRQ handling. No code lift; spec only.

Phase exit criteria: `docs/npu-mainline-status.md` table fully populated;
clear answer to "do we drive via accel uAPI or write our own MMIO driver."
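
The probe commands in the checklist above could be collected in one pass with something like the sketch below. `run_step` and `audit` are hypothetical helper names, not code from this repo. The `find` pattern uses `*.ko*` so that compressed modules also match, and `dmesg` may need root on systems that restrict it. Call `audit > audit.txt` on boltzmann to capture everything in one file.

```shell
#!/usr/bin/env bash
# Sketch only: run_step and audit are hypothetical helpers, not repo code.
set -u

# Print a labelled header, then run the command. A failing command is
# reported with its exit status instead of aborting the whole audit.
run_step() {
    local label=$1
    shift
    printf '== %s ==\n' "$label"
    "$@" 2>&1 || printf '(exit %d)\n' "$?"
}

# One step per probe command in the Phase 1 checklist; output goes to stdout.
audit() {
    run_step "kernel version" uname -r
    run_step "accel modules"  sh -c "find / -path '*accel*' -name '*.ko*' 2>/dev/null"
    run_step "device nodes"   ls /dev/accel/ /dev/dri/
    run_step "loaded modules" sh -c "lsmod | grep -iE 'rknpu|accel'"
    run_step "driver log"     sh -c "dmesg | grep -iE 'rknpu|npu|accel'"
}
```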

---

## Phase 2 — formulate

- [ ] List llama.cpp ops by wallclock %, profiling qwen2.5-1.5B Q4_K_M on CPU
      (use llama.cpp's built-in perf-timer or `perf record`)
- [ ] Pick the exact INT8 matmul tile size the NPU prefers (read from BSP source)
- [ ] Spec out the smallest backend interface: which ops we MUST handle,
      which the framework falls back to CPU for
- [ ] Write `docs/op-coverage.md`
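
For the wallclock ranking in the first item, a small aggregation helper may be enough once per-op timings exist in some form. The sketch below assumes a hypothetical input format of one `<op> <microseconds>` pair per line. That format is not what llama.cpp or `perf` emit directly, so a conversion step would be needed first. `op_share` is likewise an illustrative name. Usage would be `op_share < timings.txt`.

```shell
#!/usr/bin/env bash
# Sketch only: op_share and its input format are assumptions, not llama.cpp output.
set -u

# Read "<op> <microseconds>" pairs on stdin, sum the time per op, and print
# "<op> <percent of total>" sorted from most to least expensive.
op_share() {
    LC_ALL=C awk '
        { t[$1] += $2; total += $2 }
        END {
            if (total > 0)
                for (op in t) printf "%s %.1f\n", op, 100 * t[op] / total
        }' | LC_ALL=C sort -k2,2nr
}
```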

---

## Phase 3 — analyze

- [ ] RKNPU2 SDK: trace through `librknnrt.so` user-API → kernel ioctl shapes
      (objdump + strings only; no actual reverse-engineering of the vendor
      blob, just the syscall surface)
- [ ] Tomeu's accel uAPI: read driver source, understand:
  - submit-job ioctl shape
  - dmabuf import path
  - fence-wait mechanism
  - error reporting
- [ ] BSP vendor `rockchip-npu` source: register layout, DMA descriptor
      struct, IRQ handling sequence

---

## Phase 4 — baseline

- [ ] Build vanilla llama.cpp on boltzmann (upstream `master` branch)
- [ ] Pull qwen2.5-1.5b-instruct Q4_K_M GGUF
- [ ] `llama-bench -m qwen2.5-1.5b -p 512 -n 128` × 3 runs
- [ ] Capture JSON to `benchmarks/$(date +%F)_boltzmann_qwen1.5b_cpu_baseline.json`
- [ ] Record into `fleet/boltzmann.yaml:baseline_measurement`
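
The benchmark and capture steps above might be wired together roughly as follows. This is a sketch under assumptions: `MODEL` is a hypothetical path to wherever the GGUF was downloaded, and `bench_outfile` and `run_baseline` are illustrative names. It reads "× 3 runs" as a single `llama-bench` invocation with `-r 3` (three repetitions) and `-o json`. Three separate invocations would be an equally valid reading. Run `run_baseline` from the repo root.

```shell
#!/usr/bin/env bash
# Sketch only: MODEL is a hypothetical path, adjust to the downloaded GGUF.
set -u

MODEL=${MODEL:-models/qwen2.5-1.5b-instruct-q4_k_m.gguf}

# Build the result path for a given date (defaults to today, YYYY-MM-DD).
bench_outfile() {
    local day=${1:-$(date +%F)}
    printf 'benchmarks/%s_boltzmann_qwen1.5b_cpu_baseline.json\n' "$day"
}

# Run the CPU baseline: 512-token prompt, 128 generated tokens,
# three repetitions, JSON written to the dated file.
run_baseline() {
    local out
    out=$(bench_outfile)
    mkdir -p "$(dirname "$out")"
    llama-bench -m "$MODEL" -p 512 -n 128 -r 3 -o json > "$out"
}
```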

---

## Cross-phase / standing items

- [ ] Mirror Tomeu's WIP branch into a local clone for kernel hacking
- [ ] Set up serial console on boltzmann for kernel-panic recovery (Quark
      umbrella; check current state)
- [ ] Add `project_rosenblatt.md` to claude-memory once Phase 1 closes (so
      future sessions don't re-discover the campaign)
- [ ] Decide repo home: marfrit/rosenblatt on git.reauktion.de (probably yes,
      after Phase-1 substrate is captured and the README isn't embarrassing)