rosenblatt/TODO.md
marfrit 24adc74812 Rosenblatt: project scaffold for RK3588 NPU on mainline
Codename: Frank Rosenblatt — Mark I Perceptron 1958, the first
hardware neural network.  This project lights up the RK3588 NPU on
mainline Linux so the OSS world finally owns the silicon-side of
inference on that chip.

Phase-1 scope: small LLM running CPU + NPU mix on boltzmann (Rock 5
ITX+).  Backend: llama.cpp with a new rknpu ggml backend offloading
INT8 GEMM (attention + FFN matmuls) to the NPU's tile-MAC array while
leaving dequant / RoPE / softmax / sampling / embedding on A76 NEON.

Target model: qwen2.5-1.5B-instruct Q4_K_M GGUF.

Scaffold layout: README.md (frame + 9+1-phase plan), TODO.md (rolling
punch-list), docs/{npu-mainline-status,architecture}.md, kernel/ for
DT bindings + driver tweaks, userspace/{npu-probe,llm-runtime}/,
fleet/boltzmann.yaml.

Next: Phase-1 substrate audit — fill the TBDs in docs/npu-mainline-status.md
with the actual state of Tomeu Vizoso's rknpu / DRM-accel work on
the boltzmann-running kernel.
2026-05-19 11:57:48 +00:00

TODO — Rosenblatt

Rolling punch-list; older items sit at the bottom. Move completed items to DONE.md when the list gets noisy.


Phase 1 — substrate audit

  • On boltzmann: uname -r → record in fleet/boltzmann.yaml:kernel.running_version
  • find / -path '*accel*' -name '*.ko*' 2>/dev/null → are any accel drivers built as modules? ('*.ko*' also matches compressed .ko.xz / .ko.zst; built-in code leaves no .ko at all)
  • ls /dev/accel/ /dev/dri/ — what's exposed?
  • lsmod | grep -iE 'rknpu|rocket|accel' → what's loaded? (the mainline accel driver module is named rocket; rknpu is the vendor BSP name)
  • dmesg | grep -iE 'rknpu|rocket|npu|accel' → driver bring-up log since boot
  • Tomeu's rknpu series — find on lore.kernel.org/dri-devel, capture latest patch-set version + state (merged / in-review / dropped) → fill table in docs/npu-mainline-status.md
  • Check drivers/accel/ in current torvalds tree — list in-tree accelerators, confirm rknpu's mainline state
  • Check DT bindings: Documentation/devicetree/bindings/npu/rockchip,*.yaml
  • Inspect arch/arm64/boot/dts/rockchip/rk3588.dtsi for npu node
  • If a userspace shim exists (rkneural?), capture repo URL + try hello-world against the running kernel
  • Spec-extract from BSP vendor rockchip-npu source — register map, DMA descriptor format, irq handling. No code lift; spec only.
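
The runtime probes at the top of this list can be collected in one pass. The following is a minimal sketch, not a finished audit tool: the function name `audit_runtime`, its arguments, and the `key: value` output format are all assumptions made for illustration. It reads `/proc/modules` directly (the same data `lsmod` prints) and leaves `dmesg` out because that usually needs root.

```shell
# Hypothetical helper: audit_runtime [DEV_ROOT] [MODULES_FILE] [PATTERN]
#   DEV_ROOT      directory that holds accel/ and dri/   (default: /dev)
#   MODULES_FILE  file in /proc/modules format           (default: /proc/modules)
#   PATTERN       extended regex for interesting modules (default: rknpu|accel)
audit_runtime() {
    local dev_root="${1:-/dev}"
    local modules_file="${2:-/proc/modules}"
    local pattern="${3:-rknpu|accel}"
    local sub loaded

    echo "kernel.running_version: $(uname -r)"

    # Which device nodes are exposed under accel/ and dri/?
    for sub in accel dri; do
        if [ -d "$dev_root/$sub" ]; then
            echo "nodes.$sub: $(ls "$dev_root/$sub" | tr '\n' ' ' | sed 's/ $//')"
        else
            echo "nodes.$sub: absent"
        fi
    done

    # Column 1 of /proc/modules is the module name.
    loaded="$(awk '{print $1}' "$modules_file" 2>/dev/null \
        | grep -iE "$pattern" | tr '\n' ' ' | sed 's/ $//')"
    echo "modules.loaded: ${loaded:-none}"
}
```

On boltzmann this would be called with no arguments (`audit_runtime`); the optional arguments exist so the function can be dry-run against a fake directory tree before touching the board.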

Phase exit criteria: docs/npu-mainline-status.md table fully populated; clear answer to "do we drive via accel uAPI or write our own MMIO driver."
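
The source-tree checks from the list above (in-tree accel drivers, Rockchip NPU bindings, the npu node in the RK3588 device tree) could be scripted along these lines. This is a sketch under assumptions: `audit_tree` is a made-up name, it expects a path to a kernel checkout, and it globs `rk3588*.dtsi` rather than only `rk3588.dtsi` in case the node lives in an included file.

```shell
# Hypothetical helper: audit_tree KERNEL_SRC
#   KERNEL_SRC  path to a checked-out kernel tree
audit_tree() {
    local src="$1"

    echo "in-tree accel drivers:"
    # Each subdirectory of drivers/accel/ is one driver.
    find "$src/drivers/accel" -mindepth 1 -maxdepth 1 -type d 2>/dev/null \
        | sed 's|.*/|  |' | sort

    echo "rockchip npu bindings:"
    ls "$src"/Documentation/devicetree/bindings/npu/rockchip,*.yaml 2>/dev/null \
        | sed 's|.*/|  |'

    echo "npu mentions in rk3588 dtsi files:"
    # -H forces the file name even when only one file matches the glob.
    ( cd "$src" 2>/dev/null \
        && grep -Hn 'npu' arch/arm64/boot/dts/rockchip/rk3588*.dtsi 2>/dev/null ) \
        | sed 's/^/  /'
}
```

An empty section in the output means the corresponding path does not exist in that tree, which is itself an answer for the status table.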


Phase 2 — formulate

  • Rank llama.cpp ops by share of wallclock time, profiling qwen2.5-1.5B Q4_K_M on CPU (use llama.cpp's built-in perf-timer or perf record)
  • Pick the exact INT8 matmul tile size the NPU prefers (read from BSP source)
  • Spec out the smallest backend interface: which ops we MUST handle, which the framework falls back to CPU
  • Write docs/op-coverage.md
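
For the first item in this phase, ranking ops by their share of wallclock time is a small aggregation once per-op timings exist. The sketch below assumes an input file with one `<op-name> <microseconds>` sample per line; that format is an assumption for illustration, not something llama.cpp or perf emits as-is, so the profiler output would need to be massaged into it first. `rank_ops` is a made-up name.

```shell
# Hypothetical helper: rank_ops FILE
#   FILE holds one "<op-name> <microseconds>" sample per line (assumed format).
rank_ops() {
    awk '
        NF >= 2 { total[$1] += $2; sum += $2 }   # accumulate per-op and overall time
        END {
            if (sum == 0) exit                   # nothing to rank
            for (op in total)
                printf "%6.2f%%  %s\n", 100 * total[op] / sum, op
        }
    ' "$1" | sort -rn                            # largest share first
}
```

Given samples of 600 and 150 for MUL_MAT, 150 for ROPE and 100 for SOFT_MAX, the first output line is the 75.00% MUL_MAT entry. The ops at the top of such a list are the candidates for the "MUST handle" column in docs/op-coverage.md.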

Phase 3 — analyze

  • RKNPU2 SDK: trace through librknnrt.so user-API → kernel ioctl shapes (objdump + strings, no actual reverse-engineering of vendor blob — just the syscall surface)
  • Tomeu's accel uAPI: read driver source, understand:
    • submit-job ioctl shape
    • dmabuf import path
    • fence-wait mechanism
    • error reporting
  • BSP vendor rockchip-npu source: register layout, DMA descriptor struct, irq handling sequence
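
When capturing ioctl shapes, each request constant recovered from the vendor library or from the driver's uAPI header already encodes direction, type, number and argument size. The sketch below splits a request number into those fields using the generic Linux `_IOC` layout from include/uapi/asm-generic/ioctl.h, which arm64 uses. `decode_ioctl` is a made-up helper name.

```shell
# Hypothetical helper: decode_ioctl REQUEST
#   REQUEST is an ioctl request number, decimal or 0x-prefixed hex.
# Generic _IOC layout: bits 0-7 nr, 8-15 type, 16-29 size, 30-31 direction.
decode_ioctl() {
    local req=$(( $1 ))
    local nr=$((    req        & 0xff   ))
    local type=$(( (req >> 8)  & 0xff   ))
    local size=$(( (req >> 16) & 0x3fff ))
    local dir=$((  (req >> 30) & 0x3    ))
    local dirname

    # Direction is named from userspace's point of view:
    # write = userspace passes data in, read = kernel passes data back.
    case "$dir" in
        0) dirname=none ;;
        1) dirname=write ;;
        2) dirname=read ;;
        3) dirname=read/write ;;
    esac

    printf 'dir=%s type=0x%02x nr=0x%02x size=%d\n' "$dirname" "$type" "$nr" "$size"
}
```

For example, `decode_ioctl 0xC0406400` prints `dir=read/write type=0x64 nr=0x00 size=64`. Type 0x64 is ASCII 'd', the DRM ioctl type, so any accel-framework request should decode with that type, while the size field gives the byte length of the argument struct to look for.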

Phase 4 — baseline

  • Build vanilla llama.cpp on boltzmann (upstream master branch, no local patches)
  • Pull qwen2.5-1.5b-instruct Q4_K_M GGUF
  • llama-bench -m <path to the qwen2.5-1.5b-instruct Q4_K_M GGUF> -p 512 -n 128, × 3 runs
  • Capture JSON to benchmarks/$(date +%F)_boltzmann_qwen1.5b_cpu_baseline.json
  • Record into fleet/boltzmann.yaml:baseline_measurement
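
The baseline steps above might be wrapped as follows. This is a sketch with stated assumptions: `run_baseline` and the `LLAMA_BENCH` override variable are made-up names, and "× 3 runs" is interpreted here as llama-bench's own repetition count (`-r 3`) rather than three separate invocations. The `-m`, `-p`, `-n`, `-r` and `-o json` flags are existing llama-bench options.

```shell
# Hypothetical helper: run_baseline MODEL_GGUF [OUT_DIR]
#   MODEL_GGUF   path to the GGUF file to benchmark
#   OUT_DIR      where the JSON lands (default: benchmarks)
#   LLAMA_BENCH  env var naming the binary (default: llama-bench on PATH)
run_baseline() {
    local model="$1"
    local out_dir="${2:-benchmarks}"
    local bench="${LLAMA_BENCH:-llama-bench}"
    local out="$out_dir/$(date +%F)_boltzmann_qwen1.5b_cpu_baseline.json"

    mkdir -p "$out_dir" || return 1

    # -p 512 / -n 128: prompt and generation lengths from the list above.
    # -r 3: three repetitions; -o json: machine-readable results on stdout.
    "$bench" -m "$model" -p 512 -n 128 -r 3 -o json > "$out" || return 1

    # Print the result path so the caller can record it in fleet/boltzmann.yaml.
    echo "$out"
}
```

Pinning `-p`, `-n` and `-r` in one place keeps later NPU-backend runs comparable with this CPU baseline.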

Cross-phase / standing items

  • Mirror Tomeu's WIP branch into a local clone for kernel hacking
  • Set up serial console on boltzmann for kernel-panic recovery (Quark umbrella; check current state)
  • Add project_rosenblatt.md to claude-memory once Phase 1 closes (so future sessions don't re-discover the campaign)
  • Decide repo home: marfrit/rosenblatt on git.reauktion.de (probably yes, after Phase-1 substrate is captured and the README isn't embarrassing)