# TODO — Rosenblatt

Rolling punch-list. Older items at bottom (move done → DONE.md when noisy).

---

## Phase 1 — substrate audit

- [ ] On boltzmann: `uname -r` → record in `fleet/boltzmann.yaml:kernel.running_version`
- [ ] `find /lib/modules/$(uname -r) -path '*accel*' -name '*.ko*' 2>/dev/null` — list accel drivers built as modules (`*.ko*` also catches compressed `.ko.xz` / `.ko.zst`). The framework itself (`CONFIG_DRM_ACCEL`) is a bool linked into `drm`, so confirm it in `/boot/config-$(uname -r)` or `/proc/config.gz` rather than by looking for a `.ko`.
- [ ] `ls /dev/accel/ /dev/dri/` — what's exposed?
- [ ] `lsmod | grep -iE 'rknpu|accel'` — what's loaded?
- [ ] `dmesg | grep -iE 'rknpu|npu|accel'` since boot — driver bringup log
- [ ] Tomeu's rknpu series — find on lore.kernel.org/dri-devel, capture latest patch-set version + state (merged / in-review / dropped) → fill table in `docs/npu-mainline-status.md`
- [ ] Check `drivers/accel/` in current torvalds tree — list in-tree accelerators, confirm rknpu's mainline state
- [ ] Check DT bindings: `Documentation/devicetree/bindings/npu/rockchip,*.yaml`
- [ ] Inspect `arch/arm64/boot/dts/rockchip/rk3588.dtsi` for `npu` node
- [ ] If a userspace shim exists (rkneural?), capture repo URL + try hello-world against the running kernel
- [ ] Spec-extract from BSP vendor `rockchip-npu` source — register map, DMA descriptor format, irq handling. No code lift; spec only.

Phase exit criteria: `docs/npu-mainline-status.md` table fully populated; clear answer to "do we drive via accel uAPI or write our own MMIO driver?"
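
A minimal sketch of the read-only Phase-1 checks, assuming Python 3 on boltzmann. It collects what the `uname -r`, `ls /dev/accel/ /dev/dri/` and `lsmod` items ask for. It also reads the DT `status` of the three NPU cores named in the Phase-2 DTS item, assuming those nodes sit at the DT root under `/proc/device-tree`. `rocket` is added to the grep pattern because that is the in-tree driver name. The helper names and dict keys are hypothetical; only `kernel.running_version` mirrors the `fleet/boltzmann.yaml` key.

```python
from __future__ import annotations

import json
import platform
import re
from pathlib import Path

# Pattern from the lsmod/dmesg items, plus "rocket" (the in-tree driver name).
NPU_RE = re.compile(r"rknpu|accel|rocket", re.IGNORECASE)

# NPU core nodes as named in the Phase-2 DTS item; assumed to sit at the DT root.
NPU_NODES = ["npu@fdab0000", "npu@fdac0000", "npu@fdad0000"]


def loaded_npu_modules(proc_modules: str = "/proc/modules") -> list[str]:
    """Roughly `lsmod | grep -iE 'rknpu|accel|rocket'` (lsmod reads this file)."""
    path = Path(proc_modules)
    if not path.is_file():
        return []
    # The first whitespace-separated field of each line is the module name.
    names = [line.split()[0] for line in path.read_text().splitlines() if line.strip()]
    return [name for name in names if NPU_RE.search(name)]


def list_dev(dirpath: str) -> list[str]:
    """Roughly `ls /dev/accel/`; an absent directory yields an empty list."""
    path = Path(dirpath)
    return sorted(entry.name for entry in path.iterdir()) if path.is_dir() else []


def dt_node_status(dt_root: str, node: str) -> str | None:
    """`status` of a DT node, or None if the node does not exist."""
    node_dir = Path(dt_root) / node
    if not node_dir.is_dir():
        return None
    status = node_dir / "status"
    if not status.is_file():
        return "okay"  # Linux treats a node without `status` as enabled
    # DT string properties are NUL-terminated in /proc/device-tree.
    return status.read_bytes().rstrip(b"\x00").decode()


def audit(dt_root: str = "/proc/device-tree") -> dict:
    return {
        "kernel.running_version": platform.release(),  # same value as `uname -r`
        "dev_accel": list_dev("/dev/accel"),
        "dev_dri": list_dev("/dev/dri"),
        "npu_modules": loaded_npu_modules(),
        "npu_dt_status": {node: dt_node_status(dt_root, node) for node in NPU_NODES},
    }


if __name__ == "__main__":
    print(json.dumps(audit(), indent=2))
```

The script only prints JSON. Copying fields into the manifest stays a manual step, so a wrong guess about node paths cannot silently corrupt `fleet/boltzmann.yaml`.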

---

## Phase 2 — formulate

- [ ] List llama.cpp ops by wallclock %, profiling qwen-1.5B Q4_K_M on CPU (use llama.cpp's built-in perf-timer or `perf record`)
- [ ] Pick the exact INT8 matmul tile size the NPU prefers (read from BSP source)
- [ ] Spec out the smallest backend interface: which ops we MUST handle, and which the framework can fall back to CPU for
- [ ] Write `docs/op-coverage.md`

---

## Phase 3 — analyze

- [ ] RKNPU2 SDK: trace through `librknnrt.so` user-API → kernel ioctl shapes (objdump + strings, no actual reverse-engineering of vendor blob — just the syscall surface)
- [ ] Tomeu's accel uAPI: read driver source, understand:
  - submit-job ioctl shape
  - dmabuf import path
  - fence-wait mechanism
  - error reporting
- [ ] BSP vendor `rockchip-npu` source: register layout, DMA descriptor struct, irq handling sequence

---

## Phase 4 — baseline

- [ ] Build vanilla llama.cpp on boltzmann (mainline branch)
- [ ] Pull qwen2.5-1.5b-instruct Q4_K_M GGUF
- [ ] `llama-bench -m qwen2.5-1.5b -p 512 -n 128` × 3 runs
- [ ] Capture JSON (`llama-bench -o json`) to `benchmarks/$(date +%F)_boltzmann_qwen1.5b_cpu_baseline.json`
- [ ] Record into `fleet/boltzmann.yaml:baseline_measurement`

---

## Phase-2 unblock — prerequisites (NEW, Phase-1 outflow)

Phase-1 audit (2026-05-19) reframed Phase-2 from "design rknpu backend interface" to a concrete bringup sequence. See `docs/npu-mainline-status.md` for full context.

- [ ] Patch boltzmann board DTS / overlay to flip `npu@fdab0000`, `npu@fdac0000`, `npu@fdad0000` from `status = "disabled"` → `"okay"`. Rebuild DTB.
- [ ] **Mitigate IOMMU v1.0 hazard before first NPU job** (32 GB host). Pick one:
  - (A) Boot with `mem=4G` for first-bringup validation, OR
  - (B) Carry local patches: Simon Xue per-device-ops (``) + Midgy `rockchip,rk3568-iommu-v1` discriminator compat + DT update for `rknpu_mmu` to the new compat.
- [ ] `modprobe rocket` and confirm `/dev/accel/accel0..2` appear, with no probe errors in dmesg and no IOMMU faults.
- [ ] Read `drivers/accel/rocket/rocket_job.c` + Mesa Rocket Gallium to determine submit-job uAPI capabilities — specifically whether we can express a transformer matmul as a tile/op the NPU pipeline accepts, or whether we need additional op coverage upstream.
- [ ] Decide matmul strategy (Phase-2 deliverable): conv-1×1 shoehorn / extend rocket op set / thinner submit shim.

## Standing items — track upstream

- [ ] Watch `drivers/iommu/rockchip-iommu.c` for the discriminator `rockchip,rk3568-iommu-v1` compat to land; drop local patch (B) when it does.
- [ ] Watch `linux-rockchip` for the next iteration of Midgy / Simon's thread (last visible activity 2026-04-03).
- [ ] Watch `drivers/accel/rocket/` for matmul / GEMM op additions.

## Cross-phase / standing items (older)

- [ ] Mirror Tomeu's branch — superseded: code is now in-tree. Keep `git.kernel.org/.../torvalds/linux.git` checkout pinned to the boltzmann kernel rev for in-tree reading.
- [ ] Set up serial console on boltzmann for kernel-panic recovery (Quark umbrella; check current state) — **becomes load-bearing once we start poking IOMMU code.**
- [x] Add `project_rosenblatt_overview.md` + `project_rocket_upstream_state.md` to claude-memory — done 2026-05-19.
- [ ] Decide repo home: marfrit/rosenblatt on git.reauktion.de (probably yes, after Phase-1 substrate is captured).
- [ ] **Resolve board-name discrepancy.** README and `fleet/boltzmann.yaml` say boltzmann is a "Rock 5 ITX+" / `rock-5-itx-plus`; the running DT reports `model = "Radxa ROCK 5 ITX"`, `compatible = "radxa,rock-5-itx"`. Confirm physical board model (Radxa sells both SKUs) and either correct the README + manifest, or note that we boot the plain-ITX DT on ITX+ hardware (likely fine; ITX+ is mostly a connectivity-refresh, same SoC + same NPU silicon).
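
For the matmul-strategy item in Phase-2 unblock, the "conv-1×1 shoehorn" rests on a standard identity. Take a matmul Y = X·W, with X the M×K activations and W the K×N weights. It equals a 1×1, stride-1, unpadded convolution whose M spatial positions are the rows of X, whose K input channels are the columns of X, and whose C_out×C_in kernel is Wᵀ. The pure-Python sketch below checks that mapping numerically. The NHWC layout with H=1 is an assumption chosen for illustration. Nothing here reflects the actual rocket uAPI or the tensor layout the NPU expects.

```python
from __future__ import annotations

# Sketch only: checks that a matmul equals a 1x1 convolution under one layout
# choice (NHWC with H=1). This is an illustrative assumption, not the rocket
# uAPI and not the NPU's real tensor layout.


def matmul(x: list[list[int]], w: list[list[int]]) -> list[list[int]]:
    """Y[m][n] = sum_k X[m][k] * W[k][n], with X: M x K and W: K x N."""
    k_dim, n_dim = len(w), len(w[0])
    return [[sum(row[k] * w[k][n] for k in range(k_dim)) for n in range(n_dim)] for row in x]


def conv1x1_nhwc(inp, weight):
    """1x1 convolution, stride 1, no padding, no bias.

    inp:    [N][H][W][C_in]
    weight: [C_out][C_in]   (the 1x1 kernel with its spatial dims dropped)
    return: [N][H][W][C_out]
    """
    return [
        [
            [[sum(px[ci] * weight[co][ci] for ci in range(len(px))) for co in range(len(weight))] for px in row]
            for row in img
        ]
        for img in inp
    ]


def matmul_via_conv1x1(x, w):
    """Compute X @ W by recasting it as a 1x1 convolution."""
    inp = [[x]]  # [1][1][M][K]: batch 1, height 1, M rows as width, K as C_in
    weight = [list(col) for col in zip(*w)]  # W transposed: [N][K] == [C_out][C_in]
    return conv1x1_nhwc(inp, weight)[0][0]  # [M][N]


x = [[1, 2], [3, 4], [5, 6]]
w = [[1, 0, 2], [0, 1, 3]]
print(matmul_via_conv1x1(x, w))  # [[1, 2, 8], [3, 4, 18], [5, 6, 28]]
```

Each output pixel of the convolution is one row of Y, which is why the shoehorn can work at all. Several questions stay open for the `rocket_job.c` / Mesa read-through: whether the NPU's conv path accepts H=1 with a wide W or needs the M rows folded into H×W, and how Q4_K_M weights become INT8 operands with wide accumulation. The sketch does not model either.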