c9a3f5c600
Phase-1 audit closes with a substantively different picture than the original scaffold's TBDs:

- Tomeu Vizoso's RK3588 NPU work merged in Linux 6.18 (Nov 2025) under codename `rocket` (NOT `rknpu`). All references updated.
- Boltzmann's `linux-rk3588-marfrit-A1` (7.0.0-rc3-ARCH+) already ships `drivers/accel/rocket/rocket.ko` as a built-but-not-loaded module.
- DT bindings + per-core nodes (`npu@fdab/c/d_0000`, compatible `rockchip,rk3588-rknn-core`) in mainline since 6.18 but ship `status = "disabled"`; board enable is the Phase-2 unblock, not a driver port.
- Mesa 25.3 ships Rocket Gallium + Teflon TFLite delegate as the authoritative userspace reference for the uAPI shape.
- Op coverage today is conv-centric (MobileNet-class); transformer matmul needs the conv-1×1 shoehorn (RKNPU2 BSP precedent) or rocket op-set additions. Surfaced as Phase-2-load-bearing risk.
- IOMMU v1.0 hazard: 32 GB host needs `mem=4G` or local `rockchip,rk3568-iommu-v1` discriminator patches before the first NPU job, to avoid DMA-window faults.

Files:

- docs/npu-mainline-status.md: full audit table with upstream pointers (kernel.org / Mesa docs / dri-devel patch URLs / Tomeu's "we are in mainline" blog post).
- docs/phases.md: per-phase log entry for Phase-1 closeout.
- docs/op-coverage.md: matmul-vs-conv-vs-rocket-op-set framing.
- fleet/boltzmann.yaml: audited kernel + npu_driver + dt_npu_nodes state.
- kernel/dt-overlays/rk3588-rosenblatt-npu-enable.dtso: overlay to flip the three rknn-core nodes to "okay" (+ matching mmu nodes), carries the IOMMU-mitigation warning inline.
- kernel/README.md: kernel-agent scope wiring + anticipated local carry patches.
- README.md: phase-status table + "rknpu → rocket" rename note.
- TODO.md: Phase-2 unblock concrete steps + standing upstream-watch items.
5.4 KiB
TODO — Rosenblatt
Rolling punch-list. Older items at bottom (move done → DONE.md when noisy).
Phase 1 — substrate audit
- On boltzmann:
  - `uname -r` → record in `fleet/boltzmann.yaml:kernel.running_version`
  - `find / -path '*accel*' -name '*.ko' 2>/dev/null`: check if accel framework is built
  - `ls /dev/accel/ /dev/dri/`: what's exposed?
  - `lsmod | grep -iE 'rknpu|accel'`: what's loaded?
  - `dmesg | grep -iE 'rknpu|npu|accel'` since boot: driver bringup log
- Tomeu's rknpu series: find on lore.kernel.org/dri-devel, capture latest patch-set version + state (merged / in-review / dropped) → fill table in `docs/npu-mainline-status.md`
- Check `drivers/accel/` in current torvalds tree: list in-tree accelerators, confirm rknpu's mainline state
- Check DT bindings: `Documentation/devicetree/bindings/npu/rockchip,*.yaml`
- Inspect `arch/arm64/boot/dts/rockchip/rk3588.dtsi` for `npu` node
- If a userspace shim exists (rkneural?), capture repo URL + try hello-world against the running kernel
- Spec-extract from BSP vendor `rockchip-npu` source: register map, DMA descriptor format, irq handling. No code lift; spec only.
Phase exit criteria: docs/npu-mainline-status.md table fully populated;
clear answer to "do we drive via accel uAPI or write our own MMIO driver."
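The on-host checks in the list above can be gathered in one pass. The following is a minimal sketch, not the project's actual tooling: the function name `audit_substrate` and the `key: value` output layout are hypothetical, and the module filter adds `rocket` to the `rknpu|accel` pattern on the assumption that the mainline driver may load under that name.

```shell
#!/bin/sh
# Hypothetical helper: print the Phase-1 substrate facts as "key: value" lines.
# Missing tools or directories yield empty values instead of aborting.
audit_substrate() {
    echo "kernel: $(uname -r)"
    echo "accel_nodes: $(ls /dev/accel/ 2>/dev/null | tr '\n' ' ')"
    echo "dri_nodes: $(ls /dev/dri/ 2>/dev/null | tr '\n' ' ')"
    # 'rocket' is added to the filter as an assumption about the driver name.
    echo "modules: $(lsmod 2>/dev/null | grep -iE 'rknpu|rocket|accel' | awk '{print $1}' | tr '\n' ' ')"
}
```

Running `audit_substrate > audit.txt` on boltzmann would leave a small text file whose `kernel:` value can be copied into `fleet/boltzmann.yaml:kernel.running_version`.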
Phase 2 — formulate
- List llama.cpp ops by wallclock %, profiling qwen-1.5B Q4_K_M on CPU (use llama.cpp's built-in perf-timer or perf record)
- Pick the exact INT8 matmul tile size the NPU prefers (read from BSP source)
- Spec out the smallest backend interface: which ops we MUST handle, which the framework falls back to CPU
- Write `docs/op-coverage.md`
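Ranking ops by wallclock share is a small aggregation once per-op timings exist. A minimal sketch, assuming the timings have been exported as plain `<op> <microseconds>` lines; that format and the name `op_share` are hypothetical and are not llama.cpp output, and the sample op names and numbers in the usage note are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: read "<op> <microseconds>" lines on stdin and print
# "<op> <percent>" sorted by descending share of total time.
op_share() {
    LC_ALL=C awk '
        { t[$1] += $2; total += $2 }
        END {
            if (total > 0)
                for (op in t) printf "%s %.1f\n", op, 100 * t[op] / total
        }' | LC_ALL=C sort -k2,2nr
}
```

For example, `printf 'MUL_MAT 750\nSOFT_MAX 150\nADD 100\n' | op_share` prints `MUL_MAT 75.0`, `SOFT_MAX 15.0`, `ADD 10.0`, one per line.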
Phase 3 — analyze
- RKNPU2 SDK: trace through `librknnrt.so` user-API → kernel ioctl shapes (objdump + strings, no actual reverse-engineering of vendor blob; just the syscall surface)
- Tomeu's accel uAPI: read driver source, understand:
  - submit-job ioctl shape
  - dmabuf import path
  - fence-wait mechanism
  - error reporting
- BSP vendor `rockchip-npu` source: register layout, DMA descriptor struct, irq handling sequence
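For the uAPI reading step, the ioctl names can be pulled out of a uAPI header mechanically before reading the handlers. A minimal sketch, assuming the header declares its ioctls as `#define DRM_IOCTL_...` macros, which is the usual DRM convention; the function name `list_ioctls` is hypothetical, and the header path is passed in rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: print the DRM_IOCTL_* macro names defined in a header.
# $1 = path to a uAPI header inside the pinned kernel checkout.
list_ioctls() {
    sed -n 's/^#define[[:space:]]\{1,\}\(DRM_IOCTL_[A-Z0-9_]*\).*/\1/p' "$1"
}
```

The resulting names give a checklist to map against the submit-job, dmabuf import, fence-wait, and error-reporting questions above.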
Phase 4 — baseline
- Build vanilla llama.cpp on boltzmann (mainline branch)
- Pull qwen2.5-1.5b-instruct Q4_K_M GGUF
- `llama-bench -m qwen2.5-1.5b -p 512 -n 128` × 3 runs
- Capture JSON to `benchmarks/$(date +%F)_boltzmann_qwen1.5b_cpu_baseline.json`
- Record into `fleet/boltzmann.yaml:baseline_measurement`
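The three-run baseline above can be scripted so the output path is always built the same way. A minimal sketch under assumptions: `baseline_path` and `run_baseline` are hypothetical names, the GGUF path is supplied by the caller, and the three runs are expressed through llama-bench's `-r` repetition count with `-o json` output rather than three separate invocations.

```shell
#!/bin/sh
# Hypothetical helper: build the dated baseline filename.
# $1 = output directory, $2 = ISO date (defaults to today).
baseline_path() {
    printf '%s/%s_boltzmann_qwen1.5b_cpu_baseline.json\n' "$1" "${2:-$(date +%F)}"
}

# Hypothetical helper: run the CPU baseline and write JSON to a file.
# $1 = path to the GGUF model, $2 = output file.
# LLAMA_BENCH can point at a build that is not on PATH.
run_baseline() {
    "${LLAMA_BENCH:-llama-bench}" -m "$1" -p 512 -n 128 -r 3 -o json > "$2"
}
```

A call such as `run_baseline /path/to/model.gguf "$(baseline_path benchmarks)"` would then produce the file that `fleet/boltzmann.yaml:baseline_measurement` refers to.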
Phase-2 unblock — prerequisites (NEW, Phase-1 outflow)
Phase-1 audit (2026-05-19) reframed Phase-2 from "design rknpu backend
interface" to a concrete bringup sequence. See
docs/npu-mainline-status.md for full context.
- Patch boltzmann board DTS / overlay to flip `npu@fdab0000`, `npu@fdac0000`, `npu@fdad0000` from `status = "disabled"` → `"okay"`. Rebuild DTB.
- Mitigate IOMMU v1.0 hazard before first NPU job (32 GB host). Pick one:
  - (A) Boot with `mem=4G` for first-bringup validation, OR
  - (B) Carry local patches: Simon Xue per-device-ops (<https://lore.kernel.org/all/20260310105303.128859-1-xxm@rock-chips.com/>) + Midgy `rockchip,rk3568-iommu-v1` discriminator compat + DT update for `rknpu_mmu` to the new compat.
- `modprobe rocket` and confirm `/dev/accel/accel0..2` appear, no probe errors in dmesg, IOMMU faults absent.
- Read `drivers/accel/rocket/rocket_job.c` + Mesa Rocket Gallium to determine submit-job uAPI capabilities: specifically whether we can express a transformer matmul as a tile/op the NPU pipeline accepts, or whether we need additional op coverage upstream.
- Decide matmul strategy (Phase-2 deliverable): conv-1×1 shoehorn / extend rocket op set / thinner submit shim.
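The post-`modprobe rocket` confirmation in the list above reduces to two checks: three accel nodes exist, and the kernel log carries no IOMMU fault. A minimal sketch; the function names are hypothetical, and the fault pattern is an assumption about how the kernel words such messages, so it should be checked against a real dmesg before being trusted.

```shell
#!/bin/sh
# Hypothetical helper: count accel* entries in a device directory.
# $1 = directory to inspect (defaults to /dev/accel).
count_accel_nodes() {
    dir=${1:-/dev/accel}
    n=0
    for node in "$dir"/accel*; do
        if [ -e "$node" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Hypothetical helper: succeed if the log text on stdin mentions a fault.
# The pattern is an assumed wording, not taken from the driver source.
log_has_iommu_fault() {
    grep -qiE 'page fault|iommu.*fault'
}
```

On boltzmann this would be used as `[ "$(count_accel_nodes)" -eq 3 ]` followed by `dmesg | log_has_iommu_fault && echo "IOMMU fault seen"`.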
Standing items — track upstream
- Watch `drivers/iommu/rockchip-iommu.c` for the discriminator `rockchip,rk3568-iommu-v1` compat to land; drop local patch (B) when it does.
- Watch `linux-rockchip` for the next iteration of Midgy / Simon's thread (last visible activity 2026-04-03).
- Watch `drivers/accel/rocket/` for matmul / GEMM op additions.
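The first watch item can be checked mechanically against a kernel checkout. A minimal sketch, assuming the compat would appear as a quoted string literal in the driver's match table; `compat_landed` is a hypothetical name and both the file and the compatible string are passed in by the caller.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a quoted compatible string occurs in a file.
# $1 = source file, $2 = compatible string (without quotes).
# Matching the surrounding quotes avoids false hits on longer or shorter names.
compat_landed() {
    grep -qF -- "\"$2\"" "$1"
}
```

From the root of the pinned tree, `compat_landed drivers/iommu/rockchip-iommu.c rockchip,rk3568-iommu-v1 && echo "drop local patch (B)"` would flag the moment the carry patch becomes redundant.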
Cross-phase / standing items (older)
- Mirror Tomeu's branch: superseded, code is now in-tree. Keep `git.kernel.org/.../torvalds/linux.git` checkout pinned to the boltzmann kernel rev for in-tree reading.
- Set up serial console on boltzmann for kernel-panic recovery (Quark umbrella; check current state). Becomes load-bearing once we start poking IOMMU code.
- Add `project_rosenblatt_overview.md` + `project_rocket_upstream_state.md` to claude-memory. Done 2026-05-19.
- Decide repo home: marfrit/rosenblatt on git.reauktion.de (probably yes, after Phase-1 substrate is captured).
- Resolve board-name discrepancy. README and `fleet/boltzmann.yaml` say boltzmann is a "Rock 5 ITX+" / `rock-5-itx-plus`; the running DT reports `model = "Radxa ROCK 5 ITX"`, `compatible = "radxa,rock-5-itx"`. Confirm physical board model (Radxa sells both SKUs) and either correct the README + manifest, or note that we boot the plain-ITX DT on ITX+ hardware (likely fine; ITX+ is mostly a connectivity-refresh, same SoC + same NPU silicon).
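What the running DT reports can be re-read at any time from `/proc/device-tree`, where string properties are stored NUL-terminated. A minimal sketch; `dt_strings` is a hypothetical name, and the property path is supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: print each NUL-terminated string of a DT property
# on its own line.
# $1 = property file, e.g. /proc/device-tree/model or /proc/device-tree/compatible.
dt_strings() {
    tr '\0' '\n' < "$1"
}
```

Comparing `dt_strings /proc/device-tree/compatible | head -n 1` with the board name in `fleet/boltzmann.yaml` makes the discrepancy visible in one line.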