rosenblatt/TODO.md
marfrit c9a3f5c600 Rosenblatt Phase-1 closeout: rocket-driver substrate inventory
Phase-1 audit closes with a substantively different picture than the
original scaffold's TBDs:

- Tomeu Vizoso's RK3588 NPU work merged in Linux 6.18 (Nov 2025) under
  codename `rocket` (NOT `rknpu`).  All references updated.
- Boltzmann's `linux-rk3588-marfrit-A1` (7.0.0-rc3-ARCH+) already ships
  `drivers/accel/rocket/rocket.ko` as a built-but-not-loaded module.
- DT bindings + per-core nodes (`npu@fdab/c/d_0000`,
  compatible `rockchip,rk3588-rknn-core`) in mainline since 6.18 but
  ship `status = "disabled"` — board enable is the Phase-2 unblock,
  not a driver port.
- Mesa 25.3 ships Rocket Gallium + Teflon TFLite delegate as the
  authoritative userspace reference for the uAPI shape.
- Op coverage today is conv-centric (MobileNet-class); transformer
  matmul needs the conv-1×1 shoehorn (RKNPU2 BSP precedent) or rocket
  op-set additions.  Surfaced as Phase-2-load-bearing risk.
- IOMMU v1.0 hazard: 32 GB host needs `mem=4G` or local
  `rockchip,rk3568-iommu-v1` discriminator patches before the first
  NPU job, to avoid DMA-window faults.

Files:
- docs/npu-mainline-status.md: full audit table with upstream pointers
  (kernel.org / Mesa docs / dri-devel patch URLs / Tomeu's "we are in
  mainline" blog post).
- docs/phases.md: per-phase log entry for Phase-1 closeout.
- docs/op-coverage.md: matmul-vs-conv-vs-rocket-op-set framing.
- fleet/boltzmann.yaml: audited kernel + npu_driver + dt_npu_nodes
  state.
- kernel/dt-overlays/rk3588-rosenblatt-npu-enable.dtso: overlay to
  flip the three rknn-core nodes to "okay" (+ matching mmu nodes),
  carries the IOMMU-mitigation warning inline.
- kernel/README.md: kernel-agent scope wiring + anticipated local
  carry patches.
- README.md: phase-status table + "rknpu → rocket" rename note.
- TODO.md: Phase-2 unblock concrete steps + standing
  upstream-watch items.
2026-05-19 12:41:31 +00:00

# TODO — Rosenblatt
Rolling punch-list. Older items at bottom (move done → DONE.md when noisy).

---
## Phase 1 — substrate audit
- [ ] On boltzmann: `uname -r` → record in `fleet/boltzmann.yaml:kernel.running_version`
- [ ] `find / -path '*accel*' -name '*.ko' 2>/dev/null` — check if accel framework is built
- [ ] `ls /dev/accel/ /dev/dri/` — what's exposed?
- [ ] `lsmod | grep -iE 'rknpu|accel'` — what's loaded?
- [ ] `dmesg | grep -iE 'rknpu|npu|accel'` since boot — driver bringup log
- [ ] Tomeu's rknpu series — find on lore.kernel.org/dri-devel, capture latest
patch-set version + state (merged / in-review / dropped) → fill table in
`docs/npu-mainline-status.md`
- [ ] Check `drivers/accel/` in current torvalds tree — list in-tree
accelerators, confirm rknpu's mainline state
- [ ] Check DT bindings: `Documentation/devicetree/bindings/npu/rockchip,*.yaml`
- [ ] Inspect `arch/arm64/boot/dts/rockchip/rk3588.dtsi` for `npu` node
- [ ] If a userspace shim exists (rkneural?), capture repo URL + try
hello-world against the running kernel
- [ ] Spec-extract from BSP vendor `rockchip-npu` source — register map,
DMA descriptor format, irq handling. No code lift; spec only.
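
A minimal Python sketch of the read-only checks above, so the answers can be pasted into `fleet/boltzmann.yaml`. It reads `/proc/modules` text instead of calling `lsmod` and takes `dmesg` output as a string, since reading the kernel log usually needs root. It also lists `/dev/accel`. Its match pattern adds `rocket`, the mainline driver name, to the `rknpu|npu|accel` greps above. The function names are hypothetical.

```python
import os
import re

# Union of the lsmod/dmesg grep patterns above, plus `rocket`, the mainline
# driver name (the in-tree module is not called rknpu).
NPU_PATTERN = re.compile(r"rocket|rknpu|npu|accel", re.IGNORECASE)


def loaded_npu_modules(proc_modules: str) -> list[str]:
    """Module names from /proc/modules text that match NPU_PATTERN.

    /proc/modules has one module per line; the first field is the name.
    """
    names = []
    for line in proc_modules.splitlines():
        fields = line.split()
        if fields and NPU_PATTERN.search(fields[0]):
            names.append(fields[0])
    return names


def npu_log_lines(dmesg_text: str) -> list[str]:
    """Kernel log lines mentioning the NPU, the driver, or accel."""
    return [line for line in dmesg_text.splitlines() if NPU_PATTERN.search(line)]


def accel_nodes(dev_root: str = "/dev") -> list[str]:
    """Sorted entries under <dev_root>/accel, or [] if that directory is absent."""
    path = os.path.join(dev_root, "accel")
    if not os.path.isdir(path):
        return []
    return sorted(os.listdir(path))


if __name__ == "__main__" and os.path.exists("/proc/modules"):
    with open("/proc/modules") as f:
        print("npu modules:", loaded_npu_modules(f.read()))
    print("accel nodes:", accel_nodes())
```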

Phase exit criteria: `docs/npu-mainline-status.md` table fully populated;
clear answer to "do we drive via accel uAPI or write our own MMIO driver."

---
## Phase 2 — formulate
- [ ] List llama.cpp ops by wallclock %, profiling qwen-1.5B Q4_K_M on CPU
(use llama.cpp's built-in perf-timer or perf record)
- [ ] Pick the exact INT8 matmul tile size the NPU prefers (read from BSP source)
- [ ] Spec out the smallest backend interface: which ops we MUST handle,
which the framework falls back to CPU
- [ ] Write `docs/op-coverage.md`
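
Here is a Python sketch of turning the profile into the "which ops MUST we handle" decision. It assumes per-op wallclock seconds have already been aggregated from the profiler run; the dict input format is hypothetical. The NPU-capable op set is a placeholder parameter, not a statement of rocket's coverage. `amdahl_ceiling` gives the best possible end-to-end speedup if NPU-routed ops took zero time: 1 / (1 - p) for NPU share p. For example, routing an op that takes 80% of wallclock caps the speedup at 5×.

```python
# Hypothetical input: wallclock seconds per ggml op, aggregated by hand
# from a profiler run. The npu_ops set is a placeholder, not a statement
# of what rocket supports.


def coverage_plan(op_seconds: dict, npu_ops: set):
    """Rank ops by share of total wallclock and route each to NPU or CPU.

    Returns (rows, npu_share): rows is a list of (op, share, target)
    sorted by descending share; npu_share is the summed share of NPU ops.
    """
    total = sum(op_seconds.values())
    if total <= 0:
        raise ValueError("no time recorded")
    rows = []
    for op, secs in sorted(op_seconds.items(), key=lambda kv: kv[1], reverse=True):
        target = "npu" if op in npu_ops else "cpu-fallback"
        rows.append((op, secs / total, target))
    npu_share = sum(share for _, share, target in rows if target == "npu")
    return rows, npu_share


def amdahl_ceiling(npu_share: float) -> float:
    """Upper bound on end-to-end speedup if NPU-routed ops cost zero time."""
    if npu_share >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - npu_share)
```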
---
## Phase 3 — analyze
- [ ] RKNPU2 SDK: trace through `librknnrt.so` user-API → kernel ioctl shapes
(objdump + strings, no actual reverse-engineering of vendor blob — just
the syscall surface)
- [ ] Tomeu's accel uAPI: read driver source, understand:
  - submit-job ioctl shape
  - dmabuf import path
  - fence-wait mechanism
  - error reporting
- [ ] BSP vendor `rockchip-npu` source: register layout, DMA descriptor
struct, irq handling sequence
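
For reading ioctl shapes, a small Python helper can encode and decode Linux ioctl command numbers. It uses the generic `_IOC` bit layout from `include/uapi/asm-generic/ioctl.h`, which arm64 uses. This is handy for decoding raw numbers seen in `strace` output of `librknnrt.so`. For example, 0xC0406400 decodes to type `'d'`, nr 0, with a 64-byte read/write argument, which is `DRM_IOCTL_VERSION` on 64-bit systems. DRM drivers number their private ioctls from `DRM_COMMAND_BASE` (0x40). `HypotheticalSubmit` is an illustrative placeholder only; the real layout must be copied from the rocket uAPI header.

```python
import ctypes

# Generic Linux ioctl encoding (include/uapi/asm-generic/ioctl.h), used by
# arm64: bits 0-7 nr, 8-15 type, 16-29 argument size, 30-31 direction.
# Some other architectures (e.g. powerpc, mips) use different values.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS, IOC_DIRBITS = 8, 8, 14, 2
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS
IOC_NONE, IOC_WRITE, IOC_READ = 0, 1, 2

DRM_IOCTL_BASE = ord("d")  # ioctl type byte used by DRM (drm.h)
DRM_COMMAND_BASE = 0x40    # first driver-private ioctl number


def ioc(direction: int, typ: int, nr: int, size: int) -> int:
    """Pack an ioctl command number, like the kernel's _IOC() macro."""
    return ((direction << IOC_DIRSHIFT) | (typ << IOC_TYPESHIFT)
            | (nr << IOC_NRSHIFT) | (size << IOC_SIZESHIFT))


def decode(cmd: int) -> dict:
    """Split a raw ioctl command number (e.g. from strace) into its fields."""
    return {
        "dir": (cmd >> IOC_DIRSHIFT) & ((1 << IOC_DIRBITS) - 1),
        "type": chr((cmd >> IOC_TYPESHIFT) & ((1 << IOC_TYPEBITS) - 1)),
        "nr": (cmd >> IOC_NRSHIFT) & ((1 << IOC_NRBITS) - 1),
        "size": (cmd >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1),
    }


def drm_iowr(nr: int, struct_type) -> int:
    """Equivalent of DRM_IOWR(nr, type): read/write, type 'd'."""
    return ioc(IOC_READ | IOC_WRITE, DRM_IOCTL_BASE, nr, ctypes.sizeof(struct_type))


class HypotheticalSubmit(ctypes.Structure):
    # Placeholder layout for illustration only; NOT rocket's real struct.
    _fields_ = [
        ("jobs", ctypes.c_uint64),             # user pointer to a job array
        ("job_count", ctypes.c_uint32),
        ("job_struct_size", ctypes.c_uint32),
    ]
```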
---
## Phase 4 — baseline
- [ ] Build vanilla llama.cpp on boltzmann (mainline branch)
- [ ] Pull qwen2.5-1.5b-instruct Q4_K_M GGUF
- [ ] `llama-bench -m qwen2.5-1.5b -p 512 -n 128` × 3 runs
- [ ] Capture JSON to `benchmarks/$(date +%F)_boltzmann_qwen1.5b_cpu_baseline.json`
- [ ] Record into `fleet/boltzmann.yaml:baseline_measurement`
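
This Python sketch folds the three runs into the `baseline_measurement` entry. It assumes each run was saved with llama-bench's JSON output mode (`-o json`). It also assumes every record carries `n_prompt`, `n_gen` and `avg_ts` (tokens/s); verify those field names against an actual output file first. Prompt-processing tests are labelled `pp<n>` and generation tests `tg<n>`, mirroring llama-bench's own test names.

```python
import json
import statistics


def load_runs(paths):
    """Load one llama-bench JSON file (assumed: a list of records) per run."""
    runs = []
    for path in paths:
        with open(path) as f:
            runs.append(json.load(f))
    return runs


def summarize_runs(runs):
    """Mean and sample stdev of avg_ts (tokens/s) per test across runs.

    Assumes prompt-processing records have n_gen == 0 and generation
    records have n_gen > 0.
    """
    per_test = {}
    for records in runs:
        for rec in records:
            if rec["n_gen"] == 0:
                label = f"pp{rec['n_prompt']}"
            else:
                label = f"tg{rec['n_gen']}"
            per_test.setdefault(label, []).append(float(rec["avg_ts"]))
    return {
        label: {
            "runs": len(values),
            "mean_ts": statistics.mean(values),
            "stdev_ts": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
        for label, values in per_test.items()
    }
```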
---
## Phase-2 unblock — prerequisites (NEW, Phase-1 outflow)
Phase-1 audit (2026-05-19) reframed Phase-2 from "design rknpu backend
interface" to a concrete bringup sequence. See
`docs/npu-mainline-status.md` for full context.
- [ ] Patch boltzmann board DTS / overlay to flip
  `npu@fdab0000`, `npu@fdac0000`, `npu@fdad0000` (plus their matching
  MMU nodes) from `status = "disabled"` → `"okay"`. Rebuild DTB.
- [ ] **Mitigate IOMMU v1.0 hazard before first NPU job** (32 GB host).
  Pick one:
  - (A) Boot with `mem=4G` for first-bringup validation, OR
  - (B) Carry local patches: Simon Xue's per-device-ops patch
    (<https://lore.kernel.org/all/20260310105303.128859-1-xxm@rock-chips.com/>),
    plus Midgy's `rockchip,rk3568-iommu-v1` discriminator compat,
    plus a DT update moving `rknpu_mmu` to the new compat.
- [ ] `modprobe rocket` and confirm `/dev/accel/accel0..2` appear, no
probe errors in dmesg, IOMMU faults absent.
- [ ] Read `drivers/accel/rocket/rocket_job.c` + Mesa Rocket Gallium to
determine submit-job uAPI capabilities — specifically whether
we can express a transformer matmul as a tile/op the NPU pipeline
accepts, or whether we need additional op coverage upstream.
- [ ] Decide matmul strategy (Phase-2 deliverable):
conv-1×1 shoehorn / extend rocket op set / thinner submit shim.
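
The conv-1×1 shoehorn rests on a simple identity. A matmul Y = X·W, with X of shape M×K and W of shape K×N, equals a 1×1 convolution over a 1×M "image" with K input channels and N output channels, using W transposed as the kernel. The pure-Python check below shows only that reshaping. It says nothing about INT8 quantization, tensor layout, or size limits the rocket pipeline may impose, and the function names are hypothetical.

```python
# Pure-Python illustration of the conv-1x1 shoehorn; names are hypothetical
# and nothing here reflects rocket's actual tensor layout or limits.


def matmul(x, w):
    """Reference (M x K) @ (K x N) -> (M x N)."""
    k, n = len(w), len(w[0])
    return [[sum(row[i] * w[i][j] for i in range(k)) for j in range(n)] for row in x]


def conv1x1_hwc(image, kernel):
    """1x1 convolution over an H x W x C_in image with a C_out x C_in kernel.

    With a 1x1 window, each output pixel is just a dot product over channels.
    """
    return [
        [[sum(px[c] * kernel[o][c] for c in range(len(px))) for o in range(len(kernel))]
         for px in row]
        for row in image
    ]


def matmul_via_conv1x1(x, w):
    """Compute x @ w by treating each row of x as one pixel with K channels."""
    k, n = len(w), len(w[0])
    image = [x]                                                # H=1, W=M, C_in=K
    kernel = [[w[i][j] for i in range(k)] for j in range(n)]  # C_out=N: w transposed
    return conv1x1_hwc(image, kernel)[0]                       # drop H=1 -> M x N
```

Because the window is 1×1, no spatial mixing happens. Other layouts (e.g. H=M, W=1) are mathematically equivalent, so the choice would be driven by whatever shape limits the NPU turns out to have.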
## Standing items — track upstream
- [ ] Watch `drivers/iommu/rockchip-iommu.c` for the discriminator
`rockchip,rk3568-iommu-v1` compat to land; drop local patch (B)
when it does.
- [ ] Watch `linux-rockchip` for the next iteration of Midgy / Simon's
thread (last visible activity 2026-04-03).
- [ ] Watch `drivers/accel/rocket/` for matmul / GEMM op additions.
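
For the IOMMU watch item, a hypothetical Python helper can check a local kernel checkout for the compat string. It only counts the string as an exact quoted literal (e.g. an entry in the driver's `of_device_id` table), so a shorter prefix such as `rockchip,rk3568-iommu` does not count as a match. It only shows that the string is present, not which release it landed in.

```python
import re
from pathlib import Path

WATCHED_COMPAT = "rockchip,rk3568-iommu-v1"


def compat_in_source(source_text: str, compat: str = WATCHED_COMPAT) -> bool:
    """True if compat appears as an exact quoted string literal."""
    return re.search('"' + re.escape(compat) + '"', source_text) is not None


def check_checkout(tree: str, compat: str = WATCHED_COMPAT) -> bool:
    """Check drivers/iommu/rockchip-iommu.c in a local kernel checkout."""
    path = Path(tree) / "drivers" / "iommu" / "rockchip-iommu.c"
    return compat_in_source(path.read_text(errors="replace"), compat)
```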
## Cross-phase / standing items (older)
- [ ] Mirror Tomeu's branch — superseded: code is now in-tree.
Keep `git.kernel.org/.../torvalds/linux.git` checkout pinned to
the boltzmann kernel rev for in-tree reading.
- [ ] Set up serial console on boltzmann for kernel-panic recovery
(Quark umbrella; check current state) — **becomes load-bearing
once we start poking IOMMU code.**
- [x] Add `project_rosenblatt_overview.md` + `project_rocket_upstream_state.md`
to claude-memory — done 2026-05-19.
- [ ] Decide repo home: marfrit/rosenblatt on git.reauktion.de
(probably yes, after Phase-1 substrate is captured).
- [ ] **Resolve board-name discrepancy.** README and
`fleet/boltzmann.yaml` say boltzmann is a "Rock 5 ITX+" /
`rock-5-itx-plus`; the running DT reports
`model = "Radxa ROCK 5 ITX"`, `compatible = "radxa,rock-5-itx"`.
Confirm physical board model (Radxa sells both SKUs) and
either correct the README + manifest, or note that we boot
the plain-ITX DT on ITX+ hardware (likely fine; ITX+ is mostly
a connectivity-refresh, same SoC + same NPU silicon).
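
To settle the board-name question from the running system, a small Python sketch can read `model` and `compatible` from `/proc/device-tree`. Device-tree string properties are NUL-terminated and string lists are NUL-separated, so the raw bytes are split on `\0`. This reports what the booted DT claims, not what the physical board is, so it complements checking the board itself. The function names are hypothetical.

```python
from pathlib import Path


def parse_dt_strings(raw: bytes) -> list[str]:
    """Split a DT string or string-list property into its strings.

    DT strings are NUL-terminated and string lists are NUL-separated,
    so empty pieces (from the trailing terminator) are dropped.
    """
    return [piece.decode("utf-8", "replace") for piece in raw.split(b"\0") if piece]


def board_identity(dt_root: str = "/proc/device-tree") -> dict:
    """Model and compatible list from the running kernel's device tree."""
    root = Path(dt_root)
    model = parse_dt_strings((root / "model").read_bytes())
    return {
        "model": model[0] if model else None,
        "compatible": parse_dt_strings((root / "compatible").read_bytes()),
    }


if __name__ == "__main__" and Path("/proc/device-tree/model").exists():
    print(board_identity())
```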