Rosenblatt Phase-1 closeout: rocket-driver substrate inventory

Phase-1 audit closes with a substantively different picture than the
original scaffold's TBDs:

- Tomeu Vizoso's RK3588 NPU work merged in Linux 6.18 (Nov 2025) under
  codename `rocket` (NOT `rknpu`).  All references updated.
- Boltzmann's `linux-rk3588-marfrit-A1` (7.0.0-rc3-ARCH+) already ships
  `drivers/accel/rocket/rocket.ko` as a built-but-not-loaded module.
- DT bindings + per-core nodes (`npu@fdab/c/d_0000`,
  compatible `rockchip,rk3588-rknn-core`) in mainline since 6.18 but
  ship `status = "disabled"` — board enable is the Phase-2 unblock,
  not a driver port.
- Mesa 25.3 ships Rocket Gallium + Teflon TFLite delegate as the
  authoritative userspace reference for the uAPI shape.
- Op coverage today is conv-centric (MobileNet-class); transformer
  matmul needs the conv-1×1 shoehorn (RKNPU2 BSP precedent) or rocket
  op-set additions.  Surfaced as Phase-2-load-bearing risk.
- IOMMU v1.0 hazard: 32 GB host needs `mem=4G` or local
  `rockchip,rk3568-iommu-v1` discriminator patches before the first
  NPU job, to avoid DMA-window faults.

Files:
- docs/npu-mainline-status.md: full audit table with upstream pointers
  (kernel.org / Mesa docs / dri-devel patch URLs / Tomeu's "we are in
  mainline" blog post).
- docs/phases.md: per-phase log entry for Phase-1 closeout.
- docs/op-coverage.md: matmul-vs-conv-vs-rocket-op-set framing.
- fleet/boltzmann.yaml: audited kernel + npu_driver + dt_npu_nodes
  state.
- kernel/dt-overlays/rk3588-rosenblatt-npu-enable.dtso: overlay to
  flip the three rknn-core nodes to "okay" (+ matching mmu nodes),
  carries the IOMMU-mitigation warning inline.
- kernel/README.md: kernel-agent scope wiring + anticipated local
  carry patches.
- README.md: phase-status table + "rknpu → rocket" rename note.
- TODO.md: Phase-2 unblock concrete steps + standing
  upstream-watch items.
2026-05-19 12:41:31 +00:00
parent 24adc74812
commit c9a3f5c600
8 changed files with 731 additions and 61 deletions
@@ -0,0 +1,111 @@
# phases — Rosenblatt per-phase log
One entry per phase as it closes. Stores the *findings* (what we
learned that future-us shouldn't have to rediscover) and the *next
gate* (what Phase N+1 needs from us). Lives alongside the
campaign-level README; this file is the durable journal.

---
## Phase 0 — bootstrap
**Closed:** 2026-05-19
**Deliverable:** repo scaffold (README, TODO, docs/, kernel/, userspace/,
fleet/, benchmarks/), one initial commit `Rosenblatt: project scaffold
for RK3588 NPU on mainline`.

---
## Phase 1 — substrate audit
**Closed:** 2026-05-19
**Deliverable:** `docs/npu-mainline-status.md` table fully populated;
`fleet/boltzmann.yaml` kernel/NPU-driver block filled with live data;
clear answer to the "accel uAPI vs. own MMIO driver" question.
### Findings
1. **`rocket` driver is in mainline** — Tomeu Vizoso's NPU work merged
to torvalds in Linux 6.18 (Nov 2025) as `drivers/accel/rocket/`,
Kconfig `DRM_ACCEL_ROCKET`. Driver author + history kept the same
shape, but the **upstream name is `rocket`, not `rknpu`**:
searching for `rknpu` in mainline misses everything.
2. **Boltzmann already ships the driver**: `linux-rk3588-marfrit-A1`
(7.0.0-rc3-ARCH+) is post-6.18 and contains `rocket.ko` at
`/lib/modules/.../drivers/accel/rocket/rocket.ko`, marked
`intree: Y`. The module aliases to `rockchip,rk3588-rknn-core`,
matching the DT compatibles on the box.
3. **DT nodes present but disabled**: `rk3588-base.dtsi` defines
`rknn_core_0/1/2` at `0xfdab/c/d_0000` with compat
`rockchip,rk3588-rknn-core`; all three boot with `status =
"disabled"`. No board file enables them. The per-core IOMMUs
`rknn_mmu_0/1/2` at `0xfdab9/aca/ada_000` are also disabled.
4. **Userspace is Mesa Rocket Gallium + Teflon** — shipped in Mesa
25.3. `src/gallium/drivers/rocket/` in mesa3d main is the
authoritative reference for regcmd construction. `rkt_regcmd.c`
is ~700 lines, single-conv emit path. No matmul-specific code.
5. **Kernel is a thin shim**: `drivers/accel/rocket/rocket_drv.c`
exposes a single facade, `/dev/accel/accel0`, for all probed RKNN
cores. The uAPI is 4 ioctls (CREATE_BO, SUBMIT, PREP_BO, FINI_BO).
`rocket_job_hw_submit()` powers the NPU, points
`PC_BASE_ADDRESS` at the regcmd, sets `PC_REGISTER_AMOUNTS`, and
then sets `PC_OPERATION_ENABLE.OP_EN = 1`. Everything else is the
userspace-built regcmd buffer.
6. **NPU sub-blocks** identified from `rocket_registers.h` interrupt
masks: **PC** (program controller), **CNA** (conv neural-net
accel; FEATURE / WEIGHT / CSC channels), **CORE** (MAC array),
**DPU** (data processing unit, requant), **PPU**
(post-processing — not exercised by Mesa today). Each per-core
block has 2 parallel channels for double-buffering.
7. **Matmul-as-conv-1×1 is the only viable path**: confirmed by
reading Mesa's emit path. INT8 matmul `Y[M,N] = X[M,K] @ W[K,N]`
maps cleanly to a convolution with a 1×1 kernel (width = height = 1),
`DATAIN_CHANNEL=K`, `WEIGHT_KERNELS=N`; each of the `M` rows of `X`
becomes one spatial position of the input feature map. The vendor
RKNPU2 stack does the same shoehorn.
8. **IOMMU v1.0 hazard surfaced from `linux-rockchip` thread**
(Midgy BALON / Simon Xue, 2026-04-03). The NPU IOMMU is v1.0 IP
bound to generic `rockchip,rk3568-iommu` — driven via the v2.0
code path. v1.0 can't allocate its DTE above 4 GB. Boltzmann
has 32 GB. Naive enable will silently fault. Discriminator-compat
patch series planned but **not landed in mainline master as of
2026-05-19** (verified via cgit on
`drivers/iommu/rockchip-iommu.c`).
9. **Vendor stack is off-limits**: `librknnrt.so` is a closed
binary blob under a restrictive Rockchip license. The BSP
`rockchip-linux/kernel` `drivers/rknpu/` source is permitted as
a spec-extraction reference only.
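
The mapping in Finding 7 can be checked numerically. The sketch below is a pure-Python illustration, not Mesa's or the vendor's encoding: the helper names (`matmul`, `conv1x1`, `matmul_as_conv1x1`) are hypothetical, and laying the `M` rows of `X` along the feature-map height (with width 1) is an assumption made for the example. The hardware layout may differ.

```python
def matmul(x, w):
    """Reference Y[M,N] = X[M,K] @ W[K,N] with plain integer accumulation."""
    k, n = len(w), len(w[0])
    return [[sum(row[c] * w[c][j] for c in range(k)) for j in range(n)]
            for row in x]


def conv1x1(feature, kernels):
    """1x1 convolution, stride 1, no padding.

    feature: [H][W][C_in] input feature map
    kernels: [C_out][C_in], each kernel being 1x1xC_in
    returns: [H][W][C_out]
    """
    return [[[sum(px[c] * kern[c] for c in range(len(px))) for kern in kernels]
             for px in row]
            for row in feature]


def matmul_as_conv1x1(x, w):
    """Express X @ W as a 1x1 conv: input channels = K, kernel count = N."""
    k, n = len(w), len(w[0])
    # Assumed layout: H = M, W = 1, so each row of X is one pixel with K channels.
    feature = [[row] for row in x]
    # One kernel per output column of W, i.e. the kernels are W transposed.
    kernels = [[w[c][j] for c in range(k)] for j in range(n)]
    out = conv1x1(feature, kernels)      # shape [M][1][N]
    return [pixels[0] for pixels in out]  # drop the width-1 axis -> [M][N]


if __name__ == "__main__":
    x = [[1, -2, 3], [4, 5, -6]]    # M=2, K=3, INT8-range values
    w = [[1, 2], [3, 4], [5, 6]]    # K=3, N=2
    assert matmul_as_conv1x1(x, w) == matmul(x, w)
```

The only data movement the reformulation needs is the transpose of `W` into `N` kernels of depth `K`; requantisation and the regcmd encoding are out of scope here.
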
### Phase exit decision
**Drive via the `rocket` DRM-accel uAPI.** Writing our own MMIO
driver would mean re-implementing IOMMU integration, power-domain
sequencing, and fence/sched plumbing that's already in-tree and
production-validated by Mesa Teflon consumers. The Phase-2 unblock
list is short: DT enable + IOMMU mitigation + `modprobe rocket`.
### Phase outflow → TODO
Captured in `TODO.md` "Phase-2 unblock" section. Highlights:
- Apply `kernel/dt-overlays/rk3588-rosenblatt-npu-enable.dtso` (or
equivalent board-DTS patch) to boltzmann.
- Mitigate IOMMU v1.0 hazard before first NPU job: `mem=4G` boot or
local discriminator-compat carry.
- `modprobe rocket`, confirm `/dev/accel/accel0`, no IOMMU faults.
- Read `rkt_regcmd.c`, `rkt_ml.c`, `rkt_task.c`, `rkt_coefs.c` from
Mesa for the conv-1×1 matmul encoding details (op-coverage.md
has the first cut).
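
The first three items above lend themselves to a scripted preflight. The following is a minimal sketch under stated assumptions, not an existing tool in this repo: all three helper names are hypothetical, `MemTotal` is used only as a coarse proxy for "RAM may sit above 4 GiB" (it reports usable memory, not the highest physical address), and the dmesg scan is a plain substring heuristic that does not depend on any exact kernel log format.

```python
import re

FOUR_GIB_KIB = 4 * 1024 * 1024  # 4 GiB in KiB, the unit /proc/meminfo reports


def dt_node_enabled(raw_status):
    """Interpret a devicetree `status` property given as raw bytes.

    Pass None when the property is absent; per the devicetree
    convention, a node without `status` counts as enabled.
    """
    if raw_status is None:
        return True
    return raw_status.rstrip(b"\x00").decode("ascii") in ("okay", "ok")


def ram_exceeds_4gib(meminfo_text):
    """Coarse check: does MemTotal in /proc/meminfo text exceed 4 GiB?"""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) > FOUR_GIB_KIB


def iommu_fault_lines(dmesg_text):
    """Heuristic: return log lines that mention both 'iommu' and 'fault'."""
    return [line for line in dmesg_text.splitlines()
            if "iommu" in line.lower() and "fault" in line.lower()]
```

A caller would feed these with the bytes of each NPU node's `status` file under `/proc/device-tree`, the contents of `/proc/meminfo`, and captured `dmesg` output, and would separately check that `/dev/accel/accel0` exists after `modprobe rocket`. If `ram_exceeds_4gib` returns true and no discriminator-compat carry is applied, the IOMMU v1.0 hazard from Finding 8 should be treated as live.
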
### Memory persisted
- `project_rosenblatt_overview.md`
- `project_rocket_upstream_state.md` (note: name is `rocket`, not
`rknpu`)
- `project_iommu_v1_hazard.md`
---
## Phase 2 — formulate (open)
**Status:** open as of 2026-05-19. See `TODO.md` and
`docs/op-coverage.md` for current state of the formulation.