Phase-1 audit closes with a substantively different picture than the original scaffold's TBDs:

- Tomeu Vizoso's RK3588 NPU work merged in Linux 6.18 (Nov 2025) under codename `rocket` (NOT `rknpu`). All references updated.
- Boltzmann's `linux-rk3588-marfrit-A1` (7.0.0-rc3-ARCH+) already ships `drivers/accel/rocket/rocket.ko` as a built-but-not-loaded module.
- DT bindings + per-core nodes (`npu@fdab/c/d_0000`, compatible `rockchip,rk3588-rknn-core`) in mainline since 6.18 but ship `status = "disabled"`; board enable is the Phase-2 unblock, not a driver port.
- Mesa 25.3 ships Rocket Gallium + Teflon TFLite delegate as the authoritative userspace reference for the uAPI shape.
- Op coverage today is conv-centric (MobileNet-class); transformer matmul needs the conv-1×1 shoehorn (RKNPU2 BSP precedent) or rocket op-set additions. Surfaced as Phase-2-load-bearing risk.
- IOMMU v1.0 hazard: 32 GB host needs `mem=4G` or local `rockchip,rk3568-iommu-v1` discriminator patches before the first NPU job, to avoid DMA-window faults.

Files:

- docs/npu-mainline-status.md: full audit table with upstream pointers (kernel.org / Mesa docs / dri-devel patch URLs / Tomeu's "we are in mainline" blog post).
- docs/phases.md: per-phase log entry for Phase-1 closeout.
- docs/op-coverage.md: matmul-vs-conv-vs-rocket-op-set framing.
- fleet/boltzmann.yaml: audited kernel + npu_driver + dt_npu_nodes state.
- kernel/dt-overlays/rk3588-rosenblatt-npu-enable.dtso: overlay to flip the three rknn-core nodes to "okay" (+ matching mmu nodes), carries the IOMMU-mitigation warning inline.
- kernel/README.md: kernel-agent scope wiring + anticipated local carry patches.
- README.md: phase-status table + "rknpu → rocket" rename note.
- TODO.md: Phase-2 unblock concrete steps + standing upstream-watch items.

# phases — Rosenblatt per-phase log

One entry per phase as it closes. Stores the *findings* (what we
learned that future-us shouldn't have to rediscover) and the *next
gate* (what Phase N+1 needs from us). Lives alongside the
campaign-level README; this file is the durable journal.

---

## Phase 0 — bootstrap

**Closed:** 2026-05-19
**Deliverable:** repo scaffold (README, TODO, docs/, kernel/, userspace/,
fleet/, benchmarks/), one initial commit `Rosenblatt: project scaffold
for RK3588 NPU on mainline`.

---

## Phase 1 — substrate audit

**Closed:** 2026-05-19
**Deliverable:** `docs/npu-mainline-status.md` table fully populated;
`fleet/boltzmann.yaml` kernel/NPU-driver block filled with live data;
clear answer to the "accel uAPI vs. own MMIO driver" question.

### Findings

1. **`rocket` driver is in mainline.** Tomeu Vizoso's NPU work merged
   to torvalds in Linux 6.18 (Nov 2025) as `drivers/accel/rocket/`,
   Kconfig `DRM_ACCEL_ROCKET`. Driver author and history kept the same
   shape, but the **upstream name is `rocket`, not `rknpu`**:
   searching for `rknpu` in mainline misses everything.
2. **Boltzmann already ships the driver.** `linux-rk3588-marfrit-A1`
   (7.0.0-rc3-ARCH+) is post-6.18 and contains `rocket.ko` at
   `/lib/modules/.../drivers/accel/rocket/rocket.ko`, marked
   `intree: Y`. The module aliases to `rockchip,rk3588-rknn-core`,
   matching the DT compatibles on the box.
3. **DT nodes present but disabled.** `rk3588-base.dtsi` defines
   `rknn_core_0/1/2` at `0xfdab/c/d_0000` with compatible
   `rockchip,rk3588-rknn-core`; all three boot with
   `status = "disabled"`. No board file enables them. Per-core IOMMUs
   `rknn_mmu_0/1/2` at `0xfdab9/aca/ada_000` are also disabled.
4. **Userspace is Mesa Rocket Gallium + Teflon**, shipped in Mesa
   25.3. `src/gallium/drivers/rocket/` in mesa3d main is the
   authoritative reference for regcmd construction. `rkt_regcmd.c`
   is ~700 lines, single-conv emit path. No matmul-specific code.
5. **Kernel is a thin shim.** `drivers/accel/rocket/rocket_drv.c`
   exposes a single facade `/dev/accel/accel0` for all probed RKNN
   cores. The uAPI is 4 ioctls (CREATE_BO, SUBMIT, PREP_BO, FINI_BO).
   `rocket_job_hw_submit()` powers the NPU, points
   `PC_BASE_ADDRESS` at the regcmd, sets `PC_REGISTER_AMOUNTS`,
   and sets `PC_OPERATION_ENABLE.OP_EN = 1`. Everything else is the
   userspace-built regcmd buffer.
6. **NPU sub-blocks** identified from `rocket_registers.h` interrupt
   masks: **PC** (program controller), **CNA** (conv neural-net
   accel; FEATURE / WEIGHT / CSC channels), **CORE** (MAC array),
   **DPU** (data processing unit, requant), **PPU**
   (post-processing; not exercised by Mesa today). Each per-core
   block has 2 parallel channels for double-buffering.
7. **Matmul-as-conv-1×1 is the only viable path**, confirmed by
   reading Mesa's emit path. INT8 matmul `Y[M,N] = X[M,K] @ W[K,N]`
   maps cleanly to a conv with a 1×1 kernel (width = height = 1),
   `DATAIN_CHANNEL=K`, `WEIGHT_KERNELS=N`. The vendor RKNPU2 stack
   does the same shoehorn.
8. **IOMMU v1.0 hazard surfaced from `linux-rockchip` thread**
   (Midgy BALON / Simon Xue, 2026-04-03). The NPU IOMMU is v1.0 IP
   bound to the generic `rockchip,rk3568-iommu` compatible, so it is
   driven via the v2.0 code path. v1.0 can't allocate its DTE above
   4 GB. Boltzmann has 32 GB. Naive enable will silently fault. A
   discriminator-compat patch series is planned but **not landed in
   mainline master as of 2026-05-19** (verified via cgit on
   `drivers/iommu/rockchip-iommu.c`).
9. **Vendor stack is off-limits.** `librknnrt.so` is a closed
   binary blob under a restrictive Rockchip license. BSP
   `rockchip-linux/kernel` `drivers/rknpu/` source is permitted as
   a spec-extraction reference only.

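
The conv-1×1 mapping in finding 7 can be checked on paper with a small sketch. This is plain Python on nested lists, not Mesa's or the NPU's actual encoding: the function names are hypothetical, and laying the M rows out along the width of a 1-high image is an assumption made for illustration. Only the channel mapping (K input channels, N kernels) comes from the finding itself.

```python
def matmul(x, w):
    """Reference matmul on nested lists: x is M x K, w is K x N."""
    k, n = len(w), len(w[0])
    return [[sum(row[c] * w[c][j] for c in range(k)) for j in range(n)]
            for row in x]


def conv1x1(feature, kernels):
    """1x1 convolution.

    feature: [height][width][in_channels]
    kernels: [out_channels][in_channels] (each kernel is 1x1 spatially)
    returns: [height][width][out_channels]
    """
    return [[[sum(px[c] * kern[c] for c in range(len(px))) for kern in kernels]
             for px in row]
            for row in feature]


def matmul_via_conv1x1(x, w):
    """Map X[M,K] @ W[K,N] onto conv1x1 with K input channels and N kernels."""
    k, n = len(w), len(w[0])
    # Assumed layout: a 1-high, M-wide image whose pixels carry K channels.
    feature = [[list(row) for row in x]]
    # Column j of W becomes kernel j, holding K weights.
    kernels = [[w[c][j] for c in range(k)] for j in range(n)]
    return conv1x1(feature, kernels)[0]   # drop the height-1 axis -> M x N


X = [[1, -2, 3], [4, 5, -6]]        # M=2, K=3
W = [[7, 8], [9, -10], [11, 12]]    # K=3, N=2
assert matmul_via_conv1x1(X, W) == matmul(X, W)
```

Each output pixel is a dot product over the K channels, which is exactly one row-by-column product of the matmul; the spatial axes only enumerate the M rows.
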
### Phase exit decision

**Drive via the `rocket` DRM-accel uAPI.** Writing our own MMIO
driver would mean re-implementing IOMMU integration, power-domain
sequencing, and fence/sched plumbing that's already in-tree and
production-validated by Mesa Teflon consumers. The Phase-2 unblock
list is short: DT enable + IOMMU mitigation + `modprobe rocket`.

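
The IOMMU mitigation item comes down to one address constraint from the findings: the v1.0 IP cannot use a DTE placed above 4 GB. A minimal sketch of that check follows; it assumes the table occupies a single 4 KiB page and that "4 GB" means the 32-bit boundary. `reachable_by_v1` and `TABLE_SIZE` are hypothetical names, not kernel symbols.

```python
FOUR_GIB = 1 << 32    # first address that no longer fits in 32 bits
TABLE_SIZE = 4096     # assumed: one 4 KiB page per directory table


def reachable_by_v1(phys_addr, size=TABLE_SIZE):
    """True if [phys_addr, phys_addr + size) lies entirely below 4 GiB."""
    if phys_addr < 0 or size <= 0:
        raise ValueError("address must be non-negative and size positive")
    return phys_addr + size <= FOUR_GIB


# A table at 1 GiB is fine; one at 8 GiB (possible on a 32 GB host) is not.
assert reachable_by_v1(1 << 30)
assert not reachable_by_v1(8 << 30)
# The last page below the boundary fits; one straddling it does not.
assert reachable_by_v1(FOUR_GIB - TABLE_SIZE)
assert not reachable_by_v1(FOUR_GIB - TABLE_SIZE + 1)
```

Booting with `mem=4G` makes every allocation satisfy this condition by construction, at the cost of most of the host's RAM; the discriminator-compat carry instead lets the driver pick the right code path.
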
### Phase outflow → TODO

Captured in `TODO.md` "Phase-2 unblock" section. Highlights:

- Apply `kernel/dt-overlays/rk3588-rosenblatt-npu-enable.dtso` (or
  equivalent board-DTS patch) to boltzmann.
- Mitigate IOMMU v1.0 hazard before first NPU job: `mem=4G` boot or
  local discriminator-compat carry.
- `modprobe rocket`, confirm `/dev/accel/accel0`, no IOMMU faults.
- Read `rkt_regcmd.c`, `rkt_ml.c`, `rkt_task.c`, `rkt_coefs.c` from
  Mesa for the conv-1×1 matmul encoding details (op-coverage.md
  has the first cut).

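
The post-`modprobe` check in the third bullet could be scripted roughly as below. This is a sketch under stated assumptions: the fault patterns are illustrative guesses at what an IOMMU fault line looks like, not wording verified against the `rockchip-iommu` driver, and all function names are hypothetical.

```python
import os
import re

# Assumed patterns: guesses at IOMMU fault wording in the kernel log.
FAULT_PATTERNS = [re.compile(p, re.IGNORECASE)
                  for p in (r"iommu.*fault", r"page fault")]


def accel_node_present(path="/dev/accel/accel0"):
    """True if the accel device node exists."""
    return os.path.exists(path)


def find_iommu_faults(log_text):
    """Return kernel-log lines that match any assumed fault pattern."""
    return [line for line in log_text.splitlines()
            if any(p.search(line) for p in FAULT_PATTERNS)]


def bringup_ok(log_text, path="/dev/accel/accel0"):
    """Node present and no fault-looking lines in the supplied log."""
    return accel_node_present(path) and not find_iommu_faults(log_text)
```

On the target, `log_text` would typically be the captured output of `dmesg` taken after the first NPU job rather than right after module load, since the hazard only shows up once the IOMMU is actually used.
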
### Memory persisted

- `project_rosenblatt_overview.md`
- `project_rocket_upstream_state.md` (note: name is `rocket`, not
  `rknpu`)
- `project_iommu_v1_hazard.md`

---

## Phase 2 — formulate (open)

**Status:** open as of 2026-05-19. See `TODO.md` and
`docs/op-coverage.md` for current state of the formulation.