Markus Fritsche a2244675b1 npu-probe: backport mmap-leak fix + K>8192 backstop from ggml-rocket

bo_free() (munmap+close) fixes the per-tile mmap leak in rkt_npu_matmul; add
rocket_munmap_bo to librocket. rkt_matmul rejects K>8192 (int8 K-limit). Keeps
the canonical primitive in sync with the ggml-rocket backend copy in rk-llama.cpp.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EWpfhDgYNA21tETDP9ueBE

2026-07-14 11:32:13 +02:00

verify

Rosenblatt: golden-byte verification — port matches Mesa exactly

2026-07-13 20:56:16 +00:00

.gitignore

Rosenblatt U6: quantization bridge validated on NPU (float->quant->NPU->dequant)

2026-07-14 09:15:50 +02:00

cna_defs.h

Rosenblatt Phase-2: userspace regcmd builder — U3 wrapper + CNA stage (4a)

2026-07-13 19:23:34 +00:00

cna_reference.txt

Rosenblatt Phase-2: userspace regcmd builder — U3 wrapper + CNA stage (4a)

2026-07-13 19:23:34 +00:00

coredpu_defs.h

Rosenblatt Phase-2: userspace regcmd builder — U3 wrapper + CNA stage (4a)

2026-07-13 19:23:34 +00:00

coredpu_reference.txt

Rosenblatt Phase-2: userspace regcmd builder — U3 wrapper + CNA stage (4a)

2026-07-13 19:23:34 +00:00

gemmtest.c

Rosenblatt U6: per-output-channel requant in rkt_npu_matmul

2026-07-14 09:13:48 +02:00

ggmlbridge.c

npu-probe: ggml-linked bridge test — NPU reproduces ggml Q8_0 mul_mat

2026-07-14 09:25:03 +02:00

hwtest.c

Rosenblatt: FIRST CORRECT INT8 MATMUL ON THE MAINLINE ROCKET NPU

2026-07-14 08:36:14 +02:00

librocket.c

npu-probe: backport mmap-leak fix + K>8192 backstop from ggml-rocket

2026-07-14 11:32:13 +02:00

librocket.h

npu-probe: backport mmap-leak fix + K>8192 backstop from ggml-rocket

2026-07-14 11:32:13 +02:00

Makefile

Rosenblatt U6: GEMM tiler — block a large matmul into rocket-op-sized tiles

2026-07-13 21:20:46 +00:00

quantbridge.c

Rosenblatt U6: quantization bridge validated on NPU (float->quant->NPU->dequant)

2026-07-14 09:15:50 +02:00

README.md

Rosenblatt: project scaffold for RK3588 NPU on mainline

2026-05-19 11:57:48 +00:00

rkt_gemm.c

Rosenblatt U6: GEMM tiler — block a large matmul into rocket-op-sized tiles

2026-07-13 21:20:46 +00:00

rkt_gemm.h

Rosenblatt U6: GEMM tiler — block a large matmul into rocket-op-sized tiles

2026-07-13 21:20:46 +00:00

rkt_matmul_cna.c

Rosenblatt: golden-byte verification — port matches Mesa exactly

2026-07-13 20:56:16 +00:00

rkt_matmul_cna.h

Rosenblatt Phase-2: userspace regcmd builder — U3 wrapper + CNA stage (4a)

2026-07-13 19:23:34 +00:00

rkt_matmul_coredpu.c

Rosenblatt: golden-byte verification — port matches Mesa exactly

2026-07-13 20:56:16 +00:00

rkt_matmul_coredpu.h

Rosenblatt 4b+4c: CORE+DPU+PC regcmd stage — full builder compiles

2026-07-13 19:49:10 +00:00

rkt_matmul.c

npu-probe: backport mmap-leak fix + K>8192 backstop from ggml-rocket

2026-07-14 11:32:13 +02:00

rkt_matmul.h

Rosenblatt: expose real requant + bias in the matmul builder API

2026-07-14 08:44:29 +02:00

rkt_npu_matmul.c

npu-probe: backport mmap-leak fix + K>8192 backstop from ggml-rocket

2026-07-14 11:32:13 +02:00

rkt_npu_matmul.h

Rosenblatt U6: per-output-channel requant in rkt_npu_matmul

2026-07-14 09:13:48 +02:00

rkt_operands.c

Rosenblatt: NPU operand layout (weights/features/bias/output tiling) + hwtest v2

2026-07-13 23:56:11 +02:00

rkt_operands.h

Rosenblatt: NPU operand layout (weights/features/bias/output tiling) + hwtest v2

2026-07-13 23:56:11 +02:00

rocket_accel.h

Rosenblatt Phase-2: userspace regcmd builder — U3 wrapper + CNA stage (4a)

2026-07-13 19:23:34 +00:00

selftest.c

Rosenblatt U6: GEMM tiler — block a large matmul into rocket-op-sized tiles

2026-07-13 21:20:46 +00:00

README.md

npu-probe

Smallest-possible userspace binary that:

1. Opens the NPU device (path TBD per Phase-1 audit)
2. Allocates two INT8 input tensors (64×64) + one output (64×64)
3. Submits a matmul via the uAPI in use (Tomeu's accel ioctl OR our own shim around vendor MMIO if accel-mainline isn't ready)
4. Waits for completion (DMA fence or polled completion register)
5. Reads back the output
6. Compares to a CPU INT8 matmul reference; reports pass/fail
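
The final comparison step can be sketched without touching the device. The snippet below is an illustrative CPU reference, not the code in this repo: the function names `ref_matmul_s8` and `count_mismatches` are hypothetical, and it assumes row-major operands with 32-bit accumulators (the README does not state the output element type).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Reference matmul: C[i][j] = sum over p of A[i][p] * B[p][j].
 * A is m x k, B is k x n, C is m x n, all row-major.
 * Products are widened to int32 so the sum cannot wrap for the
 * sizes discussed here.
 */
static void ref_matmul_s8(const int8_t *a, const int8_t *b, int32_t *c,
                          size_t m, size_t n, size_t k)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            int32_t acc = 0;
            for (size_t p = 0; p < k; p++)
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Count elements that differ between device output and reference; 0 means pass. */
static size_t count_mismatches(const int32_t *got, const int32_t *want,
                               size_t count)
{
    size_t bad = 0;
    for (size_t i = 0; i < count; i++) {
        if (got[i] != want[i])
            bad++;
    }
    return bad;
}
```

Reporting a mismatch count rather than a bare boolean helps tell a layout bug (most elements wrong) from a rounding or saturation bug (a few elements wrong).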

Phase-1 deliverable. Until this works, nothing else in this repo can be exercised against real silicon.

Build

(filled when Phase-1 audit picks the uAPI shape — meson or cmake, no autotools)

Run

./npu-probe                      # default 64×64 INT8 matmul
./npu-probe --shape 128,128,128  # M,N,K override
./npu-probe --device /dev/accel/accel0    # override device path
./npu-probe --golden golden_64x64.bin     # provide expected output for diff
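
As a sketch of how the `--shape M,N,K` argument shown above might be validated, the helper below parses exactly three positive decimal integers separated by commas. `parse_shape` is a hypothetical name, and the real option handling in npu-probe may look different.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse "M,N,K" into dims[0..2].
 * Returns 0 on success, -1 on malformed input: missing fields, extra
 * fields, signs, whitespace, zero, or out-of-range values.
 */
static int parse_shape(const char *s, unsigned long dims[3])
{
    for (int i = 0; i < 3; i++) {
        char *end;

        /* strtoul() accepts leading spaces and signs; require a digit first. */
        if (*s < '0' || *s > '9')
            return -1;

        errno = 0;
        unsigned long v = strtoul(s, &end, 10);
        if (errno != 0 || v == 0)
            return -1;
        dims[i] = v;

        if (i < 2) {
            if (*end != ',')
                return -1;      /* expected another field */
            s = end + 1;
        } else if (*end != '\0') {
            return -1;          /* trailing characters after K */
        }
    }
    return 0;
}
```

Rejecting malformed shapes before any buffer is allocated keeps a typo on the command line from turning into a confusing device-side failure.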

Why C, not Python

npu-probe drives the NPU through direct ioctl, dmabuf, and mmap calls. A Python wrapper layer would obscure the exact syscall sequence we need to understand. Once npu-probe works, a Python binding for benchmark scripts is fine.