marfrit 06d3d0d726 benchmark: AI-Ghidra landscape + case-4 harness (synthetic PHY)
- benchmark/ai_ghidra/SETUP.md documents the GhidrAssist 1.5.0 install
  at /opt/ghidra/Ghidra/Extensions/GhidrAssist/ on oppenheimer (CT131),
  with dirac endpoints (Hermes-2-Pro 8B @ :8080, Qwen-coder 1.5B @ :8081)
  already reachable + tested. Final enable+config is UI-only; two
  clicks on next Ghidra launch.
- gdb_debug/harness.c extended with case 4 = train_phy_block running
  under a synthetic PHY at 0x40000000. Static MMIO shim satisfies
  polls 1-3; poll 4 needs dynamic state-machine (next session, via
  SIGBUS handler or ptrace) — documented in the README.

Vendor tree investigation: Rockchip's own sdram_rk3588.c / sdram_rk3568.c
are STUBS (return -1). No free function names from there. Path forward:
mine the vendor kernel's rockchip_dmc.c (devfreq DDR scaling driver)
for register-offset naming hints at runtime-call level.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:03:43 +02:00

gdb_debug — single-step the benchmark functions under GDB

The harness wraps each of 01_memset / 02_memcpy32 / 03_magic_memset in C: it copies the raw bytes into an RWX buffer and calls them through a function pointer. With GDB attached to the harness you can step every machine instruction of the real blob code. No QEMU is needed because boltzmann (and ampere, ohm, hertz) are natively aarch64.
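The load step described above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: the name load_blob is hypothetical, and it assumes Linux mmap plus the GCC/Clang builtin __builtin___clear_cache to keep the instruction cache coherent on aarch64.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `len` raw code bytes into a fresh RWX mapping.
 * Returns the executable copy, or NULL if the mapping could not be created. */
void *load_blob(const void *code, size_t len)
{
    assert(code != NULL && len > 0);

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    memcpy(buf, code, len);

    /* aarch64 has separate I- and D-caches; flush before jumping in. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);
    return buf;
}
```

On an aarch64 host the returned pointer would then be cast to a function pointer of whatever signature the blob expects (POSIX allows this conversion) and called from the dispatcher, which is where the GDB breakpoint sits.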

Build

make                 # builds ./gdb_debug.elf natively on aarch64

The cross-build recipe (if you ever want to run on x86 oppenheimer via qemu-user) lives in the Makefile: replace gcc with aarch64-linux-gnu-gcc and ld with aarch64-linux-gnu-ld, launch the binary under qemu-aarch64-static -g 1234 ./gdb_debug.elf 1, then attach from gdb-multiarch with target remote :1234.

Run under GDB

gdb ./gdb_debug.elf
(gdb) set pagination off
(gdb) layout split            # TUI: source / asm / regs split
(gdb) break call_func         # the dispatcher — one breakpoint catches all three
(gdb) run 1                   # 1=memset  2=memcpy32  3=magic_memset
(gdb) stepi                   # one machine instruction
(gdb) info reg                # full register dump
(gdb) x/8i $pc                # peek 8 upcoming instructions
(gdb) display/i $pc           # auto-show next instruction on every stop
(gdb) x/16bx $x0              # hex-dump 16 bytes from what X0 points at

What to look for

Function 1 (memset)

After MOV X3, #0, each iteration runs CMP X2, X3 → B.NE → STRB W1, [X0, X3] → ADD X3, X3, #1 → back. Watch $x3 advance, and inspect x/16bx $x0 to see the buffer filling with 0xAB.
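As a reading aid while stepping, the loop above corresponds roughly to the C model below. It is a sketch reconstructed from the instruction sequence listed here, not the blob's source; the function name model_memset is hypothetical and the variables are named after the registers they mirror.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of function 1: X0 = destination, W1 = fill value,
 * X2 = byte count, X3 = loop index. */
void model_memset(void *x0, uint32_t w1, uint64_t x2)
{
    uint8_t *dst = x0;
    assert(dst != NULL || x2 == 0);

    uint64_t x3 = 0;               /* MOV  X3, #0          */
    while (x2 != x3) {             /* CMP  X2, X3 ; B.NE   */
        dst[x3] = (uint8_t)w1;     /* STRB W1, [X0, X3]    */
        x3 += 1;                   /* ADD  X3, X3, #1      */
    }
}
```

Each pass through the while body matches one trip around the loop in GDB, so $x3 should equal the number of bytes already written.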

Function 2 (memcpy32)

First instruction is the alignment mask: AND X2, X2, #0xfffffffc. Set a watchpoint on $x2 to catch the mask, then step the loop to watch 4-byte transfers: LDR W4, [X1, X3] ; STR W4, [X0, X3] ; ADD X3, X3, #4.
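The mask-then-copy behaviour can be modelled in C as below. This is a sketch under assumptions: the name model_memcpy32 is hypothetical, and the loop exit test is assumed to compare X3 against the masked X2 as in function 1, since the text above does not list it. Note that a 64-bit AND with #0xfffffffc also clears the upper 32 bits of X2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of function 2: X0 = destination, X1 = source,
 * X2 = byte count, X3 = loop index, W4 = the word in flight. */
void model_memcpy32(void *x0, const void *x1, uint64_t x2)
{
    uint8_t *dst = x0;
    const uint8_t *src = x1;
    assert((dst != NULL && src != NULL) || x2 < 4);

    x2 &= 0xfffffffcULL;                 /* AND X2, X2, #0xfffffffc */

    /* Exit condition is an assumption (see lead-in). */
    for (uint64_t x3 = 0; x3 < x2; x3 += 4) {   /* ADD X3, X3, #4   */
        uint32_t w4;
        memcpy(&w4, src + x3, 4);        /* LDR W4, [X1, X3]        */
        memcpy(dst + x3, &w4, 4);        /* STR W4, [X0, X3]        */
    }
}
```

With a length of 10, for example, the mask leaves 8, so two words are copied and the last two bytes of the destination stay untouched.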

Function 3 (magic_memset)

This function raises SIGSEGV on LDR W2, [X0, #4] because X0 = 0x1fe000 is unmapped in the harness's user-mode address space. That crash is itself the verification: it proves the function really does target that absolute address. To execute the full path, add before call_func:

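/* Needs <sys/mman.h>, <stdint.h>, <stdio.h>, <stdlib.h>. */
void *page = mmap((void*)0x1fe000, 4096, PROT_READ|PROT_WRITE,
                  MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
if (page == MAP_FAILED) { perror("mmap 0x1fe000"); exit(1); }
*(volatile uint32_t*)0x1fe004 = 0x54410001;  /* magic word read by LDR W2, [X0, #4] */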

Then the magic check passes and GDB steps into the tail-call to memset.

Why this scaffold beats ddr_emu2 for verifying trampolines

ddr_emu2 dies at PC=0x10a80 because the emulator can't model an MMIO register, which is a blind spot for us. Native GDB on an aarch64 host runs the actual CPU with full instruction fidelity; the limit becomes "can we fake the MMIO responses?" rather than "does the emulator know this instruction?". For compute-only code (functions 1 and 2), no prep is needed. For MMIO-touching code, mmap(MAP_FIXED) plus a signal-handler stub can serve as a synthetic PHY. That is the path to single-stepping a patched trampoline through the real ISA with fake hardware replies, which is exactly what the next round of v3fb verification would need.
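One way the mmap(MAP_FIXED) plus signal-handler idea could look is sketched below. Everything specific here is an assumption for illustration: the base address follows the 0x40000000 mentioned in the commit notes for case 4, while the status offset, the ready bit, and the names phy_install / phy_fault are hypothetical. The page starts as PROT_NONE, so on Linux the first access raises SIGSEGV; the handler then unprotects the page, plants a "ready" value, and returns so the faulting instruction retries. Calling mprotect from a handler works on Linux in practice but is not on the POSIX async-signal-safe list.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS, siginfo_t under strict -std= */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define PHY_BASE       0x40000000UL  /* synthetic PHY window (from the commit notes) */
#define PHY_SIZE       4096UL        /* one page; hypothetical */
#define PHY_STATUS_OFF 0x4UL         /* hypothetical status register offset */
#define PHY_READY      0x1u          /* hypothetical "ready" bit */

static volatile sig_atomic_t phy_faults;   /* how many accesses were trapped */

/* First touch of the PHY page lands here. */
static void phy_fault(int sig, siginfo_t *si, void *ctx)
{
    (void)ctx;
    uintptr_t addr = (uintptr_t)si->si_addr;

    if (addr < PHY_BASE || addr >= PHY_BASE + PHY_SIZE) {
        signal(sig, SIG_DFL);      /* not ours: let the retry crash normally */
        return;
    }
    phy_faults++;
    mprotect((void *)PHY_BASE, PHY_SIZE, PROT_READ | PROT_WRITE);
    *(volatile uint32_t *)(PHY_BASE + PHY_STATUS_OFF) = PHY_READY;
}

/* Map the fake PHY page and install the handler. Returns 0 on success. */
int phy_install(void)
{
    assert(PHY_BASE % PHY_SIZE == 0);

    void *p = mmap((void *)PHY_BASE, PHY_SIZE, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = phy_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

This only fakes a single reply. A poll that needs the register to change over time would have to re-protect the page after each access, which is the dynamic state-machine case. When stepping under GDB, handle SIGSEGV nostop noprint pass lets the signal reach the handler instead of stopping the session.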