# gdb_debug: single-step the benchmark functions under GDB

Wraps each of `01_memset` / `02_memcpy32` / `03_magic_memset` in a C harness, copies the raw bytes into an RWX buffer, and calls through a function pointer. With GDB attached to the harness you can step every machine instruction of the real blob code. **No QEMU is needed because boltzmann (and ampere, ohm, hertz) are natively aarch64.**

## Build

```
make        # builds ./gdb_debug.elf natively on aarch64
```

The cross-build recipe (if you ever want to run on x86 oppenheimer via qemu-user) lives in the Makefile: replace `gcc` with `aarch64-linux-gnu-gcc` and `ld` with `aarch64-linux-gnu-ld`, launch under `qemu-aarch64-static -g 1234 ./gdb_debug.elf 1`, and attach `gdb-multiarch` to `:1234` (`target remote :1234`).

## Run under GDB

```
gdb ./gdb_debug.elf
(gdb) set pagination off
(gdb) layout split          # TUI: source + asm panes (use `layout regs` for a register pane)
(gdb) break call_func       # the dispatcher: one breakpoint catches all three
(gdb) run 1                 # 1=memset  2=memcpy32  3=magic_memset
(gdb) stepi                 # one machine instruction
(gdb) info reg              # general-purpose register dump
(gdb) x/8i $pc              # peek 8 upcoming instructions
(gdb) display/i $pc         # auto-show next instruction on every stop
(gdb) x/16bx $x0            # hex-dump 16 bytes from what X0 points at
```

## What to look for

### Function 1 (memset)

After `MOV X3, #0`, each iteration runs `CMP X2, X3` → `B.NE` → `STRB W1, [X0, X3]` → `ADD X3, X3, #1` → back. Watch `$x3` advance, and inspect `x/16bx $x0` to see the buffer fill with `0xAB`.

### Function 2 (memcpy32)

The first instruction is the alignment mask: `AND X2, X2, #0xfffffffc`. Set a watchpoint on `$x2` (`watch $x2`) to catch the mask, then step the loop to watch the 4-byte transfers: `LDR W4, [X1, X3]` ; `STR W4, [X0, X3]` ; `ADD X3, X3, #4`.

### Function 3 (magic_memset)

Will **SIGSEGV** on `LDR W2, [X0, #4]` because `X0 = 0x1fe000` is unmapped in the user-mode process. That crash **is** the verification: it proves the function really does target that absolute address.
To execute the full path, add the following before the `call_func` call (it needs `<sys/mman.h>` and `<stdint.h>`):

```c
/* Back the absolute address the blob reads with an anonymous page. */
mmap((void*)0x1fe000, 4096, PROT_READ|PROT_WRITE,
     MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
/* Plant the magic word that `LDR W2, [X0, #4]` will load. */
*(uint32_t*)0x1fe004 = 0x54410001;
```

Then the magic check passes and GDB steps into the tail-call to memset.

## Why this scaffold beats `ddr_emu2` for verifying trampolines

`ddr_emu2` dies at PC=0x10a80 in the emulator because it can't model an MMIO register, which is a blind spot for us. Native GDB on an aarch64 host runs the *actual* CPU with full instruction fidelity; the limit becomes "can we fake the MMIO responses?" rather than "does the emulator know this instruction?".

For compute-only code (functions 1 and 2), no prep is needed. For MMIO-touching code, `mmap(MAP_FIXED)` plus a signal handler stub can serve as a synthetic PHY. **That's the path to single-stepping a patched trampoline through the real ISA with fake hardware replies**, which is exactly what the next round of v3fb verification would need.
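
The `mmap(MAP_FIXED)` plus signal-handler idea can be sketched roughly as below. This is a minimal illustration, not the harness's actual code. The names `fake_mmio_install`, `fake_mmio_on_segv`, `fake_mmio_faults` and the `FAKE_*` macros are hypothetical. The "reply on first touch" policy is an assumption, since the text above does not say how the stub should behave. The sketch also assumes 4 KiB pages: `0x1fe000` is not 64 KiB-aligned, so `MAP_FIXED` would fail on a 64 KiB-page kernel.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Address and magic value are taken from the Function 3 notes;
 * every identifier here is illustrative. */
#define FAKE_MMIO_BASE ((uintptr_t)0x1fe000)
#define FAKE_MMIO_SIZE ((size_t)4096)
#define FAKE_MAGIC_OFF ((uintptr_t)4)
#define FAKE_MAGIC_VAL 0x54410001u

/* Counts how many faults the stub has answered. */
volatile sig_atomic_t fake_mmio_faults;

/* SIGSEGV handler: when the fault lands inside the fake window, open the
 * page, store the "hardware reply", and return so the faulting load is
 * re-executed. Any other fault falls back to the default action.
 * Note: mprotect() is not on POSIX's async-signal-safe list; calling it
 * here is a common Linux practice, treated as an assumption in this sketch. */
void fake_mmio_on_segv(int sig, siginfo_t *info, void *uctx)
{
    (void)uctx;
    uintptr_t addr = (uintptr_t)info->si_addr;

    if (addr < FAKE_MMIO_BASE || addr >= FAKE_MMIO_BASE + FAKE_MMIO_SIZE ||
        mprotect((void *)FAKE_MMIO_BASE, FAKE_MMIO_SIZE,
                 PROT_READ | PROT_WRITE) != 0) {
        signal(sig, SIG_DFL);   /* genuine crash: re-fault and terminate */
        return;
    }
    *(volatile uint32_t *)(FAKE_MMIO_BASE + FAKE_MAGIC_OFF) = FAKE_MAGIC_VAL;
    fake_mmio_faults++;
}

/* Reserve the window with no access rights and hook SIGSEGV.
 * Returns 0 on success, -1 on failure. */
int fake_mmio_install(void)
{
    void *p = mmap((void *)FAKE_MMIO_BASE, FAKE_MMIO_SIZE, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fake_mmio_on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling `fake_mmio_install()` once before the blob runs would be enough for this sketch. Under GDB, the debugger stops on SIGSEGV before the handler sees it, so `handle SIGSEGV nostop noprint pass` is needed if the stub should answer without interrupting the single-step session. A fuller synthetic PHY would probably re-protect the page after each access so that every read faults and can be answered individually; that is left out here.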