benchmark/: three-way RE-tool comparison + first real C-lift
Three small functions extracted from the v1.19 conservative blob with
ground-truth C and per-tool (Ghidra / retdec / decomp.me) docs:
01_memset — byte memset, 28 B
02_memcpy32 — word-aligned memcpy, 36 B
03_magic_memset — magic check + tail-call to memset, 40 B
04_train_phy_block — first real poll-site function (104 B, 26 insts),
contains poll sites 12-15
Results in RESULTS.md:
- Ghidra: A on all four. Auto-decompile is close to final.
- retdec: A on #3, F on #1 and #2 (no register-argument inference on raw
binaries), C on #4 (mistakes `& 0xF0000000` for `< 0x10000000`).
GRIND_LOG.md (in 04_train_phy_block/) records the matching-decomp
iteration: 116-byte candidate.c at -Os vs vendor 104 bytes = 89.7%
size match on first real iteration. Remaining gap is GCC's choice of
`cmp w, w_const; b.ls` over vendor's `tst w, #imm; b.eq` for the
mask tests.
gdb_debug/ holds a native-aarch64 GDB single-stepper for the three
benchmark functions — boltzmann smoke test passed (memset:
buf[10] 0x00→0xab).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,41 @@
# Ghidra recipe: 01_memset

## Load

**File → Import File…** → `func.bin`.

In the import dialog:
- **Format:** Raw Binary
- **Language:** AArch64:LE:64:v8A
- **Base Address:** `0x0aac`. This matters for readability: branches are
  PC-relative, so their decoded targets match the original blob's addresses
  only when the function is loaded at its real address (the code at `0xaac`
  has no absolute-address references of its own).

After import, click **Yes** on the "Analyze now?" prompt; the default
analyzers are fine.
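To illustrate why the base address changes what the listing shows without changing what the code does, here is a minimal sketch that computes the target of an AArch64 `B.cond` instruction from its address. The helper names (`sign_extend`, `bcond_target`) and the example encoding `0x54FFFFC1` (a `b.ne` that jumps 8 bytes backwards) are assumptions made for illustration; they are not taken from `func.bin`.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to a signed 64-bit integer. */
static int64_t sign_extend(uint32_t value, unsigned bits) {
    uint32_t sign = 1u << (bits - 1);
    return (int64_t)(value ^ sign) - (int64_t)sign;
}

/* Target of an AArch64 B.cond instruction located at address `pc`.
 * Encoding: 0101 0100 | imm19 (bits 23:5) | 0 | cond (bits 3:0).
 * The offset is imm19 * 4, relative to the instruction's own address. */
static uint64_t bcond_target(uint64_t pc, uint32_t insn) {
    int64_t imm19 = sign_extend((insn >> 5) & 0x7FFFF, 19);
    return pc + (uint64_t)(imm19 * 4);
}
```

With the hypothetical encoding above, `bcond_target(0xab8, 0x54FFFFC1)` gives `0xab0`, while the same instruction at file offset `0xc` with base 0 gives `0x4`. The branch distance is identical in both cases; only the absolute labels shown in the listing differ.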

## What to look for in Ghidra's decompiler output

- Function automatically detected at 0xaac (the file starts there).
- Decompiler should produce something like:
```c
void FUN_00000aac(long param_1, byte param_2, long param_3) {
    long local_10 = 0;
    while (local_10 != param_3) {
        *(byte *)(param_1 + local_10) = param_2;
        local_10++;
    }
}
```
- Idiomatic match rate: high for this pattern; Ghidra's decompiler
  recognises the pre-test loop well.
- Ghidra types: the output uses `byte` (equivalent to `uint8_t`) and `long`
  (the 64-bit register width) rather than `uint8_t` / `size_t` directly.
  Manual retyping is usually needed.

## Benchmark notes

- Time to understandable output: a few seconds (auto-analysis only).
- Manual cleanup: rename `FUN_00000aac` → `memset_byte`; retype
  `param_1` to `void *`, `param_2` to `uint8_t`, `param_3` to `size_t`.
- Limits: the load address only affects Ghidra's decompiler output for
  jump targets outside the slice, which is irrelevant here.
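As a rough sketch of where the manual cleanup above might end up, the function below applies the rename and the three retypes to the decompiler output. The parameter names (`dst`, `value`, `len`) and the `for` loop form are assumptions chosen for readability; this is not the ground-truth C from the benchmark directory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hand-cleaned form of FUN_00000aac after renaming and retyping.
 * Writes `value` into each of the first `len` bytes at `dst`. */
void memset_byte(void *dst, uint8_t value, size_t len) {
    uint8_t *p = dst;
    /* Same pre-test loop as the decompiler output: compare, store, increment. */
    for (size_t i = 0; i != len; i++) {
        p[i] = value;
    }
}
```

A quick check in the spirit of the GDB smoke test is to zero a small buffer, call `memset_byte(buf, 0xab, sizeof buf)`, and confirm that `buf[10]` has changed from `0x00` to `0xab`.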