benchmark/: three-way RE-tool comparison + first real C-lift
Three small functions extracted from the v1.19 conservative blob with
ground-truth C and per-tool (Ghidra / retdec / decomp.me) docs:
01_memset — byte memset, 28 B
02_memcpy32 — word-aligned memcpy, 36 B
03_magic_memset — magic check + tail-call to memset, 40 B
04_train_phy_block — first real poll-site function (104 B, 26 insts),
contains poll sites 12-15
Results in RESULTS.md:
- Ghidra: A on all four. Auto-decompile is close to final.
- retdec: A on #3, F on #1 and #2 (no register-arg inference on raw),
C on #4 (mistakes & 0xF0000000 for < 0x10000000).
GRIND_LOG.md (in 04_train_phy_block/) records the matching-decomp
iteration: 116-byte candidate.c at -Os vs vendor 104 bytes = 89.7%
size match on first real iteration. Remaining gap is GCC's choice of
`cmp w, w_const; b.ls` over vendor's `tst w, #imm; b.eq` for the
mask tests.
gdb_debug/ holds a native-aarch64 GDB single-stepper for the three
benchmark functions — boltzmann smoke test passed (memset:
buf[10] 0x00→0xab).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,38 @@
# retdec recipe: 01_memset

retdec runs fully automated: hand it the binary, ask for C.

## Invocation (on the decompme container at pve4, or wherever retdec lives)

```
retdec --mode raw --arch arm --endian little --bit-size 64 \
    --raw-entry-point 0x0aac \
    --raw-section-vma 0x0aac \
    func.bin -o retdec.c
```

The flags:
- `--mode raw`: input is a flat binary, no PE/ELF headers.
- `--arch arm --endian little --bit-size 64`: AArch64 LE.
- `--raw-entry-point 0x0aac`: tell retdec where execution starts.
- `--raw-section-vma 0x0aac`: load the binary at address 0x0aac so
  branch targets resolve correctly.
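
To make the point about `--raw-section-vma` concrete, here is a minimal Python sketch of how an AArch64 branch target is computed from the instruction word and its address. The helper names and the sample instruction word are illustrative assumptions, not bytes taken from `func.bin`. The same encoded branch yields a different absolute address depending on where the blob is assumed to be loaded.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def branch_target(insn: int, pc: int):
    """Return the absolute target of an AArch64 B, BL, or B.cond located at `pc`.

    Returns None for any other instruction. Offsets are encoded in units of
    4 bytes, relative to the address of the branch itself.
    """
    if (insn & 0x7C000000) == 0x14000000:   # B / BL: imm26 in bits [25:0]
        return pc + sign_extend(insn & 0x03FFFFFF, 26) * 4
    if (insn & 0xFF000010) == 0x54000000:   # B.cond: imm19 in bits [23:5]
        return pc + sign_extend((insn >> 5) & 0x7FFFF, 19) * 4
    return None


# Hypothetical loop branch: `b.ne` jumping back two instructions (-8 bytes),
# assumed to sit at file offset 0x10 inside the raw blob.
SAMPLE_BNE = 0x54FFFFC1
FILE_OFFSET = 0x10

# Loaded at 0x0aac, the target lands inside the function's real address range.
print(hex(branch_target(SAMPLE_BNE, 0x0AAC + FILE_OFFSET)))
# Loaded at 0, the same bytes point at an address that means nothing in the blob's map.
print(hex(branch_target(SAMPLE_BNE, 0x0 + FILE_OFFSET)))
```

Branches inside the function stay self-consistent either way, but any address the decompiler prints, and any branch leaving the extracted bytes, is only meaningful with the right load address.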

Output goes to `retdec.c`. retdec also emits a .ll (LLVM IR) and a .dsm
(disasm) alongside it; all three are useful for comparison.

## What to expect

retdec is the least "smart" of the three tools. For a raw 28-byte blob
with no headers, it will:
- Detect the function at 0x0aac.
- Produce a C function named `function_aac` or similar.
- Often insert pseudo-intrinsics like `__asm_mov(x3, 0)` for instructions
  it doesn't fold into C. For this tiny loop it usually manages clean C.
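
A quick way to check these expectations against an actual run is to scan `retdec.c` for the generated function names and any leftover pseudo-intrinsics. The sketch below assumes the `function_<hex>` and `__asm_<mnemonic>(...)` spellings mentioned above; the sample text is made up for illustration and is not real retdec output.

```python
import re


def summarize_retdec_output(c_source: str) -> dict:
    """Collect auto-generated function names and count pseudo-intrinsic calls."""
    names = sorted(set(re.findall(r"\bfunction_[0-9a-f]+\b", c_source)))
    intrinsics = re.findall(r"\b__asm_\w+\s*\(", c_source)
    return {"functions": names, "pseudo_intrinsics": len(intrinsics)}


# Made-up sample in the style described above (NOT actual retdec output).
SAMPLE = """
int64_t function_aac(int64_t a1, int64_t a2, int64_t a3) {
    __asm_mov(x3, 0);
    return a1;
}
"""

print(summarize_retdec_output(SAMPLE))
```

A pseudo-intrinsic count of zero is a cheap signal that the output is plain C and worth comparing line by line with the ground truth.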

## Benchmark notes

- Strength: zero-touch, scriptable, good for bulk processing.
- Weakness: no interactive refinement; you get what you get. Type
  inference is conservative (`int32_t *` instead of `void *` / `uint8_t *`).
- Often emits control flow as `goto` rather than structured loops.
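
Since scriptability is the main strength, a small driver can build the same invocation for each extracted function. This is a sketch only: it reuses the flags exactly as written in the Invocation section, assumes the tool is on `PATH` as `retdec`, and assumes each function lives in its own directory as `func.bin`. Only the `01_memset` address is known from this recipe, so the job list contains just that entry.

```python
import subprocess
from pathlib import Path


def retdec_command(binary: Path, vma: int) -> list[str]:
    """Build the raw-mode invocation from this recipe for one extracted function."""
    addr = f"0x{vma:04x}"
    return [
        "retdec", "--mode", "raw", "--arch", "arm",
        "--endian", "little", "--bit-size", "64",
        "--raw-entry-point", addr,
        "--raw-section-vma", addr,
        str(binary), "-o", str(binary.with_name("retdec.c")),
    ]


def run_all(jobs, dry_run: bool = True) -> list[list[str]]:
    """Run (or, by default, just print) the command for each (binary, vma) pair."""
    commands = []
    for binary, vma in jobs:
        cmd = retdec_command(binary, vma)
        commands.append(cmd)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if the tool exits non-zero
    return commands


# Assumed layout: one directory per benchmark function, each holding func.bin.
JOBS = [(Path("01_memset/func.bin"), 0x0AAC)]
run_all(JOBS, dry_run=True)
```

Keeping `dry_run=True` as the default lets the command lines be reviewed before anything is executed on the container.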