marfrit 00d655187a benchmark/: three-way RE-tool comparison + first real C-lift
Three small functions extracted from the v1.19 conservative blob with
ground-truth C and per-tool (Ghidra / retdec / decomp.me) docs:
  01_memset        — byte memset, 28 B
  02_memcpy32      — word-aligned memcpy, 36 B
  03_magic_memset  — magic check + tail-call to memset, 40 B
  04_train_phy_block — first real poll-site function (104 B, 26 insts),
                       contains poll sites 12-15

Results in RESULTS.md:
  - Ghidra: A on all four. Auto-decompile is close to final.
  - retdec: A on #3, F on #1 and #2 (no register-arg inference on raw
    binaries), C on #4 (mistakes `& 0xF0000000` for `< 0x10000000`).

GRIND_LOG.md (in 04_train_phy_block/) records the matching-decomp
iteration: 116-byte candidate.c at -Os vs vendor 104 bytes = 89.7%
size match on first real iteration. Remaining gap is GCC's choice of
`cmp w, w_const; b.ls` over vendor's `tst w, #imm; b.eq` for the
mask tests.

gdb_debug/ holds a native-aarch64 GDB single-stepper for the three
benchmark functions — boltzmann smoke test passed (memset:
buf[10] 0x00→0xab).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:26:23 +02:00


Ghidra recipe — 01_memset

Load

File → Import File…, then select func.bin.

In the import dialog:

  • Format: Raw Binary
  • Language: AArch64:LE:64:v8A
  • Base Address: 0x0aac ← critical. Branches are PC-relative, so the base address determines the absolute addresses Ghidra displays for branch targets and for the function itself. This matters for readability only: the code at 0xaac has no absolute-address references of its own.

After import, click Yes on the "Analyze now?" prompt; default analyzers are fine.
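
To illustrate why the base address changes the addresses Ghidra displays, the sketch below computes the absolute target of an AArch64 B.cond instruction from the load base, the instruction's file offset, and its encoding. The helper name bcond_target, the instruction word 0x54FFFFC1 (b.ne, 8 bytes backwards), and the file offset 0x10 are hypothetical values chosen for illustration; none of them are taken from func.bin.

```c
#include <stdint.h>

/* Hypothetical helper: absolute target of an AArch64 B.cond instruction.
 * Encoding: bits[31:24] = 0x54, bits[23:5] = imm19, bit[4] = 0,
 * bits[3:0] = condition code.
 * Target = PC + sign_extend(imm19) * 4, where PC is the address of the
 * branch instruction itself (load base + offset within the file). */
uint64_t bcond_target(uint64_t base, uint64_t file_off, uint32_t insn)
{
    int64_t imm19 = (insn >> 5) & 0x7FFFF;
    if (imm19 & 0x40000)      /* sign bit of the 19-bit field */
        imm19 -= 0x80000;
    return base + file_off + (uint64_t)(imm19 * 4);
}
```

With these assumed values, bcond_target(0xaac, 0x10, 0x54FFFFC1) gives 0xab4, while the same instruction loaded at base 0 gives 0x8. The instruction bytes are identical in both cases; only the displayed addresses differ.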

What to look for in Ghidra's decompiler output

  • Function automatically detected at 0xaac (the file starts there).
  • Decompiler should produce something like:
    void FUN_00000aac(long param_1, byte param_2, long param_3) {
        long local_10 = 0;
        while (local_10 != param_3) {
            *(byte *)(param_1 + local_10) = param_2;
            local_10++;
        }
    }
    
  • Idiomatic match rate: high for this pattern; Ghidra's decompiler recognises the pre-test loop well.
  • Ghidra types: byte (uint8_t), long (the 64-bit register) — not directly uint8_t / size_t. Manual retyping is usually needed.

Benchmark notes

  • Time to understandable output: ~seconds (auto-analysis).
  • Manual cleanup: rename FUN_00000aac → memset_byte; retype param_1 to void *, param_2 to uint8_t, param_3 to size_t.
  • Limits: the load address affects Ghidra's decompiler output only for jump targets that fall outside the slice, which is irrelevant here.
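
After the rename and retyping listed under manual cleanup, the decompiled loop would read roughly as follows. This is a hand-written sketch and not the benchmark's ground-truth C; the parameter names dst, value, and n are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Cleaned-up form of FUN_00000aac after renaming and retyping.
 * Parameter names are illustrative, not taken from the original source. */
void memset_byte(void *dst, uint8_t value, size_t n)
{
    uint8_t *p = dst;
    /* Pre-test loop: the body never runs when n == 0. */
    for (size_t i = 0; i != n; i++)
        p[i] = value;
}
```

A quick check in the spirit of the gdb smoke test: filling a zeroed 16-byte buffer with 0xab changes buf[10] from 0x00 to 0xab.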