rk3588-ddr-analysis/benchmark/04_train_phy_block/decompme.md
marfrit 00d655187a benchmark/: three-way RE-tool comparison + first real C-lift
Three small functions extracted from the v1.19 conservative blob with
ground-truth C and per-tool (Ghidra / retdec / decomp.me) docs:
  01_memset        — byte memset, 28 B
  02_memcpy32      — word-aligned memcpy, 36 B
  03_magic_memset  — magic check + tail-call to memset, 40 B
  04_train_phy_block — first real poll-site function (104 B, 26 insts),
                       contains poll sites 12-15

Results in RESULTS.md:
  - Ghidra: A on all four. Auto-decompile is close to final.
  - retdec: A on #3, F on #1 and #2 (no register-arg inference on raw),
    C on #4 (mistakes & 0xF0000000 for < 0x10000000).

GRIND_LOG.md (in 04_train_phy_block/) records the matching-decomp
iteration: 116-byte candidate.c at -Os vs vendor 104 bytes = 89.7%
size match on first real iteration. Remaining gap is GCC's choice of
`cmp w, w_const; b.ls` over vendor's `tst w, #imm; b.eq` for the
mask tests.

gdb_debug/ holds a native-aarch64 GDB single-stepper for the three
benchmark functions — boltzmann smoke test passed (memset:
buf[10] 0x00→0xab).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:26:23 +02:00


decomp.me recipe — 04_train_phy_block

This is the first function from the real blob that we are lifting to byte-matching C. Score target: ≥95% match. A perfect match is unlikely because the vendor's compiler is unknown.

Target asm (paste into "Target asm" field)

train_phy_block:
    ldr     x0, [x0, #0xb8]
    mov     w1, #0xf000f000
    add     x0, x0, #0x8000
    str     w1, [x0, #0x110]
.Lwait_a:
    ldr     w1, [x0, #0x118]
    tst     w1, #0xf0000000
    b.eq    .Lwait_a
.Lwait_b:
    ldr     w1, [x0, #0x120]
    tst     w1, #0xf0000000
    b.eq    .Lwait_b
    mov     w1, #0x30003
    str     w1, [x0, #0x160]
    str     w1, [x0, #0x154]
.Lwait_hs1:
    ldr     w1, [x0, #0x184]
    tst     x1, #0x3
    b.eq    .Lwait_hs1
    mov     w1, #0x30000
    str     w1, [x0, #0x154]
.Lwait_hs2:
    ldr     w1, [x0, #0x184]
    tst     x1, #0x3
    b.ne    .Lwait_hs2
    mov     w1, #0x30000
    str     w1, [x0, #0x160]
    mov     w1, #0xf0000000
    str     w1, [x0, #0x110]
    ret
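
Read back into C, the listing above comes out roughly as the sketch below. It is an illustrative reading under assumptions, not the project's reference.c: CFG_A and CFG_B are the names used later in this recipe, while mmio_r, mmio_w, stat_ready, hs_set and the remaining register names are hypothetical placeholders. The context layout (a PHY base pointer stored at offset 0xb8) is inferred only from the first load.

```c
#include <stdint.h>

/* Offsets read off the target listing. Only CFG_A and CFG_B are names
 * from this recipe; every other name is a hypothetical placeholder. */
enum {
    CTX_PHY_BASE = 0xb8,   /* ldr x0, [x0, #0xb8] */
    PHY_BLOCK    = 0x8000, /* add x0, x0, #0x8000 */
    REG_CTRL     = 0x110,
    REG_STAT_A   = 0x118,
    REG_STAT_B   = 0x120,
    REG_CFG_A    = 0x154,
    REG_CFG_B    = 0x160,
    REG_HS       = 0x184
};

/* Hypothetical volatile accessors so every load and store is kept. */
static inline uint32_t mmio_r(uintptr_t a) { return *(volatile uint32_t *)a; }
static inline void mmio_w(uintptr_t a, uint32_t v) { *(volatile uint32_t *)a = v; }

/* Pure predicates, split out so they can be checked on a host. */
static inline int stat_ready(uint32_t v) { return (v & 0xF0000000u) != 0; }
static inline int hs_set(uint32_t v)     { return (v & 0x3u) != 0; }

void train_phy_block(const void *ctx)
{
    uintptr_t phy =
        *(const uintptr_t *)((const char *)ctx + CTX_PHY_BASE) + PHY_BLOCK;

    mmio_w(phy + REG_CTRL, 0xF000F000u);
    while (!stat_ready(mmio_r(phy + REG_STAT_A))) { }  /* .Lwait_a */
    while (!stat_ready(mmio_r(phy + REG_STAT_B))) { }  /* .Lwait_b */

    mmio_w(phy + REG_CFG_B, 0x00030003u);  /* 0x160 first, as in the target */
    mmio_w(phy + REG_CFG_A, 0x00030003u);
    while (!hs_set(mmio_r(phy + REG_HS))) { }          /* .Lwait_hs1 */

    mmio_w(phy + REG_CFG_A, 0x00030000u);
    while (hs_set(mmio_r(phy + REG_HS))) { }           /* .Lwait_hs2 */

    mmio_w(phy + REG_CFG_B, 0x00030000u);
    mmio_w(phy + REG_CTRL, 0xF0000000u);
}
```

The function itself only makes sense against real hardware, since the two .Lwait_hs loops need the same register to change state. The predicates can be checked anywhere: stat_ready(v) is false exactly when v <= 0x0FFFFFFF, which is the comparison form a compiler may choose instead of tst.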

Compiler

aarch64-linux-gnu gcc 12 with -O2 -ffreestanding -nostdlib. Also try -Os. The vendor blob's compiler is unknown; it could be ARMCC or an older GCC. The C that matches best may differ between compilers, and a perfect byte match is probably unattainable.

Context

Use reference.c as the starting C. The register-width quirk in the .Lwait_hs1 and .Lwait_hs2 loops (tst x1, #0x3 tests the 64-bit register even though the value was loaded into w1) suggests a particular intrinsic or source pattern. It may be necessary to write the load as (uint64_t)mmio_r(...) and the test as a 64-bit AND to coax GCC into emitting tst x1 instead of tst w1.
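
A minimal sketch of the two spellings of that test, for pasting into a candidate and comparing the output. The accessor mmio_r is assumed to look something like the one here (the real one lives in reference.c and may differ), and hs_pending32 / hs_pending64 are hypothetical names. Whether the widened form really yields tst x1 depends on the compiler and flags; GCC is free to narrow it back to a 32-bit test.

```c
#include <stdint.h>

/* Assumed shape of the accessor; the real mmio_r may differ. */
static inline uint32_t mmio_r(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

/* Plain 32-bit test: the natural source form, expected to give tst w1. */
static inline int hs_pending32(uintptr_t addr)
{
    return (mmio_r(addr) & 0x3u) != 0;
}

/* Widened test: zero-extend the loaded value first, then AND in 64 bits.
 * This is the pattern the recipe suggests trying in order to get tst x1. */
static inline int hs_pending64(uintptr_t addr)
{
    return ((uint64_t)mmio_r(addr) & UINT64_C(0x3)) != 0;
}
```

Both forms return the same result for every input, because a 32-bit load zero-extends into the 64-bit register. Only the emitted instruction can differ, so the choice matters for matching and not for behaviour.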

Things to iterate on

  • Order of writes to CFG_A vs CFG_B: vendor wrote CFG_B first (str w1, [x0, #0x160] then str w1, [x0, #0x154]). C order matters.
  • The two mov w1, #0x30000 near the end could be hoisted by GCC, or materialized once and kept in a spare register; the vendor emitted one before each store. Separate variables may be needed to prevent this.
  • add x0, x0, #0x8000 vs add x0, x0, #0x8, lsl #12 — same instruction, GAS picks one. Either should round-trip.
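
The last bullet can be checked with a few lines of C. AArch64 ADD (immediate) holds a 12-bit unsigned value plus an optional left shift by 12, so #0x8000 and #0x8, lsl #12 describe the same fields. The helper name add_imm_fields is made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Split an immediate into the imm12/shift fields of AArch64 ADD (immediate).
 * Returns false when the value cannot be encoded in a single ADD. */
static bool add_imm_fields(uint64_t imm, unsigned *imm12, unsigned *shift)
{
    if ((imm & ~UINT64_C(0xfff)) == 0) {        /* fits in the low 12 bits */
        *imm12 = (unsigned)imm;
        *shift = 0;
        return true;
    }
    if ((imm & ~UINT64_C(0xfff000)) == 0) {     /* 12 bits shifted left by 12 */
        *imm12 = (unsigned)(imm >> 12);
        *shift = 12;
        return true;
    }
    return false;
}
```

For 0x8000 this yields imm12 = 8 with shift = 12. A combined value such as 0x8110 does not fit in one ADD, which is consistent with the target adding 0x8000 once and reaching each register through a small load/store offset.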

Score expectations

  • 80%: rough loop structure + register usage matches.
  • 95%: instruction order + immediate forms match.
  • 100%: would require exact compiler/version match. Unlikely without ARMCC.