Changed u64v handshake reads to u32v with an inline zero-extending
upcast. Clang -Oz now emits 104 bytes, matching the vendor's 104
bytes, with 26 instructions on both sides. Three semantically equivalent
byte differences remain (register allocation, tst form, test width)
that can't be closed from C alone; closing them needs armclang or inline asm.
Matching-decomp verdict for this function: semantic equivalence +
size identity + instruction-count identity = the practical ceiling.
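The diff itself isn't quoted here, so this is only a minimal sketch of the read pattern. It assumes u64v/u32v are volatile fixed-width typedefs and that bits 63:32 of the handshake register are always zero or don't-care. Without that assumption the narrower read is not equivalent. The read_handshake_* names are hypothetical. On AArch64 a 32-bit load into a W register zeroes bits 63:32 of the X register, so the upcast should cost no extra instruction.

```c
#include <stdint.h>

/* Assumed typedefs; the commit only names u64v and u32v. */
typedef volatile uint64_t u64v;
typedef volatile uint32_t u32v;

/* Before (hypothetical name): full 64-bit read of the handshake register. */
uint64_t read_handshake_u64(const u64v *reg)
{
    return *reg;
}

/* After (hypothetical name): 32-bit read, zero-extended inline.
 * uint32_t -> uint64_t is a zero extension, never a sign extension.
 * Equivalent to the 64-bit read only if the upper half is never set. */
uint64_t read_handshake_u32(const u32v *reg)
{
    return (uint64_t)*reg;
}
```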
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Tested candidate.c across GCC-15 and clang-19 optimization levels
(size, delta vs vendor):
  gcc -Os              116 B (+12)
  clang -O2/-Os/-Oz    108 B (+4)  ← best
  vendor               104 B (0)
Vendor output is SMALLER than GCC -Os, which rules out the 'dumb
compiler' hypothesis (b). Clang being only 4 bytes off suggests
the vendor uses armclang or a similarly tuned LLVM fork (hypothesis a).
Immediate consequence: the default compiler for matching-decomp on this
blob is clang, not GCC. Our train_phy_block starting score jumps
from 89.7% (GCC -Os) to 96% (clang -Oz) before any C tweaking.
Pushing past 96% likely needs armclang or per-site inline asm.
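The two starting scores are consistent with a plain size ratio, vendor bytes over candidate bytes: 104/116 ≈ 0.897 and 104/108 ≈ 0.963. A minimal sketch of that metric, assuming it is the one behind the quoted figures. The helper name is hypothetical, and taking smaller/larger (so an undersized candidate also scores below 100%) is our own choice.

```c
#include <stddef.h>

/* Hypothetical helper: size-match score in percent.
 * Uses smaller/larger so the score never exceeds 100%. */
double size_match_pct(size_t vendor_bytes, size_t candidate_bytes)
{
    size_t lo = vendor_bytes < candidate_bytes ? vendor_bytes : candidate_bytes;
    size_t hi = vendor_bytes < candidate_bytes ? candidate_bytes : vendor_bytes;
    if (hi == 0)
        return 100.0; /* two empty functions match trivially */
    return 100.0 * (double)lo / (double)hi;
}
```

With the numbers above, size_match_pct(104, 116) gives about 89.66 and size_match_pct(104, 108) about 96.30.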
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Four functions extracted from the v1.19 conservative blob (three small
benchmarks plus the first real poll-site function), each with
ground-truth C and per-tool (Ghidra / retdec / decomp.me) docs:
01_memset — byte memset, 28 B
02_memcpy32 — word-aligned memcpy, 36 B
03_magic_memset — magic check + tail-call to memset, 40 B
04_train_phy_block — first real poll-site function (104 B, 26 insts),
contains poll sites 12-15
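The ground-truth C isn't reproduced in this log. As a rough picture of what the three benchmarks do, here is a sketch written from the one-line descriptions above only. All names, the argument order, the word-count convention for memcpy32 and the magic constant are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 01_memset (sketch): byte-at-a-time fill. */
void *bench_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

/* 02_memcpy32 (sketch): word-aligned copy. Assumes both pointers are
 * 4-byte aligned and n_words counts 32-bit words (our convention). */
void *bench_memcpy32(void *dst, const void *src, size_t n_words)
{
    uint32_t *d = dst;
    const uint32_t *s = src;
    for (size_t i = 0; i < n_words; i++)
        d[i] = s[i];
    return dst;
}

/* 03_magic_memset (sketch): placeholder magic, not taken from the blob. */
#define HYPOTHETICAL_MAGIC 0x4D414749u

void *bench_magic_memset(uint32_t magic, void *dst, int c, size_t n)
{
    if (magic != HYPOTHETICAL_MAGIC)
        return NULL;
    /* Call in tail position; at -O2/-Os this typically becomes a branch. */
    return bench_memset(dst, c, n);
}
```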
Results in RESULTS.md:
- Ghidra: A on all four. Auto-decompile is close to final.
- retdec: A on #3, F on #1 and #2 (no register-arg inference on
  raw-binary input), C on #4 (mistakes `& 0xF0000000` for `< 0x10000000`).
GRIND_LOG.md (in 04_train_phy_block/) records the matching-decomp
iterations: on the first real iteration, candidate.c builds to 116 bytes
at GCC -Os vs the vendor's 104 bytes, an 89.7% size match. The remaining
gap is GCC's choice of `cmp w, w_const; b.ls` over the vendor's
`tst w, #imm; b.eq` for the mask tests.
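Both branch forms test the same predicate. For an unsigned 32-bit x, `(x & 0xF0000000) == 0` holds exactly when the top four bits are clear, i.e. when `x <= 0x0FFFFFFF`, which is the `b.ls` condition. The sketch takes the `0xF0000000` mask from the retdec note as a stand-in, since the actual masks in train_phy_block aren't listed here, and spot-checks the equivalence. The size cost is plausible from the encodings: 0xF0000000 is a valid AArch64 logical immediate for `tst`, while 0x0FFFFFFF doesn't fit `cmp`'s 12-bit (optionally shifted) immediate. GCC therefore has to put it in a register first, consistent with the `w_const` operand above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Vendor form: tst w, #0xf0000000 ; b.eq */
bool top_nibble_clear_mask(uint32_t x)
{
    return (x & 0xF0000000u) == 0;
}

/* GCC form: cmp w, w_const ; b.ls (unsigned x <= 0x0FFFFFFF) */
bool top_nibble_clear_cmp(uint32_t x)
{
    return x <= 0x0FFFFFFFu;
}

/* Spot check: boundary values plus a stride across the 32-bit range. */
bool mask_forms_agree(void)
{
    const uint32_t edges[] = {0u, 1u, 0x0FFFFFFFu, 0x10000000u,
                              0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
    for (size_t i = 0; i < sizeof edges / sizeof edges[0]; i++)
        if (top_nibble_clear_mask(edges[i]) != top_nibble_clear_cmp(edges[i]))
            return false;
    for (uint64_t x = 0; x <= 0xFFFFFFFFu; x += 0x10001u)
        if (top_nibble_clear_mask((uint32_t)x) != top_nibble_clear_cmp((uint32_t)x))
            return false;
    return true;
}
```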
gdb_debug/ holds a native-aarch64 GDB single-stepper for the three
benchmark functions; the boltzmann smoke test passed (memset:
buf[10] 0x00→0xab).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>