04_train_phy_block GRIND_LOG: compiler matrix resolves (a)/(b) question

Tested candidate.c across GCC-15 and clang-19 optimization levels:

  gcc -Os          → 116 B (+12)
  clang -O2/Os/Oz  → 108 B (+4)   ← best
  vendor           → 104 B (0)

Vendor output is SMALLER than GCC -Os, which rules out 'spa-appointment dumb compiler' (hypothesis b). Clang being only 4 bytes off suggests the vendor uses armclang or a similarly-tuned LLVM fork (hypothesis a).

Immediate consequence: default compiler for matching-decomp on this blob is clang, not GCC. Our train_phy_block starting score jumps from 89.7% (GCC -Os) to 96% (clang -Oz) before any C tweaking. Pushing past 96% likely needs armclang or per-site inline asm.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -78,3 +78,38 @@ byte-matching (or functionally-equivalent) C version, we can:
That's the path to a maintainable replacement for the trampoline-based
v3fb approach, **for at least these 4 sites**. The other 12 sites live
in different functions and would each need their own lift.

## Compiler matrix 2026-04-15 late evening

Tested the same `candidate.c` across GCC and clang:

| compiler | flags | size | diff vs vendor (104 B) |
|---|---|---|---|
| gcc 15 | -Os | 116 B | +12 |
| gcc 15 | -O1 | 120 B | +16 |
| gcc 15 | -O2/-O3 | 128 B | +24 |
| **clang 19** | **-O2 / -Os / -Oz** | **108 B** | **+4** |
| clang 19 | -O1 | 112 B | +8 |
| vendor | | 104 B | 0 |

**Clang at -Oz is 4 bytes off vendor.** 96% size match on our first
compile. GCC -Os tops out at 12 bytes off (89.7%). The difference is
consistent with how each compiler encodes mask-tests and the addressing
mode it picks for short immediate offsets from a base pointer: clang
prefers `TST Wx, #imm` (one instruction for the test, native immediate
encoding), while GCC prefers `MOV Wy, #const; CMP Wx, Wy; B.cc` (three
instructions including the branch, so larger).

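As a rough illustration of the kind of source pattern where the two compilers can diverge, here is a minimal sketch of a mask-test on a register at a short offset from a base pointer. The function name, offset, and mask are hypothetical and are not taken from `candidate.c`. The mask is a contiguous bit run, which AArch64 can encode as a logical immediate. Whether a given compiler actually emits `TST` for it depends on version and flags, so the disassembly has to be checked.

```c
#include <stdint.h>

/* Hypothetical register layout: illustrative values only, not from the blob. */
#define PHY_STATUS_OFFSET 0x10u        /* short immediate offset from the block base */
#define PHY_STATUS_MASK   0x00000300u  /* contiguous bit run, a valid AArch64 logical immediate */

/* Returns 1 when any masked status bit is set, 0 otherwise.
 * The load is base+offset and the test is a plain AND against a constant,
 * which is the shape a compiler can lower to a single flag-setting test. */
static int phy_status_pending(const volatile uint8_t *base)
{
    uint32_t status = *(const volatile uint32_t *)(base + PHY_STATUS_OFFSET);
    return (status & PHY_STATUS_MASK) != 0;
}
```

Compiling a function like this at each optimization level and diffing the disassembly is one way to confirm which instruction sequence a compiler picks before lifting a real site.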
**Consequence:** default compiler for matching-decomp on this blob is
clang, not GCC. That switch is already recorded in this GRIND_LOG; all
future poll-site lifts should compile-eval under clang first.

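For scoring candidates during compile-eval, the percentages in this log are reproduced by dividing the smaller code size by the larger one (104/108 ≈ 96.3%, 104/116 ≈ 89.7%). The helper below is a sketch under that assumption. `size_match_percent` is a hypothetical name, and it measures size only, not byte-for-byte equality.

```c
#include <stddef.h>

/* Size-match score in percent, assuming the score is simply
 * smaller size / larger size. Two outputs of equal size score 100,
 * even if their bytes differ, so this is only a first-pass metric. */
static double size_match_percent(size_t vendor_bytes, size_t candidate_bytes)
{
    size_t smaller = vendor_bytes < candidate_bytes ? vendor_bytes : candidate_bytes;
    size_t larger  = vendor_bytes < candidate_bytes ? candidate_bytes : vendor_bytes;

    if (larger == 0) {
        return 100.0;  /* both empty: treat as a trivial match */
    }
    return 100.0 * (double)smaller / (double)larger;
}
```

With the table's numbers, `size_match_percent(104, 108)` gives the clang figure and `size_match_percent(104, 116)` gives the GCC -Os figure.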
**Hypothesis resolved:** the vendor compiler is almost certainly
**armclang** (ARM's LLVM-based fork) or a similarly-aggressive LLVM
variant, NOT GCC and NOT a dumbed-down rushed compiler. Evidence: their
output is SMALLER than GCC -Os, which rules out "naive". The fact
that clang -Oz comes within 4 bytes of a byte-match suggests the LLVM
family.

**To push past 96%:** armclang itself (needs Arm Developer account /
free Community Edition), or continue clang -Oz + hand-tweaked C +
per-site inline asm where the last instruction doesn't converge. A single
afternoon's iteration should push to ≥99%.
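A per-site inline asm fallback could look roughly like the sketch below, which pins the mask-test to `TST` with an immediate when building for AArch64 and falls back to plain C elsewhere. The function name and the `0x300` mask are hypothetical. Pinning the opcode this way does not control register allocation or the surrounding schedule, so it may still not byte-match.

```c
#include <stdint.h>

/* Hypothetical per-site helper: forces a TST-immediate on AArch64,
 * otherwise uses the equivalent portable expression. */
static inline int mask_test_site(uint32_t value)
{
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    int result;
    /* tst sets the flags from (value & 0x300); cset materialises NE as 0/1. */
    __asm__("tst %w1, #0x300\n\t"
            "cset %w0, ne"
            : "=r"(result)
            : "r"(value)
            : "cc");
    return result;
#else
    return (value & 0x300u) != 0;
#endif
}
```

Keeping the portable branch alongside the asm lets the same source still build and be tested on a non-Arm host.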