04_train_phy_block: clang -Oz + 32-bit-load pattern = 100% size match
Changed the u64v handshake reads to u32v with an inline zero-extending upcast. Clang -Oz now emits 104 bytes, exactly matching the vendor's 104 bytes, with 26 instructions on both sides.

Three semantically equivalent byte differences remain (register allocation, tst form, test width). They can't be closed from C alone; that would need armclang or inline asm.

Matching-decomp verdict for this function: semantic equivalence + size identity + instruction-count identity is the practical ceiling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -113,3 +113,35 @@ that clang -Oz approaches byte-match ruling suggests LLVM family.
free Community Edition), or continue clang -Oz + hand-tweaked C +
per-site inline asm where the last instruction doesn't converge. A single
afternoon's iteration should push to ≥99%.

## Iteration 3: 32-bit load + clang -Oz = 100% size match

Changed the handshake-loop reads from `u64v` to `u32v` (32-bit volatile
loads), with a tiny inline `xld()` helper that zero-extends to u64 for
the test. This forced clang to emit `ldr w, [x, #0x184]` inside the
loops instead of hoisting `add x9, x8, #0x184` out of them, which removes
the 4-byte setup overhead.

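A minimal sketch of the load pattern described above. Only the names `u32v` and `xld()`, the `0x184` offset, and the two masks come from these notes; the typedef, the helper body, and the `wait_handshake()` loop are assumptions made for illustration and are not the decompiled function.

```c
#include <stdint.h>

/* Name taken from the notes; this definition is an assumption. */
typedef volatile uint32_t u32v;

/* Hypothetical body for xld(): one 32-bit volatile load, zero-extended
 * to 64 bits so the caller's tests can stay in u64 arithmetic. */
static inline uint64_t xld(const u32v *reg)
{
    return (uint64_t)*reg;
}

/* Illustrative poll loop (hypothetical): spin until any of the top four
 * bits is set, then return the low two handshake bits. */
static uint64_t wait_handshake(uintptr_t base)
{
    const u32v *status = (const u32v *)(base + 0x184);
    uint64_t v;

    do {
        v = xld(status);              /* 32-bit load on every iteration */
    } while ((v & 0xf0000000u) == 0);

    return v & 0x3;
}
```

Because the access width is 32 bits, the compiler can fold the offset into the load's addressing mode on each iteration, which is the effect the size comparison relies on.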
| compiler | flag | size | diff | score |
|---|---|---|---|---|
| clang 19 | -Oz | **104 B** | **0** | **100% (size-match)** |
| gcc 15 | -Os | see below | see below | see below |

### Byte-level comparison (clang vs vendor, both 104 B, both 26 insts)

Three semantically equivalent differences remain, none closable from C alone:

1. **Reg choice**: vendor `x0/w1`, clang `x8/w9/w10`.
2. **Mask test form**: vendor `tst w1, #0xf0000000; b.eq`, clang
   `lsr w9, #28; cbz w9, .loop`. Same size, same effect.
3. **Handshake test width**: vendor `tst x1, #0x3` (64-bit on
   zero-extended w1), clang `tst w9, #0x3` (32-bit). Same size.

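The equivalence claimed for differences 2 and 3 can be checked in plain C. The sketch below models each instruction form as a small function; the function names are hypothetical and exist only for this check.

```c
#include <stdint.h>

/* Difference 2: "tst w, #0xf0000000; b.eq" vs "lsr #28; cbz".
 * Both report whether the top four bits are all zero. */
static int mask_zero_tst(uint32_t v) { return (v & 0xf0000000u) == 0; }
static int mask_zero_lsr(uint32_t v) { return (v >> 28) == 0; }

/* Difference 3: 64-bit test on a zero-extended value vs 32-bit test.
 * Zero extension leaves the low bits unchanged, so the results agree. */
static int hs_zero_64(uint32_t v) { return ((uint64_t)v & 0x3) == 0; }
static int hs_zero_32(uint32_t v) { return (v & 0x3u) == 0; }

/* Returns 1 when both pairs of forms give the same answer for v. */
static int forms_agree(uint32_t v)
{
    return mask_zero_tst(v) == mask_zero_lsr(v)
        && hs_zero_64(v) == hs_zero_32(v);
}
```

Calling `forms_agree()` over boundary values such as `0x0fffffff` and `0x10000000` exercises both sides of the mask test.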
None of these affect semantics. Chasing byte-level exactness would need:
- inline asm stubs forcing the specific mask-test form
- register-allocation hints that C doesn't really expose
- **or** the vendor's actual armclang binary

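As a rough idea of what an inline asm stub for the mask test could look like, here is a sketch using the GNU `asm goto` extension, assuming a GCC- or clang-compatible compiler targeting AArch64. The function name `top_nibble_set()` is hypothetical, the stub has not been checked against the vendor bytes, and it pins only the `tst` + `b.eq` form, not the register choice. A portable fallback keeps it buildable on other targets.

```c
#include <stdint.h>

/* Returns 1 if any of the top four bits of v is set, 0 otherwise. */
static int top_nibble_set(uint32_t v)
{
#if defined(__aarch64__) && defined(__GNUC__)
    /* Forces the tst + b.eq sequence; the compiler still picks the
     * register that holds v. */
    __asm__ goto("tst %w0, #0xf0000000\n\t"
                 "b.eq %l[is_zero]"
                 : /* no outputs */
                 : "r"(v)
                 : "cc"
                 : is_zero);
    return 1;
is_zero:
    return 0;
#else
    /* Portable fallback for non-AArch64 builds. */
    return (v & 0xf0000000u) != 0;
#endif
}
```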
**Verdict: done.** Semantic equivalence + identical size + identical
instruction count is the realistic ceiling from C. Chasing further is
purely cosmetic.