From cf6ddf8e91f8ce5839f7f4e89970fc0c8d247170 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Wed, 15 Apr 2026 09:10:45 +0200
Subject: [PATCH] 04_train_phy_block GRIND_LOG: compiler matrix resolves
 (a)/(b) question
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Tested candidate.c across GCC-15 and clang-19 optimization levels:

  gcc  -Os         → 116 B (+12)
  clang -O2/Os/Oz  → 108 B (+4)   ← best
  vendor           → 104 B (0)

Vendor output is SMALLER than GCC -Os, which rules out a naive,
rushed compiler (hypothesis b). Clang being only 4 bytes off suggests
the vendor uses armclang or a similarly-tuned LLVM fork (hypothesis a).

Immediate consequence: default compiler for matching-decomp on this
blob is clang, not GCC. Our train_phy_block starting score jumps
from 89.7% (GCC -Os) to 96% (clang -Oz) before any C tweaking.
Pushing past 96% likely needs armclang or per-site inline asm.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 benchmark/04_train_phy_block/GRIND_LOG.md | 35 +++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/benchmark/04_train_phy_block/GRIND_LOG.md b/benchmark/04_train_phy_block/GRIND_LOG.md
index 06d3dc2..c93a75b 100644
--- a/benchmark/04_train_phy_block/GRIND_LOG.md
+++ b/benchmark/04_train_phy_block/GRIND_LOG.md
@@ -78,3 +78,38 @@ byte-matching (or functionally-equivalent) C version, we can:
 That's the path to a maintainable replacement for the trampoline-based
 v3fb approach, **for at least these 4 sites**. The other 12 sites live
 in different functions and would each need their own lift.
+
+## Compiler matrix 2026-04-15 late evening
+
+Tested the same `candidate.c` across GCC and clang:
+
+| compiler | flags | size | diff vs vendor (104 B) |
+|---|---|---|---|
+| gcc 15   | -Os     | 116 B | +12 |
+| gcc 15   | -O1     | 120 B | +16 |
+| gcc 15   | -O2/-O3 | 128 B | +24 |
+| **clang 19** | **-O2 / -Os / -Oz** | **108 B** | **+4** |
+| clang 19 | -O1     | 112 B | +8  |
+| vendor   |         | 104 B | 0   |
+
+**Clang at -Oz is 4 bytes off vendor:** a 96% size match (104/108) on
+our first compile. GCC -Os tops out at 12 bytes off, 89.7% (104/116).
+The difference is consistent with how each compiler encodes mask tests
+and addresses short immediate offsets from a base pointer: clang
+prefers `TST Wx, #imm` (single instruction, native imm encoding), GCC
+prefers `MOV Wy, #const; CMP Wx, Wy; B.cc` (three instructions, larger).
+
+**Consequence:** default compiler for matching-decomp on this blob is
+clang, not GCC. The switch is already committed in this GRIND_LOG; all
+future poll-site lifts should compile-eval under clang first.
+
+**Hypothesis resolved:** the vendor compiler is almost certainly
+**armclang** (Arm's LLVM-based fork) or a similarly-aggressive LLVM
+variant, NOT GCC and NOT a dumbed-down rushed compiler. Evidence: their
+output is SMALLER than GCC -Os, which rules out "naive", and clang -Oz
+coming within 4 bytes of a byte-match points to the LLVM family.
+
+**To push past 96%:** armclang itself (needs Arm Developer account /
+free Community Edition), or continue clang -Oz + hand-tweaked C +
+per-site inline asm where the last instruction doesn't converge. A
+single afternoon's iteration should push to ≥99%.