- benchmark/ai_ghidra/SETUP.md documents the GhidrAssist 1.5.0 install
at /opt/ghidra/Ghidra/Extensions/GhidrAssist/ on oppenheimer (CT131),
with dirac endpoints (Hermes-2-Pro 8B @ :8080, Qwen-coder 1.5B @ :8081)
already reachable and tested. The final enable and config step is
UI-only: two clicks on the next Ghidra launch.
- gdb_debug/harness.c extended with case 4: train_phy_block running
against a synthetic PHY at 0x40000000. The static MMIO shim satisfies
polls 1-3; poll 4 needs a dynamic state machine (next session, via a
SIGBUS handler or ptrace). Documented in the README.
Vendor tree investigation: Rockchip's own sdram_rk3588.c / sdram_rk3568.c
are STUBS (return -1), so they yield no function names. Path forward:
mine the vendor kernel's rockchip_dmc.c (devfreq DDR scaling driver)
for register-offset naming hints at the runtime-call level.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Four functions extracted from the v1.19 conservative blob (three small
helpers plus one real poll-site function), with ground-truth C and
per-tool (Ghidra / retdec / decomp.me) docs:
  01_memset: byte memset, 28 B
  02_memcpy32: word-aligned memcpy, 36 B
  03_magic_memset: magic check + tail-call to memset, 40 B
  04_train_phy_block: first real poll-site function (104 B, 26 insts),
    contains poll sites 12-15
Results in RESULTS.md:
- Ghidra: A on all four. Auto-decompile is close to final.
- retdec: A on #3, F on #1 and #2 (no register-arg inference on a raw
blob), C on #4 (renders the `& 0xF0000000` mask test as
`< 0x10000000`).
GRIND_LOG.md (in 04_train_phy_block/) records the matching-decomp
iteration: 116-byte candidate.c at -Os vs vendor 104 bytes = 89.7%
size match on first real iteration. Remaining gap is GCC's choice of
`cmp w, w_const; b.ls` over vendor's `tst w, #imm; b.eq` for the
mask tests.
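
The remaining gap is plausibly cosmetic: for unsigned 32-bit values, a
`tst`/`b.eq` test on a high-bit mask and a `cmp`/`b.ls` against the value
just below that mask's lowest bit select the same inputs. A minimal sketch
of that equivalence follows, assuming the mask is the 0xF0000000 quoted in
the retdec result (the actual masks in train_phy_block may differ). The
helper names are illustrative only.

```python
import random

MASK = 0xF0000000    # assumed mask, taken from the retdec note above
LIMIT = 0x10000000   # smallest value with any masked bit set


def tst_form(w):
    """Vendor shape: tst w, #imm ; b.eq  (branch when no masked bit is set)."""
    return (w & MASK) == 0


def cmp_form(w):
    """GCC shape: cmp w, w_const ; b.ls  (branch when w <= LIMIT - 1, unsigned)."""
    return w <= LIMIT - 1


def forms_agree(samples):
    """True when both shapes take the same branch for every sample."""
    return all(tst_form(w) == cmp_form(w) for w in samples)


def size_match(vendor_bytes, candidate_bytes):
    """Size match in percent, as used in GRIND_LOG.md (vendor / candidate)."""
    return 100.0 * vendor_bytes / candidate_bytes


EDGE_CASES = [0, 1, LIMIT - 1, LIMIT, LIMIT + 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
_rng = random.Random(0)
SAMPLES = EDGE_CASES + [_rng.getrandbits(32) for _ in range(10000)]
```

`forms_agree(SAMPLES)` checking out means the byte-size difference does
not by itself indicate a semantic difference at those tests.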
gdb_debug/ holds a native-aarch64 GDB single-stepper for the three
benchmark functions — boltzmann smoke test passed (memset:
buf[10] 0x00→0xab).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause of counted_v2 brick identified:
v2 copied only ONE non-load body instruction into each trampoline (the
first one after the LDR). For poll patterns of the form
    LDR   Wx, [Xbase, #off]
    AND   Wx, Wx, #mask     ; no flag update
    CMP   Wx, #expected     ; sets flags
    B.cond .retry
(9 of the 16 sites in v1.19 have this shape) the final CMP was silently
dropped. The trampoline's B.inv_cond then tested whatever flags happened
to be set on entry, producing effectively random branch decisions inside
the trampoline. Result: boot crashes before the UART banner, observed as
a 'power LED off' brick.
Fix in v3: copy the ENTIRE loop body (LDR + all intermediate instructions,
in original order) into each trampoline. Size is now 4*(N+6) bytes, where
N is the body length in instructions (32 bytes for body=2, 36 for body=3).
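
The difference between the two selection strategies can be sketched as
follows. This is a toy model, not the patcher's code: loops are plain
lists of assembly strings without the closing B.cond, the flag-setter
list is a small illustrative subset of A64, and all function names are
hypothetical. N is assumed to count the LDR plus the intermediate
instructions.

```python
# Illustrative subset of A64 mnemonics that update NZCV.
FLAG_SETTERS = {"CMP", "CMN", "TST", "ADDS", "SUBS", "ANDS"}


def mnemonic(insn):
    """First token of an assembly line, upper-cased."""
    return insn.split()[0].upper()


def v2_body(loop):
    """v2 behaviour: the LDR plus only the first instruction after it."""
    return loop[:2]


def v3_body(loop):
    """v3 behaviour: every body instruction, in original order."""
    return list(loop)


def sets_flags(body):
    """True if at least one copied instruction updates the condition flags."""
    return any(mnemonic(insn) in FLAG_SETTERS for insn in body)


def trampoline_size(body_len):
    """Trampoline size in bytes, using the 4*(N+6) formula from the text."""
    return 4 * (body_len + 6)


# The AND-then-CMP poll shape described above (register names are placeholders).
POLL = [
    "LDR W0, [X1, #0x10]",
    "AND W0, W0, #0x1",
    "CMP W0, #0x1",
]
```

With this shape, `sets_flags(v2_body(POLL))` is false, which is the
condition under which the trampoline's conditional branch reads stale
flags; `sets_flags(v3_body(POLL))` is true.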
Also in v3:
- --sites subset flag for bisection (all/early/mid/late/none/index list)
- decode_sites.py helper that tries to identify which MMIO register each
site polls (best effort — the materialized_base scanner is naive and
picks up stale MOVZ targets, but cluster grouping by blob offset is
reliable and sufficient for bisection)
Site clusters in v1.19:
  0..7    early  (0x07b78..0x07f08): SGRF + PHY firmware state machine
  8..10   mid    (0x09124..0x0aaf8): DfiStatus / training start
  11..15  late   (0x0d154..0x0d378): UctWriteProt / CalBusy / late
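
A possible shape for resolving a --sites value into site indices, using
the cluster boundaries listed above. This is a sketch only: the function
name and the comma-separated syntax for the index list are assumptions,
and the real patcher may parse the flag differently.

```python
TOTAL_SITES = 16

# Cluster boundaries as listed for v1.19.
CLUSTERS = {
    "early": range(0, 8),
    "mid": range(8, 11),
    "late": range(11, 16),
}


def parse_sites(spec):
    """Turn a --sites value into a sorted list of site indices.

    Accepts 'all', 'none', a cluster name, or a comma-separated index
    list such as '0,3,15' (the list syntax is an assumption).
    """
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(TOTAL_SITES))
    if spec == "none":
        return []
    if spec in CLUSTERS:
        return list(CLUSTERS[spec])
    indices = sorted({int(tok) for tok in spec.split(",") if tok.strip()})
    bad = [i for i in indices if not 0 <= i < TOTAL_SITES]
    if bad:
        raise ValueError(f"site index out of range: {bad}")
    return indices
```

For bisection, patching one cluster at a time (for example `early`, then
`mid`, then `late`) narrows a failure to at most eight sites before
switching to explicit index lists.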
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v1 NOP approach was WRONG: it removed the polls entirely, so the blob
proceeded with stale/incomplete register values. ZQ cal, DFI handshake,
and PHY mailbox all require actual polling until the hardware responds.
v2 uses trampoline functions appended to the binary:
- Each poll site jumps to a trampoline that retries with a W16 counter
- 16384 iterations (~91us at 1.8GHz) before timeout
- On timeout, returns with condition NOT met (hits existing error path)
- On success, returns normally (original behavior preserved)
- W16 (IP0) used as counter — caller-saved, not used by poll loop bodies
35 poll loops patched, 13 non-poll backward branches left intact.
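
The timeout estimate can be reproduced with simple arithmetic. The
~91us figure is consistent with roughly 10 cycles per loop iteration;
that per-iteration cost is inferred here from the quoted numbers and is
not a measured value. Function names are illustrative.

```python
def timeout_us(iterations, cycles_per_iter, clock_hz):
    """Worst-case wall time of a counted poll loop, in microseconds."""
    return iterations * cycles_per_iter / clock_hz * 1e6


def implied_cycles_per_iter(iterations, budget_us, clock_hz):
    """Cycles per iteration implied by a quoted timeout and clock rate."""
    return budget_us * 1e-6 * clock_hz / iterations


ITERATIONS = 16384       # counter value from the v2 trampoline
CLOCK_HZ = 1.8e9         # 1.8 GHz, as quoted
ASSUMED_CYCLES = 10      # assumption: inferred, not measured

estimate = timeout_us(ITERATIONS, ASSUMED_CYCLES, CLOCK_HZ)
implied = implied_cycles_per_iter(ITERATIONS, 91.0, CLOCK_HZ)
```

An MMIO load to a slow peripheral can cost far more than 10 cycles, so
the real timeout on hardware may be noticeably longer than this estimate.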
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Aggressive MMIO injection (try 0xFFFFFFFF, then 0, then 0x2) breaks through
all poll loops. Blob executes 19963 instructions visiting 3606 unique PCs
before jumping to unmapped memory (0x100000FFF).
Key findings:
- DDRC channels at 0xF7000000/0xF8000000 (not 0xFE01 as in the TRM; these
are the direct DDRC addresses, not the MSCH wrapper)
- Blob reads training params from internal data at 0x000154xx
- 30% code path coverage achieved
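
A toy model of the injection order described above, written without any
emulator dependency. Only the value order (0xFFFFFFFF, then 0, then 0x2)
comes from the text; the escalation trigger (a per-address read count)
and every name below are assumptions made for illustration.

```python
CANDIDATES = (0xFFFFFFFF, 0x0, 0x2)   # injection order from the text


class InjectingMMIO:
    """Returns escalating candidate values for repeatedly polled addresses."""

    def __init__(self, stuck_threshold=64):
        # Assumption: an address counts as "stuck" after this many reads.
        self.stuck_threshold = stuck_threshold
        self.reads = {}

    def read(self, addr):
        count = self.reads.get(addr, 0)
        self.reads[addr] = count + 1
        stage = min(count // self.stuck_threshold, len(CANDIDATES) - 1)
        return CANDIDATES[stage]


def run_poll(mmio, addr, mask, expected, max_reads=1000):
    """Poll until (value & mask) == expected; return the read count or None."""
    for attempt in range(1, max_reads + 1):
        if (mmio.read(addr) & mask) == expected:
            return attempt
    return None
```

A poll waiting for `(value & 0x3) == 0x2` fails on the first two
candidates and exits on the third, which is the kind of loop the 0x2
value presumably exists to break.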
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each poll loop branches to an appended trampoline that:
- Initializes w18 = 0x20000 (128K iterations)
- Copies the original loop body (LDR + condition check)
- Decrements w18, retries until timeout
- Falls through on timeout (no hang)
QEMU verified: the original hangs at 0x10350; the trampoline build progresses through all polls.
Blob grows from 76704 to 78068 bytes (+1364 bytes trampoline section).
NOT YET TESTED ON REAL HARDWARE - the NOP approach bricked the GenBook.
This counted approach preserves the poll loops with a safety timeout.
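
The control flow of a counted poll can be modelled in a few lines. This
is a behavioural sketch of the idea (retry up to a budget, fall through
on timeout), not a translation of the generated trampoline; the function
names and the fake register readers are hypothetical.

```python
BUDGET = 0x20000   # 128K iterations, as loaded into the counter register


def counted_poll(read_reg, condition_met, budget=BUDGET):
    """Poll until condition_met(read_reg()) or the budget runs out.

    Returns (success, reads_performed). On timeout it returns instead of
    spinning forever, mirroring the fall-through behaviour.
    """
    remaining = budget
    while remaining > 0:
        if condition_met(read_reg()):
            return True, budget - remaining + 1
        remaining -= 1
    return False, budget


def never_ready():
    """Fake register that never reports ready (models absent hardware)."""
    return 0


def make_ready_after(n):
    """Fake register whose bit 0 becomes set on the n-th read."""
    state = {"reads": 0}

    def read():
        state["reads"] += 1
        return 1 if state["reads"] >= n else 0

    return read


def bit0_set(value):
    return (value & 1) == 1
```

The first case below is the one that hangs the unpatched blob: with no
hardware response, the counted version returns after exactly `BUDGET`
reads.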
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Ghidra decompilation of v1.02-v1.19 blobs (118 functions)
- 53 functions renamed, 79 MMIO registers mapped to TRM
- 45 timeout-less poll loops identified and patched
- Production patcher (patch_prod.py) and QEMU emulator
- Comprehensive analysis, frequency tables, community research
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>