Ship the new simulation & verification stack under simulation/:
- mmio_regions.py — address → region classifier (DDRCTL, DDRPHY,
OTP, SRAM, …). Shared by every other tool so trace output is
scannable without memorising the memory map.
- sim_tripwire.py — Bin-style per-access capture. Records
(seq, insn_tick, pc, addr, size, rw, val, region, fn_name) per
MMIO access. PCResolver bisects the vendor funs table parsed
from ddr_conservative_asm.s.
- tripwire_diff.py — PC-bucketed difflib.SequenceMatcher diff of
two tripwire CSVs. Buckets by fn_name so bitflip-induced control
flow divergence doesn't cascade noise.
- training_sim.py — DDR training simulator with --mode pass and
--mode bitflip (flip the first N reads of each training-status
address to exercise retry paths). The BITFLIP_ONLY env var narrows
flipping to a single address for the sweep.
- bitflip_sweep.py — Flip each of 23 training-status addresses
one-at-a-time and tabulate retry convergence. Surfaces which
function(s) react to a transient fault by writing different
downstream register values.
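Both mmio_regions.py's classifier and sim_tripwire.py's PCResolver reduce to "find the interval containing this value". Below is a minimal bisect-based sketch of that idea. The region bounds are hypothetical placeholders: the UART2 and SRAM bases come from later commits, and the DDRC channel sizes are guesses. The function table is also illustrative. PCResolver here has no end bounds, so any PC past the last start resolves to the last function.

```python
import bisect
from typing import Dict, NamedTuple, Optional


class Region(NamedTuple):
    start: int  # inclusive
    end: int    # exclusive
    name: str


# Hypothetical bounds for illustration only; the real table lives in mmio_regions.py.
REGIONS = sorted([
    Region(0xF7000000, 0xF8000000, "DDRC_CH0"),
    Region(0xF8000000, 0xF9000000, "DDRC_CH1"),
    Region(0xFEB50000, 0xFEB60000, "UART2"),
    Region(0xFF000000, 0xFF100000, "SRAM"),
])
_STARTS = [r.start for r in REGIONS]


def classify(addr: int) -> str:
    """Return the name of the region containing addr, or 'UNKNOWN'."""
    i = bisect.bisect_right(_STARTS, addr) - 1
    if i >= 0 and addr < REGIONS[i].end:
        return REGIONS[i].name
    return "UNKNOWN"


class PCResolver:
    """Map a PC to the nearest function start at or below it (bisect search)."""

    def __init__(self, funcs: Dict[int, str]):
        self._starts = sorted(funcs)
        self._names = [funcs[s] for s in self._starts]

    def resolve(self, pc: int) -> Optional[str]:
        i = bisect.bisect_right(self._starts, pc) - 1
        return self._names[i] if i >= 0 else None
```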
Plus:
- mmio_diff.py updated: region-tagged divergence output,
--show-regions histogram, --tripwire-out-{vendor,rebuilt} CSV
capture, --capture-stack-writes for stack-allocated buffer diffs.
- debug_probes/tp_slot_{probe,writes}.py — ad-hoc Unicorn probes
for chasing a single-slot divergence in an SRAM buffer. Kept as
reference examples of how to extend the tripwire toolchain.
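The fn_name bucketing that tripwire_diff.py uses can be sketched with the standard library. Each function's access list is diffed independently, so a divergence inside one function cannot misalign the alignment of the others. This is a simplified model. The row shape `(fn_name, addr, rw, val)` is an assumption, and the real CSVs carry more columns.

```python
import difflib
from collections import defaultdict


def bucket_by_fn(rows):
    """Group (fn_name, addr, rw, val) rows by fn_name, keeping capture order."""
    buckets = defaultdict(list)
    for fn_name, addr, rw, val in rows:
        buckets[fn_name].append((addr, rw, val))
    return buckets


def bucketed_diff(vendor_rows, rebuilt_rows):
    """Yield (fn_name, tag, vendor_slice, rebuilt_slice) for each non-equal opcode."""
    a, b = bucket_by_fn(vendor_rows), bucket_by_fn(rebuilt_rows)
    for fn_name in sorted(set(a) | set(b)):
        va, vb = a.get(fn_name, []), b.get(fn_name, [])
        # autojunk=False: repeated MMIO writes must not be treated as junk
        sm = difflib.SequenceMatcher(a=va, b=vb, autojunk=False)
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag != "equal":
                yield fn_name, tag, va[i1:i2], vb[j1:j2]
```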
The stack found 6 silicon-hostile bugs in the rebuilt blob that
mmio_diff's write-sequence gate was structurally blind to, including
three ld-unresolved-symbol NULL derefs (case-mismatched externs,
missing DATA_SYMS) and one C-early-return-skips-shared-tail bug
where vendor's asm fell through to the tail via `b` after a `ret`.
Two extensions that finally get the emu producing useful output:
1. Catch UC_ERR_EXCEPTION on MSR/MRS access, decode the instruction,
stub the destination register to 0 (for MRS) or silently accept
(for MSR), advance PC, resume. Opaque sysregs the blob touches
(CNTFRQ_EL0 etc.) no longer halt the emu.
2. Map UART2 (0xFEB50000), hook writes to THR (offset 0), collect
printable bytes. Stub LSR (+0x14 = 0x60 THRE|TEMT) and USR (+0x7C
= 0x02 TFE) in ABS_STUB so the blob's putc polling loops resolve.
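The decode step in extension 1 matches the register forms of MRS and MSR on their fixed top 12 bits: 0xD53 for MRS and 0xD51 for MSR. It then pulls Rt from bits[4:0]. Below is a minimal sketch of that step. It uses a plain dict as a stand-in for the emulator's register file, which is a hypothetical simplification; the real tool would write through Unicorn. MSR-immediate forms such as `msr daifset` are deliberately not matched.

```python
MRS_MASK, MRS_BITS = 0xFFF00000, 0xD5300000  # MRS Xt, <sysreg>
MSR_MASK, MSR_BITS = 0xFFF00000, 0xD5100000  # MSR <sysreg>, Xt (register form)


def decode_sysreg_access(insn):
    """Return ('mrs'|'msr', rt) for a register-form MRS/MSR, else None."""
    if (insn & MRS_MASK) == MRS_BITS:
        return "mrs", insn & 0x1F
    if (insn & MSR_MASK) == MSR_BITS:
        return "msr", insn & 0x1F
    return None


def emulate_sysreg(insn, regs, pc):
    """Stub an MRS destination to 0 or silently accept an MSR; return next PC."""
    decoded = decode_sysreg_access(insn)
    if decoded is None:
        raise ValueError(f"not a sysreg access: {insn:#010x}")
    kind, rt = decoded
    if kind == "mrs" and rt != 31:  # Rt == 31 encodes XZR: nothing to write
        regs[f"x{rt}"] = 0
    return pc + 4  # every A64 instruction is 4 bytes
```

For example, `mrs x0, cntfrq_el0` encodes as 0xD53BE000, so it decodes to `("mrs", 0)`.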
Result: the stock AND patched v3fb blobs each emit the full 52-byte
cold-boot banner under stub=0x00 --
DDR ff1a08bde6 typ 25/04/21-14:31.26,fwver: v1.19
-- byte-identical to what comes out of the GenBook's real UART.
Under stub=0xFF both progress further, also identically:
DDR ff1a08bde6 typ 25/04/21-14:31.26,fwver: v1.19
pd/pu vd_ddr
Patched matches stock in both stub regimes. That's the regression
gate we wanted: a patcher change that breaks the DDR blob's visible
behavior now shows up as banner-divergence before any hardware flash.
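The UART capture and the banner gate can be modeled without the emulator. The offsets and stub values below are the ones listed in extension 2. Two things are assumptions, not the tool's actual code: the hook signatures, and treating "printable" as ASCII 0x20..0x7E plus CR/LF.

```python
from typing import Optional

UART2_BASE = 0xFEB50000
THR, LSR, USR = 0x00, 0x14, 0x7C

# Absolute-address read stubs so the putc polling loops see "ready".
ABS_STUB = {
    UART2_BASE + LSR: 0x60,  # THRE | TEMT
    UART2_BASE + USR: 0x02,  # TFE
}


class UartSink:
    """Collect printable bytes written to UART2's THR."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def on_write(self, addr: int, value: int) -> None:
        if addr == UART2_BASE + THR:
            ch = value & 0xFF
            # Assumption: "printable" means ASCII 0x20..0x7E plus CR/LF.
            if 0x20 <= ch <= 0x7E or ch in (0x0A, 0x0D):
                self.buf.append(ch)

    def on_read(self, addr: int, default_stub: int) -> int:
        return ABS_STUB.get(addr, default_stub)


def first_divergence(stock: bytes, patched: bytes) -> Optional[int]:
    """None if the banners match, else the first differing byte offset."""
    for i, (a, b) in enumerate(zip(stock, patched)):
        if a != b:
            return i
    return None if len(stock) == len(patched) else min(len(stock), len(patched))
```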
Discovered the rkspi wrapper format during offline RC4 probing:
the RKNS-wrapper sector at 0x8000 is plaintext. Zero padding
fills 0x8200..0x85FF. Encoded metadata sits at 0x8600. The TPL
(DDR blob) starts at 0x8800 in plaintext -- not RC4 encrypted
as I first guessed.
New checks:
- TPL entry signature (0x01 0x00 0x00 0x14 = b +4 skipping header)
at offset 0x8800 -- catches silent TPL corruption
- optional --blob <path>: byte-by-byte compare SPI[0x8800:+len(blob)]
against a reference DDR blob file, reports sha256 + first-diff
offset on mismatch
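Both new checks reduce to a few lines. The entry word `b +4` is 0x14000001, which reads as 01 00 00 14 when stored little-endian at 0x8800. Below is a minimal sketch, not spi_check.py itself. The function names and the report keys are hypothetical.

```python
import hashlib
from typing import Optional

TPL_OFF = 0x8800
# AArch64 "b +4" is 0x14000001; stored little-endian it reads 01 00 00 14.
TPL_ENTRY_SIG = (0x14000001).to_bytes(4, "little")


def first_diff(a: bytes, b: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


def check_tpl(spi: bytes, blob: Optional[bytes] = None) -> dict:
    """Check the TPL entry signature and, optionally, compare to a reference blob."""
    report = {"sig_ok": spi[TPL_OFF:TPL_OFF + 4] == TPL_ENTRY_SIG}
    if blob is not None:
        window = spi[TPL_OFF:TPL_OFF + len(blob)]
        report["sha256"] = hashlib.sha256(window).hexdigest()
        report["first_diff"] = first_diff(window, blob)
        report["pass"] = report["sig_ok"] and report["first_diff"] is None
    return report
```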
Validated against stock SPI with stock blob (PASS, sha 13c04c4f),
patched SPI with patched blob (PASS, sha 85799151), and the
cross-pair (FAIL with diagnostics).
Closes the remaining gap in phase-1 static validation -- now
we catch not just 'image has no idbloader' but also 'image has
the wrong DDR bytes'.
The blob is position-dependent: entry code at 0x14-0x1c does
(return_addr & 0xFFFFFF00) == 0xFF001000 to validate the caller.
Previously we loaded at 0x0, so the check could never pass and
emu hung at 0x1c forever.
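The caller check explains the hang directly. Consider loading at 0x0: any return address that lands inside the blob's first 256 bytes masks to 0x0, never to 0xFF001000. The sketch below assumes exactly that, namely that return_addr falls within the first 256 bytes of the load base. The source does not say precisely how return_addr is derived, so `return_addr_for` is hypothetical.

```python
TPL_SLOT = 0xFF001000


def caller_check_passes(return_addr: int) -> bool:
    # Mirrors the blob's entry test: (return_addr & 0xFFFFFF00) == 0xFF001000
    return (return_addr & 0xFFFFFF00) == TPL_SLOT


def return_addr_for(load_base: int, offset_in_blob: int) -> int:
    # Hypothetical: assumes the checked address lies inside the blob's first page
    return load_base + offset_in_blob
```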
Fixed: load blob at 0xFF001000 (bootrom TPL slot), map SRAM
window 0xFF000000..0xFF100000, let x30 point at RET_STUB outside
the window. Emu now runs through the integrity check, the first
~120 instructions of entry dispatch, and stops at blob+0x10A80
on an MSR/MRS sysreg access Unicorn doesn't model -- the same
depth ddr_emu2.c (C version) historically reached.
Stock and patched (--sites all) behave identically under both
--stub 0x00 and 0xFF. That's the regression gate: any future
patch that crashes in the pre-sysreg segment will diverge.
Also added catch-all UC_HOOK_MEM_UNMAPPED with lazy 64KB
page-map + stub fallback so unknown MMIO targets don't crash
the emu before we know about them.
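The lazy page-map is mostly alignment arithmetic. You round the faulting address down to a 64KB boundary and map that page once, filled with the stub byte. The sketch below keeps pages in a dict so it can run without Unicorn. In the real hook you would instead call `mem_map` on the page base and tell Unicorn to retry. The class name is hypothetical.

```python
PAGE_SIZE = 0x10000  # 64 KiB


class LazyMapper:
    """Map unknown MMIO pages on first touch, filled with the stub byte."""

    def __init__(self, stub_byte: int = 0x00) -> None:
        self.stub_byte = stub_byte
        self.pages: dict = {}

    @staticmethod
    def page_base(addr: int) -> int:
        return addr & ~(PAGE_SIZE - 1)  # clear the low 16 bits

    def on_unmapped(self, addr: int) -> int:
        """Return the page base, creating a stub-filled page on first touch."""
        base = self.page_base(addr)
        if base not in self.pages:
            self.pages[base] = bytearray([self.stub_byte]) * PAGE_SIZE
        return base
```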
Executes a raw DDR blob in AArch64 Unicorn with configurable stub
byte (--stub 0x00 / 0xFF) returned for every MMIO read. Intent:
gate real-hardware flashing behind "blob doesn't crash the emu
under either stubbing regime."
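Two stub regimes are used because each one satisfies a different class of poll. With all-ones, "wait until bit set" exits immediately. With all-zeros, "wait until bit clear" exits immediately. Each regime spins on the other class. Below is a minimal model of a stubbed read of any width. The helper names are hypothetical.

```python
def stub_read_value(stub: int, size: int) -> int:
    """Value a `size`-byte MMIO read returns when every byte reads as `stub`."""
    return int.from_bytes(bytes([stub]) * size, "little")


def poll_exits(stub: int, size: int, mask: int, want_set: bool) -> bool:
    """Does a 'wait until (reg & mask) is set/clear' poll exit on its first read?"""
    hit = stub_read_value(stub, size) & mask
    return bool(hit) if want_set else not hit
```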
Validated against rk3588_ddr_lp4_1848MHz_lp5_2112MHz_v1.19.bin
(stock) and patch_timeouts_v3.py --sites all output: both reach
max_pc=0xe0 and HALT cleanly via the return stub at RET_STUB,
identical under 0x00 and 0xFF stubs.
Phase 2 of test harness task #31. Phase 1 (spi_check.py,
structural RKNS validation) committed earlier.
Would have caught today's 3-brick cycle (all-fb, midlate-fb, early-fb
bricked GenBook identically). Patched SPI images had 0xFF in the
entire idbloader region because u-boot's mkimage silently failed
to produce idbloader-spi.img when the DDR blob grew by 548 bytes.
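The brick signature here is a region that reads back as erased NOR flash, meaning all bytes are 0xFF. Below is a minimal sketch of that static check. The region bounds are left to the caller because this commit does not state them. The helper name is hypothetical.

```python
def region_is_erased(spi: bytes, start: int, end: int) -> bool:
    """True if spi[start:end] is entirely 0xFF, i.e. erased NOR flash."""
    if not 0 <= start < end <= len(spi):
        raise ValueError("region outside image")
    return spi[start:end].count(0xFF) == end - start
```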
Static-only — no emulation yet. Phase 1 of the broader test-harness
task. Phase 2 will extend ddr_emu2 to execute TPL from SPI image
with stubbed MMIO.
Changed u64v handshake reads to u32v with an inline zero-extending
upcast. Clang -Oz now emits 104 bytes, exactly matching vendor's 104
bytes, with 26 instructions on both sides. Three semantically equivalent
byte differences remain (register allocation, tst-form, test width)
that aren't closable from C alone; they need armclang or inline asm.
Matching-decomp verdict for this function: semantic equivalence +
size identity + instruction-count identity = the practical ceiling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tested candidate.c across GCC-15 and clang-19 optimization levels:
gcc -Os → 116 B (+12)
clang -O2/Os/Oz → 108 B (+4) ← best
vendor → 104 B (0)
Vendor output is SMALLER than GCC -Os, which rules out the 'dumb
compiler' hypothesis (b). Clang being only 4 bytes off suggests
the vendor uses armclang or a similarly-tuned LLVM fork (hypothesis a).
Immediate consequence: default compiler for matching-decomp on this
blob is clang, not GCC. Our train_phy_block starting score jumps
from 89.7% (GCC -Os) to 96% (clang -Oz) before any C tweaking.
Pushing past 96% likely needs armclang or per-site inline asm.
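The percentages quoted here are consistent with a simple vendor/candidate byte ratio: 104/116 rounds to 89.7% and 104/108 to 96%. The sketch below assumes that definition, with a cap at 1.0. The grading tool may compute the score differently.

```python
VENDOR_BYTES = 104

sizes = {
    "gcc -Os": 116,
    "clang -O2/Os/Oz": 108,
}


def size_match(vendor: int, candidate: int) -> float:
    """Assumed metric: vendor size / candidate size, capped at 1.0."""
    return min(vendor / candidate, 1.0)


for name, n in sizes.items():
    print(f"{name:16s} {n:4d} B  {size_match(VENDOR_BYTES, n):6.1%}")
```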
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous commit committed sibling claims without verifying against
the TRM bit tables. Verification fails for d328:
Sibling: +0x110 = CAL_RD_VWML0 (from TRM §2.4.3).
Blob: writes 0xF000F000 to that offset.
TRM: CAL_RD_VWML0 is READ-ONLY, bits[9:0]=rd_vwml0 code, [25:16]=rd_vwml1.
Writing is a no-op.
Root cause of sibling's error: conflated 'DDRPHY_OPB + offset' with
d328's 'DDRPHY_OPB + 0x8000 + offset'. The +0x8000 sub-block is NOT
documented in the TRM; offsets 0x110/0x118/0x120/0x154/0x160/0x184
WITHIN that sub-block mean something different from CAL_RD_VWML0 etc.
Kept the TRM-verified names I DID check:
- DDRCTL_DFISTAT @ +0x10514 (site 3)
- DDRCTL_STAT @ +0x10014 (sites 2,4,5,7)
- DDRPHY_SCHD_TRAIN_CON0 @ +0xa24 — bit layout verified directly
Retracted names for d328's +0x8XXX accesses; restoring the PHY_CTL_110
etc. RE-guess labels as the safe fallback. True names remain unknown
until we get hardware-trace data or the Synopsys DWC PUB databook.
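The fallback rule can be made mechanical. TRM names apply only at DDRPHY_OPB + offset. Anything in d328's +0x8000 window keeps its RE label with an `(RE)` tag. Below is a minimal sketch, with one assumption stated up front: the sub-block is treated as extending from +0x8000 upward, which is unverified. The helper name is hypothetical.

```python
PHY_SUBBLOCK = 0x8000  # d328's undocumented DDRPHY_OPB + 0x8000 window

# TRM-named offsets, valid only at DDRPHY_OPB + offset (not inside the sub-block).
TRM_PHY_NAMES = {
    0x110: "DDRPHY_CAL_RD_VWML0",
    0xA24: "DDRPHY_SCHD_TRAIN_CON0",
}


def phy_label(offset: int) -> str:
    """Label an offset from DDRPHY_OPB; never borrow TRM names for the sub-block."""
    if offset >= PHY_SUBBLOCK:  # assumption: sub-block spans +0x8000 upward
        return f"PHY_CTL_{offset - PHY_SUBBLOCK:X} (RE)"
    return TRM_PHY_NAMES.get(offset, f"PHY_{offset:X} (RE)")
```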
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sibling went back into the TRM and found §2.4.3 'Registers Summary
For DDRPHY' which I'd missed — it names almost every PHY PUB register
we'd been calling 'RE guess':
+0x110 = DDRPHY_CAL_RD_VWML0 (Read Valid Window Margin Left Code 0)
+0x120 = DDRPHY_CAL_RD_VWMR0 (Read Valid Window Margin Right Code 0)
+0x160 = DDRPHY_CAL_CON5 (Calibration Control 5: wrtrn_cyc_mode/en/th)
+0x684 = DDRPHY_PRBS_CON0 (PRBS Training Control — was 'CalBusy')
+0xa24 = DDRPHY_SCHD_TRAIN_CON0 (MASTER training scheduler; full bit map
in the TRM — every training type + per-rank)
+0xb88 = DDRPHY_DQSDUTY_CON2 (DQS rise-duty monitor — was 'UctShadow')
SCHD_TRAIN_CON0 is the master — the blob selects a training type via
its enable bits and polls bit[1] phy_train_done. Four of our 16 poll
sites are almost certainly polling this bit across different training
stages.
Still reserved in TRM: +0x118, +0x154, +0x184 — training-engine
private FSMs. Only dynamic tracing can name these.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extracted from rkbin git (using git show <add-commit>:<path> since
each version-update commit deletes the previous binary). Sits here
so future runs don't need to re-extract.
Byte-diff analysis: v1.x → v1.(x+1) differs in ~88% of bytes. Every
release is a near-rewrite, not a patch. Consequence: cross-version
symbol porting tools (Polypyus, BinDiff) would match few functions
on this target. Function-level opcode-silhouette matching with
wildcards for branch targets may still work but needs Ghidra baseline
in each version, not a one-sided v1.19 annotation.
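One way to compute a byte-diff fraction between two releases is an offset-aligned comparison, where any length difference counts as differing. This commit does not say exactly how the ~88% was measured, so treat the sketch below as an assumed metric. Note that this version counts shifted-but-identical code as different.

```python
def byte_diff_fraction(a: bytes, b: bytes) -> float:
    """Fraction of byte positions that differ, padding the shorter input."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - same / n
```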
Polypyus attempted 2026-04-15 — blocked by Python 3.14 / pony 0.7
bytecode decompiler incompatibility (LOAD_FAST_BORROW + varname
table layout). Would need pyenv Python 3.11 venv, or switch to
BinDiff CLI (no Python dep). Deferred.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Decoded all 16 poll sites against RK3588 TRM Part 2 where possible:
- 1 site (site 3) polls DDRCTL_DFISTAT (vendor-canonical, TRM-named)
- 4 sites (2,4,5,7) poll DDRCTL + 0x10014 — likely STAT.operating_mode
per generic DWC uMCTL2 convention; TRM cross-ref TBD
- 11 sites are DWC PUB / Innosilicon PHY — still RE-only (TRM does
not republish the PHY register map)
- 1 unusual site (site 10) polls absolute 0xff000024 in SRAM_BOOT
region — possibly a BL2 handoff word, not a PHY poll. Flagged for
special treatment in the v3fb bisection plan.
Known tensions documented:
- Site 3's DFISTAT test uses bits[2:1] (mask 0x6), generic uMCTL2 has
only bit[0] defined there → RK3588 likely extends DFISTAT with
vendor-specific bits. Need to verify from TRM bit tables.
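The tension is easy to state in bit terms. Site 3's mask 0x6 selects bits 1 and 2. Neither bit falls in the bit[0]-only layout described above. Below is a tiny helper that lists which tested bits a reference layout leaves undefined. The names are hypothetical.

```python
GENERIC_DFISTAT_BITS = 0x1  # generic uMCTL2 layout as described above: bit[0] only
SITE3_MASK = 0x6            # bits[2:1] tested by site 3


def set_bits(mask: int) -> list:
    """Indices of the set bits in a 32-bit mask, lowest first."""
    return [i for i in range(32) if (mask >> i) & 1]


def bits_outside(mask: int, defined: int) -> list:
    """Bits a poll tests that the reference layout does not define."""
    return set_bits(mask & ~defined)
```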
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per RK3588 TRM Part 2 chapter 2 (DMC, 522 pages):
+0x10080 = DDRCTL_MRCTRL0 (Mode Register Control, was MicroReset)
+0x10090 = DDRCTL_MRSTAT (MR Status mr_wr_busy, was MicroContMuxSel)
+0x10514 = DDRCTL_DFISTAT (DFI Status dfi_init_complete, was UctWriteProtShadow)
These are uMCTL2 controller registers — Rockchip-documented — NOT the
opaque PHY firmware scratch regs our 2026-04 analysis guessed. Poll
semantics now vendor-grounded: wait for MR command roundtrip, wait
for PHY-side DFI handshake.
Low-offset polls in train_phy_block (0x110, 0x118, 0x120, 0x154, 0x160,
0x184) plus the 0x684/0xa24/0xb88 ones remain DWC PUB and thus
undocumented; kept the best-effort RE names with `(RE)` tag in the
BUG_ANALYSIS table so a reader can tell which ones are vendor-canonical
and which are guesses.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TRM Part 2 chapter 2 (DMC, 522 pages) reveals the offsets we poll at
+0x10080/+0x10090/+0x10514 are NOT PHY firmware scratch regs as our
earlier analysis guessed. They are uMCTL2 controller registers:
+0x10080 = DDRCTL_MRCTRL0 (Mode Register Control)
+0x10090 = DDRCTL_MRSTAT (Mode Register Status — wait for MR complete)
+0x10514 = DDRCTL_DFISTAT (DFI Status — wait for PHY handshake)
Semantics are now grounded in vendor docs instead of educated guesses.
The PHY-side polls (0x110, 0x118, 0x184 etc. in d328) remain
undocumented — TRM does not republish the Synopsys DWC PUB register
map. Still need RE for those.
TRM cached at boltzmann:~/projects/AMPere/vendor/trm/ (pdf + txt).
Fetched via Stanford mirror (surfaced by the Chinese-language research
sibling alongside rk-open-docs (mfkiwl/rk-open-docs), which has Rockchip
internal DDR docs for the RK322x..RK1808 era; those cover the Innosilicon
PHY, not our DWC multiPHY, so they help with methodology but are not a
direct reference).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FUN_0000d10c @ 0xd10c (49 insts) contains poll site 11.
Semantically decoded as a PHY-side prologue for frequency-change
handshake: saves current state of one PHY CTL + four secondary-table
entries, waits for PHY firmware to reach state 1 (idle).
Matching-decomp iteration deferred, unlike the clean first lift (d328):
d10c's two-base-pointer csel pattern plus parity-dependent offset
chain gives GCC too much register-allocation freedom. Getting to
>=90% byte-match would be an afternoon of iteration; time better
spent expanding pre-UART coverage breadth.
Poll-site coverage so far:
d328: sites 12, 13, 14, 15 (C candidate at 89.7% size match)
d10c: site 11 (reference C only, no matching iteration)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- benchmark/ai_ghidra/SETUP.md documents the GhidrAssist 1.5.0 install
at /opt/ghidra/Ghidra/Extensions/GhidrAssist/ on oppenheimer (CT131),
with dirac endpoints (Hermes-2-Pro 8B @ :8080, Qwen-coder 1.5B @ :8081)
already reachable + tested. Final enable+config is UI-only; two
clicks on next Ghidra launch.
- gdb_debug/harness.c extended with case 4 = train_phy_block running
under a synthetic PHY at 0x40000000. Static MMIO shim satisfies
polls 1-3; poll 4 needs dynamic state-machine (next session, via
SIGBUS handler or ptrace) — documented in the README.
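The difference between polls 1-3 and poll 4 is a static shim versus a stateful one. Below is a Python model of a register that only reports done after it has been written and then read a few times. It is a hypothetical illustration of the state machine the harness needs, not harness.c. The 0xA24 offset, the done bit and the latency are illustrative choices.

```python
class SyntheticPhy:
    """Static registers plus one dynamic register that completes after a kick."""

    def __init__(self, static, dynamic_off, done_bit, latency=3):
        self.static = dict(static)      # offset -> fixed read value
        self.dynamic_off = dynamic_off  # register whose value evolves
        self.done_bit = done_bit
        self.latency = latency          # reads after the kick before "done"
        self.kicked = False
        self.reads_since_kick = 0

    def write(self, off, val):
        if off == self.dynamic_off:  # writes elsewhere are ignored in this model
            self.kicked = True
            self.reads_since_kick = 0

    def read(self, off):
        if off == self.dynamic_off:
            if self.kicked:
                self.reads_since_kick += 1
                if self.reads_since_kick >= self.latency:
                    return self.done_bit
            return 0
        return self.static.get(off, 0)
```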
Vendor tree investigation: Rockchip's own sdram_rk3588.c / sdram_rk3568.c
are STUBS (return -1). No free function names from there. Path forward:
mine the vendor kernel's rockchip_dmc.c (devfreq DDR scaling driver)
for register-offset naming hints at runtime-call level.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Four functions (three small helpers plus the first real poll-site
function) extracted from the v1.19 conservative blob, with
ground-truth C and per-tool (Ghidra / retdec / decomp.me) docs:
01_memset — byte memset, 28 B
02_memcpy32 — word-aligned memcpy, 36 B
03_magic_memset — magic check + tail-call to memset, 40 B
04_train_phy_block — first real poll-site function (104 B, 26 insts),
contains poll sites 12-15
Results in RESULTS.md:
- Ghidra: A on all four. Auto-decompile is close to final.
- retdec: A on #3, F on #1 and #2 (no register-arg inference on raw),
C on #4 (mistakes & 0xF0000000 for < 0x10000000).
GRIND_LOG.md (in 04_train_phy_block/) records the matching-decomp
iteration: 116-byte candidate.c at -Os vs vendor 104 bytes = 89.7%
size match on first real iteration. Remaining gap is GCC's choice of
`cmp w, w_const; b.ls` over vendor's `tst w, #imm; b.eq` for the
mask tests.
gdb_debug/ holds a native-aarch64 GDB single-stepper for the three
benchmark functions — boltzmann smoke test passed (memset:
buf[10] 0x00→0xab).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause of counted_v2 brick identified:
v2 copied only ONE non-load body instruction into each trampoline (picks
the first after the LDR). For poll patterns of form
LDR Wx, [Xbase, #off]
AND Wx, Wx, #mask ; no flag update
CMP Wx, #expected ; sets flags
B.cond .retry
— 9 of the 16 sites in v1.19 have this shape — the final CMP was silently
dropped. The trampoline's B.inv_cond tested whatever flags happened to be
set before entry, producing effectively random branch decisions once
under the trampoline. Result: boot crashes before the UART banner,
observed as 'power LED off' brick.
Fix in v3: copy the ENTIRE loop body (LDR + all intermediate instructions,
in original order) into each trampoline. Size is now 4*(N+6) where N is
body length (28 bytes for body=2, 36 for body=3).
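The v2 bug and the v3 fix differ only in how much of the loop gets copied. The sketch below uses a hypothetical four-instruction poll loop of the shape described above. It shows that the v2 selection drops the only flag-setting instruction, while the v3 selection keeps it. Textual mnemonics stand in for real decoded instructions.

```python
FLAG_SETTERS = {"CMP", "CMN", "TST", "ANDS", "SUBS", "ADDS"}


def v2_body(loop):
    """v2: the LDR plus only the first instruction after it."""
    return loop[:2]


def v3_body(loop):
    """v3: the LDR plus every instruction up to (not including) the back-branch."""
    return loop[:-1]


def sets_flags(body):
    return any(insn.split()[0].upper() in FLAG_SETTERS for insn in body)


# Hypothetical operands; the shape is LDR / AND / CMP / B.cond.
POLL_LOOP = [
    "LDR w0, [x1, #0x10]",
    "AND w0, w0, #0x6",
    "CMP w0, #0x2",
    "B.NE .retry",
]
```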
Also in v3:
- --sites subset flag for bisection (all/early/mid/late/none/index list)
- decode_sites.py helper that tries to identify which MMIO register each
site polls (best effort — the materialized_base scanner is naive and
picks up stale MOVZ targets, but cluster grouping by blob offset is
reliable and sufficient for bisection)
Site clusters in v1.19:
0..7 early (0x07b78..0x07f08): SGRF + PHY firmware state machine
8..10 mid (0x09124..0x0aaf8): DfiStatus / training start
11..15 late (0x0d154..0x0d378): UctWriteProt / CalBusy / late
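The --sites values map onto these clusters directly. Below is a minimal parser sketch. The exact index-list syntax, comma-separated integers, is an assumption, and the real patcher may accept other forms.

```python
CLUSTERS = {
    "early": range(0, 8),   # sites 0..7
    "mid": range(8, 11),    # sites 8..10
    "late": range(11, 16),  # sites 11..15
}
NUM_SITES = 16


def parse_sites(spec: str) -> list:
    """Resolve a --sites value to a sorted list of site indices."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(NUM_SITES))
    if spec == "none":
        return []
    if spec in CLUSTERS:
        return list(CLUSTERS[spec])
    # Assumed index-list syntax: comma-separated integers, duplicates ignored.
    sites = sorted({int(tok) for tok in spec.split(",") if tok.strip()})
    bad = [s for s in sites if not 0 <= s < NUM_SITES]
    if bad:
        raise ValueError(f"site index out of range: {bad}")
    return sites
```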
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v1 NOP approach was WRONG — removed the poll entirely, proceeding with
stale/incomplete register values. ZQ cal, DFI handshake, and PHY mailbox
all require actual polling until the hardware responds.
v2 uses trampoline functions appended to the binary:
- Each poll site jumps to a trampoline that retries with a W16 counter
- 16384 iterations (~91us at 1.8GHz, assuming ~10 cycles per iteration) before timeout
- On timeout, returns with condition NOT met (hits existing error path)
- On success, returns normally (original behavior preserved)
- W16 (IP0) used as counter — caller-saved, not used by poll loop bodies
35 poll loops patched, 13 non-poll backward branches left intact.
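The trampoline's contract can be modeled as a bounded poll. It returns success as soon as the condition holds. Otherwise it gives up after the limit and returns "not met", so the existing error path runs. The sketch below includes the timeout estimate. The cycles-per-iteration figure is an assumption: 10 cycles reproduces the ~91us quoted for 1.8GHz.

```python
from typing import Callable, Tuple


def counted_poll(read_reg: Callable[[], int], done: Callable[[int], bool],
                 limit: int = 16384) -> Tuple[bool, int]:
    """Model of the v2 trampoline: retry up to `limit` reads, then give up."""
    for n in range(1, limit + 1):
        if done(read_reg()):
            return True, n   # condition met: original behavior
    return False, limit      # timeout: caller takes its existing error path


def timeout_us(limit: int, cycles_per_iter: float, hz: float) -> float:
    """Worst-case wait in microseconds for an assumed per-iteration cost."""
    return limit * cycles_per_iter / hz * 1e6
```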
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Aggressive MMIO injection (try 0xFFFFFFFF, then 0, then 0x2) breaks through
all poll loops. Blob executes 19963 instructions visiting 3606 unique PCs
before jumping to unmapped memory (0x100000FFF).
Key findings:
- DDRC channels at 0xF7000000/0xF8000000 (not 0xFE01 as in the TRM; these are
the direct DDRC addresses, not the MSCH wrapper)
- Blob reads training params from internal data at 0x000154xx
- 30% code path coverage achieved
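Cycling 0xFFFFFFFF, 0 and 0x2 breaks through polls because between them they satisfy all three poll shapes: "bit set", "all clear" and "equals 2". The sketch below assumes the cycle runs per address. This commit does not say whether the real injector cycles per address, per PC, or globally, and the class name is hypothetical.

```python
from itertools import cycle

INJECT_SEQUENCE = (0xFFFFFFFF, 0x0, 0x2)


class PollBreaker:
    """Return the injection values in turn, tracked separately per address."""

    def __init__(self, seq=INJECT_SEQUENCE):
        self._seq = tuple(seq)
        self._per_addr = {}

    def read(self, addr):
        if addr not in self._per_addr:
            self._per_addr[addr] = cycle(self._seq)
        return next(self._per_addr[addr])


def spins_until(reader, addr, done, max_reads=10):
    """Number of reads before `done` holds, or None if it never does."""
    for n in range(1, max_reads + 1):
        if done(reader.read(addr)):
            return n
    return None
```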
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each poll loop branches to an appended trampoline that:
- Initializes w18 = 0x20000 (128K iterations)
- Copies the original loop body (LDR + condition check)
- Decrements w18, retries until timeout
- Falls through on timeout (no hang)
QEMU verified: original stuck at 0x10350, trampoline progresses through all polls.
Blob grows from 76704 to 78068 bytes (+1364 bytes trampoline section).
NOT YET TESTED ON REAL HARDWARE - the NOP approach bricked the GenBook.
This counted approach preserves the poll loops with a safety timeout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Ghidra decompilation of v1.02-v1.19 blobs (118 functions)
- 53 functions renamed, 79 MMIO registers mapped to TRM
- 45 timeout-less poll loops identified and patched
- Production patcher (patch_prod.py) and QEMU emulator
- Comprehensive analysis, frequency tables, community research
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>