Ship the new simulation & verification stack under simulation/:
- mmio_regions.py — address → region classifier (DDRCTL, DDRPHY,
OTP, SRAM, …). Shared by every other tool so trace output is
scannable without memorising the memory map.
- sim_tripwire.py — Bin-style per-access capture. Records
(seq, insn_tick, pc, addr, size, rw, val, region, fn_name) per
MMIO access. PCResolver bisects the vendor funs table parsed
from ddr_conservative_asm.s.
- tripwire_diff.py — PC-bucketed difflib.SequenceMatcher diff of
two tripwire CSVs. Buckets by fn_name so bitflip-induced control
flow divergence doesn't cascade noise.
- training_sim.py — DDR training simulator with --mode pass and
--mode bitflip (flip first N reads per training status, exercise
retry paths). BITFLIP_ONLY env var narrows to a single addr for
the sweep.
- bitflip_sweep.py — Flip each of 23 training-status addresses
one-at-a-time and tabulate retry convergence. Surfaces which
function(s) react to a transient fault by writing different
downstream register values.
Plus:
- mmio_diff.py updated: region-tagged divergence output,
--show-regions histogram, --tripwire-out-{vendor,rebuilt} CSV
capture, --capture-stack-writes for stack-allocated buffer diffs.
- debug_probes/tp_slot_{probe,writes}.py — ad-hoc Unicorn probes
for chasing a single-slot divergence in an SRAM buffer. Kept as
reference examples of how to extend the tripwire toolchain.
The stack found 6 silicon-hostile bugs in the rebuilt blob that
mmio_diff's write-sequence gate was structurally blind to, including
three ld-unresolved-symbol NULL derefs (case-mismatched externs,
missing DATA_SYMS) and one C-early-return-skips-shared-tail bug
where vendor's asm fell through to the tail via `b` after a `ret`.
RK3588 DDR TPL — Simulation & Verification Stack
A set of Unicorn-based tools for pre-silicon simulation, behavioral diffing, and fault-injection of Rockchip RK3588 DDR TPL blobs (vendor or rebuilt).
Built to hunt silicon-corruption bugs that mmio_diff.py's
write-sequence comparison cannot see — NULL derefs, read-side
divergences, retry-path diffs.
Synopsis
| Tool | One-line |
|---|---|
| mmio_regions.py | Address → region classifier (DDRCTL, DDRPHY, OTP, SRAM, …) |
| sim_tripwire.py | Bin-style per-access capture (PC, tick, addr, region, resolved fn name) |
| tripwire_diff.py | PC-bucketed SequenceMatcher diff of two tripwire CSVs |
| training_sim.py | DDR-training simulator with pass and bitflip-first-pass modes |
| bitflip_sweep.py | Flip each training-status address one at a time, report retry convergence |
The simulator DOES NOT need silicon. It runs vendor or rebuilt TPL blobs under Unicorn with an MMIO stub that returns "pass" values for all training-status polls, captures every access, and lets you diff runs behaviorally.
Quick start
Assuming your TPL blob is at ../rk3588_ddr_v1.19_prod.bin (a copy of
the vendor blob shipped at SPI offset 0x8000 on boards with RKBIN
v1.19) and the rebuilt blob at /tmp/rebuilt.bin:
# Run once in "pass" mode and capture tripwire to CSV
python3 training_sim.py ../rk3588_ddr_v1.19_prod.bin \
--mode pass --tripwire-out /tmp/tw-pass.csv
# Run again with the first read of every training status flipped
python3 training_sim.py ../rk3588_ddr_v1.19_prod.bin \
--mode bitflip --flip-count 1 --flip-mask 0xFFFFFFFF \
--tripwire-out /tmp/tw-flip.csv
# Diff the two runs by function bucket
python3 tripwire_diff.py /tmp/tw-pass.csv /tmp/tw-flip.csv
# Sweep every training-status address one-at-a-time and tabulate
# whether the retry loop reconverges cleanly
python3 bitflip_sweep.py ../rk3588_ddr_v1.19_prod.bin
For vendor-vs-rebuilt verification (needs ../mmio_diff.py in the
parent dir):
python3 ../mmio_diff.py --ignore-pc \
../rk3588_ddr_v1.19_prod.bin /tmp/rebuilt.bin \
--tripwire-out-vendor /tmp/tw-v.csv \
--tripwire-out-rebuilt /tmp/tw-r.csv \
--show-regions
python3 tripwire_diff.py /tmp/tw-v.csv /tmp/tw-r.csv
Architecture
mmio_regions.py — address classifier
Pure lookup table. classify(addr) returns a short tag for each
RK3588 peripheral window. Used by every other tool so trace output is
scannable without memorising the memory map.
Region tags: DDRCTL, DDRCTL:SW (STAT/PWRCTL/SWCTL/SWSTAT),
DDRCTL:MR (mode-register ops), DDRPHY, DDRPHY:TR (training
status offsets 0x080/090/0B4/3CC/514/684/A24), DDR_CRU, DDR_MEM,
SRAM, PMU_SRAM, GRF, BUS_GRF, SGRF, CRU, SCRU, PMU,
FW_DDR, OTP, UART, STACK, OTHER.
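A minimal sketch of how such a lookup table could be structured. The `REGIONS` table and this `classify` body are hypothetical and are not the contents of mmio_regions.py. Only the STACK window is taken from this README (the Unicorn scratch stack quoted under Known limitations, treated here as end-exclusive); the SRAM bounds are placeholders chosen so that the tp timing buffer at 0xff0164f8 falls inside them.

```python
# Hypothetical region table: (start, end_exclusive, tag).
# The SRAM bounds are placeholders, NOT the real RK3588 memory map.
REGIONS = [
    (0x00400000, 0x00500000, "STACK"),  # scratch stack range from this README
    (0xFF000000, 0xFF100000, "SRAM"),   # placeholder bounds
]


def classify(addr: int) -> str:
    """Return the tag of the first window containing addr, else OTHER."""
    for start, end, tag in REGIONS:
        if start <= addr < end:
            return tag
    return "OTHER"
```

A linear scan is enough for a table of a few dozen windows; more specific tags such as DDRPHY:TR would need to be listed before the broader window that contains them.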
sim_tripwire.py — per-access capture
Capture class with rd(pc, addr, size, val, tick) and wr(...)
that record one row per access:
(seq_idx, insn_tick, pc, addr, size, rw, val, region, fn_name)
fn_name comes from PCResolver, which bisects the vendor funs
table parsed from ../ddr_conservative_asm.s (115 FUN_xxxx @ offset
headers; Ghidra export). Set RK_DDR_ASM env var to override the
default asm path.
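The bisect lookup can be sketched as follows. `PCResolverSketch` is a hypothetical stand-in, not the real PCResolver: it takes an already-parsed list of (entry address, name) pairs and leaves out the parsing of the asm headers, whose exact format is not reproduced here.

```python
import bisect


class PCResolverSketch:
    """Map a PC to the function whose entry is the largest address <= PC."""

    def __init__(self, funs):
        # funs: iterable of (entry_address, name); sorted here by address.
        table = sorted(funs)
        self._starts = [addr for addr, _ in table]
        self._names = [name for _, name in table]

    def resolve(self, pc: int) -> str:
        # bisect_right returns the insertion point after any equal entry,
        # so index - 1 is the last entry with address <= pc.
        i = bisect.bisect_right(self._starts, pc) - 1
        return self._names[i] if i >= 0 else "?"
```

With entries at 0x100 and 0x180, `resolve(0x17c)` yields the name registered at 0x100, and a PC below the first entry falls back to "?" (the fallback string is an assumption).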
emit_csv(path) writes out; load_csv(path) re-hydrates. Both
training_sim.py and mmio_diff.py (in parent dir) accept a
tripwire capture object and record into it.
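A rough sketch of a capture object with this shape, assuming CSV column names that follow the record tuple above. `CaptureSketch`, the injected `classify`/`resolve` callables, and the choice to write pc, addr and val as hex strings are illustrative assumptions, not the actual sim_tripwire.py code.

```python
import csv

FIELDS = ["seq", "tick", "pc", "addr", "size", "rw", "val", "region", "fn"]
HEX_FIELDS = ("pc", "addr", "val")
INT_FIELDS = ("seq", "tick", "pc", "addr", "size", "val")


class CaptureSketch:
    def __init__(self, classify=lambda addr: "OTHER", resolve=lambda pc: "?"):
        self.rows = []
        self._classify = classify
        self._resolve = resolve

    def _record(self, rw, pc, addr, size, val, tick):
        # seq is simply the position of the row within this capture.
        self.rows.append({
            "seq": len(self.rows), "tick": tick, "pc": pc, "addr": addr,
            "size": size, "rw": rw, "val": val,
            "region": self._classify(addr), "fn": self._resolve(pc),
        })

    def rd(self, pc, addr, size, val, tick):
        self._record("rd", pc, addr, size, val, tick)

    def wr(self, pc, addr, size, val, tick):
        self._record("wr", pc, addr, size, val, tick)

    def emit_csv(self, path):
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            writer.writeheader()
            for row in self.rows:
                out = dict(row)
                for key in HEX_FIELDS:
                    out[key] = f"{row[key]:#x}"
                writer.writerow(out)


def load_csv(path):
    """Re-hydrate rows; base 0 lets int() accept both decimal and 0x-hex."""
    with open(path, newline="") as fh:
        rows = [dict(row) for row in csv.DictReader(fh)]
    for row in rows:
        for key in INT_FIELDS:
            row[key] = int(row[key], 0)
    return rows
```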
tripwire_diff.py — PC-bucketed diff
For each unique fn_name in either capture, collect records, key
them by (region, addr, rw, val, size), diff via
difflib.SequenceMatcher. quick_ratio() short-circuits buckets that
share almost nothing.
Outputs three tiers:
- OK: byte-identical key sequences (suppressed unless
--show-identical).
- minor-diff: ratio ≥ --suspect-threshold (default 0.9).
- SUSPECT: ratio below threshold, printed first with the raw edit
script.
Why PC-bucket and not index-by-index? Under bitflip mode the control flow diverges at the flip point, which destroys index alignment. Grouping by function localises divergences so one buggy bucket doesn't cascade noise into unrelated ones.
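The bucketing and tiering described above can be sketched roughly like this. It assumes records are dicts with the field names from the record tuple; `bucket_keys` and `diff_buckets` are hypothetical names, and the real tool additionally prints edit scripts, which this sketch omits.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def bucket_keys(records):
    """Group diff keys by fn name, preserving capture order per bucket."""
    buckets = defaultdict(list)
    for r in records:
        key = (r["region"], r["addr"], r["rw"], r["val"], r["size"])
        buckets[r["fn"]].append(key)
    return buckets


def diff_buckets(a_records, b_records, suspect_threshold=0.9):
    a, b = bucket_keys(a_records), bucket_keys(b_records)
    report = {}
    for fn in sorted(set(a) | set(b)):
        ka, kb = a.get(fn, []), b.get(fn, [])
        if ka == kb:
            report[fn] = ("OK", 1.0)
            continue
        sm = SequenceMatcher(None, ka, kb, autojunk=False)
        # quick_ratio() is an upper bound on ratio(): if it is already
        # below the threshold, the expensive ratio() call is skipped.
        ratio = sm.quick_ratio()
        if ratio >= suspect_threshold:
            ratio = sm.ratio()
        tier = "minor-diff" if ratio >= suspect_threshold else "SUSPECT"
        report[fn] = (tier, ratio)
    return report
```

Because each function is diffed on its own, a bucket that exists in only one capture shows up as a single SUSPECT entry rather than shifting the alignment of every later record.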
training_sim.py — DDR training simulator
Two modes:
- --mode pass: every training-status read returns its "done/OK/trained" stub value every time. Equivalent to mmio_diff's base harness.
- --mode bitflip --flip-count N --flip-mask MASK: the first N reads of each training-status address return stub_value ^ mask (default mask 0xFFFFFFFF → "not done"). Subsequent reads revert. Exercises the retry / error-recovery paths.
Training-status addresses are defined inside is_training_status();
override single-address via the BITFLIP_ONLY=0xADDR env var
(used by bitflip_sweep.py).
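The bitflip read behaviour can be illustrated with a small stand-alone stub. `BitflipStub` and its `read` method are hypothetical names, the 32-bit truncation assumes 32-bit status registers, and `only_addr` stands in for whatever training_sim.py does with BITFLIP_ONLY.

```python
from collections import Counter


class BitflipStub:
    def __init__(self, flip_count=1, flip_mask=0xFFFFFFFF, only_addr=None):
        self.flip_count = flip_count
        self.flip_mask = flip_mask
        self.only_addr = only_addr      # mirrors BITFLIP_ONLY when set
        self._reads = Counter()         # per-address read counter

    def read(self, addr, stub_value):
        """Return the value the simulated firmware sees for a status read."""
        if self.only_addr is not None and addr != self.only_addr:
            return stub_value
        self._reads[addr] += 1
        if self._reads[addr] <= self.flip_count:
            # First N reads: corrupt the "pass" value (assumes 32-bit regs).
            return (stub_value ^ self.flip_mask) & 0xFFFFFFFF
        return stub_value               # later reads revert to "pass"
```

Counting reads per address rather than globally is what lets every training-status register exercise its own retry path in a single run.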
Every run also produces a region-tagged access histogram and a UART TX dump.
bitflip_sweep.py — per-address retry convergence
Flips each training-status register one-at-a-time and summarises:
- how many records diverged from the pass-mode baseline
- whether any MMIO write value changed (= retry path took a different branch)
- which function(s) wrote the divergent values
Output is a single table row per address. A clean "write_divergence" column means retry paths converge deterministically. A non-zero count names the function whose retry wrote a different register value — which is often vendor-intended retry behavior, sometimes a port bug.
Currently sweeps 23 addresses (7 DDRPHY training + 4 DDRCTL status × 4 channels).
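One possible reading of the three summary columns, sketched over plain record dicts. `summarise_flip` is a hypothetical helper, and the definitions of "diverged" and "write divergence" used here are assumptions; bitflip_sweep.py may count them differently.

```python
from collections import Counter


def summarise_flip(baseline, flipped):
    def key(r):
        return (r["fn"], r["addr"], r["rw"], r["val"], r["size"])

    # Multiset difference: records of the flipped run with no counterpart
    # in the baseline (extra polls, retried writes, changed values).
    extra = Counter(map(key, flipped)) - Counter(map(key, baseline))
    diverged = sum(extra.values())

    # Values each (fn, addr) pair wrote in the baseline run.
    base_writes = {}
    for r in baseline:
        if r["rw"] == "wr":
            base_writes.setdefault((r["fn"], r["addr"]), set()).add(r["val"])

    # A write diverges when the flipped run writes a value that the same
    # function never wrote to that address in the baseline.
    culprits = Counter(
        r["fn"] for r in flipped
        if r["rw"] == "wr"
        and r["val"] not in base_writes.get((r["fn"], r["addr"]), set())
    )
    return {
        "diverged": diverged,
        "write_divergence": sum(culprits.values()),
        "functions": sorted(culprits),
    }
```

Under this reading, extra status polls raise the diverged count but leave write_divergence at zero, which matches the "retry converges cleanly" case.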
Record shape + diff bucketing (for tool authors)
Per-access record fields:
seq monotonic index within the capture
tick Unicorn instruction count at the access
pc access-site PC (absolute)
addr MMIO/stack/SRAM address
size 1/2/4/8
rw 'rd' or 'wr'
val value read or written (hex)
region mmio_regions.classify(addr) tag
fn PCResolver result: FUN_xxxxxxxx from the funs table
Diff key inside each fn bucket: (region, addr, rw, val, size).
Explicitly excludes pc (codegen reg-alloc shifts individual load/
store PCs within a function without changing behavior), seq, and
tick (these drift with any upstream path difference).
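For tool authors, the record and its diff key could be modelled as below. The `Record` type and `diff_key` method are illustrative names, not the classes used in sim_tripwire.py, and the addresses in the usage note are made up.

```python
from typing import NamedTuple


class Record(NamedTuple):
    seq: int      # monotonic index within the capture
    tick: int     # Unicorn instruction count at the access
    pc: int       # access-site PC (absolute)
    addr: int     # MMIO/stack/SRAM address
    size: int     # 1/2/4/8
    rw: str       # 'rd' or 'wr'
    val: int      # value read or written
    region: str   # region tag for addr
    fn: str       # resolved function name

    def diff_key(self):
        # pc, seq and tick are deliberately left out: they shift with
        # codegen or upstream path changes without changing behavior.
        return (self.region, self.addr, self.rw, self.val, self.size)
```

Two records such as `Record(3, 120, 0x1a4, 0x80, 4, "wr", 1, "DDRPHY", "FUN_00000100")` and `Record(9, 340, 0x1ac, 0x80, 4, "wr", 1, "DDRPHY", "FUN_00000100")` therefore compare equal under `diff_key()` even though their seq, tick and pc differ.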
Known limitations
- The Unicorn simulator exits early on sustained same-PC loops
(>10 000 iterations) to avoid deadlocks. Real silicon polling that
would eventually succeed is modelled via the stub returning the
success value; if your use case needs a different success-delay
profile, edit stub_value/is_training_status.
- sim_tripwire.PCResolver attributes every PC to the largest
FUN_-entry address ≤ PC. Unported code paths still resolve to a
reasonable fn_name. Ports not in the // ============ FUN_xxxx @
convention won't match.
- mmio_diff.py's --capture-stack-writes flag catches writes to
Unicorn's scratch stack 0x00400000..0x00500000, but the vendor
firmware sometimes uses SRAM-resident scratch buffers (e.g. the tp
timing buffer at 0xff0164f8) instead of the call-stack. For those,
add a dedicated hook in the probe (see
../debug_probes/tp_slot_writes.py for an example).
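The same-PC early exit can be pictured as a small counter fed from an access hook. `SamePcGuard` is a hypothetical helper; treating "iteration" as consecutive accesses from the same PC is an assumption about how the simulator counts.

```python
class SamePcGuard:
    def __init__(self, limit=10_000):
        self.limit = limit
        self._last_pc = None
        self._streak = 0

    def step(self, pc):
        """Feed the PC of each observed access; True means 'stop now'."""
        if pc == self._last_pc:
            self._streak += 1
        else:
            # A different PC ends the polling loop and restarts the count.
            self._last_pc, self._streak = pc, 1
        return self._streak > self.limit
```

In a Unicorn hook, a True result would typically be followed by a call to the engine's `emu_stop()` so the run ends instead of spinning on a poll that the stub never satisfies.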
Dependencies
- Python 3.8+
- unicorn-engine (AArch64 CPU emulator)
- difflib (stdlib)
pip install unicorn
License
GPL-2.0-or-later, matching the port candidates' SPDX headers.