simulation: tripwire + PC-bucketed diff + bitflip sweep

Ship the new simulation & verification stack under simulation/:

- mmio_regions.py — address → region classifier (DDRCTL, DDRPHY,
  OTP, SRAM, …). Shared by every other tool so trace output is
  scannable without memorising the memory map.
- sim_tripwire.py — Bin-style per-access capture. Records
  (seq, insn_tick, pc, addr, size, rw, val, region, fn_name) per
  MMIO access. PCResolver bisects the vendor funs table parsed
  from ddr_conservative_asm.s.
- tripwire_diff.py — PC-bucketed difflib.SequenceMatcher diff of
  two tripwire CSVs. Buckets by fn_name so bitflip-induced control
  flow divergence doesn't cascade noise.
- training_sim.py — DDR training simulator with --mode pass and
  --mode bitflip (flip first N reads per training status, exercise
  retry paths). BITFLIP_ONLY env var narrows to a single addr for
  the sweep.
- bitflip_sweep.py — Flip each of 23 training-status addresses
  one-at-a-time and tabulate retry convergence. Surfaces which
  function(s) react to a transient fault by writing different
  downstream register values.

Plus:

- mmio_diff.py updated: region-tagged divergence output,
  --show-regions histogram, --tripwire-out-{vendor,rebuilt} CSV
  capture, --capture-stack-writes for stack-allocated buffer diffs.
- debug_probes/tp_slot_{probe,writes}.py — ad-hoc Unicorn probes
  for chasing a single-slot divergence in an SRAM buffer. Kept as
  reference examples of how to extend the tripwire toolchain.

The stack found 6 silicon-hostile bugs in the rebuilt blob that
mmio_diff's write-sequence gate was structurally blind to, including
three ld-unresolved-symbol NULL derefs (case-mismatched externs,
missing DATA_SYMS) and one C-early-return-skips-shared-tail bug
where vendor's asm fell through to the tail via `b` after a `ret`.
2026-04-22 05:55:28 +02:00
parent e20563e2ef
commit 46155bbe91
10 changed files with 1796 additions and 2 deletions
@@ -0,0 +1,197 @@
# RK3588 DDR TPL — Simulation & Verification Stack
A set of Unicorn-based tools for pre-silicon simulation, behavioral
diffing, and fault-injection of Rockchip RK3588 DDR TPL blobs (vendor
or rebuilt).
Built to hunt silicon-corruption bugs that `mmio_diff.py`'s
write-sequence comparison cannot see — NULL derefs, read-side
divergences, retry-path diffs.
## Synopsis
| Tool | One-line |
|---|---|
| `mmio_regions.py` | Address → region classifier (`DDRCTL`, `DDRPHY`, `OTP`, `SRAM`, …) |
| `sim_tripwire.py` | Bin-style per-access capture (PC, tick, addr, region, resolved fn name) |
| `tripwire_diff.py` | PC-bucketed `SequenceMatcher` diff of two tripwire CSVs |
| `training_sim.py` | DDR-training simulator with `pass` and `bitflip` modes |
| `bitflip_sweep.py` | Flip each training-status address one at a time, report retry convergence |
The simulator **DOES NOT** need silicon. It runs vendor or rebuilt TPL
blobs under Unicorn with an MMIO stub that returns "pass" values for
all training-status polls, captures every access, and lets you diff
runs behaviorally.
## Quick start
Assuming your TPL blob is at `../rk3588_ddr_v1.19_prod.bin` (a copy of
the vendor blob shipped at SPI offset `0x8000` on boards with RKBIN
v1.19) and the rebuilt blob at `/tmp/rebuilt.bin`:
```bash
# Run once in "pass" mode and capture tripwire to CSV
python3 training_sim.py ../rk3588_ddr_v1.19_prod.bin \
--mode pass --tripwire-out /tmp/tw-pass.csv
# Run again with the first read of every training status flipped
python3 training_sim.py ../rk3588_ddr_v1.19_prod.bin \
--mode bitflip --flip-count 1 --flip-mask 0xFFFFFFFF \
--tripwire-out /tmp/tw-flip.csv
# Diff the two runs by function bucket
python3 tripwire_diff.py /tmp/tw-pass.csv /tmp/tw-flip.csv
# Sweep every training-status address one-at-a-time and tabulate
# whether the retry loop reconverges cleanly
python3 bitflip_sweep.py ../rk3588_ddr_v1.19_prod.bin
```
For vendor-vs-rebuilt verification (needs `../mmio_diff.py` in the
parent dir):
```bash
python3 ../mmio_diff.py --ignore-pc \
../rk3588_ddr_v1.19_prod.bin /tmp/rebuilt.bin \
--tripwire-out-vendor /tmp/tw-v.csv \
--tripwire-out-rebuilt /tmp/tw-r.csv \
--show-regions
python3 tripwire_diff.py /tmp/tw-v.csv /tmp/tw-r.csv
```
## Architecture
### `mmio_regions.py` — address classifier
Pure lookup table. `classify(addr)` returns a short tag for each
RK3588 peripheral window. Used by every other tool so trace output is
scannable without memorising the memory map.
Region tags: `DDRCTL`, `DDRCTL:SW` (STAT/PWRCTL/SWCTL/SWSTAT),
`DDRCTL:MR` (mode-register ops), `DDRPHY`, `DDRPHY:TR` (training
status offsets `0x080/090/0B4/3CC/514/684/A24`), `DDR_CRU`, `DDR_MEM`,
`SRAM`, `PMU_SRAM`, `GRF`, `BUS_GRF`, `SGRF`, `CRU`, `SCRU`, `PMU`,
`FW_DDR`, `OTP`, `UART`, `STACK`, `OTHER`.
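
As a rough illustration of the lookup-table idea, the sketch below classifies an address by bisecting a sorted list of windows. Only the `STACK` window (`0x00400000..0x00500000`) is taken from this README; the `SRAM` and `DDRPHY` bounds are placeholders chosen for the example and are not the real RK3588 memory map, and the actual `mmio_regions.py` may be organised differently.

```python
from bisect import bisect_right

# (start, end_exclusive, tag). Only STACK comes from this README; the other
# bounds are hypothetical placeholders, NOT the real RK3588 memory map.
_WINDOWS = sorted([
    (0x00400000, 0x00500000, "STACK"),
    (0xFE000000, 0xFE010000, "DDRPHY"),  # hypothetical bounds
    (0xFF000000, 0xFF100000, "SRAM"),    # hypothetical bounds
])
_STARTS = [w[0] for w in _WINDOWS]


def classify(addr: int) -> str:
    """Return the tag of the window containing addr, or OTHER."""
    i = bisect_right(_STARTS, addr) - 1  # last window starting at or below addr
    if i >= 0:
        _start, end, tag = _WINDOWS[i]
        if addr < end:
            return tag
    return "OTHER"


for a in (0x00400010, 0xFF0164F8, 0x12345678):
    print(hex(a), classify(a))
```

Keeping the table sorted by start address makes each lookup a binary search, which matters when every captured access is tagged.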
### `sim_tripwire.py` — per-access capture
`Capture` class with `rd(pc, addr, size, val, tick)` and `wr(...)`
that record one row per access:

    (seq_idx, insn_tick, pc, addr, size, rw, val, region, fn_name)

`fn_name` comes from `PCResolver`, which bisects the vendor funs
table parsed from `../ddr_conservative_asm.s` (115 `FUN_xxxx @ offset`
headers; Ghidra export). Set `RK_DDR_ASM` env var to override the
default asm path.
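
The bisect-based resolution described above can be sketched as follows. The header regex, the `load_base` parameter, the `"?"` fallback, and the class name `PCResolverSketch` are assumptions made for illustration; the real parser in `sim_tripwire.py` may match the Ghidra export differently.

```python
import re
from bisect import bisect_right

# Assumed header shape, e.g. "// ============ FUN_00000100 @ 0x100".
_HEADER = re.compile(r"FUN_([0-9a-fA-F]+)\s*@\s*(0x[0-9a-fA-F]+)")


class PCResolverSketch:
    """Map a PC to the function whose entry is the largest address <= PC."""

    def __init__(self, asm_text: str, load_base: int = 0):
        entries = sorted(
            (load_base + int(off, 16), "FUN_" + name)
            for name, off in _HEADER.findall(asm_text)
        )
        self._starts = [addr for addr, _ in entries]
        self._names = [name for _, name in entries]

    def resolve(self, pc: int) -> str:
        i = bisect_right(self._starts, pc) - 1
        return self._names[i] if i >= 0 else "?"  # "?" = below first entry


sample = """
// ============ FUN_00000100 @ 0x100
// ============ FUN_00000240 @ 0x240
"""
resolver = PCResolverSketch(sample)
print(resolver.resolve(0x23C))  # falls inside the first function
```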
`emit_csv(path)` writes out; `load_csv(path)` re-hydrates. Both
`training_sim.py` and `mmio_diff.py` (in parent dir) accept a
tripwire capture object and record into it.
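
A minimal sketch of the capture and CSV round trip, assuming the column names from the record tuple above. `CaptureSketch` is a hypothetical stand-in: it takes `region` and `fn_name` as keyword arguments, whereas the real `Capture` presumably derives them from the classifier and `PCResolver`, and the hex formatting of `pc`, `addr`, and `val` is also an assumption.

```python
import csv
import os
import tempfile

FIELDS = ["seq", "insn_tick", "pc", "addr", "size", "rw", "val",
          "region", "fn_name"]
_HEX = ("pc", "addr", "val")
_INT = ("seq", "insn_tick", "size")


class CaptureSketch:
    def __init__(self):
        self.rows = []

    def _add(self, rw, pc, addr, size, val, tick, region="OTHER", fn_name="?"):
        self.rows.append({
            "seq": len(self.rows), "insn_tick": tick, "pc": pc, "addr": addr,
            "size": size, "rw": rw, "val": val,
            "region": region, "fn_name": fn_name,
        })

    def rd(self, pc, addr, size, val, tick, **tags):
        self._add("rd", pc, addr, size, val, tick, **tags)

    def wr(self, pc, addr, size, val, tick, **tags):
        self._add("wr", pc, addr, size, val, tick, **tags)

    def emit_csv(self, path):
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            for row in self.rows:
                out = dict(row)
                for k in _HEX:
                    out[k] = hex(row[k])  # addresses and values as hex text
                writer.writerow(out)


def load_csv(path):
    rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for k in _INT:
                row[k] = int(row[k])
            for k in _HEX:
                row[k] = int(row[k], 16)
            rows.append(dict(row))
    return rows


cap = CaptureSketch()
cap.rd(0x1000, 0xFF0164F8, 4, 0x1, tick=10, region="SRAM", fn_name="FUN_00001000")
cap.wr(0x1004, 0xFF0164F8, 4, 0x2, tick=11, region="SRAM", fn_name="FUN_00001000")
path = os.path.join(tempfile.mkdtemp(), "tw.csv")
cap.emit_csv(path)
rows = load_csv(path)
```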
### `tripwire_diff.py` — PC-bucketed diff
For each unique `fn_name` in either capture, collect records, key
them by `(region, addr, rw, val, size)`, and diff via
`difflib.SequenceMatcher`. `quick_ratio()` short-circuits buckets that
share almost nothing.
Outputs three tiers:
- **OK**: byte-identical key sequences (suppressed unless
`--show-identical`).
- **minor-diff**: ratio ≥ `--suspect-threshold` (default 0.9).
- **SUSPECT**: ratio below threshold, printed first with the raw
edit script.
Why PC-bucket and not index-by-index? Under bitflip mode the control
flow diverges at the flip point, which destroys index alignment.
Grouping by function localises divergences so one buggy bucket
doesn't cascade noise into unrelated ones.
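
The bucketing and tiering described in this section can be sketched roughly as below. This is an illustration under stated assumptions rather than the code in `tripwire_diff.py`: records are plain dicts, the helper names are made up, and the short-circuit simply reuses `quick_ratio()` (an upper bound on `ratio()`) when it already falls below the threshold.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def diff_key(rec):
    return (rec["region"], rec["addr"], rec["rw"], rec["val"], rec["size"])


def bucket(records):
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["fn_name"]].append(diff_key(rec))
    return buckets


def bucketed_diff(a_records, b_records, suspect_threshold=0.9):
    a, b = bucket(a_records), bucket(b_records)
    report = {}
    for fn in sorted(set(a) | set(b)):
        ka, kb = a.get(fn, []), b.get(fn, [])
        if ka == kb:
            report[fn] = ("OK", 1.0)
            continue
        sm = SequenceMatcher(None, ka, kb, autojunk=False)
        upper = sm.quick_ratio()  # cheap upper bound on ratio()
        # If even the upper bound is below the threshold, skip the full diff.
        ratio = upper if upper < suspect_threshold else sm.ratio()
        tier = "minor-diff" if ratio >= suspect_threshold else "SUSPECT"
        report[fn] = (tier, ratio)
    return report


def mk(fn, vals):
    return [{"fn_name": fn, "region": "DDRPHY", "addr": 0x1000 + 4 * i,
             "rw": "wr", "val": v, "size": 4} for i, v in enumerate(vals)]


base = mk("FUN_a", range(20)) + mk("FUN_b", [1, 2]) + mk("FUN_c", [7])
other = (mk("FUN_a", [99 if i == 5 else i for i in range(20)])
         + mk("FUN_b", [3, 4]) + mk("FUN_c", [7]))
report = bucketed_diff(base, other)
for fn, (tier, ratio) in report.items():
    print(fn, tier, round(ratio, 2))
```

In the toy data, one changed write in `FUN_a` stays a minor diff, while `FUN_b` shares nothing and is flagged without affecting the other buckets.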
### `training_sim.py` — DDR training simulator
Two modes:
- `--mode pass` — every training-status read returns its "done/OK/
trained" stub value every time. Equivalent to `mmio_diff`'s base
harness.
- `--mode bitflip --flip-count N --flip-mask MASK` — the first `N`
reads of each training-status address return `stub_value ^ mask`
(default mask `0xFFFFFFFF` → "not done"). Subsequent reads revert.
Exercises the retry / error-recovery paths.
Training-status addresses are defined inside `is_training_status()`;
override single-address via the `BITFLIP_ONLY=0xADDR` env var
(used by `bitflip_sweep.py`).
Every run prints a region-tagged access histogram and a UART TX dump.
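
The flip-first-N-reads behavior can be sketched as a small read stub. The class name `BitflipStub`, the `only_from_env` helper, and the example addresses are hypothetical; the sketch also assumes the caller has already decided that the address is a training-status register.

```python
import os
from collections import Counter


def only_from_env(env=os.environ):
    """Parse BITFLIP_ONLY=0xADDR if set, else return None (flip all)."""
    raw = env.get("BITFLIP_ONLY")
    return int(raw, 16) if raw else None


class BitflipStub:
    def __init__(self, flip_count=1, flip_mask=0xFFFFFFFF, only=None):
        self.flip_count = flip_count
        self.flip_mask = flip_mask
        self.only = only          # restrict flipping to one address
        self.reads = Counter()    # per-address read counter

    def read(self, addr, stub_value):
        if self.only is not None and addr != self.only:
            return stub_value
        self.reads[addr] += 1
        if self.reads[addr] <= self.flip_count:
            return stub_value ^ self.flip_mask  # first N reads are corrupted
        return stub_value                       # later reads revert


stub = BitflipStub(flip_count=2)
seen = [stub.read(0x1000, 0x1) for _ in range(3)]  # 0x1000 is a made-up address
print([hex(v) for v in seen])
```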
### `bitflip_sweep.py` — per-address retry convergence
Flips each training-status register one-at-a-time and summarises:
- how many records diverged from the pass-mode baseline
- whether any MMIO write value changed (= retry path took a
different branch)
- which function(s) wrote the divergent values
Output is one table row per address. A zero in the "write_divergence"
column means the retry path converged on the same register writes as
the pass-mode baseline. A non-zero count is reported together with the
function whose retry wrote a different register value, which is often
vendor-intended retry behavior and sometimes a port bug.
Currently sweeps 23 addresses: 7 DDRPHY training-status offsets plus
4 DDRCTL status registers on each of 4 channels (7 + 4 × 4 = 23).
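
One way to compute a per-address summary row is a multiset comparison of the baseline and flipped captures. How `bitflip_sweep.py` actually counts divergence is not specified here, so the function below, its field names, and the toy records are assumptions for illustration only.

```python
from collections import Counter


def _key(rec):
    return (rec["fn_name"], rec["addr"], rec["rw"], rec["val"], rec["size"])


def summarise(baseline, flipped):
    base = Counter(_key(r) for r in baseline)
    flip = Counter(_key(r) for r in flipped)
    only_base, only_flip = base - flip, flip - base
    # Writes that appear only in the flipped run: the retry wrote new values.
    new_writes = [k for k in only_flip if k[2] == "wr"]
    return {
        "diverged_records": sum(only_base.values()) + sum(only_flip.values()),
        "write_divergence": len(new_writes),
        "functions": sorted({k[0] for k in new_writes}),
    }


def rec(rw, addr, val, fn="FUN_x"):
    return {"fn_name": fn, "addr": addr, "rw": rw, "val": val, "size": 4}


baseline = [rec("rd", 0x10, 0x1), rec("wr", 0x20, 5)]
# Retry re-polls the status, then writes the same value: clean convergence.
clean = [rec("rd", 0x10, 0xFFFFFFFE), rec("rd", 0x10, 0x1), rec("wr", 0x20, 5)]
# Retry writes a different value: worth a closer look.
changed = [rec("rd", 0x10, 0xFFFFFFFE), rec("rd", 0x10, 0x1), rec("wr", 0x20, 6)]

row_clean = summarise(baseline, clean)
row_changed = summarise(baseline, changed)
print(row_clean)
print(row_changed)
```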
## Record shape + diff bucketing (for tool authors)
Per-access record fields:

    seq      monotonic index within the capture
    tick     Unicorn instruction count at the access
    pc       access-site PC (absolute)
    addr     MMIO/stack/SRAM address
    size     1/2/4/8
    rw       'rd' or 'wr'
    val      value read or written (hex)
    region   mmio_regions.classify(addr) tag
    fn       PCResolver result: FUN_xxxxxxxx from the funs table

Diff key inside each fn bucket: `(region, addr, rw, val, size)`.
Explicitly excludes `pc` (codegen reg-alloc shifts individual load/
store PCs within a function without changing behavior), `seq`, and
`tick` (these drift with any upstream path difference).
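
For tool authors, the record shape and its diff key might be modelled as below. The `Record` type and the sample values are illustrative assumptions; the point is that two accesses differing only in `seq`, `tick`, and `pc` produce the same key.

```python
from typing import NamedTuple


class Record(NamedTuple):
    seq: int
    tick: int
    pc: int
    addr: int
    size: int
    rw: str
    val: int
    region: str
    fn: str

    def diff_key(self):
        # pc, seq and tick are left out on purpose: they shift with codegen
        # and upstream path differences without changing behavior.
        return (self.region, self.addr, self.rw, self.val, self.size)


vendor = Record(0, 100, 0x1F40, 0xFF0164F8, 4, "wr", 0x2A, "SRAM", "FUN_00001f00")
rebuilt = Record(3, 140, 0x1F48, 0xFF0164F8, 4, "wr", 0x2A, "SRAM", "FUN_00001f00")

print(vendor == rebuilt)                        # raw records differ
print(vendor.diff_key() == rebuilt.diff_key())  # behaviorally identical
```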
## Known limitations
- The Unicorn simulator exits early on sustained same-PC loops
(>10 000 iterations) to avoid deadlocks. Real silicon polling that
would eventually succeed is modelled via the stub returning the
success value; if your use case needs a different success-delay
profile, edit `stub_value` / `is_training_status`.
- `sim_tripwire.PCResolver` attributes every PC to the *largest
FUN_-entry address ≤ PC*. Unported code paths still resolve to a
reasonable fn_name. Ports not in the `// ============ FUN_xxxx @`
convention won't match.
- `mmio_diff.py`'s `--capture-stack-writes` flag catches writes to
Unicorn's scratch stack `0x00400000..0x00500000` — but the vendor
firmware sometimes uses SRAM-resident scratch buffers (e.g. the
`tp` timing buffer at `0xff0164f8`) instead of the call stack. For
those, add a dedicated hook in the probe (see
`../debug_probes/tp_slot_writes.py` for an example).
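
The same-PC early exit from the first limitation can be sketched as a small counter that a hook consults on every step. `SamePCGuard` is a hypothetical name, and the sketch assumes "sustained" means consecutive observations of one PC; in a Unicorn hook callback one would typically call `uc.emu_stop()` once `step()` returns `True`.

```python
class SamePCGuard:
    """Signal a stop after more than `limit` consecutive repeats of one PC."""

    def __init__(self, limit=10_000):
        self.limit = limit
        self.last_pc = None
        self.repeats = 0

    def step(self, pc):
        """Feed the current PC; return True when the run should stop."""
        if pc == self.last_pc:
            self.repeats += 1
        else:
            self.last_pc, self.repeats = pc, 0
        return self.repeats > self.limit


guard = SamePCGuard(limit=3)  # tiny limit so the example is easy to follow
results = [guard.step(0x100) for _ in range(5)]
print(results)
```

Raising `limit` is the simplest way to model a longer success delay without touching the stub values.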
## Dependencies
- Python 3.8+
- `unicorn` (Unicorn Engine Python bindings, used here for AArch64 emulation)
- `difflib` (stdlib)
```bash
pip install unicorn
```
## License
GPL-2.0-or-later, matching the port candidates' SPDX headers.