46155bbe91
Ship the new simulation & verification stack under simulation/:
- mmio_regions.py — address → region classifier (DDRCTL, DDRPHY,
OTP, SRAM, …). Shared by every other tool so trace output is
scannable without memorising the memory map.
- sim_tripwire.py — Bin-style per-access capture. Records
(seq, insn_tick, pc, addr, size, rw, val, region, fn_name) per
MMIO access. PCResolver bisects the vendor funs table parsed
from ddr_conservative_asm.s.
- tripwire_diff.py — PC-bucketed difflib.SequenceMatcher diff of
two tripwire CSVs. Buckets by fn_name so bitflip-induced control
flow divergence doesn't cascade noise.
- training_sim.py — DDR training simulator with --mode pass and
--mode bitflip (flip first N reads per training status, exercise
retry paths). BITFLIP_ONLY env var narrows to a single addr for
the sweep.
- bitflip_sweep.py — Flip each of 23 training-status addresses
one-at-a-time and tabulate retry convergence. Surfaces which
function(s) react to a transient fault by writing different
downstream register values.
Plus:
- mmio_diff.py updated: region-tagged divergence output,
--show-regions histogram, --tripwire-out-{vendor,rebuilt} CSV
capture, --capture-stack-writes for stack-allocated buffer diffs.
- debug_probes/tp_slot_{probe,writes}.py — ad-hoc Unicorn probes
for chasing a single-slot divergence in an SRAM buffer. Kept as
reference examples of how to extend the tripwire toolchain.
The stack found 6 silicon-hostile bugs in the rebuilt blob that
mmio_diff's write-sequence gate was structurally blind to, including
three ld-unresolved-symbol NULL derefs (case-mismatched externs,
missing DATA_SYMS) and one C-early-return-skips-shared-tail bug
where vendor's asm fell through to the tail via `b` after a `ret`.
198 lines
7.3 KiB
Markdown
# RK3588 DDR TPL — Simulation & Verification Stack

A set of Unicorn-based tools for pre-silicon simulation, behavioral
diffing, and fault injection of Rockchip RK3588 DDR TPL blobs (vendor
or rebuilt).

Built to hunt silicon-corruption bugs that `mmio_diff.py`'s
write-sequence comparison cannot see: NULL derefs, read-side
divergences, and retry-path diffs.

## Synopsis

| Tool | One-line |
|---|---|
| `mmio_regions.py` | Address → region classifier (`DDRCTL`, `DDRPHY`, `OTP`, `SRAM`, …) |
| `sim_tripwire.py` | Bin-style per-access capture (PC, tick, addr, region, resolved fn name) |
| `tripwire_diff.py` | PC-bucketed `SequenceMatcher` diff of two tripwire CSVs |
| `training_sim.py` | DDR-training simulator with `pass` and `bitflip` modes |
| `bitflip_sweep.py` | Flip each training-status address one at a time, report retry convergence |

The simulator **DOES NOT** need silicon. It runs vendor or rebuilt TPL
blobs under Unicorn with an MMIO stub that returns "pass" values for
all training-status polls, captures every access, and lets you diff
runs behaviorally.

## Quick start

Assuming your TPL blob is at `../rk3588_ddr_v1.19_prod.bin` (a copy of
the vendor blob shipped at SPI offset `0x8000` on boards with RKBIN
v1.19) and the rebuilt blob at `/tmp/rebuilt.bin`:

```bash
# Run once in "pass" mode and capture tripwire to CSV
python3 training_sim.py ../rk3588_ddr_v1.19_prod.bin \
    --mode pass --tripwire-out /tmp/tw-pass.csv

# Run again with the first read of every training status flipped
python3 training_sim.py ../rk3588_ddr_v1.19_prod.bin \
    --mode bitflip --flip-count 1 --flip-mask 0xFFFFFFFF \
    --tripwire-out /tmp/tw-flip.csv

# Diff the two runs by function bucket
python3 tripwire_diff.py /tmp/tw-pass.csv /tmp/tw-flip.csv

# Sweep every training-status address one-at-a-time and tabulate
# whether the retry loop reconverges cleanly
python3 bitflip_sweep.py ../rk3588_ddr_v1.19_prod.bin
```

For vendor-vs-rebuilt verification (needs `../mmio_diff.py` in the
parent dir):

```bash
python3 ../mmio_diff.py --ignore-pc \
    ../rk3588_ddr_v1.19_prod.bin /tmp/rebuilt.bin \
    --tripwire-out-vendor /tmp/tw-v.csv \
    --tripwire-out-rebuilt /tmp/tw-r.csv \
    --show-regions

python3 tripwire_diff.py /tmp/tw-v.csv /tmp/tw-r.csv
```

## Architecture

### `mmio_regions.py` — address classifier

Pure lookup table. `classify(addr)` returns a short tag for each
RK3588 peripheral window. Used by every other tool so trace output is
scannable without memorising the memory map.

Region tags: `DDRCTL`, `DDRCTL:SW` (STAT/PWRCTL/SWCTL/SWSTAT),
`DDRCTL:MR` (mode-register ops), `DDRPHY`, `DDRPHY:TR` (training
status offsets `0x080`, `0x090`, `0x0B4`, `0x3CC`, `0x514`, `0x684`,
`0xA24`), `DDR_CRU`, `DDR_MEM`, `SRAM`, `PMU_SRAM`, `GRF`, `BUS_GRF`,
`SGRF`, `CRU`, `SCRU`, `PMU`, `FW_DDR`, `OTP`, `UART`, `STACK`, `OTHER`.

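A minimal sketch of how such a lookup table could be laid out. Only the `classify(addr)` entry point, the tag names, and the training-status offsets come from the description above. The `REGIONS` and `TRAINING_OFFSETS` names are hypothetical, the window addresses are placeholders rather than the real RK3588 memory map, and treating the offsets as relative to the PHY window base is an assumption.

```python
# Hypothetical window table: (start, end_exclusive, tag).
# Placeholder addresses for illustration only, NOT the real RK3588 map.
REGIONS = [
    (0x1000_0000, 0x1001_0000, "DDRCTL"),
    (0x2000_0000, 0x2001_0000, "DDRPHY"),
    (0x3000_0000, 0x3000_1000, "OTP"),
    (0x4000_0000, 0x4010_0000, "SRAM"),
]

# Training-status offsets named for the DDRPHY:TR tag; assumed to be
# relative to the start of the PHY window.
TRAINING_OFFSETS = {0x080, 0x090, 0x0B4, 0x3CC, 0x514, 0x684, 0xA24}


def classify(addr):
    """Return the tag of the window containing addr, or 'OTHER'."""
    for start, end, tag in REGIONS:
        if start <= addr < end:
            if tag == "DDRPHY" and (addr - start) in TRAINING_OFFSETS:
                return "DDRPHY:TR"
            return tag
    return "OTHER"


for addr in (0x1000_0004, 0x2000_0080, 0x2000_0100, 0x5000_0000):
    print(f"{addr:#010x} -> {classify(addr)}")
```

A linear scan is enough for a table of a few dozen windows; a sorted table with `bisect` would be the obvious alternative if the list grew.
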
### `sim_tripwire.py` — per-access capture

`Capture` class with `rd(pc, addr, size, val, tick)` and `wr(...)`
methods that record one row per access:

    (seq_idx, insn_tick, pc, addr, size, rw, val, region, fn_name)

`fn_name` comes from `PCResolver`, which bisects the vendor funs
table parsed from `../ddr_conservative_asm.s` (115 `FUN_xxxx @ offset`
headers; Ghidra export). Set the `RK_DDR_ASM` env var to override the
default asm path.

`emit_csv(path)` writes the capture to a CSV file; `load_csv(path)`
reads one back. Both `training_sim.py` and `mmio_diff.py` (in the
parent dir) accept a tripwire capture object and record into it.

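The bisect lookup can be sketched as follows. This is an illustration of the "largest entry address ≤ PC" rule, not the actual `PCResolver` code: the constructor signature, the `resolve` method name, the `"?"` fallback, and the two-entry function table are all assumptions made for the example.

```python
import bisect


class PCResolver:
    """Map a PC to the function whose entry is the largest address <= PC."""

    def __init__(self, entries):
        # entries: iterable of (entry_address, name); sorted here for bisect.
        pairs = sorted(entries)
        self._starts = [addr for addr, _ in pairs]
        self._names = [name for _, name in pairs]

    def resolve(self, pc):
        # bisect_right gives the insertion point after any equal entry,
        # so the element just before it is the owning function.
        i = bisect.bisect_right(self._starts, pc) - 1
        return self._names[i] if i >= 0 else "?"


# Made-up function table for illustration.
resolver = PCResolver([(0x1000, "FUN_00001000"), (0x1040, "FUN_00001040")])
print(resolver.resolve(0x103C))  # FUN_00001000
print(resolver.resolve(0x1040))  # FUN_00001040
print(resolver.resolve(0x0FFF))  # ?
```
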
### `tripwire_diff.py` — PC-bucketed diff

For each unique `fn_name` in either capture, collect records, key
them by `(region, addr, rw, val, size)`, and diff the two key
sequences via `difflib.SequenceMatcher`. `quick_ratio()` short-circuits
buckets that share almost nothing.

Outputs three tiers:
- **OK**: byte-identical key sequences (suppressed unless
  `--show-identical`).
- **minor-diff**: ratio ≥ `--suspect-threshold` (default 0.9).
- **SUSPECT**: ratio below threshold, printed first with the raw
  edit script.

Why PC-bucket and not index-by-index? Under bitflip mode the control
flow diverges at the flip point, which destroys index alignment.
Grouping by function localises divergences so one buggy bucket
doesn't cascade noise into unrelated ones.

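A rough sketch of the bucketing and tiering described above, assuming records are plain dicts with the field names used in this README. The helper names (`diff_key`, `bucket_by_fn`, `tier_buckets`, `rec`) and the sample data are hypothetical; the real tool's internals may differ.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def diff_key(rec):
    # pc, seq and tick are deliberately not part of the key.
    return (rec["region"], rec["addr"], rec["rw"], rec["val"], rec["size"])


def bucket_by_fn(records):
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["fn_name"]].append(diff_key(rec))
    return buckets


def tier_buckets(a_records, b_records, suspect_threshold=0.9):
    a, b = bucket_by_fn(a_records), bucket_by_fn(b_records)
    tiers = {}
    for fn in sorted(set(a) | set(b)):
        keys_a, keys_b = a.get(fn, []), b.get(fn, [])
        if keys_a == keys_b:
            tiers[fn] = "OK"
            continue
        sm = SequenceMatcher(None, keys_a, keys_b, autojunk=False)
        # quick_ratio() is an upper bound on ratio(), so a low value
        # settles the verdict without running the full comparison.
        if sm.quick_ratio() < suspect_threshold:
            tiers[fn] = "SUSPECT"
        elif sm.ratio() >= suspect_threshold:
            tiers[fn] = "minor-diff"
        else:
            tiers[fn] = "SUSPECT"
    return tiers


def rec(fn, addr, val, rw="wr", region="DDRPHY", size=4):
    return {"fn_name": fn, "addr": addr, "val": val, "rw": rw,
            "region": region, "size": size}


# Made-up captures: FUN_a identical, FUN_b fully different,
# FUN_c differs in 1 of 20 writes.
base = [rec("FUN_a", 0x10, 1), rec("FUN_b", 0x20, 2)]
base += [rec("FUN_c", 0x100 + 4 * i, i) for i in range(20)]
other = [rec("FUN_a", 0x10, 1), rec("FUN_b", 0x20, 3)]
other += [rec("FUN_c", 0x100 + 4 * i, 0xFF if i == 10 else i) for i in range(20)]

print(tier_buckets(base, other))
```

Because each function is compared in isolation, the single changed write in `FUN_c` does not affect the verdict for `FUN_a` or `FUN_b`.
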
### `training_sim.py` — DDR training simulator

Two modes:

- `--mode pass` — every training-status read returns its
  "done/OK/trained" stub value every time. Equivalent to `mmio_diff`'s
  base harness.
- `--mode bitflip --flip-count N --flip-mask MASK` — the first `N`
  reads of each training-status address return `stub_value ^ mask`
  (default mask `0xFFFFFFFF` → "not done"). Subsequent reads revert
  to the stub value. Exercises the retry / error-recovery paths.

Training-status addresses are defined inside `is_training_status()`;
to flip a single address only, set the `BITFLIP_ONLY=0xADDR` env var
(used by `bitflip_sweep.py`).

Every run produces a region-tagged access histogram and a UART TX dump.

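The bitflip read rule can be sketched in isolation, without Unicorn. The `BitflipStub` class, its parameter names, and the sample address are hypothetical; only the "first `N` reads return `stub_value ^ mask`, then revert" behavior and the single-address narrowing are taken from the description above.

```python
from collections import defaultdict


class BitflipStub:
    """Return stub_value ^ mask for the first flip_count reads of an address."""

    def __init__(self, stub_values, flip_count=1, flip_mask=0xFFFFFFFF, only=None):
        self.stub_values = stub_values  # addr -> "pass" value
        self.flip_count = flip_count
        self.flip_mask = flip_mask
        self.only = only                # optional single address to flip
        self.reads = defaultdict(int)   # addr -> number of reads so far

    def read(self, addr):
        value = self.stub_values[addr]
        self.reads[addr] += 1
        flip = self.reads[addr] <= self.flip_count
        if self.only is not None and addr != self.only:
            flip = False
        return (value ^ self.flip_mask) & 0xFFFFFFFF if flip else value


STATUS = 0x2000_0080  # placeholder address, not a real register
stub = BitflipStub({STATUS: 0x1}, flip_count=2)
print([hex(stub.read(STATUS)) for _ in range(3)])
```

The `only` argument plays the role that `BITFLIP_ONLY` plays for the real simulator: every other address keeps returning its pass value.
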
### `bitflip_sweep.py` — per-address retry convergence

Flips each training-status register one-at-a-time and summarises:

- how many records diverged from the pass-mode baseline
- whether any MMIO write value changed (= retry path took a
  different branch)
- which function(s) wrote the divergent values

Output is a single table row per address. A clean "write_divergence"
column means retry paths converge deterministically. A non-zero
count names the function whose retry wrote a different register
value, which is often vendor-intended retry behavior and sometimes
a port bug.

Currently sweeps 23 addresses (7 DDRPHY training + 4 DDRCTL status
× 4 channels, i.e. 7 + 16).

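One way the write-divergence part of a sweep row could be computed is sketched below. It assumes records are dicts with `rw`, `fn_name`, `addr` and `val` keys, and it counts functions whose write sequence changed; whether the real column counts functions or individual records is not stated here, so treat that choice, the `summarise` and `writes_by_fn` names, and the sample data as assumptions.

```python
def writes_by_fn(records):
    """Group (addr, val) of every write by the function that issued it."""
    out = {}
    for r in records:
        if r["rw"] == "wr":
            out.setdefault(r["fn_name"], []).append((r["addr"], r["val"]))
    return out


def summarise(baseline, flipped):
    """Compare one bitflip run against the pass-mode baseline."""
    base_w, flip_w = writes_by_fn(baseline), writes_by_fn(flipped)
    divergent = sorted(fn for fn in set(base_w) | set(flip_w)
                       if base_w.get(fn) != flip_w.get(fn))
    return {"write_divergence": len(divergent), "functions": divergent}


def rec(fn, rw, addr, val):
    return {"fn_name": fn, "rw": rw, "addr": addr, "val": val}


# Made-up runs: the retry adds an extra status read in both flipped cases,
# but only the second one changes a downstream write value.
baseline = [rec("FUN_poll", "rd", 0x80, 1), rec("FUN_cfg", "wr", 0x10, 5)]
retry_converges = [rec("FUN_poll", "rd", 0x80, 0xFFFFFFFE),
                   rec("FUN_poll", "rd", 0x80, 1),
                   rec("FUN_cfg", "wr", 0x10, 5)]
retry_diverges = retry_converges[:2] + [rec("FUN_cfg", "wr", 0x10, 7)]

print(summarise(baseline, retry_converges))
print(summarise(baseline, retry_diverges))
```
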
## Record shape + diff bucketing (for tool authors)

Per-access record fields:

    seq      monotonic index within the capture
    tick     Unicorn instruction count at the access
    pc       access-site PC (absolute)
    addr     MMIO/stack/SRAM address
    size     access width in bytes: 1/2/4/8
    rw       'rd' or 'wr'
    val      value read or written (hex)
    region   mmio_regions.classify(addr) tag
    fn       PCResolver result: FUN_xxxxxxxx from the funs table

Diff key inside each fn bucket: `(region, addr, rw, val, size)`.
Explicitly excludes `pc` (codegen reg-alloc shifts individual
load/store PCs within a function without changing behavior), `seq`, and
`tick` (these drift with any upstream path difference).

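For tool authors who want to produce or consume compatible records, a CSV round trip might look like the sketch below. The column order, the hex formatting of `pc`/`addr`/`val`, and the `dump_rows`/`load_rows` helper names are assumptions for illustration; check an actual file written by `emit_csv` before relying on them.

```python
import csv
import io

# Assumed column names, taken from the field list above.
FIELDS = ["seq", "tick", "pc", "addr", "size", "rw", "val", "region", "fn"]
HEX_FIELDS = ("pc", "addr", "val")
INT_FIELDS = ("seq", "tick", "size")


def dump_rows(records, fh):
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        for name in HEX_FIELDS:
            row[name] = f"0x{rec[name]:x}"
        writer.writerow(row)


def load_rows(fh):
    records = []
    for row in csv.DictReader(fh):
        for name in INT_FIELDS:
            row[name] = int(row[name])
        for name in HEX_FIELDS:
            row[name] = int(row[name], 16)  # base 16 accepts the 0x prefix
        records.append(dict(row))
    return records


# Made-up record for the round trip.
records = [{"seq": 0, "tick": 42, "pc": 0x1040, "addr": 0x2000_0080,
            "size": 4, "rw": "rd", "val": 0x1, "region": "DDRPHY:TR",
            "fn": "FUN_00001040"}]

buf = io.StringIO(newline="")
dump_rows(records, buf)
buf.seek(0)
print(load_rows(buf) == records)
```
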
## Known limitations

- The Unicorn simulator exits early on sustained same-PC loops
  (>10 000 iterations) to avoid hanging forever. Real silicon polling
  that would eventually succeed is modelled via the stub returning the
  success value; if your use case needs a different success-delay
  profile, edit `stub_value` / `is_training_status`.
- `sim_tripwire.PCResolver` attributes every PC to the *largest
  FUN_-entry address ≤ PC*. Unported code paths still resolve to a
  reasonable fn_name. Ports not in the `// ============ FUN_xxxx @`
  convention won't match.
- `mmio_diff.py`'s `--capture-stack-writes` flag catches writes to
  Unicorn's scratch stack `0x00400000..0x00500000`, but the vendor
  firmware sometimes uses SRAM-resident scratch buffers (e.g. the
  `tp` timing buffer at `0xff0164f8`) instead of the call-stack. For
  those, add a dedicated hook in the probe (see
  `../debug_probes/tp_slot_writes.py` for an example).

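The early-exit guard from the first limitation could be modelled as a small counter, sketched here without Unicorn so it runs standalone. The `SamePCGuard` name and the idea of feeding it the access-site PC from a hook are assumptions; only the 10 000-iteration limit comes from the text above.

```python
class SamePCGuard:
    """Signal a stop once the same PC is seen more than `limit` times in a row."""

    def __init__(self, limit=10_000):
        self.limit = limit
        self.last_pc = None
        self.repeats = 0

    def should_stop(self, pc):
        if pc == self.last_pc:
            self.repeats += 1
        else:
            # A different PC means the loop made progress; restart the count.
            self.last_pc, self.repeats = pc, 1
        return self.repeats > self.limit


guard = SamePCGuard(limit=3)
print([guard.should_stop(0x100) for _ in range(4)])
print(guard.should_stop(0x104))
```

Inside a Unicorn hook, a `True` result would typically be followed by a call to `uc.emu_stop()`.
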
## Dependencies

- Python 3.8+
- `unicorn` (Unicorn Engine Python bindings; used here as the AArch64 CPU emulator)
- `difflib` (stdlib)

```bash
pip install unicorn
```

## License

GPL-2.0-or-later, matching the port candidates' SPDX headers.