
# RK3588 DDR TPL — Simulation & Verification Stack

A set of Unicorn-based tools for pre-silicon simulation, behavioral
diffing, and fault-injection of Rockchip RK3588 DDR TPL blobs (vendor
or rebuilt).

Built to hunt silicon-corruption bugs that `mmio_diff.py`'s
write-sequence comparison cannot see — NULL derefs, read-side
divergences, retry-path diffs.

## Synopsis

| Tool | One-line |
|---|---|
| `mmio_regions.py` | Address → region classifier (`DDRCTL`, `DDRPHY`, `OTP`, `SRAM`, …) |
| `sim_tripwire.py` | Bin-style per-access capture (PC, tick, addr, region, resolved fn name) |
| `tripwire_diff.py` | PC-bucketed `SequenceMatcher` diff of two tripwire CSVs |
| `training_sim.py` | DDR-training simulator with `pass` and `bitflip` modes |
| `bitflip_sweep.py` | Flip each training-status address one at a time, report retry convergence |

The simulator **DOES NOT** need silicon. It runs vendor or rebuilt TPL
blobs under Unicorn with an MMIO stub that returns "pass" values for
all training-status polls, captures every access, and lets you diff
runs behaviorally.

## Quick start

Assuming your TPL blob is at `../rk3588_ddr_v1.19_prod.bin` (a copy of
the vendor blob shipped at SPI offset `0x8000` on boards with RKBIN
v1.19) and the rebuilt blob at `/tmp/rebuilt.bin`:

```bash
# Run once in "pass" mode and capture tripwire to CSV
python3 training_sim.py ../rk3588_ddr_v1.19_prod.bin \
        --mode pass --tripwire-out /tmp/tw-pass.csv

# Run again with the first read of every training status flipped
python3 training_sim.py ../rk3588_ddr_v1.19_prod.bin \
        --mode bitflip --flip-count 1 --flip-mask 0xFFFFFFFF \
        --tripwire-out /tmp/tw-flip.csv

# Diff the two runs by function bucket
python3 tripwire_diff.py /tmp/tw-pass.csv /tmp/tw-flip.csv

# Sweep every training-status address one-at-a-time and tabulate
# whether the retry loop reconverges cleanly
python3 bitflip_sweep.py ../rk3588_ddr_v1.19_prod.bin
```

For vendor-vs-rebuilt verification (needs `../mmio_diff.py` in the
parent dir):

```bash
python3 ../mmio_diff.py --ignore-pc \
        ../rk3588_ddr_v1.19_prod.bin /tmp/rebuilt.bin \
        --tripwire-out-vendor  /tmp/tw-v.csv \
        --tripwire-out-rebuilt /tmp/tw-r.csv \
        --show-regions

python3 tripwire_diff.py /tmp/tw-v.csv /tmp/tw-r.csv
```

## Architecture

### `mmio_regions.py` — address classifier

Pure lookup table. `classify(addr)` returns a short tag for each
RK3588 peripheral window. Used by every other tool so trace output is
scannable without memorising the memory map.

Region tags: `DDRCTL`, `DDRCTL:SW` (STAT/PWRCTL/SWCTL/SWSTAT),
`DDRCTL:MR` (mode-register ops), `DDRPHY`, `DDRPHY:TR` (training
status offsets `0x080/090/0B4/3CC/514/684/A24`), `DDR_CRU`, `DDR_MEM`,
`SRAM`, `PMU_SRAM`, `GRF`, `BUS_GRF`, `SGRF`, `CRU`, `SCRU`, `PMU`,
`FW_DDR`, `OTP`, `UART`, `STACK`, `OTHER`.
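
As a rough illustration of the lookup-table approach, the sketch below classifies an address by bisecting a sorted list of windows. Only the `STACK` range is taken from this README; the `SRAM` bounds and the table layout are assumptions made for the example, and the real table in `mmio_regions.py` may be organised differently.

```python
import bisect

# (base, size, tag) windows. STACK matches the scratch-stack range quoted
# in this README; the SRAM bounds are an ASSUMED placeholder.
_WINDOWS = sorted([
    (0x00400000, 0x00100000, "STACK"),
    (0xFF000000, 0x00100000, "SRAM"),   # hypothetical bounds
])
_BASES = [base for base, _, _ in _WINDOWS]


def classify(addr: int) -> str:
    """Return the tag of the window containing addr, or 'OTHER'."""
    i = bisect.bisect_right(_BASES, addr) - 1
    if i >= 0:
        base, size, tag = _WINDOWS[i]
        if addr < base + size:
            return tag
    return "OTHER"
```

Keeping the table sorted by base address makes each lookup a single bisect, which matters because the classifier runs on every captured access.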

### `sim_tripwire.py` — per-access capture

`Capture` class with `rd(pc, addr, size, val, tick)` and `wr(...)`
that record one row per access:

    (seq_idx, insn_tick, pc, addr, size, rw, val, region, fn_name)

`fn_name` comes from `PCResolver`, which bisects the vendor function
table parsed from `../ddr_conservative_asm.s` (115 `FUN_xxxx @ offset`
headers; Ghidra export). Set the `RK_DDR_ASM` environment variable to
override the default asm path.
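
The bisect-based resolution can be sketched roughly as follows. The header regex is an assumption modelled on the `FUN_xxxx @ offset` convention (it expects a `0x`-prefixed hex offset), and `PCResolverSketch` and `load_base` are hypothetical names, not the real class or its parameters.

```python
import bisect
import re

# ASSUMED header shape: "FUN_<hex> @ 0x<hex offset>".
_HEADER = re.compile(r"(FUN_[0-9a-fA-F]+)\s*@\s*(0x[0-9a-fA-F]+)")


class PCResolverSketch:
    """Map a PC to the function whose entry is the largest one <= PC."""

    def __init__(self, asm_text: str, load_base: int = 0):
        entries = sorted(
            (int(off, 16) + load_base, name)
            for name, off in _HEADER.findall(asm_text)
        )
        self._starts = [addr for addr, _ in entries]
        self._names = [name for _, name in entries]

    def resolve(self, pc: int) -> str:
        i = bisect.bisect_right(self._starts, pc) - 1
        # A PC below the first entry has no owning function.
        return self._names[i] if i >= 0 else "?"
```

For example, `PCResolverSketch(open(path).read()).resolve(pc)` would return a `FUN_...` name for any PC at or above the first header offset.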

`emit_csv(path)` writes the capture to disk; `load_csv(path)`
re-hydrates it. Both `training_sim.py` and `mmio_diff.py` (in the
parent dir) accept a tripwire capture object and record into it.
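
A minimal sketch of the CSV round trip, assuming one header row and hex-encoded `pc`, `addr`, and `val` columns. The column names follow the record fields listed later in this README; the `Record` type and the `_sketch` function names are hypothetical, and the real on-disk layout may differ.

```python
import csv
from typing import List, NamedTuple


class Record(NamedTuple):
    seq: int
    tick: int
    pc: int
    addr: int
    size: int
    rw: str
    val: int
    region: str
    fn: str


_HEX_FIELDS = ("pc", "addr", "val")  # ASSUMED to be stored as hex strings


def emit_csv_sketch(records: List[Record], path: str) -> None:
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(Record._fields)
        for rec in records:
            writer.writerow([
                f"0x{value:x}" if name in _HEX_FIELDS else value
                for name, value in zip(Record._fields, rec)
            ])


def load_csv_sketch(path: str) -> List[Record]:
    with open(path, newline="") as fh:
        return [
            Record(
                seq=int(row["seq"]), tick=int(row["tick"]),
                pc=int(row["pc"], 16), addr=int(row["addr"], 16),
                size=int(row["size"]), rw=row["rw"],
                val=int(row["val"], 16),
                region=row["region"], fn=row["fn"],
            )
            for row in csv.DictReader(fh)
        ]
```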

### `tripwire_diff.py` — PC-bucketed diff

For each unique `fn_name` in either capture, collect records, key
them by `(region, addr, rw, val, size)`, and diff the key sequences
with `difflib.SequenceMatcher`. `quick_ratio()` short-circuits buckets
that share almost nothing.

Outputs three tiers:
- **OK**: byte-identical key sequences (suppressed unless
  `--show-identical`).
- **minor-diff**: ratio ≥ `--suspect-threshold` (default 0.9).
- **SUSPECT**: ratio below threshold, printed first with the raw
  edit script.

Why PC-bucket and not index-by-index? Under bitflip mode the control
flow diverges at the flip point, which destroys index alignment.
Grouping by function localises divergences so one buggy bucket
doesn't cascade noise into unrelated ones.
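
The bucketing and tiering described above can be sketched roughly like this. Records are plain dicts here for brevity, and `classify_buckets` is a hypothetical name. The short-circuit relies on `quick_ratio()` being an upper bound on `ratio()`, so a bucket whose quick ratio is already below the threshold can be marked SUSPECT without the full comparison.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def diff_key(rec):
    # pc, seq and tick are deliberately left out of the key.
    return (rec["region"], rec["addr"], rec["rw"], rec["val"], rec["size"])


def bucket(records):
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["fn"]].append(diff_key(rec))
    return buckets


def classify_buckets(a_records, b_records, suspect_threshold=0.9):
    """Return {fn_name: (tier, ratio)} for every function seen in either run."""
    a, b = bucket(a_records), bucket(b_records)
    result = {}
    for fn in sorted(set(a) | set(b)):
        keys_a, keys_b = a.get(fn, []), b.get(fn, [])
        if keys_a == keys_b:
            result[fn] = ("OK", 1.0)
            continue
        matcher = SequenceMatcher(None, keys_a, keys_b, autojunk=False)
        ratio = matcher.quick_ratio()       # cheap upper bound
        if ratio >= suspect_threshold:
            ratio = matcher.ratio()         # full comparison only if needed
        tier = "minor-diff" if ratio >= suspect_threshold else "SUSPECT"
        result[fn] = (tier, ratio)
    return result
```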

### `training_sim.py` — DDR training simulator

Two modes:

- `--mode pass` — every training-status read returns its "done/OK/
  trained" stub value every time. Equivalent to `mmio_diff`'s base
  harness.
- `--mode bitflip --flip-count N --flip-mask MASK` — the first `N`
  reads of each training-status address return `stub_value ^ mask`
  (default mask `0xFFFFFFFF` → "not done"). Subsequent reads revert.
  Exercises the retry / error-recovery paths.

Training-status addresses are defined inside `is_training_status()`.
To restrict flipping to a single address, set the
`BITFLIP_ONLY=0xADDR` env var (used by `bitflip_sweep.py`).
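
A minimal sketch of the bitflip read path, assuming the caller has already decided that the address is a training-status register and knows its pass-mode stub value. `BitflipStub` and `only_addr_from_env` are hypothetical names; the actual hook in `training_sim.py` may be structured differently.

```python
import os


def only_addr_from_env(env=os.environ):
    """Parse BITFLIP_ONLY (e.g. '0xfe010080') into an int, or None if unset."""
    raw = env.get("BITFLIP_ONLY")
    return int(raw, 0) if raw else None


class BitflipStub:
    def __init__(self, flip_count=1, flip_mask=0xFFFFFFFF, only_addr=None):
        self.flip_count = flip_count
        self.flip_mask = flip_mask
        self.only_addr = only_addr
        self._reads = {}  # addr -> number of reads seen so far

    def read(self, addr, stub_value):
        """Return the value a training-status read should observe."""
        seen = self._reads.get(addr, 0)
        self._reads[addr] = seen + 1
        if self.only_addr is not None and addr != self.only_addr:
            return stub_value               # flipping restricted elsewhere
        if seen < self.flip_count:
            return (stub_value ^ self.flip_mask) & 0xFFFFFFFF
        return stub_value                   # later reads revert to "pass"
```

With `flip_count=0` the stub degenerates to pass mode, which keeps both modes on one code path.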

Every run prints a region-tagged access histogram and a UART TX dump.

### `bitflip_sweep.py` — per-address retry convergence

Flips each training-status register one-at-a-time and summarises:

- how many records diverged from the pass-mode baseline
- whether any MMIO write value changed (= retry path took a
  different branch)
- which function(s) wrote the divergent values

Output is a single table row per address. A zero in the
"write_divergence" column means the retry path converges
deterministically. A non-zero count is reported together with the
function whose retry wrote a different register value; this is often
vendor-intended retry behavior and sometimes a port bug.

Currently sweeps 23 addresses: 7 DDRPHY training offsets plus 4 DDRCTL
status registers on each of 4 channels (7 + 4 × 4 = 23).
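
One way the per-address summary could be computed is sketched below. It treats a flipped-run record as divergent when its `(addr, rw, val, size)` tuple never appears in the baseline, which is an assumption for illustration; the real `bitflip_sweep.py` may align the runs differently. `summarise` and the dict-shaped records are hypothetical.

```python
def summarise(baseline, flipped):
    """Compare one flipped run against the pass-mode baseline."""
    def key(rec):
        return (rec["addr"], rec["rw"], rec["val"], rec["size"])

    seen = {key(rec) for rec in baseline}
    # Records in the flipped run that the baseline never produced.
    novel = [rec for rec in flipped if key(rec) not in seen]
    # A novel write means the retry path wrote a value the baseline did not.
    novel_writes = [rec for rec in novel if rec["rw"] == "wr"]
    return {
        "diverged": len(novel),
        "write_divergence": len(novel_writes),
        "fns": sorted({rec["fn"] for rec in novel_writes}),
    }
```

Extra reads of the flipped status register show up in `diverged` but not in `write_divergence`, so a retry that only re-polls still counts as converging cleanly.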

## Record shape + diff bucketing (for tool authors)

Per-access record fields:

    seq     monotonic index within the capture
    tick    Unicorn instruction count at the access
    pc      access-site PC (absolute)
    addr    MMIO/stack/SRAM address
    size    1/2/4/8
    rw      'rd' or 'wr'
    val     value read or written (hex)
    region  mmio_regions.classify(addr) tag
    fn      PCResolver result: FUN_xxxxxxxx from the function table

Diff key inside each fn bucket: `(region, addr, rw, val, size)`.
Explicitly excludes `pc` (codegen reg-alloc shifts individual load/
store PCs within a function without changing behavior), `seq`, and
`tick` (these drift with any upstream path difference).

## Known limitations

- The Unicorn simulator exits early on sustained same-PC loops
  (>10 000 iterations) so that a poll which never succeeds cannot hang
  the run. Real silicon polling that would eventually succeed is
  modelled via the stub returning the success value; if your use case
  needs a different success-delay profile, edit `stub_value` /
  `is_training_status`.
- `sim_tripwire.PCResolver` attributes every PC to the *largest
  FUN_-entry address ≤ PC*. Unported code paths still resolve to a
  reasonable fn_name. Ports not in the `// ============ FUN_xxxx @`
  convention won't match.
- `mmio_diff.py`'s `--capture-stack-writes` flag catches writes to
  Unicorn's scratch stack `0x00400000..0x00500000`, but the vendor
  firmware sometimes uses SRAM-resident scratch buffers (e.g. the
  `tp` timing buffer at `0xff0164f8`) instead of the call stack. For
  those, add a dedicated hook in the probe (see
  `../debug_probes/tp_slot_writes.py` for an example).
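
The same-PC loop guard from the first limitation can be sketched as a small counter that the access hook consults. This assumes "sustained" means consecutive observations at one PC; `SamePCGuard` is a hypothetical name and the simulator's real bookkeeping may differ.

```python
class SamePCGuard:
    """Signal a stop after more than `limit` consecutive hits at one PC."""

    def __init__(self, limit=10_000):
        self.limit = limit
        self._last_pc = None
        self._count = 0

    def observe(self, pc):
        """Record one access; return True when the run should be aborted."""
        if pc == self._last_pc:
            self._count += 1
        else:
            # Any access from a different PC resets the streak.
            self._last_pc = pc
            self._count = 1
        return self._count > self.limit
```

A hook would call `guard.observe(pc)` on each access and stop emulation when it returns `True`.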

## Dependencies

- Python 3.8+
- `unicorn` (Unicorn Engine Python bindings; AArch64 CPU emulator)
- `difflib` (stdlib)

```bash
pip install unicorn
```

## License

GPL-2.0-or-later, matching the port candidates' SPDX headers.