simulation: tripwire + PC-bucketed diff + bitflip sweep

Ship the new simulation & verification stack under simulation/:

- mmio_regions.py: address → region classifier (DDRCTL, DDRPHY, OTP, SRAM, …). Shared by every other tool so trace output is scannable without memorising the memory map.
- sim_tripwire.py: Bin-style per-access capture. Records (seq, insn_tick, pc, addr, size, rw, val, region, fn_name) per MMIO access. PCResolver bisects the vendor funs table parsed from ddr_conservative_asm.s.
- tripwire_diff.py: PC-bucketed difflib.SequenceMatcher diff of two tripwire CSVs. Buckets by fn_name so bitflip-induced control flow divergence doesn't cascade noise.
- training_sim.py: DDR training simulator with --mode pass and --mode bitflip (flip first N reads per training status, exercise retry paths). BITFLIP_ONLY env var narrows to a single addr for the sweep.
- bitflip_sweep.py: Flip each of 23 training-status addresses one-at-a-time and tabulate retry convergence. Surfaces which function(s) react to a transient fault by writing different downstream register values.

Plus:

- mmio_diff.py updated: region-tagged divergence output, --show-regions histogram, --tripwire-out-{vendor,rebuilt} CSV capture, --capture-stack-writes for stack-allocated buffer diffs.
- debug_probes/tp_slot_{probe,writes}.py: ad-hoc Unicorn probes for chasing a single-slot divergence in an SRAM buffer. Kept as reference examples of how to extend the tripwire toolchain.

The stack found 6 silicon-hostile bugs in the rebuilt blob that mmio_diff's write-sequence gate was structurally blind to, including three ld-unresolved-symbol NULL derefs (case-mismatched externs, missing DATA_SYMS) and one C-early-return-skips-shared-tail bug where vendor's asm fell through to the tail via `b` after a `ret`.
2026-04-22 05:55:28 +02:00
parent e20563e2ef
commit 46155bbe91
10 changed files with 1796 additions and 2 deletions
@@ -0,0 +1,197 @@
+# RK3588 DDR TPL — Simulation & Verification Stack
+
+A set of Unicorn-based tools for pre-silicon simulation, behavioral
+diffing, and fault-injection of Rockchip RK3588 DDR TPL blobs (vendor
+or rebuilt).
+
+Built to hunt silicon-corruption bugs that `mmio_diff.py`'s
+write-sequence comparison cannot see — NULL derefs, read-side
+divergences, retry-path diffs.
+
+## Synopsis
+
+| Tool | One-line |
+|---|---|
+| `mmio_regions.py` | Address → region classifier (`DDRCTL`, `DDRPHY`, `OTP`, `SRAM`, …) |
+| `sim_tripwire.py` | Bin-style per-access capture (PC, tick, addr, region, resolved fn name) |
+| `tripwire_diff.py` | PC-bucketed `SequenceMatcher` diff of two tripwire CSVs |
+| `training_sim.py` | DDR-training simulator with `pass` and `bitflip` modes |
+| `bitflip_sweep.py` | Flip each training-status address one at a time, report retry convergence |
+
+The simulator **DOES NOT** need silicon. It runs vendor or rebuilt TPL
+blobs under Unicorn with an MMIO stub that returns "pass" values for
+all training-status polls, captures every access, and lets you diff
+runs behaviorally.
+
+## Quick start
+
+Assuming your TPL blob is at `../rk3588_ddr_v1.19_prod.bin` (a copy of
+the vendor blob shipped at SPI offset `0x8000` on boards with RKBIN
+v1.19) and the rebuilt blob at `/tmp/rebuilt.bin`:
+
+```bash
+# Run once in "pass" mode and capture tripwire to CSV
+python3 training_sim.py ../rk3588_ddr_v1.19_prod.bin \
+        --mode pass --tripwire-out /tmp/tw-pass.csv
+
+# Run again with the first read of every training status flipped
+python3 training_sim.py ../rk3588_ddr_v1.19_prod.bin \
+        --mode bitflip --flip-count 1 --flip-mask 0xFFFFFFFF \
+        --tripwire-out /tmp/tw-flip.csv
+
+# Diff the two runs by function bucket
+python3 tripwire_diff.py /tmp/tw-pass.csv /tmp/tw-flip.csv
+
+# Sweep every training-status address one-at-a-time and tabulate
+# whether the retry loop reconverges cleanly
+python3 bitflip_sweep.py ../rk3588_ddr_v1.19_prod.bin
+```
+
+For vendor-vs-rebuilt verification (needs `../mmio_diff.py` in the
+parent dir):
+
+```bash
+python3 ../mmio_diff.py --ignore-pc \
+        ../rk3588_ddr_v1.19_prod.bin /tmp/rebuilt.bin \
+        --tripwire-out-vendor  /tmp/tw-v.csv \
+        --tripwire-out-rebuilt /tmp/tw-r.csv \
+        --show-regions
+
+python3 tripwire_diff.py /tmp/tw-v.csv /tmp/tw-r.csv
+```
+
+## Architecture
+
+### `mmio_regions.py` — address classifier
+
+Pure lookup table. `classify(addr)` returns a short tag for each
+RK3588 peripheral window. Used by every other tool so trace output is
+scannable without memorising the memory map.
+
+Region tags: `DDRCTL`, `DDRCTL:SW` (STAT/PWRCTL/SWCTL/SWSTAT),
+`DDRCTL:MR` (mode-register ops), `DDRPHY`, `DDRPHY:TR` (training
+status offsets `0x080/090/0B4/3CC/514/684/A24`), `DDR_CRU`, `DDR_MEM`,
+`SRAM`, `PMU_SRAM`, `GRF`, `BUS_GRF`, `SGRF`, `CRU`, `SCRU`, `PMU`,
+`FW_DDR`, `OTP`, `UART`, `STACK`, `OTHER`.
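+
+As a rough illustration of the lookup-table idea, here is a minimal
+sketch of a range classifier. The window table is hypothetical: only
+the `STACK` range appears in this README, and the `SRAM` bounds are
+placeholders picked to contain `0xff0164f8`, not the real RK3588
+memory map. The actual `classify()` may be structured differently.
+
+```python
+import bisect
+
+# Hypothetical window table: (start, end_exclusive, tag).
+# STACK is the scratch-stack range quoted in this README; the SRAM
+# bounds are placeholders and NOT taken from the real memory map.
+_WINDOWS = sorted([
+    (0x00400000, 0x00500000, "STACK"),
+    (0xFF000000, 0xFF100000, "SRAM"),
+])
+_STARTS = [w[0] for w in _WINDOWS]
+
+
+def classify(addr):
+    """Return the tag of the window containing addr, else 'OTHER'."""
+    # Find the last window whose start is <= addr, then bounds-check it.
+    i = bisect.bisect_right(_STARTS, addr) - 1
+    if i >= 0:
+        start, end, tag = _WINDOWS[i]
+        if start <= addr < end:
+            return tag
+    return "OTHER"
+```
+
+Keeping the table sorted by start address lets one `bisect` call
+replace a linear scan, which matters when every MMIO access is
+classified.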
+
+### `sim_tripwire.py` — per-access capture
+
+`Capture` class with `rd(pc, addr, size, val, tick)` and `wr(...)`
+that record one row per access:
+
+    (seq_idx, insn_tick, pc, addr, size, rw, val, region, fn_name)
+
+`fn_name` comes from `PCResolver`, which bisects the vendor funs
+table parsed from `../ddr_conservative_asm.s` (115 `FUN_xxxx @ offset`
+headers; Ghidra export). Set `RK_DDR_ASM` env var to override the
+default asm path.
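+
+The bisect lookup described above can be sketched roughly as follows.
+The header regex and the `base` parameter are assumptions made for
+illustration; the real Ghidra export and `PCResolver` may differ.
+
+```python
+import bisect
+import re
+
+# Assumed header shape, e.g. "// ============ FUN_00000100 @ 0x100".
+_HEADER = re.compile(r"FUN_([0-9a-fA-F]+)\s*@\s*(0x[0-9a-fA-F]+)")
+
+
+class PCResolver:
+    def __init__(self, asm_text, base=0):
+        # Collect (entry address, name) pairs sorted by entry address.
+        entries = sorted(
+            (base + int(m.group(2), 16), "FUN_" + m.group(1))
+            for m in _HEADER.finditer(asm_text)
+        )
+        self._addrs = [a for a, _ in entries]
+        self._names = [n for _, n in entries]
+
+    def resolve(self, pc):
+        """Name of the function with the largest entry address <= pc."""
+        i = bisect.bisect_right(self._addrs, pc) - 1
+        return self._names[i] if i >= 0 else "?"
+
+
+resolver = PCResolver(
+    "// ============ FUN_00000100 @ 0x100\n"
+    "// ============ FUN_00000240 @ 0x240\n"
+)
+print(resolver.resolve(0x23C))  # FUN_00000100
+```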
+
+`emit_csv(path)` writes out; `load_csv(path)` re-hydrates. Both
+`training_sim.py` and `mmio_diff.py` (in parent dir) accept a
+tripwire capture object and record into it.
+
+### `tripwire_diff.py` — PC-bucketed diff
+
+For each unique `fn_name` in either capture, collect the records, key
+them by `(region, addr, rw, val, size)`, and diff the two key
+sequences with `difflib.SequenceMatcher`. `quick_ratio()`
+short-circuits buckets that share almost nothing.
+
+Outputs three tiers:
+- **OK**: byte-identical key sequences (suppressed unless
+  `--show-identical`).
+- **minor-diff**: ratio ≥ `--suspect-threshold` (default 0.9).
+- **SUSPECT**: ratio below threshold, printed first with the raw
+  edit script.
+
+Why PC-bucket and not index-by-index? Under bitflip mode the control
+flow diverges at the flip point, which destroys index alignment.
+Grouping by function localises divergences so one buggy bucket
+doesn't cascade noise into unrelated ones.
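+
+A minimal sketch of the bucketing idea, assuming records are plain
+dicts with `fn`, `region`, `addr`, `rw`, `val` and `size` keys. The
+function names are hypothetical, and the `quick_ratio()` shortcut and
+edit-script output are left out for brevity.
+
+```python
+from collections import defaultdict
+from difflib import SequenceMatcher
+
+
+def bucket(records):
+    """Group diff keys by function name, preserving capture order."""
+    out = defaultdict(list)
+    for r in records:
+        key = (r["region"], r["addr"], r["rw"], r["val"], r["size"])
+        out[r["fn"]].append(key)
+    return out
+
+
+def diff_buckets(a_records, b_records, suspect_threshold=0.9):
+    """Classify every function bucket as OK, minor-diff or SUSPECT."""
+    a, b = bucket(a_records), bucket(b_records)
+    report = {}
+    for fn in sorted(set(a) | set(b)):
+        ka, kb = a.get(fn, []), b.get(fn, [])
+        if ka == kb:
+            report[fn] = "OK"
+            continue
+        ratio = SequenceMatcher(None, ka, kb, autojunk=False).ratio()
+        report[fn] = "minor-diff" if ratio >= suspect_threshold else "SUSPECT"
+    return report
+
+
+def rec(fn, addr, val):
+    return {"fn": fn, "region": "DDRPHY", "addr": addr,
+            "rw": "wr", "val": val, "size": 4}
+
+
+pass_run = [rec("FUN_a", 0x10, 1), rec("FUN_b", 0x20, 2)]
+flip_run = [rec("FUN_a", 0x10, 1), rec("FUN_b", 0x20, 3)]
+report = diff_buckets(pass_run, flip_run)
+```
+
+Here only the `FUN_b` bucket is flagged, because its single write
+value differs while `FUN_a` is identical in both runs.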
+
+### `training_sim.py` — DDR training simulator
+
+Two modes:
+
+- `--mode pass`: every training-status read returns its
+  "done/OK/trained" stub value every time. Equivalent to `mmio_diff`'s
+  base harness.
+- `--mode bitflip --flip-count N --flip-mask MASK`: the first `N`
+  reads of each training-status address return `stub_value ^ mask`
+  (default mask `0xFFFFFFFF` → "not done"). Subsequent reads revert.
+  Exercises the retry / error-recovery paths.
+
+Training-status addresses are defined inside `is_training_status()`.
+To restrict flipping to a single address, set the
+`BITFLIP_ONLY=0xADDR` env var (used by `bitflip_sweep.py`).
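+
+The flip-first-N behavior can be sketched as a small read stub. The
+class name, the `only` parameter (standing in for `BITFLIP_ONLY`) and
+the 32-bit register width are assumptions for illustration, not the
+simulator's actual code.
+
+```python
+from collections import Counter
+
+
+class BitflipStub:
+    """Return stub ^ mask for the first flip_count reads of an address."""
+
+    def __init__(self, stub_values, flip_count=1,
+                 flip_mask=0xFFFFFFFF, only=None):
+        self.stub_values = stub_values  # addr -> "pass" value
+        self.flip_count = flip_count
+        self.flip_mask = flip_mask
+        self.only = only                # single address to flip, or None
+        self.reads = Counter()          # per-address read counter
+
+    def read(self, addr):
+        value = self.stub_values[addr]
+        self.reads[addr] += 1
+        eligible = self.only is None or addr == self.only
+        if eligible and self.reads[addr] <= self.flip_count:
+            # Assumes 32-bit registers; truncate the flipped value.
+            return (value ^ self.flip_mask) & 0xFFFFFFFF
+        return value
+
+
+stub = BitflipStub({0x80: 0x1}, flip_count=2)
+values = [stub.read(0x80) for _ in range(3)]
+```
+
+With `flip_count=2` the first two reads come back inverted and the
+third returns the pass value, which is what drives firmware into its
+retry path.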
+
+Every run also prints a region-tagged access histogram and a UART TX
+dump.
+
+### `bitflip_sweep.py` — per-address retry convergence
+
+Flips each training-status register one-at-a-time and summarises:
+
+- how many records diverged from the pass-mode baseline
+- whether any MMIO write value changed (= retry path took a
+  different branch)
+- which function(s) wrote the divergent values
+
+Output is a single table row per address. A zero in the
+"write_divergence" column means the retry path converged on the same
+register writes as the baseline. A non-zero count is reported together
+with the function whose retry wrote a different register value; this
+is often vendor-intended retry behavior and sometimes a port bug.
+
+Currently sweeps 23 addresses: 7 DDRPHY training offsets plus
+4 DDRCTL status registers × 4 channels (7 + 16).
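+
+One way to compute the three summary columns for a single flipped run
+is a multiset comparison against the baseline. This is a simplified
+sketch with hypothetical names and dict-shaped records; the real sweep
+may align records differently.
+
+```python
+from collections import Counter
+
+
+def _key(r):
+    return (r["fn"], r["addr"], r["rw"], r["val"])
+
+
+def summarise(baseline, flipped):
+    """Summarise how one flipped capture differs from the baseline."""
+    base = Counter(_key(r) for r in baseline)
+    flip = Counter(_key(r) for r in flipped)
+    extra = flip - base    # records seen only in the flipped run
+    missing = base - flip  # records seen only in the baseline
+    writes = {k: n for k, n in extra.items() if k[2] == "wr"}
+    return {
+        "diverged_records": sum(extra.values()) + sum(missing.values()),
+        "write_divergence": sum(writes.values()),
+        "fns": sorted({k[0] for k in writes}),
+    }
+
+
+baseline = [
+    {"fn": "FUN_a", "addr": 0x80, "rw": "rd", "val": 0x1},
+    {"fn": "FUN_a", "addr": 0x100, "rw": "wr", "val": 0x5},
+]
+flipped = [
+    {"fn": "FUN_a", "addr": 0x80, "rw": "rd", "val": 0xFFFFFFFE},
+    {"fn": "FUN_a", "addr": 0x80, "rw": "rd", "val": 0x1},
+    {"fn": "FUN_a", "addr": 0x100, "rw": "wr", "val": 0x7},
+]
+row = summarise(baseline, flipped)
+```
+
+In this toy data the extra read is the retry itself, and the changed
+write to `0x100` is what the "write_divergence" column would count.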
+
+## Record shape + diff bucketing (for tool authors)
+
+Per-access record fields:
+
+    seq     monotonic index within the capture
+    tick    Unicorn instruction count at the access
+    pc      access-site PC (absolute)
+    addr    MMIO/stack/SRAM address
+    size    1/2/4/8
+    rw      'rd' or 'wr'
+    val     value read or written (hex)
+    region  mmio_regions.classify(addr) tag
+    fn      PCResolver result: FUN_xxxxxxxx from the funs table
+
+Diff key inside each fn bucket: `(region, addr, rw, val, size)`.
+It explicitly excludes `pc` (codegen reg-alloc shifts individual
+load/store PCs within a function without changing behavior) as well
+as `seq` and `tick` (both drift with any upstream path difference).
+
+## Known limitations
+
+- The Unicorn simulator exits early on sustained same-PC loops
+  (>10 000 iterations) so that a poll which never completes cannot
+  hang the run. Real silicon polling that would eventually succeed is
+  modelled via the stub returning the success value; if your use case
+  needs a different success-delay profile, edit `stub_value` /
+  `is_training_status`.
+- `sim_tripwire.PCResolver` attributes every PC to the function with
+  the *largest `FUN_` entry address ≤ PC*, so unported code paths
+  still resolve to a plausible `fn_name`. Ports whose headers do not
+  follow the `// ============ FUN_xxxx @` convention won't match.
+- `mmio_diff.py`'s `--capture-stack-writes` flag catches writes to
+  Unicorn's scratch stack `0x00400000..0x00500000`, but the vendor
+  firmware sometimes uses SRAM-resident scratch buffers (e.g. the
+  `tp` timing buffer at `0xff0164f8`) instead of the call stack. For
+  those, add a dedicated hook in the probe (see
+  `../debug_probes/tp_slot_writes.py` for an example).
+
+## Dependencies
+
+- Python 3.8+
+- Unicorn Engine Python bindings (PyPI package `unicorn`), used here
+  as the AArch64 CPU emulator
+- `difflib` (stdlib)
+
+```bash
+pip install unicorn
+```
+
+## License
+
+GPL-2.0-or-later, matching the port candidates' SPDX headers.