Commit Graph

24 Commits

Author SHA1 Message Date
marfrit 9c20eb0135 04_train_phy_block: clang -Oz + 32-bit-load pattern = 100% size match
Changed u64v handshake reads to u32v with an inline zero-extending
upcast. Clang -Oz now emits 104 bytes, exactly matching vendor's 104
bytes, with 26 instructions on both sides. Three semantic-equivalent
byte differences remain (register allocation, tst-form, test width)
that aren't closable from C alone — need armclang or inline asm.

Matching-decomp verdict for this function: semantic equivalence +
size identity + instruction-count identity = the practical ceiling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:16:00 +02:00
marfrit cf6ddf8e91 04_train_phy_block GRIND_LOG: compiler matrix resolves (a)/(b) question
Tested candidate.c across GCC-15 and clang-19 optimization levels:

  gcc  -Os         → 116 B (+12)
  clang -O2/Os/Oz  → 108 B (+4)   ← best
  vendor           → 104 B (0)

Vendor output is SMALLER than GCC -Os, which rules out a 'dumb
compiler' (hypothesis b). Clang being only 4 bytes off suggests
the vendor uses armclang or a similarly-tuned LLVM fork (hypothesis a).

Immediate consequence: default compiler for matching-decomp on this
blob is clang, not GCC. Our train_phy_block starting score jumps
from 89.7% (GCC -Os) to 96% (clang -Oz) before any C tweaking.
Pushing past 96% likely needs armclang or per-site inline asm.
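
The percentages above can be reproduced with a short sketch. It assumes the size-match score is simply the smaller size divided by the larger one, which agrees with the 89.7% and 96% figures quoted here; `size_match` is a hypothetical helper name, not something from the repo.

```python
def size_match(candidate_bytes: int, vendor_bytes: int) -> float:
    """Size-match score in percent: smaller size over larger size (assumed definition)."""
    small, large = sorted((candidate_bytes, vendor_bytes))
    return 100.0 * small / large


VENDOR = 104  # vendor function size in bytes, from the matrix above

for label, size in (("gcc -Os", 116), ("clang -O2/Os/Oz", 108), ("vendor", VENDOR)):
    delta = size - VENDOR
    print(f"{label:16s} {size:4d} B ({delta:+d})  {size_match(size, VENDOR):5.1f}%")
```

This score only compares sizes; it says nothing about whether the bytes themselves match.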

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:10:45 +02:00
marfrit 301fc08890 Retract sibling's d328 register names — they addressed wrong sub-block
Previous commit committed sibling claims without verifying against
the TRM bit tables. Verification fails for d328:

  Sibling: +0x110 = CAL_RD_VWML0 (from TRM §2.4.3).
  Blob: writes 0xF000F000 to that offset.
  TRM: CAL_RD_VWML0 is READ-ONLY, bits[9:0]=rd_vwml0 code, [25:16]=rd_vwml1.
        Writing is a no-op.

Root cause of sibling's error: conflated 'DDRPHY_OPB + offset' with
d328's 'DDRPHY_OPB + 0x8000 + offset'. The +0x8000 sub-block is NOT
documented in the TRM; offsets 0x110/0x118/0x120/0x154/0x160/0x184
WITHIN that sub-block mean something different from CAL_RD_VWML0 etc.
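
A sketch of how a lookup could avoid the same conflation, assuming the undocumented sub-block spans 0x8000..0x8FFF from DDRPHY_OPB (the extent is a guess based on the '+0x8XXX' wording) and that RE-guess labels follow the PHY_CTL_<offset> pattern. `label` and `TRM_NAMES` are hypothetical names for illustration only.

```python
# TRM-derived names quoted in this log, keyed by offset from DDRPHY_OPB.
TRM_NAMES = {
    0x110: "DDRPHY_CAL_RD_VWML0",
    0x120: "DDRPHY_CAL_RD_VWMR0",
    0xA24: "DDRPHY_SCHD_TRAIN_CON0",
}

SUB_BLOCK_BASE = 0x8000  # undocumented sub-block addressed by d328
SUB_BLOCK_SIZE = 0x1000  # assumption: extent of the sub-block is not known


def label(offset_from_opb: int):
    """Return a TRM name, an RE-guess label for the sub-block, or None."""
    if SUB_BLOCK_BASE <= offset_from_opb < SUB_BLOCK_BASE + SUB_BLOCK_SIZE:
        # TRM names must not be applied here; fall back to the RE-guess label.
        return f"PHY_CTL_{offset_from_opb - SUB_BLOCK_BASE:X}"
    return TRM_NAMES.get(offset_from_opb)


print(label(0x110))   # TRM name for the documented block
print(label(0x8110))  # RE-guess label for the sub-block
```

Keeping the sub-block check ahead of the table lookup means a TRM name can never leak onto a +0x8XXX access.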

Kept the TRM-verified names I DID check:
  - DDRCTL_DFISTAT @ +0x10514 (site 3)
  - DDRCTL_STAT @ +0x10014 (sites 2,4,5,7)
  - DDRPHY_SCHD_TRAIN_CON0 @ +0xa24 — bit layout verified directly

Retracted names for d328's +0x8XXX accesses; restoring the PHY_CTL_110
etc. RE-guess labels as the safe fallback. True names remain unknown
until we get hardware-trace data or the Synopsys DWC PUB databook.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:05:14 +02:00
marfrit 4166f81768 regs + POLL_SITE_MAP: TRM §2.4.3 register names for low-offset polls
Sibling went back into the TRM and found §2.4.3 'Registers Summary
For DDRPHY' which I'd missed — it names almost every PHY PUB register
we'd been calling 'RE guess':

  +0x110 = DDRPHY_CAL_RD_VWML0     (Read Valid Window Margin Left Code 0)
  +0x120 = DDRPHY_CAL_RD_VWMR0     (Read Valid Window Margin Right Code 0)
  +0x160 = DDRPHY_CAL_CON5         (Calibration Control 5: wrtrn_cyc_mode/en/th)
  +0x684 = DDRPHY_PRBS_CON0        (PRBS Training Control — was 'CalBusy')
  +0xa24 = DDRPHY_SCHD_TRAIN_CON0  (MASTER training scheduler; full bit map
                                    in the TRM — every training type + per-rank)
  +0xb88 = DDRPHY_DQSDUTY_CON2     (DQS rise-duty monitor — was 'UctShadow')

SCHD_TRAIN_CON0 is the master — the blob selects a training type via
its enable bits and polls bit[1] phy_train_done. Four of our 16 poll
sites are almost certainly polling this bit across different training
stages.

Still reserved in TRM: +0x118, +0x154, +0x184 — training-engine
private FSMs. Only dynamic tracing can name these.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:59:34 +02:00
marfrit 09c4a92432 historical_blobs: v1.16..v1.18 snapshots + byte-divergence analysis
Extracted from rkbin git (using git show <add-commit>:<path> since
each version-update commit deletes the previous binary). Sits here
so future runs don't need to re-extract.
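
For reference, the extraction step could be scripted roughly as below. `git show <rev>:<path>` is the standard way to read a file as of a given commit; the function names, repo directory, commit id and path in the usage line are placeholders, not values from this project.

```python
import subprocess


def git_show_argv(commit: str, path: str) -> list[str]:
    """Build the argv for `git show <commit>:<path>`."""
    return ["git", "show", f"{commit}:{path}"]


def extract_blob(repo_dir: str, commit: str, path: str) -> bytes:
    """Return the file contents as stored in `commit` (raises if git fails)."""
    result = subprocess.run(
        git_show_argv(commit, path),
        cwd=repo_dir,
        check=True,
        capture_output=True,
    )
    return result.stdout


# Hypothetical usage; substitute the real add-commit and blob path:
# data = extract_blob("rkbin", "<add-commit>", "<path>")
```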

Byte-diff analysis: v1.x → v1.(x+1) differs in ~88% of bytes. Every
release is a near-rewrite, not a patch. Consequence: cross-version
symbol porting tools (Polypyus, BinDiff) would match few functions
on this target. Function-level opcode-silhouette matching with
wildcards for branch targets may still work but needs Ghidra baseline
in each version, not a one-sided v1.19 annotation.
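
A minimal sketch of a position-wise byte-diff metric of the kind described above. How the original analysis treated blobs of different length is not recorded here; this version assumes the unmatched tail counts as changed, and `byte_divergence` is a hypothetical name.

```python
def byte_divergence(old: bytes, new: bytes) -> float:
    """Fraction of byte positions that differ between two blobs."""
    differing = sum(a != b for a, b in zip(old, new))
    # Assumption: bytes present in only one blob count as differing.
    differing += abs(len(old) - len(new))
    total = max(len(old), len(new))
    return differing / total if total else 0.0


print(byte_divergence(b"\x00\x01\x02\x03", b"\x00\x01\xff\xff"))
```

A position-wise metric like this is pessimistic: a single inserted instruction shifts everything after it, so a high figure is also consistent with relinked rather than rewritten code.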

Polypyus attempted 2026-04-15 — blocked by Python 3.14 / pony 0.7
bytecode decompiler incompatibility (LOAD_FAST_BORROW + varname
table layout). Would need pyenv Python 3.11 venv, or switch to
BinDiff CLI (no Python dep). Deferred.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:51:01 +02:00
marfrit be59c91707 POLL_SITE_MAP.md: per-site register identification with TRM cross-reference
Decoded all 16 poll sites against RK3588 TRM Part 2 where possible:
  - 1 site (site 3) polls DDRCTL_DFISTAT (vendor-canonical, TRM-named)
  - 4 sites (2,4,5,7) poll DDRCTL + 0x10014 — likely STAT.operating_mode
    per generic DWC uMCTL2 convention; TRM cross-ref TBD
  - 11 sites are DWC PUB / Innosilicon PHY — still RE-only (TRM does
    not republish the PHY register map)
  - 1 unusual site (site 10) polls absolute 0xff000024 in SRAM_BOOT
    region — possibly a BL2 handoff word, not a PHY poll. Flagged for
    special treatment in the v3fb bisection plan.

Known tensions documented:
  - Site 3's DFISTAT test uses bits[2:1] (mask 0x6), generic uMCTL2 has
    only bit[0] defined there → RK3588 likely extends DFISTAT with
    vendor-specific bits. Need to verify from TRM bit tables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:41:36 +02:00
marfrit e30127d056 BUG_ANALYSIS + regs_annotated.h: TRM-canonical names for poll-site regs
Per RK3588 TRM Part 2 chapter 2 (DMC, 522 pages):
  +0x10080 = DDRCTL_MRCTRL0   (Mode Register Control, was MicroReset)
  +0x10090 = DDRCTL_MRSTAT    (MR Status mr_wr_busy, was MicroContMuxSel)
  +0x10514 = DDRCTL_DFISTAT   (DFI Status dfi_init_complete, was UctWriteProtShadow)

These are uMCTL2 controller registers — Rockchip-documented — NOT the
opaque PHY firmware scratch regs our 2026-04 analysis guessed. Poll
semantics now vendor-grounded: wait for MR command roundtrip, wait
for PHY-side DFI handshake.

Low-offset polls in train_phy_block (0x110, 0x118, 0x120, 0x154, 0x160,
0x184) plus the 0x684/0xa24/0xb88 ones remain DWC PUB and thus
undocumented; kept the best-effort RE names with `(RE)` tag in the
BUG_ANALYSIS table so a reader can tell which ones are vendor-canonical
and which are guesses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:38:49 +02:00
marfrit 5aef5bd118 benchmark/TRM_FINDINGS.md: register-name corrections from RK3588 TRM
TRM Part 2 chapter 2 (DMC, 522 pages) reveals the offsets we poll at
+0x10080/+0x10090/+0x10514 are NOT PHY firmware scratch regs as our
earlier analysis guessed. They are uMCTL2 controller registers:

  +0x10080 = DDRCTL_MRCTRL0  (Mode Register Control)
  +0x10090 = DDRCTL_MRSTAT   (Mode Register Status — wait for MR complete)
  +0x10514 = DDRCTL_DFISTAT  (DFI Status — wait for PHY handshake)

Semantics are now grounded in vendor docs instead of educated guesses.
The PHY-side polls (0x110, 0x118, 0x184 etc. in d328) remain
undocumented — TRM does not republish the Synopsys DWC PUB register
map. Still need RE for those.

TRM cached at boltzmann:~/projects/AMPere/vendor/trm/ (pdf + txt).
Fetched via a Stanford mirror, surfaced by the Chinese-language research
sibling alongside mfkiwl/rk-open-docs. That repo has Rockchip internal DDR
docs for the RK322x..RK1808 era (Innosilicon PHY, not our DWC multiPHY),
so it is useful for methodology but not as a direct reference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:19:18 +02:00
marfrit 282d23fff7 benchmark/05_prep_freq_change: second poll-site function, reference-C only
FUN_0000d10c @ 0xd10c (49 insts) contains poll site 11.
Semantically decoded as a PHY-side prologue for frequency-change
handshake: saves current state of one PHY CTL + four secondary-table
entries, waits for PHY firmware to reach state 1 (idle).

Matching-decomp iteration deferred vs the clean first lift (d328) —
d10c's two-base-pointer csel pattern plus parity-dependent offset
chain gives GCC too much register-allocation freedom. Getting to
>=90% byte-match would be an afternoon of iteration; time better
spent expanding pre-UART coverage breadth.

Poll-site coverage so far:
  d328: sites 12, 13, 14, 15 (C candidate at 89.7% size match)
  d10c: site 11 (reference C only, no matching iteration)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:14:00 +02:00
marfrit 06d3d0d726 benchmark: AI-Ghidra landscape + case-4 harness (synthetic PHY)
- benchmark/ai_ghidra/SETUP.md documents the GhidrAssist 1.5.0 install
  at /opt/ghidra/Ghidra/Extensions/GhidrAssist/ on oppenheimer (CT131),
  with dirac endpoints (Hermes-2-Pro 8B @ :8080, Qwen-coder 1.5B @ :8081)
  already reachable + tested. Final enable+config is UI-only; two
  clicks on next Ghidra launch.
- gdb_debug/harness.c extended with case 4 = train_phy_block running
  under a synthetic PHY at 0x40000000. Static MMIO shim satisfies
  polls 1-3; poll 4 needs dynamic state-machine (next session, via
  SIGBUS handler or ptrace) — documented in the README.

Vendor tree investigation: Rockchip's own sdram_rk3588.c / sdram_rk3568.c
are STUBS (return -1). No free function names from there. Path forward:
mine the vendor kernel's rockchip_dmc.c (devfreq DDR scaling driver)
for register-offset naming hints at runtime-call level.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:03:43 +02:00
marfrit 00d655187a benchmark/: three-way RE-tool comparison + first real C-lift
Three small functions plus one real poll-site function, extracted from the
v1.19 conservative blob with ground-truth C and per-tool
(Ghidra / retdec / decomp.me) docs:
  01_memset        — byte memset, 28 B
  02_memcpy32      — word-aligned memcpy, 36 B
  03_magic_memset  — magic check + tail-call to memset, 40 B
  04_train_phy_block — first real poll-site function (104 B, 26 insts),
                       contains poll sites 12-15

Results in RESULTS.md:
  - Ghidra: A on all four. Auto-decompile is close to final.
  - retdec: A on #3, F on #1 and #2 (no register-arg inference on raw),
    C on #4 (mistakes & 0xF0000000 for < 0x10000000).

GRIND_LOG.md (in 04_train_phy_block/) records the matching-decomp
iteration: 116-byte candidate.c at -Os vs vendor 104 bytes = 89.7%
size match on first real iteration. Remaining gap is GCC's choice of
`cmp w, w_const; b.ls` over vendor's `tst w, #imm; b.eq` for the
mask tests.
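
The two instruction forms can compute the same predicate, which is why the gap is about encoding rather than semantics. The sketch below checks this for a top-bits mask on 32-bit values; 0xF0000000 is borrowed from the retdec note above as an illustrative constant, and whether the actual mask tests in train_phy_block use exactly this value is an assumption.

```python
MASK = 0xF0000000   # illustrative top-nibble mask
LIMIT = 0x0FFFFFFF  # highest value with no masked bit set


def tst_form(w: int) -> bool:
    """tst w, #imm; b.eq  -> taken when no masked bit is set."""
    return (w & MASK) == 0


def cmp_form(w: int) -> bool:
    """cmp w, w_const; b.ls -> taken when w is unsigned lower-or-same."""
    return (w & 0xFFFFFFFF) <= LIMIT


samples = [0x0, 0x1, 0x0FFFFFFF, 0x10000000, 0x7FFFFFFF, 0xF0000000, 0xFFFFFFFF]
assert all(tst_form(w) == cmp_form(w) for w in samples)
```

The equivalence holds only when the mask covers a contiguous run of the highest bits; for a mask such as 0x6 there is no single unsigned compare that matches.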

gdb_debug/ holds a native-aarch64 GDB single-stepper for the three
benchmark functions — boltzmann smoke test passed (memset:
buf[10] 0x00→0xab).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:26:23 +02:00
marfrit 694be88964 v3 patcher: full-body trampolines + site bisection subsets
Root cause of counted_v2 brick identified:

v2 copied only ONE non-load body instruction into each trampoline (picks
the first after the LDR). For poll patterns of form

    LDR   Wx, [Xbase, #off]
    AND   Wx, Wx, #mask     ; no flag update
    CMP   Wx, #expected     ; sets flags
    B.cond .retry

— 9 of the 16 sites in v1.19 have this shape — the final CMP was silently
dropped. The trampoline's B.inv_cond tested whatever flags happened to be
set before entry, producing effectively random branch decisions once
under the trampoline. Result: boot crashes before the UART banner,
observed as 'power LED off' brick.

Fix in v3: copy the ENTIRE loop body (LDR + all intermediate instructions,
in original order) into each trampoline. Size is now 4*(N+6) where N is
body length (28 bytes for body=2, 36 for body=3).
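
A toy model of the v2 bug and the v3 fix, assuming instructions are represented as mnemonic strings (the real patcher works on encoded instruction words). `v2_copy`, `v3_copy` and `sets_flags` are hypothetical names for illustration.

```python
# Loop body of the poll pattern shown above (the final B.cond is not part of it).
POLL_BODY = [
    "LDR   Wx, [Xbase, #off]",
    "AND   Wx, Wx, #mask",
    "CMP   Wx, #expected",
]

# A64 mnemonics that update the condition flags.
FLAG_SETTERS = {"CMP", "CMN", "TST", "ANDS", "ADDS", "SUBS"}


def v2_copy(body):
    """v2 as described: the LDR plus only the first instruction after it."""
    load_at = next(i for i, inst in enumerate(body) if inst.startswith("LDR"))
    return body[load_at:load_at + 2]


def v3_copy(body):
    """v3: the entire loop body, in original order."""
    return list(body)


def sets_flags(insts):
    """True if any copied instruction updates the condition flags."""
    return any(inst.split()[0] in FLAG_SETTERS for inst in insts)


print("v2 sets flags:", sets_flags(v2_copy(POLL_BODY)))
print("v3 sets flags:", sets_flags(v3_copy(POLL_BODY)))
```

A check like `sets_flags` on each generated trampoline would have caught the v2 bug before flashing.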

Also in v3:
- --sites subset flag for bisection (all/early/mid/late/none/index list)
- decode_sites.py helper that tries to identify which MMIO register each
  site polls (best effort — the materialized_base scanner is naive and
  picks up stale MOVZ targets, but cluster grouping by blob offset is
  reliable and sufficient for bisection)

Site clusters in v1.19:
  0..7   early (0x07b78..0x07f08): SGRF + PHY firmware state machine
  8..10  mid   (0x09124..0x0aaf8): DfiStatus / training start
  11..15 late  (0x0d154..0x0d378): UctWriteProt / CalBusy / late
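
One way the --sites values could map onto the clusters above. The index ranges come from the table; the comma-separated syntax for the index list and the name `select_sites` are assumptions, since the real flag parsing is not shown in this log.

```python
CLUSTERS = {
    "early": range(0, 8),    # sites 0..7
    "mid": range(8, 11),     # sites 8..10
    "late": range(11, 16),   # sites 11..15
}
SITE_COUNT = 16


def select_sites(spec: str) -> list[int]:
    """Translate a --sites value into a sorted list of site indices."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(SITE_COUNT))
    if spec == "none":
        return []
    if spec in CLUSTERS:
        return list(CLUSTERS[spec])
    # Assumed syntax for an explicit index list: "3,5,12"
    sites = sorted({int(tok) for tok in spec.split(",")})
    bad = [s for s in sites if not 0 <= s < SITE_COUNT]
    if bad:
        raise ValueError(f"site index out of range: {bad}")
    return sites


print(select_sites("mid"))
```

Bisection then amounts to patching one cluster at a time and narrowing to an index list once the failing cluster is known.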

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 01:06:51 +02:00
test0r 05d0d8edd5 patch_timeouts v2: counted-loop trampolines instead of NOPs
v1 NOP approach was WRONG — removed the poll entirely, proceeding with
stale/incomplete register values. ZQ cal, DFI handshake, and PHY mailbox
all require actual polling until the hardware responds.

v2 uses trampoline functions appended to the binary:
- Each poll site jumps to a trampoline that retries with a W16 counter
- 16384 iterations (~91us at 1.8GHz) before timeout
- On timeout, returns with condition NOT met (hits existing error path)
- On success, returns normally (original behavior preserved)
- W16 (IP0) used as counter — caller-saved, not used by poll loop bodies
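
The intended trampoline behaviour, modelled in Python rather than AArch64. The mask/expected form of the exit condition is an assumption (individual sites differ), and `counted_poll` is a hypothetical name; only the 16384 iteration budget comes from the list above.

```python
MAX_ITERS = 16384  # iteration budget before giving up


def counted_poll(read_reg, mask, expected, max_iters=MAX_ITERS):
    """Poll until (read_reg() & mask) == expected, or the budget runs out.

    Returns True on success, False on timeout, so the caller can fall
    into its existing error path.
    """
    counter = max_iters  # plays the role of W16 in the trampoline
    while counter > 0:
        if (read_reg() & mask) == expected:
            return True
        counter -= 1
    return False


# A register that becomes ready on the fifth read:
reads = iter([0, 0, 0, 0, 1])
print(counted_poll(lambda: next(reads), 0x1, 0x1))
```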

35 poll loops patched, 13 non-poll backward branches left intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:31:38 +02:00
test0r 5e1414fc28 Diary: sandbox 90% done, piclaudeio idea
2026-04-04 00:45:13 +02:00
test0r 1326b3f847 Diary: keyboard pct13x2 found in vendor DTS
2026-04-04 00:08:37 +02:00
test0r b8efa8f742 OEM eDP analysis: force-hpd, power domains, clock parents
2026-04-03 23:48:00 +02:00
test0r 2a8e5ff714 Diary: eDP analysis - power domain issue
2026-04-03 23:39:11 +02:00
test0r 0389063b52 Diary: deep dive, DDRC direct addresses, retry loop
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 23:24:18 +02:00
test0r 815e890056 Deep trace: 3606 unique PCs, 30% code coverage with smart injection
Aggressive MMIO injection (try 0xFFFFFFFF, then 0, then 0x2) breaks through
all poll loops. Blob executes 19963 instructions visiting 3606 unique PCs
before jumping to unmapped memory (0x100000FFF).
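
The injection order can be modelled as a simple search. The real tracer is Unicorn-based and presumably judges success by whether the loop exits; this sketch abstracts that into a predicate. `pick_injection` and the example predicates are hypothetical, while the candidate values and their order are the ones listed above.

```python
CANDIDATES = (0xFFFFFFFF, 0x0, 0x2)  # tried in this order


def pick_injection(exit_condition, candidates=CANDIDATES):
    """Return the first candidate that satisfies the poll's exit condition.

    Returns None if no candidate breaks the loop.
    """
    for value in candidates:
        if exit_condition(value):
            return value
    return None


# Example: a poll that waits for bits[2:1] == 0b01
print(hex(pick_injection(lambda v: (v & 0x6) == 0x2)))
```

Injected values only show which paths are reachable; they do not reflect what the hardware would actually return.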

Key findings:
- DDRC channels at 0xF7000000/0xF8000000 (not 0xFE01 as in TRM - these are
  the direct DDRC addresses, not the MSCH wrapper)
- Blob reads training params from internal data at 0x000154xx
- 30% code path coverage achieved

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 23:20:03 +02:00
test0r a93ce6c3e9 Add instrumented MMIO tracer and first trace
Unicorn-based tracer captures every MMIO read/write with PC and instruction count.
First trace of trampoline blob: 19 MMIO accesses in 200K instructions.

Boot sequence: PMU_GRF read -> SRAM flag -> SRAM self-register ->
BUS_GRF QoS -> DDRC reset -> SCRU PLL config -> BUS_GRF route -> polls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 23:16:32 +02:00
test0r d8f31784cb Add project diary - the full journey from decompilation to bricking to recovery
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 23:15:06 +02:00
test0r cd4d01fd69 Add trampoline patcher v5 - counted loop timeouts for all 45 polls
Each poll loop branches to an appended trampoline that:
- Initializes w18 = 0x20000 (128K iterations)
- Copies the original loop body (LDR + condition check)
- Decrements w18, retries until timeout
- Falls through on timeout (no hang)

QEMU verified: original stuck at 0x10350, trampoline progresses through all polls.
Blob grows from 76704 to 78068 bytes (+1364 bytes trampoline section).

NOT YET TESTED ON REAL HARDWARE - the NOP approach bricked the GenBook.
This counted approach preserves the poll loops with a safety timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 23:10:13 +02:00
test0r d68ad1a59c Add UART capture script, Makefile, updated README with prerequisites
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 13:41:50 +02:00
test0r 816848a474 RK3588 DDR init blob reverse engineering
- Ghidra decompilation of v1.02-v1.19 blobs (118 functions)
- 53 functions renamed, 79 MMIO registers mapped to TRM
- 45 timeout-less poll loops identified and patched
- Production patcher (patch_prod.py) and QEMU emulator
- Comprehensive analysis, frequency tables, community research

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 13:06:47 +02:00