diff --git a/DIARY.md b/DIARY.md
new file mode 100644
index 0000000..cdd5a39
--- /dev/null
+++ b/DIARY.md
@@ -0,0 +1,262 @@
+# RK3588 DDR Blob Reverse Engineering: Project Diary
+
+*A chronicle of decompiling, patching, bricking, and recovering a closed-source DDR initialization binary.*
+
+---
+
+## Day 1 (2026-04-02): The Idea
+
+It started with a simple question: "Can you decompile the RK3588 DDR init blob?"
+
+The RK3588 ships with a closed-source binary blob that initializes LPDDR5
+memory during early boot. Rockchip provides no source code; it's a black
+box. The user (a kernel developer working on the CoolPi GenBook and Radxa
+Rock 5 ITX+) wanted to understand what it does, find bugs, and potentially
+fix the cold boot failures the community reports.
+
+### First Attempt: Ghidra on Tesla (aarch64)
+
+We installed Ghidra on tesla (an aarch64 LXD container on hertz, a Pi 5).
+Analysis worked: 118 functions found. But when we tried to decompile:
+
+> `ERROR os/linux_arm_64/decompile does not exist`
+
+Ghidra's decompiler backend is a **native binary**, and the release we
+installed ships no Linux ARM64 build of it. The analysis (disassembly) works
+on any platform, but decompilation needed an x86 host.
+
+### Moving to Oppenheimer (x86)
+
+We created a new Proxmox container (CT131, "oppenheimer") on data, the
+Ryzen 7 1700 server. Debian 12, x86_64. Installed JDK 21, Ghidra 11.3.
+
+**First surprise:** The blob is **AArch64 (64-bit ARM)**, not Cortex-M0 as
+initially assumed. The first instruction `01 00 00 14` is an AArch64 branch.
+It runs on the main A76/A55 cores during boot, not on the PMU's M0 core.
+
+**Result:** 118 functions decompiled, 11,923 lines of C. The Ghidra headless
+analyzer + a custom Java export script did the heavy lifting:
+
+```java
+DecompInterface decompiler = new DecompInterface();
+decompiler.openProgram(currentProgram);
+// ...
+// iterate all functions, export decompiled C
+```
+
+## Day 1: The Annotated Source
+
+We transformed Ghidra's raw output into human-readable C:
+- 53 functions renamed based on behavior (sgrf_wait_ready, ddr_pll_configure, etc.)
+- 79 MMIO registers mapped to hardware blocks using the RK3588 TRM Part 2
+- Register addresses cross-referenced with kernel device tree sources
+
+Key discovery: the `0xFF00AA` value appearing 28 times is Rockchip's GRF
+**write-enable mask pattern**. In each 32-bit write, the upper 16 bits select
+which of the lower 16 bits are actually updated (here mask `0x00FF`, value
+`0x00AA`).
+
+## Day 1: The Version Comparison
+
+We extracted all DDR blob versions from the rkbin git history (v1.02 through
+v1.19) and compared them:
+
+**Shocking finding:** Every version has **major code changes**, not just
+timing parameter tweaks. The blob grew from 42KB to 77KB over its lifetime.
+
+But the fast vs conservative blobs of the **same version** differ by only
+**6 bytes**: the LP4 and LP5 frequency parameters in the data section.
+
+## Day 1: 45 Bugs Found
+
+The most critical finding: **45 hardware poll loops without timeouts**.
+
+```c
+// This loops FOREVER if SGRF doesn't respond:
+do {
+} while (SGRF_DDR_STATUS != 0);
+```
+
+These explain the cold boot failures the RK3588 community reports. At low
+temperatures, the PHY takes longer to respond, and without a timeout, the
+system hangs permanently during boot.
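The 45 loops counted here are all backward conditional branches, so a scan of the blob's instruction words can list candidates. The following is a minimal sketch of such a scan, assuming the blob is a flat little-endian AArch64 image; the function names are hypothetical and not taken from the project's tooling. It flags every backward `B.cond`, `TBZ/TBNZ`, and `CBZ/CBNZ`, which also catches ordinary loops and data that happens to decode as a branch, so each hit would still need manual review to confirm it polls hardware.

```python
import struct


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def classify_backward_branch(insn):
    """Return (kind, byte_offset) for a backward conditional branch, else None."""
    if (insn & 0xFF000010) == 0x54000000:      # B.cond, imm19 in bits 23:5
        kind, words = "B.cond", sign_extend((insn >> 5) & 0x7FFFF, 19)
    elif (insn & 0x7E000000) == 0x34000000:    # CBZ/CBNZ, imm19 in bits 23:5
        kind, words = "CBZ/CBNZ", sign_extend((insn >> 5) & 0x7FFFF, 19)
    elif (insn & 0x7E000000) == 0x36000000:    # TBZ/TBNZ, imm14 in bits 18:5
        kind, words = "TBZ/TBNZ", sign_extend((insn >> 5) & 0x3FFF, 14)
    else:
        return None
    # Only negative offsets (branches to a lower address) are loop candidates.
    return (kind, words * 4) if words < 0 else None


def scan(blob):
    """List (file_offset, kind, byte_offset) for every backward conditional branch."""
    hits = []
    for addr in range(0, len(blob) - 3, 4):
        (insn,) = struct.unpack_from("<I", blob, addr)
        result = classify_backward_branch(insn)
        if result is not None:
            hits.append((addr, result[0], result[1]))
    return hits
```

Grouping the `kind` field of the hits gives a per-type tally like the one in this section, although the raw count will be higher than 45 until non-polling loops are filtered out by hand.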
+
+**Categories:**
+- 16 B.cond backward branches
+- 26 TBZ/TBNZ backward branches
+- 3 CBZ/CBNZ backward branches
+
+## Day 1: Community Research
+
+A background research agent spent 15 minutes collecting 40+ sources about:
+- DDR training (ZQ cal → write leveling → gate → DQ → eye → VREF → CA)
+- Why Rockchip dropped 2736 MHz LP5 in v1.16 (PHY eye margin failures)
+- The Synopsys DWC LPDDR5/4X PHY used in the RK3588
+- The rkddr tool for frequency overclocking
+- Community-achieved 3200 MHz (6400 MT/s) on SK Hynix modules
+
+## Day 2 (2026-04-03): The NOP Patcher
+
+### The Idea
+
+Simple: replace each tight poll loop's backward branch with a NOP.
+The register is read once: if ready, great. If not, fall through.
+
+### QEMU Testing
+
+We set up Unicorn (a QEMU-derived CPU emulator) on oppenheimer to test:
+- Map all MMIO regions as RAM with pre-seeded "ready" values
+- Skip MSR/MRS instructions via exception hooks
+- Count instructions, compare original vs patched
+
+**Result:** Original stuck at 0x10350 (TBZ loop). Patched progressed to 0x9124
+(deep into PHY training). The NOP approach worked... in emulation.
+
+### The Production Patcher
+
+We classified the 45 polls:
+- 40 non-critical (SGRF, firewall, PLL) → NOP
+- 5 training-critical (DfiStatus, CalBusy, etc.) → KEEP
+
+Built U-Boot on ampere (the GenBook itself) with the patched blob.
+
+### 💀 The Bricking
+
+```
+$ sudo flashcp -v u-boot-rockchip-8mb.bin /dev/mtd0
+```
+
+Reboot. Black screen. Maskrom mode.
+
+**What went wrong:** The NOP approach was too aggressive. The PHY genuinely
+needs wait time for operations to complete. Converting polls to single checks
+meant the code proceeded before the hardware was ready, corrupting the DDR
+controller state.
+
+### The Recovery Odyssey
+
+**Problem 1:** rkdeveloptool hanging on "Downloading bootloader..."
+
+Turns out the Debian-packaged rkdeveloptool is the **Pine64 fork**, which
+doesn't have the `cs` (change storage) command needed for SPI flash.
+The Rockchip original does:
+
+```bash
+git clone https://github.com/rockchip-linux/rkdeveloptool.git
+# This one has: cs [storage: 1=EMMC, 2=SD, 9=SPINOR]
+```
+
+**Problem 2:** ModemManager grabbing the USB device
+
+```
+EBUSY: Device or resource busy
+```
+
+ModemManager probes every new USB device. `sudo systemctl stop ModemManager`
+fixed the USB claim issue.
+
+**Problem 3:** USB signal integrity
+
+The first recovery host (ohm) had flaky USB: `error -71` (protocol error).
+Moved to higgs (Pi 5). Still had issues on bus 001. Switching to bus 004
+(different USB port) got it working.
+
+**Problem 4:** rkdeveloptool wrote to eMMC, not SPI
+
+The `cs 9` command to select SPI was crucial. Without it, the 8 MB image
+overwrote the eMMC boot partition instead of SPI. We recovered the eMMC
+file system using **testdisk** (restored FAT directory entries) but the
+file **contents** were zeroed.
+
+**The Save:** A March 24 SPI backup on the data partition:
+```
+/mnt/sda3/spi-flash-backup-20260324.bin
+```
+
+This backup was our mainline U-Boot. Flash it back to SPI, boot from the
+USB stick (stock CoolPi kernel), mount the NVMe (which has the Arch rootfs
+and kernel source), rebuild the boot files, copy to eMMC. **Ampere lives.**
+
+### Lessons Learned (The Hard Way)
+
+1. **NEVER NOP hardware polls on production hardware.** Counted loops or nothing.
+2. **ALWAYS back up SPI before flashing:** `dd if=/dev/mtdblock0 of=backup.bin`
+3. **Use the Rockchip rkdeveloptool**, not the Pine64 fork, when `cs` is needed.
+4. **Stop ModemManager** before using rkdeveloptool.
+5. **Battery disconnect** isn't needed. Holding the maskrom button during power-on works.
+6. **Test with QEMU first.** It caught the TBZ poll type we initially missed.
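Lesson 1 can be illustrated with a small simulation. This is only a sketch: `FakeStatusRegister` is a hypothetical stand-in for a status register that reads non-zero while busy and 0 once ready (the same `!= 0` convention as the poll loop quoted earlier), and the default iteration bound is an arbitrary value chosen for illustration. It contrasts the NOP-patched behaviour (one read, then fall through) with a counted poll.

```python
class FakeStatusRegister:
    """Hypothetical register: reads 1 (busy) for `delay` reads, then 0 (ready)."""

    def __init__(self, delay):
        self.reads_left = delay

    def read(self):
        if self.reads_left > 0:
            self.reads_left -= 1
            return 1  # hardware still busy
        return 0      # hardware ready


def poll_once(reg):
    """NOP-patched behaviour: a single read, the caller proceeds either way."""
    return reg.read() == 0


def poll_counted(reg, max_iterations=0x20000):
    """Counted loop: retry until ready or until the iteration budget runs out."""
    for _ in range(max_iterations):
        if reg.read() == 0:
            return True
    return False  # timed out; the caller can report this instead of hanging
```

With a register that needs 100 reads to become ready, `poll_once` returns `False` while `poll_counted` returns `True`. The counted version also returns `False` after a bounded number of reads if the hardware never responds, which is the property the original infinite loops lack.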
+
+## Day 2: The Trampoline Patcher
+
+After the bricking, we built a proper fix: **assembly trampolines with counted loops.**
+
+Each poll loop's backward branch is replaced with a `B trampoline_N`, where
+the trampoline section is **appended** after the blob (no code shifting):
+
+```
+Trampoline for each poll:
+  MOV  w18, #0x20000          ; 128K iteration timeout
+  MOVK w18, #0x2, LSL #16     ; (if needed for large count)
+  LDR  w0, [xN, #offset]      ; copy of original register load
+                              ; inverted: exit on success
+  SUBS w18, w18, #1           ; decrement counter
+  B.NE .-16                   ; retry if counter > 0
+  B    return_addr            ; timeout: fall through
+```
+
+**QEMU Result:** Original stuck at 0x10350. Trampoline blob: X18=0x17ED
+(counter counting down on the last poll). **All 45 polls have timeouts now.**
+
+The blob grows from 76,704 to 78,068 bytes (+1,364 bytes). Whether BL2
+accepts the larger blob is the open question for real hardware testing.
+
+## Day 2: The Recompilation Attempt
+
+We tried to make Ghidra's decompiled C actually compile. Starting from
+11,976 lines and 4,184 errors:
+
+- Added type definitions, register headers, forward declarations
+- Fixed Ghidra artifacts (switchD_, stack0x, register0x, ._0_1_ sub-fields)
+- Renamed 41 duplicate functions to unique names
+- Fixed asm string literals, system register access
+
+Got down to ~270 errors but hit a wall: Ghidra's C output is fundamentally
+a **reading aid**, not compilable source. Array assignments, unresolved call
+targets (same name for different functions), and struct sub-field access
+patterns can't be mechanically fixed.
+
+**Verdict:** Binary patching (trampolines) is the right approach. Recompilation
+from decompiled output would require rewriting every function by hand.
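The binary-patching route relies on overwriting one 4-byte instruction with an unconditional `B` to code appended at the end of the blob. Below is a minimal sketch of that redirect step, not the project's actual patcher: `encode_b` and `redirect_to_trampoline` are hypothetical names, the trampoline bytes are assumed to be assembled elsewhere, and file offsets are assumed to map linearly to load addresses (`B` is PC-relative, so the load base itself does not matter).

```python
import struct


def encode_b(from_addr, to_addr):
    """Encode an AArch64 unconditional `B` located at from_addr, targeting to_addr."""
    offset = to_addr - from_addr
    if offset % 4:
        raise ValueError("branch offset must be a multiple of 4 bytes")
    imm26 = offset // 4
    if not -(1 << 25) <= imm26 < (1 << 25):
        raise ValueError("target is outside the +/-128 MiB range of B")
    return 0x14000000 | (imm26 & 0x03FFFFFF)


def redirect_to_trampoline(blob, branch_addr, trampoline):
    """Append `trampoline` to the blob and point the instruction at branch_addr to it.

    Existing code is not moved, so every other offset in the blob stays valid.
    """
    if branch_addr % 4 or branch_addr + 4 > len(blob):
        raise ValueError("branch_addr must be an aligned offset inside the blob")
    patched = bytearray(blob)
    while len(patched) % 4:          # keep the appended code 4-byte aligned
        patched.append(0)
    trampoline_addr = len(patched)
    patched += trampoline
    struct.pack_into("<I", patched, branch_addr,
                     encode_b(branch_addr, trampoline_addr))
    return bytes(patched)
```

As a sanity check on the encoding, `encode_b(0, 4)` packs to the bytes `01 00 00 14`, the same pattern the diary reports as the blob's first instruction. Each trampoline would end with its own `B return_addr`, which the same helper can encode.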
+
+## Current State
+
+### What Exists
+- Full decompilation of all blob versions (v1.02-v1.19)
+- 53 named functions, 79 mapped MMIO registers
+- Trampoline patcher (QEMU-verified, not yet hardware-tested)
+- Frequency table (2112-3200 MHz LP5)
+- Community research (40+ sources)
+- DokuWiki article and Gitea repo
+
+### What's Next
+1. **Instrumented QEMU trace**: log every MMIO access with register state
+   to build a complete execution flow map
+2. **Hardware test** of trampoline blob (with iFixit kit ready)
+3. **UART capture** of DDR training output for comparison
+4. **Frequency patching**: try 2736 MHz on boltzmann's Rock 5 ITX+
+
+### Infrastructure
+| Host | Role |
+|------|------|
+| oppenheimer (CT131 on data) | Ghidra, QEMU, cross-compile |
+| boltzmann (Rock 5 ITX+) | Source repo, DDR test target |
+| ampere (GenBook) | The patient that survived surgery |
+| tesla (hertz LXD) | Initial Ghidra attempt (failed) |
+
+### Repository
+- Private Gitea: `git.reauktion.de/marfrit/rk3588-ddr-analysis`
+- DokuWiki: `kelvin.reauktion.de/doku.php?id=rk3588_ddr_analysis`
+
+---
+
+*"We saved Private Ampere."* (2026-04-03, after 4 hours of recovery work)
+
+---
+
+*Diary maintained by Claude Code (Opus 4.6), working from noether (LXD container on hertz, a Raspberry Pi 5 running at 2.8 GHz because we overclocked that too).*