Add project diary - the full journey from decompilation to bricking to recovery
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,262 @@

# RK3588 DDR Blob Reverse Engineering: Project Diary

*A chronicle of decompiling, patching, bricking, and recovering a closed-source DDR initialization binary.*

---

## Day 1 (2026-04-02): The Idea

It started with a simple question: "Can you decompile the RK3588 DDR init blob?"

The RK3588 ships with a closed-source binary blob that initializes LPDDR5
memory during early boot. Rockchip provides no source code, so it is a black
box. The user (a kernel developer working on the CoolPi GenBook and Radxa
Rock 5 ITX+) wanted to understand what it does, find bugs, and potentially
fix the cold boot failures the community reports.

### First Attempt: Ghidra on Tesla (aarch64)

We installed Ghidra on tesla (an aarch64 LXD container on hertz, a Pi 5).
Analysis worked: 118 functions found. But when we tried to decompile:

> `ERROR os/linux_arm_64/decompile does not exist`

Ghidra's decompiler backend is a **native binary**, and the release we installed
ships no prebuilt one for `linux_arm_64`. The analysis (disassembly) works on
any platform, but decompilation needs that native binary, so the quickest
route was an x86 host.

### Moving to Oppenheimer (x86)

We created a new Proxmox container (CT131, "oppenheimer") on data, the
Ryzen 7 1700 server. Debian 12, x86_64. Installed JDK 21, Ghidra 11.3.

**First surprise:** The blob is **AArch64 (64-bit ARM)**, not Cortex-M0 as
initially assumed. The first instruction, `01 00 00 14`, is the little-endian
encoding of an AArch64 unconditional branch (`B`, instruction word `0x14000001`).
It runs on the main A76/A55 cores during boot, not on the PMU's M0 core.
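
To make the AArch64 identification concrete, here is a minimal C sketch that assembles those four bytes into a little-endian instruction word and checks it against the `B` encoding. The helper names (`le32`, `aarch64_is_b`, `aarch64_b_offset`) are invented for illustration and are not part of the project's tooling.

```c
#include <stdint.h>

/* Assemble a 32-bit instruction word from four little-endian bytes. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* AArch64 unconditional branch: bits [31:26] == 0b000101. */
static int aarch64_is_b(uint32_t insn)
{
    return (insn >> 26) == 0x05;
}

/* Byte offset of a B instruction: sign-extended imm26, times 4. */
static int64_t aarch64_b_offset(uint32_t insn)
{
    int64_t imm26 = insn & 0x03FFFFFF;
    if (imm26 & 0x02000000)      /* sign bit of the 26-bit field */
        imm26 -= 0x04000000;
    return imm26 * 4;
}
```

Applied to `01 00 00 14`, the word is `0x14000001`: a `B` with a +4 byte offset, that is, a jump to the instruction immediately after it.
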

**Result:** 118 functions decompiled, 11,923 lines of C. The Ghidra headless
analyzer plus a custom Java export script did the heavy lifting:

```java
import ghidra.app.decompiler.DecompInterface;

// Inside a GhidraScript's run(): currentProgram comes from the script base class
DecompInterface decompiler = new DecompInterface();
decompiler.openProgram(currentProgram);
// ... iterate all functions, export decompiled C
```

## Day 1: The Annotated Source

We transformed Ghidra's raw output into human-readable C:
- 53 functions renamed based on behavior (`sgrf_wait_ready`, `ddr_pll_configure`, etc.)
- 79 MMIO registers mapped to hardware blocks using the RK3588 TRM Part 2
- Register addresses cross-referenced with kernel device tree sources

Key discovery: the `0xFF00AA` value appearing 28 times is Rockchip's GRF
**write-enable mask pattern**. The upper 16 bits of a GRF write select which of
the lower 16 bits are actually updated, so `0x00FF00AA` writes `0xAA` to bits
[7:0] and leaves bits [15:8] untouched.
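
The write-enable convention can be sketched in C as follows. This is a model of how such a register combines a 32-bit write with its current contents, not code from the blob; `grf_hiword` and `grf_apply_write` are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Build a GRF-style write: enable mask in the upper half, value in the lower. */
static uint32_t grf_hiword(uint16_t mask, uint16_t value)
{
    return ((uint32_t)mask << 16) | (uint32_t)(value & mask);
}

/* Model of the register side: only bits enabled by the upper half change. */
static uint16_t grf_apply_write(uint16_t current, uint32_t write)
{
    uint16_t mask  = (uint16_t)(write >> 16);
    uint16_t value = (uint16_t)(write & 0xFFFF);
    return (uint16_t)((current & ~mask) | (value & mask));
}
```

With this model, `grf_hiword(0x00FF, 0x00AA)` yields the `0x00FF00AA` constant seen in the blob, and the pattern lets software update a few bits without a read-modify-write sequence.
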

## Day 1: The Version Comparison

We extracted all DDR blob versions from the rkbin git history (v1.02 through
v1.19) and compared them:

**Shocking finding:** Every version has **major code changes**, not just
timing parameter tweaks. The blob grew from 42 KB to 77 KB over its lifetime.

But the fast and conservative blobs of the **same version** differ by only
**6 bytes**: the LP4 and LP5 frequency parameters in the data section.
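
A byte-level comparison of that kind is easy to sketch in C. `count_diff_bytes` is an illustrative helper, not the script used for this diary, and it assumes both images have the same length, which the diary does not state explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Count differing byte positions in two equal-length images. If first_diff
 * is non-NULL it receives the offset of the first difference, or len if
 * the images are identical. */
static size_t count_diff_bytes(const uint8_t *a, const uint8_t *b,
                               size_t len, size_t *first_diff)
{
    size_t count = 0;
    size_t first = len;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (count == 0)
                first = i;
            count++;
        }
    }
    if (first_diff)
        *first_diff = first;
    return count;
}
```

The first differing offset is a quick way to check that the changes fall in the data section and not in code.
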

## Day 1: 45 Bugs Found

The most critical finding: **45 hardware poll loops without timeouts**.

```c
// This loops FOREVER if SGRF doesn't respond:
do {
    // empty body: the condition re-reads the status register on every pass
} while (SGRF_DDR_STATUS != 0);
```

These are a plausible explanation for the cold boot failures the RK3588
community reports. At low temperatures the PHY may respond late or not at all,
and a loop with no timeout then hangs the system permanently during boot.

**Categories:**
- 16 B.cond backward branches
- 26 TBZ/TBNZ backward branches
- 3 CBZ/CBNZ backward branches
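
The three categories correspond to three AArch64 conditional-branch encodings, and a scanner can recognize them from the instruction word alone. The sketch below is one way to do that; the enum and function names are hypothetical, and a backward branch is only a *candidate* (ordinary `for` loops branch backward too), so each hit still needs manual review.

```c
#include <stdint.h>

enum branch_kind { BR_NONE, BR_BCOND, BR_CBZ_CBNZ, BR_TBZ_TBNZ };

/* Sign-extend the low `bits` bits of v. */
static int64_t sext(uint32_t v, unsigned bits)
{
    int64_t x = (int64_t)(v & ((1u << bits) - 1));
    if (x & ((int64_t)1 << (bits - 1)))
        x -= (int64_t)1 << bits;
    return x;
}

/* Classify a conditional branch and return its byte offset via *offset. */
static enum branch_kind classify_branch(uint32_t insn, int64_t *offset)
{
    if ((insn & 0xFF000010u) == 0x54000000u) {      /* B.cond: imm19 at [23:5] */
        *offset = sext(insn >> 5, 19) * 4;
        return BR_BCOND;
    }
    if ((insn & 0x7E000000u) == 0x34000000u) {      /* CBZ/CBNZ: imm19 at [23:5] */
        *offset = sext(insn >> 5, 19) * 4;
        return BR_CBZ_CBNZ;
    }
    if ((insn & 0x7E000000u) == 0x36000000u) {      /* TBZ/TBNZ: imm14 at [18:5] */
        *offset = sext(insn >> 5, 14) * 4;
        return BR_TBZ_TBNZ;
    }
    *offset = 0;
    return BR_NONE;
}

/* A poll-loop candidate is a conditional branch that jumps backward. */
static int is_backward_branch(uint32_t insn)
{
    int64_t off;
    return classify_branch(insn, &off) != BR_NONE && off < 0;
}
```
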

## Day 1: Community Research

A background research agent spent 15 minutes collecting 40+ sources about:
- DDR training (ZQ cal → write leveling → gate → DQ → eye → VREF → CA)
- Why Rockchip dropped 2736 MHz LP5 in v1.16 (PHY eye margin failures)
- The Synopsys DWC LPDDR5/4X PHY used in the RK3588
- The rkddr tool for frequency overclocking
- Community-achieved 3200 MHz (6400 MT/s) on SK Hynix modules

## Day 2 (2026-04-03): The NOP Patcher

### The Idea

Simple: replace each tight poll loop's backward branch with a NOP.
The register is read once. If it is ready, great. If not, execution falls
through anyway.

### QEMU Testing

We set up Unicorn (a CPU emulator derived from QEMU) on oppenheimer to test:
- Map all MMIO regions as RAM with pre-seeded "ready" values
- Skip MSR/MRS instructions via exception hooks
- Count instructions, compare original vs patched

**Result:** Original stuck at 0x10350 (TBZ loop). Patched progressed to 0x9124
(deep into PHY training). The NOP approach worked... in emulation.

### The Production Patcher

We classified the 45 polls:
- 40 non-critical (SGRF, firewall, PLL) → NOP
- 5 training-critical (DfiStatus, CalBusy, etc.) → KEEP

Built U-Boot on ampere (the GenBook itself) with the patched blob.

### 💀 The Bricking

```
$ sudo flashcp -v u-boot-rockchip-8mb.bin /dev/mtd0
```

Reboot. Black screen. Maskrom mode.

**What went wrong:** The NOP approach was too aggressive. Even the polls we
classed as non-critical exist because the hardware genuinely needs wait time
for operations to complete (a PLL needs time to lock, for instance). Converting
polls to single checks meant the code proceeded before the hardware was ready,
corrupting the DDR controller state.

### The Recovery Odyssey

**Problem 1:** rkdeveloptool hanging on "Downloading bootloader..."

Turns out the Debian-packaged rkdeveloptool is the **Pine64 fork**, which
doesn't have the `cs` (change storage) command needed to target SPI flash. The
Rockchip original does:

```bash
git clone https://github.com/rockchip-linux/rkdeveloptool.git
# This one has: cs [storage: 1=EMMC, 2=SD, 9=SPINOR]
```

**Problem 2:** ModemManager grabbing the USB device

```
EBUSY: Device or resource busy
```

ModemManager probes every new USB device. `sudo systemctl stop ModemManager`
fixed the USB claim issue.

**Problem 3:** USB signal integrity

The first recovery host (ohm) had flaky USB: `error -71` (protocol error).
Moved to higgs (Pi 5). Still had issues on bus 001. Switching to bus 004
(different USB port) got it working.

**Problem 4:** rkdeveloptool wrote to eMMC, not SPI

The `cs 9` command to select SPI was crucial. Without it, the 8 MB image
overwrote the eMMC boot partition instead of SPI. We recovered the eMMC
file system using **testdisk** (restored FAT directory entries) but the
file **contents** were zeroed.

**The Save:** A March 24 SPI backup on the data partition:
```
/mnt/sda3/spi-flash-backup-20260324.bin
```

This backup was our mainline U-Boot. Flash it back to SPI, boot from the
USB stick (stock CoolPi kernel), mount the NVMe (which has the Arch rootfs
and kernel source), rebuild the boot files, copy to eMMC. **Ampere lives.**

### Lessons Learned (The Hard Way)

1. **NEVER NOP hardware polls on production hardware.** Counted loops or nothing.
2. **ALWAYS back up SPI before flashing:** `dd if=/dev/mtdblock0 of=backup.bin`
3. **Use the Rockchip rkdeveloptool**, not the Pine64 fork, when `cs` is needed.
4. **Stop ModemManager** before using rkdeveloptool.
5. **Battery disconnect** isn't needed: holding the maskrom button during power-on works.
6. **Test with QEMU first.** It caught the TBZ poll type we initially missed.
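
Lesson 1 in C terms: a minimal sketch of a counted poll, assuming a memory-mapped status register and a caller-chosen iteration budget. The function name and signature are illustrative only; the blob itself is patched at the binary level, not rebuilt from C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll *reg until (value & mask) == expect, giving up after max_iters reads.
 * Returns true on success, false on timeout. */
static bool poll_with_timeout(const volatile uint32_t *reg, uint32_t mask,
                              uint32_t expect, uint32_t max_iters)
{
    while (max_iters-- > 0) {
        if ((*reg & mask) == expect)
            return true;
    }
    return false;
}
```

Unlike a single check, this still gives slow hardware time to respond. Unlike the original loops, it returns control to the caller when the hardware never does, and the caller decides whether to retry, report, or continue.
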

## Day 2: The Trampoline Patcher

After the bricking, we built a proper fix: **assembly trampolines with counted loops.**

Each poll loop's backward branch is replaced with a `B trampoline_N`, where
the trampoline section is **appended** after the blob (no code shifting):

```
Trampoline for each poll:
  MOV w18, #0x20000          ; 128K iteration timeout (0x20000 = 131072)
  MOVK w18, #0x2, LSL #16    ; (if needed for large count)
retry:
  LDR w0, [xN, #offset]      ; copy of original register load
  <condition check>          ; inverted: branch to return_addr on success
  SUBS w18, w18, #1          ; decrement counter
  B.NE retry                 ; retry while counter != 0 (must not re-run the MOV)
  B return_addr              ; timeout: fall through
```
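
Redirecting a poll site to its trampoline, and returning from it, both come down to encoding a PC-relative `B`. The following C sketch shows that encoding step under stated assumptions: `encode_b` and `write_le32` are hypothetical helpers, not the project's patcher, and the addresses in the usage note are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an AArch64 `B` from address `from` to address `to`.
 * Both must be 4-byte aligned and within the +/-128 MiB range of B. */
static uint32_t encode_b(uint64_t from, uint64_t to)
{
    int64_t delta = (int64_t)(to - from);
    assert((delta % 4) == 0);
    assert(delta >= -(INT64_C(1) << 27) && delta < (INT64_C(1) << 27));
    return 0x14000000u | ((uint32_t)(delta / 4) & 0x03FFFFFFu);
}

/* Overwrite one instruction in a little-endian image. */
static void write_le32(uint8_t *image, uint64_t offset, uint32_t insn)
{
    image[offset + 0] = (uint8_t)(insn);
    image[offset + 1] = (uint8_t)(insn >> 8);
    image[offset + 2] = (uint8_t)(insn >> 16);
    image[offset + 3] = (uint8_t)(insn >> 24);
}
```

As an illustration only: if the branch at `0x10350` were redirected to a trampoline placed at `0x12BA0` (the original blob length, 76,704 bytes), `encode_b(0x10350, 0x12BA0)` would give `0x14000A14`. Because `B` is PC-relative, the result does not depend on where the blob is loaded.
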

**QEMU Result:** Original stuck at 0x10350. Trampoline blob: X18=0x17ED
(counter counting down on the last poll). **All 45 polls have timeouts now.**

The blob grows from 76,704 to 78,068 bytes (+1,364 bytes). Whether BL2
accepts the larger blob is the open question for real hardware testing.

## Day 2: The Recompilation Attempt

We tried to make Ghidra's decompiled C actually compile. Starting from
11,976 lines and 4,184 errors:

- Added type definitions, register headers, forward declarations
- Fixed Ghidra artifacts (`switchD_`, `stack0x`, `register0x`, `._0_1_` sub-fields)
- Renamed 41 duplicate functions to unique names
- Fixed asm string literals, system register access

Got down to ~270 errors but hit a wall: Ghidra's C output is fundamentally
a **reading aid**, not compilable source. Array assignments, unresolved call
targets (same name for different functions), and struct sub-field access
patterns can't be mechanically fixed.

**Verdict:** Binary patching (trampolines) is the right approach. Recompilation
from decompiled output would require rewriting every function by hand.

## Current State

### What Exists
- Full decompilation of all blob versions (v1.02 through v1.19)
- 53 named functions, 79 mapped MMIO registers
- Trampoline patcher (QEMU-verified, not yet hardware-tested)
- Frequency table (2112 to 3200 MHz LP5)
- Community research (40+ sources)
- DokuWiki article and Gitea repo

### What's Next
1. **Instrumented QEMU trace**: log every MMIO access with register state
   to build a complete execution flow map
2. **Hardware test** of trampoline blob (with iFixit kit ready)
3. **UART capture** of DDR training output for comparison
4. **Frequency patching**: try 2736 MHz on boltzmann's Rock 5 ITX+

### Infrastructure
| Host | Role |
|------|------|
| oppenheimer (CT131 on data) | Ghidra, QEMU, cross-compile |
| boltzmann (Rock 5 ITX+) | Source repo, DDR test target |
| ampere (GenBook) | The patient that survived surgery |
| tesla (hertz LXD) | Initial Ghidra attempt (failed) |

### Repository
- Private Gitea: `git.reauktion.de/marfrit/rk3588-ddr-analysis`
- DokuWiki: `kelvin.reauktion.de/doku.php?id=rk3588_ddr_analysis`

---

*"We saved Private Ampere."* (2026-04-03, after 4 hours of recovery work.)

---

*Diary maintained by Claude Code (Opus 4.6), working from noether (LXD container on hertz, a Raspberry Pi 5 running at 2.8 GHz because we overclocked that too).*