# RK3588 DDR Blob Reverse Engineering — Project Diary

*A chronicle of decompiling, patching, bricking, and recovering a closed-source DDR initialization binary.*

---

## Day 1: 2026-04-02 — The Idea

It started with a simple question: "Can you decompile the RK3588 DDR init blob?"

The RK3588 ships with a closed-source binary blob that initializes LPDDR5
memory during early boot. Rockchip provides no source code; it is a black
box. The user (a kernel developer working on the CoolPi GenBook and Radxa
Rock 5 ITX+) wanted to understand what it does, find bugs, and potentially
fix the cold boot failures the community reports.

### First Attempt: Ghidra on Tesla (aarch64)

We installed Ghidra on tesla (an aarch64 LXD container on hertz, a Pi 5).
Analysis worked and found 118 functions. But when we tried to decompile:

> `ERROR os/linux_arm_64/decompile does not exist`

Ghidra's decompiler backend is a **native binary**, and our install shipped no
prebuilt one for ARM64 Linux. The analysis (disassembly) works on any platform,
but decompilation needs that native helper, which we only had for x86_64.

### Moving to Oppenheimer (x86)

We created a new Proxmox container (CT131, "oppenheimer") on data, the
Ryzen 7 1700 server. Debian 12, x86_64. Installed JDK 21, Ghidra 11.3.

**First surprise:** The blob is **AArch64 (64-bit ARM)**, not Cortex-M0 as
initially assumed. The first four bytes, `01 00 00 14`, read little-endian as
`0x14000001`, which is an AArch64 `B` to the next instruction.
It runs on the main A76/A55 cores during boot, not on the PMU's M0 core.

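As a quick sanity check on that first word, here is a minimal sketch of decoding an unconditional AArch64 `B` from four little-endian bytes. It is not taken from the project's tooling; the helper names `le32` and `decode_b` are made up for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assemble a 32-bit instruction word from four little-endian bytes. */
static uint32_t le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* If insn is an unconditional B (bits [31:26] == 000101), store the signed
 * byte offset in *offset and return true; otherwise return false. */
static bool decode_b(uint32_t insn, int64_t *offset)
{
    if ((insn & 0xFC000000u) != 0x14000000u)
        return false;
    int64_t imm26 = insn & 0x03FFFFFFu;
    if (imm26 & 0x02000000)          /* sign bit of the 26-bit field */
        imm26 -= 0x04000000;
    *offset = imm26 * 4;             /* the field counts 4-byte words */
    return true;
}
```

Feeding the bytes `01 00 00 14` through `le32` and then `decode_b` yields an offset of +4, i.e. a branch to the instruction immediately after the first one.
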
**Result:** 118 functions decompiled, 11,923 lines of C. The Ghidra headless
analyzer plus a custom Java export script did the heavy lifting:

```java
DecompInterface decompiler = new DecompInterface();
decompiler.openProgram(currentProgram);
// ... iterate all functions, export decompiled C
```

## Day 1: The Annotated Source

We transformed Ghidra's raw output into human-readable C:
- 53 functions renamed based on behavior (sgrf_wait_ready, ddr_pll_configure, etc.)
- 79 MMIO registers mapped to hardware blocks using the RK3588 TRM Part 2
- Register addresses cross-referenced with kernel device tree sources

Key discovery: the value `0xFF00AA`, which appears 28 times, follows Rockchip's
GRF **write-enable mask pattern**: the upper 16 bits of each write select which
of the lower 16 bits are actually updated.

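The mask pattern can be modelled in a few lines of C. This is a sketch under the assumption that the register behaves like a standard Rockchip "hiword mask" register; `grf_hiword` and `grf_apply` are illustrative names, not functions from the blob.

```c
#include <stdint.h>

/* Build a GRF write word: mask goes in bits [31:16], value in bits [15:0]. */
static uint32_t grf_hiword(uint16_t mask, uint16_t value)
{
    return ((uint32_t)mask << 16) | (uint32_t)(value & mask);
}

/* Model of what the hardware does with such a write: only bits whose
 * write-enable bit is set in the upper half change in the register. */
static uint16_t grf_apply(uint16_t old, uint32_t write_word)
{
    uint16_t mask  = (uint16_t)(write_word >> 16);
    uint16_t value = (uint16_t)(write_word & 0xFFFFu);
    return (uint16_t)((old & ~mask) | (value & mask));
}
```

Read this way, `0xFF00AA` is mask `0x00FF` with value `0x00AA`: it rewrites the low byte to `0xAA` and leaves the high byte of the register untouched, with no read-modify-write needed.
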
## Day 1: The Version Comparison

We extracted all DDR blob versions from the rkbin git history (v1.02 through
v1.19) and compared them:

**Shocking finding:** Every version has **major code changes**, not just
timing parameter tweaks. The blob grew from 42 KB to 77 KB over its lifetime.

By contrast, the fast and conservative blobs of the **same version** differ by
only **6 bytes**: the LP4 and LP5 frequency parameters in the data section.

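A byte-level comparison like the one behind the 6-byte result needs very little code. The sketch below assumes both blobs have the same length and are already loaded into memory; `diff_offsets` is a hypothetical helper, not the script we actually used.

```c
#include <stddef.h>
#include <stdint.h>

/* Record up to max_out offsets at which a and b differ; return the total
 * number of differing bytes (which may exceed max_out). */
static size_t diff_offsets(const uint8_t *a, const uint8_t *b, size_t len,
                           size_t *out, size_t max_out)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (count < max_out)
                out[count] = i;
            count++;
        }
    }
    return count;
}
```

Because the function returns the total count even when the output array is full, a caller can cheaply tell a "few parameter bytes" difference from a "whole code section changed" difference.
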
## Day 1: 45 Bugs Found

The most critical finding: **45 hardware poll loops without timeouts**.

```c
// This loops FOREVER if SGRF doesn't respond:
do {
} while (SGRF_DDR_STATUS != 0);
```

These would explain the cold boot failures the RK3588 community reports. At low
temperatures, the PHY takes longer to respond, and without a timeout, the
system hangs permanently during boot.

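For contrast, a bounded version of such a poll could look like the sketch below. It is illustrative C rather than code from the blob: `poll_until_clear`, the register pointer, and the iteration budget are all assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Poll *reg until (*reg & mask) == 0 or the iteration budget runs out.
 * Returns true if the bits cleared in time, false on timeout. */
static bool poll_until_clear(const volatile uint32_t *reg, uint32_t mask,
                             uint32_t max_iters)
{
    while (max_iters--) {
        if ((*reg & mask) == 0)
            return true;
    }
    return false;
}
```

The return value matters: a caller can distinguish "hardware became ready" from "gave up waiting" and decide whether to retry, report, or abort instead of hanging.
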
**Categories:**
- 16 B.cond backward branches
- 26 TBZ/TBNZ backward branches
- 3 CBZ/CBNZ backward branches

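The three categories correspond to three A64 encoding groups, so candidate loops can be found by scanning for conditional branches with a negative offset. The following is a minimal sketch of such a classifier; the enum and function names are invented here, and a backward branch is only a candidate, since not every one closes a hardware poll.

```c
#include <stdint.h>

enum poll_branch { NOT_BACKWARD = 0, BCOND_BACK, TBZ_BACK, CBZ_BACK };

/* Classify insn as a backward conditional branch, if it is one. */
static enum poll_branch classify_backward_branch(uint32_t insn)
{
    /* B.cond: bits [31:24] == 0x54, bit 4 == 0; imm19 sign is bit 23. */
    if ((insn & 0xFF000010u) == 0x54000000u)
        return (insn & (1u << 23)) ? BCOND_BACK : NOT_BACKWARD;
    /* TBZ/TBNZ: bits [30:25] == 011011; imm14 in [18:5], sign is bit 18. */
    if ((insn & 0x7E000000u) == 0x36000000u)
        return (insn & (1u << 18)) ? TBZ_BACK : NOT_BACKWARD;
    /* CBZ/CBNZ: bits [30:25] == 011010; imm19 in [23:5], sign is bit 23. */
    if ((insn & 0x7E000000u) == 0x34000000u)
        return (insn & (1u << 23)) ? CBZ_BACK : NOT_BACKWARD;
    return NOT_BACKWARD;
}
```

Running this over every aligned 32-bit word of the code section gives a first list, which still has to be reviewed by hand to separate real polls from ordinary loops.
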
## Day 1: Community Research

A background research agent spent 15 minutes collecting 40+ sources about:
- DDR training (ZQ cal → write leveling → gate → DQ → eye → VREF → CA)
- Why Rockchip dropped 2736 MHz LP5 in v1.16 (PHY eye margin failures)
- The Synopsys DWC LPDDR5/4X PHY used in the RK3588
- The rkddr tool for frequency overclocking
- Community-achieved 3200 MHz (6400 MT/s) on SK Hynix modules

## Day 2: 2026-04-03 — The NOP Patcher

### The Idea

Simple: replace each tight poll loop's backward branch with a NOP.
The register is then read once: if it is ready, great; if not, execution
falls through anyway.

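Mechanically, the patch is a four-byte overwrite. Here is a minimal sketch, assuming the blob is held in a byte buffer and the offset of the backward branch is already known; `patch_nop` is an illustrative name. As the rest of this day's entry shows, this is exactly the approach that bricked the board, so treat it as documentation of the idea rather than a recommendation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

#define A64_NOP 0xD503201Fu  /* AArch64 NOP encoding */

/* Overwrite the 4-byte instruction at byte offset off with a NOP, stored
 * little-endian. Returns false if off is misaligned or out of range. */
static bool patch_nop(uint8_t *blob, size_t len, size_t off)
{
    if ((off & 3u) != 0 || len < 4 || off > len - 4)
        return false;
    for (int i = 0; i < 4; i++)
        blob[off + i] = (uint8_t)(A64_NOP >> (8 * i));
    return true;
}
```
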
### QEMU Testing

We set up Unicorn (a CPU emulator derived from QEMU) on oppenheimer to test:
- Map all MMIO regions as RAM with pre-seeded "ready" values
- Skip MSR/MRS instructions via exception hooks
- Count instructions, compare original vs patched

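We skipped system register accesses reactively, through exception hooks. For illustration only, the same instructions can also be recognised up front from their encoding; the predicate below is a sketch with an invented name and is not what our harness did.

```c
#include <stdint.h>
#include <stdbool.h>

/* True for MRS and MSR (register) encodings: bits [31:22] == 1101010100
 * and bit 20 set. Bit 21 distinguishes MRS (1) from MSR (0), so it is
 * left out of the mask. */
static bool is_sysreg_access(uint32_t insn)
{
    return (insn & 0xFFD00000u) == 0xD5100000u;
}
```

Hints such as NOP share the `0xD5` top byte but have bit 20 clear, so they are not matched.
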
**Result:** The original blob got stuck at 0x10350 (a TBZ loop). The patched
blob progressed to 0x9124 (deep into PHY training). The NOP approach worked...
in emulation.

### The Production Patcher

We classified the 45 polls:
- 40 non-critical (SGRF, firewall, PLL) → NOP
- 5 training-critical (DfiStatus, CalBusy, etc.) → KEEP

We then built U-Boot on ampere (the GenBook itself) with the patched blob.

### 💀 The Bricking

```
$ sudo flashcp -v u-boot-rockchip-8mb.bin /dev/mtd0
```

Reboot. Black screen. Maskrom mode.

**What went wrong:** The NOP approach was too aggressive. The PHY genuinely
needs wait time for operations to complete. Converting polls to single checks
meant the code proceeded before the hardware was ready, corrupting the DDR
controller state.

### The Recovery Odyssey

**Problem 1:** rkdeveloptool hanging on "Downloading bootloader..."

It turns out the Debian-packaged rkdeveloptool is the **Pine64 fork**, which
doesn't have the `cs` (change storage) command needed to target SPI flash. The
Rockchip original does:

```bash
git clone https://github.com/rockchip-linux/rkdeveloptool.git
# This one has: cs [storage: 1=EMMC, 2=SD, 9=SPINOR]
```

**Problem 2:** ModemManager grabbing the USB device

```
EBUSY: Device or resource busy
```

ModemManager probes every new USB device. `sudo systemctl stop ModemManager`
fixed the USB claim issue.

**Problem 3:** USB signal integrity

The first recovery host (ohm) had flaky USB: `error -71` (protocol error).
Moved to higgs (Pi 5). Still had issues on bus 001. Switching to bus 004
(different USB port) got it working.

**Problem 4:** rkdeveloptool wrote to eMMC, not SPI

The `cs 9` command to select SPI was crucial. Without it, the 8 MB image
overwrote the eMMC boot partition instead of SPI. We recovered the eMMC
file system using **testdisk** (which restored the FAT directory entries), but
the file **contents** were zeroed.

**The Save:** A March 24 SPI backup on the data partition:
```
/mnt/sda3/spi-flash-backup-20260324.bin
```

This backup was our mainline U-Boot. We flashed it back to SPI, booted from the
USB stick (stock CoolPi kernel), mounted the NVMe (which has the Arch rootfs
and kernel source), rebuilt the boot files, and copied them to eMMC.
**Ampere lives.**

### Lessons Learned (The Hard Way)

1. **NEVER NOP hardware polls on production hardware.** Counted loops or nothing.
2. **ALWAYS backup SPI before flashing:** `dd if=/dev/mtdblock0 of=backup.bin`
3. **Use the Rockchip rkdeveloptool**, not the Pine64 fork, when `cs` is needed.
4. **Stop ModemManager** before using rkdeveloptool.
5. **Battery disconnect** isn't needed: holding the maskrom button during power-on works.
6. **Test with QEMU first.** It caught the TBZ poll type we initially missed.

## Day 2: The Trampoline Patcher

After the bricking, we built a proper fix: **assembly trampolines with counted loops.**

Each poll loop's backward branch is replaced with a `B trampoline_N`, where
the trampoline section is **appended** after the blob (no code shifting):

```
Trampoline for each poll:
    MOV   w18, #0x20000        ; 128K iteration timeout
    MOVK  w18, #0x2, LSL #16   ; (if needed for large count)
retry:
    LDR   w0, [xN, #offset]    ; copy of original register load
    <condition check>          ; inverted: branch to return_addr on success
    SUBS  w18, w18, #1         ; decrement counter
    B.NE  retry                ; retry if counter > 0 (skips the counter setup)
    B     return_addr          ; timeout: fall through
```

**QEMU Result:** The original blob got stuck at 0x10350. With the trampoline
blob, the run ended with X18=0x17ED (the counter counting down on the last
poll). **All 45 polls have timeouts now.**

The blob grows from 76,704 to 78,068 bytes (+1,364 bytes). Whether BL2
accepts the larger blob is the open question for real hardware testing.

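Redirecting a poll's backward branch to its trampoline comes down to encoding an unconditional `B` with the right offset. The sketch below shows that encoding step under the assumption that both addresses are expressed in the same address space; `encode_b` is an illustrative helper, not the patcher's actual code.

```c
#include <stdint.h>
#include <stdbool.h>

/* Encode "B to" for an instruction located at address from. Returns false
 * if the offset is misaligned or outside the +/-128 MiB range of B. */
static bool encode_b(uint64_t from, uint64_t to, uint32_t *insn)
{
    int64_t delta = (int64_t)(to - from);
    if ((delta & 3) != 0 || delta < -(1LL << 27) || delta >= (1LL << 27))
        return false;
    /* imm26 is the signed word offset, truncated to 26 bits. */
    *insn = 0x14000000u | ((uint32_t)(delta / 4) & 0x03FFFFFFu);
    return true;
}
```

The same helper covers both directions: the jump from the patched site forward into the appended section, and the `B return_addr` at the end of each trampoline back into the original code.
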
## Day 2: The Recompilation Attempt

We tried to make Ghidra's decompiled C actually compile. Starting from
11,976 lines and 4,184 errors:

- Added type definitions, register headers, forward declarations
- Fixed Ghidra artifacts (switchD_, stack0x, register0x, ._0_1_ sub-fields)
- Renamed 41 duplicate functions to unique names
- Fixed asm string literals, system register access

Got down to ~270 errors but hit a wall: Ghidra's C output is fundamentally
a **reading aid**, not compilable source. Array assignments, unresolved call
targets (same name for different functions), and struct sub-field access
patterns can't be mechanically fixed.

**Verdict:** Binary patching (trampolines) is the right approach. Recompilation
from decompiled output would require rewriting every function by hand.

## Current State

### What Exists
- Full decompilation of all blob versions (v1.02-v1.19)
- 53 named functions, 79 mapped MMIO registers
- Trampoline patcher (QEMU-verified, not yet hardware-tested)
- Frequency table (2112-3200 MHz LP5)
- Community research (40+ sources)
- DokuWiki article and Gitea repo

### What's Next
1. **Instrumented QEMU trace**: log every MMIO access with register state
   to build a complete execution flow map
2. **Hardware test** of trampoline blob (with iFixit kit ready)
3. **UART capture** of DDR training output for comparison
4. **Frequency patching**: try 2736 MHz on boltzmann's Rock 5 ITX+

### Infrastructure
| Host | Role |
|------|------|
| oppenheimer (CT131 on data) | Ghidra, QEMU, cross-compile |
| boltzmann (Rock 5 ITX+) | Source repo, DDR test target |
| ampere (GenBook) | The patient that survived surgery |
| tesla (hertz LXD) | Initial Ghidra attempt (failed) |

### Repository
- Private Gitea: `git.reauktion.de/marfrit/rk3588-ddr-analysis`
- DokuWiki: `kelvin.reauktion.de/doku.php?id=rk3588_ddr_analysis`

---

*"We saved Private Ampere."* — 2026-04-03, after 4 hours of recovery work.

---

*Diary maintained by Claude Code (Opus 4.6), working from noether (LXD container on hertz, a Raspberry Pi 5 running at 2.8 GHz because we overclocked that too).*