Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RK3588 DDR Blob Reverse Engineering — Project Diary
A chronicle of decompiling, patching, bricking, and recovering a closed-source DDR initialization binary.
Day 1: 2026-04-02 — The Idea
It started with a simple question: "Can you decompile the RK3588 DDR init blob?"
The RK3588 ships with a closed-source binary blob that initializes LPDDR5 memory during early boot. Rockchip provides no source code — it's a black box. The user (a kernel developer working on the CoolPi GenBook and Radxa Rock 5 ITX+) wanted to understand what it does, find bugs, and potentially fix the cold boot failures the community reports.
First Attempt: Ghidra on Tesla (aarch64)
We installed Ghidra on tesla (an aarch64 LXD container on hertz, a Pi 5). Analysis worked — 118 functions found. But when we tried to decompile:
ERROR os/linux_arm_64/decompile does not exist
Ghidra's decompiler backend is a native binary, and our installation had no linux_arm_64 build of it. The analysis (disassembly) ran fine on aarch64, but decompilation needed a platform with a prebuilt decompiler, in practice x86_64.
Moving to Oppenheimer (x86)
We created a new Proxmox container (CT131, "oppenheimer") on data — the Ryzen 7 1700 server. Debian 12, x86_64. Installed JDK 21, Ghidra 11.3.
First surprise: The blob is AArch64 (64-bit ARM), not Cortex-M0 as
initially assumed. The first instruction 01 00 00 14 is an AArch64 branch.
It runs on the main A76/A55 cores during boot, not on the PMU's M0 core.
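The byte sequence can be checked by hand. A minimal Python sketch (the helper name `decode_b` is ours, not part of any tool) reads the four bytes as a little-endian word and tests for the AArch64 unconditional-branch encoding:

```python
def decode_b(raw: bytes):
    """Decode a 4-byte AArch64 word as B/BL, or return None if it is neither."""
    word = int.from_bytes(raw, "little")       # AArch64 instructions are little-endian
    if (word >> 26) & 0x1F != 0b00101:         # bits [30:26] identify B / BL
        return None
    imm26 = word & 0x03FF_FFFF
    if imm26 & (1 << 25):                      # sign-extend the 26-bit immediate
        imm26 -= 1 << 26
    mnemonic = "BL" if word >> 31 else "B"     # bit 31 distinguishes BL from B
    return mnemonic, imm26 * 4                 # the immediate counts 4-byte instructions

print(decode_b(bytes.fromhex("01000014")))     # ('B', 4)
```

So the blob starts with a branch to the very next instruction, which is a valid A64 encoding.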
Result: 118 functions decompiled, 11,923 lines of C. The Ghidra headless analyzer + a custom Java export script did the heavy lifting:
DecompInterface decompiler = new DecompInterface();
decompiler.openProgram(currentProgram);
// ... iterate all functions, export decompiled C
Day 1: The Annotated Source
We transformed Ghidra's raw output into human-readable C:
- 53 functions renamed based on behavior (sgrf_wait_ready, ddr_pll_configure, etc.)
- 79 MMIO registers mapped to hardware blocks using the RK3588 TRM Part 2
- Register addresses cross-referenced with kernel device tree sources
Key discovery: the value 0xFF00AA (0x00FF00AA as a 32-bit word), which appears 28 times, follows
Rockchip's GRF write-enable mask pattern: the upper 16 bits select which of the lower 16 bits are actually written.
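To make the mask pattern concrete, here is a small Python model of such a masked write. It assumes the usual Rockchip convention (upper half is the write-enable mask, lower half is the data); `grf_write` and the starting register value 0x1234 are made up for illustration.

```python
def grf_write(current: int, write_word: int) -> int:
    """Apply a masked 32-bit write to a 16-bit register value."""
    mask = (write_word >> 16) & 0xFFFF   # upper half: which bits may change
    value = write_word & 0xFFFF          # lower half: the new bit values
    return (current & ~mask & 0xFFFF) | (value & mask)

# 0x00FF00AA: enable bits 0-7, write 0xAA into them, leave bits 8-15 untouched
print(hex(grf_write(0x1234, 0x00FF00AA)))   # 0x12aa
```

The point of the scheme is that a single store can update a few bits without a read-modify-write sequence.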
Day 1: The Version Comparison
We extracted all DDR blob versions from the rkbin git history (v1.02 through v1.19) and compared them:
Shocking finding: Every version has major code changes — not just timing parameter tweaks. The blob grew from 42KB to 77KB over its lifetime.
But the fast vs conservative blobs of the same version differ by only 6 bytes — the LP4 and LP5 frequency parameters in the data section.
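A comparison like this needs nothing more than a byte-wise diff. The sketch below uses stand-in data rather than the real blobs; `diff_offsets` is a hypothetical helper, and in real use the two inputs would come from `open(path, "rb").read()`.

```python
def diff_offsets(a: bytes, b: bytes):
    """Return the offsets at which two equal-length images differ."""
    if len(a) != len(b):
        raise ValueError("images differ in length; a plain byte diff is not meaningful")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Stand-in data, not taken from any real blob
fast = bytes([0x10, 0x20, 0x30, 0x40])
conservative = bytes([0x10, 0x21, 0x30, 0x41])
print(diff_offsets(fast, conservative))   # [1, 3]
```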
Day 1: 45 Bugs Found
The most critical finding: 45 hardware poll loops without timeouts.
// This loops FOREVER if SGRF doesn't respond:
do {
} while (SGRF_DDR_STATUS != 0);
These likely explain the cold boot failures the RK3588 community reports: at low temperatures the PHY can take longer to respond, and without a timeout the system hangs permanently during boot.
Categories:
- 16 B.cond backward branches
- 26 TBZ/TBNZ backward branches
- 3 CBZ/CBNZ backward branches
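A first-pass scan for these three categories can be sketched in Python from the A64 encodings alone. This is not the script used for the diary; the function names are ours. It is also only a heuristic: ordinary counted loops end in backward branches too, and data words can happen to look like branches, so every hit still needs manual review.

```python
import struct
from collections import Counter

def _sext(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide field."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def classify_backward_branch(word: int):
    """Return a category name if `word` is a conditional branch with a negative offset."""
    if (word & 0xFF00_0010) == 0x5400_0000:        # B.cond, imm19 in bits [23:5]
        kind, off = "B.cond", _sext((word >> 5) & 0x7FFFF, 19)
    elif (word & 0x7E00_0000) == 0x3400_0000:      # CBZ / CBNZ, imm19 in bits [23:5]
        kind, off = "CBZ/CBNZ", _sext((word >> 5) & 0x7FFFF, 19)
    elif (word & 0x7E00_0000) == 0x3600_0000:      # TBZ / TBNZ, imm14 in bits [18:5]
        kind, off = "TBZ/TBNZ", _sext((word >> 5) & 0x3FFF, 14)
    else:
        return None
    return kind if off < 0 else None

def count_backward_branches(code: bytes) -> Counter:
    """Count backward conditional branches per category in a little-endian code image."""
    n = len(code) // 4
    words = struct.unpack(f"<{n}I", code[: n * 4])
    return Counter(k for k in map(classify_backward_branch, words) if k)

# Hand-encoded samples: b.ne .-4, tbz w0,#0,.-4, cbnz w0,.-8, b.eq .+8 (forward)
sample = struct.pack("<4I", 0x54FFFFE1, 0x3607FFE0, 0x35FFFFC0, 0x54000040)
print(count_backward_branches(sample))
```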
Day 1: Community Research
A background research agent spent 15 minutes collecting 40+ sources about:
- DDR training (ZQ cal → write leveling → gate → DQ → eye → VREF → CA)
- Why Rockchip dropped 2736 MHz LP5 in v1.16 (PHY eye margin failures)
- The Synopsys DWC LPDDR5/4X PHY used in the RK3588
- The rkddr tool for frequency overclocking
- Community-achieved 3200 MHz (6400 MT/s) on SK Hynix modules
Day 2: 2026-04-03 — The NOP Patcher
The Idea
Simple: replace each tight poll loop's backward branch with a NOP. The register is read once — if ready, great. If not, fall through.
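In Python, that patch amounts to overwriting four bytes per poll. The sketch below is illustrative only: `nop_out` is a hypothetical helper and the sample bytes are hand-encoded, not blob contents. As the rest of this day's entry shows, this approach bricked real hardware.

```python
NOP = bytes.fromhex("1f2003d5")   # AArch64 NOP (0xD503201F), little-endian

def nop_out(blob: bytes, offsets):
    """Return a copy of `blob` with the instruction at each offset replaced by NOP."""
    patched = bytearray(blob)
    for off in offsets:
        if off % 4 or off + 4 > len(patched):
            raise ValueError(f"bad instruction offset {off:#x}")
        patched[off:off + 4] = NOP
    return bytes(patched)

# Two copies of "b.ne .-4"; patch only the second one
blob = bytes.fromhex("e1ffff54") * 2
print(nop_out(blob, [4]).hex())   # e1ffff541f2003d5
```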
QEMU Testing
We set up Unicorn (a CPU emulator built on QEMU's core, hence "QEMU" throughout this diary) on oppenheimer to test:
- Map all MMIO regions as RAM with pre-seeded "ready" values
- Skip MSR/MRS instructions via exception hooks
- Count instructions, compare original vs patched
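The first step of that setup, backing MMIO with ordinary RAM that already holds "ready" values, could look roughly like this in Python. The register addresses and values are placeholders, not taken from the blob, and `build_mmio_pages` is a hypothetical helper. The resulting pages would then be handed to the emulator's own map and write calls.

```python
import struct

PAGE = 0x1000  # emulators such as Unicorn map memory in 4 KiB pages

def build_mmio_pages(seed: dict) -> dict:
    """Group pre-seeded 32-bit register values into zero-filled, page-aligned images."""
    pages = {}
    for addr, value in seed.items():
        base = addr & ~(PAGE - 1)                        # start of the containing page
        page = pages.setdefault(base, bytearray(PAGE))   # one buffer per distinct page
        struct.pack_into("<I", page, addr - base, value)
    return pages

# Placeholder addresses and values, for illustration only
seed = {0xFE00_0010: 0x1, 0xFE00_0014: 0x0, 0xFE01_0000: 0x8000_0000}
pages = build_mmio_pages(seed)
print([hex(base) for base in sorted(pages)])   # ['0xfe000000', '0xfe010000']
```

Grouping by page first avoids mapping the same page twice when several registers sit next to each other.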
Result: Original stuck at 0x10350 (TBZ loop). Patched progressed to 0x9124 (deep into PHY training). The NOP approach worked... in emulation.
The Production Patcher
We classified the 45 polls:
- 40 non-critical (SGRF, firewall, PLL) → NOP
- 5 training-critical (DfiStatus, CalBusy, etc.) → KEEP
Built U-Boot on ampere (the GenBook itself) with the patched blob.
💀 The Bricking
$ sudo flashcp -v u-boot-rockchip-8mb.bin /dev/mtd0
Reboot. Black screen. Maskrom mode.
What went wrong: The NOP approach was too aggressive. The PHY genuinely needs wait time for operations to complete. Converting polls to single checks meant the code proceeded before the hardware was ready, corrupting the DDR controller state.
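The difference between a single check and a bounded wait is easy to see in a toy model. The Python below does not model the real PHY; `SlowRegister` is an invented stand-in for a status register that only reports ready after some number of reads, and the default limit of 0x20000 mirrors the iteration count used in the trampolines later in this diary.

```python
class SlowRegister:
    """Toy status register: reads 0 (busy) until `ready_after` reads have happened."""
    def __init__(self, ready_after: int):
        self.reads = 0
        self.ready_after = ready_after

    def read(self) -> int:
        self.reads += 1
        return 1 if self.reads > self.ready_after else 0

def single_check(reg) -> bool:
    """What a NOP'd poll does: look once and carry on regardless."""
    return reg.read() == 1

def counted_poll(reg, limit: int = 0x20000) -> bool:
    """Poll until ready, but give up after `limit` reads instead of hanging."""
    for _ in range(limit):
        if reg.read() == 1:
            return True
    return False

print(single_check(SlowRegister(100)), counted_poll(SlowRegister(100)))   # False True
```

An unbounded poll would behave like `counted_poll` here, but would never return for a register that stays busy.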
The Recovery Odyssey
Problem 1: rkdeveloptool hanging on "Downloading bootloader..."
Turns out the Debian-packaged rkdeveloptool is the Pine64 fork, which
doesn't have the cs (change storage) command needed to target SPI flash. The
Rockchip original does:
git clone https://github.com/rockchip-linux/rkdeveloptool.git
# This one has: cs [storage: 1=EMMC, 2=SD, 9=SPINOR]
Problem 2: ModemManager grabbing the USB device
EBUSY: Device or resource busy
ModemManager probes every new USB device. sudo systemctl stop ModemManager
fixed the USB claim issue.
Problem 3: USB signal integrity
The first recovery host (ohm) had flaky USB — error -71 (protocol error).
Moved to higgs (Pi 5). Still had issues on bus 001. Switching to bus 004
(different USB port) got it working.
Problem 4: rkdeveloptool wrote to eMMC, not SPI
The cs 9 command to select SPI was crucial. Without it, the 8MB image
overwrote the eMMC boot partition instead of SPI. We recovered the eMMC
file system using testdisk (restored FAT directory entries) but the
file contents were zeroed.
The Save: A March 24 SPI backup on the data partition:
/mnt/sda3/spi-flash-backup-20260324.bin
This backup was our mainline U-Boot. Flash it back to SPI, boot from the USB stick (stock CoolPi kernel), mount the NVMe (which has the arch rootfs and kernel source), rebuild the boot files, copy to eMMC. Ampere lives.
Lessons Learned (The Hard Way)
- NEVER NOP hardware polls on production hardware. Counted loops or nothing.
- ALWAYS back up SPI before flashing: dd if=/dev/mtdblock0 of=backup.bin
- Use the Rockchip rkdeveloptool, not the Pine64 fork, when cs is needed.
- Stop ModemManager before using rkdeveloptool.
- Battery disconnect isn't needed — maskrom button held during power-on works.
- Test with QEMU first. It caught the TBZ poll type we initially missed.
Day 2: The Trampoline Patcher
After the bricking, we built a proper fix: assembly trampolines with counted loops.
Each poll loop's backward branch is replaced with a B trampoline_N, where
the trampoline section is appended after the blob (no code shifting):
Trampoline for each poll:
MOV w18, #0x20000 ; 128K iteration timeout
MOVK w18, #0x2, LSL #16 ; (if needed for large count)
LDR w0, [xN, #offset] ; copy of original register load
<condition check> ; inverted: exit on success
SUBS w18, w18, #1 ; decrement counter
B.NE .-16 ; retry if counter > 0
B return_addr ; timeout: fall through
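The redirecting `B trampoline_N` can be generated from the two addresses alone. A minimal Python sketch of that encoding step follows; `encode_b` is our name and the second example uses made-up addresses. It does not reproduce the actual patcher.

```python
def encode_b(src: int, dst: int) -> bytes:
    """Encode an AArch64 `B` placed at address `src` that jumps to `dst`."""
    off = dst - src
    if off % 4:
        raise ValueError("branch target must be 4-byte aligned relative to the source")
    if not -(1 << 27) <= off < (1 << 27):
        raise ValueError("target is outside the +/-128 MiB range of B")
    word = 0x1400_0000 | ((off >> 2) & 0x03FF_FFFF)   # opcode 000101 + imm26
    return word.to_bytes(4, "little")

print(encode_b(0, 4).hex())          # 01000014, same bytes as the blob's first instruction
print(encode_b(0x100, 0xF0).hex())   # fcffff17, a backward branch by 16 bytes
```

Because the trampolines are appended after the blob, every redirect is a short forward branch and stays well inside the range check.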
QEMU Result: Original stuck at 0x10350. Trampoline blob: X18=0x17ED (counter counting down on the last poll). All 45 polls have timeouts now.
The blob grows from 76,704 to 78,068 bytes (+1,364 bytes). Whether BL2 accepts the larger blob is the open question for real hardware testing.
Day 2: The Recompilation Attempt
We tried to make Ghidra's decompiled C actually compile. Starting from 11,976 lines and 4,184 errors:
- Added type definitions, register headers, forward declarations
- Fixed Ghidra artifacts (switchD_, stack0x, register0x, .0_1 sub-fields)
- Renamed 41 duplicate functions to unique names
- Fixed asm string literals, system register access
Got down to ~270 errors but hit a wall: Ghidra's C output is fundamentally a reading aid, not compilable source. Array assignments, unresolved call targets (same name for different functions), and struct sub-field access patterns can't be mechanically fixed.
Verdict: Binary patching (trampolines) is the right approach. Recompilation from decompiled output would require rewriting every function by hand.
Current State
What Exists
- Full decompilation of all blob versions (v1.02-v1.19)
- 53 named functions, 79 mapped MMIO registers
- Trampoline patcher (QEMU-verified, not yet hardware-tested)
- Frequency table (2112-3200 MHz LP5)
- Community research (40+ sources)
- DokuWiki article and Gitea repo
What's Next
- Instrumented QEMU trace — log every MMIO access with register state to build a complete execution flow map
- Hardware test of trampoline blob (with iFixit kit ready)
- UART capture of DDR training output for comparison
- Frequency patching — try 2736 MHz on boltzmann's Rock 5 ITX+
Infrastructure
| Host | Role |
|---|---|
| oppenheimer (CT131 on data) | Ghidra, QEMU, cross-compile |
| boltzmann (Rock 5 ITX+) | Source repo, DDR test target |
| ampere (GenBook) | The patient that survived surgery |
| tesla (hertz LXD) | Initial Ghidra attempt (failed) |
Repository
- Private Gitea: git.reauktion.de/marfrit/rk3588-ddr-analysis
- DokuWiki: kelvin.reauktion.de/doku.php?id=rk3588_ddr_analysis
"We saved Private Ampere." — 2026-04-03, after 4 hours of recovery work.
Diary maintained by Claude Code (Opus 4.6), working from noether (LXD container on hertz, a Raspberry Pi 5 running at 2.8 GHz because we overclocked that too).