RK3588 DDR Init Blob — Bug Analysis & Training Explainer
What is DDR Training?
DDR training is the process by which the memory controller and PHY (physical interface) calibrate their timing to reliably communicate with the DRAM chips. At the frequencies involved (2400-3200 MHz clock, 4800-6400 MT/s data rate), signal integrity is the primary challenge. Electrical signals traveling on PCB traces experience:
- Propagation delay — different trace lengths = different arrival times
- Crosstalk — adjacent signals interfere with each other
- ISI (inter-symbol interference) — previous bit values affect current bit
- Voltage droop/overshoot — impedance mismatches cause reflections
- PVT variation — process, voltage, temperature affect timing margins
Training compensates for all of these by finding the optimal timing window (the "eye") for each signal individually.
Training Stages (as seen in this blob)
The RK3588 uses a Synopsys DWC (DesignWare Core) LPDDR5/4X multiPHY. The training sequence visible in the decompiled code follows the standard DWC PHY training flow:
1. ZQ Calibration (0x684 — CalBusy register)
- What: Calibrates output driver impedance (pull-up/pull-down strength)
- Why: Process variation means each chip's transistors are slightly different
- In the code: Polls `DWC_DDRPHY_MASTER_CalBusy` (offset 0x684), 11 uses
- Bug risk: No timeout on the CalBusy poll (Line 3226)
2. Write Leveling
- What: Aligns the DQS (data strobe) signal with the CK (clock) at the DRAM
- Why: PCB trace lengths differ — the clock and data arrive at different times
- How: Controller sends DQS edges, DRAM reports whether DQS arrived early/late
- In the code: Part of the large training functions with nested loops (0x10 iterations = 16 DQ bits, 0x20 iterations = 32-bit data path)
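The write-leveling search reduces to scanning DQS delay taps for the first 0→1 transition in the DRAM's feedback. The following is a minimal host-side sketch, not the blob's code. It assumes one feedback sample per tap (1 means the DRAM sampled CK high on the DQS edge) supplied as an array, and `wl_find_edge` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the first DQS delay tap at which the write-leveling feedback
 * changes from 0 to 1 (DQS rising edge aligned with the CK rising edge),
 * or -1 if the sweep contains no such transition. */
static int wl_find_edge(const uint8_t *feedback, size_t ntaps)
{
    assert(feedback != NULL || ntaps == 0);
    for (size_t tap = 1; tap < ntaps; tap++) {
        if (feedback[tap - 1] == 0 && feedback[tap] != 0)
            return (int)tap;
    }
    return -1;
}
```

Searching for a transition, rather than for the first 1, skips a sweep that begins while CK is already high.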
3. Read Gate Training
- What: Finds the correct time to enable the read data capture circuit
- Why: The controller must know exactly when valid read data will arrive
- In the code: Functions polling `DfiStatus` (offset 0xa24, 65 uses)
4. Read/Write DQ Training (per-bit deskew)
- What: Adjusts timing for each individual data bit (DQ0-DQ15)
- Why: Each bit trace has slightly different length and coupling
- How: Sends known patterns (0xAA55AA55, 0x55AA55AA), reads back, adjusts delays
- In the code: The 0xaa55aa55 pattern writes at offsets 0x93c-0x970 (Lines 3857-3868)
- Size: ~30 functions dedicated to DQ training — the bulk of the blob
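Once each bit's eye center is known, per-bit deskew can be sketched as adding delay to the early bits so that every center lines up with the latest one, clamped to the delay line's range. This is illustrative only. The align-to-latest policy, the function name, and the `max_tap` parameter (for example 0x1F for a 32-tap delay line) are assumptions, not logic lifted from the blob.

```c
#include <assert.h>
#include <stddef.h>

/* Given each bit's measured eye-center tap, computes the extra delay that
 * lines every center up with the latest one. Delays larger than max_tap
 * (the delay line's range) are clamped.
 * Returns how many bits had to be clamped. */
static unsigned deskew_bits(const unsigned *center, unsigned *delay,
                            size_t nbits, unsigned max_tap)
{
    unsigned latest = 0, clamped = 0;
    assert((center != NULL && delay != NULL) || nbits == 0);
    for (size_t i = 0; i < nbits; i++)
        if (center[i] > latest)
            latest = center[i];
    for (size_t i = 0; i < nbits; i++) {
        unsigned d = latest - center[i];
        if (d > max_tap) {
            d = max_tap;
            clamped++;
        }
        delay[i] = d;
    }
    return clamped;
}
```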
5. Read/Write Eye Training
- What: Scans the timing and voltage range where data is reliably captured
- Why: Finds the center of the "eye diagram" — maximum noise margin
- How: Sweeps delay and VREF values, tests at each point
- In the code: The "eyescan" blob variant runs extended eye analysis
- Note: The `0xff00aa` pattern (28 uses) is a BUS_GRF configuration for interleaving/routing. It is not eye training itself, but it sets up the channel configuration that eye training relies on.
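A one-dimensional slice of eye training (a delay sweep at one fixed VREF) comes down to finding the widest passing window and taking its midpoint. The sketch below is illustrative only. It assumes the per-tap pass/fail results have already been collected into an array, and `eye_center` is a hypothetical name. A full scan would repeat this for each VREF step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given pass (1) / fail (0) results for a sweep of delay taps, returns the
 * tap at the center of the longest contiguous passing window, or -1 if no
 * tap passed. For an even-length window the lower middle tap is returned. */
static int eye_center(const uint8_t *pass, size_t ntaps)
{
    size_t best_start = 0, best_len = 0, run_start = 0, run_len = 0;
    assert(pass != NULL || ntaps == 0);
    for (size_t i = 0; i < ntaps; i++) {
        if (pass[i]) {
            if (run_len == 0)
                run_start = i;
            run_len++;
            if (run_len > best_len) {
                best_len = run_len;
                best_start = run_start;
            }
        } else {
            run_len = 0;
        }
    }
    return best_len ? (int)(best_start + (best_len - 1) / 2) : -1;
}
```

Choosing the longest window, rather than the first passing tap, places the sampling point where the margin is largest on both sides.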
6. VREF Training (Voltage Reference)
- What: Finds optimal voltage threshold for distinguishing 0 and 1
- Why: At high speeds, the voltage swing is smaller — VREF must be centered
- Two sides:
- Host-side VREF: PHY's input comparator threshold
- DRAM-side VREF: DRAM's input comparator threshold (set via Mode Register Write)
- In the code: Functions accessing offsets 0x600, 0x608, 0x60c (29+24+14 = 67 uses)
7. CA Training (Command/Address)
- What: Calibrates timing for the command/address bus (separate from data)
- Why: Commands must arrive reliably — a missed command corrupts everything
- In the code: Uses the same DFI interface (offset 0xa24) but with CA-specific mode register commands
Why Training Must Run Every Boot
Training results depend on:
- Temperature — timing shifts by ~1-2 ps/°C
- Voltage — supply voltage affects driver strength
- DRAM internal state — varies between power cycles
- PCB and component aging — long-term drift
The results are stored in SRAM (offsets 0x001FE000-0x001FE010) and passed to the kernel via PMU GRF OS registers. The kernel uses these for DVFS (Dynamic Voltage and Frequency Scaling) during runtime.
Bug Analysis
CRITICAL: 20 Timeout-less Hardware Polls
The most serious class of bugs. These are do {} while loops that poll
hardware registers indefinitely. If the hardware doesn't respond (due to
a clock issue, reset problem, or silicon defect), the system hangs
permanently during boot with no diagnostic output.
Updated 2026-04-15 per RK3588 TRM Part 2: the uMCTL2 controller
offsets are now vendor-named. PHY PUB offsets remain undocumented in
the TRM (DWC / Innosilicon IP is not republished) — the names below
with (RE) are our reverse-engineering guesses.
| Register | Offset (base) | Uses | What it waits for | Source |
|---|---|---|---|---|
| SGRF_DDR_STATUS | abs 0xFE0500E0 | 1 | Security GRF config done | RK3588 TRM part 1 |
| SGRF_DDR_CON21 | abs 0xFE050054 | 2 | Security GRF configuration done | RK3588 TRM part 1 |
| DDRCTL_DFISTAT | DDRCTL + 0x10514 | 5 | dfi_init_complete: PHY↔controller handshake | TRM part 2 Ch.2 (renamed from "UctWriteProtShadow") |
| DDRCTL_MRSTAT | DDRCTL + 0x10090 | 4 | mr_wr_busy clear: Mode Register Write complete | TRM part 2 Ch.2 (renamed from "MicroContMuxSel") |
| DDRCTL_MRCTRL0 | DDRCTL + 0x10080 | 2 | Mode Register Control (a control register, not a status register, but polled by code waiting for MR command completion) | TRM part 2 Ch.2 (renamed from "MicroReset") |
| PHY_CTL_STATE | PHY + 0x14 (RE) | 4 | PHY state machine: [2:0] == 1 (idle) or (val & 7) == 3 (some training stage) | Reverse-engineered; still not in TRM |
| PHY_CALBUSY | PHY + 0x684 (RE) | 1 | ZQ calibration complete; name matches DWC PUB convention | Reverse-engineered |
| PHY_DFI_READY | PHY + 0xa24 (RE) | 4 | DFI-side handshake bit from PHY, separate from DDRCTL_DFISTAT | Reverse-engineered |
| PHY_SHADOW_BB8 | PHY + 0xb88 (RE) | 2 | Shadow status word that carries training firmware state between sub-blocks | Reverse-engineered |
| PHY_TRAIN_STEP | PHY + 0x118 / 0x120 (RE) | 2 | Step-complete bits [31:28], used in train_phy_block (d328) | Reverse-engineered |
| PHY_HANDSHAKE | PHY + 0x184 (RE) | 2 | HANDSHAKE bits [1:0]: writer/reader sync in d328 | Reverse-engineered |
Base conventions:
- `DDRCTL` = per-channel uMCTL2 controller base (four channels: DDRCTL0..3 per TRM Table "DDR Channel X IO description", pp. 557-558).
- `PHY` = per-channel PHY base pointer held in `ctx[ch*32]`, with the `+0x8000` sub-block for the "Master"-class PHY block seen in d328 and the `+0x10000` sub-block for the larger PHY block seen in d10c.
Impact: Any of these can cause a boot hang. The most likely failure mode:
- Cold boot at extreme temperatures (timing margins shrink)
- DRAM module with slow ZQ calibration
- Power supply droop during training (PHY doesn't respond)
Fix: Add a timeout counter (e.g., 1000 iterations with 1µs delay = 1ms timeout) and return an error code. The calling function already checks for 0xFFFFFFFF error returns (23 instances).
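A minimal sketch of that fix follows. It assumes a memory-mapped 32-bit status register and a platform delay hook, `delay_us_fn`, which is hypothetical (pass NULL to skip the delay in a host-side test). The mask/expected-value pair and the iteration count are caller parameters, not values taken from the blob.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DDR_POLL_TIMEOUT 0xFFFFFFFFu  /* same error code the callers already check */

typedef void (*delay_us_fn)(unsigned us);  /* hypothetical platform delay hook */

/* Polls *reg until (value & mask) == expect.
 * Reads at most max_iters + 1 times, calling delay_us(1) between reads
 * (skipped when delay_us is NULL). Returns 0 on success, or
 * DDR_POLL_TIMEOUT if the condition never held. */
static uint32_t poll_reg_timeout(const volatile uint32_t *reg, uint32_t mask,
                                 uint32_t expect, unsigned max_iters,
                                 delay_us_fn delay_us)
{
    assert(reg != NULL);
    for (unsigned i = 0; i < max_iters; i++) {
        if ((*reg & mask) == expect)
            return 0;
        if (delay_us != NULL)
            delay_us(1);
    }
    /* One last read, so a register that became ready during the final
     * delay is not reported as a timeout. */
    return ((*reg & mask) == expect) ? 0 : DDR_POLL_TIMEOUT;
}
```

For example, `poll_reg_timeout(calbusy, 0x1, 0x0, 1000, udelay_hook)` would give roughly the 1 ms budget above. This assumes the CalBusy flag is bit 0 and that `calbusy` and `udelay_hook` exist; all three are assumptions, because the PUB layout is undocumented.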
WARNING: Read-Modify-Write on MMIO Without Memory Barriers
Several MMIO registers are read, modified, and written back without memory
barriers (dsb, dmb, or isb). On AArch64 this is usually safe if the region is
mapped with a Device memory type (Device-nGnRE or Device-nGnRnE): both types
are non-reordering ("nR"), so accesses to the same peripheral stay in program
order. However, if the MMU mapping is incorrect (Normal memory type), these
operations could be reordered.
Affected registers:
- `SGRF_DDR_ENABLE` (`|= 1`, `&= ~1`)
- `FW_DDR_ACCESS_CTRL` (`|= 0xFFFF`, `&= 0xFFFF0000`)
Since this runs in EL3 with the MMU configuration controlled by BL31, this is likely safe — but it's a latent risk if the memory map changes.
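The following is a minimal sketch of a barrier-bracketed read-modify-write helper in C, assuming GCC/Clang inline assembly. `mmio_clrsetbits32` and `mmio_barrier` are hypothetical names, not functions from the blob. DSB SY is heavier than strictly needed: DMB would also order the accesses, while DSB additionally waits for them to complete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DSB SY on AArch64. On other hosts (e.g. a unit-test build) this is only
 * a compiler barrier, so the sketch still compiles. GCC/Clang syntax. */
static inline void mmio_barrier(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    __asm__ volatile("" ::: "memory");
#endif
}

/* Read-modify-write of a 32-bit MMIO register: clears `clr`, then sets `set`.
 * The barriers keep the RMW ordered against surrounding memory accesses.
 * They address reordering only: a register wrongly mapped as cacheable
 * Normal memory still needs its mapping fixed. */
static inline void mmio_clrsetbits32(volatile uint32_t *reg, uint32_t clr, uint32_t set)
{
    assert(reg != NULL);
    mmio_barrier();
    uint32_t v = *reg;
    *reg = (v & ~clr) | set;
    mmio_barrier();
}
```

With this helper, the two `FW_DDR_ACCESS_CTRL` updates become `mmio_clrsetbits32(fw_ddr_access_ctrl, 0, 0xFFFF)` and `mmio_clrsetbits32(fw_ddr_access_ctrl, 0xFFFF, 0)`. The pointer name is hypothetical.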
WARNING: Firewall Left Open
ddr_open_firewall() (Line 137) sets FW_DDR_ACCESS_CTRL |= 0xFFFF,
granting all bus masters DDR access. The matching ddr_close_firewall()
(Line 206) re-restricts it. However, the close function may not be called
on all error paths — an early return due to training failure could leave
the firewall wide open.
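One conventional fix is a single exit path, so the close runs no matter where training fails. The sketch below models that control flow on a host. `fw_open`/`fw_close` mirror the `|= 0xFFFF` / `&= 0xFFFF0000` updates described above, and the per-step return codes are passed in as an array in place of the real training calls. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DDR_ERR 0xFFFFFFFFu

/* Models of the two firewall helpers, acting on FW_DDR_ACCESS_CTRL. */
static void fw_open(volatile uint32_t *fw)  { *fw |= 0xFFFFu; }
static void fw_close(volatile uint32_t *fw) { *fw &= 0xFFFF0000u; }

/* Runs the training steps with the firewall open. step_ret[] stands in for
 * the real training calls. Every exit path, including an early failure,
 * closes the firewall before returning. */
static uint32_t train_with_firewall(volatile uint32_t *fw,
                                    const uint32_t *step_ret, unsigned nsteps)
{
    uint32_t ret = 0;
    assert(fw != NULL && (step_ret != NULL || nsteps == 0));

    fw_open(fw);
    for (unsigned i = 0; i < nsteps; i++) {
        ret = step_ret[i];      /* real code: ret = train_step(i); */
        if (ret == DDR_ERR)
            goto out;           /* early failure still reaches fw_close */
    }
out:
    fw_close(fw);
    return ret;
}
```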
OPTIMIZATION: Redundant Register Polls
Several functions poll the same register in sequence:
- Lines 1969-1979: Three consecutive polls of `+0x10090` and `+0x10080`
- Lines 2470-2480: Same triple poll pattern
Read against the TRM names (DDRCTL_MRSTAT at +0x10090, DDRCTL_MRCTRL0 at +0x10080), these appear to be:
- Wait for the controller to finish the current Mode Register operation
- Check MR status
- Wait for the controller to accept the new MR command
The first and third polls are redundant if the controller always transitions atomically. This could be a defensive coding pattern, or a workaround for a case where the MR status isn't updated atomically.
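For comparison, the MR handshake suggested by the TRM register names (DDRCTL_MRCTRL0 at +0x10080, DDRCTL_MRSTAT at +0x10090) can be sketched as follows. None of these assumptions comes from the blob: `mr_wr` is taken to be MRCTRL0 bit 31 and to self-clear once the command is accepted, `mr_wr_busy` is taken to be MRSTAT bit 0 (the usual DesignWare uMCTL2 layout, unconfirmed for RK3588), and the caller supplies the command word. Every wait is bounded, which also shows where timeouts would go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MRCTRL0_MR_WR      (1u << 31)  /* assumed: self-clearing trigger */
#define MRSTAT_MR_WR_BUSY  (1u << 0)   /* assumed: MR command in flight */
#define MR_TIMEOUT         0xFFFFFFFFu

struct mr_regs {
    volatile uint32_t *mrctrl0;  /* DDRCTL + 0x10080 */
    volatile uint32_t *mrstat;   /* DDRCTL + 0x10090 */
};

/* Spins while (*reg & bit) is set. Returns 0, or MR_TIMEOUT if the bit is
 * still set after max_iters + 1 reads. */
static uint32_t wait_clear(const volatile uint32_t *reg, uint32_t bit, unsigned max_iters)
{
    for (unsigned n = 0; *reg & bit; n++)
        if (n == max_iters)
            return MR_TIMEOUT;
    return 0;
}

static uint32_t mr_issue(const struct mr_regs *r, uint32_t cmd, unsigned max_iters)
{
    assert(r != NULL && r->mrctrl0 != NULL && r->mrstat != NULL);
    /* Poll 1: no earlier MR command still in flight. */
    if (wait_clear(r->mrstat, MRSTAT_MR_WR_BUSY, max_iters))
        return MR_TIMEOUT;
    /* Program the command fields, then set the trigger in a second write. */
    *r->mrctrl0 = cmd & ~MRCTRL0_MR_WR;
    *r->mrctrl0 = cmd | MRCTRL0_MR_WR;
    /* Poll 2: trigger bit self-clears once the command is accepted. */
    if (wait_clear(r->mrctrl0, MRCTRL0_MR_WR, max_iters))
        return MR_TIMEOUT;
    /* Poll 3: the MR command has finished. */
    return wait_clear(r->mrstat, MRSTAT_MR_WR_BUSY, max_iters);
}
```

In this structure, poll 1 only does useful work if some other code issued an MR command without a trailing completion poll.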
OPTIMIZATION: Magic Number 0xFF00AA
The value 0xFF00AA appears 28 times in BUS_GRF register writes. This is
the Rockchip GRF "write-enable mask" pattern:
- Written out as a 32-bit word, 0xFF00AA is 0x00FF00AA
- Upper 16 bits = write mask (0x00FF = bits 7:0 writable)
- Lower 16 bits = value (0x00AA)
This is a hardware feature of Rockchip GRF registers, not a bug, but the decompiled code obscures the intent. In readable form:
BUS_GRF_REG[7:0] = 0xAA; // set bits 7:0 to 10101010, leave bits 15:8 unchanged
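The mask/value split can be checked with a short C model. `grf_field_write` and `grf_apply_write` are hypothetical helpers, not code from the blob, and the model assumes the target field lies within bits 15:0.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a Rockchip-GRF-style masked write that sets bits [hi:lo] to val:
 * bits [31:16] enable the corresponding bits of [15:0] for writing. */
static uint32_t grf_field_write(unsigned hi, unsigned lo, uint32_t val)
{
    assert(lo <= hi && hi <= 15);
    uint32_t field = ((1u << (hi - lo + 1)) - 1u) << lo;
    return (field << 16) | ((val << lo) & field);
}

/* Models the register's reaction: only bits whose enable bit is set change. */
static uint32_t grf_apply_write(uint32_t old_val, uint32_t written)
{
    uint32_t mask  = written >> 16;
    uint32_t value = written & 0xFFFFu;
    return (old_val & ~mask & 0xFFFFu) | (value & mask);
}
```

`grf_field_write(7, 0, 0xAA)` yields 0x00FF00AA, and applying that word to a register holding 0x1234 leaves 0x12AA. A write aimed at bits 15:8 would instead be 0xFF00AA00.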
OBSERVATION: Error Recovery Strategy
The blob has 23 error returns (0xFFFFFFFF) across 1405 conditional checks —
a 1.6% error handling ratio. Most errors result in immediate abort with no
retry. The main orchestrator function (ddr_pmu_status_check at 0x9A90,
the largest at 43K chars) does attempt retries by calling training
subfunctions in sequence and checking their return values.
The error flow is:
- Training function fails → returns 0xFFFFFFFF
- Orchestrator detects failure → prints error string via UART
- Returns failure to BL2
- BL2 typically resets the SoC and tries again
There is no selective retry (e.g., "write leveling passed but read gate training failed, retry only read gate training"). Each failure restarts the entire training sequence from scratch.
Code Structure Summary
| Component | Functions | Lines | Purpose |
|---|---|---|---|
| Entry/dispatch | 3 | ~100 | Reset vector, version check |
| Security setup | 2 | ~50 | SGRF, firewall open/close |
| Clock/PLL | 3 | ~200 | DPLL config, clock gating, reset |
| Bus config | 1 | ~800 | 27 BUS_GRF registers |
| PHY training | ~30 | ~6000 | DQ/CA/VREF/eye training |
| DDRC init | 5 | ~2000 | Controller configuration |
| Timing calc | 3 | ~1500 | Timing parameter computation |
| Orchestrator | 1 | ~1500 | Main sequence, error handling |
| Mailbox/SRAM | 2 | ~200 | BL31 communication |
| Scramble | 1 | ~100 | DDR encryption |
| Utilities | ~65 | ~500 | Helper functions |
Cross-Reference: Known Bugs vs Decompiled Code
v1.18 "Single-rank LPDDR5 derate crash" — Found in Code
The v1.18 release notes say: "Fixed derate issue with single-rank LPDDR5" and "System might hang in kernel when switching frequency".
In the decompiled code, the DERATEINT/MR4 logic is in the large timing
calculation function FUN_0000de40 (22,819 chars, 162 branches). This function
computes timing parameters including derating adjustments. The single-rank bug
likely affected the branch at the CS0/CS1 asymmetric capacity handling, which
was added in v1.16 but not correctly gated for single-rank configurations.
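The timeout-less polls at offsets +0x10090 and +0x10080 (DDRCTL_MRSTAT and
DDRCTL_MRCTRL0, the Mode Register status and control registers, originally
read as a PHY firmware mailbox and reset) are on the DVFS frequency switch
path, exactly where the v1.18 hang was reported.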
v1.15 "PHY skew > DLL lock" — Found in Code
The training functions contain per-bit deskew calculations with clamping logic. In the DQ training functions (e.g., at line ~2040-2100), nested loops iterate over 0x20 (32) delay taps and 4 byte lanes. The clamping check ensures the selected delay tap doesn't exceed the DLL's lock range — a boundary condition that v1.15 fixed.
The 20 Timeout-less Polls — Explain Cold Boot Failures
Community reports of cold boot failures (Armbian, Radxa forums) are consistent with the 20 timeout-less hardware polls found in this analysis. At low temperatures:
- ZQ calibration takes longer (CalBusy at +0x684) — silicon is slower
- Mode Register commands take longer to complete (DDRCTL_MRCTRL0 at +0x10080, originally misread as "MicroReset")
- DFI interface negotiation takes longer (DfiStatus at +0xA24)
Without timeouts, any of these becoming "slightly too slow" causes a permanent boot hang. The board appears dead until power-cycled (which changes temperature slightly, possibly allowing the next boot to succeed).
The LPDDR5 Bandwidth Paradox
ThomasKaiser documented that LPDDR5 at 5472 MT/s showed worse latency than LPDDR4X at 4224 MT/s on the Rock 5 ITX. This is explained by the LPDDR5 protocol overhead visible in the decompiled code:
- WCK synchronization — LPDDR5 requires WCK-to-CK alignment before every data transfer, adding ~5 ns latency per burst
- Longer CA training — the separate CA bus requires CBT Mode 1/2 training
- More training stages — 15 steps vs ~8 for LPDDR4X
The bus configuration in BUS_GRF (27 registers at 0xFD5F8xxx) is significantly
more complex for LPDDR5, with the 0xFF00AA write-mask pattern used 28 times
to configure interleaving and routing for the 4-channel LPDDR5 topology.
Synopsys DWC PHY — Training Sequence in the Code
The decompiled training flow maps to the standard Synopsys DWC PHY sequence:
Register-to-Training-Stage Mapping
| Offset | Synopsys Name | Training Stage | Uses |
|---|---|---|---|
| +0x684 | CalBusy | ZQ Calibration | 11 |
| +0xA24 | DfiStatus | DFI ready / gate training | 65 |
| +0x600 | VrefDAC0 | VREF training (host-side) | 29 |
| +0x608 | VrefDAC1 | VREF training (DRAM-side) | 24 |
| +0x60C | VrefDAC2 | VREF training | 14 |
| DDRCTL +0x10080 | DDRCTL_MRCTRL0 (TRM; earlier RE guess: MicroReset) | Mode Register command control | 13 |
| DDRCTL +0x10090 | DDRCTL_MRSTAT (TRM; earlier RE guess: MicroContMuxSel) | Mode Register write status (mr_wr_busy) | varies |
| +0x10180 | AcsmPlayback | Address/Command SM | 26 |
| +0x10280 | AcsmPlayback+0x100 | AC training | 21 |
| +0x10510 | UctWriteOnlyShadow | Training write commands | 28 |
| DDRCTL +0x10514 | DDRCTL_DFISTAT (TRM; earlier RE guess: UctWriteProtShadow) | DFI init complete (dfi_init_complete) | 28 |
| +0x12BA0 | Reserved/vendor | Vendor-specific training | 11 |
The 0xAA55AA55 Training Pattern
The distinctive pattern 0xAA55AA55 written to offsets 0x93C-0x970 (lines
3857-3868) is the DQ training data pattern. The alternating bit pattern
is specifically chosen because:
- `10101010...` maximizes transition density and switching noise
- `01010101...` tests the complementary case
- `10101010_01010101` (0xAA55) also drives neighboring bits to opposite levels within each byte, exercising DQ-to-DQ crosstalk
The variations (0xAAAA5555, 0x55AA55AA, 0x00005555) provide different crosstalk scenarios — each pattern stresses a different subset of inter-bit coupling on the PCB.
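To make the pattern comparison concrete, the sketch below counts two simple stress measures for 32-bit words. It assumes that bit i maps to DQ line i and that physically adjacent lines carry adjacent bits. Both are assumptions, because real byte-lane routing and DQ swizzling differ from board to board, and the function names are hypothetical.

```c
#include <stdint.h>

static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Lines that switch between two consecutive beats (simultaneous switching). */
static unsigned dq_toggles(uint32_t prev, uint32_t next)
{
    return popcount32(prev ^ next);
}

/* Adjacent line pairs (bit i, bit i+1) driven to opposite levels in one beat. */
static unsigned dq_opposite_neighbors(uint32_t word)
{
    return popcount32((word ^ (word >> 1)) & 0x7FFFFFFFu);
}
```

For example, 0xAA55AA55 followed by 0x55AA55AA toggles all 32 lines. 0xAA55AA55 puts 28 of its 31 neighbor pairs in opposite states, because the bits on either side of each of the three byte boundaries match, while 0xAAAAAAAA reaches all 31.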
Optimization Opportunities
1. Add Timeouts to Hardware Polls (Critical)
Add a countdown with ~1ms timeout to all 20 identified polls. Return 0xFFFFFFFF on timeout — the infrastructure already exists.
2. Selective Training Retry
Currently, any training failure restarts the full sequence. The Synopsys PUB supports restarting individual training steps via the PIR register. Retrying only the failed step would reduce recovery time from ~100ms to ~10ms.
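The cost difference can be illustrated with a toy model that is not derived from the blob or the PUB. In the model, stage s fails on its first `fails[s]` attempts and passes afterwards, a stage that has passed keeps passing, and each execution of a stage counts as one unit of work. The functions count stage executions under each policy. The SoC reset that a full restart also involves is not modeled.

```c
#include <assert.h>

#define MAX_STAGES 16

/* Full restart: any failure sends training back to stage 0. */
static unsigned runs_full_restart(const unsigned *fails, unsigned n)
{
    unsigned left[MAX_STAGES], runs = 0, s = 0;
    assert(n <= MAX_STAGES);
    for (unsigned i = 0; i < n; i++)
        left[i] = fails[i];
    while (s < n) {
        runs++;
        if (left[s] > 0) {
            left[s]--;
            s = 0;
        } else {
            s++;
        }
    }
    return runs;
}

/* Selective retry: only the failing stage is repeated. */
static unsigned runs_selective(const unsigned *fails, unsigned n)
{
    unsigned runs = 0;
    assert(n <= MAX_STAGES);
    for (unsigned s = 0; s < n; s++)
        runs += fails[s] + 1;
    return runs;
}
```

With four stages and a single failure in the third, a full restart needs 7 stage executions, whereas selective retry needs 5. The gap grows the later in the sequence the failure occurs, and it disappears when the first stage is the one that fails.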
3. Parallel Channel Training
The code appears to train channels sequentially (single-channel DDRC access at 0xFE010000). The Synopsys PUB firmware supports parallel training of independent channels — this could halve training time for 4-channel configs.
4. Remove Redundant Polls
The triple-poll pattern (lines 1969-1979, 2470-2480) appears to be defensive coding around Mode Register status updates (DDRCTL_MRSTAT/DDRCTL_MRCTRL0). If the controller always updates that status atomically, these could be collapsed to single polls.
5. Spread-Spectrum Clocking
The ddrbin_tool supports spread-spectrum mode (center/up/down spread) for EMI reduction. This is not configured in the standard blob — enabling center-spread could reduce DDR EMI by 6-10 dB with negligible performance impact.