When the LMAC active monitor detects 'link break between lmac and host'
(the hw_buf_used==pending watchdog in bes2600_bh_lmac_active_monitor),
bes2600_chrdev_wifi_force_close(hw_priv, true) is invoked to tear the
device down and prepare for a fresh probe. On the wifi_force_close_work
side this calls bes2600_chrdev_do_system_close() which dispatches
sbus_ops->power_switch(0).
On PineTab2 (RK3566 + BES2600WM over SDIO) this recovery path is a
no-op:
* bes2600_sdio_power_down() writes a SYSTEM_CLOSE host-int message,
clears MMC_CAP_NONREMOVABLE, and schedules sdio_scan_work, which is
the literal one-line stub bes_warn("...this function does
nothing\n").
* bes2600_sdio_on() (the eventual power_switch(1) counterpart)
toggles pdata->powerup, which is NULL on PineTab2 because the
wifi-reset GPIO is owned by sdio_pwrseq, not the bes2600 device
tree node (see arch/arm64/boot/dts/rockchip/rk3566-pinetab2.dtsi:
'The reset pin is claimed by sdio_mmcseq, It is better to move it
to U-Boot so the OS can use it.').
Net result: the chip is never reset. The function drivers are not
removed (the SDIO core has no signal that the card is gone), the
firmware stays wedged, and a subsequent rmmod bes2600 leaves the SDIO
function in a half-torn-down state. modprobe bes2600 then fails with
'probe with driver bes2600_wlan failed with error -123' (-ENOMEDIUM)
on both functions (:1 wifi, :2 BT-companion) until a full system
reboot.
Observed on PineTab2 (linux-pinetab2 6.19.10-danctnix1-1) after ~150
minutes of background-scan rejects (wsm_generic_confirm 0x0007,
[SCAN] Scan failed (-22)) accumulating until the LMAC stopped
acknowledging TX buffers (hw_buf_used:24 pending:24). Reproducible
under sustained scan pressure.
Add a sbus operation bus_reset() that the recovery path can call when
power_switch() has no effective chip-reset signal of its own. Provide
an SDIO implementation that calls mmc_hw_reset(self->func->card),
which on a multi-function SDIO card (PineTab2 binds func 1 for WLAN
and func 2 for the BT-companion path) takes the remove-and-rescan
path: mmc_sdio_hw_reset() marks the card removed and schedules
mmc_rescan, which tears down the bound function drivers and re-detects
the card on the next sweep, in turn reinvoking bes2600_sdio_probe().
With a single function probed it instead invokes mmc_power_cycle()
directly, which on PineTab2 toggles the wifi-reset GPIO via
sdio_pwrseq.
Add bes2600_chrdev_do_bus_reset() as the chrdev-side helper. It
invokes the bus op and then waits on probe_done_wq for the SDIO
remove() callback to clear sbus_priv, mirroring the wait pattern
already used by bes2600_chrdev_do_system_close() so that a subsequent
bes2600_switch_wifi(true) sees a clean state and can wait on the
fresh probe.
Wire it into bes2600_chrdev_wifi_force_close_work(): when halt_dev is
set (the hard-exception path used by both
bes2600_bh_lmac_active_monitor and bes2600_bh_mcu_active_monitor) and
the underlying bus implements bus_reset, take the new recovery path;
otherwise fall back to the legacy power_switch(0) sequence so this
patch is a no-op on USB or any other future bus that does not provide
bus_reset.
mmc_hw_reset() is exported by the MMC core and is the canonical
recovery primitive; calling it without holding the SDIO host claim is
correct because the multi-func remove-and-rescan path acquires the
host claim via the mmc workqueue, and the single-func mmc_power_cycle
path does not require the host claim.
No DT change is required: this works against the existing PineTab2
DTS, where the wifi-reset GPIO and the optional sdio_pwrkey GPIO (on
v2.0 boards) are both already configured as MMC pwrseq resets.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
The SDIO TX path rounds the DMA transfer length up to the host's
current block size and hands that length to dma_map_sg() via
sg_set_buf(&sg[scatters], tx_buffer->buf, align) in sdio_tx_work().
tx_buffer->buf typically aliases into an skb linear head whose
allocated size matches tx_buffer->len, not the block-aligned
align. The DMA engine (swiotlb / dw_mci IDMAC) therefore reads up
to one block past the end of the skb. On a PineTab2 with KFENCE
enabled this fires as:
BUG: KFENCE: out-of-bounds read in __pi_memcpy_generic
Out-of-bounds read at ... (704B right of kfence-#...):
__pi_memcpy_generic
swiotlb_tbl_map_single
swiotlb_map
dma_direct_map_sg
__dma_map_sg_attrs
dma_map_sg_attrs
dw_mci_pre_dma_transfer
__dw_mci_start_request
...
bes_sdio_memcpy_to_io_helper+0x18c/0x288 [bes2600]
sdio_tx_work+0x2b4/0x4a0 [bes2600]
allocated by ... pskb_expand_head / validate_xmit_skb / tcp_*
In addition to being undefined behavior, the padding bytes (which
come from whatever memory follows the skb) are transmitted to the
peer, leaking kernel memory on the air.
Allocate a driver-owned DMA-page bounce buffer sized to
MAX_SDIO_TRANSFER_LEN and use it as the scatter-gather backing for
sdio_tx_work. Each TX buffer is copied into its bounce slot and the
tail (align - tx_buffer->len bytes) is zeroed. This mirrors the
existing bounce pattern already used by bes2600_sdio_memcpy_toio()
via single_gathered_buffer; a separate allocation is used for the
TX path because single_gathered_buffer is only serialised via
sdio_claim_host and sdio_tx_work accumulates scatter entries before
claiming the bus.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Follow-up to \"bes2600: use request_firmware() for factory.txt read\".
That patch switched the factory calibration read path from filp_open()
+ kernel_read() to request_firmware(), but passed dev=NULL to
request_firmware() because factory_section_read_file() did not have a
struct device * in scope. The resulting logs carry the
'(NULL device *):' prefix and do not propagate a udev association.
Add a module-local static struct device * used as the firmware-class
load context, plus a small exported setter:
static struct device *bes2600_factory_dev;
void bes2600_factory_set_dev(struct device *dev);
Wire bes2600_factory_set_dev(&func->dev) from bes2600_sdio_probe(),
right after bes2600_platform_data_init() so the platform layer has
already had a chance to use the same struct device for its own
initialization.
factory_section_read_file() now passes bes2600_factory_dev (instead
of NULL) to request_firmware(). When the factory read happens before
probe (not currently the case on PineTab2) the pointer is still NULL
and request_firmware() accepts that; no regression.
No API changes to bes2600_get_factory_cali_data() callers. The
char *path parameter remains (it is the firmware-class name fed
straight to request_firmware()).
Tested-on: PineTab2 (BES2600WM + RK3566) running linux-pinetab2
6.19.10-danctnix1-1. Driver probes, factory data is read, and any
post-c5 factory diagnostics now carry the SDIO device identity
instead of '(NULL device *)'.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>