The SDIO TX path rounds the DMA transfer length up to the host's
current block size and hands that length to dma_map_sg() via
sg_set_buf(&sg[scatters], tx_buffer->buf, align) in sdio_tx_work().
tx_buffer->buf typically aliases into an skb linear head whose
allocated size matches tx_buffer->len, not the block-aligned
align. The DMA engine (swiotlb / dw_mci IDMAC) therefore reads up
to one block past the end of the skb. On a PineTab2 with KFENCE
enabled this fires as:
BUG: KFENCE: out-of-bounds read in __pi_memcpy_generic
Out-of-bounds read at ... (704B right of kfence-#...):
__pi_memcpy_generic
swiotlb_tbl_map_single
swiotlb_map
dma_direct_map_sg
__dma_map_sg_attrs
dma_map_sg_attrs
dw_mci_pre_dma_transfer
__dw_mci_start_request
...
bes_sdio_memcpy_to_io_helper+0x18c/0x288 [bes2600]
sdio_tx_work+0x2b4/0x4a0 [bes2600]
allocated by ... pskb_expand_head / validate_xmit_skb / tcp_*
In addition to being undefined behavior, the padding bytes (which
come from whatever memory follows the skb) are transmitted to the
peer, leaking kernel memory on the air.
Allocate a driver-owned DMA-page bounce buffer sized to
MAX_SDIO_TRANSFER_LEN and use it as the scatter-gather backing for
sdio_tx_work. Each TX buffer is copied into its bounce slot and the
tail (align - tx_buffer->len bytes) is zeroed. This mirrors the
existing bounce pattern already used by bes2600_sdio_memcpy_toio()
via single_gathered_buffer; a separate allocation is used for the
TX path because single_gathered_buffer is only serialised via
sdio_claim_host and sdio_tx_work accumulates scatter entries before
claiming the bus.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>