# Observed BES2600 driver bugs on PineTab2 (ohm)

Compiled from on-device dmesg + Pine64 wiki + community reports. Cross-references the patch series.

## Bug #1 — factory.txt path mismatch + filp_open antipattern (FIXED in c1)

**File**: `bes2600_factory.c:148-170` (read), `:188-200` (create)

**Symptom (pre-fix)**:
```
(NULL device *): read and check /lib/firmware/bes2600_factory.txt error
```

**Root cause**: the Makefile macro hardcodes `FACTORY_PATH=/lib/firmware/bes2600_factory.txt`,
but the real file ships at `/lib/firmware/bes2600/bes2600_factory.txt`. Worse, the read uses
`filp_open` + `kernel_read` directly, bypassing the firmware-class infrastructure.

**Fix**: c1 patch. It uses `request_firmware()` for the read path and repoints the Makefile macro
to the firmware-class name `bes2600/bes2600_factory.txt`.

## Bug #1.5 — factory.txt parse failure (NEW, c5 to investigate)

**File**: `bes2600_factory.c`, `factory_parse()`

**Symptom (post-c1)**:
```
bes2600_factory.txt parse fail
read and check bes2600/bes2600_factory.txt error
factory cali data get failed.
```

**How discovered**: the c1 fix exposed a deeper bug: `factory_parse()` chokes on the data
that `request_firmware()` now successfully returns. The original bug masked this
because the read always failed first.

**Hypotheses**: null-termination assumption mismatch (`request_firmware()` doesn't
null-terminate the buffer it returns), `FACTORY_MEMBER_NUM=30/31` count discrepancy, kmalloc'd
buffer not zero-initialized, parser strict on the trailing `%%\n` delimiter.

**Status**: investigation pending (task c5). The driver falls back to defaults; WiFi is
functional but TX power is uncalibrated (all channels at 0x1400).

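
The null-termination and trailing-delimiter hypotheses are about raw bytes, so they can be narrowed down from userspace before touching the parser. A minimal Python sketch, assuming that `%%\n` is the trailing delimiter the parser expects (the exact grammar and the expected field count are not confirmed here; `inspect_factory_blob` is a hypothetical helper name, not part of the driver):

```python
# Trailing delimiter named in the hypotheses; the grammar that
# factory_parse() actually expects is an assumption, not confirmed.
TRAILING_DELIM = b"%%\n"


def inspect_factory_blob(data: bytes) -> dict:
    """Report byte-level properties relevant to the parse-failure hypotheses."""
    return {
        "size": len(data),
        # An embedded NUL would truncate any C string handling in the parser.
        "contains_nul": b"\x00" in data,
        # True only if the blob ends exactly with the assumed delimiter.
        "ends_with_delim": data.endswith(TRAILING_DELIM),
        # Windows line endings would break a parser that matches "\n" strictly.
        "has_crlf": b"\r\n" in data,
        "line_count": len(data.splitlines()),
    }
```

On the device, passing it the bytes of `/lib/firmware/bes2600/bes2600_factory.txt` (for example via `pathlib.Path(...).read_bytes()`) shows whether the file on disk already violates one of the byte-level assumptions. It says nothing about the kmalloc or `FACTORY_MEMBER_NUM` hypotheses, which need a read of `factory_parse()` itself.
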
## Bug #2 — PM low-power handshake timeout (recurring)

**File**: `bes_pwr.c:470-558`, `bes2600_pwr_enter_lp_mode()`. Error at line 538.

**Symptom**:
```
bes2600_wlan mmc2:0001:1: bes2600_pwr_enter_lp_mode, wait pm ind timeout
```

Fires every 5–10 s in steady state when associated. It floods dmesg and likely
correlates with Bug #3 (SDIO TX stack splat) and poor battery life.

**Root cause**: `wait_for_completion_timeout(&pm_enter_cmpl, 5*HZ)` waits
for the firmware to acknowledge a PM mode change, but the firmware never sends the ACK.
The driver proceeds to `bes2600_pwr_device_enter_lp_mode()` regardless.

**Mobian == danctnix**: both ship an identical `bes_pwr.c` (1447 lines, 0-hunk diff). No
upstream fix exists, so we would have to invent one (gate device-LP entry on the completion and
add a retry).

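
The proposed shape of the fix (gate on the completion, retry, otherwise stay awake) can be sketched as a control-flow model. This is Python standing in for the kernel C, not driver code: `request_pm_and_wait` is a hypothetical callable representing "send the PM request and wait on `pm_enter_cmpl`", and the retry count of 3 is an arbitrary assumption.

```python
from typing import Callable


def enter_lp_mode(
    request_pm_and_wait: Callable[[], bool],
    enter_device_lp: Callable[[], None],
    max_attempts: int = 3,  # assumed value; the right number is untested
) -> bool:
    """Enter device low-power mode only after the firmware acknowledged.

    request_pm_and_wait() returns True when the ACK arrived before the
    timeout, False when the wait timed out.
    """
    for _ in range(max_attempts):
        if request_pm_and_wait():
            enter_device_lp()
            return True
    # No ACK after all attempts: stay in the active state rather than
    # pushing the device into LP mode without the firmware's agreement.
    return False
```

In the driver, the `True`/`False` result would map onto the return value of `wait_for_completion_timeout()`, which is 0 on timeout and the remaining jiffies otherwise. Whether staying awake on failure is acceptable for battery life is an open question for task c2.
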

**Status**: task c2.

## Bug #3 — SDIO TX scatter-gather panic / WARN

**File**: `bes2600_sdio.c:952-1200`, `bes_sdio_memcpy_to_io_helper` and
`sdio_tx_work`.

**Symptom**:
```
[RX] Receive failure: 4.
bes_sdio_memcpy_to_io_helper+0x18c/0x288 [bes2600]
sdio_tx_work+0x2b4/0x4a0 [bes2600]
Workqueue: bes_sdio sdio_tx_work [bes2600]
```

Recurs under TX load. Can wedge the chip irrecoverably (per the Pine64 wiki:
"Power/reset circuitry not properly implemented; hard reset impossible
without board power-cycle").

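
To map the splat back to source lines, the `symbol+0xoffset/0xsize` frames have to be pulled out of the log first. A small Python sketch of that step, assuming frames follow the usual kernel `symbol+0xOFF/0xSIZE [module]` layout seen in the trace (`parse_frame` is a hypothetical helper name):

```python
import re

# Matches e.g. "sdio_tx_work+0x2b4/0x4a0 [bes2600]"; the module part is optional.
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_frame(line: str):
    """Return symbol, byte offset, function size and module, or None."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),
    }
```

The extracted `symbol+0xoffset/0xsize` string is the form that the kernel tree's `scripts/faddr2line` accepts, provided the module was built with debug info; that has not been tried on this build yet.
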

**Status**: task c3 (indirectly, via the `bes_chardev` removal; `bes_chardev` currently
gates the signal/nosignal mode switch path).

## Backlog — full architect review of bes2600 driver code quality

The Phase 0 perf trace for Bug #5 exposes a "when in doubt, add a lock"
pattern in the BH path (~20% CPU in `_raw_spin_unlock_irqrestore` even
during healthy throughput). Markus has flagged this for a separate
architect-review pass: have Claude Sonnet (or an equivalent reviewer) do a
top-to-bottom code-quality review of the bes2600 sources we have on
boltzmann (`~/src/besser/bes2600-dkms-mobian/bes2600/`), looking for:

- needless lock proliferation
- BH / workqueue dispatch shape
- error-handling coverage
- dead code / leftover-from-cw1200 cruft
- API contract violations relative to mainline mac80211

Output: a ranked list of cleanup targets that would make later patch series
land more cleanly. Not blocking on Bug #5; this is an independent track.

**Status**: backlog. Schedule when Bug #5's measurement pass finishes.

## Bug #5 — RX path degrades under attempted-throughput pressure

**Suspect file**: bes2600 RX path (`txrx.c` `bes2600_rx_cb`, `bh.c` `bes2600_bh_work`,
SDIO RX scheduling); pinpoint pending.

**Symptom (observed 2026-05-07 13:43, srcversion `1B3B3ED0` = c-stack +
Patch A + Patch B, ohm @ -57 dBm 2.4 GHz ch11 5b:32, idle save for the
netcat load):**

```
sender cap 1 MB/s → ohm receives 1015 KB/s, signal -57 dBm, RX MCS 4
sender cap 4 MB/s → ohm receives 563 KB/s, signal -67 dBm, RX MCS 3
(Send-Q on boltzmann backed up to 1.16 MB)
```

Pushing the sender-side cap from 1 MB/s to 4 MB/s **decreased** observed
throughput at the receiver and degraded the link metrics. The reported signal dropped
~10 dB and the chip downshifted from MCS 4 to MCS 3, suggesting the chip can't sustain
the higher RX rate even with the link physically capable of more (link
bitrate 65 Mb/s = ~8 MB/s theoretical).

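
The gap between the link bitrate and the observed receive rates is easier to see as a ratio. A short Python sketch of the arithmetic, assuming "KB" and "MB" in the measurements are decimal units (1 KB = 1000 bytes) and ignoring MAC and TCP overhead, so the percentages are rough lower bounds on how far the link is from saturation:

```python
def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert a bitrate in Mb/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8


def link_utilisation(observed_kb_per_s: float, link_mbit_per_s: float) -> float:
    """Fraction of the raw link bitrate that the observed rate represents."""
    link_kb_per_s = link_mbit_per_s * 1000 / 8
    return observed_kb_per_s / link_kb_per_s


LINK_MBIT = 65  # link bitrate quoted in the notes

print(f"theoretical ceiling: {mbit_to_mbyte_per_s(LINK_MBIT)} MB/s")
for cap, observed in (("1 MB/s", 1015), ("4 MB/s", 563)):
    print(f"sender cap {cap}: {link_utilisation(observed, LINK_MBIT):.1%} of link bitrate")
```

Under these assumptions the 1 MB/s run uses roughly 12% of the raw bitrate and the 4 MB/s run drops to roughly 7%, so neither run is anywhere near the physical ceiling.
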

**Hypothesis (Markus, 2026-05-07): driver/firmware locks itself to death
under busy reads.** Possibly a busy-wait loop or lock contention on the
RX SDIO path prevents draining at line rate. A plausible reason it
didn't surface for the c-stack tasks: those operated at typical
browse-rate traffic, well below the saturation threshold this bug needs
to fire.

**May explain**: the original Phase 0 observation that **YouTube DASH chunks
drop ~10 frames per chunk fetch** on hardware-decoder playback. A chunk
fetch is a brief burst at near-link-rate; if the driver throttles itself
down under high RX load, the player buffer underruns for the duration of
the fetch.

**How to drill (when prioritized)**:
- Capture `trace_pipe` with `mmc:*` and `sdio*` events enabled during a
  controlled rate-ramp (e.g., `pv -L` at 500K, 1M, 2M, 4M, each for 60 s).
- Watch `/proc/sys/kernel/sched_*` and the `bes2600_bh_work` kworker for
  CPU saturation.
- `perf top -p $(pgrep -f bes_sdio)` during 4 MB/s load.
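
The rate-ramp in the first bullet can be scripted so that every step uses the same command shape. A Python sketch that only builds the sender-side command strings, assuming a netcat listener is already running on the receiver and that `/dev/zero` is an acceptable payload; the host name and port are placeholders, and `ramp_commands` is a hypothetical helper:

```python
import shlex

RAMP_RATES = ["500K", "1M", "2M", "4M"]  # rate steps from the drill plan
STEP_SECONDS = 60                        # duration of each step


def ramp_commands(host: str, port: int, rates=RAMP_RATES, seconds=STEP_SECONDS):
    """Build one shell command per ramp step (nothing is executed here)."""
    commands = []
    for rate in rates:
        # pv -q suppresses the progress display, -L caps the transfer rate.
        pipeline = f"pv -q -L {rate} /dev/zero | nc {shlex.quote(host)} {port}"
        # timeout ends the step after the given number of seconds.
        commands.append(f"timeout {seconds} sh -c {shlex.quote(pipeline)}")
    return commands
```

Printing the list and running the steps one at a time on the sender, with the trace capture active on ohm, keeps each 60 s window cleanly separated in the trace.
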

**Status**: backlog. No patch yet.

## Bug #4 — scan_complete_cb constant loop

**File**: `scan.c:883-909`, `bes2600_scan_complete_cb()`.

**Symptom**:
```
ieee80211 phy0: bes2600_scan_complete_cb status: 0
```

Fires every 2–10 s. Status 0 means success, but the frequency suggests that background
scanning runs continuously while associated and idle.

Most likely a NetworkManager scheduling artifact, not a driver bug. Low
priority; suppress the `wiphy_dbg` print or skip scanning while associated if it
matters.

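
The "every 2–10 s" figure can be checked against a captured log instead of eyeballing it. A Python sketch that computes the gaps between consecutive occurrences of a message, assuming dmesg lines carry the default `[seconds.micros]` timestamp prefix (`message_intervals` is a hypothetical helper):

```python
import re

# Default dmesg timestamp prefix, e.g. "[  123.456789] ".
TS_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]")


def message_intervals(lines, needle):
    """Return the gaps in seconds between consecutive lines containing needle."""
    stamps = []
    for line in lines:
        if needle in line:
            match = TS_RE.match(line)
            if match:  # skip lines without a parseable timestamp
                stamps.append(float(match.group("ts")))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

Calling it with the lines of a saved dmesg capture and the needle `bes2600_scan_complete_cb` gives the scan cadence. Comparing that cadence with NetworkManager's own scan requests would support or rule out the scheduling-artifact explanation. The same helper applies to the `wait pm ind timeout` message from Bug #2.
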