I am trying to troubleshoot interruptions on the 2.4 GHz network, primarily affecting ESP8266 devices used as part of my home automation.
The card in my Ten64 doing 2.4 GHz duties at the moment is
0001:04:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
using the ath10k driver. dmesg says the following:
[ 19.387532] ath10k_pci 0001:04:00.0: firmware ver 10.1-ct-8x-__fW-022-ecad3248 api 2 features wmi-10.x,has-wmi-mgmt-tx,mfp,txstatus-noack,wmi-10.x-CT,ratemask-CT,txrate-CT,get-temp-CT,tx-rc-CT,cust-stats-CT,retry-gt2-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT crc32 3e4cf97f
The purple line on the graph represents dropped ICMP packets. I have tried a number of channels and settings; the ones below are the ‘most stable’ so far, but every time the packets drop, my ESP8266 devices disconnect and reconnect.
config wifi-device 'radio1'
option type 'mac80211'
option macaddr '04:f0 --'
option htmode 'HT20'
option band '2g'
option cell_density '0'
option country 'AU'
option channel '13'
config wifi-iface 'wifinet1'
option device 'radio1'
option mode 'ap'
option ssid '--'
option encryption 'psk2'
option key '--'
option skip_inactivity_poll '1'
option network 'lan'
option disassoc_low_ack '0'
Linux muvirt 5.10.64 #0 SMP Wed Oct 13 05:51:35 2021 aarch64 GNU/Linux
I don’t see anything else in the logs that points to issues with the card.
What other troubleshooting steps should I follow? I am happy to run a different kernel, drivers, firmware, whatever.
Out of interest, could you log the station data for one of the clients and see if there are any patterns (e.g. signal strength dropping, changes in bitrate)?
Like so (obviously this is on the 5 GHz band, but the output will be similar):
# iw dev wlan0 station dump
Station xx:xx:xx:xx:xx:xx (on wlan0)
inactive time: 192 ms
rx bytes: 289453027
rx packets: 361708
tx bytes: 1210119351
tx packets: 881967
tx retries: 3988
tx failed: 1
rx drop misc: 277
signal: -56 [-60, -61, -63, -59] dBm
signal avg: -55 [-60, -62, -60, -58] dBm
tx bitrate: 866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
tx duration: 27860395 us
rx bitrate: 866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
rx duration: 0 us
airtime weight: 256
authorized: yes
authenticated: yes
associated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
DTIM period: 2
beacon interval:100
short preamble: yes
short slot time:yes
...
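If it helps with spotting patterns, something like the following could condense each dump into one line per station. This is only a rough sketch: station_summary is a name I made up, and it assumes the iw output keeps the field labels shown above.

```shell
# Condense `iw dev <if> station dump` output (read from stdin) into one
# CSV line per station: epoch,mac,signal_dBm,tx_bitrate_Mbit,tx_failed
# station_summary is a hypothetical helper name, not part of iw.
# Optional first argument: timestamp to record (defaults to the current epoch).
station_summary() {
    awk -v now="${1:-$(date +%s)}" '
        # Print the station collected so far, if any.
        function emit() {
            if (mac != "") printf "%s,%s,%s,%s,%s\n", now, mac, sig, rate, failed
        }
        $1 == "Station"                { emit(); mac = $2; sig = rate = failed = "" }
        $1 == "signal:"                { sig = $2 }     # skips the "signal avg:" line
        $1 == "tx" && $2 == "bitrate:" { rate = $3 }
        $1 == "tx" && $2 == "failed:"  { failed = $3 }
        END                            { emit() }
    '
}
```

It could be run in a loop, for example `while :; do iw dev wlan1 station dump | station_summary >> station_log.csv; sleep 15; done`, and the resulting file lined up against the times of the dropouts.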
There have been a couple of fixes in OpenWrt recently for WiFi-related issues. They’ve been part of the general OpenWrt builds here for a while, but not muvirt. I’m going to push a new set of builds soon for this.
It’s also possible to switch from the CT drivers and firmware that OpenWrt prefers to the vanilla ones. Those may have different bugs, though.
I’m running watch -n15 "iw dev wlan1 station dump | tee -a station_dump.txt"
and I’ll leave it going for a while, but
tx failed: 9619
rx drop misc: 27640
is a bit of a worry
Station 2c:f4:xx:xx:xx:xx (on wlan1)
inactive time: 460 ms
rx bytes: 25067929
rx packets: 752132
tx bytes: 8232293
tx packets: 83528
tx retries: 0
tx failed: 9619
rx drop misc: 27640
signal: -61 [-67, -65, -62] dBm
signal avg: -59 [-65, -63, -61] dBm
tx bitrate: 48.0 MBit/s
tx duration: 23444778 us
rx bitrate: 6.0 MBit/s
rx duration: 0 us
airtime weight: 256
authorized: yes
authenticated: yes
associated: yes
preamble: short
WMM/WME: no
MFP: no
TDLS peer: no
DTIM period: 2
beacon interval:100
short preamble: yes
connected time: 262848 seconds
associated at [boottime]: 49.477s
associated at: 1643001581860 ms
current time: 1643264429485 ms
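To put those counters in proportion, here is a rough sketch that works out the share of failed transmissions per station from a dump. tx_fail_ratio is my own name for it, and dividing "tx failed" by "tx packets" is an assumption on my part; the driver may not count the two in a way that makes this exact.

```shell
# Read `iw dev <if> station dump` output on stdin and print, per station,
# "tx failed" as a percentage of "tx packets".
# tx_fail_ratio is a hypothetical helper name, not part of iw.
tx_fail_ratio() {
    awk '
        # Print the station collected so far, if it has sent any packets.
        function emit() {
            if (mac != "" && pkts > 0)
                printf "%s %.1f%%\n", mac, 100 * failed / pkts
        }
        $1 == "Station"                { emit(); mac = $2; pkts = failed = 0 }
        $1 == "tx" && $2 == "packets:" { pkts = $3 }
        $1 == "tx" && $2 == "failed:"  { failed = $3 }
        END                            { emit() }
    '
}
```

For the dump above (9619 failed out of 83528 packets), `iw dev wlan1 station dump | tx_fail_ratio` would print 11.5% for this station.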
A new muvirt build is out. It provides the option of switching between all available ath10k packages (drivers and firmware):
Build images: Index of /pub/traverse/software/muvirt/branches/master/460622476/image/
To switch to the plain ath10k driver and firmware:
opkg update
opkg remove kmod-ath10k-ct ath10k-firmware-qca988x-ct ath10k-firmware-qca9984-ct ath10k-firmware-qca99x0-ct
opkg install kmod-ath10k ath10k-firmware-qca988x ath10k-firmware-qca99x0 ath10k-firmware-qca9984
reboot
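After the reboot it may be worth confirming which firmware actually loaded. A small sketch, assuming the -ct builds can be recognised by "-ct-" in the version string or by CT-suffixed feature names, as in the dmesg line earlier in this thread (ath10k_fw_flavour is a made-up helper name):

```shell
# Read kernel log lines on stdin and report the flavour of the last
# ath10k firmware line seen: "ct", "vanilla", or "unknown" if none found.
# Assumption: -ct firmware shows "-ct-" in its version or "-CT" features.
ath10k_fw_flavour() {
    awk '
        /ath10k/ && /firmware ver/ {
            found = 1
            ct = ($0 ~ /firmware ver [^ ]*-ct-/ || $0 ~ /-CT[, ]/)
        }
        END {
            if (!found)  print "unknown"
            else if (ct) print "ct"
            else         print "vanilla"
        }
    '
}
```

Usage would be `dmesg | ath10k_fw_flavour`; `opkg list-installed | grep ath10k` shows which packages are installed.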
There are also “full htt” variants of the -ct firmware, which could help in your situation if the card is getting stuck sending management frames to clients which are unreachable:
This saves limited WMI buffers which can become depleted if lots of
management frames become stuck in TX queues due to peer
that went away.
opkg remove .. # remove existing ath10k-firmware* + kmod-ath10k if not using kmod-ath10k-ct
opkg install kmod-ath10k-ct ath10k-firmware-qca988x-ct-full-htt ath10k-firmware-qca9984-ct-full-htt ath10k-firmware-qca99x0-ct-full-htt
I’ll try the full htt drivers.
Am I okay to sysupgrade if I’ve installed muVirt on the NVMe drive?
Yes, that’s fine. The upgrade process preserves any partitions outside the standard OpenWrt zone as well.
Only catch: on older builds there was a bug where the configuration didn’t carry over if the existing boot+root partition sizes didn’t match the new ones. It’s been fixed now, but do a config backup just in case.
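For the backup, a minimal sketch using the stock OpenWrt `sysupgrade -b` option, assuming it behaves the same on muvirt. backup_config and the SYSUPGRADE override are names I made up so the wrapper can be dry-run:

```shell
# Hypothetical wrapper: save the current configuration before upgrading.
# Optional first argument: destination directory (defaults to /tmp).
# SYSUPGRADE can be overridden for a dry run, e.g. SYSUPGRADE=echo.
backup_config() {
    dest="${1:-/tmp}/backup-$(uname -n)-$(date +%Y%m%d).tar.gz"
    # Print the archive path only if the backup command succeeded.
    "${SYSUPGRADE:-sysupgrade}" -b "$dest" && echo "$dest"
}
```

Since /tmp is RAM-backed on OpenWrt, copy the resulting archive off the device before running the upgrade.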
The dropouts have improved a bit, but something also messed up my graphs so I don’t have data to back that up.