Intermittent 2.4GHz interruptions

I am trying to troubleshoot interruptions on the 2.4GHz network, primarily affecting the ESP8266 devices used as part of my home automation.

The card in my Ten64 doing 2.4GHz duties at the moment is:

0001:04:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter

It uses the ath10k driver. dmesg says the following:

[ 19.387532] ath10k_pci 0001:04:00.0: firmware ver 10.1-ct-8x-__fW-022-ecad3248 api 2 features wmi-10.x,has-wmi-mgmt-tx,mfp,txstatus-noack,wmi-10.x-CT,ratemask-CT,txrate-CT,get-temp-CT,tx-rc-CT,cust-stats-CT,retry-gt2-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT crc32 3e4cf97f

The purple line on the graph represents dropped ICMP packets. I have tried a number of channels and settings. At the moment the settings below are the ‘most stable’, but every time packets drop, my ESP8266 devices disconnect and reconnect.

config wifi-device 'radio1'
        option type 'mac80211'
        option macaddr '04:f0 --'
        option htmode 'HT20'
        option band '2g'
        option cell_density '0'
        option country 'AU'
        option channel '13'

config wifi-iface 'wifinet1'
        option device 'radio1'
        option mode 'ap'
        option ssid '--'
        option encryption 'psk2'
        option key '--'
        option skip_inactivity_poll '1'
        option network 'lan'
        option disassoc_low_ack '0'

Linux muvirt 5.10.64 #0 SMP Wed Oct 13 05:51:35 2021 aarch64 GNU/Linux

I don’t see anything else in the logs that points to issues with the card.

What other troubleshooting steps should I follow? I am happy to run a different kernel, drivers, firmware, whatever.

Out of interest, could you log the station data for one of the clients and see if there are any patterns (e.g. signal strength dropping, changes in bitrate)?

Like so (obviously this is on the 5GHz band, but the output will be similar):

# iw dev wlan0 station dump
Station xx:xx:xx:xx:xx:xx (on wlan0)
        inactive time:  192 ms
        rx bytes:       289453027
        rx packets:     361708
        tx bytes:       1210119351
        tx packets:     881967
        tx retries:     3988
        tx failed:      1
        rx drop misc:   277
        signal:         -56 [-60, -61, -63, -59] dBm
        signal avg:     -55 [-60, -62, -60, -58] dBm
        tx bitrate:     866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
        tx duration:    27860395 us
        rx bitrate:     866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
        rx duration:    0 us
        airtime weight: 256
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        short slot time:yes
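
If you want each sample timestamped so it can be lined up against the ping graph, a loop along these lines might work. It is only a sketch: the function name log_station_dump, its arguments, and the "=== timestamp ===" separator line are made up for illustration and are not part of iw.

```shell
# Append a timestamped `iw ... station dump` to a log file at a fixed interval.
# Hypothetical helper: the name, arguments and separator format are assumptions.
#   $1 = interface (e.g. wlan1)
#   $2 = log file
#   $3 = seconds between samples (default 15)
#   $4 = number of samples, 0 = run until interrupted (default 0)
log_station_dump() {
    iface=$1; logfile=$2; interval=${3:-15}; count=${4:-0}
    n=0
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        # Separator line carrying the wall-clock time of this sample
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$logfile"
        iw dev "$iface" station dump >> "$logfile"
        n=$((n + 1))
        sleep "$interval"
    done
}
```

For example, log_station_dump wlan1 station_dump.txt 15 would sample every 15 seconds until you press Ctrl-C.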

There have been a couple of fixes in OpenWrt recently for WiFi-related issues. They’ve been part of the general OpenWrt builds here for a while, but not muvirt. I’m going to push a new set of builds soon for this.

It’s also possible to switch from the CT drivers and firmware that OpenWrt prefers to the vanilla ones. Those may have different bugs though.

I’m running watch -n15 "iw dev wlan1 station dump | tee -a station_dump.txt" and I’ll leave it going for a while, but

        tx failed:      9619
        rx drop misc:   27640

is a bit of a worry

Station 2c:f4:xx:xx:xx:xx (on wlan1)
        inactive time:  460 ms
        rx bytes:       25067929
        rx packets:     752132
        tx bytes:       8232293
        tx packets:     83528
        tx retries:     0
        tx failed:      9619
        rx drop misc:   27640
        signal:         -61 [-67, -65, -62] dBm
        signal avg:     -59 [-65, -63, -61] dBm
        tx bitrate:     48.0 MBit/s
        tx duration:    23444778 us
        rx bitrate:     6.0 MBit/s
        rx duration:    0 us
        airtime weight: 256
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        no
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        connected time: 262848 seconds
        associated at [boottime]:       49.477s
        associated at:  1643001581860 ms
        current time:   1643264429485 ms
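
To see how quickly those two counters grow between samples, an awk filter along these lines could be run over the collected file. This is a rough sketch: station_deltas is a made-up name, and it assumes the field layout shown in the dumps in this thread.

```shell
# Read one or more concatenated `iw dev <if> station dump` outputs on stdin.
# For every Station block, print the MAC, the signal, and the "tx failed" and
# "rx drop misc" counters, with the growth since that MAC's previous sample.
# Hypothetical helper, not part of iw.
station_deltas() {
    awk '
        function emit() {
            if (mac == "") return
            df = (mac in last_fail) ? fail - last_fail[mac] : 0
            dd = (mac in last_drop) ? drop - last_drop[mac] : 0
            printf "%s signal=%s tx_failed=%d (+%d) rx_drop_misc=%d (+%d)\n", mac, sig, fail, df, drop, dd
            last_fail[mac] = fail
            last_drop[mac] = drop
        }
        # A new Station header closes the previous block
        $1 == "Station"               { emit(); mac = $2; sig = "?"; fail = 0; drop = 0 }
        # "signal avg:" splits into "signal" + "avg:", so it does not match here
        $1 == "signal:"               { sig = $2 }
        $1 == "tx" && $2 == "failed:" { fail = $3 }
        $1 == "rx" && $2 == "drop"    { drop = $4 }
        END                           { emit() }
    '
}
```

Usage would be station_deltas < station_dump.txt. A negative delta would suggest the counters were reset in between, for example because the client reassociated.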

A new muvirt build is out. It provides the option of switching between all available ath10k packages (drivers and firmware):

Build images: Index of /pub/traverse/software/muvirt/branches/master/460622476/image/

To switch to the plain ath10k driver and firmware:

opkg update
opkg remove kmod-ath10k-ct ath10k-firmware-qca988x-ct ath10k-firmware-qca9984-ct ath10k-firmware-qca99x0-ct
opkg install kmod-ath10k ath10k-firmware-qca988x ath10k-firmware-qca99x0 ath10k-firmware-qca9984 

There are also “full htt” variants of the -ct firmware, which could help in your situation if the card is getting stuck sending management frames to clients that are unreachable:

This saves limited WMI buffers which can become depleted if lots of
management frames become stuck in TX queues due to peer
that went away.

opkg remove .. # remove existing ath10k-firmware* + kmod-ath10k if not using kmod-ath10k-ct
opkg install kmod-ath10k-ct ath10k-firmware-qca988x-ct-full-htt ath10k-firmware-qca9984-ct-full-htt ath10k-firmware-qca99x0-ct-full-htt

I’ll try the full htt drivers.

Am I okay to sysupgrade if I’ve installed muVirt on the NVMe drive?

Yes, it’s OK. The upgrade process preserves any partitions outside the standard OpenWrt zone as well.

Only catch: on older builds there was a bug where the configuration didn’t carry over if the existing boot+root partition sizes didn’t match the new ones. It’s been fixed now, but do a config backup just in case.
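
A config backup can be taken from the shell with sysupgrade -b, which is the stock OpenWrt way of writing the config archive. The wrapper below is only a sketch: backup_config and the default file name are made up, and I am assuming muvirt behaves like stock OpenWrt here.

```shell
# Write a configuration backup archive and print its path.
# Hypothetical wrapper around OpenWrt's `sysupgrade -b <file>`.
#   $1 = output file (default: a dated archive under /tmp)
backup_config() {
    out=${1:-/tmp/backup-$(date +%Y%m%d-%H%M%S).tar.gz}
    sysupgrade -b "$out" && printf '%s\n' "$out"
}
```

On OpenWrt /tmp normally lives in RAM, so copy the archive off the box (scp or similar) before upgrading.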

Successfully upgraded from muvirt 21.02-SNAPSHOT r0+16341-7f361f5a1c to muvirt 21.02-SNAPSHOT r0+16550-0956c793c6, and I’m now using the full htt drivers/firmware.

The dropouts have improved a bit, but something also messed up my graphs so I don’t have data to back that up.