It seems that on my machine the fan speed goes above 5k when temp2_input is near 55C.
Looking at emc2301/USAGE.md · master · ls1088firmware / traverse-sensors · GitLab, should the fan speed be 5.5k only at around 70C?
I am running:
muvirt 21.02.2+traverse r0+16642-b340b05020 / LuCI branch git-22.090.55699-bb6ef9f.
The fan speed is adjusted according to the temperature sensor inside the CPU:
$ cat /sys/class/thermal/thermal_zone0/temp
It should be at full speed (5000rpm) when the CPU temperature approaches 70C; at other times it will try to stay around 3000rpm.
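The kernel reports thermal zone temperatures in millidegrees Celsius, so a raw reading of 59100 means 59.1C. Here is a minimal sketch that prints the temperature next to the current fan target. The hwmon2 index is the one used elsewhere in this thread and may differ on other systems, and millideg_to_c is only an illustrative helper name:

```shell
#!/bin/sh
# Convert a raw thermal zone reading (millidegrees Celsius) to degrees.
millideg_to_c() {
    awk -v t="$1" 'BEGIN { printf "%.1f\n", t / 1000 }'
}

ZONE=/sys/class/thermal/thermal_zone0/temp
FAN=/sys/class/hwmon/hwmon2/fan1_target   # hwmon index is an assumption

# Only read the files if they exist on this machine.
if [ -r "$ZONE" ]; then
    echo "CPU temperature: $(millideg_to_c "$(cat "$ZONE")") C"
fi
if [ -r "$FAN" ]; then
    echo "Fan target: $(cat "$FAN") rpm"
fi
```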
The fan settings are defined in the device tree: see target/linux/arm64/patches-5.10/310-arm64-dts-add-device-tree-for-Traverse-Ten64-LS1088A.patch · muvirt_base_2102_2021_09_20 · traversetech / muvirt-lede · GitLab for the ‘trip points’, and look under emc2301@2f for the fan speeds.
Thanks for the correction.
At the moment I have the following values
root@openwrt:~# cat /sys/class/hwmon/hwmon2/fan1_target
root@openwrt:~# cat /sys/class/thermal/thermal_zone0/temp
which I find quite noisy.
The router seems to not do much.
top is showing CPU at 0% and CPU load is
Load average: 0.00 0.01 0.00
The traffic is well below 100 Kbit/s on all of the interfaces.
For comparison, my Raspberry Pi 3 shows 59.1 C (nice coincidence ;)) CPU temperature, but with 40% CPU utilization and load 0.6 (for all 3 values).
I cannot find anything about CPU frequency scaling. Is this, or something similar, possible? Or is my only option to allow higher CPU temperatures via the device tree?
I am using the fan integrated into the enclosure. There is no fan on the radiator on top of the CPU.
The static power consumption of the LS1088 is quite a bit higher than the RPI3 due to its high-speed interfaces (DDR4, PCIe3, 10G etc.), but there are some things we can do to reduce it.
There is a cpufreq driver for this CPU (qoriq-cpufreq) but in my previous experience it doesn’t seem to be effective at power reduction.
(the power consumption remains the same, so maybe it doesn’t change all the related clocks)
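Whether a cpufreq driver is bound to the CPU, and which frequencies it offers, can be checked through the standard Linux cpufreq sysfs interface. Here is a minimal sketch, assuming the driver exposes the usual scaling_driver and scaling_available_frequencies files for cpu0 (this thread does not confirm that for this board). khz_to_mhz is an illustrative helper:

```shell
#!/bin/sh
# cpufreq sysfs values are in kHz; convert to MHz for readability.
khz_to_mhz() {
    echo $(( $1 / 1000 ))
}

CPUFREQ=/sys/devices/system/cpu/cpu0/cpufreq

# Show the bound driver, if any.
if [ -r "$CPUFREQ/scaling_driver" ]; then
    echo "driver: $(cat "$CPUFREQ/scaling_driver")"
fi
# List every frequency step the driver advertises.
if [ -r "$CPUFREQ/scaling_available_frequencies" ]; then
    for f in $(cat "$CPUFREQ/scaling_available_frequencies"); do
        echo "available: $(khz_to_mhz "$f") MHz"
    done
fi
```

If nothing is printed, no cpufreq driver is active for cpu0 on that system.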
BUT if you don’t need the full power of the CPU, you can try reducing its speed to 1200MHz by flashing a different BL2/RCW. We also set up U-Boot to reduce the CPU voltage slightly to 0.9V (from 1.0V). The DDR and internal CPU bus speeds are also reduced.
If you flash these two files from recovery or via OpenWrt on NAND:
bl2_qspi.pbl 1200MHz version
fip.bin (U-Boot) with 1200MHz voltage scaling
mtd erase bl2 && mtd write bl2_qspi.pbl bl2
mtd erase bl3 && mtd write fip.bin bl3
Your CPU will start up at 1200MHz and have its voltage reduced on the next boot:
SoC: LS1088AE Rev1.0 (0x87030010)
CPU0(A53):1200 MHz CPU1(A53):1200 MHz CPU2(A53):1200 MHz
CPU3(A53):1200 MHz CPU4(A53):1200 MHz CPU5(A53):1200 MHz
CPU6(A53):1200 MHz CPU7(A53):1200 MHz
Bus: 500 MHz DDR: 1600 MT/s
VID: CPU Core readback: 14
We have been deploying 1200MHz as part of a customer project for a while now without any significant issues. I would be interested to hear any other experiences with it.
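Because the flashing commands above erase a partition before writing it, a mistyped filename would leave that partition erased. Here is a cautious sketch that first checks each image exists and is non-empty. The image_ok and plan_flash helpers are additions for illustration, not part of the original instructions, and the mtd commands are only printed, not executed:

```shell
#!/bin/sh
# Return success only if the file exists and is not empty.
image_ok() {
    [ -s "$1" ]
}

# Print the flashing command for an image/partition pair if the image looks sane.
plan_flash() {
    if image_ok "$1"; then
        echo "mtd erase $2 && mtd write $1 $2"
    else
        echo "refusing: $1 is missing or empty" >&2
        return 1
    fi
}

# Dry run: prints the commands from the post, or an error if a file is absent.
plan_flash bl2_qspi.pbl bl2 || true
plan_flash fip.bin bl3 || true
```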
I need the full power of the CPU occasionally. My router runs a few services, PostgreSQL among others, and it can be hammered with a query from time to time. Therefore, a CPU frequency scaling solution would probably be best.
I will flash the files for now. Thank you.
@mcbridematt Running at 1.2 GHz makes a big difference. Thanks again.
Sorry for my ignorance, but looking at this table (from https://community.nxp.com/pwmxy87654/attachments/pwmxy87654/imx-processors/161806/1/NXP%20LS1088A-1386069.pdf)
Does it mean that, if CPU frequency scaling were working, the CPU frequency could go as low as 600 MHz?
I am getting the following error messages in the kernel log, and I wonder if that could be related to the 1.2GHz/0.9V changes?
Can I ask you to provide the 1.6GHz/1.0V files for flashing as well? I would like to double check if I am getting the error when running that configuration.
[240068.461268] ath10k_pci 0001:04:00.0: failed to transmit management frame via WMI: -108
[240068.469198] ath10k_pci 0001:04:00.0:  next: 0x411aa8 buf: 0x40fefc sz: 1500 len: 144 count: 9 free: 0
[240068.477841] ath10k_pci 0001:04:00.0: failed to transmit management frame via WMI: -108
[240068.487483] ath10k_pci 0001:04:00.0: ath10k_pci ATH10K_DBG_BUFFER:
[240068.495007] ath10k_pci 0001:04:00.0: failed to transmit management frame via WMI: -108
[240068.498569] ath10k_pci 0001:04:00.0: SWBA overrun on vdev 0, skipped old beacon
[240068.504149] ath10k: : 06A80C10 0BFC4C20 00000009 0000000B 06A80C24 0BFC4C20 00000009 0000000B
[240068.504624] ath10k: : 06A80C39 0BFC4C20 00000009 0000000B 06A80C4D 0BFC4C20 00000009 0000000B
[240068.513329] ath10k_pci 0001:04:00.0: failed to transmit management frame via WMI: -108
[240068.521331] ath10k: : 06A80C62 0BFC4C20 00000009 0000000B 06A80C76 0BFC4C20 00000008 0000000B
[240068.521822] ath10k: : 06A80C8B 0BFC4C20 00000008 0000000B 06A80C9F 0BFC4C20 00000008 0000000B
[240068.530505] ath10k_pci 0001:04:00.0: failed to transmit management frame via WMI: -108
[240068.539643] ath10k: : 06A80CB4 0BFC4C20 00000008 0000000B
[240068.540132] ath10k_pci 0001:04:00.0: ATH10K_END
Flash this bl2_qspi.pbl into the bl2 partition to restore 1.6GHz.
Make sure you power cycle your machine after writing the new bl2 so it returns to 1.0V.
Just to confirm: my understanding is that the fip.bin file stays the same when switching back to 1.6 GHz?
Running at 1.2 GHz but with the official (non-ct) ath10k firmware has been stable for the last couple of days.
Once I get a bit more free time, I can try to compare the behaviour of the ct firmware at 1.6 GHz and 1.2 GHz. If anyone is interested, please let me know.
@mcbridematt If you don’t mind, I would suggest listing the bl2_qspi.pbl files for 1.2 and 1.6 GHz in each new firmware directory, with instructions on how to flash them. For my use cases, it seems I/O is the crucial part of performance, so I am OK to run my router at 1.2 GHz as it is much quieter. But it would be great to be able to switch to 1.6 GHz if required in the future.
That is correct, the U-Boot/BL3 (fip.bin) change is needed to instruct the voltage regulators to change to 0.9V when running at 1.2GHz. When the CPU is started at 1600MHz it does not change any settings.
We will put this change into future firmware versions.
Can certainly be done.
I’ll put the cpufreq on my TODO list. I think some other bits (e.g. voltage regulator control) may be needed to make it effective.
I encountered something similar to these messages while updating our OpenWrt builds today, so I don’t think it’s related to the 1.2GHz CPU speed.
Could you try removing ath10k-ct and the -ct firmware and installing the ‘plain’ (upstream) versions:
opkg remove kmod-ath10k-ct ath10k-firmware-qca988x-ct
opkg install kmod-ath10k ath10k-firmware-qca988x
reboot # required to start with the normal ath10k driver and firmware
This works for the WLE600/900VX card; if you have something else, check the system log to make sure you remove and install the right firmware packages.
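To see which ath10k driver and firmware packages are currently installed before removing anything, the output of opkg list-installed can be filtered. Here is a small sketch. ath10k_packages is a hypothetical helper that reads the package list from standard input:

```shell
#!/bin/sh
# Print the names of installed packages whose name mentions ath10k.
# Expects `opkg list-installed` output ("name - version") on stdin.
ath10k_packages() {
    awk '$1 ~ /ath10k/ { print $1 }'
}

# On the router itself (skipped where opkg is not available):
if command -v opkg >/dev/null 2>&1; then
    opkg list-installed | ath10k_packages
fi
```

Packages with -ct in the name belong to the ath10k-ct driver and firmware. The others are the plain upstream versions.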
@mcbridematt Thanks for the info. I did indeed switch to the plain firmware some time ago (please see Cpu temperature and fan speed - #9 by wrobell). It has been running without any issues for about 10 days now.
I changed a few things when setting the router to run at 1.2 GHz CPU speed, none of them software related: a new radiator on the SSD drive, a new location for the router (and thus the need for it to be quieter ;)), and a new arrangement of the antennas. After that, I started to notice the wifi issue, and then the messages. If the issue existed before those changes, it must have been really rare and then intensified, as I started to get it once an hour.
I am happy with the plain firmware, so I have lost motivation for further investigation. But I suspect something must have triggered the wifi issue. If I can provide some useful info, please let me know.