Holdups like this are generally due to drivers not unloading correctly, the same sort of issues that cause rmmod to stall or panic if you unload a module on a running system.
Do you have your WiFi 6 / ath11k card installed? That driver does have issues unloading cleanly at the moment (though for the issues I’ve seen, you get a lovely kernel panic after the kvm: exiting hardware virtualization message).
I’ve just tried changing the watchdog timeouts to very low values (e.g. 30 or 60 seconds), and it doesn’t cause the watchdog to trigger when a driver crashes in this situation. I’m guessing that’s because the watchdog driver might get unloaded after systemd-shutdown[1]: Rebooting.
On a tangential note, systemd doesn’t enable the watchdog outside shutdown/reboot by default, but you can do so by setting RuntimeWatchdogSec. A bit more info here in the manual.
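For reference, a minimal sketch of what that could look like in /etc/systemd/system.conf. The 30s value is only an illustration, not a recommendation, and 40s is the shutdown value used later in this thread. I believe newer systemd releases call the shutdown option RebootWatchdogSec, so check the manual for your version:

```ini
# /etc/systemd/system.conf (values are illustrative assumptions)
[Manager]
# Arm the hardware watchdog while the system is running normally
RuntimeWatchdogSec=30s
# Watchdog timeout used during shutdown/reboot
ShutdownWatchdogSec=40s
```

The change should take effect after a reboot (or after systemctl daemon-reexec).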
The watchdog works fine if I use the echo c > /proc/sysrq-trigger trick. I think the log messages are saying that systemd is disabling the watchdog as part of the shutdown process?
...
[ 540.068080] watchdog: watchdog0: watchdog did not stop!
...
[ 540.148497] systemd-shutdown[1]: Using hardware watchdog 'sp805-wdt', version 0, device /dev/watchdog
...
I don’t see any indication that the kernel has panicked in the output (these logs are from the serial console).
You do seem to be correct that unloading the ath11k_pci module does cause the machine to lock up, I’m going to blacklist that driver for the moment.
Blacklisting the ath11k modules does seem to make rebooting work fine again. I’m a bit confused about why the kernel panic related to the ath11k module unloading doesn’t seem to end up being printed.
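For anyone else wanting to try the same workaround, blacklisting can be done with a modprobe.d fragment along these lines. This is a sketch: the file name is hypothetical, and it assumes ath11k_pci is the module that binds to the card (the ath11k core module is normally only pulled in as its dependency):

```
# /etc/modprobe.d/blacklist-ath11k.conf (hypothetical file name)
# Stop the PCI front-end from being auto-loaded at boot
blacklist ath11k_pci
```

If the module is also included in your initramfs, that may need regenerating before the blacklist takes effect.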
I added a couple of printk’s into the watchdog driver.
It does show the watchdog driver being disabled after systemd has finished its shutdown processes, so another driver failing to unload after this is not ‘protected’ by the watchdog:
[ 118.196065] wdt_setload called timeout=40
[ 118.200148] watchdog: watchdog0: watchdog did not stop!
[ 118.241398] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 118.248180] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[ 118.265188] systemd-journald[1238]: Received SIGTERM from PID 1 (systemd-shutdow).
[ 118.282272] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[ 118.297370] systemd-shutdown[1]: Using hardware watchdog 'sp805-wdt', version 0, device /dev/watchdog
[ 118.308985] systemd-shutdown[1]: Unmounting file systems.
[ 118.315901] [2404]: Remounting '/' read-only in with options 'discard,errors=remount-ro'.
[ 118.332964] EXT4-fs (nvme0n1p1): re-mounted. Opts: discard,errors=remount-ro. Quota mode: none.
...
[ 118.434260] systemd-shutdown[1]: Rebooting.
[ 118.438463] kvm: exiting hardware virtualization
[ 118.443206] wdt_disable called
(40 seconds is the value for ShutdownWatchdogSec I used)
If you unload ath11k_pci while the system is running, do you get a panic, or does rmmod stall with no kernel output? I’ve seen both happen.