Ten64 not rebooting after reboot command

Hi!

The Ten64 doesn’t seem to reboot successfully after running a reboot command. It seems to get stuck at:

[  OK  ] Stopped Initial cloud-init job (pre-networking).
[  OK  ] Stopped Grow File System on /.
[  OK  ] Removed slice Slice /system/systemd-growfs.
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped Load Kernel Modules.
[  OK  ] Unmounted /boot/efi.
[  OK  ] Stopped target Local File Systems (Pre).
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped Create System Users.
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Reached target Shutdown.
[  OK  ] Reached target Final Step.
[  OK  ] Finished Reboot.
[  OK  ] Reached target Reboot.
[  540.068080] watchdog: watchdog0: watchdog did not stop!
[  540.098004] systemd-shutdown[1]: Syncing filesystems and block devices.
[  540.104777] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[  540.120689] systemd-journald[1452]: Received SIGTERM from PID 1 (systemd-shutdow).
[  540.133906] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[  540.148497] systemd-shutdown[1]: Using hardware watchdog 'sp805-wdt', version 0, device /dev/watchdog
[  540.159239] systemd-shutdown[1]: Unmounting file systems.
[  540.166238] [4733]: Remounting '/' read-only in with options 'discard,errors=remount-ro'.
[  540.185748] EXT4-fs (nvme0n1p1): re-mounted. Opts: discard,errors=remount-ro. Quota mode: none.
[  540.205795] systemd-shutdown[1]: All filesystems unmounted.
[  540.211721] systemd-shutdown[1]: Deactivating swaps.
[  540.216764] systemd-shutdown[1]: All swaps deactivated.
[  540.221999] systemd-shutdown[1]: Detaching loop devices.
[  540.229548] systemd-shutdown[1]: All loop devices detached.
[  540.235134] systemd-shutdown[1]: Stopping MD devices.
[  540.240439] systemd-shutdown[1]: All MD devices stopped.
[  540.245761] systemd-shutdown[1]: Detaching DM devices.
[  540.251136] systemd-shutdown[1]: All DM devices detached.
[  540.256563] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[  540.269403] systemd-shutdown[1]: Syncing filesystems and block devices.
[  540.276085] systemd-shutdown[1]: Rebooting.
[  540.280274] kvm: exiting hardware virtualization

The hardware watchdog also doesn’t seem to kick in and force a reboot.

Any idea what is going wrong?

Holdups like this are generally due to drivers not unloading correctly, the same sort of issues that cause rmmod to stall or panic if you unload a module on a running system.

Do you have your WiFi 6 / ath11k card installed? That driver does have issues unloading cleanly at the moment (though in the cases I’ve seen, you get a lovely kernel panic after the "kvm: exiting hardware virtualization" message).

systemd has a default of 10 minutes for the shutdown/reboot watchdog (ShutdownWatchdogSec, called RebootWatchdogSec in newer versions), which is a very long time.

I’ve just tried changing the watchdog timeouts to very low values (e.g. 30 or 60 seconds), and the watchdog still doesn’t trigger when a driver crashes in this situation. I’m guessing this is because the watchdog driver might get unloaded after "systemd-shutdown[1]: Rebooting."

On a tangential note, systemd doesn’t enable the watchdog outside shutdown/reboot by default, but you can do so by setting RuntimeWatchdogSec. There is a bit more info in the manual.
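For anyone who wants to try the runtime watchdog, here is a minimal sketch of a drop-in for systemd's manager configuration. The helper name, the file name 10-watchdog.conf and the 30/60 second values are my own assumptions for illustration, not values recommended in this thread. The drop-in directory is passed in as an argument so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that enables the hardware watchdog at
# runtime and shortens the reboot watchdog.
# The function name, file name and timeout values are illustrative assumptions.
write_watchdog_dropin() {
    # $1: drop-in directory, normally /etc/systemd/system.conf.d
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/10-watchdog.conf" <<'EOF'
[Manager]
# Hardware resets the board if systemd stops pinging the watchdog for 30s
RuntimeWatchdogSec=30
# Watchdog timeout used during the reboot phase
# (named ShutdownWatchdogSec= on older systemd versions)
RebootWatchdogSec=60
EOF
}

# Usage (as root):
#   write_watchdog_dropin /etc/systemd/system.conf.d
```

The settings are read when systemd starts, so a reboot (or `systemctl daemon-reexec`) should be needed before they take effect.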

The watchdog works fine if I use the echo c > /proc/sysrq-trigger trick. I think the log messages are saying that systemd is disabling the watchdog as part of the shutdown process?

...
[  540.068080] watchdog: watchdog0: watchdog did not stop!
...
[  540.148497] systemd-shutdown[1]: Using hardware watchdog 'sp805-wdt', version 0, device /dev/watchdog
...

I don’t see any indication that the kernel has panicked in the output (these logs are from the serial console).

You do seem to be correct that unloading the ath11k_pci module causes the machine to lock up. I’m going to blacklist that driver for the moment.

Blacklisting the ath11k modules does seem to make rebooting work fine again. I’m a bit confused about why the kernel panic related to the ath11k module unloading doesn’t seem to end up being printed.
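In case it helps someone else hitting this, the blacklist can be sketched as a modprobe.d file like the one below. The helper name and the file name blacklist-ath11k.conf are assumptions chosen for illustration; the module names are the ones discussed in this thread. The target directory is an argument so the function can be tested without touching /etc.

```shell
#!/bin/sh
# Sketch: write a modprobe.d file that stops the ath11k modules from being
# loaded automatically at boot.
# The function name and file name are illustrative assumptions.
write_ath11k_blacklist() {
    # $1: modprobe config directory, normally /etc/modprobe.d
    conf_dir="$1"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/blacklist-ath11k.conf" <<'EOF'
# Temporary workaround: the driver does not unload cleanly on reboot
blacklist ath11k_pci
blacklist ath11k
EOF
}

# Usage (as root):
#   write_ath11k_blacklist /etc/modprobe.d
```

Note that `blacklist` only prevents automatic loading; the module can still be loaded by hand with modprobe. If the module is included in the initramfs, that may need regenerating as well.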

It also looks like there are separate watchdog configurations for runtime versus reboot. From systemd-system.conf(5) in the Debian testing manpages: RuntimeWatchdogSec=, RebootWatchdogSec=, KExecWatchdogSec=

Might be worth mentioning in your watchdog page?

I added a couple of printk’s into the watchdog driver.

It does show the watchdog driver being disabled after systemd has finished its shutdown processes, so another driver failing to unload after this is not ‘protected’ by the watchdog:

[  118.196065] wdt_setload called timeout=40
[  118.200148] watchdog: watchdog0: watchdog did not stop!
[  118.241398] systemd-shutdown[1]: Syncing filesystems and block devices.
[  118.248180] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[  118.265188] systemd-journald[1238]: Received SIGTERM from PID 1 (systemd-shutdow).
[  118.282272] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[  118.297370] systemd-shutdown[1]: Using hardware watchdog 'sp805-wdt', version 0, device /dev/watchdog
[  118.308985] systemd-shutdown[1]: Unmounting file systems.
[  118.315901] [2404]: Remounting '/' read-only in with options 'discard,errors=remount-ro'.
[  118.332964] EXT4-fs (nvme0n1p1): re-mounted. Opts: discard,errors=remount-ro. Quota mode: none.
...
[  118.434260] systemd-shutdown[1]: Rebooting.
[  118.438463] kvm: exiting hardware virtualization
[  118.443206] wdt_disable called

(40 seconds is the value I used for ShutdownWatchdogSec.)

If you unload ath11k_pci while the system is running, do you get a panic, or does rmmod stall with no kernel output? I’ve seen both happen.

Good idea.