Reboot command regression from 5.10 to 5.15 kernel

After upgrading from kernel 5.10.76 (with Traverse patches) to kernel 5.15.11 (without Traverse patches, since it was said none were needed), the reboot command no longer works. With 5.10.76, I get console traces like the following when rebooting:

 * Remounting remaining filesystems read-only ...
 *   Remounting / read only ...
 [ ok ]
 [ ok ]
[   49.630445] reboot: Restarting system
INFO:    PSCI Power Domain Map:
INFO:      Domain Node : Level 2, parent_node -1, State ON (0x0)
INFO:      Domain Node : Level 1, parent_node 0, State ON (0x0)
INFO:      Domain Node : Level 1, parent_node 0, State ON (0x0)
INFO:      CPU Node : MPID 0x0, parent_node 1, State ON (0x0)
INFO:      CPU Node : MPID 0x1, parent_node 1, State ON (0x0)
INFO:      CPU Node : MPID 0x2, parent_node 1, State ON (0x0)
INFO:      CPU Node : MPID 0x3, parent_node 1, State ON (0x0)
INFO:      CPU Node : MPID 0x100, parent_node 2, State ON (0x0)
INFO:      CPU Node : MPID 0x101, parent_node 2, State ON (0x0)
INFO:      CPU Node : MPID 0x102, parent_node 2, State ON (0x0)
INFO:      CPU Node : MPID 0x103, parent_node 2, State ON (0x0)

but with 5.15.11 I only get

 * Remounting remaining filesystems read-only ...
 *   Remounting / read only ...
 [ ok ]
 [ ok ]

and then nothing. No "Restarting system" message, and no bootloader traces. Also, the system keeps responding to ping packets, but is otherwise unresponsive.

Could it be that some patch is still needed, or is there some other change in 5.15 which might cause this?

I also tried reboot with -k, but it is equally silent:

temari ~ # dmesg -n 8                                                           
temari ~ # /sbin/reboot -dknf                                                   
[   52.287616] kvm: exiting hardware virtualization                             

and then nothing…

This was referenced in the what's new thread.

I am aware of an issue in recent kernel versions that causes reboots to hang. The issue has been narrowed down to a potential race condition when "unplugging" DPAA2 objects from the drivers, and further debugging will be done in the new year. In the meantime, if you experience reboot hang issues, you should switch to legacy network management mode.

Workaround 1) is to switch to legacy management mode.
(This will make all the GbE interfaces appear link-up all the time, which may cause issues if your system does automatic network configuration.)

To move into legacy network mode, enter these commands at the U-Boot prompt:

setenv gbemode legacy
setenv sfpmode legacy
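To make the change persist across power cycles, the environment normally also needs to be written back and the board reset (these are standard U-Boot commands; confirm against your U-Boot build):

```
saveenv   # persist the modified environment variables
reset     # reboot so the new mode takes effect
```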

Workaround 2)

This patch I used for debugging seems to make the issue go away. It’s NOT a fix, the extra logging just adds enough delay to avoid the deadlock/race condition.

Thanks. The patch did not work though.

The legacy mode workaround does make reboot work, but now I notice that the 10G interface is not working (NO-CARRIER from ip link). Is it 100% confirmed that no patches are needed for 10G access with 5.15?

Hmm… thanks for trying. I’m planning to spend more time trying to find the root cause of the issue next week.

If you have 'active' SFPs (as opposed to passive twinax cables), then in legacy mode you will need to signal them to turn on via GPIOs:

# Export lower SFP+ TXDISABLE to userspace:
echo 369 > /sys/class/gpio/export
# Make it an output
echo out > /sys/class/gpio/gpio369/direction
# Set value to 0 (enable TX)
echo 0 > /sys/class/gpio/gpio369/value

The upper SFP (XG1) is GPIO 373.
More info on the SFP page
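The steps above can be wrapped into one small script that handles both cages at once. This is a sketch, not from the original post: the GPIO numbers (lower cage 369, upper cage/XG1 373) come from above, and the function takes the sysfs gpio directory as an argument so it can be exercised against a dummy tree; on the device you would pass /sys/class/gpio.

```shell
#!/bin/sh
# Enable TX on both SFP+ cages by driving their TXDISABLE lines low.
enable_sfp_tx() {
    base="$1"   # sysfs gpio directory, e.g. /sys/class/gpio
    for gpio in 369 373; do
        # Export the line to userspace unless it is already visible
        [ -d "$base/gpio$gpio" ] || echo "$gpio" > "$base/export"
        # Make TXDISABLE an output
        echo out > "$base/gpio$gpio/direction"
        # Drive it low: 0 = transmitter enabled
        echo 0 > "$base/gpio$gpio/value"
    done
}
```

On the unit itself you would then run `enable_sfp_tx /sys/class/gpio` (as root).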

Yeah, I have laser SFPs. But it turns out that the issue was not with legacy mode but with 5.15. Even with managed mode, it didn’t work on 5.15, but as soon as I switched back to 5.10 it started working again…

Apologies, this is what is missing: dpaa2-eth: do not hold rtnl_lock on phylink_create() or _destroy()

As far as I can tell the problem is due to a double lock when the kernel tries to destroy the SFP 'phy'. This patch solves that issue but was rejected because it potentially causes other problems.

It sounds like the dpaa2 ethernet driver needs to be reworked so it doesn’t try to destroy the ‘phy’ instance on removal.

Can you compile a mainline kernel (without these patches) with CONFIG_PROVE_LOCKING=y to ensure it’s the same issue?
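For anyone else reproducing this, one way to enable lockdep in an existing build tree is via the kernel's own config helper (a sketch; it assumes a mainline checkout with a .config already present):

```shell
# From the top of the kernel source tree:
./scripts/config --enable DEBUG_KERNEL --enable PROVE_LOCKING
make olddefconfig   # let Kconfig resolve the dependent lockdep options
```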

This is what happens on mine:

[  168.382355]
[  168.383863] ============================================
[  168.389183] WARNING: possible recursive locking detected
[  168.394503] 5.16.0-provelock-torvalds-08301-gfb3b0673b7d5-dirty #11 Tainted: G        W
[  168.403220] --------------------------------------------
[  168.408539] bash/3735 is trying to acquire lock:
[  168.413163] ffff80000a13af98 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x28/0x3c
[  168.420419]
[  168.420419] but task is already holding lock:
[  168.426260] ffff80000a13af98 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x28/0x3c
[  168.433509]
[  168.433509] other info that might help us debug this:
[  168.440048]  Possible unsafe locking scenario:
[  168.440048]
[  168.445977]        CPU0
[  168.448422]        ----
[  168.450868]   lock(rtnl_mutex);
[  168.454014]   lock(rtnl_mutex);
[  168.457161]
[  168.457161]  *** DEADLOCK ***
[  168.457161]
[  168.463093]  May be due to missing lock nesting notation
[  168.463093]
[  168.469893] 4 locks held by bash/3735:
[  168.473648]  #0: ffff008000d0c438 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0xd0/0x220
[  168.481603]  #1: ffff008004004888 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xfc/0x1c0
[  168.490426]  #2: ffff008005272978 (&dev->mutex){....}-{4:4}, at: device_driver_detach+0x48/0xe0
[  168.499162]  #3: ffff80000a13af98 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x28/0x3c
[  168.506851]
[  168.506851] stack backtrace:
[  168.511216] CPU: 4 PID: 3735 Comm: bash Tainted: G        W         5.16.0-provelock-torvalds-08301-gfb3b0673b7d5-dirty #11
[  168.522375] Hardware name: traverse ten64/ten64, BIOS 2020.07-rc1-g488778dc 11/22/2021
[  168.530310] Call trace:
[  168.532758]  dump_backtrace.part.0+0xdc/0xf0
[  168.537041]  show_stack+0x24/0x80
[  168.540362]  dump_stack_lvl+0x8c/0xb8
[  168.544034]  dump_stack+0x18/0x34
[  168.547356]  __lock_acquire+0xbe4/0x2110
[  168.551288]  lock_acquire.part.0+0x9c/0x1e0
[  168.555480]  lock_acquire+0x68/0x84
[  168.558974]  __mutex_lock+0x8c/0x374
[  168.562558]  mutex_lock_nested+0x44/0x70
[  168.566488]  rtnl_lock+0x28/0x3c
[  168.569722]  sfp_bus_del_upstream+0x28/0xb0
[  168.573916]  phylink_destroy+0x28/0x54
[  168.577672]  dpaa2_mac_disconnect+0x34/0x70
[  168.581867]  dpaa2_eth_remove+0x19c/0x1b0
[  168.585884]  fsl_mc_driver_remove+0x30/0x70

If you are using our default DPAA2 configuration, you should be able to trigger it at runtime by unbinding the relevant DPNIs (dpmac.{2,1} → dpni.{0,1}):

echo 'dpni.1' > /sys/bus/fsl-mc/drivers/fsl_dpaa2_eth/unbind
[  168.382355]
[  168.383863] ============================================
[  168.389183] WARNING: possible recursive locking detected

(Warning: your system may still be unusable after this, but doing it on the SFP interfaces directly proves where the problem is.)

Well, that was weird. With CONFIG_PROVE_LOCKING=y I didn’t get any ethernet interfaces at all (with the side effect that reboot was working just fine :smile:)… The DPNIs were there, just no linux interfaces. And /sys/bus/fsl-mc/drivers/fsl_dpaa2_eth/ did not exist. Full bootlog here: dpaste: G5UKQUBP9

Anyway, the rtnl_lock patch worked fine (it didn’t apply cleanly to 5.15, but the conflict was easy to resolve), so now I can reboot again. Now I just need to find out why the 10G interface is not working in 5.15 but works fine in 5.10 (managed mode in both cases)…

Hm, I wonder if this is related to the problem with the 10G interface:

[    7.014994] sfp dpmac2_sfp: module OEM              10GB-SFP-SR      rev 1.0  sn XP96S7812        dc 210625  
[    7.015022] fsl_dpaa2_eth dpni.4 eno0d8: validation with support 0000000,00000000,00006440 failed: -22

Yup, it turns out this patch is still needed:

Now everything seems to work fine. :partying_face:

Pardon my jumping in here…

I received two units this week, and I’ve spent part of the day bringing them up. I installed Intel 670p SSDs and Crucial 16GB DRAM (from the compatibility list), and nothing else. The only NIC in use on them is eth7. The configs are identical, and both were upgraded to the most recent firmware before I installed Linux.

Both of them are running plain Debian Bookworm, no kernel patches or Traverse modules (yet). One works perfectly, the other boots and operates perfectly but reliably hangs when rebooting. It’s been power cycled after the firmware upgrades as well in case that made any difference.

The DPAA interfaces are in the default mode in the firmware, I've not made any changes in the U-Boot environment.

If there’s anything I can do to help troubleshoot this let me know; I do plan to use the SFP+ slots (they will have 10Gtek AOCs in them) so I’ll hopefully be able to leave the interfaces in managed mode.

Also: I used the bare metal appliance store to install Bookworm initially onto a USB drive (which I then booted and used to install onto the NVMe drive). The unit which fails also fails when rebooting using the Traverse-modified Debian install.

Hi all,

This has actually been fixed very recently in Linux 6.2:

There was an earlier fix floating around (which we carry in all our <6.1 kernel patchsets), much simpler but not accepted upstream due to some hypothetical issues.
The above series from Vladimir contains a deeper rework of how the MAC and PHY are linked.

I've discussed with Vladimir whether this fix can go into the 6.1 stable/LTS series. While it's an important fix, it would be difficult to satisfy the stable patch criteria. I believe it is best to wait until the fix has circulated in different kernel versions first.

You are all welcome to ask your favorite distribution to backport it, though.
For the ones I have been involved with (OpenWrt, VyOS etc.) I will submit them when they switch to 6.1.

I created a 6.1 branch of our kernel patchset last week (and you can use the lts-6-1 tag in our APT repository to get them):


That’s excellent news… I’ll use your patched kernel packages until the 6.2 kernel lands in Debian unstable.

Confirmed: the kernel from the lts-6-1 tag reboots properly on the machine which would not reboot before. There was one complication with the installation, but I'll report that in the repo.