Reboot command regression from 5.10 to 5.15 kernel

After upgrading from kernel 5.10.76 (with Traverse patches) to kernel 5.15.11 (without Traverse patches, since I was told none were needed), the reboot command no longer works. With 5.10.76, I get console traces like the following when rebooting:

 * Remounting remaining filesystems read-only ...
 *   Remounting / read only ...
 [ ok ]
 [ ok ]
[   49.630445] reboot: Restarting system
INFO:    PSCI Power Domain Map:
INFO:      Domain Node : Level 2, parent_node -1, State ON (0x0)
INFO:      Domain Node : Level 1, parent_node 0, State ON (0x0)
INFO:      Domain Node : Level 1, parent_node 0, State ON (0x0)
INFO:      CPU Node : MPID 0x0, parent_node 1, State ON (0x0)
INFO:      CPU Node : MPID 0x1, parent_node 1, State ON (0x0)
INFO:      CPU Node : MPID 0x2, parent_node 1, State ON (0x0)
INFO:      CPU Node : MPID 0x3, parent_node 1, State ON (0x0)
INFO:      CPU Node : MPID 0x100, parent_node 2, State ON (0x0)
INFO:      CPU Node : MPID 0x101, parent_node 2, State ON (0x0)
INFO:      CPU Node : MPID 0x102, parent_node 2, State ON (0x0)
INFO:      CPU Node : MPID 0x103, parent_node 2, State ON (0x0)

but with 5.15.11 I only get

 * Remounting remaining filesystems read-only ...
 *   Remounting / read only ...
 [ ok ]
 [ ok ]

and then nothing. No "reboot: Restarting system" message and no bootloader traces. The system also keeps responding to ping, but is otherwise unresponsive.

Could it be that some patch is still needed, or is there some other change in 5.15 which might cause this?

I also tried reboot with -k, but it was also very quiet:

temari ~ # dmesg -n 8
temari ~ # /sbin/reboot -dknf
[   52.287616] kvm: exiting hardware virtualization

and then nothing…

This was referenced in the "What's new" thread:

I am aware of an issue in recent kernel versions that causes reboots to hang. The issue has been narrowed down to a potential race condition when “unplugging” DPAA2 objects from the drivers, and further debugging will be done in the new year. In the meantime, if you experience reboot hang issues, you should switch to legacy network management mode.

Workaround 1) is to switch to legacy network management mode.
(This will make all the GbE interfaces appear as link up all the time, which may cause issues if your system does automatic network configuration.)

To move into legacy network mode, enter these commands at the U-Boot prompt:

setenv gbemode legacy
setenv sfpmode legacy

Workaround 2)

This patch, which I used for debugging, seems to make the issue go away. It’s NOT a fix; the extra logging just adds enough delay to avoid the deadlock/race condition.

Thanks. The patch did not work though.

The legacy mode workaround does make reset work, but now I notice that the 10G interface is not working (NO-CARRIER from ip link). Is it 100% confirmed that no patches are needed for 10G access with 5.15?
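For a quick carrier check without parsing ip link output, the state can be read straight from sysfs. This is a minimal sketch assuming the standard /sys/class/net/<iface>/carrier attribute, which reads 1 or 0 and returns an error while the interface is administratively down; carrier_state and the NET_ROOT override are hypothetical names, the override only exists so the function can be tried outside a real board.

```shell
#!/bin/sh
# Sketch: report the carrier state of a network interface from sysfs.
# NET_ROOT is a hypothetical override (defaults to the real sysfs path).
NET_ROOT=${NET_ROOT:-/sys/class/net}

carrier_state() {
    ifname=$1
    # Reading "carrier" fails (EINVAL) while the interface is down,
    # and the file is missing entirely for unknown interfaces.
    if ! state=$(cat "$NET_ROOT/$ifname/carrier" 2>/dev/null); then
        echo "unknown"
        return 1
    fi
    case $state in
        1) echo "up" ;;
        0) echo "no-carrier" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Usage: carrier_state <iface>, substituting the name of the 10G interface on your system. An interface that is up but prints no-carrier matches the NO-CARRIER flag from ip link.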

Hmm… thanks for trying. I’m planning to spend more time trying to find the root cause of the issue next week.

If you have ‘active’ SFPs (not just passive twinax cables), then in legacy mode you will need to signal them to turn on via GPIOs:

# Export lower SFP+ TXDISABLE to userspace:
echo 369 > /sys/class/gpio/export
# Make it an output
echo out > /sys/class/gpio/gpio369/direction
# Set value to 0 (enable TX)
echo 0 > /sys/class/gpio/gpio369/value

The upper SFP (XG1) is GPIO 373.
More info on the SFP page
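If this has to be done on every boot, the commands above can be wrapped in a small function and applied to both cages. This is a minimal sketch assuming the legacy sysfs GPIO interface and the GPIO numbers given above (369 for the lower SFP+, 373 for the upper XG1); sfp_tx_enable and the GPIO_ROOT override are hypothetical names. The export step is skipped when the line is already exported, because writing an already-exported number to export fails.

```shell
#!/bin/sh
# Sketch: enable an SFP+ transmitter by driving its TXDISABLE GPIO low.
# GPIO numbers come from the forum post; GPIO_ROOT is a hypothetical
# override for testing (defaults to the real sysfs path).
GPIO_ROOT=${GPIO_ROOT:-/sys/class/gpio}

sfp_tx_enable() {
    gpio=$1
    # Export only if needed; re-exporting an exported line returns EBUSY.
    if [ ! -d "$GPIO_ROOT/gpio$gpio" ]; then
        echo "$gpio" > "$GPIO_ROOT/export" || return 1
    fi
    # Make the line an output, then set it to 0 (TXDISABLE off = TX on).
    echo out > "$GPIO_ROOT/gpio$gpio/direction" || return 1
    echo 0 > "$GPIO_ROOT/gpio$gpio/value"
}
```

Usage (as root): sfp_tx_enable 369 && sfp_tx_enable 373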

Yeah, I have laser SFPs. But it turns out that the issue was not with legacy mode but with 5.15. Even with managed mode, it didn’t work on 5.15, but as soon as I switched back to 5.10 it started working again…

Apologies, this is what is missing: dpaa2-eth: do not hold rtnl_lock on phylink_create() or _destroy()

As far as I can tell, the problem is a double lock when the kernel tries to destroy the SFP ‘phy’. This patch solves that issue but was rejected because it could cause others.

It sounds like the dpaa2 ethernet driver needs to be reworked so it doesn’t try to destroy the ‘phy’ instance on removal.

Can you compile a mainline kernel (without these patches) with CONFIG_PROVE_LOCKING=y to ensure it’s the same issue?
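Before testing, it may be worth confirming that the kernel you actually booted has the option set. A sketch, assuming either /proc/config.gz (only present when the kernel was built with CONFIG_IKCONFIG_PROC) or a plain config file such as the build tree's .config; kconfig_enabled is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if OPTION=y appears in a kernel config file.
# Handles gzip-compressed configs (/proc/config.gz) and plain text ones.
kconfig_enabled() {
    opt=$1
    cfg=${2:-/proc/config.gz}
    case $cfg in
        *.gz) gzip -dc "$cfg" ;;
        *)    cat "$cfg" ;;
    esac 2>/dev/null | grep -qx "${opt}=y"
}
```

Usage: kconfig_enabled CONFIG_PROVE_LOCKING && echo "lockdep is on". In a kernel source tree, the option can be switched on with ./scripts/config --enable PROVE_LOCKING followed by make olddefconfig, which pulls in its dependencies.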

This is what happens on mine:

[  168.382355]
[  168.383863] ============================================
[  168.389183] WARNING: possible recursive locking detected
[  168.394503] 5.16.0-provelock-torvalds-08301-gfb3b0673b7d5-dirty #11 Tainted: G        W
[  168.403220] --------------------------------------------
[  168.408539] bash/3735 is trying to acquire lock:
[  168.413163] ffff80000a13af98 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x28/0x3c
[  168.420419]
[  168.420419] but task is already holding lock:
[  168.426260] ffff80000a13af98 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x28/0x3c
[  168.433509]
[  168.433509] other info that might help us debug this:
[  168.440048]  Possible unsafe locking scenario:
[  168.440048]
[  168.445977]        CPU0
[  168.448422]        ----
[  168.450868]   lock(rtnl_mutex);
[  168.454014]   lock(rtnl_mutex);
[  168.457161]
[  168.457161]  *** DEADLOCK ***
[  168.457161]
[  168.463093]  May be due to missing lock nesting notation
[  168.463093]
[  168.469893] 4 locks held by bash/3735:
[  168.473648]  #0: ffff008000d0c438 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0xd0/0x220
[  168.481603]  #1: ffff008004004888 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xfc/0x1c0
[  168.490426]  #2: ffff008005272978 (&dev->mutex){....}-{4:4}, at: device_driver_detach+0x48/0xe0
[  168.499162]  #3: ffff80000a13af98 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x28/0x3c
[  168.506851]
[  168.506851] stack backtrace:
[  168.511216] CPU: 4 PID: 3735 Comm: bash Tainted: G        W         5.16.0-provelock-torvalds-08301-gfb3b0673b7d5-dirty #11
[  168.522375] Hardware name: traverse ten64/ten64, BIOS 2020.07-rc1-g488778dc 11/22/2021
[  168.530310] Call trace:
[  168.532758]  dump_backtrace.part.0+0xdc/0xf0
[  168.537041]  show_stack+0x24/0x80
[  168.540362]  dump_stack_lvl+0x8c/0xb8
[  168.544034]  dump_stack+0x18/0x34
[  168.547356]  __lock_acquire+0xbe4/0x2110
[  168.551288]  lock_acquire.part.0+0x9c/0x1e0
[  168.555480]  lock_acquire+0x68/0x84
[  168.558974]  __mutex_lock+0x8c/0x374
[  168.562558]  mutex_lock_nested+0x44/0x70
[  168.566488]  rtnl_lock+0x28/0x3c
[  168.569722]  sfp_bus_del_upstream+0x28/0xb0
[  168.573916]  phylink_destroy+0x28/0x54
[  168.577672]  dpaa2_mac_disconnect+0x34/0x70
[  168.581867]  dpaa2_eth_remove+0x19c/0x1b0
[  168.585884]  fsl_mc_driver_remove+0x30/0x70
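To check whether another board hits the same chain, a saved kernel log can be scanned for this report. This awk-based sketch assumes the log format shown above (timestamped lines, frames written as function+0xoffset/0xsize) and prints the contended lock followed by the call-trace frames with offsets stripped; lockdep_recursive_summary is a hypothetical name and the parsing is heuristic, not a general lockdep parser.

```shell
#!/bin/sh
# Sketch: summarize a lockdep "possible recursive locking" report from a
# kernel log file (or stdin if no file is given).
lockdep_recursive_summary() {
    awk '
        /WARNING: possible recursive locking detected/ { found = 1; next }
        found && /is trying to acquire lock:/ { want = 1; next }
        want {
            # The next line names the lock in parentheses, e.g. (rtnl_mutex)
            if (match($0, /\([^)]*\)/))
                print "lock: " substr($0, RSTART + 1, RLENGTH - 2)
            want = 0
            next
        }
        found && /Call trace:/ { intrace = 1; next }
        intrace {
            line = $0
            p = index(line, "]")
            if (p > 0) line = substr(line, p + 1)   # drop the timestamp
            sub(/^ +/, "", line)
            # A line without an offset marks the end of the trace
            if (line !~ /\+0x/) { intrace = 0; found = 0; next }
            sub(/\+0x.*/, "", line)                 # drop +0xoff/0xsize
            print "  " line
        }
    ' "${1:--}"
}
```

Usage: dmesg | lockdep_recursive_summary. On a log like the one above, the output should name rtnl_mutex and list frames including sfp_bus_del_upstream, phylink_destroy and dpaa2_mac_disconnect.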

If you are using our default DPAA2 configuration, you should be able to trigger it at runtime by unbinding the relevant DPNIs (dpmac.{2,1} → dpni.{0,1}):

echo 'dpni.1' > /sys/bus/fsl-mc/drivers/fsl_dpaa2_eth/unbind
[  168.382355]
[  168.383863] ============================================
[  168.389183] WARNING: possible recursive locking detected

(Warning: your system may be unusable after this, but doing it on the SFP interfaces directly proves where the problem is.)
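The unbind step can be scripted so that you first see which DPNIs the driver currently owns. A sketch assuming the fsl-mc sysfs layout used in the command above, where each bound object shows up as a dpni.N entry in the driver directory; list_bound_dpnis, unbind_dpni and the DRV_DIR override are hypothetical names. Per the warning, expect the system to be unusable afterwards.

```shell
#!/bin/sh
# Sketch: list DPNIs bound to the dpaa2-eth driver and unbind one of them.
# DRV_DIR is a hypothetical override for testing.
DRV_DIR=${DRV_DIR:-/sys/bus/fsl-mc/drivers/fsl_dpaa2_eth}

list_bound_dpnis() {
    for entry in "$DRV_DIR"/dpni.*; do
        [ -e "$entry" ] || continue   # glob matched nothing
        basename "$entry"
    done
}

unbind_dpni() {
    dpni=$1
    if [ ! -e "$DRV_DIR/$dpni" ]; then
        echo "$dpni is not bound to $(basename "$DRV_DIR")" >&2
        return 1
    fi
    # Same effect as: echo 'dpni.1' > .../fsl_dpaa2_eth/unbind
    echo "$dpni" > "$DRV_DIR/unbind"
}
```

Usage: list_bound_dpnis, then unbind_dpni dpni.1 while watching the console for the lockdep warning.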

Well, that was weird. With CONFIG_PROVE_LOCKING=y I didn’t get any Ethernet interfaces at all (with the side effect that reboot worked just fine :smile:). The DPNIs were there, just no Linux interfaces, and /sys/bus/fsl-mc/drivers/fsl_dpaa2_eth/ did not exist. Full boot log here: dpaste: G5UKQUBP9

Anyway, the rtnl_lock patch worked fine (it didn’t apply cleanly to 5.15, but the conflict was easy to resolve), so now I can reboot again. Now I just need to find out why the 10G interface is not working in 5.15 but works fine in 5.10 (managed mode in both cases)…

Hm, I wonder if this is related to the problem with the 10G interface:

[    7.014994] sfp dpmac2_sfp: module OEM              10GB-SFP-SR      rev 1.0  sn XP96S7812        dc 210625
[    7.015022] fsl_dpaa2_eth dpni.4 eno0d8: validation with support 0000000,00000000,00006440 failed: -22
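The -22 is -EINVAL, and the support mask can be decoded by hand. The kernel's %*pb format prints a bitmap as comma-separated 32-bit hex words, most significant first, so here the upper words are zero and only the low word 00006440 carries bits. This sketch maps bits 0 to 15 to their names from enum ethtool_link_mode_bit_indices in include/uapi/linux/ethtool.h; decode_linkmodes is a hypothetical helper and higher bits are deliberately not tabulated.

```shell
#!/bin/sh
# Sketch: decode the lowest 32-bit word of a phylink support mask into
# ethtool link-mode names (bits 0-15 only).
decode_linkmodes() {
    # The last comma-separated field holds bits 0-31.
    word=${1##*,}
    mask=$((0x$word))
    names="10baseT_Half 10baseT_Full 100baseT_Half 100baseT_Full
           1000baseT_Half 1000baseT_Full Autoneg TP AUI MII FIBRE BNC
           10000baseT_Full Pause Asym_Pause 2500baseX_Full"
    bit=0
    out=""
    for name in $names; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="$out $name"
        fi
        bit=$((bit + 1))
    done
    echo "${out# }"
}
```

For the mask in the log this yields Autoneg FIBRE Pause Asym_Pause, i.e. no speed mode at all. One plausible reading, not confirmed in this thread, is that the validation step removed every speed mode the module offered, which would fit the later finding that another patch was still needed.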

Yup, it turns out this patch is still needed:

Now everything seems to work fine. :partying_face: