No SFPs recognized since upgrade

After some upgrades (both hardware and software; I bumped muvirt to 24.10), my Ten64 stopped recognizing any SFPs plugged into the cages. Nothing shows up in dmesg and there are no lights. I've tried downgrading to older versions of muvirt and OpenWrt without success, and I also tried in recovery. ethtool -m on eth8 and eth9 returns "netlink error: Not supported".

Is there any further testing I should do to confirm or narrow down the issue?

The first thing to check is the sfpmode variable in U-Boot, and whether U-Boot prints something like this during boot:

fsl-mc: ten64: setting SFP to legacy (unmanaged) mode

If you re-flashed the system firmware, it might have switched back to the unmanaged ("legacy") mode.
Legacy mode is also set when installing most older distribution versions (kernel < 6.2) from the appliance store, including fairly recent ones like Debian 12 (bookworm).
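If you have a serial console attached, you can also check a saved copy of the boot output for that message instead of watching it scroll past. The sketch below assumes the U-Boot console output was saved to a text file (for example with your terminal program's logging feature); the function name and file name are only illustrative.

```shell
#!/bin/sh
# Minimal sketch: look for the message fsl-mc prints in U-Boot when it
# puts the SFP cages into legacy (unmanaged) mode.
# Assumption: the U-Boot serial-console output was saved to a text file.

sfp_mode_from_log() {
	# Reads log text on stdin and prints "legacy" if the message is present.
	# A missing message is reported as "not-legacy-or-unknown", because a
	# truncated log cannot prove that the SFPs are in managed mode.
	if grep -qF 'setting SFP to legacy (unmanaged) mode'; then
		echo legacy
	else
		echo not-legacy-or-unknown
	fi
}

# Usage (hypothetical file name):
#   sfp_mode_from_log < uboot-console.log
```

A fixed-string match (grep -F) is used so the parentheses in the message are not treated as regex syntax.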

If that is the case, the fix is easy.
Reboot, interrupt U-Boot to get a U-Boot prompt, and run:

env delete sfpmode
saveenv
reset #reboot to make the change effective

The next most common cause is a missing I2C mux driver (i2c-mux-pca954x). That normally produces warnings like these in dmesg:

[   26.713510] platform dpmac2-sfp: deferred probe pending
[   26.718750] platform dpmac1-sfp: deferred probe pending
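To see at a glance which SFP cages are affected, you can filter dmesg for those warnings. This is a small sketch; the function name is hypothetical, and it only relies on the "platform dpmacN-sfp: deferred probe pending" wording shown above (newer kernels may append a reason after it, which the pattern tolerates).

```shell
#!/bin/sh
# Minimal sketch: list SFP platform devices whose probe is still deferred.
# Reads kernel log text on stdin and prints one device name per line.
sfp_deferred_probes() {
	sed -n 's/.*platform \([^:]*-sfp\): deferred probe pending.*/\1/p' | sort -u
}

# Usage on a running system:
#   dmesg | sfp_deferred_probes
```

An empty result means no SFP device is stuck waiting on a dependency such as the I2C mux.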

If you have downloaded something we built (like our OpenWrt or muvirt), it should have all the correct drivers built in, but they sometimes go missing from images built manually or with tools such as the OpenWrt ImageBuilder.
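On a self-built image, one way to check whether the mux driver made it in is to look for it under /sys/bus/i2c/drivers; the pca954x driver registers there under that name whether it is built in or loaded as a module. The sketch below assumes that; SYSFS_ROOT is a hypothetical override that exists only so the check can be tested against a fake directory tree. On OpenWrt-based images the driver is, as far as I know, packaged as kmod-i2c-mux-pca954x, but treat that package name as an assumption and confirm it against your build.

```shell
#!/bin/sh
# Minimal sketch: succeed if the pca954x I2C mux driver is registered
# with the kernel (built in or loaded as a module).
# SYSFS_ROOT (hypothetical) defaults to /sys and is only there for testing.
pca954x_driver_present() {
	[ -d "${SYSFS_ROOT:-/sys}/bus/i2c/drivers/pca954x" ]
}

# Usage:
#   pca954x_driver_present && echo present || echo "missing: check the image"
```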

These are the two main causes of SFPs not being recognized. If neither of these solves the issue, let me know.

There is no sfpmode in the U-Boot environment. I checked for gbemode too.

Controller: wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x7e retry=0
wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x7e retry=1
wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x7e retry=2
i2c_init_transfer: give up i2c_regs=0x2000000
ten64_get_micro_udevice: Could not get microcontroller device
ERROR: unable to communicate
Retimer: wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x7e retry=0
wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x7e retry=1
wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x7e retry=2
i2c_init_transfer: give up i2c_regs=0x2000000
ten64_get_micro_udevice: Could not get microcontroller device
Retimer power on failed
Fan: wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x2f retry=0
wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x2f retry=1
wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x2f retry=2
i2c_init_transfer: give up i2c_regs=0x2000000
emc230x-i2c emc2301@2f: Failed to read EMC230X Product ID register: -85
ten64_disable_fan_pwm: ERROR: Unable to get fan controller device (err=-85)
USB Hub:    wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x2d retry=0
wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x2d retry=1
wait_for_sr_state: Arbitration lost sr=92 cr=0 state=2020
i2c_init_transfer: failed for chip 0x2d retry=2
i2c_init_transfer: give up i2c_regs=0x2000000

Above are some errors from the U-Boot log. Here is the output from dmesg:

root@openwrt:/# dmesg | grep sfp
root@openwrt:/# dmesg | grep err
[    0.023709] PCI/MSI: /interrupt-controller@6000000/gic-its@6020000 domain created
[    0.069154] kvm [1]: vgic interrupt IRQ9
[    3.141859] sdhci: Copyright(c) Pierre Ossman
[    4.406339] tpm tpm0: tpm_try_transmit: send(): error -11
[    4.411760] tpm tpm0: A TPM error (-11) occurred attempting to determine the timeouts
[    4.419600] tpm_i2c_atmel: probe of 0-0029 failed with error -11
[   10.022879] pca953x: probe of 0-0076 failed with error -11
[   10.285173] usbcore: registered new interface driver sierra
[   10.290785] usbserial: USB Serial support registered for Sierra USB modem
[   12.650553] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.672965] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.695352] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.717733] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.740108] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.762485] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.784862] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.807238] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.829617] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.851993] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.873868] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.895743] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
[   12.917786] fsl_dpaa2_eth dpni.9 eth0: Could not add ucast MAC 00:0a:fa:24:2e:01 to the filtering table (err -119)
root@openwrt:/# dmesg | grep pca
[    9.998903] pca954x 2-0070: registered 2 multiplexed busses for I2C mux pca9540
[   10.007138] pca953x 0-0076: supply vcc not found, using dummy regulator
[   10.013874] pca953x 0-0076: using no AI
[   10.017906] pca953x 0-0076: failed writing register
[   10.022879] pca953x: probe of 0-0076 failed with error -11

This is on muvirt 23.05-SNAPSHOT, r0+23779-5c8244842f.
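The "error -11" probe failures line up with the U-Boot errors: -11 is -EAGAIN, which the kernel's I2C fault-codes documentation describes as what an adapter returns when it loses arbitration, the same condition U-Boot reports as "Arbitration lost". A small sketch (function name hypothetical) that lists which I2C devices failed this way, in the kernel's bus-address notation (0-0076 means bus 0, address 0x76):

```shell
#!/bin/sh
# Minimal sketch: print the I2C bus-address name of every device whose
# probe failed with error -11 (-EAGAIN). Reads kernel log text on stdin.
# The trailing $ avoids matching other codes such as -110.
i2c_probe_eagain() {
	sed -n 's/.*probe of \([0-9][0-9]*-[0-9a-f]\{4\}\) failed with error -11$/\1/p' | sort -u
}

# Usage:
#   dmesg | i2c_probe_eagain
```

For the log above this gives 0-0029 (the TPM) and 0-0076 (the pca953x GPIO expander), both on bus 0, while the pca954x mux on bus 2 registered normally.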

That does not look good.
The system I2C bus is stuck, which prevents communication with many components on the board, such as the board microcontroller, the fan controller and the SFP signal conditioner/retimer.
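The telling part of the U-Boot log is that unrelated devices all fail the same way. Going by the labels in the log, 0x7e is the board microcontroller, 0x2f the EMC2301 fan controller and 0x2d the USB hub. One failing device would point at that device; every device reporting "Arbitration lost" points at the bus itself. A sketch (function name hypothetical) that pulls the failing chip addresses out of a saved U-Boot console log:

```shell
#!/bin/sh
# Minimal sketch: list the I2C chip addresses that U-Boot failed to reach.
# Reads a captured U-Boot console log on stdin; prints each address once.
uboot_i2c_failed_chips() {
	sed -n 's/.*i2c_init_transfer: failed for chip \(0x[0-9a-f]*\) retry=.*/\1/p' | sort -u
}

# Usage (hypothetical file name):
#   uboot_i2c_failed_chips < uboot-console.log
```

From Linux, if i2c-tools is installed, i2cdetect -y 0 gives a similar picture by scanning bus 0.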

Have you made any hardware changes recently, such as swapping add-in cards?

Blowing over the board with an air duster (compressed air spray) might help.
This fault is usually caused by something conductive getting between the pins of one of the components.

I removed a Wi-Fi card that wasn't doing anything (it was ath11k, so support was very patchy).

I already tried compressed air. I will try again more thoroughly, and maybe clean the underside of the board with alcohol.

Will report back when that’s done.