LEDE Project

  • Status Closed
  • Percent Complete
    100%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity Medium
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
Attached to Project: LEDE Project
Opened by Russell Senior - 08.04.2017
Last edited by Mathias Kresin - 24.06.2017

FS#687 - Meraki MR24: Ethernet interface not detected correctly if cable not plugged at boot time

- Device problem occurs on

Meraki MR24 (apm821xx)

- Software versions of LEDE release, packages, etc.

Tested reboot-3921-g3169a6a7ad, but it’s been a problem since at least last September.

- Steps to reproduce

 

Disconnect the Ethernet cable, apply power, and wait until the device has booted. Then plug in the Ethernet cable and check for interfaces: no eth0 is listed.

This appears to be a problem during probing of the AR8035 PHY chip. When Ethernet has no link, PHY detection fails and eth0 is not created. Plugging in the cable later has no effect, because as far as the kernel is concerned there is no interface. The relevant part of the boot log looks like this:

This is the failing case:

[    0.876611] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[    0.882532] /plb/opb/ethernet@ef600c00: reset timeout
[    0.888546] /plb/opb/ethernet@ef600c00: can't find PHY!

and the succeeding case:

[    0.876672] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[    0.883952] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:73:01:23:41
[    0.890822] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01)
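
The two excerpts above differ only in the lines after the RGMII message, so the outcome of a probe can be read mechanically from the log. The following is a minimal sketch, not part of the original report: the function name `classify_emac_probe` and the regular expressions are assumptions based solely on the two excerpts quoted here, and other kernel versions may word these messages differently.

```python
import re

# Patterns derived from the two log excerpts in this report (an assumption:
# other kernels or drivers may print different messages).
PHY_MISSING = re.compile(r"can't find PHY!")
PHY_FOUND = re.compile(r"eth\d+: found (?P<phy>.+?) \(0x[0-9a-fA-F]+\)")


def classify_emac_probe(log: str) -> str:
    """Return 'phy-found: <name>', 'phy-missing', or 'unknown' for a log excerpt."""
    for line in log.splitlines():
        found = PHY_FOUND.search(line)
        if found:
            return f"phy-found: {found.group('phy')}"
        if PHY_MISSING.search(line):
            return "phy-missing"
    return "unknown"


FAILING = """\
[    0.876611] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[    0.882532] /plb/opb/ethernet@ef600c00: reset timeout
[    0.888546] /plb/opb/ethernet@ef600c00: can't find PHY!
"""

SUCCEEDING = """\
[    0.876672] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[    0.883952] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:73:01:23:41
[    0.890822] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01)
"""

if __name__ == "__main__":
    print(classify_emac_probe(FAILING))
    print(classify_emac_probe(SUCCEEDING))
```

A check like this could be run against saved `dmesg` output from several boots to confirm that the failure correlates with the cable state.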


Closed by  Mathias Kresin
24.06.2017 20:39
Reason for closing:  Fixed
Additional comments about closing:  

Fixed with https://git.lede-project.org/6adc757097ca966796ac213ba7f888d59b651661

Christian Lamparter commented on 08.04.2017 17:02

Hello,

I don't have an MR24 myself, but I know that the MyBook Live (same single RGMII PHY setup, but with a Broadcom BCM54610 PHY) doesn't have a problem detecting the PHY during startup, even if no Ethernet cable is connected to it. I've tested today's image from downloads.lede-project.org: r3925-64175ff

Since I think this is something specific to the MR24, and since you said "but it’s been a problem since at least last September", I think it must have been an issue from the beginning. Is this correct? Or do you remember an older version where this was working as intended?

As for debugging this: is the PHY detection error just a boot-time problem, or does it persist? I think you could test this by unbinding and rebinding the emac driver as root once the MR24 has finished booting and is running:


# echo "4ef600c00.ethernet" > /sys/bus/platform/drivers/emac/unbind

# echo "4ef600c00.ethernet" > /sys/bus/platform/drivers/emac/bind

[  208.654537] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[  208.661902] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:90:aa:31:32:25
[  208.668772] eth0: found Generic MII PHY (0x01) 
(The MBL can use the generic PHY driver)
...

If the PHY is still not detected, then it has to be something else.

(Maybe the bootloader disables the PHY via GPIO? Does Cisco advertise such a feature? In that case, a dump of /sys/kernel/debug/gpio with and without the Ethernet cable attached during boot might help.)

Russell Senior commented on 09.04.2017 01:38

Since I think this is something specific to the MR24, and since you said "but it’s been a problem since at least last September", I think it must have been an issue from the beginning. Is this correct? Or do you remember an older version where this was working as intended?

I am not aware that it ever worked. I think my initial foray into MR24 things dates from the same September period.

I tried the unbind/bind. With Ethernet unplugged at boot time, then plugged in, running the suggested bind command gives this:

echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/bind
[   49.562867] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[   49.568842] /plb/opb/ethernet@ef600c00: reset timeout
[   49.573925] /plb/opb/ethernet@ef600c00: can't find PHY!
ash: write error: No such device

With ethernet plugged at boot time, I see this:

# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x    2 root     root             0 Apr  8 18:33 .
drwxr-xr-x   23 root     root             0 Apr  8 18:33 ..
lrwxrwxrwx    1 root     root             0 Apr  8 18:33 4ef600c00.ethernet -> ../../../../devices/platform/plb/plb:opb/4ef600c00.ethernet
--w-------    1 root     root          4096 Apr  8 18:33 bind
--w-------    1 root     root          4096 Apr  8 18:33 uevent
--w-------    1 root     root          4096 Apr  8 18:33 unbind

With the MR24 booted with no ethernet, the GPIOs dump as follows:

# cat /sys/kernel/debug/gpio
gpiochip2: GPIOs 448-463, parent: pci/0000:44:00.0, ath9k-phy1:
 gpio-458 (                    |ath9k-phy1          ) out lo

gpiochip1: GPIOs 464-479, parent: pci/0000:43:00.0, ath9k-phy0:
 gpio-474 (                    |ath9k-phy0          ) out lo

gpiochip0: GPIOs 480-511, /plb/opb/gpio@ef600b00:
 gpio-496 (                    |Reset button        ) in  hi
 gpio-497 (                    |?                   ) out hi
 gpio-498 (                    |?                   ) out lo
 gpio-499 (                    |?                   ) out hi
 gpio-500 (                    |?                   ) out lo
 gpio-501 (                    |?                   ) out hi
 gpio-502 (                    |?                   ) out hi
 gpio-503 (                    |?                   ) out hi

With ethernet plugged at boot time, I get these GPIOs:

# cat /sys/kernel/debug/gpio 
gpiochip2: GPIOs 448-463, parent: pci/0000:44:00.0, ath9k-phy1:
 gpio-458 (                    |ath9k-phy1          ) out lo    

gpiochip1: GPIOs 464-479, parent: pci/0000:43:00.0, ath9k-phy0:
 gpio-474 (                    |ath9k-phy0          ) out lo    

gpiochip0: GPIOs 480-511, /plb/opb/gpio@ef600b00:
 gpio-496 (                    |Reset button        ) in  hi    
 gpio-497 (                    |?                   ) out lo    
 gpio-498 (                    |?                   ) out lo    
 gpio-499 (                    |?                   ) out hi    
 gpio-500 (                    |?                   ) out lo    
 gpio-501 (                    |?                   ) out hi    
 gpio-502 (                    |?                   ) out hi    
 gpio-503 (                    |?                   ) out hi    

I see gpio-497 has a different state, which is interesting.
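
Comparing the two dumps by eye is error-prone, so the comparison can also be done programmatically. The following is a minimal sketch, not something used in the original investigation: the names `parse_gpio_dump` and `diff_gpio` are hypothetical, the line format is assumed from the two dumps above, and only the gpiochip0 lines are reproduced as sample data.

```python
import re

# Matches lines such as " gpio-497 (                    |?                   ) out hi"
# (format assumed from the dumps pasted above).
GPIO_LINE = re.compile(r"^\s*gpio-(\d+)\s+\(.*?\|(.*?)\)\s+(in|out)\s+(hi|lo)")


def parse_gpio_dump(text):
    """Map GPIO number -> (label, direction, level) for every gpio line."""
    states = {}
    for line in text.splitlines():
        m = GPIO_LINE.match(line)
        if m:
            states[int(m.group(1))] = (m.group(2).strip(), m.group(3), m.group(4))
    return states


def diff_gpio(before, after):
    """Return {gpio: ((dir, level) before, (dir, level) after)} for changed pins."""
    a, b = parse_gpio_dump(before), parse_gpio_dump(after)
    return {
        n: (a[n][1:], b[n][1:])
        for n in sorted(a.keys() & b.keys())
        if a[n][1:] != b[n][1:]
    }


UNPLUGGED = """\
gpiochip0: GPIOs 480-511, /plb/opb/gpio@ef600b00:
 gpio-496 (                    |Reset button        ) in  hi
 gpio-497 (                    |?                   ) out hi
 gpio-498 (                    |?                   ) out lo
 gpio-499 (                    |?                   ) out hi
"""

PLUGGED = """\
gpiochip0: GPIOs 480-511, /plb/opb/gpio@ef600b00:
 gpio-496 (                    |Reset button        ) in  hi
 gpio-497 (                    |?                   ) out lo
 gpio-498 (                    |?                   ) out lo
 gpio-499 (                    |?                   ) out hi
"""

if __name__ == "__main__":
    print(diff_gpio(UNPLUGGED, PLUGGED))
```

On this sample data the only pin reported as changed is gpio-497, matching the observation above.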

Russell Senior commented on 09.04.2017 02:25

gpio-497 is just the LAN LED, according to target/linux/apm821xx/dts/MR24.dts:

                [...]
                lan {
                        label = "mr24:green:wan";
                        gpios = <&GPIO0 17 GPIO_ACTIVE_LOW>;
                };
                [...]
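
The link between the device-tree entry and gpio-497 is simple arithmetic: the debugfs dump lists gpiochip0 as "GPIOs 480-511", and the DTS references pin 17 of GPIO0, so the global number is 480 + 17 = 497. A minimal sketch of that calculation follows; the helper name `global_gpio` is hypothetical, and it assumes that `&GPIO0` in the DTS is the controller shown as gpiochip0 in the dump and that the base of 480 is specific to this boot.

```python
# Base and size taken from "gpiochip0: GPIOs 480-511" in the dump above.
# Assumption: &GPIO0 in MR24.dts corresponds to gpiochip0.
GPIOCHIP0_BASE = 480
GPIOCHIP0_COUNT = 32


def global_gpio(offset, base=GPIOCHIP0_BASE, count=GPIOCHIP0_COUNT):
    """Translate a chip-relative pin offset into the global GPIO number."""
    if not 0 <= offset < count:
        raise ValueError(f"offset {offset} is outside the chip's {count} lines")
    return base + offset


if __name__ == "__main__":
    # gpios = <&GPIO0 17 GPIO_ACTIVE_LOW>;
    print(global_gpio(17))
```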

Russell Senior commented on 09.04.2017 06:19

When booted with ethernet unplugged:

root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x    2 root     root             0 Apr  4 06:13 .
drwxr-xr-x   23 root     root             0 Apr  4 06:13 ..
--w-------    1 root     root          4096 Apr  4 06:13 bind
--w-------    1 root     root          4096 Apr  4 06:13 uevent
--w-------    1 root     root          4096 Apr  4 06:13 unbind

Then trying to bind:

root@LEDE:/# echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/bind
[  533.566010] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[  533.572086] /plb/opb/ethernet@ef600c00: reset timeout
[  533.577159] /plb/opb/ethernet@ef600c00: can't find PHY!
ash: write error: No such device

When booted with ethernet plugged:

root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x    2 root     root             0 Apr  8 23:10 .
drwxr-xr-x   23 root     root             0 Apr  8 23:10 ..
lrwxrwxrwx    1 root     root             0 Apr  8 23:10 4ef600c00.ethernet -> ../../../../devices/platform/plb/plb:opb/4ef600c00.ethernet
--w-------    1 root     root          4096 Apr  8 23:10 bind
--w-------    1 root     root          4096 Apr  8 23:10 uevent
--w-------    1 root     root          4096 Apr  8 23:10 unbind

Then unbind, ls, bind with cable still plugged:

root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x    2 root     root             0 Apr  8 23:10 .
drwxr-xr-x   23 root     root             0 Apr  8 23:10 ..
lrwxrwxrwx    1 root     root             0 Apr  8 23:10 4ef600c00.ethernet -> ../../../../devices/platform/plb/plb:opb/4ef600c00.ethernet
--w-------    1 root     root          4096 Apr  8 23:10 bind
--w-------    1 root     root          4096 Apr  8 23:10 uevent
--w-------    1 root     root          4096 Apr  8 23:10 unbind
root@LEDE:/# echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/unbind
root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x    2 root     root             0 Apr  8 23:10 .
drwxr-xr-x   23 root     root             0 Apr  8 23:10 ..
--w-------    1 root     root          4096 Apr  8 23:10 bind
--w-------    1 root     root          4096 Apr  8 23:10 uevent
--w-------    1 root     root          4096 Apr  8 23:11 unbind
root@LEDE:/# echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/bind
[  124.622773] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[  124.630448] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:73:01:23:41
[  124.637355] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01)
[  124.645524] eth0: link is down
[  124.649110] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
root@LEDE:/# [  128.178678] eth0: link is up, 1000 FDX, pause enabled
[  128.183831] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x    2 root     root             0 Apr  8 23:10 .
drwxr-xr-x   23 root     root             0 Apr  8 23:10 ..
lrwxrwxrwx    1 root     root             0 Apr  8 23:11 4ef600c00.ethernet -> ../../../../devices/platform/plb/plb:opb/4ef600c00.ethernet
--w-------    1 root     root          4096 Apr  8 23:11 bind
--w-------    1 root     root          4096 Apr  8 23:10 uevent
--w-------    1 root     root          4096 Apr  8 23:11 unbind

Now, remove the ethernet and repeat:

root@LEDE:/# [  323.765251] eth0: link is down
root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x    2 root     root             0 Apr  8 23:10 .
drwxr-xr-x   23 root     root             0 Apr  8 23:10 ..
lrwxrwxrwx    1 root     root             0 Apr  8 23:11 4ef600c00.ethernet -> ../../../../devices/platform/plb/plb:opb/4ef600c00.ethernet
--w-------    1 root     root          4096 Apr  8 23:11 bind
--w-------    1 root     root          4096 Apr  8 23:10 uevent
--w-------    1 root     root          4096 Apr  8 23:11 unbind
root@LEDE:/# echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/unbind
[  339.974763] /plb/opb/ethernet@ef600c00: RX disable timeout
root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x    2 root     root             0 Apr  8 23:10 .
drwxr-xr-x   23 root     root             0 Apr  8 23:10 ..
--w-------    1 root     root          4096 Apr  8 23:11 bind
--w-------    1 root     root          4096 Apr  8 23:10 uevent
--w-------    1 root     root          4096 Apr  8 23:15 unbind
root@LEDE:/# echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/bind
[  352.468859] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[  352.474941] /plb/opb/ethernet@ef600c00: reset timeout
[  352.480974] /plb/opb/ethernet@ef600c00: can't find PHY!
ash: write error: No such device
root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x    2 root     root             0 Apr  8 23:10 .
drwxr-xr-x   23 root     root             0 Apr  8 23:10 ..
--w-------    1 root     root          4096 Apr  8 23:15 bind
--w-------    1 root     root          4096 Apr  8 23:10 uevent
--w-------    1 root     root          4096 Apr  8 23:15 unbind

Christian Lamparter commented on 10.04.2017 16:06

Chris Blake was also able to reproduce this issue with his unit. The problem is that we are currently stuck on what is going on with the AR8035.

It's likely that u-boot is powering down the (inactive) PHY before handing control over to the LEDE installation. This is because the MX60 (same generation, but it is the router) does disable the ports as some sort of "security measure".

I think that in order to debug this, it will be necessary to look at what is happening to the PHY chip. Can you probe the individual pins of the chip with a digital oscilloscope? The most important pins would be the reset pin (pin 1) and the XI clocks (pin 5 and pin 4). Of course, it would be better if you could also probe MDC/MDIO (pin 40 and pin 39) and see whether u-boot is disabling the PHY, and how.

You can find a datasheet with the pinout via Google
(e.g. <https://www.redeszone.net/app/uploads/2014/04/AR8035.pdf>).

The pinout is on page 4. The power-on sequence is explained on page 26.

Note: We looked into meraki-linux's source as well, but there are no modifications to the emac driver apart from adding the PHY to the list of known PHYs. I don't have an MR24 myself, but given this information, I wonder what Meraki's stock firmware is doing in this case. Can you reflash the original firmware and provide a boot log? Just in case there's some note or hint in it.

Christian Lamparter commented on 09.06.2017 17:37

A patch has been sent to the LEDE-ML:
<https://patchwork.ozlabs.org/patch/772731/>

The fix has also been accepted upstream by David Miller:
<https://patchwork.ozlabs.org/patch/772436/>

Christian Lamparter commented on 22.06.2017 12:23

@dedeckeh
Can you please look again? The patch is still waiting (marked as new) in LEDE's patchwork: <https://patchwork.ozlabs.org/patch/772731/>
I can't find it in either the main source.git or your staging tree.

Project Manager
Hans Dedecker commented on 22.06.2017 13:04

Closed by accident; reopened.
Sorry for the noise.
