LEDE Project

  • Status Closed
  • Percent Complete 100%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity Critical
  • Priority Medium
  • Reported Version All
  • Due in Version Undecided
  • Due Date Undecided
  • Votes 2
  • Private
Attached to Project: LEDE Project
Opened by Mauro Mozzarelli - 10.12.2016
Last edited by Mathias Kresin - 10.08.2017

FS#330 - Bt Home Hub 5 (xrx200): kernel panic at boot when LAN port is plugged

When a LAN port of the Bt Home Hub 5 is connected to a gigabit network at boot, the Bt Home Hub 5 goes into a kernel panic boot loop. The kernel panic is reported either as an unaligned access or as an unhandled paging request.
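The panic types named above correspond to the ExcCode field that each oops in this report prints on its "Cause" line. As a minimal sketch (Python is used here only as a calculator, the helper names are made up for illustration, and the table is a subset of the MIPS32 exception codes), the field can be pulled out of the three Cause values that appear in the logs of this report:

```python
# Subset of the MIPS32 ExcCode values; only codes relevant to this report
# plus a few neighbours are listed.
EXC_CODES = {
    0x00: "Int (interrupt)",
    0x01: "Mod (TLB modification)",
    0x02: "TLBL (TLB miss on load or instruction fetch)",
    0x03: "TLBS (TLB miss on store)",
    0x04: "AdEL (address error on load or instruction fetch)",
    0x05: "AdES (address error on store)",
    0x0A: "RI (reserved instruction)",
}


def exc_code(cause: int) -> int:
    """Return the ExcCode field, bits 6..2 of the Cause register."""
    return (cause >> 2) & 0x1F


def describe(cause: int) -> str:
    """Format a Cause value the way the oops prints it, plus a name."""
    code = exc_code(cause)
    name = EXC_CODES.get(code, "not in this table")
    return f"Cause {cause:08x}: ExcCode {code:02x} = {name}"


if __name__ == "__main__":
    # Cause values copied from the three kinds of oops in this report.
    for cause in (0x10800028, 0x00800010, 0x10800008):
        print(describe(cause))
```

The extracted codes agree with the "(ExcCode 0a)", "(ExcCode 04)" and "(ExcCode 02)" annotations the kernel prints next to each value.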

//Original report://

Bt Home Hub 5 is connected to the internet via ADSL and acquires a subnet.
The 4 ports Switch is on the fixed public IP Internet subnet, 1 port connected to WNDR3700v4 WAN port and the others to servers on fixed public IP subnet.
The red Ethernet is not connected.
Masquerading is not configured, used as proper router.
WiFi Disabled

WNDR3700v4 WAN port connected to BT Home Hub 5 Switch and assigned 1 IP in the public subnet
WNDR3700v4 4 port switch connected to local switch and other devices on local private lan.
Configured for Masquerading from LAN and WiFi to WAN.

Bt Home Hub 5 panics at boot only if:
1) the 4 port switch is connected to the WNDR3700v4 WAN port
2) WNDR3700v4 runs LEDE > r2393. I have no releases between r2393 and r2443, so it begins to happen once I upgrade WNDR3700v4 to r2443.

It is really peculiar because a firmware upgrade on one router, WNDR3700v4, causes a panic on a Home Hub 5 type!

If I downgrade WNDR3700v4 to r2393, I can reboot the Home Hub as many times as I want and there is no panic.

Attached is the boot console log from Home Hub 5

[    3.877651] init: - preinit -
[    4.072951] Reserved instruction in kernel code[#1]:
[    4.076461] CPU: 0 PID: 391 Comm: ip Not tainted 4.4.36 #0
[    4.081940] task: 87d20f00 ti: 87f9c000 task.ti: 87f9c000
[    4.087330] $ 0   : 00000000 00000000 97ff97ff 00000000
[    4.092552] $ 4   : afffffdf 87c0f6c0 00000020 00000000
[    4.097774] $ 8   : 00000040 87e17962 00000000 00000000
[    4.102996] $12   : 10032094 7737d2b0 00000000 00400bd4
[    4.108219] $16   : 87c0f6c0 afffffdf 87c0f6c0 afffffdf
[    4.113441] $20   : 00000000 c0000000 00000020 87d14800
[    4.118664] $24   : 00000000 800285d0                  
[    4.123886] $28   : 87f9c000 87f9db68 8066c878 802e76bc
[    4.129109] Hi    : 000001cb
[    4.131981] Lo    : 00001e75
[    4.134861] epc   : 97ff97ff 0x97ff97ff
[    4.138708] ra    : 802e76bc eth_type_trans+0x30/0x210
[    4.143819] Status: 1100fc03	KERNEL EXL IE 
[    4.147999] Cause : 10800028 (ExcCode 0a)
[    4.152002] PrId  : 00019556 (MIPS 34Kc)
[    4.155916] Modules linked in: usb_storage dwc2 ledtrig_transient ehci_platform ehci_hcd sd_mod scsi_mod gpio_button_hotplug xfs libcrc32c ext4 jbd2 mbcache exportfs cryptomgr aead crypto_null crc32c_generic
[    4.174377] Process ip (pid: 391, threadinfo=87f9c000, task=87d20f00, tls=7737ee50)
[    4.182028] Stack : 87f9c000 87f9db90 8066c878 802c8904 8066cca8 71000040 87c0f6c0 8066cca8
	  00000040 80277ad8 87f380a7 00000b0e 00000000 00000000 8000003f 8049d0ac
	  8066cca8 00000020 00000008 0000012c 804e8960 87f9dbf0 fffedee6 804e0000
	  804f0000 802c8904 00001000 00000003 00000001 8048944c 804f0000 804b350c
	  804e0000 804b3898 87f9dbf0 87f9dbf0 87f9dbf8 87f9dbf8 80643f24 00000004
	  ...
[    4.217542] Call Trace:
[    4.220013] [<802c8904>] net_rx_action+0x118/0x2e0
[    4.224781] [<80277ad8>] xrx200_poll_rx+0x134/0x1d8
[    4.229653] [<802c8904>] net_rx_action+0x118/0x2e0
[    4.234456] [<80030c6c>] __do_softirq+0x2a0/0x2b8
[    4.239135] [<8005cd40>] handle_level_irq+0x100/0x170
[    4.244185] [<802786e8>] xrx200_open+0x128/0x1e8
[    4.248808] [<8000d15c>] ltq_hw_irqdispatch+0xa8/0xe4
[    4.253854] [<80144cb4>] do_readpage+0x438/0x6a0
[    4.258458] [<80002430>] ret_from_irq+0x0/0x4
[    4.262815] [<802786e8>] xrx200_open+0x128/0x1e8
[    4.267427] [<802786e8>] xrx200_open+0x128/0x1e8
[    4.272039] [<80278704>] xrx200_open+0x144/0x1e8
[    4.276649] [<80278614>] xrx200_open+0x54/0x1e8
[    4.281189] [<802cbd20>] __dev_open+0xf0/0x1ac
[    4.285633] [<80087b08>] filemap_map_pages+0x250/0x2f4
[    4.290755] [<802cc08c>] __dev_change_flags+0xc0/0x17c
[    4.295901] [<800aea7c>] do_set_pte+0x180/0x1d4
[    4.300416] [<802cc170>] dev_change_flags+0x28/0x70
[    4.305286] [<802e26b8>] dev_load+0x18/0x8c
[    4.309475] [<8033bf28>] devinet_ioctl+0x2b4/0x8a8
[    4.314273] [<802acd1c>] sock_ioctl+0x294/0x2f0
[    4.318788] [<800d980c>] do_vfs_ioctl+0x584/0x5ec
[    4.323480] [<800dc664>] d_instantiate+0x24/0x38
[    4.328093] [<802ab698>] sock_alloc_file+0xb8/0x118
[    4.332970] [<800e3a7c>] __alloc_fd+0xb0/0x208
[    4.337411] [<800d98c4>] SyS_ioctl+0x50/0x94
[    4.341676] [<8000455c>] syscall_common+0x30/0x54
[    4.346373] [<80006960>] __bzero+0xc4/0x164
[    4.350543] 
[    4.352017] 
Code: ffff  ffff  ffff <ffff> ffff  ffff  ffff  ffff  ffff 
[    4.358872] ---[ end trace 248345993f4ae135 ]---
[    4.365919] Kernel panic - not syncing: Fatal exception in interrupt
[    4.372104] Rebooting in 1 seconds..


Closed by  Mathias Kresin
10.08.2017 06:13
Reason for closing:  Deferred
Mauro Mozzarelli commented on 10.12.2016 18:43

I have now recovered r2395 and flashed WNDR3700v4. Rebooted BT Home Hub 5 (with r2446) no panic. So up to r2395 on WNDR3700v4 Bt Home Hub 5 connected to WAN is OK.

Mauro Mozzarelli commented on 10.12.2016 20:31

I upgraded both BT Home Hub 5 and WNDR3700v4 to r2449, just pulled from GIT and built.
No more panics on boot for BT Home Hub 5!

Project Manager
Mathias Kresin commented on 16.12.2016 08:55

It is most likely not fixed and unrelated to the WNDR3700v4.

It seems to me to be some timing-related bug. As soon as something changes in the boot timing, the issue is either triggered or gone.

I'm aware of two other people who managed to trigger the bug. All of them are using a BT HomeHub 5A. Another report of the panic can be found at http://openwrt.ebilan.co.uk/viewtopic.php?f=7&t=182&p=1461#p1441.

Mauro Mozzarelli commented on 13.01.2017 14:33

This bug disappeared from r2449 up to r2885.
I have just built r2935 and the bug has reappeared.

Project Manager
Mathias Kresin commented on 13.01.2017 14:51
Mauro Mozzarelli commented on 18.01.2017 11:29

Since my previous post I built the following:

r2960, r2966, r2970, r3018.

I have not seen the problem occurring since. I am not sure which one included the above patch. Currently I am doing LEDE builds for BT Home Hub 5 and Netgear WNDR 3700v4 about twice a week on average; I will report again should the problem recur. Thank you.

Project Manager
Mathias Kresin commented on 18.01.2017 13:37

None of them includes the patch. I asked you to apply this patch to your clone of the LEDE r2935 source code, compile an image yourself and test again.

I just want to know if this patch fixes the issue. But since you have moved on, it's the same as before. Due to minimal changes of the kernel and/or the boot sequence, this bug appears or is gone.

Project Manager
Felix Fietkau commented on 19.01.2017 13:52

I've pushed a fix that might help with this issue

Michael Thomson commented on 20.01.2017 07:14

Thanks Felix - I'll test over the weekend. I had tried a random build yesterday just before your patch landed, in hope Matthias was right and it was intermittent / build-dependent, but immediately got into a panic loop on boot with an existing Ethernet connection to my LAN.

Mauro Mozzarelli commented on 20.01.2017 10:19

Mathias:

I am sorry, I take a backup of the full trunk folder every time I create a new release after the test is successful, I have a history of the last two months builds, but since it did not work I did not take one for r2935.

If the test is still important (I think I have seen Felix's change in the latest pull), I could see if it is possible to re-pull from git that specific release and I will see if I can reproduce the condition.

Mauro Mozzarelli commented on 20.01.2017 12:04

I built r3055, a patch came through for lantiq, I wonder if it is Felix's patch.
The Bt Home Hub 5 now panics on boot.

Here is my physical layout:

                          +------------ BT Home Hub 5 -----------+
                          |                                      | 
         +-------- [Yellow Switch] --------+              [Red Ethernet]
         |          |          |           |                     |
         |      Computer       |       Computer                  |
         |      Intel GbE      |      Nvidia GbE               Empty
         |                     |
   WNDR 3700v4             Computer
                          Intel GbE

When WNDR 3700v4 (also running r3055 build) is connected:
- Bt Home Hub 5 panics (see boot log attached) and reboots. It may continue to reboot several times or succeed the second time. This appears to be random.

When WNDR 3700v4 is disconnected
- Bt Home Hub 5 boots without panic.

In the second case, once booted, if I reconnect the WNDR3700v4 it works fine. However, once (and it has happened only once so far) it rebooted itself randomly and then continued to panic on each reboot until I disconnected the WNDR 3700v4.

Mauro Mozzarelli commented on 20.01.2017 15:27

Here is something really odd:

1) I flashed WNDR3700v4 back to r3018.
2) I rebooted BT Home Hub 5 (still on r3055) several times. No panic.
3) I thought: with some firmware builds something must put WNDR3700v4 Ethernet in a state that triggers the panic on BT Home Hub 5
4) I flashed WNDR3700v4 forward to r3055, thinking "now I want to test Bt Home Hub 5 on r3018 (that never gave panic) with the former on r3055".
5) Once both are again on r3055 (before I flash BtHomeHub5 back) I reboot BtHomeHub5. No panic. I try again a dozen times. No panic.
6) I end the test, no point to flash BtHomeHub5 back to r3018.

Please see attached the boot log for BtHomeHub5 at step 5.

Michael Thomson commented on 20.01.2017 20:26

Sorry Felix.

I built snapshot r3042-ada6d9f, including your patch, for my HH5A Type 1, and it panics reliably every time if booted with a live Gig-E connection to my LAN. It boots normally without a connection. So no improvement from previous behaviour, I'm sorry to say.

Mauro Mozzarelli commented on 22.01.2017 20:06

I built most releases up to r3069. Same problem as above.

Project Manager
Mathias Kresin commented on 23.01.2017 12:01

Has anybody applied the patch I linked to on top of a revision that shows the error? Does it improve the situation?

Mauro Mozzarelli commented on 23.01.2017 12:07

Mathias, usually I pull updates from git. If your patch was not pushed through git, then I haven't tested it. If you confirm, then I will try to apply it on my next build.

Please can we raise the priority from "Very Low" to "High" for this task, given it is Critical? My request however is more administrative than anything else as I see that you guys are already providing excellent support. Thank you.

Project Manager
Mathias Kresin commented on 23.01.2017 12:19

My patch will not be pushed to git as long as no one confirms that it fixes anything. It is just a possible fix and might not work or might do silly things.

You need to apply it manually. And you should apply it to the version that shows the bug. If you pull in updates and apply my patch afterwards, it isn't clear if something was pushed to git that fixes the issues or if a git commit just changed the timing, so that the bug is hidden (but not fixed!) again.

Michael Thomson commented on 23.01.2017 23:23

Hi Matthias
I'll certainly try your patch on top of r3042-ada6d9f, and let you know if that helps. Might take me a day or so to get the time to build and test, however.
Mike.

Michael Thomson commented on 23.01.2017 23:30

Hmmm. I'm doing something stupid, I can feel it. Tried applying that patch with --dry-run and failed, see below.
Suggestions / tuition gratefully accepted!

Mikes-2015-Sierra-iMac:source michthom$ patch --dry-run < /tmp/patch
can't find file to patch at input line 5
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/target/linux/lantiq/patches-4.4/0025-NET-MIPS-lantiq-adds-xrx200-net.patch b/target/linux/lantiq/patches-4.4/0025-NET-MIPS-lantiq-adds-xrx200-net.patch
|index 4e60f30..9c931f8 100644
|--- a/target/linux/lantiq/patches-4.4/0025-NET-MIPS-lantiq-adds-xrx200-net.patch
|+++ b/target/linux/lantiq/patches-4.4/0025-NET-MIPS-lantiq-adds-xrx200-net.patch
--------------------------
File to patch: ./build_dir/target-mips_24kc_musl-1.1.16/linux-lantiq_xrx200/linux-4.4.42/drivers/net/ethernet/lantiq_xrx200.c
patching file ./build_dir/target-mips_24kc_musl-1.1.16/linux-lantiq_xrx200/linux-4.4.42/drivers/net/ethernet/lantiq_xrx200.c
Hunk #1 FAILED at 209.
Hunk #2 FAILED at 1185.
Hunk #3 FAILED at 1207.
3 out of 3 hunks FAILED -- saving rejects to file ./build_dir/target-mips_24kc_musl-1.1.16/linux-lantiq_xrx200/linux-4.4.42/drivers/net/ethernet/lantiq_xrx200.c.rej
Michael Thomson commented on 23.01.2017 23:40

Nevermind - I patched it in manually. Building now.
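For reference, the failed dry run above follows from how `patch` maps the paths in a diff header onto files. The sketch below is illustrative only (the function name is hypothetical, and it assumes the behaviour documented for GNU patch: with no `-p` option only the file's base name is used, with `-pN` the first N path components are dropped). It also shows that the diff targets a file under `target/linux/lantiq/patches-4.4/`, which is itself a kernel patch, and not `lantiq_xrx200.c` in `build_dir`, which would explain why all three hunks were rejected when that file was given at the "File to patch:" prompt.

```python
from pathlib import PurePosixPath
from typing import Optional


def path_seen_by_patch(diff_path: str, strip: Optional[int]) -> str:
    """Approximate which file name `patch` looks for.

    strip=None models running patch without any -p option (base name only);
    strip=N models -pN (drop the first N leading path components).
    """
    parts = PurePosixPath(diff_path).parts
    if strip is None:
        return parts[-1]
    if strip >= len(parts):
        raise ValueError("cannot strip every component of the path")
    return str(PurePosixPath(*parts[strip:]))


# Path copied from the "+++" line of the diff header quoted above.
HEADER_PATH = (
    "b/target/linux/lantiq/patches-4.4/"
    "0025-NET-MIPS-lantiq-adds-xrx200-net.patch"
)

if __name__ == "__main__":
    # Without -p: only the base name, which does not exist in the source root.
    print(path_seen_by_patch(HEADER_PATH, None))
    # With -p1: a path relative to the top of the LEDE source tree.
    print(path_seen_by_patch(HEADER_PATH, 1))
```

Run from the top of the source tree, the `-p1` form resolves to a real file, so no manual file selection is needed.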

Michael Thomson commented on 23.01.2017 23:56

I may have screwed up the build - I do have a new lantiq_xrx200.o and lede-lantiq-xrx200-BTHOMEHUBV5A-squashfs-sysupgrade.bin, but the uname -a output is still
Linux lede 4.4.42 #0 Thu Jan 19 19:46:58 2017 mips GNU/Linux

And it still crashes when the LAN cable is present on boot.

[    2.163167] init: Console is alive
[    2.165386] init: - watchdog -
[    2.380073] dwc2 1e101000.ifxhcd: requested GPIO 495
[    3.239078] dwc2 1e101000.ifxhcd: DWC OTG Controller
[    3.242679] dwc2 1e101000.ifxhcd: new USB bus registered, assigned bus number 1
[    3.249981] dwc2 1e101000.ifxhcd: irq 62, io mem 0x00000000
[    3.255512] dwc2 1e101000.ifxhcd: Hardware does not support descriptor DMA mode -
[    3.262953] dwc2 1e101000.ifxhcd: falling back to buffer DMA mode.
[    3.270385] hub 1-0:1.0: USB hub found
[    3.273481] hub 1-0:1.0: 1 port detected
[    3.280549] init: - preinit -
[    3.476319] Unhandled kernel unaligned access[#1]:
[    3.479652] CPU: 0 PID: 354 Comm: ip Not tainted 4.4.42 #0
[    3.485132] task: 87d4ca10 ti: 87f5c000 task.ti: 87f5c000
[    3.490521] $ 0   : 00000000 00000000 87e1f92a 00000000
[    3.495743] $ 4   : 87c1b6c0 f7ebffb6 00000020 00000000
[    3.500965] $ 8   : 00000040 87e1f962 00000000 00000000
[    3.506187] $12   : 1003206c 770322c0 00000000 00400bcc
[    3.511410] $16   : 8065ccc0 00000040 87c1b6c0 f7ebffb6
[    3.516632] $20   : 00000000 c0000000 00000020 8065c898
[    3.521855] $24   : 00000000 80028a38                  
[    3.527077] $28   : 87f5c000 87f5dc08 87d78800 8027810c
[    3.532300] Hi    : 000001cb
[    3.535172] Lo    : 00001e75
[    3.538073] epc   : 802e7794 eth_type_trans+0x14/0x210
[    3.543194] ra    : 8027810c xrx200_poll_rx+0x140/0x1f4
[    3.548402] Status: 1100fc03 KERNEL EXL IE 
[    3.552582] Cause : 00800010 (ExcCode 04)
[    3.556584] BadVA : f7ec00ea
[    3.559458] PrId  : 00019556 (MIPS 34Kc)
[    3.563372] Modules linked in: dwc2 gpio_button_hotplug cryptomgr aead crypto_null
[    3.570953] Process ip (pid: 354, threadinfo=87f5c000, task=87d4ca10, tls=77033d48)
[    3.578604] Stack : 87f5c000 87f5dc30 87d78800 802c87b8 8065ccc0 71000040 87c1b6c0 8065ccc0
          00000040 8027810c 0004a515 804e0000 00000bd3 805a2000 8000003f 80498bf8
          8065ccc0 00000020 00000008 0000012c 804e4940 87f5dc90 fffede51 804e0000
          804e0000 802c87b8 0000004b 87812758 81014ba0 805a2000 804e0000 804af094
          804e0000 804af3a4 87f5dc90 87f5dc90 87f5dc98 87f5dc98 80633f2c 00000004
          ...
[    3.614117] Call Trace:
[    3.616563] [<802e7794>] eth_type_trans+0x14/0x210
[    3.621352] [<8027810c>] xrx200_poll_rx+0x140/0x1f4
[    3.626245] [<802c87b8>] net_rx_action+0x118/0x2e0
[    3.631027] [<800310e0>] __do_softirq+0x2a0/0x2b8
[    3.635712] [<800311c8>] do_softirq.part.1+0x70/0x78
[    3.640674] [<80031258>] __local_bh_enable_ip+0x88/0xc0
[    3.645896] [<80278d48>] xrx200_open+0x144/0x1e8
[    3.650513] [<802cbf50>] __dev_open+0xf0/0x1ac
[    3.654950] [<802cc2bc>] __dev_change_flags+0xc0/0x17c
[    3.660084] [<802cc3a0>] dev_change_flags+0x28/0x70
[    3.664961] [<803382fc>] devinet_ioctl+0x2b4/0x8a8
[    3.669768] [<802ad2c0>] sock_ioctl+0x294/0x2f0
[    3.674263] 
[    3.675741] 
Code: afb0001c  afbf0024  ac850014 <8ca20134> 00a08821  10400004  00808021  00802821  0040f809 
[    3.685744] ---[ end trace 7d1f7055a9454ce3 ]---
[    3.692582] Kernel panic - not syncing: Fatal exception in interrupt
[    3.698713] Rebooting in 1 seconds..
Mauro Mozzarelli commented on 24.01.2017 12:06

Mathias,

I applied the patch you posted above on 13.01.2017 14:51 to r3079-9993d80

Here are my steps:

1) I built r3079-9993d80 for WNDR-3700v4 (no patch)
2) I re-flashed WNDR-3700v4
3) I rebooted BT Home Hub running r3069 (no panic)
4) I built r3079-9993d80 for BT Home Hub 5 (no patch)
5) I flashed BT Home Hub 5 with r3079-9993d80 (no patch)
6) I rebooted BT Home Hub 5 several times with the network connected - panic every time
7) I rebooted BT Home Hub 5 without the network ports connected - it booted and I reconnected the network ports after boot.
8) I patched and built r3079-9993d80 for BT Home Hub 5 with your patch above
9) I flashed BT Home Hub 5 with r3079-9993d80 including your patch
10) I rebooted several times. NO PANIC

This test might not be 100% conclusive because in my case at times a simple firmware rebuild and re-flash could prevent the issue manifesting. However it looks promising.
I will continue to reapply the patch (now stashed) to every future build and report.

Thank you.

Project Manager
Mathias Kresin commented on 24.01.2017 12:43

Mauro, thanks a lot for the feedback.

I didn't write the patch and I have to admit that I'm not sure if I understand what the patch really does. But I'm in contact with the author.

Please keep on testing this patch to make sure that it isn't just a changed timing which makes the bug disappear.
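How many clean boots are needed before "no panic" means something can be estimated with a rough calculation. The sketch below rests on a strong assumption that this thread does not establish: that each boot of an affected build panics independently with a fixed probability `p_panic`. The function names and the example probability are purely illustrative; no panic rate has been measured here.

```python
import math


def clean_run_probability(p_panic: float, boots: int) -> float:
    """Chance of seeing `boots` clean boots in a row on a buggy build."""
    return (1.0 - p_panic) ** boots


def boots_needed(p_panic: float, confidence: float = 0.95) -> int:
    """Smallest number of consecutive clean boots for which a buggy build
    would have shown at least one panic with the given confidence."""
    if not 0.0 < p_panic <= 1.0:
        raise ValueError("p_panic must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if p_panic == 1.0:
        return 1  # a bug that always triggers shows up on the first boot
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_panic))


if __name__ == "__main__":
    # Illustrative value only: a bug that would hit every second boot.
    print(boots_needed(0.5))
    print(clean_run_probability(0.5, 5))
```

If the trigger depends on boot timing or on the traffic present on the LAN, the independence assumption does not hold, and a setup that reproduces the panic on every boot is a far stronger test than any number of clean boots on an intermittent one.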

Mauro Mozzarelli commented on 24.01.2017 13:42

Mathias,

I built and tested r3088 with patch. No panic.

The patch disables interrupts within a code block in the xrx200 ethernet driver.

Mauro Mozzarelli commented on 24.01.2017 19:45

Mathias,

I built and tested r3112 with patch. All good, no panic.

Michael Thomson commented on 25.01.2017 01:33

Hmmm. I rm'd my build tree, cloned a fresh copy of the LEDE master tree, but it won't build even before I add Mattias's patch:

ltq-boot-image.c:47:2: error: unknown type name 'loff_t'; did you mean 'off_t'?
        loff_t          uboot_offset;

ltq-boot-image.c:278:34: error: unknown type name 'loff_t'; did you mean 'off_t'?
static int pad_to_offset(int fd, loff_t offset)

ltq-boot-image.c:280:2: error: unknown type name 'loff_t'; did you mean 'off_t'?
        loff_t pos;

I'm on macOS 10.12 - any clues?

Project Manager
Mathias Kresin commented on 25.01.2017 11:19

Unselect all u-boots in menuconfig ⇒ bootloaders. U-boot doesn't build at the moment on macOS.

Michael Thomson commented on 25.01.2017 19:47

Thank you Matthias - my sanity is saved.

My testing:
r3121-b02c381 without the interrupt blocking code:
- Boots normally with no active Ethernet connection
- Reproducibly panics 100% of the time with a LAN cable connected.

I then applied the patch:
Mikes-2015-Sierra-iMac:source michthom$ cd /Volumes/LEDE/source
Mikes-2015-Sierra-iMac:source michthom$ patch -p1 < /tmp/patch
patching file target/linux/lantiq/patches-4.4/0025-NET-MIPS-lantiq-adds-xrx200-net.patch
Mikes-2015-Sierra-iMac:source michthom$ make

Booted from Matthias' install image, ran prepare, sysupgraded with the new build, restarted

r3121-b02c381 WITH the interrupt blocking code:
- Boots normally with no active Ethernet connection
- Reproducibly panics 100% of the time with a LAN cable connected.

So I'm sorry to bring bad news, but this doesn't fix it for me, unless I should have done a make clean or get a completely fresh git clone before applying the patch?

Michael Thomson commented on 25.01.2017 21:49

Nope. Created a fresh git clone from LEDE master at r3128-b94177e
Applied the patch, ran make menuconfig to set the target and remove the u-boot
make
Booted from installimage over tftp
prepare
sysupgrade from my new sys upgrade image

and I get the same as before - panic 100% every time with ethernet connected, stable as a rock without.

Mauro Mozzarelli commented on 26.01.2017 11:31

Michael,

I think your test results are more significant because you can reproduce the panic reliably whilst in my case it is random.

Michael Thomson commented on 26.01.2017 20:19

Mauro, are you connecting to a 100Mb/s or Gigabit network? Mine is Cat5e 1000Mbps, and I wonder if the line rate is relevant in how likely the issue is to trigger? But the "unaligned" part of the error seems strange - like it's missing a byte somewhere... I wish I could read code (and network code in particular) well enough to do anything more than act as a (very willing) guinea pig. I have 3 of the HH5a units, so I'm going to try a build with the patch on each one and see if I get any variation in behaviour...
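On the "unaligned" question, the oops posted on 23.01.2017 carries enough information for a small worked check. The "Code:" line marks `8ca20134` as the faulting instruction word, register $5 holds `f7ebffb6`, and BadVA is `f7ec00ea`. The sketch below (helper names are hypothetical) decodes the word as a MIPS I-type instruction and recomputes the address it tried to read.

```python
OP_LW = 0x23  # MIPS32 opcode for "load word"


def decode_itype(word: int):
    """Split a 32-bit MIPS I-type instruction into its fields."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F   # base register
    rt = (word >> 16) & 0x1F   # destination register
    imm = word & 0xFFFF
    if imm & 0x8000:           # sign-extend the 16-bit offset
        imm -= 0x10000
    return opcode, rs, rt, imm


# Values copied from the 4.4.42 oops in the comment of 23.01.2017 23:56.
FAULTING_WORD = 0x8CA20134
REG_5 = 0xF7EBFFB6             # second value on the "$ 4" register line
BAD_VA = 0xF7EC00EA

opcode, base_reg, dest_reg, offset = decode_itype(FAULTING_WORD)
effective_address = (REG_5 + offset) & 0xFFFFFFFF

if __name__ == "__main__":
    print(f"opcode={opcode:#x} base=${base_reg} dest=${dest_reg} offset={offset:#x}")
    print(f"effective address {effective_address:08x}, BadVA {BAD_VA:08x}")
    print("word aligned:", effective_address % 4 == 0)
```

The word decodes to a load of $2 from offset 0x134 of $5, and $5 plus that offset equals the reported BadVA, which is not a multiple of four. Since $5 is the second argument register and `eth_type_trans()` takes the network device pointer as its second argument, this points at an invalid device pointer being passed in rather than at a byte missing from a packet; that reading is an interpretation of the trace and is not confirmed anywhere in this thread.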

Mauro Mozzarelli commented on 27.01.2017 10:18

Michael,

It is a 1000Mb/s link over a 1.5m Cat5e cable to a Netgear 3700v4 WAN port. So far this has been the only configuration that triggered it in my case. When it happens, if I disconnect the cable it does not panic. The "random" part is that once I manage to trigger it, it triggers reliably; but if I flash the WNDR3700v4 again with the very same firmware image that triggered it, it no longer triggers after the 2nd flash (???). No idea how and why.

Michael Thomson commented on 27.01.2017 19:17

Wow - that's odd. I've no idea why an image flashed *twice* should be more stable. But I'll try anything at this point...!

Michael Thomson commented on 27.01.2017 19:35

Hmm. I can't reproduce the "fix" of flashing the same (patched) image twice - I get reboots with Ethernet connected.
One possible red herring...
When I was doing a TFTP boot, before sysupgrade, on two occasions the connection was not (immediately) ready. On both occasions the install image did seem to load, but when I used bootm I got a *different* crash, see below.
I'd just switched from the LAN to a direct connection to my Mac (the tftp server).

[    1.958193] init: Console is alive
[    1.960431] init: - watchdog -
[    1.978905] exFAT: Version 1.2.9
[    2.020715] SCSI subsystem initialized
[    2.031326] dwc2 1e101000.ifxhcd: requested GPIO 495
[    2.034930] dwc2 1e101000.ifxhcd: Configuration mismatch. Forcing host mode
[    2.898313] dwc2 1e101000.ifxhcd: DWC OTG Controller
[    2.901912] dwc2 1e101000.ifxhcd: new USB bus registered, assigned bus number 1
[    2.909219] dwc2 1e101000.ifxhcd: irq 62, io mem 0x00000000
[    2.914762] dwc2 1e101000.ifxhcd: Hardware does not support descriptor DMA mode -
[    2.922193] dwc2 1e101000.ifxhcd: falling back to buffer DMA mode.
[    2.929701] hub 1-0:1.0: USB hub found
[    2.932650] hub 1-0:1.0: 1 port detected
[    2.940395] usbcore: registered new interface driver usb-storage
[    2.955456] init: - preinit -
[    3.120557] random: procd: uninitialized urandom read (4 bytes read, 13 bits of entropy available)
Press the [f] key and hit [enter] to enter failsafe mode
Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level
[    3.314149] usb 1-1: new high-speed USB device number 2 using dwc2
[    3.515649] usb-storage 1-1:1.0: USB Mass Storage device detected
[    3.521725] scsi host0: usb-storage 1-1:1.0
[    4.523474] scsi 0:0:0:0: Direct-Access     SanDisk  Cruzer Glide     1.27 PQ: 0 ANSI: 6
[    4.533609] sd 0:0:0:0: [sda] 31266816 512-byte logical blocks: (16.0 GB/14.9 GiB)
[    4.541347] sd 0:0:0:0: [sda] Write Protect is off
[    4.545262] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    4.570660]  sda: sda1
[    4.576816] sd 0:0:0:0: [sda] Attached SCSI removable disk
[    5.378253] eth0: port 2 got link
[    5.481081] CPU 0 Unable to handle kernel paging request at virtual address fffffffe, epc == ffffffff, ra == 802e5818
[    5.490290] Oops[#1]:
[    5.492514] CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.30 #0
[    5.498254] task: 804dad98 ti: 804d4000 task.ti: 804d4000
[    5.503645] $ 0   : 00000000 80a70000 ffffffff 00000000
[    5.508866] $ 4   : 8e0a704f 87c0f600 00000020 00000000
[    5.514088] $ 8   : 0000003c 87e436fe 00000000 00000004
[    5.519311] $12   : 00000000 000000b9 00000000 00000000
[    5.524533] $16   : 87c0f600 8e0a704f 87c0f600 8e0a704f
[    5.529755] $20   : 00000000 c0000000 00000020 87dc8800
[    5.534978] $24   : 804d6034 80028818                  
[    5.540200] $28   : 804d4000 804d5cb8 80a9c8b8 802e5818
[    5.545423] Hi    : 002884f8
[    5.548296] Lo    : a4800000
[    5.551176] epc   : ffffffff 0xffffffff
[    5.555023] ra    : 802e5818 eth_type_trans+0x30/0x210
[    5.560133] Status: 1100fc03 KERNEL EXL IE 
[    5.564313] Cause : 10800008 (ExcCode 02)
[    5.568316] BadVA : fffffffe
[    5.571189] PrId  : 00019556 (MIPS 34Kc)
[    5.575103] Modules linked in: usb_storage dwc2 sd_mod scsi_mod gpio_button_hotplug ext4 jbd2 mbcache exfat crc32c_generic
[    5.586165] Process swapper (pid: 0, threadinfo=804d4000, task=804dad98, tls=00000000)
[    5.594078] Stack : 804d4000 804d5ce0 80a9c8b8 802c6880 80a9cce0 7100003c 87c0f600 80a9cce0
          0000003c 80276328 00000003 804d9b70 804d9bd8 80063f58 8000003f 804912b4
          80a9cce0 00000020 00000008 0000012c 804da8a0 804d5d40 fffee047 804e0000
          804e0000 802c6880 00000000 00000000 00000007 804dfdc8 804e0000 804a7714
          804d0000 804a7a24 804d5d40 804d5d40 804d5d48 804d5d48 80a73f54 00000004
          ...
[    5.629591] Call Trace:
[    5.632054] [<802c6880>] net_rx_action+0x118/0x2e0
[    5.636830] [<80276328>] xrx200_poll_rx+0x134/0x1d8
[    5.641716] [<80063f58>] hrtimer_interrupt+0x110/0x270
[    5.646835] [<802c6880>] net_rx_action+0x118/0x2e0
[    5.651638] [<80030eb4>] __do_softirq+0x2a0/0x2b8
[    5.656319] [<8005cf88>] handle_level_irq+0x100/0x170
[    5.661383] [<8000d0b4>] ltq_hw_irqdispatch+0xa8/0xe4
[    5.666418] [<80002430>] ret_from_irq+0x0/0x4
[    5.670790] [<8004f8d8>] pick_next_task_fair+0x1cc/0x294
[    5.676093] [<80013a4c>] r4k_wait_irqoff+0x0/0x20
[    5.680785] [<80053190>] cpu_startup_entry+0xa0/0xf4
[    5.685740] [<80008e0c>] schedule_preempt_disabled+0x10/0x1c
[    5.691398] [<80013a64>] r4k_wait_irqoff+0x18/0x20
[    5.696198] [<80006c38>] rest_init+0x64/0x74
[    5.700466] [<804fcbec>] start_kernel+0x46c/0x488
[    5.705152] [<804fc43c>] unknown_bootoption+0x0/0x290
[    5.710203] 
[    5.711669] 
Code: (Bad address in epc)
[    5.715585] 
[    5.717110] ---[ end trace a57d690884b72863 ]---
[    5.724182] Kernel panic - not syncing: Fatal exception in interrupt
[    5.730377] Rebooting in 1 seconds..
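The "different" crash above may be the same failure seen one step later. The sketch below takes the instruction words from the "Code:" line of the 4.4.42 oops posted on 23.01.2017 (this trace printed none) and assumes, without proof, that the start of `eth_type_trans()` is laid out identically in the 4.4.30 kernel used here. It looks for the indirect call that follows the load at `eth_type_trans+0x14` and computes the return address that call would leave in `ra`. Helper names are hypothetical.

```python
FUNCT_JALR = 0x09  # MIPS32 SPECIAL function code for "jump and link register"


def decode_rtype(word: int) -> dict:
    """Split a 32-bit MIPS R-type instruction into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "funct": word & 0x3F,
    }


# "Code:" line of the 4.4.42 oops; index 3 is the word marked <8ca20134>,
# which that oops places at eth_type_trans+0x14.
CODE_WORDS = [0xAFB0001C, 0xAFBF0024, 0xAC850014, 0x8CA20134, 0x00A08821,
              0x10400004, 0x00808021, 0x00802821, 0x0040F809]
MARKED_INDEX = 3
MARKED_OFFSET = 0x14

# Values copied from the 4.4.30 oops above.
REG_2 = 0xFFFFFFFF  # third value on the "$ 0" register line
EPC = 0xFFFFFFFF


def offset_of(index: int) -> int:
    """Offset of a Code: word from the start of the function."""
    return MARKED_OFFSET + 4 * (index - MARKED_INDEX)


jalr_index = next(
    i for i, w in enumerate(CODE_WORDS)
    if decode_rtype(w)["opcode"] == 0 and decode_rtype(w)["funct"] == FUNCT_JALR
)
jalr = decode_rtype(CODE_WORDS[jalr_index])
# A jalr returns to the instruction after its delay slot: its own address + 8.
return_offset = offset_of(jalr_index) + 8

if __name__ == "__main__":
    print(f"jalr at +{offset_of(jalr_index):#x}, target register ${jalr['rs']}")
    print(f"return address offset +{return_offset:#x}")
    print("epc equals $2:", EPC == REG_2)
```

Under that assumption the call goes through $2, the register loaded at `+0x14`, and returns to `+0x30`, which matches the `ra: eth_type_trans+0x30` in this trace; `epc` also equals the value in $2. One reading is that the load sometimes faults on an unaligned address and sometimes succeeds and yields a garbage function pointer that is then called, so both panic types would stem from the same bad pointer. This is an inference from the register dumps, not something the thread confirms.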
Mauro Mozzarelli commented on 29.01.2017 16:55

Mathias,

After my last report I built and flashed the following releases, all including your patch:

r3088, r3157, r3186, r3190, r3196

I am afraid my results differ from Michael's, as I have not seen any PANIC anymore.
This might not be significant as unlike Michael I have not been able to reproduce the panic reliably.

I hereby include a link to download the latest r3196 I built for Michael to test.

Michael Thomson commented on 29.01.2017 19:56

Sorry Mauro. Your build crashes reliably for me, see below. I suspect that this is less to do with the build and more the data the network stack is dealing with (which will be different in your network from mine).

Reboot (SNAPSHOT, r3196-84da2a6)
Linux LEDE 4.4.45 #0 Sun Jan 29 13:09:38 2017 mips GNU/Linux

I found a good description of unaligned access errors.

I wonder if there are some (malformed?) packets floating around my LAN that are being seen by the Ethernet code on the HH5a, and causing it to barf?

[    2.164615] init: Console is alive
[    2.166843] init: - watchdog -
[    2.509009] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[    2.873317] SGI XFS with security attributes, no debug enabled
[    2.918200] SCSI subsystem initialized
[    2.940235] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    2.946857] ehci-platform: EHCI generic platform driver
[    2.973384] dwc2 1e101000.ifxhcd: requested GPIO 495
[    3.831319] dwc2 1e101000.ifxhcd: DWC OTG Controller
[    3.834913] dwc2 1e101000.ifxhcd: new USB bus registered, assigned bus number 1
[    3.842229] dwc2 1e101000.ifxhcd: irq 62, io mem 0x00000000
[    3.847758] dwc2 1e101000.ifxhcd: Hardware does not support descriptor DMA mode -
[    3.855198] dwc2 1e101000.ifxhcd: falling back to buffer DMA mode.
[    3.862627] hub 1-0:1.0: USB hub found
[    3.865619] hub 1-0:1.0: 1 port detected
[    3.882086] usbcore: registered new interface driver usb-storage
[    3.887577] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[    3.898135] init: - preinit -
[    4.152448] Unhandled kernel unaligned access[#1]:
[    4.155787] CPU: 0 PID: 393 Comm: ip Not tainted 4.4.45 #0
[    4.161266] task: 87d25e30 ti: 87ff2000 task.ti: 87ff2000
[    4.166655] $ 0   : 00000000 7ffe2e0c 87e1f92a 00000000
[    4.171877] $ 4   : 87c1b6c0 ffffffff 00000020 00000000
[    4.177099] $ 8   : 00000040 87e1f962 5e78a44d 00000000
[    4.182321] $12   : 00000000 87ff2000 00000000 00000002
[    4.187544] $16   : 8066ccc8 00000040 87c1b6c0 ffffffff
[    4.192766] $20   : 00000000 c0000000 00000020 8066c898
[    4.197989] $24   : 00000000 80028a38                  
[    4.203211] $28   : 87ff2000 87ff38b0 87d78800 80278494
[    4.208434] Hi    : 000419d0
[    4.211306] Lo    : 00000000
[    4.214208] epc   : 802e7fe4 eth_type_trans+0x14/0x210
[    4.219328] ra    : 80278494 xrx200_poll_rx+0x140/0x1f4
[    4.224536] Status: 1100fc03 KERNEL EXL IE 
[    4.228716] Cause : 00800010 (ExcCode 04)
[    4.232718] BadVA : 00000133
[    4.235592] PrId  : 00019556 (MIPS 34Kc)
[    4.239506] Modules linked in: usb_storage dwc2 ledtrig_transient ehci_platform ehci_hcd sd_mod scsi_mod gpio_button_hotplug xfs libcrc32c ext4 jbd2 mbcache exportfs aead crypto_null cryptomgr crc32c_generic
[    4.257967] Process ip (pid: 393, threadinfo=87ff2000, task=87d25e30, tls=772c9d48)
[    4.265618] Stack : 87ff2000 87ff38d8 87d78800 802c9248 8066ccc8 71000040 87c1b6c0 8066ccc8
          00000040 80278494 87d78010 87d78350 804fa804 00000003 8000003f 8049e26c
          8066ccc8 00000020 00000008 0000012c 804e8960 87ff3938 fffedefa 804e0000
          804f0000 802c9248 00000000 000073ff 00000000 00000000 804f0000 804b470c
          804e0000 804b4a98 87ff3938 87ff3938 87ff3940 87ff3940 80643f2c 00000004
          ...
[    4.301131] Call Trace:
[    4.303577] [<802e7fe4>] eth_type_trans+0x14/0x210
[    4.308366] [<80278494>] xrx200_poll_rx+0x140/0x1f4
[    4.313259] [<802c9248>] net_rx_action+0x118/0x2e0
[    4.318040] [<800310e0>] __do_softirq+0x2a0/0x2b8
[    4.322726] [<800311c8>] do_softirq.part.1+0x70/0x78
[    4.327688] [<80031258>] __local_bh_enable_ip+0x88/0xc0
[    4.332909] [<802790d0>] xrx200_open+0x144/0x1e8
[    4.337529] [<802cc664>] __dev_open+0xf0/0x1ac
[    4.341963] [<802cc9d0>] __dev_change_flags+0xc0/0x17c
[    4.347098] [<802ccab4>] dev_change_flags+0x28/0x70
[    4.351988] [<802dd564>] do_setlink+0x2a0/0x768
[    4.356499] [<802de90c>] rtnl_newlink+0x3a8/0x77c
[    4.361199] [<802dcc50>] rtnetlink_rcv_msg+0x208/0x258
[    4.366346] [<802f8cd8>] netlink_rcv_skb+0x70/0x114
[    4.371208] [<802dca38>] rtnetlink_rcv+0x2c/0x3c
[    4.375821] [<802f832c>] netlink_unicast+0x19c/0x320
[    4.380782] [<802f89fc>] netlink_sendmsg+0x414/0x4b4
[    4.385763] [<802ada08>] ___sys_sendmsg+0x168/0x274
[    4.390618] [<802aea48>] __sys_sendmsg+0x48/0x78
[    4.395237] [<8000482c>] syscall_common+0x30/0x54
[    4.399923] 
[    4.401401] 
Code: afb0001c  afbf0024  ac850014 <8ca20134> 00a08821  10400004  00808021  00802821  0040f809 
[    4.411395] ---[ end trace 33c589b4a5cf0753 ]---
[    4.418418] Kernel panic - not syncing: Fatal exception in interrupt
Mauro Mozzarelli commented on 29.01.2017 20:06

Michael,

Thank you for testing my build. I think it is positive that we have now established that you have a test environment where the fault can be reproduced reliably.

This confirms that the patch provided above by Mathias is ineffective.

Project Manager
Mathias Kresin commented on 29.01.2017 20:14
This confirms that the patch provided above by Mathias is ineffective.

Thanks a lot for confirming that the patch doesn't do anything related. I've removed the patch.

Would you please try http://patchwork.ozlabs.org/patch/721064/ instead? It is already accepted but not yet in the master tree. Since this patch addresses an unaligned access issue, it would be interesting to see whether this one is the key to success.

Mauro Mozzarelli commented on 29.01.2017 21:37

Thank you Mathias.

Michael, I built and flashed r3201 including the patch above posted by Mathias.
I do not get panics. Could you test it too, please? The firmware is at the same URL as in my previous post. The version r3201 is in the filename.

Michael Thomson commented on 29.01.2017 23:32

Hah - I just spotted Alexander's commit, and was going to post the news.

I built r3201-f3c5934 without the patch and the behaviour remains the same (as expected). No reboot loop if the HH5a is restarted when only connected to my iMac, but consistent reboot loop if it's connected to my gigabit switch.
I'm building the patched sysupgrade file now. Fingers crossed.

Michael Thomson commented on 30.01.2017 00:48

With the patch I get a change in behaviour, but still the restart loop with the LAN cable attached at startup.
The unaligned access has gone, now I get an unhandled kernel paging request:

[    2.164071] init: Console is alive
[    2.166297] init: - watchdog -
[    2.331192] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[    2.399376] dwc2 1e101000.ifxhcd: requested GPIO 495
[    3.259441] dwc2 1e101000.ifxhcd: DWC OTG Controller
[    3.263035] dwc2 1e101000.ifxhcd: new USB bus registered, assigned bus number 1
[    3.270349] dwc2 1e101000.ifxhcd: irq 62, io mem 0x00000000
[    3.275891] dwc2 1e101000.ifxhcd: Hardware does not support descriptor DMA mode -
[    3.283319] dwc2 1e101000.ifxhcd: falling back to buffer DMA mode.
[    3.290890] hub 1-0:1.0: USB hub found
[    3.293740] hub 1-0:1.0: 1 port detected
[    3.298706] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[    3.315913] init: - preinit -
[    3.517407] CPU 0 Unable to handle kernel paging request at virtual address ef7f8128, epc == 802e7b10, ra == 80278498
[    3.526629] Oops[#1]:
[    3.528835] CPU: 0 PID: 355 Comm: ip Not tainted 4.4.45 #0
[    3.534316] task: 87d4e840 ti: 87fbe000 task.ti: 87fbe000
[    3.539706] $ 0   : 00000000 00000000 87e1f92a 00000000
[    3.544928] $ 4   : 87c1b6c0 ef7f7ff4 00000020 00000000
[    3.550150] $ 8   : 00000040 87e1f962 00000000 00000000
[    3.555372] $12   : 10032080 770d02c0 00000000 00400bd0
[    3.560595] $16   : 8065ccc0 00000040 87c1b6c0 ef7f7ff4
[    3.565817] $20   : 00000000 c0000000 00000020 87d78800
[    3.571039] $24   : 00000000 80028a38                  
[    3.576262] $28   : 87fbe000 87fbfc08 8065c898 80278498
[    3.581485] Hi    : 000001cb
[    3.584357] Lo    : 00001e75
[    3.587258] epc   : 802e7b10 eth_type_trans+0x14/0x210
[    3.592379] ra    : 80278498 xrx200_poll_rx+0x134/0x1d8
[    3.597587] Status: 1100fc03 KERNEL EXL IE 
[    3.601767] Cause : 00800008 (ExcCode 02)
[    3.605769] BadVA : ef7f8128
[    3.608643] PrId  : 00019556 (MIPS 34Kc)
[    3.612557] Modules linked in: dwc2 gpio_button_hotplug aead cryptomgr crypto_null
[    3.620137] Process ip (pid: 355, threadinfo=87fbe000, task=87d4e840, tls=770d1d48)
[    3.627789] Stack : 805a7000 00000b83 87f48030 87f48000 8065ccc0 71000040 87c1b6c0 8065ccc0
          00000040 80278498 00000015 00000014 00000b83 805a7000 8000003f 80498d6c
          8065ccc0 00000020 00000008 0000012c 804e4940 87fbfc90 fffede5b 804e0000
          804e0000 802c8b34 0000004b 87812758 81014c40 805a7000 804e0000 804af20c
          804e0000 804af51c 87fbfc90 87fbfc90 87fbfc98 87fbfc98 80633f2c 00000004
          ...
[    3.663302] Call Trace:
[    3.665747] [<802e7b10>] eth_type_trans+0x14/0x210
[    3.670537] [<80278498>] xrx200_poll_rx+0x134/0x1d8
[    3.675430] [<802c8b34>] net_rx_action+0x118/0x2e0
[    3.680211] [<800310e0>] __do_softirq+0x2a0/0x2b8
[    3.684897] [<800311c8>] do_softirq.part.1+0x70/0x78
[    3.689858] [<80031258>] __local_bh_enable_ip+0x88/0xc0
[    3.695080] [<802790c4>] xrx200_open+0x144/0x1e8
[    3.699699] [<802cc2cc>] __dev_open+0xf0/0x1ac
[    3.704135] [<802cc638>] __dev_change_flags+0xc0/0x17c
[    3.709269] [<802cc71c>] dev_change_flags+0x28/0x70
[    3.714146] [<80338678>] devinet_ioctl+0x2b4/0x8a8
[    3.718953] [<802ad630>] sock_ioctl+0x294/0x2f0
[    3.723447] 
[    3.724925] 
Code: afb0001c  afbf0024  ac850014 <8ca20134> 00a08821  10400004  00808021  00802821  0040f809 
[    3.734907] ---[ end trace e58ed3fc69ff3318 ]---
[    3.741784] Kernel panic - not syncing: Fatal exception in interrupt
[    3.747911] Rebooting in 1 seconds..
Mauro Mozzarelli commented on 30.01.2017 14:29

Michael,

unhandled kernel paging is an interesting panic.

The kernel is trying to point to a memory address that cannot be accessed. This can be due to 1) a bug; 2) a hardware error that somehow changes address bits.
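As a rough aid for reading the traces in this thread, the two panic types can be told apart from the Cause and BadVA lines. The sketch below is not part of any patch. It assumes the standard MIPS32 layout, where ExcCode sits in bits 6..2 of the Cause register, and the helper names are made up for illustration.

```python
# Subset of the MIPS32 ExcCode values that matter for these traces.
EXC_CODES = {
    2: "TLBL (TLB miss on load or instruction fetch)",
    3: "TLBS (TLB miss on store)",
    4: "AdEL (address error on load or instruction fetch)",
    5: "AdES (address error on store)",
    10: "RI (reserved instruction)",
}


def decode_cause(cause):
    """Return (exc_code, description) for a raw Cause register value."""
    exc_code = (cause >> 2) & 0x1F  # ExcCode is bits 6..2
    return exc_code, EXC_CODES.get(exc_code, "not listed in this sketch")


def is_word_aligned(address):
    """A 32-bit load (lw) needs an address that is a multiple of 4."""
    return address % 4 == 0


if __name__ == "__main__":
    # Cause/BadVA pairs copied from the logs in this thread.
    for cause, bad_va in ((0x00800008, 0xEF7F8128), (0x00800010, 0xF7EC00EA)):
        code, text = decode_cause(cause)
        print(f"Cause {cause:08x}: ExcCode {code:02d} {text}, "
              f"BadVA {bad_va:08x} aligned={is_word_aligned(bad_va)}")
```

With the values from the logs, 00800008 decodes to a TLB miss on an address that is word aligned but has no mapping (the "paging request" panic), while 00800010 decodes to an address error on an address that is not word aligned (the "unaligned access" panic). That would be consistent with one bad pointer whose low bits happen to differ between boots, but this is only a reading of the logs.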

Do you have another BT Home Hub 5 that you can use to test the same firmware in the same physical network context?

Michael Thomson commented on 30.01.2017 18:30

I have three - will retest later this evening - I ran out of time last night as it was nearly 1am when my compile finished.

Michael Thomson commented on 30.01.2017 18:37

Argh! I'm starting to doubt my sanity - second HH5a is still seeing the unaligned access problem, despite the latest patch?
The output of uname -a has me thinking I didn't get it rebuilt properly last night:

uname -a
Linux LEDE 4.4.45 #0 Sun Jan 29 19:51:02 2017 mips GNU/Linux

I shall try make clean, verify the patch is in place and build the whole stack again.

Mauro Mozzarelli commented on 30.01.2017 18:41

I thought you would have more than one. I have 3 too, soon to add more (so cheap on eBay and such a good router!)

None of my 3 panics with the latest few builds as I reported above.

Could you test all yours with the same firmware to exclude a hardware problem on your earlier "unhandled kernel paging" panic?

Michael Thomson commented on 30.01.2017 23:27

Okay, I've built snapshot r3210-4096d33 (which included the new patch already) from a fresh directory, and installed on the first of my HH5a boxes.

Sadly I have to report it still panics in a tight loop with my LAN connected at startup time.

[    2.164064] init: Console is alive
[    2.166291] init: - watchdog -
[    2.331285] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[    2.399560] dwc2 1e101000.ifxhcd: requested GPIO 495
[    3.259352] dwc2 1e101000.ifxhcd: DWC OTG Controller
[    3.262947] dwc2 1e101000.ifxhcd: new USB bus registered, assigned bus number 1
[    3.270262] dwc2 1e101000.ifxhcd: irq 62, io mem 0x00000000
[    3.275804] dwc2 1e101000.ifxhcd: Hardware does not support descriptor DMA mode -
[    3.283232] dwc2 1e101000.ifxhcd: falling back to buffer DMA mode.
[    3.290801] hub 1-0:1.0: USB hub found
[    3.293651] hub 1-0:1.0: 1 port detected
[    3.298618] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[    3.315828] init: - preinit -
[    3.517916] Unhandled kernel unaligned access[#1]:
[    3.521257] CPU: 0 PID: 355 Comm: ip Not tainted 4.4.45 #0
[    3.526735] task: 87d4de30 ti: 87fba000 task.ti: 87fba000
[    3.532125] $ 0   : 00000000 00000000 87e1f92a 00000000
[    3.537347] $ 4   : 87c1b6c0 f7ebffb6 00000020 00000000
[    3.542569] $ 8   : 00000040 87e1f962 00000000 00000000
[    3.547791] $12   : 10032080 776002c0 00000000 00400bd0
[    3.553014] $16   : 8065ccc0 00000040 87c1b6c0 f7ebffb6
[    3.558236] $20   : 00000000 c0000000 00000020 87d78800
[    3.563458] $24   : 00000000 80028a38                  
[    3.568681] $28   : 87fba000 87fbbc08 8065c898 80278498
[    3.573904] Hi    : 000001cb
[    3.576776] Lo    : 00001e75
[    3.579676] epc   : 802e7b10 eth_type_trans+0x14/0x210
[    3.584798] ra    : 80278498 xrx200_poll_rx+0x134/0x1d8
[    3.590006] Status: 1100fc03 KERNEL EXL IE 
[    3.594186] Cause : 00800010 (ExcCode 04)
[    3.598188] BadVA : f7ec00ea
[    3.601062] PrId  : 00019556 (MIPS 34Kc)
[    3.604976] Modules linked in: dwc2 gpio_button_hotplug aead cryptomgr crypto_null
[    3.612556] Process ip (pid: 355, threadinfo=87fba000, task=87d4de30, tls=77601d48)
[    3.620208] Stack : 805a5000 00000b83 87f48030 87f48000 8065ccc0 71000040 87c1b6c0 8065ccc0
          00000040 80278498 00000015 00000014 00000b83 805a5000 8000003f 80498d6c
          8065ccc0 00000020 00000008 0000012c 804e4940 87fbbc90 fffede5b 804e0000
          804e0000 802c8b34 0000004b 87812758 81014c00 805a5000 804e0000 804af20c
          804e0000 804af51c 87fbbc90 87fbbc90 87fbbc98 87fbbc98 80633f2c 00000004
          ...
[    3.655721] Call Trace:
[    3.658166] [<802e7b10>] eth_type_trans+0x14/0x210
[    3.662956] [<80278498>] xrx200_poll_rx+0x134/0x1d8
[    3.667848] [<802c8b34>] net_rx_action+0x118/0x2e0
[    3.672630] [<800310e0>] __do_softirq+0x2a0/0x2b8
[    3.677316] [<800311c8>] do_softirq.part.1+0x70/0x78
[    3.682277] [<80031258>] __local_bh_enable_ip+0x88/0xc0
[    3.687499] [<802790c4>] xrx200_open+0x144/0x1e8
[    3.692118] [<802cc2cc>] __dev_open+0xf0/0x1ac
[    3.696554] [<802cc638>] __dev_change_flags+0xc0/0x17c
[    3.701688] [<802cc71c>] dev_change_flags+0x28/0x70
[    3.706564] [<80338678>] devinet_ioctl+0x2b4/0x8a8
[    3.711372] [<802ad630>] sock_ioctl+0x294/0x2f0
[    3.715866] 
[    3.717344] 
Code: afb0001c  afbf0024  ac850014 <8ca20134> 00a08821  10400004  00808021  00802821  0040f809 
[    3.727349] ---[ end trace 8ae55c5ff9524dac ]---
[    3.734190] Kernel panic - not syncing: Fatal exception in interrupt
[    3.740311] Rebooting in 1 seconds..

root@LEDE:/# uname -a
Linux LEDE 4.4.45 #0 Mon Jan 30 21:27:30 2017 mips GNU/Linux
Michael Thomson commented on 30.01.2017 23:37

Interesting - the *same* build on the second HH produces a restart loop, but with a different panic cause:

[    2.162199] init: Console is alive
[    2.164484] init: - watchdog -
[    2.329298] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[    2.397350] dwc2 1e101000.ifxhcd: requested GPIO 495
[    3.255429] dwc2 1e101000.ifxhcd: DWC OTG Controller
[    3.259021] dwc2 1e101000.ifxhcd: new USB bus registered, assigned bus number 1
[    3.266336] dwc2 1e101000.ifxhcd: irq 62, io mem 0x00000000
[    3.271876] dwc2 1e101000.ifxhcd: Hardware does not support descriptor DMA mode -
[    3.279306] dwc2 1e101000.ifxhcd: falling back to buffer DMA mode.
[    3.286887] hub 1-0:1.0: USB hub found
[    3.289727] hub 1-0:1.0: 1 port detected
[    3.294695] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[    3.311898] init: - preinit -
[    3.513269] CPU 0 Unable to handle kernel paging request at virtual address ef7f8128, epc == 802e7b10, ra == 80278498
[    3.522494] Oops[#1]:
[    3.524700] CPU: 0 PID: 355 Comm: ip Not tainted 4.4.45 #0
[    3.530181] task: 87d4ca10 ti: 87f5a000 task.ti: 87f5a000
[    3.535571] $ 0   : 00000000 00000000 87e1f92a 00000000
[    3.540793] $ 4   : 87c1b6c0 ef7f7ff4 00000020 00000000
[    3.546015] $ 8   : 00000040 87e1f962 00000000 00000000
[    3.551237] $12   : 10032080 77ff02c0 00000000 00400bd0
[    3.556460] $16   : 8065ccc0 00000040 87c1b6c0 ef7f7ff4
[    3.561682] $20   : 00000000 c0000000 00000020 87d78800
[    3.566904] $24   : 00000000 80028a38                  
[    3.572127] $28   : 87f5a000 87f5bc08 8065c898 80278498
[    3.577350] Hi    : 000001cb
[    3.580222] Lo    : 00001e75
[    3.583122] epc   : 802e7b10 eth_type_trans+0x14/0x210
[    3.588244] ra    : 80278498 xrx200_poll_rx+0x134/0x1d8
[    3.593452] Status: 1100fc03 KERNEL EXL IE 
[    3.597632] Cause : 00800008 (ExcCode 02)
[    3.601634] BadVA : ef7f8128
[    3.604508] PrId  : 00019556 (MIPS 34Kc)
[    3.608422] Modules linked in: dwc2 gpio_button_hotplug aead cryptomgr crypto_null
[    3.616002] Process ip (pid: 355, threadinfo=87f5a000, task=87d4ca10, tls=77ff1d48)
[    3.623654] Stack : 805ae000 00000b83 87f48030 87f48000 8065ccc0 71000040 87c1b6c0 8065ccc0
          00000040 80278498 00000015 00000014 00000b83 805ae000 8000003f 80498d6c
          8065ccc0 00000020 00000008 0000012c 804e4940 87f5bc90 fffede5a 804e0000
          804e0000 802c8b34 0000004b 87812758 81014d20 805ae000 804e0000 804af20c
          804e0000 804af51c 87f5bc90 87f5bc90 87f5bc98 87f5bc98 80633f2c 00000004
          ...
[    3.659167] Call Trace:
[    3.661612] [<802e7b10>] eth_type_trans+0x14/0x210
[    3.666402] [<80278498>] xrx200_poll_rx+0x134/0x1d8
[    3.671295] [<802c8b34>] net_rx_action+0x118/0x2e0
[    3.676076] [<800310e0>] __do_softirq+0x2a0/0x2b8
[    3.680762] [<800311c8>] do_softirq.part.1+0x70/0x78
[    3.685723] [<80031258>] __local_bh_enable_ip+0x88/0xc0
[    3.690945] [<802790c4>] xrx200_open+0x144/0x1e8
[    3.695564] [<802cc2cc>] __dev_open+0xf0/0x1ac
[    3.700000] [<802cc638>] __dev_change_flags+0xc0/0x17c
[    3.705134] [<802cc71c>] dev_change_flags+0x28/0x70
[    3.710011] [<80338678>] devinet_ioctl+0x2b4/0x8a8
[    3.714818] [<802ad630>] sock_ioctl+0x294/0x2f0
[    3.719313] 
[    3.720791] 
Code: afb0001c  afbf0024  ac850014 <8ca20134> 00a08821  10400004  00808021  00802821  0040f809 
[    3.730772] ---[ end trace 87bbe4e66f1c36ee ]---
[    3.737649] Kernel panic - not syncing: Fatal exception in interrupt
[    3.743776] Rebooting in 1 seconds..
Michael Thomson commented on 30.01.2017 23:47

The third box took a little time to sysupgrade, because it repeatedly crashed on attempts to boot the installimage from the TFTP server (?)

After successfully upgrading to r3210-4096d33, it's also crashing on startup (with LAN) showing the unaligned access cause:

[    2.175029] init: Console is alive
[    2.177314] init: - watchdog -
[    2.342244] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[    2.410102] dwc2 1e101000.ifxhcd: requested GPIO 495
[    3.271341] dwc2 1e101000.ifxhcd: DWC OTG Controller
[    3.274911] dwc2 1e101000.ifxhcd: new USB bus registered, assigned bus number 1
[    3.282264] dwc2 1e101000.ifxhcd: irq 62, io mem 0x00000000
[    3.287791] dwc2 1e101000.ifxhcd: Hardware does not support descriptor DMA mode -
[    3.295217] dwc2 1e101000.ifxhcd: falling back to buffer DMA mode.
[    3.302786] hub 1-0:1.0: USB hub found
[    3.305634] hub 1-0:1.0: 1 port detected
[    3.310605] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[    3.327807] init: - preinit -
[    3.529755] Unhandled kernel unaligned access[#1]:
[    3.533094] CPU: 0 PID: 355 Comm: ip Not tainted 4.4.45 #0
[    3.538573] task: 87d4ed48 ti: 87f9e000 task.ti: 87f9e000
[    3.543962] $ 0   : 00000000 00000000 87e1f92a 00000000
[    3.549184] $ 4   : 87c1b6c0 ffff3bff 00000020 00000000
[    3.554406] $ 8   : 00000040 87e1f962 00000000 00000000
[    3.559628] $12   : 10032080 773b82c0 00000000 00400bd0
[    3.564851] $16   : 8065ccc0 00000040 87c1b6c0 ffff3bff
[    3.570073] $20   : 00000000 c0000000 00000020 87d78800
[    3.575295] $24   : 00000000 80028a38                  
[    3.580518] $28   : 87f9e000 87f9fc08 8065c898 80278498
[    3.585741] Hi    : 000001cb
[    3.588613] Lo    : 00001e75
[    3.591512] epc   : 802e7b10 eth_type_trans+0x14/0x210
[    3.596636] ra    : 80278498 xrx200_poll_rx+0x134/0x1d8
[    3.601843] Status: 1100fc03 KERNEL EXL IE 
[    3.606023] Cause : 00800010 (ExcCode 04)
[    3.610025] BadVA : ffff3d33
[    3.612899] PrId  : 00019556 (MIPS 34Kc)
[    3.616813] Modules linked in: dwc2 gpio_button_hotplug aead cryptomgr crypto_null
[    3.624393] Process ip (pid: 355, threadinfo=87f9e000, task=87d4ed48, tls=773b9d48)
[    3.632045] Stack : 805a4000 00000b83 87f48030 87f48000 8065ccc0 71000040 87c1b6c0 8065ccc0
          00000040 80278498 00000015 00000014 00000b83 805a4000 8000003f 80498d6c
          8065ccc0 00000020 00000008 0000012c 804e4940 87f9fc90 fffede5e 804e0000
          804e0000 802c8b34 0000004b 87812758 81014be0 805a4000 804e0000 804af20c
          804e0000 804af51c 87f9fc90 87f9fc90 87f9fc98 87f9fc98 80633f2c 00000004
          ...
[    3.667558] Call Trace:
[    3.670003] [<802e7b10>] eth_type_trans+0x14/0x210
[    3.674793] [<80278498>] xrx200_poll_rx+0x134/0x1d8
[    3.679686] [<802c8b34>] net_rx_action+0x118/0x2e0
[    3.684467] [<800310e0>] __do_softirq+0x2a0/0x2b8
[    3.689153] [<800311c8>] do_softirq.part.1+0x70/0x78
[    3.694114] [<80031258>] __local_bh_enable_ip+0x88/0xc0
[    3.699336] [<802790c4>] xrx200_open+0x144/0x1e8
[    3.703955] [<802cc2cc>] __dev_open+0xf0/0x1ac
[    3.708391] [<802cc638>] __dev_change_flags+0xc0/0x17c
[    3.713525] [<802cc71c>] dev_change_flags+0x28/0x70
[    3.718401] [<80338678>] devinet_ioctl+0x2b4/0x8a8
[    3.723209] [<802ad630>] sock_ioctl+0x294/0x2f0
[    3.727703] 
[    3.729181] 
Code: afb0001c  afbf0024  ac850014 <8ca20134> 00a08821  10400004  00808021  00802821  0040f809 
[    3.739187] ---[ end trace 5e0bdb96a84ed0a6 ]---
[    3.746046] Kernel panic - not syncing: Fatal exception in interrupt
[    3.752167] Rebooting in 1 seconds..
Michael Thomson commented on 03.02.2017 19:59

Prepare for possible stupidity but...

In the call trace, my understanding is that the most recent is on top, and hierarchically back from there?

So - looking in the xrx200_poll_rx I can't see any code that calls eth_type_trans directly - is this appearing through some side-effect or is there an indirection at work here?

Mike

P.S. I updated to r3257-geaf3fef in case the SMP patch sorted anything out - no joy, unaligned access panic at the same point.

Project Manager
Mathias Kresin commented on 04.02.2017 09:44
In the call trace, my understanding is that the most recent is on top, and hierarchically back from there?
So - looking in the xrx200_poll_rx I can't see any code that calls eth_type_trans directly - is this appearing through some side-effect or is there an indirection at work here?

No, what you are seeing is the result of the compiler optimisation. eth_type_trans() is called from xrx200_hw_receive(). And xrx200_hw_receive() is only called from xrx200_poll_rx(). The compiler merged the code from xrx200_hw_receive() into xrx200_poll_rx().
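The Code: line of each trace marks the faulting instruction in angle brackets, <8ca20134>. Below is a minimal sketch of decoding it, assuming the usual MIPS32 I-type encoding; the function names are hypothetical. If the decoding is right, the instruction is a word load from offset 0x134 of register $5, which under the o32 calling convention carries the second argument of eth_type_trans() (the net_device pointer). Treat this as a hypothesis drawn from the logs, not a diagnosis.

```python
def decode_itype(word):
    """Split a 32-bit MIPS I-type instruction into (opcode, rs, rt, imm)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F      # base register for loads/stores
    rt = (word >> 16) & 0x1F      # destination register for loads
    imm = word & 0xFFFF
    if imm & 0x8000:              # the 16-bit offset is sign-extended
        imm -= 0x10000
    return opcode, rs, rt, imm


def effective_address(base, imm):
    """Address a load/store touches: base register plus offset, mod 2^32."""
    return (base + imm) & 0xFFFFFFFF


if __name__ == "__main__":
    opcode, rs, rt, imm = decode_itype(0x8CA20134)
    # Opcode 0x23 is lw, so this reads: lw $2, 0x134($5)
    print(f"opcode={opcode:#x} rs=${rs} rt=${rt} imm={imm:#x}")

    # ($5, BadVA) pairs copied from the register dumps in this thread.
    for reg5, bad_va in ((0xEF7F7FF4, 0xEF7F8128),
                         (0xF7EBFFB6, 0xF7EC00EA),
                         (0xFFFF3BFF, 0xFFFF3D33),
                         (0x7FFEFF7F, 0x7FFF00B3)):
        print(f"{reg5:08x} + 0x134 = {effective_address(reg5, imm):08x} "
              f"(BadVA {bad_va:08x})")
```

In every dump checked here, $5 plus 0x134 equals the reported BadVA, and $5 differs from boot to boot and from box to box, which suggests that the pointer passed in from the receive path is not a valid kernel address at that moment.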

but consistent reboot loop if it's connected to my gigabit switch.

Michael, are your iMac and the HomeHub the only devices connected to your gigabit switch? Do you have the possibility to capture packets sent from the switch (monitor port)? As you suspect the network traffic could be the culprit, it would be interesting to see what kind of network traffic you actually have.

Michael Thomson commented on 04.02.2017 16:56

Hi Mathias
Thanks for the explanation, as ever!
Currently I have a Zyxel GS1910-24 managed switch, which can do port mirroring. There's a pile of things connected to it in addition to the HH5A and the iMac, I haven't tried isolating the HH to see if it's down to a different device in the LAN, but I assume it will be.
I've not tried the port mirroring before, but I assume what you're suggesting is to hang the HH5A on one port that's seeing traffic from the rest of the LAN, mirror all packets (Tx and Rx) to another port, and record them, e.g. using Wireshark on my iMac?
I'll try some experiments and see what I can set up.
I don't think it'll be easy to get a timestamp for the panic in the packet stream, unless there's a deterministic sequence transmitted as the HH restarts? But I guess if we can see a packet transmitted at some point in the boot sequence we can compare the timestamps in the kernel log vs the WireShark trace?
<rolls up sleeves> This should be fun...

Project Manager
Mathias Kresin commented on 04.02.2017 17:34
I've not tried the port mirroring before, but I assume what you're suggesting is to hang the HH5A on one port that's seeing traffic from the rest of the LAN, mirror all packets (Tx and Rx) to another port, and record them, e.g. using Wireshark on my iMac?

Yes, that is how port mirroring works. Basically you tell the switch to send a copy of all packets received on or sent to a port (or multiple ports) to the mirror port.

There's a pile of things connected to it in addition to the HH5A and the iMac, I haven't tried isolating the HH to see if it's down to a different device in the LAN, but I assume it will be.

Please try to isolate the device which makes the HomeHub panic. Otherwise there might be too much noise in your packet capture.

I don't think it'll be easy to get a timestamp for the panic in the packet stream, unless there's a deterministic sequence transmitted as the HH restarts?

My idea is to fire all packets found in your capture at the HomeHub, to track down the issue to a specific kind of traffic and to reliably reproduce the issue.
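The replay idea could be sketched roughly as follows. This is only an illustration, not a tool anyone in this thread used: it assumes the capture has first been converted from pcapng to the classic pcap format with microsecond timestamps, and the names read_pcap_frames and replay are hypothetical.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap, microsecond timestamps


def read_pcap_frames(stream):
    """Yield (timestamp_seconds, frame_bytes) from a classic pcap stream."""
    header = stream.read(24)  # global header is 24 bytes
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    (magic,) = struct.unpack("<I", header[:4])
    if magic == PCAP_MAGIC:
        endian = "<"
    elif magic == 0xD4C3B2A1:  # file written on a big-endian machine
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    while True:
        record = stream.read(16)  # per-packet header is 16 bytes
        if len(record) < 16:
            return
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # ignore a truncated trailing record
        yield ts_sec + ts_usec / 1e6, data


def replay(frames, send):
    """Pass every captured frame to send() and return how many were sent."""
    count = 0
    for _timestamp, frame in frames:
        send(frame)
        count += 1
    return count


# Hypothetical usage on a Linux host wired to the HomeHub (needs root;
# "eth0" and "capture.pcap" are placeholders):
#
#   import socket
#   sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
#   sock.bind(("eth0", 0))
#   with open("capture.pcap", "rb") as f:
#       replay(read_pcap_frames(f), sock.send)
```

Replaying subsets of the capture (for example, the first half, then the second half) would be one way to narrow the trigger down to a specific kind of traffic.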

Project Manager
Felix Fietkau commented on 04.02.2017 18:36
Michael Thomson commented on 04.02.2017 21:10

Thanks Felix, I will do shortly. I have grabbed two pcap traces with Wireshark:

Clean booted the HH5a while disconnected from the LAN, started port mirror capture and plugged it in
"LAN connect after isolated boot.pcapng"

While the HH5a was happily running connected to the LAN, I rebooted it and grabbed traffic over several panic / restarts.
"Rebooted while connected to LAN - multiple unaligned panics.pcapng"

Michael Thomson commented on 04.02.2017 22:58

Sorry Felix
I built r3293-f791fb4 with your patch, but I'm still getting unaligned access panics consistently.
There's no change in behaviour, though a slightly different call trace, see attached.

I can give you more Wireshark traces if it'd help, but I suspect there's no difference to those above.

I'll start weeding my LAN to see if I can figure out if this behaviour is due to a local bad actor...

[    2.210699] init: Console is alive
[    2.213019] init: - watchdog -
[    2.382312] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[    2.425788] dwc2 1e101000.ifxhcd: requested GPIO 495
[    3.284207] dwc2 1e101000.ifxhcd: DWC OTG Controller
[    3.287766] dwc2 1e101000.ifxhcd: new USB bus registered, assigned bus number 1
[    3.295136] dwc2 1e101000.ifxhcd: irq 62, io mem 0x00000000
[    3.300646] dwc2 1e101000.ifxhcd: Hardware does not support descriptor DMA mode -
[    3.308088] dwc2 1e101000.ifxhcd: falling back to buffer DMA mode.
[    3.315840] hub 1-0:1.0: USB hub found
[    3.318319] hub 1-0:1.0: 1 port detected
[    3.323848] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[    3.342213] init: - preinit -
[    3.540171] Unhandled kernel unaligned access[#1]:
[    3.543599] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.46 #0
[    3.549499] task: 805358f8 ti: 80526000 task.ti: 80526000
[    3.554884] $ 0   : 00000000 80680000 87f0792a 00000000
[    3.560103] $ 4   : 87e810c0 7ffeff7f 00000020 00000000
[    3.565323] $ 8   : 00000040 87f07962 00000001 87c1ea80
[    3.570546] $12   : 810f2874 ffffff80 00000000 777462c0
[    3.575768] $16   : 806ad3c0 00000040 87e810c0 7ffeff7f
[    3.580991] $20   : 00000000 c0000000 00000020 87e90800
[    3.586213] $24   : 00001012 8002caa8                  
[    3.591435] $28   : 80526000 87c0dea0 806acf40 802b1d28
[    3.596661] Hi    : 00480e1d
[    3.599530] Lo    : aa3bbf71
[    3.602443] epc   : 803253a8 eth_type_trans+0x14/0x210
[    3.607566] ra    : 802b1d28 xrx200_poll_rx+0x134/0x1d8
[    3.612765] Status: 1100ff03 KERNEL EXL IE 
[    3.616944] Cause : 00800010 (ExcCode 04)
[    3.620948] BadVA : 7fff00b3
[    3.623817] PrId  : 00019556 (MIPS 34Kc)
[    3.627732] Modules linked in: dwc2 gpio_button_hotplug
[    3.632969] Process swapper/0 (pid: 0, threadinfo=80526000, task=805358f8, tls=00000000)
[    3.641049] Stack : 8110dd60 8053270c 00000000 00000001 806ad3c0 71000040 87e810c0 806ad3c0
          00000040 802b1d28 80530000 80074d18 8110dd60 8053270c 8000003f 804dfa30
          806ad3c0 00000020 8110de80 00000003 0000012c 87c0df28 fffede5f 80530000
          87c0df30 80305f58 00000001 80530000 804a0000 80074da0 80530000 804f5f28
          80530000 804f6278 87c0df28 87c0df28 87c0df30 87c0df30 8052804c 00000004
          ...
[    3.676560] Call Trace:
[    3.679015] [<803253a8>] eth_type_trans+0x14/0x210
[    3.683813] [<802b1d28>] xrx200_poll_rx+0x134/0x1d8
[    3.688706] [<80305f58>] net_rx_action+0x13c/0x2f8
[    3.693472] [<80036db0>] __do_softirq+0x2d4/0x2ec
[    3.698170] [<800370d0>] irq_exit+0x94/0xc8
[    3.702364] [<8000e7d4>] plat_irq_dispatch+0xac/0xdc
[    3.707290] 
[    3.708766] 
Code: afb0001c  afbf0024  ac850014 <8ca20134> 00a08821  10400004  00808021  00802821  0040f809 
[    3.718772] ---[ end trace aa248b0e31160e28 ]---
[    3.726757] Kernel panic - not syncing: Fatal exception in interrupt
[    3.733705] Rebooting in 1 seconds..
Michael Thomson commented on 04.02.2017 23:10

Also, I tried leaving it rebooting to see if it ever stabilises - I'd not had much patience before.
Here's the result of the reboot/unaligned access panic loops, and subsequent restarts:

6 loops then stable, rebooted manually
5 loops then stable, rebooted manually
1 loop then stable, rebooted manually
24 loops then stable

Doubting my sanity again - I unplugged devices and blocked ports down to a minimal configuration, so unless it's the Zyxel switch itself messing things up, I'm not sure I can point to any other culprit device.

Mauro Mozzarelli commented on 04.02.2017 23:54

I built r3293 and I got a panic at boot. See log attached.

The router restarts and, at random, either panics again or continues to a successful boot, as reported by Michael.

With previous releases, once operational the router randomly freezes, but I have not yet been able to capture the logs when it happens because it can take several hours. I am now connecting a serial device permanently and I will post whatever I am able to capture.

   boot.log (16.2 KiB)
Mauro Mozzarelli commented on 05.02.2017 11:06

It took a while, but finally I captured the serial console logs when the router freezes.
It looks like an issue with the CPU that is not related to the Ethernet bug we are dealing with.

Please let me know if you want to deal with it as part of this task or if you want me to open a different task.

   crash.log (112.9 KiB)
Mauro Mozzarelli commented on 06.02.2017 00:20

I reverted to r3220 (the latest I had without xrx200 SMP) and the routers no longer freeze with the issue reported in my previous post.

I opened a new task #471 because this is unrelated to the panic caused by the network driver.

Mauro Mozzarelli commented on 06.02.2017 00:50

@Felix:

Thank you for the patch you posted on 04.02.2017 18:36.
I applied it to r3293 and tested. I still get panics at boot. In addition, with the patch (when I get past the boot issue) I get very high latency on the Red Ethernet. When pinging another device connected to the Red Ethernet directly from the router, about 50% of the pings are lost. The Yellow switch instead behaves normally. Without the patch the Red Ethernet behaves normally. I hope this makes sense to you.

Michael Thomson commented on 13.02.2017 07:22

I saw Hauke Mehrtens's patch in the commit log, and thought I'd try a build just in case it had changed any behaviour (I assumed not).
The answer was no change in r3426-g4c09f99.
I can see there's a *lot* of activity at the moment in the main trunk, so I'm happy waiting on the sidelines, but if there's anything I can do to help test or try theories please let me know if there's a development branch looking at these panics?
Cheers
Mike

Glyn commented on 13.02.2017 22:43

BT HH5A
Reboot (17.01.0-rc2, r3131-42f3c1f)
192.168.1.x is home network. HH5A has IP 192.168.1.31. Connected to Linksys WAG54GS which in turn is connected to BT HH3 which is the connection to the internet. All copper connections.

If I connect the Linksys to the HH5A on Eth port 3 or 4 I get a kernel panic and the booting loops. Eth 1 or 2 and it boots fine. I've tried an old ADSL router on ports 3 and 4 and it boots fine. Only difference I can see is that the Linksys is on an active network. The HH5A also boots fine if my laptop is plugged in on any port.

Console log of an Eth4 kernel panic attached.

Michael Thomson commented on 14.02.2017 07:30

Hi Glyn
I can't reproduce your result with r3426-g4c09f99.
I get unaligned access panics, with a LAN connection to my switch in any of the LAN ports on reboot.

Ethernet 1 (eth0 port 4)
Ethernet 2 (eth0 port 2)
Ethernet 3 (eth0 port 0)
Ethernet 4 (eth0 port 1)

Also your panic was "Unable to handle kernel paging request" so I think we're seeing different behaviour?
Mike

Glyn commented on 15.02.2017 18:51

Hmmm. Strange. I just powered on with the Linksys in port 4 and it was fine. However, every subsequent attempt has resulted in loops. Happy to run any test anyone might like.

Michael Thomson commented on 15.02.2017 19:57

Glyn, it's a weird bug, see above. I believe the current focus is on the packets floating around in the LAN environment triggering the unaligned access issue on the HomeHub. See Mathias's comment from a couple of weeks ago:

Mathias Kresin commented on 04.02.2017 17:34
Please try to isolate the device which makes the HomeHub panic. Otherwise there might be too much noise in your packet capture. My idea is to fire all packets found in your capture at the HomeHub, to track down the issue to a specific kind of traffic and to reliably reproduce the issue.

Michael Thomson commented on 24.02.2017 23:30

I've kept trying new builds in vain hope, but up to r3581-23dff07 there's no change.

Empirically, the point the LAN cable is attached in the boot process is important. Any of the LAN ports behave similarly (note I have mine set to request an address using DHCP, I don't know if that's relevant). Plugging the LAN cable into the WAN port with that on the default of pppoe doesn't provoke a reboot - at least in the few tries I gave that combination. Should I switch the LAN port to DHCP? I wasn't sure if that'd provide useful information or just be confounding.

If the LAN cable is attached before the HH5a powers up, there's a strong probability (but not certainty) that it'll encounter the kernel unaligned access. (P.S. I'm not sure if the packet trace I provided for Mathias was ever useful in provoking this behaviour synthetically?)

If I plug in during the boot sequence there's a definite tipping point - later in the process, but *before* the point it panics when plugged in from boot - then it seems to start up normally. My guess is the behaviour changes if the cable goes in after the following lines, but everything is going pretty fast so it's hard to be sure:

[    1.212639] eth0: attached PHY [Lantiq XWAY PEF7071] (phy_addr=0:00, irq=-1)
[    1.280615] eth0: attached PHY [Lantiq XWAY PEF7071] (phy_addr=0:01, irq=-1)
[    1.348600] eth0: attached PHY [Lantiq XWAY VR9 GPHY 11G v1.4] (phy_addr=0:11, irq=-1)
[    1.416599] eth0: attached PHY [Lantiq XWAY VR9 GPHY 11G v1.4] (phy_addr=0:13, irq=-1)
[    1.424675] net-xrx200: invalid MAC, using random
[    1.492620] eth1: attached PHY [Lantiq XWAY PEF7071] (phy_addr=0:05, irq=-1)

Does any of this help? Can I do anything to assist further? I'm aware you wanted a binary search of my network devices to find a culprit but I'm struggling against demanding family members who object to Dad unplugging The Internet from their gadgets at inconvenient moments ;-)

Mauro Mozzarelli commented on 26.02.2017 15:09

I think we have some evidence that this bug occurs only when the Ethernet (eth0) device driver is initialized. I found that leaving the Ethernet disconnected and plugging the LAN port in after boot has completed is a reliable way to circumvent the kernel panic.

What happens when the device driver is initialized? Does it clear the registers and start fresh? Does it start reading whatever is in the registers?

Michael Thomson commented on 14.03.2017 00:04

Hi folks

I've just built r3716-gcd0f990 having seen interesting commits from Mathias ("lantiq: xrx200: use vlan for ethernet wan port", and "lantiq: fix broadcasts and vlans in two iface mode") that I thought might change the boot initialisation behaviour.

It's probably not definitive, but I've manually rebooted around 10 times now and have not seen a repeat of the kernel panics that were highly likely in previous builds, on my network at least.

I'll keep poking this with sharp sticks, but this is the most significant reduction in kernel panics I've seen to date, so thanks if this has unexpectedly had a side benefit?

Project Manager
Mathias Kresin commented on 14.03.2017 09:38

It is an unexpected side benefit and most likely not a proper fix. I guess the kernel panics will reappear sooner or later.

Project Manager
Mathias Kresin commented on 10.08.2017 06:13

I'm going to close this bug as "deferred". In the meantime there were some changes to the network driver and it might be that one of the changes fixed the kernel panic. At least I did not receive any further reports about the panic.

Please reopen the bug if you hit the kernel panic again.
