LEDE Project

  • Status New
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity Critical
  • Priority Very Low
  • Reported Version All
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: LEDE Project
Opened by Martin Dindos - 10.02.2017

FS#494 - NETDEV WATCHDOG: ptm0 (): transmit queue 0 timed out

I have a VDSL line with Plusnet (UK). The connection is PPPoE over ptm0.101. With the supplied modem/router the line and connection are stable, with no disconnects. With LEDE the connection is established and works well until it disconnects (sometimes after only a few minutes, other times it stays connected for up to an hour). After the disconnect there is no reconnection until a reboot. Restarting the wan interface (ifdown wan/ifup wan) or the DSL connection (/etc/init.d/dslcontrol stop/start) does not help.

Supply the following if possible:
- Device problem occurs on: TP-w8970 and BTHomehub 5A (same crash on both lantiq based devices)
- Software versions of LEDE release, packages, etc.: tested on LEDE RC2, an earlier LEDE snapshot from October 2016, and OpenWrt CC, all with the same symptoms
- Steps to reproduce: VDSL connection on ptm0.101 via PPPoE

Here is a trace of the crash (dmesg):

[ 1414.124413] ---[ beginning trace ff034b465cdad16b ]---
[ 1414.125631] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x1a8/0x2f0()
[ 1414.126471] NETDEV WATCHDOG: ptm0 (): transmit queue 0 timed out
[ 1414.132456] Modules linked in: ltq_ptm_vr9 option iptable_nat ath9k usb_wwan rt2800usb rt2800lib
pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 l2tp_ppp ipt_REJECT ipt_MASQUERADE ath9k_common
xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_policy xt_nat xt_multiport xt_mark xt_mac
xt_limit xt_length xt_id xt_hl xt_helper xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit
xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY usbserial rt2x00usb
rt2x00lib pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6
nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack ltq_deu_vr9 iptable_raw iptable_mangle iptable_filter ipt_ah ipt_ECN
ip_tables crc_itu_t crc_ccitt cdc_acm ath9k_hw ath10k_pci ath10k_core ath mac80211 cfg80211 compat drv_dsl_cpe_api
drv_mei_cpe xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net
ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark
ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6
nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc
l2tp_ip6 l2tp_ip l2tp_eth sit l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ipcomp xfrm4_tunnel xfrm4_mode_tunnel
xfrm4_mode_transport xfrm4_mode_beet esp4 ah4 tunnel4 ip_tunnel tun af_key xfrm_user xfrm_ipcomp xfrm_algo br2684 atm
drv_ifxos echainiv sha256_generic sha1_generic jitterentropy_rng drbg md5 hmac des_generic cbc authenc usb_storage
dwc2 uhci_hcd ehci_platform ehci_hcd sd_mod scsi_mod gpio_button_hotplug ext4 jbd2 mbcache aead crypto_null
[ 1414.287462] CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.7 #1
[ 1414.293130] Stack : 804b0000 00000001 00000000 00000000 805172b8 80516f43 80489a24 00000000
[ 1414.293130] 80673844 00010000 80510000 805159bc 80515abc 80055664 00000003 80510000
[ 1414.293130] 80491b4c 00000000 8048ff50 80511c44 80515abc 800535b0 00000006 00000001
[ 1414.293130] 00000000 80512000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1414.293130] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1414.293130] ...
[ 1414.328618] Call Trace:
[ 1414.331095] [<800178a8>] show_stack+0x50/0x84
[ 1414.335454] [<8002af48>] warn_slowpath_common+0xa0/0xd0
[ 1414.340670] [<8002afa4>] warn_slowpath_fmt+0x2c/0x38
[ 1414.345636] [<802e637c>] dev_watchdog+0x1a8/0x2f0
[ 1414.350348] [<8005f7b0>] call_timer_fn.isra.5+0x24/0x80
[ 1414.355557] [<8005fa2c>] run_timer_softirq+0x1a4/0x208
[ 1414.360694] [<8002de80>] __do_softirq+0x298/0x2b0
[ 1414.365388] [<80002430>] ret_from_irq+0x0/0x4
[ 1414.369760] [<80013a8c>] r4k_wait_irqoff+0x18/0x20
[ 1414.374528] [<8004ff6c>] cpu_startup_entry+0xa4/0xf8
[ 1414.379508] [<80539bf8>] start_kernel+0x474/0x494
[ 1414.384180]
[ 1414.385631] ---[ end trace ff034b465cdad16b ]---
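
Because the failure shows up at unpredictable times, it may help to scan saved kernel logs for the watchdog line rather than watch for it by hand. The following is a minimal sketch, assuming the log has been copied off the router as plain text; `find_tx_timeouts` is a hypothetical helper name, and the pattern is derived only from the single message quoted in the trace above. The empty parentheses in that message appear to be where the kernel prints the driver name, so the sketch captures that field as well.

```python
import re

# Matches kernel lines such as:
# [ 1414.126471] NETDEV WATCHDOG: ptm0 (): transmit queue 0 timed out
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NETDEV WATCHDOG:\s+(?P<dev>\S+)\s+"
    r"\((?P<driver>[^)]*)\):\s+transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(log_text):
    """Return one dict per NETDEV WATCHDOG line found in log_text."""
    events = []
    for m in WATCHDOG_RE.finditer(log_text):
        events.append({
            "uptime_s": float(m.group("ts")),   # seconds since boot
            "device": m.group("dev"),           # e.g. "ptm0"
            "driver": m.group("driver"),        # empty in the trace above
            "queue": int(m.group("queue")),     # transmit queue index
        })
    return events


if __name__ == "__main__":
    sample = "[ 1414.126471] NETDEV WATCHDOG: ptm0 (): transmit queue 0 timed out\n"
    for event in find_tx_timeouts(sample):
        print(event)
```

Run against the output of dmesg or a syslog file, this gives the uptime at which each timeout occurred, which could be compared across devices and firmware versions.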

 


Project Manager
Mathias Kresin commented on 11.02.2017 08:30

I can confirm the issue. I'm seeing the same with VDSL + PTM + VLAN, but I had never heard of anyone else having these problems. I was of the opinion that it was related to my local (development) changes.

Sometimes it works for weeks, sometimes only for days. I never found a way to trigger the warning/crash.

I can also confirm that PPPoE discovery only works again after a reboot. Simply unloading the ptm kernel module or similar does not help.

After the warning is shown and PPPoE discovery no longer works, things like querying the ptm carrier state fail as well:

root@LEDE:~# cat /sys/devices/virtual/net/ptm0/carrier
cat: read error: Invalid argument
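
The same check can be scripted so that the failing read is reported with its error code instead of aborting. This is a minimal sketch, assuming the sysfs path shown above; `read_carrier` is a hypothetical helper, not part of LEDE. One caveat, stated as general Linux behaviour rather than anything established in this report: reading `carrier` usually fails with EINVAL whenever the interface is not administratively up, so the error alone may not prove a driver fault.

```python
import errno


def read_carrier(ifname, sysfs_root="/sys/devices/virtual/net"):
    """Return 'up', 'down', or 'unreadable (<errno name>)' for an interface."""
    path = f"{sysfs_root}/{ifname}/carrier"
    try:
        with open(path) as f:
            value = f.read().strip()
    except OSError as exc:
        # EINVAL is the error behind "cat: read error: Invalid argument"
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return f"unreadable ({name})"
    return "up" if value == "1" else "down"


if __name__ == "__main__":
    print("ptm0 carrier:", read_carrier("ptm0"))
```

Logging this value periodically alongside the kernel log might show whether the carrier becomes unreadable before or after the watchdog warning.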

I've tried different xDSL firmware versions to make sure that it's not related to a crash of the xDSL firmware.

Carl-Daniel Hailfinger commented on 17.02.2017 11:40

With o2 ADSL (Annex B) I'm seeing this usually within 4 minutes after boot. I don't even get an initial PPPoE connection established before everything falls over.

syslog is attached.
Output from dsl_control follows.
root@LEDE:~# /etc/init.d/dsl_control status
ATU-C Vendor ID: Broadcom 147.158
ATU-C System Vendor ID: 00,00,30,30,30,30,00,00
Chipset: Lantiq-VRX200 Unknown
Firmware Version: 5.7.4.4.0.2
API Version: 4.17.18.6
XTSE Capabilities: 0x0, 0x0, 0x0, 0x0, 0x0, 0x4, 0x0, 0x0
Annex: B
Line Mode: G.992.5 (ADSL2+)
Profile:
Line State: UP [0x801: showtime_tc_sync]
Forward Error Correction Seconds (FECS): Near: 0 / Far: 178461
Errored seconds (ES): Near: 0 / Far: 7808
Severely Errored Seconds (SES): Near: 0 / Far: 1913
Loss of Signal Seconds (LOSS): Near: 0 / Far: 8
Unavailable Seconds (UAS): Near: 48 / Far: 48
Header Error Code Errors (HEC): Near: 0 / Far: 608262
Non Pre-emtive CRC errors (CRC_P): Near: 0 / Far: 0
Pre-emtive CRC errors (CRCP_P): Near: 0 / Far: 0
Power Management Mode: L0 - Synchronized
Latency / Interleave Delay: Down: Interleave (8.0 ms) / Up: Interleave (8.0 ms)
Data Rate: Down: 10.988 Mb/s / Up: 1.150 Mb/s
Line Attenuation (LATN): Down: 20.8dB / Up: 7.8dB
Signal Attenuation (SATN): Down: 19.0dB / Up: 8.0dB
Noise Margin (SNR): Down: 9.2dB / Up: 9.4dB
Aggregate Transmit Power (ACTATP): Down: 18.6dB / Up: 12.6dB
Max. Attainable Data Rate (ATTNDR): Down: 11.104 Mb/s / Up: 1.234 Mb/s
Line Uptime Seconds: 625
Line Uptime: 10m 25s
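
The status output above reports zero near-end errors but large far-end counts (for example HEC). To compare such counters between runs, the "Near / Far" lines can be pulled into a dictionary. This is a minimal sketch, assuming the line layout is exactly as pasted above; `parse_error_counters` is a hypothetical helper and not part of the dsl_control script.

```python
import re

# Matches lines like:
# Header Error Code Errors (HEC): Near: 0 / Far: 608262
COUNTER_RE = re.compile(
    r"^(?P<label>.+?)\s*\((?P<abbr>[A-Z_]+)\):\s*"
    r"Near:\s*(?P<near>\d+)\s*/\s*Far:\s*(?P<far>\d+)\s*$"
)


def parse_error_counters(status_text):
    """Map each counter abbreviation to a (near, far) tuple of integers."""
    counters = {}
    for line in status_text.splitlines():
        m = COUNTER_RE.match(line.strip())
        if m:
            counters[m.group("abbr")] = (int(m.group("near")), int(m.group("far")))
    return counters


if __name__ == "__main__":
    sample = """\
Forward Error Correction Seconds (FECS): Near: 0 / Far: 178461
Header Error Code Errors (HEC): Near: 0 / Far: 608262
Line Attenuation (LATN): Down: 20.8dB / Up: 7.8dB
"""
    for abbr, (near, far) in parse_error_counters(sample).items():
        print(abbr, "near:", near, "far:", far)
```

Lines that use "Down / Up" values, such as attenuation and data rate, are skipped on purpose, since they are measurements rather than error counters.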

Carl-Daniel Hailfinger commented on 17.02.2017 12:37

OK, this is interesting. Apparently it only happens if no data is sent/received over the line for some time.
I had incorrectly configured the DSL Encapsulation mode and didn't get any responses from the remote side due to that. With the correct DSL Encapsulation mode the remote side does respond, and the transmit queue timeout doesn't happen anymore.

Guido L. commented on 23.04.2017 21:30

config atm-bridge 'atm'
        option vpi '1'
        option vci '32'
        option encaps 'llc'
        option payload 'bridged'

config dsl 'dsl'
        option annex 'bdmt'
        option xfer_mode 'atm'
        option line_mode 'adsl'

config interface 'wan'
        option proto 'pppoe'
        option ipv6 'auto'
        option username 'username'
        option password 'password'
        option ifname 'nas0'

These are the settings I'm using, and it works fine :)

Martin Dindos commented on 24.04.2017 11:16

Guido, unfortunately your comment is not relevant: from your config I see you are using ADSL over ATM, not VDSL over PTM.
