LEDE Project

  • Status Researching
  • Task Type Bug Report
  • Category Base system
  • Assigned To Matthias Schiffer
  • Operating System All
  • Severity High
  • Priority Medium
  • Reported Version lede-17.01
  • Due in Version Undecided
  • Due Date Undecided
  • Votes 3
Attached to Project: LEDE Project
Opened by Matthias Schiffer - 01.06.2017
Last edited by Matthias Schiffer - 01.06.2017

FS#822 - busybox (ash) sporadically segfaults running shell scripts on ar71xx

I’m often seeing this message in my logs:

[ 2183.499756] do_page_fault(): sending SIGSEGV to dhcpv6.script for invalid read access from 00000000    
[ 2183.509195] epc = 0041efe9 in busybox[400000+4b000]                                                    
[ 2183.514285] ra  = 0041efb1 in busybox[400000+4b000]                                                    

The issue might be a new variant of FS#251, which disappeared after a busybox upgrade.

I’m on a recent lede-17.01 version (dfecce60e6e75abf3ea817fe7bf29fd432693f13), with the following adjustments to busybox:


Hardware: TP-Link TL-WR841ND v9 (QCA9533)

My analysis so far:


          zero       at       v0       v1       a0       a1       a2       a3
 R0   00000000 7fcfd4c8 00000000 0090b7c0 00910df4 00910df9 2f2f2f2f bcd0a2f0 
            t0       t1       t2       t3       t4       t5       t6       t7
 R8   fefefeff 80808080 80083c3c 002f3634 7fcf9698 00000000 00000000 77a6e2c0 
            s0       s1       s2       s3       s4       s5       s6       s7
 R16  0090ccc0 7fcf9900 00000004 0040788d 77a64000 77a64000 77a67518 77a68d8c 
            t8       t9       k0       k1       gp       sp       s8       ra
 R24  0045b0e8 77a36d44 00000000 00000000 77a6e2c0 7fcf97d0 00000000 0041efb1 
            sr       lo       hi      bad    cause       pc
      0000f413 00000048 00000013 00000000 00800008 0041efe9 
           fsr      fir
      00000000 00000000 


Dump of assembler code for function find_command:                                                         
   0x0041ef45 <+0>:     save    a0-a3,232,ra,s0-s1                                                        
   0x0041ef49 <+4>:     move    s1,a1                                                                     
   0x0041ef4b <+6>:     jal     0x449c61 <strchr@mips16plt>                                               
   0x0041ef4f <+10>:    li      a1,47
   0x0041ef51 <+12>:    beqz    v0,0x41ef79 <find_command+52>
   0x0041ef53 <+14>:    li      v0,1
   0x0041ef55 <+16>:    neg     v0
   0x0041ef57 <+18>:    lw      v1,240(sp)
   0x0041ef59 <+20>:    sw      v0,4(s1)
   0x0041ef5b <+22>:    li      v0,2
   0x0041ef5d <+24>:    and     v0,v1
   0x0041ef5f <+26>:    bnez    v0,0x41ef65 <find_command+32>
   0x0041ef61 <+28>:    li      v0,0
   0x0041ef63 <+30>:    b       0x41ef75 <find_command+48>
   0x0041ef65 <+32>:    lw      a0,232(sp)
   0x0041ef67 <+34>:    jal     0x44a441 <stat@mips16plt>
   0x0041ef6b <+38>:    addiu   a1,sp,56
   0x0041ef6d <+40>:    slti    v0,0
   0x0041ef6f <+42>:    bteqz   0x41ef61 <find_command+28>
   0x0041ef71 <+44>:    li      v0,1
   0x0041ef73 <+46>:    neg     v0
   0x0041ef75 <+48>:    sb      v0,0(s1)
   0x0041ef77 <+50>:    b       0x41f1af <find_command+618>
   0x0041ef79 <+52>:    lw      v0,0x41f1b4 <find_command+623>
   0x0041ef7b <+54>:    lw      a0,244(sp)
   0x0041ef7d <+56>:    lw      v0,0(v0)
   0x0041ef7f <+58>:    addiu   v0,124
   0x0041ef81 <+60>:    lw      v0,88(v0)
   0x0041ef83 <+62>:    addiu   v0,5
   0x0041ef85 <+64>:    xor     v0,a0
   0x0041ef87 <+66>:    sltiu   v0,1
   0x0041ef89 <+68>:    move    v1,t8
   0x0041ef8b <+70>:    sw      v1,40(sp)
   0x0041ef8d <+72>:    beqz    v0,0x41efa9 <find_command+100>
   0x0041ef8f <+74>:    lw      v0,240(sp)
   0x0041ef91 <+76>:    li      s0,8
   0x0041ef93 <+78>:    lw      a1,0x41f1b8 <find_command+627>
   0x0041ef95 <+80>:    jal     0x449ba1 <strstr@mips16plt>
   0x0041ef99 <+84>:    or      s0,v0
   0x0041ef9b <+86>:    beqz    v0,0x41efa7 <find_command+98>
   0x0041ef9d <+88>:    lw      v1,240(sp)
   0x0041ef9f <+90>:    li      v0,40
   0x0041efa1 <+92>:    or      v1,v0
   0x0041efa3 <+94>:    sw      v1,240(sp)
   0x0041efa5 <+96>:    b       0x41efa9 <find_command+100>
   0x0041efa7 <+98>:    sw      s0,240(sp)
   0x0041efa9 <+100>:   lw      a0,232(sp)
   0x0041efab <+102>:   jal     0x41a75d <cmdlookup>
   0x0041efaf <+106>:   li      a1,0
   0x0041efb1 <+108>:   move    s0,v0
   0x0041efb3 <+110>:   beqz    v0,0x41efdd <find_command+152>
   0x0041efb5 <+112>:   lb      v0,8(v0)
   0x0041efb7 <+114>:   cmpi    v0,1
   0x0041efb9 <+116>:   bteqz   0x41efc3 <find_command+126>
   0x0041efbb <+118>:   cmpi    v0,2
   0x0041efbd <+120>:   btnez   0x41efc7 <find_command+130>
   0x0041efbf <+122>:   li      v0,32
   0x0041efc1 <+124>:   b       0x41efc9 <find_command+132>
   0x0041efc3 <+126>:   li      v0,4
   0x0041efc5 <+128>:   b       0x41efc9 <find_command+132>
   0x0041efc7 <+130>:   li      v0,8
   0x0041efc9 <+132>:   lw      v1,240(sp)
   0x0041efcb <+134>:   and     v0,v1
   0x0041efcd <+136>:   bnez    v0,0x41efd7 <find_command+146>
   0x0041efcf <+138>:   lbu     v0,9(s0)
   0x0041efd1 <+140>:   beqz    v0,0x41f1a3 <find_command+606>
   0x0041efd5 <+144>:   b       0x41efdd <find_command+152>
   0x0041efd7 <+146>:   li      v0,0
   0x0041efd9 <+148>:   li      s0,0
   0x0041efdb <+150>:   sw      v0,40(sp)
   0x0041efdd <+152>:   jal     0x41b025 <find_builtin>
   0x0041efe1 <+156>:   lw      a0,232(sp)
   0x0041efe3 <+158>:   sw      v0,44(sp)
   0x0041efe5 <+160>:   beqz    v0,0x41f00f <find_command+202>
   0x0041efe7 <+162>:   lw      v0,0(v0)
=> 0x0041efe9 <+164>:   lbu     v1,0(v0)
   0x0041efeb <+166>:   li      v0,2
   0x0041efed <+168>:   and     v0,v1
   0x0041efef <+170>:   bnez    v0,0x41f177 <find_command+562>
   0x0041eff3 <+174>:   lw      v1,240(sp)
   0x0041eff5 <+176>:   li      v0,8
   0x0041eff7 <+178>:   and     v0,v1
   0x0041eff9 <+180>:   beqz    v0,0x41f005 <find_command+192>
   0x0041effb <+182>:   li      v0,32
   0x0041effd <+184>:   and     v0,v1
   0x0041efff <+186>:   beqz    v0,0x41f177 <find_command+562>
   0x0041f003 <+190>:   b       0x41f00f <find_command+202>
   0x0041f005 <+192>:   lw      v0,0x41f1bc <find_command+631>
   0x0041f007 <+194>:   lw      v0,0(v0)

As in FS#251, the contents of the registers don’t really make sense. Unless I’m overlooking something, it should not be possible for $pc to reach 0x0041efe9 while $ra holds 0x0041efb1 (the return address from cmdlookup); rather, $ra should hold 0x0041efe3 (the return address from find_builtin). There are no code paths reaching 0x0041efe9 that don’t call find_builtin.
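
The address arithmetic behind this argument can be checked with a short script. This is only a sketch: the constants are copied from the kernel log and the disassembly above, and the helper name offset_in_function is made up for illustration.

```python
# Addresses copied from the crash report; nothing here is read from a live system.
FIND_COMMAND_START = 0x0041EF45  # first instruction of find_command in the listing


def offset_in_function(addr, start=FIND_COMMAND_START):
    """Return addr as a byte offset from the start of the function."""
    return addr - start


observed_pc = 0x0041EFE9  # epc from the kernel log
observed_ra = 0x0041EFB1  # ra from the kernel log
expected_ra = 0x0041EFE3  # return address the report argues $ra should hold

for name, addr in [
    ("pc", observed_pc),
    ("ra (observed)", observed_ra),
    ("ra (expected)", expected_ra),
]:
    print(f"{name:14s} = find_command+{offset_in_function(addr)}")
```

The resulting offsets (+164, +108 and +158) match the labels gdb prints next to the faulting lbu, the instruction after the cmdlookup call's delay slot, and the instruction after the find_builtin call's delay slot.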

Project Manager
Matthias Schiffer commented on 10.06.2017 10:51

Stack dump around $sp (7fcf97d0):

(gdb) x/64 $sp-128
0x7fcf9750:     0x0000019e      0x00000000      0x00000000      0x00000000
0x7fcf9760:     0x0091a6e8      0x00000000      0x0000000b      0x0041c70b
0x7fcf9770:     0x0090d63b      0x0045b544      0x0091a6f4      0x77a66f98
0x7fcf9780:     0x77a52c8c      0x779ed000      0x77a6e2c0      0x0090d647
0x7fcf9790:     0x00000000      0x00000000      0x77a68de0      0x0041d15f
0x7fcf97a0:     0x77a6e2c0      0x00000003      0x0000002f      0x77a36c48
0x7fcf97b0:     0x00000000      0x0091a6f4      0x00000000      0x0090b7c0
0x7fcf97c0:     0x77a6e2c0      0x7fcfdd78      0x7fcf9900      0x0041efb1
0x7fcf97d0:     0x00910df4      0x00000000      0x77a64000      0x77a64000
0x7fcf97e0:     0x77a67518      0x77a68d8c      0x00000000      0x77a6e2c0
0x7fcf97f0:     0x00000102      0x00000000      0x00000001      0x0090ccc0
0x7fcf9800:     0x00000000      0x00910df4      0x0045b7d8      0x77a3752c
0x7fcf9810:     0x77a6e2c0      0x00418f9d      0x00000000      0x80000400
0x7fcf9820:     0x77a6e2c0      0x00000000      0x00910df4      0x0041ad53
0x7fcf9830:     0x00000000      0x00000008      0x00000004      0x00000008
0x7fcf9840:     0x0045b7d8      0x0041a251      0x77a67518      0x00910df4
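
To find where a given value sits relative to $sp in a dump like this, the gdb output can be parsed mechanically. The following is a rough sketch, not part of the original analysis: parse_dump is a hypothetical helper, only two rows of the dump are pasted in, and words are assumed to be 4 bytes as in the x/64 output above.

```python
# Two rows copied from the gdb output above; extend with further rows as needed.
DUMP = """\
0x7fcf97c0:     0x77a6e2c0      0x7fcfdd78      0x7fcf9900      0x0041efb1
0x7fcf97d0:     0x00910df4      0x00000000      0x77a64000      0x77a64000
"""

SP = 0x7FCF97D0          # $sp from the register dump
NEEDLE = 0x0041EFB1      # value of $ra at the time of the crash


def parse_dump(text, word_size=4):
    """Map each word's address to its value for 'addr: w0 w1 w2 w3' lines."""
    words = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        base, rest = line.split(":", 1)
        base_addr = int(base, 16)
        for i, token in enumerate(rest.split()):
            words[base_addr + i * word_size] = int(token, 16)
    return words


words = parse_dump(DUMP)
hits = [addr for addr, value in words.items() if value == NEEDLE]
for addr in hits:
    print(f"{addr:#010x} (sp{addr - SP:+d})")
```

For the rows pasted here, the only match is the word at 0x7fcf97cc, that is, 4 bytes below $sp.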

Marcus Scharf commented on 13.06.2017 01:41

I can confirm this issue on LEDE+Gluon "gluon-v2017.1-2-g4827f2d" with a WR842v3.1.

[ 1571.670269] eth1: link down
[ 1571.674048] br-wan: port 1(eth1) entered disabled state
[ 1671.865533] 
[ 1671.865533] do_page_fault(): sending SIGSEGV to dhcpv6.script for invalid read access from 00000000
[ 1671.874977] epc = 0041efe9 in busybox[400000+4b000]
[ 1671.880083] ra  = 0041efb1 in busybox[400000+4b000]
[ 1671.885139] 
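
Whether two reports show the same crash can be checked by comparing the epc and ra values from both logs. The sketch below does this for the two excerpts in this report; the regular expression and the crash_signature helper are illustrative assumptions that only cover the log format shown here.

```python
import re

# Excerpts copied from the two kernel logs in this report.
LOG_WR841 = """\
[ 2183.509195] epc = 0041efe9 in busybox[400000+4b000]
[ 2183.514285] ra  = 0041efb1 in busybox[400000+4b000]
"""
LOG_WR842 = """\
[ 1671.874977] epc = 0041efe9 in busybox[400000+4b000]
[ 1671.880083] ra  = 0041efb1 in busybox[400000+4b000]
"""

# Matches lines such as 'epc = 0041efe9 in busybox[400000+4b000]'.
PATTERN = re.compile(
    r"\b(epc|ra)\s*=\s*([0-9a-f]{8}) in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def crash_signature(log):
    """Return {register: (binary, address)} for the epc/ra lines in a log."""
    signature = {}
    for match in PATTERN.finditer(log):
        reg, addr, binary, _base, _size = match.groups()
        signature[reg] = (binary, int(addr, 16))
    return signature


same = crash_signature(LOG_WR841) == crash_signature(LOG_WR842)
print("same crash signature:", same)
```

Matching addresses only mean the same crash if both devices run an identical busybox binary, which is not established by the logs alone.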

Baptiste Jonglez commented on 29.07.2017 10:37

Kernel 4.4.79 has several commits fixing issues on MIPS: https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.79

In particular, commit 4c7d28c1e99d ("MIPS: math-emu: Prevent wrong ISA mode instruction emulation") looks related.

Project Manager
Matthias Schiffer commented on 30.07.2017 19:45

Checked with 4.4.79, no change.

I have also added some debug messages to the kernel for trap-related context switches (like math emulation), and if I didn't overlook anything, no context switches happen anywhere close to these crashes.


