Kernel Oops in net/sunrpc/xprt.c/xprt_timer


Post by Binesh Bannerje » Tue, 25 Mar 2003 12:19:12



Hi,
        I recently upgraded to 2.4.20 (from 2.2.24) and I'm getting rather
frequent crashes:

I BELIEVE I've traced the EIP to the function xprt_timer: the oops
reports the EIP at c02ceeb2, and System.map puts the start of
xprt_timer at c02cee5c.
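
The arithmetic behind that mapping can be sketched in a few lines of C. This is only an illustration: the helper names eip_offset and objdump_address are made up for this example, and the object-file start address 0x134c is taken from the objdump listing further down in this post.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a faulting EIP from the start of its enclosing function. */
static uint32_t eip_offset(uint32_t eip, uint32_t sym_start)
{
    assert(eip >= sym_start);   /* the EIP must lie at or after the symbol */
    return eip - sym_start;
}

/*
 * Map that offset onto the function's address in the unlinked object
 * file, i.e. the addresses that objdump prints for xprt.o.
 */
static uint32_t objdump_address(uint32_t obj_start, uint32_t offset)
{
    return obj_start + offset;
}
```

With the numbers from this post, eip_offset(0xc02ceeb2, 0xc02cee5c) is 0x56, and objdump_address(0x134c, 0x56) is 0x13a2. That is the "mov 0x2c(%eax),%eax" line in the listing, and the bytes starting there (8b 40 2c 89 7d f4 ...) are the same bytes the oops prints on its Code: line.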

I ran objdump --disassemble on xprt.o and tried to read the output, but it's
way beyond my knowledge of Intel assembler. The machine code it prints does,
however, match the code that the kernel oops prints out exactly.

It says that it's a NULL pointer dereference... I could obviously
just put null pointer checks everywhere in the function, but I don't know
what that would do to performance. (Of course, if it's performance degradation
or crashing all the time at 120 mph, I'll take performance degradation any day
of the week)

I'm going to post the disassembly, the source of xprt_timer, and my
hand-TYPED copy of the kernel oops (because it doesn't get written to any
log...), and hope some kind soul can help me debug it.

(I'll apologize for the length of this post in advance)

Here's the kernel oops text:
<oops>
Unable to handle kernel NULL pointer dereference at virtual address 00000058
printing eip:
        c02ceeb2
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010 [<c02ceeb2>] Not tainted
EFLAGS: 00010202
eax: 0000002c ebx: e0cdf3d8 ecx: 00000008 edx: 00000001
esi: 00000000 edi: 00000001 ebp: d817be6c esp: d817be54
ds: 0018 es: 0018 ss: 0018
Process cjpeg (pid: 7256, stackpage=d817b000)
Stack:  d91d4080 c02cee5c 00000001 f7b7ad40 00000000 e0cdf000 d817be84 c02d00da
        d91d4080 d91d4080 c02d0050 00000000 d817bebc c012239c d91d4080 00000000
        00000020 00000000 c221de20 00000001 00000020 c0403b2c c0403b2c d817bebc
Call Trace:
        [<c02cee5c>] [<c02d00da>] [<c02d0050>] [<c012239c>] [<c011f016>]
        [<c011eee3>] [<c011ec5d>] [<c010a66d>] [<c02d74d0>] [<c0142ce4>]
        [<c0139c68>] [<c0108c3f>]
Code:   8b 40 2c 89 7d f4 83 f8 09 0f 4c c8 b8 01 00 00 00 89 c2 d3
</oops>

Here is the C code for xprt_timer (it's unaltered from 2.4.20)
<xprt_timer>
/*
 * RPC receive timeout handler.
 */
static void
xprt_timer(struct rpc_task *task)
{
        struct rpc_rqst *req = task->tk_rqstp;
        struct rpc_xprt *xprt = req->rq_xprt;

        spin_lock(&xprt->sock_lock);
        if (req->rq_received)
                goto out;

        if (!xprt->nocong) {
                if (xprt_expbackoff(task, req)) {
                        rpc_add_timer(task, xprt_timer);
                        goto out_unlock;
                }
                rpc_inc_timeo(&task->tk_client->cl_rtt);
                xprt_adjust_cwnd(req->rq_xprt, -ETIMEDOUT);
        }
        req->rq_nresend++;

        dprintk("RPC: %4d xprt_timer (%s request)\n",
                task->tk_pid, req ? "pending" : "backlogged");

        task->tk_status  = -ETIMEDOUT;
out:
        task->tk_timeout = 0;
        rpc_wake_up_task(task);
out_unlock:
        spin_unlock(&xprt->sock_lock);

}

</xprt_timer>

Here is the assembler of JUST xprt_timer (I have posted the entire
assembler output at
        http://www.hex21.com/~binesh/xprt.s)
<assembler>
0000134c <xprt_timer>:
    134c:       55                      push   %ebp
    134d:       89 e5                   mov    %esp,%ebp
    134f:       83 ec 0c                sub    $0xc,%esp
    1352:       57                      push   %edi
    1353:       56                      push   %esi
    1354:       53                      push   %ebx
    1355:       8b 45 08                mov    0x8(%ebp),%eax
    1358:       8b 58 18                mov    0x18(%eax),%ebx
    135b:       8b 13                   mov    (%ebx),%edx
    135d:       89 55 fc                mov    %edx,0xfffffffc(%ebp)
    1360:       f0 fe 8a 98 09 00 00    lock decb 0x998(%edx)
    1367:       0f 88 ef 10 00 00       js     245c <.text.lock.xprt+0x107>
    136d:       8b 43 6c                mov    0x6c(%ebx),%eax
    1370:       89 45 f8                mov    %eax,0xfffffff8(%ebp)
    1373:       85 c0                   test   %eax,%eax
    1375:       0f 85 af 00 00 00       jne    142a <xprt_timer+0xde>
    137b:       f6 82 7c 09 00 00 02    testb  $0x2,0x97c(%edx)
    1382:       75 5f                   jne    13e3 <xprt_timer+0x97>
    1384:       8b bb 88 00 00 00       mov    0x88(%ebx),%edi
    138a:       8b 45 08                mov    0x8(%ebp),%eax
    138d:       8d 57 01                lea    0x1(%edi),%edx
    1390:       89 93 88 00 00 00       mov    %edx,0x88(%ebx)
    1396:       b9 08 00 00 00          mov    $0x8,%ecx
    139b:       47                      inc    %edi
    139c:       8b 70 14                mov    0x14(%eax),%esi
    139f:       8d 46 2c                lea    0x2c(%esi),%eax
    13a2:       8b 40 2c                mov    0x2c(%eax),%eax
    13a5:       89 7d f4                mov    %edi,0xfffffff4(%ebp)
    13a8:       83 f8 09                cmp    $0x9,%eax
    13ab:       0f 4c c8                cmovl  %eax,%ecx
    13ae:       b8 01 00 00 00          mov    $0x1,%eax
    13b3:       89 c2                   mov    %eax,%edx
    13b5:       d3 e2                   shl    %cl,%edx
    13b7:       39 55 f4                cmp    %edx,0xfffffff4(%ebp)
    13ba:       0f 4d 45 f8             cmovge 0xfffffff8(%ebp),%eax
    13be:       85 c0                   test   %eax,%eax
    13c0:       74 10                   je     13d2 <xprt_timer+0x86>
    13c2:       68 4c 13 00 00          push   $0x134c
    13c7:       8b 55 08                mov    0x8(%ebp),%edx
    13ca:       52                      push   %edx
    13cb:       e8 fc ff ff ff          call   13cc <xprt_timer+0x80>
    13d0:       eb 68                   jmp    143a <xprt_timer+0xee>
    13d2:       f0 ff 46 58             lock incl 0x58(%esi)
    13d6:       6a 92                   push   $0xffffff92
    13d8:       8b 03                   mov    (%ebx),%eax
    13da:       50                      push   %eax
    13db:       e8 f4 ed ff ff          call   1d4 <xprt_adjust_cwnd>
    13e0:       83 c4 08                add    $0x8,%esp
    13e3:       ff 83 8c 00 00 00       incl   0x8c(%ebx)
    13e9:       f6 05 00 00 00 00 01    testb  $0x1,0x0
    13f0:       74 2e                   je     1420 <xprt_timer+0xd4>
    13f2:       b8 36 07 00 00          mov    $0x736,%eax
    13f7:       ba 2e 07 00 00          mov    $0x72e,%edx
    13fc:       85 db                   test   %ebx,%ebx
    13fe:       0f 45 c2                cmovne %edx,%eax
    1401:       50                      push   %eax
    1402:       8b 55 08                mov    0x8(%ebp),%edx
    1405:       0f b7 82 80 00 00 00    movzwl 0x80(%edx),%eax
    140c:       50                      push   %eax
    140d:       68 60 07 00 00          push   $0x760
    1412:       e8 fc ff ff ff          call   1413 <xprt_timer+0xc7>
    1417:       83 c4 0c                add    $0xc,%esp
    141a:       8d b6 00 00 00 00       lea    0x0(%esi),%esi
    1420:       8b 45 08                mov    0x8(%ebp),%eax
    1423:       c7 40 1c 92 ff ff ff    movl   $0xffffff92,0x1c(%eax)
    142a:       8b 55 08                mov    0x8(%ebp),%edx
    142d:       c7 42 74 00 00 00 00    movl   $0x0,0x74(%edx)
    1434:       52                      push   %edx
    1435:       e8 fc ff ff ff          call   1436 <xprt_timer+0xea>
    143a:       8b 55 fc                mov    0xfffffffc(%ebp),%edx
    143d:       b0 01                   mov    $0x1,%al
    143f:       86 82 98 09 00 00       xchg   %al,0x998(%edx)
    1445:       8d 65 e8                lea    0xffffffe8(%ebp),%esp
    1448:       5b                      pop    %ebx
    1449:       5e                      pop    %esi
    144a:       5f                      pop    %edi
    144b:       89 ec                   mov    %ebp,%esp
    144d:       5d                      pop    %ebp
    144e:       c3                      ret    
    144f:       90                      nop    

</assembler>
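
One way to cross-check the oops against this listing is to replay the two instructions just before the faulting one, using the register values from the oops. The sketch below does only that address arithmetic; the function names are invented for this example, and the reading of which C pointer was NULL is an inference from the listing, not something the oops states directly.

```c
#include <assert.h>
#include <stdint.h>

/* AT&T operand disp(%base): effective address = base + disp (32-bit). */
static uint32_t effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;
}

/*
 * Replay the instructions at 139f and 13a2 from the listing:
 *   lea 0x2c(%esi),%eax   ; eax = esi + 0x2c
 *   mov 0x2c(%eax),%eax   ; reads memory at eax + 0x2c
 * The return value is the address that the second instruction reads.
 */
static uint32_t replay_fault_address(uint32_t esi)
{
    uint32_t eax = effective_address(esi, 0x2c);
    return effective_address(eax, 0x2c);
}
```

With esi = 00000000 from the register dump, the intermediate eax is 0x2c (the oops shows eax: 0000002c) and the address read is 0x58, which matches "NULL pointer dereference at virtual address 00000058". Since esi was loaded at 139c from offset 0x14 of the task argument, this suggests that the pointer stored at that offset was NULL. In the C source, task->tk_client (used in the task->tk_client->cl_rtt expression) looks like a plausible candidate, but that mapping of offset to field is an assumption.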

Thanks!
Binesh Bannerjee

--
"Some say the world will end in fire,
 Some say in ice.
 From what I've tasted of desire
 I hold with those who favor fire.
 But if it had to perish twice
 I think I know enough of hate
 To know that for destruction ice
 Is also great
 And would suffice."
        -- "Fire and Ice" by Robert Frost

    PGP  Key: http://www.hex21.com/~binesh/binesh-public.asc
        Key fingerprint = 421D B4C2 2E96 B8EE 7190  A0CF B42F E71C 7FC3 AD96
    SSH2 Key: http://www.hex21.com/~binesh/binesh-ssh2.pub
    SSH1 Key: http://www.hex21.com/~binesh/binesh-ssh1.pub
OpenSSH  Key: http://www.hex21.com/~binesh/binesh-openssh.pub

 
 
 


Post by JAW » Tue, 25 Mar 2003 21:47:45


I am by no means a kernel expert, so take my comments as coming from someone who has not looked at the kernel source, but has scanned
the C code in the original post and can offer some suggestions as to possible problem areas. Also, if this has been addressed before,
I apologize for repeating it:

In the following two lines ....

struct rpc_rqst *req = task->tk_rqstp;
struct rpc_xprt *xprt = req->rq_xprt;

req is set from a pointer stored inside task, and xprt is then set from a pointer stored inside req. If task->tk_rqstp is NULL,
nothing checks for that before req->rq_xprt is dereferenced. We also do not know that task itself is non-NULL (it may have been
checked before this point, but I do not know that). The same holds for the xprt pointer.
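
As a rough illustration of the kind of check described above, here is a user-space sketch. The structures are hypothetical, cut-down stand-ins that only borrow field names from the posted source (the real kernel structures have many more fields), and xprt_timer_pointers_ok is a made-up helper, not a kernel function. A check like this would only help locate the bad pointer; it would not explain why the pointer is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real 2.4.20 structure layouts. */
struct rpc_xprt { int nocong; };
struct rpc_clnt { int cl_rtt; };
struct rpc_rqst { struct rpc_xprt *rq_xprt; };
struct rpc_task {
    struct rpc_clnt *tk_client;
    struct rpc_rqst *tk_rqstp;
};

/*
 * Returns 1 if every pointer that the posted xprt_timer dereferences
 * is non-NULL, 0 otherwise. Checks run in dereference order so that a
 * NULL parent is never followed.
 */
static int xprt_timer_pointers_ok(const struct rpc_task *task)
{
    if (task == NULL)
        return 0;
    if (task->tk_rqstp == NULL)            /* req */
        return 0;
    if (task->tk_rqstp->rq_xprt == NULL)   /* xprt */
        return 0;
    if (task->tk_client == NULL)           /* used for cl_rtt */
        return 0;
    return 1;
}
```

In a debugging build one could call such a helper at the top of the handler and print which pointer was NULL before bailing out, rather than scattering checks through the whole function.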

HTH
Jerry


 
 
 


Post by Neil Horma » Tue, 25 Mar 2003 22:30:34



Try using the ksymoops tool to decode the oops message.  It will
translate the EIP and the call trace addresses into symbol names and
offsets, and disassemble the Code: bytes.  The symbol and offset it
reports for the EIP will give you a more concrete idea of which
pointer is NULL.
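
The core of that address-to-symbol translation is a nearest-symbol lookup, which can be sketched as below. This is not ksymoops code: struct ksym and nearest_symbol are names invented for this example, and the table is assumed to be sorted by ascending address, as a System.map file normally is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One System.map-style entry: start address and symbol name. */
struct ksym {
    uint32_t addr;
    const char *name;
};

/*
 * Return the entry with the highest start address that is <= eip,
 * or NULL if eip lies below every entry. The table must be sorted
 * by ascending address.
 */
static const struct ksym *nearest_symbol(const struct ksym *tab, size_t n,
                                         uint32_t eip)
{
    const struct ksym *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].addr > eip)
            break;              /* sorted table: no later entry can match */
        best = &tab[i];
    }
    return best;
}
```

Given a table that contains xprt_timer at c02cee5c, looking up c02ceeb2 returns that entry, and subtracting its start address gives the offset 0x56, which would typically be written as xprt_timer+0x56.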
 
 
 
