2.2.20 umount oops (probably smbfs related)


Post by Erik Inge Bolsø » Wed, 10 Apr 2002 21:00:14



Here's an oops that appeared yesterday in umount, after 81 days of uptime
and much automated smbfs mount/umount activity:

Stock kernel 2.2.20. No charset= or other weird options to smbfs.

I seem to remember having seen this once on a 2.2.19pre series kernel as
well.

Ksymoops:

Unable to handle kernel NULL pointer dereference at virtual address 0000001c
current->tss.cr3 = 08f1f000, %cr3 = 08f1f000
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0126389>]
EFLAGS: 00010286
eax: 00000000   ebx: 00000000   ecx: cb428000   edx: 0000003c
esi: cd8ef600   edi: 00000000   ebp: ce6a0004   esp: cb429f4c
ds: 0018   es: 0018   ss: 0018
Process umount (pid: 30793, process nr: 116, stackpage=cb429000)
Stack: 00000000 cd8ef644 cd8ef644 cd8ef600 00000004 c012914e cd8ef600 00000004
       fffffffa c14f0004 ce6a8188 c01291f8 00000004 00000000 00000000 00000000
       08050004 c14f2a00 00000000 c01292ed 00000004 00000000 cb428000 08051ea9
Call Trace: [<c012914e>] [<c01291f8>] [<c01292ed>] [<c0129308>] [<c0109144>]
Code: 8b 43 1c 48 75 35 53 e8 9f 9b 00 00 53 e8 31 ee ff ff c7 43

>>EIP: c0126389 <fput+5/48>

Trace: c012914e <do_umount+ee/144>
Trace: c01291f8 <umount_dev+54/9c>
Trace: c01292ed <sys_umount+ad/bc>
Trace: c0129308 <sys_oldumount+c/10>
Trace: c0109144 <system_call+34/38>
Code:  c0126389 <fput+5/48>                    00000000 <_EIP>: <===
Code:  c0126389 <fput+5/48>                       0:      8b 43 1c                movl   0x1c(%ebx),%eax <===
Code:  c012638c <fput+8/48>                       3:      48                      decl   %eax
Code:  c012638d <fput+9/48>                       4:      75 35                   jne     c01263c4 <fput+40/48>
Code:  c012638f <fput+b/48>                       6:      53                      pushl  %ebx
Code:  c0126390 <fput+c/48>                       7:      e8 9f 9b 00 00          call    c012ff34 <locks_remove_flock+0/90>
Code:  c0126395 <fput+11/48>                      c:      53                      pushl  %ebx
Code:  c0126396 <fput+12/48>                      d:      e8 31 ee ff ff          call    c01251cc <__fput+0/48>
Code:  c012639b <fput+17/48>                     12:      c7 43 00 00 00 00 00    movl   $0x0,0x0(%ebx)

3 warnings issued.  Results may not be reliable.
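
A note on reading the disassembly: the faulting instruction is `movl 0x1c(%ebx),%eax`, and the register dump shows ebx = 00000000, so the load targets address 0 + 0x1c. That matches the reported fault address 0000001c. The sketch below reproduces the arithmetic in user space. `struct fake_file` and its field names are hypothetical stand-ins; the real 2.2 `struct file` layout is not shown in this thread, and only the 0x1c offset is taken from the oops.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct file. Only the 0x1c
 * offset comes from the oops; the padding and field name are assumptions. */
struct fake_file {
    unsigned char pad[0x1c];
    int count;              /* the field read by the faulting instruction */
};

/* Address the CPU would touch when reading ->count through `base`. */
static uintptr_t fault_address(uintptr_t base)
{
    return base + offsetof(struct fake_file, count);
}
```

Calling `fault_address(0)` gives 0x1c, which is consistent with fput() having been handed a NULL pointer.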

Right before the oops, I got these lines in dmesg:

ind //email.txt failed, error=-5
smb_lookup: find //email.txt failed, error=-5
smb_retry: signal failed, error=-3
smb_lookup: find //email.txt failed, error=-5
smb_get_length: recv error = 512
smb_request: result -512, setting invalid
smb_dont_catch_keepalive: did not get valid server!

Note especially the last line: according to syslog, it was logged in the
same second as the oops.

Note that the smb share in question is mounted, alive and well as of this
moment, I can read files on it just fine - it's just the umount of it that
oopsed.

This is a production server in heavy use, so no _too_ experimental patches
please, can't reboot it very often :-/

Any fixes handy, anyone? Can't seem to find anything that would fix this
in the 2.2.21pre changelog...

Please CC: me, I'm not on either of the linux-kernel or samba lists.

--
Erik I. Bolsø, Triangel Maritech Software AS | Skybert AS
Tlf: 712 41 694         Mobil: 915 79 512

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


2.2.20 umount oops (probably smbfs related)

Post by Urban Widmark » Sat, 13 Apr 2002 05:30:11



> Process umount (pid: 30793, process nr: 116, stackpage=cb429000)
> Stack: 00000000 cd8ef644 cd8ef644 cd8ef600 00000004 c012914e cd8ef600 00000004
>        fffffffa c14f0004 ce6a8188 c01291f8 00000004 00000000 00000000 00000000
>        08050004 c14f2a00 00000000 c01292ed 00000004 00000000 cb428000 08051ea9
> Call Trace: [<c012914e>] [<c01291f8>] [<c01292ed>] [<c0129308>] [<c0109144>]
> Code: 8b 43 1c 48 75 35 53 e8 9f 9b 00 00 53 e8 31 ee ff ff c7 43

> >>EIP: c0126389 <fput+5/48>
> Trace: c012914e <do_umount+ee/144>
> Trace: c01291f8 <umount_dev+54/9c>
> Trace: c01292ed <sys_umount+ad/bc>
> Trace: c0129308 <sys_oldumount+c/10>
> Trace: c0109144 <system_call+34/38>
> Code:  c0126389 <fput+5/48>                    00000000 <_EIP>: <===
> Code:  c0126389 <fput+5/48>                       0: 8b 43 1c                movl   0x1c(%ebx),%eax <===

Your trace doesn't include any smb_ references, but I suppose the cd8ef644
ones might be. I don't see where do_umount calls fput so ...

> Right before the oops, I got these lines in dmesg:

> ind //email.txt failed, error=-5
> smb_lookup: find //email.txt failed, error=-5
> smb_retry: signal failed, error=-3

"signal failed, error=-3" means that smbmount is no longer with us. When
that happens smbfs can't get a new connection when the connection is lost
(which is a normal event).

This is usually bad and you may want to investigate why it died/upgrade
your samba version regardless of the patch below. Recent smbmounts can log
to file and with a suitable debuglevel you may find out what happened
(debug=4 or so).
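
For reference, the negative numbers in these log lines are kernel errno values. The sketch below is a small lookup for the three that appear in this thread. `smbfs_errname` is a hypothetical helper, not part of smbfs. The numeric values are the Linux ones: 3 is ESRCH ("no such process", consistent with smbmount being gone), 5 is EIO, and 512 is the kernel-internal ERESTARTSYS, which has no user-space definition in `<errno.h>`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: name the errno values seen in the smbfs log lines.
 * Accepts either sign, since the log prints both -512 and 512. */
static const char *smbfs_errname(int err)
{
    switch (abs(err)) {
    case ESRCH: return "ESRCH";        /* 3 on Linux: no such process */
    case EIO:   return "EIO";          /* 5 on Linux: I/O error */
    case 512:   return "ERESTARTSYS";  /* kernel-internal value */
    default:    return "unknown";
    }
}
```

For example, `smbfs_errname(-3)` names the code printed by the "signal failed" line.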

> smb_lookup: find //email.txt failed, error=-5
> smb_get_length: recv error = 512
> smb_request: result -512, setting invalid
> smb_dont_catch_keepalive: did not get valid server!

smbfs unmount code "put_super" does:
        if (server->sock_file) {
                smb_proc_disconnect(server);
                smb_dont_catch_keepalive(server);
                fput(server->sock_file);
        }

I think what happened is that there was a server->sock_file, but that the
tcp connection behind it was actually dead. -5 is an indication of that.

When it tries to send the disconnect message in smb_proc_disconnect it
detects this, closes sock_file and sets it to NULL.

smb_dont_catch_keepalive prints that error message on a NULL sock_file.

Then when the fput is run, the put_super code still assumes there is a
sock_file, because there was one when the if was evaluated ...
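
The sequence described above can be modelled in user space. This is only a sketch: `struct fake_server`, `fake_disconnect`, `fake_fput`, and both `put_super_*` functions are hypothetical stand-ins for the smbfs code, and fput is replaced by a counter so that nothing is actually dereferenced. It shows why a pointer tested once in the `if` cannot be trusted after a call that may clear it.

```c
#include <stddef.h>

struct fake_file { int count; };

struct fake_server {
    struct fake_file *sock_file;
    int conn_dead;      /* 1 if the TCP connection behind sock_file is gone */
};

static int null_fputs;  /* how many times fput would have received NULL */

/* Stand-in for fput(): record a NULL argument instead of oopsing. */
static void fake_fput(struct fake_file *f)
{
    if (f == NULL) {
        null_fputs++;
        return;
    }
    f->count--;
}

/* Stand-in for smb_proc_disconnect(): on a dead connection it closes
 * the socket file and clears the pointer, as described above. */
static void fake_disconnect(struct fake_server *s)
{
    if (s->conn_dead && s->sock_file != NULL) {
        fake_fput(s->sock_file);
        s->sock_file = NULL;
    }
}

/* Mirrors the quoted put_super: the pointer is tested only once. */
static void put_super_unchecked(struct fake_server *s)
{
    if (s->sock_file) {
        fake_disconnect(s);
        fake_fput(s->sock_file);   /* may be NULL by now */
    }
}

/* Defensive variant: test the pointer again after the disconnect. */
static void put_super_rechecked(struct fake_server *s)
{
    if (s->sock_file) {
        fake_disconnect(s);
        if (s->sock_file)
            fake_fput(s->sock_file);
    }
}
```

With `conn_dead` set, `put_super_unchecked` passes NULL to `fake_fput`, while `put_super_rechecked` does not. The re-check is one possible defence shown for illustration; it is not something proposed in this thread.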

If that is what happened the patch below should help. It simply changes
smbfs not to try and send a disconnect message if it isn't connected.
Which makes sense anyway, no need to connect just to say goodbye. Even if
that may be the polite thing to do :)

> Note that the smb share in question is mounted, alive and well as of this
> moment, I can read files on it just fine - it's just the umount of it that
> oopsed.

Sounds strange. Could that be some automounter that mounted another one
for you?

If the patch below doesn't work, try just removing the smb_proc_disconnect
line from put_super. Closing the file disconnects anyway.

/Urban

diff -urN -X exclude linux-2.2.20-orig/fs/smbfs/proc.c linux-2.2.20-smbfs/fs/smbfs/proc.c
--- linux-2.2.20-orig/fs/smbfs/proc.c   Thu Apr 11 21:25:09 2002

 int
 smb_proc_disconnect(struct smb_sb_info *server)
 {
-       int result;
+       int result = -EIO;
+
        smb_lock_server(server);
+        if (server->state != CONN_VALID)
+                goto out;
+
        smb_setup_header(server, SMBtdis, 0, 0);
        result = smb_request_ok(server, SMBtdis, 0, 0);
+
+out:
        smb_unlock_server(server);
        return result;
 }
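
To see the effect of the guard in isolation, here is a user-space sketch of the same check. Everything in it is a stand-in: `enum fake_state`, `struct fake_conn`, and `fake_proc_disconnect` are hypothetical names, a counter replaces the real SMBtdis request, and the server locking is left out. Only the shape of the check (return -EIO without sending anything unless the state is valid) is taken from the diff.

```c
#include <errno.h>

enum fake_state { FAKE_CONN_VALID, FAKE_CONN_INVALID };

struct fake_conn {
    enum fake_state state;
    int requests_sent;   /* counts simulated tree-disconnect requests */
};

/* Mirrors the patched function: default to -EIO and skip the request
 * entirely when the connection is not in the valid state. */
static int fake_proc_disconnect(struct fake_conn *c)
{
    int result = -EIO;

    if (c->state != FAKE_CONN_VALID)
        goto out;

    c->requests_sent++;  /* stands in for building and sending SMBtdis */
    result = 0;
out:
    return result;
}
```

With the state invalid, no request is attempted, so the code path that closes and clears sock_file on a dead connection is never reached from here.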


2.2.20 umount oops (probably smbfs related)

Post by Erik Inge Bolsø » Sat, 13 Apr 2002 17:40:06




> > >>EIP: c0126389 <fput+5/48>
> > Trace: c012914e <do_umount+ee/144>
> > Trace: c01291f8 <umount_dev+54/9c>
> > Trace: c01292ed <sys_umount+ad/bc>
> > Trace: c0129308 <sys_oldumount+c/10>
> > Trace: c0109144 <system_call+34/38>
> > Code:  c0126389 <fput+5/48>                    00000000 <_EIP>: <===
> > Code:  c0126389 <fput+5/48>                       0:    8b 43 1c                movl   0x1c(%ebx),%eax <===

> Your trace doesn't include any smb_ references, but I suppose the cd8ef644
> ones might be. I don't see where do_umount calls fput so ...

Right. Seems that the somewhat ancient ksymoops (0.6e) didn't pick up the
smbfs module's symbols. Will update.

> This is usually bad and you may want to investigate why it died/upgrade
> your samba version regardless of the patch below. Recent smbmounts can log
> to file and with a suitable debuglevel you may find out what happened
> (debug=4 or so).

Thanks for the tip. Upgrading Samba from 2.0.6 to 2.0.10 ASAP.

> > smb_lookup: find //email.txt failed, error=-5
> > smb_get_length: recv error = 512
> > smb_request: result -512, setting invalid
> > smb_dont_catch_keepalive: did not get valid server!

> smbfs unmount code "put_super" does:
>    if (server->sock_file) {
>            smb_proc_disconnect(server);
>            smb_dont_catch_keepalive(server);
>            fput(server->sock_file);
>    }

<snip good explanation>

Aha! I traced it as far as these lines myself yesterday, but couldn't
figure out what nulled sock_file, and why. Thanks!

> If that is what happened the patch below should help. It simply changes
> smbfs not to try and send a disconnect message if it isn't connected.
> Which makes sense anyway, no need to connect just to say goodbye. Even if
> that may be the polite thing to do :)

Thanks, will try the patch as soon as I find time to rebuild. Looks sane
:)

> > Note that the smb share in question is mounted, alive and well as of this
> > moment, I can read files on it just fine - it's just the umount of it that
> > oopsed.

> Sounds strange. Could that be some automounter that mounted another one
> for you?

Could be, I suppose. No automounter running, but the script that oopsed is
run once an hour and does an umount/mount to deal with the Windows server
being rebooted - we want the share to stay mounted, no matter if we reboot
the old NT4 box. (If we reboot it and don't do this, we get I/O errors on
accessing the mount point.)

--
Erik I. Bolsø, Triangel Maritech Software AS | Skybert AS
Tlf: 712 41 694         Mobil: 915 79 512
