2.4.20 nfs bug

Post by Bas Zoetekou » Tue, 11 Feb 2003 21:10:17



Hi guys!

Yesterday I encountered the following bug in my 2.4.20 kernel.  My
machine froze (not instantaneously, but disk reads didn't work; I had to
hard reset my machine), and the following entries were written to syslog:

Feb  9 17:40:31 localhost kernel: kernel BUG at inode.c:1034!
Feb  9 17:40:31 localhost kernel: invalid operand: 0000
Feb  9 17:40:31 localhost kernel: CPU:    0
Feb  9 17:40:31 localhost kernel: EIP:    0010:[iput+552/592]    Tainted: P
Feb  9 17:40:31 localhost kernel: EFLAGS: 00010246
Feb  9 17:40:31 localhost kernel: eax: 00000001   ebx: c5d01580   ecx: c5d01724   edx: c5d01701
Feb  9 17:40:31 localhost kernel: esi: 00000000   edi: ccf19c00   ebp: ccb9b540   esp: cde1ff04
Feb  9 17:40:31 localhost kernel: ds: 0018   es: 0018   ss: 0018
Feb  9 17:40:31 localhost kernel: Process rpciod (pid: 183, stackpage=cde1f000)
Feb  9 17:40:31 localhost kernel: Stack: c9da0d40 ca123e3c 00000046 c9da0cc0 c5d01580 c5d01724 d4ac4dfa c5d01580
Feb  9 17:40:31 localhost kernel:        c9da0cd0 c9da0cc0 c10db3e8 d4ac40cc c9da0cc0 c7d02294 cea647c0 ccb9b540
Feb  9 17:40:31 localhost kernel:        d4ac7c90 cf8ac618 d4a9b1a5 ccb9b668 ccb9b5e4 ccb9b5c8 ccb9b594 ccb9b540
Feb  9 17:40:31 localhost kernel: Call Trace:    [nfs:__insmod_nfs_S.text_L58808+44410/58808] [nfs:__insmod_nfs_S.text_L58808+41036/58808] [nfs:__insmod_nfs_S.text_L58808+56336/58808] [nfs:__insmod_nfs_O/lib/modules/2.4.20/kernel/fs/nfs/nfs.o_M3E3E+-126555/128] [nfs:__insmod_nfs_O/lib/modules/2.4.20/kernel/fs/nfs/nfs.o_M3E3E+-112815/128]
Feb  9 17:40:31 localhost kernel:   [schedule+513/832] [nfs:__insmod_nfs_O/lib/modules/2.4.20/kernel/fs/nfs/nfs.o_M3E3E+-111907/128] [nfs:__insmod_nfs_O/lib/modules/2.4.20/kernel/fs/nfs/nfs.o_M3E3E+-109948/128] [nfs:__insmod_nfs_O/lib/modules/2.4.20/kernel/fs/nfs/nfs.o_M3E3E+-110112/128] [kernel_thread+46/64] [nfs:__insmod_nfs_O/lib/modules/2.4.20/kernel/fs/nfs/nfs.o_M3E3E+-69600/128]
Feb  9 17:40:31 localhost kernel:   [nfs:__insmod_nfs_O/lib/modules/2.4.20/kernel/fs/nfs/nfs.o_M3E3E+-110112/128]
Feb  9 17:40:31 localhost kernel:
Feb  9 17:40:31 localhost kernel: Code: 0f 0b 0a 04 43 4e 27 c0 e9 fb fd ff ff c7 04 24 2c 00 00 00

This is an Athlon 1800+, 256MB memory, SiS mobo, IDE (SiS 5513), SCSI
(AdvanSys), running 2.4.20 with the 20021212-2.4.20 ACPI patch from
acpi.sf.net.  The machine has nfs-utils version 1.0.2 installed (Debian
unstable) and is running nfs-kernel-server.  There were NFS-mounted
disks from a client that was also running Debian unstable with nfs-utils
1.0.2.  The mount options were rsize=8192,wsize=8192,nfsvers=3,bg,intr.

The NFS-related lines from my .config are:

CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
# CONFIG_ROOT_NFS is not set
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
CONFIG_NFSD_TCP=y
CONFIG_SUNRPC=m
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y

I hope this is of some help to you.

--
Kind regards,
+---------------------------------------------------------------+
| Bas Zoetekouw                  | If one knows exactly what    |
|--------------------------------| one is going to do, what...  |
| GPG key: 0644fab7              |               Pablo Picasso  |
+---------------------------------------------------------------+


1. NFS/UDP/IP performance - 2.4.19 v/s 2.4.20, 2.4.20-pre3

Greetings.

There seems to be a remarkable performance difference
between 2.4.19 and 2.4.20/2.4.21-pre3 with regard to
NFS writes/reads. I am not sure, but the problem may not
be in NFS but somewhere lower (UDP/IP or core).

For example, with my kernel and network configuration,
writing 5MB to a new file over NFS takes 2.5 seconds or
so on 2.4.19. With everything else the same (including
the kernel configuration), the same write takes 11 or
more seconds on 2.4.20 and 2.4.21-pre3.
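A measurement like the one above can be reproduced with a small script. The sketch below is an illustration under stated assumptions, not the program used in this report: it writes a fixed amount of zeros (5 MiB by default) to a new file and times the whole operation, including a final fsync() so that the dirty pages have been sent to the NFS server before the clock stops. The function names, the 8 KiB chunk size and the example path /mnt/nfs/testfile are hypothetical.

```python
import os
import time

MIB = 1024 * 1024


def time_nfs_write(path, total_bytes=5 * MIB, chunk_bytes=8192):
    """Write total_bytes of zeros to a new file at path and return the
    elapsed wall-clock time in seconds, including the final fsync().
    (Hypothetical helper; chunk size chosen arbitrarily.)"""
    buf = b"\0" * chunk_bytes
    start = time.monotonic()
    # O_CREAT | O_TRUNC: start from a new, empty file on every run.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        remaining = total_bytes
        while remaining > 0:
            # os.write() may write fewer bytes than requested, so loop
            # on what is actually left rather than on a fixed count.
            n = os.write(fd, buf[:min(chunk_bytes, remaining)])
            remaining -= n
        # Push dirty pages out to the server so the timing covers the
        # network transfer, not just copying into the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)
    return time.monotonic() - start


def throughput_mib_s(total_bytes, seconds):
    """Average throughput in MiB/s."""
    return total_bytes / MIB / seconds
```

For example, time_nfs_write("/mnt/nfs/testfile") on a (hypothetical) NFS mount returns the elapsed seconds. Treating 5MB as 5 MiB, the figures above work out to 5 / 2.5 = 2.0 MiB/s on 2.4.19 versus 5 / 11 ≈ 0.45 MiB/s on 2.4.20, roughly a 4.4x slowdown.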

Also, while this file write is in progress, system time
goes up to 15% on 2.4.19, whereas on 2.4.20/21-pre3 it
is about 4%. (I use sar/sysstat to measure this.)
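The system-time figure above comes from sar. As a rough cross-check that does not depend on sysstat, the kernel's aggregate CPU counters in /proc/stat can be sampled before and after the write. This is a minimal sketch assuming the first "cpu" line lists user, nice, system and idle counters (in clock ticks), optionally followed by further fields on newer kernels; sar's own %system accounting may differ in detail, so treat the result as comparable in spirit rather than identical. The function names are hypothetical.

```python
def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def system_percent(before, after):
    """Percentage of CPU time spent in system (kernel) mode between two
    samples. Field order is user, nice, system, idle, then (on newer
    kernels) iowait, irq, softirq, steal, guest, guest_nice. Only the
    first eight are summed, because guest time is already included in
    user/nice."""
    deltas = [b - a for a, b in zip(before, after)][:8]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[2] / total


def read_proc_stat(path="/proc/stat"):
    """Read the raw /proc/stat text (Linux only)."""
    with open(path) as f:
        return f.read()
```

To use it, call parse_cpu_line(read_proc_stat()) just before starting the copy and again right after it finishes, then pass both samples to system_percent().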

Memory accesses don't seem to be the issue either. A test
program that checks this shows the same times on both
kernels, and they are as I expect for the board I use.

"netstat -s", ifconfig and tcpdump traces don't seem to
point to dropped messages, collisions, retransmissions
etc.

The hardware configuration is PowerPC based, and there
are no changes in the board-specific I/O subsystem between
2.4.19 and 2.4.20/21-pre3. The same compiler was used to
build both kernels, and I have also tried GCC 3.2, with
the same results.

So I don't suspect this is a board- or compiler-related
issue.

I also see some differences in the handling of bottom
halves in net/core/dev.c between 2.4.19 and 2.4.20/21-pre3,
although I have not gone through them in enough detail to
assert that this is indeed the problem area.

Questions:

  - Has anyone seen this? Perhaps on other platforms (x86 etc)?
    Is there some tunable that has been added (or is different)
    after 2.4.19, and which needs to be tuned?

  - I have tried to enable kernel profiling to find
    potential problem areas in the code. But given the low
    CPU utilization during these copies, I am not sure it
    can give any useful info.

    Could anyone offer any ideas on how to debug this?

I would appreciate it if you would copy me on any responses to
this post; I don't subscribe to this list.

Best regards,
-Arun.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
