PROBLEM: NFS client hangs when X is running (2.4.20)

Post by Greg Wooledge » Sun, 13 Apr 2003 01:50:04



[1.] NFS client hangs when X is running (2.4.20)

[2.] The NFS client is running Linux 2.4.20 (but I also saw this same
problem on another system running 2.4.18).  The NFS server is running
OpenBSD 3.2.  Both are i386 systems.  I mount /home from the OpenBSD
system onto the Linux system with option "nolock".  Everything is
OK until I start X.  About 15 to 30 seconds after X starts, the NFS
mount hangs.  Any process which attempts to do anything to any file
on the NFS file system goes into catatonia and cannot be killed
(even with -9; ps shows 'D' for state).
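A minimal sketch of how the stuck processes described above could be listed, assuming a procps-style `ps` that accepts `-eo pid,stat,comm`. The function names `uninterruptible` and `scan_live` are hypothetical and not part of the original report.

```python
import subprocess


def uninterruptible(ps_output):
    """Return (pid, command) pairs for rows whose STAT column starts with 'D'.

    Expects the text produced by `ps -eo pid,stat,comm`, header row included.
    """
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[1].startswith("D"):
            hits.append((int(fields[0]), fields[2].strip()))
    return hits


def scan_live():
    """Run ps on the local machine and report processes in state D."""
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return uninterruptible(out)
```

Calling `scan_live()` while the mount is hung should list every process blocked on the NFS file system; an empty list means nothing is currently in uninterruptible sleep.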

This problem does not occur with any 2.2.x kernel.  I have tried building
2.4.20 with and without CONFIG_NFS_V3 enabled; same results.

[3.] Keywords: NFS client 2.4.20 OpenBSD XFree86 hang lock crash catatonic

[4.] Linux version 2.4.20 (root@griffon) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 Fri Apr 11 18:37:04 EDT 2003

[5.] nfs: server pegasus not responding, still trying
However, this is erroneous.  Pegasus (the OpenBSD box) responds
perfectly to ping, showmount -e, ssh and so on.  Any existing ssh
connections to pegasus continue working, even ones I started in an
rxvt window in the 15-30 second period when the NFS subsystem hadn't
locked up yet.  No other errors are reported.

[6.] mount /home; startx; sleep 30

[7.1] (Note: I modified Makefile to use gcc 2.95 instead of 3.2)
Gnu C                  3.2.3
Gnu make               3.80
util-linux             2.11z
mount                  2.11z
modutils               2.4.21
e2fsprogs              1.33-WIP
PPP                    2.4.1
Linux C Library        2.3.1
Dynamic linker (ldd)   2.3.1
Procps                 3.1.8
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               4.5.10
Modules Loaded         serial 3c59x

[7.2]
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2000+
stepping        : 0
cpu MHz         : 1667.388
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3329.22

[7.3]
serial                 42436   0 (autoclean)
3c59x                  25040   1

[7.4]
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(set)
0cf8-0cff : PCI conf1
c000-cfff : PCI Bus #01
  c000-c0ff : PCI device 1002:4966 (ATI Technologies Inc)
d000-d01f : Creative Labs SB Live! EMU10k1
d400-d407 : Creative Labs SB Live! MIDI/Game Port
d800-d87f : 3Com Corporation 3c905C-TX/TX-M [Tornado]
  d800-d87f : 00:0b.0
dc00-dc1f : VIA Technologies, Inc. USB
e000-e01f : VIA Technologies, Inc. USB (#2)
e400-e41f : VIA Technologies, Inc. USB (#3)
e800-e80f : VIA Technologies, Inc. VT82C586B PIPC Bus Master IDE
  e800-e807 : ide0
  e808-e80f : ide1
ec00-ecff : VIA Technologies, Inc. VT6102 [Rhine-II]

00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000d0000-000d07ff : Extension ROM
000f0000-000fffff : System ROM
00100000-1ffeffff : System RAM
  00100000-00243dd0 : Kernel code
  00243dd1-002c3e23 : Kernel data
1fff0000-1fff2fff : ACPI Non-volatile Storage
1fff3000-1fffffff : ACPI Tables
c0000000-cfffffff : PCI device 1106:3189 (VIA Technologies, Inc.)
d0000000-dfffffff : PCI Bus #01
  d0000000-d7ffffff : PCI device 1002:4966 (ATI Technologies Inc)
  d8000000-dfffffff : PCI device 1002:496e (ATI Technologies Inc)
e0000000-e00fffff : PCI Bus #01
  e0020000-e002ffff : PCI device 1002:4966 (ATI Technologies Inc)
  e0030000-e003ffff : PCI device 1002:496e (ATI Technologies Inc)
e0120000-e012007f : 3Com Corporation 3c905C-TX/TX-M [Tornado]
e0121000-e01210ff : VIA Technologies, Inc. USB 2.0
  e0121000-e01210ff : ehci-hcd
e0122000-e01220ff : VIA Technologies, Inc. VT6102 [Rhine-II]
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
ffff0000-ffffffff : reserved

[7.5]
00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400 AGP] Host Bridge
        Subsystem: ABIT Computer Corp.: Unknown device 1401
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 8
        Region 0: Memory at c0000000 (32-bit, prefetchable) [size=256M]
        Capabilities: [a0] AGP version 2.0
                Status: RQ=32 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3- Rate=x1,x2,x4
                Command: RQ=32 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none>
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000c000-0000cfff
        Memory behind bridge: e0000000-e00fffff
        Prefetchable memory behind bridge: d0000000-dfffffff
        BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0a.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
        Subsystem: Creative Labs: Unknown device 8065
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (500ns min, 5000ns max)
        Interrupt: pin A routed to IRQ 3
        Region 0: I/O ports at d000 [size=32]
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0a.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 0a)
        Subsystem: Creative Labs Gameport Joystick
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32
        Region 0: I/O ports at d400 [size=8]
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0b.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
        Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2500ns min, 2500ns max), cache line size 08
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at d800 [size=128]
        Region 1: Memory at e0120000 (32-bit, non-prefetchable) [size=128]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

00:10.0 USB Controller: VIA Technologies, Inc. USB (rev 80) (prog-if 00 [UHCI])
        Subsystem: ABIT Computer Corp.: Unknown device 1401
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, cache line size 08
        Interrupt: pin A routed to IRQ 11
        Region 4: I/O ports at dc00 [size=32]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:10.1 USB Controller: VIA Technologies, Inc. USB (rev 80) (prog-if 00 [UHCI])
        Subsystem: ABIT Computer Corp.: Unknown device 1401
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, cache line size 08
        Interrupt: pin B routed to IRQ 3
        Region 4: I/O ports at e000 [size=32]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:10.2 USB Controller: VIA Technologies, Inc. USB (rev 80) (prog-if 00 [UHCI])
        Subsystem: ABIT Computer Corp.: Unknown device 1401
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, cache line size 08
        Interrupt: pin C routed to IRQ 5
        Region 4: I/O ports at e400 [size=32]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82) (prog-if 20 [EHCI])
        Subsystem: ABIT Computer Corp.: Unknown device 1401
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, cache line size 08
        Interrupt: pin D routed to IRQ 10
        Region 0: Memory at e0121000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [80] Power
...



Post by Trond Myklebust » Sun, 13 Apr 2003 11:50:08


Sounds very much like a network card driver problem.

     > [5.] nfs: server pegasus not responding, still trying However,
     > this is erroneous.  Pegasus (the OpenBSD box) responds
     > perfectly to ping, showmount -e, ssh and so on.  Any existing
     > ssh connections to pegasus continue working, even ones I
     > started in an rxvt window in the 15-30 second period when the
     > NFS subsystem hadn't locked up yet.  No other errors are
     > reported.

.... and a tcpdump would show?

Cheers,
  Trond
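One way to gather the capture asked for above is sketched here. It only builds the tcpdump command line; the interface name `eth0`, the helper name `tcpdump_argv`, and the choice to also match IP fragments are assumptions, not something stated in the thread. Port 2049 is the standard NFS port, and non-first fragments carry no port number, so a plain port filter would miss them.

```python
def tcpdump_argv(server, iface="eth0"):
    """Build a tcpdump command line for NFS traffic to or from `server`.

    The filter matches port 2049 and, in addition, any IP fragment
    (MF flag set or non-zero fragment offset), since later fragments
    of a large UDP datagram cannot be matched by port.
    """
    expr = "host {} and (port 2049 or ((ip[6:2] & 0x3fff) != 0))".format(server)
    # -n: do not resolve names; -i: capture on the given interface
    return ["tcpdump", "-n", "-i", iface, expr]
```

For example, `tcpdump_argv("pegasus")` yields an argument list that can be passed to `subprocess.run` as root on the client while reproducing the hang.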
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Post by Greg Wooledge » Sun, 13 Apr 2003 21:20:09



> Sounds very much like a network card driver problem.

I don't think so, simply because this has happened to me on two
completely different Linux systems.  One (griffon) has a 3c59x
card, and the other (dwarf) had a tulip and an ne2k-pci.

(Or did you mean on the server side?  The OpenBSD box is using two
RTL 8139 cards.)

>      > [5.] nfs: server pegasus not responding, still trying However,
>      > this is erroneous.  Pegasus (the OpenBSD box) responds
>      > perfectly to ping, showmount -e, ssh and so on.  Any existing
>      > ssh connections to pegasus continue working, even ones I
>      > started in an rxvt window in the 15-30 second period when the
>      > NFS subsystem hadn't locked up yet.  No other errors are
>      > reported.

> .... and a tcpdump would show?

I haven't tried that yet, but after discussing this with another
person who was having the same problems, we learned that OpenBSD's
firewall (pf) was blocking the Linux packets when configured with
"scrub in all" (which is recommended in the OpenBSD FAQs).

After commenting out the "scrub in all" and rebooting back to 2.4.20,
I have X and NFS working simultaneously.
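The thread does not say why "scrub in all" stopped the traffic. One behavior documented in pf.conf(5) that would fit is that scrub drops fragmented packets carrying the don't-fragment bit unless the `no-df` option is given, and the man page notes that NFS traffic from some systems looks like this. As a sketch under that assumption, the following checks a raw IPv4 header for the combination. The names `is_df_fragment` and `make_header` are hypothetical.

```python
import struct

DF = 0x4000           # don't-fragment flag
MF = 0x2000           # more-fragments flag
OFFSET_MASK = 0x1FFF  # 13-bit fragment offset


def is_df_fragment(ip_header):
    """True if the IPv4 header describes a fragment that also has DF set."""
    if len(ip_header) < 20:
        raise ValueError("need at least a 20-byte IPv4 header")
    (flags_frag,) = struct.unpack_from("!H", ip_header, 6)
    is_fragment = bool(flags_frag & MF) or (flags_frag & OFFSET_MASK) != 0
    return is_fragment and bool(flags_frag & DF)


def make_header(flags_frag):
    """Build a minimal 20-byte IPv4/UDP header for experiments (checksum left 0)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 1500, 1, flags_frag, 64, 17, 0,
        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),
    )
```

Running `is_df_fragment` over the headers from a packet capture taken on the client would show whether the NFS datagrams really match this pattern before changing the firewall rules.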

--
Greg Wooledge                  |   "Truth belongs to everybody."

http://wooledge.org/~greg/     |


1. NFS/UDP/IP performance - 2.4.19 v/s 2.4.20, 2.4.20-pre3

Greetings.

There seems to be a remarkable performance difference
between 2.4.19 and 2.4.20/2.4.21-pre3 with regard to
NFS writes/reads. I am not sure, but the problem may not
be in NFS but somewhere lower (UDP/IP or core).

For example, in my kernel and network configuration a
write to a new file over NFS on 2.4.19 for 5MB takes 2.5
seconds or so. With everything same (including kernel
configuration) 2.4.20 and 2.4.21-pre3 the same takes
11 or more seconds.
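The figures above work out to roughly 2 MB/s on 2.4.19 versus under 0.5 MB/s on the newer kernels. A minimal sketch of how such a measurement could be repeated is shown here; the function names, the 32 KB chunk size, and the use of fsync to force the data out before stopping the clock are assumptions, not the poster's actual test.

```python
import os
import time


def timed_write(path, size=5 * 1024 * 1024, chunk=32 * 1024):
    """Write `size` bytes to a new file at `path` and return elapsed seconds."""
    buf = b"\0" * chunk
    start = time.monotonic()
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(buf[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the data has left the client cache
    return time.monotonic() - start


def throughput_mb_s(size_bytes, seconds):
    """Convert a byte count and a duration into MB/s (1 MB = 1024 * 1024 bytes)."""
    return size_bytes / (1024 * 1024) / seconds
```

Pointing `timed_write` at a path on the NFS mount under each kernel, and feeding the result to `throughput_mb_s`, gives numbers directly comparable to the 2.5 and 11 second timings quoted above.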

Also, when this file write is in progress, the system
time goes up to 15% on 2.4.19, whereas on 2.4.20/21-pre3,
it is about 4%. (I use sar/sysstat for this).

Memory accesses don't seem to be the issue either. A test
program to check this shows the same times on both, and they
are OK (as I expect on the board I use).

"netstat -s", ifconfig and tcpdump traces don't seem to
point to dropped messages, collisions, retransmissions
etc.

The hardware configuration is PowerPC based, and there
are no changes in the board-specific IO subsystem between
2.4.19 and 2.4.20/21-pre3. The same compiler is used for
building both kernels, and I have tried this even with
GCC 3.2, with the same results.

So, I don't suspect this is either a board- or
compiler-related issue.

Also, I see some differences in the handling of the bottom
halves in net/core/dev.c between 2.4.19 and 2.4.20/21-pre3,
although I have not gone through these in enough detail to
assert that this is indeed the problem area.

Questions:

  - Has anyone seen this? Perhaps on other platforms (x86 etc)?
    Is there some tunable that has been added (or is different)
    after 2.4.19, and which needs to be tuned?

  - I have tried to enable kernel profiling to find any
    potential problem code areas. But given the low cpu
    utilization during these copies, I am not sure if this
    can give any useful info.

    Could anyone offer any ideas to debug this?

I would appreciate it if you copy me on any responses to this post; I
don't subscribe to this list.

Best regards,
-Arun.

