very slow NFS from SGI to Tru64 5.0 server

Post by Mark Wiederspa » Thu, 30 Mar 2000 04:00:00



very slow NFS v3 from SGI to Tru64 5.0 server

Digital alpha pc 164lx with de500 tulip NIC, connected to 3com
3900 switch connected to SGI Origin with gigabit enet interface. Unpatched
system initially; patched with t64v50as0001-19991025 from the public
patch set with no significant result. NIC is run non-autonegotiate 100fdx.
Switch shows no data errors, from or to the alpha or sgi. All the kernel
parameters are stock. 1 Gbyte memory, 8Mb cache, 667 MHz.
sys_check does not show anything obvious.

binary ftp writing to alpha from sgi is 6ish Mbytes/sec; same speed reading.
similar speed ftp from alpha writing to sgi, or reading from sgi.

nfs v3 udp or tcp transfers with sgi as server, alpha as client are all
above 5 Mbyte/sec.

nfs v3 udp or tcp reads from alpha server to sgi are 5ish Mbyte/sec.

nfs v3 udp or tcp writes to alpha from sgi are 0.8 Mbyte/sec !
nfs mount rsize,wsize small or big doesn't seem to change anything.

lots (1-4/sec) of input errors shown by netstat -i
netstat -I tu1 -s shows:

tu1 Ethernet counters at Wed Mar 29 01:37:20 2000

             179 seconds since last zeroed
       272641828 bytes received
         6549810 bytes sent
          186580 data blocks received
           99526 data blocks sent
           12823 multicast bytes received
             138 multicast blocks received
             588 multicast bytes sent
               6 multicast blocks sent
               0 blocks sent, initially deferred
               0 blocks sent, single collision
               0 blocks sent, multiple collisions
               0 send failures
               0 collision detect check failure
             350 receive failures, reasons include:
               0 unrecognized frame destination
               0 data overruns
               0 system buffer unavailable
               0 user buffer unavailable
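
For reference, the counters above can be turned into rates with a few lines of Python. This is only a rough sketch: the numbers are copied from the tu1 output in this post, 1 Mbyte is taken as 10^6 bytes, and the function name counter_rates is made up for illustration.

```python
def counter_rates(seconds, bytes_received, blocks_received, receive_failures):
    """Turn raw interface counters into per-second rates and a failure ratio."""
    return {
        "failures_per_sec": receive_failures / seconds,
        "mbytes_per_sec": bytes_received / seconds / 1e6,
        "blocks_per_sec": blocks_received / seconds,
        "failure_fraction": receive_failures / blocks_received,
    }


# Values copied from the tu1 counters above.
rates = counter_rates(seconds=179, bytes_received=272641828,
                      blocks_received=186580, receive_failures=350)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the receive failures come to roughly 2 per second, in line with the 1-4/sec seen in netstat -i, and to about 0.2% of the blocks received.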

nothing in /var/adm/messages except:

Mar 28 19:03:28 acig0 vmunix: tu1: transmit FIFO underflow: threshold
raised to: 256 bytes
Mar 28 19:05:50 acig0 vmunix: tu1: transmit FIFO underflow: threshold
raised to: 512 bytes
Mar 28 19:05:50 acig0 vmunix: tu1: transmit FIFO underflow: threshold
raised to: 1024 bytes
(but this is going the wrong way, and only shows up once per system boot).

nfs v3 tcp writes from Suns to this same system are 3+ Mbytes/sec; slow
but not impossible.
There are network errors at about the same rate, but they don't seem to
affect the transfer rate the way they do for the SGI.

This enet switch has been used for some while with the same alpha, running
NT with Kingston NICs and Hummingbird Maestro NFS, without these problems.
Tru64 is a new trial install, since Maestro doesn't support >4Gb files.

hints appreciated.

mw

--
Mark Wiederspahn
Senior System Analyst

 
 
 

Post by Olivier Deme » Sat, 01 Apr 2000 04:00:00


Had a similar problem with net cards installed in our Digital and connecting
to a switch.
(don't have the part numbers and all, but it's a 100 Mb full-duplex net card
and a 100 Mb switch with Gb fiber backbone; it happened with both DU 4.0 and
Tru64)

The only way to have good performance is to force the switch port and the
ethernet card on the Digital to 100 Mb half-duplex.

I'll be happy to hear of any other way to make it work full-duplex, though.

Olivier.

(anti-spam: remove .xxx from the e-mail address to respond to me via mail.)



 
 
 

Post by Eric Werme - replace nospam with wer » Sat, 01 Apr 2000 04:00:00


Very odd.  You've done a good job recording data and trying other
things; you haven't left me with anything easy to suggest.  I did pass
your post to a driver person.

Tcpdump traces are always worthwhile.  See the Digital Unix/Tru64
FAQ for notes on setup and use.


>tu1 Ethernet counters at Wed Mar 29 01:37:20 2000
>             179 seconds since last zeroed
...
>             350 receive failures, reasons include:
>               0 unrecognized frame destination

...

Odd that no reasons are listed.  On one busy NIC here:

tu4 Ethernet counters at Thu Mar 30 20:13:58 2000

           65535 seconds since last zeroed
      2292031989 bytes received
       455658252 bytes sent
        18648668 data blocks received
          922294 data blocks sent
      1996473052 multicast bytes received
        18148613 multicast blocks received
         1360956 multicast bytes sent
           14307 multicast blocks sent
           53785 blocks sent, initially deferred
           20737 blocks sent, single collision
           24614 blocks sent, multiple collisions
               9 send failures, reasons include:
                Excessive collisions
               0 collision detect check failure
            2093 receive failures, reasons include:
                Frame too long
               0 unrecognized frame destination
               0 data overruns
               0 system buffer unavailable
               0 user buffer unavailable

Quote:>nothing in /var/adm/messages except:
>Mar 28 19:03:28 acig0 vmunix: tu1: transmit FIFO underflow: threshold
>raised to: 256 bytes
>Mar 28 19:05:50 acig0 vmunix: tu1: transmit FIFO underflow: threshold
>raised to: 512 bytes
>Mar 28 19:05:50 acig0 vmunix: tu1: transmit FIFO underflow: threshold
>raised to: 1024 bytes
>(but this is going the wrong way, and only shows up once per system boot).

That's pretty much normal.

Quote:>nfs v3 tcp writes from Suns to this same system are 3+ Mbytes/sec; slow
>but not impossible.
>There are network errors at about the same rate, but it doesn't seem to
>affect the transfer
>rate as for the SGI.

What speed is the NIC on the Sun box?  Not that it'll help me.

        -Ric Werme
--
<>    Eric (Ric) Werme    <> The above is unlikely to contain    <>
<>    ROT-13 addresses:   <> official claims or policies of      <>


 
 
 

Post by Mark Wiederspah » Sun, 02 Apr 2000 04:00:00



> very slow NFS v3 from SGI to Tru64 5.0 server

in synopsis:
NFS v3/tcp writes over 100 Mbit ethernet average 0.8 Mbyte/sec
although ftp over the same path is fast, 6 Mbyte/sec.

thanks for all those who suggested things to try.

more observations:

It does not seem to be a network problem per se.

I have the same problem with half-duplex-100, fdx-100
and auto-negotiated fdx-100. I don't have any problem with
binary ftp from the SGI to the Alpha /dev/null - it runs
at 11.5 Mbyte/sec. Close monitoring of the packet count/sec
showed that during ftp to a disk file, every 70 Mbyte or so,
as the advfs cache fills and is flushed, the packets/sec
drop from 5000+/sec to a few hundred, and speed back up
after about 4 seconds when the disk cache flush is done.
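
To put these rates in perspective against the 100 Mbit link, here is a small back-of-the-envelope sketch. It assumes 1 Mbyte = 10^6 bytes and ignores Ethernet, IP, and TCP overhead, so the percentages are only approximate; the function name utilization is illustrative.

```python
LINK_MBIT = 100  # nominal Fast Ethernet rate from the post


def utilization(mbytes_per_sec, link_mbit=LINK_MBIT):
    """Fraction of the raw link rate used, taking 1 Mbyte = 10**6 bytes."""
    link_mbytes_per_sec = link_mbit / 8  # 12.5 Mbyte/sec for 100 Mbit
    return mbytes_per_sec / link_mbytes_per_sec


# Rates quoted in this thread.
for label, rate in [("ftp to /dev/null", 11.5),
                    ("ftp to disk", 6.0),
                    ("NFS v3 write", 0.8)]:
    print(f"{label}: {utilization(rate):.0%} of line rate")
```

On these assumptions the ftp to /dev/null is running close to wire speed, while the NFS write uses well under a tenth of the link.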

It turns out that any disk write (but not uncached read) while the
network is active seems to slow the net down by a noticeable amount.
This effect is most pronounced for inbound packets.
For example, while a binary ftp to Alpha /dev/null is going on,
8000 packets/sec drops to 100 packets/sec while the disk
io is active at 15-18 mbyte/sec, and then resumes full speed
when the io is done. (dd if=/dev/zero bs=256k count=1000 of=foo)
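
For anyone who wants to reproduce the disk load without dd, a minimal Python sketch of the same kind of sequential write follows. The function name timed_write is hypothetical, and the fsync call is an addition that dd does not make by default; it is there so the timing includes the flush to disk. The demo writes only a few blocks to a temporary directory.

```python
import os
import tempfile
import time


def timed_write(path, block_size=256 * 1024, count=1000):
    """Write `count` zero-filled blocks to `path`; return (bytes, seconds)."""
    block = bytes(block_size)
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the flush to disk in the timing
    elapsed = time.monotonic() - start
    return block_size * count, elapsed


# Small demo run; the defaults mirror bs=256k count=1000.
with tempfile.TemporaryDirectory() as tmp:
    total, secs = timed_write(os.path.join(tmp, "foo"), count=8)
    rate = total / max(secs, 1e-9) / 1e6
    print(f"{total} bytes in {secs:.3f} s = {rate:.1f} Mbyte/sec")
```

Called with the defaults, timed_write("foo") writes 262,144,000 bytes, the same amount as the dd command above; watching the packet rate while it runs should show whether the slowdown follows the disk activity.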

I've tried swapping the pci order of the network and disk boards.
No change. These are, btw, a Dec de500b and a Intraserver lvd board
based on a symbios 53c895 chipset (itpsa driver).

One more distressing observation: while the NFS write is plodding
along at 1-2 mbyte/sec, it is possible to do a binary ftp from
the same SGI which is writing nfs, over the same path, and have it
transfer data to /dev/null at 3.2 Mbyte/sec, or to the same disk
as the nfs file at 2.5 Mbyte/sec! (The total io to the disk increases,
so the ftp is not stealing the NFS bandwidth). This implies to me
that it is an internal NFS/advfs issue, rather than a network problem.

And the cpu never shows less than about 70% idle.

Any ideas?

thanks,
mw

 
 
 

Post by Jochen Lübber » Tue, 04 Apr 2000 04:00:00


Sorry that this is no solution, but I also got bad (or very bad) results with
NFS between two Alpha 1000A 400 boxes using DEC Unix 4.0B and 4.0E.

J.L.
--



 
 
 

Post by Eric Werme - replace nospam with wer » Tue, 04 Apr 2000 04:00:00



>It does not seem to be a network problem per se.

Kinda looking that way.

Quote:>It turns out, that any disk write (but not uncached read) while the
>network is active seems to slow the net down by a noticeable amount.

Disk writes are pretty expensive in AdvFS, so I'm not too surprised.
V5.0 saw a lot of performance tuning in AdvFS, so if you feel
ambitious....

Quote:>One more distressing observation: while the NFS write is plodding
>along at 1-2 mbyte/sec, it is possible to do a binary ftp from
>the same SGI which is writing nfs, over the same path, and have it
>transfer data to /dev/null at 3.2 Mbyte/sec, or to the same disk
>as the nfs file at 2.5 Mbyte/sec! (The total io to the disk increases,
>so the ftp is not stealing the NFS bandwidth). This implies to me
>that it is an internal NFS/advfs issue, rather than a network problem.

It's past time for my "tcpdump, dadump, dadump" chant.  One thing
AdvFS has some trouble with is I/O that isn't quite sequential.  FTP's
writes are completely sequential, NFS's are not due to client write
behind threads (biod/nfsiod), scheduling policy, number of CPUs, and the
order of processing by the NFS server threads.

I think that's another thing V5.0 handles better.

If you aren't familiar with tcpdump, see
http://www.unix.digital.com/unix/faq/network.html#N7
The sort of data that is interesting are the write offsets in the write
calls, whether they're stable or unstable (safe asynchronous writes),
and time between call and reply.
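
As a rough illustration of what to look for in the write offsets, the sketch below measures how sequential a series of writes is. It does not parse tcpdump output; the (offset, length) tuples are an assumed record format you would have to extract from the trace yourself, and the sample data is made up.

```python
def sequential_fraction(writes):
    """writes: list of (offset, length) tuples in arrival order.

    Returns the fraction of writes (after the first) that start exactly
    where the previous write ended.
    """
    if len(writes) < 2:
        return 1.0
    sequential = 0
    for (prev_off, prev_len), (off, _) in zip(writes, writes[1:]):
        if off == prev_off + prev_len:
            sequential += 1
    return sequential / (len(writes) - 1)


# Hypothetical 8 KB writes: strictly in order, and with two pairs swapped.
in_order = [(i * 8192, 8192) for i in range(6)]
reordered = [(0, 8192), (8192, 8192), (24576, 8192),
             (16384, 8192), (32768, 8192), (40960, 8192)]
print(sequential_fraction(in_order))
print(sequential_fraction(reordered))
```

An ftp transfer should score at or near 1.0 by this measure; a markedly lower figure for the NFS writes would fit the not-quite-sequential pattern described above.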

BTW, are you talking to Compaq support folks about this?  I hang out here
on a "time available" basis, and there isn't much of that....

        -Ric Werme
--
<>    Eric (Ric) Werme    <> The above is unlikely to contain    <>
<>    ROT-13 addresses:   <> official claims or policies of      <>


 
 
 

Post by Serguei Patchkovsk » Tue, 04 Apr 2000 04:00:00


: It's past time for my "tcpdump, dadump, dadump" chant.  One thing

At the risk of sounding silly, what is dadump?

/Serge.P

--
home page: http://www.cobalt.chem.ucalgary.ca/ps/

 
 
 

Post by Ric Werm » Tue, 04 Apr 2000 04:00:00



>: It's past time for my "tcpdump, dadump, dadump" chant.  One thing
>At the risk of sounding silly, what is dadump?

Just something silly to provide a little rhythm.  There's no dadump program
as far as I know.  Hmm.  Maybe I ought to write one.  I wonder what it
should do.

I'm known at work for not taking visitors at my office very seriously unless
they bring tcpdump traces.  I have most people pretty well trained.  :-)

        -Ric
--

http://people.ne.mediaone.net/werme  |       ^^^^^^^ delete

 
 
 

Post by Serguei Patchkovsk » Wed, 05 Apr 2000 04:00:00



: >: It's past time for my "tcpdump, dadump, dadump" chant.  One thing

: >At the risk of sounding silly, what is dadump?

: Just something silly to provide a little rhythm.  There's no dadump program
: as far as I know.  Hmm.  Maybe I ought to write one.  I wonder what it
: should do.

Oh, that would be quite easy. As anybody who used real computers knows,
'da' is an abbreviation of 'dasd' - and that, in turn, is one of those
wonderful, even if somewhat new-fangled, disk drives they seem to like
so much these days.

Just make sure you announce it on the next April 1st :-)

/Serge.P

--
home page: http://www.cobalt.chem.ucalgary.ca/ps/

 
 
 

1. HELP: Slow NFS client to server causing Very slow NFS install

My RH 5.2 NFS Install is SO SLOWWW!!

This is taking forever!!  The link speed currently is
284 MB in 17 hours!! or 4.6kbytes/s ~ 46kbps
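
The arithmetic behind those figures can be checked with a short sketch. It assumes 1 MB = 10^6 bytes; the function name throughput is illustrative. The ~46 kbps in the post matches a rule of thumb of 10 bits per byte, while a strict 8 bits per byte gives a somewhat lower number.

```python
def throughput(megabytes, hours, bits_per_byte=8):
    """Return (kbytes/s, kbit/s) for a transfer, taking 1 MB = 10**6 bytes."""
    bytes_per_sec = megabytes * 1e6 / (hours * 3600)
    return bytes_per_sec / 1e3, bytes_per_sec * bits_per_byte / 1e3


kb_s, kbit_s = throughput(284, 17)
_, kbit_s_10 = throughput(284, 17, bits_per_byte=10)
print(f"{kb_s:.1f} kbytes/s")
print(f"{kbit_s:.0f} kbit/s at 8 bits per byte")
print(f"{kbit_s_10:.0f} kbit/s at 10 bits per byte")
```

Either way the link is running several thousand times slower than the 100Mbps the cards report.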

Why is there such a slow server to client link?

I'm doing this using a Linksys Etherfast 10/100 card on a PII350 desktop and
a P75 laptop with the PCMCIA version of the Etherfast 10/100.  The status
LEDs on the cards both indicate that the connection is at 100Mbps, but the

See below for all of my network config files in /etc.

I believe that with host.conf set up for "order hosts,bind" I don't need a
nameserver.  I don't have a local name server set up.

I have tried modifying some of the parameters below during the installation,
running exportfs and turning eth0 off and on (using usernet on the desktop)
DURING THE INSTALLATION (it seems that RH5.2 mounts "hard" during NFS install,
i.e. it continues on after I turn eth0 back on), but with no change in
performance.

I did notice when I set the network up before the install that pinging from
client to server was odd: some packets took 10ms, others took 1000ms, in an
almost every-other-packet trade-off.  The ping from server to client was on
the order of 10 ms every packet.  Now with the install going on, ping from
server (desktop) is averaging 350ms with any packet size from 256 bytes to
8 kbytes.

I have the following setups in exports, host.conf, hosts, hosts.deny,
hosts.allow and resolv.conf:

 cat exports
/mnt/cdrom laptop(ro)  
/home/chris laptop(rw)

cat host.conf
order hosts,bind
multi on

 cat hosts
192.168.0.1     desktop.mylocal.net   desktop
192.168.0.3     laptop.mylocal.net   laptop
127.0.0.1       localhost       loopback

 cat hosts.deny
ALL: ALL
portmap: ALL

 cat hosts.allow
ALL: LOCAL
portmap: 192.168.0.0/255.255.255.0

 cat resolv.conf
domain mylocal.net
#nameserver 140.174.162.14    
#nameserver 140.174.162.10

I have tried resolv.conf with and without commenting out the nameservers
(my local isp nameservers), but with no effect.
