NFS V3 terrible performance when system is client & server

NFS V3 terrible performance when system is client & server

Post by Andy Penningt » Mon, 04 Mar 2002 02:10:24



Scenario is this: AIX 4.3.3, HACMP, using an NFS V3 mount accessing
the filesystem from the NFS Server via NFS client (in effect a
loopback, but across a gigabit net).

cd <the NFS mounted directory>
cp large-file another-file

Runs at about 1 Meg/minute (yes, 1 MB/minute) across a gigabit network.
The network otherwise seems fine. No nfsstat -c errors (well, few).

Switch to a V2 NFS mount, and it runs fine.

Anyone seen this? It only seems to happen when the filesystem is
accessed from the NFS server via an NFS 'loopback' mount, but this is
required for our app architecture.

Must be something stupid, but I can't see it. It happens on two
separate systems, with two separate networks, too.

Oh, and if I do the cp command a second time it runs quickly, which I
presume means it is buffered on the host somewhere.
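
A rough way to repeat the test with the client cache out of the picture
(mount point and file names here are just placeholders) would be something
like:

# cd /
# umount /nfsmnt
# mount /nfsmnt
# cd /nfsmnt
# timex cp large-file another-file

timex reports the elapsed time, so the first and second runs can be compared
directly.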

TIA, Andy.

 
 
 

NFS V3 terrible performance when system is client & server

Post by Nicholas Drone » Mon, 04 Mar 2002 04:25:14



> Scenario is this: AIX 4.3.3, HACMP, using an NFS V3 mount accessing
> the filesystem from the NFS Server via NFS client (in effect a
> loopback, but across a gigabit net).
> cd <the NFS mounted directory>
> cp large-file another-file
> Runs at about 1 Meg/minute (yes, 1MB/Minute) across a gigabit network,
> Network otherwise seems fine. No nfsstat -c errors (well, few)
> Switch to a V2 NFS mount, and it runs fine.
> Anyone seen this? It only seems to happen when the filesystem is
> accessed from the NFS server via an NFS 'loopback' mount, but this
> required for our app architecture.
> Must be something stupid, but I can't see it. It happens on two
> separate systems, with two separate networks, too.
> Oh, and if I do the cp command a second time it runs quickly, which i
> presume means it is buffered on the host somewhere.

Sigh.  I think my news client just dropped the response
I spent 20 minutes writing.

First, this has *nothing* to do with your gigabit network.
If the client and the server are on the same machine, all
network traffic between them will go through the loopback
interface.  You could have a quadraplex superpetabit network
and it still wouldn't matter.

I believe the problem is the difference between NFSv2 and
NFSv3.  This is pretty obvious given what you said here.

What I'd like to see is an iptrace on lo0 for all NFS
packets.  You only need to run this test with NFSv3.
We already know that NFSv2 performs well enough for
you.  What we want to know is what NFSv3 does that
causes it to be so pokey.

I think something like this should suffice:

# iptrace -a -b -i lo0 -p 2049 /tmp/nfs3.trc

This assumes the NFS server is listening on port 2049.  (iptrace wants a
log file argument; the file name above is only an example.)
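
To turn the binary trace into something readable, you can run ipreport over
it afterwards; something along these lines should do (file names again just
examples):

# ipreport -rns /tmp/nfs3.trc > /tmp/nfs3.txt

The -r flag decodes the RPC traffic, which is the part we care about here.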

The key questions are:

        * Does the client issue numerous NFS3PROC_COMMIT RPCs?
        * Does the last line of the NFS section of the trace
        always include "UNSTABLE"?
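
A quick and dirty way to eyeball both from the formatted trace might be
something like:

# grep -c COMMIT /tmp/nfs3.txt
# grep -c UNSTABLE /tmp/nfs3.txt

though the full trace is still far more useful than the counts alone.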

It would be handy to put the formatted trace file on
a web server and post the URL so everyone who cares to
can read it.

Regards,

Nicholas Dronen

--
---------------------------------------------------------------------------
Certified AIX Advanced Technical Expert
Boulder, Colorado
---------------------------------------------------------------------------

 
 
 

NFS V3 terrible performance when system is client & server

Post by Paul Pluzhniko » Mon, 04 Mar 2002 06:05:14




> > Scenario is this: AIX 4.3.3, HACMP, using an NFS V3 mount accessing
> > the filesystem from the NFS Server via NFS client (in effect a
> > loopback, but across a gigabit net).

> First, this has *nothing* to do with your gigabit network.
> If the client and the server are on the same machine, all
> network traffic between them will go through the loopback
> interface.  You could have a quadraplex superpetabit network
> and it still wouldn't matter.

I do not believe you understood what the OP does:
you can't NFS mount host:/foo on host:/bar, can you?

So in fact, the client is a *different* host, and the
"experiment" is "read a large file from server and write
it back to the (same) server", which *would* involve the
network.

I suggest that Andy first needs to figure out what the timing
is for 1:"read from server to /dev/null", 2:"write to server"
and 3:"read and write back" ...

So far all we know is 3: 1MB/minute.

Getting ftp get and put numbers may also prove useful.
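
Something along these lines would give rough numbers for 1 and 2 (block size
and paths are only placeholders; /nfsmnt stands in for the NFS mount point):

# timex dd if=/nfsmnt/large-file of=/dev/null bs=32k
# timex dd if=/tmp/large-file of=/nfsmnt/test-file bs=32k

and the cp test already done covers 3.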

 
 
 

NFS V3 terrible performance when system is client & server

Post by Brent Butchar » Tue, 05 Mar 2002 09:54:32


Hello,

I would look at the tcp send and receive sizes set with the no command. You
will probably find that these are too small. I would suggest increasing them
and making them the same on both machines. It may be that the data is coming
across a fast network and the daemons are unable to wake up and read the data
out of the buffer space before the buffer fills.

Also, you may want to increase the number of daemons running on both
machines.
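
For example, something like this (the values are only illustrative, not
recommendations):

# no -a | egrep 'sb_max|tcp_sendspace|tcp_recvspace'
# no -o tcp_sendspace=65536
# no -o tcp_recvspace=65536
# chnfs -n 16 -b 16

chnfs changes how many nfsd (-n) and biod (-b) daemons run.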

Let me know how you go.

Regards,

Brent Butchard





> > > Scenario is this: AIX 4.3.3, HACMP, using an NFS V3 mount accessing
> > > the filesystem from the NFS Server via NFS client (in effect a
> > > loopback, but across a gigabit net).

> > First, this has *nothing* to do with your gigabit network.
> > If the client and the server are on the same machine, all
> > network traffic between them will go through the loopback
> > interface.  You could have a quadraplex superpetabit network
> > and it still wouldn't matter.

> I do not believe you understood what the OP does:
> you can't NFS mount host:/foo on host:/bar, can you?

> So in fact, the client is a *different* host, and the
> "experiment" is "read a large file from server and write
> it back to the (same) server", which *would* involve the
> network.

> I suggest that Andy first needs to figure out what the timing
> is for 1:"read from server to /dev/null", 2:"write to server"
> and 3:"read and write back" ...

> So far all we know is 3: 1MB/minute.

> Getting ftp get and put numbers may also prove useful.

 
 
 

NFS V3 terrible performance when system is client & server

Post by Nicholas Drone » Tue, 05 Mar 2002 13:36:46






>> > Scenario is this: AIX 4.3.3, HACMP, using an NFS V3 mount accessing
>> > the filesystem from the NFS Server via NFS client (in effect a
>> > loopback, but across a gigabit net).

>> First, this has *nothing* to do with your gigabit network.
>> If the client and the server are on the same machine, all
>> network traffic between them will go through the loopback
>> interface.  You could have a quadraplex superpetabit network
>> and it still wouldn't matter.
> I do not believe you understood what the OP does:
> you can't NFS mount host:/foo on host:/bar, can you?

Cite one reason this is not possible.  I don't care whether
it makes sense to do this -- *why* isn't it possible?

> So in fact, the client is a *different* host, and the
> "experiment" is "read a large file from server and write
> it back to the (same) server", which *would* involve the
> network.

If the client is "in fact" on a different host, why did the
OP use the term "loopback" *twice*?  Pay attention to this
paragraph of the post:

        Anyone seen this? It only seems to happen when the filesystem is
        accessed from the NFS server via an NFS 'loopback' mount, but this
        required for our app architecture.

Read this closely.  I didn't believe he was mounting the
filesystem to the server from the server until I read the
post a few times.  (I know it's technically possible; it
just wasn't absolutely clear that this was the case.)
If he's not doing this, either his use of "loopback" is
totally inappropriate or I'm on crack.

> I suggest that Andy first needs to figure out what the timing
> is for 1:"read from server to /dev/null", 2:"write to server"
> and 3:"read and write back" ...
> So far all we know is 3: 1MB/minute.

Even if one assumes that the client and server are on different
machines, the fact that the performance of an NFSv2 mount is
*acceptable* to the OP (read: vastly better than the NFSv3 mount)
obviates the need for network performance data -- that is,  absent
an explanation which accounts for why the *network* and only
the network seems to handle NFSv2 just fine but chokes on NFSv3.
Whether the NFS client accesses the NFS server via the loopback
interface or over the gigabit network, both NFSv2 and NFSv3 mounts
go over the same media -- only part of the IP protocol stack in the
case of a "loopback" mount, and all of the stack + driver + NIC +
media in the case where the mount is over the gigabit network.

I think another poster has suggested tuning various network and
NFS options.  This is reasonable advice.  What I find far more
interesting, however, is the reason performance differs noticeably
between NFSv2 and NFSv3.

Regards,

Nicholas Dronen

--
---------------------------------------------------------------------------
Certified AIX Advanced Technical Expert
Boulder, Colorado
---------------------------------------------------------------------------

 
 
 

NFS V3 terrible performance when system is client & server

Post by Andy Penningt » Wed, 06 Mar 2002 01:49:07


Thanks for all your input.

It transpires that shifting from V3 to V2 NFS also implies moving, by
default, from TCP to UDP, and the problem is then gone.
Separately, mounting with an rsize and wsize of 1024 using V3 and TCP
also performs fine (I'd asked someone else to try this and they'd
reported no change, but upon checking again it does actually avoid the
issue).
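
For reference, the workaround mount looks roughly like this (hostname and
paths are placeholders):

# mount -o vers=3,proto=tcp,rsize=1024,wsize=1024 server:/export /nfsmnt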

So, the problem is almost certainly a buffer size, and I suspect it
will be found in the 'no' parameters.
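
The values I'll be checking first are along these lines (just displaying
what's currently set):

# no -o sb_max
# no -o tcp_sendspace
# no -o tcp_recvspace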

You've all been invaluable in giving me pointers.
Thank you all for your help.
Andy.






> >> > Scenario is this: AIX 4.3.3, HACMP, using an NFS V3 mount accessing
> >> > the filesystem from the NFS Server via NFS client (in effect a
> >> > loopback, but across a gigabit net).

> >> First, this has *nothing* to do with your gigabit network.
> >> If the client and the server are on the same machine, all
> >> network traffic between them will go through the loopback
> >> interface.  You could have a quadraplex superpetabit network
> >> and it still wouldn't matter.

> > I do not believe you understood what the OP does:
> > you can't NFS mount host:/foo on host:/bar, can you?

> Cite one reason this is not possible.  I don't care whether
> it makes sense to do this -- *why* isn't it possible?

> > So in fact, the client is a *different* host, and the
> > "experiment" is "read a large file from server and write
> > it back to the (same) server", which *would* involve the
> > network.

> If the client is "in fact" on a different host, why did the
> OP use the term "loopback" *twice*.  Pay attention to this
> paragraph of the post:

>    Anyone seen this? It only seems to happen when the filesystem is
>    accessed from the NFS server via an NFS 'loopback' mount, but this
>    required for our app architecture.

> Read this closely.  I didn't believe he was mounting the
> filesystem to the server from the server until I read the
> post a few times.  (I know it's technically possible; it
> just wasn't absolutely clear that this was the case.)
> If he's not doing this, either his use of "loopback" is
> totally inappropriate or I'm on crack.

> > I suggest that Andy first needs to figure out what the timing
> > is for 1:"read from server to /dev/null", 2:"write to server"
> > and 3:"read and write back" ...

> > So far all we know is 3: 1MB/minute.

> Even if one assumes that the client and server are on different
> machines, the fact that the performance of an NFSv2 mount is
> *acceptable* to the OP (read: vastly better than the NFSv3 mount)
> obviates the need for network performance data -- that is,  absent
> an explanation which accounts for why the *network* and only
> the network seems to handle NFSv2 just fine but chokes on NFSv3.
> Whether the NFS client accesses the NFS server via the loopback
> interface or over the gigabit network, both NFSv2 and NFSv3 mounts
> go over the same media -- only part of the IP protocol stack in the
> case of a "loopback" mount, and all of the stack + driver + NIC +
> media in the case where the mount is over the gigabit network.

> I think another poster has suggested tuning various network and
> NFS options.  This is reasonable advice.  What I find far more
> interesting, however, is the reason performance differs noticeably
> between NFSv2 and NFSv3.

> Regards,

> Nicholas Dronen

 
 
 

NFS V3 terrible performance when system is client & server

Post by H?kan Ekdah » Fri, 08 Mar 2002 23:08:29


Hi group,
The AIXNEWS bulletin had this article which corresponds rather well with the
problem you are experiencing (?)
HTH
/Hakanen

Dear AIXNEWS subscribers,

ISSUE:
If you are running HACMP for AIX*, in a 2-node  configuration with
NFS-exported filesystems, you  may experience cases where the nfso commands
issued within the HACMP scripts will take a long  time and eventually return
with RPC timeout. This  may be due to how the hostname on your system has
been defined.

SOLUTION:
The dependency upon the hostname for finding the rpc.statd daemon on the
local node has been removed from the nfso command by having it use the loopback
interface. All customers with these configurations should apply the APAR
appropriate to their AIX release as indicated below:

AIX 4.3.3: IY26866
AIX 5.1: IY27072

Ed Kwedar
AIX Consultant
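
For what it's worth, whether one of these APARs is already installed on a
system can be checked with instfix, e.g.:

# instfix -ik IY26866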


> Thanks for all your input.

> It transpires that in shifting from V3 to V2 NFS, this also implies
> moving by default from TCP to UDP and the problem is then gone.
> Separately, by mounting with rsize and wsize of 1024 using V3 and TCP,
> this also performs fine (I'd asked someone else to try this, and
> they'd reported no change, but upon checking again, it does actually
> avoid the issue)

> So, the problem is almost certainly a buffer size, and I suspect it
> will be found in the 'no' parameters.

> You've all been invaluable in giving me pointers.
> Thank you all for your help.
> Andy.









> > >> > Scenario is this: AIX 4.3.3, HACMP, using an NFS V3 mount accessing
> > >> > the filesystem from the NFS Server via NFS client (in effect a
> > >> > loopback, but across a gigabit net).

> > >> First, this has *nothing* to do with your gigabit network.
> > >> If the client and the server are on the same machine, all
> > >> network traffic between them will go through the loopback
> > >> interface.  You could have a quadraplex superpetabit network
> > >> and it still wouldn't matter.

> > > I do not believe you understood what the OP does:
> > > you can't NFS mount host:/foo on host:/bar, can you?

> > Cite one reason this is not possible.  I don't care whether
> > it makes sense to do this -- *why* isn't it possible?

> > > So in fact, the client is a *different* host, and the
> > > "experiment" is "read a large file from server and write
> > > it back to the (same) server", which *would* involve the
> > > network.

> > If the client is "in fact" on a different host, why did the
> > OP use the term "loopback" *twice*.  Pay attention to this
> > paragraph of the post:

> > Anyone seen this? It only seems to happen when the filesystem is
> > accessed from the NFS server via an NFS 'loopback' mount, but this
> > required for our app architecture.

> > Read this closely.  I didn't believe he was mounting the
> > filesystem to the server from the server until I read the
> > post a few times.  (I know it's technically possible; it
> > just wasn't absolutely clear that this was the case.)
> > If he's not doing this, either his use of "loopback" is
> > totally inappropriate or I'm on crack.

> > > I suggest that Andy first needs to figure out what the timing
> > > is for 1:"read from server to /dev/null", 2:"write to server"
> > > and 3:"read and write back" ...

> > > So far all we know is 3: 1MB/minute.

> > Even if one assumes that the client and server are on different
> > machines, the fact that the performance of an NFSv2 mount is
> > *acceptable* to the OP (read: vastly better than the NFSv3 mount)
> > obviates the need for network performance data -- that is,  absent
> > an explanation which accounts for why the *network* and only
> > the network seems to handle NFSv2 just fine but chokes on NFSv3.
> > Whether the NFS client accesses the NFS server via the loopback
> > interface or over the gigabit network, both NFSv2 and NFSv3 mounts
> > go over the same media -- only part of the IP protocol stack in the
> > case of a "loopback" mount, and all of the stack + driver + NIC +
> > media in the case where the mount is over the gigabit network.

> > I think another poster has suggested tuning various network and
> > NFS options.  This is reasonable advice.  What I find far more
> > interesting, however, is the reason performance differs noticeably
> > between NFSv2 and NFSv3.

> > Regards,

> > Nicholas Dronen

 
 
 

NFS V3 terrible performance when system is client & server

Post by Darcy Dippe » Wed, 13 Mar 2002 14:21:07


Thanks for the APAR.    I've also experienced a similar problem in the last
few days.   It seems that my iptraces show that the first time I fire up
HACMP, it uses NFS V3 and the performance is horrible.  Subsequent restarts
use NFS v2 and magically my performance gets much better.


> Hi group,
> The AIXNEWS bulletin had this article which corresponds rather well with the
> problem you are experiencing (?)
> HTH
> /Hakanen

> Dear AIXNEWS subscribers,

> ISSUE:
> If you are running HACMP for AIX*, in a 2-node configuration with
> NFS-exported filesystems, you may experience cases where the nfso commands
> issued within the HACMP scripts will take a long time and eventually return
> with RPC timeout. This may be due to how the hostname on your system has
> been defined.

> SOLUTION:
> The dependency upon the hostname for finding the  rpc.statd daemon on the
> local node has been removed from the nfso command by it using the loopback
> interface. All customers with these configurations should apply the APAR
> appropriate to their AIX release as indicated below:

> AIX 4.3.3: IY26866
> AIX 5.1: IY27072

> Ed Kwedar
> AIX Consultant



> > Thanks for all your input.

> > It transpires that in shifting from V3 to V2 NFS, this also implies
> > moving by default from TCP to UDP and the problem is then gone.
> > Separately, by mounting with rsize and wsize of 1024 using V3 and TCP,
> > this also performs fine (I'd asked someone else to try this, and
> > they'd reported no change, but upon checking again, it does actually
> > avoid the issue)

> > So, the problem is almost certainly a buffer size, and I suspect it
> > will be found in the 'no' parameters.

> > You've all been invaluable in giving me pointers.
> > Thank you all for your help.
> > Andy.







> > > >> > Scenario is this: AIX 4.3.3, HACMP, using an NFS V3 mount accessing
> > > >> > the filesystem from the NFS Server via NFS client (in effect a
> > > >> > loopback, but across a gigabit net).

> > > >> First, this has *nothing* to do with your gigabit network.
> > > >> If the client and the server are on the same machine, all
> > > >> network traffic between them will go through the loopback
> > > >> interface.  You could have a quadraplex superpetabit network
> > > >> and it still wouldn't matter.

> > > > I do not believe you understood what the OP does:
> > > > you can't NFS mount host:/foo on host:/bar, can you?

> > > Cite one reason this is not possible.  I don't care whether
> > > it makes sense to do this -- *why* isn't it possible?

> > > > So in fact, the client is a *different* host, and the
> > > > "experiment" is "read a large file from server and write
> > > > it back to the (same) server", which *would* involve the
> > > > network.

> > > If the client is "in fact" on a different host, why did the
> > > OP use the term "loopback" *twice*.  Pay attention to this
> > > paragraph of the post:

> > > Anyone seen this? It only seems to happen when the filesystem is
> > > accessed from the NFS server via an NFS 'loopback' mount, but this
> > > required for our app architecture.

> > > Read this closely.  I didn't believe he was mounting the
> > > filesystem to the server from the server until I read the
> > > post a few times.  (I know it's technically possible; it
> > > just wasn't absolutely clear that this was the case.)
> > > If he's not doing this, either his use of "loopback" is
> > > totally inappropriate or I'm on crack.

> > > > I suggest that Andy first needs to figure out what the timing
> > > > is for 1:"read from server to /dev/null", 2:"write to server"
> > > > and 3:"read and write back" ...

> > > > So far all we know is 3: 1MB/minute.

> > > Even if one assumes that the client and server are on different
> > > machines, the fact that the performance of an NFSv2 mount is
> > > *acceptable* to the OP (read: vastly better than the NFSv3 mount)
> > > obviates the need for network performance data -- that is,  absent
> > > an explanation which accounts for why the *network* and only
> > > the network seems to handle NFSv2 just fine but chokes on NFSv3.
> > > Whether the NFS client accesses the NFS server via the loopback
> > > interface or over the gigabit network, both NFSv2 and NFSv3 mounts
> > > go over the same media -- only part of the IP protocol stack in the
> > > case of a "loopback" mount, and all of the stack + driver + NIC +
> > > media in the case where the mount is over the gigabit network.

> > > I think another poster has suggested tuning various network and
> > > NFS options.  This is reasonable advice.  What I find far more
> > > interesting, however, is the reason performance differs noticeably
> > > between NFSv2 and NFSv3.

> > > Regards,

> > > Nicholas Dronen

 
 
 

1. Poor NFS write performance from Linux clients to NFS Ver 3 servers.

I'm having a real problem with very slow NFS write performance from Linux
clients running Redhat 4.2, 5.0, and 5.1 to servers running NFS version
3.0.  The servers indicate that the Linux clients are mounted using NFS v2.
The servers I have are running Solaris 2.5.1 and 2.6 on both Intel and SPARC
hardware. Others have reported the same problem with Alpha servers running
Digital UNIX and NFS v3. My servers are all up to date on patches. The
network (Fast Ethernet) is not the problem as I have this problem on two
isolated networks at different locations. NFS read performance is normal.

I have increased the rsize and wsize to 8192, which doubled the performance,
and have played with every option I thought would help.  Now the writes are
only a factor of 10 times slower than FTP'ing the file and 6 times slower
than other clients running Solaris 2.5.1 and NFS v3 using TCP.

I'm getting desperate as everyone is screaming about the slow performance.
I'm hoping that someone has solved this problem. Surely I don't have to put
extra disks of home directories on these Linux boxes to be able to use NFS.
I just installed Redhat 5.1 hoping that would fix my problem, but its no
different. Why is it taking so long to add NFS version 3.0 protocols to
Linux?  It has been available from other vendors for well over a year. The
performance of version 3.0 using TCP is significantly faster than version
2.

Thanks in advance for any help. I will post a summary if I find a solution.

Denny Morse

2. application in Linux

3. Problem: performance NFS V3 vs NFS V2

4. Source packages install & Term help needed

5. Status of NFS v3 client/server

6. IP Masqing

7. Spurious NFS ESTALE errors w/NFSv3 server, non-v3 client

8. Problem with tail -f

9. NFS server on HP-UX 10.20 & NFS client on RH-5.2 : write permission ???

10. terrible linux nfs performance???

11. terrible NFS performance with Solaris 2.6

12. NFS performance solaris client, linux server

13. Need info on PC/NFS and Unix NFS servers, server/client ratio