We've been chasing this for a few weeks. Thought it about time I let a
few other people know. EMC has been great about helping us with this
one even though it is not their problem. Hummingbird have not been
nearly as helpful. So, I would be very interested to hear from other
people who might be seeing similar problems. More ammunition to beat
Hummingbird up with would be helpful.
Please find below various email exchanges that describe the problem etc.
Cheers,
--
Chris Stacey
Unix Systems Administrator, Motorola ECID
Tel: +44 (0)1793 565142 Fax: +44 (0)1793 565419
mailto: cstac...@email.mot.com
-------------- Begin various included messages ---------------
Subject: RE: Help: NT + DOS App + Unix Server
Date: Fri, 15 May 1998 12:07:04 -0700
From: bob.dehnha...@amd.com
To: stac...@ecid.cig.mot.com
Chris! You are my New Best Friend!
You were exactly right that it was a NLM problem. The network trace
showed numerous identical lock requests and grant pairs, just as you
said. You were right on the money.
Unfortunately, we can't switch to FTP's client, due to a pronouncement
from on high that Maestro Is Good And Shall Be Used. And it takes
jumping through numerous hoops to get the configuration on any of our
Suns changed. That left me with the registry hack from Hummingbird's
site.
Good news: It Works! Program load times dropped from almost 2 minutes to
15 seconds. Data access dropped from 15 seconds to <1 second. I am now a
Big Hero with our Payroll department (which is probably the best
department in which to be a hero). And I owe it all to you.
If you're ever in Sunnyvale, CA, let me know. I've got several pints of
your favorite beverage waiting for you.
- Bob
> From: Chris Stacey [SMTP:stac...@ecid.cig.mot.com]
> Sent: Thursday, May 14, 1998 11:53 AM
> To: Dehnhardt, Bob; stac...@ecid.cig.mot.com
> Subject: Re: Help: NT + DOS App + Unix Server
> Bob,
> Sounds like you might be getting bitten by a bug in Hummingbird NFS.
> We've just recently moved a filesystem that is accessed by a number of
> NT4.0 + Hummingbird NFS (v5.1.3) clients from a Solaris 2.3 server to
> an EMC Celerra box (this is a fast server dedicated to providing NFS).
> We started seeing major performance problems. The PCs would start
> taking 3-5 mins to explore a directory. Applications like Excel and
> Word couldn't browse directories.
> After much grief, enormous help from EMC, and sod all from Hummingbird,
> we finally noticed something in the network traces we've taken. When a
> client uses NFSv2 it should use Network Lock Manager (NLM) v3. When it
> uses NFSv3 it should use NLMv4. Turns out that Hummingbird (5.1.3, 6.0,
> and 6.0.1) all use NLMv3 with NFSv3, rather than NLMv4 as they should.
> The server doesn't seem to care, but the clients get confused after a
> while and start asking for the same lock ~150 times (it is granted
> each time). If there are not many clients, you may just see slow
> performance; with lots of clients we see abysmal performance. This is
> especially true on filesystems that are accessed by a number of
> clients simultaneously.
> Since your unix box is running 2.6 it supports NFSv3 (anything later
> than Solaris 2.3 does). So you are probably getting this problem.
> The Hummingbird web site has a registry hack that supposedly turns
> NFSv3 off, but I don't trust them anymore. Our solution is going to be
> to turn NFSv3 off on the server. I'm not sure you can do that with
> Solaris (I must check because we need to). Note that we haven't
> actually done that yet so it may not be the only problem we have.
> Tests in EMC's labs have confirmed it with SunOS 4.1.3 and Solaris 2.5
> boxes.
> Our other option is to use a different PC NFS client, such as FTP's,
> which is cheaper anyway.
> Try capturing a network trace using snoop on your Sun, or SMS on the
> PCs. You should see hundreds of identical lock request and grant pairs
> of packets. I'd be very interested to have someone else confirm our
> findings.
> Please feel free to forward this on to the news group as I picked it
> up from a search on DejaNews and do not read news much these days. I
> suspect a lot of people are getting bitten by this, but unless they
> have lots of clients accessing the same filesystems they'll just be
> living with it.
> Hope this helps.
> Cheers,
> --
> Chris Stacey
> Unix Systems Administrator, Motorola ECID
> Tel: +44 (0)1793 565142 Fax: +44 (0)1793 565419 Pager: 0839 693406
> mailto: stac...@ecid.cig.mot.com
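The "hundreds of identical lock request and grant pairs" described in the message above can be counted mechanically rather than by eye. The following is only a rough Python sketch under stated assumptions: it expects a text dump of snoop summary lines whose tail looks like the "NLM C SHARE3 OH=6D75 FH=F6D8 ..." lines quoted elsewhere in this thread, and the names NLM_LINE and count_repeated_calls are made up for illustration. Real snoop output may carry extra columns or different argument fields, so treat the pattern as a starting point.

```python
import re
from collections import Counter

# Assumed line shape: "... NLM C <procedure> <arguments>" for a call and
# "... NLM R <procedure> <result>" for a reply, modelled on the snoop
# summary lines quoted in this thread. Anything before "NLM" is ignored.
NLM_LINE = re.compile(r"\bNLM\s+(?P<kind>[CR])\s+(?P<proc>\S+)\s*(?P<rest>.*)$")


def count_repeated_calls(lines, threshold=10):
    """Return {(procedure, arguments): count} for identical NLM calls
    that appear at least `threshold` times in the trace."""
    calls = Counter()
    for line in lines:
        match = NLM_LINE.search(line.strip())
        # Only calls ("C") are counted; replies ("R") are skipped.
        if match and match.group("kind") == "C":
            key = (match.group("proc"), match.group("rest").strip())
            calls[key] += 1
    return {key: n for key, n in calls.items() if n >= threshold}
```

To use it, pass any iterable of trace lines (for example an open text file) and print whatever comes back. A healthy client should produce few or no entries above the threshold; a client stuck in the loop described above would show one call repeated many times.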
Subject: RE: NFS and NLM version corespondence.
Date: Wed, 20 May 1998 15:27:48 +0100
From: Paul Richards <pa...@hcl.com>
To: "'Chris Stacey'" <stac...@ecid.cig.mot.com>
Good afternoon Chris, thank you for your email. Could you please expand
on your statement: "Hummingbird's use of NLMv3 with NFSv3 is wrong"?
Regards, Paul
> From: Chris Stacey [SMTP:stac...@ecid.cig.mot.com]
> Sent: Wednesday, May 20, 1998 1:35 PM
> To: pa...@hcl.com; stac...@ecid.cig.mot.com; gwi...@ecid.cig.mot.com
> Subject: NFS and NLM version corespondence.
> Paul,
> According to our reading of the NLM appendix of the NFSv3 RFC (see
> below) Hummingbird's use of NLMv3 with NFSv3 is wrong. Please explain
> why you think it is correct.
> I have also been in contact with an administrator at AMD in
> California, who was seeing very similar problems to us with NT4.0
> clients using Hummingbird NFS to access a Solaris 2.6 server. I asked
> him to trace what traffic was flowing between the clients and server.
> His trace confirmed the same NFSv3 NLMv3 mix and repeated lock
> request/grant pairs of packets that we have been seeing (we copied a
> number of our traces to you two weeks ago). After applying the
> registry hack from the Hummingbird web site that prevents the client
> from using NFSv3 all his problems went away.
> Because of the number of clients we have, we have not implemented the
> registry hack yet. Instead, we have asked EMC to provide us with a
> version of their server that has NFSv3 turned off. We have yet to
> confirm whether that will solve all the problems we've been
> experiencing with Hummingbird clients.
> Regards,
> Chris Stacey
> Extract from:
> http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1813.txt
> Callaghan, el al             Informational                  [Page 113]
> RFC 1813               NFS Version 3 Protocol                 June 1995
> 6.0 Appendix II: Lock manager protocol
> Because the NFS version 2 protocol as well as the NFS version 3
> protocol is stateless, an additional Network Lock Manager (NLM)
> protocol is required to support locking of NFS-mounted files.
> The NLM version 3 protocol, which is used with the NFS version 2
> protocol, is documented in [X/OpenNFS].
> Some of the changes in the NFS version 3 protocol require a
> new version of the NLM protocol. This new protocol is the NLM
> version 4 protocol. The following table summarizes the
> correspondence between versions of the NFS protocol and NLM
> protocol.
>            NFS and NLM protocol compatibility
>
>                  +---------+---------+
>                  |   NFS   |   NLM   |
>                  | Version | Version |
>                  +===================+
>                  |    2    |   1,3   |
>                  +---------+---------+
>                  |    3    |    4    |
>                  +---------+---------+
> This appendix only discusses the differences between the NLM
> version 3 protocol and the NLM version 4 protocol. As in the
> NFS version 3 protocol, almost all the names in the NLM version
> 4 protocol have been changed to include a version number. This
> appendix does not discuss changes that consist solely of a name
> change.
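The compatibility table in the RFC extract above is small enough to write down as a lookup, which is handy when checking the NFS and NLM versions seen in a trace. This is only an illustrative Python sketch: the names NLM_FOR_NFS and nlm_matches_nfs are made up here, and the mapping is copied from the table in the extract and nothing more. A mismatch only says that the pairing is not the one listed in the appendix; it does not by itself prove that a client is broken.

```python
# NFS version -> NLM versions paired with it, copied from the table in
# RFC 1813, Appendix II as quoted above.
NLM_FOR_NFS = {
    2: {1, 3},
    3: {4},
}


def nlm_matches_nfs(nfs_version, nlm_version):
    """Return True if `nlm_version` is listed against `nfs_version`
    in the RFC 1813 compatibility table."""
    if nfs_version not in NLM_FOR_NFS:
        raise ValueError(f"no NLM pairing listed for NFS version {nfs_version}")
    return nlm_version in NLM_FOR_NFS[nfs_version]
```

For the combination reported in this thread, nlm_matches_nfs(3, 3) returns False, while nlm_matches_nfs(3, 4) and nlm_matches_nfs(2, 3) both return True.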
(beginning of original message)
Subject: NFS and lockd issues
From: "Ravi C. Kumar" <ra...@bounce.ms.com>
Date: 1998/05/12
Newsgroups: comp.unix.solaris,comp.protocols.nfs
Hi Everyone,
I have a Solaris NFS server with quite a few NFS shares.
My NFS clients are NT based (using Hummingbird NFS). They
have no problem mounting (or in doze terms mapping) the
shares from any other UNIX machine. But with this particular
machine, they have to wait a good 5 minutes before they can access
the share. Each file they access takes another good five minutes,
and sometimes it just hangs. All our servers have an identical
configuration. I ran snoop on this one and found that the NLM is
causing the problem.
NLM C SHARE3 OH=6D75 FH=F6D8 Mode=3 Access=3
NLM R SHARE3 OH=6D75 denied
(These NLM messages are repeated for a long time)
It continues for a long time and finally sometimes the access
is granted. This is a problem with only one particular machine.
I tried to debug the lockd and all it returns is
debug= 1, timout= 300, retrans= 5, grace= 45, nservers= 20
The 'nfsstat' shows everything as normal.
Is this a case of client locks not being released? When I ran
'pfiles' on lockd, it showed plenty of file descriptors still
available, so running out of fds is out of the question. Any ideas
as to what might cause this? Any answers are appreciated.
--
Ravi C. Kumar
"Please remove the bounce from my address to reply to me"
(end of original message)
---------------------------------------------------------------------------
Ravi,
We've been chasing this problem for a few weeks. Hummingbird have taken
a long time to start listening. Below is an email I just sent to one of
their engineers. I'll also forward an email exchange I had with a guy
at AMD in California. He had NFS Maestro clients talking to a Solaris
2.6 server. What exactly have you got? And what are you seeing?
Cheers,
--
Chris Stacey
Unix Systems Administrator, Motorola ECID
Tel: +44 (0)1793 565142 Fax: +44 (0)1793 565419
mailto: cstac...@email.mot.com
> Reply
> Ref : Incident ID 425864
> In the recently concluded Connectathon meet we found serious
> interoperability issues with NLM ver 4 when we tested our client with
> various NFS servers, and as a result Hummingbird chose not to include
> NLM ver 4 in the current shipping release of NFS Maestro Client ver
> 6.0.1. However, as more vendors implement NFS ver 3 (NLM ver 4) we do
> plan to re-introduce NLM ver 4 in future releases of NFS Maestro
> Client.
was waiting for the results of some tests we were running. I am happy
to believe that your use of NLMv3 with NFSv3 is allowed. However, what
I really want is a PC-NFS client that works. Thus far neither NFS
Maestro Client ver 5.1.3, nor 6.0.1, are happy talking to our EMC
Celerra box.
Last week we thought the problem was related to Maestro and NFSv3.
After a while clients would start to hang for up to 5 minutes while
exploring directories and applications such as Excel and Word were
unable to browse the filesystems. In the network traces we took (they
were sent to you at the time) we saw repeated lock requests for the same
filehandle from the clients. The lock was granted each time, but the
clients kept pausing for 0.5 seconds and trying again.
We've confirmed this behaviour here at Motorola, at EMC, and at AMD in
California. EMC have confirmed against both their Celerra box and
Solaris 2.5 (I think); AMD had a Solaris 2.6 server. Rich Leroy of EMC
has forwarded me a news group article from someone else describing the
same problem with clients talking to a Solaris server.
The use of NLMv3 with NFSv3 led us to suspect NFSv3, so we tried the
registry hack available on your web site to disable NFSv3 on the
clients. That didn't seem to help. We then got EMC to build a new
kernel for their Celerra product that has an option to disable NFSv3.
We installed that at the end of last week. Yesterday was the first full
working day since then and the problem has reappeared twice.
The only solution (all along) is to reboot the server. This cures the
problem for anything between 2 hours and three days.
I am currently working to update our test environment and get some more
traces since it seems NFSv3 is not the (only) problem.
I tried calling the phone number in your sig but I must have missed part
of the country/area code. I would appreciate the opportunity to discuss
this more by phone. Please either call me, or give me a few more hints
on your number.
---------------------------------------