Down 'Solaris' NFS Server hangs 'Solaris' Client

Post by David Hil » Sun, 09 Nov 1997 04:00:00



    I have a mix of Solaris 2.5.1 boxes and IRIX 5.2,5.3 & 6.2 boxes on my
    network.  When a Solaris NFS server goes down, my other Solaris clients
    hang on new logins waiting for the server to come back up, BUT the
    IRIX boxes let users log in with no problems.

    The Solaris and IRIX boxes mount using the following options.

          vers=2,intr,bg

    We played with the automount program, but Solaris clients still hang.

    Any ideas?

    Thanks.

   -----------
   David Hiltz

   Unix System and Network Administrator
   Northeast Fisheries Science Center

 
 
 

Post by Casper H.S. Dik - Network Security Engine » Mon, 10 Nov 1997 04:00:00


[[ Reply by email or post, don't do both ]]


>    I have a mix of Solaris 2.5.1 boxes and IRIX 5.2,5.3 & 6.2 boxes on my
>    network.  When a Solaris NFS server goes down, my other Solaris clients
>    hang on new logins waiting for the server to come back up, BUT the
>    IRIX boxes let users log in with no problems.
>    The Solaris and IRIX boxes mount using the following options.
>          vers=2,intr,bg
>    We played with the automount program, but Solaris clients still hang.

Try mounting "noquota".

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

 
 
 

Post by Casper H.S. Dik - Network Security Engine » Tue, 11 Nov 1997 04:00:00


[[ Reply by email or post, don't do both ]]


>  I have a box called "xyz" (SunOS) which is exporting one partition that is
>  not using disk quotas.  When "xyz" goes down, it hangs my Solaris boxes
>  which mount this partition.
>  So I don't think that it's a "quota" problem.

It's a quota problem.

If you don't mount the filesystems with the "noquota" option, the quota
program (run from /etc/profile and /etc/.login, depending on the user's
shell) will ask the servers of *all* filesystems currently mounted whether
the user has any quotas.  This will take a *LONG* time if the server is down.
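
A minimal sketch of how one might list the NFS mounts that lack "noquota", assuming the usual mnttab field order (special, mount point, fstype, options, time). The function name and the sample entries are hypothetical and only illustrate the check; nothing here touches the real /etc/mnttab.

```python
def nfs_mounts_without_noquota(mnttab_text):
    """Return mount points of NFS entries whose options lack 'noquota'."""
    missing = []
    for line in mnttab_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _special, mount_point, fstype, options = fields[:4]
        if fstype == "nfs" and "noquota" not in options.split(","):
            missing.append(mount_point)
    return missing


# Hypothetical sample in mnttab layout; the hosts and paths are made up.
SAMPLE_MNTTAB = (
    "/dev/dsk/c0t3d0s0\t/\tufs\trw,suid\t879000000\n"
    "xyz:/export/data\t/data\tnfs\tvers=2,intr,bg\t879000100\n"
    "abc:/export/home\t/home\tnfs\tvers=2,intr,bg,noquota\t879000200\n"
)

if __name__ == "__main__":
    print(nfs_mounts_without_noquota(SAMPLE_MNTTAB))
```

On a real client the text would come from reading /etc/mnttab; any mount point the function returns is one the quota program would still query at login.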

>  I heard (somewhere) that the "amd" automounter daemon solves this
>  "hanging" problem, but haven't given it a try yet.

If something is unmounted, it won't give you any trouble.
If it's mounted, you'll hang on any use of it and on the quota calls,
which are made by default against all mounted filesystems.

Amd will have more hanging problems, rather than fewer, as it's
single-threaded and once it blocks it's out of there.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

 
 
 

Post by Matt Brow » Tue, 11 Nov 1997 04:00:00




> : [[ Reply by email or post, don't do both ]]


> : >    We played with the automount program, but Solaris clients still hang.

> : Try mounting "noquota".

>   I have a box called "xyz" (SunOS) which is exporting one partition that is
>   not using disk quotas.  When "xyz" goes down, it hangs my Solaris boxes
>   which mount this partition.

>   So I don't think that it's a "quota" problem.

>   I heard (somewhere) that the "amd" automounter daemon solves this
>   "hanging" problem, but haven't given it a try yet.

I think what you need is to mount "soft" instead of the default "hard."

Soft mounts are in general not recommended in the NFS manual, but we plan to use
them here in a case where, if we can't write to a file mounted from a primary
server, we have the app attempt to write to a secondary server.  With a hard
mount, the app would hang waiting until the link with the primary server came
back.  In a production environment, this would be most undesirable.

There are some other ways we could approach this, but unless we hear of or come
across something particularly * about soft mounts, that's the way we intend to
go.
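
A rough sketch of the primary/secondary write described above, assuming that a timed-out operation on a soft mount reaches the application as an ordinary I/O error. The function names and paths are hypothetical. The explicit fsync is there because a failed write may otherwise not show up until later.

```python
import os
import tempfile


def append_durably(path, data):
    """Append data and force it out, so a failed write surfaces here as OSError."""
    with open(path, "ab") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())


def write_with_fallback(data, primary_path, secondary_path):
    """Try the primary path first; on any OSError, write to the secondary.

    Returns the path that accepted the data. An OSError from the
    secondary is deliberately left to propagate to the caller.
    """
    try:
        append_durably(primary_path, data)
        return primary_path
    except OSError:
        append_durably(secondary_path, data)
        return secondary_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # A missing directory stands in for an unreachable primary server.
        primary = os.path.join(scratch, "primary-down", "records.log")
        secondary = os.path.join(scratch, "records.log")
        print(write_with_fallback(b"record 1\n", primary, secondary))
```

One thing the sketch does not handle: if the primary accepted part of the data before failing, the same record ends up on both servers, so the application would need some way to reconcile duplicates.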


R&D Team Leader, Data Transmission Network

 
 
 

Post by Chip Campbel » Tue, 11 Nov 1997 04:00:00



> > : >    We played with the automount program, but Solaris clients still hang.

Advice that surfaces occasionally is not to make NFS mounts directly in the
client's root directory. Lots of processes that have nothing to do with
the mounted filesystem will do fstat on / and this will attempt to
access the mount, which typically causes the process to hang until the server
is available. So, keep your mounts in subdirectories.
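
A small sketch for spotting NFS mount points that sit directly under /, again assuming the mnttab field order (special, mount point, fstype, options, time). The function name and the sample entries are hypothetical.

```python
import posixpath


def nfs_mounts_in_root(mnttab_text):
    """Return NFS mount points whose parent directory is / itself."""
    flagged = []
    for line in mnttab_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        if (
            fstype == "nfs"
            and mount_point != "/"
            and posixpath.dirname(mount_point) == "/"
        ):
            flagged.append(mount_point)
    return flagged


# Hypothetical sample in mnttab layout; the hosts and paths are made up.
SAMPLE_MNTTAB = (
    "/dev/dsk/c0t3d0s0\t/\tufs\trw,suid\t879000000\n"
    "xyz:/export/data\t/data\tnfs\tvers=2,intr,bg\t879000100\n"
    "xyz:/export/tools\t/nfs/xyz/tools\tnfs\tvers=2,intr,bg\t879000200\n"
)

if __name__ == "__main__":
    print(nfs_mounts_in_root(SAMPLE_MNTTAB))
```

Entries it reports are candidates for moving one level down, for example under a dedicated directory with symlinks pointing at them if the old paths have to keep working.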

Chip

Chip Campbell
Sony Electronics, San Jose, California

 
 
 

Post by David Hil » Tue, 11 Nov 1997 04:00:00



: [[ Reply by email or post, don't do both ]]


: >    I have a mix of Solaris 2.5.1 boxes and IRIX 5.2,5.3 & 6.2 boxes on my
: >    network.  When a Solaris NFS server goes down, my other Solaris clients
: >    hang on new logins waiting for the server to come back up, BUT the
: >    IRIX boxes let users log in with no problems.

: >    The Solaris and IRIX boxes mount using the following options.

: >          vers=2,intr,bg

: >    We played with the automount program, but Solaris clients still hang.

: Try mounting "noquota".

  I have a box called "xyz" (SunOS) which is exporting one partition that is
  not using disk quotas.  When "xyz" goes down, it hangs my Solaris boxes
  which mount this partition.

  So I don't think that it's a "quota" problem.

  I heard (somewhere) that the "amd" automounter daemon solves this
  "hanging" problem, but haven't given it a try yet.

   -----------
   David Hiltz

   Unix System and Network Administrator
   Northeast Fisheries Science Center

 
 
 

Post by Matt Brow » Wed, 12 Nov 1997 04:00:00



> Yes, but you should remember that whatever timeout you choose for soft mounts,
> it will always be shorter than a short temporary outage (of network/whatever).

Help me understand.  If the soft mount times out, won't it return an error to the
application?  In other words, won't the write fail at that point?  If I'm looking
for that situation, then what's the problem?

> My experience is that you get NFS timeouts occasionally, even in a fully
> functional net (traffic peaks, etc).

> Any "NFS server not responding" message you see now will translate to
> (possibly unnoticed) application data loss.

Again, I'm not understanding where we'll lose data if our apps are looking for the
error.  I feel like I'm missing something here.

> You can interrupt writes to hard-mounted filesystems; look into
> that instead.

Can you give me a more specific pointer to how this is accomplished?

Thanks for the help.

Matt

 
 
 

Post by Casper H.S. Dik - Network Security Engine » Wed, 12 Nov 1997 04:00:00


[[ Reply by email or post, don't do both ]]



>> Yes, but you should remember that whatever timeout you choose for soft mounts,
>> it will always be shorter than a short temporary outage (of network/whatever).
>Help me understand.  If the soft mount times out, won't it return an error to the
>application?  In other words, won't the write fail at that point?  If I'm looking
>for that situation, then what's the problem?

It will return an error on write.

But it will cause a bus error in executables run from that filesystem (read timeout).

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

 
 
 

Post by Casper H.S. Dik - Network Security Engine » Wed, 12 Nov 1997 04:00:00


[[ Reply by email or post, don't do both ]]


>I think what you need is to mount "soft" instead of the default "hard."

No, don't use soft.

(If you use soft mounts on r/w filesystem, you needn't bother with backups)

>Soft mounts are in general not recommended in the NFS manual, but we plan to use
>them here in a case where, if we can't write to a file mounted from a primary
>server, we have the app attempt to write to a secondary server.  With a hard
>mount, the app would hang waiting until the link with the primary server came
>back.  In a production environment, this would be most undesirable.

Yes, but you should remember that whatever timeout you choose for soft mounts,
it will always be shorter than a short temporary outage (of network/whatever).

My experience is that you get NFS timeouts occasionally, even in a fully
functional net (traffic peaks, etc).

Any "NFS server not responding" message you see now will translate to
(possibly unnoticed) application data loss.

You can interrupt writes to hard-mounted filesystems; look into
that instead.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

 
 
 

Post by Jay Sco » Fri, 14 Nov 1997 04:00:00



>I think what you need is to mount "soft" instead of the default "hard."

>Soft mounts are in general not recommended in the NFS manual, but we plan to use
>them here in a case where, if we can't write to a file mounted from a primary
>server, we have the app attempt to write to a secondary server.  With a hard
>mount, the app would hang waiting until the link with the primary server came
>back.  In a production environment, this would be most undesirable.

>There are some other ways we could approach this, but unless we hear of or come
>across something particularly * about soft mounts, that's the way we intend to
>go.

i'd personally like to hear how this works out.  we considered this
but the rest of the group did not have the nerve to go ahead.
--
Jay Scott               512-835-3553

Applied Research Labs, Computer Science Div.
University of Texas at Austin
 
 
 

Post by Jay Sco » Fri, 14 Nov 1997 04:00:00




>[[ Reply by email or post, don't do both ]]


>>    I have a mix of Solaris 2.5.1 boxes and IRIX 5.2,5.3 & 6.2 boxes on my
>>    network.  When a Solaris NFS server goes down, my other Solaris clients
>>    hang on new logins waiting for the server to come back up, BUT the
>>    IRIX boxes let users log in with no problems.

>>    The Solaris and IRIX boxes mount using the following options.

>>          vers=2,intr,bg

>>    We played with the automount program, but Solaris clients still hang.

so do ours.  we've had service calls on this situation open for YEARS.
I think since '94, certainly '95.  our workaround is for our file
server to mount NOTHING.  but the problem's still there.

/usr/sbin/df will hang just before the down server (you've probably
figured this out).  you can see what's down by more'ing /etc/mnttab
and seeing what's next in the list.
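
a rough sketch of that lookup, assuming df walks the mounts in /etc/mnttab order and that the fields are special, mount point, fstype, options, time. the function name and sample entries are hypothetical.

```python
def next_mount_after(mnttab_text, last_listed_mount_point):
    """Return (special, mount_point) of the entry following the last one df printed.

    Returns None if the mount point is not found or is the final entry.
    """
    entries = [line.split() for line in mnttab_text.splitlines()]
    entries = [fields for fields in entries if len(fields) >= 3]
    for index, fields in enumerate(entries):
        if fields[1] == last_listed_mount_point:
            if index + 1 < len(entries):
                following = entries[index + 1]
                return following[0], following[1]
            return None
    return None


# Hypothetical sample in mnttab layout; the hosts and paths are made up.
SAMPLE_MNTTAB = (
    "/dev/dsk/c0t3d0s0\t/\tufs\trw,suid\t879000000\n"
    "abc:/export/home\t/home\tnfs\tvers=2,intr,bg\t879000100\n"
    "xyz:/export/data\t/data\tnfs\tvers=2,intr,bg\t879000200\n"
)

if __name__ == "__main__":
    # Suppose df printed /home and then stopped responding.
    print(next_mount_after(SAMPLE_MNTTAB, "/home"))
```

the server name in the returned entry is the one to check first.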

good luck getting it fixed.  we've had none.
j.
--
Jay Scott               512-835-3553

Applied Research Labs, Computer Science Div.
University of Texas at Austin

 
 
 

Post by Jay Sco » Fri, 14 Nov 1997 04:00:00




>  So I don't think that it's a "quota" problem.

aarrgh!  forgot to mention.  we're not running quotas,
so that's correct, it's not a quota problem.
j.
--
Jay Scott               512-835-3553

Applied Research Labs, Computer Science Div.
University of Texas at Austin
 
 
 

Post by Jay Sco » Fri, 14 Nov 1997 04:00:00




>[[ Reply by email or post, don't do both ]]


>>  I have a box called "xyz" (SunOS) which is exporting one partition that is
>>  not using disk quotas.  When "xyz" goes down, it hangs my Solaris boxes
>>  which mount this partition.

>>  So I don't think that it's a "quota" problem.

>It's a quota problem.

>If you don't mount the filesystems with the "noquota" option, the quota
>program (run from /etc/profile and /etc/.login, depending on the user's
>shell) will ask the servers of *all* filesystems currently mounted whether
>the user has any quotas.  This will take a *LONG* time if the server is down.

bah, humbug.  we don't have quotas, and we have this problem.
and in all the YEARS it's gone on, no one, even the mission critical
"big iron" people never suggested a bit of this, and they had all
the info about automounter and our maps and so on.
--
Jay Scott               512-835-3553

Applied Research Labs, Computer Science Div.
University of Texas at Austin
 
 
 

Post by Casper H.S. Dik - Network Security Engine » Fri, 14 Nov 1997 04:00:00


[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]


>bah, humbug.  we don't have quotas, and we have this problem.
>and in all the YEARS it's gone on, no one, even the mission critical
>"big iron" people never suggested a bit of this, and they had all
>the info about automounter and our maps and so on.

Well, if they never mentioned "noquota" mounting or replacing /usr/bin/quota
by /bin/true or removing it from /etc/.login and /etc/profile,
*EVEN THOUGH YOU'RE NOT RUNNING QUOTAS* they must have had their
heads in paper bags.

It's one of those things that crops up frequently in sun-managers.

Even if you don't use quotas, you must mount noquota or every mounted
filesystem gets interrogated about quotas at boot.
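
A minimal sketch for finding where a login script invokes quota, so the call can be reviewed or removed. The pattern, the function name and the sample fragment are assumptions for illustration; the comment stripping is deliberately naive and will misread a "#" that is not a comment marker.

```python
import re

# Matches "quota" as a command word, with or without an absolute path prefix.
QUOTA_CALL = re.compile(r"(?:^|[\s;&|`(])(?:/\S*/)?quota(?=\s|;|$)")


def quota_lines(script_text):
    """Return (line_number, stripped_line) for lines that appear to run quota."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naive: drop everything after the first '#'
        if QUOTA_CALL.search(code):
            hits.append((number, line.strip()))
    return hits


# Hypothetical fragment of a system-wide login script.
SAMPLE_PROFILE = (
    "# hypothetical login script fragment\n"
    "if [ ! -f .hushlogin ]\n"
    "then\n"
    "        /usr/bin/quota\n"
    "        /bin/cat -s /etc/motd\n"
    "fi\n"
)

if __name__ == "__main__":
    for number, text in quota_lines(SAMPLE_PROFILE):
        print(number, text)
```

Running it over the contents of /etc/profile and /etc/.login would show which lines to comment out if mounting "noquota" everywhere is not practical.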

Casper

--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

 
 
 

Post by Casper H.S. Dik - Network Security Engine » Fri, 14 Nov 1997 04:00:00


[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]




>>  So I don't think that it's a "quota" problem.
>aarrgh!  forgot to mention.  we're not running quotas,
>so that's correct, it's not a quota problem.

It *IS* a quota problem, unless you mount "noquota".

Whether you run quotas or not is *TOTALLY* irrelevant.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.