I need a little troubleshooting advice.
In a nutshell...
My server, an E450 running Solaris 2.6 (SunOS 5.6), is losing network connectivity.
SunOS foobarserver 5.6 Generic_105181-26 sun4u sparc SUNW,Ultra-4
Setup: E450, Solaris 2.6 (SunOS 5.6); hme0 and qfe0 both have reachable IPs.
Symptoms: I'm ssh'd and telnetted into both IPs, tailing log files and
running ping tests, and the sessions get dropped at random for 45
seconds to as long as 2 minutes, zero to 2 times a day. rlogin and ftp
die as well. During an outage the server is unpingable from the
outside on either IP.
The NFS mount is as follows:
/usr/stc from foobarserver2:/export/stc
Flags: vers=2,proto=udp,sec=sys,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5
Lookups: srtt=7 (17ms), dev=3 (15ms), cur=2 (40ms)
Reads: srtt=4 (10ms), dev=3 (15ms), cur=2 (40ms)
Writes: srtt=7 (17ms), dev=4 (20ms), cur=2 (40ms)
All: srtt=7 (17ms), dev=3 (15ms), cur=2 (40ms)
I understand that version 2 over UDP isn't very good, but the NetWare
server won't accept a vers=3/TCP connection because it needs to be
upgraded first.
/var/adm/messages is pummelled with NFS timeouts that last about 1
second and recover within the same second. The outages I'm
experiencing can be correlated with the NFS timeouts that span 30-120
seconds. I initially thought that having this NFS mount under /usr
wasn't the best idea, as it locks up the /usr directory (not its
subdirectories), and that it might be the cause of our problems rather
than an effect. However, I have been unable to reproduce it in the
lab, and ssh runs out of /opt. Could being unable to stat the /usr
directory cause this problem in any way? I'm installing the latest
recommended patches this Sunday. What else should I be looking at?
The NFS errors are the only thing showing up in /var/adm/messages.
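To make the correlation between log entries and outages easier to check, here is a rough Python sketch that groups timestamped log lines into bursts, so the 1-second blips can be separated from the 30-120 second stretches. It assumes the usual syslog "Mon DD HH:MM:SS" prefix on each line. The function names, the "nfs" keyword filter, and the 10-second gap are illustrative choices of mine, not anything taken from the server, and the script would have to run on a machine with Python 3 against a copy of the log.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year=2000):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog-style line.

    syslog stamps carry no year, so one is assumed (hypothetical default).
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def group_bursts(times, max_gap=timedelta(seconds=10)):
    """Merge timestamps into (start, end, count) bursts.

    Two consecutive timestamps belong to the same burst when they are
    no more than max_gap apart.
    """
    bursts = []
    for t in sorted(times):
        if bursts and t - bursts[-1][1] <= max_gap:
            start, _, count = bursts[-1]
            bursts[-1] = (start, t, count + 1)
        else:
            bursts.append((t, t, 1))
    return bursts


def nfs_bursts(lines, keyword="nfs", max_gap=timedelta(seconds=10)):
    """Group lines mentioning keyword (case-insensitive) into bursts."""
    times = []
    for line in lines:
        if keyword not in line.lower():
            continue
        try:
            times.append(parse_syslog_time(line))
        except ValueError:
            # Skip lines without a parseable timestamp prefix.
            continue
    return group_bursts(times, max_gap)
```

Feeding it the lines of /var/adm/messages and printing each burst's start, end, and count should show whether the long bursts line up with the times the telnet and ssh sessions dropped.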
Today I had a similar problem with a development server whose NFS
mounts are TCP/vers=3: same kind of outage, but no messages (probably
because it's version 3 and not UDP/version 2). So now I'm back to
thinking it's a network issue.
Thanks.
Mike