We are setting up Network Management Systems using Sun Ultra 5s
running Solaris 2.6 (SunOS 5.6) and HP OpenView 5.1, and have found that
the machines occasionally lock up. This has occurred on two
separate machines, which are due to be shipped to our customer next week.
We have set up several similar machines in the past, but this is the
first time we have seen this problem. However, because of other
delays, this is also the first time we have done extensive long-term testing.
One other piece of information: before we install the OS (the Ultra 5s
ship with 2.7), HP OpenView, patches and our application, we have to
install a memory upgrade (purchased from Sun) and a SCSI card (also
purchased from Sun).
The "crash" appears to occur while a window is being moved across the
desktop or closed. After the crash, the system does not
respond to any keyboard or mouse input. We can still telnet to the machine,
and we can see that the Xsun process is hogging the CPU (approx. 99%).
From the telnet session we can usually perform an orderly shutdown
using sync; sync; reboot, but on a few occasions we have had to power
cycle, although this is usually after the machine has been in the
locked-up state for several hours. We thought the culprits might be:
1) a bad installation of the RAM DIMM and/or SCSI card,
2) running the top utility to observe process activity,
3) something to do with HP OpenView.
We think (1) is unlikely, since it is happening on both machines. We thought (2)
was the cause until we had a crash when top wasn't running. It could be (3), as the
HPOV background processes are running all the time,
but the question remains: why does the Xsun process's CPU utilisation
go through the roof? Xsun normally uses less than 0.5% of the CPU.
Has anyone else out there seen this, or does anyone know what causes it?
Thanks,