As part of a consulting job, I am trying to straighten out a few
servers which crash every few days. They are 386's running Sun's
(no-longer-supported) Interactive Unix. (This name always seemed strange
to me; what version of Unix isn't interactive?)
The configuration of these systems, including their startup files and
crontabs, is really bizarre; they've had a whole succession of sysadmins (who,
by the way, seem to have been more fluent in DOS batchfile than in shell) adding
more cruft on top of an already not-too-well-understood system.
There's a job running every hour to do a "find / -name core"
and delete the corefiles. Programs are frequently crashing, and the
resultant corefiles clog the disk.
I wanted to see what programs were crashing and causing these
core dumps; the "file core" command says "core: English text" instead of,
as I expected, "core: corefile from someprog". The only debugger on the
system is gdb, and gdb won't load the corefile... I did a simple test,
making a C program which did nothing but try to assign to an uninitialized
pointer, so it would get a SEGV and dump core. This worked as expected,
but then "gdb myprog core" said that the core file was not a recognized
core file format. I don't have adb or any other debuggers to work with
on this system. What to do? (The corefile generated by my program
was also "English text" according to the "file" command.) I could
use "strings core" and sift through to figure out the offending program,
but it would be nice to be able to actually do some debugging.
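
As a stopgap, the "strings" idea could be folded into the hourly cleanup
job, so that each corefile leaves some trace before it is deleted. Below is
a rough sketch of what I have in mind. The function name, its two arguments,
and the 20-line cutoff are placeholders of my own, and I am assuming a
POSIX-style sh with "read -r" and "head -n", which the shell on these boxes
may not fully provide.

```shell
#!/bin/sh
# Sketch only: log a short report on every core file, then delete it.
# log_and_remove_cores is a hypothetical name, not an existing tool.
log_and_remove_cores() {
    top=$1      # directory tree to search (the cron job uses /)
    log=$2      # file that accumulates the reports

    find "$top" -type f -name core -print | while read -r corefile; do
        {
            echo "=== $corefile"
            # Owner, size, and timestamp help narrow down which job crashed.
            ls -l "$corefile"
            # First printable strings in the dump; the offending program's
            # name may or may not turn up among them.
            strings "$corefile" | head -n 20
        } >> "$log" 2>&1
        rm -f "$corefile"
    done
}
```

It would be called from the crontab entry in place of the current find,
for example as log_and_remove_cores / /some/logfile (the log path being
whatever suits the system). This only records evidence; it is no substitute
for a debugger that can actually read the corefile.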
The other question I had, for anyone familiar with Interactive Unix,
is how to get the system to write a dumpfile to be analyzed by "crash".
When the systems in question hang, the sysadmin just hits the reset button.
There's no way to learn anything about what caused the crash unless somehow
a dumpfile gets written. What causes that?
Thanks greatly to anyone who takes the time to answer these questions.