> I've had 2 occurrences of the following:
> First I receive:
> Audit: Collection file inode table overflow
> Audit Subsystem termination due to irrecoverable error.
These should only occur if you have auditing enabled. You should have
auditing enabled if you 1) work for a paranoid government agency that, in
the interests of security, wants its Pentium Pro systems to run like a
Commodore 64; or 2) are trying to debug a recurring system problem
(break-in or crash), and no other methods seem to work.
So the first thing is to make sure you mean to have auditing turned on.
> Then about 1 half hour later, something bad happens to cron. I have a
> short job running each minute (does netstat -m to a file); this stops
> running; I come in at about 0900 and find the following messages
> repeating 6 times each minute:
> c queue max run limit reached Wed Nov 13 09:17:00
> rescheduling a cron job Wed Nov 13 09:17:00
> The date and time are updated to system time.
If you don't intend to have auditing turned on, turn it off; end of
problem. But if you do, you *must* have a policy about what to do when
auditing resources run out. When the first message appears, you need to
immediately look at the system, figure out what filesystem it's talking
about, and free up space that can be consumed by the auditing data. You
have committed to giving away great gobs of disk space to auditing. You
cannot stop feeding the beast.
Auditing probably shut down when some filesystem was nearly full (and
that filesystem was probably /). After that, other activities continued
to fill that filesystem -- various logs, perhaps mail from your 1/minute
cron job? Eventually the filesystem filled the rest of the way and
other parts of the system started getting indigestion.
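If you want to catch this before auditing gives up, something along these lines can watch how full a filesystem is. This is only a rough sketch: the function names and the 90% threshold are my own assumptions, not anything the audit subsystem defines, and the actual level at which auditing shuts down is not stated here.

```python
import shutil


def percent_used(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a zero size; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


def nearly_full(paths, threshold=90.0):
    """Return (path, percent) pairs for filesystems at or above `threshold`.

    The 90% default is an arbitrary illustration, not a documented limit.
    """
    report = []
    for path in paths:
        pct = percent_used(path)
        if pct >= threshold:
            report.append((path, pct))
    return report
```

Calling `nearly_full(["/"])` from a periodic job and mailing yourself any non-empty result would give some warning that it is time to clean up.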
> Using crontab to stop my 1/minute netstat does not stop these messages.
Not unless you clean up...
> Using ps, I find many jobs which are "/etc/cron" starting about the time
> that the last successful cron job ran, spaced 1 minute apart for an hour or
> two; similarly I find a large number of jobs that look like:
> root 2794 2793 0 01:57:00 ? 0:00 dlvr_audit 847880520 19 26 2793 cron 6
> root 0 run an at/cron session all securi
> where the preceding 2 lines are really one long line (and yes, it really
> does break off in the middle of the word security).
Cron sends messages to the auditing subsystem, documenting what
activities cron has initiated. Apparently this interaction goes awry
when auditing is enabled but has stopped because it ran out of space. Each
job run by cron starts an audit delivery process; those processes hang,
waiting for the auditing subsystem to be ready to receive the messages.
Either turn off auditing or feed it more regularly.
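To see how many of those delivery processes have piled up, you could count them in the ps listing. A minimal sketch, assuming `ps -ef` is available and that the stuck processes show `dlvr_audit` in their command line as in the listing quoted above; the helper names are made up for illustration.

```python
import subprocess


def count_matching(ps_output, needle):
    """Count the lines of a ps listing that contain `needle`."""
    return sum(1 for line in ps_output.splitlines() if needle in line)


def running_processes():
    """Return the text of `ps -ef` (assumes ps accepts -ef on this system)."""
    result = subprocess.run(["ps", "-ef"], capture_output=True, text=True)
    return result.stdout


# One line taken from the listing quoted earlier in this post.
sample = "root 2794 2793 0 01:57:00 ? 0:00 dlvr_audit 847880520 19 26 2793 cron 6"
hung = count_matching(sample, "dlvr_audit")
```

On a live system, `count_matching(running_processes(), "dlvr_audit")` gives the current count; a number that grows by one a minute matches the behaviour described above.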
>Bela<