Monitoring System Error Messages (errpt)?

Post by Pepe Leg » Thu, 30 Sep 1999 04:00:00



Does anybody have a good script, or even just a rule of thumb, for
monitoring system error messages on AIX as reported by errpt and taking the
appropriate action?

Capturing the output from errpt is easy. The tricky part is identifying which
messages are urgent and require immediate attention (like disk or SCSI errors)
and which ones are just warnings.

Thanks, PP

 
 
 

Post by Burkhard Weebe » Fri, 01 Oct 1999 04:00:00




> Does anybody have a good script or even just a rule of thumb about the best way
> to monitor system error messages on AIX, as reported by errpt and taking the
> appropriate action ?

> Capturing the output from errpt is easy . The tricky part is identifying which
> messages are urgent and require immediate attention ( like disk or SCSI errors
> ) and which ones are just warnings.

> Thanks, PP

There is an errnotify class in the ODM that runs a defined command when an
error occurs. This class is standard in AIX 4.
For AIX 3 I have a couple of files that implement this errnotify class and
add a SMIT interface. I don't know whether this SMIT interface works with
AIX 4 (not tested yet).

Some major problems to watch are already implemented in
/usr/lib/ras/notifymeth, which is the default method (program) called.

I found that watching DISK_ERR2 (disconnect of disk) and DISK_ERR4 (bad
block relocation, BBR) is useful.
BBR should be acted on if it occurs more than 2 times per GB of disk
capacity per year.
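
The rule of thumb above is simple arithmetic, and a minimal sketch of it might look like the following. The function name bbr_exceeded and its three arguments are made up for illustration; the only thing taken from the post is the limit of 2 relocations per GB per year, scaled here to the length of the observation period.

```shell
#!/bin/sh
# bbr_exceeded RELOCATIONS CAPACITY_GB DAYS
# Succeeds (exit status 0) when the number of bad block relocations seen
# over DAYS is above 2 per GB of capacity per year, and fails otherwise.
# Hypothetical helper: the name and argument order are illustrative only.
bbr_exceeded() {
    awk -v n="$1" -v gb="$2" -v days="$3" 'BEGIN {
        # Allowed relocations for this disk size over this period
        allowed = 2 * gb * days / 365
        if (n > allowed) exit 0
        exit 1
    }'
}

# Example: a 4.5 GB disk watched for 90 days allows about 2.2 relocations,
# so a third DISK_ERR4 entry in that period would trip the check.
# if bbr_exceeded 3 4.5 90; then echo "hdisk needs attention"; fi
```

How the relocation count is obtained is left open here; counting DISK_ERR4 entries per disk in the error log is one option.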

There may be other events you want to watch (for example the TAPE_ERR
errors, as a cue to clean the tape drive).
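
For everything that is not on a list of specific labels, a rough first cut can be made from the T (type) and C (class) columns of the errpt summary output. The sketch below assumes the usual summary layout (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION) and assumes that "permanent hardware error" is a reasonable definition of urgent. That policy and the function name classify_errpt are illustrative, not something from this thread.

```shell
#!/bin/sh
# classify_errpt: read errpt summary lines on stdin and print
# "URGENT <identifier> <resource>" for permanent (T = P) hardware (C = H)
# entries, and "WARN <identifier> <resource>" for everything else.
# Hypothetical helper; adjust the rule to your own site policy.
classify_errpt() {
    awk '
        $1 == "IDENTIFIER" { next }   # skip the header line
        NF < 5             { next }   # skip blank or malformed lines
        {
            level = ($3 == "P" && $4 == "H") ? "URGENT" : "WARN"
            print level, $1, $5
        }
    '
}

# Typical use on an AIX host:
# errpt | classify_errpt | grep '^URGENT'
```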

I'm sorry, but it looks like you will have to work with odmget, odmadd, and
odmdelete to add your tailored notification.
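
As a rough illustration of what that involves, the sketch below writes an errnotify stanza that can then be loaded with odmadd. The object name disk_bbr, the file /tmp/disk_bbr.add and the notify script /usr/local/bin/errmail are hypothetical; only the errnotify class, the DISK_ERR4 label and the odm* commands come from the post. In the method string, $1 is expanded by the error daemon to the sequence number of the logged entry.

```shell
#!/bin/sh
# make_errnotify_stanza NAME LABEL METHOD
# Print an errnotify stanza in the format odmadd reads.
# Hypothetical helper; the field values passed in are up to you.
make_errnotify_stanza() {
    printf 'errnotify:\n'
    printf '\ten_name = "%s"\n' "$1"      # unique name for this object
    printf '\ten_persistenceflg = 1\n'    # keep the object across reboots
    printf '\ten_label = "%s"\n' "$2"     # error label to match
    printf '\ten_method = "%s"\n' "$3"    # command to run on a match
}

# Typical use on an AIX host, as root (not executed here):
# make_errnotify_stanza disk_bbr DISK_ERR4 '/usr/local/bin/errmail $1' \
#     > /tmp/disk_bbr.add
# odmadd /tmp/disk_bbr.add                            # add the object
# odmget -q "en_name='disk_bbr'" errnotify            # check it
# odmdelete -o errnotify -q "en_name='disk_bbr'"      # remove it again
```

The notify script can then call errpt -a -l with the sequence number it receives to get the full entry.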

--
Burkhard Weeber
viastore systems GmbH
P/O Box 300668
D-70446 Stuttgart


 
 
 

Post by Geral » Fri, 01 Oct 1999 04:00:00


If you're running AIX 4 and are interested in hardware errors, you might
look into diagela. Try "info -s diagela".

> Does anybody have a good script or even just a rule of thumb about the best way
> to monitor system error messages on AIX, as reported by errpt and taking the
> appropriate action ?

> Capturing the output from errpt is easy . The tricky part is identifying which
> messages are urgent and require immediate attention ( like disk or SCSI errors
> ) and which ones are just warnings.

> Thanks, PP

 
 
 

1. Error message when issuing errpt as non-root

I have been getting the following error message whenever I try to run
"errpt" (either on its own or via smit) as a non-root user on a 4.2.1
RS6000 node. The normal errpt output follows, but I would like to fix
the error. Running the command as root does not give the error.

Unable to get errlg_file attribute from ODM object class SWservAt.
Using default value /var/adm/ras/errlog for error log file.

ls -la /etc/objrepos/SWs*
-rw-rw-r--   1 root     system      4096 Jul 06 17:41 SWservAt
-rw-rw-r--   1 root     system      8192 Jul 06 17:41 SWservAt.vc

Could anyone possibly assist?
