> A question:
> My direct superior, The VP of OPS, who was once long ago an SA as well,
> gave me a directive of 98% availability. Needless to say that leaves a huge
> '?'. I was wondering if you guys would help me define what the deliverable
> really is when someone says 98% avail.
First get him to define 98% of what. 24x7 (i.e. 24 hrs/day forever)?
7:00am - 6:00pm Mon-Fri? 98% of any given day? This will have a BIG
effect on how you approach the problem. It is just as important to
determine how many crashes are reasonable as well as how many hours up.
That is, 98% of 24x7 measured over a year allows for one crash a year
lasting about a week (2% of 8760 hours is roughly 175 hours). 98% of any
given day allows for one crash a day of just under half an hour.
Ideally, you want to shoot for something in the middle. :)
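
To put numbers on this, here is a rough Python sketch of the downtime a
98% target allows over different measurement windows. The function name
and the choice of windows are my own illustration (the 7:00am - 6:00pm
Mon-Fri window is taken as 11 hours x 5 days), not anything standard.

```python
def downtime_budget_hours(target_pct, window_hours):
    """Hours of downtime allowed in a window at the given availability target."""
    return window_hours * (100.0 - target_pct) / 100.0

# Measurement windows, in hours (illustrative choices only).
windows = {
    "one day (24x7)": 24,
    "one week (24x7)": 24 * 7,
    "one week (7am-6pm Mon-Fri)": 11 * 5,
    "one year (24x7)": 24 * 365,
}

for name, hours in windows.items():
    budget = downtime_budget_hours(98.0, hours)
    print(f"{name:28s} {budget:8.2f} hours allowed down")
```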
> Here's our scenario:
> 1. The rookie hero, yours truly, a capable NT admin who is now doing SCO
> Unix, for the last 6 months, who had not had a formal training in SA let
> alone UNIX. In short, don't make any assumptions as to my skills.
Include in your proposal a request for SA level training for yourself.
> 2. The suspect, a Intel based machine, consists of 128mg, 14 GB on three
> drives, 2 200PPros, etc. Takes pride in being the best of breed, with no
> generic parts, and no new gewgaws. Runs SCO 5.0.2. Keeps 5 programmers and
> 3 QA people in food, clothing and shelter. Has a tendency to be far to open
> to change, root password on the loose.
Immediately change the root password. Have anyone who complains submit
to you a request in writing (going to need a certain amount of paper for
CYA!) with their needs for root access. Have them list the functions
they will be doing, and why it cannot be done another way. Keep in mind
there is legitimate stuff that falls into this category. Develop a
policy that says who has root level, and how often the root password
changes. Make it an immediate policy that the root password is to NEVER
be hardcoded into any scripts (especially FTP) or programs on this, or
any other system. If you do not have the backing from your boss to do
this, then tell him that what he wants is not doable. No sense in
trying to be responsible for an area over which you do not have
authority.
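Find out how many of the rest of the logon ids have a UID of zero (check
/etc/passwd) or group zero privileges (cat /etc/group). A UID of zero
gives them the same security level as root, except for those processes
that explicitly look for the name 'root'. Membership in group zero gives
them access to anything root's group can read or write.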
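
One way to run that check, sketched in Python. I am assuming the usual
colon-separated passwd and group file formats. The script also lists
accounts with a UID of zero, since those are root under another name.
The sample account names are made up; on a real box you would feed the
functions the contents of /etc/passwd and /etc/group instead.

```python
def uid_zero_accounts(passwd_text):
    """Return login names whose UID field is 0 in passwd-format text."""
    names = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # passwd format: name:password:uid:gid:comment:home:shell
        if len(fields) >= 4 and fields[2].strip() == "0":
            names.append(fields[0])
    return names

def gid_zero_accounts(passwd_text, group_text):
    """Return login names with primary GID 0 or listed as members of group 0."""
    names = set()
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 4 and fields[3].strip() == "0":
            names.add(fields[0])
    for line in group_text.splitlines():
        fields = line.split(":")
        # group format: name:password:gid:member1,member2,...
        if len(fields) >= 4 and fields[2].strip() == "0":
            names.update(m.strip() for m in fields[3].split(",") if m.strip())
    return sorted(names)

# Hypothetical sample data, for illustration only.
sample_passwd = (
    "root:x:0:0:Superuser:/:/bin/sh\n"
    "toor:x:0:1:Second root:/:/bin/sh\n"
    "dev1:x:201:0:Programmer:/usr/dev1:/bin/sh\n"
    "qa1:x:202:50:QA:/usr/qa1:/bin/sh\n"
)
sample_group = "root:x:0:root,dev2\nusers:x:50:qa1\n"

print("UID 0:", uid_zero_accounts(sample_passwd))
print("GID 0:", gid_zero_accounts(sample_passwd, sample_group))
```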
> 3. The goal: how do I define uptime?
> To wit:
> What is a reasonable percentage and sample time?
See above. You also have to allow for regular maintenance (upgrades,
repairs, file reorgs). Get your boss to agree ahead of time whether this
counts against the 98% or not.
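
To show why that agreement matters, here is a small Python sketch with a
made-up month of outages. The function name, the outage figures and the
planned/unplanned flag are all hypothetical. Note one simplification:
planned hours are simply not counted as downtime, rather than being
removed from the measurement window.

```python
def availability_pct(window_hours, outages, count_planned=True):
    """Percent availability over a window.

    outages is a list of (hours_down, planned) tuples; planned outages
    are ignored when count_planned is False.
    """
    down = sum(hours for hours, planned in outages
               if count_planned or not planned)
    return 100.0 * (window_hours - down) / window_hours

# Hypothetical month: 30 days of 24x7 service.
month_hours = 30 * 24
outages = [
    (8.0, True),    # scheduled upgrade
    (6.0, True),    # scheduled file reorg
    (3.0, False),   # unplanned crash
]

# Maintenance counted against you: 97.64, which misses 98%.
print(round(availability_pct(month_hours, outages, count_planned=True), 2))
# Maintenance excluded: 99.58, which meets 98% comfortably.
print(round(availability_pct(month_hours, outages, count_planned=False), 2))
```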
> If not what can be achieved without redundant parts or systems?
Not much. You can go for a RAID 5 array, which stripes your data with
parity across multiple drives so the system survives the loss of any one
drive (mirroring proper is RAID 1). Do your backups religiously (I would
recommend using vdump, if it is available).
> What items are normally considered out of my control besides acts of God?
Anything caused directly by any individual who has access to the system,
i.e. if one of the programmers crashed the telnet daemon (if you are
using telnet, or LAT, etc.) and you have to reboot to recover, that
should not be counted against you. On the other hand, if, as system
admin, you have final say on what does and what does not get run, then
it could.
You also need to coordinate with your local power company as to what
their scheduled down times in your area are, and get a contact name from
them so you can deal with those "non-scheduled events". :)
> How do I record and verify this info?
Set up a cron job to run once an hour every day that will run uptime,
sar and/or vmstat; maybe even a 'ps -ef' too (or 'ps aux' where ps takes
BSD options). Append both standard out and standard error to a daily log
file. This will give you a consistent snapshot of both availability and
performance (in general).
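
A minimal sketch of such a snapshot job in Python, to be run hourly from
cron. The function name, the log file naming and the /tmp directory are
my own assumptions, and the command list should be adjusted to whatever
your system actually provides.

```python
import datetime
import os
import subprocess

def log_snapshot(commands, log_dir):
    """Run each command and append its stdout and stderr to today's log file."""
    now = datetime.datetime.now()
    log_path = os.path.join(log_dir, now.strftime("snapshot-%Y%m%d.log"))
    with open(log_path, "a") as log:
        log.write("=== %s ===\n" % now.isoformat())
        for cmd in commands:
            log.write("--- %s ---\n" % " ".join(cmd))
            log.flush()  # keep our headers ahead of the child's output
            try:
                # stderr is merged into stdout so both land in the same file
                subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
            except OSError as err:
                # command not installed on this system; record that and move on
                log.write("could not run: %s\n" % err)
    return log_path

if __name__ == "__main__":
    # Command list and log directory are assumptions; change to suit.
    log_snapshot([["uptime"], ["vmstat"], ["ps", "-ef"]], "/tmp")
```

A gap in the hourly timestamps, or an uptime figure that has reset, tells
you roughly when the box went down and came back.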
> I am in the hot seat, about to be reassigned to cross walk duty or worse.
> Please help.
> Russ Conner
> SA
> Intrix Systems Grou
It sounds as if this is a development box. Generally, most companies
tolerate a bit more downtime on a development box than they would on a
production system. Good luck and hope this helps.
Courtesy copy e-mailed.
--
The above opinions are mine, not my employer's.