My employer is starting to move more and more of the business onto Linux
servers, and we're getting to the point where it could be very expensive if
a server died.
We're considering building some redundancy into our servers as follows:
- The applications and data for each service (e.g., Oracle database, SMTP
server, HTTP server) will be stored on sets of RAID disks.
- We'll have N+2 machines (where N is the number of services running on our
Linux machines), with the extra two just sitting around waiting for a problem.
- Each of the N+2 machines will have a SCSI controller, an Ethernet
controller, and a small internal disk with nothing on it but the kernel,
drivers, and bare operating system (that is, no web/mail/database server).
- Each of the RAID sets will also include a script that applies any settings
the server needs and starts the appropriate application. This script will
also bind the service's IP address to the network card (e.g., the HTTP
disks will bind www.rentrak.com to the card, the SMTP disks will bind
mail.rentrak.com, and so on). A rough sketch of such a script follows
this list.
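For what it's worth, here's a rough sketch in Python of what one of these
per-service startup scripts might look like. The address, interface name,
and start command are all made-up illustrations, not our real configuration;
the real script would carry whatever its own RAID set needs.

    #!/usr/bin/env python
    # Hypothetical per-service startup script stored on a RAID set.
    # The address, interface, and start command below are assumptions
    # for illustration only.
    import subprocess
    import sys

    SERVICE_IP = "192.168.1.80/24"   # assumed address for www.rentrak.com
    INTERFACE = "eth0"               # assumed network card
    START_CMD = ["/usr/sbin/httpd", "-f", "/raid/www/conf/httpd.conf"]

    def bind_ip():
        # Add the service address as an alias on the card, so clients
        # reach the service no matter which physical box hosts it.
        subprocess.check_call(
            ["ip", "addr", "add", SERVICE_IP, "dev", INTERFACE])

    def start_service():
        # Launch the application straight off the RAID set.
        subprocess.check_call(START_CMD)

    if __name__ == "__main__":
        try:
            bind_ip()
            start_service()
        except subprocess.CalledProcessError as e:
            sys.exit("startup failed: %s" % e)

The point of the design is that the script travels with the disks, so the
spare boxes never need any service-specific configuration of their own.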
The goal is obviously to get rid of any single points of failure. Disk
failure will be handled automatically by RAID, and any other hardware
problems can be fixed by moving the disks over to one of the backup
machines.
A nice benefit of this separation of hardware/software is that we can
upgrade hardware less painfully; we build the hardware, make sure the
kernel's set up, and just move the application disks to the new box.
Does anyone see any problems with this scheme? I'm not sure whether it
would even be possible to move the RAID disks to a machine with a different
RAID controller (or to a machine using software RAID). Am I missing
anything else?
--
Aaron Harsh