hot spare system

Post by Mark Wiewel » Tue, 27 Oct 1998 04:00:00



Hi all,

I would like to build a kind of hot spare system for my server, which is
currently running successfully (OpenServer 5.0.2).
There are two directories on my server that I would like to mirror to
the hot standby machine. Any change to the data in these directories
should be made "immediately" on the other machine as well.
Any ideas on the smartest way to do this? Currently I am thinking
about a cpio copy called by cron every 5 minutes. Any better ideas?
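The cron-driven copy described above could be sketched like this, in Python rather than cpio. This is a minimal sketch, assuming the standby's copy of each directory is reachable as a local path (for example over a network mount); the function name `sync_changed` is hypothetical. Only new or modified files are copied, judged by size and modification time.

```python
import os
import shutil


def sync_changed(src: str, dst: str) -> list[str]:
    """One-way copy of new or modified files from src to dst.

    A file counts as changed when it is missing in dst or its size or
    whole-second modification time differs. Deletions in src are NOT
    propagated to dst.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            st = os.stat(s)
            if os.path.exists(d):
                dt = os.stat(d)
                if dt.st_size == st.st_size and int(dt.st_mtime) == int(st.st_mtime):
                    continue  # unchanged since the last run
            shutil.copy2(s, d)  # copy2 keeps the mtime, so the next run skips it
            copied.append(os.path.relpath(d, dst))
    return sorted(copied)
```

Run from cron every 5 minutes, this narrows the gap but does not close it: a file that is being written while it is copied can arrive half-updated, which matters most for database files.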

Any help is appreciated,
Mark

--
Mark Wiewel
System Administrator Unix
Legion Telekommunikation GmbH
Am Seestern 24
D-40547 Düsseldorf
Tel.: 0211-523 95 59
Fax:  0211-523 95 99

hot spare system

Post by -bill » Fri, 30 Oct 1998 04:00:00


Do you want a "warm" standby or a "hot" standby?

A hot standby will pick up the load of the disabled server without
interruption, also called "failover."


used for (about) 8 years and many versions.  They now offer a "failover"
solution that we have looked at but not yet installed.  It looks as if
it will do the job well, including reassigning the IP address of the
failed system to the backup system and maintaining mirrored data (via a
second network between the two machines).
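The product above isn't described in detail, so what follows is only a generic, hypothetical sketch of the decision most failover setups make: the standby listens for heartbeats from the primary over the second network and takes over after several in a row are missed. The class name, the interval, and the three-missed-beats threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Decides when a standby should take over from the primary.

    Hypothetical sketch: the primary is expected to send a heartbeat
    every `interval` seconds; after `missed_limit` intervals with none,
    the standby declares it dead and takes over.
    """
    interval: float
    missed_limit: int = 3
    last_beat: float = 0.0
    active: bool = False  # True once the standby has taken over

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the primary at time `now`."""
        self.last_beat = now

    def check(self, now: float) -> bool:
        """Return True exactly once, at the moment takeover should happen."""
        if self.active:
            return False
        missed = (now - self.last_beat) / self.interval
        if missed >= self.missed_limit:
            # A real product would now claim the failed system's IP
            # address and start services on the mirrored data.
            self.active = True
            return True
        return False
```

A real failover system also has to make sure both machines never claim the address at once (for example when only the private link fails), which this sketch ignores.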

Give them a yell if you want a hot standby.

There are other failover solutions, but I am not at all familiar with
them or their companies.
--

-bill-



hot spare system

Post by Steve Wer » Fri, 30 Oct 1998 04:00:00


: Hi all,
:
: I would like to build a kind of hot spare system for my server, which is
: currently running successfully (OpenServer 5.0.2).
: There are two directories on my server that I would like to mirror to
: the hot standby machine. Any change to the data in these directories
: should be made "immediately" on the other machine as well.
: Any ideas on the smartest way to do this? Currently I am thinking
: about a cpio copy called by cron every 5 minutes. Any better ideas?

There are a lot of holes in doing something like this.  Basically,
you have inconsistent data on the backup machine at any given
time.  This is especially dangerous for database applications,
and it is the reason we use terms like "online backup" and "offline
backup".

I'd do RAID and/or some kind of hot-swappable devices.  Or rdist
once a night across the network, when the system is not being used
(very much).

-sw

hot spare system

Post by Bill Vermilli » Sat, 31 Oct 1998 04:00:00




>: Hi all,
>: I would like to build a kind of hot spare system for my server,
>: which is currently running successfully (OpenServer 5.0.2). There
>: are two directories on my server that I would like to mirror to the
>: hot standby machine. Any change to the data in these directories
>: should be made "immediately" on the other machine as well. Any
>: ideas on the smartest way to do this? Currently I am thinking about
>: a cpio copy called by cron every 5 minutes. Any better ideas?
>There are a lot of holes in doing something like this. Basically,
>you have inconsistent data on the backup machine at any given
>time. This is especially dangerous for database applications, and
>it is the reason we use terms like "online backup" and "offline
>backup".

In a properly implemented system there is no problem with this.
The second system is kept exactly in sync with the first, give or
take a few milliseconds or microseconds.
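The post doesn't say how such systems stay in step. One common way is synchronous mirroring, where a write is not acknowledged until it has reached both copies; here it is as a hypothetical Python sketch. The class name is made up, and the two paths are assumed to live on independent disks or machines (for example, the secondary on a network mount).

```python
import os


class MirroredFile:
    """Append-only file kept identical on a primary and a secondary path.

    Each record is written and fsync'ed to both copies before append()
    returns, so an acknowledged write exists in both places.
    """

    def __init__(self, primary: str, secondary: str):
        self.paths = (primary, secondary)

    def append(self, record: bytes) -> None:
        for path in self.paths:
            with open(path, "ab") as f:
                f.write(record)
                f.flush()
                os.fsync(f.fileno())  # force the data to stable storage
        # If either write raises, the exception propagates and the
        # caller never receives an acknowledgement for this record.
```

The price is that every write now waits for the slower of the two copies, which is why real products put the mirror on a dedicated link.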

>I'd do RAID and/or some kind of hot-swappable devices. Or rdist
>once a night across the network, when the system is not being used
>(very much).

The approach the original poster was taking was essentially to
make a RAID array of the computers.  An HD RAID array won't help at
all if the SCSI adaptor fails, or if the system fails in general.

Doing backups overnight is not a solution, as the target machine
will start out behind, and by the next night its data is 24 hours
behind.  You'd be in the same boat if the computer crashed during a
nightly backup.
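That staleness can be put in numbers. In this sketch, copies start every `interval` and each takes `copy_time` to finish. If the primary dies just before a copy completes, the newest complete copy is the previous one, which started `interval + copy_time` earlier. This assumes the copy in progress doesn't overwrite the previous complete copy in place (if it does, a crash mid-copy leaves a mixture, which is worse). The copy durations in the loop are made-up values for illustration.

```python
from datetime import timedelta


def worst_case_loss(interval: timedelta, copy_time: timedelta) -> timedelta:
    """Worst-case age of the newest *complete* copy on the standby.

    Copies start every `interval` and take `copy_time` each; a crash just
    before a copy finishes falls back to the one that started
    interval + copy_time earlier.
    """
    return interval + copy_time


# Hypothetical copy durations, for illustration only.
for label, interval, copy_time in [
    ("cron every 5 minutes", timedelta(minutes=5), timedelta(minutes=1)),
    ("nightly", timedelta(hours=24), timedelta(hours=1)),
]:
    print(f"{label}: up to {worst_case_loss(interval, copy_time)} of changes lost")
```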

Another poster mentioned 1776.  I recall that Specialix also had
a solution for mirroring servers, geared toward serial users, that
moved all users automatically and transparently from the failed
machine to the running one.  There are probably many other
mirroring solutions out there for this environment.

The alternative is a fault-tolerant computer, and those tend to
get very expensive very quickly.  Mirrored systems are typically
going to be cheaper than a fault-tolerant computer.  Fault-tolerant
computers tend not to be stock iNTEL systems, though the first
fault-tolerant system I saw in person was running on iNTEL 8086s, as
I recall: a small start-up company called NO-HALT that never made it,
AFAIK.

--

hot spare system

Post by Jeff Liebermann » Sun, 01 Nov 1998 04:00:00



>There's a bunch of holes in doing something like this.  Basically,
>you have inconsistent data on the backup machines at any given
>time.  This is especially dangerous for database applications,
>and are the reason we use terms like "online backup" and "offline
>backup".

Actually, there's only one really big hole, and that's the risk of
copying bad data on top of good data.  As long as everything is working
well, such a scheme is workable.  If one system goes nuts, chances are
that it will take the other one with it.

I also don't believe in bi-directional replication as there's always a
danger of the copy going in the "wrong" direction.

I have one such backup system that copies key parts and pieces of a
critical database at regular intervals to an off-site "standby" server.
The pipe is sufficiently narrow (ISDN) that copying the entire 1 GB of
database would be impractical.  Instead, the application builds
transaction logs and these are copied to the standby server.  After the
copy, the standby database is updated using the transaction logs.  In
theory, since the structure of the standby drive is identical to the
main database server, recovery could be achieved by replacing the real
drive with the standby.  Admittedly, I've never tried it, but it should
be simple enough.
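The log-shipping scheme above can be sketched roughly as follows. This is a toy model, not the application in the post: the "database" is a Python dict, and the record format (sequence number, key, value, with `None` meaning delete) is an assumption. Sequence numbers let the standby skip records it has already applied and refuse to continue past a missing log.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyDB:
    """Toy key-value store on the standby, updated only by replaying
    transaction-log records shipped from the primary.

    Hypothetical record format: (sequence_number, key, value), where a
    value of None means the key was deleted.
    """
    data: dict = field(default_factory=dict)
    applied_seq: int = 0  # highest sequence number already applied

    def replay(self, records) -> None:
        for seq, key, value in sorted(records, key=lambda r: r[0]):
            if seq <= self.applied_seq:
                continue  # already applied, so replaying a log twice is harmless
            if seq != self.applied_seq + 1:
                raise ValueError(
                    f"gap in log: expected {self.applied_seq + 1}, got {seq}")
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
            self.applied_seq = seq
```

Refusing to apply past a gap keeps the standby consistent as of some earlier point, rather than quietly skipping transactions that never arrived over the link.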

At another site, I have a variety of "single application servers" where
I've implemented a crude copy scheme.  This time, I have a 100BaseT
switched connection, so I can copy the entire database.  To ensure
integrity, I run checksums on the monsters at both ends and recopy
anything that fails.  I use the database manager to do a sorted copy to
an rcmd pipe with cpio.  Because I'm using the DBM to generate the
copied records, I can test for record locks and just wait until the lock
is cleared.  I'm getting perhaps one checksum failure every 2 weeks,
which methinks is acceptable.
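The verify-and-recopy step might look something like this in Python. The post doesn't say which checksum program is used; MD5 via `hashlib` is an assumption here, and it only serves to catch copies damaged in transit, not tampering. Both directories are assumed to be reachable as local paths, the function names are hypothetical, and the record-lock checking done through the database manager is not reproduced.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """MD5 of a file, read in 1 MiB chunks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_and_recopy(src_dir: str, dst_dir: str, max_tries: int = 3) -> list[str]:
    """Compare every file in src_dir with its copy in dst_dir and recopy
    any that are missing or whose checksums differ.

    Returns the names of files that had to be recopied; raises OSError if
    a file still doesn't match after max_tries copies.
    """
    recopied = []
    for src in sorted(Path(src_dir).iterdir()):
        if not src.is_file():
            continue
        dst = Path(dst_dir) / src.name
        tries = 0
        while not (dst.exists() and file_digest(dst) == file_digest(src)):
            if tries == max_tries:
                raise OSError(f"{src.name}: checksum still differs after {max_tries} copies")
            shutil.copy2(src, dst)
            tries += 1
        if tries:
            recopied.append(src.name)
    return recopied
```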

The problem with such crude schemes is that they don't scale well to
monster applications.  What works well for relatively small (<1 GB)
databases will become clumsy and time-consuming with larger databases.
I'm not sure what the right answer is for large systems.

>I'd do RAID and/or some kind of hot-swappable devices.  Or rdist
>once a night across the network, when the system is not being used
>(very much).

RAID makes one big assumption: that the least reliable part of the
storage puzzle is the disk drive.  According to everything I've read,
this is a good assumption.  Yet my recent experience shows that the
storage data loss and downtime experienced by my customers are caused
by (in order):
  Human error (tripping over the power cord, command-line errors).
  Cable problems (loose connectors, flaky ribbon connectors, shredded
     ribbon cable).
  Cooling failure (dead fan causes overheating).
  External influence (power glitches, static blasts, RF hits, water
     damage, runaway autos, programmers bearing screwdrivers).

Somewhere near the bottom of the list are drive failures.  In fact, I've
had no drive failures (other than DOA shipping damage) for several
years.  I replace or upgrade the drives before they fail.  My tape
drives fail before my hard disks.  I can see the theoretical benefits of
RAID, but methinks the money would be better spent on a locked cabinet,
armoured cable, fancy cooling, DMI/SNMP monitoring, and barbed wire.

Hot swap is really fun.  One of my pastimes is to wait for the boss to
sneak into the server cave and ask "How are Things Going(tm)?"  That's
when I yank one of the drives out of his live server and announce that
I'm testing the RAID recovery system.  (Note: don't do this if he has a
heart condition or an ulcer.)  Great fun.  However, there is one catch:
it takes some time for the drive to re-mirror.  For example, the Compaq
5500 400 MHz server, with a SmartSomething 2P RAID adapter, 16 MB cache,
and four 4.3 GB drives set up as RAID 10 (mirror + stripe), took 80
minutes to re-mirror one drive.  During those 80 minutes, system
performance was usable, but not great.
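As a rough check on the numbers above, the average re-mirror throughput works out to under 1 MB/s. The sketch assumes decimal units (1 GB = 1000 MB) and a constant rebuild rate; real controllers may rebuild faster or slower depending on load, so extrapolating to other drive sizes is only an estimate.

```python
def rebuild_rate_mb_s(capacity_gb: float, minutes: float) -> float:
    """Average re-mirror throughput in MB/s (decimal units: 1 GB = 1000 MB)."""
    return capacity_gb * 1000 / (minutes * 60)


def rebuild_minutes(capacity_gb: float, rate_mb_s: float) -> float:
    """Minutes to re-mirror a drive of the given size at a fixed average rate."""
    return capacity_gb * 1000 / rate_mb_s / 60


rate = rebuild_rate_mb_s(4.3, 80)  # one 4.3 GB drive in 80 minutes
print(f"{rate:.2f} MB/s")  # prints 0.90 MB/s
```

At that same rate, a hypothetical 8.6 GB drive would take about 160 minutes, so the window of reduced performance (and reduced redundancy) grows in step with drive size.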

I've often wondered why SCO doesn't bash RAID and sell customers on the
concept of a "spare" server.  That way, SCO could sell twice as many
server licenses as they would by pushing the RAID concept.

--
Jeff Liebermann  150 Felker St #D  Santa Cruz CA 95060
(831)699-0483 pgr (831)426-1240 fax (831)336-2558 home
http://www.cruzio.com/~jeffl   WB6SSY
