What is a Tru64 Cluster & what happens on failover?

Post by Zoran Marjansk » Sat, 04 Aug 2001 01:01:12



Hi,

I'm seeking a bit of info on how a Tru64 Cluster behaves when one of two
nodes fails. If one node fails, do all the processes that were running on
that node continue to run, uninterrupted, on the second node?

We were told by a Compaq reseller (and a Compaq rep, I believe) that
everything just seamlessly keeps running. But now that installation and
setup of the Cluster has begun, we're finding out that this is sort of true
for batch-type programs like an FTP server or a web server, but not for
memory-resident processes that start up and never exit; a good example being
an Oracle database instance.

If we have an Oracle DB up and running on one of two nodes, and that one
node fails, would the Oracle instance continue to run on the other node? I
think not. I think the Oracle instance would need to be restarted on the
second node, and in the meantime our business suffers downtime.

I must say that if the failover of memory-resident processes is not
seamless, I'm very disappointed with Tru64 Clustering, and particularly with
the marketing reps who sold the Tru64 Cluster based on that premise.

Thanks, Zoran.

 
 
 

Post by Dan Noto » Sun, 05 Aug 2001 02:18:51


The best place to find out is to read the Compaq documentation available at
the following URL:

http://tru64unix.compaq.com/faqs/publications/cluster_doc/cluster_51/...
OC.HTM
The documentation is available in HTML and PDF format. Of particular
interest are the "Technical Overview" and "Highly Available Applications"
documents.


Post by Jim Comstoc » Mon, 13 Aug 2001 00:33:19


I'm a Tru64 sysadmin with several Tru64 V5.1 clusters, all running Oracle.
Here's the sequence of what happens on a node failure.

    1) The shutdown script runs on the failing node (shutting down the
       Oracle instance). Hopefully, you've put the shutdown of the Oracle
       instance, and an ifconfig command to remove the IP alias for the
       TNS listener, in the shutdown script.

    2) The cluster daemon decides whether there are enough votes to
       continue cluster activity. In a two-node cluster, you should have
       a quorum disk to contribute the extra vote to keep going.

    3) Assuming that item 2 is true, the startup script for the service is
       executed on the other node.

The bottom line is that the Oracle instance is shut down on the failing
node (yes, there is an interruption of service) and started on the other
node. Currently, the only way to have the failure of a node in the cluster
be close to transparent is to run Oracle Parallel Server.

Hope this helps...
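
To make the three steps above concrete, here is a minimal Python sketch of
the sequence. It is only a toy model, not the cluster software itself: the
names Node and fail_over, and the strict-majority vote check, are
assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster member: a name, a vote, and its running services."""
    name: str
    votes: int = 1
    services: set = field(default_factory=set)


def fail_over(service, failing, survivor, quorum_disk_votes=1):
    """Return the ordered steps taken when `failing` leaves a two-node cluster."""
    steps = []
    expected_votes = failing.votes + survivor.votes + quorum_disk_votes

    # Step 1: the shutdown script stops the service on the failing node.
    if service in failing.services:
        failing.services.discard(service)
        steps.append(f"stop {service} on {failing.name}")

    # Step 2: count the votes that are still present. The rule used here
    # (more than half of the expected votes) is an assumption of this sketch.
    votes_present = survivor.votes + quorum_disk_votes
    if votes_present * 2 <= expected_votes:
        steps.append("quorum lost: cluster activity suspended")
        return steps

    # Step 3: the startup script starts the service on the other node.
    survivor.services.add(service)
    steps.append(f"start {service} on {survivor.name}")
    return steps


if __name__ == "__main__":
    node1 = Node("node1", services={"oracle"})
    node2 = Node("node2")
    for step in fail_over("oracle", node1, node2):
        print(step)
```

With the default quorum disk vote the service is stopped on one node and
started on the other, so there is a gap between the two steps. Passing
quorum_disk_votes=0 shows why the quorum disk matters in a two-node
cluster: the survivor holds only one of two expected votes and step 3
never runs.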


Post by SpitsOnSpamme » Tue, 14 Aug 2001 06:13:49


Quote:>    2) The cluster daemon decides if there's enough votes to continue
>cluster activity.  In a two node cluster,
>        you should have a quorum disk to contribute the extra vote to keep
>going.

I'm curious:  Why must there be a "quorum"?  What advantages does a "quorum"
convey?  And what disadvantages come with not having a "quorum"?  Thanks in
advance.

Post by Bill Tod » Tue, 14 Aug 2001 15:13:59



Quote:> >    2) The cluster daemon decides if there's enough votes to continue
> >cluster activity.  In a two node cluster,
> >        you should have a quorum disk to contribute the extra vote to keep
> >going.

> I'm curious:  Why must there be a "quorum"?  What advantages does a "quorum"
> convey?  And what disadvantages does not having a "quorum" invoke?  Thanks in
> advance.

The main function of the 'quorum' is to guarantee that, when one or more
cluster communication paths are lost, at most one subset of nodes in the
cluster continues to run.  If more than one disjoint subset of nodes
continued to run independently, storage devices shared concurrently for
write could easily become corrupted, and replicas (or any related data)
spread across disjoint partitions of the cluster could get out of sync.

- bill
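
The "at most one subset continues" guarantee can be illustrated with a
small Python sketch. It assumes a strict-majority rule (a partition needs
more than half of the expected votes); the exact rule the cluster software
applies should be checked in the TruCluster documentation. The function
names and the "qdisk" label are hypothetical.

```python
def quorum_votes(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def surviving_partitions(partitions, expected_votes):
    """Return the partitions that hold enough votes to keep running.

    `partitions` is a list of dicts mapping a voter name to its votes,
    one dict per disjoint group that can still talk among itself.
    """
    needed = quorum_votes(expected_votes)
    return [p for p in partitions if sum(p.values()) >= needed]


if __name__ == "__main__":
    # Two nodes, no quorum disk: after a split neither side may continue.
    print(surviving_partitions([{"node1": 1}, {"node2": 1}], expected_votes=2))

    # Two nodes plus a quorum disk claimed by node1: only node1's side continues.
    print(surviving_partitions(
        [{"node1": 1, "qdisk": 1}, {"node2": 1}], expected_votes=3))
```

Because two disjoint partitions cannot both hold more than half of the
votes, the list returned never has more than one entry, which is what
protects the shared storage from concurrent independent writers.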

 
 
 
