Dual CPUs, share memory & near fault-tolerance

Post by unkno » Fri, 09 Sep 1994 00:31:50



Hi,
   We are doing a research project on implementing dual-CPU near-fault-tolerance.
The situation is roughly as follows:

               _______  Line1
      X.25 ---|       |-------------HP9000/700 ----------|
              |  BOX  |------------|                     |
      RS232---|_______| Line2      | SHARE MEMORY        |------Disk Array
                                   |                     |
                                   |                     |
                                   ---HP9000/700 ---------

Sorry for the awkward drawing. Basically, we want to take data from X.25
and RS232 and pass it through the "BOX", which splits the data sources to
both HP9000/700 boards. (That means lines 1 and 2 carry identical data. Each
HP9000/700 is a board with a CPU; the two HP9000/700 boards and the shared
memory all access the same VME bus.)
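The BOX is home-made hardware, so the sketch below only models what it is meant to do: copy every incoming frame to both lines. One detail is an assumption and is not stated in the post. Each frame is stamped with a single shared sequence number, so that both boards can name the same position in the data stream later. `FanOutBox` and `forward` are hypothetical names.

```python
from itertools import count


class FanOutBox:
    """Software model of the BOX: each frame from either source (X.25 or
    RS232) gets one sequence number and is copied to both output lines.
    The sequence numbering is an assumption, not part of the original BOX."""

    def __init__(self, line1, line2):
        self.lines = (line1, line2)
        self._seq = count(1)  # start at 1 so 0 can mean "nothing seen yet"

    def forward(self, source, payload):
        seq = next(self._seq)
        for line in self.lines:
            # Identical tuple on both lines, as the post requires.
            line.append((seq, source, payload))
        return seq
```

With two lists standing in for line 1 and line 2, `box.forward("X.25", b"abc")` returns 1 and puts `(1, "X.25", b"abc")` on both lists.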

The two HP9000/700 boards will use a master/slave model. Although both
receive the data, only the master HP9000/700 may write to the Disk Array. If
the master HP9000/700 crashes, the slave HP9000/700 will take over and
continue writing data to the Disk Array. (Before a crash, both HP9000/700
boards are active.) The two HP boards can use their own local memory or the
shared memory.
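One way to enforce "only the master writes" is to keep a single role word in shared memory that holds the ID of the board owning the Disk Array. This is a minimal sketch under a few assumptions. The offset, board IDs and class names are hypothetical. On the real VME bus, the compare-and-swap would need an indivisible read-modify-write bus cycle, and here a `threading.Lock` stands in for that. The compare-and-swap is what stops two boards from both winning the role at once.

```python
import struct
import threading

MASTER_WORD = 0  # hypothetical offset of the role word in shared memory
NO_MASTER = 0    # board IDs must therefore be non-zero


class SharedRegion:
    """Stand-in for the VME shared memory; a bytearray instead of a mapping."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self._lock = threading.Lock()  # models an atomic bus RMW cycle

    def read_u32(self, off):
        return struct.unpack_from(">I", self.mem, off)[0]

    def compare_and_swap(self, off, expected, new):
        """Set the word to `new` only if it still equals `expected`."""
        with self._lock:
            if self.read_u32(off) != expected:
                return False
            struct.pack_into(">I", self.mem, off, new)
            return True


class Board:
    def __init__(self, board_id, shm, disk):
        self.id, self.shm, self.disk = board_id, shm, disk

    def is_master(self):
        return self.shm.read_u32(MASTER_WORD) == self.id

    def claim_master(self, old_master=NO_MASTER):
        # Succeeds only if the role word still names `old_master`.
        return self.shm.compare_and_swap(MASTER_WORD, old_master, self.id)

    def handle(self, record):
        # Both boards see every record; only the master writes it.
        if self.is_master():
            self.disk.append(record)
            return True
        return False
```

A slave takes over by calling `claim_master(old_master=<dead master's ID>)`. If another board has already changed the role word, that call fails instead of producing two masters.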

What we need help with is how to write a program that uses the shared memory
to do the switch-over. (So far, two conditions require a switch-over: line 1
or line 2 is disconnected, or the master HP9000/700 crashes.)
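The two switch-over conditions can be written as one small decision function. This sketch assumes that line 1 feeds the master and line 2 feeds the slave, as in the drawing. It also assumes that switching only makes sense while the slave's own line is still up. `decide` and `Action` are hypothetical names, and how the three inputs are measured is left open.

```python
from enum import Enum


class Action(Enum):
    STAY = "stay"                    # master alive and fed, standby ready
    STAY_DEGRADED = "stay_degraded"  # master fine, but standby lost line 2
    SWITCH_OVER = "switch_over"      # hand the Disk Array to the slave
    NO_SOURCE = "no_source"          # nobody can take over; needs an operator


def decide(master_alive: bool, line1_up: bool, line2_up: bool) -> Action:
    """Hypothetical policy: line 1 feeds the master, line 2 the slave."""
    master_ok = master_alive and line1_up
    if master_ok:
        return Action.STAY if line2_up else Action.STAY_DEGRADED
    # Master crashed or lost its line: switch only if the slave is still fed.
    return Action.SWITCH_OVER if line2_up else Action.NO_SOURCE
```

Both triggers (master crash, line 1 down) lead to the same action, which keeps the flag in shared memory down to a single "switch over" bit.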

Our first idea for this program is a daemon that keeps checking the
connection status of lines 1 and 2 and the status of the master HP board,
and writes a flag to the shared memory; another daemon checks this flag to
decide whether to switch over.
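The two-daemon idea is usually done as a heartbeat. The master's daemon keeps incrementing a counter in shared memory and publishes the line status next to it. The slave's daemon declares the master dead once the counter has stopped moving for several samples. This is a sketch only. The offsets, `MasterDaemon`, `SlaveWatchdog` and `miss_limit` are hypothetical, and a `bytearray` stands in for the shared memory. A master that is merely descheduled for longer than `miss_limit` samples looks exactly like a crashed one. The sampling period therefore has to allow for scheduling delays.

```python
import struct

# Hypothetical shared-memory layout (byte offsets).
HEARTBEAT = 0   # u32, incremented by the master's daemon every period
LINE_FLAGS = 4  # u32, bit 0 = line 1 up, bit 1 = line 2 up


class MasterDaemon:
    def __init__(self, mem):
        self.mem = mem

    def tick(self, line1_up, line2_up):
        beat = struct.unpack_from(">I", self.mem, HEARTBEAT)[0]
        struct.pack_into(">I", self.mem, HEARTBEAT, (beat + 1) & 0xFFFFFFFF)
        flags = (1 if line1_up else 0) | (2 if line2_up else 0)
        struct.pack_into(">I", self.mem, LINE_FLAGS, flags)


class SlaveWatchdog:
    """Declares the master dead after `miss_limit` samples with no new beat."""

    def __init__(self, mem, miss_limit=3):
        self.mem, self.miss_limit = mem, miss_limit
        self.last_beat = None
        self.misses = 0

    def sample(self):
        beat = struct.unpack_from(">I", self.mem, HEARTBEAT)[0]
        if beat == self.last_beat:
            self.misses += 1
        else:
            self.last_beat, self.misses = beat, 0
        return self.misses >= self.miss_limit  # True means switch over

    def lines(self):
        flags = struct.unpack_from(">I", self.mem, LINE_FLAGS)[0]
        return bool(flags & 1), bool(flags & 2)
```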

We are looking for any source code, hints, or suggestions. No commercial
products please; our department doesn't have the budget. We even built
the "BOX" ourselves.

Please E-mail me directly, thanks in advance.

scott

 
 
 

Post by Prabhat Ken » Sat, 10 Sep 1994 13:19:39



>Hi,
>   We are doing a research project on implementing dual-CPU near-fault-tolerance.
>The situation is roughly as follows:
>               _______  Line1
>      X.25 ---|       |-------------HP9000/700 ----------|
>              |  BOX  |------------|                     |
>      RS232---|_______| Line2      | SHARE MEMORY        |------Disk Array
>                                   |                     |
>                                   |                     |
>                                   ---HP9000/700 ---------

...
...

>The two HP9000/700 boards will use a master/slave model. Although both
>receive the data, only the master HP9000/700 may write to the Disk Array. If
>the master HP9000/700 crashes, the slave HP9000/700 will take over and
>continue writing data to the Disk Array. (Before a crash, both HP9000/700
>boards are active.) The two HP boards can use their own local memory or the
>shared memory.

Scott,

These things are usually best done in h/w with some very controlled
firmware support. The fact that you are striving to achieve fault tolerance
probably means you expect to run mission-critical apps. Normal processes
running in user space, subject to normal scheduling algorithms, would not be
able to make meaningful decisions reliably, since they may not be scheduled
promptly (not to mention the risk of buggy decisions). Your problem may be
divided into two parts: 1) detecting a fault in one of the h/w units and
2) switching operations to the other processor. Part 1) should be mechanical
(h/w based); this ensures (to a large extent) a correct decision. Part 2)
has to be left to s/w, e.g. deciding from which point in the data stream the
other processor starts responding, and dealing with data partially processed
by the first processor. This part involves relatively minor decision making,
so s/w is adequate for it. Also, bugs in this part result in a transient loss
of data, as opposed to a complete failure of operations (as would occur if
something in part 1) went wrong and both machines thought they were masters,
or both thought they were slaves).
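Part 2) (where the other processor picks up in the data stream, and what happens to partially processed data) can be sketched with a commit marker in shared memory. The master publishes the last sequence number it has finished writing, and it does so only after the write completes. The standby keeps a bounded backlog of frames. On takeover, it writes everything newer than that marker. This assumes the frames carry sequence numbers starting at 1, which the post does not say. `Master`, `Standby`, `COMMITTED` and `depth` are hypothetical names. The disk is modeled as a dict keyed by sequence number, so a record that was written but not yet committed is simply written again without harm. Sequence-number wrap-around is ignored. If the backlog overflows before the master commits, the oldest frames are lost.

```python
import struct
from collections import deque

COMMITTED = 0  # hypothetical offset: last sequence number safely on disk


class Master:
    def __init__(self, mem, disk):
        self.mem, self.disk = mem, disk

    def process(self, seq, data):
        self.disk[seq] = data                             # write first...
        struct.pack_into(">I", self.mem, COMMITTED, seq)  # ...then publish


class Standby:
    """Keeps recent frames so it can replay what the master did not commit."""

    def __init__(self, mem, disk, depth=1024):
        self.mem, self.disk = mem, disk
        self.backlog = deque(maxlen=depth)  # oldest frames drop off when full

    def receive(self, seq, data):
        self.backlog.append((seq, data))

    def take_over(self):
        committed = struct.unpack_from(">I", self.mem, COMMITTED)[0]
        replayed = 0
        for seq, data in self.backlog:
            if seq > committed:
                self.disk[seq] = data  # keyed by seq: a repeat write is harmless
                replayed += 1
        self.backlog.clear()
        return committed, replayed
```

If the master crashes after committing frame 3, while the standby holds frames 1 to 5, `take_over()` returns `(3, 2)` and frames 4 and 5 reach the disk.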

The last time I was involved with such a thing, part 1) was achieved by a
h/w interrupt. The master also had a way of knowing, through the same
mechanism, when the slave had stopped functioning. There was also a periodic
"switch-over" to check the sanity of the slave.
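The described system used a h/w interrupt, which software cannot reproduce. The sketch below only shows the two s/w-visible ideas. First, each board watches the other's heartbeat, so the master can also notice a dead slave. Second, the role is handed over periodically as a drill, and only when the standby looks alive. `beat`, `PeerMonitor`, `DrillScheduler`, the board names and the offsets are all hypothetical.

```python
import struct

# Hypothetical layout: one heartbeat word per board, so each can watch the other.
BEAT_OFFSET = {"A": 0, "B": 4}


def beat(mem, board):
    """Called periodically by each board's own daemon."""
    off = BEAT_OFFSET[board]
    n = struct.unpack_from(">I", mem, off)[0]
    struct.pack_into(">I", mem, off, (n + 1) & 0xFFFFFFFF)


class PeerMonitor:
    """Watches the other board's heartbeat; dead after `miss_limit` stale samples."""

    def __init__(self, mem, peer, miss_limit=3):
        self.mem, self.off, self.miss_limit = mem, BEAT_OFFSET[peer], miss_limit
        self.last, self.misses = None, 0

    def sample(self):
        n = struct.unpack_from(">I", self.mem, self.off)[0]
        if n == self.last:
            self.misses += 1
        else:
            self.last, self.misses = n, 0
        return self.misses < self.miss_limit  # True means the peer looks alive


class DrillScheduler:
    """Every `period` cycles, hand the master role to the standby if it looks
    alive; otherwise keep the role and report the standby as failed."""

    def __init__(self, master, standby, period):
        self.master, self.standby, self.period = master, standby, period
        self.cycle = 0

    def step(self, standby_alive):
        self.cycle += 1
        if self.cycle % self.period:
            return "no drill"
        if not standby_alive:
            return f"standby {self.standby} failed"
        self.master, self.standby = self.standby, self.master
        return f"switched: master is now {self.master}"
```

A real drill would also confirm that the new master actually started writing. Without that check, a standby that only keeps its heartbeat running would pass.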

