Problem with SUN cluster and pnmd

Post by deni.. » Tue, 21 Nov 2000 04:00:00



Hi,

I'm experiencing a problem with the Sun Cluster software. I have a server
(UltraSPARC II) with two Ethernet cards (the internal one and an additional
SCSI + 100BASE-T PCI controller). We are running the pnmd daemon _without_
the -s and -c options. When I unplug the hme1 link, pnmd switches back to
hme0 in about twenty seconds, but when I unplug the hme0 link (with hme1
OK), the switch to hme1 takes about one minute: pnmd decides that hme1 is
not OK and switches back to hme0 (still failed) before finally seeing that
hme1 is OK. What went wrong? Is this a known bug in pnmd or a problem with
the hme1 Ethernet card (X6541A)?
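Reading the pnmd log attached below, each adapter test seems to follow a simple pattern. The Python sketch below models it under that assumption; it is an interpretation of the log, not pnmd's source, and the name `adapter_ok` is hypothetical. pnmd appears to compare the adapter's counter (logged as `in_old`, `in_new1`, `in_new2`) after a 5 s sleep and again after each of `rep_test` ping rounds, and to fail the adapter only if the counter never changes. In the log the `224.0.0.1` ping returns 0 every time yet `adp_err` still increases, so the sketch ignores ping exit codes and looks only at the counters.

```python
# Minimal model of one pnmd adapter test, inferred from the log in this
# post rather than taken from pnmd's source; adapter_ok is hypothetical.

def adapter_ok(in_old, in_new1, in_new2_rounds, rep_test=3):
    """Return True if the adapter would be judged healthy.

    in_old         -- counter read before the inactive_time sleep
    in_new1        -- counter read after the sleep (stage 1)
    in_new2_rounds -- counter read after each ping round
    """
    # Stage 1: the counter moved during the sleep, so no problem is logged.
    if in_new1 != in_old:
        return True
    # Otherwise ping multicast addresses to provoke traffic and re-read the
    # counter after each round; the ping exit status is ignored here.
    for value in in_new2_rounds[:rep_test]:
        if value != in_old:
            return True
    # Counter never moved: "ADAPTER FAILURE on bkg[0] - START SWITCHOVER!"
    return False


# hme1 counters from its first test after the switch (10:44:54 to 10:45:08).
HME1_FIRST = (9326452763787264, 9326452763787264, [9326452763787264] * 3)
# hme1 counters after the final switch (10:45:51); only stage 1 is needed.
HME1_LAST = (9326452763787264, 9326710461825024, [])

if __name__ == "__main__":
    print("hme1 first test ok:", adapter_ok(*HME1_FIRST))  # False
    print("hme1 last test ok: ", adapter_ok(*HME1_LAST))   # True
```

With the values copied from the log, the first test of hme1 fails because the counter pnmd reads for hme1 stays at 9326452763787264 the whole time hme1 is active (10:44:54 to 10:45:22), while the test after the final switch passes at stage 1 because the counter has moved to 9326710461825024.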

When pnmd is started without the -s or -c options, is it possible to change
the timeout configuration (usually in <cluster_name>.cdb)? I think pnmd
does not read such configuration files when started in non-cluster mode,
but there might be another way to change the ping/retry configuration.
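The log header prints the tunables pnmd is using (inactive_time 5 s, ping_timeout 4 s, rep_test 3, slow_network 2 s), and the timestamps suggest how they combine into the length of one failed test. The Python sketch below is an estimate under stated assumptions, not a documented pnmd formula: each round is assumed to be the 1 s `224.0.0.2` ping, then the `224.0.0.1` ping (bounded by ping_timeout), then a slow_network pause. `PnmTunables` and `failed_cycle_seconds` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PnmTunables:
    # Values printed in the "NEW RUN" header of the pnmd log in this post.
    inactive_time: int = 5  # s, sleep before the first counter check
    ping_timeout: int = 4   # s, appears as the timeout on the 224.0.0.1 ping
    rep_test: int = 3       # ping rounds before an adapter is failed
    slow_network: int = 2   # s, assumed to be the pause after each round


# Assumption: the trailing "1" on the 224.0.0.2 ping is a fixed 1 s timeout.
FIRST_PING_TIMEOUT = 1


def failed_cycle_seconds(t: PnmTunables, second_ping_seconds: int) -> int:
    """Estimated length of one test cycle that ends in an adapter failure."""
    per_round = FIRST_PING_TIMEOUT + second_ping_seconds + t.slow_network
    return t.inactive_time + t.rep_test * per_round


if __name__ == "__main__":
    t = PnmTunables()
    # In the log the 224.0.0.1 ping returned at once (0 s).
    print(failed_cycle_seconds(t, 0))               # 14
    # If that ping instead ran to its full ping_timeout.
    print(failed_cycle_seconds(t, t.ping_timeout))  # 26
```

With the `224.0.0.1` ping returning immediately, as it does in the log, the model gives 14 s, which matches the 14 to 15 s spacing of the ADAPTER FAILURE messages. Under this model, inactive_time and rep_test are the parameters that dominate each cycle.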

Thanks in advance,

Regards,

L.Dniel

PS: Please reply to Laurent.Den...@fr.airsysatm.thomson-csf.com

pnmd log attached:

1:              NEW RUN (pid = 25001) 1: host id = 0 1: Values for PNM
tuneable parameters - inactive_time (5 s), ping_timeout (4 s), rep_test (3
time(s)), slow_network (2 s)4: h_name = hera 4: 25001 (4)- run(/sbin/ifconfig
hme0 plumb) : return - 0 4: 25001 (4)- run(/sbin/ifconfig hme1 plumb) :
return - 0 4: 25001 (4)- run(/sbin/ifconfig hme1 unplumb) : return - 0 4:
ipaddr = 192.168.13.2, ptr->self_ip = 192.168.13.2 4: ipaddr = 192.168.13.2,
ptr->self_ip = 192.168.13.2 4: BKG: act_adp = hme0, status = 0, no = 0, mark
= 0, fo_time = 0 4: ECS: h_id = 0, h_name = hera, net_mask = 255.255.255.0,
ccd_row = , act_adp = hme0, status = 0, no = 0, mark = 0, fo_time = 0,
self_ip = 192.168.13.2, self_MAC = 0 4: going to setup test_switch thread for
nafo0 4: going to wait ..... 5: kstat_chk: adp = hme0, module = hme, inst = 0
5: kstat_chk: adp = hme0, module = hme, inst = 0 5: test: in_old =
31540886907125760; out_old = 31540886907125760 5: test: Sleep 5 second(s) ...
5: kstat_chk: adp = hme0, module = hme, inst = 0 5: kstat_chk: adp = hme0,
module = hme, inst = 0 5: test: in_new1 = 31541123130327040; out_new1 =
31541123130327040; mark = 0; adp_err = 0 5: test (stage 1): No adp problems
detected on bkg[0] 5: kstat_chk: adp = hme0, module = hme, inst = 0 5:
kstat_chk: adp = hme0, module = hme, inst = 0 5: test: in_old =
31541123130327040; out_old = 31541123130327040 5: test: Sleep 5 second(s) ...

... hme0 unplugged here ...

5: kstat_chk: adp = hme0, module = hme, inst = 0 5: kstat_chk: adp = hme0,
module = hme, inst = 0 5: test: in_new1 = 31662026459709440; out_new1 =
31662026459709440; mark = 0; adp_err = 0 5: test (stage 1): No adp problems
detected on bkg[0] 5: kstat_chk: adp = hme0, module = hme, inst = 0 5:
kstat_chk: adp = hme0, module = hme, inst = 0 5: test: in_old =
31662026459709440; out_old = 31662026459709440 5: test: Sleep 5 second(s) ...
5: kstat_chk: adp = hme0, module = hme, inst = 0 5: kstat_chk: adp = hme0,
module = hme, inst = 0 5: test: in_new1 = 31662026459709440; out_new1 =
31662026459709440; mark = 0; adp_err = 0 5: test: ipaddr = 192.168.13.2; brd
addr = 192.168.13.255 5: test: ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.2 1 2>/dev/null 1>/dev/null; begin_time_0 = Mon Nov 20 10:44:45 2000
5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null) : return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.2 1 2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 =
/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null;
begin_time_1 = Mon Nov 20 10:44:46 2000 5: 25001 (5)- run(/usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test:
finish_time = Mon Nov 20 10:44:46 2000 5: kstat_chk: adp = hme0, module =
hme, inst = 0 5: kstat_chk: adp = hme0, module = hme, inst = 0 5: test:
in_new2 = 31662026459709440, out_new2 = 31662026459709440; mark = 0, adp_err
= 0 5: test: ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1
2>/dev/null 1>/dev/null; begin_time_0 = Mon Nov 20 10:44:48 2000 5: 25001
(5)- run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null) : return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.2 1 2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 =
/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null;
begin_time_1 = Mon Nov 20 10:44:49 2000 5: 25001 (5)- run(/usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test:
finish_time = Mon Nov 20 10:44:49 2000 5: kstat_chk: adp = hme0, module =
hme, inst = 0 5: kstat_chk: adp = hme0, module = hme, inst = 0 5: test:
in_new2 = 31662026459709440, out_new2 = 31662026459709440; mark = 0, adp_err
= 1 5: test: ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1
2>/dev/null 1>/dev/null; begin_time_0 = Mon Nov 20 10:44:51 2000 5: 25001
(5)- run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null) : return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.2 1 2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 =
/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null;
begin_time_1 = Mon Nov 20 10:44:52 2000 5: 25001 (5)- run(/usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test:
finish_time = Mon Nov 20 10:44:52 2000 5: kstat_chk: adp = hme0, module =
hme, inst = 0 5: kstat_chk: adp = hme0, module = hme, inst = 0 5: test:
in_new2 = 31662026459709440, out_new2 = 31662026459709440; mark = 0, adp_err
= 2 5: test: status is 1 5: Bk_gp (nafo0) Status (DOUBT); Adp (hme0) Status
(DOUBT)5: test: Distinguish N/W failure; begin_time_nw = Mon Nov 20 10:44:54
2000 5: test: Distinguish N/W failure; finish_time_nw = Mon Nov 20 10:44:54
2000 5: test (stage 3): ADAPTER FAILURE on bkg[0] - START SWITCHOVER! 5:
get_logic_ip: enter 5: get_logic_ip: so = 3 5: get_logic_ip: if_cnt = 2 5:
get_logic_ip: name = lo0, act_adp = hme0, len = 4 5: get_logic_ip: name =
hme0, act_adp = hme0, len = 4 5: get_logic_ip (inside) : act_adp = hme0, name
= hme0, inst = 0 5: name = hme0, inst = 0, flag = 1 ifr_flags = 2147 5:
get_logic_ip: name = hme0, ip = 192.168.13.2 5: get_logic_ip: head->ipaddr =
192.168.13.2 5: get_logic_ip: exit 5: LOGICAL IP List :- 5: do_switch: ipaddr
= 192.168.13.2, inst = 0 up = 1 5: do_switch: dst (hme1) & src (hme0) 5:
25001 (5)- run(/sbin/ifconfig hme1 plumb) : return - 0 5: do_switch: dst_adp
(hme1) plumbed 5:  do_switch: finish 1st stage 5: 25001 (5)-
run(/sbin/ifconfig hme0:0 down) : return - 0 5: do_switch: cmd0_down =
/sbin/ifconfig hme0:0 down 5: 25001 (5)- run(/sbin/ifconfig hme0 unplumb) :
return - 0 5: do_switch: src_adp (hme0) unplumbed 5: 25001 (5)-
run(/sbin/ifconfig hme1 192.168.13.2 netmask + broadcast + -trailers up) :
return - 0 5: do_switch: cmd0_up = /sbin/ifconfig hme1 192.168.13.2 netmask +
broadcast + -trailers up 5:  do_switch: finish 2nd stage 5: do_switch: free
mem ---------- 5: kstat_chk: adp = hme1, module = hme, inst = 1 5: kstat_chk:
adp = hme1, module = hme, inst = 1 5: test: in_old = 9326452763787264;
out_old = 9326452763787264 5: test: Sleep 5 second(s) ... 5: kstat_chk: adp =
hme1, module = hme, inst = 1 5: kstat_chk: adp = hme1, module = hme, inst = 1
5: test: in_new1 = 9326452763787264; out_new1 = 9326452763787264; mark = 0;
adp_err = 0 5: test: ipaddr = 192.168.13.2; brd addr = 192.168.13.255 5:
test: ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null; begin_time_0 = Mon Nov 20 10:44:59 2000 5: 25001 (5)-
run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null) :
return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1
2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 = /usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov
20 10:45:00 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test: finish_time = Mon
Nov 20 10:45:00 2000 5: kstat_chk: adp = hme1, module = hme, inst = 1 5:
kstat_chk: adp = hme1, module = hme, inst = 1 5: test: in_new2 =
9326452763787264, out_new2 = 9326452763787264; mark = 0, adp_err = 0 5: test:
ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null; begin_time_0 = Mon Nov 20 10:45:02 2000 5: 25001 (5)-
run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null) :
return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1
2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 = /usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov
20 10:45:03 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test: finish_time = Mon
Nov 20 10:45:03 2000 5: kstat_chk: adp = hme1, module = hme, inst = 1 5:
kstat_chk: adp = hme1, module = hme, inst = 1 5: test: in_new2 =
9326452763787264, out_new2 = 9326452763787264; mark = 0, adp_err = 1 5: test:
ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null; begin_time_0 = Mon Nov 20 10:45:05 2000 5: 25001 (5)-
run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null) :
return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1
2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 = /usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov
20 10:45:06 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test: finish_time = Mon
Nov 20 10:45:06 2000 5: kstat_chk: adp = hme1, module = hme, inst = 1 5:
kstat_chk: adp = hme1, module = hme, inst = 1 5: test: in_new2 =
9326452763787264, out_new2 = 9326452763787264; mark = 0, adp_err = 2 5: test:
status is 1 5: test: Distinguish N/W failure; begin_time_nw = Mon Nov 20
10:45:08 2000 5: test: Distinguish N/W failure; finish_time_nw = Mon Nov 20
10:45:08 2000 5: test (stage 3): ADAPTER FAILURE on bkg[0] - START
SWITCHOVER! 5: do_switch: hme1 test failed 5: kstat_chk: adp = hme1, module =
hme, inst = 1 5: kstat_chk: adp = hme1, module = hme, inst = 1 5: test:
in_old = 9326452763787264; out_old = 9326452763787264 5: test: Sleep 5
second(s) ... 5: kstat_chk: adp = hme1, module = hme, inst = 1 5: kstat_chk:
adp = hme1, module = hme, inst = 1 5: test: in_new1 = 9326452763787264;
out_new1 = 9326452763787264; mark = 0; adp_err = 0 5: test: ipaddr =
192.168.13.2; brd addr = 192.168.13.255 5: test: ping cmd = /usr/sbin/ping
-i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null; begin_time_0 = Mon
Nov 20 10:45:13 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.2 1 2>/dev/null 1>/dev/null) : return - 256 5: ping cmd
(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null)
failed 5: test: ping cmd1 = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.1 4
2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov 20 10:45:14 2000 5: 25001
(5)- run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.1 4 2>/dev/null
1>/dev/null) : return - 0 5: test: finish_time = Mon Nov 20 10:45:14 2000 5:
kstat_chk: adp = hme1, module = hme, inst = 1 5: kstat_chk: adp = hme1,
module = hme, inst = 1 5: test: in_new2 = 9326452763787264, out_new2 =
9326452763787264; mark = 0, adp_err = 0 5: test: ping cmd = /usr/sbin/ping
-i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null; begin_time_0 = Mon
Nov 20 10:45:16 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.2 1 2>/dev/null 1>/dev/null) : return - 256 5: ping cmd
(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null)
failed 5: test: ping cmd1 = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.1 4
2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov 20 10:45:17 2000 5: 25001
(5)- run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.1 4 2>/dev/null
1>/dev/null) : return - 0 5: test: finish_time = Mon Nov 20 10:45:17 2000 5:
kstat_chk: adp = hme1, module = hme, inst = 1 5: kstat_chk: adp = hme1,
module = hme, inst = 1 5: test: in_new2 = 9326452763787264, out_new2 =
9326452763787264; mark = 0, adp_err = 1 5: test: ping cmd = /usr/sbin/ping
-i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null; begin_time_0 = Mon
Nov 20 10:45:19 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.2 1 2>/dev/null 1>/dev/null) : return - 256 5: ping cmd
(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null)
failed 5: test: ping cmd1 = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.1 4
2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov 20 10:45:20 2000 5: 25001
(5)- run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.1 4 2>/dev/null
1>/dev/null) : return - 0 5: test: finish_time = Mon Nov 20 10:45:20 2000 5:
kstat_chk: adp = hme1, module = hme, inst = 1 5: kstat_chk: adp = hme1,
module = hme, inst = 1 5: test: in_new2 = 9326452763787264, out_new2 =
9326452763787264; mark = 0, adp_err = 2 5: test: status is 2 5: test:
Distinguish N/W failure; begin_time_nw = Mon Nov 20 10:45:22 2000 5: test:
Distinguish N/W failure; finish_time_nw = Mon Nov 20 10:45:22 2000 5: test
(stage 3): ADAPTER FAILURE on bkg[0] - START SWITCHOVER!

????

5: get_logic_ip: enter 5: get_logic_ip: so = 3 5: get_logic_ip: if_cnt = 2 5:
get_logic_ip: name = lo0, act_adp = hme1, len = 4 5: get_logic_ip: name =
hme1, act_adp = hme1, len = 4 5: get_logic_ip (inside) : act_adp = hme1, name
= hme1, inst = 0 5: name = hme1, inst = 0, flag = 1 ifr_flags = 2147 5:
get_logic_ip: name = hme1, ip = 192.168.13.2 5: get_logic_ip: head->ipaddr =
192.168.13.2 5: get_logic_ip: exit 5: LOGICAL IP List :- 5: do_switch: ipaddr
= 192.168.13.2, inst = 0 up = 1 5: do_switch: dst (hme0) & src (hme1) 5:
25001 (5)- run(/sbin/ifconfig hme0 plumb) : return - 0 5: do_switch: dst_adp
(hme0) plumbed 5:  do_switch: finish 1st stage 5: 25001 (5)-
run(/sbin/ifconfig hme1:0 down) : return - 0 5: do_switch: cmd0_down =
/sbin/ifconfig hme1:0 down 5: 25001 (5)- run(/sbin/ifconfig hme1 unplumb) :
return - 0 5: do_switch: src_adp (hme1) unplumbed 5: 25001 (5)-
run(/sbin/ifconfig hme0 192.168.13.2 netmask + broadcast + -trailers up) :
return - 0 5: do_switch: cmd0_up = /sbin/ifconfig hme0 192.168.13.2 netmask +
broadcast + -trailers up 5:  do_switch: finish 2nd stage 5: do_switch: free
mem ---------- 5: kstat_chk: adp = hme0, module = hme, inst = 0 5: kstat_chk:
adp = hme0, module = hme, inst = 0 5: test: in_old = 31662026459709440;
out_old = 31662026459709440 5: test: Sleep 5 second(s) ... 5: kstat_chk: adp
= hme0, module = hme, inst = 0 5: kstat_chk: adp = hme0, module = hme, inst =
0 5: test: in_new1 = 31662026459709440; out_new1 = 31662026459709440; mark =
0; adp_err = 0 5: test: ipaddr = 192.168.13.2; brd addr = 192.168.13.255 5:
test: ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null; begin_time_0 = Mon Nov 20 10:45:28 2000 5: 25001 (5)-
run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null) :
return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1
2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 = /usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov
20 10:45:29 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test: finish_time = Mon
Nov 20 10:45:29 2000 5: kstat_chk: adp = hme0, module = hme, inst = 0 5:
kstat_chk: adp = hme0, module = hme, inst = 0 5: test: in_new2 =
31662026459709440, out_new2 = 31662026459709440; mark = 0, adp_err = 0 5:
test: ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null; begin_time_0 = Mon Nov 20 10:45:31 2000 5: 25001 (5)-
run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null) :
return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1
2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 = /usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov
20 10:45:32 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test: finish_time = Mon
Nov 20 10:45:32 2000 5: kstat_chk: adp = hme0, module = hme, inst = 0 5:
kstat_chk: adp = hme0, module = hme, inst = 0 5: test: in_new2 =
31662026459709440, out_new2 = 31662026459709440; mark = 0, adp_err = 1 5:
test: ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null; begin_time_0 = Mon Nov 20 10:45:34 2000 5: 25001 (5)-
run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null) :
return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1
2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 = /usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov
20 10:45:35 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test: finish_time = Mon
Nov 20 10:45:35 2000 5: kstat_chk: adp = hme0, module = hme, inst = 0 5:
kstat_chk: adp = hme0, module = hme, inst = 0 5: test: in_new2 =
31662026459709440, out_new2 = 31662026459709440; mark = 0, adp_err = 2 5:
test: status is 2 5: test: Distinguish N/W failure; begin_time_nw = Mon Nov
20 10:45:37 2000 5: test: Distinguish N/W failure; finish_time_nw = Mon Nov
20 10:45:37 2000 5: test (stage 3): ADAPTER FAILURE on bkg[0] - START
SWITCHOVER! 5: do_switch: hme0 test failed

... (expected, since hme0 is still unplugged) ...

5: kstat_chk: adp = hme0, module = hme, inst = 0 5: kstat_chk: adp = hme0,
module = hme, inst = 0 5: test: in_old = 31662026459709440; out_old =
31662026459709440 5: test: Sleep 5 second(s) ... 5: kstat_chk: adp = hme0,
module = hme, inst = 0 5: kstat_chk: adp = hme0, module = hme, inst = 0 5:
test: in_new1 = 31662026459709440; out_new1 = 31662026459709440; mark = 0;
adp_err = 0 5: test: ipaddr = 192.168.13.2; brd addr = 192.168.13.255 5:
test: ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null; begin_time_0 = Mon Nov 20 10:45:42 2000 5: 25001 (5)-
run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null) :
return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1
2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 = /usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov
20 10:45:43 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test: finish_time = Mon
Nov 20 10:45:43 2000 5: kstat_chk: adp = hme0, module = hme, inst = 0 5:
kstat_chk: adp = hme0, module = hme, inst = 0 5: test: in_new2 =
31662026459709440, out_new2 = 31662026459709440; mark = 0, adp_err = 0 5:
test: ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null; begin_time_0 = Mon Nov 20 10:45:45 2000 5: 25001 (5)-
run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null) :
return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1
2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 = /usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov
20 10:45:46 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test: finish_time = Mon
Nov 20 10:45:46 2000 5: kstat_chk: adp = hme0, module = hme, inst = 0 5:
kstat_chk: adp = hme0, module = hme, inst = 0 5: test: in_new2 =
31662026459709440, out_new2 = 31662026459709440; mark = 0, adp_err = 1 5:
test: ping cmd = /usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null
1>/dev/null; begin_time_0 = Mon Nov 20 10:45:48 2000 5: 25001 (5)-
run(/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1 2>/dev/null 1>/dev/null) :
return - 256 5: ping cmd (/usr/sbin/ping  -i 192.168.13.2 -r 224.0.0.2 1
2>/dev/null 1>/dev/null) failed 5: test: ping cmd1 = /usr/sbin/ping  -i
192.168.13.2 -r 224.0.0.1 4 2>/dev/null 1>/dev/null; begin_time_1 = Mon Nov
20 10:45:49 2000 5: 25001 (5)- run(/usr/sbin/ping  -i 192.168.13.2 -r
224.0.0.1 4 2>/dev/null 1>/dev/null) : return - 0 5: test: finish_time = Mon
Nov 20 10:45:49 2000 5: kstat_chk: adp = hme0, module = hme, inst = 0 5:
kstat_chk: adp = hme0, module = hme, inst = 0 5: test: in_new2 =
31662026459709440, out_new2 = 31662026459709440; mark = 0, adp_err = 2 5:
test: status is 2 5: test: Distinguish N/W failure; begin_time_nw = Mon Nov
20 10:45:51 2000 5: test: Distinguish N/W failure; finish_time_nw = Mon Nov
20 10:45:51 2000 5: test (stage 3): ADAPTER FAILURE on bkg[0] - START
SWITCHOVER! 5: get_logic_ip: enter 5: get_logic_ip: so = 3 5: get_logic_ip:
if_cnt = 2 5: get_logic_ip: name = lo0, act_adp = hme0, len = 4 5:
get_logic_ip: name = hme0, act_adp = hme0, len = 4 5: get_logic_ip (inside) :
act_adp = hme0, name = hme0, inst = 0 5: name = hme0, inst = 0, flag = 1
ifr_flags = 2147 5: get_logic_ip: name = hme0, ip = 192.168.13.2 5:
get_logic_ip: head->ipaddr = 192.168.13.2 5: get_logic_ip: exit 5: LOGICAL IP
List :- 5: do_switch: ipaddr = 192.168.13.2, inst = 0 up = 1 5: do_switch:
dst (hme1) & src (hme0) 5: 25001 (5)- run(/sbin/ifconfig hme1 plumb) : return
- 0 5: do_switch: dst_adp (hme1) plumbed 5:  do_switch: finish 1st stage 5:
25001 (5)- run(/sbin/ifconfig hme0:0 down) : return - 0 5: do_switch:
cmd0_down = /sbin/ifconfig hme0:0 down 5: 25001 (5)- run(/sbin/ifconfig hme0
unplumb) : return - 0 5: do_switch: src_adp (hme0) unplumbed 5: 25001 (5)-
run(/sbin/ifconfig hme1 192.168.13.2 netmask + broadcast + -trailers up) :
return - 0 5: do_switch: cmd0_up = /sbin/ifconfig hme1 192.168.13.2 netmask +
broadcast + -trailers up 5:  do_switch: finish 2nd stage 5: do_switch: free
mem ---------- 5: kstat_chk: adp = hme1, module = hme, inst = 1 5: kstat_chk:
adp = hme1, module = hme, inst = 1 5: test: in_old = 9326452763787264;
out_old = 9326452763787264 5: test: Sleep 5 second(s) ... 5: kstat_chk: adp =
hme1, module = hme, inst = 1 5: kstat_chk: adp = hme1, module = hme, inst = 1
5: test: in_new1 = 9326710461825024; out_new1 = 9326710461825024; mark = 0;
adp_err = 0 5: test (stage 1): No adp problems detected on bkg[0] 5:
kstat_chk: adp = hme1, module = hme, inst = 1 5: kstat_chk: adp = hme1,
module = hme, inst = 1 5: test: in_old = 9326710461825024; out_old =
9326710461825024 5: test: Sleep 5 second(s) ...

... hme1 ok ...
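The timestamps in the log above can be tallied to see where the minute goes. This short Python sketch copies the finish_time_nw value logged just before each "ADAPTER FAILURE on bkg[0] - START SWITCHOVER!" line, together with what pnmd did next, and prints the gap between consecutive decisions. The action labels are paraphrases of the log, not pnmd output.

```python
from datetime import datetime

# finish_time_nw stamps from the log above (Mon Nov 20 2000), each followed
# by "ADAPTER FAILURE on bkg[0]", and what pnmd did next (paraphrased).
EVENTS = [
    ("10:44:54", "hme0 failed: switch hme0 -> hme1"),
    ("10:45:08", "do_switch: hme1 test failed"),
    ("10:45:22", "hme1 failed: switch hme1 -> hme0"),
    ("10:45:37", "do_switch: hme0 test failed"),
    ("10:45:51", "hme0 failed: switch hme0 -> hme1, hme1 then ok"),
]


def gaps_seconds(events):
    """Seconds between consecutive failure decisions (all on one day)."""
    times = [datetime.strptime(hms, "%H:%M:%S") for hms, _ in events]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    gaps = gaps_seconds(EVENTS)
    for (hms, action), gap in zip(EVENTS[1:], gaps):
        print(f"{hms}  +{gap}s  {action}")
    print("first switch to final switch:", sum(gaps), "s")  # 57 s
```

On this log the gaps are 14, 14, 15 and 14 s, i.e. 57 s from the first switch to hme1 until pnmd finally keeps hme1. Each gap is one full failed test cycle (5 s sleep plus three ping rounds about 3 s apart), so the two failed tests of hme1 and the two of hme0 account for almost all of the extra minute.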

1. Sun Cluster + QFS or Sun Cluster + Veritas

Right now I've got some 3.5 Veritas Clusters running Oracle8
that are due for an upgrade.

I'm playing around with the idea of getting rid of all of
our Veritas software and replacing it with Sun software
(since cluster is now free).

So it would be:

Solaris 10 (01/06 HW release, latest patches)
Solaris Volume Manager
MPXIO
QFS
Sun Cluster 3.1

I've used Veritas for so long now and am wary of mixing it with
Sun Cluster. Ideally, I'd like to go with one vendor for my
clustering.

Is MPXIO comparable to DMP? Are people actually running Solaris Volume
Manager under production Oracle databases? I've used SVM for mirroring
boot drives on web servers and app servers, but never in the database
tier.

I've read that QFS can do the volume management all on its own.
Is that what people typically use with Sun Cluster?

I'm reading the FAQs and white papers right now but would love to
hear from some people with real world experience with the stuff.

-Sanjay
