Solstice Disk Suite - Replacing a failed disk - Posting only; No response required

Post by Joel Shandelm » Fri, 19 Jul 2002 01:46:11



Admin Colleagues,

I looked for this type of recipe in the Solaris-oriented newsgroups
but did not come across anything that gave step-by-step directions.
I hope this information is useful to others who may need it. A common
problem which others seemed to have had was a boot time error of
"mod_hold_stub" when the SDS state database becomes hosed. I also had
this problem prior to using the following recipe.

I hope this helps others who may have been looking for something similar.
As always, comments, suggestions, and criticism are welcome.

   -- Joel

Replacing a disk under the control of Solstice Disk Suite 4.x

Summary:
This document will assist the reader in replacing a faulty SDS-controlled disk
drive. Because it is difficult to simulate a "bad" disk without actually having
one, I walked through the steps that would likely be required if an SDS disk
needed to be replaced. From my experience with the recipe below, there does not
appear to be much difference between replacing disk0 (primary boot disk) and
disk1 (secondary boot disk).

Assumptions:
(1) Hardware:    Sun E220R
(2) OS Version:  Solaris 2.7
(3) SDS Version: 4.1 (but should also work for 4.2.1 on Solaris 8)
(4) 2 internal Sun SCSI Disk drives installed as c0t0d0/disk0 and c0t1d0/disk1
(5) The root mirror is called d30, the submirrors are called d10 and d20
(6) We are replacing disk1 (2nd disk) in the system
(7) The server runs off a single partition s0 on both disks
    (perhaps not recommended, but for simplicity's sake, it works)
(8) The replacement disk is >= the size of the original disk (in sectors)
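
Assumption (8) can be checked before the mirror is rebuilt. The following is a
minimal Bourne shell sketch, not part of the original recipe. It assumes that
slice 2 is the conventional whole-disk "backup" slice and that the sector count
is the fifth column of each prtvtoc partition line. The function names
slice_sectors and big_enough are hypothetical. Because the failed disk may be
unreadable, the usage comment compares the new disk1 against the surviving
disk0.

```shell
#!/bin/sh
# Sketch only: compare whole-disk sector counts taken from prtvtoc output.
# Assumption: slice 2 spans the whole disk and column 5 is the sector count.

# Print the sector count of the given slice, reading prtvtoc output on stdin.
# prtvtoc comment lines start with "*", so their first field never matches.
slice_sectors() {
    awk '$1 == slice { print $5 }' slice="$1" -
}

# Succeed only if both counts are present and the second is >= the first.
big_enough() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$2" -ge "$1" ]
}

# Usage (on the server, after the new disk1 has been labeled):
#   old=`prtvtoc /dev/rdsk/c0t0d0s2 | slice_sectors 2`
#   new=`prtvtoc /dev/rdsk/c0t1d0s2 | slice_sectors 2`
#   big_enough "$old" "$new" || echo "replacement disk is too small"
```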

Now for the details:

Suggestion: Run metastat after each meta command to verify state change

 1) metastat                    # Identify which submirror you want to detach
 2) metaoffline d30 d20         # Take disk1 offline - will no longer mirror
 3) metadetach d30 d20          # Detach it from the d30 mirror
 4) metaclear -f d20            # Delete all references to the d20 submirror

Note: The SDS User Guide implies that prior to rebooting, /etc/vfstab should
      NOT use a metadevice but rather the native Solaris disk device. In
      practice, since the d30 mirror/d10 submirror is still active and valid,
      it is unnecessary to change /etc/vfstab at all for this exercise.

 5) metadb -d c0t1d0s7          # Remove the state database copies on disk1
                                # Warning: The system won't come back up after
                                # a reboot (init 6) unless the disk1 state DB
                                # copies are removed first.

 6) shutdown -y -g0 -i0         # Shutdown the server, install the new disk1
 7) Reboot the server           # Lookout for any strange SDS errors
 8) Partition and label disk1   # A shortcut if the disks are identical:

    prtvtoc /dev/rdsk/c0t0d0s0 | fmthard -s - /dev/rdsk/c0t1d0s0

 9) metastat                    # Should show just the d30/d10 mirror/submirror
10) metadb                      # Should show only 2 state DB copies on disk0
11) metadb -a -f -c 2 c0t1d0s7  # Recreate the state database copies on disk1
12) metadb                      # Should show all 4 state DB copies
13) metainit d20 1 1 c0t1d0s0   # d20: Concat/Stripe is setup
14) metattach d30 d20           # d30: Submirror d20 is attached. Resync begins
15) metastat                    # Verify that a resync is in progress
                                # You can monitor the resync progress with the
                                # following Solaris csh commands:
csh% while (1)
? echo -n "`date`: "; /usr/opt/SUNWmd/sbin/metastat | grep "in progress"
? sleep 60
? end
Note: Press <Ctrl-C> once the resync reaches 96%-99%. After it completes,
      metastat no longer reports "in progress", but the loop keeps running
      indefinitely.

16) installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0
17) Test by halting the system and booting from disk1 ok> boot disk1

That's it.
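
The monitoring loop in step 15 has to be interrupted by hand. As an
alternative, here is a minimal Bourne shell sketch that stops by itself once
metastat no longer prints an "in progress" line. The function name
wait_for_resync and the METASTAT and INTERVAL variables are hypothetical; the
metastat path is the one used in step 15.

```shell
#!/bin/sh
# Sketch only: poll metastat until no "in progress" line remains, then return.
# METASTAT and INTERVAL are illustrative variables, not SDS settings.
METASTAT=${METASTAT:-/usr/opt/SUNWmd/sbin/metastat}
INTERVAL=${INTERVAL:-60}

wait_for_resync() {
    # The assignment takes grep's exit status, so the loop ends as soon as
    # grep finds no matching line in the metastat output.
    while line=`$METASTAT | grep "in progress"`; do
        echo "`date`: $line"
        sleep "$INTERVAL"
    done
    echo "`date`: no resync in progress"
}

# Usage: run wait_for_resync after step 14 (metattach d30 d20).
```

Note that the function also returns immediately if metastat cannot be run at
all, so it is worth checking the final state with a plain metastat afterwards.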

 
 
 

Post by Michael Tos » Fri, 19 Jul 2002 03:05:06


Thanks Joel, for writing this.
From my experience, there is a simpler way:

metastat
metadb -d c0t1d0s7
# Now change the disk1 - it is (kind of) hot swap.
# A reboot would only spoil your uptime statistics and upset your customers.
prtvtoc /dev/rdsk/c0t0d0s0 | fmthard -s - /dev/rdsk/c0t1d0s0
# is nice and correct. I have done
cat /dev/dsk/c0t0d0s0 > /dev/dsk/c0t1d0s0
# and, after a second, abort with Ctrl-C.
metadb -a c0t1d0s7  # do you _really_ need -f -c 2 ?
metareplace -e d30 c0t1d0s0
metastat d30
# should show that d30/d20 is resyncing.

# installboot is not needed. At least not after my "cat" command above.
# And the resync copies the boot block anyway, doesn't it?

--

Michael Tosch / Master IS/IT Support
Ericsson Eurolab Deutschland GmbH
Tel: +49 2407 575 313


 
 
 

Post by Joel Shandelm » Sat, 20 Jul 2002 23:52:27


Michael,

Thank you for your comments and suggestions.

First, you are right about the reboot. Most if not all workgroup class
servers support hot-swappable disk drives. If this is the case, then
certainly the reboot sequence can be (should be!) skipped.

Second, I was under the impression that a minimum of two copies of the
state database should reside on each physical disk. -c 2 does this. The
-f option is unnecessary as you pointed out since a valid state database
exists on the alternate running disk.
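
To confirm how many state database copies each physical disk holds, the metadb
listing can be summarized per disk. This is a minimal sketch, assuming each
replica line of metadb output ends with a /dev/dsk/cXtYdZsN device path. The
function name replicas_per_disk is hypothetical. The stock Solaris
/usr/bin/awk lacks sub(), so nawk or /usr/xpg4/bin/awk may be needed in its
place.

```shell
#!/bin/sh
# Sketch only: count state database replicas per physical disk.
# Assumption: every replica line ends with a /dev/dsk/cXtYdZsN path.
# On Solaris, substitute nawk (or /usr/xpg4/bin/awk) for awk.
replicas_per_disk() {
    awk '$NF ~ /^\/dev\/dsk\// {
             disk = $NF
             sub(/^\/dev\/dsk\//, "", disk)   # strip directory: c0t1d0s7
             sub(/s[0-9]+$/, "", disk)        # strip slice:     c0t1d0
             count[disk]++
         }
         END { for (d in count) print d, count[d] }' | sort
}

# Usage: metadb | replicas_per_disk
```

With two copies on each disk, as created by -c 2, every disk should be listed
with a count of 2.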

Third, it looks like metareplace does in fact combine a metainit and
metattach operation into one command for disk replacement operations.

Regarding the installboot being reinstalled automatically on resync, I'll
have to try it myself and let you know.

Thanks again for your comments. Anything to keep users and servers
running non-stop is the way to go.

   -- Joel


 
 
 
