SDS says "Needs maintenance" when nothing's wrong. How to clear?

Post by mkirs.. » Sun, 26 Feb 2006 22:25:38



Last night, we had some connectivity problems between a V210 (Solaris
8) and an EMC Clariion. The long and short of it is that I ended up
unmounting, fscking, and remounting all 8 filesystems.

I did not notice this last night, but SDS is now complaining that all 8
metadevices "need maintenance." They mounted okay, the data is fine,
and I've confirmed that there is nothing physically wrong with any of
the components.

How do I clear the "Needs maintenance" messages from my metadevices
without destroying data?

This is an example of one of the devices. Yes, it's a "one-sided"
mirror that I've set up. This is so if I ever need to migrate to
another type of storage, I can, by simply adding the storage and
mirroring.

d16: Mirror
    Submirror 0: d106
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 17694080 blocks

d106: Submirror of d16
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d16 c3t61d6s0 <new device>
    Size: 17694080 blocks
    Stripe 0:
        Device      Start Block  Dbase State        Hot Spare
        c3t61d6s0          0     No    Last Erred

As I said, there's nothing wrong with the disk or data, yet SDS is
complaining.

 
 
 

Post by larr » Sun, 26 Feb 2006 22:53:30



try metastat -i

 
 
 

Post by mkirs.. » Sun, 26 Feb 2006 22:55:27


I get the exact same output:

d16: Mirror
    Submirror 0: d106
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 17694080 blocks

d106: Submirror of d16
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d16 c3t61d6s0 <new device>
    Size: 17694080 blocks
    Stripe 0:
        Device      Start Block  Dbase State        Hot Spare
        c3t61d6s0          0     No    Last Erred

 
 
 

Post by Julian Jacob » Sun, 26 Feb 2006 23:07:17



> I get the exact same output:

> d16: Mirror
>    Submirror 0: d106
>      State: Needs maintenance
>    Pass: 1
>    Read option: roundrobin (default)
>    Write option: parallel (default)
>    Size: 17694080 blocks

> d106: Submirror of d16
>    State: Needs maintenance
>    Invoke: after replacing "Maintenance" components:
>                metareplace d16 c3t61d6s0 <new device>
>    Size: 17694080 blocks
>    Stripe 0:
>        Device      Start Block  Dbase State        Hot Spare
>        c3t61d6s0          0     No    Last Erred

You could try replacing the device with itself:
metareplace -e d16 c3t61d6s0

JulianJ
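
A minimal sketch of how the self-replacement suggested above could be derived from metastat output instead of being typed by hand. The helper name suggest_enable is hypothetical, and the parsing assumes the metastat layout shown earlier in this thread. It only prints candidate commands; nothing is executed against the metadevices.

```shell
#!/bin/sh
# Dry run: print (do not execute) a "metareplace -e" line for every
# component that metastat lists as "Last Erred" or "Maintenance".
# suggest_enable is a hypothetical helper, not part of SDS.
suggest_enable() {
    awk '
        # A new metadevice header ("d16: Mirror", "d106: Submirror of d16").
        # Remember the parent mirror only for submirror headers.
        $1 ~ /^d[0-9]+:$/ {
            mirror = ($2 == "Submirror" && $3 == "of") ? $4 : ""
            next
        }
        # Component rows start with a cXtYdZsN slice name.
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && /Last Erred|Maintenance/ {
            if (mirror != "") print "metareplace -e " mirror " " $1
        }
    ' "$@"
}

# Demo on the output posted in this thread (abbreviated).
suggest_enable <<'EOF'
d16: Mirror
    Submirror 0: d106
      State: Needs maintenance

d106: Submirror of d16
    State: Needs maintenance
    Stripe 0:
        Device      Start Block  Dbase State        Hot Spare
        c3t61d6s0          0     No    Last Erred
EOF
# prints: metareplace -e d16 c3t61d6s0
```

On a live system the same function could read from a pipe (metastat | suggest_enable), so the printed lines can be reviewed before anything is run.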

 
 
 

Post by mkirs.. » Sun, 26 Feb 2006 23:42:24


Yes, but since this mirror has only one copy, will it just clear things
up or will it try to resync from nothing?
 
 
 

Post by mkirs.. » Mon, 27 Feb 2006 00:19:03


I have a solution: Give it what it wants!

I'll allocate some temporary storage to create a two-copy mirror, sync
up, do the metareplace, and voilà!

 
 
 

Post by Julian Jacob » Mon, 27 Feb 2006 03:01:17



> I have a solution: Give it what it wants!

> I'll allocate some temporary storage to create a two-copy mirror, sync
> up, do the metareplace, and voilà!

You have to replace the last failed item first.
A metareplace -e should just put that metadevice into a good state.
If the item is in a Clariion, it should already be in a good state.
Anyway, why create a metadevice for this LUN?

JulianJ

 
 
 

Post by slackware gu » Mon, 27 Feb 2006 07:25:55


I think metareplace only works if the data is mirrored. From your
output, d16 should have 2 subdevices which are mirrors of each other.
You do not have mirrored data. If the disk is not totally dead, BACK IT
UP IMMEDIATELY! If you do a metareplace, there will be nothing to
restore your data from. Your alternative (assuming you have enough life
left in the disk) is to metainit a second submirror alongside d106 (say
metainit d206 1 1 cXtXdXs2) and mirror the device using metattach d16 d206.

Good Luck!

 
 
 

Post by slackware gu » Mon, 27 Feb 2006 07:32:37


Then do the metareplace
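
A rough sketch of the sequence proposed in the two posts above, written as a dry run that prints the commands rather than running them. The function name plan_temp_mirror is hypothetical, d206 and the slice name are the placeholders from the post, and the metastat step stands in for "wait until the resync has finished". Whether this sequence is actually needed is disputed elsewhere in the thread.

```shell
#!/bin/sh
# Print (never run) the temporary-mirror sequence described above.
# Arguments: mirror, new submirror name, temporary slice, erred component.
# plan_temp_mirror is a hypothetical helper for illustration only.
plan_temp_mirror() {
    mirror=$1 newsub=$2 slice=$3 erred=$4
    # One stripe, one slice: a simple submirror on the temporary storage.
    echo "metainit $newsub 1 1 $slice"
    # Attach it as a second submirror; this starts a resync.
    echo "metattach $mirror $newsub"
    # Re-check the mirror until the resync has completed.
    echo "metastat $mirror"
    # Only then re-enable the erred component.
    echo "metareplace -e $mirror $erred"
}

# Example with the names from this thread; cXtXdXs2 is left as a placeholder.
plan_temp_mirror d16 d206 cXtXdXs2 c3t61d6s0
```

Printing the plan first makes it easy to check the device names against metastat before touching a mirror that holds the only copy of the data.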
 
 
 

Post by Julian Jacob » Tue, 28 Feb 2006 01:29:09



> I think metareplace only works if the data is mirrored. From your
> output, d16 should have 2 subdevices which are mirrors of each other.
> You do not have mirrored data. If the disk is not totally dead, BACK IT
> UP IMMEDIATELY! If you do a metareplace, there will be nothing to
> restore your data from. Your alternative (assuming you have enough life
> left in the disk) is to metainit a second submirror alongside d106 (say
> metainit d206 1 1 cXtXdXs2) and mirror the device using metattach d16 d206.

> Good Luck!

The following is a link to a metareplace man page:
http://www.cse.msu.edu/cgi-bin/man2html?metareplace?1m?/usr/man
metareplace - enable or replace components of submirrors or RAID5
metadevices

A component may be in one of several states. The Last Erred and the
Maintenance states require action. Always replace components in the
Maintenance state first, followed by a resync and validation of data. After
components requiring maintenance are fixed, validated, and resynced,
components in the Last Erred state should be replaced. To avoid data loss,
it is always best to back up all data before replacing Last Erred devices.

      -e  Transitions the state of component to the available state and
resyncs the failed component. If the failed component has been hot spare
replaced, the hot spare is placed in the available state and made available
for other hot spare replacements. This command is useful when a component
fails due to human error (for example, accidentally turning off a disk), or
because the component was physically replaced. In this case, the replacement
component must be partitioned to match the disk being replaced before
running the metareplace command.

As this mirror is a single-sided mirror, we have two options:
1. Delete and recreate the mirror.
2. Replace the errored component with itself.
The second option is the fastest, as only a metareplace -e d16 c3t61d6s0
needs to be done.

Is it worth having these mirrors? All you are doing is putting LVM or
whatever you want to call it in the way.
The chances are that if you do migrate to new storage, you will most likely
also move to larger LUNs. This would also add to your migration plan. When
you use growfs, you will write-lock the file system.

JulianJ
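
Whichever option is used, the result can be checked by looking for any State line that is not healthy. The sketch below assumes that a healthy metadevice reports "State: Okay" in metastat output and that the layout matches the output posted in this thread. The helper name not_okay is hypothetical.

```shell
#!/bin/sh
# List every metadevice whose State line is anything other than "Okay".
# Reads metastat output from the named files or from standard input.
# not_okay is a hypothetical helper, not part of SDS.
not_okay() {
    awk '
        # Track the metadevice whose block we are currently inside.
        $1 ~ /^d[0-9]+:$/ { dev = $1; sub(/:$/, "", dev) }
        # Report the text that follows "State: " when it is not "Okay".
        $1 == "State:" && $2 != "Okay" {
            print dev ": " substr($0, index($0, "State:") + 7)
        }
    ' "$@"
}

# Usage on a live system (prints nothing once everything is healthy):
#   metastat | not_okay
```

A state such as "Resyncing" would also be listed, which is useful here because the metareplace should not be considered finished until the resync has completed.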

 
 
 

1. I say "Hello" Linux says "Goodbye" ... aaarghhh

Hello all,
after much blood, sweat, tears and swearing I got Linux running on a
Compaq 386/20. It has 10MB RAM, about 1GB disk space and a CD-ROM.
It has recently gone on the Internet as an ftp resource.

Everything worked fine at first but now Telnet & ftp have become unreliable.
When they work, they work fine. However when they don't:

- Telnet session
  Open <symbolic address> says:

  "Connected to <Symbolic address>"
  "Escape character is '^]'."
  "Connection closed by foreign host."

- FTP session

  "Connected to <Symbolic address>"
  "421 Service not available, remote server has closed connection"

  and drops me back to the Linux prompt.

It does this regardless of the client (DOS, Windows, Mac or Un*x).
Ping finds the box and since it sometimes works I'm a bit stuck on
what to try next. I've tried "kill -HUP <inetd PID>". The damn
thing is visible and connect-to-able...

Our campus uses nameservers and I've included four in resolv.conf.
Name resolution uses "named" nameservers first. (Order bind, host
in host.conf).

I've just tried *again* and ... the damn thing is now working!!!
I have a few ideas but don't know how to test them, they are:

- Could it be something is timing out (our network is very busy
  and slow quite often) ? How do I confirm this ?

- Could it be some connection limit (shouldn't be, it's very rare
  that more than one person is connected at once).
  Where is this info held ?

- Could it be a name resolution funny (would "host" help here ?).
  Can I test this ?

- Could it be the first Telnet request(s) fail until some
  sort of dynamic table somewhere is updated ?

- Is inetd not loading telnetd for some reason ?

I've checked that the symbolic name and IP number are correct.
The system does a reboot (via a cron script) but this is
identical to another (486) Linux box which works fine ...

I've read books, scanned FAQs etc. but don't really know
where I can sensibly look next. I don't expect answers,
just suggestions of what/where to look next.
I apologise if this is a trivial or stupid question, I'm
an acting sysadmin with my "L" plates still on ...

Also, another query:
Where is lpd loaded at boot-time ?
Can it safely be *not* loaded to save CPU and RAM ?
(the machine doesn't have a printer attached).

Thanks in advance for any information/suggestions,
   Andy.

2. OPL3-SA3 on motherboard-RH5.2

3. """"""""My SoundBlast 16 pnp isn't up yet""""""""""""

4. pipe_size limited to 4096 byte

5. 'insmod st' says "kernel_version needed, but can't be found" ???

6. Web based system administration using Java sockets.......

7. GETSERVBYNAME()????????????????????"""""""""""""

8. Caldera 2.2 "Unofficial" CD?

9. sh says: test "$1" = "" when $1="-b"

10. Sun says "Skip 2.3 Maintenance Patches"

11. What should I do when Linux OS says "give root password for maintenance"?

12. bootdisk says "LIL-" and does nothing (slak 3.6)

13. Solstice "needs Maintenance" state! need help!