AIX 4.3.3 and on-line backups (long)

Post by Malcolm Preen » Fri, 29 Sep 2000 04:00:00



The IBM Redbook AIX Logical Volume Manager, from A to Z:
Introduction and Concepts (reference SG24-5432-00), in Chapter 6,
Section 1, discusses on-line backups: the way you can break an
active mirror, back up the mirrored (consistent) data, and then
re-mirror, in order to achieve complete and consistent backups
while keeping the system available and usable for the large
majority of the time.

This chapter goes on at length, and regularly emphasises, that the
methods developed before the extra commands became available in AIX
4.3.3 are a "hack".

As such, we have reworked all our scripts to use the new
functionality (chfs -a splitcopy=<new fs name> -a copy=2 <old fs
name>), and it works fine... however, we discovered the following
"feature", which, having followed our normal support channels, was
written off as being the way it was designed. I'm sorry, but if this
is the way it was designed, then IBM wants to shoot whoever designed
this feature in... I've no problem with it, but SURELY this is a
bug!!!
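For anyone who hasn't seen the new cycle, it looks roughly like the
sketch below. The filesystem names, mount point and tape device are
placeholders, and the reintegration step is as I understand it from
the Redbook; check the man pages before scripting it yourself:

    # Split mirror copy 2 of /data off as a separate filesystem,
    # frozen (and mounted read-only) at this point in time
    chfs -a splitcopy=/backup/data -a copy=2 /data

    # Back up the frozen copy while /data stays in service
    # (level-0 backup by inode to a hypothetical tape drive)
    backup -0 -u -f /dev/rmt0 /backup/data

    # Remove the split-off filesystem; per the Redbook the copy is
    # then resynchronised back into the mirror
    rmfs /backup/data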

I'd be very interested to hear comments from ANYONE about this... am
I being overly demanding? I mean, I have scripts in place that check
my errorlog and mail me the differences, so I see these very quickly,
and it bugs me... but maybe everyone else can live with it??
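(The monitoring is nothing exotic; something along these lines, where
the state file and the mail recipient are placeholders:)

    #!/bin/ksh
    # Mail any error log entries that have appeared since the last run
    LAST=/var/adm/errpt.last
    CURR=/tmp/errpt.$$

    errpt > $CURR                    # one summary line per errlog entry
    if [ -f $LAST ]; then
        NEW=$(diff $LAST $CURR | grep '^>')
        [ -n "$NEW" ] && echo "$NEW" | mail -s "new errlog entries" root
    fi
    mv $CURR $LAST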

Thanks for listening,

Malcolm Preen

------begin fault description--------
One "feature" of using the on-line backup method, is that the
split-off mirror which is being backed up becomes stale. This is
completely expected, as the whole purpose of using this method is to
be able to perform a backup of a system which requires as close as
possible to 24 hour processing. This implies that the two copies, once
split, will become out of sync.

What we see as a result of this "feature" is that the error log gets
many entries with the label LVM_SA_STALEPP. These appear to be made
up of, at minimum:

one stale PP in the JFSLOG LV
one stale PP in the "split" LV

Sometimes two entries are logged for the same stale partition. The
above is obviously a minimum. Given a particularly active LV, it
could be that during the time of the backup EVERY partition becomes
stale, and for a large LV (which it is likely to be, given that it is
commonly going to be a database) there could be many thousands of
errorlog entries.

As an example, I have a filesystem consisting of 20 LPs. Having
split it in order to back it up, I filled the previously empty
filesystem and then emptied it again, and as a result I had 29
LVM_SA_STALEPP errorlog entries.
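(Counting them is a one-liner; -J filters by error label and -N by
resource name, both per the errpt man page; the LV name here is a
placeholder:)

    errpt -J LVM_SA_STALEPP | wc -l       # all stale-PP entries
    errpt -J LVM_SA_STALEPP -N datalv     # just those against one LV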

IBM's response to this report was to suggest changing the errorlog
template so that STALEPP errors are not logged.

OK, so it wouldn't log these masses of errors, but it also wouldn't
log any real STALEPP errors, thus putting the customer at risk of not
noticing a failing disk until it had failed fatally.

Given that these errors are entirely expected, and that the
splitlvcopy functionality was added by IBM in response to people
using "the hack" to perform a similar function, the impression given
by these errors, and by IBM's response to the report of them, is that
their solution is almost as bad a "hack" as the original.

It would seem that the ideal solution is for the splitlvcopy routine
to mark the "split-off copy" in some way that says we don't care if
it becomes stale, so that staleness in this case is not logged to the
errorlog.

I hope that explains the situation... for us it is not so vital
anymore, as we have added a routine to our script which, once the
on-line backup has completed successfully, removes the STALEPP errors
against the appropriate LVs from the errorlog; but this is not ideal.
----end fault description-------
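(For anyone wanting the same workaround: errclear does the pruning.
-J and -N are documented flags, the LV names are placeholders for
whatever was split on your system, and the trailing 0 means "entries
of any age":)

    # Clear LVM_SA_STALEPP entries logged against the LVs involved
    # in the split, once the backup has completed successfully
    errclear -J LVM_SA_STALEPP -N datalv 0
    errclear -J LVM_SA_STALEPP -N loglv00 0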
--
Malcolm (recent 2-1-0 sav%86.96 GAA 3.95 - career 32-31-1 84.69% 6.45)
Goaltending is 90% mental, the other 10% is in your head (ICQ#8195978)
Hockey Results & Tables: http://homepages.tcp.co.uk/~sonic/hockey.html

AIX 4.3.3 and on-line backups (long)

Post by John McQue » Sat, 30 Sep 2000 04:00:00


Malcolm,

In general I agree with you: the LPs shouldn't be considered stale
when they are mounted as a backup copy of a mirrored filesystem.

In the past I've successfully implemented several broken-mirror
backup solutions using the so-called "hacks" with little or no
difficulty, but in those cases the broken mirrors had new LV names
and slightly different mount points, so they wouldn't appear as
stale.

We even used to break mirrors, reduce the disks out of the VG,
create a new VG, make new LVs from the map files as per the hacks,
then varyoff/export the disks and take them to another site or vary
them onto another machine (roughly as sketched below). Aren't SSA
loops flexible?
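For anyone curious, the sequence went something like the outline
below. Disk, VG and LV names and the LP count are placeholders, the
map file would be built from the lslv -m output, and this is a sketch
from memory rather than a tested recipe:

    # Record where the copy on hdisk4 lives (used to build the map file)
    lslv -m datalv > /tmp/datalv.lslv

    # Drop the mirror copy held on hdisk4, then pull the disk out
    rmlvcopy datalv 1 hdisk4
    reducevg datavg hdisk4

    # Build a new VG on the freed disk and recreate an LV over the
    # exact partitions that still hold the copy's data
    mkvg -y backupvg hdisk4
    mklv -y backuplv -m /tmp/datalv.map backupvg 20

    # The new VG can now be varied off, exported and moved elsewhere
    varyoffvg backupvg
    exportvg backupvg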

I'd like to know if the official method is how sysback takes online
backups. I'd also like to know if sysback is capable of doing online
backups of DBs like Oracle, as I'm looking for alternatives to SQL
Backtrack and ADSM.

Kindest Regards

John McQue



AIX 4.3.3 and on-line backups (long)

Post by Norman Levin » Fri, 06 Oct 2000 04:00:00


Looks like they tried to make "splitlvcopy" easier to use but
completely forgot about "chlvcopy" and "readlvcopy", which are much
better ways to accomplish what you want.
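For those who haven't met them: chlvcopy can mark a mirror copy as an
on-line backup copy in place (no split-off LV at all), and readlvcopy
then reads from that frozen copy. The flags below are assumptions
from memory of the 4.3.3 commands, so verify them against the man
pages before scripting anything:

    # Mark copy 2 of datalv as the on-line backup copy; writes keep
    # going to the active copy (flags assumed, verify locally)
    chlvcopy -b -c 2 datalv

    # Stream the frozen copy to the backup medium (flags assumed)
    readlvcopy -d datalv -o /dev/rmt0

    # Release the backup copy so LVM resynchronises it (assumed)
    chlvcopy -B datalv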



--
Norman Levin