Adaptec 2940 and Sony AIT-2 Tape Drive Problems

Adaptec 2940 and Sony AIT-2 Tape Drive Problems

Post by Dmitriy » Fri, 25 Mar 2005 05:53:20



I have been having some issues running backup on my Linux file server
the past few weeks. The server setup is as follows:

OS: Red Hat Enterprise Linux ES release 4 (Nahant)
SCSI Card: Adaptec 2940
Tape Drive: Sony SDX-500V (Internal AIT-2)
Backup Software: Bru (version 16.0)
The tape drive is the only SCSI device in the server

What happens is, it backs up perfectly fine for a while, backup,
verify, and restoring from tape works beautifly! Then all of a sudden
it starts giving an error, it still backs up, just the backup never
seems to finish completely:

/var/log/backup.log:
...[list of files being backed up]...
bru: [I109] attention - assuming end of volume 1 (unknown size)

Apparently it's asking for another volume, but since the job is running
in the background, it just aborts.

Here is what appears in the output of dmesg:
st0: Error with sense data: Info fld=0x676, Current st0: sense key
Medium Error
Additional sense: Write error
st0: Error with sense data: Info fld=0x40, Current st0: sense key
Medium Error
Additional sense: Write error
st0: Error with sense data: Info fld=0x40, Current st0: sense key
Medium Error
Additional sense: Write error
st0: Error with sense data: Info fld=0x40, Current st0: sense key
Medium Error
Additional sense: Write error
st0: Error with sense data: Info fld=0x1, Current st0: sense key Medium
Error
Additional sense: Write error
st0: Error on write filemark.

There are no messages in /var/log/messages

The same problem happened a few months ago, back then, we took the
server apart, cleaned everything, made sure all the SCSI settings were
correct, replaced the SCSI card, then replaced the SCSI cable with a
brand new one, none of that helped the problem... a week or so later,
it just kind of went away on it's own.

Anyone have any ideas on what may be causing this and what I can do to
try and stop these seemingly random periods of failures?

Thanks,
-- Dmitriy M

 
 
 

Adaptec 2940 and Sony AIT-2 Tape Drive Problems

Post by John-Paul Stewar » Fri, 25 Mar 2005 06:07:57



> I have been having some issues running backup on my Linux file server
> the past few weeks. The server setup is as follows:

> OS: Red Hat Enterprise Linux ES release 4 (Nahant)
> SCSI Card: Adaptec 2940
> Tape Drive: Sony SDX-500V (Internal AIT-2)
> Backup Software: Bru (version 16.0)
> The tape drive is the only SCSI device in the server

> What happens is, it backs up perfectly fine for a while, backup,
> verify, and restoring from tape works beautifly! Then all of a sudden
> it starts giving an error, it still backs up, just the backup never
> seems to finish completely:

> /var/log/backup.log:
> ...[list of files being backed up]...
> bru: [I109] attention - assuming end of volume 1 (unknown size)

> Apparently it's asking for another volume, but since the job is running
> in the background, it just aborts.

> Here is what appears in the output of dmesg:
> st0: Error with sense data: Info fld=0x676, Current st0: sense key
> Medium Error
> Additional sense: Write error
[snip]
> Anyone have any ideas on what may be causing this and what I can do to
> try and stop these seemingly random periods of failures?

When was the last time you cleaned the tape drive with the
manufacturer's recommended cleaning cartridge?  What about retensioning
the tape?  (I don't know what AIT is like, but my old Travan tapes used
to need periodic retensioning with 'mt retension'.)

 
 
 

Adaptec 2940 and Sony AIT-2 Tape Drive Problems

Post by Dmitriy » Wed, 30 Mar 2005 23:19:19


I cleaned the tape with a Sony AIT cleaning cartridge, and did a manual
retension (although I think the drive does that automatically).

The backup fails even with brand new tapes though, so I don't think
it's tape related. It also seems to fail at a random location.

Here is a s*t of my bruexeclog:

---- bruexeclog --------
20050321 15:36:25|2606|root|[L163] START (r 16.0.8.0.0Bg), CMD =
'/bin/bru -c -ddd -v -O -G -V -Z -B -X -N7 -f /dev/st0 .'
20050321 16:17:18|2606|root|[L167] device = /dev/st0, buffer = 32K
bytes, media size = <unknown>, archive id = 423f30490a2e, (Op)
20050322 07:22:08|2606|root|[L182] wrote 19012576 blocks (38025152
KBytes) on volume [1], 15:04:50, 1669 Kb/sec (effective)
20050322 07:23:48|2606|root|[A121] load volume 2 - press ENTER to
continue on device '/dev/st0'
20050322 07:23:48|2606|root|[Q120] (C)ontinue  (L)abel  (N)ew device
(Q)uit [default: C] >>
20050322 07:23:48|2606|root|[E070] interaction needed, aborted by -B
option
20050322 07:23:49|2606|root|[L165] FINISH - 0 warnings, 1 errors, exit
code = 2
20050322 15:00:01|3435|root|[L163] START (r 16.0.8.0.0Bg), CMD =
'/bin/bru -c -ddd -v -O -G -V -Z -B -X -N7 -f /dev/st0 .'
20050322 15:39:45|3435|root|[L167] device = /dev/st0, buffer = 32K
bytes, media size = <unknown>, archive id = 424079410d6b, (Op)
20050323 06:37:01|3435|root|[L182] wrote 20528576 blocks (41057152
KBytes) on volume [1], 14:57:16, 1776 Kb/sec (effective)
20050323 06:38:28|3435|root|[W003] "/dev/st0": warning - close error on
archive: errno = 5, Input/output error
20050323 06:38:28|3435|root|[A121] load volume 2 - press ENTER to
continue on device '/dev/st0'
20050323 06:38:28|3435|root|[Q120] (C)ontinue  (L)abel  (N)ew device
(Q)uit [default: C] >>
20050323 06:38:28|3435|root|[E070] interaction needed, aborted by -B
option
20050323 06:38:29|3435|root|[L165] FINISH - 1 warnings, 1 errors, exit
code = 2
20050323 15:00:01|5124|root|[L163] START (r 16.0.8.0.0Bg), CMD =
'/bin/bru -c -ddd -v -O -G -V -Z -B -X -N7 -f /dev/st0 .'
20050323 15:40:53|5124|root|[L167] device = /dev/st0, buffer = 32K
bytes, media size = <unknown>, archive id = 4241cac11404, (Op)
20050324 10:51:34|5124|root|[L182] wrote 23856160 blocks (47712320
KBytes) on volume [1], 19:10:41, 1842 Kb/sec (effective)
20050324 10:53:13|5124|root|[A121] load volume 2 - press ENTER to
continue on device '/dev/st0'
20050324 10:53:13|5124|root|[Q120] (C)ontinue  (L)abel  (N)ew device
(Q)uit [default: C] >>
20050324 10:53:13|5124|root|[E070] interaction needed, aborted by -B
option
20050324 10:53:14|5124|root|[L165] FINISH - 0 warnings, 1 errors, exit
code = 2
20050324 15:00:01|6191|root|[L163] START (r 16.0.8.0.0Bg), CMD =
'/bin/bru -c -ddd -v -O -G -V -Z -B -X -N6 -f /dev/st0 .'
20050324 15:28:59|6191|root|[L167] device = /dev/st0, buffer = 32K
bytes, media size = <unknown>, archive id = 42431c41182f, (Op)
20050325 03:52:04|6191|root|[L182] wrote 16795328 blocks (33590656
KBytes) on volume [1], 12:23:05, 1786 Kb/sec (effective)
20050325 03:53:32|6191|root|[W003] "/dev/st0": warning - close error on
archive: errno = 5, Input/output error
20050325 03:53:32|6191|root|[A121] load volume 2 - press ENTER to
continue on device '/dev/st0'
20050325 03:53:32|6191|root|[Q120] (C)ontinue  (L)abel  (N)ew device
(Q)uit [default: C] >>
20050325 03:53:32|6191|root|[E070] interaction needed, aborted by -B
option
20050325 03:53:33|6191|root|[L165] FINISH - 1 warnings, 1 errors, exit
code = 2
20050325 10:01:01|2592|root|[L163] START (r 16.0.8.0.0Bg), CMD =
'/bin/bru -c -ddd -v -O -G -V -Z -B -X -N6 -f /dev/st0 .'
20050325 10:28:56|2592|root|[L167] device = /dev/st0, buffer = 32K
bytes, media size = <unknown>, archive id = 424427ad0a20, (Op)
20050325 15:00:02|2726|root|[L163] START (r 16.0.8.0.0Bg), CMD =
'/bin/bru -c -ddd -v -O -G -V -Z -B -X -N6 -f /dev/st0 .'
20050325 16:00:15|2726|root|[W002] "/dev/st0": can't open archive:
errno = 16, Device or resource busy
20050325 16:00:15|2726|root|[A121] load volume 1 - press ENTER to
continue on device '/dev/st0'
20050325 16:00:15|2726|root|[Q120] (C)ontinue  (L)abel  (N)ew device
(Q)uit [default: C] >>
20050325 16:00:15|2726|root|[E070] interaction needed, aborted by -B
option
20050325 16:00:16|2726|root|[L165] FINISH - 1 warnings, 1 errors, exit
code = 2
20050325 22:35:16|2592|root|[L182] wrote 20132512 blocks (40265024
KBytes) on volume [1], 12:06:20, 2190 Kb/sec (effective)
20050325 22:37:03|2592|root|[A121] load volume 2 - press ENTER to
continue on device '/dev/st0'
20050325 22:37:03|2592|root|[Q120] (C)ontinue  (L)abel  (N)ew device
(Q)uit [default: C] >>
20050325 22:37:03|2592|root|[E070] interaction needed, aborted by -B
option
20050325 22:37:04|2592|root|[L165] FINISH - 0 warnings, 1 errors, exit
code = 2
20050328 09:26:57|2627|root|[L163] START (r 16.0.8.0.0Bg), CMD =
'/bin/bru -c -ddd -v -O -G -V -Z -B -X -N6 -f /dev/st0 .'
20050328 09:55:01|2627|root|[L167] device = /dev/st0, buffer = 32K
bytes, media size = <unknown>, archive id = 424814310a43, (Op)
20050328 12:09:22|2627|root|[L182] wrote 3064816 blocks (6129632
KBytes) on volume [1], 2:14:21, 1246 Kb/sec (effective)
20050328 12:10:03|2627|root|[W003] "/dev/st0": warning - close error on
archive: errno = 5, Input/output error
20050328 12:10:03|2627|root|[A121] load volume 2 - press ENTER to
continue on device '/dev/st0'
20050328 12:10:03|2627|root|[Q120] (C)ontinue  (L)abel  (N)ew device
(Q)uit [default: C] >>
20050328 12:10:03|2627|root|[E070] interaction needed, aborted by -B
option
20050328 12:10:03|2627|root|[L165] FINISH - 1 warnings, 1 errors, exit
code = 2
20050328 12:35:52|2647|root|[L163] START (r 16.0.8.0.0Bg), CMD =
'/bin/bru -c -ddd -v -O -G -V -Z -B -X -N6 -f /dev/st0 .'
20050328 13:02:33|2647|root|[L167] device = /dev/st0, buffer = 32K
bytes, media size = <unknown>, archive id = 424840780a57, (Op)
20050328 15:00:02|2664|root|[L163] START (r 16.0.8.0.0Bg), CMD =
'/bin/bru -c -ddd -v -O -G -V -Z -B -X -N6 -f /dev/st0 .'
20050328 15:59:31|2664|root|[W002] "/dev/st0": can't open archive:
errno = 16, Device or resource busy
20050328 15:59:31|2664|root|[A121] load volume 1 - press ENTER to
continue on device '/dev/st0'
20050328 15:59:31|2664|root|[Q120] (C)ontinue  (L)abel  (N)ew device
(Q)uit [default: C] >>
20050328 15:59:31|2664|root|[E070] interaction needed, aborted by -B
option
20050328 15:59:32|2664|root|[L165] FINISH - 1 warnings, 1 errors, exit
code = 2
20050329 02:55:36|2647|root|[L182] wrote 23000384 blocks (46000768
KBytes) on volume [1], 13:53:03, 2248 Kb/sec (effective)
20050329 02:57:11|2647|root|[W003] "/dev/st0": warning - close error on
archive: errno = 5, Input/output error
20050329 02:57:12|2647|root|[A121] load volume 2 - press ENTER to
continue on device '/dev/st0'
20050329 02:57:12|2647|root|[Q120] (C)ontinue  (L)abel  (N)ew device
(Q)uit [default: C] >>
20050329 02:57:12|2647|root|[E070] interaction needed, aborted by -B
option
20050329 02:57:12|2647|root|[L165] FINISH - 1 warnings, 1 errors, exit
code = 2

---- end-----

Bug in software? Faulty hardware? Ideas anyone?

Thanks,
--Dmitriy

 
 
 

Adaptec 2940 and Sony AIT-2 Tape Drive Problems

Post by John-Paul Stewar » Thu, 31 Mar 2005 01:34:19



> I cleaned the tape with a Sony AIT cleaning cartridge, and did a manual
> retension (although I think the drive does that automatically).

> The backup fails even with brand new tapes though, so I don't think
> it's tape related. It also seems to fail at a random location.

> Here is a s*t of my bruexeclog:

> ---- bruexeclog --------
> 20050321 15:36:25|2606|root|[L163] START (r 16.0.8.0.0Bg), CMD =
> '/bin/bru -c -ddd -v -O -G -V -Z -B -X -N7 -f /dev/st0 .'
> 20050321 16:17:18|2606|root|[L167] device = /dev/st0, buffer = 32K
> bytes, media size = <unknown>, archive id = 423f30490a2e, (Op)
> 20050322 07:22:08|2606|root|[L182] wrote 19012576 blocks (38025152
> KBytes) on volume [1], 15:04:50, 1669 Kb/sec (effective)
> 20050322 07:23:48|2606|root|[A121] load volume 2 - press ENTER to
> continue on device '/dev/st0'

Looks like you reached the end of the tape!  BRU is asking for volume 2,
and aborting.  (The -B option causes it to run in background mode.  Try
running without -B to be sure.)

The log shows you wrote 38 025 152 KBytes, or roughly 38GB.  How big is
this tape supposed to be (both native capacity and manufacturers
estimated compressed capacity)?

 
 
 

Adaptec 2940 and Sony AIT-2 Tape Drive Problems

Post by Dmitriy » Thu, 31 Mar 2005 23:24:28


BRU 'thinks' it reached the end of the tape, however I don't think it
actually has for several reasons:
(1) The native capacity of the media (SDX2-50C)  is 50GB, and it can go
up to 130GB compressed.
(http://www.peripheralstorage.com/tape/sony/sony_ait2_ext.htm)

(2) The values for "bytes written" seem to be radiacally different from
time to time, if it was just the tape getting full I'd expect the
backup to always fail with about the same number of bytes written.

I think I will try running it without the -B option to just see what it
does, any other ideas would be greatly appreciated though

Thanks!
--Dmitriy