Cannot read crash dump

Post by bothwel.. » Tue, 27 Mar 2001 23:41:04



My production server crashed last week while I was attending a class. The system:
H50
3GB RAM
AIX 4.3.2

The error report looked like this:

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
AD331440   0321172601 U S SYSDUMP        SYSTEM DUMP
9D035E4D   0321164901 P S SYSVMM         DATA STORAGE INTERRUPT, PROCESSOR
9DBCFDEE   0321172801 T O errdemon       ERROR LOGGING TURNED ON
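
For reference, the TIMESTAMP column of the summary error report is in MMDDhhmmYY form, so the entries above can be put on a timeline. A minimal Python sketch (the function name is hypothetical, and the two-digit year is interpreted by Python's own `%y` rule, which is an assumption of this sketch and not part of any AIX tool):

```python
from datetime import datetime

def parse_errpt_timestamp(stamp: str) -> datetime:
    """Parse an errpt summary timestamp of the form MMDDhhmmYY."""
    if len(stamp) != 10 or not stamp.isdigit():
        raise ValueError(f"expected 10 digits, got {stamp!r}")
    return datetime.strptime(stamp, "%m%d%H%M%y")

# Identifier, timestamp and resource name copied from the report above.
entries = [
    ("AD331440", "0321172601", "SYSDUMP"),
    ("9D035E4D", "0321164901", "SYSVMM"),
    ("9DBCFDEE", "0321172801", "errdemon"),
]

# Print the entries oldest first.
for ident, stamp, resource in sorted(entries, key=lambda e: parse_errpt_timestamp(e[1])):
    print(parse_errpt_timestamp(stamp).strftime("%Y-%m-%d %H:%M"), ident, resource)
```

Sorted this way, the SYSVMM data storage interrupt (16:49 on March 21) is the earliest entry, and the SYSDUMP and errdemon entries follow at 17:26 and 17:28.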

I had someone capture the crash dump on tape, and the system has been running
OK since the reboot.
I restored the dump file to disk (using tar) and tried to run the crash
command against it:

root> crash /data/sa/dump0321/dump_file
Using /unix as the default namelist file.
Cannot locate offset 0x01f5eb8 in segment 0x000000.
endcomm 0x00000000/0x011b5e90
WARNING: dumpfile does not appear to match namelist
Cannot locate offset 0x00c15d0 in segment 0x000000.
0452-179: Cannot read v structure from address 0x   c15d0.
Symbol proc has null value.
Symbol thread has null value.
Cannot locate offset 0x00c15d0 in segment 0x000000.
0452-179: Cannot read v structure from address 0x   c15d0.
Cannot locate offset 0x00034c4 in segment 0x000000.
0452-1002: Cannot read extension segment value from address 0x000034c4

I believe that the dump was copied to tape:
root> sysdumpdev -L

Device name: /dev/hd6
Major device number: 10
Minor device number: 1
Size:       186116608 bytes
Date/Time:  Wed Mar 21 16:49:06 2001
Dump status:     0
dump completed successfully
Failed to copy the dump from /dev/hd6 to /var/adm/ras.
Allowed the customer to copy the dump to external media.

The logical volume for the primary paging space (hd6) is
located on hdisk0 and is mirrored to hdisk1.

The first few hundred bytes of the crash dump look like this:

        0   41495820 4c564342 00007061 67696e67 00000000   AIX LVCB..paging....
       20   00000000 00000000 00000000 00000000 00000000   ....................
       40   00003030 30303733 35306564 37363935 39392e31   ..00007350ed769599.1
       60   00000000 00000068 64360000 00000000 00000000   .......hd6..........
       80   00000000 00000000 00000000 00000000 00000000   ....................
      100   00000000 00000000 00000000 00000000 00000000   ....................
      120   00000000 00000000 00000054 68752044 65632032   ...........Thu Dec 2
      140   34203136 3a33303a 34372031 3939380a 00000000   4 16:30:47 1998.....
      160   00536174 20446563 20313620 32303a34 313a3131   .Sat Dec 16 20:41:11
      180   20323030 300a0000 00000030 37333530 34433030    2000......073504C00
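
The hex listing above can be turned back into raw bytes to check what the file starts with. A small Python sketch, assuming each listing line is a decimal offset followed by five 32-bit hex words (the function names are hypothetical, and reading a leading "AIX LVCB" as the logical volume control block of hd6 is an interpretation of the listing, not something the crash command reports):

```python
def decode_listing_line(line: str) -> bytes:
    """Return the 20 bytes encoded by one line of the listing."""
    fields = line.split()
    words = fields[1:6]  # skip the offset, take the five hex words
    return b"".join(bytes.fromhex(w) for w in words)

def starts_with_lvcb(data: bytes) -> bool:
    """True if the data begins with the 'AIX LVCB' marker."""
    return data.startswith(b"AIX LVCB")

# First line of the listing above, copied as-is.
first_line = "0   41495820 4c564342 00007061 67696e67 00000000   AIX LVCB..paging...."
header = decode_listing_line(first_line)
print(header)
print(starts_with_lvcb(header))
```

If the restored file really begins with that marker, it suggests the file is a raw image of hd6 from its first block, not dump data extracted by the system's own copy tools. That is an inference from the listing, not a confirmed diagnosis.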

Any ideas why the crash command complains about the dump file?

Any suggestions about how to examine it successfully?

Thanks!

Bob

Post by Alex Robinso » Wed, 28 Mar 2001 00:15:00


Run 'smitty chgsys'. Make sure that "Enable full CORE dump" is true. It seems to be the default to have it set to false.

This won't help with the old core dump, but next time you should have a useable one.

Regards,
Alex Robinson




Post by Christer Pal » Wed, 28 Mar 2001 03:49:09



> On the other hand, AIX 4.3.2 cannot produce a good dump
> file if your paging space is mirrored (iirc).  If that's
> the case, the dump is useless.

>  ...



> >>   ...

> >>The logical volume for the primary paging space (hd6) is
> >>located on hdisk0 and is mirrored to hdisk1.

> >>   ...

That's obviously the case here...


Post by Jim Shaff » Thu, 29 Mar 2001 00:02:27



>The logical volume for the primary paging space (hd6) is
>located on hdisk0 and is mirrored to hdisk1.

Ah, and therein lies your problem.
Unfortunately, AIX 4.3.2 won't handle a mirrored dump device well.
You can get the dump using the "readlvcopy" command, I believe.
The trouble is that the dump is only taken to the first mirror, and
you're reading from all mirrors when you reboot and run crash.  The
best solution is to unmirror your paging space.

This is fixed in version 4.3.3, so perhaps your best solution is to
upgrade to the latest 4.3.3.

--
Jim Shaffer

www.jjshaffer.net

 
 
 


Post by bothwel.. » Thu, 29 Mar 2001 02:34:42


:>The logical volume for the primary paging space (hd6) is
:>located on hdisk0 and is mirrored to hdisk1.

: Ah, and therein lies your problem.
: Unfortunately, AIX 4.3.2 won't handle a mirrored dump device well.
: You can get the dump using the "readlvcopy" command, I believe.
: The trouble is that the dump is only taken to the first mirror, and
: you're reading from all mirrors when you reboot and run crash.  The
: best solution is to unmirror your paging space.

: This is fixed in version 4.3.3, so perhaps your best solution is to
: upgrade to the latest 4.3.3.

Hmm, I cannot upgrade just yet due to application vendor requirements,
and I would like to keep hd6 mirrored for a little fault tolerance.

Would either of these work (for future crashes)?

--crash--
boot from CD
un-mirror hd6
re-boot from hdisk0
capture crash dump on tape

OR

--crash--
boot from CD
un-mirror hd6
use readlvcopy to capture crash dump

Thanks for all of the replies!


Post by Christer Pal » Thu, 29 Mar 2001 03:43:15



> Would like to keep hd6 mirrored for a little fault tolerance.

> Will this work (for future crashes):

> --crash--
> boot from CD
> un-mirror hd6
> re-boot from hdisk0
> capture crash dump on tape

> OR

> --crash--
> boot from CD
> un-mirror hd6
> use readlvcopy to capture crash dump

> Thanks for all of the replies!

The first one won't work. The second one, I think, is the way the problem
is solved in 4.3, IIRC, but don't unmirror first!

The usual way to solve this is to dump to an LV other than hd6 that is
not mirrored. You can control which LV gets the dump by using the
sysdumpdev command.
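
To size such a dedicated dump LV, the dump size that sysdumpdev -L reported in the original post (186116608 bytes) can be converted to a number of physical partitions. A rough Python sketch, assuming a 16 MB physical partition size (an assumption; check the real PP size of the volume group) and a hypothetical helper name:

```python
import math

def partitions_needed(dump_bytes: int, pp_size_mb: int) -> int:
    """Smallest number of physical partitions that can hold dump_bytes."""
    pp_bytes = pp_size_mb * 1024 * 1024
    return math.ceil(dump_bytes / pp_bytes)

DUMP_BYTES = 186116608   # size reported by sysdumpdev -L in the original post
ASSUMED_PP_MB = 16       # assumption, not taken from the thread

print(partitions_needed(DUMP_BYTES, ASSUMED_PP_MB))
```

A future dump may well be larger than the last one, so some headroom on top of this figure seems sensible.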


Post by Jim Shaff » Thu, 29 Mar 2001 23:28:18


Do you have access to a 4.3.3 system?  If so, you might just get the
/usr/sbin/savecore program from 4.3.3.  I haven't tested running this
on 4.3.2, but it will probably work.  Test it before you have a real
crash!

You could just use readlvcopy when the system comes up.  Usually the
dump stays intact on hd6 for a while, until the system starts using
paging space heavily.  So do this as soon as possible after boot.

Of course, yet another solution is to create a dedicated dump logical
volume that isn't mirrored.

--
Jim Shaffer

www.jjshaffer.net
 
 
 


Post by Norman Levi » Sat, 31 Mar 2001 02:02:41


From your dump messages it appears /var is too small to hold the dump in
hd6.  Make it larger.  I'm not sure why you would want a full core dump,
since the kernel is a small part of real memory (hopefully).


> Run 'smitty chgsys'. Make sure that "Enable full CORE dump" is true. It seems to be the default to have it set to false.

> This won't help with the old core dump, but next time you should have a useable one.

> Regards,
> Alex Robinson



--
Norman Levin

greatest power, the village idiot will come forth to be acclaimed the
leader.'"

 
 
 

1. Crash Can't Read Dump Image

I have a client running OSR5.0.4c with SMP and VDM and oss496a, oss601a,
oss469d, oss471e, app612a and oss605a (in that order) applied that has been
crashing with kernel panics 1-3 times/week since applying these patches
earlier this month.  Since the swap area was too small, I used TA 105920 to
dump directly to their tape device.  This apparently works and then I can
use /etc/ldsysdump to load the tape image back to their hd BUT when I try to
inspect the dump using crash, I receive the following error and crash exits:

        Read error on page table entry at 0xfddeff0

This has happened twice now.  I can run strings on the dump and glean some
information but it sure would be nice to be able to use crash.

Any idea as to my problem?  Should I use something other than ldsysdump to
load the tape?

Thank you,
Lucky

Lucky Leavell                      Phone: (800) 481-2393 (US/Canada)
UniXpress - Your Source for SCO       OR: (812) 366-4066
1560 Zoar Church Road NE             FAX: (812) 366-3618

WWW Home Page:  http://www.UniXpress.com  
