debug kernel panic

Post by C Pelto » Sat, 05 Apr 2003 09:17:10



I'm running RH7.3, kernel 2.4.20, on an AMD K62000 on an Albatron 400+
with a Radeon AGP card (7500?), and the machine has random kernel panics.
Sometimes it will stay up for days, other times it's down in a matter of
minutes. Are there any known issues with this hardware configuration?

Otherwise, I've found the netdump utility, which sends the kernel core
over the network. Is there a way I can just save it to disk and either
forward it to Red Hat or piece through it myself to try to find out what
exactly is causing the panic? Or is it already being saved somewhere and
I just don't know where to find it?

I've been using Red Hat for years and have never seen anything this
unstable - maybe because I wasn't building the system ;).

TIA,
Chris

--
******************************
Chris Pelton
Scientific Computing Support
Campus Data Center, UCDavis
(530)752.1399 Phone
(530)752.0220 Fax
******************************

Post by Pete Zaitc » Sat, 05 Apr 2003 12:46:54



> I'm running RH7.3 with an AMD K62000 on an albatron 400+, kernel 2.4.20,
> with a Radeon AGP card (7500?)  that has random kernel panics. Sometimes
> it will stay up for days, other times its down in a matter of minutes.

Are they "panics", lockups, or what? If any tracebacks happen,
are they different each time? Anyhow, you aren't giving any info.
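
One rough way to answer the "are they different each time?" question is to
pull the faulting address out of every oops that reached the logs and count
the repeats. The sketch below is only an illustration: it assumes the
2.4-era i386 oops line of the form "EIP:    0010:[<c01234ab>]", and the
function names and the log path in the comment are made up for the example.

```python
import re
from collections import Counter

# Assumption: oops lines look like "EIP:    0010:[<c01234ab>]    Not tainted"
# (2.4-era i386 format). Adjust the pattern if your logs differ.
EIP_RE = re.compile(r"EIP:\s+[0-9a-fA-F]{4}:\[<([0-9a-fA-F]{8})>\]")


def summarize_oopses(log_text):
    """Count how often each faulting EIP address appears in log_text."""
    return Counter(m.group(1).lower() for m in EIP_RE.finditer(log_text))


def looks_random(counts):
    """True when no single address accounts for more than half the oopses."""
    total = sum(counts.values())
    return total > 1 and max(counts.values()) * 2 <= total


def report(path):
    """Print one line per faulting address, most frequent first."""
    # Hypothetical usage: report("/var/log/messages")
    with open(path, errors="replace") as fh:
        counts = summarize_oopses(fh.read())
    for addr, n in counts.most_common():
        print(addr, n)
    print("addresses vary a lot" if looks_random(counts) else "no clear spread")
```

If the same address keeps coming up, a particular driver or code path is the
likelier suspect; addresses scattered all over the place point more toward
hardware (memory, heat, power).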

-- Pete

Post by Mark » Sat, 05 Apr 2003 19:10:39



> I'm running RH7.3 with an AMD K62000 on an albatron 400+, kernel 2.4.20,
> with a Radeon AGP card (7500?)  that has random kernel panics. Sometimes
> it will stay up for days, other times its down in a matter of minutes.
> Are there any known issues with this hardware configuration?

I had something similar on a somewhat older system, but it might be the same
problem. There seems to be a problem with some AMD K6 processors. IIRC they
mess up memory addressing when they get too hot. I first noticed my problem
when I was running seti. From the moment I started it, I had at most 2 hours
before seti, at the very least, crashed, and in some cases my whole system
would crash. I had seen the problem before when trying to build kernels, but
due to my inexperience at that time I thought I had just done something
wrong with the kernel build.

In any case, eventually it turned out that the cooling of my power supply
was broken, and the cooler of my processor was insufficient. I replaced
both, and the problem has never occurred again.

So my advice is to check whether you have applications disappearing or
segfaulting spontaneously, whether this AMD K6 bug applies to your CPU, and
whether you have enough cooling for your CPU.
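
A crude way to put the CPU under sustained load and watch for silent
corruption, in the same spirit as the seti and kernel-build symptoms above,
is to hash the same buffer over and over and compare the results. This is
only a sketch under my own assumptions (the function name and defaults are
made up), and it touches a small slice of memory, so it is no replacement
for a proper memory tester.

```python
import hashlib
import os


def burn_check(rounds=100, size=1 << 20):
    """Hash one random buffer repeatedly; return how many rounds disagreed.

    On healthy hardware every round gives the same digest, so the result
    is 0. A nonzero count means the CPU or memory produced a wrong answer
    under load.
    """
    data = os.urandom(size)
    expected = hashlib.sha1(data).hexdigest()
    mismatches = 0
    for _ in range(rounds):
        if hashlib.sha1(data).hexdigest() != expected:
            mismatches += 1
    return mismatches
```

Run it with a large number of rounds while the case is closed and the box is
warm. If it reports mismatches, or the machine falls over partway through,
heat or power is a more likely culprit than the kernel.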

Mark

1. debugging kernel panics

Help! They've gotten rid of the UNIX gurus at my company and left me
to maintain a pipeline scada system. Usually this system is pretty
stable but we have experienced about 8 kernel panics in the last 3 weeks.
What I am looking for is ideas on how to sort out what might be causing
the panics. I have already talked to Sun and installed all the latest
patches.

Our software includes a home-grown device driver that facilitates
interprocess communication, and it is a good bet this is where the problem
lies. But I don't know for sure.

I know I can run adb -k on the kernel core file, but I'm at a loss as
to how to determine what was going on. The system manuals give few details
on debugging kernel core files. I have no experience working at the kernel
level or writing device drivers. I just want to sort out where the problem
is arising. Does one need a good understanding of kernel data structures
etc. in order to do this kind of thing, or are there relatively
straightforward adb commands for sorting out what is happening?

The particulars:
        SunOS 4.1.3 on a SPARCstation 10 (sun4m architecture)

        the kernel panic is caused by a BAD TRAP - Invalid address on
        supv data fetch
