System hang debugging

Post by David Albrech » Thu, 26 Jun 2003 12:40:00



I've got years of C, C++, and Unix experience, but zip when it comes
to Linux kernel debugging.  I have a 1.2 GHz Athlon system based on the
FIC AD11 motherboard with 512 MB of ECC RAM and an ATI Radeon 8500DV.
It runs rock solid on Win2K.  I installed SuSE 8.2 on it and the whole
system hangs once or twice a day; it just hung right under my nose.
The mouse pointer goes away and the screen freezes solid.  It won't
respond to a telnet from outside, so it's not just XFree, it's the
whole system.

I'm willing to dive in and look for the problem, but I don't know
where to start.  Are there kernel debuggers or forced kernel stack
dump utilities to give one a handle on such problems so you know where
to look?  A good book?  Some doc I should be reading?

Dave

 
 
 

Post by Kasper Dupon » Thu, 26 Jun 2003 17:45:31



> I've got years of C, C++, and Unix experience, but zip when it comes
> to Linux kernel debugging.  I have a 1.2G Athlon system based on the
> FIC AD11 MB with 512M of ECC RAM, ATI Radeon 8500DV.  Runs rock solid
> on Win2K.  I installed Suse 8.2 on it and the whole system hangs once
> or twice a day, it just hung right under my nose.   Mouse pointer goes
> away screen freezes solid.  Won't respond to a telnet from outside so
> its not just XFree its the whole system.

You are right. Those symptoms indicate that the whole system, and not
just XFree86, has crashed. (It is still possible that XFree86 is
responsible, but that is not very likely; a kernel bug is more likely.)

Though nothing points towards XFree86, I still think you should try to
reproduce it without XFree86, mainly because it is easier to debug
kernel bugs while not running it. Obviously, in a few cases a bug cannot
be reproduced without XFree86, and then you need other means of debugging.

You could try using a serial console or the netconsole patch. That way
you might be able to get some messages from the system as it crashes.
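
A minimal sketch of the receiving side for the netconsole route, to be run
on a second machine. It assumes the hanging box has been set up to send its
kernel messages as UDP packets to port 6666 on the logging machine; that
port number is an assumption, so match it to whatever your netconsole setup
actually uses. The names open_listener and capture_datagrams are made up
for this example.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming kernel messages.

    Port 6666 is an assumed value; use the target port configured on
    the machine that hangs.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture_datagrams(sock, log_path, max_packets=None):
    """Append each received UDP payload to log_path and return the count.

    The file is opened unbuffered so every message reaches the file as
    soon as it arrives; the last lines before a hang are the ones that
    matter. With max_packets=None the loop runs until interrupted.
    """
    count = 0
    with open(log_path, "ab", buffering=0) as log:
        while max_packets is None or count < max_packets:
            data, _sender = sock.recvfrom(65535)
            log.write(data)
            count += 1
    return count
```

On the logging machine, calling capture_datagrams(open_listener(),
"netconsole.log") and leaving it running until the next hang should leave
the last kernel messages in netconsole.log. A serial console gives you the
same kind of output over a null-modem cable instead of the network.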

--
Kasper Dupont -- who spends too much time on usenet.

It is NOT portable (Linus Benedict Torvalds 1991)

 
 
 

Post by David Albrech » Fri, 27 Jun 2003 06:21:25


Good idea.  I rebooted the system into runlevel 3 to get rid of the
xdm nonsense and, sure enough, after a couple of hours just sitting at
a shell prompt it was hung.

Dave

Anyone used the SGI kernel debugger?  Would it be at all useful in
this circumstance?

 
 
 
