glibc backtrace function

Post by Andrew Taylo » Sat, 29 Mar 2003 02:42:48



I have a question regarding the glibc backtrace function.  This function
is super useful, but is there any way to get it to work within a signal
handler?  It would be nice to be able to trap SIGSEGVs, etc., and print
out a nice, friendly stack trace similar to Java.

It looks like the signal handler has a different stack from the main
program, so the backtrace doesn't reveal what the program was doing,
pre-signal.  Is there any way to get a backtrace of the call stack
*before* the signal happened?

--
Andrew

 
 
 

Post by Steven Rosted » Sat, 29 Mar 2003 03:15:34


<I didn't realize that Andrew had the follow up only to COLDA, so I'm
reposting here with followups to both COLDA and COLDS>


> I have a question regarding the glibc backtrace function.  This function
> is super useful, but is there any way to get it to work within a signal
> handler?  It would be nice to be able to trap SIGSEGVs, etc., and print
> out a nice, friendly stack trace similar to Java.

> It looks like the signal handler has a different stack from the main
> program, so the backtrace doesn't reveal what the program was doing,
> pre-signal.  Is there any way to get a backtrace of the call stack
> *before* the signal happened?

This may be tricky. What the kernel does when handling a signal is
add code to the user's stack that calls your signal handler and, on
return, does a syscall to continue your code where it left off.
So a backtrace has to know how to skip this added code
and look at the stack that's above it.

So the signal handler doesn't really use a different stack; the stack
has just been modified by the kernel to do some cleanup for the
signal handler.

-- Steve

 
 
 

Post by Neil Horma » Sat, 29 Mar 2003 04:30:47



> <I didn't realize that Andrew had the follow up only to COLDA, so I'm
> reposting here with followups to both COLDA and COLDS>


>> I have a question regarding the glibc backtrace function.  This
>> function is super useful, but is there any way to get it to work
>> within a signal handler?  It would be nice to be able to trap
>> SIGSEGVs, etc., and print out a nice, friendly stack trace similar to
>> Java.

>> It looks like the signal handler has a different stack from the main
>> program, so the backtrace doesn't reveal what the program was doing,
>> pre-signal.  Is there any way to get a backtrace of the call stack
>> *before* the signal happened?

> This may be tricky. What the kernel does when handling a signal, is
> to add code to the user's stack that calls your signal handler and
> on return does a syscall to continue your code where it left off.
> So doing a backtrace will have to know how to skip this added code
> and take a look at the stack that's above it.

> So the signal handler doesn't really use a different stack, it's
> just been modified by the kernel to do some clean up for the sig
> handler.

> -- Steve

This may be a platform-specific thing, but I've done a good bit of poking
about in the signal handler code, and I've learned a few things:

1) The "extra data" on the stack when a signal handler is invoked is
formatted (at least on PowerPC) in a manner in which the frame
link pointers are correct.  This means that while the data in these
pseudo frames is invalid for the stack trace, the backtrace function
should be able to traverse them without causing any damage to the data
being gathered. The side effect is that any recorded stack trace from
inside a signal handler will have one or two erroneous entries near the
top, which are usually easily identifiable.

2) There is a way around this if need be.  I've done this in a stack
trace routine I've written myself.  That "extra data" on the stack has a
whole bunch of useful debug data in it, and since it's on the stack, you
can get to it.  Again, this is platform specific, but under Linux, at a
certain offset from the current stack pointer within the signal handler
function, a pt_regs structure can be found.  The structure is used by
the kernel to restore processor state for this process after a signal
has been handled.  If you fish out the pt_regs structure inside the
signal handler, then you have access to the contents of the processor's
registers at the moment the instruction which generated the signal was
executed.  This includes, of course, the NIP (which lets you know
exactly where the error occurred) and GPR1, the original stack
pointer (recorded before the signal data was added).  Once this data is
retrieved, it's fairly straightforward to follow the back chain of
links up the stack and record function addresses on your own.

Hope that helps!
Neil
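
The back-chain walk in point 2 can be sketched roughly as follows.  This
is not Neil's routine: the frame layout (back chain in word 0, saved
link register in word 1, as in the 32-bit PowerPC SysV convention) and
the helper name walk_back_chain are assumptions for illustration.  The
pc and sp arguments stand for the NIP and GPR1 values recovered from the
saved registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed frame header: word 0 is the back chain (pointer to the
 * caller's frame), word 1 is the slot where a callee saves its return
 * address.  Other ABIs lay this out differently. */
struct frame {
    const struct frame *back;
    uintptr_t lr_save;
};

/* Hypothetical helper: record `pc`, then one saved return address per
 * frame reached by following the back chain from `sp`.
 * Returns the number of entries written to `out`. */
size_t walk_back_chain(const struct frame *sp, uintptr_t pc,
                       uintptr_t *out, size_t max)
{
    size_t n = 0;
    const struct frame *f;

    if (out == NULL || max == 0)
        return 0;
    out[n++] = pc;  /* where the signal was raised */

    /* Under the assumed layout, a function's return address lives in
     * its caller's frame, so start one link up from `sp`. */
    for (f = sp ? sp->back : NULL; f != NULL && n < max; f = f->back) {
        out[n++] = f->lr_save;
        /* The stack grows down, so each caller's frame must sit at a
         * higher address; anything else means a corrupt chain. */
        if (f->back != NULL && (uintptr_t)f->back <= (uintptr_t)f)
            break;
    }
    return n;
}
```

A real implementation would also want to check that every frame pointer
lies inside the thread's stack before dereferencing it, since the chain
may be damaged by whatever caused the signal.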

 
 
 

Post by Grant Taylo » Sun, 30 Mar 2003 01:58:04



> The "extra data" on the stack when a signal handler is invoked is
> formatted (at least under the powerpc) in a manner in which the
> frame link pointers are correct.  This means that while the data in
> these pseudo frames are invalid for the stack trace, the backtrace
> function should be able to traverse them without causing any damage
> to the data being gathered. The side effect is that any recorded
> stack trace from inside a signal handler will have one or two
> erroneous entries near the top, which are usually easily
> identifiable.

Yes, this is very platform-specific.  On architectures with
caller-saves ABI semantics, one simply can't construct a valid stack
frame from which you can just return to any arbitrary point in an
interrupted function.  One could use a special epilogue for a
sighandler, I suppose, but that would be poor.

On MIPS (which uses a mixture of caller and callee saves) Linux's
signal frame is in fact a trampoline consisting basically of the
syscall "sigreturn" (the kernel then straightens things out).  The
signal handler uses the trampoline's frame (there aren't really frames
as such on MIPS anyway), and its ra is set to this trampoline.  I
suspect that many architectures have to do at least some of this sort
of thing.

On MIPS Irix, the C library contains a signal return function (perhaps
the Irix kernel plugs in a well-known return address?) and no
trampoline is used; I assume other language runtimes must also contain
explicit signal support.

> There is a way around this if need be.  I've done this on a stack
> trace routine I've written myself.  That "extra data" on the stack
> has a whole bunch of useful debug data in it, and since it's on the
> stack, you can get to it.

Yes, there is a struct ucontext as the second argument (or third on
MIPS/Linux, for no good reason that I could see).  When this is on
your stack, it is sufficient to write out the top part of your stack
and its location; you can compute a backtrace at a later time (or
indeed adapt GDB to operate on this crash dump, as I did).
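
As a rough sketch of reading that context: with the POSIX SA_SIGINFO
form of sigaction(), the handler receives the ucontext as its third
argument.  The register field names below are glibc-specific and only
filled in for x86-64 and AArch64; on anything else this sketch records
nothing.  All helper names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc exposes REG_RIP/REG_RSP only with this */
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

#if (defined(__x86_64__) && defined(REG_RIP)) || defined(__aarch64__)
#define HAVE_CTX_REGS 1
#else
#define HAVE_CTX_REGS 0
#endif

static volatile uintptr_t g_pc, g_sp;
static volatile sig_atomic_t g_seen;

/* SA_SIGINFO handler: `ctx` points at the ucontext_t describing the
 * interrupted code, including its program counter and stack pointer. */
static void on_signal(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;

    (void)sig;
    (void)info;
#if defined(__x86_64__) && defined(REG_RIP)
    g_pc = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
    g_sp = (uintptr_t)uc->uc_mcontext.gregs[REG_RSP];
#elif defined(__aarch64__)
    g_pc = (uintptr_t)uc->uc_mcontext.pc;
    g_sp = (uintptr_t)uc->uc_mcontext.sp;
#else
    (void)uc;   /* register layout not covered by this sketch */
#endif
    g_seen = 1;
}

/* Hypothetical helper: install on_signal for `sig`; 0 on success. */
int install_context_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

int context_seen(void)         { return g_seen; }
int context_supported(void)    { return HAVE_CTX_REGS; }
uintptr_t interrupted_pc(void) { return g_pc; }
uintptr_t interrupted_sp(void) { return g_sp; }
```

The recorded stack pointer marks where the interrupted stack begins, so
a crash handler could write out the memory from that address upward
along with the two register values and leave the actual backtrace to a
later tool.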

MIPS stack traces involve heuristic searches for stack pointer and
return address stores in the code; adding in understanding of
trampolines and the way ra/sp are saved in there was not a stretch.
On other architectures it may well be unexpected to need this, but it
must surely be simpler than on MIPS.

--
Grant Taylor - gtaylor<at>picante.com - http://www.veryComputer.com/~gtaylor/
 Linux Printing Website and HOWTO:  http://www.veryComputer.com/

 
 
 

1. Interpreting the results from backtrace()/ backtrace() usability

I've got the following output from backtrace_symbols_fd():

(There are a ton of [0x0]s in the beginning of the file)

...
[0x0]
[0x0]
[0x14000000]
/usr/local/sbin/lockmgr.4.5
[0x42029188]
[0x2e320000]
/usr/local/sbin/lockmgr.4.5
[0x804a758]
[0x2000]
[0x0]
[0x0]
[0x0]
[0x0]
[0x0]
[0x0]

...

[0x0]
[0x0]
[0x0]
[0x0]
[0x0]
[0x0]
[0x0]
[0x10000000]
/lib/ld-linux.so.2[0x40000812]
/usr/local/sbin/lockmgr.4.5[0x4201033a]
[0x4002c1a0]
/lib/ld-linux.so.2(_dl_lookup_versioned_symbol+0x11)[0x40007641]
[0x55a36f5]
/usr/local/sbin/lockmgr.4.5[0x804a7c7]
[0x3]
[0xbfffdef0]
[0xbfffdc0c]
/usr/local/sbin/lockmgr.4.5[0x804a7b7]
[0xe]
/usr/local/sbin/lockmgr.4.5[0x804a758]
[0xbfffdc0c]
/usr/local/sbin/lockmgr.4.5[0x804a7a8]
[0xc]
/lib/ld-linux.so.2[0x400005b8]
/lib/ld-linux.so.2[0x40000218]
/lib/ld-linux.so.2(_rtld_global+0x1c8)[0x400131e8]
[0x4]
[0x4002c294]
[0xbfffdf08]
/usr/local/sbin/lockmgr.4.5
[0x42029188]

I seem to be able to cross-reference some of the 0x8??????? addresses
against the output of nm. But what about the rest? The executable is
compiled with debug info.
