Bogus end() stack frames under Linux


Post by Ben Wi » Thu, 10 Aug 1995 04:00:00



Quite frequently, when attaching to a process using gdb or investigating
a core dump using gdb, I find that some of the stack frames have been
trashed, and are usually shown as a call to end() or something random
like that.

Frequently, in fact, the stack frame that gets trashed is the one in
which a signal handler was invoked (typically because it did something
bogus).  This can make it a real pain in the ass to figure out what
went wrong, if the trashed stack frame was in a large function.

Here's an example of what I'm talking about:

#0  0x60018cf9 in end ()
#1  0x12c0 in mark_device (obj={...}, markobj=0x6) at device.c:114
#2  0xa5fd3 in fatal_error_signal (sig=6) at emacs.c:171
#3  0xbffff4dc in end ()
#4  0x6001d22a in end ()
#5  0x60007963 in end ()
#6  0xa7fa1 in assert_failed (file=0x1102f7 "eval.c", line=1709,
    expr=0x112c77 "abort()") at emacs.c:1589
#7  0x113a84 in signal_1 (sig={...}, data={...}) at eval.c:1709
#8  0x114161 in Fsignal (sig={...}, data={...}) at eval.c:1855
#9  0x1141c1 in signal_error (sig={...}, data={...}) at eval.c:1863
#10 0x104a79 in arith_error (signo=8) at data.c:1437
#11 0xbffff658 in end ()
#12 0x1bb24c in xlw_update_one_widget (instance=0x3a72e0, widget=0x38d600,
    val=0x361f80, deep_p=0 '\000') at lwlib-Xlw.c:376
#13 0x1b925d in set_one_value (instance=0x3a72e0, val=0x361f80,
    deep_p=0 '\000') at lwlib.c:662
#14 0x1b92f3 in update_one_widget_instance (instance=0x3a72e0,
    deep_p=0 '\000') at lwlib.c:686
#15 0x1b9346 in update_all_widget_values (info=0x31c980, deep_p=0 '\000')
    at lwlib.c:696
#16 0x1b9586 in lw_modify_all_widgets (id=65541, val=0x344080,
    deep_p=0 '\000') at lwlib.c:747
#17 0xa1b95 in x_update_scrollbar_instance_status (w=0x2c8600, active=1,
    size=15, instance=0x2beec0) at scrollbar-x.c:267
#18 0x1a7ac6 in update_window_scrollbars (w=0x2c8600, mirror=0x321280,
    active=1, horiz_only=0) at scrollbar.c:264
#19 0x3a7d5 in redisplay_output_window (w=0x2c8600) at redisplay-output.c:1312
#20 0x2aa8f in redisplay_window (window={...}, skip_selected=0)
    at redisplay.c:4709
#21 0x2b0b0 in redisplay_frame (f=0x275700) at redisplay.c:4815
#22 0x2b7f1 in redisplay_device (d=0x2fb900) at redisplay.c:4918
#23 0x2bb91 in redisplay_without_hooks () at redisplay.c:4982
#24 0x2bdf0 in redisplay () at redisplay.c:5044

Stack frames 0, 1, 3, 4, 5, and 11 are trashed.  The real stack
should look something like

----- process died -----
kill (MYPID, 6)
fatal_error_signal (sig=6)
----- signal handler called (6) -----
abort ()
assert_failed (...)
signal_1 (...)
Fsignal (...)
signal_error (...)
arith_error (sig=8)
----- signal handler called (8) -----
xlw_update_scrollbar (...) at line XXX
xlw_update_one_widget (...)

etc.

The only frame I care about is #11 (in xlw_update_scrollbar()), and this
is the most important one in the whole stack trace because it
says where the real problem lies.

(BTW, I'm using GDB 4.14.)

What is the cause of this behavior?  When debugging the same program
under Solaris, I don't see the problem -- the stack trace correctly
shows all signal invocations, the calls to abort() and kill(), etc.
This is using DBX.

Is this a bug in GDB or Linux?

ben
--
"... then the day came when the risk to remain tight in a bud was
more painful than the risk it took to blossom." -- Anais Nin

 
 
 


Post by Ben Wi » Fri, 11 Aug 1995 04:00:00




|
|>Quite frequently, when attaching to a process using gdb or investigating
|>a core dump using gdb, I find that some of the stack frames have been
|>trashed, and are usually shown as a call to end() or something random
|>like that.
|
|>Frequently, in fact, the stack frame that gets trashed is the one in
|>which a signal handler was invoked (typically because it did something
|>bogus).  This can make it a real pain in the ass to figure out what
|>went wrong, if the trashed stack frame was in a large function.
|
|>Here's an example of what I'm talking about:
|
|...
|
|>What is the cause of this behavior?  When debugging the same program
|>under Solaris, I don't see the problem -- the stack trace correctly
|>shows all signal invocations, the calls to abort() and kill(), etc.
|>This is using DBX.
|
|>Is this a bug in GDB or Linux?
|
|What you are seeing are the addresses of the shared library entry
|points, for which no symbolic names are known in the debugger.
|
|When debugging a program, it is better to make a statically linked
|version, so that you have symbolic information (and access to) all
|library functions...

The shared library entry points are obliterating a stack frame?
That seems weird.  Is there something strange in the way that
shared libraries are invoked (e.g. a JUMP with the return address
in a register, rather than a CALL)?  Otherwise I'd expect the
return address of the shared library call to be on the stack
just like everything else.
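
To spell out what I'd expect: on i386 with conventional frame pointers, each frame keeps the caller's saved %ebp at [%ebp] and the return address at [%ebp+4], so a debugger can follow that chain outward. Here is a toy sketch of such a walk in Python. The memory contents and addresses are made up for illustration, and this is not a claim about how gdb actually unwinds.

```python
WORD = 4  # bytes per stack slot on 32-bit x86

# Toy stack memory: address -> 32-bit value. All values are hypothetical.
# Each frame is two slots: saved frame pointer, then return address.
TOY_STACK = {
    0x1000: 0x1010, 0x1004: 0xAAAA,
    0x1010: 0x1020, 0x1014: 0xBBBB,
    0x1020: 0x0000, 0x1024: 0xCCCC,
}

def walk_frames(memory, fp, pc, limit=16):
    """Return the list of program counters found by following saved frame pointers."""
    pcs = [pc]
    # Stop when the frame pointer leaves known memory or the depth limit is hit.
    while fp in memory and (fp + WORD) in memory and len(pcs) < limit:
        pcs.append(memory[fp + WORD])  # return address sits just above the saved fp
        fp = memory[fp]                # move to the caller's frame
    return pcs

if __name__ == "__main__":
    print([hex(p) for p in walk_frames(TOY_STACK, 0x1000, 0x1111)])
```

In this sketch, if one saved frame pointer slot holds garbage, the walk either stops early or reads every later frame from the wrong place, which is the kind of damage I would expect to see only if something really did overwrite the stack.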

Is this fixed with ELF?

ben
--
"... then the day came when the risk to remain tight in a bud was
more painful than the risk it took to blossom." -- Anais Nin

 
 
 


Post by Lex Spo » Fri, 11 Aug 1995 04:00:00





: |

[ snip ]

: |
: |>What is the cause of this behavior?  When debugging the same program
: |>under Solaris, I don't see the problem -- the stack trace correctly
: |>shows all signal invocations, the calls to abort() and kill(), etc.
: |>This is using DBX.
: |
: |>Is this a bug in GDB or Linux?
: |
: |What you are seeing are the addresses of the shared library entry
: |points, for which no symbolic names are known in the debugger.
: |
: |When debugging a program, it is better to make a statically linked
: |version, so that you have symbolic information (and access to) all
: |library functions...

: The shared library entry points are obliterating a stack frame?
: That seems weird.  Is there something strange in the way that
: shared libraries are invoked (e.g. a JUMP with the return address
: in a register, rather than a CALL)?  Otherwise I'd expect the
: return address of the shared library call to be on the stack
: just like everything else.

Perhaps end() is the highest address that gdb knows of?  And perhaps
gdb is assuming that any address higher than that is inside
the end() routine (though in reality it may be well past the
end of end())?

Glancing back at your output, gdb is reporting end() for some
wildly different addresses (0x60018cf9 in frame #0, 0xbffff4dc in
frame #3).
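
To make the guess concrete, here is a small Python sketch of the lookup I have in mind: pick the symbol with the greatest start address that is not above the pc. The symbol start addresses below are invented (only the names and the pc values come from your trace), and I don't know that gdb really resolves names this way.

```python
import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
# "end" stands in for the last symbol the debugger knows about.
SYMBOLS = sorted([
    (0x0A5F00, "fatal_error_signal"),
    (0x0A7F00, "assert_failed"),
    (0x113A00, "signal_1"),
    (0x200000, "end"),
])
STARTS = [addr for addr, _ in SYMBOLS]

def nearest_symbol(pc):
    """Name of the last symbol starting at or below pc, or None if there is none."""
    i = bisect.bisect_right(STARTS, pc) - 1
    return SYMBOLS[i][1] if i >= 0 else None

if __name__ == "__main__":
    for pc in (0xA7FA1, 0x60018CF9, 0xBFFFF4DC):
        print(hex(pc), nearest_symbol(pc))
```

With a table like this, anything past the last known symbol comes out as end(), whether it is a shared library address or a stack address, which would fit the mix of addresses in your trace.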

Oh well, just thoughts...

-lex