malloc/free blues

Post by Eli Zaretski » Sun, 20 Jul 2003 20:46:04



> Date: Fri, 18 Jul 2003 14:35:00 +0200

> Call frame traceback EIPs:
>   0x0006939c merge(BLOCK*, BLOCK*, BLOCK*)+170, file c:/src/dots2002/mallocsrc.c
>   0x000678e1 free+141, file c:/src/dots2002/mallocsrc.cpp, line 312
>   0x0006d94e destroy_bitmap+370, file c:/djgpp/allegro/src/graphics.c, line 1165
>   0x000133d5 .debug_str+544, file c:/src/dots2002/grafx.cpp, line 2496
>   0x0005f98a .debug_pubnames+42777, file c:/src/dots2002/exp.cpp, line 731
>   0x00033809 .debug_info+613, file c:/src/dots2002/irpreter.cpp, line 1006
>   0x00024d51 .debug_line+581, file c:/src/dots2002/irpreter.cpp, line 211
>   0x00023d59 interprete(std::string)+391, file c:/src/dots2002/irpreter.cpp, lin

> In GDB this looks like:
> SIGSEGV, Segmentation Fault at 0x0006939c in merge(a=0x5ecfa24, b=0x6e971c, c=0x5eca24)
> mallocsrc.cpp:273: ENDSZ(a)=a->size

This means that either the dereference of `a' in "a->size" or whatever
ENDSZ(a) does caused the crash.  Since "a->size" only _reads_ from the
address pointed to by `a' ("a->size" being on the right side of the
assignment), it is not what caused the crash, because the crash message
includes this line:

> Page Fault at 0x0006939c, error=0006

"error=0006" means the crash happened when the program tried to
_write_ to some address, and that address was found to be invalid.
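For reference, the low bits of an x86 page-fault error code can be decoded as below. This is a small sketch, assuming the number after "error=" in a DJGPP "Page fault" report is the raw error code pushed by the CPU; the macro and function names are made up for illustration.

```c
#include <stdio.h>

/* Low three bits of the x86 page-fault error code. */
#define PF_PROTECTION 0x1u  /* 0 = page not present, 1 = protection violation */
#define PF_WRITE      0x2u  /* 0 = read access,      1 = write access        */
#define PF_USER       0x4u  /* 0 = supervisor mode,  1 = user mode           */

int pf_is_write(unsigned err)      { return (err & PF_WRITE) != 0; }
int pf_is_user(unsigned err)       { return (err & PF_USER) != 0; }
int pf_is_protection(unsigned err) { return (err & PF_PROTECTION) != 0; }

/* Put a one-line description of ERR into BUF (at most LEN bytes). */
void describe_page_fault(unsigned err, char *buf, size_t len)
{
    snprintf(buf, len, "%s access, %s mode, %s",
             pf_is_write(err) ? "write" : "read",
             pf_is_user(err) ? "user" : "supervisor",
             pf_is_protection(err) ? "protection violation"
                                   : "page not present");
}
```

With these definitions, describe_page_fault(0x0006, ...) produces "write access, user mode, page not present", which is the reading given above. The same decoding does not apply to the error code of a General Protection Fault, whose bits have a different meaning.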

So, looking at the definition of ENDSZ:

> #define ENDSZ(bp)  (*(size_t *)((char *)bp + bp->size + 4))

We see that it stores a value at the address computed like this:

     (char *)a + a->size + 4

`a' is almost certainly a good pointer, since otherwise the program
would have crashed when it computed "a->size".  Therefore, if you
print the value of "a->size", you will most likely see a garbage
value, probably written by some code that overwrote the value
recorded there by malloc.  (a->size records the size of the allocated
buffer.)
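To make the ENDSZ failure mode concrete, here is a toy boundary-tag layout. It is hypothetical and much simpler than the real malloc.c (only a size field in the header, and sizeof(size_t) instead of the hard-coded 4), but it shows why a clobbered size field sends the trailing-size write to the wrong place, and how a bounds-and-match check can catch the problem before any write happens. All names here are invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Toy layout (hypothetical, not DJGPP's malloc.c):
     [size_t size][size bytes of payload][size_t copy of size]
   TOY_ENDSZ_OFF plays the role of ENDSZ: where the trailing copy lives
   depends entirely on the size stored in the header. */
#define HDR_SZ (sizeof(size_t))
#define TOY_ENDSZ_OFF(off, size) ((off) + HDR_SZ + (size))

/* Build a block at byte offset OFF whose payload is PAYLOAD bytes.
   The caller must make sure the whole block fits in the arena. */
void toy_make_block(unsigned char *arena, size_t off, size_t payload)
{
    memcpy(arena + off, &payload, HDR_SZ);
    memcpy(arena + TOY_ENDSZ_OFF(off, payload), &payload, HDR_SZ);
}

/* Return 1 if the block at OFF is consistent: the trailing tag lies
   inside the arena and repeats the header's size.  A size field that a
   stray write has clobbered will normally fail here, which is exactly
   the case in which blindly storing the trailing tag would scribble on
   an unrelated (or unmapped) address. */
int toy_block_ok(const unsigned char *arena, size_t arena_len, size_t off)
{
    size_t size, tail;

    if (off > arena_len || arena_len - off < 2 * HDR_SZ)
        return 0;                       /* no room for header + footer */
    memcpy(&size, arena + off, HDR_SZ);
    if (size > arena_len - off - 2 * HDR_SZ)
        return 0;                       /* footer would fall outside   */
    memcpy(&tail, arena + TOY_ENDSZ_OFF(off, size), HDR_SZ);
    return tail == size;
}
```

The sketch copies the size fields with memcpy rather than casting, so it works at any byte offset without alignment concerns. Overwriting the header with a value such as 0x5ecfa24 makes toy_block_ok return 0 instead of letting a write land far outside the block.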

Now the trick is to put a watchpoint at the address of a->size, and
then run the program again.  Then you will see what code writes a
bogus value there.

> Unfortunately, if I put a watchpoint on 0x6939c or on 0x5ecfa24, gdb
> freezes or crashes badly in the run.

0x6939c is an address in the code section, so you cannot usefully
watch it.  And as the analysis above suggests, 0x5ecfa24, which is the
value of `a', is not part of the problem; a->size is.

Post by Peter Claessen » Tue, 22 Jul 2003 22:09:04


Just a remark about this:

>  - download djlsr203.zip, extract the module malloc.c from it, and
>    paste its code into your program's sources;

I thought it would be a good idea to set the DEBUG preprocessor symbol
in that file to 1. Apparently it wasn't.  I get the ugliest crashes at
program startup, apparently caused by a segmentation fault (at least
that's what GDB says) before any output is sent to the screen,
resulting in blue screens etc. in Win98. I guess the debugging code
isn't meant to run under Windows? Or is it really indicating that
something is very wrong with my program?
In case it matters, the line

  int _crt0_startup_flags = _CRT0_FLAG_FILL_SBRK_MEMORY | _CRT0_FLAG_FILL_DEADBEEF;

is still set.

Cheers,
P


Post by Eli Zaretski » Wed, 23 Jul 2003 01:25:32


> Date: Mon, 21 Jul 2003 15:09:04 +0200

> I thought it would be a good idea to set the DEBUG preprocessor symbol
> in that file to 1. Apparently it wasn't.  I get the ugliest crashes at
> program startup, apparently caused by a segmentation fault (at least
> that's what GDB says) before any output is sent to the screen,
> resulting in blue screens etc. in Win98. I guess the debugging code
> isn't meant to run under Windows? Or is it really indicating that
> something is very wrong with my program?

The latter, I guess.  Can you post a SYMIFY'ed traceback of such a
crash?  Also, what happens if you boot into plain DOS (by holding F8
or F2 during startup) and then run your program?  Does it crash, and
if so, what gets printed when it does?

(To run a DJGPP program on plain DOS, you will need to make sure you
have CWSDPMI installed.)


Post by Peter Claessen » Wed, 23 Jul 2003 01:32:35


Here are the results for the runs with the DEBUG flag set to 1.

I get many conversion warnings at compilation time. I guess most of
them are OK, except for these:

malloc.c:353: warning: unknown conversion type character `,' in format
malloc.c:353: warning: unsigned int format, pointer arg (arg 6)
malloc.c:353: warning: too many arguments for format

I just added the missing 'x' after "%08x->%08" (making it "%08x->%08x")
to get rid of this.
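The remaining "unsigned int format, pointer arg" warning suggests a pointer is being passed to %x. The actual statement at malloc.c line 353 is not shown in this thread, so the following is only a hypothetical helper (format_link is an invented name) that prints two addresses in the same "%08x->%08x," style. It assumes a C99 <inttypes.h> that provides uintptr_t and PRIxPTR.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format two addresses as "XXXXXXXX->XXXXXXXX,".
   Converting through uintptr_t and printing with PRIxPTR avoids the
   "unsigned int format, pointer arg" warning that plain %08x gives
   when it is handed a pointer.  Returns snprintf's result. */
int format_link(char *buf, size_t len, const void *from, const void *to)
{
    return snprintf(buf, len, "%08" PRIxPTR "->%08" PRIxPTR ",",
                    (uintptr_t)from, (uintptr_t)to);
}
```

%p is the standard alternative for printing pointers, but its output format is implementation-defined, so it may not line up with the fixed-width %08x columns used elsewhere in the debug output.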

C:\src\dots2002>dotread -cl
Exiting due to signal SIGSEGV
Page fault at eip=000bd61d, error=0006
eax=000000af ebx=000bd724 ecx=00000010 edx=00001757 esi=000bd724 edi=031c17dc
ebp=00000fbc esp=00000fbc program=<**UNKNOWN**>
cs: sel=00a7  base=84830000  limit=03242fff
ds: sel=00af  base=84830000  limit=03242fff
es: sel=00af  base=84830000  limit=03242fff
fs: sel=0087  base=00015ee0  limit=0000ffff
gs: sel=00bf  base=00000000  limit=0010ffff
ss: sel=03cb  invalid
App stack: [03243000..031c3000]  Exceptn stack: [00168718..001667d8]

Call frame traceback EIPs:
  0x000bd61d

Notice the bizarre 'program' value. This is the same run under GDB:

Program received signal SIGSEGV, Segmentation fault.
0x000ca7e1 in _doprnt ()
(gdb) backtrace
#0  0x000ca7e1 in _doprnt ()
Cannot access memory at address 0x804

Under plain DOS I got (leaving out leading zeroes):
a GPF at eip=bd63f; flags=3016
eax=0 ebx=123 ecx=4000 edx=0 esi=16 edi=8fc ebp=d esp=1684d8
cs=a7 ds=af es=af fs=8f gs=bf ss=af error=0000
!
Running GDB under plain DOS resulted in the same segmentation fault
crash as above, at the same address.

This is what I got under plain DOS with the DEBUG flag set to 0:

General Protection Fault at eip=00003548
eax=6d657270 ebx=00239ae0 ecx=002398e0 edx=00000048 esi=001e6980 edi=00239ad4
ebp=001e68e8 esp=001e68d0 program=C:\SRC\DOTS2002\DOTREAD.EXE
cs: sel=00a7  base=10000000  limit=0058ffff
ds: sel=00af  base=10000000  limit=0058ffff
es: sel=00af  base=10000000  limit=0058ffff
fs: sel=00bf  base=00000000  limit=0010ffff
gs: sel=00bf  base=00000000  limit=0010ffff
ss: sel=00af  base=10000000  limit=0058ffff
App stack: [001e79d4..001679d4]  Exceptn stack: [00167918..001659d8]

Call frame traceback EIPs:
  0x00003548 merge(BLOCK*, BLOCK*, BLOCK*)+118, file c:/src/dots2002/malloc.c, l
  0x00001b01 .debug_line+22, file c:/src/dots2002/malloc.c, line 318
  0x000d3e11 operator delete(void*)+21, file fnmatch.c
  0x000fe3c8 std::string::_Rep::_M_destroy..+40, file fnmatch.c
  0x000fc1bb std::string::~string()+59, file fnmatch.c
  0x0004e050 readvar(std::string, std::st..+778, file c:/src/d..ig.cpp, line 162
  0x000610e2 exprun(std::string, std::s..+43154, file c:/src/d..xp.cpp, line 731
  0x000357c9 experiment_cmd()+1021, file c:/src/dots2002/irpreter.cpp, line 1006
  0x00026d11 .debug_line+1513, file c:/src/dots2002/irpreter.cpp, line 211
  0x00025d19 .debug_line+283, file c:/src/dots2002/irpreter.cpp, line 97
  0x0002562a cl()+250, file c:/src/dots2002/irface.cpp, line 74

The last two times I ran the program in plain DOS under GDB, it froze.
Ctrl+C didn't help; I had to reboot.

Do you think it would be a good or a bad idea to try the malloc_debug
functions in the nmalloc package that I found under the alpha
distribution info for djdev?

Sorry that I'm channeling all this material through the djgpp mailing
list. I have some problems posting to the newsgroup. (It seems I didn't
sacrifice enough to the gods of the digital age, huh.)

Thanks,
P


Post by Peter Claessen » Wed, 23 Jul 2003 03:43:23


Additional information:

When I add a trivial output statement  (printf("test\n");) to malloc.c,
in this case in the malloc function, line 112, the prog crashes even
before reaching the main function.
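One plausible explanation, not confirmed in this thread, is that printf may itself call malloc (for example to allocate the stdout buffer on first use), and the startup code calls malloc before main runs, so a printf inside malloc can recurse or run before stdio is ready. A common workaround is to trace with the low-level write() call and a fixed buffer, plus a guard against re-entry. The sketch below assumes a POSIX-style write() in <unistd.h>, which DJGPP provides; hex8 and trace_malloc are hypothetical names.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: store the low 32 bits of V as exactly 8 hex
   digits in BUF (no terminating NUL), without using stdio or the heap. */
void hex8(char *buf, unsigned long v)
{
    static const char digits[] = "0123456789abcdef";
    int i;

    for (i = 7; i >= 0; i--) {
        buf[i] = digits[v & 0xf];
        v >>= 4;
    }
}

/* Set while a trace is in progress, so that anything the trace code
   calls cannot re-enter it through malloc. */
static int in_trace;

/* Hypothetical trace call for use inside malloc(): writes
   "malloc XXXXXXXX\n" to standard error (fd 2) using only write(). */
void trace_malloc(unsigned long size)
{
    char line[] = "malloc XXXXXXXX\n";
    ssize_t n;

    if (in_trace)
        return;
    in_trace = 1;
    hex8(line + 7, size);               /* overwrite the X placeholders */
    n = write(2, line, sizeof line - 1);
    (void)n;                            /* nothing useful to do on error */
    in_trace = 0;
}
```

Calling trace_malloc(size) near the top of malloc would then write lines such as "malloc 00000010" (for a 16-byte request) to standard error without allocating anything.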


Post by Eli Zaretski » Wed, 23 Jul 2003 13:34:17


> Date: Mon, 21 Jul 2003 18:32:35 +0200

> C:\src\dots2002>dotread -cl
> Exiting due to signal SIGSEGV
> Page fault at eip=000bd61d, error=0006
> eax=000000af ebx=000bd724 ecx=00000010 edx=00001757 esi=000bd724 edi=031c17dc
> ebp=00000fbc esp=00000fbc program=<**UNKNOWN**>
> cs: sel=00a7  base=84830000  limit=03242fff
> ds: sel=00af  base=84830000  limit=03242fff
> es: sel=00af  base=84830000  limit=03242fff
> fs: sel=0087  base=00015ee0  limit=0000ffff
> gs: sel=00bf  base=00000000  limit=0010ffff
> ss: sel=03cb  invalid
> App stack: [03243000..031c3000]  Exceptn stack: [00168718..001667d8]

> Call frame traceback EIPs:
>   0x000bd61d

> Notice the bizarre 'program' value. This is the same run under GDB:

> Program received signal SIGSEGV, Segmentation fault.
> 0x000ca7e1 in _doprnt ()
> (gdb) backtrace
> #0  0x000ca7e1 in _doprnt ()
> Cannot access memory at address 0x804

Sounds like somehow it tries to print too early, when the run-time
environment is not yet set up.  Weird.

Anyway, this probably means you should for now drop the idea of
setting DEBUG to a non-zero value.

> Do you think it would be a good or a bad idea to try the malloc_debug
> functions in the nmalloc package that I found under the alpha
> distribution info for djdev?

It cannot hurt to use the malloc_debug package.  Try setting the debug
level to the maximum, and see what it tells you.  Also, call the
function that checks the heap integrity in a few places, and see where
it starts complaining.
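Eli's suggestion could look roughly like the checkpoint helper below. It assumes the DJGPP 2.04 libc functions mentioned later in this thread, with the prototypes int malloc_debug(int level) and int malloc_verify(void), and that malloc_verify returns nonzero while the heap is consistent; check the libc reference for the exact header and semantics. heap_debug_init, heap_checkpoint and HEAP_CHECKPOINT are hypothetical names, and on compilers other than DJGPP the check is a no-op that always succeeds.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef __DJGPP__
/* Assumed DJGPP 2.04 libc prototypes; check them against your libc
   reference.  malloc_verify() is taken to return nonzero while the
   heap looks consistent. */
int malloc_debug(int level);
int malloc_verify(void);
#define HEAP_OK() malloc_verify()
#else
/* No heap checker on other systems: every checkpoint passes. */
#define HEAP_OK() 1
#endif

/* Hypothetical: turn on the library's own checking, if available.
   Level 4 is the level used later in this thread. */
void heap_debug_init(void)
{
#ifdef __DJGPP__
    malloc_debug(4);
#endif
}

/* Hypothetical: return 1 if the heap looks fine at FILE:LINE;
   otherwise report the first failing checkpoint once and return 0. */
int heap_checkpoint(const char *file, int line)
{
    static int reported = 0;

    if (HEAP_OK())
        return 1;
    if (!reported) {
        fprintf(stderr, "heap check failed at %s:%d\n", file, line);
        reported = 1;
    }
    return 0;
}

#define HEAP_CHECKPOINT() heap_checkpoint(__FILE__, __LINE__)
```

Calling heap_debug_init() early in main and sprinkling HEAP_CHECKPOINT(); around suspect code narrows the corruption down to the code between the last passing checkpoint and the first failing one.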

> Sorry that I channel all this material through the djgpp mailing
> list.

Nothing to be sorry about, this is the appropriate place for such
discussions.

Post by Peter Claessen » Thu, 24 Jul 2003 06:30:13


My problem is finally solved now. Here is what I did: I installed
djdev 2.04, where malloc_debug and malloc_verify are implemented in
libc, and recompiled with the malloc debugging level set to 4. I had
to use some tricks to read the error output, because right after the
abort due to the memory problem, I got into an infinite 'abort' loop
somewhere in the allegro_exit function. It turned out that the
'fixpage' pointer (see the first post, with the code) and the
associated struct fields were always involved in these crashes. Then
I suddenly spotted the problem in my code. Normally I set a pointer to
NULL after deleting its object, and I do the same for bitmap pointers
after calling destroy_bitmap; I check pointers for NULL before trying
to delete or destroy them. Forgetting, just once, to set a pointer to
NULL after a destroy_bitmap call was enough to cast this nightmare
upon me. I added the line to do so, and behold... my program doesn't
crash. I removed the malloc_debug call, though, because of the errors
I get when shutting down the program.
Conclusion: I lost a lot of time, but at least I learned a lot about
debugging.
Many thanks to everyone who helped me sort out this problem!

Cheers,
Peter.
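Peter's habit of clearing a pointer right after releasing its object is easy to forget once, as this thread shows. A small macro that does both in one statement removes that chance. The sketch below uses plain free(); struct page is a hypothetical stand-in for whatever struct held the 'fixpage' pointer, and the Allegro variant in the comment is likewise hypothetical.

```c
#include <stdlib.h>

/* Release an object and clear the pointer in the same statement, so
   the usual "if (p) ..." guard cannot release it a second time.  The
   do/while(0) wrapper makes the macro act as a single statement, which
   is safe after an unbraced if. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical Allegro 4 equivalent (needs <allegro.h>):
   #define DESTROY_BITMAP_AND_NULL(b) \
       do { if (b) { destroy_bitmap(b); (b) = NULL; } } while (0)
*/

/* Hypothetical holder, standing in for a struct with a pointer field. */
struct page {
    char *pixels;
};

/* Safe to call any number of times: after the first call, pixels is
   NULL and the guard skips the release. */
void page_release(struct page *pg)
{
    if (pg->pixels)
        FREE_AND_NULL(pg->pixels);
}
```

Because the macro assigns to its argument, it only works on an lvalue such as a variable or struct field, and that argument is evaluated twice, so it should not contain side effects.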