LinuxThreads+signals => SIGSEGV in system call?


Post by Fergus Henders » Thu, 30 Apr 1998 04:00:00



Synopsis:
---------

I'm trying to port the Boehm (et al) conservative garbage collector
to work with LinuxThreads.  I've got it to the point where
it works *some* of the time.  The problem is that it sometimes
fails, apparently getting a segmentation fault in a system call.
It generates a core file and when I examine the core file in gdb,
the current instruction pointer is always just past the `int $80'
instruction that invokes the system call.

Any suggestions on how I can go about debugging this?

Details:
--------

There are two tricky parts to the port.  One part is determining where
the thread stacks are so that the collector can include them in its
root set.  This part I have got figured out.  My code must depend on
some of the implementation details of LinuxThreads, but otherwise this
part is not too hard.

The other tricky part is implementing the GC_stop_world() function,
which must suspend all the other threads.  The way I have implemented
this is to send them all a "SIG_SUSPEND" signal, and to have the signal
handler first call sem_post() to tell the main thread that they're
ready to suspend, and then call sigsuspend() (or sleep() -- I tried
both) inside the signal handler.  When the GC is done, the collector
calls GC_start_world() which sends all the threads a "SIG_RESTART"
signal.  The SIG_RESTART handler doesn't do anything except return; the
effect of the signal is just to terminate the call to sigsuspend() or
sleep().

(Normally I'd use SIGUSR1 and SIGUSR2 for my SIG_SUSPEND and
SIG_RESTART signals, but LinuxThreads already uses those, so I'm
currently reusing SIGIO and SIGPWR for SIG_SUSPEND and SIG_RESTART.)
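
For concreteness, the suspend/restart handshake described above might be
sketched roughly as follows.  This is a minimal illustration, not the
actual collector code: the demo_* names are hypothetical, and it assumes
a current POSIX threads implementation where SIGUSR1 and SIGUSR2 are
free (under LinuxThreads 0.6 one would substitute SIGIO and SIGPWR, as
noted above).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Assumption: SIGUSR1/SIGUSR2 are available.  The post uses SIGIO and
   SIGPWR instead because LinuxThreads 0.6 reserves the USR signals. */
#define SIG_SUSPEND SIGUSR1
#define SIG_RESTART SIGUSR2

static sem_t suspend_ack;   /* posted once by each thread that has parked */

static void restart_handler(int sig)
{
    (void)sig;              /* exists only to make sigsuspend() return */
}

static void suspend_handler(int sig)
{
    int saved_errno = errno;
    sigset_t wait_mask;

    (void)sig;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIG_RESTART);  /* only SIG_RESTART wakes us */
    sem_post(&suspend_ack);              /* async-signal-safe per POSIX */
    sigsuspend(&wait_mask);              /* park until the world restarts */
    errno = saved_errno;
}

/* Install both handlers.  Returns 0 on success, -1 on failure. */
int demo_gc_init(void)
{
    struct sigaction sa;

    if (sem_init(&suspend_ack, 0, 0) != 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = restart_handler;
    if (sigaction(SIG_RESTART, &sa, NULL) != 0)
        return -1;
    /* Keep SIG_RESTART blocked inside the suspend handler so it stays
       pending, rather than being consumed, until sigsuspend() runs. */
    sigaddset(&sa.sa_mask, SIG_RESTART);
    sa.sa_handler = suspend_handler;
    return sigaction(SIG_SUSPEND, &sa, NULL);
}

/* Signal every listed thread, then wait until each has acknowledged. */
void demo_stop_world(const pthread_t *threads, int n)
{
    int i;

    for (i = 0; i < n; i++)
        pthread_kill(threads[i], SIG_SUSPEND);
    for (i = 0; i < n; i++)
        while (sem_wait(&suspend_ack) != 0 && errno == EINTR)
            ;   /* retry if a signal interrupted the wait */
}

void demo_start_world(const pthread_t *threads, int n)
{
    int i;

    for (i = 0; i < n; i++)
        pthread_kill(threads[i], SIG_RESTART);
}

/* Hypothetical mutator thread, used only to exercise the sketch. */
static atomic_long demo_ticks;
static atomic_int demo_quit;

void *demo_mutator(void *arg)
{
    (void)arg;
    while (!atomic_load(&demo_quit))
        atomic_fetch_add(&demo_ticks, 1);
    return NULL;
}
```

A caller would run demo_gc_init() once, start its threads, and bracket
each collection with demo_stop_world() and demo_start_world().  The list
of threads must not include the calling thread.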

Anyway, that's all well and good, and when I run the collector's test
case, about 50% of the time it works.  But the other 50% or so, it dies,
sometimes due to failed assertions, but more often due to what is
apparently a segmentation fault in a system call.

Is the problem due to Linux for some reason not liking code that
suspends inside a signal handler?   If so, why doesn't Linux allow this?
Or alternatively, what else could be causing this problem,
and how can I go about debugging it?

I'm using LinuxThreads 0.6, libc 5.3.12, kernel 2.1.35, gcc 2.7.2,
and gdb 4.16.

--

WWW: <http://www.cs.mu.oz.au/~fjh>  |  of excellence is a lethal habit"

 
 
 


Post by Kaz Kylhek » Thu, 30 Apr 1998 04:00:00



> Synopsis:
> ---------

> I'm trying to port the Boehm (et al) conservative garbage collector
> to work with LinuxThreads.  I've got it to the point where
> it works *some* of the time.  The problem is that it sometimes
> fails, apparently getting a segmentation fault in a system call.
> It generates a core file and when I examine the core file in gdb,
> the current instruction pointer is always just past the `int $80'
> instruction that invokes the system call.

> Any suggestions on how I can go about debugging this?

Start by reading the LinuxThreads FAQ!!!

You cannot use core files to debug LinuxThreads applications.
All threads except for the main thread are not dumpable.

Early in the LinuxThreads effort, I jumped on the bandwagon and
started using it. I soon discovered that if two threads failed
at the same time, the simultaneous core dump would cause the kernel
to lock up. I sent a report to Linus who reported that the core
dump generation code wasn't thread safe. I patched the problem
by changing the clone code so that if a thread is created, it is
not dumpable (the capability is already there in the processes
structure for the sake of setuid processes which must not core
dump for security reasons). This approach seems to be the
way it's done right now.

If you examine a core file generated by a LinuxThreads process,
it will not tell you where the problem is.

You also cannot use GDB to debug a LinuxThreads application;
you can only use it on the main thread. If any of the other
threads hits a breakpoint, the application will die.

Currently, the _only_ way to debug LinuxThreads applications is
to insert diagnostic statements into your code, and analyze
the results.

> Details:
> --------

> There are two tricky parts to the port.  One part is determining where
> the thread stacks are so that the collector can include them in its
> root set.  This part I have got figured out.  My code must depend on
> some of the implementation details of LinuxThreads, but otherwise this
> part is not too hard.

> The other tricky part is implementing the GC_stop_world() function,
> which must suspend all the other threads.  The way I have implemented
> this is to send them all a "SIG_SUSPEND" signal, and to have the signal
> handler first call sem_post() to tell the main thread that they're
> ready to suspend, and then call sigsuspend() (or sleep() -- I tried
> both) inside the signal handler.  When the GC is done, the collector
> calls GC_start_world() which sends all the threads a "SIG_RESTART"
> signal.  The SIG_RESTART handler doesn't do anything except return; the
> effect of the signal is just to terminate the call to sigsuspend() or
> sleep().

I don't know anything about the internals of this garbage collector,
but the approach sounds bogus. Why should all the threads ever
have to stop? If there is some critical data structure whose
integrity must be preserved under multi-threading conditions,
then mutexes and condition variables can be used to implement
a locking scheme. That way, should the threads all try to execute
the protected code, they _will_ all suspend themselves, as with
the SIG_SUSPEND scheme, but if they don't execute the critical
code, they can continue running.

> I'm using LinuxThreads 0.6, libc 5.3.12, kernel 2.1.35, gcc 2.7.2,
> and gdb 4.16.

The last time I checked, the latest LT was 0.71, and that's the one
I'm using. You should also probably be using libc 5.4.44, (or go all
the way to glibc2, which is recommended for multi-threaded programming).

 
 
 


Post by Bryan O'Sulliva » Thu, 30 Apr 1998 04:00:00


Fergus writes:

f> I'm using LinuxThreads 0.6, libc 5.3.12, kernel 2.1.35, gcc 2.7.2,
f> and gdb 4.16.

This is a weird mix.  Many of these components are out of date in
various ways.  LinuxThreads is obsolete; threads have been folded into
libc 6.x, and your kernel version is out of date by over 60 revisions.
You might want to try a more stable combination.

k> I don't know anything about the internals of this garbage
k> collector, but the approach sounds bogus.

It's more or less the only way you can get a conservative collector to
work sensibly with an unfriendly language.

k> Why should all the threads ever have to stop?

So you can scan their stacks and registers for possible pointers into
the heap.

        <b

--
Let us pray:
What a Great System.
Please Do Not Crash.

 
 
 


Post by David Wrag » Fri, 01 May 1998 04:00:00



> You also cannot use GDB to debug a LinuxThreads application;
> you can only use it on the main thread. If any of the other
> threads hits a breakpoint, the application will die.

> Currently, the _only_ way to debug LinuxThreads applications is
> to insert diagnostic statements into your code, and analyze
> the results.

If your program dies from SIGSEGVs not in the main thread,
this can also help:

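  #include <signal.h>
  #include <stdio.h>      /* fprintf */
  #include <unistd.h>     /* _exit */

  /* These two lines needed for libc5 */
  #define sigcontext_struct sigcontext
  #include <asm/sigcontext.h>

  /* Print one saved register in decimal and in hex. */
  #define HURL(reg) fprintf(stderr, #reg ": %u 0x%x\n", \
                  (unsigned int)(sc.reg), (unsigned int)(sc.reg))

  /* Linux/i386 only: the kernel leaves the saved register context on
     the stack right after the signal number, so it can be picked up
     as a second, by-value argument.  fprintf() is not safe to call
     from a signal handler in general; this is a last-resort hack. */
  static void sigsegv_handler(int x, struct sigcontext sc)
  {
      fprintf(stderr, "*** SIGSEGV ***\n");
      HURL(eax);
      HURL(ebx);
      HURL(ecx);
      HURL(edx);
      HURL(edi);
      HURL(esi);
      HURL(ebp);
      HURL(esp);
      HURL(eip);
      /* cr2 gives the faulting address */
      HURL(cr2);
      _exit(1);           /* nonzero status: the process did not succeed */
  }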

      ...
      signal(SIGSEGV, (void (*)(int))sigsegv_handler);
      ...

It's unspeakably primitive, but can give you a lead when nothing else
does, or at least make you feel that you're not just banging your head
against a brick wall.
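
For systems where the by-value sigcontext argument is not available, a
roughly equivalent sketch can be built on sigaction() with SA_SIGINFO,
which reports the faulting address in si_addr.  This is only an
illustration under that assumption: the demo_* names are hypothetical,
it prints the fault address but not the registers (those live in the
ucontext argument, in a per-architecture layout), and it formats the
output by hand because the printf family is not async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Write v as "0x" followed by hex digits into buf, which must hold at
   least 2 + 2 * sizeof v + 1 bytes.  Returns the number of characters
   written, not counting the terminating NUL. */
size_t demo_format_hex(uintptr_t v, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;

    do {                        /* collect digits, least significant first */
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)               /* emit them most significant first */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

static void demo_segv_handler(int sig, siginfo_t *info, void *ucontext)
{
    char line[64] = "*** SIGSEGV *** fault address: ";
    size_t len = strlen(line);
    ssize_t ignored;

    (void)sig;
    (void)ucontext;             /* saved registers, layout is per-arch */
    len += demo_format_hex((uintptr_t)info->si_addr, line + len);
    line[len++] = '\n';
    ignored = write(STDERR_FILENO, line, len);
    (void)ignored;
    _exit(1);
}

/* Install the reporter.  Returns 0 on success, -1 on failure. */
int demo_install_segv_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = demo_segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling demo_install_segv_reporter() once at startup is enough, since
signal dispositions are shared by all threads in a process.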

--
Dave Wragg

 
 
 


Post by Bryan O'Sulliva » Sat, 02 May 1998 04:00:00


Slap on the wrist to Bruce for both posting and emailing his followup,
which I now need to respond to for a second time.  Your helpful net
grouch asks you to email or post, but not both.

b> I heard somewhere that, in a multithreaded environment, each thread
b> should have its own garbage collector.  The implementation is supposed
b> to be simpler and more elegant.

While it's possible to do this, it is neither as simple nor as elegant
as using a single garbage collector that stops all threads, if you're
living in an uncooperative world (as is the case for conservative
collectors, such as the one Fergus describes).

The problem is that data referenced by one thread can be passed to
another thread at almost any time, so simply scanning the set of data
referenced by any one thread at one time is not sufficient to
determine whether or not there is garbage present.

        <b

--
Let us pray:
What a Great System.
Please Do Not Crash.

 
 
 


Post by Bruce Bigb » Sat, 02 May 1998 04:00:00



> Slap on the wrist to Bruce for both posting and emailing his followup,
> which I now need to respond to for a second time.  Your helpful net
> grouch asks you to email or post, but not both.

> b> I heard somewhere that, in a multithreaded environment, each thread
> b> should have its own garbage collector.  The implementation is supposed
> b> to be simpler and more elegant.

> While it's possible to do this, it is neither as simple nor as elegant
> as using a single garbage collector that stops all threads, if you're
> living in an uncooperative world (as is the case for conservative
> collectors, such as the one Fergus describes).

> The problem is that data referenced by one thread can be passed to
> another thread at almost any time, so simply scanning the set of data
> referenced by any one thread at one time is not sufficient to
> determine whether or not there is garbage present.

>         <b

> --
> Let us pray:
> What a Great System.
> Please Do Not Crash.


Thanks for the tip.  I thought that I was doing everyone a favor.  I
realize the error of my ways, now.
--
Bruce W. Bigby/Technical Specialist (Software Engineer)
Xerox Corporation - 300-12S, 800 Phillips Road, Webster, NY 14580

///////////////////////////////////////////////////////////////////
All of the opinions in this e-mail message are my own and do not
represent the opinions or policies of my employer, unless I have
explicitly stated so.
///////////////////////////////////////////////////////////////////
 
 
 


Post by Rickard Westm » Thu, 07 May 1998 04:00:00




>> I'm trying to port the Boehm (et al) conservative garbage collector
>> to work with LinuxThreads.
(...)
>> The other tricky part is implementing the GC_stop_world() function,
>> which must suspend all the other threads.  The way I have implemented
>> this is to send them all a "SIG_SUSPEND" signal, and to have the signal
>> handler first call sem_post() to tell the main thread that they're
>> ready to suspend, and then call sigsuspend() (or sleep() -- I tried
>> both) inside the signal handler.  When the GC is done, the collector
>> calls GC_start_world() which sends all the threads a "SIG_RESTART"
>> signal.  The SIG_RESTART handler doesn't do anything except return; the
>> effect of the signal is just to terminate the call to sigsuspend() or
>> sleep().

>I don't know anything about the internals of this garbage collector,
>but the approach sounds bogus. Why should all the threads ever
>have to stop? If there is some critical data structure whose
>integrity must be preserved under multi-threading conditions,
>then mutexes and condition variables can be used to implement
>a locking scheme. That way, should the threads all try to execute
>the protected code, they _will_ all suspend themselves, as with
>the SIG_SUSPEND scheme, but if they don't execute the critical
>code, they can continue running.

A conservative garbage collector is a garbage collector which works
with minimal cooperation from the program whose data structures are
collected.  In the simplest mode of operation, the program is
changed to use the GC's replacement malloc() instead of the normal
malloc(), and all free() calls are changed into no-ops.  The GC
will automatically free memory which, in the GC's opinion, can no
longer be accessed through any pointer.  To detect this condition,
it will periodically scan all the memory of the process, looking
for potential pointers.  That's the critical operation during which
the world is stopped.  To synchronize this using mutexes, every
single data access in the program would need to be protected by a
mutex in some way, which is clearly not practical.  (Even if it
were, it would contradict the goal of a conservative GC, since the
idea is that the program to which GC is added should only need
minimal modification.)
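
To make the scanning step a little more concrete, here is a minimal
sketch of what "looking for potential pointers" can mean.  It is not
the Boehm collector's code: demo_heap and demo_scan_range are
hypothetical names, the heap is assumed to be one contiguous address
range, and a real collector would also have to check that a candidate
actually lands on an allocated object.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the collected heap: one contiguous range. */
struct demo_heap {
    uintptr_t lo;   /* address of the first byte of the heap */
    uintptr_t hi;   /* address one past the last byte */
};

/* Conservatively scan the word-aligned range [start, end), such as a
   stopped thread's stack.  Every word whose value, read as an address,
   falls inside the heap is treated as a possible pointer and passed to
   mark (if non-NULL).  Returns the number of such candidates. */
size_t demo_scan_range(const void *start, const void *end,
                       const struct demo_heap *heap,
                       void (*mark)(uintptr_t candidate, void *ctx),
                       void *ctx)
{
    const uintptr_t *p = start;
    const uintptr_t *limit = end;
    size_t hits = 0;

    for (; p < limit; p++) {
        uintptr_t word = *p;

        /* No type information is available, so an integer that merely
           looks like a heap address is kept too.  That is the
           "conservative" part. */
        if (word >= heap->lo && word < heap->hi) {
            hits++;
            if (mark != NULL)
                mark(word, ctx);
        }
    }
    return hits;
}
```

The scan only gives a consistent answer if nothing is moving pointers
around while it runs, which is why the other threads have to be
stopped first.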

Now, I don't know anything about LinuxThreads, I just wanted to
clarify why the world needs to be stopped from time to time when
using a Boehm-style conservative garbage collector.

--

"Beware of the panacea peddlers: Just because you
 wind up * doesn't make you an emperor."
                         - Michael A Padlipsky

 
 
 
