pthread_mutex_lock() causes vfork call?!

Post by Neil Kessle » Sat, 04 Sep 1999 04:00:00



I am developing a program using pthreads on Solaris 2.6.  I have a
situation where a thread gets a lock on a mutex.  Another thread
spawns a child using vfork() to do a call to pthread_mutex_lock() (i.e.,
a blocking call).  Since the mutex is already locked by a thread in the
parent process, the child blocks and goes to sleep.  At this point the
parent is blocked waiting for the child.  The result: the parent cannot
continue and release the lock it holds on the mutex, so the child sleeps
forever, causing the parent to sleep forever.  Deadlock.

Does anyone know whether, and why, Solaris uses vfork() at all to control
any pthread activity?  And if so, how do I get around this fatal problem?

Thanks,
Neil

Post by Chris Thomps » Sun, 05 Sep 1999 04:00:00




>I am developing a program using pthreads on solaris 2.6.  I have a
>situation where a thread gets a lock on a mutex.  Another thread
>spawns a child using vfork() to do a call to pthread_mutex_lock() ( ie.,
>blocking call ).  When the child does this, since it is already locked
>by a thread in the parent process, it blocks and goes to sleep.  At this
>point the parent is blocked in a wait for the child.  The result:  the
>parent cannot continue and subsequently release the lock it has on the
>mutex, so the child sleeps forever, causing the parent to sleep
>forever.  Deadlock.

>Does anyone know if and why solaris uses vfork() at all to control any
>pthread activity.  And if so, how do I get around this fatal problem?

The short answer is that multi-threading programs shouldn't use vfork()
at all, ever, period. As the man page says

| NOTES
[...]
|
|      vfork() is unsafe in multi-thread applications.

The longer answer would involve asking you why on earth a thread is
"spawn[ing] a child using vfork() to do a call to pthread_mutex_lock()".
Why can't it call pthread_mutex_lock() directly?
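If the underlying need is simply to run another program from a threaded process, a minimal sketch that never calls vfork() (or system()) directly is to use posix_spawn(). This assumes a system that provides posix_spawn(), a POSIX.1-2001 interface and therefore newer than the Solaris 2.6 discussed in this thread; `spawn_shell` is a hypothetical helper name, and the command is run through /bin/sh.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run `cmd` via "/bin/sh -c" using posix_spawn(),
 * so the caller never touches fork()/vfork() itself.
 * Returns the raw wait status (check with WIFEXITED/WEXITSTATUS),
 * or -1 on error. */
int spawn_shell(const char *cmd)
{
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };
    pid_t pid;

    /* posix_spawn() returns an error number rather than setting errno. */
    int err = posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ);
    if (err != 0) {
        errno = err;
        return -1;
    }

    /* Wait for this specific child, retrying if a signal interrupts us. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

For example, `spawn_shell("exit 3")` yields a status for which `WEXITSTATUS` is 3.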

Chris Thompson
Email: cet1 [at] cam.ac.uk

Post by palow.. » Sun, 05 Sep 1999 04:00:00






> >I am developing a program using pthreads on solaris 2.6.  I have a
> >situation where a thread gets a lock on a mutex.  Another thread
> >spawns a child using vfork() to do a call to pthread_mutex_lock() ( ie.,
> >blocking call ).  When the child does this, since it is already locked
> >by a thread in the parent process, it blocks and goes to sleep.  At this
> >point the parent is blocked in a wait for the child.  The result:  the
> >parent cannot continue and subsequently release the lock it has on the
> >mutex, so the child sleeps forever, causing the parent to sleep
> >forever.  Deadlock.

> >Does anyone know if and why solaris uses vfork() at all to control any
> >pthread activity.  And if so, how do I get around this fatal problem?

> The short answer is that multi-threading programs shouldn't use vfork()
> at all, ever, period. As the man page says

> | NOTES
> [...]
> |
> |      vfork() is unsafe in multi-thread applications.

> The longer answer would involve asking you why on earth a thread is
> "spawn[ing] a child using vfork() to do a call to pthread_mutex_lock()".
> Why can't it call pthread_mutex_lock() directly?

You'd almost think that this is the basic argument/difference between
how Solaris pthreads and Linux threads are implemented.

---Bob

--
Bob Palowoda   The Solaris x86 Corner   http://fishbutt.fiver.net

Post by Roger A. Faulkn » Tue, 07 Sep 1999 04:00:00




>I am developing a program using pthreads on solaris 2.6.  I have a
>situation where a thread gets a lock on a mutex.  Another thread
>spawns a child using vfork() to do a call to pthread_mutex_lock() ( ie.,
>blocking call ).  When the child does this, since it is already locked
>by a thread in the parent process, it blocks and goes to sleep.  At this
>point the parent is blocked in a wait for the child.  The result:  the
>parent cannot continue and subsequently release the lock it has on the
>mutex, so the child sleeps forever, causing the parent to sleep
>forever.  Deadlock.

>Does anyone know if and why solaris uses vfork() at all to control any
>pthread activity.  And if so, how do I get around this fatal problem?

Solaris does not use vfork() anywhere in the threads/pthreads
library implementation.  Your problem lies elsewhere.

Roger Faulkner

Post by Neil Kessle » Wed, 08 Sep 1999 04:00:00



> I am developing a program using pthreads on solaris 2.6.  I have a
> situation where a thread gets a lock on a mutex.  Another thread
> spawns a child using vfork() to do a call to pthread_mutex_lock() ( ie.,
> blocking call ).  When the child does this, since it is already locked
> by a thread in the parent process, it blocks and goes to sleep.  At this
> point the parent is blocked in a wait for the child.  The result:  the
> parent cannot continue and subsequently release the lock it has on the
> mutex, so the child sleeps forever, causing the parent to sleep
> forever.  Deadlock.

> Does anyone know if and why solaris uses vfork() at all to control any
> pthread activity.  And if so, how do I get around this fatal problem?

> Thanks,
> Neil

Well, I have discovered what is happening.  At various places in my code I
use system() to run scripts, etc.  What happens is that system() is called,
which invokes vfork(), and a child is created (with ALL of the threads in the
process copied).  Before the thread that called system() has a chance to call
execve(), one of the other copied threads does a blocking
pthread_mutex_lock() on a mutex already locked by another thread.  This
deadlocks the process: the child is now sleeping on the mutex, then execve()
is called, and you're done, because no other threads in the child can run
after the execve(), and the parent is suspended waiting for the child to
return.  It looks like I am going to have to rewrite system() to use fork1()
(which copies only the calling thread, not all of the threads in the process)
instead of vfork().
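A minimal sketch of the fork1()-style rewrite proposed above, assuming a POSIX system whose fork() duplicates only the calling thread (the semantics Solaris offers as fork1(); on Solaris of this era you would call fork1() explicitly). `my_system` is a hypothetical name, and unlike a real system() it does not block SIGCHLD or ignore SIGINT/SIGQUIT. The property that matters is that the child calls only execve() and _exit() before the new program starts, so it never tries to take a mutex that some other thread may hold.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical system() replacement built on fork() instead of vfork().
 * Simplification: signal handling done by a real system() is omitted.
 * Returns the raw wait status, or -1 on error. */
int my_system(const char *cmd)
{
    /* Build the argument vector before forking. */
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: only async-signal-safe calls, no locks, no stdio. */
        execve("/bin/sh", argv, environ);
        _exit(127);  /* exec failed; _exit() skips atexit handlers and stdio flushing */
    }

    /* Parent: wait for this particular child, retrying on EINTR. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```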

If anyone has any other suggestions as to another workaround, I'm all ears.

Neil

Post by Roger A. Faulkn » Thu, 09 Sep 1999 04:00:00





[snip]
>Well, I have discovered what is happening.  At various places in my code I
>am using the call system() to do some script execution, etc.  What happens
>is that the system is called, which invokes vfork(), a child is created (
>with ALL of the threads in the process copied ) and before the thread that
>called the system has a chance to do an execve() call, one of the other
>copied threads does a pthread_mutex_lock() ( blocking ) on a mutex already
>locked by another thread.  This causes the process to deadlock, because now
>there is a child sleeping on the mutex, then the execve() is called, and
>you're done, because no other threads in the child can run because of the
>execve() and the parent is suspended waiting for the child to return.  It
>looks like I am going to have to rewrite system() to use fork1 ( which only
>copies the calling thread, not all of the threads in the process ) instead
>of vfork().

You are very confused.

vfork() is the same as fork1() except that the child process
shares the parent's address space.  The libc version of system()
makes an attempt to be compatible with libthread by not grabbing
any locks after the vfork() and before calling exec().
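The rule described here (no lock acquisition between vfork() and exec()) can be illustrated with a short sketch. It assumes a platform that still provides vfork() (the call was removed from POSIX.1-2008; glibc exposes it under _DEFAULT_SOURCE), and `vfork_and_exec` is a hypothetical helper whose argv must be fully built before the call. Because the parent stays suspended until the child calls execve() or _exit(), any attempt by the child to lock a mutex held by another parent thread would deadlock exactly as described in this thread.

```c
#define _DEFAULT_SOURCE   /* glibc: expose vfork() even under strict -std modes */
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* After vfork() the child runs in the parent's address space while the
 * parent is suspended, so the child does nothing except execve() or
 * _exit(): no locking, no malloc, no stdio, no returning.
 * Returns the raw wait status, or -1 on error. */
int vfork_and_exec(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execve(path, argv, environ);
        _exit(127);               /* never return from a vfork() child */
    }

    /* The parent resumes only after the child has exec'd or exited. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```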

You must have replaced vfork() with fork() in your application's code
by supplying a vfork() that just calls fork().  Or else you supplied
your own version of system() that calls fork().

Go debug your application.  You are barking up the wrong tree.

Roger Faulkner

Post by Chris Thomps » Thu, 09 Sep 1999 04:00:00




[...]


>vfork() is the same as fork1() except that the child process
>shares the parent's address space.  The libc version of system()
>makes an attempt to be compatible with libthread by not grabbing
>any locks after the vfork() and before calling exec().

The 2.6 man page for system(3s) says

  ATTRIBUTES
  ...
    | MT-Level      |  Unsafe         |

though. Are you saying that this is no longer correct?

Chris Thompson
Email: cet1 [at] cam.ac.uk

Post by Neil Kessle » Thu, 09 Sep 1999 04:00:00






> [snip]
> >Well, I have discovered what is happening.  At various places in my code I
> >am using the call system() to do some script execution, etc.  What happens
> >is that the system is called, which invokes vfork(), a child is created (
> >with ALL of the threads in the process copied ) and before the thread that
> >called the system has a chance to do an execve() call, one of the other
> >copied threads does a pthread_mutex_lock() ( blocking ) on a mutex already
> >locked by another thread.  This causes the process to deadlock, because now
> >there is a child sleeping on the mutex, then the execve() is called, and
> >you're done, because no other threads in the child can run because of the
> >execve() and the parent is suspended waiting for the child to return.  It
> >looks like I am going to have to rewrite system() to use fork1 ( which only
> >copies the calling thread, not all of the threads in the process ) instead
> >of vfork().

> You are very confused.

> vfork() is the same as fork1() except that the child process
> shares the parent's address space.  The libc version of system()
> makes an attempt to be compatible with libthread by not grabbing
> any locks after the vfork() and before calling exec().

> You must have replaced vfork() with fork() in your application's code
> by supplying a vfork() that just calls fork().  Or else you supplied
> your own version of system() that calls fork().

> Go debug your application.  You are barking up the wrong tree.

> Roger Faulkner

The output of truss for my program:

    1034/1:         lwp_mutex_lock(0xEF5D9920)                      = 0
    1034/3:         lwp_mutex_unlock(0xEF5D9920)                    = 0

Now 1034/1 has the lock on 0xEF5D9920 after 1034/3 releases it.  Continuing
the output:

    1034/1:         lwp_sema_post(0xEE305E80)                       = 0
    1034/3:         lwp_sema_wait(0xEE305E80)                       = 0
    1034/1:         write(1, " a b o u t   t o   l o c".., 72)      = 72
    1034/1:         lwp_sema_post(0xEE305E80)                       = 0
    1034/3:         lwp_sema_wait(0xEE305E80)                       = 0
    1034/1:         sigaction(SIGCLD, 0xEED014A0, 0xEED01614)       = 0
    1034/3:         write(1, " e n g - d e v 6 . m i r".., 25)      = 25
    1034/1:         vfork()                                         = 1213
    1213/1:         vfork()         (returning as child ...)        = 1034
    1213/1:         lwp_mutex_lock(0xEF5D9920)      (sleeping...)
    1034/1:         vfork()         (waiting for child to exit()/exec()...)

Because I do not call fork() or vfork() ANYWHERE in my code, nor have I written
my own system(), from the looks of it 1034/1 calls system(), vfork() is
called, and the child returns and, instead of doing an execve(), does an
lwp_mutex_lock(), thus causing the program to deadlock.

By writing my own system() and replacing vfork() with fork1(), I can make this
problem go away.  How is that possible if vfork() and fork1() are the same?
These are honest questions, not rhetoric.  I agree that the man pages say that
vfork() only borrows the parent's thread of control, i.e. the thread that
called system().  If a thread calls system(), does the child get control of a
different thread in the parent?  How else would lwp_mutex_lock() be called?
The truss output shows that something other than the simple execve() it is
supposed to call is happening inside vfork().

Neil

Post by Roger A. Faulkn » Thu, 09 Sep 1999 04:00:00






>[...]

>>vfork() is the same as fork1() except that the child process
>>shares the parent's address space.  The libc version of system()
>>makes an attempt to be compatible with libthread by not grabbing
>>any locks after the vfork() and before calling exec().

>The 2.6 man page for system(3s) says

>  ATTRIBUTES
>  ...
>    | MT-Level      |  Unsafe         |

>though. Are you saying that this is no longer correct?

No.  I said it makes an attempt to be safe.
It is still unsafe if some other thread in the process is fooling
around with SIGCHLD (system() fools around with SIGCHLD).
Mostly though, in Solaris 2.6 and later, it works fine.
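One concrete way (not necessarily the case Roger had in mind) for another thread to break a system()-style wait is to set SIGCHLD to SIG_IGN: POSIX then allows children to be reaped automatically, and a later waitpid() fails with ECHILD instead of returning the exit status. The sketch below uses hypothetical helper names; `wait_for_child` waits for one specific pid and retries on EINTR, and `run_child_with_sigchld` demonstrates the hazard by temporarily changing the SIGCHLD disposition, which is process-wide and therefore affects every thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for one specific child, retrying when a signal interrupts waitpid().
 * Returns the raw wait status, or -1 with errno set; ECHILD means the
 * child was reaped by someone else (or auto-reaped under SIG_IGN). */
int wait_for_child(pid_t pid)
{
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}

/* Demonstration: start a child that exits with `code` while SIGCHLD has
 * the given disposition (as another thread might have set it), then wait. */
int run_child_with_sigchld(void (*disposition)(int), int code)
{
    struct sigaction sa, old;
    sa.sa_handler = disposition;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old) == -1)
        return -1;

    int st = -1;
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);                  /* child: exit immediately */
    if (pid > 0)
        st = wait_for_child(pid);

    int saved = errno;                /* preserve waitpid()'s errno */
    sigaction(SIGCHLD, &old, NULL);   /* restore the previous disposition */
    errno = saved;
    return st;
}
```

With the default disposition the status comes back normally; with SIG_IGN the wait returns -1 and errno is ECHILD.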

Roger Faulkner

Post by Roger A. Faulkn » Fri, 10 Sep 1999 04:00:00





[snip]
>> You are very confused.

>> vfork() is the same as fork1() except that the child process
>> shares the parent's address space.  The libc version of system()
>> makes an attempt to be compatible with libthread by not grabbing
>> any locks after the vfork() and before calling exec().
[snip]

>the output of truss for my program:

[snip, showing child grabbing a lock and blocking]


>Because I do not call fork() or vfork(), nor had I written my own system() call
>ANYWHERE in my code, from the looks of it 1034/1 calls system(), the vfork() is
>called, the child returns and instead
>of doing a execve() it does a lwp_mutex_lock().  Thus causing the program to
>deadlock.

Yes, this is the (broken) behavior of system() before Solaris 2.6.
It is marked MT-unsafe in the man page, so you should not have used it.
Strictly speaking, it is still MT-unsafe, although it mostly works in
Solaris 2.6 and later.  You must have a Solaris 2.5.1 or prior system.

>By writing my own system() and replacing vfork() with fork1() this problem goes
>away.  How is that possible if vfork() and fork1() are the same?  These are
>honest questions, not rhetoric.  I agree that the man pages say that vfork only
>borrows the parent's thread of control that called the system.  If a thread
>calls system(), does it get control of a different thread in the parent?  How
>else would a lwp_mutex_lock() be called?  The truss shows that something else
>is happening inside of vfork() besides the simple execve() that it is supposed
>to be calling.

The child has no control of any thread in the parent.
To ask such a question shows basic misunderstanding of UNIX.

You are confused about the behavior of vfork().
vfork() creates a child process that shares the parent's address space.
The child has only one thread, just like fork1().  Unlike fork1(), however, the
parent is blocked from executing until the child performs exec() or exit().
If the child attempts to grab a mutex that some thread in the parent owns,
it will block and because the parent is blocked by the semantics of vfork(),
there is deadlock.

You asserted that vfork() was cloning all of the parent's threads.
This is untrue.  This made me angry.  I flamed you.  I'm not sorry.

Asking for information is ok, but don't spread disinformation
on this newsgroup.  Be sure you know what you are talking about.

You need to study some books about UNIX programming.

I don't want to continue this discussion, so this is the
last response from me.

Roger Faulkner

Post by Neil Kessle » Fri, 10 Sep 1999 04:00:00






> [snip]
> >> You are very confused.

> >> vfork() is the same as fork1() except that the child process
> >> shares the parent's address space.  The libc version of system()
> >> makes an attempt to be compatible with libthread by not grabbing
> >> any locks after the vfork() and before calling exec().
> [snip]

> >the output of truss for my program:
> [snip, showing child grabbing a lock and blocking]

> >Because I do not call fork() or vfork(), nor had I written my own system() call
> >ANYWHERE in my code, from the looks of it 1034/1 calls system(), the vfork() is
> >called, the child returns and instead
> >of doing a execve() it does a lwp_mutex_lock().  Thus causing the program to
> >deadlock.

> Yes, this is the (broken) behavior of system() before Solaris 2.6
> It is marked MT-unsafe in the man page, so you should not have used it.
> Strictly speaking, it is still MT-unsafe, although it mostly works in
> Solaris 2.6 and later.  You must have a Solaris 2.5.1 or prior system.

> >By writing my own system() and replacing vfork() with fork1() this problem goes
> >away.  How is that possible if vfork() and fork1() are the same?  These are
> >honest questions, not rhetoric.  I agree that the man pages say that vfork only
> >borrows the parent's thread of control that called the system.  If a thread
> >calls system(), does it get control of a different thread in the parent?  How
> >else would a lwp_mutex_lock() be called?  The truss shows that something else
> >is happening inside of vfork() besides the simple execve() that it is supposed
> >to be calling.

> The child has no control of any thread in the parent.
> To ask such a question shows basic misunderstanding of UNIX.

> You are confused about the behavior of vfork().
> vfork() creates a child process that shares the parent's address space.
> The child has only one thread, just like fork1().  Unlike fork1(), however, the
> parent is blocked from executing until the child performs exec() or exit().
> If the child attempts to grab a mutex that some thread in the parent owns,
> it will block and because the parent is blocked by the semantics of vfork(),
> there is deadlock.

> You asserted that vfork() was cloning all of the parent's threads.
> This is untrue.  This made me angry.  I flamed you.  I'm not sorry.

> Asking for information is ok, but don't spread disinformation
> on this newsgroup.  Be sure you know what you are talking about.

> You need to study some books about UNIX programming.

> I don't want to continue this discussion, so this is the
> last response from me.

> Roger Faulkner

Not all of us know as much as you, Roger.  The reason that some people
post on the newsgroups with QUESTIONS is that they do not have at
their disposal the experience or knowledge to solve a problem.  There are
some things about threads on UNIX that I find confusing.  I am posting my
opinions, assumptions and, yes, QUESTIONS because I need HELP.

I have a Solaris 2.6 machine, so yes, I was confused by your assertion that
my system() should not be doing what it was doing (as shown by the
output of truss).

It is obvious to everyone reading this that I am at a disadvantage to
you in terms of Solaris, UNIX, and multithreaded programming experience.
That is why I am ASKING and you are ANSWERING.  It is also obvious that
you feel it is your right to flame.  This, I think, is not true.  I don't know
why you've reacted this way to someone in obvious need, but I don't believe
it's motivated solely by the altruistic desire to stop the "spread of
disinformation" on this newsgroup of which you accuse me.  My
"disinformation," which I feel is obvious to see, is an imperfect attempt
to clarify and to seek clarification from others.  Below is a
distillation of what you have said that I believe is useful to this newsgroup
and to me.  My suggestion to you is: leave out the other stuff.  If you
behave with kindness and patience, I believe that people will be more
grateful, and ultimately find you more helpful.

        Solaris does not use vfork() anywhere in the threads/pthreads
        library implementation.

        vfork() is the same as fork1() except that the child process
        shares the parent's address space.  The libc version of system()
        makes an attempt to be compatible with libthread by not grabbing
        any locks after the vfork() and before calling exec().

        vfork() creates a child process that shares the parent's
        address space.  The child has only one thread, just like fork1().
        Unlike fork1() however, the parent is blocked from executing until
        the child performs exec() or exit(). If the child attempts to grab
        a mutex that some thread in the parent owns, it will block and
        because the parent is blocked by the semantics of vfork(),
        there is deadlock.

        [ what was happening in my program as evidenced by the output of
          truss was ] the (broken) behavior of system() before Solaris
        2.6. It is marked MT-unsafe in the man page, so you should not
        have used it. Strictly speaking, it is still MT-unsafe, although
        it mostly works in Solaris 2.6 and later.  You must have a Solaris
        2.5.1 or prior system.

        You asserted that vfork() was cloning all of the parent's threads.
        This is untrue.

Thanks for the information.

Neil

Post by Chris Thomps » Fri, 10 Sep 1999 04:00:00




> Not all of us know as much as you, Roger.  

(which is clearly true of nearly all of us, in the current context)

and goes on to suggest that Roger has overreacted a bit here. I agree:
all newsgroups contain postings spreading disinformation, usually (as
I believe here) inadvertently. The best cure is usually to refute the
disinformation by providing valid information.

Nevertheless... Roger asserted

>>   [ what was happening in my program as evidenced by the output of
>>   truss was ] the (broken) behavior of system() before Solaris
>>   2.6. It is marked MT-unsafe in the man page, so you should not
>>   have used it. Strictly speaking, it is still MT-unsafe, although
>>   it mostly works in Solaris 2.6 and later.  You must have a Solaris
>>   2.5.1 or prior system.

to which you say

> Thanks for the information.

But at the very start of this thread you said

> I am developing a program using pthreads on solaris 2.6

so that we had all sort of assumed that your diagnostics were obtained on
one or more 2.6 systems. True or false? ITWSBT.

Chris Thompson