Developing multi-threading applications

Developing multi-threading applications

Post by Roberto Fichera » Fri, 14 Jun 2002 17:20:07



Hi All,

I'm designing a multithreaded application with many threads,
from ~100 up to 300-400. I need to decide which threading
library to use, and which kernel patches I need to improve
scheduler performance. The machines will be SMP Xeon boxes
with 4 or 8 processors and 4 GB of RAM. Almost all threads
are computationally intensive, and the library needs fast
interprocess communication and synchronization, because there
are many synchronous and asynchronous threads that are
time-dependent and/or time-critical. In the future, I plan to
distribute the threads across a pool of SMP boxes.

Thanks in advance.

Roberto Fichera.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

Developing multi-threading applications

Post by David Schwart » Fri, 14 Jun 2002 18:20:07



>I'm designing a multithreding application with many threads,
>from ~100 to 300/400. I need to take some decisions about
>which threading library use, and which patch I need for the
>kernel to improve the scheduler performances. The machines
>will be a SMP Xeon with 4/8 processors with 4Gb RAM.
>All threads are almost computational intensive and the library
>need a fast interprocess comunication and syncronization
>because there are many sync & async threads time
>dependent and/or critical. I'm planning, in the future, to distribuite
>all the threads in a pool of SMP box.

        With 4/8 processors, you don't want to create 100-400 threads doing
computation-intensive tasks. So redesign things so that the number of threads
you create is more in line with the number of CPUs you have available. That
is, use a 'thread per CPU' approach (or slightly more threads than there are
CPUs per node) and you'll perform a lot better. Distribute the available work
over the available threads.
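
A minimal sketch of the 'thread per CPU' idea, assuming Linux/glibc (where
sysconf(_SC_NPROCESSORS_ONLN) is available) and POSIX threads. The helper
name parallel_sum and the array-summing workload are hypothetical stand-ins
for real work; the point is only that the work is split into as many slices
as there are CPUs, instead of one thread per logical task.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* One slice of the input array, summed by one worker thread. */
struct slice {
    const long *data;
    size_t begin, end;   /* half-open range [begin, end) */
    long sum;            /* result written by the worker */
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    s->sum = 0;
    for (size_t i = s->begin; i < s->end; i++)
        s->sum += s->data[i];
    return NULL;
}

/* Sum n values using one thread per online CPU (hypothetical helper).
 * Returns 0 and stores the result in *total, or -1 on failure. */
int parallel_sum(const long *data, size_t n, long *total)
{
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncpu < 1)
        ncpu = 1;
    size_t nthreads = (size_t)ncpu;

    pthread_t *tids = malloc(nthreads * sizeof *tids);
    struct slice *slices = malloc(nthreads * sizeof *slices);
    if (!tids || !slices) {
        free(tids);
        free(slices);
        return -1;
    }

    size_t started = 0;
    int rc = 0;
    for (size_t t = 0; t < nthreads; t++) {
        slices[t].data = data;
        slices[t].begin = n * t / nthreads;
        slices[t].end = n * (t + 1) / nthreads;
        if (pthread_create(&tids[t], NULL, sum_slice, &slices[t]) != 0) {
            rc = -1;     /* stop creating threads, still join the started ones */
            break;
        }
        started++;
    }

    *total = 0;
    for (size_t t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        *total += slices[t].sum;
    }
    free(tids);
    free(slices);
    return rc;
}
```

Calling parallel_sum(values, count, &total) on an 8-way box starts 8 threads
regardless of how many logical tasks the data represents.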

        DS


Developing multi-threading applications

Post by Roberto Fichera » Fri, 14 Jun 2002 18:20:20




> >I'm designing a multithreding application with many threads,
> >from ~100 to 300/400. I need to take some decisions about
> >which threading library use, and which patch I need for the
> >kernel to improve the scheduler performances. The machines
> >will be a SMP Xeon with 4/8 processors with 4Gb RAM.
> >All threads are almost computational intensive and the library
> >need a fast interprocess comunication and syncronization
> >because there are many sync & async threads time
> >dependent and/or critical. I'm planning, in the future, to distribuite
> >all the threads in a pool of SMP box.

>         With 4/8 processors, you don't want to create 100-400 threads doing
>computation intensive tasks. So redesign things so that the number of threads
>you create is more in line with the number of CPUs you have available. That
>is, use a 'thread per CPU' (or slightly more threads than their are CPUs per
>node) approach and you'll perform a lot better. Distribute the available work
>over the available threads.

You are right! But "computationally intensive" is not totally accurate ;-),
because most of the threads are waiting for I/O. Once the I/O completes,
the computationally intensive work runs; when a thread finishes, it sends
its results to its parent thread. The parent collects the results of all
its children, does some computation of its own, and sends its result to its
own parent, and so on, with many parent threads each controlling other
children. So I think the main problem/overhead is thread creation and the
number of threads.

>         DS

Roberto Fichera.


Developing multi-threading applications

Post by Peter W?chtle » Fri, 14 Jun 2002 18:50:06





>> >I'm designing a multithreding application with many threads,
>> >from ~100 to 300/400. I need to take some decisions about
>> >which threading library use, and which patch I need for the
>> >kernel to improve the scheduler performances. The machines
>> >will be a SMP Xeon with 4/8 processors with 4Gb RAM.
>> >All threads are almost computational intensive and the library
>> >need a fast interprocess comunication and syncronization
>> >because there are many sync & async threads time
>> >dependent and/or critical. I'm planning, in the future, to distribuite
>> >all the threads in a pool of SMP box.

>>         With 4/8 processors, you don't want to create 100-400 threads
>> doing computation intensive tasks. So redesign things so that the number
>> of threads you create is more in line with the number of CPUs you have
>> available. That is, use a 'thread per CPU' (or slightly more threads than
>> there are CPUs per node) approach and you'll perform a lot better.
>> Distribute the available work over the available threads.

> You are right! But "computational intensive" is not totally right as I
> say ;-), because most of thread are waiting for I/O, after I/O are
> performed the computational intensive tasks, finished its work all the
> result are sent to thread-father, the father collect all the child's
> result and perform some computational work and send its result to its
> father and so on with many thread-father controlling other child. So I
> think the main problem/overhead is thread creation and the thread's
> numbers.

Have a look at http://www-124.ibm.com/developerworks/opensource/pthreads/

They provide an M:N threading model, in which threads can live in userspace.


Developing multi-threading applications

Post by Roberto Fichera » Fri, 14 Jun 2002 19:00:12



>>You are right! But "computational intensive" is not totally right as I say ;-),
>>because most of thread are waiting for I/O, after I/O are performed the
>>computational intensive tasks, finished its work all the result are sent
>>to thread-father, the father collect all the child's result and perform some
>>computational work and send its result to its father and so on with many
>>thread-father controlling other child. So I think the main problem/overhead
>>is thread creation and the thread's numbers.

>Have a look at http://www-124.ibm.com/developerworks/opensource/pthreads/

>they provide M:N threading model where threads can live in userspace.

Yes! I'm looking into it. But I want to evaluate some others first.

Roberto Fichera.


Developing multi-threading applications

Post by David Schwart » Fri, 14 Jun 2002 19:20:15



>You are right! But "computational intensive" is not totally right as I say ;-),

        It's really not fair to change the premises in the middle of an argument.

>because most of thread are waiting for I/O,

        Still wrong. You don't tie up threads waiting for I/O. You can wait without
having a thread doing the waiting.

>after I/O are performed the
>computational intensive tasks, finished its work all the result are sent
>to thread-father,

        Okay, so you need a new abstraction -- separate the waiting from the
working. Create as many threads to do the work as you have processors to do
the work on. As for the waiting, minimize the number of threads that wait;
they're pure overhead. If it's sockets, use 'poll' so one thread can do lots
of waiting.
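
A rough sketch of one thread doing the waiting for several sockets with
poll(). The peers are simulated with socketpair() so the example needs no
network; NCONN, the "ping" payload, and the function name
wait_on_many_sockets are all assumptions made for illustration. In a real
program the read data would be handed to a worker rather than just counted.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define NCONN 4   /* number of simulated connections (arbitrary) */

/* Wait on NCONN sockets from a single thread.
 * Returns the total number of bytes received, or -1 on error. */
long wait_on_many_sockets(void)
{
    struct pollfd pfds[NCONN];
    long received = 0;
    int open_fds = 0;

    for (int i = 0; i < NCONN; i++)
        pfds[i].fd = -1;

    for (int i = 0; i < NCONN; i++) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
            received = -1;
            break;
        }
        /* Simulated peer: send one 4-byte message, then hang up. */
        ssize_t w = write(sv[1], "ping", 4);
        close(sv[1]);
        pfds[i].fd = sv[0];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
        open_fds++;
        if (w != 4) {
            received = -1;
            break;
        }
    }

    while (received >= 0 && open_fds > 0) {
        if (poll(pfds, NCONN, -1) < 0) {
            if (errno == EINTR)
                continue;
            received = -1;
            break;
        }
        for (int i = 0; i < NCONN; i++) {
            if (pfds[i].fd < 0 || pfds[i].revents == 0)
                continue;
            char buf[64];
            ssize_t n = read(pfds[i].fd, buf, sizeof buf);
            if (n > 0) {
                received += n;     /* a real program would queue work here */
            } else {               /* 0: peer closed; <0: error */
                close(pfds[i].fd);
                pfds[i].fd = -1;   /* poll() skips negative descriptors */
                open_fds--;
            }
        }
    }

    for (int i = 0; i < NCONN; i++)   /* close anything left after an error */
        if (pfds[i].fd >= 0)
            close(pfds[i].fd);
    return received;
}
```

No thread is parked per connection here; the number of waiting threads stays
at one however large NCONN becomes.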

>the father collect all the child's result and perform some
>computational work and send its result to its father and so on with many
>thread-father controlling other child. So I think the main problem/overhead
>is thread creation and the thread's numbers.

        So get rid of the problem! Don't create so many threads. Create only as
many threads as can do useful work, and reuse them rather than destroying and
recreating them. Solve the actual problem/overhead, since it's totally
artificial and due to your model rather than your problem!
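
One way to reuse threads instead of creating and destroying them is a small
fixed-size pool fed by a job queue. The sketch below assumes POSIX threads
and C11 atomics; every name in it (struct pool, pool_init, pool_submit,
pool_shutdown, pool_demo) is hypothetical and not taken from any library.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef void (*job_fn)(void *);
struct job { job_fn fn; void *arg; struct job *next; };

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t has_work;
    struct job *head, *tail;   /* FIFO queue of pending jobs */
    int shutting_down;
    size_t nthreads;
    pthread_t *threads;
};

/* Each worker loops: sleep until a job is queued, run it, repeat. */
static void *worker(void *p)
{
    struct pool *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (!pool->head && !pool->shutting_down)
            pthread_cond_wait(&pool->has_work, &pool->lock);
        struct job *j = pool->head;
        if (!j) {                      /* queue empty and shutdown requested */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        pool->head = j->next;
        if (!pool->head)
            pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);
        j->fn(j->arg);                 /* run the job outside the lock */
        free(j);
    }
}

int pool_init(struct pool *pool, size_t nthreads)
{
    pool->head = pool->tail = NULL;
    pool->shutting_down = 0;
    pool->nthreads = 0;
    pool->threads = malloc(nthreads * sizeof *pool->threads);
    if (!pool->threads)
        return -1;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->has_work, NULL);
    for (size_t i = 0; i < nthreads; i++) {
        if (pthread_create(&pool->threads[i], NULL, worker, pool) != 0)
            break;
        pool->nthreads++;
    }
    if (pool->nthreads == 0) {         /* nothing started: undo the setup */
        pthread_cond_destroy(&pool->has_work);
        pthread_mutex_destroy(&pool->lock);
        free(pool->threads);
        return -1;
    }
    return 0;
}

int pool_submit(struct pool *pool, job_fn fn, void *arg)
{
    struct job *j = malloc(sizeof *j);
    if (!j)
        return -1;
    j->fn = fn; j->arg = arg; j->next = NULL;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail) pool->tail->next = j; else pool->head = j;
    pool->tail = j;
    pthread_cond_signal(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    return 0;
}

/* Let the workers drain the queue, then join them and free the pool. */
void pool_shutdown(struct pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutting_down = 1;
    pthread_cond_broadcast(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    for (size_t i = 0; i < pool->nthreads; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_cond_destroy(&pool->has_work);
    pthread_mutex_destroy(&pool->lock);
    free(pool->threads);
}

static atomic_long jobs_done;
static void count_job(void *arg) { (void)arg; atomic_fetch_add(&jobs_done, 1); }

/* Demo: run njobs trivial jobs on nthreads reusable workers. */
long pool_demo(size_t nthreads, size_t njobs)
{
    struct pool pool;
    atomic_store(&jobs_done, 0);
    if (pool_init(&pool, nthreads) != 0)
        return -1;
    for (size_t i = 0; i < njobs; i++)
        if (pool_submit(&pool, count_job, NULL) != 0)
            break;
    pool_shutdown(&pool);
    return atomic_load(&jobs_done);
}
```

pool_demo(4, 300) runs 300 logical tasks on 4 long-lived threads, so thread
creation happens 4 times rather than 300.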

        DS


Developing multi-threading applications

Post by Peter W?chtle » Fri, 14 Jun 2002 19:20:16




>>> You are right! But "computational intensive" is not totally right as I
>>> say ;-), because most of thread are waiting for I/O, after I/O are
>>> performed the computational intensive tasks, finished its work all the
>>> result are sent to thread-father, the father collect all the child's
>>> result and perform some computational work and send its result to its
>>> father and so on with many thread-father controlling other child. So I
>>> think the main problem/overhead is thread creation and the thread's
>>> numbers.

>> Have a look at http://www-124.ibm.com/developerworks/opensource/pthreads/

>> they provide M:N threading model where threads can live in userspace.

> Yes! I'm looking for it. But I want evaluate some other before.

There is a paper, rse-pmt.ps, included in the tar archives from Ralf Engelschall
(author of GNU Portable Threads).

There you will find lots of interesting pointers to other thread packages.


Developing multi-threading applications

Post by Roberto Fichera » Fri, 14 Jun 2002 19:30:09




> > You are right! But "computational intensive" is not totally right as I
> > say ;-), because most of thread are waiting for I/O, after I/O are
> > performed the computational intensive tasks, finished its work all the
> > result are sent to thread-father, the father collect all the child's
> > result and perform some computational work and send its result to its
> > father and so on with many thread-father controlling other child. So I
> > think the main problem/overhead is thread creation and the thread's
> > numbers.

>So you are creating a simulation/emulation application/engine, right?
>Or a measured data analysis engine? (which is basically the same
>task)

Yes! It's a simulation/emulation application.

>For these kinds of tasks creating your own kind of "threads" is
>probably better.

>Split it in the following data structure:

>struct my_thread {
>    actor_function_t actor;
>    input_t inbuf;
>    output_t outbuf;
>    state_t statebuf;
>};

>And provide rules and primitives for accessing inbuf/outbuf, if
>they might be shared (which is probable).

This can be a solution.
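
The quoted struct leaves actor_function_t, input_t, output_t and state_t
undefined. The sketch below fills them in with made-up definitions purely so
that the idea compiles; none of these type layouts come from the original
post. It shows a logical "thread" as plain data plus a function pointer that
any real worker thread can run.

```c
#include <stddef.h>

/* Hypothetical buffer types; the original post does not define them. */
typedef struct { const double *values; size_t count; } input_t;
typedef struct { double value; int ready; } output_t;
typedef struct { unsigned long runs; } state_t;

struct my_thread;
typedef void (*actor_function_t)(struct my_thread *self);

struct my_thread {
    actor_function_t actor;
    input_t inbuf;
    output_t outbuf;
    state_t statebuf;
};

/* Example actor: its cost depends only on the size of inbuf,
 * not on the values stored in it. */
static void sum_actor(struct my_thread *self)
{
    double acc = 0.0;
    for (size_t i = 0; i < self->inbuf.count; i++)
        acc += self->inbuf.values[i];
    self->outbuf.value = acc;
    self->outbuf.ready = 1;
    self->statebuf.runs++;
}

/* Run one logical "thread" once. A real worker thread would call this
 * for each my_thread that has been scheduled onto it. */
void my_thread_run(struct my_thread *t)
{
    t->outbuf.ready = 0;
    t->actor(t);
}

/* Demo: build one actor over three samples and run it. */
double actor_demo(void)
{
    static const double samples[] = { 1.5, 2.5, 4.0 };
    struct my_thread t = { sum_actor, { samples, 3 }, { 0.0, 0 }, { 0 } };
    my_thread_run(&t);
    return t.outbuf.ready ? t.outbuf.value : -1.0;
}
```

Because a my_thread carries no stack and no kernel state, hundreds of them
cost only memory, and the rules for sharing inbuf/outbuf can be enforced in
my_thread_run rather than in each actor.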

>Now you can build a dependency tree/graph for the whole stuff
>easily and schedule works of the same level to some real worker
>threads (which might be on different machines), which are one per CPU.

>The problem is to build the actor as a REAL primitive, that
>scales only by the size of inbuf and not by the contents of it.

Yes!

>Everything else is going to be bloated and not really scalable,
>but can be implemented by every "Joe Programmer" after finishing
>high school ;-)

That depends on the threading library, and whether it is entirely userspace
or not! With so many threads that aren't entirely in userspace, the
scheduler's performance and characteristics matter a lot. I would prefer a
mixed solution, for example, because some problems can be solved easily
with userspace threads and others cannot.

>Regards

>Ingo Oeser
>--
>Science is what we can tell a computer. Art is everything else. --- D.E.Knuth

Roberto Fichera.


Developing multi-threading applications

Post by Roberto Fichera » Fri, 14 Jun 2002 19:50:08





>>>>You are right! But "computational intensive" is not totally right as I
>>>>say ;-), because most of thread are waiting for I/O, after I/O are
>>>>performed the computational intensive tasks, finished its work all the
>>>>result are sent to thread-father, the father collect all the child's
>>>>result and perform some computational work and send its result to its
>>>>father and so on with many thread-father controlling other child. So I
>>>>think the main problem/overhead is thread creation and the thread's
>>>>numbers.

>>>Have a look at http://www-124.ibm.com/developerworks/opensource/pthreads/

>>>they provide M:N threading model where threads can live in userspace.

>>Yes! I'm looking for it. But I want evaluate some other before.

And I don't want to use a library that's entirely in userspace.

>There is a paper rse-pmt.ps included in the tar archives from Ralf Engelschall
>(author of GNU portable threads).

>There you will find lots of interesting pointers to other thread packages.

I'll take a look. Thanks!


Roberto Fichera.


Developing multi-threading applications

Post by Roberto Fichera » Fri, 14 Jun 2002 20:30:07




> >You are right! But "computational intensive" is not totally right as I say ;-),

>         It's really not fair to change the premises in the middle of an
> argument.

Sorry ;-)!

> >because most of thread are waiting for I/O,

>         Still wrong. You don't tie up threads waiting for I/O. You can
> wait without having a thread doing the waiting.

> >after I/O are performed the
> >computational intensive tasks, finished its work all the result are sent
> >to thread-father,

>         Okay, so you need a new abstraction -- separate the waiting from the
>working. Create as many threads to do the work as you have processors to do
>the work on. As for the waiting, minimize threads waiting, they're pure
>overhead. If it's sockets, use 'poll' so one thread can do lots of waiting.

That's a possible solution.

> >the father collect all the child's result and perform some
> >computational work and send its result to its father and so on with many
> >thread-father controlling other child. So I think the main problem/overhead
> >is thread creation and the thread's numbers.

>         So get rid of the problem! Don't create so many threads, create
> only as many threads as can do useful work and reuse them rather than
> destroying and recreating them. Solve the actual problem/overhead since
> it's totally artificial and due to your model rather than your problem!

That depends on the application. In my simulation/emulation program I need
to create many threads, because each thread solves, manages, or computes a
specific problem, and its lifetime depends on several factors. Each thread
is created only when needed, to avoid overhead. The simulation/emulation is
a "merge" of a great many objects, each of which works on a specific
problem. The low-level objects are grouped to solve a specific problem and
are managed by a thread controller that makes decisions or does some work of
its own. Thread controllers are in turn grouped and managed by another
thread controller, and so on. Don't assume that I always need 400 active
threads; they are created only when a controller needs them. Think of this
simulation/emulation as a collection of a great many objects that must
interoperate; the model is designed to scale easily to a distributed
environment.

>         DS

Roberto Fichera.


Developing multi-threading applications

Post by David Schwart » Fri, 14 Jun 2002 21:00:12


>Depending by the applications. With my simulation/emulation program I need
>to create many thread because each thread resolve/manage/compute a specific
>problem and its life depend by some factors. Each thread is create only if
>needed to avoid the overhead. The simulation/emulation is a "merge" of many
>and many object, each object work to resolve/manage/compute a specific
>problem. All the low objects are grouped to resolve a specific problem and
>are managed by a thread controller that should take some decision or doing
>some work. Some thread controller are grouped and managed by another thread
>controller and so on. Do not think that I need always 400 threads active
>they are create only if need by the controller. You must thinks this
>simulation/emulation as collection of many and many object that should
>interoperate, and the model is designed to scale easily on a distributed
>environment.

        If it's a simulation, you don't *really* need the threads, you just need to
be able to act as if you had them. After all, what are you simulating if what
work gets done when is up to the random vagaries of the OS scheduler?
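
To make "act as if you had them" concrete, here is a small sketch in which
each logical task is a resumable record and a single loop steps all of them
in a fixed order. The struct task layout and the function names are
assumptions for illustration only. Because the order is fixed, the result of
the simulation does not depend on the OS scheduler.

```c
#include <stddef.h>

/* A logical task: resumable state instead of a dedicated thread. */
struct task {
    int remaining;    /* steps left before the task completes */
    int finished_at;  /* simulation tick at which it finished, -1 if running */
};

/* Advance one task by one step; returns 1 once the task has completed. */
static int task_step(struct task *t)
{
    if (t->remaining > 0)
        t->remaining--;
    return t->remaining == 0;
}

/* Step every unfinished task once per tick, in index order, until all
 * tasks have finished. Returns the number of ticks simulated. */
int simulate(struct task *tasks, size_t n)
{
    int tick = 0;
    size_t live = n;

    for (size_t i = 0; i < n; i++)
        tasks[i].finished_at = -1;

    while (live > 0) {
        tick++;
        for (size_t i = 0; i < n; i++) {
            if (tasks[i].finished_at >= 0)
                continue;                 /* already done */
            if (task_step(&tasks[i])) {
                tasks[i].finished_at = tick;
                live--;
            }
        }
    }
    return tick;
}
```

Running the same task set twice gives the same finished_at values, which is
not something 300 preemptively scheduled threads would guarantee.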

        If it's a real application wanting real performance, the suggestions I made
stand -- you don't want many more threads working than you have CPUs and you
don't want a lot of threads sitting around waiting for work (and thus forcing
bazillions of extra context switches).

        It sounds to me like your design is broken, needlessly mapping threads to
I/Os that are being waited for one-to-one. This is a common error among
programmers who consciously or subconsciously have accepted the 'more threads
can do more work' philosophy.

        What you need to do is take whatever it is you're thinking of as a 'thread'
right now, which I'd roughly define as 'one logical task, from start to
completion' and realize that there is absolutely no reason to map this
one-to-one to actual pthreads threads and every reason in the world not to.

        This will conserve resources (12 thread stacks instead of 300, 12 KSEs
instead of 300), reduce context switches (context switches will only occur
when there's no work to do at all or a thread uses up its entire timeslice
rather than every time we change which client/task we're doing work for/on),
improve scheduler efficiency (because the number of ready threads will not
exceed the number of CPUs by much) and more often than not, clean up a lot of
ugliness in your architecture (because threads are probably being used
instead of a sane abstraction for 'work to be done' or 'a client I'm doing
work for').

        DS


Developing multi-threading applications

Post by Roberto Fichera » Sat, 15 Jun 2002 01:30:13



>         If it's a simulation, you don't *really* need the threads, you
> just need to be able to act as if you had them. After all, what are you
> simulating if what work gets done when is up to the random vagaries of the
> OS scheduler?

>         If it's a real application wanting real performance, the
> suggestions I made stand -- you don't want many more threads working than
> you have CPUs and you don't want a lot of threads sitting around waiting
> for work (and thus forcing bazillions of extra context switches).

This is a scheduler problem! All threads waiting for I/O are blocked by
the scheduler, and this doesn't have any impact on context switches; it
only lengthens the wait queue. Using Ingo's O(1) scheduler, a big piece
of code, should make a big difference, for example.

>         It sounds to me like your design is broken, needlessly mapping
> threads to I/Os that are being waited for one-to-one. This is a common
> error among programmers who consciously or subconsciously have accepted
> the 'more threads can do more work' philosophy.

I don't think "more threads == more work done"! With the threaded approach
it's possible to split a big sequential program into a variety of concurrent
logical programs, which is a big win for code revisions and new
implementations.

>         What you need to do is take whatever it is you're thinking of as
> a 'thread' right now, which I'd roughly define as 'one logical task, from
> start to completion' and realize that there is absolutely no reason to map
> this one-to-one to actual pthreads threads and every reason in the world
> not to.

>         This will conserve resources (12 thread stacks instead of 300, 12
> KSEs instead of 300), reduce context switches (context switches will only
> occur when there's no work to do at all or a thread uses up its entire
> timeslice rather than every time we change which client/task we're doing
> work for/on), improve scheduler efficiency (because the number of ready
> threads will not exceed the number of CPUs by much) and more often than
> not, clean up a lot of ugliness in your architecture (because threads are
> probably being used instead of a sane abstraction for 'work to be done' or
> 'a client I'm doing work for').

You are right! But it depends on the application! If you have to do I/O
such as signal acquisition, sensor acquisition and so on, you must have one
thread for each type of data acquisition. You must also have a thread that
performs some computation on, for example, a subset of this data and
generates output that could be new input for another thread. This makes the
environment more realistic. I agree with you that if we increase the number
of threads the system could collapse (context switches become expensive, so
we must add CPUs, add a new box, or take a new approach).

Roberto Fichera.


Developing multi-threading applications

Post by David Schwart » Sun, 16 Jun 2002 06:00:06




>This is a scheduler problem! All threads waiting for I/O are blocked by
>the scheduler, and this doesn't have any impact for the context switches
>it increase only the waitqueue, using the Ingo's O(1) scheduler, a big piece
>of code, it should make a big difference for example.

        You are incorrect. If you have ten threads each waiting for an I/O and all
ten I/Os are ready, then ten context switches are needed. If you have one
thread waiting for ten I/Os, and ten I/Os come ready, one context switch is
needed.
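
The "one thread waiting for ten I/Os" case can be sketched with pipes: make
ten descriptors readable, then ask poll() once. This does not measure
context switches; it only illustrates that a single wakeup can report every
ready descriptor at the same time. NIO and the function name are made up for
the example.

```c
#include <poll.h>
#include <unistd.h>

#define NIO 10   /* the "ten I/Os" from the text above */

/* Make NIO pipes readable, then count how many descriptors a single
 * poll() call reports as ready. Returns -1 on error. */
int ready_after_one_wakeup(void)
{
    int fds[NIO][2];
    struct pollfd pfds[NIO];
    int made = 0;
    int ready = -1;

    for (made = 0; made < NIO; made++) {
        if (pipe(fds[made]) < 0)
            break;
        pfds[made].fd = fds[made][0];   /* watch the read end */
        pfds[made].events = POLLIN;
        pfds[made].revents = 0;
    }

    if (made == NIO) {
        int ok = 1;
        for (int i = 0; i < NIO; i++)
            if (write(fds[i][1], "x", 1) != 1)
                ok = 0;
        if (ok)
            ready = poll(pfds, NIO, 0);  /* one call covers all ten */
    }

    for (int i = 0; i < made; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    return ready;
}
```

With ten blocked threads instead, each of the ten would have to be woken and
scheduled separately before its data could be read.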

[snip]

>I don't think "more threads == more work done"! With the thread's approach
>it's possible to split a big sequential program in a variety of concurrent
>logical programs with a big win for code revisions and new implementation.

        I'm not advising eliminating the threads approach. I'm only advising not
using threads as your abstraction for clients or work to be done. Use threads
as the execution vehicles that pick up work when there's work to be done.
(Think thread pools, think separating I/O from computation.)

[snip]

>You are right! But depend by the application! If you have to do I/O like
>signal acquisition, sensors acquisitions and so on, you must have a one
>thread for each type of data acquisition,

        Even if that's true, and it's often not, how many different types of data
acquisition can you have? Ten? Twenty? That's a far cry from 300.

        DS


Developing multi-threading applications

Post by Roberto Fichera » Sun, 16 Jun 2002 18:10:05





> >This is a scheduler problem! All threads waiting for I/O are blocked by
> >the scheduler, and this doesn't have any impact for the context switches
> >it increase only the waitqueue, using the Ingo's O(1) scheduler, a big piece
> >of code, it should make a big difference for example.

>         You are incorrect. If you have ten threads each waiting for an
> I/O and all ten I/Os are ready, then ten context switches are needed. If
> you have one thread waiting for ten I/Os, and ten I/Os come ready, one
> context switch is needed.

You are right in this specific case, but it always depends on what kind of
I/O you have to do. Not every case can be reduced to your logic, only
specific ones. It's only a "local" optimization.

>[snip]

> >I don't think "more threads == more work done"! With the thread's approach
> >it's possible to split a big sequential program in a variety of concurrent
> >logical programs with a big win for code revisions and new implementation.

>         I'm not advising eliminating the threads approach. I'm only
> advising not using threads as your abstraction for clients or work to be
> done. Use threads as the execution vehicles that pick up work when there's
> work to be done. (Think thread pools, think separating I/O from
> computation.)

Yes! This is what I want!

>[snip]

> >You are right! But depend by the application! If you have to do I/O like
> >signal acquisition, sensors acquisitions and so on, you must have a one
> >thread for each type of data acquisition,

>         Even if that's true, and it's often not, how many different types
> of data acquisition can you have? Ten? Twenty? That's a far cry from 300.

Currently there are 190! About 110 are always active! So by separating I/O
from computation we would double the number of threads.

Roberto Fichera.


Developing multi-threading applications

Post by Ingo Oeser » Sun, 16 Jun 2002 20:10:04



> >         Even if that's true, and it's often not, how many different types
> > of data acquisition can you have? Ten? Twenty? That's a far cry from 300.

> Currently are 190! Always active are ~110! So thinking by separating I/O from
> the computation we double the threads.

So basically you are just traversing your data dependency graph
the wrong way. Do a level-order traversal if it is a dependency
forest, or a breadth-first traversal if not.

If the node requires I/O -> schedule the I/O and return to the upper
level, telling it that you'd like to be woken when the I/O is
finished.

If the node requires computation -> do it if this CPU is the one with
the lowest load; otherwise schedule it on the CPU with the lowest load.

Continue with the next node.

("Load" means "the number of computations, measured with the same
metric, scheduled on this thread".)

Use only one thread per CPU. Try to make the I/O waiting as
centralized as possible (poll would be perfect).

So this is all doable, once you analyze your data dependency
graph properly and make the simulation data driven (which it
usually is).
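
A rough sketch of the traversal described above, under several assumptions:
the graph is small and acyclic, every node is a computation (I/O nodes are
left out), "load" is a per-node cost summed per worker, and nothing is
actually executed; the function only decides which worker each node would
run on. All names, MAX_NODES and NWORKERS are hypothetical.

```c
#include <stddef.h>

#define MAX_NODES 16
#define NWORKERS  2    /* stands in for "one thread per CPU" */

struct node {
    int cost;            /* hypothetical load metric of this computation */
    int ndeps;           /* inputs not yet produced (consumed by the walk) */
    int nout;            /* number of entries used in out[] */
    int out[MAX_NODES];  /* indices of nodes that consume this node's output */
    int level;           /* filled in: dependency level, 0 = no inputs */
    int worker;          /* filled in: worker chosen for this node */
};

static int least_loaded(const int *load)
{
    int best = 0;
    for (int w = 1; w < NWORKERS; w++)
        if (load[w] < load[best])
            best = w;
    return best;
}

/* Visit nodes level by level (a FIFO queue of nodes whose inputs are all
 * done) and give each one to the least-loaded worker. Returns the number
 * of levels, or -1 if the graph is too large or contains a cycle. */
int schedule_by_level(struct node *g, int n, int *load)
{
    int queue[MAX_NODES];
    int head = 0, tail = 0, levels = 0;

    if (n < 0 || n > MAX_NODES)
        return -1;
    for (int w = 0; w < NWORKERS; w++)
        load[w] = 0;
    for (int i = 0; i < n; i++)
        if (g[i].ndeps == 0) {
            g[i].level = 0;
            queue[tail++] = i;
        }

    while (head < tail) {
        int i = queue[head++];
        int w = least_loaded(load);
        g[i].worker = w;
        load[w] += g[i].cost;
        if (g[i].level + 1 > levels)
            levels = g[i].level + 1;
        for (int k = 0; k < g[i].nout; k++) {
            int j = g[i].out[k];
            if (--g[j].ndeps == 0) {     /* last input ready: next level */
                g[j].level = g[i].level + 1;
                queue[tail++] = j;
            }
        }
    }
    return tail == n ? levels : -1;
}

/* Demo: four leaves feed two controllers, which feed one root. */
int schedule_demo(int load[NWORKERS])
{
    struct node g[7] = {
        { .cost = 1, .nout = 1, .out = {4} },
        { .cost = 1, .nout = 1, .out = {4} },
        { .cost = 1, .nout = 1, .out = {5} },
        { .cost = 1, .nout = 1, .out = {5} },
        { .cost = 1, .ndeps = 2, .nout = 1, .out = {6} },
        { .cost = 1, .ndeps = 2, .nout = 1, .out = {6} },
        { .cost = 1, .ndeps = 2 },
    };
    return schedule_by_level(g, 7, load);
}
```

The demo graph mirrors the leaf/controller hierarchy discussed earlier in the
thread: seven logical tasks end up on two workers, with nodes of the same
level handled before any node of the next level.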

Regards

Ingo Oeser
--
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth
