threads and processors

Post by Javier Trave » Sat, 11 Jan 2003 03:06:01



Hi everybody,

Just three simple questions.

1. Threads or MPI, or others?

I am completely new to threads, so I think my question is very simple
(even silly or stupid for many of you). I am still browsing books about
the subject just to discover if threads would be the appropriate choice
for me. My interest in using threads is efficiency, i.e., using several
processors to achieve speed-up. I do not know whether threads are
better/worse than other choices such as MPI or VPM, etc. How should one
know which option is best? Does this depend on the particular problem
one wants to parallelize? Or on the available hardware?

2. How to assign threads to processors  and control the number of
processors that will be available

Now, assuming threads, my second question is how can one control how
many processors to use and how to associate threads to processors. It
seems a very basic thing, but I do not find this kind of information in
the books I have. All I have come across is an example program using

sysconf(_SC_NPROCESSORS_ONLN)

to find out how many processors are available in the computer, and
create as many threads as processors. After this, there is nothing in
this sample program related to the assignment of threads to processors.
It seems that, by default, one thread is assigned to each of the
available processors. Am I right?
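A minimal sketch of the kind of sample program described above, assuming a system whose sysconf() supports _SC_NPROCESSORS_ONLN (Linux and Solaris, among others, do). The names start_one_thread_per_cpu() and worker() are hypothetical. Note that the code only decides how many threads to create; nothing in it assigns a thread to a particular processor. That placement is left entirely to the system scheduler.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Per-thread argument: which worker this is. */
typedef struct {
    long id;
} worker_arg_t;

static void *worker(void *p)
{
    worker_arg_t *arg = p;
    printf("worker %ld running\n", arg->id);
    return NULL;
}

/* Query the number of online processors and start one thread per processor.
 * Returns the number of threads created and joined, or -1 on error.
 * Which CPU each thread runs on is decided by the scheduler, not here. */
long start_one_thread_per_cpu(void)
{
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncpu < 1)
        ncpu = 1;   /* query failed or unsupported: fall back to one thread */

    pthread_t *tids = malloc((size_t)ncpu * sizeof *tids);
    worker_arg_t *args = malloc((size_t)ncpu * sizeof *args);
    if (tids == NULL || args == NULL) {
        free(tids);
        free(args);
        return -1;
    }

    long created = 0;
    for (long i = 0; i < ncpu; i++) {
        args[i].id = i;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            break;
        created++;
    }
    for (long i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    free(tids);
    free(args);
    return created;
}
```

A main() would simply call start_one_thread_per_cpu() and could compare the result with the online processor count.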

Let N be the number of processors in my system. I'd like to observe the
performance evolution when the program is run with 1,2,3,...,N
processors. Should this be controlled within the program using threads?

3. Gentle start with threads
A final third question: is there any URL with (very) easy introductions
to threads (and possibly their use in parallel programs)? I need
documents of the kind "threads made easy", you know ;-), or simple
programs easy to understand, just to begin with and gain some
confidence.

Thank-you very much in advance for your help,

Javier

 
 
 

Post by David Butenho » Sat, 11 Jan 2003 22:38:02


Javier Traver wrote:
> Just three simple questions.

> 1. Threads or MPI, or others?

> I am completely new to threads, so I think my question is very simple
> (even silly or stupid for many of you). I am still browsing books about
> the subject just to discover if threads would be the appropriate choice
> for me. My interest in using threads is efficiency, i.e., using several
> processors to achieve speed-up. I do not know whether threads are
> better/worse than other choices such as MPI or VPM, etc. How should one
> know which option is best? Does this depend on the particular problem
> one wants to parallelize? Or on the available hardware?

Yes to both.

First, what do you want to "speed up"? The most common definition is that
you want to use more than 1 unit of CPU per unit time -- parallel
processing. If that's what you mean, you'll get no benefit from threads on
a uniprocessor machine because there IS only one unit of CPU per unit
time. On the other hand, with MPI you can exploit two uniprocessors, or run
two processes on a dual processor system. MPI, therefore, provides
flexibility. (Though MPI still won't help if you have only ONE
uniprocessor.)

If your application involves frequent fine-grain communication between
computational units, you'll get far more speed-up in a well designed
threaded application than in a well designed MPI application. (Though one
can also argue that this is meaningless because a well designed MPI
application doesn't communicate that way!)

Whereas most people won't get near a single computer with more than 8 or so
CPUs, lots of people can find networks with tens of computers they can
access -- so in that sense MPI may be more widely applicable at high
scaling factors.

If carefully crafted, a threaded application's threads can efficiently
communicate with high bandwidth and frequency -- but you need to worry
about locking and cache line thrashing. On the other hand, you can share
data using ordinary language and OS concepts like pointers, heap, and even
static or extern data, which gives you a lot of power -- and
responsibility.
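A minimal sketch of sharing data through ordinary pointers and heap memory, assuming POSIX threads; parallel_sum(), sum_slice() and the two struct types are hypothetical names. Each thread reads its own slice of a shared heap array, accumulates into a private local variable, and takes the mutex only once to add its partial result. That way the threads neither contend for the lock on every element nor keep writing to the same shared cache line inside the loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

/* State shared by all threads, reached through an ordinary pointer. */
typedef struct {
    const long *data;        /* shared heap array, only read by workers */
    long total;              /* shared result, protected by 'lock' */
    pthread_mutex_t lock;
} shared_sum_t;

/* Per-thread work description: one slice [begin, end) of the array. */
typedef struct {
    shared_sum_t *shared;
    size_t begin, end;
} slice_t;

static void *sum_slice(void *p)
{
    slice_t *s = p;
    long local = 0;

    /* Accumulate privately: no lock and no writes to shared memory here. */
    for (size_t i = s->begin; i < s->end; i++)
        local += s->shared->data[i];

    /* One short critical section per thread, not one per element. */
    pthread_mutex_lock(&s->shared->lock);
    s->shared->total += local;
    pthread_mutex_unlock(&s->shared->lock);
    return NULL;
}

/* Sum data[0..len) using up to nthreads threads. */
long parallel_sum(const long *data, size_t len, int nthreads)
{
    if (nthreads < 1)
        nthreads = 1;

    shared_sum_t shared = { .data = data, .total = 0 };
    pthread_mutex_init(&shared.lock, NULL);

    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    slice_t *slices = malloc((size_t)nthreads * sizeof *slices);

    if (tids != NULL && slices != NULL) {
        size_t n = (size_t)nthreads;
        for (size_t t = 0; t < n; t++) {
            slices[t].shared = &shared;
            slices[t].begin = len * t / n;
            slices[t].end = len * (t + 1) / n;
        }
        size_t created = 0;
        while (created < n &&
               pthread_create(&tids[created], NULL, sum_slice, &slices[created]) == 0)
            created++;
        /* Any slice that did not get its own thread is summed right here. */
        for (size_t t = created; t < n; t++)
            sum_slice(&slices[t]);
        for (size_t t = 0; t < created; t++)
            pthread_join(tids[t], NULL);
    } else {
        /* Out of memory: fall back to a single serial pass. */
        slice_t whole = { &shared, 0, len };
        sum_slice(&whole);
    }

    free(tids);
    free(slices);
    pthread_mutex_destroy(&shared.lock);
    return shared.total;
}
```

The same result comes back for any thread count; only the elapsed time should differ.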

There's really no one right answer. Even for a given well-defined problem
you can usually structure the solution either way.

Sometimes the best answer, in fact, is: "both". That is, you may want to
distribute coarse-grain large jobs through MPI across a network, while each
individual node that happens to be a multiprocessor may exploit local
threading to parallelize its own piece of the job. (And sometimes the best
answer is: "neither". Some jobs are so communication-bound that they're
inherently serialized and the overhead of either threads or MPI is a total
waste of resources. For a degenerate example, the classic "Hello world"
program has been written many times as both a threaded and an MPI
application, but no matter how you structure it all "thread-ness" or
"MPI-ness" accomplishes nothing towards the goal of generating the console
message "Hello world".)

> 2. How to assign threads to processors  and control the number of
> processors that will be available

> Now, assuming threads, my second question is how can one control how
> many processors to use and how to associate threads to processors. It
> seems a very basic thing, but I do not find this kind of information in
> the books I have. All I have come across is an example program using

> sysconf(_SC_NPROCESSORS_ONLN)

> to find out how many processors are available in the computer, and
> create as many threads as processors. After this, there is nothing in
> this sample program related to the assignment of threads to processors.
> It seems that, by default, one thread is assigned to each of the
> available processors. Am I right?

> Let N be the number of processors in my system. I'd like to observe the
> performance evolution when the program is run with 1,2,3,...,N
> processors. Should this be controlled within the program using threads?

You should OBSERVE, but most of the time you shouldn't MEDDLE. We dragged
our feet for years on adding an API to bind threads to processors. Not
because it's never useful, so much, but rather because it's one of those
"silly knobs" that seems to attract "twiddlers" with no idea what they're
really doing or why. When we finally did add it, under pressure, we gave it
a name that helps to explain the basic problem. A common name for such a
function includes the phrase "bind to CPU". But that's misleading, and
experience shows that it's widely misinterpreted. Instead, we used the
phrase "use only CPU". That is, the "victim" thread, no matter how busy
"CPU n" may be, and no matter how many other CPUs may be underutilized or
even idle, CANNOT use any CPU but "n".
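One way to OBSERVE scaling without binding anything, sketched under the assumption that the machine is otherwise fairly idle: split a fixed amount of CPU-bound work across 1, 2, ..., N threads and time each run, leaving placement to the scheduler. The spin() workload, time_run(), scaling_report() and the 64-thread cap are hypothetical choices for illustration. Varying the thread count only approximates "using t processors", since daemons and other users compete for the same CPUs.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define MAX_THREADS 64   /* hypothetical cap for this sketch */

/* Hypothetical CPU-bound workload: 'iters' rounds of integer mixing.
 * 'volatile' keeps the compiler from deleting the loop. */
static void *spin(void *p)
{
    unsigned long iters = *(const unsigned long *)p;
    volatile unsigned long x = 1;
    for (unsigned long i = 0; i < iters; i++)
        x = x * 1103515245UL + 12345UL;
    return NULL;
}

/* Split total_iters of work evenly across nthreads unbound threads and
 * return the elapsed wall-clock time in seconds, or -1.0 on error. */
double time_run(int nthreads, unsigned long total_iters)
{
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1.0;

    pthread_t tids[MAX_THREADS];
    unsigned long share = total_iters / (unsigned long)nthreads;
    struct timespec t0, t1;
    int created = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; created < nthreads; created++)
        if (pthread_create(&tids[created], NULL, spin, &share) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (created != nthreads)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Print timings and speed-up for 1..N threads, N = online CPUs (capped). */
void scaling_report(unsigned long total_iters)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 1)
        n = 1;
    if (n > MAX_THREADS)
        n = MAX_THREADS;

    double base = time_run(1, total_iters);
    for (int t = 1; t <= (int)n; t++) {
        double secs = time_run(t, total_iters);
        if (secs > 0.0 && base > 0.0)
            printf("%2d thread(s): %.3f s, speed-up %.2f\n", t, secs, base / secs);
        else
            printf("%2d thread(s): no usable timing\n", t);
    }
}
```

Calling scaling_report() with a large iteration count from main() prints one line per thread count; repeating the runs a few times helps separate real scaling from noise caused by other activity on the system.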

There are cases, in monolithic embedded systems, where the application
really knows what all processors are doing at all times. That's rare in any
modern OS because there are daemons of all sorts, other users (even if your
application is the "major user" at any time), plus random interrupts,
background kernel maintenance (swapping, self-testing, polling, etc.).

If you can CONSISTENTLY get SUBSTANTIALLY better performance by binding
threads, then there's a serious problem with the scheduling of threads on
that system, and the problem extends way beyond your application. Complain,
and try to provoke a fix. The system's job is to deploy its resources
effectively. If it's not doing that job, you're doing nobody a favor by
hiding the evidence under the virtual rug.

Furthermore, modern multiprocessors are a lot more complicated. You're
likely to run into hotswap issues, where processors can be dynamically
added and removed during execution. If you're bound to a processor that
goes away, you're dead. If a new processor comes online, and all your
threads are bound to busy processors, you won't be able to exploit the new
one. Furthermore, many high performance multiprocessors are NUMA: "Non
Uniform Memory Architecture". Counts, or even "CPU ID" lists, just aren't
enough to efficiently exploit these systems -- you need to have the actual
hardware topology map. And that map may be dynamic.

So how do "we" (as standards definers and system implementers) describe to
"you" (as application designers and implementers) how to do all this? The
answer is that a bunch of us in POSIX spent a whole lot of time, in lift
lines at Snowbird and Alta in Utah, on rides at Disneyland, wandering the
streets of Amsterdam looking for cool restaurants, over pizza in Chicago,
and other stressfully serious business locations, discussing these
difficult issues... and short of developing a pretty good sized and
complicated standard just for that purpose, we couldn't figure out a
solution. There really wasn't enough support to even consider something
that complicated, and we decided something along the lines of "how many
processors are there?" wasn't even worth specifying.

Every system provides some way to query the topology, in a manner and form
deemed useful by the designers. Every system provides some way to control
the deployment of processes and threads across the available processors.
None of this is remotely portable, except by coincidence. Given the wide
variety of architectural constraints involved, that's probably the way it
should be.

> 3. Gentle start with threads
> A final third question: is there any URL with (very) easy introductions
> to threads (and possibly their use in parallel programs)? I need
> documents of the kind "threads made easy", you know ;-), or simple
> programs easy to understand, just to begin with and gain some
> confidence.

There are lots of online examples, ranging from trivially simple (for
example, my own book's "silly but obligatory 'hello world' example",
http://homepage.mac.com/dbutenhof/Threads/code/hello.c) to horrendously
complicated and convoluted. (I'll let you do your own searches for that end
of the spectrum.) In between, you'll find tons, some really good, and some
really bad. A good place to start would be my own book, Programming with
POSIX Threads (Addison-Wesley), with the source examples available from
http://homepage.mac.com/dbutenhof/Threads/code/; or Bil Lewis'
"Multithreaded Programming with Pthreads" (SunSoft Press), with examples
downloadable at http://www.LambdaCS.com/books/books.html. (You can read
mine online as well as downloading the full tar file, whereas Bil has only
posted the tar file.) Of course, I'd recommend reading my book, or Bil's,
rather than trying to learn just by reading the examples. The page on Bil's
site also links to a list of thread books he's compiled, if you want a
different perspective.
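For a first contact with the API, a minimal "hello world" sketch with POSIX threads (this is not the hello.c from the book's examples; run_hello() and hello_world() are hypothetical names). It creates one thread, passes it an argument, and waits for it with pthread_join(). As noted earlier in this thread, threading gains nothing for a job like this; the point is only the create/join pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>

/* Thread start routine: prints the message it was given. */
static void *hello_world(void *arg)
{
    printf("%s\n", (const char *)arg);
    return NULL;
}

/* Create one thread, wait for it to finish, and return 0 on success
 * (otherwise the error number from pthread_create or pthread_join). */
int run_hello(void)
{
    pthread_t thread;
    int status = pthread_create(&thread, NULL, hello_world, "Hello world");
    if (status != 0) {
        fprintf(stderr, "pthread_create failed: %d\n", status);
        return status;
    }
    return pthread_join(thread, NULL);
}
```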

Ask questions. You've already found a good place for that.

--
/--------------------[ David.Buten...@hp.com ]--------------------\
| Hewlett-Packard Company       Tru64 UNIX & VMS Thread Architect |
|     My book: http://www.awl.com/cseng/titles/0-201-63392-2/     |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

 
 
 

Post by Volodymyr Tarasen » Sat, 11 Jan 2003 23:27:09


> 1. Threads or MPI, or others?

> I am completely new to threads, so I think my question is very simple
> (even silly or stupid for many of you). I am still browsing books about
> the subject just to discover if threads would be the appropriate choice
> for me. My interest in using threads is efficiency, i.e., using several
> processors to achieve speed-up. I do not know whether threads are
> better/worse than other choices such as MPI or VPM, etc.

                                                 ^^^ oops, do you mean
PVM (Parallel Virtual Machine)? ;)
> How should one
> know which option is best? Does this depend on the particular problem
> one wants to parallelize? Or on the available hardware?

Sometimes threads are the perfect solution, sometimes the worst. It
depends on the architecture used (SMP, MPP, NUMA) and, of course, on
your problem.
But I think it is incorrect to compare those interfaces. MPI is a
message-oriented interface and can be used on clusters, MPP systems
and so on, whereas threads are designed only for SMP systems. MPI
operates at the process level and can also be used on SMP systems (I
mean the hardware); in that case, the processes interact via shared
memory. By contrast, threads work in the same address space: it is
one process.
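A small sketch of the address-space difference described above, assuming a POSIX system with fork(). A separate process created with fork() stands in here for an MPI-style process; no MPI is involved, and value_after_thread(), value_after_fork() and bump() are hypothetical names. A thread's write to a global variable is seen by the thread that created it, while a child process writes only to its own copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;   /* one global variable */

static void *bump(void *arg)
{
    (void)arg;
    counter = 42;          /* same address space: the creator sees this */
    return NULL;
}

/* Value of 'counter' seen by the caller after a thread sets it (-1 on error). */
int value_after_thread(void)
{
    pthread_t t;
    counter = 0;
    if (pthread_create(&t, NULL, bump, NULL) != 0)
        return -1;
    pthread_join(t, NULL); /* joining also makes the thread's write visible */
    return counter;
}

/* Value of 'counter' seen by the parent after a child process sets it (-1 on error). */
int value_after_fork(void)
{
    counter = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter = 42;      /* changes only the child's private copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;
}
```

The thread version returns 42 and the fork version returns 0: separate processes must exchange data explicitly (messages, or a shared memory segment set up on purpose), which is the model MPI is built around.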

> 2. How to assign threads to processors  and control the number of
> processors that will be available

> Now, assuming threads, my second question is how can one control how
> many processors to use and how to associate threads to processors. It
> seems a very basic thing, but I do not find this kind of information in
> the books I have. All I have come across is an example program using

> sysconf(_SC_NPROCESSORS_ONLN)

> to find out how many processors are available in the computer, and
> create as many threads as processors. After this, there is nothing in
> this sample program related to the assignment of threads to processors.
> It seems that, by default, one thread is assigned to each of the
> available processors. Am I right?

On the whole, yes, but you should understand that a thread runs
within a process. If you want MxN model you should use
pthread_setconcurrency().

> Let N be the number of processors in my system. I'd like to observe the
> performance evolution when the program is run with 1,2,3,...,N
> processors. Should this be controlled within the program using threads?

Actually, I think it should not.

Of course, I could be wrong;)

Best regards,
Volodymyr!

 
 
 

Post by Alexander Terekho » Sat, 11 Jan 2003 23:47:24


[...]

> If you want MxN model you should use pthread_setconcurrency().

Why?

regards,
alexander.

 
 
 

Post by Patrick TJ McPh » Sun, 12 Jan 2003 06:07:40



% Hi everybody,
%
% Just three simple questions.
%
% 1. Threads or MPI, or others?

A simple question without a simple answer. It depends on what you're
trying to do.

% 2. How to assign threads to processors  and control the number of
% processors that will be available

There's no portable way to do this using posix threads (which I'll
assume since you mention sysconf()). Some systems provide ways of
binding a particular thread to a particular processor, but you're
generally better off trusting the scheduler to use the available
resources as effectively as possible. Try looking for `affinity' in
your system documentation.
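On Linux with glibc, the "affinity" interface is pthread_setaffinity_np() together with the cpu_set_t macros; the _np suffix marks it as non-portable, in line with the advice above. The sketch below, using the hypothetical name pin_self_to_first_allowed_cpu(), restricts the calling thread to the first CPU in its current allowed set. After the call the thread may run only on that CPU, however busy it becomes, which is exactly the "use only CPU" behaviour Butenhof warns about earlier in the thread.

```c
#define _GNU_SOURCE            /* Linux/glibc-specific interfaces below */
#include <pthread.h>
#include <sched.h>

/* Restrict the calling thread to one CPU taken from its current allowed
 * set. Returns that CPU number, or -1 on failure. */
int pin_self_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, only;
    pthread_t self = pthread_self();

    if (pthread_getaffinity_np(self, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&only);
            CPU_SET(cpu, &only);
            /* From here on this thread may run ONLY on 'cpu', even if
             * that CPU is busy and others are idle. */
            if (pthread_setaffinity_np(self, sizeof only, &only) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}
```

Other systems expose different calls for the same idea, so code like this belongs behind a platform check, if it is used at all.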

[...]

% It seems that, by default, one thread is assigned to each of the
% available processors. Am I right?

Assuming a reasonable scheduler and no other activity on the system,
this will likely happen. Some systems fail to do this (Solaris is
notorious for this, and it seems like some recent AIX versions have
had similar problems).

% A final third question: is there any URL with (very) easy introductions
% to threads (and possibly their use in parallel programs)? I need
% documents of the kind "threads made easy", you know ;-), or simple
% programs easy to understand, just to begin with and gain some
% confidence.

I don't know of any good on-line introductions. Butenhof's book,
_Programming With POSIX Threads_ comes highly recommended.
--

Patrick TJ McPhee
East York  Canada

 
 
 

Post by Markus Elfri » Mon, 13 Jan 2003 00:05:48


1. Have you chosen a function or class library for your needs?
- Message Passing Interface
  http://www.mpi-forum.org/
  http://www.computer-science.cardiff.ac.uk/user/David.W.Walker/seminar...
- Multiprocessing standard
  http://www.OpenMP.org/
- Non-Uniform Memory Access
  http://lse.sourceforge.net/numa/
  http://www.epcc.ed.ac.uk/direct/newsletter5/node15.html

2. Example: Windows Platform SDK
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/win6...

3. Please read the frequently asked questions and their answers.
http://groups.google.com/groups?group=comp.programming.threads&selm=4...

 
 
 

Post by Volodymyr Tarasen » Tue, 14 Jan 2003 18:53:26




> [...]
> > If you want MxN model you should use pthread_setconcurrency().

> Why?

As I understand, this function informs the system of how many active
kernel entities (lightweight processes) will be in this process. So, for
MxN model it should be pthread_setconcurrency(N). Is it correct?
But of course, I should say that it is just a hint to system. For
instance, in Linux, this call does nothing, just stores a new value.
Correct me if I was wrong.

Regards,
Volodymyr!

 
 
 

Post by Steve Wa » Wed, 15 Jan 2003 01:33:01






>> [...]
>> > If you want MxN model you should use pthread_setconcurrency().

>> Why?
>As I understand, this function informs the system of how many active
>kernel entities (lightweight processes) will be in this process. So, for
>MxN model it should be pthread_setconcurrency(N). Is it correct?

That's the Solaris theory, but it's not even true there.

>But of course, I should say that it is just a hint to system. For
>instance, in Linux, this call does nothing, just stores a new value.
>Correct me if I was wrong.

In any correctly operating threads implementation, the Linux
implementation does all that is needed.  Because Sun wasn't able to
figure out how to do correct scheduling for run-bound threads in their
M:N threading package, they forced this through the standards committee.
However, it appears that even Sun has given up on their old broken
library, and is now recommending that everyone use the 1:1 lib.

Heck, I'd go out on a limb and say sched_yield() has a more useful
operation than pthread_setconcurrency(). :)
--
Steve Watt KD6GGD  PP-ASEL-IA          ICBM: 121W 56' 57.8" / 37N 20' 14.9"

   Free time?  There's no such thing.  It just comes in varying prices...

 
 
 

Post by David Butenho » Wed, 15 Jan 2003 22:13:18








>>> [...]
>>> > If you want MxN model you should use pthread_setconcurrency().

>>> Why?
>>As I understand, this function informs the system of how many active
>>kernel entities (lightweight processes) will be in this process. So, for
>>MxN model it should be pthread_setconcurrency(N). Is it correct?

> That's the Solaris theory, but it's not even true there.

>>But of course, I should say that it is just a hint to system. For
>>instance, in Linux, this call does nothing, just stores a new value.
>>Correct me if I was wrong.

Even in Solaris, [p]th[rea]d_setconcurrency() was a "hint" that wasn't
really guaranteed to do much, and in any case had an effect that was
temporally limited. (That is, if you get new LWPs to order, the system can
decide on its own to shut them down later if it doesn't think you're using
them enough.)
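A sketch of what an application can and cannot learn from the call, assuming a system that provides the XSI pthread_setconcurrency()/pthread_getconcurrency() pair; request_concurrency() is a hypothetical wrapper. POSIX lets an implementation treat the level as a hint or ignore it entirely, but requires the value to be remembered so that pthread_getconcurrency() returns it. The value read back therefore says nothing about how many kernel entities actually exist.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdio.h>

/* Ask for a concurrency level and return what pthread_getconcurrency()
 * reports afterwards, or -1 if the request was rejected. */
int request_concurrency(int level)
{
    int err = pthread_setconcurrency(level);
    if (err != 0) {
        /* EINVAL for a negative level; EAGAIN if the level would
         * exceed a system resource limit. */
        fprintf(stderr, "pthread_setconcurrency(%d) failed: %d\n", level, err);
        return -1;
    }
    /* Even where the hint is ignored (as on Linux), the level is saved,
     * so this returns the value just set, not a count of kernel entities. */
    return pthread_getconcurrency();
}
```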

> In any correctly operating threads implementation, the Linux
> implementation does all that is needed.  Because Sun wasn't able to
> figure out how to do correct scheduling for run-bound threads in their
> M:N threading package, they forced this through the standards committee.
> However, it appears that even Sun has given up on their old broken
> library, and is now recommending that everyone use the 1:1 lib.

And on Solaris 9 the old M:N library has gone away, leaving no alternative
but the 1:1 version. Given that Sun lacked the ability (or, more
accurately, the corporate willingness) to fix the basic flaws in their M:N
scheduler, that was a reasonably good decision.

> Heck, I'd go out on a limb and say sched_yield() has a more useful
> operation than pthread_setconcurrency(). :)

The joke, unfortunately, will probably be missed by the many who think that
sched_yield() does something useful. ;-)

The whole concept of pthread_setconcurrency() is flawed because the
application doesn't have enough information to judge whether additional
"kernel execution entities" will help or hurt. THAT requires knowledge of
what all the CPUs and active entities on the system are doing. It's load
balancing and scheduling, and that's what the scheduler should (and must)
do to be worthy of the title.

--

| Hewlett-Packard Company       Tru64 UNIX & VMS Thread Architect |
|     My book: http://www.awl.com/cseng/titles/0-201-63392-2/     |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/