Javier Traver wrote:
> Just three simple questions.
> 1. Threads or MPI, or others?
> I am completely new to threads, so I think my question is very simple
> (even silly or stupid for many of you). I am still browsing books about
> the subject just to discover if threads would be the appropriate choice
> for me. My interest in using threads is efficiency, i.e., using several
> processors to achieve speed-up. I do not know whether threads are
> better/worse than other choices such as MPI or PVM, etc. How should one
> know which option is best? Does this depend on the particular problem
> one wants to parallelize? Or on the available hardware?
Yes to both.
First, what do you want to "speed up"? The most common definition is that
you want to use more than 1 unit of CPU per unit time -- parallel
processing. If that's what you mean, you'll get no benefit from threads on
a uniprocessor machine because there IS only one unit of CPU per unit
time. On the other hand, with MPI you can exploit two uniprocessors, or run
two processes on a dual processor system. MPI, therefore, provides
flexibility. (Though MPI still won't help if you have only ONE
uniprocessor.)
If your application involves frequent fine-grain communication between
computational units, you'll get far more speed-up in a well designed
threaded application than in a well designed MPI application. (Though one
can also argue that this is meaningless because a well designed MPI
application doesn't communicate that way!)
Whereas most people won't get near a single computer with more than 8 or so
CPUs, lots of people can find networks with tens of computers they can
access -- so in that sense MPI may be more widely applicable at high
scaling factors.
If carefully crafted, a threaded application's threads can efficiently
communicate with high bandwidth and frequency -- but you need to worry
about locking and cache line thrashing. On the other hand, you can share
data using ordinary language and OS concepts like pointers, heap, and even
static or extern data, which gives you a lot of power -- and
responsibility.
There's really no one right answer. Even for a given well-defined problem
you can usually structure the solution either way.
Sometimes the best answer, in fact, is: "both". That is, you may want to
distribute coarse-grain large jobs through MPI across a network, while each
individual node that happens to be a multiprocessor may exploit local
threading to parallelize its own piece of the job. (And sometimes the best
answer is: "neither". Some jobs are so communication-bound that they're
inherently serialized and the overhead of either threads or MPI is a total
waste of resources. For a degenerate example, the classic "Hello world"
program has been written many times as both a threaded and an MPI
application, but no matter how you structure it, all the "thread-ness" or
"MPI-ness" accomplishes nothing towards the goal of generating the console
message "Hello world".)
> 2. How to assign threads to processors and control the number of
> processors that will be available
> Now, assuming threads, my second question is how can one control how
> many processors to use and how to associate threads to processors. It
> seems a very basic thing, but I do not find this kind of information in
> the books I have. All I have come across is an example program using
> sysconf(_SC_NPROCESSORS_ONLN)
> to find out how many processors are available in the computer, and
> create as many threads as processors. After this, there is nothing in
> this sample program related to the assignment of threads to processors.
> It seems that, by default, one thread is assigned to each of the
> available processors. Am I right?
> Let N be the number of processors in my system. I'd like to observe the
> performance evolution when the program is run with 1,2,3,...,N
> processors. Should this be controlled within the program using threads?
You should OBSERVE, but most of the time you shouldn't MEDDLE. We dragged
our feet for years on adding an API to bind threads to processors. Not
because it's never useful, so much, but rather because it's one of those
"silly knobs" that seems to attract "twiddlers" with no idea what they're
really doing or why. When we finally did add it, under pressure, we gave it
a name that helps to explain the basic problem. A common name for such a
function includes the phrase "bind to CPU". But that's misleading, and
experience shows that it's widely misinterpreted. Instead, we used the
phrase "use only CPU". That is, no matter how busy "CPU n" may be, and no
matter how many other CPUs may be underutilized or even idle, the "victim"
thread CANNOT use any CPU but "n".
There are cases, usually in monolithic embedded systems, where the
application really does know what all processors are doing at all times.
That's rare on any modern OS, because there are daemons of all sorts, other
users (even if your application is the "major user" at any time), random
interrupts, and background kernel maintenance (swapping, self-testing,
polling, and so on).
If you can CONSISTENTLY get SUBSTANTIALLY better performance by binding
threads, then there's a serious problem with the scheduling of threads on
that system, and the problem extends way beyond your application. Complain,
and try to provoke a fix. The system's job is to deploy its resources
effectively. If it's not doing that job, you're doing nobody a favor by
hiding the evidence under the virtual rug.
Furthermore, modern multiprocessors are a lot more complicated. You're
likely to run into hotswap issues, where processors can be dynamically
added and removed during execution. If you're bound to a processor that
goes away, you're dead. If a new processor comes online, and all your
threads are bound to busy processors, you won't be able to exploit the new
one. And many high-performance multiprocessors are NUMA: "Non
Uniform Memory Architecture". Counts, or even "CPU ID" lists, just aren't
enough to efficiently exploit these systems -- you need to have the actual
hardware topology map. And that map may be dynamic.
So how do "we" (as standards definers and system implementers) describe to
"you" (as application designers and implementers) how to do all this? The
answer is that a bunch of us in POSIX spent a whole lot of time, in lift
lines at Snowbird and Alta in Utah, on rides at Disneyland, wandering the
streets of Amsterdam looking for cool restaurants, over pizza in Chicago,
and other stressfully serious business locations, discussing these
difficult issues... and short of developing a pretty good sized and
complicated standard just for that purpose, we couldn't figure out a
solution. There really wasn't enough support to even consider something
that complicated, and we decided something along the lines of "how many
processors are there?" wasn't even worth specifying.
Every system provides some way to query the topology, in a manner and form
deemed useful by the designers. Every system provides some way to control
the deployment of processes and threads across the available processors.
None of this is remotely portable, except by coincidence. Given the wide
variety of architectural constraints involved, that's probably the way it
should be.
> 3. Gentle start with threads
> A final third question: is there any URL with (very) easy introductions
> to threads (and possibly their use in parallel programs)? I need
> documents of the kind "threads made easy", you know ;-), or simple
> programs easy to understand, just to begin with and gain some
> confidence.
There are lots of online examples, ranging from trivially simple (for
example, my own book's "silly but obligatory 'hello world' example",
http://homepage.mac.com/dbutenhof/Threads/code/hello.c) to horrendously
complicated and convoluted. (I'll let you do your own searches for that end
of the spectrum.) In between, you'll find tons, some really good, and some
really bad. A good place to start would be my own book, Programming with
POSIX Threads (Addison-Wesley), with the source examples available from
http://homepage.mac.com/dbutenhof/Threads/code/; or Bil Lewis'
"Multithreaded Programming with Pthreads" (SunSoft Press), with examples
downloadable at http://www.LambdaCS.com/books/books.html. (You can read
mine online as well as downloading the full tar file, whereas Bil has only
posted the tar file.) Of course, I'd recommend reading my book, or Bil's,
rather than trying to learn just by reading the examples. The page on Bil's
site also links to a list of thread books he's compiled, if you want a
different perspective.
Ask questions. You've already found a good place for that.
--
/--------------------[ David.Buten...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/