Design of a server: multi-threaded or multi-processes?

Post by PANG RA » Tue, 16 Oct 2001 22:27:24



Hi gurus,

In my final-year thesis project I have to design a server that handles
queries from many (likely thousands of) clients. I have previous
experience working with POSIX threads and have decided to implement the
server for UNIX (at least for now). I am not sure whether I should design
the server so that it handles client requests with threads or subprocesses.
Are there any general guidelines concerning efficiency, stability, and
the number of clients that would inform the choice?

BTW, MS Windows doesn't make any sense to me and I don't know anything about
it. Therefore I think I am asking a UNIX-specific question, or at least I
intend to.

Thanks
-----
Ran Pang
SE 99

http://www.cas.mcmaster.ca/~pangr
tel: (905)529-5619

 
 
 

Post by Benjamin Kaufma » Tue, 16 Oct 2001 23:22:15


Part of it depends on what the application is and on the tradeoffs between
performance and reliability.

Since this is for your thesis, I strongly recommend that you spend a few days
building a prototype of each to demonstrate to your professor a pragmatic
understanding of this material. It's far better to show him/her experimental
test code than quotes from a newsgroup.

Ben




 
 
 

Post by Andrew Giert » Tue, 16 Oct 2001 23:36:55


 PANG> Hi gurus,
 PANG> In my final-year thesis project I have to design a server that
 PANG> handles queries from many (likely thousands of) clients. I have
 PANG> previous experience working with POSIX threads and have decided
 PANG> to implement the server for UNIX (at least for now). I am not
 PANG> sure whether I should design the server so that it handles client
 PANG> requests with threads or subprocesses.  Are there any general
 PANG> guidelines concerning efficiency, stability, and the number of
 PANG> clients that would inform the choice?

The precise nature of the server can make a big difference to the
design choice between threads and processes. Processes give you more
isolation between clients; threads allow tighter coupling. So if the
actions of one client directly affect what happens to others, e.g.  in
a chat server or an NNTP transit server, then having a single process
with either threads or select/poll multiplexing may be best; if
clients are independent but don't specifically need isolating (e.g. an
NNRP server), then pre-forked processes or a hybrid threads/processes
design may be best; if different clients need to act with different
user credentials (e.g. non-anonymous FTP), then threads are useless
and you need a process-per-client model for security.

Having a single process with many threads is of course less robust
because any bug that brings the process down will kill all the current
connections.
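
To illustrate the robustness point, here is a minimal Python sketch (not from the original post) of the isolation a process per client buys you. The names `run_isolated`, `well_behaved` and `buggy` are made up for the example, and `os.abort()` merely stands in for a fatal bug in one client's handler.

```python
import os
import signal


def run_isolated(handler):
    """Run handler() in a forked child and report how the child ended."""
    pid = os.fork()
    if pid == 0:
        # Child: handle one (hypothetical) client, then leave via _exit()
        # so we never fall back into the parent's code path.
        try:
            handler()
        except BaseException:
            os._exit(1)
        os._exit(0)
    # Parent: reap the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    return "exited with status %d" % os.WEXITSTATUS(status)


def well_behaved():
    pass


def buggy():
    # Stand-in for a fatal bug: abort() raises SIGABRT in this process only.
    os.abort()


if __name__ == "__main__":
    print(run_isolated(well_behaved))
    print(run_isolated(buggy))
    print("parent still running")
```

The crashing handler takes down only its own child; the parent sees the signal in the wait status and carries on. The same bug inside a thread would have ended every connection in the process.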

So there's no general answer to this, only tradeoffs that differ
according to the actual function of the server.

--
Andrew.

comp.unix.programmer FAQ: see <URL: http://www.erlenstar.demon.co.uk/unix/>
                           or <URL: http://www.whitefang.com/unix/>

 
 
 

Post by David Schwart » Wed, 17 Oct 2001 03:39:03



> In my final-year thesis project I have to design a server that handles
> queries from many (likely thousands of) clients. I have previous
> experience working with POSIX threads and have decided to implement the
> server for UNIX (at least for now). I am not sure whether I should design
> the server so that it handles client requests with threads or subprocesses.
> Are there any general guidelines concerning efficiency, stability, and
> the number of clients that would inform the choice?

        You should consider more than just these two architectures. You should
consider:

        1) One process, one thread. In this architecture, your application is
basically a big loop around 'select' or 'poll'. In the loop, you do all
the work until you need to wait for I/O, then you call 'select' or
'poll' again.

        2) One process, one thread for each client. In this architecture, you
have one thread for each client, generally blocked on 'recv' or 'read'.
One thread blocked on 'accept' takes new clients.

        3) One process, pool of threads. In this architecture, one or more
threads loop on 'select' or 'poll' and when there's work to be done,
they put it on a work queue. A small number of threads pull jobs from
the work queue and service them.

        4) One process for each client. In this architecture, a master process
listens for new connections and forks sub-processes to handle them. In a
more advanced version, the sub-processes stay running as clients are
assigned to them by the master process. Each worker process remains
assigned to a single client from start to finish. The client processes
could call 'accept' themselves or the master process could accept the
connections and hand them off.

        5) Process poll. In this architecture, shared memory is used between
processes for client state information. File descriptors are handed off
from process to process. A process can work on a given client, but if it
winds up waiting too long, it can work on another client. The number of
processes used is generally less than the total number of clients being
serviced and slightly greater than the number of 'active processes'.
This is analogous to the pool of threads approach but using processes
instead of threads.

        You can also make hybrid architectures by combining bits and pieces of
these architectures. This list is not complete. 80% of the time, '3' is
the best choice. But a large number of factors go into the decision.
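
A rough Python sketch of architecture 3 follows, for readers who want to see its shape. It is an illustration under assumptions, not code from this post: `NUM_WORKERS`, `worker`, `dispatch` and `demo` are invented names, `socket.socketpair()` stands in for already-accepted client connections, and the "service" is just upper-casing the request.

```python
import queue
import select
import socket
import threading

NUM_WORKERS = 4  # assumed pool size; tune for the real workload


def worker(jobs):
    """Pull (socket, data) jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        sock, data = job
        sock.sendall(data.upper())  # placeholder for real request handling


def dispatch(socks, jobs, expected):
    """Wait in select() and queue each request that arrives."""
    seen = 0
    while seen < expected:
        readable, _, _ = select.select(socks, [], [])
        for sock in readable:
            data = sock.recv(4096)
            if data:
                jobs.put((sock, data))
                seen += 1


def demo():
    # Each socketpair is (client end, server end) of one fake connection.
    pairs = [socket.socketpair() for _ in range(3)]
    jobs = queue.Queue()
    workers = [threading.Thread(target=worker, args=(jobs,))
               for _ in range(NUM_WORKERS)]
    for t in workers:
        t.start()
    for i, (client, _) in enumerate(pairs):
        client.sendall(b"query %d" % i)
    dispatch([server for _, server in pairs], jobs, expected=len(pairs))
    replies = [client.recv(4096) for client, _ in pairs]
    for _ in workers:
        jobs.put(None)  # one shutdown sentinel per worker
    for t in workers:
        t.join()
    for client, server in pairs:
        client.close()
        server.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

In CPython the worker threads will not run CPU-bound handlers in parallel, so treat this purely as a picture of the structure: one thread waits for readiness, a small fixed pool does the work.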

        DS

 
 
 

Post by David Schwart » Wed, 17 Oct 2001 05:37:48


        Sorry for the typos.

>         5) Process poll. In this architecture, shared memory is used between

        This should be "process pool".

> processes for client state information. File descriptors are handed off
> from process to process. A process can work on a given client, but if it
> winds up waiting too long, it can work on another client. The number of
> processes used is generally less than the total number of clients being
> serviced and slightly greater than the number of 'active processes'.

        This should be "number of 'active connections' or 'active clients'."

> This is analogous to the pool of threads approach but using processes
> instead of threads.

        DS
 
 
 

Post by phil-news-nos.. » Wed, 17 Oct 2001 16:09:27



| In my final-year thesis project I have to design a server that handles
| queries from many (likely thousands of) clients. I have previous
| experience working with POSIX threads and have decided to implement the
| server for UNIX (at least for now). I am not sure whether I should design
| the server so that it handles client requests with threads or subprocesses.
| Are there any general guidelines concerning efficiency, stability, and
| the number of clients that would inform the choice?

Part of your thesis should be a point by point analysis of the server
project itself, to examine the merits of the various choices as given
in other posts.  And it is not as simple as just processes vs threads.

Right now I'm building a daemon library to implement three different
process based models.  One is the more classical connect-then-fork.
The second keeps a pool of worker processes, but one process still
takes the connections and transfers the descriptor to a ready worker.
The third has each worker doing its own listening via an accept()
call.  There are other ways, but they don't fit within the goals of
my library project.  In my goals, it is necessary for each worker
process to exit after a connection is serviced.  I'm only doing this
for simpler server designs.
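
The third model (each worker calls accept() itself and exits after one connection) might look roughly like the Python sketch below. This is not the daemon library described above, only an illustration; `prefork_serve` and `demo` are hypothetical names, and binding to 127.0.0.1 with port 0 (kernel-chosen port) is an assumption made to keep the example self-contained.

```python
import os
import socket


def prefork_serve(listener, num_workers):
    """Fork workers that each accept() one connection, answer it, and exit."""
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: the listening socket is inherited across fork().
            status = 1
            try:
                conn, _ = listener.accept()
                with conn:
                    request = conn.recv(1024)
                    conn.sendall(b"pid %d got %s" % (os.getpid(), request))
                status = 0
            finally:
                os._exit(status)  # the worker exits after one connection
        pids.append(pid)
    return pids


def demo(num_clients=3):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]
    pids = prefork_serve(listener, num_clients)
    replies = []
    for i in range(num_clients):
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(b"hello %d" % i)
            replies.append(client.recv(1024))
    for pid in pids:
        os.waitpid(pid, 0)  # reap every worker
    listener.close()
    return pids, replies


if __name__ == "__main__":
    print(demo())
```

A long-running server would also need the parent to fork replacement workers as the old ones exit; that bookkeeping is left out here.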

You might want to look at selecting a few different kinds of servers
and building each in some of the different models.  Do a rigorous study
of the results, not only in performance (throughput, latency), but also
in issues of design speed, reliability, maintainability, scalability,
etc.  Choosing the right methodology for a particular kind of server is
as important as actually doing the server design.

| BTW, MS Windows doesn't make any sense to me and I don't know anything about
| it. Therefore I think I am asking a UNIX-specific question, or at least I
| intend to.

MS windows doesn't make any sense to me, either, at least from the
programming perspective.

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

-----------------------------------------------------------------

 
 
 

Post by Ruediger R. Asch » Thu, 18 Oct 2001 00:45:54


Ehm, in case the person who posted this originally would like to rethink
his/her statement about Windows:

Any of the architectures mentioned in this thread will suffer tremendously
when the computations the server must perform on behalf of the client are
CPU-bound, no matter what OS. I'm sure you have studied all of this in
operating system classes.

You can say whatever you want about Windows, but NT provides much more
powerful asynchronous I/O than Unix. For example, there are I/O completion
ports, which sort of combine asynchronous I/O with multithreading and give
you the best of both worlds. Also, notification-based asynchronous I/O is
already built into the NT kernel, so you can associate every pending
I/O request with a unique event object (pretty much what condition
variables are) that is signalled by the I/O system when the asynchronous
request has completed. Any architecture can work off the pending requests in
any order it likes, with or without OS support. Also, the thread pooling
architecture mentioned earlier in this thread is already built into the
Win2K kernel.

I understand the concerns and sentiments about NT, but there are a few areas
in which the OS is worth looking at. For a thesis about the performance of
multi-client server apps, you really leave out a major part if you don't look
at true asynchronous I/O.

RAc





 
 
 

Post by Kevin D. Cla » Sat, 20 Oct 2001 02:50:08



> In my final-year thesis project I have to design a server that handles
> queries from many (likely thousands of) clients. I have previous
> experience working with POSIX threads and have decided to implement the
> server for UNIX (at least for now). I am not sure whether I should design
> the server so that it handles client requests with threads or subprocesses.
> Are there any general guidelines concerning efficiency, stability, and
> the number of clients that would inform the choice?

Since I didn't see anybody else in this thread suggest it, I will:
You definitely want to examine this book:

_Unix Network Programming_, Volume 1, Second Edition, by W. Richard
Stevens.  This book contains a whole section on this topic.

> BTW, MS Windows doesn't make any sense to me and I don't know anything about
> it. Therefore I think I am asking a UNIX-specific question, or at least I
> intend to.

Doesn't make much sense to me either.

--kevin
--
Kevin D. Clark (CetaceanNetworks.com!kclark)  |
Cetacean Networks, Inc.                       |   Give me a decent UNIX
Portsmouth, N.H. (USA)                        |  and I can move the world
alumni.unh.edu!kdc (PGP Key Available)        |

 
 
 

Post by Keith Willoughb » Fri, 26 Oct 2001 14:56:33




> > In my final-year thesis project I have to design a server that handles
> > queries from many (likely thousands of) clients. I have previous
> > experience working with POSIX threads and have decided to implement the
> > server for UNIX (at least for now). I am not sure whether I should design
> > the server so that it handles client requests with threads or subprocesses.
> > Are there any general guidelines concerning efficiency, stability, and
> > the number of clients that would inform the choice?

> Since I didn't see anybody else in this thread suggest it, I will:
> You definitely want to examine this book:

> _Unix Network Programming_, Volume 1, Second Edition, by W. Richard
> Stevens.  This book contains a whole section on this topic.

I am also about to have a go at server programming. The shocking
omission of multi-player cribbage on Linux needs addressing :-)

My copy of Unix Network Programming is on order, but in the
meantime can anyone suggest a similar application whose code I can
look at? I'm not currently sure if I should try and generalise it into
a server that can host multiple games, or one that hosts only one
game, but assuming the latter, it would have to accept asynchronous
commands from 2-4 clients, all dealing with a common state and common
data.

Ta.

--
Keith Willoughby
This isn't TV, he isn't William Shatner

 
 
 

Post by Ed L Cashi » Sat, 27 Oct 2001 04:47:14





...
> > _Unix Network Programming_, Volume 1, Second Edition, by W. Richard
> > Stevens.  This book contains a whole section on this topic.

> I am also about to have a go at server programming. The shocking
> omission of multi-player cribbage on Linux needs addressing :-)

> My copy of Unix Network Programming is on order, but in the
> meantime can anyone suggest a similar application whose code I can
> look at? I'm not currently sure if I should try and generalise it into
> a server that can host multiple games, or one that hosts only one
> game, but assuming the latter, it would have to accept asynchronous
> commands from 2-4 clients, all dealing with a common state and common
> data.

For a good example of single-process, single-thread multiplexing,
check out thttpd:

  http://www.acme.com/software/thttpd/thttpd.html

With such a model, sharing state between clients is trivial, and
thttpd shows that you can go pretty derned fast despite (because of?)
the simplicity of the approach.
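
For the small game server asked about above, the single-process model could be sketched in Python like this. It is an illustration only and has nothing to do with thttpd's actual source: `serve` and `demo` are made-up names, the shared "game state" is a single running score, and `socket.socketpair()` stands in for two connected players.

```python
import select
import socket


def serve(clients, state, max_events):
    """One select() loop: apply each client's message to the shared state
    and broadcast the new state to every connected client."""
    handled = 0
    while handled < max_events and clients:
        readable, _, _ = select.select(clients, [], [])
        for sock in readable:
            data = sock.recv(1024)
            if not data:
                # Peer closed its end: drop it from the set we poll.
                clients.remove(sock)
                sock.close()
                continue
            # No locking needed: only this one thread ever touches `state`.
            state["score"] += int(data)
            update = b"score=%d\n" % state["score"]
            for peer in clients:
                peer.sendall(update)
            handled += 1
    return state


def demo():
    pairs = [socket.socketpair() for _ in range(2)]
    players = [player for player, _ in pairs]
    server_side = [server for _, server in pairs]
    players[0].sendall(b"5")
    players[1].sendall(b"7")
    state = serve(server_side, {"score": 0}, max_events=2)
    seen = [player.recv(1024) for player in players]
    for player, server in pairs:
        player.close()
        server.close()
    return state["score"], seen


if __name__ == "__main__":
    print(demo())
```

Because every command is handled to completion before the next select() call, the common state never needs a mutex. The price is that one slow handler stalls all players.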

--
--Ed Cashin                     integrit file-verification system:

    Note: If you want me to send you email, don't munge your address.

 
 
 

Post by Keith Willoughb » Sat, 27 Oct 2001 05:20:32



> For a good example of single-process, single-thread multiplexing,
> check out thttpd:

>   http://www.acme.com/software/thttpd/thttpd.html

Thanks, downloading right now.

--
Keith Willoughby
This isn't TV, he isn't William Shatner

 
 
 

Post by phil-news-nos.. » Sun, 28 Oct 2001 09:31:56



| For a good example of single-process, single-thread multiplexing,
| check out thttpd:
|
|   http://www.acme.com/software/thttpd/thttpd.html
|
| With such a model, sharing state between clients is trivial, and
| thttpd shows that you can go pretty derned fast despite (because of?)
| the simplicity of the approach.

From the notes link:

   The second generation of web servers addressed this problem by forking
   off a child process for each request. This is very straightforward to
   do under Unix, only a few extra lines of code. CERN and NCSA 1.3 are
   examples of this type of server. Unfortunately, forking a process is a
   fairly expensive operation, so performance of this type of server is
   still pretty poor. The long random pauses are gone, but instead every
   request has a short constant pause at startup. Because of this, the
   server can't handle a high rate of connections.

Modern Unix and clones seem to not have this problem.  I've not seen it
in FreeBSD, Linux, or Solaris.  Tests in Linux show on 800 MHz P-III,
a fork() is under 100 nanoseconds.  I can get thousands of fork()'s per
second.  Maybe back in the days of slower CPUs on older OSes this might
have been an issue.
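
Anyone who wants to check fork() cost on their own machine could start from a sketch like the one below. It is not the test referred to above: `time_forks` is a hypothetical helper, and it times a whole fork/exit/wait cycle from inside a Python interpreter, so its numbers will be higher than a bare fork() in a small C program and should not be compared directly with the figures in this post.

```python
import os
import time


def time_forks(count):
    """Return the average wall-clock seconds per fork()+exit+wait cycle."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)     # child: exit immediately, doing no work
        os.waitpid(pid, 0)  # parent: reap the child before forking again
    return (time.perf_counter() - start) / count


if __name__ == "__main__":
    per_fork = time_forks(200)
    print("about %.1f microseconds per fork/exit/wait" % (per_fork * 1e6))
```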

Calling an exec() function might involve more because of the fact that
it involves a whole new program and library mapping and linking, and
in some cases, a lot more initialization to load and compile (either
all at once or just in time) or interpret a script.  But if the server
is not going to call exec(), you're dealing with the cost of fork()
and any subsequent VM copy on write.

For simple static pages, thttpd should be great.  But if any kind of logic
needs to be integrated, such as for a dynamically generated site, thttpd
does not seem like the right fit when you want that logic built directly
into the server (e.g. to avoid exec() calls).

Also, I found that when thttpd forks a process for CGI, it does not
close all the descriptors for all the existing connections.  If one
of the CGI scripts has to run for a long time, it could end up holding
these descriptors (and hence the HTTP connections) open well after
they should have been closed elsewhere.
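
One common guard against this kind of descriptor leak is to mark every client socket close-on-exec, so a program started with exec() never inherits it. The Python sketch below shows the fcntl() calls involved; `set_cloexec` and `is_cloexec` are names invented for the example, and this is not a patch for thttpd. Note that Python already creates sockets with this flag set, unlike C, which is why the demo first clears it with `os.set_inheritable()`.

```python
import fcntl
import os
import socket


def set_cloexec(fd):
    """Mark fd close-on-exec so an exec()'d program does not inherit it."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def is_cloexec(fd):
    """Report whether the close-on-exec flag is set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


if __name__ == "__main__":
    client, server = socket.socketpair()
    # Mimic the C default, where new descriptors survive exec().
    os.set_inheritable(server.fileno(), True)
    print("before:", is_cloexec(server.fileno()))
    set_cloexec(server.fileno())
    print("after:", is_cloexec(server.fileno()))
    client.close()
    server.close()
```

The flag only helps when the child calls exec(). A child that forks and keeps running the same program still holds every descriptor and has to close the ones it does not need itself.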

At the moment I don't have the time to run a serious benchmark test of
this vs. other servers.  If anyone does, I'd also like to see included
in that the figures for khttpd in the Linux kernel.

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

-----------------------------------------------------------------

 
 
 

Post by mindwar » Wed, 31 Oct 2001 07:15:43


I wouldn't call myself The Guru or anything, but if you don't really need
processes, go for pthreads. Context switching between threads is far
cheaper than between processes on modern OSes (read: Linux, Solaris).
Even though fork() is quite fast today, you might need those
CPU cycles for something else.

The new Apache is built around threads (and processes, for that matter);
they should know what they are doing :D

Apropos win32... it's usable, but I'm not a shareholder in Microsoft...
hence the lack of interest.

--

Posted via dBforums
http://dbforums.com

 
 
 

Post by phil-news-nos.. » Wed, 31 Oct 2001 22:52:09



| I wouldn't call myself The Guru or anything, but if you don't really need
| processes, go for pthreads. Context switching between threads is far
| cheaper than between processes on modern OSes (read: Linux, Solaris).
| Even though fork() is quite fast today, you might need those
| CPU cycles for something else.

Since a thread does not duplicate as many resources, I will concur
that creating new threads is cheaper in CPU time than new processes.
However, threads have limitations, too.  The stack space has to be
allocated from the one common address space separately for each
thread.  This fragments the space more, and lowers the ceiling on
expanding stack space.  Large arrays cannot be created as automatic
variables or from alloca() and instead malloc() and kin must be used.
With all this memory sharing comes some cost.  Synchronization is more
important, and more effort has to be made to serialize access to
certain resources.  These efforts can use additional CPU time, and
over the course of the life of a process or thread, could be more CPU
than the difference between creating a process vs. creating a thread.
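
Both costs can be seen in a few lines of Python. This is a sketch under assumptions, not a benchmark: `count_with_lock` is a made-up name, the 256 KiB stack size is an arbitrary choice, and the shared counter stands in for any resource that threads must take turns using.

```python
import threading


def count_with_lock(num_threads, increments, stack_bytes=256 * 1024):
    """Have several threads bump one shared counter under a lock."""
    # Every thread's stack is carved out of the one shared address space,
    # so the size is chosen up front (assumed value; 0 restores the default).
    threading.stack_size(stack_bytes)
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(increments):
            with lock:  # serialize the read-modify-write on shared data
                total += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    threading.stack_size(0)  # back to the platform default for later threads
    return total


if __name__ == "__main__":
    print(count_with_lock(4, 10000))
```

Every increment pays for a lock acquire and release. Separate processes working on private copies would not pay that, though they would then have to combine their results some other way.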

I believe it is best to understand what the differences between the
two really are, understand what your application actually needs, and
determine which is the best fit.

| The new Apache is built around threads (and processes, for that matter);
| they should know what they are doing :D

Apache also tries to achieve (and does well at) a number of other
goals, including portability.  It also includes a lot of code to
handle unusual circumstances which other applications may never see.
Pulling code from Apache for other things will require a bit of
cleanup to remove those attributes unless your application happens to
have the same needs as Apache.  In that case, it is probably a web
server anyway.

Don't dismiss processes unless you don't need them.  Don't dismiss
threads unless you don't need them.

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

-----------------------------------------------------------------

 
 
 

Post by Rolan » Thu, 01 Nov 2001 00:08:53


Hi!

I do it with threads, because forking always allocates the complete process context again, but a new thread does not.
A thread can also share memory with other threads, so you don't need everything a second (3rd, 4th, ... 1000th) time in memory.
And forking thousands of processes really does kill a machine. So I recommend using threads.

BTW: regarding your opinion of M$_Win: 100% agreed! :))

Best regards!
Roland.
http://www.xiberg.com

