Using Threads?

Post by Junfeng Zhan » Tue, 05 Dec 2000 15:28:00



Hello all,

I am new to PostgreSQL. When I read the documents, I found that the Postmaster
daemon actually spawns a new backend server process to serve a new client
request. Why not use threads instead? Is that just for a historical reason,
or some performance/implementation concern?

Thank you very much.
Junfeng

 
 
 

Using Threads?

Post by Karel Z » Wed, 06 Dec 2000 00:51:34



> Hello all,

> I am new to PostgreSQL. When I read the documents, I found that the Postmaster
> daemon actually spawns a new backend server process to serve a new client
> request. Why not use threads instead? Is that just for a historical reason,
> or some performance/implementation concern?

 It's partly for historical reasons, but not only. PostgreSQL allows
 user-defined modules (functions), which means that a bad module or a
 bug in the core code crashes only one backend while the postmaster keeps
 running. In a thread model, such a crash would take down all running
 backends. The locking method is another big difference.

                                Karel

 
 
 

Using Threads?

Post by The Hermit Hack » Wed, 06 Dec 2000 02:45:10



> Hello all,

> I am new to PostgreSQL. When I read the documents, I found that the
> Postmaster daemon actually spawns a new backend server process to serve
> a new client request. Why not use threads instead? Is that just for a
> historical reason, or some performance/implementation concern?

Several reasons, 'historical' probably being the strongest right now
... since PostgreSQL was never designed for threading, it's about as
'un-thread-safe' as they come, and cleaning that up will/would be a
complete nightmare (should eventually be done, mind you) ...

The other is stability ... right now, if one backend drops away, for
whatever reason, it doesn't take down the whole system ... if you ran
things as one process, and that one process died, you just lost your whole
system ...

 
 
 

Using Threads?

Post by Ross J. Reedstro » Wed, 06 Dec 2000 03:53:17


Myron -
Putting aside the fork/threads discussion for a moment (the reasons,
both historical and other, such as inter-backend protection, are well
covered in the archives), the work you did sounds like an interesting
experiment in code redesign. Would you be willing to release the hacked
code somewhere for others to learn from? Hacking flex to generate
thread-safe code is of itself interesting, and the question about PG and
threads comes up so often, that an example of why it's not a simple task
would be useful.

Ross


> I may be wrong but I think that PGSQL is not threaded mostly due to
> historical reasons.  It looks to me like the source has developed over
> time where much of the source is not reentrant with many global variables
> throughout.  In addition, the parser is generated by flex which
> can be made to generate reentrant code but is still not thread safe b/c
> global variables are used.

> That being said, I experimented with the 7.0.2 source and came up with a
> multithreaded backend for PGSQL which uses Solaris Threads. It seems to
> work, but I drifted very far from the original source.  I
> had to hack flex to generate threadsafe code as well.  I use it as a
> linked library with my own fe<->be protocol. This ended up being much much
> more than I bargained for and looking back would probably not have tried
> had I known any better.

> Myron Scott

 
 
 

Using Threads?

Post by The Hermit Hack » Wed, 06 Dec 2000 04:23:10


If we were to do this in steps, I believe that one of the major problems
right now is that we have global variables up the wazoo ... my
'thread-awareness' is limited, as I've yet to use them, so excuse my
ignorance ... if we got patches that cleaned up the code in stages, moving
towards a cleaner code base, then we could get it into the main source
tree ... ?


> Myron -
> Putting aside the fork/threads discussion for a moment (the reasons,
> both historical and other, such as inter-backend protection, are well
> covered in the archives), the work you did sounds like an interesting
> experiment in code redesign. Would you be willing to release the hacked
> code somewhere for others to learn from? Hacking flex to generate
> thread-safe code is of itself interesting, and the question about PG and
> threads comes up so often, that an example of why it's not a simple task
> would be useful.

> Ross


> > I may be wrong but I think that PGSQL is not threaded mostly due to
> > historical reasons.  It looks to me like the source has developed over
> > time where much of the source is not reentrant with many global variables
> > throughout.  In addition, the parser is generated by flex which
> > can be made to generate reentrant code but is still not thread safe b/c
> > global variables are used.

> > That being said, I experimented with the 7.0.2 source and came up with a
> > multithreaded backend for PGSQL which uses Solaris Threads. It seems to
> > work, but I drifted very far from the original source.  I
> > had to hack flex to generate threadsafe code as well.  I use it as a
> > linked library with my own fe<->be protocol. This ended up being much much
> > more than I bargained for and looking back would probably not have tried
> > had I known any better.

> > Myron Scott

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy


 
 
 

Using Threads?

Post by Tom La » Wed, 06 Dec 2000 06:28:49



>> Why not use threads instead? Is that just for a
>> historical reason, or some performance/implementation concern?
> Several reasons, 'historical' probably being the strongest right now
> ... since PostgreSQL was never designed for threading, it's about as
> 'un-thread-safe' as they come, and cleaning that up will/would be a
> complete nightmare (should eventually be done, mind you) ...
> The other is stability ... right now, if one backend drops away, for
> whatever reason, it doesn't take down the whole system ... if you ran
> things as one process, and that one process died, you just lost your whole
> system ...

Portability is another big reason --- using threads would create lots
of portability headaches for platforms that had no threads or an
incompatible threads library.  (Not to mention buggy threads libraries,
not-quite-thread-safe libc routines, yadda yadda.)

The amount of work required looks far out of proportion to the payoff...

                        regards, tom lane

 
 
 

Using Threads?

Post by Bruce Guent » Wed, 06 Dec 2000 06:37:54


> I am new to PostgreSQL. When I read the documents, I found that the Postmaster
> daemon actually spawns a new backend server process to serve a new client
> request. Why not use threads instead? Is that just for a historical reason,
> or some performance/implementation concern?

Once all the questions regarding "why not" have been answered, it would
be good to also ask "why use threads?"  Do they simplify the code?  Do
they offer significant performance or efficiency gains?  What do they
give, other than being buzzword compliant?

 
 
 

Using Threads?

Post by Junfeng Zha » Wed, 06 Dec 2000 06:50:57


All the major operating systems should have POSIX threads implemented.
Actually, this could be made configurable: multiple threads or a single thread.

A thread-only server is unsafe, I agree. Maybe the following model would be a
little better: several servers, each multi-threaded. Every server can
support a maximum number of simultaneous requests. If anything bad
happens, it is limited to that server.

The downside of the process model is not the startup time. It is the
kernel resource and context-switch cost. Processes consume much more
kernel resources than threads and have a much higher context-switch
cost. The thread model scales much better than the process model.

-Junfeng


> > I am new to PostgreSQL. When I read the documents, I found that the Postmaster
> > daemon actually spawns a new backend server process to serve a new client
> > request. Why not use threads instead? Is that just for a historical reason,
> > or some performance/implementation concern?

> Both. Not all systems supported by PostgreSQL have a standards-compliant
> threading implementation (even more true for the systems PostgreSQL has
> supported over the years).

> But there are performance and reliability considerations too. A
> thread-only server is likely more brittle than a process-per-client
> implementation, since all threads share the same address space.
> Corruption in one server might more easily propagate to other servers.

> The time to start a backend is quite often small compared to the time
> required for a complete session, so imho the differences in absolute
> speed are not generally significant.

>                        - Thomas

 
 
 

Using Threads?

Post by Adam Haberla » Wed, 06 Dec 2000 07:13:39




> > I am new to PostgreSQL. When I read the documents, I found that the Postmaster
> > daemon actually spawns a new backend server process to serve a new client
> > request. Why not use threads instead? Is that just for a historical reason,
> > or some performance/implementation concern?

> Once all the questions regarding "why not" have been answered, it would
> be good to also ask "why use threads?"  Do they simplify the code?  Do
> they offer significant performance or efficiency gains?  What do they
> give, other than being buzzword compliant?

        Typically (on a well-written OS, at least), the spawning of a thread
is much cheaper than the creation of a new process (via fork()).  Also,
since everything in a group of threads (I'll call 'em a team) shares the
same address space, there can be some memory overhead savings.

--
Adam Haberlach           |"California's the big burrito, Texas is the big

http://www.veryComputer.com/*.com| the big tamale ... and the only tamale that
'88 EX500                | counts any more." -- Dan Rather

 
 
 

Using Threads?

Post by Dan Ly » Wed, 06 Dec 2000 07:55:23


Adam Haberlach writes:
> Typically (on a well-written OS, at least), the spawning of a thread
> is much cheaper than the creation of a new process (via fork()).

This would be well worth testing on some representative sample
systems.

Within the past year and a half at one of my gigs some coworkers did
tests on various platforms (Irix, Solaris, a few variations of Linux
and *BSDs) and concluded that in fact the threads implementations were
often *slower* than using processes for moving and distributing the
sorts of data that they were playing with.

With copy-on-write and interprocess pipes that are roughly equivalent
to memcpy() speeds it was determined for that application that the
best way to split up tasks was fork()ing and dup().

As always, your mileage will vary, but the one thing that consistently
amazes me on the Un*x like operating systems is that usually the
programmatically simplest way to implement something has been
optimized all to heck.

A lesson that comes hard to those of us who grew up on MS systems.

Dan

 
 
 

Using Threads?

Post by Bruce Guent » Wed, 06 Dec 2000 08:33:24


>    Typically (on a well-written OS, at least), the spawning of a thread
> is much cheaper than the creation of a new process (via fork()).

Unless I'm mistaken, the back-end is only forked when starting a new
connection, in which case the latency of doing the initial TCP three-way
handshake and start-up queries is much larger than any process creation cost.  On
Linux 2.2.16 on a 500MHz PIII, I can do the fork/exit/wait sequence in
about 164us.  On the same server, I can make/break a PostgreSQL
connection in about 19,000us (with 0% CPU idle, about 30% CPU system).
Even if we can manage to get a thread for free, and assume that the fork
from postmaster takes more than 164us, it won't make a big difference
once the other latencies are worked out.

> Also, since everything in a group of threads (I'll call 'em a team)

Actually, you call them a process.  That is the textbook definition.

> shares the
> same address space, there can be some memory overhead savings.

Only slightly.  All of the executable and libraries should already be
shared, as will all non-modified data.  If the data is modified by the
threads, you'll need separate copies for each thread anyways, so the net
difference is small.

I'm not denying there would be a difference.  Compared to separate
processes, threads are more efficient.  Doing a context switch between
threads means there are no PTE invalidations, which makes them quicker
than between processes.  Creation would be a bit faster due to just
linking in the VM to a new thread rather than marking it all as COW.
The memory savings would come from reduced fragmentation of the modified
data (if you have 1 byte modified on each of 100 pages, the thread would
grow by a few K, compared to 400K for processes).  I'm simply arguing
that the differences don't appear to be significant compared to the
other costs involved.

 
 
 

Using Threads?

Post by Matth » Wed, 06 Dec 2000 09:08:28


        *snip*

> > Once all the questions regarding "why not" have been answered, it would
> > be good to also ask "why use threads?"  Do they simplify the code?  Do
> > they offer significant performance or efficiency gains?  What do they
> > give, other than being buzzword compliant?

        The primary advantage that I see is that a single postgres process
can benefit from multiple processors. I see little advantage to using threads
for client connections.
 
 
 

Using Threads?

Post by Bruce Guent » Wed, 06 Dec 2000 09:21:22


> Adam Haberlach writes:
> > Typically (on a well-written OS, at least), the spawning of a thread
> > is much cheaper than the creation of a new process (via fork()).
> This would be well worth testing on some representative sample
> systems.

Using the following program for timing process creation and cleanup:

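#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
  int i;
  pid_t pid;
  for (i = 0; i < 100000; ++i) {
    pid = fork();
    if (pid == -1) exit(1);   /* fork failed */
    if (!pid) _exit(0);       /* child: exit immediately */
    waitpid(pid, 0, 0);       /* parent: reap the child */
  }
  exit(0);
}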

And using the following program for timing thread creation and cleanup:

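#include <pthread.h>
#include <stdlib.h>

/* Thread body: exit immediately, so only creation and cleanup are timed. */
void *threadfn(void *arg) { (void)arg; pthread_exit(0); }

int main(void) {
  int i;
  pthread_t thread;
  for (i = 0; i < 100000; ++i) {
    if (pthread_create(&thread, 0, threadfn, 0)) exit(1);
    if (pthread_join(thread, 0)) exit(1);
  }
  exit(0);
}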

On a relatively unloaded 500MHz PIII running Linux 2.2, the fork test
program took a minimum of 16.71 seconds to run (167us per
fork/exit/wait), and the thread test program took a minimum of 12.10
seconds to run (121us per pthread_create/exit/join).  I use the minimums
because those would be the runs where the tasks were least interfered
with by other tasks.  This amounts to a roughly 25% speed improvement
for threads over processes, for the null-process case.

If I add the following lines before the for loop:
  char* m;
  m = malloc(1024*1024);
  memset(m, 0, 1024*1024);
The cost for doing the fork balloons to 240us, whereas the cost for
doing the thread is constant.  So, the cost of marking the pages as COW
is quite significant (using those numbers, 73us/MB).

So, forking a process with lots of data is expensive.  However, most of
the PostgreSQL data is in a SysV IPC shared memory segment, which
shouldn't affect the fork numbers.

 
 
 

Using Threads?

Post by Lamar Ow » Wed, 06 Dec 2000 10:21:49



>         The primary advantage that I see is that a single postgres process
> can benefit from multiple processors. I see little advantage to using thread
> for client connections.

Multiprocessors best benefit multiple backends.  And the current forked
model lends itself admirably to SMP.

And I say that even after using a multithreaded webserver (AOLserver)
for three and a half years.  Of course, AOLserver also sanely uses the
multi process PostgreSQL backends in a pooled fashion, but that's beside
the point.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11

 
 
 

Using Threads?

Post by Tom La » Wed, 06 Dec 2000 14:08:51



> [ some very interesting datapoints ]

> So, forking a process with lots of data is expensive.  However, most of
> the PostgreSQL data is in a SysV IPC shared memory segment, which
> shouldn't affect the fork numbers.

I believe (but don't have numbers to prove it) that most of the present
backend startup time has *nothing* to do with thread vs process
overhead.  Rather, the primary startup cost has to do with initializing
datastructures, particularly the system-catalog caches.  A backend isn't
going to get much real work done until it's slurped in a useful amount
of catalog cache --- for example, until it's got the cache entries for
pg_class and the indexes thereon, it's not going to accomplish anything
at all.

Switching to a thread model wouldn't help this cost a bit, unless
we also switch to a shared cache model.  That's not necessarily a win
when you consider the increased costs associated with cross-backend
or cross-thread synchronization needed to access or update the cache.
And if it *is* a win, we could get most of the same benefit in the
multiple-process model by keeping the cache in shared memory.

The reason that a new backend has to do all this setup work for itself,
rather than inheriting preloaded cache entries via fork/copy-on-write
from the postmaster, is that the postmaster isn't part of the ring of
processes that can access the database files directly.  That was done
originally for robustness reasons: since the PM doesn't have to deal
with database access, cache invalidation messages, etc etc yadda yadda,
it is far simpler and less likely to crash than a real backend.  If we
conclude that shared syscache is not a reasonable idea, it might be
interesting to look into making the PM into a full-fledged backend
that maintains a basic set of cache entries, so that these entries are
immediately available to new backends.  But we'd have to take a real
hard look at the implications for system robustness/crash recovery.

In any case I think we're a long way away from the point where switching
to threads would make a big difference in connection startup time.

                        regards, tom lane

 
 
 
