accept() weirdness? (Apache/1.3.6 + Solaris/2.6)

Post by Bruno Connell » Wed, 21 Jul 1999 04:00:00



Hello,

I'm seeing some strange socket-related Apache/1.3.6 issues under Solaris
2.6.  It looks like it could be anything from accept serialization to
locking problems to who knows what.

I'm convinced what I'm seeing is *not* an Apache bug and not inherent to
Solaris 2.6, but rather something specific to the 2.6 image/configuration
I'm using on these machines.  I've run out of ideas about what to look at
next.

On the machines exhibiting the problem, if I configure Apache to bind to
multiple addresses (using either 'BindAddress *' or multiple 'Listen'
directives) I get frequent aborted connections.  When Apache calls
accept(), the call intermittently fails with ECONNABORTED and Apache has
to call accept() again.  This never kills the connection, but it often
causes *huge* lag times before the request completes.
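
For anyone following along, the retry behaviour described above can be
sketched roughly like this.  It is only an illustration, not Apache's
actual code: accept_error_is_transient(), accept_with_retry() and the
max_retries parameter are names made up for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: 1 if an accept() failure is worth retrying. */
int accept_error_is_transient(int err)
{
    switch (err) {
    case ECONNABORTED:  /* connection was aborted before accept() returned */
    case EINTR:         /* call was interrupted by a signal */
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: accept on listen_fd, retrying transient errors
 * up to max_retries times.  Returns the new descriptor, or -1 with
 * errno set by the last failing accept(). */
int accept_with_retry(int listen_fd, struct sockaddr *addr,
                      socklen_t *addrlen, int max_retries)
{
    socklen_t initial_len = addrlen ? *addrlen : 0;
    int attempt;

    for (attempt = 0; attempt <= max_retries; attempt++) {
        int fd;

        if (addrlen)
            *addrlen = initial_len;  /* value-result argument: reset per call */
        fd = accept(listen_fd, addr, addrlen);
        if (fd >= 0)
            return fd;
        if (!accept_error_is_transient(errno))
            return -1;
    }
    return -1;
}
```

A loop like this hides the error from the client, but each retry can
block again in accept(), which would be consistent with the lag I'm
seeing.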

Setting Apache to bind to only one interface solves the problem
completely.  So, presumably it's an issue of non-blocking sockets or
non-serialized accepts?  Apache is compiled with
-DUSE_FCNTL_SERIALIZED_ACCEPT and the exact same build works perfectly on
some of my other Solaris machines (even other 2.6 machines).  So, it's
definitely something specific to these few machines in question.
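
For reference, fcntl-based accept serialization amounts to taking a
whole-file lock on a shared lock file before accepting and releasing it
afterwards.  The sketch below is a simplified illustration under that
assumption, not the Apache source; the function names are hypothetical
and lock_fd is assumed to be a descriptor for a lock file that every
child has open for writing.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: apply a whole-file lock operation
 * (F_WRLCK to lock, F_UNLCK to unlock), waiting if necessary. */
int accept_mutex_op(int lock_fd, short type)
{
    struct flock fl;
    int rc;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */

    do {
        rc = fcntl(lock_fd, F_SETLKW, &fl);   /* blocks until granted */
    } while (rc < 0 && errno == EINTR);       /* restart after a signal */

    return rc;
}

/* Take the lock before poll()/accept() ... */
int accept_mutex_on(int lock_fd)
{
    return accept_mutex_op(lock_fd, F_WRLCK);
}

/* ... and release it once a connection has been accepted. */
int accept_mutex_off(int lock_fd)
{
    return accept_mutex_op(lock_fd, F_UNLCK);
}
```

With a scheme like this, only one child at a time should be sitting in
poll()/accept() across the listening sockets.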

A truss of one of the Apache children while it's exhibiting the
intermittent problem yields:

  [05:51:29] poll(0xEFFFD9E8, 3, -1)                              = 1
  [05:51:31] accept(17, 0xEFFFFA58, 0xEFFFFA54) (sleeping...)
  [05:52:28] accept(17, 0xEFFFFA58, 0xEFFFFA54)           Err#130 ECONNABORTED
  [05:52:28] poll(0xEFFFD9E8, 3, -1)                              = 1
  [05:52:28] accept(16, 0xEFFFFA58, 0xEFFFFA54)           = 3
  [05:52:28] fcntl(20, F_SETLKW, 0x000C4418)                      = 0
  [05:52:28] sigaction(SIGUSR1, 0xEFFFF918, 0xEFFFF998)   = 0
  [05:52:28] getsockname(3, 0xEFFFFA68, 0xEFFFFA54)               = 0
  [05:52:28] setsockopt(3, 6, 1, 0xEFFFF9D4, 4)           = 0
  [05:52:28] read(3, " G E T   / t m p / 1 0 0".., 4096)  = 100

Also, the Apache error log is getting filled with these (at least once a
minute):

  [Mon May  3 12:57:41 1999] [warn] (22)Invalid argument: setsockopt: (TCP_NODELAY)

...indicating other socket-related issues.  Trussing the master daemon
shows setsockopt() returning with EINVAL.  Fun.  :-)
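
The setsockopt(3, 6, 1, ...) line in the truss corresponds to level 6
(IPPROTO_TCP) and option 1 (TCP_NODELAY).  A minimal sketch of that call
with the error reported, in case anyone wants to reproduce it outside
Apache; disable_nagle() is a name made up for this example, not an
Apache function.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP_NODELAY for fd.
 * Returns 0 on success, -1 on failure (errno is left set). */
int disable_nagle(int fd)
{
    int on = 1;

    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
                   (const char *)&on, sizeof on) < 0) {
        int saved = errno;      /* fprintf may clobber errno */

        fprintf(stderr, "setsockopt(TCP_NODELAY) on fd %d: %s (errno %d)\n",
                fd, strerror(saved), saved);
        errno = saved;
        return -1;
    }
    return 0;
}
```

Logging the descriptor and errno together makes it easier to match a
failure against the corresponding accept() in a truss.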

Any thoughts are, of course, appreciated.

--bruno
____________________________________________


 Interweb Ninjaneer       Whack Productions
____________________________________________