Help: recv takes a long time to time out!

Post by bev.. » Fri, 20 Nov 1998 04:00:00



Hello,

I was wondering if anyone might be able to help me with a TCP/IP problem
that I have. I can also reproduce the problem on Solaris, but for the moment
I'll stick with AIX. The problem is as follows:

My application opens a TCP/IP socket, which it keeps open for the duration
of its execution. It then happily sends and receives data from this socket.
In order to reproduce a customer problem, I remove all network connections
from my computer (i.e., pull out the LAN cable). The application hangs in a
recv call and spends 9 minutes timing out, a little too long for this
particular customer.

The question is: why does it always spend 9 minutes timing out, and is there
a way to reduce this time? I am aware of the 'no' command on AIX to set
various TCP/IP parameters (such as tcp_keepidle) and my initial guess is
that one of these parameters needs to be lowered. But which one? There must
be some reason for the 9 minute time-out value, and I've spent some time
trying to match this value to the various TCP/IP params.

Can anyone shed some light on this problem? I would be very grateful for any
help.

Thanks, Blake Evans-Pritchard

--
All opinions expressed above are my own. They may not necessarily
be those of my employer, IBM.

 
 
 

Post by Dave Marquard » Fri, 20 Nov 1998 04:00:00



> I was wondering if anyone might be able to help me with a TCP/IP problem
> that I have. I can also reproduce the problem on Solaris, but for the moment
> I'll stick with AIX. The problem is as follows:

> My application opens a TCP/IP socket, which it keeps open for the duration
> of its execution. It then happily sends and receives data from this socket.
> In order to reproduce a customer problem, I remove all network connections
> from my computer (i.e., pull out the LAN cable). The application hangs in a
> recv call and spends 9 minutes timing out, a little too long for this
> particular customer.

> The question is: why does it always spend 9 minutes timing out, and is there
> a way to reduce this time? I am aware of the 'no' command on AIX to set
> various TCP/IP parameters (such as tcp_keepidle) and my initial guess is
> that one of these parameters needs to be lowered. But which one? There must
> be some reason for the 9 minute time-out value, and I've spent some time
> trying to match this value to the various TCP/IP params.

> Can anyone shed some light on this problem? I would be very grateful for any
> help.

A 9 to 10 minute retransmit timeout is pretty typical for most TCP
implementations. The network options you want to play with to reduce this
timeout are:

                  rto_low = 1
                 rto_high = 64
                rto_limit = 7
               rto_length = 13

The explanation of HOW to use these in the "no" man page is pretty
sketchy, and I didn't get any other hits when I did a search of all
the online books.  When the system boots, we look at the rto_* network
options in order to calculate the multipliers for the retransmission
timers.  The first timer is set to rto_low * the round trip time
(RTT), and we increase the multiplier exponentially over 7 steps
(rto_limit) to get to 64 * RTT (rto_high).  We run 13 timers in total
(rto_length), with the later ones staying at 64 * RTT, before we give
up.

So, you can do the math to figure out how long it will be before we
time out:

RTT + 2 RTT + 4 RTT + 8 RTT + 16 RTT + 32 RTT + 64 RTT + 64 RTT +
64 RTT + 64 RTT + 64 RTT + 64 RTT + 64 RTT = 511 RTT

Since you saw about 9 minutes before timing out, that's 540 seconds,
so we measure the RTT at a little more than a second (540 / 511 is
about 1.06).
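
The arithmetic above can be checked with a short Python sketch. It is not
the kernel's actual algorithm: it assumes the multiplier simply doubles at
each step and is capped at rto_high, which reproduces the 1, 2, 4, ... 64,
64, ... sequence for the default values. The function name rto_multipliers
is made up for this illustration.

```python
def rto_multipliers(rto_low=1, rto_high=64, rto_length=13):
    """Return the RTT multiplier used for each retransmission timer.

    Assumption: the multiplier doubles every step and is capped at
    rto_high. With the defaults this reaches 64 on the seventh timer,
    which matches rto_limit = 7 in the post.
    """
    return [min(rto_low * 2 ** step, rto_high) for step in range(rto_length)]


multipliers = rto_multipliers()
total = sum(multipliers)           # total timeout, in units of RTT
observed_seconds = 9 * 60          # the "about 9 minutes" reported above
implied_rtt = observed_seconds / total

print(multipliers)
print("total multiplier:", total)
print("implied RTT in seconds:", round(implied_rtt, 3))
```

With the default options this gives 13 timers summing to 511 RTT.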

So, if you want to time out sooner, you could adjust any of these.  I
would suggest playing with rto_length.  Figure that each time you
reduce rto_length by 1, you drop one 64 * RTT timer, which reduces the
timeout by about a minute.
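
To get a feel for the effect of rto_length, here is a rough Python
estimate. It assumes the same doubling-then-capped multiplier sequence as
in the sum above and an RTT of 540 / 511 seconds, which is only what the
9-minute observation implies; real timeouts depend on the RTT the stack
actually measures. The helper name total_multiplier is hypothetical.

```python
def total_multiplier(rto_length, rto_low=1, rto_high=64):
    """Sum of RTT multipliers over rto_length timers (doubling, capped)."""
    return sum(min(rto_low * 2 ** step, rto_high) for step in range(rto_length))


# Assumption: RTT implied by a 540-second timeout with the default options.
RTT_SECONDS = 540 / 511

# Estimated total timeout in seconds for rto_length = 13 down to 7.
estimates = {n: total_multiplier(n) * RTT_SECONDS for n in range(13, 6, -1)}

for n, seconds in estimates.items():
    print(f"rto_length={n:2d}  about {seconds / 60:.1f} min")
```

Each step down from 13 removes one 64 * RTT timer, so the estimate falls
by a little over a minute per step until rto_length reaches 7.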

Note that you must set these network options BEFORE the netinet kernel
extension is loaded.  This means you should put them at the top of
/etc/rc.net, before rc.net runs /usr/lib/methods/defif.

-Dave

 
 
 

Post by bev.. » Tue, 24 Nov 1998 04:00:00


Thanks for the help. By fiddling with the suggested values, I managed to
get the timeout from 9 mins down to 3 mins. The only problem is now working
out the equivalent values to change on Solaris...

Many thanks, Blake Evans-Pritchard
