Am I right in thinking the KEEPALIVE option is supposed to prevent
this? Just how long is the time out value in KEEPALIVE? (I waited
>10 minutes.)
An extract from /usr/include/netinet/tcp_timer.h (SunOS 4.1.something):
* The TCPT_KEEP timer is used to keep connections alive. If a
* connection is idle (no segments received) for TCPTV_KEEP_INIT amount of time,
* but not yet established, then we drop the connection. Once the connection
* is established, if the connection is idle for TCPTV_KEEP_IDLE time
* (and keepalives have been enabled on the socket), we begin to probe
* the connection. We force the peer to send us a segment by sending:
* <SEQ=SND.UNA-1><ACK=RCV.NXT><CTL=ACK>
* This segment is (deliberately) outside the window, and should elicit
* an ack segment in response from the peer. If, despite the TCPT_KEEP
* initiated segments we cannot elicit a response from a peer in TCPT_MAXIDLE
* amount of time probing, then we drop the connection.
*/
[...]
#define TCPTV_KEEP_INIT ( 75*PR_SLOWHZ)     /* initial connect keep alive */
#define TCPTV_KEEP_IDLE (120*60*PR_SLOWHZ)  /* dflt time before probing */
#define TCPTV_KEEPINTVL ( 75*PR_SLOWHZ)     /* default probe interval */
#define TCPTV_KEEPCNT   8                   /* max probes before drop */
So (if I understand this properly, which is a dubious proposition!) it
waits for two hours before starting keepalive pinging, then gives up
after 8*75 seconds (10 minutes) of keepalive probing.
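As a sanity check on that arithmetic, here is a small C sketch that
converts the tick counts above into seconds. It assumes PR_SLOWHZ is 2
(slow-timer ticks per second), which is the usual value in BSD-derived
<sys/protosw.h> but is not shown in the extract, so check your own
headers. The function names are made up for illustration.

```c
#include <stdio.h>

/* ASSUMPTION: 2 slow-timer ticks per second; not part of the extract. */
#define PR_SLOWHZ        2

/* Values copied from the tcp_timer.h extract above. */
#define TCPTV_KEEP_IDLE  (120*60*PR_SLOWHZ)  /* dflt time before probing */
#define TCPTV_KEEPINTVL  ( 75*PR_SLOWHZ)     /* default probe interval */
#define TCPTV_KEEPCNT    8                   /* max probes before drop */

/* Seconds a connection must be idle before the first probe is sent. */
long keep_idle_secs(void)
{
    return TCPTV_KEEP_IDLE / PR_SLOWHZ;
}

/* Seconds spent probing a silent peer before the connection is dropped. */
long keep_probe_secs(void)
{
    return (long)TCPTV_KEEPCNT * TCPTV_KEEPINTVL / PR_SLOWHZ;
}

/* Worst case: the full idle period, then every probe goes unanswered. */
long keep_total_secs(void)
{
    return keep_idle_secs() + keep_probe_secs();
}

/* Hypothetical helper that prints the three figures in minutes. */
void print_keepalive_timing(void)
{
    printf("idle before probing: %ld min\n", keep_idle_secs() / 60);
    printf("probing before drop: %ld min\n", keep_probe_secs() / 60);
    printf("worst case total:    %ld min\n", keep_total_secs() / 60);
}
```

Note that the tick rate cancels out when converting to seconds, so the
two-hour and ten-minute figures do not depend on the PR_SLOWHZ guess.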
In the case of a rebooted machine, the keepalive will kill the
connection on the first keepalive ping because the remote machine will
send a RST in response to the keepalive packet.
So the answer is you have to wait two hours or a bit more. If more
rapid response to crashed machines is needed, you will need to do that
in the application. One of our applications here writes a single
space character to a (non-blocking) socket that has been idle for 1
minute. Either this elicits an RST (reflected as ECONNRESET?) if the
machine has rebooted, or returns EWOULDBLOCK if the remote machine has
been down and the TCP buffers are full. The remote end ignores the
space character (it is a broadcast data feed), so we get keepalive
functionality with programmable one-minute resolution.
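
A rough sketch of that kind of application-level probe, using only
standard POSIX calls, might look like the following. This is not our
actual code: the names probe_peer, make_nonblocking and the
probe_result values are invented for illustration, and the idle timer
that decides when to call probe_peer() is left to the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes for one probe attempt. */
enum probe_result {
    PROBE_SENT,     /* the byte was accepted by the local TCP */
    PROBE_BLOCKED,  /* send buffer full: the peer is not draining data */
    PROBE_DEAD      /* any other outcome, e.g. ECONNRESET or EPIPE */
};

/* Put the socket into non-blocking mode; returns 0 on success, -1 on error. */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Write one space to an idle, non-blocking socket and classify the outcome. */
enum probe_result probe_peer(int fd)
{
    ssize_t n;

    /* Writing to a broken connection raises SIGPIPE by default;
       ignore it so that write() reports the error through errno. */
    signal(SIGPIPE, SIG_IGN);

    do {
        n = write(fd, " ", 1);
    } while (n == -1 && errno == EINTR);   /* retry if interrupted */

    if (n == 1)
        return PROBE_SENT;
    if (n == -1 && (errno == EWOULDBLOCK || errno == EAGAIN))
        return PROBE_BLOCKED;
    return PROBE_DEAD;
}
```

One caveat: write() on a TCP socket only queues the byte locally, so a
reset from a rebooted peer would normally show up on a later call
rather than on the write that provoked it. The caller should treat
PROBE_SENT as "no error seen yet" and keep probing on each idle period.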
Greg.
--
Knox's 386 is slick. Fox in Sox, on Knox's Box
Knox's box is very quick. Plays lots of LSL. He's sick!
(Apologies to John "Iron Bar" Mackin.)