Sockets get stuck forever in FIN_WAIT_1

Post by Mohan Myso » Mon, 01 Oct 2001 06:35:26



I'm running into a very painful bug.  After running a stress test, I
find that around 15 sockets are stuck in the FIN_WAIT_1 state, and each
one of them has several thousand bytes stuck in its Send-Q.  This
exhausts the mbufs in the stack's data pool.  These sockets never
recover, and the only way to get the system running again is to reboot it.

For 1-3 hours, this test runs normally, with only a reasonable number
of clusters in use.  Then, all of a sudden, almost all of the
clusters are gone, leaving 70 mBlks ( = total # mBlks - total #
clusters).  Below are the socket and mbuf statistics.  It doesn't
seem to me that simply increasing the number of mBlks/clBlks/clusters
in the pNetDpool would fix this problem.

I use VxWorks, which is a BSD derivative... does anyone know of a
similar bug, or better still, a bug fix in *BSD?  I have a strong
hunch that this might be an old BSD bug... I've seen several similar
reports on the web (in VxWorks, FreeBSD, NetBSD, BSD4.?), but have
seen no solution.  I even applied Steven's fix for a similar problem
(http://mail-index.netbsd.org/netbsd-bugs/1996/04/16/0004.html) to get
the Persist Timer going, but that doesn't seem to help.

Mohan.

---------------------------------
Active Internet connections (including servers)
PCB      Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
-------- ----- ------ ------  ------------------ ------------------ ----------
3e4ff7c  TCP        0    324  64.209.75.224.3111 192.168.15.158.130 FIN_WAIT_1
3e4fb5c  TCP        0   3472  64.209.75.224.3111 192.168.9.156.1405 FIN_WAIT_1
3e4f634  TCP        0   3472  64.209.75.224.3111 192.168.16.152.131 FIN_WAIT_1
3e4fbe0  TCP        0   3472  64.209.75.224.3111 192.168.14.91.1352 FIN_WAIT_1
3e4f7c0  TCP        0   3472  64.209.75.224.3111 192.168.16.153.131 FIN_WAIT_1
3e4f9d0  TCP        0   3472  64.209.75.224.3111 192.168.11.95.1236 FIN_WAIT_1
3e4f94c  TCP        0   3472  64.209.75.224.3111 192.168.9.154.1402 FIN_WAIT_1
3e4fef8  TCP        0   3292  64.209.75.224.1111 192.168.13.156.132 FIN_WAIT_1
3e5018c  TCP        0   3288  64.209.75.224.1111 192.168.15.156.130 FIN_WAIT_1
3e4f8c8  TCP        0   3472  64.209.75.224.3111 192.168.10.154.130 FIN_WAIT_1
3e4f52c  TCP        0   3320  64.209.75.224.1111 192.168.9.93.1392  FIN_WAIT_1
3e50108  TCP        0   7094  64.209.75.224.1111 192.168.11.150.123 FIN_WAIT_1
3e4f3a0  TCP        0   7109  64.209.75.224.1111 192.168.15.92.1300 FIN_WAIT_1
3e50084  TCP        0   7095  64.209.75.224.1111 192.168.13.149.133 FIN_WAIT_1
3e4fa54  TCP        0   3162  64.209.75.224.1111 192.168.15.155.129 FIN_WAIT_1
3e4f844  TCP        0   3306  64.209.75.224.1111 192.168.12.151.125 FIN_WAIT_1
3e4f10c  TCP        0      0  0.0.0.0.3111       0.0.0.0.0          LISTEN
3e4f088  TCP        0      0  0.0.0.0.2111       0.0.0.0.0          LISTEN
3e4f004  TCP        0      0  0.0.0.0.1111       0.0.0.0.0          LISTEN
3e4ef80  TCP        0      0  0.0.0.0.23         0.0.0.0.0          LISTEN
3e4eefc  TCP        0      0  0.0.0.0.301        0.0.0.0.0          LISTEN
3e4ecec  TCP        0      0  0.0.0.0.80         0.0.0.0.0          LISTEN
3e4ebe4  TCP        0      0  0.0.0.0.21         0.0.0.0.0          LISTEN
3e4f190  UDP        0      0  0.0.0.0.0          0.0.0.0.0
3e4ee78  UDP        0      0  0.0.0.0.1025       0.0.0.0.0
3e4edf4  UDP        0      0  0.0.0.0.1024       0.0.0.0.0
3e4ed70  UDP        0      0  0.0.0.0.161        0.0.0.0.0
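
To put a number on "several thousand bytes stuck" per socket, the
Send-Q values of the FIN_WAIT_1 entries in the listing above can be
tallied.  This is only a rough sketch: the byte counts are copied by
hand from the listing, and the names send_q, stuck_count and
stuck_bytes are made up for illustration, not part of VxWorks or BSD.

```c
#include <stddef.h>

/* Send-Q byte counts of the FIN_WAIT_1 sockets, copied by hand from
 * the netstat listing in the post, in the order they appear. */
const unsigned send_q[] = {
    324,  3472, 3472, 3472, 3472, 3472, 3472, 3292,
    3288, 3472, 3320, 7094, 7109, 7095, 3162, 3306
};

/* Number of sockets in the listing that are stuck in FIN_WAIT_1. */
size_t stuck_count(void)
{
    return sizeof send_q / sizeof send_q[0];
}

/* Total bytes still queued for sending across those sockets. */
unsigned long stuck_bytes(void)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < stuck_count(); i++)
        sum += send_q[i];
    return sum;
}
```

With the values above, stuck_count() gives 16 sockets and stuck_bytes()
gives 62294 bytes of unsent data that the stack keeps holding on to.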

---------------------------------

type        number
---------   ------
FREE    :     70
DATA    :    289
HEADER  :     41
SOCKET  :      0
PCB     :      0
RTABLE  :      0
HTABLE  :      0
ATABLE  :      0
SONAME  :      0
ZOMBIE  :      0
SOOPTS  :      0
FTABLE  :      0
RIGHTS  :      0
IFADDR  :      0
CONTROL :      0
OOBDATA :      0
IPMOPTS :      0
IPMADDR :      0
IFMADDR :      0
MRTABLE :      0
TOTAL   :    400
number of mbufs: 400
number of times failed to find space: 53507711
number of times waited for space: 0
number of times drained protocols for space: 65955681
__________________
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size     clusters  free      usage
-------------------------------------------------------------------------------
64       100       0         365942
128      100       0         350230
256      40        0         185861
512      40        0         117064
1024     25        0         63172
2048     25        0         6072
-------------------------------------------------------------------------------
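
The relation quoted in the post (free mBlks = total # mBlks - total #
clusters) can be checked against the two tables above.  A minimal
sketch, assuming the pool sizes and counts exactly as printed; the
struct and function names are hypothetical and exist only for this
check.

```c
#include <stddef.h>

/* One row of the cluster pool table: cluster size in bytes and the
 * number of clusters configured for that size. */
struct pool {
    unsigned size;
    unsigned count;
};

/* Values copied from the CLUSTER POOL TABLE in the post. */
const struct pool pools[] = {
    {64, 100}, {128, 100}, {256, 40}, {512, 40}, {1024, 25}, {2048, 25}
};

/* "number of mbufs" from the mbuf statistics in the post. */
#define TOTAL_MBLKS 400u

/* Total number of clusters across all pools. */
unsigned total_clusters(void)
{
    unsigned n = 0;
    for (size_t i = 0; i < sizeof pools / sizeof pools[0]; i++)
        n += pools[i].count;
    return n;
}

/* Total payload capacity of all clusters, in bytes. */
unsigned long total_cluster_bytes(void)
{
    unsigned long bytes = 0;
    for (size_t i = 0; i < sizeof pools / sizeof pools[0]; i++)
        bytes += (unsigned long)pools[i].size * pools[i].count;
    return bytes;
}

/* mBlks left over once every cluster is attached to an mBlk. */
unsigned mblks_without_cluster(void)
{
    return TOTAL_MBLKS - total_clusters();
}
```

This gives 330 clusters (126720 bytes of capacity in total) and
400 - 330 = 70 mBlks left over, which matches the FREE count in the
mbuf statistics: every cluster is in use, and the 70 free mBlks have
no cluster left to attach to.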

 
 
 

1. TCP SYN_RECV state: stuck forever in accept() ?

 I have been reading the kernel TCP/IP code, and can't find
a way for a process that has done a blocking accept() call
to get out of the SYN_RECV state.

 Initially, the state is set to LISTEN, and then when a
SYN is received in this state, the tcp_conn_request() routine
sets the state for the new socket to SYN_RECV.

 However, the tcp_recv() routine (in 0.99.9) does not have a
case for the SYN_RECV state except for the default case, which doesn't
seem to do the right thing.

 Can anyone explain to me how the ESTABLISHED state is
reached from the SYN_RECV state? Is this a bug? Am I
missing something basic?

Peter.

2. help io-ports !!

3. Unkillable processes stuck in "D" state running forever

4. kiosk mode for NCD X terminals

5. HELP: fetchmail gets stuck getting mail over PPP

6. Route table size

7. Why sendto() on UDP socket can block forever

8. CD Writer: needs WIN95+OSR2???

9. TCP socket state in LAST_ACK forever?

10. close() on a socket blocks forever

11. help urgent! Spark 10 after rebooting gets stuck..

12. Network card gets stuck after some time of high traffic load

13. Computer gets stuck at boot up, only "LI" of the LILO prompt shows