NT socket stack bug: Listening socket stops notification?

Post by Rob Becke » Tue, 09 Jul 1996 04:00:00



I've written an FTP server for MS-Windows (it runs on everything from
Win3.1 to NT) named Serv-U, currently used by several thousand people. In
general all works well, except for a few people with a problem that only
happens on NT. Normally I'd assume some peculiar software-software
interaction, but in this case I've seen it happen twice on my own FTP site
(about 500 clients a day). I've used those occasions to log everything,
down to the socket function call level, in an attempt to get some idea of
what's going on. The picture is clear, but the cause is not.

What seems to happen is that the listening socket (on port 21), which is
an asynchronous socket set up to send notification messages on connect,
stops sending messages. Incoming clients still connect, because the stack
handles that by itself, but the connection is never passed on to the FTP
server. All other sockets (even those from the same task) seem to work
normally in the meantime.

I run a watchdog timer, which every 3 minutes makes a 'getsockopt()'
call to see if the listening socket is still listening, and the return
value indicates that as far as the stack is concerned all is well. I also
let it post a fake FD_ACCEPT message (i.e. once every 3 minutes). When the
problem is occurring, the handler for that message invariably finds the
waiting client and handles it as it should, which indicates that the rest
of the program and the message window itself are still functioning
normally. The watchdog also re-arms the listening socket for FD_ACCEPT
through a 'WSAAsyncSelect()' call. However, once the socket stops
responding it does not go back to sending messages on new connections,
despite the async select call.
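
For readers who want to try the first watchdog check themselves, here is a
minimal sketch. The post does not say which option the 'getsockopt()' call
queries, so SO_ACCEPTCONN (which reports whether a socket is in the listening
state) is an assumption. The sketch is in Python rather than against the
Winsock API purely so that it runs anywhere, and the helper name
is_listening is made up for illustration.

```python
import socket


def is_listening(sock):
    """Return True if the stack reports the socket as a listening socket.

    Assumption: SO_ACCEPTCONN is the option being polled; the original
    post only says that 'getsockopt()' is called.
    """
    return bool(sock.getsockopt(socket.SOL_SOCKET, socket.SO_ACCEPTCONN))


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    print(is_listening(s))        # bound, but listen() not called yet
    s.listen(5)
    print(is_listening(s))        # after listen()
    s.close()
```

A check of this kind only says that the socket is still in the listening
state. As described above, it keeps passing while the notifications have
stopped, so it cannot detect the fault by itself.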

I'm also running the program with Borland's CodeGuard checks linked in,
so I'm fairly sure the program internals are OK (i.e. CodeGuard would
report problems like dangling or otherwise messed-up pointers, so the
program should not be getting into some unknown state because of bugs).

That's about the whole story. I'm puzzled by what's happening. There does
not seem to be anything wrong, yet NT messes up the listening socket
somehow. One problem is that I can't detect this from within my program,
so I can't just junk the socket and get a new one.

I'm hoping this rings some bells as to 'known' problems in NT. I'm about
at the end of my rope as far as debugging goes, and all I'm finding is
that all is well within Serv-U, yet the socket goes off the deep end.

The above happens on NT 3.51 SP4. It has never been observed in Win95,
seems to work fine there, and this is with a 32-bit program.

Well, hope the above made some sense, and you have more of a clue as to
what's going on than I have.

Thanks for your help!
Regards,

        Rob
        -/-


well. I'll try to keep up with reading this newsgroup, but E-mail is
faster.

--------- "Save a plant, eat a vegetarian..."  (Rajesh '95) -----------
Rob Beckers is the author of "Serv-U", FTP server for Win3.1, WFW3.11,
Win95 and NT. There are currently well over 2300 registered users, not
counting licenses for multiple copies. You can find more information about
Serv-U at http://CatSoft.dorm.duke.edu
-----------------------------------------------------------------------

 
 
 

NT socket stack bug: Listening socket stops notification?

Post by Alun Jon » Wed, 10 Jul 1996 04:00:00



>What seems to happen is that the listening socket (on port 21), which is
>an asynchronous socket set up to send notification messages on connect,
>stops sending messages. Incoming clients still connect, because the stack
>handles that by itself, but the connection is never passed on to the FTP
>server. All other sockets (even those from the same task) seem to work
>normally in the meantime.

This sounds like a problem people are reporting with WFTPD on NT 3.51 SP4, too
- it sounds unlikely that we've both made the same mistake in programming, and
more likely that this is a bug in NT itself.  It's not the first time
Microsoft have done this, either, which makes it even more likely, to my mind,
that this is a bug in TCP/IP for NT (the first time was in the TCP/IP-32 for
Windows for Workgroups).

I'll see if I can find a quote from one of their readmes describing this
problem in the WFW stack.  I haven't got too much time for testing this at the
moment, but my guess would be that it has something to do with the tinkering
they've done to the listen backlog, which started at 5, increased to 50 in NT
3.1, and then to 100 in 3.5.  It may be that they've done this again, and
screwed something up.

Alun.
~~~~

 
 
 

NT socket stack bug: Listening socket stops notification?

Post by Larry Ka » Fri, 12 Jul 1996 04:00:00





I have never had this problem with any servers I've written, either bind
or syslog. What version of NT is this? Could it be the 4.0 beta II
Workstation, where you have too many connections and they are being denied?

--
_________________________________________________________________________
Larry Kahn            __    __    __    __       Senior Software Engineer
                     /  \  /  \  /  \  /  \       Dynamics Research Corp.
____________________/  __\/  __\/  __\/  __\_____________________________
___________________/  /__/  /__/  /__/  /________________________________

                  \_/   \_/   \_/   \_/   \    o \  
                                           \_____/--<


_________________________________________________________________________

 
 
 

NT socket stack bug: Listening socket stops notification?

Post by Greg Mario » Fri, 12 Jul 1996 04:00:00


I'm not 100% sure this is the problem, but check Article ID: Q152474.
( here's a snip )

Window Socket Application Failure with Connection Reset Event
Article ID: Q152474
Revision Date: 18-JUN-1996

The information in this article applies to:

 - Microsoft Windows NT Workstation versions 3.51
 - Microsoft Windows NT Server versions 3.51

SYMPTOMS

A Windows Socket (WinSock) application fails to receive data from a
remote server. It may display a message indicating that a Connection
Reset message or a WSAECONNRESET event occurred.

CAUSE

Under certain circumstances, a condition exists with Microsoft TCP/IP
where a TCP packet can be sent on a connection that has already closed
causing a Reset packet to be issued. This causes the WinSock application
to fail with a WSAECONNRESET event.



--
==================================================================

Greg Marion                                                   SDRC
SDRC North American Customer Services

 
 
 

NT socket stack bug: Listening socket stops notification?

Post by Alun Jon » Sat, 13 Jul 1996 04:00:00



>I have never had this problem with any servers I've written, either bind
>or syslog. What version of NT is this? Could it be the 4.0 beta II
>Workstation, where you have too many connections and they are being denied?

As I stated (buried) in the previous article, this seems to be specific to NT
3.51 with Service Pack 4 only.  What happens is that we have a socket that we
are listening on, and on which we have called WSAAsyncSelect, asking to be
notified of any new connections.  At some point, we cease to get any
notifications of new incoming connections on that socket, although the client
at the other end gets told that it has a connection.  Since many FTP clients
don't have a timeout implemented at this stage, this can cause big problems
for people running our server.

There is a similar bug that was fixed by Microsoft in TCP/IP-32a for Windows
for Workgroups, and if I could just dig out that readme, I would be able to
give the terms Microsoft used.

Alun.
~~~~

 
 
 

NT socket stack bug: Listening socket stops notification?

Post by Rob Becke » Sun, 14 Jul 1996 04:00:00



>I'm not 100% sure this is problem, but check Article ID: Q152474.

Thanks for the info Greg, but this is not the problem we're experiencing.
What happens is that the stack keeps accepting connections (up until the
listen backlog queue is filled up, but this can be set to 150 connections
in NT 3.51), but the stack never posts an FD_ACCEPT to the window specified
when the socket was put in async mode via 'WSAAsyncSelect()'. No FD_ACCEPT
message means the server is never made aware of the new client.

What makes this problem especially hard to trace is that everything
usually works fine for many days (and many hundreds of FTP clients), and
then suddenly FD_ACCEPT notification stops. So far it has only been observed
on NT 3.51, and Alun says it only happens on SP4. The same code works fine
on Win95.

I've tried using a watchdog timer to check periodically if there are any
pending accepts via a 'select()' call on the listening socket. Since it's
an async socket this should *never* happen unless the notification is
stuck. When pending connections are detected I then kill the listening
socket and get a new one, in the hope that the new socket will work as
advertised (and send notification messages) for at least a while. However,
the 'select()' call does detect the problem but the newly created
listening socket has the same defect. Despite an explicit
'WSAAsyncSelect()' to enable it for FD_ACCEPTS it'll not do so. The whole
task needs to be stopped and restarted to get things back to work.
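
The detection half of this watchdog can be sketched as follows. This is only
an illustration in Python (whose select module wraps the same 'select()'
call), not the Serv-U code, and the helper name has_pending_connection is
hypothetical. It relies on the fact that 'select()' reports a listening
socket as readable while a connection is waiting to be accepted.

```python
import select
import socket


def has_pending_connection(listener, timeout=0.0):
    """Return True if at least one connection is waiting to be accepted.

    select() reports a listening socket as readable while its accept
    queue is non-empty. With timeout=0.0 this is a pure poll.
    """
    readable, _, _ = select.select([listener], [], [], timeout)
    return listener in readable


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(5)

    # A client connects, but nobody calls accept(); this imitates the
    # situation where the FD_ACCEPT notification never arrives.
    client = socket.create_connection(listener.getsockname(), timeout=2)
    print(has_pending_connection(listener, timeout=1.0))
```

In a server that normally accepts as soon as it is notified, a watchdog
would probably want to see a pending connection on two successive checks
before deciding that the notification is stuck, since a single hit could be
a connection whose notification is still on its way. That refinement is a
suggestion of this sketch, not something described in the post.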

Reports so far indicate things work fine on NT 4. So, maybe Micro$oft fixed
this bug there...

        Rob
        -/-

--------- "Save a plant, eat a vegetarian..."  (Rajesh '95) -----------
Rob Beckers            |           All about Serv-U:

Author of "FTP Serv-U" |
FTP-Server for WinSock |     Latest is v2.0b, file SERVU20b.ZIP
-----------------------------------------------------------------------

 
 
 

NT socket stack bug: Listening socket stops notification?

Post by Vadim Lebede » Fri, 19 Jul 1996 04:00:00




Rob,
 I can suggest the following hack:
   You can try periodically connecting to yourself.
   Start a timer and do a connect call from another thread; if an FD_ACCEPT
doesn't arrive within, say, 1 sec, you know that you have an "interesting
situation"...

Vadim.
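
A rough sketch of this self-connect probe, in Python so that it can be run
anywhere. There is no window message queue here, so a threading.Event that
the server's accept handler is assumed to set stands in for the FD_ACCEPT
message; the names probe_listener and notified are made up for illustration.

```python
import socket
import threading


def probe_listener(address, notified, timeout=1.0):
    """Connect to our own listening port and wait for the accept notification.

    `notified` is a threading.Event that the server's accept handler is
    assumed to set; it stands in for the FD_ACCEPT window message.
    Returns True if the notification arrived within `timeout` seconds,
    False if it did not or if the connect itself failed.
    """
    notified.clear()
    try:
        # The probe connection is closed again when the with-block ends.
        with socket.create_connection(address, timeout=timeout):
            return notified.wait(timeout)
    except OSError:
        # Refused or timed-out connects also count as a failed probe.
        return False
```

A False result is the "interesting situation": the stack completed the
connection (or could not), but the server was never told about it. A real
client arriving at the same moment would also set the event, which is
harmless since it equally proves that notifications are flowing.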

 
 
 

NT socket stack bug: Listening socket stops notification?

Post by Rob Becke » Fri, 19 Jul 1996 04:00:00




>Rob,
> I can suggest the following hack:
>   You can try periodically connecting to yourself.
>   Start a timer and do a connect call from another thread; if an FD_ACCEPT
>doesn't arrive within, say, 1 sec, you know that you have an "interesting
>situation"...

>Vadim.

Yeah, that would be one way... But the trick above (using 'select()' on the
listening socket) also works reliably to detect the problem. Doesn't help solve
it though: Getting a new listening socket has the same defects.

The latest thing I tried was using a separate thread for the listening socket,
which blocks on a 'select()'. That thread then posts FD_ACCEPT messages to the
application when it detects that a connection has been made (i.e. the listening
socket is put in async mode via 'WSAAsyncSelect()' but is not set up for any
notification messages). That works for a while, but eventually the whole
application freezes. I'm running it with Borland's 'CodeGuard' debug and check
info linked in, so I'm quite sure there are no messed-up pointers and the like.
So the whole thing freezing is somewhat of a mystery as well, unless the NT
stack doesn't like applications mixing socket types (the listening socket is
used in blocking mode, all other sockets are async).
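
The general shape of that approach, as a hedged Python sketch rather than the
actual Serv-U code: a dedicated thread waits in 'select()' on the listening
socket and hands each new connection to the application. Two things differ
from the description above and are assumptions of this sketch: a queue.Queue
stands in for posting FD_ACCEPT to the window, and the thread calls accept()
itself instead of leaving that to the message handler. The names accept_loop,
handoff and stop are hypothetical.

```python
import queue
import select
import socket
import threading


def accept_loop(listener, handoff, stop, poll_interval=0.5):
    """Wait in select() on the listener and hand new connections to the app.

    `handoff` is a queue.Queue standing in for posting FD_ACCEPT to the
    application's window; `stop` is a threading.Event used to end the loop.
    """
    listener.setblocking(False)   # accept() must never stall this thread
    while not stop.is_set():
        # Wake up every poll_interval seconds so a stop request is noticed.
        readable, _, _ = select.select([listener], [], [], poll_interval)
        if not readable:
            continue
        try:
            conn, peer = listener.accept()
        except BlockingIOError:
            continue              # the client went away before we accepted
        conn.setblocking(True)    # hand over an ordinary blocking socket
        handoff.put((conn, peer))


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(5)
    handoff, stop = queue.Queue(), threading.Event()
    worker = threading.Thread(target=accept_loop,
                              args=(listener, handoff, stop), daemon=True)
    worker.start()

    client = socket.create_connection(listener.getsockname(), timeout=2)
    conn, peer = handoff.get(timeout=2)   # the application side picks it up
    print("accepted connection from", peer)
    stop.set()
    worker.join()
```

Having the thread accept the connection itself avoids a busy loop: if it only
signalled the application, 'select()' would keep reporting the listener as
readable until the application got around to calling accept().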

I think I'll give up on it. I'm convinced the whole thing is an NT bug,
especially after hearing others have the same problem in their own
applications. Let's just hope this is going to be fixed in NT 4. (But then,
yesterday someone told me the 'workstation' version of NT 4 has limitations
built into the stack: it will only allow 10 different IP clients to connect to
the stack in any 10 minute period. Still trying to get confirmation on that
one.)

        Rob
        -/-

 
 
 
