Odd TCP client/server throughput problem

Post by Richard Eich » Sun, 26 Nov 2006 04:03:28



Client and Server OS: SuSE 9.3 Pro 2.6.11 default kernel, neither
machine patched after install from CDs.

Communicating over:     GigE LAN

1) Server (receiver) is consistently advertising a TCP RWIN of 32K.
2) Server consistently has TCP RECV-Q of 0.
3) Client (sender) consistently shows a TCP SEND-Q of 80K.
4) Socket is up and connection is ESTABLISHED from both sides.
5) No data is transmitted.

To troubleshoot, I've torn down and re-established the connection
countless times.  There may be a trickle of data initially, but
within a few seconds the client SEND-Q builds and transmission stops.  
Receiver's window size never goes below 32K.

Never seen this kind of behavior before.  If the server process was
slow, I'd expect to see a RECV-Q buildup to go with the big SEND-Q.
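
For reference, the Recv-Q and Send-Q figures above are the per-socket counters netstat shows. For an established TCP socket on Linux, Recv-Q is data received but not yet read by the application, and Send-Q is data written by the application but not yet acknowledged by the peer, so a full Send-Q on one end with an empty Recv-Q on the other suggests segments are not getting through rather than a slow reader. A program can read the same counters itself with the SIOCINQ and SIOCOUTQ ioctls from tcp(7); this sketch uses their numeric synonyms FIONREAD and TIOCOUTQ from <sys/ioctl.h>. It is Linux-specific, and `queue_depths` and `log_queue_depths` are hypothetical helper names, not part of the poster's program.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/ioctl.h>   /* FIONREAD (== SIOCINQ), TIOCOUTQ (== SIOCOUTQ) */

/* Reads the per-socket counters behind netstat's Recv-Q and Send-Q for an
 * established TCP socket. Linux-specific. Returns 0 on success, or -1 with
 * errno set (for example EINVAL on a listening socket). */
int queue_depths(int fd, int *recv_q, int *send_q)
{
    if (ioctl(fd, FIONREAD, recv_q) < 0)   /* bytes received, not yet read */
        return -1;
    if (ioctl(fd, TIOCOUTQ, send_q) < 0)   /* bytes written, not yet ACKed */
        return -1;
    return 0;
}

/* Logs both counters; intended to be called periodically on each end. */
void log_queue_depths(const char *tag, int fd)
{
    int rq, sq;

    if (queue_depths(fd, &rq, &sq) == 0)
        fprintf(stderr, "%s: recv-q=%d send-q=%d\n", tag, rq, sq);
    else
        perror("ioctl(SIOCINQ/SIOCOUTQ)");
}
```

Calling log_queue_depths() on both ends while reproducing the stall should report the same numbers netstat does, but from inside the programs themselves.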

 
 
 

Post by Maxim Yegorushkin » Sun, 26 Nov 2006 04:24:06



I would use tcpdump or wireshark/ethereal to see what is being sent and
received.

 
 
 

Post by Richard Eich » Sun, 26 Nov 2006 04:33:15




> I would use tcpdump or wireshark/ethereal to see what is being sent and
> received.

How do you think I know the TCP RWIN size for the server?
 
 
 

Post by Maxim Yegorushkin » Sun, 26 Nov 2006 04:37:34





> How do you think I know the TCP RWIN size for the server?

So, what does tcpdump show you before the transfer stops?
 
 
 

Post by Maxim Yegorushkin » Sun, 26 Nov 2006 04:44:14





> How do you think I know the TCP RWIN size for the server?

Is this the same problem?
http://groups.google.com/group/comp.databases.mysql/msg/9555a65936967a36
 
 
 

Post by Simple Simo » Sun, 26 Nov 2006 04:44:20






> So, what does tcpdump shows you before the transfer stops?

SYN =>
SYN ACK <=
ACK =>

then either nothing, or

Packet =>
ACK <=       RWIN 32K        // immediate reply
Packet =>
ACK <=       RWIN 32K        // immediate reply
...
nothing.  No shrinking window, no slowdown in the rate of ACKs.

--
Taxes are not "punishment for success".  Nor are they "theft".  Taxes
are a royalty paid commensurate to the economic benefit obtained from
a shared socio-economic system.

"Those who gain the benefit should also bear the disadvantage."
                                                   - Common Law maxim

 
 
 

Post by Richard Eich » Sun, 26 Nov 2006 04:49:24






> Is this the same problem?
> http://groups.google.com/group/comp.databases.mysql/msg/9555a65936967a36

Yes, except that we now have a minimal case (described above) that
exhibits the behavior without the factors that we have since learned
are apparently red herrings.
 
 
 

Post by Maxim Yegorushkin » Sun, 26 Nov 2006 05:39:12







> SYN =>
> SYN ACK <=
> SYN ACK =>

> then either nothing, or

> Packet =>
> ACK <=  RWIN 32K        // immediate reply
> Packet =>
> ACK <= RWIN 32K // immediate reply
> ...
> nothing.  No shrinking window, no slowdown in the rate of ACKs.

The actual tcpdump output, rather than a summary, might provide more details.

strace'ing the sender while reproducing the problem may give you a
clue. You may like to find out what the sending thread is doing or
where it is blocked when the data transfer stops.
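
Alongside strace, the sender's kernel-side TCP state can be sampled from inside the client with getsockopt(TCP_INFO). A rough sketch, assuming Linux and glibc's `struct tcp_info` from <netinet/tcp.h>; `tcp_snapshot` is a hypothetical helper name, and not every field is filled in on every kernel version.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>    /* IPPROTO_TCP */
#include <netinet/tcp.h>   /* struct tcp_info, TCP_INFO (Linux/glibc) */

/* Fills *ti for socket fd and prints a one-line summary to stderr.
 * tcpi_retransmits: consecutive timeout retransmissions in progress;
 * tcpi_total_retrans: segments retransmitted over the connection;
 * tcpi_rto: current retransmission timeout in microseconds. */
int tcp_snapshot(int fd, struct tcp_info *ti)
{
    socklen_t len = sizeof *ti;

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, ti, &len) < 0) {
        perror("getsockopt(TCP_INFO)");
        return -1;
    }
    fprintf(stderr,
            "state=%u unacked=%u lost=%u retransmits=%u total_retrans=%u "
            "rto=%uus cwnd=%u\n",
            (unsigned)ti->tcpi_state, ti->tcpi_unacked, ti->tcpi_lost,
            (unsigned)ti->tcpi_retransmits, ti->tcpi_total_retrans,
            ti->tcpi_rto, ti->tcpi_snd_cwnd);
    return 0;
}
```

If total_retrans keeps rising and rto backs off while Send-Q stays full, the kernel is retransmitting and the segments are being lost somewhere; if the counters stay flat, the sender is apparently not retransmitting at all.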

 
 
 

Post by Richard Eich » Mon, 27 Nov 2006 09:57:22


maxim.yegorush...@gmail.com wrote...

> So, what does tcpdump shows you before the transfer stops?

Looks like we're getting a lot of retransmissions, which I believe
would clearly explain the full sender tcp send-q, the receiver's empty
tcp recv-q, and the receiver's normal receive window size.

(Relative sequence and ACK numbers used for clarity.)

|Time     | 192.168.75.100      | 192.168.76.15   |
|0.000    |         PSH, ACK - Len: 4             |Seq = 0 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.000    |         PSH, ACK - Len: 1448          |Seq = 4 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.000    |         ACK       |                   |Seq = 0 Ack = 4
|         |(43349)  <------------------  (9550)   |
|0.000    |         ACK - Len: 1448               |Seq = 1452 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.000    |         ACK - Len: 1448               |Seq = 2900 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.001    |         ACK       |                   |Seq = 0 Ack = 1452
|         |(43349)  <------------------  (9550)   |
|0.001    |         ACK - Len: 1448               |Seq = 4348 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.002    |         ACK       |                   |Seq = 0 Ack = 2900
|         |(43349)  <------------------  (9550)   |
|0.002    |         PSH, ACK - Len: 1448          |Seq = 5796 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.002    |         ACK       |                   |Seq = 0 Ack = 4348
|         |(43349)  <------------------  (9550)   |
|0.002    |         ACK - Len: 1448               |Seq = 7244 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.002    |         ACK       |                   |Seq = 0 Ack = 5796
|         |(43349)  <------------------  (9550)   |
|0.003    |         ACK - Len: 1448               |Seq = 8692 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.003    |         ACK - Len: 1448               |Seq = 10140 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.003    |         ACK       |                   |Seq = 0 Ack = 7244
|         |(43349)  <------------------  (9550)   |
|0.003    |         ACK - Len: 1448               |Seq = 11588 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.003    |         ACK       |                   |Seq = 0 Ack = 8692
|         |(43349)  <------------------  (9550)   |
|0.003    |         ACK - Len: 1448               |Seq = 13036 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.004    |         ACK       |                   |Seq = 0 Ack = 10140
|         |(43349)  <------------------  (9550)   |
|0.004    |         PSH, ACK - Len: 1448          |Seq = 14484 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.004    |         ACK       |                   |Seq = 0 Ack = 10140
|         |(43349)  <------------------  (9550)   |
|0.004    |         ACK - Len: 1448               |Seq = 15932 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.004    |         ACK       |                   |Seq = 0 Ack = 10140
|         |(43349)  <------------------  (9550)   |
|0.004    |         ACK - Len: 1448               |Seq = 17380 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.005    |         ACK       |                   |Seq = 0 Ack = 10140
|         |(43349)  <------------------  (9550)   |
|0.005    |         ACK - Len: 1448               |Seq = 10140 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.005    |         ACK       |                   |Seq = 0 Ack = 10140
|         |(43349)  <------------------  (9550)   |
|0.006    |         ACK       |                   |Seq = 0 Ack = 14484
|         |(43349)  <------------------  (9550)   |
|0.006    |         ACK - Len: 1448               |Seq = 18828 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.007    |         ACK       |                   |Seq = 0 Ack = 14484
|         |(43349)  <------------------  (9550)   |
|0.007    |         PSH, ACK - Len: 1448          |Seq = 14484 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.008    |         ACK       |                   |Seq = 0 Ack = 20276
|         |(43349)  <------------------  (9550)   |
|0.008    |         ACK - Len: 1448               |Seq = 20276 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.010    |         ACK       |                   |Seq = 0 Ack = 21724
|         |(43349)  <------------------  (9550)   |
|0.010    |         ACK - Len: 1448               |Seq = 21724 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.010    |         ACK - Len: 1448               |Seq = 23172 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.011    |         ACK       |                   |Seq = 0 Ack = 23172
|         |(43349)  <------------------  (9550)   |
|0.011    |         ACK - Len: 1448               |Seq = 24620 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.011    |         ACK - Len: 1448               |Seq = 26068 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.011    |         ACK       |                   |Seq = 0 Ack = 24620
|         |(43349)  <------------------  (9550)   |
|0.011    |         ACK - Len: 1448               |Seq = 27516 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.012    |         ACK       |                   |Seq = 0 Ack = 24620
|         |(43349)  <------------------  (9550)   |
|0.012    |         ACK - Len: 1448               |Seq = 28964 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.012    |         ACK       |                   |Seq = 0 Ack = 24620
|         |(43349)  <------------------  (9550)   |
|0.012    |         ACK - Len: 1448               |Seq = 30412 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.013    |         ACK       |                   |Seq = 0 Ack = 24620
|         |(43349)  <------------------  (9550)   |
|0.013    |         ACK - Len: 1448               |Seq = 24620 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.014    |         ACK       |                   |Seq = 0 Ack = 28964
|         |(43349)  <------------------  (9550)   |
|0.216    |         ACK - Len: 1448               |Seq = 28964 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.218    |         ACK       |                   |Seq = 0 Ack = 31860
|         |(43349)  <------------------  (9550)   |
|0.218    |         ACK - Len: 1448               |Seq = 31860 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.218    |         ACK - Len: 1448               |Seq = 33308 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.219    |         ACK       |                   |Seq = 0 Ack = 33308
|         |(43349)  <------------------  (9550)   |
|0.219    |         ACK - Len: 1448               |Seq = 34756 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.219    |         ACK - Len: 1448               |Seq = 36204 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.219    |         ACK       |                   |Seq = 0 Ack = 34756
|         |(43349)  <------------------  (9550)   |
|0.219    |         ACK - Len: 1448               |Seq = 37652 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.220    |         ACK       |                   |Seq = 0 Ack = 34756
|         |(43349)  <------------------  (9550)   |
|0.220    |         ACK - Len: 1448               |Seq = 39100 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.220    |         ACK       |                   |Seq = 0 Ack = 34756
|         |(43349)  <------------------  (9550)   |
|0.220    |         ACK - Len: 1448               |Seq = 40548 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.221    |         ACK       |                   |Seq = 0 Ack = 34756
|         |(43349)  <------------------  (9550)   |
|0.221    |         ACK - Len: 1448               |Seq = 34756 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.222    |         ACK       |                   |Seq = 0 Ack = 39100
|         |(43349)  <------------------  (9550)   |
|0.424    |         ACK - Len: 1448               |Seq = 39100 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.426    |         ACK       |                   |Seq = 0 Ack = 41996
|         |(43349)  <------------------  (9550)   |
|0.426    |         ACK - Len: 1448               |Seq = 41996 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.426    |         ACK - Len: 1448               |Seq = 43444 Ack = 0
|         |(43349)  ------------------>  (9550)   |
|0.427    |         ACK       |      
...
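
Reading the rows above: between 0.004 and 0.005 the receiver sends three duplicate ACKs for 10140 before 10140 is resent, the usual fast-retransmit pattern, while the pauses from 0.014 to 0.216 and from 0.222 to 0.424 before 28964 and 39100 are resent look consistent with a retransmission timeout at Linux's 200 ms minimum RTO. To spot such events in a longer capture, a rough C sketch can replay the rows. The `struct seg` layout and the `analyse` function are hypothetical, and the classification is a simplification that ignores sequence wraparound and SACK.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One row of a trace like the one above, with relative sequence numbers. */
struct seg {
    double   t;          /* seconds from the first captured segment */
    int      to_server;  /* 1: 43349 -> 9550 (data), 0: 9550 -> 43349 */
    uint32_t seq;
    uint32_t ack;
    uint32_t len;        /* payload bytes; 0 for a pure ACK */
};

struct trace_stats {
    unsigned retransmits;  /* data segments that resend bytes already sent */
    unsigned dup_acks;     /* pure ACKs repeating the previous ACK number */
};

/* Replays the rows in order. A data segment starting below the highest
 * byte already sent counts as a retransmission; a pure ACK whose ACK
 * number equals the previous ACK's counts as a duplicate. */
struct trace_stats analyse(const struct seg *s, size_t n)
{
    struct trace_stats st = {0, 0};
    uint32_t next_seq = 0;   /* one past the highest byte sent so far */
    uint32_t last_ack = 0;
    int have_ack = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i].to_server && s[i].len > 0) {
            if (s[i].seq < next_seq) {
                st.retransmits++;
                printf("%.3f  retransmission of seq %u\n",
                       s[i].t, (unsigned)s[i].seq);
            }
            if (s[i].seq + s[i].len > next_seq)
                next_seq = s[i].seq + s[i].len;
        } else if (!s[i].to_server && s[i].len == 0) {
            if (have_ack && s[i].ack == last_ack)
                st.dup_acks++;
            last_ack = s[i].ack;
            have_ack = 1;
        }
    }
    return st;
}
```

Fed the rows from 0.003 to 0.007, it flags the resends of 10140 (at 0.005) and 14484 (at 0.007) and counts five duplicate ACKs.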


 
 
 

Post by Maxim Yegorushkin » Mon, 27 Nov 2006 23:34:16







> Looks like we're getting a lot of retransmissions, which I believe
> would clearly explain the full sender tcp send-q, the receiver empty
> tcp recv-q, and the receiver's normal receive window size.

> (Relative ACKs used for clarity).

[]

The final six tcp segments show that the last two 1448-byte segments
remain unacknowledged. The sender's tcp stack must retransmit those
segments, as it did earlier in the packet dump. The only explanation
I can think of for why we cannot see the retransmissions in the dump is
that they never get to the network interface egress queue, for example
because that queue is full when the network is congested or when the
driver is misbehaving. You may like to check the ifconfig output,
particularly whether the errors, dropped and collisions fields are
non-zero for the related interface.
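
The error, drop and collision counters that ifconfig prints are also exposed in /proc/net/dev, one line per interface after two header lines. A small sketch for pulling them out programmatically; the struct and function names are hypothetical, and only the fields relevant here are kept.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>

/* Selected counters from one interface line of /proc/net/dev. Each line
 * reads "iface: <8 receive fields> <8 transmit fields>"; receive is
 * bytes packets errs drop fifo frame compressed multicast, transmit is
 * bytes packets errs drop fifo colls carrier compressed. */
struct netdev_stats {
    char name[32];
    unsigned long long rx_packets, rx_errs, rx_drop;
    unsigned long long tx_packets, tx_errs, tx_drop, tx_colls;
};

/* Returns 0 if line is an interface line, -1 otherwise (e.g. headers). */
int parse_netdev_line(const char *line, struct netdev_stats *st)
{
    const char *colon = strchr(line, ':');
    const char *p = line;
    unsigned long long skip;   /* fields this sketch does not keep */
    size_t n;

    if (colon == NULL)
        return -1;
    while (*p == ' ')          /* names are right-aligned with spaces */
        p++;
    n = (size_t)(colon - p);
    if (n == 0 || n >= sizeof st->name)
        return -1;
    memcpy(st->name, p, n);
    st->name[n] = '\0';

    if (sscanf(colon + 1,
               "%llu %llu %llu %llu %llu %llu %llu %llu "
               "%llu %llu %llu %llu %llu %llu",
               &skip, &st->rx_packets, &st->rx_errs, &st->rx_drop,
               &skip, &skip, &skip, &skip,
               &skip, &st->tx_packets, &st->tx_errs, &st->tx_drop,
               &skip, &st->tx_colls) != 14)
        return -1;
    return 0;
}

/* Prints the error-related counters for every interface. */
void report_netdev_errors(void)
{
    char line[512];
    struct netdev_stats st;
    FILE *f = fopen("/proc/net/dev", "r");

    if (f == NULL) {
        perror("/proc/net/dev");
        return;
    }
    while (fgets(line, sizeof line, f) != NULL)
        if (parse_netdev_line(line, &st) == 0)
            printf("%s: rx_errs=%llu rx_drop=%llu tx_errs=%llu "
                   "tx_drop=%llu colls=%llu\n",
                   st.name, st.rx_errs, st.rx_drop,
                   st.tx_errs, st.tx_drop, st.tx_colls);
    fclose(f);
}
```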

Does send()/write() in the client return an error, and if so, what is
the errno value?

 
 
 

Post by Richard Eich » Tue, 28 Nov 2006 01:39:07








> In the final six tcp segments show that the last two 1448-byte segments
> remain unacknowledged. The sender's tcp stack must retransmit the
> segments as it happened in the packet dump before. The only explanation
> I can think of why we can not see retransmissions in the dump, is that
> the retransmissions never get to the network interface egress queue.
> The queue is full when the network is congested or when the driver is
> misbehaving. You may like to check ifconfig output, particularly if
> errors, dropped and collisions fields are non zero for the related
> interface.

I've already been checking /proc/net/dev regularly.  In sum: hundreds
of thousands of packets out of that interface, 34 errors and 0 drops.

> Does send/write in the client return with an error and if so what is
> errno value?

I log if write() to that socket returns <= 0, and haven't seen any
log messages for that event.  I'll double-check that to make sure a
log message isn't getting lost.
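
One caveat with logging only when write() returns <= 0: on a stream socket write() may also return a positive count smaller than requested, which is not an error but means the rest still has to be written, and on a blocking socket a stalled connection normally makes write() block once the send buffer is full rather than fail. An empty error log is therefore consistent with the stall described here. A minimal sketch of a write loop that handles short writes and EINTR and logs errno on real failures; `write_all` is a hypothetical helper, not the poster's code.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes exactly len bytes unless an error occurs. Returns 0 on success,
 * or -1 with errno preserved from the failing write(). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted by a signal: retry */
            int saved = errno;            /* fprintf may clobber errno */
            fprintf(stderr, "write(fd=%d) failed: %s (errno=%d)\n",
                    fd, strerror(saved), saved);
            errno = saved;
            return -1;
        }
        if (n == 0) {
            /* Not expected for len > 0 on sockets or pipes; treated as an
             * error here (EIO is an arbitrary choice) to avoid spinning. */
            errno = EIO;
            return -1;
        }
        p += n;                           /* short write: send the rest */
        len -= (size_t)n;
    }
    return 0;
}
```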

I'll also push to have the patches made current, just in case the
driver is a factor and there's already a fix for it.

 
 
 

Post by Richard Eich » Tue, 28 Nov 2006 07:43:00









> > In the final six tcp segments show that the last two 1448-byte segments
> > remain unacknowledged. The sender's tcp stack must retransmit the
> > segments as it happened in the packet dump before. The only explanation
> > I can think of why we can not see retransmissions in the dump, is that
> > the retransmissions never get to the network interface egress queue.
> > The queue is full when the network is congested or when the driver is
> > misbehaving. You may like to check ifconfig output, particularly if
> > errors, dropped and collisions fields are non zero for the related
> > interface.

> I've been checking /proc/net/dev regularly already.  In sum, hundreds
> of thousands of packets out that interface, 34 errors and 0 drops.

> > Does send/write in the client return with an error and if so what is
> > errno value?

> I log if write() to that socket returns <= 0, and haven't seen any
> log messages for that event.  I'll double-check that to make sure
> it's unlikely that a log message is getting lost.

> I'll also push to have the patches made current, just in case there's
> the driver is a factor and there's already a fix for it.

I've recently noticed that, even when the throughput is fine, the
sender's receive window hovers around 12.  I take that to mean that its
TCP input buffer is consistently well-loaded.

Would another explanation for the lack of retransmissions in the dump
be that the ACKs are delayed through the TCP input buffer?

I've had the net.core.rmem_max and net.core.wmem_max set to 16MB for
a few weeks now.  Maybe that's too big?
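
On Linux, net.core.rmem_max and net.core.wmem_max cap what an application may request with setsockopt(SO_RCVBUF) and setsockopt(SO_SNDBUF); TCP sockets that never call setsockopt are sized by autotuning, whose limits are net.ipv4.tcp_rmem and net.ipv4.tcp_wmem. Per socket(7), the kernel doubles the requested value to allow for bookkeeping overhead and getsockopt returns the doubled value, and setting SO_RCVBUF explicitly turns off receive-buffer autotuning for that socket. A small sketch for checking what a socket actually ends up with; `request_buffer` is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/socket.h>

/* Requests a buffer size with setsockopt() and reports what the kernel
 * actually granted. opt is SO_RCVBUF or SO_SNDBUF. Returns the effective
 * size as reported by getsockopt(), or -1 on error. */
int request_buffer(int fd, int opt, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof granted;

    if (setsockopt(fd, SOL_SOCKET, opt, &bytes, sizeof bytes) < 0) {
        perror("setsockopt");
        return -1;
    }
    if (getsockopt(fd, SOL_SOCKET, opt, &granted, &len) < 0) {
        perror("getsockopt");
        return -1;
    }
    /* Expect roughly 2 * min(bytes, net.core.*mem_max) on Linux. */
    fprintf(stderr, "%s: requested %d, kernel reports %d\n",
            opt == SO_RCVBUF ? "SO_RCVBUF" : "SO_SNDBUF", bytes, granted);
    return granted;
}
```

Comparing the reported values against the sysctl settings on both machines should show whether the 16MB limits are actually in effect for this connection.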

 
 
 

Post by phil-news-nos.. » Tue, 28 Nov 2006 13:40:15



I once had a problem that looked like this when doing a very large file
transfer (a tarball of an entire version of Slackware packages).  It
would always stop at the same point, showing a full send queue at the
sender and nothing at the receiver.  Various diagnostics were attempted,
and I finally discovered that the packets being resent were arriving at
the receiving machine slightly corrupt.  It seems there was a very strange
data sensitivity in the ethernet interface hardware, apparently causing it
to lose sync and shift things off by one byte.  I tried this on 3 other
machines with the same ethernet interface hardware and they all had the
very same problem.  I installed a separate ethernet card in that machine
and used it instead of the ethernet on the motherboard, and then it was
fine.  It all pointed at the ethernet chip on that particular motherboard.
I subsequently got around that issue by compressing and/or encrypting the
data.  The point in the data where the corruption took place had many
bytes of binary zero in a row, but not too many; longer sequences of
zeros did not cause a problem.  It seemed to depend on some of the data
before the zeros as well.  This problem was on the Intel ISP1100 server
and was reproducible on every ISP1100 I had access to.  It did not occur
on any Intel ethernet card, nor on any other ethernet interface I have.

If something in the data is causing data corruption at a hardware layer,
all the retransmissions will be in vain.
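
One way to see why: TCP protects each segment with the 16-bit ones'-complement Internet checksum (RFC 1071). A segment corrupted in transit, for example by a one-byte shift, will usually fail that check and be discarded by the receiver, and if the fault is data-dependent, every retransmission of the same bytes is corrupted the same way. The checksum is also weak: because it is a plain sum, swapping 16-bit words leaves it unchanged. A minimal sketch over a payload only (the real TCP checksum also covers a pseudo-header and the TCP header); `inet_checksum` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over buf[0..len). Bytes are taken as
 * big-endian 16-bit words; an odd trailing byte is padded with zero.
 * The 32-bit accumulator is adequate for segment-sized buffers. */
uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(buf[i] << 8 | buf[i + 1]);
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;

    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For the bytes 12 34 00 00 56 78 the checksum is 0x9753; the same bytes shifted by one position (00 12 34 00 00 56) give 0xcb97 and would be rejected, while the word-swapped 56 78 00 00 12 34 still gives 0x9753.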

--
|---------------------------------------/----------------------------------|
| Phil Howard KA9WGN (ka9wgn.ham.org)  /  Do not send to the address below |

|------------------------------------/-------------------------------------|

 
 
 

Post by Maxim Yegorushkin » Tue, 28 Nov 2006 18:05:10










> > > > > > > > Client and Server OS: SuSE 9.3 Pro 2.6.11 default kernel, neither
> > > > > > > > machine patched after install from CDs.

> > > > > > > > Communicating over:     GigE LAN

> > > > > > > > 1) Server (receiver) is consistently adversting a TCP RWIN of 32K.
> > > > > > > > 2) Server consistently has TCP RECV-Q of 0.
> > > > > > > > 3) Client (sender) consistently shows a TCP SEND-Q of 80K.
> > > > > > > > 4) Socket is up and connection is ESTABLISHED from both sides.
> > > > > > > > 5) No data is transmitted.

> > > > > > > > To troubleshoot, I've torn down and re-established to connection
> > > > > > > > countless times.  There may be a trickle of data initially, but
> > > > > > > > within a few seconds the client SEND-Q builds and transmission stops.
> > > > > > > > Receiver's window size never goes below 32K.

> > > > > > > > Never seen this kind of behavior before.  If the server process was
> > > > > > > > slow, I'd expect to see a RECV-Q buildup to go with the big SEND-Q.

> > > > > > > I would use tcpdump or wireshark/ethereal to see what is being sent and
> > > > > > > received.

> > > > > > How do you think I know the TCP RWIN size for the server?

> > > > > So, what does tcpdump show you before the transfer stops?

> > > > Looks like we're getting a lot of retransmissions, which I believe
> > > > would clearly explain the full sender tcp send-q, the receiver empty
> > > > tcp recv-q, and the receiver's normal receive window size.

> > > > (Relative ACKs used for clarity).

> > > []

> > > The final six tcp segments show that the last two 1448-byte segments
> > > remain unacknowledged. The sender's tcp stack must retransmit those
> > > segments, as happened earlier in the packet dump. The only explanation
> > > I can think of for why we cannot see retransmissions in the dump is that
> > > the retransmissions never get to the network interface egress queue.
> > > The queue is full when the network is congested or when the driver is
> > > misbehaving. You may like to check the ifconfig output, particularly
> > > whether the errors, dropped and collisions fields are nonzero for the
> > > related interface.

> > I've been checking /proc/net/dev regularly already.  In sum, hundreds
> > of thousands of packets out that interface, 34 errors and 0 drops.

> > > Does send/write in the client return with an error, and if so, what is
> > > the errno value?

> > I log if write() to that socket returns <= 0, and haven't seen any
> > log messages for that event.  I'll double-check that to make sure
> > it's unlikely that a log message is getting lost.

> > I'll also push to have the patches made current, just in case
> > the driver is a factor and there's already a fix for it.

> I've recently noticed that, even when the throughput is fine,
> the sender's receive window hovers around 12.  I take that to mean
> that the TCP input buffer is consistently well-loaded.

You probably have window scaling enabled by default, so the 12 is the
raw field value and the real window is 12 * 2 ^ wscale bytes.
http://tools.ietf.org/html/rfc1323#section-2
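To make the scaling arithmetic concrete: the 16-bit window field in a segment is shifted left by the shift count that the segment's sender announced in the window-scale option of its own SYN. A minimal Python sketch follows. The shift count of 7 in the example is an assumption, since the thread never shows the negotiated value (in a packet dump it appears as `wscale` in the options of the SYN and SYN-ACK).

```python
def effective_window(advertised: int, wscale: int) -> int:
    """Receive window in bytes per RFC 1323: the raw 16-bit window field
    shifted left by the negotiated shift count.

    The scaling is not applied to the window field of SYN segments themselves.
    """
    if not 0 <= advertised <= 0xFFFF:
        raise ValueError("the window field is 16 bits")
    if not 0 <= wscale <= 14:
        raise ValueError("RFC 1323 limits the shift count to 14")
    return advertised << wscale


# A raw field of 12 with an assumed shift count of 7 is 1536 bytes,
# while the same field with no scaling is just 12 bytes.
print(effective_window(12, 7), effective_window(12, 0))
```

So a window that "hovers around 12" may be anything from 12 bytes to well over a kilobyte, depending on the shift count the sender advertised.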

The sender receives nothing but ACKs, so I'm not sure its receive
buffer size is relevant.

> Would another explanation for the lack of retransmissions in the dump
> be that the ACKs are delayed through the TCP input buffer?

> I've had the net.core.rmem_max and net.core.wmem_max set to 16MB for
> a few weeks now.  Maybe that's too big?

I would try using another network card / driver to see if the hardware
and the driver are good.
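Before and after swapping the card, it can help to watch the interface error counters programmatically rather than by eye. A minimal Python sketch for Linux, assuming the usual `/proc/net/dev` layout (two header lines, then one line per interface with 8 receive and 8 transmit counters). The helper names are hypothetical.

```python
import os

RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "colls",
             "carrier", "compressed")


def parse_proc_net_dev(text: str) -> dict:
    """Map interface name -> {"rx": {...}, "tx": {...}} counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # Split on the first colon only: large counters can run right up
        # against the name, e.g. "eth0:123456789 ...".
        name, _, rest = line.partition(":")
        values = [int(v) for v in rest.split()]
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, values[:8])),
            "tx": dict(zip(TX_FIELDS, values[8:16])),
        }
    return stats


def nonzero_error_counters(stats: dict) -> dict:
    """Keep only the error-type counters that are non-zero, per interface."""
    report = {}
    for ifname, s in stats.items():
        bad = {f"rx_{k}": s["rx"][k]
               for k in ("errs", "drop", "fifo", "frame") if s["rx"].get(k)}
        bad.update({f"tx_{k}": s["tx"][k]
                    for k in ("errs", "drop", "fifo", "colls", "carrier")
                    if s["tx"].get(k)})
        if bad:
            report[ifname] = bad
    return report


if __name__ == "__main__":
    if os.path.exists("/proc/net/dev"):
        with open("/proc/net/dev") as f:
            print(nonzero_error_counters(parse_proc_net_dev(f.read())))
```

Comparing two snapshots taken a few minutes apart, one of them while a transfer is stalled, shows whether the counters are still climbing on the suspect interface.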
 
 
 

Odd TCP client/server throughput problem

Post by Richard Eic » Tue, 28 Nov 2006 23:14:06

> > > > > > > > > Client and Server OS: SuSE 9.3 Pro 2.6.11 default kernel, neither
> > > > > > > > > machine patched after install from CDs.

> > > > > > > > > Communicating over:        GigE LAN

> > > > > > > > > 1) Server (receiver) is consistently advertising a TCP RWIN of 32K.
> > > > > > > > > 2) Server consistently has TCP RECV-Q of 0.
> > > > > > > > > 3) Client (sender) consistently shows a TCP SEND-Q of 80K.
> > > > > > > > > 4) Socket is up and connection is ESTABLISHED from both sides.
> > > > > > > > > 5) No data is transmitted.

> > > > > > > > > To troubleshoot, I've torn down and re-established the connection
> > > > > > > > > countless times.  There may be a trickle of data initially, but
> > > > > > > > > within a few seconds the client SEND-Q builds and transmission stops.
> > > > > > > > > Receiver's window size never goes below 32K.

> > > > > > > > > Never seen this kind of behavior before.  If the server process was
> > > > > > > > > slow, I'd expect to see a RECV-Q buildup to go with the big SEND-Q.

> > > > > > > > I would use tcpdump or wireshark/ethereal to see what is being sent and
> > > > > > > > received.

> > > > > > > How do you think I know the TCP RWIN size for the server?

> > > > > > So, what does tcpdump show you before the transfer stops?

> > > > > Looks like we're getting a lot of retransmissions, which I believe
> > > > > would clearly explain the full sender tcp send-q, the receiver empty
> > > > > tcp recv-q, and the receiver's normal receive window size.

> > > > > (Relative ACKs used for clarity).

> > > > []

> > > > The final six tcp segments show that the last two 1448-byte segments
> > > > remain unacknowledged. The sender's tcp stack must retransmit those
> > > > segments, as happened earlier in the packet dump. The only explanation
> > > > I can think of for why we cannot see retransmissions in the dump is that
> > > > the retransmissions never get to the network interface egress queue.
> > > > The queue is full when the network is congested or when the driver is
> > > > misbehaving. You may like to check the ifconfig output, particularly
> > > > whether the errors, dropped and collisions fields are nonzero for the
> > > > related interface.

> > > I've been checking /proc/net/dev regularly already.  In sum, hundreds
> > > of thousands of packets out that interface, 34 errors and 0 drops.

> > > > Does send/write in the client return with an error, and if so, what is
> > > > the errno value?

> > > I log if write() to that socket returns <= 0, and haven't seen any
> > > log messages for that event.  I'll double-check that to make sure
> > > it's unlikely that a log message is getting lost.

> > > I'll also push to have the patches made current, just in case
> > > the driver is a factor and there's already a fix for it.

> > I've recently noticed that, even when the throughput is fine,
> > the sender's receive window hovers around 12.  I take that to mean
> > that the TCP input buffer is consistently well-loaded.

> You probably have window scaling enabled by default, so the 12 is the
> raw field value and the real window is 12 * 2 ^ wscale bytes.
> http://tools.ietf.org/html/rfc1323#section-2

> The sender receives nothing but ACKs, so I'm not sure its receive
> buffer size is relevant.

The sender has two of its four interfaces in promiscuous mode, and is
getting hit pretty hard 24x7 (i.e., 240+ million packets-in on eth0
alone, in the past six hours).  Most of that is TCP.
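For scale, a back-of-the-envelope average from those figures, taking exactly 240 million packets over exactly six hours (real traffic will be burstier than the average):

```latex
\frac{240 \times 10^{6}\ \text{packets}}{6 \times 3600\ \text{s}}
  = \frac{2.4 \times 10^{8}}{2.16 \times 10^{4}\ \text{s}}
  \approx 1.1 \times 10^{4}\ \text{packets/s}
```

That is, on average, over 11,000 inbound packets per second on eth0 alone.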

> > Would another explanation for the lack of retransmissions in the dump
> > be that the ACKs are delayed through the TCP input buffer?

> > I've had the net.core.rmem_max and net.core.wmem_max set to 16MB for
> > a few weeks now.  Maybe that's too big?

> I would try using another network card / driver to see if the hardware
> and the driver are good.

I'm heading to that course of action.  It's complicated a little by
the two interfaces in promisc mode being on an optical GigE card,
which was supplied after-market by our hardware vendor and
retrofitted by Solaris SAs.  I'll have to get a replacement (not
difficult) and updated drivers (if any, not difficult) and then get
into a remote production datacenter and perform surgery (logistically
messy, but not difficult).

I greatly appreciate all the help you've offered.
