Very high bandwidth packet-based interface and performance problems

Post by Nye Liu » Thu, 22 Feb 2001 11:19:55



I am working on a very high speed packet-based interface, but we are having
severe problems related to bandwidth vs. CPU horsepower. Enclosed is a
preliminary description of the problem.

Thanks!!!

--
"Who would be stupid enough to quote a fictitious character?"
        -- Don Quixote

[ Attached Message ]

From:
To:
Cc:
Date: Tue, 20 Feb 2001 18:03:56 -0800
Local: Tues, Feb 20 2001 9:03 pm
Subject: SI/ppc performance issues.
Due to the limited horsepower of our PPC740 (as it has no cache), our
proprietary 2 Gbit packet-based interface is capable of overwhelming
the software throughput capabilities of the kernel. This congestion is
causing severe network performance issues in both UDP and TCP. In UDP,
if the frame rate exceeds approximately 300 Mbit (1500-byte packets),
kernel usage goes to 100%, leaving no CPU power for user-space
applications to even receive frames, causing severe queuing packet
loss. In the TCP case, there seem to be constant ACKs from the kernel,
but most data never seems to make it to user space.

Inspecting the /proc/net/dev and /proc/net/snmp counters reveals no errors.

As a control, the private 10/100 Ethernet interface is capable of
sustaining 100 Mbit of unidirectional UDP and TCP traffic with no problems.

Similarly, if a traffic policer is used to limit the load on the
proprietary high-speed interface to approx. 200 Mbit, there is no packet
loss in UDP. Since we lack a shaper, we can't test TCP reliably, as
the policer drops packets instead of shaping output. We can test this
qualitatively by artificially preventing the TCP source from sending
too quickly (by loading the source CPU heavily); however, results from
this are mixed. We seem to be able to attain only approximately
50-60 Mbit by this method.
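
(For reference, the UDP receive rates above are the kind of numbers a
trivial sink along the following lines would report. This is an
illustrative sketch only, not our actual test program; the port number
and buffer size are arbitrary.)

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in sin;
    char buf[2048];
    long bytes = 0;
    time_t start = time(NULL);

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(9000);             /* arbitrary test port */
    bind(s, (struct sockaddr *)&sin, sizeof(sin));

    for (;;) {
        int n = recvfrom(s, buf, sizeof(buf), 0, NULL, NULL);
        if (n > 0)
            bytes += n;
        if (time(NULL) - start >= 1) {      /* print receive rate once a second */
            printf("%ld Mbit/s\n", (bytes * 8) / 1000000);
            bytes = 0;
            start = time(NULL);
        }
    }
}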

Questions:

There are two options in the 2.0 kernel. One is "CPU is too slow for
network" or something similar. A second (driver-specific) option is a
flow control mechanism.

In 2.4, the first seems to be missing. The second is only available for
a few drivers (e.g. tulip).

What do these options do?

In 2.4, what is the recommended way of keeping a high speed interface
from overwhelming the kernel network queue (e.g. Gig ethernet)?

Does this affect user space programs (e.g. ftpd, apache, etc)?

If so, how?

What are the mechanisms by which the Linux kernel drops frames?

Which mechanisms are accompanied by statistics, and what are they?

Which mechanisms are NOT accompanied by statistics?

Why is the kernel ACKing a blocked TCP stream? (i.e. when a user-space
TCP program is unable to read from a socket because it is not being
scheduled due to kernel CPU load)

(todd... please comment, as this is a prelim document for the problem
description)

-Nye

--
"Who would be stupid enough to quote a fictitious character?"
        -- Don Quixote

 
 
 

Very high bandwidth packet-based interface and performance problems

Post by Nye Liu » Fri, 23 Feb 2001 07:00:55



> Dropping packets under load will make tcp do the right thing. You don't need
> complex mathematical models since dropping frames under load is just another
> form of congestion and tcp handles it pretty sanely

Alan: thanks for your response...

This is exactly what I would expect to see, but we are seeing something
else...

Under HEAVY load we are seeing approximately 20 Mbit of TCP throughput. If
we "shape" the presented load (I use the term loosely; we don't actually
have a real shaper, we just load the CPU that is transmitting), we can
get 60-70 Mbit. I'm not quite sure why this is. My first guess was
that because the kernel was getting 99% of the CPU, the application was
getting very little, and thus the read wasn't happening fast enough, and
the socket was blocking. In this case, you would expect the system to
reach a nice equilibrium: if the app stopped reading, the kernel would
stop ACKing, and the transmitter would back off, eventually to a point
where the app could start reading again because the kernel load had dropped.

This is NOT what I'm seeing at all... the kernel load appears to be
pegged at 100% (or very close to it), the user-space app is getting
enough CPU time to read out about 10-20 Mbit, and FURTHERMORE the kernel
appears to be ACKING ALL the traffic, which I don't understand at all
(e.g. the transmitter is simply blasting 300 Mbit of TCP unrestricted).

With UDP, we can get the full 300 Mbit of throughput, but only if we shape
the load to 300 Mbit. If we increase the load past 300 Mbit, the rate
received by the user-space UDP app drops to 10-20 Mbit, again due to
user-space application scheduling problems.

-nye

Very high bandwidth packet-based interface and performance problems

Post by Alan Cox » Fri, 23 Feb 2001 07:07:32


> that because the kernel was getting 99% of the cpu, the application was
> getting very little, and thus the read wasn't happening fast enough, and

Seems reasonable

> This is NOT what I'm seeing at all.. the kernel load appears to be
> pegged at 100% (or very close to it), the user space app is getting
> enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> appears to be ACKING ALL the traffic, which I don't understand at all
> (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)

TCP _requires_ the remote end ack every 2nd frame regardless of progress.

> With udp, we can get the full 300MBit throughput, but only if we shape
> the load to 300Mbit. If we increase the load past 300 MBit, the received
> frames (at the user space udp app) drops to 10-20MBit, again due to
> user-space application scheduling problems.

How is your incoming traffic handled architecturally - IRQ per packet or
some kind of ring buffer with IRQ mitigation? Do you know where the CPU
load is - is it mostly the IRQ servicing or mostly the network stack?


Very high bandwidth packet-based interface and performance problems

Post by Nye Liu » Fri, 23 Feb 2001 07:11:57



> > that because the kernel was getting 99% of the cpu, the application was
> > getting very little, and thus the read wasn't happening fast enough, and

> Seems reasonable

> > This is NOT what I'm seeing at all.. the kernel load appears to be
> > pegged at 100% (or very close to it), the user space app is getting
> > enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> > appears to be ACKING ALL the traffic, which I don't understand at all
> > (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)

> TCP _requires_ the remote end ack every 2nd frame regardless of progress.

> > With udp, we can get the full 300MBit throughput, but only if we shape
> > the load to 300Mbit. If we increase the load past 300 MBit, the received
> > frames (at the user space udp app) drops to 10-20MBit, again due to
> > user-space application scheduling problems.

> How is your incoming traffic handled architecturally - irq per packet or
> some kind of ring buffer with irq mitigation.  Do you know where the cpu
> load is - is it mostly the irq servicing or mostly network stack ?

Alan: thanks again for your prompt response!

Bus-mastered DMA ring buffer. As to the load, I'm not quite sure... we
were using a fairly large ring buffer, but increasing/decreasing the size
didn't seem to affect the number of packets per interrupt. I added a
little watermarking code, and it seems that we do (at peak) about 30-35
packets per interrupt. That is STILL a heck of a lot of interrupts! I
can't quite figure out why the driver refuses to go deeper.
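
(The watermarking is nothing more than a counter in the RX interrupt
handler, roughly like the sketch below. The struct fields and the
ring_has_packet()/take_packet_from_ring() helpers are made-up placeholder
names for illustration, not our real driver.)

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/etherdevice.h>

struct my_priv {
    int rx_watermark;
    /* ... DMA ring state ... */
};

/* placeholders standing in for the real ring-buffer accessors */
extern int ring_has_packet(struct my_priv *priv);
extern struct sk_buff *take_packet_from_ring(struct my_priv *priv);

static void my_rx_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct net_device *dev = dev_id;
    struct my_priv *priv = (struct my_priv *)dev->priv;
    int pkts = 0;

    /* drain the DMA ring until the hardware owns the next descriptor */
    while (ring_has_packet(priv)) {
        struct sk_buff *skb = take_packet_from_ring(priv);
        skb->protocol = eth_type_trans(skb, dev);
        netif_rx(skb);
        pkts++;
    }

    /* remember the deepest single-interrupt burst seen so far */
    if (pkts > priv->rx_watermark)
        priv->rx_watermark = pkts;      /* peaks at ~30-35 in our tests */
}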

I can think of a couple of possible solutions. Our interface has a HUGE
number of hardware buffers, so I can easily just stop reading for a short
time if we detect congestion... can you suggest a nice clean mechanism
for this?

Any other ideas?

Very high bandwidth packet-based interface and performance problems

Post by Alan Cox » Fri, 23 Feb 2001 07:25:54


> I can think of a couple possible solutions. our interface has a HUGE
> amount of hardware buffers, so I can easily simply stop reading for
> a small time if we detect conjestion... can you suggest a nice clean
> mechanism for this?

If you have a lot of buffers you can try one thing to see if it's IRQ load:
turn the IRQ off, set a fast timer running, and hook the buffer handling to
the timer IRQ.

The next obvious step would be to use the timer-based handling to limit the
number of buffers you call netif_rx() on and discard any others.
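
(Roughly like this, in 2.4-era terms. The my_* names, the priv fields, the
ring helpers and the budget value are illustrative placeholders, not taken
from any real driver.)

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/etherdevice.h>
#include <linux/timer.h>

#define RX_BUDGET 16        /* illustrative per-tick limit */

struct my_priv {
    struct timer_list rx_timer;
    struct net_device_stats stats;
    /* ... DMA ring state ... */
};

/* placeholders standing in for the real ring-buffer accessors */
extern int ring_has_packet(struct my_priv *priv);
extern struct sk_buff *take_packet_from_ring(struct my_priv *priv);

/* poll the DMA ring from a fast timer with the RX interrupt left masked:
 * feed at most RX_BUDGET frames per tick to netif_rx(), drop the rest */
static void my_rx_poll(unsigned long data)
{
    struct net_device *dev = (struct net_device *)data;
    struct my_priv *priv = (struct my_priv *)dev->priv;
    int done = 0;

    while (ring_has_packet(priv)) {
        struct sk_buff *skb = take_packet_from_ring(priv);
        if (done < RX_BUDGET) {
            skb->protocol = eth_type_trans(skb, dev);
            netif_rx(skb);
            done++;
        } else {
            dev_kfree_skb(skb);             /* over budget: drop cheaply */
            priv->stats.rx_dropped++;
        }
    }

    mod_timer(&priv->rx_timer, jiffies + 1);    /* re-arm; RX irq stays off */
}

/* setup (e.g. in open()): mask the RX interrupt in hardware, then
 *   init_timer(&priv->rx_timer);
 *   priv->rx_timer.function = my_rx_poll;
 *   priv->rx_timer.data     = (unsigned long)dev;
 *   mod_timer(&priv->rx_timer, jiffies + 1);
 */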

Finally, don't rule out memory bandwidth: if the RAM is main memory, then the
DMA engine could be pretty much driving the CPU off the bus at high data
rates.

Alan


Very high bandwidth packet-based interface and performance problems

Post by Gregory Maxwell » Fri, 23 Feb 2001 07:27:06


[snip]

> This is NOT what I'm seeing at all.. the kernel load appears to be
> pegged at 100% (or very close to it), the user space app is getting
> enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> appears to be ACKING ALL the traffic, which I don't understand at all
> (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)

> With udp, we can get the full 300MBit throughput, but only if we shape
> the load to 300Mbit. If we increase the load past 300 MBit, the received
> frames (at the user space udp app) drops to 10-20MBit, again due to
> user-space application scheduling problems.

Perhaps excess context switches are thrashing the system?

Very high bandwidth packet-based interface and performance problems

Post by Nye Liu » Fri, 23 Feb 2001 10:24:31



> > that because the kernel was getting 99% of the cpu, the application was
> > getting very little, and thus the read wasn't happening fast enough, and

> Seems reasonable

> > This is NOT what I'm seeing at all.. the kernel load appears to be
> > pegged at 100% (or very close to it), the user space app is getting
> > enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> > appears to be ACKING ALL the traffic, which I don't understand at all
> > (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)

> TCP _requires_ the remote end ack every 2nd frame regardless of progress.

YIPES. I didn't realize this was the case... How is end-to-end application
flow control handled when the bottleneck is user-space-bound and not
bandwidth-bound? E.g. if I write a test app that does a

while (1) {
    sleep(5);
    read(sock, buf, 1);
}

and the transmitter is unrestricted, what happens?

Does it have to do with TCP_FORMAL_WINDOW (e.g. automatically reducing the
window size to zero when the queue backs up)?

Or is it only a CPU loading problem? (i.e. is there a difference in queuing
behavior between (1) the user process not getting cycles and (2) the user
process simply failing to read?)

Also, I have been reading up on CONFIG_HW_FLOWCONTROL... What is the
recommended way for the driver to stop receiving? In the sample tulip
code I see you can register an xon callback, but I can't tell if there
is a way to see the backlog from the driver.

-nye

Very high bandwidth packet-based interface and performance problems

Post by Rick Jones » Fri, 23 Feb 2001 10:46:25



> > that because the kernel was getting 99% of the cpu, the application was
> > getting very little, and thus the read wasn't happening fast enough, and

> Seems reasonable

> > This is NOT what I'm seeing at all.. the kernel load appears to be
> > pegged at 100% (or very close to it), the user space app is getting
> > enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> > appears to be ACKING ALL the traffic, which I don't understand at all
> > (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)

> TCP _requires_ the remote end ack every 2nd frame regardless of progress.

Um, I thought the spec says that ACKing every 2nd segment is a SHOULD, not
a MUST?

rick jones
--
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

Very high bandwidth packet-based interface and performance problems

Post by Rick Jones » Fri, 23 Feb 2001 10:50:48


> > > This is NOT what I'm seeing at all.. the kernel load appears to be
> > > pegged at 100% (or very close to it), the user space app is getting
> > > enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> > > appears to be ACKING ALL the traffic, which I don't understand at all
> > > (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)

> > TCP _requires_ the remote end ack every 2nd frame regardless of progress.

> YIPES. I didn't realize this was the case.. how is end-to-end application
> flow control handled when the bottle neck is user space bound and not b/w
> bound? e.g. if i write a test app that does a

If the app is not reading from the socket buffer, the receiving TCP is
supposed to stop sending window updates, and the sender is supposed to
stop sending data when it runs out of window.

If TCP ACKs data, it really should (must?) not then later drop it on
the floor without aborting the connection. If a TCP is ACKing data, and
that data is then dropped before it is given to the application, and the
connection is not being reset, that is probably a bug.

A TCP _is_ free to drop data prior to sending an ACK - it simply drops
it and does not ACK it.

rick jones

--
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

Very high bandwidth packet-based interface and performance problems

Post by Alan Cox » Fri, 23 Feb 2001 19:14:19


> and the transmitter is unrestricted, what happens?
> Does it have to do with TCP_FORMAL_WINDOW (eg. automatically reduce window
> size to zero when queue backs up?)

Read RFC1122. Basically your guess is right. The sender sends data and gets
back ACKs saying 'window 0'. It will then do exponential backoffs while
polling the zero window as it backs off (ACKs being unreliable).
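
(An easy way to watch this happen: a receiver that accepts a connection and
then never reads will advertise a shrinking window, hitting zero once its
socket buffer fills. The sketch below is illustrative only; the port and
SO_RCVBUF values are arbitrary. With a sender blasting data at it, tcpdump
on either end should show the 'win 0' ACKs and the sender's backed-off
persist probes.)

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int ls = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sin;
    int rcvbuf = 8192;                  /* small, so the window closes fast */

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(9000);         /* arbitrary test port */

    /* set the receive buffer before listen() so the accepted socket
       inherits it */
    setsockopt(ls, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));
    bind(ls, (struct sockaddr *)&sin, sizeof(sin));
    listen(ls, 1);

    accept(ls, NULL, NULL);             /* take the connection...    */
    printf("connected; not reading\n");
    pause();                            /* ...and never read from it */
    return 0;
}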


Very high bandwidth packet-based interface and performance problems

Post by Alan Cox » Fri, 23 Feb 2001 19:20:46


> > TCP _requires_ the remote end ack every 2nd frame regardless of progress.

> um, I thought the spec says that ACK every 2nd segment is a SHOULD not a
> MUST?

Yes, it's a SHOULD in RFC1122, but in any normal environment it's pretty
much a must, and I know of no stack significantly violating it.

RFC1122 also requires that your protocol stack SHOULD be able to leap tall
buildings at a single bound, of course...


Very high bandwidth packet-based interface and performance problems

Post by Rick Jones » Sat, 24 Feb 2001 03:12:55



> > > TCP _requires_ the remote end ack every 2nd frame regardless of progress.

> > um, I thought the spec says that ACK every 2nd segment is a SHOULD not a
> > MUST?

> Yes its a SHOULD in RFC1122, but in any normal environment pretty much a
> must and I know of no stack significantly violating it.

I didn't know there was such a thing as a normal environment :)

> RFC1122 also requires that your protocol stack SHOULD be able to leap tall
> buldings at a single bound of course...

And, of course my protocol stack does :) It is also a floor wax, AND a
dessert topping!-)

rick jones
--
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

Very high bandwidth packet-based interface and performance problems

Post by Pavel Machek » Sat, 24 Feb 2001 18:10:03


Hi!

> > This is NOT what I'm seeing at all.. the kernel load appears to be
> > pegged at 100% (or very close to it), the user space app is getting
> > enough cpu time to read out about 10-20Mbit, and FURTHERMORE the kernel
> > appears to be ACKING ALL the traffic, which I don't understand at all
> > (e.g. the transmitter is simply blasting 300MBit of tcp unrestricted)

> TCP _requires_ the remote end ack every 2nd frame regardless of
> progress.

Shouldn't TCP advertise a window of 0 to stop the sender?

Where does the kernel put all that data in the TCP case? I do not understand
that. The transmitter blasts at 300 Mbit, user space gets 20 Mbit. There's a
280 Mbit data stream going _somewhere_. It should be eating memory at
35 MB/second; unless you have 1 GB of RAM, something interesting should
happen after a minute or so...
                                                                Pavel

Very high bandwidth packet-based interface and performance problems

Post by kuz.. » Sun, 25 Feb 2001 03:50:03


Hello!

> > Yes its a SHOULD in RFC1122, but in any normal environment pretty much a
> > must and I know of no stack significantly violating it.

> I didn't know there was such a thing as a normal environment :)

Joking aside, such "normal" environments are rare today.

From tcpdumps it is clear that win2000 does not ACK every other MSS.
It can ACK once per window at high load. I have seen the same behaviour
from Solaris. FreeBSD 4.x certainly does not ACK every second MSS
(that is from the source code), which is probably a bug (at least, it
stops ACKing at all as soon as MSG_WAITALL is used 8)).

ACKing every second MSS is required to make slow start reasonably fast.
Once the window is full, those ACKs are useless, so win2000 is fully
right and, in fact, optimal.

Alexey

NF-HIPAC: High Performance Packet Classification for Netfilter

Hi,

nf-hipac aims to become a drop-in replacement for the iptables packet
filtering module. It implements a novel framework for packet classification
which uses an advanced algorithm to reduce the number of memory lookups per
packet. The module is ideal for environments where large rulesets and/or
high bandwidth networks are involved.

The algorithm code is designed in a way that it can be verified in userspace,
so the algorithm code itself can be considered correct. We are not able to
really verify the remaining files nfhp_mod.[ch] and the userspace tool
(nf-hipac.[ch]), but they are tested in depth and shouldn't contain any
critical bugs.

We have the results of some basic performance tests available on our web page.
The test compares the performance of the iptables filter table to the
performance of nf-hipac. Results are pretty impressive :-)

You can find the performance test results on our web page http://www.hipac.org
The releases can be downloaded from http://sourceforge.net/projects/nf-hipac/

Features:
    - optimized for high performance packet classification
      with moderate memory usage
    - completely dynamic:
        the data structure isn't rebuilt from scratch when inserting or
        deleting rules, so fast updates are possible
    - userspace tool syntax is very similar to the iptables syntax
    - kernel does not need to be patched
    - compatible with iptables: you can use iptables and nf-hipac at
      the same time:
        for example you could use the connection tracking module from
        iptables and match the states with nf-hipac
    - match support for:
        + source/destination ip
        + in/out interface
        + protocol (udp, tcp, icmp)
        + source/destination ports (udp, tcp)
        + icmp type
        + tcp flags
        + ttl
        + state match (conntrack module must be loaded)
   - /proc/net/nf-hipac:
        + algorithm statistics available via
            # cat /proc/net/nf-hipac
        + allows the maximum memory usage to be limited dynamically
            # echo   >  /proc/net/nf-hipac

Enjoy,

+-----------------------+----------------------+
|   Michael Bellion     |     Thomas Heinz     |

+-----------------------+----------------------+

