Errors on direct GigE link

Post by Andrew Gideon » Sun, 19 Mar 2006 05:41:35



I'm seeing errors on one side of a GigE link.  The link is a
point-to-point, just connecting two computers.  As I move a lot of data,
the error count increases.

What can I do to fix/debug something like this?  My understanding is that
there's no half-duplex GigE, so that cannot be the problem.  The "errors"
that I see are RX errors, dropped, and overruns.

The kernel is 2.6.9-34.ELsmp.  The driver for the NIC is the e1000.

ifconfig reports:

eth1      Link encap:Ethernet  HWaddr 00:0D:60:83:78:33
          inet addr:192.168.6.202  Bcast:192.168.6.203  Mask:255.255.255.252
          inet6 addr: fe80::20d:60ff:fe83:7833/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:71585966 errors:82025 dropped:82025 overruns:82025 frame:0
          TX packets:47757031 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:330120282 (314.8 MiB)  TX bytes:1106484502 (1.0 GiB)
          Base address:0x3400 Memory:d0240000-d0260000

/proc/net/dev contains:

Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:   18152     103    0    0    0     0          0         0    18152     103    0    0    0     0       0          0
  eth0:  754856    8269    0    0    0     0          0         0  1559998    6480    0    0    0     0       0          0
  eth1:330120388 71585967 82025 82025 82025     0          0         0 1106484672 47757033    0    0    0     0       0          0
  sit0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0

The two controllers on this machine are:

02:01.0 Ethernet controller: Intel Corporation 82547GI Gigabit Ethernet Controller
04:03.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller

I'm confused about these overruns.  The computer is doing nothing but a dd
reading from an iSCSI SAN (and writing to /dev/null).  It contains a
3 GHz Pentium 4 (it's an IBM x306).  Why would it be unable to
handle the traffic?
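
For reference, the whole test is just something along these lines (the
device path is whatever the iSCSI volume shows up as; /dev/sdb below is
only an example), with the counters watched from a second terminal:

    dd if=/dev/sdb of=/dev/null bs=64k    # read from the SAN volume, discard the data
    watch -n 1 -d 'cat /proc/net/dev'     # watch the eth1 error/drop/fifo counters climb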

        - Andrew

 
 
 

Errors on direct GigE link

Post by Bill Marcus » Sun, 19 Mar 2006 19:26:50


On Fri, 17 Mar 2006 15:41:35 -0500, Andrew Gideon wrote:

> I'm seeing errors on one side of a GigE link.  The link is a
> point-to-point, just connecting two computers.  As I move a lot of data,
> the error count increases.

> What can I do to fix/debug something like this?  My understanding is that
> there's no half-duplex GigE, so that cannot be the problem.  The "errors"
> that I see are RX errors, dropped, and overruns.

Do you have a spare GigE card?  Or at least a spare cable?

--
To be loved is very demoralizing.
                -- Katharine Hepburn

 
 
 

Errors on direct GigE link

Post by Andrew Gideon » Mon, 20 Mar 2006 09:00:28



> Do you have a spare GigE card?  Or at least a spare cable?

I did try swapping the cable Just In Case.  However, I'm getting this
result on two different machines (both identical, though: IBM x306s).

I've been experimenting.  I tried, for example, enabling flow control in
the e1000 driver.  According to ethtool, it defaults to "off" (which I
didn't expect).

This has the good effect of bringing overruns down to zero.  It has the
bad effect of killing performance almost as badly as when I let the overruns
occur.  It's as if the "pause" brought on by flow control is too long.
I've not [yet] seen a way to control this, however.
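
(For the record, this is the sort of thing I mean by enabling it; eth1 as
above, and the same settings can also be given as e1000 module options:)

    ethtool -a eth1                # show the current pause (flow control) settings
    ethtool -A eth1 rx on tx on    # enable pause frames in both directions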

I've tried a few configurations of various parameters of the e1000, and
none yielded terribly good performance.  Some just yielded terrible
performance.  So I'm backing off, and taking the more methodical approach
of tweaking each parameter in turn and recording the results.
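
The sort of thing I've been trying goes into /etc/modprobe.conf; the
parameter names are from the e1000 documentation
(Documentation/networking/e1000.txt), and the values below are only
examples, not recommendations:

    # example only - reload the e1000 module (or reboot) for it to take effect;
    # with two e1000 ports, each option takes one comma-separated value per port
    options e1000 RxDescriptors=1024,1024 InterruptThrottleRate=8000,8000 FlowControl=3,3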

Once I get through some decent combinations, I'll add in tweaks to
variables I can control via the iSCSI initiator software (ie. TCP window
size).  I'd certainly welcome any suggestions of parameters that anyone
knows will help much; I could concentrate on those first.
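
If the initiator ends up using ordinary TCP sockets, the kernel-side
ceilings matter too; something like the following (the sizes are only
examples) before raising the window in the initiator's configuration:

    # allow larger TCP windows; tcp_rmem/tcp_wmem are min/default/max in bytes
    sysctl -w net.core.rmem_max=1048576
    sysctl -w net.core.wmem_max=1048576
    sysctl -w net.ipv4.tcp_rmem='4096 262144 1048576'
    sysctl -w net.ipv4.tcp_wmem='4096 262144 1048576'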

But I feel like I'm doing work that shouldn't be necessary.  It simply
cannot be the case that GigE is so difficult to "get right".

        - Andrew

 
 
 

Errors on direct GigE link

Post by Andrew Gideon » Mon, 20 Mar 2006 10:09:13



> But I feel like I'm doing work that shouldn't be necessary.  It simply
> cannot be the case that GigE is so difficult to "get right".

It occurred to me after I wrote this that I was already using GigE,
including in one case where a lot of data is received (ie. on our backup
server).  I just checked, and ethtool says that flow control is off but
ifconfig says that there have been zero overruns (or any other errors)
since the interface was last reset.

But this machine is connected not directly to another computer but to a
[Cisco] switch.  So it's not a direct GigE link per se.

        - Andrew

 
 
 

Errors on direct GigE link

Post by David Schwartz » Mon, 20 Mar 2006 15:03:24



Quote:> I'm seeing errors on one side of a GigE link.  The link is a
> point-to-point, just connecting two computers.  As I move a lot of data,
> the error count increases.

    Why do you think this is a problem?

Quote:> What can I do to fix/debug something like this?  My understanding is that
> there's no half-duplex GigE, so that cannot be the problem.  The "errors"
> that I see are RX errors, dropped, and overruns.
> I'm confused about these overruns.  The computer is doing nothing but a dd
> reading from an iSCSI SAN (and writing to /dev/null).  It contains a
> 3 GHz Pentium 4 (it's an IBM x306).  Why would it be unable to
> handle the traffic?

    I doubt it can handle the peak bursts 100% of the time.

    DS

 
 
 

Errors on direct GigE link

Post by Andrew Gideon » Tue, 21 Mar 2006 02:07:26





>> I'm seeing errors on one side of a GigE link.  The link is a
>> point-to-point, just connecting two computers.  As I move a lot of data,
>> the error count increases.

>     Why do you think this is a problem?

Interesting question.  I'm blaming this for poor performance I'm seeing
under certain conditions.

Specifically, I'm copying data from an iSCSI-based SAN (testing the use of
the SAN, in fact).  When I have a single process doing the copy (ie. a dd
from a volume mounted from the SAN to /dev/null), I get pretty good
performance.  When I run two dd processes concurrently, I get awful
performance.
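
(Concretely, the two-process case is just two copies of the same read
running at once; the path below is only an example of where the SAN volume
is mounted:)

    dd if=/mnt/san/testfile of=/dev/null bs=64k &
    dd if=/mnt/san/testfile of=/dev/null bs=64k &
    wait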

I noticed that I see no (or occasionally very few) overruns in the case of
the single dd process, but more overruns in the case of the two dd
processes.

So I've been assuming that overruns are, if not the cause, at least a
common symptom.

Is that reasonable, or do you believe otherwise?  I'd be most interested
in other directions in which I should be looking.

[...]

Quote:>     I doubt it can handle the peak bursts 100% of the time.

I would think the same thing of my backup server.  Yet asking both the
switch to which it is connected and that machine itself, no errors of any
sort (including overruns) are reported.  Yet flow control is listed as off
on both switch and computer.

So something else is throttling.

I've no problem "blaming" rsync or ssh for this throttling.  I seem to
recall, in fact, reading of some "problem" in one of those two where the
TCP window was too small for fully exploiting certain links (high
speed, high latency?  I don't recall the details).
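
(If I remember the reasoning right, it's just bandwidth times round-trip
time: at GigE speeds, roughly 125 MB/s, even a 1 ms RTT needs about 125 KB
of window to keep the pipe full, which is already past the classic 64 KB
limit you get without window scaling.)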

Of course, I run multiple rsync/ssh processes concurrently, with streams
at the backup server coming from all over my network.  So I'm not sure I
entirely believe my own explanation.

So perhaps the solution is in tuning parameters like the TCP window for
iSCSI.  It looks like the initiator software on Linux permits this, so
I'll be checking.  I've no idea, though, how this will interact with the
SAN's own choices.  Unfortunately, it's more of a black box than I'd prefer.

But I welcome any and all insights into what's occurring and what I can do
to get better performance over this "point to point GigE" link.

Thanks...

        Andrew

 
 
 

Errors on direct GigE link

Post by David Schwartz » Tue, 21 Mar 2006 09:41:23



Quote:> Specifically, I'm copying data from an iSCSI-based SAN (testing the use of
> the SAN, in fact).  When I have a single process doing the copy (ie. a dd
> from a volume mounted from the SAN to /dev/null), I get pretty good
> performance.  When I run two dd processes concurrently, I get awful
> performance.

> I noticed that I see no (or occasionally very few) overruns in the case of
> the single dd process, but more overruns in the case of the two dd
> processes.

> So I've been assuming that overruns are, if not the cause, at least a
> common symptom.

> Is that reasonable, or do you believe otherwise?  I'd be most interested
> in other directions in which I should be looking.

    Is the CPU maxed at the time?

    What is "pretty good" performance and what is "awful performance"? The
problem may have more to do with the heads having to seek back and
forth for the two concurrent operations.

    DS

 
 
 

Errors on direct GigE link

Post by Allen McIntosh » Tue, 21 Mar 2006 10:32:30


Quote:> The
> problem may have more to do with the heads having to seek back and
> forth for the two concurrent operations.

Except IIRC he was running dd to /dev/null.

 >     Is the CPU maxed at the time?

This is my guess.  Assuming the SAN has multiple disks in it, copying
one file out may be limited by the disk transfer rate, and that may be
insufficient to max out the receiving CPU.  Two transfers may result in
enough throughput to swamp the receiving CPU.

What kernel is being used here?  Does the driver have NAPI turned on?
(It's off in vanilla kernels by default, YMMV.)

 
 
 

Errors on direct GigE link

Post by Andrew Gideon » Wed, 22 Mar 2006 02:03:54



>> The
>> problem may have more to do with the heads having to seek back and
>> forth for the two concurrent operations.

> Except IIRC he was running dd to /dev/null.

I assumed that he meant the disk heads on the SAN.  I've tried to avoid
this by having the two processes (in fact, all of my testing along this
line) read the same file from the SAN.  It is too large for cache (it's a
2G file), but there should still be minimal thrashing of disk hardware.

Quote:>  >     Is the CPU maxed at the time?

> This is my guess.  Assuming the SAN has multiple disks in it, copying
> one file out may be limited by the disk transfer rate, and that may be
> insufficient to max out the receiving CPU.  Two transfers may result in
> enough throughput to swamp the receiving CPU.

I never thought to check for that.  I'll address that below.

Quote:> What kernel is being used here?  Does the driver have NAPI turned on?
> (It's off in vanilla kernels by default, YMMV.)

It's 2.6.9-34.ELsmp (RHEL 4).  I don't know about whether or not NAPI is
on; how would I "ask"?

With respect to performance numbers:

Using a little tool we have here (so take all numbers as relative), I see a
count of from 45 to 65 with a single process and flow control off.  With
two processes and no flow control, I see between 4 and 6.

Supposedly, these numbers are MB/s, but I'd not really trust them as
absolute values.

The program is reading in 64KB chunks, so two should do little with
respect to the initiator's 1 GB of RAM.

There's a lot of variability in the numbers.  The same test might yield 22
in one case and then 55 in the next.  Yet, aside from my testing, the SAN
and the initiator are doing nothing.

I've tried tweaking various values (ie. the TCP window size, the number of
RX descriptors on the initiator, MaxRecvDataSegmentLength, etc.).  So far,
all results have been consistent with not having made the changes (perhaps
because there's so much variability).
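
(For the RX descriptor count specifically, I've been adjusting it via
ethtool's ring parameters rather than reloading the module each time; eth1
and the size below are just examples:)

    ethtool -g eth1            # show current and maximum ring sizes
    ethtool -G eth1 rx 1024    # raise the RX ring; the driver caps it at its own maximum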

When I first started this thread, I was thinking that perhaps the
"network" was an issue.  Now, I'm not so sure.

With respect to the system's load:

I'm running a dual process test now.  The load on the system is
holding just below two.  When I look at top, the processes doing the
reading are often at the top.  Occasionally, it is iscsi-rx.

With a single process, it's at about 1.4 most of the time.
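
(To check whether a single CPU is saturated with interrupt/softirq work
even though the load average looks modest, I've also been glancing at
things like:)

    vmstat 1                # watch the in/cs columns and the system/idle split
    cat /proc/interrupts    # see how many interrupts eth1 generates, and on which CPU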

        - Andrew

 
 
 

Errors on direct GigE link

Post by Rick Jones » Wed, 22 Mar 2006 07:36:05



> The program is reading in 64KB chunks, so two should do little with
> respect to the initiator's 1 GB of RAM.

What size are the iSCSI requests?  

How many iSCSI requests are outstanding at a time?

How do those compare with the size of the FIFO on the GbE NIC?

IIRC Overruns means that the inbound FIFO doing the "speed matching"
between the network and the memory subsystem (shorthand for
"everything between the NIC and the DRAMS :) filled - basically, it
means the DMA to memory lost the race with the network.  Having said
that I thought that all GbE NICs were store-and-forward, but I suppose
some low-end ones might try to get-by with a FIFO rather than
store-and-forward memory on the "NIC" itself.

And... I suppose that if the NIC memory for store-and-forward filled
that _might_ be aggregated into an "Overflows" stat.  If ethtool can
show some more specific stats that might be good.  Ethtool should be
able to distinguish (assuming the driver does) between FIFO overflows
and running-out of rx descriptors etc.
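
Something like this ought to dump them (the exact stat names are up to the
driver; on e1000 I would _expect_ counters along the lines of
rx_no_buffer_count versus rx_missed_errors, but look at what yours
actually reports):

    ethtool -S eth1 | egrep -i 'err|drop|miss|fifo|no_buffer'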

rick jones
--
portable adj, code that compiles under more than one compiler
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

 
 
 

Errors on direct GigE link

Post by Allen McIntosh » Wed, 22 Mar 2006 11:39:49


Quote:> It's 2.6.9-34.ELsmp (RHEL 4).  I don't know about whether or not NAPI is
> on; how would I "ask"?

grep NAPI /boot/config-2.6.9-34.ELsmp
(or whatever they named the config file used to generate the kernel).
 
 
 

Errors on direct GigE link

Post by Andrew Gideon » Tue, 28 Mar 2006 23:56:44



>> It's 2.6.9-34.ELsmp (RHEL 4).  I don't know about whether or not NAPI is
>> on; how would I "ask"?

> grep NAPI /boot/config-2.6.9-34.ELsmp (or whatever they named the config
> file used to generate the kernel).

CONFIG_AMD8111E_NAPI=y
CONFIG_ADAPTEC_STARFIRE_NAPI=y
CONFIG_E100_NAPI=y
CONFIG_E1000_NAPI=y
CONFIG_R8169_NAPI=y
CONFIG_IXGB_NAPI=y
CONFIG_S2IO_NAPI=y

The NIC I'm using uses the e1000 driver, so I'm going to guess that the
answer is "yes".

        - Andrew

 
 
 

Errors on direct GigE link

Post by Andrew Gideon » Tue, 28 Mar 2006 23:55:23




>> The program is reading in 64KB chunks, so two should do little with
>> respect to the initiator's 1 GB of RAM.

> What size are the iSCSI requests?

I'm not sure.  The initiator has configuration parameters which should
control this, but I don't know how to confirm that these are being
[properly] used.

Assuming that they are, however: The latest testing I've done used a
Maximum Burst Length of 131072 bytes and a Maximum Receive Data Segment of
65536 bytes.  I've been playing (decreasing) these values during testing
to no apparent effect.

Quote:> How many iSCSI requests are outstanding at a time?

I don't know.  How can I determine that?

Quote:> How do those compare with the size of the FIFO on the GbE NIC?

I'm missing how I can get this information (ie. from ethtool).  How do I
determine the size of the queue?

Quote:> IIRC Overruns means that the inbound FIFO doing the "speed matching"
> between the network and the memory subsystem (shorthand for "everything
> between the NIC and the DRAMs" :) filled - basically, it means the DMA to
> memory lost the race with the network.  Having said that, I thought that
> all GbE NICs were store-and-forward, but I suppose some low-end ones
> might try to get by with a FIFO rather than store-and-forward memory on
> the "NIC" itself.

Does it matter that this is a NIC on a system board?  Might that make it
more likely to be "low end"?

Quote:> And... I suppose that if the NIC memory for store-and-forward filled
> that _might_ be aggregated into an "Overflows" stat.  If ethtool can
> show some more specific stats that might be good.  Ethtool should be
> able to distinguish (assuming the driver does) between FIFO overflows
> and running-out of rx descriptors etc.

You're speaking of the stats from ethtool -S?  I've not tried checking
these.  I shall.

        - Andrew

 
 
 

Errors on direct GigE link

Post by Rick Jones » Wed, 29 Mar 2006 07:10:38





>>> The program is reading in 64KB chunks, so two should do little with
>>> respect to the initiator's 1 GB of RAM.

>> What size are the iSCSI requests?
> I'm not sure.  The initiator has configuration parameters which should
> control this, but I don't know how to confirm that these are being
> [properly] used.
> Assuming that they are, however: The latest testing I've done used a
> Maximum Burst Length of 131072 bytes and a Maximum Receive Data Segment of
> 65536 bytes.  I've been playing (decreasing) these values during testing
> to no apparent effect.
>> How many iSCSI requests are outstanding at a time?
> I don't know.  How can I determine that?

Heck if I know :)

Quote:>> How do those compare with the size of the FIFO on the GbE NIC?
> I'm missing how I can get this information (ie. from ethtool).  How do I
> determine the size of the queue?

I'm not sure if that is in the ethtool output.  It _may_ be part of
stuff spat to dmesg.  Otherwise, armed with the part information for
your GbE chip you may need to go to the vendor's website.
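
Something along the lines of:

    dmesg | grep -i e1000    # the driver banner sometimes includes chip details
    lspci -v -s 02:01.0      # part/revision info for the onboard controller you listed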

Quote:> Does it matter that this is a NIC on a system board?  Might that make it
> more likely to be "low end"?

Possibly.

Quote:>> And... I suppose that if the NIC memory for store-and-forward filled
>> that _might_ be aggregated into an "Overflows" stat.  If ethtool can
>> show some more specific stats that might be good.  Ethtool should be
>> able to distinguish (assuming the driver does) between FIFO overflows
>> and running-out of rx descriptors etc.
> You're speaking of the stats from ethtool -S?  I've not tried checking
> these.  I shall.

Can't hurt.

rick jones
--
No need to believe in either side, or any side. There is no cause.
There's only yourself. The belief is in your own precision.  - Jobert
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

 
 
 

Errors on direct GigE link

Post by Allen McIntosh » Wed, 29 Mar 2006 12:34:31




>>> It's 2.6.9-34.ELsmp (RHEL 4).  I don't know about whether or not NAPI is
>>> on; how would I "ask"?
>> grep NAPI /boot/config-2.6.9-34.ELsmp (or whatever they named the config
>> file used to generate the kernel).

> CONFIG_E1000_NAPI=y

> The NIC I'm using uses the e1000 driver, so I'm going to guess that the
> answer is "yes".

Indeed.  NAPI is supposed to help get away from "one interrupt per
packet" mode.  Looks like it is already doing that for you.
 
 
 
