Network timeout and excessive collisions in rescue environment.

Post by Gary Taylo » Fri, 08 Nov 2002 13:16:44



Hi,

I'm working on my backup regimen, and when I boot into my rescue
environment the networking eventually dies and I get hundreds of
thousands of collisions: over 500,000 when doing network
operations.  When I perform the same operation in my Linux box's
native environment it works fine and I rarely get collisions.

I have two Linux boxes.  The server has the tape drive and runs
Mandrake with a 2.2.17-21mdk kernel.  The client machine runs Red
Hat with a 2.4.18-mywin4lin-6 kernel.  I back up the client to the
server's tape drive while the full-blown Red Hat distribution is
running, and that works great.  Next I boot into the rescue
environment so I can try the restore.  The restore starts but
eventually stops responding.  If I kill the backup process at
that point the client console works normally; it isn't locked up.
At that point the server will not respond to pings.

I tried booting into a rescue environment using both Mindi and
the Linux rescue option from the first Red Hat CD.  I can transfer
some data in the rescue environment, but eventually the operation
simply stops.  When I tried ftp in the rescue environment, instead
of steady progress it seemed to transfer in bursts.

I know collisions are the result of two devices trying to talk at
the same time.  What I can't figure out is why this happens only
in the rescue environment.  Plus I don't know why there is
contention at all, given that I only have two active devices.  I
don't know whether the collisions are the problem per se or just
the symptom I notice.
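On a hub, every port shares one collision domain and the NICs run half-duplex, so even two busy hosts contend with each other: a bulk transfer sends data frames one way and TCP acknowledgements the other. Each collision is resolved by CSMA/CD's truncated binary exponential backoff, as described in IEEE 802.3. The Python sketch below illustrates that rule only; the function and constant names are hypothetical, and the slot time assumes 100 Mb/s operation (512 bit times = 5.12 microseconds). It says nothing about why the rescue kernel behaves differently.

```python
import random

SLOT_TIME_US_100MBIT = 5.12  # assumption: 512 bit times at 100 Mb/s
BACKOFF_LIMIT = 10           # the backoff exponent stops growing after 10 collisions
ATTEMPT_LIMIT = 16           # a frame is discarded after 16 failed attempts


def backoff_slots(collision_count: int, rng: random.Random) -> int:
    """Slot times to wait after the n-th consecutive collision on one frame."""
    if collision_count < 1:
        raise ValueError("collision_count must be >= 1")
    if collision_count >= ATTEMPT_LIMIT:
        # 802.3 calls this condition "excessive collisions"
        raise RuntimeError("excessive collisions: frame would be discarded")
    k = min(collision_count, BACKOFF_LIMIT)
    # Pick uniformly from 0 .. 2^k - 1 slot times
    return rng.randint(0, 2 ** k - 1)


def max_wait_us(collision_count: int) -> float:
    """Upper bound on the backoff delay, in microseconds, at 100 Mb/s."""
    k = min(collision_count, BACKOFF_LIMIT)
    return (2 ** k - 1) * SLOT_TIME_US_100MBIT


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (1, 2, 3, 10, 15):
        print(n, backoff_slots(n, rng), round(max_wait_us(n), 2))
```

The delays are tiny (at most about 5.2 ms per attempt), so collisions on their own mostly cost throughput; a frame is only lost after 16 collisions in a row.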

Both Ethernet NICs are 10/100 and the hub is 10/100.  Both NICs
are operating in 100 Mb mode.  The server NIC is an Intel Pro
10/100; the client NIC is a VIA (on an Amptron motherboard).  The
hub is a Netgear.  The machines are connected by RJ-45 network
cables, each less than 5 ft long.  There are a total of four
machines on this "network", with only these two actively being
used.  The others are plugged in and running but with no or
minimal network activity.

I can rlogin to the server and that works, but the response is
sluggish during the restore.

I tried running ping while transferring a file, and the results
look like this:

64 bytes from 192.168.1.1: icmp_seq=35 ttl=255 time=3080 ms
64 bytes from 192.168.1.1: icmp_seq=36 ttl=255 time=2081 ms
64 bytes from 192.168.1.1: icmp_seq=37 ttl=255 time=1081 ms
64 bytes from 192.168.1.1: icmp_seq=38 ttl=255 time=81.1 ms
64 bytes from 192.168.1.1: icmp_seq=39 ttl=255 time=4.05 ms
64 bytes from 192.168.1.1: icmp_seq=40 ttl=255 time=0.203 ms
64 bytes from 192.168.1.1: icmp_seq=41 ttl=255 time=0.215 ms
64 bytes from 192.168.1.1: icmp_seq=42 ttl=255 time=2066 ms
64 bytes from 192.168.1.1: icmp_seq=43 ttl=255 time=1065 ms
64 bytes from 192.168.1.1: icmp_seq=44 ttl=255 time=67.0 ms
64 bytes from 192.168.1.1: icmp_seq=45 ttl=255 time=2.57 ms
64 bytes from 192.168.1.1: icmp_seq=46 ttl=255 time=1.53 ms
64 bytes from 192.168.1.1: icmp_seq=47 ttl=255 time=0.348 ms
64 bytes from 192.168.1.1: icmp_seq=48 ttl=255 time=2.48 ms
64 bytes from 192.168.1.1: icmp_seq=49 ttl=255 time=0.316 ms

--- 192.168.1.1 ping statistics ---
63 packets transmitted, 61 received, 3% loss, time 62285ms
rtt min/avg/max/mdev = 0.185/915.554/3180.369/1045.451 ms, pipe 4
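The round-trip times above fall by almost exactly 1000 ms from one sequence number to the next, which is what you would expect if several replies were held up and then delivered together. The Python sketch below checks that reading, assuming ping's default one-second interval so that request n left at n seconds after an arbitrary origin; the helper names are hypothetical. It is one interpretation of the numbers, not a diagnosis.

```python
# (icmp_seq, rtt_ms) pairs copied from the ping output above
SAMPLES = [
    (35, 3080), (36, 2081), (37, 1081), (38, 81.1), (39, 4.05),
    (40, 0.203), (41, 0.215), (42, 2066), (43, 1065), (44, 67.0),
    (45, 2.57), (46, 1.53), (47, 0.348), (48, 2.48), (49, 0.316),
]

INTERVAL_S = 1.0  # assumption: ping's default interval between echo requests


def reply_arrival_times(samples, interval_s=INTERVAL_S):
    """Estimate when each reply arrived: send time (seq * interval) + RTT."""
    return [(seq, seq * interval_s + rtt_ms / 1000.0) for seq, rtt_ms in samples]


def loss_percent(transmitted, received):
    """Packet loss as a percentage of packets transmitted."""
    return 100.0 * (transmitted - received) / transmitted


if __name__ == "__main__":
    for seq, t in reply_arrival_times(SAMPLES):
        print(f"seq {seq}: reply at ~{t:.3f} s")
    print(f"loss: {loss_percent(63, 61):.1f}%")
```

Running it puts the replies for seq 35 to 38 all at about 38.08 s and those for seq 42 to 44 at about 44.07 s, i.e. two bursts, which matches the bursty ftp behaviour described above. The loss works out to 3.2%, which ping reports as 3%.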

I configure my devices manually on the client, and here is how
eth0 looks on the client after I set it up:

eth0      Link encap:Ethernet  HWaddr 00:07:95:44:4D:E8  
          inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:882 (882.0 b)  TX bytes:0 (0.0 b)
          Interrupt:5 Base address:0xdc00

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Here is eth0 from the server:
eth0      Link encap:Ethernet  HWaddr 00:D0:B7:AF:2C:14  
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1020171 errors:0 dropped:0 overruns:0 frame:1
          TX packets:1088683 errors:0 dropped:0 overruns:0 carrier:0
          collisions:575159 txqueuelen:100
          Interrupt:11 Base address:0xd800

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:90251 errors:0 dropped:0 overruns:0 frame:0
          TX packets:90251 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
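To put the server's counters in proportion, the Python sketch below divides collisions by transmitted packets using the figures from the two ifconfig listings. The helper names are hypothetical, and drivers differ in exactly what they count as a collision, so treat the ratio as rough.

```python
import re
from typing import Optional

# eth0 counters copied from the ifconfig output above
SERVER_ETH0 = """\
RX packets:1020171 errors:0 dropped:0 overruns:0 frame:1
TX packets:1088683 errors:0 dropped:0 overruns:0 carrier:0
collisions:575159 txqueuelen:100
"""

CLIENT_ETH0 = """\
RX packets:6 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
"""


def read_counter(text: str, label: str) -> int:
    """Return the integer that follows 'label:' in ifconfig-style text."""
    match = re.search(re.escape(label) + r":(\d+)", text)
    if match is None:
        raise ValueError(f"counter {label!r} not found")
    return int(match.group(1))


def collision_ratio(text: str) -> Optional[float]:
    """Collisions per transmitted packet, or None if nothing was sent."""
    tx = read_counter(text, "TX packets")
    if tx == 0:
        return None
    return read_counter(text, "collisions") / tx


if __name__ == "__main__":
    for name, text in (("server", SERVER_ETH0), ("client", CLIENT_ETH0)):
        ratio = collision_ratio(text)
        shown = "n/a (no packets sent yet)" if ratio is None else f"{ratio:.1%}"
        print(f"{name}: {shown}")
```

For the server this gives about 52.8%, roughly one collision for every two frames sent. The client listing was captured right after configuration (TX packets:0), so it yields no ratio and cannot be compared yet.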

I've tried running the diagnostics from
http://www.scyld.com/diag/index.html and as far as I can tell
things are good.

I haven't tried putting another NIC into one of the available PCI
slots, mainly because it involves futzing with otherwise working
hardware and kernel configuration.

I would appreciate any advice or troubleshooting suggestions.

Thanks,
Gary


Post by Greg DeFreita » Sat, 09 Nov 2002 09:25:23



> Hi,

> I'm working on my back up regimen and when I boot up in my rescue
> environment the networking eventually dies and I am getting 100s
> of thousands of collisions.  Over 500,000 when trying to do
> network operations.  When I perform the same operation in my
> Linux box native environment it works fine and I rarely get
> collisions.

Hmmm,
        What about those times when you do it from install media ?
Using a "stock" kernel such as the dist's "rescue" option should let you
try your ideas, you could boot from fd/cd and give it a go.
Save stuff like typescript output and command history  to fd for later
preparation of scripts (yeah, I'm _that_ lazy ;-)

> Thanks,
> Gary



You're welcome, HTH.
