100% packet loss with new router - help!

Post by Terry Sikes » Sun, 23 Nov 2003 06:01:32



Hi all. I'm having a problem which I've looked into for a couple of hours,
and which has been bounced off three sysadmins, one of whom is a
certified Solaris admin. I'm fairly competent with Unix admin in
general, but I'm certainly not a guru.

The situation is this. At our old location we had a cable modem,
connected to a router. Its address was 192.168.2.1, netmask
255.255.255.0. We had to leave the router at the old location, so we
picked up a new one for this office. I'm 99.9% sure it is configured
just as the old one was. I also have a Linux development box, which had
a static IP at the old location (192.168.2.32). It came right up, saw
the Internet and is generally a happy camper. We also hooked a Windows
box up and it was able to get a DHCP IP address, could see the internet,
and ping the Linux box. So, the network seems generally healthy.

However, when the Solaris box (SunOS 5.8) booted, and I tried to ping it
at its static IP address (192.168.2.107) as I usually do to see if it's
up yet, I never got anything except "Destination Host Unreachable" and a
100% packet loss message after ^C. I have set DHCP on the router to
start handing out addresses at 192.168.2.110, so that's not a conflict
(and the router's DHCP table never showed the 107 address being leased
regardless).

The only other thing I've been able to ferret out is that if I run
tcpdump on the eth0 device on the Linux box, and then ping the Solaris
box, tcpdump reports:

15:36:25.159509 arp who-has sisus01 tell 192.168.2.32

Now the weird part about that is "sisus01" is not present anywhere
except on the Solaris box. Also, when the Linux box pings, the light for
the Solaris box connection also blinks as though the router knows the
system is 192.168.2.107. I'm out of ideas at this point.

This is a showstopper for us right now; I could REALLY use some help!

Thanks very much!

Terry

 
 
 

Post by Terry Sikes » Sun, 23 Nov 2003 06:46:26



> Now the weird part about that is "sisus01" is not present anywhere
> except on the Solaris box.

My mistake: "sisus01" WAS in the hosts file on the Linux box...so we most
likely aren't getting any data back.

Thanks again for any help,

Terry

 
 
 

Post by Gary Armstron » Sun, 23 Nov 2003 06:54:10



> Hi all. I'm having a problem which I've looked into for a couple hours,
> and which has been bounced off three sysadmins, one of which is a
> certified Solaris admin. I'm fairly competent with Unix admin in
> general, but I'm certainly not a guru.

> The situation is this. At our old location we had a cable modem,
> connected to a router. It's address was 192.168.2.1, netmask
> 255.255.255.0. We had to leave the router at the old location, so we
> picked up a new one for this office. I'm 99.9% sure it is configured
> just as the old one was. I also have a Linux development box, which had
> a static IP at the old location (192.168.2.32). It came right up, saw
> the Internet and is generally a happy camper. We also hooked a Windows
> box up and it was able to get a DHCP IP address, could see the internet,
> and ping the Linux box. So, the network seems generally healthy.

> However, when the Solaris box (SunOS 5.8) booted, and I tried to ping it
> at it's static IP address (192.168.2.107) as I usually do to see if it's
> up yet, I never got anything except "Destination Host Unreachable" and a
> 100% packet loss message after ^C. I have set DHCP on the router to
> start handing out addresses at 192.168.2.110, so that's not a conflict
> (and the router's DHCP table never showed the 107 address being leased
> regardless).

How is the Sun getting its host info? NIS, files, DNS?

> The only other thing I've been able to ferret out is that if I run
> tcpdump on the eth0 device on the Linux box, and then ping the Solaris
> box, tcpdump reports:

> 15:36:25.159509 arp who-has sisus01 tell 192.168.2.32

> Now the weird part about that is "sisus01" is not present anywhere
> except on the Solaris box. Also, when the Linux box pings, the light for
> the Solaris box connection also blinks as though the router knows the
> systems is 192.168.2.107. I'm out of ideas at this point.

> This is a showstopper for us right now, I could REALLY use some help!

> Thanks very much!

> Terry

 
 
 

Post by Terry Sikes » Sun, 23 Nov 2003 06:57:22




>> Hi all. I'm having a problem which I've looked into for a couple
>> hours, and which has been bounced off three sysadmins, one of which is
>> a certified Solaris admin. I'm fairly competent with Unix admin in
>> general, but I'm certainly not a guru.

>> The situation is this. At our old location we had a cable modem,
>> connected to a router. It's address was 192.168.2.1, netmask
>> 255.255.255.0. We had to leave the router at the old location, so we
>> picked up a new one for this office. I'm 99.9% sure it is configured
>> just as the old one was. I also have a Linux development box, which
>> had a static IP at the old location (192.168.2.32). It came right up,
>> saw the Internet and is generally a happy camper. We also hooked a
>> Windows box up and it was able to get a DHCP IP address, could see the
>> internet, and ping the Linux box. So, the network seems generally
>> healthy.

>> However, when the Solaris box (SunOS 5.8) booted, and I tried to ping
>> it at it's static IP address (192.168.2.107) as I usually do to see if
>> it's up yet, I never got anything except "Destination Host
>> Unreachable" and a 100% packet loss message after ^C. I have set DHCP
>> on the router to start handing out addresses at 192.168.2.110, so
>> that's not a conflict (and the router's DHCP table never showed the
>> 107 address being leased regardless).

> How is the Sun getting it's host info? NIS, files, DNS?

It's using the file (the name escapes me at the moment, equivalent of
/etc/hosts). DNS was never set up properly; that's a task for the final
site location in a few weeks...at least that's when I'd prefer to do it. :-)

Terry

 
 
 

Post by Gary Armstron » Sun, 23 Nov 2003 07:07:24





>>> Hi all. I'm having a problem which I've looked into for a couple
>>> hours, and which has been bounced off three sysadmins, one of which
>>> is a certified Solaris admin. I'm fairly competent with Unix admin in
>>> general, but I'm certainly not a guru.

>>> The situation is this. At our old location we had a cable modem,
>>> connected to a router. It's address was 192.168.2.1, netmask
>>> 255.255.255.0. We had to leave the router at the old location, so we
>>> picked up a new one for this office. I'm 99.9% sure it is configured
>>> just as the old one was. I also have a Linux development box, which
>>> had a static IP at the old location (192.168.2.32). It came right up,
>>> saw the Internet and is generally a happy camper. We also hooked a
>>> Windows box up and it was able to get a DHCP IP address, could see
>>> the internet, and ping the Linux box. So, the network seems generally
>>> healthy.

>>> However, when the Solaris box (SunOS 5.8) booted, and I tried to ping
>>> it at it's static IP address (192.168.2.107) as I usually do to see
>>> if it's up yet, I never got anything except "Destination Host
>>> Unreachable" and a 100% packet loss message after ^C. I have set DHCP
>>> on the router to start handing out addresses at 192.168.2.110, so
>>> that's not a conflict (and the router's DHCP table never showed the
>>> 107 address being leased regardless).

>> How is the Sun getting it's host info? NIS, files, DNS?

> It's using the file (the name escapes me at the moment, equivalent of
> /etc/hosts). DNS was never set up properly, that's a task for the final
> site location in a few weeks...at least that's when I'd prefer to do it.
> :-)

> Terry

Reply with the output of:
grep host /etc/nsswitch.conf
cat /etc/resolv.conf
cat /etc/hosts (or /etc/inet/hosts; they are usually linked)
 
 
 

Post by Logan Sha » Sun, 23 Nov 2003 07:19:30



> The only other thing I've been able to ferret out is that if I run
> tcpdump on the eth0 device on the Linux box, and then ping the Solaris
> box, tcpdump reports:

> 15:36:25.159509 arp who-has sisus01 tell 192.168.2.32

I guess sisus01 corresponds to the Solaris machine's IP address?

What appears to be happening here is that your Linux machine needs
to know the Ethernet address of the Solaris box, so it is sending
out an ARP request.  It isn't getting an answer.

This means one of the following:

(1)  You have a broken switch that switches traffic wrong.
(2)  You have a physical problem with the connection for the Solaris
      system.
(3)  Your Solaris system doesn't believe its address is whatever
      address corresponds to sisus01.
(4)  Your Solaris machine has a hardware problem.

I would check the physical connections first.  Make sure you have
a link light on BOTH ends of the link between the Solaris machine
and the switch.  Make sure that you have not plugged the Solaris
machine into an uplink port on the switch.  Try a different
Ethernet cable.

Then, go to the Solaris machine and type "ifconfig -a" to see what
it thinks its IP address is and that the network interface is "up".
Do a "netstat -nr" to ensure that it really has a route to the
192.168.2.X network.  If it looks like it has the right address, do
a "snoop -r" as root and see if you see any traffic going by.  Do
the ping from the Linux machine and see if that causes any traffic.
Try a ping from the Solaris machine to the router's IP address and
see if that gets a reply.

By the way, you say that when you do a ping from the Linux machine to the
Solaris machine, you see a light blinking.  (I guess you mean
on the switch[1].)  This doesn't mean a whole lot.  When an
ARP request goes out, it is broadcast to all machines on the
ethernet because the whole point of an ARP request is that you
don't know which ethernet address corresponds to the IP address
you want to speak to.  So you must do an ethernet broadcast.
It could also mean that "ping"'s ICMP packet is making it to that
port on the switch.  Even so, that doesn't rule out a cabling
problem.  The two directions of signalling in a twisted-pair
ethernet network are independent, so it's possible to have the
correct electrical connection for sending but not receiving
and vice versa.
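
To make the broadcast point concrete, here is a minimal Python sketch
(an illustration only, not something that was run in this thread) that
builds the kind of ARP request frame behind the tcpdump line above. The
IP addresses are the ones from the original post; the sender MAC is a
made-up placeholder, since the real one was never posted.

```python
import socket
import struct

# Ethernet broadcast address: every station on the segment receives it.
BROADCAST_MAC = bytes.fromhex("ffffffffffff")


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Return a raw Ethernet frame carrying an ARP who-has request."""
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 (ARP).
    eth_header = BROADCAST_MAC + sender_mac + struct.pack("!H", 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode 1 = request ("who-has")
        sender_mac,
        socket.inet_aton(sender_ip),
        bytes(6),                     # target MAC is unknown, so all zeros
        socket.inet_aton(target_ip),
    )
    return eth_header + arp_payload


# Hypothetical MAC for the Linux box; only the IP addresses come from the post.
LINUX_MAC = bytes.fromhex("020000000020")
frame = build_arp_request(LINUX_MAC, "192.168.2.32", "192.168.2.107")
print(frame[:6].hex(":"))  # destination MAC of the request
```

Because the destination is the broadcast address, the switch forwards
the frame out of every port, which is why a blinking light on the
Solaris port says little about whether the Solaris box answered.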

If that doesn't help you, you'll probably need to provide some
more information.

Hope that helps.

   - Logan

[1]  Your device may be a combination router and switch.  I'm just
      using whatever noun applies functionally within the context.

 
 
 

Post by Terry Sikes » Sun, 23 Nov 2003 07:21:19



> reply with:
> grep host /etc/nsswitch.conf
> cat /etc/resolv.conf
> cat /etc/hosts or /etc/inet/hosts they are usually linked

I sure will as soon as I can. We should have a serial cable to get into
the headless Sun box any minute now.

Thanks VERY much for your help!

Terry

 
 
 

Post by Terry Sikes » Sun, 23 Nov 2003 07:37:19




>> The only other thing I've been able to ferret out is that if I run
>> tcpdump on the eth0 device on the Linux box, and then ping the Solaris
>> box, tcpdump reports:

>> 15:36:25.159509 arp who-has sisus01 tell 192.168.2.32

> I guess sisus01 corresponds to the Solaris machine's IP address?

> What appears to be happening here is that your Linux machine needs
> to know the Ethernet address of the Solaris box, so it is sending
> out an ARP request.  It isn't getting an answer.

> This means one of the following:

> (1)  You have a broken switch that switches traffic wrong.
> (2)  You have a physical problem with the connection for the Solaris
>      system.
> (3)  Your Solaris system doesn't believe its address is whatever
>      address correponds to sisus01.
> (4)  Your Solaris machine has a hardware problem.

> I would check the physical connections first.  Make sure you have
> a link light on BOTH ends of the link between the Solaris machine
> and the switch.  Make sure that you have not plugged the Solaris
> machine into an uplink port on the switch.  Try a different
> Ethernet cable.

> Then, go to the Solaris machine and type "ifconfig -a" to see what
> it thinks its IP address is and that the network interface is "up".
> Do a "netstat -nr" to ensure that it really has a route to the
> 192.168.2.X network.  If it looks like it has the right address, do
> a "snoop -r" as root and see if you see any traffic going by.  Do
> the ping from the Linux machine and see if that causes any traffic.
> Try a ping from the Solaris machine to the router's IP address and
> see if that gets a reply.

> By the way, you say when do a ping from the Linux machine to the
> Solaris machine, you see a light blinking.  (I guess you mean
> on the switch[1].)  This doesn't mean a whole lot.  When an
> ARP request goes out, it is broadcast to all machines on the
> ethernet because the whole point of an ARP request is that you
> don't know which ethernet address corresponds to the IP address
> you want to speak to.  So you must do an ethernet broadcast.
> It could also mean that "ping"'s icmp packet is making to that
> port on the switch.  Even so, that doesn't rule out a cabling
> problem.  The two directions of signalling in a twisted-pair
> ethernet network are independent, so it's possible to have the
> correct electrical connection for sending but not receiving
> and vice versa.

> If that doesn't help you, you'll probably need to provide some
> more information.

> Hope that helps.

>   - Logan

> [1]  Your device may be a combination router and switch.  I'm just
>      using whatever noun applies functionally within the context.

Thanks very much, that all looks very useful. We've had a setback
getting a suitable cable so it may be a couple of hours, but as soon as
I can I'll provide all the information that's been requested.

You guys are lifesavers! (Although I'm suspecting a hardware issue at
this point).

Terry

 
 
 

Post by Lon Stowel » Sun, 23 Nov 2003 08:18:29


Approximately 11/21/03 13:01, Terry Sikes uttered for posterity:

> Hi all. I'm having a problem which I've looked into for a couple hours,
> and which has been bounced off three sysadmins, one of which is a
> certified Solaris admin. I'm fairly competent with Unix admin in
> general, but I'm certainly not a guru.

> The situation is this. At our old location we had a cable modem,
> connected to a router. It's address was 192.168.2.1, netmask
> 255.255.255.0. We had to leave the router at the old location, so we
> picked up a new one for this office. I'm 99.9% sure it is configured
> just as the old one was. I also have a Linux development box, which had
> a static IP at the old location (192.168.2.32). It came right up, saw
> the Internet and is generally a happy camper. We also hooked a Windows
> box up and it was able to get a DHCP IP address, could see the internet,
> and ping the Linux box. So, the network seems generally healthy.

  I would guess that you've been bitten by the 0.1% chance that your
  router is misconfigured.

  Typical cause for ICMP Host Unreachable is that a *router* has
  received a packet for a host and is attempting to ARP for that
  host and gets nada.  The router will generate the Host Unreachable.

  This is useful information, as the packet shouldn't have gotten
  within 100 yards of the router in the first place...

  You don't identify which box you are sending the ping from, or
  that box's IP address, netmask, and default router.  Nor do you
  say whether the box doing the pinging is on the local subnet or
  behind the router... all critical information here.

  If you send a ping from one host in the 192.168.2.* subnet to
  another, it should never reach the router; if it does, the
  sending station has a bad netmask.
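
  As a rough illustration of that netmask rule, the Python sketch
  below mimics the decision a sending host makes: deliver directly
  (ARP for the target) if the destination falls inside its own
  subnet, otherwise hand the packet to the default router. The
  addresses are the ones from this thread; the 255.255.255.192
  netmask is a hypothetical misconfiguration, not something anyone
  here has reported.

```python
import ipaddress


def next_hop(src_ip: str, netmask: str, dst_ip: str, gateway: str) -> str:
    """Return the IP address the sender will ARP for to reach dst_ip."""
    # The sender derives its local network from its own address and netmask.
    local_net = ipaddress.ip_network(f"{src_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(dst_ip) in local_net:
        return dst_ip      # on-link: ARP for the destination itself
    return gateway         # off-link: ARP for the router instead


# Correct /24 netmask: the Linux box ARPs for the Sun directly.
good = next_hop("192.168.2.32", "255.255.255.0", "192.168.2.107", "192.168.2.1")

# Hypothetical bad netmask: .107 now looks remote, so traffic goes to the router.
bad = next_hop("192.168.2.32", "255.255.255.192", "192.168.2.107", "192.168.2.1")

print(good, bad)
```

  With the correct mask the packet never touches the router; with
  the too-narrow mask it does, which is the symptom described above.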

> However, when the Solaris box (SunOS 5.8) booted, and I tried to ping it
> at it's static IP address (192.168.2.107) as I usually do to see if it's
> up yet, I never got anything except "Destination Host Unreachable" and a
> 100% packet loss message after ^C. I have set DHCP on the router to
> start handing out addresses at 192.168.2.110, so that's not a conflict
> (and the router's DHCP table never showed the 107 address being leased
> regardless).

  Y'all need to check this from a station on the Solaris subnet.
  If your pinging box is allegedly on that same subnet, then that
  pinging host has a bad netmask.

> The only other thing I've been able to ferret out is that if I run
> tcpdump on the eth0 device on the Linux box, and then ping the Solaris
> box, tcpdump reports:

> 15:36:25.159509 arp who-has sisus01 tell 192.168.2.32

  Sure would be useful information to know what "sisus01" hostname
  resolves to in terms of IP address.   If it is 192.168.2.107
  then the Sun box is either not receiving the arp request or
  is not responding or *it* may have a bad netmask.

  ifconfig -a.

  Or better yet, run tcpdump without hostname resolution on this
  Linux box and run snoop on the Sun at the same time.

  If the Sun sees the ARP request, it should respond, which it will
  do only if it agrees that the IP address is a local one.  Use
  the mac addresses in the snoop and tcpdump to see exactly who
  is speaking and being spoken to.

  If that Sun has multiple net interfaces, make sure they all
  have unique MAC addresses, otherwise you'll drive your switch nuts.

  While you are pinging on the Linux, also do a ping from the Solaris
  to that same Linux.... with snoop/tcpdump running.

> Now the weird part about that is "sisus01" is not present anywhere
> except on the Solaris box. Also, when the Linux box pings, the light for
> the Solaris box connection also blinks as though the router knows the
> systems is 192.168.2.107. I'm out of ideas at this point.

  snoop solaris.

  And I have no idea what you mean by "the light for the Solaris
  box connection", as a ping from Linux to Solaris should not
  be routed at all...they are on the same subnet.   Check the light
  on the Sun NIC.

  With both boxen on the same subnet:

     Ping from Linux to SUN.

      With tcpdump, you should see the ARP going to the MAC
      broadcast address with the IP address of the Sun in the
      ARP data field.

      With snoop, you should see this ARP arrive on the Sun,
      again with a broadcast MAC address.

      If, on the Linux, you see a ping to the Sun addressed to
      the router's MAC address, the Linux has a bad netmask.

      On the Sun, if that IP address is plumbed and alive
      [see output of netstat -in] the Sun should answer.

      The Sun should send the ARP reply to the MAC of the
      Linux.

      If the Sun sends an ARP for the address of the router, it
      has a bad netmask.  Ditto if it sends anything to the
      MAC address of the router.

> This is a showstopper for us right now, I could REALLY use some help!


  This could be as trivial as the Sun interface being left to
  autonegotiate and failing to do so.  In that case, ARP will only
  work intermittently.

> Thanks very much!

> Terry

--
Still a Raiders fan, but no longer sure why.
 
 
 

Post by Lon Stowel » Sun, 23 Nov 2003 08:20:24


Approximately 11/21/03 13:46, Terry Sikes uttered for posterity:


>> Now the weird part about that is "sisus01" is not present anywhere
>> except on the Solaris box.

> My mistake "sisus01" WAS in the hosts file on the Linux box...so we most
> likely aren't getting any data back.

  Had to have been.  If that hostname were unknown to Linux, it
  couldn't have put that name in the tcpdump output unless the
  Sun box were a NIS or DNS server, in which case the ping
  would have worked.

  Currently you don't know if the pings are arriving at the
  Sun... use snoop.

--
Still a Raiders fan, but no longer sure why.

 
 
 

Post by Lon Stowel » Sun, 23 Nov 2003 08:22:01


Approximately 11/21/03 13:57, Terry Sikes uttered for posterity:


>> How is the Sun getting it's host info? NIS, files, DNS?

> It's using the file (the name escapes me at the moment, equivalent of
> /etc/hosts). DNS was never set up properly, that's a task for the final
> site location in a few weeks...at least that's when I'd prefer to do it. :-)

> Terry

  The file is /etc/inet/hosts which is symlink'd as /etc/hosts

  Then you need to make sure that /etc/nsswitch.conf has
  "files" as the first entry for the "hosts" line.

  Best to totally avoid using names in pings at the stage you
  are now... use nothing but IP addresses until those all work.
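
  If it helps, the "files first" check can be sketched in a few lines
  of Python. This is only an illustration: the sample text below is a
  made-up nsswitch.conf fragment, not the contents of the Sun in
  question, and the helper name is hypothetical.

```python
def hosts_sources(nsswitch_text: str) -> list[str]:
    """Return the lookup sources listed on the 'hosts:' line, in order."""
    for line in nsswitch_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []  # no hosts line found


# Hypothetical file contents, for illustration only.
SAMPLE = """\
# /etc/nsswitch.conf (sample)
passwd: files
hosts:  files dns
"""

sources = hosts_sources(SAMPLE)
files_first = bool(sources) and sources[0] == "files"
print(sources, files_first)
```

  On the real box you would read /etc/nsswitch.conf instead of the
  sample string; the point is just that "files" should be the first
  token after "hosts:".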

--
Still a Raiders fan, but no longer sure why.

 
 
 

Post by Twirli » Mon, 24 Nov 2003 03:28:09


Hi again all. Well, it looks like it was bad router hardware. We
replaced it and everything began working beautifully. I have neither
time nor inclination to do any analysis on what exactly was going wrong,
but that was a whacky problem!

Thanks again for all the helpful suggestions and potential solutions!

Terry