no ping response but telnet works

Post by Thomas Schreibe » Thu, 01 Jun 2000 04:00:00



Strange things are happening in our network. The most irritating
observation is this:

host1 (Solaris 2.5.1), host2 (Solaris 2.6) and host3 (Zyxcel Router) are
in the same subnet:

- On host1, 'ping host3' reports 'no answer from host3', but 'telnet
  host3' returns a proper telnet prompt.
- On host2, tcpdump shows both the echo request from host1 to host3 and
  the echo reply from host3 to host1.
- Even on host1 itself (the host whose ping fails), tcpdump shows the
  echo reply packet.

What could be the reason for this behaviour?

Thomas

 
 
 

Post by Barry Margolin » Thu, 01 Jun 2000 04:00:00




>Strange things are happening in our network. The most irritating
>observation is this:

>host1 (Solaris 2.5.1), host2 (Solaris 2.6) and host3 (Zyxcel Router) are
>in the same subnet:

>on host1: 'ping host3' returns a 'no answer from host3'. But 'telnet
>host3' returns a proper telnet prompt
>on host2: tcpdump shows both the echo request from host1 to host3 and
>and the echo reply from host3 to host1
>Even on host1 itself (the host that does not succeed with the ping)
>tcpdump shows the ping echo reply packet.

>What could be the reason for this behaviour?

Does the data portion of the Echo Reply match the data in the Echo Request?
Ping puts sequence numbers in the data portion so that it can determine
which replies go with which requests.  If the router isn't copying the data
from the request to the reply (as required by the ICMP specification) then
this could have the results you're seeing.
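As a minimal sketch of the matching described above (Python, assuming the RFC 792 Echo layout: type, code, checksum, identifier, sequence number, then data), the code below accepts a reply only if it echoes the request's identifier, sequence number and data. In this layout the identifier and sequence number sit in the ICMP header, while the data carries whatever the sender chose to put there. The function names and the payload are hypothetical; real ping implementations differ in detail.

```python
import struct

ICMP_ECHO_REPLY = 0
ICMP_ECHO_REQUEST = 8


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def build_echo(icmp_type: int, ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP Echo message: type, code, checksum, id, seq, then data."""
    header = struct.pack("!BBHHH", icmp_type, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", icmp_type, 0, checksum, ident, seq) + payload


def parse_echo(message: bytes) -> dict:
    icmp_type, code, checksum, ident, seq = struct.unpack("!BBHHH", message[:8])
    return {"type": icmp_type, "code": code, "checksum": checksum,
            "id": ident, "seq": seq, "data": message[8:]}


def reply_matches(request: bytes, reply: bytes) -> bool:
    """Accept a reply only if it echoes the request's id, seq and data."""
    req, rep = parse_echo(request), parse_echo(reply)
    return (rep["type"] == ICMP_ECHO_REPLY
            and rep["id"] == req["id"]
            and rep["seq"] == req["seq"]
            and rep["data"] == req["data"])


# Hypothetical values: identifier 0x1234, sequence 4, arbitrary payload.
payload = b"timestamp+padding"
request = build_echo(ICMP_ECHO_REQUEST, 0x1234, 4, payload)
good_reply = build_echo(ICMP_ECHO_REPLY, 0x1234, 4, payload)
bad_reply = build_echo(ICMP_ECHO_REPLY, 0x1234, 0, payload)  # seq not copied

print(reply_matches(request, good_reply))  # True
print(reply_matches(request, bad_reply))   # False
```

A reply like bad_reply still crosses the wire and shows up in tcpdump, but a ping that checks the sequence number never counts it as an answer.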

You can use the -x option to tcpdump to see a full hex dump of the packets.
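To pull the relevant fields out of such a dump, here is a rough Python sketch. It assumes plain -x output, which starts at the IP header (no link-level header) and is printed as groups of hex digits, optionally preceded by an offset column such as "0x0000:" in newer tcpdump versions. The sample dump is hand-made for illustration, not a capture from this thread, and the helper names are hypothetical.

```python
import re
import struct


def hexdump_to_bytes(dump: str) -> bytes:
    """Turn tcpdump -x style hex groups into bytes.

    Assumes lines of hex groups such as "4500 001c ...", optionally
    preceded by an offset column like "0x0000:".
    """
    out = bytearray()
    for line in dump.splitlines():
        line = re.sub(r"^\s*0x[0-9a-fA-F]+:", "", line)  # drop offset column
        for group in line.split():
            out += bytes.fromhex(group)
    return bytes(out)


def icmp_echo_fields(ip_packet: bytes) -> dict:
    """Extract ICMP type/code/id/seq from an IPv4 packet."""
    version, header_len = ip_packet[0] >> 4, (ip_packet[0] & 0x0F) * 4
    if version != 4 or ip_packet[9] != 1:  # IP protocol 1 = ICMP
        raise ValueError("not an IPv4 ICMP packet")
    icmp_type, code, _checksum, ident, seq = struct.unpack(
        "!BBHHH", ip_packet[header_len:header_len + 8])
    return {"type": icmp_type, "code": code, "id": ident, "seq": seq}


# Hand-made sample: an echo reply from a made-up 192.168.0.3 to 192.168.0.1
# with sequence number 0. Both checksums are zeroed placeholders.
sample = """
4500 001c 0000 0000 4001 0000 c0a8 0003
c0a8 0001 0000 0000 1234 0000
"""
print(icmp_echo_fields(hexdump_to_bytes(sample)))
# {'type': 0, 'code': 0, 'id': 4660, 'seq': 0}
```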

--

Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

 
 
 

Post by James Carlso » Thu, 01 Jun 2000 04:00:00



> on host1: 'ping host3' returns a 'no answer from host3'. But 'telnet
> host3' returns a proper telnet prompt
> on host2: tcpdump shows both the echo request from host1 to host3 and
> and the echo reply from host3 to host1
> Even on host1 itself (the host that does not succeed with the ping)
> tcpdump shows the ping echo reply packet.

Instead of a description, could you post actual logs from tcpdump (one
for the ping, and another for the telnet set-up)?  Even better,
include logs with tcpdump flags "-s 1500 -nex".

What does "netstat -s" say on host1 before and after the ping?  Any
errors going up?  How about "netstat -ni"?
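Comparing two "netstat -s" snapshots by eye is error-prone; a small Python sketch can print only the counters that changed. It assumes the Solaris-style "name = value" layout with possibly several pairs per line, and the snapshot text below is invented. One possible reading of the result: if icmpInEchoReps rises by the number of pings sent while ping still reports no answer, the replies are reaching host1's ICMP layer and are being discarded by ping itself.

```python
import re

# Assumes the Solaris-style "name = value" layout, possibly several pairs per line.
COUNTER = re.compile(r"(\w+)\s*=\s*(\d+)")


def parse_counters(text: str) -> dict:
    return {name: int(value) for name, value in COUNTER.findall(text)}


def counter_deltas(before: str, after: str) -> dict:
    """Return only the counters that changed between two snapshots."""
    old, new = parse_counters(before), parse_counters(after)
    return {name: new[name] - old.get(name, 0)
            for name in new if new[name] != old.get(name, 0)}


# Invented snapshots, for illustration only.
before = """
icmpInMsgs          =    10     icmpInErrors        =     0
icmpInEchoReps      =     2     icmpOutEchos        =     2
"""
after = """
icmpInMsgs          =    15     icmpInErrors        =     0
icmpInEchoReps      =     7     icmpOutEchos        =     7
"""
print(counter_deltas(before, after))
# {'icmpInMsgs': 5, 'icmpInEchoReps': 5, 'icmpOutEchos': 5}
```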

At a guess, someone has a marginal ICMP implementation ...

--

SUN Microsystems / 1 Network Drive         71.234W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.497N   Fax +1 781 442 1677
"PPP Design and Debugging" --- http://people.ne.mediaone.net/carlson/ppp

 
 
 

Post by Thomas Schreibe » Thu, 01 Jun 2000 04:00:00


You are right: the sequence number in the reply packet is 0000 instead
of 0004 (by the way, it is part of the header, not the data).

OK, so this is a bug in the Zyxcel router, but it should have nothing
to do with the actual network problem we have. This is our main problem:

From time to time (almost always when we work on the wiring, i.e. unplug
a host or plug in a new one), some hosts in the same subnet of our
Class C network become unreachable. That is, a ping, telnet, or anything
else from host1 to host2 succeeds before the change but no longer
succeeds after it. We observed that one or more of the following actions
sometimes settles the situation, but not reproducibly:

- reset the hub
- reboot a machine (not necessarily host1 or host2)
- switch the CISCO 760 ISDN router off and on again, and reboot it

A second observation we make:

The CISCO router is reachable from most hosts in the same subnet. When
we switch the CISCO off and on again and then reboot it, the router
becomes reachable from the few hosts that could not reach it before, but
is no longer reachable from the other hosts that could. The same thing
happens, for example, after we plug a new host into the subnet. Here
again some nondeterminism is at work: sometimes switching it on without
a reboot is sufficient, sometimes not; sometimes an additional reset of
the hub is necessary; sometimes we have to wait some time (10 minutes)
before it becomes reachable at all. Also, the two sets of hosts are not
fixed: a host that could not reach the CISCO before the event may still
fail to reach it right after the event, but succeed a few tries later.

We suspect:
- broken network card
- bad hub (but we already replaced it with another one of the same
  type; both are cheap ones)
- broken CISCO

Can anybody give us a hint as to what it might be, or how we can test it
further? Are there network monitors that could help us track it down?

The net contains 10BaseT (most hosts), 100BaseT, and 3 hosts on BNC,
linked by 2 ATI hubs and 1 very cheap "level one" hub.

Thomas

Barry Margolin wrote:



> >Strange things are happening in our network. The most irritating
> >observation is this:

> >host1 (Solaris 2.5.1), host2 (Solaris 2.6) and host3 (Zyxcel Router) are
> >in the same subnet:

> >on host1: 'ping host3' returns a 'no answer from host3'. But 'telnet
> >host3' returns a proper telnet prompt
> >on host2: tcpdump shows both the echo request from host1 to host3 and
> >and the echo reply from host3 to host1
> >Even on host1 itself (the host that does not succeed with the ping)
> >tcpdump shows the ping echo reply packet.

> >What could be the reason for this behaviour?

> Does the data portion of the Echo Reply match the data in the Echo Request?
> Ping puts sequence numbers in the data portion so that it can determine
> which replies go with which requests.  If the router isn't copying the data
> from the request to the reply (as required by the ICMP specification) then
> this could have the results you're seeing.

> You can use the -x option to tcpdump to see a full hex dump of the packets.

 
 
 

Post by mag.. » Fri, 02 Jun 2000 04:00:00




[...]

> A second observation we make:

> The CISCO router is reachable from most hosts in the same subnet. When
> we switch off the CISCO and then on again and then reboot it, the router
> becomes reachable from those few hosts it was not reachable before but
> is no longer reachable from the other hosts it was reachable before.
> This also happens after we plug in into the same subnet a new host for
> example. Here again some nondeterminism is at work: sometimes a switch
> on without a reboot is sufficient, sometimes not. sometimes an
> additional reset of the hub is necessary. sometimes we have to wait for
> some time (10 minutes) until it becomes reachable at all. Also these two
> set of hosts are not static, i.e. a host that was not able to reach the
> CISCO before the event do not reach it after the event but perhaps a few
> trys later.

> We suspect:
> - broken network card
> - bad hub (but we already changed it with another one of the same type -
> both cheap ones)
> - broken CISCO

> Can anybody give us a hint what it actually may be or how we can test it
> further? Are there Network Monitors that may give hep us trace it down?

> The net contains 10BaseT (most), 100BaseT and 3 hosts with BNC linked
> with 2 ATI Hubs and 1 very cheap "level one" hub.

I may know why your 760 is causing you trouble. For a while Cisco was
selling the 760s with two different sets of software images. The key
difference between the two images was that one allowed only a limited
number of users to access the router and the other allowed as many as
were practical. [I believe they did this by limiting the number of
entries possible in the ARP cache, but it's been a while.]
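The bracketed guess above can be illustrated with a toy Python model. It assumes, as a hypothesis only, that the restricted image caps the number of hosts (here, ARP entries) the router will serve, with first-come-first-served slots and no eviction until a restart. The class and host names are hypothetical, and this is not documented Cisco 760 behavior.

```python
class CappedArpTable:
    """Hypothetical model of a router that can hold only max_entries hosts.

    Slots go to whichever hosts are learned first after a restart; there is
    no eviction, so later hosts get no answer until the table is cleared.
    """

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.entries = {}  # IP address -> MAC address

    def learn(self, ip: str, mac: str) -> bool:
        if ip in self.entries or len(self.entries) < self.max_entries:
            self.entries[ip] = mac
            return True
        return False  # table full: this host is ignored

    def reboot(self) -> None:
        self.entries.clear()


def reachable_after_reboot(table: CappedArpTable, talkers: list) -> list:
    """Reboot, then let hosts contact the router in the given order."""
    table.reboot()
    return [host for host in talkers if table.learn(host, "mac-" + host)]


table = CappedArpTable(max_entries=3)
print(reachable_after_reboot(table, ["h1", "h2", "h3", "h4", "h5"]))
# ['h1', 'h2', 'h3']
print(reachable_after_reboot(table, ["h4", "h5", "h1", "h2", "h3"]))
# ['h4', 'h5', 'h1']
```

Under these assumptions a reboot only reshuffles which hosts win a slot, depending on who talks to the router first, which would be consistent with the shifting sets of reachable hosts described earlier in the thread.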

This caused us endless joy because a) Cisco has never been terribly good
at putting the right software image on a router and b) Cisco's naming
convention for software images has never been terribly clear (at least
to me).

While I think this is a distinct possibility, I am skeptical that it
relates to the rest of your problems.
--Michael

 
 
 

Post by t.. » Wed, 28 Jun 2000 04:00:00






> [...]

> > A second observation we make:

> > The CISCO router is reachable from most hosts in the same subnet. When
> > we switch off the CISCO and then on again and then reboot it, the router
> > becomes reachable from those few hosts it was not reachable before but
> > is no longer reachable from the other hosts it was reachable before.
> > This also happens after we plug in into the same subnet a new host for
> > example. Here again some nondeterminism is at work: sometimes a switch
> > on without a reboot is sufficient, sometimes not. sometimes an
> > additional reset of the hub is necessary. sometimes we have to wait for
> > some time (10 minutes) until it becomes reachable at all. Also these two
> > set of hosts are not static, i.e. a host that was not able to reach the
> > CISCO before the event do not reach it after the event but perhaps a few
> > trys later.

> > We suspect:
> > - broken network card
> > - bad hub (but we already changed it with another one of the same type -
> > both cheap ones)
> > - broken CISCO

> > Can anybody give us a hint what it actually may be or how we can test it
> > further? Are there Network Monitors that may give hep us trace it down?

> > The net contains 10BaseT (most), 100BaseT and 3 hosts with BNC linked
> > with 2 ATI Hubs and 1 very cheap "level one" hub.

> I may know why your 760 is causing you trouble. For a while Cisco was
> selling the 760's with two different sets of software images. The key
> difference in the two images was that one allowed you to have a limited
> number of users accessing the router and the other allowed you to have
> as many as was practical. [I believe they did this by limiting the number
> of entries possible in the arp cache, but it's been a while.]

> This caused us endless joy because a) Cisco has never been terribly good
> at putting the right software image on a router and b) Cisco's naming
> convention for software images has never been terribly clear (at least
> to me).

> While I think this is a distinct possibility, I am skeptical that it
> relates to the rest of your problems.
> --Michael

Michael,

Now that we have gone several weeks without any further network errors,
we can conclude that the CISCO firmware was in fact responsible.
Upgrading to the newest version solved the problems that had caused us
endless hours of trouble.

Thanks a lot for this hint.

Thomas

Sent via Deja.com http://www.deja.com/
Before you buy.