Token Ring Lockups

Post by Christopher A. Smi » Sat, 20 Jul 1996 04:00:00



Ever since we started using Linux systems at work on our Token Ring
network, we've been occasionally plagued with what I've come to call
"token ring lockups."  When my Linux system, running 2.0.0 with an IBM
16/4 ISA Token Ring adaptor, is doing something very network-intensive,
such as multi-megabyte NFS reading or writing, it's not uncommon for the
following to appear in the syslog:

kernel: tr0: Line errors 01, Internal errors 00, Burst errors 00
kernel: A/C errors 00, Abort delimiters 00, Lost frames 00
kernel: Receive congestion count 00, Frame copied errors 00
kernel: Frequency errors FF, Token errors 00
kernel: tr0: unknown command in asb 08
kernel: tr0: ASB not free !!!
kernel: tr0: Arrg. Transmitter busy for more than 50 msec. Donald resets adapter, but resetting
kernel: the IBM tokenring adapter takes a long time. It might not even help when the
kernel: ring is very busy, so we just wait a little longer and hope for the best.

Once this happens, all network traffic is dead.  The only way to get out
of the deadlock is to either reboot the machine or -- just discovered this
today and have only done it once -- take tr0 down with ifconfig and bring
it back up.  It has never recovered by itself.
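
The manual recovery described above (take tr0 down with ifconfig, then bring it back up) could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, it assumes ifconfig is on the PATH and the script runs as root, and it has only been reasoned about, not tried on a locked-up adapter.

```python
import subprocess


def bounce_commands(iface="tr0"):
    """Return the two ifconfig invocations that take iface down and back up."""
    return [["ifconfig", iface, "down"], ["ifconfig", iface, "up"]]


def bounce(iface="tr0", runner=subprocess.run):
    """Run the down/up sequence; runner is injectable so it can be dry-run."""
    for cmd in bounce_commands(iface):
        # check=True makes subprocess.run raise if ifconfig exits non-zero.
        runner(cmd, check=True)
```

Calling bounce("tr0") runs the real commands; passing a runner that only prints the command list gives a dry run.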

This has been happening with increasing regularity here -- once a day, on
average, and sometimes as many as three or four.  I've no doubt that our
token ring network is atrocious...

If anyone has any thoughts -- kernel upgrade, hardware tweaking, etc. --
that might help ease this a bit, I'd appreciate it!

                 --------------------------------------
                          Christopher A. Smith

 
 
 

Post by Mike Eckhof » Sat, 20 Jul 1996 04:00:00



> I now want to try to use Linux as our primary DNS platform on
> Chubb's WAN.  It's disturbing that Christopher has the problem in
> 2.0.0 - maybe I should rethink my decision to use it here in production??

All I can say is to try it.  If you have another machine on the ring that
is not complaining, chances are that another won't either.  We use Linux
for everything that we do -- dns/tacacs/web/etc.  We don't have a
problem.  However, I have seen the problem before and know that it does
exist.  I have also noticed that the ring doesn't HAVE to be busy for
this error to pop up.

> I could write a stupid script to check /var/adm/messages for the
> errors and then run ifconfig when necessary, but that's a horribly
> sloppy solution.  If Microsoft can get this to work, surely we can!
> Why don't commercial operating systems ever complain that the
> transmitter is busy?  What aren't we doing right?

From what I gather from reading the code, the timing for the token ring
driver was just modified from the ethernet drivers.  This in itself could
be the problem, in my book.  I do not have any info on the internals of
token ring -- just a few error desc books, etc.  If anyone has this
information, it would most likely lead to a solution.

Good luck.

-me

+---------------------------------------+-------------------------------+
|         Mike Eckhoff - KB0TPT         |      Technology Director      |

|                                       +-------------------------------+
|  "Yo!  Ding dong man.  Ding dong!     |    Work:  (402) 375-3150      |
|       Ding dong Yo!"   -Wierd Al      |     Fax:  (402) 375-5251      |
+---------------------------------------+-------------------------------+

 
 
 

Post by Mike Eckho » Sat, 20 Jul 1996 04:00:00



: kernel: tr0: Arrg. Transmitter busy for more than 50 msec. Donald resets adapter, but resetting
: kernel: the IBM tokenring adapter takes a long time. It might not even help when the
: kernel: ring is very busy, so we just wait a little longer and hope for the best.

I would suggest increasing the busy interval in
/usr/src/linux/drivers/net/ibmtr.h.   The higher this value, the longer
it will wait before it declares the transmitter busy.  Some people have
had success with this (I threw it in around 1.3.72 or so, I think, since it
worked), others just end up creating more errors by setting it too high.  
It seems like it all depends on how busy your network is.
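
To see what raising the busy interval does, here is a toy Python model of the timeout check from the driver fragment quoted later in this thread (ticks_waited = jiffies - dev->trans_start, compared against TR_BUSY_INTERVAL). The tick values are illustrative assumptions, not the real constants from ibmtr.h, and the model assumes the check runs once per tick while the transmitter never frees up.

```python
def busy_check(jiffies, trans_start, busy_interval):
    """Model of the tbusy branch: return (complain, new_trans_start)."""
    ticks_waited = jiffies - trans_start
    if ticks_waited < busy_interval:
        return False, trans_start  # still inside the grace period, stay quiet
    # The driver logs the "Arrg" message and fakes a later start time (+5).
    return True, trans_start + 5


def first_complaint(busy_interval, max_ticks=1000):
    """Tick at which a permanently stuck transmitter is first reported."""
    trans_start = 0
    for jiffies in range(max_ticks):
        complain, trans_start = busy_check(jiffies, trans_start, busy_interval)
        if complain:
            return jiffies
    return None  # never reported within max_ticks
```

In this model a larger interval only delays the first message. A transmitter that never becomes free is still reported eventually, which may be why tuning the value helps on some rings and not on others.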

good luck!

--
+---------------------------------------+-------------------------------+
|         Mike Eckhoff - KB0TPT         |      Technology Director      |

|                                       +-------------------------------+
|  "Yo!  Ding dong man.  Ding dong!     |    Work:  (402) 375-3150      |
|       Ding dong Yo!"   -Wierd Al      |     Fax:  (402) 375-5251      |
+---------------------------------------+-------------------------------+

 
 
 

Post by Paul P. Nort » Tue, 23 Jul 1996 04:00:00



>I had a lot of trouble with this here at Chubb.  No matter how long I
>set the timeout, it would still time out.  It is the sole reason
>that I couldn't justify running anything critical on Linux on
>Token Ring (and all of Chubb is Token Ring).  However, since I've
>upgraded to kernel 2.0.1, I haven't seen that error yet - and I've
>realllllly tried to break it.

The token ring driver code hasn't changed for a while - since about
1.3.84, I think. Maybe something else changed that fixed the problem for
most people.

>I now want to try to use Linux as our primary DNS platform on
>Chubb's WAN.  It's disturbing that Christopher has the problem in
>2.0.0 - maybe I should rethink my decision to use it here in production??

>I could write a stupid script to check /var/adm/messages for the
>errors and then run ifconfig when necessary, but that's a horribly
>sloppy solution.  If Microsoft can get this to work, surely we can!
>Why don't commercial operating systems ever complain that the
>transmitter is busy?  What aren't we doing right?

Eh, when I tried ifconfig with the Arrg! problem, half the time ifconfig
would hang without doing anything. A better solution might be to have the
driver reset the adapter if it is busy too long, since the alternative
seems to be rebooting the machine.
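
For what it's worth, the log-watching script idea quoted above might look something like the sketch below, with a timeout added because ifconfig itself can hang. The marker string comes from the kernel message quoted in this thread. The function names and the 30-second timeout are assumptions made up for illustration.

```python
import subprocess

# Substring of the kernel message quoted earlier in this thread.
LOCKUP_MARKER = "Transmitter busy for more than 50 msec"


def count_lockups(lines, iface="tr0"):
    """Count log lines where the driver reports a busy transmitter on iface."""
    tag = iface + ": "
    return sum(1 for line in lines if tag in line and LOCKUP_MARKER in line)


def try_bounce(iface="tr0", timeout=30, runner=subprocess.run):
    """Take iface down and up; return False if a command hangs or fails."""
    for action in ("down", "up"):
        try:
            runner(["ifconfig", iface, action], check=True, timeout=timeout)
        except (subprocess.TimeoutExpired,
                subprocess.CalledProcessError, OSError):
            return False  # ifconfig hung or failed; a reboot may be needed
    return True
```

A cron job or loop could feed count_lockups() the new lines from /var/adm/messages and call try_bounce() when the count goes up. It remains a workaround, not a fix for the driver.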

When I was able to recreate the Arrg! problem at will I restructured the
driver code somewhat and added some checks to see if the adapter had
any outstanding work pending for the driver to handle and, if so, try
to handle it. This was somewhat successful and might be the best way
to fix this. I'll try and fire up a 1.3.xx system on my home network
and see if I can still recreate the problem.
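
The idea in the two paragraphs above (service any work the adapter still has pending, and reset only if it stays busy too long) can be written down as a small decision function. This is a toy Python model, not the actual driver change: the action names and both thresholds are hypothetical.

```python
def recovery_action(ticks_busy, pending_work, busy_interval=5, reset_after=50):
    """Pick what a driver might do while the transmitter stays busy.

    busy_interval and reset_after are made-up illustration values.
    """
    if ticks_busy < busy_interval:
        return "wait"             # normal case: give the adapter more time
    if pending_work:
        return "service_pending"  # adapter is waiting on the driver, handle it
    if ticks_busy >= reset_after:
        return "reset_adapter"    # nothing pending and still stuck: reset
    return "wait"
```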

>--

>http://www.chubb.com/  The Chubb Corporation

--
Paul P. Norton                  

 
 
 

Post by jh.. » Tue, 30 Jul 1996 04:00:00


I'm having the same problem as the original post. The adapter opens
fine when booting up...I can ping different sites, etc.  And at some
point it decides to give up the ghost.  It looks like it is discarding
a lot of packets before bombing (a clue?).  

I've noticed the problem is only occurring in the desktop machine
(Value Point, IBM 16/4 Token Ring Card (8-bit)).  My laptop's PCMCIA
Token Ring has no problems at all.  The difference between the two
kernels is that the PCMCIA Token Ring kernel doesn't have the Tropic
Chipset stuff compiled into it (another clue?).

I've checked for interrupt and other conflicts already. Anybody have any
idea how to proceed?

 
 
 

Post by Paul P. Nort » Wed, 31 Jul 1996 04:00:00



>I'm having the same problem as the original post. The adapter opens
>fine when booting up...I can ping different sites, etc.  And at some
>point it decides to give up the ghost.  It looks like it is discarding
>a lot of packets before bombing (a clue?).  

>I've noticed the problem is only occurring in the desktop machine
>(Value Point, IBM 16/4 Token Ring Card (8-bit)).  My laptop's PCMCIA
>Token Ring has no problems at all.  The difference between the two
>kernels is that the PCMCIA Token Ring kernel doesn't have the Tropic
>Chipset stuff compiled into it (another clue?).

>I've checked for interrupt and other conflicts already. Anybody have any
>idea how to proceed?

Ok, first things first. Which version of the linux kernel are you running?
Has it worked fine before and just stopped working recently, or has
it always had this problem? When it "gives up the ghost" does all
networking (through the tr interface) cease, or can you still ping?
Any suspicious messages on the system log? What MTU size are you using?
Does tr0 initialize correctly?

Paul
--
Paul P. Norton                  

 
 
 

Post by Ted Har » Wed, 31 Jul 1996 04:00:00


<snip> (Token Ring problems...)

>Ok, first things first. Which version of the linux kernel are you running?
>Has it worked fine before and just stopped working recently, or has
>it always had this problem? When it "gives up the ghost" does all
>networking (through the tr interface) cease, or can you still ping?
>Any suspicious messages on the system log? What MTU size are you using?
>Does tr0 initialize correctly?

>Paul
>--
>Paul P. Norton                  


Joe and I work together so I can answer those.

The kernel versions we have gone through are 2.0.0 and 2.0.10; neither made
any difference. We have always experienced this problem with the adapter. The
behaviour is thus: The tr0 interface is opened and the network protocol (IP)
initializes without error. Pinging sites (by IP address or name) works fine.
However, when any large amount of data is being transferred (e.g. FTP) the
adapter/interface locks. The MTU is 1500 (same as the server we were FTP'ing
into).
The /var/adm/syslog does get messages (sorry, I don't recall them verbatim)
like:
The transmitter is busy for more than 50ms...This can happen on very busy
rings.
Trying to reset the adapter. This can take a long time....

This message will appear repeatedly (~20 times) before the interface/adapter
then 'gives up the ghost'. We can do an: ifconfig tr0 down, but any more
dealings with tr0 will cause the xterm to lock up.
We have tried to change the values of:
TR_RETRY_INTERVAL
TR_RESET_INTERVAL
TR_BUSY_INTERVAL
in /usr/src/linux/drivers/net/ibmtr.h to no avail (yes, we recompiled the
kernel after changing the values)

(Joe, if I've forgotten anything just *in)

Thanks!
-Ted Hardy

 
 
 

Post by Paul P. Nort » Thu, 01 Aug 1996 04:00:00


Here is a patch to ibmtr.c that will display a bit more information when
the problem occurs. Please send me whatever output this produces.

BTW, nando.net failed DNS lookup, so I couldn't send it by e-mail.


        ti=(struct tok_info *) dev->priv;

        if (dev->tbusy) {
+               unsigned char status;
                int ticks_waited;

                ticks_waited=jiffies - dev->trans_start;
                if (ticks_waited<TR_BUSY_INTERVAL) return 1;

+               status=readb(ti->mmio + ACA_OFFSET + ACA_RW + ISRP_ODD);
                DPRINTK("Arrg. Transmitter busy for more than 50 msec. "
                        "Donald resets adapter, but resetting\n"
                        "the IBM tokenring adapter takes a long time."
                        " It might not even help when the\n"
                        "ring is very busy, so we just wait a little longer "
-                       "and hope for the best.\n");          
+                       "and hope for the best.\n ISRP_ODD=%02x\n", status);
+              
+               if (status & ARB_CMD_INT)
+                       DPRINTK("arb cmd = %02x\n", readb(ti->arb));
                dev->trans_start+=5; /* we fake the transmission start time... */
                return 1;
        }
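
To make the added diagnostics easier to read, here is a small Python model of what the patch prints for a given status byte. The bit value used for ARB_CMD_INT is an assumption for illustration only (check ibmtr.h for the real definition), and describe_isrp_odd is a made-up helper name.

```python
# Assumed value for illustration; the real definition lives in ibmtr.h.
ARB_CMD_INT = 0x08


def describe_isrp_odd(status, arb_cmd=None):
    """Format the extra diagnostics the patch prints for a status byte."""
    lines = ["ISRP_ODD=%02x" % status]
    if status & ARB_CMD_INT:
        # Only in this case does the patch go on to read the ARB command byte.
        shown = "??" if arb_cmd is None else "%02x" % arb_cmd
        lines.append("arb cmd = %s" % shown)
    return lines
```

With a status of 00 none of the bits are set, so only the ISRP_ODD line appears and no "arb cmd" line follows.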
--
Paul P. Norton                  

 
 
 

Post by Paul P. Nort » Thu, 01 Aug 1996 04:00:00



>Joe and I work together so I can answer those.

>The kernel versions we have gone through are 2.0.0 and 2.0.10; neither made
>any difference. We have always experienced this problem with the adapter. The
>behaviour is thus: The tr0 interface is opened and the network protocol (IP)
>initializes without error. Pinging sites (by IP address or name) works fine.
>However, when any large amount of data is being transferred (e.g. FTP) the
>adapter/interface locks. The MTU is 1500 (same as the server we were FTP'ing
>into).
>The /var/adm/syslog does get messages (sorry, I don't recall them verbatim)
>like:
>The transmitter is busy for more than 50ms...This can happen on very busy
>rings.
>Trying to reset the adapter. This can take a long time....

>This message will appear repeatedly (~20 times) before the interface/adapter
>then 'gives up the ghost'. We can do an: ifconfig tr0 down, but any more
>dealings with tr0 will cause the xterm to lock up.
>We have tried to change the values of:
>TR_RETRY_INTERVAL
>TR_RESET_INTERVAL
>TR_BUSY_INTERVAL
>in /usr/src/linux/drivers/net/ibmtr.h to no avail (yes, we recompiled the
>kernel after changing the values)

>(Joe, if I've forgotten anything just *in)

>Thanks!
>-Ted Hardy


The Arrg problem. Since you seem to be able to recreate this consistently,
would you mind if I gave you a patch to apply to the token ring driver
so that I can get more information about it when it reaches that state?
It won't fix the problem, but I want to verify that you get to the Arrg
state for the same reason that I used to.

Out of curiosity, how much shared RAM do you have the adapter set for?

--
Paul P. Norton                  

 
 
 

Post by Michael Mado » Fri, 09 Aug 1996 04:00:00



>Here is a patch to ibmtr.c that will display a bit more information when
>the problem occurs. Please send me whatever output this produces.
>            DPRINTK("Arrg. Transmitter busy for more than 50 msec. "
>                    "Donald resets adapter, but resetting\n"
>                    "the IBM tokenring adapter takes a long time."
>                    " It might not even help when the\n"
>                    "ring is very busy, so we just wait a little longer "
>-                   "and hope for the best.\n");          
>+                   "and hope for the best.\n ISRP_ODD=%02x\n", status);
>+          
>+           if (status & ARB_CMD_INT)
>+                   DPRINTK("arb cmd = %02x\n", readb(ti->arb));
>            dev->trans_start+=5; /* we fake the transmission start time... */
>            return 1;
>    }

Hi Paul,

I am also experiencing the dreaded token-ring Arrg problem.  I am
running Debian Linux 1.1 (Linux 2.0 kernel).  I have experienced this
problem compiling the token-ring driver both as a module and as part
of the kernel.  I am using an IBM Auto 16/4 ISA token-ring card.  I am
using interrupt #3, 16K RAM, and I have tried both the primary and
secondary address settings.  I am also running Samba.

I can easily replicate the problem by copying a large amount of data
(> 10MB) from my Windows for Workgroups machine to a shared drive on
the linux machine.  Everything goes along fine, sometimes up to 20MB
or more, and then I get the error.  The last time it happened, I was
just about the only person left in the office, so I don't think the
network was exceptionally busy.

I compiled the above changes into the token ring driver, and then
proceeded to bombard linux with a large file copy.  This time in
addition to the usual error message, I received ISRP_ODD=00.

If there is any other information you need please let me know.  I
would also be more than happy to test any suggestions or patches.  I'd
really like to see this driver working properly.  I was hoping to have
my Windows users back up their data to my linux server instead of
having to use a box of floppies.

Michael Madore
PC/Network Support
Sun Gro Horticulture, Inc.

 
 
 

Post by Paul P. Nort » Fri, 09 Aug 1996 04:00:00


ISRP_ODD=00 is normal.

When you say you are using a Linux 2.0 kernel, do you mean 1.2.0 or 2.0.x?

I take it, then, that you are able to reliably duplicate this problem.
Expect some more informational patches from me via e-mail.

It looks like your adapter is configured ok. 16k shared ram is good.
More than 16k is a waste without tweaking the driver some.

Paul
--
Paul P. Norton                  

 
 
 

Post by Uwe Saue » Sat, 10 Aug 1996 04:00:00


Hello Michael,

I read in the Linux Network Newsgroup that you are managing a
Token Ring Network.

I do so too.
I use IBM-compatible "3COM 3C619B/C" Token-Ring cards
with the "Caldera Network Desktop" (kernel 1.2.13)
commercial Linux. The Token Ring driver kernel was built
for me by lunetix.de, the Caldera distributor for Germany.
Up to now I have had no major problems! OK, I feel the NFS/Samba
transmission rate is a little bit low, but up to now
I have had no time to verify it.
Is your problem with the Token Ring driver specific to 2.x kernels?
Will I run into trouble if I upgrade?
Have you ever tried 3COM TR adapters?

Bye, Uwe

 
 
 

Post by Paul P. Nort » Sat, 10 Aug 1996 04:00:00


[ some text deleted ]

>Is your problem with the Token Ring driver specific to 2.x kernels?
>Will I run into trouble if I upgrade?
>Have you ever tried 3COM TR adapters?

>Bye, Uwe

The Arrg problem seems to have been around since at least 1.3.x. I
don't know if you'll have problems if you upgrade, but it doesn't
seem related to release level.

Paul
--
Paul P. Norton                  

 
 
 
