eepro100: wait_for_cmd_done timeout (2.4.19-pre2/8)

Post by Paul Jakm » Wed, 08 May 2002 23:30:11



hi,

i have a problem with a Dell poweredge with onboard Intel eepro NICs.

The network card basically doesn't work. The system logs are filled
with:

        eepro100: wait_for_cmd_done timeout!

and of course attendant "last message repeated x times". at less
frequent intervals we get NETDEV watchdog messages:

        NETDEV WATCHDOG: eth0: transmit timed out

always followed by an error message which may be descriptive:

        eth0: Transmit timed out: status 0090  0cf0 at 13
        70/1430 command 000c0000

the parameter following command is always 000c0000.
the parameter following status varies between:

        0050 0c80
        0050 0cf0
        0090 0c80
        0090 0cf0

distribution of the above is:

      5  0050 0c80
    227  0050 0cf0
     22  0090 0c80
    120  0090 0cf0

the xxxxx/yyyyy pair following "at" is different every time.

lspci of the network interfaces concerned:

00:01.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
        Subsystem: Dell Computer Corporation: Unknown device 00da
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fe2ff000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at ecc0 [size=64]
        Region 2: Memory at fe100000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-
00:02.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
        Subsystem: Dell Computer Corporation: Unknown device 00da
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at fe2fe000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at ec80 [size=64]
        Region 2: Memory at fe000000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

kernel version is 2.4.19-pre8; however, the exact same thing occurs with
2.4.19-pre2. (it's running pre8 because we hoped the problem had been
fixed since pre2)

mii-tool -v -v eth0 shows no difference (that i see) between the
interface on the working machine and this "problem" machine:

non-working:

eth0: negotiated 100baseTx-FD flow-control, link ok
  registers for MII PHY 1:
    3000 782d 02a8 0154 05e1 45e1 0001 0000
    0000 0000 0000 0000 0000 0000 0000 0000
    0a03 0000 0001 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000
  product info: Intel 82555 rev 4
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

working machine:

eth0: negotiated 100baseTx-FD flow-control, link ok
  registers for MII PHY 1:
    3000 782d 02a8 0154 05e1 45e1 0001 0000
    0000 0000 0000 0000 0000 0000 0000 0000
    0a03 0000 0001 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000
  product info: Intel 82555 rev 4
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

The strange thing is that this machine has a sister machine, an identical
poweredge bought at the same time, hooked up to the same switch,
running the same software (the exact same 2.4.19-pre2 kernel this
machine used to run), with the same link negotiated, which does not have
this problem. we have of course changed the cable, but this made no
difference.

looking at the code concerned:

static inline void wait_for_cmd_done(long cmd_ioaddr)
{
        int wait = 1000;
        do  udelay(1) ;
        while(inb(cmd_ioaddr) && --wait >= 0);
#ifndef final_version
        if (wait < 0)
                printk(KERN_ALERT "eepro100: wait_for_cmd_done timeout!\n");
#endif

}

it seems the driver simply polls a command register on the NIC, waiting
for it to read back as zero, and gives up after trying 1000 times.

this, along with the fact that an identical machine has no problems,
would suggest to me i have a hardware problem. Is this a valid
assumption or are there "funnies" with the eepro100 driver or hardware
that i should be aware of? (eg is it possible the eepro100 has gotten
into some weird state?).

NB: i also tried the intel e100 driver, and curiously it prints a very
similar message to the eepro100 driver (wait_for_exec... in the case
of the intel e100 driver).

NB2: this problem may be multicast related. it started happening after
we installed and ran zebra ospfd, which uses multicast, on the machines.
however, running without ospfd does not cure it.

if anyone needs further info, i can provide it.

regards,

--paulj

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 


Post by Richard B. Johnso » Thu, 09 May 2002 00:00:08



> hi,

> i have a problem with a Dell poweredge with onboard Intel eepro NICs.

> The network card basically doesnt work. The system logs are filled
> with:

[SNIPPED...]

> looking at the code concerned:

> static inline void wait_for_cmd_done(long cmd_ioaddr)
> {
>         int wait = 1000;
>         do  udelay(1) ;
>         while(inb(cmd_ioaddr) && --wait >= 0);
> #ifndef final_version
>         if (wait < 0)
>                 printk(KERN_ALERT "eepro100: wait_for_cmd_done timeout!\n");
> #endif
> }

This procedure is called from numerous places in the code.
In line 1069 of eepro100.c, comment out the call to wait_for_cmd_done().
See if this fixes it. If it does, look in the header and send a patch
to the current maintainer. FYI, I use this driver with no problems
on 2.4.18 -- but I have commented-out that call because there, in fact,
might be no command to wait for and I got spurious messages.

Cheers,
* Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.


Post by Samuel Maftou » Thu, 09 May 2002 00:10:10




> > hi,

> > i have a problem with a Dell poweredge with onboard Intel eepro NICs.

> > The network card basically doesnt work. The system logs are filled
> > with:
> [SNIPPED...]

> > looking at the code concerned:

> > static inline void wait_for_cmd_done(long cmd_ioaddr)
> > {
> >         int wait = 1000;
> >         do  udelay(1) ;
> >         while(inb(cmd_ioaddr) && --wait >= 0);
> > #ifndef final_version
> >         if (wait < 0)
> >                 printk(KERN_ALERT "eepro100: wait_for_cmd_done timeout!\n");
> > #endif
> > }

> This procedure is called from numerous places in the code.
> In line 1069 of eepro100.c, comment out the call to wait_for_cmd_done().
> See if this fixes it. If it does, look in the header and send a patch
> to the current maintainer. FYI, I use this driver with no problems
> on 2.4.18 -- but I have commented-out that call because there, in fact,
> might be no command to wait for and I got spurious messages.

> Cheers,
>* Johnson

> Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

>                  Windows-2000/Professional isn't.

I get the same message, but only when I'm using my IEEE-1394 (FireWire)
devices.
I copy from NFS to an IEEE-1394 hard disk, and after approximately 256 MB
of data has been copied from the network I get the message
(wait_for_cmd_done timeout), and I can no longer use the network or the
mounted disk.

I should add that the system is running 2.4.18 SMP (2 processors) with
2 GB of RAM (the 4 GB highmem kernel from SuSE). It's a scientific data
analysis and extraction system.

What should I do?
Should I remove the code you told me to remove?
        Sam

Post by Mickael Baill » Thu, 09 May 2002 00:30:08


        Hi,

We had the same problem with PCI eepro NICs (not onboard) on a Dell PowerEdge.
We got the same error messages.
We tried the alternative 'e100' driver, but without success.
We tried 2.2 kernels (RedHat 6.2), then 2.4 (RedHat 7.1/7.2): the problem
still exists.

In the end we disabled the APIC (the line 'append="noapic"' in the
lilo.conf configuration file).

It has now been working fine for 2 days... we need a little more time to
validate the change...

See you
Mickael



Post by Samuel Maftou » Thu, 09 May 2002 00:30:13



>    Hi,

> We got the same problem with PCI eepro NICs (not onboard) on Dell Poweredge.

I'm on Dells also, but not only PowerEdges.

> We got the same error messages
> We tried the alternative 'e100' driver, but without success.

We did too, but we get approximately the same message (not
exactly the same, but one talking about timeouts).

> We tried on 2.2 kernels (RedHat 6.2), then on 2.4 (RedHat 7.1/7.2): problem
> still exist.

I didn't try 2.2.

> At last we disabled APIC ( configuration line 'append=noapic' in your
> lilo.conf configuration file )

I did too, but my problem with firewire still exists.
That's the first thing I did, disableapic and noapic (it's an SMP
system, so I have no idea what the consequences of this are).

> Now it's working fine since 2 days... need a little more time to validate the
> change...

> See you
> Mickael

Thanks for the tip.
        Sam

Post by Richard B. Johnso » Thu, 09 May 2002 00:50:07





> >                  Windows-2000/Professional isn't.

[SNIPPED..]

> I have the same message but only when I'm using my ieee-1394 devices (
> firewire ) .
> I copy from NFS to ieee-1394 HD and approximatively at 256 meg of copied
> data from network I have the message (wait_for_cmd_timeout), and I'm not
> able use the network, nor the mounted HD.

> I need to say the system is running 2.4.18 SMP ( 2 proc ) with 2go of
> RAM (higmeme 4-GB from suse ) ( It's a scientific data analysis and extraction system ).

> What should I do ?
> Should I remove the code you told me to remove

No. I told someone to comment out a call to wait_for_cmd_done() in
a procedure where this generates spurious (incorrect) warning messages.

You are probably getting real errors (the chip stops) when its interrupts
can't be handled quickly enough.

This may be because the firewire driver may be looping in its ISR.
Typically, when drivers don't play together very well, it's because one
or both of the drivers were written by people who didn't learn how to play
together as children. ^;) "It's my CPU (baseball). I'm going to keep it as
long as I want...."  

In 100% of the cases where I have been asked to help fix these kinds of
problems, getting rid of the loops in ISRs fixes the problems forever.
Yes, I know about "interrupt mitigation...", but what's the use of
maximizing driver throughput if the computer won't work?

The fixes to lots of chip drivers that hang and lock-up won't
happen until schools start teaching future software engineers to
play together as children. Until that time, you can probably fix
your particular drivers by getting rid of those loops in the ISRs.

A quick-fix, just to prove it to yourself, is to set the loop-counter
(max_interrupt_work in eepro100.c) to 1. You need to do this in
the fire-wire driver also, but that's not as simple, several drivers
do "while something()" in the interrupt routines. That something()
may be true for a very long time, using CPU cycles that your net-card
really needs.

Cheers,
* Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.


Post by Paul Jakm » Thu, 09 May 2002 05:30:08



> No. I told someone to comment out a call to wait_for_cmd_done() in
> a procedure where this generates spurious (incorrect) warning messages.

ah.. not spurious.

> You are probably getting real errors (the chip stops) when its interrupts
> can't be handled quickly enough.

the server i have has no network connectivity via that interface while
this is going on.

anyway... i rebooted the machine and (so far, touch wood, fingers
crossed) it hasn't shown the problem again. (odd, because i rebooted the
thing several times last friday and that didn't clear the problem).

/before/ the reboot i set multicast_filter_limit=1, downed the
interface, removed the module and upped it again (after waiting) but
this made no difference. however, the reboot (with this parameter set)
made a difference where before reboots made no difference.


current maintainer) about your fix and whether it is proper or not?

regards,

--paulj


Post by Paul Jakm » Thu, 09 May 2002 12:30:05



> This procedure is called from numerous places in the code.
> In line 1069 of eepro100.c, comment out the call to wait_for_cmd_done().
> See if this fixes it.

the server started showing the same problem again and, nope... this
doesn't fix it for me.

:(

--paulj
