HELP! with OSR5.0.0 Lockup under heavy IO

Post by Jean-Pierre Radl » Wed, 30 Oct 1996 04:00:00



Joe Schofield propounded certain bytes, to wit:
| Has anyone experienced OSR5.0.0 locking up/panic during heavy disk IO?
| I have one OSR5.0.0 machine and one OSR3.2v4.2 machine; both have
| Adaptec 2940 cards, and with only SCSI hard drives.  I have multiple
| applications which run well in OSR3.2v4.2, but cause a lockup or
| panic (I've had both) in OSR5.0.0.  Has anyone had this problem? If
| so, which patch seemed to help? I have most if not all of the patches
| installed for OSR5.0.0.  If anyone has had this problem, then switched
| to OSR5.0.2, did it help? I am also in the process of switching from
| Linux to OSR5.0.2 as a news server, and if this is the case, I am not
| going to even think of replacing my Linux box!

You *might* be facing the problem in this TA:

 Non-Corollary system gets "WARNING: ip: spinning on PCB Fxxxxxxx" messages.

 KEYWORDS: ip: spinning on PCB net100 openserver networking supplement release
 1.0 v5 5.0.0 5.0.2 console ERGREF E130167 us-ip-spin tcp_reaper tcp reap
 reaper warning

 RELEASE: SCO OpenServer Enterprise System Release 5.0.0, 5.0.2
          SCO OpenServer Desktop System Release 5.0.0, 5.0.2
          SCO Networking Supplement Release 1.0.0

 PROBLEM: The message "WARNING: ip: spinning on PCB Fxxxxxxx" scrolls on
          the console and all user processes hang.  Eventually, error log
          overflow messages will also appear.

          This can happen on both SCO OpenServer 5.0.2 and SCO OpenServer
          5.0.0 systems with the SCO Networking Supplement 1.0.0 installed.

          The same symptoms can appear on SCO OpenServer Release 5.0.0 systems
          with SCO SMP (Symmetrical Multiprocessing) Release 5.0.0 on Corollary
          Architecture machines for a different reason.

          See IT os/2793, "Corollary system gets 'WARNING: IP:spinning on
          PCB Fxxxxxxx' messages" for information on this related issue.

 CAUSE:  This is caused by a problem with the tcp_reaper() routine in the
          tcp driver shipped with the above products.

 SOLUTION: This problem has been reported to SCO Engineering.

When it happens here, everything locks up tight as a drum, and I cannot
even switch screens back to the console device (tty01).

I finally attached a terminal to COM1 and put SYSTTY=1 in /etc/default/boot
to make that the console, where I could see the error message scrolling
indefinitely.   Fortunately (ha!), it's a * enough bug that it
does not (have time to?) write to /usr/adm/messages or /usr/adm/syslog,
otherwise you'd be faced with a filled-up root filesystem.
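
For anyone wanting to repeat that console switch, here is a minimal sketch of setting the SYSTTY line with a small shell helper. Only SYSTTY=1 and /etc/default/boot come from the description above; the helper name set_systty and the grep-based rewrite are assumptions made for illustration, so try it on a copy of the file first.

```sh
#!/bin/sh
# Hypothetical helper (not an SCO tool): make sure FILE contains exactly
# one SYSTTY=VALUE line, leaving every other line untouched.
set_systty() {
    file=$1
    value=$2
    tmp="$file.new"
    # Copy everything except an existing SYSTTY line; grep exits non-zero
    # when it selects nothing, which is fine here.
    grep -v '^SYSTTY=' "$file" > "$tmp" || true
    # Append the new setting.
    echo "SYSTTY=$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (on a copy, as root):
#   cp /etc/default/boot /tmp/boot.copy
#   set_systty /tmp/boot.copy 1
```

Rewriting the file in place rather than moving a new one over it is a deliberate choice in this sketch, so the ownership and mode of the original are preserved.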

I have no idea when a patch will be forthcoming...

[PS: what's the sco.scounix newsgroup to which you cross-posted, another
do-it-yourself hierarchy?]

 
 
 

Post by Joe Schofield » Wed, 30 Oct 1996 04:00:00


Has anyone experienced OSR5.0.0 locking up/panic during heavy disk IO?
I have one OSR5.0.0 machine and one OSR3.2v4.2 machine; both have
Adaptec 2940 cards, and with only SCSI hard drives.  I have multiple
applications which run well in OSR3.2v4.2, but cause a lockup or
panic (I've had both) in OSR5.0.0.  Has anyone had this problem? If
so, which patch seemed to help? I have most if not all of the patches
installed for OSR5.0.0.  If anyone has had this problem, then switched
to OSR5.0.2, did it help? I am also in the process of switching from
Linux to OSR5.0.2 as a news server, and if this is the case, I am not
going to even think of replacing my Linux box!

Thanks,
Joe Schofield


 
 
 

Post by James R. Sullivan » Thu, 31 Oct 1996 04:00:00



> Has anyone experienced OSR5.0.0 locking up/panic during heavy disk IO?
> I have one OSR5.0.0 machine and one OSR3.2v4.2 machine; both have
> Adaptec 2940 cards, and with only SCSI hard drives.  I have multiple
> applications which run well in OSR3.2v4.2, but cause a lockup or
> panic (I've had both) in OSR5.0.0.  Has anyone had this problem?

In the past, problems of this sort tended to be related to the SCSI
termination.  OpenServer Release 5 is more demanding on the SCSI
subsystem and, as a result, poor termination of the SCSI bus can
cause problems.  In every instance that I've seen where this behaviour
occurs (system works fine with Version 4.2, has problems with Release
5), cleaning up the SCSI termination, either by removing devices or
properly terminating the bus, has resolved the problem.

That's where I would start my investigation.
--

----
Jim Sullivan            "Don't plant your bad days.  They grow into bad
SMB Segment Marketing    weeks and then bad months and before you know it

416 216 4611

 
 
 

1. System lockup on heavy Disk IO

Hi,

I want to ask whether any of you have had the same problem as me.
Here goes ... imagine you have 2 or more disks in your system.
You then run find . -print | cpio /otherdisk, which copies
several tens of thousands of small files (average 50K).
On my system (pc164 with redhat5.1 and kernel 2.0.34 and an Adaptec
2940uw) it copies for some time and then I get a SCSI error saying that
254 SCSI buffers are overrun. It then tries to reset the SCSI bus several
times and then the system locks up.
If I run the same command as above with another job in the background
which does a sync every second, all looks fine. But that should not be
the final solution ...
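
The workaround described above, a background job that calls sync every second while the copy runs, could be sketched roughly like this. The helper name with_periodic_sync is made up for illustration, and the source path and cpio pass-through flags (-pdm) in the usage comment are assumptions, since the command quoted in the post is abbreviated.

```sh
#!/bin/sh
# Hypothetical helper: run a command while a background loop calls sync
# at a fixed interval, then stop the loop once the command has finished.
with_periodic_sync() {
    interval=$1
    shift
    # Background loop that flushes dirty buffers every $interval seconds.
    ( while :; do sync; sleep "$interval"; done ) &
    sync_pid=$!
    # Run the real workload in the foreground and remember its exit status.
    status=0
    "$@" || status=$?
    # Stop the sync loop so it does not outlive the copy.
    kill "$sync_pid" 2>/dev/null || true
    wait "$sync_pid" 2>/dev/null || true
    return "$status"
}

# Example (assumed source path and cpio flags):
#   cd /sourcedisk && \
#     with_periodic_sync 1 sh -c 'find . -print | cpio -pdm /otherdisk'
```

As the post says, this only hides the symptom by keeping the number of outstanding buffers small; it does not explain why the SCSI driver falls over.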

Does anybody have an explanation (or better, a solution) for this
behavior?

I also tested it with the 2.0.35-2 kernel with alpha patches (of course).
But then I got a null pointer reference panic (sorry, I do not have the
exact screen dump available for this or for the above error message).

Thanks in advance,
  Juergen
--

The sun shines on my way,
every night and every day. :-))
