freeze during heavy (SCSI) disk access

Post by Erich J. Green » Thu, 21 Feb 2002 02:26:09



The machine:

ASL Marquis C460-S

Tyan Thunder i860 S2603 ATX motherboard with 4 GB RDRAM and
dual-channel LSI 1010 Ultra 160/m SCSI controller

2 Xeon CPUs (1.7 GHz)

2 Seagate ST173404LC Cheetah 73GB Ultra 160/m SCSI hard drives,
partitioned as follows:

   Name        Flags      Part Type  FS Type          [Label]        Size (MB)
------------------------------------------------------------------------------
   sda1                    Primary   Linux ext2                       20974.47
   sda5                    Logical   Linux ext2                       50322.27
   sda3                    Primary   Linux swap                        2105.68
   sdb1                    Primary   Linux ext2                       73402.40

/dev/sda1 is the root filesystem; /dev/sda5 and /dev/sdb1 are used for
data storage and processing.

External SCSI Zip drive on an Adaptec 2906 controller; internal IDE
CD-RW (HP 9500I); Intel 82550 & Netgear GA620 Ethernet cards; Matrox
G450 AGP video; Soundblaster 128 PCI

2.4.17-2 kernel (originally 2.4.7; problem independent of kernel
version) with SMP support and sym53c8xx (1.7.3c-20010512) compiled
in.  (See below for full config.)

Red Hat 7.0

The problem:

The machine can completely lock up (no response to anything but the
power button) during heavy disk access.  No oops messages are written
to the logs.

The freeze almost always happens during heavy access to the second
partition on the first hard drive (/dev/sda5); however, the machine
can lock up while accessing any partition on either drive.  Freezes
aren't strictly reproducible; the machine may make it most of the way
through a job one time but crash almost immediately another.  Heavy
data processing, compressing or uncompressing large files,
transferring large files via sshd, and running badblocks are jobs that
typically cause problems.  (Badblocks sometimes makes it through a
read-only test on /dev/sda5, reporting no problems; it never finishes
a read-write test, safe or unsafe.)
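A minimal sketch of a repeatable write-and-verify load, which might help reproduce the lockup on demand and catch silent corruption when the machine survives. It is not a replacement for badblocks. The function name, the scratch-file prefix, and the mount point in the usage note are hypothetical. Reads can be served from the page cache, so the file would need to be larger than RAM for the read-back to exercise the disk.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, total_bytes, chunk_bytes=1 << 20):
    """Write pseudo-random data to a scratch file, read it back, compare hashes.

    Returns True when the data read back matches what was written.
    """
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory, prefix="iotest-")
    try:
        with os.fdopen(fd, "wb") as out:
            remaining = total_bytes
            while remaining > 0:
                chunk = os.urandom(min(chunk_bytes, remaining))
                out.write(chunk)
                written.update(chunk)
                remaining -= len(chunk)
            out.flush()
            os.fsync(out.fileno())  # ask the kernel to push the data to the device

        read_back = hashlib.sha256()
        with open(path, "rb") as src:
            # Read in fixed-size chunks until read() returns an empty bytes object.
            for chunk in iter(lambda: src.read(chunk_bytes), b""):
                read_back.update(chunk)
        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)  # always clean up the scratch file
```

Assuming the data partition is mounted at a hypothetical /mnt/data, a call such as write_and_verify("/mnt/data", 8 * 1024**3) would write and re-read 8 GB, which is more than the 4 GB of RAM in this machine.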

After a freeze, the affected filesystem may make it through fsck, may
show a few inode problems, or may be complete mush (with thousands of
errors, and nonsensical contents after fsck makes its changes).

The SCSI controller channels are on their own IRQs, so it doesn't seem
likely to be a conflict.
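One way to double-check the IRQ assignment is to scan the contents of /proc/interrupts for lines that list more than one device. The sketch below assumes the 2.4-style layout (a header row of CPU names, then one count per CPU, a controller type, and a comma-separated device list). The sample text and its device names are hypothetical, not taken from this machine.

```python
def shared_irqs(interrupts_text):
    """Return {irq: [device, ...]} for IRQ lines that list more than one device."""
    lines = interrupts_text.strip().splitlines()
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip non-numeric rows such as NMI, LOC, ERR
        # Expected fields: one count per CPU, controller type, device names.
        fields = rest.split(None, cpu_count + 1)
        if len(fields) < cpu_count + 2:
            continue
        devices = [name.strip() for name in fields[cpu_count + 1].split(",")]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


# Hypothetical sample in the assumed layout; real output will differ.
SAMPLE = """\
           CPU0       CPU1
  0:     123456     123789    IO-APIC-edge  timer
 16:       1200       1180   IO-APIC-level  sym53c8xx
 17:        900        950   IO-APIC-level  sym53c8xx
 18:       4000       4100   IO-APIC-level  eth0, eth1
NMI:          0          0
LOC:     246000     246001
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
```

On a live system the input would come from reading /proc/interrupts instead of SAMPLE. An empty result only shows that no IRQ line is shared; it says nothing about other possible causes.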

So far, I've tried updating the kernel and the SCSI device driver,
creating a new filesystem on the most commonly affected partition
(though it can take several tries for mke2fs to run all the way
through without locking the machine), and making the worst partition
primary instead of extended (and back again); none of this has helped.
(I did the last two when the problem appeared to be confined to
/dev/sda5).

I haven't found anything helpful in the HOWTOs, FAQs, or Usenet
archives.

Any ideas would be much appreciated.

Thanks.

-- Erich

P.S.  Current kernel .config follows:

#
# Automatically generated make config: don't edit
#
CONFIG_X86=y
CONFIG_ISA=y
# CONFIG_SBUS is not set
CONFIG_UID16=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y

#
# Processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMIII is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_PGE=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
CONFIG_MICROCODE=m
CONFIG_X86_MSR=y
# CONFIG_X86_CPUID is not set
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_HIGHMEM=y
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
CONFIG_SMP=y
# CONFIG_MULTIQUAD is not set
CONFIG_HAVE_DEC_LOCK=y

#
# General setup
#
CONFIG_NET=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_HOTPLUG is not set
# CONFIG_PCMCIA is not set
# CONFIG_HOTPLUG_PCI is not set
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
# CONFIG_KCORE_AOUT is not set
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=m
CONFIG_PM=y
# CONFIG_ACPI is not set
CONFIG_APM=y
# CONFIG_APM_IGNORE_USER_SUSPEND is not set
# CONFIG_APM_DO_ENABLE is not set
# CONFIG_APM_CPU_IDLE is not set
# CONFIG_APM_DISPLAY_BLANK is not set
# CONFIG_APM_RTC_IS_GMT is not set
# CONFIG_APM_ALLOW_INTS is not set
CONFIG_APM_REAL_MODE_POWER_OFF=y

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_PC_CML1=m
CONFIG_PARPORT_SERIAL=m
CONFIG_PARPORT_PC_FIFO=y
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_AMIGA is not set
# CONFIG_PARPORT_MFC3 is not set
# CONFIG_PARPORT_ATARI is not set
# CONFIG_PARPORT_GSC is not set
# CONFIG_PARPORT_SUNBPP is not set
# CONFIG_PARPORT_OTHER is not set
CONFIG_PARPORT_1284=y

#
# Plug and Play configuration
#
CONFIG_PNP=m
CONFIG_ISAPNP=m

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_DEV_XD is not set
# CONFIG_PARIDE is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
CONFIG_BLK_DEV_DAC960=m
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_NBD is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y

#
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_LINEAR=y
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID5=y
CONFIG_MD_MULTIPATH=y
CONFIG_BLK_DEV_LVM=m

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETLINK_DEV=y
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_FILTER=y
CONFIG_UNIX=y
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y
CONFIG_NET_IPIP=m
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
# CONFIG_INET_ECN is not set
# CONFIG_SYN_COOKIES is not set

#
#   IP: Netfilter Configuration
#
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IRC=m
# CONFIG_IP_NF_QUEUE is not set
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_LIMIT=m
CONFIG_IP_NF_MATCH_MAC=m
CONFIG_IP_NF_MATCH_MARK=m
CONFIG_IP_NF_MATCH_MULTIPORT=m
CONFIG_IP_NF_MATCH_TOS=m
CONFIG_IP_NF_MATCH_LENGTH=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_MATCH_TCPMSS=m
CONFIG_IP_NF_MATCH_STATE=m
# CONFIG_IP_NF_MATCH_UNCLEAN is not set
# CONFIG_IP_NF_MATCH_OWNER is not set
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
# CONFIG_IP_NF_TARGET_MIRROR is not set
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_NAT_SNMP_BASIC=m
CONFIG_IP_NF_NAT_IRC=m
CONFIG_IP_NF_NAT_FTP=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_TOS=m
CONFIG_IP_NF_TARGET_MARK=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_TCPMSS=m
# CONFIG_IP_NF_COMPAT_IPCHAINS is not set
# CONFIG_IP_NF_COMPAT_IPFWADM is not set
# CONFIG_IPV6 is not set
# CONFIG_KHTTPD is not set
# CONFIG_ATM is not set
CONFIG_VLAN_8021Q=m

#
#  
#
CONFIG_IPX=m
# CONFIG_IPX_INTERN is not set
CONFIG_ATALK=m
# CONFIG_DECNET is not set
# CONFIG_BRIDGE is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_LLC is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_FASTROUTE is not set
# CONFIG_NET_HW_FLOWCONTROL is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# ATA/IDE/MFM/RLL support
#
CONFIG_IDE=y

#
# IDE, ATA and ATAPI Block devices
#
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_HD_IDE is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
# CONFIG_IDEDISK_STROKE is not set
# CONFIG_BLK_DEV_IDEDISK_VENDOR is not set
# CONFIG_BLK_DEV_COMMERIAL is not set
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDETAPE=m
# CONFIG_BLK_DEV_IDEFLOPPY is not set
CONFIG_BLK_DEV_IDESCSI=m
CONFIG_IDE_TASK_IOCTL=y
CONFIG_IDE_TASKFILE_IO=y

#
# IDE chipset support/bugfixes
#
# CONFIG_BLK_DEV_CMD640 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_IDEDMA_PCI_WIP=y
# CONFIG_BLK_DEV_IDEDMA_TIMEOUT is not set
CONFIG_IDEDMA_NEW_DRIVE_LISTINGS=y
CONFIG_BLK_DEV_ADMA=y
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_AMD74XX_OVERRIDE=y
CONFIG_BLK_DEV_CMD64X=y
CONFIG_BLK_DEV_CMD680=y
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_BLK_DEV_HPT366 is not set
CONFIG_BLK_DEV_PIIX=y
CONFIG_PIIX_TUNING=y
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_PDC_ADMA is not set
CONFIG_BLK_DEV_PDC202XX=y
CONFIG_PDC202XX_BURST=y
CONFIG_PDC202XX_FORCE=y
CONFIG_BLK_DEV_SVWKS=y
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
CONFIG_BLK_DEV_VIA82CXXX=y
# CONFIG_IDE_CHIPSETS is not set
# CONFIG_BLK_DEV_ELEVATOR_NOOP is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_IDEDMA_IVB is not set
# ...
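For anyone comparing this configuration against another kernel build, a small parser can turn the .config text into a dictionary that is easy to query or diff. This is only a sketch. The function name is hypothetical, and it assumes the two line shapes visible above (NAME=value and "# NAME is not set").

```python
def parse_kernel_config(text):
    """Map each CONFIG_ option to its value string, or None if marked unset."""
    options = {}
    unset_suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(unset_suffix):
            # "# CONFIG_FOO is not set" -> "CONFIG_FOO"
            options[line[2:-len(unset_suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


# A few lines copied from the configuration above.
EXCERPT = """\
CONFIG_MICROCODE=m
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_SMP=y
CONFIG_BLK_DEV_RAM_SIZE=4096
"""

if __name__ == "__main__":
    for name, value in parse_kernel_config(EXCERPT).items():
        print(name, value)
```

Two parsed configurations can then be compared key by key to see which options differ between a kernel that freezes and one that does not.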


Post by Michael Lee Yoh » Thu, 21 Feb 2002 03:14:20


> Tyan Thunder i860 S2603 ATX motherboard with 4 GB RDRAM and
> dual-channel LSI 1010 Ultra 160/m SCSI controller

Since you've said that you've tried the latest kernel and the latest
version of the driver, you might have stumbled upon a driver bug.  Find
the driver in your kernel source tree, open it up, and look for the
author of that particular driver.  Send him your exact post and ask
whether he knows what might be causing the lockups.

Driver authors are, in general, very willing to help you debug their
driver, especially if you have equipment that hasn't yet been subjected
to their rudimentary tests.

--


Software Developer, Engineering Services
Red Hat, Inc.

http://people.redhat.com/myohe/

QUIPd 1.02: (568 of 814)
-> Nothing can bring you peace but yourself.
-> - Ralph Waldo Emerson, 1803-1882

 
 
 

Post by Bob Hauck » Thu, 21 Feb 2002 12:10:07




[fast box with fast scsi disks]

> The machine can completely lock up (no response to anything but the
> power button) during heavy disk access.  No oops messages are written
> to the logs.

The most common causes of that are cabling or termination problems.

> data processing, compressing or uncompressing large files,
> transferring large files via sshd, and running badblocks are jobs that
> typically cause problems.

Yup.  That's the typical behavior for a cable or termination problem.

--
 -| Bob Hauck
 -| To Whom You Are Speaking
 -| http://www.haucks.org/

 
 
 
