RH 7.1 crashes under heavy disk load, RH 6.1 works fine

Post by Jochen Rot » Wed, 20 Jun 2001 10:03:51



Hello,

I have been running RH 6.0 and later RH 6.1 on a machine for years without
any problems whatsoever. Hardware is Asus P2B, Pentium II-350, 128 MB SDRAM,
WDC AC38400L IDE hard disk.

When I got a new 80G hard disk I decided to install RH 7.1 on the new disk
and mount the old one at /old. The new disk is a Maxtor 98196H8. I updated
the mainboard BIOS from 1006 to 1013 in order to install the huge new disk.
The old boot disk with RH 6.1 still continues to work fine with the new
BIOS under heavy load.

Now the problem: When I run RH 7.1 the system will eventually crash during
heavy disk activity. I run the simple shell script below to reproduce the
error.

I tried switching the disk from udma2 mode to mdma2 using the command
hdparm -d1 -X34 /dev/hda with no success. Disk access just becomes slower.
Same for trying PIO mode. I also tried running with swap disabled, no
success either.
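
For anyone following along: as far as I read the hdparm man page, the -X value is the mode class plus the mode number, with PIO modes starting at 8, multiword DMA at 32 and UltraDMA at 64. A minimal sketch of that arithmetic follows. The helper name xfer_mode is made up for illustration and is not part of hdparm.

```shell
#!/bin/sh
# xfer_mode is a hypothetical helper, not part of hdparm.
# It maps a mode name such as "mdma2" to the number hdparm -X expects.
xfer_mode() {
    case "$1" in
        pio[0-4])  echo $((8  + ${1#pio}))  ;;
        mdma[0-2]) echo $((32 + ${1#mdma})) ;;
        udma[0-6]) echo $((64 + ${1#udma})) ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

xfer_mode mdma2   # 34, the value used in the command above
xfer_mode udma2   # 66
# Only print the resulting command; nothing is sent to the drive here.
echo "hdparm -d1 -X$(xfer_mode udma2) /dev/hda"
```

The sketch only prints the command, so it is safe to run on a machine without the disk in question.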

The error is almost always "Unable to handle kernel paging request", and
this even if no swap space is configured. Most of the OOPSes happen within
the init process, and the logging information does not make it to the
/var/log/messages file. Below is the ksymoops output for the crash with
no swap space configured.

The new 2.4 kernels appear to run faster than the old 2.2 kernels used in
RH 6.x, so I would prefer to use them if I can get the system to run
reliably.

Comments and suggestions appreciated.

Regards,
Jochen

-------------------------------------------------------
Load script to induce failure:

#!/bin/sh
# Usage: <script> [count] [delay]
# Starts "count" background checksum passes, "delay" seconds apart.
count=${1:-1}
delay=${2:-10}
echo "count=$count delay=$delay"

index=1
while [ "$index" -le "$count" ]
do
        printf ' %s' "$index"
        # /home/jochen contains about 3 gigs of data in several
        # thousand files.
        find /home/jochen -type f -exec printf '%s ' "{}" ";" \
                -exec sum "{}" ";" >"s-$$.$index" &
        sleep "$delay"
        index=$((index + 1))
done
echo

-------------------------------------------------------
ksymoops output:

ksymoops 2.4.0 on i686 2.4.2-2.  Options used
     -v /boot/vmlinux-2.4.2-2 (specified)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.2-2/ (default)
     -m /boot/System.map (specified)

Warning (compare_maps): ksyms_base symbol __VERSIONED_SYMBOL(shmem_file_setup) not found in vmlinux.  Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base says c01af860, vmlinux says c0153510.  Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol tulip_max_interrupt_work  , tulip says c88514e0, /lib/modules/2.4.2-2/kernel/drivers/net/tulip/tulip.o says c8850bc0.  Ignoring /lib/modules/2.4.2-2/kernel/drivers/net/tulip/tulip.o entry
Warning (compare_maps): mismatch on symbol tulip_rx_copybreak  , tulip says c88514e4, /lib/modules/2.4.2-2/kernel/drivers/net/tulip/tulip.o says c8850bc4.  Ignoring /lib/modules/2.4.2-2/kernel/drivers/net/tulip/tulip.o entry
Warning (compare_maps): mismatch on symbol usb_devfs_handle  , usbcore says c883f1a0, /lib/modules/2.4.2-2/kernel/drivers/usb/usbcore.o says c883ecc0.  Ignoring /lib/modules/2.4.2-2/kernel/drivers/usb/usbcore.o entry
Jun 14 21:21:17 lard kernel: Unable to handle kernel paging request at virtual address 40f89194
Jun 14 21:21:17 lard kernel: c012493a
Jun 14 21:21:17 lard kernel: Oops: 0002
Jun 14 21:21:17 lard kernel: CPU:    0
Jun 14 21:21:17 lard kernel: EIP:    0010:[__remove_inode_page+74/112]
Jun 14 21:21:17 lard kernel: EIP:    0010:[<c012493a>]
Using defaults from ksymoops -t elf32-i386 -a i386
Jun 14 21:21:17 lard kernel: EFLAGS: 00010246
Jun 14 21:21:17 lard kernel: eax: 00000000   ebx: c10b5f50   ecx: c57a8aa8   edx: 40f89194
Jun 14 21:21:17 lard kernel: esi: c10b5f6c   edi: 0000018a   ebp: c0258a04   esp: c21f7e60
Jun 14 21:21:18 lard kernel: ds: 0018   es: 0018   ss: 0018
Jun 14 21:21:18 lard kernel: Process sum (pid: 30648, stackpage=c21f7000)
Jun 14 21:21:19 lard kernel: Stack: c10b5f50 c012bbff c10b5f50 c0258a04 c0258d74 00000001 00000001 c012d6ba
Jun 14 21:21:19 lard kernel:        c0258a04 00000001 c0258d7c 00000000 c0258d70 c012d7d5 c0258d70 00000000
Jun 14 21:21:19 lard kernel:        00000001 00000001 00000015 00000001 00000000 c119c8d8 c70d7c28 00002039
Jun 14 21:21:19 lard kernel: Call Trace: [reclaim_page+687/1056] [__alloc_pages_limit+122/176] [__alloc_pages+229/640] [generic_file_readahead+494/656] [account_io_end+60/80] [do_generic_file_read+528/1344] [ide_end_request+79/96]
Jun 14 21:21:19 lard kernel: Call Trace: [<c012bbff>] [<c012d6ba>] [<c012d7d5>] [<c01259fe>] [<c0162d3c>] [<c0125cb0>] [<c0188bcf>]
Jun 14 21:21:19 lard kernel:        [<c0126134>] [<c0125fe0>] [<c01339a6>] [<c010a488>] [<c010a4ac>] [<c010901b>]
Jun 14 21:21:19 lard kernel: Code: 89 02 c7 43 34 00 00 00 00 ff 0d 60 86 25 c0 c7 43 08 00 00

>>EIP; c012493a <__remove_inode_page+4a/70>   <=====

Trace; c012bbff <reclaim_page+2af/420>
Trace; c012d6ba <__alloc_pages_limit+7a/b0>
Trace; c012d7d5 <__alloc_pages+e5/280>
Trace; c01259fe <generic_file_readahead+1ee/290>
Trace; c0162d3c <account_io_end+3c/50>
Trace; c0125cb0 <do_generic_file_read+210/540>
Trace; c0188bcf <ide_end_request+4f/60>
Trace; c0126134 <generic_file_read+64/80>
Trace; c0125fe0 <file_read_actor+0/f0>
Trace; c01339a6 <sys_read+96/d0>
Trace; c010a488 <do_IRQ+68/b0>
Trace; c010a4ac <do_IRQ+8c/b0>
Trace; c010901b <system_call+33/38>
Code;  c012493a <__remove_inode_page+4a/70>
00000000 <_EIP>:
Code;  c012493a <__remove_inode_page+4a/70>   <=====
   0:   89 02                     mov    %eax,(%edx)   <=====
Code;  c012493c <__remove_inode_page+4c/70>
   2:   c7 43 34 00 00 00 00      movl   $0x0,0x34(%ebx)
Code;  c0124943 <__remove_inode_page+53/70>
   9:   ff 0d 60 86 25 c0         decl   0xc0258660
Code;  c0124949 <__remove_inode_page+59/70>
   f:   c7 43 08 00 00 00 00      movl   $0x0,0x8(%ebx)

5 warnings issued.  Results may not be reliable.
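
A note for anyone reading the dump: the "Oops: 0002" value is the i386 page-fault error code. To my understanding, bit 0 separates a protection fault from a page that is not present, bit 1 separates a write from a read, and bit 2 separates user mode from kernel mode. On that reading, 0002 is a kernel-mode write to a page that is not present, which fits the faulting instruction mov %eax,(%edx) with edx holding 40f89194. A rough sketch of the decoding follows; decode_oops is a made-up helper name, not a ksymoops feature.

```shell
#!/bin/sh
# decode_oops is a hypothetical helper for illustration only.
# It interprets the low three bits of an i386 page-fault error code.
decode_oops() {
    code=$((0x$1))
    [ $((code & 1)) -ne 0 ] && p="protection fault" || p="page not present"
    [ $((code & 2)) -ne 0 ] && w="write" || w="read"
    [ $((code & 4)) -ne 0 ] && m="user mode" || m="kernel mode"
    echo "$p, $w, $m"
}

decode_oops 0002   # page not present, write, kernel mode

# The log prints offsets in decimal and ksymoops prints them in hex:
printf '%d/%d\n' 0x4a 0x70   # 74/112, i.e. __remove_inode_page+74/112
```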

------------------------------------------------
Jochen Roth              jochen at panix dot com

 
 
 

Post by BrentRBria » Wed, 20 Jun 2001 10:20:22


The folks at RH will probably say I am full of *<again>, but I was
having kernel panics, HD LED would peg on and the screen would go nuts,
system locks up and you can't even telnet in from another box ... thought
I had been hit by lightning, (bad storm around the time I upgraded 6.1 ->
7.0).

Then, in desperation, I downloaded the 2.4.5 kernel from kernel.org,
compiled/installed it, and, wow, all my HARDWARE problems went away.

I was having to run FSCK all the time (with a HUGE number of repairs),
but not since (am using said machine to send this).

B





 
 
 

Post by Dave Uhrin » Wed, 20 Jun 2001 10:29:07



I have had problems running the 2.4 kernels on older hardware which is well
supported in the 2.2 kernels.  Where I have installed RH-7.1 on older
hardware, I have also installed 2.2.19 kernel and all runs well.
 
 
 

Post by Glen Sanf » Wed, 20 Jun 2001 10:35:15



I don't know if it's applicable or not, but your post brought to mind
http://uwsg.iu.edu/hypermail/linux/kernel/0001.3/1067.html

Glen

 
 
 

Post by Craig Kelle » Wed, 20 Jun 2001 13:36:29



> Now the problem: When I run RH 7.1 the system will eventually crash during
> heavy disk activity. I run the simple shell script below to reproduce the
> error.

You probably want to upgrade to 2.4.5, although it still has some
known VM bugs in it as well (Alan Cox still runs 2.2 because of this).
2.4.2 had some pretty bad bugs in it, specifically in the VM system.  It
works great on many systems, but croaks on others (especially MP).

  -Craig

 
 
 

Post by John Ouellett » Wed, 20 Jun 2001 22:47:10


You said in your message that you don't have any swap space enabled:
that might be a big problem.  I don't know if you've seen any of the
posts on the kernel groups, but the 2.4 kernels handle swapping much
differently than the 2.2 kernels (some would say more poorly).  With the
2.2 kernels you were reasonably safe having little or no swap space if
you had tons of memory; this doesn't seem to be true with 2.4.  If you
don't have a swap partition, make an on-disk swap file with mkswap (make
it fairly sizeable) and see if that helps with your problems.
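
For reference, the swap file suggestion above could be sketched roughly as follows. The path /swapfile and the 256 MB size are placeholder assumptions, not values from this thread, and the script only prints the commands unless DRY_RUN=0 is set (actually running them needs root).

```shell
#!/bin/sh
# Sketch: create and enable a swap file. Path and size are assumptions.
SWAPFILE=${SWAPFILE:-/swapfile}   # hypothetical location
SIZE_MB=${SIZE_MB:-256}           # hypothetical size in megabytes
DRY_RUN=${DRY_RUN:-1}             # default: only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_swapfile() {
    run dd if=/dev/zero of="$SWAPFILE" bs=1024k count="$SIZE_MB"
    run chmod 600 "$SWAPFILE"
    run mkswap "$SWAPFILE"
    run swapon "$SWAPFILE"
}

make_swapfile
```

Afterwards, swapon -s or /proc/swaps should list the file if it was enabled.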

BTW, Maxtor disks suck.  Always have, and from the looks of it, always
will. Loud and slow.

Hope the swap stuff helps.
John Ouellette


--
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
John Ouellette                     | Ph: 212-313-7919
Department of Astrophysics         | Fax: 212-769-5007

Central Park West at 79th St.      | http://research.amnh.org/astrophysics
New York, NY  10024-5192           |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
 
 

Post by Jochen Rot » Thu, 21 Jun 2001 09:44:57



> You said in your message that you don't have any swap space enabled:
> that might be a big problem.

What I wrote below was meant to say that I first tried with swap, then
turned it off to see if that would help. Sorry for the confusion.

> BTW, Maxtor disks suck.  Always have, and from the looks of it, always
> will. Loud and slow.

That's strange. I have a fast and quiet Maxtor 40G disk in a system here.
Never had a problem with it; I guess I just got lucky.


>> Hello,
..
>> I tried switching the disk from udma2 mode to mdma2 using the command
>> hdparm -d1 -X34 /dev/hda with no success. Disk access just becomes slower.
>> Same for trying PIO mode. I also tried running with swap disabled, no
>> success either.

------------------------------------------------
Jochen Roth              jochen at panix dot com