PROBLEM: various oops's in 2.2.19 SMP kernel.

Post by Michael J Schou » Fri, 29 Jun 2001 01:30:13



[1.] One line summary of the problem:    

PROBLEM: various oops's in 2.2.19 SMP kernel.

[2.] Full description of the problem/report:

The machine runs fine for a couple of weeks, then eventually hangs with an
oops.  This has happened approximately 5 times to date with different 2.2.x
kernels, but I have not reported it until now.  The oopses are not always
the same.

Latest oops (2.2.19 SMP)
-------------------------------------------
Unable to handle kernel NULL pointer dereference at virtual address 00000100
current->tss.cr3 = 1c6ad000, %%cr3 = 1c6ad000
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[find_buffer+104/144]
EFLAGS: 00010206
eax: 00000100   ebx: 00000007   ecx: 0005ff98   edx: 00000100
esi: 0000000d   edi: 00003006   ebp: 0003198f   esp: e426dee4
ds: 0018   es: 0018   ss: 0018
Process postmaster (pid: 13314, process nr: 30, stackpage=e426d000)
Stack: edffc498 00000000 0005ff98 c012bd04 00003006 0003198f 00001000 00000000  
       c0142d42 00003006 0003198f 00001000 000003fc 00000000 c0142ece edffc498  
       d9eecff0 00000000 0000001b 00000000 edffc498 00000000 d18c5320 c0142f4e  
Call Trace: [get_hash_table+24/76] [sync_block+46/216] [sync_indirect+78/128] [sync_dindirect+78/128] [ext2_sync_file+103/164] [sys_fsync+143/200] [system_call+52/56]  [_stext+43/164]  
-------------------------------------------

An older oops (2.2.16 SMP, Red Hat's kernel with Red Hat's patches)
-------------------------------------------
Unable to handle kernel NULL pointer dereference at virtual address 00000134
current->tss.cr3 = 03b8f000, %%cr3 = 03b8f000
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[remove_from_queues+169/328]
EFLAGS: 00010206
eax: 00000100   ebx: de47ae40   ecx: de47ae40   edx: e4b886c0
esi: 0000000c   edi: 00000000   ebp: 00000007   esp: c2339ed0
ds: 0018   es: 0018   ss: 0018
Process postmaster (pid: 18747, process nr: 54, stackpage=c2339000)
Stack: 0007a671 c012b752 de47ae40 de47ae40 ece48420 c012bf5a de47ae40 c0148b19
       de47ae40 00000000 eb28c000 00000000 eb28c0d0 00001000 ec54d0b8 00000008
       00000400 0007a66a 00000000 0000002e c0148edf eb28c000 0000000c eb28c0cc
Call Trace: [put_last_free+50/124] [__bforget+34/40] [trunc_indirect+505/692] [ext2_truncate+107/516] [do_truncate+85/132] [do_truncate+105/132] [filp_open+172/240] [sys_ftruncate+322/368] [system_call+52/56]
Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d

-------------------------------------------

The machine is a dual-CPU SMP machine (two PIII 500s) with a Mylex DAC960
RAID controller.
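
As a reading aid for the two "Oops:" lines above (0000 and 0002), here is a
minimal sketch that decodes the value as an i386 page-fault error code.  The
bit meanings are the standard x86 ones (bit 0: protection violation vs. page
not present, bit 1: write vs. read, bit 2: user vs. kernel mode).  The
function name is made up for this illustration and is not part of any kernel
tool.

```python
# Decode the hex value printed after "Oops:" as an x86 page-fault error code.
# decode_oops_code is a hypothetical helper name used only for this sketch.

def decode_oops_code(code_hex):
    code = int(code_hex, 16)
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if code & 0x4 else "kernel",
    }

for label, code in (("2.2.19 oops", "0000"), ("2.2.16 oops", "0002")):
    print(label, decode_oops_code(code))
```

Read this way, the 2.2.19 oops is a kernel-mode read of a page that is not
present, and the 2.2.16 oops is a kernel-mode write to one.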

[3.] Keywords (i.e., modules, networking, kernel):

    linux kernel 2.2.19 SMP oops 0000 [find_buffer+104/144]

    loaded modules:
    3c59x
    DAC960

[4.] Kernel version (from /proc/version):

Linux version 2.2.19 (r...@deathstar.gkg-com.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP Fri Apr 27 10:49:53 CDT 2001

[5.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/oops-tracing.txt)

-----------------------------------------------------------------------
Latest oops (2.2.19 SMP vanilla)

Options used: -V (default)
              -o /lib/modules/2.2.19/ (default)
              -k /proc/ksyms (default)
              -l /proc/modules (default)
              -m /boot/System.map (specified)
              -c 1 (default)

Unable to handle kernel NULL pointer dereference at virtual address 00000134
current->tss.cr3 = 03b8f000, %%cr3 = 03b8f000
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[remove_from_queues+169/328]
EFLAGS: 00010206
eax: 00000100   ebx: de47ae40   ecx: de47ae40   edx: e4b886c0
esi: 0000000c   edi: 00000000   ebp: 00000007   esp: c2339ed0
ds: 0018   es: 0018   ss: 0018
Process postmaster (pid: 18747, process nr: 54, stackpage=c2339000)
Stack: 0007a671 c012b752 de47ae40 de47ae40 ece48420 c012bf5a de47ae40 c0148b19
       de47ae40 00000000 eb28c000 00000000 eb28c0d0 00001000 ec54d0b8 00000008
       00000400 0007a66a 00000000 0000002e c0148edf eb28c000 0000000c eb28c0cc
Call Trace: [put_last_free+50/124] [__bforget+34/40] [trunc_indirect+505/692] [ext2_truncate+107/516] [do_truncate+85/132] [do_truncate+105/132] [filp_open+172/240] [sys_ftruncate+322/368] [system_call+52/56]
Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d

Code:  00000000 Before first symbol            00000000 <_IP>: <===
Code:  00000000 Before first symbol               0:    89 50 34                mov    %edx,0x34(%eax) <===
Code:  00000003 Before first symbol               3:    c7 01 00 00 00 00       movl   $0x0,(%ecx)
Code:  00000009 Before first symbol               9:    89 02                   mov    %eax,(%edx)
Code:  0000000b Before first symbol               b:    c7 41 34 00 00 00 00    movl   $0x0,0x34(%ecx)
Code:  00000012 Before first symbol              12:    ff 0d 00 00 00 00       decl   0x0

833 warnings issued.  Results may not be reliable.

-----------------------------------------------------------------------
Older oops (2.2.16-3 SMP, Red Hat stock)

Options used: -V (default)
              -o /lib/modules/2.2.19/ (default)
              -k /proc/ksyms (default)
              -l /proc/modules (default)
              -m /boot/System.map-2.2.16-3smp (specified)
              -c 1 (default)

Unable to handle kernel NULL pointer dereference at virtual address 00000134
current->tss.cr3 = 03b8f000, %%cr3 = 03b8f000
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[remove_from_queues+169/328]
EFLAGS: 00010206
eax: 00000100   ebx: de47ae40   ecx: de47ae40   edx: e4b886c0
esi: 0000000c   edi: 00000000   ebp: 00000007   esp: c2339ed0
ds: 0018   es: 0018   ss: 0018
Process postmaster (pid: 18747, process nr: 54, stackpage=c2339000)
Stack: 0007a671 c012b752 de47ae40 de47ae40 ece48420 c012bf5a de47ae40 c0148b19
       de47ae40 00000000 eb28c000 00000000 eb28c0d0 00001000 ec54d0b8 00000008
       00000400 0007a66a 00000000 0000002e c0148edf eb28c000 0000000c eb28c0cc
Call Trace: [put_last_free+50/124] [__bforget+34/40] [trunc_indirect+505/692] [ext2_truncate+107/516] [do_truncate+85/132] [do_truncate+105/132] [filp_open+172/240] [sys_ftruncate+322/368] [system_call+52/56]
Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d

Code:  00000000 Before first symbol            00000000 <_IP>: <===
Code:  00000000 Before first symbol               0:    89 50 34                mov    %edx,0x34(%eax) <===
Code:  00000003 Before first symbol               3:    c7 01 00 00 00 00       movl   $0x0,(%ecx)
Code:  00000009 Before first symbol               9:    89 02                   mov    %eax,(%edx)
Code:  0000000b Before first symbol               b:    c7 41 34 00 00 00 00    movl   $0x0,0x34(%ecx)
Code:  00000012 Before first symbol              12:    ff 0d 00 00 00 00       decl   0x0

843 warnings issued.  Results may not be reliable.
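
The faulting address can be cross-checked against the first instruction in
the ksymoops output above: the bytes 89 50 34 decode to
"mov %edx,0x34(%eax)", and the register dump shows eax = 00000100.  The
sketch below redoes that arithmetic.  It handles only this one addressing
form (opcode 0x89 with an 8-bit displacement and no SIB byte), and the
function name is an assumption made for illustration.

```python
# Recompute the effective address of "mov %edx,0x34(%eax)" from the raw bytes
# and the register dump.  effective_address is a hypothetical helper; it only
# understands opcode 0x89 with ModRM mod=01 (disp8) and no SIB byte.

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def effective_address(code_bytes, regs):
    opcode, modrm, disp8 = code_bytes[0], code_bytes[1], code_bytes[2]
    mod, rm = modrm >> 6, modrm & 0x7
    if opcode != 0x89 or mod != 0b01 or rm == 0b100:
        raise ValueError("only 'mov r32,disp8(r32)' without SIB is handled")
    # The 8-bit displacement is sign-extended before being added.
    disp = disp8 - 0x100 if disp8 & 0x80 else disp8
    return (regs[REG32[rm]] + disp) & 0xFFFFFFFF

code = bytes([0x89, 0x50, 0x34])   # first three bytes of the "Code:" line
regs = {"eax": 0x00000100}         # eax from the register dump
print(f"{effective_address(code, regs):08x}")   # prints 00000134
```

The result, 00000134, matches the "virtual address 00000134" reported in the
oops, and a store through that address is consistent with the write bit
being set in "Oops: 0002".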

[6.] A small shell script or example program which triggers the
     problem (if possible)

Not applicable. Can't reproduce easily.  The oopses have been happening
anytime after 3 to 6 weeks of uptime.

[7.] Environment

Machine is an SMP dual PIII 500 machine with 768MB RAM,
Mylex DAC960 RAID controller in RAID 5 configuration with
5 active disks, 1 standby disk.

This machine functions as a PostgreSQL (7.0.3) server. Nothing else significant
is running on this machine.  The machine is running Red Hat 6.2 with all of the
released updates applied, and is running a stock 2.2.19 kernel that was
downloaded from ftp.us.kernel.org and compiled on this same machine.

[7.1.] Software (add the output of the ver_linux script here)

Linux deathstar.gkg-com.com 2.2.19 #1 SMP Fri Apr 27 10:49:53 CDT 2001 i686 unknown

Gnu C                  egcs-2.91.66
Gnu make               3.78.1
binutils               2.9.5.0.22
util-linux             2.10r
modutils               2.3.21
e2fsprogs              1.18
pcmcia-cs              3.1.8
Linux C Library        2.1.3
Dynamic linker (ldd)   2.1.3
Procps                 2.0.6
Net-tools              1.54
Console-tools          0.3.3
Sh-utils               2.0
Modules Loaded         3c59x DAC960

[7.2.] Processor information (from /proc/cpuinfo):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 501.147
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips        : 999.42

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 501.147
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips        : 999.42

[7.3.] Module information (from /proc/modules):

3c59x                  22480   2 (autoclean)
DAC960                 60848   3

[7.4.] SCSI information (from /proc/scsi/scsi)

/proc/scsi/scsi:
Attached devices: none

/proc/rd/c0/current_status:
***** DAC960 RAID Driver Version 2.2.10 of 1 February 2001 *****
Copyright 1998-2001 by Leonard N. Zubkoff <l...@dandelion.com>
Configuring Mylex DAC960PTL1 PCI RAID Controller
  Firmware Version: 4.07-0-29, Channels: 1, Memory Size: 8MB
  PCI Bus: 0, Device: 18, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFC8FE000 mapped at 0xF0810000, IRQ Channel: 18
  Controller Queue Depth: 124, Maximum Blocks per Command: 128
  Driver Queue Depth: 123, Scatter/Gather Limit: 33 of 33 Segments
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:0  Vendor: SEAGATE   Model: ST39175LW         Revision: 0001
         Serial Number: 3AL0NXK100007008KQ52
         Disk Status: Online, 17782784 blocks
    0:1  Vendor: SEAGATE   Model: ST39175LW         Revision: 0001
         Serial Number: 3AL0P60T00007012R69K
         Disk Status: Online, 17782784 blocks
    0:2  Vendor: SEAGATE   Model: ST39175LW         Revision: 0001
         Serial Number: 3AL0P4VE00007012R7QV
         Disk Status: Online, 17782784 blocks
    0:3  Vendor: SEAGATE   Model: ST39175LW         Revision: 0001
         Serial Number: 3AL0P3V700001005HKUC
         Disk Status: Standby, 17782784 blocks
    0:4  Vendor: SEAGATE   Model: ST39175LW         Revision: 0001
         Serial Number: 3AL0P3SW000070113RRF
         Disk Status: Online, 17782784 blocks
    0:5  Vendor: SEAGATE   Model: ST39175LW         Revision: 0001
         Serial Number: 3AL0P1MV00007012RDGX
         Disk Status: Online, 17782784 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 71131136 blocks, Write Thru
  No Rebuild or Consistency Check in Progress
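
As a sanity check on the controller status above, the logical drive size is
what RAID 5 over the 5 online disks should give: one disk's worth of space
goes to parity, and the standby disk does not count.  A minimal sketch
follows.  The 512-byte block size is an assumption (the status output does
not state it), and the function name is made up for this example.

```python
# Check the RAID-5 logical drive size against the per-disk block counts.
# raid5_usable_blocks is a hypothetical helper; BLOCK_SIZE is an assumption.

BLOCK_SIZE = 512  # bytes per block, assumed; not stated in the status output

def raid5_usable_blocks(active_disks, blocks_per_disk):
    if active_disks < 3:
        raise ValueError("RAID 5 needs at least 3 active disks")
    # One disk's worth of capacity is consumed by parity.
    return (active_disks - 1) * blocks_per_disk

usable = raid5_usable_blocks(active_disks=5, blocks_per_disk=17782784)
print(usable)                                  # prints 71131136
print(f"{usable * BLOCK_SIZE / 2**30:.1f} GiB under the 512-byte assumption")
```

The computed 71131136 blocks agrees with the size reported for
/dev/rd/c0d0.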

[7.5.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):

None.

[X.] Other notes, patches, fixes, workarounds:

None.  Please email me directly if you need any additional
information on this problem.
