Linux Kernel 2.4.18 and 2.4.19 problems

Post by Thomas Lang? » Sun, 21 Jul 2002 04:40:06



We've got a few Dell PowerEdge 2650 machines that we thought would
make nice fileservers, so we installed RedHat Linux 7.3 on them.
So far, so good; after the installation, it was pretty much downhill
from there. With RedHat's 2.4.18-3 and 2.4.18-5 kernels we detect
all disks connected through our QLogic FC 2200 HBAs; with
vanilla 2.4.18 and 2.4.19-rc2 we detect nothing. We've tried
QLogic's 6.0beta13 and 6.1beta2 drivers, as well as the driver
that comes with RedHat's release. We're currently running an almost
identical configuration in production; the only differences are one HBA
per server, and the servers are 2550s rather than 2650s.

OK, to sum up the problems:

With RedHat kernels:
* Disks found, _but_ after about 2-3 minutes of heavy I/O on the
  FC HBAs the machine dies, and the only thing that works is a cold boot.

With vanilla kernels:
* Disks not found, so we don't know about the I/O problems.

Anyone have any ideas?

Here's the dmesg and lspci -vvvxx output. If anything else is needed,
please tell me and I'll provide the info:

test4:~# dmesg
 idx=8 mapped at ffff6000
ACPI table found: APIC v1 [DELL   PE2650   0.1]
__va_range(0xfdd18, 0x88): idx=8 mapped at ffff6000
LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
CPU 0 (0x0000) enabledProcessor #0 Unknown CPU [15:2] APIC version 16

LAPIC (acpi_id[0x0002] id[0x2] enabled[1])
CPU 1 (0x0200) enabledProcessor #2 Unknown CPU [15:2] APIC version 16

LAPIC (acpi_id[0x0003] id[0x1] enabled[1])
CPU 2 (0x0100) enabledProcessor #1 Unknown CPU [15:2] APIC version 16

LAPIC (acpi_id[0x0004] id[0x3] enabled[1])
CPU 3 (0x0300) enabledProcessor #3 Unknown CPU [15:2] APIC version 16

IOAPIC (id[0x4] address[0xfec00000] global_irq_base[0x0])
IOAPIC (id[0x5] address[0xfec01000] global_irq_base[0x10])
IOAPIC (id[0x6] address[0xfec02000] global_irq_base[0x20])
LAPIC_NMI (acpi_id[0x0001] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0002] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0003] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0004] polarity[0x1] trigger[0x1] lint[0x1])
4 CPUs total
Local APIC address fee00000
__va_range(0xfdda0, 0x24): idx=8 mapped at ffff6000
__va_range(0xfdda0, 0x50): idx=8 mapped at ffff6000
ACPI table found: SPCR v1 [DELL   PE2650   0.1]
Enabling the CPU's according to the ACPI table
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: DELL     Product ID: PE 0121      APIC at: 0xFEE00000
I/O APIC #4 Version 17 at 0xFEC00000.
I/O APIC #5 Version 17 at 0xFEC01000.
I/O APIC #6 Version 17 at 0xFEC02000.
Processors: 4
Kernel command line: auto BOOT_IMAGE=linux ro root=802 BOOT_FILE=/boot/bzImage-2.4.18-3 max_scsi_luns=128
Initializing CPU#0
Detected 1794.244 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 3578.26 BogoMIPS
Memory: 2065004k/2097088k available (1578k kernel code, 31700k reserved, 469k data, 220k init, 1179584k highmem)
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode cache hash table entries: 131072 (order: 8, 1048576 bytes)
Mount-cache hash table entries: 32768 (order: 6, 262144 bytes)
Buffer cache hash table entries: 131072 (order: 7, 524288 bytes)
Page-cache hash table entries: 524288 (order: 9, 2097152 bytes)
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgo...@atnf.csiro.au)
mtrr: detected mtrr type: Intel
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU0: Intel(R) XEON(TM) CPU 1.80GHz stepping 04
per-CPU timeslice cutoff: 1462.89 usecs.
task migration cache decay timeout: 10 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000040
ESR value after enabling vector: 00000000
Booting processor 1/1 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 3578.26 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#1.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU1: Intel(R) XEON(TM) CPU 1.80GHz stepping 04
Booting processor 2/2 eip 2000
Initializing CPU#2
masked ExtINT on CPU#2
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 3578.26 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#2.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU2: Intel(R) XEON(TM) CPU 1.80GHz stepping 04
Booting processor 3/3 eip 2000
Initializing CPU#3
masked ExtINT on CPU#3
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 3578.26 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#3.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU3: Intel(R) XEON(TM) CPU 1.80GHz stepping 04
Total of 4 processors activated (14313.06 BogoMIPS).
cpu_sibling_map[0] = 1
cpu_sibling_map[1] = 0
cpu_sibling_map[2] = 3
cpu_sibling_map[3] = 2
ENABLING IO-APIC IRQs
Setting 4 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 4 ... ok.
Setting 5 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 5 ... ok.
Setting 6 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 6 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 4-0, 4-7, 4-10, 4-11, 4-13, 6-0, 6-1, 6-2, 6-3, 6-4, 6-5, 6-6, 6-7, 6-8, 6-9, 6-10, 6-11, 6-12, 6-13, 6-14, 6-15 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
..... (found pin 0) ...works.
number of MP IRQ sources: 35.
number of IO-APIC #4 registers: 16.
number of IO-APIC #5 registers: 16.
number of IO-APIC #6 registers: 16.
testing the IO APIC.......................

IO APIC #4......
.... register #00: 04000000
.......    : physical APIC id: 04
.... register #01: 000F0011
.......     : max redirection entries: 000F
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 04000000
.......     : arbitration: 04
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:  
 00 00F 0F  0    0    0   0   0    1    1    31
 01 00F 0F  0    0    0   0   0    1    1    39
 02 000 00  1    0    0   0   0    0    0    00
 03 00F 0F  0    0    0   0   0    1    1    41
 04 00F 0F  0    0    0   0   0    1    1    49
 05 00F 0F  1    1    0   1   0    1    1    51
 06 00F 0F  0    0    0   0   0    1    1    59
 07 000 00  1    0    0   0   0    0    0    00
 08 00F 0F  0    0    0   0   0    1    1    61
 09 00F 0F  0    0    0   0   0    1    1    69
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 00F 0F  0    0    0   0   0    1    1    71
 0d 000 00  1    0    0   0   0    0    0    00
 0e 00F 0F  0    0    0   0   0    1    1    79
 0f 00F 0F  0    0    0   0   0    1    1    81

IO APIC #5......
.... register #00: 05000000
.......    : physical APIC id: 05
.... register #01: 000F0011
.......     : max redirection entries: 000F
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 05000000
.......     : arbitration: 05
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:  
 00 00F 0F  1    1    0   1   0    1    1    89
 01 00F 0F  1    1    0   1   0    1    1    91
 02 00F 0F  1    1    0   1   0    1    1    99
 03 00F 0F  1    1    0   1   0    1    1    A1
 04 00F 0F  1    1    0   1   0    1    1    A9
 05 00F 0F  1    1    0   1   0    1    1    B1
 06 00F 0F  1    1    0   1   0    1    1    B9
 07 00F 0F  1    1    0   1   0    1    1    C1
 08 00F 0F  1    1    0   1   0    1    1    C9
 09 00F 0F  1    1    0   1   0    1    1    D1
 0a 00F 0F  1    1    0   1   0    1    1    D9
 0b 00F 0F  1    1    0   1   0    1    1    E1
 0c 00F 0F  1    1    0   1   0    1    1    E9
 0d 00F 0F  1    1    0   1   0    1    1    32
 0e 00F 0F  1    1    0   1   0    1    1    3A
 0f 00F 0F  1    1    0   1   0    1    1    42

IO APIC #6......
.... register #00: 06000000
.......    : physical APIC id: 06
.... register #01: 000F0011
.......     : max redirection entries: 000F
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 06000000
.......    
...

Post by Patrick Mansfield » Sun, 21 Jul 2002 06:10:14



> We've got a few Dell PowerEdge 2650 machines, and thought they would
> become nice fileservers, and we installed RedHat Linux 7.3 on them.
> So far, so good; after the installation, pretty much was downhill
> from there. With RedHat's 2.4.18-3 and 2.4.18-5 kernel we detect
> all disks connected through our QLogic FC 2200 HBA's, with
> vanilla 2.4.18 and 2.4.19-rc2, we detect nothing; and we've tried
> Qlogic's 6.0beta13 and 6.1beta2 drivers, as well as the driver
> that comes with redhat's release. We're currently running an almost
> identical configuration, only diff. is one HBA pr server, and
> the servers are 2550's and not 2650's.

> Ok, to sum up problems:

> With redhat kernels:
> * Disks found, _but_ after about 2-3 mins with heavy I/O on
>   FC HBA's the machine dies, and only thing working is cold boot

> With vanilla kernels:
> * Disks not found, so we don't know about I/O problems.

> Anyone have any ideas?

> Here's dmesg and lspci -vvvxx, if anything else is needed, please
> tell me, and I'll provide you with the info:

> test4:~# dmesg
> Processors: 4
> Kernel command line: auto BOOT_IMAGE=linux ro root=802 BOOT_FILE=/boot/bzImage-2.4.18-3 max_scsi_luns=128

So this is the dmesg for the redhat 2.4.18-3? You said above that
it found the disks, but, further down the qla driver inits and shows:


> qla2x00_set_info starts at address = f8836060
> qla2x00: Found  VID=1077 DID=2200 SSVID=1077 SSDID=2

> scsi(1): Allocated 4096 SRB(s)
> PCI: Setting latency timer of device 02:04.0 to 64
> scsi(1): Configure NVRAM parameters...
> scsi(1): 64 Bit PCI Addressing Enabled
> scsi(1): Verifying loaded RISC code...
> scsi(1): Verifying chip...
> scsi(1): Waiting for LIP to complete...
> scsi(1): Cable is unplugged...
> qla2x00: Found  VID=1077 DID=2200 SSVID=1077 SSDID=2

> scsi(2): Allocated 4096 SRB(s)
> PCI: Setting latency timer of device 02:05.0 to 64
> scsi(2): Configure NVRAM parameters...
> scsi(2): 64 Bit PCI Addressing Enabled
> scsi(2): Verifying loaded RISC code...
> scsi(2): Verifying chip...
> scsi(2): Waiting for LIP to complete...
> scsi(2): Cable is unplugged...
> scsi1 : QLogic QLA2200 PCI to Fibre Channel Host Adapter: bus 2 device 4 irq 16
>         Firmware version:  2.02.03, Driver version 6.1b2
> scsi2 : QLogic QLA2200 PCI to Fibre Channel Host Adapter: bus 2 device 5 irq 17
>         Firmware version:  2.02.03, Driver version 6.1b2

It complains about "Cable is unplugged", and does not find any drives.
So, it looks like your redhat kernel is not finding any drives.
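The same reading of the log can be scripted. A minimal sketch, assuming the qla2x00 messages look like the lines quoted above ("scsi(N): Cable is unplugged..." and "scsi(N): LOOP UP detected"); the function name qla_link_summary is hypothetical, and it only pattern-matches log text:

```shell
# qla_link_summary: read qla2x00 boot messages (dmesg or syslog) on stdin and
# print the last link state reported for each scsi(N) host, in order of first
# appearance. Only the two messages below are recognised.
qla_link_summary() {
    awk '
        match($0, /scsi\([0-9]+\): /) {
            host = substr($0, RSTART, RLENGTH - 2)   # "scsi(1): " -> "scsi(1)"
            msg  = substr($0, RSTART + RLENGTH)
            if (msg ~ /^Cable is unplugged/)    state = "cable unplugged"
            else if (msg ~ /^LOOP UP detected/) state = "loop up"
            else next                            # ignore other driver messages
            if (!(host in last)) order[++n] = host
            last[host] = state
        }
        END { for (i = 1; i <= n; i++) print order[i] ": " last[order[i]] }
    '
}
```

Usage: dmesg | qla_link_summary (or feed it the relevant part of the syslog). On the log quoted above it reports "cable unplugged" for both scsi(1) and scsi(2).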

You might want to check the hardware and connections. I've seen the qla
(I'm using some beta6 with 2.5.25) get confused as to the state of the
adapter and its connection.

If you turn on scsi logging (be careful: if syslog is running you can get
infinite logging) and insmod your driver, you might get some useful
information. I use:

        echo scsi log scan 5  >/proc/scsi/scsi

The above is safe to use with syslog running (since it logs the scsi
scanning that happens when the adapter comes up, but not all IO).
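A small wrapper for this logging step, as a sketch: it writes "scsi log scan LEVEL" to /proc/scsi/scsi in a single write, or to another file given as the second argument, which makes it easy to try out safely. The function name is hypothetical, and using level 0 to switch scan logging back off is an assumption:

```shell
# set_scsi_scan_logging LEVEL [FILE]
# Write "scsi log scan LEVEL" to FILE (default /proc/scsi/scsi) as a single
# write. Level 5 is the value used above; 0 is assumed to turn it off again.
set_scsi_scan_logging() {
    level=$1
    target=${2:-/proc/scsi/scsi}
    case $level in
        ''|*[!0-9]*)
            echo "set_scsi_scan_logging: LEVEL must be a number" >&2
            return 1 ;;
    esac
    echo "scsi log scan $level" > "$target"
}
```

Typical use: set_scsi_scan_logging 5, load the driver with insmod (the module name depends on which qla driver you built), read dmesg, then set_scsi_scan_logging 0.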

Also, cat /proc/scsi/scsi and /proc/scsi/qla*/[0-9] and see what they show.
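To compare what the SCSI mid-layer attached against what the HBA reports, it helps to reduce /proc/scsi/scsi to host/channel/id/lun tuples, the same numbering that add-single-device takes. A minimal sketch, assuming the Host:/Vendor:/Model:/Rev: layout that appears later in this thread; list_scsi_devices is a hypothetical name:

```shell
# list_scsi_devices: read /proc/scsi/scsi-style text on stdin and print one
# line per device: "host channel id lun vendor model".
list_scsi_devices() {
    awk '
        $1 == "Host:" {
            host = $2; sub(/^scsi/, "", host)    # "scsi1" -> "1"
            chan = $4 + 0; id = $6 + 0; lun = $8 + 0
        }
        $1 == "Vendor:" {
            # Vendor and model may contain spaces; split on the labels.
            vendor = ""; model = ""; field = "vendor"
            for (i = 2; i <= NF; i++) {
                if ($i == "Model:") { field = "model"; continue }
                if ($i == "Rev:") break
                if (field == "vendor") vendor = vendor (vendor == "" ? "" : " ") $i
                else                   model  = model  (model  == "" ? "" : " ") $i
            }
            print host, chan, id, lun, vendor, model
        }
    '
}
```

Usage: list_scsi_devices < /proc/scsi/scsi. On the test3 output posted later in the thread it prints "0 0 0 0 DELL PERCRAID Mirror" and "1 0 3 0 CNSi G7324".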

If the adapter appears to find devices, but scanning does not (likely
lun problems), try manually scanning for a device, for example:

        echo scsi add-single-device 1 0 0 0 >/proc/scsi/scsi

The numbers above are the host, channel, target ID, and LUN.
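If a target exposes more than LUN 0 (the syslog later in this thread shows scsi(2:0:0:1), i.e. LUN 1 on target 0), probing a small range of LUNs can be scripted. A sketch; add_scsi_luns and the DRY_RUN variable are hypothetical, and each command is sent as its own write on the assumption that the /proc interface takes one command per write:

```shell
# add_scsi_luns HOST CHANNEL ID FIRST_LUN LAST_LUN
# Ask the SCSI mid-layer to probe each LUN in [FIRST_LUN, LAST_LUN] on one
# target. With DRY_RUN set, the commands are printed instead of sent.
add_scsi_luns() {
    if [ $# -ne 5 ]; then
        echo "usage: add_scsi_luns HOST CHANNEL ID FIRST_LUN LAST_LUN" >&2
        return 1
    fi
    lun=$4
    while [ "$lun" -le "$5" ]; do
        cmd="scsi add-single-device $1 $2 $3 $lun"
        if [ -n "${DRY_RUN:-}" ]; then
            echo "$cmd"                      # just show what would be sent
        else
            echo "$cmd" > /proc/scsi/scsi    # one command per write (needs root)
        fi
        lun=$((lun + 1))
    done
}
```

Usage: DRY_RUN=1 add_scsi_luns 1 0 0 0 3 prints the four commands for host 1, channel 0, target 0, LUNs 0-3; run it without DRY_RUN (as root) to actually send them.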

-- Patrick Mansfield
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Post by Thomas Lang? » Sun, 21 Jul 2002 06:30:08


Patrick Mansfield:

> So this is the dmesg for the redhat 2.4.18-3? You said above that
> it found the disks, but, further down the qla driver inits and shows:

I ran dmesg on one of the machines currently without disks, but the
error shown below, "Cable is unplugged", is what we get with the standard
kernel.

This is from our syslog from when we booted up one machine with drives:
kern.info:Jul 19 16:53:45 gekko.stud.ntnu.no kernel: qla2x00_set_info starts at address = f8955060
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: qla2x00: Found  VID=1077 DID=2200 SSVID=1077 SSDID=2

kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(1): Allocated 4096 SRB(s)
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(1): LIP reset occurred
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(1): Waiting for LIP to complete...
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(1): LOOP UP detected
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi1: Topology - (F_Port), Host Loop address 0xffff
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi1: Host table full.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi1: Topology - (F_Port), Host Loop address 0xffff
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi1: Host table full.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla0-adapter-node=200000e08b02894b;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla0-adapter-port=210000e08b02894b;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla0-target-0=2000005013d01808;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla0-target-1=2000005013d016d3;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla0-target-2=2000005013d0158d;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla0-target-3=2000005013d0154b;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: qla2x00: Found  VID=1077 DID=2200 SSVID=1077 SSDID=2

kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(2): Allocated 4096 SRB(s)
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(2): LIP reset occurred
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(2): Waiting for LIP to complete...
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(2): LOOP UP detected
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi2: Topology - (F_Port), Host Loop address 0xffff
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi2: Host table full.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi2: Topology - (F_Port), Host Loop address 0xffff
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi2: Host table full.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(1): Waiting for LIP to complete...
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla1-adapter-node=200000e08b02854b;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla1-adapter-port=210000e08b02854b;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla1-target-0=2000005013d01808;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla1-target-1=2000005013d016d3;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla1-target-2=2000005013d0158d;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi-qla1-target-3=2000005013d0154b;
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi1 : QLogic QLA2200 PCI to Fibre Channel Host Adapter: bus 0 device 0 irq 16
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi2 : QLogic QLA2200 PCI to Fibre Channel Host Adapter: bus 0 device 0 irq 20
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi1: Topology - (F_Port), Host Loop address 0xffff
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(1:0:0:0): Enabled tagged queuing, queue depth 16.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(1:0:1:0): Enabled tagged queuing, queue depth 16.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(1:0:2:0): Enabled tagged queuing, queue depth 16.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(1:0:3:0): Enabled tagged queuing, queue depth 16.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(2:0:0:0): Enabled tagged queuing, queue depth 16.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(2:0:0:1): Enabled tagged queuing, queue depth 16.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(2:0:1:0): Enabled tagged queuing, queue depth 16.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(2:0:2:0): Enabled tagged queuing, queue depth 16.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel: scsi(2:0:3:0): Enabled tagged queuing, queue depth 16.
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel:  /dev/scsi/host1/bus0/target0/lun0: p1
kern.info:Jul 19 16:53:54 gekko.stud.ntnu.no kernel:  /dev/scsi/host2/bus0/target0/lun1: p1

(We've got an in-production system that we plan to migrate, so
we need it running until we find a viable solution to this problem.)

> You might want to check the hardware and connections. I've seen the qla
> (I'm using some beta6 with 2.5.25) get confused as to the state of the
> adapter and its connection.

This is actually the first time I've ever seen this, and we'll have been
using qla for a year this August. We only started seeing this odd
behaviour with qla and the vanilla kernel on Dell's 2650
(dual P4 Xeon).

> If you turn on scsi logging (be careful, if syslog is running you can get
> infinite logging), and insmod your driver, you might get some useful
> information, I use:
>    echo scsi log scan 5  >/proc/scsi/scsi
> The above is safe to use with syslog running (since it logs the scsi
> scanning that happens when the adapter comes up, but not all IO).
> Also, cat /proc/scsi/scsi and /proc/scsi/qla*/[0-9] and see what they show.

I'll try this when I get back to work over the weekend.

> If the adapter appears to find devices, but scanning does not (likely
> lun problems), try manually scanning for a device, for example:
>    echo scsi add-single-device 1 0 0 0 >/proc/scsi/scsi
> Where the numbering above is host, channel, target-id, and then lun.

We did try this, though I'm unsure which of the kernels etc. we tried it
on; it was getting late and we had to get the other system back online
for the weekend. I'll get back to you on this on Monday.

--
Thomas

Post by Fabio Massimo Di Nitto » Sun, 21 Jul 2002 06:30:10




>>qla2x00_set_info starts at address = f8836060
>>qla2x00: Found  VID=1077 DID=2200 SSVID=1077 SSDID=2

>>scsi(1): Allocated 4096 SRB(s)
>>PCI: Setting latency timer of device 02:04.0 to 64
>>scsi(1): Configure NVRAM parameters...
>>scsi(1): 64 Bit PCI Addressing Enabled
>>scsi(1): Verifying loaded RISC code...
>>scsi(1): Verifying chip...
>>scsi(1): Waiting for LIP to complete...
>>scsi(1): Cable is unplugged...
>>qla2x00: Found  VID=1077 DID=2200 SSVID=1077 SSDID=2

>>scsi(2): Allocated 4096 SRB(s)
>>PCI: Setting latency timer of device 02:05.0 to 64
>>scsi(2): Configure NVRAM parameters...
>>scsi(2): 64 Bit PCI Addressing Enabled
>>scsi(2): Verifying loaded RISC code...
>>scsi(2): Verifying chip...
>>scsi(2): Waiting for LIP to complete...
>>scsi(2): Cable is unplugged...

I have an HSG80 connected on the other side, and I got this problem with
the beta6 drivers from QLogic.

The only way I got it working was with the kernel driver shipped with
RH 7.3, which has been modified to support the HSG80 (according to the
changelog, supported only by the beta6 series).

Fabio

Post by Thomas Lang? » Sun, 21 Jul 2002 06:50:07


Fabio Massimo Di Nitto:

> I have an HSG80 connected on the other side and I got this problem with
> the beta6 drivers from qlogic.
> The only way I made it working was using the kernel driver shipped with
> rh7.3 that has been modified to support the HSG80 (according to the
> changelog supported only by the beta6 series).

To be honest, I'd find it odd if it's the driver, because we've been
running everything from the 4.x, 5.x and 6.x series on our current
production system. It also runs a vanilla kernel (currently 2.4.18),
although with the qla driver patched in.

So, the only differences between the new and old setups are:
* Dell 2550 in old vs 2650 in new (2xP3 vs 2xP4 Xeon)
* Old servers mount one disk each (and have only one HBA), but
  the new servers are supposed to have two HBAs, and have one
  disk on each HBA.

I don't see any good reason why this should result in no disks found
whatsoever.

--
Thomas

Post by Austin Gonyou » Sun, 21 Jul 2002 07:40:11


Two suggestions.

Use SGI's XFS installer and move to XFS on those boxen. You will not be
unhappy.

Second suggestion: once you've installed a RH 7.3 box with XFS,
go get 2.4.18-aa, recompile it and use it.
You will need to change scsi_scan.c to add
BLIST_LARGELUN support for whichever devices you're using.

(Or use 2.4.19-rc1-aa2, since that might be done already, assuming you're
using PV250s and up.)

Do that, and you won't have any more problems on those boxen. Feel free
to email me for more details.



Post by Thomas Lang? » Sun, 21 Jul 2002 08:00:12


Austin Gonyou:

> Use SGI's XFS installer and move to XFS on those boxen. You will not be
> unhappy.

We're using reiserfs, and converting to XFS is out of the question,
sorry.

> Second suggestion, once you've installed a RH 7.3 XFS installed box,
> then go get 2.4.18-aa and recompile that and use it.
> You will need to make a change to the scsi_scan.c file to add
> BLIST_LARGELUN support to which ever devices you're using.

Why?  The vanilla kernel works fine with what we have in production,
but not with the new P4 Dual Xeon.

> (or use the 2.4.19-rc1-aa2 since it might be done already, assuming your
> using PV250's and up)

PV?  No, we're not using Dell's storage solutions, only their servers.

--
Thomas

Post by Thomas Lang? » Mon, 22 Jul 2002 11:50:05


Patrick Mansfield:

> Also, cat /proc/scsi/scsi and /proc/scsi/qla*/[0-9] and see what they show.

test3:~# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: DELL     Model: PERCRAID Mirror  Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: CNSi     Model: G7324            Rev: L400
  Type:   Direct-Access                    ANSI SCSI revision: 03

test3:~# cat /proc/scsi/qla2200/1
QLogic PCI to Fibre Channel Host Adapter for ISP2100/ISP2200/ISP2200A:
        Firmware version:  2.01.35, Driver version 5.31.RH1
Entry address = f8955060
HBA: QLA2200 , Serial# B50409
Request Queue = 0x36300000, Response Queue = 0x36ddc000
Request Queue count= 512, Response Queue count= 64
Number of pending commands = 0x0
Number of queued commands = 0x0
Number of free request entries = 494
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 0
Number of retries for empty slots = 0
Number of reqs in retry_q = 0
Number of reqs in done_q = 0
Number of pending in_q reqs = 0
Host adapter: state = UP, flags= 0x20a0837

SCSI Device Information:
scsi-qla0-adapter-node=200000e08b02894b;
scsi-qla0-adapter-port=210000e08b02894b;
scsi-qla0-port-0=1000005013d01808:2000005013d01808;
scsi-qla0-port-1=1000005013d016d3:2000005013d016d3;
scsi-qla0-port-2=1000005013d0158d:2000005013d0158d;
scsi-qla0-port-3=1000005013d0154b:2000005013d0154b;

SCSI LUN Information:
(Id:Lun)
( 3: 0): Total reqs 12, Pending 0, Queued 0, full 0, flags 0x0, 0:0:82,

test3:~# cat /proc/scsi/qla2200/2
QLogic PCI to Fibre Channel Host Adapter for ISP2100/ISP2200/ISP2200A:
        Firmware version:  2.01.35, Driver version 5.31.RH1
Entry address = f8955060
HBA: QLA2200 , Serial# B50405
Request Queue = 0x362e0000, Response Queue = 0x36d86000
Request Queue count= 512, Response Queue count= 64
Number of pending commands = 0x0
Number of queued commands = 0x0
Number of free request entries = 508
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 0
Number of retries for empty slots = 0
Number of reqs in retry_q = 0
Number of reqs in done_q = 0
Number of pending in_q reqs = 0
Host adapter: state = UP, flags= 0x20a0837

SCSI Device Information:
scsi-qla1-adapter-node=200000e08b02854b;
scsi-qla1-adapter-port=210000e08b02854b;
scsi-qla1-port-0=1000005013d01808:2000005013d01808;
scsi-qla1-port-1=1000005013d016d3:2000005013d016d3;
scsi-qla1-port-2=1000005013d0158d:2000005013d0158d;
scsi-qla1-port-3=1000005013d0154b:2000005013d0154b;

SCSI LUN Information:
(Id:Lun)

Both controllers really should find a disk now, but they don't.
Swapping to a newer version of the driver (6.0beta or 6.1beta)
makes it find one drive on each HBA; I'll do that tomorrow,
along with the logging.
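The two /proc files above can be summarized side by side: each adapter lists four fabric ports, but only adapter 1 has a LUN entry. A minimal sketch that counts both; qla_ports_vs_luns is a hypothetical name, and it assumes the "scsi-qlaN-port-M=" and "SCSI LUN Information:" layout of the 5.31.RH1 output shown here (other driver versions may format this differently):

```shell
# qla_ports_vs_luns: read one /proc/scsi/qla2200/N file on stdin and print how
# many fabric ports the adapter lists and how many LUN entries it has.
qla_ports_vs_luns() {
    awk '
        /^scsi-qla[0-9]+-port-[0-9]+=/ { ports++ }
        /^SCSI LUN Information:/       { in_luns = 1; next }
        in_luns && /^\( *[0-9]+: *[0-9]+\)/ { luns++ }   # e.g. "( 3: 0): ..."
        END { printf "ports=%d luns=%d\n", ports, luns }
    '
}
```

Usage: qla_ports_vs_luns < /proc/scsi/qla2200/1. For the two adapters above it gives ports=4 luns=1 and ports=4 luns=0, which makes the missing disk on the second HBA easy to spot.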

--
Thomas

Post by Bill Davidse » Fri, 26 Jul 2002 21:50:20



> Why?  The vanilla kernel works fine with what we have in production,
> but not with the new P4 Dual Xeon.

I know it's not likely to be related in any way, but did you try booting
with hyperthreading off? I believe I saw four CPUs in the stuff you
posted, but it's gone now.
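On the hyperthreading question: the dmesg in the first message does show four logical CPUs on two physical packages (Physical Processor ID 0, 0, 3, 3, and cpu_sibling_map pairing CPU0 with CPU1 and CPU2 with CPU3). A minimal sketch that pulls the sibling pairs out of dmesg; ht_siblings is a hypothetical name, and it assumes 2.4-style "cpu_sibling_map[A] = B" lines without timestamps:

```shell
# ht_siblings: read dmesg text on stdin and print each pair of logical CPUs
# reported as hyperthreading siblings by "cpu_sibling_map[A] = B" lines.
# Returns 0 if at least one pair was found, 1 otherwise.
ht_siblings() {
    awk '
        /^cpu_sibling_map\[[0-9]+\] = [0-9]+$/ {
            a = $1; gsub(/[^0-9]/, "", a)   # "cpu_sibling_map[2]" -> "2"
            a += 0; b = $3 + 0
            lo = (a < b) ? a : b            # order the pair so that each
            hi = (a < b) ? b : a            # pair is printed only once
            key = lo "-" hi
            if (!(key in seen)) {
                seen[key] = 1; found = 1
                print "CPU" lo " and CPU" hi " are siblings"
            }
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage: dmesg | ht_siblings. On the dmesg from the first message it reports CPU0/CPU1 and CPU2/CPU3; it returns non-zero when it finds no sibling lines.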

--

  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
