Wrong IRQ distribution on Dual Xeon SMP system (2.4.17).

Post by Kosta Porotchki » Wed, 08 May 2002 03:20:09



Problem:  Wrong interrupt distribution between processors, as a result of
I/O APIC configuration errors.

Platform:  SuperMicro P4DP6 motherboard (Intel E7500 chipset) with two Intel
Xeon processors (512 KB L2 cache @ 2.2 GHz), 1 GB PC2100 RAM. Phoenix BIOS
1.1a (latest available for this board). Hyper-Threading enabled, ACPI
enabled.

Kernel:    2.4.17 Sherman-x330 (MontaVista) - SMP enabled.

Problem Description:

After booting, /proc/interrupts reads as follows:

           CPU0       CPU1       CPU2       CPU3
  0:      23794          0          0          0    IO-APIC-edge  timer
  1:       1900          0          0          0    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  4:          0          0          0          0    IO-APIC-edge  KGDB-stub
  9:          0          0          0          0    IO-APIC-edge  acpi
 17:       4718          0          0          0   IO-APIC-level  eth0
NMI:      23672      23672      23672      23672
LOC:      23649      23648      23648      23646
ERR:          0
MIS:          0
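
To put a number on the imbalance, the listing above can be summed per CPU. The following is only a minimal sketch: the function name per_cpu_totals and the SAMPLE string are made up for illustration, and the parser assumes the column layout shown in this post (a header row of CPU names, then one row per IRQ).

```python
def per_cpu_totals(text):
    """Sum the numbered IRQ rows of /proc/interrupts-style text per CPU."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()                 # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        if not label.strip().isdigit():
            continue                        # skip NMI, LOC, ERR, MIS rows
        counts = rest.split()[:len(cpus)]
        for i, count in enumerate(counts):
            if count.isdigit():
                totals[i] += int(count)
    return dict(zip(cpus, totals))


# Copied from the listing in this post.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:      23794          0          0          0    IO-APIC-edge  timer
  1:       1900          0          0          0    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  4:          0          0          0          0    IO-APIC-edge  KGDB-stub
  9:          0          0          0          0    IO-APIC-edge  acpi
 17:       4718          0          0          0   IO-APIC-level  eth0
NMI:      23672      23672      23672      23672
LOC:      23649      23648      23648      23646
ERR:          0
MIS:          0
"""

if __name__ == "__main__":
    for cpu, total in per_cpu_totals(SAMPLE).items():
        print(cpu, total)
```

On the sample, every device interrupt is counted on CPU0 and none on the other three logical processors, while the per-CPU NMI and LOC rows (deliberately skipped by the sketch) are evenly spread.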
****************************************************************************
Some entries from dmesg output:
****************************************************************************
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000d8000 - 00000000000e0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fef0000 (usable)
 BIOS-e820: 000000003fef0000 - 000000003fefc000 (ACPI data)
 BIOS-e820: 000000003fefc000 - 000000003ff00000 (ACPI NVS)
 BIOS-e820: 000000003ff00000 - 000000003ff80000 (usable)
 BIOS-e820: 000000003ff80000 - 0000000040000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved)
 BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
found SMP MP-table at 000f6710
hm, page 000f6000 reserved twice.
hm, page 000f7000 reserved twice.
hm, page 0009f000 reserved twice.
hm, page 000a0000 reserved twice.
On node 0 totalpages: 262016
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 32640 pages.
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID:   Product ID: Kings Canyon APIC at: 0xFEE00000
Processor #0 Unknown CPU [15:2] APIC version 20
Processor #6 Unknown CPU [15:2] APIC version 20
Processor #1 Unknown CPU [15:2] APIC version 20
Processor #7 Unknown CPU [15:2] APIC version 20
I/O APIC #2 Version 32 at 0xFEC00000.
I/O APIC #3 Version 32 at 0xFEC80000.
I/O APIC #4 Version 32 at 0xFEC80400.
I/O APIC #5 Version 32 at 0xFEC81000.
I/O APIC #8 Version 32 at 0xFEC81400.
Processors: 4

**********************************************************************
Each logical processor has its own local APIC (LAPIC).
This board has two Intel PCI/PCI-X hubs (P64H2), each of which has two I/O
APICs (one for the primary and one for the secondary bus).
The following is taken from the Intel E7500 specification (paragraph 4.1.4 -
I/O APIC Memory Space):
Two I/O APICs (I/OAPIC1 (HI_B)) are located in memory region 0xFEC80000 -
0xFEC80FFF, and the next two (I/OAPIC2 (HI_C)) in memory region 0xFEC81000 -
0xFEC81FFF.
The fifth I/O APIC comes from the I/O Controller Hub (ICH3). It is I/O APIC #2
in the above printout, or I/O APIC0 (HI_A) according to the Intel E7500 chipset
Memory Range Address Map.
The addresses reported by the Linux setup procedure fall within the ranges
noted above. I tried forcing the kernel to use the ACPI tables instead of the
MP table to check whether the MP table is correct. The addresses of the I/O
APICs in the ACPI table were exactly the same as in the MP table. To me this
indicates that there is no error in the MP table (am I wrong?).
As part of the kernel initialization process, all the I/O APIC IDs were updated:
**********************************************************************
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
Setting 3 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 3 ... ok.
Setting 4 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 4 ... ok.
Setting 5 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 5 ... ok.
Setting 8 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 8 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-5, 2-10, 2-11, 2-20, 2-21, 2-22, 2-23, 3-0,
3-1, 3-2, 3-3, 3-4, 3-5, 3-6, 3-7, 3-8, 3-9, 3-10, 3-11, 3-12, 3-13, 3-14,
3-15, 3-16, 3-17, 3-18, 3-19, 3-20, 3-21, 3-22, 3-23, 4-0, 4-1, 4-2, 4-3,
4-4, 4-5, 4-6, 4-7, 4-8, 4-9, 4-10, 4-11, 4-12, 4-13, 4-14, 4-15, 4-16,
4-17, 4-18, 4-19, 4-20, 4-21, 4-22, 4-23, 5-0, 5-1, 5-2, 5-3, 5-4, 5-5, 5-6,
5-7, 5-8, 5-9, 5-10, 5-11, 5-12, 5-13, 5-14, 5-15, 5-16, 5-17, 5-18, 5-19,
5-20, 5-21, 5-22, 5-23, 8-0, 8-1, 8-2, 8-3, 8-4, 8-5, 8-6, 8-7, 8-8, 8-9,
8-10, 8-11, 8-12, 8-13, 8-14, 8-15, 8-16, 8-17, 8-18, 8-19, 8-20, 8-21,
8-22, 8-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
activating NMI Watchdog ... done.
testing NMI watchdog ... OK.
number of MP IRQ sources: 19.
number of IO-APIC #2 registers: 24.
number of IO-APIC #3 registers: 24.
number of IO-APIC #4 registers: 24.
number of IO-APIC #5 registers: 24.
number of IO-APIC #8 registers: 24.
*************************************************************************
The problems start to appear during the I/O APIC tests. Register #00 of the
I/O Controller Hub I/O APIC reads 0x02008000, which is wrong according to the
I/O APIC specification (it should be 0x02000000 in my case) and causes the
kernel to print a warning message.
Both I/O APICs of the first P64H2 report the same physical ID. The same
is true of the second P64H2:
*************************************************************************
IO APIC #2......
.... register #00: 02008000
.......    : physical APIC id: 02
 WARNING: unexpected IO-APIC, please mail
          to linux-...@vger.kernel.org
.... register #01: 00178020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0020
.... register #02: 00000000
.......     : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 000 00  1    0    0   0   0    0    0    00
 01 00F 0F  0    0    0   0   0    1    1    39
 02 00F 0F  0    0    0   0   0    1    1    31
 03 00F 0F  0    0    0   0   0    1    1    41
 04 00F 0F  0    0    0   0   0    1    1    49
 05 000 00  1    0    0   0   0    0    0    00
 06 00F 0F  0    0    0   0   0    1    1    51
 07 00F 0F  0    0    0   0   0    1    1    59
 08 00F 0F  0    0    0   0   0    1    1    61
 09 00F 0F  0    0    0   0   0    1    1    69
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 00F 0F  0    0    0   0   0    1    1    71
 0d 00F 0F  0    0    0   0   0    1    1    79
 0e 00F 0F  0    0    0   0   0    1    1    81
 0f 00F 0F  0    0    0   0   0    1    1    89
 10 00F 0F  1    1    0   1   0    1    1    91
 11 00F 0F  1    1    0   1   0    1    1    99
 12 00F 0F  1    1    0   1   0    1    1    A1
 13 00F 0F  1    1    0   1   0    1    1    A9
 14 000 00  1    0    0   0   0    0    0    00
 15 000 00  1    0    0   0   0    0    0    00
 16 000 00  1    0    0   0   0    0    0    00
 17 000 00  1    0    0   0   0    0    0    00
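
The register values in this dump can be decoded by hand with a few shifts and masks. The sketch below assumes the field layout implied by the kernel's own decoding above (register #00: APIC ID in bits 24-27 with the remaining bits reserved; register #01: version in bits 0-7, PRQ in bit 15, max redirection entries in bits 16-23). The function names are hypothetical, and the idea that the nonzero reserved bits are what triggers the warning is my reading, not something the log states.

```python
def decode_reg00(value):
    """Split I/O APIC register #00 into the ID field and the remaining bits."""
    apic_id = (value >> 24) & 0xF
    reserved = value & 0xF0FFFFFF      # everything outside bits 24-27
    return {"id": apic_id, "reserved": reserved}


def decode_reg01(value):
    """Split I/O APIC register #01 into version, PRQ and max entries."""
    return {
        "version": value & 0xFF,
        "prq": (value >> 15) & 1,
        "max_redir": (value >> 16) & 0xFF,
    }


if __name__ == "__main__":
    # Values copied from the dump in this post.
    for name, reg in (("#2", 0x02008000), ("#3", 0x04000000), ("#4", 0x04000000)):
        fields = decode_reg00(reg)
        print("IO APIC %s: id=%02X reserved=%08X"
              % (name, fields["id"], fields["reserved"]))
    print(decode_reg01(0x00178020))
```

For 0x02008000 the ID field is 2 but bit 15 of the supposedly reserved area is set. For I/O APICs #3 and #4 the ID field reads 4 in both cases, even though the kernel reported setting IDs 3 and 4.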

IO APIC #3......
.... register #00: 04000000
.......    : physical APIC id: 04
.... register #01: 00178020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0020
.... register #02: 04000000
.......     : arbitration: 04
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 000 00  1    0    0   0   0    0    0    00
 01 000 00  1    0    0   0   0    0    0    00
 02 000 00  1    0    0   0   0    0    0    00
 03 000 00  1    0    0   0   0    0    0    00
 04 000 00  1    0    0   0   0    0    0    00
 05 000 00  1    0    0   0   0    0    0    00
 06 000 00  1    0    0   0   0    0    0    00
 07 000 00  1    0    0   0   0    0    0    00
 08 000 00  1    0    0   0   0    0    0    00
 09 000 00  1    0    0   0   0    0    0    00
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 000 00  1    0    0   0   0    0    0    00
 0d 000 00  1    0    0   0   0    0    0    00
 0e 000 00  1    0    0   0   0    0    0    00
 0f 000 00  1    0    0   0   0    0    0    00
 10 000 00  1    0    0   0   0    0    0    00
 11 000 00  1    0    0   0   0    0    0    00
 12 000 00  1    0    0   0   0    0    0    00
 13 000 00  1    0    0   0   0    0    0    00
 14 000 00  1    0    0   0   0    0    0    00
 15 000 00  1    0    0   0   0    0    0    00
 16 000 00  1    0    0   0   0    0    0    00
 17 000 00  1    0    0   0   0    0    0    00

IO APIC #4......
.... register #00: 04000000
.......    : physical APIC id: 04
.... register #01: 00178020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0020
.... register #02: 04000000
.......     : arbitration: 04
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 000 00  1    0    0   0   0    0    0    00
 01 000 00  1    0    0   0   0    0    0    00
 02 000 00  1    0    0   0   0    0    0    00
 03 000 00  1    0    0   0   0    0    0    00
 04 000 00  1    0    0   0   0    0    0    00
 05 000 00  1    0    0   0   0    0    0    00
 06 000 00  1    0    0   0   0    0    0    00
 07 000 00  1    0    0   0   0    0    0    00
 08 000 00  1    0    0   0   0    0    0    00
 09 000 00  1    0    0   0   0    0    0    00
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0  
...


Post by Martin J. Blig » Wed, 08 May 2002 03:40:07


> Problem:  Wrong interrupts distribution between processors, as a result of
> IO APIC configuration errors

> Platform:  SuperMicro P4DP6 motherboard (Intel E7500 chipset) with two Intel

> 1.1a (latest available for this board). Hyper threading enabled, ACPI
> enabled.

> Kernel:    2.4.17 Sherman-x330 (MontaVista) - SMP enabled.

> Problem Description:

> After booting the /proc/interrupts files reads as follows:

>            CPU0       CPU1       CPU2       CPU3
>   0:      23794          0          0          0    IO-APIC-edge  timer
>   1:       1900          0          0          0    IO-APIC-edge  keyboard
>   2:          0          0          0          0          XT-PIC  cascade
>   4:          0          0          0          0    IO-APIC-edge  KGDB-stub
>   9:          0          0          0          0    IO-APIC-edge  acpi
>  17:       4718          0          0          0   IO-APIC-level  eth0
> NMI:      23672      23672      23672      23672
> LOC:      23649      23648      23648      23646
> ERR:          0
> MIS:          0

P4-based systems don't round-robin interrupts like P3-based systems do.
Check back in the archives and find one of the interrupt routing patches.
Dave Olien had one and Ingo Molnar had one; they use slightly different
approaches.

Martin.
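
For readers who only want a static spread, here is a rough sketch of a different and much simpler technique than the patches mentioned above: pinning each IRQ to one CPU by hand. It assumes the kernel exposes /proc/irq/<n>/smp_affinity as a hexadecimal CPU bitmask; the function name round_robin_masks is hypothetical, and the code only computes the masks without writing anything.

```python
def round_robin_masks(irqs, n_cpus):
    """Assign each IRQ to a single CPU in turn; return hex bitmask strings."""
    return {
        irq: format(1 << (index % n_cpus), "x")
        for index, irq in enumerate(irqs)
    }


if __name__ == "__main__":
    # IRQ numbers taken from the /proc/interrupts listing in this thread.
    masks = round_robin_masks([0, 1, 4, 9, 17], n_cpus=4)
    for irq, mask in masks.items():
        # Assumed interface: echo <mask> > /proc/irq/<irq>/smp_affinity
        print("echo %s > /proc/irq/%d/smp_affinity" % (mask, irq))
```

This only balances by IRQ number, not by load, so a single busy device still lands on a single CPU.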
