2.5.68 Fix IO_APIC IRQ assignment bug

Post by Chuck Ebber » Tue, 22 Apr 2003 00:20:10



 Looks like the fix for the "ran out of interrupt sources" panic
has a problem.  It will eventually assign a device the same IRQ
number as the first system vector, i.e. the local APIC timer.
I think this will fix it:

--- a/arch/i386/kernel/io_apic.c

        if (current_vector == SYSCALL_VECTOR)
                goto next;

-       if (current_vector > FIRST_SYSTEM_VECTOR) {
+       if (current_vector >= FIRST_SYSTEM_VECTOR) {
                offset = (offset + 1) & 7;
                current_vector = FIRST_DEVICE_VECTOR + offset;
        }

 I found this while trying to forward-port my .66 patch to make
the redirect table look like this:

 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:  
 00 001 01  0    0    0   0   0    1    1    E7     <== timer at level E
 01 001 01  0    0    0   0   0    1    1    30     <== start at 30, not 31
 02 000 00  1    0    0   0   0    0    0    00
 03 001 01  0    0    0   0   0    1    1    38
 04 001 01  0    0    0   0   0    1    1    40
 05 001 01  0    0    0   0   0    1    1    48
 06 001 01  0    0    0   0   0    1    1    50
 07 001 01  0    0    0   0   0    1    1    58
 08 001 01  0    0    0   0   0    1    1    60
 09 001 01  0    0    0   0   0    1    1    68
 0a 001 01  0    0    0   0   0    1    1    70
 0b 001 01  0    0    0   0   0    1    1    78
 0c 001 01  0    0    0   0   0    1    1    88     <== only one device at 8
 0d 001 01  0    0    0   0   0    1    1    90
 0e 001 01  0    0    0   0   0    1    1    98
 0f 000 00  1    0    0   0   0    0    0    00
 10 001 01  1    1    0   1   0    1    1    A0
 11 001 01  1    1    0   1   0    1    1    A8
 12 001 01  1    1    0   1   0    1    1    B0
 13 001 01  1    1    0   1   0    1    1    B8
 14 001 01  0    0    0   0   0    1    1    C0
 15 000 00  1    0    0   0   0    0    0    00
 16 000 00  1    0    0   0   0    0    0    00
 17 000 00  1    0    0   0   0    0    0    00

------
 Chuck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Post by Linus Torval » Tue, 22 Apr 2003 01:10:04




> Looks like the fix for the "ran out of interrupt sources" panic
>has a problem.  It will eventually assign a device the same IRQ
>number as the first system vector, i.e. the local APIC timer.

Good call.

Although I suspect you need about a million interrupt sources to hit
this, since FIRST_SYSTEM_VECTOR is something like 0xef, and thus you can
hit it only when "offset" has already been incremented seven times
(which implies that we've walked the whole vector space quite a few
times by then).

Did you actually see this on hardware?

Anyway, applied as obvious.

                Linus
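
One rough way to check the arithmetic above is to replay the allocation
loop in user space. The sketch below is not the kernel's
assign_irq_vector(); it only mirrors the lines visible in the patch, and it
assumes FIRST_DEVICE_VECTOR = 0x31, SYSCALL_VECTOR = 0x80 and
FIRST_SYSTEM_VECTOR = 0xef (the values mentioned in this thread, not read
from the headers). The names vec_state, next_vector and
allocations_until_collision are made up for the example.

```c
#include <assert.h>

/* Assumed values, taken from the discussion rather than from the headers. */
#define FIRST_DEVICE_VECTOR 0x31
#define SYSCALL_VECTOR      0x80
#define FIRST_SYSTEM_VECTOR 0xef

/* Hypothetical stand-in for the two static variables in the allocator. */
struct vec_state {
    int current_vector;
    int offset;
};

/*
 * One allocation step.
 * use_fix == 0 compares with '>'  (the code before the patch),
 * use_fix != 0 compares with '>=' (the code after the patch).
 */
static int next_vector(struct vec_state *s, int use_fix)
{
    int wrap;

    assert(s->offset >= 0 && s->offset < 8);

    /* Step by 8 and skip the system call vector, like the "goto next". */
    do {
        s->current_vector += 8;
    } while (s->current_vector == SYSCALL_VECTOR);

    wrap = use_fix ? (s->current_vector >= FIRST_SYSTEM_VECTOR)
                   : (s->current_vector >  FIRST_SYSTEM_VECTOR);
    if (wrap) {
        s->offset = (s->offset + 1) & 7;
        s->current_vector = FIRST_DEVICE_VECTOR + s->offset;
    }
    return s->current_vector;
}

/*
 * Number of allocations until a device is handed FIRST_SYSTEM_VECTOR,
 * or -1 if that does not happen within 'limit' allocations.
 */
static int allocations_until_collision(int use_fix, int limit)
{
    struct vec_state s = { FIRST_DEVICE_VECTOR, 0 };
    int n;

    for (n = 1; n <= limit; n++) {
        if (next_vector(&s, use_fix) == FIRST_SYSTEM_VECTOR)
            return n;
    }
    return -1;
}
```

With these assumed constants, allocations_until_collision(0, 10000)
returns 167 and allocations_until_collision(1, 10000) returns -1. The
model ignores NR_IRQS and the number of I/O APIC pins actually present, so
it says nothing about whether real hardware gets that far.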

Post by Linus Torval » Tue, 22 Apr 2003 01:10:07




> I found this while trying to forward-port my .66 patch to make
>the redirect table look like this:

Btw, why would you _want_ your redirect table to look like that?

> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
> 00 001 01  0    0    0   0   0    1    1    E7     <== timer at level E
> 01 001 01  0    0    0   0   0    1    1    30     <== start at 30, not 31

Starting at 31 is better, because..

> 02 000 00  1    0    0   0   0    0    0    00
> 03 001 01  0    0    0   0   0    1    1    38
> 04 001 01  0    0    0   0   0    1    1    40
> 05 001 01  0    0    0   0   0    1    1    48
> 06 001 01  0    0    0   0   0    1    1    50
> 07 001 01  0    0    0   0   0    1    1    58
> 08 001 01  0    0    0   0   0    1    1    60
> 09 001 01  0    0    0   0   0    1    1    68
> 0a 001 01  0    0    0   0   0    1    1    70
> 0b 001 01  0    0    0   0   0    1    1    78
> 0c 001 01  0    0    0   0   0    1    1    88     <== only one device at 8

Then we'd have devices at 81 and 89 (two per block of 16 is ok, but 80
isn't ok because we use that for system calls).

And having two devices in the 8x series means that it takes more irq
sources to overflow and start to re-use the 3x block - and we want to
delay re-using the 3x block as long as possible due to the silly "only
one pending irq per block" rule that some Intel APICs have.

So that's why FIRST_DEVICE_VECTOR is normally 31 - because it gives a
nicer pattern for the first round of allocations, and that's the common
case (machines with hundreds of APIC irq sources are still rare, the
common case is a single IOAPIC with 24 or so pins).

                Linus
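
The difference between the two starting points can be counted directly.
The sketch below assumes the same step of 8, SYSCALL_VECTOR = 0x80 and
FIRST_SYSTEM_VECTOR = 0xef for both cases, and treats "block" as the upper
four bits of the vector. The function names are hypothetical and this is
only an illustration of the first pass, not kernel code.

```c
#include <assert.h>

/* Assumed values, taken from the discussion rather than from the headers. */
#define SYSCALL_VECTOR      0x80
#define FIRST_SYSTEM_VECTOR 0xef

/*
 * Count the first-pass device vectors that land in one block of 16
 * (block == vector >> 4), stepping by 8 from 'first' and skipping the
 * system call vector.
 */
static int first_pass_in_block(int first, int block)
{
    int v, n = 0;

    assert(first >= 0 && first <= 0xff);

    for (v = first; v < FIRST_SYSTEM_VECTOR; v += 8) {
        if (v == SYSCALL_VECTOR)
            continue;               /* reserved for system calls */
        if ((v >> 4) == block)
            n++;
    }
    return n;
}

/* Total number of device vectors handed out before the first wrap. */
static int first_pass_total(int first)
{
    int block, n = 0;

    for (block = 0x0; block <= 0xf; block++)
        n += first_pass_in_block(first, block);
    return n;
}
```

Under these assumptions first_pass_in_block(0x31, 0x8) is 2 (vectors 81
and 89) while first_pass_in_block(0x30, 0x8) is 1 (only 88), and the first
pass yields 24 vectors when starting at 0x31 versus 23 when starting at
0x30.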

Post by Zwane Mwaikamb » Tue, 22 Apr 2003 01:20:08



> Good call.

> Although I suspect you need about a million interrupt sources to hit
> this, since FIRST_SYSTEM_VECTOR is something like 0xef, and thus you can
> hit it only when "offset" has already been incremented seven times
> (which implies that we've walked the whole vector space quite a few
> times by then).

> Did you actually see this on hardware?

Yes, we need to bail out in assign_irq_vector when we wrap around,
otherwise we cause collisions when programming the IOAPIC. And we also
need to avoid overrunning NR_IRQS structures in setup_IO_APIC_irqs.

        Zwane


Post by Chuck Ebber » Tue, 22 Apr 2003 10:00:12


> Yes, we need to bail out in assign_irq_vector when we wrap around,
> otherwise we cause collisions when programming the IOAPIC. And we also
> need to avoid overrunning NR_IRQS structures in setup_IO_APIC_irqs.

  Do you mean the panic on running out of sources should be put
back in?

------
 Chuck

Post by Zwane Mwaikamb » Tue, 22 Apr 2003 19:30:11



> BTW, I think a better way is to switch from IRQ-base to vector-base.
> We are working on PCI MSI (Message Signaled Interrupts) support, and MSI
> does not require IRQ (or the platform does not provide IRQ at all), and
> found a vector-based solution was simpler.

> Even if MSI is not required, I think vector-based is a clearer solution
> (IA-64 is using vector-based). IRQs are given by the platform, and the
> kernel cannot do anything with those. Vector assignment/allocation are
> fully controlled by the kernel, and the kernel can return the vector
> number instead of IRQ (except legacy drivers where IRQs < 16).

> We made a prototype that simply returns the vector numbers for IRQ to
> device drivers (dev.irq). The function do_IRQ(), for example, gets the
> vector number, instead of IRQ. No changes to arch/i386/kernel/irq.c were
> required.

I know there are more people who want to get rid of NR_IRQS, e.g. due to
very sparse irq distribution. For one of the platforms I'm interested in,
we have to make a clear distinction between irqs and vectors so that we
can have separate vector allocations per interrupt handling domain. I
believe IA-64 does the same but per cpu instead (our domain/node consists
of 4 cpus). NR_IRQS gets in the way because it is set at 224 when we can
actually service NR_IRQ_VECTORS * NUM_MAXNODES I/O vectors. Can you post
your patch?

Also what MSI devices are you using?


> Before (IRQ-based)
> # cat /proc/interrupts
>            CPU0       CPU1      
>   0:      10921     671640    IO-APIC-edge  timer
>   2:          0          0          XT-PIC  cascade
>   9:          0          0   IO-APIC-level  acpi
>  14:       5102          1    IO-APIC-edge  ide0
>  15:         10          1    IO-APIC-edge  ide1
>  16:          0          0   IO-APIC-level  uhci-hcd, uhci-hcd
>  18:        449          0   IO-APIC-level  uhci-hcd
>  19:         61          0   IO-APIC-level  uhci-hcd
>  20:        345          0   IO-APIC-level  eth0
>  23:          0          0   IO-APIC-level  ehci-hcd
> NMI:          0          0
> LOC:     680526     680437
> ERR:          0
> MIS:          0

> After (vector-based)
>            CPU0       CPU1      
>   0:     709682          0    IO-APIC-edge  timer
>   2:          0          0          XT-PIC  cascade
>   9:          0          0   IO-APIC-level  acpi
>  14:       4988          1    IO-APIC-edge  ide0
>  15:         10          1    IO-APIC-edge  ide1
> 177:         78          0   IO-APIC-level  uhci-hcd
> 185:          0          0   IO-APIC-level  uhci-hcd, uhci-hcd
> 193:         58          0   IO-APIC-level  uhci-hcd
> 201:          0          0   IO-APIC-level  ehci-hcd
> 209:        356          0   IO-APIC-level  eth0
> NMI:          0          0
> LOC:     707613     707524
> ERR:          0
> MIS:          0

--
function.linuxpower.ca

Post by Zwane Mwaikamb » Tue, 22 Apr 2003 19:40:08



> Why can't we use the same vector for multiple ioapic entries? After all,
> we are already sharing irqs, and an irq is just a cookie for a vector.
> What do you mean with "lost irq routing" ?

Each ioredtbl entry can take a vector; if you assign another ioredtbl entry
the same vector and a different IRQ, then you collide with the previous
entry and wipe it from the IDT. Also, irq != vector.

        Zwane
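
The collision can be pictured with a toy table that holds exactly one IRQ
per vector, which is the one-to-one assumption being discussed. This is a
sketch only: vector_to_irq, reset_vector_table and bind_vector are
hypothetical names, and the kernel does not necessarily store the mapping
this way.

```c
#include <assert.h>

#define NR_VECTORS 256
#define NO_IRQ     (-1)

/* Hypothetical one-entry-per-vector table: vector -> IRQ. */
static int vector_to_irq[NR_VECTORS];

/* Mark every vector as unused. */
static void reset_vector_table(void)
{
    int v;

    for (v = 0; v < NR_VECTORS; v++)
        vector_to_irq[v] = NO_IRQ;
}

/*
 * Bind 'irq' to 'vector'. Returns the IRQ that was displaced by the new
 * binding, or NO_IRQ if the slot was free or already held this IRQ.
 */
static int bind_vector(int vector, int irq)
{
    int displaced;

    assert(vector >= 0 && vector < NR_VECTORS);

    displaced = vector_to_irq[vector];
    if (displaced == irq)
        return NO_IRQ;              /* same binding, nothing lost */

    vector_to_irq[vector] = irq;    /* the earlier entry is overwritten */
    return displaced;
}
```

After reset_vector_table(), binding IRQ 1 and then IRQ 5 to vector 0x39
makes the second bind_vector() call return 1: the first routing is gone,
which is the effect described above.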


Post by Mika Penttil » Tue, 22 Apr 2003 20:00:10


Yes, the current code assumes a 1-to-1 mapping from vector to irq, but
that's a software limitation.

--Mika



>>Why can't we use the same vector for multiple ioapic entries? After all,
>>we are already sharing irqs, and an irq is just a cookie for a vector.
>>What do you mean with "lost irq routing" ?

>Each ioredtbl can take a vector, if you assign another ioredtbl with the
>same vector and different IRQ then you collide with the previous entry and
>wipe it from the IDT. Also irq != vector

>    Zwane


Post by Nakajima, Ju » Tue, 22 Apr 2003 20:30:19


> I know there are more people who want to get rid of NR_IRQS e.g. due to
> very sparse irq distribution. For one of the platforms i'm interested in,
> we have to make a clear distinction between irqs and vectors so that we
> can have separate vector allocations per interrupt handling domain. I
> believe IA-64 does the same but instead per cpu (our domain/node consists
> of 4 cpus) NR_IRQS gets in the way due to it being set at 224 when we
> actually

I heard such requests too, and suggested the same thing, i.e. separate vector allocations per interrupt handling domain.

> can service NR_IRQ_VECTORS * NUM_MAXNODES I/O vectors. Can you post your
> patch?

Yes, we'll post it after cleanups.

> Also what MSI devices are you using?

Adaptec 39320 SCSI HBA (it has the 7902 ASIC), which has two MSI-capable functions. You don't need to change the driver (aic79xx) to enable MSI.

Thanks,
Jun


Post by Chuck Ebber » Wed, 23 Apr 2003 00:00:16


 Oops, meant to send this to l-k:


>> Looks like the fix for the "ran out of interrupt sources" panic
>>has a problem.  It will eventually assign a device the same IRQ
>>number as the first system vector, i.e. the local APIC timer.
 ...
> Did you actually see this on hardware?

  In my dreams. :)  But someone must have such hardware or the panic
wouldn't have been removed...

  Only reason I found it at all is I had changed the exact same lines
in my patch (and mine had a much bigger bug than that.)

>Btw, why would you _want_ your redirect table to look like that?

>> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
>> 00 001 01  0    0    0   0   0    1    1    E7     <== timer at level E
>> 01 001 01  0    0    0   0   0    1    1    30     <== start at 30, not 31

 The patch does two things:

  1.  Reserves all of priority level E for the timers --
      legacy timer at E7, first system vector DF.

  2.  Starts assigning devices at 0x30 instead of 0x31.

(1) is probably worth doing (first system vector should be E0, though.)
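
The "level" here is the local APIC priority class, which is the upper four
bits of the vector. A minimal sketch of that arithmetic follows; the helper
names are hypothetical. It shows why a boundary at DF does not line up
with a reservation of level E, while E0 does.

```c
#include <assert.h>

/* Priority class of a vector: its upper four bits. */
static int priority_class(int vector)
{
    assert(vector >= 0 && vector <= 0xff);
    return vector >> 4;
}

/* Lowest vector that belongs to priority class 'cls'. */
static int first_vector_of_class(int cls)
{
    assert(cls >= 0 && cls <= 0xf);
    return cls << 4;
}
```

priority_class(0xe7) is 0xe, so the legacy timer sits in level E, but
priority_class(0xdf) is 0xd; first_vector_of_class(0xe) is 0xe0, the
lowest vector that is actually inside level E.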

> Starting at 31 is better, because..
 ...
> Then we'd have devices at 81 and 89 (two per block of 16 is ok, but 80
> isn't ok because we use that for system calls).

> And having two devices in the 8x series means that it takes more irq
> sources to overflow and start to re-use the 3x block - and we want to

 But doesn't IRQ 0x80, even though it is software-initiated, contend
with 'real' device interrupts at priority 8, which would mean there are
three possible sources (80, 81 and 89?)  That's what I was assuming...

------
 Chuck


Post by Maciej W. Rozyck » Wed, 23 Apr 2003 17:10:04



> Why can't we use the same vector for multiple ioapic entries? After all,
> we are already sharing irqs, and an irq is just a cookie for a vector.

 IIRC, there are serious issues with using the same vector for multiple
I/O APIC interrupts, at least for certain implementations.  So it's
probably not even worth investigating.

--
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+


Post by Maciej W. Rozyck » Wed, 23 Apr 2003 17:10:09



>  But doesn't IRQ 0x80, even though it is software-initiated, contend
> with 'real' device interrupts at priority 8, which would mean there are
> three possible sources (80, 81 and 89?)  That's what I was assuming...

 The problems are with the local APIC hardware (with the queueing of
arriving IRQ messages); "int 0x80" doesn't go through the APIC.

--
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
