2.4.16 kernel/printk.c (per processor initialization check)

Post by j-nom.. » Tue, 04 Dec 2001 18:00:18



Hello,

I experienced a system hang on my SMP machine, and it turned out to be
caused by a console write that happens before MMU initialization completes.

To be more specific, call_console_drivers() attempts console I/O even when
a secondary processor is not yet in a state to perform it (e.g. its MMU is
not initialized).
The result is unpredictable. In my case, it causes a machine check abort
and hangs the system.

Attached is a patch for it.

--- kernel/printk.c     2001/11/27 04:41:49     1.1.1.8
+++ kernel/printk.c     2001/12/03 05:25:26

  */
 void release_console_sem(void)
 {
        unsigned long flags;
        unsigned long _con_start, _log_end;
        unsigned long must_wake_klogd = 0;

        for ( ; ; ) {
                spin_lock_irqsave(&logbuf_lock, flags);
                must_wake_klogd |= log_start - log_end;
+               if (!(cpu_online_map & 1UL << smp_processor_id()))
+                       break;
                if (con_start == log_end)
                        break;                  /* Nothing to print */
                _con_start = con_start;
                _log_end = log_end;
                con_start = log_end;            /* Flush */
                spin_unlock_irqrestore(&logbuf_lock, flags);
                call_console_drivers(_con_start, _log_end);
        }
        console_may_schedule = 0;
        up(&console_sem);
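
The test added by the patch is a plain bitmask check. The following is a minimal user-space sketch of that check, not kernel code: cpu_mask_t, cpu_is_online() and cpu_set_online() are hypothetical names invented for illustration, and the mask is assumed to fit in one unsigned long, as the patch itself assumes.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical user-space stand-in for a one-word CPU bitmask
 * such as cpu_online_map; not a kernel type. */
typedef unsigned long cpu_mask_t;

#define MASK_BITS (sizeof(cpu_mask_t) * CHAR_BIT)

/* Return nonzero if bit `cpu` is set in `online_map`.
 * In C, << binds tighter than &, so the unparenthesized form
 * `map & 1UL << cpu` parses as `map & (1UL << cpu)`; the
 * parentheses below only make that explicit. */
int cpu_is_online(cpu_mask_t online_map, unsigned int cpu)
{
    assert(cpu < MASK_BITS);    /* shifting by >= the type width is undefined */
    return (online_map & (1UL << cpu)) != 0;
}

/* Return `online_map` with bit `cpu` set (hypothetical helper). */
cpu_mask_t cpu_set_online(cpu_mask_t online_map, unsigned int cpu)
{
    assert(cpu < MASK_BITS);
    return online_map | (1UL << cpu);
}
```

With an empty mask, cpu_is_online(map, 1) is zero; after map = cpu_set_online(map, 1) it is nonzero, which is the condition under which the patched loop goes on to call the console drivers.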

Best regards.
--

HPC Operating System Group, 1st Computers Software Division,
Computers Software Operations Unit, NEC Solutions.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

Post by j-nom.. » Wed, 05 Dec 2001 10:40:11


Hi,

Thank you for commenting.


Subject: Re: [PATCH] 2.4.16 kernel/printk.c (per processor initialization check)
Date: Mon, 03 Dec 2001 01:20:28 -0800

> Seems that there is some sort of ordering problem here - someone
> is calling printk before the MMU is initialised, but after some
> console drivers have been installed.

Yes.
Because smp_init() runs after console_init(), printk() can be called in
exactly this situation.
On IA-64, for example, identify_cpu() is called before ia64_mmu_init(),
and identify_cpu() itself calls printk().
I don't think the ordering itself is the problem.
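
The ordering described above can be modelled in user space. This is only a rough sketch under stated assumptions: struct boot_state and the sim_*() functions are hypothetical stand-ins, not kernel symbols, and the model assumes a CPU's online bit is set only after its MMU setup has finished.

```c
#include <assert.h>

#define NCPUS 4

/* Hypothetical model of boot state; none of these are kernel symbols. */
struct boot_state {
    int console_registered;     /* set by the console_init()-like step */
    unsigned long online_map;   /* bit n set once CPU n finished bring-up */
    int safe_console_writes;    /* console writes made by an online CPU */
    int unsafe_console_writes;  /* console writes made by a CPU not yet online */
    int deferred;               /* messages left in the log buffer only */
};

void sim_console_init(struct boot_state *s)
{
    s->console_registered = 1;
}

/* Models a printk() issued by `cpu`. With `guard` nonzero, a CPU that is
 * not yet online leaves the message buffered instead of doing console I/O. */
void sim_printk(struct boot_state *s, unsigned int cpu, int guard)
{
    int online;

    assert(cpu < NCPUS);
    online = (s->online_map & (1UL << cpu)) != 0;
    if (!s->console_registered || (guard && !online)) {
        s->deferred++;
        return;
    }
    if (online)
        s->safe_console_writes++;
    else
        s->unsafe_console_writes++;
}

/* Secondary bring-up in the order described for IA-64: an
 * identify_cpu()-style message first, then MMU setup, then online. */
void sim_secondary_bringup(struct boot_state *s, unsigned int cpu, int guard)
{
    sim_printk(s, cpu, guard);      /* MMU not ready yet */
    /* ... MMU initialization would happen here ... */
    s->online_map |= 1UL << cpu;    /* assumption: set only after MMU setup */
    sim_printk(s, cpu, guard);      /* safe from here on */
}
```

Running sim_secondary_bringup() for CPU 1 after sim_console_init() with guard set to 0 records one unsafe console write; with guard set to 1 the early message is only counted as deferred.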

> I suspect the real fix is elsewhere, but I'm not sure where.

> Probably a clearer place to put this test would be within
> printk itself, immediately before the down_trylock.  Does that
> work?

The reason I put it in release_console_sem() is that release_console_sem()
can be called from functions other than printk(), e.g. console_unblank().
I agree that your placement would be clearer, but I don't think it is
sufficient.
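
To illustrate the difference between the two placements, here is a small user-space model. It is a sketch only: struct console_model, enum guard_site and the model_*() functions are hypothetical names, and the real locking and semaphore handling are left out entirely.

```c
#include <assert.h>

/* Hypothetical model: two entry points that share one release path. */
struct console_model {
    int cpu_online;     /* nonzero once the calling CPU may do console I/O */
    int pending;        /* messages waiting in the log buffer */
    int driver_calls;   /* times the console drivers were invoked */
    int unsafe_calls;   /* driver invocations made while cpu_online == 0 */
};

enum guard_site { GUARD_NONE, GUARD_IN_PRINTK, GUARD_IN_RELEASE };

/* Stand-in for the release path that flushes the buffer to the drivers. */
void model_release(struct console_model *m, enum guard_site g)
{
    if (g == GUARD_IN_RELEASE && !m->cpu_online)
        return;                 /* leave the messages buffered */
    if (m->pending == 0)
        return;                 /* nothing to print */
    m->driver_calls++;
    if (!m->cpu_online)
        m->unsafe_calls++;
    m->pending = 0;
}

/* Stand-in for printk(): always buffers, then tries to flush. */
void model_printk(struct console_model *m, enum guard_site g)
{
    m->pending++;
    if (g == GUARD_IN_PRINTK && !m->cpu_online)
        return;                 /* guard at the printk-only site */
    model_release(m, g);
}

/* Stand-in for a second caller such as console_unblank(). */
void model_unblank(struct console_model *m, enum guard_site g)
{
    model_release(m, g);
}
```

With one message pending on a CPU that is not online, model_unblank() under GUARD_IN_PRINTK still reaches the drivers, while under GUARD_IN_RELEASE it leaves the message buffered.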

Best regards.
--

HPC Operating System Group, 1st Computers Software Division,
Computers Software Operations Unit, NEC Solutions.

 
 
 

Post by Andrew Morto » Wed, 05 Dec 2001 11:10:09



> Hello,

> I experienced a system hang on my SMP machine, and it turned out to be
> caused by a console write that happens before MMU initialization completes.

> To be more specific, call_console_drivers() attempts console I/O even when
> a secondary processor is not yet in a state to perform it (e.g. its MMU is
> not initialized).
> The result is unpredictable. In my case, it causes a machine check abort
> and hangs the system.

> Attached is a patch for it.

> --- kernel/printk.c     2001/11/27 04:41:49     1.1.1.8
> +++ kernel/printk.c     2001/12/03 05:25:26

>   */
>  void release_console_sem(void)
>  {
>         unsigned long flags;
>         unsigned long _con_start, _log_end;
>         unsigned long must_wake_klogd = 0;

>         for ( ; ; ) {
>                 spin_lock_irqsave(&logbuf_lock, flags);
>                 must_wake_klogd |= log_start - log_end;
> +               if (!(cpu_online_map & 1UL << smp_processor_id()))
> +                       break;
>                 if (con_start == log_end)
>                         break;                  /* Nothing to print */
>                 _con_start = con_start;
>                 _log_end = log_end;
>                 con_start = log_end;            /* Flush */
>                 spin_unlock_irqrestore(&logbuf_lock, flags);
>                 call_console_drivers(_con_start, _log_end);
>         }
>         console_may_schedule = 0;
>         up(&console_sem);

Seems that there is some sort of ordering problem here - someone
is calling printk before the MMU is initialised, but after some
console drivers have been installed.

I suspect the real fix is elsewhere, but I'm not sure where.

Probably a clearer place to put this test would be within
printk itself, immediately before the down_trylock.  Does that
work?


 
 
 

1. 2.4.16 kernel/printk.c (per processor initialization check)

Ok, it does. However, this still does not make me change my mind.

Prove, please. If you show me it can also happen on other architectures,
I'll be glad to apply the patch.


2. Sun Netra i station and Solaris.

3. Debian Dist - 2.4.3 Kernel to 2.4.16 Kernel

4. Complaining,Explanation,Apology...

5. Frequent kernel crashes with 2.4.16

6. Linux Frequently Asked Questions with Answers (Part 6 of 6)

7. Kernel 2.4.16 crashing blues ....

8. SHMLBA constant.

9. Boot loop, kernel 2.4.16

10. Kernel 2.4.16 failed to compile

11. RH 7.2, ext 3 and Kernel 2.4.16

12. Power off NOT working, kernel 2.4.16

13. maximum allowable swap size in arca-VM kernels (2.4.16)