An oops while running 2.5.65-mm2


Post by Martin Josefsso » Sat, 22 Mar 2003 03:30:09





> > Greetings -

> > Here is some info about an oops from 2.5.65-mm2

> It is not exactly an oops, but it is a warning of a fatal bug.

This might explain the crashes I've seen in my routers but been unable
to capture so far (and without the extra slab debugging I don't know if
I would have been able to find it or even get a decent capture...)

Patch attached that should fix the problem, not tested or even compiled.

Joe, can you please please test it and report back?

> Look at this lovely trace:
> Mar 20 11:06:46 jyro kernel: Slab corruption: start=ceaa2234, expend=ceaa2377, problemat=ceaa22ac
> Mar 20 11:06:46 jyro kernel: Last user: [<e0a12cb0>](destroy_conntrack+0xd0/0x140 [ip_conntrack])
> Mar 20 11:06:46 jyro kernel: Data: ************************************************************************************************************************AC 22 AA CE AC 22 AA CE ***************************************************************************************************************************************************************************************************A5
> Mar 20 11:06:46 jyro kernel: Next: 71 F0 2C .B0 2C A1 E0 71 F0 2C .********************
> Mar 20 11:06:46 jyro kernel: slab error in check_poison_obj(): cache `ip_conntrack': object was modified after freeing
> Looking at the data pattern, it is probably an INIT_LIST_HEAD() against a
> list_head field which is 120 bytes into the object.  (problemat - start).  Or
> a list_del() against a different object which erroneously remains on a list
> with this object.

You are correct. It was a list_del() that caused it (at least I think
so, it's 2am right now).

1. conntrack helper adds an expectation and adds that to a list hanging
off a connection.

2. the expected connection arrives. the expectation is still on the
list.

3. the original connection that caused the expectation terminates but
the expectation still thinks it's added to the list.

4. the expected connection terminates and list_del() is called to remove
it from the list which doesn't exist anymore. boom!

> Manfred has extra toys in the works which will be able to unmap slab objects
> from the kernel virtual address space when they are freed.  When this debug
> code is working (it will run slowly) we will get an oops at the site of the
> bug.

This will be a nice feature, might make it easier to find the bugs.
Too bad it can't help work out the relationship between structs... I had
to use pen and paper to work out the relationship between connections
and expectations :) That part is almost a little bit hairy.

--
/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.

  ip_conntrack-siblingfix.diff
 
 
 


Post by jjs » Sat, 22 Mar 2003 20:50:09


We have 12 hours uptime so far with this
patch, and everything is good so far -

Will report any change in status -

Joe


>You are correct. It was a list_del() that caused it (at least I think
>so, it's 2am right now).

>1. conntrack helper adds an expectation and adds that to a list hanging
>off a connection.

>2. the expected connection arrives. the expectation is still on the
>list.

>3. the original connection that caused the expectation terminates but
>the expectation still thinks it's added to the list.

>4. the expected connection terminates and list_del() is called to remove
>it from the list which doesn't exist anymore. boom!

>------------------------------------------------------------------------

>--- linux-2.5.64-bk10/net/ipv4/netfilter/ip_conntrack_core.c.orig   2003-03-21 01:42:57.000000000 +0100
>+++ linux-2.5.64-bk10/net/ipv4/netfilter/ip_conntrack_core.c        2003-03-21 01:44:11.000000000 +0100

>             * the un-established ones only */
>            if (exp->sibling) {
>                    DEBUGP("remove_expectations: skipping established %p of %p\n", exp->sibling, ct);
>+                   exp->sibling = NULL;
>                    continue;
>            }


>    WRITE_LOCK(&ip_conntrack_lock);
>    /* Delete our master expectation */
>    if (ct->master) {
>-           /* can't call __unexpect_related here,
>-            * since it would screw up expect_list */
>-           list_del(&ct->master->expected_list);
>+           if (ct->master->sibling) {
>+                   /* can't call __unexpect_related here,
>+                    * since it would screw up expect_list */
>+                   list_del(&ct->master->expected_list);
>+           }
>            kfree(ct->master);
>    }
>    WRITE_UNLOCK(&ip_conntrack_lock);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://www.veryComputer.com/
Please read the FAQ at  http://www.veryComputer.com/

 
 
 


Post by Martin Josefsso » Sun, 23 Mar 2003 02:10:06



> You are correct. It was a list_del() that caused it (at least I think
> so, it's 2am right now).

> 1. conntrack helper adds an expectation and adds that to a list hanging
> off a connection.

> 2. the expected connection arrives. the expectation is still on the
> list.

> 3. the original connection that caused the expectation terminates but
> the expectation still thinks it's added to the list.

> 4. the expected connection terminates and list_del() is called to remove
> it from the list which doesn't exist anymore. boom!

Ok, the previous patch was a little bit incorrect. It did fix the
use-after-free bug (which can cause corruption if the slab memory is
reallocated before we write to it) but lost some internal information.
I can't see that we use this anywhere after this point, but here's the
proper patch.

Sorry about that.

--- linux-2.5.64-bk10/net/ipv4/netfilter/ip_conntrack_core.c.orig       2003-03-21 01:42:57.000000000 +0100

                 * the un-established ones only */
                if (exp->sibling) {
                        DEBUGP("remove_expectations: skipping established %p of %p\n", exp->sibling, ct);
+                       exp->expectant = NULL;
                        continue;
                }

        WRITE_LOCK(&ip_conntrack_lock);
        /* Delete our master expectation */
        if (ct->master) {
-               /* can't call __unexpect_related here,
-                * since it would screw up expect_list */
-               list_del(&ct->master->expected_list);
+               if (ct->master->expectant) {
+                       /* can't call __unexpect_related here,
+                        * since it would screw up expect_list */
+                       list_del(&ct->master->expected_list);
+               }
                kfree(ct->master);
        }
        WRITE_UNLOCK(&ip_conntrack_lock);
--
/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.

 
 
 


Post by J Sloa » Tue, 25 Mar 2003 06:50:05



>This might explain the crashes I've seen in my routers but been unable
>to capture so far (and without the extra slab debugging I don't know if
>I would have been able to find it or even get a decent capture...)

>Patch attached that should fix the problem, not tested or even compiled.

>Joe, can you please please test it and report back?

I've been running 2.5.65-kb2 + your patch
for the past 2 days and everything is fully
functional, with no repeat of the problems
we saw before.

Nice work.

Joe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 


Post by Martin Josefsso » Tue, 25 Mar 2003 15:30:22



> I've been running 2.5.65-kb2 + your patch
> for the past 2 days and everything is fully
> functional, with no repeat of the problems
> we saw before.

Great, glad it seems to work. If you see anything like that again,
please let me know.

> Nice work.

Thanks, Andrew's notes really helped.

--
/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.

 
 
 

1. ide kernel panic: 2.5.64-ac3 2.5.65-ac1 2.5.65-mm4

AMD K6/2 with VIA chipset has this panic at boot:

Kernel panic: ide: default attach failed

Panic on 2.5.64-ac3, 2.5.65-ac[13], 2.5.65-mm4, 2.5.65-bk4.

No panic on 2.5.61-ac1, 2.5.65-mm3, 2.5.65, 2.4.21-pre5, 2.4.21-pre5-ac3.

No modules.

egrep '^C.*IDE|^C.*VIA' /usr/src/linux-2.5.65-ac1/.config
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y

Boot message on 2.5.65-ac1:

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 00:07.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c586b (rev 47) IDE UDMA33 controller on pci00:07.1
    ide0: BM-DMA at 0xe000-0xe007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xe008-0xe00f, BIOS settings: hdc:DMA, hdd:DMA
hda: Maxtor 51536U3, ATA DISK drive
hdb: ATAPI CDROM, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: Maxtor 52049U4, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: host protected area => 1
hda: 30015216 sectors (15368 MB) w/2048KiB Cache, CHS=29777/16/63, UDMA(33)
 hda: [PTBL] [1868/255/63] hda1 hda2 hda3
hdc: host protected area => 1
hdc: 40020624 sectors (20491 MB) w/2048KiB Cache, CHS=39703/16/63, UDMA(33)
 hdc: hdc1 hdc2 hdc3
ide-disk: hdc: Failed to register the driver with ide.c
ide-default: hdc: Failed to register the driver with ide.c
Kernel panic: ide: default attach failed

lspci -vvv for IDE interface

IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Region 4: I/O ports at e000 [size=16]

lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3] (rev 04)
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C586/A/B PCI-to-ISA [Apollo VP] (rev 47)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.3 Host bridge: VIA Technologies, Inc. VT82C586B ACPI (rev 10)
00:13.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C (rev 10)
01:00.0 VGA compatible controller: nVidia Corporation NV6 [Vanta] (rev 15)

--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html


2. open file information in a core file

3. [2.5.65] kexec for 2.5.65 available

4. color ls VS screen

5. aic7xxx dying horribly in 2.5.65-mm2

6. rlogin issue.

7. benchmark anobjrmap with 2.5.65-mm2

8. (Review ID: 159248) Starting Tomcat servlet engine gives HotSpot Virtual Machine Error: 11, error ID

9. 2.5.65-mm2 with contest

10. Sleeping in illegal context with 2.5.65-mm2

11. aic7xxx dying horribly in 2.5.65-mm2 (fwd)

12. WimMark I report for 2.5.65-mm2

13. 2.5.65-mm2