2.5.47 - Assertion failed in fs/jbd/journal.c:415

Post by Robert Macaula » Thu, 19 Dec 2002 21:10:09



We were running an IO performance test on 2.5.47. The storage we were
writing to was a Fibre Channel array (Dell 650F) attached via QLogic 2200
cards using the in-kernel qlogicfc driver. There were 8 separate LUNs on
the FC array, each with its own ext3 filesystem. There are no partition
tables on the disks (one of the disks would not accept one, a separate
issue). The ext3 filesystems were created directly on the block devices,
/dev/sdf, /dev/sdg, etc. The server is a Dell PowerEdge 6650 with 4
processors and 8 GB of RAM. More detailed system information is appended
at the bottom.

For now, the test was 100% writing to all 8 filesystems in parallel. The
following BUG was reported halfway through the 4th run of this test. I'm
not sure how reproducible this is.
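
The post does not say which tool generated the load. As a rough illustration only, a write-only workload that hits several filesystems at once could be sketched as below. The function names, the file name `writetest.dat`, and the use of temporary directories in place of the real ext3 mount points are all assumptions made for this sketch, not details from the original test.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_file(directory, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to one file in directory, then fsync it."""
    path = os.path.join(directory, "writetest.dat")  # hypothetical file name
    chunk = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    return written

def parallel_write(directories, total_bytes):
    """Run one writer per directory concurrently; return bytes written per directory."""
    with ThreadPoolExecutor(max_workers=len(directories)) as pool:
        return list(pool.map(lambda d: write_file(d, total_bytes), directories))

if __name__ == "__main__":
    # Stand-ins for the 8 mount points; a real run would pass the mounted paths.
    with tempfile.TemporaryDirectory() as root:
        dirs = []
        for i in range(8):
            d = os.path.join(root, f"fs{i}")
            os.mkdir(d)
            dirs.append(d)
        print(parallel_write(dirs, 4 << 20))
```

One thread per filesystem keeps all targets busy at the same time, which is the property the post describes; it says nothing about the actual file sizes or write patterns used.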

The machine is still running. IO that was in progress at the time of the
BUG is stuck in D state, though new IO to the disks is still possible. I
will leave the system up and running for a few days in case any more info
is needed.

I will be trying a more recent version in a few days. 2.5.47 was the
latest kernel I could compile at the time. I've looked through the
archives, but could not find any mention of this particular bug, so I do
not know if it has been addressed or not. Thanks

Assertion failure in journal_write_metadata_buffer() at fs/jbd/journal.c:415: "buffer_jdirty(jh2bh(jh_in))"
------------[ cut here ]------------
kernel BUG at fs/jbd/journal.c:415!
invalid operand: 0000
qlogicfc autofs
CPU:    2
EIP:    0060:[<c0193b62>]    Not tainted
EFLAGS: 00010246
eax: 0000006f   ebx: cd31e720   ecx: 00000000   edx: c02f6388
esi: 00000000   edi: ea0caa50   ebp: f29abb00   esp: c6b23e30
ds: 0068   es: 0068   ss: 0068
Process kjournald (pid: 3032, threadinfo=c6b22000 task=ee07d100)
Stack: c02b2240 c02afb0d c02afae0 0000019f c02afaf1 00000000 c6b22000 f6a5de00
       00000000 00000000 cd31e720 00000000 ea0caa50 f29abb00 c0191062 f29abb00
       cd31e720 c6b23e98 000015bf f6a5de94 00000000 00000f9c cb1a3064 0000000a
Call Trace: [<c0191062>]  [<c0193976>]  [<c01937e0>]  [<c0193800>]  [<c0108b75>]
Code: 0f 0b 9f 01 e0 fa 2a c0 83 c4 14 8b 54 24 2c 8b 4a 0c 85 c9

Decoded is below

ksymoops 2.4.8 on i686 2.5.47.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.5.47/ (default)
     -m /gold/linux-2.5.46/System.map (specified)

Warning (compare_maps): ksyms_base symbol page_states__per_cpu_R__ver_page_states__per_cpu not found in System.map.  Ignoring ksyms_base entry
kernel BUG at fs/jbd/journal.c:415!
invalid operand: 0000
CPU:    2
EIP:    0060:[<c0193b62>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 0000006f   ebx: cd31e720   ecx: 00000000   edx: c02f6388
esi: 00000000   edi: ea0caa50   ebp: f29abb00   esp: c6b23e30
ds: 0068   es: 0068   ss: 0068
Stack: c02b2240 c02afb0d c02afae0 0000019f c02afaf1 00000000 c6b22000 f6a5de00
       00000000 00000000 cd31e720 00000000 ea0caa50 f29abb00 c0191062 f29abb00
       cd31e720 c6b23e98 000015bf f6a5de94 00000000 00000f9c cb1a3064 0000000a
Call Trace: [<c0191062>]  [<c0193976>]  [<c01937e0>]  [<c0193800>]  [<c0108b75>]
Code: 0f 0b 9f 01 e0 fa 2a c0 83 c4 14 8b 54 24 2c 8b 4a 0c 85 c9

>>EIP; c0193b62 <journal_write_metadata_buffer+62/260>   <=====
>>ebx; cd31e720 <_end+cef86d4/3852bfb4>
>>edx; c02f6388 <log_wait+4/c>
>>edi; ea0caa50 <_end+29ca4a04/3852bfb4>
>>ebp; f29abb00 <_end+32585ab4/3852bfb4>
>>esp; c6b23e30 <_end+66fdde4/3852bfb4>

Trace; c0191062 <journal_commit_transaction+812/1219>
Trace; c0193976 <kjournald+176/260>
Trace; c01937e0 <commit_timeout+0/10>
Trace; c0193800 <kjournald+0/260>
Trace; c0108b75 <kernel_thread_helper+5/10>

Code;  c0193b62 <journal_write_metadata_buffer+62/260>
00000000 <_EIP>:
Code;  c0193b62 <journal_write_metadata_buffer+62/260>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0193b64 <journal_write_metadata_buffer+64/260>
   2:   9f                        lahf  
Code;  c0193b65 <journal_write_metadata_buffer+65/260>
   3:   01 e0                     add    %esp,%eax
Code;  c0193b67 <journal_write_metadata_buffer+67/260>
   5:   fa                        cli    
Code;  c0193b68 <journal_write_metadata_buffer+68/260>
   6:   2a c0                     sub    %al,%al
Code;  c0193b6a <journal_write_metadata_buffer+6a/260>
   8:   83 c4 14                  add    $0x14,%esp
Code;  c0193b6d <journal_write_metadata_buffer+6d/260>
   b:   8b 54 24 2c               mov    0x2c(%esp,1),%edx
Code;  c0193b71 <journal_write_metadata_buffer+71/260>
   f:   8b 4a 0c                  mov    0xc(%edx),%ecx
Code;  c0193b74 <journal_write_metadata_buffer+74/260>
  12:   85 c9                     test   %ecx,%ecx

1 warning issued.  Results may not be reliable.
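
A note on reading the "Code:" bytes: the instructions ksymoops shows after ud2a (lahf, add, cli, sub) are most likely not real code. On i386 kernels of this era, BUG() emitted the ud2 opcode followed by a 16-bit line number and a 32-bit pointer to the file name string, and the disassembler simply decodes that data as instructions. The sketch below assumes that layout; `decode_bug_record` is a name made up for this example.

```python
import struct

# "Code:" bytes copied from the oops above.
CODE = bytes.fromhex("0f 0b 9f 01 e0 fa 2a c0 83 c4 14 8b 54 24 2c 8b 4a 0c 85 c9")

def decode_bug_record(code):
    """Split an i386 BUG() site into (line, file_pointer).

    Assumed layout: ud2 opcode (0f 0b), then a little-endian 16-bit line
    number, then a little-endian 32-bit pointer to the file name string.
    """
    if code[:2] != b"\x0f\x0b":
        raise ValueError("byte sequence does not start with ud2")
    line, file_ptr = struct.unpack_from("<HI", code, 2)
    return line, file_ptr

line, file_ptr = decode_bug_record(CODE)
print(f"line={line} file_ptr={file_ptr:#010x}")  # line=415 file_ptr=0xc02afae0
```

The decoded line number agrees with fs/jbd/journal.c:415 in the assertion message, and both values (0000019f and c02afae0) also appear near the top of the Stack dump.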

/proc/meminfo
MemTotal:      7769884 kB
MemFree:          7196 kB
MemShared:           0 kB
Buffers:         94304 kB
Cached:        7384944 kB
SwapCached:       2440 kB
Active:          64680 kB
Inactive:      7420544 kB
HighTotal:     6946752 kB
HighFree:         5248 kB
LowTotal:       823132 kB
LowFree:          1948 kB
SwapTotal:     2096440 kB
SwapFree:      2092160 kB
Dirty:          229620 kB
Writeback:           0 kB
Mapped:          12712 kB
Slab:           252824 kB
Committed_AS:    81216 kB
PageTables:       1360 kB
ReverseMaps:     19053
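
For readers skimming the numbers above, a small sketch that parses a few of the fields and checks how they relate. Only values copied from the dump are used; `parse_meminfo` is a hypothetical helper. Nothing in the thread ties the low-memory figures to the assertion failure, so this is just a reading aid.

```python
# Subset of the /proc/meminfo dump above, copied verbatim.
MEMINFO = """\
MemTotal:      7769884 kB
MemFree:          7196 kB
HighTotal:     6946752 kB
HighFree:         5248 kB
LowTotal:       823132 kB
LowFree:          1948 kB
Dirty:          229620 kB
"""

def parse_meminfo(text):
    """Return {field name: value in kB} for 'Name:  value kB' rows."""
    fields = {}
    for row in text.splitlines():
        name, _, rest = row.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields

info = parse_meminfo(MEMINFO)

# High and low zones add up to the totals reported for the whole machine.
assert info["HighTotal"] + info["LowTotal"] == info["MemTotal"]
assert info["HighFree"] + info["LowFree"] == info["MemFree"]

low_free_pct = 100.0 * info["LowFree"] / info["LowTotal"]
print(f"LowFree is {low_free_pct:.2f}% of LowTotal")
```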

/proc/version
Linux version 2.5.47 (root@bottom) (gcc version 2.96 20000731 (Red Hat Linux 7.2 2.96-108.1)) #13 SMP Wed Dec 11 11:49:05 CST 2002

/proc/pci
PCI devices found:
  Bus  0, device   0, function  0:
    Host bridge: ServerWorks CMIC-HE (rev 34).
  Bus  0, device   0, function  1:
    Host bridge: ServerWorks CMIC-HE (#2) (rev 0).
  Bus  0, device   0, function  2:
    Host bridge: ServerWorks CMIC-HE (#3) (rev 0).
  Bus  0, device   0, function  3:
    Host bridge: ServerWorks CMIC-HE (#4) (rev 0).
  Bus  0, device   4, function  0:
    VGA compatible controller: ATI Technologies Inc Rage XL (rev 39).
      Master Capable.  Latency=32.  Min Gnt=8.
      Non-prefetchable 32 bit memory at 0xfd000000 [0xfdffffff].
      I/O at 0xec00 [0xecff].
      Non-prefetchable 32 bit memory at 0xfe101000 [0xfe101fff].
  Bus  0, device   5, function  0:
    Class ff00: Dell Computer Corporation Remote Assistant Card 3 (rev 0).
      IRQ 20.
      Master Capable.  Latency=32.
      Prefetchable 32 bit memory at 0xfeb80000 [0xfeb80fff].
      I/O at 0xe8f8 [0xe8ff].
      I/O at 0xe8e8 [0xe8ef].
  Bus  0, device   5, function  1:
    Class ff00: Dell Computer Corporation PowerEdge Expandable RAID Controller 3/Di (rev 0).
      IRQ 31.
      Master Capable.  Latency=32.
      Non-prefetchable 32 bit memory at 0xfe100000 [0xfe100fff].
      I/O at 0xe880 [0xe8bf].
      Prefetchable 32 bit memory at 0xfeb00000 [0xfeb7ffff].
  Bus  0, device   5, function  2:
    Class 0c07: PCI device 1028:0009 (Dell Computer Corporation) (rev 0).
      IRQ 41.
      Master Capable.  Latency=32.
      I/O at 0xe8f4 [0xe8f7].
  Bus  0, device  15, function  0:
    Host bridge: ServerWorks CSB5 South Bridge (rev 147).
      Master Capable.  Latency=64.
  Bus  0, device  15, function  1:
    IDE interface: ServerWorks CSB5 IDE Controller (rev 147).
      Master Capable.  Latency=64.
      I/O at 0x8c0 [0x8c7].
      I/O at 0x8c8 [0x8cb].
      I/O at 0x8d0 [0x8d7].
      I/O at 0x8d8 [0x8db].
      I/O at 0x8b0 [0x8bf].
  Bus  0, device  15, function  3:
    ISA bridge: PCI device 1166:0225 (ServerWorks) (rev 0).
  Bus  0, device  16, function  0:
    Host bridge: ServerWorks CIOB30 (rev 3).
      Master Capable.  Latency=32.
  Bus  0, device  16, function  2:
    Host bridge: ServerWorks CIOB30 (#2) (rev 3).
      Master Capable.  Latency=32.
  Bus  0, device  17, function  0:
    Host bridge: ServerWorks CIOB30 (#3) (rev 3).
      Master Capable.  Latency=32.
  Bus  0, device  17, function  2:
    Host bridge: ServerWorks CIOB30 (#4) (rev 3).
      Master Capable.  Latency=32.
  Bus  0, device  18, function  0:
    Host bridge: ServerWorks CIOB30 (#5) (rev 3).
      Master Capable.  Latency=32.
  Bus  0, device  18, function  2:
    Host bridge: ServerWorks CIOB30 (#6) (rev 3).
      Master Capable.  Latency=32.
  Bus  3, device   1, function  0:
    PCI bridge: Intel Corp. 21154 PCI-to-PCI Bridge (rev 0).
      Master Capable.  Latency=32.  Min Gnt=6.
  Bus  4, device   0, function  0:
    PCI bridge: Intel Corp. 21154 PCI-to-PCI Bridge (#2) (rev 0).
      Master Capable.  Latency=32.  Min Gnt=6.
  Bus  4, device   1, function  0:
    SCSI storage controller: QLogic Corp. ISP12160 Dual Channel Ultra3 SCSI Processor (rev 6).
      IRQ 32.
      Master Capable.  Latency=32.  Min Gnt=64.
      I/O at 0xdc00 [0xdcff].
      Non-prefetchable 32 bit memory at 0xfcdff000 [0xfcdfffff].
  Bus  5, device   0, function  0:
    RAID bus controller: American Megatrends Inc. MegaRAID (rev 32).
      IRQ 21.
      Master Capable.  Latency=32.
      Prefetchable 32 bit memory at 0xf0000000 [0xf7ffffff].
  Bus 11, device   1, function  0:
    Ethernet controller: Intel Corp. 82544EI Gigabit Ethernet Controller (rev 2).
      IRQ 25.
      Master Capable.  Latency=64.  Min Gnt=255.
      Non-prefetchable 64 bit memory at 0xefe20000 [0xefe3ffff].
      Non-prefetchable 64 bit memory at 0xefe00000 [0xefe1ffff].
      I/O at 0xcce0 [0xccff].
  Bus 21, device   1, function  0:
    Ethernet controller: Intel Corp. 82544EI Gigabit Ethernet Controller (#2) (rev 2).
      IRQ 29.
      Master Capable.  Latency=64.  Min Gnt=255.
      Non-prefetchable 64 bit memory at 0xefa20000 [0xefa3ffff].
      Non-prefetchable 64 bit memory at 0xefa00000 [0xefa1ffff].
      I/O at 0xace0 [0xacff].
  Bus 16, device   2, function  0:
    SCSI storage controller: QLogic Corp. QLA2200 (rev 5).
      IRQ 24.
      Master Capable.  Latency=32.  Min Gnt=64.
      I/O at 0xbc00 [0xbcff].
      Non-prefetchable 32 bit memory at 0xefc00000 [0xefc00fff].
  Bus 26, device   1, function  0:
    SCSI storage controller: QLogic
...

Post by Andrew Morto » Thu, 19 Dec 2002 22:40:10



> We were running an IO performance test on 2.5.47. The storage we were
> writing to was a Fibre Channel array (Dell 650F) attached via QLogic 2200
> cards using the in-kernel qlogicfc driver. There were 8 separate LUNs on
> the FC array, each with its own ext3 filesystem. There are no partition
> tables on the disks (one of the disks would not accept one, a separate
> issue). The ext3 filesystems were created directly on the block devices,
> /dev/sdf, /dev/sdg, etc. The server is a Dell PowerEdge 6650 with 4
> processors and 8 GB of RAM. More detailed system information is appended
> at the bottom.

> For now, the test was 100% writing to all 8 filesystems in parallel. The
> following BUG was reported halfway through the 4th run of this test. I'm
> not sure how reproducible this is.

> The machine is still running. IO that was in progress at the time of the
> BUG is stuck in D state, though new IO to the disks is still possible. I
> will leave the system up and running for a few days in case any more info
> is needed.

> I will be trying a more recent version in a few days. 2.5.47 was the
> latest kernel I could compile at the time. I've looked through the
> archives, but could not find any mention of this particular bug, so I do
> not know if it has been addressed or not. Thanks

> Assertion failure in journal_write_metadata_buffer() at fs/jbd/journal.c:415: "buffer_jdirty(jh2bh(jh_in))"

I can't immediately see what would cause this.  There is code in
__journal_file_buffer which could have triggered this, but we should
have exclusion from that via both lock_kernel() and lock_journal().

I'll see if Stephen can spot it.   I shall assume you were using
the data-ordered journalling mode.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

Post by Robert Macaula » Thu, 19 Dec 2002 23:10:09



> I can't immediately see what would cause this.  There is code in
> __journal_file_buffer which could have triggered this, but we should
> have exclusion from that via both lock_kernel() and lock_journal().

> I'll see if Stephen can spot it.   I shall assume you were using
> the data-ordered journalling mode.

Correct, I also had them mounted with noatime as well if that matters.

Post by Andrew Morto » Fri, 20 Dec 2002 00:10:12




> > I can't immediately see what would cause this.  There is code in
> > __journal_file_buffer which could have triggered this, but we should
> > have exclusion from that via both lock_kernel() and lock_journal().

> > I'll see if Stephen can spot it.   I shall assume you were using
> > the data-ordered journalling mode.

> Correct, I also had them mounted with noatime as well if that matters.

Seems that I failed to propagate one of Stephen's 2.4 fixes forwards.
It could well explain this failure.  I shall send you the diff after
testing.
 
 
 

1. 2.5.47 / unusual ext3 fs errors

Under 2.5.x I seem to be getting a lot of fs errors on fsck, mainly
dealing with bad inode counts in groups. Just now though, I had /var
remounted read-only due to the following:

t_transaction: Journal has aborted
EXT3-fs error (device ide0(3,9)) in start_transaction: Journal has aborted
EXT3-fs error (device ide0(3,9)) in start_transaction: Journal has aborted
EXT3-fs error (device ide0(3,9)) in start_transaction: Journal has aborted
...
EXT3-fs error (device ide0(3,9)) in start_transaction: Journal has aborted
EXT3-fs error (device ide0(3,9)) in start_transaction: Journal has aborted
EXT3-fs error (device ide0(3,9)) in start_transaction: Journal has aborted

And again, after rebooting into single-user mode and running a full fsck,
bad inode count errors were present. No errors were detected when testing
the disk with -c.

Under 2.4.x my filesystems never showed errors.

I'd provide more info but this is all that I have. If you need more then
you'll need to tell me what to do to get it. :)

Thanks.

--
        All people are equal,
        But some are more equal then others.
            - George W. Bush Jr, President of the United States
              September 21, 2002 (Abridged version of security speech)
