Oops in firewire (2.4.21-pre5 with 2.4.21-pre4 firewire driver)

Post by Torrey Hoffman » Fri, 14 Mar 2003 08:00:13



I heard that the firewire merge in 2.4.21-pre5 was messed up, so I
replaced the -pre5 drivers/ieee1394 with the one from -pre4.

I got an oops while loading the driver.  I will continue to experiment
with recent kernels, and try to find a bitkeeper snapshot with the
latest firewire fixes.  Any suggestions are welcome.

(I am experimenting with recent kernels because these modules cause oops
and hangs with the latest Red Hat kernel as well.  However, the hardware
works fine with older Red Hat kernels.)

Anyway, the oops is decoded below.  The system is an up-to-date Red Hat 8,
except for the kernel.

ohci1394_0: Unexpected PCI resource length of 1000!
ohci1394_0: OHCI-1394 1.0 (PCI): IRQ=[9]  MMIO=[e9000000-e90007ff]  Max Packet=[2048]
ieee1394: SelfID completion called outside of bus reset!
ieee1394: Device added: Node[00:1023]  GUID[0004830000002cb3]  [Oxford  ]
ieee1394: Host added: Node[01:1023]  GUID[0030dd8000505e29]  [Linux OHCI-1394]
Unable to handle kernel NULL pointer dereference at virtual address 0000002c
 printing eip:
c016e639
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c016e639>]    Not tainted
EFLAGS: 00013246
eax: 00000000   ebx: d1e518c0   ecx: d1eaf550   edx: d1eaf550
esi: 00000000   edi: 00000000   ebp: 00000000   esp: d2547e60
ds: 0018   es: 0018   ss: 0018
Process kjournald (pid: 157, stackpage=d2547000)
Stack: c9656b00 d273f380 d1eaf550 d1e4dc60 c016c92d d1eaf550 c9656b00 00000004
       000006e4 00000000 d273f3f4 00000000 000003dc c95ebc24 00000000 d1e4dc60
       d1eaf3d0 000006e4 ce8ed480 c9656a40 ce8ed5a0 ce8ed540 ce8ed4e0 ce8ed480
Call Trace:    [<c016c92d>] [<c01176fc>] [<c016f62a>] [<c016f4c0>] [<c010744e>]
  [<c016f4e0>]

Code: 3b 5e 2c 74 29 89 34 24 89 5c 24 04 e8 56 01 00 00 8b 5c 24
 <6>ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node[00:1023]: Max speed [S400] - Max payload [2048]
scsi1 : IEEE-1394 SBP-2 protocol driver (host: ohci1394)

SBP-2 module load options:
- Max speed supported: S400
- Max sectors per I/O supported: 255
- Max outstanding commands supported: 8
- Max outstanding commands per lun supported: 1
- Serialized I/O (debug): no
- Exclusive login: yes
  Vendor: WDC WD12  Model: 00JB-00CRA1       Rev:
  Type:   Direct-Access                      ANSI SCSI revision: 06
Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
SCSI device sda: 234441648 512-byte hdwr sectors (120034 MB)
 sda: sda1

>>EIP; c016e639 <__journal_remove_checkpoint+39/90>   <=====
>>ebx; d1e518c0 <_end+11ae7e88/144df628>
>>ecx; d1eaf550 <_end+11b45b18/144df628>
>>edx; d1eaf550 <_end+11b45b18/144df628>
>>esp; d2547e60 <_end+121de428/144df628>

Trace; c016c92d <journal_commit_transaction+6dd/1180>
Trace; c01176fc <schedule+21c/360>
Trace; c016f62a <kjournald+14a/1d0>
Trace; c016f4c0 <commit_timeout+0/10>
Trace; c010744e <kernel_thread+2e/40>
Trace; c016f4e0 <kjournald+0/1d0>

Code;  c016e639 <__journal_remove_checkpoint+39/90>
00000000 <_EIP>:
Code;  c016e639 <__journal_remove_checkpoint+39/90>   <=====
   0:   3b 5e 2c                  cmp    0x2c(%esi),%ebx   <=====
Code;  c016e63c <__journal_remove_checkpoint+3c/90>
   3:   74 29                     je     2e <_EIP+0x2e>
Code;  c016e63e <__journal_remove_checkpoint+3e/90>
   5:   89 34 24                  mov    %esi,(%esp,1)
Code;  c016e641 <__journal_remove_checkpoint+41/90>
   8:   89 5c 24 04               mov    %ebx,0x4(%esp,1)
Code;  c016e645 <__journal_remove_checkpoint+45/90>
   c:   e8 56 01 00 00            call   167 <_EIP+0x167>
Code;  c016e64a <__journal_remove_checkpoint+4a/90>
  11:   8b 5c 24 00               mov    0x0(%esp,1),%ebx
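
As a sanity check on the decode above, the faulting address can be
recomputed from the register dump: the marked instruction is
cmp 0x2c(%esi),%ebx and esi is 00000000, so the load goes to 0 + 0x2c,
which matches the reported virtual address 0000002c.  The Python below is
only a minimal sketch of that arithmetic; parse_registers and
effective_address are names made up for this illustration and are not part
of ksymoops.

```python
import re

# Register lines copied from the oops above.
OOPS_REGISTERS = """\
eax: 00000000   ebx: d1e518c0   ecx: d1eaf550   edx: d1eaf550
esi: 00000000   edi: 00000000   ebp: 00000000   esp: d2547e60
"""

def parse_registers(text):
    """Return {register name: value} for every 'name: hex' pair in text."""
    pairs = re.findall(r"\b(e[a-z]{2}): ([0-9a-f]{8})", text)
    return {name: int(value, 16) for name, value in pairs}

def effective_address(regs, base, displacement):
    """Address touched by an AT&T-syntax operand 'displacement(%base)'."""
    return (regs[base] + displacement) & 0xFFFFFFFF

regs = parse_registers(OOPS_REGISTERS)
fault = effective_address(regs, "esi", 0x2c)
print("fault address: %08x" % fault)
```

A base register of zero plus a small displacement is the usual signature of
reading a structure member through a NULL pointer.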

Torrey Hoffman

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

Post by Ben Collin » Fri, 14 Mar 2003 08:20:06



> I heard that the firewire merge in 2.4.21-pre5 was messed up, so I
> replaced the -pre5 drivers/ieee1394 with the one from -pre4.

I'd suggest trying the latest BK cset patch (which fixes -pre5 and
also fixes some things in general).

> I got an oops while loading the driver.  I will continue to experiment
> with recent kernels, and try to find a bitkeeper snapshot with the
> latest firewire fixes.  Any suggestions are welcome.
> >>EIP; c016e639 <__journal_remove_checkpoint+39/90>   <=====

This happened in the kjournald thread context. I'm not sure it is
ieee1394 related, but it is suspect that it happened in the middle of
handling an ieee1394 bus reset.
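
The thread context can be read off the decoded trace: every frame resolves
to a symbol, and the chain bottoms out in kjournald.  As a rough
illustration of how an address becomes symbol+offset, here is a minimal
Python sketch.  The start addresses are back-computed from the
symbol+offset pairs in the ksymoops output in the report (for example
c016e639 - 0x39 = c016e600); they are not taken from a real System.map,
and resolve() is a hypothetical helper, not ksymoops code.

```python
import bisect

# (start address, name), back-computed from the ksymoops lines in the report.
# A real System.map has thousands of entries; this table is illustration only.
SYMBOLS = sorted([
    (0xc0107420, "kernel_thread"),
    (0xc01174e0, "schedule"),
    (0xc016c250, "journal_commit_transaction"),
    (0xc016e600, "__journal_remove_checkpoint"),
    (0xc016f4c0, "commit_timeout"),
    (0xc016f4e0, "kjournald"),
])
STARTS = [start for start, _ in SYMBOLS]

def resolve(address):
    """Return 'symbol+offset' for the nearest symbol at or below address."""
    index = bisect.bisect_right(STARTS, address) - 1
    if index < 0:
        raise ValueError("address below first known symbol")
    start, name = SYMBOLS[index]
    return "%s+%x" % (name, address - start)

# EIP followed by the call trace addresses from the oops.
for address in (0xc016e639, 0xc016c92d, 0xc01176fc, 0xc016f62a,
                0xc016f4c0, 0xc010744e, 0xc016f4e0):
    print("%08x %s" % (address, resolve(address)))
```

With a table this sparse, an address inside a function that is not listed
would be attributed to the wrong neighbour, which is why the symbol table
has to match the running kernel.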

Is this reproducible when loading the ohci1394 driver? If so, does it
occur when you turn off hotplug (IOW, don't load sbp2 driver) or if the
sbp2 device is not attached?

--
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
Deqo       - http://www.deqo.com/

 
 
 

Post by Torrey Hoffman » Fri, 14 Mar 2003 20:10:05


[ohci1394 / sbp2 problems]

> I'd suggest trying the latest BK cset patch (which fixes -pre5 and
> also fixes some things in general).

Thanks for the response.

Last night I (finally) installed bitkeeper, pulled the latest 2.4 tree,
and gave it a try.  It seems to have solved the problem on my single CPU
machine.  I will try my SMP machine tonight and see how things go there.

I run reiserfs on my firewire drives but ext3 on some other partitions.
These oopses have often occurred while running rsync between the reiserfs
on firewire and an ext3 or reiserfs partition on a regular disk or raid5
setup.

On my SMP machine this morning (using Red Hat's 2.4.18-18smp kernel) I
had a similar oops with references to kjournald under a heavy firewire
load.   The machine didn't die, and after the bus resets completed, the
rsync from the firewire drive continued.  

These oopses have been very reproducible while loading ohci1394, and
sometimes while transferring data after loading.  They don't occur if
the sbp2 device is not attached.   I have hacked my rc.sysinit script to
always load the drivers, since Red Hat's autodetection stuff there quit
working around 2.4.18-17, and as long as the device isn't attached
2.4.18-24 boots fine and loads the drivers.

(Up until installing 2.4-bk last night, I normally booted Red Hat's
2.4.18-24 kernel, except when I needed to use firewire. 2.4.18-24 doesn't
work at all for me under firewire, and 2.4.18-18 "mostly" works.)

Anyway, I will upgrade all my machines to the latest -bk snapshot and
will be back with more bug reports if I see any glitches...

Hopefully Red Hat will update their official kernel with the firewire
fixes.  And fix their rc.sysinit script too, while they are at it.  (No,
I haven't submitted a bugzilla report yet, will do so if -bk fixes
things for me...)

Thanks again,

Torrey Hoffman


 
 
 

Post by Torrey Hoffman » Sat, 15 Mar 2003 05:50:06


[about firewire problems]

> I'd suggest trying the latest BK cset patch (which fixes -pre5 and
> also fixes some things in general).

Although I said things were working, it turns out firewire is not bug
free quite yet.

Here's another oops to look at.

The kernel was a bitkeeper snapshot from last night (March 12), with no
other patches or modifications.  The hardware is a single-CPU Pentium III
with a Maxtor-brand firewire controller and an ATA-6-capable
IDE-to-Firewire bridge hooked up to it.

This occurred when rc.sysinit loaded ohci1394.  sbp2 is automatically
loaded by the hotplug driver, so that may have actually been the source
of the problem.  

The system continued to boot after the oops (and in fact it's running
now as I write this...)

ksymoops 2.4.5 on i686 2.4.21-bk-0312.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.21-bk-0312/ (default)
     -m /boot/System.map-2.4.21-bk-0312 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Unable to handle kernel NULL pointer dereference at virtual address 00000000
d3848227
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<d3848227>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: 00000000   ecx: 00000001   edx: d24b2000
esi: d24b2000   edi: 00000002   ebp: 00000002   esp: ce0c9fa8
ds: 0018   es: 0018   ss: 0018
Process knodemgrd (pid: 211, stackpage=ce0c9000)
Stack: 00000001 00000000 d24b20f4 0000ffc2 d38482d7 d24b2000 00000002 00000002
       c1361da0 c1361db8 d384f1f8 d24b2000 d384836a d24b2000 00000018 00000f00
       d2461e50 d24b2000 c010744e c1361da0 d3848300 c1361da4
Call Trace:    [<d38482d7>] [<d384f1f8>] [<d384836a>] [<c010744e>] [<d3848300>]
Code: 8b 1b 3d 08 f2 84 d3 75 f0 58 5b 5e 5f c3 39 78 20 74 eb 89

>>EIP; d3848227 <[ieee1394]nodemgr_node_probe_cleanup+27/50>   <=====
>>edx; d24b2000 <_end+1218d108/134fe168>
>>esi; d24b2000 <_end+1218d108/134fe168>
>>esp; ce0c9fa8 <_end+dda50b0/134fe168>

Trace; d38482d7 <[ieee1394]nodemgr_node_probe+87/b0>
Trace; d384f1f8 <[ieee1394]nodemgr_serialize+0/10>
Trace; d384836a <[ieee1394]nodemgr_host_thread+6a/b0>
Trace; c010744e <kernel_thread+2e/40>
Trace; d3848300 <[ieee1394]nodemgr_host_thread+0/b0>

Code;  d3848227 <[ieee1394]nodemgr_node_probe_cleanup+27/50>
00000000 <_EIP>:
Code;  d3848227 <[ieee1394]nodemgr_node_probe_cleanup+27/50>   <=====
   0:   8b 1b                     mov    (%ebx),%ebx   <=====
Code;  d3848229 <[ieee1394]nodemgr_node_probe_cleanup+29/50>
   2:   3d 08 f2 84 d3            cmp    $0xd384f208,%eax
Code;  d384822e <[ieee1394]nodemgr_node_probe_cleanup+2e/50>
   7:   75 f0                     jne    fffffff9 <_EIP+0xfffffff9>
Code;  d3848230 <[ieee1394]nodemgr_node_probe_cleanup+30/50>
   9:   58                        pop    %eax
Code;  d3848231 <[ieee1394]nodemgr_node_probe_cleanup+31/50>
   a:   5b                        pop    %ebx
Code;  d3848232 <[ieee1394]nodemgr_node_probe_cleanup+32/50>
   b:   5e                        pop    %esi
Code;  d3848233 <[ieee1394]nodemgr_node_probe_cleanup+33/50>
   c:   5f                        pop    %edi
Code;  d3848234 <[ieee1394]nodemgr_node_probe_cleanup+34/50>
   d:   c3                        ret    
Code;  d3848235 <[ieee1394]nodemgr_node_probe_cleanup+35/50>
   e:   39 78 20                  cmp    %edi,0x20(%eax)
Code;  d3848238 <[ieee1394]nodemgr_node_probe_cleanup+38/50>
  11:   74 eb                     je     fffffffe <_EIP+0xfffffffe>
Code;  d384823a <[ieee1394]nodemgr_node_probe_cleanup+3a/50>
  13:   89 00                     mov    %eax,(%eax)

1 warning issued.  Results may not be reliable.
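
The branch targets in the disassembly above (fffffff9, fffffffe) look odd
only because ksymoops disassembles from a fake address 0 at EIP; they are
short backward jumps within the same function.  A minimal Python sketch of
the arithmetic for a two-byte x86 short jump (opcode byte plus signed 8-bit
displacement), assuming 32-bit wraparound; short_jump_target is a name
invented for this example:

```python
def short_jump_target(offset, displacement_byte):
    """Target of a 2-byte x86 short jump located at `offset`.

    The displacement is a signed 8-bit value relative to the address of
    the next instruction (offset + 2); the result wraps to 32 bits.
    """
    if displacement_byte >= 0x80:
        signed = displacement_byte - 0x100
    else:
        signed = displacement_byte
    return (offset + 2 + signed) & 0xFFFFFFFF

# 'jne' at offset 0x7 with bytes 75 f0, 'je' at offset 0x11 with bytes 74 eb.
print("%x" % short_jump_target(0x07, 0xf0))
print("%x" % short_jump_target(0x11, 0xeb))
```

Those targets are 7 and 2 bytes before EIP, i.e. d3848220 and d3848225.
Together with the faulting mov (%ebx),%ebx and ebx = 00000000, that is
consistent with a loop following a pointer that turned out to be NULL,
though this is an inference from the disassembly, not something confirmed
in the thread.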
