Problem with usb-storage using HP 8200 external CD-ROM burner

Problem with usb-storage using HP 8200 external CD-ROM burner

Post by Brent Baccal » Fri, 03 Aug 2001 16:10:09



Hi -

I've been having some problems with my USB CD-ROM burner (HP 8200) - it
locks up the machine occasionally.  This is with Linux 2.4.6.

I saw a posting to this list (linux-usb-devel) by Cody Pisto with some
HP
8200 patches (17 Jul 2001), but the geocrawler archive doesn't archive
the
attachments - or at least I couldn't figure out how to retrieve them.  I
wouldn't mind if somebody (Cody?) could send me a copy, but here's what
I've
found out...

Of course, first you have to patch drivers/usb/Config.in to add an
HP8200 option.

Next, the lockups.  They're caused by an attempt to lock the
io_request_lock spinlock while it's already locked.  I'm running on a
single processor machine.  I'm posting two patches to linux-kernel - one
is enhanced spinlock debugging code that reveals the problem, the other
is a remote debugger stub for intel so I can use gdb on the kernel like
sparc & ppc.

In, short, here's the problem:

The trouble starts when scsi_try_to_abort_command is called (for reasons
I'm still unclear on).  This function is in drivers/scsi/scsi_error.c:

     755 STATIC int scsi_try_to_abort_command(Scsi_Cmnd * SCpnt, int
timeout)
     756 {
     757         int rtn;
     758         unsigned long flags;
     759
     760         SCpnt->eh_state = FAILED;   /* Until we come up with
something better */
     761
     762         if (SCpnt->host->hostt->eh_abort_handler == NULL) {
     763                 return FAILED;
     764         }
     765         /*
     766          * scsi_done was called just after the command timed
out and before
     767          * we had a chance to process it. (DB)
     768          */
     769         if (SCpnt->serial_number == 0)
     770                 return SUCCESS;
     771
     772         SCpnt->owner = SCSI_OWNER_LOWLEVEL;
     773
     774         spin_lock_irqsave(&io_request_lock, flags);
     775         rtn = SCpnt->host->hostt->eh_abort_handler(SCpnt);
     776         spin_unlock_irqrestore(&io_request_lock, flags);
     777         return rtn;
     778 }

Notice, at the end, the call to eh_abort_handler while the
io_request_lock is held (lines 774-777).

For the USB CD-ROM burner, eh_abort_handler is a pointer to
command_abort in drivers/usb/storage/scsiglue.c:

     173 static int command_abort( Scsi_Cmnd *srb )
     174 {
     175         struct us_data *us = (struct us_data
*)srb->host->hostdata[0];
     176
     177         US_DEBUGP("command_abort() called\n");
     178
     179         /* if we're stuck waiting for an IRQ, simulate it */
     180         if (atomic_read(us->ip_wanted)) {
     181                 US_DEBUGP("-- simulating missing IRQ\n");
     182                 up(&(us->ip_waitq));
     183         }
     184
     185         /* if the device has been removed, this worked */
     186         if (!us->pusb_dev) {
     187                 US_DEBUGP("-- device removed already\n");
     188                 return SUCCESS;
     189         }
     190
     191         /* if we have an urb pending, let's wake the control
thread up */
     192         if (us->current_urb->status == -EINPROGRESS) {
     193                 /* cancel the URB -- this will automatically
wake the thread */
     194                 usb_unlink_urb(us->current_urb);
     195
     196                 /* wait for us to be done */
     197                 down(&(us->notify));
     198                 return SUCCESS;
     199         }
     200
     201         US_DEBUGP ("-- nothing to abort\n");
     202         return FAILED;
     203 }

The problem is the down on line 197.  It causes the kernel to schedule
while the io_request_lock is held.  Now, if anything else comes along
that needs the io_request_lock, and runs before the down completes, the
kernel locks up.  Lots of stuff can actually trigger the lockup; I've
seen a page fault trying to read something in from disk cause it, as
well as just a normal disk read from user space.

I'm attaching a kernel gdb trace of one of these lockups.  It's a bit
cryptic, because the kernel gdb doesn't let me switch between tasks, so
I have to read back through a stack dump manually.  Basically, the trace
starts with a BUG() in my revised spinlock code that detects when the
same processor that holds the lock attempts to grab it again.  The
spinlock recorded the PC and task_struct when the lock was first
grabbed, so even though we're looking at the moment when the second task
came along and tried to grab it again, we can use the stored information
to find 1) which task grabbed the lock; 2) what it's PC counter was when
it grabbed it; and 3) (by reading the stack trace) what's it's doing
now.  In this trace, the answers to those questions are: 1) pid 1370
(comm = "scsi_eh"; unclear what that is); 2) the spinlock in
scsi_try_to_abort_command; and 3) scheduled from the down on line 197

Anyway, I don't know enough about this code to try and figure what the
fix should be, so maybe somebody on this list can suggest it.  Then
I'll need to figure out why scsi_try_to_abort_command() is being called
in the first place - any ideas?  It seems to be about a 50/50
proposition that during an entire CD burn, sometimes it locks up, and
sometimes it completes the whole thing.

And like I said, I'm attaching the kernel gdb trace... as an
attachment... so geocrawler can lose it too..

--
                                        -bwb

                                        Brent Baccala
                                        bacc...@freesoft.org

==============================================================================
       For news from freesoft.org, subscribe to annou...@freesoft.org:

mailto:announce-requ...@freesoft.org?subject=subscribe&body=subscribe
==============================================================================

[ kgdb.trace 41K ]
[baccala@y2k intel:linux-2.4.6-kgdb]$ gdb vmlinux
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) target remote /dev/cua0
Remote debugging using /dev/cua0
breakpoint () at i386-stub.c:622
622     }
(gdb) cont
Continuing.

Program received signal SIGILL, Illegal instruction.
0xc01b76d9 in blk_get_queue (dev=769)
    at /home/baccala/src/linux-2.4.6-kgdb/include/asm/spinlock.h:103
103                     BUG();
(gdb) print io_request_lock
$1 = {lock = 0, magic = 3735899821, last_lock_addr = 0xc01e9c60,
  last_lock_current = 0xc135a000, last_lock_processor = 0}
(gdb) print io_request_lock->last_lock_addr
$2 = (void *) 0xc01e9c60
(gdb) list *io_request_lock->last_lock_addr
0xc01e9c60 is in scsi_try_to_abort_command (/home/baccala/src/linux-2.4.6-kgdb/include/asm/spinlock.h:97).
92      static inline void spin_lock(spinlock_t *lock)
93      {
94      #if SPINLOCK_DEBUG
95              __label__ here;
96      here:
97              if (lock->magic != SPINLOCK_MAGIC) {
98      printk("eip: %p\n", &&here);
99                      BUG();
100             }
101             if (spin_is_locked(lock)
(gdb) print io_request_lock->last_lock_current
$3 = (void *) 0xc135a000
(gdb) print (struct task_struct *)io_request_lock->last_lock_current
$4 = (struct task_struct *) 0xc135a000
(gdb) print ((struct task_struct *)io_request_lock->last_lock_current)->thread
$5 = {esp0 = 3241525248, eip = 3222356453, esp = 3241524852, fs = 24, gs = 24,
  debugreg = {0, 0, 0, 0, 0, 0, 0, 0}, cr2 = 0, trap_no = 0, error_code = 0,
  i387 = {fsave = {cwd = -64641, swd = -65536, twd = -1, fip = 0, fcs = 0,
      foo = 0, fos = -65536, st_space = {0 <repeats 16 times>, -2146699776,
        16405, 0, 0}, status = 0}, fxsave = {cwd = 895, swd = 65535, twd = 0,
      fop = 65535, fip = -1, fcs = 0, foo = 0, fos = 0, mxcsr = -65536,
      reserved = 0, st_space = {0 <repeats 15 times>, -2146699776, 16405,
        0 <repeats 15 times>}, xmm_space = {0 <repeats 32 times>}, padding = {
        0 <repeats 56 times>}}, soft = {cwd = -64641, swd = -65536, twd = -1,
      fip = 0, fcs = 0, foo = 0, fos = -65536, st_space = {
        0 <repeats 16 times>, -2146699776, 16405, 0, 0}, ftop = 0 '\000',
      changed = 0 '\000', lookahead = 0 '\000', no_update = 0 '\000',
      rm = 0 '\000', alimit = 0 '\000', info = 0x0, entry_eip = 0}},
  vm86_info = 0x0, screen_bitmap = 0, v86flags = 0, v86mask = 0, v86mode = 0,
  saved_esp0 = 0, ioperm = 0, io_bitmap = {4294967295, 0 <repeats 32 times>}}
(gdb) print ((struct task_struct *)io_request_lock->last_lock_current)->thread->eip
$6 = 3222356453
(gdb) printf "%x\n", ((struct task_struct *)io_request_lock->last_lock_current)->thread->eip
c01141e5
(gdb) list *((struct task_struct *)io_request_lock->last_lock_current)->thread->eip
0xc01141e5 is in schedule (sched.c:669).
664    
665             /*
666              * This just switches the register state and the
667              * stack.
668              */
669             switch_to(prev, next, prev);
670             __schedule_tail(prev);
671    
672     same_process:
673             reacquire_kernel_lock(current);
(gdb) printf "%x\n", ((struct task_struct *)io_request_lock->last_lock_current)->thread->esp
c135be74
(gdb) x/64x  ((struct task_struct *)io_request_lock->last_lock_current)->thread->esp
0xc135be74:     0xc135bef0      0xc24ed7a0      0x00000000      0xc135a000
0xc135be84:     0xc14469dc      0xc14469e4      0xc24ed7a0      0x44505c49
0xc135be94:     0x00000092      0xc1649ec0      0xc5f3bf78      0xc5f69000
0xc135bea4:     0x55665351      0x00000083      0xc1649ec0      0xc5f3bf78
0xc135beb4:     0xc5f69000      0xc5f69098      0xc0312000      0xc02fbb40
0xc135bec4:     0x00000083      0xc135be00      0xc135a000      0x00000000
0xc135bed4:     0xc135a000      0xc14469dc      0xc135a000      0xfffffc18
0xc135bee4:     0x00000000      0xc135a000      0xc0353040      0xc135bf1c
0xc135bef4:     0xc0105d1d      0xc1446800      0xc03071c8      0x00000206
0xc135bf04:     0xc14469e4      0xc135bf0c      0x00000001      0xc135a000
0xc135bf14:     0xc14469f8      0xc14469f8      0xc135bf3c      0xc01061e4
0xc135bf24:     0xc14469dc      0xc5f69000      0x00000000      0xc02848c3
0xc135bf34:     0xc1649ec0      0xc135a000      0xc135bf58      0xc01e9cf0
0xc135bf44:     0xc1975400      0xc4db74a0      0xc135bf84      0xc1975400
0xc135bf54:     0xc01060a1      0xc135bf88      0xc01ea614      0xc1975400
0xc135bf64:     0x000005dc      0xc4db74a0      0xc135a000      0xc02fb2a0
(gdb) print *((struct task_struct *)io_request_lock->last_lock_current)
$7 = {state = 2, flags = 64, sigpending = 0, addr_limit = {seg = 4294967295},
  exec_domain = 0xc02fbc60, need_resched = 0, ptrace = 0, lock_depth = -1,
  counter = 11, nice = 0, policy = 0, mm = 0x0, has_cpu = 0, processor = 0,
  cpus_allowed = 4294967295, run_list = {next = 0x0, prev = 0xc02fbb40},
  sleep_time = 12446, next_task = 0xc14dc000, prev_task = 0xc135c000,
  active_mm = 0x0, binfmt = 0x0, exit_code = 0, exit_signal = 0,
  pdeath_signal = 0, personality = 0, dumpable = 0, did_exec = 0, pid = 1370,
  pgrp = 1, tty_old_pgrp = 0, session = 1, tgid = 1370, leader = 0,
  p_opptr = 0xc5f52000, p_pptr = 0xc5f52000, p_cptr = 0x0, p_ysptr = 0x0,
  p_osptr = 0xc135c000, thread_group = {next = 0xc135a098, prev = 0xc135a098},
  pidhash_next = 0x0, pidhash_pprev = 0xc037493c, wait_chldexit = {lock = {
      lock = 1, magic = 3735899821, last_lock_addr = 0x0,
      last_lock_current = 0x0, last_lock_processor = 0}, task_list = {
      next = 0xc135a0bc, prev = 0xc135a0bc}}, vfork_sem = 0x0,
  rt_priority = 0, it_real_value = 0, it_prof_value = 0, it_virt_value = 0,
  it_real_incr = 0, it_prof_incr = 0, it_virt_incr = 0, real_timer = {list = {
      next = 0x0, prev = 0x0}, expires = 0, data = 3241517056,
    function = 0xc011d648 <it_real_fn>}, times = {tms_utime = 0,
    tms_stime = 0, tms_cutime = 0, tms_cstime = 0}, start_time = 6420,
  per_cpu_utime = {0 <repeats 32 times>}, per_cpu_stime = {
    0 <repeats 32 times>}, min_flt = 0, maj_flt = 0, nswap = 0, cmin_flt = 0,
  cmaj_flt = 0, cnswap = 0, swappable = -1, uid = 0, euid = 0, suid = 0,
  fsuid = 0, gid = 0, egid = 0, sgid = 0, fsgid = 0, ngroups = 0, groups = {
---Type <return> to continue, or q <return> to quit---
    0 <repeats 32 times>}, cap_effective = 4294967039, cap_inheritable = 0,
  cap_permitted = 4294967295, keep_capabilities = 0, user = 0xc02fc9d4,
  rlim = {{rlim_cur = 4294967295, rlim_max = 4294967295}, {
      rlim_cur = 4294967295, rlim_max = 4294967295}, {rlim_cur = 4294967295,
      rlim_max = 4294967295}, {rlim_cur = 8388608, rlim_max = 4294967295}, {
      rlim_cur = 0, rlim_max = 4294967295}, {rlim_cur = 4294967295,
      rlim_max = 4294967295}, {rlim_cur = 3072, rlim_max = 3072}, {
      rlim_cur = 1024, rlim_max = 1024}, {rlim_cur = 4294967295,
      rlim_max = 4294967295}, {rlim_cur = 4294967295, rlim_max = 4294967295}, {
      rlim_cur = 4294967295, rlim_max = 4294967295}}, used_math = 1,
  comm = "scsi_eh_0\000\000\000\000\000\000", link_count = 0, tty = 0x0,
  locks = 0, semundo = 0x0, semsleeping = 0x0, thread = {esp0 = 3241525248,
    eip = 3222356453, esp = 3241524852, fs = 24, gs = 24, debugreg = {0, 0, 0,
      0, 0, 0, 0, 0}, cr2 = 0, trap_no = 0, error_code = 0, i387 = {fsave = {
        cwd = -64641, swd = -65536, twd = -1, fip = 0, fcs = 0, foo = 0,
        fos = -65536, st_space = {0 <repeats 16 times>, -2146699776, 16405, 0,
          0}, status = 0}, fxsave = {cwd = 895, swd = 65535, twd = 0,
        fop = 65535, fip = -1, fcs = 0, foo = 0, fos = 0, mxcsr = -65536,
        reserved = 0, st_space = {0 <repeats 15 times>, -2146699776, 16405,
          0 <repeats 15 times>}, xmm_space = {0 <repeats 32 times>},
        padding = {0 <repeats 56 times>}}, soft = {cwd = -64641, swd = -65536,
        twd = -1, fip = 0, fcs = 0, foo = 0, fos = -65536, st_space = {
          0 <repeats 16 times>, -2146699776, 16405, 0, 0}, ftop = 0 '\000',
---Type <return> to continue, or q <return> to quit---
        changed = 0 '\000', lookahead = 0 '\000', no_update = 0 '\000',
        rm = 0 '\000', alimit = 0 '\000', info = 0x0, entry_eip = 0}},
    vm86_info = 0x0, screen_bitmap = 0, v86flags = 0, v86mask = 0,
    v86mode = 0, saved_esp0 = 0, ioperm = 0, io_bitmap = {4294967295,
      0 <repeats 32 times>}}, fs = 0xc02f9440, files = 0xc02f9480,
  sigmask_lock = {lock = 1, magic = 3735899821, last_lock_addr = 0x0,
    last_lock_current = 0x0, last_lock_processor = 0}, sig = 0xc145a080,
  blocked = {sig = {4294967294, 4294967295}}, pending = {head = 0x0,
    tail = 0xc135a668, signal = {sig = {0, 0}}}, sas_ss_sp = 0,
  sas_ss_size = 0, notifier = 0, notifier_data = 0x0, notifier_mask = 0x0,
  parent_exec_id = 0, self_exec_id = 0, alloc_lock = {lock = 1,
    magic = 3735899821, last_lock_addr = 0xc0114200,
    last_lock_current = 0xc0312000, last_lock_processor = 0}}
(gdb) list *0xc0105d1c
0xc0105d1c is in __down (semaphore.c:80).
75                              break;
76                      }
77                      sem->sleepers = 1;   /* us - see -1 above */
78                      spin_unlock_irq(&semaphore_lock);
79      
80                      schedule();
81                      tsk->state = TASK_UNINTERRUPTIBLE;
82                      spin_lock_irq(&semaphore_lock);
83              }
84              spin_unlock_irq(&semaphore_lock);
(gdb) list *0xc01061e4
0xc01061e4 is at af_packet.c:1878.
1873    {
1874            remove_proc_entry("net/packet", 0);
1875            unregister_netdevice_notifier(&packet_netdev_notifier);
1876            sock_unregister(PF_PACKET);
1877            return;
1878    }
1879    
1880    static int __init packet_init(void)
1881    {
1882            sock_register(&packet_family_ops);
(gdb) list *0xc01e9cf0
0xc01e9cf0 is in scsi_try_to_abort_command (scsi_error.c:775).
770                     return SUCCESS;
771    
772             SCpnt->owner = SCSI_OWNER_LOWLEVEL;
773    
774             spin_lock_irqsave(&io_request_lock, flags);
775             rtn = SCpnt->host->hostt->eh_abort_handler(SCpnt);
776             spin_unlock_irqrestore(&io_request_lock, flags);
777             return rtn;
778     }
779    
(gdb) list *0xc0219f29
0xc0219f29 is in command_abort (scsiglue.c:198).
193                     /* cancel the URB -- this will automatically wake the thread */
194                     usb_unlink_urb(us->current_urb);
195    
196                     /* wait for us to be done */
197            
...

read more »

 
 
 

Problem with usb-storage using HP 8200 external CD-ROM burner

Post by Brent Baccal » Wed, 08 Aug 2001 05:00:09



> Brent --

> As the module maintainer, I'm very intereted in your analysis.....

> Of course, I'm interested in knowing how the command_abort function can be
> made safe -- I think there are already patches in the 2.4.8 kernel which
> should fix the cause of this function getting called.

> Any ideas on how to fix this issue?

Well, what comes to mind immediately is two things.

First, does scsiglue.c's abort_command really need to handshake with the
code in usb.c?  If not, just get rid of the down and its matching up.

Second, this code (in scsi_error.c):

      774         spin_lock_irqsave(&io_request_lock, flags);
      775         rtn = SCpnt->host->hostt->eh_abort_handler(SCpnt);
      776         spin_unlock_irqrestore(&io_request_lock, flags);

seems like a real shotgun approach.  Get rid of the spinlock stuff, and
make sure that the abort handlers lock io_request_lock themselves if
they need it.  Of course, this would require changes to all the scsi
drivers.

I don't work with the kernel that much, so really I'm hoping somebody
else can suggest the fix - that's why I posted it in the first place.
I'll cc this to the mailing lists, in the hope that somebody will have
an idea.

--
                                        -bwb

                                        Brent Baccala

==============================================================================


==============================================================================
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

Problem with usb-storage using HP 8200 external CD-ROM burner

Post by Matthew Dhar » Wed, 08 Aug 2001 12:20:07




> > Of course, I'm interested in knowing how the command_abort function can be
> > made safe -- I think there are already patches in the 2.4.8 kernel which
> > should fix the cause of this function getting called.

> > Any ideas on how to fix this issue?

> Well, what comes to mind immediately is two things.

> First, does scsiglue.c's abort_command really need to handshake with the
> code in usb.c?  If not, just get rid of the down and its matching up.

Unfortunately, it does.  The SCSI layer seems to believe that once we've
returned from the abort_command() routine, the driver is in a position to
accept a new command.  Thus, some level of handshaking is necessary.

Quote:> Second, this code (in scsi_error.c):

>       774         spin_lock_irqsave(&io_request_lock, flags);
>       775         rtn = SCpnt->host->hostt->eh_abort_handler(SCpnt);
>       776         spin_unlock_irqrestore(&io_request_lock, flags);

> seems like a real shotgun approach.  Get rid of the spinlock stuff, and
> make sure that the abort handlers lock io_request_lock themselves if
> they need it.  Of course, this would require changes to all the scsi
> drivers.

Hrm... perhaps I could just unlock that spinlock and then re-lock it before
returning.  Anyone have a clue if this would work?

Matt

--

Maintainer, Linux USB Mass Storage Driver

Would you mind not using our Web server? We're trying to have a game of
Quake here.
                                        -- Greg
User Friendly, 5/11/1998

  application_pgp-signature_part
< 1K Download
 
 
 

Problem with usb-storage using HP 8200 external CD-ROM burner

Post by Jens Axbo » Wed, 08 Aug 2001 16:40:06



> >       774         spin_lock_irqsave(&io_request_lock, flags);
> >       775         rtn = SCpnt->host->hostt->eh_abort_handler(SCpnt);
> >       776         spin_unlock_irqrestore(&io_request_lock, flags);

> > seems like a real shotgun approach.  Get rid of the spinlock stuff, and
> > make sure that the abort handlers lock io_request_lock themselves if
> > they need it.  Of course, this would require changes to all the scsi
> > drivers.

> Hrm... perhaps I could just unlock that spinlock and then re-lock it before
> returning.  Anyone have a clue if this would work?

That would work -- stuff like the above is already scheduled for removal
for 2.5. Locking will be moved from the mid layer to the drivers
themselves.

--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

Problem with usb-storage using HP 8200 external CD-ROM burner

Post by Brent Baccal » Thu, 09 Aug 2001 00:20:06




> > >       774         spin_lock_irqsave(&io_request_lock, flags);
> > >       775         rtn = SCpnt->host->hostt->eh_abort_handler(SCpnt);
> > >       776         spin_unlock_irqrestore(&io_request_lock, flags);

> > > seems like a real shotgun approach.  Get rid of the spinlock stuff, and
> > > make sure that the abort handlers lock io_request_lock themselves if
> > > they need it.  Of course, this would require changes to all the scsi
> > > drivers.

> > Hrm... perhaps I could just unlock that spinlock and then re-lock it before
> > returning.  Anyone have a clue if this would work?

> That would work -- stuff like the above is already scheduled for removal
> for 2.5. Locking will be moved from the mid layer to the drivers
> themselves.

If that's the case (the locking will be moved in 2.5), then I'd suggest
using Matthew's idea of unlocking, then re-locking the spinlock, as a
temporary measure.

--
                                        -bwb

                                        Brent Baccala

==============================================================================


==============================================================================
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/