lock_kernel() usage and sync_*() functions

lock_kernel() usage and sync_*() functions

Post by Nigel Gambl » Fri, 23 Mar 2001 12:20:03



Why is the kernel lock held around sync_supers() and sync_inodes() in
sync_old_buffers() and fsync_dev(), but not in sync_dev()?  Is it just
to serialize calls to these functions, or is there some other reason?

Since this use of the BKL is one of the causes of high preemption
latency in a preemptible kernel, I'm hoping it would be OK to replace
them with a semaphore.  Please let me know if this is not the case.

Thanks!


Mountain View, CA, USA.                         http://www.nrg.org/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

lock_kernel() usage and sync_*() functions

Post by Linus Torval » Fri, 23 Mar 2001 13:00:04




Quote:

>Why is the kernel lock held around sync_supers() and sync_inodes() in
>sync_old_buffers() and fsync_dev(), but not in sync_dev()?  Is it just
>to serialize calls to these functions, or is there some other reason?

A lot of the FS locks need the kernel lock and are not SMP-safe on their
own.  Look at "lock_super()" for the worst offender (I think most of the
other ones have been converted to properly lock on SMP).

sync_inodes() _shouldn't_ need it. sync_supers() definitely does.

The fact that sync_dev() doesn't get the kernel lock looks worrisome.
Of course, I don't think much of anything actually _uses_ "sync_dev()"
anyway (quick grep shows it up in revalidate, which gets the kernel lock
earlier)

But it might be a good idea to try to (a) remove the bkl from the
functions, and push it down into sync_supers() that definitely needs it
now (and remove it when it doesn't any more).

The long-term plan (ie 2.5.x) is to basically remove the bkl from all
the VFS interfaces. For 2.4.x, only the truly performance-critical stuff
was done (ie mainly name lookup and read/write page).

                Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

lock_kernel() usage and sync_*() functions

Post by Alexander Vir » Fri, 23 Mar 2001 14:40:03





> >Why is the kernel lock held around sync_supers() and sync_inodes() in
> >sync_old_buffers() and fsync_dev(), but not in sync_dev()?  Is it just
> >to serialize calls to these functions, or is there some other reason?

> A lot of the FS locks need the kernel lock and are not SMP-safe on their
> own.  Look at "lock_super()" for the worst offender (I think most of the
> other ones have been converted to properly lock on SMP).

Fixed in namespaces patch.

Quote:> sync_inodes() _shouldn't_ need it. sync_supers() definitely does.

Unfortunately, sync_inodes() involves scanning the list of superblocks.
And that's a source of big PITA.

Quote:> The fact that sync_dev() doesn't get the kernel lock looks worrisome.
> Of course, I don't think much of anything actually _uses_ "sync_dev()"
> anyway (quick grep shows it up in revalidate, which gets the kernel lock
> earlier)

sync_dev() is called only under BKL, AFAICS.

Quote:> But it might be a good idea to try to (a) remove the bkl from the
> functions, and push it down into sync_supers() that definitely needs it
> now (and remove it when it doesn't any more).

Again, done in namespaces patch (->s_lock is semaphore there).

Quote:> The long-term plan (ie 2.5.x) is to basically remove the bkl from all
> the VFS interfaces. For 2.4.x, only the truly performance-critical stuff
> was done (ie mainly name lookup and read/write page).

Ehh... Linus, the main problem is in get_super(). Want a nice race?
sys_ustat() vs. sys_umount(). The former does get_super(), finds
struct super_block and does nothing to guarantee that it will stay.
Then it calls ->statfs(). In the meanwhile, you umount the thing
and do rmmod. Oops..

We need to refcount the struct super_block. I went for the following:
rw-semaphore ->s_umount protects the superblock "contents", ->s_count
is a refcount.
        get_super() grabs a spinlock, looks through the list of
superblocks and increments s_count before dropping the spinlock.
Then it does down_read() on ->s_umount and checks ->s_root once it
got the semaphore. non-NULL - OK, NULL - repeat the search.
        drop_super() - read_up() and atomic_dec_and_test(), followed
by kfree().
        kill_super() - grab the ->s_umount for write(), remove from
list and set ->s_root to NULL while we are holding the ->s_umount and
only then drop it.
        ->s_count gets contributions from each get_super() (temp.
reference) _and_ from having vfsmounts over that superblock. Same
scheme as with mm_struct - it's a number of temp refs + one more if
there are perm. ones.

        It works (I'm running such kernel for more than a month on
my boxen) and I'm going to show this beast in details in San Jose.
Patch is about 150K unpacked, but most of that stuff is in cleanup
of last stages of boot sequence and general cleanup of fs/super.c,
so we could deal with get_super() races with smaller patch. However,
it requires changes to drivers - get_super() needs to be balanced.
Sorry - no way around tha, AFAICS.

        I started with adding
void invalidate_dev(kdev_t dev, int sync_flag)
{
        struct super_block *sb = get_super(dev);
        if (sync_flag == 1)
                sync_dev(dev);
        else if (sync_flag == 2)
                fsync_dev(dev);
        if (sb) {
                invalidate_inodes(sb);
                /* drop_super(sb); here */
        }
        invalidate_buffers(dev);

Quote:}

in fs/buffer.c and converted drivers to that - all uses of get_super()
are in the kernel proper, so after that it was easier to fix without
excessive pain. We could do the same in the main tree as the first
step of get_super() fixes. Mind if I submit such patch?

                                                Cheers,
                                                        Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

lock_kernel() usage and sync_*() functions

Post by Ingo Oese » Sat, 24 Mar 2001 00:10:03



>    I started with adding
> void invalidate_dev(kdev_t dev, int sync_flag)
> {
>         struct super_block *sb = get_super(dev);
>         if (sync_flag == 1)
>                 sync_dev(dev);
>         else if (sync_flag == 2)
>                 fsync_dev(dev);
>         if (sb) {
>                 invalidate_inodes(sb);
>                 /* drop_super(sb); here */
>         }
>         invalidate_buffers(dev);
> }

Could we remove the "magic" sync_flag from the exported interface?

Do sth. like renaming your invalidate_dev() to
_invalidate_dev() and adding 3 defines:

#define invalidate_dev(dev) _invalidate_dev(dev,0)
#define invalidate_dev_sync(dev) _invalidate_dev(dev,1)
#define invalidate_dev_fsync(dev) _invalidate_dev(dev,2)

This would make it quite clear, what will be done.

AFAIR Linus dosn't like these magic numers either, right?

Regards

Ingo Oeser
--
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
         <<<<<<<<<<<<     been there and had much fun   >>>>>>>>>>>>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

lock_kernel() usage and sync_*() functions

Post by Alexander Vir » Sat, 24 Mar 2001 02:30:06



> Could we remove the "magic" sync_flag from the exported interface?

Sure. But I seriously suspect that sync_dev() is wrong in 100% of cases.
So "flag" is eventually going to become "do we want to sync it or not?"
thing. However, I don't want to deal with that sort of analysis right now -
callers are in drivers/* and we are in even branch.

Quote:> Do sth. like renaming your invalidate_dev() to
> _invalidate_dev() and adding 3 defines:

> #define invalidate_dev(dev) _invalidate_dev(dev,0)
> #define invalidate_dev_sync(dev) _invalidate_dev(dev,1)
> #define invalidate_dev_fsync(dev) _invalidate_dev(dev,2)

> This would make it quite clear, what will be done.

> AFAIR Linus dosn't like these magic numers either, right?

I also don't like them. I _don't_ believe that magic #defines are
any better, though. And I would rather localize the get_super() to
kernel proper preserving the current behaviour and left dealing
with the sync vs. fsync to 2.5. It's easy to grep and if my gut
feeling is correct your invalidate_dev_sync() is going to be a ballast.

Again, for 2.4 I would rather do a change that obviously doesn't
change behaviour of drivers, doesn't add functions without need
and is easy to review once drivers become a fair game again (== in 2.5).
Comments?
                                                        Cheers,
                                                                Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/