Rusty's module talk at the Kernel Summit

Post by Rusty Russell » Fri, 12 Jul 2002 11:50:07



On Thu, 4 Jul 2002 10:24:11 -0700

>    The system that I am composing this email on has 1.1MB of
> modules and does not have sound drivers loaded.  It has ipv4 and a
> number of other facilities modularized that are not modules in the
> stock kernels.  Every system that I use has a configuration like this.
> With a lower per-module overhead, I would be more inclined to try to
> modularize other facilities and break up some larger modules into
> smaller ones, in the case where there is substantial code that is not
> needed for some configurations.

For God's sake, WHY?  Look at what you're doing to your TLB (and if you
made IPv4 a removable module, I'll bet real money you have a bug unless
you are *very* *very* clever).

Modules are not "free".  Sorry.
Rusty.
--
   there are those who do and those who hang on and you don't see too
   many doers quoting their contemporaries.  -- Larry McVoy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Rusty's module talk at the Kernel Summit

Post by Arnaldo Carvalho de Melo » Fri, 12 Jul 2002 12:00:08


On Thu, Jul 11, 2002 at 12:48:30PM +1000, Rusty Russell wrote:

> On Thu, 4 Jul 2002 10:24:11 -0700

> > smaller ones, in the case where there is substantial code that is not
> > needed for some configurations.

> For God's sake, WHY?  Look at what you're doing to your TLB (and if you
> made IPv4 a removable module, I'll bet real money you have a bug unless
> you are *very* *very* clever).

> Modules are not "free".  Sorry.

What about Andi Kleen's patch to not use vmalloc (well, vmalloc is used as a
fallback) when loading modules but to use big pages instead?  It is being
integrated in 2.4.20-pre, IIRC.  IIRC there are still some issues with it, so
for enlightening the audience here, could you share your view on that patch? 8)

And for _debugging_ IPv4 maybe the modularisation, if Adam was clever, could
help somewhat.

- Arnaldo (who is stupid not to be using UML extensively, but this
           will change RSN 8) )

Rusty's module talk at the Kernel Summit

Post by David S. Miller » Fri, 12 Jul 2002 12:00:08



   Date: Thu, 11 Jul 2002 12:48:30 +1000

   For God's sake, WHY?  Look at what you're doing to your TLB (and if you
   made IPv4 a removable module, I'll bet real money you have a bug unless
   you are *very* *very* clever).

Modules can be mapped using a large PTE mapping.
I've been meaning to do this on sparc64 for a long
time.

So this TLB argument alone is not sufficient :-)
I do concur on the "ipv4 as module is difficult to
get correct" argument however.

Rusty's module talk at the Kernel Summit

Post by Arnaldo Carvalho de Melo » Fri, 12 Jul 2002 12:10:07


On Wed, Jul 10, 2002 at 11:55:04PM -0300, Arnaldo C. Melo wrote:

> On Thu, Jul 11, 2002 at 12:48:30PM +1000, Rusty Russell wrote:
> And for _debugging_ IPv4 maybe the modularisation, if Adam was clever, could
> help somewhat.

BTW, where are these patches for IPv4 modularisation? I'd love to take a look
and try it... Adam? Is it available for 2.5.latest?

- Arnaldo

Rusty's module talk at the Kernel Summit

Post by Alexander Viro » Fri, 12 Jul 2002 12:40:04




>    Date: Thu, 11 Jul 2002 12:48:30 +1000

>    For God's sake, WHY?  Look at what you're doing to your TLB (and if you
>    made IPv4 a removable module, I'll bet real money you have a bug unless
>    you are *very* *very* clever).

> Modules can be mapped using a large PTE mapping.
> I've been meaning to do this on sparc64 for a long
> time.

> So this TLB argument alone is not sufficient :-)
> I do concur on the "ipv4 as module is difficult to
> get correct" argument however.

Sure, but consider the number of tricky modules and the number of easy ones.
net/ipv4/*.c _is_ tricky; so much so that having a system with many parts of
such complexity would be extremely painful.

IOW, yes, we have some very tricky interfaces between the parts of kernel;
and their trickiness alone guarantees that we don't want to have them
breeding.  Stuff that genuinely needs complex interfaces is *not* something
you want to be mass-produced.

Do we need to disable rmmod when
        a) 90-odd percent of modules can be handled safely and
        b) any module that wants to prevent rmmod on itself can do that
with one line in its init_module() (add MOD_INC_USE_COUNT; and that's it)?
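A userspace model of the use count behind point (b) may help. It assumes rmmod refuses to unload while the count is non-zero, so a module that bumps its own count once at init time can never be removed. `struct mod`, `mod_get()`, `mod_put()`, `mod_can_unload()` and `pinned_init()` are hypothetical names for this sketch, not the 2.4/2.5 kernel API; only the role of `MOD_INC_USE_COUNT` comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a module's bookkeeping; not the kernel's struct module. */
struct mod {
    const char *name;
    int use_count;   /* roughly what MOD_INC_USE_COUNT bumped in 2002-era kernels */
};

static void mod_get(struct mod *m) { m->use_count++; }

static void mod_put(struct mod *m)
{
    assert(m->use_count > 0);   /* an unbalanced put is a bug */
    m->use_count--;
}

/* Model of rmmod's check: unloading is only allowed with zero users. */
static bool mod_can_unload(const struct mod *m) { return m->use_count == 0; }

/* A module that never wants to be removed pins itself once at init time,
 * the equivalent of a single MOD_INC_USE_COUNT in init_module(). */
static int pinned_init(struct mod *self)
{
    mod_get(self);   /* never dropped, so mod_can_unload() stays false */
    return 0;
}
```

The point of the model is that the "no rmmod" policy lives entirely inside the module that wants it; the core unload check does not need to know about it.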

Notice that a generic netfilter module and, say, a driver that provides
a character device are very different beasts.  The latter can be easily
handled in a safe way; it has a simple use model and very few places in
core code that need to take care of things - at once for all such
modules.  The former is much trickier.  The thing is, there are
hundreds of simple modules and a dozen or so tricky ones.  And as
time goes on the ratio will only increase, presuming that we want some
sanity for the tree.  With complex interfaces .text is not the only
thing that needs nontrivial protection, to put it mildly.

I'd rather get the simple (== large) classes into decent shape and then
deal with what's left.  FVO "deal" possibly including "no rmmod for these
guys".


Rusty's module talk at the Kernel Summit

Post by Cort Dougan » Fri, 12 Jul 2002 13:10:05


Large PTEs aren't free either, though.  Cheap enough to implement, but
there's some fragmentation that isn't easy to deal with in some
pathological cases.  The virtual space is pretty tight on some archs
already.

A lot of stock distributions load most drivers as modules so a machine well
stocked with devices may run into trouble.

} Modules can be mapped using a large PTE mapping.
} I've been meaning to do this on sparc64 for a long
} time.
}
} So this TLB argument alone is not sufficient :-)
} I do concur on the "ipv4 as module is difficult to
} get correct" argument however.

Rusty's module talk at the Kernel Summit

Post by Arnaldo Carvalho de Melo » Fri, 12 Jul 2002 13:30:07


On Wed, Jul 10, 2002 at 10:02:44PM -0600, Cort Dougan wrote:

> Large PTEs aren't free either, though.  Cheap enough to implement, but
> there's some fragmentation that isn't easy to deal with in some
> pathological cases.  The virtual space is pretty tight on some archs
> already.

> A lot of stock distributions load most drivers as modules so a machine well
> stocked with devices may run into trouble.

Yes, that is what I like about modules, both for general-purpose distros and
for debugging.

- Arnaldo

Rusty's module talk at the Kernel Summit

Post by Cort Dougan » Fri, 12 Jul 2002 14:00:05


I checked out the sparc64 PTE structure.  It's not the bozo design I
thought it was (like some PPCs).  If you can select power-of-2 sizes, my
concern is meaningless there, since you can pick the appropriate
granularity.

A genuine Bershad-esque superpages design would be perfect there.
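Why selectable power-of-2 sizes remove the concern can be illustrated by counting the TLB entries a mapping needs. The sketch below assumes page sizes of 8 KB, 64 KB, 512 KB and 4 MB (the sizes UltraSPARC-class TTEs offer) and greedily covers a region with the largest page that is aligned at the current address and does not overshoot the end. `tlb_entries()` is a hypothetical helper; a real allocator also has to find physically contiguous, suitably aligned memory, which is where the fragmentation mentioned above comes in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page sizes, largest first (UltraSPARC-style 4M/512K/64K/8K). */
static const uint64_t page_sizes[] = {
    4u << 20, 512u << 10, 64u << 10, 8u << 10
};
#define NSIZES (sizeof(page_sizes) / sizeof(page_sizes[0]))
#define BASE_PAGE (8u << 10)

/* Count TLB entries needed to map [addr, addr + len) using, at each step,
 * the largest page aligned at addr that fits in the remaining length.
 * addr and len must be multiples of the base page, so the 8K entry
 * always matches and the loop terminates. */
static size_t tlb_entries(uint64_t addr, uint64_t len)
{
    size_t n = 0;
    assert(addr % BASE_PAGE == 0 && len % BASE_PAGE == 0);
    while (len > 0) {
        for (size_t i = 0; i < NSIZES; i++) {
            uint64_t sz = page_sizes[i];
            if (addr % sz == 0 && sz <= len) {
                addr += sz;
                len -= sz;
                n++;
                break;
            }
        }
    }
    return n;
}
```

For a 1152 KB region starting at address 0 this gives 4 entries (512K + 512K + 64K + 64K), against 144 with 8 KB pages alone; a misaligned start, such as 64 KB beginning at 8 KB, falls back to eight 8 KB entries.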

} Large PTEs aren't free either, though.  Cheap enough to implement, but
} there's some fragmentation that isn't easy to deal with in some
} pathological cases.  The virtual space is pretty tight on some archs
} already.
}
} A lot of stock distributions load most drivers as modules so a machine well
} stocked with devices may run into trouble.
}
} } Modules can be mapped using a large PTE mapping.
} } I've been meaning to do this on sparc64 for a long
} } time.
} }
} } So this TLB argument alone is not sufficient :-)
} } I do concur on the "ipv4 as module is difficult to
} } get correct" argument however.

Rusty's module talk at the Kernel Summit

Post by Adam J. Richter » Fri, 12 Jul 2002 14:10:08



>On Thu, 4 Jul 2002 10:24:11 -0700

>>       The system that I am composing this email on has 1.1MB of
>> modules and does not have sound drivers loaded.  It has ipv4 and a
>> number of other facilities modularized that are not modules in the
>> stock kernels.  Every system that I use has a configuration like this.
>> With a lower per-module overhead, I would be more inclined to try to
>> modularize other facilities and break up some larger modules into
>> smaller ones, in the case where there is substantial code that is not
>> needed for some configurations.

>For God's sake, WHY?  Look at what you're doing to your TLB (and if you
>made IPv4 a removable module, I'll bet real money you have a bug unless
>you are *very* *very* clever).

        My motivation in modularizing ipv4 was to be able to squeeze more
drivers onto a boot floppy for CDs or hard disks and have that kernel
still be able to continue on to bring up networking later (and to avoid
maintaining a different kernel binary).  Ultimately, I would like to
see CONFIG_NET modularized, if only to reduce the time spent reading
the floppy.

        I have deliberately not fixed some reference count problems in
my ipv4.o module right now because I'm pretty sure lots of things would
break if I tried removing it.  I did write a module_exit function, but
I never tried turning off the reference counting and executing it.

        I also was under the impression that Dave Miller had a modularized
ipv4 in a "vger cvs" kernel (remember them?), so I assumed that some
modularization of ipv4 was working its way to Linus.

        About translation lookaside buffer (TLB) misses, I was considering
breaking down these large modules mostly after the optimizations that
I wishfully described later in my posting:

| Then somebody changes allocation of
| modules that are less than a page to use kmalloc(GFP_HIGHMEM) instead
| of vmalloc (~30% of modules on my system are already this small).
| Then somebody figures out a way to have vmalloc's larger than a page
| that do not need page alignment can sometimes start in the unused last
| page of another vmalloc.

        In that case, it becomes much more of an empirical question
whether eliminating large chunks of unused code brings the code that
does run into the same page more often than splitting the module
causes code that was in the same page to be split across two different
pages, especially if there is a reasonable chance that the code is
going to be loaded into a location that shares a page already
in the TLB.
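The trade-off can be estimated by counting the distinct pages a set of modules occupies. The sketch below compares page-aligned placement (each module starts on a fresh page, roughly what one vmalloc per module gives) with packed placement (each module starts where the previous one ended, as in the proposal quoted above). The 4 KB page size and the helper names are assumptions for illustration, and distinct pages are only a proxy for TLB pressure, not a measurement of it.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u   /* assumed base page size */

/* Pages spanned by [start, start + len); len must be non-zero. */
static size_t pages_spanned(size_t start, size_t len)
{
    assert(len > 0);
    return (start + len - 1) / PAGE_SIZE - start / PAGE_SIZE + 1;
}

/* Each module begins on its own page boundary (sizes must be non-zero). */
static size_t pages_aligned(const size_t *sizes, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += pages_spanned(0, sizes[i]);
    return total;
}

/* Modules laid end to end from address 0; distinct pages touched overall. */
static size_t pages_packed(const size_t *sizes, size_t n)
{
    size_t end = 0;
    for (size_t i = 0; i < n; i++)
        end += sizes[i];
    return end == 0 ? 0 : pages_spanned(0, end);
}
```

With module sizes of 1000, 1500, 3000 and 5000 bytes, page-aligned placement touches 5 pages and packed placement touches 3. Whether the hot code of each module lands in those shared pages is the empirical part the model cannot answer.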

        Come to think of it, if modules do not have to occupy full pages,
you could perhaps add a "module affinity" so that modules that reference
each other would be more likely to end up sharing a page.  Module loading
happens tens of times a day, if that.  Inter-module calls can happen a
zillion times per second.  So, who knows, it might be worth the complexity,
and it could live in insmod.
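One way to make "module affinity" measurable is to count how many inter-module calls cross a page boundary for a given load order. In the sketch below, each module is reduced to its start address, modules are packed end to end in the chosen order, and each call edge carries a frequency. Everything here (the edge format, the start-address simplification, 4 KB pages) is a hypothetical model, not something insmod does; a loader could compare candidate orders with it and keep the cheapest.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u   /* assumed base page size */
#define MAX_MODS 16

struct call_edge {
    int caller, callee;       /* module indices */
    unsigned long freq;       /* e.g. calls per second */
};

/* Weighted count of calls whose caller and callee start on different pages
 * when modules are packed end to end in `order`, which must be a
 * permutation of 0..n-1. */
static unsigned long cross_page_calls(const size_t *sizes, const int *order,
                                      size_t n, const struct call_edge *edges,
                                      size_t nedges)
{
    size_t start[MAX_MODS];
    size_t off = 0;
    unsigned long cost = 0;

    assert(n <= MAX_MODS);
    for (size_t i = 0; i < n; i++) {   /* lay modules out in load order */
        start[order[i]] = off;
        off += sizes[order[i]];
    }
    for (size_t e = 0; e < nedges; e++) {
        size_t a = start[edges[e].caller] / PAGE_SIZE;
        size_t b = start[edges[e].callee] / PAGE_SIZE;
        if (a != b)
            cost += edges[e].freq;
    }
    return cost;
}
```

With modules of 3000, 3000 and 1000 bytes and a hot call edge between modules 0 and 2, the order 0, 1, 2 puts that edge across a page boundary, while the order 0, 2, 1 keeps all three start addresses on page 0.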

        Dave Miller's proposal to use 4MB pages for modules is an
interesting alternative, but isn't kmalloc()'ed memory already
in the kernel's big page?  If so, then using that for small modules
would have the same effect for at least those modules, and I believe
that kmalloc is set up to handle up to 128kB.

Adam J. Richter     __     ______________   575 Oroville Road

+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

Rusty's module talk at the Kernel Summit

Post by Rusty Russell » Fri, 12 Jul 2002 14:20:04



> > So this TLB argument alone is not sufficient :-)
> > I do concur on the "ipv4 as module is difficult to
> > get correct" argument however.

> Sure, but consider the number of tricky modules and the number of easy ones.
> net/ipv4/*.c _is_ tricky; so much so that having a system with many parts of
> such complexity would be extremely painful.

> IOW, yes, we have some very tricky interfaces between the parts of kernel;
> and their trickiness alone guarantees that we don't want to have them
> breeding.  Stuff that genuinely needs complex interfaces is *not* something
> you want to be mass-produced.

Sure, if you want to reduce the problem space to "modules which are a
single fs/net/etc device driver" then we can *definitely* work
something out.  This works because they have such a narrow and
non-time-critical interface (who cares if we do a gratuitous
atomic_inc on every fs mount?).

To really get this to work well, you should make sure such modules
don't even need init and remove functions, by providing something
like:

        I_AM_A_FILESYSTEM_DRIVER("ramfs", ramfs_fs_type);
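Such a macro would let the boilerplate init and cleanup functions be generated instead of written by hand. A userspace sketch of the idea follows; the registry, `register_fs()`/`unregister_fs()` and the generated `<type>_init`/`<type>_exit` names are all hypothetical, standing in for `register_filesystem()`/`unregister_filesystem()` and the module entry points.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct fs_type {
    const char *name;
    struct fs_type *next;
};

static struct fs_type *fs_list;   /* hypothetical registry of filesystems */

static int register_fs(struct fs_type *fs)
{
    for (struct fs_type *p = fs_list; p; p = p->next)
        if (strcmp(p->name, fs->name) == 0)
            return -1;             /* name already taken */
    fs->next = fs_list;
    fs_list = fs;
    return 0;
}

static void unregister_fs(struct fs_type *fs)
{
    for (struct fs_type **pp = &fs_list; *pp; pp = &(*pp)->next)
        if (*pp == fs) {
            *pp = fs->next;
            return;
        }
}

/* Generates the init/exit pair a filesystem module would otherwise write by
 * hand. The object is declared first so the functions can refer to it, and
 * defined last so the caller's trailing semicolon ends that definition. */
#define I_AM_A_FILESYSTEM_DRIVER(fsname, type)                    \
    static struct fs_type type;                                   \
    static int type##_init(void) { return register_fs(&type); }   \
    static void type##_exit(void) { unregister_fs(&type); }       \
    static struct fs_type type = { .name = fsname }

I_AM_A_FILESYSTEM_DRIVER("ramfs", ramfs_fs_type);
```

With this shape the driver author writes one line; the registration and unregistration logic lives once, in the core.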

> I'd rather get the simple (== large) classes into decent shape and then
> deal with what's left.  FVO "deal" possibly including "no rmmod for these
> guys".

This was *entirely* my question at the Kernel Summit:

        Are modules first class citizens?
          Should everything be modular?
          What complexity are we prepared to pay?

We *can* do anything, up to and including modules which hand out
references to themselves in interrupt context, and dealing with the
race between "my module count is zero" and "oops, someone jumped in
before I had deactivated myself" without using try_inc_mod_count.
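For comparison, the existing answer to that race is a "get unless the module is going away" operation in the spirit of try_inc_mod_count. A userspace model with C11 atomics follows: the use count and the "going away" state share one atomic word, so the zero check and the transition to unloading happen in a single compare-and-swap. The names and the -1 sentinel are assumptions of this sketch, not the kernel's implementation, which used a spinlock and a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MOD_DYING (-1)   /* sentinel: unload has begun, no new users */

struct mod_ref {
    atomic_int count;    /* >= 0: number of users; MOD_DYING: going away */
};

/* Take a reference unless the module is already being unloaded. */
static bool mod_try_get(struct mod_ref *m)
{
    int c = atomic_load(&m->count);
    while (c != MOD_DYING) {
        /* On failure, c is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&m->count, &c, c + 1))
            return true;
    }
    return false;
}

static void mod_put(struct mod_ref *m)
{
    int old = atomic_fetch_sub(&m->count, 1);
    assert(old > 0);     /* put without a matching get, or after unload */
}

/* Succeeds only if there are no users; afterwards mod_try_get() fails.
 * The zero check and the switch to MOD_DYING are one atomic step. */
static bool mod_begin_unload(struct mod_ref *m)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&m->count, &expected, MOD_DYING);
}
```

Because "count is zero" and "no new users allowed" are a single compare-and-swap, no get can slip in between them; the cost is that every user must go through the fallible get.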

But *should* we?  The solution, for those with a strong stomach, looks
something like this:

        Each module implements: init(), start(), stop(), reinit(), destroy().
        Each registerable interface takes a "struct module *" parameter.
        Every call through a function ptr does "inc_mod_count(struct->module)"
                (Of course, if you make assumptions about a struct
                containing only functions from the same module or
                in-kernel ones, and knowing that some strategy
                functions are always called before others, you can
                optimize this).
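The third rule, pinning the owning module around every indirect call, might look like the following userspace model. Each registered ops structure records its owner, and a wrapper takes a reference before calling and drops it afterwards, failing if the owner is being unloaded. `struct owned_ops`, `call_read()`, `owner_get()`/`owner_put()` and `double_it()` are hypothetical; the count here is a plain int, so the sketch ignores the concurrency a real implementation has to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct module_model {
    int use_count;
    bool going;                   /* set once unload has started */
};

/* Every registerable interface carries its owner. */
struct owned_ops {
    struct module_model *owner;   /* NULL means built into the kernel */
    int (*read)(int arg);
};

static bool owner_get(struct module_model *m)
{
    if (m == NULL)
        return true;              /* built-in code cannot go away */
    if (m->going)
        return false;
    m->use_count++;
    return true;
}

static void owner_put(struct module_model *m)
{
    if (m != NULL) {
        assert(m->use_count > 0);
        m->use_count--;
    }
}

/* Pin the owner for the duration of the indirect call. */
static int call_read(const struct owned_ops *ops, int arg, int *result)
{
    if (!owner_get(ops->owner))
        return -1;                /* module is being removed */
    *result = ops->read(arg);
    owner_put(ops->owner);
    return 0;
}

/* Sample method a module might export through its ops table. */
static int double_it(int x) { return 2 * x; }
```

The optimizations mentioned above would amount to skipping the `owner_get()`/`owner_put()` pair when the caller already knows the owner is pinned, for example because an earlier call on the same object took the reference.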

I don't think we're disagreeing, but I did want to clarify,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Rusty's module talk at the Kernel Summit

Post by Rusty Russell » Fri, 12 Jul 2002 14:20:05



> On Thu, Jul 11, 2002 at 12:48:30PM +1000, Rusty Russell wrote:
> > On Thu, 4 Jul 2002 10:24:11 -0700

> > > smaller ones, in the case where there is substantial code that is not
> > > needed for some configurations.

> > For God's sake, WHY?  Look at what you're doing to your TLB (and if you
> > made IPv4 a removable module, I'll bet real money you have a bug unless
> > you are *very* *very* clever).

> > Modules are not "free".  Sorry.

> What about Andi Kleen's patch to not use vmalloc (well, vmalloc is used as a
> fallback) when loading modules but to use big pages instead?  It is being
> integrated in 2.4.20-pre, IIRC.  IIRC there are still some issues with it, so
> for enlightening the audience here, could you share your view on that patch? 8)

Sure, but there was no indication that Adam was using such a patch 8)

> And for _debugging_ IPv4 maybe the modularisation, if Adam was clever, could
> help somewhat.

Definitely.  For debugging purposes, you don't need reference
counting: when the hacker says "remove it", you remove it. 8)

Cheers,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Rusty's module talk at the Kernel Summit

Post by Alexander Viro » Fri, 12 Jul 2002 15:50:08



> Sure, if you want to reduce the problem space to "modules which are a
> single fs/net/etc device driver" then we can *definitely* work
> something out.  This works because they have such a narrow and
> non-time-critical interface (who cares if we do a gratuitous
> atomic_inc on every fs mount?).

Note: "single" can be easily removed here.

> To really get this to work well, you should make sure such modules
> don't even need init and remove functions, by providing something
> like:

>    I_AM_A_FILESYSTEM_DRIVER("ramfs", ramfs_fs_type);

Not needed.  Really not needed (just wait for a couple of days until
I get the infrastructure for race-free register/unregister on generic
stuff into submittable shape).

> > I'd rather get the simple (== large) classes into decent shape and then
> > deal with what's left.  FVO "deal" possibly including "no rmmod for these
> > guys".

> This was *entirely* my question at the Kernel Summit:

>    Are modules first class citizens?
>      Should everything be modular?
>      What complexity are we prepared to pay?

That depends.  As it is, we can currently pick _any_ part of the code
and declare it modular - it's a matter of adding more gratuitous exports and
maybe several "upcalls" (à la the recently killed devpts ones).

In _that_ sense of "module", the questions are ridiculous - and the answers are
"not in that generality"/"don't be silly"/"nowhere near the amount needed
to make __down_failed() modular".

However, the absolute majority of modules are nowhere near that monstrous.
And actually we don't need to special-case "I'm a filesystem"/"I'm a
block device"/"I'm a framebuffer" - with a bit of massage all of these
and then some can be handled by the same code.  Again, wait for a couple
of days and I'll post the patches for testing.

Call them well-behaved modules if you wish.  For these the answers are
"yes"/"a lot of things can be"/"it's easy to handle".  What's left?
The pieces of code with really complex interfaces.  And guess what,
race-prevention is complex for these guys - and it's not just about
rmmod races.  E.g. parts of procfs, sysctls and devfs are still quite racy
even if you compile everything into the tree and remove all module-related
syscalls completely.

Again, complex API -> complex race-prevention.  No way around it and frankly,
I wouldn't want to have one - a lot of otherwise sane people are prone
to creating ugly and overcomplicated interfaces, and if there is something that
makes people think hard before doing that I'm only glad.

Nobody sane argues for allowing any piece of code to be made modular
(hands up those who really want modular semaphores; good, now turn
face to the wall, the firing squad will be taking care of you in
a moment).

Every time you are creating an interface between the main kernel and
modules you _are_ responsible for protection against races, be they
rmmod-related or not.

When you are using an existing interface, you are using its existing protection.
And preferably - with minimal PITA on your side.

Nobody promises that some random piece of code you want to cut out will be
safe or easy to make safe.  As long as we _can_ make things safe painlessly
for the absolute majority of drivers, that's it.  You want
something tricky - you get to hold the pieces.


Rusty's module talk at the Kernel Summit

Post by Rusty Russell » Fri, 12 Jul 2002 16:20:06



Alexander Viro wrote:

> Not needed.  Really not needed (just wait for a couple of days until
> I get the infrastructure for race-free register/unregister on generic
> stuff into submittable shape).

Yes, I look forward to your code.

There's no point discussing this until we see your solution, is there?

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.