atomic kernel operations are very tricky to export to user space (was [RFC] Improved inode number allocation for HTree )


Post by Hans Reise » Wed, 12 Mar 2003 23:30:16






>>>Let's make noatime the default for VFS.


>>[...]

>>>>If I were able to design Unix over again, I'd state that if you don't
>>>>lock a directory before traversing it then it's your own fault if
>>>>somebody changes it under you, and I would have provided an interface
>>>>to inform you about your bad luck.  Strictly wishful thinking.
>>>>(There, it feels better now.)

>>I'm happy nobody _can_ lock a directory like that.  Think of it - unable
>>to create or delete files while some slow-moving program is traversing
>>the directory?  Ouch.  Plenty of options for DOS attacks too.
>>And how to do "rm *.bak" if rm locks the dir for traversal?

><wishful thinking>
>Now that you mention it, just locking out create and rename during directory
>traversal would eliminate the pain.  Delete is easy enough to handle during
>traversal.  For a B-Tree, coalescing could simply be deferred until the
>traversal is finished, so reading the directory in physical storage order
>would be fine.  Way, way cleaner than what we have to do now.
></wishful thinking>

>Regards,

>Daniel
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/

You would want a directory listing command that is a single system
call; it could then be made isolated, with no risk of userspace
failing to unlock.  You would then have to worry about very large
directories straining things in user space, but you could probably have
the user specify the maximum directory size his process has space for, etc.

In general, DOS issues are what makes it difficult to introduce
transactions into the kernel.  We are grappling with that now in
reiser4.  It seems that the formula is to create atomic plugins, and
then it is the system administrator's responsibility to verify that he
can live with whatever DOS vulnerabilities it has before he installs it
in his kernel (or our responsibility if it is sent to us for inclusion
in the main kernel).  Allowing arbitrary filesystem operations to be
combined into one atomic transaction seems problematic for either user
space or the kernel, depending on what you do.

In general, allowing user space to lock things means that you trust user
space  to unlock.  This creates all sorts of trust troubles, and if you
force the unlock after some timeout, then the user space application
becomes vulnerable to DOS from other processes causing it to exceed the
timeout.

Ideas on this are welcome.

--
Hans



Post by Jamie Lokie » Thu, 13 Mar 2003 02:00:15



> Allowing arbitrary filesystem operations to be
> combined into one atomic transaction seems problematic for either user
> space or the kernel, depending on what you do.

> In general, allowing user space to lock things means that you trust user
> space  to unlock.  This creates all sorts of trust troubles, and if you
> force the unlock after some timeout, then the user space application
> becomes vulnerable to DOS from other processes causing it to exceed the
> timeout.

> Ideas on this are welcome.

You can allow user space to begin a transaction, do some operations
and end a transaction, possibly returning an "abort" result which
means userspace should assume the transaction did not commit any
results and/or whatever was read in the transaction was not reliable.
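
A minimal sketch of the begin / operate / end shape described above,
assuming an optimistic scheme where each transaction records a version
number at begin and aborts at end if anyone else committed in between.
Every name here (struct store, struct txn, txn_begin, txn_read,
txn_write, txn_end) is hypothetical and chosen for illustration; this is
not an existing kernel or reiser4 interface, and it is single-threaded
for clarity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared object: one value guarded by a version counter. */
struct store {
    unsigned long version;   /* bumped on every committed write */
    int value;
};

/* Hypothetical transaction handle (not a real kernel interface). */
struct txn {
    struct store *s;
    unsigned long start_version;  /* version observed at txn_begin() */
    int pending;                  /* buffered write, applied at commit */
    bool has_write;
};

static void txn_begin(struct txn *t, struct store *s)
{
    assert(t != NULL && s != NULL);
    t->s = s;
    t->start_version = s->version;
    t->pending = 0;
    t->has_write = false;
}

/* A read inside the transaction; only trustworthy if txn_end() commits. */
static int txn_read(const struct txn *t)
{
    return t->has_write ? t->pending : t->s->value;
}

/* Buffer a write; nothing becomes visible until txn_end() commits. */
static void txn_write(struct txn *t, int v)
{
    t->pending = v;
    t->has_write = true;
}

/* Returns true on commit, false on abort (someone committed in between). */
static bool txn_end(struct txn *t)
{
    if (t->s->version != t->start_version)
        return false;             /* abort: buffered write is discarded */
    if (t->has_write) {
        t->s->value = t->pending;
        t->s->version++;
    }
    return true;
}
```

A caller would wrap txn_begin() ... txn_end() in a loop and start over
whenever txn_end() returns false.  In a concurrent setting the version
check and the commit in txn_end() would have to happen atomically, which
this sketch does not attempt.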

On the face of it this leaves userspace susceptible to DOS or indeed
fairness/livelock problems.  For example if another program is always
changing a directory entry, how can you read that whole directory
in a transaction?

Fairness/livelock problems are hard to avoid with any kind of lock.
Even the kernel's internal locks have these problems in corner cases
(for example, remember when gettimeofday()'s clock access had to be
converted from using a spinlock to a sequence lock; that still
doesn't _guarantee_ there is no problem in principle, it just reduces
the probability in all reasonable scenarios).
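
For readers who have not met sequence locks, here is a rough userspace
sketch of the idea using C11 atomics.  It is not the kernel's seqlock
code: the names (struct seq_time, seq_time_write, seq_time_read) are
made up for illustration, it assumes a single writer, and it uses the
default sequentially consistent atomics rather than tuned memory
ordering.  The reader never blocks the writer; it simply retries when it
notices that a write overlapped its read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical clock value protected by a sequence counter. */
struct seq_time {
    atomic_uint seq;     /* even: stable, odd: write in progress */
    atomic_long sec;
    atomic_long usec;
};

/* Single writer assumed: mark busy (odd), update, mark stable (even). */
static void seq_time_write(struct seq_time *t, long sec, long usec)
{
    atomic_fetch_add(&t->seq, 1);      /* now odd */
    atomic_store(&t->sec, sec);
    atomic_store(&t->usec, usec);
    atomic_fetch_add(&t->seq, 1);      /* even again */
}

/*
 * Lock-free read.  Returns how many times the read had to be retried.
 * Nothing bounds that count: a writer that updates continuously can keep
 * a reader spinning, which is the "no guarantee in principle" caveat.
 */
static unsigned seq_time_read(struct seq_time *t, long *sec, long *usec)
{
    unsigned retries = 0;

    assert(t != NULL && sec != NULL && usec != NULL);
    for (;;) {
        unsigned before = atomic_load(&t->seq);
        long s = atomic_load(&t->sec);
        long u = atomic_load(&t->usec);
        unsigned after = atomic_load(&t->seq);

        /* Accept only if no write was in progress or completed meanwhile. */
        if (before == after && (before & 1u) == 0) {
            *sec = s;
            *usec = u;
            return retries;
        }
        retries++;
    }
}
```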

However, some remedies can be applied to filesystem transactions.  If
an operation would cause some other task's transaction to eventually
return an abort code, consider sleeping for a short duration.
Randomise that duration.  If the other transaction(s) have been
aborting repeatedly, consider lengthening the sleep duration and/or
specifically waiting for the other transaction to complete, to boost
the other task's likelihood of transaction success.  Randomise this
decision too.  If you know something about the type of the other
transactions (for example, that one is trying to implement a read-write
lock by doing atomic operations on bytes in a file), consider exactly
what policy you hope to offer (writer preference?  reader preference?
something in between?).
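
One possible reading of the backoff policy above, as a small sketch.
The constants (50 microseconds base, cap of 10 doublings), the function
names (backoff_usec, wait_for_victim) and the tiny xorshift generator
are all assumptions made for illustration; a real implementation would
pick its own numbers and its own source of randomness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Small xorshift PRNG so the sketch is self-contained; state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * How long (in microseconds) the disturbing task should sleep before an
 * operation that would abort another task's transaction.  The ceiling
 * doubles with each consecutive abort the victim has already suffered,
 * and the actual delay is drawn at random below that ceiling.
 */
static uint32_t backoff_usec(unsigned victim_aborts, uint32_t *rng)
{
    const uint32_t base_usec = 50;   /* assumed base delay */
    const unsigned max_shift = 10;   /* assumed cap on doubling */
    unsigned shift = victim_aborts < max_shift ? victim_aborts : max_shift;
    uint32_t ceiling = base_usec << shift;

    assert(rng != NULL && *rng != 0);
    return xorshift32(rng) % ceiling;        /* in [0, ceiling) */
}

/*
 * Randomised decision to wait for the victim to finish instead of just
 * sleeping: never when it has not aborted yet, increasingly likely as
 * its abort count grows (probability aborts / (aborts + 4)).
 */
static bool wait_for_victim(unsigned victim_aborts, uint32_t *rng)
{
    assert(rng != NULL && *rng != 0);
    return xorshift32(rng) % (victim_aborts + 4u) < victim_aborts;
}
```

The choice between reader preference, writer preference, or something in
between would show up as different inputs to these two functions
depending on which kind of transaction is being disturbed.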

By which point it has remarkable similarities to the problems of
fairness in the task scheduler, and fairness/livelock in locks.

-- Jamie

