>The Unix folklore is that "large directories" (that is, directories
>with more than a few hundred files) pay a horrible performance penalty.
That folklore is accurate: on traditional Unix file systems, directory
lookup is a linear search. CPU time for opening an existing file grows
linearly with the position of the file name in the directory. CPU time
for a failed open of a non-existent file, or for creating a new file
name, grows linearly with the size of the directory file (as measured
by the number of files plus the number of holes left by deleted
entries -- which is proportional to the total directory file size on
old-fashioned (V7 or SysV) file systems). I/O for each of these
operations also tends to grow linearly with the same parameters, but
successive use of the same directory is more I/O efficient, since the
directory's pages will already be in the buffer cache. Some systems
also have a directory name cache, which can beat the linear search
when the same file is accessed repeatedly.
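
For concreteness, here is a minimal sketch of the scan in question.
The entry layout follows the published V7 directory format (a 2-byte
inode number followed by a 14-byte name); the struct and function
names here are mine, not from any particular kernel. A hit stops at
the matching slot, so its cost tracks the entry's position; a miss
must touch every slot, holes included.

    #include <string.h>

    #define V7_DIRSIZ 14                  /* V7 name field length */

    struct v7_dirent {
        unsigned short d_ino;             /* inode number; 0 marks a hole */
        char d_name[V7_DIRSIZ];           /* NUL-padded, not terminated at 14 */
    };

    /*
     * Linear directory search, V7 style.  Returns the inode number of
     * the matching entry, or 0 for "not found".  Holes (d_ino == 0)
     * are skipped but still cost a loop iteration, which is why a
     * failed lookup is linear in files plus holes.
     */
    unsigned short dir_lookup(const struct v7_dirent *dir, int nslots,
                              const char *name)
    {
        int i;

        for (i = 0; i < nslots; i++) {
            if (dir[i].d_ino == 0)
                continue;                 /* hole left by a deleted entry */
            if (strncmp(dir[i].d_name, name, V7_DIRSIZ) == 0)
                return dir[i].d_ino;      /* hit: cost ~ position i */
        }
        return 0;                         /* miss: cost ~ nslots */
    }
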
Any benchmarker should consider whether the intent is to model a load
which repeatedly accesses a small number of files in a small number of
directories, or a large number of files in a large number of
directories, or whatever. The sizes of the buffer cache (and of the
directory cache, if any) can be important tuning parameters, and
modeling a realistic amount of other system activity which affects
these caches may also be important.
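
As a rough illustration of that caution, not a rigorous benchmark,
something like the following exposes the linear behavior: it fills a
fresh scratch directory and compares the CPU cost of opening an early
name against a late one. NFILES, TRIALS, and the directory name are
arbitrary choices of mine, and caching effects of the kind described
above will color the numbers.

    #include <stdio.h>
    #include <time.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>

    #define NFILES 5000
    #define TRIALS 1000

    /* CPU time for TRIALS open/close cycles of one name, in seconds. */
    static double time_opens(const char *path)
    {
        clock_t t0 = clock();
        int i;

        for (i = 0; i < TRIALS; i++) {
            int fd = open(path, O_RDONLY);
            if (fd >= 0)
                close(fd);
        }
        return (double)(clock() - t0) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        char name[32];
        int i, fd;

        /* Populate a fresh scratch directory with NFILES entries. */
        if (mkdir("scratch", 0700) != 0 || chdir("scratch") != 0) {
            perror("scratch");
            return 1;
        }
        for (i = 0; i < NFILES; i++) {
            sprintf(name, "f%04d", i);
            if ((fd = open(name, O_CREAT | O_WRONLY, 0600)) < 0) {
                perror(name);
                return 1;
            }
            close(fd);
        }

        /* In a fresh directory the entries land in creation order, so
         * on a linear-search file system the last name should cost
         * noticeably more CPU than the first; a directory name cache
         * will blur the difference. */
        printf("first entry: %.3f s\n", time_opens("f0000"));
        printf("last entry:  %.3f s\n", time_opens("f4999"));
        return 0;
    }
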
Richard M. Mathews Lietuva laisva = Free Lithuania
UUCP: ...!uunet!lcc!richard Eesti vabaks = Free Estonia