Realistic limit on number of files in a directory

Post by Kevin W. Hammond » Wed, 01 Feb 1995 07:09:10



I've checked the FAQ and there does not seem to be any indication of this...

What is a realistic maximum number of files one should place in a
sub-directory to avoid performance degradation?  Furthermore, is there
any problem with putting a large number of subdirectories in the entire tree?

We are implementing some software that will be generating approximately
25,000 files per day and need to store them in an efficient manner so that
file creation, lookup, etc. does not take forever.

There will not be 25,000 files created at one time; rather, batches
of 1,000-1,500 files will be created at a time, and each of these can be
logically grouped into a separate directory tree.

I've envisioned something like this:

        batches
        |
        +---batch01
        |   |
        |   +---00
        |   |   .
        |   |   .
        |   |   .
        |   +---99
        |   .
        |   .
        |   .
        +---batch02

Where \batches is the logical group of all batches scanned.  batch01 is
the first batch, batch02 is the second batch, etc.  Underneath the
batchnn directories would be a list of sub-directories, each sub-directory
holding the maximum number of files permitted without performance
degradation.
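
A minimal sketch of how a file's path might be built under that layout,
assuming (arbitrarily) at most 100 files per two-digit sub-directory and
the names shown in the diagram above:

    /* Sketch: build a path under the proposed batches tree.
     * FILES_PER_SUBDIR and the "file%05d" naming are assumptions. */
    #include <stdio.h>

    #define FILES_PER_SUBDIR 100   /* assumed cap; tune after benchmarking */

    static void make_path(char *buf, size_t len, int batch, int file_index)
    {
        int subdir = file_index / FILES_PER_SUBDIR;   /* maps to 00 .. 99 */
        snprintf(buf, len, "batches/batch%02d/%02d/file%05d",
                 batch, subdir, file_index);
    }

    int main(void)
    {
        char path[256];
        make_path(path, sizeof path, 1, 1234);
        printf("%s\n", path);   /* prints batches/batch01/12/file01234 */
        return 0;
    }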

Thanks for any information!

[ of course a sure-fire way is through experimentation.  It's
  rather easy to write a series of tests which creates lots and lots of
  files and then times how long it takes to open one.  Be sure to
  pick random filenames to negate the effect of the inode cache.  --mod ]
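
For what it's worth, a minimal sketch of that kind of test on a POSIX-ish
system (the file count, directory name, and naming scheme are arbitrary):

    /* Sketch: create NFILES empty files in one directory, then time a
     * single open() of a randomly chosen name.  Averaging many random
     * opens gives a steadier number than the single timing shown here. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/stat.h>

    #define NFILES 10000   /* arbitrary; vary this and compare timings */

    int main(void)
    {
        char name[64];
        int i, fd;
        struct timespec t0, t1;

        /* Create NFILES empty files. */
        mkdir("testdir", 0755);
        for (i = 0; i < NFILES; i++) {
            snprintf(name, sizeof name, "testdir/f%06d", i);
            fd = open(name, O_CREAT | O_WRONLY, 0644);
            if (fd >= 0)
                close(fd);
        }

        /* Time the open() of one randomly chosen name. */
        srand((unsigned)time(NULL));
        snprintf(name, sizeof name, "testdir/f%06d", rand() % NFILES);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        fd = open(name, O_RDONLY);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (fd >= 0)
            close(fd);

        printf("open(%s) took %ld ns\n", name,
               (long)(t1.tv_sec - t0.tv_sec) * 1000000000L
                   + (t1.tv_nsec - t0.tv_nsec));
        return 0;
    }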

        --kevin
--
Kevin W. Hammond

 
 
 

Realistic limit on number of files in a directory

Post by Doug Siebert » Thu, 02 Feb 1995 03:08:44



Quote:>I've checked the FAQ and there does not seem to be any indication of this...
>What is a realistic maximum number of files one should place in a
>sub-directory to avoid performance degradation?  Furthermore, is there
>any problem with putting a large number of subdirectories in the entire tree?
>We are implementing some software that will be generating approximately
>25,000 files per day and need to store them in an efficient manner so that
>file creation, lookup, etc. does not take forever.
>There will not be 25,000 files created at one time; rather, batches
>of 1,000-1,500 files will be created at a time, and each of these can be
>logically grouped into a separate directory tree.

I've done some testing of this on HP-UX 9.x systems after seeing similar
problems when I had > 10,000 files in a single directory.  I ended up deciding
that performance degradations were noticeable after a few hundred files.  I
think 1000-1500 in a directory would probably be OK unless you are going to be
opening these files extremely often.

--
Doug Siebert             |  I have a proof that everything I have stated above


 
 
 

Realistic limit on number of files in a directory

Post by Bill Vermilli » Fri, 03 Feb 1995 00:02:40




Quote:>I've checked the FAQ and there does not seem to be any indication of this...
>What is a realistic maximum number of files one should place in a
>sub-directory to avoid performance degradation?  Furthermore, is there
>any problem with putting a large number of subdirectories in the entire tree?

Somewhat system dependent.  A directory is just a file with
pointers to the file locations.  Once you get past the direct
blocks in the inode and go to indirect blocks, you are going to
have to make additional disk accesses to read/write to that
directory.  On some of the smaller systems I see performance
start to go down when the directory gets over about 5000 bytes
long - that's just a few hundred files.
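
The back-of-the-envelope arithmetic behind that figure, assuming the classic
16-byte directory entry and 512-byte blocks with 10 direct block pointers
per inode (7th-edition-style numbers; other filesystems differ):

    /* Sketch: how many classic directory entries fit before the
     * directory spills into indirect blocks.  Entry and block sizes
     * are the old 7th-edition-style assumptions; real systems vary. */
    #include <stdio.h>

    int main(void)
    {
        const int block_size    = 512;   /* assumed filesystem block size */
        const int entry_size    = 16;    /* 2-byte inode + 14-byte name   */
        const int direct_blocks = 10;    /* direct pointers in the inode  */

        int entries_per_block = block_size / entry_size;           /* 32  */
        int direct_entries    = entries_per_block * direct_blocks; /* 320 */

        printf("entries per block: %d\n", entries_per_block);
        printf("entries before indirect blocks: %d\n", direct_entries);
        return 0;
    }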

--

 
 
 

Realistic limit on number of files in a directory

Post by Neal P. Murp » Sat, 04 Feb 1995 05:36:00



Quote:>What is a realistic maximum number of files one should place in a
>sub-directory to avoid performance degradation?  Furthermore, is there
>any problem with putting a large number of subdirectories in the entire tree?

I don't know the internals of the different Unix implementations, but through
trial and error, I have a clue.

I believe directory searches are sequential, thus directories
containing large numbers of files will cause opens to take longer.
Personally, I would not like to have more than 100 or 200 files in any
directory.

I would suggest that you break up each batch into smaller groups of
files and place them in sub-sub-sub directories. But do watch your
file-name/directory-name lengths:  the total path length had better not
exceed PATH_MAX (or MAXPATHLEN on BSD-derived systems).
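
A minimal sketch of checking a generated path against that limit, using
pathconf() and falling back to PATH_MAX from <limits.h> (the sample path
is made up):

    /* Sketch: check a candidate path length against the system's limit. */
    #include <stdio.h>
    #include <string.h>
    #include <limits.h>
    #include <unistd.h>

    int main(void)
    {
        /* The candidate path is a made-up example. */
        const char *candidate = "batches/batch01/00/file00001";
        long max = pathconf("/", _PC_PATH_MAX);

        if (max == -1)   /* no definite limit reported; use the macro */
            max = PATH_MAX;

        if ((long)strlen(candidate) >= max)
            fprintf(stderr, "path too long: %s\n", candidate);
        else
            printf("ok: %lu of %ld bytes\n",
                   (unsigned long)strlen(candidate), max);
        return 0;
    }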

But you should experiment anyway.

Fester

 
 
 

Realistic limit on number of files in a directory

Post by Brian Foster » Sat, 04 Feb 1995 23:50:05



 | >What is a realistic maximum number of files one should place in a sub-
 | >directory to avoid performance degradation?  Furthermore, is there any
 | >problem with putting a large number of subdirectories in the entire tree?
 |   [ ... ]
 | I've done some testing of this on HP-UX 9.x systems after seeing similar
 | problems when I had > 10,000 files in a single directory.  I ended up
 | deciding that performance degradations were noticeable after a few hundred
 | files.  I think 1000-1500 in a directory would probably be OK unless you
 | are going to be opening these files extremely often.

a suggestion- you should test on whichever system(s) you are using,
as i suspect this is heavily influenced by both the kernel version and
filesystem format.  e.g., a rule of thumb for older 7th-edition-ish
filesystems is to try and avoid allowing the directory to grow larger
than one f.s.block, and to never allow it to grow larger than ten
f.s.blocks (when you start using indirect blocks in these systems).
the first point may be a legend, but the second _is_ distinctly
measurable.  the classic problem is two processes searching for
a lockfile in a large directory - the first may determine it's not
there, and create it in a slot the second's already searched, causing
the second to believe there isn't a lockfile.  worse, if the second
process's search is the one done as a part of the file creation
procedure, on some systems it was (is?) possible to wind up with
two files of the same name in the same directory.
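
on systems that support it, the usual way around that race is to let the
kernel do the check-and-create atomically with O_CREAT|O_EXCL; a sketch
(the lockfile name is an arbitrary example):

    /* Sketch: take a lockfile atomically with O_CREAT|O_EXCL, so the
     * "search the directory, then create" race never happens in user
     * space.  The lockfile name is an arbitrary example. */
    #include <stdio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("batches/LOCK", O_CREAT | O_EXCL | O_WRONLY, 0644);

        if (fd < 0) {
            if (errno == EEXIST)
                fprintf(stderr, "lock already held\n");
            else
                perror("open");
            return 1;
        }

        /* ... do the work that needs the lock ... */

        close(fd);
        unlink("batches/LOCK");   /* release the lock */
        return 0;
    }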

a warning- many *ix systems have a maximum limit on the number of hard
links to an i-node, and on some systems, it's 1000 (e.g., current sco
unix and sco xenix).  this does not affect the number of non-directories
in a directory, but does limit the number of sub-directories you can
have in a directory.
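
a quick way to query that limit on a given system (each subdirectory's ".."
entry counts as a link to its parent, which is why the link limit caps the
number of subdirectories); the directory asked about here is just an example:

    /* Sketch: ask the system for the hard-link limit on a directory. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long link_max = pathconf(".", _PC_LINK_MAX);

        if (link_max == -1)
            printf("no fixed link limit reported\n");
        else
            printf("max links (roughly, max subdirectories): %ld\n", link_max);
        return 0;
    }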

disclaimer: my opinions are mine!
--
"There are 3 types of mathematician.   | Brian Foster, SGS-Thomson, c/o PACT

 cannot."       -Robert Arthur,        | (+44 or 0)117 9707 156     England
                   alt.fan.pratchett   | http://www.pact.srf.ac.uk

 
 
 

Realistic limit on number of files in a directory

Post by Skip Satterl » Mon, 06 Feb 1995 12:26:45





> >What is a realistic maximum number of files one should place in a
> >sub-directory to avoid performance degradation?  Furthermore, is there
> >any problem with putting a large number of subdirectories in the entire tree?

> I don't know the internals of the different Unix implementations, but through
> trial and error, I have a clue.

> I believe directory searches are sequential, thus directories
> containing large numbers of files will cause opens to take longer.
> Personally, I would not like to have more than 100 or 200 files in any
> directory.

> I would suggest that you break up each batch into smaller groups of
> files and place them in sub-sub-sub directories. But do watch your
> file-name/directory-name lengths:  the total path length had better not
> exceed PATH_MAX (or MAXPATHLEN on BSD-derived systems).

> But you should experiment anyway.

> Fester

Fester,
  There is no limit to the number of files in a UNIX directory.  However,
if you are worried about performance then you should run some benchmarks.
For instance, using symbolically linked files will slow down
directory lookups tremendously.  I suggest you pick up one of the UNIX
administration books on the market.  O'Reilly books are the best.
-Skip  
 
 
 

Realistic limit on number of files in a directory

Post by Brian Foster » Thu, 09 Feb 1995 01:01:14


 | [ discussing 7th-edition-ish directory size limitations ... ]
 | the classic problem is two processes searching for
 | a lockfile in a large directory - the first may determine it's not
 | there, and create it in a slot the second's already searched, causing
 | the second to believe there isn't a lockfile.  worse, if the second
 | process's search is the one done as a part of the file creation
 | procedure, on some systems it was (is?) possible to wind up with
 | two files of the same name in the same directory.  [ ... ]

this has caused a considerable amount of confusion (and e-mail),
for which i apologize.  what i termed the classic problem is my
possibly faulty recollection of the abstract nature of one of the
problems which showed up when using pre-hdb-uucp with numerous
(hundreds) of sites.  old-time uucp admins should be quite familiar
with the consequences, which, as i now vaguely recall, included
"spurious" "connection" timeouts.

on the second -- and quite different "problem" -- what i failed to
point out was that the _only_ time i've seen the two-names-in-one-
directory situation was during a *ix port at a company which shall
remain nameless.  the statement is correct, and based on direct
observation, but unintentionally misleading.  again, i apologize.

disclaimer: my opinions are mine!
--
"There are 3 types of mathematician.   | Brian Foster, SGS-Thomson, c/o PACT

 cannot."       -Robert Arthur,        | (+44 or 0)117 9707 156     England
                   alt.fan.pratchett   | http://www.pact.srf.ac.uk