"Performance" limit to file numbers in a directory

"Performance" limit to file numbers in a directory

Post by R Ghosh-Ro » Thu, 23 Jan 1997 04:00:00



Hi,

I am aware that in DOS it's better to keep the number of files
in a directory below 150, for performance reasons. I am a bit
confused about whether the same applies to UNIX, and if such a
limit exists, I would like to know the number.

Thanks for your help.

Rana

--

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ All opinions stated are my own, and don't even vaguely resemble  +
+ those of Brunel University or Brunel Colleges.  ;-)              +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 
 
 

"Performance" limit to file numbers in a directory

Post by Eric Levenez » Thu, 23 Jan 1997 04:00:00



>I am aware that in DOS it's better to keep the number of files
>in a directory below 150, for performance reasons. I am a bit
>confused about whether the same applies to UNIX, and if such a
>limit exists, I would like to know the number.

It is not important for UNIX.

If you have 1000 files in a directory, when you type "ls" the
command must sort them before displaying them, so it takes longer
than if there were only 100 files.
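
A rough way to see the sorting cost (just a sketch; it assumes an ls
that accepts the traditional -f flag, meaning "list entries unsorted,
in directory order", and a directory that already holds many files):

        cd /some/big/directory          # hypothetical directory
        time ls    > /dev/null          # read the directory, then sort
        time ls -f > /dev/null          # read the directory, no sort

The gap between the two is roughly the time spent sorting.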

Under UNIX, a directory with 150 files is a small directory. ;-)

If you have a lot of files to store, the UNIX tradition is to
make subdirectories, as "man" and "terminfo" do, but that is more
because it is simpler to display the files in small groups than
for any gain in access speed.

--

--------------------------------------------------------------------
Éric Lévénez            "Felix qui potuit rerum cognoscere causas"

(NeXTMail, MIME)                                   Georgica, II-489
--------------------------------------------------------------------

 
 
 

"Performance" limit to file numbers in a directory

Post by Mark Hahn » Fri, 24 Jan 1997 04:00:00


: >of files in a directory below 150, for performance reasons. I am
: >a bit confused about whether the same applies to UNIX, and if such
: >a limit exists, I would like to know the number.

Under a couple hundred is reasonable.  Over 1k is pushing it,
and over a few thousand you'll start wasting real time.
I'm about to rewrite a program that generates 20k entries
and spends a significant amount of time screwing with the
directory.

: It is not important for UNIX.

Nonsense.  Unix doesn't normally use anything more clever than
a list for directories.  Actually, it's _less_ clever than a
list, since it can potentially contain a lot of deleted entries,
making the directory sparse (and thus inefficient).  I've only heard
of one NFS server vendor fixing this.

regards, mark hahn.
--

                                        http://neurocog.lrdc.pitt.edu/~hahn/

 
 
 

"Performance" limit to file numbers in a directory

Post by Alicia Carla Longstreet » Fri, 24 Jan 1997 04:00:00



> Hi,

> I am aware that in DOS it's better to keep the number of files
> in a directory below 150, for performance reasons. I am a bit
> confused about whether the same applies to UNIX, and if such a
> limit exists, I would like to know the number.

Several things:
1) Since your question is about Unix, you will probably get better and
more complete answers in an appropriate forum; comp.lang.c is for
discussing the Standard C language.  Try:
        comp.unix.programmer                  General Unix Questions
        comp.unix.[vendor]                    Various Unix vendors

2) I don't believe that there is a specific number; rather, the cluster
size is what matters.  If you can fit an entire directory into one
cluster you will get better performance than if the directory requires
two or more clusters.

3) If you are so concerned about performance that optimal directory
performance is an issue, you might want to invest in an OS with a
high-performance file system.  Neither DOS nor most flavors of Unix
available for the Intel architecture use an HPFS (I don't know about
Linux - anybody?).  The availability of an HPFS will have a significant
effect on your question.

Alicia Carla Longstreet
"The time has come," the Walrus said,
"To talk of many things:
Of shoes-and ships-and sealing wax-
Of cabbages-and kings-
And why the sea is boiling hot-
And whether pigs have wings."
                Lewis Carroll

 
 
 

"Performance" limit to file numbers in a directory

Post by Bennett To » Fri, 24 Jan 1997 04:00:00


[ This isn't a C question at all; followups directed to c.u.p]

The answer depends on which Unix filesystem type you are talking about. I've
heard that SGI has hacked on this stuff extensively, with nice efficient hash
tables for directories. Most Unix filesystems still use linear lists, which
means that every operation requires a search; many tasks (like populating such
a directory!) end up being quadratic in the number of files. This is _bad_ for
large numbers of files.

Another limit comes up because many Unix implementations impose a maximum
argument-list size, either in the shell (command-line setup) or in the kernel
(the exec syscall and/or the data layout where argv is stored), that causes
commands to fail if you try to pass too long an argument list. It's a bummer
when you can't use "*" for an
argument list in a ``problem'' directory; all kinds of stuff (including users'
skills) break when you can't use wildcards.
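
When "*" stops working, the usual dodge (a sketch only; the directory
name and pattern here are made up) is to let find generate the names one
at a time instead of having the shell expand them onto one command line:

        # remove matching files without ever building a huge argv
        find /some/dir -name 'core*' -exec rm {} \;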

I recently helped someone un-break a totally hosed up system; we ended up
using something like

        mkdir .old
        # move anything matching the pattern that is older than ~60 days into .old/
        ls | perl -lne 'rename $_, ".old/$_" if /pattern/ and -M > 60;'

to get things working again. Icko.

Take a look at the directory layout used by terminfo, often to be
found somewhere like /usr/lib/terminfo, /usr/share/terminfo,
/usr/lib/share/terminfo, etc. (moving the terminfo database is one of the more
popular ways for Unix vendors to ``mark territory'', kinda like dogs peeing on
fire hydrants). Anyway, if you can find where your vendor hid it, take a peek
at that; it wants to have well over a thousand files in it, so it splits them
up by the first character. Thus the top-level has maybe 60-odd subdirectories
with 1-character names; these have anywhere from 0 to 200-odd files each. This
is a simple strategy, and can be adapted to save many such situations.
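
A minimal sketch of that splitting strategy (assuming a POSIX shell, a
flat directory of plain files, and that no filename starts with "-"):

        # bucket each file into a subdirectory named after its first character
        for f in *; do
            [ -f "$f" ] || continue         # skip anything that isn't a plain file
            d=`echo "$f" | cut -c1`         # first character names the bucket
            mkdir -p "$d"
            mv "$f" "$d/"
        done

After that, any one lookup only has to search a small bucket instead of
the whole directory.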

-Bennett

 
 
 

"Performance" limit to file numbers in a directory

Post by Guy Harris » Fri, 24 Jan 1997 04:00:00


["comp.lang.c" removed; this isn't a C question.]


>I am aware that in DOS it's better to keep the number of files
>in a directory below 150, for performance reasons. I am a bit
>confused about whether the same applies to UNIX, and if such a
>limit exists, I would like to know the number.


>It is not important for UNIX.

Having worked at two companies that had to add code to their file system
implementations to speed up searches in large directories, because of
problems customers had accessing large directories on our NFS servers
from UNIX clients, I can state for certain that the claim made in the
sentence above is simply not true.  (Our machines don't run UNIX, but
they export file systems to UNIX systems over NFS, and my previous
employer's machines run UNIX on the host processor and use the BSD file
system.)

>If you have 1000 files in a directory, when you type "ls" the
>command must sort them before displaying them, so it takes longer
>than if there were only 100 files.

"ls" isn't the only thing you do on a directory.

If a program refers to a file in that directory, the *program* isn't
going to do anything with the other 999 files in that directory;
however, the file system's *directory search code* might well have to
look through, on average, 499 or so of them.
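
A crude way to feel that cost (purely a sketch; the directory names and
counts here are invented) is to pre-load one directory and then time the
same batch of file creations against it and against an empty one:

        mkdir empty.d full.d
        n=0
        while [ $n -lt 10000 ]; do : > full.d/f.$n; n=`expr $n + 1`; done
        # now create 1000 new files in each directory and compare; on a
        # filesystem that does linear directory searches, the full
        # directory is dramatically slower per creation
        time sh -c 'n=0; while [ $n -lt 1000 ]
                    do : > empty.d/g.$n; n=`expr $n + 1`; done'
        time sh -c 'n=0; while [ $n -lt 1000 ]
                    do : > full.d/g.$n; n=`expr $n + 1`; done'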

>If you have a lot of files to store, the UNIX tradition is to
>make subdirectories, as "man" and "terminfo" do, but that is more
>because it is simpler to display the files in small groups than
>for any gain in access speed.

That may be true of "man"; however, I don't think Mark Horton much cared
about the ease of doing "ls"s when he set up the way "terminfo" worked -
I suspect he cared more about minimizing the amount of time the file
system code would have to spend doing lookups of "terminfo" files.

To go back to the original poster's question:

There is not necessarily a single answer for UNIX systems, as there
isn't a single *file system* for all UNIX systems.  Some file system
types do linear searches in directories; others may do hashed searches;
others may store directories as B-trees; etc.

Unfortunately, I don't know offhand appropriate numbers for various
types of file system on UNIX systems.

 
 
 

"Performance" limit to file numbers in a directory

Post by Lawrence Kir » Fri, 24 Jan 1997 04:00:00



>Hi,

>I am aware that in DOS it's better to keep the number of files
>in a directory below 150, for performance reasons. I am a bit
>confused about whether the same applies to UNIX, and if such a
>limit exists, I would like to know the number.

It probably depends on the version of Unix and, more precisely, on the
details of the particular filesystem type you are using.  It has nothing
to do with the C language, so I've directed follow-ups away from
comp.lang.c.

However, yes, on many Unix filesystems having more than a few hundred
files in a directory can seriously hurt performance.

--

 
 
 

"Performance" limit to file numbers in a directory

Post by Richard Scranton » Fri, 24 Jan 1997 04:00:00


While I will do no more than note the annoying pedantry of the quoted
response, which seems a common posture on comp.lang.c, I must note
that the responder is apparently not a Unix programmer, or she would
not be using terms like 'cluster' and 'HPFS', nor would she state that
Unix file systems are inefficient by nature.  That has not been true since
the 'v7' inode-based file system was the norm.  Current Unix file systems
are based in large part on the Berkeley Fast File System (FFS), which
goes to great lengths to ensure both timely access to information and
reasonable recoverability in the event of a system mishap.  FFS divides
the disk into cylinder groups, each with its own bitmaps and inodes, to
minimize head travel, and it attempts to minimize fragmentation, also to
that end.

Also becoming much more common are logging (journaled) file systems,
which allow the disk to be used more safely in asynchronous write mode
by using a two-phase commit for file system meta-data changes.  In the
event of a system problem, the system can replay an 'intent log' of
file system changes and compare the existing disk structure to it.  If a
meta-data change did not complete successfully, it is rolled back.  This
allows the system to reboot quickly, as 'fsck' doesn't need to analyze the
entire disk before mounting it for multi-user access.  Veritas VxFS is a
good example of a journaled file system; Berkeley LFS takes the related
log-structured approach.

Also relevant is what you intend to do with however many files you
create in a single large directory.  Large usenet spools typically contain
thousands of files while in operation.  Unix deals with this gracefully
by maintaining a 'vnode cache' in memory.  Recently accessed files
are probably still in the cache and can be found without a physical
disk lookup for the vnode/inode address.  Some news server administrators
will change the period of time 'update' sleeps to prevent too-frequent
flushes of the file system and vnode buffer caches to disk in the interest
of I/O performance.

In summary, although the number of files in a directory does have
performance implications, you'll probably run into problems managing
them administratively before the system is expending significant effort
to find them.


> > Hi,

> > I am aware that in DOS it's better to keep the number of files
> > in a directory below 150, for performance reasons. I am a bit
> > confused about whether the same applies to UNIX, and if such a
> > limit exists, I would like to know the number.

> Several things:
> 1) Since your question is about Unix, you will probably get better and
> more complete answers in an appropriate forum, comp.lang.c is for
> discussing the Standard C language. Try:
>   comp.unix.programmer                  General Unix Questions
>   comp.unix.[vendor]                    Various Unix vendors

> 2) I don't believe that there is a specific number, rather cluster size
> is more important.  If you can fit an entire directory in one cluster
> you will get better performance than if the directory requires two or
> more clusters.

> 3) If you are so concerned about performance that optimal directory
> performance is an issue, you might want to invest in an OS with a high
> performance file system.  Neither DOS nor most flavors of Unix,
> available for the Intel architecture, use an HPFS (I don't know about
> Linux - anybody?).  The availability of an HPFS will have a significant
> effect on your question.

> Alicia Carla Longstreet
> "The time has come," the Walrus said,
> "To talk of many things:
> Of shoes-and ships-and sealing wax-
> Of cabbages-and kings-
> And why the sea is boiling hot-
> And whether pigs have wings."
>                 Lewis Carroll

----
========================================
Richard Scranton - LDA Systems, Columbus

 
 
 

"Performance" limit to file numbers in a directory

Post by Gordon Burditt » Sun, 26 Jan 1997 04:00:00


>>I am aware that in DOS it's better to keep the number of files
>>in a directory below 150, for performance reasons. I am a bit
>>confused about whether the same applies to UNIX, and if such a
>>limit exists, I would like to know the number.

>It is not important for UNIX.

Yes, it is.  It's going to be an issue for just about any OS,
although the specific number will vary.  But since ANSI C doesn't
recognize the existence of anything called a "directory", it's
not relevant to C.

I have had systems trying to use UUCP cease working because
the work directory got so large it took too long to search it
and the other end timed out.  The directory had grown to over
5,000 files or so.  It didn't help that this particular old
version of UNIX would never shrink a directory.  Once things
got bad, they would only get worse.

>If you have 1000 files in a directory, when you type "ls" the
>command must sort them before displaying them, so it takes longer
>than if there were only 100 files.

>Under UNIX, a directory with 150 files is a small directory. ;-)

Imagine you are chief programmer for AOL or Netcom.  Would
you design mail so that each user's mail is kept in one file
in a single mail directory?  AOL has at least a million
subscribers last I heard, and they should be planning for expansion.
Assume a typical directory entry takes about 20 bytes.
That's a *20 MEGABYTE* directory NOW, and maybe 200 megabytes
a few years later - and this is just for the directory, not
for the files in it.  To find a file in that directory, you
have to read, on the average, half of it.  *SLOW*.

Typical breakpoints in performance searching UNIX directories
occur at these points:
- The directory gets bigger than one block.
- The directory acquires an indirect block (typically, gets bigger
  than 10 blocks)
- The directory acquires a double-indirect block.  (It's
  probably over the megabyte range now.)
- The directory acquires a triple-indirect block.  (Now you're
  getting totally insane, and where were you planning on putting
  the file contents?)

Just because these are breakpoints where search time increases
in a way WORSE than proportional to the size of the directory
doesn't mean you shouldn't try to keep the directory as small
as possible.

How big a UNIX "block" is depends on your system, and perhaps
how the disk was initialized.  Typical values include 512 bytes,
4k, and 8k.  With BSD UFS filesystems, directory entries are
not fixed size.  Use of very long filenames will require very
large directory entries.

In the old V7 UNIX filesystem, a block was 512 bytes, directory
entries were 16 bytes long (14 character names max), and
you could fit 30 files (plus "." and "..") in a single block,
and 318 before getting an indirect block.  

Assuming similar-sized directory entries for a BSD 8k filesystem,
that would be about 510 files and 5118 files, respectively.
This would get a lot smaller if you go nuts with long file names.
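
For anyone who wants to plug in their own block and entry sizes, the
arithmetic is just entries-per-block minus the two slots for "." and
".." (a sketch, assuming a shell with POSIX $(( )) arithmetic):

        bsize=8192; esize=16; direct=10     # BSD 8k blocks, V7-sized entries
        echo "fits in one block:     $(( bsize / esize - 2 ))"            # 510
        echo "fits in direct blocks: $(( direct * bsize / esize - 2 ))"   # 5118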

                                        Gordon L. Burditt
                                        sneaky.lerctr.org!gordon

 
 
 

"Performance" limit to file numbers in a directory

Post by Will Ro » Tue, 28 Jan 1997 04:00:00


: Hi,

: I am aware that in DOS it's better to keep the number of files
: in a directory below 150, for performance reasons. I am a bit
: confused about whether the same applies to UNIX, and if such a
: limit exists, I would like to know the number.

It's probably a function of the block structure of the filesystem
(i.e. implementation-dependent).  Be aware that tools such as ls
may have a fixed upper limit to the number of files they will
handle, not related to the block structure.  The files will still
be safe in the directory, but you'll need to write special tools to
see their names.

Searching directories with several levels of indirect blocks gets
old, too.  A very slow process.  As a general rule I'd keep the number
of files per directory below a few hundred, if I were you.

Will

 
 
 

"Performance" limit to file numbers in a directory

Post by R!ch » Tue, 28 Jan 1997 04:00:00



> It's probably a function of the block structure of the filesystem
> (i.e. implementation-dependent).  Be aware that tools such as ls
> may have a fixed upper limit to the number of files they will
> handle, not related to the block structure.  The files will still

Huh?  It would be a *very* broken ls (or whatever) that would have
an upper limit to the number of files it would handle!

> Searching directories with several levels of indirect blocks gets
> old, too.  A very slow process.  As a general rule I'd keep the number

True, but you'd need a *huge* number of directory entries before
you start going into multiply indirect blocks - but don't forget
about caching.

> of files per directory below a few hundred, if I were you.

I think the limit (and even then, only for performance reasons) can safely
be a few thousand - it depends on your hardware/OS.

--
R!ch

If it ain't analogue, it ain't music.
#include <disclaimer.h>                          \\|// - ?
                                                 (o o)
          /==================================oOOo=(_)=oOOo========\

          |  Sun Service Contractor                               |
          |                            Voice: +44 (0)1276 691974  |
          |                                 .oooO                 |
          |                                  (  )  Oooo.          |
          \===================================\ (==(   )==========/
                                               \_)  ) /
                                                   (_/

 
 
 

"Performance" limit to file numbers in a directory

Post by Colin Doole » Tue, 28 Jan 1997 04:00:00



> True, but you'd need a *huge* number of directory entries before
> you start going into multiply indirect blocks - but don't forget
> about caching.

I'd like to see you put more than 112 files in the root directory
of an MS-DOS machine....

--
<\___/>
/ O O \
\_____/  FTB.

IMHO: Windows is ^!%#!

 
 
 

"Performance" limit to file numbers in a directory

Post by James Youngm » Thu, 30 Jan 1997 04:00:00



>I'd like to see you put more than 112 files in the root directory
>of an MS-DOS machine....

It's possible with MS-DOS __ramdisks__, I think...

--
James Youngman       VG Gas Analysis Systems |The trouble with the rat-race
 Before sending advertising material, read   |is, even if you win, you're
http://www.law.cornell.edu/uscode/47/227.html|still a rat.

 
 
 

"Performance" limit to file numbers in a directory

Post by K. Bjarnas » Sun, 02 Feb 1997 04:00:00




>> True, but you'd need a *huge* number of directory entries before
>> you start going into multiply indirect blocks - but don't forget
>> about caching.

>I'd like to see you put more than 112 files in the root directory
>of an MS-DOS machine....

 Just did it.

 247 files, 31 dirs.  No problem.  Yes, in the root.

Fly Heisenberg Air: Don't know where we are, but we're making good time!

 
 
 

"Performance" limit to file numbers in a directory

Post by Alicia Carla Longstreet » Tue, 04 Feb 1997 04:00:00





> >> True, but you'd need a *huge* number of directory entries before
> >> you start going into multiply indirect blocks - but don't forget
> >> about caching.

> >I'd like to see you put more than 112 files in the root directory
> >of an MS-DOS machine....

>  Just did it.

>  247 files, 31 dirs.  No problem.  Yes, in the root.

> Fly Heisenberg Air: Don't know where we are, but we're making good time!

Right, the 112-entry limit applies to floppy disks only (a hard-disk
root directory under MS-DOS typically allows 512 entries)!  Thank you,
I knew that there was something wrong with the original number.

Alicia Carla Longstreet
"The time has come," the Walrus said,
"To talk of many things:
Of shoes-and ships-and sealing wax-
Of cabbages-and kings-
And why the sea is boiling hot-
And whether pigs have wings."
                Lewis Carroll
But Please, not on comp.lang.c or comp.lang.asm.x86

 
 
 
