Optimizing directory structures for ext2 fs and lots of files.

Post by Lincoln Yeoh » Sun, 02 Apr 2000 04:00:00



Hi,

I've heard that ext2 fs becomes less efficient if there are tons of files
in a directory.

OK, what if I do have lots of files? How should they be split?
By 100s?
e.g.
/opt/d0/file0
..
/opt/d0/file99

/opt/d1/file100
..
/opt/d1/file199

/opt/d99/file9900
..
/opt/d99/file9999

Or by 200s? 500s? or 1000s?
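The split described above amounts to integer division of the file number by the bucket size. A minimal sketch (the `bucket_path` helper and its parameters are hypothetical, not from the post):

```python
import os

def bucket_path(root, n, per_dir=100):
    """Map file number n to its bucket directory, using the /opt/dNN
    layout above: directory index = n // per_dir."""
    return os.path.join(root, f"d{n // per_dir}", f"file{n}")

print(bucket_path("/opt", 9950))        # /opt/d99/file9950
print(bucket_path("/opt", 9950, 1000))  # /opt/d9/file9950
```

Changing the bucket size only changes the divisor, so the same code covers the 100/200/500/1000 variants being compared.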

Basically, how much time does it take to descend one directory level versus
scanning through 100 files? How flat should the "pyramid" be?

I'll probably consider other file systems in the future (they have to be
fast, cheap, reliable, robust and SMP safe). But meanwhile I'm sticking
with ext2.

Thanks,

Have a nice day!
Link.

Optimizing directory structures for ext2 fs and lots of files.

Post by Christopher Brow » Sun, 02 Apr 2000 04:00:00


Centuries ago, Nostradamus foresaw a time when Lincoln Yeoh would say:

>Hi,

>I've heard that ext2 fs becomes less efficient if there are tons of files
>in a directory.

>OK what if I have lots of files. How should they be split?
>By 100s?
>e.g.
>/opt/d0/file0
>..
>/opt/d0/file99

>/opt/d1/file100
>..
>/opt/d1/file199

>/opt/d99/file9900
>..
>/opt/d99/file9999

>Or by 200s? 500s? or 1000s?

>Basically how much time does it take to change one directory level, vs scan
>through 100 files. How flat should the "pyramid" be.

>I'll probably consider other file systems in the future (they have to be
>fast, cheap, reliable, robust and SMP safe). But meanwhile I'm sticking
>with ext2.

I think I'd go with the 100 option.  

It has the merit that you can go into the directory, type "ls," and
get a list of files/directories that is not so large that it has to occupy
several screens.  

A couple other thoughts:

a) Use leading zeros so that these encoded filenames are of uniform
   length.
   For instance, /opt/d00, /opt/d01, ... /opt/d98, /opt/d99

Uniform lengths mean that you can do matches via more specific
expressions, which can be safer and possibly faster.

"ls /opt/d[0-9] /opt/d[0-9][0-9]"
   is not as good as
"ls /opt/d[0-9][0-9]"
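The same idea works outside the shell: a sketch using Python's `fnmatch`, which supports the `[0-9]` syntax of the glob above (the names here are generated for illustration):

```python
from fnmatch import fnmatch

# Zero-padding keeps every encoded name the same length, so one
# fixed-width pattern matches all of them and nothing else.
names = [f"d{i:02d}" for i in range(100)]   # d00, d01, ..., d99

assert all(fnmatch(n, "d[0-9][0-9]") for n in names)
print(names[0], names[-1])                  # d00 d99
```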

b) If this stuff is cryptic, there's no merit in having long filenames.

/opt/d00/f210 is more compact than /opt/d00/file210, and is no less
understandable.

c) Be prepared to do a benchmark based on using 100 files/directory
as well as 1000 files/directory.  That's likely the most relevant
comparison.  You're not likely to see *great* benefit in moving from
100 files/directory to some "perfect sweet spot" of 345/directory.
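A rough benchmark along those lines might look like this (a hypothetical sketch, not the poster's code; on classic ext2, directory lookups are a linear scan, so the larger directory should come out measurably slower):

```python
import os
import tempfile
import time

def time_lookups(n_files, probes=2000):
    """Create n_files empty files in a fresh directory, then time
    `probes` stat() calls spread across them."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n_files):
            open(os.path.join(d, f"f{i:04d}"), "w").close()
        start = time.perf_counter()
        for i in range(probes):
            os.stat(os.path.join(d, f"f{i % n_files:04d}"))
        return time.perf_counter() - start

print("100 files: ", time_lookups(100))
print("1000 files:", time_lookups(1000))
```

Run it on the actual target filesystem and hardware; page-cache effects and filesystem differences can easily swamp the directory-size effect on a modern system.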

d) Consider using hexadecimal values in the encoding, or, if you want
"several hundred" files per directory, transforming to "base 36,"
where you combine the 10 digits 0..9 with the 26 letters a..z.
That gives you (10+26) * (10+26) = 1296 combinations in two characters.
Short filenames are more efficient to work with, both in your code and
within the kernel's support for the filesystem.
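A base-36 encoder is a few lines; a sketch (the helper name and fixed two-character width are assumptions for illustration):

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n, width=2):
    """Encode a non-negative integer in base 36, zero-padded to width
    so directory names stay a uniform length."""
    s = ""
    while n:
        n, r = divmod(n, 36)
        s = DIGITS[r] + s
    return (s or "0").rjust(width, "0")

print(to_base36(0))     # 00
print(to_base36(35))    # 0z
print(to_base36(1295))  # zz
```

With two characters this covers directory indices 0 through 1295, matching the 36 * 36 = 1296 limit above.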
--
--Kill Running Inferiors--