File Names and File Dates

Post by Manny Cel » Fri, 04 Jun 1999 04:00:00



Hello Everyone,

I wrote a program that reads the contents of a directory. I am using
"dirent.h" to get the file names.

I also need to get each file's date, and for that I am using stat(). The
problem is that when you call stat(), it reads the entire file system to
get to the filename you specified, so the files before it are read again
on every call.

Example:
File      readdir()   stat()
file.1    get name    get date for file.1
file.2    get name    read file.1, get date for file.2
file.3    get name    read file.1, read file.2, get date for file.3

and so on.

When you have a large sample (600,000+ files), you can imagine that by
the time it gets to the 250,000th file, it is taking up to 15 minutes to
get the date of the 251,000th file. In my test, it took close to three
hours to read 250,000 files and get their dates.
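For comparison, here is a minimal sketch of the readdir()-then-stat() loop described above, assuming a POSIX system that provides `clock_gettime()`. The helper name `stat_in_batches()` and the 4096-byte path buffer are illustrative assumptions, not the poster's code. It times each batch of `stat()` calls, which is one way to test whether the per-file cost really grows as the loop gets further into the directory:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: stat() every entry of `dir` and print the elapsed
 * time for each group of `batch` successful calls.
 * Returns the number of entries stat()ed, or -1 if `dir` can't be opened. */
long stat_in_batches(const char *dir, long batch)
{
    DIR *dp = opendir(dir);
    if (dp == NULL)
        return -1;

    struct dirent *de;
    struct stat st;
    struct timespec t0, t1;
    char path[4096];            /* assumed large enough for dir + name */
    long n = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((de = readdir(dp)) != NULL) {
        /* readdir() only hands back the name, so build a path for stat(). */
        int len = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;           /* path would not fit in the buffer */
        if (stat(path, &st) != 0)
            continue;           /* entry vanished or is unreadable */
        n++;                    /* st.st_mtime now holds the file date */

        if (batch > 0 && n % batch == 0) {
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double secs = (double)(t1.tv_sec - t0.tv_sec)
                        + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
            printf("entries %ld-%ld: %.3f s\n", n - batch + 1, n, secs);
            t0 = t1;
        }
    }
    closedir(dp);
    return n;
}
```

Calling `stat_in_batches("/big/dir", 1000)` prints one timing line per 1,000 files. Roughly constant batch times would suggest the cost is per-inode disk I/O; steadily growing times would suggest each lookup is scanning the directory.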

Is there a better way of doing this? Help!!!!

Thank you

Post by Barry Margoli » Fri, 04 Jun 1999 04:00:00



>Hello Everyone,

>I wrote a program that reads the contents of a directory. I am using
>"dirent.h" to get my file name.

>I also need to get the file date and, for that, I am using stat(). The
>problem is that when you use stat(), it reads the entire file system to
>get to the filename that you have specified and thus it reads each file
>twice for each call.

I think you mean "it reads the entire directory to get to the filename that
you have specified and thus it reads each directory twice for each call."

The kernel has a cache of recently used names, so it doesn't have to
rescan the directory as often as you think.

>When you have a large sample (600,000+) files, you can imaging that by
>the time it gets to the 250,000 file, it is taking up to 15 minutes to
>get the date of the 251,000 file. On my test, it took close to three
>hours to read, and get the date of the 250,000 files.

>Is there a better way of doing this? Help!!!!

Are you saying that you have a single directory with 600,000 files in it?
Most Unix filesystems are not optimized for such large directories (it
would be nice in such cases if directories were implemented as hash tables
or some kind of B-tree, but they aren't).  It would be best to divide it up
into subdirectories, which would reduce the search time by a logarithmic
factor.
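One possible way to do that split, offered as a hypothetical scheme rather than anything proposed in the thread, is to hash each file name into one of 256 subdirectories named `00` to `ff`. The sketch uses the well-known djb2 string hash; `bucket_path()` is an invented helper name:

```c
#include <stddef.h>
#include <stdio.h>

/* djb2 string hash (Dan Bernstein): h = h * 33 + c, starting from 5381. */
static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    unsigned char c;
    while ((c = (unsigned char)*s++) != '\0')
        h = h * 33 + c;
    return h;
}

/* Hypothetical helper: write "<root>/<bucket>/<name>" into `out`, where
 * <bucket> is one of 256 two-hex-digit subdirectory names ("00".."ff").
 * Returns snprintf()'s result (the full length, even if truncated). */
int bucket_path(const char *root, const char *name, char *out, size_t outlen)
{
    unsigned bucket = (unsigned)(djb2(name) % 256);
    return snprintf(out, outlen, "%s/%02x/%s", root, bucket, name);
}
```

For example, `bucket_path("/data", "a", buf, sizeof buf)` produces `/data/06/a`. The 256 bucket directories would need to be created beforehand (for instance with `mkdir()`). With one level of 256 buckets, a linear lookup examines roughly 256 + N/256 entries instead of N (about 2,600 rather than 600,000 here); nesting further levels pushes the cost toward one that grows only logarithmically with N.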

--

GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Post by Kurt J. Lanz » Sat, 05 Jun 1999 04:00:00



> Hello Everyone,

> I wrote a program that reads the contents of a directory. I am using
> "dirent.h" to get my file name.

> I also need to get the file date and, for that, I am using stat(). The
> problem is that when you use stat(), it reads the entire file system to
> get to the filename that you have specified and thus it reads each file
> twice for each call.

Wrong. It must read the inode for each file. But obviously this takes
time. From your example (250,000 files in 3 hours) your OS is managing
about 20+ files per second. Not infinitely fast, but not too shabby.
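Kurt's rate follows from the poster's figures, taking "close to three hours" as exactly three:

```latex
\frac{250{,}000\ \text{files}}{3 \times 3600\ \text{s}}
  = \frac{250{,}000}{10{,}800}\ \text{files/s}
  \approx 23\ \text{files/s},
\qquad
\frac{1}{23.1\ \text{files/s}} \approx 43\ \text{ms per file}
```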

More detail: a directory entry contains the file name and the inode
number. This number maps directly to the inode, which is read in one I/O
operation. It's direct addressing. No scanning or anything like that. You
can't get the file's stat() information any faster except by spinning
the disk faster.
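Kurt's point that a directory entry is just a name plus an inode number can be seen from user space: `readdir()` already returns the inode number in `d_ino`, and `lstat()` reports the same number in `st_ino`. This is a sketch assuming a POSIX.1-2008 system; `compare_inodes()` is a made-up helper. Note that `d_ino` gives only the number, not the dates, so a `stat()`-family call is still needed to read the inode itself, and the two values can legitimately differ at mount points:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: for each entry in `dir` other than "." and "..",
 * compare the inode number in the directory entry (d_ino) with the one
 * lstat() reports (st_ino). Prints mismatches and returns how many
 * entries matched, or -1 if `dir` cannot be opened. */
long compare_inodes(const char *dir)
{
    DIR *dp = opendir(dir);
    if (dp == NULL)
        return -1;

    struct dirent *de;
    struct stat st;
    char path[4096];            /* assumed large enough for dir + name */
    long matched = 0;

    while ((de = readdir(dp)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        int len = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;
        /* lstat() so a symbolic link reports its own inode, not its target's. */
        if (lstat(path, &st) != 0)
            continue;
        if ((ino_t)de->d_ino == st.st_ino)
            matched++;
        else
            printf("%s: d_ino %lu, st_ino %lu\n", de->d_name,
                   (unsigned long)de->d_ino, (unsigned long)st.st_ino);
    }
    closedir(dp);
    return matched;
}
```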

Post by Barry Margoli » Sat, 05 Jun 1999 04:00:00





>> Hello Everyone,

>> I wrote a program that reads the contents of a directory. I am using
>> "dirent.h" to get my file name.

>> I also need to get the file date and, for that, I am using stat(). The
>> problem is that when you use stat(), it reads the entire file system to
>> get to the filename that you have specified and thus it reads each file
>> twice for each call.

>Wrong. It must read the inode for each file. But obviously this takes
>time. From your example (250,000 files in 3 hours) your OS is managing
>about 20+ files per second. Not infinitely fast, but not too shabby.

>More detail: a directory entry contains the file name and the inode
>number. This number maps directly to the inode, which is read in one I/O
>operation. Its direct addressing. No scanning or anything like that. You
>can't get the file's stat() information any faster except by spinning
>the disk faster.

I think his complaint is that readdir() already has a pointer to the
directory entry, but when you call stat() on a filename the kernel has to
scan all the way through the directory to find that entry, in order to get
the inode number.  So stat'ing every entry in a directory seems like it's
an N-squared operation.
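Under the worst-case assumption Barry describes, where every `stat()` scans the directory linearly from the start until it reaches entry i and the name cache never helps, the total number of entries examined for N files would be:

```latex
\sum_{i=1}^{N} i = \frac{N(N+1)}{2} \in O(N^2),
\qquad
N = 250{,}000 \;\Rightarrow\; \frac{250{,}000 \times 250{,}001}{2} \approx 3.1 \times 10^{10}
```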

As I mentioned in my other post, there's a name cache.  However, on
reflection, I don't know offhand if names that are returned by readdir()
are put in it.  The name might not be cached until you call stat(), by
which time it's too late (what he'd like to do is speed up the stat() in
the first place).
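On systems that provide POSIX.1-2008's `dirfd()` and `fstatat()` (which postdate this 1999 thread), each entry can be looked up relative to the already-open directory, so the kernel does not re-resolve the leading path components for every file. This is a sketch under that assumption with an invented helper name, and it does not remove the per-name lookup inside the directory that Barry describes; how expensive that lookup is still depends on how the filesystem stores the directory:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: print the modification date and name of every
 * entry in `dir`, using fstatat() relative to the open directory.
 * Returns the number of entries printed, or -1 if `dir` can't be opened. */
long list_dates_at(const char *dir)
{
    DIR *dp = opendir(dir);
    if (dp == NULL)
        return -1;
    int fd = dirfd(dp);         /* descriptor for the open directory */
    if (fd == -1) {
        closedir(dp);
        return -1;
    }

    struct dirent *de;
    struct stat st;
    char when[32];
    long n = 0;

    while ((de = readdir(dp)) != NULL) {
        /* The name is resolved inside `fd`; no full path string is built. */
        if (fstatat(fd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;
        struct tm tmv;
        if (localtime_r(&st.st_mtime, &tmv) == NULL)
            continue;
        strftime(when, sizeof when, "%Y-%m-%d %H:%M:%S", &tmv);
        printf("%s  %s\n", when, de->d_name);
        n++;
    }
    closedir(dp);               /* also closes fd */
    return n;
}
```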

As I mentioned, the best solution is to organize your directory better.
The kernel handles deep, thin directory trees much more efficiently than
shallow, wide trees.

1. Mail Status File with the File Name Changes by Date

Can someone help me here?

We have a program that runs every night and creates an output file in the
format of "status.year_month_date". For example, it creates a file
for yesterday (Sep 12, 1993) as "status.930912".

Is it possible to write a Perl or awk program that mails the status file
to users automatically the next morning? I am having trouble making awk
work with an existing file.


-Yon

2. Network HP printer setup help!!

3. Scripting Help: tar a dir with time and date as the tar file name. TIA

4. ISDN:host lookup - failure

5. batch file using system date in file name??

6. "w" and "who" do not list the same users logged in

7. Archiving files in "date folders" based on date and time file generated

8. System crashes (UFS related?) on reboot or shutdown

9. Restore (file by file) and many named files

10. how to extract path/file name up to a maximum of two levels from the file name?

11. Put Date On File Name

12. Using system date-time as file name

13. create a file copy with name and date