> I need to have a script scan a directory like every 10 minutes and do some
> things to a file that appears in that directory. You see, the file gets there
> after some processing is done on another filesystem. Anyway, the scripting
> isn't the question I have. My co-worker is concerned that this technique of
> scanning the directory every 10 minutes is not foolproof because she says that
> it is possible that my script may grab the file before all the data comes in.
> She's saying that I may grab the file before the full size of the file truly
> comes in. Is that true? My question to you guys is : when I set my cron to
> grab a file every 10 minutes, is it possible that it could get only half a file
> instead of the fullsized file. Does the filename get to a directory before all
> the data does? Also, is making the cron look at a directory every 10 minutes
> the only way to truly scan a directory for a file, since I can't think of a way
> to say, "When a file (THE WHOLE FILE) arrives, run this script".
You don't say which OS you're using, but fuser(1) may help you out: it reports
which processes have a file open, and its output is not too hard to parse with
ksh/awk/perl. Alternatively, you could poll the file (it appears to be a single
well-known file you're concerned about) for a few seconds and see whether it's
been modified during that time. Create a reference file, wait a while, then use
something like:
[[ file.youre.interested.in -nt reference.file ]]
This could be fooled by a slow delivery (one that stalls for longer than you
wait), so I wouldn't personally rely on it.
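For what it's worth, the polling idea could be sketched roughly as below. The
name is_stable is made up for illustration, mktemp is assumed to be available
(it isn't in POSIX), and I've used the [ ] form of the -nt test so it should
also run under a plain sh; [[ ]] behaves the same way in ksh. The same caveat
applies: a writer that pauses for longer than the window will look "finished".

```shell
# is_stable FILE [SECONDS]
# Hypothetical helper: returns 0 if FILE was not modified during the
# polling window, 1 if it was, 2 if FILE is missing or no reference
# file could be created.
is_stable() {
    st_file=$1
    st_wait=${2:-5}                    # polling window, default 5 seconds

    [ -f "$st_file" ] || return 2
    st_ref=$(mktemp) || return 2       # reference file stamped "now"

    sleep "$st_wait"

    # -nt is true if the file's mtime is newer than the reference file's
    if [ "$st_file" -nt "$st_ref" ]; then
        st_rc=1                        # still being written to
    else
        st_rc=0                        # untouched during the window
    fi

    rm -f "$st_ref"
    return "$st_rc"
}

# Example (hypothetical path):
#   if is_stable /incoming/foo 10; then process /incoming/foo; fi
```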
Alternatively, if your mechanism is flexible enough, can the writer of the file also
rename it? A simple technique is for the writer to write to 'foo.tmp', and when
they've finished writing, rename 'foo.tmp' as 'foo'. That way, your script can't be
fooled by an incomplete file as its name won't appear until the writing is complete.
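A minimal sketch of the rename approach, assuming you control the writer. The
function names deliver and list_ready are invented here, and cp stands in for
whatever actually produces the data. One thing to watch: the rename is only
atomic when 'foo.tmp' and 'foo' are on the same filesystem, so the temporary
file should be written into the destination directory itself, not moved in from
the other filesystem at the last step.

```shell
# Writer side (hypothetical helper): copy SRC to DEST via DEST.tmp,
# so DEST only appears once the data is completely written.
deliver() {
    dl_src=$1
    dl_dest=$2
    dl_tmp="$dl_dest.tmp"              # same directory, hence same filesystem

    cp "$dl_src" "$dl_tmp" && mv "$dl_tmp" "$dl_dest"
}

# Reader side (hypothetical helper): print the files in DIR that are
# ready to process, skipping anything still named *.tmp.
list_ready() {
    lr_dir=$1
    for lr_f in "$lr_dir"/*; do
        [ -f "$lr_f" ] || continue     # also skips the unmatched glob
        case $lr_f in
            *.tmp) continue ;;         # writer hasn't finished yet
        esac
        printf '%s\n' "$lr_f"
    done
}
```

The cron job would then only act on what list_ready prints, and never needs to
guess whether a file is complete.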
When the only tool you have is a hammer,
every problem begins to look like a nail.