Scan a Directory/cron question

Post by Bernard Blundel » Tue, 16 Feb 1999 04:00:00




> I need to have a script scan a directory like every 10 minutes and do some
> things to a file that appears in that directory.  You see, the file gets there
> after some processing is done on another filesystem.  Anyway, the scripting
> isn't the question I have.  My co-worker is concerned that this technique of
> scanning the directory every 10 minutes is not foolproof because she says that
> it is possible that my script may grab the file before all the data comes in.
> She's saying that I may grab the file before the full size of the file truly
> comes in.  Is that true?  My question to you guys is : when I set my cron to
> grab a file every 10 minutes, is it possible that it could get only half a file
> instead of the fullsized file.  Does the filename get to a directory before all
> the data does?  Also, is making the cron look at a directory every 10 minutes
> the only way to truly scan a directory for a file, since I can't think of a way
> to say, "When a file (THE WHOLE FILE) arrives, run this script".

Hello,

You don't specify which OS you're using, but fuser(1) may help you out. Its output
is not too hard to parse with ksh/awk/perl. Alternatively, you could 'poll' the file
(it appears to be a single well-known file you're concerned about) for a few
seconds and see if it's been modified during that time. Create a reference file and
use something like:

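# Loop until the file has gone 2 seconds without being modified (ksh syntax).
while
    touch reference.file    # mark the current time
    sleep 2
    # true (keep looping) if the file changed after the touch
    [[ file.youre.interested.in -nt reference.file ]]
do
    :
done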

This could be fooled by a slow delivery, so I wouldn't personally rely on it.

Alternatively, if your mechanism is flexible enough, can the writer of the file also
rename it? A simple technique is for the writer to write to 'foo.tmp', and when
they've finished writing, rename 'foo.tmp' as 'foo'. That way, your script can't be
fooled by an incomplete file as its name won't appear until the writing is complete.
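
A minimal sketch of the writer side of that rename technique, assuming the writer is a shell script you control. The function name deliver and its three arguments are made up for illustration; the '.tmp' suffix follows the 'foo.tmp' example above.

```shell
#!/bin/sh
# deliver SRC DEST_DIR NAME  (hypothetical helper, names are illustrative)
# Build the file under a temporary name, then rename it into place.
deliver() {
    src=$1 dest_dir=$2 name=$3
    tmp="$dest_dir/$name.tmp"
    # Copy into the pickup directory under the temporary name.
    cp "$src" "$tmp" || return 1
    # Both names are in the same directory, so mv is a plain rename and
    # the final name only ever refers to the complete file.
    mv "$tmp" "$dest_dir/$name"
}
```

Something like `deliver /data/out/report /pickup report` would leave only 'report' in the pickup directory once the copy has finished. Keeping the temporary name in the same directory as the final one is deliberate: it guarantees both are on the same filesystem.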

Good luck

--
When the only tool you have is a hammer,
every problem begins to look like a nail.

 
 
 

Scan a Directory/cron question

Post by RACE » Tue, 16 Feb 1999 04:00:00



Hi,

If your first program is running on another fs, you might as well
create a second file (after the whole file has been sent) within the same
directory and have your cron job/script scan for that file. If it
exists, then you know that your file is intact and whole, and you can run
what you need to. Perhaps a little more overhead, but as I see it, better
than risking a lag of some sort and getting only a piece of the data.
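
A rough sketch of the check on the cron side, assuming the sender creates an empty marker file next to the data file once the transfer is done. The function name ready and the '.done' suffix are assumptions, not anything the sender is known to use.

```shell
#!/bin/sh
# ready FILE: succeed only if FILE and its completion marker both exist.
# The ".done" suffix is hypothetical; use whatever the sender creates.
ready() {
    [ -f "$1" ] && [ -f "$1.done" ]
}

# Sender side, after the transfer has finished (path is illustrative):
#   touch /pickup/data.txt.done
# Cron side:
#   if ready /pickup/data.txt; then ...process it...; fi
```

The marker has to be created after the data file is fully written, otherwise it tells you nothing.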

-John.

 
 
 

Scan a Directory/cron question

Post by Richard Howlet » Tue, 16 Feb 1999 04:00:00



Can you not send another file containing the length of the target file?

Or how about sending to another directory then moving to the pickup
directory? The "move" should be atomic.
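
The first suggestion, a companion file holding the length, might be checked along these lines. This is only a sketch: the function name size_matches and the '.len' suffix are invented for the example, and it assumes the sender writes the byte count as a plain decimal number.

```shell
#!/bin/sh
# size_matches FILE: compare FILE's size in bytes with the number stored
# in FILE.len (a hypothetical companion file written by the sender).
size_matches() {
    [ -f "$1" ] && [ -f "$1.len" ] || return 1
    # Strip whitespace: some wc implementations pad their output.
    expected=$(tr -d '[:space:]' < "$1.len")
    actual=$(wc -c < "$1" | tr -d '[:space:]')
    [ -n "$expected" ] && [ "$actual" = "$expected" ]
}
```

A cron job would simply skip the file on this run whenever size_matches fails and try again ten minutes later.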

--
Richard Howlett


 
 
 

Scan a Directory/cron question

Post by Ralf Draege » Wed, 24 Feb 1999 04:00:00




> Can you not send another file containing the length of the target file?

> Or how about sending to another directory then moving to the pickup
> directory? The "move" should be atomic.

Not true if the move goes from one fs to another.
fuser is IMHO the better choice.
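
A rough sketch of the fuser check, assuming a fuser that follows the POSIX convention of printing the PIDs on stdout and everything else on stderr. The function name in_use is made up for illustration. Bear in mind that fuser only reports processes on the local machine, so it cannot see a writer running on another host, and it may need extra privileges to see other users' processes.

```shell
#!/bin/sh
# in_use FILE: return 0 if some local process has FILE open, 1 if none
# is reported, 2 if fuser is not installed. Only stdout (the PID list)
# is inspected; the exit status of fuser itself is ignored.
in_use() {
    command -v fuser >/dev/null 2>&1 || return 2
    pids=$(fuser "$1" 2>/dev/null)
    # An empty PID list (ignoring blanks) means nobody has it open.
    [ -n "$(printf '%s' "$pids" | tr -d '[:space:]')" ]
}
```

The cron script would then leave the file alone for this run whenever in_use returns 0.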

And a hint for SJacksonII:
To fully understand the problem try this:
$ touch xxx
$ tail -f xxx &
(This scans a file and if data is appended it is also printed on the terminal)
$ cat /etc/hosts >> xxx
(OK, a UUOC but it's for the understanding :)
$ fg
<Ctrl-C>

--

- Intraplan Consult Gmbh  Orleansplatz 5a  81667 Muenchen  +49 89 45911-0 -

God, root, what is the difference? -Pitr (www.userfriendly.org: 11/11/1998)

 
 
 

1. Scan a Directory/cron question


Unless the file is showing up as the result of a mv command
(or rename() function, or link command/function) from a
file on the same filesystem as the target directory, there
is indeed a race condition, and you run the risk of
your script seeing an incompletely transferred file.

When a file is created, it is created with zero size and a
full complement of metadata, including the directory entry.
Any other process can see that file and start reading it while
the first process is still writing to it.  But look back to
that "unless" in my previous paragraph.  Therein lies your most
likely solution (if you must poll for the file), provided you
have some control over the means by which the file appears in
the target directory: have the file get transferred into the
directory with one name, and then do a mv/rename of the file
to a second name once the transfer/creation of the first file
is complete.  Have your script look for this second name;
it won't exist unless the whole file is there.
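
On the polling side, the script only has to ignore the first name. A minimal sketch, assuming the in-transit name ends in '.tmp' (as in the 'foo.tmp' convention suggested earlier in the thread); the function name scan_pickup, the handler argument and the crontab path are all hypothetical.

```shell
#!/bin/sh
# scan_pickup DIR HANDLER: run HANDLER on every completed file in DIR.
# Files still named *.tmp are assumed to be in transit and are skipped.
scan_pickup() {
    dir=$1 handler=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # also covers an empty directory
        case $f in
            *.tmp) continue ;;       # writer has not renamed it yet
        esac
        "$handler" "$f"
    done
}

# Example crontab entry, every 10 minutes (script path is illustrative):
#   0,10,20,30,40,50 * * * * /usr/local/bin/scan_pickup.sh
```

The handler should move or delete each file once it is done with it, otherwise the next run will pick the same file up again.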

Then again, if you have this kind of control, it may be just
as easy to have the creating process fire up an instance of
your script once the file transfer is complete, obviating
the need for the cron-based polling.
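
If the creating process is itself a shell script, that can be as small as the sketch below. The function name transfer_then_run and its arguments are illustrative only, and cp stands in for whatever actually moves the data.

```shell
#!/bin/sh
# transfer_then_run SRC DEST HANDLER: copy SRC to DEST and, only if the
# copy succeeded, hand DEST to HANDLER. All names are hypothetical.
transfer_then_run() {
    cp "$1" "$2" && "$3" "$2"
}
```

Because the handler runs only after cp has returned successfully, it never sees a partial file.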

                --Ken Pizzini
