synchronize files with downloaderrors

synchronize files with downloaderrors

Post by peter pils » Mon, 17 Mar 2003 03:39:52



I've just downloaded a very big binary file from a remote machine; there
was a download problem and the MD5 sums do not match.

Now I could simply download the file again (which would be very
time-consuming and use up the bandwidth again), or I could try a
different approach and only download the part that actually has problems.

The basic idea I had was to split the file into chunks on both machines
(using a binary-search approach) and compare the MD5 hashes for each chunk. If
the hashes don't match, split the chunk again, compare the hashes
again, and so on. That way I could 'easily' (at least in theory) find the
part(s) that have problems and re-download only those parts. (Or find mostly
erroneous chunks and therefore know that I need to download the whole thing
again.)
Is there already a tool that can do what I want?
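The recursive bisection described here can be sketched in Python as a local simulation: both copies are byte strings (in a real tool the "remote" hashes would be fetched over the network), and `min_chunk` is an assumed cutoff below which bisection stops.

```python
import hashlib

def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def find_bad_ranges(local: bytes, remote: bytes, start=0, min_chunk=4):
    """Recursively bisect the data and return the byte ranges whose
    MD5 hashes differ, down to min_chunk granularity."""
    if md5(local) == md5(remote):
        return []
    if len(local) <= min_chunk:
        return [(start, start + len(local))]
    mid = len(local) // 2
    return (find_bad_ranges(local[:mid], remote[:mid], start, min_chunk)
            + find_bad_ranges(local[mid:], remote[mid:], start + mid, min_chunk))

good = bytes(range(256)) * 4
bad = bytearray(good)
bad[100] ^= 0xFF          # flip one byte to simulate corruption
print(find_bad_ranges(bytes(bad), good))   # -> [(100, 104)]
```

A larger `min_chunk` means fewer hash exchanges but more data to re-download per bad range; a dropped or inserted byte (rather than a flipped one) would shift everything after it, so every subsequent chunk would mismatch.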

thnx,
peter  

--
peter pilsl

http://www.goldfisch.at

 
 
 

synchronize files with downloaderrors

Post by peter pils » Mon, 17 Mar 2003 03:43:23


I've just downloaded a very big binary file from a remote machine; there
was a download problem and the MD5 sums do not match.

Now I could simply download the file again (which would be very
time-consuming and use up the bandwidth again), or I could try a
different approach and only download the part that actually has problems.

The basic idea I had was to split the file into chunks on both machines
(using a binary-search approach) and compare the MD5 hashes for each chunk. If
the hashes don't match, split the chunk again, compare the hashes
again, and so on. That way I could 'easily' (at least in theory) find the
part(s) that have problems and re-download only those parts. (Or find mostly
erroneous chunks and therefore know that I need to download the whole thing
again.)
Is there already a tool that can do what I want?

Mainly, a tool would be interesting that builds a hash of a file and,
every x MB, dumps the intermediate (running) hash plus the hash of the
last x MB. That way one could easily see where the problems start and
where the files get back in sync. (Especially if you run it from
beginning to end and vice versa.)
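A minimal sketch of such a tool, assuming Python's hashlib (its `.copy()` method snapshots the running digest without finalizing it); the function name `rolling_report` and the `chunk_mb` parameter are illustrative, not an existing tool:

```python
import hashlib

def rolling_report(path, chunk_mb=1):
    """For every chunk_mb megabytes of the file, record the offset,
    the MD5 of that chunk alone, and the MD5 of everything so far."""
    chunk_size = chunk_mb * 1024 * 1024
    total = hashlib.md5()
    rows = []
    with open(path, "rb") as f:
        offset = 0
        while block := f.read(chunk_size):
            total.update(block)
            offset += len(block)
            # .copy() snapshots the running hash without finalizing it
            rows.append((offset,
                         hashlib.md5(block).hexdigest(),
                         total.copy().hexdigest()))
    return rows

# Running this on both machines and diffing the output shows where the
# per-chunk hashes diverge, e.g.:
#   for offset, chunk_md5, running_md5 in rolling_report("bigfile.bin"):
#       print(offset, chunk_md5, running_md5)
```

The running hash tells you the first chunk where things go wrong; the per-chunk hashes tell you whether later chunks recover (bit flips) or stay shifted (dropped bytes).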

thnx,
peter  

--
peter pilsl

http://www.goldfisch.at

 
 
 

synchronize files with downloaderrors

Post by Erik Max Francis » Mon, 17 Mar 2003 05:51:31



> I've just downloaded a very big binary file from a remote machine;
> there was a download problem and the MD5 sums do not match.

> Now I could simply download the file again (which would be very
> time-consuming and use up the bandwidth again), or I could try a
> different approach and only download the part that actually has
> problems.

How did you download the file?  If the download claimed to finish
successfully but the MD5 hashes don't match, it's likely that something
went wrong systematically rather than the file getting corrupted in just
one (or a very few) spots.  It's more likely that you, say, FTP'd the
file in ASCII mode (in which case the entire file is corrupted).

> The basic idea I had was to split the file into chunks on both machines
> (using a binary-search approach) and compare the MD5 hashes for each
> chunk. If the hashes don't match, split the chunk again, compare the
> hashes again, and so on. That way I could 'easily' (at least in theory)
> find the part(s) that have problems and re-download only those parts.
> (Or find mostly erroneous chunks and therefore know that I need to
> download the whole thing again.)
> Is there already a tool that can do what I want?

It should take about five minutes to test your theory with a shell
script, certainly with fixed-size chunks, anyway.

--

 __ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/  \ All bad poetry springs from genuine feeling.
\__/ Oscar Wilde
    Bosskey.net: Unreal Tournament 2003 / http://www.bosskey.net/ut2k3/
 A personal guide to Unreal Tournament 2003.

 
 
 

synchronize files with downloaderrors

Post by Mina Nagui » Mon, 17 Mar 2003 06:01:10


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> I've just downloaded a very big binary file from a remote machine; there
> was a download problem and the MD5 sums do not match.

> Now I could simply download the file again (which would be very
> time-consuming and use up the bandwidth again), or I could try a
> different approach and only download the part that actually has problems.

> The basic idea I had was to split the file into chunks on both machines
> (using a binary-search approach) and compare the MD5 hashes for each chunk. If
> the hashes don't match, split the chunk again, compare the hashes
> again, and so on. That way I could 'easily' (at least in theory) find the
> part(s) that have problems and re-download only those parts. (Or find mostly
> erroneous chunks and therefore know that I need to download the whole thing
> again.)
> Is there already a tool that can do what I want?
> Is there already a tool that can do what I want ?

> Mainly, a tool would be interesting that builds a hash of a file and,
> every x MB, dumps the intermediate (running) hash plus the hash of the
> last x MB. That way one could easily see where the problems start and
> where the files get back in sync. (Especially if you run it from
> beginning to end and vice versa.)

I've contemplated something quite similar a while ago; however, none of
the existing popular protocols (FTP/HTTP) support anything like that.

Also, bear in mind that calculating the MD5 checksum of a large file (or
a large chunk) is expensive CPU-wise.  Implementing something like that
server-side would surely spell disaster and be an easy way to mount
DoS-style attacks on the server.

A viable alternative is for the server to pre-calculate the MD5
checksums once, up to X levels deep (for example: 1/2 #1, 1/2 #2, 1/4
#1, 1/4 #2, 1/4 #3, 1/4 #4).  A client would then be able to request the
pre-calculated checksum for level X, chunk Y.  The checksums of the
chunks would be metadata associated with each file at the server level.
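The server-side pre-calculation could be sketched like this (a minimal in-memory sketch; the function name `precompute_levels` and the table layout are assumptions for illustration, not an existing API):

```python
import hashlib

def precompute_levels(data: bytes, levels: int):
    """For level k (0-based), split the file into 2**k equal-sized
    chunks (the last may be shorter) and store each chunk's MD5.
    Level 0 is the whole-file hash, level 1 the two halves, etc."""
    table = {}
    for k in range(levels):
        n = 2 ** k
        size = -(-len(data) // n)  # ceiling division
        table[k] = [hashlib.md5(data[i * size:(i + 1) * size]).hexdigest()
                    for i in range(n)]
    return table

meta = precompute_levels(b"example file contents" * 1000, levels=3)
# a client could then request, say, the checksum for level 2, chunk 3:
print(meta[2][3])
```

Since the table is computed once per file and served as static metadata, a client probing for bad chunks costs the server no hashing at request time, which addresses the DoS concern above.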

This is not hard to implement, technically speaking.  However, since
most standard protocols are just that, standards, modifying them would
require going through quite tedious procedures and discussions to get it
into the RFCs, and then the often endless wait of hoping that software
producers adhere to the new standard...

Alternatively, a new protocol could be developed that works in
conjunction with the standard protocols, whose sole job would be to
reply with MD5 checksums.  Using that information, a "resumed" transfer
could be requested from the HTTP or FTP server, both of which I believe
support resumed transfers at this point.

Just my $0.02

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+c5SZeS99pGMif6wRAjBWAJ9yd53qMBELXV2Z+RvqevNC4blScACg1cEO
xAj5VC5AIhUKXfEGNAX8wXM=
=bY3j
-----END PGP SIGNATURE-----

 
 
 

synchronize files with downloaderrors

Post by Dale R Worley » Wed, 19 Mar 2003 10:56:00



> The basic idea I had was to split the file into chunks on both machines
> (using a binary-search approach) and compare the MD5 hashes for each
> chunk. If the hashes don't match, split the chunk again, compare the
> hashes again, and so on.

What if byte number 5 got dropped?  Then all the chunk checksums would
be different...

Dale

 
 
 

synchronize files with downloaderrors

Post by Mina Nagui » Thu, 20 Mar 2003 14:35:23


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



>>The basic idea I had was to split the file into chunks on both machines
>>(using a binary-search approach) and compare the MD5 hashes for each
>>chunk. If the hashes don't match, split the chunk again, compare the
>>hashes again, and so on.

> What if byte number 5 got dropped?  Then all the chunk checksums would
> be different...

Precisely, so you divide the chunk into two parts and compare the
checksums for each.

Repeat that down to X levels, at which point you can say "enough
calculating, send me this whole chunk"...

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+eAGgeS99pGMif6wRAiKvAJwPZHVPtbz6VVGs45mKc7zxeeJffACfXP6Q
QMuxpwzVEjrr+q2WhIw/bac=
=Q1e9
-----END PGP SIGNATURE-----

 
 
 

1. Synchronize Files Between Two Linux Machines

Anyone know of software which will synchronize directories between two
Linux machines on a network or across the internet?  How about Windows
machines or Linux to Windows??

I need to have users make changes on either machine and have the changes
propagate to the other machine so that in time both machines become
identical (at least for the directories specified).

Thanks in advance.  Please email me your response.

Steve Marcus
