-----BEGIN PGP SIGNED MESSAGE-----
> I've just downloaded a very big binary file from a remote machine, and there
> was a download problem: the md5sums do not match.
> Now I could simply download the file again (which would be very
> time-consuming and put load on the bandwidth again), or I could try a
> different approach and only download the part that actually has problems.
> The basic idea I had was to split the file into chunks on both machines
> (based on a binary approach) and compare the MD5 hashes for each chunk. If
> the hashes don't match, then split the chunk again, compare the hashes
> again, and so on. That way I could 'easily' (at least in theory) find the
> part(s) that have problems and redownload only those parts. (Or find that
> most chunks are erroneous and therefore know that I need to download the
> whole thing.)
> Is there already a tool that can do what I want?
> Mainly, a tool would be interesting that builds a hash of a file and dumps
> the intermediate hash every x MB, plus the hash of the last x MB.
> That way one could easily see where the problems start and where the
> files get back in sync again (especially if you run it from beginning to
> end and vice versa).
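The split-and-compare idea quoted above boils down to a recursive binary search over byte ranges. A minimal sketch, assuming a hypothetical `remote_hash(start, length)` callback that returns the MD5 hex digest of that range on the remote machine (no such tool or protocol exists today, as discussed below):

```python
import hashlib

def bad_ranges(data, remote_hash, start, length, min_chunk=1024 * 1024):
    """Recursively narrow down which byte ranges of `data` differ remotely.

    `remote_hash(start, length)` is a hypothetical callback returning the
    MD5 hex digest of that byte range on the remote copy.  Returns a list
    of (start, length) ranges, each at most `min_chunk` bytes, that need
    to be re-downloaded.
    """
    local = hashlib.md5(data[start:start + length]).hexdigest()
    if local == remote_hash(start, length):
        return []                      # this chunk is intact
    if length <= min_chunk:
        return [(start, length)]       # small enough: just re-fetch it whole
    half = length // 2                 # split in two and recurse on each half
    return (bad_ranges(data, remote_hash, start, half, min_chunk)
            + bad_ranges(data, remote_hash, start + half, length - half, min_chunk))
```

With c corrupted chunks the client hashes O(c * log(filesize / min_chunk)) chunks instead of the whole file once per pass, which is exactly the "only download the part that has problems" behavior asked for.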
I've contemplated something quite similar a while ago; however, none of
the existing popular protocols (FTP/HTTP) supports anything like that.
Also bear in mind that calculating the MD5 checksum of a large file (or
a large chunk) is expensive CPU-wise. Implementing something like that
server-side would surely spell disaster and offer an easy way to mount
DoS-style attacks on the server.
A viable alternative is for the server to pre-calculate the MD5 checksums
once, up to X levels deep (1/2 #1, 1/2 #2, 1/4 #1, 1/4 #2, 1/4 #3, 1/4
#4, for example). A client could then request the pre-calculated
checksum for level X, chunk Y. The checksums of the chunks would be
metadata associated with each file at the server level.
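That pre-computed table is cheap to build in one pass over the levels. A minimal sketch of the level/chunk layout described above (function name and depth parameter are illustrative):

```python
import hashlib

def precompute_levels(data, depth):
    """Return {level: [MD5 hex digest of each of the 2**level chunks]}.

    Level 0 is the whole file, level 1 the two halves, level 2 the four
    quarters, and so on -- the 1/2 #1, 1/2 #2, 1/4 #1 ... scheme above.
    The server would compute this once and serve (level, chunk) lookups
    as per-file metadata.
    """
    table = {}
    for level in range(depth + 1):
        n = 2 ** level
        size = -(-len(data) // n)      # ceiling division: chunk size at this level
        table[level] = [hashlib.md5(data[i * size:(i + 1) * size]).hexdigest()
                        for i in range(n)]
    return table
```

For a depth of X levels the table holds 2**(X+1) - 1 digests, a few hundred bytes of metadata even for X = 10, so the one-time cost replaces the per-request hashing that would otherwise invite DoS abuse.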
This is not hard to implement, technically speaking. However, since most
standard protocols are just that, standards, modifying them would require
going through quite tedious procedures and discussions to get the changes
into the RFCs... and then the often endless wait, hoping that software
producers adhere to the new standards...
Alternatively, a new protocol could be developed that works in
conjunction with the standard protocols and whose sole job would be
to reply with MD5 checksums. Using that information, a "resumed"
transfer could be requested from the HTTP or FTP server, both of which
I believe support resumed transfers at this point in time.
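The resumed-transfer half of that is already standard: HTTP/1.1 partial transfers use the Range request header, so once a bad chunk's byte range is known it can be fetched directly from any server that honors ranges. A rough sketch (URL and helper name are illustrative):

```python
import urllib.request

def fetch_range(url, start, length):
    """Fetch `length` bytes starting at `start` via an HTTP/1.1 Range request.

    Servers that support partial content answer with status 206 and only
    the requested bytes; servers that don't simply send the whole file.
    """
    req = urllib.request.Request(url)
    # Range is inclusive on both ends: bytes=first-last
    req.add_header("Range", "bytes=%d-%d" % (start, start + length - 1))
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

A client would combine this with the checksum lookups: ask the checksum protocol for level X, chunk Y, compare locally, and call `fetch_range` only for the chunks that mismatch.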
Just my $0.02
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
-----END PGP SIGNATURE-----