Quote:
> 1) What is the THEORETICAL best likely binary diff savings over
> simply saving the whole thing every time? Assume a medium (not more
> than 10% but not trivial either) difference in bytes, but the
> differences are in random places, in a random application domain.

In that situation it seems like the diff could always be kept under
roughly 22.5% of the size of the file: 10% for the differing bytes
themselves, plus 12.5% for a bitmap (one bit per byte) saying which
bytes differ. That may rest on a definition of "differing bytes" which
isn't what you have in mind, but my intuition is that for any
plausible definition, the result would be in that ballpark.
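As a rough illustration of that bitmap scheme, here is a minimal sketch in Python. It assumes old and new files of equal length with byte-for-byte positional comparison (no insertions or deletions); the function names are my own, not from any real diff tool:

```python
def bitmap_diff(old: bytes, new: bytes) -> bytes:
    """Encode `new` against `old` as a changed-byte bitmap plus the
    changed bytes. Bitmap costs one bit per byte, i.e. 12.5%."""
    assert len(old) == len(new), "sketch assumes equal-length files"
    n = len(new)
    bitmap = bytearray((n + 7) // 8)
    changed = bytearray()
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            bitmap[i // 8] |= 1 << (i % 8)  # mark position i as differing
            changed.append(b)
    return bytes(bitmap) + bytes(changed)

def bitmap_patch(old: bytes, diff: bytes) -> bytes:
    """Reconstruct `new` from `old` and a bitmap_diff() result."""
    n = len(old)
    bm_len = (n + 7) // 8
    bitmap, changed = diff[:bm_len], diff[bm_len:]
    out = bytearray(old)
    j = 0
    for i in range(n):
        if bitmap[i // 8] & (1 << (i % 8)):
            out[i] = changed[j]
            j += 1
    return bytes(out)
```

For a 1000-byte file with 100 differing bytes, the diff is 125 + 100 = 225 bytes, i.e. the 22.5% figure above. A real tool would compress the bitmap as well, since with random change positions it is highly sparse.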
Quote:
> I know there are those that have experienced binary diff situations
> where the result files were much LARGER than the original two files!
> However in theory this should not happen, since you could simply
> have a flag included, that if the result is larger, do not use the
> result, simply use the original file plus flag: the file ends up
> only a few bytes larger.

I would think that any good binary diff algorithm would include some
such mechanism (probably per-block rather than per-file), but I don't
know whether common algorithms like xdelta do or not.
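The per-block fallback is simple to sketch: prefix each encoded block with a one-byte flag, and store the block verbatim whenever the delta would be larger. This is my own illustration, not how xdelta actually frames its output; `diff_fn`/`patch_fn` stand in for whatever delta encoder is in use:

```python
FLAG_RAW, FLAG_DIFF = 0, 1

def encode_block(old: bytes, new: bytes, diff_fn) -> bytes:
    """Encode `new` against `old`, falling back to a verbatim copy
    when the delta would not actually save space."""
    d = diff_fn(old, new)
    if len(d) < len(new):
        return bytes([FLAG_DIFF]) + d
    # Delta is no smaller than the raw block: store the block itself.
    # Worst-case overhead is one flag byte per block.
    return bytes([FLAG_RAW]) + new

def decode_block(old: bytes, enc: bytes, patch_fn) -> bytes:
    flag, payload = enc[0], enc[1:]
    return patch_fn(old, payload) if flag == FLAG_DIFF else payload
```

With this framing the encoded file can never exceed the new file by more than one byte per block, which is why the "much larger result" case should not arise in a well-designed format.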