PKZIP vs. Info-ZIP: vaguely amusing results for three OSes

Post by Cave Ne » Mon, 21 Nov 1994 12:50:10

First of all, I'm with Info-ZIP.  But I tried to be sort of fair, so don't
assume this is completely biased propaganda.  (It's only partially biased
propaganda. :-) )

Second, this is long (~200 lines), so you have to be a serious geek to read
it all.  The most interesting stuff (*I* think) is at the end, however.

Third, I did this out of curiosity and don't intend to make a habit out
of it (and I *certainly* don't intend to test any other archivers, zip-
compatible or not), although I may fill in a few of the holes some day.

Vital stats:

hardware:  486/33 ISA bus, 20MB RAM, Maxtor LXT340A IDE drive, ATI GUP 2MB
           (ATI card may make a tiny difference in some results due to
           the scrolling speed in graphics-mode windows)

software:  DOS, OS/2 and Linux; Zip 2.0x/UnZip 5.1x and PKZIP/UNZIP 2.04g;
           various timer utilities

test files:  Calgary Corpus (3MB of text and binary files); in

MS-DOS 3.3 + 386^max 5.1 (I think):

timer zip386 -1 ..\ *     61.46 sec      1240148 bytes
timer zip386 ..\ *        86.67 sec      1067959 bytes    default
timer zip386 -9 ..\ *    118.64 sec      1060575 bytes

timer zip -1 ..\ *        41.19 sec      1240148 bytes
timer zip ..\ *           66.57 sec      1067959 bytes    default
timer zip -9 ..\ *        99.09 sec      1060575 bytes

timer pkzip -es ..\ *     20.54 sec      1261273 bytes
timer pkzip -ef ..\ *     29.17 sec      1142772 bytes
timer pkzip ..\ *         44.22 sec      1074550 bytes    default
timer pkzip -ex ..\ *     70.85 sec      1062813 bytes

        I didn't have the patience to do more than one iteration of each
        test.  On the other hand, I have no disk cache beyond whatever
        small one DOS seems to have.

        Strangely enough, the 386 version of Zip is indeed slower than
        the 16-bit version, as has been noted in Jonathan Burt's archiver
        comparison and elsewhere.  This appears to be largely due to a
        problem with the go32 (djgpp) extender; at the end of compression
        the disk goes nuts for roughly 17-20 seconds, and it sounds very
        much like the entire zipfile is being copied from disk to disk.
        This disk activity does *not* occur in the 16-bit version, so it
        is presumably not a problem inherent to Zip.  One of these days
        I'll recompile Zip with emx+gcc and see what happens under the
        rsx extender.  The 386 version here was compiled with djgpp 1.10,
        and the 16-bit version was compiled with MSC 6 or 7.
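        The author's explanation can be checked against the table.  If the
        extra cost is a fixed end-of-run step (the 17-20 seconds of disk
        activity), the gap between zip386 and the 16-bit Zip should stay
        roughly constant across compression levels instead of growing with
        the amount of compression work.  A small sketch in Python (an
        illustrative choice; the timings are copied from the DOS runs):

```python
# MS-DOS compression timings in seconds, copied from the runs above.
zip386 = {"-1": 61.46, "default": 86.67, "-9": 118.64}  # djgpp 1.10 build
zip16 = {"-1": 41.19, "default": 66.57, "-9": 99.09}    # MSC 6/7 build

# Extra seconds the 386 build needs at each compression level.
overhead = {level: round(zip386[level] - zip16[level], 2) for level in zip16}

for level, extra in overhead.items():
    print(f"{level:>8}: +{extra:.2f} s")

# A fixed per-run cost shows up as a small spread between levels.
spread = max(overhead.values()) - min(overhead.values())
print(f"spread across levels: {spread:.2f} s")
```

        The extra time stays between about 19.5 and 20.3 seconds at every
        level, which fits a fixed overhead of roughly the size the author
        heard, though one run per test is thin evidence.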

        PKZIP clearly wins for speed.  In each of the comparable cases (-es,
        default, -ex) it is faster by 20-some seconds (40-100% faster).  In
        each of those cases Zip wins for size, however.
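        The "20-some seconds" and "40-100%" figures follow directly from the
        table; the extra time Zip needs, relative to PKZIP's time, is the
        same number as PKZIP's throughput advantage.  A Python sketch pairing
        each PKZIP mode with the Zip level treated as comparable above:

```python
# name: (zip seconds, zip bytes, pkzip seconds, pkzip bytes), MS-DOS runs
pairs = {
    "zip -1 vs pkzip -es": (41.19, 1240148, 20.54, 1261273),
    "zip vs pkzip (default)": (66.57, 1067959, 44.22, 1074550),
    "zip -9 vs pkzip -ex": (99.09, 1060575, 70.85, 1062813),
}

results = {}
for name, (zip_s, zip_b, pk_s, pk_b) in pairs.items():
    saved_s = zip_s - pk_s              # seconds PKZIP saves
    longer_pct = 100 * saved_s / pk_s   # extra time Zip needs, relative to PKZIP
    saved_b = pk_b - zip_b              # bytes Zip saves
    results[name] = (saved_s, longer_pct, saved_b)
    print(f"{name}: PKZIP {saved_s:.2f} s quicker "
          f"(Zip takes {longer_pct:.0f}% longer); Zip {saved_b} bytes smaller")
```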

timer unzip -t ..\       21.75 / 21.80 / 21.75 sec
timer unzip386 -t ..\    11.37 / 11.36 / 11.37 sec
timer pkunzip -t ..\      6.48 /  6.48 /  6.53 sec

        The archive used for decompression was the one created by zip -1
        (1240148 bytes), and the numbers are the result of three trials for
        each test.

        Again PKUNZIP clearly wins in speed.  Contrary to the report from
        one of our beta testers (who claimed unzip386 was sometimes faster
        than PKUNZIP), I don't see any way this could be possible in general,
        unless QEMM 7.x works much better with go32/djgpp than 386^max 5.1
        does (which is quite possible; 386^max 5.1 is rather old).
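        Averaging the three trials makes the gap concrete: on this machine
        unzip386 takes roughly 1.75 times as long as PKUNZIP, and the 16-bit
        UnZip about 3.35 times as long.  A Python sketch:

```python
from statistics import mean

# Three `-t` (test) runs per program under MS-DOS, in seconds.
trials = {
    "unzip": [21.75, 21.80, 21.75],     # 16-bit build
    "unzip386": [11.37, 11.36, 11.37],  # djgpp build
    "pkunzip": [6.48, 6.48, 6.53],
}

avg = {prog: mean(times) for prog, times in trials.items()}
ratio = {prog: t / avg["pkunzip"] for prog, t in avg.items()}

for prog in trials:
    print(f"{prog:>9}: {avg[prog]:6.2f} s  ({ratio[prog]:.2f}x PKUNZIP)")
```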

OS/2 3.0 DOS window (35 lines):

timer zip386 ...                "DPMI:  Not enough memory"

timer zip -1 ..\ *        40.09 sec
timer zip ..\ *           77.37 sec    default
timer zip -9 ..\ *       119.97 sec

timer pkzip ...                 forgot to test (oops)

        The sizes have been omitted here; they're identical to the DOS sizes.

        The DPMI problem is due to the way go32/djgpp allocates memory in
        1.10; it's fixed in 1.12, but the public release of zip386 has never
        been recompiled (and I didn't bother, either :-) ).  I forgot to test
        PKZIP, but judging from the results with 16-bit Zip and with PKUNZIP
        (below), it's probably a few seconds slower than running under pure
        DOS.

timer unzip -t ..\       23.84 / 22.44 / 22.50 sec
timer unzip386 -t ..\    11.62 / 11.13 / 11.16 sec
timer pkunzip -t ..\      6.66 /  6.66 /  6.66 sec

        Not really any change here; unzip16 and pkunzip are slightly slower
        than under pure DOS, but unzip32 is slightly faster.  Weird.  Maybe
        emx+gcc gets the credit for this (as opposed to djgpp gcc in DOS).
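        Comparing the three-trial averages in each environment confirms the
        direction of each change, and also shows how small the unzip386
        improvement is: well under a tenth of a second, smaller than the
        spread between its own trials in the DOS window (11.62 s on the
        first run, about 11.15 s afterwards), so it may well be noise.
        A Python sketch:

```python
from statistics import mean

# Three `-t` runs per program: pure MS-DOS vs. the OS/2 3.0 DOS window.
pure_dos = {
    "unzip": [21.75, 21.80, 21.75],
    "unzip386": [11.37, 11.36, 11.37],
    "pkunzip": [6.48, 6.48, 6.53],
}
os2_dos_box = {
    "unzip": [23.84, 22.44, 22.50],
    "unzip386": [11.62, 11.13, 11.16],
    "pkunzip": [6.66, 6.66, 6.66],
}

# Positive delta: slower in the DOS window; negative: faster.
delta = {p: mean(os2_dos_box[p]) - mean(pure_dos[p]) for p in pure_dos}
for prog, d in delta.items():
    print(f"{prog:>9}: {d:+.2f} s")
```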

OS/2 3.0 OS/2 window (60 lines):

timer zip16 ...                 not available

timer zip32 -1 ..\ *      29.56 sec
timer zip32 ..\ *         62.44 sec    default
timer zip32 -9 ..\ *     103.41 sec

timer unzip16 -t ..\     32.31 / 30.62 / 30.62 sec
timer unzip32 -t ..\      9.44 /  9.25 /  9.43 sec

        I didn't have a copy of 16-bit OS/2 Zip handy and didn't feel like
        compiling one; sorry.  Note that the 32-bit version is getting
        quite respectable in speed, however--it beats both versions of Zip
        under pure DOS at -1 and at the default level (at -9 it beats only
        zip386) and is getting closer to PKZIP...

        Meanwhile the 16-bit version of UnZip is much slower than it was
        under DOS; I don't know if that's because Microsoft's OS/2 libraries
        suck or because OS/2's 32-to-16-bit thunking does.  The 32-bit
        version (compiled with emx+gcc 0.8h) is gratifyingly faster, though
        still not in PKUNZIP's league.
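        Checking the native OS/2 timings level by level against the two
        MS-DOS builds shows where zip32 comes out ahead: it beats zip386
        everywhere, and the 16-bit DOS Zip at -1 and the default level, but
        at -9 (103.41 s vs. 99.09 s) the 16-bit DOS Zip is still slightly
        quicker.  A Python sketch:

```python
# Compression timings in seconds: native OS/2 zip32 vs. the two MS-DOS builds.
os2_zip32 = {"-1": 29.56, "default": 62.44, "-9": 103.41}
dos_zip16 = {"-1": 41.19, "default": 66.57, "-9": 99.09}
dos_zip386 = {"-1": 61.46, "default": 86.67, "-9": 118.64}

# True where the OS/2 build finished first.
beats16 = {lvl: os2_zip32[lvl] < dos_zip16[lvl] for lvl in os2_zip32}
beats386 = {lvl: os2_zip32[lvl] < dos_zip386[lvl] for lvl in os2_zip32}

for lvl, t in os2_zip32.items():
    print(f"{lvl:>8}: {t:6.2f} s  beats DOS zip: {beats16[lvl]}  "
          f"beats zip386: {beats386[lvl]}")
```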

Linux 1.1.54 + XFree86 3.1 color_xterm window (60 lines) on FAT partition:

time zip -1 ../ *        20.78 + 4.78 =  25.56 sec
time zip ../ *           53.80 + 4.85 =  58.65 sec    default
time zip -9 ../ *        94.23 + 4.60 =  98.83 sec

time unzip -t ../         8.76 /  8.21 /  8.15 sec
                      (7.70 + 1.06 / 7.73 + 0.48 / 7.53 + 0.62 sec)

        Still closer to PKWARE, but obviously not there yet.  Of course,
        Linux has a dynamic buffer cache, and that gives Zip/UnZip a small
        advantage--for example, the change in system times on the UnZip test
        is due to the whole archive being cached (along with UnZip itself).
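        The Linux figures are user plus system time.  Splitting the UnZip
        runs that way shows the caching effect described above: user time
        barely moves, while system time drops after the first run once the
        archive (and UnZip) are in the buffer cache.  A Python sketch using
        the reported figures:

```python
# (user, system) seconds for the three `unzip -t` runs on the FAT
# partition; the first run is the only one with a cold cache.
runs = [(7.70, 1.06), (7.73, 0.48), (7.53, 0.62)]

totals = [round(user + system, 2) for user, system in runs]
print("user + system:", totals)

cold_sys = runs[0][1]
warm_sys = [system for _, system in runs[1:]]
print(f"system time, cold cache: {cold_sys:.2f} s; warm: {warm_sys}")
```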

Linux 1.1.54 + XFree86 3.1 color_xterm window (60 lines) on ext2 partition:

time zip -1X ../ *       20.08 + 1.68 =  21.76 sec
time zip -X ../ *        50.24 + 3.42 =  53.66 sec    default
time zip -9X ../ *       85.11 + 1.86 =  86.97 sec

time unzip -t ../         8.82 /  8.26 /  8.17 sec

        The middle Zip test was the first performed; again we see the effect
        of the buffer cache in the system times.

        Note that "zip -1" is now very competitive with "pkzip -es", and the
        archive is smaller, too!  Of course, to be fair, throw in that extra
        1.7 seconds to offset the disk cache...  Still, it's pretty amazing:
        23.5 seconds versus 20.5 isn't too bad.
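        The "23.5 seconds versus 20.5" comparison is the -1X total plus the
        cache allowance.  The post does not say exactly how the 1.7 seconds
        was derived; the sketch below assumes it is the drop in system time
        between the first (uncached) Zip run, the default one at 3.42 s,
        and the cached -1X run at 1.68 s.  Under that assumption Zip's
        fastest mode takes roughly 14% longer than PKZIP's:

```python
# Linux ext2 timings (user + system, seconds) and the DOS PKZIP figure.
zip_1x = 20.08 + 1.68   # "zip -1X", run with the zip binary already cached
pkzip_es = 20.54        # "pkzip -es" under MS-DOS

# Assumption: the cache bonus is the drop in system time between the
# uncached default run (3.42 s) and the cached -1X run (1.68 s).
cache_penalty = 3.42 - 1.68

zip_fair = zip_1x + cache_penalty
print(f"zip -1X adjusted: {zip_fair:.1f} s vs pkzip -es {pkzip_es:.1f} s")
print(f"Zip takes {100 * (zip_fair / pkzip_es - 1):.0f}% longer")
```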

        Also note that all of the Zip times are smaller on the ext2 file
        system, but the UnZip times are virtually identical; Linux is fairly
        pathetic when writing to FAT partitions, but it has no trouble read-
        ing them.  (This was *really* apparent while copying a 15MB Doom file
        to a FAT partition, sigh...)
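        The FAT-versus-ext2 contrast is easier to see side by side: writing
        the archive costs noticeably more on FAT at every Zip level, while
        the UnZip averages differ by only a few hundredths of a second.
        A Python sketch using the totals from the two Linux runs:

```python
from statistics import mean

# Linux user + system totals in seconds, FAT vs. ext2 partitions.
zip_fat = {"-1": 25.56, "default": 58.65, "-9": 98.83}
zip_ext2 = {"-1": 21.76, "default": 53.66, "-9": 86.97}
unzip_fat = [8.76, 8.21, 8.15]
unzip_ext2 = [8.82, 8.26, 8.17]

for level in zip_fat:
    saved = zip_fat[level] - zip_ext2[level]
    print(f"zip {level:>7}: ext2 saves {saved:.2f} s "
          f"({100 * saved / zip_fat[level]:.0f}%)")

# Reading is what UnZip mostly does, so the filesystem matters little here.
unzip_gap = mean(unzip_ext2) - mean(unzip_fat)
print(f"unzip -t: ext2 minus FAT = {unzip_gap:+.2f} s")
```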


For kicks I tested Zip's -5 and -6 options separately to see how Zip would
compare with PKZIP's default mode for output archives of approximately the
same size:

time zip -5X ../ *       37.45 + 1.77 =  39.22 sec   1081073 bytes
timer pkzip ..\ *  [DOS]                 44.22 sec   1074550 bytes
time zip -6X ../ *       49.93 + 1.67 =  51.60 sec   1067959 bytes

Unfortunately there's no finer tuning in Zip, so that's as close as we get.
Note, however, that the average size of the Zip archives is 1074516 bytes,
almost exactly the same as the PKZIP archive.  If we interpolate the times
as well, we get 45.41 seconds, only 1.2 seconds slower.  Not too shabby, eh?
Of course, this is for Zip running under an efficient (and 32-bit) OS, not
MS-DOS; and it's using a real filesystem, not FAT.  Also don't forget to
throw in that extra 1.7 seconds to offset the disk cache...that changes the
time to 47.1 seconds (2.9 seconds, or about 6.5%, slower).
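The interpolation is a plain midpoint: average the -5 and -6 times and
sizes, add the 1.7-second cache allowance, and compare with PKZIP's default
run.  This treats time and size as varying linearly between the two levels,
which is a simplification rather than a measured fact.  A Python sketch
(the unrounded gap of 2.89 seconds works out to about 6.5%):

```python
from statistics import mean

# (seconds, bytes) for the two Zip levels that bracket PKZIP's default.
zip5 = (39.22, 1081073)    # zip -5X on ext2, Linux
zip6 = (51.60, 1067959)    # zip -6X on ext2, Linux
pkzip = (44.22, 1074550)   # pkzip default, MS-DOS
cache_penalty = 1.7        # allowance for the Linux buffer cache, from the post

# Linear interpolation halfway between -5 and -6, i.e. a plain average.
mid_time = mean([zip5[0], zip6[0]])
mid_size = mean([zip5[1], zip6[1]])
adjusted = mid_time + cache_penalty
gap = adjusted - pkzip[0]

print(f"midpoint: {mid_time:.2f} s, {mid_size:.0f} bytes (PKZIP: {pkzip[1]} bytes)")
print(f"with cache allowance: {adjusted:.2f} s, {gap:.2f} s or "
      f"{100 * gap / pkzip[0]:.1f}% slower than PKZIP")
```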


Conclusions?  Well, even a well-chosen benchmark suite is still only one
suite; in other words, real-life tests with real data (e.g., all of the
SimTel zipfile archives) are required in order to draw statistically sig-
nificant conclusions.  Nevertheless, with the Corpus, at least, PKZIP and
PKUNZIP win for speed; Zip wins (barely) for size; and UnZip doesn't really
win for anything except portability and charm. :-)  The real point is that
32-bit C programs (Zip uses one assembler routine for string matches, I
think) can compete reasonably well with 16/32-bit tuned assembly language.
I suspect this point will be driven home more forcefully when (if?) the
next version of PKZIP is released...

Among OSes, Linux wins, but OS/2 is close behind (and I've only had 3.0 for
a week, so it may not be tuned particularly well just yet).  DOS just sucks,
but pretty much everybody agrees on that. :-)  (And if you just *have* to
flame me on this (luser!), please do so via e-mail or in alt.flame.  We've
all seen dozens of OS flamewars...)

Follow-ups are directed to comp.compression only.

Greg Roelofs