Which is better: tar->gzip or gzip->tar?

Post by Jeff Arnho » Fri, 02 Sep 1994 01:23:02



For best compression on previously uncompressed files,
which is better: tar * | gzip, or gzip * | tar?
IE, is it best to tar compressed files, or compress
a tar file of uncompressed files?  Does gzip -r * work
better than either solution?

I'm looking for the most robust method to archive groups
of files.

---


Mayo Medical and Graduate Schools        
200 1st St. SW, Rochester, MN 55905

 
 
 

Post by Henry Wa » Fri, 02 Sep 1994 09:50:39



>For best compression on previously uncompressed files,
>which is better: tar * | gzip, or gzip * | tar?
>IE, is it best to tar compressed files, or compress
>a tar file of uncompressed files?  Does gzip -r * work
>better than either solution?

tar * | gzip will give a smaller output because the headers will be
compressed & because the file is larger, minimizing gzip's own headers.
You can get the same effect by using the GNU tar -z option.  gzip -r
recursively gzips files: it does not create a single file.

>I'm looking for the most robust method to archive groups
>of files.

That's a different and stickier question.  I don't know that either of
these is especially robust.

-Henry

 
 
 

Post by Remco Treffko » Fri, 02 Sep 1994 17:02:17


: tar * | gzip will give a smaller output because the headers will be
: compressed & because the file is larger, minimizing gzip's own headers.
: You can get the same effect by using the GNU tar -z option.  gzip -r
: recursively gzips files: it does not create a single file.

: >I'm looking for the most robust method to archive groups
: >of files.

: That's a different and stickier question.  I don't know that either of
: these is especially robust.

You already answered that :-)
By comparison, it is better to compress first and then tar.
That way a bit error messes up only one file, and tar can recover at the
next header. If you compress the resulting .tar file, a bit error will
mess up the rest of the archive.
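
A minimal sketch of that difference, using Python's standard tarfile and gzip modules rather than the command-line tools. The sample data and the helper names (sample_files, build_tar, flip_byte and the two intact_after_* functions) are made up for illustration. It inverts one byte in each kind of archive and counts how many files come back byte-for-byte identical. One assumption to note: the tar-then-gzip reader here is strict and rejects the whole stream when gzip's check fails, whereas a streaming reader could still salvage the members stored before the damage.

```python
import gzip
import io
import tarfile
import zlib


def sample_files(n=5):
    # Hypothetical sample data: n small text files with distinct contents.
    return {
        f"file{i}.txt": "".join(f"{i}:{j} sample text\n" for j in range(200)).encode()
        for i in range(n)
    }


def build_tar(members):
    # Pack a {name: bytes} mapping into an uncompressed tar archive in memory.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def flip_byte(blob, pos):
    # Simulate media damage by inverting every bit of one byte.
    damaged = bytearray(blob)
    damaged[pos] ^= 0xFF
    return bytes(damaged)


def intact_after_tar_then_gzip(files):
    # One gzip stream wraps the whole tar; the damage lands mid-stream.
    blob = gzip.compress(build_tar(files))
    damaged = flip_byte(blob, len(blob) // 2)
    try:
        tar_bytes = gzip.decompress(damaged)
    except (OSError, EOFError, zlib.error):
        return 0  # strict reader: the stream failed its check, trust nothing
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        return sum(tar.extractfile(m).read() == files.get(m.name) for m in tar)


def intact_after_gzip_then_tar(files):
    # Each file is its own gzip stream; the damage lands inside one member.
    archive = build_tar({name + ".gz": gzip.compress(data) for name, data in files.items()})
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        victim = tar.getmembers()[len(files) // 2]
    damaged = flip_byte(archive, victim.offset_data + victim.size // 2)
    intact = 0
    with tarfile.open(fileobj=io.BytesIO(damaged), mode="r:") as tar:
        for member in tar:
            raw = tar.extractfile(member).read()
            try:
                if gzip.decompress(raw) == files[member.name[:-3]]:
                    intact += 1
            except (OSError, EOFError, zlib.error):
                pass  # only this member is lost; keep going
    return intact
```

In the second case the tar headers are untouched, so every member can still be located; it is the per-member gzip check that flags the damaged one.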

OTOH, I have yet to see a messed up .tgz or .tar.z file (knock on wood).

There you have it.

Remco
--

Remco Treffkorn, DC2XT

(408) 685-1201

 
 
 

Post by Alex Nicola » Fri, 02 Sep 1994 22:22:32



>For best compression on previously uncompressed files,
>which is better: tar * | gzip, or gzip * | tar?
>IE, is it best to tar compressed files, or compress
>a tar file of uncompressed files?  Does gzip -r * work
>better than either solution?
>I'm looking for the most robust method to archive groups
>of files.

 For two reasons it should be better to tar first and compress later.
If you zip first, each zip file will have some overhead that you could be
repeating hundreds of times. And tarring all the zip files will introduce
tar's overhead (uncompressed) between every file. The other way, all of
tar's header info gets compressed, you don't have zip overhead, and you
should also be able to achieve better compression on the one big file than
you can individually on little files (VERY true for tiny files).
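
One way to check this on a given set of files is to build both kinds of archive in memory and compare their sizes. The sketch below uses Python's standard tarfile and gzip modules; build_tar, compare_sizes and the 200 tiny sample files are hypothetical names and data chosen for illustration, so the numbers it prints say nothing about any particular real file set.

```python
import gzip
import io
import tarfile


def build_tar(members):
    # Pack a {name: bytes} mapping into an uncompressed tar archive in memory.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compare_sizes(files):
    # Returns (tar-then-gzip size, gzip-then-tar size) in bytes.
    tar_then_gzip = len(gzip.compress(build_tar(files)))
    gzip_then_tar = len(
        build_tar({name + ".gz": gzip.compress(data) for name, data in files.items()})
    )
    return tar_then_gzip, gzip_then_tar


# Hypothetical test set: 200 tiny text files, the case where overhead dominates.
tiny_files = {f"note{i:03d}.txt": f"small file number {i}\n".encode() for i in range(200)}
whole, per_file = compare_sizes(tiny_files)
print(f"tar then gzip: {whole} bytes; gzip then tar: {per_file} bytes")
```

For tiny files the gzip-then-tar archive cannot drop below roughly 1 KB per file (a 512-byte header plus data padded to 512 bytes), while the single compressed stream shrinks all of that repeated header data.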

Between tarring then zipping versus using zip to contain the directory
structure I have no idea which is better - why not just try it on "typical"
files?

alex

 
 
 

Post by Jason Rimm » Sat, 03 Sep 1994 03:39:37


        If I'm not mistaken, gzip compresses only one file at a time.  In other
words, given a group of files, gzip compresses each one separately rather
than producing a single archive.  tar does something quite different: gzip
is a compression program, while tar's job is to combine many files (even from
different directory trees) into one file.
        So your question is really moot, as tar and gzip are different things
entirely.  If you have many files to compress, use tar to combine them into
one file, and then use gzip to compress the resulting file.

: For best compression on previously uncompressed files,
: which is better: tar * | gzip, or gzip * | tar?
: IE, is it best to tar compressed files, or compress
: a tar file of uncompressed files?  Does gzip -r * work
: better than either solution?

: I'm looking for the most robust method to archive groups
: of files.

: ---
:    

: Mayo Medical and Graduate Schools        
: 200 1st St. SW, Rochester, MN 55905

--

Eclectic Technologies

"I want to die peacefully in my sleep like my
 grandfather, not screaming in terror like his
 passengers."
        -Anonymous(?)

 
 
 

Post by Tom Griffi » Sat, 03 Sep 1994 13:10:52



>For best compression on previously uncompressed files,
>which is better: tar * | gzip, or gzip * | tar?
>IE, is it best to tar compressed files, or compress
>a tar file of uncompressed files?  Does gzip -r * work
>better than either solution?

>I'm looking for the most robust method to archive groups
>of files.

For me, the order doesn't make much difference, except
that it's easier to use GNU-tar like so:

    tar czf {outputFileOrDevice} {fileSpec}

and

   tar xzf {inputFileOrDevice}

Works for me!
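
For scripted use, roughly the same round trip can be sketched with Python's standard tarfile module, whose "w:gz" and "r:gz" modes play the role of the z flag. The function names create_tgz, extract_tgz and round_trip_demo are illustrative only, and the extraction filter is passed only where the installed Python supports it.

```python
import os
import tarfile
import tempfile


def create_tgz(output_path, paths):
    # Roughly "tar czf output_path paths...": one gzip stream around one tar.
    with tarfile.open(output_path, "w:gz") as tar:
        for path in paths:
            tar.add(path)


def extract_tgz(input_path, dest="."):
    # Roughly "tar xzf input_path", unpacking under dest.
    with tarfile.open(input_path, "r:gz") as tar:
        # Newer Python versions accept an extraction filter; older ones do not.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)


def round_trip_demo():
    # Archive one throwaway file, restore it elsewhere, and return its text.
    with tempfile.TemporaryDirectory() as work:
        source = os.path.join(work, "hello.txt")
        with open(source, "w") as handle:
            handle.write("hello\n")
        archive = os.path.join(work, "out.tgz")
        create_tgz(archive, [source])
        dest = os.path.join(work, "restored")
        extract_tgz(archive, dest)
        restored = [os.path.join(root, name)
                    for root, _, names in os.walk(dest) for name in names]
        with open(restored[0]) as handle:
            return handle.read()
```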

--
 _____________________________________________________
| Thomas L. Griffing       |                          |

|__________________________|__________________________|

 
 
 

Post by Delema » Sat, 03 Sep 1994 18:32:17


        It's more critical than that: tar is a "block device" archiver, which
means it uses N blocks for each file archived, with a block size of Nx512 bytes
(default N=20). Suppose you have 100 small files of 512 bytes: each of them will
require one 20x512-byte block, 1000kb for them all, compared with
100x512 = 50kb.

        If you first archive the files and then compress the archive, gzip will
compress the wasted space at the end of the blocks well. If you first compress
and then archive, you will have small files but the wasted space will never be
compressed!

        The best solution is definitely to use "tar zcf".

                                        DELEMAR Olivier

       ******************************************************************
       * DELEMAR Olivier               | Room   : 527                   *
       * ICP/INPG                      | Phone  : 76-57-48-27           *
       * 46 Av. Felix VIALLET          | Fax.   : 76-57-47-10           *

       ******************************************************************

 
 
 

Post by Adam Tilghm » Sun, 04 Sep 1994 01:26:51



>If you zip first, each zip file will have some overhead that you could be
>repeating hundreds of times. And tarring all the zip files will introduce
>tar's overhead (uncompressed) between every file. The other way, all of
>tar's header info gets compressed, you don't have zip overhead, and you
>should also be able to achieve better compression on the one big file than
>you can individually on little files (VERY true for tiny files).

        Of course, if you're storing this tar file on a tape drive,
it might be wise to compress each file individually:  tape drives
have been known to lose a block here and there, and if you lose a
byte 50% into your 100Mb backup tape, gunzip might not have enough
context to continue with the restore.

        -- adam

--
Adam G. Tilghman | email:              | voice:          | Rng FCNZ naq yvir.

 
 
 

Post by Randy Hootm » Sun, 04 Sep 1994 03:31:11


Let me throw in my two cents (which is another reason that I run Linux
;-) besides the fact that I love it).

It would seem to me that which is better depends on the case. It would
seem to me that there would be little difference if the files were binary;
tar then gzip would maybe be slightly better. However, if the files were
text, maybe it would be better to tar then gzip. Wouldn't you get better
rtl lz compression that way? It probably doesn't make that big of a
difference anyway. Just a rambling thought.

Randy

--

///////////////////////////////////////////////////////////////////////
     "In recognizing the humanity of our fellow beings,
      we pay ourselves the highest tribute." - Thurgood Marshall
//////////////////////////////////////////////////////////////////////
Randy Hootman                Randysoft Software             (408) 229-0119

 
 
 

Post by Steven A. Reism » Sun, 04 Sep 1994 04:29:39


: For best compression on previously uncompressed files,
: which is better: tar * | gzip, or gzip * | tar?
: IE, is it best to tar compressed files, or compress
: a tar file of uncompressed files?  Does gzip -r * work
: better than either solution?

Offhand, I'd say that "tar * | gzip" would give you a smaller file.
However, if some portion of the resultant file gets corrupted by even a
single bit, all data past that point would be unrecoverable.

: I'm looking for the most robust method to archive groups
: of files.

"gzip * | tar" would be much safer.

You might look into "afio", too.

--
Steven A. Reisman

Afton, MN  55001                                      (612) 436-7125

 
 
 

Post by Alan C » Tue, 06 Sep 1994 19:12:04


I for one use tar and then gzip -9 to get the smallest files, but gzip and then
tar in two cases:

1.      When I want to pull odd files out without uncompressing 30Mb of data
2.      For backups. If you lose a chunk of a gzipped file you can throw it
        away pretty much. If you lose a chunk of tar you can get the other
        files out still.
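
The first case can be sketched with Python's standard tarfile and gzip modules: when each member was compressed on its own, a reader can walk the tar headers to the wanted member and decompress only that one. The names pack_gzip_then_tar and read_member are hypothetical helpers, not part of any tool mentioned in this thread.

```python
import gzip
import io
import tarfile


def pack_gzip_then_tar(files):
    # Compress each file on its own, then store the .gz members in a plain tar.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            packed = gzip.compress(data)
            info = tarfile.TarInfo(name + ".gz")
            info.size = len(packed)
            tar.addfile(info, io.BytesIO(packed))
    return buf.getvalue()


def read_member(archive, name):
    # Locate one member via the tar headers and decompress only that member.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        packed = tar.extractfile(name + ".gz").read()
    return gzip.decompress(packed)
```

With a single gzip stream around the whole tar, by contrast, the reader has to decompress from the start of the stream up to the member it wants.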

Alan

--

 
 
 

Post by Kai Petz » Fri, 09 Sep 1994 22:15:33



>    It's more critical than that: tar is a "block device" archiver, which
>means it uses N blocks for each file archived, with a block size of Nx512 bytes
>(default N=20). Suppose you have 100 small files of 512 bytes: each of them will
>require one 20x512-byte block, 1000kb for them all, compared with
>100x512 = 50kb.

This is wrong.  Tar writes blocks of 20 x 512 bytes (unless you override
it with options), but it does not pad files to 20 x 512 byte blocks.  It
pads files only to 512 byte blocks.  The most that you lose on one
file in a tar archive is thus 511 bytes, plus another 512 bytes for the
header block.

However, the archive as a whole is padded to 20x512 byte blocks.  So the
uncompressed size of tar files is always a multiple of 10240 bytes.
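
The padding rules can be checked with a short sketch. It uses Python's standard tarfile module, which follows the same 512-byte member padding and 20 x 512 byte record size; tar_size is a hypothetical helper, and the ustar format is chosen here as an assumption so that each header is exactly one 512-byte block.

```python
import io
import tarfile


def tar_size(n_files, file_size):
    # Size in bytes of an uncompressed tar holding n_files members of file_size bytes.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for i in range(n_files):
            info = tarfile.TarInfo(f"f{i:03d}")
            info.size = file_size
            tar.addfile(info, io.BytesIO(b"x" * file_size))
    return len(buf.getvalue())


# The example from the thread: 100 files of 512 bytes each.
total = tar_size(100, 512)
print(total, total % (20 * 512))
```

Each member costs 512 bytes of header plus 512 bytes of data, so the 100 files come to about 100 KB before the end-of-archive padding, not 1000 KB.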

Kai
--
Kai Petzke                      | How fast can computers get?
Technical University of Berlin  |
Berlin, Germany                 | Sol 9, of course, on Star Trek.