Text to text compression

Post by Maldwyn G.T. Morri » Wed, 13 Nov 1996 04:00:00



I want to compress an ascii text file, but would like the output to be
another ascii text file.

Is there an algorithm to do this that beats simply compressing and
uuencoding ?

Thanks, Maldwyn.
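For reference, the compress-then-uuencode baseline the question mentions might look like this in Python. It is a minimal sketch that assumes zlib (DEFLATE) as the compressor and produces only uuencode's data lines, without the `begin`/`end` header lines of a real uuencoded file. The function names are hypothetical.

```python
import binascii
import zlib

def compress_and_uuencode(text: str) -> str:
    """Hypothetical helper: DEFLATE-compress ASCII text, then uuencode the bytes."""
    packed = zlib.compress(text.encode("ascii"), 9)
    # b2a_uu accepts at most 45 bytes per call and returns one '\n'-terminated line.
    # backtick=True writes zero groups as '`' rather than a space that mailers may strip.
    lines = [binascii.b2a_uu(packed[i:i + 45], backtick=True)
             for i in range(0, len(packed), 45)]
    return b"".join(lines).decode("ascii")

def uudecode_and_decompress(encoded: str) -> str:
    """Hypothetical inverse of compress_and_uuencode."""
    packed = b"".join(binascii.a2b_uu(line) for line in encoded.splitlines())
    return zlib.decompress(packed).decode("ascii")

sample = "the quick brown fox jumps over the lazy dog\n" * 200
encoded = compress_and_uuencode(sample)
print(len(sample), len(encoded))
```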

Post by Walter Robers » Wed, 13 Nov 1996 04:00:00




:I want to compress an ascii text file, but would like the output to be
:another ascii text file.
:Is there an algorithm to do this that beats simply compressing and
:uuencoding ?

Of course there is. uuencoding uses only 64 different characters per
position, so each output character carries just 6 bits. Even an encoding
that simply packed the data more densely into the printable characters
would do better.
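To put numbers on this: a character from a 64-symbol alphabet carries log2 64 = 6 bits, while one drawn from all 95 printable ASCII characters carries log2 95, about 6.57 bits. The small Python sketch below (the helper name is hypothetical) counts the minimum number of characters needed to represent any 1000-byte input in each base. It ignores uuencode's per-line length character and newlines.

```python
import math

def chars_needed(n_bytes: int, base: int) -> int:
    """Hypothetical helper: smallest k with base**k >= 256**n_bytes, i.e. the fewest
    base-`base` digits that can represent every possible n-byte input (exact ints)."""
    target, capacity, k = 256 ** n_bytes, 1, 0
    while capacity < target:
        capacity *= base
        k += 1
    return k

for base in (64, 95):
    print(base, round(math.log2(base), 3), chars_needed(1000, base))
# 64 6.0 1334
# 95 6.57 1218
```

Using all 95 printable characters therefore saves roughly 9% over a 64-character alphabet, before any line framing is added.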

There is no particular reason that a compression algorithm needs to
output into base 256 (a binary file), or indeed into a base that is
a power of 2. Arithmetic encoding in particular might be more
efficient with base 256, but works fine with other bases such as base
95 (the number of printable characters in ASCII).
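As a toy illustration of the claim, here is arithmetic coding whose output digits are in base 95. It is written in Python with exact rational arithmetic (`fractions.Fraction`), so the finite-precision renormalization that makes real coders tricky never arises. The sketch assumes a static order-0 model built from the message itself, and assumes the decoder receives that model and the message length separately. The function names are hypothetical. This is not a practical coder, because the fractions grow with the message.

```python
from collections import Counter
from fractions import Fraction
import math

ALPHABET = "".join(chr(c) for c in range(32, 127))  # the 95 printable ASCII chars
BASE = len(ALPHABET)  # 95

def build_model(text):
    """Hypothetical static model: symbol -> (cumulative count, count), plus total."""
    counts = Counter(text)
    model, cum = {}, 0
    for sym in sorted(counts):
        model[sym] = (cum, counts[sym])
        cum += counts[sym]
    return model, cum

def encode(text, model, total):
    """Narrow [low, low + width) once per symbol, then emit the shortest base-95
    fraction m / 95**k that lies inside the final interval."""
    low, width = Fraction(0), Fraction(1)
    for sym in text:
        cum, freq = model[sym]
        low += width * Fraction(cum, total)
        width *= Fraction(freq, total)
    k = 0
    while True:
        scale = BASE ** k
        m = math.ceil(low * scale)
        if Fraction(m, scale) < low + width:
            break
        k += 1
    digits = []
    for _ in range(k):  # write m as exactly k base-95 digits
        m, d = divmod(m, BASE)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))

def decode(code, model, total, n):
    """Recover n symbols by locating the code value in successive sub-intervals."""
    m = 0
    for ch in code:
        m = m * BASE + ALPHABET.index(ch)
    value = Fraction(m, BASE ** len(code))
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        target = (value - low) / width * total  # always lies in [0, total)
        for sym, (cum, freq) in model.items():
            if cum <= target < cum + freq:
                break
        out.append(sym)
        low += width * Fraction(cum, total)
        width *= Fraction(freq, total)
    return "".join(out)

message = "abracadabra"
model, total = build_model(message)
code = encode(message, model, total)
assert decode(code, model, total, len(message)) == message
```

Digit 0 here is the space character, so a code can end in spaces that some mail software strips. A real implementation would choose an alphabet or framing that avoids this.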

Post by Gordon Swo » Wed, 13 Nov 1996 04:00:00



>I want to compress an ascii text file, but would like the output to be
>another ascii text file.

>Is there an algorithm to do this that beats simply compressing and
>uuencoding ?

PGP with the "-c" option will compress your ascii file before
encrypting it with IDEA encryption. The encrypted ascii file will be
approximately 30% larger than an encrypted binary of the same source
file, but the compression more than compensates.

For example, for a text file containing the complete King James Bible:

uncompressed ascii       = 4,844 KB
PGP compressed (ascii)   = 2,116 KB
PGP compressed (binary)  = 1,539 KB
Zip compressed (binary)  = 1,394 KB

The result of PGP compression is of course unreadable (it's encrypted),
but it can be mailed through normal e-mail channels. MIME (base64) and
UUENCODE also convert files to ascii, but they do not compress: both
make the output at least a third larger than the input.

Hope this helps.
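A quick check of the MIME/UUENCODE point: base64 (MIME's binary encoding) turns every 3 bytes into 4 characters, and uuencode writes each 45-byte chunk as a length character, 60 data characters and a newline. The Python measurement below uses only the standard `base64` and `binascii` modules on arbitrary sample data.

```python
import base64
import binascii

data = bytes(i % 256 for i in range(4500))  # 4500 bytes of arbitrary sample data

b64 = base64.b64encode(data)  # base64 body, before MIME's line wrapping
uu = b"".join(binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45))

print(len(data), len(b64), len(uu))  # 4500 6000 6200
```

Both outputs are larger than the input (by 33% and about 38% here), so neither reduces size on its own. The savings in the PGP figures come from its compression step.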

Post by bme.. » Thu, 14 Nov 1996 04:00:00



>There is no particular reason that a compression algorithm needs to
>output into base 256 (a binary file), or indeed into a base that is
>a power of 2. Arithmetic encoding in particular might be more
>efficient with base 256, but works fine with other bases such as base
>95 (the number of printable characters in ASCII.)

You _really_ think so? In theory, of course, but try implementing it,
and all sorts of ugly things crop up. I know --- I implemented AC
with residue number systems in '94, and when your base is not a power
of two, there is LOTS of trouble ;-)

Bernie
--
============================================================================
"How does Windows work?"
  Mike Battersby, BCS (Hon), System administrator at Deakin Uni, while trying
  to handle Netscape/Windows during a meeting of the Linux Users of Victoria

Post by Maldwyn G.T. Morri » Thu, 14 Nov 1996 04:00:00



> > ...
> > Is there an algorithm to do this that beats simply compressing and
> > uuencoding ?

> Of course there is...
> ...

WELL WHAT IS IT THEN ??!!
(I had hoped my question kind of implied this, but there you go...)

Maldwyn.

Post by Leonid A. Broukh » Thu, 14 Nov 1996 04:00:00




>>There is no particular reason that a compression algorithm needs to
>>output into base 256 (a binary file), or indeed into a base that is
>>a power of 2. Arithmetic encoding in particular might be more
>>efficient with base 256, but works fine with other bases such as base
>>95 (the number of printable characters in ASCII.)
>You _really_ think so? In theory, of course, but try implementing it,
>and all sorts of ugly things crop up. I know --- I implemented AC
>with residue number systems in '94, and when your base is not a power
>of two, there is LOTS of trouble ;-)

To encode into base 95 is to decode the bitstream using a uniform model
with 95 characters, isn't it? The only ugly thing that crops up is
how to encode the end of the bitstream.

        Leo
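Leo's observation can be made concrete. Treat the whole compressed byte string as one big integer and write it out in base 95; this is essentially what decoding the bitstream with a uniform 95-symbol model produces, apart from the end handling. The Python sketch below uses hypothetical function names. It handles the end-of-stream issue by prefixing a 0x01 sentinel byte, so leading zero bytes survive and the length is implied. Big-integer base conversion is quadratic in the input length, so for large inputs one would encode fixed-size blocks instead.

```python
ALPHABET = "".join(chr(c) for c in range(32, 127))  # the 95 printable ASCII chars
BASE = len(ALPHABET)  # 95

def bytes_to_base95(data: bytes) -> str:
    """Hypothetical helper: write the bytes as one big base-95 number."""
    # Sentinel 0x01 byte keeps leading zero bytes and marks where the data starts.
    n = int.from_bytes(b"\x01" + data, "big")
    digits = []
    while n:
        n, d = divmod(n, BASE)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))

def base95_to_bytes(text: str) -> bytes:
    """Hypothetical inverse: rebuild the integer, then drop the sentinel byte."""
    n = 0
    for ch in text:
        n = n * BASE + ALPHABET.index(ch)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if not raw or raw[0] != 1:
        raise ValueError("missing sentinel byte: not produced by bytes_to_base95")
    return raw[1:]

print(bytes_to_base95(b"\x00\x00abc"))
```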

Post by bme.. » Fri, 15 Nov 1996 04:00:00




>>You _really_ think so? In theory, of course, but try implementing it,
>>and all sorts of ugly things crop up. I know --- I implemented AC
>>with residue number systems in '94, and when your base is not a power
>>of two, there is LOTS of trouble ;-)
>To encode into base 95 is to decode the bitstream using a uniform model
>with 95 characters, isn't it? The only ugly thing that crops up is
>how to encode the end of the bitstream.

Yes, yes, yes --- I am sorry! I was tired (and had had a shocker of a day,
anyway), and mixed up the internal workings of the AC and the output
alphabet. Of course, as long as they use the same number base, you are
fine --- but implementing a base-2 output AC with something other than
base-2 for the internal workings, _that's_ trouble.

Sorry for the confusion,

   Bernie

1. Text to Text compression

Hi,

Does anyone know of any Text to Text compression utilities or algorithms?

The scenario is that I have an XML file in which one element is very
large (about 4 MB), and I don't want to send it over the network as-is
because of the severe delay. I'm planning to compress just that element
(which contains only the characters 0-9 and A-F) and embed the
compressed text in place of the original element, reducing the size of
the XML file sent over the network. On the receiving side I can put the
inflation logic inside the parser.

Does anyone know of a utility to do the compression, or of a better way
to implement this?

Thanx,
Prem.
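One common approach, sketched in Python under the assumption that the element holds an even number of hex digits: each hex digit carries only 4 bits, so first convert the hex text back to raw bytes (halving it), compress those with zlib, and then base64-encode the result. The base64 alphabet (A-Z, a-z, 0-9, `+`, `/`, `=`) needs no escaping in XML element content. The function names are hypothetical; the parser on the receiving side would call the inverse.

```python
import base64
import zlib

def pack_hex(hex_text: str) -> str:
    """Hypothetical helper: hex text -> raw bytes -> zlib -> base64 text."""
    raw = bytes.fromhex(hex_text)  # 2 hex digits -> 1 byte
    return base64.b64encode(zlib.compress(raw, 9)).decode("ascii")

def unpack_hex(packed: str) -> str:
    """Hypothetical inverse, e.g. for the receiving parser."""
    raw = zlib.decompress(base64.b64decode(packed))
    return raw.hex().upper()  # back to 0-9 / A-F text

element = "00FF10A3" * 50_000  # stand-in for the large element (400,000 chars)
packed = pack_hex(element)
print(len(element), len(packed))
```

If the hex data is essentially random, zlib gains little. Even then, the hex-to-binary step alone brings the element down to about two thirds of its original size after base64 (4 bits per hex character versus 6 per base64 character).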
