Postscript file corrupted - extracting text/patching

Post by Theo van der Merw » Sat, 26 May 2001 21:22:12



I have obtained a Postscript file (apparently generated with Microsoft Word)
of which I can only read the first page using gv (an error - moveto - is
generated on the next page). I have the following questions:

a) How do I extract just the text from the Postscript file? How is the raw
text in a Postscript file encoded?

b) Is it possible to fix a corrupted Postscript file (e.g. by extracting the
usable portions to a new file)?

Any help with the above would be greatly appreciated.

Best regards,

 
 
 

Post by Yvan Lorang » Sat, 26 May 2001 23:37:15



> I have obtained a Postscript file (apparently generated with Microsoft Word)
> of which I can only read the first page using gv (an error - moveto - is
> generated on the next page). I have the following questions:

> a) How do I extract just the text from the Postscript file? How is the raw
> text in a Postscript file encoded?

> b) Is it possible to fix a corrupted Postscript file (e.g. by extracting the
> usable portions to a new file)?

> Any help with the above would be greatly appreciated.

I'm sure there are programs to accomplish the above, but allow me to suggest
reading the file with xpdf; it might work!

--
Merci........Yvan          Pour le plein air: Club Vertige
                               http://www.ncf.ca/vertige

 
 
 

Post by John Thompso » Sun, 27 May 2001 06:27:43



> I have obtained a Postscript file (apparently generated with Microsoft Word)
> of which I can only read the first page using gv (an error - moveto - is
> generated on the next page). I have the following questions:

> a) How do I extract just the text from the Postscript file? How is the raw
> text in a Postscript file encoded?

> b) Is it possible to fix a corrupted Postscript file (e.g. by extracting the
> usable portions to a new file)?

> Any help with the above would be greatly appreciated.

Have you tried "ps2ascii" (comes with ghostscript)?

PS2ASCII(1)             Ghostscript Tools             PS2ASCII(1)

NAME
       ps2ascii  -  Ghostscript translator from PostScript or PDF
       to ASCII

SYNOPSIS
       ps2ascii [ input.ps [ output.txt ] ]
       ps2ascii input.pdf [ output.txt ]

DESCRIPTION
       ps2ascii  uses  gs(1)   to   extract   ASCII   text   from
       PostScript(tm)  or  Adobe  Portable  Document Format (PDF)
       files. If no files are specified on the command  line,  gs
       reads from standard input; but PDF input must come from an
       explicitly-named file, not standard input.  If  no  output
       file  is  specified, the ASCII text is written to standard
       output.

--


 
 
 

Post by Stefano Ghirland » Sun, 27 May 2001 20:24:04



> I have obtained a Postscript file (apparently generated with
> Microsoft Word) of which I can only read the first page using gv
> (an error - moveto - is generated on the next page).

First of all, be sure to use Ghostscript 7.0. I was using the old 5.x
and I find 7.0 much improved.

> a) How do I extract just the text from the Postscript file? How is
> the raw text in a Postscript file encoded?

There should be a ps2ascii utility included with ghostscript.
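
On the encoding part of the question: in plain PostScript, text usually appears as literal strings in parentheses, e.g. (Hello) show, with backslash escapes such as \( and octal codes like \351. The sketch below is only a rough illustration of that syntax, not a replacement for ps2ascii. The function name extract_ps_strings is made up for this example, it ignores hex strings in angle brackets, and output from drivers that re-encode fonts (as Word's may) can still come out garbled.

```python
def extract_ps_strings(data: bytes) -> list:
    """Collect literal (parenthesised) strings from PostScript source.

    Hypothetical helper for illustration: handles nested parentheses,
    backslash escapes and octal codes, and skips % comments.
    """
    simple = {ord("n"): 10, ord("r"): 13, ord("t"): 9, ord("b"): 8,
              ord("f"): 12, ord("\\"): 92, ord("("): 40, ord(")"): 41}
    strings = []
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c == ord("%"):
            # Comment outside a string: skip to end of line.
            while i < n and data[i] not in (10, 13):
                i += 1
        elif c == ord("("):
            i += 1
            depth = 1
            buf = bytearray()
            while i < n:
                c = data[i]
                if c == 92 and i + 1 < n:  # backslash escape
                    nxt = data[i + 1]
                    if nxt in simple:
                        buf.append(simple[nxt])
                        i += 2
                    elif 48 <= nxt <= 55:  # up to three octal digits
                        j = i + 1
                        while j < n and j < i + 4 and 48 <= data[j] <= 55:
                            j += 1
                        buf.append(int(data[i + 1:j], 8) & 0xFF)
                        i = j
                    elif nxt in (10, 13):  # backslash-newline: continuation
                        i += 2
                        if nxt == 13 and i < n and data[i] == 10:
                            i += 1
                    else:  # unknown escape: the backslash is dropped
                        buf.append(nxt)
                        i += 2
                    continue
                if c == 40:
                    depth += 1
                elif c == 41:
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                buf.append(c)
                i += 1
            # Latin-1 maps every byte to a character, so this never fails.
            strings.append(buf.decode("latin-1"))
        else:
            i += 1
    return strings
```

Called on the bytes of a .ps file, this returns the string operands in file order; whether they read as running text depends on how the generating driver split and encoded them.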

> b) Is it possible to fix a corrupted Postscript file (e.g. by
> extracting the usable portions to a new file)?

The utility fixps (probably from psutils) might help.
I once had luck using file and dd to extract a PostScript file readable
by 5.x from a newer PostScript file generated by some Adobe program. The
good old file command told me something like 'x bytes of garbage at the
beginning, PostScript file from byte x+1 to y, TIFF image from byte
y+1 to z', and with dd I extracted only the x+1-to-y part.
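
The same byte-range idea can be sketched in Python for cases where file does not report the offsets. This is a minimal sketch under two assumptions: the PostScript part starts at a "%!PS" header and ends at the last "%%EOF" comment. The function name extract_postscript_span is hypothetical, and a file that lacks those markers, or that embeds other documents with their own markers, needs manual inspection instead.

```python
def extract_postscript_span(data: bytes) -> bytes:
    """Return the bytes from the first '%!PS' header to the last '%%EOF'.

    Hypothetical helper: assumes the wrapper (leading garbage, trailing
    preview image) sits entirely outside that span.
    """
    start = data.find(b"%!PS")
    if start == -1:
        raise ValueError("no %!PS header found")
    marker = b"%%EOF"
    end = data.rfind(marker)
    if end < start:
        # No usable end marker: keep everything after the header.
        return data[start:]
    # Keep the marker itself and terminate the line.
    return data[start:end + len(marker)] + b"\n"
```

To use it, read the damaged file in binary mode, pass the bytes to the function, and write the result to a new file opened with mode "wb"; then try the new file in gv or Ghostscript.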

--
Stefano - Hodie septimo Kalendas Iunias MMI est

 
 
 
