How do I remove "non-ASCII" characters from a file?

Post by B. Lun » Mon, 13 Sep 1999 04:00:00



Actually, the problem is that I'd like to remove any lines from a "text"
file which contain non-ASCII characters. "non-ASCII" may not be technically
correct; I'm referring to the normal characters which can be entered from a
keyboard, but NOT the control characters.

OS is FreeBSD. The man page for grep says there are some pre-defined
character classes which should let me do this, but I can't figure out the
right syntax. I could also do it with sed I suppose but don't know the right
RE. Can anyone help with an example?

Thanks.


 
 
 

How do I remove "non-ASCII" characters from a file?

Post by Philip Rowland » Tue, 14 Sep 1999 04:00:00



> Actually, the problem is that I'd like to remove any lines from a "text"
> file which contain non-ASCII characters. "non-ASCII" may not be technically
> correct; I'm referring to the normal characters which can be entered from a
> keyboard, but NOT the control characters.

The class of characters I think you mean is usually referred to as the
printable characters.

> OS is FreeBSD. The man page for grep says there are some pre-defined
> character classes which should let me do this, but I can't figure out the
> right syntax. I could also do it with sed I suppose but don't know the
> right RE. Can anyone help with an example?

Since you're substituting, use sed, like this:

sed "s/[^[:print:]]//g" < binary-file

[^[:print:]] means "any character not in the class of printable
characters"
g means "substitute it everywhere you find it", as opposed to the first
instance per line only (although "line" slightly loses its meaning in this
context).
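
A minimal sketch of that substitution on throwaway data, in case it helps to see it run. The temporary file and its contents are made up for illustration, and LC_ALL=C is an added assumption so that [:print:] covers only ASCII 0x20 through 0x7E regardless of the user's locale:

```shell
# Hypothetical sample: one clean line and one containing a BEL (octal 007).
sample=$(mktemp)
printf 'plain text\nbell\007here\n' > "$sample"

# Strip every character that is not in the printable class.
# LC_ALL=C pins [:print:] to ASCII; other locales may count more characters as printable.
stripped=$(LC_ALL=C sed 's/[^[:print:]]//g' < "$sample")
printf '%s\n' "$stripped"

rm -f "$sample"
```

Note that this keeps every line and only drops the offending characters. Tab is a control character and is not in [:print:], so it gets stripped as well.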

Phil

 
 
 

How do I remove "non-ASCII" characters from a file?

Post by Claudio Sprenge » Fri, 24 Sep 1999 04:00:00


Why don't you try the `tr' command? It's very useful for these sorts of
problems.
claudio
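
A possible tr invocation along these lines, as a sketch only. The input string is invented, and LC_ALL=C is an assumption to keep [:print:] ASCII-only. -c complements the set and -d deletes, so everything that is neither printable nor a newline is removed:

```shell
# Hypothetical input containing a tab and an ESC (octal 033) character.
cleaned=$(printf 'tab\there\nesc\033[0m done\n' | LC_ALL=C tr -cd '[:print:]\n')
printf '%s\n' "$cleaned"
```

Like the sed substitution, this deletes characters rather than whole lines. The newline has to be listed explicitly in the set, otherwise tr would delete it too and join all the lines together.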

> Actually, the problem is that I'd like to remove any lines from a "text"
> file which contain non-ASCII characters. "non-ASCII" may not be technically
> correct; I'm referring to the normal characters which can be entered from a
> keyboard, but NOT the control characters.

> OS is FreeBSD. The man page for grep says there are some pre-defined
> character classes which should let me do this, but I can't figure out the
> right syntax. I could also do it with sed I suppose but don't know the right
> RE. Can anyone help with an example?

> Thanks.



 
 
 

How do I remove "non-ASCII" characters from a file?

Post by Kermit Lowry, II » Fri, 24 Sep 1999 04:00:00





> > Actually, the problem is that I'd like to remove any lines from a "text"
> > file which contain non-ASCII characters. "non-ASCII" may not be technically
> > correct; I'm referring to the normal characters which can be entered from a
> > keyboard, but NOT the control characters.

> sed "s/[^[:print:]]//g" < binary-file

Since he wants to remove the entire line, wouldn't he want something
like:

sed "/[^[:print:]]/d" $filename

?
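
A rough sketch of the line-deleting form, with what should be an equivalent grep -v shown alongside since the original question mentioned grep. The sample file is made up for the example, and LC_ALL=C is an assumption so that [:print:] means plain printable ASCII:

```shell
# Hypothetical sample: the middle line contains a BEL (octal 007).
sample=$(mktemp)
printf 'keep me\ndrop\007me\nkeep me too\n' > "$sample"

# sed: delete any line containing at least one non-printable character.
kept_sed=$(LC_ALL=C sed '/[^[:print:]]/d' "$sample")

# grep: -v inverts the match, printing only lines with no non-printable character.
kept_grep=$(LC_ALL=C grep -v '[^[:print:]]' "$sample")

printf '%s\n' "$kept_sed"
rm -f "$sample"
```

One caveat: tab is not in [:print:], so lines containing tabs are dropped too. If tabs should be allowed, [^[:print:][:blank:]] is one option.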
-- Kermit Lowry, III
----------------
"Only you can prevent forest fires!" -Smoky
