How to print only a regular expression with SED or AWK

Post by bedive.. » Wed, 20 Dec 2000 00:17:49



I have what I thought was a simple question, but I am finding the answer
hard to come by.  I want to match a regular expression against a file and
then print only the text that matches.  Can this be done with SED or
AWK?  I do not want to print the entire line containing that expression.

Example:
Expression
(NT|PC)[1-9]{5}\.

Example of file:
PC55212.cascss.some.domain:PC:WinNT:000055552225:169.187.32.132:NT55212.
decal.some.domain:
PC82275.cascss.some.domain:PC:WinNT:00a024edd70d:169.187.32.133:hpjetadm
in.cascss.some.domain,NT82275.decal.some.domain:
pc82280.cascss.some.domain:PC:WinNT:00a024f106e2:169.187.32.134:NT82280.
decal.some.domain:
temp94422.cascss.some.domain:PC:Win2000:00d0b7be0c21:169.187.32.135::
pc83483.cascss.some.domain:PC:WinNT:0060972caaef:169.187.32.136:NT83483.
decal.some.domain:
hajlaptop.jour.some.domain:Mac:Finder:000502911d7c:169.187.32.137::
loaner3.cascss.some.domain:PC:Win2000:0002b31115c3:169.187.32.138:NT9443
7.decal.some.domain:
csci543proto1.cascss.some.domain:PC:WinNT:00d0b72533b9:169.187.32.139:NT
92729.decal.some.domain:
CSCI543proto2.cascss.some.domain:PC:WinNT:00d0b725338e:169.187.32.140:NT
92743.decal.some.domain:

Example of desired output:
55212
82275
82280
94422
83483
94437
92729
92743

Any help would be greatly appreciated.  Is there a way to do this with
UNIX text-processing utilities?

Thanks,
Robert


 
 
 

Post by Matthew Land » Wed, 20 Dec 2000 00:25:41



> I have what I thought was a simple question but I am finding the answer
> hard to attain.  I want to match a regular expression on a file and
> then only print that regular expression.  Can this be done with SED or
> AWK.  I do not want to print the entire line containing that expression.

> Example:
> Expression
> (NT|PC)[1-9]{5}\.

> Example of file:
> PC55212.cascss.some.domain:PC:WinNT:000055552225:169.187.32.132:NT55212.
> decal.some.domain:
> PC82275.cascss.some.domain:PC:WinNT:00a024edd70d:169.187.32.133:hpjetadm
> in.cascss.some.domain,NT82275.decal.some.domain:
> pc82280.cascss.some.domain:PC:WinNT:00a024f106e2:169.187.32.134:NT82280.
> decal.some.domain:
> temp94422.cascss.some.domain:PC:Win2000:00d0b7be0c21:169.187.32.135::
> pc83483.cascss.some.domain:PC:WinNT:0060972caaef:169.187.32.136:NT83483.
> decal.some.domain:
> hajlaptop.jour.some.domain:Mac:Finder:000502911d7c:169.187.32.137::
> loaner3.cascss.some.domain:PC:Win2000:0002b31115c3:169.187.32.138:NT9443
> 7.decal.some.domain:
> csci543proto1.cascss.some.domain:PC:WinNT:00d0b72533b9:169.187.32.139:NT
> 92729.decal.some.domain:
> CSCI543proto2.cascss.some.domain:PC:WinNT:00d0b725338e:169.187.32.140:NT
> 92743.decal.some.domain:

> Example of desired output:
> 55212
> 82275
> 82280
> 94422
> 83483
> 94437
> 92729
> 92743

> Any help would be greatly appreciated.  Is there a way to do this with
> UNIX Text Processing utils?

> Thanks,
> Robert


I am no awk expert, but thinking about the problem I would try to use
two functions: split( String, A, [Ere] ) and substr( String, M, [ N ] ).
The first will split the line into elements using your REGEX for the
split mechanism.  I think the FIRST element A[1] will NOT be a match,
but every valid element past that is a match.  Use substr to cut out the
5 characters from each array element and print.

 - Matt
--
_______________________________________________________________________

   << Comments, views, and opinions are mine alone, not IBM's. >>
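
A minimal sketch of the split()/substr() idea above, with one adjustment: split() throws away the text matched by the separator, so splitting on the whole expression would also throw away the digits. The sketch therefore assumes the split is done on the NT or PC prefix alone, which leaves the digits at the start of each later element. The function name extract_ids_split is hypothetical, the digit class is written as [0-9] repeated five times (some awks do not support {5}), and each record is assumed to sit on a single line.

```shell
# Hypothetical helper: print the 5-digit IDs that follow NT or PC on each input line.
extract_ids_split() {
    awk '{
        # Split on the prefix only; the digits stay at the start of each later piece.
        n = split($0, piece, /NT|PC/)
        for (i = 2; i <= n; i++)
            # Keep a piece only if it begins with five digits and a dot.
            if (piece[i] ~ /^[0-9][0-9][0-9][0-9][0-9]\./)
                print substr(piece[i], 1, 5)
    }'
}

# Usage: one unwrapped record from the sample file.
printf '%s\n' 'PC55212.cascss.some.domain:PC:WinNT:000055552225:169.187.32.132:NT55212.decal.some.domain:' |
    extract_ids_split
```

On that record this prints 55212 twice, once for the PC name and once for the NT name.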

 
 
 

Post by Kenny McCorma » Wed, 20 Dec 2000 01:32:42



>I have what I thought was a simple question but I am finding the answer
>hard to attain.  I want to match a regular expression on a file and
>then only print that regular expression.  Can this be done with SED or
>AWK.  I do not want to print the entire line containing that expression.

>Example:
>Expression
>(NT|PC)[1-9]{5}\.

Here's a useful AWK function that you should keep handy:

function extract(s,re) {
        match(s,re)
        return substr(s,RSTART,RLENGTH)
        }

And then you work your way through your string with something like:

# Untested - and I may have gotten a detail or two wrong, but you get the
# idea...
{
for (s=$0; t = extract(s,"(NT|PC)[1-9]{5}\\."); s = substr(s,RSTART+RLENGTH))
        print t
}
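
A runnable sketch of the same extract() loop, wrapped in a shell function so it can be tried from the command line. The wrapper name extract_ids_match is hypothetical. Two details differ from the post and are assumptions on my part: the digit class is [0-9] written out five times (so it also runs on awks without {5} support and accepts zeros), and only the five digits are printed rather than the whole match.

```shell
# Hypothetical wrapper around the extract() loop; reads lines on standard input.
extract_ids_match() {
    awk '
    # Return the first substring of s that matches re, or "" if there is none.
    function extract(s, re) {
        if (match(s, re) == 0)
            return ""
        return substr(s, RSTART, RLENGTH)
    }
    {
        re = "(NT|PC)[0-9][0-9][0-9][0-9][0-9][.]"
        # match() leaves RSTART and RLENGTH set, so the loop can skip past each hit.
        for (s = $0; (t = extract(s, re)) != ""; s = substr(s, RSTART + RLENGTH))
            print substr(t, 3, 5)   # drop the 2-letter prefix and the trailing dot
    }'
}

# Usage: one unwrapped record from the sample file.
printf '%s\n' 'PC55212.cascss.some.domain:PC:WinNT:000055552225:169.187.32.132:NT55212.decal.some.domain:' |
    extract_ids_match
```

Writing the dot as [.] inside the string avoids having to double the backslash, which is the detail most easily gotten wrong with dynamic regular expressions in awk.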

 
 
 

Post by Jim Mont » Wed, 20 Dec 2000 05:57:41



> I have what I thought was a simple question but I am finding the answer
> hard to attain. I want to match a regular expression on a file and
> then only print that regular expression. Can this be done with SED or
> AWK. I do not want to print the entire line containing that expression.

> Example:
> Expression
> (NT|PC)[1-9]{5}\.

> Example of file:
> PC55212.cascss.some.domain:PC:WinNT:000055552225:169.187.32.132:NT55212.
> decal.some.domain:
> PC82275.cascss.some.domain:PC:WinNT:00a024edd70d:169.187.32.133:hpjetadm
> in.cascss.some.domain,NT82275.decal.some.domain:
> pc82280.cascss.some.domain:PC:WinNT:00a024f106e2:169.187.32.134:NT82280.
> decal.some.domain:
> temp94422.cascss.some.domain:PC:Win2000:00d0b7be0c21:169.187.32.135::
> pc83483.cascss.some.domain:PC:WinNT:0060972caaef:169.187.32.136:NT83483.
> decal.some.domain:
> hajlaptop.jour.some.domain:Mac:Finder:000502911d7c:169.187.32.137::
> loaner3.cascss.some.domain:PC:Win2000:0002b31115c3:169.187.32.138:NT9443
> 7.decal.some.domain:
> csci543proto1.cascss.some.domain:PC:WinNT:00d0b72533b9:169.187.32.139:NT
> 92729.decal.some.domain:
> CSCI543proto2.cascss.some.domain:PC:WinNT:00d0b725338e:169.187.32.140:NT
> 92743.decal.some.domain:

> Example of desired output:
> 55212
> 82275
> 82280
> 94422
> 83483
> 94437
> 92729
> 92743

Well, you won't get THAT output using THIS regular expression pattern:

    (NT|PC)[1-9]{5}\.

If you want to match the string "NT82280", which has a zero in it,
then you'll need to modify the regular expression:

    (NT|PC)[0-9]{5}\.

> Any help would be greatly appreciated. Is there a way to do this with
> UNIX Text Processing utils?

Perl does this most easily and directly:

    perl -lne 'print $1 while /(?:NT|PC)(\d{5})\./g' filename

See these two comp.lang.awk articles for related information about awk:

    http://www.deja.com/[ST_rn=ps]/getdoc.xp?AN=619033725&fmt=text
    http://www.deja.com/[ST_rn=ps]/getdoc.xp?AN=509878789&fmt=text

--
Jim Monty

Tempe, Arizona USA
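
The point about [1-9] versus [0-9] can be checked directly. This is a small sketch, not part of the original post: matches_ere is a hypothetical helper, the pattern is passed through the environment so awk does not reinterpret it, and {5} is spelled out as five bracket expressions for portability.

```shell
# Hypothetical helper: succeed if any line on standard input matches the given ERE.
matches_ere() {
    PATTERN=$1 awk '
        BEGIN { status = 1 }
        $0 ~ ENVIRON["PATTERN"] { status = 0 }
        END { exit status }'
}

id='NT82280.'
nonzero='(NT|PC)[1-9][1-9][1-9][1-9][1-9][.]'
anydigit='(NT|PC)[0-9][0-9][0-9][0-9][0-9][.]'

# The trailing 0 in 82280 is outside [1-9], so only the second pattern matches.
printf '%s\n' "$id" | matches_ere "$nonzero"  || echo '[1-9] version: no match'
printf '%s\n' "$id" | matches_ere "$anydigit" && echo '[0-9] version: match'
```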

 
 
 

Post by Harry Putna » Wed, 20 Dec 2000 09:10:55




> > I have what I thought was a simple question but I am finding the answer
> > hard to attain. I want to match a regular expression on a file and
> > then only print that regular expression. Can this be done with SED or
> > AWK. I do not want to print the entire line containing that expression.

> > Example:
> > Expression
> > (NT|PC)[1-9]{5}\.

> > Example of file:
> > PC55212.cascss.some.domain:PC:WinNT:000055552225:169.187.32.132:NT55212.
> > decal.some.domain:
> > PC82275.cascss.some.domain:PC:WinNT:00a024edd70d:169.187.32.133:hpjetadm
> > in.cascss.some.domain,NT82275.decal.some.domain:
> > pc82280.cascss.some.domain:PC:WinNT:00a024f106e2:169.187.32.134:NT82280.
> > decal.some.domain:
> > temp94422.cascss.some.domain:PC:Win2000:00d0b7be0c21:169.187.32.135::
> > pc83483.cascss.some.domain:PC:WinNT:0060972caaef:169.187.32.136:NT83483.
> > decal.some.domain:
> > hajlaptop.jour.some.domain:Mac:Finder:000502911d7c:169.187.32.137::
> > loaner3.cascss.some.domain:PC:Win2000:0002b31115c3:169.187.32.138:NT9443
> > 7.decal.some.domain:
> > csci543proto1.cascss.some.domain:PC:WinNT:00d0b72533b9:169.187.32.139:NT
> > 92729.decal.some.domain:
> > CSCI543proto2.cascss.some.domain:PC:WinNT:00d0b725338e:169.187.32.140:NT
> > 92743.decal.some.domain:

> > Example of desired output:
> > 55212
> > 82275
> > 82280
> > 94422
> > 83483
> > 94437
> > 92729
> > 92743

If those lines above are really supposed to be on one line in some form
or other...

It seems something like this could be adjusted to do the job,

at least with GNU awk (gawk):

awk -v"RS=(NT|PC)[0-9]+" '{print RT}'

Will print:

PC55212
NT55212
PC82275
NT82275
NT82280
NT83483
NT9443

When run against the lines above even with the quote characters in
place.

So :
awk -v"RS=(NT|PC)[0-9]+" '{sub (/PC|NT/,"",RT);print RT}'

Should be getting close....

55212
55212
82275
82275
82280
83483
9443

I'm pretty sure the regexp can be adjusted to do what you want, using
the above technique.

If you set RS to a regexp like above (-v"RS=REGEXP") then the built in
variable RT will contain the strings that regexp finds.
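
One possible way to finish the adjustment suggested above, as a sketch only. It assumes GNU awk (RT is a gawk extension), tightens RS to the prefix plus exactly five digits and a dot, and prints just the digits. The function name extract_ids_rt is hypothetical, and an ID that is wrapped across two lines in the input will still be missed.

```shell
# Hypothetical helper; needs GNU awk, because RT is a gawk extension.
extract_ids_rt() {
    # RT holds the text that matched RS; it is empty for the final record.
    gawk -v 'RS=(NT|PC)[0-9][0-9][0-9][0-9][0-9][.]' 'RT != "" { print substr(RT, 3, 5) }'
}

# Usage, guarded so the sketch does nothing where gawk is not installed.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' 'PC55212.cascss.some.domain:PC:WinNT:000055552225:169.187.32.132:NT55212.decal.some.domain:' |
        extract_ids_rt
fi
```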

 
 
 

1. REGULAR EXPRESSIONS; PERL/AWK/SED; SINGLE QUOTES/DOUBLE QUOTES

Hi,

I am trying to look for the string

.uk,|/|end-of-string

this regular expression can be constructed as follows:

'\.uk,|/|$'

Now, suppose I stick the text uk into a variable called $suffix.
I can no longer use single quotes, because that destroys the meaning of
$suffix. If I use double quotes, though, then I destroy the special
meaning of the metacharacter $.

"\.$suffix,|/|$"

Help!

Never mind,

I wound up using "\.$suffix"',|/|$', but this problem is so persistent
and pervasive in awk, csh, sed, and even perl that I wonder if there
are better-documented solutions.

Also, what is csh's equivalent of the \b,\w,\s constructs in perl?
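
The mixed-quoting workaround described above can be sketched like this. It is written for a POSIX sh rather than csh, the helper name suffix_pattern is hypothetical, and the grouping parentheses are my assumption about the intended meaning (a dot, the suffix, then a comma, a slash, or end of string); without them the bare $ alternative would match every line.

```shell
# Hypothetical helper: build the ERE  \.SUFFIX(,|/|$)  from a shell variable.
suffix_pattern() {
    # Double quotes let $1 expand; single quotes keep the final $ literal.
    printf '%s\n' "\\.$1"'(,|/|$)'
}

suffix=uk
printf '%s\n' 'www.example.co.uk/index.html' 'www.example.com' 'host.uk' |
    grep -E "$(suffix_pattern "$suffix")"
```

The two adjacent quoted strings are concatenated by the shell into a single word, so the variable is expanded in the first part while the regex metacharacter in the second part is left alone.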

