sed: extracting a pattern

Post by Tapani Tarvainen » Thu, 12 Feb 1998 04:00:00



This turned out to be harder than I thought:

How can I extract what a pattern matches from a line,
i.e., replace the line with the part that matches the pattern?
Example:

echo "this is a sample line" | sed -n "s/.*\(s[a-z]*e\).*/\1/p"

prints "sample" in this case but will not work in the general
case: replace "sample" with "ssample" and the first s
will not show up. In effect, the greedy leading ".*" makes
the parenthesized pattern match at the last possible position
instead of the first (longest) one.
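
The effect is easy to reproduce. A minimal sketch, assuming a POSIX
shell and a standard sed (the function name greedy_extract is only for
illustration):

```shell
# Illustrative helper (not from the original post): applies the
# substitution from the example above to every line on stdin.
greedy_extract() {
    sed -n 's/.*\(s[a-z]*e\).*/\1/p'
}

echo "this is a sample line"  | greedy_extract   # prints: sample
echo "this is a ssample line" | greedy_extract   # also prints: sample
```

The leading ".*" takes as much of the line as it can, so the group is
left with the rightmost place where s[a-z]*e can still match.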

Replacing ".*" with something that can't match the beginning
of the target pattern works, IF you can find a suitable
substitute -- there isn't any that works in every case.
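
To illustrate with the running example, here is a sketch (again
assuming a POSIX shell and standard sed; the function name is
hypothetical) that swaps ".*" for "[^s]*". It behaves only while no
other "s" comes before the wanted match:

```shell
# Hypothetical helper: the prefix "[^s]*" cannot run past an "s".
prefix_extract() {
    sed -n 's/[^s]*\(s[a-z]*e\).*/\1/p'
}

echo "a ssample line"         | prefix_extract   # prints: ssample
echo "this is a ssample line" | prefix_extract   # prints: this isssample
```

In the second call the "s" in "this" and in "is" stops the prefix too
early, so the match only begins after "this is", and that text is left
in the output.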

Here's one way that works, more or less:

s/\(mypattern\)/\/&\//
tmatch
d
:match
s/.*\/\(.*\)\/.*/\1/

but it's rather ugly.
Better suggestions, anyone?
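
For reference, a runnable sketch of the script above with "mypattern"
filled in as s[a-z]*e. The wrapper name is hypothetical, and a POSIX
shell and standard sed are assumed:

```shell
# Illustrative wrapper: wrap the first match in slashes, delete lines
# with no match, then keep only the text between the slashes.
slash_extract() {
    sed 's/\(s[a-z]*e\)/\/&\//
t match
d
:match
s/.*\/\(.*\)\/.*/\1/'
}

printf '%s\n' "this is a ssample line" "nothing to find" | slash_extract
# prints: ssample
```

Because "/" doubles as the marker, a line that already contains a
slash after the match can confuse the second substitution, which is
presumably part of the "more or less".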

--
Tapani Tarvainen

 
 
 

Post by Douglas Wilson » Thu, 12 Feb 1998 04:00:00



> This turned out to be harder than I thought:

> How can I extract what a pattern matches from a line,
> i.e., replace the line with the part that matches the pattern?
> Example:

> echo "this is a sample line" | sed -n "s/.*\(s[a-z]*e\).*/\1/p"
> Better suggestions, anyone?

perl, maybe:
perl -p -e 's/.*?(s[a-z]*e).*/$1/' <<!
abcd sample line
!
(returns sample)

or just match at word boundaries:
perl -p -e 's/.*?(\bs[a-z]*e\b).*/$1/' <<!
abcd esample spade line
!
(returns spade)

or even build an array of all matching patterns,
then find the longest matching string or whatever.
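
That last idea can also be sketched without perl, assuming a grep that
supports the -o option (GNU and BSD grep do; POSIX does not require it)
and awk. The function name is hypothetical:

```shell
# Hypothetical helper: list every match on stdin, one per line,
# then keep the longest (the first one wins a tie).
longest_match() {
    grep -o 's[a-z]*e' |
    awk 'length($0) > length(best) { best = $0 }
         END { if (best != "") print best }'
}

echo "abcd esample spade line" | longest_match   # prints: sample
```

Without the word-boundary anchors, the "sample" inside "esample" also
counts as a match, and it is longer than "spade". Note that this picks
the longest match over the whole input, not per line.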


Hope that helps,
Douglas Wilson

 
 
 

Post by Dr A. N. Walker » Thu, 12 Feb 1998 04:00:00



> Replacing ".*" with something that can't match the beginning
> of the target pattern works, IF you can find a suitable
> substitute -- there isn't any that works in every case.

        You can use embedded newlines to delimit things, as
these cannot appear in the input line.

> Here's one way that works, more or less:

> s/\(mypattern\)/\/&\//
> tmatch
> d
> :match
> s/.*\/\(.*\)\/.*/\1/

> but it's rather ugly.
> Better suggestions, anyone?

        Firstly, rather than the "ugly" "td:" sequence, I
would use the "!" command.  If I then add an embedded newline,
as proposed above, I get:

        s/\(mypattern\).*/\
        \1/
        /\n/!d
        s/.*\n//

It *should* be possible to do something clever with the "D"
command, or with the hold space, but "sed" doesn't quite do
what I'd like with these.
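
A runnable sketch of the newline-marker script above, with "mypattern"
filled in as s[a-z]*e. The wrapper name is hypothetical, and a POSIX
shell and standard sed are assumed; the backslash at the end of the
first script line puts a literal newline into the replacement:

```shell
# Illustrative wrapper: replace "match + rest of line" with
# "newline + match", delete lines that got no newline (no match),
# then strip everything up to and including the newline.
newline_extract() {
    sed 's/\(s[a-z]*e\).*/\
\1/
/\n/!d
s/.*\n//'
}

printf '%s\n' "this is a ssample line" "nothing to find" | newline_extract
# prints: ssample
```

Since the newline cannot occur in an input line, this avoids the
problem of picking a marker character that the data might contain.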

--
Andy Walker, Maths Dept., Nott'm Univ., UK.

 
 
 

Post by Hunter Johnson » Thu, 12 Feb 1998 04:00:00




> This turned out to be harder than I thought:
> How can I extract what a pattern matches from a line, i.e., replace
> the line with the part that matches the pattern?  Example:
> echo "this is a sample line" | sed -n "s/.*\(s[a-z]*e\).*/\1/p"
> prints "sample" in this case but will not work in the general case,
> e.g., replace "sample" with "ssample" and the first s will not show
> up. In effect, the preceding ".*" will cause the second pattern to
> find last possible match instead of the first (longest).

Right, because the .* is greedy too.  This does what you stated,
in that the line is replaced by the part that matches the pattern.
It's just that the part that matched isn't what you wanted.

> Replacing ".*" with something that can't match the beginning
> of the target pattern works, IF you can find a suitable
> substitute -- there isn't any that works in every case.

If you have access to perl, this becomes easier:

echo "this is a ssample line" | perl -n -e '/s[a-z]*e/ && print $&, "\n";'

prints

ssample

Hunter
--
J. Hunter Johnson        |  "I know that I came face-to-face with God's    

(937) 865-6800 x5385     |     amazing grace saved a wretch like me."      
Lexis-Nexis, Dayton, OH  | Philip Yancey, _What's So Amazing About Grace?_

 
 
 

1. sed extract pattern from stream

Hi,

I am very new to sed. I want to extract a pattern from standard input, for example:

<table class="navbar">
<tr>
<td rowspan="2" align="right"><a
href="http://www.scripps.edu/">link</td>
</tr>
...

I want to print all links in the input stream, for example
http://www.scripps.edu/, and ignore all other lines that have no links.

Online tutorials for beginners only have examples of string
replacement, not pattern extraction. Can anyone help me
with this?

If I want to use the perl command line to do the same thing, what should
the command be?

thanks

Qianqian
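
One possible approach, as a sketch only: assuming a POSIX shell and
standard sed, and that each link sits on one line as href="...", the
substitute-and-print idiom can pull out the URL. The function name is
hypothetical:

```shell
# Hypothetical helper: print the value of an href="..." attribute from
# each input line that has one; lines without a link print nothing.
extract_links() {
    sed -n 's/.*href="\([^"]*\)".*/\1/p'
}

printf '%s\n' '<td rowspan="2" align="right"><a' \
              'href="http://www.scripps.edu/">link</td>' \
              '</tr>' | extract_links
# prints: http://www.scripps.edu/
```

Here [^"]* cannot run past the closing quote, so the group holds just
the URL. Only the last link on a line is printed, and an href whose
value is split across lines would be missed.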
