Grep/Perl and regular expressions

Post by Igor Spiva » Wed, 26 Jan 2000 04:00:00



I am attempting to grep through a file and extract information
beginning with a keyword (which is not at the beginning of a line) and
running until the newline character. Is that possible at all (I am looking
for the expression that will let me do that), or should I turn to Perl?

thanks a lot,

I.D.S.

 
 
 

Post by John Doher » Wed, 26 Jan 2000 04:00:00


| If I understand your problem statement correctly, this should do ya':
|   sed -n 's/.*\(keyword\)/\1/p'

  $ cat infile
  blah blah foo and stuff that follows "foo"
  whatever bar baz foo and more stuff you want to print
  yadda yadda foo yadda yadda yadda
  foo is at the start of this line, as it happens
  but the end of this line is foo
  this line doesn't contain the magic word
  $ sed -n 's/.*\(foo\)/\1/p' infile
  foo"
  foo and more stuff you want to print
  foo yadda yadda yadda
  foo is at the start of this line, as it happens
  foo
  $

A couple of things are apparent: (1) the command above may not produce
the desired output for an input line that happens to contain more than
one occurrence of the pattern to match (depending on what the desired
output is), and (2) since the pattern to match is invariable, there is
no need to tag it in the target string and backreference it in the
replacement string -- it might as well just be a literal string in both
cases. That is, this works just as well:

  $ sed -n 's/.*foo/foo/p' infile
  foo"
  foo and more stuff you want to print
  foo yadda yadda yadda
  foo is at the start of this line, as it happens
  foo
  $

Given the first line of input above, how do you produce output that
consists of the first occurrence of "foo" and everything that follows
it? I honestly don't know how to do that with sed, although with some
other programs I use that support regular expressions, it would be
easy. For example, in one program I use, ".:*foo" would match the
shortest string of characters that were followed by "foo", as opposed
to ".*foo", which matches the longest such string. If I really want to
match the shortest such string, how do I match it with sed?

--

 
 
 

Post by Ken Pizzi » Thu, 27 Jan 2000 04:00:00



>I am attempting to grep through a file and extract information
>beginning with a keyword (which is not at the beginning of a line) and
>running until the newline character. Is that possible at all (I am looking
>for the expression that will let me do that), or should I turn to Perl?

The grep family will only select whole matching lines from files.
To massage the matched lines further, use sed (or perl, or awk)
instead.  If I understand your problem statement correctly, this
should do ya':
  sed -n 's/.*\(keyword\)/\1/p'
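
The whole-line behavior describes traditional grep. GNU and BSD grep also accept -o (--only-matching), which prints only the matched text; POSIX does not require it, so treat its availability as an assumption about your system. A minimal sketch, using a made-up sample line with "keyword" standing in for the real key word:

```shell
# Hypothetical sample line; "keyword" stands in for the real key word.
line='some text keyword and the rest of the line'

# -o prints only the part of the line that matched the pattern.
# The match starts at the first "keyword" and .* extends it to end of line.
out=$(printf '%s\n' "$line" | grep -o 'keyword.*')
printf '%s\n' "$out"
```

Lines that do not contain the keyword produce no output, the same as the sed -n ... p form.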

                --Ken Pizzini

 
 
 

Post by Ken Pizzi » Thu, 27 Jan 2000 04:00:00



>(2) since the pattern to match is invariable, there is
>no need to tag it in the target string and backreference it in the
>replacement string -- it might as well just be a literal string in both
>cases.

True, but if the string is complex it is much easier to
type as a backreference.  For whatever reason it seemed more
natural to me to post the backreference solution, but either
way works fine.

>how do you produce output that
>consists of the first occurrence of "foo" and everything that follows
>it?

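Probably the easiest way is to mark the first occurrence with something
that cannot otherwise appear in the line, delete everything up to the
marker, and put the keyword back:
  sed 's/%/%x/g; s/foo/%%/; s/.*%%/foo/; s/%x/%/g'
or perhaps, using a newline as the marker:
  sed 's/foo/\
/; s/.*\n/foo/'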
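
A worked sketch of the newline-marker idea, run on the first sample line from John's infile. The final substitution writes "foo" back so that the output begins with the keyword itself. The variable names are only for illustration:

```shell
line='blah blah foo and stuff that follows "foo"'

# Step 1: replace the FIRST foo with a newline. A single input line
#         never contains a newline of its own, so the marker is unique.
# Step 2: delete everything up to the marker and put foo back.
out=$(printf '%s\n' "$line" | sed 's/foo/\
/; s/.*\n/foo/')
printf '%s\n' "$out"
# prints: foo and stuff that follows "foo"
```

Without -n and the p flag, lines that lack the keyword pass through unchanged.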

                --Ken Pizzini

 
 
 

Post by Greg Bac » Thu, 27 Jan 2000 04:00:00




: I am attempting to grep through a file and extract information
: beginning with a keyword (which is not at the beginning of a line) and
: running until the newline character. Is that possible at all (I am looking
: for the expression that will let me do that), or should I turn to Perl?

You make Perl sound like a last resort or something. :-)  The following
does what you want (and addresses the problem that John Doherty pointed
out):

    % perl -pe 's/^.*?(keyword.*)$/$1/' file ..

Greg
--
Never underestimate the power of stupid people in large groups.
    -- George Carlin

 
 
 

Post by Cal Dunniga » Fri, 28 Jan 2000 04:00:00


: I am attempting to grep through a file and extract information
: beginning with a keyword (which is not at the beginning of a line) and
: running until the newline character. Is that possible at all (I am looking
: for the expression that will let me do that), or should I turn to Perl?

Grep will return the entire line; use sed instead:
    sed "s/.*\(keyword.*\)/\1/" file

Other posters had similar solutions but may have overlooked the fact
that you wanted everything on the line after the keyword as well.

--
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

      Consulting                   wrong with a world in which Ken
      Modeling                     Thompson lives in obscurity and
      Training                     Bill Gates is a famous billionaire.
//////////////////////////////////////////////////////////////////////

 
 
 

Post by Ken Pizzi » Fri, 28 Jan 2000 04:00:00



>    sed "s/.*\(keyword.*\)/\1/" file

>Other posters had similar solutions but may have overlooked the fact
>that you wanted everything on the line after the keyword as well.

All the posts I've seen in this thread kept everything on the
line after the keyword, as requested.  For example:
    sed -n 's/.*\(keyword\)/\1/p'
For lines which contain the keyword this produces identical
results; the only difference is that yours prints all lines
which lack the keyword and this one omits them.  Either script
can easily be adapted to provide the other behavior; the point
is that you don't need the trailing .* in order to preserve
the rest of the line.
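
A quick side-by-side check of that claim, as a sketch on two made-up input lines (one with the keyword, one without):

```shell
# Two hypothetical input lines: one contains the keyword, one does not.
input='abc keyword and the rest
no match here'

# Trailing .* inside the group, no -n: every line is printed,
# whether or not the substitution happened.
with_tail=$(printf '%s\n' "$input" | sed 's/.*\(keyword.*\)/\1/')

# No trailing .*, with -n and the p flag: only lines where the
# substitution happened are printed.
without_tail=$(printf '%s\n' "$input" | sed -n 's/.*\(keyword\)/\1/p')

printf '%s\n---\n%s\n' "$with_tail" "$without_tail"
# prints:
# keyword and the rest
# no match here
# ---
# keyword and the rest
```

The substitution only replaces the text it matched, so whatever follows the keyword is left in place either way.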

                --Ken Pizzini

 
 
 

Post by Cal Dunniga » Fri, 28 Jan 2000 04:00:00



:>    sed "s/.*\(keyword.*\)/\1/" file
:>
:>Other posters had similar solutions but may have overlooked the fact
:>that you wanted everything on the line after the keyword as well.
: All the posts I've seen in this thread kept everything on the
: line after the keyword, as requested.  For example:
:     sed -n 's/.*\(keyword\)/\1/p'

Yeah, I realized right after I posted this that it was wrong.  Gonna
have to look up the procedure to kill posts.
