Regular Expression Syntax Limitation?


Post by Adam Warne » Fri, 02 Mar 2001 11:07:19



Hi all,

I've been trying to use grep to filter a file whenever it says Copyright
[some name other than Microsoft].

I thought this expression syntax might be appropriate:

'opyright.*[^M][^i][^c][^r][^o][^s][^o][^f][^t]'

However this still finds strings that are Copyright ... Microsoft (e.g.
Copyright (c) 1997 - 1999 Microsoft Corporation). This makes sense
because, for example, " (c) 1997" doesn't match "Microsoft".

Any advice about how to approach this situation differently?

Many thanks,
Adam

 
 
 


Post by John W. Krah » Fri, 02 Mar 2001 13:52:19



> Hi all,

> I've been trying to use grep to filter a file whenever it says Copyright
> [some name other than Microsoft].

> I thought this expression syntax might be appropriate:

> 'opyright.*[^M][^i][^c][^r][^o][^s][^o][^f][^t]'

> However this still finds strings that are Copyright ... Microsoft (e.g.
> Copyright (c) 1997 - 1999 Microsoft Corporation). This makes sense
> because, for example, " (c) 1997" doesn't match "Microsoft".

> Any advice about how to approach this situation differently?

$ perl -n0777 -e '/(?i:copyright)(?!.*Microsoft)/ and print "$ARGV\n"' *

John
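
A note for anyone trying variations on this: where the lookahead sits matters. A negative lookahead is tested only at the position the regex engine has reached, so placed after a lazy `.*?` (which first tries to match nothing) it is checked immediately after "copyright" and nearly always succeeds. Wrapping the `.*` inside the lookahead, as in `(?!.*Microsoft)`, checks the rest of the line instead. The sketch below compares the two on two sample lines. The "Example Widgets" notice and the `count` helper are made up for illustration, and perl is assumed to be on the PATH.

```shell
# Two sample lines; only the second names a holder other than Microsoft.
ms='Copyright (c) 1997 - 1999 Microsoft Corporation'
other='Copyright (c) 2001 Example Widgets Ltd'   # hypothetical notice

# Lookahead tested right after a lazy .*? : behaves like plain /copyright/i.
lazy='(?i:copyright).*?(?!Microsoft)'
# Lookahead wrapped around .* : fails if "Microsoft" appears later on the line.
wrapped='(?i:copyright)(?!.*Microsoft)'

# count <regex>: print how many of the two sample lines the regex matches
count() {
    printf '%s\n%s\n' "$ms" "$other" |
    perl -ne 'BEGIN { $re = shift } $n++ if /$re/; END { printf "%d\n", $n }' "$1"
}

echo "lazy:    $(count "$lazy")"
echo "wrapped: $(count "$wrapped")"
```

The lazy form counts both lines, the wrapped form only the non-Microsoft one.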

 
 
 


Post by Adam Warne » Fri, 02 Mar 2001 18:29:09


Thank you John,

>> I've been trying to use grep to filter a file whenever it says Copyright
>> [some name other than Microsoft].

>> I thought this expression syntax might be appropriate:

>> 'opyright.*[^M][^i][^c][^r][^o][^s][^o][^f][^t]'

>> However this still finds strings that are Copyright ... Microsoft (e.g.
>> Copyright (c) 1997 - 1999 Microsoft Corporation). This makes sense
>> because, for example, " (c) 1997" doesn't match "Microsoft".

>> Any advice about how to approach this situation differently?
> $ perl -n0777 -e '/(?i:copyright)(?!.*Microsoft)/ and print "$ARGV\n"' *

Thanks. I am REALLY going to have to learn Perl. Here was my shell kludge
(I worked out how to do this today):

for file in *.*
   do

      strings -a -f "$file" | tr -s '\n' '|' |
      sed -e 's/[Cc][Oo][Pp][Yy][Rr][Ii][Gg][Hh][Tt].*[Mm][Ii][Cc][Rr][Oo][Ss][Oo][Ff][Tt]/XXXXXXXX/' |
      tr -s '|' '\n' | grep -B2 -C2 -i 'copyright' >> output/fulllist

   done

strings -a -f $file strips the strings out of the binary file. tr -s '\n' '|' converts
all line breaks to a dummy character "|" (this is so sed can operate over
multiple lines). sed then finds any occurrences of text such as Copyright
... Microsoft and replaces that text with XXXXXXXX. The line breaks are then
reconstructed. grep then processes the result to determine whether any
copyright notices remain. They will remain if Copyright is not followed by
Microsoft.

:-)

Regards,
Adam
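
One caveat with the joined-lines approach, sketched below under simplified assumptions: once every line break is a "|", a greedy `.*` in sed runs from the first Copyright to the last Microsoft in the whole stream and can swallow other notices in between. Using `[^|]*` with the `g` flag keeps each substitution inside one original line (at the cost of no longer spanning lines). The sample notices, the `sample` and `filter` helpers, and the case-sensitive pattern are illustrative only.

```shell
# Three made-up notices; the middle one is not Microsoft's.
sample() {
    printf '%s\n' 'Copyright 1999 Microsoft' \
                  'Copyright 2000 Example Widgets' \
                  'Copyright 2001 Microsoft'
}

# filter <sed-script>: join lines with "|", run sed, split the lines again,
# and keep whatever still mentions copyright.
filter() {
    { sample | tr '\n' '|'; echo; } | sed -e "$1" | tr '|' '\n' |
    grep -i 'copyright' || true   # an empty result is not an error here
}

greedy=$(filter 's/Copyright.*Microsoft/XXXXXXXX/')
bounded=$(filter 's/Copyright[^|]*Microsoft/XXXXXXXX/g')

echo "greedy  keeps: [$greedy]"
echo "bounded keeps: [$bounded]"
```

With the greedy pattern nothing survives; with the bounded one the Example Widgets line does.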

 
 
 


Post by John W. Krah » Fri, 02 Mar 2001 21:29:41



> Thank you John,

> > $ perl -n0777 -e '/(?i:copyright)(?!.*Microsoft)/ and print "$ARGV\n"' *

> Thanks. I am REALLY going to have to learn Perl. Here was my shell kludge
> (I worked out how to do this today):

> for file in *.*
>    do

>       strings -a -f "$file" | tr -s '\n' '|' |
>       sed -e 's/[Cc][Oo][Pp][Yy][Rr][Ii][Gg][Hh][Tt].*[Mm][Ii][Cc][Rr][Oo][Ss][Oo][Ff][Tt]/XXXXXXXX/' |
>       tr -s '|' '\n' | grep -B2 -C2 -i 'copyright' >> output/fulllist

>    done

> strings -a -f $file strips the strings out of the binary file. tr -s '\n' '|' converts
> all line breaks to a dummy character "|" (this is so sed can operate over
> multiple lines).

  ^^^^^^^^^^^^^^

If you want the perl regex to match over multiple lines change it to:
$ perl -n0777 -e '/(?i:copyright)(?!.*Microsoft)/s and print "$ARGV\n"' *

I made the assumption that the message "Copyright ... Microsoft" would
not have a newline (\n) between the two words (which I think is a safe
assumption for most copyright notices).

John

 
 
 


Post by Dave Bro » Fri, 02 Mar 2001 12:57:53




>I've been trying to use grep to filter a file whenever it says Copyright
>[some name other than Microsoft].

How about:    grep 'Copyright' | grep -v 'Microsoft'

--
Dave Brown  Austin, TX
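
A quick sketch of how the two-grep approach behaves, using made-up sample lines (the "Example Widgets" notices and the `notices` helper are hypothetical). It works line by line, so it also drops a line that mentions Microsoft anywhere, even when the copyright holder is someone else.

```shell
# Sample input: one Microsoft notice, one other notice, and one line that
# merely mentions Microsoft next to somebody else's copyright.
notices() {
    printf '%s\n' \
        'Copyright (c) 1997 - 1999 Microsoft Corporation' \
        'Copyright (c) 2001 Example Widgets Ltd' \
        'Portions based on Microsoft sample code, Copyright Example Widgets Ltd'
}

# Keep lines containing "Copyright", then discard those containing "Microsoft".
kept=$(notices | grep 'Copyright' | grep -v 'Microsoft')
printf '%s\n' "$kept"
```

Only the second sample line is printed.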

 
 
 


Post by Harlan Grov » Sat, 03 Mar 2001 04:29:31



...

>I've been trying to use grep to filter a file whenever it says Copyright
>[some name other than Microsoft].
...
>'opyright.*[^M][^i][^c][^r][^o][^s][^o][^f][^t]'

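Already good answers. The reason this doesn't work is that the .* is free
to match any amount of text, 'Microsoft' included (it is 'greedy', but a
non-greedy version would fail the same way). All the nine bracket
expressions then require is nine consecutive characters somewhere after
'opyright' that each differ from the corresponding letter of 'Microsoft'.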
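
To see this concretely, here is a small sketch that tries the original pattern against three lines. The `matches` helper is made up for the demonstration; the first line is the example from the original post.

```shell
pattern='opyright.*[^M][^i][^c][^r][^o][^s][^o][^f][^t]'

# matches <line>: print "yes" if grep finds the pattern in <line>, else "no"
matches() {
    if printf '%s\n' "$1" | grep -q "$pattern"; then echo yes; else echo no; fi
}

# "(c) 1997 " differs from "Microsoft" in every position, so this matches.
matches 'Copyright (c) 1997 - 1999 Microsoft Corporation'
# " Microsof" is one column off from "Microsoft", so this matches too.
matches 'Copyright Microsoft'
# Only the nine letters "Microsoft" are left after "opyright": no match.
matches 'CopyrightMicrosoft'
```

The pattern excludes a line only when no run of nine characters after 'opyright' can satisfy all the brackets at once.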
 
 
 


Post by Adam Warne » Sat, 03 Mar 2001 05:21:14




>> for file in *.*
>>    do

>>       strings -a -f "$file" | tr -s '\n' '|' |
>>       sed -e 's/[Cc][Oo][Pp][Yy][Rr][Ii][Gg][Hh][Tt].*[Mm][Ii][Cc][Rr][Oo][Ss][Oo][Ff][Tt]/XXXXXXXX/' |
>>       tr -s '|' '\n' | grep -B2 -C2 -i 'copyright' >> output/fulllist

>>    done

>> strings -a -f $file strips the strings out of the binary file. tr -s
>> '\n' '|' converts all line breaks to a dummy character "|" (this is so
>> sed can operate over multiple lines).
>                       ^^^^^^^^^^^^^^

> If you want the perl regex to match over multiple lines change it to:
> $ perl -n0777 -e '/(?i:copyright)(?!.*Microsoft)/s and print "$ARGV\n"' *

> I made the assumption that the message "Copyright ... Microsoft" would
> not have a newline (\n) between the two words (which I think is a safe
> assumption for most copyright notices).

Thanks. The assumption wasn't safe. Many of the copyrights went:

LegalCopyright
   Microsoft

Regards,
Adam
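
For notices laid out like that, with the holder on the line after the keyword, one option is to hold each copyright line until the next line has been seen. A minimal awk sketch, assuming the holder is either on the same line or on the one immediately after it; the function name `non_ms_notices` and the "Example Widgets" holder are made up:

```shell
# Print each copyright line (plus the line after it) unless "Microsoft"
# appears on the copyright line itself or on the line that follows.
non_ms_notices() {
    awk '
        pending {                        # previous line mentioned copyright
            if ($0 !~ /Microsoft/) { print held; print $0 }
            pending = 0
        }
        tolower($0) ~ /copyright/ {
            if ($0 ~ /Microsoft/) next   # holder named on the same line
            held = $0; pending = 1       # decide once the next line is seen
        }
        END { if (pending) print held }  # copyright on the very last line
    '
}

printf '%s\n' 'LegalCopyright' '   Microsoft' \
              'LegalCopyright' '   Example Widgets' \
              'Copyright 2001 Microsoft' | non_ms_notices
```

On this sample only the LegalCopyright / Example Widgets pair is printed. A holder that appears two or more lines later would need a longer window.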

 
 
 

1. Regular Expression Limitation in AIX?

When using the following syntax to check for a string at a specified
position in a record, it works up to position 256 but fails for larger
values. Is this a built-in limit or is there some way around this?

/^.\{pos\}string

works for /^.\{255\}ABC         (searching for ABC in position 256)
but fails for /^.\{256\}ABC     (searching for ABC in position 257)

We looked at awk but the records are fixed field format without reliable
field separators.

(I tried this on UnixWare 7 and it apparently works all the way out to the
2048 character line length limit.)

Thank you,
Lucky

Lucky Leavell                      Phone: (800) 481-2393 (US/Canada)
UniXpress - Your Source for SCO       OR: (812) 366-4066
1560 Zoar Church Road NE             FAX: (812) 366-3618

WWW Home Page:  http://www.UniXpress.com  

2. Changing setenv PATH

3. Help with samples of regular expressions syntax please

4. Minerva mSQL perl adapter questions

5. regular expressions

6. SSI & CGI security question

7. regular expression library calls caused core-dumped

8. What is the max size of an environment variables in schell

9. regular expression

10. regular expression in C programming

11. regular expressions

12. Regular expression matching in sh

13. Why there isn't negation operator for regular expression ?
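
For the fixed-position search in item 1 above, one possible way around an interval limit such as \{255\} is to compare a substring instead of counting characters with a regex. awk's substr() works on character positions and does not depend on field separators. This is only a sketch: the `has_at` helper is made up, and whether a given awk handles very long records is an assumption to verify on the system in question.

```shell
# has_at <pos> <string>: print input lines that have <string> starting
# at character position <pos> (1-based).
has_at() {
    awk -v pos="$1" -v s="$2" 'substr($0, pos, length(s)) == s'
}

# Build a 256-character filler, then test two records: "ABC" at
# position 257 in the first and at position 258 in the second.
pad=$(printf '%256s' '' | tr ' ' '.')
printf '%s\n' "${pad}ABC" "${pad}.ABC" | has_at 257 ABC | cut -c 255-
```

Only the first record is selected; `cut` just trims the filler so the tail of the line is visible.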