fgrep -easy question

fgrep -easy question

Post by Wells S. Hans » Wed, 12 Feb 1992 06:19:26



grep is so easy to control, but sometimes I have to use fgrep because
I need to have the strings I am searching for read from a file. I have
a shell script that looks through a data base something like this:

        product.code    current.invent    cost          supplier.code

        x2541           3605t             220           b251x2541

The fields are separated by tabs and several of them contain
combinations of numbers and letters. I don't know how to control fgrep
so that it will return only records with (for example) the product
code x2541 without also finding records that contain the supplier
code in the example here (which happens also to contain "x2541"). I
know this is an easy (even silly) question for some of you, but I
can't get this script to work right.

Thanks!

                                        -wells

--



     1050 East 59th Street, Chicago, Il. 60637    #   (312) 702 4597

 
 
 

fgrep -easy question

Post by Jonathan I. Kamens » Wed, 12 Feb 1992 10:48:19



|> Distribution: comp.unix.programmer comp.unix.questions

This is wrong.  In this case, you should have left this line blank.  If you're
confused about the "Distribution:" line, read the discussion about it in the
"Answers to Frequently Asked Questions" posting in news.announce.newusers.

|>   product.code    current.invent    cost          supplier.code
|>        
|>   x2541           3605t             220           b251x2541
|>
|> The fields are separated by tabs and several of them contain
|> combinations of numbers and letters. I don't know how to control fgrep
|> so that it will return only records with (for example) the product
|> code x2541 without also finding records that contain the supplier
|> code in the example here (which happens also to contain "x2541"). I
|> know this is an easy (even silly) question for some of you, but I
|> can't get this script to work right.

Don't use fgrep, use awk:

        awk '$1 == "x2541"'

You could also use perl:
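
A minimal sketch of the perl version, assuming -a autosplit mode and
that the product code is the first tab-separated field ("database" is
just a placeholder file name):

        perl -ane 'print if $F[0] eq "x2541"' database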


--

MIT Information Systems/Athena              Moderator, news.answers
              (Send news.answers-related correspondence


 
 
 

fgrep -easy question

Post by Carl Edman » Wed, 12 Feb 1992 13:09:18


Jonathan I. Kamens writes


> |>      product.code    current.invent    cost          supplier.code
> |>        
> |>      x2541           3605t             220           b251x2541
> |>
> |> The fields are separated by tabs and several of them contain
> |> combinations of numbers and letters. I don't know how to control
> |> fgrep so that it will return only records with (for example) the
> |> product code x2541 without also finding records that contain the
> |> supplier code in the example here (which happens also to contain
> |> "x2541"). I know this is an easy (even silly) question for some of
> |> you, but I can't get this script to work right.

> Don't use fgrep, use awk:

>    awk '$1 == "x2541"'

> You could also use perl:



No, better to use plain grep (where <TAB> stands for an actual tab
character typed into the pattern):

grep "^x2541<TAB>"

Or, for example, if you wanted to match on a cost of 220:

grep "^[A-Za-z0-9]*<TAB><TAB>*[A-Za-z0-9]*<TAB><TAB>*220<TAB>"

Why?

% ll /bin/awk /usr/local/bin/perl
-rwxr-xr-x  1 root        57344 Oct 22  1990 /bin/awk*
-rwxr-xr-x  1 root       207788 Nov 21 10:41 /usr/local/bin/perl*

% ll /bin/grep
-rwxr-xr-x  1 root         4060 Oct 22  1990 /bin/grep*

If you want to put this into a frequently used script, the smaller
binary is definitely worth the trouble.
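
If typing literal tabs into a script is a nuisance, one way around it
is to build the tab with printf first (a sketch, assuming printf is
available as a command; "database" is just a placeholder file name):

        # put a real tab character into $TAB, then anchor on field 1
        TAB=`printf '\t'`
        grep "^x2541${TAB}" database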

        Carl Edman

 
 
 

fgrep -easy question

Post by Trond Kandal » Wed, 12 Feb 1992 21:10:19


It would be much easier to use awk to do this ...

Awk is a pattern-matching language created to handle exactly this kind of problem.
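
For instance, a sketch of the original problem in awk, assuming an awk
(nawk or gawk) that accepts -F '\t' to make tab the field separator
("database" is a placeholder file name):

        # print only records whose first tab-separated field is x2541
        awk -F'\t' '$1 == "x2541"' database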

Book:
The awk programming language

Alfred V. Aho
Peter J. Weinberger
Brian W. Kernighan

Addison Wesley Publishing Company
ISBN 0-201-07981-X

Good luck!

Trond Kandal.
--
System-administrator: Trond Kandal, Institutt for Informatikk, Universitetet i


"I'm the king, I can do anything..." Jim Morrison.

 
 
 

fgrep -easy question

Post by Jerry Pe » Mon, 17 Feb 1992 21:42:17



Quote:>    product.code    current.invent    cost          supplier.code

>    x2541           3605t             220           b251x2541

> The fields are separated by tabs and several of them contain
> combinations of numbers and letters. I don't know how to control
> fgrep so that it will return only records with (for example) the
> product code x2541 without also finding records that contain the
> supplier code in the example here (which happens also to contain
> "x2541"). I know this is an easy (even silly) question for some of
> you, but I can't get this script to work right.


awk or perl.



Quote:> Or e.g. if you wanted to match for the cost of 220:

> grep "^[A-Za-z0-9]*<TAB><TAB>*[A-Za-z0-9]*<TAB><TAB>*220<TAB>"

> Why?

> % ll /bin/awk /usr/local/bin/perl /bin/grep
> -rwxr-xr-x  1 root        57344 Oct 22  1990 /bin/awk*
> -rwxr-xr-x  1 root         4060 Oct 22  1990 /bin/grep*
> -rwxr-xr-x  1 root       207788 Nov 21 10:41 /usr/local/bin/perl*

> If you want to put this into a frequently used script, this definitely  
> is worth the trouble.

It's true that grep is smaller than awk and perl.  But the smallest
binary isn't always the fastest.  On our Zenith 386 running Interactive
3.2, egrep is twice as big as grep, but it's also twice as fast here.

To test it, I made a 5000-line file named "big".  Every line but one
was the same: one line had "240" instead of "220" in the third field.
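
(A file like that can be knocked together with a short awk BEGIN
block, for example; this is just a sketch of the idea, not necessarily
the way it was actually generated:)

        awk 'BEGIN {
                # 4999 identical records, then one with cost 240
                for (i = 1; i < 5000; i++)
                        print "x2541\t3605t\t220\tb251x2541"
                print "x2541\t3605t\t240\tb251x2541"
        }' < /dev/null > big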


x2541           3605t             240           b251x2541
20.9u 0.4s 0:13 163%

% time awk '$3 == 240' big
x2541           3605t             240           b251x2541
13.0u 0.4s 0:08 167%

% time grep "^[A-Za-z0-9]*         *[A-Za-z0-9]*           *  240  " big
x2541           3605t             240           b251x2541
2.2u 0.3s 0:01 250%

% time egrep "^[A-Za-z0-9]*                *[A-Za-z0-9]*           *  240  " big
x2541           3605t             240           b251x2541
1.1u 0.3s 0:00 135%

On a system with one user, perl (the slowest) took 13 clock seconds and
21 CPU seconds.  egrep (the fastest) took less than 1 clock second and
less than 2 CPU seconds.

I'm not trying to start a battle between perl and awk and everything else
here.  I'm also not trying to prove anyone else wrong, or to say that
egrep is always the fastest grep (though it usually seems to be, to me).
I just wanted to point out that the smallest binary isn't always fastest.


 
 
 

fgrep -easy question

Post by Tom Christiansen » Tue, 18 Feb 1992 02:03:15




:>   product.code    current.invent    cost          supplier.code
:>        
:>   x2541           3605t             220           b251x2541
:>
:
:It's true that grep is smaller than awk and perl.  But the smallest
:binary isn't always the fastest.  On our Zenith 386 running Interactive
:3.2, egrep is twice as big as grep, but it's also twice as fast here.
:
:To test it, I made a 5000-line file named "big".  Every line but one
:was the same: one line had "240" instead of "220" in the third field.
:

:x2541          3605t             240           b251x2541
:20.9u 0.4s 0:13 163%
:
:% time awk '$3 == 240' big
:x2541          3605t             240           b251x2541
:13.0u 0.4s 0:08 167%
:
:% time grep "^[A-Za-z0-9]*                *[A-Za-z0-9]*           *  240  " big
:x2541          3605t             240           b251x2541
:2.2u 0.3s 0:01 250%
:
:% time egrep "^[A-Za-z0-9]*               *[A-Za-z0-9]*           *  240  " big
:x2541          3605t             240           b251x2541
:1.1u 0.3s 0:00 135%
:
:On a system with one user, perl (the slowest) took 13 clock seconds and
:21 CPU seconds.  egrep (the fastest) took less than 1 clock second and
:less than 2 CPU seconds.
:
:I'm not trying to start a battle between perl and awk and everything else
:here.  I'm also not trying to prove anyone else wrong, or to say that
:egrep is always the fastest grep (though it usually seems to be, to me).
:I just wanted to point out that the smallest binary isn't always fastest.

egrep is often much faster than grep.  In fact, on my system, it's not
just 2x as on Jerry's, but 5x.

I easily concede that egrep is the right tool to use here.  Nonetheless,
note that forcing perl into awk mode is not the optimal way to employ
it for this problem.  One of the reasons perl can often beat awk is that
there's no need to split every line in perl as there is in awk.  Even
if you're going to do so, there's no reason to keep around the extra fields.

Using a datafile just like Jerry's, here are the timings from my machine.
I've added two better ways to do it in perl, as well as one in sed,
which just goes to show you that there's more than one way to do it.
The blanks in the regexps are tabs, and I've omitted the output
for brevity.


    2.277u 0.097s 0:01.43 50.0% 0+0k 4+7io 52pf+0w

    time perl -ne 'print if (split)[2] eq "240";' big
    1.369u 0.093s 0:00.88 50.0% 0+0k 5+5io 52pf+0w

    time perl -ne "print if /^[A-Za-z0-9]*\t[A-Za-z0-9]*\t240\t/" big
    0.267u 0.089s 0:00.21 50.0% 0+0k 4+5io 51pf+0w

    time sed -ne "/^[A-Za-z0-9]*    [A-Za-z0-9]*    240     /p" big
    0.364u 0.117s 0:00.29 50.0% 0+0k 6+0io 17pf+0w

    time awk '$3 == 240' big
    1.404u 0.079s 0:00.89 50.0% 0+0k 4+0io 27pf+0w

    time grep "^[A-Za-z0-9]*   [A-Za-z0-9]*    240     " big
    1.109u 0.068s 0:00.71 50.0% 0+0k 1+0io 14pf+0w

    time egrep "^[A-Za-z0-9]*      [A-Za-z0-9]*    240     " big
    0.233u 0.122s 0:00.21 50.0% 0+0k 0+0io 12pf+0w

    time gnugrep "^[A-Za-z0-9]* [A-Za-z0-9]*    240     " big
    0.097u 0.040s 0:00.08 50.0% 0+0k 1+0io 27pf+0w

As you can see, the 2nd awkish perl solution beats awk by a bit, and the
seddish perl solution beats sed by a substantial margin.  In fact, of
standard system utilities, only egrep beats it, and then only by a little
bit.  But let's not forget that GNU grep timing, which is clearly the best.

I just didn't want to let a perl timing go by that was an order of
magnitude slower than it needed to be.

--tom

 
 
 

fgrep -easy question

Post by Root Boy J » Sat, 22 Feb 1992 10:56:32



>egrep is often much faster than grep.

It is my experience that egrep is *always* faster than grep.
In fact, the only real reason to use plain grep is because
egrep doesn't support -i. However, GNU egrep has fixed that.
--

                Drawing Crazy Patterns on Your Screen
 
 
 

fgrep -easy question

Post by Bruce Robert Larson » Sat, 22 Feb 1992 22:57:56



Quote:>It is my experience that egrep is *always* faster than grep.
>In fact, the only real reason to use plain grep is because
>egrep doesn't support -i. However, GNU egrep has fixed that.

The summary says it all.  You can use tagged regular expressions
with grep but not with egrep, so don't bury your grep because it's
not dead yet!  

Bruce
--
Bruce R Larson

Integral Resources, Milton MA

 
 
 

fgrep -easy question

Post by Michael Salmon » Sun, 23 Feb 1992 23:12:17




|>
|> >egrep is often much faster than grep.
|>
|> It is my experience that egrep is *always* faster than grep.
|> In fact, the only real reason to use plain grep is because
|> egrep doesn't support -i. However, GNU egrep has fixed that.

The egrep supplied with SunOS supports -i but not the concept of
words, which is in my opinion more important.
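
For example, on a grep or egrep that does support word matching (GNU
grep, for one, via -w or the \< and \> anchors; treat this as a sketch,
since a stock vendor grep may lack both), a whole-word search would
look something like:

        grep -w x2541 database
        grep '\<x2541\>' database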

egrep will in general be faster than grep because it uses a
deterministic finite-state automaton while grep uses a
non-deterministic one (or at least that was the case in the
beginning); it therefore has a single current state rather than a set
of states that it can be in.  The conversion from the NFA (as
generated from the regular expression) to a DFA is non-trivial and may
take longer than the time that grep takes to make the search, though
that will probably not be the case.  The story that I heard was that
egrep got its name because it was a grep that was exponential in space
and time, i.e. certain patterns can take a lot of time and memory to
compile.

--

Michael Salmon

#include        <standard.disclaimer>
#include        <witty.saying>
#include        <fancy.pseudo.graphics>

Ericsson Telecom AB
Stockholm

 
 
 

fgrep -easy question

Post by Dave Dec » Wed, 26 Feb 1992 12:03:05


POSIX.2 grep supports all the functionality of all the former standard greps.
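
Roughly speaking, the old egrep and fgrep behaviours become options to
grep, so the variants in this thread look something like the sketch
below (option letters as I understand the standard; "database",
"patterns" and x9999 are just placeholders):

        grep -E 'x2541|x9999' database    # extended REs, the old egrep
        grep -F x2541 database            # fixed strings, the old fgrep
        grep -f patterns database         # patterns read from a file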

Dave

 
 
 

fgrep -easy question

Post by Root Boy J » Fri, 28 Feb 1992 10:50:47




?>It is my experience that egrep is *always* faster than grep.
?>In fact, the only real reason to use plain grep is because
?>egrep doesn't support -i. However, GNU egrep has fixed that.
?
?
?The summary says it all.  You can use tagged regular expressions
?with grep but not with egrep, so don't bury your grep because it's
?not dead yet!  

GNU egrep will do what either grep will do. fgrep is still different.
--

                Drawing Crazy Patterns on Your Screen

 
 
 

fgrep -easy question

Post by Bruce Albrec » Fri, 28 Feb 1992 14:30:48




>:It's true that grep is smaller than awk and perl.  But the smallest
>:binary isn't always the fastest.  On our Zenith 386 running Interactive
>:3.2, egrep is twice as big as grep, but it's also twice as fast here.
>:
>:To test it, I made a 5000-line file named "big".  Every line but one
>:was the same: one line had "240" instead of "220" in the third field.

[timings and commentary deleted]

Quote:>Using a datafile just like Jerry's, here are the timings from my machine.
>I've added two better ways to do it in perl, as well as one in sed,
>which just goes to show you that there's more than one way to do it.
>The blanks in the regexps are tabs, and I've omitted the output
>for brevity.

[timings deleted, perl timings on my machine, below]

Quote:>As you can see, the 2nd awkish perl solution beats awk by a bit, and the
>seddish perl solution beats sed by a substantial margin.  In fact, of
>standard system utilities, only egrep beats it, and then only by a little
>bit.  But let's not forget that GNU grep timing, which is clearly the best.

>I just didn't want to let a perl timing go by that was an order of
>magnitude slower than it needed to be.

The real problem is that split is an expensive operation.  Using only a
pattern match is a clear win, but doing a cheap pattern match first and
splitting only the lines that pass can also yield big savings if it cuts
out most of the split operations:


6.5u 0.4s 0:07 97% 12+6io 0pf+0w                                

time perl -ne 'print if (split)[2] eq "240";' big
5.0u 0.4s 0:05 97% 11+7io 0pf+0w

time perl -ne 'print if /240/ && (split)[2] eq "240";' big
1.1u 0.3s 0:01 95% 0+4io 0pf+0w

time perl -ne 'if (/240/) { print if (split)[2] eq "240"; }' big
1.0u 0.3s 0:01 84% 33+6io 0pf+0w

time perl -ne "print if /^[A-Za-z0-9]*\t[A-Za-z0-9]*\t240\t/" big
0.8u 0.3s 0:01 94% 0+4io 0pf+0w

--

Youth is wasted on the young.

 
 
 

fgrep -easy question

Post by Larry Wall » Sat, 29 Feb 1992 09:39:24





: ?>It is my experience that egrep is *always* faster than grep.
: ?>In fact, the only real reason to use plain grep is because
: ?>egrep doesn't support -i. However, GNU egrep has fixed that.
: ?
: ?
: ?The summary says it all.  You can use tagged regular expressions
: ?with grep but not with egrep, so don't bury your grep because it's
: ?not dead yet!  
:
: GNU egrep will do what either grep will do. fgrep is still different.

But note that GNU egrep reverts to grep NDFA behavior to do what grep does,
and thus may not buy you much in that situation.

Larry

 
 
 

fgrep -easy question

Post by Bruce Robert Larson » Mon, 02 Mar 1992 02:08:01



Quote:

>GNU egrep will do what either grep will do. fgrep is still different.

Nope.  GNU egrep doesn't handle tagged regular expressions while
GNU grep does.

Try these commands, which use a tagged regexp to match lines
in which the string "grep" is repeated:

1)  echo 'egrep is not grep' | grep '\(grep\).*\1'
2)  echo 'egrep is not grep' | egrep '\(grep\).*\1'

'1' produces "egrep is not grep" as output, while '2' produces
no output.  These results were produced using GNU e?grep-1.5.

I first read about tagged regular expressions on pages 326-7 of
"The UNIX Programming Environment", by Kernighan and Pike.  

Bruce
--
Bruce R Larson

Integral Resources, Milton MA

 
 
 

fgrep -easy question

Post by Joe Ilacq » Thu, 05 Mar 1992 04:11:52



>The egrep supplied with SunOS supports -i but not the concept of
>words, which is in my opinion more important.

        You might check out 'agrep', which supports both '-i' and
'-w'.  The "a" in agrep is for "approximate"; you can tell agrep how
many errors to allow in a match.  Beyond this it seems to be pretty
much an egrep clone, but it has some added features, including the
ability to display more of the context of the match.

        "agrep" can be FTPed from "optima.cs.arizona.edu".

->Spike
--
The World - Public Access Unix - +1 617-739-9753  24hrs {3,12,24,96,192}00bps

 
 
 
