which is faster? (grep....) and another Q

Post by Z-ma » Wed, 07 Feb 1996 04:00:00



Part 1

Which of the following is faster? Is there another yet?

grep foo bar       > .tmp

cat bar | grep foo > .tmp

My guess would be that the first one is fastest. Is there a list
somewhere of different ways of phrasing things, and how they are
dependent on speed? I know there are many other instances of this type
of thing in shell scripting.

Part 2.

Say I have a huge file, and I want to do several searches on it.
Right now I have

grep foo bar
grep fee bar
grep fii bar
.
.
.
grep zzz bar

with the output going through awk and several other things (each time).
Is there a faster way to do this? I'm looking for execution speed, not
lines of script speed. The only other thing I can think of is going
through the file line by line and sorting it into files. That way I only
open the file once. Does that make sense? Better solutions?

Any help, ideas, suggestions would be greatly appreciated.

Thanks

-Z

 
 
 

Post by Bill Zissimopoulo » Thu, 08 Feb 1996 04:00:00


[snip]
> Which of the following is faster? Is there another yet?

> grep foo bar       > .tmp

> cat bar | grep foo > .tmp

[snip]

The first one is faster I think. And the following could be even faster:
        egrep foo bar > .tmp

[snip]

> Say I have a huge file, and I want to do several searches on it.
> Right now I have

> grep foo bar
> grep fee bar
> grep fii bar
> .
> .
> .
> grep zzz bar

> with the output going through awk and several other things (each time).
> Is there a faster way to do this? I'm looking for execution speed, not

[snip]

egrep again. Try (with the pattern quoted, so the shell does not treat | as a pipe)
        egrep 'foo|fee|fii|...|zzz' bar

man egrep should tell you the details.

Bill
--


http://www-dept.cs.ucl.ac.uk/students/B.Zissimopoulos/

 
 
 

Post by Beirne Konars » Thu, 08 Feb 1996 04:00:00



>Part 1
>Which of the following is faster? Is there another yet?
>grep foo bar       > .tmp
>cat bar | grep foo > .tmp
>My guess would be that the first one is fastest. Is there a list
>somewhere of different ways of phrasing things, and how they are
>dependent on speed? I know there are many other instances of this type
>of thing in shell scripting.

You can always run some tests to see which one is faster.  In this case, though,
logic suggests that the first choice is better, because it only uses one process.
The second method adds the cat process, which is fast but unnecessary.
All cat does is read the file and copy it into the pipe; grep can read the file
fine by itself.  It is very rare that you actually need cat in a shell script.
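
A quick way to convince yourself that the two forms differ only in cost, not in result, is sketched below. It assumes a POSIX-style shell with mktemp available; the sample data and the file names direct.out and piped.out are made up for the illustration.

```shell
# Build a small sample file in a scratch directory (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' 'foo one' 'nothing here' 'a foo b' 'fee' > "$tmpdir/bar"

# Form 1: grep opens the file itself (one process).
grep foo "$tmpdir/bar" > "$tmpdir/direct.out"

# Form 2: cat copies the file into a pipe and grep reads the pipe (two processes).
cat "$tmpdir/bar" | grep foo > "$tmpdir/piped.out"

# cmp -s prints nothing and only sets the exit status.
if cmp -s "$tmpdir/direct.out" "$tmpdir/piped.out"; then
    same=yes
else
    same=no
fi
echo "identical output: $same"
```

Since the output is the same, anything the second form costs (an extra process, an extra copy of the data through the pipe) is pure overhead.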

>Part 2.
>Say I have a huge file, and I want to do several searches on it.
>Right now I have
>grep foo bar
>grep fee bar
>grep fii bar
>.
>.
>.
>grep zzz bar
>with the output going through awk and several other things (each time).
>Is there a faster way to do this? I'm looking for execution speed, not
>lines of script speed. The only other thing I can think of is going
>through the file line by line and sorting it into files. That way I only
>open the file once. Does that make sense? Better solutions?
>Any help, ideas, suggestions would be greatly appreciated.

You can do

egrep "foo|fee|fii|zzz" bar

to look for everything at the same time.

You might also try

fgrep -f file bar

where file contains the strings you want to look for.  This is supposed to be very
fast, but I haven't played with it much.
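
Here is a minimal sketch of both approaches side by side. It assumes a grep that accepts the POSIX -E and -F options, which are the option spellings of egrep and fgrep; the sample data and the file name words are invented for the example.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'foo 1' 'nope' 'fee 2' 'fii 3' 'zip' 'zzz 4' > "$tmpdir/bar"

# One pass with an alternation. The quotes stop the shell from treating | as a pipe.
grep -E 'foo|fee|fii|zzz' "$tmpdir/bar" > "$tmpdir/egrep.out"

# One pass with fixed strings, listed one per line in a pattern file.
printf '%s\n' foo fee fii zzz > "$tmpdir/words"
grep -F -f "$tmpdir/words" "$tmpdir/bar" > "$tmpdir/fgrep.out"

matches=$(wc -l < "$tmpdir/egrep.out" | tr -d ' ')
echo "lines matched: $matches"
```

Both commands read bar once and select the same lines here. Which one is faster depends on the grep implementation, so it is worth timing them on your own system.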

Beirne
--
-------------------------------------------------------------------------------
Beirne Konarski                 | Visit the unicycling home page at

"Untouched by Scandal"          |      

 
 
 

Post by Mark Styl » Thu, 08 Feb 1996 04:00:00





>>Part 1
>>Which of the following is faster? Is there another yet?

>>grep foo bar       > .tmp
>>cat bar | grep foo > .tmp

>In this case, though, logic suggests that the first choice is
>better, because it only uses one process.

Yeah, I'd go along with that, the cat in the second example is just
redundant.  You could also try:

fgrep foo bar > .tmp

fgrep is supposedly faster than grep, although I've never tested it.
The only thing you have to remember with fgrep is that it doesn't
do any pattern matching, it's pure string comparison.

or you could also try:

fgrep foo < bar > .tmp

I'm not sure about the speed of this one, I expect it's exactly the same,
but the shell is doing the file handling instead of the grep *shrug*
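
To make the "pure string comparison" point concrete, here is a small sketch, assuming a grep with the POSIX -F option (the same behaviour as fgrep). The three sample lines are made up.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'a.b' 'axb' 'a..b' > "$tmpdir/bar"

# As a regular expression, . matches any single character.
regex_hits=$(grep -c 'a.b' "$tmpdir/bar")

# As a fixed string, . only matches a literal dot.
fixed_hits=$(grep -F -c 'a.b' "$tmpdir/bar")

# Reading from redirected stdin selects the same lines as naming the file.
stdin_hits=$(grep -F -c 'a.b' < "$tmpdir/bar")

echo "regex: $regex_hits  fixed: $fixed_hits  stdin: $stdin_hits"
```

The regular expression matches both 'a.b' and 'axb', while the fixed string matches only 'a.b', so the two are interchangeable only when the search string has no special characters in it.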

>>Part 2.
>>Say I have a huge file, and I want to do several searches on it.
>>Right now I have

>>grep foo bar
>>grep fee bar
>>grep fii bar
>>.
>>.
>>grep zzz bar

>>with the output going through awk and several other things (each time).
>>Is there a faster way to do this? I'm looking for execution speed, not
>>lines of script speed. The only other thing I can think of is going
>>through the file line by line and sorting it into files. That way I only
>>open the file once. Does that make sense? Better solutions?

>You can do

>egrep "foo|fee|fii|zzz" bar

>to look for everything at the same time.
>You might also try

>fgrep -f file bar

>where file contains the strings you want to look for.  

If you are passing the output to awk, you should not need to use
grep at all, awk can do all the pattern matching that grep can, you
could use the following:

awk -f foo.awk bar

where foo.awk contains:

/foo/ { # do processing for foo occurrence
}
/fee/ { # do processing for fee occurrence
}
/fii/ { # do processing for fii occurrence
}
/zzz/ { # do processing for zzz occurrence
}

This will search each line of the file for all patterns you specify,
and do the relevant processing for the found patterns.
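
As a rough sketch of what that could look like for the "sort it into files" idea from the original question: the awk program below reads the input once and writes each pattern's lines to its own file. The output names foo.out and fee.out and the sample data are hypothetical, and the real per-pattern processing would replace the plain print.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'foo 1' 'nope' 'fee 2' 'foo fee 3' > "$tmpdir/bar"

# One pass over bar; each rule writes its matching lines to its own file.
# A line that matches several patterns is written to several files.
awk -v dir="$tmpdir" '
/foo/ { print > (dir "/foo.out") }
/fee/ { print > (dir "/fee.out") }
' "$tmpdir/bar"

foo_lines=$(wc -l < "$tmpdir/foo.out" | tr -d ' ')
fee_lines=$(wc -l < "$tmpdir/fee.out" | tr -d ' ')
echo "foo: $foo_lines  fee: $fee_lines"
```

The parentheses around the file name expression keep the redirection portable across awk versions.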

HTH!

--
** Mark Styles aka Small       -- Opinions expressed here are my own --   **
**                             -- unless otherwise specified         --   **
**    "The future will be better tomorrow." - Vice President Dan Quayle   **

 
 
 

Post by Icarus Spar » Thu, 08 Feb 1996 04:00:00




>Part 1

>Which of the following is faster? Is there another yet?

>grep foo bar       > .tmp

>cat bar | grep foo > .tmp

>My guess would be that the first one is fastest. Is there a list
>somewhere of different ways of phrasing things, and how they are
>dependent on speed? I know there are many other instances of this type
>of thing in shell scripting.

The first will almost certainly be faster. Consider that if you use
'cat' then it will have to read all the data, then write it to a pipe,
and then 'grep' will have to read all the data.

It is always worth looking at faster versions of 'grep'. In particular
'egrep' is often faster than 'grep'. Also look at 'agrep' and the GNU
grep programs which are often faster still.

>Part 2.

>Say I have a huge file, and I want to do several searches on it.
>Right now I have

>grep foo bar
>grep fee bar
>grep fii bar
>.
>.
>.
>grep zzz bar

>with the output going through awk and several other things (each time).
>Is there a faster way to do this? I'm looking for execution speed, not
>lines of script speed. The only other thing I can think of is going
>through the file line by line and sorting it into files. That way I only
>open the file once. Does that make sense? Better solutions?

It depends on all sorts of things. If most lines do not match any of your
words, then a quick first pass to eliminate them is a good idea. Put the words
"foo", "fee", "fii", "zzz" one per line in a file, and then use

        ggrep -f filename input > output
and then use
        ggrep "foo" output | awk...

This is not comp.lang.perl nor comp.lang.awk, but both of these allow you
to do processing on the file, e.g.

awk '/foo/ { do_the_foo_stuff }
/fee/   { do_the_fee_stuff }

/zzz/   { you_get_the_idea }' input

or
perl -ne '
if (/foo/) { do_the_foo_stuff }
if (/zzz/) { do_the_zzz_stuff }
' input

I would use Perl, but you should use whatever you are happy with.

 
 
 

Post by Jonathan Ch » Fri, 09 Feb 1996 04:00:00




>[snip]
>> Which of the following is faster? Is there another yet?

>> grep foo bar       > .tmp

>> cat bar | grep foo > .tmp
>[snip]

>The first one is faster I think. And the following could be even faster:
>    egrep foo bar > .tmp

The first one is *DEFINITELY* faster since it doesn't require the
creation of a 'cat' process
--

 
 
 

Post by John Caru » Fri, 09 Feb 1996 04:00:00



>It is always worth looking at faster versions of 'grep'. In particular
>'egrep' is often faster than 'grep'.

I recently tested them on Solaris 2.4 and found grep to be about 10-20%
faster than egrep for fixed strings.  Fgrep was amazingly slow, about
2-3 times slower than either grep or egrep.  Does anyone know the reason
for this?  I've always heard that grep/egrep were faster than fgrep but
I've never seen a reason given, and it certainly seems like fgrep could
be much faster since it doesn't need to understand regular expressions.

--
John Caruso, Senior Technical Consultant
ADP Claims Solutions Group                 Phone: (800) 366-4237 x2102
2010 Crow Canyon Place                     FAX  : (510) 866-4839

 
 
 

Post by Andreas Schw » Sat, 10 Feb 1996 04:00:00



|>>
|>> It is always worth looking at faster versions of 'grep'. In particular
|>> 'egrep' is often faster than 'grep'.

|> I recently tested them on Solaris 2.4 and found grep to be about 10-20%
|> faster than egrep for fixed strings.  Fgrep was amazingly slow, about
|> 2-3 times slower than either grep or egrep.  Does anyone know the reason
|> for this?

Bad algorithms perhaps?

Btw, GNU grep usually uses the same binary for all three, and `grep
fixed_string' should be the same as `fgrep fixed_string', speedwise.
--
Andreas Schwab                                      "And now for something

 
 
 

Post by Mike Bro » Sat, 10 Feb 1996 04:00:00


I'm trying to prove that fgrep is faster than grep (hey, why trust the man
page?) when searching for fixed strings.  I'm using the csh 'time' command
like this to see how long it takes to look for string 'machines' in the file
'httpd':
        time grep -c machines httpd
        time fgrep -c machines httpd

I get results back like:
        0.164u 0.463s 0:02.43 25.5% 0+0k 219+0io 0pf+0w
        0.099u 0.439s 0:00.55 94.5% 0+0k 0+0io 0pf+0w

The csh man page on my system doesn't tell me what any of these numbers
mean.  The % figure is always very high and the figure to its left very
low if I have just run the test.  The % figure is always low and the
figure to its left very high if I let it sit for 20-30 seconds before
executing subsequent tests.

I also found that the numbers in the first two columns vary significantly
between consecutive tests and it is difficult to say whether grep or fgrep
is really working faster.

1.  What do the numbers mean
2.  Why do they fluctuate depending on how long between tests
3.  Is there any meaningful conclusion that can be drawn from the comparisons

Thanks,
Mike

 
 
 

Post by John Caru » Tue, 13 Feb 1996 04:00:00



>I'm trying to prove that fgrep is faster than grep (hey, why trust the man
>page?) when searching for fixed strings.  I'm using the csh 'time' command
>like this to see how long it takes to look for string 'machines' in the file
>'httpd':
>    time grep -c machines httpd
>    time fgrep -c machines httpd

>I get results back like:
>    0.164u 0.463s 0:02.43 25.5% 0+0k 219+0io 0pf+0w
>    0.099u 0.439s 0:00.55 94.5% 0+0k 0+0io 0pf+0w

>1.  What do the numbers mean

User CPU, kernel CPU, wall time, CPU percentage, shared+unshared memory,
input+output blocks, page faults+swaps.  Wall time is the one you'd be
most interested in for your test.  You may want to try the timex command--
it eliminates many of the fields from time's output which you probably
aren't interested in anyway.

>2.  Why do they fluctuate depending on how long between tests

Pages of the files you're using for your tests will be paged out over time,
causing differences between tests at different times.  You should "prime"
tests like these by reading in the file one time to make sure that all
subsequent commands have an equal chance at a completely paged-in file,
and then run your tests in close succession while the system is otherwise
unloaded.
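
The priming advice can be sketched as a small helper like the one below. The function name bench, the run count of 5 and the generated test file are all assumptions for the illustration, and it relies on date +%s, which is widely available but only has one-second resolution, so use a large file or more runs to get a meaningful number.

```shell
# bench LABEL FILE COMMAND...: run COMMAND on FILE five times, print elapsed seconds.
bench() {
    label=$1; file=$2; shift 2
    cat "$file" > /dev/null          # prime: pull the file into memory first
    start=$(date +%s)
    i=0
    while [ "$i" -lt 5 ]; do         # repeat back to back to smooth out noise
        "$@" "$file" > /dev/null
        i=$((i + 1))
    done
    end=$(date +%s)
    elapsed=$((end - start))
    echo "$label: ${elapsed}s for 5 runs"
}

tmpdir=$(mktemp -d)
# Hypothetical test data: 20000 lines, every tenth one contains "machines".
awk 'BEGIN { for (i = 1; i <= 20000; i++) print (i % 10 ? "filler line " i : "machines " i) }' > "$tmpdir/httpd"

bench grep  "$tmpdir/httpd" grep -c machines
bench fgrep "$tmpdir/httpd" grep -F -c machines
```

Because every command gets the same warmed-up file and the runs happen in close succession, the remaining differences are more likely to come from the programs themselves than from disk activity.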

>3.  Is there any meaningful conclusion that can be drawn from the comparisons

Based on your data, I'm not sure.  You'd need to provide a little more
and indicate which data goes with which test.  Use timex and you should be
able to tell the difference yourself.

As for your original intent--to prove that fgrep is faster than grep--I
recently ran similar tests on Solaris 2.4/2.5 and found that fgrep is
slower than the others by 2-3 times, egrep is second, and grep is about
15-20% faster than egrep (for fixed strings).  And GNU grep smokes them
all easily.  I've seen this claim before but I've never seen an explanation
for it, and I still don't see why fgrep is slow since it should be able
to cut the most corners.  I thought fgrep might be doing inefficient I/O,
so I checked with truss and got these read sizes:

      fgrep:       8K per read() call
      grep/egrep:  1K per read() call
      ggrep:       32K per read() call

And indeed, truss shows that grep/egrep do 8 times as many read()
calls as fgrep for the same file...but they still run more than twice as
fast as fgrep.  So it must be in the algorithm.  I just can't imagine that
running a DFA over the input is faster than doing what basically amounts to a
simple strstr().

--
John Caruso, Senior Technical Consultant
ADP Claims Solutions Group                 Phone: (800) 366-4237 x2102
2010 Crow Canyon Place                     FAX  : (510) 866-4839

 
 
 

Post by Andrew Trist » Wed, 14 Feb 1996 04:00:00




>As for your original intent--to prove that fgrep is faster than grep--I
>recently ran similar tests on Solaris 2.4/2.5 and found that fgrep is
>slower than the others by 2-3 times, egrep is second, and grep is about
>15-20% faster than egrep (for fixed strings).  And GNU grep smokes them
>all easily.  I've seen this claim before but I've never seen an explanation
>for it, and I still don't see why fgrep is slow since it should be able
>to cut the most corners.  I thought fgrep might be doing inefficient I/O,
>so I checked with truss and got these figures:

I was under the impression that (I've never tested this idea) for really
large files, fgrep is faster than all the others.  I've only got GNU
grep here, so I'm unable to test it immediately.
Cheers,
Andrew

--
Andrew Tristan                       No one shall drive us out of the


 
 
 

Post by Daniel LaBe » Fri, 15 Mar 1996 04:00:00



) Newsgroups: comp.unix.shell
) Date: 8 Feb 1996 17:59:04 GMT

)

) >
) >It is always worth looking at faster versions of 'grep'. In particular
) >'egrep' is often faster than 'grep'.
)
) I recently tested them on Solaris 2.4 and found grep to be about 10-20%
) faster than egrep for fixed strings.  Fgrep was amazingly slow, about
) 2-3 times slower than either grep or egrep.  Does anyone know the reason
) for this?  I've always heard that grep/egrep were faster than fgrep but
) I've never seen a reason given, and it certainly seems like fgrep could
) be much faster since it doesn't need to understand regular expressions.
)

Yes grep should be faster.  I know I read that in _Unix for the Impatient_.
From what I recall, fgrep stands for Fixed grep, not fast grep.
Something about some bizarre circumstances where grep would either
fail or run very slowly (I can't remember which).

--

 
 
 

Post by G. Mark Stewa » Sun, 17 Mar 1996 04:00:00




: ) Newsgroups: comp.unix.shell
: ) Date: 8 Feb 1996 17:59:04 GMT


: ) >It is always worth looking at faster versions of 'grep'. In particular
: ) >'egrep' is often faster than 'grep'.
: )
: ) I recently tested them on Solaris 2.4 and found grep to be about 10-20%
: ) faster than egrep for fixed strings.  Fgrep was amazingly slow, about
: ) 2-3 times slower than either grep or egrep.  Does anyone know the reason
: ) for this?  I've always heard that grep/egrep were faster than fgrep but
: ) I've never seen a reason given, and it certainly seems like fgrep could
: ) be much faster since it doesn't need to understand regular expressions.
:  
: Yes grep should be faster.  I know I read that in _Unix for the Impatient_.
: From what I recall, fgrep stands for Fixed grep, not fast grep.
: Something about some bizarre circumstances where grep would either
: fail or run very slowly (I can't remember which).

There's also, as I recall, a bug in some implementations that neglect to
resize some buffer areas or something to match the machine they're on.
See if cgrep is available on your machine, as I find that to be faster
in most applications.

GMS
http://www.svs.com/users/gmark

 
 
 

Post by Pete Houst » Wed, 20 Mar 1996 04:00:00


For those on NeXTs (and possibly other platforms), bm is by far the
fastest pattern matcher that I have found. It is searingly fast
compared to grep.

                        Pete
--

WWW: http://sable.ox.ac.uk/~phouston/ | Opinions are mine.
Phone: +44-1865-792542                | Facts are everyone's.
Fax:   +44-1865-58817                 |

 
 
 
