TOO SLOW: `echo ${LINE} | sed -n 's/^.* proto //p' | cut -f1 -d" "`


Post by sechee » Tue, 15 Jul 2003 23:50:59



I have a script that monitors a firewall drop log file and I need to
pull the protocol fields.  I used to know exactly where this field
was, so I could easily get the field with this statement:

        PROTOCOL=`awk '{print $5}'`

But now the logs are dynamic and the field can be anywhere.  One thing
I do know is that the protocol field always follows a field labelled
"proto".  Thus the following command gets it for me:

        PROTOCOL=`echo ${LINE} | sed -n 's/^.* proto //p' | cut -f1 -d" "`

Trouble is, this command takes about 10 times as long to run as the
awk did.  The result is that the execution time for my script overall
has gone from about 1 minute to 10 minutes.

Can anyone think of a faster way to get the job done?  BTW, perl is
available, but I'm unfamiliar with the language.

Thanks.

 
 
 


Post by Ben » Wed, 16 Jul 2003 00:22:53



>    PROTOCOL=`echo ${LINE} | sed -n 's/^.* proto //p' | cut -f1 -d" "`

> Trouble is, this command takes about 10 times as long to run as the

 From a POSIX type shell:
    PROTOCOL=${LINE#* proto } # chop everything up to 'proto'
    PROTOCOL=${PROTOCOL%% *}  # chop everything down to first field

regards,
Ben

--
BTW. I can be contacted at Username:newsgroup4.replies.benaltw
Domain:xoxy.net
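
To see what each of the two expansions above removes, here is a minimal sketch. The sample log line is made up for illustration; a real firewall log will have a different layout. Both steps are shell builtins, so no external process is started.

```shell
#!/bin/sh
# Hypothetical sample line; real firewall logs will differ.
LINE='Jul 15 23:50:59 fw drop eth0 proto tcp src 10.0.0.1 dst 10.0.0.2'

# '#' removes the shortest leading match of '* proto ',
# i.e. everything up to and including the first " proto ".
REST=${LINE#* proto }

# '%%' removes the longest trailing match of ' *',
# i.e. everything from the first remaining space onward.
PROTOCOL=${REST%% *}

echo "$PROTOCOL"
```

Note that if the line contains no " proto " at all, the first expansion leaves it unchanged and the second then returns the first word of the line, so a script may want to check for the label first.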

 
 
 


Post by Tapani Tarvaine » Wed, 16 Jul 2003 00:41:32



> I have a script that monitors a firewall drop log file and I need to
> pull the protocol fields.  I used to know exactly where this field
> was, so I could easily get the field with this statement:

>    PROTOCOL=`awk 'print $5'`

Is that really complete or should there be "echo ${LINE}"
in there as well, as in your new version, or is it really
reading stdin?

> But now the logs are dynamic and the field can be anywhere.  One thing
> I do know is that the protocol field always follows a field labelled
> "proto".  Thus the follow command gets it for me:

>    PROTOCOL=`echo ${LINE} | sed -n 's/^.* proto //p' | cut -f1 -d" "`
> Trouble is, this command takes about 10 times as long to run as the
> awk did.

Strange. Even as it is, sed+cut should not take 10 times as long as
awk. Anyway, you can do away with the cut:

PROTOCOL=`echo "${LINE}" | sed -n 's/^.* proto \([^ ]*\) .*/\1/p'`

I added quotes around ${LINE} in case it might contain
shell special characters (doesn't hurt in any case),
although it is still vulnerable to echo's oddities
(system-specific).

A better alternative might be using shell's string-editing features
(assuming POSIX shell or ksh or similar):

PROTOCOL="${LINE#*proto }"; PROTOCOL="${PROTOCOL%% *}"

However, you might be able to gain more by rearranging the script
more - how do you pull LINE out of the file and what else do you
do with it? If (as it seems) you are using "read" line-by-line,
you might be able to speed things up a lot by processing the entire
file with one sed call and piping the results to a while-read loop.
As a simplified example,

while read LINE ; do
  PROTOCOL=`echo ${LINE} | sed -n 's/^.* proto \([^ ]*\) .*/\1/p'`
  ...
done <file

is generally *much* slower than

sed -n 's/^.* proto \([^ ]*\) .*/\1/p' file |
while read PROTOCOL ; do
  ...
done

If you need other fields from there as well it could probably
be done easily enough by modifying that accordingly, but
without knowing more about what you actually try to do
I can't give more detailed advice.

--
Tapani Tarvainen
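
A runnable sketch of the "one sed call, then a while-read loop" arrangement described above. The function name extract_protocols and the sample lines are made up for illustration. The loop's output is captured with command substitution because, in many shells, a while loop at the end of a pipeline runs in a subshell and variables set inside it do not survive.

```shell
#!/bin/sh
# Read log lines on stdin; one sed process handles all of them,
# and the loop only sees the already-extracted protocol names.
extract_protocols() {
    sed -n 's/^.* proto \([^ ]*\) .*/\1/p' |
    while read -r PROTOCOL; do
        echo "saw $PROTOCOL"
    done
}

# Hypothetical sample lines; real log lines will differ.
RESULT=$(printf '%s\n' \
    'Jul 15 10:00:01 fw drop proto tcp src 10.0.0.1' \
    'Jul 15 10:00:02 fw drop src 10.0.0.2 proto udp len 40' \
    'Jul 15 10:00:03 fw drop proto tcp src 10.0.0.3' | extract_protocols)
echo "$RESULT"
```

With a real log the call would look like `extract_protocols <firewall_logfile`. As written, the sed pattern expects at least one more field after the protocol value.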

 
 
 


Post by rakesh shar » Wed, 16 Jul 2003 03:38:41



> I have a script that monitors a firewall drop log file and I need to
> pull the protocol fields.  I used to know exactly where this field
> was, so I could easily get the field with this statement:

>    PROTOCOL=`awk 'print $5'`

> But now the logs are dynamic and the field can be anywhere.  One thing
> I do know is that the protocol field always follows a field labelled
> "proto".  Thus the follow command gets it for me:

>    PROTOCOL=`echo ${LINE} | sed -n 's/^.* proto //p' | cut -f1 -d" "`

> Trouble is, this command takes about 10 times as long to run as the
> awk did.  The result is that the execution time for my script overall
> has gone from about 1 minute to 10 minutes.

> Can anyone think of a faster way to get the job done?  BTW, perl is
> available, but I'm unfamiliar with the language.

I don't think the slowdown is because of sed; it's probably due to the $LINE
variable (I guess you are using the shell to loop through the file).

What you can do is:

  sed -ne '/ proto /s/^.* proto  *\([^ ]*\).*$/\1/p' firewall_logfile

or with perl:

perl -wlne '/\sproto\s+(\S+)/&&do{
                print $1;
        }
' firewall_logfile

 
 
 


Post by James E Keen » Wed, 16 Jul 2003 05:00:31



> I have a script that monitors a firewall drop log file and I need to
> pull the protocol fields.  I used to know exactly where this field
> was, so I could easily get the field with this statement:

>    PROTOCOL=`awk 'print $5'`

> But now the logs are dynamic and the field can be anywhere.  One thing
> I do know is that the protocol field always follows a field labelled
> "proto".  Thus the follow command gets it for me:

>    PROTOCOL=`echo ${LINE} | sed -n 's/^.* proto //p' | cut -f1 -d" "`

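# untested:  requires Perl 5.8 or separate installation of Tie::File
# module from CPAN

use strict;
use warnings;
use Tie::File;

my $file = 'firewall_log.txt'; # arbitrary; substitute your own

# tie the file to an array: each element is one line of the log
tie my @array, 'Tie::File', $file or die "Cannot tie $file: $!";

for (my $i=0; $i<=$#array; $i++) {
    # print the field that follows " proto " on this line, if any
    if ($array[$i] =~ /\sproto\s+(\S+)/) {
        print "$1\n";
    }
}

untie @array;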


 
 
 


Post by Carlton Bro » Wed, 16 Jul 2003 05:58:18



> I have a script that monitors a firewall drop log file and I need to
> pull the protocol fields.  I used to know exactly where this field
> was, so I could easily get the field with this statement:

>    PROTOCOL=`awk 'print $5'`

> But now the logs are dynamic and the field can be anywhere.  One thing
> I do know is that the protocol field always follows a field labelled
> "proto".  Thus the follow command gets it for me:

>    PROTOCOL=`echo ${LINE} | sed -n 's/^.* proto //p' | cut -f1 -d" "`

while (<INPUTFILE>) {
     # capture the word after "proto"; skip lines that have none
     my @pname = /proto\s+(\w+)/ or next;
     print "I found a protocol named: $pname[0]\n";
}

This works if you've correctly specified your file handles.  That,
along with any customizations that you may consider asking about next,
is intentionally left blank as an opportunity for self-study.
 
 
 


Post by Uri Guttma » Wed, 16 Jul 2003 06:25:41


  >>
  >> PROTOCOL=`echo ${LINE} | sed -n 's/^.* proto //p' | cut -f1 -d" "`

  CB> while (<INPUTFILE>) {

  CB>      print "I found a protocol named: $pname[0]\n";
  CB> }

perl -ne '/proto\s+(\w+)/ && print "I found a protocol named: $1\n"'

using multiple shell commands and progs for this is slow. and that
should be faster than the full perl loop for several reasons (no block
entry, no temp array)

uri

--

--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org

 
 
 


Post by Juergen Hec » Wed, 16 Jul 2003 17:39:40



> I have a script that monitors a firewall drop log file and I need to
> pull the protocol fields.  I used to know exactly where this field
> was, so I could easily get the field with this statement:

>         PROTOCOL=`awk 'print $5'`

> But now the logs are dynamic and the field can be anywhere.  One thing
> I do know is that the protocol field always follows a field labelled
> "proto".  Thus the follow command gets it for me:

>         PROTOCOL=`echo ${LINE} | sed -n 's/^.* proto //p' | cut -f1 -d" "`

> Trouble is, this command takes about 10 times as long to run as the
> awk did.  The result is that the execution time for my script overall
> has gone from about 1 minute to 10 minutes.

> Can anyone think of a faster way to get the job done?  BTW, perl is
> available, but I'm unfamiliar with the language.

> Thanks.

PROTOCOL=`awk '{sub(/.* proto /,"") ; print $1}' yourlogfile`
for entry in $PROTOCOL
do
 ### process $entry
done

or

awk '{sub(/.* proto /,"") ; print $1}' yourlogfile | while read entry
do
###process $entry
done

Regards
Juergen
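
One thing to watch with the sub() approach: on a line that has no " proto " label, sub() changes nothing and print $1 emits the first field of that line. A possible variant, sketched here under the assumption that fields are whitespace-separated, walks the fields instead and prints only the one that follows a field exactly equal to "proto". The function name proto_fields and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Print the field that follows a field exactly equal to "proto".
# Lines without such a field produce no output.
# With no arguments awk reads stdin; otherwise it reads the named files.
proto_fields() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "proto") { print $(i + 1); break }
    }' "$@"
}

# Hypothetical sample lines; real log lines will differ.
OUT=$(printf '%s\n' \
    'fw drop proto icmp src 10.0.0.1' \
    'fw drop src 10.0.0.2 len 40' \
    'fw drop src 10.0.0.3 proto udp' | proto_fields)
echo "$OUT"
```

With a real log this would be called as `proto_fields yourlogfile | while read entry`, in the same way as the pipeline above.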

 
 
 


Post by sechee » Wed, 16 Jul 2003 20:59:30


On 14 Jul 2003 18:41:32 +0300, Tapani Tarvainen wrote:



>> I have a script that monitors a firewall drop log file and I need to
>> pull the protocol fields.  I used to know exactly where this field
>> was, so I could easily get the field with this statement:

>>        PROTOCOL=`awk 'print $5'`

>Is that really complete or should there be "echo ${LINE}"
>in there as well, as in your new version, or is it really
>reading stdin?

Nope... there is an echo.  Thanks.
 
 
 


Post by sechee » Thu, 17 Jul 2003 02:20:04


Your suggestion worked beautifully!  The speed is incredible now; I
guess using built-in shell constructs is usually a better choice than
calling external commands inside a loop.  I never even considered these
constructs; to be honest I had forgotten they existed.

Thanks.
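
For reference, a minimal sketch of a whole read loop built only from shell builtins, so that no sed, cut, or awk process is started per line. The function name print_protocols and the sample lines are made up for illustration; the case guard is an addition that skips lines lacking the " proto " label.

```shell
#!/bin/sh
# Read log lines from stdin and print the protocol of each line that has one.
print_protocols() {
    while IFS= read -r LINE; do
        case $LINE in
            *' proto '*)
                PROTOCOL=${LINE#* proto }   # drop everything through " proto "
                PROTOCOL=${PROTOCOL%% *}    # keep only the first word
                printf '%s\n' "$PROTOCOL"
                ;;
            # Lines without " proto " fall through and are skipped; without
            # this guard the expansions would yield the line's first word.
        esac
    done
}

# Hypothetical sample lines; real log lines will differ.
OUT=$(printf '%s\n' \
    'fw drop proto tcp src 10.0.0.1' \
    'fw accept src 10.0.0.2' \
    'fw drop len 40 proto udp' | print_protocols)
echo "$OUT"
```

With a real log the call would be `print_protocols <firewall_logfile`.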




>>        PROTOCOL=`echo ${LINE} | sed -n 's/^.* proto //p' | cut -f1 -d" "`

>> Trouble is, this command takes about 10 times as long to run as the

> From a POSIX type shell:
>    PROTOCOL=${LINE#* proto } # chop everything up to 'proto'
>    PROTOCOL=${PROTOCOL%% *}  # chop everything down to first field

>regards,
>Ben

>--
>BTW. I can be contacted at Username:newsgroup4.replies.benaltw
>Domain:xoxy.net

 
 
 
