dwarf stars histogram again

Post by Grant » Fri, 21 Oct 2005 00:39:55



Hi there,

Recently I asked about displaying info from /var/log/messages,
converting most recent max 100 events to a histogram plot.
Got around to playing with the suggestions, here's what I have:

Input data, one sample record from /var/log/messages:
Oct 20 00:57:01 deltree kernel: InpDrop: IN=ppp0 OUT= MAC= SRC=220.210.119.91 DST=220.240.117.195 LEN=48 TOS=0x00 PREC=0x20 TTL=111 ID=30103 DF PROTO=TCP SPT=4860 DPT=445 WINDOW=8496 RES=0x00 SYN URGP=0


   135/tcp **
   445/tcp *********************************************
  1026/udp ***
  1027/udp *
  1028/udp *

real    0m0.558s
user    0m0.370s
sys     0m0.190s

#!/bin/bash
#
function fast_one
{
        grep InpDrop: /var/log/messages | grep -v ICMP | tail -100 | \
        awk -F"[ =]" '
        BEGIN {
                for (;n++<7;)
                        dwarfs=dwarfs dwarfs "*"
        }
        {
                sub(/^.*PROTO=/,"")
                key[NR]=$5"/"$1
        }
        END {
                for (i in key)
                        cnt[key[i]]++
                for (i in cnt)
                        if (cnt[i] > max)
                                max=cnt[i]
                if (max > 50) {
                        for (i in cnt)
                                printf "%10s %s\n", tolower(i), substr(dwarfs,1,(cnt[i]+1)/2)
                }
                else {
                        for (i in cnt)
                                printf "%10s %s\n", tolower(i), substr(dwarfs,1,cnt[i])
                }
        }' | sort -n

}

fast_one

This hybrid is much faster than the awk-only solution, possibly because
Ed's suggestion did modulo-100 bookkeeping on every record in the entire
file, whereas here awk only sees the last (at most) 100 records that pass
the 'grep' filters above.

Loki's dwarfs over Chris' string-chopper.  Thanks all.

Any suggestions for cleaning this up into a pure awk script
('#!/bin/awk -f')?  I ran into a syntax brick wall trying to convert
the thing. :(

Cheers,
Grant.

 
 
 

dwarf stars histogram again

Post by Ed Morto » Fri, 21 Oct 2005 00:46:45


 > Hi there,
 >
 > Recently I asked about displaying info from /var/log/messages,
 > converting most recent max 100 events to a histogram plot.
 > Got around to playing with the suggestions, here's what I have:
 >
 > Input data, one sample record from /var/log/messages:
 > Oct 20 00:57:01 deltree kernel: InpDrop: IN=ppp0 OUT= MAC= SRC=220.210.119.91 DST=220.240.117.195 LEN=48 TOS=0x00 PREC=0x20 TTL=111 ID=30103 DF PROTO=TCP SPT=4860 DPT=445 WINDOW=8496 RES=0x00 SYN URGP=0
 >

 >    135/tcp **
 >    445/tcp *********************************************
 >   1026/udp ***
 >   1027/udp *
 >   1028/udp *
 >
 > real    0m0.558s
 > user    0m0.370s
 > sys     0m0.190s

 > #!/bin/bash
 > #
 > function fast_one
 > {
 >         grep InpDrop: /var/log/messages | grep -v ICMP | tail -100 | \
 >         awk -F"[ =]" '
<snip>
 >         }' | sort -n
 > }
 > fast_one
 >
 > A hybrid solution that is much faster than the awk only solution,
 > possibly 'cos Ed suggested modulo 100 math / data collection over
 > entire file, instead of only last max 100 records matching the
 > 'grep' filter above.
 >
 > Loki's dwarfs over Chris' string-chopper.  Thanks all.
 >
 > Suggestions for cleanup to an awk: '#!/bin/awk -f'?  I ran into syntax
 > brick wall trying to convert the thing. :(

I already showed you how to avoid the "grep -v ICMP" by using
/ICMP/{next}, but the tail -100 AFTER that and the sort are a little
problematic. Do you NEED to "sort" the output? Are you using "gawk" (it
has sorting functions built in)?

        Ed.

 
 
 

dwarf stars histogram again

Post by Steffen Schuler » Fri, 21 Oct 2005 01:51:01



> Hi there,

> Recently I asked about displaying info from /var/log/messages,
> converting most recent max 100 events to a histogram plot.
> Got around to playing with the suggestions, here's what I have:

> Input data, one sample record from /var/log/messages:
> Oct 20 00:57:01 deltree kernel: InpDrop: IN=ppp0 OUT= MAC= SRC=220.210.119.91 DST=220.240.117.195 LEN=48 TOS=0x00 PREC=0x20 TTL=111 ID=30103 DF PROTO=TCP SPT=4860 DPT=445 WINDOW=8496 RES=0x00 SYN URGP=0


>    135/tcp **
>    445/tcp *********************************************
>   1026/udp ***
>   1027/udp *
>   1028/udp *

> real    0m0.558s
> user    0m0.370s
> sys     0m0.190s

> #!/bin/bash
> #
> function fast_one
> {
>         grep InpDrop: /var/log/messages | grep -v ICMP | tail -100 | \
>         awk -F"[ =]" '
>         BEGIN {
>                 for (;n++<7;)
>                         dwarfs=dwarfs dwarfs "*"
>         }
>         {
>                 sub(/^.*PROTO=/,"")
>                 key[NR]=$5"/"$1
>         }
>         END {
>                 for (i in key)
>                         cnt[key[i]]++
>                 for (i in cnt)
>                         if (cnt[i] > max)
>                                 max=cnt[i]
>                 if (max > 50) {
>                         for (i in cnt)
>                                 printf "%10s %s\n", tolower(i), substr(dwarfs,1,(cnt[i]+1)/2)
>                 }
>                 else {
>                         for (i in cnt)
>                                 printf "%10s %s\n", tolower(i), substr(dwarfs,1,cnt[i])
>                 }
>         }' | sort -n
> }
> fast_one

> A hybrid solution that is much faster than the awk only solution,
> possibly 'cos Ed suggested modulo 100 math / data collection over
> entire file, instead of only last max 100 records matching the
> 'grep' filter above.  

> Loki's dwarfs over Chris' string-chopper.  Thanks all.

> Suggestions for cleanup to an awk: '#!/bin/awk -f'?  I ran into syntax
> brick wall trying to convert the thing. :(  

> Cheers,
> Grant.

There is a faster GNU awk solution:

#!/usr/bin/gawk -f
func set_key(line, line_NR,    fields) {
   sub(/^.*PROTO=/, "", line)
   split(line, fields, /[ =]/)
   key[line_NR] = fields[5] "/" fields[1]

}

BEGIN {
   dwarfs = sprintf("%7s", "");
   gsub(/ /, "*", dwarfs)
   lines_count = 100
   lines_index = 1

}

/InpDrop:/ && !/ICMP/ {
   lines[lines_index] = $0
   if (lines_index == lines_count) {
     full = 1
     lines_index = 1
   } else {
     ++lines_index
   }
}

END {
   if (full) {
     prev = lines_index - 1
     for (i = lines_index; i != prev; i = i % lines_count + 1) {
       ++lines_NR
       set_key(lines[i], lines_NR)
     }
   } else {
     for (i = 1; i < lines_index; ++i) {
       set_key(lines[i],i)
     }
   }
   for (i in key) {
     cnt[key[i]]++
   }
   for (i in cnt) {
     if (cnt[i] > 50) {
       greater = 1
       break;
     }
   }
   if (greater) {
     for (i in cnt) {
       out[++l] = sprintf("%10s %s\n", tolower(i),
                          substr(dwarfs,1,(cnt[i]+1)/2))
     }
   } else {
     for (i in cnt) {
       out[++l] = sprintf("%10s %s\n", tolower(i), substr(dwarfs,1,cnt[i]))
     }
   }
   asort(out)
   for (l = 1; l in out; ++l) {
     print out[l]
   }

}

Regards,

Steffen

 
 
 

dwarf stars histogram again

Post by Grant » Fri, 21 Oct 2005 02:05:11



>I already showed you how to avoid the "grep -v ICMP" by using
>/ICMP/{next}, but the tail -100 AFTER that and the sort are a little
>problematic. Do you NEED to "sort" the output? Are you using "gawk" (it
>has sorting functions built in)?

Okay, take two:

~# bin/getjunk
   135/tcp **                                                 3
   445/tcp ***************************************            78
  1025/udp *                                                  1
  1026/udp ****                                               7
  1027/udp ***                                                6
  1028/udp *                                                  2
  1029/udp *                                                  1
  1030/udp *                                                  1
  1031/udp *                                                  1

~# cat bin/getjunk
#!/bin/bash
#
grep InpDrop: /var/log/messages | tail -100 | \
awk -F"[ =]" '
BEGIN { for (;n++<7;) dwarfs = dwarfs dwarfs "*" }
/ICMP/ { icmp++; next }
{
        sub(/^.*PROTO=/,"")
        key[NR] = $5"/"$1

}

END {
        for (i in key)
                cnt[key[i]]++
        for (i in cnt)
                if (cnt[i] > max) max = cnt[i]
        scale = (max > 50 ? 2 : 1)
        for (i in cnt)
                printf "%10s %-50s %d\n",
                                tolower(i),
                                substr(dwarfs,1,(cnt[i] + 1) / scale),
                                cnt[i]
        if (icmp > 0)
                printf "icmp: %d\n", icmp

}' | sort -n

I cannot get asort() to work as expected; I tried 'asort(cnt)',
'asort(cnt, d)' and asorti(), but they do not produce what '| sort -n'
does for me.

ICMP packets are rare, so now I'm stuck on getting last 100 records
and doing a numeric sort in awk (gawk).

Thanks,
Grant.

 
 
 

dwarf stars histogram again

Post by Ed Morto » Fri, 21 Oct 2005 02:23:17




 >
 >
 >>I already showed you how to avoid the "grep -v ICMP" by using
 >>/ICMP/{next}, but the tail -100 AFTER that and the sort are a little
 >>problematic. Do you NEED to "sort" the output? Are you using "gawk" (it
 >>has sorting functions built in)?
 >
 >
 > Okay, take two:
 >
 > ~# bin/getjunk
 >    135/tcp **                                                 3
 >    445/tcp ***************************************            78
 >   1025/udp *                                                  1
 >   1026/udp ****                                               7
 >   1027/udp ***                                                6
 >   1028/udp *                                                  2
 >   1029/udp *                                                  1
 >   1030/udp *                                                  1
 >   1031/udp *                                                  1
 >
 > ~# cat bin/getjunk
 > #!/bin/bash
 > #
 > grep InpDrop: /var/log/messages | tail -100 | \
 > awk -F"[ =]" '
 > BEGIN { for (;n++<7;) dwarfs = dwarfs dwarfs "*" }
 > /ICMP/ { icmp++; next }
 > {
 >         sub(/^.*PROTO=/,"")
 >         key[NR] = $5"/"$1
 > }
 > END {
 >         for (i in key)
 >                 cnt[key[i]]++
 >         for (i in cnt)
 >                 if (cnt[i] > max) max = cnt[i]
 >         scale = (max > 50 ? 2 : 1)
 >         for (i in cnt)
 >                 printf "%10s %-50s %d\n",
 >                                 tolower(i),
 >                                 substr(dwarfs,1,(cnt[i] + 1) / scale),
 >                                 cnt[i]
 >         if (icmp > 0)
 >                 printf "icmp: %d\n", icmp
 > }' | sort -n
 >
 >
 > I cannot get asort() to work as expected, tried 'asort(cnt)', asort(cnt, d),
 > asorti(), they do not produce what '| sort -n' does for me.
 >
 > ICMP packets are rare, so now I'm stuck on getting last 100 records
 > and doing a numeric sort in awk (gawk).

gawk doesn't do numeric sorts, it does alphabetical sorts, hence the "a"
at the start of "asort()" and "asorti()". Your keys are non-numerical so
that's probably the problem you're having. See if "asort()" produces the
same result as "sort" without the "-n". You might need to keep a numeric
index of some kind and sort that....

        Ed.

 
 
 

dwarf stars histogram again

Post by johngnu » Fri, 21 Oct 2005 02:39:50


Histogram? Something to ponder in Perl.

I often use a Perl hash for that. Just the hash syntax:

# simple count.
$my_info_hash{$info}++;

# then to unroll.....
while (($key, $value) = each %my_info_hash) {
    print "VALUE: $value KEY: $key\n";
}

Warning: a Perl hash does not keep its keys in any particular order,
going in or coming out, so sort them yourself if order matters.

Just 2 cents...
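
For comparison, the same counting idiom can be sketched in awk, the
language used in the rest of this thread. This is only a minimal
illustration: count_keys is a hypothetical helper name, and it assumes
the key to count is the first whitespace-separated field of each line.
As with the Perl hash, the awk array has no defined order, so the output
is piped through sort.

```shell
#!/bin/sh
# count_keys (hypothetical name): count how often each first field occurs.
count_keys() {
    awk '
    { seen[$1]++ }                 # associative array: key -> occurrences
    END {
        for (k in seen)            # iteration order is unspecified in awk
            print seen[k], k
    }' | sort -k2,2                # order the result by key
}

# Usage: some_command | count_keys
```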

 
 
 

dwarf stars histogram again

Post by Chris F.A. Johnson » Fri, 21 Oct 2005 03:31:52




>>I already showed you how to avoid the "grep -v ICMP" by using
>>/ICMP/{next}, but the tail -100 AFTER that and the sort are a little
>>problematic. Do you NEED to "sort" the output? Are you using "gawk" (it
>>has sorting functions built in)?

> Okay, take two:

> ~# bin/getjunk
>    135/tcp **                                                 3
>    445/tcp ***************************************            78
>   1025/udp *                                                  1
>   1026/udp ****                                               7
>   1027/udp ***                                                6
>   1028/udp *                                                  2
>   1029/udp *                                                  1
>   1030/udp *                                                  1
>   1031/udp *                                                  1

> ~# cat bin/getjunk
> #!/bin/bash
> #
> grep InpDrop: /var/log/messages | tail -100 | \
> awk -F"[ =]" '
> BEGIN { for (;n++<7;) dwarfs = dwarfs dwarfs "*" }
> /ICMP/ { icmp++; next }

    After removing lines containing ICMP, you are not going to have
    100 lines to work with.

    You may also find that using 'grep -v ICMP' is faster than
    skipping the lines in awk (unless you use mawk).

> {
>         sub(/^.*PROTO=/,"")
>         key[NR] = $5"/"$1
> }
> END {
>         for (i in key)
>                 cnt[key[i]]++
>         for (i in cnt)
>                 if (cnt[i] > max) max = cnt[i]
>         scale = (max > 50 ? 2 : 1)
>         for (i in cnt)
>                 printf "%10s %-50s %d\n",
>                                 tolower(i),
>                                 substr(dwarfs,1,(cnt[i] + 1) / scale),
>                                 cnt[i]
>         if (icmp > 0)
>                 printf "icmp: %d\n", icmp
> }' | sort -n

> I cannot get asort() to work as expected, tried 'asort(cnt)', asort(cnt, d),
> asorti(), they do not produce what '| sort -n' does for me.

     Then use sort; asort does not do numerical sorting.

> ICMP packets are rare, so now I'm stuck on getting last 100 records
> and doing a numeric sort in awk (gawk).

     The Unix philosophy of using the right tool for the job is
     important.

--
    Chris F.A. Johnson                     <http://cfaj.freeshell.org>
    ==================================================================
    Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
    <http://www.torfree.net/~chris/books/cfaj/ssr.html>

 
 
 

dwarf stars histogram again

Post by Kenny McCorma » Fri, 21 Oct 2005 06:49:50




...

>gawk doesn't do numeric sorts, it does alphabetical sorts, hence the "a"
>at the start of "asort()" and "asorti()". Your keys are non-numerical so
>that's probably the problem you're having. See if "asort()" produces the
>same result as "sort" without the "-n". You might need to keep a numeric
>index of some kind and sort that....

1) I'm pretty sure the "a" in "asort" stands for "array".

2) Generally, all you have to do to get the functionality of sort's "-n"
    is to zero-fill your numeric strings.  That's all it is.

3) TAWK, of course, handles this all correctly.
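
Point 2 can be sketched as follows, assuming input lines such as
"445/tcp" with the port first. The helper name sort_ports and the
five-digit padding width are assumptions made for illustration. A small
insertion sort that compares strings only stands in for gawk's asort(),
so the sketch runs in any POSIX awk.

```shell
#!/bin/sh
# sort_ports (hypothetical name): order "port/proto" keys numerically
# using nothing but string comparison, by zero-filling the port first.
sort_ports() {
    awk '
    {
        split($1, part, "/")                         # "445/tcp" -> "445", "tcp"
        key[NR] = sprintf("%05d/%s", part[1], part[2])   # "00445/tcp"
    }
    END {
        # Insertion sort; key[] holds strings, so ">" compares as strings.
        for (i = 2; i <= NR; i++) {
            v = key[i]
            for (j = i - 1; j >= 1 && key[j] > v; j--)
                key[j + 1] = key[j]
            key[j + 1] = v
        }
        for (i = 1; i <= NR; i++) {
            out = key[i]
            sub(/^0+/, "", out)                      # drop the padding again
            if (out ~ /^\//) out = "0" out           # keep a lone port 0
            print out
        }
    }'
}

# Usage: printf "1026/udp\n445/tcp\n135/tcp\n" | sort_ports
```

Right-justifying with blanks ("%10s") has the same effect as
zero-filling, because a blank also sorts before every digit in ASCII.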

 
 
 

dwarf stars histogram again

Post by Grant » Fri, 21 Oct 2005 06:54:36



>     The Unix philosophy of using the right tool for the job is
>     important.

I agree, got to bang on the limits to discover where they lie though?

The awk stuff slices through some parts of the problem, not so good
at other parts, that's okay 'cos I mix stuff up to get things done.

Cheers,
Grant.

 
 
 

dwarf stars histogram again

Post by Grant » Fri, 21 Oct 2005 07:01:47



>histogram ? Some thing to ponder in perl.

It is, except in this case I want to write a little toy with minimal
CPU usage, so perl is out.  Unless I hit a brick wall, of course.

Grant.

 
 
 

dwarf stars histogram again

Post by Steffen Schuler » Fri, 21 Oct 2005 08:21:36


Some comments on, and two small corrections to, my earlier gawk script:


> #!/usr/bin/gawk -f
> func set_key(line, line_NR,    fields) {
>   sub(/^.*PROTO=/, "", line)
>   split(line, fields, /[ =]/)
>   key[line_NR] = fields[5] "/" fields[1]
> }
> BEGIN {
>   dwarfs = sprintf("%7s", "");
>   gsub(/ /, "*", dwarfs)

# this is a more efficient solution to compute dwarfs with linear
# complexity versus your solution. Yours has quadratic complexity

>   lines_count = 100
>   lines_index = 1

> }
> /InpDrop:/ && !/ICMP/ {
>   lines[lines_index] = $0
>   if (lines_index == lines_count) {
>     full = 1
>     lines_index = 1
>   } else {
>     ++lines_index
>   }

# this simulates a ring buffer lines[i] i=1,...,100
# lines_index points to the next line to be used in lines[]
# full == 1 iff the ring buffer is full
# then the next line (pointed to by lines_index) was used
# before and is overwritten

> }
> END {
>   if (full) {
>     prev = lines_index - 1
>     for (i = lines_index; i != prev; i = i % lines_count + 1) {
>       ++lines_NR
>       set_key(lines[i], lines_NR)
>     }
>   } else {
>     for (i = 1; i < lines_index; ++i) {
>       set_key(lines[i],i)
>     }
>   }
>   for (i in key) {
>     cnt[key[i]]++
>   }
>   for (i in cnt) {
>     if (cnt[i] > 50) {
>       greater = 1
>       break;
>     }
>   }
>   if (greater) {
>     for (i in cnt) {
>       out[++l] = sprintf("%10s %s\n", tolower(i),

#                          ^^^^^^^^^^^

#corrected:                "%10s %s"

> substr(dwarfs,1,(cnt[i]+1)/2))
>     }
>   } else {
>     for (i in cnt) {
>       out[++l] = sprintf("%10s %s\n", tolower(i), substr(dwarfs,1,cnt[i]))

#                           ^^^^^^^^^^^

#corrected:                 "%10s %s"

>     }
>   }
>   asort(out)

# Because the leading number in out[i] is padded with blanks and blanks
# occur in the ASCII alphabet before the numbers asort(out) should work
# like a numeric sort

>   for (l = 1; l in out; ++l) {
>     print out[l]
>   }
> }

Regards,

Steffen
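
The ring-buffer idea described in the comments above can be sketched
more compactly with a zero-based index and a running counter. This is a
minimal sketch, not the script as posted: the names last_matching, ring
and seen are hypothetical, and the buffer size is passed as an argument.
Counting every match and taking the index modulo n avoids a separate
"full" flag. It also sidesteps the wrap-around case in the quoted END
loop, which as written appears to stop before the most recent slot and
never reaches prev when lines_index has wrapped back to 1.

```shell
#!/bin/sh
# last_matching (hypothetical name): print the last N lines that contain
# "InpDrop:" but not "ICMP", oldest first, using a ring buffer in awk.
last_matching() {
    awk -v n="$1" '
    /InpDrop:/ && !/ICMP/ {
        ring[seen % n] = $0        # overwrite the oldest slot once full
        seen++                     # total number of matching lines so far
    }
    END {
        first = (seen > n) ? seen - n : 0   # oldest line still in the buffer
        for (i = first; i < seen; i++)
            print ring[i % n]
    }'
}

# Usage: last_matching 100 < /var/log/messages
```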

 
 
 

dwarf stars histogram again

Post by Ed Morto » Fri, 21 Oct 2005 10:45:48





> ...

>>gawk doesn't do numeric sorts, it does alphabetical sorts, hence the "a"
>>at the start of "asort()" and "asorti()". Your keys are non-numerical so
>>that's probably the problem you're having. See if "asort()" produces the
>>same result as "sort" without the "-n". You might need to keep a numeric
>>index of some kind and sort that....

> 1) I'm pretty sure the "a" in "asort" stands for "array".

I'd be disappointed if they really named the function based on the data
structure it operates on rather than the action it performs, but you may
be right. Of course, that would make it a little challenging to
introduce a new primitive for numeric sort on an array in future if they
so choose. If I'm right, the natural name would be "nsort()", if you're
right then it's not so obvious.

> 2) Generally, all you have to do to get the functionality of sort's "-n"
>     is to zero-fill your numeric strings.  That's all it is.

Right, if you have numeric indices, which the OP currently doesn't.

> 3) TAWK, of course, handles this all correctly.

Where can I download that from again ;-) ?

        Ed.

 
 
 

dwarf stars histogram again

Post by John W. Krah » Fri, 21 Oct 2005 10:58:17




>>histogram ? Some thing to ponder in perl.

> It is, except in this case I want to write a little toy with minimal
> CPU usage, so perl is out.  Unless I hit a brick wall, of course.

This seems to run faster than the awk script you posted.  HTH

perl -ne'
next unless /InpDrop:/;
next if /ICMP/;


END {

  printf "%10s %s\n", lc, $x{ $_ } for sort { $a <=> $b } keys %x;

}

' /var/log/messages

John
--
use Perl;
program
fulfillment

 
 
 

dwarf stars histogram again

Post by Grant » Fri, 21 Oct 2005 11:47:49





>>>histogram ? Some thing to ponder in perl.

>> It is, except in this case I want to write a little toy with minimal
>> CPU usage, so perl is out.  Unless I hit a brick wall, of course.

I'm such a trusting soul :o)  

~# time perl -ne'
next unless /InpDrop:/;
next if /ICMP/;


END {

  printf "%10s %s\n", lc, $x{ $_ } for sort { $a <=> $b } keys %x;

}

' /var/log/messages
   135/tcp ***
[...]
real    0m1.978s
user    0m1.880s
sys     0m0.080s

Yep, the awk modulo 100 one was about 8 seconds when log was < 2MB

~# time bin/getjunk
[...]
real    0m0.430s
user    0m0.220s
sys     0m0.200s

~# ls -lh /var/log/messages
-rw-r-----  1 root root 2.3M 2005-10-20 12:42 /var/log/messages

But, hybrid bash grep|awk|sort script is heaps faster :)

Thank you for the suggestion, differing approaches stir the imagination.

Grant.

 
 
 

dwarf stars histogram again

Post by Loki Harfag » Fri, 21 Oct 2005 17:32:52


On Thu, 20 Oct 2005 01:21:36 +0200, Steffen Schuler wrote:

>> BEGIN {
>>   dwarfs = sprintf("%7s", "");
>>   gsub(/ /, "*", dwarfs)

> # this is a more efficient solution to compute dwarfs with linear
> # complexity versus your solution. Yours has quadratic complexity

 Well, that was the point and the soul of the idea :D)

 Besides, I think you want to write it :

>>   dwarfs = sprintf("%100s", "");

 But then we'll have to rename again those poor dwarfing stars ;-)
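
 The point about the width can be sketched like this: the star string
 has to be at least as long as the longest bar that will be cut from it,
 so it is safer to build it from the display width than from a fixed 7.
 This is only a sketch under assumptions: bars and width are hypothetical
 names, and the input is assumed to be "key count" pairs, one per line.
 The integer scale factor reproduces the "halve everything when the
 maximum exceeds 50" rule from earlier in the thread.

```shell
#!/bin/sh
# bars (hypothetical name): turn "key count" lines into a star histogram
# whose longest bar fits in WIDTH columns (default 50).
bars() {
    awk -v width="${1:-50}" '
    { count[$1] = $2; if ($2 + 0 > max) max = $2 + 0 }
    END {
        # Build WIDTH stars in one step: blank-pad, then swap blanks for "*".
        stars = sprintf("%" width "s", "")
        gsub(/ /, "*", stars)
        # Smallest integer divisor that makes the largest count fit.
        scale = int((max + width - 1) / width)
        if (scale < 1) scale = 1
        for (k in count) {
            len = int((count[k] + scale - 1) / scale)   # round up, so 1 -> "*"
            printf "%10s %s\n", k, substr(stars, 1, len)
        }
    }' | sort -n
}

# Usage: printf "445/tcp 78\n135/tcp 3\n" | bars 50
```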
 
 
 
