while read : slow

Post by luc » Thu, 30 Jun 2005 06:02:06



Hello,

I have to replace strings in a file with a counter for each key =>

AA00053500
AA66636767
BB54674775
CC54635666
CC65576567
EE12345678

result =>

100053500
166636767
254674775
354635666
365576567
412345678

the keys are AA, BB, CC, EE and replaced by 1,2,3,4

I successfully made it with a loop while read ... done.
But it's VERY slow.
Is there another way which would be faster (in ksh only, without doing
it in C)?

Thanks.

 
 
 

while read : slow

Post by Chris F.A. Johnson » Thu, 30 Jun 2005 06:17:34



> Hello,

> I have to replace strings in a file with a counter for each key =>

> AA00053500
> AA66636767
> BB54674775
> CC54635666
> CC65576567
> EE12345678

> result =>

> 100053500
> 166636767
> 254674775
> 354635666
> 365576567
> 412345678

> the keys are AA, BB, CC, EE and replaced by 1,2,3,4

> I successfully made it with a loop while read ... done.
> But it's VERY slow.
> Is there another way which would be faster (in ksh only, without doing
> it in C)?

sed -e 's/AA/1/' -e 's/BB/2/' -e 's/CC/3/' -e 's/EE/4/'

--
    Chris F.A. Johnson                     <http://cfaj.freeshell.org>
    ==================================================================
    Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
    <http://www.torfree.net/~chris/books/cfaj/ssr.html>

 
 
 

while read : slow

Post by Bill Seivert » Thu, 30 Jun 2005 13:34:10




>>Hello,

>>I have to replace strings in a file with a counter for each key =>

>>AA00053500
>>AA66636767
>>BB54674775
>>CC54635666
>>CC65576567
>>EE12345678

>>result =>

>>100053500
>>166636767
>>254674775
>>354635666
>>365576567
>>412345678

>>the keys are AA, BB, CC, EE and replaced by 1,2,3,4

>>I successfully made it with a loop while read ... done.
>>But it's VERY slow.
>>Is there another way which would be faster (in ksh only, without doing
>>it in C)?

> sed -e 's/AA/1/' -e 's/BB/2/' -e 's/CC/3/' -e 's/EE/4/'

It may also be faster if you anchor your expressions,
otherwise it has to scan the whole line looking for, e.g., AA.

sed -e "s/^AA/1/" -e "s/^BB/2/" -e "s/^CC/3/" -e "s/^EE/4/"

For these, either single or double quotes will work.
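Anchoring can also matter for correctness, not only speed. A minimal sketch, assuming a hypothetical line (not from the posted sample) in which the letters of one key also occur later in the data:

```sh
# Hypothetical input line: the key is BB, but "AA" also appears further along.
line='BB54AA4775'

# Unanchored: s/AA/1/ matches inside the line as well.
unanchored=$(printf '%s\n' "$line" | sed -e 's/AA/1/' -e 's/BB/2/')

# Anchored: only a key at the very start of the line is replaced.
anchored=$(printf '%s\n' "$line" | sed -e 's/^AA/1/' -e 's/^BB/2/')

printf '%s\n%s\n' "$unanchored" "$anchored"
```

With the posted sample (digits only after the key) both forms give the same result; the difference only shows up if key letters can appear elsewhere in a line.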

Bill Seivert

 
 
 

while read : slow

Post by luc » Thu, 30 Jun 2005 16:00:17


> sed -e 's/AA/1/' -e 's/BB/2/' -e 's/CC/3/' -e 's/EE/4/'

Thanks for the answer, but I can't do that.
The file has thousands of lines... (it was just an extract)
 
 
 

while read : slow

Post by Chris F.A. Johnson » Thu, 30 Jun 2005 16:18:24




>> sed -e 's/AA/1/' -e 's/BB/2/' -e 's/CC/3/' -e 's/EE/4/'

> Thanks for the answer, but I can't do that.
> The file has thousands of lines... (it was just an extract)

    So build a sed script. If the keys are the first 2 characters of
    every line:

sed `cut -c1,2 FILE | awk '
        x[$1]++ == 0 {printf "-e s/^%s/%d/ ", $1, ++n}'` FILE
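If there are many distinct keys, the generated expressions may not fit on one command line. A sketch of the same two-pass idea that writes the substitutions to a script file and runs it with `sed -f`; the temporary directory and file names are made up for illustration, and the keys are assumed to be two uppercase letters at the start of each line:

```sh
# Sample data from the original post, written to a scratch directory.
tmp=${TMPDIR:-/tmp}/keys.$$
mkdir "$tmp" || exit 1
cat > "$tmp/data" <<'EOF'
AA00053500
AA66636767
BB54674775
CC54635666
CC65576567
EE12345678
EOF

# Pass 1: one anchored substitution per distinct key, numbered in order
# of first appearance.
cut -c1-2 "$tmp/data" |
    awk '!seen[$1]++ { printf "s/^%s/%d/\n", $1, ++n }' > "$tmp/script.sed"

# Pass 2: apply the generated script to the data.
result=$(sed -f "$tmp/script.sed" "$tmp/data")
printf '%s\n' "$result"

rm -r "$tmp"
```

sed still tries every substitution on every line, so this is likely to slow down as the number of keys grows.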

--
    Chris F.A. Johnson                     <http://cfaj.freeshell.org>
    ==================================================================
    Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
    <http://www.torfree.net/~chris/books/cfaj/ssr.html>

 
 
 

while read : slow

Post by John » Thu, 30 Jun 2005 16:29:47




> > sed -e 's/AA/1/' -e 's/BB/2/' -e 's/CC/3/' -e 's/EE/4/'

> Thanks for the answer, but I can't do that.
> The file has thousands of lines... (it was just an extract)

In that case it is not really clear what you want to do
(how many different keys are there and are they known in
advance?) but in general terms you will probably find perl
to be a lot faster than sed for this sort of task.

--
John.

 
 
 

while read : slow

Post by luc » Thu, 30 Jun 2005 20:04:03



> sed `cut -c1,2 FILE | awk '
>         x[$1]++ == 0 {printf "-e s/^%s/%d/ ", $1, ++n}'` FILE

In fact, you gave me the idea to embed all my processing in awk.
I made a quick test by just reading & writing to a new file : it seems
to be about 100 times faster (!!!) than while..read.
I have extracted my keys with substr in a variable for each line.
I just need now to find a way to find and replace this variable with my
counter in each line (as I don't seem to be able to use sed in my awk
statement)
I'm going to search in google...

Thanks again.

 
 
 

while read : slow

Post by Ed Morto » Thu, 30 Jun 2005 22:49:52




>> sed `cut -c1,2 FILE | awk '
>>         x[$1]++ == 0 {printf "-e s/^%s/%d/ ", $1, ++n}'` FILE

> In fact, you gave me the idea to embed all my processing in awk.
> I made a quick test by just reading & writing to a new file : it seems
> to be about 100 times faster (!!!) than while..read.
> I have extracted my keys with substr in a variable for each line.
> I just need now to find a way to find and replace this variable with my
> counter in each line (as I don't seem to be able to use sed in my awk
> statement)

You can call sed from awk using system() but that's unlikely to be the
best solution. I'd do something like this:

awk 'BEGIN{lastNum = 1}
{    key = val = $0; sub(/^[A-Z]*/,"",val); sub(val,"",key)
      if (! (key in keys)) keys[key] = lastNum++
      print keys[key] val

}' file
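If the key is always exactly the first two characters, as mentioned earlier in the thread, a variant using substr() avoids treating any part of the line as a regular expression. This is only a sketch under that fixed-width assumption; the function name `renumber` is made up for the example:

```sh
# renumber: replace a 2-character key at the start of each line with a
# counter assigned in order of first appearance.
renumber() {
    awk '{
        key = substr($0, 1, 2)              # assumed fixed-width key
        if (!(key in num)) num[key] = ++n   # first time seen: next number
        print num[key] substr($0, 3)        # counter followed by the rest
    }' "$@"
}
```

It could be used as `renumber file > file.new`, or at the end of a pipeline since awk reads standard input when no file is given.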

Regards,

        Ed.

 
 
 

while read : slow

Post by Chris F.A. Johnson » Fri, 01 Jul 2005 09:37:16




>> You can call sed from awk using system() but that's unlikely to be the
>> best solution. I'd do something like this:

>> awk 'BEGIN{lastNum = 1}
>> {    key = val = $0; sub(/^[A-Z]*/,"",val); sub(val,"",key)
>>      if (! (key in keys)) keys[key] = lastNum++
>>      print keys[key] val
>> }' file

> Yes, thanks, I have used sub() which works perfectly (that I found in
> "The Awk Manual" on Google)
> I just don't understand why the while loop is even more than 100 times
> slower than awk (in both case, my algorithm was exactly the same)

   Because a shell script is interpreted, and the loop has to be
   explicitly coded (and interpreted on every loop); in awk, the loop
   code is implicit, i.e., compiled into awk itself.

--
    Chris F.A. Johnson                     <http://cfaj.freeshell.org>
    ==================================================================
    Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
    <http://www.torfree.net/~chris/books/cfaj/ssr.html>

 
 
 

while read : slow

Post by luc » Fri, 01 Jul 2005 03:12:48



> You can call sed from awk using system() but that's unlikely to be the
> best solution. I'd do something like this:

> awk 'BEGIN{lastNum = 1}
> {    key = val = $0; sub(/^[A-Z]*/,"",val); sub(val,"",key)
>      if (! (key in keys)) keys[key] = lastNum++
>      print keys[key] val
> }' file

Yes, thanks, I have used sub() which works perfectly (that I found in
"The Awk Manual" on Google)
I just don't understand why the while loop is even more than 100 times
slower than awk (in both case, my algorithm was exactly the same)
 
 
 

while read : slow

Post by Ed Morto » Fri, 01 Jul 2005 03:45:13




>> You can call sed from awk using system() but that's unlikely to be the
>> best solution. I'd do something like this:

>> awk 'BEGIN{lastNum = 1}
>> {    key = val = $0; sub(/^[A-Z]*/,"",val); sub(val,"",key)
>>      if (! (key in keys)) keys[key] = lastNum++
>>      print keys[key] val
>> }' file

> Yes, thanks, I have used sub() which works perfectly (that I found in
> "The Awk Manual" on Google)
> I just don't understand why the while loop is even more than 100 times
> slower than awk (in both case, my algorithm was exactly the same)

It's hard to say since you never posted your original loop, but the
shell equivalent to the above would be something like (untested):

lastNum=0
while read line
do
        val="${line##*[A-Z]}"
        key="${line%$val}"
        eval idx="\${keys_$key}"
        if [ -z "$idx" ]; then
                lastNum=$(( lastNum + 1 ))
                eval "keys_$key=\$lastNum"
                idx="$lastNum"
        fi
        echo "${idx}${val}"
done < file

so if that isn't what your loop does, then try that and see how close
they are.

        Ed.

 
 
 

1. Press Ctrl-D while inputting

I wrote a program:

#include <stdio.h>
main()
{
   char str[81];
   while(1)
   {
      gets(str);
      printf("You input: %s\n",str);
   }
}

My question is why it loops infinitely after I press Ctrl-D.
Is there any solution to this?  Thanks.

--
     +------------------------------------------------------------+
    / \   Trying to maintain a good friendship with all people   / \
   /   + - - - - - - - - - - - - - - - - - - - - - - - - - - - -/ - +


+------------------------------------------------------------+   /
 \ /Home Page: http://susis.ust.hk/~enoch                     \ /
  +------------------------------------------------------------+

2. Newbie questions from a user coming from linux

3. Tin newsreader, slow slow slow

4. Only an excellent programmer would solve this...

5. SMB slow, Samba slow, or something else slow?

6. DNS Configuration

7. Help - slow,slow, slip,slip, slow

8. Problem compiling su

9. Matrox g450 + tuxracer = slow slow slow!!! [???]

10. Newby (After 4.3.2 install, CDE LOGIN IS SLOW SLOW SLOW)

11. rcmd fast, slow, slow, slow, fast, slow, slow. Again!!

12. Very slow nfs read performance; linux srvr

13. QIC-02 (PC36) Controller and Wangtek Tape Drive Reads slow