peer review please

peer review please

Post by newbi » Thu, 08 Mar 2001 04:16:55



Hi,

Below is a simple script that I've just completed, and I would be grateful
if someone could point out any potential problems with the techniques used.
This is my first real script and I'd rather not develop bad habits!

The idea is to use it like a filter. I want it to take a csv file at stdin
and a specified file form the argument list of one, and direct the final
output to stdout for maybe further processing or directing straight to a
file. The output will have the place-holders replaced with the replacement
text sourced from csv file.

The csv file contains two fields (not quoted).

Field1 is a place-holder name (E.g. 10, 989, blah)
Field2 is the replacement text which will not contain the | char used in the
sed line (E.g. hello !10! !blah!)

A typical csv line might look like:

10,Smelly bloke

The specified file may contain many lines like:

Hello !10!
Yo !10!.

The output of the script using these will be:

Hello Smelly bloke
Yo Smelly bloke

The script works exactly as required (whoo hoo!), but that could be due to
good fortune and circumstances rather than doing it right.

Thanks for any comments/observations.

---
#!/bin/bash

# if the specified file doesn't exist, bomb out with a file missing message,
otherwise continue with main processing

if [ -f $1 ]; then

 cp $1 w     # set initial work file to contents of specified file
 while read foo <&0    # loop for each line of stdin
 do
     holder=!`echo $foo | cut -d , -f 1`!    # get place-holder name
     text=`echo $foo | cut -d , -f 2-`    # get replacement text
     sed "s|$holder|$text|g" w > y    # replace all entries
     cp y w    # update work file with new version
 done
 cat w     # send results to stdout
 rm w y    #remove work files
else
 echo "file $1 missing"
fi

 
 
 

peer review please

Post by Ian P. Springe » Fri, 09 Mar 2001 13:58:13


Here's how I'd write the script:

if [ ! -f "$1" ]; then
  echo "file $1 missing" >&2  # send error msg to stderr, not stdout
  exit 1  # exit w/ error status
fi
while read line; do    # no need for <&0 - stdin is default input
                               # also, use more meaningful var name than foo
  holder=!`echo $line | cut -d, -f1`!    # get place-holder name
  text=`echo $line | cut -d, -f2-`        # get replacement text
  sed "s|$holder|$text|g" $1      # replace all entries
                                                # no need for work files w or y,
as far as I can see
done
exit 0  # exit w/ success status


Quote:> Hi,

> Below is a simple script that I've just completed, and I would be grateful
> if someone could point out any potential problems with the techniques used.
> This is my first real script and I'd rather not develop bad habits!

> The idea is to use it like a filter. I want it to take a csv file at stdin
> and a specified file form the argument list of one, and direct the final
> output to stdout for maybe further processing or directing straight to a
> file. The output will have the place-holders replaced with the replacement
> text sourced from csv file.

> The csv file contains two fields (not quoted).

> Field1 is a place-holder name (E.g. 10, 989, blah)
> Field2 is the replacement text which will not contain the | char used in the
> sed line (E.g. hello !10! !blah!)

> A typical csv line might look like:

> 10,Smelly bloke

> The specified file may contain many lines like:

> Hello !10!
> Yo !10!.

> The output of the script using these will be:

> Hello Smelly bloke
> Yo Smelly bloke

> The script works exactly as required (whoo hoo!), but that could be due to
> good fortune and circumstances rather than doing it right.

> Thanks for any comments/observations.

> ---
> #!/bin/bash

> # if the specified file doesn't exist, bomb out with a file missing message,
> otherwise continue with main processing

> if [ -f $1 ]; then

>  cp $1 w     # set initial work file to contents of specified file
>  while read foo <&0    # loop for each line of stdin
>  do
>      holder=!`echo $foo | cut -d , -f 1`!    # get place-holder name
>      text=`echo $foo | cut -d , -f 2-`    # get replacement text
>      sed "s|$holder|$text|g" w > y    # replace all entries
>      cp y w    # update work file with new version
>  done
>  cat w     # send results to stdout
>  rm w y    #remove work files
> else
>  echo "file $1 missing"
> fi


 
 
 

peer review please

Post by newbi » Sat, 10 Mar 2001 05:06:21


Thanks Ian,

The exit status is something I hadn't looked into.

I did have a couple of problems though. Firstly I hadn't written it properly
as I wanted the code to act as a filter, i.e. originally my argument and
stdin were reversed. I've fixed that (modified script below).

I wasn't sure why you thought the workfile was superfluous though. I didn't
want the original file changed. Is there a way of making sed work over a
file in memory instead of mucking about with two workfiles? I directed your
script's stdout to a file and got a copy of the sed output for each line in
the csv file (24 meg's worth!). Am I missing the obvious here?

Thanks again.

--

if [ ! -f "$1" ]; then
  echo "file $1 missing" >&2  # send error msg to stderr, not stdout
  exit 1  # exit w/ error status
fi

cat > w         # stdin to work file
exec 10< $1
while read line <&10; do    # read line from csv file
  holder=!`echo $line | cut -d, -f1`!    # get place-holder name
  text=`echo $line | cut -d, -f2-`        # get replacement text
  sed "s|$holder|$text|g" w > y  # replace all entries
  cp y w
done

cat w           # final output to stdout
rm w y          # remove workfiles

exit 0  # exit w/ success status

 
 
 

peer review please

Post by Ian P. Springe » Sat, 10 Mar 2001 08:07:40


I guess I don't understand exactly what you're trying to do, if you don't
want to apply the substitution to every line in the csv file.  All I can
tell you is that sed doesn't overwrite the input file.  It writes its ouput
to stdout.

Also, instead of doing:
  exec 10< $1
  while read line <&10; do
you can just do:
  while read line <$1; do
or (more common):
  while read line; do
    <commands>
  done <$1

I still don't think you need the work files, though, as I said, I'm a bit
unclear as to what you're looking to do.

-Ian


Quote:> Thanks Ian,

> The exit status is something I hadn't looked into.

> I did have a couple of problems though. Firstly I hadn't written it
properly
> as I wanted the code to act as a filter, i.e. originally my argument and
> stdin were reversed. I've fixed that (modified script below).

> I wasn't sure why you thought the workfile was superfluous though. I
didn't
> want the original file changed. Is there a way of making sed work over a
> file in memory instead of mucking about with two workfiles? I directed
your
> script's stdout to a file and got a copy of the sed output for each line
in
> the csv file (24 meg's worth!). Am I missing the obvious here?

> Thanks again.

> --

> if [ ! -f "$1" ]; then
>   echo "file $1 missing" >&2  # send error msg to stderr, not stdout
>   exit 1  # exit w/ error status
> fi

> cat > w         # stdin to work file
> exec 10< $1
> while read line <&10; do    # read line from csv file
>   holder=!`echo $line | cut -d, -f1`!    # get place-holder name
>   text=`echo $line | cut -d, -f2-`        # get replacement text
>   sed "s|$holder|$text|g" w > y  # replace all entries
>   cp y w
> done

> cat w           # final output to stdout
> rm w y          # remove workfiles

> exit 0  # exit w/ success status

 
 
 

peer review please

Post by newbi » Sat, 10 Mar 2001 17:57:04


Cool. I like the simpler read process!

Basically, I have a templatefile with lots of place holders, and several csv
files containing the placeholder replacement text. I then want to "cat
templatefile | filterscript csvfile > newfile" for each csv file creating a
different "newfile". But I want filterscript to take its input from stdin
and send the processed results to stdout. Bit like a mailmerge working with
stdio, where the stdin is the letter and the csvfile the names & addresses
to embed into the contents of stdin.

Thanks again Ian. What litrarture would you recommend for a newbie. The bash
manual isn't too much help for me yet!



Quote:> I guess I don't understand exactly what you're trying to do, if you don't
> want to apply the substitution to every line in the csv file.  All I can
> tell you is that sed doesn't overwrite the input file.  It writes its
ouput
> to stdout.

> Also, instead of doing:
>   exec 10< $1
>   while read line <&10; do
> you can just do:
>   while read line <$1; do
> or (more common):
>   while read line; do
>     <commands>
>   done <$1

> I still don't think you need the work files, though, as I said, I'm a bit
> unclear as to what you're looking to do.

 
 
 

peer review please

Post by Ian P. Springe » Sun, 11 Mar 2001 03:29:05


I highly recommend "Portable Shell Programming" by Bruce Blinn (Prentice
Hall, 1996).

-Ian


> Cool. I like the simpler read process!

> Basically, I have a templatefile with lots of place holders, and several
csv
> files containing the placeholder replacement text. I then want to "cat
> templatefile | filterscript csvfile > newfile" for each csv file creating
a
> different "newfile". But I want filterscript to take its input from stdin
> and send the processed results to stdout. Bit like a mailmerge working
with
> stdio, where the stdin is the letter and the csvfile the names & addresses
> to embed into the contents of stdin.

> Thanks again Ian. What litrarture would you recommend for a newbie. The
bash
> manual isn't too much help for me yet!



> > I guess I don't understand exactly what you're trying to do, if you
don't
> > want to apply the substitution to every line in the csv file.  All I can
> > tell you is that sed doesn't overwrite the input file.  It writes its
> ouput
> > to stdout.

> > Also, instead of doing:
> >   exec 10< $1
> >   while read line <&10; do
> > you can just do:
> >   while read line <$1; do
> > or (more common):
> >   while read line; do
> >     <commands>
> >   done <$1

> > I still don't think you need the work files, though, as I said, I'm a
bit
> > unclear as to what you're looking to do.

 
 
 

peer review please

Post by Heiner Steve » Fri, 30 Mar 2001 06:09:41


 > cat > w         # stdin to work file
 > exec 10< $1
 > while read line <&10; do    # read line from csv file
 >   holder=!`echo $line | cut -d, -f1`!    # get place-holder name
 >   text=`echo $line | cut -d, -f2-`        # get replacement text
 >   sed "s|$holder|$text|g" w > y  # replace all entries
 >   cp y w
 > done
 >
 > cat w           # final output to stdout
 > rm w y          # remove workfiles
 >
 > exit 0  # exit w/ success status

The following script should run some order of magnitude faster.
It reads the input file only once (instead of once for each replacement),
and does not need any temporary files:

--
substlist=$1; shift
exec 3<&0 0<"$substlist"    # read from replacement list
sedscript=
OIFS=$IFS; IFS=,
while read holder text
do
    sedscript="${sedscript:+"$sedscript;"}s|$holder|$text|g"
done
exec 0<&3 3<&-            # restore standard input


--

The trick is to build one SED script in an environment variable.

Heiner
--
 ___ _

\__ \  _/ -_) V / -_) ' \    SHELLdorado for Shell Script Programmers:
|___/\__\___|\_/\___|_||_|   http://www.oase-shareware.org/shell/