getopts / awk script

Post by John » Sun, 24 Nov 2002 11:11:34




> I want to include a -h option to save having to read the script if I
> forget what it does. So I put this at the top.

> But I also want to pass options to the ls -l part before the awk
> script, but if I do countnames -R ~ ie recursive count of all files in
> home directory it says

> illegal option --R

> but still processes them.

> Commenting out the getopts section solves this. So I presume this is
> where it probably lies.

> -----------------------------------------------------------------
> while getopts "h" opt; do
> case $opt in
> h ) echo "useage contnames [ls options ] [ directory name] " ;exit 1 ;;
> * ) echo "processing" ;;
> esac
> done

You have told getopts that you only want "h" -- this is why
getopts complains about "R".

You then pass the unprocessed arguments "$*" to the rest of
your script, specifically "ls -l", so you get "ls -l -R"

The blank lines come from ls -l -R, which uses blank lines
to separate different directories. And the directory names
and "total" lines do not have fields 5 and 9, so they also
will appear blank to your awk program.

You probably do not need to escape the "." in split().

You say the output is in the wrong order. Welcome to
the wacky world of buffering. You need to add
the statement: close("sort -gr +1") -- note that
the string must be *exactly* the same as the one
for the sort command -- that is the output stream you
are closing.

John.

 
 
 

getopts / awk script

Post by Peter S Tillie » Wed, 27 Nov 2002 06:45:38



[...]
> -----------------------------------------------------------------
> while getopts "h" opt; do
> case $opt in
> h ) echo "useage contnames [ls options ] [ directory name] " ;exit 1 ;;
> * ) echo "processing" ;;
> esac
> done

> ls -l $* | awk '
> {name=$9; size=$5
> n=split(name, parts_of_name, "\.")
> if ( n == 1 )
> {++type[none]
> size_notype += size}
> else
> {id=parts_of_name[n]
> ++count_of_type[id]
> size_of_type[id] += size}
> next}

> END { print " "
> printf "%-15s Size %10d  Number %5d\n\n", "None",size_notype,type[none]
> for (name in count_of_type)
> {Total_Number += count_of_type[name]
> Total_Size += size_of_type[name]
> printf "%-15s  %14d   %11d\n", name, size_of_type[name], count_of_type[name] | "sort -gr +1"}

> {print ""
> print "Note Total Number does not include those with no extension"
> print ""
> printf "%-12s %10d\n" , "Total Number  ",Total_Number
> printf "%-14s %10d\n" , "Total Size",Total_Size }}' | awk ' {print $0}'

> ---------------------------------------------------------------------

This may do what you want:

#! /bin/bash
while getopts "hR" opt
do
  case $opt in
    R ) echo "Recursive listing requested" 1>&2
        recur="-R"
        ;;
    h ) echo "usage: countnames [ls options] [directory name]" 1>&2
        exit 1 ;;
    * ) echo "processing" ;;
  esac
done
shift $((OPTIND - 1))   # drop the options getopts has consumed

ls -l $recur $* | awk '
/^total/ || /^$/ || /^\..*:$/ {next} # skip totals, blank lines & dirnames
  {
  name=$9; size=$5
  n=split(name, parts_of_name, "\\.")
  if ( n == 1 ) {
    ++type["none"]
    size_notype += size
  } else {
    id=parts_of_name[n]
    ++count_of_type[id]
    size_of_type[id] += size
  }
  next
}

END {
  print " "
  printf "%-15s Size %10d  Number %5d\n\n", "No type",size_notype,type["none"]
  for (name in count_of_type) {
    Total_Number += count_of_type[name]
    Total_Size += size_of_type[name]
    printf "%-15s  %14d   %11d\n", name, size_of_type[name], count_of_type[name] | "sort -gr +1"
  }
  close("sort -gr +1")

  print ""
  print "Note Total Number does not include those with no extension"
  print ""
  printf "%-12s %16d\n" , "Total Number  ",Total_Number
  printf "%-14s %16d\n\n" , "Total Size",Total_Size
}'

As an aside you could do all of this inside your awk program, including
the checking of options.

HTH
--
Peter S Tillier
"Who needs perl when you can write dc and sokoban in sed?"

 
 
 

getopts / awk script

Post by Peter S Tillie » Sat, 30 Nov 2002 08:12:01



> >>>>> On Mon, 25 Nov 2002 21:45:38 -0000

> >>>>> which is being read from comp.unix.shell

> [...]

> > This may do what you want:

> > #! /bin/bash
> > while getopts "hR" opt
> > do
> >   case $opt in
> >     R ) echo "Recursive listing requested" 1>&2
> >         recur="-R"
> >         shift
> >         ;;
> >     h ) echo "useage countnames [ls options ] [ directory name] " 1>&2
> > ;exit 1 ;;
> >     * ) echo "processing" ;;
> >   esac
> > done

> This seems to follow the approach of specifically dealing with each
> option, whilst what I wanted to do was deal with the help option and
> pass everything else on to ls. Though I suppose the problem could be
> that passing something to ls means the output is not in the correct
> form for awk, and specifically dealing with all possibilities is maybe
> better. I found that passing it as $* still worked; the problem was
> not using getopts ":h", i.e. error suppression.

Fine, I misunderstood what you were asking I think.  As you say getopts
":h" seems to work OK.

> 1>&2

> OK, according to Learning the bash Shell,
> n>&m makes file descriptor n a duplicate of standard out. I never
> thought this made sense; shouldn't it be a duplicate of m?

> I haven't found man bash or info bash exceptionally clear to follow on
> this, but I interpret this to mean make standard out a duplicate of
> standard error. But aren't stdout and stderr the terminal if not
> specified? So I interpret this as meaning send stdout and stderr to
> the same place they were going before. But this really doesn't make
> sense, so I am most likely wrong. So what does it do?

OK, this puts the stdout of the echo commands for the help onto the
stderr fd, so that the help/recursive list message always appears on the
console (unless directed elsewhere) even if the stdout for the rest of
the script is redirected to a file.
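
A minimal sketch of that effect, for anyone who wants to try it. The show_usage function is a hypothetical stand-in for the help echo in the script; the point is only that 1>&2 makes its text follow stderr, so capturing or redirecting stdout does not swallow it.

```shell
# show_usage is a hypothetical stand-in for the script's help message.
show_usage() {
  echo "usage: countnames [ls options] [directory name]" 1>&2
}

# Capture stdout only; stderr is discarded so the result is easy to inspect.
stdout_only=$( { show_usage; echo "listing output"; } 2>/dev/null )

# Capture stderr only: 2>&1 points stderr at the capture pipe first,
# then >/dev/null sends stdout away.
stderr_only=$( { show_usage; echo "listing output"; } 2>&1 >/dev/null )

echo "stdout carried: $stdout_only"
echo "stderr carried: $stderr_only"
```

The usage text shows up only in the stderr capture, which is what keeps it on the console when the script's stdout goes to a file.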

> > ls -l $recur $* | awk '
> > /^total/ || /^$/ || /^\..*:$/ {next} # skip totals, blank lines &
> > dirnames
> What happens if I don't handle these? My thoughts were none of these
> have a 5th or 9th field, so won't awk ignore them? Though it probably
> saves time not processing them needlessly, what happens when awk tries
> to process a field that isn't set? Name and size were set for
> the last record. $9 and $5 are nonexistent, so does name get set to
> void, or is it kept at its last setting, thus giving false positives?

The result is an entry in the list of filenames for filenames which are
null and have size zero IIRC.  I did test this before posting and it
seemed to make sense to remove any possible confusion.

> >   {
> >   name=$9; size=$5
> >   n=split(name, parts_of_name, "\\.")
> Why do you use \\? I thought \. was an escaped . and \\. was an escaped
> \ followed by a . which would be a regular expression dot. I am not
> sure of the correct format for delimiting an RE in all places. I know
> it's / / in the pattern, but is "." an RE dot or a string dot?

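The last argument of split is an RE string in this case, so it gets
parsed twice by awk.  First as a string where the \\. becomes \., the
second by the RE engine which sees a literal dot ".".  In a longer RE
string an unescaped dot is a metacharacter which matches any character.
A separator that is exactly one character, such as ".", is normally
taken literally, which is why John suggested the escape may not be
needed; escaping it is simply the safe choice.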
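
The two parsing stages can be seen in a small sketch. The sample name archive.tar.gz is made up for illustration; the first command prints the string as awk stores it after string parsing, the second uses the same string as the split separator.

```shell
# The string "\\." is first reduced by awk's string parser to the two
# characters \. and only then compiled as a regular expression.
as_string=$(awk 'BEGIN { s = "\\."; print s }')

# Used as a separator, that RE matches a literal dot only.
pieces=$(awk 'BEGIN { n = split("archive.tar.gz", parts, "\\."); print n, parts[n] }')

echo "separator string as awk stores it: $as_string"
echo "number of pieces and last piece:   $pieces"
```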

> >   if ( n == 1 ) {
> >     ++type[none]
> >     size_notype += size
> >   } else {
> >     id=parts_of_name[n]
> >     ++count_of_type[id]
> >     size_of_type[id] += size
> >   }
> >   next
> > }
> > END {
> >   print " "
> >   printf "%-15s Size %10d  Number %5d\n\n", "No type",size_notype,type[none]
> >   for (name in count_of_type) {
> >     Total_Number += count_of_type[name]
> >     Total_Size += size_of_type[name]
> >     printf "%-15s  %14d   %11d\n", name, size_of_type[name], count_of_type[name] | "sort -gr +1"
> >   }
> >   close("sort -gr +1")
> Is this what gets it printed in the right order? I'm still reading the
> section on string functions in the O'Reilly sed & awk book and haven't
> reached close yet.

As John L said in his post the sort is done externally and
asynchronously and can complete well after the rest of the awk program.
In effect the close waits for the sort to complete before the remainder
of the awk program runs.
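
A small sketch of the ordering that close() gives. The three input lines are invented, and the sort key is written as -k2,2nr rather than the thread's "sort -gr +1" because the older +1 key syntax is not accepted by every current sort; that substitution is an assumption of this sketch, not something from the posts.

```shell
report=$(printf 'txt 10\nc 300\nsh 25\n' | awk '
{ size[$1] = $2 }
END {
  cmd = "sort -k2,2nr"        # keep the command in a variable so that
                              # the pipe and close() use the identical string
  for (ext in size)
    printf "%s %d\n", ext, size[ext] | cmd
  close(cmd)                  # wait here until sort has written its output
  print "end of report"
}')
printf '%s\n' "$report"
```

Holding the command in a variable avoids the "must be exactly the same string" trap mentioned earlier in the thread.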

> >   print ""
> >   print "Note Total Number does not include those with no extension"
> >   print ""
> >   printf "%-12s %16d\n" , "Total Number  ",Total_Number
> >   printf "%-14s %16d\n\n" , "Total Size",Total_Size
> > }'

> I got the numbers by trial and error seeing what looked good, but at
> some point, I may try seeing if I can dynamically recreate them based
> on the size of the largest string. I presume this is possible.

Yes it can be done dynamically by checking the length of the longest
filename using length(filename), or similar, and working from there.
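
One way this might look, as a rough sketch with invented sample data: remember the longest name while reading, then build the printf format string from that width in the END block. The array names and the 6-digit size column are assumptions for illustration.

```shell
table=$(printf 'c 300\nmarkdown 12\nsh 25\n' | awk '
{
  name[NR] = $1; size[NR] = $2
  if (length($1) > width) width = length($1)   # track the longest name
}
END {
  fmt = "%-" width "s  %6d\n"                  # becomes "%-8s  %6d\n" here
  for (i = 1; i <= NR; i++)
    printf fmt, name[i], size[i]
}')
printf '%s\n' "$table"
```

Building the format as a string keeps the sketch within what plain awk offers, rather than relying on a "*" width in printf.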

> > As an aside you could do all of this inside your awk program, including
> > the checking of options.

> OK how do I get awk to give me a directory listing? I have not read
> to the end of the book, but glancing forward through it there is a
> section on reading input from a pipe and piping the output to getline.
> So do you mean something like ls | getline, or something different?
> What are the advantages of each way?

Yes, something along the lines of:

    "ls -l "lsoption" "dirname | getline

in a loop will allow you to read each ls output line into $0 within awk.
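
A sketch of such a loop, under some assumptions: it builds a throwaway directory with mktemp so there is something to list, and the file names are invented. The command string is kept in a variable so the same string can be passed to close() afterwards.

```shell
# Build a small scratch directory so the sketch has something to list.
scratch=$(mktemp -d)
touch "$scratch/a.txt" "$scratch/b.txt" "$scratch/notes"

file_lines=$(awk -v dir="$scratch" 'BEGIN {
  cmd = "ls -l \"" dir "\""            # the command whose output we read
  while ((cmd | getline) > 0) {        # each call loads one line into $0
    if ($0 ~ /^total/) continue        # skip the summary line
    n++
  }
  close(cmd)
  print n + 0
}')
rm -rf "$scratch"
echo "lines describing files: $file_lines"
```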

If you place the awk program (the quoted portion of the shell script
above) in a file of its own, you can then invoke your program as follows:

    awk -f getopts.awk -f yourlsformatter.awk ...

for more information on this see the GNU awk (gawk) manual which also
provides a lot of useful examples.

> Many thanks.

HTH
--
Peter S Tillier
"Who needs perl when you can write dc and sokoban in sed?"
 
 
 

getopts / awk script

Post by Peter S Tillie » Mon, 02 Dec 2002 17:26:22


"Poster 2000" <p_oster_2...@yahoo.com.invalid> wrote in message

news:m33cpkv7xb.fsf.p_oster_2000.bigfoot.com@post2k.freeuk.com...

> >>>>> On Thu, 28 Nov 2002 23:12:01 -0000
> >>>>> In Message <as67s2$2kh$1$830fa...@news.demon.co.uk>
> >>>>> which is being read from comp.Unix.shell
> >>>>> "Peter S Tillier" <pet...@deadspam.com> Said
> [...]
> > Fine, I misunderstood what you were asking I think.  As you say getopts
> > ":h" seems to work OK

> The -h was handled by getopts; the -R was not. I wanted to be able to
> pass any parameter to ls but only have getopts handle the -h. The -R
> was still passed to ls and it still did a full recursive listing, but
> gave an error message at the start. I.e. I think ls saw ls -R and did
> its stuff, but getopts saw -R, set opt to ? and gave the error
> message. My mistake for not realising this.

Yes.  getopts will look at all the "option like" arguments passed to the
script, if you pass -R and -a then you'll get an error message for each
of them unless you use the colon ":" as the first character in the
option string.  Under bash you can also set OPTERR=0, which has the same
effect, not sure if this works the same for other shells.
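
A minimal sketch of the silent form, assuming all that is wanted is "did the caller ask for -h?" while every other option is left for ls. The wants_help function is a hypothetical helper written for this illustration, not part of the original script.

```shell
# wants_help is a hypothetical helper: it succeeds only if -h is among
# the leading options and stays silent about every other option.
wants_help() {
  OPTIND=1                        # start parsing afresh on each call
  while getopts ":h" opt; do      # leading colon = no error messages
    case $opt in
      h) return 0 ;;
      *) ;;                       # unknown option: ignore, ls will get it
    esac
  done
  return 1
}

if wants_help -R "$HOME"; then
  echo "usage: countnames [ls options] [directory name]"
else
  echo "no -h seen; all arguments would go to ls unchanged"
fi
```

In a real script the call would be wants_help "$@", after which the untouched arguments can be handed to ls.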

> [...]

> OK 1 is stdout 2 is stderr 1>&2 is make stdout a duplicate of stderr
> which goes to console.

> OK if I follow what you are saying, the standard out of the echo
> command is console, but if the standard out of the entire script is
> redirected this will override the stdout of the command. Had not
> thought of this.

For those echo commands, fd 1 (stdout) is made a duplicate of fd 2
(stderr).  So, for the echo commands we're considering, anything written
to stdout goes wherever stderr currently points.

[...]

> If the original destination of both stderr and stdout is the console
> and stdout is later changed for the whole script, then surely this
> overrides what it was set to before?

> I.e. I can't see how saying make stdout go to the same place as stderr
> has any effect. If it's later redirected, which it appears to be, surely
> this overrides wherever it was sent to in the first place.

> Still finding this a bit confusing.

If you then redirect the stdout of the script as a whole (for example to
a file), fd 1 points to wherever you want the output to go, but the help
echo commands are unaffected because their output has been sent to
wherever fd 2 points.

> >> > ls -l $recur $* | awk '
> >> > /^total/ || /^$/ || /^\..*:$/ {next} # skip totals, blank lines &
> >> > dirnames

These are the lines from the ls -lR listing that I was thinking of:

-rw-r--r--    1 petert   unknown      6055 Jul 28 05:22 test.c
                                            # blank
./sedtests:                                 # directory name
total 27                                    # total
-rw-r--r--    1 petert   unknown         3 Jan  9  2002 0

The blank line, the directory name and the total line serve no useful
purpose in getting the output that you want and can upset the counts -
for the above few lines the count is needlessly incremented by three and
the total size is unaffected because $5 & $9 are null.  It's easier just
to let awk skip them and then not have to worry about them again.
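
The filter can be tried on those five lines directly. The sketch below counts how many lines survive the pattern from the script; it also shows a field-count test (NF >= 9) as an alternative, which is an assumption of this sketch rather than something proposed in the thread, and might be preferable when directory headers do not begin with a dot.

```shell
listing='-rw-r--r--    1 petert   unknown      6055 Jul 28 05:22 test.c

./sedtests:
total 27
-rw-r--r--    1 petert   unknown         3 Jan  9  2002 0'

all=$(printf '%s\n' "$listing" | awk 'END { print NR }')
kept=$(printf '%s\n' "$listing" |
  awk '/^total/ || /^$/ || /^\..*:$/ { next } { n++ } END { print n + 0 }')
# Alternative (an assumption, not from the thread): keep only lines that
# have all nine fields of a long listing.
kept_by_nf=$(printf '%s\n' "$listing" | awk 'NF >= 9 { n++ } END { print n + 0 }')

echo "lines read: $all, kept by pattern: $kept, kept by NF test: $kept_by_nf"
```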

[...]

> > The result is an entry in the list of filenames for filenames which are
> > null and have size zero IIRC.  I did test this before posting and it
> > seemed to make sense to remove any possible confusion.

> OK I build up two arrays, size_of_type and count_of_type.

> When $9 and $5 are blank, name is blank, or is it void? It was set
> previously. So id will become blank, or is it void? However,
> size_of_type[id] where id is blank or void will not have a value as
> size is not present; count_of_type[id] will be incremented, since it's
> a counter.

> So I would expect the blank entry to have a count but no size, but it
> still has a size, a very large one and not sorted.

But there are other file names that have no ".extension" these can, and
should, be included in your counts.  Examples on my system are:

...
-rw-r--r--    1 petert   unknown        27 Feb 19  2002 aaa
-rw-r--r--    1 petert   unknown       153 Nov 28 07:03 block.new
-rw-r--r--    1 petert   unknown       156 Nov 28 07:02 block.out
-rw-r--r--    1 petert   unknown       153 Nov 28 07:00 block.txt
-rw-r--r--    1 petert   unknown        51 Feb 23  2002 file1
-rw-r--r--    1 petert   unknown        30 Feb 23  2002 file2
-rw-r--r--    1 petert   unknown        34 Nov 14 07:03 input
-rw-r--r--    1 petert   unknown        76 Nov 13 21:50 join2.dat
...

In the short section of the ls -lR listing above there are 4 files
with no extension and a total size of 142 bytes.  These will add to the
count for null extensions for obvious reasons.
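
Those figures can be checked with the same split() test the script uses. This is only a sketch over the eight sample lines above; the variable names are chosen for the illustration.

```shell
listing='-rw-r--r--    1 petert   unknown        27 Feb 19  2002 aaa
-rw-r--r--    1 petert   unknown       153 Nov 28 07:03 block.new
-rw-r--r--    1 petert   unknown       156 Nov 28 07:02 block.out
-rw-r--r--    1 petert   unknown       153 Nov 28 07:00 block.txt
-rw-r--r--    1 petert   unknown        51 Feb 23  2002 file1
-rw-r--r--    1 petert   unknown        30 Feb 23  2002 file2
-rw-r--r--    1 petert   unknown        34 Nov 14 07:03 input
-rw-r--r--    1 petert   unknown        76 Nov 13 21:50 join2.dat'

summary=$(printf '%s\n' "$listing" | awk '
{
  n = split($9, parts, "\\.")
  if (n == 1) { count++; bytes += $5 }     # no dot in the name
}
END { print count + 0, bytes + 0 }')
echo "files without extension and their total size: $summary"
```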

Files with names without an extension (like file1) will appear as blank
in the list of extensions. It seems that you have 19 of these in your
system.  The remainder of the count will be blank lines, totals and . or
.. entries in the ls -lR listing - these, of course have no size field,
so don't increase or decrease the size total when the line I suggest is
commented out, but the count will be affected.

> I don't see where I get the size from; in either case there is no size
> to put in the array. Nor do I see what the 19 counts your suggestion
> seems to be missing are, nor why they both sort to different places,
> neither of them in order.

Here's what I get from my ~ directory:

$ countnames.sh -R >file1
Recursive listing requested

$ # comment out line re blanks, etc.

$ countnames.sh -R >file2
Recursive listing requested

$ diff file1 file2
2c2
< No type         Size    1062986  Number    72
---
> No type         Size    1062940  Number    72
24c24
< sh                         5068            10
---
> sh                         5070            10
58a59
>                               0            59
63,64c64,65
< Total Number                354
< Total Size              4031985
---
> Total Number                413
> Total Size              4031987

[exited with 1]

$

In my case there are no blank entries in file1; in file2 there are 59.
Quite why they don't appear to sort correctly in your output I'm not
sure.

[...]

> > The last argument of split is an RE string in this case, so it gets
> > parsed twice by awk.  First as a string where the \\. becomes \., the
> > second by the RE engine which sees a literal dot "."  Without the escape
> > the RE engine sees just a metacharacter dot which matches any character.

> OK can an RE be passed directly or is it always treated as a string in
> this case? I.e. in a pattern I do / RE / and don't treat it as a
> string; can I do the same in, e.g., substr? And elsewhere?

Modern awks (including gawk) and SUS/POSIX allow for the third argument
to split to be an RE constant, you can use /\./ if you wish instead of
"\\." - assuming that your awk supports this.  substr() doesn't use an
RE, but you can use one in sub() and gsub() if that is what you mean.
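
A short sketch of the two spellings side by side, plus a regex constant in sub(), assuming an awk that accepts a regex constant as the third argument of split. The sample strings are invented for illustration.

```shell
# Same split, once with an RE string and once with an RE constant.
with_string=$(awk 'BEGIN { print split("a.b.c", p, "\\.") }')
with_regex=$(awk 'BEGIN { print split("a.b.c", p, /\./) }')

# sub() takes an RE too: strip the last ".extension" from a name.
stripped=$(awk 'BEGIN { s = "block.new"; sub(/\.[^.]*$/, "", s); print s }')

echo "pieces with string: $with_string, with constant: $with_regex"
echo "name without extension: $stripped"
```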

> [...]
> Yes I tried doing things like ls -R | sort from / and running top. They
> seem to alternate which is running. I thought maybe sort would need
> all the data before it can begin but obviously it doesn't. I got it to
> work by putting the extra print after the script. But if, in the case
> of command1 | command2, command1 does not finish before command2, as it
> seems, then even putting the print in a second command could still
> cause it to print before sort finishes. Maybe I'll try it on a large
> test.

> su root
> cd /
> countnames -R

> Still printed at the end. So close would force this behaviour, whilst
> here I'm not sure what I'm doing is reliable. It would also avoid
> running two awk processes.

Safest is to close() the command as I did in my version.  This avoids
the need for any additional processes and/or other ways of waiting for
sort to finish.

> [...]

> >> OK how do I get awk to give me a directory listing? I have not read
> >> to the end of the book, but glancing forward through it is a section on
> >> reading input from a pipe and piping the output to getline. so do you
> >> mean something like ls | getline, or something different? What
> >> advantages each way.

> > Yes, something along the lines of:

> >     "ls -l "lsoption" "dirname | getline

> > in a loop will allow you to read each ls output line into $0 within awk.

> > If you place the awk program (quoted portion of the shell script that is
> > above) you can then invoke your program as follows:

> >     awk -f getopts.awk -f yourlsformatter.awk ...

> I'm not sure of the advantage, but still intend to look into this.

> > for more information on this see the GNU awk (gawk) manual which also
> > provides a lot of useful examples.

> Probably worthwhile to bring all the stuff I read in the book together,
> at least I can now half follow it.

"sed and awk, 2nd. ed." by Dougherty and Robbins is a good initial
introduction to awk, but Arnold Robbins' "Effective awk Programming, A
GNU Manual, 3rd Ed.", O'Reilly, is an excellent second book and covers
not just gawk, but includes general awk language issues too (Arnold
Robbins is the gawk maintainer).  The latter is also freely available as
it is the documentation for FSF's GNU awk.  I like to have a paper copy,
but you can download an electronic version from:

<url: http://www.fsf.org/manual/gawk-3.1.1/gawk.html>
# various choices here: html, html .zip, .dvi, .ps, etc..

or

<url: http://sourceforge.net/project/showfiles.php?group_id=23617&release_id=49017>
# scroll down to gawk and download the 5Mb doc.zip - also includes a .pdf
# version and a windows .hlp file if you need it.

Hope this helps.  If you need more help by all means post again or email
me direct.

Peter
--
Peter S Tillier
"Who needs perl when you can write dc and sokoban in sed?"

 
 
 

1. passing args to getopts when getopts is embedded in functions

hello,

I hope that the subject line is descriptive enough.  What I'm trying
to do is basically this, parse command line arguments passed to the
script by getopts while getopts is nested two functions deep.  The
script is called with something similar to:

script -n arg_here -f "arg here too"

Parsing the options is easy if the call to getopts is NOT embedded in
functions.  Also, if I don't use multi-word arguments getopts has no
trouble regardless of where it sets.  Below is a stripped down version
of the script:

#! /usr/bin/ksh

main () {
  ParseArgs $*
}

ParseArgs () {
  while getopts d:t: FOO
  do case $FOO in
    d) : ;;   # do stuff
    t) : ;;   # do more stuff
  esac
  done
}

main $*

I'm sure all of you can see the problem already.  I've tried various
things like calling ParseArgs, and main, like so, main "$*" ParseArgs

Using the -x option to korn I was able to determine that my problem is
how the arguments are being passed, or at least seen, by the
functions.  Before the args are processed by the functions, they are
seen properly in the global area, if you will, of the script.
However, after being passed to the functions, the integrity of the
arguments is lost and, depending on how I pass them, they either
become one single argument, or every word becomes an argument.

(This might be second nature to all of you, but it seems odd to me
that these functions should be passed arguments like this when they're
not defined as taking arguments.)

Any help would be appreciated.  I'm sure the answer is simple, but
it's escaping me.  How do I prevent this from happening?

Andy
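
One likely cause is the unquoted $* (and the quoted "$*"), which respectively split every word apart or glue everything into one argument. A sketch of the usual alternative, "$@", follows; the function names and the n:/f: option string are assumptions chosen to match the sample invocation, not the poster's real script.

```shell
# parse_args and main are hypothetical names for this illustration.
parse_args() {
  OPTIND=1
  while getopts "n:f:" opt; do
    case $opt in
      n) name=$OPTARG ;;
      f) file=$OPTARG ;;
    esac
  done
}

main() {
  parse_args "$@"      # "$@" keeps each original argument intact
}

main -n arg_here -f "arg here too"
echo "name=$name file=$file"
```

With "$@" at both call sites, the multi-word argument reaches getopts as a single word.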

2. Canon BJC-210 -- no color!

3. getopt Vs getopts

4. Could anyone send a link which have linux kernel source code without tarball or zipped?

5. getopts or getopt

6. Help for newbie, way to concatenate a set of files.

7. getopts in a csh script?

8. restricting telnet access

9. kshell script question concerning getopts

10. Invoking getopts from within a function (internal to my ksh script)

11. Sh shell script problem on OSF4.0 (but not on SGI) (getopts and shift)

12. Ksh script and getopt.

13. how do i assign data from an awk to a variable on the script, since my script is using bourne and awk