Why doesn't 'cut' utility work intuitively?

Post by era eriksso » Sun, 21 Sep 1997 04:00:00



 >     cut -c 4-5,1-2
 > produces the same output, instead of
 > 12AB
 > 34CD
 > In other words, you can't use "cut" to re-order columns or fields.
 > Of course there are other ways to do that, using cut + paste, awk,
 > perl, or whatever. But why would this functionality have been
 > left out of "cut"?

Because printing selected columns can be done in one pass, but
reordering them takes more than a single pass over the input.
Typically, you'd have to buffer a whole line, which might be very
long. You +could+ assume lines are typically shorter than, say, 1024
characters, and arrange fallback to a more-complicated algorithm when
this assumption fails, ... or you could believe in "keep it simple and
stupid" and leave cut(1) alone because it does what it does rather
well, and let people write their own utility for reordering columns.
(This is fairly simple to do in Perl, or less generally even in sed.)
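
To illustrate the buffering point, here is a minimal sketch in shell.
The sample line 'AB 12' is an assumption (the original poster's input is
not shown in the quote), and swap_pairs is only an illustrative name.
cut emits the selected characters in input order no matter how the list
is written, while awk holds the whole record in memory, so substr() can
emit the pieces in any order:

```shell
# Hypothetical sample input; the thread does not show the original data.
line='AB 12'

# cut treats the list as a set: characters 1-2 and 4-5, in input order.
printf '%s\n' "$line" | cut -c 4-5,1-2

# swap_pairs (illustrative name): awk has the whole line buffered,
# so characters 4-5 can be printed before characters 1-2.
swap_pairs() {
  awk '{ print substr($0, 4, 2) substr($0, 1, 2) }'
}
printf '%s\n' "$line" | swap_pairs
```

With this input the cut line prints AB12 and the awk line prints 12AB.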

Food for thought:
  $ yes | tr -d '\012' | cut -c 2-,1

(More food for thought: As a matter of fact, the GNU cut(1) I have
here also seems to wait for the final newline before printing anything
at all ...)

/* era */

 $ sed -e 's/\(..\)\(..\).*/\2\1/'

Note addition of comp.unix.shell and followup to same

--
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>

 
 
 


Post by David Sewe » Tue, 23 Sep 1997 04:00:00





> > In other words, you can't use "cut" to re-order columns or fields.
> > Of course there are other ways to do that, using cut + paste, awk,
> > perl, or whatever. But why would this functionality have been
> > left out of "cut"?

>Because printing selected columns can be done in one pass, but
>reordering them takes more than a single pass over the input.

[...]

Yes, I can see that from a programming point of view a one-pass
'cut' is more sensible.

I probably should have phrased my question a little differently:
why isn't there a standard, high-level text utility that can cut
and reorder columns of input text?  Granted that it's not hard to
write one, I'm just kind of surprised that it isn't part of the
formidable array of common Unix text-processing tools.

--

Dep't of Geosciences, Univ. of Arizona          |  (and fellow-workers) in
 WWW: http://packrat.aml.arizona.edu/~dsew/     |  what happens in the world."
                                                |              --Heraclitus

 
 
 


Post by Lawson Hans » Tue, 23 Sep 1997 04:00:00


Hello David,





>> > In other words, you can't use "cut" to re-order columns or fields.
>> > Of course there are other ways to do that, using cut + paste, awk,
>> > perl, or whatever. But why would this functionality have been
>> > left out of "cut"?

>>Because printing selected columns can be done in one pass, but
>>reordering them takes more than a single pass over the input.
>[...]

If by "reordering" the columns you simply mean, say, swapping column three
with column seven, or something simple like that, AND if your data is
VERY regular (i.e., a single item with no intervening space per column),
then it is simple to move a column with a few lines of Awk:

#---------->8-----------Cut Here----------8<-----------
#!/bin/sh
#
# Program:
#    mvcolumn, (based on program: column)
# Author:
#    Lawson Hanson, 19911106
# Purpose:
#    To extract a column from the "data" file
#    and print the data with the column moved
#    to its new place, on stdout
#
if [ $# -lt 2 ] ; then
  echo "Usage:  $0  [ -F field-separator ]  c1-from  c2-to  [ file ]"
  echo "   or:  command | $0  [ -F field-separator ]  c1-from  c2-to"
  exit 1
fi
#
fldsep=" "
#
case $1 in
 -F) fldsep=$2  ;  shift 2  ;;
  *) ;;
esac
#
N1=$1  ;  N2=$2  ;  shift 2
#
nawk  -F"${fldsep}"  '{
  # Collect the fields in their new order in out[1..n], then print
  # them joined by FS.  Building the list first avoids printing the
  # last field twice when the moved column is itself the last one.
  n = 0
  if ( c1 < c2 ) {
    for (i=1; i<c1; i++) out[++n] = $i
    for (i=c1+1; i<c2; i++) out[++n] = $i
    out[++n] = $c1
    for (i=c2; i<=NF; i++) out[++n] = $i
  } else {
    for (i=1; i<c2; i++) out[++n] = $i
    out[++n] = $c1
    for (i=c2; i<c1; i++) out[++n] = $i
    for (i=c1+1; i<=NF; i++) out[++n] = $i
  }
  for (i=1; i<n; i++) printf("%s%s", out[i], FS)
  printf("%s\n", out[n])
}'  c1=$N1 c2=$N2  "$@"

#---------->8-----------Cut Here----------8<-----------
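
For arbitrary reordering, rather than moving a single column, the same
awk approach generalizes. The following is only a sketch, assuming
whitespace-separated fields. reorder_fields is a made-up name, not an
existing utility, and the list syntax (comma-separated field numbers)
is chosen here purely for illustration:

```shell
# reorder_fields is a hypothetical helper, not a standard tool.
# Usage: reorder_fields LIST [FILE...]    e.g.  reorder_fields 3,1,2
reorder_fields() {
  list=$1
  shift
  awk -v list="$list" '
    # want[1..n] holds the requested field numbers, in output order.
    BEGIN { n = split(list, want, ",") }
    {
      for (i = 1; i < n; i++) printf("%s%s", $(want[i]), OFS)
      printf("%s\n", $(want[n]))
    }' "$@"
}

printf 'a b c\nd e f\n' | reorder_fields 3,1,2
```

For the two sample lines this prints "c a b" and "d e f" reordered as
"f d e". A field number may appear more than once in the list, in which
case that field is simply repeated.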

>Yes, I can see that from a programming point of view a one-pass
>'cut' is more sensible.
>I probably should have phrased my question a little differently:
>why isn't there a standard, high-level text utility that can cut
>and reorder columns of input text?  Granted that it's not hard to
>write one, I'm just kind of surprised that it isn't part of the
>formidable array of common Unix text-processing tools.

I guess most Unix users would have a little utility like that
tucked away somewhere ... I wrote it in a few minutes when I
needed to do that sort of thing one day ... back in 1991.

I hope that helps.

Best regards,

Lawson Hanson

 
 
 


Post by era eriksso » Tue, 23 Sep 1997 04:00:00




 > > > In other words, you can't use "cut" to re-order columns or fields.
 > > > Of course there are other ways to do that, using cut + paste, awk,
 > > > perl, or whatever. But why would this functionality have been
 > > > left out of "cut"?
 > >Because printing selected columns can be done in one pass, but
 > >reordering them takes more than a single pass over the input.
 > [...]
 > I probably should have phrased my question a little differently:
 > why isn't there a standard, high-level text utility that can cut
 > and reorder columns of input text?  Granted that it's not hard to
 > write one, I'm just kind of surprised that it isn't part of the
 > formidable array of common Unix text-processing tools.

Probably because the generations before us each wrote their own private
versions and forgot about them.

Perhaps you mean "what can be done about it?". One answer might be to
pressure Larry to make a column-reordering script part of the Perl
distribution, perhaps in the eg directory. Seriously, though, if Unix
evolves the way it has thus far, there is no way to make anything
"standard" anymore. You can work to make something part of some
particular distribution or spec (POSIX maybe) or prove it to be good
enough that everyone will want to include it in their version of the
OS, but I don't think reorder_columns is going to make it. (Be a hero
and implement a patch to GNU cut(1) to do what you propose, though!)

Corollary: There will never be a standard script which does sort |
uniq -c | sort -rn
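
For anyone who has not met that pipeline: it counts how often each
distinct input line occurs and lists the most frequent first. A small
sketch, wrapped in a function whose name (freq) is hypothetical:

```shell
# freq is an illustrative name for the pipeline quoted above.
freq() {
  sort "$@" |    # group identical lines together
    uniq -c |    # collapse each group to "count line"
    sort -rn     # highest count first
}

printf 'b\na\nb\nc\nb\na\n' | freq
```

Here b is listed first with a count of 3, then a with 2, then c with 1.
The width of the count column produced by uniq -c differs between
implementations, so the exact spacing is not shown.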

/* era */

On second thought, make it part of Emacs instead :-)

--
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>