sort keys based on character positions, not field positions?

Post by Norbert Burge » Tue, 11 Jul 2000 04:00:00



I'm working on a heavily-used (in our system) wrapper around /bin/sort.  A
file to be sorted, key positions, and other sort parameters are passed into
this routine, and then /bin/sort is system()'ed with the parameters.  Because
the sort routine has no information about the actual contents of the file
(whitespace, specifically), keys are defined in terms of column positions,
and then /bin/sort is forced to treat the entire line as one field with the
argument:

sort "-t\\\n" +0.0 -0.72 ...

On SunOS 5.5.1, this fails with the message: "sort: option requires an
argument -- t".  This particular routine has worked on Irix.

1) How can I portably define sort keys in terms of column positions, not
field positions?  Or, equivalently, how can I indicate that sort should use
the entire line as one field, not whitespace-separated fields?

2) The "-t\\\n" syntax seems correct to me.  Is there a better solution?

I'd very much appreciate any tips.

Norbert Burger

 
 
 

sort keys based on character positions, not field positions?

Post by christia.. » Tue, 11 Jul 2000 04:00:00


Dear Norbert,

it should work like that:


Unfortunately, there seems to be a bug in Solaris 8, where this doesn't
work. I recommend to use GNU-sort instead (fast, almost same syntax).

Regards,
  Christian


 
 
 

sort keys based on character positions, not field positions?

Post by Norbert Burge » Tue, 11 Jul 2000 04:00:00



> it should work like that:


> Unfortunately, there seems to be a bug in Solaris 8, where this doesn't
> work. I recommend to use GNU-sort instead (fast, almost same syntax).

This is a great suggestion -- thanks, Christian.  But it makes me a little concerned --
especially because I don't know the contents of the files being sorted.  Does sort
accept higher-value characters (>127) as field separators?

Related question; consider the following text file (testsort.txt):

a b
a a

When running 'sort < testsort.txt', I would expect the file to be printed as is, with no
change in line order.  (Field separator, by default, is whitespace, and the first two
keys are equal.)  But instead, I get:

a a
a b

Why?

Norbert

 
 
 

sort keys based on character positions, not field positions?

Post by Barry Margoli » Tue, 11 Jul 2000 04:00:00




>> it should work like that:


>> Unfortunately, there seems to be a bug in Solaris 8, where this doesn't
>> work. I recommend to use GNU-sort instead (fast, almost same syntax).

>This is a great suggestion -- thanks, Christian.  But it makes me a little
>concerned -- especially because I don't know the contents of the files
>being sorted.  Does sort accept higher-value characters (>127) as field
>separators?

I've never needed to specify a field separator when I wanted to sort by
character position.  I just use:

sort -k 1.<start>,1.<end>

and it works fine.  I just did:

sort -k 1.5,1.7

with the input file:

11 1ccc
22 2aaa

and the result was as expected:

22 2aaa
11 1ccc
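
For a wrapper that receives column positions, the same form can be driven by
variables.  A minimal sketch, assuming a sort that accepts POSIX -k syntax;
the function name sort_by_columns and the temporary file are made up for
illustration and are not from the original routine:

```shell
# Hypothetical helper (not from the thread): sort FILE on character columns
# START..END, counted from 1 and inclusive at both ends.
sort_by_columns() {
    # Field 1 starts at the beginning of the line, so "1.START,1.END"
    # addresses plain column positions. LC_ALL=C forces byte-wise comparison.
    LC_ALL=C sort -k "1.$1,1.$2" "$3"
}

# Reproduce the example above with a temporary input file.
input=$(mktemp)
printf '11 1ccc\n22 2aaa\n' > "$input"
sort_by_columns 5 7 "$input"
```

Under this convention the old-style "+0.0 -0.72" from the first post would
correspond to "sort_by_columns 1 72 file", since the old syntax counts from
zero and -k counts from one.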

>Related question; consider the following text file (testsort.txt):

>a b
>a a

>When running 'sort < testsort.txt', I would expect the file to be printed
>as is, with no change in line order.  (Field separator, by default, is
>whitespace, and the first two keys are equal.)  But instead, I get:

>a a
>a b

>Why?

From the man page:

      By default, there is one sort key, the entire input line.
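
The difference can be checked directly.  The sketch below is an illustration
rather than anything posted in the thread: it sorts the two-line file once
with the default whole-line key and once with the key limited to the first
field.  The -s (stable) flag is an extension found in GNU and BSD sort, not
part of POSIX, and the temporary file name is arbitrary.

```shell
# The two-line file from the question, written to a temporary file.
testfile=$(mktemp)
printf 'a b\na a\n' > "$testfile"

# No -k given: the whole line is the key, so "a a" sorts before "a b".
whole_line=$(LC_ALL=C sort < "$testfile")
printf '%s\n' "$whole_line"

# Key restricted to field 1, plus -s (stable; an extension, not POSIX):
# both keys are "a", so the lines tie and the input order is kept.
first_field=$(LC_ALL=C sort -s -k 1,1 < "$testfile")
printf '%s\n' "$first_field"
```

Without -s, sort typically breaks ties by comparing the whole lines again, so
"-k 1,1" alone would still print "a a" first.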

--

Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

 
 
 

sort keys based on character positions, not field positions?

Post by Norbert Burge » Tue, 11 Jul 2000 04:00:00



> >Why?
> From the man page:
>       By default, there is one sort key, the entire input line.

But also from the man page (in my mind, a contradictory statement):

>     -tx  Use x as the field separator character; x is not considered to be
>          part of a field (although it may be included in a sort key).  Each
>          occurrence of x is significant (for example, xx delimits an empty
>          field).  x may be a supplementary code set character.  The default
>          field separators are blank characters.

In the absence of the -t option, the default separator is a blank character.  What
happens when the sort key (by default, the entire line) is larger than the sort field (by
default, a potentially-empty sequence of non-whitespace characters)?

In your file:

> 11 1ccc
> 22 2aaa

My understanding is that the first sort field is '11 1ccc', and that the first sort key is '11'.
What am I missing?

Norbert

 
 
 

sort keys based on character positions, not field positions?

Post by Barry Margoli » Tue, 11 Jul 2000 04:00:00




>> >Why?
>> From the man page:
>>       By default, there is one sort key, the entire input line.

>But also from the man page (in my mind, a contradictory statement):

>>     -tx  Use x as the field separator character; x is not considered to be
>>          part of a field (although it may be included in a sort key).  Each
>>          occurrence of x is significant (for example, xx delimits an empty
>>          field).  x may be a supplementary code set character.  The default
>>          field separators are blank characters.

>In the absence of the -t option, the default separator is a blank
>character.  What happens when the sort key (by default, the entire line)
>is larger than the sort field (by default, a potentially-empty sequence of
>non-whitespace characters)?

Field separators are used to determine where to start from when
interpreting field specifiers.  But a sort key can span multiple fields,
and it will if the field_end goes past the field separator character.

>In your file:
>> 11 1ccc
>> 22 2aaa

>My understanding is that the first sort field is '11 1ccc', and that the
>first sort key is '11'.
>What am I missing?

I specified -k 1.5,1.7, so the key field is the 5th through 7th characters,
counted from the beginning of the 1st field.  That's 'ccc' and 'aaa'.
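
One way to see which characters such a key covers is to cut the same columns
out of each line.  This is only an approximation of what sort does
internally, and it assumes no -b and no -t, so that field 1 starts at column
1.  The range is inclusive at both ends, so 1.5,1.7 spans three characters
and reaches past the blank that ends the first field.

```shell
# cut -c counts columns from the start of the line, which matches how
# "-k 1.5,1.7" counts here (field 1 starts at column 1; no -b, no -t).
keys=$(printf '11 1ccc\n22 2aaa\n' | cut -c 5-7)
printf '%s\n' "$keys"
```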


 
 
 

sort keys based on character positions, not field positions?

Post by christia.. » Wed, 12 Jul 2000 04:00:00


> >> 11 1ccc
> >> 22 2aaa
> I specified -k 1.5,1.7, so the key field is the 5th through 7th characters,
> counted from the beginning of the 1st field.  That's 'ccc' and 'aaa'.

But then you get in trouble if you have a file like:
 1 1ccc
22 2aaa
i.e., some lines starting with a blank, or not!?

Regards,
  Christian


 
 
 

sort keys based on character positions, not field positions?

Post by Barry Margoli » Wed, 12 Jul 2000 04:00:00



>> >> 11 1ccc
>> >> 22 2aaa
>> I specified -k 1.5,1.7, so the key field is the 5th through 7th characters,
>> counted from the beginning of the 1st field.  That's 'ccc' and 'aaa'.

>But then you get in trouble if you have a file like:
> 1 1ccc
>22 2aaa
>i.e., some lines starting with a blank, or not!?

No, I don't.  I tried:

sort -k 1.5,1.5
 1 1zb
22 2dd

and the result was:

22 2dd
 1 1zb

If it had skipped the leading blank before counting on the first line, it
would have compared 'b' to 'd', but it actually compared 'z' to 'd'.

Adding the -b option caused it to skip over the leading blank, resulting
in:

 1 1zb
22 2dd
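
Both runs can be repeated side by side.  A small sketch, assuming a sort with
POSIX -k and -b support; the temporary file and variable names are
illustrative only:

```shell
input=$(mktemp)
printf ' 1 1zb\n22 2dd\n' > "$input"

# Without -b the leading blank counts as column 1: compares 'z' with 'd'.
plain=$(LC_ALL=C sort -k 1.5,1.5 "$input")
# With -b leading blanks are skipped before counting: compares 'b' with 'd'.
skipped=$(LC_ALL=C sort -b -k 1.5,1.5 "$input")
printf '%s\n--\n%s\n' "$plain" "$skipped"
```

For strict column-position keys, as in the original wrapper, that suggests
leaving -b off so that leading blanks keep their column.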


 
 
 

1. Help Sorting by character position

We have a problem with a file that we receive.  Its fields are fixed
length but without separators. The records vary in length, with the max
record length being 163 characters. The records look something like
this:

L1XXXXXXXXWA  XXXXX..............................  .......
L3XXXXXXXXWA  XXXXX...  .......      ...........................  ..
L2XXXXXXXXWA  XXXXX............. ..................

The first two characters represent a record type, while characters 3
to 17 represent a customer number.

We need something like: sort file on characters 3 to 17 and 1 to 2 to newfile.
So far I can't make heads or tails of the sort command syntax.

Thanks

Ronney B

2. HP jet direct/jetadmin not working from non-root w/tcp/ip

3. How do I "insert" characters based on a position?

4. resolv.conf and /etc/hosts - HELP Please

5. Insert a field into a position of a file

6. Slackware color xterm

7. sed insert character in specific position relative to end of line

8. What CD-Recordable is supported by FreeBSD?

9. (Q) Fonts with characters in positions 128-160

10. re- Position of a character

11. Position of a character in a string

12. Check for a character in a specific position on a line

13. Character position
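
The request in related thread 1 above (sort on characters 3 to 17, then on
characters 1 to 2) maps onto the same -k 1.<start>,1.<end> form discussed in
this thread, with one -k per key.  A sketch under the assumption that the
sort in use counts character offsets past the first blank, as the examples
in this thread did; the sample records are invented for illustration:

```shell
input=$(mktemp)
# Made-up sample records: type in columns 1-2, customer number in 3-17.
printf '%s\n' \
    'L2BBBBBBBBWA  00002 x' \
    'L1AAAAAAAAWA  00001 y' \
    'L1BBBBBBBBWA  00002 z' > "$input"

# Primary key: columns 3-17; secondary key: columns 1-2.
sorted=$(LC_ALL=C sort -k 1.3,1.17 -k 1.1,1.2 "$input")
printf '%s\n' "$sorted"
```

Redirecting the output (for example "> newfile") writes the result to a new
file instead of printing it.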