gnu sort, field selection bug?

Post by * Tong » Fri, 17 Jan 2003 06:47:55



Hi,

I think I have found a bug in sort's field selection algorithm. Here
are the examples:

$ ls -1 sfa*
sfa1001ext
sfa1002ext
sfa100ext
sfa10ext
sfa1ext
sfa200ext
sfa20ext
sfa2ext
sfa300ext
sfa30ext
sfa3ext

The goal is to sort on the number. Before we get into it, let's look at
some warm-up exercises first:

$ ls sfa* | sort -k 1.4
sfa1001ext
sfa1002ext
sfa100ext
sfa10ext
sfa1ext
sfa200ext
sfa20ext
sfa2ext
sfa300ext
sfa30ext
sfa3ext

$ ls sfa* | sort -n -k 1.4
sfa1ext
sfa2ext
sfa3ext
sfa10ext
sfa20ext
sfa30ext
sfa100ext
sfa200ext
sfa300ext
sfa1001ext
sfa1002ext

  -- so far so good

$ ls sfa* | sort -n -k 1.4,1.4
sfa1001ext
sfa1002ext
sfa100ext
sfa10ext
sfa1ext
sfa200ext
sfa20ext
sfa2ext
sfa300ext
sfa30ext
sfa3ext

  -- There is the bug! The output should be the same as the previous one.

My sort comes along with RH8:

$ sort --v
sort (textutils) 2.0.21
Written by Mike Haertel and Paul Eggert.

The impact of this bug is that it makes sorting impossible in certain
circumstances. For example:

$ ls -1 sfb*
sfb10-B
sfb10000-A
sfb10001-A
sfb10002-A
sfb11-B
sfb12-B
sfb8-B
sfb9-B
sfb9998-A
sfb9999-A

We need to sort the list first by the letter after the dash, then by the
number. There is no way to do that with the current sort program:

$ ls -1 sfb* | sort -t- -k2,2 -n -k1.4,1.4
sfb10-B
sfb10000-A
sfb10001-A
sfb10002-A
sfb11-B
sfb12-B
sfb8-B
sfb9-B
sfb9998-A
sfb9999-A

  -- this should be the right way, but the result is wrong.

$ ls -1 sfb* | sort -t- -k2 -n -k1.4,1.5
sfb8-B
sfb9-B
sfb10-B
sfb10000-A
sfb10001-A
sfb10002-A
sfb11-B
sfb12-B
sfb9998-A
sfb9999-A

$ ls -1 sfb* | sort -t- -k2 -n -k1.4,1.6
sfb8-B
sfb9-B
sfb10-B
sfb11-B
sfb12-B
sfb10000-A
sfb10001-A
sfb10002-A
sfb9998-A
sfb9999-A

What do you think? Thanks

--
Tong (remove underscore(s) to reply)
  *niX Power Tools Project: http://xpt.sourceforge.net/
  - All free contribution & collection

 
 
 

Post by John » Fri, 17 Jan 2003 13:42:26



> Hi,

> I think I have found a bug in sort's field selection algorithm. Here
> are the examples:

> $ ls sfa* | sort -n -k 1.4
> sfa1ext
> sfa2ext
> sfa3ext
> sfa10ext
> sfa20ext
> sfa30ext
> sfa100ext
> sfa200ext
> sfa300ext
> sfa1001ext
> sfa1002ext

>   -- so far so good

> $ ls sfa* | sort -n -k 1.4,1.4
> sfa1001ext
> sfa1002ext
> sfa100ext
> sfa10ext
> sfa1ext
> sfa200ext
> sfa20ext
> sfa2ext
> sfa300ext
> sfa30ext
> sfa3ext

>   -- There is the bug! The output should be the same as the previous one.

No, it should not. -k 1.4,1.4 tells sort to start and stop the key at the
fourth character of the first field (1, 2 or 3 here), which is what it
has done. In the previous example, -k 1.4 starts at the same character
but gives no end position, so the key runs on to the end of the line and
-n compares the whole number.
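
To see the difference on a handful of the names above, here is a small sketch. It assumes GNU sort, and it sets LC_ALL=C (an addition of mine, not part of the original commands) so that the tie-breaking whole-line comparison is byte-wise and reproducible. The variable names are only for illustration.

```shell
# A few of the sample names from the thread.
names='sfa100ext
sfa10ext
sfa1ext
sfa20ext
sfa2ext'

# Key starts at character 4 and runs to the end of the line,
# so -n reads the complete number (1, 2, 10, 20, 100).
open_key=$(printf '%s\n' "$names" | LC_ALL=C sort -n -k 1.4)

# Key is exactly one character (the 4th), so -n only ever sees 1 or 2.
# Lines whose key ties are ordered by a plain whole-line comparison.
one_char_key=$(printf '%s\n' "$names" | LC_ALL=C sort -n -k 1.4,1.4)

printf 'open-ended key (-k 1.4):\n%s\n\n' "$open_key"
printf 'one-character key (-k 1.4,1.4):\n%s\n' "$one_char_key"
```

The first listing comes out in numeric order, while the second only groups the names by their first digit, which is the pattern in the original report.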

John.

 
 
 

Post by Dave Bro » Fri, 17 Jan 2003 14:20:55




> I think I have found a bug in sort's field selection algorithm. Here
> are the examples:

> ....
> My sort comes along with RH8:

> $ ls -1 sfb*
> sfb10-B
> sfb10000-A
> sfb10001-A
> sfb10002-A
> sfb11-B
> sfb12-B
> sfb8-B
> sfb9-B
> sfb9998-A
> sfb9999-A

> We need to sort the list first by the letter after the dash, then by the
> number. There is no way to do that with the current sort program:

> $ ls -1 sfb* | sort -t- -k2,2 -n -k1.4,1.4
> sfb10-B
> sfb10000-A
> sfb10001-A
> sfb10002-A
> sfb11-B
> sfb12-B
> sfb8-B
> sfb9-B
> sfb9998-A
> sfb9999-A

>   -- this should be the right way, but the result is wrong.

You didn't post a "follow-up to" newsgroup, so I'll pick one.

I think your problem is that you're applying "n" as a global option, so
it also applies to the -k2,2 key.

Try:

$ ls -1 sfb* | sort -t- -k2 -k1.4n

sfb9998-A
sfb9999-A
sfb10000-A
sfb10001-A
sfb10002-A
sfb8-B
sfb9-B
sfb10-B
sfb11-B
sfb12-B

which is, I think, what you want.
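
The effect of the global option can be checked side by side with a short sketch. It assumes GNU sort and uses LC_ALL=C for reproducible tie-breaking; both are my assumptions. The second command is a slightly tightened variant of the suggestion above, not the exact command from the post: it ends the letter key at field 2 and the number key at field 1, and attaches n to the number key only.

```shell
# A few of the sample names from the thread.
names='sfb10-B
sfb10000-A
sfb8-B
sfb9-B
sfb9998-A'

# Global -n is inherited by every key that has no options of its own.
# -k2,2 therefore compares "A" and "B" as numbers (both count as 0),
# and -k1.4,1.4 only looks at a single digit.
global_n=$(printf '%s\n' "$names" | LC_ALL=C sort -t- -k2,2 -n -k1.4,1.4)

# Per-key n: the letter key stays textual, and only the digits of
# field 1 (character 4 up to the end of the field) are numeric.
per_key_n=$(printf '%s\n' "$names" | LC_ALL=C sort -t- -k2,2 -k1.4n,1)

printf 'global -n:\n%s\n\n' "$global_n"
printf 'per-key n:\n%s\n' "$per_key_n"
```

With the global option the list is effectively ordered by the first digit alone. With the per-key modifier the -A names come first in numeric order, followed by the -B names. Newer GNU coreutils releases also have a --debug option for sort that marks the part of each line used as the key; as far as I know it is not available in the textutils 2.0.21 shown earlier in the thread.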

 
 
 

Post by * Tong » Mon, 20 Jan 2003 11:09:32


PERFECT! Thanks a thousand, Dave. I *thought* I had made a good
contribution to the GNU tools (by submitting the detailed bug
report), only to find I still have lots to learn :-)

--
Tong (remove underscore(s) to reply)
  *niX Power Tools Project: http://xpt.sourceforge.net/
  - All free contribution & collection

 
 
 
