Change in sort dictionary order in Solaris 10

Post by Gary Mills » Sat, 19 Mar 2005 06:23:54



It seems that `sort -d' behaves differently in Solaris 10 when
the input file contains 8-bit characters.  This file:

Hospital A
Hospital! B
Htspital C
Hospital D

sorts without change in Solaris 9, but this way in Solaris 10:

Hospital A
Hospital! B
Hospital D
Htspital C

The second last line of the input file contains an `o' with a
circumflex over it.  Its hex value is `f4'.  My news software will
force me to edit it before posting.  We are using the en_CA.ISO8859-1
locale.

Does anyone know how `dictionary order' is supposed to behave in
this instance?

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-

Post by Rich Teer » Sat, 19 Mar 2005 08:26:56



> Does anyone know how `dictionary order' is supposed to behave in
> this instance?

I *believe* it's locale specific.

--
Rich Teer, SCNA, SCSA

President,
Rite Online Inc.

Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich

Post by Thomas Dickey » Sat, 19 Mar 2005 09:34:48




>> Does anyone know how `dictionary order' is supposed to behave in
>> this instance?
> I *believe* it's locale specific.

But that wasn't the question he was asking.  Since you're running Solaris
10, perhaps you can quote from the sort manpage.  Looking at Solaris 8
(this host), it states (among other things):

  Ordering Options
     The default sort order depends on the value  of  LC_COLLATE.
     If  LC_COLLATE  is set to C, sorting will be in ASCII order.
     If LC_COLLATE is set to en_US, sorting is  case  insensitive
     except  when the two strings are otherwise equal and one has
     an uppercase letter earlier than the  other.  Other  locales
     will have other sort orders.

     The following options override the default  ordering  rules.
     When  ordering  options  appear independent of any key field
     specifications,  the  requested  field  ordering  rules  are
     applied  globally  to  all  sort  keys.  When  attached to a
     specific key (see Sort Key Options), the specified  ordering
     options  override  all global ordering options for that key.
     In the obsolescent forms, if one or more  of  these  options
     follows  a  +pos1  option, it will affect only the key field
     specified by that preceding option.

     -d    ``Dictionary'' order: only letters, digits, and blanks
           (spaces and tabs) are significant in comparisons.
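The `-d' rule quoted above can be sketched in C.  This is a minimal
illustration, not the Solaris implementation: it assumes a line's
dictionary key is just the bytes that isalnum() or isblank() accept in
the current locale, compares keys with strcoll() so LC_COLLATE still
applies, and compares whole lines when the keys tie.  The function
names dict_key, dict_compare and sort_dictionary are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Copy into key only the bytes this sketch treats as significant for -d:
 * letters, digits and blanks, as classified by the current LC_CTYPE.
 * key must have room for strlen(line) + 1 bytes. */
static void dict_key(const char *line, char *key)
{
    for (; *line != '\0'; ++line) {
        unsigned char c = (unsigned char) *line;
        if (isalnum(c) || isblank(c))
            *key++ = (char) c;
    }
    *key = '\0';
}

/* qsort() comparator for an array of const char * lines. */
static int dict_compare(const void *a, const void *b)
{
    const char *x = *(const char *const *) a;
    const char *y = *(const char *const *) b;
    char *kx = malloc(strlen(x) + 1);
    char *ky = malloc(strlen(y) + 1);
    int r;

    if (kx == NULL || ky == NULL) {     /* out of memory: whole lines */
        free(kx);
        free(ky);
        return strcoll(x, y);
    }
    dict_key(x, kx);
    dict_key(y, ky);
    r = strcoll(kx, ky);                /* keys in LC_COLLATE order */
    free(kx);
    free(ky);
    return r != 0 ? r : strcoll(x, y);  /* tie: compare whole lines */
}

/* Sort n lines in place in (sketched) dictionary order. */
static void sort_dictionary(const char **lines, size_t n)
{
    qsort(lines, n, sizeof *lines, dict_compare);
}
```

Under the default C locale, where byte 0xf4 is not classified as a
letter, this sketch drops the o-circumflex from the key and orders the
four sample lines as A, B, D, then the 0xf4 line.  Under another locale
the result depends on that locale's classification and collation tables.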

--
Thomas E. Dickey
http://www.veryComputer.com/
ftp://invisible-island.net

Post by Gary Mills » Sat, 19 Mar 2005 09:57:27





>>> Does anyone know how `dictionary order' is supposed to behave in
>>> this instance?
>> I *believe* it's locale specific.
>But that wasn't the question he was asking.  Since you're running Solaris
>10, perhaps you can quote from the sort manpage.  Looking at Solaris 8
>(this host), it states (among other things):
>     -d    ``Dictionary'' order: only letters, digits, and blanks
>           (spaces and tabs) are significant in comparisons.

That part didn't change in Solaris 10.  It also says:

     As noted, certain of the field modifiers (such as -M and -d)
     cause  the  interpretation  of  input  data  to be done with
     reference to locale-specific settings. The results  of  this
     interpretation  can  be unexpected if one's expectations are
     not aligned with the conventions established by the  locale.
     ...
         For printable  or
     dictionary  order, if these concepts are not well-defined by
     the locale, an empty sort key may be the result, leading  to
     the  next  key being the significant one for determining the
     appropriate ordering.

That's still not very specific, and doesn't explain the change between
Solaris 9 and Solaris 10, with the same locale in each case.  I also
tried the sort in the C locale, with no change.
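Whether a byte such as 0xf4 counts toward a `-d' key depends on how the
locale classifies it.  A quick probe, assuming `-d' relies on the
standard isalnum()/isblank() classification (an assumption; the manpage
only says the behaviour is locale-specific), is to count how many 8-bit
bytes the current locale treats as letters or digits.  The function name
count_8bit_alnum is hypothetical.

```c
#include <ctype.h>
#include <limits.h>

/* Count bytes above 0x7f that isalnum() accepts in the current locale.
 * In the C locale this is 0, so under this assumption an 8-bit
 * character such as 0xf4 (o with circumflex in ISO 8859-1) would not be
 * significant in a dictionary-order key. */
static int count_8bit_alnum(void)
{
    int c, count = 0;

    for (c = 0x80; c <= UCHAR_MAX; ++c)
        if (isalnum(c))
            ++count;
    return count;
}
```

Calling setlocale(LC_ALL, "en_CA.ISO8859-1") first (if that locale is
installed) and comparing the count on each system would show whether the
two releases classify the accented letters differently.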

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-

Post by Thomas Dickey » Sat, 19 Mar 2005 11:07:52



> That's still not very specific, and doesn't explain the change between
> Solaris 9 and Solaris 10, with the same locale in each case.  I also
> tried the sort in the C locale, with no change.

I'm puzzled too.  But just to see more, I entered your example (adding the
0xf4 character), and as noted I am on Solaris 8 (which I know has defective
locale tables, just to add to the confusion).  "locale -a" asserts that
this machine has en_CA.ISO8859-1, so I set LANG to that and ran sort -d.
But it produces the same effect as my en_US locale setting (as you see
in Solaris 10).

According to strcoll's manpage, the LC_COLLATE information for Solaris is
in ".so" files under /usr/lib/locale/locale:

FILES
     /usr/lib/locale/locale/locale.so.*
           LC_COLLATE database for locale

though I see an actual file which differs (ymmv):

        /usr/lib/locale/en_CA.ISO8859-1/en_CA.ISO8859-1.so.2

Other than by writing a program to infer the information, I don't see a
way to decide if the LC_COLLATE data are the same on the two systems.
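One way to do that inference is to sort all 256 single-byte strings with
strcoll() and reduce the resulting order to a single number that can be
compared between the two machines.  This is only a sketch: it probes
single bytes, not any multi-character rules a locale may define, and the
FNV-1a hash and the function names are assumptions chosen for
illustration.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Compare two bytes as one-character strings in LC_COLLATE order,
 * breaking ties by byte value so the result is deterministic. */
static int byte_coll(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *) a;
    unsigned char y = *(const unsigned char *) b;
    char p[2], q[2];
    int r;

    p[0] = (char) x;
    p[1] = '\0';
    q[0] = (char) y;
    q[1] = '\0';
    r = strcoll(p, q);
    return r != 0 ? r : (int) x - (int) y;
}

/* Fill order[] with the byte values 0..UCHAR_MAX sorted by collation. */
static void collation_order(unsigned char order[UCHAR_MAX + 1])
{
    int i;

    for (i = 0; i <= UCHAR_MAX; ++i)
        order[i] = (unsigned char) i;
    qsort(order, UCHAR_MAX + 1, 1, byte_coll);
}

/* 32-bit FNV-1a hash of that order: equal values on two systems suggest
 * (but do not prove) the same single-byte collation. */
static unsigned long collation_fingerprint(void)
{
    unsigned char order[UCHAR_MAX + 1];
    unsigned long h = 2166136261UL;
    int i;

    collation_order(order);
    for (i = 0; i <= UCHAR_MAX; ++i) {
        h ^= order[i];
        h = (h * 16777619UL) & 0xffffffffUL;
    }
    return h;
}
```

Run after setlocale(LC_ALL, "") with the same LANG on both hosts; a
differing fingerprint would point at different LC_COLLATE tables rather
than at sort itself.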

--
Thomas E. Dickey
http://www.veryComputer.com/
ftp://invisible-island.net

Post by Gary Mills » Sun, 20 Mar 2005 13:03:48



>It seems that `sort -d' behaves differently in Solaris 10 when
>the input file contains 8-bit characters.  This file:

Hospital A
Hospital! B
Hôspital C
Hospital D

>sorts without change in Solaris 9, but this way in Solaris 10:

Hospital A
Hospital! B
Hospital D
Hôspital C

I've inserted the two files with the original characters now.

>We are using the en_CA.ISO8859-1 locale.
>Does anyone know how `dictionary order' is supposed to behave in
>this instance?
>--
>-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-

Post by Thomas Dickey » Mon, 21 Mar 2005 02:21:04




>>It seems that `sort -d' behaves differently in Solaris 10 when
>>the input file contains 8-bit characters.  This file:
> Hospital A
> Hospital! B
> Hôspital C
> Hospital D
>>sorts without change in Solaris 9, but this way in Solaris 10:

Looking again, I'm not sure what collating order would do that, unless,
for instance, sort is mangling the data so that 'ô' is treated identically
to 'o'.
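The effect of such mangling can be seen with plain byte comparisons.
This sketch is illustrative only: fold_o_circumflex() is a hypothetical
helper that rewrites ISO 8859-1 0xf4 and 0xd4 to plain 'o' and 'O', and
strcmp() stands in for a byte-order (C locale) collation rather than the
en_CA.ISO8859-1 tables.  Unfolded, 0xf4 sorts after 'o', giving the
Solaris 10 placement; folded, "Hôspital C" lands before "Hospital D", as
in the Solaris 9 output.

```c
#include <string.h>

/* Hypothetical helper: copy s into out, folding ISO 8859-1 0xf4 (o with
 * circumflex) and 0xd4 (O with circumflex) to plain 'o' and 'O'. */
static void fold_o_circumflex(const char *s, char *out)
{
    for (; *s != '\0'; ++s, ++out) {
        unsigned char c = (unsigned char) *s;
        *out = (c == 0xf4) ? 'o' : (c == 0xd4) ? 'O' : *s;
    }
    *out = '\0';
}

/* Byte-order comparison, optionally after folding.  strcmp() compares
 * bytes as unsigned char, so unfolded 0xf4 sorts after 'o' (0x6f). */
static int compare_lines(const char *a, const char *b, int fold)
{
    char fa[128], fb[128];

    /* Lines too long for the buffers (or fold == 0) are compared as-is. */
    if (!fold || strlen(a) >= sizeof fa || strlen(b) >= sizeof fb)
        return strcmp(a, b);
    fold_o_circumflex(a, fa);
    fold_o_circumflex(b, fb);
    return strcmp(fa, fb);
}
```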

> Hospital A
> Hospital! B
> Hospital D
> Hôspital C
> I've inserted the two files with the original characters now.
>>We are using the en_CA.ISO8859-1 locale.
>>Does anyone know how `dictionary order' is supposed to behave in
>>this instance?

Assuming that it's based on the locale, as I noted one could write a program
using strcoll() to deduce the information.  On this host (Solaris 8), I see
that 'o' is collated before the 'ô', which is consistent with Solaris 10.

141: 111 0x6f 0157 (o)
142: 211 0xd3 0323 (Ó)
143: 243 0xf3 0363 (ó)
144: 210 0xd2 0322 (Ò)
145: 242 0xf2 0362 (ò)
146: 212 0xd4 0324 (Ô)
147: 244 0xf4 0364 (ô)

Here's the program (which does produce more output than quoted above).
You might try running that on the two systems to see if the program's
output is different - it might give some clues (or 'sort' in Solaris 9
might not have handled the comparison strictly according to locale).

/*
 * Use strcoll() to display single-byte character values ordered in the current
 * locale's collating order -T.Dickey 2005/3/19.
 */
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <limits.h>
#include <locale.h>

typedef unsigned char UChar;

/* qsort() callback: compare two bytes as one-character strings. */
static int
compare(const void *a, const void *b)
{
    static char p[2];   /* static storage: p[1] and q[1] stay '\0' */
    static char q[2];
    p[0] = *(const UChar *) a;
    q[0] = *(const UChar *) b;
    return strcoll(p, q);
}

int
main(void)
{
    unsigned ch;
    UChar sorted[UCHAR_MAX + 1];

    setlocale(LC_ALL, "");      /* take LANG / LC_* from the environment */
    for (ch = 0; ch < sizeof(sorted); ++ch)
        sorted[ch] = (UChar) ch;
    qsort(sorted, sizeof(sorted) / sizeof(UChar), sizeof(UChar), compare);
    /* rank, then decimal, hex and octal value, then the glyph if printable */
    for (ch = 0; ch < sizeof(sorted); ++ch) {
        printf("%03u: %3u 0x%02x %#04o", ch, (unsigned) sorted[ch],
               (unsigned) sorted[ch], (unsigned) sorted[ch]);
        if (isprint(sorted[ch]))
            printf(" (%c)", sorted[ch]);
        printf("\n");
    }
    return EXIT_SUCCESS;
}

--
Thomas E. Dickey
http://www.veryComputer.com/
ftp://invisible-island.net