File Sort: How good is Unix sort?

Post by Biju Srinivasa » Sat, 25 Oct 1997 04:00:00



I need to sort and merge two flat files on a
DEC Alpha Unix server.
$ uname -a  gives
OSF1 hostname V4.0 564 Alpha

The files have approx. 50 million rows each, with rows of around 1000
bytes.
The sort is fairly straightforward, on two fixed-width fields in the
files. I have no idea how efficient Unix sort will be on such a
sort/merge, nor do I have the resources to test this with real data at
this point. I am not worried about how long it will take, but
rather about whether Unix sort can handle such a volume of data.
I also came across a third-party application, SyncSort, which I believe
is efficient but much more powerful than what I need, which is just a
simple sort. Is it worth spending money on SyncSort for such a simple
need?
Will Unix sort work for me? Are there any freeware utilities available
for my need?

Thanks in advance for any advice.
Biju.

 
 
 


Post by Neil Schemenau » Sun, 26 Oct 1997 04:00:00



>I need to sort and merge two flat files on a
>DEC Alpha Unix server.
>$ uname -a  gives
>OSF1 hostname V4.0 564 Alpha

>The files have approx. 50 million rows each, with rows of around 1000
>bytes.
>The sort is fairly straightforward, on two fixed-width fields in the
>files. I have no idea how efficient Unix sort will be on such a
>sort/merge, nor do I have the resources to test this with real data at
>this point. I am not worried about how long it will take, but
>rather about whether Unix sort can handle such a volume of data.
>I also came across a third-party application, SyncSort, which I believe
>is efficient but much more powerful than what I need, which is just a
>simple sort. Is it worth spending money on SyncSort for such a simple
>need?
>Will Unix sort work for me? Are there any freeware utilities available
>for my need?

Wow. That's a lot of data. At first I thought there was no way
sort would handle that. After looking at the GNU sort source,
I think I was wrong. What you need is an external
merge sort, and it looks like GNU sort does that. You will need a lot
of free disk space though, at least double the size of the two
files. If you don't have GNU sort, you can get it at:

        ftp://prep.ai.mit.edu/pub/gnu/
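
To make the idea concrete, here is a rough sketch of an external merge
sort done by hand with standard tools: split the input into runs, sort
each run, then merge the sorted runs. It is only an illustration, not how
GNU sort is actually implemented; the file names, the tiny run size, the
sample data and the use of mktemp are all assumptions. For scale, about
50 million rows of roughly 1000 bytes is on the order of 50 GB per file,
so the two inputs come to about 100 GB, and double that (taking "the two
files" to mean their combined size) is around 200 GB of free space.

```shell
#!/bin/sh
# Sketch only: an external merge sort done by hand with split(1) and sort(1).
# File names, run size and sample data are made up for illustration.
LC_ALL=C; export LC_ALL          # plain byte-wise collation

work=$(mktemp -d) || exit 1      # scratch directory (mktemp is an assumption)

# Six unsorted sample records.
printf '%s\n' delta alpha foxtrot charlie bravo echo > "$work/input.txt"

# 1. Split the input into runs small enough to sort in memory (2 lines here).
split -l 2 "$work/input.txt" "$work/run."

# 2. Sort each run separately and write it back to disk.
for f in "$work"/run.*; do
    sort "$f" > "$f.sorted"
done

# 3. Merge the sorted runs; -m merges already-sorted inputs without re-sorting.
sort -m "$work"/run.*.sorted > "$work/output.txt"

merged_runs=$(cat "$work/output.txt")
printf '%s\n' "$merged_runs"
rm -rf "$work"
```

Note that the runs and the merged output exist on disk at the same time
as the input, which is where the extra space goes.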

Good luck.

        Neil

 
 
 


Post by era eriksso » Sun, 26 Oct 1997 04:00:00



posted to comp.unix.questions,comp.unix.osf.osf1,comp.unix.shell:
 > I need to sort and merge two flat files on a
 > DEC Alpha Unix server.

Are the files presorted?

 > Will Unix sort work for me? Are there any freeware utilities available
 > for my need?

Each vendor's sort(1) will probably be different. If you have the
source, look at the source. If not, you might want to try a version
for which you do get the source. GNU sort (part of the textutils
package; prep.ai.mit.edu:/pub/gnu/textutils-*.tar.gz or a mirror close
to you) comes with source and has an option to merge presorted files
efficiently.
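
As a rough sketch of that merge option: if both files are already in
order on the same keys, sort -c can confirm it and sort -m can then merge
them in one pass. The column positions (1-3 and 4-7), the file names and
the sample records below are hypothetical, and the -k character offsets
assume the records contain no blanks, so the whole line counts as field 1.

```shell
#!/bin/sh
# Sketch only: merging two presorted files on two fixed-width keys.
# Column positions and sample records are hypothetical.
LC_ALL=C; export LC_ALL          # plain byte-wise collation

work=$(mktemp -d) || exit 1      # scratch directory (mktemp is an assumption)

# Columns 1-3 are the first key, columns 4-7 the second; the rest is payload.
printf '%s\n' AAA0001x AAA0003x BBB0002x > "$work/a.txt"
printf '%s\n' AAA0002y BBB0001y CCC0001y > "$work/b.txt"

# -k F.C,F.C selects character positions within field F. The sample records
# contain no blanks, so the whole line is field 1.
keys='-k 1.1,1.3 -k 1.4,1.7'

# -c only checks the order and exits non-zero if the file is not sorted.
for f in "$work/a.txt" "$work/b.txt"; do
    sort -c $keys "$f" || { echo "$f is not sorted" >&2; exit 1; }
done

# -m merges the presorted inputs in a single pass.
sort -m $keys "$work/a.txt" "$work/b.txt" > "$work/merged.txt"

merged_files=$(cat "$work/merged.txt")
printf '%s\n' "$merged_files"
rm -rf "$work"
```

If the files are not presorted, dropping -m and giving both files to a
single sort with the same keys sorts and merges them in one go.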

Hope this helps,

/* era */

Followups should probably go to only one newsgroup but I can't really
decide which ...

--
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>

 
 
 


Post by Tim Smither » Tue, 28 Oct 1997 04:00:00


Quite good.
I have sorted 800 MB data sets before using just the
sort that is native to DEC. As the other poster pointed
out, it is only lack of disk space that will stop you, since
sort writes its subsets to disk.
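
Since the intermediate files are what eat the disk, it can help to point
them at a file system with room to spare. A small sketch, assuming your
sort accepts -T (GNU sort and many vendor sorts do, but check the local
man page); the scratch directory and sample data are made up.

```shell
#!/bin/sh
# Sketch only: pointing sort's temporary files at a chosen directory.
# The scratch directory and sample data are hypothetical; -T is accepted by
# GNU sort and many vendor sorts, but check the local man page.
LC_ALL=C; export LC_ALL

scratch=$(mktemp -d) || exit 1   # stands in for a file system with free space

printf '%s\n' 30 10 20 > "$scratch/data.txt"

# See how much room the scratch file system has before starting.
df -k "$scratch"

# -T tells sort where to put its intermediate files; -o names the output.
sort -T "$scratch" -o "$scratch/data.sorted" "$scratch/data.txt"

sorted=$(cat "$scratch/data.sorted")
printf '%s\n' "$sorted"
rm -rf "$scratch"
```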

        -mouse

--
Tim Smithers.
DMouse Pty. Ltd.
ph: 0414 366 873 (Mobile)
ph: +61 2 9351 5698

( Perfection is a hobby many pursue,
         but few have a chance of success )

 
 
 


Post by Bob Vicker » Tue, 28 Oct 1997 04:00:00



> I need to sort and merge two flat files on a
> DEC Alpha Unix server.
> $ uname -a  gives
> OSF1 hostname V4.0 564 Alpha

> The files have approx. 50 million rows each, with rows of around 1000
> bytes.

Biju,

You should note the following lines from the Digital Unix sort man page:

 Lines longer than 1024 bytes are truncated by sort. The maximum
 number of fields on a line is 10.
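
Before relying on the native sort, it may be worth checking the data
against that limit. A quick sketch with awk; the sample file is generated
here purely for illustration, and whether the 1024 bytes includes the
trailing newline is not stated in the text quoted above, so lines close
to the limit deserve a closer look.

```shell
#!/bin/sh
# Sketch only: checking a file against a 1024-byte line limit before sorting.
# The sample file is generated here; substitute the real data file.
LC_ALL=C; export LC_ALL          # make awk's length() count bytes

work=$(mktemp -d) || exit 1      # scratch directory (mktemp is an assumption)

# Two sample lines: 1000 bytes and 1100 bytes (newline not counted).
awk 'BEGIN {
    for (i = 0; i < 1000; i++) printf "x"; printf "\n"
    for (i = 0; i < 1100; i++) printf "y"; printf "\n"
}' > "$work/sample.txt"

# Longest line in the file.
longest=$(awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$work/sample.txt")

# Number of lines over the limit.
too_long=$(awk 'length($0) > 1024 { n++ } END { print n + 0 }' "$work/sample.txt")

echo "longest line: $longest bytes; lines over 1024 bytes: $too_long"
rm -rf "$work"
```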

Bob
--
======================================================================

Dept of Computer Science, Royal Holloway College, University of London
WWW:    http://www.dcs.rhbnc.ac.uk/%7Ebobv/
Phone:  +44 1784 443691

 
 
 


Post by Joe Schun » Wed, 29 Oct 1997 04:00:00




> > I need to sort and merge two flat files on a
> > DEC Alpha Unix server.
> > $ uname -a  gives
> > OSF1 hostname V4.0 564 Alpha

> > The files have approx. 50 million rows each, with rows of around 1000
> > bytes.

> Biju,

> You should note the following lines from the Digital Unix sort man page:

>  Lines longer than 1024 bytes are truncated by sort. The maximum
>  number of fields on a line is 10.

That's industrial strength sorting ... You might want to take a look at
SyncSort/Unix http://www.syncsort.com/ -- A UNIX version descended from
a long line of big iron sorters.

--
-------------------------------------------------------------------

 
 
 

1. sort sort: 0653-657 A write error occurred while sorting (4.1.3)

On a 4.1.3 system, we intermittently are seeing:

sort: 0653-657 A write error occurred while sorting

There is nothing in errpt, and the destination file system of
the sort is certainly not full.

Does anyone have any suggestions as to what this could mean?

Thanks.

p.s. the actual sort operation is:

sort -d -f -u +0 -1 +1 -2 funds.load | awk '
  { printf "%s %s %s\n", $1, $2, $3 }' >funds.sorted

 *or it might be* :

sort -d -f -u +0 -1 acc_codes.update >acc_codes.sorted

(this is not my code, and I'm not familiar with its design).
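
For anyone reading the commands above: the +0 -1 +1 -2 form is the older
zero-based key syntax, and the -k form expresses the same keys. A sketch
of the first command rewritten that way; the contents of funds.load below
are invented, and this only illustrates the key syntax, not the cause of
the write error.

```shell
#!/bin/sh
# Sketch only: the first command above rewritten with -k keys.
# The sample funds.load contents are invented for illustration.
LC_ALL=C; export LC_ALL

work=$(mktemp -d) || exit 1      # scratch directory (mktemp is an assumption)

printf '%s\n' 'beta 2 x' 'Alpha 1 y' 'Alpha 1 y' > "$work/funds.load"

# +0 -1  (zero-based: start at field 0, stop before field 1)  ->  -k 1,1
# +1 -2  (zero-based: start at field 1, stop before field 2)  ->  -k 2,2
result=$(sort -d -f -u -k 1,1 -k 2,2 "$work/funds.load" |
    awk '{ printf "%s %s %s\n", $1, $2, $3 }')

printf '%s\n' "$result"
rm -rf "$work"
```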

Thanks,

--
