## Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

Approximate number of moves:

| Method    | Maximum       | Maximum for 1024^10 items |
|-----------|---------------|---------------------------|
| heapsort  | n * log2(n)   | 1.2677E+32                |
| quicksort | n^2           | 1.6069E+60                |
| BSAM      | n + n^0.5     | 1.2677E+30                |

For 1024^10 items, BSAM makes two orders of magnitude (10^2 times) fewer
moves than heapsort and about thirty orders of magnitude (10^30 times)
fewer moves than quicksort or radixsort.

These results are preliminary only, and may improve.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

CEC Services, 2080 Kipling St, Lakewood, CO  80215-1502   USA
Voice: 303.231.9437;  Facsimile: .231.9438;  Data:  .231.9434
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

Quote:
>Method      Maximum moves    Maximum for 1024^10 items
>heapsort    n*log2(n)        1.2677E+32
>quicksort   n^2              1.6069E+60
>BSAM        n + n^0.5        1.2677E+30
>For 1024^10 items, BSAM has 2-orders of magnitude (10^2 times) fewer
>moves than heapsort and about 30-orders of magnitude (10^30 times)
>fewer moves than with quicksort or radixsort.

Hoo boy! and I thought this evening was going to be a dull one ...

[0] 1024^10 items ?! We're talking about particles-in-the-universe orders of
magnitude here. Any ratios at that end of the curve are subject material for
"Humour in Statistics".

[1] quicksort N^2 ... true, that is the MAXIMUM, but not nearly the average,
either for uniform (random) inputs, or for real-world datasets. People have
been working for quite a while on how to make quicksort stable in the face
of the classic worst case (data already sorted). An honest figure would be K.N
log N, and a low multiplier constant K at that.

[2] heapsort N log N ... you got that one right. "Vanilla" heapsort takes, on
the average, twice as many comparisons as the quicksort average case.
"heap-with-holes" sort is somewhat closer to quicksort.

[3] radix sort N^2 ... where did you pull that from? radix sort is
heavily dependent on the domain size of the keys being sorted; it tends to be
N+M, where M is the domain size. If the domain is small, it is essentially N.
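The N+M behaviour is easiest to see in counting sort, the one-digit case of radix sort. A toy sketch (mine, for illustration; assumes integer keys in a known small domain):

```python
def counting_sort(keys, domain):
    """Counting sort: O(N + M), where M = domain size.
    Assumes every key is an int in range(domain)."""
    counts = [0] * domain              # M work to set up
    for k in keys:                     # N work to tally
        counts[k] += 1
    out = []
    for value, c in enumerate(counts): # M + N work to emit
        out.extend([value] * c)
    return out

print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6], 10))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

When M is small relative to N the counting passes are dominated by the two N-length loops, which is the "essentially N" case described above.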

[4] BSAM N+sqrt(N) ... If that's true, then you are about 20:1 over other N
log N methods at 1M records. On the other hand, it's hard to have a "binary
search" that doesn't show a "logN" term in it. This is going to be INTERESTING.

Quote:>These results are preliminary only, and may improve.

Would be hard for them not to.

### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

|>
|> Method      Maximum moves    Maximum for 1024^10 items
|>
|> heapsort    n*log2(n)        1.2677E+32
|> quicksort   n^2              1.6069E+60
|> BSAM        n + n^0.5        1.2677E+30
|>
|> For 1024^10 items, BSAM has 2-orders of magnitude (10^2 times) fewer
|> moves than heapsort and about 30-orders of magnitude (10^30 times)
|> fewer moves than with quicksort or radixsort.
|>
|> These results are preliminary only, and may improve.

Yes.  They certainly should.  I would give very poor marks to anyone who
couldn't do better than that.

There is a well-known, easy and very old algorithm for sorting N objects
with at most N-1 swaps (which is less than N+sqrt(N)), by generating an
index separately from the movement.  I implemented this in the NAG library
(using merge sort to get the indices) and it performs very well.  As far
as I remember, Knuth doesn't really describe it (i.e. he hints at its
existence, and leaves the details to the imagination of the reader), but
my memory may be at fault.
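The scheme described, as far as I can reconstruct it (my sketch, not the NAG code): build an index by comparisons alone, then follow permutation cycles so that every swap puts one element into its final position, giving at most N-1 swaps in total.

```python
def sort_with_few_swaps(a):
    """Sort `a` in place using at most len(a)-1 swaps.
    The index is built without moving any data; the data
    movement then follows permutation cycles."""
    n = len(a)
    idx = sorted(range(n), key=a.__getitem__)  # idx[k] = position of k-th smallest
    dest = [0] * n
    for k, i in enumerate(idx):                # dest[i] = final position of a[i]
        dest[i] = k
    swaps = 0
    for i in range(n):
        while dest[i] != i:                    # follow the cycle through position i
            j = dest[i]
            a[i], a[j] = a[j], a[i]
            dest[i], dest[j] = dest[j], dest[i]
            swaps += 1
    return swaps

data = [5, 2, 9, 1, 7]
print(sort_with_few_swaps(data), data)  # 3 [1, 2, 5, 7, 9]
```

A permutation with C cycles needs exactly N-C swaps, so N-1 is the worst case (one big cycle) and already-sorted input needs none.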

Nick Maclaren,
University of Cambridge Computer Laboratory,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.

Tel.:  +44 1223 334761    Fax:  +44 1223 334679

### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

Quote:
>Method      Maximum moves    Maximum for 1024^10 items
>heapsort    n*log2(n)        1.2677E+32
>quicksort   n^2              1.6069E+60
>BSAM        n + n^0.5        1.2677E+30
>For 1024^10 items, BSAM has 2-orders of magnitude (10^2 times) fewer
>moves than heapsort and about 30-orders of magnitude (10^30 times)
>fewer moves than with quicksort or radixsort.

Since few current commercial computers have more than about
1G main memory, I'm not sure your numbers have any relevance
to the viability of the BSAM as an alternative sorting method
at all. To sort more than about 1G items requires that most
of them reside on disk during the sort, which means that your
algorithm will be affected dramatically by I/O, caching,
and clustering.

But let's forget that and assume that sorting can be done
in memory, and ignore I/O implications.

Given this, comparisons are only relevant down at 1M-1G
items using today's (massively parallel, supercomputing)
architectures. On 1G (1e9) items, one gets

quicksort O(n*log2(n)) =     29 897 352 853
BSAM      O(n*sqrt(n)) = 31 622 776 601 683

Goodness.  Assuming that your claim of O(n*sqrt(n)) is
accurate, then BSAM is 3 orders of magnitude SLOWER.

Incidentally, I see that you point out that the Binary Sort
ACCESS Method is really a sorting method, after all.  I expect
you'll find that people aren't really all that interested in
sorting any more.  Searching using indices, and maintaining the
indices, is a far more important issue.

(here's part of my Unix 'bc' session, if you don't believe me).

10^9
1000000000
10^9*(l(10^9)/l(2))
29897352853.9 (decimals stripped off)
10^9*sqrt(10^9)
31622776601683.7 (decimals stripped off)
l(10^9)/l(2)
29.89735285398626113114
sqrt(10^9)
31622.77660168379331998893
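The same arithmetic in Python, for what it's worth; note the last line, since the table above actually lists n + sqrt(n), not n * sqrt(n):

```python
import math

n = 10**9
print(f"n*log2(n) = {n * math.log2(n):,.0f}")  # ≈ 29,897,352,854
print(f"n*sqrt(n) = {n * math.sqrt(n):,.0f}")  # ≈ 31,622,776,601,684
print(f"n+sqrt(n) = {n + math.sqrt(n):,.0f}")  # ≈ 1,000,031,623
```

If the claimed figure really is n + sqrt(n), it is smaller than n*log2(n) at 1e9 items, not larger.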

LEB
--
Larry Baker

My opinions are my own.  Don't mistake them for facts.

### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

| >Method      Maximum moves    Maximum for 1024^10 items
| >heapsort    n*log2(n)        1.2677E+32
| >quicksort   n^2              1.6069E+60
| >BSAM        n + n^0.5        1.2677E+30
| >For 1024^10 items, BSAM has 2-orders of magnitude (10^2 times) fewer
| >moves than heapsort and about 30-orders of magnitude (10^30 times)
| >fewer moves than with quicksort or radixsort.

[material deleted which attempted to trivialize my statistics]

| On 1G (1e9) items, one gets
|
| quicksort O(n*log2(n)) =     29 897 352 853
| BSAM      O(n*sqrt(n)) = 31 622 776 601 683
|
| Goodness.  Assuming that your claim of O(n*sqrt(n)) is
| accurate, then BSAM is 3 orders of magnitude SLOWER.

My statistics do not claim n*sqrt(n) but n + sqrt(n);  hence BSAM
is not three orders of magnitude slower.

[material deleted speculating about interest in sorting vs indexing]

| (here's part of my Unix 'bc' session, if you con't believe me).

(I have no reason not to believe you, but I don't really care about
Unix, do you?  I only use Caldera Linux because it is the most modest
distribution of my seamless and reversible case tool.)

[statistics deleted which are not related by text to the discussion]


### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

| |> These results are preliminary only, and may improve.
|
| Yes.  They certainly should.  I would give very poor marks to anyone who
| couldn't do better than that.
|
| There is a well-known, easy and very old algorithm for sorting N objects
| with at most N-1 swaps (which is less than N+sqrt(N)), by generating an
| index separately from the movement.

[garbled, but interesting, Knuth / NAG stuff deleted]

Are "swaps" really the same thing as "moves", or shouldn't "swaps"
really be "2*moves", in which case 2 * (N - 1) is really larger than
N + sqrt(N), isn't it?


### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

possible deletions:

| | There is a well-known, easy and very old algorithm for sorting N objects
| | with at most N-1 swaps (which is less than N+sqrt(N))

A recent poster on comp.theory from cs.monash.edu.au states the number
of comparisons for that type of list merge sort with indexes is
log(N!).  So which is it?
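The two figures need not be in conflict: log2(N!) bounds the *comparisons* a merge sort makes while building the index, while N-1 bounds the *swaps* that afterwards move the data. A quick check of the magnitudes (Python, illustrative only):

```python
import math

n = 1_000_000
comparisons = math.lgamma(n + 1) / math.log(2)  # log2(n!): comparison lower bound
swaps = n - 1                                   # data-movement bound from the index
print(f"log2({n}!) ~= {comparisons:,.0f} comparisons; at most {swaps:,} swaps")
```

By Stirling's approximation log2(N!) ≈ N log2 N - 1.44 N, so the comparison count grows like N log N while the movement count stays linear; they measure different costs.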


### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

|> | |> These results are preliminary only, and may improve.
|> |
|> | Yes.  They certainly should.  I would give very poor marks to anyone who
|> | couldn't do better than that.
|> |
|> | There is a well-known, easy and very old algorithm for sorting N objects
|> | with at most N-1 swaps (which is less than N+sqrt(N)), by generating an
|> | index separately from the movement.
|>
|> [garbled, but interesting, Knuth / NAG stuff deleted]
|>
|> Are "swaps" really the same thing as "moves", or shouldn't "swaps"
|> really be "2*moves", in which case 2 * ( N - 1) is really larger than
|> N + sqrt( N), isn't it.

No, it would actually be 1.5*moves if you take that approach.  But this is
getting boring.  I think that we should have a shoot-out :-)

If you allow bulk moves to count only as one, or arrays to be reordered
to a new location, it is trivial to reorder an array in N+1 or N moves.
So let us allow only elemental moves, and remember that we must count
all moves to, from and within workspace.  Please post a series of up to
11 (i.e. N+log2(N)) moves that will sort the following array in place
into increasing order:

2, 1, 4, 3, 6, 5, 8, 7

If you can do this, I shall humbly apologise.  If not, ....

Nick Maclaren,
University of Cambridge Computer Laboratory,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.

Tel.:  +44 1223 334761    Fax:  +44 1223 334679

### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

Quote:>For 1024^10 items, BSAM has 2-orders of magnitude (10^2 times) fewer

^^^^^^^
1024^10 = 2^100 ≈ 1.27 * 10^30 is a LOT of items to sort!  What kind of
storage media do you want to store these on while sorting?

Also, like others here, I find it somewhat odd that BSAM "Binary Sort Access
Method" is actually a SORTing method, not an ACCESS method.  Would you
consider renaming it to something that makes it more clear what it is?

-Kyle

### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

: Unix, do you?  I only use Caldera Linux because it is the most modest
: distribution of my seamless and reversible case tool.)

Colin's tool is seamless and reversible!!??!?!

Sounds like someone's shining on themselves again.

;-)

--
=====================================================================
Paul Brown                                     ^..^

#include <std_disclaimer.h>
"Think global - act loco!" - Zippy the Pinhead
=====================================================================

### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

| So let us allow only elemental moves, and remember that we must count
| all moves to, from and within workspace.  Please post a series of up to
| 11 (i.e. N+log2(N)) moves that will sort the following array in place
| into increasing order:
|
|     2, 1, 4, 3, 6, 5, 8, 7
|

Please remind me to do this after the patent issues because it would
show just exactly how the sort works.

| If you can do this, I shall humbly apologise.  If not, ....
.                       ^^^^^^^^^^^^^^^^^^^^^^
I was unaware that Cambridge computer staff (much more the NAG) ever
sought absolution.  How refreshing;  CS Lewis would be proud!


### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

>>Method      Maximum moves    Maximum for 1024^10 items
>>
>>heapsort    n*log2(n)        1.2677E+32
>>quicksort   n^2              1.6069E+60
>>BSAM        n + n^0.5        1.2677E+30

>>For 1024^10 items, BSAM has 2-orders of magnitude (10^2 times) fewer
>>moves than heapsort and about 30-orders of magnitude (10^30 times)
>>fewer moves than with quicksort or radixsort.

>Hoo boy! and I thought this evening was going to be a dull one ...

>[0] 1024^10 items ?! We're talking about particles-in-the-universe orders of
>magnitude here. Any ratios at that end of the curve are subject material for
>"Humour in Statistics".

>[1] quicksort N^2 ... true, that is the MAXIMUM, but not nearly the average,
>either for uniform (random) inputs, or for real-world datasets. People have
>been working for quite a while on how to make quicksort stable in the face
>of the classic worst case (data already sorted).

This is only true if you choose the "median" to be the first or last element!
The worst-case behavior occurs when the "median" is the smallest or largest
element (or close to it) in the partition.  Simply choosing the "median" to be
the middle element (the actual median of already-sorted data!) results in good
behavior on sorted input.  I felt obliged to post because there seems to be a
myth circulating in this newsgroup that qsort's worst-case behavior is on
sorted data, when in fact the worst case depends on how close the "median" is
to the actual median (uneven partitioning results otherwise)!
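The pivot point is easy to demonstrate with a toy comparison counter (mine, not a production quicksort; assumes distinct keys, and charges one comparison per element per partitioning pass):

```python
def qs_comparisons(a, pivot_rule):
    """Approximate comparison count for plain quicksort under a
    given pivot rule ('first' or 'middle').  Distinct keys assumed."""
    if len(a) < 2:
        return 0
    pivot = a[0] if pivot_rule == "first" else a[len(a) // 2]
    less = [x for x in a if x < pivot]
    greater = [x for x in a if x > pivot]
    return len(a) + qs_comparisons(less, pivot_rule) + qs_comparisons(greater, pivot_rule)

already_sorted = list(range(512))
print(qs_comparisons(already_sorted, "first"))   # 131327: degenerates to ~n^2/2
print(qs_comparisons(already_sorted, "middle"))  # a few thousand: ~n*log2(n)
```

On sorted input the first element is always the minimum, so every partition is maximally uneven; the middle element is the exact median, so the splits are balanced.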


Quote:>An honest figure would be K.N
>log N, and a low multiplier constant K at that.

>[2] heapsort N log N ... you got that one right. "Vanilla" heapsort takes, on
>the average, twice as many comparisons as the quicksort average case.
>"heap-with-holes" sort is somewhat closer to quicksort.

>[3] radix sort N^2 ... where did you pull that from? radix sort is
>heavily dependent on the domain size of the keys being sorted; it tends to be
>N+M, where M is the domain size. If the domain is small, it is essentially N.

>[4] BSAM N+sqrt(N) ... If that's true, then you are about 20:1 over other N
>log N methods at 1M records. On the other hand, it's hard to have a "binary
>search" that doesn't show a "logN" term in it. This is going to be INTERESTING.

>>These results are preliminary only, and may improve.

>Would be hard for them not to.

### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

| So let us allow only elemental moves, and remember that we must count
| all moves to, from and within workspace.  Please post a series of up to
| 11 (i.e. N+log2(N)) moves that will sort the following array in place
| into increasing order:
|
|     2, 1, 4, 3, 6, 5, 8, 7
|

Please remind me to do this after the patent issues because it would
show just exactly how the sort works.

Umm, Colin ... I'm afraid you've fallen for the trap.  I believe
that there is no way to perform the above sort without at least 12
moves.
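The figure is easy to confirm by brute counting: the array is four transposed pairs, each in-place swap through a one-cell temporary costs three elemental moves, and 4 x 3 = 12 > 11. A quick check (mine, a selection-style placement, not anyone's proprietary sort):

```python
def selection_sort_counting_moves(a):
    """Sort `a` in place, counting every elemental move,
    including moves through the one-cell temporary."""
    moves = 0
    for i in range(len(a)):
        j = a.index(min(a[i:]), i)   # comparisons are free here; only moves count
        if j != i:
            tmp = a[i]               # move 1
            a[i] = a[j]              # move 2
            a[j] = tmp               # move 3
            moves += 3
    return moves

a = [2, 1, 4, 3, 6, 5, 8, 7]
print(selection_sort_counting_moves(a), a)  # 12 [1, 2, 3, 4, 5, 6, 7, 8]
```

Only out-of-place elements are touched, so this already achieves the minimum number of swaps for this permutation; the 12 comes entirely from the 3-moves-per-swap accounting.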
--

"You cannot pray to a personal computer no matter how user-friendly it is."
(PGP key available upon request)                             W. Bingham Hunter

### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

deletions:

|
|    | So let us allow only elemental moves, and remember that we must count
|    | all moves to, from and within workspace.  Please post a series of up to
|    | 11 (i.e. N+log2(N)) moves that will sort the following array in place
|    | into increasing order:
|    |
|    |     2, 1, 4, 3, 6, 5, 8, 7
|    |
|
|    Please remind me to do this after the patent issues because it would
|    show just exactly how the sort works.
|
| Umm, Colin ... I'm afraid you've fallen for the trap.  I believe
| that there is no way to perform the above sort without at least 12
| moves.

So what.

I did not analyze the challenge because it would have compromised the
protection afforded me by the patent process.

You are mistaken that I fell for a trap, just because I declined
participation in a meaningless exercise.

Maclaren's further comments were the amusing part, anyway, and
obviously constitute the value of his post.  (See Maclaren's comments
about "humbly apologizing" -- overstated irony through repetition of
similar words, as in poetry: "colorful red" or "sweet sugar".)


### Preliminary statistics for The Binary Sort Access Method [BSAM], a sorting method

Quote:

>For 1024^10 items, BSAM has 2-orders of magnitude (10^2 times) fewer
>moves than heapsort and about 30-orders of magnitude (10^30 times)
>fewer moves than with quicksort or radixsort.

Let me get this straight:

1024^10 = (2^10)^10 = 2^100 ≈ 1.27 * 10^30 records.

To have distinct key values, the keys must be at least 100 bits (13 bytes)
long.  So you are indicating  that if I have a database of at least
1.6 * 10^31 bytes (That's 1.6 * 10^19 terabytes) we can get a factor of
100 performance increase?

I would find it hard to believe there is more than 10^5 terabytes of hard
disk space in the world (that's 100 million 1 gigabyte drives).
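The arithmetic above checks out; in Python (variable names mine):

```python
n = 1024 ** 10                    # = 2**100, about 1.27e30 records
key_bits = (n - 1).bit_length()   # 100 bits suffice for n distinct keys
key_bytes = (key_bits + 7) // 8   # 13 bytes per key, rounded up to whole bytes
total_bytes = n * key_bytes
print(f"{float(n):.4e} records * {key_bytes} bytes = "
      f"{float(total_bytes):.4e} bytes = {total_bytes / 1e12:.4e} terabytes")
```

That is roughly 1.65e31 bytes of keys alone, before any record payload, workspace, or index overhead.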

So if you must keep how your algorithm works a secret, could you please
explain how numbers like these are going to be relevant before the
end of the next century?

Thanks,
Mark Butler
