## To Btree or not to Btree, that is the question...

### To Btree or not to Btree, that is the question...

Any advice on this will be gratefully recieved...

Scenario
--------

Table containing details of individuals. Each individual can belong to an
arbitrary number of sets. Within each set, the individual may be a
participant or an extra.

Separate table describes the sets. It contains three columns. First is a
set identifier (i2), second is the id number of the individual from the
other table (i4), third is flag describing the individuals role in this set
(i1).

Both tables are essentially static.

The queries that are executed on the second table are of two kinds.

The first is "Give me the id numbers of all members of this set". The
second is "Give me the id numbers of all actively participating members of
this set".

The most sensible choice of table structure for the latter table seems to
be ISAM on set_identifier and role_flag. However, there are between 90 and
3,000 members in each set. The latter figure results in a large overflow if
I use ISAM.

So, ...

Is a btree going to be more efficient than ISAM with overflow, since I can
see no way to get rid of the overflow?

TMIA,

Andy Law
------------------

( Big Nose in Edinburgh )

### To Btree or not to Btree, that is the question...

Quote:> [snip]
>The queries that are executed on the second table are of two kinds.

>The first is "Give me the id numbers of all members of this set". The
>second is "Give me the id numbers of all actively participating members of
>this set".

>The most sensible choice of table structure for the latter table seems to
>be ISAM on set_identifier and role_flag. However, there are between 90 and
>3,000 members in each set. The latter figure results in a large overflow if
>I use ISAM.
>Is a btree going to be more efficient than ISAM with overflow, since I can
>see no way to get rid of the overflow?

I'm going to stick my neck out a little, and say that since you have to
read the rows anyway, the overflow in this case is not real relevant.
I'd stay with ISAM.  In fact, if most of the time, most members are
actively participating, I might even consider hash on set_identifier.
With 2K pages you are still only reading something like 15 pages in the
worst case.

Why not try a couple sample queries either way, and see what actual
execution has to say?  (you can use dbmsinfo or trace point qe90 to get
actual execution numbers.)

Karl R. Schendel, Jr.

Ingres and Unix Expertise

President, North American Ingres Users Association

Join the NAIUA and make your voice heard!

### To Btree or not to Btree, that is the question...

Quote:>The most sensible choice of table structure for the latter table seems to
>be ISAM on set_identifier and role_flag. However, there are between 90 and
>3,000 members in each set. The latter figure results in a large overflow if
>I use ISAM.

I am somewhat puzzled by this, as I have never seen significant
overflow problems on a static ISAM table. This of course assumes that
considered adding the individual id to the (which can normally be
ignored but would presumably make it unique)?

I would anticipate BTREE being better than a heavily overloaded ISAM.

Mark Crowder
The British Library

### To Btree or not to Btree, that is the question...

Hello!

I wouldn't worry about the overflow unless you are updating the table
frequently and you need to have high concurrency. The performace difference
between ISAM and BTREE structure is not significant. If the tables are used for
OLAP applications, even HEAP structure will be fine for your query providing
that you have defined the proper indexes.

The decision for storage structure must be based on the managability and the
concurrency of the table, not for query tuning, in my opnion.

Regards, Young HAN
International Telecommunication Union

> Any advice on this will be gratefully recieved...

> Scenario
> --------

> Table containing details of individuals. Each individual can belong to an
> arbitrary number of sets. Within each set, the individual may be a
> participant or an extra.

> Separate table describes the sets. It contains three columns. First is a
> set identifier (i2), second is the id number of the individual from the
> other table (i4), third is flag describing the individuals role in this set
> (i1).

> Both tables are essentially static.

> The queries that are executed on the second table are of two kinds.

> The first is "Give me the id numbers of all members of this set". The
> second is "Give me the id numbers of all actively participating members of
> this set".

> The most sensible choice of table structure for the latter table seems to
> be ISAM on set_identifier and role_flag. However, there are between 90 and
> 3,000 members in each set. The latter figure results in a large overflow if
> I use ISAM.

> So, ...

> Is a btree going to be more efficient than ISAM with overflow, since I can
> see no way to get rid of the overflow?

> TMIA,

> Andy Law
> ------------------

> ( Big Nose in Edinburgh )

Hi, everybody!

I was wonderring if there is somebody out there who could help me with
understand how index extensions work...
Let me state the problem first.

I have many (15) boolean attributes and I need to be able to search the
database for entries with any combination of those attributes for being
true. For example - find all the entries, where a1=a2=a3=true or find
all the entries where a1=a2=a4=true etc...
Because there are so many of them (and the database is HUGE), putting
every attribute into a separate column and creating a separate index on
every possible combination, is really out of the question.
So, I was thinking about creating a single int2 column, with each bit
representing an attribute - so that, the first query I quoted above
would look like "select * from table where attributes & 7 = 7", and the
other query would be
"select * from table where attributes & 11 = 11' etc...

This looked so beautiful to me, but now I am stuck trying to index that
table  [:-(]

I started off, hoping to get away with btrees.

I defined an operator >>=(int2,int2) as 'select \$1&\$2=\$2;'
It looks nice so far, but then the question is - what do I do with the
other operations? By analogy with 'normal' comparison operators, I would do:

>> (I know the name is taken  [:-)]  as 'select not \$2 >>= \$1'
=<<                                  as 'select \$2 >>= \$1'
<<                                   as 'select not \$1 >>= \$2'
... and leave '=' intact.

But then I realized, that these set of operators, does not really define
a complete order - for example, if I compare, say, 5 and 3:
5 & 3 = 1, 3 & 5 = 1, so I get BOTH 5 << 3 and 5 >> 3 being true at the
same time  [:-(]

So my question is, first of all, is that a problem? Does btree require a
complete order defined? Will it work with partial order?
Secondly, if it is a problem, perhaps, I am missing something here,
assuming that there is no way to define a set of operations to do what I
want and provide a completely ordered set (or do I need it to define a
perfect complete order - what exactly is required for btree to work? Any
ideas?)

And finally, if there is just no way I could get away with btrees, can I
make an rtree to work for me? Could somebody explain to me (or point me
to a doc somewhere) the meaning of the strategies (and requirements -
like transitivity etc...) I need for an rtree, and also what support
functions (like comparison func in case of a btree) do I need?

Thank you very much for your attention.
Any input will be greatly appreciated.

Dima