index duplicates PK, is it redundant?

index duplicates PK, is it redundant?

Post by bil » Fri, 11 Jul 2003 03:44:51



I came across a database that has an index with the same columns, in
the same order, as the primary key.  Is this index likely to be
useless and redundant?  I expect that if the index had different sort
orders it could be useful.

How about one index that duplicates another - if an index duplicates
another, would the second index be useless, redundant, and a drag on
the system?

Are PKs implemented by an index, so if an index that duplicates a PK,
this is the same as an index that duplicates another?

What space is used by the PK?  Is it like a MSSQLServer key and can be
clustered or not, or is the PK always stored in the same space as the
table?

Bill

 
 
 

index duplicates PK, is it redundant?

Post by Andrew Ham » Fri, 11 Jul 2003 10:30:34



> I came across a database that has an index with the same columns, in
> the same order, as the primary key.  Is this index likely to be
> useless and redundant?  I expect that if the index had different sort
> orders it could be useful.

No, it's not useless. The PK will "piggy-back" onto the index of the same
shape.

There are actually some administration benefits to doing it that way. You
can specifically remove the primary key and yet the index will keep working
allowing improved performance while you perform administration tasks. If the
PK is the index (ie based on the secret underlying index) then you can't
disable the PK without also disabling the index and therefore losing some
performance benefits.

Quote:> How about one index that duplicates another - if an index duplicates
> another, would the second index be useless, redundant, and a drag on
> the system?

Theoretically (and in practice?) the engine should block the existence of an
identical index. However it does appear to allow one index to exist that is
a sub-set of another index. That sorta makes sense IF the smaller index
allows duplicates and the larger index is unique, but even so I'd be
automatically suspicious of any indexs that are "too close" to each other.
I'd have to see a very strong case that proves they are relevant.

Quote:> Are PKs implemented by an index, so if an index that duplicates a PK,
> this is the same as an index that duplicates another?

If a suitable unique index already exists when a PK is defined, then the PK
will piggy-back a ride on the index. If no suitable index exists, then the
engine will create it's own index. If you look in sysindexes, you'll notice
that these secret indexes have a name that starts with a space character,
therefore it becomes impossible for anyone to affect that index with
ordinary SQL statements. You *could* affect them by doing * to the
sysindexes and other tables, but that's just begging for trouble when you
*up some underlying relationship that should exist...

Quote:> What space is used by the PK?  Is it like a MSSQLServer key and can be
> clustered or not, or is the PK always stored in the same space as the
> table?

Errrr emm errr....

Mapping the space concepts of one engine to another is not always possible
or complete. Informix indexes may be stored either with or separate from the
rows of the table. Even if stored together the index pages occupy different
"extents" within the "tablespace".

Any informix index can be clustered, but only one per table. For Informix,
the clustering operation is a one-off operation which physically sorts the
data rows to follow the sequence of the index. You need to re-apply the
clustering operation on a regular basis if events such as row deletions,
modifications, or irregular insertion of rows causes the clustering effect
to deteriorate.

Note that this operation demands the existence of enough space to duplicate
the table, since the old rows are not destroyed until the new copy of the
table is completely written. Further, this extra space must exist in the
dbspace(s) that the table currently occupies.

However if you have a table, clustered perhaps on the always-increasing key,
and rows are never deleted, then the table will maintain it's natural
clustering because new rows will be added to the end and the key will always
be bigger than the previous row. In fact, for such a table, applying a
clustering operation is a moot point because the laws of nature keep the
data rows in physical order.

--
There's nowt wrong wi' owt what mitherin' clutterbucks don't barly grummit!
-
Replies directly to this message will go to an account that may not be
checked for a week or two. For more timely e-mail response, use (only
in an emergency) ahamm sanderson net au with all the usual punctuation.

 
 
 

index duplicates PK, is it redundant?

Post by Mark Denha » Fri, 11 Jul 2003 11:14:07


Yes. No point in having 2 identical indexes, chances are the optimizer is
only going to use one of them. If one is a PK then that will be used.

PKs are implemented as an index. If you do not provide an explicit name for
the PK the engine generates one for you. I think it is best to deliberately
specify a name for the PK.

Since the PK is enforced via an index, the answer to you last question is it
depends:

1) Version of informix. With 9.x indexes are separate from the table by
default.
2) If you create the index first and then add the PK, the PK will use the
existing index. The location specified for the index dictates where the
index is stored. In pre9 versions, if you do not specify a location then the
index will be stored in the same extents as the data [the index will have
separate pages].
3) In 9.x, if you set the environment variable DEFAULT_ATTACH=1 when the
engine is started, indexes will be created in the same extents as the table
if you do not explicitly change the location [ie 7.x functionality is
adhered too].

Mark

----- Original Message -----


Sent: Wednesday, July 09, 2003 14:44
Subject: index duplicates PK, is it redundant?

> I came across a database that has an index with the same columns, in
> the same order, as the primary key.  Is this index likely to be
> useless and redundant?  I expect that if the index had different sort
> orders it could be useful.

> How about one index that duplicates another - if an index duplicates
> another, would the second index be useless, redundant, and a drag on
> the system?

> Are PKs implemented by an index, so if an index that duplicates a PK,
> this is the same as an index that duplicates another?

> What space is used by the PK?  Is it like a MSSQLServer key and can be
> clustered or not, or is the PK always stored in the same space as the
> table?

> Bill

sending to informix-list
 
 
 

index duplicates PK, is it redundant?

Post by Neil Trub » Fri, 11 Jul 2003 16:40:29




> > I came across a database that has an index with the same columns, in
> > the same order, as the primary key.  Is this index likely to be
> > useless and redundant?  I expect that if the index had different sort
> > orders it could be useful.

> No, it's not useless. The PK will "piggy-back" onto the index of the same
> shape.

So you're saying that the engine will not create another index of the same
construction, and you won't suffer the additional overhead of maintaining
one extra physical index?
 
 
 

index duplicates PK, is it redundant?

Post by Mark D. Stoc » Fri, 11 Jul 2003 18:56:35



> I came across a database that has an index with the same columns, in
> the same order, as the primary key.  Is this index likely to be
> useless and redundant?  I expect that if the index had different sort
> orders it could be useful.

Primary keys are implemented using an index, so the index was most
likely created by the primary key. If the index has a sensible name,
then it was most likely created manually, and the primary key is using
it as well.

Quote:> How about one index that duplicates another - if an index duplicates
> another, would the second index be useless, redundant, and a drag on
> the system?

AFAIK, you cannot create a duplicate index. You should receive an error
if you try this, although I haven't tried it for some time. If you
create a primary key and a similar index exists, then that index is used
by the primary key.

Quote:> Are PKs implemented by an index, so if an index that duplicates a PK,
> this is the same as an index that duplicates another?

I think I covered this.

Quote:> What space is used by the PK?  Is it like a MSSQLServer key and can be
> clustered or not, or is the PK always stored in the same space as the
> table?

I don't know SQLServer, but clustering in Informix is a term referring
to the order of the data, not the indexes. However, you implement this
by creating a clustered index, which will physically order the data in
the table. The location of the index pages is something different.

In version 7 and before, index pages are stored with the data pages by
default. You can explicitly detach indexes by locating them in a
different dbspace or fragment them.

In version 9, index pages are NOT stored with the data pages by default,
they are detached. You can still explicitly locate them in a different
dbspace or fragment them.

Cheers,
--
Mark.

+----------------------------------------------------------+-----------+

| Mydas Solutions Ltd            http://MydasSolutions.com |/////  / //|
|                      +-----------------------------------+////  / ///|
|                      |We value your comments, which have |///  / ////|
|                      |been recorded and automatically    |//  / /////|
|                      |emailed back to us for our records.|/  ////////|
+----------------------+-----------------------------------+-----------+

sending to informix-list

 
 
 

index duplicates PK, is it redundant?

Post by Art S. Kage » Fri, 11 Jul 2003 21:39:02






>> > I came across a database that has an index with the same columns, in
>> > the same order, as the primary key.  Is this index likely to be
>> > useless and redundant?  I expect that if the index had different sort
>> > orders it could be useful.

>> No, it's not useless. The PK will "piggy-back" onto the index of the
>> same shape.

> So you're saying that the engine will not create another index of the
> same construction, and you won't suffer the additional overhead of
> maintaining one extra physical index?

Correct, IDS will NOT create two indexes with the identical set of keys
in the same order with the identical sort ordering.  You CAN create
multiple indexes with the same keys in different order or with the
ASCending/DESCending attributes of the columns different.

If you create a PK and an index on those columns already exists the PK
will use the existing index and no create a new one, yes.

Art S. Kagel

 
 
 

index duplicates PK, is it redundant?

Post by Art S. Kage » Fri, 11 Jul 2003 21:58:16


Most of this has been covered so just briefs, but the last needs more
comment:

Quote:> I came across a database that has an index with the same columns, in the
> same order, as the primary key.  Is this index likely to be useless and
> redundant?  I expect that if the index had different sort orders it
> could be useful.

As noted the primary key is using that index.  Check out the
sysconstraints record for that constraint.

Quote:> How about one index that duplicates another - if an index duplicates
> another, would the second index be useless, redundant, and a drag on the
> system?

As noted, you cannot get a dup unless the order of the columns is
different or the ASCending/DESCending attributes of the ey fields are
different.  BTW IDS can read an index in either order so these
attributes are only useful when the combination of them in the keys
is not strictly the inverse of another index.  When they are inverses the
optimizer will likely use whichever was created first as the sort and
filter values of each as well as the depth and node counts will be
identical.

Quote:> Are PKs implemented by an index, so if an index that duplicates a PK,
> this is the same as an index that duplicates another?

When you create a PK it will use any existing unique index on the
selected key(s), if none exist it creates a 'hidden' index, one whose
name begins with a space character.

Quote:> What space is used by the PK?  Is it like a MSSQLServer key and can be
> clustered or not, or is the PK always stored in the same space as the
> table?

Just the space needed by the constraint record (sysconstraints) and the
space needed by the primary key index.  Clustering in Informix is
different than in SQLServer and its progenitor Sybase (and different
again in Oracle).  Sybase/SQLServer Clustered indexes remove the table
altogether and hang each data row off the index key node.  These indexes
remain clustered over time and are VERY efficient for selecting a single
row by that one key.  HOWEVER, a Sybase/SQLServer cluster index causes
the table to be SLIGHTLY less efficient to access using other indexes and
VERY INEFFICIENT to access using any non-indexed field (ie during a table
scan) or even to retrieve a range of keys.  

Informix Clutered index is simply another index but the table's rows have
been sorted on disk so that they are stored in the same order as their
keys in the index so that rows with adjacent keys are adjacent on disk.
This is VERY efficient for any kind of table scan using the cluster index
including ranges of keys without affecting the efficiency of access via
other indexes at all.  The downside is that the table does not remain
clustered long and the efficiency is lost over time so the index must be
'reclustered' periodically.

IDS index location was covered in another post.

Art S. Kagel

 
 
 

index duplicates PK, is it redundant?

Post by Andrew Ham » Sat, 12 Jul 2003 13:57:40



>> No, it's not useless. The PK will "piggy-back" onto the index of the
>> same shape.

> So you're saying that the engine will not create another index of the
> same construction, and you won't suffer the additional overhead of
> maintaining one extra physical index?

Bang on. Additionally, you make a modest gain for administration as I've
described and that's enough for me to prefer an explicit index first with a
piggy-back PK.

The same sort of deal happens with FK's by the way. They also require an
index on the columns that relate to the PRIMARY table. However some foreign
keys are very very poor candidates for indexing, so in a production
environment you may need to make a pragmatic decision to drop a FK
relationship in the interests of decent performance. In development, it's
about a thousand times more important to get debugging effects over
performance, so I say, put in every FK you can think of into development.

The bad ones are ones that have a lot of duplication in the table containing
the FK. For instance, imagine you have a US state reference table; the first
field is the state code eg CA, IL, etc, and the 2nd field can be a
description. A simple enough reference table.

Now, you might have an invoice table with several hundred thousand rows at
least. Let's say you apply a FK relationship to the state table. This
requires an index to be built on the state column of invoice table. If the
table has 500,000 rows and there are 50 states (give or take a Carolina or
two) then on average there will be 10,000 rows in the invoice table for each
state. Some states will of course be more highly represented...

Now, 10,000 identical elements in an index are basically stored by storing
one copy of the key value (eg 'CA' ) and then a linked list of the row
identifiers (not exactly the rowid) which means a linked list of 10,000
elements.

Managing a linked list of 10,000 elements is not a fast thing. If you ever
modify the state, or indeed delete a row from the invoice table, count on it
taking a large amount of time to find each pointer for each row. Deleting
10,000 elements for example during a purge, could take a large part of a
week to complete...

I think Informix should add the option of an index-free foreign key
relationship. That needs some explanation:

Firstly, the reference table containing the PK will by necessity be a unique
index. So that's fine. Welcome aboard. The uniq index on the PK is necessary
and sufficient for a very fast check that the column in the foreign table
exists in the primary table.

Just from that, you might wonder why the index is needed on the FK fields in
the first place. It's used whenever you attempt to delete a row from the
primary table. That is only legal if there is no related row in another
table. Therefore an index on the FK fields should be a quick way to find if
any related rows exist.

Well, it would be fast to find, but not frikken fast if you've defined
cascaded delete and there's a huge number of duplicates. Perhaps FK indexes
could be designed to optionally contain only the key value and a count of
referents. In that case, it will be very fast to maintain, still very fast
to find a relationship, and if you get silly with a cascaded delete, well,
the engine can do a sequential scan almost as quickly as using a very poorly
distributed index.

By the same argument, I think there's room to optionally declare a FK
relationship to be implemented completely without an index on the FK side.

Thankfully, an FK relationship between a pair of header/detail tables, such
as an invoice and the line items, will have a very nice index since on
average the FK index will only have as many duplicates as there are line
items per invoice, for example 10 on average.

 
 
 

1. Duplicate Indexes with PK

Another basic indexing question.

Access creates an index based on primary key field in addition to the
primary key index. Example: primary key is UserID field. When field is set
to primary key, Access creates PrimaryKey index as UserID, as well as UserID
index as UserID.

Is there a need in SQL 7 for two indexes on the same field, or can the index
with the field name be deleted?

Thanks!

Neil

2. 24420-MA-Boston-Metro West-Software Development-SYBASE-ORACLE-UNIX-Implementatio

3. I am getting duplicate records, I mean everything is duplicate even Access record number

4. Problem Connecting to Access

5. Redundant Systems, or am I just obsessive?

6. Errors E_PS0B04 and E_SC0215

7. Duplicates in PK

8. Localized version of MSDE

9. Transferring data from raw data with duplicate pk's

10. PK duplicate error-trapping

11. Adding PK with duplicate data

12. Removing duplicate records without using a pk

13. OI 1.2 Keys and Redundant Indices