Clustered Composite Index Is Big...too Big?

Clustered Composite Index Is Big...too Big?

Post by al » Thu, 23 Jan 2003 14:59:15



We have time based database with two main tables (which are really
horizontally partitioned views partitioned by month).

By putting the Data and Indexes on separate files I was able to discover
that one day of Data is 3 megs and indexes for the day are 72 megs. I did a
test where I deleted a non clustered index on each table and the index file
size dropped to 50 megs I do know that the size of the clustered index
affects the size of the non clustered index so this was expected. There are
a couple of other non clustered indexes too.

I am still pretty convinced that between the needs of our queries and the
structure of our Database our primary clustered composite unique key on the
2 fact tables must be on four columns: LogicalDate,XID,YID,DateTime.
LogicalDate is our partition constraint and the rest make it unique. This is
the logical structure of the data from a reporting point of view. They are
all 4bit columns.

When our queries are designed around this structure they are very fast no
matter what size the DB. It seems that in most cases because there are quite
a few Y's that a query will decide to scan and skip the YID part of the
index. LogicalDate and XID are usually sufficient for a query to be fast.
The amount of records beneath XID is apparently at a size where a table scan
is the optimizers choice.

This being articulated it seems clear that any new schema would probably be
for flexibility since narrowing the fact tables would reduce the data
size...which is not so big, and maybe even add to the index size.

Maybe I have it wrong but I am looking for explanations why. It seems odd to
me that an index of 4 columns would be so much bigger than the data itself.
It seems to me they would be closer to the same size. At this time the index
size is not a deal breaker but I feel that I need to come to a deeper
understanding of what is going on.

Thanks In Advance,

al

 
 
 

Clustered Composite Index Is Big...too Big?

Post by Steve Kas » Thu, 23 Jan 2003 15:47:08


Al,

  It's very hard to tell what your database looks like from a narrative
description, but I'll at least point out some of the things you may be
overlooking.

  You say "one day of data is 3 megs, and indexes for the day
are 72 megs."  What do you mean be indexes "for the day"?
Indexes on tables are on the whole table, not on just 3% of the
table.

  If you have 3 megs of data a day and a year of data, even assuming
only 20 days/month, you have roughly 500 megs of data.  The non-
clustered index you dropped freed up space equal to 5% of the table,
so the index is far from being as large as the table.

  If you can, please try to be more specific as to why you think
the indexes are taking up too much space.  It will help if you provide
CREATE TABLE statements and the statements that create the indexes.

Steve Kass
Drew University


>We have time based database with two main tables (which are really
>horizontally partitioned views partitioned by month).

>By putting the Data and Indexes on separate files I was able to discover
>that one day of Data is 3 megs and indexes for the day are 72 megs. I did a
>test where I deleted a non clustered index on each table and the index file
>size dropped to 50 megs I do know that the size of the clustered index
>affects the size of the non clustered index so this was expected. There are
>a couple of other non clustered indexes too.

>I am still pretty convinced that between the needs of our queries and the
>structure of our Database our primary clustered composite unique key on the
>2 fact tables must be on four columns: LogicalDate,XID,YID,DateTime.
>LogicalDate is our partition constraint and the rest make it unique. This is
>the logical structure of the data from a reporting point of view. They are
>all 4bit columns.

>When our queries are designed around this structure they are very fast no
>matter what size the DB. It seems that in most cases because there are quite
>a few Y's that a query will decide to scan and skip the YID part of the
>index. LogicalDate and XID are usually sufficient for a query to be fast.
>The amount of records beneath XID is apparently at a size where a table scan
>is the optimizers choice.

>This being articulated it seems clear that any new schema would probably be
>for flexibility since narrowing the fact tables would reduce the data
>size...which is not so big, and maybe even add to the index size.

>Maybe I have it wrong but I am looking for explanations why. It seems odd to
>me that an index of 4 columns would be so much bigger than the data itself.
>It seems to me they would be closer to the same size. At this time the index
>size is not a deal breaker but I feel that I need to come to a deeper
>understanding of what is going on.

>Thanks In Advance,

>al


 
 
 

Clustered Composite Index Is Big...too Big?

Post by Andrew J. Kell » Thu, 23 Jan 2003 22:42:26


Steve's right in that having the DDL and some sample data goes a long way.
The amount of space taken up in the db has little to do with just the size
of the columns.  If you create a clustered index with a fill factor of 50 it
will take twice as much space as if you specified 100.  Of coarse a fill
factor is only maintained when the index is built and you can have page
splits etc affect this even more.  You keep saying it table scans but I bet
it is really doing partial Clustered Index scans.  These can be a lot
different than table scans in terms of performance.

--

Andrew J. Kelly
SQL Server MVP


Quote:> We have time based database with two main tables (which are really
> horizontally partitioned views partitioned by month).

> By putting the Data and Indexes on separate files I was able to discover
> that one day of Data is 3 megs and indexes for the day are 72 megs. I did
a
> test where I deleted a non clustered index on each table and the index
file
> size dropped to 50 megs I do know that the size of the clustered index
> affects the size of the non clustered index so this was expected. There
are
> a couple of other non clustered indexes too.

> I am still pretty convinced that between the needs of our queries and the
> structure of our Database our primary clustered composite unique key on
the
> 2 fact tables must be on four columns: LogicalDate,XID,YID,DateTime.
> LogicalDate is our partition constraint and the rest make it unique. This
is
> the logical structure of the data from a reporting point of view. They are
> all 4bit columns.

> When our queries are designed around this structure they are very fast no
> matter what size the DB. It seems that in most cases because there are
quite
> a few Y's that a query will decide to scan and skip the YID part of the
> index. LogicalDate and XID are usually sufficient for a query to be fast.
> The amount of records beneath XID is apparently at a size where a table
scan
> is the optimizers choice.

> This being articulated it seems clear that any new schema would probably
be
> for flexibility since narrowing the fact tables would reduce the data
> size...which is not so big, and maybe even add to the index size.

> Maybe I have it wrong but I am looking for explanations why. It seems odd
to
> me that an index of 4 columns would be so much bigger than the data
itself.
> It seems to me they would be closer to the same size. At this time the
index
> size is not a deal breaker but I feel that I need to come to a deeper
> understanding of what is going on.

> Thanks In Advance,

> al

 
 
 

Clustered Composite Index Is Big...too Big?

Post by Gert-Jan Stri » Fri, 24 Jan 2003 03:27:50


In addition to the other answers, if this is a long running system where
periods are deleted once in a while, then chances are, that the
nonclustered indexes have a lot of unused space. You can use dbcc
show_contig to view the fragmentation and unused space of an index. If
this is the case, then you should reindex.

Gert-Jan


> We have time based database with two main tables (which are really
> horizontally partitioned views partitioned by month).

> By putting the Data and Indexes on separate files I was able to discover
> that one day of Data is 3 megs and indexes for the day are 72 megs. I did a
> test where I deleted a non clustered index on each table and the index file
> size dropped to 50 megs I do know that the size of the clustered index
> affects the size of the non clustered index so this was expected. There are
> a couple of other non clustered indexes too.

> I am still pretty convinced that between the needs of our queries and the
> structure of our Database our primary clustered composite unique key on the
> 2 fact tables must be on four columns: LogicalDate,XID,YID,DateTime.
> LogicalDate is our partition constraint and the rest make it unique. This is
> the logical structure of the data from a reporting point of view. They are
> all 4bit columns.

> When our queries are designed around this structure they are very fast no
> matter what size the DB. It seems that in most cases because there are quite
> a few Y's that a query will decide to scan and skip the YID part of the
> index. LogicalDate and XID are usually sufficient for a query to be fast.
> The amount of records beneath XID is apparently at a size where a table scan
> is the optimizers choice.

> This being articulated it seems clear that any new schema would probably be
> for flexibility since narrowing the fact tables would reduce the data
> size...which is not so big, and maybe even add to the index size.

> Maybe I have it wrong but I am looking for explanations why. It seems odd to
> me that an index of 4 columns would be so much bigger than the data itself.
> It seems to me they would be closer to the same size. At this time the index
> size is not a deal breaker but I feel that I need to come to a deeper
> understanding of what is going on.

> Thanks In Advance,

> al

 
 
 

Clustered Composite Index Is Big...too Big?

Post by al » Fri, 24 Jan 2003 08:50:55


Never mind, I did not know that a clustered index always is stored on the
same Filegroup as the Table.
I thought I was moving indexes but the tables went with them.
al


Quote:> We have time based database with two main tables (which are really
> horizontally partitioned views partitioned by month).

> By putting the Data and Indexes on separate files I was able to discover
> that one day of Data is 3 megs and indexes for the day are 72 megs. I did
a
> test where I deleted a non clustered index on each table and the index
file
> size dropped to 50 megs I do know that the size of the clustered index
> affects the size of the non clustered index so this was expected. There
are
> a couple of other non clustered indexes too.

> I am still pretty convinced that between the needs of our queries and the
> structure of our Database our primary clustered composite unique key on
the
> 2 fact tables must be on four columns: LogicalDate,XID,YID,DateTime.
> LogicalDate is our partition constraint and the rest make it unique. This
is
> the logical structure of the data from a reporting point of view. They are
> all 4bit columns.

> When our queries are designed around this structure they are very fast no
> matter what size the DB. It seems that in most cases because there are
quite
> a few Y's that a query will decide to scan and skip the YID part of the
> index. LogicalDate and XID are usually sufficient for a query to be fast.
> The amount of records beneath XID is apparently at a size where a table
scan
> is the optimizers choice.

> This being articulated it seems clear that any new schema would probably
be
> for flexibility since narrowing the fact tables would reduce the data
> size...which is not so big, and maybe even add to the index size.

> Maybe I have it wrong but I am looking for explanations why. It seems odd
to
> me that an index of 4 columns would be so much bigger than the data itself
.
> It seems to me they would be closer to the same size. At this time the
index
> size is not a deal breaker but I feel that I need to come to a deeper
> understanding of what is going on.

> Thanks In Advance,

> al

 
 
 

Clustered Composite Index Is Big...too Big?

Post by Andrew J. Kell » Fri, 24 Jan 2003 10:35:11


Yes, the leaf node of a Clustered Index is the actual data.

--

Andrew J. Kelly
SQL Server MVP


> Never mind, I did not know that a clustered index always is stored on the
> same Filegroup as the Table.
> I thought I was moving indexes but the tables went with them.
> al



> > We have time based database with two main tables (which are really
> > horizontally partitioned views partitioned by month).

> > By putting the Data and Indexes on separate files I was able to discover
> > that one day of Data is 3 megs and indexes for the day are 72 megs. I
did
> a
> > test where I deleted a non clustered index on each table and the index
> file
> > size dropped to 50 megs I do know that the size of the clustered index
> > affects the size of the non clustered index so this was expected. There
> are
> > a couple of other non clustered indexes too.

> > I am still pretty convinced that between the needs of our queries and
the
> > structure of our Database our primary clustered composite unique key on
> the
> > 2 fact tables must be on four columns: LogicalDate,XID,YID,DateTime.
> > LogicalDate is our partition constraint and the rest make it unique.
This
> is
> > the logical structure of the data from a reporting point of view. They
are
> > all 4bit columns.

> > When our queries are designed around this structure they are very fast
no
> > matter what size the DB. It seems that in most cases because there are
> quite
> > a few Y's that a query will decide to scan and skip the YID part of the
> > index. LogicalDate and XID are usually sufficient for a query to be
fast.
> > The amount of records beneath XID is apparently at a size where a table
> scan
> > is the optimizers choice.

> > This being articulated it seems clear that any new schema would probably
> be
> > for flexibility since narrowing the fact tables would reduce the data
> > size...which is not so big, and maybe even add to the index size.

> > Maybe I have it wrong but I am looking for explanations why. It seems
odd
> to
> > me that an index of 4 columns would be so much bigger than the data
itself
> .
> > It seems to me they would be closer to the same size. At this time the
> index
> > size is not a deal breaker but I feel that I need to come to a deeper
> > understanding of what is going on.

> > Thanks In Advance,

> > al