Question on indexing column(s) within a table

Question on indexing column(s) within a table

Post by B-Ma » Sat, 31 Jan 2004 20:23:03



I have a table that contains about 450K records that represents sales leads.
 Within this table there are a set of columns (26 to be exact) that indicate
the industry the lead is in.  For eaxample, lets say:

A = Transportation
B = Telecommunications
C = Utility
.
.
.
Z = Other

This table I'm in the process of upgrading to SQL server (well MSDE for the
moment until development is complete) has one column for each of these
industry types.  For Example COL_A is either "A" or (null), COL_B is either
"B" or (null), etc...

Idealy I'd like to be able to perform a fast search in my app that would
return only leads in a particular industry.  For Example, All Sales Leads in
a list of postal codes that are in the Transportation industry.  My Question
is this:  Is it better to  1.) leave this table as it is, index each
Industry type column and customize my queries to seach on the respective
column for the industry type or 2.) Create a 32bit integer column where each
bit in the column represents each of the industry types (i.e. A = bit1, B =
bit2, C=bit3, etc) and index this single numeric column.

Option 1 seems real limited in scalability if say we wanted to add a new
industry type.  We'd have to add a column, create and index, write seperate
stored procs for the new type.  This seems like alot of work to have to do
whenever we wanted to add an new type.

Oprion2 seems better from a scalability standpoint than Option 1, A new type
is just assigned the value of the next bit.  However to query a given
industry type, I would imagine I would have to perform a bitwise AND on the
column and return all where the results of the AND was > 0.  However I would
only need one index for all the industry types which I would assume would be
a little more efficient.  Also are numerical indexes more efficient then a
chararacter?  Finally, it would seem that I would have wasted space in my
table whenever I had less than 32 industry types stored in a 32 bit integer
column.  We have 26 currently, and a 32bit int col would waste 6 bits per
record.  However I'm assuming if you had a lead that was unrealisticly in 25
industries, you would waste 1 byte (versus 6 bits) in the empty char column.

Is there any prefered or more correct solution to this problem.  Maybe I'm
thinking way too deeply into it and should just stick with the way the table
is currently laid out?  I'm hoping to get a few opinions from people that
have been at this long than myself.

Thanks,

Barry Lance

 
 
 

Question on indexing column(s) within a table

Post by Kevin3N » Sat, 31 Jan 2004 20:36:29


Normalize the design by creating an IndustryType table.  Kill the 26 columns
and put in one FK column pointing back to the PK in the new IndustryType
table.  Then put an index on the IndustryTypeID field in the Lead table.

--
Kevin Hill
President
3NF Consulting

www.3nf-inc.com/NewsGroups.htm


Quote:> I have a table that contains about 450K records that represents sales
leads.
>  Within this table there are a set of columns (26 to be exact) that
indicate
> the industry the lead is in.  For eaxample, lets say:

> A = Transportation
> B = Telecommunications
> C = Utility
> .
> .
> .
> Z = Other

> This table I'm in the process of upgrading to SQL server (well MSDE for
the
> moment until development is complete) has one column for each of these
> industry types.  For Example COL_A is either "A" or (null), COL_B is
either
> "B" or (null), etc...

> Idealy I'd like to be able to perform a fast search in my app that would
> return only leads in a particular industry.  For Example, All Sales Leads
in
> a list of postal codes that are in the Transportation industry.  My
Question
> is this:  Is it better to  1.) leave this table as it is, index each
> Industry type column and customize my queries to seach on the respective
> column for the industry type or 2.) Create a 32bit integer column where
each
> bit in the column represents each of the industry types (i.e. A = bit1, B
=
> bit2, C=bit3, etc) and index this single numeric column.

> Option 1 seems real limited in scalability if say we wanted to add a new
> industry type.  We'd have to add a column, create and index, write
seperate
> stored procs for the new type.  This seems like alot of work to have to do
> whenever we wanted to add an new type.

> Oprion2 seems better from a scalability standpoint than Option 1, A new
type
> is just assigned the value of the next bit.  However to query a given
> industry type, I would imagine I would have to perform a bitwise AND on
the
> column and return all where the results of the AND was > 0.  However I
would
> only need one index for all the industry types which I would assume would
be
> a little more efficient.  Also are numerical indexes more efficient then a
> chararacter?  Finally, it would seem that I would have wasted space in my
> table whenever I had less than 32 industry types stored in a 32 bit
integer
> column.  We have 26 currently, and a 32bit int col would waste 6 bits per
> record.  However I'm assuming if you had a lead that was unrealisticly in
25
> industries, you would waste 1 byte (versus 6 bits) in the empty char
column.

> Is there any prefered or more correct solution to this problem.  Maybe I'm
> thinking way too deeply into it and should just stick with the way the
table
> is currently laid out?  I'm hoping to get a few opinions from people that
> have been at this long than myself.

> Thanks,

> Barry Lance


 
 
 

Question on indexing column(s) within a table

Post by Terr » Sat, 31 Jan 2004 20:42:44


Hi Barry,

If each record has only one industry you could have a single field
IndustryType and then either store the text of the industry type or you can
have a lookup table of industry types and store the numeric key field of a
particular industry in that column.

If one Saleslead can have multiple types then you can have a table with two
fields, the key from your SalesLeads table and the key from the IndustryType
lookup table. That way one way sales lead could be associated with multiple
industry types.

Option1 is inflexible, bad design  I'm not familiar with option 2 but it
seems unnecessarily complicated.

I'm not an expert but perhaps this will help.


Quote:> I have a table that contains about 450K records that represents sales
leads.
>  Within this table there are a set of columns (26 to be exact) that
indicate
> the industry the lead is in.  For eaxample, lets say:

> A = Transportation
> B = Telecommunications
> C = Utility
> .
> .
> .
> Z = Other

> This table I'm in the process of upgrading to SQL server (well MSDE for
the
> moment until development is complete) has one column for each of these
> industry types.  For Example COL_A is either "A" or (null), COL_B is
either
> "B" or (null), etc...

> Idealy I'd like to be able to perform a fast search in my app that would
> return only leads in a particular industry.  For Example, All Sales Leads
in
> a list of postal codes that are in the Transportation industry.  My
Question
> is this:  Is it better to  1.) leave this table as it is, index each
> Industry type column and customize my queries to seach on the respective
> column for the industry type or 2.) Create a 32bit integer column where
each
> bit in the column represents each of the industry types (i.e. A = bit1, B
=
> bit2, C=bit3, etc) and index this single numeric column.

> Option 1 seems real limited in scalability if say we wanted to add a new
> industry type.  We'd have to add a column, create and index, write
seperate
> stored procs for the new type.  This seems like alot of work to have to do
> whenever we wanted to add an new type.

> Oprion2 seems better from a scalability standpoint than Option 1, A new
type
> is just assigned the value of the next bit.  However to query a given
> industry type, I would imagine I would have to perform a bitwise AND on
the
> column and return all where the results of the AND was > 0.  However I
would
> only need one index for all the industry types which I would assume would
be
> a little more efficient.  Also are numerical indexes more efficient then a
> chararacter?  Finally, it would seem that I would have wasted space in my
> table whenever I had less than 32 industry types stored in a 32 bit
integer
> column.  We have 26 currently, and a 32bit int col would waste 6 bits per
> record.  However I'm assuming if you had a lead that was unrealisticly in
25
> industries, you would waste 1 byte (versus 6 bits) in the empty char
column.

> Is there any prefered or more correct solution to this problem.  Maybe I'm
> thinking way too deeply into it and should just stick with the way the
table
> is currently laid out?  I'm hoping to get a few opinions from people that
> have been at this long than myself.

> Thanks,

> Barry Lance

 
 
 

Question on indexing column(s) within a table

Post by B-Ma » Sat, 31 Jan 2004 21:10:12


Thanks for the replies.  Yes, from what I've seen looking at the data, a
sales lead can have more than one industry type.  I knew there had to be a
better solution than dealing with the 26 individual fields.  The option of
creating a table that x-refs the sales lead and industry type hadn't occured
to me and certainly seems less complicated than using bit values.
Sometimes, those logical solutions can be the last ones to be thought of
;-).  The current app was written in Access and something needs to be done
with it since querying 450K records in Access is painful to say the least.
So I'm using this app as my initiation into the world of ADO.Net and SQL
server programming.

Thanks Again,

Barry

 
 
 

Question on indexing column(s) within a table

Post by Louis Davidso » Sat, 31 Jan 2004 23:16:43


Yeah, good thing you asked.  A proper SQL database will work out better for
you if you follow the suggestions that had been made.  This kind of thing is
covered in the first normal form (and there are five plus normal forms, each
that have useful relavance to building proper solutions.)

You will be able to answer most of your questions in a single, quite simple
select statement that deals with a single column rather than having to
bounce around in 26 fields, each having slightly different meaning.

Get a book (or two or three) on database design and you will see that it is
very natural to solve these kinds of problems, and most others by building
the intelligence into more tables with very specific meaning, rather than
having buckets that you have to decipher bitwisedly :)

Good luck,

--
----------------------------------------------------------------------------
-----------

Compass Technology Management

Pro SQL Server 2000 Database Design
http://www.apress.com/book/bookDisplay.html?bID=266

Note: Please reply to the newsgroups only unless you are
interested in consulting services.  All other replies will be ignored :)


Quote:> Thanks for the replies.  Yes, from what I've seen looking at the data, a
> sales lead can have more than one industry type.  I knew there had to be a
> better solution than dealing with the 26 individual fields.  The option of
> creating a table that x-refs the sales lead and industry type hadn't
occured
> to me and certainly seems less complicated than using bit values.
> Sometimes, those logical solutions can be the last ones to be thought of
> ;-).  The current app was written in Access and something needs to be done
> with it since querying 450K records in Access is painful to say the least.
> So I'm using this app as my initiation into the world of ADO.Net and SQL
> server programming.

> Thanks Again,

> Barry