[posted and mailed]
>Can anyone give me some insight into how the SOUNDEX function affects
>the performance of SELECT statements on large databases in practice?
>e.g. if I have a very large (hundreds of thousands of records) customer
>database, and I run:
>SELECT * FROM CUSTOMERS WHERE LASTNAME = 'SMITH'
>SELECT * FROM CUSTOMERS WHERE SOUNDEX(LASTNAME) = SOUNDEX('SMITH')
>What will the difference be? Negligible? 10% increase in time?
Well, it depends. If you don't have an index on LASTNAME, the
cost will be neglible. It will take one hell of a time no
matter what you do.
If you do have an index an LASTNAME, which I assume that you have,
the cost is likely to be a factor of 100. This is because SQL
Server can no longer use the index but will scan the entire table.
Then again, if you do this often enough, the entire table will be
held in cache, and you can get decent response times on maybe 5
seconds anyway. (All of course depends on the size of the rows, and
If you need to do this, there are several ways to go. One is to add
a column with the soundex value, and build an index on that. In SQL7
you could make that a computed column, but I don't know how well
indexes work with computed columns.
Another approach is to define a non-clustered index with LASTNAME
and the primary key of the table, and writed the query as
FROM CUSTOMERS c1
JOIN (SELECT pkey FROM CUSTOMERS WHERE SOUNDEX(LASTNAME) =
SOUNDEX('SMITH')) AS c2 ON c1.pkey = c2.pkey
This would not give as good performance as adding a new column;
SQL Server would have to scan the entire index tree, but since
its smaller than the data pages, it's cheaper than a table scan.
The advantage is that you don't have to add a new column, only a
This is an incomplete mess.