SOUNDEX Performance

SOUNDEX Performance

Post by Ian Klinc » Tue, 14 Mar 2000 04:00:00



Can anyone give me some insight into how the SOUNDEX function affects
the performance of SELECT statements on large databases in practice?

e.g. if I have a very large (hundreds of thousands of records) customer
database, and I run:

SELECT * FROM CUSTOMERS WHERE LASTNAME = 'SMITH'

and

SELECT * FROM CUSTOMERS WHERE SOUNDEX(LASTNAME) = SOUNDEX('SMITH')

What will the difference be?  Negligible?  10% increase in time?
Double?

Thanks,
Ian

(cc'ing newsgroup replies to my mailbox directly would be appreciated,
but I'll try to monitor the ng, too.)
--

Project Manager
Combeck Computer Design, Inc.
(905)206-0244 x37  fax: (905)206-9851

 
 
 

SOUNDEX Performance

Post by Erland Sommarsk » Tue, 14 Mar 2000 04:00:00


[posted and mailed]


>Can anyone give me some insight into how the SOUNDEX function affects
>the performance of SELECT statements on large databases in practice?

>e.g. if I have a very large (hundreds of thousands of records) customer
>database, and I run:

>SELECT * FROM CUSTOMERS WHERE LASTNAME = 'SMITH'

>and

>SELECT * FROM CUSTOMERS WHERE SOUNDEX(LASTNAME) = SOUNDEX('SMITH')

>What will the difference be?  Negligible?  10% increase in time?
>Double?

Well, it depends. If you don't have an index on LASTNAME, the
cost will be neglible. It will take one hell of a time no
matter what you do.

If you do have an index an LASTNAME, which I assume that you have,
the cost is likely to be a factor of 100. This is because SQL
Server can no longer use the index but will scan the entire table.
Then again, if you do this often enough, the entire table will be
held in cache, and you can get decent response times on maybe 5
seconds anyway. (All of course depends on the size of the rows, and
the hardware.)

If you need to do this, there are several ways to go. One is to add
a column with the soundex value, and build an index on that. In SQL7
you could make that a computed column, but I don't know how well
indexes work with computed columns.

Another approach is to define a non-clustered index with LASTNAME
and the primary key of the table, and writed the query as

   SELECT *
   FROM CUSTOMERS  c1
   JOIN (SELECT pkey FROM CUSTOMERS WHERE SOUNDEX(LASTNAME) =
        SOUNDEX('SMITH')) AS c2 ON c1.pkey = c2.pkey

This would not give as good performance as adding a new column;
SQL Server would have to scan the entire index tree, but since
its smaller than the data pages, it's cheaper than a table scan.
The advantage is that you don't have to add a new column, only a
new index.

--

This is an incomplete mess.

 
 
 

SOUNDEX Performance

Post by BPMargoli » Tue, 14 Mar 2000 04:00:00


Ian,

To add to Erland's posting ... if you feel that this is a real value added
feature, then consider adding a column to the Customers table (let's call it
Soundex_Lastname), automatically populate the new column via a trigger with the
value of Soundex(Lastname), and index the new column. The new query would them
look like ...

SELECT * FROM CUSTOMERS WHERE SOUNDEX_LASTNAME = SOUNDEX('SMITH')

Performance on this is likely to be acceptable.


> Can anyone give me some insight into how the SOUNDEX function affects
> the performance of SELECT statements on large databases in practice?

> e.g. if I have a very large (hundreds of thousands of records) customer
> database, and I run:

> SELECT * FROM CUSTOMERS WHERE LASTNAME = 'SMITH'

> and

> SELECT * FROM CUSTOMERS WHERE SOUNDEX(LASTNAME) = SOUNDEX('SMITH')

> What will the difference be?  Negligible?  10% increase in time?
> Double?

> Thanks,
> Ian

> (cc'ing newsgroup replies to my mailbox directly would be appreciated,
> but I'll try to monitor the ng, too.)
> --

> Project Manager
> Combeck Computer Design, Inc.
> (905)206-0244 x37  fax: (905)206-9851

 
 
 

SOUNDEX Performance

Post by Ian Klinc » Wed, 15 Mar 2000 04:00:00


Thanks for the help!  Looks like my VP was right. ;-)

Ian

--

Project Manager
Combeck Computer Design, Inc.
(905)206-0244 x37  fax: (905)206-9851

 
 
 

SOUNDEX Performance

Post by Jason Webste » Tue, 28 Mar 2000 04:00:00


Good post Erland,

It is worth mentioning here that anytime you put a column inside a function
or do any math on it, the server will be unable to use the index.

Examples:
select last_name from employees where last_name + first_name = 'John Smith'
select last_name from employees where upper(last_name)='SMITH'
ETC.

 
 
 

1. pgsql/contrib/soundex (soundex.c soundex.sql.in)


Author: tgl

Update of /home/projects/pgsql/cvsroot/pgsql/contrib/soundex
     from hub.org:/home/projects/pgsql/tmp/cvs-serv51753/contrib/soundex

Modified Files:
        soundex.c soundex.sql.in

-----------------------------  Log Message  -----------------------------

Revise handling of oldstyle/newstyle functions per recent discussions
in pghackers list.  Support for oldstyle internal functions is gone
(no longer needed, since conversion is complete) and pg_language entry
'internal' now implies newstyle call convention.  pg_language entry
'newC' is gone; both old and newstyle dynamically loaded C functions
are now called language 'C'.  A newstyle function must be identified
by an associated info routine.  See src/backend/utils/fmgr/README.

2. Memory Configuration

3. select soundex(string1), soundex(string2)

4. Amending an access database in visual basic

5. pgsql/contrib/soundex (soundex.c)

6. Executing a Requery Asynchronously

7. SQL Server Performance Problem - Good query performance, bad update performance

8. sp_configure doesn't work right?

9. Soundex and "look-alike" searches

10. SOUNDEX

11. MS SQL Soundex function

12. soundex - usage