Soundex and "look-alike" searches

Soundex and "look-alike" searches

Post by Marvin Pankaski » Thu, 13 Jan 2000 04:00:00



I am trying to construct the following form/query for retreiving
information from a SQL Server 7.0 database:

1. The user will type into the form the name of a herb. The user may
mispell the name, and the mispelling may be anywhere in the name and
involve 1 or more erroneous characters (e.g., accashia instead of
acacia)

2. When the form is submitted I would like to take the users search term
and "somehow" create a SQL select query (or some other code, preferably
in ASP) that would either find "look-alikes" or "sound-alikes" in the
database or do some substitution of characters in the users entry that
would eventually result in finding one or more records that partially
match the users entry

I have thought of creating a loop that takes the users entry, runs the
query, and if the returned recordset is empty, substitutes one character
at a time with the "_" wildcard. This seems much too cpu intensive. I
have also considered creating another table that has synonyms or
spelling variants for each primary term and searching against it.
Someone else suggested that the "soundex" function in SQL Server 7.0
would work, but I cannot find any application of this function, only the
syntax.

I have seen this kind of "fuzzy logic" used on other web sites, but have
no idea how they are implemented. I would appreciate some help.

Stuck in Florida,
Marvin Pankaskie

 
 
 

Soundex and "look-alike" searches

Post by J » Fri, 14 Jan 2000 04:00:00


Here's one for the world, check out transact SQL for more info. The only
problem with this one is that the first letter may not be missspelled. This
can be solved by doing a LIKE on the word with first and last letter removed
or something like that.


As
 set nocount on /* speeds up things */

 /* sees if the value is already present in the table based on a SOUNDEX
comparison */
 /* and a check if the entered value is not already part of an item */
 /* and finally checks for letter differences with a DIFFERENCE algoritm */

 SELECT whatever_column_you_choose
 FROM your_kind_of_table
 WHERE



return

Good luck, Jack


Quote:> I am trying to construct the following form/query for retreiving
> information from a SQL Server 7.0 database:

> 1. The user will type into the form the name of a herb. The user may
> mispell the name, and the mispelling may be anywhere in the name and
> involve 1 or more erroneous characters (e.g., accashia instead of
> acacia)

> 2. When the form is submitted I would like to take the users search term
> and "somehow" create a SQL select query (or some other code, preferably
> in ASP) that would either find "look-alikes" or "sound-alikes" in the
> database or do some substitution of characters in the users entry that
> would eventually result in finding one or more records that partially
> match the users entry

> I have thought of creating a loop that takes the users entry, runs the
> query, and if the returned recordset is empty, substitutes one character
> at a time with the "_" wildcard. This seems much too cpu intensive. I
> have also considered creating another table that has synonyms or
> spelling variants for each primary term and searching against it.
> Someone else suggested that the "soundex" function in SQL Server 7.0
> would work, but I cannot find any application of this function, only the
> syntax.

> I have seen this kind of "fuzzy logic" used on other web sites, but have
> no idea how they are implemented. I would appreciate some help.

> Stuck in Florida,
> Marvin Pankaskie


 
 
 

1. Looking for a "Uniqueness Search"

I'm looking for a text tool that more or less corresponds to the
pipelines:

 cat foo | sort | uniq | wc -l
 cat foo | wc -l

If the counts differ, then evidently there's something /not/ unique in
the files.

What it amounts to is looking at whether or not the data presented
represents a unique key across the data file.

What actually happens is that "foo" is actually the result of a "cut"
command, ala "cut -f 1,7,9 some_data_file".

I can easily script up such a comparison in Perl, using a hash table,
that can abort as soon as a non-unique value is found, roughly via:

  while ($line = <>) {
    if $HASH{$line} {  
       print $count, $HASH{$line};
       die -1;
    } else {
      $HASH{$line} = $count++;
    }
  }

Unfortunately, the interpretive overhead is pretty heavy, such that
this is a lot slower than "cat foo | sort | uniq | wc -l", even though
it is able to abort as soon as it hits a violation of the uniqueness
requirement.

I can't think of any Unix text command that does a uniqueness test
without starting with a sort; has anyone built such?  If not, then I
may wind up writing up something in C using dbm.  But there's no sense
in replicating a wheel that may already exist.

(Note that this /is/ relevant to "databasing" as the whole purpose of
the exercise is to validate that attempts to create unique indexes in
a PostgreSQL DBMS will succeed without actually initiating the "create
index.")
--

http://www.ntlug.org/~cbbrowne/rdbms.html
"prepBut nI vrbLike adjHungarian! qWhat's artThe adjBig nProblem?"
-- Alec Flett

2. Recent gateway problems

3. max of ("...","...","..")

4. Connection difficulities, PSQL 2000, NetWare 5.1

5. How to make a "decimal"-field to an "integer"-field

6. FOR XML Dates

7. "."and ","

8. Enabling NLS in informix

9. The ""string""

10. aReport."Field".SetFilter("this")?

11. The+instruction+at+"0x1f90db9b"+referenced+memory+at+"0x7fa03794"+The+memory+cou