phonetic search

Post by Gaylord Aulke » Thu, 27 Aug 1998 04:00:00



Hi all.

Has anyone ever seen a Java approach for phonetic search
in Oracle databases? I think one needs to encode the data
before inserting it into the database, encode the search
targets as well, and then run a simple query.

What we want to do is provide a search interface to a large
database with addresses stored in it. To tolerate mistyped
names, we planned to use a phonetic approach for the query.
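Below is a minimal sketch of the plan described above: encode each name when it is stored, encode the search term the same way, then match on the code. The post does not say which phonetic code to use, so this sketch assumes plain Soundex. A HashMap stands in for an extra indexed code column in the database. The class and method names (PhoneticIndex, insert, search) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: encode names once when stored, encode the search term the same way. */
class PhoneticIndex {
    // Soundex digits for A..Z (assumption: Soundex is the phonetic code used).
    private static final String CODES = "01230120022455012623010202";

    // Stands in for an extra, indexed "name code" column stored next to each name.
    private final Map<String, List<String>> namesByCode = new HashMap<>();

    static String soundex(String name) {
        String s = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) return "";
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char prev = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char d = CODES.charAt(s.charAt(i) - 'A');
            if (d != '0' && d != prev) out.append(d); // skip zeros and repeated digits
            prev = d;
        }
        while (out.length() < 4) out.append('0');       // pad to letter + three digits
        return out.toString();
    }

    /** "Insert": compute the code once and store the name under it. */
    void insert(String name) {
        namesByCode.computeIfAbsent(soundex(name), k -> new ArrayList<>()).add(name);
    }

    /** "Query": encode the typed name the same way, then match the code exactly. */
    List<String> search(String typed) {
        return new ArrayList<>(namesByCode.getOrDefault(soundex(typed), Collections.emptyList()));
    }

    public static void main(String[] args) {
        PhoneticIndex index = new PhoneticIndex();
        index.insert("Meyer");
        index.insert("Maier");
        index.insert("Smith");
        System.out.println(index.search("Mayer")); // [Meyer, Maier]
    }
}
```

Here "Meyer", "Maier" and the mistyped "Mayer" all reduce to M600. The exact-match lookup on the code therefore returns both stored spellings. In a real database, the code would be computed when the row is written, so the search itself stays a plain equality query.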

Can anyone help?

Gaylord Aulke
--
agi business media productions
http://www.agi.de
+++news+++news+++news+++
agi is a demonstrating finalist in the
multimedia competition of The New York Festivals!
check our diary for more information:
http://www.agi.de/news.cgi?bereich=0

 
 
 

phonetic search

Post by Shane Petrof » Thu, 27 Aug 1998 04:00:00



> Hi all.

> Has anyone ever seen a Java approach for phonetic search
> in Oracle databases?

This is not a real answer, but Joe Celko's 'SQL for Smarties' contains C
code for the Metaphone algorithm. (I thought I'd heard that there was an
Oracle library that includes a Soundex function.)

Does anyone have experience with Metaphone or other Soundex derivatives?
How well do they work?

Shane

 
 
 

phonetic search

Post by Roedy Green » Thu, 27 Aug 1998 04:00:00


Here is a repost of an article on Soundex phonetic searching.
Unfortunately, it was originally posted in a very wide format, which has
been mangled.

Group:  comp.lang.java.programmer

Org:    City of NN
Date:   24 Jun 1997 13:07:16 GMT
Subj:   Re: Soundex
____________________________________________________________

> Hi there,
>Where can I find Soundex algorithm?
>Soundex is a way for searching people's name, it will give same return
>value for "Ann" and "An".
>Thanks

Here's the algorithm for SOUNDEX in the form of my project handout from a
Pascal class that I used to teach. The text is meant to be viewed as wide as
possible so don't reformat it.

Hey, did you know that the Soundex algorithm was developed just after the turn
of the century? Also, it is not all that accurate. My tests show 75% accuracy,
but perhaps you can find real stats to prove the worth of the SOUNDEX
algorithm.

Greg DiGiorgio
-------------------------------------------------------------------------------
                                                        SOUNDEX Project

SOUNDEX is short for "Sound Index", whereby words are sorted into a dictionary
according to their sound instead of their spelling, allowing lookup of words
that sound alike. For example, consider the following sentence: "The Magi
brought presence to the mail Christ child." The correct rendering of this
sentence should be: "The Magi brought presents to the male Christ child."

                                                        The SOUNDEX Algorithm

Step 1. Assign the following numbers to each letter of the alphabet.

        A   B   C   D   E   F   G   H   I   J   K   L   M
        -   -   -   -   -   -   -   -   -   -   -   -   -
        0   1   2   3   0   1   2   0   0   2   2   4   5

        N   O   P   Q   R   S   T   U   V   W   X   Y   Z
        -   -   -   -   -   -   -   -   -   -   -   -   -
        5   0   1   2   6   2   3   0   1   0   2   0   2

Step 2. Assign the first character of the word to be converted to the first
        character of the SOUNDEX code string. For example, assume we are
        converting "WHOLE".

                        ...
                VAR     CodeString, WordString : STRING;
                BEGIN
                        CodeString:=WordString[1];
                        ...

Step 3. Build the SOUNDEX code string by converting subsequent characters in
        the word string according to the table you built in Step 1. For
        example, "WHOLE" would be converted to "W400". To convert a word,
        ignore letters that convert to zero and letters whose code duplicates
        the code of the letter immediately preceding them. Let's look a little
        closer at the conversion of "WHOLE" to "W400".

                1. "W" is assigned to the first character of the code string,
                   so the code string looks like "W".
                2. "H" converts to a zero and is therefore skipped.
                3. "O" converts to a zero and is therefore skipped.
                4. "L" converts to a four and is appended to the code string,
                   so it now looks like "W4".
                5. "E" converts to a zero and is therefore skipped.
                6. Append zeroes until the code string is 4 characters long.
                   Hence, "WHOLE" = "W400".
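Steps 1 to 3 can be sketched in Java. The handout itself targets Pascal. This sketch rests on a few assumptions:

- Non-letters are dropped and case is ignored.
- The duplicate check compares each digit with the digit of the immediately preceding letter, as Step 3 describes.
- Codes longer than four characters are cut off. The handout does not say this, but the usual letter-plus-three-digits form implies it.

```java
import java.util.Locale;

/** Soundex encoder following the handout's three steps (a sketch, not a reference implementation). */
class Soundex {
    // Step 1 table: one digit per letter A..Z, as in the handout (0 = ignored letter).
    private static final String CODES = "01230120022455012623010202";

    static char codeOf(char letter) {
        return CODES.charAt(letter - 'A');
    }

    static String encode(String word) {
        // Assumption: drop non-letters and ignore case (the handout expects capitalized words).
        String letters = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (letters.isEmpty()) {
            return "";
        }
        // Step 2: the first letter is copied unchanged.
        StringBuilder code = new StringBuilder().append(letters.charAt(0));
        char previous = codeOf(letters.charAt(0));
        // Step 3: append digits, skipping zeros and repeats of the preceding letter's digit.
        // Stopping at four characters is the assumed truncation.
        for (int i = 1; i < letters.length() && code.length() < 4; i++) {
            char digit = codeOf(letters.charAt(i));
            if (digit != '0' && digit != previous) {
                code.append(digit);
            }
            previous = digit;
        }
        // Pad with zeros to four characters ("W4" becomes "W400").
        while (code.length() < 4) {
            code.append('0');
        }
        return code.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"WHOLE", "Ann", "An", "ROBERT", "RUPERT"}) {
            System.out.println(w + " -> " + encode(w));
        }
    }
}
```

Running it prints W400 for "WHOLE", the same code (A500) for "Ann" and "An", and R163 for both "ROBERT" and "RUPERT". Only the immediately preceding letter is checked, so H and W reset the duplicate test just as vowels do. As a result, "Ashcraft" comes out as A226 here. The official American Soundex rules collapse identical digits separated by H or W and give A261.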

                                                SOUNDEX Project Algorithm

Step 1. Using a text file that I will supply, read in a word, convert it to
        its SOUNDEX equivalent, and store the word and its SOUNDEX code
        string in a 2-D array or two 1-D arrays. Do this for each word in the
        file. There will be one word per line, so use READLN.

Step 2. Sort your array(s) by the SOUNDEX code string of each word, in
        ascending order.

Step 3. Prompt the user for a word and read it in. Assume that the user will
        enter one of the words you read in from the file and that it, like
        the words you read in, will be capitalized.

Step 4. Loop through your array(s) until you find the word the user keyed in,
        and note the array index of that word.

Step 5. Using the array index of the word the user entered, check the code
        strings of the words immediately before and after it. Pick the code
        string that exactly matches the one for the word the user entered,
        and display the word associated with that SOUNDEX code string on the
        screen. If there is no match, tell the user.
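The project steps can likewise be sketched in Java. The following assumptions are not from the handout:

- The word list is passed in as a List instead of being read with READLN.
- The user's word is a method parameter rather than a prompt (Step 3).
- The helper names are hypothetical.
- A stable sort keeps file order among words with equal codes.
- When both neighbours match, the word before is returned.

A compact copy of a Soundex encoder is included so the block runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Sketch of the project steps: encode, sort by code, find the user's word, check its neighbours. */
class SoundexLookup {
    // Soundex digits for A..Z, taken from the Step 1 table.
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) return "";
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char prev = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char d = CODES.charAt(s.charAt(i) - 'A');
            if (d != '0' && d != prev) out.append(d);
            prev = d;
        }
        while (out.length() < 4) out.append('0');
        return out.toString();
    }

    // One word with its code; replaces the handout's 2-D array or two 1-D arrays.
    static final class Entry {
        final String word;
        final String code;

        Entry(String word) {
            this.word = word;
            this.code = soundex(word);
        }
    }

    /** Returns a neighbouring word with the same code, or null when neither neighbour matches. */
    static String findSoundAlike(List<String> words, String target) {
        List<Entry> entries = new ArrayList<>();
        for (String w : words) entries.add(new Entry(w));            // Step 1: encode every word
        entries.sort(Comparator.comparing((Entry e) -> e.code));     // Step 2: stable sort by code
        int index = -1;
        for (int i = 0; i < entries.size(); i++) {                    // Step 4: linear search
            if (entries.get(i).word.equals(target)) {
                index = i;
                break;
            }
        }
        if (index < 0) return null;                                   // word not in the list
        String code = entries.get(index).code;
        for (int j : new int[] {index - 1, index + 1}) {              // Step 5: previous, then next
            if (j >= 0 && j < entries.size() && entries.get(j).code.equals(code)) {
                return entries.get(j).word;
            }
        }
        return null;                                                  // no match: tell the user
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("MALE", "PRESENCE", "MAIL", "PRESENTS", "CHILD");
        System.out.println(findSoundAlike(words, "MAIL"));     // MALE
        System.out.println(findSoundAlike(words, "PRESENTS")); // PRESENCE
    }
}
```

The sample words come from the handout's example sentence. "MAIL" finds "MALE" (both M400) and "PRESENTS" finds "PRESENCE" (both P625). As in the handout, only the two immediate neighbours are inspected. If three or more words share a code, the method still reports just one of them.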

                                                        Grading

Step #  Numeric Grade  Letter Grade  Output required if you stop on this step
------  -------------  ------------  -----------------------------------------
  1          70             C        Print all the words you read from the
                                     file along with their converted SOUNDEX
                                     codes on the screen.
  2          80             B        Same as for step (1), but your display
                                     should be sorted as step (2) outlines.
  3          90             B        Same as step (3), except that you should
                                     display the array index of the word the
                                     user keyed in.
  4         100             A        Find the matching word and display it on
                                     the screen. *Only have to match up to 75%
                                     of the words.

For the JAVA GLOSSARY and the CMP Utilities: <http://mindprod.com>
--
Roedy Green                          Canadian Mind Products
-30-

 
 
 

phonetic search

Post by Gaylord Aulke » Fri, 28 Aug 1998 04:00:00


Hi.

Thank you for re-posting the Soundex algorithm. Do you have
any experience with how it works for foreign languages?
I am looking for an algorithm that works for German pronunciation.
Have you ever tried this?

Gaylord

 
 
 

phonetic search

Post by Ken North » Mon, 31 Aug 1998 04:00:00


My Database Developer column in the September issue of Web Techniques is
"Java-Enabled Databases and Adaptive Server." The article is about Sybase's
implementation of Java in the database, but the Java examples are not
platform-dependent. Download the source code from:

http://www.webtechniques.com

One example is a class that uses JDBC and SOUNDEX; the other is a
PhoneticString class that does Metaphone encoding. It is an excerpt from
_Database Magic with Ken North_ (Prentice-Hall). The book explains the
modified Metaphone algorithm and includes a test driver to exercise the
PhoneticString class from a Java application.

--

==================== Ken North ===========================
http://ourworld.compuserve.com/homepages/Ken_North




Ken North Computing
2604B El Camino Real, #351
Carlsbad, CA 92008-1214
FAX: 760-729-5127
==========================================================




1. Phonetic searching

I am looking for a solution that will search databases phonetically across
various languages in Europe. I know of Soundex already and am looking for a
better solution (actually a far better solution). If anybody can help me, I
would really appreciate it.

Please mail me at my email address.

Thanks

Frank
