Merge/Purge

Post by Gerry Thorp » Tue, 28 Sep 1999 04:00:00



Greetings,

I am looking for information about Merge/Purge and for software that does it.

Merge/Purge is a solution to the problem of consolidating multiple
data sources. The difficulty is that the consolidation cannot be done
with a simple database join across tables.

A typical situation is as follows: John Doe may appear in one
data source as John Doe, in another as Jonny Doe, and in others
as J. Doe, Johnathan Doe, etc. I would like to be able to
take the data from all of those sources and construct a single
row of data that aggregates the data from all those rows.

What I need is a system that knows that John Doe, Jonny Doe,
J. Doe and Johnathan Doe are all likely the same person.
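
To make the requirement concrete, here is a minimal Python sketch of the kind of name comparison being asked for. Everything in it is an assumption for illustration: the NICKNAMES table, the 0.8 threshold, and the function names are hypothetical, and it is not how any particular Merge/Purge product works. It uses only the standard-library difflib module.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real one would be much larger and
# specific to the data being cleaned.
NICKNAMES = {
    "jonny": "john",
    "johnny": "john",
    "johnathan": "john",
    "jonathan": "john",
}

def normalize(name):
    """Lowercase, drop periods, and map a known nickname to a canonical first name."""
    parts = name.lower().replace(".", "").split()
    first, last = parts[0], parts[-1]
    return NICKNAMES.get(first, first), last

def likely_same(a, b, threshold=0.8):
    """Return True when two 'First Last' strings plausibly name the same person."""
    first_a, last_a = normalize(a)
    first_b, last_b = normalize(b)
    if last_a != last_b:
        return False
    # A bare initial is treated as compatible with any first name
    # that starts with the same letter.
    if len(first_a) == 1 or len(first_b) == 1:
        return first_a[0] == first_b[0]
    # Otherwise fall back to a character-level similarity ratio.
    return SequenceMatcher(None, first_a, first_b).ratio() >= threshold

if __name__ == "__main__":
    for other in ["Jonny Doe", "J. Doe", "Johnathan Doe", "Jane Doe"]:
        print(other, likely_same("John Doe", other))
```

Note that with this rule "J. Doe" is also compatible with "Jane Doe", so a name comparison alone can only propose candidates; it cannot decide that two rows are the same person.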

Any leads would be appreciated.
Thanks in advance,
Gerry Thorpe

 
 
 

Post by Tom Leyla » Tue, 28 Sep 1999 04:00:00



> I am looking for information about and software to do Merge/Purge.

I think it is safe to say that this is a very specialized field (if you want
to do it right).

> A typical situation is as follows: John Doe, may be in one
> data source as John Doe, another as Jonny Doe, another
> as J. Doe, Johnathan Doe, etc. I would like to be able to
> take the data from all of those sources and construct a single
> row of data that aggregates the data from all those rows.

I'm not sure you actually want to aggregate the data.  If "John Doe" goes by
"Jonny" these days and really lives in Philadelphia, then the fact that "John
Doe" appears 3 times as "St. Paul, MN" is of no value: he no longer lives
there.
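
One way to picture the alternative to counting occurrences is a "newest value wins" rule. The sketch below is only an illustration under a strong assumption: that every record carries a last-updated date (the hypothetical `updated` field), which nothing in the original question guarantees. The field names and the consolidate function are made up for the example.

```python
from datetime import date

def consolidate(records):
    """Build one row from already-matched records.

    Each field takes its value from the newest record that has a
    non-empty value for it, so three stale addresses do not outvote
    one current address.
    """
    merged = {}
    # Oldest first, so that later (newer) records overwrite earlier ones.
    for rec in sorted(records, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if value:  # skip blanks so older non-empty values survive
                merged[field] = value
    return merged

if __name__ == "__main__":
    rows = [
        {"name": "John Doe", "city": "St. Paul, MN", "phone": "555-0100",
         "updated": date(1995, 1, 1)},
        {"name": "John Doe", "city": "St. Paul, MN", "phone": "",
         "updated": date(1996, 1, 1)},
        {"name": "Jonny Doe", "city": "Philadelphia, PA", "phone": "",
         "updated": date(1999, 9, 1)},
    ]
    print(consolidate(rows))
```

Without a trustworthy date on each row, some other rule (source priority, manual review) would have to stand in for recency.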

> What I need is a system that knows that John Doe, Jonny Doe,
> J. Doe and Johnathan Doe are all likely the same person.

And the clue that they are the same person would be?

> Any leads would be appreciated.

1) Search the Internet for what not to do.
2) Expect a multiple-pass solution.
3) Expect a solution based upon specific knowledge of the domain.
4) Accept "reasonable" solutions.
5) Document your assumptions.
6) Consider posting your ideas (as they arrive) here, before you merge all
the "John Smith" records in Los Angeles into a single row.
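
Points 2 and 3 can be sketched roughly as follows, in Python. This is a hedged illustration, not a recommended design: the record fields (`name`, `zip`, `phone`), the two key functions, and multi_pass_groups are all hypothetical, and the choice of keys is exactly the domain knowledge that has to come from you. Each pass groups records that share a key, and a small union-find structure carries the matches from one pass into the next.

```python
from collections import defaultdict

def find(parent, i):
    """Return the representative of i's group, compressing the path as it goes."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def multi_pass_groups(records, key_funcs):
    """Run one matching pass per key function and return groups of record indexes."""
    parent = list(range(len(records)))
    for key_func in key_funcs:
        blocks = defaultdict(list)
        for idx, rec in enumerate(records):
            key = key_func(rec)
            if key is not None:  # records without this key sit the pass out
                blocks[key].append(idx)
        for members in blocks.values():
            for other in members[1:]:
                parent[find(parent, other)] = find(parent, members[0])
    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(parent, idx)].append(idx)
    return sorted(groups.values())

# Pass 1 (assumption): same first initial, last name, and ZIP code.
def by_name_and_zip(rec):
    parts = rec["name"].lower().split()
    return (parts[0][0], parts[-1], rec["zip"])

# Pass 2 (assumption): same phone number, when one is present.
def by_phone(rec):
    return rec["phone"] or None

if __name__ == "__main__":
    data = [
        {"name": "John Doe", "zip": "55101", "phone": "555-0100"},
        {"name": "Jonny Doe", "zip": "19103", "phone": "555-0100"},
        {"name": "J. Doe", "zip": "55101", "phone": ""},
        {"name": "John Smith", "zip": "90001", "phone": "555-0111"},
        {"name": "John Smith", "zip": "90002", "phone": "555-0122"},
    ]
    print(multi_pass_groups(data, [by_name_and_zip, by_phone]))
```

With these keys the two "John Smith" rows stay separate because nothing but the name ties them together, which is the failure point 6 warns about.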

Tom

Oh... visit www.deja.com and search for "database duplicates"

--
---> Learn a little something at http://www.leylan.com

 
 
 

Post by modisc » Thu, 28 Oct 1999 04:00:00


Check out www.vality.com
www.trilliumsoft.com
www.g1.com
www.postalsoft.com




