Some OLE DB Data Mining questions

Post by John Sandifor » Wed, 04 Dec 2002 18:37:47



Hi, any help to the following questions would be great.

1.  I am applying the EM clustering method to discretize
continuous attributes, with both built-in and custom
algorithms, which is great. However, when I move to a larger
database, I have noticed that the algorithm only uses
the first 1000 cases.  As a result, even
though my cases are in no particular order, the coverage
of the buckets is small, for example 60%.  How do I best
get round this?

2.  With the DMSample code, what is the best way to use
the case iterators (e.g. SPM_SPM_CaseCacheItr) for an
algorithm that holds out a percentage of records for use
in scoring the algorithm?  That is, how do I determine which
records to use for each stage?

3.  I would like to chain a series of data mining
algorithms together, e.g. perform some pre-processing and
pass the new data into a successive mining algorithm.  How
do I do this without saving data to temporary tables, etc.?
Can I use INSERT INTO with a data mining model in
OPENROWSET? In addition, is this possible using DTS?

4.  Does anybody have example DM SQL for each of the
various functions, e.g. Cluster, ClusterDistance, Predict
(Scalar), etc.?

5.  What are the options for the CLUSTERING_METHOD
parameter in the Microsoft_Clustering algorithm?

Thanks in advance


Post by Jamie MacLennan [MS] » Thu, 05 Dec 2002 04:21:06


Here's the answer to question 1 - I'll get back on the others.

1:  The discretization method always works with the first 1000 points of
data.  It assumes the data is in random order; if so, 1000 points should
generally give a good picture of the data for this purpose.  If the data is
ordered, this obviously will not work well.  You can work around this issue
by running

    INSERT INTO <model>.COLUMN_VALUES(<discretized_column>) <source data query>

before you train your model, specifying a query that gives a better
representation of the range of your data.  If you are developing your own
OLE DB provider using the Sample Provider, you can change the discretization
code as you see fit.
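To illustrate why the first 1000 points matter, here is a small Python simulation. It is a sketch under assumed data (a hypothetical column of 100,000 values stored in ascending order, the worst case), not anything taken from the provider: it compares how much of the full value range the first 1000 rows span versus a random sample of 1000 rows.

```python
import random

def range_coverage(sample, full_min, full_max):
    """Fraction of the full value range spanned by the sample."""
    return (max(sample) - min(sample)) / (full_max - full_min)

# Hypothetical continuous column: 100,000 values, stored sorted (worst case).
random.seed(0)
values = sorted(random.uniform(0.0, 100.0) for _ in range(100_000))
lo, hi = values[0], values[-1]

first_1000 = values[:1000]                 # what a "first N rows" sampler sees
random_1000 = random.sample(values, 1000)  # a representative random sample

print(f"first 1000 rows:  {range_coverage(first_1000, lo, hi):.1%} of range")
print(f"random 1000 rows: {range_coverage(random_1000, lo, hi):.1%} of range")
```

On ordered data the leading rows see only a sliver of the range, which is why feeding a representative source query through INSERT INTO ... COLUMN_VALUES before training helps.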

--
Jamie MacLennan
SQL Server Data Mining
-----------------------------------------------------------------
This posting is provided "AS IS" with no warranties, and confers no rights.



Post by ZhaoHui Tan » Thu, 05 Dec 2002 05:47:05


See my answers inline:

> 3.  I would like to chain a series of data mining
> algorithms together, e.g. perform some pre-processing and
> pass the new data into a successive mining algorithm.  How
> do I do this without saving data to temporary tables, etc.?
> Can I use INSERT INTO with a data mining model in
> OPENROWSET? In addition, is this possible using DTS?

ZhaoHui: You have to use temporary tables. In DTS, you can build a task flow
using the Prediction task, but at each step you need to store the results in a
temp table. The Prediction task does this for you.
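The staging pattern described above (each step's output stored in a temporary table that the next step reads) can be sketched in Python with the standard sqlite3 module. This is only an analogy under assumed names: the tables, columns and both "stages" are hypothetical, with a simple age-banding step standing in for pre-processing and a per-band count standing in for the downstream model. It does not use DTS or the Prediction task.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source data.
cur.execute("CREATE TABLE customers (customer_id INTEGER, age INTEGER)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, 23), (2, 37), (3, 41), (4, 68), (5, 29)])

# Stage 1 (pre-processing): materialize its output into a temp table,
# analogous to the per-step staging described for the Prediction task.
cur.execute("""
    CREATE TEMP TABLE stage1 AS
    SELECT customer_id,
           CASE WHEN age < 30 THEN 'young'
                WHEN age < 60 THEN 'middle'
                ELSE 'senior' END AS age_band
    FROM customers
""")

# Stage 2 reads only the staged result, never the original source table.
rows = cur.execute("""
    SELECT age_band, COUNT(*) FROM stage1
    GROUP BY age_band ORDER BY age_band
""").fetchall()
print(rows)
```

Temp tables vanish when the connection closes, so the intermediate data needs no explicit cleanup.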

> 4.  Does anybody have example DM SQL for each of the
> various functions, e.g. Cluster, ClusterDistance, Predict
> (Scalar), etc.?

ZhaoHui: The best reference is the OLE DB for DM specification. ClusterDistance
is not implemented. The queries are fairly straightforward, for example:

    SELECT CustomerId, Cluster() FROM segmentationmodel PREDICTION JOIN ...
    SELECT CustomerId, Predict(CreditRisk) FROM treemodel PREDICTION JOIN ...
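Conceptually, a Cluster() query joined to input cases returns, for each case, the cluster the model considers most likely. The Python sketch below illustrates only that idea. It assumes a hypothetical model of one-dimensional Gaussian clusters with made-up names, weights, means and standard deviations, and it is not how the Microsoft provider computes Cluster().

```python
import math

# Hypothetical trained model: cluster -> (weight, mean, std dev) of one input column.
MODEL = {
    "Cluster 1": (0.5, 25.0, 5.0),
    "Cluster 2": (0.3, 45.0, 8.0),
    "Cluster 3": (0.2, 70.0, 6.0),
}

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def cluster(x):
    """Return (most likely cluster, its membership probability) for value x."""
    scores = {name: w * gaussian_pdf(x, m, s) for name, (w, m, s) in MODEL.items()}
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total

# Rough analogue of: SELECT CustomerId, Cluster() FROM model PREDICTION JOIN input
cases = [(101, 22), (102, 48), (103, 73)]   # hypothetical (CustomerId, Age) rows
for customer_id, age in cases:
    label, prob = cluster(age)
    print(customer_id, label, round(prob, 3))
```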

> 5.  What are the options for the CLUSTERING_METHOD
> parameter in the Microsoft_Clustering algorithm?

ZhaoHui: We support EM and K-means, but EM is the default method and the one
we recommend.
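The practical difference between the two methods is hard versus soft assignment: K-means gives each case to exactly one cluster, while EM gives each case a probability of belonging to every cluster. The Python sketch below shows a single assignment step of each on hypothetical one-dimensional data with fixed, made-up centres and an assumed shared standard deviation. It is a textbook illustration, not the Microsoft_Clustering implementation, and it omits the update steps both algorithms repeat until convergence.

```python
import math

points = [1.0, 2.0, 4.5, 8.0, 9.0]    # hypothetical 1-D cases
centres = [2.0, 8.5]                  # hypothetical current cluster centres
sigma = 1.5                           # assumed shared std dev for the EM step

def kmeans_assign(x):
    """K-means: hard assignment to the nearest centre."""
    return min(range(len(centres)), key=lambda k: abs(x - centres[k]))

def em_responsibilities(x):
    """EM with equal-weight, equal-variance Gaussians: soft memberships.

    The Gaussian normalizing constants are identical for every cluster,
    so they cancel and only the exponential terms are needed.
    """
    likes = [math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centres]
    total = sum(likes)
    return [l / total for l in likes]

for x in points:
    resp = em_responsibilities(x)
    print(x, kmeans_assign(x), [round(r, 3) for r in resp])
```

The point at 4.5 shows the difference: K-means puts it wholly in the first cluster, while EM still gives the second cluster a small but non-zero share.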

ZhaoHui



Post by Peter Kim [MS] » Thu, 05 Dec 2002 08:02:42


I'll address questions 3 and 4.

For 3:
Yes, you can use an OPENROWSET that contains
data mining commands such as SELECT.
Here is an illustrative example.

    SELECT t.k, t.[Gender], [People].[Gender]
    FROM [People] PREDICTION JOIN
    OPENROWSET(
      'MSOLAP',
      'Provider=MSOLAP;location=localhost;Initial catalog=FoodMart 2000',
      'SELECT p.k AS k, p.a AS a, [People].[Gender] AS Gender
       FROM [People] PREDICTION JOIN
       (SELECT 999 AS k, 65 AS a) AS p
       ON p.a = [People].[Age]'
    ) AS t
    ON t.a = [People].[Age]

The inner singleton query (SELECT 999 AS k, 65 AS a) is only a placeholder;
you could replace it with another OPENROWSET().
Note that this OPENROWSET() talks to Analysis Server through MSOLAP.
Although this is a SELECT example, you can do the same with INSERT INTO.

For 4:
I've just uploaded a set of sample DMX queries (zipped as DMXQuerySamples.zip)
to the MSN DM community web site (http://groups.msn.com/AnalysisServicesDataMining).
After downloading and unzipping it, use the DMSample VB app to load and run each
XML file to see how it works. The DMSample VB app is also available from the same web site.

Thanks.
--
Peter Kim
Please do not send email directly to this alias. This alias is for newsgroup purposes only.
This posting is provided "AS IS" with no warranties, and confers no rights.



Post by John Sandifor » Sun, 08 Dec 2002 01:11:27


Thanks very much for your help.


OLE DB for Data Mining Sample Provider Help

You can download the DMSample app from http://groups.msn.com/AnalysisServicesDataMining.
The DMSample app is a VB program (available as source code) that lets you create
OLE DB DM commands (CREATE, INSERT, SELECT, etc.), send them to Analysis
Server, and display the results. For details of the syntax, please see the OLE DB DM spec
(http://www.microsoft.com/data/oledb/dm.htm).

I also suggest visiting the FAQ page (http://groups.msn.com/AnalysisServicesDataMining/faq.msnw)
for much other useful information.

--
Peter Kim
Please do not send email directly to this alias. This alias is for newsgroup purposes only.
This posting is provided "AS IS" with no warranties, and confers no rights.
