Hi, any help with the following questions would be great.
1. I am applying the EM clustering method to discretize
continuous attributes, with both built-in and custom
algorithms, and it works well. However, when I move to a
larger database, I have noticed that the algorithm only
uses the first 1000 cases. As a result, even though my
cases are in no particular order, the coverage of the
buckets is small, for example 60%. How do I best get
round this?
2. With the DMSample code, what is the best way to use
the case iterators (e.g. SPM_SPM_CaseCacheItr) in an
algorithm that holds out a percentage of records for use
in scoring the algorithm? That is, how do I determine
which records to use for each stage?
3. I would like to chain a series of data mining
algorithms together, e.g. perform some pre-processing and
pass the new data into a successive mining algorithm. How
do I do this without saving data to temporary tables,
etc.? Can I use the INSERT INTO statement with a data
mining model in OPENROWSET? In addition, is this possible
using DTS?
4. Does anybody have example DM SQL for each of the
various functions, e.g. Cluster, ClusterDistance, Predict?
5. What are the options for the CLUSTERING_METHOD
parameter in the Microsoft_Clustering algorithm?
Thanks in advance