Data Mining with Cubist
Data mining is all about extracting patterns from an organization's
stored or warehoused data. These patterns can be used to gain
insight into aspects of the organization's operations, and to
predict outcomes for future situations as an aid to decision-making.
Cubist builds rule-based predictive models that output numeric values,
complementing See5/C5.0, which predicts categories. For instance,
See5/C5.0 might classify the percentage yield from some process as
"high", "medium", or "low", whereas Cubist would output a number
such as "73".
Cubist is a powerful tool for generating rule-based models that
balance the need for accurate prediction against the requirements of
intelligibility. Cubist models generally give better results than
those produced by simple techniques such as multivariate linear
regression, while also being easier to understand than neural
networks.
Some important features:
• Cubist has been designed to analyze substantial databases containing
hundreds of thousands of records and tens to thousands of numeric or
nominal fields. If you have used neural networks or similar modeling
tools, you'll be surprised by Cubist's speed! (Cubist also takes
advantage of processors with dual cores, dual CPUs, or Intel
Hyper-Threading to speed up model-building.)
• To maximize interpretability, Cubist models are expressed as
collections of rules, where each rule has an associated multivariate
linear model. Whenever a situation matches a rule's conditions, the
associated model is used to calculate the predicted value.
• Cubist is available for Windows 98/Me/2000/XP and several flavors of
Unix.
• Cubist is easy to use and does not presume advanced knowledge of
statistics or machine learning (although these don't hurt, either!).
• RuleQuest provides C source code so that models constructed by
Cubist can be embedded in your organization's own systems.
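To make the idea of "rules, each with an associated multivariate linear model" concrete, here is a minimal Python sketch. It is not Cubist's implementation, model format, or API: the Rule class, the predict function, the attributes temperature and pressure, and every coefficient are hypothetical. Two behaviors are also assumptions of this sketch rather than statements from the text above: when several rules match a case, their predictions are averaged, and a default model is used when no rule matches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, float]  # attribute name -> value for one record


@dataclass
class Rule:
    """A hypothetical rule: conditions plus an associated linear model."""
    conditions: List[Callable[[Case], bool]]  # all must hold for the rule to fire
    intercept: float
    coefficients: Dict[str, float]  # attribute name -> linear coefficient

    def matches(self, case: Case) -> bool:
        return all(cond(case) for cond in self.conditions)

    def linear_value(self, case: Case) -> float:
        # intercept + sum of (coefficient * attribute value)
        return self.intercept + sum(
            coef * case[name] for name, coef in self.coefficients.items()
        )


def predict(rules: List[Rule], default: Rule, case: Case) -> float:
    """Average the linear models of all matching rules (an assumption of
    this sketch); fall back to the default model when none match."""
    matching = [r for r in rules if r.matches(case)]
    if not matching:
        return default.linear_value(case)
    return sum(r.linear_value(case) for r in matching) / len(matching)


# Hypothetical rules predicting a process yield from made-up attributes.
rules = [
    Rule([lambda c: c["temperature"] <= 80], 10.0,
         {"temperature": 0.5, "pressure": 2.0}),
    Rule([lambda c: c["temperature"] > 90], 90.0, {"temperature": -0.25}),
    Rule([lambda c: c["pressure"] > 3], 60.0, {}),
]
default = Rule([], 50.0, {})  # hypothetical fallback model

# The first and third rules both fire: (53.0 + 60.0) / 2
print(predict(rules, default, {"temperature": 70, "pressure": 4}))  # 56.5
```

Because every linear model is attached to a readable condition, each prediction can be traced back to the rules that fired; for the case above, the output is explained entirely by the first and third rules.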
