plot.CoreModel {CORElearn}    R Documentation
The method plot visualizes the models returned by the CoreModel() function, or summaries obtained by applying these models to data. Different plots are produced depending on the type of the model.
```r
## S3 method for class 'CoreModel'
plot(x, trainSet,
     rfGraphType=c("attrEval", "outliers", "scaling", "prototypes", "attrEvalCluster"),
     clustering=NULL, ...)
```
| Argument | Description |
|---|---|
| x | The model structure as returned by CoreModel. |
| trainSet | The data frame containing the training data that produced the model. |
| rfGraphType | The type of graph to produce for random forest models. See details. |
| clustering | The clustering of the training instances, used by some graph types. See details. |
| ... | Other options controlling the graphical output, passed on to the underlying graphical functions. |
The output of the function CoreModel is visualized. Depending on the model type, different visualizations are produced. Currently, classification trees, regression trees, and random forests are supported (models "tree", "regTree", "rf", and "rfNear").
For classification and regression trees (models "tree" and "regTree"), the visualization produces a graph representing the structure of the classification or regression tree, respectively. This process exploits the graphical capabilities of the rpart package: the internal structures of CoreModel are converted to an rpart.object and then visualized by calling plot.rpart and text.rpart with sensible values of the graphical parameters. For a more versatile picture, use getRpartModel to obtain the rpart.object and call these two functions directly with different parameters.
An alternative is to plot the rpart.object with the rpart.plot package. Note, however, that rpart.plot can display only a single value in a leaf, which is not appropriate for model trees that use, e.g., linear regression in the leaves. For such models the function display is a better alternative.
For random forest models (models "rf" and "rfNear"), different types of visualizations are produced depending on the rfGraphType parameter:

- "attrEval": the attributes are evaluated with the random forest model and the importance scores are then visualized. For details see rfAttrEval.
- "attrEvalCluster": as with "attrEval", the attributes are evaluated with the random forest model and the importance scores are visualized, but the scores are generated for each cluster separately. The parameter clustering provides the clustering information on trainSet. If clustering is NULL, the class values are used as clustering information, and the attribute importance is visualized for each class separately. For details see rfAttrEvalClustering.
- "outliers": the random forest proximity measure of the training instances in trainSet is visualized, and outliers can be detected for each class separately. For details see rfProximity and rfOutliers.
- "prototypes": typical instances are found based on the predicted class probabilities, and their values are visualized (see classPrototypes).
- "scaling": produces a scaling plot of the training instances in a two-dimensional space, using the random forest based proximity as the distance (see rfProximity and the scaling function cmdscale).
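The "outliers", "prototypes" and "scaling" graph types summarize a fitted forest through fairly simple computations on a proximity matrix or on predicted class probabilities. The base-R sketch below illustrates those computations under stated assumptions; it is not CORElearn's implementation. The helper names proximityOutliers, probabilityPrototypes and proximityScaling are hypothetical. The outlier score follows the outlyingness measure described for random forests by Breiman and Cutler (raw score n divided by the sum of squared within-class proximities, then median/MAD normalization within each class); prototypes are assumed to be the instances with the highest predicted probability of each class; and scaling is assumed to apply cmdscale to the dissimilarity 1 - proximity. The toy inputs are made up; with a real model, the proximities and probabilities would come from the fitted forest (e.g., via rfProximity).

```r
# Base-R sketch of the computations behind the "outliers", "prototypes" and
# "scaling" graph types. Helper names are hypothetical, not CORElearn API,
# and CORElearn's own implementation may differ.

# Outlier score within each class (after Breiman and Cutler's outlyingness):
# raw = n / (sum of squared proximities to the other members of the class),
# then centred by the class median and divided by the class MAD.
proximityOutliers <- function(prox, cls) {
  cls <- factor(cls)
  n <- nrow(prox)
  diag(prox) <- 0                           # ignore self-proximity
  score <- numeric(n)
  for (lev in levels(cls)) {
    idx <- which(cls == lev)
    s <- rowSums(prox[idx, idx, drop = FALSE]^2)
    raw <- n / pmax(s, .Machine$double.eps) # guard against division by zero
    spread <- mad(raw)
    if (spread == 0) spread <- 1            # degenerate class: only centre
    score[idx] <- (raw - median(raw)) / spread
  }
  score
}

# Prototypes (assumed rule): for each class, the k instances with the highest
# predicted probability of that class; prob has one named column per class.
probabilityPrototypes <- function(prob, k = 1) {
  classes <- colnames(prob)
  setNames(lapply(classes, function(j) {
    head(order(prob[, j], decreasing = TRUE), k)
  }), classes)
}

# Scaling (assumed dissimilarity 1 - proximity): classical multidimensional
# scaling with stats::cmdscale into k dimensions (two for a scatter plot).
proximityScaling <- function(prox, k = 2) {
  cmdscale(1 - prox, k = k)
}

# Toy inputs, made up for illustration (not produced by CoreModel).
# A single class keeps the outlier example small.
prox <- matrix(c(1.0, 0.8, 0.7, 0.1,
                 0.8, 1.0, 0.6, 0.2,
                 0.7, 0.6, 1.0, 0.1,
                 0.1, 0.2, 0.1, 1.0), nrow = 4, byrow = TRUE)
cls <- c("a", "a", "a", "a")
prob <- matrix(c(0.9, 0.1,
                 0.6, 0.4,
                 0.2, 0.8,
                 0.3, 0.7), ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("a", "b")))

out <- proximityOutliers(prox, cls)
protos <- probabilityPrototypes(prob, k = 1)
coords <- proximityScaling(prox)
```

In the toy data, instance 4 has low proximity to all other instances, so it receives by far the largest outlier score; the two-column coords matrix can be passed to plot() to draw a scaling plot.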
The method returns no value.
John Adeyanju Alao (initial implementation) and Marko Robnik-Sikonja (integration, improvements)
Leo Breiman: Random Forests. Machine Learning, 45(1):5-32, 2001
CoreModel, rfProximity, pam, rfClustering, rfAttrEvalClustering, rfOutliers, classPrototypes, cmdscale
```r
library(CORElearn)
library(rpart)  # provides plot.rpart and text.rpart for the converted tree

# decision tree
dataset <- CO2
md <- CoreModel(Plant ~ ., dataset, model="tree")
plot(md, dataset)
# a more versatile graph can be obtained by explicit conversion to rpart.object
rpm <- getRpartModel(md, dataset)
# and then setting additional graphical parameters in plot.rpart and text.rpart,
# e.g., branch=0.5 draws branches halfway between V-shaped (0) and
# square-shouldered (1), minbranch=5 sets the minimum branch length to 5 times
# the average branch length, and compress=TRUE tries a more compact layout
plot(rpm, branch=0.5, minbranch=5, compress=TRUE)
# pretty=0: factor levels in split labels are not abbreviated;
# digits=3: numbers are shown with 3 significant digits
text(rpm, pretty=0, digits=3)
# an alternative is to use the fancier rpart.plot package
# library(rpart.plot); rpart.plot(rpm)
# rpart.plot has many parameters controlling the output,
# but it cannot plot models in leaves
destroyModels(md) # clean up

# regression tree
dataset <- CO2
mdr <- CoreModel(uptake ~ ., dataset, model="regTree")
plot(mdr, dataset)
destroyModels(mdr) # clean up

# random forests
dataset <- iris
mdRF <- CoreModel(Species ~ ., dataset, model="rf", rfNoTrees=30, maxThreads=1)
plot(mdRF, dataset, rfGraphType="attrEval")
plot(mdRF, dataset, rfGraphType="outliers")
plot(mdRF, dataset, rfGraphType="scaling")
plot(mdRF, dataset, rfGraphType="prototypes")
plot(mdRF, dataset, rfGraphType="attrEvalCluster", clustering=NULL)
destroyModels(mdRF) # clean up
```