wcls/bcls.matrix {clv} | R Documentation |
These functions compute the two basic matrix cluster scatter measures: the within-cluster and the between-cluster scatter matrices.
wcls.matrix(data, clust, cluster.center)
bcls.matrix(cluster.center, cluster.size, mean)
data |
|
clust |
integer |
cluster.center |
|
cluster.size |
integer |
mean |
mean of all data objects. |
There are two basic matrix scatter measures.
1. within-cluster scatter measure defined as:
W = sum(forall k in 1:cluster.num) W(k)
where W(k) = sum(forall x in cluster k) (x - m(k))*(x - m(k))'
x | - an object belonging to cluster k, |
m(k) | - center of cluster k. |
2. between-cluster scatter measure defined as:
B = sum(forall k in 1:cluster.num) |C(k)|*( m(k) - m )*( m(k) - m )'
|C(k)| | - size of cluster k, |
m(k) | - center of cluster k, |
m | - center of all data objects. |
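The two definitions above can be traced by hand on a tiny data set. The following is a minimal base-R sketch, not the package's implementation: the toy matrix `x`, the vector `clust`, and the name `T.total` are illustrative assumptions, and each cluster center is assumed to be the mean of its objects.

```r
# Toy data: four 2-D objects in two clusters (values are illustrative only)
x <- matrix(c(0, 0,
              2, 0,
              4, 4,
              6, 4), ncol = 2, byrow = TRUE)
clust <- c(1L, 1L, 2L, 2L)

p <- ncol(x)
m <- colMeans(x)                       # m: center of all data objects
W <- matrix(0, p, p)
B <- matrix(0, p, p)

for (k in sort(unique(clust))) {
  xk <- x[clust == k, , drop = FALSE]  # objects belonging to cluster k
  mk <- colMeans(xk)                   # m(k), assumed to be the cluster mean
  dk <- sweep(xk, 2, mk)               # rows hold x - m(k)
  W  <- W + crossprod(dk)              # adds W(k) = sum (x - m(k)) (x - m(k))'
  B  <- B + nrow(xk) * tcrossprod(mk - m)  # adds |C(k)| (m(k) - m) (m(k) - m)'
}

# Total scatter around the overall center
T.total <- crossprod(sweep(x, 2, m))
```

When the centers are cluster means, `W + B` equals the total scatter `T.total`, which is why the two matrices can simply be added to obtain a total scatter matrix.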
wcls.matrix | returns the W matrix (within-cluster scatter measure), |
bcls.matrix | returns the B matrix (between-cluster scatter measure). |
Lukasz Nieweglowski
T. Hastie, R. Tibshirani, G. Walther. Estimating the number of data clusters via the Gap statistic. http://citeseer.ist.psu.edu/tibshirani00estimating.html
# load and prepare data
library(clv)
data(iris)
iris.data <- iris[,1:4]

# cluster data
pam.mod <- pam(iris.data,5) # create five clusters
v.pred <- as.integer(pam.mod$clustering) # get cluster ids associated to given data objects

# compute cluster sizes, center of each cluster
# and mean from data objects
cls.attr <- cls.attrib(iris.data, v.pred)
center <- cls.attr$cluster.center
size <- cls.attr$cluster.size
iris.mean <- cls.attr$mean

# compute matrix scatter measures
W.matrix <- wcls.matrix(iris.data, v.pred, center)
B.matrix <- bcls.matrix(center, size, iris.mean)
T.matrix <- W.matrix + B.matrix

# example of indices based on W, B and T matrices
mx.scatt.crit1 <- sum(diag(W.matrix))
mx.scatt.crit2 <- sum(diag(B.matrix))/sum(diag(W.matrix))
mx.scatt.crit3 <- det(W.matrix)/det(T.matrix)
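As a cross-check on indices of this kind, the same three criteria can be sketched in base R without any clv calls. This is only an illustration under stated assumptions: `kmeans` is used as a stand-in clustering (it is not the method used in the example above), and the names `W`, `B`, `T.mat`, and `crit1` to `crit3` are hypothetical. The traces of the scatter matrices should agree with the sums of squares that `kmeans` reports.

```r
# Base-R sketch; kmeans serves only as a stand-in clustering method
set.seed(1)
x  <- as.matrix(iris[, 1:4])
km <- kmeans(x, centers = 3, nstart = 10)

m <- colMeans(x)                                   # center of all data objects

# Within-cluster scatter: deviations of each object from its own cluster center
W <- crossprod(x - km$centers[km$cluster, ])

# Between-cluster scatter: each row (m(k) - m) is scaled by sqrt(|C(k)|),
# so the cross-product weights cluster k by its size
B <- crossprod(sweep(km$centers, 2, m) * sqrt(km$size))

T.mat <- W + B                                     # total scatter

crit1 <- sum(diag(W))                              # trace of W
crit2 <- sum(diag(B)) / sum(diag(W))               # trace(B) / trace(W)
crit3 <- det(W) / det(T.mat)                       # det(W) / det(T)
```

Here `sum(diag(W))` should match `km$tot.withinss`, and `sum(diag(B))` should match `km$betweenss`, which gives a quick sanity check on the matrix computations.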