skmeans {snipEM} | R Documentation |
Perform k-means clustering on a data matrix with cellwise outliers using a snipping algorithm.
skmeans(X, k, V, clust, s, itersmax = 10^5, D = 1e-1)
X | Data matrix. |
k | Integer; number of clusters. |
V | Binary matrix of the same size as X. Zeros correspond to initial snipped entries. |
clust | Vector of size n containing the initial class labels. |
s | Binary vector of size n. Zeros correspond to initial trimmed rows. |
itersmax | Maximum number of iterations of the algorithm. Default is 10^5. |
D | Tuning parameter for the fitting algorithm. Corresponds approximately to the maximal change in loss from switching two non-outlying entries. Comparing different choices is recommended. Default is 1e-1. |
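Since comparing different choices of D is recommended, a small helper along the following lines may be convenient. This is only a sketch: `compare_D`, `fit_fun` and `D_grid` are hypothetical names that are not part of snipEM, and the helper assumes only what the usage line and the returned `loss` element document, namely that the fitting function accepts `(X, k, V, clust, s, D = ...)` and returns a list containing `loss`.

```r
# Hypothetical helper (not part of snipEM): refit over a grid of D values
# and collect the loss at convergence for each choice.
compare_D <- function(fit_fun, X, k, V, clust, s, D_grid = c(0.01, 0.1, 1)) {
  losses <- vapply(
    D_grid,
    function(D) fit_fun(X, k, V, clust, s, D = D)$loss,
    numeric(1)
  )
  data.frame(D = D_grid, loss = losses)
}
```

With snipEM installed, a call such as `compare_D(snipEM::skmeans, X, k, Vinit, clustinit, sinit)` would then give one row per value of D, where `Vinit`, `clustinit` and `sinit` stand for user-supplied initial values.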
This function computes the skmeans estimator of Farcomeni (2014). It leads to robust k-means in presence of entry-wise and cellwise outliers. The number of snipped entries sum(1-V) and trimmed rows sum(1-s) is kept fixed throughout. Initial estimates for V, s and clust should be provided. Note that initializing with labels arising from classical (non-robust) clustering methods may be detrimental for the final performance of skmeans and may even yield an error due to empty clusters.
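One possible way to build the three initial values in base R is sketched below. The toy data, the 2.5%/97.5% snipping cut-offs and the median-imputation step before `kmeans` are illustrative assumptions, not a procedure prescribed by the package; `bad`, `Ximp`, `Vinit`, `sinit` and `clustinit` are hypothetical names. The final check guards against the empty-cluster problem mentioned above.

```r
set.seed(1)
n <- 200; p <- 5; k <- 2

# Toy data (made up): two separated clusters with 5% contaminated cells
X <- rbind(matrix(rnorm(100 * p), 100, p),
           matrix(rnorm(100 * p, mean = 15), 100, p))
bad <- sample(n * p, 0.05 * n * p)
X[bad] <- runif(length(bad), -100, 100)

# Initial V: zeros mark the most extreme 5% of cells as snipped
Vinit <- matrix(1, n, p)
Vinit[X < quantile(X, 0.025) | X > quantile(X, 0.975)] <- 0

# Initial s: all ones, i.e. no trimmed rows
sinit <- rep(1, n)

# Initial labels (assumption): k-means after replacing snipped cells by the
# column median of the retained cells, so extreme cells do not drive the start
Ximp <- X
Ximp[Vinit == 0] <- NA
col_med <- apply(Ximp, 2, median, na.rm = TRUE)
for (j in seq_len(p)) Ximp[is.na(Ximp[, j]), j] <- col_med[j]
clustinit <- kmeans(Ximp, centers = k, nstart = 10)$cluster

# Every one of the k clusters must be non-empty in the initial labels
sizes <- tabulate(clustinit, nbins = k)
stopifnot(all(sizes > 0))

# These two counts stay fixed during the fit
c(snipped = sum(1 - Vinit), trimmed = sum(1 - sinit))
```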
A list with the following elements:
loss | Loss function (the total sum of squares) at convergence. |
mu | Estimated locations. |
s | Final (optimal) trimmed-row indicator as a binary vector of size n. |
V | Final (optimal) V matrix. |
clust | Final (optimal) class labels as a vector of size n. |
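To make the relation between the returned elements concrete, the sketch below recomputes a sum of squares from X, V, s, clust and mu. It assumes that the loss is the sum of squared deviations from the assigned cluster location, taken over non-snipped cells in non-trimmed rows, and that mu is a k-by-p matrix with one row per cluster; the documentation above only states "total sum of squares", so the exact definition is an assumption, and `snipped_ss` is a hypothetical name.

```r
# Hypothetical helper: sum of squares over retained cells only.
# V is an n x p 0/1 matrix, s a 0/1 vector of length n, mu a k x p matrix.
snipped_ss <- function(X, V, s, clust, mu) {
  resid2 <- (X - mu[clust, , drop = FALSE])^2
  # s is recycled down the columns, so s[i] multiplies every cell of row i
  sum(resid2 * V * s)
}

# Tiny worked example with one snipped cell, the value 100 in row 3
X     <- matrix(c(0, 2, 10, 12,
                  1, 1, 100, 9), nrow = 4, ncol = 2)
V     <- matrix(1, 4, 2); V[3, 2] <- 0
s     <- rep(1, 4)
clust <- c(1, 1, 2, 2)
mu    <- matrix(c(1, 11,
                  1, 9), nrow = 2, ncol = 2)

snipped_ss(X, V, s, clust, mu)               # snipped cell is ignored
snipped_ss(X, V, c(1, 1, 1, 0), clust, mu)   # row 4 additionally trimmed
```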
Alessio Farcomeni alessio.farcomeni@uniroma1.it, Andy Leung andy.leung@stat.ubc.ca
Farcomeni, A. (2014). Snipping for robust k-means clustering under component-wise contamination. Statistics and Computing, 24, 909-917.
set.seed(1234)
X <- matrix(NA, 200, 5)

# two clusters
k <- 2
X[1:100, ] <- rnorm(100 * 5)
X[101:200, ] <- rnorm(100 * 5, 15)
clust <- rep(c(1, 2), each = 100)

# 5% cellwise outliers
# (here s holds the indices of the contaminated cells, not the s argument)
s <- sample(200 * 5, 200 * 5 * 0.05)
X[s] <- runif(200 * 5 * 0.05, -100, 100)

# true snipping pattern: 0 for contaminated cells, 1 otherwise
V <- X
V[s] <- 0
V[-s] <- 1

# Initial V and cluster labels
Vinit <- matrix(1, nrow(X), ncol(X))
Vinit[which(X > quantile(X, 0.975) | X < quantile(X, 0.025))] <- 0
km <- kmeans(X, k)
clustinit <- km$cluster

# Snipped robust clustering
skm <- skmeans(X, k, Vinit, clustinit)

# true labels against classical and snipped k-means labels
table(clust, km$cluster)
table(clust, skm$clust)