sclust {snipEM} | R Documentation |
Estimates a finite Gaussian mixture model optimized over a snipping set.
sclust(X, k, V, R, restr.fact=12, tol = 1e-04, maxiters = 100, maxiters.S = 1000, print.it = FALSE)
X |
Data. |
k |
Number of clusters |
V |
Binary matrix of the same size as X; zeros mark snipped entries. |
R |
Initial guess for cluster labels, |
restr.fact |
Restriction factor, i.e., constraint on the condition number of all covariance matrices for each cluster. Default is 12. |
tol |
Tolerance for convergence. Default is 1e-04. |
maxiters |
Maximum number of iterations for the SM algorithm. Default is 100. |
maxiters.S |
Maximum number of iterations of the inner greedy snipping algorithm. Default is 1000. |
print.it |
Logical; if TRUE, partial results are printed. Default is FALSE. |
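As a rough illustration of the quantity that restr.fact bounds, the sketch below computes the condition number (largest eigenvalue divided by smallest) of each covariance matrix in a list, using base R only. The helper name cond_numbers and the list-of-matrices layout are assumptions made for this example; how sclust enforces the constraint internally is not shown here.

```r
# Hypothetical helper (not part of snipEM): condition number of each
# covariance matrix in a list, i.e. largest / smallest eigenvalue.
cond_numbers <- function(S_list) {
  vapply(S_list, function(S) {
    ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
    max(ev) / min(ev)
  }, numeric(1))
}

# Two toy diagonal covariance matrices
S_list <- list(diag(c(1, 2, 4)), diag(c(1, 1, 20)))
cn <- cond_numbers(S_list)

# Compare against the default restr.fact = 12
within_bound <- cn <= 12
```

Here the first matrix falls within the default bound and the second does not, so a fit with restr.fact = 12 would not be expected to return a scatter matrix as elongated as the second one.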
This function computes the sclust estimator of Farcomeni (2014). It provides robust mixture modeling in the presence of entry-wise outliers and is based on a classification-expectation-snip-maximize (CESM) algorithm. At the S step, the likelihood is optimized over the set of snipped entries; at the M step, the location and scatter estimates are updated. The S step is based on a greedy algorithm, unlike the one proposed in Farcomeni (2014, 2014a). The number of snipped entries, sum(1-V), is kept fixed throughout. Note that initializing with labels from classical (non-robust) clustering methods may harm the final performance of sclust and may even produce an error due to empty clusters.
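Because the number of snipped entries is fixed by the initial V and empty clusters can cause an error, it may help to inspect the starting values before fitting. The following is a minimal sketch of such a check; check_init is a hypothetical helper, and it assumes that labels are coded 1 to k and that zeros in V mark snipped entries.

```r
# Hypothetical pre-flight check (not part of snipEM).
# Assumes labels in R are coded 1..k and zeros in V mark snipped entries.
check_init <- function(X, k, V, R) {
  stopifnot(identical(dim(V), dim(X)),   # V must match X in shape
            all(V %in% c(0, 1)),         # V must be binary
            length(R) == nrow(X))        # one label per row
  sizes <- tabulate(R, nbins = k)
  list(n_snipped = sum(1 - V),           # stays fixed during fitting
       cluster_sizes = sizes,
       empty_cluster = any(sizes == 0))
}

set.seed(1)
X <- matrix(rnorm(20 * 3), 20, 3)
V <- matrix(1, 20, 3)
V[sample(60, 3)] <- 0                    # snip three cells
R <- rep(1:2, each = 10)
chk <- check_init(X, 2, V, R)
```

If chk$empty_cluster is TRUE, a different initial labeling is probably needed before calling sclust.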
A list with the following elements:
R | Final cluster labels. |
mu | Estimated location matrix. |
S | Array of estimated scatter matrices. |
V | Final (optimal) V matrix. |
lik | Gaussian log-likelihood at convergence. |
iter | Number of outer iterations before convergence. |
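To give an idea of what a Gaussian log-likelihood restricted to unsnipped entries can look like, the sketch below evaluates a single row using only the coordinates where V equals 1. It assumes that snipped cells are simply treated as unobserved; it is not the package's code, and snipped_loglik_row is a hypothetical name.

```r
# Hypothetical sketch (not part of snipEM): Gaussian log-density of one
# row x, using only the coordinates with v == 1.
snipped_loglik_row <- function(x, v, mu, S) {
  keep <- which(v == 1)
  if (length(keep) == 0) return(0)   # fully snipped row contributes nothing
  d  <- x[keep] - mu[keep]
  Sk <- S[keep, keep, drop = FALSE]
  logdet <- as.numeric(determinant(Sk, logarithm = TRUE)$modulus)
  -0.5 * (length(keep) * log(2 * pi) + logdet + sum(d * solve(Sk, d)))
}

x  <- c(0.5, 100, -0.2)              # second entry is an outlying cell
mu <- c(0, 0, 0)
S  <- diag(3)
ll_all  <- snipped_loglik_row(x, c(1, 1, 1), mu, S)  # outlier kept
ll_snip <- snipped_loglik_row(x, c(1, 0, 1), mu, S)  # outlier snipped
```

Snipping the outlying cell raises the row's log-likelihood considerably, which is the kind of gain a search over snipped entries would try to exploit.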
Alessio Farcomeni alessio.farcomeni@uniroma1.it, Andy Leung andy.leung@stat.ubc.ca
Farcomeni, A. (2014) Snipping for robust k-means clustering under component-wise contamination, Statistics and Computing, 24, 909-917
Farcomeni, A. (2014) Robust constrained clustering in presence of entry-wise outliers, Technometrics, 56, 102-111
snipEM, stEM, sumlog, ldmvnorm
library(snipEM)
set.seed(1234)
X <- matrix(NA, 200, 5)
# two clusters
k <- 2
X[1:100, ] <- rnorm(100 * 5)
X[101:200, ] <- rnorm(100 * 5, 15)
R <- rep(c(1, 2), each = 100)
# 5% cellwise outliers
s <- sample(200 * 5, 200 * 5 * 0.05)
X[s] <- runif(200 * 5 * 0.05, -100, 100)
# True snipping matrix: 0 for contaminated cells, 1 otherwise
V <- X
V[s] <- 0
V[-s] <- 1
# Initial V and R
Vinit <- matrix(1, nrow(X), ncol(X))
Vinit[which(X > quantile(X, 0.975) | X < quantile(X, 0.025))] <- 0
Rinit <- kmeans(X, 2)$cluster
# Snipped robust clustering
sc <- sclust(X, 2, Vinit, Rinit)
# Compare true labels with the initial and final labels
table(R, Rinit)
table(R, sc$R)