em.cluster.R {ClustMMDD} | R Documentation |
Compute an approximation of the maximum likelihood estimates of the parameters using the Expectation-Maximization (EM) algorithm. A maximum a posteriori classification is then derived from the estimated parameters.
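To make the description concrete, here is a minimal base-R sketch of EM for a mixture of independent categorical variables with ploidy = 1. It is not the ClustMMDD implementation: the function name `em_sketch`, the random initialisation, the fixed iteration count and the small smoothing constant are all assumptions made for illustration.

```r
# Illustrative EM for a latent class model (ploidy = 1); not ClustMMDD's code.
em_sketch <- function(x, K, n_iter = 50, seed = 1) {
  set.seed(seed)
  # Treat every column of the string matrix as a categorical variable
  x <- as.data.frame(lapply(as.data.frame(x, stringsAsFactors = FALSE), factor))
  N <- nrow(x)
  # Random soft initialisation of the membership matrix (rows sum to 1)
  tik <- matrix(runif(N * K), N, K)
  tik <- tik / rowSums(tik)
  for (it in seq_len(n_iter)) {
    # M-step: mixing proportions and per-cluster level frequencies
    pi_k <- colMeans(tik)
    prob <- lapply(x, function(v) {
      ind <- outer(as.integer(v), seq_len(nlevels(v)), "==") * 1  # N x levels
      p <- t(ind) %*% tik + 1e-8   # levels x K weighted counts, lightly smoothed
      sweep(p, 2, colSums(p), "/") # each column sums to 1
    })
    # E-step: log of pi_k * f_k(x_i) for every row i and cluster k
    log_f <- matrix(log(pi_k), N, K, byrow = TRUE)
    for (j in seq_along(x)) {
      log_f <- log_f + log(prob[[j]][as.integer(x[[j]]), , drop = FALSE])
    }
    m <- apply(log_f, 1, max)      # row maxima, for numerical stability
    log_lik <- sum(m + log(rowSums(exp(log_f - m))))
    tik <- exp(log_f - m)
    tik <- tik / rowSums(tik)
  }
  list(pi_K = pi_k, prob = prob, logLik = log_lik, Tik = tik,
       mapClassif = max.col(tik, ties.method = "first"))
}

# Usage on made-up data: 60 rows, two categorical variables
set.seed(42)
toy <- cbind(sample(c("a", "b", "c"), 60, replace = TRUE),
             sample(c("x", "y"), 60, replace = TRUE))
fit <- em_sketch(toy, K = 2)
fit$pi_K
table(fit$mapClassif)
```

The returned names mirror some of the components listed under the return value, but the structure is only a convenience of this sketch.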
em.cluster.R(xdata, K, S, ploidy = 1, emOptions = list(epsi = NULL, typeSmallEM = NULL, typeEM = NULL, nberSmallEM = NULL, nberIterations = NULL, nberMaxIterations = NULL, putThreshold = NULL), cte = 1)
xdata |
A matrix of strings with the number of columns equal to ploidy * (number of variables). |
K |
The number of clusters (or populations). |
S |
The subset of clustering variables in the form of a vector of logicals indicating the selected variables. S gathers variables that are not identically distributed in at least two clusters. |
ploidy |
The number of unordered observations represented by a string in xdata. |
emOptions |
A list of EM options (see EmOptions and setEmOptions). |
cte |
A double used as the value of λ in the penalty function $\mathrm{pen}(K,S) = \lambda \cdot \dim(K,S)$, where $\dim(K,S)$ is the number of free parameters in the model defined by $(K,S)$. |
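As a rough illustration of how the arguments fit together, the sketch below builds a selection vector S, checks that a data matrix has ploidy * (number of variables) columns, and evaluates the penalty for a given number of free parameters. The helper `check_inputs` and the value of `dim_KS` are hypothetical and are not part of ClustMMDD.

```r
# Hypothetical setting: 10 variables, diploid data (ploidy = 2)
ploidy <- 2
S <- c(rep(TRUE, 8), rep(FALSE, 2))  # first 8 variables are used for clustering
which(S)                             # indices of the selected variables

# xdata is expected to have ploidy * (number of variables) columns
expected_cols <- ploidy * length(S)

# Hypothetical helper: basic consistency check before calling em.cluster.R
check_inputs <- function(xdata, S, ploidy) {
  stopifnot(is.matrix(xdata), is.character(xdata), is.logical(S),
            ncol(xdata) == ploidy * length(S))
  invisible(TRUE)
}
check_inputs(matrix("101", nrow = 3, ncol = expected_cols), S, ploidy)

# Penalty pen(K, S) = cte * dim(K, S)
cte <- 1
dim_KS <- 37   # made-up number of free parameters, for illustration only
pen <- cte * dim_KS
```

With cte = 1 the penalty is simply the number of free parameters; larger values of cte favour smaller models.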
A list with the following components:
N : The size (number of lines) of the dataset.
K : The number of clusters (populations).
S : A vector of logicals indicating the selected variables for clustering.
dim : The number of free parameters.
pi_K : The vector of mixing proportions.
prob : A list of matrices, each matrix being the probabilities of a variable in different clusters.
logLik : The log-likelihood.
entropy : The entropy.
criteria : Criteria values c(BIC, AIC, ICL, CteDim).
Tik : A stochastic matrix giving the a posteriori membership probabilities.
mapClassif : Maximum a posteriori classification.
NbersLevels : The numbers of observed levels of the considered categorical variables.
levels : The observed levels.
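The relationship between some of these components can be sketched in base R. The membership matrix, log-likelihood and dimension below are made-up numbers, and the criteria use common textbook definitions; the exact scaling and sign conventions used inside ClustMMDD are not stated on this page and may differ.

```r
# Made-up membership probabilities: N = 4 observations, K = 3 clusters
Tik <- matrix(c(0.80, 0.15, 0.05,
                0.10, 0.70, 0.20,
                0.30, 0.30, 0.40,
                0.05, 0.05, 0.90), ncol = 3, byrow = TRUE)
stopifnot(all(abs(rowSums(Tik) - 1) < 1e-12))  # each row sums to 1

# Maximum a posteriori classification: most probable cluster for each row
mapClassif <- max.col(Tik, ties.method = "first")
mapClassif  # 1 2 3 3

# Classification entropy: -sum over i and k of Tik * log(Tik)
entropy <- -sum(ifelse(Tik > 0, Tik * log(Tik), 0))

# Textbook criteria (assumed forms, not necessarily the package's)
N <- nrow(Tik)
logLik_hat <- -12.3  # made-up log-likelihood
dim_KS <- 5          # made-up number of free parameters
cte <- 1
BIC <- -2 * logLik_hat + dim_KS * log(N)
AIC <- -2 * logLik_hat + 2 * dim_KS
ICL <- BIC + 2 * entropy
CteDim <- -logLik_hat + cte * dim_KS  # assumed form of the cte * dim criterion
```

Because the entropy is non-negative, ICL in this form is never smaller than BIC, so it penalises poorly separated clusters more heavily.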
Wilson Toussile.
Dominique Bontemps and Wilson Toussile (2013): Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, Vol. 7, 2344-2371.
Wilson Toussile and Elisabeth Gassiat (2009): Variable selection in model-based clustering using multilocus genotype data. Adv Data Anal Classif, Vol. 3, No. 2, 109-134.
dataR2C for transformation of a classic data frame; backward.explorer, selectK.R, dimJump.R and model.selection.R for both model selection and classification.
data(genotype1)
head(genotype1)
genotype2 = cutEachCol(genotype1[, -11], ploidy = 2)
head(genotype2)

# See the EM options
EmOptions()
# Options can be set with setEmOptions()
par5 = em.cluster.R(genotype2, K = 5, S = c(rep(TRUE, 8), rep(FALSE, 2)), ploidy = 2)
slotNames(par5)
head(par5["membershipProba"])
par5["mixingProportions"]
par5