selectK.R {ClustMMDD} | R Documentation
Select the number K of clusters for a given subset S of clustering variables.
selectK.R(xdata, S, Kmax, ploidy = 1, Kmin = 1,
          emOptions = list(epsi = 1e-05, nberSmallEM = 20,
                           nberIterations = 15, nberMaxIterations = 5000,
                           typeSmallEM = 0, typeEM = 0, putThreshold = FALSE),
          cte = 1, project = deparse(substitute(xdata)))
xdata
A dataset in which the data for each variable occupy ploidy column(s).
S
A subset of clustering variables, given as a logical vector of the same length P as the number of variables in
Kmax
The maximum number of clusters to be explored.
ploidy
The number of occurrences of each variable in the data. For example, ploidy = 2 for genotype
Kmin
The minimum number of clusters to be explored. The default value is 1.
emOptions
A list of EM options (see
cte
A double used for the selection criterion named
project
The name of the project. The default value is the name of the dataset.
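The relationship between xdata, ploidy and S can be checked before calling selectK.R. The following is a minimal sketch in base R only, not part of ClustMMDD: the toy data frame, its column names and the variable n_selected are hypothetical, and it assumes that the ploidy columns of each variable sit next to each other.

```r
# Hypothetical toy dataset (not from ClustMMDD): 4 individuals, 3 variables,
# ploidy = 2, so each variable is assumed to occupy 2 adjacent columns.
ploidy <- 2
toy <- data.frame(
  loc1.1 = c(1, 2, 1, 2), loc1.2 = c(2, 2, 1, 1),
  loc2.1 = c(3, 3, 1, 2), loc2.2 = c(1, 3, 2, 2),
  loc3.1 = c(1, 1, 1, 2), loc3.2 = c(2, 1, 2, 2)
)

# The column count must be a multiple of ploidy for this layout to hold.
stopifnot(ncol(toy) %% ploidy == 0)

# P is the number of variables, not the number of columns.
P <- ncol(toy) / ploidy

# Select the first two variables as clustering variables.
S <- c(TRUE, TRUE, FALSE)

# S must be logical, with one entry per variable.
stopifnot(is.logical(S), length(S) == P)

# Number of selected clustering variables.
n_selected <- sum(S)
```

With ploidy = 2, a dataset of six columns describes three variables, so S needs three entries rather than six.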
A list of estimated parameters for each selection criterion.
Wilson Toussile
Dominique Bontemps and Wilson Toussile (2013): Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, Volume 7, 2344-2371, ISSN.
Wilson Toussile and Elisabeth Gassiat (2009): Variable selection in model-based clustering using multilocus genotype data. Adv Data Anal Classif, Vol 3, number 2, 109-134.
backward.explorer for further exploration of the space of competing models, dimJump.R for data-driven calibration of the penalty function, and model.selection.R for model selection.
data(genotype1)
head(genotype1)
genotype2 = cutEachCol(genotype1[, -11], ploidy = 2)
head(genotype2)
S = c(rep(TRUE, 8), rep(FALSE, 2))
## Not run: 
outPut = selectK.R(genotype2, S, Kmax = 6, ploidy = 2, Kmin = 1)
outPut[["BIC"]]
file.remove("genotype2_ExploredModels.txt")
## End(Not run)