dimJump.R {ClustMMDD} | R Documentation |
Data-driven calibration of the penalty function using the dimension jump version of the "slope heuristics".
dimJump.R(fileOrData, h = integer(), N = integer(), header = logical())
fileOrData |
A character string or a data frame (see details). If a data frame, it must contain columns named |
h |
An integer defining the size of the sliding window used to find the biggest jump. |
N |
The size of the sample data (number of rows). |
header |
A logical value indicating whether the file contains a header row. |
This function implements the dimension jump version of the so-called slope heuristics for calibrating the penalty function from the data.
Assume that the penalty function is of the form
pen(K, S) = α * λ * dim(K, S),
where λ is the penalty parameter to be calibrated and α is a coefficient in [1.5, 2], to be supplied by the user in model.selection.R for the final selection.
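The idea behind a dimension jump can be illustrated with a minimal sketch. It is not the ClustMMDD implementation: the function dim_jump_sketch, its arguments, the toy log-likelihoods, and the way the window h is applied (as a number of grid steps) are all assumptions made for illustration. For each candidate λ, the model minimising -loglik + λ * dim is selected, and the calibrated λ is taken where the selected dimension drops the most.

```r
# Hypothetical illustration of a dimension-jump calibration.
# NOT the ClustMMDD code: all names and data below are assumptions.
dim_jump_sketch <- function(loglik, dims, lambda_grid, h = 1L) {
  # For each lambda, keep the dimension of the model that minimises
  # the penalised criterion -loglik + lambda * dim.
  selected_dim <- vapply(lambda_grid, function(lambda) {
    dims[which.min(-loglik + lambda * dims)]
  }, numeric(1))

  # Drop in selected dimension across a window of h grid steps
  # (assumed interpretation of the sliding window).
  idx <- seq_len(length(lambda_grid) - h)
  jumps <- selected_dim[idx] - selected_dim[idx + h]
  k <- which.max(jumps)  # first position of the biggest jump

  list(lambda_hat = lambda_grid[k + h],
       bounds = c(lambda_grid[k], lambda_grid[k + h]),
       selected_dim = selected_dim)
}

# Toy competing models: log-likelihood increases with dimension (made-up values).
dims <- c(1, 2, 4, 8, 16, 32)
loglik <- -1000 + 50 * sqrt(dims)
lambda_grid <- seq(0.5, 30, by = 0.5)

out <- dim_jump_sketch(loglik, dims, lambda_grid, h = 2L)

# Final penalty with a user-chosen alpha in [1.5, 2].
alpha <- 2
penalty <- alpha * out$lambda_hat * dims
```

In this sketch the selected dimension can only decrease as λ grows, so the biggest drop marks the transition from over-complex models to reasonable ones; out$bounds gives the grid interval in which that drop occurs.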
It returns a list containing two candidate values of λ and their bounds. It also produces a graphic that illustrates the "slope heuristics".
Wilson Toussile
Dominique Bontemps and Wilson Toussile (2013) : Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, Volume 7, 2344-2371, ISSN.
Wilson Toussile and Elisabeth Gassiat (2009) : Variable selection in model-based clustering using multilocus genotype data. Adv Data Anal Classif, Vol 3, number 2, 109-134.
backward.explorer for exploration of the space of competing models, model.selection.R for final selection.
# genotype2_ExploredModels was obtained via backward.explorer.
data(genotype2_ExploredModels)
outDimJump = dimJump.R(genotype2_ExploredModels, N = 1000, h = 5, header = TRUE)
outDimJump[[1]]