initEmmix {EMMIXskew}    R Documentation
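Obtains an initial parameter set for use in the EM algorithm. The data are grouped using one of three clustering methods: k-means, random starts, or hierarchical clustering. initEmmix computes the initial values from a given partition clust, while init.mix searches over partitions generated by these methods.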
initEmmix(dat, g, clust, distr, ncov, maxloop = 20)
init.mix(dat, g, distr, ncov, nkmeans, nrandom, nhclust, maxloop = 20)
dat: The dataset, an n by p numeric matrix, where n is the number of observations and p is the dimension of the data.
g: The number of components of the mixture model.
distr: A three-letter string indicating the type of distribution to be fitted. See Details.
ncov: An integer from 1 to 5 indicating the type of covariance structure. See Details.
clust: An initial partition of the data, given as a vector of component labels.
nkmeans: An integer specifying the number of k-means partitions to be tried when searching for the best initial values.
nrandom: An integer specifying the number of random partitions to be tried when searching for the best initial values.
nhclust: A logical value specifying whether to use hierarchical clustering. If TRUE, the complete linkage method is used.
maxloop: An integer specifying how many iterations are tried when finding the initial values. The default is 20.
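The three search strategies controlled by nkmeans, nrandom and nhclust can be pictured with base R alone. The sketch below is an illustration, not EMMIXskew's internal code: the helper name candidate_partitions is hypothetical, and it only produces the candidate label vectors, leaving out how init.mix scores them and picks the best one.

```r
# Hypothetical helper (not part of EMMIXskew): build candidate initial
# partitions of `dat` into `g` groups, mirroring nkmeans, nrandom and nhclust.
candidate_partitions <- function(dat, g, nkmeans = 1, nrandom = 0, nhclust = FALSE) {
  n <- nrow(dat)
  parts <- list()
  # k-means partitions: each call picks its own random starting centres
  for (i in seq_len(nkmeans)) {
    parts[[length(parts) + 1]] <- stats::kmeans(dat, centers = g)$cluster
  }
  # random partitions: shuffle a balanced label vector so no group is empty
  for (i in seq_len(nrandom)) {
    parts[[length(parts) + 1]] <- sample(rep_len(seq_len(g), n))
  }
  # hierarchical clustering with complete linkage, cut into g groups
  if (nhclust) {
    hc <- stats::hclust(stats::dist(dat), method = "complete")
    parts[[length(parts) + 1]] <- stats::cutree(hc, k = g)
  }
  parts
}

# Usage: two well-separated bivariate groups of 50 points each
set.seed(1)
dat <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 5), ncol = 2))
parts <- candidate_partitions(dat, g = 2, nkmeans = 3, nrandom = 2, nhclust = TRUE)
length(parts)  # 3 k-means + 2 random + 1 hierarchical = 6 label vectors
```

Generating balanced random labels with rep_len before shuffling is a choice made for the sketch, so that every random start contains all g components.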
The distribution type is determined by the distr parameter, which may take one of the following values: "mvn" for a multivariate normal distribution, "mvt" for a multivariate t-distribution, "msn" for a multivariate skew normal distribution, and "mst" for a multivariate skew t-distribution.
The covariance matrix type is given by the ncov parameter, which may take one of the following values:
ncov = 1 for a common variance;
ncov = 2 for a common diagonal variance;
ncov = 3 for a general variance;
ncov = 4 for a diagonal variance;
ncov = 5 for sigma(h)*I(p), a diagonal covariance whose diagonal elements are all equal.
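To make the five ncov structures concrete, the base-R sketch below maps a set of unconstrained per-component covariance matrices onto each structure. It is an illustration under stated assumptions, not the package's estimation step: the helper constrain_sigma is hypothetical, and pooling the "common" structures with weights pro is a choice made for the sketch.

```r
# Hypothetical illustration (not EMMIXskew code): take unconstrained
# per-component covariances S (a p x p x g array) and mixing proportions pro,
# and return a p x p x g array obeying the chosen ncov structure.
constrain_sigma <- function(S, ncov, pro) {
  p <- dim(S)[1]
  g <- dim(S)[3]
  # proportion-weighted pooled covariance, used by the "common" structures
  pooled <- matrix(0, p, p)
  for (h in seq_len(g)) pooled <- pooled + pro[h] * S[, , h]
  pooled <- pooled / sum(pro)
  out <- array(0, c(p, p, g))
  for (h in seq_len(g)) {
    out[, , h] <- switch(as.character(ncov),
      "1" = pooled,                          # common variance
      "2" = diag(diag(pooled), nrow = p),    # common diagonal variance
      "3" = S[, , h],                        # general variance
      "4" = diag(diag(S[, , h]), nrow = p),  # diagonal variance
      "5" = mean(diag(S[, , h])) * diag(p),  # sigma(h) * I(p)
      stop("ncov must be 1, 2, 3, 4 or 5"))
  }
  out
}

# Usage: two bivariate components
S <- array(0, c(2, 2, 2))
S[, , 1] <- cbind(c(2, 0.5), c(0.5, 1))
S[, , 2] <- cbind(c(1, -0.3), c(-0.3, 3))
pro <- c(0.4, 0.6)
s5 <- constrain_sigma(S, 5, pro)
s5[, , 2]  # 2 * diag(2), because mean(c(1, 3)) is 2
```

For ncov = 5 the sketch uses the average of the diagonal of each component's covariance as sigma(h), which is the natural least-squares choice for a scaled identity.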
The return value includes the following components: pro, a numeric vector of the mixing proportions of the components; mu, a p by g matrix whose columns are the component means; sigma, a three-dimensional p by p by g array whose jth slice sigma[, , j] is the covariance matrix of the jth component; dof, a vector of degrees of freedom for each component; and delta, a p by g matrix whose columns are the skew parameter vectors.
For a large dataset, using a large maxloop to try every initial partition becomes time-consuming. The default is 20.
While searching for the best initial clustering and initial values for the t-distribution and the skew t-distribution, the degrees of freedom dof are not estimated; instead, they are fixed at 4 for each component.
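Turning a partition into starting values can be sketched with simple per-group moments. This is an assumption-laden illustration, not what initEmmix computes internally: the helper params_from_partition is hypothetical, it uses sample covariances (the general ncov = 3 case), sets delta to zero, does not run any of the maxloop iterations, and requires at least two observations per group. It returns a list with the same component names described above.

```r
# Hypothetical sketch (not EMMIXskew code): moment-based initial values from
# a partition `clust` of `dat` into g groups.
params_from_partition <- function(dat, clust, g) {
  dat <- as.matrix(dat)
  p <- ncol(dat)
  n <- nrow(dat)
  pro <- numeric(g)
  mu <- matrix(0, p, g)
  sigma <- array(0, c(p, p, g))
  for (h in seq_len(g)) {
    x <- dat[clust == h, , drop = FALSE]
    pro[h] <- nrow(x) / n          # mixing proportion of component h
    mu[, h] <- colMeans(x)         # component mean
    sigma[, , h] <- stats::cov(x)  # general (ncov = 3) covariance
  }
  list(pro = pro, mu = mu, sigma = sigma,
       dof = rep(4, g),            # fixed at 4, as during the initial search
       delta = matrix(0, p, g))    # no skewness in this sketch
}

# Usage: six bivariate points in two obvious groups
dat <- cbind(c(0, 1, 0, 10, 11, 10), c(0, 0, 1, 10, 10, 11))
init <- params_from_partition(dat, clust = c(1, 1, 1, 2, 2, 2), g = 2)
init$pro  # 0.5 0.5
```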
pro: A vector of mixing proportions; see Details.
mu: A numeric matrix whose columns are the component means; see Details.
sigma: An array of dimension (p, p, g) whose first two dimensions hold the covariance matrix of each component; see Details.
dof: A vector of degrees of freedom for each component; see Details.
delta: A p by g matrix whose columns are the skew parameter vectors; see Details.
McLachlan G.J. and Krishnan T. (2008). The EM Algorithm and Extensions (2nd ed.). New Jersey: Wiley.
McLachlan G.J. and Peel D. (2000). Finite Mixture Models. New York: Wiley.
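library(EMMIXskew)

# true parameters of a three-component bivariate mixture
sigma <- array(0, c(2, 2, 3))
for (h in 2:3) sigma[, , h] <- diag(2)
sigma[, , 1] <- cbind(c(1, 0.2), c(0.2, 1))
mu    <- cbind(c(4, -4), c(3.5, 4), c(0, 0))
delta <- cbind(c(3, 3), c(1, 5), c(-3, 1))
dof   <- c(3, 5, 5)
pro   <- c(0.3, 0.3, 0.4)
n1 <- 300; n2 <- 300; n3 <- 400
nn <- c(n1, n2, n3)
n  <- 1000
p  <- 2
ng <- 3
distr <- "mvn"
ncov  <- 3

# first we generate a data set
set.seed(111)  # random seed is set
dat <- rdemmix(nn, p, ng, distr, mu, sigma, dof, delta)

# initial values from a given partition (here the true labels)
clust <- rep(1:ng, nn)
initobj1 <- initEmmix(dat, ng, clust, distr, ncov)

# initial values from the best of 10 k-means partitions
initobj2 <- init.mix(dat, ng, distr, ncov, nkmeans = 10, nrandom = 0, nhclust = FALSE)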