classifyV {MGSDA} | R Documentation |
Classify observations in the test set using the supplied matrix of canonical vectors V and the training set.
classifyV(Xtrain, Ytrain, Xtest, V, prior = T, tol1 = 1e-10)
Xtrain | An N x p data matrix; N observations on the rows and p features on the columns. |
Ytrain | A vector of length N containing the group labels. Should be coded as 1,2,...,G, where G is the number of groups. |
Xtest | An M x p data matrix; M test observations on the rows and p features on the columns. |
V | A p x r matrix of canonical vectors that is used to classify observations. |
prior | A logical indicating whether to give larger weights to groups of larger size; the default value is TRUE. |
tol1 | Tolerance level for the eigenvalues of V'WV. If some eigenvalues are less than
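The 1,2,...,G coding required for Ytrain can be obtained from arbitrary labels with base R. The following is a minimal sketch; the label values and the variable names `labels` and `f` are hypothetical and not part of MGSDA.

```r
# Hypothetical raw labels; any vector of categories would do
labels <- c("control", "treated", "control", "placebo", "treated")

# factor() sorts the distinct labels; as.integer() returns codes 1..G
f <- factor(labels)
Ytrain <- as.integer(f)
G <- nlevels(f)

# The levels map integer codes (including predicted ones) back to labels
recovered <- levels(f)[Ytrain]
```

Keeping `levels(f)` around makes it possible to translate the integer labels returned by a classifier back to the original category names.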
For a new observation with the value x, the classification is performed based on the smallest Mahalanobis distance in the projected space:
min_g (V'x-Z_g)'(V'WV)^(-1)(V'x-Z_g),
where Z_g are the group-specific means of the training dataset in the projected space and W is the sample within-group covariance matrix.
If prior=T, then the above distance is adjusted by -2 log(n_g/N), where n_g is the size of group g.
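The rule described above can be sketched from scratch in base R. This is an illustration under stated assumptions, not the package's implementation: the helper name `classify_projected` is hypothetical, W is computed with divisor N (N - G is another common choice), V'WV is assumed to be nonsingular, and the eigenvalue check controlled by tol1 is left out.

```r
# Hypothetical helper, not part of MGSDA: nearest projected group mean
# in Mahalanobis distance, with an optional group-size adjustment.
classify_projected <- function(Xtrain, Ytrain, Xtest, V, prior = TRUE) {
  N <- nrow(Xtrain)
  G <- max(Ytrain)
  Ptrain <- Xtrain %*% V            # N x r projected training data
  Ptest <- Xtest %*% V              # M x r projected test data
  r <- ncol(Ptrain)
  Z <- matrix(0, G, r)              # group means in the projected space
  S <- matrix(0, r, r)              # accumulates V'WV
  n_g <- numeric(G)
  for (g in 1:G) {
    Pg <- Ptrain[Ytrain == g, , drop = FALSE]
    n_g[g] <- nrow(Pg)
    Z[g, ] <- colMeans(Pg)
    S <- S + crossprod(sweep(Pg, 2, Z[g, ]))
  }
  S <- S / N                        # assumption: divisor N
  Sinv <- solve(S)                  # assumption: V'WV is nonsingular
  D <- matrix(0, nrow(Ptest), G)    # M x G matrix of distances
  for (g in 1:G) {
    d <- sweep(Ptest, 2, Z[g, ])
    D[, g] <- rowSums((d %*% Sinv) * d)
    if (prior) D[, g] <- D[, g] - 2 * log(n_g[g] / N)
  }
  apply(D, 1, which.min)            # label of the closest group
}

# Toy data: the first feature is shifted by 10 times the group label
set.seed(1)
G <- 3; n <- 10; p <- 5
ytrain <- rep(1:G, each = n)
xtrain <- matrix(rnorm(n * G * p), n * G, p)
xtrain[, 1] <- xtrain[, 1] + 10 * ytrain
xtest <- matrix(rnorm(G * p), G, p)
xtest[, 1] <- xtest[, 1] + 10 * (1:G)
V <- diag(p)[, 1:2]                 # placeholder V: first two coordinates
pred <- classify_projected(xtrain, ytrain, xtest, V)
```

Because V'WV equals the within-group covariance of the projected data, the sketch works entirely in the r-dimensional projected space. With equal group sizes the prior adjustment is the same for every group and does not change the predicted labels.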
Returns a vector of length M with predicted group labels for the test set.
Irina Gaynanova
I. Gaynanova, J. Booth and M. Wells (2015) "Simultaneous Sparse Estimation of Canonical Vectors in the p>>N setting", JASA, to appear.
### Example 1
# generate training data
n=10
p=100
G=3
ytrain=rep(1:G,each=n)
set.seed(1)
xtrain=matrix(rnorm(p*n*G),n*G,p)
# find V
V=dLDA(xtrain,ytrain,lambda=0.1)
sum(rowSums(V)!=0)
# generate test data
m=20
set.seed(3)
xtest=matrix(rnorm(p*m),m,p)
# perform classification
ytest=classifyV(xtrain,ytrain,xtest,V)