orthoProjection {resemble} | R Documentation |
Functions to perform orthogonal projections of high dimensional data matrices using partial least squares (pls) and principal component analysis (pca)
orthoProjection(Xr, X2 = NULL, Yr = NULL, method = "pca", pcSelection = list("cumvar", 0.99), center = TRUE, scaled = FALSE, cores = 1, ...)
pcProjection(Xr, X2 = NULL, Yr = NULL, pcSelection = list("cumvar", 0.99), center = TRUE, scaled = FALSE, method = "pca", tol = 1e-6, max.iter = 1000, cores = 1, ...)
plsProjection(Xr, X2 = NULL, Yr, pcSelection = list("opc", 40), scaled = FALSE, tol = 1e-6, max.iter = 1000, cores = 1, ...)
## S3 method for class 'orthoProjection'
predict(object, newdata, ...)
Xr |
a |
X2 |
an optional |
Yr |
if the method used in the |
method |
the method for projecting the data. Options are: "pca" (principal component analysis using the singular value decomposition algorithm), "pca.nipals" (principal component analysis using the non-linear iterative partial least squares algorithm) and "pls" (partial least squares). |
pcSelection |
a list which specifies the method to be used for identifying the number of principal components to be retained for computing the Mahalanobis distance of each sample in
The default method for the |
center |
a logical indicating if the data |
scaled |
a logical indicating if |
cores |
number of cores used when |
... |
additional arguments to be passed to |
tol |
tolerance limit for convergence of the nipals algorithm (default is 1e-06). In the case of PLS this applies only to Yr with more than two variables. |
max.iter |
maximum number of iterations (default is 1000). In the case of |
object |
object of class "orthoProjection" (as returned by |
newdata |
an optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. It must contain the same number of columns, to be used in the same order. |
In the case of method = "pca", the algorithm used is the singular value decomposition, in which a given data matrix X is factorized as follows:
X = UDV^{\mathrm{T}}
where U and V are orthogonal matrices: U is the matrix of the left singular vectors of X, D is a diagonal matrix containing the singular values of X, and V is the matrix of the right singular vectors of X.
The matrix of principal component scores is obtained by a matrix multiplication of U and D, and the matrix of principal component loadings is equivalent to the matrix V.
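The decomposition above can be sketched in base R with svd(). This is a minimal illustration on made-up random data, not the package's internal code; the object names (Xc, pc_scores, pc_loadings, expl_var, cum_expl_var) are chosen here for the example only.

```r
# Small illustrative data matrix (hypothetical random values)
set.seed(1)
X <- matrix(rnorm(50), nrow = 10, ncol = 5)

# Centre the columns (the analogue of center = TRUE)
Xc <- scale(X, center = TRUE, scale = FALSE)

# Singular value decomposition: Xc = U D V^T
sv <- svd(Xc)
U <- sv$u
D <- diag(sv$d)
V <- sv$v

# Scores are the product U D; loadings are the columns of V
pc_scores <- U %*% D
pc_loadings <- V

# Share of variance explained by each component, and its cumulative sum
expl_var <- sv$d^2 / sum(sv$d^2)
cum_expl_var <- cumsum(expl_var)
```

Because V is orthogonal, the same scores can be obtained as Xc %*% V, which is also how further samples can be projected onto the same components.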
When method = "pca.nipals"
, the algorithm used for principal component analysis is the non-linear iterative partial least squares (nipals).
In the case of the partial least squares projection (a.k.a. projection to latent structures), the nipals regression algorithm is used. Details on the "nipals" algorithm are presented in Martens (1991).
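For orientation, a textbook version of the nipals iteration for principal components can be sketched as follows. The function name nipals_pca, its convergence check and its starting vector are assumptions made for this illustration; the package's own implementation may differ in all three.

```r
# Illustrative NIPALS for PCA (hypothetical helper, not the package's code)
nipals_pca <- function(X, ncomp, tol = 1e-6, max.iter = 1000) {
  X <- scale(X, center = TRUE, scale = FALSE)
  scores <- matrix(0, nrow(X), ncomp)
  loadings <- matrix(0, ncol(X), ncomp)
  for (a in seq_len(ncomp)) {
    # Start from the column with the largest sum of squares
    t_old <- X[, which.max(colSums(X^2)), drop = FALSE]
    for (i in seq_len(max.iter)) {
      # Regress the columns of X on the current score vector
      p <- crossprod(X, t_old) / drop(crossprod(t_old))
      # Normalise the loading vector to unit length
      p <- p / sqrt(drop(crossprod(p)))
      # Update the score vector
      t_new <- X %*% p
      # Stop when the squared change in the scores falls below tol
      converged <- drop(crossprod(t_new - t_old)) < tol
      t_old <- t_new
      if (converged) break
    }
    scores[, a] <- t_new
    loadings[, a] <- p
    # Deflate X before extracting the next component
    X <- X - t_new %*% t(p)
  }
  list(scores = scores, loadings = loadings)
}

# Usage on made-up data with clearly separated component variances
set.seed(1)
X <- matrix(rnorm(200), 40, 5) %*% diag(c(5, 3, 2, 1, 0.5))
fit <- nipals_pca(X, ncomp = 2, tol = 1e-12)
```

Up to the sign of each component, the loadings obtained this way should agree with the right singular vectors returned by svd() on the centred matrix; the iterative form is mainly useful when only a few components are needed.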
When the "opc" method is used in pcSelection, the selection of the components is carried out by using an iterative method based on the side information concept (Ramirez-Lopez et al. 2013a, 2013b). First, let P be a sequence of retained components (so that P = 1, 2, ..., k).
At each iteration, the function computes a dissimilarity matrix retaining p_i components. The values of the side information of the samples are compared against the side information values of their most spectrally similar samples.
The optimal number of components retrieved by the function is the one that minimizes the root mean squared differences (RMSD) in the case of continuous variables, or maximizes the kappa index in the case of categorical variables. In this process the simEval
function is used.
Note that for the "opc" method it is necessary to specify Yr (the side information of the samples).
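The idea behind the "opc" selection for a continuous side variable can be illustrated with a simplified base R sketch. It assumes a principal component projection, a Mahalanobis distance computed on the retained scores, and a comparison of each sample with its single nearest neighbour. The data, the variable names and these simplifications are assumptions for illustration; this is not the simEval implementation.

```r
# Hypothetical data: 60 samples, 8 variables, one continuous side variable
set.seed(42)
n <- 60
X <- matrix(rnorm(n * 8), n, 8)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(n, sd = 0.1)

# Principal component scores of the centred data
sv <- svd(scale(X, center = TRUE, scale = FALSE))
all_scores <- sv$u %*% diag(sv$d)

max_k <- 8
rmsd <- numeric(max_k)
for (k in seq_len(max_k)) {
  # Scale the retained scores to unit variance, so that the Euclidean
  # distance on them equals the Mahalanobis distance in the score space
  sc <- scale(all_scores[, seq_len(k), drop = FALSE],
              center = FALSE,
              scale = sv$d[seq_len(k)] / sqrt(n - 1))
  d <- as.matrix(dist(sc))
  diag(d) <- Inf                      # a sample is not its own neighbour
  nn <- apply(d, 1, which.min)        # index of the most similar sample
  # Root mean squared difference between each sample's side information
  # and that of its most similar sample
  rmsd[k] <- sqrt(mean((y - y[nn])^2))
}

# Number of components that minimizes the RMSD
opt_k <- which.min(rmsd)
```

For a categorical side variable, the RMSD would be replaced by an agreement measure such as the kappa index, and the maximum would be taken instead of the minimum.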
Multi-threading for the computation of dissimilarities (see the cores parameter) is based on OpenMP and hence works only on Windows and Linux.
orthoProjection, pcProjection and plsProjection return a list of class orthoProjection with the following components:
scores
a matrix
of scores corresponding to the samples in Xr
and X2
(if it applies). The number of components that the scores represent is given by the number of components chosen in the function.
X.loadings
a matrix
of loadings corresponding to the explanatory variables. The number of components that these loadings represent is given by the number of components chosen in the function.
Y.loadings
a matrix
of partial least squares loadings corresponding to Yr
. The number of components that these loadings represent is given by the number of components chosen in the function. This object is only returned if the partial least squares algorithm was used.
weigths
a matrix
of partial least squares ("pls") weights. This object is only returned if the "pls" algorithm was used.
projectionM
a matrix
that can be used to project new data onto a "pls" space. This object is only returned if the "pls" algorithm was used.
variance
a matrix
indicating the standard deviation of each component (sdv), the cumulative explained variance (cumExplVar) and the variance explained by each single component (explVar). These values are computed based on the data used to create the projection matrices.
For example, if the "pls" method was used, then these values are computed based only on the data that contains information on Yr (i.e. the Xr data).
If the principal component method is used, then these values are computed on the basis of Xr and X2 (if it applies), since both matrices are employed in the computation of the projection matrix (loadings in this case).
svd
the standard deviation of the retrieved scores.
n.components
the number of components (either principal components or partial least squares components) used for computing the global distances.
opcEval
a data.frame
containing the statistics computed for optimizing the number of principal components based on the variable(s) specified in the Yr
argument. If Yr
was a continuous vector
or matrix
then this object indicates the root mean square of differences (rmse) for each number of components. If Yr
was a categorical variable this object indicates the kappa values for each number of components.
This object is returned only if "opc"
was used within the pcSelection
argument. See the simEval
function for more details.
method
the orthoProjection
method used.
predict.orthoProjection returns a matrix of scores projected for newdata.
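What projecting new data onto an existing principal component space amounts to can be sketched in base R, assuming the usual approach of centring the new samples with the reference means and multiplying by the loadings. The objects below (ctr, loadings, new_scores, ref_scores) are hypothetical names for this illustration and do not describe the internals of predict.orthoProjection.

```r
# Hypothetical reference and new data with the same 5 columns
set.seed(7)
Xr <- matrix(rnorm(100), 20, 5)
newdata <- matrix(rnorm(25), 5, 5)

# Build the projection from the reference data only
ctr <- colMeans(Xr)
sv <- svd(sweep(Xr, 2, ctr))
loadings <- sv$v[, 1:3]               # keep 3 components
ref_scores <- (sv$u %*% diag(sv$d))[, 1:3]

# Project the new samples: centre with the reference means, then
# multiply by the loadings (columns must be in the same order)
new_scores <- sweep(newdata, 2, ctr) %*% loadings
```

This is why newdata must contain the same number of columns, in the same order, as the data used to build the projection.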
Leonardo Ramirez-Lopez
Martens, H. (1991). Multivariate calibration. John Wiley & Sons.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex datasets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
## Not run:
require(prospectr)

data(NIRsoil)

Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]

Xu <- Xu[!is.na(Yu),]
Yu <- Yu[!is.na(Yu)]

Xr <- Xr[!is.na(Yr),]
Yr <- Yr[!is.na(Yr)]

# A partial least squares projection using the "opc" method
# for the selection of the optimal number of components
plsProj <- orthoProjection(Xr = Xr, Yr = Yr, X2 = Xu,
                           method = "pls",
                           pcSelection = list("opc", 40))

# A principal components projection using the "opc" method
# for the selection of the optimal number of components
pcProj <- orthoProjection(Xr = Xr, Yr = Yr, X2 = Xu,
                          method = "pca",
                          pcSelection = list("opc", 40))

# A partial least squares projection using the "cumvar" method
# for the selection of the optimal number of components
plsProj2 <- orthoProjection(Xr = Xr, Yr = Yr, X2 = Xu,
                            method = "pls",
                            pcSelection = list("cumvar", 0.99))
## End(Not run)