orthoDiss {resemble} | R Documentation |
This function computes dissimilarities (in an orthogonal space) either between observations in a given set or between observations in two different sets. The dissimilarities are based on either a principal component projection or a partial least squares projection of the data. After the data are projected, the Mahalanobis distance is computed in the projected space.
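The idea described above (project, then compute the Mahalanobis distance) can be sketched in base R. This is only an illustration under stated assumptions, not the code used by orthoDiss: the data matrix X is simulated, the 99 % threshold is interpreted here as cumulative explained variance, and any scaling that the package may apply to the final distances is not reproduced.

```r
# Minimal base-R sketch (NOT the resemble implementation).
set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20)   # hypothetical data: 20 observations, 6 variables

# Principal component projection of the centred (unscaled) data
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Assumption: keep the fewest components whose cumulative explained
# variance reaches 99 %
cumvar <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
ncomp <- which(cumvar >= 0.99)[1]

# Standardise the retained scores to unit variance. The scores are
# uncorrelated, so the Euclidean distance between standardised scores
# equals the Mahalanobis distance in the projected space.
scores <- pca$x[, seq_len(ncomp), drop = FALSE]
z <- sweep(scores, 2, pca$sdev[seq_len(ncomp)], "/")
d <- as.matrix(dist(z))
```

The resulting matrix d is symmetric, and its first row can be checked against sqrt(mahalanobis(scores, scores[1, ], cov(scores))) from the stats package.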
orthoDiss(Xr, X2 = NULL, Yr = NULL, pcSelection = list("cumvar", 0.99), method = "pca", local = FALSE, k0, center = TRUE, scaled = FALSE, return.all = FALSE, cores = 1, ...)
Xr |
a |
X2 |
an optional |
Yr |
either if the method used in the |
pcSelection |
a list which specifies the method to be used for identifying the number of principal components to be retained for computing the Mahalanobis distance of each sample in
The default method for the |
method |
the method for projecting the data. Options are: "pca" (principal component analysis using the singular value decomposition algorithm), "pca.nipals" (principal component analysis using the non-linear iterative partial least squares algorithm) and "pls" (partial least squares). See the |
local |
a logical indicating whether or not to compute the distances locally (i.e. by projecting the data locally) using the k0 nearest neighbour samples of each sample. Default is |
k0 |
if |
center |
a logical indicating if the spectral data |
scaled |
a logical indicating if |
return.all |
a logical. In case |
cores |
number of cores used when |
... |
additional arguments to be passed to the |
When local = TRUE, a global distance matrix is first computed based on the parameters specified. This matrix is then used to identify, for each target observation, its k0 nearest neighbours. These neighbours (together with the target observation) are projected (from the original data space) onto a local orthogonal space (using the same parameters specified in the function).
In this projected space the Mahalanobis distance between the target sample and the neighbours is recomputed. A missing value is assigned to the samples that do not belong to this set of neighbours (non-neighbour samples).
In this case the dissimilarity matrix cannot be considered a distance metric, since it does not necessarily satisfy the symmetry condition for distance matrices (i.e. given two samples x_i and x_j, the local dissimilarity (d) between them is relative, since generally d(x_i, x_j) \neq d(x_j, x_i)). On the other hand, when local = FALSE, the dissimilarity matrix obtained can be considered a distance matrix.
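The local procedure described above can be sketched in base R as follows. This is a hedged illustration, not the package's implementation: pc_mahalanobis is a hypothetical helper, the data are simulated, principal components are used for both the global and local projections, and the number of components is fixed at 2 for simplicity.

```r
# Hypothetical helper (not part of resemble): Mahalanobis distances
# between the rows of X in the space of the first `ncomp` principal components
pc_mahalanobis <- function(X, ncomp = 2) {
  pca <- prcomp(X, center = TRUE)
  z <- sweep(pca$x[, seq_len(ncomp), drop = FALSE], 2,
             pca$sdev[seq_len(ncomp)], "/")
  as.matrix(dist(z))
}

set.seed(1)
X <- matrix(rnorm(30 * 5), nrow = 30)   # hypothetical data
k0 <- 10                                # neighbourhood size

# Step 1: global distance matrix
d_global <- pc_mahalanobis(X)

# Step 2: for each target, re-project the target and its k0 nearest
# neighbours and recompute the distances; non-neighbours stay NA
d_local <- matrix(NA_real_, nrow(X), nrow(X))
for (j in seq_len(nrow(X))) {
  nn <- order(d_global[, j])[2:(k0 + 1)]      # position 1 is the target itself
  idx <- c(j, nn)
  d_loc_j <- pc_mahalanobis(X[idx, , drop = FALSE])
  d_local[idx, j] <- d_loc_j[, 1]             # column j holds target j
}
```

Because each column comes from a different local projection, d_local[i, j] and d_local[j, i] need not agree even when samples i and j are neighbours of each other, which is why the result is not a distance matrix in the strict sense.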
a list of class orthoDiss with the following components:
n.components
the number of components (either principal components or partial least squares components) used for computing the global distances.
global.variance.info
the information about the explained variance(s) of the projection. When local = TRUE, the information corresponds to the global projection done prior to computing the local projections.
loc.n.components
if local = TRUE, a data.frame which specifies the number of local components (either principal components or partial least squares components) used for computing the dissimilarity between each target sample and its neighbour samples.
dissimilarity
the computed dissimilarity matrix. If local = FALSE, a distance matrix. If local = TRUE, a matrix of class orthoDiss. In this case each column represents the dissimilarity between a target sample and its neighbourhood.
Multi-threading for the computation of dissimilarities (see the cores parameter) is based on OpenMP and hence works only on Windows and Linux.
Leonardo Ramirez-Lopez
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex datasets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J.A.M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
## Not run:
require(prospectr)

data(NIRsoil)

Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]

Xu <- Xu[!is.na(Yu),]
Yu <- Yu[!is.na(Yu)]

Xr <- Xr[!is.na(Yr),]
Yr <- Yr[!is.na(Yr)]

# Computation of the orthogonal dissimilarity matrix using the
# default parameters
ex1 <- orthoDiss(Xr = Xr, X2 = Xu)

# Computation of a principal component dissimilarity matrix using
# the "opc" method for the selection of the principal components
ex2 <- orthoDiss(Xr = Xr, X2 = Xu,
                 Yr = Yr,
                 pcSelection = list("opc", 40),
                 method = "pca",
                 return.all = TRUE)

# Computation of a partial least squares (PLS) dissimilarity
# matrix using the "opc" method for the selection of the PLS
# components
ex3 <- orthoDiss(Xr = Xr, X2 = Xu,
                 Yr = Yr,
                 pcSelection = list("opc", 40),
                 method = "pls")

# Computation of a partial least squares (PLS) local dissimilarity
# matrix using the "opc" method for the selection of the PLS
# components
ex4 <- orthoDiss(Xr = Xr, X2 = Xu,
                 Yr = Yr,
                 pcSelection = list("opc", 40),
                 method = "pls",
                 local = TRUE,
                 k0 = 200)
## End(Not run)