mblControl {resemble}    R Documentation

A function that controls some aspects of the memory-based learning process in the mbl function

Description

This function is used to specify various aspects of the memory-based learning process in the mbl function.

Usage

mblControl(sm = "pc",
           pcSelection = list("opc", 40),
           pcMethod = "svd",
           ws = if(sm == "movcor") 41,
           k0,
           returnDiss = FALSE,
           center = TRUE,
           scaled = TRUE,
           valMethod = c("NNv", "loc_crossval"),
           localOptimization = TRUE,
           resampling = 10, 
           p = 0.75,
           range.pred.lim = TRUE,
           progress = TRUE,
           cores = 1,            
           allowParallel = TRUE)

Arguments

sm

a character string indicating the spectral dissimilarity metric to be used in the selection of the nearest neighbours of each observation for which a prediction is required (see mbl). Options are:

  • "euclid": Euclidean dissimilarity.

  • "cosine": Cosine dissimilarity.

  • "sidF": Spectral information divergence computed on the spectral variables.

  • "sidD": Spectral information divergence computed on the density distributions of the spectra.

  • "cor": Correlation dissimilarity.

  • "movcor": Moving window correlation dissimilarity.

  • "pc": Principal components dissimilarity: Mahalanobis dissimilarity computed on the principal components space.

  • "loc.pc": Dissimilarity estimation based on local principal components.

  • "pls": Partial least squares dissimilarity: Mahalanobis dissimilarity computed on the partial least squares space.

  • "loc.pls": Dissimilarity estimation based on local partial least squares.

The "pc" spectral dissimilarity metric is the default. If "sidD" is chosen, the default parameters of the sid function are used; however, they can be modified by specifying them as additional arguments in the mbl function.

This argument can also be set to "none". In that case, a dissimilarity matrix must be specified in the dissimilarityM argument of the mbl function.
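As a rough illustration of what the "pc" option describes, the following base R sketch computes a Mahalanobis dissimilarity on principal component scores. It is not the package's implementation: the synthetic matrix X, the choice of three components (n_comp) and the use of prcomp are assumptions made for the example, and the package may apply additional scaling.

```r
# Toy "spectra": 20 observations by 50 variables (synthetic, illustration only)
set.seed(1)
X <- matrix(rnorm(20 * 50), nrow = 20)

# Principal components with base R (centred, not scaled)
pca <- prcomp(X, center = TRUE, scale. = FALSE)
n_comp <- 3                          # hypothetical number of retained components
scores <- pca$x[, 1:n_comp]

# Standardize each score column to unit variance; Euclidean distances on the
# standardized scores equal Mahalanobis distances in the PC space
z <- sweep(scores, 2, pca$sdev[1:n_comp], "/")
d_pc <- as.matrix(dist(z))

# Cross-check the first row against stats::mahalanobis(), which returns
# squared distances
d_ref <- sqrt(mahalanobis(scores, center = scores[1, ], cov = cov(scores)))
all.equal(unname(d_pc[1, ]), unname(d_ref))
```

The cross-check works because principal component scores are uncorrelated, so their covariance matrix is diagonal and the Mahalanobis distance reduces to a Euclidean distance on standardized scores.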

pcSelection

a list which specifies the method to be used for identifying the number of principal components to be retained for computing the Mahalanobis dissimilarity of each sample in Xu to the centre of Xr. It also specifies the number of components in any of the following cases: sm = "pc", sm = "loc.pc", sm = "pls" and sm = "loc.pls". This list must contain two objects in the following order:

  • method: the method for selecting the number of components. Possible options are: "opc" (optimized pc selection based on Ramirez-Lopez et al. (2013a, 2013b); see the orthoProjection function for more details); "cumvar" (for selecting the number of principal components based on a given cumulative amount of explained variance); "var" (for selecting the number of principal components based on a given amount of explained variance); and "manual" (for specifying manually the desired number of principal components).

  • value: a numerical value that complements the selected method. If "opc" is chosen, it must be a value indicating the maximal number of principal components to be tested (see Ramirez-Lopez et al., 2013a, 2013b). If "cumvar" is chosen, it must be a value (higher than 0 and lower than 1) indicating the maximum amount of cumulative variance that the retained components should explain. If "var" is chosen, it must be a value (higher than 0 and lower than 1) indicating that components that explain (individually) a variance lower than this threshold must be excluded. If "manual" is chosen, it must be a value specifying the desired number of principal components to retain.

The default method for the pcSelection argument is "opc" and the maximal number of principal components to be tested is set to 40. Optionally, the pcSelection argument admits "opc", "cumvar", "var" or "manual" as a single character string. In such a case, the default "value" when either "opc" or "manual" is used is 40. When "cumvar" is used the default "value" is set to 0.99, and when "var" is used the default "value" is set to 0.01.
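The variance-based rules can be sketched in base R as follows. This is only an approximation of the idea: the synthetic data, the variable names (expl_var, n_cumvar, n_var, n_manual) and the exact cut-off convention for "cumvar" (here, the fewest components whose cumulative explained variance reaches the threshold) are assumptions, and the package's own rule may differ at the boundary.

```r
# Synthetic data: 30 observations by 10 variables (illustration only)
set.seed(1)
X <- matrix(rnorm(30 * 10), nrow = 30)
pca <- prcomp(X, center = TRUE)

# Proportion of variance explained by each component
expl_var <- pca$sdev^2 / sum(pca$sdev^2)

# "cumvar"-like rule (assumed convention): fewest components whose cumulative
# explained variance reaches 0.99
n_cumvar <- which(cumsum(expl_var) >= 0.99)[1]

# "var"-like rule: drop components that individually explain less than 0.01
n_var <- sum(expl_var >= 0.01)

# "manual" rule: the number of components is simply given
n_manual <- 5

c(cumvar = n_cumvar, var = n_var, manual = n_manual)
```

In terms of the control list, these rules correspond to pcSelection = list("cumvar", 0.99), list("var", 0.01) and list("manual", 5) respectively.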

pcMethod

a character string indicating the principal component analysis algorithm to be used. Options are: "svd" (default) and "nipals". See orthoDiss.

ws

an odd integer value which specifies the window size when the moving window correlation dissimilarity is used (i.e. sm = "movcor"). The default is 41.

k0

a numeric integer value, required if any of the local dissimilarity methods is used (i.e. either sm = "loc.pc" or sm = "loc.pls"). This argument controls the number of initial neighbours (k0) to retain in order to compute the local principal components (at each neighbourhood).

returnDiss

a logical indicating if the dissimilarity matrices must be returned.

center

a logical indicating whether or not the predictor variables must be centered at each local segment (before regression).

scaled

a logical indicating whether or not the predictor variables must be scaled at each local segment (before regression).

valMethod

a character vector which indicates the (internal) validation method(s) to be used for assessing the global performance of the local models. Possible options are: "NNv" and "loc_crossval". Alternatively, "none" can be used when cross-validation is not required (see details below).

localOptimization

a logical. If valMethod = "loc_crossval", it optimizes the parameters of the local pls models (i.e. pls factors for pls and minimum and maximum pls factors for wapls1).

resampling

a value indicating the number of resampling iterations at each local segment when "loc_crossval" is selected in the valMethod argument. Default is 10.

p

a value indicating the proportion of samples to be retained in each resampling iteration at each local segment when "loc_crossval" is selected in the valMethod argument. Default is 0.75 (i.e. 75%).
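To make the roles of resampling and p concrete, the sketch below generates repeated calibration/validation splits for a single neighbourhood in base R. The neighbourhood size k and the use of simple random sampling are assumptions for illustration; the package may select the samples differently.

```r
set.seed(1)
k <- 40            # hypothetical number of neighbours in one local segment
resampling <- 10   # number of resampling iterations
p <- 0.75          # proportion of neighbours retained for calibration

n_cal <- floor(p * k)

# One calibration/validation split per iteration (simple random sampling,
# an assumption made for this sketch)
splits <- lapply(seq_len(resampling), function(i) {
  cal <- sort(sample.int(k, n_cal))
  list(calibration = cal, validation = setdiff(seq_len(k), cal))
})

# Sizes of the two groups in the first iteration
lengths(splits[[1]])
```

In each iteration a local model would be fitted on the calibration indices and evaluated on the validation indices, and the errors would then be aggregated over the iterations.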

range.pred.lim

a logical value. It indicates whether the prediction limits at each local regression are determined by the range of the response variable values employed at each local regression. If FALSE, no prediction limits are imposed. Default is TRUE.
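One plausible reading of this option is that predictions falling outside the range of the local response values are pulled back to the nearest limit. The base R sketch below shows that reading with made-up numbers; y_local and raw_pred are hypothetical, and the clamping rule is an assumption rather than a confirmed description of the package internals.

```r
# Response values of the neighbours used in one local regression (made up)
y_local <- c(1.2, 2.5, 3.1, 4.8)

# Unconstrained predictions from the local model (made up)
raw_pred <- c(0.9, 3.0, 5.4)

# Limit the predictions to the range of the local response values
lims <- range(y_local)
limited <- pmin(pmax(raw_pred, lims[1]), lims[2])
limited
```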

progress

a logical indicating whether or not to print a progress bar for each sample to be predicted. Default is TRUE. Note: In case multicore processing is used, this progress bar will not be printed.

cores

the number of cores used for the computation of dissimilarities when the method in pcSelection is "opc", which can be computationally intensive. Default is 1. See details.

allowParallel

a logical indicating whether to allow parallel execution of the sample loop. Default is TRUE.

Details

The validation methods available for assessing the predictive performance of the memory-based learning method used are "NNv" and "loc_crossval" (see the valMethod argument).

Multi-threading for the computation of dissimilarities is based on OpenMP and hence works only on Windows and Linux. However, the loop used to iterate over the Xu samples in mbl uses the %dopar% operator of the foreach package, which can be used to parallelize this internal loop. The last example given in the mbl function illustrates how to parallelize the mbl function.
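The general pattern for giving %dopar% a parallel backend is sketched below. It assumes the foreach and doParallel packages are installed and uses a trivial loop body in place of an mbl call; the cluster size of 2 is arbitrary.

```r
library(foreach)
library(doParallel)

# Start a small cluster and register it as the backend for %dopar%
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Any %dopar% loop evaluated now runs on the registered workers; a call to
# mbl with allowParallel = TRUE would be placed here instead of this toy loop
res <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

# Release the workers and fall back to sequential execution
parallel::stopCluster(cl)
registerDoSEQ()
```

If no backend is registered, foreach evaluates %dopar% loops sequentially and issues a warning the first time.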

Value

mblControl returns a list of class mbl with the specified parameters.

Author(s)

Leonardo Ramirez-Lopez and Antoine Stevens

References

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J. A. M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex datasets. Geoderma 195-196, 268-279.

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.

See Also

fDiss, corDiss, sid, orthoDiss, mbl

Examples

#A control list with the default parameters
mblControl()

#A control list which specifies the moving correlation 
#dissimilarity metric with a moving window of 31
mblControl(sm = "movcor", ws = 31)

[Package resemble version 1.2.2 Index]