mblControl {resemble} | R Documentation
mbl function
This function is used to specify various aspects of the memory-based learning process in the mbl function.
mblControl(sm = "pc", pcSelection = list("opc", 40), pcMethod = "svd",
           ws = if(sm == "movcor") 41, k0, returnDiss = FALSE,
           center = TRUE, scaled = TRUE,
           valMethod = c("NNv", "loc_crossval"),
           localOptimization = TRUE, resampling = 10, p = 0.75,
           range.pred.lim = TRUE, progress = TRUE, cores = 1,
           allowParallel = TRUE)
sm |
a character string indicating the spectral dissimilarity metric to be used in the selection of the nearest neighbours of each observation for which a prediction is required (see
The This argument can also be set to |
pcSelection |
a list which specifies the method to be used for identifying the number of principal components to be retained for computing the Mahalanobis dissimilarity of each sample in
The default method for the |
pcMethod |
a character string indicating the principal component analysis algorithm to be used. Options are: |
ws |
an odd integer value which specifies the window size when the moving window correlation dissimilarity is used (i.e |
k0 |
if any of the local dissimilarity methods is used (i.e. either |
returnDiss |
a logical indicating if the dissimilarity matrices must be returned. |
center |
a logical indicating whether or not the predictor variables must be centered at each local segment (before regression). |
scaled |
a logical indicating whether or not the predictor variables must be scaled at each local segment (before regression). |
valMethod |
a character vector which indicates the (internal) validation method(s) to be used for assessing the global performance of the local models. Possible
options are: |
localOptimization |
a logical. If |
resampling |
a value indicating the number of resampling iterations at each local segment when |
p |
a value indicating the percentage of samples to be retained in each resampling iteration at each local segment when |
range.pred.lim |
a logical value. It indicates whether the prediction limits at each local regression are determined by the range of the response variable values employed at each local regression. If |
progress |
a logical indicating whether or not to print a progress bar for each sample to be predicted. Default is |
cores |
number of cores used for the computation of dissimilarities when |
allowParallel |
To allow parallel execution of the sample loop (default is |
The validation methods available for assessing the predictive performance of the memory-based learning method are described as follows:
Leave-nearest-neighbour-out cross validation ("NNv"): From the group of neighbours of each sample to be predicted, the nearest sample (i.e. the most similar sample) is excluded and a local model is fitted using the remaining neighbours. This model is then used to predict the value of the target response variable of the nearest sample. Finally, these predicted values are compared with the actual values (see Ramirez-Lopez et al. (2013a) for additional details). This method is faster than "loc_crossval".
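The "NNv" procedure can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used by mbl: Euclidean distance and lm() stand in for the dissimilarity metric and the local regression method, and the simulated data and the names Xr, Yr, k and nnv_pairs are hypothetical.

```r
# Illustrative sketch only: Euclidean distance and lm() are stand-ins for the
# dissimilarity metric and local regression actually used by mbl.
set.seed(1)
n  <- 60
Xr <- matrix(rnorm(n * 3), ncol = 3)                     # reference predictors (simulated)
Yr <- drop(Xr %*% c(1, -2, 0.5)) + rnorm(n, sd = 0.1)    # reference response (simulated)
Xu <- matrix(rnorm(5 * 3), ncol = 3)                     # samples to be predicted
k  <- 20                                                 # neighbours per target sample

nnv_pairs <- t(sapply(seq_len(nrow(Xu)), function(i) {
  d  <- sqrt(colSums((t(Xr) - Xu[i, ])^2))   # distance from Xu[i, ] to every row of Xr
  nn <- order(d)[seq_len(k)]                 # indices of the k nearest neighbours
  held_out <- nn[1]                          # the nearest neighbour is excluded
  train <- data.frame(y = Yr[nn[-1]], Xr[nn[-1], , drop = FALSE])
  fit   <- lm(y ~ ., data = train)           # local model on the remaining neighbours
  new   <- data.frame(Xr[held_out, , drop = FALSE])
  c(observed = Yr[held_out], predicted = unname(predict(fit, newdata = new)))
}))

# Compare the predictions for the held-out nearest neighbours with their actual values
nnv_rmse <- sqrt(mean((nnv_pairs[, "observed"] - nnv_pairs[, "predicted"])^2))
```

Only one local model is fitted per sample to be predicted, which is why this scheme is cheaper than a resampling-based validation.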
Local leave-group-out cross validation ("loc_crossval"): The group of neighbours of each sample to be predicted is partitioned into equal-size subsets. Each partition is selected by stratified random sampling, which takes into account the values of the response variable of the corresponding set of neighbours. The selected local subset is used as the local validation subset and the remaining samples are used for fitting a model. This model is used to predict the target response variable values of the local validation subset, and the local root mean square error is computed. This process is repeated m times, and the final local error is computed as the average of the local root mean square errors of all m iterations. In the mbl function, m is controlled by the resampling argument and the size of the subsets is controlled by the p argument, which indicates the percentage of samples to be selected from the subset of nearest neighbours. The global error of the predictions is computed as the average of the local root mean square errors.
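A minimal base R sketch of the local error for a single neighbourhood is given below. It rests on several assumptions that are not taken from the package: lm() replaces the local regression method, the strata are quantile bins of the response, p is treated as the fraction of neighbours kept for fitting (the rest form the validation subset), and the function name local_cv_rmse and its n_strata argument are hypothetical.

```r
# Illustrative sketch only: lm(), the quantile-based strata and the reading of
# p as the fitting fraction are assumptions, not the mbl implementation.
set.seed(2)
k <- 40
X <- matrix(rnorm(k * 3), ncol = 3)                    # simulated neighbour predictors
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(k, sd = 0.1)    # simulated neighbour responses
neigh <- data.frame(y = y, X)

local_cv_rmse <- function(neigh, resampling = 10, p = 0.75, n_strata = 4) {
  # Strata are quantile bins of the response variable
  breaks <- quantile(neigh$y, probs = seq(0, 1, length.out = n_strata + 1))
  strata <- cut(neigh$y, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  rmse <- numeric(resampling)
  for (m in seq_len(resampling)) {
    # Stratified random sampling: draw a fraction p within each stratum
    train_idx <- unlist(lapply(split(seq_len(nrow(neigh)), strata), function(idx) {
      idx[sample.int(length(idx), size = round(p * length(idx)))]
    }))
    fit <- lm(y ~ ., data = neigh[train_idx, ])
    val <- neigh[-train_idx, ]                         # local validation subset
    rmse[m] <- sqrt(mean((val$y - predict(fit, newdata = val))^2))
  }
  mean(rmse)  # final local error: average over the m iterations
}

local_error <- local_cv_rmse(neigh)
```

Repeating this for the neighbourhood of every sample to be predicted and averaging the resulting local errors gives the global error described above.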
No validation ("none"): No validation is carried out. If "none" is selected along with "NNv" and/or "loc_crossval", then it will be ignored and the respective validation(s) will be carried out.
Multi-threading for the computation of dissimilarities is based on OpenMP and hence works only on Windows and Linux. However, the loop used to iterate over the Xu samples in mbl uses the %dopar% operator of the foreach package, which can be used to parallelize this internal loop. The last example given in the documentation of the mbl function illustrates how to parallelize the mbl function.
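For %dopar% to run in parallel, a foreach backend has to be registered before the loop is executed. The sketch below shows one way to do this; the choice of the doParallel package and of two workers is an assumption made for illustration, since the text above names only foreach.

```r
# Assumes the foreach and doParallel packages are installed; doParallel is
# only one of several possible backends and is not named in the text above.
if (requireNamespace("foreach", quietly = TRUE) &&
    requireNamespace("doParallel", quietly = TRUE)) {
  library(foreach)
  cl <- parallel::makeCluster(2)        # two worker processes
  doParallel::registerDoParallel(cl)    # backend picked up by %dopar%

  # With a backend registered, a later mbl() call whose control list has
  # allowParallel = TRUE can distribute its loop over the Xu samples.
  # A toy loop shows the operator at work:
  squares <- foreach(i = 1:4, .combine = c) %dopar% i^2

  parallel::stopCluster(cl)             # release the workers
  foreach::registerDoSEQ()              # restore sequential execution
}
```

If no backend is registered, %dopar% falls back to sequential execution with a warning.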
mblControl returns a list of class mbl with the specified parameters.
Leonardo Ramirez-Lopez and Antoine Stevens
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex datasets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J.A.M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
fDiss, corDiss, sid, orthoDiss, mbl
# A control list with the default parameters
mblControl()

# A control list which specifies the moving correlation
# dissimilarity metric with a moving window of 31
mblControl(sm = "movcor", ws = 31)