optimRandomForest {pedometrics} (R Documentation)
Compute the optimum number of iterations needed to de-bias a random forest regression.
optimRandomForest(x, y, niter = 10, nruns = 100, ntree = 500,
  ntrain = 2/3, nodesize = 5, mtry = max(floor(ncol(x)/3), 1),
  profile = TRUE, progress = TRUE)
x
  Data frame or matrix of covariates (predictor variables).
y
  Numeric vector with the response variable.
niter
  Number of iterations. Defaults to niter = 10.
nruns
  Number of simulations to be used in each iteration. Defaults to nruns = 100.
ntree
  Number of trees to grow. Defaults to ntree = 500.
ntrain
  Number (or proportion) of observations to be used as training cases. Defaults to 2/3 of the total number of observations.
nodesize
  Minimum size of terminal nodes. Defaults to nodesize = 5.
mtry
  Number of covariates randomly sampled as candidates at each split. Defaults to 1/3 of the total number of covariates.
profile
  Should the profile of the standardized mean squared prediction error be plotted at the end of the optimization? Defaults to profile = TRUE.
progress
  Should a progress bar be displayed? Defaults to progress = TRUE.
A fixed proportion of the total number of observations is used to calibrate (train) the random forest regression. The set of calibration observations is randomly selected from the full set of observations in each simulation. The remaining observations are used as test cases (validation). In general, the smaller the calibration dataset, the more simulation runs are needed to obtain stable estimates of the mean squared prediction error (MSPE).
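The random calibration/validation split performed in each simulation can be sketched as follows. This is a minimal illustration of the idea, not the package's internal code; all object names are hypothetical:

```r
# Sketch of one simulation's random split (hypothetical names).
set.seed(2001)
n <- 150                               # total number of observations
ntrain <- floor(2/3 * n)               # default: 2/3 used for calibration
idx <- sample(n, ntrain)               # randomly selected calibration cases
# x_cal <- x[idx, ];  y_cal <- y[idx]    # train the random forest on these
# x_val <- x[-idx, ]; y_val <- y[-idx]   # remaining cases used for validation
```

Because `idx` is drawn anew in every simulation, repeating the split `nruns` times averages out the sampling variability of any single calibration set.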
The optimum number of iterations needed to de-bias the random forest regression is obtained by observing the evolution of the MSPE as the number of iterations increases. The MSPE is defined as the mean of the squared differences between predicted and observed values.
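A small worked example of the MSPE definition, using illustrative numbers only:

```r
# MSPE: mean of the squared differences between predicted and observed values
observed  <- c(2.1, 3.4, 5.0, 4.2)
predicted <- c(2.0, 3.0, 5.5, 4.0)
mspe <- mean((predicted - observed)^2)
mspe  # 0.115
```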
The original function was published as part of the dissertation of Ruo Xu, which was developed under the supervision of Daniel S Nettleton dnett@iastate.edu and Daniel J Nordman dnordman@iastate.edu.
Ruo Xu xuruo.isu@gmail.com, with improvements by Alessandro Samuel-Rosa alessandrosamuelrosa@gmail.com
Breiman, L. Random forests. Machine Learning. v. 45, p. 5-32, 2001.
Breiman, L. Using adaptive bagging to debias regressions. Berkeley: University of California, p. 16, 1999.
Liaw, A. & Wiener, M. Classification and regression by randomForest. R News. v. 2/3, p. 18-22, 2002.
Xu, R. Improvements to random forest methodology. Ames, Iowa: Iowa State University, p. 87, 2013.
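A hypothetical call on simulated data might look like the following. This assumes the pedometrics package is installed; the data and the small values of niter, nruns, and ntree are chosen only to keep the illustration fast:

```r
# Hypothetical usage sketch on simulated data (assumes pedometrics is installed).
library(pedometrics)
set.seed(1984)
x <- data.frame(a = runif(200), b = runif(200), c = runif(200))
y <- 2 * x$a - x$b + rnorm(200, sd = 0.2)   # response depends mainly on a and b
res <- optimRandomForest(x, y, niter = 5, nruns = 20, ntree = 100,
                         profile = TRUE, progress = FALSE)
```

With profile = TRUE, the profile of the standardized MSPE across iterations is plotted at the end of the optimization, from which the optimum number of iterations can be read.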