optimRandomForest {pedometrics}		R Documentation

Optimum number of iterations to de-bias a random forest regression

Description

Compute the optimum number of iterations needed to de-bias a random forest regression.

Usage

optimRandomForest(x, y, niter = 10, nruns = 100, ntree = 500,
  ntrain = 2/3, nodesize = 5, mtry = max(floor(ncol(x)/3), 1),
  profile = TRUE, progress = TRUE)

Arguments

x

Data frame or matrix of covariates (predictor variables).

y

Numeric vector with the response variable.

niter

Number of iterations. Defaults to niter = 10.

nruns

Number of simulations to be used in each iteration. Defaults to nruns = 100.

ntree

Number of trees to grow. Defaults to ntree = 500.

ntrain

Number (or proportion) of observations to be used as training cases. Defaults to 2/3 of the total number of observations.

nodesize

Minimum size of terminal nodes. Defaults to nodesize = 5.

mtry

Number of variables randomly sampled as candidates at each split. Defaults to 1/3 of the total number of covariates.

profile

Should the profile of the standardized mean squared prediction error be plotted at the end of the optimization? Defaults to profile = TRUE.

progress

Should a progress bar be displayed? Defaults to progress = TRUE.

Details

A fixed proportion of the total number of observations is used to calibrate (train) the random forest regression. The set of calibration observations is randomly selected from the full set of observations in each simulation. The remaining observations are used as test cases (validation). In general, the smaller the calibration dataset, the more simulation runs are needed to obtain stable estimates of the mean squared prediction error (MSPE).
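The calibration/validation split described above can be sketched in base R as follows. This is an illustrative sketch of one simulation run, not the package's internal code; the object names (calib_rows, valid_rows) are hypothetical.

```r
set.seed(1)
n <- 100                                # total number of observations
ntrain <- floor(2/3 * n)                # default: 2/3 used for calibration
calib_rows <- sample(n, ntrain)         # randomly selected calibration (training) cases
valid_rows <- setdiff(seq_len(n), calib_rows)  # remaining cases used for validation
```

A new random split of this kind is drawn in each of the nruns simulation runs.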

The optimum number of iterations needed to de-bias the random forest regression is obtained by observing the evolution of the MSPE as the number of iterations increases. The MSPE is defined as the mean of the squared differences between predicted and observed values.
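The MSPE definition above translates directly into base R. The toy predicted and observed vectors below are synthetic, for illustration only:

```r
set.seed(1)
obs  <- rnorm(50)                      # toy observed values
pred <- obs + rnorm(50, sd = 0.2)      # toy predictions with small error
mspe <- mean((pred - obs)^2)           # mean squared prediction error
```

In the optimization, this quantity is computed on the validation cases at each iteration, and its profile over iterations indicates when further de-biasing stops paying off.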

Note

The original function was published as part of the dissertation of Ruo Xu, which was developed under the supervision of Daniel S Nettleton dnett@iastate.edu and Daniel J Nordman dnordman@iastate.edu.

Author(s)

Ruo Xu xuruo.isu@gmail.com, with improvements by Alessandro Samuel-Rosa alessandrosamuelrosa@gmail.com

References

Breiman, L. Random forests. Machine Learning. v. 45, p. 5-32, 2001.

Breiman, L. Using adaptive bagging to debias regressions. Berkeley: University of California, p. 16, 1999.

Liaw, A. & Wiener, M. Classification and regression by randomForest. R News. v. 2/3, p. 18-22, 2002.

Xu, R. Improvements to random forest methodology. Ames, Iowa: Iowa State University, p. 87, 2013.

See Also

randomForest


[Package pedometrics version 0.6-6 Index]