predict.blockForest {blockForest}
Description

This function is to be applied to the entry 'forest' of the output of
blockfor(). See the Examples section for an illustration.
Usage

## S3 method for class 'blockForest'
predict(object, data = NULL, predict.all = FALSE,
        num.trees = object$num.trees, type = "response",
        se.method = "infjack", quantiles = c(0.1, 0.5, 0.9),
        seed = NULL, num.threads = NULL, verbose = TRUE, ...)
Arguments

object: blockForest object.

data: New test data of class data.frame or gwaa.data (GenABEL).

predict.all: Return individual predictions for each tree instead of
    aggregated predictions for all trees. Returns a matrix (sample x
    tree) for classification and regression, a 3d array for probability
    estimation (sample x class x tree) and survival (sample x time x
    tree).

num.trees: Number of trees used for prediction. The first num.trees in
    the forest are used.

type: Type of prediction. One of 'response', 'se', 'terminalNodes',
    'quantiles' with default 'response'. See below for details.

se.method: Method to compute standard errors. One of 'jack', 'infjack'
    with default 'infjack'. Only applicable if type = 'se'. See below
    for details.

quantiles: Vector of quantiles for quantile prediction. Set
    type = 'quantiles' to use.

seed: Random seed. Default is NULL, which generates the seed from R.

num.threads: Number of threads. Default is the number of CPUs available.

verbose: Verbose output on or off.

...: Further arguments passed to or from other methods.
Details

For type = 'response' (the default), the predicted classes
(classification), predicted numeric values (regression), predicted
probabilities (probability estimation) or survival probabilities
(survival) are returned.

For type = 'se', the standard errors of the predictions are returned
(regression only). The jackknife-after-bootstrap or infinitesimal
jackknife for bagging is used to estimate the standard errors based on
out-of-bag predictions. See Wager et al. (2014) for details.

For type = 'terminalNodes', the IDs of the terminal node in each tree
for each observation in the given dataset are returned.

For type = 'quantiles', the selected quantiles for each observation are
estimated. See Meinshausen (2006) for details.
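As a minimal sketch (not from the package documentation), the following
hypothetical calls show how quantile and terminal-node predictions might
be requested. Here rf denotes an already fitted regression forest and
Xtest new covariate data, both assumed; quantile prediction in
ranger-based implementations typically requires the forest to have been
grown with quantreg = TRUE.

## Hedged sketch: 'rf' and 'Xtest' are hypothetical; quantile prediction
## assumes the forest was grown with quantreg = TRUE (as in ranger).
pred_q <- predict(rf, data = Xtest, type = "quantiles",
                  quantiles = c(0.1, 0.5, 0.9))
pred_q$predictions    # one column per requested quantile

pred_tn <- predict(rf, data = Xtest, type = "terminalNodes")
pred_tn$predictions   # matrix (observations x trees) of terminal node IDs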
If type = 'se' is selected, the method used to estimate the variances
can be chosen with se.method. Set se.method = 'jack' for the
jackknife-after-bootstrap and se.method = 'infjack' for the
infinitesimal jackknife for bagging.
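A minimal sketch of standard-error prediction, under the assumption that
rf is a hypothetical regression forest whose inbag counts were stored at
training time (keep.inbag = TRUE in ranger-based implementations, which
the jackknife estimators generally require):

## Hedged sketch: assumes 'rf' was grown with keep.inbag = TRUE.
pred_se <- predict(rf, data = Xtest, type = "se", se.method = "infjack")
pred_se$predictions   # point predictions
pred_se$se            # estimated standard errors
## Alternatively, the jackknife-after-bootstrap:
pred_jk <- predict(rf, data = Xtest, type = "se", se.method = "jack")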
For classification and predict.all = TRUE, factor levels are returned
as numerics. To retrieve the corresponding factor levels, use
rf$forest$levels, if rf is the blockForest object.
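The following hedged sketch illustrates this mapping for a hypothetical
classification forest rf (e.g., rf <- blockforobj$forest, where
blockforobj is the output of blockfor):

## Hedged sketch: 'rf' is a hypothetical classification forest.
pred_all <- predict(rf, data = Xtest, predict.all = TRUE)
dim(pred_all$predictions)    # observations x trees, numeric level codes
## Map the numeric codes of the first tree back to factor labels:
rf$forest$levels[pred_all$predictions[, 1]]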
Value

Object of class blockForest.prediction with elements:
predictions: Predicted classes/values (only for classification and
    regression).

unique.death.times: Unique death times (only for survival).

chf: Estimated cumulative hazard function for each sample (only for
    survival).

survival: Estimated survival function for each sample (only for
    survival).

num.trees: Number of trees.

num.independent.variables: Number of independent variables.

treetype: Type of forest/tree. Classification, regression or survival.

num.samples: Number of samples.
Author(s)

Marvin N. Wright
References

Wright, M. N. & Ziegler, A. (2017). ranger: A Fast Implementation of
Random Forests for High Dimensional Data in C++ and R. J Stat Softw
77:1-17. https://doi.org/10.18637/jss.v077.i01.

Wager, S., Hastie, T., & Efron, B. (2014). Confidence Intervals for
Random Forests: The Jackknife and the Infinitesimal Jackknife. J Mach
Learn Res 15:1625-1651. http://jmlr.org/papers/v15/wager14a.html.

Meinshausen, N. (2006). Quantile Regression Forests. J Mach Learn Res
7:983-999. http://www.jmlr.org/papers/v7/meinshausen06a.html.
Examples

# NOTE: There is no association between covariates and response for the
# simulated data below.
# Moreover, the input parameters of blockfor() are highly unrealistic
# (e.g., nsets = 10 is specified much too small).
# The purpose of the shown examples is merely to illustrate the
# application of predict.blockForest().

# Generate data:
################

set.seed(1234)

# Covariate matrix:
X <- cbind(matrix(nrow=40, ncol=5, data=rnorm(40*5)),
           matrix(nrow=40, ncol=30, data=rnorm(40*30, mean=1, sd=2)),
           matrix(nrow=40, ncol=100, data=rnorm(40*100, mean=2, sd=3)))
colnames(X) <- paste("X", 1:ncol(X), sep="")

# Block variable (list):
block <- rep(1:3, times=c(5, 30, 100))
block <- lapply(1:3, function(x) which(block==x))

# Binary outcome:
ybin <- factor(sample(c(0,1), size=40, replace=TRUE), levels=c(0,1))

# Survival outcome:
ysurv <- cbind(rnorm(40), sample(c(0,1), size=40, replace=TRUE))

# Divide into training and test data:
Xtrain <- X[1:30,]
Xtest <- X[31:40,]
ybintrain <- ybin[1:30]
ybintest <- ybin[31:40]
ysurvtrain <- ysurv[1:30,]
ysurvtest <- ysurv[31:40,]

# Binary outcome: Apply algorithm to training data and obtain predictions
# for the test data:
#########################################################################

# Apply a variant to the training data:
blockforobj <- blockfor(Xtrain, ybintrain, num.trees = 100, replace = TRUE,
                        block=block, nsets = 10, num.trees.pre = 50,
                        splitrule="extratrees", block.method = "SplitWeights")
blockforobj$paramvalues

# Obtain predictions for the test data:
(predres <- predict(blockforobj$forest, data = Xtest,
                    block.method = "SplitWeights"))
predres$predictions

# Survival outcome: Apply algorithm to training data and obtain predictions
# for the test data:
###########################################################################

# Apply a variant to the training data:
blockforobj <- blockfor(Xtrain, ysurvtrain, num.trees = 100, replace = TRUE,
                        block=block, nsets = 10, num.trees.pre = 50,
                        splitrule="extratrees", block.method = "SplitWeights")
blockforobj$paramvalues

# Obtain predictions for the test data:
(predres <- predict(blockforobj$forest, data = Xtest,
                    block.method = "SplitWeights"))
rowSums(predres$chf)