AUCRF {AUCRF} | R Documentation |
AUCRF is an algorithm for variable selection using Random Forest, based on optimizing the area under the ROC curve (AUC) of the Random Forest. The proposed strategy implements a backward elimination process based on the initial ranking of the variables.
AUCRF(formula, data, k0 = 1, pdel = 0.2, ranking=c("MDG","MDA"), ...)
formula |
an object of class |
data |
a data frame containing the variables in the model. Dependent variable must be a
binary variable defined as |
k0 |
number of remaining variables for stopping the backward elimination process.
By default, k0 = 1. |
pdel |
fraction of remaining variables to be removed in each step. By default, pdel = 0.2. |
ranking |
specifies the importance measure provided by |
... |
optional parameters to be passed to the |
The AUC-RF algorithm is described in detail in Calle et al. (2011). The following is a summary:
Ranking and AUC of the initial set:
Perform a random forest using all predictor variables and the response, as specified in the formula
argument, and compute the AUC of the random forest. Based on the selected measure of importance (by default MDG),
obtain a ranking of predictors.
Elimination process:
Based on the variable ranking, remove the least important variables (the fraction specified by the
pdel argument). Fit a new random forest with the remaining variables and compute its AUC.
This step is repeated until the number of remaining variables is less than or equal to k0.
Optimal set:
The optimal set of predictive variables is the one giving rise to the Random Forest with the
highest AUC, denoted OOB-AUCopt. The number of selected predictors is denoted by Kopt.
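The three steps summarized above can be sketched in base R. This is only an illustrative sketch, not the package's implementation: `auc_rank`, `backward_eliminate`, `score_fn` and `toy_score` are hypothetical names, the random forest fit is replaced by a user-supplied scoring callback, and the rounding rule (drop at least one variable per step, never go below k0) is an assumption, since the summary does not state how the fraction pdel is rounded.

```r
# Rank-based (Mann-Whitney) estimate of the AUC.
# score: numeric score per observation (higher = more likely a case).
# y:     binary outcome coded 0 (control) / 1 (case).
auc_rank <- function(score, y) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  r <- rank(score)  # ties receive average ranks
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Backward elimination over a ranked list of predictors.
# ranked_vars: predictor names, most important first.
# score_fn:    hypothetical callback returning an AUC for a set of predictors
#              (in AUC-RF this role is played by the AUC of a random forest).
backward_eliminate <- function(ranked_vars, score_fn, k0 = 1, pdel = 0.2) {
  vars  <- ranked_vars
  sizes <- integer(0)
  aucs  <- numeric(0)
  sets  <- list()
  repeat {
    # Evaluate the current set of predictors.
    sizes <- c(sizes, length(vars))
    aucs  <- c(aucs, score_fn(vars))
    sets[[length(sets) + 1]] <- vars
    if (length(vars) <= k0) break
    # Assumed rounding: remove at least one variable, never go below k0.
    n_drop <- max(1, floor(pdel * length(vars)))
    n_keep <- max(k0, length(vars) - n_drop)
    vars <- vars[seq_len(n_keep)]  # keep the top-ranked variables
  }
  best <- which.max(aucs)  # optimal set = highest AUC
  list(Xopt = sets[[best]], Kopt = sizes[best], AUCopt = aucs[best],
       AUCcurve = data.frame(k = sizes, AUC = aucs))
}

# Toy usage: a made-up score that peaks when exactly 3 variables remain.
ranked    <- paste0("V", 1:10)
toy_score <- function(vars) 0.9 - 0.02 * abs(length(vars) - 3)
res <- backward_eliminate(ranked, toy_score)
res$Kopt
res$Xopt
```

With a real random forest, `score_fn` would refit the forest on the given predictors and pass its out-of-bag class probabilities to something like `auc_rank`.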
An object of class AUCRF, which is a list with the following components:
call |
the original call to |
data |
the |
ranking |
the ranking of predictors based on the importance measure. |
Xopt |
optimal set of predictors obtained. |
OOB-AUCopt |
AUC obtained for the optimal set of predictors. |
Kopt |
size of the optimal set of predictors obtained. |
AUCcurve |
values of AUC obtained for each set of predictors evaluated in the elimination process. |
RFopt |
the |
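As a small illustration of how the components listed above might be read from the returned list, the sketch below uses a mock list with made-up values rather than a real AUCRF result. `summarise_selection` and `mock_fit` are hypothetical names introduced only for this example. Note that OOB-AUCopt contains a hyphen, so it has to be accessed with `[[ ]]` (or backticks) rather than a plain `$name`.

```r
# Collect the documented selection results from an AUCRF-like list.
summarise_selection <- function(fit) {
  list(
    n_selected = fit$Kopt,            # size of the optimal set
    selected   = fit$Xopt,            # optimal set of predictors
    auc        = fit[["OOB-AUCopt"]]  # hyphenated name needs [[ ]]
  )
}

# Mock object with made-up values, for illustration only.
mock_fit <- list(Xopt = c("SNP1", "SNP2"), Kopt = 2, "OOB-AUCopt" = 0.8)
out <- summarise_selection(mock_fit)
out$n_selected
out$auc
```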
Calle ML, Urrea V, Boulesteix A-L, Malats N (2011) "AUC-RF: A new strategy for genomic profiling with Random Forest". Human Heredity. (In press)
OptimalSet, AUCRFcv, randomForest.
library(AUCRF)

# Load the included example dataset. This is a simulated case/control study
# data set with 4000 patients (2000 cases / 2000 controls) and 1000 SNPs,
# where the first 10 SNPs have a direct association with the outcome:
data(exampleData)

# Call the AUCRF process (it may take some time):
# fit <- AUCRF(Y~., data=exampleData)

# The result of this example is included for illustration purposes:
data(fit)
summary(fit)
plot(fit)

# Additional randomForest parameters can be included; otherwise the default
# parameters of the randomForest function will be used:
# fit <- AUCRF(Y~., data=exampleData, ntree=1000, nodesize=20)