buildMS {pedometrics} | R Documentation |
This function allows building a series of linear models (lm) using one or
more automated variable selection procedures implemented in functions
stepVIF and stepAIC.
buildMS(formula, data, vif = FALSE, vif.threshold = 10, vif.verbose = FALSE, aic = FALSE, aic.direction = "both", aic.trace = FALSE, aic.steps = 5000, ...)
formula |
A list containing one or several model formulas (a symbolic description of the model to be fitted). |
data |
Data frame containing the variables in the model formulas. |
vif |
Logical for performing backward variable selection using the
Variance-Inflation Factor (VIF). Defaults to FALSE. |
vif.threshold |
Numeric value setting the maximum acceptable VIF value.
Defaults to 10. |
vif.verbose |
Logical for printing iteration results of backward
variable selection using the VIF. Defaults to FALSE. |
aic |
Logical for performing variable selection using the Akaike
Information Criterion (AIC). Defaults to FALSE. |
aic.direction |
Character string setting the direction of variable
selection when using AIC. Available options are "both", "backward", and
"forward". Defaults to "both". |
aic.trace |
Logical for printing iteration results of variable selection
using the AIC. Defaults to FALSE. |
aic.steps |
Integer value setting the maximum number of steps to be
considered for variable selection using the AIC. Defaults to 5000. |
... |
Further arguments passed to the function |
This function was devised to deal with a list of linear model formulas. The
main objective is to bring together several functions commonly used when
building linear models, such as automated variable selection. In the current
implementation, variable selection can be done using stepVIF, stepAIC, or
both. stepVIF is a backward variable selection procedure, while stepAIC
supports backward, forward, and bidirectional variable selection. For more
information about these functions, please visit their respective help pages.
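To make the idea of backward selection by VIF concrete, here is a minimal sketch in base R. It is a simplified stand-in, not the stepVIF implementation: the helper names vif_values and drop_by_vif are hypothetical, only numeric predictors are handled, and the rule used here (drop the predictor with the largest VIF) is an assumption that may differ from the rule stepVIF applies.

```r
library(MASS)  # used only for the cpus data set

# Hypothetical helper: VIF of each numeric predictor, computed as
# 1 / (1 - R^2) from regressing that predictor on all the others.
vif_values <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical helper: drop the predictor with the largest VIF until all
# values are at or below the threshold, or only one predictor is left.
drop_by_vif <- function(data, predictors, threshold = 10) {
  while (length(predictors) > 1) {
    v <- vif_values(data, predictors)
    if (max(v) <= threshold) break
    predictors <- setdiff(predictors, names(which.max(v)))
  }
  predictors
}

predictors <- c("syct", "mmin", "mmax", "cach", "chmin", "chmax")
kept <- drop_by_vif(cpus, predictors, threshold = 10)
```

The threshold argument plays the role that vif.threshold plays in buildMS: a lower value makes the selection stricter.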
An important feature of buildMS is that it records the initial number of
candidate predictor variables and observations offered to the model, and
adds this information as an attribute to the final selected model. This
feature was included because variable selection procedures result in biased
(too optimistic) linear models, and the effective number of degrees of
freedom is close to the number of candidate predictor variables initially
offered to the model (Harrell, 2001). With the initial number of candidate
predictor variables and observations offered to the model, one can calculate
penalized or adjusted measures of model performance. For models built using
buildMS, this can be done using statsMS.
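As an illustration of why the initial number of candidate predictors matters, the sketch below computes the adjusted R2 of a selected model twice: once with the number of predictors that survived selection, and once with the number initially offered. This is only one possible penalized measure, written with base R and MASS; it is not claimed to be the formula used by statsMS, and the names adj_r2, p_initial, and p_final are hypothetical.

```r
library(MASS)  # provides stepAIC and the cpus data set

# Full model offering six candidate predictors, then backward selection by AIC
full <- lm(log10(perf) ~ syct + mmin + mmax + cach + chmin + chmax, data = cpus)
selected <- stepAIC(full, direction = "backward", trace = FALSE)

n <- nrow(cpus)
p_initial <- length(attr(terms(full), "term.labels"))      # candidates offered
p_final <- length(attr(terms(selected), "term.labels"))    # predictors kept
r2 <- summary(selected)$r.squared

# Adjusted R2 for a model with an intercept and p predictors
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

adj_final <- adj_r2(r2, n, p_final)      # usual adjusted R2 of the final model
adj_initial <- adj_r2(r2, n, p_initial)  # penalized by the initial candidates
```

Because p_initial is never smaller than p_final, the penalized value is never larger than the usual adjusted R2.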
Some important details should be clear when using buildMS:
this function was originally devised to deal with a list of formulas, but can also be used with a single formula;
in the current implementation, stepVIF runs before stepAIC;
function arguments imported from stepAIC and stepVIF were named as in the
original functions, and received a prefix (aic or vif) to help the user
identify which function is affected by a given argument without having to
check the documentation.
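The points above can be sketched with a small wrapper around lm and stepAIC. This is an assumption-laden illustration, not the source of buildMS: the function build_series is hypothetical, it covers only the AIC step (the VIF step that buildMS runs first is omitted), and its argument names simply mimic the aic prefix convention.

```r
library(MASS)  # provides stepAIC and the cpus data set

# Hypothetical wrapper: fit every formula with lm() and run stepAIC() on it.
build_series <- function(formulas, data, aic.direction = "both",
                         aic.trace = FALSE, aic.steps = 5000) {
  # Accept a single formula as well as a list of formulas
  if (inherits(formulas, "formula")) formulas <- list(formulas)
  lapply(formulas, function(f) {
    # do.call() stores the formula and the data frame themselves in the model
    # call, so stepAIC() can refit the model from inside this function
    fit <- do.call("lm", list(formula = f, data = data))
    stepAIC(fit, direction = aic.direction, trace = aic.trace,
            steps = aic.steps)
  })
}

forms <- list(
  log10(perf) ~ syct + mmin + mmax + cach + chmin + chmax,
  log10(perf) ~ syct + mmin + cach
)
models <- build_series(forms, cpus)
```

The result is a list with one fitted model per formula, which mirrors the return value described for buildMS.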
A list containing the fitted linear models.
Add option to set the order in which stepAIC and stepVIF are run.
Alessandro Samuel-Rosa alessandrosamuelrosa@gmail.com
Harrell, F. E. (2001) Regression modelling strategies: with applications to linear models, logistic regression, and survival analysis. First edition. New York: Springer.
Venables, W. N. and Ripley, B. D. (2002) Modern applied statistics with S. Fourth edition. New York: Springer.
## Not run:
# based on the second example of function stepAIC
require(MASS)
cpus1 <- cpus
for(v in names(cpus)[2:7])
  cpus1[[v]] <- cut(cpus[[v]], unique(stats::quantile(cpus[[v]])),
                    include.lowest = TRUE)
cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
cpus.samp <- sample(1:209, 100)
cpus.form <- list(formula(log10(perf) ~ syct + mmin + mmax + cach + chmin +
                          chmax + perf),
                  formula(log10(perf) ~ syct + mmin + cach + chmin + chmax),
                  formula(log10(perf) ~ mmax + cach + chmin + chmax + perf))
data <- cpus1[cpus.samp,2:8]
cpus.ms <- buildMS(cpus.form, data, vif = TRUE, aic = TRUE)
## End(Not run)