buildMS {pedometrics}    R Documentation

Build a series of linear models using automated variable selection

Description

This function builds a series of linear models (lm) using one or more automated variable selection procedures, as implemented in the functions stepVIF and stepAIC.

Usage

buildMS(formula, data, vif = FALSE, vif.threshold = 10,
  vif.verbose = FALSE, aic = FALSE, aic.direction = "both",
  aic.trace = FALSE, aic.steps = 5000, ...)

Arguments

formula

A list containing one or several model formulas (a symbolic description of the model to be fitted).

data

Data frame containing the variables in the model formulas.

vif

Logical for performing backward variable selection using the Variance-Inflation Factor (VIF). Defaults to vif = FALSE.

vif.threshold

Numeric value setting the maximum acceptable VIF value. Defaults to vif.threshold = 10.

vif.verbose

Logical for printing iteration results of backward variable selection using the VIF. Defaults to vif.verbose = FALSE.

aic

Logical for performing variable selection using Akaike Information Criterion (AIC). Defaults to aic = FALSE.

aic.direction

Character string setting the direction of variable selection when using AIC. Available options are "both", "forward", and "backward". Defaults to aic.direction = "both".

aic.trace

Logical for printing iteration results of variable selection using the AIC. Defaults to aic.trace = FALSE.

aic.steps

Integer value setting the maximum number of steps to be considered for variable selection using the AIC. Defaults to aic.steps = 5000.

...

Further arguments passed to the function stepAIC.

Details

This function was devised to deal with a list of linear model formulas. The main objective is to bring together several functions commonly used when building linear models, such as automated variable selection. In the current implementation, variable selection can be done using stepVIF or stepAIC or both. stepVIF is a backward variable selection procedure, while stepAIC supports backward, forward, and bidirectional variable selection. For more information about these functions, please visit their respective help pages.
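To illustrate the idea of processing a list of formulas, here is a minimal sketch that uses only base R and MASS. It does not call buildMS and is not its implementation; the object names (forms, fits, selected) are made up for illustration, and the mtcars data set stands in for real data.

```r
library(MASS)  # provides stepAIC

# A list of candidate model formulas (illustrative only)
forms <- list(
  mpg ~ wt + hp + disp + qsec,
  mpg ~ wt + hp + drat
)

# Fit one linear model per formula
fits <- lapply(forms, function(f) lm(f, data = mtcars))

# Apply AIC-based selection to each fitted model
selected <- lapply(fits, function(m)
  stepAIC(m, direction = "both", trace = FALSE, steps = 5000))

# Number of terms before and after selection
n_before <- sapply(fits, function(m) length(attr(terms(m), "term.labels")))
n_after  <- sapply(selected, function(m) length(attr(terms(m), "term.labels")))
```

The result is a list with one selected model per input formula, which mirrors the kind of object described under Value.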

An important feature of buildMS is that it records the initial number of candidate predictor variables and observations offered to the model, and adds this information as an attribute to the final selected model. This feature was included because variable selection procedures result in biased (overly optimistic) linear models, and the effective number of degrees of freedom is close to the number of candidate predictor variables initially offered to the model (Harrell, 2001). With the initial number of candidate predictor variables and observations offered to the model, one can calculate penalized or adjusted measures of model performance. For models built using buildMS, this can be done using statsMS.
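As a rough illustration of why the initial number of candidates matters, the sketch below computes the adjusted R-squared twice: once with the number of predictors retained after selection, and once with the number of candidates initially offered. It uses base R only; the variable names are hypothetical, and neither the attribute name used by buildMS nor the exact measures computed by statsMS are assumed here.

```r
# Full model with all candidate predictors (illustrative data: mtcars)
full <- lm(mpg ~ wt + hp + disp + qsec + drat, data = mtcars)
n <- nrow(mtcars)                                   # observations offered
p_init <- length(attr(terms(full), "term.labels"))  # candidates offered

# Backward selection by AIC using base R's step()
final <- step(full, direction = "backward", trace = 0)
p_final <- length(attr(terms(final), "term.labels"))
r2 <- summary(final)$r.squared

# Usual adjusted R2: penalizes by the number of predictors retained
adj_r2_final <- 1 - (1 - r2) * (n - 1) / (n - p_final - 1)

# More conservative version: penalizes by the candidates initially offered
adj_r2_init <- 1 - (1 - r2) * (n - 1) / (n - p_init - 1)

c(final = adj_r2_final, initial = adj_r2_init)
```

Because all predictors here are numeric, each term uses one degree of freedom; factors would require counting model matrix columns instead of terms.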

Some important details should be clear when using buildMS:

  1. this function was originally devised to deal with a list of formulas, but can also be used with a single formula;

  2. in the current implementation, stepVIF runs before stepAIC;

  3. function arguments imported from stepAIC and stepVIF keep their original names and receive a prefix (aic or vif), so that the user can identify which function a given argument affects without having to check the documentation.
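The order described in item 2 (VIF-based selection first, AIC-based selection second) can be sketched as follows. This is not the code of stepVIF or buildMS: vif_values and backward_vif are hypothetical helpers written for this illustration, and they assume numeric predictors without interactions. Only MASS::stepAIC is a real package function, called here with the values that correspond to aic.direction, aic.trace, and aic.steps.

```r
library(MASS)  # provides stepAIC

# Hypothetical helper: VIF of each numeric predictor in an lm fit,
# computed as 1 / (1 - R2_j) from regressing predictor j on the others
vif_values <- function(model) {
  x <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
  sapply(colnames(x), function(j) {
    r2 <- summary(lm(x[, j] ~ x[, colnames(x) != j, drop = FALSE]))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical helper: drop the predictor with the largest VIF until all
# VIFs are at or below the threshold (or only one predictor is left)
backward_vif <- function(model, threshold = 10) {
  while (length(attr(terms(model), "term.labels")) > 1) {
    v <- vif_values(model)
    if (max(v) <= threshold) break
    worst <- names(v)[which.max(v)]
    model <- update(model, as.formula(paste(". ~ . -", worst)))
  }
  model
}

fit <- lm(mpg ~ wt + hp + disp + cyl + qsec, data = mtcars)

# Step 1: VIF-based backward selection (cf. vif.threshold = 10)
fit_vif <- backward_vif(fit, threshold = 10)

# Step 2: AIC-based selection on the result of step 1
fit_aic <- stepAIC(fit_vif, direction = "both", trace = FALSE, steps = 5000)
```

Running the VIF step first means that the AIC step only ever sees predictors that passed the collinearity threshold.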

Value

A list containing the fitted linear models.

TODO

Add option to set the order in which stepAIC and stepVIF are run.

Author(s)

Alessandro Samuel-Rosa alessandrosamuelrosa@gmail.com

References

Harrell, F. E. (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. First edition. New York: Springer.

Venables, W. N. and Ripley, B. D. (2002) Modern applied statistics with S. Fourth edition. New York: Springer.

See Also

stepAIC, stepVIF, statsMS.

Examples

## Not run: 
# based on the second example of function stepAIC
library(MASS)         # provides the cpus data set and stepAIC
library(pedometrics)  # provides buildMS
cpus1 <- cpus
# convert the six hardware predictors into factors, cutting at their quantiles
for (v in names(cpus)[2:7])
  cpus1[[v]] <- cut(cpus[[v]], unique(stats::quantile(cpus[[v]])),
                    include.lowest = TRUE)
cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
cpus.samp <- sample(1:209, 100)  # random subset of 100 of the 209 rows
# list of candidate model formulas
cpus.form <- list(formula(log10(perf) ~ syct + mmin + mmax + cach + chmin +
                  chmax + perf),
                  formula(log10(perf) ~ syct + mmin + cach + chmin + chmax),
                  formula(log10(perf) ~ mmax + cach + chmin + chmax + perf))
data <- cpus0[cpus.samp, ]
# run stepVIF and then stepAIC on each formula
cpus.ms <- buildMS(cpus.form, data, vif = TRUE, aic = TRUE)

## End(Not run)

[Package pedometrics version 0.6-6 Index]