dummyVars {caret} | R Documentation |
dummyVars
creates a full set of dummy variables (i.e. less than full rank parameterization)
dummyVars(formula, ...)

## Default S3 method:
dummyVars(formula, data, sep = ".", levelsOnly = FALSE, fullRank = FALSE, ...)

## S3 method for class 'dummyVars'
predict(object, newdata, na.action = na.pass, ...)

contr.dummy(n, ...)  ## DEPRECATED

contr.ltfr(n, contrasts = TRUE, sparse = FALSE)

class2ind(x, drop2nd = FALSE)
formula |
An appropriate R model formula, see References |
data |
A data frame with the predictors of interest |
sep |
An optional separator between factor variable names and their levels. Use |
levelsOnly |
A logical; |
fullRank |
A logical; should a full rank or less than full rank parameterization be used? If |
object |
An object of class |
newdata |
A data frame with the required columns |
na.action |
A function determining what should be done with missing values in |
n |
A vector of levels for a factor, or the number of levels. |
contrasts |
A logical indicating whether contrasts should be computed. |
sparse |
A logical indicating if the result should be sparse. |
x |
A factor vector. |
drop2nd |
A logical: when the factor |
... |
additional arguments to be passed to other methods |
Most of the contrasts functions in R produce full rank parameterizations of the predictor data. For example, contr.treatment creates a reference cell in the data and defines dummy variables for all factor levels except those in the reference cell. If a factor with 5 levels is used alone in a model formula, contr.treatment creates columns for the intercept and for every factor level except the first. For the data in the Example section below, this would produce:
  (Intercept) dayTue dayWed dayThu dayFri daySat daySun
1           1      1      0      0      0      0      0
2           1      1      0      0      0      0      0
3           1      1      0      0      0      0      0
4           1      0      0      1      0      0      0
5           1      0      0      1      0      0      0
6           1      0      0      0      0      0      0
7           1      0      1      0      0      0      0
8           1      0      1      0      0      0      0
9           1      0      0      0      0      0      0
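The reference-cell behavior can be checked in base R without caret. The sketch below uses a hypothetical five-level factor `grp` (not part of the original examples) and assumes the default `options("contrasts")`, under which unordered factors use contr.treatment.

```r
# Hypothetical toy data: a factor with five levels, three of them observed.
d <- data.frame(grp = factor(c("a", "c", "e"),
                             levels = c("a", "b", "c", "d", "e")))

# Default treatment contrasts: an intercept plus one column for each
# level other than the first (reference) level "a".
X <- model.matrix(~ grp, data = d)

colnames(X)
X
```

The row for level "a" has zeros in every dummy column, because the reference level is represented by the intercept alone.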
In some situations, there may be a need for dummy variables for all the levels of the factor. For the same example:
  dayMon dayTue dayWed dayThu dayFri daySat daySun
1      0      1      0      0      0      0      0
2      0      1      0      0      0      0      0
3      0      1      0      0      0      0      0
4      0      0      0      1      0      0      0
5      0      0      0      1      0      0      0
6      1      0      0      0      0      0      0
7      0      0      1      0      0      0      0
8      0      0      1      0      0      0      0
9      1      0      0      0      0      0      0
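For comparison, a full set of indicator columns can be approximated in base R by supplying identity contrasts to model.matrix. This is only a sketch of the idea under stated assumptions, not how dummyVars is implemented; the small data frame `d` is hypothetical.

```r
# Hypothetical toy data with two factors.
d <- data.frame(
  day  = factor(c("Mon", "Wed", "Fri"), levels = c("Mon", "Wed", "Fri")),
  time = factor(c("morning", "night", "morning"),
                levels = c("morning", "night"))
)

# Dropping the intercept keeps every level of the first factor only;
# the second factor still loses its reference level.
partial <- model.matrix(~ day + time - 1, data = d)
colnames(partial)

# Identity (non-contrast) coding for every factor keeps all levels.
full <- model.matrix(~ day + time - 1, data = d,
                     contrasts.arg = lapply(d, contrasts, contrasts = FALSE))
colnames(full)
```

The resulting matrix is less than full rank: the indicator columns of each factor sum to one, so the columns are linearly dependent.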
Given a formula and initial data set, the class dummyVars
gathers all the information needed to produce a full set of dummy variables for any data set. It uses contr.ltfr
as the base function to do this.
class2ind
is most useful for converting a factor outcome vector to a matrix of dummy variables.
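The kind of conversion described here can be sketched in base R as follows. This illustrates the idea only and is not caret's code; the factor `outcome` is a hypothetical example.

```r
# Hypothetical two-class outcome.
outcome <- factor(c("yes", "no", "yes", "no"), levels = c("no", "yes"))

# One indicator column per class: drop the intercept so that every
# level gets its own column, then label the columns by level.
ind <- model.matrix(~ outcome - 1)
colnames(ind) <- levels(outcome)

ind
```

Each row contains exactly one 1, marking the class of that observation.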
The output of dummyVars
is a list of class 'dummyVars' with elements
call |
the function call |
form |
the model formula |
vars |
names of all the variables in the model |
facVars |
names of all the factor variables in the model |
lvls |
levels of any factor variables |
sep |
|
terms |
the |
levelsOnly |
a logical |
The predict
function produces a data frame.
contr.ltfr
generates a design matrix.
contr.ltfr
is a small modification of contr.treatment
by Max Kuhn
http://cran.r-project.org/doc/manuals/R-intro.html#Formulae-for-statistical-models
model.matrix, contrasts, formula
library(caret)

## stringsAsFactors = TRUE is needed in R >= 4.0.0 so that the
## columns are factors before their levels are reassigned
when <- data.frame(time = c("afternoon", "night", "afternoon",
                            "morning", "morning", "morning",
                            "morning", "afternoon", "afternoon"),
                   day = c("Mon", "Mon", "Mon",
                           "Wed", "Wed", "Fri",
                           "Sat", "Sat", "Fri"),
                   stringsAsFactors = TRUE)

levels(when$time) <- list(morning = "morning",
                          afternoon = "afternoon",
                          night = "night")
levels(when$day) <- list(Mon = "Mon", Tue = "Tue", Wed = "Wed", Thu = "Thu",
                         Fri = "Fri", Sat = "Sat", Sun = "Sun")

## Default behavior:
model.matrix(~day, when)

mainEffects <- dummyVars(~ day + time, data = when)
mainEffects
predict(mainEffects, when[1:3,])

## Missing values: passed through by default, or dropped with na.omit
when2 <- when
when2[1, 1] <- NA
predict(mainEffects, when2[1:3,])
predict(mainEffects, when2[1:3,], na.action = na.omit)

## Interaction terms
interactionModel <- dummyVars(~ day + time + day:time,
                              data = when,
                              sep = ".")
predict(interactionModel, when[1:3,])

## Column names built from the levels only
noNames <- dummyVars(~ day + time + day:time,
                     data = when,
                     levelsOnly = TRUE)
predict(noNames, when)