vgam {VGAM}R Documentation

Fitting Vector Generalized Additive Models

Description

Fit a vector generalized additive model (VGAM). This is a large class of models that includes generalized additive models (GAMs) and vector generalized linear models (VGLMs) as special cases.

Usage

vgam(formula, family, data = list(), weights = NULL, subset = NULL, 
     na.action = na.fail, etastart = NULL, mustart = NULL, 
     coefstart = NULL, control = vgam.control(...), offset = NULL, 
     method = "vgam.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, 
     contrasts = NULL, constraints = NULL, 
     extra = list(), form2 = NULL, qr.arg = FALSE, smart = TRUE, ...)

Arguments

formula

a symbolic description of the model to be fit. The RHS of the formula is applied to each linear/additive predictor, and usually includes at least one s term. Different variables in each linear/additive predictor can be chosen by specifying constraint matrices.

family

Same as for vglm.

data

an optional data frame containing the variables in the model. By default the variables are taken from environment(formula), typically the environment from which vgam is called.

weights, subset, na.action

Same as for vglm. Note that subset may be unreliable and to get around this problem it is best to use subset to create a new smaller data frame and feed in the smaller data frame. See below for an example. This is a bug that needs fixing.

etastart, mustart, coefstart

Same as for vglm.

control

a list of parameters for controlling the fitting process. See vgam.control for details.

method

the method to be used in fitting the model. The default (and presently only) method vgam.fit uses iteratively reweighted least squares (IRLS).

constraints, model, offset

Same as for vglm.

x.arg, y.arg

logical values indicating whether the model matrix and response vector/matrix used in the fitting process should be assigned in the x and y slots. Note the model matrix is the LM model matrix; to get the VGAM model matrix type model.matrix(vgamfit) where vgamfit is a vgam object.

contrasts, extra, form2, qr.arg, smart

Same as for vglm.

...

further arguments passed into vgam.control.

Details

A vector generalized additive model (VGAM) is loosely defined as a statistical model that is a function of M additive predictors. The central formula is given by

eta_j = sum_{k=1}^p f_{(j)k}(x_k)

where x_k is the kth explanatory variable (almost always x_1=1 for the intercept term), and f_{(j)k} are smooth functions of x_k that are estimated by smoothers. The first term in the summation is just the intercept. Currently only one type of smoother is implemented and this is called a vector (cubic smoothing spline) smoother. Here, j=1,…,M where M is finite. If all the functions are constrained to be linear then the resulting model is a vector generalized linear model (VGLM). VGLMs are best fitted with vglm.

Vector (cubic smoothing spline) smoothers are represented by s() (see s). Local regression via lo() is not supported. The results of vgam will differ from the gam() (in the gam) because vgam() uses a different knot selection algorithm. In general, fewer knots are chosen because the computation becomes expensive when the number of additive predictors M is large.

The underlying algorithm of VGAMs is iteratively reweighted least squares (IRLS) and modified vector backfitting using vector splines. B-splines are used as the basis functions for the vector (smoothing) splines. vgam.fit() is the function that actually does the work. The smoothing code is based on F. O'Sullivan's BART code.

A closely related methodology based on VGAMs called constrained additive ordination (CAO) first forms a linear combination of the explanatory variables (called latent variables) and then fits a GAM to these. This is implemented in the function cao for a very limited choice of family functions.

Value

An object of class "vgam" (see vgam-class for further information).

WARNING

Currently vgam can only handle constraint matrices cmat, say, such that crossprod(cmat) is diagonal. This is a bug that I will try to fix up soon; see is.buggy.

See warnings in vglm.control.

Note

This function can fit a wide variety of statistical models. Some of these are harder to fit than others because of inherent numerical difficulties associated with some of them. Successful model fitting benefits from cumulative experience. Varying the values of arguments in the VGAM family function itself is a good first step if difficulties arise, especially if initial values can be inputted. A second, more general step, is to vary the values of arguments in vgam.control. A third step is to make use of arguments such as etastart, coefstart and mustart.

Some VGAM family functions end in "ff" to avoid interference with other functions, e.g., binomialff, poissonff, gaussianff, gammaff. This is because VGAM family functions are incompatible with glm (and also gam in the gam library and gam in the mgcv library).

The smart prediction (smartpred) library is packed with the VGAM library.

The theory behind the scaling parameter is currently being made more rigorous, but it it should give the same value as the scale parameter for GLMs.

Author(s)

Thomas W. Yee

References

Yee, T. W. and Wild, C. J. (1996) Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481–493.

Yee, T. W. (2008) The VGAM Package. R News, 8, 28–39.

See Also

is.buggy, vgam.control, vgam-class, vglmff-class, plotvgam, vglm, s, vsmooth.spline, cao.

Examples

 # Nonparametric proportional odds model 
pneumo <- transform(pneumo, let = log(exposure.time))
vgam(cbind(normal, mild, severe) ~ s(let),
     cumulative(parallel = TRUE), data = pneumo, trace = TRUE)

# Nonparametric logistic regression 
fit <- vgam(agaaus ~ s(altitude, df = 2), binomialff, data = hunua)
## Not run:  plot(fit, se = TRUE) 
pfit <- predict(fit, type = "terms", raw = TRUE, se = TRUE)
names(pfit)
head(pfit$fitted)
head(pfit$se.fit)
pfit$df
pfit$sigma

# Fit two species simultaneously 
fit2 <- vgam(cbind(agaaus, kniexc) ~ s(altitude, df = c(2, 3)),
             binomialff(multiple.responses = TRUE), data = hunua)
coef(fit2, matrix = TRUE)  # Not really interpretable 
## Not run:  plot(fit2, se = TRUE, overlay = TRUE, lcol = 1:2, scol = 1:2)

ooo <- with(hunua, order(altitude))
with(hunua, matplot(altitude[ooo], fitted(fit2)[ooo,], ylim = c(0, 0.8),
     xlab = "Altitude (m)", ylab = "Probability of presence", las = 1,
     main = "Two plant species' response curves", type = "l", lwd = 2))
with(hunua, rug(altitude)) 
## End(Not run)

# The subset argument does not work here. Use subset() instead.
set.seed(1)
zdata <- data.frame(x2 = runif(nn <- 100))
zdata <- transform(zdata, y = rbinom(nn, 1, 0.5))
zdata <- transform(zdata, subS = runif(nn) < 0.7)
sub.zdata <- subset(zdata, subS)  # Use this instead
if (FALSE)
  fit4a <- vgam(cbind(y, y) ~ s(x2, df = 2),
                binomialff(multiple.responses = TRUE), 
                data = zdata, subset = subS)  # This fails!!!
fit4b <- vgam(cbind(y, y) ~ s(x2, df = 2),
              binomialff(multiple.responses = TRUE), 
              data = sub.zdata)  # This succeeds!!!

[Package VGAM version 1.0-1 Index]